HOW DIFFERENT FEATURE SELECTION METHODS AFFECT THE PRECISION OF BREAST CANCER PREDICTION MACHINE LEARNING MODELS: A COMPARATIVE STUDY

Ms. Khamrunissa Hussain; Prof. Narendra Kumar; Dr. Sudheer Kumar Sharma

doi:10.63001/tbs.2025.v20.i02.S2.pp800-809

Keywords:

Breast Cancer Diagnostic Dataset, Feature Selection, Machine Learning, Breast Cancer

Abstract

Breast cancer is common in developing nations, and early identification is critical for successful treatment. When combined with standard diagnostic data, machine learning techniques can be used to evaluate the risk of developing breast cancer. While cancer datasets contain a wealth of patient information, not all data points are useful for predicting cancer outcomes, which underscores the importance of feature selection methods in identifying relevant data.
Numerous studies in this domain have sought to predict various types of breast tumors, as accurate diagnosis is essential for effective breast cancer treatment. The aim of this research is to compare how various feature selection techniques affect the accuracy of different machine learning algorithms currently in use. The seven machine learning methods assessed in this study are K-Nearest Neighbors (KNN), Naive Bayes (NB), Decision Trees (DT), Support Vector Machines (SVM), Logistic Regression (LR), Neural Networks (NN), and Random Forest (RF). The feature selection methods examined include Mutual Information (MI), the Spearman correlation coefficient, and the F-test.
The experiments use the Wisconsin Diagnostic Breast Cancer (WDBC) dataset, which is publicly available through the UCI Repository. According to the results, the Logistic Regression and Neural Network algorithms outperform the other models in accuracy and across a wide range of other performance metrics when feature selection is used.
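
The kind of comparison described above can be sketched in a few lines of Python with scikit-learn and SciPy. This is a minimal illustration under stated assumptions, not the authors' experimental protocol: the abstract does not give the number of selected features, the validation scheme, or the hyperparameters, so `k=10`, 5-fold cross-validation, and default model settings are placeholders. The helper names `spearman_score`, `mutual_info_fixed`, and `compare_selectors` are hypothetical, and only two of the seven classifiers (LR and KNN) are included to keep the sketch short. The data come from `load_breast_cancer`, scikit-learn's bundled copy of the WDBC dataset.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def spearman_score(X, y):
    """Score each feature by its absolute Spearman correlation with the label."""
    scores = []
    for j in range(X.shape[1]):
        rho, _ = spearmanr(X[:, j], y)
        scores.append(abs(rho))
    return np.array(scores)


def mutual_info_fixed(X, y):
    """Mutual information with a fixed seed so repeated runs agree."""
    return mutual_info_classif(X, y, random_state=0)


def compare_selectors(k=10, folds=5):
    """Return mean cross-validated accuracy per (selector, model) pair.

    k and folds are assumed values; the abstract does not state them.
    """
    X, y = load_breast_cancer(return_X_y=True)
    selectors = {
        "f_test": f_classif,
        "mutual_info": mutual_info_fixed,
        "spearman": spearman_score,
    }
    models = {
        "LR": lambda: LogisticRegression(max_iter=1000),
        "KNN": lambda: KNeighborsClassifier(),
    }
    results = {}
    for s_name, score_func in selectors.items():
        for m_name, make_model in models.items():
            # Scaling and selection sit inside the pipeline, so they are
            # refit on each training fold and never see the held-out fold.
            pipe = make_pipeline(
                StandardScaler(),
                SelectKBest(score_func, k=k),
                make_model(),
            )
            fold_scores = cross_val_score(pipe, X, y, cv=folds, scoring="accuracy")
            results[(s_name, m_name)] = fold_scores.mean()
    return results


if __name__ == "__main__":
    for (s_name, m_name), acc in sorted(compare_selectors().items()):
        print(f"{s_name:12s} {m_name:4s} mean accuracy = {acc:.3f}")
```

Keeping the selector inside the pipeline is a deliberate design choice: selecting features on the full dataset before cross-validation would leak label information into the held-out folds and inflate the reported accuracy.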