Asst. Professor Mohammad Salim Hamdard¹, Asst. Professor Hedayatullah Lodin²
¹,²Faculty of Computer Science, Kabul University
In real-life data science problems, it is rare that all the features in a dataset are useful for building a model. In machine learning, feature selection is the process of selecting a subset of relevant features or attributes for constructing a model. Removing irrelevant and redundant features while keeping the relevant ones can improve the accuracy of a machine learning model, whereas adding unnecessary variables increases the overall complexity of the model. Our experiment indicates that the accuracy of a classification model is strongly affected by feature selection. We trained three algorithms (K-Nearest Neighbors, Decision Tree, and Multi-layer Perceptron) on all the features and obtained accuracies of 49%, 84%, and 71%, respectively. After applying feature selection, with no logical changes to the model code, the accuracy scores rose to 82%, 86%, and 78%, respectively. These are gains of 33, 2, and 7 percentage points, with K-Nearest Neighbors benefiting the most.
KEYWORDS: Machine Learning, Feature Selection, Accuracy, Dimensionality Reduction, Classification
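The effect described in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' experiment: the dataset is synthetic (two informative features plus twenty pure-noise features), the selection rule is a simple filter score chosen for illustration, and only a hand-written K-Nearest Neighbors classifier is used. All function names and parameters below are hypothetical.

```python
import random
import statistics

def make_data(n, n_noise, rng):
    """Synthetic binary data (an assumption, not the paper's dataset):
    2 informative features followed by n_noise pure-noise features."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        informative = [rng.gauss(3.0 * label, 1.0) for _ in range(2)]
        noise = [rng.gauss(0.0, 5.0) for _ in range(n_noise)]
        X.append(informative + noise)
        y.append(label)
    return X, y

def filter_scores(X, y):
    """Score each feature by |difference of class means| / summed class std.
    This is one simple filter-style criterion, picked for illustration."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        spread = statistics.pstdev(a) + statistics.pstdev(b)
        scores.append(abs(statistics.mean(a) - statistics.mean(b)) / spread)
    return scores

def knn_predict(X_train, y_train, x, k=5):
    """Majority vote among the k nearest training points (squared Euclidean)."""
    def dist(i):
        return sum((p - q) ** 2 for p, q in zip(X_train[i], x))
    nearest = sorted(range(len(X_train)), key=dist)[:k]
    votes = sum(y_train[i] for i in nearest)
    return 1 if votes * 2 > k else 0

def accuracy(X_train, y_train, X_test, y_test, cols):
    """Test accuracy of k-NN when only the columns in `cols` are used."""
    def project(rows):
        return [[row[j] for j in cols] for row in rows]
    Xtr, Xte = project(X_train), project(X_test)
    hits = sum(knn_predict(Xtr, y_train, x) == t for x, t in zip(Xte, y_test))
    return hits / len(y_test)

rng = random.Random(0)  # fixed seed so the run is repeatable
X_train, y_train = make_data(200, 20, rng)
X_test, y_test = make_data(100, 20, rng)

all_cols = list(range(len(X_train[0])))
scores = filter_scores(X_train, y_train)
# Keep the two highest-scoring features; the classifier code is unchanged.
selected = sorted(sorted(all_cols, key=lambda j: scores[j], reverse=True)[:2])

acc_all = accuracy(X_train, y_train, X_test, y_test, all_cols)
acc_selected = accuracy(X_train, y_train, X_test, y_test, selected)
print(f"selected features: {selected}")
print(f"k-NN accuracy, all features:      {acc_all:.2f}")
print(f"k-NN accuracy, selected features: {acc_selected:.2f}")
```

In this toy setting the noise features dominate the distance computation, which is why a distance-based method such as K-Nearest Neighbors is especially sensitive to irrelevant features. The exact numbers depend on the synthetic data and should not be compared with the accuracies reported above.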
