P-ISSN 0974-6846, E-ISSN 0974-5645

Indian Journal of Science and Technology

Year: 2023, Volume: 16, Issue: 10, Pages: 744-755

Original Article

Feature Selection Techniques in Learning Algorithms to Predict Truthful Data

Received Date: 30 October 2022, Accepted Date: 08 February 2023, Published Date: 11 March 2023

Abstract

Objectives: This review examines feature selection processes, strategies, and methods, namely filter, wrapper, and embedded algorithms, and presents their advantages and disadvantages. Methods: Mutual Information Gain (MIG), Chi-Square (CS), and Recursive Feature Elimination (RFE) are used to select features, and two benchmark datasets, Breast Cancer and Diabetes, are used for evaluation. Findings: Choosing an appropriate feature selection method and algorithm is essential for improving efficiency. To measure the performance of the selected features, a Random Forest model is used as the classifier and compared with Support Vector Machine and Decision Tree models. The filter methods select up to 15 of 17 features for the diabetes dataset, with accuracy between 89% and 98%; for the breast cancer dataset, they select up to 28 of 31 features, with 98.5% accuracy. The wrapper method RFE selects 14 of 17 features for diabetes and 10 of 31 features for breast cancer, reaching up to 98.25% accuracy for diabetes and 99.20% for breast cancer. Novelty: Feature selection techniques improve performance and efficiency, reduce storage and processing time, and yield a better model for subsequent prediction. Proper feature selection can help diagnose diseases at an earlier stage and improve patient survival.

Keywords: Mutual Information Gain; Chi-Square; Recursive Feature Elimination; Support Vector Machine; Random Forest; Decision Tree
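
The workflow summarized in the abstract (filter selection with MIG and Chi-Square, wrapper selection with RFE, and evaluation with a Random Forest classifier) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes scikit-learn and its bundled Wisconsin breast cancer dataset, which has 30 features and may differ from the dataset used in the article. The function name `evaluate_selectors`, the choice of `k=10`, the 70/30 split, and the random seed are all illustrative assumptions, so the accuracies it prints should not be expected to match the reported figures.

```python
from functools import partial

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectKBest, chi2, mutual_info_classif
from sklearn.model_selection import train_test_split


def evaluate_selectors(k=10, seed=0):
    """Select k features with each method, then score a Random Forest on them.

    Hypothetical helper for illustration; k, seed, and the split ratio are
    assumptions, not values taken from the article.
    """
    # scikit-learn's bundled breast cancer data (30 non-negative features).
    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y
    )

    selectors = {
        # Filter method: rank features by mutual information with the label.
        "mutual_info": SelectKBest(
            score_func=partial(mutual_info_classif, random_state=seed), k=k
        ),
        # Filter method: chi-square test (requires non-negative features).
        "chi_square": SelectKBest(score_func=chi2, k=k),
        # Wrapper method: recursively drop the least important feature.
        "rfe": RFE(
            estimator=RandomForestClassifier(n_estimators=50, random_state=seed),
            n_features_to_select=k,
        ),
    }

    results = {}
    for name, selector in selectors.items():
        # Fit the selector on training data only to avoid leaking test labels.
        selector.fit(X_train, y_train)
        clf = RandomForestClassifier(n_estimators=50, random_state=seed)
        clf.fit(selector.transform(X_train), y_train)
        accuracy = clf.score(selector.transform(X_test), y_test)
        results[name] = (int(selector.get_support().sum()), accuracy)
    return results


if __name__ == "__main__":
    for method, (n_selected, accuracy) in evaluate_selectors().items():
        print(f"{method}: {n_selected} features, test accuracy {accuracy:.3f}")
```

Swapping `RandomForestClassifier` for `sklearn.svm.SVC` or `sklearn.tree.DecisionTreeClassifier` in the final scoring step would give the kind of classifier comparison the abstract describes.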


Copyright

© 2023 Usha & Anuradha. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Published By Indian Society for Education and Environment (iSee)
