P-ISSN: 0974-6846, E-ISSN: 0974-5645

Indian Journal of Science and Technology



Year: 2023, Volume: 16, Issue: 38, Pages: 3250-3257

Original Article

Chi-Square Feature Selection Technique for Student’s performance prediction

Received Date: 19 April 2023, Accepted Date: 29 August 2023, Published Date: 13 October 2023


Objectives: This study has four goals: 1) to assess students’ performance using several machine learning models; 2) to identify the attributes that influence students’ performance using feature selection; 3) to assess and compare the models using accuracy, precision, recall, F1 score, and AUC (Area Under the Curve) score as performance indicators; and 4) to compare the effectiveness of models built with feature selection against models built without it.
Methods: The study uses the student performance dataset from UCI, which consists of 650 records with 32 features. The pertinent features are selected with the Chi-square method so that the model can be constructed effectively. Classification models are then trained on the data, and their performance is compared in terms of accuracy, precision, recall, F1 score, and AUC score.
Findings: For the first objective, the predicted outcome of student performance is either passed or failed. The Decision Tree (DT), Random Forest (RF), Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and XGBoost models are evaluated in terms of accuracy, precision, recall, F1 score, and AUC score; their F1 scores are 92.16%, 95.06%, 95.19%, 93.8%, and 94.59%, respectively. For the second objective, the attributes found to influence students’ performance are Failures, Schoolsup, First Period Grade (G1), Second Period Grade (G2), and Final Grade (G3). For the third objective, the Support Vector Machine classification model outperforms the other models, with an F1 score of 95.19%. For the fourth objective, models that use feature selection perform better than models that do not.
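As an illustration of the performance indicators named above, the following minimal Python sketch computes accuracy, precision, recall, F1 score, and AUC for a two-class (pass/fail) problem. It is not the authors' code: the function names `binary_metrics` and `auc_score`, the 0/1 label encoding (1 = passed), and the toy labels and scores are assumptions made for this example.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for labels in {0, 1} (1 = passed)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def auc_score(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive example gets the higher score; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy labels and predictions, invented for illustration only.
    y_true = [1, 1, 1, 0, 0, 1]
    y_pred = [1, 1, 0, 0, 1, 1]
    print(binary_metrics(y_true, y_pred))
    print(auc_score([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))
```

The pairwise AUC loop is quadratic in the number of examples, which is acceptable for a dataset of a few hundred records; a rank-based formulation would be preferable for larger data.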
Novelty: Using machine learning to predict students’ performance can revolutionize the education sector by providing a data-driven approach to evaluating academic performance. This work proposes a new “Chi-Square Based Feature Selection” (CBFS) technique for predicting students’ performance. Because chi-square feature selection keeps only the most relevant features, it reduces the model’s complexity and improves its performance.
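To make the idea of chi-square feature selection concrete, the sketch below scores each categorical feature by Pearson's chi-square statistic against the pass/fail label and keeps the top-ranked features. This is a generic illustration under stated assumptions, not the paper's CBFS procedure: the helper names `chi_square_score` and `select_top_k`, the column `noise`, and all data values are hypothetical. The statistic here is computed on the feature-by-class contingency table, which may differ from how a particular library defines its chi-square score.

```python
from collections import Counter


def chi_square_score(feature, labels):
    """Pearson chi-square statistic between one categorical feature
    and the class labels, computed from their contingency table."""
    n = len(labels)
    observed = Counter(zip(feature, labels))   # cell counts
    feature_totals = Counter(feature)          # row totals
    label_totals = Counter(labels)             # column totals

    score = 0.0
    for f_value, f_count in feature_totals.items():
        for c_value, c_count in label_totals.items():
            expected = f_count * c_count / n   # count expected under independence
            score += (observed[(f_value, c_value)] - expected) ** 2 / expected
    return score


def select_top_k(columns, labels, k):
    """Rank features by chi-square score and return the k highest."""
    scores = {name: chi_square_score(values, labels)
              for name, values in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    labels = ["pass"] * 4 + ["fail"] * 4
    columns = {
        "failures": ["0", "0", "0", "0", "1", "1", "1", "1"],  # tracks the label
        "noise":    ["a", "b", "a", "b", "a", "b", "a", "b"],  # unrelated to it
    }
    selected, scores = select_top_k(columns, labels, k=1)
    print(selected, scores)
```

A feature whose values are distributed identically across the two classes scores zero, while a feature that separates the classes scores high, which is why ranking by this score tends to keep the attributes most associated with the outcome.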

Keywords: Machine Learning, Prediction, Dataset Problem, Early Warning System, Educational Data Mining




© 2023 Bhoria et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Published By Indian Society for Education and Environment (iSee)

