P-ISSN 0974-6846, E-ISSN 0974-5645

Indian Journal of Science and Technology



Year: 2022, Volume: 15, Issue: 6, Pages: 237-242

Original Article

Impact of Unbalanced Classification on the Performance of Software Defect Prediction Models

Received Date: 25 November 2021, Accepted Date: 23 January 2022, Published Date: 16 February 2022


Objectives: To propose a suitable imbalanced-data classification model that splits the dataset into two new datasets, and to test the resulting imbalanced dataset with the prediction models. Methods: Imbalanced defect data sets are taken from the PROMISE library and used for the performance evaluation. The results clearly demonstrate that the performance of three existing prediction classifiers, K-Nearest Neighbor (KNN), Naive Bayes (NB), and Back Propagation (BPN), is highly sensitive to class imbalance, while Support Vector Machine (SVM) and Extreme Learning Machine (ELM) are more stable. Findings: The outcome of this research reveals that applying the SVM and ELM machine learning models improves defect prediction performance, which is 29% higher than that of KNN and 19% higher than that of NB and BPN. Novelty: According to the findings of a comprehensive study, the proposed classification imbalance impact analysis method outperforms the existing ones. It transforms the original imbalanced data set into a new data set with an increasing imbalance rate, so that models can be selected to evaluate different predictions on the new data set.
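The abstract does not state how the new data sets with an increasing imbalance rate are built. As a minimal sketch only, one possible approach is to undersample the defective class until a target clean-to-defective ratio is reached. The function names `imbalance_ratio` and `make_imbalanced`, the 0/1 label encoding, and the toy data below are all assumptions for illustration and are not taken from the article.

```python
import random


def imbalance_ratio(labels):
    """Return majority count / minority count for binary labels (0 = clean, 1 = defective)."""
    defective = sum(1 for y in labels if y == 1)
    clean = len(labels) - defective
    minority, majority = sorted((defective, clean))
    if minority == 0:
        raise ValueError("both classes must be present")
    return majority / minority


def make_imbalanced(rows, labels, target_ratio, seed=0):
    """Undersample the defective class so that clean:defective is about target_ratio.

    Hypothetical helper: random undersampling is an assumed technique,
    not the procedure described in the article.
    """
    rng = random.Random(seed)
    clean_idx = [i for i, y in enumerate(labels) if y == 0]
    defect_idx = [i for i, y in enumerate(labels) if y == 1]
    keep = max(1, round(len(clean_idx) / target_ratio))
    if keep > len(defect_idx):
        raise ValueError("target_ratio is lower than the original imbalance")
    chosen = sorted(clean_idx + rng.sample(defect_idx, keep))
    return [rows[i] for i in chosen], [labels[i] for i in chosen]


# Toy data (made up): 60 clean and 40 defective modules, one metric each.
rows = [[float(i)] for i in range(100)]
labels = [0] * 60 + [1] * 40

# Build a series of data sets with an increasing imbalance rate.
for ratio in (2, 5, 10):
    _, new_labels = make_imbalanced(rows, labels, ratio)
    print(ratio, len(new_labels), imbalance_ratio(new_labels))
```

Each generated data set could then be passed to the classifiers under comparison, so that the change in their scores as the ratio grows indicates how sensitive each one is to class imbalance.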

Keywords: Software Fault Prediction Model; Imbalance Problem Classification; Artificial Intelligence; Smart Debugging; Unbalanced Classification




© 2022 Eldho. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Published By Indian Society for Education and Environment (iSee)

