P-ISSN 0974-6846 E-ISSN 0974-5645

Indian Journal of Science and Technology



Year: 2024, Volume: 17, Issue: 4, Pages: 343-351

Original Article

Identification of Most Preferable Machine Learning Technique for Prediction of Bank Loan Defaulters

Received Date: 24 November 2023, Accepted Date: 28 December 2023, Published Date: 20 January 2024


Objectives: In the current financial landscape, banks face significant challenges in managing credit risk effectively and ensuring the stability of their loan portfolios. Accurate assessment of the likelihood of loan default is a critical part of a bank's overall risk management process. The study aims to develop a predictive model suitable for accurately identifying potential defaulters.

Methods: The investigation employs a diverse range of machine learning techniques, namely Random Forest, Logistic Regression, Decision Tree, k-Nearest Neighbour, Support Vector Machine (SVM), XGBoost, AdaBoost, and Gradient Boosting Machines, to evaluate loan default probabilities in both balanced and imbalanced data environments. These algorithms were applied to datasets characterized by class imbalance, a frequent occurrence in financial risk assessment. The imbalance was addressed with resampling techniques to make the findings more representative and accurate.

Findings: On imbalanced datasets, the Random Forest algorithm was the most accurate, with an accuracy score of 0.91. Logistic Regression and SVM performed comparably, with accuracy scores of 0.90 and 0.91 respectively. On balanced datasets, the Random Forest model achieved a perfect accuracy score of 1.00, surpassing the other models. This model consistently excelled in precision, recall, and F1-score across the different data scenarios.

Novelty: This study highlights the Random Forest classifier as an optimal tool for predicting loan defaults, marking a significant advancement over existing methodologies. The outcomes provide crucial insights for financial institutions seeking to enhance their loan risk assessments, enabling more precise and informed lending decisions.
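The resampling step and the evaluation metrics named in the Methods and Findings can be illustrated with a minimal sketch. The abstract does not say which resampling technique was used, so random oversampling of the minority class is assumed here purely for illustration. The function names `random_oversample` and `classification_metrics`, the toy 90/10 class split, and the constant baseline prediction are hypothetical and are not taken from the study.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many rows as the largest class (assumed technique)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical toy sample: 90 non-defaulters (0) and 10 defaulters (1).
toy_rows = [[i] for i in range(100)]
toy_labels = [0] * 90 + [1] * 10

# A baseline that never predicts a default still looks accurate here.
baseline = classification_metrics(toy_labels, [0] * 100)
print(baseline)  # accuracy is 0.9 while recall is 0.0

# After oversampling, both classes have 90 rows each.
balanced_rows, balanced_labels = random_oversample(toy_rows, toy_labels)
print(Counter(balanced_labels))
```

On a sample like this, accuracy and recall can diverge sharply, which is one reason to read precision, recall, and F1-score alongside accuracy. In practice, resampling is normally applied to the training split only, so that duplicated rows do not also appear in the test data.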

Keywords: Credit risk, Machine learning, Random forest, Loan defaulter, Classification




© 2024 Uphade et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Published By Indian Society for Education and Environment (iSee)

