Identification of Most Preferable Machine Learning Technique for Prediction of Bank Loan Defaulters

Digambar B Uphade; Aniket A Muley; Swapnil V Chalwadi

doi:10.17485/IJST/v17i4.2978

Indian Journal of Science and Technology

Year: 2024, Volume: 17, Issue: 4, Pages: 343-351

Original Article

Digambar B Uphade¹*, Aniket A Muley², Swapnil V Chalwadi³

¹Department of Statistics, KRT Arts, BH Commerce and AM Science College, Nashik, 422002, Maharashtra, India
²School of Mathematical Sciences, Swami Ramanand Teerth Marathwada University, Nanded, 431606, Maharashtra, India
³School of Liberal Arts, Dr. Vishwanath Karad MIT World Peace University, Pune, 411038, Maharashtra, India

*Corresponding Author
Email: [email protected]

Received Date: 24 November 2023, Accepted Date: 28 December 2023, Published Date: 20 January 2024

This work is licensed under a Creative Commons Attribution 4.0 International License.

Abstract

Objectives: In the current financial landscape, banks face significant challenges in managing credit risk and keeping their loan portfolios stable. Accurate assessment of the likelihood of loan default is therefore a critical part of a bank's overall risk management process. The study aims to develop a predictive model suitable for accurately identifying potential defaulters.

Methods: The investigation applies a range of machine learning techniques (Random Forest, Logistic Regression, Decision Tree, k-Nearest Neighbour, Support Vector Machine (SVM), XGBoost, AdaBoost, and Gradient Boosting Machines) to estimate loan default probabilities in both imbalanced and balanced data settings. Loan datasets are typically imbalanced, as is common in financial risk assessment, so the study applies resampling techniques to balance the classes and improve the representativeness and accuracy of the findings.

Findings: On the imbalanced data, Random Forest achieved the highest accuracy (0.91), matched by SVM (0.91) and followed closely by Logistic Regression (0.90). On the balanced data, the Random Forest model reached an accuracy score of 1.00, surpassing the other models. Random Forest also performed consistently well on precision, recall, and F1-score across both data scenarios.

Novelty: The study identifies the Random Forest classifier as the most suitable tool among those compared for predicting loan defaults, which the authors present as a significant advance over existing methodologies. The results give financial institutions guidance for improving loan risk assessment and support more precise and informed lending decisions.
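The following is a minimal sketch, not the authors' pipeline, of two ideas the abstract relies on: balancing classes by resampling, and reporting precision, recall, and F1-score alongside accuracy. The abstract does not name the resampling technique used, so plain random oversampling stands in here as an assumption. The function names `random_oversample` and `classification_metrics` and the 90/10 toy data are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until all classes are equal in size.

    Assumption: simple random oversampling; the study's actual technique is not stated.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for the class labelled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical toy data: 90 non-defaulters (0) and 10 defaulters (1).
labels = [0] * 90 + [1] * 10
rows = [[i] for i in range(100)]

# A trivial baseline that never predicts a default still scores high accuracy
# on imbalanced data, while its recall for defaulters is zero.
baseline = classification_metrics(labels, [0] * len(labels))
print(baseline)

# After oversampling, both classes have the same number of rows.
balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

The baseline shows why accuracy alone can mislead on imbalanced loan data and why the per-class metrics matter. In practice, resampling is commonly applied to the training split only, so that duplicated rows do not also appear in the test set.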

Keywords: Credit risk, Machine learning, Random forest, Loan defaulter, Classification

Copyright

© 2024 Uphade et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Published By Indian Society for Education and Environment (iSee)