A Credit Scoring Heterogeneous Ensemble Model Using Stacking and Voting

C J Anil Kumar; B K Raghavendra; S Raghavendra

doi:10.17485/IJST/v15i7.1715

Indian Journal of Science and Technology

Year: 2022, Volume: 15, Issue: 7, Pages: 300-308

Original Article

C J Anil Kumar¹,²*, B K Raghavendra³, S Raghavendra⁴

¹Research Scholar, Department of Computer Science and Engineering Research Centre, BGS Institute of Technology, B G Nagara, Mandya, Karnataka, India
²Associate Professor, Department of Computer Science and Engineering, ATME College of Engineering, Mysuru, Karnataka, India
³Professor & Head, Department of Computer Science and Engineering, BGS Institute of Technology, B G Nagara, Mandya, Karnataka, India
⁴Associate Professor, Department of Computer Science and Engineering, School of Engineering and Technology, CHRIST Deemed to be University, Bengaluru, Karnataka, India

*Corresponding Author
Email: [email protected]

Received Date: 15 September 2021, Accepted Date: 28 January 2022, Published Date: 23 February 2022

This work is licensed under a Creative Commons Attribution 4.0 International License.

Abstract

Background/Objectives: Recent studies have emphasized the use of ensemble models over single classifiers for credit scoring problems. The objective of this study is to build a heterogeneous ensemble classifier model with improved classification accuracy.

Methods: This study develops a heterogeneous ensemble classifier using Logistic Regression (LR), K-Nearest Neighbor (KNN), Decision Tree (DT), Random Forest (RF), Naïve Bayes (NB) and Support Vector Machine (SVM) as base classifiers, and RF, LR and SVM as meta-classifiers. The proposed model uses these six base classifiers for ensemble aggregation. A feature selection algorithm based on the random forest technique selects the best features. Stacking and voting are used to build the ensemble model.

Findings: The ensemble classifier gives better predictive performance than the single classifiers SVM, DT, RF, NB, KNN and LR, with an accuracy of 91.56% on the Australian dataset and 84.35% on the German dataset.

Novelty: The proposed model uses stacking and majority voting for ensemble classification. Stacking is first applied to the base classifiers, in two levels. In the first level, the training dataset is split into 10 folds for cross-validation; the output of each classifier is taken, and the dataset is updated with the meta-features. In the second level, three meta-classifiers (MC), namely LR, SVM and RF, are used. Majority voting is applied to the outputs of these meta-classifiers to produce the prediction.
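The two-level procedure summarized in the abstract can be illustrated with a minimal sketch. It assumes scikit-learn and uses a synthetic dataset from `make_classification` in place of the Australian and German credit data. The hyperparameters, the median importance threshold for feature selection, and the choice to train the meta-classifiers on the meta-features alone are illustrative assumptions, not details taken from the paper.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a credit dataset (assumption: binary good/bad labels).
X, y = make_classification(n_samples=400, n_features=14, n_informative=6,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          stratify=y, random_state=0)

# Random-forest-based feature selection: keep features whose importance
# is at or above the median (the threshold is an assumption).
selector = SelectFromModel(
    RandomForestClassifier(n_estimators=100, random_state=0),
    threshold="median",
).fit(X_tr, y_tr)
X_tr_s, X_te_s = selector.transform(X_tr), selector.transform(X_te)

# The six base classifiers named in the abstract, with default settings.
base_models = [
    LogisticRegression(max_iter=1000),
    KNeighborsClassifier(),
    DecisionTreeClassifier(random_state=0),
    RandomForestClassifier(n_estimators=100, random_state=0),
    GaussianNB(),
    SVC(),
]

# Level 1: 10-fold out-of-fold predictions become the training meta-features.
meta_tr = np.column_stack(
    [cross_val_predict(m, X_tr_s, y_tr, cv=10) for m in base_models]
)
# Test meta-features come from base models refit on the full training split.
meta_te = np.column_stack(
    [m.fit(X_tr_s, y_tr).predict(X_te_s) for m in base_models]
)

# Level 2: three meta-classifiers (LR, SVM, RF) trained on the meta-features.
meta_models = [
    LogisticRegression(max_iter=1000),
    SVC(),
    RandomForestClassifier(n_estimators=100, random_state=0),
]
votes = np.column_stack(
    [m.fit(meta_tr, y_tr).predict(meta_te) for m in meta_models]
)

# Majority voting: with three binary voters, at least two must agree.
y_pred = (votes.sum(axis=1) >= 2).astype(int)
accuracy = float((y_pred == y_te).mean())
print(f"ensemble accuracy on the synthetic hold-out split: {accuracy:.3f}")
```

Building the training meta-features from out-of-fold predictions keeps each meta-classifier from seeing base-model outputs on rows those base models were trained on. The accuracy printed here applies only to the synthetic data and is not comparable to the figures reported for the Australian and German datasets.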

Keywords: Credit scoring; ensemble model; SVM; DT; RF; NB; KNN; LR

Copyright

© 2022 Anil Kumar et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Published By Indian Society for Education and Environment (iSee)