A Novel Credit Scoring Prediction Model based on Feature Selection Approach and Parallel Random Forest

Ha Van Sang; Nguyen Ha Nam; Nguyen Duc Nhan

doi:10.17485/ijst/2016/v9i20/92299

Indian Journal of Science and Technology

Year: 2016, Volume: 9, Issue: 20, Pages: 1-6

Original Article

Ha Van Sang¹*, Nguyen Ha Nam², Nguyen Duc Nhan³

¹Department of Economic Information System, Academy of Finance, Hanoi, Viet Nam
²Department of Information Technology, VNU-University of Engineering and Technology, Hanoi, Viet Nam
³Department, Faculty of Telecommunications, Posts and Telecommunications Institute of Technology, Hanoi, Vietnam

*Author for correspondence
Ha Van Sang
Department of Economic Information System, Academy of Finance, Hanoi, Viet Nam

This work is licensed under a Creative Commons Attribution 4.0 International License.

Abstract

Background/Objectives: This article presents a feature selection method that improves both the accuracy and the computation speed of credit scoring models. Methods/Analysis: We propose a credit scoring model that combines a parallel Random Forest classifier with a feature selection method to evaluate the credit risk of applicants. Integrating Random Forest into the feature selection process allows the importance of each feature to be evaluated accurately, so that irrelevant and redundant features can be removed. Findings: We developed an algorithm that selects the best features using the best average and median scores and the lowest standard deviation as feature scoring rules. As a result, the number of features can be reduced to the smallest possible set, which yields a remarkable reduction in runtime. The proposed model performs feature selection and model parameter optimization at the same time, which improves its efficiency. Its performance was assessed experimentally on two public datasets, the Australian and German credit datasets. The results show that the proposed model achieves higher accuracy than other commonly used feature selection methods. In particular, our method attains an average accuracy of 76.2% with a significantly reduced running time of 72 minutes on the German credit dataset, and its highest average accuracy of 89.4% with a running time of only 50 minutes on the Australian credit dataset. Applications/Improvements: The method can be applied in credit scoring models to improve accuracy while significantly reducing runtime.

Keywords: Credit Scoring, Feature Selection, Machine Learning, Parallel Random Forest
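
The feature scoring rule named in the abstract (best average and median scores, lowest standard deviation) can be illustrated with a minimal sketch. The abstract does not state how the three statistics are combined, so the ordering used here (mean first, then median, then standard deviation as tie-breakers) is an assumption, as are the function names `rank_features` and `select_top` and the example scores, which are made up for illustration and are not results from the paper. The input is assumed to be a set of importance scores per feature, collected over repeated Random Forest runs.

```python
from statistics import mean, median, pstdev


def rank_features(scores_by_feature):
    """Rank features from repeated importance scores.

    scores_by_feature maps a feature name to the list of importance
    scores it received over repeated runs (hypothetical input format).
    Ordering rule (an assumption, not taken from the paper): higher
    mean first, then higher median, then lower standard deviation.
    """
    summary = {
        name: (mean(scores), median(scores), pstdev(scores))
        for name, scores in scores_by_feature.items()
    }
    return sorted(
        summary,
        key=lambda name: (-summary[name][0], -summary[name][1], summary[name][2]),
    )


def select_top(scores_by_feature, k):
    """Keep the k best-ranked features."""
    return rank_features(scores_by_feature)[:k]


# Illustrative, made-up importance scores from three repeated runs.
example_scores = {
    "duration": [0.30, 0.32, 0.31],
    "amount": [0.25, 0.15, 0.38],
    "age": [0.10, 0.12, 0.11],
    "telephone": [0.01, 0.02, 0.01],
}

selected = select_top(example_scores, 2)
print(selected)
```

In this sketch a feature with a high but unstable score loses only when its mean and median tie with a more stable feature; a weighted combination of the three statistics would be an equally plausible reading of the rule.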