Indian Journal of Science and Technology
Year: 2016, Volume: 9, Issue: 42, Pages: 1-8
Khor Kok-Chin* and Ng Keng-Hoong
Faculty of Computing and Informatics, Multimedia University, 63100, Cyberjaya, Selangor; [email protected], [email protected]
*Author for correspondence
Objectives: The imbalanced bank direct marketing data set used in this study poses a two-class data mining problem, in which a customer may or may not subscribe to a product from a bank. Methods/Statistical Analysis: The data set exhibits the rare class problem, in which the classification rate attained for the rare class is low. In this study, we applied cost sensitive learning to mitigate the problem and to account for the different costs incurred when misclassification occurs. Three learning algorithms, namely Naive Bayes (NB), C4.5 and Naive Bayes Tree (NBT), were used in the cost sensitive learning, and their results were empirically evaluated. Findings: The results were also compared with two previous studies that used the cost insensitive SVM and over-sampling, respectively. Although cost sensitive learning is claimed to be able to handle imbalanced data sets, we found that it was less effective overall for the bank direct marketing data set. Cost sensitive learning provides a way of “wrapping” learning algorithms that are not designed to handle imbalanced class distributions. Therefore, it may not work well for certain imbalanced data sets. Over-sampling, on the other hand, worked well for the data set. Improvements/Applications: Over-sampling helped to generalize the decision region of the rare class more clearly and subsequently improved the classification result.
Keywords: Bank Direct Marketing, Cost Sensitive Learning, Imbalanced Data Set, Rare Class Problem, Over-Sampling
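The two approaches compared in the abstract can be illustrated with a minimal sketch. It is not the implementation used in the study: the function names, the 10:1 cost ratio, and the use of plain random duplication for over-sampling are all assumptions made for illustration. The first function "wraps" any classifier that outputs class probabilities by choosing the class with the lowest expected misclassification cost; the second balances the classes by duplicating rare-class rows.

```python
import random


def min_expected_cost_class(probs, cost):
    """Return the class index with the lowest expected cost.

    probs[j]   : estimated P(true class = j | x) from any base classifier
    cost[j][k] : cost of predicting class k when the true class is j
    """
    n_classes = len(probs)
    expected = [
        sum(probs[j] * cost[j][k] for j in range(n_classes))
        for k in range(n_classes)
    ]
    return min(range(n_classes), key=expected.__getitem__)


def random_oversample(rows, labels, rare_label, seed=0):
    """Duplicate randomly chosen rare-class rows until both classes are equal in size.

    Assumption: simple random duplication; the over-sampling method of the
    earlier study may differ.
    """
    rng = random.Random(seed)
    rare = [r for r, y in zip(rows, labels) if y == rare_label]
    if not rare:
        return list(rows), list(labels)
    n_common = len(labels) - len(rare)
    extra = [rng.choice(rare) for _ in range(max(0, n_common - len(rare)))]
    return list(rows) + extra, list(labels) + [rare_label] * len(extra)


# Hypothetical cost matrix: class 0 = "no", class 1 = "yes" (rare).
# Missing a subscriber (true 1, predicted 0) is assumed 10x as costly.
COST = [[0, 1],
        [10, 0]]
UNIFORM = [[0, 1],
           [1, 0]]

probs = [0.8, 0.2]  # base classifier leans towards "no"
cost_sensitive_prediction = min_expected_cost_class(probs, COST)
cost_insensitive_prediction = min_expected_cost_class(probs, UNIFORM)

rows = [[1], [2], [3], [4], [5]]
labels = [0, 0, 0, 0, 1]
balanced_rows, balanced_labels = random_oversample(rows, labels, rare_label=1)
```

Under the assumed cost matrix, the same probability estimate is assigned to the rare class, whereas uniform costs keep it in the majority class. The wrapper changes only the decision threshold and leaves the learned model untouched, while over-sampling changes the training data the learner sees.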