• P-ISSN 0974-6846 E-ISSN 0974-5645

Indian Journal of Science and Technology

Year: 2016, Volume: 9, Issue: 30, Pages: 1-7

Original Article

High Dimensional Unbalanced Data Classification Vs SVM Feature Selection

Abstract

Background/Objectives: It is well known that the performance of classification models is prone to the class imbalance problem, which occurs when one class of data severely outnumbers the other classes. Classification models learned with Support Vector Machines (SVM) are known for exhibiting good generalization ability even in the presence of class imbalance. However, it has been shown that a high imbalance ratio hinders SVM learning performance. With this concern, this paper presents an empirical study on the viability of SVM for feature selection from moderately and highly unbalanced datasets. Methods/Statistical Analysis: The Support Vector Machine-Recursive Feature Elimination (SVM-RFE) wrapper feature selection method is analyzed, and its performance on one document analysis dataset and two biomedical unbalanced datasets is compared with that of two prominent feature selection methods, the Chi-Square (CHI) test and Information Gain (IG), using Decision Tree and Naïve Bayes classification models. Findings: Two major findings are reported from this empirical study: 1. For the considered scenarios, classification models learned on features selected by IG and the CHI test performed better than those learned on features selected by SVM-RFE in the high class imbalance setting. 2. SVM-RFE on rebalanced data yielded better performance than SVM-RFE on the original data. Application/Improvements: The feature selection methods considered, including SVM-RFE, yielded better performance on oversampled data than SVM-RFE on the original data. Overall, this study reports that models learned with Decision Tree exhibited better performance than models learned with the Naïve Bayes classifier.
Keywords: Class Imbalance Problem, Chi-Square, Information Gain, Support Vector Machine, SVM-RFE
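
The comparison described in the abstract can be sketched in Python with scikit-learn. This is only an illustrative sketch, not the authors' experimental setup. It makes several assumptions: the data is synthetic (the paper's document analysis and biomedical datasets are not reproduced here), Information Gain is approximated by scikit-learn's mutual information estimator, rebalancing is done by simple random oversampling of the minority class, and the number of retained features, the train/test split, and the F1 metric are arbitrary choices. The names SEED, K, random_oversample, make_selectors, and evaluate are hypothetical.

```python
from functools import partial

import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectKBest, chi2, mutual_info_classif
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils import resample

SEED = 0  # assumption: fixed seed for repeatability
K = 10    # assumption: number of features each selector keeps


def random_oversample(X, y, seed=SEED):
    """Duplicate random minority-class rows until both classes have equal size."""
    counts = np.bincount(y)
    minority = int(np.argmin(counts))
    X_min, y_min = X[y == minority], y[y == minority]
    X_extra, y_extra = resample(
        X_min, y_min, replace=True,
        n_samples=int(counts.max() - counts.min()), random_state=seed,
    )
    return np.vstack([X, X_extra]), np.concatenate([y, y_extra])


def make_selectors(k=K, seed=SEED):
    """The three feature selection methods named in the abstract."""
    return {
        # SVM-RFE: repeatedly drop the features with the smallest linear-SVM weights.
        "SVM-RFE": RFE(SVC(kernel="linear"), n_features_to_select=k, step=0.1),
        # Chi-square test (requires non-negative feature values).
        "CHI": SelectKBest(chi2, k=k),
        # Stand-in for Information Gain: estimated mutual information with the class.
        "IG": SelectKBest(partial(mutual_info_classif, random_state=seed), k=k),
    }


def evaluate(X_train, y_train, X_test, y_test):
    """Minority-class F1 on the test set for every selector/classifier pair."""
    results = {}
    for sel_name, selector in make_selectors().items():
        selector.fit(X_train, y_train)
        Xtr, Xte = selector.transform(X_train), selector.transform(X_test)
        classifiers = {
            "DecisionTree": DecisionTreeClassifier(random_state=SEED),
            "NaiveBayes": GaussianNB(),
        }
        for clf_name, clf in classifiers.items():
            clf.fit(Xtr, y_train)
            score = f1_score(y_test, clf.predict(Xte), pos_label=1, zero_division=0)
            results[(sel_name, clf_name)] = score
    return results


# Synthetic unbalanced data: roughly 95% majority class, 5% minority class.
X, y = make_classification(
    n_samples=600, n_features=50, n_informative=8,
    weights=[0.95, 0.05], random_state=SEED,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=SEED
)

# Scale to [0, 1] using training statistics so that chi2 sees non-negative values.
scaler = MinMaxScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Only the training split is oversampled; the test split keeps its original ratio.
X_bal, y_bal = random_oversample(X_train, y_train)

original = evaluate(X_train, y_train, X_test, y_test)
rebalanced = evaluate(X_bal, y_bal, X_test, y_test)

for key in original:
    print(key, f"original={original[key]:.3f}", f"oversampled={rebalanced[key]:.3f}")
```

Oversampling is applied to the training split only, so both settings are scored on the same untouched test data. Scores on synthetic data need not follow the pattern reported in the abstract; the sketch only shows how the pieces fit together.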
