Indian Journal of Science and Technology
Year: 2016, Volume: 9, Issue: 48, Pages: 1-7
Lincy Mathews1* and Hari Seetha2
1 School of Information Technology and Engineering, VIT University, Vellore - 632014, Tamil Nadu, India; [email protected]
2 SCOPE, Vellore Institute of Technology University, Vellore - 632014, Tamil Nadu, India; [email protected]
*Author for correspondence
School of Information Technology and Engineering
Email: [email protected]
To address the persistent problem of imbalanced data in text classification, this paper applies a Genetic Algorithm to the imbalance problem in binary-class text data. An inherent characteristic of imbalanced data is the highly uneven distribution of samples among the classes. Consequently, traditional classifiers such as Nearest Neighbor show decreased performance because the class of interest is underrepresented. A hybrid approach combines oversampling with the advantages of a Genetic Algorithm to generate artificial patterns for the minority class. The fitness function is based on avoiding overfitting and serves as the stopping criterion for generating synthetic samples. Appropriate evaluation measures are used to analyze the performance improvement of the proposed hybrid learning model.
Keywords: Genetic Algorithm, Imbalance Data, Nearest Neighbor, Oversampling, Synthetic Data, Text Data
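The abstract does not give the operators or the exact fitness function, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes documents are already encoded as numeric feature vectors, uses uniform crossover with Gaussian mutation to breed synthetic minority samples, and takes the balanced accuracy of a 1-Nearest-Neighbor classifier on held-out data as a stand-in fitness. Generation stops once the classes are balanced or once several consecutive batches lower the held-out fitness, which is treated here as a sign of overfitting. All function names and parameters (`ga_oversample`, `batch`, `patience`, `mutation_scale`) are hypothetical.

```python
import random


def nearest_label(x, points, labels):
    """Label of the training point closest to x (1-NN, squared Euclidean)."""
    best = min(range(len(points)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(x, points[i])))
    return labels[best]


def balanced_accuracy(train_x, train_y, val_x, val_y):
    """Mean per-class recall of a 1-NN classifier on held-out data."""
    recalls = []
    for cls in sorted(set(val_y)):
        idx = [i for i, y in enumerate(val_y) if y == cls]
        hits = sum(nearest_label(val_x[i], train_x, train_y) == cls for i in idx)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


def make_child(parents, rng, mutation_scale=0.05):
    """Uniform crossover of two minority parents, then Gaussian mutation."""
    p1, p2 = rng.sample(parents, 2)  # needs at least two minority samples
    return [(a if rng.random() < 0.5 else b) + rng.gauss(0.0, mutation_scale)
            for a, b in zip(p1, p2)]


def ga_oversample(train_x, train_y, val_x, val_y, minority=1,
                  batch=5, patience=3, seed=0):
    """Return (synthetic minority samples, held-out fitness after oversampling)."""
    rng = random.Random(seed)
    minority_x = [x for x, y in zip(train_x, train_y) if y == minority]
    n_majority = len(train_y) - len(minority_x)
    synthetic = []
    best_fit = balanced_accuracy(train_x, train_y, val_x, val_y)
    stale = 0  # consecutive batches that lowered held-out fitness
    while stale < patience:
        missing = n_majority - len(minority_x) - len(synthetic)
        if missing <= 0:
            break  # classes are balanced
        children = [make_child(minority_x, rng) for _ in range(min(batch, missing))]
        trial = synthetic + children
        fit = balanced_accuracy(train_x + trial,
                                train_y + [minority] * len(trial),
                                val_x, val_y)
        if fit >= best_fit:
            # Assumed acceptance rule: keep a batch only if fitness does not drop.
            synthetic, best_fit, stale = trial, fit, 0
        else:
            stale += 1
    return synthetic, best_fit
```

The returned synthetic samples would be appended to the training set with the minority label before fitting the final classifier. Using a separate held-out set for the fitness is a design choice of this sketch; the paper may define its overfitting criterion differently.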