• P-ISSN 0974-6846 E-ISSN 0974-5645

Indian Journal of Science and Technology

Year: 2016, Volume: 9, Issue: 48, Pages: 1-7

Original Article

Using Genetic Approach for Learning from Imbalanced Text Corpora

Abstract

To address the persistent problem of imbalanced data in text classification, this paper applies a Genetic Algorithm to the imbalance problem in binary-class text data. An inherent characteristic of imbalanced data is the highly uneven distribution of samples among the classes. Consequently, traditional classifiers such as Nearest Neighbor perform worse because the class of interest is underrepresented. A hybrid approach combines oversampling with the advantages of the Genetic Algorithm to generate artificial patterns for the minority class. The approach uses avoidance of overfitting as the fitness function, which decides the stopping criterion for generating synthetic samples. Appropriate evaluation measures are used to analyze the performance gain of the proposed hybrid learning model.
Keywords: Genetic Algorithm, Imbalance Data, Nearest Neighbor, Oversampling, Synthetic Data, Text Data
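
The abstract does not give the details of the algorithm, so the following is only a minimal sketch of how genetic operators might be combined with oversampling, not the authors' implementation. It assumes documents are already encoded as non-negative feature vectors (for example, term weights). The function names, the mutation parameters, and the acceptance rule are hypothetical: a synthetic sample is kept only if its nearest original neighbor belongs to the minority class, which stands in for the paper's overfitting-based fitness function, and generation stops once the two classes are balanced.

```python
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def crossover(p1, p2, rng):
    """Uniform crossover: each feature is copied from one of the two parents."""
    return [x if rng.random() < 0.5 else y for x, y in zip(p1, p2)]


def mutate(vec, rng, rate=0.1, scale=0.05):
    """Perturb each feature with probability `rate`; clamp at 0 because
    term weights are assumed to be non-negative."""
    return [
        max(0.0, v + rng.gauss(0.0, scale)) if rng.random() < rate else v
        for v in vec
    ]


def nearest_is_minority(child, minority, majority):
    """Stand-in acceptance test (an assumption, not the paper's fitness):
    the child's nearest original sample must be a minority sample."""
    d_min = min(euclidean(child, m) for m in minority)
    d_maj = min(euclidean(child, m) for m in majority)
    return d_min <= d_maj


def ga_oversample(minority, majority, seed=0, max_tries=10000):
    """Generate synthetic minority vectors until the classes are balanced
    or `max_tries` candidate children have been evaluated.
    Requires at least two minority samples to choose parents from."""
    rng = random.Random(seed)
    synthetic = []
    tries = 0
    while len(minority) + len(synthetic) < len(majority) and tries < max_tries:
        tries += 1
        p1, p2 = rng.sample(minority, 2)          # pick two distinct parents
        child = mutate(crossover(p1, p2, rng), rng)
        if nearest_is_minority(child, minority, majority):
            synthetic.append(child)
    return synthetic
```

Calling `ga_oversample(minority, majority)` returns only the new synthetic vectors; appending them to the minority set yields a balanced training set for a Nearest Neighbor classifier. A stricter acceptance rule, such as checking held-out accuracy after each batch of synthetic samples, would be closer in spirit to an overfitting-based stopping criterion, but the abstract gives no detail on that step.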
