Using Genetic Approach for Learning from Imbalanced Text Corpora

Lincy Mathews and Hari Seetha

doi:10.17485/ijst/2016/v9i48/108360

Indian Journal of Science and Technology

Year: 2016, Volume: 9, Issue: 48, Pages: 1-7

Original Article

Lincy Mathews¹* and Hari Seetha²

¹ School of Information Technology and Engineering, VIT University, Vellore - 632014, Tamil Nadu, India; [email protected]
² SCOPE, Vellore Institute of Technology University, Vellore - 632014, Tamil Nadu, India; [email protected]
*Author for correspondence
Lincy Mathews
School of Information Technology and Engineering
Email: [email protected]

This work is licensed under a Creative Commons Attribution 4.0 International License.

Abstract

To address the persistent problem of imbalanced data in text classification, this paper applies a Genetic Algorithm to the imbalance problem in binary-class text data. An inherent characteristic of imbalanced data is the highly uneven distribution of samples among the classes. Consequently, traditional classifiers such as Nearest Neighbor show decreased performance because the class of interest is under-represented. A hybrid approach combines oversampling with the advantages of the Genetic Algorithm to generate artificial patterns for the minority class. The approach uses avoidance of overfitting as the fitness function, which decides the stopping criterion for generating synthetic samples. Evaluation measures are used to analyze the increase in performance of the proposed hybrid learning model.
Keywords: Genetic Algorithm, Imbalance Data, Nearest Neighbor, Oversampling, Synthetic Data, Text Data
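
The oversampling idea summarized in the abstract can be illustrated with a minimal sketch. It assumes documents are already represented as non-negative feature vectors (for example, term weights) and produces synthetic minority samples with two generic genetic operators, uniform crossover and Gaussian mutation. The function names, the mutation parameters, and the stopping rule (stop once the classes are balanced) are all assumptions made for illustration; the abstract states that the paper's stopping criterion comes from a fitness function based on avoiding overfitting, and that function is not reproduced here.

```python
import random


def crossover(parent_a, parent_b, rng):
    """Uniform crossover: each feature is copied from one of the two parents."""
    return [a if rng.random() < 0.5 else b for a, b in zip(parent_a, parent_b)]


def mutate(vector, rng, rate=0.1, scale=0.05):
    """Add small Gaussian noise to a fraction of features, clipping at zero."""
    return [
        max(0.0, v + rng.gauss(0.0, scale)) if rng.random() < rate else v
        for v in vector
    ]


def ga_oversample(minority, n_majority, seed=0):
    """Generate synthetic minority vectors until the classes are balanced.

    Assumption: class balance is a placeholder stopping rule. The paper
    instead stops based on an overfitting-avoidance fitness function.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples for crossover")
    rng = random.Random(seed)
    synthetic = []
    while len(minority) + len(synthetic) < n_majority:
        parent_a, parent_b = rng.sample(minority, 2)  # pick two distinct parents
        synthetic.append(mutate(crossover(parent_a, parent_b, rng), rng))
    return synthetic


# Hypothetical toy data: three minority documents with three term weights each.
minority_docs = [[0.1, 0.0, 0.3], [0.2, 0.1, 0.0], [0.0, 0.4, 0.1]]
synthetic_docs = ga_oversample(minority_docs, n_majority=10)
```

With the toy data above, the call returns enough synthetic vectors to bring the minority class up to the majority size. A closer approximation of the described method would replace the balance check with a fitness evaluation, for instance by monitoring a Nearest Neighbor classifier on held-out data, but the details of that evaluation are not given in the abstract.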