Empirical Study of Feature Selection Methods for High Dimensional Data

S. DeepaLakshmi and T. Velmurugan

doi:10.17485/ijst/2016/v9i39/90599

Indian Journal of Science and Technology

Year: 2016, Volume: 9, Issue: 39, Pages: 1-6

Original Article

S. DeepaLakshmi¹* and T. Velmurugan²

¹ Bharathiar University, Coimbatore - 641046, Tamil Nadu, India; [email protected]
² PG and Research Department of Computer Science, D. G. Vaishnav College, Chennai - 600106, Tamil Nadu, India; [email protected]
*Author for correspondence: S. DeepaLakshmi, Bharathiar University

This work is licensed under a Creative Commons Attribution 4.0 International License.

Abstract

Background/Objectives: Feature Selection is the process of selecting the relevant features used in model construction by removing redundant, irrelevant and noisy data. A typical application of Text Mining is the classification of messages and e-mails into spam and ham. Methods/Statistical Analysis: This article gives a comprehensive overview of the various Feature Selection methods for Text Mining. Filter methods such as Pearson Correlation, Chi-square, Symmetrical Uncertainty and Mutual Information are applied to select the optimal set of features. Findings: Filter Feature Selection methods are used to classify text data. Various classification algorithms are applied to the optimal set of features obtained, and the accuracy of each classification algorithm is verified on the chosen data set. Novelty/Improvements: A comparative study of various filter methods for Feature Selection, with classification algorithms used for performance evaluation, is conducted in this research work.
Keywords: Chi-Square, Feature Selection, Filter Method, Mutual Information, Pearson Correlation
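
As an illustration of the filter approach named in the abstract, the sketch below scores each term of a tiny spam/ham corpus with the Chi-square statistic of its 2x2 term/class contingency table and keeps the highest-scoring terms. It is a minimal sketch under stated assumptions, not the authors' implementation: the corpus, the function names `chi_square_scores` and `select_top_k`, and the choice of binary term presence are all hypothetical and do not come from the article.

```python
def chi_square_scores(docs, labels):
    """Return {term: chi-square score} for a two-class corpus.

    docs   -- list of sets of terms (binary term presence; an assumption)
    labels -- list of class labels, one per document, exactly two classes
    """
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")
    first = classes[0]
    n = len(docs)
    n_first = sum(1 for y in labels if y == first)
    vocabulary = set(term for doc in docs for term in doc)

    scores = {}
    for term in vocabulary:
        # 2x2 contingency table of term presence against class membership.
        a = sum(1 for doc, y in zip(docs, labels) if term in doc and y == first)
        b = sum(1 for doc, y in zip(docs, labels) if term in doc and y != first)
        c = n_first - a            # term absent, first class
        d = (n - n_first) - b      # term absent, second class
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        # A zero denominator means the term carries no usable signal here.
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores, k):
    """Keep the k best-scoring terms; ties are broken alphabetically."""
    ranked = sorted(scores, key=lambda term: (-scores[term], term))
    return ranked[:k]


# Hypothetical toy corpus, invented for illustration only.
docs = [
    {"free", "prize", "now"},
    {"free", "offer"},
    {"meeting", "now"},
    {"lunch", "meeting"},
]
labels = ["spam", "spam", "ham", "ham"]

scores = chi_square_scores(docs, labels)
selected = select_top_k(scores, 2)
print(selected)
```

In this toy setting the terms that occur in only one class receive the highest scores, while a term spread evenly across both classes scores zero. The selected terms would then be passed to a classifier, in line with the evaluation described in the abstract.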