Indian Journal of Science and Technology
Year: 2016, Volume: 9, Issue: 45, Pages: 1-5
Bhuvaneswari Ragothaman and B. Sarojini
Department of Computer Science, Avinashilingam Institute for Home Science and Higher Education for Women - University, Coimbatore - 641043, Tamil Nadu, India
Objective: Medical databases hold large amounts of heterogeneous data that, when mined, may provide valuable information for medical diagnosis. Processing this voluminous data is tedious owing to its high dimensionality, and irrelevant or redundant data may reduce the performance of data mining algorithms. Feature selection, a data preprocessing technique, removes irrelevant and redundant features and retains the discriminative ones. This research work proposes a feature selection algorithm that meets two objectives: (1) reduce the number of features and (2) maximize the classification accuracy. Methods/Statistical Analysis: The proposed algorithm, the Multi-Objective Non-Dominated Sorted Artificial Bee Colony Algorithm (NSABC), combines Pareto optimization with the Artificial Bee Colony (ABC) algorithm to select non-dominated optimal feature subsets from three medical datasets: Wisconsin Breast Cancer, Pima Indian Diabetes and Statlog Heart Disease. The features selected by the ABC algorithm are further optimized by applying Pareto optimization. The selected feature subsets are validated by evaluating the performance of a KNN classifier in terms of Precision, Recall and Accuracy before and after feature selection. Findings: The percentage of feature reduction and the improved performance of the classifier indicate that the proposed method selects the discriminative features of the datasets. The selected feature subsets are further validated by calculating their entropy, a measure signifying the individuality and independence of the features in a subset. Application/Improvement: The reduced feature subset can simplify diagnosis of the disease and helps reduce the cost and time of medical diagnosis.
Keywords: Data Mining, Entropy, Feature Selection, KNN Classifier, Non-Dominated Sorted Artificial Bee Colony Algorithm (NSABC), Pareto Optimization
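The two objectives named in the abstract (fewer features, higher classification accuracy) can be illustrated with a minimal Pareto non-dominance filter. This is only a sketch under stated assumptions, not the authors' NSABC implementation: the ABC search step is omitted, the candidate subsets and accuracy values are hypothetical, and the names `dominates` and `non_dominated` are introduced here purely for illustration.

```python
from typing import List, Tuple

# A candidate is (feature indices, classifier accuracy).
# The values used below are hypothetical, not results from the paper.
Candidate = Tuple[Tuple[int, ...], float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """Return True if a is no worse than b on both objectives
    (fewer features, higher accuracy) and strictly better on one."""
    size_a, acc_a = len(a[0]), a[1]
    size_b, acc_b = len(b[0]), b[1]
    no_worse = size_a <= size_b and acc_a >= acc_b
    strictly_better = size_a < size_b or acc_a > acc_b
    return no_worse and strictly_better


def non_dominated(cands: List[Candidate]) -> List[Candidate]:
    """Keep the candidates that no other candidate dominates."""
    return [
        c for c in cands
        if not any(dominates(other, c) for other in cands if other is not c)
    ]


candidates: List[Candidate] = [
    ((0, 1, 2, 3), 0.95),
    ((0, 2), 0.93),
    ((1,), 0.80),
    ((0, 1, 3), 0.93),  # dominated by (0, 2): more features, same accuracy
    ((2, 3), 0.90),     # dominated by (0, 2): same size, lower accuracy
]

front = non_dominated(candidates)
for features, accuracy in front:
    print(features, accuracy)
```

In this toy example the surviving subsets trade size against accuracy, so none is uniformly better than another; in a full pipeline the accuracy values would come from a classifier such as KNN evaluated on each candidate subset.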