A Study on Impact of Dimensionality Reduction on Naïve Bayes Classifier

Priya Mohan   and Ilango Paramasivam

doi:10.17485/ijst/2017/v10i20/101599

Indian Journal of Science and Technology

Year: 2017, Volume: 10, Issue: 20, Pages: 1-4

Original Article

Priya Mohan¹* and Ilango Paramasivam²

¹Department of Computer Science, Bharathiar University, Coimbatore – 641046, Tamilnadu, India; [email protected]
²School of Computing Science & Engineering, VIT University, Vellore – 632014, Tamilnadu, India; [email protected]

*Author for correspondence:
Priya Mohan
Department of Computer Science, Bharathiar University, Coimbatore – 641046, Tamilnadu, India; [email protected]

This work is licensed under a Creative Commons Attribution 4.0 International License.

Abstract

Objectives: The time complexity of a machine learning algorithm is directly proportional to the dimensionality of the dataset. This paper studies the impact of dataset dimensionality on one such algorithm: the Naïve Bayes classifier is evaluated on all feature subsets to analyze whether its performance varies.
Methods/Statistical Analysis: The Naïve Bayes classifier is studied in terms of the number of correctly and incorrectly classified instances. The Pima Indian Type II diabetes dataset is used for the experimental study. For each run, a confusion matrix is constructed for the Naïve Bayes classifier using 10-fold cross-validation. The study shows the impact of dimensionality on the performance of the Naïve Bayes classifier.
Findings: The Naïve Bayes classifier classifies each patient record as either diabetic or non-diabetic using the values of the feature set. It is a probabilistic approach to assigning patient records to one of two classes. The dimensionality of the feature set is found to affect the performance of the Naïve Bayes classifier in terms of classification accuracy and the numbers of true positives, true negatives, false positives and false negatives. Incorrect classification is dangerous, whereas valid classification helps healthcare systems plan an effective course of treatment that can save the patient's life. An invalid classification leads to a wrong diagnosis when the treatment plan is formulated and can lead to loss of life. Hence, invalid classification in terms of the false negative rate must be viewed very seriously. The study shows that the higher dimensionality of the dataset affects the performance of the Naïve Bayes classifier.
Application/Improvements: The findings can be used in medical informatics for quality diagnosis and effective treatment planning. Focusing on the false positive rate in the classification accuracy of the Naïve Bayes classifier will notably help healthcare systems diagnose patients accurately and save lives.

Keywords: Classification Accuracy, Dimensionality Reduction, Machine Learning, Naïve Bayes Classifier
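
The evaluation procedure outlined in the abstract (a Naïve Bayes classifier scored on every feature subset, with a confusion matrix from 10-fold cross-validation per run) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation. The abstract does not state which Naïve Bayes variant or toolkit was used, so a Gaussian Naïve Bayes written with the Python standard library is assumed here. The folds are not stratified, and a small synthetic dataset from the hypothetical helper `make_synthetic` stands in for the Pima Indian diabetes data. All function names are assumptions introduced for illustration.

```python
import itertools
import math
import random


def make_synthetic(n=200, n_features=4, seed=1):
    """Hypothetical stand-in data (NOT the Pima dataset): only the first
    two features carry class information, the rest are pure noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        X.append([rng.gauss(2.0 * label if j < 2 else 0.0, 1.0)
                  for j in range(n_features)])
        y.append(label)
    return X, y


def fit_gaussian_nb(X, y):
    """Estimate a prior and per-feature (mean, variance) for each class."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        stats = []
        for col in zip(*rows):
            mean = sum(col) / len(col)
            # Small constant keeps the variance strictly positive.
            var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-9
            stats.append((mean, var))
        model[label] = (len(rows) / len(X), stats)
    return model


def predict(model, x):
    """Return the class with the highest log posterior (up to a constant)."""
    best_label, best_score = None, -math.inf
    for label, (prior, stats) in model.items():
        score = math.log(prior)
        for v, (mean, var) in zip(x, stats):
            score += -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def cross_validated_confusion(X, y, k=10, seed=0):
    """Pool (TP, TN, FP, FN) over k unstratified folds; class 1 is positive."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    tp = tn = fp = fn = 0
    for f in range(k):
        test = set(idx[f::k])
        train = [i for i in idx if i not in test]
        model = fit_gaussian_nb([X[i] for i in train], [y[i] for i in train])
        for i in test:
            pred, actual = predict(model, X[i]), y[i]
            if pred == 1 and actual == 1:
                tp += 1
            elif pred == 0 and actual == 0:
                tn += 1
            elif pred == 1 and actual == 0:
                fp += 1
            else:
                fn += 1
    return tp, tn, fp, fn


def evaluate_all_subsets(X, y, k=10):
    """Map every non-empty feature subset to its pooled confusion counts."""
    n_features = len(X[0])
    results = {}
    for size in range(1, n_features + 1):
        for subset in itertools.combinations(range(n_features), size):
            X_sub = [[row[j] for j in subset] for row in X]
            results[subset] = cross_validated_confusion(X_sub, y, k)
    return results


if __name__ == "__main__":
    X, y = make_synthetic()
    for subset, (tp, tn, fp, fn) in sorted(evaluate_all_subsets(X, y).items()):
        accuracy = (tp + tn) / (tp + tn + fp + fn)
        print(subset, f"acc={accuracy:.3f}", f"TP={tp} TN={tn} FP={fp} FN={fn}")
```

Enumerating every subset grows as 2^d − 1 for d features, which stays manageable for a dataset with only a handful of attributes but becomes impractical quickly as d increases. Reporting the false negative count separately for each subset, rather than accuracy alone, matches the abstract's emphasis on missed diabetes cases.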