Indian Journal of Science and Technology
Year: 2016, Volume: 9, Issue: 35, Pages: 1-11
Nasib Singh Gill* and Pooja Mittal
Department of Computer Science and Applications, Maharshi Dayanand University, Rohtak - 124001, Haryana, India; [email protected]
*Author for correspondence
Objectives: Data mining approaches are used to develop decision-making systems. This study proposes a novel hybrid model for diabetes prediction using data mining techniques. The main objective is to improve the accuracy rate by significantly reducing the size of the data under analysis at every stage.
Methods/Statistical Analysis: To achieve these objectives, the PIMA Female Diabetic dataset, taken from the UCI repository, is used. 10-fold cross-validation is used to obtain the training and testing samples. Three rank-based selection techniques are used for attribute selection. The associations between attributes are identified, and clustering is then performed under criticality using a Hidden Markov Model (HMM) and a Fuzzy improved Neural Network.
Findings: The data size reduces significantly when appropriate selection methods are applied in the respective sequence. For categorical data, the gain ratio attribute selection method performs best. Clustering is more effective when performed after identifying the exact associations among attributes. The proposed hybrid model achieved an overall accuracy of 92%. The blend of supervised and unsupervised techniques achieved better results than the same techniques applied individually to the same data, as shown by the comparative analysis. Earlier prediction models relied on either classification or clustering alone, whereas the present study performs both. Fuzzy improved Neural Networks are used to predict diabetes from the data. The result analysis showed that prediction accuracy is lower when the classifiers are applied separately (Naïve Bayes: 76.30%, Neural Networks: 75.13%, Support Vector Machine: 77.47%, K-Nearest Neighbor: 69.79%, Decision Tree (J48): 74.21%) and improves when they are combined.
Application/Improvements: The proposed hybrid model can be used as an expert system application, under the guidance of a diabetes expert, to assist physicians in making decisions about early diagnosis of the disease. In future, the proposed model can be applied to a gender-independent dataset. Further, the accuracy rate of the model can be improved by replacing the missing values in the dataset with the most appropriate values.
Keywords: Associative Clustering, Diabetes, Fuzzy Improved NN, Hidden Markov Model, Information Gain
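As an illustration of the rank-based attribute selection mentioned in the abstract, the following is a minimal sketch of gain ratio ranking for categorical attributes. It is not the authors' implementation. The toy records, the attribute names (`glucose_level`, `age_group`) and the function names are hypothetical, and numeric attributes are assumed to have been discretized beforehand.

```python
import math
from collections import Counter, defaultdict


def entropy(items):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(items)
    counts = Counter(items)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def gain_ratio(values, labels):
    """Gain ratio of one categorical attribute with respect to the class labels."""
    total = len(labels)
    # Group the class labels by the attribute value they occur with.
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[value].append(label)
    # Expected class entropy after splitting on the attribute.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    # Split information penalizes attributes with many distinct values.
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0


def rank_attributes(rows, labels, names):
    """Return (attribute name, gain ratio) pairs, highest score first."""
    scores = {
        name: gain_ratio([row[i] for row in rows], labels)
        for i, name in enumerate(names)
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data, not taken from the PIMA dataset.
names = ["glucose_level", "age_group"]
rows = [("high", "young"), ("high", "old"), ("low", "young"), ("low", "old")]
labels = ["positive", "positive", "negative", "negative"]

ranked = rank_attributes(rows, labels, names)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

In this toy example the hypothetical `glucose_level` attribute separates the two classes completely and so ranks first, while `age_group` carries no class information. A ranking of this kind can be used to drop low-scoring attributes before later stages, which is one way the data under analysis could be reduced.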