Indian Journal of Science and Technology
DOI: 10.17485/IJST/v15i39.1520
Year: 2022, Volume: 15, Issue: 39, Pages: 1978-1986
Original Article
Mohammad Atif1*, Faisal Anwer1, Faisal Talib2
1Department of Computer Science, Aligarh Muslim University, Aligarh, U.P, India
2Department of Mechanical Engineering, Aligarh Muslim University, Aligarh, U.P, India
*Corresponding Author
Received Date: 23 July 2022, Accepted Date: 16 September 2022, Published Date: 15 October 2022
Objectives: Diabetes is a chronic disease characterized by high blood glucose levels. It affects people all over the world and is one of the leading causes of death worldwide. Machine learning techniques can aid the early identification and prediction of diabetes. This study uses an ensemble of machine learning algorithms to predict diabetes efficiently, with the aim of helping patients who suffer from the disease.
Methods: Existing methods use a single model to predict diabetes, which may limit accuracy because no single model fits all datasets. We therefore propose a robust model based on ensemble learning with a hard voting classifier. The model was evaluated on two datasets that contain records of people with and without diabetes: the Pima Indians Diabetes dataset and the Early Stage Diabetes Risk Prediction Dataset. For classification, the proposed hard voting ensemble combines three machine learning algorithms: logistic regression, decision tree, and support vector machine.
Findings: On the PIMA diabetes dataset, the proposed ensemble approach achieves the highest accuracy, precision, recall, and F1 score, each 81.17%. On the Early Stage Diabetes Risk Prediction Dataset, it achieves the highest accuracy, precision, recall, and F1 score, each 94.23%.
Novelty: The proposed methodology was experimentally evaluated against state-of-the-art techniques and basic classifiers such as K-Nearest Neighbor, Logistic Regression, Support Vector Machine, and Random Forest. The results are validated by computing the confusion matrix and the ROC curve for each classifier.
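The hard voting step described in the Methods can be illustrated with a minimal sketch: each base classifier casts one vote for a class label, and the ensemble returns the label with the most votes. The sketch below is not the authors' implementation. The three rule functions are hypothetical stand-ins for the trained logistic regression, decision tree, and support vector machine models; the feature order (glucose, BMI, age) and all thresholds are invented for illustration; and the tie-breaking rule (first label to reach the top count wins) is an assumption.

```python
from collections import Counter
from typing import Callable, Sequence

# A classifier here is any function mapping a feature vector to a label
# (1 = diabetic, 0 = non-diabetic). This type alias is illustrative only.
Classifier = Callable[[Sequence[float]], int]


def hard_vote(predictions: Sequence[int]) -> int:
    """Return the label predicted by the largest number of classifiers.

    Ties are broken in favour of the tied label that appears first in
    `predictions` (an assumption; the paper does not state a tie rule).
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    counts = Counter(predictions)
    top = max(counts.values())
    for label in predictions:
        if counts[label] == top:
            return label
    raise AssertionError("unreachable: some label always has the top count")


def ensemble_predict(classifiers: Sequence[Classifier],
                     sample: Sequence[float]) -> int:
    """Collect one vote per classifier and combine them by hard voting."""
    return hard_vote([clf(sample) for clf in classifiers])


# Hypothetical stand-ins for three trained base models. The thresholds
# are made up and carry no clinical meaning.
def rule_glucose(sample: Sequence[float]) -> int:
    return 1 if sample[0] >= 140 else 0


def rule_bmi(sample: Sequence[float]) -> int:
    return 1 if sample[1] >= 30 else 0


def rule_age(sample: Sequence[float]) -> int:
    return 1 if sample[2] >= 50 else 0


BASE_MODELS = [rule_glucose, rule_bmi, rule_age]

if __name__ == "__main__":
    # Invented sample in the assumed order (glucose, BMI, age).
    print(ensemble_predict(BASE_MODELS, (150.0, 32.0, 35.0)))
```

With three base models and two classes, a tie cannot occur, so the tie rule only matters for ensembles with an even number of voters or more than two classes. In practice, libraries such as scikit-learn offer this scheme directly (for example, `VotingClassifier` with `voting="hard"`); whether the authors used it is not stated in the abstract.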
Keywords: Diabetes Detection; Machine Learning; Supervised Classification; Ensemble Classification; Hard Voting Classifier
© 2022 Atif et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Published By Indian Society for Education and Environment (iSee)