P-ISSN 0974-6846 E-ISSN 0974-5645

Indian Journal of Science and Technology


Year: 2022, Volume: 15, Issue: 38, Pages: 1932-1940

Original Article

A Dataset-Specific Machine Learning Study for Predicting Diabetes (Type-2) in a Developing Country Context

Received Date: 04 June 2022, Accepted Date: 17 August 2022, Published Date: 14 October 2022


Objectives: As diabetes becomes more prevalent across the globe, understanding its sources and causes is more important than ever. This study uses machine learning techniques to detect diabetic patients efficiently from a large set of features.

Methods: This paper conducts a dataset-specific machine learning study for predicting diabetes in Bangladesh. Classification is performed with 18 features covering demographic characteristics, family history, dietary habits, clinical features, physical activity, and quality of life. Five different classifiers are used.

Findings: Across the five classifiers, the results suggest that Logistic Regression performed best in predicting diabetes for this dataset. The accuracy of the logistic regression classifier exceeds 83.8%.

Novelty: Unlike other studies, the authors combine eating habits with demographic and health features to enhance the performance of the classifiers. The results suggest that while adding features related to eating habits and lifestyle can increase prediction accuracy, including more clinical features matters more for accuracy. The authors believe that this finding is significant for developing countries such as Bangladesh, given the limited health resources available and the rapid changes in eating habits and lifestyle.

Keywords: Machine learning; chronic disease; classification; logistic regression; diabetes
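The classification setup described under Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, the four feature names (`age`, `bmi`, `family_history`, `sugary_diet`) are hypothetical stand-ins for the 18 features of the study, and only logistic regression is shown because the abstract does not name the other four classifiers. The model is trained by plain gradient descent and scored on a held-out split.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_synthetic_data(n, seed=0):
    """Generate toy records. This is NOT the study's dataset."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        # Hypothetical features, chosen only for illustration.
        age = rng.gauss(0, 1)                        # standardized
        bmi = rng.gauss(0, 1)                        # standardized
        family_history = float(rng.random() < 0.3)   # binary flag
        sugary_diet = float(rng.random() < 0.4)      # binary flag
        # Assumed (made-up) relationship between features and outcome.
        logit = 1.2 * age + 1.0 * bmi + 0.8 * family_history + 0.6 * sugary_diet - 0.7
        labels.append(1 if rng.random() < sigmoid(logit) else 0)
        rows.append([age, bmi, family_history, sugary_diet])
    return rows, labels


def train_logistic_regression(X, y, lr=0.1, epochs=200):
    """Fit weights and bias with full-batch gradient descent on log-loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, xi):
    # Classify as diabetic (1) when the predicted probability is at least 0.5.
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0


def accuracy(w, b, X, y):
    correct = sum(predict(w, b, xi) == yi for xi, yi in zip(X, y))
    return correct / len(y)


# 80/20 train/test split on the synthetic records.
X, y = make_synthetic_data(600, seed=42)
split = int(0.8 * len(X))
X_train, y_train = X[:split], y[:split]
X_test, y_test = X[split:], y[split:]

w, b = train_logistic_regression(X_train, y_train)
test_accuracy = accuracy(w, b, X_test, y_test)
print(f"held-out accuracy on synthetic data: {test_accuracy:.3f}")
```

The accuracy printed here reflects only the synthetic generator and says nothing about the 83.8% figure reported for the study's dataset. Comparing several classifiers would follow the same pattern: fit each model on the training split and compare held-out accuracy.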



© 2022 Haque & Alharbi. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Published By Indian Society for Education and Environment (iSee)

