P-ISSN 0974-6846 E-ISSN 0974-5645

Indian Journal of Science and Technology

Year: 2020, Volume: 13, Issue: 37, Pages: 3820-3842

Original Article

Intelligent socio-economic status prediction system using machine learning models on Rajahmundry A.P., SES dataset

Received Date: 30 August 2020, Accepted Date: 20 September 2020, Published Date: 13 October 2020


Background: Developing economic and social systems and ensuring the efficiency of economic and social processes is a major task for the government of any country. Predictive machine learning (ML) models are used to analyze datasets, enabling more efficient enterprise management. Research combining Socio-Economic Status (SES) analysis with ML is now crucial for identifying socio-economic inequalities and for taking further action in the form of prevention, protection, and suppression.

Objectives: The main objective of this research is to understand socio-economic system issues and to predict SES levels in a particular area, Rajahmundry, Andhra Pradesh (AP), India, using statistical analysis and machine learning methods.

Methods: We analyze data collected from Rajahmundry (Rajamahendravaram), Andhra Pradesh, India, with 48 feature attributes (dimensions) and one four-class target attribute (poor, rich, middle, upper-middle). These SES levels are predicted with five ML algorithms.

Findings: We conduct a statistical analysis of each attribute and compare five ML algorithms, Naïve Bayes, Decision Trees (DTs), k-NN, SVM (RBF kernel), and Random Forest (RF), using confusion matrices, performance measures (classification accuracy (CA), precision, recall, and F1), and the area under the receiver operating characteristic (ROC) curve (AUC). The RF algorithm gave the best results on the Rajahmundry AP SES dataset, reaching 97.82% CA with 0.41 seconds needed for model construction. The next-best model is DTs, with 96.67% CA and 0.16 seconds for model construction.

Novelty: The comprehensive analysis indicates that the novel AP SES dataset, combined with empirical statistical analysis, gives good results, and that the RF model predicts SES levels very effectively.
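The abstract compares the classifiers by classification accuracy, per-class precision, recall and F1 derived from a confusion matrix, and the area under the ROC curve, for a four-class target. The authors' code is not shown, so the following is only a minimal, hedged sketch in plain Python of how these measures can be computed from true labels, predicted labels and per-class scores. The class ordering, the function names and the toy labels and scores in the test are hypothetical. The AUC function uses a one-vs-rest, Mann-Whitney formulation, which is one common way to extend ROC AUC to several classes and may differ from the averaging the paper actually used.

```python
# Hypothetical class order; the four labels come from the abstract.
CLASSES = ["poor", "middle", "upper-middle", "rich"]


def confusion_matrix(y_true, y_pred, classes=CLASSES):
    """Rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Classification accuracy: correctly classified cases / all cases."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


def per_class_scores(matrix, classes=CLASSES):
    """Precision, recall and F1 for each class, treating it as one-vs-rest."""
    n = len(classes)
    scores = {}
    for k, c in enumerate(classes):
        tp = matrix[k][k]
        fp = sum(matrix[i][k] for i in range(n)) - tp  # predicted c, truly other
        fn = sum(matrix[k]) - tp                       # truly c, predicted other
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[c] = (precision, recall, f1)
    return scores


def one_vs_rest_auc(y_true, class_scores, positive):
    """AUC for one class via the Mann-Whitney statistic: the probability that a
    randomly chosen positive case scores higher than a randomly chosen negative
    case, with ties counted as 0.5."""
    pos = [s for t, s in zip(y_true, class_scores) if t == positive]
    neg = [s for t, s in zip(y_true, class_scores) if t != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

Working from the confusion matrix keeps accuracy, precision, recall and F1 consistent with one another, because all four are read from the same counts; a library such as scikit-learn provides equivalent, more complete implementations.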

Keywords: Machine Learning; socio-economic status; Rajahmundry; household; poverty

© 2020 Balasankar et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Published by the Indian Society for Education and Environment (iSee).