Indian Journal of Science and Technology
Year: 2023, Volume: 16, Issue: 45, Pages: 4141-4155
Bhakti S Pimpale1*, Anala A Pandit2
1Research Scholar, Department of Computer Application, VJTI, Mumbai, India
2Ph.D. Supervisor, Department of Computer Application, VJTI, Mumbai, India
Email: [email protected]
Received Date: 29 April 2023, Accepted Date: 02 November 2023, Published Date: 05 December 2023
Objectives: To forecast daily out-patient department (OPD) visits from air pollution and weather parameters by building a robust model that accurately predicts patient volume. The model accounts for substantial missing values and for factors such as PM2.5 levels, temperature, humidity, wind speed, and rainfall, with the aim of improving healthcare planning and delivery.

Methods: To develop the multioutput ensemble model for forecasting daily OPD visits, we used 13 machine learning techniques, including regression analysis, Extra tree regressor, and Support vector regressor. We collected and pre-processed data from multiple sources: air quality and weather parameters from NASA's website, and historical healthcare data from Shatabdi Hospital, Govandi, Mumbai. The final model combines a Gaussian regressor with an Extra tree regressor, and its performance was evaluated using metrics such as FastDTW and RMSE.

Findings: The multioutput ensemble model performed significantly better than the other models, even in the presence of outliers, multicollinearity, and non-stationarity. It achieved a Root Mean Squared Error (RMSE) of 0.46 for Acute Respiratory Infection (ARI) at a lag of 7 days and 0.22 for Pneumonia at a lag of 8 days. The model also worked well when Covid-19 period data were included, a period in which the correlation between the independent and dependent variables was negligible.

Novelty: First, the datasets used in earlier time-series prediction work did not contain a significant gap in the recorded data in the time domain; this research handles such a gap effectively. Second, earlier work in this domain addresses only a single disease and therefore yields the same lag value irrespective of the disease. The period between the triggering event and the onset of illness may differ across diseases, even within one domain, whether they are triggered by similar or by different air pollutants. This issue has been addressed by ensembling multiple ML algorithms in a way that also effectively optimizes time complexity.
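As a rough illustration of the approach summarized above, the following sketch fits one ensemble per disease, each with its own lag, and scores it with RMSE. It is a minimal sketch under stated assumptions, not the authors' implementation: the data are synthetic, "Gaussian regressor" is assumed to mean scikit-learn's `GaussianProcessRegressor`, the two regressors are combined by simple averaging through `VotingRegressor`, and the names `LAGS`, `make_target`, `lagged_pairs`, `build_ensemble`, and `fit_and_score` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor, VotingRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
n_days = 200

# Synthetic stand-ins for daily predictors (e.g. PM2.5, temperature, humidity).
X = rng.normal(size=(n_days, 3))

# Assumed per-disease lags in days, mirroring the 7 and 8 reported in the abstract.
LAGS = {"ARI": 7, "Pneumonia": 8}


def make_target(lag):
    """Synthetic daily counts that respond to the predictors `lag` days later."""
    y = np.zeros(n_days)
    y[lag:] = 1.5 * X[:-lag, 0] - 0.5 * X[:-lag, 1]
    return y + rng.normal(scale=0.1, size=n_days)


targets = {name: make_target(lag) for name, lag in LAGS.items()}


def lagged_pairs(X, y, lag):
    """Pair predictors on day t - lag with the target on day t."""
    return X[:-lag], y[lag:]


def build_ensemble():
    """Average a Gaussian process regressor and an extra-trees regressor."""
    gpr = GaussianProcessRegressor(
        kernel=RBF() + WhiteKernel(), normalize_y=True, random_state=0
    )
    etr = ExtraTreesRegressor(n_estimators=100, random_state=0)
    return VotingRegressor([("gpr", gpr), ("etr", etr)])


def fit_and_score(X, y, lag, train_frac=0.8):
    """Fit on the earliest days and report RMSE on the most recent days."""
    X_lag, y_lag = lagged_pairs(X, y, lag)
    split = int(len(y_lag) * train_frac)  # chronological split, no shuffling
    model = build_ensemble().fit(X_lag[:split], y_lag[:split])
    pred = model.predict(X_lag[split:])
    rmse = float(np.sqrt(mean_squared_error(y_lag[split:], pred)))
    return model, rmse


results = {
    name: fit_and_score(X, targets[name], lag)[1] for name, lag in LAGS.items()
}
for name, rmse in results.items():
    print(f"{name}: lag={LAGS[name]} days, RMSE={rmse:.3f}")
```

Fitting a separate ensemble per disease keeps each target aligned with its own lag, which a single shared feature matrix would not allow. The RMSE values printed here come from synthetic data and are not comparable to the figures reported in the abstract.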
Keywords: Acute Respiratory Infection, Pneumonia, Gaussian Regressor, Extra Tree Regressor, Weather Data, Air Pollution
© 2023 Pimpale & Pandit. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Published By Indian Society for Education and Environment (iSee)