P-ISSN 0974-6846, E-ISSN 0974-5645

Indian Journal of Science and Technology


Year: 2023, Volume: 16, Issue: 47, Pages: 4512-4524

Original Article

Improving Crop Yield Prediction Models with Optimization-Based Feature Selection and Filtering Approaches

Received Date: 26 June 2023, Accepted Date: 05 October 2023, Published Date: 22 December 2023


Objective: To analyze the impact of various factors on crop yield and provide insights for improving crop production in India. Methods: This research employs feature selection algorithms, machine learning (ML) models, and the feature extraction technique Principal Component Analysis (PCA) to identify the key factors affecting crop yield in India. Data from the Indian Meteorological, Statistical, and Agriculture Departments spanning five decades are analyzed to provide insights to policymakers and farmers. Twenty factors were analyzed to determine their impact on crop yield. Three feature selection algorithms were used to select the most important of these twenty factors: forward feature selection, backward feature selection, and recursive feature elimination. Three ML models, Random Forest, XGBoost, and Multiple Linear Regression, were then used to estimate the accuracy obtained with each feature selection algorithm. PCA was used for dimensionality reduction of the features. RMSE, MAPE, MAE, and R² were used to measure the performance of each feature selection method. Findings: Of the three machine learning algorithms, Random Forest combined with forward feature selection provided the highest model accuracy, 98.415 percent. The study also compares the combinations of the three machine learning algorithms with the different feature selection algorithms. Novelty: Our approach to predicting crop yield is based on a combination of feature selection, PCA, and machine learning algorithms. The proposed research uses feature selection algorithms to identify the most crucial features among the 20 available and then applies machine learning models to make predictions based on these features.
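As an illustration of the evaluation described in the abstract, the following is a minimal sketch, not the authors' implementation, of the four error metrics (RMSE, MAPE, MAE, and R2) and of a greedy forward feature selection loop. It uses only the Python standard library. The function names, the feature names (`rainfall`, `temperature`, `fertilizer`), and the toy scoring function are hypothetical and chosen for illustration only; in practice the score would typically be a validation metric of a model such as Random Forest trained on the candidate subset.

```python
import math
from typing import Callable, Dict, List, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Return MAE, RMSE, MAPE (in percent), and R2 for paired observations."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(ss_res / n),
        # MAPE is undefined when any true value is zero (division by zero).
        "mape": 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n,
        # R2 is undefined when all true values are identical (ss_tot == 0).
        "r2": 1.0 - ss_res / ss_tot,
    }


def forward_select(
    candidates: Sequence[str],
    score: Callable[[List[str]], float],
    k: int,
) -> List[str]:
    """Greedy forward selection: starting from an empty set, repeatedly add
    the feature whose inclusion gives the highest score, until k are chosen."""
    selected: List[str] = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical usage: a toy additive score stands in for model accuracy.
toy_weights = {"rainfall": 3.0, "temperature": 2.0, "fertilizer": 1.0}


def toy_score(subset: List[str]) -> float:
    return sum(toy_weights[f] for f in subset)


chosen = forward_select(["temperature", "fertilizer", "rainfall"], toy_score, k=2)
metrics = regression_metrics([2.0, 4.0, 6.0, 8.0], [2.5, 3.5, 6.5, 7.5])
```

Backward selection follows the same pattern in reverse, starting from all features and removing the one whose removal hurts the score least. Library implementations (for example in scikit-learn) additionally handle cross-validation, which this sketch omits.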

Keywords: Recursive Feature Elimination, Forward Feature Selection, Principal Component Analysis, Random Forest Regression, XGBoost, Crop yield prediction, Multiple Linear Regression


© 2023 Mehla et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Published By Indian Society for Education and Environment (iSee)


