P-ISSN 0974-6846, E-ISSN 0974-5645

Indian Journal of Science and Technology



Year: 2023, Volume: 16, Issue: 47, Pages: 4512-4524

Original Article

Improving Crop Yield Prediction Models with Optimization-Based Feature Selection and Filtering Approaches

Received Date: 26 June 2023, Accepted Date: 05 October 2023, Published Date: 22 December 2023


Objective: To analyze the impact of various factors on crop yield in India and provide insights for improving crop production there.

Methods: This research combines feature selection algorithms, machine learning (ML) models, and Principal Component Analysis (PCA), a feature extraction technique, to identify the key factors affecting crop yield in India. Data from the Indian Meteorological, Statistical, and Agriculture Departments spanning five decades are analyzed to provide insights to policymakers and farmers. Twenty candidate factors were assessed for their impact on crop yield. Three feature selection algorithms (forward feature selection, backward feature selection, and recursive feature elimination) were used to select the most important of these twenty factors. Three ML models (Random Forest, XGBoost, and Multiple Linear Regression) were then used to estimate the accuracy obtained with each feature selection algorithm. PCA was used for dimensionality reduction of the features. Performance was measured using root mean square error (RMSE), mean absolute percentage error (MAPE), mean absolute error (MAE), and the coefficient of determination (R²).

Findings: Of the three machine learning algorithms, Random Forest combined with forward feature selection provided the highest model accuracy, 98.415 percent. The study also compares every combination of the three machine learning algorithms with the different feature selection algorithms.

Novelty: The proposed approach to predicting crop yield combines feature selection, PCA, and machine learning algorithms. Feature selection algorithms identify the most important of the 20 available features, and machine learning models then make predictions based on the selected features.
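
The forward feature selection step and the four evaluation metrics named in the Methods can be illustrated with a minimal, standard-library Python sketch. This is not the authors' implementation. It assumes synthetic data in place of the departmental records, substitutes a plain least-squares linear model for Random Forest and XGBoost, and uses hypothetical helper names (`regression_metrics`, `fit_linear`, `forward_select`) chosen only for illustration.

```python
import math
import random


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, MAPE (in percent), and R^2 for two equal-length lists."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        # MAPE is undefined when a true value is zero; yields are assumed positive.
        "mape": 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n,
        "r2": 1.0 - ss_res / ss_tot,
    }


def fit_linear(X, y):
    """Ordinary least squares with an intercept, solved via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Augmented matrix [A^T A | A^T y].
    M = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    for col in range(k):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(k):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][k] / M[i][i] for i in range(k)]


def predict(coef, X):
    """Apply fitted coefficients (intercept first) to each row of X."""
    return [coef[0] + sum(c * v for c, v in zip(coef[1:], r)) for r in X]


def columns(X, idx):
    """Keep only the feature columns listed in idx."""
    return [[r[j] for j in idx] for r in X]


def forward_select(X_tr, y_tr, X_va, y_va, n_keep):
    """Greedily add the feature that gives the lowest validation RMSE."""
    selected = []
    remaining = list(range(len(X_tr[0])))
    while len(selected) < n_keep:
        def score(j):
            idx = selected + [j]
            coef = fit_linear(columns(X_tr, idx), y_tr)
            return regression_metrics(y_va, predict(coef, columns(X_va, idx)))["rmse"]
        best = min(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Synthetic stand-in data (an assumption): 4 candidate features, 200 samples.
rng = random.Random(0)
X = [[rng.uniform(0, 10) for _ in range(4)] for _ in range(200)]
# Hypothetical yield that depends only on features 0 and 2.
y = [50.0 + 3.0 * r[0] - 2.0 * r[2] for r in X]
X_train, y_train, X_val, y_val = X[:150], y[:150], X[150:], y[150:]

chosen = forward_select(X_train, y_train, X_val, y_val, n_keep=2)
coef = fit_linear(columns(X_train, chosen), y_train)
metrics = regression_metrics(y_val, predict(coef, columns(X_val, chosen)))
print(sorted(chosen), {name: round(value, 4) for name, value in metrics.items()})
```

Because the synthetic target is noise-free and depends only on features 0 and 2, the greedy search should recover those two features and score a near-perfect fit on the held-out rows. Backward selection and recursive feature elimination follow the same wrapper pattern but start from the full feature set and remove one feature at a time.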

Keywords: Recursive Feature Elimination, Forward Feature Selection, Principal Component Analysis, Random Forest Regression, XGBoost, Crop yield prediction, Multiple Linear Regression




© 2023 Mehla et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Published By Indian Society for Education and Environment (iSee)

