Prediction of IPL matches using Machine Learning while tackling ambiguity in results

Ayush Tripathi; Rashidul Islam; Vatsal Khandor; Vijayabharathi Murugan

doi:10.17485/IJST/v13i38.1649

Indian Journal of Science and Technology

Year: 2020, Volume: 13, Issue: 38, Pages: 4013-4035

Original Article

Ayush Tripathi¹*, Rashidul Islam², Vatsal Khandor³, Vijayabharathi Murugan⁴

¹ Department of Computer Science, Raj Kumar Goel Institute of Technology, Ghaziabad
² Department of AEIE, Heritage Institute of Technology, Kolkata, India
³ Department of Computer Engineering, Dwarkadas J. Sanghvi College of Engineering, Mumbai
⁴ Department of Chemical Engineering, Indian Institute of Technology, Madras

*Corresponding author
Tel: +91-95-6010-4441

Received: 12 September 2020; Accepted: 17 October 2020; Published: 28 October 2020

This work is licensed under a Creative Commons Attribution 4.0 International License.

Abstract

Background/Objectives: The IPL (Indian Premier League) is one of the most watched cricket leagues in the world. As its popularity and the advertising associated with it continue to grow, forecasting IPL matches has become a need for advertisers and sponsors. This paper applies machine learning to predict the winner of an IPL match.

Methods/Statistical analysis: Cricket in the T20 format is highly unpredictable: many features contribute to the result of a match, and each feature has a weighted impact on the outcome. In this paper, a meaningful dataset was first assembled through data mining, and essential features were then derived using methods such as feature engineering and the Analytic Hierarchy Process (AHP). In addition, a key issue with data symmetry, and the inability of models to handle it, was identified; the issue extends to all classification models that compare two or more classes using similar features for each class. The paper terms this issue model ambiguity, which arises from the model's asymmetric nature. Several classification algorithms, namely Naïve Bayes, SVM, k-Nearest Neighbors, Random Forest, Logistic Regression, ExtraTreesClassifier, and XGBoost, were used to train models that predict the winner.

Findings: Tree-based classifiers gave the best results with the derived model. The highest accuracy, 60.043%, was obtained with Random Forest, with a standard deviation of 6.3% and an ambiguity of 1.4%.

Novelty/Applications: Besides reporting a more accurate result, the derived model addresses multicollinearity and identifies the issue of data symmetry (termed model ambiguity). It can be used by brands, sponsors, and advertisers to inform their marketing strategies.
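The abstract does not spell out how model ambiguity is measured. The sketch below is one plausible reading, not the authors' code: it assumes a hypothetical predictor that takes the feature vector of the team listed first and of the team listed second and returns the probability that the first-listed team wins, and it counts a match as ambiguous when swapping the two teams' feature blocks makes the model pick a different winner. The functions `winner`, `is_ambiguous`, `ambiguity_rate` and the two toy models are illustrative names only.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface (assumed, not from the paper): the predictor receives
# the features of the first-listed team and of the second-listed team, and
# returns the probability that the first-listed team wins.
Features = Sequence[float]
Predictor = Callable[[Features, Features], float]


def winner(model: Predictor, first: Features, second: Features) -> int:
    """Return 0 if the model favours `first`, 1 if it favours `second`."""
    return 0 if model(first, second) >= 0.5 else 1


def is_ambiguous(model: Predictor, a: Features, b: Features) -> bool:
    """A match is ambiguous if listing the teams in the opposite order
    makes the model pick a different team."""
    forward = winner(model, a, b)   # 0 -> team A wins, 1 -> team B wins
    backward = winner(model, b, a)  # 0 -> team B wins, 1 -> team A wins
    # A consistent model picks the same team, so the two indices must differ.
    return forward == backward


def ambiguity_rate(model: Predictor, matches: List[Tuple[Features, Features]]) -> float:
    """Fraction of matches whose predicted winner depends on team order."""
    if not matches:
        return 0.0
    return sum(is_ambiguous(model, a, b) for a, b in matches) / len(matches)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


# Toy models for illustration only.
def symmetric_model(first: Features, second: Features) -> float:
    # Depends only on the difference of feature sums, so swapping the
    # teams yields exactly 1 - p.
    return sigmoid(sum(first) - sum(second))


def order_biased_model(first: Features, second: Features) -> float:
    # The +1.0 offset favours whichever team happens to be listed first.
    return sigmoid(sum(first) - sum(second) + 1.0)


matches = [
    ([1.0, 1.0], [0.0, 0.0]),
    ([0.5, 0.0], [1.0, 0.0]),
    ([0.0, 0.0], [1.5, 1.5]),
    ([0.8, 0.5], [0.5, 0.5]),
]

print(ambiguity_rate(symmetric_model, matches))     # 0.0
print(ambiguity_rate(order_biased_model, matches))  # 0.5
```

The order-biased model flips its verdict on the two close matches, which is the kind of asymmetry the abstract describes. A common general remedy, not necessarily the one used in the paper, is to train on both orderings of every match with the label flipped accordingly, so the classifier cannot learn a preference for the first-listed team.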

Keywords: Indian Premier League; machine learning; analytic hierarchy process; winner prediction; IPL

Copyright

© 2020 Tripathi et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Published By Indian Society for Education and Environment (iSee).