Indian Journal of Science and Technology
Year: 2020, Volume: 13, Issue: 38, Pages: 4013-4035
Ayush Tripathi1*, Rashidul Islam2, Vatsal Khandor3, Vijayabharathi Murugan4
1 Department of Computer Science, Raj Kumar Goel Institute of Technology, Ghaziabad, Tel.: +91-95-6010-4441
2 Department of AEIE, Heritage Institute of Technology, Kolkata, India
3 Department of Computer Engineering, Dwarkadas J. Sanghvi College of Engineering, Mumbai
4 Department of Chemical Engineering, Indian Institute of Technology, Madras
Received Date: 12 September 2020, Accepted Date: 17 October 2020, Published Date: 28 October 2020
Background/Objectives: The Indian Premier League (IPL) is one of the most-watched cricket tournaments in the world. As its popularity and the advertising associated with it continue to grow, forecasting IPL match outcomes is becoming valuable to advertisers and sponsors. This paper applies machine learning to predict the winner of an IPL match.
Methods/Statistical analysis: Cricket in the T20 format is highly unpredictable: many features contribute to the result of a match, and each feature has a weighted impact on the outcome. First, a meaningful dataset was defined through data mining; next, essential features were derived using methods such as feature engineering and the Analytic Hierarchy Process. In addition, a key issue concerning data symmetry, and the inability of models to handle it, was identified. The issue extends to all classification models that compare two or more classes using similar features for each class. The paper terms this model ambiguity, which arises from the model's asymmetric nature. Several machine learning classification algorithms (Naïve Bayes, SVM, k-Nearest Neighbor, Random Forest, Logistic Regression, ExtraTreesClassifier, and XGBoost) were then used to train models for predicting the winner.
Findings: Tree-based classifiers gave better results with the derived model. The highest accuracy, 60.043%, was observed with Random Forest, with a standard deviation of 6.3% and an ambiguity of 1.4%.
Novelty/Applications: Apart from reporting a more accurate result, the derived model also solves the problem of multicollinearity and identifies the issue of data symmetry (termed model ambiguity). It can be leveraged by brands, sponsors, and advertisers to support their marketing strategies.
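The abstract does not state how model ambiguity is measured, so the following is only a minimal sketch under one assumed reading: a prediction counts as ambiguous when the predicted winner changes after the two teams are presented to the model in the opposite order. The names `ambiguity_rate`, `biased_predict`, `symmetric_predict`, and the toy `matches` data are hypothetical and are not taken from the paper.

```python
from typing import Callable, Sequence, Tuple

Features = Tuple[float, ...]
# Assumed interface: a predictor receives (first_team, second_team) features
# and returns 0 if the first-listed team is predicted to win, 1 otherwise.
Predictor = Callable[[Features, Features], int]


def ambiguity_rate(predict: Predictor,
                   matches: Sequence[Tuple[Features, Features]]) -> float:
    """Fraction of matches whose predicted winner changes when the
    two teams are listed in the opposite order (assumed definition)."""
    if not matches:
        return 0.0
    inconsistent = 0
    for team_a, team_b in matches:
        forward = predict(team_a, team_b)   # 0 means team_a wins
        backward = predict(team_b, team_a)  # 0 means team_b wins
        # Both orderings name the same team only if forward == 1 - backward.
        if forward != 1 - backward:
            inconsistent += 1
    return inconsistent / len(matches)


def biased_predict(first: Features, second: Features) -> int:
    """Toy asymmetric model: favors whichever team is listed first."""
    return 0 if sum(first) + 0.5 >= sum(second) else 1


def symmetric_predict(first: Features, second: Features) -> int:
    """Toy order-independent model: compares summed features only."""
    return 0 if sum(first) >= sum(second) else 1


# Hypothetical feature vectors for four matches (no ties in summed strength).
matches = [
    ((3.0, 2.0), (1.0, 1.0)),
    ((2.0, 2.0), (2.0, 2.2)),
    ((1.0, 1.0), (4.0, 0.5)),
    ((2.0, 1.0), (2.0, 1.3)),
]

print(ambiguity_rate(biased_predict, matches))
print(ambiguity_rate(symmetric_predict, matches))
```

In this toy setting, the order-biased predictor contradicts itself on the two closely matched fixtures, while the order-independent one does not. The same check could in principle be run on any trained classifier by wrapping its prediction call in a function with this interface.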
Keywords: The Indian Premier League; machine learning; analytic hierarchy process; winner prediction; IPL
© 2020 Tripathi et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Published By Indian Society for Education and Environment (iSee).