1. INTRODUCTION
1.1 Project Overview
The Indian Premier League (IPL) is one of the more popular cricket tournaments in the world. Its financial value increases each season, its viewership has grown markedly, and the betting market around the IPL is growing significantly every year. Because cricket is a very dynamic game that can change ball by ball, bettors and bookmakers have a strong incentive to bet on match results. This paper investigates machine learning techniques for predicting cricket match results from historical IPL match data. Influential features of the dataset were identified using filter-based methods, namely Correlation-based Feature Selection, Information Gain (IG) and Relief, as well as a Wrapper method. More importantly, machine learning techniques including Naïve Bayes, Random Forest, K-Nearest Neighbor (KNN) and Model Trees (classification via regression) were used to build predictive models from the distinct feature sets derived by these methods. Two feature subsets were formulated, one based on home team advantage and the other based on the toss decision, and the selected machine learning techniques were applied to both. Experimental tests show that tree-based models, particularly Random Forest, performed better in terms of accuracy, precision and recall than the probabilistic and statistical models. However, on the toss feature subset, none of the considered machine learning algorithms produced accurate predictive models.
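Information Gain, one of the feature-ranking methods named above, scores a feature X by how much it reduces uncertainty about the match result Y: IG(Y; X) = H(Y) - Σ_v P(X = v)·H(Y | X = v), where H is Shannon entropy. The following is a minimal pure-Python sketch of that formula. The feature names and the eight rows are hypothetical toy values invented for illustration; they are not the project's dataset or results.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """IG(Y; X) = H(Y) - sum over values v of P(X = v) * H(Y | X = v)."""
    n = len(labels)
    conditional = 0.0
    for value in set(feature):
        # Labels of the rows where the feature takes this value.
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += (len(subset) / n) * entropy(subset)
    return entropy(labels) - conditional


# Hypothetical toy rows: played at home?, won the toss?, won the match? (1 = yes)
home = ["yes", "yes", "yes", "no", "no", "no", "yes", "no"]
toss = ["won", "lost", "won", "lost", "won", "lost", "lost", "won"]
won = [1, 1, 1, 0, 0, 1, 1, 0]

ig_home = information_gain(home, won)
ig_toss = information_gain(toss, won)
print(f"IG(home) = {ig_home:.3f}")
print(f"IG(toss) = {ig_toss:.3f}")
```

On these invented rows the home feature scores about 0.549 bits and the toss feature about 0.049 bits; a real ranking depends entirely on the actual match data.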
Cricket is a well-known sport, and with its increasing popularity and viewership, changes of format and innovations in how tournaments are played became necessary. To plan for potential future growth, the International Cricket Council (ICC) commissioned global market research, which revealed that cricket has more than one billion fans worldwide, with the potential for significant further growth. Among all formats of cricket, Twenty20 Internationals (T20) were the most popular, at 92%, with 87% of the fans
Once the dataset has been read, we should look at the head and tail of the dataset to make sure it is imported
correctly. The head of the dataset should look like this:
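A minimal sketch of that import check with pandas, assuming a CSV laid out like the widely used Kaggle IPL `matches.csv` (the file name and column names are assumptions). To keep the sketch self-contained, a few hypothetical rows are read from an in-memory string; with the real file you would call `pd.read_csv("matches.csv")` instead.

```python
import io

import pandas as pd

# Hypothetical sample rows standing in for the real matches file.
CSV = """id,season,city,team1,team2,toss_winner,toss_decision,winner
1,2017,Hyderabad,Sunrisers Hyderabad,Royal Challengers Bangalore,Royal Challengers Bangalore,field,Sunrisers Hyderabad
2,2017,Pune,Mumbai Indians,Rising Pune Supergiant,Rising Pune Supergiant,field,Rising Pune Supergiant
3,2017,Rajkot,Gujarat Lions,Kolkata Knight Riders,Kolkata Knight Riders,field,Kolkata Knight Riders
4,2018,Mumbai,Mumbai Indians,Chennai Super Kings,Chennai Super Kings,field,Chennai Super Kings
5,2018,Kolkata,Kolkata Knight Riders,Royal Challengers Bangalore,Kolkata Knight Riders,field,Kolkata Knight Riders
6,2019,Chennai,Chennai Super Kings,Royal Challengers Bangalore,Chennai Super Kings,field,Chennai Super Kings
"""

matches = pd.read_csv(io.StringIO(CSV))

print(matches.head())   # first 5 rows: check column names and values parsed correctly
print(matches.tail(3))  # last 3 rows: check nothing was cut off at the end
print(matches.shape)    # (rows, columns)
```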
2. LITERATURE SURVEY
2.1 Library/Module Requirements
This project requires the following Python libraries: pandas, NumPy, Matplotlib and Seaborn.
Pandas: pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real-world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open-source data analysis and manipulation tool available in any language.
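To illustrate what "labeled" data means in practice, here is a small sketch with hypothetical win counts indexed by team abbreviation (all numbers are invented). Operations between two Series are aligned on these labels rather than on row positions.

```python
import pandas as pd

# Hypothetical per-team win counts for two seasons, indexed by team label.
wins_2018 = pd.Series({"CSK": 11, "MI": 6, "KKR": 9})
wins_2019 = pd.Series({"MI": 11, "CSK": 10, "DC": 9})

# Plain "+" aligns on labels; a team missing from either season becomes NaN.
raw_sum = wins_2018 + wins_2019
print(raw_sum)

# fill_value=0 treats a team absent from one season as having 0 wins there.
total = wins_2018.add(wins_2019, fill_value=0)
print(total)

# Label-based access with .loc instead of a numeric position.
print(total.loc["MI"])
```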
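NumPy: It is a Python library that provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation and much more. At the core of the NumPy package is the ndarray object. This encapsulates n-dimensional arrays of homogeneous data types, with many operations performed in compiled code for performance. There are several important differences between NumPy arrays and the standard Python sequences:
• NumPy arrays have a fixed size at creation, unlike Python lists, which can grow dynamically. Changing the size of an ndarray creates a new array and deletes the original.
• The elements of a NumPy array must all be of the same data type, and thus occupy the same amount of memory (arrays of Python objects are the exception).
• NumPy arrays support advanced mathematical and other operations on large amounts of data, typically executed more efficiently and with less code than is possible with Python's built-in sequences.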
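The following sketch illustrates the fixed size, single data type and vectorized operations of an ndarray compared with a Python list. The run values are arbitrary examples.

```python
import numpy as np

scores = [4, 6, 1, 0, 2]      # a Python list can grow in place and mix types
runs = np.array(scores)       # an ndarray has a fixed size and one dtype for all elements
print(runs.dtype)             # a platform integer type, e.g. int64

# Vectorized arithmetic runs in compiled code, with no explicit Python loop.
doubled = runs * 2
print(doubled)                # [ 8 12  2  0  4]
print(runs.sum(), runs.mean())  # 13 2.6

# "Appending" returns a new, larger array; the original keeps its size.
longer = np.append(runs, 6)
print(runs.size, longer.size)   # 5 6

# Mixing ints and floats forces one common dtype for the whole array.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)            # float64
```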
Matplotlib: Matplotlib is one of the most popular Python packages used for data visualization. It is a cross-platform library for making 2D plots from data in arrays. It provides an object-oriented API for embedding plots in applications, and it can be used in Python and IPython shells, Jupyter notebooks and web application servers.
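A minimal sketch of the object-oriented API: the figure and axes are explicit objects, and the finished figure is rendered into an in-memory PNG buffer, which is one way a plot can be handed to another application. The season values are illustrative only, and the non-interactive Agg backend is chosen so the code also runs without a display (in Jupyter the figure would simply be shown).

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no display needed
import matplotlib.pyplot as plt

seasons = [2017, 2018, 2019]
matches_played = [59, 60, 60]  # illustrative values

# Explicit Figure and Axes objects instead of the implicit pyplot state.
fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(seasons, matches_played)
ax.set_xticks(seasons)
ax.set_xlabel("Season")
ax.set_ylabel("Matches")
ax.set_title("Matches per season (illustrative data)")

# Render the figure to PNG bytes in memory, e.g. to serve from a web app.
buf = io.BytesIO()
fig.savefig(buf, format="png")
```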
Seaborn: Seaborn is an open-source Python library built on top of matplotlib. It is used for data
visualization and exploratory data analysis. Seaborn works easily with dataframes and the Pandas library. The
graphs created can also be customized easily.
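A minimal sketch of Seaborn working directly on a DataFrame, assuming a hypothetical `winner` column with invented values. `sns.countplot` counts the rows per category itself, and it draws onto an ordinary Matplotlib Axes that can then be customized further.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no display needed
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical match results; the column name "winner" is an assumption.
matches = pd.DataFrame({"winner": ["CSK", "MI", "CSK", "KKR", "MI", "CSK"]})

fig, ax = plt.subplots()
# Pass the DataFrame and a column name; Seaborn counts rows per team.
sns.countplot(data=matches, x="winner", ax=ax)
# The returned Axes is plain Matplotlib, so it can be customized as usual.
ax.set_title("Wins per team (illustrative data)")
```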
Software requirements:
• Windows 7 or higher
2.3 Tools/Language/Platform
• Python Language
• Jupyter Platform
The analysis needs accurate data, so we do not fill in (impute) null values, since filled-in values could lead to wrong results. Instead, columns that contain null values are not considered.
Next, find which fields have null values and which do not.
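A minimal sketch of the null-value check and of excluding columns that contain nulls rather than filling them, using pandas `isnull().sum()` and `dropna(axis=1)`. The columns and rows are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical slice of the matches table; column names are assumptions.
matches = pd.DataFrame({
    "season": [2017, 2017, 2018, 2018],
    "city": ["Hyderabad", None, "Mumbai", "Kolkata"],
    "winner": ["SRH", "RPS", "CSK", "KKR"],
    "umpire3": [np.nan, np.nan, np.nan, np.nan],
})

# Number of missing values in each column.
null_counts = matches.isnull().sum()
print(null_counts)

# Drop every column with at least one null instead of imputing values.
clean = matches.dropna(axis=1)
print(clean.columns.tolist())  # ['season', 'winner']
```

Note that this rule discards a whole column for a single missing value, as happens to `city` here.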
Next, find the total number of matches played, the number of unique cities where matches were played, and the total number of teams that have played in the IPL.
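One way to compute these three counts with pandas, assuming hypothetical `id`, `city`, `team1` and `team2` columns as in the common Kaggle IPL layout (an assumption, with invented rows). Teams are counted from the union of both team columns, because a team might never appear in one of them.

```python
import pandas as pd

# Hypothetical matches table; column names and rows are assumptions.
matches = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "city": ["Hyderabad", "Pune", "Mumbai", "Pune", "Kolkata"],
    "team1": ["SRH", "MI", "MI", "RPS", "KKR"],
    "team2": ["RCB", "RPS", "CSK", "KKR", "RCB"],
})

total_matches = matches["id"].nunique()    # one id per match
unique_cities = matches["city"].nunique()  # distinct host cities

# Union of both team columns, so no team is missed.
teams = pd.unique(pd.concat([matches["team1"], matches["team2"]]))

print(total_matches, unique_cities, len(teams))  # 5 4 6
```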
Next, find the total runs scored in each season. This is done by adding up all types of score, i.e. 1s, 2s, 3s and so on, for each season, and then drawing a line plot of season against the grouped total runs.
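A minimal sketch of this step, assuming a hypothetical ball-by-ball table with `match_id` and `total_runs` columns and a matches table with `id` and `season` (a layout commonly used for IPL data, but an assumption here, with invented values). If `total_runs` holds all runs scored off each ball, summing it per season gives the same total as adding up the 1s, 2s, 3s and so on separately.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no display needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical ball-by-ball rows and match-to-season mapping.
deliveries = pd.DataFrame({
    "match_id":   [1, 1, 1, 2, 2, 3, 3, 3],
    "total_runs": [1, 4, 0, 2, 6, 1, 3, 1],
})
matches = pd.DataFrame({"id": [1, 2, 3], "season": [2017, 2017, 2018]})

# Attach the season to every delivery, then sum runs per season.
merged = deliveries.merge(matches, left_on="match_id", right_on="id")
runs_per_season = merged.groupby("season")["total_runs"].sum()
print(runs_per_season)

# Line plot of season against the grouped total runs.
fig, ax = plt.subplots()
ax.plot(runs_per_season.index, runs_per_season.values, marker="o")
ax.set_xticks(runs_per_season.index)
ax.set_xlabel("Season")
ax.set_ylabel("Total runs")
ax.set_title("Total runs per season (illustrative data)")
```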
Resultant output:
5. CONCLUSION
Applying data analysis techniques to cricket, by considering historical game data, player performance, natural parameters, pre-game conditions and other features, is beneficial for multiple stakeholders. In a dynamic format like T20, where the situation in a game changes with every ball, predicting the match outcome is challenging. To predict the final outcome of a T20 cricket match, we investigated whether machine learning can improve the prediction rate of match results. We formulated the problem in two scenarios, each named after its most influential features: first, the Home Team feature set, and second, the Toss Winner decision feature set.
REFERENCES
• Kapadia, K., Abdel-Jaber, H., Thabtah, F., Hadi, W. Sport analytics for cricket game results using machine learning: An experimental study.
• Abdelhamid, N., Ayesh, A., Thabtah, F., 2012. An experimental study of three different rule ranking formulas
in associative classification mining, in: Proceedings of the 7th IEEE International Conference for Internet
Technology and Secured Transactions (ICITST-2012), pp. 795–800, UK.
• Analytics, C., 2017. Anaconda software distribution. Computer software Vers, 2-2.