
Winning Prediction Analysis in One-Day-International (ODI) Cricket Using Machine Learning Techniques

Abhishek Naik, Shivanee Pawar, Minakshee Naik, Sahil Mulani


abhisheknaik1025@gmail.com, shivanee.pawar7@gmail.com, minakshee.naik08@gmail.com
B.E. Computer Dept., AISSMS IOIT, Kennedy Road, Pune

ABSTRACT

Cricket prediction can be viewed as one of the objectives of
sports analytics, which aims to help decision makers gain a
competitive advantage. The difficulty of this task depends on
many factors, such as the availability of data on past events,
the ability to gather data for future events, and the knowledge
needed to interpret the gathered data. Various techniques for
modeling a cricket match exist, and they yield different result
prediction algorithms. The modeling approaches can be put under
four generic categories: empirical models, dynamic systems,
statistical techniques and artificial intelligence (including
expert systems). In the artificial intelligence category, several
approaches focus on Bayesian network modeling. The matrix
factorization technique became very popular in the field of
multimedia content recommender systems, where it showed good
scalability and predictive accuracy. The idea behind using latent
features in our case is to build a successful model on top of
existing matrix factorization.

Keywords
Machine Learning, Deep Neural Network, Artificial
Intelligence.

1. INTRODUCTION
Sports result prediction can be viewed as one of the objectives
of sports analytics, which aims to help decision makers gain a
competitive advantage. Data analysis is becoming more common,
especially in sports. Using the results of data analysis has
become familiar in sports organizations such as the International
Cricket Council (ICC), the International Federation of
Association Football (FIFA) and the Grand Slam of the
International Tennis Federation. The difficulty of this task
depends on many factors, such as gathering historical data,
gathering data for future events, the knowledge required to
interpret the gathered data, and many more. The result of the
game has become the main focus of sports.

2. SURVEY DETAILS

Tianxiang Cui, Jingpeng Li [1]:

In this paper the authors apply Genetic Programming (GP) to the
problem of predicting the outcomes of English Premier League
games, with the result being either win, lose or draw. They
selected 25 features

Volume 3 Issue 2 April - 2018 1


from each game as the input to the GP system. The advantage of
the GP system is that it can generate as many high-quality
functions as required. Bayesian networks, together with other
machine learning techniques including a decision tree and KNN,
are also used to predict the results. The overall average
accuracy for the Bayesian network is 52.21%. The best accuracy
among these is achieved by an ANN (68.8%). The best overall
accuracy achieved by the GP system is 76%. A limitation of this
system is that only 25 features are selected. To overcome this, a
filter can be applied to these features in order to detect their
importance.

S. Dobravec [3]:

This paper proposes a goal score prediction model that uses
latent features obtained from a matrix factorization process. A
Naïve Bayes classifier is also added in order to predict the
outcomes of the match. The matrix factorization technique is used
to build a successful model even when expert knowledge is not
available. A limitation of this system is that the database is
very small and only a short-term dataset is used. To improve the
prediction success, a regression model could be appended to
improve the predictions of the latent-feature-based models.

J. Pan [4]:

This paper proposes D-S evidence theory to calculate the
uncertainty of the data and the unknown a priori probability. For
tennis result prediction, the paper presents a method based on
evidence theory to compute the uncertainty of competition
results. A hybrid hadam evaluation method is proposed, although
it has some deficiencies. The paper finds that evidence theory
can still calculate data uncertainty even when the data sources
for the initialization input are absent. The method's accuracy in
predicting competition results is 70%.

T. L. W. Walls, E. J. Bass [5]:

This paper proposes a regression-based prediction model. It was
developed to allow better prediction of attendance for the
student general admission seats at the University of Virginia.
The advantage of this approach is that regression is a promising
method, especially for longer-term planning.

B. G. Aslan, M. M. Inceoglu [10]:

In this paper two different input vector parameters are tested
via a learning vector quantization network in order to emphasize
the importance of input parameter selection. Neural networks are
used to build the forecasting system for soccer matches. The
system has been improved to calculate the probability of the
outcome of a soccer match. A goal ratio comparison model is
proposed for predicting soccer match results. Two input methods
are used, LVQA and LVQB. The accuracy provided by LVQA is 51.29%,
while the accuracy provided by LVQB is about 53.25%. The accuracy
of the hybrid network is 52.29%. Elo has a successful prediction
rate of 47.71%, while goal ratio has a correct prediction rate of
49.02%. A limitation of this system is that the selection of the
input data used in forecasting systems is a critical issue. Hence
different leagues, different input parameters and different
network structures should be tested in order to achieve a
well-balanced generic forecasting system.

K. Wickramaratna, Min Chen [11]:

This paper proposes a neural-network-based framework for semantic
event detection in soccer videos. A Hidden Markov Model is used
to detect the play and break events in soccer videos. There are
many issues related to HMM and SVM, so to tackle them a novel
learning-based event detection framework is proposed. It
incorporates both the strength of multimodal analysis and the
ability of neural network ensembles to enhance generalization
capabilities. An advanced framework for goal event detection in
soccer videos is proposed using multimodal processing and the
classification power of neural network ensembles. The future work
is to extend the framework to multiple event detection in
different domains.

D. Comaniciu, V. Ramesh [12]:

This paper proposes a prediction-after-intersection based
particle filter to track players distinctly during interaction
events in volleyball. A crash pattern model and other pattern
models are used to detect each tracked object after an
intersection. Algorithms such as Mean-shift, Cam-shift, SIFT,
Kalman Filter, Extended Kalman Filter and Particle Filter each
have their own merits. The success rate of the proposed
prediction-after-intersection based particle filter is around
80%, while that of the conventional one is about 30%.

Z. Yijie, X. Jun [13]:

This paper offers a description of outcome prediction for sports
competitions. It proposes a novel prediction model and a
team-model-based study of existing technologies. The EM algorithm
is used in this paper. Different models are used, such as a
defect analysis of the traditional model, a game model and a team
model. The accuracy of predicting the results of the offensive
and defensive rounds and the total score is between 65% and 70%,
which is close to that of traditional methods.

B. Zhao, L. Chen [14]:

This paper proposes a prediction model of sports results based on
knowledge discovery in databases (KDD). The method combines
multiple lightweight models with a variety of techniques to
improve the accuracy of the prediction model. The paper proposes
a KDD modelling method to analyse and predict non-against games,
to predict the result of the game, and to analyse and determine
the individuals of the game.



Q. Wang, Z. Sun [18]:

This paper first analyzes the problem of spin classification. An
adapted and improved Extreme Learning Machine (ELM) model is
presented. Trajectory prediction methods can be classified into
two categories: the experience model and the parameter model. The
experience model uses locally weighted regression, while the
parameter model uses stress analysis. The Extreme Learning
Machine is a newly developed neural network algorithm. The
experimental results show a major improvement for the improved
ELM in both classification precision and efficiency.

3. SYSTEM DESCRIPTION
Step 1: The end user must select the two teams. Depending on the
selection of both teams, the country and the ground where the
match will be held are displayed automatically.

Step 2: After the toss, the playing 11 of both teams will be
shortlisted.

Step 3: Depending on four aspects, namely batting, bowling,
captaincy and the default batting order, the main prediction will
take place. (Logistic regression and K-means clustering come into
play here: the graph is plotted by logistic regression [Graph 1 &
2], and K-means clustering is then applied to the points, coloured
blue and yellow for the two teams [Graph 1 & 2].)

Step 4: After the match starts, the prediction can change
depending on the following two factors:

1. Batsmen's performance

2. Batting order of the particular player

Step 5: The fluctuated prediction will be compared with the main
prediction. (Fluctuated prediction means that the graph is formed
again from real-time values and compared with the graphs made for
the main prediction.)

Step 6: When the result of the fluctuated prediction matches the
result of the main prediction, we have an accurate output.

Step 7: When the result of the fluctuated prediction differs from
the result of the main prediction, the system will give a reason
for the failure.

4. PROPOSED SYSTEM ARCHITECTURE

[Figure: flowchart. Live match; Team A and Team B; ground detail;
toss; playing 11 of team A and team B; batting, bowling, captaincy
and default batting order; batting and bowling stats on the
selected ground and in the selected country against the opposite
team; prediction; match started; prediction fluctuates depending
on batsmen's performance; fluctuated prediction compared; if
accurate, successful, otherwise unsuccessful with reason of
failure.]

Fig. 2: Proposed architecture of System
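
Step 3 names logistic regression as the model behind the main prediction. The following is a minimal sketch of that idea, not the authors' implementation. It assumes a single input feature (the difference between the two teams' batting aggregates) and a handful of made-up training pairs; the function names `sigmoid` and `fit_logistic`, the learning rate and the data are all illustrative assumptions.

```python
import math


def sigmoid(z: float) -> float:
    """Map any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit p(win) = sigmoid(w * x + b) by batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the mean log loss with respect to w and b.
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical training data (NOT from the paper): x is the difference in
# batting aggregate between team A and team B, y is 1 if team A won, else 0.
diffs = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
wins = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = fit_logistic(diffs, wins)
p_win = sigmoid(w * 1.2 + b)  # estimated win probability for a new match
print(f"w = {w:.3f}, b = {b:.3f}, p(win | diff = 1.2) = {p_win:.3f}")
```

The binary label (1 for a win, 0 for a loss) matches the dichotomous dependent variable that logistic regression requires. A real system would use more features, such as the bowling and captaincy aspects listed in Step 3.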



5. ALGORITHMIC SURVEY
Logistic Regression []
Logistic regression is a predictive analysis. In logistic
regression, the dependent variable is binary or dichotomous, i.e.
it only contains data coded as 1 (TRUE, success) or 0 (FALSE,
failure). Logistic regression is used to describe the data and to
explain the relationship between one dependent binary variable
and one or more ratio-level independent variables.

Neural Network []
In information technology, a neural network is a system of
hardware and/or software patterned after the operation of neurons
in the human brain. Neural networks, also called artificial
neural networks, are a variety of deep learning technologies.

K-Means Clustering []
K-means clustering aims to partition n observations into k
clusters in which each observation belongs to the cluster with
the nearest mean, which serves as a prototype of the cluster. The
k-means clustering algorithm attempts to split a given anonymous
data set (a set containing no information as to class identity)
into a fixed number (k) of clusters. Initially, k so-called
centroids are chosen.

6. IMPLEMENTATION

Train data:
The mathematical representation of the algorithm for the proposed
system is explained with the help of the example below.

In order to check whether the aspects of the main prediction are
right, we take as an example a historical match between two
teams, INDIA VS AUSTRALIA. The match took place on 26th March
2015. The venue was the Sydney Cricket Ground (SCG), Australia.
(Due to space constraints we consider only the batting aspect.)

In the BATTING:

1st, we consider India's and Australia's performance against each
other at the SCG:

INDIA:

Table 2:

NAMES                                  AVERAGE
1] Rohit Sharma (RS)                   55.33
2] Shikhar Dhawan (SD)                 43.00
3] Virat Kohli (VK)                    10.67
4] Ajinkya Rahane (AR)                 28.00
5] Mahendra Singh Dhoni [C] (MSD)      19.08
6] Suresh Raina (SR)                   04.00
7] Ravindra Jadeja (RJ)                08.00
8] Ravichandran Ashwin (RA)            26.00
9] Mohammad Shami (MS)                 -
10] Mohit Sharma (MoS)                 -
11] Umesh Yadav (UY)                   -

The sum of the available players' averages divided by the number
of players having an average:

I1 = (RS+SD+VK+AR+MSD+SR+RJ+RA)/8 = 24.26

n = number of available players

x_i = average of each player

$I_1 = \frac{1}{n}\sum_{i=0}^{n-1} x_i$    (1)
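
The aggregate I1 for Table 2 can be reproduced with a few lines of code. This is a sketch under stated assumptions: the helper name `available_mean` is hypothetical, and players without an average (shown as "-" in the table) are represented by `None` and skipped.

```python
from typing import Optional

# Batting averages at the SCG, copied from Table 2.
# None marks a player with no average (shown as "-" in the table).
india_scg: dict[str, Optional[float]] = {
    "RS": 55.33, "SD": 43.00, "VK": 10.67, "AR": 28.00,
    "MSD": 19.08, "SR": 4.00, "RJ": 8.00, "RA": 26.00,
    "MS": None, "MoS": None, "UY": None,
}


def available_mean(averages: dict[str, Optional[float]]) -> float:
    """Mean over the players that have an average; the others are skipped."""
    values = [v for v in averages.values() if v is not None]
    if not values:
        raise ValueError("no player has an average")
    return sum(values) / len(values)


i1 = available_mean(india_scg)
print(f"I1 = {i1:.2f}")  # prints "I1 = 24.26"
```

Dividing by the number of players who actually have an average (8 here), rather than by the full 11, is what the text describes as "number of players having their averages".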



AUSTRALIA:

Table 3:

NAMES                                  AVERAGE
1] Aaron Finch (AF)                    06.00
2] David Warner (DW)                   95.00
3] Steve Smith (SS)                    28.00
4] Glenn Maxwell (GM)                  -
5] Shane Watson (SW)                   01.00
6] Michael Clarke [C] (MC)             22.25
7] James Faulkner (JF)                 01.00
8] Brad Haddin (BH)                    -
9] Mitchell Johnson (MJ)               -
10] Mitchell Starc (MS)                -
11] Josh Hazlewood (JH)                -

The sum of the available players' averages divided by the number
of players having an average:

A1 = (AF+DW+SS+SW+MC+JF)/6 = 25.54

n = number of available players

x_i = average of each player

$A_1 = \frac{1}{n}\sum_{i=0}^{n-1} x_i$    (2)

Graph 1 is plotted from the two tables (I1, A1).

Graph 1:

In the above graph, the x axis shows the number of matches played
against each other on that particular ground (the SCG in this
example) and the y axis shows the average against each other.

The RED line partitions the players who have performed below the
average from those who have performed above the average.

2nd, we consider India's and Australia's performance against each
other in the country where the match is being played, that is,
Australia:

INDIA:

Table 4:

NAMES                                  AVERAGE
1] Rohit Sharma (RS)                   38.04
2] Shikhar Dhawan (SD)                 32.48
3] Virat Kohli (VK)                    44.08
4] Ajinkya Rahane (AR)                 42.07
5] Mahendra Singh Dhoni [C] (MSD)      39.60
6] Suresh Raina (SR)                   28.21
7] Ravindra Jadeja (RJ)                15.51
8] Ravichandran Ashwin (RA)            11.95
9] Mohammad Shami (MS)                 -
10] Mohit Sharma (MoS)                 -
11] Umesh Yadav (UY)                   06.05

The sum of the available players' averages divided by the number
of players having an average:

I2 = (RS+SD+VK+AR+MSD+SR+RJ+RA+UY)/9 = 28.71

n = number of available players

x_i = average of each player

$I_2 = \frac{1}{n}\sum_{i=0}^{n-1} x_i$    (3)
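
As a sketch of how the K-means step from the algorithmic survey could split the batsmen of Table 4 into two groups (k = 2), the code below runs Lloyd's algorithm on the one-dimensional averages. The paper does not state how the centroids are initialized; starting from the smallest and largest value is an assumption made here to keep the run deterministic, and `kmeans_1d` is a hypothetical helper name.

```python
def kmeans_1d(values, iterations=100):
    """Lloyd's algorithm for k = 2 on one-dimensional data.

    Assumption: the two centroids start at the minimum and maximum value.
    """
    if min(values) == max(values):
        raise ValueError("need at least two distinct values")
    centroids = [min(values), max(values)]
    clusters = [[], []]
    for _ in range(iterations):
        # Assignment step: each value joins the cluster with the nearest mean.
        clusters = [[], []]
        for v in values:
            nearest = 0 if abs(v - centroids[0]) <= abs(v - centroids[1]) else 1
            clusters[nearest].append(v)
        # Update step: recompute each centroid as the mean of its cluster.
        updated = [sum(c) / len(c) for c in clusters]
        if updated == centroids:
            break  # assignments no longer change
        centroids = updated
    return centroids, clusters


# Available batting averages of the Indian players in Australia (Table 4).
india_in_australia = [38.04, 32.48, 44.08, 42.07, 39.60, 28.21, 15.51, 11.95, 6.05]

centroids, (low, high) = kmeans_1d(india_in_australia)
print("lower cluster:", sorted(low))
print("upper cluster:", sorted(high))
```

With this initialization the three lowest averages (6.05, 11.95 and 15.51) form one cluster and the remaining six form the other. This is similar in spirit to the red line that separates below-average from above-average performers, although a K-means boundary need not coincide with the mean.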



AUSTRALIA:

Table 5:

NAMES                                  AVERAGE
1] Aaron Finch (AF)                    46.08
2] David Warner (DW)                   26.65
3] Steve Smith (SS)                    71.22
4] Glenn Maxwell (GM)                  34.71
5] Shane Watson (SW)                   38.61
6] Michael Clarke [C] (MC)             38.06
7] James Faulkner (JF)                 86.31
8] Brad Haddin (BH)                    35.05
9] Mitchell Johnson (MJ)               11.58
10] Mitchell Starc (MS)                -
11] Josh Hazlewood (JH)                -

The sum of the available players' averages divided by the number
of players having an average:

A2 = (AF+DW+SS+GM+SW+MC+JF+BH+MJ)/9 = 43.14

n = number of available players

x_i = average of each player

$A_2 = \frac{1}{n}\sum_{i=0}^{n-1} x_i$    (4)

Graph 2 is plotted from the given tables (I2, A2).

Graph 2:

In the above graph, the x axis shows the number of matches played
against each other on that particular ground (the SCG in this
example) and the y axis shows the average against each other.

The RED line partitions the players who have performed below the
average from those who have performed above the average.

3rd, we add the results of both tables, that is I1 + I2 and
A1 + A2, to get I3 and A3 respectively:

I3 = I1 + I2            A3 = A1 + A2
   = 24.26 + 28.71         = 25.54 + 43.14
   = 52.97                 = 68.68

So, considering only the batting aspect, India and Australia
obtain scores of 52.97 and 68.68 respectively. The system reads
these as winning chances: considering only the batting, Australia
has a 68.68% chance of winning, which is higher than India's
52.97%.

7. MATHEMATICAL REPRESENTATION OF PROPOSED SYSTEM

S = {Input, Output, Success, Failure}

where,

Input = {I1, I2, I3}

I1 = Select the two teams

I2 = Select the ground and the country

I3 = Select the 11 team players

Output = {O1, O2}

O1 = Main prediction

O2 = Result of fluctuating prediction

Success = When the result of the main prediction matches the
result of the fluctuating prediction.

Failure = When the result of the fluctuating prediction differs
from the main prediction in the aspect of the winning team.
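
The Output, Success and Failure parts of S can be sketched in code as follows. This is an illustrative model only: the class name `PredictionResult`, the helper `main_prediction` and the value supplied for the fluctuating prediction are assumptions, while the two batting scores (52.97 for India and 68.68 for Australia) are taken from the worked example.

```python
from dataclasses import dataclass


@dataclass
class PredictionResult:
    main_winner: str        # O1: winner according to the main prediction
    fluctuated_winner: str  # O2: winner according to the fluctuating prediction

    @property
    def success(self) -> bool:
        """Success when O1 and O2 name the same winning team, else Failure."""
        return self.main_winner == self.fluctuated_winner


def main_prediction(scores: dict[str, float]) -> str:
    """Pick the team with the higher aggregate score (batting aspect only)."""
    return max(scores, key=scores.get)


# Batting scores from the worked example (52.97 for India, 68.68 for Australia).
batting_scores = {"India": 52.97, "Australia": 68.68}
o1 = main_prediction(batting_scores)

# Hypothetical O2: in the proposed system this would come from real-time data.
result = PredictionResult(main_winner=o1, fluctuated_winner="Australia")
print(o1, "success" if result.success else "failure")
```

Modelling Success as a derived property rather than a stored field keeps it consistent with O1 and O2 by construction.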



8. CONCLUSION AND FUTURE SCOPE
For the example considered, the prediction of our system based on
logistic regression agrees with the actual result: Australia won
the match by 95 runs.

In future work we can give predictions for other formats of
cricket, such as Test matches and Twenty20. We can also give
predictions for major series such as the Champions Trophy, the
Ashes and the World Cups (50-50, 20-20). Predictions can also be
made when a match is abandoned due to rain, bad light, etc.

9. REFERENCES
[1] Tianxiang Cui, Jingpeng Li, J. R. Woodward and A. J. Parkes,
"An ensemble based Genetic Programming system to predict English
football premier league games," 2013 IEEE Conference on Evolving
and Adaptive Intelligent Systems (EAIS), Singapore, 2013, pp.
138-143.

[2] J. Hucaljuk and A. Rakipović, "Predicting football scores
using machine learning techniques," 2011 Proceedings of the 34th
International Convention MIPRO, Opatija, 2011, pp. 1623-1627.

[3] S. Dobravec, "Predicting sports results using latent
features: A case study," 2015 38th International Convention on
Information and Communication Technology, Electronics and
Microelectronics (MIPRO), Opatija, 2015, pp. 1267-1272.

[4] J. Pan, "Tennis Match Prediction Model Based on Improved D-S
Evidence Theory," 2016 International Conference on Robots &
Intelligent System (ICRIS), Zhangjiajie, 2016, pp. 414-417.

[5] T. L. W. Walls and E. J. Bass, "A regression-based predictive
model of student attendance at UVA men's basketball games,"
Proceedings of the 2004 IEEE Systems and Information Engineering
Design Symposium, 2004., Charlottesville, VA, 2004, pp. 203-208.

[6] D. Miljković, L. Gajić, A. Kovačević and Z. Konjović, "The
use of data mining for basketball matches outcomes prediction,"
IEEE 8th International Symposium on Intelligent Systems and
Informatics, Subotica, 2010, pp. 309-312.

[7] K. Trawinski, "A fuzzy classification system for prediction
of the results of the basketball games," International Conference
on Fuzzy Systems, Barcelona, 2010, pp. 1-7.

[8] J. Gumm, A. Barrett and G. Hu, "A machine learning strategy
for predicting march madness winners," 2015 IEEE/ACIS 16th
International Conference on Software Engineering, Artificial
Intelligence, Networking and Parallel/Distributed Computing
(SNPD), Takamatsu, 2015, pp. 1-6.

[9] Y. Saito, M. Kimura and S. Ishizaki, "Real-time prediction to
support decision-making in soccer," 2015 7th International Joint
Conference on Knowledge Discovery, Knowledge Engineering and
Knowledge Management (IC3K), Lisbon, 2015, pp. 218-225.

[10] B. G. Aslan and M. M. Inceoglu, "A Comparative Study on
Neural Network Based Soccer Result Prediction," Seventh
International Conference on Intelligent Systems Design and
Applications (ISDA 2007), Rio de Janeiro, 2007, pp. 545-550.

[11] K. Wickramaratna, Min Chen, Shu-Ching Chen and Mei-Ling
Shyu, "Neural network-based framework for goal event detection in
soccer videos," Seventh IEEE International Symposium on
Multimedia (ISM'05), 2005, pp. 8 pp.



[12] D. Comaniciu, V. Ramesh, and P. Meer, "Player Tracking using
prediction after intersection based particle filter for Volley
Ball match," IEEE Transactions on, Vol. 25, No. 5, pp. 564-577,
May 2003.

[13] Z. Yijie and X. Jun, "Competition Results Prediction Model
Based on Athlete Ability Data Simulation and Analysis," 2015
Sixth International Conference on Intelligent Systems Design and
Engineering Applications (ISDEA), Guiyang, 2015, pp. 222-225.

[14] B. Zhao and L. Chen, "Prediction Model of Sports Results
Base on Knowledge Discovery in Data-Base," 2016 International
Conference on Smart Grid and Electrical Automation (ICSGEA),
Zhangjiajie, 2016, pp. 288-291.

[15] A. McCabe and J. Trevathan, "Artificial Intelligence in
Sports Prediction," Fifth International Conference on Information
Technology: New Generations (itng 2008), Las Vegas, NV, 2008, pp.
1194-1197.

[16] R. Vuillemot and C. Perin, "Sports Tournament Predictions
Using Direct Manipulation," in IEEE Computer Graphics and
Applications, vol. 36, no. 5, pp. 62-71, Sept.-Oct. 2016.

[17] Z. Bo, Q. Chaoling, X. Xiaoli and Z. Fanbo, "GM (1,1) Model
Gray Prediction for the Gold-Medal Result of Women's Put Shot in
the 30th Olympic Games," 2011 International Conference on Future
Computer Science and Education, Xi'an, 2011, pp. 334-337.

[18] Q. Wang and Z. Sun, "Trajectory identification of spinning
ball using improved extreme learning machine in table tennis
robot system," 2015 IEEE International Conference on Cyber
Technology in Automation, Control, and Intelligent Systems
(CYBER), Shenyang, 2015, pp. 551-554.

[19] J. Liu, Z. Fang, K. Zhang and M. Tan, "A new data processing
architecture for table tennis robot," 2015 IEEE International
Conference on Robotics and Biomimetics (ROBIO), Zhuhai, 2015, pp.
1780-1785.
