You are on page 1of 12

UMT Artificial Intelligence Review (UMT-AIR)

Volume 1 Issue 1, Spring 2021


ISSN(P): 2791-1276 ISSN(E): 2791-1268
Journal DOI: https://doi.org/10.32350/UMT-AIR
Issue DOI: https://doi.org/10.32350/UMT-AIR/0101
Homepage: https://journals.umt.edu.pk/index.php/UMT-AIR

Journal QR Code:

Article: Predictive Analysis of PSL Match Winner Using Machine


Learning Techniques
Author(s): Muhammad Awais1, Khushal Das2

1
Software Engineer, Audience, Pakistan
Affiliation: 2
Game Developer, Finz Games, Pakistan

Article QR:

A. Muhammad, and D. Khushal, “Predictive analysis of PSL match


winner using machine learning techniques,” UMT Artificial
Citation:
Intelligence Review, vol. 1, pp. 74–85, 2021.
https://doi.org/10.32350/UMT-AIR/0101/05
Copyright
Information: This article is open access and is distributed under the terms of
Creative Commons Attribution 4.0 International License

A publication of the
Dr Hasan Murad School of Management
University of Management and Technology, Lahore, Pakistan
PREDICTIVE ANALYSIS OF PSL MATCH WINNERS
USING MACHINE LEARNING TECHNIQUES
Khushal Das 1*, Muhammad Awais2
ABSTRACT: Cricket is one of the winner with the help of the Random
most celebrated open-air sports Forest algorithm.
generating a huge amount of KEYWORDS: K-Nearest
measurable information. As Pakistan Neighbours (KNN), machine
Super League (PSL) games grow in learning techniques, Naïve Bayes,
popularity, the potential predictors Random Forest, PSL prediction
which influence the outcomes of the
matches need to be investigated. PSL I. INTRODUCTION
data spanning the years 2016-2019 Recent advancements in technology
and containing the specifics of the have allowed the simultaneous
players, match venue details, squads, collection and processing of large
and ball-by-ball details was collected. amounts of information, collectively
Subsequently, it was analyzed to known as big data. Currently, data
draw various conclusions that offer science is being used in almost every
assistance to enhance a player’s walk of life to achieve better and
performance. Due to the expanding consequential results. The goal is to
number of matches day by day, it is use statistical and machine learning
difficult to oversee or extricate algorithms on the available data for
valuable data from the accessible creating computer programs that are
information of all the matches. This able to retrieve and use data for
paper presents the preprocessing of understanding and self-learning. This
the data, as well as its visualization process begins by data observation,
and prediction. It centers on for instance, events, direct knowledge
measuring the results of PSL matches or training, for the reorganization of
by applying the existing information certain statistical trends and to make
mining algorithms. It includes factors informed decisions based on the
such as team 1, team 2, toss winner, samples collected in the future. The
and toss decision to predict the match
1
Department of Information Systems, Dr Hassan Murad School of Management,
University of Management and Technology, Lahore, Pakistan
*Corresponding Author: f2020313013@umt.edu.pk
2
Department of Information Systems, Dr Hassan Murad School of Management,
University of Management and Technology, Lahore, Pakistan
*Corresponding Author: f2020313003@umt.edu.pk
Dr Hasan Murad School of Management
75
Volume 1 Issue 1, Spring 2021
PREDICTIVE ANALYSIS OF PSL MATCH WINNERS USING …

main goal of machine learning is to Portugal's football club Lisboa e


eradicate the need for human Benfica [6-9] is one of their most
interference or help by enabling successful football clubs. It
computers to gain knowledge incorporates machine learning for
automatically and to change their task decision-making based on available
as per requirement. In recent years, information using data processing
developments in computing have and prediction techniques, showing
made this all progressively easier to the significance of machine learning
obtain as well as to process large in sports research. This sports club not
amounts of data containing detailed merely records but also estimates
information. almost everything about players on-
As a result, the availability of both and off-the-field, including their
live and historical data in the field of habits of relaxing, drinking, and
sports analytics has made machine training, practically each part of the
learning very applicable and useful game. Diverse models are designed
[1–5]. Sports analytics is based on the based on raw data to optimize game
gathering and analysis of historical planning and to create personalized
game information in order to extract training timetables. By integrating
essential knowledge from it, needed machine learning and predictive
to facilitate good decision-making. analytics, data analyzed by the
Machine learning can be used developed models enables players to
successfully on multiple occasions in continuously enhance their
sports, both off-the-field and on-the- performance. Depending on the
field. The success of a team and its interpretation of the information
result against another team can be gathered, team manager is assisted in
forecasted effectively through the decisions such as player replacement,
projected model. The proposed model keeping a player in the squad, parting
focuses primarily on the healthy a player on the bench and the time at
development of players and on which a player should be introduced
increasing their competitiveness for onto the field.
the facilitation of team owners and In the current project, the data set
other financiers or investors. The used emerged from the various
analysis was conducted using various matches played, with about 146
classification methods designed for match specifics providing complete
machine learning, for example, information about the winner of the
Gaussian Naive Bayes, K-Nearest match, the winner of the position toss,
Neighbours (KNN) and Random names of the teams and other
Forest. significant characteristics. The
UMT Ar tificial Intelligence
76
Volume 1 Issue 1, Spring 2021
Khushal Das and Muhammad

dataset used in our project contained focused. WASP (Winning and Score
information about the matches of PSL Predictor) [15] is a realistic method
held from the year 2016 through used in cricket. It forecasts the score
2019. Using this data set, we achieved and the possible outcome of a limited
the above mentioned primary over match of cricket, including a T20
objective of our project. or a one-day match. Sky Sports New
Zealand, for the first time, developed
II. LITERATURE REVIEW
such a tool in 2012.
One of the prime sports leagues, Hawk-Eye, which is a software, can
that is, Major Baseball League, has track the ball’s trajectory and visibly
progressed tremendously in the area shows the best statistically important
of sports technology in recent years track. It has also been formally used
by incorporating ball-by-ball in the Umpire Decision Review
statistics and basic awareness into the Method since the year 2009.
game. These are usually not Likewise, this computer-aided
observable otherwise and this is intelligent technology is also used in
where machines come into play. In other sports such as badminton, tennis
order to obtain and use the relevant and snooker. In this paper, we explore
data, experienced teams of technical briefly the relevance of machine
experts implement different machine learning for sports and the methods
learning skills to better train their suggested for its enhancement.
players with improved results.
Some of the categorizing or sorting III. MACHINE LEARNING
problems resolved with the aid of Modern autoencoders are
machine learning in the game of networks that attempt to encode and
baseball [9] include forecasting the With the application of machine
games’ outcomes, categorizing if a learning and Artif icial Intelligence
team will purposely permit a player to (AI), we can address the real-time
walk in to bat, and classifying non- problems of the society. In sports,
fastball pitches by field type. they have a thoughtful influence.
Moreover, sports analytics are also During the game, teams utilize data to
utilized in cricket to predict the result enhance on-the-field performance of
of a match while it is still in progress their players. Such predictive analysis
or even when the match is yet to begin helps them to make directed decisions
[10-14]. Concerns such as predicting and structural adjustments which
the number of wickets taken or the affect various aspects of their sports
runs scored during the match are also association, from recruiting players to
interesting problems which must be

Dr Hasan Murad School of Management


77
Volume 1 Issue 1, Spring 2021
PREDICTIVE ANALYSIS OF PSL MATCH WINNERS USING …

match results, from indulging more the enhancement policy. Arthur


fans to high sales of match ticket. Samuel introduced the smart method
to gaming using artificial intelligence
techniques in the game of checkers
during the 1960s and vouched for the
inclusion of a number of moves for
any player to win against his rival.
The authors demonstrated that
Bayesian networks perform better
Fig.1. Research Work Flow than other machine learning
Sports have been an important factor techniques in the research paper
affecting our culture over the past few “Using mathematics and statistics to
years. Participation in the Olympics comprehend data from baseball,
contributes to community wellbeing football, basketball, and other sports.”
and productivity. Player performance This paper offered a remedy for the
and participation in the match can be manual attempts aimed to improve
enhanced using powerful machine the previously obtained findings.
learning techniques which help to Such prediction of the Bayesian
enhance their previous performance, network delivered a 59.21 percent
while necessary changes can be accurate outcome as compared to the
made. Without the use of machine other approaches.
learning, a player’s strengths and The way gaming can be tracked has
weaknesses cannot be addressed in always been an influential approach
detail and areas for improvement in machine learning. In the baseball
cannot be identified. game paper [8], it was shown that
A fundamental approach to predict a gameplay can be evaluated using
cricket match winner was suggested machine learning techniques, such as
in the paper “Predicting the match SVM and KNN, supported with the
outcome in one day international current procedures. These techniques
cricket matches” [17]. It also involve the retrieval of 75 percent of
addressed how machine learning a match's highlights and the
algorithms would change the future of subsequent application of a deep
how players’ judge their gameplay. assessment method.
This technique has many benefits Improvements in the assessment of
over the current technique, where gameplay help the players to find
manual work is reduced to find faults fresh tactics to secure victory over
in gameplay, making it easier for the their opponents. They also suggest
management to effectively determine how the outcomes of the game can be

UMT Ar tificial Intelligence


78
Volume 1 Issue 1, Spring 2021
Khushal Das and Muhammad

represented pictorially and provide methodology. It reflects numerous


the manner in which a match's stages including the collection of
analysis is given out. During game data, exploration of data, data
prediction, we used supervised cleaning, data processing, and
algorithms such as Random Forest, creation of model evaluation. These
Naïve Bayes and KNN classifier, stages, also explained later in this
which are comprehensively explained paper, are fundamental in our work.
later in this paper. There are a number of algorithms
IV. PROPOSED WORK available to resolve the real-time
problems of machine learning. These
Literature review reveals that it is
possible to build a machine learning algorithms consider the predefined
model that can forecast the results of gameplay input and their former
a match even before its experiences to yield a precise result.
commencement. In cricket, there are These algorithms vary in
different formats, including the T20 performance based on the nature of
format, which have a number of problem.
turnarounds. This scenario makes it
challenging to predict the winner till A. Naïve Byes Classifier
the last ball of the match. Therefore, It comprises various classification
predicting the winner is very algorithms centered on Bayes’
complicated. Regression and theorem.
classification tasks are used to do a
great deal of mathematical work in
sports, all of which are subject to
supervised learning.
Supervised machine learning can be
split into two groups, that is,
classification and regression, based
on performance. This problem is of
classification. Therefore, on the PSL
Fig.2. Naïve Bayes Classifier
dataset, we applied multiple
classification algorithms, analyzed This pool of algorithms shares a
the outcomes and picked the best common principle, which is the
suitable model with the highest strong independence of each pair of
accuracy. features being classified. Bayes’
For the current research work, the theorem calculates posterior
above figure reflects the basic probability P(c|x) from P(c), P(x), and
P(x|c). Naive Bayes classifier
Dr Hasan Murad School of Management
79
Volume 1 Issue 1, Spring 2021
PREDICTIVE ANALYSIS OF PSL MATCH WINNERS USING …

presumes that the impact of a machine learning algorithm which


predictor’s (x) value on a given class can be implemented to solve both the
(c) is independent of the values of problem of regression and the
other predictors. This is referred to as
problem of classification.
the conditional independence of that
class.
B. Randon Forest
Random Fortvfest, primarily based
on Decision Tree, is the most eminent
machine learning algorithm and
includes multiple decision trees. It
obtains prediction from each decision
tree and estimates the best one by
voting. Moreover, by averaging the Fig.3. Distance Functions
result, it reduces overfitting.
The supervised KNN algorithm
The output of classification problems
assumes that in close proximity,
are usually discrete values which are
similar things occur. Hence, it
completely different from each other.
classifies the data based on the
This algorithm includes constructing
distance between two points. Its
a decision tree for every sample and
various applications include intrusion
generating prediction. Finally,
detection, data mining, and pattern
prediction results are voted and one
identification. KNN is non-
with the most votes is selected.
parametric and totally tough. Various
Test error, also known as expected
distance functions used in this
prediction error, can be computed any
algorithm are given below in
time using equation (2) mentioned
equation (3).
below, where E is the error and L
represents the values of the data. V. METHODOLOGY
A. Data Collection
Data was obtained from
C. K-Nearest Neighbors cricsheet.org [20]. On this website,
K-nearest neighbors (KNN) data is available in YAML format,
algorithm is a simple, yet powerful which is a human-readable format.

UMT Ar tificial Intelligence


10
80
Volume 1 Issue 1, Spring 2021
Khushal Das and Muhammad

There is a different file for each null values that affect the result.
match containing information about Data preprocessing yields a format
PSL season year, details of match that is easier to handle while using
winner, player of the match, match different algorithms.
number, location, name of the
1. Data Cleaning
stadium, umpires’ particulars,
participating teams, and margin of Data may contain noise and missing
winning. Since PSL is only five values for different columns due to
years old, the files of only 146 error(s) in recording or parsing. This
matches played during the five results in improper classification.
seasons are available. The data Hence, we replaced these values
contains null values. For prediction with dummy values such as mean
analysis, certain attributes may not and mode, depending on column
be needed. types. Further, we removed columns
that were completely null.
B. Data Pre-Processing
2. Choosing Required Attributes
Since the available data was in
YAML format and contained a In this step, we dropped certain
different file for each match, it columns of the data that did not help
needed strong preprocessing. First, in prediction or only resulted in
we read it into python and decreased accuracy. Feature
normalized it using python library importance was used to point out
known as json [999]. Then, we such columns.
converted it to dataframe [999] 3. Model Evaluation and
using another library called pandas Development
[999]. Afterwards, we concatenated We applied different classification
all of these files to construct a single algorithms including KNN,
dataframe [999] containing the data Random Forest and Naïve Bayes
of all matches. Subsequently, we did classifier. The basic procedure
exploratory data analysis to identify followed in applying all of these
any existing anomalies. Usually, algorithms is given below.
datasets contain flaws which need to 1. for for each sample iteration do
be addressed before applying the 2. Split data into train and test set.
algorithms. There could be several
Dr Hasan Murad School of Management
75
81
Volume 1 Issue 1, Spring 2021
PREDICTIVE ANALYSIS OF PSL MATCH WINNERS USING …

3. Train the model on training set representation of confusion matrix


using all features. is as follows:
4. Predict on the test set. Table.1. Confusion Matrix
5. Calculate features’ ranking. Confusion Matrix Predicted
6. for for each subset of features
Positive Negative
Si, where i = 1 to S do
7. Keep the Si most important Actual Positive True
Positive
False
Negative
features. Actual Negative False True
Positive Negative
8. Train the model on training act
using Si features. a) True Positive is the number of
9. Predict on the teat set, correct predictions that an
10. end instance is negative.
11. end b) False Negative is the number of
incorrect predictions that an
12. Calculate the performance of
instance is positive.
the model over Si using held- c) False Positive is the number of
beck samples. incorrect predictions that an
13. Identify an appropriate number instance is negative.
of features. d) True Negative is the number of
14. Identify the final list of features correct predictions that an
to keep in the model. instance is positive.
15. Train the model using the Precision: It is the ratio of correctly
optimal set of features on the predicted positive instances to the
original training set. total number of positively predicted
observations.
VI. RESULTS AND DISCUSSION 𝑇𝑇𝑇𝑇𝑇𝑇𝑇𝑇 𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃
𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃 =
PSL dataset was used for 𝑇𝑇𝑇𝑇𝑇𝑇𝑇𝑇 𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃 + 𝐹𝐹𝐹𝐹𝐹𝐹𝐹𝐹𝐹𝐹 𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃
Recall: It is the ratio of correctly
applying different classification
predicted positive observations to
algorithms and Random Forest gave
total observations.
the highest accuracy. The top two 𝑇𝑇𝑇𝑇𝑇𝑇𝑇𝑇 𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃
𝑅𝑅𝑅𝑅𝑅𝑅𝑅𝑅𝑅𝑅𝑅𝑅 =
highest accuracies were 50% and 𝑇𝑇𝑇𝑇𝑇𝑇𝑇𝑇 𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃 + 𝐹𝐹𝐹𝐹𝐹𝐹𝐹𝐹𝐹𝐹 𝑁𝑁𝑁𝑁𝑁𝑁𝑁𝑁𝑁𝑁𝑖𝑖𝑖𝑖𝑖𝑖

60%, yielded by KNN and Random F1-Score: It is the weighted average


Forest, respectively. Classification of recall and precision.
𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃 ∗ 𝑅𝑅𝑅𝑅𝑅𝑅𝑅𝑅𝑅𝑅𝑅𝑅
report comprises F1score, recall, 𝐹𝐹1 = 2 ×
𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃 + 𝑅𝑅𝑅𝑅𝑅𝑅𝑅𝑅𝑅𝑅𝑅𝑅
precision and accuracy. The
UMT Ar tificial Intelligence
82
Volume 1 Issue 1, Spring 2021
Khushal Das and Muhammad

Accuracy: It is the average of recall


and precision.
𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃 + 𝑅𝑅𝑅𝑅𝑅𝑅𝑅𝑅𝑅𝑅𝑅𝑅
𝐴𝐴𝐴𝐴𝐴𝐴𝐴𝐴𝐴𝐴𝐴𝐴𝐴𝐴𝐴𝐴 =
2
The model has an accuracy of
60.11% and classification report is Figure.4.Obtained Accuracy
given in Table 2. The accuracy for
VII. CONCLUSION
the prediction of match winner is
shown for all the teams. Prediction of the winner in sports,
specifically cricket, is a really
Table.2. Match Winner Prediction
precision recall f1-score
complex and challenging task.
1 0.75 1 0.86 However, by applying machine
learning algorithms, it could be
3 0.67 1 0.8
4 1 1 1
5 1 0.71 0.83
6 0.5 0.5 0.5 made less complicated. In this
accuracy 0.8 15 research, different factors
macro avg 0.78 0.84 0.8
weighted avg 0.84 0.8 0.8 influencing the outcomes of PSL
The accuracy of the three algorithms matches were identified. The factors
is compared in Table 3 below. that considerably affect the results
Table.3.Accuracy Comparison of a PSL match include toss winner,
Type KNN Gaussian Random toss decision, participating teams,
Naive Forest
Bayes
stadium, and location.
Obtained 49.99% 19.86% 60.11% We designed a generic model based
Accuracy
on team 1 and team 2, toss winner,
From Table 3, it is eminent that toss decision, and venue and also
Random Forest has the highest predicted the match winner. Three
accuracy, followed by KNN classification models were trained
classifier. Gaussian Naive Bayes on the PSL dataset. These included
and other classifiers performed Gaussian Naive Bayes, KNN and
poorly in predicting the outcomes of Random Forest. Among these
PSL matches. Graphical algorithms, Random Forest
representation of model classifier gave the best result with an
performances is depicted below. accuracy of 60.11%.
For future work, the plan is to add
other attributes, such as the recent
Dr Hasan Murad School of Management
83
Volume 1 Issue 1, Spring 2021
PREDICTIVE ANALYSIS OF PSL MATCH WINNERS USING …

history of the pitch, match day 6. D. M. Robert Rein, "Big data


conditions, and the current form and and tactical analysis in elite
experience of the players of both soccer: future challenges and
teams. The model designed in our opportunities for sports
science," Springerplus, 2016.
project can be modified to be used
for other sports, such as football and
7. H. Ghasemzadeh and R. Jafari,
"Coordination Analysis of
baseball. Human Movements With Body
References Sensor Networks: A Signal
1. R. Lamsal and A. Choudhary, Processing Model to Evaluate
Predicting Outcome of Indian Baseball Swings," IEEE
Premier League (IPL) Matches Sensors Journal , pp. 603 - 610,
Using Machine Learning, 2011.
Melbourne: University of 8. T. A. Severini, Analytic
Melbourne, 2018. Methods in Sports: Using
2. M. Bailey and S. R. Clarke, Mathematics and Statistics to
"Predicting the Match Outcome Understand Data from
in One Day International Baseball, Football, Basketball,
Cricket Matches, while the and Other Sports, New York:
Game is in Progress," Journal taylor & francis Group, 2014.
of Sports Science & Medicine, 9. R. R. a. E. Feustel,
2006. "FORENSIC SPORTS
3. Predicting the Winner in One ANALYTICS: DETECTING
Day International Cricket , AND PREDICTING MATCH-
"Predicting the Winner in One FIXING IN TENNIS," Journal
Day International Cricket," of Prediction Markets, pp. 77-
Journal of Mathematical 95, 2014.
Sciences & Mathematics 10. A. D. Mahanth K. Gowda,
Education. "Bringing IoT to Sports
4. C. M. Gil Fried, A data-driven Analytics," NSDI, 2017.
approach to sport business and 11. K. Goldsberry, "CourtVision:
management, London: New Visual and Spatial
Routledge, 2016. Analytics for the NBA,"
5. T. H. Davenport, "What Harvard University,
Businesses Can Learn From Cambridge.
Sports Analytics," MIT Sloan 12. R. Lamsal and A. Choudhary,
Review, 2014. "Predicting Outcome of Indian
UMT Ar tificial Intelligence
84
Volume 1 Issue 1, Spring 2021
Khushal Das and Muhammad

Premier League (IPL) Matches 4th ACM Multimedia Systems


Using Machine Learning," Conference, 2013.
2018. 14. e. a. Rabindra Lamsal,
13. P. Halvorsen, S. Sægrov, D. K. "Predicting Outcome of Indian
C. Kristensen and A. Premier League (IPL) Matches
Mortensen, "Bagadus: An
Using Machine Learning,"
integrated system for arena
sports analytics - A soccer case DeepAI, 2018.
study," in Proceedings of the

Dr Hasan Murad School of Management


85
Volume 1 Issue 1, Spring 2021

You might also like