
BUSINESS ANALYTICS

CIA-1

Deepak Arora (1820308)
Sai Kiran Reddy (1820380)
Sai Hrithik (1820381)
Paras Godara (1820365)
Yash Sharma (1820383)
Sparsh Rastogi (1820332)
3BBA-C
INTRODUCTION:

Sports analytics is the collection of relevant historical statistics that, when properly applied, can provide a competitive advantage to a team or individual. Through the collection and analysis of these data, sports analytics informs players, coaches and other staff in order to facilitate decision making both during and prior to sporting events. The term "sports analytics" was popularized in mainstream sports culture following the release of the 2011 film Moneyball, in which Oakland Athletics General Manager Billy Beane (played by Brad Pitt) relies heavily on the use of analytics to build a competitive team on a minimal budget.
There are two key aspects of sports analytics: on-field and off-field analytics. On-field analytics deals with improving the on-field performance of teams and players, digging deep into aspects such as game tactics and player fitness. Off-field analytics deals with the business side of sports: it focuses on helping a sports organization or governing body surface patterns and insights in data that can increase ticket and merchandise sales, improve fan engagement, and so on. Off-field analytics essentially uses data to help rightsholders make decisions that lead to higher growth and increased profitability.
As technology has advanced in recent years, data collection has become more in-depth and can be conducted with relative ease. Advancements in data collection have allowed sports analytics to grow as well, leading to the development of advanced statistics and sport-specific technologies that let teams run game simulations prior to play, improve fan acquisition and marketing strategies, and even understand the impact of sponsorship on each team and its fans.
Another significant impact sports analytics has had on professional sports relates to sports gambling. In-depth sports analytics has taken sports gambling to new levels; whether in fantasy sports leagues or nightly wagers, bettors now have more information at their disposal to aid decision making. A number of companies and webpages have been developed to provide fans with up-to-the-minute information for their betting needs.
IMPORTANCE-

Professional sports have, over time, become intensely competitive, and a single minute can change the course of a game. Sports teams now have highly loyal fan bases whose followers demand detailed information. Agencies and team players are also now realizing the need for proper performance tracking, so that corrective measures can be taken after studying accurate performance metrics. As a result, sports managements are competing to gain a competitive edge over their peers in the field of play; hence the surge in the need for sports analytics. Sports analytics was brought to the public eye by the movie Moneyball, a 2011 sports-drama film that portrayed how a baseball general manager, Billy Beane, rebuilt his team against all odds using empirical data and statistical analyses of players' performance. His trial with sabermetrics changed the way the game is played forever and made analytics a dream for many. Dr. Lashbrook, Founder and President of Sports Management Worldwide, mentioned that "The frontier of Analytics is just beginning and there is no end in sight to its potential. (Sports) Analytics is a lucrative field with unlimited opportunities." Today, not just in baseball but in football, hockey, soccer, etc., teams have at least one analyst crunching team data.
PROGRESS IN SPORTS ANALYTICS:
Data within a sports organization would normally consist of player and team summaries, performance statistics, video clips, etc., but now the data comes from many varied sources that have grown tremendously in the past few years. The explosion of sports science has made the health and nutrition tracking of players much more sophisticated: trainers and medical personnel now maintain their own datasets, which have become an important player evaluation asset. Apart from statistics, today's sports analytics also relies on SQL technologies (as opposed to traditional hand-written archives), machine learning algorithms, data mining and other techniques that fall under the umbrella of predictive analytics. Improvements in computing capabilities and developments in inferential statistics have opened a very new paradigm in the field of sports. Tech companies like Zebra Technologies and STATSports have come up with player tracking devices that capture metrics in real time, both on the field and during training, providing timely insights and bespoke training plans. Such technologies have been instrumental in reducing player injuries, improving strategy formulation, and enhancing performance. These devices are products of advanced embedded technologies, cloud services, and powerful processors. On the academic side, many universities, active sports communities and management schools offer programmes in sports analytics. These institutes have recognized the need and interest for such courses and are now offering fellowship programmes for research in sports analytics. For aspirants in this field, understanding the statistics is one facet of sports expertise, but translating the data into solutions and insights that help teams strategize remains the key skill.
Managing In-play Run Chases in Limited Overs Cricket Using Optimized
CUSUM Charts-
INTRODUCTION-

Cricket, as the age-old cliché goes, is 'a game of glorious uncertainties'. While uncertainty is an important part, indeed an essential ingredient, of any sport, the multitude of factors that can influence a game of cricket arguably outnumbers those of most other games. It is an extremely delicate task to build a model with multiple parameters and then tune those parameters to perfection. Our paper attempts to simplify the task of model building for analyzing the outcome during the second innings of limited overs cricket matches by proposing a mechanism for in-play analysis based only on the runs scored in each ball. We demonstrate that the target set for the chasing team at the end of the first innings, together with a careful analysis of the ball-by-ball data on runs scored during the second innings, can help us capture the complexity of a run chase.

Our paper builds on the concepts of Cumulative Sum (CUSUM) control charts, where chart parameters are optimized using a genetic algorithm to achieve maximum accuracy. CUSUM charts take into account both past and present information and hence are considered more powerful in detecting small changes in the sample data. Our work is inspired by this property of CUSUM, which helps us detect minute changes in run-chasing patterns. We have to define a target for the CUSUM charts. Here, we have considered the required runs per remaining ball as the target, and therefore it is updated continuously. The advantage of making the target dynamic is that the effects of the loss of wickets and overs remaining are automatically incorporated into the target: a loss of wicket generally leads to a drop in the runs scored per ball over the next few overs, which increases the required run rate. Nonetheless, we wanted to determine whether the direct introduction of wickets lost and overs remaining into the model would result in any improvement in accuracy. To incorporate these effects, we considered the two parameters as resources for the batting team and updated the required runs, i.e., the target, by a factor developed using values from the Duckworth-Lewis-Stern table (Duckworth and Lewis, 1998; Stern, 2009). However, we did not find any significant difference in accuracy in the outcome analysis for the modified model. We have worked with a data set of over 1,700 limited overs international matches (roughly 0.3669 million ball-by-ball data points) spanning a period of twelve years (2006 to 2017). We have been able to achieve around an 80% accuracy level from the halfway mark (25th over) in the second innings of ODIs. For the same conditions, we could achieve an accuracy level of around 71% in the case of T20s (see Table 4), which are indeed more difficult to analyze. If we consider the 80% completion stage for the matches (40 overs for ODIs and 16 overs for T20s), our model accuracy is around 87% and 82% respectively. We believe that our paper adds to the literature on in-play forecasting by proposing a simple yet effective framework to analyze the outcome of a cricket match. It also offers a novel approach to the application of control charts to analyze the outcome of a sport. To the best of our knowledge, there is no mention of a similar approach to the problem of in-play analysis in the sports literature.
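To make the dynamic target concrete, the following is a minimal sketch (our own illustration, not code from the paper) of how the required runs per remaining ball could be recomputed after every ball:

```python
def required_runs_per_ball(target, runs_scored, balls_bowled, total_balls=300):
    """Required runs per remaining ball (total_balls is 300 for an ODI, 120 for a T20)."""
    balls_left = total_balls - balls_bowled
    if balls_left <= 0:
        return float("inf")  # innings over; no balls left to score from
    return (target - runs_scored) / balls_left

# Example: chasing 280 in an ODI with 150 scored after 30 overs (180 balls)
print(required_runs_per_ball(280, 150, 180))  # 130/120 ≈ 1.083 runs per ball
```

Because the denominator shrinks every ball while the numerator only falls when runs come, a slow over automatically pushes the target up, which is exactly the effect described above.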

DATA COLLECTION-

Our data set comprises a total of 1180 ODI and 537 T20 matches played between 2006 and 2017.

There are 604 wins and 576 losses for the chasing teams in ODIs, and 264 wins and 273 losses for the chasing teams in T20s. We have excluded matches that were either abandoned or tied, or affected by rain and other reasons where the DLS method was used to modify the target. Around 40% of the matches were randomly selected and used to identify the optimal CUSUM model parameter values, and the rest were used to predict and validate the models. Details about the data set are presented in Table 1. From the table, we calculated the current and required run rates for the second innings in terms of average runs per ball. The calculation of the other control chart parameters in the context of a cricket match is discussed in the following subsections.

CUSUM chart for analyzing outcome in cricket

Table 1: Match break-up for GA modelling and prediction

                      ODI                      T20
                      Matches     Balls        Matches     Balls
GA modelling          481         124293       223         24832
Prediction            699         181747       314         35683
Matches excluded      199                      39

A cricket match can be considered evenly poised as long as the control chart values, based on the gap between the current run rate and the target run rate, are within the control limits. An upward shift in the value above the upper limit indicates an increase in the chances of winning for the chasing team, whereas a downward shift below the lower limit indicates the opposite. Our data set shows that for most of the cases, the shift from the target is much smaller than the standard deviation of runs per ball. As the Shewhart chart for averages is very effective only if the magnitude of the shift is 1.5σ or higher (Montgomery, 2010), that chart is not effective for this type of data set. CUSUM charts are far better alternatives when such small, consistent changes are important (Hawkins and Olwell, 1998). The CUSUM chart plots cumulative sums of deviations of the sample values from a target value, i.e., the chart is created by plotting the cumulative sum from the first sample set to the current sample set, denoted by

Si = Σ(j = 1 to i) (xj − T)

In other words, Si combines information from all past and current sample sets. In the traditional use of CUSUM charts, a process is considered under control with respect to the target T as long as Si fluctuates around zero. However, if the mean shifts upwards or downwards, it indicates a positive or negative trend, and we consider that as evidence of a shift in the process mean. To determine the significance of the shift, one can use a combination of two one-sided CUSUMs, called the UPPER and LOWER CUSUM.
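As a quick illustration of the textbook statistic Si (our own sketch, with made-up numbers), the cumulative sum of per-over deviations from a fixed target can be computed as follows:

```python
import numpy as np

def cusum(samples, target):
    """Textbook CUSUM statistic: S_i = sum over j <= i of (x_j - T)."""
    return np.cumsum(np.asarray(samples) - target)

# Average runs per ball in five overs, against a fixed target of 1.2 runs/ball
per_over_avgs = [0.8, 1.0, 1.5, 2.0, 0.5]
print(cusum(per_over_avgs, 1.2))  # approx. [-0.4, -0.6, -0.3, 0.5, -0.2]
```

Even though no single over deviates dramatically, the running sum makes the slow start and the late dip visible, which is why CUSUM suits small, persistent shifts.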

List of notations:

K = Maximum possible number of overs in an innings (K = 50 for ODI and K = 20 for T20 matches)

M = Total number of overs played in the second innings

Z = Total number of overs considered for analysis

B = Total number of overs of data used for analysis = Minimum(Z, M)

In other words, if the total number of overs played in the second innings is less than the number of overs considered for analysis, we only used the limited data points available for analysis.

ni = Number of balls played in over i

xij = Runs scored in the jth ball of ith over

xi = Average runs per ball scored in over i

Then, the value of the upper one-sided CUSUM in over i is

UCi = Max{0, UCi−1 + xi − (Ri + ku × σi/√ni)}   (1)

Similarly, the value of the lower one-sided CUSUM in over i is

LCi = Min{0, LCi−1 + xi − (Ri − kl × σi/√ni)}   (2)

with LC0 = UC0 = 0.

Eqs. 1 and 2 need some explanation in the context of their application to cricket. Note that (Ri + ku × σi/√ni) and (Ri − kl × σi/√ni) are required targets adjusted for σi, the standard deviation of runs per ball. σi increases if there is variation in the within-over score across different balls of the over; for example, in a high-scoring innings, more dot balls or singles may increase σi values. Such variation will reduce UCi scores and increase LCi scores. Now, the CUSUM value in any over i can be mapped to the gap between the average runs scored and the adjusted required target. A positive upper CUSUM in any particular over implies that the sum of the upper CUSUM value in the previous over and the average runs scored in the current over is higher than the requirement, (Ri + ku × σi/√ni). Similarly, a negative lower CUSUM in any particular over signifies that the sum of the lower CUSUM value in the previous over and the average runs scored in the current over is lower than the requirement, (Ri − kl × σi/√ni). Also note that the upper CUSUM cannot be negative, and the lower CUSUM cannot be positive. Additionally, we also need to define the limits that would confirm whether the gap (i.e., the CUSUM value at either side) in any particular over is significant or not. Hence, we need to compare those high and low CUSUM values with control limits. To perform the comparison, below we define the upper and lower control limits.


UCLCUi = hu × σi/√ni   (3)

LCLCUi = −hl × σi/√ni   (4)

The mean, the standard deviation, and the required target are updated at the end of each over. The values of Ri, LCi, UCi, LCLCUi, and UCLCUi change at the end of each over as xi, ni, and σi are updated.
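Eqs. 1 to 4 translate directly into code. The sketch below is our own rendering under the notation above (not the authors' implementation); R stands for the dynamic required runs per ball Ri:

```python
import math

def update_cusum(uc_prev, lc_prev, xbar, R, sigma, n, ku, kl, hu, hl):
    """One over's update of the one-sided CUSUMs and control limits (Eqs. 1-4).

    xbar  : average runs per ball in this over (xi)
    R     : required runs per ball before this over (Ri)
    sigma : standard deviation of runs per ball (sigma_i)
    n     : balls bowled in the over (ni); may exceed 6 with extras
    """
    se = sigma / math.sqrt(n)
    uc = max(0.0, uc_prev + xbar - (R + ku * se))  # Eq. 1: upper one-sided CUSUM
    lc = min(0.0, lc_prev + xbar - (R - kl * se))  # Eq. 2: lower one-sided CUSUM
    ucl = hu * se                                  # Eq. 3: upper control limit
    lcl = -hl * se                                 # Eq. 4: lower control limit
    return uc, lc, ucl, lcl
```

Passing n explicitly is what accommodates the unequal sample sizes caused by no-balls and wides, as discussed below.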

The CUSUM charts in our work have three significant points of departure from traditional CUSUM charts. First, we are interested in separately counting points outside the upper and lower control limits after the shifts take place; hence, our objective is to minimize total errors as opposed to minimizing Type II errors. Second, due to the continuous updates to the target and the standard deviation of runs scored, our CUSUM values and decision intervals are updated with each new sample, which makes the chart dynamic in nature. Finally, our charts are created considering the runs scored in each ball of an over, including no-balls and wide-balls, and thus we had to modify the models to accommodate unequal sample sizes. Figure 1 shows an example of a CUSUM chart based on values calculated from ball-by-ball data in the second innings using Eqs. 1 to 4. The match we have considered here is an unsuccessful T20 chase between South Africa and West Indies (batting second) in 2009. In the figure, the dotted lines represent the upper CUSUM (UCi) and lower CUSUM (LCi) values, whereas the smoothed lines represent the upper control limit (UCLCUi) and lower control limit (LCLCUi). For a positive upper CUSUM, if (UCi + xi) is greater than the standard-deviation-adjusted Ri, then the situation is in favor of the batting team. An upper CUSUM value higher than the upper control limit is interpreted as a significant upward shift from the required target. As CUSUM chart points include all previous ball-by-ball data, a significant upward shift is possible only if, over the last few overs, the team has consistently improved its average score. On the other hand, if the upper CUSUM is positive due to a good performance in earlier overs, it may remain positive even when the average score is lower than the required target. For a negative lower CUSUM, the opposite happens: if (LCi + xi) is lower than the standard-deviation-adjusted Ri, then the situation is considered unfavourable for the batting team. Based on this interpretation, we now define the following measures to capture whether the CUSUM values at various overs are significant or not:

CUi = 1 if UCi > UCLCUi, and 0 otherwise   (5)

CLi = 1 if LCi < LCLCUi, and 0 otherwise   (6)

Figure 1: An example of CUSUM charts.

The outcome assessed from these measures is then:

WIN if (CUi − CLi) > 0
LOSS if (CUi − CLi) < 0
No conclusion if (CUi − CLi) = 0   (7)
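Putting Eqs. 5 to 7 together, a hedged sketch of the per-match classification might look like the following. It reuses update_cusum from the sketch above, and reading the rule as applying to the indicators at the last over analyzed is our interpretation, not a detail confirmed by the paper:

```python
def classify_chase(per_over_stats, R_seq, params):
    """Run the chart over the overs analyzed and apply the WIN/LOSS rule.

    per_over_stats : list of (xbar, sigma, n) tuples, one per over
    R_seq          : required runs per ball before each over (Ri)
    params         : dict with keys hl, hu, kl, ku
    """
    uc = lc = 0.0
    cu = cl = 0
    for (xbar, sigma, n), R in zip(per_over_stats, R_seq):
        uc, lc, ucl, lcl = update_cusum(uc, lc, xbar, R, sigma, n,
                                        params["ku"], params["kl"],
                                        params["hu"], params["hl"])
        cu = 1 if uc > ucl else 0   # Eq. 5
        cl = 1 if lc < lcl else 0   # Eq. 6
    diff = cu - cl                  # Eq. 7 decision rule
    return "WIN" if diff > 0 else "LOSS" if diff < 0 else "No conclusion"
```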

Computational analysis using genetic algorithm

We designed two types of CUSUM models, basic and modified (the latter to capture other variables, viz., overs left and wickets left). As the performance of the CUSUM chart depends on the parameters hl, hu, kl, and ku, there is a need to identify optimal levels of these parameters using a subset of the available data. In terms of prediction, percentage accuracy within a group of matches is considered the performance indicator. As discussed earlier, around 40% of the matches are used as the subset for identifying optimal parameter levels (Table 1). If a group of L matches is used for measuring percentage accuracy, then the breakdown of these matches can be found in Table 2.

Now, L = P + Q + W + X + Z, where the total number of correct predictions (matched wins and losses) is P + X.

Overall percentage accuracy (OPA) = (P + X) × 100 / L   (8)

Table 2: Division of matches for comparison between predicted and actual results

                          Actual result
Predicted result          WIN        LOSS
WIN                       P          Q
LOSS                      W          X
No prediction             Z (total)

Percentage accuracy of predicted matches (PAPM) = (P + X) × 100 / (L − Z)   (9)
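A minimal genetic algorithm for tuning (hl, hu, kl, ku) against OPA could be sketched as below. This is our own toy implementation under assumed conventions (population size, mutation rate and parameter bounds are illustrative, and classify_chase is the sketch above), not the authors' optimizer:

```python
import random

def opa(predictions, actuals):
    """Overall percentage accuracy, Eq. 8: (P + X) * 100 / L."""
    correct = sum(p == a for p, a in zip(predictions, actuals))
    return 100.0 * correct / len(actuals)

def genetic_search(matches, generations=50, pop_size=30, bounds=(0.2, 1.5)):
    """Evolve (hl, hu, kl, ku) to maximize OPA on the GA-modelling matches.

    matches: list of (per_over_stats, R_seq, actual_result) tuples in the
    format assumed by classify_chase.
    """
    def fitness(ind):
        params = dict(zip(("hl", "hu", "kl", "ku"), ind))
        preds = [classify_chase(s, r, params) for s, r, _ in matches]
        return opa(preds, [a for _, _, a in matches])

    pop = [[random.uniform(*bounds) for _ in range(4)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]                 # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = random.sample(parents, 2)
            child = [random.choice(genes) for genes in zip(a, b)]  # uniform crossover
            if random.random() < 0.2:                              # Gaussian mutation
                i = random.randrange(4)
                child[i] = min(bounds[1], max(bounds[0],
                               child[i] + random.gauss(0, 0.05)))
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)
```

Note that OPA counts "No conclusion" matches as incorrect, so the search is pushed towards parameters that both decide and decide correctly.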

To see whether our model is biased towards accuracy in win prediction or loss prediction, we have also captured those results separately. Our results show that in the case of ODIs, as the number of overs remaining increases, win accuracy becomes higher than loss accuracy, and as the number of overs remaining decreases, loss accuracy becomes higher than win accuracy. However, both types of accuracy increase as the number of overs remaining decreases. The justification is as follows: in a match where the chasing team has lost most of its batting resources at the start or towards the middle of the innings, the chasing team is likely to lose the match. Though, in general, the effect of wicket loss is immediately reflected in the run-chasing average, a resultant move
Table 3: Optimal CUSUM chart parameters for the basic model

ODI model parameters
Overs remaining    hl        hu        kl        ku
25                 0.8202    0.4791    0.6255    0.4034
20                 1.3143    0.6842    0.5702    0.3413
15                 0.9193    0.8211    0.8299    0.4341
10                 1.0896    0.3302    0.9932    0.7943
05                 1.0294    0.9921    1.0557    0.5824

T20 model parameters
Overs remaining    hl        hu        kl        ku
10                 0.7576    0.5073    0.4921    0.3720
08                 0.3915    0.4380    1.0012    0.3038
06                 0.9790    0.4967    0.8096    0.5410
04                 0.3870    0.5781    1.0720    0.4881
02                 0.5747    0.2496    0.9569    0.5434

Table 4: Percentage accuracy in terms of OPA and PAPM for prediction matches based on optimal chart parameters (win and loss prediction accuracies are also shown separately)

Match type   Accuracy type                Overs remaining
ODI                          25       20       15       10       05
ODI          OPA             78.68    81.40    83.69    85.26    87.12
ODI          PAPM            79.71    83.19    85.28    86.50    88.91
ODI          Win             82.90    85.49    86.40    84.47    87.29
ODI          Loss            77.11    81.20    84.14    88.82    90.63

T20                          10       08       06       04       02
T20          OPA             71.34    75.16    76.11    78.03    82.80
T20          PAPM            75.93    78.93    80.20    81.94    84.14
T20          Win             81.06    77.02    79.62    85.42    84.28
T20          Loss            71.78    81.16    80.85    78.71    84.00

OPA = overall percentage accuracy; PAPM = percentage accuracy of predicted matches.

Compared to individual performances, the partnership is more important. Similarly, effective use of the remaining overs also depends on the opposing team's bowlers, in addition to other factors: the effectiveness of any over in terms of scoring runs will be lower if the opposing bowler is good. Whereas the batting order is known, the bowling order is not known to the chasing team. As a result, it is rational for the chasing team to minimize the utilization of lower-order batting resources and maximize the utilization of the remaining overs. The fall of wickets reduces the resources of the batting team in terms of wickets remaining; hence, the team will try to increase the utilization of the other resource, i.e., overs remaining. However, the risk of not utilizing the remaining overs increases as remaining wickets decrease, since lower-order batting resources will generally have lower batting capability. To make maximum use of the remaining overs, in general, the chasing team will not deviate much from the required target. If a team is losing wickets at regular intervals, particularly in the last few overs, an aggressive chase is more of an exception than the norm. When comparing two situations that differ in the number of remaining wickets, in the situation with fewer remaining wickets the gap between the average runs scored in a particular over and the required target is smaller. In such a case, the UCi value is lower, the chance of UCi crossing the upper control limit is also lower, and the chance of a 'Win' in our model reduces. In other words, CUSUM chart points will behave differently in the case where 100 runs have been scored after the fall of 3 wickets versus 100 runs scored after the fall of 7 wickets, at the end of 10 overs in a T20 match. The argument is also valid for ODI matches. Hence, we can conclude that since the CUSUM chart points of our basic model, UCi and LCi, take care of the combined effect of wickets remaining and overs remaining, the additional benefit of using DL parameters is at best marginal. This validates our proposition that the effects of wickets lost and overs remaining are already captured in the required run rate, and hence need not be considered separately. We also present the win and loss accuracy for the additional models separately in Table 6. The trends of win and loss accuracy percentages show similar behaviour to our basic model; the justification for this behaviour was discussed in the previous section.
CONCLUSION-
In this paper, we present a novel yet simple approach to the problem of analyzing the outcome of a limited overs cricket match using the concept of control charts. We take the CUSUM chart and then use the genetic algorithm to optimize its parameters. The CUSUM chart monitors all the cricket matches considered in the training set to design the reference values and decision intervals of the chart, and then, using the optimized reference values and decision intervals, we capture the possibility of winning or losing a match while the second innings of the match is in progress. The work contributes to the literature in two ways: (i) it demystifies model building for in-play analysis during a run chase, by demonstrating that the combination of the target set and the runs scored in each ball can in itself capture the glorious uncertainties of the game, and (ii) it proposes a novel application of control charts in the field of study on sports outcomes. The basic advantage of CUSUM is that it is very easy to implement. Our method can additionally provide in-play monitoring of the chasing team's run-scoring behavior and present the monitoring process in a user-friendly graphical interface rather than just providing expected-outcome reports. Besides, the chart can immediately issue warnings when the scoring pattern substantially deviates from the required target. This, in turn, allows cricket enthusiasts to identify structural changes in the game pattern. Moreover, the use of the one-sided CUSUM method allows us to visually observe such changes in both directions (expectation regarding the outcome changing from win to loss and vice versa). This work would be relevant to practitioners who are interested in predicting the outcome of a cricket match at various stages of the second innings. While the betting industry is a good fit, the beneficiaries of this work are not restricted to just bookmakers and punters; the players, the coaches and the captains would all benefit from our model. The work would not only help the players identify the critical points during the second innings where it is essential to accelerate the run rate, but also help the coach and the captain formulate strategies for a successful run chase.
Analysis of “Predicting plays” in the National Football League (NFL) from 2014-2018.

Football teams are composed of offensive and defensive players. By virtue of having the ball, the offense dictates the play. Because the defense can only react once the opposing team's offense commences play, the ability to correctly predict the type of play the offense will run can be a game-changing advantage. At the very least, having clues about the type of play the offense will run allows the defense to make better-informed decisions and increases their likelihood of limiting the advancement of the other team. The idea of utilizing prediction for in-game situations has become an increasingly popular research focus in sports analytics (Alamar, 2013). However, the primary objective of previous papers in this area has been to maximize prediction accuracy, which often results in accurate but uninterpretable models. Such an approach is useful for testing how far prediction models have advanced or how easy different outcomes are to predict, but it typically does not translate into a practical approach for utilizing those predictions in an in-game situation. For example, accurate predictions from a neural network model may not be implementable if the technology necessary to communicate the predictions and turn them into actionable decisions is not allowed on the sidelines. The National Football League (NFL) is an example of a league with such sideline technology restrictions.

1. We show that a neural network model generates a maximum prediction accuracy of 75.3% with a 10.6% false negative rate. This prediction accuracy is competitive with the state-of-the-art; false negative rates were not reported for comparable studies.

2. Being the first to view the NFL play prediction problem through the lens of real-world implementation, we devised a simple decision tree model that captures 86% of the accuracy of the complex model.

Data sources-

We obtained our raw data from two sources: (1) play-by-play data from
www.NFLsavant.com, (2) Madden NFL video game player ratings from
maddenratings.weebly.com.
Models to maximize prediction accuracy

To develop an interpretable prediction model, we start by training a family of "complex" models with the goal of maximizing prediction accuracy. This exercise serves two purposes. First, we wish to validate that our predictions are competitive with state-of-the-art complex models. Second, we can use the accuracy achieved by our best-performing complex model as a baseline for the simpler models we develop. Using our full dataset of raw and derived features, we considered the following four models: classification trees, k-nearest neighbors, random forests, and neural networks. The respective hyperparameters for each model were tuned using repeated 10-fold cross-validation over 15 iterations.

Table 1: Prediction accuracies and false negative rates for each of the complex models

                        CART     KNN      Random forest    Neural network
Prediction accuracy     73.3%    71.3%    74.7%            75.3%
False negative rate     11.9%    6.7%     11.1%            10.6%

Because maximizing prediction accuracy was the goal, prediction accuracy was used as the scoring metric for cross-validation. However, we also considered the false positive (i.e., predicting pass when it is a run) and false negative (i.e., predicting run when it is a pass) rates of each model. Given the practical interpretation of these two metrics, we believe the false negative rate is the more important metric to consider: a defense that is expecting a pass will generally be in a better position to respond to a run, compared to a team that is positioned to defend against a run when a pass play is executed instead. Table 1 compares the prediction accuracy and false negative rate of the four models. The neural network has the highest prediction accuracy (75.3%) and is associated with the second lowest false negative rate (10.6%). Recall from the literature that the two closest studies to ours generated prediction accuracies of 75.9% and 75.0%, which suggests that our result is competitive with the state-of-the-art. The other papers did not document their false negative rates, so we cannot comment with certainty about how our rate compares. The importance of each feature in prediction accuracy is highlighted in the appendix.
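The model comparison can be reproduced in outline with scikit-learn. The snippet below is a hedged sketch: synthetic placeholder data stands in for the real play-by-play features, and the hyperparameters are illustrative rather than the tuned values from the study:

```python
import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier

# Placeholder data: swap in the real feature matrix and pass/run labels
rng = np.random.default_rng(0)
X = rng.random((1000, 6))
y = (rng.random(1000) < 0.589).astype(int)  # 1 = pass, 0 = run

models = {
    "CART": DecisionTreeClassifier(max_depth=8),
    "KNN": KNeighborsClassifier(n_neighbors=25),
    "Random forest": RandomForestClassifier(n_estimators=300),
    "Neural network": MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500),
}

for name, model in models.items():
    pred = cross_val_predict(model, X, y, cv=10)  # out-of-fold predictions
    accuracy = np.mean(pred == y)
    fnr = np.mean(pred[y == 1] == 0)  # predicted run when it was a pass
    print(f"{name}: accuracy={accuracy:.3f}, false negative rate={fnr:.3f}")
```

Using cross_val_predict makes both accuracy and the false negative rate come from held-out folds, matching the cross-validated comparison described above.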
Domain knowledge from the Madden video game:

We obtained the overall player rating, which is a weighted sum of ratings along several attributes, for each player from the 2014 to 2017 versions of the Madden NFL video game. These ratings are based on player performance from the previous season, which allows us to use them to predict plays for the upcoming season. For example, Madden 14 is based on data from the 2012-13 season, and we use those ratings to predict plays for the 2013-14 season. The Madden data augmented our play-by-play data with a degree of (subjective) domain knowledge that captures more subtle differences in the strengths of the teams. For example, a team with outstanding wide receivers, or an offense facing a team that is adept at defending rushes, is more likely to pass the ball.

Exploratory analysis of passing plays:

In this subsection, we provide a brief overview of insights gained from an initial exploratory analysis of our dataset. Overall, 58.9% of plays were passing plays, which represents the baseline accuracy of a naïve prediction model that predicts pass for every play. Drilling down a little deeper, when the data set is divided by downs, the proportion of passing plays on 3rd down is approximately 79.3%, which is substantially higher than in other down scenarios, as shown in Table 1. This result is intuitive, as passing plays typically result in more yards gained, which increases the chance of a successful 3rd down conversion over longer distances. As further shown in Table 1, a team is more likely to pass (65.6%) when behind in the game, compared to when the team is leading (51.3%). This finding is also intuitive because passing plays are seen as higher reward and higher risk: there is the potential to gain more yards, but also an increased chance of a turnover via interception or an incomplete pass. The Madden ratings of the position groups also indicate differences in passing proportion among NFL teams (Table 1). For instance, teams with higher-rated quarterbacks, higher-rated wide receivers or lower-rated running backs are generally more likely to choose passing plays.
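These exploratory numbers are straightforward to recompute with pandas. The sketch below uses assumed column names (the file name and columns are our own hypothetical placeholders, not those of the actual dataset):

```python
import pandas as pd

plays = pd.read_csv("nfl_plays.csv")  # hypothetical file: one row per play

# Baseline: overall share of passing plays (~58.9% in the study's data)
print(plays["is_pass"].mean())

# Passing proportion by down; 3rd down should stand out (~79.3%)
print(plays.groupby("down")["is_pass"].mean())

# Passing proportion when trailing vs. leading (~65.6% vs. ~51.3%)
plays["trailing"] = plays["point_differential"] < 0
print(plays.groupby("trailing")["is_pass"].mean())
```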

Simple play prediction model:


The design of our simple prediction approach was guided by two criteria. First, the prediction model must be easy to execute in the short time frame that the defensive coordinator has to make a play-calling decision. In the NFL, the offense has a maximum of 40 seconds in which to snap the ball. Within the first 25 seconds, the defensive coordinator can communicate, via a one-way radio, with the middle linebacker. Therefore, to ensure that the model is quick to use, we limited the variables that the simple model could utilize to static variables that are easily observable at any point in time. By doing so, we eliminated variables such as the in-game passing proportion, or the average yards gained per pass within the game, which the coach would have to constantly update and keep track of throughout the game. The only static variables permitted in the simple model were the quarter, down, minute, yards to go for first down, previous yards gained and the point differential. The second criterion was interpretability. We believe coaches are less likely to trust and adopt a black-box model; thus, the simple model must be understandable by anyone who uses it.

Given these two criteria, we ultimately decided to implement a classification tree model with a limited number of splits. To determine an acceptable balance between accuracy and simplicity, we trained a large family of classification tree models that differed in how many variables were included and how many splits were allowed. After obtaining the results of each attempted simple model, the next step was to select the optimal one, considering the complexity vs. accuracy trade-off. We thought that having fewer variables, while maximizing prediction accuracy, would be desirable as it would require fewer inputs to keep track of and consider before making a prediction. Thus, we ended up choosing a classification tree with three variables and 10 splits, which generated the highest prediction accuracy among all of the trees considered. The chosen classification tree is depicted in Fig. 1. The three variables that had the greatest impact on play prediction were the current down, yards to go for first down, and point differential. This simple model achieved a prediction accuracy of 65.3%, which corresponds to 86% of the accuracy generated by our neural network model. Finally, we created an equivalent visual representation of the classification tree that we believe is even easier to read and/or memorize, which may be useful in time-sensitive situations (see Fig. 2).

It should be emphasized that this model does not replace a coach's knowledge but should be used to support decision making. Since we utilized a classification tree, we can use the proportion of the majority class in each terminal leaf node as a measure of how strong each prediction is, which allows coaches and players to decide how heavily to trust the model in particular situations. The predictive accuracy of the model based on different game scenarios is shown in Table 3. For example, we find that the model's prediction accuracy is higher in the fourth quarter, on third and fourth downs, when the offense is losing, and when the yards to go is greater than 13. On the other hand, model performance is fairly stable across all yard lines. In addition to providing defensive coordinators with tools to assist play calls, the model also serves as a means of supplying the individual players on the defensive line with "pre-snap reads." Players can run through the model, determine with greater likelihood whether the play will be a pass or a rush, and mentally prepare themselves for the ensuing play. For example, if a safety uses the model and predicts a pass, this can inform him to take extra caution in guarding the wide receivers. The model could be inserted in a play-call wristband similar to the ones used by quarterbacks. Since the players can utilize the model up until the time of the snap, we were able to create a secondary model with the addition of the offensive formation variable (Figs. 3 and 4). This new model achieved a prediction accuracy of 72.3%, which captures 96% of the predictive power of our neural network model. We propose that coaches use our base model to aid play calls while the defensive line uses our secondary model to help make pre-snap reads.
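A depth-limited tree in this spirit can be fit in a few lines. The sketch below is our own approximation (the file and column names and the split budget are assumptions; the study's actual tree is the one shown in Figs. 1-2):

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

plays = pd.read_csv("nfl_plays.csv")  # hypothetical file: one row per play
features = ["down", "yards_to_go", "point_differential"]

# max_leaf_nodes=11 caps the tree at 10 splits, mirroring the paper's budget
tree = DecisionTreeClassifier(max_leaf_nodes=11)
tree.fit(plays[features], plays["is_pass"])

# Human-readable rules that could be printed on a play-call wristband
print(export_text(tree, feature_names=features))
```

The secondary model described above would simply add an offensive-formation column to the feature list once the formation is visible pre-snap.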
APPLICATIONS OF SPORTS ANALYTICS-
The world of Motorsports-

In team sports that are played on a field, data is measured on the field and the analysis is done off the field, after the game. Yes, the data is measured in real time; however, the analysis happens post-game: the game is reviewed, advanced analytics helps reach conclusions, and the necessary changes are incorporated into practice and put into full effect from the following games. The world of motorsports is a whole different ball game: data is recorded in real time, analysis is done in real time, and actionable solutions are incorporated back during the race itself. The power of advanced analytics in motorsports is unparalleled.

The 2005 Monaco Grand Prix!

Schumacher smashed into David Coulthard. Schumacher's nosecone was detached and Coulthard's suspension was beyond repair. All the other drivers approached the turn littered with debris and the cars involved in the collision, and the marshals deployed the safety car. (Note: Kimi Räikkönen was leading the race.)

During the safety car period, the most logical thing for all drivers to do was to pit, change tires, refuel and get back out to take the win. The race-winning move came when the McLaren team radioed Kimi and asked him not to pit and to stay out. This seemed like a bad move initially. Kimi, however, fired in a few quick laps and increased his lead to a mind-blowing 35 seconds. He pitted on lap 42 and came out of the pits with a 13-second lead, brand new tires and enough fuel to finish the race. The Flying Finn grabbed P1!

So what made McLaren make such a gutsy decision? It was the intelligence from the many analysts who were working on real-time computations: how much fuel was left, how light the car was because of the reduced fuel, how much longer the tires would last, wind resistance, average lap times, lap time variances, and so on. All of this, done in real time, led to the decision in a matter of minutes. This was probably, in my opinion, the first time that real-time advanced analytics was put into use by an F1 team.
MOTO GP!

The most eminent example of the use of machine learning and artificial intelligence in motorsports is in MotoGP. When Ducati turned to AI and ML by partnering with Accenture, it was a decision that was looked upon rather cynically. They decided to use this approach beginning in 2012. Ducati was nowhere to be found amongst the title contenders; only the Yamahas and Hondas were dominating, and things needed to change. 100 IoT sensors were put on the bikes to track performance data. New perspectives were created using simulations and bike performance assessment reports under a range of conditions. Advanced analytics and ML techniques were applied to simulate data from previous successful tests, which helped the engineers optimize the bike configuration for any race. There are 18 races in a season, and as many configurations and simulations were tested to prepare for any scenario and make sure the bikes performed at maximum capacity at all times. The impact of these changes was visible: a change in one setting would trigger a change in another setting, and this could be predicted. Even without testing, the impact that a potential change in the configuration could have could be predicted. This made the strategy rock solid for race day.

Ducati managed to make their bikes smarter with every turn. Here is how:

o Data is gathered by the sensors on the bike, and the analytics algorithms are applied to it.

o Real insights are used to alter the bike configuration, taking into account variables like track conditions and rubber compound, alongside intelligent testing.

o The bike's performance was simulated and monitored under a huge array of track and weather conditions. Ducati Corse applied ML techniques combined with the data from the IoT sensors and saved much of the effort that goes into traditional on-track testing.

o Specialized data visualization tools designed to view this particular data gave the engineers new ways to optimize the bike configurations and achieve faster lap times.
