
Analysis of Factors Affecting the Winning Percentage of One Day International (ODI) Cricket Matches

MATH 240
PROFESSOR LANCE ROBINS
SUMMER 2015
By
REDDY BHARGAVI KAPUGANTI (0579749)
SAI RAJU JAMPANA (0576385)
VENU GOPAL VEGI (0578423)

POORVA PATIL (578850)

Table of Contents


Introduction (Brief Background & Assumptions)
Problem Statement
Test Design/Approach
Results of Data Analysis/Testing
Conclusion
References
Appendix - (Data Formatted from Excel)
Descriptive Stats
Linear Regression & Scatter Plots
Multiple Regression

Introduction
In this paper we analyze the factors that influence the winning percentage in One Day
International (ODI) cricket matches, based on data from the most recent 10 years. Separate
regression tests are used to examine ODI results across the different countries. This analysis
should give insight into how important each variable is to a team's success in winning. Our
group decided to analyze the variables within one-day international cricket matches because of
our interest in cricket. As students from India, we had the pleasure of watching India win its
latest series. All four of us are from India, so India's win this year was exciting, as everybody
supported and celebrated it. We could not focus solely on the Indian cricket team because the
data would be too limited, so we chose to extend our scope to all ODI teams. We then set out to
check whether we could determine which variables within the ODI data lead to a higher winning
percentage.
Our group gathered the data from ESPN.com. The data covers the 21 teams in one-day
international cricket during the regular season over a span of 10 years. We defined the
dependent variable as the winning percentage, computed by dividing the total matches won by the
total matches played. In addition, we considered other important variables: Average (Batting),
Average (Bowling), Strike Rate (Bowling) and Highest Scores.
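The winning percentage defined above can be sketched in a few lines of Python. This is only an illustration of the formula: the function name and the example numbers are assumptions and do not come from the paper's data set.

```python
def winning_percentage(matches_won: int, matches_played: int) -> float:
    """Return matches won divided by matches played, as a fraction in [0, 1]."""
    if matches_played <= 0:
        raise ValueError("matches_played must be positive")
    if not 0 <= matches_won <= matches_played:
        raise ValueError("matches_won must lie between 0 and matches_played")
    return matches_won / matches_played


# Hypothetical team: 62 wins out of 100 matches (illustrative numbers only).
example = winning_percentage(62, 100)
print(f"Winning percentage: {example:.4f}")
```

The value is kept as a fraction rather than multiplied by 100, which matches how the Winning % column is reported in the Appendix.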
The independent variables considered are Average (Batting), Average (Bowling), Strike Rate
(Bowling) and Highest Scores; the dependent variable is Winning Percentage. To score is to make
runs, and if a team can score more runs, we can assume that its batting skills are good. In
bowling, the team is concerned with taking wickets and maintaining a good average or run rate.
Thus, we as a team predicted that, of all the variables tested, the batting and bowling
statistics would show the strongest linear relationship with the winning percentage.
The independent variables grouped under bowling statistics are no balls, wide balls, run rate
and wickets taken. A no ball occurs when the bowler delivers the ball above the height of the
batsman or oversteps the crease. A wide ball occurs when the bowler bowls a ball that is too far
from the wide crease. Run rate is the number of runs conceded per over bowled. Finally, wickets
taken is the number of dismissals, whether by stumpings, run-outs, lbws or catches. Bowling
strike rate is defined for a bowler as the average number of balls bowled per wicket taken. All
these variables feed into the overall bowling average and strike rate.
Lastly, the independent variables grouped under batting statistics are runs and strike rate.
Batting strike rate is defined for a batsman as the average number of runs scored per 100 balls
faced. The higher the strike rate, the more effective a batsman is at scoring quickly. Runs are
scored by a batsman, and the aggregate of the scores of a team's batsmen (plus any extras)
constitutes the team's score. All these variables feed into the overall batting average and
strike rate.
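The two strike-rate definitions given above can be written out as a minimal sketch. The function names and the sample figures are assumptions chosen for illustration, not values from the data set.

```python
def bowling_strike_rate(balls_bowled: int, wickets_taken: int) -> float:
    """Average number of balls bowled per wicket taken (lower is better)."""
    if wickets_taken <= 0:
        raise ValueError("wickets_taken must be positive")
    return balls_bowled / wickets_taken


def batting_strike_rate(runs_scored: int, balls_faced: int) -> float:
    """Average number of runs scored per 100 balls faced (higher is better)."""
    if balls_faced <= 0:
        raise ValueError("balls_faced must be positive")
    return 100.0 * runs_scored / balls_faced


# Hypothetical figures: 10 wickets from 300 balls, and 45 runs off 50 balls.
print(bowling_strike_rate(300, 10))
print(batting_strike_rate(45, 50))
```

Note that the two measures point in opposite directions: a good bowler has a low strike rate, while a good batsman has a high one.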

Problem Statement
Our team chose to analyze the relationship between different statistics and the overall winning
percentage to better understand what constitutes a "winning team". Although we may never know
exactly what constitutes a winning team, we would like to think our results will shed some light
on which statistics to look for in a team with a higher winning percentage. We decided to
analyze the following independent variables to see which had the most influence on the winning
percentage: Average (Batting), Average (Bowling), Strike Rate (Bowling) and Highest Score.

Test Design/Approach
This research paper is based on data collected from the ESPN ODI official website for the games
played by 21 ODI teams within the last 10 years. The Excel data in our appendix contains the
winning percentage, highest score, bowling strike rate, batting average and bowling average.
Based on the acquired data, the first step was to compute the descriptive statistics. This
gives a summary of the observations through measures of central tendency (mean, median and
mode), variation (range, interquartile range, variance, standard deviation and coefficient of
variation) and shape (skewness).
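As a rough sketch of this first step, the measures listed above can be computed with Python's standard `statistics` module. The helper name `describe` and the sample values are assumptions for illustration. The skewness uses the adjusted sample formula commonly found in spreadsheet software, and the quartile method is the module's default, which may not match a spreadsheet's quartiles.

```python
import statistics
from collections import Counter


def describe(values):
    """Summarize a sample (at least 3 values) with the measures named in the text."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    # Most frequent value; None when no value repeats (a spreadsheet shows #N/A).
    value, count = Counter(values).most_common(1)[0]
    mode = value if count > 1 else None
    # Quartiles via the module's default "exclusive" method (an assumption).
    q1, _, q3 = statistics.quantiles(values, n=4)
    # Adjusted sample skewness: n / ((n - 1)(n - 2)) * sum of cubed z-scores.
    skewness = n / ((n - 1) * (n - 2)) * sum(((x - mean) / sd) ** 3 for x in values)
    return {
        "mean": mean,
        "median": statistics.median(values),
        "mode": mode,
        "range": max(values) - min(values),
        "interquartile_range": q3 - q1,
        "variance": statistics.variance(values),
        "standard_deviation": sd,
        "coefficient_of_variation": sd / mean,
        "skewness": skewness,
    }


# Hypothetical sample, not taken from the paper's data set.
sample = [2.0, 4.0, 4.0, 5.0, 10.0]
summary = describe(sample)
for name, value in summary.items():
    print(name, value)
```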

The second step was a simple linear regression analysis of each independent variable against
our dependent variable. The dependent variable is Winning Percentage and the independent
variables are Average (Batting), Average (Bowling), Strike Rate (Bowling) and Highest Score.
This was performed to measure the strength of the relationship between each pair of variables
by analyzing the scatter plot, measures of variation, standard errors, intercept and so on.
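The second step can be illustrated with a small least-squares fit. This is a sketch under stated assumptions, not the Excel procedure used in the paper: the function name is hypothetical and the five data points are invented to show the arithmetic.

```python
def simple_linear_regression(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - SSE / SST: the share of variation in y explained by x.
    sst = sum((yi - mean_y) ** 2 for yi in y)
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return slope, intercept, 1 - sse / sst


# Hypothetical batting averages and winning percentages (illustrative only).
batting_average = [30, 32, 34, 36, 38]
winning_pct = [0.52, 0.56, 0.58, 0.56, 0.58]
slope, intercept, r_squared = simple_linear_regression(batting_average, winning_pct)
print(f"slope={slope:.4f} intercept={intercept:.4f} r_squared={r_squared:.4f}")
```

The slope is the change in winning percentage associated with a one-unit change in the independent variable, and r squared measures how much of the variation the single variable explains.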
Next, we performed a multiple regression analysis with all of our independent variables in
relation to our dependent variable (winning percentage). This analysis used the residuals (the
errors between the actual and predicted values) to examine the following assumptions:
linearity, independence, normality and equal variance, drawing on the Durbin-Watson statistic,
residual plots, t statistics, p-values and so forth. These results allow us to investigate the
relationship between one dependent variable and several independent variables by looking at the
coefficient of multiple determination (r²), the adjusted r² and the overall F test. The
significance of each variable was determined using a significance level of 0.05:

If the p-value is greater than or equal to 0.05, do not reject the null hypothesis;
If the p-value is less than 0.05, reject the null hypothesis.

As a final step, the multiple regression model was chosen as the most appropriate for our case,
and several checks were conducted on its assumptions, including t-tests, hypothesis tests,
regression analysis and p-values.
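The decision rule above can be expressed as a short sketch. The function name `decide` is an assumption for illustration; the p-values are the ones reported for the multiple regression in the Results section.

```python
ALPHA = 0.05  # significance level stated in the text


def decide(p_value: float, alpha: float = ALPHA) -> str:
    """Apply the rule: reject the null hypothesis only when p < alpha."""
    return "reject H0" if p_value < alpha else "do not reject H0"


# P-values as reported for the multiple regression model.
p_values = {
    "Ave(Batting)": 0.0000,
    "AVE(BOWLING)": 0.0000,
    "SR(BOWLING)": 0.0659,
    "HS": 0.0229,
}
decisions = {name: decide(p) for name, p in p_values.items()}
for name, decision in decisions.items():
    print(f"{name}: {decision}")
```

Under this rule, Strike Rate (Bowling) is the only variable for which the null hypothesis is not rejected at the 0.05 level.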

Results
We ran a multiple regression analysis as well as simple linear regressions on the data. The
simple linear regression results can be found in our Appendix. The multiple regression model
came out as significant. The variables batting average and bowling average affected the winning
percentage. We also noticed that both have an impact on the winning percentage alongside
overall team performance. Therefore, any change in batting average or bowling average will have
an effect on the winning percentage.

                Coefficients    P-value
Intercept       0.1468          0.0344
Ave(Batting)    0.0095          0.0000
AVE(BOWLING)    0.0113          0.0000
SR(BOWLING)     -0.0057         0.0659
HS              0.0003          0.0229
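To show how the coefficients reported above combine into a prediction, here is a minimal sketch that evaluates the fitted equation. The function name is an assumption. The inputs are the sample means from the descriptive summaries in the Appendix, so the result should land close to the mean winning percentage of about 0.62. Because the coefficients are rounded to four decimals, the agreement is only approximate.

```python
# Coefficients as reported for the multiple regression model.
INTERCEPT = 0.1468
COEFFICIENTS = {
    "Ave(Batting)": 0.0095,
    "AVE(BOWLING)": 0.0113,
    "SR(BOWLING)": -0.0057,
    "HS": 0.0003,
}


def predict_winning_percentage(team_stats):
    """Evaluate intercept + sum(coefficient * value) for one team."""
    return INTERCEPT + sum(
        COEFFICIENTS[name] * team_stats[name] for name in COEFFICIENTS
    )


# Sample means taken from the descriptive summaries in the Appendix.
average_team = {
    "Ave(Batting)": 35.67714286,
    "AVE(BOWLING)": 24.04619048,
    "SR(BOWLING)": 32.23809524,
    "HS": 161.8571429,
}
predicted = predict_winning_percentage(average_team)
print(f"Predicted winning percentage: {predicted:.4f}")
```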

There was no need to re-run the multiple regression analysis, as the first model was
significant in itself (Significance F of 0.0000). The p-values are below 0.05 for every
variable except Strike Rate (Bowling), whose p-value is 0.0659. The multiple regression was run
at the 95% confidence level. The model has an r square value of 0.9646 and an adjusted r square
value of 0.9558, which indicates a very good fit. The r square value did not improve even when
the least influential independent variable was removed, indicating that this is the best
option.
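The relationship between the two goodness-of-fit figures quoted above can be checked with the standard adjusted r square formula, 1 - (1 - r²)(n - 1)/(n - k - 1), where n is the number of observations and k the number of independent variables. The sketch below is illustrative and the function name is an assumption.

```python
def adjusted_r_squared(r_squared: float, n_observations: int, n_predictors: int) -> float:
    """Penalize r squared for the number of predictors in the model."""
    if n_observations - n_predictors - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    penalty = (n_observations - 1) / (n_observations - n_predictors - 1)
    return 1 - (1 - r_squared) * penalty


# Reported values: r square 0.9646, 21 teams, 4 independent variables.
adjusted = adjusted_r_squared(0.9646, 21, 4)
print(adjusted)
```

With the reported inputs the result agrees with the reported adjusted r square of 0.9558 to within rounding.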
The effect of bowling average is interdependent with batting average, since they both
contribute to the winning percentage. Teams may have an advantage in their home countries, but
the absence of good bowling can change the game. Just as a batting pitch has an impact on
bowling, a bowling pitch has an impact on the batting score. In this case both factors affect
the total score and the winning percentage.
We have included the scatter plots for both of them below. The plots agree with this
information: the data for strike rate (bowling) and highest score are quite scattered, whereas
batting average and bowling average appear as a tight cluster of values concentrated in one
place.

Scatter Plots: Team Batting Average & Team Bowling Average (panels: Team Batting Average; Team Bowling Average)

The Multiple Regression Output

Regression Statistics
Multiple R           0.9821
R Square             0.9646
Adjusted R Square    0.9558
Standard Error       0.0165
Observations         21

ANOVA
              df    SS        MS        F           Significance F
Regression    4     0.1185    0.0296    109.0342    0.0000
Residual      16    0.0043    0.0003
Total         20    0.1228

                Coefficients    Standard Error    t Stat     P-value    Lower 95%    Upper 95%
Intercept       0.1468          0.0635            2.3123     0.0344     0.0122       0.2814
Ave(Batting)    0.0095          0.0011            8.8342     0.0000     0.0072       0.0118
AVE(BOWLING)    0.0113          0.0020            5.5749     0.0000     0.0070       0.0156
SR(BOWLING)     -0.0057         0.0029            -1.9742    0.0659     -0.0118      0.0004
HS              0.0003          0.0001            2.5162     0.0229     0.0000       0.0005
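As a cross-check on the regression output above, r square and the F statistic can be recomputed from the ANOVA sums of squares and degrees of freedom. The sketch below is illustrative and the function names are assumptions. Since the printed sums of squares are rounded to four decimals, the recomputed figures match the reported 0.9646 and 109.0342 only approximately.

```python
def r_squared_from_anova(ss_regression: float, ss_residual: float) -> float:
    """R^2 = SSR / SST, where SST = SSR + SSE."""
    return ss_regression / (ss_regression + ss_residual)


def f_statistic(ss_regression: float, df_regression: int,
                ss_residual: float, df_residual: int) -> float:
    """F = (SSR / df_regression) / (SSE / df_residual)."""
    mean_square_regression = ss_regression / df_regression
    mean_square_residual = ss_residual / df_residual
    return mean_square_regression / mean_square_residual


# Rounded values from the ANOVA table of the multiple regression output.
SSR, SSE = 0.1185, 0.0043
DF_REGRESSION, DF_RESIDUAL = 4, 16

r_squared = r_squared_from_anova(SSR, SSE)
f_value = f_statistic(SSR, DF_REGRESSION, SSE, DF_RESIDUAL)
print(f"r_squared={r_squared:.4f} F={f_value:.2f}")
```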

Conclusion
The output above indicates that the coefficients depict plausible relationships with the
dependent variable. For instance, variables such as Average (Batting) and Average (Bowling)
have a positive impact on the winning percentage, whereas Strike Rate (Bowling) and Highest
Score do little to improve it. The coefficient for Strike Rate (Bowling) is in fact negative,
which supports this.
Team batting average and bowling average have the greatest impact on the winning percentage,
so a team should put more emphasis on both bowling and batting. More time should be spent with
the bowlers so that their dismissal statistics improve, which would have a positive impact on
the overall winning percentage. Moreover, the batsmen should work on increasing their strike
rate by scoring more runs and using new strategies on the field. The team should also spend
more time practicing hitting the ball. If the team as a whole can improve its batting average,
the number of boundaries is likely to increase as well.
When selecting players for the team, captains should have an idea of which players can play
strategically on the field to score more runs. They should also look out for bowlers who have a
history of a good run rate (bowling) or a higher rate of taking wickets. Although a player's
abilities can decline over time, it is still better to have a stronger bowling history. When
playing against another team, it can be beneficial for the bowler and the captain to know which
players are more likely to hit boundaries (sixes and fours).

We have taken into consideration that the larger the sample size, the more closely the sample
represents the actual numbers. For that reason, our data covered 21 teams over a 10-year
period. Because of the length of time covered, these results should be reliable not just for
the coming year but for years to come.


References
Data was retrieved primarily from ESPN.com, varying by years:
ESPNcricinfo. (2015). Statistics: One-Day International. Retrieved from
http://stats.espncricinfo.com/ci/engine/stats/index.html?class=2;filter=advanced;groupby=team;orderby=runs;result=1;template=results;type=batting
ESPNcricinfo. (2015). Statistics: One-Day International. Retrieved from
http://stats.espncricinfo.com/ci/engine/stats/index.html?class=2;filter=advanced;groupby=team;orderby=wickets;result=1;spanmax1=11+Aug+2015;spanmin1=11+Aug+2005;spanval1=span;template=results;type=bowling
Levine, D., Stephan, D., & Szabat, K. (2014). Statistics for Managers Using Microsoft Excel
(p. 314). Upper Saddle River, NJ: Pearson Education.


Appendix
Descriptive Summary: Average Batting

                       Ave(Batting)
Mean                   35.67714286
Median                 37.64
Mode                   #N/A
Minimum                17.4
Maximum                42.77
Range                  25.37
Variance               29.3790
Standard Deviation     5.4202
Coeff. of Variation    15.19%
Skewness               -2.0180
Kurtosis               5.9253
Count                  21
Standard Error         1.1828

Descriptive Summary: Average Bowling

                       AVE(BOWLING)
Mean                   24.04619048
Median                 23.79
Mode                   #N/A
Minimum                19.4
Maximum                33.17
Range                  13.77
Variance               9.5197
Standard Deviation     3.0854
Coeff. of Variation    12.83%
Skewness               1.3552
Kurtosis               2.8792
Count                  21
Standard Error         0.6733

Descriptive Summary: Strike Rate (Bowling)

                       SR(BOWLING)
Mean                   32.23809524
Median                 31.8
Mode                   31.4
Minimum                28.9
Maximum                37.7
Range                  8.8
Variance               4.3145
Standard Deviation     2.0771
Coeff. of Variation    6.44%
Skewness               1.0017
Kurtosis               1.1237
Count                  21
Standard Error         0.4533

Descriptive Summary: Highest Scores

                       HS
Mean                   161.8571429
Median                 158
Mode                   #N/A
Minimum                78
Maximum                264
Range                  186
Variance               2080.4286
Standard Deviation     45.6117
Coeff. of Variation    28.18%
Skewness               0.3422
Kurtosis               0.0602
Count                  21
Standard Error         9.9533

Descriptive Summary: Winning Percentage

                       Winning %
Mean                   0.620028571
Median                 0.6348
Mode                   #N/A
Minimum                0.394
Maximum                0.7585
Range                  0.3645
Variance               0.0061
Standard Deviation     0.0784
Coeff. of Variation    12.64%
Skewness               -1.0271
Kurtosis               2.5563
Count                  21
Standard Error         0.0171

Linear Regression: Average Batting

Regression Statistics
Multiple R           0.9334
R Square             0.8713
Adjusted R Square    0.8645
Standard Error       0.0288
Observations         21

ANOVA
              df    SS        MS        F           Significance F
Regression    1     0.1070    0.1070    128.5940    0.0000
Residual      19    0.0158    0.0008
Total         20    0.1228

                Coefficients    Standard Error    t Stat     P-value    Lower 95%    Upper 95%
Intercept       0.1385          0.0429            3.2277     0.0044     0.0487       0.2284
Ave(Batting)    0.0135          0.0012            11.3399    0.0000     0.0110       0.0160

Scatter Plot: Average Batting

Linear Regression: Average Bowling

Regression Statistics
Multiple R           0.7112
R Square             0.5058
Adjusted R Square    0.4798
Standard Error       0.0565
Observations         21

ANOVA
              df    SS        MS        F          Significance F
Regression    1     0.0621    0.0621    19.4486    0.0003
Residual      19    0.0607    0.0032
Total         20    0.1228

                Coefficients    Standard Error    t Stat    P-value    Lower 95%    Upper 95%
Intercept       0.1856          0.0993            1.8702    0.0769     -0.0221      0.3934
AVE(BOWLING)    0.0181          0.0041            4.4101    0.0003     0.0095       0.0266

Scatter Plot: Average Bowling


Linear Regression: Strike Rate (Bowling)

Regression Statistics
Multiple R           0.5266
R Square             0.2773
Adjusted R Square    0.2393
Standard Error       0.0684
Observations         21

ANOVA
              df    SS        MS        F         Significance F
Regression    1     0.0341    0.0341    7.2910    0.0142
Residual      19    0.0888    0.0047
Total         20    0.1228

               Coefficients    Standard Error    t Stat     P-value    Lower 95%    Upper 95%
Intercept      -0.0205         0.2377            -0.0862    0.9322     -0.5180      0.4770
SR(BOWLING)    0.0199          0.0074            2.7002     0.0142     0.0045       0.0353

Scatter Plot: Strike Rate (Bowling)


Linear Regression: Highest Scores

Regression Statistics
Multiple R           0.6922
R Square             0.4791
Adjusted R Square    0.4517
Standard Error       0.0580
Observations         21

ANOVA
              df    SS        MS        F          Significance F
Regression    1     0.0589    0.0589    17.4787    0.0005
Residual      19    0.0640    0.0034
Total         20    0.1228

             Coefficients    Standard Error    t Stat    P-value    Lower 95%    Upper 95%
Intercept    0.4275          0.0478            8.9530    0.0000     0.3276       0.5275
HS           0.0012          0.0003            4.1808    0.0005     0.0006       0.0018

Scatter Plot: Highest Scores


Multiple Regression Analysis:

Regression Statistics
Multiple R           0.9821
R Square             0.9646
Adjusted R Square    0.9558
Standard Error       0.0165
Observations         21

ANOVA
              df    SS        MS        F           Significance F
Regression    4     0.1185    0.0296    109.0342    0.0000
Residual      16    0.0043    0.0003
Total         20    0.1228

                Coefficients    Standard Error    t Stat     P-value    Lower 95%    Upper 95%
Intercept       0.1468          0.0635            2.3123     0.0344     0.0122       0.2814
Ave(Batting)    0.0095          0.0011            8.8342     0.0000     0.0072       0.0118
AVE(BOWLING)    0.0113          0.0020            5.5749     0.0000     0.0070       0.0156
SR(BOWLING)     -0.0057         0.0029            -1.9742    0.0659     -0.0118      0.0004
HS              0.0003          0.0001            2.5162     0.0229     0.0000       0.0005