
Analysing the Relationship Between Bowlers’ Height and Bowling Performance

Group: 2
PGP/23/374 ABHISHEK MITTAL
PGP/23/375 AKSHITA GARG
PGP/23/380 ASHUTOSH MISHRA
PGP/23/401 NIKHIL GUPTA
PGP/23/403 NIKITA VERMA
PGP/23/426 UTKARSH CHATURVEDI
Introduction:
The project explores the relationship between cricket bowlers’ height and bowling speed. Two
analyses are performed on the data: one-way ANOVA, to compare players from different countries
on bowling speed and economy rate, and linear regression, to estimate the relationship between
players’ heights and bowling speeds.
Motivation:
In sport, player performance can be attributed to a number of factors. Good performance can be
linked to a player’s socio-economic conditions, innovation in the sport, better coaching or, in
certain sports, even his own physique.
A player’s height can often give him certain advantages. It can result in greater speed of
movement and power, depending on his own abilities and build. It can also bring disadvantages,
since greater size and mass might hamper his performance. Finally, it is also possible that
height is irrelevant in many sports.
Cricket is the most celebrated sport in the country and hence has been chosen for the purpose of
the project. We intend to analyse players’ performance data and make relevant conclusions about
its relationship with their heights.

Sampling methodology:
A sample of size 51 has been collected from standard secondary sources and the sampling
methodology used is random sampling.

Data description:
The data are a mix of qualitative and quantitative variables. The qualitative variables are:
● Name
● Country
● Right or left armed
The quantitative data consists of:
● Height
● Bowling speed
● Total matches
● Total wickets
● Economy rate
Each player’s bowling speed has been calculated as the average over 5-6 matches. The data
collected are for ODI matches only.

Statistical methodology used for the data analysis:


Linear regression is used to estimate the relationship between players’ height and bowling
speed. One-way ANOVA is then performed to compare mean bowling speed across countries, and
Tukey’s Honestly Significant Difference test is used to identify which countries differ and
which has the fastest bowlers. A second one-way ANOVA compares mean economy rates across
countries.

Conclusion:
We observe that there exists a positive correlation between bowlers’ height and bowling speed.

Regression Equation: Bowling Speed = 105.0 + 21.28 Height


Value of R-sq = 17.04%
Value of R-sq(adj) = 15.31%

The slope is statistically significant (p-value = 0.003), but the model is not a great fit:
height explains only about 17% of the variation in bowling speed. Taller bowlers therefore tend
to be somewhat faster on average, but height alone is a weak predictor of an individual
bowler’s speed.
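
The fit statistics quoted above can be re-derived from the sums of squares in the appendix regression output. The following is a minimal Python sketch, not the procedure the group used: the function names are illustrative, and the units (height in metres, speed in km/h) are assumptions, since the report does not state them.

```python
# Values copied from the regression output in the appendix.
INTERCEPT = 105.0
SLOPE = 21.28
SS_REGRESSION = 116.8
SS_ERROR = 568.9
DF_REGRESSION = 1
DF_ERROR = 48


def r_squared(ss_reg: float, ss_err: float) -> float:
    """Share of the total variation explained by the regression."""
    return ss_reg / (ss_reg + ss_err)


def adjusted_r_squared(ss_reg: float, ss_err: float, df_reg: int, df_err: int) -> float:
    """R-squared penalised for the number of predictors."""
    ms_err = ss_err / df_err
    ms_total = (ss_reg + ss_err) / (df_reg + df_err)
    return 1.0 - ms_err / ms_total


def predict_speed(height_m: float) -> float:
    """Fitted bowling speed; metres and km/h are assumed units."""
    return INTERCEPT + SLOPE * height_m


r2 = r_squared(SS_REGRESSION, SS_ERROR)
adj_r2 = adjusted_r_squared(SS_REGRESSION, SS_ERROR, DF_REGRESSION, DF_ERROR)
print(f"R-sq = {r2:.2%}, R-sq(adj) = {adj_r2:.2%}")
print(f"Fitted speed at 1.90 m: {predict_speed(1.90):.1f}")
```

Because the inputs are the rounded table entries, the results agree with the reported R-sq and R-sq(adj) only up to rounding in the last digit.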

For the ANOVA on bowling speed, we set the significance level at 5%. The p-value obtained is
0.023, which is below 0.05. Hence, we reject the null hypothesis and conclude that mean bowling
speed differs between at least some countries. On performing Tukey’s Honestly Significant
Difference test, we find that only two pairs of countries differ significantly: Pakistan-India
and Pakistan-Zimbabwe. Pakistan has the highest mean bowling speed in the sample.

When the ANOVA test is applied to the economy rates at a significance level of 5%, the p-value
is 0.225. Since this exceeds 0.05, we do not reject the null hypothesis that all country means
are equal: the data show no significant difference in mean economy rate between countries.
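
Both ANOVA decisions can be cross-checked from the mean squares in the appendix tables. The sketch below is a standard-library-only illustration, not the Minitab procedure used for the report: it forms F = MS(Factor) / MS(Error) and integrates the F density numerically with Simpson's rule to get the upper-tail p-value. The helper names are hypothetical, and the density helper assumes more than two numerator degrees of freedom, which holds here.

```python
import math


def f_pdf(x: float, d1: int, d2: int) -> float:
    """Density of the F distribution; this sketch assumes d1 > 2."""
    if x <= 0.0:
        return 0.0
    log_beta = (math.lgamma(d1 / 2) + math.lgamma(d2 / 2)
                - math.lgamma((d1 + d2) / 2))
    log_pdf = (
        0.5 * (d1 * math.log(d1 * x) + d2 * math.log(d2)
               - (d1 + d2) * math.log(d1 * x + d2))
        - math.log(x)
        - log_beta
    )
    return math.exp(log_pdf)


def f_test_p_value(f_stat: float, d1: int, d2: int, steps: int = 2000) -> float:
    """P(F > f_stat), using Simpson's rule for the CDF on [0, f_stat]."""
    h = f_stat / steps  # steps must be even for Simpson's rule
    total = f_pdf(0.0, d1, d2) + f_pdf(f_stat, d1, d2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f_pdf(i * h, d1, d2)
    return 1.0 - total * h / 3.0


# (Adj MS Factor, Adj MS Error, DF Factor, DF Error) from the appendix tables.
ANOVA_TABLES = {
    "bowling speed": (28.20, 11.00, 8, 42),
    "economy rate": (0.5926, 0.4224, 8, 40),
}

ALPHA = 0.05
p_values = {}
for name, (ms_factor, ms_error, df_factor, df_error) in ANOVA_TABLES.items():
    f_stat = ms_factor / ms_error
    p_values[name] = f_test_p_value(f_stat, df_factor, df_error)
    decision = "reject" if p_values[name] < ALPHA else "do not reject"
    print(f"{name}: F = {f_stat:.2f}, p = {p_values[name]:.3f} -> {decision} H0")
```

Working from the rounded mean squares means the p-values may differ from the Minitab output in the third decimal place, but the decision at the 5% level is unaffected.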
APPENDIX:
Individual Contribution:

All members contributed to brainstorming and deciding on the project. All members of the group
were present through every step of the project and contributed wherever they could. The work
can be broadly divided as follows:
• Nikita and Nikhil gathered the dataset and worked on the analysis.
• Akshita and Ashutosh ran the regression and ANOVA tests.
• Utkarsh and Abhishek cleaned the data and worked on the analysis.
Report writing was a collective activity, with each member doing his or her part.

Regression:

Coefficients

Term       Coef   SE Coef  T-Value  P-Value   VIF
Constant  105.0      12.7     8.29    0.000
Height    21.28      6.78     3.14    0.003  1.00

Analysis of Variance

Source         DF  Adj SS   Adj MS  F-Value  P-Value
Regression      1   116.8  116.806     9.86    0.003
  Height        1   116.8  116.806     9.86    0.003
Error          48   568.9   11.852
  Lack-of-Fit  16   154.6    9.664     0.75    0.728
  Pure Error   32   414.3   12.945

One-way ANOVA for bowling speed: Australia, England, India, New Zealand, Pakistan, South
Africa, Sri Lanka, West Indies, Zimbabwe

Null hypothesis: All means are equal


Alternative hypothesis: At least one mean is different
Significance level: α = 0.05

Equal variances were assumed for the analysis.

Factor Information

Factor Levels: 9
Values: Australia, England, India, New Zealand, Pakistan, South Africa, Sri Lanka, West Indies,
Zimbabwe

Analysis of Variance
Source  DF  Adj SS  Adj MS  F-Value  P-Value
Factor   8   225.6   28.20     2.56    0.023
Error   42   462.2   11.00
Total   50   687.8

Model Summary:
      S    R-sq  R-sq(adj)  R-sq(pred)
3.31720  32.81%     20.01%       0.00%

Means
Factor         N     Mean  StDev  95% CI
Australia     10   145.90   3.49  (143.78, 148.02)
England       12  144.729  1.990  (142.797, 146.662)
India          6   142.21   4.18  (139.48, 144.94)
New Zealand    4   143.38   2.14  (140.03, 146.72)
Pakistan       4   150.13   6.02  (146.78, 153.48)
South Africa   5   144.18   2.28  (141.19, 147.18)
Sri Lanka      2   142.63   6.54  (137.89, 147.36)
West Indies    5   145.71   2.50  (142.72, 148.71)
Zimbabwe       3   141.33   2.31  (137.47, 145.20)

Pooled StDev = 3.31720

Tukey Pairwise Comparisons :

Grouping Information Using the Tukey Method and 95% Confidence

Factor         N     Mean  Grouping
Pakistan       4   150.13  A
Australia     10   145.90  A B
West Indies    5   145.71  A B
England       12  144.729  A B
South Africa   5   144.18  A B
New Zealand    4   143.38  A B
Sri Lanka      2   142.63  A B
India          6   142.21    B
Zimbabwe       3   141.33    B

Means that do not share a letter are significantly different.
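
The grouping above rests on pairwise comparisons of country means. As a hedged illustration, the sketch below computes the Tukey-Kramer studentized range statistic for every pair from the Means table and the error mean square of the bowling-speed ANOVA. The function name is hypothetical, and the critical value is deliberately not computed: it has to be looked up in a studentized range table for 9 groups and 42 error degrees of freedom.

```python
import math
from itertools import combinations

MS_ERROR = 11.00  # Adj MS for Error in the bowling-speed ANOVA table

# country: (N, mean bowling speed), copied from the Means table
GROUPS = {
    "Australia": (10, 145.90),
    "England": (12, 144.729),
    "India": (6, 142.21),
    "New Zealand": (4, 143.38),
    "Pakistan": (4, 150.13),
    "South Africa": (5, 144.18),
    "Sri Lanka": (2, 142.63),
    "West Indies": (5, 145.71),
    "Zimbabwe": (3, 141.33),
}


def tukey_kramer_q(n_i, mean_i, n_j, mean_j, ms_error):
    """Studentized range statistic for two groups of unequal size."""
    std_error = math.sqrt(ms_error / 2.0 * (1.0 / n_i + 1.0 / n_j))
    return abs(mean_i - mean_j) / std_error


q_stats = {
    (a, b): tukey_kramer_q(*GROUPS[a], *GROUPS[b], MS_ERROR)
    for a, b in combinations(GROUPS, 2)
}

# Print the pairs with the largest statistics first.
ranked = sorted(q_stats, key=q_stats.get, reverse=True)
for a, b in ranked[:5]:
    print(f"{a} vs {b}: q = {q_stats[(a, b)]:.2f}")
```

A pair is declared significantly different when its q exceeds the tabulated critical value. Unequal group sizes matter here: a large difference involving a very small group, such as Sri Lanka with N = 2, produces a smaller q than the raw difference in means might suggest.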

[Figure: Tukey simultaneous 95% CIs]

[Figure: Interval plot of all countries]

One-way ANOVA for economy rate:

Null hypothesis: All means are equal


Alternative hypothesis: At least one mean is different
Significance level: α = 0.05

Equal variances were assumed for the analysis.

Analysis of Variance:

Source  DF  Adj SS  Adj MS  F-Value  P-Value
Factor   8   4.740  0.5926     1.40    0.225
Error   40  16.894  0.4224
Total   48  21.635

Model Summary:

       S    R-sq  R-sq(adj)  R-sq(pred)
0.649894  21.91%      6.29%       0.00%

Means

Factor         N    Mean   StDev  95% CI
Australia     10   4.680   0.524  (4.265, 5.095)
England       10   5.182   1.014  (4.767, 5.597)
India          6   4.932   0.403  (4.395, 5.468)
New Zealand    4   4.738   0.401  (4.081, 5.394)
Pakistan       4   4.580   0.478  (3.923, 5.237)
South Africa   5   4.322   0.428  (3.735, 4.909)
Sri Lanka      2  5.2700  0.0990  (4.3412, 6.1988)
West Indies    5   4.360   0.685  (3.773, 4.947)
Zimbabwe       3   5.103   0.530  (4.345, 5.862)

Pooled StDev = 0.649894

Tukey Pairwise Comparisons


Grouping Information Using the Tukey Method and 95% Confidence:

Factor         N    Mean  Grouping
Sri Lanka      2  5.2700  A
England       10   5.182  A
Zimbabwe       3   5.103  A
India          6   4.932  A
New Zealand    4   4.738  A
Australia     10   4.680  A
Pakistan       4   4.580  A
West Indies    5   4.360  A
South Africa   5   4.322  A

Means that do not share a letter are significantly different.

[Figure: Tukey simultaneous tests for differences of means]

[Figure: Tukey simultaneous 95% CIs]
