Submitted to
Dr. Gulnaz Banu
By Shruti Arora
Question 1: Develop a simple linear regression model between the sold price and
batting strike rate. Is there a statistically significant relationship between sold
price and batting strike rate?
Taking Sold Price as the dependent variable and SR-B (batting strike rate) as the independent
variable, the following output was obtained:
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.18426907
R Square            0.03395509
Adjusted R Square   0.02640786
Standard Error      401399.957
Observations        130

ANOVA
             df   SS          MS          F            Significance F
Regression   1    7.2489E+11  7.2489E+11  4.49901599   0.03584286
Residual     128  2.0624E+13  1.6112E+11
Total        129  2.1348E+13

            Coefficients  Standard Error  t Stat      P-value     Lower 95%   Upper 95%
Intercept   289521.427    114770.006      2.52262274  0.01287373  62429.3614  516613.492
SR-B        2086.39366    983.642957      2.1210884   0.03584286  140.088017  4032.69931
[Chart: Predicted SOLD PRICE and Residuals]
Here, R² = 0.034 and the p-value = 0.036.
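• R² is the proportion of the variation in the dependent variable (y) that the model explains;
in simple linear regression it also equals the square of the correlation between x and y. The
closer R² is to 1, the better the model fits the data. The p-value of a coefficient is the
probability of obtaining a t statistic at least as extreme as the one observed if the true
coefficient were zero; a value below the chosen significance level (here 0.05) indicates that
the independent variable is statistically significant.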
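To make these two quantities concrete, here is a minimal stdlib-only Python sketch of simple linear regression that computes the intercept, slope, R² and the slope's t statistic. The function name `simple_ols` and the five-point toy data are illustrative assumptions, not the IPL data. The p-value would be the two-sided tail probability of t under a t distribution with n - 2 degrees of freedom (for example via `scipy.stats.t.sf`), which this sketch leaves out.

```python
import math

def simple_ols(x, y):
    """Fit y = b1 + b2*x by ordinary least squares.

    Returns (b1, b2, r_squared, t_slope), where t_slope = b2 / SE(b2)
    with n - 2 residual degrees of freedom.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b2 = sxy / sxx
    b1 = y_bar - b2 * x_bar
    # Split total variation into the part left unexplained by the fitted line.
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    ss_resid = sum((yi - (b1 + b2 * xi)) ** 2 for xi, yi in zip(x, y))
    r_squared = 1 - ss_resid / ss_total
    # Standard error of the slope from the residual mean square.
    mse = ss_resid / (n - 2)
    se_b2 = math.sqrt(mse / sxx)
    t_slope = b2 / se_b2
    return b1, b2, r_squared, t_slope

# Toy data (not from the IPL dataset).
b1, b2, r2, t = simple_ols([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"intercept={b1:.3f} slope={b2:.3f} R^2={r2:.3f} t={t:.3f}")
```

On the toy data the slope is 0.6 and R² is 0.6, so 60% of the variation in y is explained by x.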
Conclusion: Despite the low R² (strike rate explains only about 3.4% of the variation in sold
price), the p-value of 0.036 is less than the 0.05 significance level. There is therefore a
statistically significant, though weak, relationship between sold price and batting strike rate.
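The reported figures are internally consistent, which a few lines of Python can confirm: with a single predictor, t Stat = coefficient / standard error, F = t², R² = SS(Regression) / SS(Total) and Multiple R = √R². The numbers below are copied from the SR-B output above; this is only an arithmetic check, not a refit of the model.

```python
import math

# Figures copied from the SR-B regression output (n = 130, one predictor).
coef_srb = 2086.39366
se_srb = 983.642957
ss_regression = 7.2489e11
ss_total = 2.1348e13

t_stat = coef_srb / se_srb            # "t Stat" column
f_stat = t_stat ** 2                  # with one predictor, F equals t squared
r_squared = ss_regression / ss_total  # share of total variation explained
multiple_r = math.sqrt(r_squared)     # Multiple R = |correlation| in simple regression

print(f"t = {t_stat:.4f}, F = {f_stat:.4f}, R^2 = {r_squared:.4f}, R = {multiple_r:.4f}")
```

Small differences in the last digits come from the sums of squares being rounded in the ANOVA table.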
Question 2: What is the impact of the ability to score “SIXERS” on the player’s
price?
CONCLUSION:
• For every additional six hit by a player, the predicted price increases by 1040.147 (an
additive increase in price, not a multiplicative one).
• A low R² does not by itself show that a variable is unimportant; a predictor can still be
statistically significant.
• Here, however, R² is very low, so the model does not explain much of the variation in price.
In addition, the p-value (Significance F) is greater than 0.05, so the relationship is not
statistically significant (a very low R² combined with a p-value greater than 0.05 is the worst
possible scenario).
This means that the ability to score sixes has very little impact on the player’s price.
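Because the model is linear, the SIXERS coefficient acts additively: each extra six adds the same amount to the predicted price, so k extra sixes add k × 1040.147. A small sketch (the function name `price_change` is hypothetical, and only the coefficient comes from the conclusion above):

```python
# Slope on SIXERS reported in the conclusion above.
SIXERS_COEF = 1040.147

def price_change(extra_sixes, coef=SIXERS_COEF):
    """Change in predicted price for `extra_sixes` additional sixes.

    In a linear model the effect is additive: each six adds `coef`;
    it does not multiply the price.
    """
    return coef * extra_sixes

for k in (1, 10, 50):
    print(f"{k:>3} extra sixes -> predicted price change {price_change(k):,.2f}")
```

Ten extra sixes therefore correspond to an increase of 10,401.47 in predicted price.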
Question 3: Develop a multiple linear regression model between sold price,
batting strike rate and sixers. What do you conclude from this
model?
CONCLUSION:
The F-test for the regression is highly significant, so the model explains a significant amount
of the variance in sold price.
From the coefficients table, SIXERS has a p-value of 0.00 (less than 0.05, highly significant)
while SR-B has a p-value of 0.918 (much higher than 0.05, not significant).
We therefore have a low R² together with a low p-value, which is not ideal: the model does not
explain much of the variation in the data, but it is still significant (better than having no
model at all, and not the worst scenario).
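The multiple regression in Question 3 fits Sold Price = b0 + b1·SR-B + b2·SIXERS by least squares. The sketch below solves the normal equations (XᵀX)b = Xᵀy with Gaussian elimination in plain Python. The function name `fit_ols` and the toy data are assumptions for illustration; it returns coefficients only, not the standard errors and p-values that Excel's output also reports.

```python
def fit_ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    rows: list of predictor tuples, e.g. (strike_rate, sixes) per player.
    Returns [intercept, b1, b2, ...].
    """
    # Design matrix with a leading column of ones for the intercept.
    X = [[1.0, *map(float, r)] for r in rows]
    p = len(X[0])
    # Augmented system [X'X | X'y].
    A = [[sum(xi[i] * xi[j] for xi in X) for j in range(p)]
         + [sum(xi[i] * yi for xi, yi in zip(X, y))]
         for i in range(p)]
    # Gaussian elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        if abs(A[col][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution.
    b = [0.0] * p
    for i in reversed(range(p)):
        b[i] = (A[i][p] - sum(A[i][j] * b[j] for j in range(i + 1, p))) / A[i][i]
    return b

# Toy check: data generated exactly from y = 1 + 2*x1 + 3*x2.
rows = [(1, 0), (2, 1), (3, 1), (4, 3), (5, 2)]
y = [1 + 2 * a + 3 * c for a, c in rows]
print(fit_ols(rows, y))  # approximately [1.0, 2.0, 3.0]
```

With the real data, `rows` would hold each player's (SR-B, SIXERS) pair and `y` their sold prices. In practice a library routine such as `numpy.linalg.lstsq` is numerically safer than forming XᵀX explicitly.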
QUESTION 4: Cricket in the T20 format is considered a young man’s sport; is
there evidence that the player’s price is influenced by age?
Independent variable: Age
CONCLUSION:
Since the p-value is much less than 0.05, the independent variable ‘Age’ has a statistically
significant effect on the dependent variable, Price.
R² shows that age explains only about 4% of the variance in price. R = 0.209 indicates only a
weak correlation between the variables.
Since R² is low (not close to 1) while the p-value is below 0.05, much of the variation in price
remains unexplained: age is a statistically significant but weak factor, and on its own it cannot
be considered a major determinant of the price of the player.
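In a regression with a single predictor, R² is the square of the correlation coefficient r, which is why R = 0.209 corresponds to an R² of about 4%. A short sketch (the `pearson_r` helper and the toy data are illustrative, not taken from the report):

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Toy data (not IPL data): r^2 equals the R^2 of the simple regression of y on x.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(f"r = {r:.4f}, r^2 = {r ** 2:.4f}")

# Multiple R reported for the age model, squared to give the share of variance explained.
r_age = 0.209
print(f"age model: R^2 = {r_age ** 2:.4f}")
```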
QUESTION 5: Are players of Indian origin paid more than players of other
countries?
Dependent variable: Sold Price; independent variable: Country. A dummy variable is used, with
India = 1 and all other countries = 0.
Dummy country variable (India = 1, all other countries = 0):

COUNTRY  Dummy Variable-Country  SOLD PRICE
SA       0                       50000
BAN      0                       50000
IND      1                       350000
IND      1                       850000
IND      1                       800000
AUS      0                       50000
IND      1                       500000
AUS      0                       700000
SA       0                       950000
SA       0                       450000
WI       0                       200000
WI       0                       200000
IND      1                       400000
SA       0                       300000
IND      1                       300000
IND      1                       1500000
SL       0                       250000
IND      1                       375000
IND      1                       500000
SA       0                       300000
WI       0                       150000
SL       0                       150000
NZ       0                       350000
ENG      0                       1550000
IND      1                       725000
IND      1                       400000
WI       0                       800000
SA       0                       575000

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.26843436
R Square            0.072057
Adjusted R Square   0.06480745
Standard Error      393404.49
Observations        130

ANOVA
             df   SS           MS           F            Significance F
Regression   1    1.53831E+12  1.53831E+12  9.93950763   0.00201547
Residual     128  1.98102E+13  1.54767E+11
Total        129  2.13485E+13

                        Coefficients  Standard Error  t Stat       P-value     Lower 95%   Upper 95%
Intercept               430974.026    44832.60242     9.612960272  8.4429E-17  342265.062  519682.99
Dummy Variable-Country  221365.597    70214.64277     3.152698468  0.00201547  82433.9298  360297.264

RESIDUAL OUTPUT
Observation  Predicted SOLD PRICE  Residuals
1            430974.026            -380974.026
2            430974.026            -380974.026
The regression equation for predicting an outcome variable Y on the basis of a
predictor variable X can be written as:
Y = B1 + B2(X)
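To test whether Indian players are paid more than players from other countries, let X be a
dummy variable with X = 1 for Indian players and X = 0 for players from other countries.
Substituting into the equation gives:
Other countries (X = 0): Y = B1 = 430974.026 (average selling price of other-country players)
India (X = 1): Y = B1 + B2 = 652340.026 (average selling price of Indian players)
B2 = Average difference in selling price between Indian players and players from other countries
= 652340.026 - 430974.026
Hence, B2 = 221366, which agrees (up to rounding) with the dummy coefficient of 221365.597 in
the output.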
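Regressing price on a single 0/1 dummy always gives B1 = average price of the X = 0 group and B2 = difference between the two group averages, which is the reasoning used above. The sketch below checks this identity on the 28 rows of the dummy-variable sheet shown earlier. Because the reported model uses all 130 players, these rows will not reproduce 430974.026 or 221365.597; the helper name `dummy_regression` is illustrative.

```python
# Rows of the dummy-variable sheet shown above (India = 1, all other countries = 0).
# Only 28 of the 130 observations appear there, so the fitted values differ from
# the reported model; the identity being checked holds for any data.
india = [0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0]
price = [50000, 50000, 350000, 850000, 800000, 50000, 500000, 700000, 950000, 450000,
         200000, 200000, 400000, 300000, 300000, 1500000, 250000, 375000, 500000, 300000,
         150000, 150000, 350000, 1550000, 725000, 400000, 800000, 575000]

def dummy_regression(d, y):
    """OLS of y on a single 0/1 dummy d; returns (B1, B2)."""
    n = len(d)
    d_bar, y_bar = sum(d) / n, sum(y) / n
    b2 = (sum((di - d_bar) * (yi - y_bar) for di, yi in zip(d, y))
          / sum((di - d_bar) ** 2 for di in d))
    b1 = y_bar - b2 * d_bar
    return b1, b2

b1, b2 = dummy_regression(india, price)
mean_india = sum(p for d, p in zip(india, price) if d == 1) / india.count(1)
mean_other = sum(p for d, p in zip(india, price) if d == 0) / india.count(0)

# B1 equals the average price of the X = 0 group (other countries) ...
print(round(b1, 2), round(mean_other, 2))
# ... and B2 equals the difference between the two group averages.
print(round(b2, 2), round(mean_india - mean_other, 2))
```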
Hence, we can conclude that, on average, Indian players are paid more than players from other
countries in the IPL.
The p-value for the India dummy variable is 0.002 (equal to Significance F, since it is the only
predictor), which is well below 0.05. There is therefore significant statistical evidence of a
difference in average selling price between Indian players and players from other countries.
Others: Base Price, SR-B, RUNS-S, ODI-SR-B, T-RUNS, AVE, SIXERS, ODI-RUNS-S, HS

ODI-RUNS-S  ODI-SR-B  T-RUNS  RUNS-S  HS   AVE    SR-B    SIXERS  Base Price  Sold Price
0           0         0       0       0    0.00   0.00    0       50000       50000
657         71.41     214     0       0    0.00   0.00    0       50000       50000
1269        80.62     571     167     39   18.56  121.01  5       200000      350000
241         84.56     284     58      11   5.80   76.32   0       100000      850000
79          45.93     63      1317    71   32.93  120.71  28      100000      800000
172         72.26     0       63      48   21.00  95.45   0       50000       50000
120         78.94     51      26      15   4.33   72.22   1       100000      500000
50          92.59     54      21      16   21.00  165.88  1       200000      700000
609         85.77     83      335     67   30.45  114.73  3       200000      950000
4686        84.76     5515    394     50   28.14  127.51  13      200000      450000
2004        81.39     2200    839     70   27.97  127.12  38      200000      200000
8778        70.74     9918    25      16   8.33   80.64   0       200000      200000
38          65.51     5       337     24   13.48  113.09  9       125000      400000
4998        93.19     5457    1302    105  34.26  128.53  42      200000      300000
69          56.09     0       1540    95   31.43  122.32  36      100000      300000
6773        88.19     3509    1782    70   37.13  136.45  64      400000      1500000
6455        86.8      4722    1077    76   28.34  117.83  24      150000      250000
18          60        0       6       2    1.00   33.33   0       100000      375000
10889       71.24     13288   1703    75   27.92  116.88  23      400000      500000
2536        84        654     978     74   36.22  119.27  35      300000      300000
73          45.62     380     4       3    4.00   80.00   0       150000      150000
239         60.96     249     4       2    0.00   133.33  0       150000      150000
8037        71.49     7172    196     45   21.77  118.78  3       350000      350000
3394        88.82     3845    62      24   31.00  116.98  2       950000      1550000
4819        86.17     3712    2065    93   33.31  128.90  32      220000      725000
11363       73.7      7212    1349    91   25.45  106.81  42      200000      400000
8087        83.95     6373    1804    128  50.11  161.79  129     250000      800000
8094        83.26     6167    886     69   27.69  109.79  31      250000      575000

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.71612017
R Square            0.5128281
Adjusted R Square   0.4762902
Standard Error      294397.516
Observations        130

ANOVA
             df   SS          MS          F            Significance F
Regression   9    1.0948E+13  1.2165E+12  14.0355138   3.0438E-15
Residual     120  1.04E+13    8.667E+10
Total        129  2.1348E+13

            Coefficients  Standard Error  t Stat      P-value     Lower 95%   Upper 95%
Intercept   219297.209    100457.576      2.18298328  0.03098381  20398.1999  418196.219
ODI-RUNS-S  47.7630784    18.0069401      2.65248166  0.00907078  12.1105902  83.4155665
ODI-SR-B    -659.17764    1159.21324      -0.5686423  0.5706616   -2954.3392  1635.98391
T-RUNS      -64.320895    19.4967236      -3.2990618  0.0012772   -102.92305  -25.718739
RUNS-S      389.81159     108.030984      3.60833139  0.00045053  175.917761  603.70542
HS          -3362.2887    1843.02662      -1.82433    0.07059015  -7011.3532  286.775704
AVE         -148.06087    5469.70497      -0.0270693  0.97844946  -10977.696  10681.5743
SR-B        282.103364    951.77783       0.29639623  0.76743986  -1602.3505  2166.55723
SIXERS      815.543884    2478.90349      0.3289938   0.74273385  -4092.5125  5723.60027
Base Price  1.47950713    0.20618189      7.17573763  6.4733E-11  1.07128134  1.88773292
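For the nine-predictor model, a variable is significant at the 5% level when its 95% confidence interval excludes zero, which is equivalent to a two-sided p-value below 0.05. The sketch below, using values copied from the coefficients table, applies that rule and confirms it agrees with the reported p-values; the variable names are illustrative.

```python
# (name, coefficient, lower 95%, upper 95%, p-value) from the nine-predictor output.
coefs = [
    ("ODI-RUNS-S", 47.7630784, 12.1105902, 83.4155665, 0.00907078),
    ("ODI-SR-B", -659.17764, -2954.3392, 1635.98391, 0.5706616),
    ("T-RUNS", -64.320895, -102.92305, -25.718739, 0.0012772),
    ("RUNS-S", 389.81159, 175.917761, 603.70542, 0.00045053),
    ("HS", -3362.2887, -7011.3532, 286.775704, 0.07059015),
    ("AVE", -148.06087, -10977.696, 10681.5743, 0.97844946),
    ("SR-B", 282.103364, -1602.3505, 2166.55723, 0.76743986),
    ("SIXERS", 815.543884, -4092.5125, 5723.60027, 0.74273385),
    ("Base Price", 1.47950713, 1.07128134, 1.88773292, 6.4733e-11),
]

significant = []
for name, b, lo, hi, p in coefs:
    excludes_zero = lo > 0 or hi < 0
    # A 95% CI excludes zero exactly when the two-sided p-value is below 0.05.
    assert excludes_zero == (p < 0.05), name
    if excludes_zero:
        significant.append(name)

print(significant)
```

On these figures ODI-RUNS-S, T-RUNS, RUNS-S and Base Price are significant, while ODI-SR-B, HS, AVE, SR-B and SIXERS are not.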
ANOVA
             df   SS          MS          F            Significance F
Regression   5    1.1086E+13  2.2172E+12  26.7893001   2.7409E-18
Residual     124  1.0263E+13  8.2763E+10
Total        129  2.1348E+13

            Coefficients  Standard Error  t Stat      P-value     Lower 95%   Upper 95%
Intercept   70184.549     50638.8362      1.38598266  0.16823963  -30043.893  170412.991
ODI-RUNS-S  50.8104732    17.0578821      2.97870936  0.00348313  17.0481463  84.5728001
T-RUNS      -64.466029    17.6481175      -3.6528558  0.00038112  -99.396597  -29.535461
RUNS-C      136.673637    47.3109699      2.88883609  0.00456491  43.0319751  230.315299
RUNS-S      262.696543    48.9130106      5.3706885   3.7273E-07  165.883994  359.509093
Base Price  1.36900935    0.18469117      7.41242433  1.6864E-11  1.00345378  1.73456492
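Adding predictors cannot lower R², so models with different numbers of predictors are better compared on adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1). The sketch below applies this to the nine-predictor model (reproducing its reported 0.4762902) and to the five-predictor model, whose R² has to be recovered from its ANOVA table because no summary block is shown for it. The helper names are illustrative.

```python
def r_squared_from_anova(ss_regression, ss_total):
    """R^2 as the share of total sum of squares explained by the regression."""
    return ss_regression / ss_total

def adjusted_r_squared(r2, n, k):
    """Penalise R^2 for the number of predictors k given n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

n = 130
# Nine-predictor model: R^2 as reported in its summary output.
adj_nine = adjusted_r_squared(0.5128281, n, 9)
# Five-predictor model: R^2 recovered from its ANOVA table
# (sums of squares there are rounded to five significant figures).
r2_five = r_squared_from_anova(1.1086e13, 2.1348e13)
adj_five = adjusted_r_squared(r2_five, n, 5)

print(f"nine predictors: adj R^2 = {adj_nine:.4f}")
print(f"five predictors: R^2 = {r2_five:.4f}, adj R^2 = {adj_five:.4f}")
```

On these rounded figures the five-predictor model has the higher adjusted R², despite using four fewer predictors.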
FINAL ANALYSIS