
Business Statistics & Analytics Group Assignment Report

Apoorva Chhangani (p41071), Rahul Jha (p41098), Ram Sandeep Peddada (p41100)
Saloni Sharma (p41104), Shubhayan Modak (p41115)

Dependent Variable: No. of Cases per million, denoted 'CPM'

Independent Variables: Stringency Index, denoted 'SI'

Hand Washing Facilities per million, denoted 'HWF'

Hospital beds per million, denoted 'HB'

GDP per Capita, denoted 'GDP'

Cardiovascular Death Rate, denoted 'CDR'

Diabetes prevalence, denoted 'DP'

Female Smokers, denoted 'FS'

Male Smokers, denoted 'MS'

Aged 65 or older, denoted 'X'

Aged 70 or older, denoted 'Y'

Life Expectancy, denoted 'LE'

Multiple Linear Regression

Hypothesis: Variations in the dependent variable are explained by variations in the independent variables.

Statistically,

CPM = f (SI, HWF, HB, GDP, CDR, DP, FS, MS, X, Y, LE)

CPM = B0 + B1(SI) + B2(HWF) + B3(HB) + B4(GDP) + B5(CDR) + B6(DP) + B7(FS) + B8(MS) + B9(X) + B10(Y) +
B11(LE) + U

CPM^ = b*0 + b*1(SI) + b*2(HWF) + b*3(HB) + b*4(GDP) + b*5(CDR) + b*6(DP) + b*7(FS) + b*8(MS) + b*9(X)
+ b*10(Y) + b*11(LE)

Here U is the error term and CPM^ is the fitted value of CPM.

Significance Level: 5%
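The model and the two kinds of test applied at the 5% level can be written compactly as below. The null-hypothesis notation is the standard textbook formulation and is an assumption here; the report itself does not spell it out.

```latex
\begin{align}
\mathrm{CPM}_i &= \beta_0 + \beta_1\,\mathrm{SI}_i + \beta_2\,\mathrm{HWF}_i + \beta_3\,\mathrm{HB}_i
  + \beta_4\,\mathrm{GDP}_i + \beta_5\,\mathrm{CDR}_i + \beta_6\,\mathrm{DP}_i \notag \\
 &\quad + \beta_7\,\mathrm{FS}_i + \beta_8\,\mathrm{MS}_i + \beta_9\,X_i + \beta_{10}\,Y_i
  + \beta_{11}\,\mathrm{LE}_i + u_i \\
\text{Overall F-test:}\quad H_0 &: \beta_1 = \beta_2 = \dots = \beta_{11} = 0 \\
\text{Individual t-test:}\quad H_0 &: \beta_j = 0, \qquad j = 0, 1, \dots, 11
\end{align}
```

The overall test is rejected when Significance F is below 0.05; an individual coefficient is significant when its own P-value is below 0.05.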

July-2020
CPM^ = -1774.47 + 25.75(SI) - 0.013(HWF) - 0.004(HB) - 81.32(GDP) - 689.52(CDR) - 11.48(DP) + 736.45(FS)
+ 240.14(MS) + 131.86(X) - 166.36(Y) + 79.40(LE)

Regression Statistics
Multiple R 0.573470227
R Square 0.328868101
Adjusted R Square 0.267347677
Standard Error 1439.205169
Observations 132

ANOVA
df SS MS F Significance F
Regression 11 1.22E+08 11072555 5.345674 7.46E-07
Residual 120 2.49E+08 2071312
Total 131 3.7E+08

Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept -1774.478158 945.5032 -1.87676 0.06298 -3646.51 97.55242 -3646.51 97.55242
Avg. Stringency Index 25.75306117 8.122515 3.170577 0.001931 9.671046 41.83508 9.671046 41.83508
Handwashing facilities per Million -0.013389155 0.01082 -1.23743 0.218345 -0.03481 0.008034 -0.03481 0.008034
Hospitals beds per Million -0.004593768 0.004454 -1.03141 0.304422 -0.01341 0.004225 -0.01341 0.004225
GDP per Capita -81.32083852 119.4027 -0.68106 0.497143 -317.73 155.0882 -317.73 155.0882
Cardiovasc death rate -689.5217103 315.9976 -2.18205 0.031055 -1315.17 -63.8685 -1315.17 -63.8685
Diabetes Prevalence -11.48346042 5.66232 -2.02805 0.04477 -22.6945 -0.27246 -22.6945 -0.27246
Female smokers 736.4505951 378.0022 1.948271 0.053717 -11.9675 1484.869 -11.9675 1484.869
Male smokers 240.1390373 97.55046 2.46169 0.01525 46.99591 433.2822 46.99591 433.2822
aged 65 older 131.8621375 35.90403 3.672627 0.00036 60.77465 202.9496 60.77465 202.9496
aged 70 older -166.3621582 102.6437 -1.62077 0.107691 -369.59 36.86524 -369.59 36.86524
life expectancy 79.40316931 74.34281 1.068068 0.287634 -67.7904 226.5968 -67.7904 226.5968

• From ANOVA, the F-statistic for the model is 5.35 with Significance F = 7.46E-07, which is below 0.05; hence the model as a whole is significant.
• The intercept of -1774.47 is the predicted Cases per million (CPM) when all independent variables are zero. It is not a minimum value, and since a negative case count is not meaningful it should not be interpreted on its own.
• For an individual independent variable, for example Stringency Index (SI), a 1 unit increase in SI is associated with a 25.753 unit increase in CPM, keeping other factors constant. For GDP per capita (GDP), a 1 unit increase in GDP is associated with an 81.320 unit decrease in CPM, keeping other factors constant, although this coefficient is not statistically significant (p = 0.497). The other coefficients are read in the same way.
• Checking P-values individually shows that not all variables are significant. The null hypothesis (coefficient equal to zero) is rejected only for variables whose P-value is below 0.05: if the P-value is low, the null hypothesis must go.
• R2 being 0.329 suggests that the model explains 32.9% of the variation in CPM (adjusted R2 is 0.267). The remaining 67.1% is unexplained, owing to factors outside the model that have not been considered. The factors SI, HWF, HB, GDP, CDR, DP, FS, MS, X, Y & LE together therefore explain only about a third of the variation in CPM.
• The P-values of the coefficients of Average Stringency Index, cardiovascular death rate, diabetes prevalence, male smokers and aged 65 older are less than 5%, so these are significant. The intercept (p = 0.063), female smokers (p = 0.054) and the remaining factors have P-values above 5%, so these are not significant.
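The decision rule used in the last bullet can be applied mechanically. The following is a minimal sketch that filters the July-2020 P-values at the 5% level; the dictionary keys are the report's abbreviations, and the function name `significant_terms` is a hypothetical helper introduced only for illustration.

```python
# P-values copied from the July-2020 CPM coefficient table above.
P_VALUES_JULY_CPM = {
    "Intercept": 0.06298,
    "SI": 0.001931,
    "HWF": 0.218345,
    "HB": 0.304422,
    "GDP": 0.497143,
    "CDR": 0.031055,
    "DP": 0.04477,
    "FS": 0.053717,
    "MS": 0.01525,
    "X": 0.00036,
    "Y": 0.107691,
    "LE": 0.287634,
}


def significant_terms(p_values, alpha=0.05):
    """Return the terms whose P-value is strictly below alpha, in table order."""
    return [name for name, p in p_values.items() if p < alpha]


print(significant_terms(P_VALUES_JULY_CPM))
```

The same function can be reused for any other month by swapping in that month's P-value column.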

June-2020
CPM^ = -1788.61 + 8.69(SI) - 0.00925(HWF) - 0.0133(HB) - 45.75(GDP) - 878.32(CDR) - 14.1978(DP) +
966.05(FS) + 327.5(MS) + 188.8(X) - 295.5(Y) - 21.5(LE)

Regression Statistics
Multiple R 0.563116
R Square 0.3171
Adjusted R Square 0.254501
Standard Error 1587.827
Observations 132

ANOVA
df SS MS F Significance F
Regression 11 1.4E+08 12771266 5.06556 1.81E-06
Residual 120 3.03E+08 2521195
Total 131 4.43E+08

Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept -1788.61 1076.426 -1.66162 0.099199 -3919.86 342.6347 -3919.86 342.6347
Avg. Stringency Index 8.691388 9.870954 0.880501 0.380348 -10.8524 28.23519 -10.8524 28.23519
Handwashing facilities per million -0.00925 0.011824 -0.78246 0.435484 -0.03266 0.014159 -0.03266 0.014159
Hospitals beds per Million -0.01333 0.004895 -2.72253 0.007444 -0.02302 -0.00364 -0.02302 -0.00364
GDP per Capita -45.7527 132.9869 -0.34404 0.731419 -309.058 217.5521 -309.058 217.5521
Cardiovasc death rate -878.32 352.2436 -2.4935 0.014011 -1575.74 -180.902 -1575.74 -180.902
Diabetes Prevalence -14.1978 6.306276 -2.25138 0.026181 -26.6838 -1.71183 -26.6838 -1.71183
Female smokers 966.0652 419.7902 2.301305 0.0231 134.9098 1797.221 134.9098 1797.221
Male smokers 327.5131 108.5046 3.018425 0.003104 112.6815 542.3448 112.6815 542.3448
aged 65 older 188.8068 40.34367 4.679961 7.6E-06 108.9292 268.6845 108.9292 268.6845
aged 70 older -295.553 115.9688 -2.54856 0.012079 -525.163 -65.9433 -525.163 -65.9433
life expectancy -21.5294 81.97055 -0.26265 0.793272 -183.825 140.7665 -183.825 140.7665
• From ANOVA, the F-statistic for the model is 5.07 with Significance F = 1.81E-06, which is below 0.05; hence the model as a whole is significant.
• The intercept of -1788.61 is the predicted Cases per million (CPM) when all independent variables are zero. It is not a minimum value and is not significant (p = 0.099).
• For an individual independent variable, for example Stringency Index (SI), a 1 unit increase in SI is associated with an 8.69 unit increase in CPM, keeping other factors constant, although this coefficient is not significant (p = 0.380). For GDP per capita (GDP), a 1 unit increase in GDP is associated with a 45.75 unit decrease in CPM, keeping other factors constant; this coefficient is also not significant (p = 0.731). The other coefficients are read in the same way.
• Checking P-values individually shows that not all variables are significant. The null hypothesis (coefficient equal to zero) is rejected only for variables whose P-value is below 0.05: if the P-value is low, the null hypothesis must go.
• R2 being 0.317 suggests that the model explains 31.7% of the variation in CPM (adjusted R2 is 0.255). The remaining 68.3% is unexplained, owing to factors outside the model that have not been considered. The factors SI, HWF, HB, GDP, CDR, DP, FS, MS, X, Y & LE together therefore explain only 31.7% of the variation in CPM.
• The P-values of the coefficients of hospital beds per million, cardiovascular death rate, diabetes prevalence, male and female smokers, aged 65 older and aged 70 older are less than 5%, so these are significant. The intercept (p = 0.099), Average Stringency Index, handwashing facilities, GDP per capita and life expectancy have P-values above 5%, so these are not significant.
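The adjusted R2 and the F-statistic in the summary output both follow from R2, the number of observations n and the number of predictors k. A minimal sketch using the June-2020 figures (R2 = 0.3171, n = 132, k = 11) is shown below; the function names are illustrative and not taken from the report.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R2 for a model with k predictors fitted on n observations."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def f_statistic(r2, n, k):
    """Overall F-statistic expressed through R2."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


# June-2020 CPM regression: values from the Regression Statistics block.
r2, n, k = 0.3171, 132, 11
print(adjusted_r2(r2, n, k))
print(f_statistic(r2, n, k))
```

Both results should agree with the tabulated Adjusted R Square (0.254501) and F (5.06556) up to rounding of the reported R2.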

May-2020
CPM^ = 250.86 + 9.15(SI) - 0.00438(HWF) - 0.0084(HB) + 7.22(GDP) – 2.59(CDR) + 132.27(DP) + 14.44(FS) -
88.90(Y) + 109.688(LE)

SUMMARY OUTPUT

Regression Statistics
Multiple R 0.498329
R Square 0.248332
Adjusted R Square 0.192881
Standard Error 32.74248
Observations 132

ANOVA
df SS MS F Significance F
Regression 9 43210.57 4801.174 4.478416 4.31E-05
Residual 122 130792.5 1072.07
Total 131 174003.1

Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept 19.22113 18.50097 1.038925 0.300894 -17.4034 55.84564 -17.4034 55.84564
Avg. Stringency Index 0.096144 0.222835 0.431457 0.666898 -0.34498 0.537267 -0.34498 0.537267
Handwashing facilities per million -0.00035 0.000262 -1.34987 0.179557 -0.00087 0.000165 -0.00087 0.000165
Hospitals beds per Million 1.55E-05 9.8E-05 0.158493 0.874331 -0.00018 0.000209 -0.00018 0.000209
GDP per Capita 2.797475 2.819981 0.992019 0.323152 -2.78496 8.37991 -2.78496 8.37991
Cardiovasc death rate -0.04584 0.029259 -1.56667 0.119783 -0.10376 0.012082 -0.10376 0.012082
Diabetes Prevalence 0.094581 0.787841 0.120051 0.90464 -1.46503 1.654191 -1.46503 1.654191
Female smokers 0.976192 0.455471 2.14326 0.034077 0.074543 1.877842 0.074543 1.877842
aged 70 older -0.52063 2.133609 -0.24401 0.807629 -4.74432 3.70306 -4.74432 3.70306
life expectancy -2.87607 1.718065 -1.67402 0.096689 -6.27715 0.525011 -6.27715 0.525011

• From ANOVA, Significance F for the model is 0.00062, which is below 0.05; hence the model as a whole is significant.
• The intercept of 250.86 is the predicted Cases per million (CPM) when all independent variables are zero. It is not a minimum value.
• For an individual independent variable, for example Stringency Index (SI), a 1 unit increase in SI is associated with a 9.15 unit increase in CPM, keeping other factors constant. For Cardiovascular Death Rate (CDR), a 1 unit increase in CDR is associated with a 2.59 unit decrease in CPM, keeping other factors constant. The other coefficients are read in the same way.
• Checking P-values individually shows that not all variables are significant. The null hypothesis (coefficient equal to zero) is rejected only for variables whose P-value is below 0.05: if the P-value is low, the null hypothesis must go.
• R2 being 0.207 suggests that the model explains 20.7% of the variation in CPM. The remaining 79.3% is unexplained, owing to factors outside the model that have not been considered. The factors SI, HWF, HB, GDP, CDR, DP, FS, Y & LE together therefore explain only 20.7% of the variation in CPM.
• As the P-value of the coefficient of female smokers is less than 5%, it is significant; the remaining factors have P-values above 5%, so these are not significant.
April-2020
CPM^ = 969.6 + 2.83(SI) - 0.0018(HWF) - 0.0040(HB) - 24.84(GDP) - 3.22(CDR) + 24.90(DP) + 40.82(FS) -
39.78(Y) + 75.5(LE)

SUMMARY OUTPUT

Regression Statistics
Multiple R 0.48883
R Square 0.238954
Adjusted R Square 0.196674
Standard Error 959.788
Observations 172

ANOVA
df SS MS F Significance F
Regression 9 46856539 5206282 5.651673 8.6E-07
Residual 162 1.49E+08 921193.1
Total 171 1.96E+08

Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept 969.6177 455.5493 2.128458 0.034811 70.0373 1869.198 70.0373 1869.198
Avg. Stringency Index 2.835608 5.120982 0.553723 0.580532 -7.27688 12.94809 -7.27688 12.94809
Handwashing facilities per Million -0.0018 0.005227 -0.34506 0.730496 -0.01212 0.008517 -0.01212 0.008517
Hospitals beds per Million -0.00408 0.002445 -1.66964 0.096923 -0.00891 0.000746 -0.00891 0.000746
GDP per Capita -24.8474 66.44481 -0.37396 0.708926 -156.057 106.3622 -156.057 106.3622
Cardiovasc death rate -3.22393 0.684775 -4.70802 5.34E-06 -4.57617 -1.8717 -4.57617 -1.8717
Diabetes Prevalence 24.90189 18.72101 1.330158 0.185336 -12.0668 61.87056 -12.0668 61.87056
Female smokers 40.82922 12.42525 3.285986 0.001246 16.29287 65.36556 16.29287 65.36556
aged 70 older -39.7867 48.68063 -0.8173 0.414957 -135.917 56.34369 -135.917 56.34369
life expectancy 75.52295 43.32416 1.743206 0.083195 -10.0299 161.0759 -10.0299 161.0759
• From ANOVA, the F-statistic for the model is 5.65 with Significance F = 8.6E-07, which is below 0.05; hence the model as a whole is significant.
• The intercept of 969.62 is the predicted Cases per million (CPM) when all independent variables are zero. It is not a minimum value.
• For an individual independent variable, for example Stringency Index (SI), a 1 unit increase in SI is associated with a 2.83 unit increase in CPM, keeping other factors constant, although this coefficient is not significant (p = 0.581). For Cardiovascular Death Rate (CDR), a 1 unit increase in CDR is associated with a 3.22 unit decrease in CPM, keeping other factors constant. The other coefficients are read in the same way.
• Checking P-values individually shows that not all variables are significant. The null hypothesis (coefficient equal to zero) is rejected only for variables whose P-value is below 0.05: if the P-value is low, the null hypothesis must go.
• R2 being 0.239 suggests that the model explains 23.9% of the variation in CPM (adjusted R2 is 0.197). The remaining 76.1% is unexplained, owing to factors outside the model that have not been considered. The factors SI, HWF, HB, GDP, CDR, DP, FS, Y & LE together therefore explain only 23.9% of the variation in CPM.
• The P-values of the intercept and of the coefficients of cardiovascular death rate and female smokers are less than 5%, so these are significant. The remaining factors have P-values above 5%, so these are not significant.

March-2020
CPM^ = 654.76 - 0.426(SI) - 0.00147(HWF) - 0.0011(HB) + 80.94(GDP) - 1.44(CDR) - 1.50(DP) + 33.84(FS) -
78.12(Y) - 7.31(LE)

SUMMARY OUTPUT

Regression Statistics
Multiple R 0.536895
R Square 0.288256
Adjusted R Square 0.233506
Standard Error 572.818
Observations 127

ANOVA
df SS MS F Significance F
Regression 9 15547952 1727550 5.264988 5.15E-06
Residual 117 38390091 328120.4
Total 126 53938043

Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept 654.7662 264.7256 2.473377 0.01482 130.4911 1179.041 130.4911 1179.041
Avg. Stringency Index -0.42628 3.807047 -0.11197 0.911038 -7.96594 7.113376 -7.96594 7.113376
Handwashing facilities per million -0.00147 0.004756 -0.30922 0.757707 -0.01089 0.007948 -0.01089 0.007948
Hospitals beds per million -0.00115 0.001701 -0.67375 0.501799 -0.00451 0.002223 -0.00451 0.002223
GDP per Capita 80.94362 49.49364 1.635435 0.104645 -17.0759 178.9632 -17.0759 178.9632
Cardiovasc death rate -1.44579 0.529866 -2.72859 0.007342 -2.49516 -0.39641 -2.49516 -0.39641
Diabetes Prevalence -1.50334 13.85254 -0.10852 0.913766 -28.9376 25.93089 -28.9376 25.93089
Female smokers 33.84933 8.080418 4.189056 5.46E-05 17.84648 49.85217 17.84648 49.85217
aged 70 older -78.1284 37.55589 -2.08032 0.039679 -152.506 -3.75091 -152.506 -3.75091
life expectancy -7.31696 30.03836 -0.24359 0.807977 -66.8064 52.17244 -66.8064 52.17244
• From ANOVA, the F-statistic for the model is 5.26 with Significance F = 5.15E-06, which is below 0.05; hence the model as a whole is significant.
• The intercept of 654.76 is the predicted Cases per million (CPM) when all independent variables are zero. It is not a minimum value.
• For an individual independent variable, for example Stringency Index (SI), a 1 unit increase in SI is associated with a 0.43 unit decrease in CPM, keeping other factors constant, although this coefficient is not significant (p = 0.911). For GDP per capita (GDP), a 1 unit increase in GDP is associated with an 80.94 unit increase in CPM, keeping other factors constant; this coefficient is also not significant (p = 0.105). The other coefficients are read in the same way.
• Checking P-values individually shows that not all variables are significant. The null hypothesis (coefficient equal to zero) is rejected only for variables whose P-value is below 0.05: if the P-value is low, the null hypothesis must go.
• R2 being 0.288 suggests that the model explains 28.8% of the variation in CPM (adjusted R2 is 0.234). The remaining 71.2% is unexplained, owing to factors outside the model that have not been considered. The factors SI, HWF, HB, GDP, CDR, DP, FS, Y & LE together therefore explain only 28.8% of the variation in CPM.
• The P-values of the intercept and of the coefficients of cardiovascular death rate, female smokers and aged 70 older are less than 5%, so these are significant. The remaining factors have P-values above 5%, so these are not significant.
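The ANOVA block is internally linked: each mean square is a sum of squares divided by its degrees of freedom, F is the ratio of the two mean squares, and R2 is the regression share of the total sum of squares. The sketch below recomputes these for the March-2020 CPM regression; `anova_summary` is a hypothetical helper name used only here.

```python
def anova_summary(ss_reg, df_reg, ss_res, df_res):
    """Recompute mean squares, F and R2 from the sums of squares and degrees of freedom."""
    ms_reg = ss_reg / df_reg          # mean square, regression
    ms_res = ss_res / df_res          # mean square, residual
    return {
        "ms_reg": ms_reg,
        "ms_res": ms_res,
        "f": ms_reg / ms_res,
        "r2": ss_reg / (ss_reg + ss_res),
    }


# March-2020 CPM regression: SS and df from the ANOVA table above.
march_cpm = anova_summary(ss_reg=15547952, df_reg=9, ss_res=38390091, df_res=117)
for name, value in march_cpm.items():
    print(name, value)
```

The recomputed F and R2 should match the reported 5.264988 and 0.288256 up to rounding, which is a quick consistency check on any of the monthly outputs.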

February-2020
CPM^ = 20.59 + 0.482(SI) + 0.3074(HWF) - 3.1175(GDP) - 0.03967(CDR) - 1.28201(DP) + 0.581109(Y) +
1.937(LE)

SUMMARY OUTPUT

Regression Statistics
Multiple R 0.866625
R Square 0.751038
Adjusted R Square 0.654219
Standard Error 8.843517
Observations 26

ANOVA
df SS MS F Significance F
Regression 7 4246.699 606.6713 7.757172 0.000221
Residual 18 1407.74 78.2078
Total 25 5654.44

Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept 20.58889 9.18834 2.240763 0.037891 1.284908 39.89288 1.284908 39.89288
Avg Stringency Index 0.482582 0.125233 3.853485 0.001164 0.219478 0.745686 0.219478 0.745686
Handwashing facilities 0.30749 0.259602 1.184463 0.251634 -0.23791 0.852894 -0.23791 0.852894
GDP per Capita -3.11755 1.299785 -2.39851 0.027511 -5.8483 -0.3868 -5.8483 -0.3868
Cardiovasc death rate -0.03967 0.024719 -1.60498 0.1259 -0.09161 0.012259 -0.09161 0.012259
Diabetes Prevalence -1.28201 0.758361 -1.6905 0.108175 -2.87527 0.311247 -2.87527 0.311247
aged 70 older 0.581109 0.83109 0.699213 0.493356 -1.16495 2.327164 -1.16495 2.327164
life expectancy 1.93717 0.795527 2.435078 0.025517 0.26583 3.60851 0.26583 3.60851
• From ANOVA, the F-statistic for the model is 7.76 with Significance F = 0.000221, which is below 0.05; hence the model as a whole is significant.
• The intercept of 20.59 is the predicted Cases per million (CPM) when all independent variables are zero. It is not a minimum value.
• For an individual independent variable, for example Stringency Index (SI), a 1 unit increase in SI is associated with a 0.48 unit increase in CPM, keeping other factors constant. For GDP per capita (GDP), a 1 unit increase in GDP is associated with a 3.12 unit decrease in CPM, keeping other factors constant. The other coefficients are read in the same way.
• Checking P-values individually shows that Stringency Index, GDP per capita and life expectancy are significant, while handwashing facilities, cardiovascular death rate, diabetes prevalence and aged 70 older are not.
• R2 being 0.751 suggests that the model explains 75.1% of the variation in CPM (adjusted R2 is 0.654). The remaining 24.9% is unexplained, owing to factors outside the model that have not been considered. This model uses only the factors SI, HWF, GDP, CDR, DP, Y & LE and is fitted on only 26 observations, so the high R2 should be read with caution.
• The P-values of the intercept and of the coefficients of Average Stringency Index, GDP per capita and life expectancy are less than 5%, so these are significant. The remaining factors have P-values above 5%, so these are not significant.
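Each row of the coefficient table can be rebuilt from the coefficient and its standard error: the t Stat is their ratio, and the 95% interval is the coefficient plus or minus a critical t value times the standard error. The sketch below does this for the February-2020 Stringency Index row. The critical value 2.100922 (two-sided 5%, 18 residual degrees of freedom) is taken from a standard t table and is an assumption, not a figure printed in the report.

```python
def t_stat_and_ci(coef, std_err, t_crit):
    """Return (t statistic, lower bound, upper bound) for one coefficient."""
    t_stat = coef / std_err
    half_width = t_crit * std_err
    return t_stat, coef - half_width, coef + half_width


# Assumed two-sided 5% critical value of Student's t with 18 degrees of freedom.
T_CRIT_18_DF = 2.100922

# February-2020 CPM regression, Avg Stringency Index row.
t_stat, lower, upper = t_stat_and_ci(coef=0.482582, std_err=0.125233, t_crit=T_CRIT_18_DF)
print(t_stat, lower, upper)
```

The result should agree with the tabulated t Stat (3.853485) and interval (0.219478 to 0.745686) to roughly four decimals. Because the interval excludes zero, the coefficient is significant at the 5% level, consistent with its P-value.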

REGRESSION BASED ON DEATHS PER MILLION (DPM):

March-2020

DPM^ = -2.089 + 0.279(SI) + 0.0000859(HWF) - 0.000028(HB) + 3.39(GDP) - 0.041(CDR) - 0.329(DP) + 0.692(FS)
- 1.619(Y) - 2.30(LE)

Regression Statistics
Multiple R 0.483585
R Square 0.233855
Adjusted R Square 0.174921
Standard Error 23.50812
Observations 127

ANOVA
df SS MS F Significance F
Regression 9 19735.89 2192.877 3.968063 0.000194
Residual 117 64657.89 552.6315
Total 126 84393.78

Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept -2.08964 10.86418 -0.19234 0.847808 -23.6056 19.42631 -23.6056 19.42631
Avg. Stringency Index 0.279127 0.156239 1.786537 0.076602 -0.0303 0.58855 -0.0303 0.58855
Handwashing facilities per million 8.59E-05 0.000195 0.440007 0.660744 -0.0003 0.000472 -0.0003 0.000472
Hospitals beds per million -2.8E-05 6.98E-05 -0.40167 0.688658 -0.00017 0.00011 -0.00017 0.00011
GDP per Capita 3.390891 2.03119 1.669411 0.09771 -0.63177 7.413557 -0.63177 7.413557
Cardiovasc death rate -0.04124 0.021745 -1.89633 0.060382 -0.0843 0.001829 -0.0843 0.001829
Diabetes Prevalence -0.32882 0.5685 -0.5784 0.564107 -1.4547 0.797065 -1.4547 0.797065
Female smokers 0.691642 0.331616 2.085672 0.039182 0.034894 1.348389 0.034894 1.348389
aged 70 older -1.61868 1.541272 -1.05022 0.295779 -4.67109 1.433726 -4.67109 1.433726
life expectancy -2.29979 1.232757 -1.86556 0.064607 -4.7412 0.141624 -4.7412 0.141624

• The intercept of -2.08964 is the predicted deaths per million (DPM) when all independent variables are zero. It is not a minimum value and is not significant (p = 0.848).
• For an individual independent variable, for example female smokers, a 1 unit increase in female smokers is associated with a 0.691642 unit increase in DPM, keeping other factors constant. For GDP per Capita, a 1 unit increase in GDP is associated with a 3.390 unit increase in DPM, keeping other factors constant, although this coefficient is not significant (p = 0.098). The other coefficients are read in the same way.
• Checking P-values individually shows that only female smokers (p = 0.039) is significant at the 5% level. All other variables have P-values above 0.05, so the null hypothesis cannot be rejected for them.
• R2 being 0.234 suggests that the model explains 23.4% of the variation in DPM. The remaining 76.6% is unexplained, owing to factors outside the model that have not been considered. The factors SI, HWF, HB, GDP, CDR, DP, FS, Y & LE together therefore explain only 23.4% of the variation in DPM.

April-2020
DPM^ = 51.82 + 0.216(SI) - 0.00039(HWF) - 0.000098(HB) + 6.07(GDP) - 0.18(CDR) - 0.46(DP) + 4.05(FS) -
4.92(Y) - 0.72(LE)

Regression Statistics
Multiple R 0.517536
R Square 0.267843
Adjusted R Square 0.227168
Standard Error 76.70034
Observations 172

ANOVA
df SS MS F Significance F
Regression 9 348647 38738.56 6.584895 5.47E-08
Residual 162 953036.7 5882.942
Total 171 1301684

Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept 51.82364 36.40469 1.423543 0.156502 -20.0653 123.7126 -20.0653 123.7126
Avg. Stringency Index 0.215991 0.409237 0.52779 0.598367 -0.59214 1.024119 -0.59214 1.024119
Handwashing facilities per Million -0.00039 0.000418 -0.93352 0.351943 -0.00121 0.000435 -0.00121 0.000435
Hospitals beds per Million -9.8E-05 0.000195 -0.50198 0.616362 -0.00048 0.000288 -0.00048 0.000288
GDP per Capita 6.071301 5.309859 1.143401 0.254559 -4.41416 16.55676 -4.41416 16.55676
Cardiovasc death rate -0.18111 0.054723 -3.30958 0.001152 -0.28917 -0.07305 -0.28917 -0.07305
Diabetes Prevalence -0.46181 1.496067 -0.30868 0.757958 -3.41612 2.492495 -3.41612 2.492495
Female smokers 4.054457 0.99295 4.083245 6.96E-05 2.093663 6.01525 2.093663 6.01525
aged 70 older -4.9188 3.890256 -1.26439 0.207906 -12.6009 2.76335 -12.6009 2.76335
life expectancy -0.72023 3.4622 -0.20803 0.835469 -7.55709 6.116627 -7.55709 6.116627

• The intercept of 51.82364 is the predicted deaths per million (DPM) when all independent variables are zero. It is not a minimum value and is not significant (p = 0.157).
• For an individual independent variable, for example female smokers, a 1 unit increase in female smokers is associated with a 4.054457 unit increase in DPM, keeping other factors constant. For GDP per Capita, a 1 unit increase in GDP is associated with a 6.071301 unit increase in DPM, keeping other factors constant, although this coefficient is not significant (p = 0.255). The other coefficients are read in the same way.
• Checking P-values individually shows that only cardiovascular death rate (p = 0.0012) and female smokers (p = 6.96E-05) are significant at the 5% level. All other variables have P-values above 0.05, so the null hypothesis cannot be rejected for them.
• R2 being 0.268 suggests that the model explains 26.8% of the variation in DPM. The remaining 73.2% is unexplained, owing to factors outside the model that have not been considered. The factors SI, HWF, HB, GDP, CDR, DP, FS, Y & LE together therefore explain only 26.8% of the variation in DPM.

May-2020

DPM^ = 19.22 + 0.096(SI) - 0.00035(HWF) + 0.0000155(HB) + 2.80(GDP) - 0.046(CDR) + 0.095(DP) +
0.976(FS) - 0.52(Y) - 2.88(LE)

Regression Statistics
Multiple R 0.498329
R Square 0.248332
Adjusted R Square 0.192881
Standard Error 32.74248
Observations 132

ANOVA
df SS MS F Significance F
Regression 9 43210.57 4801.174 4.478416 4.31E-05
Residual 122 130792.5 1072.07
Total 131 174003.1

Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept 19.22113 18.50097 1.038925 0.300894 -17.4034 55.84564 -17.4034 55.84564
Avg. Stringency Index 0.096144 0.222835 0.431457 0.666898 -0.34498 0.537267 -0.34498 0.537267
Handwashing facilities per million -0.00035 0.000262 -1.34987 0.179557 -0.00087 0.000165 -0.00087 0.000165
Hospitals beds per Million 1.55E-05 9.8E-05 0.158493 0.874331 -0.00018 0.000209 -0.00018 0.000209
GDP per Capita 2.797475 2.819981 0.992019 0.323152 -2.78496 8.37991 -2.78496 8.37991
Cardiovasc death rate -0.04584 0.029259 -1.56667 0.119783 -0.10376 0.012082 -0.10376 0.012082
Diabetes Prevalence 0.094581 0.787841 0.120051 0.90464 -1.46503 1.654191 -1.46503 1.654191
Female smokers 0.976192 0.455471 2.14326 0.034077 0.074543 1.877842 0.074543 1.877842
aged 70 older -0.52063 2.133609 -0.24401 0.807629 -4.74432 3.70306 -4.74432 3.70306
life expectancy -2.87607 1.718065 -1.67402 0.096689 -6.27715 0.525011 -6.27715 0.525011

• The intercept of 19.22113 is the predicted deaths per million (DPM) when all independent variables are zero. It is not a minimum value and is not significant (p = 0.301).
• For an individual independent variable, for example female smokers, a 1 unit increase in female smokers is associated with a 0.976192 unit increase in DPM, keeping other factors constant. For GDP per Capita, a 1 unit increase in GDP is associated with a 2.797475 unit increase in DPM, keeping other factors constant, although this coefficient is not significant (p = 0.323). The other coefficients are read in the same way.
• Checking P-values individually shows that only female smokers (p = 0.034) is significant at the 5% level. All other variables have P-values above 0.05, so the null hypothesis cannot be rejected for them.
• R2 being 0.248 suggests that the model explains 24.8% of the variation in DPM. The remaining 75.2% is unexplained, owing to factors outside the model that have not been considered. The factors SI, HWF, HB, GDP, CDR, DP, FS, Y & LE together therefore explain only 24.8% of the variation in DPM.
June-2020
DPM^ = 10.74 + 0.116(SI) + 0.00000713(HWF) - 0.00012(HB) - 0.36(GDP) - 21.74(CDR) - 0.34(DP) + 21.36(FS) +
8.76(MS) + 1.194(X) - 1.455(Y) - 1.49(LE)

Regression Statistics
Multiple R 0.555506
R Square 0.308587
Adjusted R Square 0.245207
Standard Error 26.2231
Observations 132

ANOVA
df SS MS F Significance F
Regression 11 36828.88 3348.08 4.868865 3.4E-06
Residual 120 82518.11 687.6509
Total 131 119347

Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept 10.73925 17.77727 0.6041 0.546917 -24.4585 45.937 -24.4585 45.937
Avg. Stringency Index 0.116113 0.16302 0.712266 0.477683 -0.20665 0.438881 -0.20665 0.438881
Handwashing facilities per million 7.13E-06 0.000195 0.036489 0.970953 -0.00038 0.000394 -0.00038 0.000394
Hospitals beds per Million -0.00012 8.08E-05 -1.50653 0.13456 -0.00028 3.83E-05 -0.00028 3.83E-05
GDP per Capita -0.36449 2.19629 -0.16596 0.868471 -4.71299 3.984014 -4.71299 3.984014
Cardiovasc death rate -21.7362 5.817333 -3.73646 0.000287 -33.2541 -10.2183 -33.2541 -10.2183
Diabetes Prevalence -0.3414 0.104149 -3.27798 0.001368 -0.5476 -0.13519 -0.5476 -0.13519
Female smokers 21.35626 6.93287 3.080435 0.002563 7.629655 35.08286 7.629655 35.08286
Male smokers 8.76054 1.791963 4.888795 3.18E-06 5.212578 12.3085 5.212578 12.3085
aged 65 older 1.194068 0.666279 1.792143 0.07563 -0.12512 2.513254 -0.12512 2.513254
aged 70 older -1.45451 1.915234 -0.75944 0.449078 -5.24654 2.337523 -5.24654 2.337523
life expectancy -1.49428 1.35375 -1.10381 0.271885 -4.17461 1.186051 -4.17461 1.186051

• The intercept of 10.73925 is the predicted deaths per million (DPM) when all independent variables are zero. It is not a minimum value and is not significant (p = 0.547).
• For an individual independent variable, for example female smokers, a 1 unit increase in female smokers is associated with a 21.35626 unit increase in DPM, keeping other factors constant. For male smokers, a 1 unit increase is associated with an 8.76054 unit increase in DPM, keeping other factors constant. The other coefficients are read in the same way.
• Checking P-values individually shows that cardiovascular death rate, diabetes prevalence, female smokers and male smokers are significant at the 5% level. All other variables have P-values above 0.05, so the null hypothesis cannot be rejected for them.
• R2 being 0.309 suggests that the model explains 30.9% of the variation in DPM. The remaining 69.1% is unexplained, owing to factors outside the model that have not been considered. The factors SI, HWF, HB, GDP, CDR, DP, FS, MS, X, Y & LE together therefore explain only 30.9% of the variation in DPM.

July-2020
DPM^ = -8.99 + 0.545(SI) - 0.000000033(HWF) - 0.0000089(HB) - 2.92(GDP) - 19.90(CDR) - 0.318(DP) +
19.288(FS) + 8.10(MS) + 0.94(X) - 0.046(Y) + 1.65(LE)

Regression Statistics
Multiple R 0.577905027
R Square 0.33397422
Adjusted R Square 0.272921857
Standard Error 31.147874
Observations 132

ANOVA
df SS MS F Significance F
Regression 11 58379.45 5307.222 5.470291 5.04E-07
Residual 120 116422.8 970.1901
Total 131 174802.3

Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept -8.993643514 20.46297 -0.43951 0.661084 -49.5089 31.52162 -49.5089 31.52162
Avg Stringency Index 0.544992638 0.175791 3.100233 0.00241 0.196939 0.893046 0.196939 0.893046
Handwashing facilities per Million -3.30027E-08 0.000234 -0.00014 0.999888 -0.00046 0.000464 -0.00046 0.000464
Hospitals beds per Million -8.89784E-06 9.64E-05 -0.09231 0.926607 -0.0002 0.000182 -0.0002 0.000182
GDP per Capita -2.919547927 2.584163 -1.12978 0.26082 -8.03601 2.196915 -8.03601 2.196915
Cardiovasc death rate -19.89910535 6.838951 -2.90967 0.004313 -33.4398 -6.35846 -33.4398 -6.35846
Diabetes Prevalence -0.318378504 0.122546 -2.59803 0.01055 -0.56101 -0.07575 -0.56101 -0.07575
Female smokers 19.28820422 8.18088 2.357717 0.020006 3.090631 35.48578 3.090631 35.48578
Male smokers 8.099222035 2.111227 3.836262 0.000201 3.919139 12.27931 3.919139 12.27931
aged 65 older 0.941928757 0.77705 1.212186 0.227823 -0.59658 2.480434 -0.59658 2.480434
aged 70 older -0.046196671 2.221458 -0.0208 0.983443 -4.44453 4.352135 -4.44453 4.352135
life expectancy 1.649549552 1.608958 1.025228 0.307318 -1.53608 4.835174 -1.53608 4.835174

• From ANOVA, it can be observed that F-Value for the model is 5.04E-07, hence the model is
significant.
Business Statistics & Analytics Group Assignment Report
Apoorva Chhangani (p41071), Rahul Jha (p41098), Ram Sandeep Peddada (p41100)
Saloni Sharma (p41104), Shubhayan Modak (p41115)

• The intercept of -8.993 is the predicted death rate (DPM) when every independent variable
equals zero. It is not a minimum value, and with a P-value of 0.661 it is not statistically
different from zero.
• Each coefficient is the expected change in death rate for a 1 unit increase in that variable,
keeping other factors constant. For example, a 1 unit increase in female smokers is associated
with a 19.288 unit increase in death rate, while a 1 unit increase in male smokers is
associated with an 8.099 unit increase. The other independent variables can be read in the
same way.
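• Checking P-Values individually against the 5% significance level shows that only SI (0.0024),
CDR (0.0043), DP (0.0106), FS (0.0200) and MS (0.0002) are significant, since if P-Value is low,
Null Hypothesis must go. HWF, HB, GDP, X, Y and LE have P-Values above 0.05 and are not
significant.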
• R Square being 0.334 suggests that the model considered can explain 33.4% of the variation in
death rate. The remaining 66.6% is unexplained and is due to factors outside the model which
have not been considered. Therefore, the factors SI, HWF, HB, GDP, CDR, DP, FS, MS, X, Y & LE
together explain only 33.4% of the variation in death rate.
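
A minimal sketch of the individual P-value check, comparing each P-value from the coefficients table with the 5% significance level. The names `ALPHA` and `p_values` are illustrative only; the numbers are copied from the table and the short codes are the ones defined at the start of the report.

```python
ALPHA = 0.05  # significance level stated in the report

# P-values copied from the coefficients table (short codes as defined in the report).
p_values = {
    "SI": 0.00241, "HWF": 0.999888, "HB": 0.926607, "GDP": 0.26082,
    "CDR": 0.004313, "DP": 0.01055, "FS": 0.020006, "MS": 0.000201,
    "X": 0.227823, "Y": 0.983443, "LE": 0.307318,
}

# A variable is significant when its P-value falls below the chosen level.
significant = [name for name, p in p_values.items() if p < ALPHA]
not_significant = [name for name, p in p_values.items() if p >= ALPHA]

print("Significant at 5%:", significant)
print("Not significant at 5%:", not_significant)
```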

1. Does Government Response to the Covid-19 crisis have an impact on how the pandemic has affected
each country?
We have generated a month-wise model to study how the no. of cases per million and no. of deaths per million
reported in different countries vary with the government response, i.e., the Stringency Index.

2. Do the development indicators tell a story on the effectiveness of a country’s response to the Covid-19
pandemic?
We have generated a month-wise model to study how the no. of cases per million and no. of deaths per million
reported in different countries vary with Hand Washing Facilities per million, Hospital beds per
million and GDP per Capita.

3. Experts are of the opinion that lifestyle diseases and comorbidities are major risk factors in the fight
against the Covid-19 pandemic. Certain lifestyle choices and certain demographic factors also tend to
worsen the situation. Does the data give credence to these claims?
We have generated a month-wise model to study how the no. of cases per million and no. of deaths per million
reported in different countries vary with the cardiovascular death rate, diabetes prevalence,
the proportions of male and female smokers and life expectancy in the countries.

4. How has Covid-19 affected different age groups with respect to severity in terms of number of deaths
reported?
We have generated a model to study how the total no. of deaths in a country varies with the number of
people above age 65 and above age 70.

SUMMARY OUTPUT

Regression Statistics
Multiple R 0.478988
R Square 0.22943
Adjusted R Square 0.220417
Standard Error 13134.46
Observations 174

ANOVA
df SS MS F Significance F
Regression 2 8783290056 4391645028 25.45676 2.1E-10
Residual 171 29499880104 172513918.7
Total 173 38283170160

Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept 2308.563 1031.619141 2.237805078 0.026525 272.2145 4344.911 272.2145 4344.911
older than 65 -0.000408 9.22378E-05 -4.419931522 1.75E-05 -0.00059 -0.000226 -0.00059 -0.00023
older than 70 0.001895 0.000346285 5.472411139 1.56E-07 0.001211 0.002579 0.001211 0.002579

Dependent Variable: Total number of deaths for the months January to July, D
Independent Variables:
Total population older than 65, X
Total population older than 70, Y

Multiple Linear Regression


D = f (X, Y)
D = Bo + B1(X) + B2(Y) + U
D^ = b*0 + b*1(X) + b*2(Y)
D^ = 2308.563 - 0.000408(X) + 0.001895(Y)
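
To illustrate how the estimates b*0, b*1 and b*2 of such a two-variable model are obtained, here is a minimal ordinary-least-squares sketch that solves the normal equations in plain Python. The data are synthetic and noise-free (generated from D = 2 + 3X - Y), chosen only so that the recovered coefficients are easy to check. They are not the report's data, and the report's figures were not produced by this code; the function names are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(xs, ys, ds):
    """Estimate (b0, b1, b2) in D = b0 + b1*X + b2*Y by least squares."""
    rows = [[1.0, x, y] for x, y in zip(xs, ys)]  # design matrix with intercept
    # Normal equations: (A^T A) b = A^T d
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atd = [sum(r[i] * d for r, d in zip(rows, ds)) for i in range(3)]
    return solve_linear_system(ata, atd)


# Synthetic, noise-free data (an assumption for illustration only).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
ds = [2 + 3 * x - y for x, y in zip(xs, ys)]

b0, b1, b2 = fit_ols(xs, ys, ds)
print(round(b0, 6), round(b1, 6), round(b2, 6))
```

With real data the residual term U is not zero, so the estimates come with standard errors and P-values as in the regression output.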
From ANOVA, it can be observed that the F-value for the model is 25.46 with a Significance F of 2.1E-10. Since
this is far below the 5% significance level, the model is significant.
The intercept of 2308.563 is the predicted total deaths when both X and Y equal zero; it is not a minimum
value.
For X, i.e. total population older than 65, a 1 unit increase in X is associated with a 0.000408 unit decrease in
total deaths, keeping Y constant. For Y, i.e. total population older than 70, a 1 unit increase in Y is
associated with a 0.001895 unit increase in total deaths, keeping X constant.
Checking P-Values individually shows that both X (1.75E-05) and Y (1.56E-07) are significant at the 5% level,
since if P-Value is low, Null Hypothesis must go.
R squared being 0.229 suggests that the model considered can explain 22.9% of the variation in total deaths.
The remaining 77.1% is due to factors outside the model which have not been considered. Therefore, factors X
and Y together explain only 22.9% of the variation in total deaths.
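
To make the coefficient interpretation concrete, the sketch below evaluates the fitted equation from the regression output above for one hypothetical country. The population figures are invented for illustration and do not come from the data set; the function name is likewise hypothetical.

```python
# Coefficients copied from the regression output above.
B0 = 2308.563     # intercept
B1 = -0.000408    # total population older than 65 (X)
B2 = 0.001895     # total population older than 70 (Y)


def predicted_deaths(pop_over_65, pop_over_70):
    """Fitted total deaths D^ = b0 + b1*X + b2*Y."""
    return B0 + B1 * pop_over_65 + B2 * pop_over_70


# Hypothetical country (made-up inputs, not from the data set).
base = predicted_deaths(1_000_000, 600_000)

# Raising Y by 100,000 while holding X fixed changes D^ by B2 * 100,000.
more_over_70 = predicted_deaths(1_000_000, 700_000)

print(round(base, 3), round(more_over_70 - base, 3))
```

The difference between the two predictions depends only on the coefficient of Y, which is what "keeping the other variable constant" means in practice.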

5. How can we assess an overall performance of world against Covid-19 from this data?
A box-and-whisker plot can be drawn to represent the overall performance of the world, with cases
per million or deaths per million for the different countries as the variable.
i. The box indicates the middle 50% range of the data.
ii. The quartile points and the Lower & Upper Whisker points are plotted; points which lie beyond
the whiskers are the outliers.
The outliers on the upper side show us the countries which are most affected by the virus.
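
The outlier step can be sketched with the common 1.5 x IQR whisker rule. The report does not state which whisker rule its chart uses, so that rule is an assumption here, and the country names and values below are made up for illustration.

```python
import statistics

# Hypothetical cases-per-million values (not the report's data).
cases_per_million = {
    "Country A": 120, "Country B": 340, "Country C": 560, "Country D": 610,
    "Country E": 780, "Country F": 950, "Country G": 1100, "Country H": 9800,
}

values = sorted(cases_per_million.values())

# Quartiles; the box spans q1 to q3, i.e. the middle 50% of the data.
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

# Whisker limits under the assumed 1.5 * IQR rule.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Countries whose value lies beyond either whisker limit.
outliers = [
    country for country, value in cases_per_million.items()
    if value < lower_fence or value > upper_fence
]

print("Q1, median, Q3:", q1, median, q3)
print("Outliers:", outliers)
```

In this made-up sample only the country with the very large value falls beyond the upper whisker limit.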

The Outliers are:
