
Cigarettes Case, Sahil Nayar, Cluster Y

In order to answer the questions below, I first ran a linear regression of cigarette sales per capita against all of the explanatory variables except ‘state’. However, as can be seen in section 1 of the appendix, the explanatory power of this model is very low, with an R^2 of just 0.32.

Therefore, I ran a second regression of ln(cigarette sales per capita) against all of the explanatory variables except ‘state’. The R^2 improves to 0.39, suggesting that this model has better explanatory power. Therefore, I refer to this model (reproduced in section 2 of the appendix) when answering the questions below.
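As a quick check on the two R^2 values quoted above, the sketch below recomputes them from the residual and total sums of squares in the ANOVA tables of sections 1 and 2 of the appendix, using R^2 = 1 - SSE/SST. The helper name `r_squared` is illustrative only; the figures are copied from the appendix.

```python
def r_squared(sse: float, sst: float) -> float:
    """Share of total variation explained by the model: R^2 = 1 - SSE / SST."""
    return 1.0 - sse / sst


# Residual (SSE) and total (SST) sums of squares from the appendix ANOVA tables.
linear_r2 = r_squared(sse=34925.96885, sst=51425.44353)   # section 1: Sales
log_r2 = r_squared(sse=1.589606882, sst=2.612054897)      # section 2: ln(Sales)

print(f"Linear model R^2: {linear_r2:.4f}")
print(f"Log model R^2:    {log_r2:.4f}")
```

Both values should agree with the "R Square" entries reported in the corresponding regression statistics.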

In the first model, Income is significant at the 10% level, Price is significant at the 1% level, and the remaining variables are not significant. This is counterintuitive, since we would expect Age to be significant (there is a legal age for smoking). It is also reasonable to expect education to be significant, since educated people are more aware of the dangers of smoking. The fact that Income is only weakly significant is consistent with the fact that cigarettes are habit-forming products, and hence relatively income inelastic. The signs of the significant coefficients are in line with expectations.

In the second model, Age is significant at the 10% level, Income is significant at the 2% level, and Price is significant at the 1% level. The signs of the significant coefficients are in line with expectations. This appears to be a better model than the first.
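The significance statements for the second model can be read directly off the p-values in section 2 of the appendix. The sketch below reports the tightest level at which each coefficient is significant. The helper name `tightest_level` and the set of thresholds (1%, 2%, 5%, 10%) are assumptions made for illustration.

```python
from typing import Optional

# P-values for the ln(Sales) model, copied from section 2 of the appendix.
P_VALUES = {
    "Age": 0.08070137,
    "HS": 0.529438762,
    "Income": 0.017466212,
    "Black": 0.580298502,
    "Female": 0.730994073,
    "Price": 0.001066606,
}

# Assumed set of conventional significance levels, smallest first.
LEVELS = (0.01, 0.02, 0.05, 0.10)


def tightest_level(p_value: float) -> Optional[float]:
    """Return the smallest listed level at which p_value is significant, else None."""
    for level in LEVELS:
        if p_value < level:
            return level
    return None


for name, p in P_VALUES.items():
    level = tightest_level(p)
    verdict = (
        f"significant at the {level:.0%} level"
        if level is not None
        else "not significant at the 10% level"
    )
    print(f"{name:<7} p = {p:.4f}  {verdict}")
```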

Multicollinearity could affect both models. We would expect, for example, income and age to be positively correlated, income and education to be positively correlated, and income and black to be negatively correlated.

• (a) H0: B(Female) = 0; H1: B(Female) is not equal to zero. We use a two-tail t-test. Since the p-value for the Female explanatory variable is 0.85 (section 1 of the appendix), we fail to reject the null hypothesis at any conventional significance level. We conclude that Female is not needed in the regression equation.

• (b) To answer this question, it is necessary to run a regression of sales against all explanatory variables except ‘state’, ‘female’ and ‘high’. The results of this regression are reported in section 7 of the appendix. Conducting a partial F-test that compares the full model (regression 1) with the reduced model (regression 7) will help test the following hypotheses:

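H0: B(Female) = B(HS) = 0

H1: Either one or both coefficients are not equal to 0

F-stat = [(SSErm - SSEfm)/(p + 1 - k)] / [SSEfm/(n - p - 1)]

where SSErm = 34959.77 (reduced model, section 7), SSEfm = 34925.97 (full model, section 1), n = 51, p = 6 and k = 5 is the number of parameters in the reduced model.

F-stat = (33.80/2) / (34925.97/44) = 0.021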

Since the F-stat is so low, we fail to reject the null hypothesis.

• (c) From section 2 of the appendix, we can see that the lower 95% and upper 95% limits for the Income coefficient are 3.13239E-05 and 0.000309126 respectively. Therefore, the 95% confidence interval is 0.000170225 +/- 0.0001389.

• (d) From section 4 of the appendix, we can see that the R^2 when Income is removed from the regression equation is 0.31, implying that 31% of the variation in ln(sales) can be explained when Income is removed from the model.

• (e) From section 5 of the appendix, we can see that the R^2 when we include only Price, Income and Age in the regression equation is 0.37. This implies that Price, Income and Age can explain 37% of the variation in ln(sales). Note that R^2 has fallen by only 0.02 compared to regression model 2, which included all of the variables except ‘state’. This suggests that these three variables have the bulk of the explanatory power out of all the ones that we have data for.

• (f) From section 6 of the appendix, we can see that the R^2 when only Income is included in the regression model is 0.14, implying that Income can explain 14% of the variation in ln(sales). At first sight, this seems to contradict the answer to question (d), where we removed Income and R^2 fell by only 0.08, and not by 0.14. However, the reason for this is that Income is correlated with the other variables that are still included in the regression equation. When Income is removed, these act as partial proxies for Income and pick up part of its effect on sales.
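The arithmetic behind the answers above can be retraced from the appendix figures. The following is a sketch only: the function names are illustrative, and the F-test assumes that k counts the parameters of the reduced model including the intercept (k = 5), so that two restrictions are tested. Under that assumption the statistic is roughly 0.02, far too small to reject the null hypothesis.

```python
from typing import Tuple


def partial_f(sse_rm: float, sse_fm: float, n: int, p: int, k: int) -> float:
    """F = [(SSE_RM - SSE_FM) / (p + 1 - k)] / [SSE_FM / (n - p - 1)]."""
    return ((sse_rm - sse_fm) / (p + 1 - k)) / (sse_fm / (n - p - 1))


def interval_summary(lower: float, upper: float) -> Tuple[float, float]:
    """Return (midpoint, half-width) of a confidence interval."""
    return (lower + upper) / 2.0, (upper - lower) / 2.0


def r_squared(sse: float, sst: float) -> float:
    """R^2 = 1 - SSE / SST."""
    return 1.0 - sse / sst


# F-test for Female and HS: section 1 is the full model, section 7 the reduced one.
# Assumption: the reduced model has k = 5 parameters (intercept + 4 regressors).
f_stat = partial_f(sse_rm=34959.76741, sse_fm=34925.96885, n=51, p=6, k=5)

# 95% interval for the Income coefficient, limits taken from section 2.
midpoint, half_width = interval_summary(3.13239e-05, 0.000309126)

# R^2 of the ln(Sales) models, recomputed from the residual sums of squares.
SST = 2.612054897
r2_full = r_squared(1.589606882, SST)         # section 2: all six regressors
r2_no_income = r_squared(1.809991339, SST)    # section 4: Income removed
r2_three = r_squared(1.656540154, SST)        # section 5: Price, Income, Age
r2_income_only = r_squared(2.252090509, SST)  # section 6: Income only

print(f"F-stat: {f_stat:.4f}")
print(f"Income interval: {midpoint:.9f} +/- {half_width:.9f}")
print(f"R^2 drop when Income is removed: {r2_full - r2_no_income:.4f}")
print(f"R^2 with Price, Income and Age:  {r2_three:.4f}")
print(f"R^2 with Income alone:           {r2_income_only:.4f}")
```

The drop in R^2 from removing Income is smaller than the R^2 of Income on its own, which is the pattern the proxy argument above explains.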

Appendix

• 1. Linear Regression Model

SUMMARY OUTPUT- Linear Model

Regression Statistics

Multiple R           0.56643
R Square             0.320843
Adjusted R Square    (missing)
Standard Error       28.17395995
Observations         51

ANOVA

              df    SS             MS             F             Significance F
Regression    6     16499.47468    2749.912446    3.46436052    0.006856991
Residual      44    34925.96885    793.7720194
Total         50    51425.44353

              Coefficients    Standard Error    t Stat          P-value        Lower 95%       Upper 95%
Intercept     103.3448457     245.6071851       0.420772893     0.675969084    -391.6439043    598.3335957
Age           4.520452423     3.219768497       1.403968151     0.167347577    -1.968564514    11.00946936
HS            -0.061586053    0.814684412       -0.075594981    0.940084002    -1.703474577    1.580302471
Income        0.018946453     0.010215988       1.854588473     0.070364211    -0.001642517    0.039535423
Black         0.357535168     0.487219338       0.73382795      0.466946041    -0.624390874    1.339461211
Female        -1.052858856    5.561007986       -0.18932878     0.850705811    -12.26033388    10.15461617
Price         -3.254918434    1.031407044       -3.155803959    0.002886409    -5.333582719    -1.176254149

[Figure: Sales vs Predicted Sales; axis label: Predicted Sales]

• 2. Non-Linear Regression Model

SUMMARY OUTPUT- Non-Linear Regression

Regression Statistics

Multiple R           0.625647
R Square             0.391434
Adjusted R Square    (missing)
Standard Error       0.190072168
Observations         51

ANOVA

              df    SS             MS             F              Significance F
Regression    6     1.022448015    0.170408002    4.716859365    0.000869756
Residual      44    1.589606882    0.036127429
Total         50    2.612054897

              Coefficients    Standard Error    t Stat          P-value        Lower 95%       Upper 95%
Intercept     4.821796221     1.656958776       2.910027872     0.005650188    1.482415278     8.161177165
Age           0.038833553     0.021721774       1.787770803     0.08070137     -0.004943805    0.08261091
HS            -0.003483929    0.005496169       -0.633883217    0.529438762    -0.014560729    0.007592871
Income        0.000170225     6.89209E-05       2.46985781      0.017466212    3.13239E-05     0.000309126
Black         0.001831089     0.003286966       0.557075841     0.580298502    -0.004793355    0.008455533
Female        -0.012980646    0.037516659       -0.345996861    0.730994073    -0.088590503    0.062629211
Price         -0.024380487    0.006958261       -3.503818895    0.001066606    -0.038403941    -0.010357033

[Figure: ln(Sales) vs Predicted ln(Sales); axis label: Predicted ln(Sales)]

• 3. Non-Linear Regression Model excluding Female and High

SUMMARY OUTPUT- Excluding Female and HS

Regression Statistics

Multiple R           0.619682888
R Square             0.384006881
Adjusted R Square    0.330442262
Standard Error       (missing)
Observations         51

ANOVA

              df    SS             MS             F              Significance F
Regression    4     1.003047054    0.250761764    7.169039713    0.000142198
Residual      46    1.609007843    0.034978431
Total         50    2.612054897

              Coefficients    Standard Error    t Stat          P-value        Lower 95%       Upper 95%
Intercept     4.061253068     0.423298415       9.594302564     1.49396E-12    3.209197565     4.913308571
Income        0.000146469     4.66896E-05       3.137080193     0.002974138    5.24877E-05     0.00024045
Age           0.037786995     0.014894817       2.53692238      0.014639292    0.007805284     0.067768706
Black         0.002468201     0.002117318       1.16572014      0.249736561    -0.00179374     0.006730141
Price         -0.023703173    0.006775848       -3.498185462    0.001050873    -0.037342248    -0.010064099

• 4. Non-Linear Regression Model Excluding Income

SUMMARY OUTPUT- Excluding Income

Regression Statistics

Multiple R           0.554132015
R Square             0.30706229
Adjusted R Square    0.230069211
Standard Error       0.200554306
Observations         51

ANOVA

              df    SS             MS             F              Significance F
Regression    5     0.802063558    0.160412712    3.988180424    0.004437236
Residual      45    1.809991339    0.04022203
Total         50    2.612054897

              Coefficients    Standard Error    t Stat          P-value        Lower 95%       Upper 95%
Intercept     5.351700624     1.733618867       3.087011065     0.003455756    1.860013041     8.843388207
Age           0.063872321     0.020270431       3.15100961      0.002891868    0.023045579     0.104699064
HS            0.005799373     0.004231198       1.370622086     0.177291209    -0.002722697    0.014321443
Black         0.006208014     0.002921003       2.125302071     0.039085419    0.000324812     0.012091216
Female        -0.037496172    0.03817503        -0.982217229    0.331244423    -0.114384628    0.039392284
Price         -0.020834885    0.007184048       -2.900159344    0.005750394    -0.0353043      -0.006365469

• 5. Non-Linear Regression Model with only Income, Age and Price

SUMMARY OUTPUT- Income, Age, Price

Regression Statistics

Multiple R           0.604821953
R Square             0.365809595
Adjusted R Square    0.325329356
Standard Error       0.187737943
Observations         51

ANOVA

              df    SS             MS             F              Significance F
Regression    3     0.955514743    0.318504914    9.036745018    7.82079E-05
Residual      47    1.656540154    0.035245535
Total         50    2.612054897

              Coefficients    Standard Error    t Stat          P-value        Lower 95%       Upper 95%
Intercept     4.127128249     0.42110809        9.800638707     6.1093E-13     3.279968057     4.97428844
Income        0.000149342     4.68022E-05       3.190921651     0.002527788    5.51883E-05     0.000243496
Age           0.037523826     0.014949862       2.509978155     0.015575595    0.007448584     0.067599068
Price         -0.02487975     0.006725788       -3.699157815    0.00056573     -0.03841029     -0.011349211

• 6. Non-Linear Model with only Income

SUMMARY OUTPUT- ln(Sales) vs Income

Regression Statistics

Multiple R           0.371226199
R Square             0.137808891
Adjusted R Square    0.120213154
Standard Error       0.214385239
Observations         51

ANOVA

              df    SS             MS             F              Significance F
Regression    1     0.359964388    0.359964388    7.831947665    0.007319681
Residual      49    2.252090509    0.045961031
Total         50    2.612054897

              Coefficients    Standard Error    t Stat          P-value        Lower 95%       Upper 95%
Intercept     4.235606774     0.194208239       21.80961429     7.04454E-27    3.845330714     4.625882833
Income        0.000142671     5.09801E-05       2.798561714     0.007319681    4.02226E-05     0.000245119

• 7. Linear Model excluding Female and HS

SUMMARY OUTPUT

Regression Statistics

Multiple R           0.565849272
R Square             0.320185398
Adjusted R Square    0.261071085
Standard Error       27.5680058
Observations         51

ANOVA

              df    SS             MS             F              Significance F
Regression    4     16465.67612    4116.419029    5.416376863    0.001167909
Residual      46    34959.76741    759.9949437
Total         50    51425.44353

              Coefficients    Standard Error    t Stat          P-value        Lower 95%       Upper 95%
Intercept     55.32958014     62.39529309       0.886758879     0.379821522    -70.26562873    180.924789
Income        0.018892061     0.006882169       2.745073749     0.008601136    0.005038974     0.032745148
Age           4.191538246     2.195534978       1.909119321     0.062496178    -0.227844379    8.61092087
Price         -3.239940647    0.998777726       -3.243905587    0.002198981    -5.250375905    -1.229505389
Black         0.334162426     0.312098265       1.070696198     0.289891915    -0.294058789    0.962383641