Cigarettes Case, Sahil Nayar, Cluster Y
In order to answer the questions below, I first ran a linear regression of cigarette sales per capita against all of the explanatory variables except ‘state’. However, as can be seen in section 1 of the appendix, the explanatory power of this model is low, with an R^2 of just 0.32.
I then ran a second regression of ln(cigarette sales per capita) against all of the explanatory variables except ‘state’. The R^2 improves to 0.39, suggesting that this model has better explanatory power. I therefore refer to this model (reproduced in section 2 of the appendix) when answering the questions below.
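As a quick check on the two R^2 figures quoted above, they can be recomputed from the ANOVA sums of squares reported in sections 1 and 2 of the appendix. The sketch below is illustrative only; the function names are hypothetical and are not part of the original spreadsheet analysis.

```python
def r_squared(ss_regression, ss_total):
    """Share of total variation explained by the regression."""
    return ss_regression / ss_total


def adjusted_r_squared(r2, n_obs, n_predictors):
    """R^2 penalised for the number of explanatory variables."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Sums of squares copied from the ANOVA tables in sections 1 and 2.
models = {
    "linear (section 1)": (16499.47468, 51425.44353),
    "log (section 2)": (1.022448015, 2.612054897),
}

for name, (ss_reg, ss_tot) in models.items():
    r2 = r_squared(ss_reg, ss_tot)
    adj = adjusted_r_squared(r2, n_obs=51, n_predictors=6)
    print(f"{name}: R^2 = {r2:.3f}, adjusted R^2 = {adj:.3f}")
```

Both models use 51 observations and 6 explanatory variables, so the adjustment factor is the same in each case.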
In the first model, Income is significant at the 10% level (p = 0.07), Price is significant at any conventional level (p = 0.003), and the remaining variables are not significant. This is counterintuitive, since we would expect Age to be significant (there is a legal age for smoking). It is also reasonable to expect education to be significant, since educated people are more aware of the dangers of smoking. The fact that Income is only weakly significant is consistent with cigarettes being habit-forming products, and hence relatively income inelastic. The signs of the significant coefficients are in line with expectations.
In the second model, Age is significant at the 10% level (p = 0.08), Income is significant at the 2% level (p = 0.017), and Price is significant at any conventional level (p = 0.001). The signs of the significant coefficients are in line with expectations. This appears to be a better model than the first.
Multicollinearity could affect both models. We would expect, for example, Income and Age to be positively correlated, Income and education to be positively correlated, and Income and Black to be negatively correlated.
(a) Ho: B(Female) = 0
H1: B(Female) is not equal to 0
We use a two-tailed t-test. Since the p-value for the Female explanatory variable is 0.85 (section 1 of the appendix; it is 0.73 in section 2), we fail to reject the null hypothesis at any conventional significance level. We conclude that Female is not needed in the regression equation.
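The same conclusion can be reached from the t statistic directly. The sketch below recomputes t = coefficient / standard error for Female in both models and compares it with a two-tailed 5% critical value. The critical value of about 2.015 for 44 degrees of freedom is an assumption taken from a standard t table, and the function names are hypothetical.

```python
# Two-tailed 5% critical value for a t distribution with 44 degrees of freedom.
# Assumption: read from a standard t table, not from the original analysis.
T_CRIT_5PCT_DF44 = 2.0154


def t_statistic(coefficient, standard_error):
    """t statistic for H0: coefficient = 0."""
    return coefficient / standard_error


def rejects_null(t_stat, critical_value):
    """Two-tailed decision: reject H0 only if |t| exceeds the critical value."""
    return abs(t_stat) > critical_value


# Female coefficient magnitudes and standard errors from sections 1 and 2.
# Only |t| matters for a two-tailed test, so signs are omitted here.
t_linear = t_statistic(1.052858856, 5.561007986)
t_log = t_statistic(0.012980646, 0.037516659)

for label, t in (("linear model", t_linear), ("log model", t_log)):
    decision = "reject H0" if rejects_null(t, T_CRIT_5PCT_DF44) else "fail to reject H0"
    print(f"{label}: |t| = {abs(t):.3f} -> {decision}")
```

In both models |t| is far below the critical value, which matches the large p-values reported in the appendix.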
(b) To answer this question, it is necessary to run a regression of sales against all explanatory variables except ‘state’, ‘female’ and ‘high’. The results of this regression are reported in section 7 of the appendix. An F-test comparing regression 1 (the full model) with regression 7 (the reduced model) tests the following hypothesis:
Ho: Beta (Female) = Beta (High) = 0
H1: Either one or both coefficients are not equal to 0
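Fstat = [(SSE_rm - SSE_fm)/(p + 1 - k)] / [SSE_fm/(n - p - 1)]
where SSE_rm = 34959.77 is the residual sum of squares of the reduced model (section 7), SSE_fm = 34925.97 is that of the full model (section 1), n = 51 observations, p = 6 explanatory variables in the full model and k = 5 estimated parameters in the reduced model, so that p + 1 - k = 2 restrictions are tested.
Fstat = (33.80/2)/(34925.97/44) = 0.021
Since the Fstat is so low (far below 1), we fail to reject the null hypothesis.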
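The F statistic for dropping Female and HS together can be reproduced from the residual sums of squares in sections 1 and 7 of the appendix. This is a minimal sketch: the function name is hypothetical, and the 5% critical value of roughly 3.21 for (2, 44) degrees of freedom is an assumed, approximate table value. With two restrictions in the numerator the statistic comes out near 0.02; dividing by three instead gives about 0.014. Either way it is far below the critical value.

```python
def partial_f(sse_reduced, sse_full, n_restrictions, df_resid_full):
    """F statistic for testing that the dropped coefficients are jointly zero."""
    numerator = (sse_reduced - sse_full) / n_restrictions
    denominator = sse_full / df_resid_full
    return numerator / denominator


# Residual sums of squares from the ANOVA tables in sections 7 and 1.
SSE_REDUCED = 34959.76741   # linear model without Female and HS
SSE_FULL = 34925.96885      # linear model with all six variables

# Approximate 5% critical value for F(2, 44); assumed from a standard F table.
F_CRIT_5PCT = 3.21

f_stat = partial_f(SSE_REDUCED, SSE_FULL, n_restrictions=2, df_resid_full=44)
print(f"F = {f_stat:.4f}, reject H0: {f_stat > F_CRIT_5PCT}")
```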
(c) From section 2 of the appendix, we can see that the lower 95% and the upper 95% limits for the Income variable are 3.13239E-05 and 0.000309126 respectively. Therefore, the 95% confidence interval is 0.000170225 +/- 0.0001389.
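The interval can be rebuilt from the Income coefficient and its standard error in section 2 of the appendix as coefficient +/- t x standard error. The sketch below assumes a two-tailed 5% critical value of about 2.015 for 44 degrees of freedom (a standard table value, not taken from the original analysis); the function name is hypothetical.

```python
# Assumption: two-tailed 5% critical value of t with 44 degrees of freedom.
T_CRIT_5PCT_DF44 = 2.0154


def confidence_interval(coefficient, standard_error, t_critical):
    """Return (lower, upper, half_width) of the interval around the coefficient."""
    half_width = t_critical * standard_error
    return coefficient - half_width, coefficient + half_width, half_width


# Income coefficient and standard error from the log model (section 2).
lower, upper, half_width = confidence_interval(
    0.000170225, 6.89209e-05, T_CRIT_5PCT_DF44
)
print(f"95% CI: [{lower:.4e}, {upper:.4e}], half-width {half_width:.4e}")
```

Small differences in the last digits relative to the spreadsheet output are due to rounding of the critical value.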
(c) From section 4 of the appendix, we can see that the R^2 when Income is removed from the regression equation is 0.31, implying that about 31% of the variation in ln(sales) can still be explained when Income is removed from the model.
(d) From section 5 of the appendix, we can see that the R^2 when we include only Price, Income and Age in the regression equation is 0.37. This implies that Price, Income and Age can explain 37% of the variation in ln(sales). Note that R^2 has fallen by only about 0.03 (from 0.391 to 0.366) compared to regression model 2, which included all of the variables except ‘state’. This suggests that these three variables carry the bulk of the explanatory power of all the variables that we have data for.
(e) From section 6 of the appendix, we can see that the R^2 when only Income is included in the regression model is 0.14, implying that Income alone can explain 14% of the variation in ln(sales). At first sight, this seems to contradict the earlier answer in which Income was removed (section 4 of the appendix), where R^2 fell by only about 0.08 (from 0.39 to 0.31), and not by 0.14. However, the reason for this is that Income is correlated with the other variables that are still included in the regression equation. When Income is removed, these act as partial proxies for Income and pick up part of its effect on sales.
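The gap between the two figures can be laid out as simple arithmetic on the R^2 values reported in sections 2, 4 and 6 of the appendix. The sketch below is only an illustration of the argument; the variable names are hypothetical, and treating the difference as a "shared" component is a rough reading, not a formal decomposition.

```python
# R^2 values copied from the appendix.
R2_FULL = 0.391434352            # section 2: all six variables
R2_WITHOUT_INCOME = 0.30706229   # section 4: Income removed
R2_INCOME_ONLY = 0.137808891     # section 6: Income alone

# Explanatory power that is lost when Income is dropped from the full model.
incremental = R2_FULL - R2_WITHOUT_INCOME

# Part of Income's stand-alone R^2 that the other variables can pick up
# when Income is absent (rough interpretation under the assumptions above).
shared_with_others = R2_INCOME_ONLY - incremental

print(f"Income alone:           {R2_INCOME_ONLY:.3f}")
print(f"Income's increment:     {incremental:.3f}")
print(f"Picked up by proxies:   {shared_with_others:.3f}")
```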
Appendix
1. Linear Regression Model
SUMMARY OUTPUT: Linear Model

Regression Statistics
Multiple R          0.566429724
R Square            0.320842632
Adjusted R Square   0.228230264
Standard Error      28.17395995
Observations        51

ANOVA
             df    SS              MS              F               Significance F
Regression   6     16499.47468     2749.912446     3.46436052      0.006856991
Residual     44    34925.96885     793.7720194
Total        50    51425.44353

             Coefficients    Standard Error  t Stat          P-value         Lower 95%       Upper 95%
Intercept    103.3448457     245.6071851     0.420772893     0.675969084     -391.6439043    598.3335957
Age          4.520452423     3.219768497     1.403968151     0.167347577     -1.968564514    11.00946936
HS           -0.061586053    0.814684412     -0.075594981    0.940084002     -1.703474577    1.580302471
Income       0.018946453     0.010215988     1.854588473     0.070364211     -0.001642517    0.039535423
Black        0.357535168     0.487219338     0.73382795      0.466946041     -0.624390874    1.339461211
Female       -1.052858856    5.561007986     -0.18932878     0.850705811     -12.26033388    10.15461617
Price        -3.254918434    1.031407044     -3.155803959    0.002886409     -5.333582719    -1.176254149
[Figure: Sales vs Predicted Sales (horizontal axis: Predicted Sales)]
2. Non-Linear Regression Model
SUMMARY OUTPUT: Non-Linear Regression

Regression Statistics
Multiple R          0.625647146
R Square            0.391434352
Adjusted R Square   0.308448127
Standard Error      0.190072168
Observations        51

ANOVA
             df    SS              MS              F               Significance F
Regression   6     1.022448015     0.170408002     4.716859365     0.000869756
Residual     44    1.589606882     0.036127429
Total        50    2.612054897

             Coefficients    Standard Error  t Stat          P-value         Lower 95%       Upper 95%
Intercept    4.821796221     1.656958776     2.910027872     0.005650188     1.482415278     8.161177165
Age          0.038833553     0.021721774     1.787770803     0.08070137      -0.004943805    0.08261091
HS           -0.003483929    0.005496169     -0.633883217    0.529438762     -0.014560729    0.007592871
Income       0.000170225     6.89209E-05     2.46985781      0.017466212     3.13239E-05     0.000309126
Black        0.001831089     0.003286966     0.557075841     0.580298502     -0.004793355    0.008455533
Female       -0.012980646    0.037516659     -0.345996861    0.730994073     -0.088590503    0.062629211
Price        -0.024380487    0.006958261     -3.503818895    0.001066606     -0.038403941    -0.010357033
[Figure: ln(Sales) vs Predicted ln(Sales) (horizontal axis: Predicted ln(Sales))]
3. Non-Linear Regression Model excluding Female and HS
SUMMARY OUTPUT: Excluding Female and HS

Regression Statistics
Multiple R          0.619682888
R Square            0.384006881
Adjusted R Square   0.330442262
Standard Error      0.187025216
Observations        51

ANOVA
             df    SS              MS              F               Significance F
Regression   4     1.003047054     0.250761764     7.169039713     0.000142198
Residual     46    1.609007843     0.034978431
Total        50    2.612054897

             Coefficients    Standard Error  t Stat          P-value         Lower 95%       Upper 95%
Intercept    4.061253068     0.423298415     9.594302564     1.49396E-12     3.209197565     4.913308571
Income       0.000146469     4.66896E-05     3.137080193     0.002974138     5.24877E-05     0.00024045
Age          0.037786995     0.014894817     2.53692238      0.014639292     0.007805284     0.067768706
Black        0.002468201     0.002117318     1.16572014      0.249736561     -0.00179374     0.006730141
Price        -0.023703173    0.006775848     -3.498185462    0.001050873     -0.037342248    -0.010064099
4. Non-Linear Regression Model Excluding Income
SUMMARY OUTPUT: Excluding Income

Regression Statistics
Multiple R          0.554132015
R Square            0.30706229
Adjusted R Square   0.230069211
Standard Error      0.200554306
Observations        51

ANOVA
             df    SS              MS              F               Significance F
Regression   5     0.802063558     0.160412712     3.988180424     0.004437236
Residual     45    1.809991339     0.04022203
Total        50    2.612054897

             Coefficients    Standard Error  t Stat          P-value         Lower 95%       Upper 95%
Intercept    5.351700624     1.733618867     3.087011065     0.003455756     1.860013041     8.843388207
Age          0.063872321     0.020270431     3.15100961      0.002891868     0.023045579     0.104699064
HS           0.005799373     0.004231198     1.370622086     0.177291209     -0.002722697    0.014321443
Black        0.006208014     0.002921003     2.125302071     0.039085419     0.000324812     0.012091216
Female       -0.037496172    0.03817503      -0.982217229    0.331244423     -0.114384628    0.039392284
Price        -0.020834885    0.007184048     -2.900159344    0.005750394     -0.0353043      -0.006365469
5. Non-Linear Model with only Price, Age and Income
SUMMARY OUTPUT: Income, Age, Price

Regression Statistics
Multiple R          0.604821953
R Square            0.365809595
Adjusted R Square   0.325329356
Standard Error      0.187737943
Observations        51

ANOVA
             df    SS              MS              F               Significance F
Regression   3     0.955514743     0.318504914     9.036745018     7.82079E-05
Residual     47    1.656540154     0.035245535
Total        50    2.612054897

             Coefficients    Standard Error  t Stat          P-value         Lower 95%       Upper 95%
Intercept    4.127128249     0.42110809      9.800638707     6.1093E-13      3.279968057     4.97428844
Income       0.000149342     4.68022E-05     3.190921651     0.002527788     5.51883E-05     0.000243496
Age          0.037523826     0.014949862     2.509978155     0.015575595     0.007448584     0.067599068
Price        -0.02487975     0.006725788     -3.699157815    0.00056573      -0.03841029     -0.011349211
6. Non-Linear Model with only Income
SUMMARY OUTPUT: ln(Sales) vs Income

Regression Statistics
Multiple R          0.371226199
R Square            0.137808891
Adjusted R Square   0.120213154
Standard Error      0.214385239
Observations        51

ANOVA
             df    SS              MS              F               Significance F
Regression   1     0.359964388     0.359964388     7.831947665     0.007319681
Residual     49    2.252090509     0.045961031
Total        50    2.612054897

             Coefficients    Standard Error  t Stat          P-value         Lower 95%       Upper 95%
Intercept    4.235606774     0.194208239     21.80961429     7.04454E-27     3.845330714     4.625882833
Income       0.000142671     5.09801E-05     2.798561714     0.007319681     4.02226E-05     0.000245119
7. Linear Model excluding Female and HS
SUMMARY OUTPUT

Regression Statistics
Multiple R          0.565849272
R Square            0.320185398
Adjusted R Square   0.261071085
Standard Error      27.5680058
Observations        51

ANOVA
             df    SS              MS              F               Significance F
Regression   4     16465.67612     4116.419029     5.416376863     0.001167909
Residual     46    34959.76741     759.9949437
Total        50    51425.44353

             Coefficients    Standard Error  t Stat          P-value         Lower 95%       Upper 95%
Intercept    55.32958014     62.39529309     0.886758879     0.379821522     -70.26562873    180.924789
Income       0.018892061     0.006882169     2.745073749     0.008601136     0.005038974     0.032745148
Age          4.191538246     2.195534978     1.909119321     0.062496178     -0.227844379    8.61092087
Price        -3.239940647    0.998777726     -3.243905587    0.002198981     -5.250375905    -1.229505389
Black        0.334162426     0.312098265     1.070696198     0.289891915     -0.294058789    0.962383641