
Cigarettes Case, Sahil Nayar, Cluster Y

To answer the questions below, I first ran a linear regression of cigarette sales per capita against all of the explanatory variables except ‘state’. However, as can be seen in section 1 of the appendix, the explanatory power of this model is very low, with an R^2 of just 0.32.

Therefore, I ran a second regression of ln(cigarette sales per capita) against all of the explanatory variables except ‘state’. The R^2 improves to 0.39, suggesting that this model has better explanatory power. Therefore, I refer to this model (reproduced in section 2 of the appendix) when answering the questions below.
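As a quick check on the two R^2 figures quoted above, they can be recomputed from the ANOVA sums of squares in sections 1 and 2 of the appendix. This is only a sketch: the helper name `r_squared` is my own, and the inputs are copied from the appendix tables.

```python
def r_squared(ss_regression: float, ss_total: float) -> float:
    """R^2 is the regression sum of squares divided by the total sum of squares."""
    return ss_regression / ss_total


# Sums of squares copied from the ANOVA tables in the appendix.
r2_linear = r_squared(16499.47468, 51425.44353)  # section 1: sales
r2_log = r_squared(1.022448015, 2.612054897)     # section 2: ln(sales)

print(f"Linear model R^2: {r2_linear:.2f}")
print(f"Log model R^2:    {r2_log:.2f}")
```

Note that the two models have different dependent variables (sales and ln(sales)), so the two R^2 values measure variation on different scales.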

In the first model, Income is significant at the 10% level, Price is significant at any conventional level, and the remaining variables are not significant. This is counterintuitive, since we would expect age to be significant (there is a legal age for smoking). It is also reasonable to expect education to be significant, since educated people are more aware of the dangers of smoking. The fact that Income is only weakly significant is consistent with cigarettes being habit-forming products, and hence relatively income inelastic. The signs of the significant coefficients are in line with expectations.

In the second model, Age is significant at the 10% level, Income is significant at the 2% level, and Price is significant at any conventional level. The signs of the significant coefficients are in line with expectations. This appears to be a better model than the first.
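The significance levels quoted for the second model can be read off the p-values in section 2 of the appendix. The sketch below classifies each variable by the smallest level at which it is significant; the helper name `smallest_level` and the set of levels checked are my own choices for illustration.

```python
# P-values copied from the coefficient table in section 2 of the appendix.
p_values = {
    "Age": 0.08070137,
    "HS": 0.529438762,
    "Income": 0.017466212,
    "Black": 0.580298502,
    "Female": 0.730994073,
    "Price": 0.001066606,
}


def smallest_level(p, levels=(0.01, 0.02, 0.05, 0.10)):
    """Return the smallest listed level at which p is significant, else None."""
    for level in levels:
        if p < level:
            return level
    return None


significance = {name: smallest_level(p) for name, p in p_values.items()}

for name, level in significance.items():
    if level is None:
        print(f"{name:7s} not significant at the 10% level")
    else:
        print(f"{name:7s} significant at the {level:.0%} level")
```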

Multicollinearity could affect both models. We would expect, for example, Income and age to be positively correlated, income and education to be positively correlated, and income and black to be negatively correlated.

  • (a) H0: B(Female) = 0; H1: B(Female) is not equal to zero. We use a two-tail t-test. Since the p-value is 0.85 for the Female explanatory variable, we fail to reject the null hypothesis at any conventional significance level. We could conclude that Female is not needed in the regression equation.
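The t-test in (a) can be sketched from the appendix figures. The coefficient and standard error below are the Female row of section 1 (the table that reports the p-value of 0.85); the critical value is an assumption taken from a standard t-table for 44 degrees of freedom, not something reported in the appendix.

```python
# Female row from the coefficient table in section 1 of the appendix.
coef = -1.052858856
std_err = 5.561007986

# Assumption: two-tailed 5% critical value of t with 44 degrees of freedom
# (51 observations minus 7 estimated parameters), from a standard t-table.
T_CRIT_5PCT_DF44 = 2.015

# Test statistic for H0: B(Female) = 0.
t_stat = coef / std_err
reject_h0 = abs(t_stat) > T_CRIT_5PCT_DF44

print(f"t statistic: {t_stat:.4f}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```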

  • (b) To answer this question, it is necessary to run a regression of sales against all explanatory variables except ‘state’, ‘female’ and ‘high’. The results of this regression are reported in section 7 of the appendix. Conducting a Chow-type F-test comparing regressions 1 and 7 tests the following hypotheses:

H0: Both coefficients are equal to 0

H1: Either one or both coefficients are not equal to 0

F-stat = [(SSErm - SSEfm)/(p+1-k)] / [SSEfm/(n-p-1)]

F-stat = 0.014

Since the F-stat is so low, we fail to reject the null hypothesis.
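The F-statistic can be reproduced from the residual sums of squares in sections 1 and 7 of the appendix. This is a minimal sketch under stated assumptions: I take the number of restrictions to be 2 (Female and HS) and the full-model residual degrees of freedom to be 44, and the 5% critical value is an approximate table value rather than a figure from the appendix. With these inputs the arithmetic gives roughly 0.021, slightly different from the figure quoted in the answer, but the conclusion is the same because either value is far below the critical value.

```python
def partial_f(sse_reduced, sse_full, n_restrictions, df_resid_full):
    """F statistic for testing that the dropped coefficients are all zero."""
    numerator = (sse_reduced - sse_full) / n_restrictions
    denominator = sse_full / df_resid_full
    return numerator / denominator


# Residual sums of squares copied from the ANOVA tables in the appendix.
sse_full = 34925.96885     # section 1: all variables except 'state'
sse_reduced = 34959.76741  # section 7: Female and HS also excluded

# Assumption: approximate 5% critical value of F with (2, 44) degrees of freedom.
F_CRIT_5PCT = 3.21

f_stat = partial_f(sse_reduced, sse_full, n_restrictions=2, df_resid_full=44)
reject_h0 = f_stat > F_CRIT_5PCT

print(f"F statistic: {f_stat:.4f}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```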

(c) From section 2 of the appendix, we can see that the lower 95% and upper 95% limits for the Income variable are 3.13239E-05 and 0.000309126 respectively. Therefore, the 95% confidence interval is 0.000170225 +/- 0.0001389.
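The interval can be cross-checked from the limits themselves: the midpoint should equal the Income coefficient, and the half-width divided by the standard error gives the t multiplier that was used. A short sketch, with all inputs copied from the Income row of section 2 of the appendix:

```python
# Income row from the coefficient table in section 2 of the appendix.
lower, upper = 3.13239e-05, 0.000309126
std_err = 6.89209e-05

center = (lower + upper) / 2      # should match the Income coefficient
half_width = (upper - lower) / 2  # the "+/-" part of the interval
implied_t = half_width / std_err  # t multiplier implied by the limits

print(f"center     = {center:.9f}")
print(f"half-width = {half_width:.7f}")
print(f"implied t  = {implied_t:.3f}")
```

The implied multiplier of about 2.015 is what one would expect for a two-tailed 95% interval with 44 residual degrees of freedom.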

  • (c) From section 4 of the appendix, we can see that the R^2 when Income is removed from the regression equation is 0.3, implying that 30% of the variation in sales can be explained when income is removed from the model.

  • (d) From section 5 of the appendix, we can see that the R^2 when we include only Price, Income and Age in the regression equation is 0.37. This implies that Price, Income and Age can explain 37% of the variation in sales. Note that R^2 has fallen by only 0.02 compared to regression model 2, which included all of the variables except ‘state’. This suggests that these three variables have the bulk of the explanatory power out of all the ones that we have data for.
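Because R^2 can never rise when variables are removed, adjusted R^2 is a useful companion check here. The sketch below applies the usual adjustment formula to the R^2 values from sections 2 and 5 of the appendix; the function name `adjusted_r2` is my own, and the full-model value is computed from the formula rather than read from the appendix.

```python
def adjusted_r2(r2: float, n_obs: int, n_predictors: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


n = 51  # observations in every regression in the appendix

adj_full = adjusted_r2(0.391434352, n, 6)   # section 2: six predictors
adj_three = adjusted_r2(0.365809595, n, 3)  # section 5: Income, Age, Price

print(f"Adjusted R^2, six predictors:   {adj_full:.4f}")
print(f"Adjusted R^2, three predictors: {adj_three:.4f}")
```

The three-variable figure agrees with the Adjusted R Square reported in the Income, Age, Price output, and it is slightly higher than the computed full-model value, which is consistent with the dropped variables adding little.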

  • (e) From section 6 of the appendix, we can see that the R^2 when only Income is included in the regression model is 0.14, implying that Income can explain 14% of the variation in sales. At first sight, this seems to contradict the answer to question (c), where we removed Income and R^2 fell by only 0.09, and not by 0.14. However, the reason for this is that Income is correlated with the other variables that are still included in the regression equation. When Income is removed, these act as partial proxies for Income and pick up part of its effect on sales.
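The arithmetic behind this comparison can be laid out explicitly. In the sketch below, the R^2 values are copied from sections 2, 4 and 6 of the appendix; the variable names, and the reading of the difference as explanatory power "shared" with the other variables, are my own way of illustrating the proxy argument.

```python
# R^2 values copied from the appendix.
r2_full = 0.391434352            # section 2: all variables except 'state'
r2_without_income = 0.30706229   # section 4: Income removed
r2_income_only = 0.137808891     # section 6: Income alone

# Fall in R^2 when Income is dropped from the full model.
incremental = r2_full - r2_without_income

# Part of Income's stand-alone R^2 that the remaining variables pick up.
shared = r2_income_only - incremental

print(f"R^2 lost when Income is removed: {incremental:.3f}")
print(f"R^2 of Income on its own:        {r2_income_only:.3f}")
print(f"Difference picked up by proxies: {shared:.3f}")
```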

Appendix

  • 1. Linear Regression Model

SUMMARY OUTPUT- Linear Model

Regression Statistics

| Statistic | Value |
|---|---|
| Multiple R | 0.566429724 |
| R Square | 0.320842632 |
| Adjusted R Square | |
| Standard Error | 28.17395995 |
| Observations | 51 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 6 | 16499.47468 | 2749.912446 | 3.46436052 | 0.006856991 |
| Residual | 44 | 34925.96885 | 793.7720194 | | |
| Total | 50 | 51425.44353 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 103.3448457 | 245.6071851 | 0.420772893 | 0.675969084 | -391.6439043 | 598.3335957 |
| Age | 4.520452423 | 3.219768497 | 1.403968151 | 0.167347577 | -1.968564514 | 11.00946936 |
| HS | -0.061586053 | 0.814684412 | -0.075594981 | 0.940084002 | -1.703474577 | 1.580302471 |
| Income | 0.018946453 | 0.010215988 | 1.854588473 | 0.070364211 | -0.001642517 | 0.039535423 |
| Black | 0.357535168 | 0.487219338 | 0.73382795 | 0.466946041 | -0.624390874 | 1.339461211 |
| Female | -1.052858856 | 5.561007986 | -0.18932878 | 0.850705811 | -12.26033388 | 10.15461617 |
| Price | -3.254918434 | 1.031407044 | -3.155803959 | 0.002886409 | -5.333582719 | -1.176254149 |

[Figure: Sales vs Predicted Sales (axis label: Predicted Sales)]

  • 2. Non-Linear Regression Model

SUMMARY OUTPUT- Non-Linear Regression

Regression Statistics

| Statistic | Value |
|---|---|
| Multiple R | 0.625647146 |
| R Square | 0.391434352 |
| Adjusted R Square | |
| Standard Error | 0.190072168 |
| Observations | 51 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 6 | 1.022448015 | 0.170408002 | 4.716859365 | 0.000869756 |
| Residual | 44 | 1.589606882 | 0.036127429 | | |
| Total | 50 | 2.612054897 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 4.821796221 | 1.656958776 | 2.910027872 | 0.005650188 | 1.482415278 | 8.161177165 |
| Age | 0.038833553 | 0.021721774 | 1.787770803 | 0.08070137 | -0.004943805 | 0.08261091 |
| HS | -0.003483929 | 0.005496169 | -0.633883217 | 0.529438762 | -0.014560729 | 0.007592871 |
| Income | 0.000170225 | 6.89209E-05 | 2.46985781 | 0.017466212 | 3.13239E-05 | 0.000309126 |
| Black | 0.001831089 | 0.003286966 | 0.557075841 | 0.580298502 | -0.004793355 | 0.008455533 |
| Female | -0.012980646 | 0.037516659 | -0.345996861 | 0.730994073 | -0.088590503 | 0.062629211 |
| Price | -0.024380487 | 0.006958261 | -3.503818895 | 0.001066606 | -0.038403941 | -0.010357033 |

[Figure: ln(Sales) vs Predicted ln(Sales) (axis label: Predicted ln(Sales))]

  • 3. Non-Linear Regression Model excluding Female and High

SUMMARY OUTPUT- Excluding Female and HS

Regression Statistics

| Statistic | Value |
|---|---|
| Multiple R | 0.619682888 |
| R Square | 0.384006881 |
| Adjusted R Square | 0.330442262 |
| Standard Error | |
| Observations | 51 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 4 | 1.003047054 | 0.250761764 | 7.169039713 | 0.000142198 |
| Residual | 46 | 1.609007843 | 0.034978431 | | |
| Total | 50 | 2.612054897 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 4.061253068 | 0.423298415 | 9.594302564 | 1.49396E-12 | 3.209197565 | 4.913308571 |
| Income | 0.000146469 | 4.66896E-05 | 3.137080193 | 0.002974138 | 5.24877E-05 | 0.00024045 |
| Age | 0.037786995 | 0.014894817 | 2.53692238 | 0.014639292 | 0.007805284 | 0.067768706 |
| Black | 0.002468201 | 0.002117318 | 1.16572014 | 0.249736561 | -0.00179374 | 0.006730141 |
| Price | -0.023703173 | 0.006775848 | -3.498185462 | 0.001050873 | -0.037342248 | -0.010064099 |

  • 4. Non-Linear Regression Model Excluding Income

SUMMARY OUTPUT- Excluding Income

Regression Statistics

| Statistic | Value |
|---|---|
| Multiple R | 0.554132015 |
| R Square | 0.30706229 |
| Adjusted R Square | 0.230069211 |
| Standard Error | 0.200554306 |
| Observations | 51 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 5 | 0.802063558 | 0.160412712 | 3.988180424 | 0.004437236 |
| Residual | 45 | 1.809991339 | 0.04022203 | | |
| Total | 50 | 2.612054897 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 5.351700624 | 1.733618867 | 3.087011065 | 0.003455756 | 1.860013041 | 8.843388207 |
| Age | 0.063872321 | 0.020270431 | 3.15100961 | 0.002891868 | 0.023045579 | 0.104699064 |
| HS | 0.005799373 | 0.004231198 | 1.370622086 | 0.177291209 | -0.002722697 | 0.014321443 |
| Black | 0.006208014 | 0.002921003 | 2.125302071 | 0.039085419 | 0.000324812 | 0.012091216 |
| Female | -0.037496172 | 0.03817503 | -0.982217229 | 0.331244423 | -0.114384628 | 0.039392284 |
| Price | -0.020834885 | 0.007184048 | -2.900159344 | 0.005750394 | -0.0353043 | -0.006365469 |

  • 5. Non-Linear Regression Model with only Income, Age and Price

SUMMARY OUTPUT- Income, Age, Price

Regression Statistics

| Statistic | Value |
|---|---|
| Multiple R | 0.604821953 |
| R Square | 0.365809595 |
| Adjusted R Square | 0.325329356 |
| Standard Error | 0.187737943 |
| Observations | 51 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 3 | 0.955514743 | 0.318504914 | 9.036745018 | 7.82079E-05 |
| Residual | 47 | 1.656540154 | 0.035245535 | | |
| Total | 50 | 2.612054897 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 4.127128249 | 0.42110809 | 9.800638707 | 6.1093E-13 | 3.279968057 | 4.97428844 |
| Income | 0.000149342 | 4.68022E-05 | 3.190921651 | 0.002527788 | 5.51883E-05 | 0.000243496 |
| Age | 0.037523826 | 0.014949862 | 2.509978155 | 0.015575595 | 0.007448584 | 0.067599068 |
| Price | -0.02487975 | 0.006725788 | -3.699157815 | 0.00056573 | -0.03841029 | -0.011349211 |

  • 6. Non-Linear Model with only Income

SUMMARY OUTPUT- ln(Sales) vs Income

Regression Statistics

| Statistic | Value |
|---|---|
| Multiple R | 0.371226199 |
| R Square | 0.137808891 |
| Adjusted R Square | 0.120213154 |
| Standard Error | 0.214385239 |
| Observations | 51 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 1 | 0.359964388 | 0.359964388 | 7.831947665 | 0.007319681 |
| Residual | 49 | 2.252090509 | 0.045961031 | | |
| Total | 50 | 2.612054897 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 4.235606774 | 0.194208239 | 21.80961429 | 7.04454E-27 | 3.845330714 | 4.625882833 |
| Income | 0.000142671 | 5.09801E-05 | 2.798561714 | 0.007319681 | 4.02226E-05 | 0.000245119 |

  • 7. Linear Model excluding Female and HS

SUMMARY OUTPUT

Regression Statistics

| Statistic | Value |
|---|---|
| Multiple R | 0.565849272 |
| R Square | 0.320185398 |
| Adjusted R Square | 0.261071085 |
| Standard Error | 27.5680058 |
| Observations | 51 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 4 | 16465.67612 | 4116.419029 | 5.416376863 | 0.001167909 |
| Residual | 46 | 34959.76741 | 759.9949437 | | |
| Total | 50 | 51425.44353 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 55.32958014 | 62.39529309 | 0.886758879 | 0.379821522 | -70.26562873 | 180.924789 |
| Income | 0.018892061 | 0.006882169 | 2.745073749 | 0.008601136 | 0.005038974 | 0.032745148 |
| Age | 4.191538246 | 2.195534978 | 1.909119321 | 0.062496178 | -0.227844379 | 8.61092087 |
| Price | -3.239940647 | 0.998777726 | -3.243905587 | 0.002198981 | -5.250375905 | -1.229505389 |
| Black | 0.334162426 | 0.312098265 | 1.070696198 | 0.289891915 | -0.294058789 | 0.962383641 |