Feb 09, 2014

Prof. Frydman

March 2, 2012

Solutions to Multiple Linear Regression Review Questions

Q1) The regression model for Salary (in dollars) was developed from the salary survey of computer professionals in a large corporation. The following variables were used to predict Salary:

X = number of years of experience
M = management, coded as 1 for a person with management experience and 0 otherwise
E = education, coded as 1 if a person had a high school diploma, 2 if a person had a college diploma, and 3 for an advanced degree. E was then recoded using the following two dummy variables:

HS = 1 if the person has a high school diploma, 0 otherwise
AD = 1 if the person has an advanced degree, 0 otherwise
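As a small illustration, the recoding of E into HS and AD can be sketched in Python. The function name `education_dummies` is hypothetical and not part of the original analysis. A college diploma is the baseline category, so it is the only level with both dummies equal to 0.

```python
def education_dummies(e: int) -> tuple[int, int]:
    """Map the ordinal education code E to the dummy pair (HS, AD).

    E = 1 (high school) -> (1, 0)
    E = 2 (college)     -> (0, 0)   baseline category
    E = 3 (advanced)    -> (0, 1)
    """
    if e not in (1, 2, 3):
        raise ValueError(f"E must be 1, 2 or 3, got {e!r}")
    hs = 1 if e == 1 else 0
    ad = 1 if e == 3 else 0
    return hs, ad


for code in (1, 2, 3):
    print(code, education_dummies(code))
```

Replacing the single code E by two dummies lets the model estimate a separate salary effect for each education level, instead of forcing the high-school-to-college gap to equal the college-to-advanced gap, as a single coded variable E does.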

Answer the questions below based on the Minitab output at the end of this document.

a) Is the estimated multiple regression model I (where Education is coded as 1, 2, 3) statistically significant at α = 0.01? (State the hypothesis test and your conclusion.)

\(H_0: \beta_{years} = \beta_M = \beta_E = 0\)
\(H_1:\) at least one \(\beta \neq 0\)

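Since the p-value of this overall F-test (F = 179.63 with 3 and 42 degrees of freedom) is 0.000 to three decimal places, well below α = 0.01, we reject \(H_0\): the model is statistically significant at any conventional significance level.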
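The F statistic in the ANOVA table is the ratio MS(Regression)/MS(Residual Error). The Python check below, using the sums of squares and degrees of freedom copied from the Model I output, reproduces F, R-Sq and R-Sq(adj). It is only a sketch: the p-value itself would need the F distribution with 3 and 42 degrees of freedom (for example from a statistics library), which is not computed here.

```python
# ANOVA figures copied from the Model I Minitab output.
ss_regression = 928_714_168
ss_error = 72_383_410
df_regression = 3          # number of predictors: Years, M, E
df_error = 42              # n - k - 1

ms_regression = ss_regression / df_regression
ms_error = ss_error / df_error
f_stat = ms_regression / ms_error

# Minitab prints SS Total = 1001097577; the one-unit gap is rounding.
ss_total = ss_regression + ss_error
n = df_regression + df_error + 1          # 46 professionals
r_sq = ss_regression / ss_total
r_sq_adj = 1 - (ss_error / df_error) / (ss_total / (n - 1))

print(f"F = {f_stat:.2f}")
print(f"R-Sq = {100 * r_sq:.1f}%, R-Sq(adj) = {100 * r_sq_adj:.1f}%")
```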

b) Interpret the coefficients in the estimated regression model.

\(\hat{\beta}_{years} = 570\): all else equal, an additional year of experience increases the mean salary by $570.

\(\hat{\beta}_M = 6688\): all else equal, professionals with management experience earn, on average, $6688 more than those without it.


\(\hat{\beta}_E = 1579\): all else equal, a one-level increase in education increases the mean salary by $1579.
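Because the model is linear, each coefficient equals the change in the fitted mean salary when its predictor changes by one unit and the others are held fixed. The Python sketch below checks this with the Model I coefficients; the baseline profile (10 years, no management experience, college) is an arbitrary assumption, and any other profile gives the same differences.

```python
# Model I coefficients, copied from the Minitab output.
B0, B_YEARS, B_M, B_E = 6963.5, 570.09, 6688.1, 1578.8


def predicted_salary(years: float, m: int, e: int) -> float:
    """Fitted mean salary (dollars) under Model I."""
    return B0 + B_YEARS * years + B_M * m + B_E * e


# Hypothetical baseline profile: 10 years, no management experience, college (E = 2).
base = predicted_salary(years=10, m=0, e=2)

# Change one predictor at a time, holding the other two fixed.
extra_year = predicted_salary(years=11, m=0, e=2) - base
manager = predicted_salary(years=10, m=1, e=2) - base
one_more_level = predicted_salary(years=10, m=0, e=3) - base

print(f"{extra_year:.2f} {manager:.2f} {one_more_level:.2f}")
```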

c) Is E a statistically significant variable at α = 0.05? (State the hypothesis test, rejection rule, and your conclusion.)

\(H_0: \beta_E = 0\)
\(H_1: \beta_E \neq 0\)

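Reject \(H_0\) if \(|t_{observ}| > z_{0.025} = 1.96\). Since \(t_{observ} = 1578.8/262.3 = 6.02 > 1.96\), we reject \(H_0\): yes, E is statistically significant at α = 0.05. Equivalently, the p-value of this test is essentially zero: \(2P(Z > |t_{observ}|) \approx 0\).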
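The rejection rule and the p-value can both be computed with Python's standard library. This is a sketch under the same normal approximation used in the answer (the Minitab P column comes from the t distribution, so it can differ slightly in the third decimal); `z_test` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def z_test(coef: float, se: float, alpha: float = 0.05) -> tuple[float, float, bool]:
    """Two-sided test of H0: beta = 0 using the standard normal approximation.

    Returns (t_observed, p_value, reject_h0).
    """
    t_obs = coef / se
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 when alpha = 0.05
    # p-value = 2 P(Z > |t|) = erfc(|t| / sqrt(2)); erfc stays accurate for large |t|.
    p_value = math.erfc(abs(t_obs) / math.sqrt(2))
    return t_obs, p_value, abs(t_obs) > z_crit


# Coefficient of E in Model I.
t_e, p_e, reject_e = z_test(1578.8, 262.3)
print(f"t = {t_e:.2f}, p = {p_e:.1e}, reject H0: {reject_e}")
```

The same helper applied to AD in Model II, `z_test(-147.8, 387.7)`, gives |t| ≈ 0.38 and `reject_h0 = False`, in line with part d).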

d) Now consider the multiple regression model II, where Education is coded using the dummy variables defined above. Is there evidence at α = 0.05 that professionals with an advanced degree have, on average, a different salary than professionals with a college degree? To answer this question, formulate an appropriate hypothesis test.

\(H_0: \beta_{AD} = 0\)
\(H_1: \beta_{AD} \neq 0\)

Reject \(H_0\) if \(|t_{observ}| > z_{0.025} = 1.96\). Since \(|t_{observ}| = 0.38 < 1.96\), we cannot reject \(H_0\).

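Because college is the baseline (omitted) category in Model II, \(\beta_{AD}\) measures exactly this difference. All else equal, there is no evidence that professionals with an advanced degree have, on average, a different salary than professionals with a college degree.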

e) Interpret the coefficient of HS in the final regression model III. Which model do you prefer: I, II or III? Explain fully.

All else equal, the mean salary of professionals with a high school diploma is $3089 lower than the mean salary of professionals with a college or advanced degree (Model III drops AD, so these two groups are combined). I would prefer Model III because all of its variables are statistically significant, it has the highest adjusted \(R^2\) (95.4%) and the lowest S (1016.93), and it is the most informative about the influence of the predictors on Salary.
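Model III's HS coefficient can be read off the same way. In this Python sketch the profile (10 years, management experience) is an arbitrary assumption; the gap between a high school graduate and everyone else does not depend on it.

```python
# Model III coefficients, copied from the Minitab output (AD has been dropped).
B0, B_YEARS, B_M, B_HS = 11112.1, 548.79, 6859.5, -3089.1


def predicted_salary_iii(years: float, m: int, hs: int) -> float:
    """Fitted mean salary (dollars) under Model III."""
    return B0 + B_YEARS * years + B_M * m + B_HS * hs


# With only HS in the model, college (E = 2) and advanced (E = 3) both have HS = 0,
# so they receive the same fitted salary.
profiles = {"high school": 1, "college": 0, "advanced": 0}
fitted = {label: predicted_salary_iii(10, 1, hs) for label, hs in profiles.items()}
gap = fitted["college"] - fitted["high school"]
print(f"{gap:.1f}")
```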

Q2 (Condos) A Florida real estate agent collected data on a number of condominium units of similar size within a Florida development. Her objective was to relate PRICE to the other variables listed below.


PRICE = selling price of condo unit (in dollars)
FLOOR = floor (1 to 8)
DELEV = distance from the elevator (in yards)
VIEW = 1 if the unit has an ocean view, 0 otherwise
END = 1 if end unit, 0 otherwise
FURN = 1 if furnished, 0 otherwise

a) Consider a multiple regression model involving all explanatory variables. How many condominium units were used to construct this model? Is this a statistically significant model at α = 0.05?

There were 60 condominium units: the total degrees of freedom in the ANOVA table is \(n - 1 = 59\). The model is statistically significant because the p-value of the overall F-test (F = 9.32) is 0.000, below α = 0.05.
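The sample size follows from the degrees of freedom. This short Python sketch, using the figures from the full condo model's ANOVA table, recovers n, checks that the residual DF equals n - k - 1, and recomputes F as MS(Regression)/MS(Residual Error).

```python
# ANOVA figures from the full condo model (5 predictors).
df_regression, df_error, df_total = 5, 54, 59
ms_regression, ms_error = 38_185_962, 4_098_136

n = df_total + 1                              # total DF = n - 1
assert df_error == n - df_regression - 1      # residual DF = n - k - 1
f_stat = ms_regression / ms_error

print(n, round(f_stat, 2))
```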

b) Compute the missing p-value associated with the test for the coefficient of FLOOR using the standard normal distribution. State the hypothesis test (\(H_0\) and \(H_1\)) for which this is a p-value, and interpret the p-value.

\(H_0: \beta_{FLOOR} = 0\)
\(H_1: \beta_{FLOOR} \neq 0\)

The observed test statistic is \(|t_{observ}| = |-110.3/113.6| = 0.97\).

\(\text{p-value} = 2P(Z > |t_{observ}|) = 2P(Z > 0.97) = 2[0.5 - P(0 < Z < 0.97)] = 2(0.5 - 0.3340) = 0.332\).

The p-value is the probability of obtaining a test statistic at least as extreme as \(|t_{observ}| = 0.97\) when \(H_0\) is true. This is a large p-value, so we do not have evidence to reject \(H_0\).
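The p-value calculation can be reproduced in Python with `statistics.NormalDist`, following the same steps as the hand computation: round |t| to two decimals (as a normal table requires) and then apply \(2[0.5 - P(0 < Z < |t|)]\). The unrounded value is shown for comparison; this is a sketch, not Minitab's own t-based calculation.

```python
from statistics import NormalDist

coef, se = -110.3, 113.6          # FLOOR row of the full condo model
z = NormalDist()                  # standard normal distribution

t_obs = abs(coef / se)
t_table = round(t_obs, 2)         # two decimals, as read from a normal table

# P(0 < Z < t) = Phi(t) - 0.5, so p = 2[0.5 - P(0 < Z < t)].
p_table = 2 * (0.5 - (z.cdf(t_table) - 0.5))
p_unrounded = 2 * (1 - z.cdf(t_obs))

print(f"|t| = {t_table:.2f}, p = {p_table:.3f} (unrounded |t|: {p_unrounded:.3f})")
```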

c) According to the best subsets regression, which is the best set of predictors to use? Explain your choice.

The best set of predictors to use is VIEW and END. This model has the highest adjusted \(R^2\) (42.5%), the smallest standard error of the regression (S = 2003.6), and the smallest Mallows Cp (1.8).
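The comparison can be made mechanical. The Python sketch below copies the rows of the best subsets table; the labels for the three- and four-predictor rows are generic because the table does not identify those predictors. Picking the smallest Mallows Cp is a simplification (Cp is usually judged by how close it is to the number of parameters), so treat that criterion as an assumption.

```python
# (predictors, R-Sq(adj) in %, Mallows Cp, S) from the best subsets table.
rows = [
    ("VIEW", 40.1, 3.2, 2045.8),
    ("VIEW, END", 42.5, 1.8, 2003.6),
    ("three predictors", 42.4, 3.0, 2006.8),
    ("four predictors", 41.9, 4.5, 2015.3),
    ("all five", 41.3, 6.0, 2024.4),
]

best_by_adj_r2 = max(rows, key=lambda row: row[1])[0]
best_by_s = min(rows, key=lambda row: row[3])[0]
best_by_cp = min(rows, key=lambda row: row[2])[0]   # simplified Cp rule

print(best_by_adj_r2, "|", best_by_s, "|", best_by_cp)
```

The first two criteria always agree, because \(R^2_{adj} = 1 - S^2 / [SST/(n-1)]\): for a fixed response, a larger adjusted \(R^2\) means a smaller S.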

d) Subsequently, the real estate agent estimated two models: a simple linear regression of PRICE on VIEW, and a multiple linear regression of PRICE on VIEW and END. She chose the simple linear regression of PRICE on VIEW as the final model. Explain her reasoning.


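She wanted a model whose explanatory variables are all highly statistically significant. In the regression on VIEW and END, END has a p-value of 0.068, so it is not significant at α = 0.05, whereas VIEW is (p = 0.000). Dropping END costs little: adjusted \(R^2\) falls only from 42.5% to 40.1%. This led her to choose the model with one variable: VIEW.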

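e) Using the regression of PRICE on VIEW, what is the estimated average difference in price between condos with and without an ocean view?

\(\hat{\beta}_{VIEW} = 3361.3\): condos with an ocean view sell for about $3361 more, on average, than condos without one.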

Construct a 95% confidence interval for the actual difference in the average prices of condos with and without an ocean view, and interpret it.

\(3361.3 \pm 1.96(528.2)\), that is, (2326.03, 4396.57).

We can be 95% confident that the actual difference in the average prices of condos with and without an ocean view lies in this interval. Since the entire interval lies above zero, a condo with a view sells, on average, for more than a condo without a view.
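The interval arithmetic is easy to check in Python. This sketch takes the coefficient and standard error from the PRICE versus VIEW output and rounds the normal critical value to 1.96, as the hand calculation does.

```python
from statistics import NormalDist

b_view, se_view = 3361.3, 528.2                    # VIEW row of "PRICE versus VIEW"
z_crit = round(NormalDist().inv_cdf(0.975), 2)     # 1.96, as in the worked answer

margin = z_crit * se_view
lower, upper = b_view - margin, b_view + margin

print(f"({lower:.2f}, {upper:.2f})")
print("interval excludes 0:", lower > 0)
```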


MODEL I: Regression Analysis: Salary versus Years, M, E

The regression equation is
Salary = 6963 + 570 Years + 6688 M + 1579 E

Predictor      Coef  SE Coef      T      P
Constant     6963.5    665.7  10.46  0.000
Years        570.09    38.56  14.78  0.000
M            6688.1    398.3  16.79  0.000
E            1578.8    262.3   6.02  0.000

S = 1312.79   R-Sq = 92.8%   R-Sq(adj) = 92.3%

Analysis of Variance

Source          DF          SS         MS       F      P
Regression       3   928714168  309571389  179.63  0.000
Residual Error  42    72383410    1723415
Total           45  1001097577

MODEL II: Regression Analysis: Salary versus Years, M, HS, AD

The regression equation is
Salary = 11180 + 546 Years + 6884 M - 3144 HS - 148 AD

Predictor      Coef  SE Coef      T      P
Constant    11179.6    366.0  30.55  0.000
Years        546.18    30.52  17.90  0.000
M            6883.5    313.9  21.93  0.000
HS          -3144.0    362.0  -8.69  0.000
AD           -147.8    387.7  -0.38  0.705

S = 1027.44   R-Sq = 95.7%   R-Sq(adj) = 95.3%

MODEL III: Regression Analysis: S versus Years, M, HS

The regression equation is
Salary = 11112 + 549 Years + 6859 M - 3089 HS

Predictor      Coef  SE Coef      T      P
Constant    11112.1    317.0  35.06  0.000
Years        548.79    29.44  18.64  0.000
M            6859.5    304.4  22.54  0.000
HS          -3089.1    328.6  -9.40  0.000

S = 1016.93   R-Sq = 95.7%   R-Sq(adj) = 95.4%

The regression equation is
PRICE = 17676 - 110 FLOOR + 56.9 DELEV + 3442 VIEW - 2612 END + 409 FURN

Predictor      Coef  SE Coef      T      P
Constant    17676.4    850.0  20.79  0.000
FLOOR        -110.3    113.6
DELEV         56.86    64.68   0.88  0.383
VIEW         3442.0    542.6   6.34  0.000
END           -2612     1487  -1.76  0.085
FURN          409.0    574.9   0.71  0.480

S = 2024.39   R-Sq = 46.3%   R-Sq(adj) = 41.3%

Analysis of Variance

Source          DF         SS        MS     F      P
Regression       5  190929812  38185962  9.32  0.000
Residual Error  54  221299362   4098136
Total           59  412229173

Best Subsets Regression: PRICE versus FLOOR, DELEV, VIEW, END, FURN

Response is PRICE

                        Mallows
Vars  R-Sq  R-Sq(adj)        Cp       S  Predictors
   1  41.1       40.1       3.2  2045.8  VIEW
   2  44.5       42.5       1.8  2003.6  VIEW, END
   3  45.3       42.4       3.0  2006.8  (three of the five)
   4  45.8       41.9       4.5  2015.3  (four of the five)
   5  46.3       41.3       6.0  2024.4  FLOOR, DELEV, VIEW, END, FURN

Regression Analysis: PRICE versus VIEW, END

The regression equation is
PRICE = 17689 + 3543 VIEW - 2732 END

Predictor      Coef  SE Coef      T      P
Constant    17688.7    365.8  48.36  0.000
VIEW         3543.5    526.5   6.73  0.000
END           -2732     1466  -1.86  0.068

S = 2003.58   R-Sq = 44.5%   R-Sq(adj) = 42.5%

Regression Analysis: PRICE versus VIEW

PRICE = 17689 + 3361 VIEW

Predictor      Coef  SE Coef      T      P
Constant    17688.7    373.5  47.36  0.000
VIEW         3361.3    528.2   6.36  0.000

S = 2045.81   R-Sq = 41.1%   R-Sq(adj) = 40.1%

