Feb 21, 2017

Multiple Regression

Section 14.5

14.1 The coefficients of independent variables in a multiple regression model are interpreted as the change in y

for a one-unit change in the corresponding independent variable when all other independent variables are

held constant. For example, B2 gives the change in y due to a one-unit change in x2 when x1, x3, ... , xk are

held constant.

14.3 The independent variables can have a non-linear relationship but cannot be linearly related.

1. The mean of the probability distribution of ε is zero, that is, E(ε) = 0.

2. The errors associated with different sets of values of independent variables are independent.

Furthermore, these errors are normally distributed and have a constant standard deviation, which is

denoted by σε.

3. The independent variables are not linearly related.

4. There is no linear association between the random error term ε and each independent variable xi.

b. The value of a = 15.065 gives the value for y when x1 = 0 and x2 = 0. However, since x1 = 0 and

x2 = 0 do not occur together in the sample data, the estimate is invalid. The value b1 = .167 gives the

change in y for a one-unit change in x1 when x2 is held constant. The value b2 = −.132 gives the

change in y for a one-unit change in x2 when x1 is held constant.

d. ŷ = 15.065 + .167 x1 − .132 x2 = 15.065 + .167(87) − .132(54) = 22.466
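The substitution in part d can be checked numerically. The following is a minimal Python sketch, not part of the original solution: the function name `predict` and its argument names are illustrative assumptions, and the coefficients are the ones reported above (with b2 negative).

```python
def predict(intercept, coefs, xs):
    """Return a + b1*x1 + ... + bk*xk for one set of x values."""
    if len(coefs) != len(xs):
        raise ValueError("need exactly one x value per coefficient")
    return intercept + sum(b * x for b, x in zip(coefs, xs))


# Estimated coefficients reported in this exercise
a, b1, b2 = 15.065, 0.167, -0.132

# Predicted y for x1 = 87 and x2 = 54
y_hat = predict(a, [b1, b2], [87, 54])
print(round(y_hat, 3))
```

Holding x2 fixed and raising x1 by one unit changes the result by b1, which is the "held constant" interpretation used in part b.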

f. df = n − k − 1 = 11 − 2 − 1 = 8

The 99% confidence interval for B1 is

b1 ± t sb1 = .167 ± (3.355)(.034) = .167 ± .114 = .053 to .281
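As a check on the arithmetic, the interval can be recomputed with a short Python sketch. The helper name `confidence_interval` is an assumption made for illustration, and the critical value 3.355 is hard-coded from the t table (df = 8, .005 in each tail) rather than computed.

```python
def confidence_interval(estimate, t_crit, std_err):
    """Return (lower, upper) for estimate ± t_crit * std_err."""
    margin = t_crit * std_err
    return estimate - margin, estimate + margin


n, k = 11, 2          # sample size and number of independent variables
df = n - k - 1        # degrees of freedom for the t distribution

# b1 = .167 and sb1 = .034 as reported in the solution
lower, upper = confidence_interval(0.167, 3.355, 0.034)
print(df, round(lower, 3), round(upper, 3))
```

If SciPy is available, the critical value could instead be obtained from `scipy.stats.t.ppf(0.995, df)`; the table value is used here to keep the sketch dependency-free.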

Step 2: Since σε is unknown, use the t distribution.

Step 3: For α = .01 with df = 8, the critical value of t is −2.896.

Step 4: t = (b2 − B2)/sb2 = −1.919

Step 5: Do not reject H0 since −1.919 > −2.896.

Conclude that B2 is not negative.
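The decision in Steps 3 to 5 is a left-tailed comparison: H0 is rejected only when the observed t falls below the negative critical value. A minimal Python sketch of that rule follows; the function name `left_tailed_decision` is hypothetical, and the two numbers are the observed value (−1.919) and the critical value (−2.896) from the steps above.

```python
def left_tailed_decision(t_observed, t_critical):
    """Left-tailed t test: reject H0 only if t_observed < t_critical.

    t_critical is expected to be negative for a left-tailed test.
    """
    if t_observed < t_critical:
        return "reject H0"
    return "do not reject H0"


decision = left_tailed_decision(-1.919, -2.896)
print(decision)
```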

b. The value of a = 11.258 gives the expected weekly sales for restaurants in areas with zero population

and a mean annual household income of $0. However, since the sample data does not include any

restaurants in areas with zero population and a mean annual household income of $0, the estimate is

invalid. The value b1 = .011 indicates that for each increase of 1000 in population, a restaurant's sales

are expected to increase by $11 when mean annual household income is held constant. The value b2 =

.199 indicates that for each increase of $1000 in mean annual household income, a restaurant's sales

are expected to increase by $199 when population is held constant.

d. ŷ = 11.258 + .011 x1 + .199 x2 = 11.258 + .011(50) + .199(55) = 22.753

The predicted sales for a restaurant with 50 thousand people living within a five-mile area surrounding

it and $55 thousand mean annual income of households in that area is $22,753.

e. ŷ = 11.258 + .011 x1 + .199 x2 = 11.258 + .011(45) + .199(60) = 23.693

The expected (mean) sales for all restaurants with 45 thousand people living within a five-mile area

surrounding them and $60 thousand mean annual income of households living in those areas is

$23,693.

f. df = n − k − 1 = 11 − 2 − 1 = 8

The 95% confidence interval for B2 is

b2 ± t sb2 = .199 ± (2.306)(.117) = .199 ± .270 = −.071 to .469

Step 2: Since σε is unknown, use the t distribution.

Step 3: For α = .01 with df = 8, the critical values of t are −3.355 and 3.355.

Step 4: t = (b1 − B1)/sb1 = .120

Step 5: Do not reject H0 since .120 < 3.355.

Conclude that B1 is not different from zero.
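For a two-tailed test such as this one, the rejection region lies beyond both critical values, so the comparison can be made on the absolute value of t. The sketch below is illustrative only; the function name `two_tailed_decision` is an assumption, and the inputs are the observed t (.120) and critical value (3.355) from the steps above.

```python
def two_tailed_decision(t_observed, t_critical):
    """Two-tailed t test: reject H0 if |t_observed| exceeds t_critical.

    t_critical is the positive critical value; the rejection region is
    t < -t_critical or t > t_critical.
    """
    if abs(t_observed) > t_critical:
        return "reject H0"
    return "do not reject H0"


decision = two_tailed_decision(0.120, 3.355)
print(decision)
```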

1. c 2. a 3. c

4. A regression line obtained by using population data is called the population multiple regression model.

The estimated multiple regression model is obtained from sample data.

5. The regression coefficients in a multiple regression model are called the partial regression coefficients

because each of them gives the effect of the corresponding independent variable on the dependent variable

when all other independent variables are held constant.

6. R² is the proportion of the total sum of squares (SST) that is explained by the multiple regression model.

The adjusted R² is the coefficient of multiple determination adjusted for degrees of freedom. R² generally increases as

more explanatory variables are added to the regression model, while the value of the adjusted R² may increase,

decrease, or stay the same as more independent variables are added. R² is always non-negative; the adjusted R² can

be negative.
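The contrast in answer 6 can be illustrated with the usual adjustment formula, adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1). The Python sketch below assumes that formula; the function name `adjusted_r2` and the R² values fed to it are hypothetical and do not come from any exercise in this section.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R² for n observations and k independent variables."""
    df_error = n - k - 1
    if df_error <= 0:
        raise ValueError("need n > k + 1")
    return 1 - (1 - r2) * (n - 1) / df_error


# Hypothetical R² values, chosen only to show the behavior
weak_fit = adjusted_r2(0.10, n=11, k=2)    # small R², few observations
strong_fit = adjusted_r2(0.90, n=11, k=2)  # large R²

print(weak_fit, strong_fit)
```

With a weak fit and few degrees of freedom the adjusted value drops below zero, while R² itself cannot.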

7. a. We would expect the relationship between sale price and lot size to be positive, the relationship

between sale price and living area to be positive, and the relationship between sale price and age to be

negative.

b. ŷ = 200.153 + 11.889 x1 + .099 x2 − 7.551 x3

The signs of the coefficients of the independent variables obtained in the solution are consistent with

the expectations in part a.

c. The value of a = 200.153 gives the expected sale price of a house for a lot size of zero and living area

of zero at age zero. However, since x1 = 0, x2 = 0, and x3 = 0 do not occur together in the sample data,

the estimate is invalid. In fact, a lot size of zero and a living area of zero do not make sense. The

value b1 = 11.889 indicates that for an increase of one acre in the lot size, the sale price of a house is

expected to increase by $11,889 when living area and age are held constant. The value b2 = .099

indicates that for an increase of one square foot in living area, the sale price of a house is expected to

increase by $99 when lot size and age are held constant. The value b3 = −7.551 indicates that for an

increase of one year in age, the sale price of a house is expected to decrease by $7,551 when lot size

and living area are held constant.

e. ŷ = 200.153 + 11.889 x1 + .099 x2 − 7.551 x3 = 200.153 + 11.889(2.5) + .099(3000) − 7.551(14)

= 421.162

The predicted sale price of a house that has a lot size of 2.5 acres, a living area of 3000 square feet,

and is 14 years old is $421,162.

f. ŷ = 200.153 + 11.889 x1 + .099 x2 − 7.551 x3 = 200.153 + 11.889(2.2) + .099(2500) − 7.551(7)

= 420.952

The point estimate of the mean sale prices of all houses that have a lot size of 2.2 acres, a living area of

2500 square feet, and are 7 years old is $420,952.
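Parts e and f use the same estimated equation with different inputs; only the interpretation differs (a single predicted price versus a point estimate of a mean). A small Python sketch of the two substitutions follows. The function name `predict_price` and its parameter names are assumptions, and prices are in thousands of dollars as in the solution.

```python
# Estimated coefficients reported in part b (b3 is negative)
INTERCEPT = 200.153
B_LOT, B_AREA, B_AGE = 11.889, 0.099, -7.551


def predict_price(lot_acres, living_sqft, age_years):
    """Estimated sale price in thousands of dollars."""
    return (INTERCEPT
            + B_LOT * lot_acres
            + B_AREA * living_sqft
            + B_AGE * age_years)


part_e = predict_price(2.5, 3000, 14)  # single house in part e
part_f = predict_price(2.2, 2500, 7)   # mean over all such houses in part f
print(round(part_e, 2), round(part_f, 2))
```

Up to rounding in the last decimal place, these agree with the values 421.162 and 420.952 given above.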

g. df = n − k − 1 = 13 − 3 − 1 = 9

The 99% confidence interval for B1 is

b1 ± t sb1 = 11.889 ± (3.250)(23.697) = 11.889 ± 77.015 = −65.126 to 88.904

The 99% confidence interval for B2 is

b2 ± t sb2 = .099 ± (3.250)(.043) = .099 ± .140 = −.041 to .239

The 99% confidence interval for B3 is

b3 ± t sb3 = −7.551 ± (3.250)(1.988) = −7.551 ± 6.461 = −14.012 to −1.090

a ± t sa = 200.153 ± (2.821)(89.138) = 200.153 ± 251.458 = −51.305 to 451.611
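The three 99% intervals for the slope coefficients follow the same pattern, so they can be recomputed in one loop. This is a sketch under stated assumptions: the dictionary layout and variable names are illustrative, the critical value 3.250 is taken from the t table for df = 9, and the intercept interval is left out because it uses a different critical value (2.821).

```python
T_CRIT_99 = 3.250  # t table value for df = 9 with .005 in each tail

# (estimate, standard error) pairs reported in the solution
estimates = {
    "b1": (11.889, 23.697),
    "b2": (0.099, 0.043),
    "b3": (-7.551, 1.988),
}

intervals = {}
for name, (b, std_err) in estimates.items():
    margin = T_CRIT_99 * std_err
    intervals[name] = (round(b - margin, 3), round(b + margin, 3))

for name, (lower, upper) in intervals.items():
    contains_zero = lower <= 0 <= upper
    print(name, lower, upper, "contains 0" if contains_zero else "excludes 0")
```

Only the interval for the age coefficient lies entirely below zero.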

Step 2: Since σε is unknown, use the t distribution.

Step 3: For α = .01 with df = 9, the critical value of t is 2.821.

Step 4: t = (b1 − B1)/sb1 = .502

Step 5: Do not reject H0 since .502 < 2.821.

Conclude that B1 is not positive.

j. Step 1: H0: B2 = 0, H1: B2 > 0

Step 2: Since σε is unknown, use the t distribution.

Step 3: For α = .025 with df = 9, the critical value of t is 2.262.

Step 4: t = (b2 − B2)/sb2 = 2.319

Step 5: Reject H0 since 2.319 > 2.262.

Conclude that B2 is positive.

k. Step 1: H0: B3 = 0, H1: B3 < 0

Step 2: Since σε is unknown, use the t distribution.

Step 3: For α = .05 with df = 9, the critical value of t is −1.833.

Step 4: t = (b3 − B3)/sb3 = −3.799

Step 5: Reject H0 since −3.799 < −1.833.

Conclude that B3 is negative.

