Chapter Fourteen

Multiple Regression and Correlation Analysis

GOALS

When you have completed this chapter, you will be able to:

ONE
Describe the relationship between two or more independent variables and the dependent variable using a multiple regression equation.

TWO
Compute and interpret the multiple standard error of estimate and the coefficient of determination.

THREE
Interpret a correlation matrix.

FOUR
Set up and interpret an ANOVA table.

FIVE
Conduct a test of hypothesis to determine whether any of the set of regression coefficients differ from zero.

SIX
Conduct a test of hypothesis on each of the regression coefficients.

Multiple Regression Analysis

The general multiple regression equation with k independent variables is given by:

$Y' = a + b_1 X_1 + b_2 X_2 + \cdots + b_k X_k$

X1 to Xk are the independent variables.

a is the Y-intercept.

Greek letters are used for a (α) and b (β) when denoting population parameters.

Multiple Regression Analysis

bj is the net change in Y for each unit change in Xj, holding all other values constant, where j = 1 to k. It is called a partial regression coefficient, a net regression coefficient, or just a regression coefficient.

The least squares criterion is used to develop this equation.

Because determining b1, b2, etc. is very tedious, a software package such as Excel or MINITAB is recommended.
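To make the least squares criterion concrete, here is a minimal sketch in plain Python that solves the normal equations for a, b1, ..., bk. The function name and the data are made up for illustration; in practice a package such as Excel or MINITAB does this work.

```python
def fit_multiple_regression(rows, y):
    """Return [a, b1, ..., bk] that minimize the sum of squared residuals."""
    # Design matrix: a leading 1 for the intercept a, then X1..Xk.
    design = [[1.0] + [float(v) for v in row] for row in rows]
    p = len(design[0])
    # Normal equations (X^T X) b = X^T y, stored as an augmented matrix.
    aug = []
    for i in range(p):
        row = [sum(d[i] * d[j] for d in design) for j in range(p)]
        row.append(sum(d[i] * yi for d, yi in zip(design, y)))
        aug.append(row)
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("independent variables are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(p):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [aug[i][p] / aug[i][i] for i in range(p)]


# Made-up data generated exactly from Y = 2 + 3*X1 - 1*X2 (no noise).
x_rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5)]
y_vals = [2 + 3 * x1 - x2 for x1, x2 in x_rows]
coefficients = fit_multiple_regression(x_rows, y_vals)
print([round(c, 6) for c in coefficients])
```

Because the made-up Y values contain no noise, the fitted coefficients should come back very close to 2, 3, and -1.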

Multiple Standard Error of Estimate

The multiple standard error of estimate is a measure of the effectiveness of the regression equation.

It is measured in the same units as the dependent variable.

It is difficult to determine what is a large value and what is a small value of the standard error.

The formula is:

$s_{y.12 \ldots k} = \sqrt{\dfrac{\sum (Y - Y')^2}{n - (k + 1)}}$
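The formula can be sketched directly in Python. The function name and the six observations below are hypothetical; they only show where n, k, and the residuals Y - Y' enter the calculation.

```python
import math


def multiple_standard_error(actual, predicted, k):
    """Square root of the sum of squared residuals over n - (k + 1)."""
    n = len(actual)
    if n <= k + 1:
        raise ValueError("need more observations than estimated coefficients")
    sse = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))
    return math.sqrt(sse / (n - (k + 1)))


# Made-up values: six observations, two independent variables (k = 2).
actual = [10, 12, 15, 19, 24, 30]
predicted = [11, 12, 14, 20, 23, 30]
s = multiple_standard_error(actual, predicted, k=2)
print(round(s, 4))
```

Here the squared residuals sum to 4 and n - (k + 1) = 3, so the result is the square root of 4/3, in the same units as Y.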

Multiple Regression and Correlation Assumptions

The independent variables and the dependent variable have a linear relationship.

The dependent variable must be continuous and at least interval-scaled.

The variation in (Y - Y'), or residual, must be the same for all values of Y. When this is the case, we say the difference exhibits homoscedasticity.

The residuals should follow the normal distribution with mean 0.

Successive values of the dependent variable must be uncorrelated.

ANOVA table

Source        df          SS          MS
Regression    k           SSR         SSR/k
Error         n-(k+1)     SSE         SSE/[n-(k+1)]
Total         n-1         SS Total

Explained Variation: $SSR = \sum (Y' - \bar{Y})^2$, the variation accounted for by the set of independent variables.

Unexplained Variation: $SSE = \sum (Y - Y')^2$, the variation not accounted for by the independent variables.

Total Variation: $SS\ Total = \sum (Y - \bar{Y})^2$.
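As an illustration of how the ANOVA table is filled in, the sketch below computes each entry from actual and fitted values. The function name and data are hypothetical; the fitted values come from the least squares line Y' = 0.5 + 1.6X for the four made-up points, so k = 1.

```python
def anova_table(actual, fitted, k):
    """Return the ANOVA entries for a least squares fit with k predictors."""
    n = len(actual)
    mean_y = sum(actual) / n
    ssr = sum((f - mean_y) ** 2 for f in fitted)             # explained
    sse = sum((y - f) ** 2 for y, f in zip(actual, fitted))  # unexplained
    ss_total = sum((y - mean_y) ** 2 for y in actual)        # total
    df_regression, df_error = k, n - (k + 1)
    msr = ssr / df_regression
    mse = sse / df_error
    return {
        "df": (df_regression, df_error, n - 1),
        "SSR": ssr, "SSE": sse, "SS Total": ss_total,
        "MSR": msr, "MSE": mse, "F": msr / mse,
    }


# Made-up data: X = 1, 2, 3, 4 and the least squares line Y' = 0.5 + 1.6X.
y_actual = [2, 4, 5, 7]
y_fitted = [0.5 + 1.6 * x for x in (1, 2, 3, 4)]
table = anova_table(y_actual, y_fitted, k=1)
print(table)
```

For a least squares fit with an intercept, SSR and SSE add up to SS Total, which is a quick check on the arithmetic.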

Correlation Matrix

A correlation matrix is used to show all possible simple correlation coefficients among the variables.

The matrix is useful for locating correlated independent variables.

It shows how strongly each independent variable is correlated with the dependent variable.

Correlation Coefficients

               Cars     Advertising   Sales force
Cars           1.000
Sales force    0.872    0.537         1.000
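A correlation matrix can be built from pairwise simple correlation coefficients. The sketch below is one way to do it in plain Python; the variable names y, x1, x2 and their values are made up, and the 0.70 cutoff for flagging a pair of independent variables is a rule of thumb, taken here as an assumption.

```python
import math


def pearson_r(x, y):
    """Simple (Pearson) correlation coefficient between two lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Map every pair of variable names to its correlation coefficient."""
    names = list(columns)
    return {r: {c: pearson_r(columns[r], columns[c]) for c in names}
            for r in names}


# Made-up data: one dependent variable y and two independent variables.
data = {
    "y": [2, 4, 6, 8, 10],
    "x1": [1, 2, 3, 4, 5],
    "x2": [5, 3, 4, 1, 2],
}
matrix = correlation_matrix(data)

# Flag pairs of independent variables outside the -.70 to .70 range.
independent = ["x1", "x2"]
flagged = [(a, b) for i, a in enumerate(independent)
           for b in independent[i + 1:] if abs(matrix[a][b]) > 0.70]
print(flagged)
```

In this made-up data x1 and x2 have a correlation of -0.8, so the pair is flagged as strongly correlated independent variables.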

Global Test

The global test is used to investigate whether any of the independent variables have significant coefficients. The hypotheses are:

$H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$

$H_1$: Not all $\beta$s equal 0

The test statistic follows the F distribution with k (number of independent variables) and n-(k+1) degrees of freedom, where n is the sample size.

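Test for Individual Variables

This test is used to determine which independent variables have nonzero regression coefficients.

The variables that have zero regression coefficients are usually dropped from the analysis.

The test statistic follows the t distribution with n-(k+1) degrees of freedom:

$t = \dfrac{b_j - 0}{s_{b_j}}$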

EXAMPLE 1

A market researcher for Super Dollar Super Markets is studying the yearly amount families of four or more spend on food. Three independent variables are thought to be related to yearly food expenditures (Food). Those variables are: total family income (Income) in $00, size of family (Size), and whether the family has children in college (College).

Example 1 continued

The multiple regression equation is:

Food expenditures = a + b1*(Income) + b2*(Size) + b3*(College)

Note the following regarding the regression equation.

The variable College is called a dummy or indicator variable. It can take only one of two possible outcomes. That is, a child is a college student or not.

Other examples of dummy variables include gender, whether a part is acceptable or unacceptable, and whether a voter will or will not vote for the incumbent governor.

We usually code one value of the dummy variable as 1 and the other as 0.


Example 1 continued

Use a computer software package, such as MINITAB or Excel, to develop a correlation matrix.

From the analysis provided by MINITAB, write out the regression equation:

Y' = 954 + 1.09X1 + 748X2 + 565X3

Food Expenditure = $954 + $1.09*income + $748*size + $565*college

What food expenditure would you estimate for a family of 4, with no college students, and an income of $50,000 (which is input as 500)?

Example 1 continued

The regression equation is
Food = 954 + 1.09 Income + 748 Size + 565 Student

Predictor      Coef    SE Coef      T       P
Constant        954       1581   0.60   0.563
Income        1.092      3.153   0.35   0.738
Size          748.4      303.0   2.47   0.039
Student       564.5      495.1   1.14   0.287

Analysis of Variance

Source            DF         SS        MS       F       P
Regression         3   10762903   3587634   10.94   0.003
Residual Error     8    2623764    327970
Total             11   13386667
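The coefficient of determination and the multiple standard error of estimate can be recovered from the Analysis of Variance figures in this output. The sketch below assumes n = 12 (inferred from the total of 11 degrees of freedom) and k = 3; the variable names are illustrative.

```python
import math

# Sums of squares copied from the Analysis of Variance output.
ssr = 10762903       # Regression
sse = 2623764        # Residual Error
ss_total = 13386667  # Total

n, k = 12, 3  # assumed: total DF of 11 implies n = 12; three predictors

# Coefficient of determination: share of total variation explained.
r_squared = ssr / ss_total

# Multiple standard error of estimate: sqrt(SSE / (n - (k + 1))).
std_error = math.sqrt(sse / (n - (k + 1)))

print(f"R-square = {100 * r_squared:.1f} percent")
print(f"standard error = {std_error:.2f}")
```

The standard error is in dollars per year, the same units as the food expenditure variable.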

Example 1 continued

Food Expenditure = $954 + $1.09*income + $748*size + $565*college

Each additional $100 of income per year will increase the amount spent on food by $1.09 per year, because income is entered in hundreds of dollars.

An additional family member will increase the amount spent per year on food by $748.

A family with a college student will spend $565 more per year on food than those without a college student.

Food Expenditure = $954 + $1.09*500 + $748*4 + $565*0 = $4,491

So a family of 4, with no college students, and an income of $50,000 will spend an estimated $4,491.
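The estimate can be checked with a few lines of Python. The function name is made up for illustration; the coefficients are the rounded values from the regression equation, income is entered in hundreds of dollars, and college is the 0/1 dummy variable.

```python
def predict_food(income_hundreds, size, college):
    """Estimated yearly food expenditure in dollars."""
    return 954 + 1.09 * income_hundreds + 748 * size + 565 * college


# Family of 4, no college students, income of $50,000 (entered as 500).
estimate = predict_food(income_hundreds=500, size=4, college=0)

# Same family, but with a college student (dummy variable set to 1).
with_student = predict_food(income_hundreds=500, size=4, college=1)

print(round(estimate), round(with_student))
```

Switching the dummy variable from 0 to 1 raises the estimate by exactly the College coefficient, $565.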

Correlation matrix

From the regression output:

The coefficient of determination is 80.4 percent. This means that more than 80 percent of the variation in the amount spent on food is accounted for by the variables income, family size, and student.

            Food     Income   Size     College
Food        1.000
Income      0.587    1.000
Size        0.876    0.609    1.000
College     0.773    0.491    0.743    1.000

The strongest correlation between the dependent variable and an independent variable is between family size and amount spent on food.

The correlations among the independent variables should not cause serious problems. Two are between -.70 and .70; the correlation between size and college (0.743) is slightly above .70.

Example 1 continued

Conduct a global test of hypothesis to determine if any of the regression coefficients are not zero.

$H_0: \beta_1 = \beta_2 = \beta_3 = 0$

$H_1$: at least one $\beta$ is not 0

H0 is rejected if F > 4.07.

From the MINITAB output, the computed value of F is 10.94.

Decision: H0 is rejected. Not all the regression coefficients are zero.
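The computed F can be reproduced from the sums of squares in the MINITAB output. This sketch assumes n = 12 and k = 3, and takes the critical value 4.07 from the slide rather than computing it; the variable names are illustrative.

```python
# Sums of squares from the Analysis of Variance output for the full model.
ssr = 10762903  # Regression
sse = 2623764   # Residual Error
n, k = 12, 3    # assumed sample size and number of independent variables

f_critical = 4.07  # decision rule stated on the slide (3 and 8 df)

msr = ssr / k              # mean square regression
mse = sse / (n - (k + 1))  # mean square error
f_stat = msr / mse

reject_h0 = f_stat > f_critical
print(f"F = {f_stat:.2f}, reject H0: {reject_h0}")
```

Since the computed F exceeds 4.07, the sketch reaches the same decision as the slide: not all the regression coefficients are zero.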

Example 1 continued

Conduct an individual test to determine which coefficients are not zero. These are the hypotheses for the independent variable family size.

$H_0: \beta_2 = 0$

$H_1: \beta_2 \neq 0$

Thus, using the 5% level of significance, reject H0 if the p-value < .05.

From the MINITAB output, the only significant variable is Size (family size) using the p-values. The other variables can be omitted from the model.
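The individual tests can be sketched as a loop over the coefficient table. The coefficients, standard errors, and p-values are copied from the MINITAB output for the full model; the dictionary layout and names are illustrative.

```python
# (coefficient, standard error, p-value) from the MINITAB output.
output = {
    "Income": (1.092, 3.153, 0.738),
    "Size": (748.4, 303.0, 0.039),
    "Student": (564.5, 495.1, 0.287),
}
alpha = 0.05  # 5% level of significance

t_values = {}
significant = []
for name, (coef, std_err, p_value) in output.items():
    # t = (b_j - 0) / s_bj
    t_values[name] = (coef - 0) / std_err
    if p_value < alpha:
        significant.append(name)

for name, t in t_values.items():
    print(f"{name}: t = {t:.2f}")
print("significant:", significant)
```

The t ratios match the T column of the output, and only Size has a p-value below .05.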

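Example 1 continued

We rerun the analysis using only the significant independent variable, family size.

The new regression equation is:

Y' = 340 + 1031X2

The coefficient of determination is 76.8 percent. We dropped two independent variables, and the R-square term was reduced by only 3.6 percentage points.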

Example 1 continued

The regression equation is
Food = 340 + 1031 Size

Predictor      Coef    SE Coef      T       P
Constant      339.7      940.7   0.36   0.726
Size         1031.0      179.4   5.75   0.000

Analysis of Variance

Source            DF         SS         MS       F       P
Regression         1   10275977   10275977   33.03   0.000
Residual Error    10    3110690     311069
Total             11   13386667
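The effect of dropping Income and College on R-square can be checked from the two Analysis of Variance tables. The sketch below uses only the regression and total sums of squares reported in the outputs; the variable names are illustrative.

```python
ss_total = 13386667       # Total SS (same for both models)
ssr_full = 10762903       # Regression SS: Income, Size, and Student
ssr_reduced = 10275977    # Regression SS: Size only

r2_full = ssr_full / ss_total
r2_reduced = ssr_reduced / ss_total
drop_in_points = 100 * (r2_full - r2_reduced)

print(f"full model:    {100 * r2_full:.1f} percent")
print(f"reduced model: {100 * r2_reduced:.1f} percent")
print(f"reduction:     {drop_in_points:.1f} percentage points")
```

The reduction is measured in percentage points of explained variation, not as a percent of the original R-square.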

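Analysis of Residuals

A residual is the difference between the actual value of Y and the predicted value Y'.

Residuals should be approximately normally distributed. Histograms and stem-and-leaf charts are useful in checking this requirement.

A plot of the residuals and their corresponding Y' values is used for showing that there are no trends or patterns in the residuals.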
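A rough sketch of these residual checks follows. The six (size, food) observations are made up, since the example data are not reproduced here; the fitted values use the reduced equation Y' = 340 + 1031*Size, and the helper names are hypothetical.

```python
import math


def residuals(actual, fitted):
    """Residual = actual Y minus predicted Y'."""
    return [y - f for y, f in zip(actual, fitted)]


def histogram_counts(values, bin_width):
    """Count values per bin; each key is the lower edge of a bin."""
    counts = {}
    for v in values:
        lower = math.floor(v / bin_width) * bin_width
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))


# Made-up observations, not the data from Example 1.
sizes = [4, 4, 5, 5, 6, 7]
food = [4400, 4600, 5300, 5700, 6500, 7600]

fitted = [340 + 1031 * s for s in sizes]
resid = residuals(food, fitted)

# Pairs to plot: Y' on the horizontal axis, residual on the vertical axis.
plot_points = list(zip(fitted, resid))

# Frequencies for a histogram of the residuals, in bins of width 200.
counts = histogram_counts(resid, bin_width=200)
print(plot_points)
print(counts)
```

With real data, the pairs would be inspected for trends or patterns and the bin counts for a roughly bell-shaped pattern centered near 0.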

[Figure: Residual Plot. Residuals (vertical axis) plotted against Y' (horizontal axis).]

[Figure: Histogram of Residuals. Frequency (vertical axis) by residual value (horizontal axis).]

