
Overview of models for a continuous response variable

Y is a continuous response (or dependent) variable.
$X_1, X_2, \ldots, X_k$ are quantitative explanatory (or independent) variables.
A, B are categorical explanatory variables (called explanatory factors).

Response  Explanatory variables  Model name                  Model equation
Y         X                      simple linear regression    $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$
Y         X_1, X_2, ..., X_k     multiple linear regression  $y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_k x_{ki} + \varepsilon_i$
Y         A                      one-way ANOVA               $y_{ij} = \mu_i + \varepsilon_{ij}$
Y         A, B                   two-way ANOVA

Week 4 Multiple linear regression

4.1 Introduction

Multiple linear regression is applicable when the data are multivariate: a multiple linear regression model relates a
response variable Y to more than one explanatory variable.
The main purpose of multiple regression analysis is to find which explanatory variables
contribute to the variation in the response variable. We are usually looking for the 'best' subset
of the explanatory variables.

The Model

$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_k x_{ki} + \varepsilon_i, \quad i = 1, 2, \ldots, n,$

where k is the number of explanatory variables,

$\beta_0, \beta_1, \ldots, \beta_k$ are the parameters of the model, and

$\varepsilon_i$ is a random error term.
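To make the model concrete, here is a minimal pure-Python sketch of estimating $\beta_0, \beta_1, \ldots, \beta_k$ by ordinary least squares, solving the normal equations $(X^\top X)\hat\beta = X^\top y$ with Gaussian elimination. The function name `ols_fit` and the toy data are illustrative assumptions, not part of the notes; in practice a statistics package (such as SPSS, used below) does this step, typically with more numerically careful algorithms.

```python
def ols_fit(xs, y):
    """Least-squares estimates [b0, b1, ..., bk] for rows xs = [[x1i, ..., xki], ...]."""
    design = [[1.0] + list(row) for row in xs]  # leading column of 1s for b0
    p = len(design[0])
    # Augmented normal equations [X'X | X'y].
    aug = [
        [sum(r[a] * r[b] for r in design) for b in range(p)]
        + [sum(r[a] * yi for r, yi in zip(design, y))]
        for a in range(p)
    ]
    # Gaussian elimination with partial pivoting.
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, p):
            f = aug[r][col] / aug[col][col]
            for c in range(col, p + 1):
                aug[r][c] -= f * aug[col][c]
    # Back substitution.
    b = [0.0] * p
    for r in range(p - 1, -1, -1):
        s = sum(aug[r][c] * b[c] for c in range(r + 1, p))
        b[r] = (aug[r][p] - s) / aug[r][r]
    return b


# Hypothetical noise-free data from y = 2 + 3*x1 - 1.5*x2 (k = 2),
# so least squares should recover the coefficients up to rounding.
xs = [[1, 4], [2, 1], [3, 5], [4, 2], [5, 7], [6, 3]]
y = [2 + 3 * x1 - 1.5 * x2 for x1, x2 in xs]
print([round(v, 6) for v in ols_fit(xs, y)])  # [2.0, 3.0, -1.5]
```

With real data the $\varepsilon_i$ are not zero, so the estimates only approximate the true parameters.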

4.2 Example

Is the size of your brain related to your mental capacity? In this study by Willerman et al. (1991)
the researchers use Magnetic Resonance Imaging (MRI) to determine the brain size of the
subjects. The researchers take into account gender and body size to draw conclusions about the
connection between brain size and intelligence.

Willerman et al. (1991) conducted their study at a large southwestern university. They selected a
sample of 40 right-handed Anglo introductory psychology students who had indicated no history
of alcoholism, unconsciousness, brain damage, epilepsy, or heart disease. These subjects were
drawn from a larger pool of introductory psychology students with total Scholastic Aptitude Test
scores higher than 1350 or lower than 940 who had agreed to satisfy a course requirement by
allowing the administration of four subtests (Vocabulary, Similarities, Block Design, and Picture
Completion) of the Wechsler (1981) Adult Intelligence Scale-Revised. With prior approval of the
University's research review board, students selected for MRI were required to obtain prorated
full-scale IQs of greater than 130 or less than 103, and were equally divided by sex and IQ
classification. The MRI scans were performed at the same facility for all 40 subjects. The scans
consisted of 18 horizontal MR images. The computer counted all pixels with non-zero gray scale
in each of the 18 images, and the total count served as an index for brain size. The variables
recorded are:

1. Gender: Male or Female
2. FSIQ: Full Scale IQ score based on the four Wechsler (1981) subtests
3. VIQ: Verbal IQ score based on the four Wechsler (1981) subtests
4. PIQ: Performance IQ score based on the four Wechsler (1981) subtests
5. Weight: body weight in pounds
6. Height: height in inches
7. MRI_CNT: total pixel count from the 18 MRI scans

The Data:
Gender FSIQ VIQ PIQ Weight Height MRI_CNT
Female 133 132 124 118 64.5 816932
Male 140 150 124 . 72.5 1001121
Male 139 123 150 143 73.3 1038437
Male 133 129 128 172 68.8 965353
Female 137 132 134 147 65.0 951545
Female 99 90 110 146 69.0 928799
Female 138 136 131 138 64.5 991305
Female 92 90 98 175 66.0 854258
Male 89 93 84 134 66.3 904858
Male 133 114 147 172 68.8 955466
Female 132 129 124 118 64.5 833868
Male 141 150 128 151 70.0 1079549
Male 135 129 124 155 69.0 924059
Female 140 120 147 155 70.5 856472
Female 96 100 90 146 66.0 878897
Female 83 71 96 135 68.0 865363
Female 132 132 120 127 68.5 852244
Male 100 96 102 178 73.5 945088
Female 101 112 84 136 66.3 808020
Male 80 77 86 180 70.0 889083
Male 83 83 86 . . 892420
Male 97 107 84 186 76.5 905940
Female 135 129 134 122 62.0 790619
Male 139 145 128 132 68.0 955003
Female 91 86 102 114 63.0 831772
Male 141 145 131 171 72.0 935494
Female 85 90 84 140 68.0 798612
Male 103 96 110 187 77.0 1062462
Female 77 83 72 106 63.0 793549
Female 130 126 124 159 66.5 866662
Female 133 126 132 127 62.5 857782
Male 144 145 137 191 67.0 949589
Male 103 96 110 192 75.5 997925
Male 90 96 86 181 69.0 879987
Female 83 90 81 143 66.5 834344
Female 133 129 128 153 66.5 948066
Male 140 150 124 144 70.5 949395
Female 88 86 94 139 64.5 893983
Male 81 90 74 148 74.0 930016
Male 89 91 89 179 75.5 935863

A new variable SEX was created from Gender, coded Male = 0 and Female = 1.
Note that there are 40 cases in the data set, but 2 cases have missing values (marked "."), so n = 38 cases
are used in the multiple linear regression analysis.
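The data handling described above can be sketched in pure Python. The sketch assumes the table is transcribed exactly as printed; it codes SEX as Male = 0, Female = 1 and drops every case with a missing value (listwise deletion), which is what the count n = 38 implies. The names `RAW`, `COLUMNS` and `parse` are illustrative.

```python
# Brain-size data transcribed from the table above
# (Gender FSIQ VIQ PIQ Weight Height MRI_CNT); "." marks a missing value.
RAW = """\
Female 133 132 124 118 64.5 816932
Male 140 150 124 . 72.5 1001121
Male 139 123 150 143 73.3 1038437
Male 133 129 128 172 68.8 965353
Female 137 132 134 147 65.0 951545
Female 99 90 110 146 69.0 928799
Female 138 136 131 138 64.5 991305
Female 92 90 98 175 66.0 854258
Male 89 93 84 134 66.3 904858
Male 133 114 147 172 68.8 955466
Female 132 129 124 118 64.5 833868
Male 141 150 128 151 70.0 1079549
Male 135 129 124 155 69.0 924059
Female 140 120 147 155 70.5 856472
Female 96 100 90 146 66.0 878897
Female 83 71 96 135 68.0 865363
Female 132 132 120 127 68.5 852244
Male 100 96 102 178 73.5 945088
Female 101 112 84 136 66.3 808020
Male 80 77 86 180 70.0 889083
Male 83 83 86 . . 892420
Male 97 107 84 186 76.5 905940
Female 135 129 134 122 62.0 790619
Male 139 145 128 132 68.0 955003
Female 91 86 102 114 63.0 831772
Male 141 145 131 171 72.0 935494
Female 85 90 84 140 68.0 798612
Male 103 96 110 187 77.0 1062462
Female 77 83 72 106 63.0 793549
Female 130 126 124 159 66.5 866662
Female 133 126 132 127 62.5 857782
Male 144 145 137 191 67.0 949589
Male 103 96 110 192 75.5 997925
Male 90 96 86 181 69.0 879987
Female 83 90 81 143 66.5 834344
Female 133 129 128 153 66.5 948066
Male 140 150 124 144 70.5 949395
Female 88 86 94 139 64.5 893983
Male 81 90 74 148 74.0 930016
Male 89 91 89 179 75.5 935863
"""

COLUMNS = ["FSIQ", "VIQ", "PIQ", "WEIGHT", "HEIGHT", "MRI_CNT"]


def parse(raw):
    """Return one dict per case, with SEX coded Male=0, Female=1."""
    cases = []
    for line in raw.splitlines():
        gender, *fields = line.split()
        case = {"SEX": 1 if gender == "Female" else 0}
        for name, field in zip(COLUMNS, fields):
            case[name] = None if field == "." else float(field)
        cases.append(case)
    return cases


cases = parse(RAW)
# Listwise deletion: keep only cases with no missing values.
complete = [c for c in cases if all(v is not None for v in c.values())]
print(len(cases), len(complete))  # 40 38
```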

The Multiple Scatter Diagrams:

The response variable y and all the continuous x variables are plotted against each other in a scatterplot matrix.

[Figure: scatterplot matrix of MRI_CNT, FSIQ, VIQ, PIQ, WEIGHT and HEIGHT, with points marked by SEX (Male, Female)]

This plot shows that, individually, PIQ, WEIGHT and HEIGHT are linearly related to the
response variable MRI_CNT. Some of the explanatory variables are also highly correlated
with one another.

4.3 SPSS regression output.

Regression

Variables Entered/Removed(b)
Model  Variables Entered                                   Variables Removed  Method
1      Male=0 Female=1, PIQ, WEIGHT, HEIGHT, VIQ, FSIQ(a)  .                  Enter
a. All requested variables entered.
b. Dependent Variable: MRI_CNT

a) Variation explained by the model


Model Summary
Model  R        R Square  Adjusted R Square  Std. Error of the Estimate
1      .808(a)  .652      .585               46759.06
a. Predictors: (Constant), Male=0 Female=1, PIQ, WEIGHT, HEIGHT, VIQ, FSIQ

The model explains 65.2% of the variation in MRI_CNT (R Square = .652).
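R Square and Adjusted R Square can be checked by hand from the printed tables. The sketch below assumes the standard definitions $R^2 = SS_{\text{regression}} / SS_{\text{total}}$ and $\bar R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1)$, and uses the rounded SPSS values as inputs.

```python
# Figures copied from the SPSS Model Summary and ANOVA tables.
n, k = 38, 6                      # cases used, explanatory variables
r_square = 0.652
ss_regression, ss_total = 1.27e11, 1.95e11

# R^2: share of the total sum of squares explained by the regression.
# Gives 0.651 here; the small gap comes from the rounded sums of squares.
print(round(ss_regression / ss_total, 3))

# Adjusted R^2 penalises R^2 for the number of explanatory variables.
adj_r_square = 1 - (1 - r_square) * (n - 1) / (n - k - 1)
print(round(adj_r_square, 3))  # 0.585, matching the table
```

Because the adjustment grows with k, adding a weak predictor can raise R Square while lowering Adjusted R Square.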

b) Testing whether the x-variables are jointly significant.


ANOVA(b)
Model           Sum of Squares  df  Mean Square  F      Sig.
1  Regression   1.27E+11        6   2.12E+10     9.684  .000(a)
   Residual     6.78E+10        31  2.19E+09
   Total        1.95E+11        37
a. Predictors: (Constant), Male=0 Female=1, PIQ, WEIGHT, HEIGHT, VIQ, FSIQ
b. Dependent Variable: MRI_CNT
If the regression is not significant, there is no evidence that y depends linearly on the x's. The hypotheses
may be written:

$H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$ (y does not depend on the x's), i.e. model $y_i = \beta_0 + \varepsilon_i$

$H_1: \beta_j \neq 0$ for at least one j, i.e. model $y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_k x_{ki} + \varepsilon_i$

Test statistic: $F = \dfrac{MS(\text{regression})}{MS(\text{residual})} \sim F_{k,\,n-k-1}$ if $H_0$ is true (here $k = 6$ and $n = 38$, so $F_{6,31}$)
Observed F = 9.684 and P = 0.000 (SPSS prints .000 when P < 0.0005).
Since P < 0.001, the data provide very strong evidence to reject $H_0$.
Conclusion: The data provide very strong evidence of a linear relationship between y
(MRI_CNT) and the six entered variables (SEX, PIQ, WEIGHT, HEIGHT, VIQ and FSIQ).
Hence the six entered variables are jointly significant.
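The observed F can be recomputed from the ANOVA table, or equivalently from R Square using $F = \dfrac{R^2 / k}{(1 - R^2)/(n - k - 1)}$ (which follows from $R^2 = SS_{\text{regression}}/SS_{\text{total}}$). A minimal check with the rounded SPSS values:

```python
# Figures copied from the ANOVA and Model Summary tables.
n, k = 38, 6
df_regression, df_residual = k, n - k - 1      # 6 and 31, as in the ANOVA table
ms_regression, ms_residual = 2.12e10, 2.19e9

# F = MS(regression) / MS(residual)
f_from_ms = ms_regression / ms_residual

# Equivalent form in terms of R^2 = .652.
r_square = 0.652
f_from_r2 = (r_square / df_regression) / ((1 - r_square) / df_residual)

print(df_regression, df_residual, round(f_from_ms, 2), round(f_from_r2, 2))  # 6 31 9.68 9.68
```

Both values agree with the printed F = 9.684 up to the rounding of the table entries.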

c) Testing whether the x-variables are individually significant.

Coefficients(a)
                     Unstandardized Coefficients  Standardized Coefficients
Model                B            Std. Error      Beta       t        Sig.
1  (Constant)        206819.4     235162.2                   .879     .386
   FSIQ              -9389.378    4651.638        -3.082     -2.019   .052
   VIQ               5388.765     2761.426        1.704      1.951    .060
   PIQ               6287.507     2526.270        1.958      2.489    .018
   WEIGHT            87.015       485.553         .028       .179     .859
   HEIGHT            6883.317     3207.980        .379       2.146    .040
   Male=0 Female=1   -42368.7     24529.592       -.295      -1.727   .094
a. Dependent Variable: MRI_CNT

Interpretation of the coefficients:


The slopes (B) represent the change in the mean of y for every one-unit change in one of the x's
while the rest of the x's remain constant. For example, if PIQ increases by one unit while all the other
variables remain constant, the predicted MRI_CNT increases by 6287.507.
The intercept represents the predicted value of y when all the x's are zero.

Hypothesis test about $\beta_j$:

$H_0: \beta_j = 0$  No linear relation between y and $x_j$ (given all the other entered x-variables)
$H_1: \beta_j \neq 0$  There is a linear relation between y and $x_j$ (given all the other entered x-variables)

Test statistic: $T = \dfrac{\hat\beta_j}{\text{std. error}(\hat\beta_j)} \sim t_{n-k-1}$ if $H_0$ is true (here $t_{31}$)

Example
$H_0: \beta_4 = 0$  No relationship between MRI_CNT and PIQ (given FSIQ, VIQ, HEIGHT, WEIGHT, SEX)
$H_1: \beta_4 \neq 0$  There is a linear relationship (given all the other entered x-variables)

Observed T = 2.489 and P = 0.018.

Since $0.01 < P \le 0.05$, the data provide evidence to reject $H_0: \beta_4 = 0$.


Conclusion
The data provide evidence of a linear relationship between MRI_CNT and PIQ (given all the
other entered x-variables). Hence PIQ is a significant linear predictor for MRI_CNT (given all
the other entered x-variables).

From the table, HEIGHT (P = 0.040) is also a significant linear predictor for MRI_CNT (given all the other
entered x-variables). However, the input variables FSIQ, VIQ, WEIGHT and SEX are
individually not significant linear predictors for MRI_CNT at the 5% level (given all the other entered
x-variables).
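The individual t tests can be reproduced from the B and Std. Error columns. The sketch below is pure Python: it computes the two-sided p-value through the regularized incomplete beta function, $P(|T| > t) = I_{\nu/(\nu + t^2)}(\nu/2, 1/2)$ with $\nu = n - k - 1 = 31$, evaluated by the standard continued-fraction (modified Lentz) method. The helper names are hypothetical; with SciPy installed, `2 * scipy.stats.t.sf(abs(t), df)` should give the same p-value.

```python
from math import exp, lgamma, log


def _beta_cf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even term of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd term.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_beta(a, b, x):
    """I_x(a, b), the regularized incomplete beta function."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b)
                + a * log(x) + b * log(1.0 - x))
    # Use the continued fraction where it converges quickly; otherwise
    # apply the symmetry I_x(a, b) = 1 - I_{1-x}(b, a).
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def t_test(b_hat, std_error, df):
    """Return T = b_hat / std_error and its two-sided p-value under t(df)."""
    t = b_hat / std_error
    p = regularized_beta(df / 2.0, 0.5, df / (df + t * t))
    return t, p


# (B, Std. Error) from the full-model Coefficients table; SEX is "Male=0 Female=1".
coefficients = {
    "FSIQ": (-9389.378, 4651.638),
    "VIQ": (5388.765, 2761.426),
    "PIQ": (6287.507, 2526.270),
    "WEIGHT": (87.015, 485.553),
    "HEIGHT": (6883.317, 3207.980),
    "SEX": (-42368.7, 24529.592),
}
for name, (b_hat, std_error) in coefficients.items():
    t, p = t_test(b_hat, std_error, df=31)
    print(f"{name:7s} T = {t:7.3f}  P = {p:.3f}")
```

Up to rounding, the printed T and P values should match the t and Sig. columns (for example PIQ: T = 2.489, P = 0.018).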

4.4 Selecting appropriate subsets of x-variables.
There are many ways to construct a ‘best’ regression equation from a large set of explanatory
(x) variables.

Backward elimination: We begin with a model that includes all the explanatory variables
and eliminate the least significant variable one at a time, until all variables in the model are
significant.

Forward selection: We start with the constant and add the most significant variable
not in the model one at a time, until the variable to be entered is not significant.

Forward stepwise selection: Add one variable at a time into the model as in forward
selection, but after each step also check whether any variables already in the model can be removed.

Backward elimination:

i) Start with all input variables in the model.

ii) Test whether the parameter of each variable currently in the model is different
from 0, and find its p-value.
Select the least significant input variable from those currently in the model
[i.e. with the largest p-value],
and remove this input variable from the model, provided its p-value is greater than
the significance level for Remove (0.10 by default in SPSS).

iii) If no variables were removed in step ii), then stop; otherwise return to step ii).
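A minimal sketch of backward elimination, assuming a hypothetical function `coef_pvalues(model)` that fits the regression on the variables in `model` and returns each coefficient's p-value. To keep the sketch self-contained, that function is replaced by a lookup of invented p-values for three toy variables x1, x2, x3; they do not come from the brain-size data.

```python
def backward_elimination(variables, coef_pvalues, alpha_remove=0.10):
    """Backward elimination driven by a caller-supplied p-value function.

    coef_pvalues(model) must return {variable: p-value} for a regression
    fitted with exactly the variables in `model` (a frozenset).
    """
    model = frozenset(variables)
    removed = []
    while model:
        pvalues = coef_pvalues(model)
        worst = max(pvalues, key=pvalues.get)   # least significant variable
        if pvalues[worst] <= alpha_remove:
            break                                # all remaining variables significant
        model = model - {worst}
        removed.append(worst)
    return model, removed


# Invented p-values for a toy problem (not from the brain-size data).
TOY_PVALUES = {
    frozenset({"x1", "x2", "x3"}): {"x1": 0.30, "x2": 0.001, "x3": 0.04},
    frozenset({"x2", "x3"}): {"x2": 0.002, "x3": 0.08},
}
final, removed = backward_elimination(["x1", "x2", "x3"], lambda m: TOY_PVALUES[m])
print(sorted(final), removed)  # ['x2', 'x3'] ['x1']
```

Here x3 stays in with p = 0.08 because the Remove level (0.10) is more lenient than the usual 0.05.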

Forward selection:

i) Start with no input variables in the model.

ii) Add each variable not currently in the model in turn to the current model and
find its p-value on entering the current model.
Select the most significant input variable from those not currently in the model
[i.e. with the smallest p-value when entered into the current model],
and include this input variable in the model provided its p-value is less than the
significance level for Entry (0.05 by default in SPSS).

iii) If no variables were entered in step ii), then stop; otherwise return to step ii).
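Forward selection can be sketched in a few lines. Here the p-value for entering each candidate comes from a lookup instead of refitting a regression: the values are the Sig. column of the SPSS Excluded Variables table for this data set, and the run starts from model 1 (SEX only) because SPSS does not print the entry p-values for the very first step. The function name `forward_selection` is hypothetical.

```python
def forward_selection(entry_pvalues, start=(), alpha_enter=0.05):
    """Forward selection driven by a caller-supplied p-value function.

    entry_pvalues(model) must return {candidate: p-value when the candidate
    is added to `model`} for every variable not yet in `model`.
    """
    model = frozenset(start)
    entered = []
    while True:
        pvalues = entry_pvalues(model)
        if not pvalues:
            break                                # nothing left to enter
        best = min(pvalues, key=pvalues.get)     # most significant candidate
        if pvalues[best] >= alpha_enter:
            break                                # best candidate not significant
        model = model | {best}
        entered.append(best)
    return model, entered


# Entry p-values ("Sig." column) from the SPSS Excluded Variables table.
EXCLUDED_SIG = {
    frozenset({"SEX"}): {"FSIQ": 0.022, "VIQ": 0.081, "PIQ": 0.004,
                         "WEIGHT": 0.296, "HEIGHT": 0.157},
    frozenset({"SEX", "PIQ"}): {"FSIQ": 0.314, "VIQ": 0.479,
                                "WEIGHT": 0.208, "HEIGHT": 0.029},
    frozenset({"SEX", "PIQ", "HEIGHT"}): {"FSIQ": 0.573, "VIQ": 0.864,
                                          "WEIGHT": 0.748},
}
final, entered = forward_selection(lambda m: EXCLUDED_SIG[m], start={"SEX"})
print(sorted(final), entered)  # ['HEIGHT', 'PIQ', 'SEX'] ['PIQ', 'HEIGHT']
```

Note that FSIQ also has p < 0.05 at the second step (0.022), but PIQ is entered because it is the most significant candidate.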

Forward stepwise selection:

i) Start with no input variables in the model.

ii) Select the most significant input variable from those not currently in the model
[i.e. with the smallest p-value when entered into the current model],
and include this input variable in the model provided its p-value is less than the
significance level for Entry (0.05, by default in SPSS).

iii) Select the least significant input variable from those currently in the model
[i.e. with the largest p-value],
and remove this input variable from the model, provided its p-value is greater than
the significance level for Remove (0.10 by default in SPSS).
Repeat iii) until all variables left in the model are significant.

iv) If no variables were entered in step ii) or removed in step iii), then stop;
otherwise return to step ii).
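A minimal sketch of forward stepwise selection. It relies on the fact that a candidate's entry p-value equals the p-value of its coefficient once it has been added, so a single hypothetical function `coef_pvalues(model)` can drive both the entry and the removal steps. The p-values below are invented for toy variables x1, x2, x3 so that a removal actually happens, which does not occur in the SPSS stepwise run for the brain-size data.

```python
def stepwise(variables, coef_pvalues, alpha_enter=0.05, alpha_remove=0.10,
             max_rounds=100):
    """Forward stepwise selection; returns (final model, history of moves)."""
    model = frozenset()
    history = []
    for _ in range(max_rounds):                  # guard against cycling
        changed = False
        # Step ii): enter the most significant candidate, if significant.
        entry = {v: coef_pvalues(model | {v})[v] for v in variables if v not in model}
        if entry:
            best = min(entry, key=entry.get)
            if entry[best] < alpha_enter:
                model = model | {best}
                history.append(("enter", best))
                changed = True
        # Step iii): remove the least significant variable until all are significant.
        while model:
            pvalues = coef_pvalues(model)
            worst = max(pvalues, key=pvalues.get)
            if pvalues[worst] <= alpha_remove:
                break
            model = model - {worst}
            history.append(("remove", worst))
            changed = True
        # Step iv): stop when a full round changes nothing.
        if not changed:
            break
    return model, history


# Invented p-values for toy variables x1, x2, x3 (not the brain-size data).
TOY = {
    frozenset({"x1"}): {"x1": 0.001},
    frozenset({"x2"}): {"x2": 0.003},
    frozenset({"x3"}): {"x3": 0.200},
    frozenset({"x1", "x2"}): {"x1": 0.150, "x2": 0.002},
    frozenset({"x1", "x3"}): {"x1": 0.001, "x3": 0.300},
    frozenset({"x2", "x3"}): {"x2": 0.004, "x3": 0.400},
}
final, history = stepwise(["x1", "x2", "x3"], lambda m: TOY[m])
print(sorted(final), history)
# ['x2'] [('enter', 'x1'), ('enter', 'x2'), ('remove', 'x1')]
```

x1 enters first, but once x2 is in the model x1 is no longer significant (p = 0.15 > 0.10) and is removed; this extra check is what distinguishes stepwise selection from plain forward selection.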

The following output uses the option STEPWISE.

a) Which variables are included in the final model

Variables Entered/Removed(a)
Model  Variables Entered  Variables Removed  Method
1      Male=0 Female=1    .                  Stepwise (Criteria: Probability-of-F-to-enter <= .050, Probability-of-F-to-remove >= .100).
2      PIQ                .                  Stepwise (Criteria: Probability-of-F-to-enter <= .050, Probability-of-F-to-remove >= .100).
3      HEIGHT             .                  Stepwise (Criteria: Probability-of-F-to-enter <= .050, Probability-of-F-to-remove >= .100).
a. Dependent Variable: MRI_CNT

Firstly SEX, secondly PIQ and finally HEIGHT are found to be significant.

There are three models fitted altogether before the algorithm stopped:
i) SEX
ii) SEX+PIQ
iii) SEX+PIQ+HEIGHT
Model iii) is the final model.
The output below shows the R Square values, the ANOVA tables and the regression coefficients for all
three models. It also shows the variables that were excluded at each stage. Note
that the final prediction model is

MRI_CNT=353207.8 - 54561.4*(if female) + 1267.677*PIQ + 6447.095*HEIGHT
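As a worked check, the sketch below evaluates the final equation for the first subject in the data table (Female, PIQ 124, HEIGHT 64.5, observed MRI_CNT 816932) and computes the residual. The function name is illustrative; the coefficients are the rounded SPSS values, so the last digits are approximate.

```python
def predict_mri_cnt(female, piq, height):
    """Fitted MRI_CNT from the final stepwise model (female: 1 if female, else 0)."""
    return 353207.8 - 54561.4 * female + 1267.677 * piq + 6447.095 * height


# First row of the data table: Female, PIQ = 124, HEIGHT = 64.5, MRI_CNT = 816932.
fitted = predict_mri_cnt(female=1, piq=124, height=64.5)
residual = 816932 - fitted
print(round(fitted, 1), round(residual, 1))  # 871676.0 -54744.0
```

Holding PIQ and HEIGHT fixed, the predicted MRI_CNT for a female is 54561.4 lower than for a male, which is exactly the SEX coefficient.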

b) The R-Squares for the three fitted models

Model Summary
Model  R        R Square  Adjusted R Square  Std. Error of the Estimate
1      .649(a)  .421      .405               55951.37
2      .738(b)  .544      .518               50352.44
3      .778(c)  .605      .570               47576.82
a. Predictors: (Constant), Male=0 Female=1
b. Predictors: (Constant), Male=0 Female=1, PIQ
c. Predictors: (Constant), Male=0 Female=1, PIQ, HEIGHT

c) The ANOVA table for the three fitted models

ANOVA(d)
Model           Sum of Squares  df  Mean Square  F       Sig.
1  Regression   8.21E+10        1   8.21E+10     26.229  .000(a)
   Residual     1.13E+11        36  3.13E+09
   Total        1.95E+11        37
2  Regression   1.06E+11        2   5.30E+10     20.919  .000(b)
   Residual     8.87E+10        35  2.54E+09
   Total        1.95E+11        37
3  Regression   1.18E+11        3   3.93E+10     17.355  .000(c)
   Residual     7.70E+10        34  2.26E+09
   Total        1.95E+11        37
a. Predictors: (Constant), Male=0 Female=1
b. Predictors: (Constant), Male=0 Female=1, PIQ
c. Predictors: (Constant), Male=0 Female=1, PIQ, HEIGHT
d. Dependent Variable: MRI_CNT

d) The regression coefficients for the three fitted models
Coefficients(a)
                     Unstandardized Coefficients  Standardized Coefficients
Model                B            Std. Error      Beta       t        Sig.
1  (Constant)        955753.7     13187.864                  72.472   .000
   Male=0 Female=1   -93099.1     18178.216       -.649      -5.121   .000
2  (Constant)        829137.4     42861.670                  19.344   .000
   Male=0 Female=1   -90976.3     16373.728       -.634      -5.556   .000
   PIQ               1127.148     366.639         .351       3.074    .004
3  (Constant)        353207.8     212544.8                   1.662    .106
   Male=0 Female=1   -54561.4     22231.139       -.380      -2.454   .019
   PIQ               1267.677     351.864         .395       3.603    .001
   HEIGHT            6447.095     2826.449        .355       2.281    .029
a. Dependent Variable: MRI_CNT

e) The excluded variables at each stage from the three fitted models

Excluded Variables(d)
                                                       Collinearity Statistics
Model       Beta In    t        Sig.   Partial Correlation  Tolerance
1  FSIQ     .287(a)    2.404    .022   .377                 .995
   VIQ      .223(a)    1.796    .081   .290                 .984
   PIQ      .351(a)    3.074    .004   .461                 .998
   WEIGHT   .173(a)    1.060    .296   .176                 .603
   HEIGHT   .257(a)    1.447    .157   .238                 .495
2  FSIQ     -.329(b)   -1.023   .314   -.173                .126
   VIQ      -.132(b)   -.716    .479   -.122                .389
   WEIGHT   .187(b)    1.284    .208   .215                 .602
   HEIGHT   .355(b)    2.281    .029   .364                 .480
3  FSIQ     -.179(c)   -.569    .573   -.099                .119
   VIQ      -.031(c)   -.173    .864   -.030                .363
   WEIGHT   .051(c)    .324     .748   .056                 .473
a. Predictors in the Model: (Constant), Male=0 Female=1
b. Predictors in the Model: (Constant), Male=0 Female=1, PIQ
c. Predictors in the Model: (Constant), Male=0 Female=1, PIQ, HEIGHT
d. Dependent Variable: MRI_CNT
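The Partial Correlation column can be recovered from the t column using the standard identity $r = t / \sqrt{t^2 + \text{df}}$, where df is the residual degrees of freedom of the model after the candidate is added (n minus the number of explanatory variables minus 1). A minimal check, with an illustrative function name:

```python
from math import sqrt


def partial_correlation(t, df_residual):
    """Partial correlation of an added variable, recovered from its t statistic."""
    return t / sqrt(t * t + df_residual)


n = 38
# Adding one candidate to a model with m explanatory variables
# leaves n - (m + 1) - 1 residual degrees of freedom.
print(round(partial_correlation(3.074, n - (1 + 1) - 1), 3))  # PIQ added to model 1: 0.461
print(round(partial_correlation(2.281, n - (2 + 1) - 1), 3))  # HEIGHT added to model 2: 0.364
```

Both values match the table, which is why the variable with the largest partial correlation at each stage is also the one with the smallest entry p-value.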
