Lecture Week4 V3
4.1 Introduction
Multiple linear regression is applicable when the data are multivariate. A multiple linear regression model relates a
response variable Y to more than one explanatory variable.
The main purpose of the multiple regression analysis is to find which explanatory variables
contribute to the variation of the response variable. We are usually looking for the ‘best’ subset
of the explanatory variables.
The Model
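As a sketch, assuming k explanatory variables x_1, ..., x_k and independent, normally distributed errors with constant variance (the usual textbook assumptions, stated here for illustration), the multiple linear regression model for case i can be written as:

```latex
Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \varepsilon_i,
\qquad \varepsilon_i \sim N(0, \sigma^2), \quad i = 1, \dots, n
```

Here \beta_0 is the intercept and \beta_j is the change in the mean response per unit increase in x_j when the other explanatory variables are held fixed.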
4.2 Example
Is the size of your brain related to your mental capacity? In this study, Willerman et al. (1991)
used Magnetic Resonance Imaging (MRI) to determine the brain size of the
subjects. The researchers took gender and body size into account when drawing conclusions about the
connection between brain size and intelligence.
Willerman et al. (1991) conducted their study at a large southwestern university. They selected a
sample of 40 right-handed Anglo introductory psychology students who had indicated no history
of alcoholism, unconsciousness, brain damage, epilepsy, or heart disease. These subjects were
drawn from a larger pool of introductory psychology students with total Scholastic Aptitude Test
Scores higher than 1350 or lower than 940 who had agreed to satisfy a course requirement by
allowing the administration of four subtests (Vocabulary, Similarities, Block Design, and Picture
Completion) of the Wechsler (1981) Adult Intelligence Scale-Revised. With prior approval of the
University's research review board, students selected for MRI were required to obtain prorated
full-scale IQs of greater than 130 or less than 103, and were equally divided by sex and IQ
classification. The MRI scans were performed at the same facility for all 40 subjects. The scans
consisted of 18 horizontal MR images. The computer counted all pixels with non-zero gray scale
in each of the 18 images, and the total count served as an index for brain size. The data are shown
below; a "." marks a missing value.
The Data:
Gender FSIQ VIQ PIQ Weight Height MRI_CNT
Female 133 132 124 118 64.5 816932
Male 140 150 124 . 72.5 1001121
Male 139 123 150 143 73.3 1038437
Male 133 129 128 172 68.8 965353
Female 137 132 134 147 65.0 951545
Female 99 90 110 146 69.0 928799
Female 138 136 131 138 64.5 991305
Female 92 90 98 175 66.0 854258
Male 89 93 84 134 66.3 904858
Male 133 114 147 172 68.8 955466
Female 132 129 124 118 64.5 833868
Male 141 150 128 151 70.0 1079549
Male 135 129 124 155 69.0 924059
Female 140 120 147 155 70.5 856472
Female 96 100 90 146 66.0 878897
Female 83 71 96 135 68.0 865363
Female 132 132 120 127 68.5 852244
Male 100 96 102 178 73.5 945088
Female 101 112 84 136 66.3 808020
Male 80 77 86 180 70.0 889083
Male 83 83 86 . . 892420
Male 97 107 84 186 76.5 905940
Female 135 129 134 122 62.0 790619
Male 139 145 128 132 68.0 955003
Female 91 86 102 114 63.0 831772
Male 141 145 131 171 72.0 935494
Female 85 90 84 140 68.0 798612
Male 103 96 110 187 77.0 1062462
Female 77 83 72 106 63.0 793549
Female 130 126 124 159 66.5 866662
Female 133 126 132 127 62.5 857782
Male 144 145 137 191 67.0 949589
Male 103 96 110 192 75.5 997925
Male 90 96 86 181 69.0 879987
Female 83 90 81 143 66.5 834344
Female 133 129 128 153 66.5 948066
Male 140 150 124 144 70.5 949395
Female 88 86 94 139 64.5 893983
Male 81 90 74 148 74.0 930016
Male 89 91 89 179 75.5 935863
A new variable SEX was created from Gender, coded Male = 0 and Female = 1.
Note there are 40 cases in the data set, but 2 cases have missing values, so n=38 cases are used in
the multiple linear regression analysis.
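The case count can be checked directly from the data listing. The following is a minimal sketch, assuming listwise deletion (a case is dropped if any of its values is missing); the helper name `parse_row` is illustrative and not part of the original notes.

```python
# Rows copied from the data listing; "." marks a missing value.
RAW = """\
Female 133 132 124 118 64.5 816932
Male 140 150 124 . 72.5 1001121
Male 139 123 150 143 73.3 1038437
Male 133 129 128 172 68.8 965353
Female 137 132 134 147 65.0 951545
Female 99 90 110 146 69.0 928799
Female 138 136 131 138 64.5 991305
Female 92 90 98 175 66.0 854258
Male 89 93 84 134 66.3 904858
Male 133 114 147 172 68.8 955466
Female 132 129 124 118 64.5 833868
Male 141 150 128 151 70.0 1079549
Male 135 129 124 155 69.0 924059
Female 140 120 147 155 70.5 856472
Female 96 100 90 146 66.0 878897
Female 83 71 96 135 68.0 865363
Female 132 132 120 127 68.5 852244
Male 100 96 102 178 73.5 945088
Female 101 112 84 136 66.3 808020
Male 80 77 86 180 70.0 889083
Male 83 83 86 . . 892420
Male 97 107 84 186 76.5 905940
Female 135 129 134 122 62.0 790619
Male 139 145 128 132 68.0 955003
Female 91 86 102 114 63.0 831772
Male 141 145 131 171 72.0 935494
Female 85 90 84 140 68.0 798612
Male 103 96 110 187 77.0 1062462
Female 77 83 72 106 63.0 793549
Female 130 126 124 159 66.5 866662
Female 133 126 132 127 62.5 857782
Male 144 145 137 191 67.0 949589
Male 103 96 110 192 75.5 997925
Male 90 96 86 181 69.0 879987
Female 83 90 81 143 66.5 834344
Female 133 129 128 153 66.5 948066
Male 140 150 124 144 70.5 949395
Female 88 86 94 139 64.5 893983
Male 81 90 74 148 74.0 930016
Male 89 91 89 179 75.5 935863
"""

COLUMNS = ["Gender", "FSIQ", "VIQ", "PIQ", "Weight", "Height", "MRI_CNT"]


def parse_row(line):
    """Turn one whitespace-separated line into a dict; '.' becomes None."""
    fields = line.split()
    row = {"Gender": fields[0]}
    for name, value in zip(COLUMNS[1:], fields[1:]):
        row[name] = None if value == "." else float(value)
    # Indicator coding used in the notes: Male = 0, Female = 1.
    row["SEX"] = 1 if row["Gender"] == "Female" else 0
    return row


rows = [parse_row(line) for line in RAW.splitlines() if line.strip()]
# Listwise deletion: keep only cases with no missing value.
complete = [r for r in rows if all(v is not None for v in r.values())]

print(len(rows), len(complete))  # 40 38
```

Both incomplete cases are male, so the analysis sample contains 20 females and 18 males.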
The response variable y and all the continuous x-variables are plotted against each other in a scatterplot matrix.
[Figure: scatterplot matrix; surviving panel labels: MRI_CNT, FSIQ, VIQ, PIQ, WEIGHT; legend: SEX]
This plot shows that PIQ, WEIGHT and HEIGHT are each linearly related to the
response variable MRI_CNT. Some pairs of explanatory variables are also highly correlated
with each other.
4.3 SPSS regression output.
Regression
Variables Entered/Removed (b)
Model  Variables Entered                                    Variables Removed  Method
1      Male=0 Female=1, PIQ, WEIGHT, HEIGHT, VIQ, FSIQ (a)  .                  Enter
a. All requested variables entered.
b. Dependent Variable: MRI_CNT
Model Summary
Model  R         R Square  Adjusted R Square  Std. Error of the Estimate
1      .808 (a)  .652      .585               46759.06
a. Predictors: (Constant), Male=0 Female=1, PIQ, WEIGHT, HEIGHT, VIQ, FSIQ
The six-variable model explains 65.2% of the variation in MRI_CNT (R Square = .652).
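The notes do not derive the Adjusted R Square column. Assuming the usual definition, which penalises R Square for the number of explanatory variables k relative to the number of cases n, the printed value can be reproduced as follows (the function name is illustrative):

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R^2 for a model with k explanatory variables fitted to n cases."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


# Values read from the SPSS output: R Square = .652, n = 38 cases, k = 6 predictors.
adj = adjusted_r_squared(0.652, n=38, k=6)
print(round(adj, 3))  # 0.585
```

The adjustment matters when comparing models with different numbers of x-variables, because R Square itself never decreases when a variable is added.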
ANOVA (b)
Model          Sum of Squares  df  Mean Square  F      Sig.
1  Regression  1.27E+11        6   2.12E+10     9.684  .000 (a)
   Residual    6.78E+10        31  2.19E+09
   Total       1.95E+11        37
a. Predictors: (Constant), Male=0 Female=1, PIQ, WEIGHT, HEIGHT, VIQ, FSIQ
b. Dependent Variable: MRI_CNT
If the regression is not significant, then y does not depend linearly on the x's. The hypotheses
may be written:
$H_0: \beta_1 = \beta_2 = \cdots = \beta_6 = 0$ (none of the x-variables is linearly related to y)
$H_1: \beta_j \neq 0$ for at least one j
Test statistic: $F = \frac{MS(\text{regression})}{MS(\text{residual})} \sim F_{k,\, n-k-1}$ if $H_0$ is true (where $k = 6$ and $n = 38$)
Observed F = 9.684 and P = 0.000 (to three decimal places).
Since P < 0.001, the data provide very strong evidence against $H_0$, so $H_0$ is rejected.
Conclusion: The data provide very strong evidence of a linear relationship between y
(MRI_CNT) and the six entered variables (i.e. SEX, PIQ, WEIGHT, HEIGHT, VIQ and FSIQ).
Hence the six entered variables are jointly significant.
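The observed F can be recomputed from the ANOVA table as the ratio of the two mean squares. The sketch below uses the sums of squares as printed (rounded to three significant figures), so it only approximately reproduces the SPSS value of 9.684; the function name is illustrative.

```python
def f_statistic(ss_regression, ss_residual, n, k):
    """Overall F statistic: MS(regression) / MS(residual)."""
    ms_regression = ss_regression / k          # regression df = k
    ms_residual = ss_residual / (n - k - 1)    # residual df = n - k - 1
    return ms_regression / ms_residual


# Sums of squares from the ANOVA table, with n = 38 cases and k = 6 predictors.
f_obs = f_statistic(1.27e11, 6.78e10, n=38, k=6)
print(round(f_obs, 2))
```

The small gap between this value and 9.684 comes from the rounding of the printed sums of squares, not from a different formula.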
c) Testing whether the x-variables individually are significant.
Coefficients (a)
Model               B (unstandardized)  Std. Error  Beta (standardized)  t       Sig.
1  (Constant)       206819.4            235162.2                         .879    .386
   FSIQ             -9389.378           4651.638    -3.082               -2.019  .052
   VIQ              5388.765            2761.426    1.704                1.951   .060
   PIQ              6287.507            2526.270    1.958                2.489   .018
   WEIGHT           87.015              485.553     .028                 .179    .859
   HEIGHT           6883.317            3207.980    .379                 2.146   .040
   Male=0 Female=1  -42368.7            24529.592   -.295                -1.727  .094
a. Dependent Variable: MRI_CNT
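Test statistic: $T = \frac{\hat{\beta}_j}{\text{std. error}(\hat{\beta}_j)} \sim t_{n-k-1}$ if $H_0$ is true
Example
$H_0: \beta_4 = 0$ No relationship between MRI_CNT and PIQ (given FSIQ, VIQ, HEIGHT, WEIGHT, SEX)
$H_1: \beta_4 \neq 0$ There is a linear relationship (given all the other entered x-variables)
From the table, observed t = 2.489 and P = 0.018 for PIQ. Since P < 0.05, PIQ is a significant linear
predictor for MRI_CNT (given all the other entered x-variables).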
From the table, HEIGHT is also a significant linear predictor for MRI_CNT at the 5% level (P = 0.040,
given all the other entered x-variables). However, the input variables FSIQ, VIQ, WEIGHT and SEX are
individually not significant linear predictors for MRI_CNT at the 5% level (given all the other entered
x-variables).
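Each t value in the Coefficients table is the estimate divided by its standard error, and the individual conclusions follow from comparing Sig. with the chosen level. A minimal sketch, assuming a 5% significance level and using the short name SEX for the "Male=0 Female=1" row:

```python
# (B, Std. Error, Sig.) copied from the Coefficients table of the six-variable model.
COEFFICIENTS = {
    "FSIQ": (-9389.378, 4651.638, 0.052),
    "VIQ": (5388.765, 2761.426, 0.060),
    "PIQ": (6287.507, 2526.270, 0.018),
    "WEIGHT": (87.015, 485.553, 0.859),
    "HEIGHT": (6883.317, 3207.980, 0.040),
    "SEX": (-42368.7, 24529.592, 0.094),
}
ALPHA = 0.05  # assumed significance level

# t = estimate / standard error for each coefficient.
t_values = {name: b / se for name, (b, se, _) in COEFFICIENTS.items()}
# A variable is individually significant when its printed Sig. is below ALPHA.
significant = [name for name, (_, _, p) in COEFFICIENTS.items() if p < ALPHA]

for name, t in t_values.items():
    print(f"{name:7s} t = {t:7.3f}")
print("significant at 5%:", significant)
```

Only PIQ and HEIGHT pass the 5% threshold, even though the six variables are jointly highly significant; this is a common pattern when explanatory variables are strongly correlated with each other.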
4.4 Selecting appropriate subsets of x-variables.
There are many ways to construct a ‘best’ regression equation from a large set of explanatory
(x) variables.
Backward elimination: We begin with a model that includes all the explanatory variables
and eliminate the least significant variable one at a time, until all variables in the model are
significant.
Forward selection: We start with the constant and add the most significant variable
not in the model one at a time, until the variable to be entered is not significant.
Forward stepwise selection: Add one variable at a time into the model as in forward
selection, but also check whether any variables already in the model can be removed.
Backward elimination:
i) Start with the model that includes all the explanatory variables.
ii) Test whether the parameter of each variable currently in the model is different
from 0, and find its p-value.
Select the least significant input variable from those currently in the model
[i.e. with the largest p-value],
and remove this input variable from the model, provided its p-value is greater than
the significance level for Remove (0.10, by default in SPSS).
iii) Repeat ii) until all variables left in the model are significant.
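The backward elimination steps can be sketched as below. This is an illustration, not SPSS's implementation: `fit_p_values` is a hypothetical caller-supplied function that refits the model on the given variables and returns their p-values, and the threshold is applied as "p >= 0.10", following the criterion wording in the SPSS output.

```python
def removal_candidate(p_values, alpha_remove=0.10):
    """One backward-elimination step: the variable to drop, or None."""
    if not p_values:
        return None
    worst = max(p_values, key=p_values.get)  # largest p-value = least significant
    return worst if p_values[worst] >= alpha_remove else None


def backward_elimination(variables, fit_p_values, alpha_remove=0.10):
    """Drop the least significant variable, refit, and repeat until none qualifies."""
    current = list(variables)
    while current:
        drop = removal_candidate(fit_p_values(current), alpha_remove)
        if drop is None:
            break
        current.remove(drop)
    return current


# Sig. column of the six-variable model in section 4.3 (SEX = "Male=0 Female=1").
FULL_MODEL_SIG = {
    "FSIQ": 0.052, "VIQ": 0.060, "PIQ": 0.018,
    "WEIGHT": 0.859, "HEIGHT": 0.040, "SEX": 0.094,
}
first_drop = removal_candidate(FULL_MODEL_SIG)
print(first_drop)  # WEIGHT
```

Starting from the full model of section 4.3, WEIGHT would be removed first. Later removals cannot be read off that table, because every p-value changes once the model is refitted without WEIGHT.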
Forward selection:
i) Start with the model that contains only the constant.
ii) Add each variable not currently in the model in turn to the current model and
find its p-value on entering the current model.
Select the most significant input variable from those not currently in the model
[i.e. with the smallest p-value when entered into the current model],
and include this input variable in the model provided its p-value is less than the
significance level for Entry (0.05, by default in SPSS).
iii) Repeat ii) until the variable to be entered is not significant.
Forward stepwise selection:
i) Start with the model that contains only the constant.
ii) Select the most significant input variable from those not currently in the model
[i.e. with the smallest p-value when entered into the current model],
and include this input variable in the model provided its p-value is less than the
significance level for Entry (0.05, by default in SPSS).
iii) Select the least significant input variable from those currently in the model
[i.e. with the largest p-value],
and remove this input variable from the model, provided its p-value is greater than
the significance level for Remove (0.10, by default in SPSS).
Repeat iii) until all variables left in the model are significant.
iv) If no variables were entered in step ii) or removed in step iii), then stop;
otherwise return to step ii).
The following output uses the option STEPWISE.
Variables Entered/Removed (a)
Model  Variables Entered  Variables Removed  Method
1      Male=0 Female=1    .                  Stepwise (Criteria: Probability-of-F-to-enter <= .050, Probability-of-F-to-remove >= .100).
2      PIQ                .                  Stepwise (Criteria: Probability-of-F-to-enter <= .050, Probability-of-F-to-remove >= .100).
3      HEIGHT             .                  Stepwise (Criteria: Probability-of-F-to-enter <= .050, Probability-of-F-to-remove >= .100).
a. Dependent Variable: MRI_CNT
Firstly SEX, secondly PIQ and finally HEIGHT are found to be significant.
There are three models fitted altogether before the algorithm stopped:
i) SEX
ii) SEX+PIQ
iii) SEX+PIQ+HEIGHT
Model iii) is the final model.
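The output below shows the R-squares, the ANOVA table and the regression coefficients from all
three models. It also shows the variables which were rejected from the analysis at each stage. Note
that, from the Model 3 coefficients, the final prediction model is
$\widehat{\text{MRI\_CNT}} = 353207.8 - 54561.4\,\text{SEX} + 1267.677\,\text{PIQ} + 6447.095\,\text{HEIGHT}$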
Model Summary
Model  R         R Square  Adjusted R Square  Std. Error of the Estimate
1      .649 (a)  .421      .405               55951.37
2      .738 (b)  .544      .518               50352.44
3      .778 (c)  .605      .570               47576.82
a. Predictors: (Constant), Male=0 Female=1
b. Predictors: (Constant), Male=0 Female=1, PIQ
c. Predictors: (Constant), Male=0 Female=1, PIQ, HEIGHT
ANOVA (d)
Model          Sum of Squares  df  Mean Square  F       Sig.
1  Regression  8.21E+10        1   8.21E+10     26.229  .000 (a)
   Residual    1.13E+11        36  3.13E+09
   Total       1.95E+11        37
2  Regression  1.06E+11        2   5.30E+10     20.919  .000 (b)
   Residual    8.87E+10        35  2.54E+09
   Total       1.95E+11        37
3  Regression  1.18E+11        3   3.93E+10     17.355  .000 (c)
   Residual    7.70E+10        34  2.26E+09
   Total       1.95E+11        37
a. Predictors: (Constant), Male=0 Female=1
b. Predictors: (Constant), Male=0 Female=1, PIQ
c. Predictors: (Constant), Male=0 Female=1, PIQ, HEIGHT
d. Dependent Variable: MRI_CNT
d) The regression coefficients for the three fitted models
Coefficients (a)
Model               B (unstandardized)  Std. Error  Beta (standardized)  t       Sig.
1  (Constant)       955753.7            13187.864                        72.472  .000
   Male=0 Female=1  -93099.1            18178.216   -.649                -5.121  .000
2  (Constant)       829137.4            42861.670                        19.344  .000
   Male=0 Female=1  -90976.3            16373.728   -.634                -5.556  .000
   PIQ              1127.148            366.639     .351                 3.074   .004
3  (Constant)       353207.8            212544.8                         1.662   .106
   Male=0 Female=1  -54561.4            22231.139   -.380                -2.454  .019
   PIQ              1267.677            351.864     .395                 3.603   .001
   HEIGHT           6447.095            2826.449    .355                 2.281   .029
a. Dependent Variable: MRI_CNT
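The Model 3 row of this table can be used to predict MRI_CNT for a given subject. The sketch below plugs in the first case of the data listing as an illustration; the function name is hypothetical, and the inputs are assumed to be in the same units as the data (SEX coded Male = 0, Female = 1).

```python
# Model 3 coefficients from the stepwise Coefficients table.
MODEL_3 = {"const": 353207.8, "SEX": -54561.4, "PIQ": 1267.677, "HEIGHT": 6447.095}


def predict_mri_cnt(sex, piq, height):
    """Predicted MRI_CNT from the final stepwise model (sex: Male = 0, Female = 1)."""
    return (MODEL_3["const"]
            + MODEL_3["SEX"] * sex
            + MODEL_3["PIQ"] * piq
            + MODEL_3["HEIGHT"] * height)


# First case in the data listing: Female, PIQ 124, Height 64.5, observed MRI_CNT 816932.
predicted = predict_mri_cnt(sex=1, piq=124, height=64.5)
residual = 816932 - predicted  # observed minus predicted
print(round(predicted), round(residual))
```

For this subject the residual is of the same order as the model's Std. Error of the Estimate (47576.82), which gives a sense of the typical size of a prediction error.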
e) The excluded variables at each stage from the three fitted models
Excluded Variables (d)
Model      Beta In    t       Sig.  Partial Correlation  Collinearity Statistics (Tolerance)
1  FSIQ    .287 (a)   2.404   .022  .377                 .995
   VIQ     .223 (a)   1.796   .081  .290                 .984
   PIQ     .351 (a)   3.074   .004  .461                 .998
   WEIGHT  .173 (a)   1.060   .296  .176                 .603
   HEIGHT  .257 (a)   1.447   .157  .238                 .495
2  FSIQ    -.329 (b)  -1.023  .314  -.173                .126
   VIQ     -.132 (b)  -.716   .479  -.122                .389
   WEIGHT  .187 (b)   1.284   .208  .215                 .602
   HEIGHT  .355 (b)   2.281   .029  .364                 .480
3  FSIQ    -.179 (c)  -.569   .573  -.099                .119
   VIQ     -.031 (c)  -.173   .864  -.030                .363
   WEIGHT  .051 (c)   .324    .748  .056                 .473
a. Predictors in the Model: (Constant), Male=0 Female=1
b. Predictors in the Model: (Constant), Male=0 Female=1, PIQ
c. Predictors in the Model: (Constant), Male=0 Female=1, PIQ, HEIGHT
d. Dependent Variable: MRI_CNT
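The stepwise decisions can be replayed from the printed Sig. values. The sketch below is an illustration rather than SPSS's implementation: instead of refitting models it looks p-values up in two tables copied from the output, so it only works for the models SPSS printed. It starts from the SEX-only model, because the output does not show the entry p-values for the constant-only model. SEX stands for the "Male=0 Female=1" variable.

```python
ALPHA_ENTER, ALPHA_REMOVE = 0.05, 0.10  # criteria shown in the SPSS Method column

# Sig. of each excluded variable if it were added (Excluded Variables table).
ENTRY_SIG = {
    ("SEX",): {"FSIQ": .022, "VIQ": .081, "PIQ": .004, "WEIGHT": .296, "HEIGHT": .157},
    ("SEX", "PIQ"): {"FSIQ": .314, "VIQ": .479, "WEIGHT": .208, "HEIGHT": .029},
    ("SEX", "PIQ", "HEIGHT"): {"FSIQ": .573, "VIQ": .864, "WEIGHT": .748},
}
# Sig. of each variable already in the model (stepwise Coefficients table).
MODEL_SIG = {
    ("SEX",): {"SEX": .000},
    ("SEX", "PIQ"): {"SEX": .000, "PIQ": .004},
    ("SEX", "PIQ", "HEIGHT"): {"SEX": .019, "PIQ": .001, "HEIGHT": .029},
}


def stepwise(start, entry_sig, model_sig):
    """Alternate entry and removal steps until neither changes the model."""
    model = list(start)
    while True:
        changed = False
        # Entry step: smallest p-value among variables not yet in the model.
        candidates = entry_sig[tuple(model)]
        if candidates:
            best = min(candidates, key=candidates.get)
            if candidates[best] <= ALPHA_ENTER:
                model.append(best)
                changed = True
        # Removal step: drop the largest p-value while it is >= ALPHA_REMOVE.
        while model:
            current = model_sig[tuple(model)]
            worst = max(current, key=current.get)
            if current[worst] < ALPHA_REMOVE:
                break
            model.remove(worst)
            changed = True
        if not changed:
            return model


final = stepwise(["SEX"], ENTRY_SIG, MODEL_SIG)
print(final)  # ['SEX', 'PIQ', 'HEIGHT']
```

PIQ enters first (Sig. = .004), then HEIGHT (Sig. = .029); at the third stage the smallest remaining Sig. is .573 for FSIQ, so nothing else enters and no variable is ever removed. Skipping the removal step turns the same loop into plain forward selection.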