
Sequential sums of squares

or extra sums of squares

Sequential sums of squares: what are they?
The reduction in the error sum of squares
when one or more predictor variables are
added to the regression model.
Or, the increase in the regression sum of
squares when one or more predictor
variables are added to the regression model.
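
The two descriptions above are the same quantity, because SSTO does not change when predictors are added. A minimal sketch of that identity, assuming NumPy is available and using synthetic data (not the PIQ study data); the helper name `fit_sums_of_squares` is made up for this illustration:

```python
import numpy as np

def fit_sums_of_squares(X, y):
    """Return (SSR, SSE) for a least-squares fit of y on X plus an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ beta
    sse = float(np.sum((y - fitted) ** 2))
    ssr = float(np.sum((fitted - y.mean()) ** 2))
    return ssr, sse

# Synthetic data, used only to illustrate the identity (assumption, not from the slides).
rng = np.random.default_rng(0)
n = 38
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(size=n)

ssr_1, sse_1 = fit_sums_of_squares(x1.reshape(-1, 1), y)
ssr_12, sse_12 = fit_sums_of_squares(np.column_stack([x1, x2]), y)

drop_in_sse = sse_1 - sse_12   # reduction in the error sum of squares
gain_in_ssr = ssr_12 - ssr_1   # increase in the regression sum of squares
print(drop_in_sse, gain_in_ssr)  # the two agree up to floating-point error
```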

Sequential sums of squares: why?
They can be used to test whether one slope
parameter is 0.
They can be used to test whether a subset
(more than one, but fewer than all) of the
slope parameters are all 0.

Example: Brain and body size predictive of intelligence?
Sample of n = 38 college students
Response (Y): intelligence based on the PIQ
(performance) scores from the (revised)
Wechsler Adult Intelligence Scale.
Predictor (X1): Brain size based on MRI
scans (given as count/10,000)
Predictor (X2): Height in inches
Predictor (X3): Weight in pounds

OUTPUT #1
The regression equation is PIQ = 4.7 + 1.18 MRI

Predictor     Coef  SE Coef     T      P
Constant      4.65    43.71  0.11  0.916
MRI         1.1766   0.4806  2.45  0.019

Analysis of Variance
Source       DF       SS      MS     F      P
Regression    1   2697.1  2697.1  5.99  0.019
Error        36  16197.5   449.9
Total        37  18894.6

OUTPUT #2
The regression equation is
PIQ = 111 + 2.06 MRI - 2.73 Height

Predictor     Coef  SE Coef      T      P
Constant    111.28    55.87   1.99  0.054
MRI         2.0606   0.5466   3.77  0.001
Height     -2.7299   0.9932  -2.75  0.009

Analysis of Variance
Source       DF       SS      MS     F      P
Regression    2   5572.7  2786.4  7.32  0.002
Residual     35  13321.8   380.6
Total        37  18894.6

Source   DF  Seq SS
MRI       1  2697.1
Height    1  2875.6

OUTPUT #3
The regression equation is
PIQ = 111 + 2.06 MRI - 2.73 Height + 0.001 Weight

Predictor     Coef  SE Coef      T      P
Constant    111.35    62.97   1.77  0.086
MRI         2.0604   0.5634   3.66  0.001
Height      -2.732    1.229  -2.22  0.033
Weight      0.0006   0.1971   0.00  0.998

Analysis of Variance
Source       DF       SS      MS     F      P
Regression    3   5572.7  1857.6  4.74  0.007
Error        34  13321.8   391.8
Total        37  18894.6

Source   DF  Seq SS
MRI       1  2697.1
Height    1  2875.6
Weight    1     0.0

Sequential sums of squares: definition using SSE notation
SSR(X2|X1) = SSE(X1) - SSE(X1,X2)
In general, you subtract the error sum of
squares due to all of the predictors both left
and right of the bar from the error sum of
squares due to the predictor to the right of
the bar.
SSR(X2,X3|X1) = SSE(X1) - SSE(X1,X2,X3)

Sequential sums of squares: definition using SSR notation
SSR(X2|X1) = SSR(X1,X2) - SSR(X1)
In general, you subtract the regression sum
of squares due to the predictor to the right
of the bar from the regression sum of
squares due to all of the predictors both left
and right of the bar.
SSR(X2,X3|X1) = SSR(X1,X2,X3)-SSR(X1)
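
Both definitions can be checked against the numbers in OUTPUT #1 and OUTPUT #2 (X1 = MRI, X2 = Height). A small sketch; the variable names are chosen here for illustration, and the two routes can differ in the last printed digit because Minitab rounds each sum of squares to one decimal place:

```python
# Values read from the Minitab outputs (X1 = MRI, X2 = Height).
sse_x1 = 16197.5      # SSE(X1), OUTPUT #1
sse_x1_x2 = 13321.8   # SSE(X1,X2), OUTPUT #2
ssr_x1 = 2697.1       # SSR(X1), OUTPUT #1
ssr_x1_x2 = 5572.7    # SSR(X1,X2), OUTPUT #2

via_sse = sse_x1 - sse_x1_x2    # SSE notation: SSE(X1) - SSE(X1,X2)
via_ssr = ssr_x1_x2 - ssr_x1    # SSR notation: SSR(X1,X2) - SSR(X1)

# Both should be close to the Seq SS reported for Height in OUTPUT #2 (2875.6).
print(round(via_sse, 1), round(via_ssr, 1))
```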

Decomposition of regression sum of squares
In multiple regression, there is more than one way to
decompose the regression sum of squares. For example:

SSR(X1,X2) = SSR(X1) + SSR(X2|X1)

SSR(X1,X2) = SSR(X2) + SSR(X1|X2)

OUTPUT #2
The regression equation is
PIQ = 111 + 2.06 MRI - 2.73 Height

Predictor     Coef  SE Coef      T      P
Constant    111.28    55.87   1.99  0.054
MRI         2.0606   0.5466   3.77  0.001
Height     -2.7299   0.9932  -2.75  0.009

Analysis of Variance
Source       DF       SS      MS     F      P
Regression    2   5572.7  2786.4  7.32  0.002
Residual     35  13321.8   380.6
Total        37  18894.6

Source   DF  Seq SS
MRI       1  2697.1
Height    1  2875.6

OUTPUT #4
The regression equation is
PIQ = 111 - 2.73 Height + 2.06 MRI

Predictor     Coef  SE Coef      T      P
Constant    111.28    55.87   1.99  0.054
Height     -2.7299   0.9932  -2.75  0.009
MRI         2.0606   0.5466   3.77  0.001

Analysis of Variance
Source       DF       SS      MS     F      P
Regression    2   5572.7  2786.4  7.32  0.002
Error        35  13321.8   380.6
Total        37  18894.6

Source   DF  Seq SS
Height    1   164.0
MRI       1  5408.8

Decomposition of SSR: how?
SSTO = SSR(X1) + SSE(X1)
SSTO = SSR(X1) + SSR(X2|X1) + SSE(X1,X2)    [since SSE(X1) = SSR(X2|X1) + SSE(X1,X2)]

SSTO = SSR(X1,X2) + SSE(X1,X2)

Comparing the last two lines:

SSR(X1,X2) = SSR(X1) + SSR(X2|X1)

Decomposition of SSR: how? (starting from X2)
SSTO = SSR(X2) + SSE(X2)
SSTO = SSR(X2) + SSR(X1|X2) + SSE(X1,X2)    [since SSE(X2) = SSR(X1|X2) + SSE(X1,X2)]

SSTO = SSR(X1,X2) + SSE(X1,X2)

Comparing the last two lines:

SSR(X1,X2) = SSR(X2) + SSR(X1|X2)
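
The two orderings can be checked numerically with the Seq SS columns of OUTPUT #2 (MRI entered first) and OUTPUT #4 (Height entered first). A short sketch using only the reported values; small differences in the last digit are expected from rounding in the printed output:

```python
ssr_full = 5572.7                     # SSR(X1,X2), reported in both outputs
order_mri_first = [2697.1, 2875.6]    # SSR(X1), SSR(X2|X1)   (OUTPUT #2)
order_height_first = [164.0, 5408.8]  # SSR(X2), SSR(X1|X2)   (OUTPUT #4)

totals = {}
for label, parts in [("MRI first", order_mri_first),
                     ("Height first", order_height_first)]:
    totals[label] = sum(parts)
    print(f"{label}: {totals[label]:.1f} (reported SSR = {ssr_full})")
```

The individual pieces differ a great deal between the two orderings (2697.1 versus 5408.8 for MRI), but each set adds up to the same regression sum of squares.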

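Even more ways to decompose SSR when there are 3 or more predictors
For example:
SSR(X1,X2,X3) = SSR(X1) + SSR(X2|X1) + SSR(X3|X1,X2)
SSR(X1,X2,X3) = SSR(X2) + SSR(X3|X2) + SSR(X1|X2,X3)
SSR(X1,X2,X3) = SSR(X1) + SSR(X2,X3|X1)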

Degrees of freedom and regression mean squares
A sequential sum of squares involving one extra
predictor variable has one degree of freedom
associated with it:

MSR(X2|X1) = SSR(X2|X1) / 1

A sequential sum of squares involving two extra
predictor variables has two degrees of freedom
associated with it:

MSR(X2,X3|X1) = SSR(X2,X3|X1) / 2

Sequential sums of squares in Minitab
The SSR is automatically decomposed into
one-degree-of-freedom sequential sums of
squares, in the order in which the predictor
variables are entered into the model.
To get a sequential sum of squares involving
two or more predictor variables, sum the
appropriate one-degree-of-freedom
sequential sums of squares.
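
The summing rule only works when the predictors of interest were entered last. A minimal sketch of that bookkeeping, using the Seq SS values from OUTPUT #3; the helper `sequential_ss` is a hypothetical name introduced here, not a Minitab feature:

```python
def sequential_ss(seq_ss, extra):
    """Sum the one-df sequential SS for the predictors in `extra`.

    `seq_ss` maps predictor name -> Seq SS, in the order entered in the model.
    The sum is a valid sequential SS only if `extra` were entered last.
    """
    names = list(seq_ss)
    tail = names[len(names) - len(extra):]
    if set(tail) != set(extra):
        raise ValueError("extra predictors must be entered last; refit in another order")
    return sum(seq_ss[name] for name in extra)

# Seq SS as reported in OUTPUT #3 (order entered: MRI, Height, Weight).
output3 = {"MRI": 2697.1, "Height": 2875.6, "Weight": 0.0}

ssr_x2_x3_given_x1 = sequential_ss(output3, ["Height", "Weight"])
print(ssr_x2_x3_given_x1)  # SSR(X2,X3|X1), with 2 degrees of freedom
```

Asking for `["MRI"]` with this ordering raises an error, which mirrors the need to refit the model with MRI entered last.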

OUTPUT #3
The regression equation is
PIQ = 111 + 2.06 MRI - 2.73 Height + 0.001 Weight

Predictor     Coef  SE Coef      T      P
Constant    111.35    62.97   1.77  0.086
MRI         2.0604   0.5634   3.66  0.001
Height      -2.732    1.229  -2.22  0.033
Weight      0.0006   0.1971   0.00  0.998

Analysis of Variance
Source       DF       SS      MS     F      P
Regression    3   5572.7  1857.6  4.74  0.007
Error        34  13321.8   391.8
Total        37  18894.6

Source   DF  Seq SS
MRI       1  2697.1
Height    1  2875.6
Weight    1     0.0

OUTPUT #5
The regression equation is
PIQ = 111 - 2.73 Height + 0.001 Weight + 2.06 MRI

Predictor     Coef  SE Coef      T      P
Constant    111.35    62.97   1.77  0.086
Height      -2.732    1.229  -2.22  0.033
Weight      0.0006   0.1971   0.00  0.998
MRI         2.0604   0.5634   3.66  0.001

Analysis of Variance
Source       DF       SS      MS     F      P
Regression    3   5572.7  1857.6  4.74  0.007
Error        34  13321.8   391.8
Total        37  18894.6

Source   DF  Seq SS
Height    1   164.0
Weight    1   169.5
MRI       1  5239.2

Testing whether one slope (β1, MRI) is 0

Predictor     Coef  SE Coef      T      P
Constant    111.35    62.97   1.77  0.086
Height      -2.732    1.229  -2.22  0.033
Weight      0.0006   0.1971   0.00  0.998
MRI         2.0604   0.5634   3.66  0.001

Analysis of Variance
Source       DF       SS      MS     F      P
Regression    3   5572.7  1857.6  4.74  0.007
Error        34  13321.8   391.8
Total        37  18894.6

Source   DF  Seq SS
Height    1   164.0
Weight    1   169.5
MRI       1  5239.2

Testing whether one slope (β2, Height) is 0

Predictor     Coef  SE Coef      T      P
Constant    111.35    62.97   1.77  0.086
MRI         2.0604   0.5634   3.66  0.001
Weight      0.0006   0.1971   0.00  0.998
Height      -2.732    1.229  -2.22  0.033

Analysis of Variance
Source       DF       SS      MS     F      P
Regression    3   5572.7  1857.6  4.74  0.007
Error        34  13321.8   391.8
Total        37  18894.6

Source   DF  Seq SS
MRI       1  2697.1
Weight    1   940.9
Height    1  1934.7

Testing whether one slope (β3, Weight) is 0

Predictor     Coef  SE Coef      T      P
Constant    111.35    62.97   1.77  0.086
MRI         2.0604   0.5634   3.66  0.001
Height      -2.732    1.229  -2.22  0.033
Weight      0.0006   0.1971   0.00  0.998

Analysis of Variance
Source       DF       SS      MS     F      P
Regression    3   5572.7  1857.6  4.74  0.007
Error        34  13321.8   391.8
Total        37  18894.6

Source   DF  Seq SS
MRI       1  2697.1
Height    1  2875.6
Weight    1     0.0
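
In each of the three outputs above, the predictor being tested is entered last, so its Seq SS is the sequential sum of squares given the other two predictors. Dividing it by the full-model MSE gives a partial F statistic, which for a single slope should match the square of the reported t statistic up to rounding. A quick check on the reported values (a sketch; the dictionary layout is chosen here for illustration):

```python
mse_full = 391.8  # MSE(X1,X2,X3) from the Error row of the ANOVA table

# predictor -> (Seq SS when entered last, reported t statistic)
last_entered = {
    "MRI": (5239.2, 3.66),
    "Height": (1934.7, -2.22),
    "Weight": (0.0, 0.00),
}

results = {}
for name, (seq_ss, t_stat) in last_entered.items():
    f_star = (seq_ss / 1) / mse_full          # MSR(Xk | others) / MSE
    results[name] = (f_star, t_stat ** 2)
    print(f"{name}: F* = {f_star:.2f}, t^2 = {t_stat ** 2:.2f}")
```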

Testing whether one slope βk is 0: why it works
Full model:

Yi = β0 + β1 Xi1 + β2 Xi2 + β3 Xi3 + εi
SSE(F) = SSE(X1,X2,X3)
dfF = n - 4

Reduced model:

Yi = β0 + β1 Xi1 + β2 Xi2 + εi
SSE(R) = SSE(X1,X2)
dfR = n - 3

Testing whether one slope βk is 0: why it works (contd)
The general linear test statistic:

F* = [(SSE(R) - SSE(F)) / (dfR - dfF)] / [SSE(F) / dfF]

becomes:

F* = [SSR(X3|X1,X2) / 1] / [SSE(X1,X2,X3) / (n - 4)] = MSR(X3|X1,X2) / MSE(X1,X2,X3)

Testing whether β2 = β3 = 0
Full model:

Yi = β0 + β1 Xi1 + β2 Xi2 + β3 Xi3 + εi
SSE(F) = SSE(X1,X2,X3)
dfF = n - 4

Reduced model:

Yi = β0 + β1 Xi1 + εi
SSE(R) = SSE(X1)
dfR = n - 2

Testing whether β2 = β3 = 0 (contd)
The general linear test statistic:

F* = [(SSE(R) - SSE(F)) / (dfR - dfF)] / [SSE(F) / dfF]

becomes:

F* = [SSR(X2,X3|X1) / 2] / [SSE(X1,X2,X3) / (n - 4)] = MSR(X2,X3|X1) / MSE(X1,X2,X3)
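
Both special cases above are the same general linear test with different reduced models. A minimal sketch, with the error sums of squares taken from OUTPUT #1, #2 and #3 and n = 38; the function name `general_linear_f` is an illustrative choice:

```python
def general_linear_f(sse_reduced, df_reduced, sse_full, df_full):
    """F* = [(SSE(R) - SSE(F)) / (dfR - dfF)] / [SSE(F) / dfF]."""
    numerator = (sse_reduced - sse_full) / (df_reduced - df_full)
    denominator = sse_full / df_full
    return numerator / denominator

n = 38
sse_x1 = 16197.5        # SSE(X1), OUTPUT #1
sse_x1_x2 = 13321.8     # SSE(X1,X2), OUTPUT #2
sse_x1_x2_x3 = 13321.8  # SSE(X1,X2,X3), OUTPUT #3

# H0: beta3 = 0 (reduced model drops Weight only)
f_one_slope = general_linear_f(sse_x1_x2, n - 3, sse_x1_x2_x3, n - 4)

# H0: beta2 = beta3 = 0 (reduced model drops Height and Weight)
f_two_slopes = general_linear_f(sse_x1, n - 2, sse_x1_x2_x3, n - 4)

print(f"{f_one_slope:.2f} {f_two_slopes:.2f}")
```

Because the printed sums of squares are rounded to one decimal, the results may differ slightly in later digits from values computed with the Seq SS and MS columns.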

OUTPUT #3
The regression equation is
PIQ = 111 + 2.06 MRI - 2.73 Height + 0.001 Weight

Predictor     Coef  SE Coef      T      P
Constant    111.35    62.97   1.77  0.086
MRI         2.0604   0.5634   3.66  0.001
Height      -2.732    1.229  -2.22  0.033
Weight      0.0006   0.1971   0.00  0.998

Analysis of Variance
Source       DF       SS      MS     F      P
Regression    3   5572.7  1857.6  4.74  0.007
Error        34  13321.8   391.8
Total        37  18894.6

Source   DF  Seq SS
MRI       1  2697.1
Height    1  2875.6
Weight    1     0.0

H0: β2 = β3 = 0
HA: β2 ≠ 0 or β3 ≠ 0

F* = (2875.6 / 2) / 391.8 = 3.670

P-value is:

P(F(2,34) > 3.670) = 1 - 0.964 = 0.036

Cumulative Distribution Function
F distribution with 2 DF in numerator and 34 DF in denominator

x        P( X <= x )
3.6700   0.9640

Getting P-value for F-statistic in Minitab
Select Calc >> Probability Distributions >> F
Select Cumulative Probability. Use default
noncentrality parameter of 0.
Type in numerator DF and denominator DF.
Select Input constant. Type in F-statistic.
Answer appears in session window.
P-value is 1 minus the number that appears.
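
The same P-value can be reproduced outside Minitab. A minimal sketch in plain Python, relying on the fact that an F distribution with exactly 2 numerator degrees of freedom has a closed-form upper tail, P(F > x) = (1 + 2x/d)^(-d/2) for d denominator degrees of freedom. This shortcut is specific to 2 numerator df; for other cases a statistics library (for example `scipy.stats.f.sf`) would be the usual route. The function name is hypothetical:

```python
def f_upper_tail_2df(f_star, df_denominator):
    """P(F > f_star) for an F distribution with 2 numerator df (closed form)."""
    return (1.0 + 2.0 * f_star / df_denominator) ** (-df_denominator / 2.0)

# F* for H0: beta2 = beta3 = 0, from the Seq SS and MSE in OUTPUT #3.
f_star = (2875.6 / 2) / 391.8
p_value = f_upper_tail_2df(f_star, 34)

print(round(f_star, 3), round(p_value, 3))  # P-value is about 0.036
```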

Test whether β1 = β3 = 0

Analysis of Variance
Source       DF       SS      MS     F      P
Regression    3   5572.7  1857.6  4.74  0.007
Error        34  13321.8   391.8
Total        37  18894.6

Source   DF  Seq SS
Height    1   164.0
Weight    1   169.5
MRI       1  5239.2
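
With Height entered first, the Seq SS for Weight and MRI can be summed to get SSR(X1,X3|X2). A sketch of the resulting test statistic, assuming the test proceeds exactly as in the β2 = β3 = 0 case (this calculation is not shown in the output above); the P-value would then come from the F distribution with 2 and 34 degrees of freedom, as described for Minitab:

```python
# Seq SS in the order entered: Height, Weight, MRI.
seq_ss = {"Height": 164.0, "Weight": 169.5, "MRI": 5239.2}
mse_full = 391.8  # MSE(X1,X2,X3)

ssr_extra = seq_ss["Weight"] + seq_ss["MRI"]  # SSR(X1,X3|X2), 2 df
f_star = (ssr_extra / 2) / mse_full           # MSR(X1,X3|X2) / MSE(X1,X2,X3)

print(f"SSR(X1,X3|X2) = {ssr_extra:.1f}, F* = {f_star:.2f}")
```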