
Inference for Regression Equations

In a beginning statistics course, the computational formulas for inference in regression
settings are usually just given to the students. Some attempt is made to illustrate
why the components of each formula make sense, but the derivations are beyond the
scope of the course.
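For advanced students, however, these formulas become applications of the
expected value theorems studied earlier in the year. To derive the regression inference
equations, students must remember that $\mathrm{Var}(kX) = k^2\,\mathrm{Var}(X)$ and that, when $X$ and $Y$
are independent, $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$. For independent $X$ and $Y$, the product
satisfies $\mathrm{Var}(XY) = \mathrm{Var}(X)\,\mathrm{Var}(Y) + \mathrm{Var}(X)\,[E(Y)]^2 + \mathrm{Var}(Y)\,[E(X)]^2$, which reduces to
$\mathrm{Var}(X)\,\mathrm{Var}(Y)$ only when both means are zero. Finally, for the mean $\bar{X}_n$ of $n$
independent observations, $\mathrm{Var}(\bar{X}_n) = \dfrac{\mathrm{Var}(X)}{n}$.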
In addition, the Modeling Assumptions for Regression are:

1. There is a normally distributed subpopulation of responses for each value of
the explanatory variable. These subpopulations all have a common variance.
So, $y \mid x \sim N(\mu_{y|x}, \sigma_e)$.

2. The means of the subpopulations fall on a straight-line function of the explanatory
variable. This means that $\mu_{y|x} = \alpha + \beta x$ and that $\hat{y} = a + bx$ estimates the
mean response for a given value of the explanatory variable.

Another way to describe this is to say that $Y = \alpha + \beta X + \varepsilon$ with $\varepsilon \sim N(0, \sigma_e)$.

[Figure: Graphical representation of regression assumptions]

3. The selection of an observation from any of the subpopulations is independent of the
selection of any other observation. The values of the explanatory variable are assumed to
be fixed. This fixed (and known) value for the independent variable is essential for
developing the formulae.

The key to understanding the various standard errors for regression is to realize that the
variation of interest comes from the distribution of $y$ around $\mu_{y|x}$. This is $\varepsilon \sim N(0, \sigma_e)$.

From our initial work on regression, we saw that $\hat{y} = a + bx$ and $\hat{y} = \bar{y} + b(x - \bar{x})$.

Now, if we let $X_i = x_i - \bar{x}$ and $Y_i = y_i - \bar{y}$, then

$$b = \frac{\sum_i X_i Y_i}{\sum_i X_i^2}.$$

All of the regression equations originate with this computational formula for b.


To see that this is true, consider $\hat{y} = \bar{y} + b(x - \bar{x})$. In this form, we have a one
variable problem. Since we know all the individual values of x and y, and, consequently,
the means $\bar{x}$ and $\bar{y}$, we can use first semester calculus to solve for b. Define

$$S = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} \bigl(y_i - (\bar{y} + b(x_i - \bar{x}))\bigr)^2.$$

Now, let $X_i = x_i - \bar{x}$ and $Y_i = y_i - \bar{y}$, so

$$S = \sum_{i=1}^{n} (Y_i - bX_i)^2.$$

Find the value of b that minimizes S.

$$\frac{dS}{db} = \sum_{i=1}^{n} 2\,(Y_i - bX_i)(-X_i).$$

If $\dfrac{dS}{db} = 0$, then $\sum_{i=1}^{n} \bigl(-X_i Y_i + bX_i^2\bigr) = 0$. Solving for b, we find

$$b \sum_{i=1}^{n} X_i^2 = \sum_{i=1}^{n} X_i Y_i \quad\text{and}\quad b = \frac{\sum_i X_i Y_i}{\sum_i X_i^2}.$$
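
As a quick numerical check of this result, the sketch below computes b from centered data and confirms that nearby slopes give a larger S. The five data points and the function names are made up for illustration; they do not come from the handout.

```python
def centered(values):
    """Subtract the mean from every value."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def slope(xs, ys):
    """Least-squares slope b = sum(X_i * Y_i) / sum(X_i ** 2)."""
    X, Y = centered(xs), centered(ys)
    return sum(Xi * Yi for Xi, Yi in zip(X, Y)) / sum(Xi ** 2 for Xi in X)


def sum_of_squares(b, xs, ys):
    """S(b) = sum((Y_i - b * X_i) ** 2) for a candidate slope b."""
    X, Y = centered(xs), centered(ys)
    return sum((Yi - b * Xi) ** 2 for Xi, Yi in zip(X, Y))


# Hypothetical data, chosen only so the arithmetic is easy to follow.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

b = slope(xs, ys)                          # sum(XY) = 6, sum(X^2) = 10
S_at_b = sum_of_squares(b, xs, ys)
S_above = sum_of_squares(b + 0.1, xs, ys)  # slightly steeper line
S_below = sum_of_squares(b - 0.1, xs, ys)  # slightly flatter line
print(b, S_at_b, S_above, S_below)
```

Because S is a quadratic in b with a positive leading coefficient, moving b in either direction away from the computed value increases S.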

The Standard Error for the Slope


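To compute a confidence interval for $\beta$, we need to determine the variance of b,
using the expected value theorems.

Since $b = \dfrac{\sum_i X_i Y_i}{\sum_i X_i^2}$, we compute $\mathrm{Var}(b) = \mathrm{Var}\!\left(\dfrac{\sum_i X_i Y_i}{\sum_i X_i^2}\right)$. Since the values of X are
assumed to be fixed, $\sum_i X_i^2$ in the denominator is a constant. So,

$$\mathrm{Var}\!\left(\frac{\sum_i X_i Y_i}{\sum_i X_i^2}\right) = \frac{1}{\left(\sum_i X_i^2\right)^2}\,\mathrm{Var}\!\left(\sum_i X_i Y_i\right)$$

and $\mathrm{Var}\!\left(\sum_i X_i Y_i\right) = \mathrm{Var}(X_1Y_1 + X_2Y_2 + \cdots + X_nY_n)$. The Xs are constants and we are
interested in the variation of $Y_i$ for the given $X_i$, which is the common variance $\sigma_e^2$.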

So, $\mathrm{Var}(X_1Y_1 + X_2Y_2 + \cdots + X_nY_n) = X_1^2\,\mathrm{Var}(Y_1) + X_2^2\,\mathrm{Var}(Y_2) + \cdots + X_n^2\,\mathrm{Var}(Y_n) = \sigma_e^2 \sum_i X_i^2$.

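Putting it all together, we find

$$\mathrm{Var}(b) = \mathrm{Var}\!\left(\frac{\sum_i X_i Y_i}{\sum_i X_i^2}\right) = \frac{\sigma_e^2 \sum_i X_i^2}{\left(\sum_i X_i^2\right)^2} = \frac{\sigma_e^2}{\sum_i X_i^2}.$$

This is often written as $\mathrm{Var}(b) = \dfrac{\sigma_e^2}{\sum_i (x_i - \bar{x})^2}$.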
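
The variance formula can be checked by simulation. The sketch below repeatedly draws responses from the model $Y = \alpha + \beta X + \varepsilon$ at fixed x-values and compares the empirical variance of the fitted slopes with $\sigma_e^2 / \sum_i (x_i - \bar{x})^2$. The parameter values, the x-values, the seed, and the number of repetitions are all assumptions chosen for illustration.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable

# Assumed (hypothetical) model parameters and fixed x-values.
alpha, beta, sigma_e = 2.0, 0.5, 1.0
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
x_bar = sum(xs) / len(xs)
X = [x - x_bar for x in xs]                # centered x-values
sxx = sum(Xi ** 2 for Xi in X)             # sum of (x_i - x_bar)^2


def simulated_slope():
    """Draw one sample from Y = alpha + beta*x + eps and return its slope b."""
    ys = [alpha + beta * x + random.gauss(0.0, sigma_e) for x in xs]
    y_bar = sum(ys) / len(ys)
    return sum(Xi * (y - y_bar) for Xi, y in zip(X, ys)) / sxx


slopes = [simulated_slope() for _ in range(20000)]
mean_b = sum(slopes) / len(slopes)
var_b = sum((b - mean_b) ** 2 for b in slopes) / (len(slopes) - 1)

theory = sigma_e ** 2 / sxx                # Var(b) from the derivation
print(mean_b, var_b, theory)
```

With these settings the theoretical variance is 0.1, and the simulated variance should land close to it, with the average slope close to the assumed $\beta$.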
So, the standard error for the slope in regression can be estimated by

$$s_b = \frac{s_e}{\sqrt{\sum_i (x_i - \bar{x})^2}} \quad\text{or}\quad s_b = \frac{s_e}{\sqrt{n-1}\; s_x}.$$
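
The two forms of $s_b$ agree because $\sum_i (x_i - \bar{x})^2 = (n-1)\,s_x^2$. A minimal sketch on made-up data follows. It assumes that $s_e$ is the residual standard deviation with n - 2 degrees of freedom, which the handout uses but does not define explicitly.

```python
import math
import statistics

# Hypothetical data for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

sxx = sum((x - x_bar) ** 2 for x in xs)    # sum of (x_i - x_bar)^2
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
a = y_bar - b * x_bar

# Assumption: s_e is the residual standard deviation with n - 2 degrees of freedom.
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
s_e = math.sqrt(sse / (n - 2))

s_b_sum_form = s_e / math.sqrt(sxx)
s_b_sx_form = s_e / (math.sqrt(n - 1) * statistics.stdev(xs))  # stdev divides by n - 1
print(s_b_sum_form, s_b_sx_form)
```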

The Standard Error for y , the Predicted Mean


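Confidence intervals for a predicted mean can now be obtained. The standard
error can be determined by computing $\mathrm{Var}(\hat{y})$. We know that $\hat{y} = \bar{y} + b(x - \bar{x})$, so, as
before, using the expected value theorems, we find

$$\mathrm{Var}(\hat{y}) = \mathrm{Var}\bigl(\bar{y} + b(x - \bar{x})\bigr) = \mathrm{Var}(\bar{y}) + (x - \bar{x})^2\,\mathrm{Var}(b),$$

with $\mathrm{Var}(b) = \dfrac{\sigma_e^2}{\sum_i (x_i - \bar{x})^2}$ and $\mathrm{Var}(\bar{y}) = \mathrm{Var}\!\left(\dfrac{\sum_i y_i}{n}\right) = \dfrac{1}{n^2}\,\mathrm{Var}\!\left(\sum_i y_i\right) = \dfrac{n\,\sigma_e^2}{n^2} = \dfrac{\sigma_e^2}{n}$.
(The variances add because $\bar{y}$ and $b$ are uncorrelated: $\mathrm{Cov}(\bar{y}, b)$ is proportional to
$\sum_i X_i = 0$.)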

So,

$$\mathrm{Var}(\hat{y}) = \mathrm{Var}\bigl(\bar{y} + b(x - \bar{x})\bigr) = \frac{\sigma_e^2}{n} + \frac{\sigma_e^2\,(x - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}.$$

The standard error for predicting a mean response for a given value of x can be estimated
by

$$s_{\hat{y}} = s_e \sqrt{\frac{1}{n} + \frac{(x - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}}.$$
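
The formula shows that the standard error is smallest at $x = \bar{x}$ and grows as x moves away from the mean. The sketch below illustrates this on made-up data. The helper names `fit_line` and `se_predicted_mean` are hypothetical, and $s_e$ is assumed to be the residual standard deviation with n - 2 degrees of freedom.

```python
import math


def fit_line(xs, ys):
    """Least-squares fit; returns a, b, s_e, x_bar and sum((x_i - x_bar)^2)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s_e = math.sqrt(sse / (n - 2))         # assumption: n - 2 degrees of freedom
    return a, b, s_e, x_bar, sxx


def se_predicted_mean(x, xs, ys):
    """s_yhat = s_e * sqrt(1/n + (x - x_bar)^2 / sum((x_i - x_bar)^2))."""
    _, _, s_e, x_bar, sxx = fit_line(xs, ys)
    return s_e * math.sqrt(1 / len(xs) + (x - x_bar) ** 2 / sxx)


# Hypothetical data for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

se_at_mean = se_predicted_mean(3.0, xs, ys)   # x equal to x_bar
se_at_edge = se_predicted_mean(5.0, xs, ys)   # x at the edge of the data
print(se_at_mean, se_at_edge)
```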

The Standard Error for the Intercept


The variance of the intercept a can be estimated using the previous formula for
the standard error for $\hat{y}$. Since $\hat{y} = a + bx$, the variance of a is the variance of $\hat{y}$ when
$x = 0$. So,

$$s_a = s_e \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{\sum_i (x_i - \bar{x})^2}}.$$
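
A short sketch of the intercept calculation on made-up data follows. As a cross-check it also evaluates the equivalent form $s_e \sqrt{\sum_i x_i^2 / \bigl(n \sum_i (x_i - \bar{x})^2\bigr)}$, a standard algebraic rearrangement that is not stated in the handout. Again, $s_e$ is assumed to use n - 2 degrees of freedom.

```python
import math

# Hypothetical data for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

sxx = sum((x - x_bar) ** 2 for x in xs)    # sum of (x_i - x_bar)^2
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
a = y_bar - b * x_bar

# Assumption: s_e is the residual standard deviation with n - 2 degrees of freedom.
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
s_e = math.sqrt(sse / (n - 2))

# Standard error of y-hat evaluated at x = 0, which is the intercept.
s_a = s_e * math.sqrt(1 / n + x_bar ** 2 / sxx)

# Equivalent rearrangement, used here only as a cross-check.
s_a_alt = s_e * math.sqrt(sum(x ** 2 for x in xs) / (n * sxx))
print(a, s_a, s_a_alt)
```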

The Standard Error for a Predicted Value


Finally, to predict a y-value, $y_p$, for a given x, we need to consider two
independent errors. We know that y is normally distributed around $\mu_{y|x}$, so
$y \mid x \sim N(\mu_{y|x}, \sigma_e)$. Given $\mu_{y|x}$, we can estimate our error in predicting y. But, as we
have just seen, there is also variation in our predictions of $\mu_{y|x}$. First, we predict $\hat{y}$
taking into account its own variation and then we use that prediction in predicting y. So

$$\mathrm{Var}(y_p) = \mathrm{Var}(\hat{y}) + \mathrm{Var}(\varepsilon) = \left(\frac{\sigma_e^2}{n} + \frac{\sigma_e^2\,(x - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}\right) + \sigma_e^2 = \sigma_e^2\left(1 + \frac{1}{n} + \frac{(x - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}\right).$$

The standard error for this prediction can be estimated with

$$s_{y_p} = s_e \sqrt{1 + \frac{1}{n} + \frac{(x - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}}.$$
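
The sketch below computes this standard error on made-up data and confirms that its square is the sum of the two independent pieces, $s_{\hat{y}}^2$ and $s_e^2$. The data and variable names are assumptions for illustration, and $s_e$ is taken to be the residual standard deviation with n - 2 degrees of freedom.

```python
import math

# Hypothetical data for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

sxx = sum((x - x_bar) ** 2 for x in xs)    # sum of (x_i - x_bar)^2
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
a = y_bar - b * x_bar

# Assumption: s_e is the residual standard deviation with n - 2 degrees of freedom.
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
s_e = math.sqrt(sse / (n - 2))

x_new = 3.0                                # value of x at which we predict
leverage = 1 / n + (x_new - x_bar) ** 2 / sxx

s_mean = s_e * math.sqrt(leverage)         # standard error for the predicted mean
s_pred = s_e * math.sqrt(1 + leverage)     # standard error for a predicted value
print(s_mean, s_pred)
```

The standard error for a single predicted value is always larger than the one for the predicted mean, since it adds the variation of an individual response around the line.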

Now we have all the equations found in the texts.

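Standard error for Slope: $s_b = \dfrac{s_e}{\sqrt{n-1}\; s_x}$

Standard error for Predicted Mean: $s_{\hat{y}} = s_e \sqrt{\dfrac{1}{n} + \dfrac{(x - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}}$

Standard error for the Intercept: $s_a = s_e \sqrt{\dfrac{1}{n} + \dfrac{\bar{x}^2}{\sum_i (x_i - \bar{x})^2}}$

Standard error for a Predicted Value: $s_{y_p} = s_e \sqrt{1 + \dfrac{1}{n} + \dfrac{(x - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}}$
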

Reference: Kennedy, John B. and Adam M. Neville, Basic Statistical Methods for
Engineers and Scientists, 3rd ed., Harper and Row, 1986.