We have derived point estimators of all the unknown population parameters in the Classical Linear Regression Model (CLRM), for which the population regression equation, or PRE, is

$$Y_i = \beta_0 + \beta_1 X_i + u_i \quad (i = 1, \ldots, N). \tag{1}$$
$$\hat{\sigma}^2 = \frac{RSS}{N-2} = \frac{\sum_i \hat{u}_i^2}{N-2}, \quad \text{where } \hat{u}_i = Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i \ (i = 1, \ldots, N).$$

$E(\hat{\sigma}^2) = \sigma^2$ because $E(RSS) = E\left(\sum_i \hat{u}_i^2\right) = (N-2)\,\sigma^2$.
Statistical inference consists essentially of using the point estimates β̂0, β̂1 and σ̂² of the unknown population parameters β0, β1 and σ² to make statements about the true values of β0 and β1 within specified margins of statistical error.
• For both types of statistical inference, we need to know the form of the sampling distributions (or probability distributions) of the OLS coefficient estimators β̂0 and β̂1 and of the unbiased (degrees-of-freedom-adjusted) error variance estimator σ̂².
For both forms of statistical inference -- interval estimation and hypothesis testing -- it is necessary to obtain feasible test statistics for each of the OLS coefficient estimators β̂0 and β̂1 in the OLS sample regression equation (OLS-SRE)

$$Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{u}_i \quad (i = 1, \ldots, N), \tag{2}$$
where the OLS coefficient estimators are

$$\hat{\beta}_1 = \frac{\sum_i x_i y_i}{\sum_i x_i^2} \tag{2.1}$$

$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} \tag{2.2}$$

with $x_i \equiv X_i - \bar{X}$, $y_i \equiv Y_i - \bar{Y}$, $\bar{X} = \sum_i X_i / N$, and $\bar{Y} = \sum_i Y_i / N$.
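To make these formulas concrete, here is a minimal NumPy sketch (using simulated data, so the parameter values and variable names are illustrative assumptions, not part of the notes) that computes β̂1 and β̂0 via (2.1) and (2.2), along with the unbiased error variance estimator σ̂²:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 50
beta0, beta1, sigma = 2.0, 0.5, 1.0                # true parameters (unknown in practice)
X = rng.uniform(0.0, 10.0, N)
Y = beta0 + beta1 * X + rng.normal(0.0, sigma, N)  # PRE with normal errors

x = X - X.mean()                                   # x_i = X_i - Xbar
y = Y - Y.mean()                                   # y_i = Y_i - Ybar
beta1_hat = (x * y).sum() / (x**2).sum()           # equation (2.1)
beta0_hat = Y.mean() - beta1_hat * X.mean()        # equation (2.2)

u_hat = Y - beta0_hat - beta1_hat * X              # OLS residuals
sigma2_hat = (u_hat**2).sum() / (N - 2)            # unbiased estimator of sigma^2
print(beta1_hat, beta0_hat, sigma2_hat)
```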
A feasible test statistic must satisfy two requirements:

1. It has a known probability distribution for the true population value of the parameter(s) being tested.

2. It is a function only of sample data -- i.e., its value can be computed for any hypothesized value of the unknown parameter(s) using only sample data. The formula for a feasible test statistic must therefore contain no unknown parameters other than the parameter(s) of interest, i.e., the parameters being tested.
What follows consists of:

(1) a statement of the normality assumption and its implications for the sampling distributions of the estimators β̂0 and β̂1;

(2) a demonstration of how these implications can be used to derive feasible test statistics for β̂0 and β̂1; and

(3) definitions of three probability distributions related to the normal that are used extensively in statistical inference.
The normality assumption states that the random error terms $u_i$ in the population regression equation (PRE) are normally distributed:

$$u_i \mid X_i \sim N(0, \sigma^2) \quad \forall\, i = 1, \ldots, N,$$

where N(0, σ²) denotes a normal distribution with mean or expectation equal to 0 and variance equal to σ², and the symbol “~” means “is distributed as”.
NOTE: The normal distribution has only two parameters: its mean or
expectation, and its variance. The probability density function, or pdf, of the
normal distribution is defined as follows: if a random variable X ~ N(µ, σ2),
then the pdf of X is denoted as f(Xi) and takes the form
$$f(X_i) = (2\pi\sigma^2)^{-1/2} \exp\left[-\frac{(X_i - \mu)^2}{2\sigma^2}\right].$$
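As an illustrative sanity check (not part of the original notes), this formula can be compared against scipy.stats.norm.pdf, with arbitrary values assumed for µ and σ²:

```python
import numpy as np
from scipy.stats import norm

mu, sigma2 = 1.5, 4.0                      # arbitrary mean and variance
x = np.linspace(-5.0, 8.0, 9)
pdf = (2 * np.pi * sigma2) ** -0.5 * np.exp(-(x - mu) ** 2 / (2 * sigma2))
# scipy parameterizes the normal by its standard deviation, not its variance
assert np.allclose(pdf, norm.pdf(x, loc=mu, scale=np.sqrt(sigma2)))
```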
♦ Parts (1) and (2) of A9 -- the assumptions of zero means and constant variances -- are the “identically” part of the iid specification. The assumption can be written compactly as

$$u_i \sim \mathrm{NID}(0, \sigma^2) \quad (i = 1, \ldots, N),$$

where NID(0, σ²) means “normally and independently distributed with zero mean and constant variance σ²”.
The error normality assumption A9 implies that the sample values $Y_i$ of the regressand (or dependent variable) Y are also normally distributed, since each $Y_i$ in

$$Y_i = \beta_0 + \beta_1 X_i + u_i \tag{4}$$

is a linear function of the normally distributed error term $u_i$. That is, the $Y_i$ are normally and independently distributed (NID) with

$$E(Y_i \mid X_i) = \beta_0 + \beta_1 X_i \quad \forall\, i = 1, \ldots, N;$$

$$\mathrm{Var}(Y_i \mid X_i) = E(u_i^2 \mid X_i) = \sigma^2 \quad \forall\, i = 1, \ldots, N;$$

$$\mathrm{Cov}(Y_i, Y_s \mid X_i, X_s) = E(u_i u_s \mid X_i, X_s) = 0 \quad \forall\, i \neq s.$$
The linearity property of the OLS coefficient estimators β̂0 and β̂1 means that the normality of $u_i$ and $Y_i$ carries over to the sampling distributions of β̂0 and β̂1. For example, recall that the OLS slope coefficient estimator β̂1 can be written as the following linear function of the $Y_i$ and the $u_i$:

$$\hat{\beta}_1 = \sum_i k_i Y_i = \beta_1 + \sum_i k_i u_i, \quad \text{where } k_i = \frac{x_i}{\sum_i x_i^2} \ (i = 1, \ldots, N). \tag{6}$$
Result: Equation (6) indicates that the OLS slope coefficient estimator β̂1 is a linear function of both the observed $Y_i$ values and the unobserved $u_i$ values. It follows that β̂1 is itself normally distributed:

$$\hat{\beta}_1 \sim N\big(\beta_1, \mathrm{Var}(\hat{\beta}_1)\big) = N\left(\beta_1, \frac{\sigma^2}{\sum_i x_i^2}\right).$$
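A small Monte Carlo sketch (hypothetical, with the regressor values X held fixed across replications so that the result is conditional on X) illustrates that the simulated mean and variance of β̂1 match β1 and σ²/Σᵢxᵢ²:

```python
import numpy as np

rng = np.random.default_rng(1)
N, R = 50, 20_000                           # sample size, Monte Carlo replications
beta0, beta1, sigma = 2.0, 0.5, 1.0
X = rng.uniform(0.0, 10.0, N)               # regressors held fixed across replications
x = X - X.mean()

beta1_hats = np.empty(R)
for r in range(R):
    Y = beta0 + beta1 * X + rng.normal(0.0, sigma, N)  # fresh normal errors each time
    beta1_hats[r] = (x * (Y - Y.mean())).sum() / (x**2).sum()

print(beta1_hats.mean(), beta1)                        # mean of beta1_hat ~ beta1
print(beta1_hats.var(), sigma**2 / (x**2).sum())       # variance ~ sigma^2 / sum(x_i^2)
```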
The normality of the sampling distribution of β̂1 implies that the Z-statistic $Z(\hat{\beta}_1) = (\hat{\beta}_1 - \beta_1)/\mathrm{se}(\hat{\beta}_1)$ has the standard normal distribution with mean 0 and variance 1: i.e.,

$$Z(\hat{\beta}_1) = \frac{\hat{\beta}_1 - \beta_1}{\sqrt{\mathrm{Var}(\hat{\beta}_1)}} = \frac{\hat{\beta}_1 - \beta_1}{\mathrm{se}(\hat{\beta}_1)} = \frac{\hat{\beta}_1 - \beta_1}{\sigma \big/ \left(\sum_i x_i^2\right)^{1/2}} \sim N(0, 1).$$
Similarly, the OLS intercept estimator β̂0 is normally distributed,

$$\hat{\beta}_0 \sim N\big(\beta_0, \mathrm{Var}(\hat{\beta}_0)\big) = N\left(\beta_0, \frac{\sigma^2 \sum_i X_i^2}{N \sum_i x_i^2}\right),$$

so the corresponding Z-statistic is

$$Z(\hat{\beta}_0) = \frac{\hat{\beta}_0 - \beta_0}{\sqrt{\mathrm{Var}(\hat{\beta}_0)}} = \frac{\hat{\beta}_0 - \beta_0}{\mathrm{se}(\hat{\beta}_0)} = \frac{\hat{\beta}_0 - \beta_0}{\sigma \left(\sum_i X_i^2\right)^{1/2} \big/ \left[N^{1/2} \left(\sum_i x_i^2\right)^{1/2}\right]} \sim N(0, 1).$$
Finally, A9 implies that the unbiased error variance estimator σ̂² is related to the chi-square distribution:

$$\frac{(N-2)\,\hat{\sigma}^2}{\sigma^2} \sim \chi^2[N-2], \quad \text{where } \hat{\sigma}^2 = \frac{\sum_i \hat{u}_i^2}{N-2} \text{ and } \hat{u}_i = Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i \ (i = 1, \ldots, N).$$
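This result can likewise be illustrated by simulation (a hypothetical sketch, not from the notes): across replications, (N − 2)σ̂²/σ² should have mean N − 2 and variance 2(N − 2), the moments of a χ²[N − 2] variable:

```python
import numpy as np

rng = np.random.default_rng(2)
N, R = 30, 20_000
beta0, beta1, sigma = 2.0, 0.5, 1.0
X = rng.uniform(0.0, 10.0, N)
x = X - X.mean()

stat = np.empty(R)
for r in range(R):
    Y = beta0 + beta1 * X + rng.normal(0.0, sigma, N)
    b1 = (x * (Y - Y.mean())).sum() / (x**2).sum()
    b0 = Y.mean() - b1 * X.mean()
    u_hat = Y - b0 - b1 * X
    stat[r] = (u_hat**2).sum() / sigma**2   # = (N - 2) * sigma2_hat / sigma^2

print(stat.mean(), N - 2)                   # chi-square mean: N - 2
print(stat.var(), 2 * (N - 2))              # chi-square variance: 2(N - 2)
```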
Recall the two requirements a feasible test statistic must satisfy: (1) its sampling distribution must be known; and (2) it must be capable of being calculated using only the given sample data $(Y_i, X_i)$, i = 1, ..., N.
If some random variable X ~ N(µ, σ²), then the standardized variable defined as Z = (X − µ)/σ has the standard normal distribution N(0, 1):

$$X \sim N(\mu, \sigma^2) \;\Rightarrow\; Z = \frac{X - \mu}{\sigma} \sim N(0, 1).$$
Applying this standardization to the sampling distribution

$$\hat{\beta}_1 \sim N\big(\beta_1, \mathrm{Var}(\hat{\beta}_1)\big) = N\left(\beta_1, \frac{\sigma^2}{\sum_i x_i^2}\right)$$

gives

$$Z(\hat{\beta}_1) = \frac{\hat{\beta}_1 - \beta_1}{\sqrt{\mathrm{Var}(\hat{\beta}_1)}} = \frac{\hat{\beta}_1 - \beta_1}{\mathrm{se}(\hat{\beta}_1)} = \frac{\hat{\beta}_1 - \beta_1}{\sigma \big/ \left(\sum_i x_i^2\right)^{1/2}},$$

where Var(β̂1) is the true variance of β̂1 and se(β̂1) ≡ [Var(β̂1)]^{1/2} is the true standard error of β̂1.
♦ Since β̂1 has the normal distribution, the statistic Z(β̂1) has the standard normal distribution N(0, 1): that is,

$$\hat{\beta}_1 \sim N\left(\beta_1, \frac{\sigma^2}{\sum_i x_i^2}\right) \;\Rightarrow\; Z(\hat{\beta}_1) = \frac{\hat{\beta}_1 - \beta_1}{\mathrm{se}(\hat{\beta}_1)} = \frac{\hat{\beta}_1 - \beta_1}{\sigma \big/ \left(\sum_i x_i^2\right)^{1/2}} \sim N(0, 1). \tag{7}$$
Problem: The Z(β̂1) statistic in equation (7) is not a feasible test statistic for β̂1 because it involves the unknown parameter σ, the square root of the unknown error variance σ².

To obtain a feasible test statistic for β̂1, we use Student's t-distribution.
Formally:
If (1) Z ~ N(0, 1),
(2) V ~ χ²[m],
and (3) Z and V are independent, then

$$t = \frac{Z}{\sqrt{V/m}} \sim t[m].$$
Result:
A t-statistic is simply the ratio of a standard normal variable to the square root
of an independent degrees-of-freedom-adjusted chi-square variable with m
degrees of freedom.
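The construction can be illustrated by simulation (a hypothetical sketch, not from the notes): draws of t = Z/√(V/m) built from independent Z and V are statistically indistinguishable from draws of a t[m] variable:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
m, R = 10, 100_000
Z = rng.standard_normal(R)            # Z ~ N(0, 1)
V = rng.chisquare(m, R)               # V ~ chi-square[m], independent of Z
t_draws = Z / np.sqrt(V / m)          # t = Z / sqrt(V/m)

# Kolmogorov-Smirnov test against t[m]: a large p-value indicates agreement
print(stats.kstest(t_draws, stats.t(df=m).cdf))
```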
♦ Numerator of the t-statistic for β̂1. The numerator of the t-statistic for β̂1 is the Z(β̂1) statistic (7). It will be convenient to re-write the Z(β̂1) statistic in the form

$$Z(\hat{\beta}_1) = \frac{\hat{\beta}_1 - \beta_1}{\mathrm{se}(\hat{\beta}_1)} = \frac{\hat{\beta}_1 - \beta_1}{\sigma \big/ \left(\sum_i x_i^2\right)^{1/2}} = \frac{\left(\hat{\beta}_1 - \beta_1\right)\left(\sum_i x_i^2\right)^{1/2}}{\sigma} \sim N(0, 1). \tag{8}$$
♦ Denominator of the t-statistic for β̂1. The denominator is obtained from the chi-square distribution of σ̂²:

$$\frac{(N-2)\,\hat{\sigma}^2}{\sigma^2} \sim \chi^2[N-2] \;\Rightarrow\; \frac{\hat{\sigma}^2}{\sigma^2} \sim \frac{\chi^2[N-2]}{N-2}. \tag{9}$$

The square root of this statistic is therefore distributed as the square root of a degrees-of-freedom-adjusted chi-square variable with (N − 2) degrees of freedom:

$$\frac{\hat{\sigma}}{\sigma} \sim \left[\frac{\chi^2[N-2]}{N-2}\right]^{1/2}. \tag{10}$$
♦ The t-statistic for β̂1. The t-statistic for β̂1 is therefore the ratio of (8) to (10): i.e.,

$$t(\hat{\beta}_1) = \frac{Z(\hat{\beta}_1)}{\hat{\sigma}/\sigma} = \frac{\left(\hat{\beta}_1 - \beta_1\right)\left(\sum_i x_i^2\right)^{1/2} \big/ \sigma}{\hat{\sigma}/\sigma}. \tag{11}$$
The t-statistic for β̂1 given by (11) can be rewritten without the unknown parameter σ.

♦ Since the unknown parameter σ is the divisor of both the numerator and denominator of t(β̂1), multiplication of the numerator and denominator of (11) by σ permits the t-statistic for β̂1 to be written as

$$t(\hat{\beta}_1) = \frac{\left(\hat{\beta}_1 - \beta_1\right)\left(\sum_i x_i^2\right)^{1/2} \big/ \sigma}{\hat{\sigma}/\sigma} = \frac{\left(\hat{\beta}_1 - \beta_1\right)\left(\sum_i x_i^2\right)^{1/2}}{\hat{\sigma}}. \tag{12}$$
♦ Dividing the numerator and denominator of (12) by $\left(\sum_i x_i^2\right)^{1/2}$ yields

$$t(\hat{\beta}_1) = \frac{\hat{\beta}_1 - \beta_1}{\hat{\sigma} \big/ \left(\sum_i x_i^2\right)^{1/2}}. \tag{13}$$
♦ But the denominator of (13) is simply the estimated standard error of β̂1; i.e.,

$$\frac{\hat{\sigma}}{\left(\sum_i x_i^2\right)^{1/2}} = \sqrt{\widehat{\mathrm{Var}}(\hat{\beta}_1)} = \widehat{\mathrm{se}}(\hat{\beta}_1).$$
Thus the t-statistic for β̂1 takes the form

$$t(\hat{\beta}_1) = \frac{\hat{\beta}_1 - \beta_1}{\hat{\sigma} \big/ \left(\sum_i x_i^2\right)^{1/2}} = \frac{\hat{\beta}_1 - \beta_1}{\sqrt{\widehat{\mathrm{Var}}(\hat{\beta}_1)}} = \frac{\hat{\beta}_1 - \beta_1}{\widehat{\mathrm{se}}(\hat{\beta}_1)}. \tag{14}$$
Note that, unlike the Z(β̂1) statistic in (8), the t(β̂1) statistic in (14) is a feasible test statistic for β̂1 because it satisfies both of the requirements for a feasible test statistic.
(1) First, its sampling distribution is known; it has the t[N − 2] distribution, the t-distribution with (N − 2) degrees of freedom:

$$t(\hat{\beta}_1) = \frac{\hat{\beta}_1 - \beta_1}{\widehat{\mathrm{se}}(\hat{\beta}_1)} \sim t[N-2].$$
(2) Second, its value can be calculated from sample data for any hypothesized value of β1.
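As an illustration (a hypothetical sketch on simulated data, with the null value β1 = 0 chosen arbitrarily), the feasible t-statistic (14) and its p-value can be computed entirely from the sample:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
N = 50
X = rng.uniform(0.0, 10.0, N)
Y = 2.0 + 0.5 * X + rng.normal(0.0, 1.0, N)     # simulated sample

x = X - X.mean()
beta1_hat = (x * (Y - Y.mean())).sum() / (x**2).sum()
beta0_hat = Y.mean() - beta1_hat * X.mean()
u_hat = Y - beta0_hat - beta1_hat * X
sigma2_hat = (u_hat**2).sum() / (N - 2)

se_beta1 = np.sqrt(sigma2_hat / (x**2).sum())   # estimated standard error of beta1_hat
beta1_null = 0.0                                # hypothesized value of beta1
t_stat = (beta1_hat - beta1_null) / se_beta1    # equation (14)
p_value = 2 * stats.t.sf(abs(t_stat), df=N - 2) # two-tailed p-value from t[N-2]
print(t_stat, p_value)
```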
Result: The t-statistic for β̂0 is analogous to that for β̂1 and has the same distribution: i.e.,

$$t(\hat{\beta}_0) = \frac{\hat{\beta}_0 - \beta_0}{\sqrt{\widehat{\mathrm{Var}}(\hat{\beta}_0)}} = \frac{\hat{\beta}_0 - \beta_0}{\widehat{\mathrm{se}}(\hat{\beta}_0)} \sim t[N-2],$$

where

$$\widehat{\mathrm{se}}(\hat{\beta}_0) = \sqrt{\widehat{\mathrm{Var}}(\hat{\beta}_0)} = \left[\frac{\hat{\sigma}^2 \sum_i X_i^2}{N \sum_i x_i^2}\right]^{1/2}.$$
A second feasible test statistic for β̂1 can be derived from the normality
assumption A9 using the F-distribution.
Formally:
If (1) V1 ~ χ²[m1],
(2) V2 ~ χ²[m2],
and (3) V1 and V2 are independent, then

$$F = \frac{V_1/m_1}{V_2/m_2} \sim F[m_1, m_2],$$

where F[m1, m2] denotes the F-distribution (or Fisher's F-distribution) with m1 numerator degrees of freedom and m2 denominator degrees of freedom.
♦ Numerator of the F-statistic for β̂1. The numerator of the F-statistic for β̂1 is the square of the Z(β̂1) statistic (7). Recall that the square of a standard normal N(0, 1) random variable has a chi-square distribution with one degree of freedom. Re-write the Z(β̂1) statistic as in (8) above:

$$Z(\hat{\beta}_1) = \frac{\hat{\beta}_1 - \beta_1}{\mathrm{se}(\hat{\beta}_1)} = \frac{\hat{\beta}_1 - \beta_1}{\sigma \big/ \left(\sum_i x_i^2\right)^{1/2}} = \frac{\left(\hat{\beta}_1 - \beta_1\right)\left(\sum_i x_i^2\right)^{1/2}}{\sigma} \sim N(0, 1). \tag{8}$$

Squaring Z(β̂1) gives

$$\left(Z(\hat{\beta}_1)\right)^2 = \frac{\left(\hat{\beta}_1 - \beta_1\right)^2}{\left(\mathrm{se}(\hat{\beta}_1)\right)^2} = \frac{\left(\hat{\beta}_1 - \beta_1\right)^2 \left(\sum_i x_i^2\right)}{\sigma^2} \sim \chi^2[1]. \tag{15}$$
♦ Denominator of the F-statistic for β̂1. The denominator is again based on the chi-square result (9):

$$\frac{(N-2)\,\hat{\sigma}^2}{\sigma^2} \sim \chi^2[N-2] \;\Rightarrow\; \frac{\hat{\sigma}^2}{\sigma^2} \sim \frac{\chi^2[N-2]}{N-2}. \tag{9}$$

♦ It is possible to show that the χ²[1]-distributed statistic $\left(Z(\hat{\beta}_1)\right)^2$ in (15) and the χ²[N − 2]-distributed statistic $(N-2)\hat{\sigma}^2/\sigma^2$ in (9) are statistically independent.
♦ The F-statistic for β̂1. The F-statistic for β̂1 is therefore the ratio of (15) to (9):

$$F(\hat{\beta}_1) = \frac{\left(Z(\hat{\beta}_1)\right)^2}{\hat{\sigma}^2/\sigma^2} = \frac{\left(\hat{\beta}_1 - \beta_1\right)^2 \left(\sum_i x_i^2\right) \big/ \sigma^2}{\hat{\sigma}^2/\sigma^2} = \frac{\left(\hat{\beta}_1 - \beta_1\right)^2 \left(\sum_i x_i^2\right)}{\hat{\sigma}^2} = \frac{\left(\hat{\beta}_1 - \beta_1\right)^2}{\hat{\sigma}^2 \big/ \sum_i x_i^2}, \tag{16}$$

and since $\hat{\sigma}^2 \big/ \sum_i x_i^2 = \widehat{\mathrm{Var}}(\hat{\beta}_1)$,

$$F(\hat{\beta}_1) = \frac{\left(\hat{\beta}_1 - \beta_1\right)^2}{\hat{\sigma}^2 \big/ \left(\sum_i x_i^2\right)} = \frac{\left(\hat{\beta}_1 - \beta_1\right)^2}{\widehat{\mathrm{Var}}(\hat{\beta}_1)} \sim F[1, N-2]. \tag{17}$$
Note that, like the t(β̂1) statistic in (14), the F(β̂1) statistic in (17) is a feasible test statistic for β̂1 because it satisfies both of the requirements for a feasible test statistic.
(1) First, its sampling distribution is known; it has the F[1, N − 2] distribution:

$$F(\hat{\beta}_1) = \frac{\left(\hat{\beta}_1 - \beta_1\right)^2}{\widehat{\mathrm{Var}}(\hat{\beta}_1)} \sim F[1, N-2].$$
(2) Second, its value can be calculated entirely from sample data for any hypothesized value of β1.
Result: The F-statistic for β̂0 is analogous to that for β̂1 and has the same distribution: i.e.,

$$F(\hat{\beta}_0) = \frac{\left(\hat{\beta}_0 - \beta_0\right)^2}{\widehat{\mathrm{Var}}(\hat{\beta}_0)} \sim F[1, N-2], \quad \text{where } \widehat{\mathrm{Var}}(\hat{\beta}_0) = \frac{\hat{\sigma}^2 \sum_i X_i^2}{N \sum_i x_i^2}.$$
• The F-statistic for β̂1 is the square of the t-statistic for β̂1:

$$F(\hat{\beta}_1) = \frac{\left(\hat{\beta}_1 - \beta_1\right)^2}{\widehat{\mathrm{Var}}(\hat{\beta}_1)} = \frac{\left(\hat{\beta}_1 - \beta_1\right)^2}{\left(\widehat{\mathrm{se}}(\hat{\beta}_1)\right)^2} = \left(\frac{\hat{\beta}_1 - \beta_1}{\widehat{\mathrm{se}}(\hat{\beta}_1)}\right)^2 = \left(t(\hat{\beta}_1)\right)^2.$$

Similarly, the F-statistic for β̂0 is the square of the t-statistic for β̂0:

$$F(\hat{\beta}_0) = \frac{\left(\hat{\beta}_0 - \beta_0\right)^2}{\widehat{\mathrm{Var}}(\hat{\beta}_0)} = \frac{\left(\hat{\beta}_0 - \beta_0\right)^2}{\left(\widehat{\mathrm{se}}(\hat{\beta}_0)\right)^2} = \left(\frac{\hat{\beta}_0 - \beta_0}{\widehat{\mathrm{se}}(\hat{\beta}_0)}\right)^2 = \left(t(\hat{\beta}_0)\right)^2.$$
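This relationship is easy to verify numerically (a hypothetical sketch on simulated data, not from the notes): computing (14) and (17) for the same hypothesized value gives F(β̂1) = t(β̂1)² to machine precision:

```python
import numpy as np

rng = np.random.default_rng(5)
N = 50
X = rng.uniform(0.0, 10.0, N)
Y = 2.0 + 0.5 * X + rng.normal(0.0, 1.0, N)

x = X - X.mean()
beta1_hat = (x * (Y - Y.mean())).sum() / (x**2).sum()
beta0_hat = Y.mean() - beta1_hat * X.mean()
u_hat = Y - beta0_hat - beta1_hat * X
var_beta1 = ((u_hat**2).sum() / (N - 2)) / (x**2).sum()  # Var-hat(beta1_hat)

beta1_null = 0.0
t_stat = (beta1_hat - beta1_null) / np.sqrt(var_beta1)   # t-statistic (14)
F_stat = (beta1_hat - beta1_null) ** 2 / var_beta1       # F-statistic (17)
assert np.isclose(F_stat, t_stat**2)                     # F = t^2
```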
1. Under the error normality assumption A9, the sample statistics t(β̂1) and t(β̂0) have the t-distribution with N − 2 degrees of freedom:

$$t(\hat{\beta}_1) = \frac{\hat{\beta}_1 - \beta_1}{\sqrt{\widehat{\mathrm{Var}}(\hat{\beta}_1)}} = \frac{\hat{\beta}_1 - \beta_1}{\widehat{\mathrm{se}}(\hat{\beta}_1)} \sim t[N-2];$$

$$t(\hat{\beta}_0) = \frac{\hat{\beta}_0 - \beta_0}{\sqrt{\widehat{\mathrm{Var}}(\hat{\beta}_0)}} = \frac{\hat{\beta}_0 - \beta_0}{\widehat{\mathrm{se}}(\hat{\beta}_0)} \sim t[N-2].$$
2. Under the error normality assumption A9, the sample statistics F(β̂1) and F(β̂0) have the F-distribution with 1 numerator degree of freedom and N − 2 denominator degrees of freedom:

$$F(\hat{\beta}_1) = \frac{\left(\hat{\beta}_1 - \beta_1\right)^2}{\widehat{\mathrm{Var}}(\hat{\beta}_1)} \sim F[1, N-2]; \qquad F(\hat{\beta}_0) = \frac{\left(\hat{\beta}_0 - \beta_0\right)^2}{\widehat{\mathrm{Var}}(\hat{\beta}_0)} \sim F[1, N-2].$$
Note that $\widehat{\mathrm{se}}(\hat{\beta}_1) = \sqrt{\widehat{\mathrm{Var}}(\hat{\beta}_1)}$ and $\widehat{\mathrm{se}}(\hat{\beta}_0) = \sqrt{\widehat{\mathrm{Var}}(\hat{\beta}_0)}$ are the estimated standard errors, and $\widehat{\mathrm{Var}}(\hat{\beta}_1)$ and $\widehat{\mathrm{Var}}(\hat{\beta}_0)$ are the estimated variances, of the OLS coefficient estimators β̂1 and β̂0, respectively.
3. The Z-statistics for β̂0 and β̂1 are not feasible test statistics:

$$Z(\hat{\beta}_1) = \frac{\hat{\beta}_1 - \beta_1}{\sqrt{\mathrm{Var}(\hat{\beta}_1)}} = \frac{\hat{\beta}_1 - \beta_1}{\mathrm{se}(\hat{\beta}_1)} \quad \text{and} \quad Z(\hat{\beta}_0) = \frac{\hat{\beta}_0 - \beta_0}{\sqrt{\mathrm{Var}(\hat{\beta}_0)}} = \frac{\hat{\beta}_0 - \beta_0}{\mathrm{se}(\hat{\beta}_0)}.$$
They require for their computation the true but unknown variances and standard errors of the OLS coefficient estimators, and these require that the value of the error variance σ² be known. But since the value of σ² is almost always unknown in practice, the values of Var(β̂0) and Var(β̂1), and of se(β̂0) and se(β̂1), are also unknown.
4. The t-statistics for β̂0 and β̂1 are feasible test statistics:

$$t(\hat{\beta}_0) = \frac{\hat{\beta}_0 - \beta_0}{\sqrt{\widehat{\mathrm{Var}}(\hat{\beta}_0)}} = \frac{\hat{\beta}_0 - \beta_0}{\widehat{\mathrm{se}}(\hat{\beta}_0)} \quad \text{and} \quad t(\hat{\beta}_1) = \frac{\hat{\beta}_1 - \beta_1}{\sqrt{\widehat{\mathrm{Var}}(\hat{\beta}_1)}} = \frac{\hat{\beta}_1 - \beta_1}{\widehat{\mathrm{se}}(\hat{\beta}_1)}.$$

They are obtained by replacing the unknown variances and standard errors of the OLS coefficient estimators in the Z-statistics Z(β̂0) and Z(β̂1) with their corresponding estimated variances $\widehat{\mathrm{Var}}(\hat{\beta}_0)$ and $\widehat{\mathrm{Var}}(\hat{\beta}_1)$ and estimated standard errors $\widehat{\mathrm{se}}(\hat{\beta}_0)$ and $\widehat{\mathrm{se}}(\hat{\beta}_1)$.
5. The F-statistics for β̂0 and β̂1 also are feasible test statistics:

$$F(\hat{\beta}_0) = \frac{\left(\hat{\beta}_0 - \beta_0\right)^2}{\widehat{\mathrm{Var}}(\hat{\beta}_0)} \quad \text{and} \quad F(\hat{\beta}_1) = \frac{\left(\hat{\beta}_1 - \beta_1\right)^2}{\widehat{\mathrm{Var}}(\hat{\beta}_1)}.$$

The denominators of F(β̂0) and F(β̂1) are the estimated variances $\widehat{\mathrm{Var}}(\hat{\beta}_0)$ and $\widehat{\mathrm{Var}}(\hat{\beta}_1)$, not the true variances Var(β̂0) and Var(β̂1).
The first of the three probability distributions related to the normal is the chi-square distribution. Formally:
If Z1, Z2, ..., Zm are m independent N(0, 1) random variables such that
Zi ~ N(0, 1), i = 1, ..., m,
then

$$V = Z_1^2 + Z_2^2 + \cdots + Z_m^2 = \sum_{i=1}^m Z_i^2 \sim \chi^2[m],$$

where χ²[m] denotes the chi-square distribution with m degrees of freedom.
(1) The mean and variance of a chi-square distribution are determined entirely by the value of m, the degrees-of-freedom parameter:

E(V) = E(χ²[m]) = m;
Var(V) = Var(χ²[m]) = 2m.

(2) The value of m also completely determines the shape of the chi-square distribution.
(3) The sum of n independent chi-square random variables is itself chi-square distributed, with degrees of freedom equal to the sum of the individual degrees of freedom: if the Vi are independent with Vi ~ χ²[mi], i = 1, ..., n, then

$$V = V_1 + V_2 + \cdots + V_n = \sum_{i=1}^n V_i \sim \chi^2[k], \quad k = \sum_{i=1}^n m_i.$$
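These properties can be checked by simulation (an illustrative sketch, not from the notes, with m1 and m2 chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(6)
R = 200_000
m1, m2 = 3, 7
V1 = rng.chisquare(m1, R)
V2 = rng.chisquare(m2, R)        # independent of V1

print(V1.mean(), V1.var())       # approximately m1 and 2*m1
V = V1 + V2                      # sum of independent chi-squares
print(V.mean(), V.var())         # approximately m1 + m2 and 2*(m1 + m2)
```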
The second is Student's t-distribution. Formally:
If (1) Z ~ N(0, 1),
(2) V ~ χ²[m],
and (3) Z and V are independent, then

$$t = \frac{Z}{\sqrt{V/m}} \sim t[m],$$
where t[m] denotes the t-distribution (or Student’s t-distribution) with m degrees
of freedom.
(1) The t-distribution has a mean equal to zero and a variance that is determined completely by the value of m:

E(t) = E(t[m]) = 0;
Var(t) = Var(t[m]) = m/(m − 2) for m > 2.
(2) The t-distribution is symmetric about its zero mean.

(3) The limiting distribution of the t[m]-distribution is the standard normal N(0, 1) distribution: that is, as m → ∞, t[m] → N(0, 1).
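One illustrative way to see this convergence (not from the notes) is through the 97.5th percentiles of t[m], which fall toward the standard normal value of about 1.96 as m grows:

```python
from scipy import stats

for m in (5, 10, 30, 100, 1000):
    print(m, stats.t.ppf(0.975, df=m))   # 97.5th percentile of t[m]
print("N(0,1):", stats.norm.ppf(0.975))  # limit: about 1.9600
```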
The third is the F-distribution. Formally:
If (1) V1 ~ χ²[m1],
(2) V2 ~ χ²[m2],
and (3) V1 and V2 are independent, then

$$F = \frac{V_1/m_1}{V_2/m_2} \sim F[m_1, m_2],$$
where F[m1, m2] denotes the F-distribution (or Fisher’s F-distribution) with m1
numerator degrees of freedom and m2 denominator degrees of freedom.
The square of a random variable that has the t-distribution with k degrees of freedom is a random variable that has the F-distribution with m1 = 1 numerator degree of freedom and m2 = k denominator degrees of freedom. That is,

$$\left(t[k]\right)^2 = F[1, k].$$
NOTE: This equality holds only for F variables that have numerator degrees of
freedom equal to 1.
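One practical consequence (an illustrative check, with k and α chosen arbitrarily) is that the upper-α critical value of F[1, k] equals the square of the two-tailed t[k] critical value:

```python
import numpy as np
from scipy import stats

k, alpha = 20, 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=k)      # two-tailed t[k] critical value
F_crit = stats.f.ppf(1 - alpha, dfn=1, dfd=k)  # upper-tail F[1, k] critical value
assert np.isclose(F_crit, t_crit**2)
```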