Regression With A Single Regressor: Hypothesis Tests and Confidence Intervals
$$\hat\beta_1 \sim N\!\left(\beta_1,\ \frac{\sigma_v^2}{n\,\sigma_X^4}\right), \quad \text{where } v_i = (X_i - \mu_X)u_i$$
Hypothesis Testing and the Standard Error of $\hat\beta_1$ (Section 5.1)
The objective is to test a hypothesis, like $\beta_1 = 0$, using data, to reach a tentative conclusion about whether the (null) hypothesis is correct or incorrect.
General setup
Null hypothesis and two-sided alternative:
$H_0: \beta_1 = \beta_{1,0}$ vs. $H_1: \beta_1 \neq \beta_{1,0}$
where $\beta_{1,0}$ is the hypothesized value under the null.
The estimator of the variance of $\hat\beta_1$ is

$$\hat\sigma^2_{\hat\beta_1} = \frac{1}{n} \times \frac{\tfrac{1}{n-2}\sum_{i=1}^{n} \hat v_i^2}{\left[\tfrac{1}{n}\sum_{i=1}^{n} (X_i - \bar X)^2\right]^2}, \quad \text{where } \hat v_i = (X_i - \bar X)\hat u_i,$$

and

$$SE(\hat\beta_1) = \sqrt{\hat\sigma^2_{\hat\beta_1}} = \text{the standard error of } \hat\beta_1,$$
so:
$$\widehat{TestScore} = \underset{(10.4)}{698.9} - \underset{(0.52)}{2.28}\,STR, \quad R^2 = .05, \ SER = 18.6$$

$t(\beta_1 = 0) = -4.38$, p-value = 0.000 (2-sided)
95% 2-sided confidence interval for $\beta_1$ is $(-3.30, -1.26)$
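The reported t-statistic and confidence interval follow directly from the coefficient and its standard error; a minimal check using only the numbers printed above:

```python
# Reproduce the reported inference from the coefficient and its SE alone.
beta1_hat, se = -2.28, 0.52

# t-statistic for H0: beta1 = 0
t = beta1_hat / se                                   # -> -4.38 (rounded)

# 95% two-sided confidence interval: beta1_hat +/- 1.96 * SE
ci = (beta1_hat - 1.96 * se, beta1_hat + 1.96 * se)  # -> (-3.30, -1.26) (rounded)
```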
Summary of Statistical Inference about $\beta_0$ and $\beta_1$:

Estimation:
OLS estimators $\hat\beta_0$ and $\hat\beta_1$
$\hat\beta_0$ and $\hat\beta_1$ have approximately normal sampling distributions in large samples

Testing:
$H_0: \beta_1 = \beta_{1,0}$ vs. $H_1: \beta_1 \neq \beta_{1,0}$ ($\beta_{1,0}$ is the value of $\beta_1$ under $H_0$)
$t = (\hat\beta_1 - \beta_{1,0})/SE(\hat\beta_1)$
p-value = area under the standard normal outside $t^{act}$ (large $n$)

Confidence Intervals:
95% confidence interval for $\beta_1$ is $\{\hat\beta_1 \pm 1.96\,SE(\hat\beta_1)\}$
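The whole pipeline (estimate, heteroskedasticity-robust SE, t-statistic, confidence interval) can be sketched in a few lines. The data below are simulated; the coefficients, seed, and sample size are illustrative assumptions, not the class size data:

```python
import numpy as np

# Simulated data (illustrative values, not the class size data).
rng = np.random.default_rng(0)
n = 1000
X = rng.normal(20, 2, size=n)
u = rng.normal(0, 5, size=n)
Y = 700.0 - 2.0 * X + u          # true beta1 = -2.0

# OLS estimates.
Xd = X - X.mean()
beta1_hat = (Xd * Y).sum() / (Xd ** 2).sum()
beta0_hat = Y.mean() - beta1_hat * X.mean()
u_hat = Y - beta0_hat - beta1_hat * X

# Heteroskedasticity-robust variance estimator from the formula above.
v_hat = Xd * u_hat
sigma2_hat = (1 / n) * ((v_hat ** 2).sum() / (n - 2)) / ((Xd ** 2).mean() ** 2)
se = np.sqrt(sigma2_hat)

t = beta1_hat / se                                   # test of H0: beta1 = 0
ci = (beta1_hat - 1.96 * se, beta1_hat + 1.96 * se)  # 95% confidence interval
```

Because the simulated errors here happen to be homoskedastic, the robust SE is still valid; in large samples it agrees with the homoskedasticity-only formula discussed later.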
So far, $\beta_1$ has been called a "slope," but that doesn't make sense if X is binary.
OLS regression:
$$\widehat{TestScore} = \underset{(1.3)}{650.0} + \underset{(1.8)}{7.4}\,D$$
Tabulation of group means:
Class Size         Average score ($\bar Y$)   Std. dev. ($s_Y$)   N
Small (STR < 20)   657.4                      19.4                238
Large (STR ≥ 20)   650.0                      17.9                182
What…?
Consequences of homoskedasticity
Implication for computing standard errors
Example: hetero/homoskedasticity in the case of a binary
regressor (that is, the comparison of means)
Standard error when group variances are unequal:

$$SE = \sqrt{\frac{s_s^2}{n_s} + \frac{s_l^2}{n_l}}$$

Standard error when group variances are equal:

$$SE = s_p\sqrt{\frac{1}{n_s} + \frac{1}{n_l}}, \quad \text{where } s_p^2 = \frac{(n_s - 1)s_s^2 + (n_l - 1)s_l^2}{n_s + n_l - 2}$$

(SW, Sect. 3.6)
$s_p^2$ = "pooled estimator of $\sigma^2$" when $\sigma_l^2 = \sigma_s^2$
Equal group variances = homoskedasticity
Unequal group variances = heteroskedasticity
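Plugging the group statistics tabulated earlier (small classes: s = 19.4, n = 238; large classes: s = 17.9, n = 182) into the two formulas gives a quick numerical sketch:

```python
from math import sqrt

# Group statistics from the table of class-size means above.
s_s, n_s = 19.4, 238   # small classes (STR < 20)
s_l, n_l = 17.9, 182   # large classes (STR >= 20)

# Unequal group variances (heteroskedasticity-robust):
se_unequal = sqrt(s_s ** 2 / n_s + s_l ** 2 / n_l)   # -> about 1.83

# Equal group variances (pooled):
sp2 = ((n_s - 1) * s_s ** 2 + (n_l - 1) * s_l ** 2) / (n_s + n_l - 2)
se_pooled = sqrt(sp2) * sqrt(1 / n_s + 1 / n_l)      # -> about 1.85
```

The robust value (about 1.83, which rounds to the 1.8 reported under the coefficient on D) and the pooled value are close here because the two group variances are similar.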
Homoskedasticity in a picture:
Heteroskedasticity in a picture:
Heteroskedastic or homoskedastic?
The class size data:
Heteroskedastic or homoskedastic?
So far we have (without saying so) assumed
that u might be heteroskedastic.
Recall the three least squares assumptions:
1. E(u|X = x) = 0
2. (Xi,Yi), i =1,…,n, are i.i.d.
3. Large outliers are rare
What if the errors are in fact homoskedastic?
You can prove that OLS has the lowest variance among
estimators that are linear in Y… a result called the Gauss-
Markov theorem that we will return to shortly.
The formula for the variance of $\hat\beta_1$ and the OLS standard errors simplifies. Homoskedasticity-only standard error formula:

$$SE(\hat\beta_1) = \sqrt{\frac{1}{n} \times \frac{\tfrac{1}{n-2}\sum_{i=1}^{n} \hat u_i^2}{\tfrac{1}{n}\sum_{i=1}^{n} (X_i - \bar X)^2}}.$$

This formula is valid only if the errors are homoskedastic (it is not, in general, valid under heteroskedasticity).
The two formulas coincide (when n is large) in the special
case of homoskedasticity
So, you should always use heteroskedasticity-robust standard
errors.
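A small simulation illustrates why: when the error variance depends on X, the homoskedasticity-only formula and the robust formula disagree. The data-generating process below is an illustrative assumption:

```python
import numpy as np

# Heteroskedastic design: the error spread grows with the distance of X
# from its center, so var(u|X) is correlated with (X - Xbar)^2.
rng = np.random.default_rng(1)
n = 2000
X = rng.uniform(0, 10, size=n)
u = rng.normal(0, 1, size=n) * (0.5 + np.abs(X - 5))
Y = 1.0 + 2.0 * X + u            # true beta1 = 2.0

Xd = X - X.mean()
beta1_hat = (Xd * Y).sum() / (Xd ** 2).sum()
u_hat = Y - Y.mean() - beta1_hat * Xd    # OLS residuals

# Homoskedasticity-only SE:
s_u2 = (u_hat ** 2).sum() / (n - 2)
se_homo = np.sqrt(s_u2 / (Xd ** 2).sum())

# Heteroskedasticity-robust SE:
v_hat = Xd * u_hat
se_robust = np.sqrt((1 / n) * ((v_hat ** 2).sum() / (n - 2)) / ((Xd ** 2).mean() ** 2))
```

Under this design the robust SE is noticeably larger than the homoskedasticity-only SE, which understates the sampling uncertainty; with truly homoskedastic errors the two converge as n grows.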
Some Additional Theoretical
Foundations of OLS (Section 5.5)
We have already learned a very great deal about OLS: OLS is
unbiased and consistent; we have a formula for
heteroskedasticity-robust standard errors; and we can construct
confidence intervals and test statistics.
Still, some of you may have further questions:
Is this really a good reason to use OLS? Aren’t there other
estimators that might be better – in particular, ones that might
have a smaller variance?
Also, whatever happened to our old friend, the Student t
distribution?
The Extended Least Squares
Assumptions
These consist of the three LS assumptions, plus two more:
1. E(u|X = x) = 0.
2. (Xi,Yi), i =1,…,n, are i.i.d.
3. Large outliers are rare ($E(Y^4) < \infty$, $E(X^4) < \infty$).
4. u is homoskedastic.
5. u is distributed $N(0, \sigma^2)$.
Assumptions 4 and 5 are more restrictive – so they apply to
fewer cases in practice. However, if you make these
assumptions, then certain mathematical calculations simplify
and you can prove strong results – results that hold if these
additional assumptions are true.
We start with a discussion of the efficiency of OLS
Efficiency of OLS, part I: The
Gauss-Markov Theorem
Under extended LS assumptions 1–4 (the basic three, plus homoskedasticity), $\hat\beta_1$ has the smallest variance among all linear estimators of $\beta_1$.
Comments
The GM theorem is proven in SW Appendix 5.2
The Gauss-Markov Theorem, ctd.
$\hat\beta_1$ is a linear estimator; that is, it can be written as a linear function of $Y_1, \dots, Y_n$:

$$\hat\beta_1 - \beta_1 = \frac{\sum_{i=1}^{n}(X_i - \bar X)u_i}{\sum_{i=1}^{n}(X_i - \bar X)^2} = \frac{1}{n}\sum_{i=1}^{n} w_i u_i, \quad \text{where } w_i = \frac{X_i - \bar X}{\tfrac{1}{n}\sum_{j=1}^{n}(X_j - \bar X)^2}.$$

The G-M theorem says that among all possible choices of $\{w_i\}$, the OLS weights yield the smallest $\mathrm{var}(\hat\beta_1)$.
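The weight representation can be checked numerically; the simulated data and parameter values below are illustrative:

```python
import numpy as np

# Verify beta1_hat - beta1 = (1/n) * sum_i w_i * u_i with the OLS weights.
rng = np.random.default_rng(2)
n = 500
X = rng.normal(0, 1, size=n)
u = rng.normal(0, 1, size=n)
beta0, beta1 = 1.0, 3.0
Y = beta0 + beta1 * X + u

Xd = X - X.mean()
beta1_hat = (Xd * Y).sum() / (Xd ** 2).sum()

w = Xd / (Xd ** 2).mean()        # OLS weights w_i
lhs = beta1_hat - beta1
rhs = (w * u).sum() / n
assert abs(lhs - rhs) < 1e-10    # identical up to floating-point error
```

The two sides agree exactly (up to rounding) because the algebra above is an identity, not an approximation.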
Efficiency of OLS, part II:
$$\hat\beta_1 - \beta_1 = \frac{\sum_{i=1}^{n}(X_i - \bar X)u_i}{\sum_{i=1}^{n}(X_i - \bar X)^2} = \frac{1}{n}\sum_{i=1}^{n} w_i u_i, \quad \text{where } w_i = \frac{X_i - \bar X}{\tfrac{1}{n}\sum_{j=1}^{n}(X_j - \bar X)^2}.$$