
Introduction to Linear Regression

(SW Chapter 4)
Empirical problem: Class size and educational output
Policy question: What is the effect of reducing class
size by one student per class? by 8 students/class?
What is the right output (performance) measure?
parent satisfaction
student personal development
future adult welfare
future adult earnings
performance on standardized tests
What do data say about class sizes and test scores?
The California Test Score Data Set
All K-6 and K-8 California school districts (n = 420)
Variables:
5th grade test scores (Stanford-9 achievement test,
combined math and reading), district average
Student-teacher ratio (STR) = no. of students in the
district divided by no. of full-time equivalent teachers
An initial look at the California test score data:
Do districts with smaller classes (lower STR) have higher test
scores?
The class size/test score policy question:
What is the effect on test scores of reducing STR by
one student/class?
Object of policy interest: $\Delta \text{Test score} / \Delta STR$
This is the slope of the line relating test score and STR.
This suggests that we want to draw a line through the
Test Score v. STR scatterplot, but how?
Some Notation and Terminology
(Sections 4.1 and 4.2)
The population regression line:
$$\text{Test Score} = \beta_0 + \beta_1 STR$$
$\beta_1$ = slope of population regression line
= $\Delta \text{Test score} / \Delta STR$
= change in test score for a unit change in STR
Why are $\beta_0$ and $\beta_1$ "population" parameters?
We would like to know the population value of $\beta_1$.
We don't know $\beta_1$, so must estimate it using data.
How can we estimate $\beta_0$ and $\beta_1$ from data?
Recall that $\bar{Y}$ was the least squares estimator of $\mu_Y$: $\bar{Y}$ solves
$$\min_m \sum_{i=1}^{n} (Y_i - m)^2$$
By analogy, we will focus on the least squares ("ordinary least squares" or "OLS") estimator of the
unknown parameters $\beta_0$ and $\beta_1$, which solves
$$\min_{b_0, b_1} \sum_{i=1}^{n} [Y_i - (b_0 + b_1 X_i)]^2$$
The OLS estimator solves:
$$\min_{b_0, b_1} \sum_{i=1}^{n} [Y_i - (b_0 + b_1 X_i)]^2$$
The OLS estimator minimizes the average squared
difference between the actual values of $Y_i$ and the
prediction (predicted value) based on the estimated line.
This minimization problem can be solved using
calculus (App. 4.2).
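As a rough illustration of what solving this minimization produces, the sketch below uses the standard closed-form solution: the slope is the sum of cross-deviations of X and Y divided by the sum of squared deviations of X, and the intercept is the mean of Y minus the slope times the mean of X. The function name `ols_fit` and the five data points are hypothetical, chosen only for illustration; they are not the California data.

```python
def ols_fit(x, y):
    """Return (b0_hat, b1_hat) minimizing sum((y_i - (b0 + b1 * x_i)) ** 2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations of x
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1_hat = sxy / sxx
    b0_hat = y_bar - b1_hat * x_bar
    return b0_hat, b1_hat


def sum_squared_residuals(x, y, b0, b1):
    """Objective function that OLS minimizes."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical student-teacher ratios and test scores (illustration only)
str_ratio = [15.0, 17.0, 19.0, 21.0, 23.0]
test_score = [668.0, 660.0, 655.0, 652.0, 645.0]

b0_hat, b1_hat = ols_fit(str_ratio, test_score)
ssr_at_ols = sum_squared_residuals(str_ratio, test_score, b0_hat, b1_hat)
# Any other line should do no better on the same objective
ssr_nearby = sum_squared_residuals(str_ratio, test_score, b0_hat, b1_hat + 0.1)
print(b0_hat, b1_hat, ssr_at_ols, ssr_nearby)
```

Comparing `ssr_at_ols` with `ssr_nearby` is a quick sanity check that moving the slope away from the OLS value raises the sum of squared residuals.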
The result is the OLS estimators of $\beta_0$ and $\beta_1$.
Why use OLS, rather than some other estimator?
OLS is a generalization of the sample average: if the
"line" is just an intercept (no X), then the OLS
estimator is just the sample average of $Y_1, \ldots, Y_n$ ($\bar{Y}$).
Like $\bar{Y}$, the OLS estimator has some desirable
properties: under certain assumptions, it is unbiased
(that is, $E(\hat\beta_1) = \beta_1$), and it has a tighter sampling
distribution than some other candidate estimators of
$\beta_1$ (more on this later)
Importantly, this is what everyone uses: the common
"language" of linear regression.
Application to the California Test Score - Class Size data
Estimated slope = $\hat\beta_1$ = -2.28
Estimated intercept = $\hat\beta_0$ = 698.9
Estimated regression line:
$$\widehat{TestScore} = 698.9 - 2.28 \times STR$$
Interpretation of the estimated slope and intercept
$$\widehat{TestScore} = 698.9 - 2.28 \times STR$$
Districts with one more student per teacher on average
have test scores that are 2.28 points lower.
That is, $\Delta \text{Test score} / \Delta STR = -2.28$
The intercept (taken literally) means that, according to
this estimated line, districts with zero students per
teacher would have a (predicted) test score of 698.9.
This interpretation of the intercept makes no sense: it
extrapolates the line outside the range of the data. In
this application, the intercept is not itself
economically meaningful.
Predicted values & residuals:
One of the districts in the data set is Antelope, CA, for
which STR = 19.33 and Test Score = 657.8
predicted value: $\hat{Y}_{Antelope} = 698.9 - 2.28 \times 19.33 = 654.8$
residual: $\hat{u}_{Antelope} = 657.8 - 654.8 = 3.0$
OLS regression: STATA output
regress testscr str, robust

Regression with robust standard errors          Number of obs =     420
                                                F(  1,   418) =   19.26
                                                Prob > F      =  0.0000
                                                R-squared     =  0.0512
                                                Root MSE      =  18.581
-------------------------------------------------------------------------
         |               Robust
 testscr |      Coef.   Std. Err.      t    P>|t|   [95% Conf. Interval]
---------+---------------------------------------------------------------
     str |  -2.279808   .5194892    -4.39   0.000   -3.300945  -1.258671
   _cons |    698.933   10.36436    67.44   0.000    678.5602   719.3057
-------------------------------------------------------------------------

$$\widehat{TestScore} = 698.9 - 2.28 \times STR$$
(we'll discuss the rest of this output later)
The OLS regression line is an estimate, computed using
our sample of data; a different sample would have given
a different value of $\hat\beta_1$.
How can we:
quantify the sampling uncertainty associated with $\hat\beta_1$?
use $\hat\beta_1$ to test hypotheses such as $\beta_1 = 0$?
construct a confidence interval for $\beta_1$?
Like estimation of the mean, we proceed in four steps:
1. The probability framework for linear regression
2. Estimation
3. Hypothesis Testing
4. Confidence intervals
1. Probability Framework for Linear Regression
Population
population of interest (ex: all possible school districts)
Random variables: Y, X
Ex: (Test Score, STR)
Joint distribution of (Y, X)
The key feature is that we suppose there is a linear
relation in the population that relates X and Y; this linear
relation is the "population linear regression"
The Population Linear Regression Model (Section 4.3)
$$Y_i = \beta_0 + \beta_1 X_i + u_i, \quad i = 1, \ldots, n$$
X is the independent variable or regressor
Y is the dependent variable
$\beta_0$ = intercept
$\beta_1$ = slope
$u_i$ = "error term"
The error term consists of omitted factors, or possibly
measurement error in the measurement of Y. In
general, these omitted factors are other factors that
influence Y, other than the variable X.
Ex.: The population regression line and the error term
What are some of the omitted factors in this example?
Data and sampling
The population objects ("parameters") $\beta_0$ and $\beta_1$ are
unknown; so to draw inferences about these unknown
parameters we must collect relevant data.
Simple random sampling:
Choose n entities at random from the population of
interest, and observe (record) X and Y for each entity
Simple random sampling implies that $\{(X_i, Y_i)\}$, $i = 1, \ldots, n$,
are independently and identically distributed (i.i.d.).
(Note: $(X_i, Y_i)$ are distributed independently of $(X_j, Y_j)$ for
different observations i and j.)
Task at hand: to characterize the sampling distribution of
the OLS estimator. To do so, we make three
assumptions:
The Least Squares Assumptions
1. The conditional distribution of u given X has mean
zero, that is, $E(u|X = x) = 0$.
2. $(X_i, Y_i)$, $i = 1, \ldots, n$, are i.i.d.
3. X and u have four moments, that is:
$E(X^4) < \infty$ and $E(u^4) < \infty$.
We'll discuss these assumptions in order.
Least squares assumption #1: $E(u|X = x) = 0$.
For any given value of X, the mean of u is zero
Example: Assumption #1 and the class size example
$$Test Score_i = \beta_0 + \beta_1 STR_i + u_i, \quad u_i = \text{other factors}$$
"Other factors:"
parental involvement
outside learning opportunities (extra math class, ...)
home environment conducive to reading
family income is a useful proxy for many such factors
So $E(u|X=x) = 0$ means $E(\text{Family Income}|STR)$ = constant
(which implies that family income and STR are
uncorrelated). This assumption is not innocuous! We
will return to it often.
Least squares assumption #2:
$(X_i, Y_i)$, $i = 1, \ldots, n$ are i.i.d.
This arises automatically if the entity (individual, district)
is sampled by simple random sampling: the entity is
selected then, for that entity, X and Y are observed
(recorded).
The main place we will encounter non-i.i.d. sampling is
when data are recorded over time ("time series data");
this will introduce some extra complications.
Least squares assumption #3:
$E(X^4) < \infty$ and $E(u^4) < \infty$
Because $Y_i = \beta_0 + \beta_1 X_i + u_i$, assumption #3 can
equivalently be stated as $E(X^4) < \infty$ and $E(Y^4) < \infty$.
Assumption #3 is generally plausible. A finite domain of
the data implies finite fourth moments. (Standardized
test scores automatically satisfy this; STR, family income,
etc. satisfy this too).
1. The probability framework for linear regression
2. Estimation: the Sampling Distribution of $\hat\beta_1$ (Section 4.4)
3. Hypothesis Testing
4. Confidence intervals
Like $\bar{Y}$, $\hat\beta_1$ has a sampling distribution.
What is $E(\hat\beta_1)$? (where is it centered)
What is $\mathrm{var}(\hat\beta_1)$? (measure of sampling uncertainty)
What is its sampling distribution in small samples?
What is its sampling distribution in large samples?
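The sampling distribution of $\hat\beta_1$: some algebra:
$$Y_i = \beta_0 + \beta_1 X_i + u_i$$
$$\bar{Y} = \beta_0 + \beta_1 \bar{X} + \bar{u}$$
so $Y_i - \bar{Y} = \beta_1 (X_i - \bar{X}) + (u_i - \bar{u})$
Thus,
$$\hat\beta_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})[\beta_1 (X_i - \bar{X}) + (u_i - \bar{u})]}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$$
$$= \beta_1 \frac{\sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})}{\sum_{i=1}^{n} (X_i - \bar{X})^2} + \frac{\sum_{i=1}^{n} (X_i - \bar{X})(u_i - \bar{u})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$$
so
$$\hat\beta_1 - \beta_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(u_i - \bar{u})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$$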
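We can simplify this formula by noting that:
$$\sum_{i=1}^{n} (X_i - \bar{X})(u_i - \bar{u}) = \sum_{i=1}^{n} (X_i - \bar{X}) u_i - \left[\sum_{i=1}^{n} (X_i - \bar{X})\right] \bar{u} = \sum_{i=1}^{n} (X_i - \bar{X}) u_i$$
(the bracketed sum is zero because deviations from the sample mean sum to zero).
Thus
$$\hat\beta_1 - \beta_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X}) u_i}{\sum_{i=1}^{n} (X_i - \bar{X})^2} = \frac{\frac{1}{n} \sum_{i=1}^{n} v_i}{\left(\frac{n-1}{n}\right) s_X^2}$$
where $v_i = (X_i - \bar{X}) u_i$.
We now can calculate the mean and variance of $\hat\beta_1$:
$$E(\hat\beta_1 - \beta_1) = E\left[\frac{\frac{1}{n} \sum_{i=1}^{n} v_i}{\left(\frac{n-1}{n}\right) s_X^2}\right] = \left(\frac{n}{n-1}\right) E\left[\frac{1}{n} \sum_{i=1}^{n} \frac{v_i}{s_X^2}\right] = \left(\frac{n}{n-1}\right) \frac{1}{n} \sum_{i=1}^{n} E\left(\frac{v_i}{s_X^2}\right)$$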
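Now $E(v_i / s_X^2) = E[(X_i - \bar{X}) u_i / s_X^2] = 0$
because $E(u_i|X_i = x) = 0$ (for details see App. 4.3)
Thus,
$$E(\hat\beta_1 - \beta_1) = \left(\frac{n}{n-1}\right) \frac{1}{n} \sum_{i=1}^{n} E\left(\frac{v_i}{s_X^2}\right) = 0$$
so
$$E(\hat\beta_1) = \beta_1$$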
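A small Monte Carlo sketch can make this result concrete. It assumes a hypothetical population with $\beta_0 = 700$, $\beta_1 = -2$, X drawn from N(20, 2^2), and u drawn from N(0, 10^2) independently of X, so that $E(u|X = x) = 0$ holds by construction. None of these values come from the California data, and the function names are illustrative only.

```python
import random


def ols_slope(x, y):
    """OLS slope: sum of cross-deviations over sum of squared deviations of x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


def simulate_slopes(beta0, beta1, n, reps, seed=0):
    """Draw `reps` independent samples of size n and return the OLS slope from each."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(reps):
        x = [rng.gauss(20.0, 2.0) for _ in range(n)]
        # u is drawn independently of x, so its conditional mean given x is zero
        u = [rng.gauss(0.0, 10.0) for _ in range(n)]
        y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]
        slopes.append(ols_slope(x, y))
    return slopes


true_beta1 = -2.0
slopes = simulate_slopes(beta0=700.0, beta1=true_beta1, n=50, reps=2000)
mean_slope = sum(slopes) / len(slopes)
print(mean_slope)  # should be close to the true slope
```

Individual estimates vary from sample to sample, but their average across many samples should sit close to the true slope, which is what unbiasedness means.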
That is, $\hat\beta_1$ is an unbiased estimator of $\beta_1$.
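Calculation of the variance of $\hat\beta_1$:
$$\hat\beta_1 - \beta_1 = \frac{\frac{1}{n} \sum_{i=1}^{n} v_i}{\left(\frac{n-1}{n}\right) s_X^2}$$
This calculation is simplified by supposing that n is
large (so that $s_X^2$ can be replaced by $\sigma_X^2$); the result is
$$\mathrm{var}(\hat\beta_1) = \frac{\mathrm{var}(v)}{n (\sigma_X^2)^2}$$
(For details see App. 4.3.)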
The exact sampling distribution is complicated, but when
the sample size is large we get some simple (and good)
approximations:
(1) Because $\mathrm{var}(\hat\beta_1) \propto 1/n$ and $E(\hat\beta_1) = \beta_1$, $\hat\beta_1 \xrightarrow{p} \beta_1$
(2) When n is large, the sampling distribution of $\hat\beta_1$ is
well approximated by a normal distribution (CLT)
$$\hat\beta_1 - \beta_1 = \frac{\frac{1}{n} \sum_{i=1}^{n} v_i}{\left(\frac{n-1}{n}\right) s_X^2}$$
When n is large:
$v_i = (X_i - \bar{X}) u_i \approx (X_i - \mu_X) u_i$, which is i.i.d. (why?) and
has two moments, that is, $\mathrm{var}(v_i) < \infty$ (why?). Thus
$\frac{1}{n} \sum_{i=1}^{n} v_i$ is distributed $N(0, \mathrm{var}(v)/n)$ when n is large
$s_X^2$ is approximately equal to $\sigma_X^2$ when n is large
$\frac{n-1}{n} = 1 - \frac{1}{n} \approx 1$ when n is large
Putting these together we have:
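Large-n approximation to the distribution of $\hat\beta_1$:
$$\hat\beta_1 - \beta_1 = \frac{\frac{1}{n} \sum_{i=1}^{n} v_i}{\left(\frac{n-1}{n}\right) s_X^2} \approx \frac{\frac{1}{n} \sum_{i=1}^{n} v_i}{\sigma_X^2},$$
which is approximately distributed $N\left(0, \frac{\sigma_v^2}{n (\sigma_X^2)^2}\right)$.
Because $v_i = (X_i - \bar{X}) u_i \approx (X_i - \mu_X) u_i$, we can write this as:
$\hat\beta_1$ is approximately distributed $N\left(\beta_1, \frac{\mathrm{var}[(X_i - \mu_X) u_i]}{n \sigma_X^4}\right)$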
Recall the summary of the sampling distribution of $\bar{Y}$:
For $(Y_1, \ldots, Y_n)$ i.i.d. with $0 < \sigma_Y^2 < \infty$,
The exact (finite sample) sampling distribution of $\bar{Y}$
has mean $\mu_Y$ ("$\bar{Y}$ is an unbiased estimator of $\mu_Y$") and
variance $\sigma_Y^2 / n$
Other than its mean and variance, the exact
distribution of $\bar{Y}$ is complicated and depends on the
distribution of Y
$\bar{Y} \xrightarrow{p} \mu_Y$ (law of large numbers)
$\frac{\bar{Y} - E(\bar{Y})}{\sqrt{\mathrm{var}(\bar{Y})}}$ is approximately distributed N(0,1) (CLT)
Parallel conclusions hold for the OLS estimator $\hat\beta_1$:
Under the three Least Squares Assumptions,
The exact (finite sample) sampling distribution of $\hat\beta_1$
has mean $\beta_1$ ("$\hat\beta_1$ is an unbiased estimator of $\beta_1$"), and
$\mathrm{var}(\hat\beta_1)$ is inversely proportional to n.
Other than its mean and variance, the exact
distribution of $\hat\beta_1$ is complicated and depends on the
distribution of (X, u)
$\hat\beta_1 \xrightarrow{p} \beta_1$ (law of large numbers)
$\frac{\hat\beta_1 - E(\hat\beta_1)}{\sqrt{\mathrm{var}(\hat\beta_1)}}$ is approximately distributed N(0,1) (CLT)
1. The probability framework for linear regression
2. Estimation
3. Hypothesis Testing (Section 4.5)
4. Confidence intervals
Suppose a skeptic suggests that reducing the number of
students in a class has no effect on learning or,
specifically, test scores. The skeptic thus asserts the
hypothesis,
$$H_0: \beta_1 = 0$$
We wish to test this hypothesis using data, and reach a
tentative conclusion whether it is correct or incorrect.
Null hypothesis and two-sided alternative:
$H_0: \beta_1 = 0$ vs. $H_1: \beta_1 \neq 0$
or, more generally,
$H_0: \beta_1 = \beta_{1,0}$ vs. $H_1: \beta_1 \neq \beta_{1,0}$
where $\beta_{1,0}$ is the hypothesized value under the null.
Null hypothesis and one-sided alternative:
$H_0: \beta_1 = \beta_{1,0}$ vs. $H_1: \beta_1 < \beta_{1,0}$
In economics, it is almost always possible to come up
with stories in which an effect could "go either way," so
it is standard to focus on two-sided alternatives.
Recall hypothesis testing for population mean using $\bar{Y}$:
$$t = \frac{\bar{Y} - \mu_{Y,0}}{s_Y / \sqrt{n}}$$
then reject the null hypothesis if $|t| > 1.96$.
Here the denominator is the SE of the estimator, that is, the square root of an
estimator of the variance of the estimator.
Applied to a hypothesis about $\beta_1$:
$$t = \frac{\text{estimator} - \text{hypothesized value}}{\text{standard error of the estimator}}$$
so
$$t = \frac{\hat\beta_1 - \beta_{1,0}}{SE(\hat\beta_1)}$$
where $\beta_{1,0}$ is the value of $\beta_1$ hypothesized under the null
(for example, if the null value is zero, then $\beta_{1,0} = 0$).
What is $SE(\hat\beta_1)$?
$SE(\hat\beta_1)$ = the square root of an estimator of the
variance of the sampling distribution of $\hat\beta_1$
Recall the expression for the variance of $\hat\beta_1$ (large n):
$$\mathrm{var}(\hat\beta_1) = \frac{\mathrm{var}[(X_i - \mu_X) u_i]}{n (\sigma_X^2)^2} = \frac{\sigma_v^2}{n \sigma_X^4}$$
where $v_i = (X_i - \mu_X) u_i$. Estimator of the variance of $\hat\beta_1$:
$$\hat\sigma^2_{\hat\beta_1} = \frac{1}{n} \times \frac{\text{estimator of } \sigma_v^2}{(\text{estimator of } \sigma_X^2)^2} = \frac{1}{n} \times \frac{\frac{1}{n-2} \sum_{i=1}^{n} (X_i - \bar{X})^2 \hat{u}_i^2}{\left[\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2\right]^2}$$
OK, this is a bit nasty, but:
There is no reason to memorize this
It is computed automatically by regression software
$SE(\hat\beta_1) = \sqrt{\hat\sigma^2_{\hat\beta_1}}$ is reported by regression software
It is less complicated than it seems. The numerator
estimates var(v), the denominator estimates
$[\mathrm{var}(X)]^2$.
Return to calculation of the t-statistic:
$$t = \frac{\hat\beta_1 - \beta_{1,0}}{SE(\hat\beta_1)} = \frac{\hat\beta_1 - \beta_{1,0}}{\sqrt{\hat\sigma^2_{\hat\beta_1}}}$$
Reject at 5% significance level if $|t| > 1.96$
p-value is $p = \Pr[|t| > |t^{act}|]$ = probability in tails of
normal outside $|t^{act}|$
Both the previous statements are based on large-n
approximation; typically n = 50 is large enough for
the approximation to be excellent.
Example: Test Scores and STR, California data
Estimated regression line:
$$\widehat{TestScore} = 698.9 - 2.28 \times STR$$
Regression software reports the standard errors:
$SE(\hat\beta_0) = 10.4$, $SE(\hat\beta_1) = 0.52$
t-statistic testing $\beta_{1,0} = 0$:
$$t = \frac{\hat\beta_1 - \beta_{1,0}}{SE(\hat\beta_1)} = \frac{-2.28 - 0}{0.52} = -4.38$$
The 1% two-sided critical value is 2.58, and $|t| = 4.38 > 2.58$, so we reject
the null at the 1% significance level.
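The arithmetic of this test can be sketched in a few lines. The estimate and standard error are the rounded values quoted in the example; the helper name `two_sided_p_value` is hypothetical, and the p-value relies on the large-n standard normal approximation, computed here with the complementary error function.

```python
import math


def two_sided_p_value(t):
    """P(|Z| > |t|) for a standard normal Z, using erfc(|t| / sqrt(2))."""
    return math.erfc(abs(t) / math.sqrt(2.0))


beta1_hat = -2.28   # estimated slope from the example
se_beta1 = 0.52     # its reported standard error
beta1_null = 0.0    # hypothesized value under the null

t_stat = (beta1_hat - beta1_null) / se_beta1   # about -4.38
p_value = two_sided_p_value(t_stat)            # on the order of 1e-5

reject_at_5pct = abs(t_stat) > 1.96
reject_at_1pct = abs(t_stat) > 2.58
print(t_stat, p_value, reject_at_5pct, reject_at_1pct)
```

Comparing the absolute t-statistic with a critical value and comparing the p-value with the significance level are two views of the same decision.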
Alternatively, we can compute the p-value.
The p-value based on the large-n standard normal
approximation to the t-statistic is 0.00001 ($10^{-5}$)
1. The probability framework for linear regression
2. Estimation
3. Hypothesis Testing
4. Confidence intervals (Section 4.6)
In general, if the sampling distribution of an estimator is
normal for large n, then a 95% confidence interval can be
constructed as estimator ± 1.96 × standard error.
So: a 95% confidence interval for $\beta_1$ is
$$\{\hat\beta_1 \pm 1.96 \times SE(\hat\beta_1)\}$$
Example: Test Scores and STR, California data
Estimated regression line:
$$\widehat{TestScore} = 698.9 - 2.28 \times STR$$
$SE(\hat\beta_0) = 10.4$, $SE(\hat\beta_1) = 0.52$
95% confidence interval for $\beta_1$:
$$\{\hat\beta_1 \pm 1.96 \times SE(\hat\beta_1)\} = \{-2.28 \pm 1.96 \times 0.52\} = (-3.30, -1.26)$$
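The interval can be reproduced with a short sketch, using the rounded estimate and standard error from the example. The function name `confidence_interval_95` is hypothetical, and the 1.96 multiplier assumes the large-n normal approximation described earlier.

```python
def confidence_interval_95(estimate, std_error):
    """Large-sample 95% confidence interval: estimate +/- 1.96 * standard error."""
    half_width = 1.96 * std_error
    return estimate - half_width, estimate + half_width


lower, upper = confidence_interval_95(-2.28, 0.52)
# The interval excludes zero exactly when |t| > 1.96 for the null value 0
excludes_zero = not (lower <= 0.0 <= upper)
print(round(lower, 2), round(upper, 2), excludes_zero)
```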
Equivalent statements:
The 95% confidence interval does not include zero;
The hypothesis $\beta_1 = 0$ is rejected at the 5% level
A convention for reporting estimated regressions:
Put standard errors in parentheses below the estimates
$$\widehat{TestScore} = \underset{(10.4)}{698.9} - \underset{(0.52)}{2.28} \times STR$$
This expression means that:
The estimated regression line is
$\widehat{TestScore} = 698.9 - 2.28 \times STR$
The standard error of $\hat\beta_0$ is 10.4
The standard error of $\hat\beta_1$ is 0.52
OLS regression: STATA output
regress testscr str, robust

Regression with robust standard errors          Number of obs =     420
                                                F(  1,   418) =   19.26
                                                Prob > F      =  0.0000
                                                R-squared     =  0.0512
                                                Root MSE      =  18.581
-------------------------------------------------------------------------
         |               Robust
 testscr |      Coef.   Std. Err.      t    P>|t|   [95% Conf. Interval]
---------+---------------------------------------------------------------
     str |  -2.279808   .5194892    -4.38   0.000   -3.300945  -1.258671
   _cons |    698.933   10.36436    67.44   0.000    678.5602   719.3057
-------------------------------------------------------------------------

so:
$$\widehat{TestScore} = \underset{(10.4)}{698.9} - \underset{(0.52)}{2.28} \times STR$$
t ($\beta_1 = 0$) = -4.38, p-value = 0.000
95% conf. interval for $\beta_1$ is (-3.30, -1.26)
Regression when X is Binary (Section 4.7)
Sometimes a regressor is binary:
X = 1 if female, = 0 if male
X = 1 if treated (experimental drug), = 0 if not
X = 1 if small class size, = 0 if not
So far, $\beta_1$ has been called a "slope," but that doesn't
make much sense if X is binary.
How do we interpret regression with a binary regressor?
$Y_i = \beta_0 + \beta_1 X_i + u_i$, where X is binary ($X_i$ = 0 or 1):
When $X_i = 0$: $Y_i = \beta_0 + u_i$
When $X_i = 1$: $Y_i = \beta_0 + \beta_1 + u_i$
thus:
When $X_i = 0$, the mean of $Y_i$ is $\beta_0$
When $X_i = 1$, the mean of $Y_i$ is $\beta_0 + \beta_1$
that is:
$E(Y_i|X_i = 0) = \beta_0$
$E(Y_i|X_i = 1) = \beta_0 + \beta_1$
so:
$\beta_1 = E(Y_i|X_i = 1) - E(Y_i|X_i = 0)$
= population difference in group means
Example: TestScore and STR, California data
Let
$$D_i = \begin{cases} 1 & \text{if } STR_i < 20 \\ 0 & \text{if } STR_i \geq 20 \end{cases}$$
The OLS estimate of the regression line relating
TestScore to D (with standard errors in parentheses) is:
$$\widehat{TestScore} = \underset{(1.3)}{650.0} + \underset{(1.8)}{7.4} D$$
Difference in means between groups = 7.4;
SE = 1.8, t = 7.4/1.8 ≈ 4.0 (from unrounded values)
Compare the regression results with the group means,
computed directly:

Class Size          Average score ($\bar{Y}$)   Std. dev. ($s_Y$)   N
Small (STR < 20)    657.4                 19.4             238
Large (STR ≥ 20)    650.0                 17.9             182

Estimation: $\bar{Y}_{small} - \bar{Y}_{large}$ = 657.4 - 650.0 = 7.4
Test $\Delta = 0$:
$$t = \frac{\bar{Y}_s - \bar{Y}_l}{SE(\bar{Y}_s - \bar{Y}_l)} = \frac{7.4}{1.83} = 4.05$$
95% confidence interval = {7.4 ± 1.96 × 1.83} = (3.8, 11.0)
This is the same as in the regression!
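The equivalence between the regression on a binary regressor and the difference in group means can be checked numerically. The sketch below uses five hypothetical observations (not the California data) and an illustrative helper, `ols_fit`, that applies the usual closed-form OLS solution.

```python
def ols_fit(x, y):
    """Closed-form OLS intercept and slope for a single regressor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


# Hypothetical data: d = 1 marks a "small class" district, d = 0 a "large class" one
d = [1, 1, 1, 0, 0]
score = [660.0, 655.0, 657.0, 650.0, 648.0]

group1 = [s for s, di in zip(score, d) if di == 1]
group0 = [s for s, di in zip(score, d) if di == 0]
mean1 = sum(group1) / len(group1)
mean0 = sum(group0) / len(group0)

b0_hat, b1_hat = ols_fit(d, score)
# The intercept matches the d = 0 group mean; the slope matches the difference in means
print(b0_hat, mean0)
print(b1_hat, mean1 - mean0)
```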

$$\widehat{TestScore} = \underset{(1.3)}{650.0} + \underset{(1.8)}{7.4} D$$
Summary: regression when $X_i$ is binary (0/1)
$$Y_i = \beta_0 + \beta_1 X_i + u_i$$
$\beta_0$ = mean of Y given that X = 0
$\beta_0 + \beta_1$ = mean of Y given that X = 1
$\beta_1$ = difference in group means, X = 1 minus X = 0
$SE(\hat\beta_1)$ has the usual interpretation
t-statistics, confidence intervals constructed as usual
This is another way to do difference-in-means
analysis
The regression formulation is especially useful when
we have additional regressors (coming up soon!)
Other Regression Statistics (Section 4.8)
A natural question is how well the regression line "fits"
or explains the data. There are two regression statistics
that provide complementary measures of the quality of
fit:
The regression $R^2$ measures the fraction of the
variance of Y that is explained by X; it is unitless and
ranges between zero (no fit) and one (perfect fit)
The standard error of the regression measures the fit
(the typical size of a regression residual) in the units
of Y.
The $R^2$
Write $Y_i$ as the sum of the OLS prediction + OLS
residual:
$$Y_i = \hat{Y}_i + \hat{u}_i$$
The $R^2$ is the fraction of the sample variance of $Y_i$
"explained" by the regression, that is, by $\hat{Y}_i$:
$$R^2 = \frac{ESS}{TSS},$$
where $ESS = \sum_{i=1}^{n} (\hat{Y}_i - \bar{\hat{Y}})^2$ and $TSS = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$.
The $R^2$:
$R^2 = 0$ means ESS = 0, so X explains none of the
variation of Y
$R^2 = 1$ means ESS = TSS, so $Y = \hat{Y}$, so X explains all of
the variation of Y
$0 \leq R^2 \leq 1$
For regression with a single regressor (the case here),
$R^2$ is the square of the correlation coefficient between
X and Y
The Standard Error of the Regression (SER)
The standard error of the regression is (almost) the
sample standard deviation of the OLS residuals:
$$SER = \sqrt{\frac{1}{n-2} \sum_{i=1}^{n} (\hat{u}_i - \bar{\hat{u}})^2} = \sqrt{\frac{1}{n-2} \sum_{i=1}^{n} \hat{u}_i^2}$$
(the second equality holds because $\frac{1}{n} \sum_{i=1}^{n} \hat{u}_i = 0$).
The SER:
has the units of u, which are the units of Y
measures the spread of the distribution of u
measures the average "size" of the OLS residual (the
average "mistake" made by the OLS regression line)
The root mean squared error (RMSE) is closely
related to the SER:
$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \hat{u}_i^2}$$
This measures the same thing as the SER; the minor
difference is division by n instead of n - 2.
Technical note: why divide by n - 2 instead of n - 1?
$$SER = \sqrt{\frac{1}{n-2} \sum_{i=1}^{n} \hat{u}_i^2}$$
Division by n - 2 is a "degrees of freedom" correction
like division by n - 1 in $s_Y^2$; the difference is that, in the
SER, two parameters have been estimated ($\beta_0$ and $\beta_1$, by
$\hat\beta_0$ and $\hat\beta_1$), whereas in $s_Y^2$ only one has been estimated
($\mu_Y$, by $\bar{Y}$).
When n is large, it makes negligible difference whether
n, n - 1, or n - 2 is used, although the conventional
formula uses n - 2 when there is a single regressor.
For details, see Section 15.4
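The fit measures defined above can be computed directly from the OLS residuals. The following sketch uses five hypothetical data points (not the California data); the function name `fit_statistics` is illustrative only.

```python
import math


def fit_statistics(x, y):
    """Return (r_squared, ser, rmse) for an OLS fit of y on x with an intercept."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar

    fitted = [b0 + b1 * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    fitted_bar = sum(fitted) / n

    ess = sum((fi - fitted_bar) ** 2 for fi in fitted)  # explained sum of squares
    tss = sum((yi - y_bar) ** 2 for yi in y)            # total sum of squares
    ssr = sum(ui ** 2 for ui in resid)                  # sum of squared residuals

    r_squared = ess / tss
    ser = math.sqrt(ssr / (n - 2))   # degrees-of-freedom correction: n - 2
    rmse = math.sqrt(ssr / n)        # no degrees-of-freedom correction
    return r_squared, ser, rmse


x = [15.0, 17.0, 19.0, 21.0, 23.0]
y = [668.0, 660.0, 655.0, 652.0, 645.0]
r_squared, ser, rmse = fit_statistics(x, y)
print(r_squared, ser, rmse)
```

With only five observations the gap between the SER and the RMSE is noticeable; it shrinks as n grows.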
Example of $R^2$ and SER
$$\widehat{TestScore} = \underset{(10.4)}{698.9} - \underset{(0.52)}{2.28} \times STR, \quad R^2 = .05, \; SER = 18.6$$
The slope coefficient is statistically significant and large
in a policy sense, even though STR explains only a small
fraction of the variation in test scores.
A Practical Note: Heteroskedasticity,
Homoskedasticity, and the Formula for the Standard
Errors of $\hat\beta_0$ and $\hat\beta_1$ (Section 4.9)
What do these two terms mean?
Consequences of homoskedasticity
Implication for computing standard errors
What do these two terms mean?
If $\mathrm{var}(u|X=x)$ is constant, that is, the variance of the
conditional distribution of u given X does not depend on
X, then u is said to be homoskedastic. Otherwise, u is
said to be heteroskedastic.
Homoskedasticity in a picture:
$E(u|X=x) = 0$ (u satisfies Least Squares Assumption #1)
The variance of u does not change with (depend on) x
Heteroskedasticity in a picture:
$E(u|X=x) = 0$ (u satisfies Least Squares Assumption #1)
The variance of u depends on x, so u is
heteroskedastic.
A real-world example of heteroskedasticity from labor
economics: average hourly earnings vs. years of
education (data source: 1999 Current Population Survey)
[Figure: Scatterplot and OLS Regression Line. Vertical axis: average hourly earnings; horizontal axis: years of education; series: average hourly earnings, fitted values.]
Is heteroskedasticity present in the class size data?
Hard to say... looks nearly homoskedastic, but the spread
might be tighter for large values of STR.
So far we have (without saying so) allowed u to be
heteroskedastic:
Recall the three least squares assumptions:
1. The conditional distribution of u given X has mean
zero, that is, $E(u|X = x) = 0$.
2. $(X_i, Y_i)$, $i = 1, \ldots, n$, are i.i.d.
3. X and u have four finite moments.
Heteroskedasticity and homoskedasticity concern
$\mathrm{var}(u|X=x)$. Because we have not explicitly assumed
homoskedastic errors, we have implicitly allowed for
heteroskedasticity.
What if the errors are in fact homoskedastic?
You can prove some theorems about OLS (in
particular, the Gauss-Markov theorem, which says
that OLS is the estimator with the lowest variance
among all unbiased estimators that are linear functions of
$(Y_1, \ldots, Y_n)$; see Section 15.5).
The formula for the variance of $\hat\beta_1$ and the OLS
standard error simplifies (App. 4.4): If $\mathrm{var}(u_i|X_i=x) = \sigma_u^2$, then
$$\mathrm{var}(\hat\beta_1) = \frac{\mathrm{var}[(X_i - \mu_X) u_i]}{n (\sigma_X^2)^2} = \ldots = \frac{\sigma_u^2}{n \sigma_X^2}$$
Note: $\mathrm{var}(\hat\beta_1)$ is inversely proportional to var(X):
more spread in X means more information about $\beta_1$.
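General formula for the standard error of $\hat\beta_1$ is the square root of:
$$\hat\sigma^2_{\hat\beta_1} = \frac{1}{n} \times \frac{\frac{1}{n-2} \sum_{i=1}^{n} (X_i - \bar{X})^2 \hat{u}_i^2}{\left[\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2\right]^2}.$$
Special case under homoskedasticity:
$$\hat\sigma^2_{\hat\beta_1} = \frac{1}{n} \times \frac{\frac{1}{n-2} \sum_{i=1}^{n} \hat{u}_i^2}{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2}.$$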
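The two variance formulas translate directly into code. The sketch below evaluates both on a small hypothetical data set in which the spread of the errors grows with X, so the two standard errors differ; the data, the "true" line y = 1 + 2x, and the function name `slope_standard_errors` are assumptions made for illustration only.

```python
import math


def slope_standard_errors(x, y):
    """Return (robust_se, homoskedasticity_only_se) for the OLS slope."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dx = [xi - x_bar for xi in x]
    sxx = sum(d ** 2 for d in dx)
    b1 = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / sxx
    b0 = y_bar - b1 * x_bar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

    var_x_hat = sxx / n  # (1/n) * sum of squared deviations of x

    # General (heteroskedasticity-robust) formula
    num_robust = sum(d ** 2 * u ** 2 for d, u in zip(dx, resid)) / (n - 2)
    robust_var = (1.0 / n) * num_robust / var_x_hat ** 2

    # Homoskedasticity-only special case
    num_homo = sum(u ** 2 for u in resid) / (n - 2)
    homo_var = (1.0 / n) * num_homo / var_x_hat

    return math.sqrt(robust_var), math.sqrt(homo_var)


# Hypothetical data: error magnitudes increase with x (heteroskedastic by design)
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
errors = [0.1, -0.1, 0.2, -0.2, 1.0, -1.0, 2.0, -2.0]
y = [1.0 + 2.0 * xi + ei for xi, ei in zip(x, errors)]

robust_se, homo_se = slope_standard_errors(x, y)
print(robust_se, homo_se)
```

In this constructed example the homoskedasticity-only formula understates the standard error, because the largest residuals occur where X is farthest from its mean.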
Sometimes it is said that the lower formula is simpler.
The homoskedasticity-only formula for the standard error
of $\hat\beta_1$ and the "heteroskedasticity-robust" formula (the
formula that is valid under heteroskedasticity) differ; in
general, you get different standard errors using the
different formulas.
Homoskedasticity-only standard errors are the
default setting in regression software, and
sometimes the only setting (e.g. Excel). To get
the general "heteroskedasticity-robust"
standard errors you must override the default.
If you don't override the default and there is in fact
heteroskedasticity, you will get the wrong standard errors
(and wrong t-statistics and confidence intervals).
The critical points:
If the errors are homoskedastic and you use the
heteroskedastic formula for standard errors (the one
we derived), you are OK
If the errors are heteroskedastic and you use the
homoskedasticity-only formula for standard errors,
the standard errors are wrong.
The two formulas coincide (when n is large) in the
special case of homoskedasticity
The bottom line: you should always use the
heteroskedasticity-based formulas; these are
conventionally called the heteroskedasticity-robust
standard errors.
Heteroskedasticity-robust standard errors in STATA
regress testscr str, robust

Regression with robust standard errors          Number of obs =     420
                                                F(  1,   418) =   19.26
                                                Prob > F      =  0.0000
                                                R-squared     =  0.0512
                                                Root MSE      =  18.581
-------------------------------------------------------------------------
         |               Robust
 testscr |      Coef.   Std. Err.      t    P>|t|   [95% Conf. Interval]
---------+---------------------------------------------------------------
     str |  -2.279808   .5194892    -4.39   0.000   -3.300945  -1.258671
   _cons |    698.933   10.36436    67.44   0.000    678.5602   719.3057
-------------------------------------------------------------------------

Use the ", robust" option!!!
Summary and Assessment (Section 4.10)
The initial policy question:
Suppose new teachers are hired so the student-
teacher ratio falls by one student per class. What
is the effect of this policy intervention (this
"treatment") on test scores?
Does our regression analysis give a convincing answer?
Not really: districts with low STR tend to be ones
with lots of other resources and higher income
families, which provide kids with more learning
opportunities outside school. Because these omitted
factors raise test scores and go with low STR, this suggests that
$\mathrm{corr}(u_i, STR_i) < 0$, so $E(u_i|X_i) \neq 0$.
Digression on Causality
The original question (what is the quantitative effect of
an intervention that reduces class size?) is a question
about a causal effect: the effect on Y of applying a unit
of the treatment is $\beta_1$.
But what is, precisely, a causal effect?
The common-sense definition of causality isn't
precise enough for our purposes.
In this course, we define a causal effect as the effect
that is measured in an ideal randomized controlled
experiment.
Ideal Randomized Controlled Experiment
Ideal: subjects all follow the treatment protocol
(perfect compliance, no errors in reporting, etc.)!
Randomized: subjects from the population of interest
are randomly assigned to a treatment or control group
(so there are no confounding factors)
Controlled: having a control group permits
measuring the differential effect of the treatment
Experiment: the treatment is assigned as part of the
experiment: the subjects have no choice, which
means that there is no "reverse causality" in which
subjects choose the treatment they think will work
best.
Back to class size:
What is an ideal randomized controlled experiment for
measuring the effect on Test Score of reducing STR?
How does our regression analysis of observational data
differ from this ideal?
o The treatment is not randomly assigned
o In the US, in our observational data, districts with
higher family incomes are likely to have both
smaller classes and higher test scores.
o As a result it is plausible that $E(u_i|X_i=x) \neq 0$.
o If so, Least Squares Assumption #1 does not hold.
o If so, $\hat\beta_1$ is biased: does an omitted factor make
class size seem more important than it really is?
