Fixed Effects Regression: Estimation

Three estimation methods:
1. “n-1 binary regressors” OLS regression
2. “Entity-demeaned” OLS regression
3. “Changes” specification (only works for T = 2)
• These three methods produce identical estimates
of the regression coefficients, and identical
standard errors.
• We already did the “changes” specification (1988
minus 1982) – but this only works for T = 2 years
• Methods #1 and #2 work for general T
• Method #1 is only practical when n isn’t too big
1. “n-1 binary regressors” OLS regression
$Y_{it} = \beta_0 + \beta_1 X_{it} + \gamma_2 D2_i + \dots + \gamma_n Dn_i + u_{it}$   (1)
where $D2_i = 1$ for $i = 2$ (state #2) and $D2_i = 0$ otherwise, etc.
• First create the binary variables $D2_i, \dots, Dn_i$
• Then estimate (1) by OLS
• Inference (hypothesis tests, confidence intervals)
is as usual (using heteroskedasticity-robust
standard errors)
• This is impractical when n is very large (for
example if n = 1000 workers)
2. “Entity-demeaned” OLS regression
The fixed effects regression model:
$Y_{it} = \beta_1 X_{it} + \alpha_i + u_{it}$
The state averages satisfy:
$\frac{1}{T}\sum_{t=1}^{T} Y_{it} = \alpha_i + \beta_1 \frac{1}{T}\sum_{t=1}^{T} X_{it} + \frac{1}{T}\sum_{t=1}^{T} u_{it}$
Deviation from state averages:
$Y_{it} - \frac{1}{T}\sum_{t=1}^{T} Y_{it} = \beta_1 \left( X_{it} - \frac{1}{T}\sum_{t=1}^{T} X_{it} \right) + \left( u_{it} - \frac{1}{T}\sum_{t=1}^{T} u_{it} \right)$
Entity-demeaned OLS regression, ctd.
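$Y_{it} - \frac{1}{T}\sum_{t=1}^{T} Y_{it} = \beta_1 \left( X_{it} - \frac{1}{T}\sum_{t=1}^{T} X_{it} \right) + \left( u_{it} - \frac{1}{T}\sum_{t=1}^{T} u_{it} \right)$
or
$\tilde{Y}_{it} = \beta_1 \tilde{X}_{it} + \tilde{u}_{it}$
where $\tilde{Y}_{it} = Y_{it} - \frac{1}{T}\sum_{t=1}^{T} Y_{it}$ and $\tilde{X}_{it} = X_{it} - \frac{1}{T}\sum_{t=1}^{T} X_{it}$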
• For i = 1 and t = 1982, $\tilde{Y}_{it}$ is the difference between the fatality rate in Alabama in 1982 and its average value in Alabama over all 7 years.
Entity-demeaned OLS regression, ctd.
$\tilde{Y}_{it} = \beta_1 \tilde{X}_{it} + \tilde{u}_{it}$   (2)
where $\tilde{Y}_{it} = Y_{it} - \frac{1}{T}\sum_{t=1}^{T} Y_{it}$, etc.
• First construct the demeaned variables $\tilde{Y}_{it}$ and $\tilde{X}_{it}$
• Then estimate (2) by regressing $\tilde{Y}_{it}$ on $\tilde{X}_{it}$ using OLS
• Inference (hypothesis tests, confidence intervals)
is as usual (using heteroskedasticity-robust
standard errors)
• This is like the “changes” approach, except that $Y_{it}$ is deviated from the state average instead of from $Y_{i1}$.
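A minimal numerical sketch of why methods #1 and #2 give the same slope. The panel below is simulated (the sizes, the true slope of 2.0, and all variable names are assumptions made for illustration, not the fatality data), and X is built to be correlated with the fixed effect so that ignoring the fixed effect would bias the estimate.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 6, 5                                   # hypothetical panel: 6 entities, 5 periods
entity = np.repeat(np.arange(n), T)           # entity index for each (i, t) row
alpha = rng.normal(size=n)                    # entity fixed effects alpha_i
x = rng.normal(size=n * T) + alpha[entity]    # X correlated with alpha_i
y = 2.0 * x + alpha[entity] + rng.normal(scale=0.5, size=n * T)

# Method 1: OLS on an intercept, X, and the n-1 binary regressors D2,...,Dn
dummies = (entity[:, None] == np.arange(1, n)).astype(float)
X1 = np.column_stack([np.ones(n * T), x, dummies])
beta_dummy = np.linalg.lstsq(X1, y, rcond=None)[0][1]   # coefficient on X

# Method 2: subtract each entity's time average, then OLS with no intercept
def demean_by(v, groups, k):
    """Return v minus the mean of v within each of the k groups."""
    means = np.bincount(groups, weights=v, minlength=k) / np.bincount(groups, minlength=k)
    return v - means[groups]

x_t = demean_by(x, entity, n)
y_t = demean_by(y, entity, n)
beta_demeaned = (x_t @ y_t) / (x_t @ x_t)

print(beta_dummy, beta_demeaned)   # the two slopes agree up to rounding error
```

The demeaned regression never builds the n-1 dummy columns, which is why it stays practical when n is large.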
Example, ctd.
For n = 48, T = 7:
FatalityRate = –0.66 BeerTax + State fixed effects
(standard error on BeerTax: 0.20)
• Should you report the intercept?
• How many binary regressors would you include
to estimate this using the “binary regressor”
method?
• Compare slope, standard error to the estimate for the 1988 v. 1982 “changes” specification (T = 2, n = 48):
$FR_{1988} - FR_{1982} = -0.072 - 1.04\,(BeerTax_{1988} - BeerTax_{1982})$
(standard errors: 0.065 on the intercept, 0.36 on the slope)
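A small sketch of how the “changes” specification relates to demeaning when T = 2. The data are simulated and every name and number is an assumption for illustration. Two comparisons are made: a changes regression without an intercept against entity-demeaned OLS, and a changes regression with an intercept (as in the beer tax example) against OLS on data demeaned by both entity and period, since with T = 2 the intercept in the changes regression absorbs a common shift between the two periods.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 48                                        # entities; columns are the T = 2 periods
alpha = rng.normal(size=n)                    # entity fixed effects
x = rng.normal(size=(n, 2)) + alpha[:, None]
y = -1.0 * x + alpha[:, None] + rng.normal(scale=0.3, size=(n, 2))

# "Changes" specification, no intercept: regress dY on dX
dx, dy = x[:, 1] - x[:, 0], y[:, 1] - y[:, 0]
beta_changes = (dx @ dy) / (dx @ dx)

# Entity-demeaned OLS on the same data
x_t = x - x.mean(axis=1, keepdims=True)
y_t = y - y.mean(axis=1, keepdims=True)
beta_demeaned = (x_t * y_t).sum() / (x_t ** 2).sum()

# "Changes" specification with an intercept
Z = np.column_stack([np.ones(n), dx])
beta_changes_int = np.linalg.lstsq(Z, dy, rcond=None)[0][1]

# Entity- and period-demeaned OLS (balanced panel)
x_tt = x_t - x_t.mean(axis=0, keepdims=True)
y_tt = y_t - y_t.mean(axis=0, keepdims=True)
beta_twoway = (x_tt * y_tt).sum() / (x_tt ** 2).sum()

print(beta_changes, beta_demeaned)        # equal up to rounding
print(beta_changes_int, beta_twoway)      # equal up to rounding
```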
Regression with Time Fixed Effects
(SW Section 8.4)
An omitted variable might vary over time but not
across states:
• Safer cars (air bags, etc.); changes in national
laws
• These produce intercepts that change over time
• Let these changes (“safer cars”) be denoted by the variable $S_t$, which changes over time but not across states.
• The resulting population regression model is:
$Y_{it} = \beta_0 + \beta_1 X_{it} + \beta_2 Z_i + \beta_3 S_t + u_{it}$
Time fixed effects only
$Y_{it} = \beta_0 + \beta_1 X_{it} + \beta_3 S_t + u_{it}$
In effect, the intercept varies from one year to the next:
$Y_{i,1982} = \beta_0 + \beta_1 X_{i,1982} + \beta_3 S_{1982} + u_{i,1982}$
$\phantom{Y_{i,1982}} = (\beta_0 + \beta_3 S_{1982}) + \beta_1 X_{i,1982} + u_{i,1982}$
or
$Y_{i,1982} = \mu_{1982} + \beta_1 X_{i,1982} + u_{i,1982}$, where $\mu_{1982} = \beta_0 + \beta_3 S_{1982}$
Similarly,
$Y_{i,1983} = \mu_{1983} + \beta_1 X_{i,1983} + u_{i,1983}$, where $\mu_{1983} = \beta_0 + \beta_3 S_{1983}$
etc.
Two formulations for time fixed effects
1. “Binary regressor” formulation:
$Y_{it} = \beta_0 + \beta_1 X_{it} + \delta_2 B2_t + \dots + \delta_T BT_t + u_{it}$
where $B2_t = 1$ when $t = 2$ (year #2) and $B2_t = 0$ otherwise, etc.
2. “Time effects” formulation:
$Y_{it} = \beta_1 X_{it} + \mu_t + u_{it}$
Time fixed effects: estimation methods
1. “T-1 binary regressors” OLS regression
$Y_{it} = \beta_0 + \beta_1 X_{it} + \delta_2 B2_t + \dots + \delta_T BT_t + u_{it}$
• Create binary variables B2,…,BT
• B2 = 1 if t = year #2, = 0 otherwise
• Regress Y on X, B2,…,BT using OLS
• Where’s B1?
2. “Year-demeaned” OLS regression
• Deviate $Y_{it}$, $X_{it}$ from year (not state) averages
• Estimate by OLS using “year-demeaned” data
State and Time Fixed Effects
$Y_{it} = \beta_0 + \beta_1 X_{it} + \beta_2 Z_i + \beta_3 S_t + u_{it}$
1. “Binary regressor” formulation:
$Y_{it} = \beta_0 + \beta_1 X_{it} + \gamma_2 D2_i + \dots + \gamma_n Dn_i + \delta_2 B2_t + \dots + \delta_T BT_t + u_{it}$
2. “State and time effects” formulation:
$Y_{it} = \beta_1 X_{it} + \alpha_i + \mu_t + u_{it}$
State and time effects: estimation methods
1. “n-1 and T-1 binary regressors” OLS regression
• Create binary variables D2,…,Dn
• Create binary variables B2,…,BT
• Regress Y on X, D2,…,Dn, B2,…,BT using
OLS
• What about D1 and B1?
2. “State- and year-demeaned” OLS regression
• Deviate $Y_{it}$, $X_{it}$ from year and state averages
• Estimate by OLS using “year- and state-demeaned” data
These two methods can be combined too.
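A hedged sketch of the three routes on a simulated balanced panel (sizes, the true slope of 1.5, and all names are assumptions for illustration): the full binary-regressor regression, OLS on data demeaned by both state and year, and one possible combination, namely state-demeaned data with year dummies. The double-demeaning formula used here relies on the panel being balanced.

```python
import numpy as np

rng = np.random.default_rng(2)
n, T = 8, 5                                   # hypothetical balanced panel
entity = np.repeat(np.arange(n), T)           # rows ordered entity by entity
year = np.tile(np.arange(T), n)
alpha, mu = rng.normal(size=n), rng.normal(size=T)
x = rng.normal(size=n * T) + alpha[entity] + mu[year]
y = 1.5 * x + alpha[entity] + mu[year] + rng.normal(scale=0.5, size=n * T)

D = (entity[:, None] == np.arange(1, n)).astype(float)   # D2,...,Dn
B = (year[:, None] == np.arange(1, T)).astype(float)     # B2,...,BT
ones = np.ones(n * T)

def slope_on_first(regressors, response):
    """OLS coefficient on the first column of `regressors`."""
    return np.linalg.lstsq(regressors, response, rcond=None)[0][0]

# 1. n-1 and T-1 binary regressors
b_dummies = slope_on_first(np.column_stack([x, ones, D, B]), y)

# 2. State- and year-demeaned data: v_it - mean_i - mean_t + overall mean
def two_way_demean(v):
    m = v.reshape(n, T)
    return (m - m.mean(axis=1, keepdims=True)
              - m.mean(axis=0, keepdims=True) + m.mean()).ravel()

xd, yd = two_way_demean(x), two_way_demean(y)
b_demeaned = (xd @ yd) / (xd @ xd)

# 3. Combination: state-demeaned data plus year dummies
def state_demean(v):
    m = v.reshape(n, T)
    return (m - m.mean(axis=1, keepdims=True)).ravel()

b_combined = slope_on_first(np.column_stack([state_demean(x), ones, B]),
                            state_demean(y))

print(b_dummies, b_demeaned, b_combined)   # all three agree up to rounding
```

The combined route is convenient when n is large but T is small: the many state effects are removed by demeaning and only T-1 dummies are created.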

Some Theory: The Fixed Effects Regression
Assumptions (SW App. 8.2)
For a single X:
$Y_{it} = \beta_1 X_{it} + \alpha_i + u_{it}$, i = 1,…,n, t = 1,…,T
1. $E(u_{it} | X_{i1}, \dots, X_{iT}, \alpha_i) = 0$.
2. $(X_{i1}, \dots, X_{iT}, Y_{i1}, \dots, Y_{iT})$, i = 1,…,n, are i.i.d. draws from their joint distribution.
3. $(X_{it}, u_{it})$ have finite fourth moments.
4. There is no perfect multicollinearity (multiple X’s).
5. $corr(u_{it}, u_{is} | X_{it}, X_{is}, \alpha_i) = 0$ for t ≠ s.
Assumptions 3 and 4 are identical to those for multiple regression; 1 and 2 differ; 5 is new.
Assumption #1: $E(u_{it} | X_{i1}, \dots, X_{iT}, \alpha_i) = 0$
• $u_{it}$ has mean zero, given the state fixed effect and the entire history of the X’s for that state
• This is an extension of the previous multiple regression Assumption #1
• This means there are no omitted lagged effects (any lagged effects of X must enter explicitly)
• Also, there is no feedback from u to future X:
o Whether a state has a particularly high fatality rate this year doesn’t subsequently affect whether it increases the beer tax.
o We’ll return to this when we take up time series data.
Assumption #2: $(X_{i1}, \dots, X_{iT}, Y_{i1}, \dots, Y_{iT})$, i = 1,…,n, are i.i.d. draws from their joint distribution.
• This is an extension of Assumption #2 for
multiple regression with cross-section data
• This is satisfied if entities (states, individuals)
are randomly sampled from their population by
simple random sampling, then data for those
entities are collected over time.
• This does not require observations to be i.i.d.
over time for the same entity – that would be
unrealistic (whether a state has a mandatory
DWI sentencing law this year is strongly related
to whether it will have that law next year).
Assumption #5: $corr(u_{it}, u_{is} | X_{it}, X_{is}, \alpha_i) = 0$ for t ≠ s
• This is new.
• This says that (given X), the error terms are
uncorrelated over time within a state.
• For example, $u_{CA,1982}$ and $u_{CA,1983}$ are uncorrelated
• Is this plausible? What enters the error term?
o Especially snowy winter
o Opening major new divided highway
o Fluctuations in traffic density from local economic conditions
• Assumption #5 requires these omitted factors entering $u_{it}$ to be uncorrelated over time, within a state.
What if Assumption #5 fails: $corr(u_{it}, u_{is} | X_{it}, X_{is}, \alpha_i) \neq 0$?
• A useful analogy is heteroskedasticity.
• OLS panel data estimators of $\beta_1$ are still unbiased and consistent
• The OLS standard errors will be wrong – usually the OLS standard errors understate the true uncertainty
• Intuition: if $u_{it}$ is correlated over time, you don’t have as much information (as much random variation) as you would were $u_{it}$ uncorrelated.
• This problem is solved by using “heteroskedasticity and autocorrelation-consistent standard errors” – we return to this when we focus on time series regression
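A rough simulation sketch of the understatement. The slides do not give a formula for the corrected standard errors; the sketch assumes one common choice for panels, standard errors clustered by entity, which allow arbitrary correlation of the errors within an entity over time. The AR(1) design, the parameter values, and all names are assumptions for illustration, and no degrees-of-freedom corrections are applied.

```python
import numpy as np

rng = np.random.default_rng(3)
n, T, rho = 200, 20, 0.9          # hypothetical panel with strong serial correlation

def ar1(shape, rho, rng):
    """Simulate one AR(1) series per row, started from its stationary distribution."""
    e = rng.normal(size=shape)
    out = np.empty(shape)
    out[:, 0] = e[:, 0] / np.sqrt(1 - rho ** 2)
    for t in range(1, shape[1]):
        out[:, t] = rho * out[:, t - 1] + e[:, t]
    return out

alpha = rng.normal(size=(n, 1))               # entity fixed effects
x = ar1((n, T), rho, rng) + alpha             # regressor persistent over time
u = ar1((n, T), rho, rng)                     # errors violate Assumption #5
y = 1.0 * x + alpha + u

# Entity-demeaned OLS
x_t = x - x.mean(axis=1, keepdims=True)
y_t = y - y.mean(axis=1, keepdims=True)
sxx = (x_t ** 2).sum()
beta = (x_t * y_t).sum() / sxx
resid = y_t - beta * x_t
score = x_t * resid                           # x~_it * u^_it for each (i, t)

# Heteroskedasticity-robust: treats every (i, t) term as uncorrelated
se_hr = np.sqrt((score ** 2).sum()) / sxx

# Clustered by entity: sums the terms within each entity before squaring
se_cluster = np.sqrt((score.sum(axis=1) ** 2).sum()) / sxx

print(beta, se_hr, se_cluster)
```

In this design the clustered standard error comes out larger than the heteroskedasticity-robust one, in line with the intuition that serially correlated errors carry less independent information.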
Application: Drunk Driving Laws and Traffic
Deaths
(SW Section 8.5)
Some facts
• Approx. 40,000 traffic fatalities annually in the
U.S.
• 1/3 of traffic fatalities involve a drinking driver
• 25% of drivers on the road between 1am and
3am have been drinking (estimate)
• A drunk driver is 13 times as likely to cause a
fatal crash as a non-drinking driver (estimate)
Drunk driving laws and traffic deaths, ctd.
Public policy issues
• Drunk driving causes massive externalities
(sober drivers are killed, etc. etc.) – there is
ample justification for governmental
intervention
• Are there any effective ways to reduce drunk
driving? If so, what?
• What are the effects of specific laws:
o mandatory punishment
o minimum legal drinking age
o economic interventions (alcohol taxes)
The drunk driving panel data set
n = 48 U.S. states, T = 7 years (1982,…,1988)
(balanced)
Variables
• Traffic fatality rate (deaths per 10,000
residents)
• Tax on a case of beer (Beertax)
• Minimum legal drinking age
• Minimum sentencing laws for first DWI violation:
o Mandatory Jail
o Mandatory Community Service
o otherwise, sentence will just be a monetary fine
• Vehicle miles per driver (US DOT)
• State economic data (real per capita income, etc.)
Why might panel data help?
• Potential OV bias from variables that vary across states but are constant over time:
o culture of drinking and driving
o quality of roads
o vintage of autos on the road
Use state fixed effects
• Potential OV bias from variables that vary over time but are constant across states:
o improvements in auto safety over time
o changing national attitudes towards drunk driving
Use time fixed effects
Empirical Analysis: Main Results
• Sign of beer tax coefficient changes when fixed
state effects are included
• Fixed time effects are statistically significant but
do not have big impact on the estimated
coefficients
• Estimated effect of beer tax drops when other laws are included as regressors
• The only policy variable that seems to have an
impact is the tax on beer – not minimum drinking
age, not mandatory sentencing, etc.
• The other economic variables have plausibly large
coefficients: more income, more driving, more
deaths
Extensions of the “n-1 binary regressor”
approach
The idea of using many binary indicators to
eliminate omitted variable bias can be extended to
non-panel data – the key is that the omitted variable is constant for a group of observations, so that in effect each group has its own intercept.
Example: Class size problem.
Suppose funding and curricular issues are
determined at the county level, and each county
has several districts. Resulting omitted variable
bias could be addressed by including binary
indicators, one for each county (omit one to
avoid perfect multicollinearity).
Summary: Regression with Panel Data
(SW Section 8.6)
Advantages and limitations of fixed effects regression
Advantages
• You can control for unobserved variables that:
o vary across states but not over time, and/or
o vary over time but not across states
• More observations give you more information
• Estimation involves relatively straightforward extensions of multiple regression
• Fixed effects estimation can be done three
ways:
1. “Changes” method when T = 2
2. “n-1 binary regressors” method when n is
small
3. “Entity-demeaned” regression
• Similar methods apply to regression with
time fixed effects and to both time and state
fixed effects
• Statistical inference: like multiple regression.
Limitations/challenges
• Need variation in X over time within states
• Time lag effects can be important
• Standard errors might be too low (errors might be correlated over time)
