Ani Katchova
Policy analysis with pooled cross sections
• Pooled cross sections are two or more independently sampled cross
sections in different time periods. They are not necessarily the same
units between the periods.
• Pooled cross sections can be used to evaluate the impact of a
treatment, event, program, or policy change.
• The difference-in-differences model involves before and after
comparisons in natural experiments to determine the effect of a
treatment.
The difference-in-differences model
• The difference-in-differences model (DID model) shows the effect of a
treatment in the after period (DID effect).
• A treatment is implemented for treated units ($treated = 1$). For the control units, there is no treatment ($treated = 0$).
• Data are collected in the period after the treatment ($after = 1$) and the period before the treatment ($after = 0$).
• The interaction term is $after \cdot treated$, which is equal to 1 for treated units in the after period.
• Difference-in-differences model:
$y = \beta_0 + \delta_0\, after + \beta_1\, treated + \delta_1\, after \cdot treated + u$
• The difference-in-differences effect $\delta_1$ is the effect of the treatment in the after period on the outcome $y$.
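The regression above can be sketched on simulated data. This is a minimal illustration, not the author's code: the sample size, the "true" parameter values, and the variable names are assumptions chosen for the example, and NumPy's least-squares solver stands in for any OLS routine.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4000  # hypothetical sample size

# Pooled cross sections: each observation has a group and a period indicator.
treated = rng.integers(0, 2, size=n)
after = rng.integers(0, 2, size=n)

# Hypothetical true parameters, chosen for illustration only.
beta0, delta0, beta1, delta1 = 10.0, 2.0, -1.0, 3.0
y = (beta0 + delta0 * after + beta1 * treated
     + delta1 * after * treated + rng.normal(0.0, 1.0, size=n))

# Design matrix columns: constant, after, treated, after*treated.
X = np.column_stack([np.ones(n), after, treated, after * treated])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
b0_hat, d0_hat, b1_hat, d1_hat = coef

# d1_hat is the estimated DID effect (coefficient on the interaction term).
print("estimated DID effect:", round(float(d1_hat), 3))
```

With this setup the coefficient on the interaction term should land close to the assumed value of 3.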
DID effect using two models
• Difference-in-differences model:
$y = \beta_0 + \delta_0\, after + \beta_1\, treated + \delta_1\, after \cdot treated + u$
• Estimate the regression in the before period ($after = 0$):
$y = \beta_0 + \beta_1\, treated + u$
• The coefficient $\beta_1$ shows the difference in outcomes between treated and control units in the before period.
• Estimate the regression in the after period ($after = 1$):
$y = (\beta_0 + \delta_0) + (\beta_1 + \delta_1)\, treated + u$
• The coefficient $(\beta_1 + \delta_1)$ shows the difference in outcomes between treated and control units in the after period.
• The DID effect is the difference between the two coefficients: $\delta_1 = (\beta_1 + \delta_1) - \beta_1$
• The DID effect is the treated-minus-control difference in outcomes in the after period minus the treated-minus-control difference in the before period.
DID effect using two models
• Difference-in-differences model:
$y = \beta_0 + \delta_0\, after + \beta_1\, treated + \delta_1\, after \cdot treated + u$
• Estimate the regression for the control units ($treated = 0$):
$y = \beta_0 + \delta_0\, after + u$
• The coefficient $\delta_0$ shows the difference in outcomes between after and before for the control units.
• Estimate the regression for the treated units ($treated = 1$):
$y = (\beta_0 + \beta_1) + (\delta_0 + \delta_1)\, after + u$
• The coefficient $(\delta_0 + \delta_1)$ shows the difference in outcomes between after and before for the treated units.
• The DID effect is the difference between the two coefficients: $\delta_1 = (\delta_0 + \delta_1) - \delta_0$
• The DID effect is the after-minus-before difference in outcomes for the treated units minus the after-minus-before difference for the control units.
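Both two-model routes (splitting by period, or splitting by group) can be checked on a tiny data set. The sketch below uses eight made-up observations and a hand-written simple-regression slope; the data values and helper names are hypothetical and serve only to show that the two differences of coefficients coincide.

```python
def ols_slope(x, y):
    """Slope from a simple regression of y on x with an intercept."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx

# Hypothetical toy data: (after, treated, y)
data = [
    (0, 0, 10.0), (0, 0, 12.0),   # before, control
    (0, 1, 9.0),  (0, 1, 11.0),   # before, treated
    (1, 0, 13.0), (1, 0, 15.0),   # after, control
    (1, 1, 15.0), (1, 1, 17.0),   # after, treated
]
AFTER, TREATED, Y = 0, 1, 2

def slope_on(regressor, keep):
    """Regress y on one dummy using only the rows selected by keep."""
    rows = [r for r in data if keep(r)]
    return ols_slope([r[regressor] for r in rows], [r[Y] for r in rows])

# Split by period: y on treated, before and after.
b1 = slope_on(TREATED, lambda r: r[AFTER] == 0)           # beta1
b1_plus_d1 = slope_on(TREATED, lambda r: r[AFTER] == 1)   # beta1 + delta1

# Split by group: y on after, control and treated.
d0 = slope_on(AFTER, lambda r: r[TREATED] == 0)           # delta0
d0_plus_d1 = slope_on(AFTER, lambda r: r[TREATED] == 1)   # delta0 + delta1

did_by_period = b1_plus_d1 - b1
did_by_group = d0_plus_d1 - d0
print(did_by_period, did_by_group)
```

In this toy example both routes give the same DID effect, as the algebra on the slides implies.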
DID model
• Difference-in-differences model:
$y = \beta_0 + \delta_0\, after + \beta_1\, treated + \delta_1\, after \cdot treated + u$
• Estimate this regression model. The DID effect $\delta_1$ is the effect of the treatment in the after period.
• Outcome for control units ($treated = 0$) in the before period ($after = 0$) is $\bar{y}_{0c} = \beta_0$
• Outcome for treated units ($treated = 1$) in the before period ($after = 0$) is $\bar{y}_{0t} = \beta_0 + \beta_1$
• Outcome for control units ($treated = 0$) in the after period ($after = 1$) is $\bar{y}_{1c} = \beta_0 + \delta_0$
• Outcome for treated units ($treated = 1$) in the after period ($after = 1$) is $\bar{y}_{1t} = \beta_0 + \delta_0 + \beta_1 + \delta_1$
• The DID effect is:
• $\delta_1 = (\bar{y}_{1t} - \bar{y}_{0t}) - (\bar{y}_{1c} - \bar{y}_{0c}) = (\bar{y}_{1t} - \bar{y}_{1c}) - (\bar{y}_{0t} - \bar{y}_{0c})$
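Substituting the four group outcomes into the double difference makes the cancellation explicit. This is a worked check that uses only the coefficients defined on this slide; the bar notation for group outcomes is an assumption about the original typesetting.

```latex
\begin{aligned}
(\bar{y}_{1t} - \bar{y}_{0t}) - (\bar{y}_{1c} - \bar{y}_{0c})
  &= \left[(\beta_0 + \delta_0 + \beta_1 + \delta_1) - (\beta_0 + \beta_1)\right]
     - \left[(\beta_0 + \delta_0) - \beta_0\right] \\
  &= (\delta_0 + \delta_1) - \delta_0 \\
  &= \delta_1
\end{aligned}
```

Regrouping the same four terms as treated-minus-control within each period gives $(\beta_1 + \delta_1) - \beta_1 = \delta_1$, so both orderings of the differences agree.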
DID effect
• Difference-in-differences model:
$y = \beta_0 + \delta_0\, after + \beta_1\, treated + \delta_1\, after \cdot treated + u$
• The DID effect $\delta_1$ is the after-minus-before difference in outcomes for the treated units minus the after-minus-before difference for the control units.
• Equivalently, the DID effect $\delta_1$ is the treated-minus-control difference in outcomes in the after period minus the treated-minus-control difference in the before period.
DID model example
| | After period | Before period | For houses near incinerator | For houses far from incinerator | DID regression |
|---|---|---|---|---|---|
| VARIABLES | rprice | rprice | rprice | rprice | rprice |
| nearinc | -30,688*** | -18,824*** | | | -18,824*** |
| | (5,828) | (4,745) | | | (4,875) |
| y81 | | | 6,926 | 18,790*** | 18,790*** |
| | | | (8,205) | (3,383) | (4,050) |
| y81*nearinc | | | | | -11,864 |
| | | | | | (7,457) |
| Constant | 101,308*** | 82,517*** | 63,693*** | 82,517*** | 82,517*** |
| | (3,093) | (2,654) | (5,296) | (2,278) | (2,727) |
The coefficients in the regression for the before period and in the regression for houses far from the incinerator are the same as the corresponding coefficients in the DID model.
The DID effect is -30,688 - (-18,824) = 6,926 - 18,790 = -11,864.
After the incinerator was built, prices of houses near the incinerator fell by $11,864 relative to prices of houses far from it, but the effect is not significant.
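The arithmetic linking the five regression columns can be verified directly from the reported DID coefficients. The sketch below only uses numbers from the table; the variable names are illustrative, and the 1.96 cutoff is the usual large-sample 5% critical value, assumed here because the slide does not state the sample size or test level.

```python
# Coefficients reported in the DID regression column.
beta0 = 82_517      # Constant
beta1 = -18_824     # nearinc
delta0 = 18_790     # y81
delta1 = -11_864    # y81*nearinc (DID effect)
se_delta1 = 7_457   # standard error of the DID effect

# Implied coefficients of the four split regressions.
after_nearinc = beta1 + delta1    # nearinc in the after-period regression
near_y81 = delta0 + delta1        # y81 in the near-incinerator regression
after_const = beta0 + delta0      # constant in the after-period regression
near_const = beta0 + beta1        # constant in the near-incinerator regression

# DID effect recovered two ways from the split regressions.
did_by_period = after_nearinc - beta1
did_by_group = near_y81 - delta0

# t statistic of the DID effect; 1.96 is an assumed large-sample cutoff.
t_stat = delta1 / se_delta1
print(after_nearinc, near_y81, after_const, near_const)
print(did_by_period, did_by_group, round(t_stat, 2))
```

The implied after-period constant is 101,307, one unit away from the reported 101,308, which is consistent with rounding of the displayed coefficients. The t statistic of about -1.59 is below 1.96 in absolute value, in line with the slide's statement that the effect is not significant.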
Panel data model with two periods
• Panel data have a cross-sectional dimension $i$ (person id, firm id, etc.) and a time-series dimension $t$ (year, month, etc.).
• Variables are $y_{it}$, $x_{1it}$, and $x_{2it}$.
• Example with two years of panel data (same units over two periods)
• $y_{it} = \beta_0 + \delta_0 d_t + \beta_1 x_{1it} + \beta_2 x_{2it} + a_i + u_{it}$
• $d_t$ is a dummy variable for the second period
• $a_i$ are individual-specific effects or fixed effects. They are unobserved time-constant factors that are the same for each unit over time.
• $u_{it}$ are unobserved factors (error term) that differ by units and years.
Panel data model with two periods
• Example with two years of panel data (same units over two periods)
• $y_{it} = \beta_0 + \delta_0 d_t + \beta_1 x_{1it} + \beta_2 x_{2it} + a_i + u_{it}$
• Model for first year: $y_{it} = \beta_0 + \delta_0 \cdot 0 + \beta_1 x_{1it} + \beta_2 x_{2it} + a_i + u_{it}$
• Model for second year: $y_{i(t+1)} = \beta_0 + \delta_0 \cdot 1 + \beta_1 x_{1i(t+1)} + \beta_2 x_{2i(t+1)} + a_i + u_{i(t+1)}$
• Subtract model for first year from model for second year
• $y_{i(t+1)} - y_{it} = \delta_0 + \beta_1 (x_{1i(t+1)} - x_{1it}) + \beta_2 (x_{2i(t+1)} - x_{2it}) + (u_{i(t+1)} - u_{it})$
• Using the notation for first differences: $\Delta y_{it} = y_{i(t+1)} - y_{it}$
• $\Delta y_{it} = \delta_0 + \beta_1 \Delta x_{1it} + \beta_2 \Delta x_{2it} + \Delta u_{it}$
• The first differences model does not include the individual specific effect ($a_i$) and can be estimated by OLS. The fixed effect was differenced out because it does not vary over time.
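A small simulation can illustrate why differencing matters when the fixed effect is correlated with a regressor. This is a sketch under stated assumptions, not the author's example: it uses a single regressor instead of two, and the parameter values, sample size, and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000                               # hypothetical number of units
beta0, delta0, beta1 = 1.0, 0.5, 2.0   # hypothetical true parameters

a = rng.normal(size=n)                 # unobserved fixed effect a_i
x_t = a + rng.normal(size=n)           # regressor in year t, correlated with a_i
x_t1 = a + rng.normal(size=n)          # regressor in year t+1
y_t = beta0 + beta1 * x_t + a + rng.normal(size=n)
y_t1 = beta0 + delta0 + beta1 * x_t1 + a + rng.normal(size=n)

def ols(X, y):
    """Least-squares coefficients for y = X b + error."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Pooled OLS that ignores a_i: constant, second-period dummy, x.
d = np.concatenate([np.zeros(n), np.ones(n)])
x_pool = np.concatenate([x_t, x_t1])
y_pool = np.concatenate([y_t, y_t1])
pooled = ols(np.column_stack([np.ones(2 * n), d, x_pool]), y_pool)

# First differences: a_i drops out and delta0 becomes the intercept.
dx, dy = x_t1 - x_t, y_t1 - y_t
fd = ols(np.column_stack([np.ones(n), dx]), dy)

print("pooled slope:", round(float(pooled[2]), 3))
print("FD intercept and slope:", round(float(fd[0]), 3), round(float(fd[1]), 3))
```

Under these assumptions the pooled slope is pulled away from the assumed value of 2 because $a_i$ is left in the error term, while the first-difference slope stays close to it.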
Panel data and first differences
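| nr | year | wage | hours | educ | exper | dwage | dhours | deduc | dexper |
|---|---|---|---|---|---|---|---|---|---|
| 13 | 1980 | 15.76 | 2672 | 14 | 1 | . | . | . | . |
| 13 | 1981 | 71.30 | 2320 | 14 | 2 | 55.54 | -352 | 0 | 1 |
| 17 | 1980 | 47.42 | 2484 | 13 | 4 | . | . | . | . |
| 17 | 1981 | 32.99 | 2804 | 13 | 5 | -14.43 | 320 | 0 | 1 |
| 18 | 1980 | 32.81 | 2332 | 12 | 4 | . | . | . | . |
| 18 | 1981 | 54.37 | 2116 | 12 | 5 | 21.57 | -216 | 0 | 1 |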
• Panel data for two periods: nr is person id, year is either 1980 or 1981. wage is in dollars, hours is
total hours worked for the year, educ and exper are in number of years.
• Variables are first differenced: ∆wage = dwage = wage_1981 – wage_1980.
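• The first difference of education is 0 for everyone, and the first difference of experience is 1 for everyone. Someone in the sample could have received more education in the second period, but nobody did.
• Differenced variables that have no variation across units (∆educ or ∆exper) cannot be included in the first-differenced regression: a column of 0s carries no information, and a column of 1s is perfectly collinear with the constant.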
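The first differences in the table can be reproduced from the six displayed rows. The sketch below uses plain Python and the values shown on the slide; the data structure and names are illustrative choices.

```python
# Rows from the table: (nr, year, wage, hours, educ, exper)
rows = [
    (13, 1980, 15.76, 2672, 14, 1),
    (13, 1981, 71.30, 2320, 14, 2),
    (17, 1980, 47.42, 2484, 13, 4),
    (17, 1981, 32.99, 2804, 13, 5),
    (18, 1980, 32.81, 2332, 12, 4),
    (18, 1981, 54.37, 2116, 12, 5),
]

# Group each person's two yearly records by person id.
by_person = {}
for nr, year, wage, hours, educ, exper in rows:
    by_person.setdefault(nr, {})[year] = (wage, hours, educ, exper)

# First difference: 1981 value minus 1980 value, per person and variable.
diffs = {}
for nr, years in by_person.items():
    first, second = years[1980], years[1981]
    diffs[nr] = tuple(round(s - f, 2) for f, s in zip(first, second))

for nr, (dwage, dhours, deduc, dexper) in diffs.items():
    print(nr, dwage, dhours, deduc, dexper)

# deduc and dexper take a single value each, so they have no variation.
deduc_values = {d[2] for d in diffs.values()}
dexper_values = {d[3] for d in diffs.values()}
print(deduc_values, dexper_values)
```

For person 18 the displayed wages give a difference of 21.56 rather than the 21.57 shown in the table, presumably because the wages on the slide are rounded to two decimals.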
Panel data model with two periods example
• Panel data model for wages explained by hours worked.
• Panel data model: $wage_{it} = \beta_0 + \delta_0\, d1981 + \beta_1\, hours_{it} + a_i + u_{it}$
• Model for first year: $wage_{i,1980} = \beta_0 + \delta_0 \cdot 0 + \beta_1\, hours_{i,1980} + a_i + u_{i,1980}$
• Model for second year: $wage_{i,1981} = \beta_0 + \delta_0 \cdot 1 + \beta_1\, hours_{i,1981} + a_i + u_{i,1981}$
• Note that these models cannot be estimated because of the unobservable $a_i$. Instead, models without $a_i$ were estimated.
• Panel data model with first differences: $\Delta wage_i = \delta_0 + \beta_1 \Delta hours_i + \Delta u_i$
• First differencing eliminated the individual specific effect ($a_i$) and the model can be estimated by OLS.
First difference estimation
| | Model with both periods | Model with both periods | Model for first period | Model for second period | Model with first differences |
|---|---|---|---|---|---|
| VARIABLES | wage | wage | wage | wage | dwage |
| hours | -0.004 | -0.005 | -0.002 | -0.009* | |
| | (0.003) | (0.003) | (0.004) | (0.005) | |
| d1981 | | 13.300*** | | | |
| | | (4.037) | | | |
| dhours | | | | | -0.035*** |
| | | | | | (0.005) |
| Constant | 60.289*** | 55.523*** | 48.822*** | 77.482*** | 16.602*** |
| | (6.790) | (6.913) | (7.720) | (11.575) | (3.031) |
| Observations | 1,090 | 1,090 | 545 | 545 | 545 |
| R-squared | 0.002 | 0.012 | 0.000 | 0.006 | 0.074 |
Using cross-sectional data (across workers), hours worked have no significant effect (or only a very small effect) on wages. Only in the model for the second period is there an effect: for one more hour worked, wages were lower by about 1 cent.
The model with first differences shows that for each additional hour worked in the next year, wages were 3.5 cents lower. The effect of hours worked on wages is larger and more significant over time than cross-sectionally.
The model with first differences also has a better fit (R-squared is higher, explaining 7.4% of the variation).
Review questions
• Describe the difference-in-differences approach. What variables need
to be included?
• What does the difference-in-differences effect measure?
• Describe the first difference estimator for panel data models with two
periods.