
Simple Panel Data Models

Ani Katchova

© 2020 by Ani Katchova. All rights reserved.


Outline
• Difference-in-differences model
• Panel data model with first differences estimator

Policy analysis with pooled cross sections
• Pooled cross sections are two or more independently sampled cross
sections in different time periods. They are not necessarily the same
units between the periods.
• Pooled cross sections can be used to evaluate the impact of a
treatment, event, program, or policy change.
• The difference-in-differences model involves before and after
comparisons in natural experiments to determine the effect of a
treatment.

The difference-in-differences model
• The difference-in-differences model (DID model) shows the effect of a
treatment in the after period (DID effect).
• A treatment is implemented for treated units (treated = 1). For the
control units, there is no treatment (treated = 0).
• Data are collected in the period after the treatment (after = 1) and the
period before the treatment (after = 0).
• The interaction term is treated * after, which is equal to 1 only for treated
units in the after period.
• Difference-in-differences model:
$y = \beta_0 + \delta_0\,\text{after} + \beta_1\,\text{treated} + \delta_1\,\text{after} \cdot \text{treated} + u$
• The difference-in-differences effect $\delta_1$ is the effect of the treatment in the
after period on the outcome $y$.
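
A minimal sketch of estimating this model by OLS, assuming NumPy is available. The eight observations are hypothetical (made up for illustration, two per group-period cell) and are not from the slides. Because the model has one parameter per cell, the estimated interaction coefficient equals the difference-in-differences of the four cell means.

```python
import numpy as np

# Hypothetical data (not from the slides): two observations per cell.
after   = np.array([0, 0, 0, 0, 1, 1, 1, 1])
treated = np.array([0, 0, 1, 1, 0, 0, 1, 1])
y       = np.array([10.0, 12.0, 8.0, 10.0, 14.0, 16.0, 9.0, 11.0])

# Design matrix columns: constant, after, treated, after*treated.
X = np.column_stack([np.ones_like(y), after, treated, after * treated])

# OLS via least squares; coefficients come back in column order.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
beta0, delta0, beta1, delta1 = coef

print(f"beta0={beta0:.2f} delta0={delta0:.2f} beta1={beta1:.2f} delta1={delta1:.2f}")
```

In this toy data the cell means are 11 (control, before), 9 (treated, before), 15 (control, after), and 10 (treated, after), so the DID effect is (10 - 9) - (15 - 11) = -3, which is what the interaction coefficient recovers.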

DID effect using two models
• Difference-in-differences model:
$y = \beta_0 + \delta_0\,\text{after} + \beta_1\,\text{treated} + \delta_1\,\text{after} \cdot \text{treated} + u$
• Estimate the regression in the before period (after = 0):
$y = \beta_0 + \beta_1\,\text{treated} + u$
• The coefficient $\beta_1$ shows the difference in outcomes between treated and control units
in the before period.
• Estimate the regression in the after period (after = 1):
$y = (\beta_0 + \delta_0) + (\beta_1 + \delta_1)\,\text{treated} + u$
• The coefficient $(\beta_1 + \delta_1)$ shows the difference in outcomes between treated and
control units in the after period.
• The DID effect is the difference between the two coefficients: $\delta_1 = (\beta_1 + \delta_1) - \beta_1$
• The DID effect is the treated-control difference in outcomes in the after period minus
the treated-control difference in the before period.

DID effect using two models
• Difference-in-differences model:
$y = \beta_0 + \delta_0\,\text{after} + \beta_1\,\text{treated} + \delta_1\,\text{after} \cdot \text{treated} + u$
• Estimate the regression for the control units (treated = 0):
$y = \beta_0 + \delta_0\,\text{after} + u$
• The coefficient $\delta_0$ shows the difference in outcomes between after and before for the
control units.
• Estimate the regression for the treated units (treated = 1):
$y = (\beta_0 + \beta_1) + (\delta_0 + \delta_1)\,\text{after} + u$
• The coefficient $(\delta_0 + \delta_1)$ shows the difference in outcomes between after and before for
the treated units.
• The DID effect is the difference between the two coefficients: $\delta_1 = (\delta_0 + \delta_1) - \delta_0$
• The DID effect is the after-before difference in outcomes for the treated units minus
the after-before difference for the control units.

DID model
• Difference-in-differences model:
$y = \beta_0 + \delta_0\,\text{after} + \beta_1\,\text{treated} + \delta_1\,\text{after} \cdot \text{treated} + u$
• Estimate this regression model. The DID effect $\delta_1$ is the effect of the treatment in the after
period.
• Outcome for control units (treated = 0) in the before period (after = 0) is $\bar{y}_{0c} = \beta_0$
• Outcome for treated units (treated = 1) in the before period (after = 0) is $\bar{y}_{0t} = \beta_0 + \beta_1$
• Outcome for control units (treated = 0) in the after period (after = 1) is $\bar{y}_{1c} = \beta_0 + \delta_0$
• Outcome for treated units (treated = 1) in the after period (after = 1) is $\bar{y}_{1t} = \beta_0 + \delta_0 + \beta_1 + \delta_1$
• The DID effect is:
• $\delta_1 = (\bar{y}_{1t} - \bar{y}_{0t}) - (\bar{y}_{1c} - \bar{y}_{0c}) = (\bar{y}_{1t} - \bar{y}_{1c}) - (\bar{y}_{0t} - \bar{y}_{0c})$
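
As a worked check, substituting the four cell outcomes listed above into the first double difference shows term by term why everything except $\delta_1$ cancels. The notation $\bar{y}$ for the cell outcomes is an assumption about the garbled symbol in the extracted slide.

```latex
\begin{aligned}
(\bar{y}_{1t} - \bar{y}_{0t}) - (\bar{y}_{1c} - \bar{y}_{0c})
  &= \big[(\beta_0 + \delta_0 + \beta_1 + \delta_1) - (\beta_0 + \beta_1)\big]
     - \big[(\beta_0 + \delta_0) - \beta_0\big] \\
  &= (\delta_0 + \delta_1) - \delta_0 \\
  &= \delta_1
\end{aligned}
```

Grouping by period instead gives $(\beta_1 + \delta_1) - \beta_1 = \delta_1$, the same result.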

DID effect
• Difference-in-differences model:
$y = \beta_0 + \delta_0\,\text{after} + \beta_1\,\text{treated} + \delta_1\,\text{after} \cdot \text{treated} + u$
• The DID effect $\delta_1$ is the after-before difference in outcomes for the treated units minus
the after-before difference for the control units.
• Equivalently, $\delta_1$ is the treated-control difference in outcomes in the
after period minus the treated-control difference in the before period.

Outcome                     Before (after = 0)      After (after = 1)                             After - Before
Control (c), treated = 0    $\beta_0$               $\beta_0 + \delta_0$                          $\delta_0$
Treated (t), treated = 1    $\beta_0 + \beta_1$     $\beta_0 + \delta_0 + \beta_1 + \delta_1$     $\delta_0 + \delta_1$
Treated - Control           $\beta_1$               $\beta_1 + \delta_1$                          $\delta_1$
DID model example
• The treatment is the building of a garbage incinerator (waste treatment facility), which can potentially
lower the prices of houses that are near the incinerator.
• Two periods: yr = 0 in the before period (1978) and yr = 1 in the after period (1981), after the
incinerator was built. The tables label this dummy yr81 or y81.
• Treated units are houses near the incinerator (nearinc = 1) and control units are houses far from
the incinerator (nearinc = 0).

rprice                      Before (yr81 = 0)    After (yr81 = 1)
Control (c), nearinc = 0    $82,517              $101,308
Treated (t), nearinc = 1    $63,693              $70,619

• Prices for houses near the incinerator and far from the incinerator increased in the after period.
• Prices for houses near the incinerator were already lower in the before period.
• Did building the incinerator lower house prices (rprice)?
DID effect using two models
• Difference-in-differences model:
$\text{rprice} = \beta_0 + \delta_0\,\text{yr} + \beta_1\,\text{nearinc} + \delta_1\,\text{yr} \cdot \text{nearinc} + u$
• Estimate the regression in the before period (yr = 0):
$\text{rprice} = \beta_0 + \beta_1\,\text{nearinc} + u$
• The coefficient $\beta_1$ shows the difference in house prices between houses near
and far from the incinerator in the before period.
• Estimate the regression in the after period (yr = 1):
$\text{rprice} = (\beta_0 + \delta_0) + (\beta_1 + \delta_1)\,\text{nearinc} + u$
• The coefficient $(\beta_1 + \delta_1)$ shows the difference in house prices between houses
near and far from the incinerator in the after period.
• The DID effect is the difference between the two coefficients: $\delta_1 = (\beta_1 + \delta_1) - \beta_1$
DID effect using two models
VARIABLES      After period    Before period
               rprice          rprice
nearinc        -30,688***      -18,824***
               (5,828)         (4,745)
y81
y81*nearinc
Constant       101,308***      82,517***
               (3,093)         (2,654)

In the after period, prices for houses near the incinerator were $30,688 lower than prices for houses
far from the incinerator.
In the before period, prices for houses near the incinerator were $18,824 lower than prices for houses
far from the incinerator.
The DID effect is -30,688 - (-18,824) = -11,864
After the incinerator was built, the price gap between houses near and far from the incinerator widened
by $11,864.
DID effect using two models
• Difference-in-differences model:
$\text{rprice} = \beta_0 + \delta_0\,\text{yr} + \beta_1\,\text{nearinc} + \delta_1\,\text{yr} \cdot \text{nearinc} + u$
• Estimate the regression for houses far from the incinerator (nearinc = 0):
$\text{rprice} = \beta_0 + \delta_0\,\text{yr} + u$
• The coefficient $\delta_0$ shows the difference in house prices between after and
before for houses far from the incinerator.
• Estimate the regression for houses near the incinerator (nearinc = 1):
$\text{rprice} = (\beta_0 + \beta_1) + (\delta_0 + \delta_1)\,\text{yr} + u$
• The coefficient $(\delta_0 + \delta_1)$ shows the difference in house prices between
after and before for houses near the incinerator.
• The DID effect is the difference between the two coefficients: $\delta_1 = (\delta_0 + \delta_1) - \delta_0$
DID effect using two models
VARIABLES      For houses near incinerator    For houses far from incinerator
               rprice                         rprice
nearinc
y81            6,926                          18,790***
               (8,205)                        (3,383)
y81*nearinc
Constant       63,693***                      82,517***
               (5,296)                        (2,278)

For houses near the incinerator, house prices were $6,926 higher in the after period than in the before
period.
For houses far from the incinerator, house prices were $18,790 higher in the after period than in the
before period.
The DID effect is 6,926 - 18,790 = -11,864 (same effect)
After the incinerator was built, the price gap between houses near and far from the incinerator widened
by $11,864.
DID model example
• Difference-in-differences model: $\text{rprice} = \beta_0 + \delta_0\,\text{yr} + \beta_1\,\text{nearinc} + \delta_1\,\text{yr} \cdot \text{nearinc} + u$
• The numbers in the table show the average house prices for treated units (houses near the
incinerator) and control units (houses far from the incinerator) in the before (1978) and after
(1981) periods.
• The DID effect $\delta_1$ = -11,864 shows that the price gap between houses near and far from the
incinerator widened by $11,864 after the incinerator was built (in the after period).

                            Before (after = 0)               After (after = 1)                                      After - Before
Control (c), treated = 0    $\beta_0$ = 82,517               $\beta_0 + \delta_0$ = 101,308                         $\delta_0$ = 18,790
Treated (t), treated = 1    $\beta_0 + \beta_1$ = 63,693     $\beta_0 + \delta_0 + \beta_1 + \delta_1$ = 70,619     $\delta_0 + \delta_1$ = 6,926
Treated - Control           $\beta_1$ = -18,824              $\beta_1 + \delta_1$ = -30,688                         $\delta_1$ = -11,864
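
The arithmetic in the table can be checked with a short sketch that starts from the four rounded average prices. The dictionary layout and variable names are illustrative choices, not part of the slides.

```python
# Average house prices (rprice) from the table, in dollars (rounded).
means = {
    ("control", "before"): 82_517,
    ("control", "after"): 101_308,
    ("treated", "before"): 63_693,
    ("treated", "after"): 70_619,
}

# After - Before within each group.
change_control = means[("control", "after")] - means[("control", "before")]
change_treated = means[("treated", "after")] - means[("treated", "before")]

# Treated - Control within each period.
gap_before = means[("treated", "before")] - means[("control", "before")]
gap_after = means[("treated", "after")] - means[("control", "after")]

# Both orderings of the double difference give the same number.
did_by_group = change_treated - change_control
did_by_period = gap_after - gap_before

print(change_control, change_treated, gap_before, gap_after)
print(did_by_group, did_by_period)
```

Starting from the rounded means, both orderings give -11,865 rather than the -11,864 in the table, and two of the intermediate differences are also off by one dollar. The gap is presumably rounding: the table entries look like rounded values of unrounded estimates.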

DID model example
VARIABLES      After period    Before period    For houses near    For houses far      DID regression
                                                incinerator        from incinerator
               rprice          rprice           rprice             rprice              rprice
nearinc        -30,688***      -18,824***                                              -18,824***
               (5,828)         (4,745)                                                 (4,875)
y81                                             6,926              18,790***           18,790***
                                                (8,205)            (3,383)             (4,050)
y81*nearinc                                                                            -11,864
                                                                                       (7,457)
Constant       101,308***      82,517***        63,693***          82,517***           82,517***
               (3,093)         (2,654)          (5,296)            (2,278)             (2,727)

The coefficient estimates (though not the values in parentheses) from the regression in the before period and
from the regression for houses far from the incinerator are the same as in the DID model.
The DID effect is -30,688 - (-18,824) = 6,926 - 18,790 = -11,864
After the incinerator was built, the price gap between houses near and far from the incinerator widened by
$11,864, but the effect is not statistically significant.
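
A rough check of the "not significant" statement, assuming the values in parentheses are standard errors and using large-sample normal critical values. Both are assumptions: the slides do not state what the parentheses or the stars denote.

```python
# DID coefficient and the value in parentheses below it (assumed to be
# its standard error) from the DID regression column.
did_coef = -11_864
did_se = 7_457

t_stat = did_coef / did_se

# Usual large-sample two-sided critical values (assumed, not from the slides).
critical_10pct = 1.645
critical_5pct = 1.96

significant_10pct = abs(t_stat) > critical_10pct
significant_5pct = abs(t_stat) > critical_5pct

print(f"t = {t_stat:.2f}")
print("significant at 10%:", significant_10pct)
print("significant at 5%:", significant_5pct)
```

Under these assumptions the t-statistic is about -1.59, below both critical values in absolute value, which is consistent with the interaction coefficient carrying no stars in the table.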
Panel data model with two periods
• Panel data have a cross-sectional dimension $i$ (person id, firm id, etc.)
and a time-series dimension $t$ (year, month, etc.).
• Variables are $y_{it}$, $x_{1it}$, and $x_{2it}$.
• Example with two years of panel data (same units over two periods)
• $y_{it} = \beta_0 + \delta_0\,\text{dyear} + \beta_1 x_{1it} + \beta_2 x_{2it} + a_i + u_{it}$
• dyear is a dummy variable for the second period
• $a_i$ are individual-specific effects or fixed effects. They are unobserved time-constant
factors that differ across units but are the same for each unit over time.
• $u_{it}$ are unobserved factors (error term) that differ by units and years.
Panel data model with two periods
• Example with two years of panel data (same units over two periods)
• $y_{it} = \beta_0 + \delta_0\,\text{dyear} + \beta_1 x_{1it} + \beta_2 x_{2it} + a_i + u_{it}$
• Model for first year: $y_{it} = \beta_0 + \delta_0 \cdot 0 + \beta_1 x_{1it} + \beta_2 x_{2it} + a_i + u_{it}$
• Model for second year: $y_{i(t+1)} = \beta_0 + \delta_0 \cdot 1 + \beta_1 x_{1i(t+1)} + \beta_2 x_{2i(t+1)} + a_i + u_{i(t+1)}$
• Subtract model for first year from model for second year
• $y_{i(t+1)} - y_{it} = \delta_0 + \beta_1 (x_{1i(t+1)} - x_{1it}) + \beta_2 (x_{2i(t+1)} - x_{2it}) + (u_{i(t+1)} - u_{it})$
• Using the notation for first differences: $\Delta y_{it} = y_{i(t+1)} - y_{it}$
• $\Delta y_{it} = \delta_0 + \beta_1 \Delta x_{1it} + \beta_2 \Delta x_{2it} + \Delta u_{it}$
• The first differences model does not include the individual-specific effect ($a_i$) and can be
estimated by OLS. The fixed effect was differenced out because it does not vary over
time.
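
A small simulation can illustrate why differencing helps. This is a sketch under stated assumptions: the parameter values, sample size, and data-generating process are hypothetical, and it uses a single regressor instead of the two in the model above to keep the OLS formula short.

```python
import random

def ols(x, y):
    """Return (intercept, slope) of a simple OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope

random.seed(0)
n = 2000
beta0, delta0, beta1 = 1.0, 0.5, 2.0   # hypothetical true parameters

x1, y1, x2, y2 = [], [], [], []
for _ in range(n):
    a = random.gauss(0, 2)              # unobserved fixed effect a_i
    # x is correlated with a_i, which biases cross-sectional OLS.
    xa, xb = a + random.gauss(0, 1), a + random.gauss(0, 1)
    x1.append(xa)
    x2.append(xb)
    y1.append(beta0 + beta1 * xa + a + random.gauss(0, 1))           # first year
    y2.append(beta0 + delta0 + beta1 * xb + a + random.gauss(0, 1))  # second year

# Cross-sectional OLS on the first year ignores a_i and is biased upward here.
_, slope_cross = ols(x1, y1)

# First differences remove a_i; the intercept estimates delta0.
dx = [second - first for first, second in zip(x1, x2)]
dy = [second - first for first, second in zip(y1, y2)]
delta0_hat, beta1_hat = ols(dx, dy)

print(f"cross-section slope: {slope_cross:.2f}")
print(f"first-difference slope: {beta1_hat:.2f}, intercept: {delta0_hat:.2f}")
```

In this setup the cross-sectional slope should land well above the true value of 2 because $a_i$ is left in the error term, while the first-difference slope and intercept should be close to 2 and 0.5.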

Panel data and first differences
nr   year   wage    hours   educ   exper   dwage    dhours   deduc   dexper
13   1980   15.76   2672    14     1       .        .        .       .
13   1981   71.30   2320    14     2       55.54    -352     0       1
17   1980   47.42   2484    13     4       .        .        .       .
17   1981   32.99   2804    13     5       -14.43   320      0       1
18   1980   32.81   2332    12     4       .        .        .       .
18   1981   54.37   2116    12     5       21.57    -216     0       1

• Panel data for two periods: nr is the person id and year is either 1980 or 1981. wage is in dollars, hours is
total hours worked for the year, and educ and exper are in years.
• Variables are first differenced: ∆wage = dwage = wage_1981 - wage_1980. The first difference is missing (.)
in 1980 because there is no earlier period.
• The first difference of education is 0 for everyone, and the first difference of experience is 1 for everyone.
Someone in the sample could have received more education in the second period, but no one did.
• Variables that have no variation (∆educ or ∆exper) cannot be included in regression models, because a
constant regressor is perfectly collinear with the intercept.
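
The first differences in the table can be reproduced from the level variables with a short sketch. The data structure and names here are illustrative; only the six rows come from the table.

```python
# Rows from the table: (nr, year, wage, hours, educ, exper).
rows = [
    (13, 1980, 15.76, 2672, 14, 1), (13, 1981, 71.30, 2320, 14, 2),
    (17, 1980, 47.42, 2484, 13, 4), (17, 1981, 32.99, 2804, 13, 5),
    (18, 1980, 32.81, 2332, 12, 4), (18, 1981, 54.37, 2116, 12, 5),
]
names = ("wage", "hours", "educ", "exper")

# Index each person's values by year.
panel = {}
for nr, year, *values in rows:
    panel.setdefault(nr, {})[year] = values

# First difference = 1981 value minus 1980 value, per person.
diffs = {
    nr: {name: round(v81 - v80, 2)
         for name, v80, v81 in zip(names, by_year[1980], by_year[1981])}
    for nr, by_year in panel.items()
}

# A differenced variable with a single distinct value has no variation.
no_variation = [name for name in names
                if len({d[name] for d in diffs.values()}) == 1]

for nr, d in diffs.items():
    print(nr, d)
print("no variation:", no_variation)
```

For person 18 this gives a wage difference of 21.56 rather than the 21.57 shown in the table, presumably because the table differences were computed from unrounded wages. The differenced educ and exper are flagged as having no variation, matching the bullet above.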

Panel data model with two periods example
• Panel data model for wages explained by hours worked.
• Panel data model: $\text{wage}_{it} = \beta_0 + \delta_0\,\text{d1981} + \beta_1\,\text{hours}_{it} + a_i + u_{it}$
• Model for first year: $\text{wage}_{i,1980} = \beta_0 + \delta_0 \cdot 0 + \beta_1\,\text{hours}_{i,1980} + a_i + u_{i,1980}$
• Model for second year: $\text{wage}_{i,1981} = \beta_0 + \delta_0 \cdot 1 + \beta_1\,\text{hours}_{i,1981} + a_i + u_{i,1981}$
• Note that these models cannot be estimated as written because $a_i$ is unobservable.
Instead, models without $a_i$ were estimated.
• Panel data model with first differences: $\Delta\text{wage}_i = \delta_0 + \beta_1 \Delta\text{hours}_i + \Delta u_i$
• First differencing eliminates the individual-specific effect ($a_i$), and the
model can be estimated by OLS.
First difference estimation
                Model with      Model with      Model for       Model for        Model with
                both periods    both periods    first period    second period    first differences
VARIABLES       wage            wage            wage            wage             dwage
hours           -0.004          -0.005          -0.002          -0.009*
                (0.003)         (0.003)         (0.004)         (0.005)
d1981                           13.300***
                                (4.037)
dhours                                                                           -0.035***
                                                                                 (0.005)
Constant        60.289***       55.523***       48.822***       77.482***        16.602***
                (6.790)         (6.913)         (7.720)         (11.575)         (3.031)
Observations    1,090           1,090           545             545              545
R-squared       0.002           0.012           0.000           0.006            0.074

Using cross-sectional variation (across workers), hours worked has no significant effect, or only a very small
effect, on wages. Only in the model for the second period is there a weakly significant effect: for one more hour
worked, wages were lower by about 1 cent (coefficient -0.009).
The model with first differences shows that for each additional hour worked in the second year relative to the
first, wages were 3.5 cents lower. The effect of hours worked on wages is larger and more significant over time
(within workers) than cross-sectionally.
The model with first differences also has a better fit (R-squared is higher, explaining 7.4% of the variation in dwage).
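
To make the first-differences column concrete, the fitted equation can be evaluated at a few changes in hours. The hours changes below are hypothetical values picked only for illustration; the two coefficients are taken from the table.

```python
# Coefficients from the first-differences column of the table.
constant = 16.602    # estimate of delta0
slope = -0.035       # coefficient on dhours

def predicted_dwage(dhours):
    """Predicted change in wage for a given change in annual hours."""
    return constant + slope * dhours

# Hypothetical changes in hours, chosen only for illustration.
for dhours in (-100, 0, 100):
    print(dhours, round(predicted_dwage(dhours), 3))
```

With no change in hours the predicted wage change is the constant, 16.602. Working 100 more hours than the year before lowers the predicted change by 3.5, which is the 3.5 cents per hour stated above scaled up by 100.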
Review questions
• Describe the difference-in-differences approach. What variables need
to be included?
• What does the difference-in-differences effect measure?
• Describe the first difference estimator for panel data models with two
periods.
