ITSx: Policy Analysis Using Interrupted Time Series

Week 5 Slides
Michael Law, Ph.D.
The University of British Columbia

COURSE OVERVIEW

Layout of the weeks


1. Introduction, setup, data sources
2. Single series interrupted time series analysis
3. ITS with a control group
4. ITS Extensions
5. Regression discontinuities & Wrap-up

REGRESSION DISCONTINUITIES

Regression Discontinuity (RD)


Design
Compare trends in an outcome across an exposure variable
below and above a threshold

Major Assumption
The level and trend in the outcome above/below the threshold
would have continued absent the threshold

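The Counterfactual
[Figure: outcome of interest (vertical axis) plotted against the forcing variable (horizontal axis), with a vertical line at the threshold, the regions below and above the threshold labeled, and the change at the threshold marked.]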

Estimates
RD estimates what is known as a local average treatment
effect (LATE)
Compares people just below the threshold to people just above it

Forcing Variable Examples

Student Achievement
Vote Margin
Birth Year
Minute of birth
Many others

Integrity of the Forcing Variable


Institutional integrity
Describe the process of assigning variables, and how access to
the intervention was assigned
Should not be subject to potential manipulation

Statistical integrity
There should not be a discontinuity in the density of cases at the
threshold
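
One rough way to look for a discontinuity in the density of cases is to count observations in a narrow window on each side of the threshold. The sketch below uses simulated data. The variable names, the window width of 2, and the simple binomial comparison are all illustrative assumptions rather than the course's method, and a formal density test would be more appropriate in practice.

```r
# Simulated (hypothetical) forcing variable, centered so the threshold is 0
set.seed(42)
forcing <- runif(2000, min = -50, max = 50)

# Count cases in a narrow window on either side of the threshold
window <- 2
n_below <- sum(forcing >= -window & forcing < 0)
n_above <- sum(forcing >= 0 & forcing < window)

# If cases are not sorting around the threshold, a case inside the window
# should be about equally likely to fall on either side
density_test <- binom.test(n_above, n_below + n_above, p = 0.5)
density_test$p.value

# Counts per bin near the threshold, for a visual check of the same idea
bin_counts <- table(cut(forcing, breaks = seq(-10, 10, by = 2)))
bin_counts
```

A small p-value, or a visible pile-up of cases in the bin just above the threshold, would be a reason to look more closely at how the forcing variable was assigned.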

Testing Assumptions
Other variables should be smooth through the threshold
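
A common way to check that another variable is smooth through the threshold is to fit the RD model with that variable as the outcome. This is a minimal sketch on simulated data; `covariate` is a hypothetical baseline characteristic, and the data-generating values are made up for illustration.

```r
# Simulated (hypothetical) data: the covariate varies smoothly with the
# forcing variable and has no jump at the threshold
set.seed(1)
n <- 1000
data <- data.frame(forcing = runif(n, -50, 50))
data$threshold <- as.numeric(data$forcing > 0)
data$forcing_threshold <- data$forcing * data$threshold
data$covariate <- 40 + 0.1 * data$forcing + rnorm(n, sd = 5)

# Same model structure as the RD model, but with the covariate as the outcome
cov_model <- lm(covariate ~ forcing + threshold + forcing_threshold,
                data = data)

# The threshold coefficient estimates any jump in the covariate
summary(cov_model)$coefficients["threshold", ]
```

An estimate near zero for `threshold` is consistent with smoothness; a clear jump suggests that cases just above and just below the threshold differ in ways other than the intervention.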

Potential RD Biases
1. Co-intervention / Non-smooth curve
Something aside from the intervention affects the outcome and changes at the same threshold as the intervention

2. Instrumentation
The method of measurement differs above and below the threshold

3. Attrition
Individuals are differentially included in the sample on either side of the threshold

4. Manipulation of threshold

PERFORMING AN RD ANALYSIS

Basic data setup


Person ID | Forcing | Threshold | Forcing_Threshold | Outcome
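
The columns above can be built from a raw score and a cutoff. The sketch below is an illustration only: `raw_score`, `cutoff`, and the outcome values are hypothetical, and it assumes the forcing variable is centered at the threshold and that the interaction is the product of the forcing variable and the threshold indicator, as in the incumbency example later in these slides.

```r
# Hypothetical raw data: one row per person
data <- data.frame(person_id = 1:6,
                   raw_score = c(42, 47, 49, 51, 55, 60),
                   outcome   = c(0.20, 0.25, 0.24, 0.61, 0.66, 0.70))

# Hypothetical cutoff that determines exposure to the intervention
cutoff <- 50

# Forcing variable, centered so that the threshold sits at zero
data$forcing <- data$raw_score - cutoff

# Indicator for being above the threshold
data$threshold <- as.numeric(data$forcing > 0)

# Forcing variable above the threshold (zero below it)
data$forcing_threshold <- data$forcing * data$threshold

data
```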

Basic RD model
For threshold j and forcing variable k:

$$\text{outcome}_{jk} = \beta_0 + \beta_1 (k - j) + \beta_2 [k > j] + \beta_3 [k > j]\, k + \epsilon_{jk}$$

$\beta_0$: Predicted level at smallest forcing variable value
$\beta_1$: Pre-existing slope in the outcome of interest
$\beta_2$: Change in the level above the threshold (* variable of interest)
$\beta_3$: Change in the slope above the threshold

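$$\text{outcome}_{jk} = \beta_0 + \beta_1 (k - j) + \beta_2 [k > j] + \beta_3 [k > j]\, k + \epsilon_{jk}$$

[Figure: outcome of interest against the forcing variable, with fitted lines below and above the threshold. Labeled on the plot: the intercept ($\beta_0$), the slope below the threshold ($\beta_1$), the RD estimate at the threshold ($\beta_2$), and the slope above the threshold ($\beta_1 + \beta_3$).]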
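
To see why the coefficient on the threshold indicator is the RD estimate, it can help to write the model separately on each side of the threshold. The short derivation below assumes the forcing variable has been centered so that the threshold is at $j = 0$; that centering is an assumption made here for illustration.

```latex
% Assumes the forcing variable k is centered, so the threshold is j = 0
\begin{align*}
E[\text{outcome} \mid k \le 0] &= \beta_0 + \beta_1 k \\
E[\text{outcome} \mid k > 0]   &= (\beta_0 + \beta_2) + (\beta_1 + \beta_3)\, k \\
\lim_{k \downarrow 0} E[\text{outcome} \mid k]
  - \lim_{k \uparrow 0} E[\text{outcome} \mid k] &= \beta_2
\end{align*}
```

Under that assumption, the two fitted lines differ by $\beta_2$ at the threshold, and the slope changes from $\beta_1$ below it to $\beta_1 + \beta_3$ above it.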

Running an RD Model
########################
# Modeling an RD
########################
# Load nlme, which provides gls()
library(nlme)
# Fit the standard regression model
rd_model <- gls(outcome ~ forcing + threshold + forcing_threshold,
                data = data,
                method = "ML")
summary(rd_model)

Higher-order Polynomials
Often the relationship between the forcing variable and the
outcome on either side of the threshold will be non-linear
Solution: model in polynomial terms

Similar in structure and form to using a quadratic trend in a time series analysis

Running an RD Model
#####################################
# Modeling an RD with square terms
#####################################
# Construct a square term on either side of the threshold
data$forcing_sq <- data$forcing^2
data$forcing_threshold_sq <- data$forcing_threshold^2
# Fit the standard regression model
rd_model <- gls(outcome ~ forcing + forcing_sq + threshold +
                  forcing_threshold + forcing_threshold_sq,
                data = data,
                method = "ML")
summary(rd_model)

Modeling
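Have to decide on the range of forcing variable values around the threshold to include
Trade-off between linearity and data, or precision and bias as
Lee and Lemieux refer to it: a narrower range makes a linear fit more plausible but leaves fewer observations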
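
One way to explore this trade-off is to refit the linear RD model over several ranges and compare the estimates and standard errors. The sketch below uses simulated data with a built-in jump of 0.4; the bandwidth values and all data-generating numbers are hypothetical choices for illustration.

```r
# Simulated (hypothetical) data with a true jump of 0.4 at the threshold
set.seed(123)
n <- 2000
data <- data.frame(forcing = runif(n, -50, 50))
data$threshold <- as.numeric(data$forcing > 0)
data$forcing_threshold <- data$forcing * data$threshold
data$outcome <- 0.3 + 0.005 * data$forcing + 0.4 * data$threshold +
  rnorm(n, sd = 0.3)

# Refit the linear RD model within each range around the threshold
bandwidths <- c(5, 10, 25, 50)
results <- t(sapply(bandwidths, function(bw) {
  fit <- lm(outcome ~ forcing + threshold + forcing_threshold,
            data = data[abs(data$forcing) <= bw, ])
  est <- summary(fit)$coefficients["threshold", c("Estimate", "Std. Error")]
  c(bandwidth = bw, n = nobs(fit), est)
}))

# One row per bandwidth: sample size, RD estimate, and its standard error
results
```

Narrow ranges use fewer observations, so their estimates are typically less precise; wide ranges use more data but lean harder on the assumed functional form.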

Other considerations
Local linear regression
Kernel densities
Fuzzy RD designs

Presenting an RD Analysis
Common to present two figures:
Forcing variable and exposure to the intervention
Forcing variable and outcome

RD EXAMPLE: INCUMBENCY

Lee (2008)
Interested in the effect of incumbent party advantage
Uses data from US House of Representatives elections
Our data are from a replication by Caughey and Sekhon
Includes 7,598 elections from 1942 through 2006

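[Figure: probability of winning the next election (vertical axis) against the Democratic Party margin of victory (horizontal axis). A vertical line at equal vote share separates Democrat losses (negative margin) from Democrat wins (positive margin); the RD estimate is the jump at that line.]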

Data Setup
state | year | dmargin | demwin | dwinnext | bin
      | 1946 | -6.218  |        |          | 22
      | 1950 | -4.146  |        |          | 23
      | 1954 | -5.118  |        |          | 23
      | 1956 |  6.148  |        |          | 29

Setup Variables
# Setup square and cubic terms for forcing variable
dataset$dmargin2 <- dataset$dmargin^2
dataset$dmargin3 <- dataset$dmargin^3
# Setup interaction between forcing variable and threshold
dataset$dmargin_demwin <- dataset$dmargin * dataset$demwin
# Setup square and cubic terms for forcing variable * threshold interactions
dataset$dmargin_demwin2 <- dataset$dmargin_demwin^2
dataset$dmargin_demwin3 <- dataset$dmargin_demwin^3

Preliminary Plot
###################################
# Preliminary Plot
###################################
# Setup bins for plotting
bins <- seq(-49,49,2)
# Get the mean within each bin
means <- tapply(dataset$dwinnext,dataset$bin,mean)
# Plot the results
plot(bins,means,
pch=19,
ylab="Probability of Winning Next Election",
xlab="Vote Margin in the Last Election",
xlim=c(-50,50),
col="lightblue")
# Add line at zero
abline(v=0,lty=2,col="grey")
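
The plot relies on a `bin` column that indexes 2-point-wide intervals of the vote margin. The slides do not show how that column was created; the sketch below is one way it could be derived, assuming intervals that start at -50 and are closed on the left. It uses the four `dmargin` values from the Data Setup slide.

```r
# The four vote margins shown on the Data Setup slide
dmargin <- c(-6.218, -4.146, -5.118, 6.148)

# Assumed bin edges: -50, -48, ..., 50 (each bin is 2 points wide)
breaks <- seq(-50, 50, by = 2)

# Index of the interval [breaks[i], breaks[i + 1]) containing each margin
bin <- findInterval(dmargin, breaks)
bin
# → 22 23 23 29

# Midpoint of each bin, matching the x-positions used in the plot
bin_midpoints <- seq(-49, 49, by = 2)
bin_midpoints[bin]
# → -7 -5 -5 7
```

These indices agree with the `bin` values listed for the same rows on the Data Setup slide.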

Run Basic Model


###################################
# Modeling
###################################
# Named model1 so that the later anova(model1, model2) call can find it
model1 <- lm(dwinnext ~ dmargin + demwin + dmargin_demwin,
             data=dataset)
summary(model1)

Model 1 Results
Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)     0.2362171  0.0096311  24.526   <2e-16 ***
dmargin         0.0051402  0.0003727  13.790   <2e-16 ***
demwin          0.5558085  0.0139324  39.893   <2e-16 ***
dmargin_demwin -0.0008619  0.0005163  -1.669   0.0951 .

Incumbent Party Advantage: 56%

Add square terms


# Add square terms
model2 <- lm(dwinnext ~ dmargin + dmargin2 +
demwin + dmargin_demwin + dmargin_demwin2,
data=dataset)
summary(model2)
# Compare versus model 1
anova(model1, model2)

Model 2 Results
Coefficients:
                   Estimate  Std. Error t value Pr(>|t|)
(Intercept)      0.28847535  0.01425106  20.242  < 2e-16 ***
dmargin          0.01172643  0.00137841   8.507  < 2e-16 ***
dmargin2         0.00014036  0.00002829   4.962 7.14e-07 ***
demwin           0.44811150  0.02054055  21.816  < 2e-16 ***
dmargin_demwin  -0.00053605  0.00196543  -0.273    0.785
dmargin_demwin2 -0.00028161  0.00003958  -7.114 1.23e-12 ***

Model 1 vs. Model 2


Analysis of Variance Table
Model 1: dwinnext ~ dmargin + demwin + dmargin_demwin
Model 2: dwinnext ~ dmargin + dmargin2 + demwin +
    dmargin_demwin + dmargin_demwin2
  Res.Df    RSS Df Sum of Sq     F    Pr(>F)
1   7593 732.19
2   7591 727.33  2    4.8522 25.32 1.096e-11 ***

Incumbent Party Advantage: 45%

Add cubic terms


# Run full specified model
model3 <- lm(dwinnext ~ dmargin + dmargin2 + dmargin3 + demwin
+ dmargin_demwin + dmargin_demwin2 +
dmargin_demwin3,
data=dataset)
summary(model3)
# Compare versus model 2
anova(model2, model3)

Model 3 Results
Coefficients:
                    Estimate   Std. Error t value   Pr(>|t|)
(Intercept)      0.300040593  0.018943445  15.839    < 2e-16 ***
dmargin          0.014578041  0.003374783   4.320 0.00001582 ***
dmargin2         0.000288379  0.000162408   1.776     0.0758 .
dmargin3         0.000002045  0.000002209   0.926     0.3547
demwin           0.385243821  0.027359614  14.081    < 2e-16 ***
dmargin_demwin   0.009250574  0.004872682   1.898     0.0577 .
dmargin_demwin2 -0.001068132  0.000231675  -4.610 0.00000408 ***
dmargin_demwin3  0.000006539  0.000003111   2.102     0.0356 *

Model 2 vs. Model 3


Analysis of Variance Table
Model 1: dwinnext ~ dmargin + dmargin2 + demwin +
    dmargin_demwin + dmargin_demwin2
Model 2: dwinnext ~ dmargin + dmargin2 + dmargin3 + demwin +
    dmargin_demwin + dmargin_demwin2 + dmargin_demwin3
  Res.Df    RSS Df Sum of Sq      F    Pr(>F)
1   7591 727.33
2   7589 725.78  2    1.5515 8.1114 0.0003027 ***

Incumbent Party Advantage: 39%
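
The incumbent party advantage on each results slide is the `demwin` coefficient expressed in percentage points. As a quick arithmetic check, the sketch below takes the estimates and standard errors reported on the three results slides and adds an approximate 95% interval computed as estimate ± 1.96 × standard error. The interval is a normal approximation assumed here for illustration; it does not appear in the slides.

```r
# demwin estimates and standard errors as reported for models 1 to 3
rd_estimates <- c(linear = 0.5558085, quadratic = 0.44811150,
                  cubic = 0.385243821)
rd_se <- c(linear = 0.0139324, quadratic = 0.02054055,
           cubic = 0.027359614)

# Incumbent party advantage in percentage points
round(100 * rd_estimates)
# → linear 56, quadratic 45, cubic 39

# Approximate 95% intervals (normal approximation, an assumption)
intervals <- cbind(lower = rd_estimates - 1.96 * rd_se,
                   upper = rd_estimates + 1.96 * rd_se)
round(100 * intervals, 1)
```

The estimate shrinks and its standard error grows as higher-order terms are added, which is the precision and bias trade-off discussed earlier.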

A note on the example


I have modeled a discrete (win / loss) outcome using linear regression
I have also posted code to perform the same analysis using logistic regression
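
For readers without the posted code, the following is a minimal sketch of what a logistic version of the basic model might look like. It is not the author's code: it runs on simulated data that reuses the example's variable names, and the data-generating values are made up. Because logistic coefficients are on the log-odds scale, the sketch converts the result back to a difference in predicted probabilities at the threshold.

```r
# Simulated (hypothetical) data using the same variable names as the example
set.seed(2008)
n <- 2000
dataset <- data.frame(dmargin = runif(n, -50, 50))
dataset$demwin <- as.numeric(dataset$dmargin > 0)
dataset$dmargin_demwin <- dataset$dmargin * dataset$demwin
true_prob <- plogis(-1 + 0.03 * dataset$dmargin + 2 * dataset$demwin)
dataset$dwinnext <- rbinom(n, size = 1, prob = true_prob)

# Same specification as the basic model, fitted by logistic regression
logit_model <- glm(dwinnext ~ dmargin + demwin + dmargin_demwin,
                   data = dataset,
                   family = binomial(link = "logit"))
summary(logit_model)

# Predicted probabilities just below and just above the threshold
at_threshold <- data.frame(dmargin = c(0, 0),
                           demwin = c(0, 1),
                           dmargin_demwin = c(0, 0))
p <- predict(logit_model, newdata = at_threshold, type = "response")

# RD estimate on the probability scale
rd_estimate <- unname(p[2] - p[1])
rd_estimate
```

Expressing the estimate as a difference in predicted probabilities keeps it comparable to the linear regression results, which are already on the probability scale.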