Week 5 EDX course notes for Interrupted Time Series.

Week 5 Slides

Michael Law, Ph.D.

The University of British Columbia

COURSE OVERVIEW

1.
2. Single series interrupted time series analysis
3. ITS with a control group
4. ITS Extensions
5. Regression discontinuities & Wrap-up

REGRESSION DISCONTINUITIES

Design

Compare trends in an outcome across an exposure variable below and above a threshold

Major Assumption

The level and trend in the outcome above/below the threshold would have continued absent the threshold

The Counterfactual

[Figure: outcome of interest (y-axis) against the forcing variable (x-axis), showing the trend below and above the threshold and the change at the threshold.]

Estimates

RD estimates what is known as a local average treatment effect (LATE)

It compares people just below the threshold with people just above it

Student Achievement

Vote Margin

Birth Year

Minute of birth

Many others

Institutional integrity

Describe the process of assigning variables, and how access to the intervention was assigned

Should not be subject to potential manipulation

Statistical integrity

There should not be a discontinuity in the density of cases at the threshold
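The density condition can be checked roughly by counting cases in a narrow window on either side of the threshold. The sketch below is only an illustration on simulated data: the cutoff of 0, the window width of 2, and the variable names are assumptions, and the binomial comparison is a crude check rather than a formal density test.

```r
# Simulated forcing variable with no manipulation (hypothetical data)
set.seed(42)
forcing <- runif(2000, min = -50, max = 50)
cutoff  <- 0   # hypothetical threshold
window  <- 2   # hypothetical half-width of the comparison window

# Count cases falling just below and just above the threshold
n_below <- sum(forcing >= cutoff - window & forcing < cutoff)
n_above <- sum(forcing >= cutoff & forcing < cutoff + window)

# If the density is smooth, a case inside the window is about equally
# likely to fall on either side, so compare the split with 50/50
density_check <- binom.test(n_below, n_below + n_above, p = 0.5)
density_check$p.value
```

A small p-value would suggest bunching on one side of the threshold, which is what manipulation of the forcing variable tends to produce.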

Testing Assumptions

Other variables should be smooth through the threshold
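One way to look at smoothness is to fit the same RD specification with a baseline covariate in place of the outcome. The sketch below uses simulated data; the covariate `age`, the cutoff of 0, and all coefficients are hypothetical.

```r
set.seed(1)
n <- 1000
forcing           <- runif(n, -50, 50)
threshold         <- as.numeric(forcing >= 0)   # hypothetical cutoff at 0
forcing_threshold <- forcing * threshold

# Hypothetical baseline covariate that varies smoothly with the forcing
# variable and has no jump at the threshold
age <- 40 + 0.1 * forcing + rnorm(n, sd = 5)

# Same RD specification, with the covariate as the dependent variable
covariate_model <- lm(age ~ forcing + threshold + forcing_threshold)

# The threshold row should show no clear level change
coef(summary(covariate_model))["threshold", ]
```

A large, significant threshold coefficient for a covariate that the intervention could not have affected would be a warning sign.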

Potential RD Biases

1. Co-intervention / Non-smooth curve

changes at the same threshold as the intervention

2. Instrumentation

3. Attrition

the threshold

4. Manipulation of threshold

PERFORMING AN RD ANALYSIS

Person ID    Forcing    Threshold    Forcing_Threshold    Outcome
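A minimal sketch of how these columns might be built from raw data. The values, the cutoff of 50, the choice to centre the forcing variable at the cutoff, and the choice to count cases exactly at the cutoff as above it are all assumptions made for illustration.

```r
# Hypothetical raw data: one row per person
data <- data.frame(
  person_id = 1:6,
  forcing   = c(42, 47, 49, 50, 53, 58),
  outcome   = c(3.1, 3.4, 3.3, 4.6, 4.9, 5.2)
)
cutoff <- 50  # hypothetical threshold value

# Centre the forcing variable so that 0 is the threshold (assumption);
# the level-change coefficient is then measured at the threshold itself
data$forcing <- data$forcing - cutoff

# Indicator: 1 at or above the threshold, 0 below
data$threshold <- as.numeric(data$forcing >= 0)

# Interaction: the forcing variable above the threshold, 0 below it
data$forcing_threshold <- data$forcing * data$threshold

data
```

In the incumbency example later in these notes the threshold is a vote margin of zero, so no centring is needed there.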

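Basic RD model

For threshold j and forcing variable k:

$$\text{outcome}_{jk} = \beta_0 + \beta_1 \cdot \text{forcing}_k + \beta_2 \cdot \text{threshold}_j + \beta_3 \cdot \text{forcing}_k \cdot \text{threshold}_j + \varepsilon_{jk}$$

$\beta_0$: predicted level at smallest forcing variable value

$\beta_1$: trend in the outcome of interest

$\beta_2$: level change above the threshold *

$\beta_3$: trend change above the threshold

* Variable of interest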

[Figure: outcome of interest against the forcing variable, below and above the threshold, with $\beta_0$ (intercept) and $\beta_2$ (RD estimate) marked.]

Running an RD Model

########################
# Modeling an RD
########################

# gls() comes from the nlme package
library(nlme)

# Fit the standard regression model
rd_model <- gls(outcome ~ forcing + threshold + forcing_threshold,
                data=data,
                method="ML")
summary(rd_model)
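To see which coefficient carries the RD estimate, it can help to fit the model to simulated data with a known jump. This is a sketch under stated assumptions: the data are simulated, the true jump of 5 and the other coefficients are made up, and `lm()` is used in place of `gls()` so that the example needs only base R.

```r
set.seed(123)
n <- 2000
forcing           <- runif(n, -50, 50)
threshold         <- as.numeric(forcing >= 0)   # hypothetical cutoff at 0
forcing_threshold <- forcing * threshold

# Hypothetical truth: level change of 5 and trend change of -0.1 above the threshold
outcome <- 10 + 0.2 * forcing + 5 * threshold -
  0.1 * forcing_threshold + rnorm(n, sd = 2)

data <- data.frame(outcome, forcing, threshold, forcing_threshold)

# Same specification as the basic RD model
rd_fit <- lm(outcome ~ forcing + threshold + forcing_threshold, data = data)

# The coefficient on threshold is the estimated level change at the threshold
rd_estimate <- coef(rd_fit)["threshold"]
rd_estimate
confint(rd_fit)["threshold", ]
```

The estimate should land near the simulated jump of 5, with the confidence interval reflecting the noise added to the outcome.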

Higher-order Polynomials

Often the relationship between the forcing variable and the outcome on either side of the threshold will be non-linear

Solution: model in polynomial terms

time series analysis

Running an RD Model

#####################################
# Modeling an RD with square terms
#####################################

# gls() comes from the nlme package
library(nlme)

# Construct a square term on either side of the threshold
data$forcing_sq <- data$forcing^2
data$forcing_threshold_sq <- data$forcing_threshold^2

# Fit the regression model with the square terms
# (same data frame the square terms were added to)
rd_model <- gls(outcome ~ forcing + forcing_sq + threshold +
                forcing_threshold + forcing_threshold_sq,
                data=data,
                method="ML")
summary(rd_model)

Modeling

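Have to decide what range of the forcing variable around the threshold to include

Trade-off between linearity and data, or bias and precision as Lee and Lemieux refer to it: a narrower range is closer to linear (less bias) but has fewer observations (less precision)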
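The range trade-off can be illustrated by refitting the linear RD model within different ranges and comparing the estimate and its standard error. The sketch below is an assumption-laden illustration: the data are simulated with a curved relationship and a true jump of 1, and the ranges of 10, 25, and 50 are arbitrary.

```r
set.seed(7)
n <- 4000
forcing   <- runif(n, -50, 50)
threshold <- as.numeric(forcing >= 0)   # hypothetical cutoff at 0

# Hypothetical truth: cubic relationship plus a jump of 1 at the threshold
outcome <- 1 * threshold + 0.00001 * forcing^3 + rnorm(n, sd = 0.5)

data <- data.frame(forcing, threshold, outcome,
                   forcing_threshold = forcing * threshold)

# Fit the linear RD model using only cases within h of the threshold
fit_within <- function(h) {
  fit <- lm(outcome ~ forcing + threshold + forcing_threshold,
            data = data[abs(data$forcing) <= h, ])
  coef(summary(fit))["threshold", c("Estimate", "Std. Error")]
}

ranges  <- c(10, 25, 50)
results <- t(sapply(ranges, fit_within))
rownames(results) <- paste0("range_", ranges)
results
```

In this simulation the widest range gives the smallest standard error but an estimate pulled away from the true jump, because a straight line fits the curve poorly over a wide range.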

Other considerations

Local linear regression

Kernel densities

Fuzzy RD designs
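As a rough illustration of local linear regression, the sketch below fits a weighted linear model near the threshold, giving more weight to cases closer to it. Everything here is an assumption for illustration: the simulated data, the true jump of 1.5, the bandwidth of 15, and the choice of a triangular kernel.

```r
set.seed(11)
n <- 3000
forcing   <- runif(n, -50, 50)
threshold <- as.numeric(forcing >= 0)   # hypothetical cutoff at 0
outcome   <- 2 + 0.05 * forcing + 1.5 * threshold + rnorm(n)

d <- data.frame(forcing, threshold, outcome)
bandwidth <- 15  # hypothetical bandwidth

# Triangular kernel: weight 1 at the threshold, falling linearly to 0
# at a distance of one bandwidth
d$w <- pmax(0, 1 - abs(d$forcing) / bandwidth)

# Weighted linear fit with separate slopes on each side,
# using only cases that receive a positive weight
local_fit <- lm(outcome ~ forcing + threshold + forcing:threshold,
                data = d[d$w > 0, ], weights = w)

local_estimate <- coef(local_fit)["threshold"]
local_estimate
```

The bandwidth plays the same role as the range decision above: a smaller value relies less on the linearity assumption but uses fewer cases.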

Presenting an RD Analysis

Common to present two figures:

Forcing variable and exposure to the intervention

Forcing variable and outcome

RD EXAMPLE: INCUMBENCY

Lee (2008)

Interested in the effect of incumbent party advantage

Uses data from US House of Representatives elections

Our data are from a replication by Caughey and Sekhon

Includes 7,598 elections from 1942 through 2006


Data Setup

state  year  dmargin  demwin  dwinnext  bin
       1946   -6.218                     22
       1950   -4.146                     23
       1954   -5.118                     23
       1956    6.148                     29
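The notes do not say how the `bin` column was built. One construction that reproduces the four rows shown is to cut `dmargin` into 2-point-wide intervals starting at -50; the sketch below assumes that rule and checks it against the listed values.

```r
# dmargin values from the four rows shown above
dmargin <- c(-6.218, -4.146, -5.118, 6.148)

# Assumed bin edges: -50, -48, ..., 50 (2 points wide)
breaks <- seq(-50, 50, 2)

# Index of the interval each margin falls into
bin <- findInterval(dmargin, breaks)
bin

# Midpoint of each bin, matching the plotting positions seq(-49, 49, 2)
bins <- seq(-49, 49, 2)
bins[bin]
```

Under this assumed rule, bin 25 ends at a margin of zero and bin 26 starts there, so no bin straddles the threshold.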

Setup Variables

# Setup square and cubic terms for forcing variable

dataset$dmargin2 <- dataset$dmargin^2

dataset$dmargin3 <- dataset$dmargin^3

# Setup interaction between forcing variable and threshold

dataset$dmargin_demwin <- dataset$dmargin * dataset$demwin

# Setup square and cubic terms for forcing variable * threshold interactions

dataset$dmargin_demwin2 <- dataset$dmargin_demwin^2

dataset$dmargin_demwin3 <- dataset$dmargin_demwin^3

Preliminary Plot

###################################

# Preliminary Plot

###################################

# Setup bins for plotting

bins <- seq(-49,49,2)

# Get the mean within each bin

means <- tapply(dataset$dwinnext,dataset$bin,mean)

# Plot the results

plot(bins,means,

pch=19,

ylab="Probability of Winning Next Election",

xlab="Vote Margin in the Last Election",

xlim=c(-50,50),

col="lightblue")

# Add line at zero

abline(v=0,lty=2,col="grey")

###################################
# Modeling
###################################

# Fit the linear model; named model1 so the anova() comparison below can find it
model1 <- lm(dwinnext ~ dmargin + demwin + dmargin_demwin,
             data=dataset)
summary(model1)

Model 1 Results

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)     0.2362171  0.0096311  24.526   <2e-16 ***
dmargin         0.0051402  0.0003727  13.790   <2e-16 ***
demwin          0.5558085  0.0139324  39.893   <2e-16 ***
dmargin_demwin -0.0008619  0.0005163  -1.669   0.0951 .
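The RD estimate in Model 1 is the coefficient on `demwin`. As a short worked example, the sketch below takes the reported estimate and standard error and computes an approximate 95% confidence interval, plus the predicted probabilities on either side of a zero vote margin. It assumes a normal approximation, which is reasonable with several thousand residual degrees of freedom.

```r
# Values copied from the Model 1 output above
intercept <- 0.2362171
estimate  <- 0.5558085   # coefficient on demwin (RD estimate)
std_error <- 0.0139324

# Approximate 95% confidence interval for the RD estimate
ci <- estimate + c(-1, 1) * qnorm(0.975) * std_error
ci

# Predicted probability of winning the next election at a margin of zero
p_just_below <- intercept             # demwin = 0
p_just_above <- intercept + estimate  # demwin = 1
c(below = p_just_below, above = p_just_above)
```

Read this way, the linear model puts the jump at the threshold at about 56 percentage points, from roughly 0.24 just below to roughly 0.79 just above.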

# Add square terms

model2 <- lm(dwinnext ~ dmargin + dmargin2 +

demwin + dmargin_demwin + dmargin_demwin2,

data=dataset)

summary(model2)

# Compare versus model 1

anova(model1, model2)

Model 2 Results

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
(Intercept)      0.28847535 0.01425106  20.242  < 2e-16 ***
dmargin          0.01172643 0.00137841   8.507  < 2e-16 ***
dmargin2         0.00014036 0.00002829   4.962 7.14e-07 ***
demwin           0.44811150 0.02054055  21.816  < 2e-16 ***
dmargin_demwin  -0.00053605 0.00196543  -0.273    0.785
dmargin_demwin2 -0.00028161 0.00003958  -7.114 1.23e-12 ***

Analysis of Variance Table

Model 1: dwinnext ~ dmargin + demwin + dmargin_demwin
Model 2: dwinnext ~ dmargin + dmargin2 + demwin + dmargin_demwin + dmargin_demwin2

  Res.Df    RSS Df Sum of Sq     F    Pr(>F)
1   7593 732.19
2   7591 727.33  2    4.8522 25.32 1.096e-11 ***
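The F statistic in this table can be reproduced by hand from the reported residual sums of squares and degrees of freedom. The sketch below uses only the numbers printed above; because those are rounded, the result matches the table only approximately.

```r
# Values copied from the analysis of variance table above
df_1   <- 7593      # residual df, model with linear terms only
df_2   <- 7591      # residual df, model with square terms added
rss_2  <- 727.33    # residual sum of squares, larger model
sum_sq <- 4.8522    # reduction in RSS from adding the square terms

# F = (reduction in RSS per added term) / (residual variance of larger model)
f_stat <- (sum_sq / (df_1 - df_2)) / (rss_2 / df_2)
f_stat

# Upper-tail p-value from the F distribution
p_value <- pf(f_stat, df_1 - df_2, df_2, lower.tail = FALSE)
p_value
```

A small p-value here says the square terms improve the fit by more than chance would explain, which is the basis for preferring the second model over the first.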

# Run fully specified model

model3 <- lm(dwinnext ~ dmargin + dmargin2 + dmargin3 + demwin

+ dmargin_demwin + dmargin_demwin2 +

dmargin_demwin3,

data=dataset)

summary(model3)

# Compare versus model 2

anova(model2, model3)

Model 3 Results

Coefficients:
                    Estimate  Std. Error t value   Pr(>|t|)
(Intercept)      0.300040593 0.018943445  15.839    < 2e-16 ***
dmargin          0.014578041 0.003374783   4.320 0.00001582 ***
dmargin2         0.000288379 0.000162408   1.776     0.0758 .
dmargin3         0.000002045 0.000002209   0.926     0.3547
demwin           0.385243821 0.027359614  14.081    < 2e-16 ***
dmargin_demwin   0.009250574 0.004872682   1.898     0.0577 .
dmargin_demwin2 -0.001068132 0.000231675  -4.610 0.00000408 ***
dmargin_demwin3  0.000006539 0.000003111   2.102     0.0356 *

Analysis of Variance Table

Model 1: dwinnext ~ dmargin + dmargin2 + demwin + dmargin_demwin + dmargin_demwin2
Model 2: dwinnext ~ dmargin + dmargin2 + dmargin3 + demwin + dmargin_demwin + dmargin_demwin2 + dmargin_demwin3

  Res.Df    RSS Df Sum of Sq      F    Pr(>F)
1   7591 727.33
2   7589 725.78  2    1.5515 8.1114 0.0003027 ***

I have modeled a discrete (win / loss) outcome using linear regression

I have also posted code to perform the same analysis using logistic regression
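For orientation, a logistic version of the linear specification might look like the sketch below. This is not the posted course code: the data are simulated, the coefficients on the log-odds scale are made up, and only the variable names follow the incumbency example.

```r
set.seed(2008)
n <- 5000
dmargin        <- runif(n, -50, 50)
demwin         <- as.numeric(dmargin > 0)
dmargin_demwin <- dmargin * demwin

# Hypothetical data-generating process on the log-odds scale
log_odds <- -1 + 0.03 * dmargin + 2 * demwin
dwinnext <- rbinom(n, size = 1, prob = plogis(log_odds))

sim <- data.frame(dwinnext, dmargin, demwin, dmargin_demwin)

# Same right-hand side as the linear model, fitted as a logistic regression
logit_model <- glm(dwinnext ~ dmargin + demwin + dmargin_demwin,
                   family = binomial, data = sim)

# Predicted probabilities just below and just above a margin of zero
at_threshold <- data.frame(dmargin = 0, demwin = c(0, 1), dmargin_demwin = 0)
p_hat <- predict(logit_model, newdata = at_threshold, type = "response")

# Change in predicted probability at the threshold
rd_logit <- unname(p_hat[2] - p_hat[1])
rd_logit
```

Because logistic coefficients are on the log-odds scale, the jump is reported here as a difference in predicted probabilities so that it can be compared with the linear model's estimate.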

