
06

Non-experimental
methodologies for
quantitative analysis
Markus Frölich, Andreas Landmann,
Markus Olapade, and Robert Poppe

6.1. Introduction

The ultimate objective of quantitative analysis is to establish causality. Researchers want to know the causal influence of a factor—the effect that can be attributed to this factor and to this factor only. Done correctly, quantitative analysis allows both quantifying the magnitude of this causal effect and computing the statistical precision of the estimation (confidence intervals). The goal of an impact evaluation is to measure the causal effect of a policy reform or intervention on a set of well-defined outcome variables.

Knowing causal relationships is useful for making predictions about the consequences of changing policies or circumstances; such relationships answer the question of what would happen in alternative (counterfactual) worlds. As an example, one could try to identify the causal effect of introducing a community-based health insurance on health status or on out-of-pocket spending for health of the insured in a specific district of a developing country.

6.2. Selection bias and comparison issues

The fundamental problem of impact evaluation is the impossibility of observing an individual in two states at a moment in time; each individual is either in the programme under consideration or not, but not both. The impact of a development programme can only be identified by comparing realised outcomes of those who did receive and of those who did not receive an intervention. Thus, data on non-participating individuals need to be collected as well. The issue of selection bias is of central concern in this context: selection bias may arise when treated and non-treated individuals differ with respect to observed and unobserved characteristics. One reason could be, for example, a project manager who deliberately chooses some individuals to be eligible for the programme but not others. Another important source of selection bias is self-selection, i.e., when individuals themselves choose to be treated or not.

Thus, any impact evaluation should be based on a detailed understanding of why some individuals or communities participated whilst others did not participate in the programme; otherwise, results are likely to be biased.

Box 1: Potential outcomes and objects of interest

To make ideas more precise, let $Y_i^D$ denote the outcome if individual or community $i$ is exposed to development intervention $D$, where individual $i$ either receives ($D=1$) or does not receive ($D=0$) the treatment. Before programme start, each individual (or community) has two hypothetical outcomes: a potential outcome $Y_i^1$ if individual $i$ participates in the programme, and a potential outcome $Y_i^0$ if individual $i$ does not participate in the programme. The causal effect of the intervention is defined as the difference between $Y_i^1$ and $Y_i^0$, i.e., the effect of participation in the programme relative to what would have happened had individual $i$ not participated in the programme. The individual effect of the intervention is usually averaged over the population of interest, defined as the average treatment effect $ATE = E[Y^1 - Y^0]$. It can be interpreted as the average treatment effect for a person randomly drawn from the population or, alternatively, as the expected change in the average outcome if the treatment status indicator of development intervention $D$ were changed from 0 to 1 for every individual (provided that no general equilibrium effects occur). In a policy evaluation context, of particular interest is the average treatment effect on the treated (ATT), defined as $ATT = E[Y^1 - Y^0 \mid D=1]$. It may be more informative to know how the programme affected those who chose to participate in it than how it affected those who could have participated but decided not to.
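To make the notation concrete, here is a minimal simulation sketch; all numbers, including the constant effect of one extra hospital visit, are hypothetical assumptions:

```python
import random

random.seed(0)

# Hypothetical population: each unit i has two potential outcomes,
# y0 (without the programme) and y1 (with the programme).
# In real data only one of the two is ever observed for each unit.
population = []
for _ in range(10_000):
    y0 = random.gauss(2.0, 0.5)  # e.g., hospital visits without insurance
    y1 = y0 + 1.0                # assumed constant causal effect: +1 visit
    population.append((y0, y1))

# ATE: the individual effects y1 - y0 averaged over the whole population.
ate = sum(y1 - y0 for y0, y1 in population) / len(population)
print(f"ATE = {ate:.2f}")
```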

One way to avoid selection bias is to randomly assign individuals to a treatment and a control group. We refer to randomised trials as methods that randomly assign individuals who are equally eligible and willing to participate into distinct groups; they are generally considered the most robust of all evaluation methodologies and are sometimes referred to as the gold standard (Angrist 2004). Given appropriate sample sizes, the two groups will have approximately the same characteristics and differ only in terms of treatment status. They will be approximately equal with respect to variables like race, sex, and age, and also with respect to difficult-to-measure variables such as lifestyle-related risks, quality of social networks, and health awareness.

Box 2: How randomisation eliminates selection bias

Randomisation allows a simple interpretation of results: the impact is measured by the difference in means between treatment and control group. Why is this so? With randomisation, the average treatment effect on the treated (ATT) can be written as

$ATT = E[Y^1 \mid D=1] - E[Y^0 \mid D=1] = E[Y^1 \mid D=1] - E[Y^0 \mid D=0] = E[Y \mid D=1] - E[Y \mid D=0]$.

Thus, randomisation allows replacing the expected (unobserved) counterfactual outcome, $E[Y^0 \mid D=1]$, with the expected observed outcome of the non-participants, $E[Y^0 \mid D=0]$. Essentially, $E[Y^0 \mid D=0]$ is used to mimic the counterfactual. Because of randomised assignment, it holds that $E[Y^0 \mid D=1] = E[Y^0 \mid D=0]$, i.e., the expected non-programme participation outcome is the same whether an individual actually participates or does not participate in the programme. This last equality usually does not hold in non-experimental studies. Individuals have, for example, a better non-programme participation outcome if there is positive selection into the treatment (“they would have done better anyway”). This would lead to upward biased results.
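Box 2's argument can be checked numerically: under random assignment the difference in means recovers the true effect, whilst positive selection biases it upwards. A minimal sketch, with all numbers hypothetical:

```python
import random

random.seed(1)

# Potential outcomes: y0 varies across units; treatment adds exactly 1 visit.
y0s = [random.gauss(2.0, 0.5) for _ in range(20_000)]

def diff_in_means(ds):
    """Observed difference in mean outcomes between treated and controls."""
    treated = [y0 + 1.0 for y0, d in zip(y0s, ds) if d]
    control = [y0 for y0, d in zip(y0s, ds) if not d]
    return sum(treated) / len(treated) - sum(control) / len(control)

# Randomised assignment: D is independent of y0, so E[Y^0|D=1] = E[Y^0|D=0].
d_random = [random.random() < 0.5 for _ in y0s]
random_est = diff_in_means(d_random)

# Positive selection: units with high y0 select into treatment
# ("they would have done better anyway") -> upward-biased estimate.
d_selected = [y0 > 2.0 for y0 in y0s]
selected_est = diff_in_means(d_selected)

print(f"randomised: {random_est:.2f}  self-selected: {selected_est:.2f}")
```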

In practice, there are several problems with randomisation. Firstly, it may be unethical to deny access to benefits for a subgroup of individuals.1 Secondly, it may be politically unfeasible to randomly deny access to a potentially beneficial intervention. Thirdly, there may not be any individuals who are unaffected if the scope of the programme is nationwide. Fourthly, problems arise when, after randomised assignment, individuals cross over to the treatment group. For example, people might travel to another municipality to buy insurance after having learned that a microfinance institution offers a new life insurance scheme there. Fifthly, individuals assigned to the treatment group may not take up treatment, or individuals assigned to the control group may seek similar treatment through alternative channels.

Sometimes, randomised trials are impractical. However, impact evaluations are most valuable when we use data to answer specific causal questions as if in a randomised controlled trial. In the absence of an experiment, we may look for a natural or quasi-experiment that mimics a randomised trial in that there is a group affected by the programme and some control group that is not affected. If it is credible to argue that the groups do not differ systematically, such a natural or quasi-experiment can be used for evaluation instead of a randomised experiment.

1 For a discussion see Burtless and Orr (1986).

Unfortunately, it is often hard to justify that programmes that were not ex-ante planned as randomised experiments do in fact fulfil this criterion. This is why it is preferable to think about evaluation and the appropriate design before the programme is implemented.

In non-experimental studies, researchers often try to approximate a randomised experiment by using statistical methods. We will discuss several non-experimental methods in the following paragraphs. The issue of selection bias is of central concern here. Rather complex statistical methods are required in order to deal with selection bias when using non-experimental data. The methods differ in the way they correct for differences in (observed and unobserved) characteristics between the treatment and the control group and by their underlying assumptions.2

Two non-experimental methods—differences in means and before-after estimation (also called reflexive comparisons)—usually do not give a satisfactory solution to the selection issue when using non-experimental data. In general, this is because changes in the outcomes cannot be attributed to the programme alone. The former method, differences in means, is based on cross-sectional data and uses the outcome of the non-participants to impute the counterfactual outcome for the participants. The underlying assumption is that individual characteristics, on average, do not play a role for the difference between the treated and the non-treated, which is a strong assumption. The latter method, the before-after estimator, is based on (at least) two cross-sections of data—one cross-section before programme start and one cross-section after the programme. It uses the participants' pre-intervention outcome to impute the counterfactual outcome for the participants. The drawback of this method is that it is impossible to separate programme effects from general effects that occurred during the same period.

Throughout this chapter, a hypothetical example will illustrate the different evaluation methods. Using a microinsurance example for inpatient and outpatient hospital visits, we will explain the main concepts to determine the effect of this insurance on our outcome variable:

2 For a more general and accessible introduction to impact evaluation see Leeuw and Vaessen (2009).

number of hospital visits. This example can then be generalised to any other insurance and outcome of interest.

Figure 1 displays the impact health insurance has on the number of hospital visits. At time t=0, the microinsurance for inpatient and outpatient services in a nearby hospital is introduced in our village. At that point in time, the villagers visit the hospital 1.5 times a year on average. At time t=1, the number of hospital visits has increased to 3.5 visits per year (blue line). Without the introduction of microinsurance, the frequency would have increased to 2.5 times only (dotted red line). As we will explain, the outcome 2.5 constitutes the counterfactual outcome. Thus, the (true) impact of the microinsurance scheme is 1 hospital visit per year.

Figure 1: Real impact of microinsurance on hospital visits (hypothetical impact of microinsurance; number of hospital visits at t=0 and t=1; red = without microinsurance scheme, blue = with microinsurance)

The cross-sectional comparison by difference in means is our starting point for the impact evaluation. Here, we compare two villages at t=1, where it so happens that in the treatment village microinsurance is available. If the inhabitants of this village buy a policy, they can make claims to the insurer after a hospital visit. Inhabitants of the control village do not have this opportunity and need to pay the total hospital bill out of their own pocket. In a cross-sectional comparison, we directly compare the hospital visits of the treated and untreated groups.

Figure 2 displays the naïve estimator of the treatment effect using a cross-sectional comparison. The treatment village has 3.5 hospital visits a year per person on average, whereas village 2 has only two visits per year. The naïve estimator is simply the difference between those two outcomes, which is 1.5, a biased estimate of the treatment effect.

It is obvious that part of the effect might be attributable to the availability of health insurance, but it cannot be ruled out that other factors have also affected this result.

1. Were the two villages different from the outset? Different in observables like wealth or education, or different in unobservables such as trust in hospital staff?

Figure 2: Cross-sectional comparison (hospital visits per year in the village with insurance vs. the village without insurance)

2. Has one of the villages had other influences that might affect the number of hospital visits? The road leading to the hospital might have been blocked for village 2, hindering people from visiting the hospital and leading to an overestimation of our treatment effect.

Myriad scenarios can be constructed to answer these two questions. According to our example from figure 1, however, an actually comparable control village should have 2.5 hospital visits per year in t=1. This shows that, using a cross-sectional comparison, the impact evaluator cannot be sure whether the effect of microinsurance on hospital visits results only from the availability of microinsurance or also from other confounding factors.

Another naïve estimator would be the before-after comparison. Here, we need to conduct a baseline survey amongst the population of the treatment village (shortly) before the microinsurance is made available. We would ask explicitly how often a week the inhabitants go to the hospital.3 Then the insurance is made available and, after a certain period of time, the same survey questions are gathered again from the village. Of course, the time between the introduction of the microinsurance and the follow-up survey needs to be long enough for certain incidences of sickness and claims to occur. As depicted in figure 3, the estimated impact of our hypothetical insurance is the difference between the number of hospital

3 Of course, it might be more reliable to gather this information directly from the hospital: “How many inhabitants from village 1 visit the hospital a week?”

visits before and after introduction, which is two hospital visits per year. The interpretation would be that introducing microinsurance that covers the cost of inpatient and outpatient care increases the number of hospital visits by two visits per year.

Figure 3: Before-and-after comparison (hospital visits per year in the treatment village, before and after the introduction of microinsurance)

However, this estimator relies on the important assumption that, in between our two surveys, no other factors have occurred that might cause a change in hospital visits. This means that, for the before-and-after estimator to produce reliable results, our researcher must be sure no other confounding effects have occurred between the two surveys. In fact, figure 1 shows that without microinsurance there still would have been an upwards trend in hospital visits and that, therefore, our before-after comparison delivers unreliable results. Reasons for an upwards trend in hospital visits independent of the treatment could be increases in prosperity, decreases in transportation costs, and many other scenarios.

In sections 6.3 through 6.6 of this chapter, the following four non-experimental approaches will be explained:

1. Instrumental variables
2. Regression discontinuity design (RDD)
3. Propensity score matching (PSM)
4. Difference-in-differences (DID)

Amongst them, the first two approaches, if applicable, usually give the most convincing results. We will also stress the importance of internal and external validity in each case. Internal validity is the extent to which the results are credible for the population under consideration. External validity is the extent to which this subpopulation is representative for the whole population (of interest). Some methods give results with high internal validity, but low external validity, and vice versa. We conclude with a discussion of where non-experimental methods should be applied.

6.3. Instrumental variable

Selection bias occurs when an omitted variable has an effect on the outcome variable of interest and on the treatment. It is also called selection on unobservables, whereby treatment selection is affected by a variable that the researcher cannot observe in the data. For example, individuals with insurance could have had a higher (unobserved) awareness of health issues from the outset. Consequently, they would show different health behaviour than those without insurance. Figure 4 illustrates this simple case with arrows indicating directions of influence and dashed lines indicating unobserved variables (health awareness). The instrument affects insurance take-up without being itself affected by different levels of awareness about insurance. In the absence of a good instrument, one could not tell apart the effect of the insurance from the effect of awareness on hospital visits. Instrumental variable methods solve this problem of omitted control variables. An instrumental variable is a variable which has an effect on whether an individual takes up or does not take up treatment and at the same time is permitted to affect the outcome variable of interest via the treatment variable only. This is called the exclusion restriction. In other words, individuals with different values of the instrument differ in their treatment status. But otherwise, these individuals are comparable. Often, the exclusion restriction will only be valid conditionally, that is, when controlling for individual characteristics.

Figure 4: Setup with instrumental variable (the instrument affects insurance take-up, which in turn affects hospital visits; unobserved awareness influences both insurance take-up and hospital visits)

Box 3: How the instrumental variables method solves the problem of unobservables

More formally, instrumental variable Z affects treatment status D, but there is no direct relationship between Z and the outcome variable Y. Hence Z is allowed to affect Y only indirectly via D. Suppose we have

$Y_i = \alpha + \beta D_i + \gamma A_i + \varepsilon_i$

where $Y_i$ is the outcome variable, $D_i$ is the treatment indicator variable, $A_i$ is an unobserved variable that is correlated with $D_i$, and $\varepsilon_i$ is a random error term.

If we now estimate $Y_i = \alpha + \beta D_i + u_i$ (since $A_i$ is unobserved), $D_i$ will be correlated with the residual $u_i$ because, in effect, we have $u_i = \gamma A_i + \varepsilon_i$. $A_i$ and $D_i$ are correlated, so $D_i$ and $u_i$ are correlated as well and, thus, $D_i$ is endogenous.

The instrument can now be used to get a consistent estimate of the effect of the endogenous variable. Researchers use a method that is called two-stage least squares: in the first stage, the instrument(s) Z is used to give estimated values of the endogenous treatment variable D for every individual (or community):4

$\hat{D}_i = \hat{\pi}_0 + \hat{\pi}_1 Z_i$

Then, in the second stage, this new variable is plugged into the equation of interest:

$Y_i = \alpha + \beta \hat{D}_i + e_i$

The coefficient $\hat{\beta}$ now gives a consistent estimate of the treatment effect.
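The two stages in box 3 can be sketched with simple one-regressor least squares; the coefficients and data-generating process below are hypothetical, and a real application would use a statistical package and report standard errors:

```python
import random

random.seed(2)

def slope(x, y):
    """OLS slope of y on x (single regressor plus intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var

n = 50_000
awareness = [random.gauss(0, 1) for _ in range(n)]   # unobserved A
z = [random.gauss(0, 1) for _ in range(n)]           # instrument Z
# Treatment D depends on the instrument and on unobserved awareness.
d = [0.5 * zi + 0.5 * ai + random.gauss(0, 1) for zi, ai in zip(z, awareness)]
# True model: Y = 1.0*D + 1.0*A + noise, so the true beta is 1.
y = [di + ai + random.gauss(0, 1) for di, ai in zip(d, awareness)]

naive = slope(d, y)  # biased upwards: D is correlated with A

# Stage 1: regress D on Z. Stage 2: regress Y on the fitted values of D.
pi1 = slope(z, d)
d_hat = [pi1 * zi for zi in z]  # intercept omitted; slope() demeans anyway
two_sls = slope(d_hat, y)       # equals cov(Z,Y)/cov(Z,D), consistent for beta

print(f"naive OLS: {naive:.2f}  2SLS: {two_sls:.2f}")
```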

Using an instrument for the evaluation of microinsurance has not been done often and, in general, instruments are hard to find. We therefore use a hypothetical example to illustrate our point in the context of this guide. Suppose the government sets up a health insurance programme for the poor. Everybody who is interested has to register

4 Note that using predicted values as additional regressors in the way presented here only works in linear models.

and purchase the product at the local insurance administration centre of the neighbourhood or municipality. Now imagine two households that are very close, but on different sides of the border between two neighbourhoods. The distance to the administrative centre might differ considerably, but otherwise the two neighbours should be very similar. For such pairs, distance to the administrative centre could be used as a predictor of insurance take-up that is otherwise unrelated to individual characteristics—in other words, a good instrument. Here, the instrument is correlated with insurance take-up but not with awareness. Thus, instead of comparing the treated to the untreated, we compare those with high values and low values of the instrument. This example is analogous to the famous distance-to-school instrument used by Card (1995) for schooling. Typically, an instrument requires including additional X variables, e.g., quality of the neighbourhood, degree of urbanisation, family background, etc.

We may also generate instrumental variables ourselves by randomly assigning incentives or encouragements to individuals (random encouragement design). This approach looks very much like a proper randomised experiment, except that we have imperfect control over the beneficiaries. An encouragement or incentive is given to the individuals in the treatment group, whilst the individuals in the control group do not receive such an encouragement or incentive (or receive a different one). It is up to the individuals whether they sign up for the actual treatment. For example, imagine that the price of insurance is varied randomly across communities, creating a random incentive to buy insurance for the population facing a lower price. The instrumental variable that is generated here helps resolve the problem of selection bias and allows consistent estimation of the effect that insurance take-up has on health and other outcome measures. Similarly, we may vary the effort related to take-up by, for example, varying service hours, density of offices in a community, etc., from the insurer's side. If areas or individuals cannot be exclusively chosen for a programme at random, we may at least give them varying incentives to take it up.

If there is an instrument that fulfils the exclusion restriction as explained above, internal validity is high. However, external validity depends on another quality of the instrument. If the instrument predicts treatment status accurately, external validity is also likely to be high. Otherwise the instrumental variable results cannot be generalised to the whole population. The reason is that only those who are induced to take up treatment by the instrument can be used for the estimation of the treatment effect.

6.4. Regression discontinuity design

Although not as rigorous as random assignment, the regression discontinuity design (RDD) approach may give more convincing results than propensity score matching (PSM) and difference-in-differences (DID) methods (see below). The idea of RDD is to exploit some cut-off point that is important for selection into treatment and compare people near this cut-off. Thus, this approach implicitly compares treated subjects to a control group that is very similar.
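The idea of comparing units just around the cut-off can be sketched as follows; the poverty-index scale, cut-off, bandwidth, and effect size are all hypothetical, and real RDD estimation would fit local regressions on each side rather than compare raw means:

```python
import random

random.seed(3)

CUTOFF = 50.0     # hypothetical poverty-line value of the index
BANDWIDTH = 2.0   # compare only units within +/- 2 index points

data = []
for _ in range(100_000):
    index = random.uniform(0, 100)      # poverty index, poor (low) to rich (high)
    eligible = index < CUTOFF           # only the poor may buy the insurance
    visits = 1.0 + 0.02 * index + random.gauss(0, 0.3)  # smooth trend in index
    if eligible:
        visits += 1.0                   # assumed true treatment effect
    data.append((index, visits))

mean = lambda xs: sum(xs) / len(xs)

# Comparing ALL eligible vs. ALL ineligible units mixes the treatment
# effect with the smooth trend -- here the two happen to cancel out.
naive_est = mean([v for i, v in data if i < CUTOFF]) \
          - mean([v for i, v in data if i >= CUTOFF])

# RDD-style contrast: only units close to the cut-off, where the trend
# contributes almost nothing to the difference.
rdd_est = mean([v for i, v in data if CUTOFF - BANDWIDTH <= i < CUTOFF]) \
        - mean([v for i, v in data if CUTOFF <= i < CUTOFF + BANDWIDTH])

print(f"naive: {naive_est:.2f}  near cutoff: {rdd_est:.2f}")
```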
A standard application is when enrolment into treatment is limited and selection of participants is conducted according to an objective rule. Frequently, such targeting is done on the basis of a poverty index: individuals on one side of the threshold receive the treatment, whilst it is withheld from those on the other side. When comparing individuals very close to this threshold, their characteristics barely differ, except with respect to their treatment status. It is basically random whether an individual is below or above the cut-off, given that the individual is close to the cut-off value. However, for this approach to be valid, individuals must be unable to manipulate their value of the index such that they would become eligible for the treatment.5 To test the plausibility of this assumption, we can use statistical methods. Although discontinuities in evaluation studies are often unplanned, they may also be integrated ex-ante.

To give an illustrative example, imagine the government wants to introduce microinsurance especially for the poor. The village administration is responsible for the distribution of the insurance and relies on a poverty index to determine the eligible households. Only households that are considered poor under this index are eligible and can buy the insurance. Such discontinuities do not necessarily have to be

5 Even if they have some influence, the approach is feasible as long as they are unable to manipulate their assignment precisely. The solution in this case is the so-called fuzzy regression discontinuity design.

Figure 5: Regression discontinuity design (hospital visits per year by poverty index, from poor to rich; dark blue = eligible for microinsurance scheme, light blue = not eligible; the outcome jumps at the poverty line)

planned as part of the intervention (even though it is certainly beneficial to have them planned beforehand). Instead, the evaluator could detect and exploit any rule used in practice to determine participation in the programme.

Figure 5 shows the number of hospital visits by ranking in the poverty index some time after the programme started in our hypothetical village. We see that the number of hospital visits increases and that there is a jump exactly at the poverty line. This jump is a result of the microinsurance programme and the restriction that only poor people have access to this programme. Any household that is above the poverty line has no access to the insurance product. RDD assumes that households just above the poverty line are, in fact, similar to those slightly below it in all relevant aspects. Therefore, we can use the households that are eligible for the insurance and very close to the threshold as our treatment group, whilst those slightly above the threshold serve as control group.

The benefit of RDD is that it does not need actual randomisation. However, the interpretation of the estimated impact is limited to the population that is close to the threshold. As a result, the external validity of this approach is rather limited. Further, it usually requires a large sample for estimation.

6.5. Propensity score matching (PSM)

The basic idea of propensity score matching is to match at least one non-participant to every participant with identical or highly similar values of observed characteristics X. The difference in outcome, Y, between these two individuals is then averaged over the whole population. A practical simplification is to match non-participants to participants on the basis of the propensity score, which is defined as the probability of treatment.

Box 4: How matching eliminates selection bias

Using matching, the average treatment effect on the treated (ATT) can be defined as

$ATT = E[Y^1 \mid D=1] - E_X\{E[Y^0 \mid D=0, X] \mid D=1\}$.

As in the randomised trial, the expected counterfactual outcome, $E[Y^0 \mid D=1, X]$, can be replaced by the expected observed outcome of the non-participants, $E[Y^0 \mid D=0, X]$, but only conditional on a set of observable covariates, X. If treated and non-treated differ in terms of observable characteristics X only and not in terms of unobservables (the so-called selection on observables assumption), it holds that $E[Y^0 \mid D=1, X] = E[Y^0 \mid D=0, X]$. Otherwise, selection bias will remain an issue.

Instead of matching on X, it suffices to match on the propensity score p(X), i.e., the probability of treatment defined as $p(X) = \Pr(D=1 \mid X)$ (Rosenbaum and Rubin 1983). We can therefore also write

$ATT = E[Y^1 \mid D=1] - E_{p(X)}\{E[Y^0 \mid D=0, p(X)] \mid D=1\}$.
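A minimal nearest-neighbour sketch of the matching idea, using a single observed covariate X in place of an estimated propensity score (a real application would estimate p(X), e.g., with a logit model; all numbers are hypothetical):

```python
import math
import random
from bisect import bisect_left

random.seed(4)

# One observed covariate X (e.g., wealth) drives both take-up and outcome.
units = []
for _ in range(5_000):
    x = random.gauss(0, 1)
    p = 1 / (1 + math.exp(-x))           # true propensity score p(X)
    d = random.random() < p              # take-up rises with X
    y = 2.0 + 0.8 * x + (1.0 if d else 0.0) + random.gauss(0, 0.2)
    units.append((x, d, y))

treated = [(x, y) for x, d, y in units if d]
controls = sorted((x, y) for x, d, y in units if not d)
cx = [x for x, _ in controls]

def matched_control_outcome(x):
    """Outcome of the non-participant whose X is closest to x."""
    i = bisect_left(cx, x)
    j = min((k for k in (i - 1, i) if 0 <= k < len(cx)),
            key=lambda k: abs(cx[k] - x))
    return controls[j][1]

# ATT: average gap between each treated unit and its matched control.
att = sum(y - matched_control_outcome(x) for x, y in treated) / len(treated)

naive = sum(y for _, y in treated) / len(treated) \
      - sum(y for _, y in controls) / len(controls)
print(f"matched ATT: {att:.2f}  naive difference: {naive:.2f}")
```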

It is important to include all variables in X that affect the outcome and selection into the programme at the same time. Another requirement is that the X variables need to be unaffected by the treatment, i.e., they should be measured before treatment starts. PSM requires both a thorough understanding of the selection process and a large data basis. Qualitative interviews with local project managers and participants may be helpful to determine which variables to collect in order to ensure that all important variables are included in X.

Panel data, if available, would allow testing the plausibility of the underlying assumptions by conducting a pseudo treatment test. The idea is to pretend that the participants received the treatment before the start of the intervention and then to estimate the impact. Because the intervention had not been in place yet, the estimated effect should be zero. If the estimation leads to a different finding, then this should be taken as evidence that participants and non-participants differ with respect to unobserved characteristics. If we are willing to assume

that these differences are time-invariant, then we can use a DID matching approach. If, however, we suspect that these differences change over time, then we need more or better X variables or a better understanding of the selection process.

Propensity score matching gives rather low internal validity due to its reliance on the selection on observables assumption. In other words, the results might be biased if there are variables that are correlated with insurance take-up and the outcome of interest (such as hospital visits) but cannot be observed in the data. External validity can be high, except in the case that we cannot find sufficiently comparable untreated individuals to be matched with every treated individual (the so-called common support requirement). These treated individuals would then need to be excluded from the analysis, which would reduce external validity.

6.6. Difference-in-differences (DID)

Relying on the assumption that selection is on observables only can be difficult to justify. Often we need a method that can also take care of confounding variables that are unobserved. However, as already mentioned, good instruments are hard to find. Therefore, we would like to have other tools to deal with unobservables. The DID estimator uses data with a time or cohort dimension to control for unobserved but time-invariant variables. It relies on comparing participants and non-participants before and after the treatment. The minimum requirement is to have data on an outcome variable, Y, and treatment status, D, before and after the intervention. (It can be carried out with or without panel data and with or without controlling for characteristics, X.) In its simplest form, we take the difference in Y for the participants before and after the treatment and subtract the difference in Y for the non-participants before and after the treatment. As a result, time-invariant differences in characteristics between participants and non-participants are eliminated, allowing us to identify the treatment effect. Consequently, this approach accounts for unobserved heterogeneity as long as it is time-invariant.

Box 5: How the DID estimator accounts for time-invariant unobservables

To make ideas more precise, suppose we have

$Y_{it} = \alpha_t + \beta D_{it} + \gamma c_i + \varepsilon_{it}$

where $c_i$ is a time-invariant variable.

With observations at time t=0 and t=1 we can take the first difference

$Y_{i1} - Y_{i0} = (\alpha_1 - \alpha_0) + \beta (D_{i1} - D_{i0}) + (\varepsilon_{i1} - \varepsilon_{i0})$.

Importantly, the time-invariant characteristic $c_i$ drops out. As discussed in box 3, using just one cross-section of data will lead to a biased treatment effect if $c_i$ is correlated with $D_{it}$. By using the DID approach we get rid of the problematic unobservable.
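Box 5's differencing argument can be checked numerically; here the unobserved, time-invariant c is deliberately correlated with treatment, so the cross-sectional contrast is biased but DID is not (all numbers hypothetical):

```python
import random

random.seed(5)

units = []
for _ in range(10_000):
    c = random.gauss(0, 1)          # unobserved, time-invariant characteristic
    d = c > 0                       # selection into treatment driven by c
    y0 = 1.5 + c + random.gauss(0, 0.2)                        # outcome at t=0
    y1 = 2.0 + c + (1.0 if d else 0.0) + random.gauss(0, 0.2)  # outcome at t=1
    units.append((d, y0, y1))

mean = lambda xs: sum(xs) / len(xs)
t_pre = mean([y0 for d, y0, _ in units if d])
t_post = mean([y1 for d, _, y1 in units if d])
c_pre = mean([y0 for d, y0, _ in units if not d])
c_post = mean([y1 for d, _, y1 in units if not d])

# Cross-sectional contrast at t=1 is contaminated by c ...
naive = t_post - c_post
# ... but differencing removes both c and the common time trend (+0.5).
did = (t_post - t_pre) - (c_post - c_pre)
print(f"naive: {naive:.2f}  DID: {did:.2f}")
```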

Panel data are helpful but not strictly required. Having cross-sectional data before and after the treatment may suffice. For instance, if villages participate in the intervention entirely, whilst other villages do not participate, it will suffice to conduct representative surveys in the villages before and after the intervention, i.e., interviewing the same individuals in the villages is not required. Thus, this method allows us to avoid problems with attrition commonly found in panel surveys.

The simple DID approach eliminates time-invariant heterogeneity. However, it fails to account for systematic differences in time trends between participants and non-participants. Therefore, we should include additional control variables, X, if we can argue that time trends are the same at least for treated and non-treated with the same X. This can, for example, be done with PSM. Another extension is to use additional differences of unaffected comparison
groups. For instance, imagine an Figure 6: Difference-in-differences


insurance product applies only to indi- with parallel time-trends
viduals below the age of 40. We can 1) Blue = with microinsurance scheme
then compare the time trend of individ- 2) Red = without microinsurance scheme
(counterfactual)
uals above the age of 40 in the treat- 3) Green = control group
ment villages with the time trend of 4
3,5
those above 40 in the control villages. 3,5
This difference in time trends can be 3

# of hospital visits
used to eliminate differences in time 2,5
2,5
trends of those under 40. A further 2
2
possibility is to use data for more than
1,5 1,5
one point in time before the treatment 1,5

is introduced. This would also allow 1 1


eliminating differences in time trends. 0,5
Having more than one survey after the
0
treatment implementation additionally t=0 t=1
allows the estimation of time-varying
and long-run treatment effects. the control village (the green line). In
this scenario the treatment effect is
In order to apply the DID estimator to (3.5-2)-(1.5-1)=1. We receive a reliable
our hypothetical example, we need result for our treatment effect.
data at two points in time for two dif-
ferent villages: one village where However, the second scenario shows
insurance is available and another one that if the time-trends are different the
where it is not. The treatment effect DID estimator does not produce reli-
is then calculated as the difference in
able results: (3.5-1.5)- (1.5-1)=1.5. Here,
hospital visits between the two villages
our estimated treatment effect overes-
after the introduction of our insurance
timates the true treatment effect by 0.5
in village 1 (t=1) and the difference
hospital visits because the treatment
between the two villages before insur-
ance was introduced (t=0). Two scenar- and control villages do not have paral-
ios show how this method relies on the lel time-trends. Reasons for the differ-
assumption that time-trends of the two ence in time-trends can be, for exam-
villages are equal. In figure 6, the two ple, macroeconomic effects that affect
villages have parallel time-trends. This treatment and control villages differ-
means that the counterfactual of the ently or any other confounding factors
village with insurance (red dashed line) that influence the number of hospital
changes over time in the same way as visits in one village but not in the other.
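The two scenarios can be written out as a short calculation, using the hospital-visit numbers from figures 6 and 7:

```python
def did(treated_pre, treated_post, control_pre, control_post):
    """Difference-in-differences: change in the treated village
    minus change in the control village."""
    return (treated_post - treated_pre) - (control_post - control_pre)

# Scenario 1 (figure 6): parallel trends, the true effect is recovered
effect_parallel = did(treated_pre=2.0, treated_post=3.5,
                      control_pre=1.0, control_post=1.5)
print(effect_parallel)  # 1.0

# Scenario 2 (figure 7): non-parallel trends, the effect is overestimated
effect_nonparallel = did(treated_pre=1.5, treated_post=3.5,
                         control_pre=1.0, control_post=1.5)
print(effect_nonparallel)  # 1.5
```

The estimator itself is identical in both scenarios; only the validity of the parallel-trends assumption differs.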
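The comparison of time trends among the ineligible over-40s described earlier amounts to a "difference-in-difference-in-differences". A minimal sketch, reusing the non-parallel numbers from figure 7 for the eligible group and assuming hypothetical visit numbers for the ineligible over-40s:

```python
# DID among the eligible (under-40) population, as in figure 7:
did_eligible = (3.5 - 1.5) - (1.5 - 1.0)    # = 1.5

# DID among the ineligible (over-40) population (hypothetical numbers);
# since they cannot buy the insurance, this DID captures only the
# differential time trend between the two villages:
did_ineligible = (2.0 - 1.0) - (1.5 - 1.0)  # = 0.5

# Triple difference: subtract the differential trend from the eligible DID
ddd = did_eligible - did_ineligible
print(ddd)  # 1.0
```

Under the assumption that the differential trend is the same for both age groups, the triple difference removes the bias that the simple DID suffers from in the non-parallel scenario.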

Figure 7: Difference-in-differences without parallel time trends
[Line chart of the number of hospital visits at t=0 and t=1. Blue = village with microinsurance scheme (1.5 to 3.5); red = without microinsurance scheme (counterfactual, dashed); green = control group (1 to 1.5).]

Internal validity of the DID approach hinges on the assumption that participants’ and non-participants’ outcome variables under consideration have the same time trend. As explained, there are tests to check for the plausibility of this assumption. In addition to the common trend assumption, we require that there are no spill-over effects from the participants to the non-participants. If these assumptions are fulfilled, internal validity is high. External validity is high, as long as the sample our data is based on is representative for the population of interest.

6.7. Fields of application of non-experimental methods

Non-experimental quantitative impact evaluation can be applied to many areas. However, the methods described belong to the field of microeconometrics and are suited to evaluating interventions on the micro- or meso-level. A central element is that there exist different units—individuals, firms, hospitals, water works, villages, local administrations, neighbourhoods, districts, etc.—some of which were exposed to the treatment, whilst others were not. In order to evaluate projects on the macro level, such as budget support for balancing the national budget, other econometric methods are more suitable.

Convincing evaluations based on non-experimental methods require a detailed understanding of the selection mechanism and comprehensive and representative data on the treatment and comparison groups. Moreover, we usually need more than 1,000 observations in order to obtain sufficiently precise estimates of the impact. Non-experimental methods are not suitable as a monitoring instrument for projects in the phase of introduction and should only be applied after initial problems have been resolved. Apart from initial obstacles, there are often larger modifications of the originally planned intervention, making a precise definition of the treatment more difficult.

Quantitative methodologies can be used to evaluate the impact of an intervention compared to a situation without it. It is more informative, however, to

evaluate the impact of an intervention relative to other interventions or, alternatively, to evaluate different variants of an intervention, keeping context and data collection procedure constant. This, for example, would allow us to look at the impact of different incentives or cost-sharing arrangements for subsidised insurance.

Non-experimental methods generally give less convincing results than experimental methods. Moreover, if the confidence intervals turn out to be very wide, we should not interpret these non-significant results as evidence for the absence of an impact. This interpretation is only valid if the confidence intervals are very narrow. The correct interpretation would be that the sample size was too small to draw reliable conclusions.
