Non-experimental methodologies for quantitative analysis
Markus Frölich, Andreas Landmann, Markus Olapade, and Robert Poppe
6.1. Introduction
Thus, any impact evaluation should be based on a detailed understanding about why some individuals or communities participated, whilst others did not participate in the programme; otherwise, results are likely to be biased.
To make ideas more precise, let $Y_i$ denote the outcome if individual or community $i$ is exposed to development intervention $D$. Before programme start, each individual (or community) has two hypothetical outcomes: a potential outcome $Y_i^1$ if individual $i$ participates in the programme, and a potential outcome $Y_i^0$ if individual $i$ does not participate in the programme. The causal effect of the intervention is defined as the difference between $Y_i^1$ and $Y_i^0$, i.e., the effect of participation in the programme relative to what would have happened had individual $i$ not participated in the programme. The individual effect of the intervention is usually averaged over the population of interest, defined as the average treatment effect $ATE = E[Y^1 - Y^0]$. It can be interpreted as the average treatment effect for a person randomly drawn from the population or, alternatively, as the expected change in the average outcome if the individual status indicator variable of development intervention $D$ were changed from 0 to 1 for every individual (provided that no general equilibrium effects occur), where individual $i$ either receives ($D=1$) or does not receive ($D=0$) the treatment. In a policy evaluation context, of particular interest is the average treatment effect on the treated (ATT), defined as $ATT = E[Y^1 - Y^0 \mid D=1]$. It may be more informative to know how the programme affected those who chose to participate in it than how it affected those who could have participated but decided not to.
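The definitions above, and the selection bias that motivates this chapter, can be made concrete in a small simulation (not from the chapter; all numbers are invented): individuals with better no-programme outcomes are more likely to participate, so the naïve comparison of participants and non-participants overstates a true effect of 1.0.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Potential outcomes: y0 without the programme, y1 with it.
y0 = rng.normal(10.0, 2.0, n)
y1 = y0 + 1.0                       # constant individual effect of +1.0

# Self-selection: individuals with high y0 participate more often.
d = y0 + rng.normal(0.0, 2.0, n) > 10.0

ate = (y1 - y0).mean()              # average treatment effect
att = (y1 - y0)[d].mean()           # average treatment effect on the treated
y = np.where(d, y1, y0)             # only one outcome is ever observed
naive = y[d].mean() - y[~d].mean()  # naïve participants-vs-rest comparison

print(round(ate, 2), round(att, 2), round(naive, 2))
```

Here `ate` and `att` both equal 1.0 by construction, whilst `naive` is far larger because participants would have had higher outcomes even without the programme.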
One way to avoid selection bias is to randomly assign individuals to a treatment and a control group. We refer to randomised trials as methods that randomly assign individuals who are equally eligible and willing to participate into distinct groups; they are generally considered the most robust of all evaluation methodologies and sometimes referred to as the gold standard (Angrist 2004). Given appropriate sample sizes, the two groups will have approximately the same characteristics and differ only in terms of the treatment status. They will be approximately equal with respect to variables like race, sex, and age, and also for difficult to measure variables, such as lifestyle-related risks, quality of social networks, and health awareness.
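This balancing property is easy to see in a stylised simulation (variables and numbers invented): under random assignment, treatment status is independent of both observed and unobserved characteristics, so group means are approximately equal.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 10_000

# A hypothetical population with an observable characteristic (age)
# and an unobservable one (health awareness).
age = rng.normal(40.0, 10.0, n)
awareness = rng.normal(0.0, 1.0, n)

# Random assignment: treatment status is independent of both.
d = rng.random(n) < 0.5

# With a large enough sample, group means are approximately equal,
# even for the characteristic no survey could measure.
age_gap = age[d].mean() - age[~d].mean()
awareness_gap = awareness[d].mean() - awareness[~d].mean()

print(round(age_gap, 1), round(awareness_gap, 2))
```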
$ATT = E[Y^1 \mid D=1] - E[Y^0 \mid D=0]$.
In practice, there are several problems with randomisation. Firstly, it may be unethical to deny access to benefits for a subgroup of individuals.1 Secondly, it may be politically unfeasible to randomly deny access to a potentially beneficial intervention. Thirdly, there may not be any individuals who are unaffected if the scope of the programme is nationwide. Fourthly, problems arise when, after randomised assignment, individuals cross over to the treatment group. For example, people might travel to another municipality to buy insurance after having learned that a microfinance institution offers a new life insurance scheme there. Fifthly, individuals assigned to the treatment group may not take up treatment, or individuals assigned to the control group may seek similar treatment through alternative channels.

Sometimes, randomised trials are impractical. However, impact evaluations are most valuable when we use data to answer specific causal questions as if in a randomised controlled trial. In the absence of an experiment, we may look for a natural or quasi-experiment that mimics a randomised trial in that there is a group affected by the programme and some control group that is not affected. If it is credible to argue that the groups do not differ systematically, such a natural or quasi-experiment can be used for evaluation instead of a randomised experiment.

1 For a discussion see Burtless and Orr (1986).
[Figure: hospital visits per year at t=0 and t=1]

of a health insurance, but it cannot be ruled out that other reasons have also affected this result.

1. Were the two villages different from the outset? Different in observables like wealth, education, or different in unobservables such as trust in hospital staff?
[Figure: hospital visits per year, village with insurance vs. village without insurance]
2. Has one of the villages had other influences that might affect the number of hospital visits? The road leading to the hospital might have been blocked for village 2, hindering people from visiting the hospital and leading to an overestimation of our treatment effect.

Myriad scenarios can be constructed to answer these two questions. According to our example from figure 1, however, an actual comparable control village should have 2.5 hospital visits per year in t=1. This shows that using cross-sectional comparison, the impact evaluator cannot be sure whether the effect of microinsurance on hospital visits results only from the availability of microinsurance or also from other confounding factors.

Another naïve estimator would be the before-after comparison. Here, we need to conduct a baseline survey amongst the population of the treatment village (shortly) before the microinsurance is made available. We would ask explicitly how often per week the inhabitants go to the hospital.3 Then the insurance is made available and after a certain period of time the same survey questions are gathered again from the village. Of course, the time between the introduction of the microinsurance and the follow-up survey needs to be long enough for certain incidences of sickness and claims to occur. As depicted in figure 3, the estimated impact of our hypothetical insurance is the difference between the number of hospital visits before and after the introduction of the insurance.

3 Of course, it might be more reliable to gather this information directly from the hospital: “How many inhabitants from village 1 visit the hospital a week?”
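Both naïve estimators can be written down in a few lines. The numbers below are invented for illustration: the control village starts from a lower baseline and both villages share a common upward trend, so each naïve comparison overstates an assumed true effect of 1.0.

```python
# Stylised village-level means (all numbers invented for illustration):
# hospital visits per year before (t=0) and after (t=1) the insurance.
visits = {
    ("treatment", 0): 1.5,  # treatment village at t=0, before insurance
    ("treatment", 1): 3.0,  # treatment village at t=1, after insurance
    ("control", 0): 1.0,    # control village starts from a lower baseline
    ("control", 1): 1.5,    # ... and also trends upward by +0.5 over time
}

# Suppose the true effect of insurance is +1.0 visits: without insurance,
# the treatment village would have reached 1.5 + 0.5 = 2.0 visits at t=1.

# Naïve cross-sectional comparison at t=1 ignores the baseline difference:
cross_section = visits[("treatment", 1)] - visits[("control", 1)]

# Naïve before-after comparison ignores the common time trend:
before_after = visits[("treatment", 1)] - visits[("treatment", 0)]

print(cross_section, before_after)  # both 1.5: each overstates the effect
```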
methods give results with high internal validity, but low external validity, and vice versa. We conclude with a discussion of where non-experimental methods should be applied.

6.3. Instrumental variable

Selection bias occurs when an omitted variable has an effect on the outcome variable of interest and the treatment. It is also called selection on unobservables, whereby treatment selection is affected by a variable that the researcher cannot observe in the data. For example, individuals with insurance could have had a higher (unobserved) awareness of health issues from the outset. Consequently, they would show different health behaviour than those without insurance. Figure 4 illustrates this simple case with arrows indicating directions of influence and dashed lines indicating unobserved variables (health awareness). The instrument affects insurance take-up without being itself affected by different levels of awareness about insurance. In the absence of a good instrument, one could not tell apart the insurance's effect and the effect of awareness on hospital visits. Instrumental variable methods solve this problem of omitted control variables.

An instrumental variable is a variable which has an effect on whether an individual takes up or does not take up treatment and at the same time is permitted to affect the outcome variable of interest via the treatment variable only. This is called the exclusion restriction. In other words, individuals with different values of the instrument differ in their treatment status. But otherwise, these individuals are comparable. Often, the exclusion restriction will only be valid conditionally, that is, when controlling for individual characteristics.
[Figure 4: Instrument → Insurance → Hospital visits, with unobserved Awareness (dashed) influencing both Insurance and Hospital visits]
More formally, instrumental variable $Z$ affects treatment status $D$, but there is no direct relationship between $Z$ and the outcome variable $Y$. Hence $Z$ is allowed to affect $Y$ only indirectly via $D$. Suppose we have

$Y_i = \alpha + \beta D_i + \varepsilon_i$.

The unobserved awareness is part of the error term: awareness and $D_i$ are correlated, so $\varepsilon_i$ and $D_i$ are correlated as well and, thus, $D_i$ is endogenous. The instrument can now be used to get an unbiased estimate of the effect of the endogenous variable. Researchers use a method that is called two-stage least squares: in the first stage, the instrument(s) $Z$ is used to give estimated values of the endogenous treatment variable $D$ for every individual (or community):4

$\hat{D}_i = \hat{\pi}_0 + \hat{\pi}_1 Z_i$

Then, in the second stage, this new variable is plugged into the equation of interest:

$Y_i = \alpha + \beta \hat{D}_i + \varepsilon_i$
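The two stages can be sketched in a few lines. The data-generating process below is invented for illustration: awareness `u` is unobserved and drives both take-up and the outcome, the instrument `z` shifts take-up only, and the true effect of insurance is set to 1.0.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Stylised data-generating process (an illustration, not the chapter's data):
u = rng.normal(0.0, 1.0, n)                 # unobserved health awareness
z = rng.normal(0.0, 1.0, n)                 # instrument, independent of u
d = (0.8 * z + u + rng.normal(0.0, 1.0, n) > 0).astype(float)  # take-up
y = 1.0 * d + u + rng.normal(0.0, 1.0, n)   # true effect of insurance: 1.0

def fit(xcols, target):
    """OLS coefficients of target on an intercept plus the given columns."""
    X = np.column_stack([np.ones(len(target))] + xcols)
    return np.linalg.lstsq(X, target, rcond=None)[0]

beta_ols = fit([d], y)[1]        # biased upward: u sits in the error term

pi = fit([z], d)                 # first stage: regress take-up on Z
d_hat = pi[0] + pi[1] * z        # fitted (exogenous) part of take-up
beta_2sls = fit([d_hat], y)[1]   # second stage: regress Y on fitted take-up

print(round(beta_ols, 2), round(beta_2sls, 2))
```

The OLS slope mixes the insurance effect with the awareness effect, whilst the two-stage estimate lands close to the true value of 1.0.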
and purchase the product at the local insurance administration centre of the neighbourhood or municipality. Now imagine two households that are very close, but on different sides of the border between two neighbourhoods. The distance to the administrative centre might differ considerably, but otherwise the two neighbours should be very similar. For such pairs, distance to the administrative centre could be used as a predictor of insurance take-up that is otherwise unrelated to individual characteristics; in other words, a good instrument. Here, the instrument is correlated with insurance take-up but not with awareness. Thus, instead of comparing the treated to the untreated, we compare those with high values and low values of the instrument. This example is analogous to the famous distance-to-school instrument used by Card (1995) for schooling. Typically, an instrument requires including additional X variables, e.g., quality of the neighbourhood, degree of urbanisation, family background, etc.

We may also generate instrumental variables ourselves by randomly assigning incentives or encouragements to individuals (random encouragement design). This approach looks very much like a proper randomised experiment, except that we have imperfect control over the beneficiaries. An encouragement or incentive is given to the individuals in the treatment group, whilst the individuals in the control group do not receive such an encouragement or incentive (or receive a different one). It is up to the individuals whether they sign up for the actual treatment. For example, imagine that the price of insurance is varied randomly across communities, creating a random incentive to buy insurance for the population facing a lower price. The instrumental variable that is generated here helps resolve the problem of selection bias and allows consistent estimation of the effect that insurance take-up has on health and other outcome measures. Similarly, we may vary the effort related to take-up by, for example, varying service hours, density of offices in a community, etc., from the insurer's side. If areas or individuals cannot be exclusively chosen for a programme at random, we may at least give them varying incentives to do so.

If there is an instrument that fulfils the exclusion restriction as explained above, internal validity is high. However, external validity depends on another quality of the instrument. If the instrument predicts treatment status accurately, external validity is also likely to be high. Otherwise the instrumental variable results cannot be generalised to the whole population. The reason is that only those who are induced to take up treatment by the instrument can be used for the estimation of the treatment effect.
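One standard way to estimate the effect in an encouragement design is the Wald estimator: the encouragement's effect on the outcome divided by its effect on take-up. A stylised sketch with invented numbers, where a random price discount raises take-up from 20% to 60% and the true effect of insurance is 1.0:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# Hypothetical encouragement design: half the sample is randomly offered
# a discounted insurance price (z=1), the rest the regular price (z=0).
z = rng.integers(0, 2, n)

# Take-up: 20% buy at the regular price, 60% at the discounted price.
u = rng.random(n)
d = np.where(z == 1, u < 0.6, u < 0.2).astype(float)

# Outcome: insurance raises hospital visits by 1.0 for those who take it up.
y = 0.5 + 1.0 * d + rng.normal(0.0, 1.0, n)

# Wald estimator: encouragement's effect on Y, scaled by its effect on D.
itt_y = y[z == 1].mean() - y[z == 0].mean()   # intention-to-treat effect
itt_d = d[z == 1].mean() - d[z == 0].mean()   # first stage: take-up gap
wald = itt_y / itt_d

print(round(wald, 2))  # close to the true effect of 1.0
```

As the chapter notes, this recovers the effect only for those induced to take up insurance by the encouragement, which is why a weak first stage limits how far the result generalises.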
[Figure: poverty index on the horizontal axis, from poor to rich, with the poverty line marked]
two individuals is then averaged over the whole population. A practical simplification is to match non-participants to participants on the basis of the propensity score, which is defined as the probability of treatment.

Using matching, the average treatment effect on the treated (ATT) can be defined as $ATT = E[Y^1 \mid D=1] - E[Y^0 \mid D=1]$. As in the randomised trial, the expected counterfactual outcome, $E[Y^0 \mid D=1]$, can be replaced by the expected observed outcome of the non-participants, $E[Y^0 \mid D=0]$, but only conditional on a set of observable covariates, $X$. If treated and non-treated differ in terms of observable characteristics $X$ only and not in terms of unobservables (the so-called selection on observables assumption), it holds that $E[Y^0 \mid D=1, X] = E[Y^0 \mid D=0, X]$. Otherwise, selection bias will remain an issue.

Instead of matching on $X$, it suffices to match on the propensity score $p(X)$, i.e., the probability of treatment defined as $p(X) = \Pr(D=1 \mid X)$ (Rosenbaum and Rubin 1983). We can therefore also write

$ATT = E[Y^1 \mid D=1] - E\bigl[\, E[Y^0 \mid D=0, p(X)] \mid D=1 \,\bigr]$.
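A minimal nearest-neighbour matching sketch (invented data): a single covariate, a poverty index, drives both programme take-up and the outcome, so with one covariate matching on X and on the monotone propensity score p(X) select the same neighbours.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5_000

# Hypothetical setting: a poverty index x drives both take-up and outcomes
# (selection on observables only -- no unobserved selection).
x = rng.uniform(0.0, 1.0, n)
d = rng.random(n) < 0.2 + 0.6 * x               # richer -> more take-up
y = 2.0 * x + 1.0 * d + rng.normal(0.0, 0.5, n)  # true ATT is 1.0

naive = y[d].mean() - y[~d].mean()  # biased: participants are richer

# Nearest-neighbour matching: for every participant, find the
# non-participant with the closest x, and average the differences.
x_c, y_c = x[~d], y[~d]
order = np.argsort(x_c)
xs, ys = x_c[order], y_c[order]
idx = np.clip(np.searchsorted(xs, x[d]), 1, len(xs) - 1)
left, right = xs[idx - 1], xs[idx]
nearest = np.where(x[d] - left < right - x[d], idx - 1, idx)
att_match = (y[d] - ys[nearest]).mean()

print(round(naive, 2), round(att_match, 2))
```

The naïve gap mixes the programme effect with the wealth difference between participants and non-participants; matching on the poverty index removes that difference and lands close to the true ATT of 1.0.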
With observations at time $t=0$ and $t=1$ we can take the first difference $\Delta Y_i = Y_{i,1} - Y_{i,0}$.
Panel data are helpful but not strictly required. Having cross-sectional data before and after the treatment may suffice. For instance, if villages participate in the intervention entirely, whilst other villages do not participate, it will suffice to conduct representative surveys in the villages before and after the intervention, i.e., interviewing the same individuals in the villages is not required. Thus, this method allows us to avoid problems with attrition commonly found in panel surveys.

variables, X, if we can argue that time trends are the same at least for treated and non-treated with the same X. This can for example be done with PSM. Another extension is to use additional differences of unaffected comparison
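The first-difference logic can be sketched in a short simulation (all numbers invented): treated units start from a higher baseline and share a common time trend with the controls, so differencing out both recovers the true effect whilst the naïve estimators do not.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 2_000

# Hypothetical two-period setting: treated units start from a higher
# baseline, and all units share a common time trend of +0.5.
treated = rng.random(n) < 0.5
base = 1.0 + 0.8 * treated + rng.normal(0.0, 0.3, n)       # baseline gap
y0 = base + rng.normal(0.0, 0.3, n)                        # outcome at t=0
y1 = base + 0.5 + 1.0 * treated + rng.normal(0.0, 0.3, n)  # t=1, effect 1.0

# First difference per unit, then difference between the groups:
delta = y1 - y0
did = delta[treated].mean() - delta[~treated].mean()

# For comparison, the two naïve estimators are biased in this setting:
cross_section = y1[treated].mean() - y1[~treated].mean()  # ~1.8 (baseline gap)
before_after = delta[treated].mean()                      # ~1.5 (time trend)

print(round(did, 2))  # close to the true effect of 1.0
```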
used to eliminate differences in time trends of those under 40. A further possibility is to use data for more than one point in time before the treatment

[Figure: # of hospital visits]
evaluate the impact of an intervention relative to other interventions or, alternatively, to evaluate different variants of an intervention, keeping context and data collection procedure constant. This, for example, would allow us to look at the impact of different incentives or cost-sharing arrangements for subsidised insurance.

Non-experimental methods generally give less convincing results than experimental methods. Moreover, if the confidence intervals turn out to be very wide, we should not interpret these non-significant results as evidence for the absence of an impact. That interpretation is only valid if the confidence intervals are very narrow. The correct interpretation would be that the sample size was too small to draw reliable conclusions.

References

Angrist, J. D. 2004. American education research changes track. Oxford Review of Economic Policy 20(2):198-212.

Burtless, G. and L. L. Orr. 1986. Are classical experiments needed for manpower policy? Journal of Human Resources 21(4):606-639.

Card, D. 1995. Earnings, schooling, and ability revisited. Research in Labor Economics 14:23-48.

Leeuw, F. and J. Vaessen. 2009. Impact evaluations and development: NONIE guidance on impact evaluation. Washington, D.C.: The Network of Networks on Impact Evaluation (NONIE). http://siteresources.worldbank.org/EXTOED/Resources/nonie_guidance.pdf

Rosenbaum, P. R. and D. B. Rubin. 1983. The central role of the propensity score in observational studies for causal effects. Biometrika 70(1):41-55.