
INTER-AMERICAN DEVELOPMENT BANK
Social Development Department
Poverty and Inequality Advisory Unit

Poverty & Inequality Technical Notes
Technical Note 2, December 1, 1999

IMPACT EVALUATION METHODS FOR SOCIAL PROGRAMS

Ferdinando Regalia*

This technical note provides a brief overview of the most commonly used methods for carrying out impact evaluations of social programs.

Why Evaluate a Project’s Impact?

Impact evaluation is an indispensable tool for assessing whether a program is achieving its objectives, how the beneficiaries’ situation has changed as a result of the program, and what the situation would have been without the program. Moreover, if an impact evaluation is carried out at an intermediate stage of project execution, very important lessons can be learned about how the program design and/or the project execution can be modified to improve the effectiveness of the intervention. The definition of program objectives and targeting mechanisms might also be improved by planning an impact evaluation at an early stage of project design. While the design of an impact evaluation can be time and resource intensive, the costs are very often small relative to the scale of a transfer program (particularly if in-country resources, in terms of available data and data-processing skills, are used). The returns in terms of increased effectiveness of social spending and greater accountability are very high.

Evaluation Methods

Impact evaluation tries to answer the question: what would have happened if the program had not existed? All impact evaluation methods compare a treatment group (the program beneficiaries) with a control group of non-beneficiaries. These methods fall into two categories: experimental and quasi-experimental designs. Both assume that the program will not affect general conditions in the economy. Indeed, the general equilibrium effects of small-scale programs should be insignificant; the evaluation of large-scale programs, by contrast, should account for general equilibrium effects.

A Experimental Design

Randomization. This is the most robust of all the evaluation methodologies. Once the target population has been chosen on the basis of observed characteristics, the program’s actual beneficiaries and non-beneficiaries are selected randomly within the pool of eligible beneficiaries. By definition, randomization means assigning eligible beneficiaries to a treatment and a control group through a lottery. Randomization ensures that there are no systematic differences in observed characteristics between program participants and individuals in the control group. The impact of the intervention is assessed by subtracting the mean outcomes of the non-beneficiaries in the control group from the mean outcomes of the beneficiaries. This can be done for any indicator of interest (income, consumption, school attendance, labor force participation, etc.). Randomizing beneficiaries is feasible (and ethical) whenever budget constraints require rationing of program benefits. Even when programs are national in scale and aim at 100 percent coverage, the expansion of coverage is often gradual, and randomization offers an ethically sound basis on which to proceed (since all targeted individuals have the same probability of being selected). Individuals who serve as controls at an early stage of program implementation become beneficiaries at a later stage.

* Ferdinando Regalia is a Consultant in the Poverty and Inequality Advisory Unit, IDB.
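Under randomization, the impact estimate is simply the difference between the two groups’ mean outcomes. The following sketch illustrates the computation on simulated data (the outcome values, sample sizes, and the built-in 0.10 “true effect” are invented for the illustration, not taken from any actual program):

```python
import random
import statistics

random.seed(0)  # reproducible simulated data

# Simulated outcome indicator (e.g. school attendance rate) for a
# randomized program; the true impact is built in as 0.75 - 0.65 = 0.10.
treatment = [random.gauss(0.75, 0.15) for _ in range(500)]
control = [random.gauss(0.65, 0.15) for _ in range(500)]

# Impact: mean outcome of the beneficiaries minus mean outcome of the
# non-beneficiaries in the control group.
impact = statistics.mean(treatment) - statistics.mean(control)

# Standard error of a difference in means between independent samples,
# used to judge whether the estimated impact is statistically significant.
se = (statistics.variance(treatment) / len(treatment)
      + statistics.variance(control) / len(control)) ** 0.5

print(f"estimated impact: {impact:.3f} (standard error {se:.3f})")
```

Because assignment is random, the same computation applies to any indicator of interest; only the outcome lists change.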
An example of an experimental-design impact evaluation system is PROGRESA in Mexico, a targeted human development program.

B Quasi-experimental Design

Quasi-experimental methods construct control groups that resemble treatment groups through econometric techniques, rather than randomly by means of a lottery among eligible beneficiaries. Quasi-experimental methods can make use of existing data: they require a survey administered to both beneficiaries and non-beneficiaries of a program. Although they may not ensure the same level of reliability of results, because they cannot fully control for selection bias, these methods are in general less costly to implement. Using a combination of quasi-experimental methods helps control for selection bias.

Matching. This method builds a control group by selecting, among the respondents of a large-scale (often national) survey, those individuals whose observable characteristics are similar to those of the program beneficiaries. When the comparison includes a large set of observed characteristics, the matching between beneficiaries and non-beneficiaries can be performed using propensity scores. A propensity score measures an individual’s predicted probability of being a program participant given her observed characteristics; it is usually obtained by estimating a binary-choice nonlinear econometric model on the whole sample of beneficiaries and non-beneficiaries. The matching method pairs participants with the control group members from a similar socioeconomic environment who have the closest propensity scores; a measure of this closeness is the absolute difference in scores. The impact of the intervention is evaluated by subtracting the mean outcomes of the matched non-beneficiaries in the control group from the mean outcomes of the beneficiaries. The matching method is useful for carrying out an impact evaluation when no baseline data were collected before program implementation. The evaluation of the Argentinean Trabajar, a workfare program, was carried out using a propensity score method.

The results of the evaluation depend on the set of observable characteristics used to compute the propensity scores. Significant differences in the distribution of observable characteristics between the control and treatment groups might bias the results; accurate weighting of the two groups helps to reduce this bias. A second source of bias is more relevant: it arises when unobservable individual characteristics systematically influence both program participation and the outcome variables that are the object of the impact analysis, i.e. selection bias. Programs that use self-selection targeting criteria, such as workfare programs, might be particularly subject to this second sort of bias.

Reflexive comparison. This method requires a baseline survey of program beneficiaries before the program is implemented and a follow-up survey afterwards. The baseline represents the control group, and the evaluation is performed by comparing the average change in outcome indicators before and after the intervention. This method, however, cannot separate the impact of the program from that of other factors (e.g. economy-wide changes) that have affected the beneficiaries. For this reason the results are biased, and the direction of the bias is difficult to assess.

Difference in difference. This method can be used to reduce potential selection bias (when unobservable individual characteristics are assumed to be time invariant) and the impact of other factors exogenous to the program on observable characteristics. It does so by looking at the change in outcomes of participants relative to the change in outcomes of non-participants. Equivalently, it looks at the difference in indicators between the two groups at the end of the program relative to the difference at the beginning. Let X be the indicator of interest, let the subscripts T and C indicate the treatment and control groups, and let the time indices 0 and 1 indicate the time before and after the implementation of the program. The method then computes the following double difference:

D = (X_T1 − X_C1) − (X_T0 − X_C0)

Regression analysis allows controlling for differences in initial observed characteristics between control and treatment groups and for changes in exogenous variables.
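The double difference can be computed directly from the four group means. A minimal sketch with invented numbers (the values below are hypothetical, chosen only to show the mechanics):

```python
# Hypothetical means of an outcome indicator X (e.g. monthly income) for
# treatment (T) and control (C) groups, before (0) and after (1) the program.
X_T0, X_T1 = 120.0, 150.0  # treatment group: baseline, follow-up
X_C0, X_C1 = 118.0, 130.0  # control group: baseline, follow-up

# Double difference: common shocks that shift both groups equally cancel out.
D = (X_T1 - X_C1) - (X_T0 - X_C0)

print(f"difference-in-difference estimate: {D}")  # prints 18.0
```

Here the treatment group gained 30 while the control group gained 12, so the estimated program impact is 18, net of the economy-wide change that affected both groups.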
Regression methods based on instrumental variables. Sometimes it is neither possible nor desirable to do a baseline and follow-up survey, particularly when households originally included in the baseline survey are likely to drop out of the sample non-randomly (attrition bias). When outcomes are observed for both participants and non-participants after program implementation, instrumental variables can be used to evaluate the program’s impact without incurring selection bias. Any variable that is correlated with individual participation in the program, but uncorrelated with individual outcomes given participation, can serve as an instrumental variable. The method is carried out in two steps: first, participation in the program is predicted using the instrumental variables; then, mean outcome indicators are compared conditional on predicted participation and non-participation. Finding appropriate instrumental variables is often very hard, which makes this method rather difficult to implement.

C General Equilibrium Effects

All the above evaluation methods assume that the programs have no effect on non-participants. In other words, these methods rest on two very strong assumptions that are not always satisfied: first, that the distribution of individual outcomes within the control group of a given program can be used to approximate the distribution of individual outcomes if the program did not exist; second, that the distribution of individual outcomes within the treatment group can be used to approximate the distribution of individual outcomes if the program were universally applied. The first assumption is plausible only if the program is small and all the general equilibrium effects it generates, including taxes and spillover effects on factor and output markets, are insignificant. The second assumption implies that it is reasonable to forecast the outcomes of a program’s expansion from the results of an evaluation carried out on a much smaller version of the program. This is not always the case, because the expansion might give rise to important general equilibrium effects. Such effects should therefore be taken into consideration when fully assessing the impact of a program and when carrying out a rigorous cost-benefit analysis. This, however, is not an easy task.

Further Readings

Baker, J. L. (1999), “Evaluating Project Impact for Poverty Reduction: A Handbook for Practitioners”, Mimeo, LCSPR/PRMPO, World Bank, Washington D.C.

Gómez de León, J. and S. Parker (1999), “The Impact of Anti-poverty Programs on Labor Force Participation: The Case of Progresa in Mexico”, Mimeo, Progresa team, Mexico D.F.

Heckman, J. J., Ichimura, H. and P. Todd (1997), “Matching as an Econometric Evaluation Estimator: Evidence from Evaluating a Job Training Program”, Review of Economic Studies, October.

Heckman, J. J. and J. Smith (1998), “Evaluating the Welfare State”, Frisch Centenary Econometric Monograph Series, Cambridge: Cambridge University Press.

Jalan, J. and M. Ravallion (1999), “Income Gains from Workfare and Their Distribution”, Mimeo, Washington D.C.

Ravallion, M. (1999), “The Mystery of the Vanishing Benefits: Ms Speedy Analyst’s Introduction to Evaluation”, Mimeo, World Bank, Washington D.C.

Schultz, T. P. (1999), “Preliminary Evidence of the Impact of PROGRESA on School Enrollments from 1997 and 1998”, IFPRI Report, International Food Policy Research Institute, Washington D.C.

Resources

External
Paul Gertler, U.C. Berkeley
James J. Heckman, University of Chicago
Martin Ravallion, World Bank
Petra Todd, University of Pennsylvania
Abt Associates Inc., Bethesda, MD, http://www.abtassoc.com
IFPRI, International Food Policy Research Institute, Washington D.C., www.ifpri.org

IDB
Carola Alvarez
Omar Arias
Arianna Legovini
Gustavo Marquez
Carmen Pagés-Serra
Ferdinando Regalia
