
M&E: Chapter 4

Impact Evaluation

By:
Chala Dechassa (PhD, Associate Professor)

My question is: Are we making an impact?



Contents

▪ Monitoring and impact evaluation
▪ Why do impact evaluation?
▪ Why we need a comparison group
▪ Methods for constructing the comparison group
▪ When to do an impact evaluation



1. Impact evaluation
▪ Impact evaluations are a particular type of evaluation that seeks to answer cause-and-effect questions.
▪ Impact evaluations are structured around one particular type of question: what is the impact (or causal effect) of a program on an outcome of interest?
▪ This basic question incorporates an important causal dimension: we are interested only in the impact of the program, that is, the effect on outcomes that the program directly causes.
▪ An impact evaluation looks for the changes in outcomes that are directly attributable to the program.



Cont’d
▪ Impact evaluation goes by many names (e.g. Rossi et al. call it impact assessment), so it is important to know the concept rather than the label.
▪ Impact is the difference between outcomes with the program and without it.
▪ The goal of impact evaluation is to measure this difference in a way that attributes the difference to the program, and only the program.
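In the standard potential-outcomes notation (a common formalization of this definition, not taken from the slides themselves), the impact for a unit i and its population average are:

```latex
% Y_i(1): outcome for unit i with the program; Y_i(0): outcome without it
\Delta_i = Y_i(1) - Y_i(0)                      % impact for unit i
\mathrm{ATE} = \mathbb{E}\,[\,Y_i(1) - Y_i(0)\,] % average treatment effect
```

Only one of Y_i(1) and Y_i(0) is ever observed for a given unit, which is the missing-data problem taken up below.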



2. Why Evaluate?
▪ Development programs and policies are typically designed to change outcomes, for example, to raise incomes, to improve learning, or to reduce illness.
▪ Whether or not these changes are actually achieved is a crucial public policy question, but one that is not often examined.
▪ More commonly, program managers and policy makers focus on controlling and measuring the inputs and immediate outputs of a program (how much money is spent, how many textbooks are distributed) rather than on assessing whether programs have achieved their intended goals of improving well-being.



2. Why it matters
▪ We want to know if the program had an impact and the average size of that impact:
  ▪ Understand if policies work
  ▪ Justification for the program (big $$)
  ▪ Scale up or not: did it work?
  ▪ Meta-analyses: learning from others
  ▪ Understand the net benefits of the program
  ▪ Understand the distribution of gains and losses



Monitoring and IE

IMPACT: Effect on living standards
 - infant and child mortality
 - prevalence of specific diseases

OUTCOMES: Access, usage and satisfaction of users
 - number of children vaccinated
 - percentage within 5 km of a health center

OUTPUTS: Goods and services generated
 - number of nurses
 - availability of medicine

INPUTS: Financial and physical resources
 - spending on primary health care



Impact Evaluation for Policy Decisions
▪ Impact evaluations are needed to inform policy makers on a range of decisions, from curtailing inefficient programs, to scaling up interventions that work, to adjusting program benefits, to selecting among various program alternatives.
▪ They are most effective when applied selectively to answer important policy questions, and they can be particularly effective when applied to innovative pilot programs that are testing a new, unproven, but promising approach.



3. What we need
→ The difference in outcomes with the program versus without the program, for the same unit of analysis (e.g. an individual)
▪ Problem: individuals have only one existence; each unit is either in the program or not, never both.
▪ Hence, we have a problem of a missing counterfactual, a problem of missing data.
▪ Counterfactual analysis enables evaluators to attribute cause and effect between interventions and outcomes.



Cont’d
▪ The 'counterfactual' measures what would have happened to beneficiaries in the absence of the intervention, and
▪ impact is estimated by comparing counterfactual outcomes to those observed under the intervention.


Thinking about the counterfactual
▪ Why not compare individuals before and after (the reflexive comparison)?
  ▪ The rest of the world moves on, and you cannot be sure what was caused by the program and what by the rest of the world.
▪ We need a control/comparison group that will allow us to attribute any change in the “treatment” group to the program (causality).



Comparison group issues
▪ Two central problems:
  ▪ Programs are targeted
    → Program areas will differ in observable and unobservable ways, precisely because the program intended this
  ▪ Individual participation is (usually) voluntary
    → Participants will differ from non-participants in observable and unobservable ways
▪ Hence, a comparison of participants and an arbitrary group of non-participants can lead to heavily biased results.
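This bias can be made precise. A naive comparison of participant and non-participant means decomposes (a standard identity, stated here for reference) into the true effect plus a selection-bias term:

```latex
\underbrace{\mathbb{E}[Y \mid T=1] - \mathbb{E}[Y \mid T=0]}_{\text{naive difference}}
= \mathrm{ATT} +
\underbrace{\mathbb{E}[Y(0) \mid T=1] - \mathbb{E}[Y(0) \mid T=0]}_{\text{selection bias}}
```

Targeting and voluntary participation both make the selection-bias term non-zero.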
Example: providing fertilizer to farmers
▪ The intervention: provide fertilizer to farmers in a poor region of a country (call it region A)
  ▪ The program targets poor areas
  ▪ Farmers have to enroll at the local extension office to receive the fertilizer
  ▪ The program starts in 2019 and ends in 2021; we have data on yields for farmers in the poor region and in another region (region B) for both years
▪ We observe that the farmers we provide fertilizer to have a decrease in yields from 2019 to 2021.



Did the program not work?
▪ Further study reveals there was a national drought, and everyone’s yields went down (failure of the reflexive comparison).
▪ We compare the farmers in the program region to those in another region. We find that our “treatment” farmers have a larger decline than those in region B. Did the program have a negative impact?
  ▪ Not necessarily (program placement):
    ▪ Farmers in region B have better quality soil (unobservable)
    ▪ Farmers in the other region have more irrigation, which is key in this drought year (observable)



OK, so let’s compare the farmers in region A
▪ We compare “treatment” farmers with their neighbors. We think the soil is roughly the same.
▪ Say we observe that treatment farmers’ yields decline by less than comparison farmers’. Did the program work?
  ▪ Not necessarily. Farmers who went to register with the program may have more ability, and thus could manage the drought better than their neighbors, while the fertilizer was irrelevant (individual unobservables).
▪ Say we observe no difference between the two groups. Did the program not work?
  ▪ Not necessarily. What little rain there was caused the fertilizer to run off onto the neighbors’ fields (spillover/contamination).



The comparison group
▪ In the end, with these naïve comparisons, we cannot tell if the program had an impact.
→ We need a comparison group that is as identical as possible, in observable and unobservable dimensions, to those receiving the program, and one that will not receive spillover benefits.



How to construct a comparison group: building the counterfactual
1. Randomization
2. Matching
3. Difference-in-differences
4. Instrumental variables
5. Regression discontinuity



1. Randomization
▪ Individuals/communities/firms are randomly assigned into participation.
▪ Counterfactual: the randomized-out group
▪ Advantages:
  ▪ Often referred to as the “gold standard”: by design, selection bias is zero on average and the mean impact is revealed
  ▪ Perceived as a fair process of allocation with limited resources
▪ Disadvantages:
  ▪ Ethical issues, political constraints
  ▪ Internal validity (exogeneity): people might not comply with the assignment (selective non-compliance)
  ▪ Unable to estimate entry effects
  ▪ External validity (generalizability): controlled experiments are usually run on a small-scale pilot, and it can be difficult to extrapolate the results to a larger population.
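As a minimal sketch of why randomization works, the simulation below (entirely hypothetical data; the variable names are illustrative) assigns treatment at random, so unobserved ability is balanced across groups and a simple difference in mean yields recovers the true effect:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000
ability = rng.normal(0, 1, n)   # unobserved heterogeneity
treat = rng.integers(0, 2, n)   # random assignment: 0 = control, 1 = treated
true_effect = 0.5
yields = 2.0 + true_effect * treat + ability + rng.normal(0, 1, n)

# Because assignment is independent of ability, selection bias is zero
# on average and the difference in means estimates the impact.
impact = yields[treat == 1].mean() - yields[treat == 0].mean()
print(f"estimated impact: {impact:.2f}")  # close to 0.5
```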
2. Matching
▪ Match participants with non-participants from a larger survey.
▪ Counterfactual: the matched comparison group
▪ Each program participant is paired with one or more non-participants who are similar based on observable characteristics.
▪ Assumes that, conditional on the set of observables, there is no selection bias based on unobserved heterogeneity.
▪ When the set of variables to match on is large, one often matches on a summary statistic: the probability of participation as a function of the observables (the propensity score).
▪ Propensity score matching (PSM) is a quasi-experimental method in which the researcher uses statistical techniques to construct an artificial control group by matching each treated unit with a non-treated unit of similar characteristics.
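A minimal PSM sketch on simulated data (the variable names and coefficients are illustrative assumptions, not from the slides) might look like this:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(1)
n = 2_000
X = rng.normal(0, 1, (n, 3))    # observables, e.g. land size, education, household size
T = rng.binomial(1, 1 / (1 + np.exp(-X @ np.array([0.8, -0.5, 0.3]))))
Y = 1.0 * T + X @ np.array([0.4, 0.2, -0.1]) + rng.normal(0, 1, n)

# 1. Estimate the propensity score p(X) = Pr(T = 1 | X).
ps = LogisticRegression().fit(X, T).predict_proba(X)[:, 1]

# 2. Match each treated unit to the nearest untreated unit on p(X).
nn = NearestNeighbors(n_neighbors=1).fit(ps[T == 0].reshape(-1, 1))
_, idx = nn.kneighbors(ps[T == 1].reshape(-1, 1))

# 3. Average the treated-minus-matched-control differences (the ATT).
att = (Y[T == 1] - Y[T == 0][idx.ravel()]).mean()
print(f"estimated ATT: {att:.2f}")  # close to 1.0 if selection is on observables only
```

The estimate is only credible under the identification assumption above: selection depends on the observables in X and nothing else.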
2. Matching (cont’d)
▪ Advantages:
  ▪ Does not require randomization, nor a baseline (pre-intervention data)
▪ Disadvantages:
  ▪ Strong identification assumptions
  ▪ Requires very good quality data: need to control for all factors that influence program placement
  ▪ Requires a significantly large sample size to generate the comparison group



Matching in our example…
▪ Using statistical techniques, we match a group of non-participants with participants using variables like:
  ▪ gender,
  ▪ household size,
  ▪ education,
  ▪ experience,
  ▪ land size,
  ▪ rainfall (to control for drought), and
  ▪ irrigation,
  that is, as many observable characteristics not affected by the fertilizer as possible.



Matching in our example…
Two scenarios
Scenario 1:
▪ We show up afterwards, so we can only match (within the region) those who got fertilizer with those who did not. Problem?
  ▪ Problem: farmers select into the program based on expected gains and/or ability (unobservables)
Scenario 2:
▪ The program is allocated based on historical crop choice and land size. We show up afterwards and match those eligible in region A with those in region B. Problem?
  ▪ Problems: the same issues of individual unobservables, though lessened because we compare the eligible to the potentially eligible; however, there are now unobservables across regions.



3. Difference-in-differences
▪ The difference-in-differences (or "double difference") estimator is defined as the difference in average outcome in the treatment group before and after treatment, minus the difference in average outcome in the control group before and after treatment: it is literally a "difference of differences."
▪ It compares the changes in outcomes over time between a population enrolled in a program (the treatment group) and a population that is not (the comparison group).
▪ It is a widely used tool in program evaluation.
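Written out (notation added here for clarity, using group means), the double difference is:

```latex
\widehat{\mathrm{DiD}}
= \big(\bar{Y}_{\text{treat,post}} - \bar{Y}_{\text{treat,pre}}\big)
- \big(\bar{Y}_{\text{comp,post}} - \bar{Y}_{\text{comp,pre}}\big)
```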



Cont’d
▪ Observations over time: compare observed changes in the outcomes for a sample of participants and non-participants.
▪ Identification assumption: the selection bias is time-invariant (‘parallel trends’ in the absence of the program).
▪ Counterfactual: the changes over time for the non-participants
▪ Constraints:
  ▪ Requires at least two cross-sections of data, pre-program and post-program, on participants and non-participants
  ▪ Need to think about the evaluation ex ante, before the program
▪ Can in principle be combined with matching to adjust for pre-treatment differences that affect the growth rate.



Examples
▪ Example 1: Card and Krueger (1994, AER), in "Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania", evaluate the effect of the minimum wage (the treatment) on employment (the outcome).
▪ On April 1, 1992, New Jersey’s minimum wage rose from $4.25 to $5.05 per hour.



Cont’d…
▪ To evaluate the impact of the law, the authors surveyed 410 fast-food restaurants in New Jersey (the treatment group) and eastern Pennsylvania (the control group) before and after the rise.
  ▪ Yi is the employment of a fast-food restaurant,
  ▪ Ti is an indicator of whether or not a restaurant is in New Jersey, and
  ▪ ti is an indicator of whether the observation is from before or after the minimum wage hike.
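The same double difference can be obtained as the interaction coefficient in a regression of Y on T, t, and T×t. The sketch below uses simulated data in the slide’s notation (the numbers are invented for illustration; this is not the Card-Krueger dataset):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 800
df = pd.DataFrame({
    "T": rng.integers(0, 2, n),  # 1 = New Jersey restaurant, 0 = Pennsylvania
    "t": rng.integers(0, 2, n),  # 1 = after the minimum wage rise, 0 = before
})
# Employment with a group gap, a common time trend, and a true DiD effect of 0.7.
df["Y"] = 20 - 2 * df["T"] - 1 * df["t"] + 0.7 * df["T"] * df["t"] + rng.normal(0, 2, n)

model = smf.ols("Y ~ T * t", data=df).fit()
print(model.params["T:t"])  # the interaction coefficient is the DiD estimate
```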



4. Instrumental Variables
▪ An instrumental variable (sometimes called an “instrument”) is a third variable, Z, used in regression analysis when you have endogenous variables, that is, variables that are influenced by other variables in the model.
▪ In other words, you use it to account for unexpected behaviour between variables.
▪ Using an instrumental variable to identify the hidden (unobserved) correlation allows you to see the true correlation between the explanatory variable and the response variable, Y.
▪ Z is correlated with the explanatory variable (X) and uncorrelated with the error term, ε (what is ε?), in the equation:

Y = Xβ + ε
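To make the mechanics concrete, here is a minimal two-stage least squares (2SLS) sketch on simulated data (all names and coefficients are illustrative assumptions): X is endogenous because it shares an unobserved component u with the error, while Z satisfies the two conditions above:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5_000
Z = rng.normal(0, 1, n)                 # instrument: moves X, unrelated to the error
u = rng.normal(0, 1, n)                 # unobserved confounder
X = 0.9 * Z + u + rng.normal(0, 1, n)   # endogenous explanatory variable
e = u + rng.normal(0, 1, n)             # error term shares u with X
Y = 2.0 * X + e                         # true beta = 2.0

def slope(y, x):
    """OLS slope of y on x (with an intercept)."""
    A = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(A, y, rcond=None)[0][1]

print("naive OLS:", slope(Y, X))  # biased upward by the hidden correlation via u

# Stage 1: fitted values of X from Z (adding the stage-1 intercept would not
# change the stage-2 slope, so it is omitted for brevity).
x_hat = slope(X, Z) * Z
# Stage 2: regress Y on the fitted values; the slope recovers beta.
print("2SLS:", slope(Y, x_hat))   # close to the true beta of 2.0
```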



Cont’d
▪ Identify variables that affect participation in the program, but not outcomes conditional on participation (the exclusion restriction).
▪ Counterfactual: the causal effect is identified from the exogenous variation of the instrument.
▪ Advantages:
  ▪ Does not require the exogeneity assumption of matching
▪ Disadvantages:
  ▪ The estimated effect is local: IV identifies the effect of the program only for the sub-population of those induced to take up the program by the instrument
  ▪ Therefore, different instruments identify different parameters, and you can end up with different magnitudes of the estimated effects
  ▪ The validity of the instrument can be questioned but cannot be tested.



Summing up: Methods
▪ There is no clear “gold standard” in reality: do what works best in the context.
▪ Watch for unobservables, but don’t forget observables.
▪ Be flexible, be creative, and use the context.
▪ IE requires good monitoring, and monitoring will help you understand the effect size.



Impact Evaluation and the Project Cycle




A. Ownership
▪ IE can provide one avenue to build institutional capacity and a culture of managing-by-results, so the IE should be as widely owned within government as possible.
  ▪ Agree on a dissemination plan to maximize the use of results for policy development.
  ▪ Identify entry points in project and policy cycles:
    ▪ midpoint and closing, for the project;
    ▪ sector reporting, for the WB;
    ▪ budget cycles and policy reviews, for the government.
▪ Use partnerships with local academics to build local capacity for impact evaluation.



B. Relevance and Applicability
▪ For an evaluation to be relevant, it must be designed to respond to the policy questions that are of importance.
▪ Clarifying early what it is that will be learned, and designing the evaluation to that end, will go some way to ensure that the recommendations of the evaluation feed into policy making.



C. Flexibility and adaptability
▪ The evaluation must be tailored to the specific project and adapted to the specific institutional context.
▪ The project design must be flexible to secure our ability to learn in a structured manner, feed evaluation results back into the project, and change the project mid-course to improve project end results.

This is an important point: in the past, projects have been penalized for effecting mid-course changes in project design. Now we want to make change part of the project design.



D. Horizon matters
▪ The time it takes to achieve results is an important consideration for timing the evaluation.
▪ Conversely, the timing of the evaluation will determine which outcomes should be focused on.
  ▪ Early evaluations should focus on outcomes that are quick to show change.
  ▪ For long-term outcomes, evaluations may need to span beyond the project cycle.

Think through how things are expected to change over time, and focus on what is within the time horizon for the evaluation.
Do not confuse the importance of an outcome with the time it takes for it to change: some important outcomes are obtained instantaneously!
But don’t be afraid to look at intermediate outcomes either.



Impact Evaluation and the Project Cycle
Stage 1: Identification

Get an early start: how do you get started?
▪ Get help and access to resources: contact the person in your region or sector responsible for impact evaluation and/or the Thematic Group on Impact Evaluation.
▪ Define the timing for the various steps of the evaluation to ensure you have enough lead time for preparatory activities (e.g. the baseline goes to the field before program activities start).
▪ The evaluation will require support from a range of policy makers:
  ▪ start building and maintaining constituencies,
  ▪ dialogue with relevant actors in government,
  ▪ build a broad base of support, and include stakeholders.
Build the Team
▪ Select the impact evaluation team and define the responsibilities of:
  ▪ program managers (government),
  ▪ the WB project team and other donors,
  ▪ the lead evaluator (impact evaluation specialist),
  ▪ the local research/evaluation team, and
  ▪ the data collection agency or firm.
▪ Selection of the lead evaluator is critical for ensuring the quality of the product, and so is the capacity of the data collection agency.
▪ Partner with local researchers and research institutes to build local capacity.



Shift Paradigm
▪ From a project design based on “we know what’s best”
▪ To a project design based on the notion that “we can learn what’s best in this context, and adapt to new knowledge as needed”

Work iteratively:
▪ Discuss what the team knows and what it needs to learn (the questions for the evaluation) to deliver on project objectives
▪ Discuss translating this into a feasible project design
▪ Figure out what questions can feasibly be addressed
▪ Housekeeping: include these first thoughts in a paragraph in the PCN



Stage 2: Preparation through Appraisal

1. Define project development objectives and the results framework
▪ This activity:
  ▪ clarifies the results chain (the logic of impacts) for the project,
  ▪ identifies the outcomes of interest and the indicators best suited to measure changes in those outcomes, and
  ▪ establishes the expected time horizon for changes in those outcomes.



3: Work out project design features that will affect the evaluation design
▪ Target population and rules of selection
  ▪ This provides the evaluator with the universe for the treatment and comparison sample.
▪ Roll-out plan
  ▪ This provides the evaluation with a framework for timing data collection and, possibly, an opportunity to define a comparison group.



4: Narrow down the questions for the evaluation
▪ Questions aimed at measuring the impact of the project on a set of outcomes, and
▪ Questions aimed at measuring the relative effectiveness of different features of the project.



Questions aimed at measuring the impact of the project are relatively straightforward
▪ What is your hypothesis? (Results framework)
  ▪ By expanding water supply, the use of clean water will increase, water-borne disease will decline, and health status will improve.
▪ What is the evaluation question?
  ▪ Does improved water supply result in better health outcomes?
▪ How can you test the hypothesis?
  ▪ The government might randomly assign areas for expansion of water supply during the first and second phases of the program.
▪ What will you measure?
  ▪ Measure the change in health outcomes in phase I areas relative to the change in outcomes in phase II areas. Outcomes will include use of safe water (short term), incidence of diarrhea (short/medium term), and health status (long term, depending on when phase II occurs). Add other outcomes.
▪ What will you do with the results?
  ▪ If the hypothesis proves true, go to phase II; if false, modify the policy.
Questions aimed at measuring the relative effectiveness of different project features
▪ These require identifying the tough design choices on the table…
▪ What is the issue?
  ▪ What is the best package of products or services?
▪ Where do you start from (what is the counterfactual)?
  ▪ What package is the government delivering now?
  ▪ Which changes do you or the government think could be made to improve effectiveness?



▪ How do you test it?
  ▪ The government might agree to provide one package to a randomly selected group of households and another package to another group of households, to see how the two packages perform,
  ▪ e.g. extension vs. fertilizer+extension vs. fertilizer+extension+seeds.
▪ What will you measure?
  ▪ The average change in relevant outcomes for households receiving one package versus the same for households receiving the other package.
▪ What will you do with the results?
  ▪ The package that is most effective in delivering desirable outcomes becomes the one adopted by the project from the evaluation onwards.



Application: features that should be tested early on
▪ Early testing of project features (say, at 6 months to 1 year) can provide the team with the information needed to adjust the project early on, in the direction most likely to deliver success.
▪ Features might include:
  ▪ alternative modes of delivery (e.g. seed merchants vs. extension agents),
  ▪ alternative packages of outputs, or
  ▪ different pricing schemes (e.g. alternative subsidy levels).



5: Develop the identification strategy
(to identify the impact of the project separately from changes due to other causes)
▪ Once the questions are defined, the lead evaluator selects one or more comparison groups against which to measure results in the treatment group.
▪ The “rigor” with which the comparison group is selected will determine the reliability of the impact estimates.
▪ Rigor?
  ▪ More: same observables and unobservables (experimental)
  ▪ Less: same observables only (non-experimental)
Explore Existing Data
▪ Explore what data exist that might be relevant for use in the evaluation.
  ▪ Discuss with the agencies of the national statistical system and with universities to identify existing data sources and future data collection plans.
▪ Record data periodicity, quality, variables covered, sampling frame, and sample size for:
  ▪ censuses,
  ▪ surveys (household, firm, facility, etc.),
  ▪ administrative data, and
  ▪ data from the project monitoring system.



New Data
▪ Start identifying additional data collection needs:
  ▪ Data for impact evaluation must be representative of the treatment and comparison groups.
  ▪ Questionnaires must include the outcomes of interest (consumption, income, assets, etc.), questions about the program in question and about other programs, as well as control variables.
  ▪ The data might be at household, community, firm, facility, or farm level, and might be combined with specialty data such as those from water or land quality tests.
▪ Investigate synergies with other projects to combine data collection efforts and/or explore existing data collection efforts on which the new data collection could piggy-back.
▪ Develop a data strategy for the impact evaluation, including:
  ▪ the timing for data collection,
  ▪ the variables needed,
  ▪ the sample (including size), and
  ▪ plans to integrate data from other sources (e.g. project monitoring data).
Prepare for collecting data
▪ Identify the data collection agency.
▪ The lead evaluator or team will work with the data collection agency to design the sample and train enumerators.
▪ The lead evaluator or team will prepare the survey questionnaire or questionnaire module as needed.
▪ Pre-testing of the survey instrument may take place at this stage to finalize instruments.
▪ If financed with outside funds, the baseline can now go to the field. If financed by project funds, the baseline will go to the field just after effectiveness but before implementation starts.



Develop a Financial Plan
▪ Costs:
  ▪ lead evaluator and research/evaluation team,
  ▪ data collection,
  ▪ supervision, and
  ▪ dissemination.
▪ Finances:
  ▪ research grants,
  ▪ project funds, or
  ▪ other donor funds.



Stage 3: Negotiations to Completion
Ensure timely implementation
▪ Ensure timely procurement of evaluation services, especially contracting the data collection, and
▪ Supervise timely implementation of the evaluation, including:
  ▪ data collection,
  ▪ data analysis, and
  ▪ dissemination and feedback.



Data collection agency/firm
▪ The data collection agency or firm must have technical knowledge and sufficient logistical capacity relative to the scale of data collection required.
▪ The same agency or firm should be expected to do the baseline and follow-up data collection (and use the same survey instrument).



Baseline data collection and analysis
▪ Baseline data collection should be carried out before program implementation begins; optimally, even before the program is announced.
▪ Analysis of baseline data will provide program management with additional information that might help finalize program design.



Follow-up data collection and analysis
▪ The timing of follow-up data collection must reflect the learning strategy adopted.
  ▪ Early data collection will help modify programs mid-course to maximize longer-term effectiveness.
  ▪ Later data collection will confirm the achievement of longer-term outcomes and justify continued flows of fiscal resources into the program.



Watch implementation closely from an evaluation point of view
▪ Watch (monitor) what is actually being implemented:
  ▪ it will help you understand the results of the evaluation, and
  ▪ it will help with the timing of evaluation activities.
▪ Watch for contamination in the control group.
▪ Watch for violations of the eligibility criteria.
▪ Watch for other programs targeting the same beneficiaries.
▪ Look for unintended impacts.
▪ Look for unexploited evaluation opportunities.
→ Good evaluation team communication is key here.



Dissemination
▪ Implement the plan for dissemination of evaluation results, ensuring that the timing is aligned with the government’s decision-making cycle.
▪ Ensure that results are used to inform project management and that available entry points are exploited to provide additional feedback to the government.
▪ Ensure that wider dissemination takes place only after the client has had a chance to preview and discuss the results.
▪ Nurture collaboration with local researchers throughout the process.



Housekeeping
▪ Put in place arrangements to procure the impact evaluation work and fund it on time.
▪ Use early results to inform the mid-term review.
▪ Use later results to inform the ICR, CAS, and future operations.



End of the course!

