
M&E: Chapter 4

Impact Evaluation

By:
Chala Dechassa (PhD, Associate Professor)

My question is: Are we making an impact?



Contents

▪ Monitoring and impact evaluation
▪ Why do impact evaluation?
▪ Why we need a comparison group
▪ Methods for constructing the comparison group
▪ When to do an impact evaluation



1. Impact evaluation
▪ Impact evaluations are a particular type of evaluation that seeks to answer cause-and-effect questions.
▪ Impact evaluations are structured around one particular type of question: what is the impact (or causal effect) of a program on an outcome of interest?
▪ This basic question incorporates an important causal dimension: we are interested only in the impact of the program, that is, the effect on outcomes that the program directly causes.
▪ An impact evaluation looks for the changes in outcomes that are directly attributable to the program.



Cont’d
▪ Impact evaluation goes by many names (e.g. Rossi et al. call it impact assessment), so it is important to know the concept rather than the label.
▪ Impact is the difference between outcomes with the program and without it.
▪ The goal of impact evaluation is to measure this difference in a way that attributes the difference to the program, and only the program.
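In the standard potential-outcomes notation (a common formalization of this definition, not taken from the slides themselves), the impact for a unit i and its population average are:

```latex
% Y_i(1): outcome for unit i with the program; Y_i(0): outcome without it
\Delta_i = Y_i(1) - Y_i(0)                      % impact for unit i
\mathrm{ATE} = \mathbb{E}\,[\,Y_i(1) - Y_i(0)\,] % average treatment effect
```

Only one of Y_i(1) and Y_i(0) is ever observed for a given unit, which is the missing-data problem taken up below.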



2. Why Evaluate?
▪ Development programs and policies are typically designed to change outcomes, for example, to raise incomes, to improve learning, or to reduce illness.
▪ Whether or not these changes are actually achieved is a crucial public policy question, but one that is not often examined.
▪ More commonly, program managers and policy makers focus on controlling and measuring the inputs and immediate outputs of a program (how much money is spent, how many textbooks are distributed) rather than on assessing whether programs have achieved their intended goals of improving well-being.



2. Why it matters
▪ We want to know if the program had an impact and the average size of that impact:
  ▪ Understand if policies work
  ▪ Justification for the program (big $$)
  ▪ Scale up or not: did it work?
  ▪ Meta-analyses: learning from others
  ▪ Understand the net benefits of the program
  ▪ Understand the distribution of gains and losses



Monitoring and IE

IMPACT: Effect on living standards
 - infant and child mortality
 - prevalence of specific diseases

OUTCOMES: Access, usage and satisfaction of users
 - number of children vaccinated
 - percentage within 5 km of a health center

OUTPUTS: Goods and services generated
 - number of nurses
 - availability of medicine

INPUTS: Financial and physical resources
 - spending on primary health care



Impact Evaluation for Policy Decisions
▪ Impact evaluations are needed to inform policy makers on a range of decisions, from curtailing inefficient programs, to scaling up interventions that work, to adjusting program benefits, to selecting among various program alternatives.
▪ They are most effective when applied selectively to answer important policy questions, and they can be particularly effective when applied to innovative pilot programs that are testing a new, unproven, but promising approach.



3. What we need
→ The difference in outcomes with the program versus without the program, for the same unit of analysis (e.g. an individual)
▪ Problem: individuals have only one existence; each unit is either in the program or not, never both.
▪ Hence, we have a problem of a missing counterfactual, a problem of missing data.
▪ Counterfactual analysis enables evaluators to attribute cause and effect between interventions and outcomes.



Cont’d
▪ The 'counterfactual' measures what would have happened to beneficiaries in the absence of the intervention, and
▪ impact is estimated by comparing counterfactual outcomes to those observed under the intervention.


Thinking about the counterfactual
▪ Why not compare individuals before and after (the reflexive comparison)?
  ▪ The rest of the world moves on, and you cannot be sure what was caused by the program and what by the rest of the world.
▪ We need a control/comparison group that will allow us to attribute any change in the “treatment” group to the program (causality).



Comparison group issues
▪ Two central problems:
  ▪ Programs are targeted
    → Program areas will differ in observable and unobservable ways, precisely because the program intended this
  ▪ Individual participation is (usually) voluntary
    → Participants will differ from non-participants in observable and unobservable ways
▪ Hence, a comparison of participants and an arbitrary group of non-participants can lead to heavily biased results.
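This bias can be made precise. A naive comparison of participant and non-participant means decomposes (a standard identity, stated here for reference) into the true effect plus a selection-bias term:

```latex
\underbrace{\mathbb{E}[Y \mid T=1] - \mathbb{E}[Y \mid T=0]}_{\text{naive difference}}
= \mathrm{ATT} +
\underbrace{\mathbb{E}[Y(0) \mid T=1] - \mathbb{E}[Y(0) \mid T=0]}_{\text{selection bias}}
```

Targeting and voluntary participation both make the selection-bias term non-zero.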
Example: providing fertilizer to farmers
▪ The intervention: provide fertilizer to farmers in a poor region of a country (call it region A)
  ▪ The program targets poor areas
  ▪ Farmers have to enroll at the local extension office to receive the fertilizer
  ▪ The program starts in 2019 and ends in 2021; we have data on yields for farmers in the poor region and in another region (region B) for both years
▪ We observe that the farmers we provide fertilizer to have a decrease in yields from 2019 to 2021.



Did the program not work?
▪ Further study reveals there was a national drought, and everyone’s yields went down (failure of the reflexive comparison).
▪ We compare the farmers in the program region to those in another region. We find that our “treatment” farmers have a larger decline than those in region B. Did the program have a negative impact?
  ▪ Not necessarily (program placement):
    ▪ Farmers in region B have better quality soil (unobservable)
    ▪ Farmers in the other region have more irrigation, which is key in this drought year (observable)



OK, so let’s compare the farmers in region A
▪ We compare “treatment” farmers with their neighbors. We think the soil is roughly the same.
▪ Say we observe that treatment farmers’ yields decline by less than comparison farmers’. Did the program work?
  ▪ Not necessarily. Farmers who went to register with the program may have more ability, and thus could manage the drought better than their neighbors, while the fertilizer was irrelevant (individual unobservables).
▪ Say we observe no difference between the two groups. Did the program not work?
  ▪ Not necessarily. What little rain there was caused the fertilizer to run off onto the neighbors’ fields (spillover/contamination).



The comparison group
▪ In the end, with these naïve comparisons, we cannot tell if the program had an impact.
→ We need a comparison group that is as identical as possible, in observable and unobservable dimensions, to those receiving the program, and one that will not receive spillover benefits.



How to construct a comparison group: building the counterfactual
1. Randomization
2. Matching
3. Difference-in-differences
4. Instrumental variables
5. Regression discontinuity



1. Randomization
▪ Individuals/communities/firms are randomly assigned into participation.
▪ Counterfactual: the randomized-out group
▪ Advantages:
  ▪ Often referred to as the “gold standard”: by design, selection bias is zero on average and the mean impact is revealed
  ▪ Perceived as a fair process of allocation with limited resources
▪ Disadvantages:
  ▪ Ethical issues, political constraints
  ▪ Internal validity (exogeneity): people might not comply with the assignment (selective non-compliance)
  ▪ Unable to estimate entry effects
  ▪ External validity (generalizability): controlled experiments are usually run on a small-scale pilot, and it can be difficult to extrapolate the results to a larger population.
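As a minimal sketch of why randomization works, the simulation below (entirely hypothetical data; the variable names are illustrative) assigns treatment at random, so unobserved ability is balanced across groups and a simple difference in mean yields recovers the true effect:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000
ability = rng.normal(0, 1, n)   # unobserved heterogeneity
treat = rng.integers(0, 2, n)   # random assignment: 0 = control, 1 = treated
true_effect = 0.5
yields = 2.0 + true_effect * treat + ability + rng.normal(0, 1, n)

# Because assignment is independent of ability, selection bias is zero
# on average and the difference in means estimates the impact.
impact = yields[treat == 1].mean() - yields[treat == 0].mean()
print(f"estimated impact: {impact:.2f}")  # close to 0.5
```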
2. Matching
▪ Match participants with non-participants from a larger survey.
▪ Counterfactual: the matched comparison group
▪ Each program participant is paired with one or more non-participants who are similar based on observable characteristics.
▪ Assumes that, conditional on the set of observables, there is no selection bias based on unobserved heterogeneity.
▪ When the set of variables to match on is large, one often matches on a summary statistic: the probability of participation as a function of the observables (the propensity score).
▪ Propensity score matching (PSM) is a quasi-experimental method in which the researcher uses statistical techniques to construct an artificial control group by matching each treated unit with a non-treated unit of similar characteristics.
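A minimal PSM sketch on simulated data (the variable names and coefficients are illustrative assumptions, not from the slides) might look like this:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(1)
n = 2_000
X = rng.normal(0, 1, (n, 3))    # observables, e.g. land size, education, household size
T = rng.binomial(1, 1 / (1 + np.exp(-X @ np.array([0.8, -0.5, 0.3]))))
Y = 1.0 * T + X @ np.array([0.4, 0.2, -0.1]) + rng.normal(0, 1, n)

# 1. Estimate the propensity score p(X) = Pr(T = 1 | X).
ps = LogisticRegression().fit(X, T).predict_proba(X)[:, 1]

# 2. Match each treated unit to the nearest untreated unit on p(X).
nn = NearestNeighbors(n_neighbors=1).fit(ps[T == 0].reshape(-1, 1))
_, idx = nn.kneighbors(ps[T == 1].reshape(-1, 1))

# 3. Average the treated-minus-matched-control differences (the ATT).
att = (Y[T == 1] - Y[T == 0][idx.ravel()]).mean()
print(f"estimated ATT: {att:.2f}")  # close to 1.0 if selection is on observables only
```

The estimate is only credible under the identification assumption above: selection depends on the observables in X and nothing else.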
2. Matching (cont’d)
▪ Advantages:
  ▪ Does not require randomization, nor a baseline (pre-intervention data)
▪ Disadvantages:
  ▪ Strong identification assumptions
  ▪ Requires very good quality data: need to control for all factors that influence program placement
  ▪ Requires a significantly large sample size to generate the comparison group



Matching in our example…
▪ Using statistical techniques, we match a group of non-participants with participants using variables like:
  ▪ gender,
  ▪ household size,
  ▪ education,
  ▪ experience,
  ▪ land size,
  ▪ rainfall (to control for drought), and
  ▪ irrigation,
  that is, as many observable characteristics not affected by the fertilizer as possible.



Matching in our example…
Two scenarios
Scenario 1:
▪ We show up afterwards, so we can only match (within the region) those who got fertilizer with those who did not. Problem?
  ▪ Problem: farmers select into the program based on expected gains and/or ability (unobservables)
Scenario 2:
▪ The program is allocated based on historical crop choice and land size. We show up afterwards and match those eligible in region A with those in region B. Problem?
  ▪ Problems: the same issues of individual unobservables, though lessened because we compare the eligible to the potentially eligible; however, there are now unobservables across regions.



3. Difference-in-differences
▪ The difference-in-differences (or "double difference") estimator is defined as the difference in average outcome in the treatment group before and after treatment, minus the difference in average outcome in the control group before and after treatment: it is literally a "difference of differences."
▪ It compares the changes in outcomes over time between a population enrolled in a program (the treatment group) and a population that is not (the comparison group).
▪ It is a widely used tool in program evaluation.
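Written out (notation added here for clarity, using group means), the double difference is:

```latex
\widehat{\mathrm{DiD}}
= \big(\bar{Y}_{\text{treat,post}} - \bar{Y}_{\text{treat,pre}}\big)
- \big(\bar{Y}_{\text{comp,post}} - \bar{Y}_{\text{comp,pre}}\big)
```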



Cont’d
▪ Observations over time: compare observed changes in the outcomes for a sample of participants and non-participants.
▪ Identification assumption: the selection bias is time-invariant (‘parallel trends’ in the absence of the program).
▪ Counterfactual: the changes over time for the non-participants
▪ Constraints:
  ▪ Requires at least two cross-sections of data, pre-program and post-program, on participants and non-participants
  ▪ Need to think about the evaluation ex ante, before the program
▪ Can in principle be combined with matching to adjust for pre-treatment differences that affect the growth rate.



Examples
▪ Example 1: Card and Krueger (1994, AER), in "Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania", evaluate the effect of the minimum wage (the treatment) on employment (the outcome).
▪ On April 1, 1992, New Jersey’s minimum wage rose from $4.25 to $5.05 per hour.



Cont’d…
▪ To evaluate the impact of the law, the authors surveyed 410 fast-food restaurants in New Jersey (the treatment group) and eastern Pennsylvania (the control group) before and after the rise.
  ▪ Yi is the employment of a fast-food restaurant,
  ▪ Ti is an indicator of whether or not a restaurant is in New Jersey, and
  ▪ ti is an indicator of whether the observation is from before or after the minimum wage hike.
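The same double difference can be obtained as the interaction coefficient in a regression of Y on T, t, and T×t. The sketch below uses simulated data in the slide’s notation (the numbers are invented for illustration; this is not the Card-Krueger dataset):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 800
df = pd.DataFrame({
    "T": rng.integers(0, 2, n),  # 1 = New Jersey restaurant, 0 = Pennsylvania
    "t": rng.integers(0, 2, n),  # 1 = after the minimum wage rise, 0 = before
})
# Employment with a group gap, a common time trend, and a true DiD effect of 0.7.
df["Y"] = 20 - 2 * df["T"] - 1 * df["t"] + 0.7 * df["T"] * df["t"] + rng.normal(0, 2, n)

model = smf.ols("Y ~ T * t", data=df).fit()
print(model.params["T:t"])  # the interaction coefficient is the DiD estimate
```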



4. Instrumental Variables
▪ An instrumental variable (sometimes called an “instrument”) is a third variable, Z, used in regression analysis when you have endogenous variables, that is, variables that are influenced by other variables in the model.
▪ In other words, you use it to account for unexpected behaviour between variables.
▪ Using an instrumental variable to identify the hidden (unobserved) correlation allows you to see the true correlation between the explanatory variable and the response variable, Y.
▪ Z is correlated with the explanatory variable (X) and uncorrelated with the error term, ε (what is ε?), in the equation:

Y = Xβ + ε
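To make the mechanics concrete, here is a minimal two-stage least squares (2SLS) sketch on simulated data (all names and coefficients are illustrative assumptions): X is endogenous because it shares an unobserved component u with the error, while Z satisfies the two conditions above:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5_000
Z = rng.normal(0, 1, n)                 # instrument: moves X, unrelated to the error
u = rng.normal(0, 1, n)                 # unobserved confounder
X = 0.9 * Z + u + rng.normal(0, 1, n)   # endogenous explanatory variable
e = u + rng.normal(0, 1, n)             # error term shares u with X
Y = 2.0 * X + e                         # true beta = 2.0

def slope(y, x):
    """OLS slope of y on x (with an intercept)."""
    A = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(A, y, rcond=None)[0][1]

print("naive OLS:", slope(Y, X))  # biased upward by the hidden correlation via u

# Stage 1: fitted values of X from Z (adding the stage-1 intercept would not
# change the stage-2 slope, so it is omitted for brevity).
x_hat = slope(X, Z) * Z
# Stage 2: regress Y on the fitted values; the slope recovers beta.
print("2SLS:", slope(Y, x_hat))   # close to the true beta of 2.0
```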



Cont’d
▪ Identify variables that affect participation in the program, but not outcomes conditional on participation (the exclusion restriction).
▪ Counterfactual: the causal effect is identified from the exogenous variation of the instrument.
▪ Advantages:
  ▪ Does not require the exogeneity assumption of matching
▪ Disadvantages:
  ▪ The estimated effect is local: IV identifies the effect of the program only for the sub-population of those induced to take up the program by the instrument
  ▪ Therefore, different instruments identify different parameters, and you can end up with different magnitudes of the estimated effects
  ▪ The validity of the instrument can be questioned but cannot be tested.



Summing up: Methods
▪ There is no clear “gold standard” in reality: do what works best in the context.
▪ Watch for unobservables, but don’t forget observables.
▪ Be flexible, be creative, and use the context.
▪ IE requires good monitoring, and monitoring will help you understand the effect size.



Impact Evaluation and the Project Cycle




A. Ownership
▪ IE can provide one avenue to build institutional capacity and a culture of managing-by-results, so the IE should be as widely owned within government as possible.
  ▪ Agree on a dissemination plan to maximize the use of results for policy development.
  ▪ Identify entry points in project and policy cycles:
    ▪ midpoint and closing, for the project;
    ▪ sector reporting, for the WB;
    ▪ budget cycles and policy reviews, for the government.
▪ Use partnerships with local academics to build local capacity for impact evaluation.



B. Relevance and Applicability
▪ For an evaluation to be relevant, it must be designed to respond to the policy questions that are of importance.
▪ Clarifying early what it is that will be learned, and designing the evaluation to that end, will go some way to ensure that the recommendations of the evaluation feed into policy making.



C. Flexibility and adaptability
▪ The evaluation must be tailored to the specific project and adapted to the specific institutional context.
▪ The project design must be flexible to secure our ability to learn in a structured manner, feed evaluation results back into the project, and change the project mid-course to improve project end results.

This is an important point: in the past, projects have been penalized for effecting mid-course changes in project design. Now we want to make change part of the project design.



D. Horizon matters
▪ The time it takes to achieve results is an important consideration for timing the evaluation.
▪ Conversely, the timing of the evaluation will determine which outcomes should be focused on.
  ▪ Early evaluations should focus on outcomes that are quick to show change.
  ▪ For long-term outcomes, evaluations may need to span beyond the project cycle.

Think through how things are expected to change over time, and focus on what is within the time horizon for the evaluation.
Do not confuse the importance of an outcome with the time it takes for it to change: some important outcomes are obtained instantaneously!
But don’t be afraid to look at intermediate outcomes either.



Impact Evaluation and the Project Cycle
Stage 1: Identification

Get an early start: how do you get started?
▪ Get help and access to resources: contact the person in your region or sector responsible for impact evaluation and/or the Thematic Group on Impact Evaluation.
▪ Define the timing for the various steps of the evaluation to ensure you have enough lead time for preparatory activities (e.g. the baseline goes to the field before program activities start).
▪ The evaluation will require support from a range of policy makers:
  ▪ start building and maintaining constituencies,
  ▪ dialogue with relevant actors in government,
  ▪ build a broad base of support, and include stakeholders.
Build the Team
▪ Select the impact evaluation team and define the responsibilities of:
  ▪ program managers (government),
  ▪ the WB project team and other donors,
  ▪ the lead evaluator (impact evaluation specialist),
  ▪ the local research/evaluation team, and
  ▪ the data collection agency or firm.
▪ Selection of the lead evaluator is critical for ensuring the quality of the product, and so is the capacity of the data collection agency.
▪ Partner with local researchers and research institutes to build local capacity.



Shift Paradigm
▪ From a project design based on “we know what’s best”
▪ To a project design based on the notion that “we can learn what’s best in this context, and adapt to new knowledge as needed”

Work iteratively:
▪ Discuss what the team knows and what it needs to learn (the questions for the evaluation) to deliver on project objectives
▪ Discuss translating this into a feasible project design
▪ Figure out what questions can feasibly be addressed
▪ Housekeeping: include these first thoughts in a paragraph in the PCN



Stage 2: Preparation through Appraisal

1. Define project development objectives and the results framework
▪ This activity:
  ▪ clarifies the results chain (the logic of impacts) for the project,
  ▪ identifies the outcomes of interest and the indicators best suited to measure changes in those outcomes, and
  ▪ establishes the expected time horizon for changes in those outcomes.



3: Work out project design features that will affect the evaluation design
▪ Target population and rules of selection
  ▪ This provides the evaluator with the universe for the treatment and comparison sample.
▪ Roll-out plan
  ▪ This provides the evaluation with a framework for timing data collection and, possibly, an opportunity to define a comparison group.



4: Narrow down the questions for the evaluation
▪ Questions aimed at measuring the impact of the project on a set of outcomes, and
▪ Questions aimed at measuring the relative effectiveness of different features of the project.



Questions aimed at measuring the impact of the project are relatively straightforward
▪ What is your hypothesis? (Results framework)
  ▪ By expanding water supply, the use of clean water will increase, water-borne disease will decline, and health status will improve.
▪ What is the evaluation question?
  ▪ Does improved water supply result in better health outcomes?
▪ How can you test the hypothesis?
  ▪ The government might randomly assign areas for expansion of water supply during the first and second phases of the program.
▪ What will you measure?
  ▪ Measure the change in health outcomes in phase I areas relative to the change in outcomes in phase II areas. Outcomes will include use of safe water (short term), incidence of diarrhea (short/medium term), and health status (long term, depending on when phase II occurs). Add other outcomes.
▪ What will you do with the results?
  ▪ If the hypothesis proves true, go to phase II; if false, modify the policy.
Questions aimed at measuring the relative effectiveness of different project features
▪ These require identifying the tough design choices on the table…
▪ What is the issue?
  ▪ What is the best package of products or services?
▪ Where do you start from (what is the counterfactual)?
  ▪ What package is the government delivering now?
  ▪ Which changes do you or the government think could be made to improve effectiveness?



▪ How do you test it?
  ▪ The government might agree to provide one package to a randomly selected group of households and another package to another group of households, to see how the two packages perform,
  ▪ e.g. extension vs. fertilizer+extension vs. fertilizer+extension+seeds.
▪ What will you measure?
  ▪ The average change in relevant outcomes for households receiving one package versus the same for households receiving the other package.
▪ What will you do with the results?
  ▪ The package that is most effective in delivering desirable outcomes becomes the one adopted by the project from the evaluation onwards.



Application: features that should be tested early on
▪ Early testing of project features (say, at 6 months to 1 year) can provide the team with the information needed to adjust the project early on, in the direction most likely to deliver success.
▪ Features might include:
  ▪ alternative modes of delivery (e.g. seed merchants vs. extension agents),
  ▪ alternative packages of outputs, or
  ▪ different pricing schemes (e.g. alternative subsidy levels).



5: Develop the identification strategy
(to identify the impact of the project separately from changes due to other causes)
▪ Once the questions are defined, the lead evaluator selects one or more comparison groups against which to measure results in the treatment group.
▪ The “rigor” with which the comparison group is selected will determine the reliability of the impact estimates.
▪ Rigor?
  ▪ More: same observables and unobservables (experimental)
  ▪ Less: same observables only (non-experimental)
Explore Existing Data
▪ Explore what data exist that might be relevant for use in the evaluation.
  ▪ Discuss with the agencies of the national statistical system and with universities to identify existing data sources and future data collection plans.
▪ Record data periodicity, quality, variables covered, sampling frame, and sample size for:
  ▪ censuses,
  ▪ surveys (household, firm, facility, etc.),
  ▪ administrative data, and
  ▪ data from the project monitoring system.



New Data
▪ Start identifying additional data collection needs:
  ▪ Data for impact evaluation must be representative of the treatment and comparison groups.
  ▪ Questionnaires must include the outcomes of interest (consumption, income, assets, etc.), questions about the program in question and about other programs, as well as control variables.
  ▪ The data might be at household, community, firm, facility, or farm level, and might be combined with specialty data such as those from water or land quality tests.
▪ Investigate synergies with other projects to combine data collection efforts and/or explore existing data collection efforts on which the new data collection could piggy-back.
▪ Develop a data strategy for the impact evaluation, including:
  ▪ the timing for data collection,
  ▪ the variables needed,
  ▪ the sample (including size), and
  ▪ plans to integrate data from other sources (e.g. project monitoring data).
Prepare for collecting data
▪ Identify the data collection agency.
▪ The lead evaluator or team will work with the data collection agency to design the sample and train enumerators.
▪ The lead evaluator or team will prepare the survey questionnaire or questionnaire module as needed.
▪ Pre-testing of the survey instrument may take place at this stage to finalize instruments.
▪ If financed with outside funds, the baseline can now go to the field. If financed by project funds, the baseline will go to the field just after effectiveness but before implementation starts.



Develop a Financial Plan
▪ Costs:
  ▪ lead evaluator and research/evaluation team,
  ▪ data collection,
  ▪ supervision, and
  ▪ dissemination.
▪ Finances:
  ▪ research grants,
  ▪ project funds, or
  ▪ other donor funds.



Stage 3: Negotiations to Completion
Ensure timely implementation
▪ Ensure timely procurement of evaluation services, especially contracting the data collection, and
▪ Supervise timely implementation of the evaluation, including:
  ▪ data collection,
  ▪ data analysis, and
  ▪ dissemination and feedback.



Data collection agency/firm
▪ The data collection agency or firm must have technical knowledge and sufficient logistical capacity relative to the scale of data collection required.
▪ The same agency or firm should be expected to do the baseline and follow-up data collection (and use the same survey instrument).



Baseline data collection and analysis
▪ Baseline data collection should be carried out before program implementation begins; optimally, even before the program is announced.
▪ Analysis of baseline data will provide program management with additional information that might help finalize program design.



Follow-up data collection and analysis
▪ The timing of follow-up data collection must reflect the learning strategy adopted.
  ▪ Early data collection will help modify programs mid-course to maximize longer-term effectiveness.
  ▪ Later data collection will confirm the achievement of longer-term outcomes and justify continued flows of fiscal resources into the program.



Watch implementation closely from an evaluation point of view
▪ Watch (monitor) what is actually being implemented:
  ▪ it will help you understand the results of the evaluation, and
  ▪ it will help with the timing of evaluation activities.
▪ Watch for contamination in the control group.
▪ Watch for violations of the eligibility criteria.
▪ Watch for other programs targeting the same beneficiaries.
▪ Look for unintended impacts.
▪ Look for unexploited evaluation opportunities.
→ Good evaluation team communication is key here.



Dissemination
▪ Implement the plan for dissemination of evaluation results, ensuring that the timing is aligned with the government’s decision-making cycle.
▪ Ensure that results are used to inform project management and that available entry points are exploited to provide additional feedback to the government.
▪ Ensure that wider dissemination takes place only after the client has had a chance to preview and discuss the results.
▪ Nurture collaboration with local researchers throughout the process.



Housekeeping
▪ Put in place arrangements to procure the impact evaluation work and fund it on time.
▪ Use early results to inform the mid-term review.
▪ Use later results to inform the ICR, CAS, and future operations.



End of the course!

