EVALUATION
Some Tools, Methods & Approaches

U.S. Department of State
Washington, D.C.

Acknowledgments
This booklet was prepared by Rolf Sartorius with support from Taryn
Anderson, Michael Bamberger, Danielle de Garcia, Mateusz Pucilowski and
Mike Duthie. The authors wish to thank Krishna Kumar and Peter Davis for
their useful comments and suggestions.

Copyright 2013

SOCIAL IMPACT
2300 Clarendon Boulevard, Suite 1000
Arlington, VA 22201
http://www.socialimpact.com

Version 2.0



Table of Contents

Part I: Purpose and Overview
  1. Purpose
  2. Evaluation types
     Performance and process
     Summative/ex-post
     Impact evaluations
     Global/regional program evaluations
     Experience reviews and meta-analysis
     Special Studies
Part II: Evaluation Designs
  Quantitative designs
  Qualitative designs
  Mixed method designs
Part III: Data Collection Methods
  Mining Project Records and Secondary Data
  Formal Surveys
  Rapid Appraisal Methods
  Participatory Methods
Part IV: Evaluation Tools and Approaches
  Results Frameworks
  Performance Management Plans
  Gender Analysis
  Cost-Benefit and Cost-Effectiveness Analysis
  Information Communication Technology


PART I
Purpose and Overview



PURPOSE
The State Department promotes evaluation to achieve the most effective foreign
policy outcomes and greater accountability to the American people.1 The findings,
conclusions and recommendations generated from evaluations should be used to
improve the effectiveness of State Department programming in the areas of foreign
assistance, diplomacy, security and other operations and to prevent mistakes from
being repeated. The Department's evaluation policy is based on the premise that
evaluations must enable evidence-based decision making and reward candor more
than superficial success stories.
The purpose of this Evaluation Overview is to strengthen awareness of evaluation
tools, methods and approaches, in order to assist the Department and its partners
in their planning and implementation of useful, high-quality evaluations. The
booklet is designed to present some of the most useful and promising evaluation
tools and methods to facilitate planning, managing and using evaluations. For
each tool and approach, you will find a summary of purpose and use, advantages
and disadvantages, relative costs, and the skillsets and time required for their
undertaking. Key references for additional information are also provided.
The tools, methods and approaches in this overview are intended to further serve the
Department's primary evaluation purposes:
Accountability: Well-designed and timely evaluations help to ensure
accountability for the USG resources spent on foreign affairs activities.
Evaluations enable program managers and leadership to determine the cost
effectiveness of programs, projects, initiatives, activities, interventions, etc., and,
in the case of a program or project, the quality of its planning and implementation.
Consequently, evaluation findings can provide empirical data for reports to
various stakeholders in foreign assistance planning and in larger diplomatic
activities.
Learning: Evaluations document the results, impact, or effectiveness of
organizational and programming activities, thereby facilitating learning from
experience. The Department can apply such learning to the development
of new projects, programs, strategies and policies. Empirically-grounded
evaluations also aid informed decision making when considering new programs
or projects, interventions, activities, etc.
The Department distinguishes among several types of evaluation: performance/
process evaluations, summative/ex-post evaluations, impact evaluations, global/
regional program evaluations, experience reviews and special evaluation studies.
The Department anticipates that most of its evaluations will be performance
evaluations due to their ability to generate rapid and cost-effective learning.
1. Department of State Evaluation Policy, February 2012.


Types of Evaluations

Depending upon their information needs, resources and priorities, bureaus
may utilize a wide range of evaluation types, which may include the following:

Performance/Process Evaluations: Such evaluations focus on the
performance of an effort and examine its implementation, inputs, outputs and
likely outcomes. They are undertaken to answer a wide range of important
questions: Did the effort operate as planned? What problems and challenges,
if any, did it face? Was it or is it being effectively managed? Did it provide
planned goods and services in a timely fashion? If not, why not? Were the
original cost estimates about it realistic? Did it meet its targets or is it likely to
meet them? What are its expected effects and impacts? Is it sustainable?

There is no hard and fast rule regarding when bureaus should conduct
performance/process evaluations; however, for an effort with a life span of five
years, bureaus should consider undertaking an evaluation during its mid-cycle
so that managers can have an objective assessment of its implementation
progress, problems and challenges, which would enable them to make
mid-course corrections, if necessary. On the other hand, for a two- or
three-year effort, it might be preferable to conduct the evaluation at its end,
because it takes eight to twelve months before an intervention becomes
operational and its outputs are measurable.

Summative/Ex-Post Evaluations: They differ from performance evaluations
in that their focus is primarily on outcomes and impacts, but they often
include effectiveness, and they are conducted when an effort has ended or is
soon to end. Summative/ex-post evaluations answer questions such as: What
changes were observed in targeted populations, organizations or policies
during and at the end of the effort? To what extent can the observed changes
be attributed to it? Were there unintended effects which were not anticipated
at the planning stage? Were they positive or negative? What factors explain
the intended and unintended impacts? The essence of summative/ex-post
evaluations lies in establishing that the changes have occurred as a result of the
intervention, or at least that the latter has substantially contributed to them.

Impact Evaluations: Impact evaluations are a sub-category of summative/
ex-post evaluations. For the purposes of this Guidance, they refer to those
evaluations which use control groups to measure the precise impacts of an
effort. In such evaluations, two groups, treatment and control, are established
at the launch of an effort. The treatment group receives the services and goods
from the effort in terms of technical assistance, training, advice and/or
financial support, while the control group does not. The overall impact of
the effort is measured by comparing the performance, conditions or status
of the two groups. This is often referred to as the counterfactual, which is a
comparison of what actually happened to what would have happened in the
absence of the effort.

Types of Evaluations (continued)

Special Evaluation Studies: Bureaus may also undertake special evaluation
studies to meet their specific information needs. Such studies may be
undertaken when (a) a key decision has to be made and available information
is inadequate; (b) there are major implementation problems that should
be addressed; or (c) the Secretary of State, OMB, the White House or
Congress needs empirically grounded information that is not available from
routine sources. For example, if the White House needs information about
the implementation of Human Rights programs in the Middle East, DRL
or the Bureau for Near Eastern Affairs might commission a study on the
subject.

Global/Regional Program Evaluations: They may be designed to examine the
performance and outcomes of a major sector or sub-sector of foreign affairs
programs to draw general findings, conclusions and lessons. The purpose of
a global evaluation of electoral assistance, for example, is not to evaluate the
success or failure of individual projects but to determine the efficacy and
outcomes of electoral assistance programs per se.

Experience Reviews: These involve systematic analysis of past experience,
mostly based on the review of past documents, reports, evaluations and
studies. Evaluators can also supplement the information with key informant
interviews or workshops. They focus on a limited range of questions and can
be completed within the span of two or three weeks.

This booklet provides an overview of various tools and methods for conducting these
different types of evaluations. The list of topics included here is not intended to be
comprehensive. Some of these tools and approaches are complementary, while some
are substitutes. Although some have broad applicability, others are quite narrow
in their uses. The choice of which tools or approaches are appropriate for a given
context will depend on a range of considerations, including the intended uses of the
evaluation, evaluation stakeholders, the speed with which the information is needed,
and the resources available.

Mixing and matching approaches


Many of the tools and methods in this booklet work best when they are creatively
mixed or blended. A mixed methods approach combines elements of qualitative
and quantitative evaluation approaches and data collection methods. This
approach allows evaluators to leverage the strengths of each tool while mitigating
its weaknesses. Various approaches can be used sequentially or simultaneously,
with one method dominating the design or each method contributing equally. The
Department's Evaluation Policy does not value any single method over others. A
mixed method approach provides evaluations with a number of advantages:
``Triangulation: increases validity and credibility of findings and conclusions;
``Comprehensiveness: allows for collection of enough data to sufficiently answer
all evaluation questions;
``Clarity: data collected from one method can clarify or supplement findings
collected via another method;
``Development: information collected through a given method may inform the
use or revision of subsequent methods;
``Initiation: data collected through multiple methods that provides divergent
findings can trigger further analysis.


Key resources
`` Department of State (2012). Department of State Program Evaluation Policy.
http://www.state.gov/s/d/rm/rls/evaluation/2012/184556.htm
`` Department of State (2012). Evaluation Guidance for the Department of State.
`` Department of State (2012). Department of State Evaluation Policy: Frequently Asked Questions.
`` Department of State Diplopedia: http://diplopedia.state.gov/index.php?title=State_Program_Evaluations
`` Suggested Evaluation Resources: http://diplopedia.state.gov/images/Suggested_Evaluation_Resources.docx
`` Department of State Evaluation Community of Practice: http://cas.state.gov/evaluation


PART II
Evaluation Designs



Overview of Evaluation Designs
Evaluation designs describe the logic and the conceptual framework for answering
the evaluation's key questions. DoS recognizes two main kinds of evaluation
designs, each with particular strengths and limitations in addressing specific kinds
of evaluation questions: performance evaluation designs and impact evaluation
designs.
Performance evaluation (PE) designs are best at answering descriptive and
normative questions about programs. Illustrative descriptive questions include:
Who benefited from the program, and who did not? What were the strengths and
weaknesses in program implementation? How has the program made a difference?
Normative questions are those which gauge program performance against certain
agreed norms or standards: To what extent did the program achieve its target of
training 400 election supervisors? To what extent were "do no harm" principles
adhered to during implementation of the peacebuilding program? To what extent
were DoS Green Building standards followed in the construction of the new
embassies? There are many PE designs; to simplify, we identify three main types:
1) PE designs using primarily quantitative methods; 2) PE designs using primarily
qualitative methods; and 3) mixed methods PE designs. Due to their practicality
in terms of lower costs, faster implementation and greater ability to describe the
interaction of the program and the program context, PE designs are much more
widely used in evaluating DoS programs than IE designs.
Impact evaluation (IE) designs are used to answer cause-and-effect questions about
DoS programs: To what extent did the program cause outcomes to occur? To what
extent can the program benefits be attributed to the program? For example, an IE
design might be used to answer the question: To what extent did the youth training
program in Tunisia lead to greater employment of youth? In the DoS evaluation
policy, IEs use experimental and quasi-experimental designs with a counterfactual
or control group. DoS intends to use these designs only very selectively due to their
substantial cost, time and technical requirements.
These main designs are summarized in the following chart and then the advantages
and challenges in using each design type are outlined below.


Chart 1: Continuum of Evaluation Designs

Impact Evaluations
  Experimental and quasi-experimental designs:
  1. Designs with randomized assignment (RCTs)
  2. Designs with comparison groups but not randomized assignment:
     - Regression discontinuity
     - Difference-in-difference
     - Matching

Performance Evaluations
  Quantitative designs without control or comparison groups
  (designs that assess outcomes for the same group using quantitative methods):
     - Snapshot design
     - Before-and-after
     - Cross-sectional
     - Time series

  Qualitative designs
  (designs that assess outcomes for the same group using qualitative methods):
     - Snapshot design
     - Before-and-after
     - Cross-sectional
     - Appreciative inquiry
     - Most significant change
     - Case study design

  Mixed methods designs
  (designs that systematically integrate quantitative & qualitative methods):
     - Case study design
     - Qualitative dominant design
     - Quantitative dominant design
     - Balanced mixed methods design

Moving from impact evaluations toward performance evaluations along this
continuum, feasibility in the field increases while statistical rigor for causal
questions decreases.



Quantitative Performance
Evaluation Designs
What are they?
Quantitative performance evaluation (PE) designs rely predominantly on the use of
standardized measures and standardized data collection procedures throughout the
evaluation to ensure comparability or to measure results. There are many types of
quantitative PE designs. Several that are potentially most useful for DoS
are: 1) quantitative snapshot designs; 2) before-and-after designs; 3) cross-sectional
designs; and 4) time series designs. Quantitative PE designs measure program effects
at a single point in time or through repeated measures without a counterfactual
group, and do not answer cause-and-effect questions with certainty. These designs
are often very practical and can be used widely for evaluating DoS programs.
ADVANTAGES:

``Evaluation findings can be generalized to the population about which
information is required.
``Samples of individuals, communities or organizations can be selected to
ensure that the results will be representative of the population being studied.
``Estimates can be obtained of the magnitude and distribution of program
results.
``Standardized approaches permit the evaluation to be replicated in different
areas over time with comparable findings.
``Increases the credibility of findings for many (but not all) evaluation users.
CHALLENGES:

``Many kinds of information are difficult to obtain through structured data
collection instruments, particularly on sensitive topics such as domestic
violence and income.
``Many groups, especially ethnic minorities, sex workers and victims of
trafficking, may be more difficult to reach through quantitative evaluations.
``There is often little contextual data to help interpret results or explain
variations across groups.
``Difficulty in studying the program implementation process.
``If capable local survey firms are not on the ground in-country, quantitative
PEs can be expensive and time-consuming, depending on their size and scale.


COSTS:

Many quantitative PEs are in the middle cost range, with snapshot PEs being less
expensive and time series designs being more expensive, assuming DoS will pay for
the repeated data collection costs.
SKILLS REQUIRED:

A high degree of skill in survey design, sampling, survey management and analysis,
especially for larger scale surveys. Quantitative PE designs using simple surveys will
be less technically demanding.
TIME REQUIRED:

For snapshot and cross-sectional PE designs, 3-5 months is typical. Before-and-after


designs typically require a few months at program inception (baseline) and another
few months at the end of the program to assess performance.

KEY RESOURCES
`` Bamberger, M. (2012). RealWorld Evaluation Chapter 12.
`` Gertler, P. et al. (2011). Impact Evaluation in Practice. The World Bank.
`` Rist, R. and Morra, L. (2009). The Road to Results: Designing and Conducting Effective Development Evaluations.
The World Bank.


Quantitative PE Designs
Quantitative Snapshot Designs look
at a group of program participants at
one point in time during or after the
intervention. These designs can be used to
answer descriptive or normative questions.
For example, the main design might use
a single simple survey of several dozen to
a few hundred program participants to
answer descriptive questions about how the
program has benefited participants, how
much they liked the program, or how they
rate the quality of program services. The
design can also be used to answer normative
questions against specific targets or criteria,
such as: Did the program achieve its targeted
75% satisfaction rating among program
participants? These designs produce rapid
access to evaluative information and they
are relatively cheap.
Quantitative Before-and-After Designs
can be used to answer descriptive and
normative questions such as: How much
have participants learned during the
program? How has income increased? How
have morbidity or mortality rates, or
incidents of violent conflict, decreased
during the program? Evaluators ask about
group characteristics before and after the
program and there is no comparison group.
For example, in a conflict management
program evaluators could do a pre-test
(before the program) and post-test (after
the program) to see how much participants
learned about mediation techniques. Or
in an income generation project, evaluators
could measure participant income at
program inception and again at program
completion. These designs require a baseline
or the time and resources to recreate one.

Cross-Sectional Designs show a snapshot


of program performance at one point
in time. The quantitative version of this
design is used with a survey and it answers
questions about subgroup responses to the
programs. It also can be used to address
descriptive or normative questions. For
example, a descriptive question that could
be addressed might be: how did women
versus men differentially respond to the
program? Whereas a normative question
could be: How did ethnic minorities versus
the majority group compare in meeting the
targeted 10% increase in income for program
participants? The advantage of this design
is that it systematically disaggregates the
subgroups and examines how the program
has differentially affected them.
Time Series Designs take repeated
measures to explore and describe changes
over time and to identify trends. For
example, in health programs a time series
design might draw on government or other
donor statistics to measure HIV prevalence
rates over time in a particular district that
is receiving HIV/AIDS programming. In
a longitudinal design repeated measures
are taken from the same group. In an
interrupted time series design, measures are
taken from before the program, to examine
the pre-program trends, and then again
during and after the program to examine
the expected interruption in the trend. The
design is useful where existing data sets
are available to examine trends; otherwise
it is very costly and time-consuming to
implement.

Although each of the above designs is quantitative, each could have a qualitative
counterpart. For example, qualitative designs can use the same basic logic of snapshot,
before-and-after, cross-sectional and even time series designs.
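
To make the before-and-after logic described above concrete, here is a minimal
sketch in Python of a paired pre-test/post-test comparison for a single participant
group. The data file and column names are hypothetical, and, as noted, the result
shows change during the program rather than attribution to it.

```python
# Minimal sketch of a before-and-after comparison (no comparison group).
# Assumes a hypothetical CSV with one row per participant and columns
# "score_before" and "score_after".
import pandas as pd
from scipy import stats

df = pd.read_csv("participants.csv")

# Descriptive question: how much did scores change during the program?
change = df["score_after"] - df["score_before"]
print(f"Mean change: {change.mean():.1f} points (n={len(df)})")

# Paired t-test: is the average change different from zero?
# Without a counterfactual group this measures change, not impact.
t_stat, p_value = stats.ttest_rel(df["score_after"], df["score_before"])
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```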



Qualitative PE Designs
What are they?
Qualitative PE designs draw on a range of primarily qualitative methods to
address evaluation questions where understanding the interplay between program
performance and the program context is central; where evaluations need to be
conducted quickly, flexibly and at low cost; or where complex or rapidly changing
programs require fast, flexible and on-going learning. These PE designs are often
comprised of a combination of document review, key informant interviews,
focus group interviews and participatory evaluation methods such as mapping
exercises, or other qualitative approaches such as Appreciative Inquiry or Most
Significant Change methodology. Qualitative PE designs are very practical for
evaluating many kinds of DoS programs, especially those where outcomes such as
poverty, vulnerability, security, and empowerment combine a number of different
dimensions, which can be difficult to observe and measure.
STRENGTHS

``Flexibility to evolve
``Sampling focuses on high value subjects
``Holistic focus
``Examine the broader context of the program
``Multiple qualitative sources provide understanding of complex phenomena
``Narrative reports more accessible to non-specialists
``The use of participatory approaches makes it more likely that vulnerable and
voiceless groups are heard
CHALLENGES

``A flexible, evolving design may frustrate key users of the evaluation and be less
practicable, especially for some short duration evaluations
``Lack of generalizability of findings to other programs
``Multiple perspectiveshard to reach consensus on some major themes
``Methodological challenges in boiling down large quantities of qualitative data
``Interpretivist methods appear too subjective
COSTS, TIME, SKILLS REQUIRED, RESOURCES

Cost, time and skills required will depend on the specific blend of qualitative
methods selected to support the PE design. See Part III for specifics.


Mixed Methods Evaluation Designs
What are they?
Mixed methods evaluation designs involve the systematic integration of different
methodologies at all stages of an evaluation. The mixed method approach normally
refers to evaluation designs that combine quantitative and qualitative methods.
It is important to distinguish systematic integration of quantitative and qualitative
methods from many evaluations that combine methods in an ad hoc manner.
The benefits of the systematic and integrated use of mixed methods evaluation designs
are widely recognized in all spheres of domestic and international evaluation work.
They are generally the preferred evaluation design for DoS programs for the following
reasons:
``DoS programs operate in complex and changing social, economic, ecological
and political contexts and no single evaluation methodology can adequately
describe the interactions among all these different factors.
``DoS program implementation and outcomes are affected by a wide range
of historical, economic, political, cultural, organizational, demographic and
natural environmental factors, all of which require different methodologies for
their assessment.
``DoS programs also produce a range of different outcomes and impacts, many
of which require different methodologies for measurement and evaluation.
``DoS programs change in response to how they are perceived by different
segments of the target (and non-target) population, and observing these
processes of behavioral change requires the application of different methods.
ADVANTAGES

Mixed methods designs, when used systematically, offer the potential to combine the
benefits of both qualitative and quantitative approaches while compensating for the
limitations of each approach when used separately. A well-designed mixed method
approach can offer a range of potential benefits:
``A well-designed mixed methods evaluation is able to draw on a much broader
range of qualitative and quantitative tools, techniques and conceptual
frameworks at all stages of the evaluation
``Normally, the design will also incorporate professionals from different
disciplines into the core evaluation team
``Mixed methods designs assist DoS in understanding how local contextual
factors can explain variations in program implementation and outcomes in
different locations

``Mixed methods designs combine the representativeness of quantitative


methods that allow for a generalization of findings from a sample to a larger
population with the ability of qualitative methods to assess the effect of
intervening variables (for example ethnicity, community leadership, etc.) on
outcomes.
COSTS, TIME, SKILLS REQUIRED

Cost, time and skills required will depend on the specific blend of mixed methods
selected to support the PE or IE design. See Part III for specifics.

RESOURCES
`` Bamberger, M. (2013). The Mixed Methods Approach to Evaluation. SI Concept Note Series. No. 1., April
2013. http://www.socialimpact.com/evaluationresources



Case Study Design
What is it?
A Case Study Design is typically a qualitative or mixed methods evaluation. It
is a non-experimental design and does not use random selection or control and
comparison groups. The design is frequently used when DoS wants to gain in-depth
understanding of a program process, event, or situation and explain why results
occurred. It is useful when answering descriptive evaluation questions about why and
how the intervention works and it can be especially useful in portraying complex
program processes and how the program context interacts with targeted individuals,
communities or institutions to produce results or behavior changes. A key attribute
of a case study design is that it highlights why decisions were made, how decisions
were implemented and finally, with what results. Case study designs can be used
to examine program extremes (high performing or low performing examples) or a
typical intervention.
Case studies can use qualitative, quantitative or mixed methods to collect data. They
can consist of a single case or multiple cases across multiple sites or countries. For
example, a case study design could be used to describe how a program to reintegrate
former child soldiers in Sierra Leone affected children participating in the program.
The case study would describe the program context, the history and background of
some key individual(s), how they participated in the program and how the program
affected the lives of selected children and their communities upon reintegration.
Rich learning could be gained from cases that portrayed typical children in the
program, or ones who had experienced particular successes or failures due to
their involvement.
ADVANTAGES:

``Allows for in-depth analysis of interplay between program context and results
``Helps to establish plausible causal relationships between interventions and
outcomes
``Frequently involves a mixed-methods approach strengthening credibility of
results
``Provides a story line that may be compelling for readers
LIMITATIONS:

``Focuses on only one causal relationship, sometimes leaving out other potential
relationships
``Difficult to generalize to other situations

``Cannot establish statistical causality or significance as in quantitative impact


evaluation designs.
COST:

Medium, depending on number of cases selected and depth of data collection.


SKILLS REQUIRED:

Familiarity with various qualitative research methods such as key informant


interviews, focus groups and direct observation. Depending on its scope and focus,
the case study design may also draw on survey and other quantitative skills.
TIME REQUIRED:

Generally a minimum of 2-3 days is required to collect case study data for a
single case such as an individual or a small-scale organization, plus an additional
2-3 days for analysis and report writing. More time is needed if surveys are
required, and substantially more if the unit of analysis is a large group (e.g., the
Liberian armed forces), a large organization (e.g., the Liberian Ministry of
Defense), a community, or a region.

KEY RESOURCES:
`` USAID (2013). Evaluative Case Studies. Technical Note. USAID Monitoring and Evaluation Series. No x.
Version x. (draft)
`` Yin, Robert K. (2009). Case Study Research: Design and Methods. Fourth Edition. Thousand Oaks, CA: Sage.
`` Social Impact (2006). Monitoring, Evaluation and Learning for Fragile State and Peacebuilding Programs: Practical
Tools for Improving Program Performance and Results, pp. 40-47.



Impact Evaluation
What is it?
Impact evaluations utilize experimental or quasi-experimental methods to assess
the changes in development outcomes that are directly attributable to a given
intervention. To be able to isolate the impact of an intervention, evaluators must
first identify a credible and rigorously defined counterfactual: a theoretical state
that predicts what would have happened to beneficiaries in the absence of the
intervention. The counterfactual is estimated by identifying a comparison group
that is as similar to the beneficiary (treatment) group as possible. Impact is then
measured by comparing the changes over time between the treatment group and
this comparison group. While comparison groups can be selected using a variety
of methodologies, randomized selection of potential beneficiaries into treatment
and control groups provides the strongest evidence of a relationship between the
intervention under study and the outcome measured. Given the complexities of
development work, however, experimental designs entailing randomized selection are
not always possible or desired. In such cases, evaluators should use the most rigorous
quasi-experimental methods available.

What can we use it for?


``Measuring outcomes and impacts of an activity and distinguishing these from
the influence of other, external factors
``Strengthening accountability for results
``Informing decisions on whether to expand, modify or eliminate projects,
programs or policies
``Testing the relative effectiveness and efficiency of alternative interventions
``Drawing lessons for improving the design and management of future activities
Advantages:

``More rigorous than performance evaluations


``Can attribute changes in development outcomes to discrete interventions
``Can compare competing interventions or alternative intervention designs
``Provide estimates of the magnitude of outcomes and impacts for different
demographic groups and regions over time
``Statistical analysis and rigor can give managers and policy-makers added
confidence in decision-making
``Findings should be available in time to inform the project itself or future strategies, designs and procurements


CHALLENGES:

``Considerably more costly, time-consuming, and management
intensive than performance evaluations
``Substantive changes in project design can threaten evaluation
validity
``Require highly specialized evaluators to design and implement
``Selection of comparison groups could be politically or logistically difficult
(especially in the case of randomization)

Key terms:
Treatment Group: exposed to a given intervention (beneficiaries).
Control Group: identified using randomized selection and not exposed to a given
intervention (counterfactual).
Comparison Group: identified by non-randomized selection and not exposed to
a given intervention (counterfactual).
Cost:

Impact evaluations generally cost more than performance evaluations. For reference
see the TIPS note on Impact Evaluation Costing.
Skills required:

Strong technical skills in social science research design, management, analysis


and reporting. These generally include sampling and power calculations; survey
instrument design; enumeration and enumerator training; interviewing; data entry,
warehousing, cleaning and management; data analysis using statistical software; and
qualitative research skills to triangulate results.
Time required:

Varies according to design and scope of evaluation but could take multiple years.
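
As one small illustration of the sampling and power calculations listed under
skills above, the sketch below estimates the sample size needed per group for a
two-group comparison in Python; the effect size, significance level and power
target are illustrative assumptions, not DoS requirements.

```python
# Sketch: per-group sample size for detecting a given effect in a
# two-group impact evaluation. All parameter values are illustrative.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(
    effect_size=0.3,  # standardized difference (Cohen's d) to detect
    alpha=0.05,       # significance level
    power=0.8,        # chance of detecting the effect if it is real
)
print(f"Required sample size per group: {n_per_group:.0f}")
```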


A few approaches used for impact evaluations

Experimental Design: members of a population are randomly assigned
to treatment and control groups, and questionnaires or other data collection
instruments (anthropometric measures, school performance tests, etc.) are
administered to both groups. This is done both before and after the project
intervention. Randomization maximizes the probability that the two groups will
be statistically similar, controlling for selection bias and producing the most
rigorous counterfactual estimate. Also called Randomized Control Trials (RCTs).

Quasi-Experimental Design (QED): when randomization is not possible, a
comparison group is purposively selected to be as similar to the treatment group
as possible. There are a number of different methodologies for selection, each of
which entails its own assumptions about the goodness-of-fit between the
comparison group and the counterfactual.

Matching: this QED relies on large data sets to construct the best possible
comparison group on the basis of observed characteristics. Matching can be
conducted using a variety of methods including Propensity Score Matching, where
each unit is assigned a probability (0-1) that it will participate in a given program.
The score is expressed through a summary index of relevant characteristics.
Because matching is based on observed characteristics, this method necessitates
the assumption that there are no differences in unobservable characteristics
(motivation, etc.).

Regression Discontinuity (RD): this QED utilizes a clear, numerical cutoff
(threshold) score for participation in a given intervention to create a
counterfactual. Evaluators compare proximate observations from both sides of
the cutoff to estimate the impact of a given intervention. Those just above the
cutoff are the treatment group, whereas those just below are the comparison. This
method assumes that the local impacts (just around the cutoff) are generalizable
to the broader population.

Difference in Differences (DD): this QED compares the changes in development
outcomes over a given period of time between the beneficiaries (treatment) and
a purposively selected group of non-beneficiaries (comparison). This method
allows evaluators to control for differences between the treatment and comparison
groups that are constant over time but forces us to assume that the two groups
would have experienced the same changes (parallel trends). DD can be paired with
either experimental designs or matching techniques to increase analytical rigor.
Key Resources
`` Gertler, P. et al. (2011). Impact Evaluation in Practice. The World Bank, Washington, D.C.
`` International Initiative for Impact Evaluation (3ie). http://www.3ieimpact.org/
`` Khandker, S. R., Koolwal, G. B., & Samad, H. A. (2010). Handbook on Impact Evaluation: Quantitative Methods
and Practices. Washington, DC: The International Bank for Reconstruction and Development / The World Bank.
http://go.worldbank.org/9H20R7VMP0
`` World Bank. Development Impact Evaluation Initiative. http://go.worldbank.org/1F1W42VYV0
`` Duflo, E. (2007). Using Randomization in Development Economics Research: A Toolkit. Centre for Economic Policy
Research, London. econ-www.mit.edu/files/806


PART III
Data Collection Methods



Mining Project Records and
Secondary Data
What is it?
Data mining uses project documents and records or secondary sources such as
published reports, censuses, surveys and comparative international data during the
evaluation. Project documents that can be mined include periodic project reports
(monthly, biannual, annual), baseline data, needs assessments, grant databases,
internal and external evaluations, technical advisor input reports, field reports and
project logs and diaries kept by project personnel or beneficiaries. Mining secondary
data can include use of qualitative and ethnographic data such as posters, graffiti,
mass media reports (newspapers, TV, etc.), e-mail and social media (Facebook,
YouTube, etc.). Examples of widely used comparative international data sets
include: MDG statistics, UN Human Development Index, Demographic and
Health Surveys, World Bank World Development Indicators and Transparency
International.

What can we use it for?


``Complement or check data collected during the evaluation
``Reconstruct baseline data when the evaluation is commissioned late in the
project cycle
``Develop a sampling frame for the selection of the project or comparison
groups
``Define the counterfactual when it is not possible to use a pre/post-test
comparison group evaluation design
ADVANTAGES:

``Produces significant cost and time savings


``Strengthens sample design by ensuring coverage of the total target population.
Improves matching of the project and comparison groups through techniques
such as propensity score matching and instrumental variables
``Provides independent check of data validity
``Enriches quality and interpretation of evaluation findings
``Provides an alternative way to define the counterfactual, particularly useful
for complex evaluations


CHALLENGES:

``Voluminous project records may be difficult and time-consuming to review


and analyze
``Often does not cover the correct time period, level of analysis, or correct
sample
``May not provide all of the required information
``Difficult and time-consuming to check the validity of secondary data
COST:

The cost of using published data or well-documented surveys is very low. However,
using project records or data from other organizations may require significant costs
to put the data in a form that can be used for the evaluation.
SKILLS REQUIRED:

Experience with statistical analysis of survey data.


TIME REQUIRED:

Relatively short compared to primary data collection.
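
As a small example of what mining secondary data can look like in practice, the
sketch below joins hypothetical district-level statistics onto project records so
they can serve as a proxy baseline; all file and column names are assumptions.

```python
# Sketch: attach secondary district-level data to project records.
# File and column names are hypothetical.
import pandas as pd

projects = pd.read_csv("project_records.csv")  # one row per project site
secondary = pd.read_csv("district_stats.csv")  # e.g., a census or DHS extract

# Use pre-program district indicators as a reconstructed baseline.
merged = projects.merge(secondary, on="district", how="left")

# Flag sites lacking secondary data so their validity can be checked.
missing = merged[merged["baseline_poverty_rate"].isna()]
print(f"{len(missing)} of {len(merged)} sites lack secondary baseline data")
```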

Key resources
`` http://betterevaluation.org/plan/describe/existing_documents
`` Bamberger, M. (Nov. 2010). Reconstructing baseline data for impact evaluation and results measurement.
Special Series on the Nuts and Bolts of M&E Systems, No. 4. In Poverty Reduction and Economic Management
Notes. The World Bank. http://siteresources.worldbank.org/INTPOVERTY/Resources/335642-1276521901256/
premnoteME4.pdf
`` Boslaugh, S. (2007). An Introduction to Secondary Analysis. Excerpt from Secondary Data Sources for Public
Health: A Practical Guide. New York: Cambridge University Press. http://assets.cambridge.org/97805218/70016/
excerpt/9780521870016_excerpt.pdf



Formal Surveys
What are they?
Formal surveys are used to collect standardized information from a carefully selected
sample of individuals or aggregated units (households, schools, etc.). Surveys often
collect comparable information for a relatively large number of people in particular
project groups.

What can we use them for?


``Providing baseline data against which to compare strategy and performance
``Comparing different groups at a given point in time
``Comparing changes over time in the same group
``Providing a key input to a formal evaluation of the impact of a program or
project
``Assessing the level of need in a particular target group or in a particular sector
as the basis for preparing a project or program design
ADVANTAGES:

``Findings from the right sample of respondents can be applied to the wider
target group or the population as a whole
``Quantitative estimates can be made for the size and distribution of impacts
``With the proliferation of donor surveys and national statistics agencies, there
may be good survey data to draw or build on
CHALLENGES:

``With the exception of Core Welfare Indicators Questionnaires (CWIQ),


results are often not available for a long period of time
``The processing of data and quality assurance can be a major bottleneck for the
larger surveys even with software tools such as Statistical Package for the Social
Sciences (SPSS)
``Demographic and Health Surveys (DHS) and household surveys are expensive
and time consuming
``Used without qualitative methods, surveys may give an incomplete picture of
results and underlying causes of change


COST:

Ranges from $30-60 per household for the CWIQ to $1-3 million for a full DHS.
Costs may be significantly higher if the country has no master sampling frame.
SKILLS REQUIRED:

Sound technical and analytical skills for sample and questionnaire design, data
analysis, processing and reporting.
TIME REQUIRED:

Depends on the sample size. The CWIQ can be completed in two months. Standard
DHS fieldwork generally requires between three and seven months to complete,
while data collection through the final report takes one year to 18 months.
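
The representativeness that formal surveys depend on starts with the sample
draw. The sketch below shows one common approach, a proportionate stratified
random sample drawn in Python; the frame file, the region stratum and the 5%
sampling fraction are illustrative assumptions.

```python
# Sketch: proportionate stratified random sample from a household frame.
# The frame file, stratum column and sampling fraction are hypothetical.
import pandas as pd

frame = pd.read_csv("household_frame.csv")  # one row per household

# Draw 5% of households within each region; a fixed seed makes the
# draw reproducible for documentation and quality assurance.
sample = frame.groupby("region").sample(frac=0.05, random_state=42)
print(sample["region"].value_counts())
```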

Key resources
`` Measure DHS. Demographic and Health Surveys. http://www.measuredhs.com/start.cfm
`` Grootaert, C. and van Bastelaer, T. (2002). Understanding and Measuring Social Capital: a multidisciplinary tool for
practitioners. Washington, DC: World Bank.
`` Sapsford, R. (2011). Survey Research (2nd ed.). Newbury Park, CA: Sage Publications.
`` World Bank. Core Welfare Indicator Questionnaire (CWIQ). http://go.worldbank.org/66ELZUGJ30
`` World Bank. Living Standards Measurement Survey (LSMS). http://www.worldbank.org/lsms/
`` World Bank. Quantitative Service Delivery Surveys (QSDS). http://go.worldbank.org/MB54FMT3E0
`` World Bank. Citizen Report Card and Community Scorecard. http://go.worldbank.org/QFAVL64790


Some Types of Surveys


Public Opinion Surveys are designed to represent the opinions of a population by
asking a series of questions and then extrapolating generalities within confidence
intervals. For example, The Asia Foundation Survey of the Afghan People provides
insights into Afghans' views on security, national reconciliation, the economy, governance,
corruption, justice, development, provision of services and gender equality. The survey has
been conducted annually since 2006 and tracks public opinion trends on these issues.
Demographic and Health Surveys (DHS) are nationally-representative household surveys
that provide data for a wide range of monitoring and impact evaluation indicators in the
areas of population, health, and nutrition. Standard DHS Surveys have large sample sizes
and typically are conducted every five years, to allow comparisons over time. Interim DHS
Surveys focus on the collection of information on key performance monitoring indicators
but may not include data for all impact evaluation measures (such as mortality rates). These
surveys are conducted between rounds of DHS surveys and have shorter questionnaires
than DHS surveys.
Core Welfare Indicators Questionnaire (CWIQ) is a household survey that measures
changes in social indicators for different population groups, specifically indicators of
access, utilization, and satisfaction with social and economic services. It is a quick and
effective tool for improving activity design, targeting services to the poor and, when
repeated annually, for monitoring activity performance. Preliminary results can be obtained
within 30 days of the CWIQ survey.
Multi-Topic Household Survey (also known as Living Standards Measurement
SurveyLSMS) is a multi-subject integrated survey that provides a means to gather data
on a number of aspects of living standards to inform policy. These surveys cover spending,
household composition, education, health, employment, fertility, nutrition, savings,
agricultural activities, and other sources of income. Single-topic household surveys cover a
narrower range of issues in more depth.
Client Satisfaction (or Service Delivery) Survey is used to assess the performance of
donor or government services based on client experience. The surveys shed light on the
constraints clients face in accessing services, their views about the quality and adequacy
of services, and the responsiveness of government officials. These surveys are usually
conducted by a government ministry or agency.
Citizen Report Cards have been conducted by NGOs and think-tanks in many countries.
Similar to service delivery surveys, they have also investigated the extent of corruption
encountered by ordinary citizens. A notable feature has been the widespread publication of
the findings.
Social Capital Surveys measure people's perceptions of the trustworthiness of other
people and key institutions that shape their lives, as well as the norms of cooperation and
reciprocity that underlie attempts to work together to solve problems. These surveys have
been used for monitoring and evaluating peacebuilding and transition programs.



Rapid Appraisal Methods
What are they?
Data collection methods that can be employed quickly and at a low cost to obtain
a narrow, but in-depth understanding of the conditions and needs of the targeted
group. These methods elevate the importance and relevance of local knowledge. Less
structured than classic evaluation methods (e.g., surveys, experiments), they tend to
use a smaller sample size and may therefore have less statistical accuracy.

What can we use them for?


``Accommodate resource constraints
``Investigate motivations and attitudes behind behaviors
``Assess the development hypothesis and facilitate the development of a more
comprehensive, formal survey tool
``Identify the universe of stakeholders and opinion leaders/decision-makers
Advantages:

``Quick and cost-effective


``Highly adaptable
``Focus on qualitative information produces detailed data
``On-the-spot analysis allows for verification of conclusions by local people
CHALLENGES:

``Limited generalizability/reliability and lack of clear validation procedures


lessens credibility
``Susceptible to agendas of participants
``Externally-driven process (not inherently participatory)
``Quality of results dependent on skill of evaluators
Cost:

Low to medium, depending on scope of evaluation and methods selected


Skills Required:

Data collection (administration of individual and group interviews, group/meeting


facilitation, field observation), cultural sensitivity, qualitative data analysis and basic
statistical skills.


Time required:

Two to six weeks, depending on the scope of evaluation (number of units to be


evaluated); should be scheduled according to the lifestyle of the community being evaluated.

Rapid Appraisal Methods


Key informant interviews: a series of
individual interviews with a small, select
group of people with vast knowledge of
a particular subject. These are frequently
semi-structured, following a prepared
interview guide with predetermined topics
or loosely-worded questions. They may be
easier and less expensive than focus groups,
given a lesser demand for coordination or
incentives.
Focus groups: interviews facilitated
by an impartial moderator with several
homogenous groups of stakeholders.
Typically comprising seven to 12
participants and lasting one to two
hours; anything longer should be broken
into multiple sessions. Focus groups allow
participants to build upon one anothers
comments, but are potentially at risk of
producing data biased by the most vocal
participants. Best for generating, testing
or exploring ideas or as a method of
triangulation.
Community interviews: these differ from
focus groups because they occur in a public
setting and are open to all community
members. Interview protocol is usually
more structured and there is less discussion
amongst participants. Most interaction is
between interviewer and participants.

Direct observation: multiple evaluators


consciously record what they see, hear
and smell of their physical surroundings,
activities, processes or discussions. In
structured observation, evaluators look for a
specific behavior, object or event and use a
common form to record scores/comments.
Direct observation makes evaluators
aware of aspects either purposefully or
inadvertently omitted in data collection
from participants but provides only a
snapshot of the situation.
Mini-surveys: much smaller than a
formal questionnaire, mini-surveys focus
on a narrowly-defined issue, question or
problem, include 15-30 questions and are
designed to take no more than 30 minutes
to complete. They are administered only to
25-75 people, who are most often selected
through nonprobability sampling. Mini-surveys are attractive to evaluators because
they can generate quantitative, easily
analyzable data fairly quickly. Web-based
tools such as Survey Monkey are very useful
where respondents have good connectivity.
Document review: review of project or
external materials can provide information
about the context and events that occurred
prior to evaluation.

Scoring/ranking: assesses the relative importance of different items. Ranking
requires ordering by priority; scoring requires assigning a value.
Before-and-after photos (or drawings): provide visual evidence of change (though
not necessarily evidence of the reason for the change) that is easily understood by
most audiences. Can be augmented with captions written by community members.


Key resources
`` McNall, M. and Foster-Fishman, P. (2007). Methods of Rapid Evaluation, Assessment, and Appraisal. American
Journal of Evaluation, 28:151. http://www.pol.ulaval.ca/perfeval/upload/publication_194.pdf
`` USAID. (2007). Using Rapid Appraisal Methods. Performance Monitoring & Evaluation TIPS, 2 Ed., No. 5.
http://www.usaid.gov/policy/evalweb/documents/TIPS-UsingRapidAppraisalMethods.pdf



Participatory Methods
What are they?
A collection of methods designed to facilitate ownership of M&E findings and
recommendations among the local population. Project beneficiaries play the primary
role in evaluation planning, data collection, analysis, and reporting. Follow-up
actions are decided upon and implemented locally. The methods are flexible, visual
(sometimes oral) and group-oriented; a small evaluation team facilitates but does not
dictate the process.

What can we use them for?


``Stakeholder analysis, problem analysis, community assessment
``Project design and implementation
``Monitoring and evaluation
Advantages:

``Empowers and has potential to build local capacity


``Ownership by local population can boost quality (comprehensiveness,
accuracy) of findings
``Facilitates the exploration of sensitive subjects
``Monitoring allows corrective action to be taken sooner
``Increases the chance that beneficiaries will be supportive/actively involved in
implementation of evaluation recommendations, thereby increasing likelihood
of sustainability
``Increases information transparency
CHALLENGES:

``Less objectivity than other methods, potentially reducing credibility of results


``Evidence is primarily anecdotal
``Requires substantial time commitment from locals
``May raise participant expectations of project results
``Potential for domination and misuse by some stakeholders to further their
own interests
Cost:

Low to medium, depending on scope and depth of application.



Skills required:

Familiarity with, or a minimum of several days' training in, participatory approaches;


group activity facilitation; ability to create safe, enabling environment; listening;
respect; analysis of qualitative data.
Time required:

Varies widely according to scope of evaluation and methods selected; typically a few
days to a week per community for use of some of the rapid appraisal methods but
much longer for other methods (e.g., participant observer) and for follow-up.

Commonly Used Participatory Tools


Participant Observer: full immersion, to the extent possible, in the local culture. This
method allows the researcher to draw conclusions on a first-hand basis.
Participatory Rural Appraisal (PRA): can be used in both rural and urban areas.
Some evaluators also categorize the methods listed below as Rapid Appraisal Methods; the
difference lies in whether the application of the method permits the community to believe
it owns the data, a key tenet of participatory methods.
``Participatory mapping: collectively creating a map of the community
``Participatory calendars: collectively recalling a history or projecting an anticipated
schedule of events (e.g., timing of rainy season)
``Transect walks: a small group of locals walks the evaluators/facilitators through the
community and discusses what they observe
``Creative expression: drawing, storytelling, drama, role-playing, music, collage
making
``Participatory video: using community-made videos to assess change

Key resources
`` Harvey, E. (2005). Guide for Participatory Appraisal, Monitoring and Evaluation (PAME). Braamfontein, South
Africa: The MVULA Trust. http://www2.gtz.de/Dokumente/oe44/ecosan/en-guide-participatory-monitoringevaluation.pdf
`` Taylor-Powell, E., Rossing, B. & Geran, J. (July 1998). Evaluating Collaboratives: Reaching the Potential.
University of Wisconsin-Extension. http://learningstore.uwex.edu/assets/pdfs/G3658-8.PDF [Creative
Expression]
`` UNDP (1997). Who are the Question Makers: A Guide to Participatory Evaluation. http://www.undp.org/
evaluation/documents/who.htm


PART IV
Evaluation Tools and Approaches



Results Frameworks
What are they?
A results framework (RF) is a graphical representation of a development hypothesis.
It demonstrates the causal linkages between all levels of results necessary and
sufficient to achieve a specific bureau or mission goal. These results must be realistic
and achievable, one-dimensional, measurable and within the manageable interest
of the implementing Operating Unit. RFs are based on problem analysis and
information produced by technical analysis and other related assessments.
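
Because an RF is essentially a hierarchy of results connected by cause-and-effect
logic, it can be represented as a simple tree. The minimal Python sketch below is
purely illustrative: the goal and intermediate results are hypothetical, and the
guidance does not prescribe any particular software representation.

    # A minimal sketch, assuming a results framework can be modeled as a tree:
    # each node names a result plus the lower-level results judged necessary
    # (and together sufficient) to achieve it. All results are hypothetical.
    results_framework = {
        "result": "Improved basic education outcomes",  # mission goal
        "supported_by": [
            {"result": "Increased primary school enrollment",
             "supported_by": [
                 {"result": "School fees reduced", "supported_by": []},
                 {"result": "New classrooms constructed", "supported_by": []},
             ]},
            {"result": "Improved teaching quality",
             "supported_by": [
                 {"result": "Teachers trained in revised curriculum",
                  "supported_by": []},
             ]},
        ],
    }

    def show(node, depth=0):
        """Print the causal chain from the goal down to lowest-level results."""
        print("  " * depth + node["result"])
        for child in node["supported_by"]:
            show(child, depth + 1)

    show(results_framework)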

What can we use them for?


``Planning: helps determine the cause-and-effect pathway of objectives needed
to reach a bureau or mission goal
``Assessment: helps the evaluator understand the logic behind the design of a
particular intervention
``Evaluation design: the causal chain and anticipated results can help shape
evaluation questions and methods
Advantages:

``Brings the big picture to light through a focus on intervention effects and
outcomes rather than outputs
``Solidifies linkages between program outcomes and national-level goals and
strategies
``Facilitates agreement and understanding on the design and anticipated goals
of interventions, and generates ownership amongst all mission or bureau team
members
CHALLENGES:

``Bias toward quantifiable indicators


``Potential to oversimplify complex interventions
Cost:

Very little for development of actual RF; however, the cost of preceding analyses and
assessments will vary.


Skills required:

Several days of training are recommended to gain a solid understanding of RF
development. For actual development, team members will need an understanding of
the local context in which results are being sought, technical knowledge, data
analysis and interpretation, problem recognition and logical reasoning.
Time required:

Actual development may take only a week or two but depends on the results of
assessments and analyses (e.g., environmental, gender, economic) that may
require up to several months.

Key resources
`` Department of State (2012). Managing for Results: Department of State Project Design Guidebook
`` Department of State (2012). Functional Bureau Strategy Guidance and Instructions
`` Department of State (2012). Integrated Country Strategy Guidance and Instructions
`` Department of State (2012). Joint Regional Strategy Guidance and Instructions
`` USAID. (2010). Building a Results Framework. Performance Monitoring & Evaluation TIPS, 2nd Ed., No. 13.
http://pdf.usaid.gov/pdf_docs/PNADW113.pdf



Performance Management Plans
What are they?
Performance management plans (PMPs) measure and track progress toward achieving
results by identifying and defining a list of project-related indicators. The plans
typically include an overview of the bureau or mission's management systems,
how the PMP was developed, the relevant results framework, a narrative on the
development hypothesis, indicator reference sheets, an indicator table, and an M&E
task schedule.
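
As a concrete illustration of the indicator table at a PMP's core, the minimal
Python sketch below tracks each indicator's movement from baseline toward target.
The indicator names, baselines and targets are hypothetical, not Department guidance.

    # A minimal sketch of a PMP indicator table, assuming each indicator
    # carries a baseline, a target and the latest actual value.
    # All names and numbers are hypothetical.
    indicators = [
        {"name": "Girls' primary enrollment rate (%)",
         "baseline": 62, "target": 80, "actual": 71},
        {"name": "Teachers trained (count)",
         "baseline": 0, "target": 500, "actual": 430},
    ]

    for ind in indicators:
        span = ind["target"] - ind["baseline"]
        progress = (ind["actual"] - ind["baseline"]) / span if span else 0.0
        print(f"{ind['name']}: {progress:.0%} of the way from baseline to target")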

What can we use them for?


``Establish monitoring and evaluation systems for reporting results, including
how information is collected, reviewed and analyzed
``Document definitions, assumptions and decisions
``Use data collection to make informed decisions
``Better communicate facts and figures on program achievements and progress
to Department stakeholders (host country counterparts, partners,
Congress, OMB and American taxpayers)
Advantages:

``Puts bureau or mission teams on the same page at early stages of project
development
``Forces measurement of change for critical indicators
``Helps keep M&E activities on schedule (e.g., data collection, data quality
assessments, evaluations)
``Improves knowledge, transparency, and accountability
CHALLENGES:

``Easy to develop an unwieldy PMP with too many indicators


``Time and cost implications may make ideal indicators unreasonable to collect
Cost:

Minimal for PMP development; baseline data collection costs will vary.
Skills required:

Training on the Department's overall results-based management approach and on
PMP development; technical knowledge (in the relevant sector); in-depth knowledge
of the country context; knowledge of performance management methodologies.

Time required:

Developing the PMP document, soliciting and integrating input, and having it
reviewed by management may take two to four weeks. Depending on timing, teams may
decide to defer collection of baseline data and establishment of targets. Preparation
time may also be needed to develop SOWs for any aspects of PMP development that will
be contracted out.

Key resources
`` Department of State (2012). Performance Management Guidebook



Gender Analysis
What is it?
An approach to program planning, monitoring and evaluation that assesses the
different ways in which interventions affect men and women, boys and girls, and
people of differing civil status (single, married, divorced, widowed, etc.). This
analysis can be conducted at the project, sector and national levels and ensures that
differences are addressed in the evaluation design, sample selection, data collection
and analysis. It recognizes the limitations of conventional quantitative data
collection methods for discussing sensitive topics such as domestic violence, control
of household resources, sexual harassment, social and political participation, and
gender differences in the labor market.
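
In practice, the analysis step begins with disaggregating outcomes by sex before
drawing conclusions. The minimal Python sketch below illustrates the idea with
hypothetical survey records; a real analysis would use full datasets and
appropriate statistical tests.

    # A minimal sketch of sex-disaggregated analysis, assuming each survey
    # record carries the respondent's sex and an outcome of interest.
    # The records are hypothetical.
    from collections import defaultdict
    from statistics import mean

    records = [
        {"sex": "F", "hours_paid_work": 18}, {"sex": "F", "hours_paid_work": 22},
        {"sex": "M", "hours_paid_work": 40}, {"sex": "M", "hours_paid_work": 35},
    ]

    by_sex = defaultdict(list)
    for r in records:
        by_sex[r["sex"]].append(r["hours_paid_work"])

    for sex, hours in sorted(by_sex.items()):
        print(f"{sex}: mean weekly hours of paid work = {mean(hours):.1f}")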

What can we use it for?


``Identify and address design, implementation and outcome issues with
differential consequences for men and women, boys and girls
``Promote equity and human rights
``Promote economic efficiency of programs by ensuring full participation of
both sexes and maximizing the different resources that men and women
contribute
ADVANTAGES:

``Ensures that all sectors of the target population benefit from interventions and
that resources of all sectors of the target community are mobilized
``Ensures efficiency and equity of program impact
``Addresses social, cultural, legal and political factors that limit women's
participation at the household, community, local and national levels
``Addresses sensitive human rights issues such as human trafficking, sex trade
and exposure to HIV/AIDS
CHALLENGES:

``May raise sensitive issues that governments may not wish to address and that
donors may not wish to push
``Uses frameworks and data collection techniques with which many
quantitatively trained researchers may not be familiar, sometimes causing
reluctance to use these techniques
``May require additional resources to contract staff with specialist skills

COST:

Depending on the methods used, gender analysis may increase the cost of the
evaluation by 10-20%. In cases where a stand-alone gender analysis is required, the
cost will be similar to comparable conventional evaluation studies.
SKILLS REQUIRED:

Sound knowledge of both quantitative and qualitative research and evaluation
techniques, combined with familiarity with current thinking on gender and
development as well as the special issues involved in collecting data on sensitive
gender-related topics.
TIME REQUIRED:

When used to complement a conventional evaluation, gender analysis may increase
the required staff weeks by 10-20%. However, if carefully coordinated, most of the
follow-up surveys or in-depth interviews can be conducted at the same time as the
regular survey.

Key resources
`` Department of State (2012). Department of State Policy Guidance: Promoting Gender Equality to Achieve Our
National Security and Foreign Policy Objectives. http://www.state.gov/s/gwi/rls/other/2012/187001.htm
`` Bamberger, M. (2005). Handbook for evaluating the impacts of development policies and programs. Developed
for International Program for Development Evaluation Training Workshop. Carleton University, Ottawa. http://
bambergerdevelopmentevaluation.org [click on gender]



Cost-Benefit and Cost-Effectiveness
Analysis
What are they?
Cost-benefit and cost-effectiveness analysis are tools for assessing whether or not
the costs of an activity can be justified by its outcomes and impacts. Cost-benefit
analysis measures efficiency by monetizing all inputs, outputs and outcomes.
Cost-effectiveness analysis estimates inputs in monetary terms and outcomes in
non-monetary quantitative terms (such as improvements in student reading scores).
Whereas cost-effectiveness focuses on a particular outcome, cost-benefit analysis
seeks to include all outcomes, each converted to a monetary benefit.
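
The two techniques share the same underlying arithmetic, discounting costs (and
any monetized benefits) over time; they differ only in how outcomes enter the
calculation. The minimal Python sketch below illustrates this with hypothetical
project figures and an assumed 10% discount rate.

    # A minimal sketch contrasting cost-benefit and cost-effectiveness
    # analysis. The project figures and 10% discount rate are hypothetical.
    def npv(flows, rate=0.10):
        """Net present value of yearly flows (year 0 first)."""
        return sum(f / (1 + rate) ** t for t, f in enumerate(flows))

    costs = [100_000, 40_000, 40_000]  # USD spent each year
    benefits = [0, 60_000, 150_000]    # USD of monetized benefits each year

    # Cost-benefit: everything in money; a ratio above 1 suggests the
    # benefits justify the costs.
    print(f"Benefit-cost ratio: {npv(benefits) / npv(costs):.2f}")

    # Cost-effectiveness: same costs, but the outcome stays non-monetary
    # (here, total gain in student reading scores).
    score_points_gained = 1_200
    print(f"Cost per score point: ${npv(costs) / score_points_gained:,.2f}")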

What can we use them for?


``Identifying projects or approaches that offer the most efficient allocation of
resources
``Assessing a project's outcomes relative to its costs, facilitating comparison with
other projects
ADVANTAGES:

``A high-quality approach for estimating and comparing the efficiency of
programs and projects
``Makes explicit project assumptions that might otherwise remain implicit or
overlooked at the design stage
``Useful for convincing policy-makers and funders that activity benefits justify
its costs
CHALLENGES:

``Fairly technical, requiring specialized financial and human resources


``Converting benefits (such as increased life expectancy) into monetary terms
often requires assumptions that strongly influence results
``Cost-effectiveness may be difficult to interpret in projects with multiple types
of outcomes
``Requisite data for cost-benefit calculations may not be available
``Results must be interpreted with care, particularly in projects where results are
difficult to quantify or are poorly measured


COST:

Varies greatly, depending on scope of analysis and availability of data.


SKILLS REQUIRED:

The procedures used in both types of analyses are often highly technical. They
require skill in the economic analysis of programs in the sector concerned, as well
as the availability of relevant economic and cost data.
TIME REQUIRED:

Varies greatly depending on scope of analysis and availability of data.

Key resources
`` Belli, P., et al. (2000). Economic Analysis of Investment Operations: Analytical Tools and Practical Applications. The
World Bank, Washington, D.C.
`` Millennium Challenge Corporation. (April 2009). Guidelines for Economic and Beneficiary Analysis. http://www.
mcc.gov/documents/guidance/guidance-economicandbeneficiaryanalysis.pdf



Information Communication
Technology (ICT) for Evaluation
What is it?
ICT for Evaluation encompasses a broad and growing range of tools to increase
the effectiveness and efficiency of international evaluations. These tools include
Personal Digital Assistants (PDAs), smartphones, netbooks, iPads, email and
web-based surveys, digital photos, audio and video, and online focus groups for
collecting data. In addition, there are technologies, such as WebEx or Skype, for
enhancing the communication of geographically dispersed evaluation teams. Finally,
there are tools for analyzing quantitative data, such as the Statistical Package
for the Social Sciences (SPSS) and Statistical Analysis Software (SAS), and tools
for storing, coding, managing, analyzing and retrieving qualitative data, such as
the Centers for Disease Control's EZ-Text (free) and commercial products such as
NVivo and Atlas-ti.
ADVANTAGES:

``In the case of electronic surveys: substantial cost savings, immediate
aggregation of data, reduced transcription errors, programmable skip patterns
and, with GPS-enabled devices, geo-positioning of survey respondents (a minimal
sketch of a skip pattern follows this list)
``In the case of virtual communication tools: the ability for teams to better plan
and coordinate their activities in the field and to collaborate on findings and
report writing when not collocated
``In the case of qualitative data analysis tools: the ability to handle large data
sets, effective storage systems and, once coding has been established, quick and
easy access for analyzing data and establishing relational networks among data
categories. These tools also establish an audit trail as coding of qualitative data
proceeds.
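
The skip-pattern sketch referenced above follows. It is a minimal Python
illustration of how survey software, rather than the enumerator, routes a
respondent past questions that do not apply and records GPS coordinates when the
device supplies them; the questions, rule and coordinates are all hypothetical.

    # A minimal sketch of a programmable skip pattern for an electronic
    # survey. The questions, skip rule and coordinates are hypothetical.
    def administer(respondent):
        answers = {"has_school_age_children":
                   respondent["has_school_age_children"]}
        # Skip logic: schooling questions appear only when relevant.
        if answers["has_school_age_children"]:
            answers["children_enrolled"] = respondent["children_enrolled"]
        # Geo-position the interview when the device supplies coordinates.
        if "gps" in respondent:
            answers["gps"] = respondent["gps"]
        return answers

    print(administer({"has_school_age_children": True,
                      "children_enrolled": 2,
                      "gps": (-1.2921, 36.8219)}))  # hypothetical coordinates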
CHALLENGES:

``High upfront costs for acquiring handhelds for survey teams


``Special skills required for programming and operating handheld devices and
specialized software packages.
``Operating environments in remote locations may not include access to
internet, cell phones or electricity


``Tools like NVivo and Atlas-ti take time to learn; inputting and coding
the data can be time consuming, and once categories, codes, etc. have been
established it may be hard to change them. In field settings, capturing
qualitative data on digital recorders may not be practical, and producing
quality transcripts for qualitative data analysis may add large and impractical
amounts of time to the data collection level of effort and budget.
COST:

Varies greatly depending on technologies used.


SKILLS REQUIRED:

ICT experts to program PDAs and other handheld devices; team members
who are skilled in using specific quantitative and qualitative software packages.
TIME REQUIRED:

Highly variable depending on technologies used.

KEY RESOURCES:
`` Sue, V. M., & Ritter, L. A. (2012). Conducting Online Surveys (2nd ed.). Thousand Oaks, CA: SAGE Publications.
`` http://www.cdc-eztext.com/
`` http://betterevaluation.org/blog/analyzing_data_using_common_software
`` The Evaluation Exchange, Volume X Number 3, Fall 2004. Taking the Next Step: Harnessing the Power of
Technology for Evaluation. Harvard Family Research Project.
