
Rising to the challenges of impact evaluation in complex development contexts

Insights from piloting a Participatory Impact Assessment and Learning Approach (PIALA)
with IFAD in Ghana and Vietnam

Adinda Van Hemelrijck

Abstract

This chapter reflects on the use and value of a systemic theory-based and participatory mixed-methods
approach to address the challenges of impact evaluation in complex development contexts. A
Participatory Impact Assessment and Learning Approach (PIALA) was developed and piloted with the
International Fund for Agricultural Development (IFAD) in Vietnam (2013) and Ghana (2015) to
assess and debate systemic impacts on rural poverty together with partners and stakeholders. Action
research was conducted around the pilots to learn about the processes and mechanisms that can make
an impact evaluation using PIALA on a limited budget rigorous and inclusive. The main conclusion
from the research is that inclusiveness and rigour can reinforce each other, even more so when
participatory processes and methods are employed at a larger scale. Important attributes are
methodological complementarity and consistency, and extensive and robust triangulation and cross-validation.
Critical to delivering quality at scale is research capacity. Investing in such capacity helps
reduce the cost over time while enhancing the value and uptake of evaluation findings.

Introduction

Development today takes place in globalised contexts of growing inequality, uncertainty and
instability, with new rising powers and countless conflicting issues and interests. The 2030 Agenda for
Sustainable Development calls for fundamental systemic changes and adds demands for inclusiveness
(leaving no-one behind) and sustainability to those of effectiveness in order to eradicate poverty and
inequality and protect our planet. Interventions, as a consequence, become ever more complex, with
stakes and stakeholders getting more diverse, influences more dense, problems more systemic and
outcomes more unpredictable. This complexity challenges the field of impact evaluation.

Traditional counterfactual-based approaches are generally found too costly and difficult to pursue in
complex environments due to high causal density, spillover, time lags and the unpredictability of
events (Befani et al, 2014; Picciotto, 2014). They focus too narrowly on specific intervention
components, thus “leaving many evaluation questions unanswered” (H. White, 2014, p. 3). They also
don’t explain impacts or enquire into sustainability, given their focus on specific and isolated cause-effect
relations, and therefore cannot tell if, how or why similar relations would or would not work elsewhere
(Picciotto, 2012; Ravallion, 2012; Woolcock, 2013). Last, engagement of and learning with partners
and stakeholders is inhibited by scientific procedure, raising questions about inclusiveness and
democratic value (Van Hemelrijck, 2013, 2016a). Alternative theory-based and complex systems
approaches, on the other hand, tend to be time-intensive and produce evidence that is not comparable across
many cases1, and are therefore not suitable for evaluations with larger populations (larger-N) requiring
estimates of impact distribution (Beach & Pedersen, 2013). In addition, those allowing for
participation generally don’t set out to rigorously assess causality and address concerns of bias
(Copestake, 2013; H. White & Phillips, 2012).

In 2012, the International Fund for Agricultural
Development (IFAD) launched an innovation initiative to develop an approach for impact
evaluation that could address these challenges. The approach was called PIALA: “Participatory Impact
Assessment and Learning Approach” (IFAD & BMGF, 2013b).2 IFAD is a United Nations agency
that provides loans and support to governments for agricultural and rural development programmes
aiming at reducing rural poverty by changing smallholder production and market systems (IFAD,
2016). These are generally medium to large scale programmes implemented by public and private
partners in quite political and complex environments. Given its commitment to moving 80 million
rural people out of poverty between 2013 and 20153, IFAD needed to find ways to systemically and
collaboratively assess and learn together with these partners and other key stakeholders about
programme contributions to reducing rural poverty (IFAD, 2011, 2012). Moreover, IFAD also wanted
to extend its people-centred and participatory approach to project design and implementation4 into the
realm of impact assessment without compromising on rigour in causal analysis (IFAD & BMGF,
2013b). As there were doubts about the value and sufficiency of established approaches to meet these
needs, there was an interest in methodological experimentation.

In this chapter, we first describe what PIALA is and briefly present the two IFAD evaluations that
piloted the approach. Then we discuss the main insights from the action research that was conducted
around the pilots on how an impact evaluation using PIALA can be rigorous and inclusive. Last, we
conclude with some reflections on the value-for-money of the approach, on how rigour and
inclusiveness may reinforce each other and generate greater value, and what the key attributes and
conditions are to achieve this.

1 This is mostly because the cases themselves are incomparable.
2 The piloting of PIALA was made possible with financing from IFAD and the BMGF, yet this paper does not represent the views of the funders but the individual views of the author.
3 IFAD made this commitment for IFAD’s Ninth Replenishment period (2013-15) and later extended it to its Tenth Replenishment period (2015-17) (IFAD, 2016).
4 Cf. Guijt & Woodhill, 2002.

Participatory Impact Assessment and Learning Approach (PIALA)

The PIALA initiative sought not to reinvent the wheel, but to develop a model that creatively
combines existing designs and methodologies in novel ways to address IFAD’s needs. Inspiration was
drawn mostly from the theory-based (in particular realist) and transformative (including rights-based)
traditions (Holland, 2013; Mertens, 2009; Pawson, 2013; Van Hemelrijck, 2013).

As shown in Figure 1, the approach combines five key elements. Together these make it possible to
assess impact in a systemic and participatory way at a larger scale and in complex contexts where
traditional counterfactuals generally don’t work well. The five elements are:

1. a systemic Theory of Change for visualising the project’s causal claims and engaging
stakeholders in framing the evaluation and debating the evidence;
2. multi-stage sampling of/in ‘open systems’ for enabling systemic inquiry across medium to
large size populations;
3. standardized participatory mixed methods for collecting, linking and crosschecking the data
in all sampled ‘systems’ in a systematic and comparable way;
4. a two-stage participatory sense-making model for engaging stakeholders at local and
aggregated levels in debating the emerging evidence; and
5. a configurational analysis method for assessing systemic change patterns and drawing
conclusions about the distribution and magnitude of their impact across medium to large
samples.

<insert Figure 1>

The first of the PIALA elements, the systemic ToC approach, forms the backbone for the entire
evaluation. It involves a process of reconstructing and visualising a programme’s impact pathways and
change hypotheses and the broader trends and influences, based on a thorough desk review and
discussions with key stakeholders. Different from a classic programme/project theory5, it takes an
evaluative lens focused on assessing the hypotheses by looking backwards from the envisioned impact
to the interactions and mechanisms that presumably have caused or influenced the impact (Funnell &
Rogers, 2011; Van Es et al, 2015). Impact is viewed from a systemic perspective, resulting from
changes in systems of interactions, rather than the direct and isolated relationship between intervention
and effect. A systemic ToC approach is most useful for evaluating the changes caused by many
different interventions, implementers, contributors and funders, as it helps create a shared

5 A programme/project theory is constructed from a management perspective and focused on strategy and performance, looking forward towards delivery of planned results.

understanding of complex pathways and enables different stakeholders to critically engage in parts of
the analysis (Van Hemelrijck, 2013).

The purpose of multistage cluster sampling of/in ‘open systems’, which is the second PIALA
element, is to ensure sufficient representativeness of the various populations to enable comparison and
generalization of findings about systemic impact at a medium to large scale. If we want to know and
learn about systemic impact, then the ‘system’ should be the main level of analysis and thus also the
main sample unit. In the case of government policies and programmes, the systems are generally too big
and too few in number for a classic counterfactual comparison. However, by focussing on the lowest embedded ‘open
system’ at the local level that is entrenched in and affected by the larger system, it is often possible to
have a sample of these that is large enough to cover systemic heterogeneity and to draw large-enough
subsamples for statistical comparison (Van Hemelrijck, 2017). Multi-stage cluster sampling of
these local systems, and of populations within these systems, is most cost-effective, as it substantially
reduces costs and logistics compared to other random sampling strategies (Levy & Barahona, 2002).
In the case of IFAD, the local systems that form the main sample unit are agricultural production and
market systems such as value or supply chains. These are the first population to sample, from which
the impact populations affected by the various parts of the system enquired by the evaluation are
subsampled (mostly randomly6) (Van Hemelrijck, 2016b).
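The two-stage logic described above (first sampling local ‘systems’, then subsampling impact populations within them) can be sketched in a few lines of Python. This is an illustrative sketch only, not PIALA’s published sampling protocol; the area and household names are hypothetical.

```python
import random

def multistage_sample(systems, n_systems, households_per_system, seed=0):
    """Two-stage cluster sample: first draw local 'open systems'
    (e.g. supply chain areas), then subsample households within each.

    `systems` maps a system name to its list of households.
    Illustrative sketch; names and stage structure are assumptions.
    """
    rng = random.Random(seed)
    # Stage 1: randomly draw the local systems (the main sample unit).
    sampled_systems = rng.sample(sorted(systems), n_systems)
    # Stage 2: randomly subsample households within each sampled system.
    return {
        s: rng.sample(systems[s], min(households_per_system, len(systems[s])))
        for s in sampled_systems
    }

# Usage: a frame of 30 hypothetical supply chain areas; sample 10 areas,
# then 20 households within each sampled area.
frame = {f"area_{i:02d}": [f"hh_{i:02d}_{j:03d}" for j in range(200)]
         for i in range(30)}
sample = multistage_sample(frame, n_systems=10, households_per_system=20)
assert len(sample) == 10 and all(len(v) == 20 for v in sample.values())
```

Sampling whole clusters first is what keeps fieldwork logistics manageable: each sampled ‘system’ becomes one fieldwork site within which all methods are applied.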

The third PIALA element concerns the appropriate selection and mixing of methods and processes
(mostly participatory) to collect qualitative and quantitative data on the different causal claims in the
ToC. This is done in a way that makes it possible to build the actual causal paths with the data
mirroring the ToC for each locality or sampled ‘system’, and to compare these across the entire
sample of ‘systems’ as the basis for aggregating the findings for the programme as a whole. The
methods therefore are more-or-less standardized but sufficiently open-ended to draw out the varying
causal and contextual elements. Sensing tools such as causal flow mapping are included to capture
unintended effects and influences and uncover broader dynamics interacting with the programme. The
methods and tools also complement and build on each other analytically, each focusing on different
levels in the ToC, to investigate the cascading causes and effects and enable the construction of the
causal paths with the data. The methods further partly overlap to permit triangulation, each inquiring
one step further up and one step further down the causal chain. Systematic data collation and quality
monitoring make it possible to crosscheck and link the data for building the causal paths during
fieldwork in every locality, and to identify in good time any data gaps or weaknesses needing further inquiry before
moving to the next locality. A standard collation matrix is used for this, in which all data is entered
and ordered alongside the ToC (IFAD & BMGF, 2015).
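As a rough illustration of how such a collation matrix can support quality monitoring during fieldwork, the sketch below orders evidence by ToC link and method and flags links that cannot yet be triangulated. The link codes, method names and the minimum of two methods are hypothetical assumptions, not the actual matrix format used in the pilots.

```python
# Field collation matrix sketch: rows are causal links in the ToC,
# columns are methods; cells hold evidence collected in one locality.
TOC_LINKS = ["inputs->production", "production->livelihoods",
             "livelihoods->income"]
METHODS = ["household_survey", "focus_group", "key_informant_interview"]

def empty_matrix():
    # One empty evidence cell per (ToC link, method) pair.
    return {link: {m: [] for m in METHODS} for link in TOC_LINKS}

def record(matrix, link, method, evidence):
    matrix[link][method].append(evidence)

def data_gaps(matrix, min_methods=2):
    """Flag ToC links not yet covered by enough methods to triangulate,
    so gaps can be followed up before moving to the next locality."""
    return [link for link, cells in matrix.items()
            if sum(bool(v) for v in cells.values()) < min_methods]

m = empty_matrix()
record(m, "inputs->production", "household_survey", "yields reported up")
record(m, "inputs->production", "focus_group", "new seed varieties adopted")
print(data_gaps(m))  # the two uncovered links still need evidence
```

The point of the structure is that gaps surface per causal link, mirroring the ToC, rather than per method or questionnaire.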

6 Random sampling is needed for statistical analysis. This depends on the evaluation focus. In an impact evaluation for Oxfam GB in Myanmar, for instance, PIALA’s sampling protocol was adapted to fit the specific evaluation focus and requirements (Van Hemelrijck, 2016a).

Table 1 illustrates how this mixed-methods logic worked out in the IFAD pilot in Ghana.7 The codes8
in the top row reflect the type of causal link. A classic household survey and a participatory method
for generic change analysis (right side of the table) were used – the latter in gender-specific focus
groups – to take stock of the impacts of changes in livelihoods on household food and income. A
livelihood change analysis method, used in different but also gender-specific focus groups, was
triangulated with the generic change analysis (centre of the table) to further investigate the effects of
the observed livelihoods changes and the changes in production and market systems that caused these
changes. The tools used in these two methods were largely PRA-based. In the RTIMP evaluation (the
second PIALA pilot), the livelihood analysis also included a brief SenseMaker9 study. Constituent
feedback10 was further triangulated with the livelihood analysis (left side of the table) to investigate
the functioning, reach and effects of the mechanisms developed by the programme on the changes in
the various areas affecting livelihoods. Last, key informant interviews (KIIs) were conducted with
local, regional and national stakeholders to crosscheck the evidence from the Constituent Feedback
and Livelihood Analysis (IFAD & BMGF, 2015).

<insert Table 1>

Participatory sense-making, which is the fourth PIALA element, involves local workshops organised
during fieldwork in each locality, and a programme-level workshop right after finishing fieldwork and
before turning to the final analysis and reporting. In these sense-making workshops, stakeholders
discuss the evidence and value the observed contributions (among other influences), by comparing the
actual causal paths revealed by the data with those hypothesised in the ToC. The stakeholders
participating in the workshops typically include decision-makers, service providers and impact
populations (e.g. small farmers, traders and enterprises). They represent all the different perspectives
necessary to cross-validate the evidence and inform final analysis. Sense-making in all researched
localities and at the aggregated level serves not only to crosscheck and strengthen the evidence, but

7 Source: Presentation of the results and reflections from the impact evaluation of the RTIMP (IFAD, Rome, 26 October 2015). Published on: https://www.ifad.org/documents/10180/d791c8af-51f2-4768-afef-49caf12b7ae8. More information on the specific hypotheses, links and questions inquired by each of these methods can be found in the evaluation’s design paper, published on: http://www.participatorymethods.org/sites/participatorymethods.org/files/IFAD-RTIMP%20Impact%20Evaluation%20Design%20Paper%20Dec%202014.pdf.
8 In the actual ToC of the RTIMP, these were much more complex. The numbering in the codes reflects the contribution claim and level in the ToC.
9 SenseMaker® is a software-based methodology developed by Cognitive Edge (cf. http://www.sensemaker-suite.com) that facilitates mass ethnography and provides a way of nearly real-time mapping of social interactions and individual perceptions and motivations to inform adaptive management and policy formulation. It collects large amounts of self-signified micro-stories capturing people’s experiences and how these shape their perceptions of past and future change in ways that enable us to identify emerging patterns of actions and decisions. The software permits statistical analysis at a very large scale.
10 Constituent Feedback (also called Constituent Voice) is a methodology developed by Keystone Accountability (cf. http://www.keystoneaccountability.org) for collecting quantified feedback and engaging in dialogue with key constituents or beneficiaries, using standardized metrics similar to the customer satisfaction surveys developed in the private sector, and descriptive statistics to produce visual data reports.

also to create ownership, enable voice and stimulate systemic learning. In essence, it makes an
evaluation more democratic (Van Hemelrijck, 2016b).

Configurational analysis, finally, which is the fifth PIALA element, enables us to compare systemic
change and impact across the sample of systems to reach conclusions about distribution and
magnitude of impact. It uses elements of process tracing, contribution scoring and cross-tabulation,
and involves four major steps. The first is the aggregated data collation in a standard Excel matrix
format, in which all evidence from the field collation matrices as well as secondary sources is
synthesized and tabulated alongside the ToC. The next step involves the clustering of the evidence
across all the sampled ‘systems’ to surface patterns or configurations of system changes and causal
attributes. The third step involves the comparative analysis of similarities and differences in
configurations for the specific mechanisms or parts of the system of interest (including cases with and
cases without functioning mechanisms).11 The last step then involves the integration of findings for the
different parts and mechanisms as the basis for validating (or refuting) the hypotheses in the ToC,
zipping up the findings alongside the ToC, and drawing conclusions about the distribution and
magnitude of the programme’s contributions to impact (IFAD & BMGF, 2015; Van Hemelrijck,
2016b, 2017).
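The clustering and comparison steps (the second and third of the four steps above) can be illustrated with a minimal cross-tabulation of cases with and without a functioning mechanism. The field names and data below are invented for illustration and do not reproduce the pilots’ actual analysis.

```python
from collections import Counter

# Each case is one sampled 'system', coded for the presence of a
# programme mechanism and of the outcome of interest (hypothetical data).
cases = [
    {"system": "area_01", "mechanism_works": True,  "livelihood_improved": True},
    {"system": "area_02", "mechanism_works": True,  "livelihood_improved": True},
    {"system": "area_03", "mechanism_works": False, "livelihood_improved": False},
    {"system": "area_04", "mechanism_works": True,  "livelihood_improved": False},
    {"system": "area_05", "mechanism_works": False, "livelihood_improved": False},
]

# Cluster the cases into configurations of (mechanism, outcome).
crosstab = Counter(
    (c["mechanism_works"], c["livelihood_improved"]) for c in cases
)

# Compare configurations with and without a functioning mechanism.
with_mech = crosstab[(True, True)] + crosstab[(True, False)]
improved_with_mech = crosstab[(True, True)]
print(f"{improved_with_mech}/{with_mech} systems with a functioning "
      f"mechanism showed livelihood improvement")
```

In the actual analysis many more causal attributes are compared per configuration; the cross-tabulation above only shows the basic move from case-level evidence to patterns across the sample.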

The five PIALA elements can be adapted to the specific needs and conditions of a programme, as long
as the design remains consistent with PIALA’s two key principles of evaluating systemically and enabling
meaningful participation (Van Hemelrijck, 2016b). The design and conduct of an evaluation using
PIALA further combines standards of rigour, inclusiveness and feasibility to achieve optimal value
with limited budgets:

• Rigour is defined as the quality of thought put into the methodological design and conduct of
the evaluation to enable robust triangulation of different methods and perspectives in ways
that defeat bias or the dominance of a single truth and ensure consistency while remaining
responsive to local contexts and conditions. PIALA builds on the premise that bias cannot be
avoided by a single method or procedure but can be mitigated through systematic triangulation
of different methods and perspectives (Camfield et al, 2014; Carugi, 2016; Mertens, 2010);
• Inclusiveness refers to the legitimacy of the ways in which people are engaged in the
evaluation, and to the level of impartiality or inclusion of all stakeholder views and
perspectives, which has intrinsic empowerment value but also contributes to the robustness

11 Software such as EvalC3 can be applied to assess the conjuncture of different mechanisms and causal processes. This novel software developed by Rick Davies was piloted in an impact evaluation using PIALA for Oxfam GB in Myanmar (cf. Van Hemelrijck, 2016a). The software helped to identify sets of causal attributes that are necessary and/or sufficient for specific sets of outcome attributes to occur, and to compare and evaluate the performance of these causal models to find those with the greatest predictive power.

and credibility of the evidence and thus to the validity12 of the findings (Chambers, 2015;
Pawson, 2013);
• Feasibility, finally, concerns the budget and capacity needed to meet the expectations of
rigour and inclusiveness and enhance learning (Van Hemelrijck & Guijt, 2016).

The two IFAD cases

PIALA was piloted with support from IFAD and the Bill & Melinda Gates Foundation (BMGF) in the
evaluation of two IFAD-funded programmes. The first was of the Doing Business with the Rural Poor
(DBRP) programme in Southern Vietnam, conducted in one province in the Mekong Delta13. The second
was of the Roots & Tubers Improvement and Marketing Programme (RTIMP) in Ghana, conducted
country-wide14. Both programmes aimed at improving livelihoods and increasing food and income
security by enhancing smallholders’ capacity to commercialize and linking local businesses to markets
and industries. DBRP focused on developing diversified short value chain systems to achieve this;
RTIMP was concerned with developing much longer commodity chains linked to national and export
markets and industries (IFAD & BMGF, 2014; MOFA/GOG et al, 2015). The programmes essentially
sought to create the mechanisms needed to facilitate rural people’s access to services, resources and
markets. Text box 1 below summarises their ToC narratives (IFAD & BMGF, 2014; MOFA/GOG et
al, 2015).

<insert Text box 1>

Despite some important differences in the context and quality of the two evaluations (further discussed
in the next section), both produced quite convincing evidence of programme contributions to
livelihood improvements as a result of increased access to services, resources and markets generated
through the mechanisms put in place by the programmes. Although positive, the evidence also showed
that these improvements were rather limited, fragile and susceptible to climate and market shocks,
particularly for poor and vulnerable households and in remote and marginalised areas. Both in Ghana
and in Vietnam, poor and vulnerable households ran considerable risks by engaging in value chains
and accessing markets. These risks were left largely unmitigated due to: (a) inadequate market linking

12 ‘Validity’ is understood as the extent to which findings are well founded, based on robust evidence and corresponding with the reality of all the populations of the project being evaluated.
13 The DBRP was implemented from 2008 to 2014 in two provinces (Cao Bằng and Bến Tre) with a total budget of USD 51 million, including a USD 36 million loan from IFAD. The evaluation was conducted in 2013 at a cost of USD 90,000 and in Bến Tre province only, where the project was implemented in 50 of 164 communes in eight of nine districts (IFAD & BMGF, 2014). The evaluation report is published on IFAD’s website: https://www.ifad.org/documents/10180/841677fb-7329-462e-866d-efbaf5edc6cb.
14 The RTIMP was implemented from 2007 until 2015 as a national programme in 106 of 216 districts spread across all ten regions countrywide, with a total budget of USD 24 million, of which USD 19 million was financed under an IFAD loan. The evaluation was conducted countrywide after project completion in 2015, at a cost of USD 233,000, and covered the post mid-term review period from 2010 to 2014 (MOFA/GOG et al, 2015). The evaluation report is published on IFAD’s website: https://www.ifad.org/documents/10180/7b74a2e6-e4bc-4514-a99e-44e0ee9adb7f.

and forecasting to avoid local market saturation and monopolisation; and (b) inadequate poverty targeting
to ensure that support services and mechanisms are inclusive, sustainable and tailored to the needs of
vulnerable groups and households (Van Hemelrijck, 2016c). Recommendations were made by the two
evaluations to address these issues in IFAD-funded programmes and projects that are similar to the
DBRP and RTIMP.

The DBRP evaluation in Vietnam showed a positive trend of poor households becoming middle-
income, and middle-income households becoming well off. However, while one-third of poor
households had experienced an increase in their incomes, roughly the same proportion had seen a
decline. Engagement of women and men in different types of short value chains supported by the
programme (e.g. ornamental trees, fruit, rice cultivation, shrimp farming, cattle production, pig
raising) generally increased between 2007 and 2012 (and for women slightly more than for men).
Local trade increased, and more jobs became available for landless households, but outmigration
of youth also increased. Construction of roads, bridges and market places and provision of training and
services substantially contributed to the creation of market and job opportunities. Uptake of loans
provided through the DBRP mechanisms, however, remained limited. The relative benefits of
training and support services for poor and near-poor households also remained unclear, raising questions
around the effectiveness of the programme’s targeting and outreach strategy (IFAD & BMGF, 2014).
Although the evidence suggested strong connections between all observed changes, confidence in
causal inference remained relatively weak. Data collation, crosschecking and quality monitoring
were not yet done systematically with the ToC as a backbone structure in this first PIALA pilot.

Conversely, the RTIMP evaluation in Ghana showed significant improvements in roots- and tubers-
based livelihoods, with 15% of households increasing their incomes above USD 2/day, in turn
positively affecting household access to food. Very weak or no improvements were found in supply
chain areas where the RTIMP mechanisms were dysfunctional or not in place. Improvements, though,
occurred only between 2009 and 2013 and in about 52% of the supply chain areas (or half of the
country). Moreover, no households gained profits above USD 4/day from roots and tubers, even
though 61% of the households had invested in roots and tubers businesses. Access to new seed
varieties and farming technologies initially had led to a boom in roots and tubers production across the
country, triggering a spillover into processing. Adoption of improved processing technologies, though,
remained limited in 83% of the cases, partly due to limited investment capital. By and large, the
finance mechanism put in place by the programme proved inaccessible, as it required pre-investment
without short-term capital return, posing high risks for smallholders. Weakening demand and
insufficient market linking exacerbated by an economic downturn caused a decline in producer prices
for roots and tubers from 2013 onward. Poor infrastructure and land tenure insecurity further
restrained the potential and incentives for smallholder innovation and value creation (MOFA/GOG et

al, 2015). Confidence in causal inference and generalizability of these findings was relatively strong
because of the evaluation’s systematic and multi-layered crosschecking and cross-validation procedure (further
discussed in the next section).

Key insights from the piloting

As mentioned earlier, the PIALA initiative was conceived as action research to enquire into the
conditions, processes and decisions affecting rigour and inclusiveness in the two pilots. Combining
multi-sited ethnography with cooperative inquiry, it involved extensive reflections with researchers
and participants in the two pilot countries and feedback sessions with global experts at IFAD
headquarters (Van Hemelrijck, 2016b). Insights from the first pilot in Vietnam helped to better
address the challenges in the second pilot in Ghana (IFAD & BMGF, 2015). This section
summarizes some key lessons related to: (1) creating ownership; (2) deciding on the evaluation scope
and scale; (3) deciding on the counterfactual; (4) maintaining independence; (5) contextualising
poverty analysis; (6) dealing with power and bias; and (7) deciding on the scale and level of
engagement in the sense-making.

(1) Creating ownership of the evaluation

To create ownership, key stakeholders need to be sufficiently engaged in the framing and focusing of
the evaluation. Ownership implies that the evaluation is wanted, legitimized and enabled by a shared
sense of responsibility for its success. Ownership also enables participation in the analysis and
facilitates learning and greater uptake of evaluation findings and recommendations (Burns & Worsley,
2015; Patton, 2011). In the case of PIALA, stakeholders are engaged in the framing and focusing of
the evaluation through a process of reconstructing and visualising the ToC (Van Hemelrijck & Guijt,
2016).

In Vietnam, insufficient time and budget were spent on this process, affecting the rigour and
inclusiveness of the approach during the entire evaluation. A brief workshop was organised with the
programme steering committee and managers to discuss the programme logic and expectations for the
evaluation. The process of reconstructing and visualising the ToC, however, happened after the
workshop and almost independently of the design of the evaluation. The evaluation was guided by the
generic questions taken from the PIALA strategy, instead of questions specifically formulated to
enquire the causal links and assumptions in the DBRP’s ToC. As a result, the evaluation remained too
broad and the focus unclear, making it difficult for the researchers to relate the evidence back to the
ToC and arrive at greater precision in causal analysis. Furthermore, limited ownership of the ToC by
the stakeholders also hindered their critical engagement in sense-making and thus valuing
contributions as the basis for determining strategic directions for future programming (IFAD &
BMGF, 2013a).

Armed with these insights, the ToC process was made a priority and key deliverable in Ghana.
Researchers were required to lead on the ToC process and organise a design workshop to discuss the
ToC and the evaluation design options and determine the focus of the evaluation together with key
stakeholders. This investment in a more robust and collaborative ToC process bore fruit and laid the
foundation for attaining greater quality throughout the entire evaluation, resulting in stronger evidence
and ownership of findings (IFAD & BMGF, 2015; Van Hemelrijck, 2016b).

(2) Deciding on the scope and scale of the evaluation

PIALA’s mixed-methods approach pursues breadth through a household survey and participatory
inquiry across a representative sample of ‘systems’ on the one hand, and depth through a more focused
participatory investigation within each of the sampled systems on the other. The various components and
mechanisms to be covered in each of the sampled systems, relative to their overall span and complexity,
determine the required depth or scope of the evaluation. The size of the sample of open systems relative
to its total population size marks the evaluation’s breadth or scale. The larger the scale, in general, the
more relevant the findings for national and global policy and advocacy will be. But scale also makes it
more challenging to attain rigour, as mixed methods (particularly participatory ones) become more onerous
when used at a larger scale, particularly when resources are also tight. The PIALA pilots have shown
that depth of analysis does not necessarily have to be sacrificed for breadth of coverage (or vice versa), if
sufficient capacity and motivation to deliver quality are present. In contexts where local research
capacity is weak, the ambitions in terms of scale or scope need to be lowered and more resources spent
on training, coaching and supervision (IFAD & BMGF, 2015).

Three relevant design options are available for designing an impact evaluation: full scope - limited
scale, limited scope - full scale, and full scope - full scale.15 When choosing a full scope - limited
scale design, the emphasis is on learning about the project’s total contribution to impact in select cases
under specific conditions. Here the ToC approach is most useful for systemic learning with key
stakeholders. Fieldwork and analysis are less resource-intensive if the focus is merely on learning,
given the relatively small sample sizes. However, unless the project/programme itself is implemented
at a limited scale16, evaluation findings will not be generalizable to the entire population, and therefore
less useful for influencing policy decisions. When choosing a ‘limited scope - full scale’ design, on the
other hand, the purpose is to assess the effects of one or two particular aspects or mechanisms of the
project. A ToC is, strictly speaking, not necessary to conduct such a narrow study. Skipping the ToC process

15 A limited scope - limited scale option is not really relevant for impact evaluation but more for performance assessment (e.g. cost-effectiveness study) as it limits the possibility of causal analysis through classic counterfactual comparison, frequency statistics and/or triangulation and cross-validation of sources and methods.
16 In the case of small N projects/programmes, larger within-samples and more stringent triangulation and cross-validation procedures (instead of scale as defined above) will take up the resources to attain the required level of rigour for generalisation.

may save time and budget, but risks missing out on systemic understanding, leading to flawed
conclusions. Components are studied in isolation, which does not permit analysis of systemic
interactions. For example, a cost-effectiveness study of Farmer Field Forums (FFF) in Ghana
recommended scaling up, as the high adoption of new technologies had proven the success of this
mechanism. The PIALA evaluation, however, showed that in a period of weak economic growth this
success in fact contributed to market saturation, negatively affecting livelihoods across the entire
country (Ibid; MOFA/GOG et al., 2015).

In Vietnam, a choice was made for a full scope - limited scale design, but with a disparate scale for the
different methods. To save resources, the participatory research was conducted only in a subsample
drawn from the sample of villages where household surveys were conducted. The assumption was that
this would be sufficient for a full scope inquiry into contributions to rural poverty impact for the
entire programme area. This generated a disparity between the data sets that caused problems for
their subsequent linking. While the participatory data on causes and contributions were case-specific
and limited to only a few villages, the survey data on household impact came from a much bigger
sample unrelated to the specific cases or villages covered by the participatory methods. This
hindered causal inference (IFAD & BMGF, 2013a).

In Ghana, by contrast, a conscious choice was made to employ all methods in the same sample and at
the same scale. The three design options were discussed with clients and commissioners17 before any
procurement or design work was started, giving them a basic understanding of the cost and utility of
each. As the future Ghana Agriculture Sector Investment Programme (GASIP) was expected to scale
up most of the RTIMP mechanisms, the evaluation was found necessary for both reporting and
learning. The commissioners therefore chose the most substantive and expensive design option:
full scope - full scale. The implication was six weeks of uninterrupted fieldwork, much more
intensive than in Vietnam, where fieldwork took only two weeks. Budget-wise it was tighter than in
Vietnam because of the larger scale and scope. Quality was upheld, though, by working
with a highly competent and motivated research team (IFAD & BMGF, 2015; MOFA/GOG et al,
2015).

(3) Deciding on the counterfactual

Mainstream impact evaluation assumes that comparative analysis of evidence from treated and non-
treated locations is feasible and necessary to reach generalizable conclusions about impact on rural
household poverty. In most ‘real world’ evaluation contexts (Bamberger et al, 2012), however, it is
very difficult and costly to arrive at an accurate assignment of locations to specific interventions and

17
Including: the Ministry of Food & Agriculture (MoFA) of the Government of Ghana (GoG), and the IFAD Country Office
in Ghana.

11
an identification of credible control groups. The challenge occurs for instance in cases of unexpected
or uncontrolled project expansion and/or spill-over combined with high causal density of other
interventions and influences. In such contexts, it is difficult to discern project from non-project
localities and find the right matches (Woolcock, 2009). In addition, the ‘open systems’ that form the
principal sample unit in PIALA evaluations generally don’t have clear boundaries (as villages or
other administrative units do). Hence the identification and matching of control units for these
‘systems’, and the subsampling of various populations from them, if possible at all, require
fieldwork prior to the evaluation, which substantially increases the cost.

In the Vietnam pilot, we assumed that comparative analysis of treated and non-treated units would be
possible and necessary to assess household-level impacts, and that the village constituted the best
proxy unit for inquiring into the short value chains developed by the programme. These assumptions
proved flawed and compromised analytical rigour, making it difficult to generalize findings, for three
important reasons. First, without a clear definition and identification of the value chain systems, and
thus without having sampled proper proxies based on such a definition, it was difficult to relate the
data on changes in capacities, institutions and livelihoods to the specific value chains and assess the
causal links. Second, the matching of treated and non-treated villages was based on variables that
applied to the village as a unit, and not to the value chains, making it difficult to compare and relate
the data collected on the changes in the village and impacts on the households to the value chain
interventions. Last, the high heterogeneity in programme delivery and the incoherence of its value
chain linking efforts, further compounded by the high causal density of other programmes and
influences in the villages, made it all but impossible to obtain credible control data (IFAD & BMGF, 2013a).

Learning from this, much more work was done to understand and define the principal sample unit in
Ghana. In the evaluation design workshop, the decision was made not to drain resources into the
identification and inquiry of household-level control groups, but instead concentrate on analysing the
supply chain systems of the four major commodities supported by the programme. The supply chains
constituted loose catchment areas comprising clusters of communities of smallholders that supply raw
products to a small enterprise or industrial off-taker called the supply chain leader, who then
manufactures higher-value products out of the raw material for bigger markets. The chains were not
entirely homogeneous: they interacted and overlapped and in practice often differed from what was
sampled on paper. In some cases, suppliers produced different crops supplying different buyers and
markets; in other cases multiple chains intertwined. Even though the supply chain systems for each
commodity shared a similar structure, ensuring the data collected on these systems remained
comparable required much creativity and coordination (Chambers, 2015). Furthermore, no reliable
lists of households and beneficiaries were available for subsampling from the sampled supply chains.
Identification and matching of control units and sampling of households thus would have required

extensive pre-evaluation fieldwork. The participants in the design workshop voted against this. The
traditional comparison with control groups, therefore, was replaced with a configurational analysis
method that draws on systemic heterogeneity as the basis for counterfactual analysis. Supply chains
with different systemic configurations of treatments and causal attributes were randomly sampled
(with probability-proportional-to-size) from the four commodities’ supply chain populations. The
samples were large enough to include supply chains with dysfunctional or absent programme
mechanisms to serve as a ‘natural counterfactual’ (MOFA/GOG et al., 2015).
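The probability-proportional-to-size sampling described here can be pictured with a standard systematic PPS routine. The following is a minimal illustration only; the supply-chain names and supplier-community counts are hypothetical, not taken from the evaluation:

```python
import random

def pps_systematic_sample(units, sizes, n):
    """Systematic probability-proportional-to-size (PPS) sampling:
    a larger unit (e.g. a supply chain with more supplier communities)
    has a proportionally higher chance of being selected."""
    total = float(sum(sizes))
    interval = total / n            # sampling interval along the cumulative sizes
    start = random.uniform(0, interval)
    points = [start + i * interval for i in range(n)]
    selected, cum, j = [], 0.0, 0
    for unit, size in zip(units, sizes):
        cum += size
        while j < len(points) and points[j] <= cum:
            selected.append(unit)   # a very large unit can be drawn more than once
            j += 1
    return selected

# Hypothetical supply chains with their number of supplier communities
chains = ["gari-A", "gari-B", "HQCF-A", "flour-B", "starch-A", "starch-B"]
sizes = [40, 25, 10, 8, 30, 12]
sample = pps_systematic_sample(chains, sizes, n=3)
print(sample)  # three chains, drawn with probability proportional to size
```

The random start combined with a fixed interval is what keeps the selection probability proportional to each chain's size while still spreading the sample across the whole list.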

(4) Maintaining independence

Field mobilisation of research participants is best undertaken independently from project management
to avoid positive bias.18 When research participants suspect the research is not independent, they are
more likely to over- or under-report. On the other hand, they are unlikely to trust outsiders who are not
authorised and formally introduced by their leaders. Thus for the researchers to organise fieldwork at
scale, and mobilise participants without any help from the programme, they need to be good at
logistics, know about the areas and local customs, and be able to obtain authorisation and introduction
in ways that do not affect their independence. In contexts where this is not possible, strong
facilitation skills are needed to minimise undue influence or interference (IFAD & BMGF, 2015;
MOFA/GOG et al., 2015).

The challenges encountered in Ghana were quite different from those in Vietnam. In both pilots,
though, participants clearly trusted that the researchers were authorised and independent, which made
them feel safe to express their views and critically engage in the group inquiries. In Vietnam, field research
cannot be conducted without government permission and interference. In evaluations of government
programmes, logistics are taken care of by the government. Hence in the DBRP evaluation, local
transportation and mobilisation was organised by local officials and programme staff. This made the
research highly efficient but challenged the researchers’ independence. Although quite collaborative
and supportive, local leaders and programme staff were omnipresent during fieldwork. Yet the
researchers artfully managed to keep them at a distance and safeguard the privacy of the focus groups
(IFAD & BMGF, 2013a).

In Ghana, the researchers took care of the transportation and mobilisation entirely by themselves and
without prior notification or engagement of the local officials, allowing for much greater
independence. Staff and officials were present only at the discussions in which they were invited to
participate as research participants. This of course made them more suspicious of and resistant to the
evaluation. Also the scale of the fieldwork, the remoteness and spread of the communities, the large

18
According to the OECD (2010, p. 14; 2002, pp. 24–25), “independence” implies an evaluation process that is transparent,
independent from project management and free from political influence or organisational pressure.

distances to travel over often very poor roads, and the difficulty to find safe and trusted locations for
convening people from different communities, made the field inquiries quite onerous. Independence
thus came at serious effort and cost in Ghana, but without compromising rigour or inclusiveness
(IFAD & BMGF, 2015; MOFA/GOG et al., 2015).

(5) Contextualising poverty analysis

To make it possible to say something about a programme’s influences on poverty, data on these
influences and on poverty need to be linkable. Hence poverty has to be defined in ways relevant to the
context and conditions of the villagers in the programme area (also called construct validity). In both
the pilots in Vietnam and Ghana, a PRA tool known as participatory ranking was used for identifying
locally relevant characteristics of wealth and wellbeing and analysing changes in relative poverty
status (IFAD & BMGF, 2015). The tool helped create a shared understanding among the
participants of their wealth and wellbeing as the basis for a causal flow mapping exercise around the
changes they had experienced therein. Furthermore, it permitted crosschecking and linking of the
participatory data with the household survey data.

The characteristics of wealth and wellbeing obtained from the participatory ranking exercise, however,
were not used as indicators for assessing changes in poverty status through the household survey. This
would have required participatory data collection prior to the evaluation as an input for designing the
household survey, which would have increased the cost of the evaluation. It is unclear if, and to what
extent, this might have generated more rigorous findings on poverty impact justifying the extra
investment.

In Vietnam, the survey focused only on IFAD’s generic poverty indicators and used the absolute
poverty categories of the Vietnamese government, which are purely income-based and proved
inadequate for assessing changes in poverty status related to the programme. Learning from this,
greater efforts were made in Ghana to ensure that the household survey questionnaire was sufficiently
attuned to the reality on the ground. Poverty characteristics corresponding with IFAD’s poverty
indicators were selected from the Ghana Living Standards Survey 2009-2014 for assessing the
households and computing the categories of poverty status (applying a proxy means test and principal
components analysis). No major differences were found between the characteristics obtained from the
participatory ranking exercise and those used by the household survey (IFAD & BMGF, 2015;
MOFA/GOG et al., 2015).
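The logic of scoring households on a set of characteristics and deriving relative poverty categories through principal components analysis can be sketched as follows. This is a minimal numpy illustration, not the actual proxy means test specification used in Ghana; the indicator columns and household data are invented:

```python
import numpy as np

# Hypothetical binary indicators per household (1 = yes, 0 = no), e.g.
# improved roofing, livestock ownership, bicycle, mobile phone.
X = np.array([
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 1, 1, 1],
], dtype=float)

# Standardise each indicator, then take the first principal component
# as a composite wealth index.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xs, rowvar=False))
pc1 = eigvecs[:, -1]            # loading vector with the largest eigenvalue
index = Xs @ pc1                # one wealth score per household
# The sign of pc1 is arbitrary; orient it so that more assets = higher score.
if np.corrcoef(index, X.sum(axis=1))[0, 1] < 0:
    index = -index

# Cut the scores into relative poverty categories (here: terciles 0, 1, 2).
categories = np.digitize(index, np.quantile(index, [1/3, 2/3]))
print(categories)
```

The point of the exercise is that the resulting categories are relative (terciles of the index), which is why they can be crosschecked against the relative poverty status obtained from the participatory ranking.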

Arguably, greater rigour could have been obtained in the findings on poverty distribution and impact if
the questionnaire had inquired into household characteristics in much greater detail. But lengthier
surveys cost more while also increasing the risk of fatigue and gaming on the part of both
respondents and researchers (Chambers, 2008). Therefore, in both Vietnam and Ghana, the

survey was limited to a twenty-minute interview, and also the participatory group discussions were
kept within a two-hour limit (IFAD & BMGF, 2015).

Thus instead of spending more resources on collecting and analysing participatory poverty
characteristics prior to the evaluation, or on collecting more fine-grained data on household
characteristics to identify poverty categories, in Ghana the choice was made to keep the poverty
analysis short and instead create more room for participation in ways that are meaningful to the
participants. This is what Chambers (2015) calls “appropriate imprecision”. The group-based causal
flow mapping exercises in particular were found quite useful by the participants, as they helped them
recall and understand the changes from a systemic perspective and equipped them to engage in
collective sense-making of evidence and in discussions of contributions and responsibilities with other
stakeholders. The assumption is that this contributes to people’s ability to understand and navigate the
system and thus to their empowerment (Burns & Worsley, 2015; Merrifield, 2002).

(6) Dealing with power and bias

All methods are susceptible to bias, and biases may occur in every phase of the evaluation, from the
design to the analysis. Participatory methods, however, are considered more vulnerable than
traditional survey-based methods, as they collect perceptions, meanings and interpretations instead of
hard numbers. Yet the surveys generating these hard numbers are also designed and conducted by human
beings with value judgements (Camfield et al., 2014; Copestake, 2013; H. White & Phillips, 2012).
Survey questionnaires tend to reflect the assumptions of the designers, while qualitative interviews
and participatory inquiries give room to the assumptions of the researched. Both may result in
desirability or courtesy bias: the researched tell the researchers what they think is wanted. As Sarah
White (2015, p. 138) illustrates in her review of a large-n RCT evaluation of girls’ empowerment in
Bangladesh: “hypothetical questions are susceptible to ‘desirability bias’ in which respondents give
the answer they believe the researchers wish to hear.” To overcome potential bias, the PIALA
evaluations employed mixed-methods in a way that enables extensive triangulation19 of different
methods and perspectives and cross-validation of findings at scale.

Yet scale may also create bias as it requires standardization that tends to instrumentalize participation
and de-politicize research context and relations, thus increasing the risk of power-blindness and bias
(Gaventa & Cornwall, 2006; Mosse, 2001). In the two pilots, attempts were made to avoid this by

19
Triangulation is a principal social science technique that involves the use of more than one type of information or data
source, method, and even theory and researcher, for the purpose of crosschecking in order to overcome weaknesses and
biases and thus obtain greater credibility of and confidence in findings (cf. http://www.betterevaluation.org/en/evaluation-
options/triangulation). In PIALA, this goes beyond merely verifying findings: it values different views and perspectives and
crosschecks them to build a rich and comprehensive picture of the change processes as the basis for identifying and checking
all plausible explanations for causality. Cross-validation in the case of PIALA is understood in “realist evaluation” terms as
the practice of (dis)confirming findings across multiple independent inquiry cases to strengthen the explanatory power and
the confidence in the conclusions about causality and contribution (cf. Pawson, 2013).

arranging and facilitating the group processes in ways that even out power imbalances (e.g. by
making sure that in mixed groups power holders remain in the minority), and by using visual tools
such as causal flow mapping and matrix scoring. Visual tools enable people to see how data is
constructed and indicate where things are flawed, while appropriate group composition and facilitation
strengthen their democratic voice (IFAD & BMGF, 2015). There is, of course, always a danger of
more powerful and influential participants dominating the discussions. Good facilitators, though,
know how to counteract this. Rigour then emanates from the sharp observation of motives and interactions
through a systematic reflective practice and crosschecking of different methods and sources
(Chambers, 2015). A strict procedure was developed and used in the PIALA pilots, involving six
essential steps:

1. Primary data collection in a standardized manner across the sample of systems, using at least
two different methods per type or area of change in the ToC, each with a minimum of two
different sources;
2. Daily reflections on the quality of the processes by which each of the methods was used, and
on the quality of the data generated by these processes;
3. Triangulation and identification of data gaps/weaknesses in each locality through the collation
of both primary and secondary data alongside the ToC using a standard collation table;
4. Additional data collection through KIIs, where needed, to address the gaps/weaknesses;
5. Facilitation of collective sense-making in each of the localities, engaging local stakeholders in
discussions around remaining gaps and contradictions in the evidence and cross-validation of
findings for their local area;
6. Facilitation of sense-making at the aggregated programme level, engaging local and
programme-level stakeholders in the cross-validation of findings for the entire programme
area and the valuation of programme contributions to impact.
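The collation and gap-checking in steps 1 to 4 can be pictured as a simple data structure: evidence recorded per causal link in the ToC, with a check flagging links not yet covered by at least two methods and two sources. This is a minimal sketch; the link names and entries are hypothetical, not taken from either pilot:

```python
from collections import defaultdict

# Collation table: for each ToC causal link, record (method, source, finding).
collation = defaultdict(list)

def record(link, method, source, finding):
    collation[link].append({"method": method, "source": source, "finding": finding})

def coverage_gaps(table, min_methods=2, min_sources=2):
    """Flag causal links not yet triangulated across enough methods and sources."""
    gaps = []
    for link, entries in table.items():
        methods = {e["method"] for e in entries}
        sources = {e["source"] for e in entries}
        if len(methods) < min_methods or len(sources) < min_sources:
            gaps.append(link)
    return gaps

record("FFF training -> adoption of new technologies",
       "causal flow mapping", "farmer focus group", "adoption widely reported")
record("FFF training -> adoption of new technologies",
       "household survey", "household heads", "most report using new varieties")
record("adoption -> improved livelihoods",
       "causal flow mapping", "farmer focus group", "income gains contested")

print(coverage_gaps(collation))  # -> ['adoption -> improved livelihoods']
```

The flagged links are exactly those for which step 4 of the procedure calls for additional key informant interviews before sense-making.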

In Vietnam, this procedure was found challenging by the researchers. Mostly coming from a
quantitative research background, they struggled with triangulation as a way to compile a multi-
perspective picture. They were unable to uphold a daily practice of critical reflection on quality and
process. Moreover, the tools for data collation and quality monitoring proved insufficient to
adequately guide them. As a consequence, data capture and collation (in steps 3 and 4 of the procedure)
were not as structured and rigorous as in Ghana (IFAD & BMGF, 2013a). In Ghana, by contrast, the
researchers had a mixed background and substantial experience in participatory research. Data
collation and quality monitoring were undertaken daily and systematically. A standard set of questions
guided the daily reflections on the inclusiveness of the participatory research processes and the quality
and sufficiency of the data obtained from these processes. A standard table structured around the
causal claims and links in the ToC (see Figure 3) was used for data collation and triangulation.

Methods were tightly focused on the causal links, making the triangulation much more straightforward
and systematic. A Likert scale rubric was used to score the relative strength of emerging evidence for
each of the causal links in the ToC. Better equipped to handle large amounts of data, the researchers
were able to finish the data collation and identify data gaps and weaknesses in each locality in good
time, and prepare well for the local sense-making workshops (IFAD & BMGF, 2015).

<insert Figure 3>

(7) Deciding on the scale and level of engagement in the sense-making

Participation in evaluation is purely extractive if findings are not returned to the participants and there
is no opportunity for them to contest and debate the findings (Gaventa, 2004; Mohan & Hickey, 2004).
Using PIALA’s sense-making model, six village-level workshops with 180 participants and one
provincial workshop with 100 were organized in Vietnam, and 23 district workshops with 650
participants and one national workshop with over 100 in Ghana. The participants in the workshops were
sampled purposively from the research participants (IFAD & BMGF, 2015).

The outcomes of the participatory sense-making were twofold. First, an additional layer of detail and
confirmation of evidence was obtained from the cross-validations in the local and aggregated
workshops, adding to the rigour of the evidence and thus to the validity of findings. Second, shared
ownership was created of the evidence and findings, contributing to the evaluation’s inclusiveness and
empowerment value. Having participated merely in data collection, people walked into the workshops
still knowing and owning little. They left the workshops with a comprehensive picture of the systemic
changes and the issues that the evidence had revealed, and of stakeholders’ different perspectives on
these.20 Critical to the success of the participatory sense-making was the scale of the workshops as
well as the way in which they were designed and facilitated. Special competencies are required
particularly for doing this at scale. When operating with low capacity and on a shoestring budget, the
number and size of the workshops may need to be trimmed, at the expense of both rigour and inclusiveness
(Van Hemelrijck & Guijt, 2016).

A truly participatory sense-making process implies equal and active engagement of all stakeholders.
Dynamic environments were created, long expert presentations banned, and the different types of
evidence made available in accessible (including visual) formats. Discussions were held in smaller
groups ensuring people felt ‘listened to’ rather than just ‘talked at’ (Newman, 2015). Beneficiaries

20
The survey and reflections held at the end of each workshop revealed a high degree of satisfaction among the participants,
as well as knowledge and insights gained that they found useful for future individual or collective action.

constituted over 30% of the participants in the provincial/national and 60-70% in the local workshops,
giving them sufficient weight in the debates with decision makers and service providers.21

Lessons were taken from Vietnam to help improve the sense-making model for Ghana. In Vietnam,
discussions took place mostly in mixed stakeholder groups and in plenary, which didn’t give the
farmers enough space to collect their thoughts and gain confidence. Learning from this, participants in
Ghana first worked in peer groups (organised around the part of the ToC representing their ‘patch’ in
the supply chain system –for instance, farmers discussing the part on production). In Vietnam, the
reconstruction of the causal flow was done in plenary, which again didn’t leave sufficient room for
farmers to engage. In Ghana, this was done in small mixed groups (organised around geographic
areas), with the farmers and processors systematically given the floor first, before all others, to present
their views. Plenary discussions took place only on the second day in a fish bowl set-up, in which
beneficiaries constituted the majority of the discussants. This was quite successful and provoked an
animated discussion with the bankers around the inaccessibility of the Micro-Enterprise Fund (which
represented a major failure of the programme) (IFAD & BMGF, 2013a, 2015).

Conclusions

The two pilots have shown that a participatory and systemic impact evaluation approach such as
PIALA can produce rigorous evidence useful for reporting and learning with partners and stakeholders
in contexts where traditional counterfactual analysis is not feasible. The pilots also have shown
PIALA’s potential empowerment value, or how engaging the impact populations or beneficiaries in
assessing and debating contributions to impact can contribute to enhancing impact even ex post.
Moreover, using the same methods and processes for collecting, cross-checking and analysing data in
the two impact evaluations has made it possible to compare findings and formulate conclusions and
recommendations with wider relevance for investments elsewhere, beyond the individual
programmes in question.

Compared to how much theory-based mixed-methods impact evaluations normally cost in countries
like Vietnam and Ghana, this was done on shoestring budgets. To give a sense of what other
approaches may cost: the estimated budget for a one-year Randomized Controlled Trial (RCT) study
in an IFAD-funded programme in Ghana similar to the RTIMP was around USD 200,000. This study
covered only one sub-component of the programme and eight districts in the North. The PIALA
evaluation of RTIMP, by contrast, cost USD 233,000 but covered the entire programme (consisting of
three components, each with two to three subcomponents) in thirty districts across the entire country.

21
Their group must be larger in the local workshops because they form the primary target group of the project at the local
level, while at the aggregated level the primary targets are policy makers and service providers.

Trade-offs occur in every evaluation aiming for greater value with limited resources. The PIALA
pilots have demonstrated that these can be turned into win-wins by carefully considering how rigour
and inclusiveness may reinforce each other and critically reflecting on the potential loss in value-for-
money if one were to be prioritized over the other (Van Hemelrijck & Guijt, 2016). Limiting
stakeholder engagement in the ToC process to save time and resources, for instance, leads to a
substantial loss of rigour in every phase of the evaluation. Conversely, reducing the scale and level of
engagement in sense-making limits cross-validation and thus also confidence in conclusions about
causality and contribution, while also reducing the inclusiveness and empowerment value of the
evaluation (Van Hemelrijck, 2016b). Reducing the sample size of the participatory inquiries to reduce
the cost not only limits the scale of participation and thus the ownership and potential uptake of
findings (Burns & Worsley, 2015), but also thwarts rigorous causal inference. Reducing the scope on
the other hand inhibits conclusion validity as it confines the systemic analysis and thus understanding
of non-linear impact trajectories (Woolcock, 2009).

The most essential conclusion is thus that inclusiveness and rigour can reinforce each other, and that
this is even more so when participatory processes and methods are employed at a larger scale.
People’s participation in impact evaluation can contribute to their understanding of the system in
which they function (Burns & Worsley, 2015) while also adding to the rigour and credibility of
findings, if it: (a) is inclusive and meaningful, enabling a robust crosschecking of many different
authentic voices; (b) avoids the dominance of any single truth, power or particular viewpoint, thus
mitigating bias; and (c) creates space for solid debate and equal voice, including of those who are least
powerful and least heard. Scale realised through rigorous sampling and representative inclusion of all
stakeholder perspectives makes it possible to generate knowledge that supersedes isolated anecdotes.
Moreover, it makes it possible to build contrasting evidence (or a ‘natural counterfactual’), thus
reducing doubt in causal inference. Rigour then emanates from the thoughtful design and facilitation
of participatory processes in ways that forestall the dominance of a single truth or viewpoint and
enable stakeholders to participate meaningfully. Important attributes of rigour are methodological
complementarity and consistency, and extensive and robust triangulation and cross-validation.

Critical to delivering quality at scale, clearly, is researchers’ capacity. Investing in such capacity
therefore helps reduce the cost over time while enhancing the value and uptake of
evaluation findings. For the broader development sector, this implies building capacity through
providing guidance and support to new experiments with approaches like PIALA. For IFAD and
partners, optimising value-for-money would imply for instance using PIALA as a longitudinal
approach integrated with programme design as a way to invest in building local research partnerships
and capacity for impact evaluation and bring quality, continuity and consistency to IFAD’s impact
learning agenda at the national and global levels.

References

Bamberger, M., Rugh, J., & Mabry, L. (2012). RealWorld Evaluation. SAGE.
Befani, B., Barnett, C., & Stern, E. (2014). Introduction – Rethinking Impact Evaluation for
Development. IDS Bulletin, 45(6), 1–5. Institute of Development Studies (IDS).
Burns, D., & Worsley, S. (2015). Navigating Complexity in International Development: Facilitating
sustainable change at scale. Practical Action Publishing.
Camfield, L., Duvendack, M., & Palmer-Jones, R. (2014). Things you Wanted to Know about Bias in
Evaluations but Never Dared to Think. IDS Bulletin, 45(6), 49–64. Institute of Development
Studies (IDS).
Carugi, C. (2016). Experiences with systematic triangulation at the Global Environment Facility.
Evaluation and Program Planning, 55, 55–66.
Chambers, R. (2008). Revolutions in Development Inquiry. Earthscan.
Chambers, R. (2015). Inclusive rigour for complexity. Journal of Development Effectiveness, 7(3),
327–335.
Copestake, J. (2013). Credible impact evaluation in complex contexts: Confirmatory and exploratory
approaches (Draft 18 Oct 2013). Centre for Development Studies, University of Bath.
Funnell, S. C., & Rogers, P. J. (2011). Purposeful Program Theory: Effective Use of Theories of
Change and Logic Models. John Wiley & Sons.
Gaventa, J. (2004). Towards Participatory Governance: Assessing the Transformative Possibilities. In
Participation: from Tyranny to Transformation? Exploring New Approaches to Participation
in Development. Zed Books.
Gaventa, J., & Cornwall, A. (2006). Challenging the Boundaries of the Possible: Participation,
Knowledge and Power. IDS Bulletin, 37(6). Institute of Development Studies (IDS).
Guijt, I., & Woodhill, J. (2002). A guide for project M & E: managing for impact in rural
development. International Fund for Agricultural Development (IFAD).
IFAD. (2011). IFAD Strategic Framework 2011-2015. Enabling poor rural people to improve their
food security and nutrition, raise their incomes and strengthen their resilience. International
Fund for Agricultural Development (IFAD).
IFAD. (2012). Methodologies for Impact Assessments for IFAD IX (Note to IFAD’s Executive Board
Representatives). International Fund for Agricultural Development (IFAD).
IFAD. (2016). IFAD Strategic Framework 2016-2025. Enabling inclusive and sustainable rural
transformation. International Fund for Agricultural Development (IFAD).
IFAD, & BMGF. (2013a). Improved Learning Initiative for the design of a Participatory Impact
Assessment & Learning Approach (PIALA): Insights and lessons learned from the reflections
on the PIALA piloting in Vietnam. International Fund for Agricultural Development (IFAD).

IFAD, & BMGF. (2013b). PIALA Research Strategy. Improved Learning Initiative (Internal
Document). International Fund for Agricultural Development (IFAD).
IFAD, & BMGF. (2015). Improved Learning Initiative for the design of a Participatory Impact
Assessment & Learning Approach (PIALA): Methodological reflections following the second
PIALA pilot in Ghana. International Fund for Agricultural Development (IFAD).
Levy, S., & Barahona, C. (2002). How to generate statistics and influence policy using participatory
methods in research (Working Paper). Statistical Services Centre, University of Reading.
Merrifield, J. (2002). Learning citizenship (IDS Working Paper No. 158). Institute of Development
Studies (IDS).
Mertens, D. M. (2010). Philosophy in mixed methods teaching: The transformative paradigm as
illustration. International Journal of Multiple Research Approaches, 4(1).
MOFA/GOG, IFAD, & BMGF. (2015). Final Report on the participatory impact evaluation of the
Root & Tuber Improvement & Marketing Program (RTIMP). Pilot Application of a
Participatory Impact Assessment & Learning Approach (PIALA). International Fund for
Agricultural Development (IFAD).
Mohan, G., & Hickey, S. (2004). Relocating participation within a radical politics of development:
critical modernism and citizenship. In Participation: from Tyranny to Transformation?
Exploring New Approaches to Participation in Development (pp. 59–74). Zed Books.
Mosse, D. (2001). “People’s knowledge”, participation and patronage: Operations and representations
in rural development. In Participation: the New Tyranny? Zed Books.
Newman, D. (2015). From the front of the room. Matter Group.
Patton, M. Q. (2011). Essentials of Utilization-Focused Evaluation. SAGE.
Pawson, R. (2013). The science of evaluation: a realist manifesto. SAGE.
Picciotto, R. (2012). Experimentalism and development evaluation: Will the bubble burst? Evaluation,
18(2).
Picciotto, R. (2014). Have Development Evaluators Been Fighting the Last War… And If So, What is
to be Done? IDS Bulletin, 45(6), 6–16. Institute of Development Studies (IDS).
Ravallion, M. (2012). Fighting poverty one experiment at a time. A review of Abhijit Banerjee and
Esther Duflo “Poor economics: A radical rethinking of the way to fight global poverty.”
Journal of Economic Literature (forthcoming).
van Es, M., Guijt, I., & Vogel, I. (2015). Theory of Change Thinking in Practice. HIVOS ToC
guidelines. HIVOS.
Van Hemelrijck, A. (2013). Powerful Beyond Measure? Measuring complex systemic change in
collaborative settings. In Sustainable Participation and Culture in Communication: Theory and
Praxis. Intellect Ltd.
Van Hemelrijck, A. (2016a). Governance in Myanmar. Evaluation using PIALA (Participatory
Impact Assessment & Learning Approach) of the “Building resilient livelihoods in the Dry
Zone” project (Effectiveness Review Series 2015/2016). Oxfam GB.
Van Hemelrijck, A. (2016b). Understanding “Rigour” in Participatory Impact Evaluation for
Transformational Development: Insights from piloting a Participatory Impact Assessment and
Learning Approach (PIALA) (IDS PhD Work in Progress paper.). Institute of Development
Studies (IDS).
Van Hemelrijck, A. (2017). Walking the talk with Participatory Impact Assessment Learning
Approach (PIALA) in Myanmar. Retrieved from http://policy-practice.oxfam.org.uk/blog/2017/01/real-geek-walking-the-talk-with-participatory-impact-assessment-learning-approach-piala-in-myanmar
Van Hemelrijck, A., & Guijt, I. (2016). Balancing Inclusiveness, Rigour and Feasibility; Insights from
Participatory Impact Evaluations in Ghana and Vietnam. CDI Practice Paper, (14).
White, H. (2014). Current Challenges in Impact Evaluation. European Journal of Development
Research, 26.
White, H., & Phillips, D. (2012). Addressing attribution of cause and effect in small n impact
evaluations: towards an integrated framework (Working Paper No. 15). International Initiative
for Impact Evaluation (3ie).
White, S. C. (2015). Qualitative perspectives on the impact evaluation of girls’ empowerment in
Bangladesh. Journal of Development Effectiveness, 7(2), 127–145.
Woolcock, M. (2009). Toward a plurality of methods in project evaluation: a contextualised approach
to understanding impact trajectories and efficacy. The Journal of Development Effectiveness,
1(1).
Woolcock, M. (2013). Using case studies to explore the external validity of “complex” development
interventions. Evaluation, 19(3), 229–248.