Contribution Analysis
Sebastian Lemire
Author’s note
Rambøll Management Consulting, Hannemanns Allé 53, DK 2300 København S, Denmark. The author would like to thank Steffen Bohni Nielsen and Melanie Kill for their thoughtful comments on an earlier draft of this paper.
English abstract
This paper examines the methodological strengths and weaknesses of contribution analysis
(CA), awarding particular attention to the ability of CA to identify and determine the extent of
influence from alternative explanations. The author argues that CA – in its current form and application
– fares well in terms of identifying salient external factors, but finds itself in need of further
methodological elaboration to adequately account for the extent of influence from these factors. As
such, CA remains vulnerable to threats of internal validity. In strengthening the merit of causal claims
based on CA, focus should not only be directed at CA as an analytical strategy but should also involve
a much broader discussion on how to (re)think validity in the context of evaluation. An outline of the implications of this new categorization for causal claims based on CA and a discussion of how to enhance the credibility of CA conclude the paper.
Introduction
Impact evaluation has for many years received the lion’s share of attention in both evaluation
theory and practice. Yet, despite the sustained attention to determine the attribution of projects,
programs, and policies, there is to this day surprisingly little agreement on viable and methodologically
rigorous alternatives to the traditional counterfactual impact designs. Indeed, the received wisdom
among many an evaluator remains that rigorous impact evaluation cannot be done in the absence of
controlled comparisons. The counterfactual designs stand strong. One promising alternative to the counterfactual designs is John Mayne’s contribution analysis (CA), which seeks to examine contribution claims without a counterfactual evaluation design. “In complex systems”, Mayne argues, “experimenting with exogenous variables is not possible or not practical: the counterfactual case cannot be established” (2008: 4).
Accordingly, the evaluation question must be readdressed by focusing on the extent to which the
evaluator can “build a case for reasonably inferring causality,” that is, the extent to which the
intervention can be said to have contributed to a set of observed (positive or negative) outcomes.
The methodological strength of CA, then, rests on its ability to accommodate – without a
counterfactual – the often complex and messy nature of programs by taking into account the nexus of
conditioning variables and interactions among program components. This is – perhaps needless to say
to the experienced evaluator – in and of itself a strong selling point. Yet, despite the significant theoretical interest and attention awarded contribution analysis, especially in development circles, there are to this day few examples of the systematic application of contribution analysis (see Dybdal, Bohni and Lemire n.d.).
A stumbling block to using contribution analysis may be that this type of analysis remains seen, especially in some parts of the evaluation profession, as a very weak substitute for more traditional experimental approaches. The need, then, is to develop and present CA as a methodologically rigorous alternative. In this effort, I
would argue that attention should be awarded the validity of causal claims based on CA. The foundation of confidence in any methodologically sound design, method or analytical approach that aims to produce inferences is contingent upon the extent to which attention is awarded to validity issues.
The overarching purpose of this paper is to advance the application of CA. The aim is two-fold,
in that I seek to both (1) advance the theoretical discussion on the validity of causal claims based on
CA and (2) push for further practical application of CA. The paper consists of three sections. The first
presents a brief outline of contribution analysis and notes on its ability to identify and determine the
influence of external factors. However, in examining the merit of causal claims based on CA, focus
should not only be directed at CA as an analytical strategy but should also involve a much broader
discussion on how to (re)think validity in the context of evaluation. Thus the second section examines
the concept of validity. It discusses the overwhelming dominance of the Campbellian validity model in
both research and evaluation and makes the case for rethinking validity in the context of evaluation.
The result of the discussion is a new categorization of validity evidence for causal claims. The third
section outlines the implications of this new categorization for causal claims based on CA and concludes with a discussion of how to enhance the credibility of CA and thereby pave the way for its increased application.
Contribution analysis
Contribution analysis (CA) has been presented and conceptually developed by John Mayne
through a series of seminal papers (1999, 2001 and 2008). In its development, CA has over time moved
away from its original setting in performance measurement and towards its new role in evaluating
complex programs in complex settings (Dybdal, Bohni and Lemire n.d.). While the approach has
undergone refinements in its methodology and even more notably its scope (see Dybdal, Bohni and
Lemire n.d. for a detailed account), the underlying logic of and core steps in CA remain more or less
the same: (i) elaborating the intervention’s theory of change, (ii) identifying key threats to the theory of
change’s mechanisms, (iii) identifying other contributing factors, and (iv) testing the primary rival explanations. More operationally, Mayne presents CA as consisting of the following steps: (1) Set out the cause-effect issue to be addressed, (2) develop the postulated theory of
change and risks to it, (3) gather the existing evidence on the theory of change, (4) assemble and assess
the contribution story and challenges to it, (5) seek out additional evidence, (6) revise and strengthen
the contribution story, (7) in complex settings, assemble and assess the complex contribution story
(Mayne, 2008).
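Although Mayne presents these steps as an analytical narrative rather than an algorithm, the iterative core of steps (4) through (6) – assemble the story, seek additional evidence, revise – can be sketched in code. The sketch below is purely illustrative; all function and field names are my own and not part of Mayne’s formulation.

```python
# Illustrative sketch of the iterative core of contribution analysis (CA).
# The data model and names are hypothetical -- Mayne specifies no code;
# this only mirrors the loop of steps (4)-(6) described above.

def contribution_analysis(theory_of_change, evidence, max_iterations=3):
    """Assemble a contribution story and iteratively strengthen it."""
    story = {
        "theory_of_change": theory_of_change,
        "supporting_evidence": list(evidence),
        "unresolved_challenges": [],
    }
    for _ in range(max_iterations):
        # Step (4): assess the story and surface remaining challenges,
        # e.g. assumptions in the theory of change not yet evidenced.
        story["unresolved_challenges"] = [
            a for a in theory_of_change["assumptions"]
            if a not in story["supporting_evidence"]
        ]
        if not story["unresolved_challenges"]:
            break
        # Steps (5)-(6): seek additional evidence and revise the story.
        new_evidence = seek_additional_evidence(story["unresolved_challenges"])
        story["supporting_evidence"].extend(new_evidence)
    return story

def seek_additional_evidence(challenges):
    # Placeholder: in practice this stands for new data collection
    # and analysis targeted at the unresolved challenges.
    return challenges[:1]
```

The point of the sketch is only that the contribution story is not assembled once but is progressively strengthened until its key challenges are addressed (or the evaluation’s resources are exhausted).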
CA is aimed at providing answers to contribution questions such as: Has the program made a difference? How much of a difference? In answering such questions, Mayne distinguishes between three types of causal stories:
1. A minimalist contribution analysis can be construed when the theory of change was developed and the expected outputs were delivered. Contribution is based on “the inherent strength of the postulated theory of change and the fact that the expected outputs occurred” (Mayne n.d.).
2. A contribution analysis of direct influence can be construed when the theory of change was developed, expected outputs occurred, immediate results were observed, and evidence suggests the program was instrumental in creating those results, in light of other influencing factors (Mayne n.d.).
3. A contribution analysis of indirect influence can be construed when: “It would measure
the intermediate and final outcomes (or some of them) and gather evidence that the
assumptions (or some of them) in the theory of change in the areas of indirect influence
were borne out. Statements of contribution at this level would attempt to provide factual
evidence for at least the key parts of the whole postulated theory of change” (Mayne
n.d.: 25-26).
The distinction between these three types of causal stories has less to do with the extent to which CA
can address the magnitude of the contribution and more to do with the relative strength or credibility of
the contribution story. The shared denominator for all three causal stories is that the evaluator –
through systematic evaluative inquiry – seeks to infer “plausible association” between the program and
a set of relevant outcomes (Mayne, 1999: 5-7). The aim of CA, then, is not to provide proof of a one-to-one, linear causal linkage between a program and its intended outcomes, nor is it to determine the
exact contribution of the program. Rather the aim of CA is to provide evidence beyond reasonable
doubt that the program to some degree contributed to the specified outcomes.
It is important to note that Mayne does not commit himself, nor link the causal stories presented
above, to any specific evaluation design. He remains insistent that a mix of quantitative and qualitative
methods can be used in answering these questions through CA. According to Mayne (n.d.: 7), five criteria concerning the embedded theory of change are to be met in order to infer “plausible association”:
(i) A plausible theory of change: Is the postulated theory of change plausible?
(ii) Implementation according to plan: Was the program implemented with high fidelity?
(iii) Evidentiary confirmation of key elements: To what extent are the key elements of the theory of change confirmed by evidence?
(iv) Identification and examination of other influencing factors: To what extent have other influencing factors been identified and examined?
(v) Disproof of alternative explanations: To what extent has the most relevant alternative explanation been disproven?
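To make the joint application of the five criteria concrete, they can be represented as a simple checklist. This is a hedged sketch: the labels paraphrase the criteria above, and treating each criterion as a boolean is my own simplification, not Mayne’s method.

```python
# Mayne's five criteria for inferring "plausible association", rendered
# as a checklist. The labels paraphrase the criteria listed above;
# boolean scoring is an illustrative simplification of what is, in
# practice, a qualitative appraisal.

CRITERIA = (
    "plausible_theory_of_change",
    "implemented_according_to_plan",
    "key_elements_confirmed_by_evidence",
    "other_influencing_factors_examined",
    "alternative_explanations_disproven",
)

def unmet_criteria(assessment):
    """Return the criteria not yet met; an empty result supports inference."""
    return [c for c in CRITERIA if not assessment.get(c, False)]
```

An evaluator would of course weigh evidence qualitatively rather than binarily; the point is only that all five criteria must be addressed before plausible association is inferred.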
These collectively serve as the quality criteria of causal stories based on CA. While Mayne continues
throughout his conceptual advancement of CA to summarily address the issue of accounting for other
influencing factors and assessing their potential influence, he never goes into detail on the subject. As
such, there is no operational framework or discussion of what it means to account for other influencing
factors, despite their importance being stated repeatedly. Moreover, the ability to account for the
influence of other factors is key in establishing the internal validity of causal claims and inferences
based on CA. At the risk of adding to an already overused term, the following presents a discussion of the concept of validity.
The concept of validity has for several decades been widely discussed and developed (see Chen
2010 for an overview)ii. Unfortunately, but perhaps not unexpectedly, the long-enduring debates have
in many instances served to muddy rather than to clarify the waters. As a result, the term has come to
mean many different things to many different people. I suggest that the conceptual murkiness – and
even some of the central points of conflict in the ongoing debates – stem from a lack of recognition that
the very meaning and application of the term may differ across different fields of application, such as between research and evaluation.
For several decades the Campbellian validity model has been dominant in both research and
evaluation (Campbell and Stanley, 1963). Campbell and Stanley’s delineation between internal validity
(i.e., to what extent the design accounts for the influence of external factors) and external validity (i.e.,
to what extent the conclusions of the study can be generalized) has had profound influence on the
theory and practical application of validity amongst researchers and evaluators alike. The importance of the model in both fields is difficult to overstate.
Indeed, the most oft-cited categorization of validity evidence still remains that of the
Campbellian model’s internal and external validity. External validity – generally stated – concerns the
extent to which one may safely generalize the conclusions derived from a study; that is, to what extent
the inferences and conclusions are valid for other subjects, other times and other places, or other
settings (Mohr 1995, p.92). This is obviously relevant in the context of research, as the aim of research
studies very often revolves around producing knowledge about a specific topic that can be generalized,
and in effect applied, to further an academic field. Internal validity – by some considered the sine qua
non of validity – is an expression of the extent to which a design accounts for external factors. As such,
internal validity constitutes a key component in isolating and determining the magnitude of the impact
of a program. The two types of validity are characterized by an inverse relationship (Chen, 2010). As
noted by Mohr, “The less successful a design is in accomplishing the first, the more it is depending on [the second]” (1995).
Despite the heavy influence of the Campbellian validity model, it does not remain
unchallenged. Most recently, Huey Chen has questioned the relevance of the model in the context of
Because the Campbellian model was developed for academic research, ideas and principles
proposed by the model might not be wholly applicable or relevant to program evaluation.
Similarly, issues crucial to program evaluation but not to academic research are, in the model, left unaddressed (Chen 2010).
As just one example, Chen argues that the model’s emphasis on internal validity as the sine qua non of
research, may not be as relevant in the context of evaluation (2010). Instead, the relative importance of
internal versus external validity must be reconsidered in the context of evaluation (Chen 2010).
My interest is not to engage in debates on the relative importance and weighing of internal
versus external validity. In my opinion internal and external validity together express two overarching
types of validity that are necessary to address in research and evaluation. However, I would argue,
inspired by Carol Weiss and Chen, that the meaning, purpose and application of these two types of
validity need to be clarified in the context of evaluation. Indeed, I think the hard-won clarity that could result from such an effort would not only serve to enhance the credibility of contribution analysis but also strengthen causal claims in evaluation more broadly.
Simply consider external validity that concerns the extent to which inferences and conclusions
can be generalized to other subjects at other times and places. As noted by Chen, “such an open-ended
quest for law-like propositions” is often more relevant in a research context whereas it may be
“extremely difficult or even impossible to achieve” in the context of evaluation (2010, p.207)iv. This is
not to say that this interpretation of external validity as statistical generalizability is not relevant for
some evaluations. In fact, in the early days of social engineering the aim of randomized controlled trials
was exactly to identify the programs that work and then implement these more widely. More recent
developments in the field of evaluation, such as systematic reviews and rapid evidence assessment, also
lend themselves well to this traditional interpretation of external validity. However, many – and
perhaps even most – evaluations have a much more practically oriented aim in that they seek to answer
very specific questions and to produce information that supports the practical implementation of
programs in other local contexts. Indeed, one oft-cited challenge related to the utilization of
information stemming from evaluations is how to translate the generic learning statements from one
local context into actual practice in other local contexts. The emphasis on statistical generalization in
the Campbellian interpretation of external validity seems less appropriate in these types of evaluations.
Considering these challenges of utilizing information from evaluations, I would argue that the type of external validity that is particularly relevant for many evaluations ought perhaps to be more in the direction of practical generalizability. This
new interpretation of practical generalizability could express the extent to which inferences and conclusions can support the local implementation of the program for other subjects, other times and other places or other settings. Moreover, this may present a welcome twin to Samuel Messick’s
concept of consequential validityv in the context of test validation that concerns the extent to which
adverse consequences are produced by invalid test interpretation and use (1989).
The Campbellian model has been applied without giving due justice to the differences in
context, aim and quality criteria of research and evaluation, and it may be a model whose time has
come to an end in relation to many evaluations. The idea is not to replace the Campbellian validity
model with an everything-goes-approach to validity. I am not advocating for an approach that simply
allows evaluators to be opportunistic in their choice of validity evidence. Rather, the aim is to develop a classification of validity evidence that is applicable and relevant to the context of evaluation.
Inspired by Messick’s unitary validity concept, I suggest a new classification for cutting and
framing validity evidence for causal claims and inferences (see table 1 below). The first dimension
covers two different types of justification for making causal inferences: the evidential and the
consequential. The second dimension is the function of the causal claims and inferences for either
theoretical or practical use. According to the classification, the justification for the interpretation of
causal claims and inferences is primarily based on the appraisal of the evidential basis (i.e. to what
extent other influencing factors have been accounted for in isolating the impact of the intervention and
to what extent can the causal claims be generalized to other subjects at other times and places?) and
perhaps secondarily supplemented by an appraisal of the consequential basis (i.e. to what extent have
the causal claims resulted in misconception due to flaws in the design or analytical strategy?).
Likewise, the justification for the practical use of causal inferences is primarily based on the appraisal
of the evidential basis (i.e. to what extent other influencing factors have been accounted for in isolating
the impact of the intervention and to what extent can the causal claims be practically applied to other
subjects at other times and places?) and secondarily supported by the consequential basis (i.e. to what
extent are the causal claims likely to lead to misapplication due to flaws in the design or analytical
strategy?).
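Since the original table is not reproduced here, the two-dimensional classification described above can be summarized as a small lookup keyed by the basis of justification and the function of the causal claims. The cell contents paraphrase the preceding paragraph; the layout is my reconstruction from the text, not Table 1 verbatim.

```python
# The proposed classification of validity evidence, rendered as a lookup
# keyed by (basis of justification, function of the causal claims).
# Cell contents are paraphrased from the surrounding prose; treat the
# layout as a reconstruction of the table described in the text.

VALIDITY_EVIDENCE = {
    ("evidential", "theoretical"): [
        "internal validity (other influencing factors accounted for)",
        "statistical generalizability (other subjects, times, places)",
    ],
    ("evidential", "practical"): [
        "internal validity (other influencing factors accounted for)",
        "practical generalizability (supports local implementation)",
    ],
    ("consequential", "theoretical"): [
        "risk of misconception due to flaws in design or analysis",
    ],
    ("consequential", "practical"): [
        "risk of misapplication due to flaws in design or analysis",
    ],
}
```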
A couple of examples might clarify how to apply the content of the table. A research study
aiming to examine the causal linkage between smoking and lung cancer would primarily build its case
on the evidential basis of internal validity and statistical generalizability. In addition, consequential
validity evidence would also strengthen the validity of the causal claims in the study. In marked
contrast, an evaluation of a pilot project on youth advising would more likely aim for the practical use
of its causal conclusions and therefore focus the validation effort on internal validity and practical generalizability. An examination of the consequential validity of the causal conclusions could further strengthen the validity of the claims made.
If we accept this cutting and combining of validity evidence, where does this leave us in our examination of contribution analysis?
As mentioned earlier in this paper, published examples of the systematic application of CA are
far and few between. Accordingly, the following discussion builds on how CA – in its current
conceptual state and presentation – fares in relation to the proposed categorization of validity evidence
(primarily Mayne 1999, 2001 and 2008). It is also important to note that it is the combination of a
design and an analytical strategy that collectively enhances the validity of causal claims. Accordingly,
the less successful the design is in enhancing validity, the more one might depend on the analytical
strategy to do the work. As mentioned earlier, Mayne does not commit himself to any specific designs.
As such, the true capacity of CA in terms of realizing validity evidence cannot be determined by
examining CA isolated from specific designs. My examination of CA and validity, then, is more an
effort to gauge its relative strengths and weakness to justify causal claims. My focus will be on the
evidential basis, as this dimension constitutes the primary source of justification. It is in effect central to establishing the validity of causal claims. Consequently, I will focus my effort on practical generalizability, statistical generalizability and internal validity.
First, practical generalizability expresses the extent to which inferences and conclusions can support the local implementation of the program for other subjects, other times and other places or other settings. The inherent focus of CA on understanding the
nature and context of causal linkages between a program and a set of desired outcomes provides causal
stories that lend themselves well to local implementation in other settings. The emphasis on the
development and subsequent refinement of an embedded theory of change leads the evaluator towards
a deep and highly applicable understanding of how the nuts and bolts of the program may function and interact in other settings.
CA, however, holds a weaker position when it comes to statistical generalizability – the extent to which conclusions can be generalized to other subjects, other times and other places or other settings.
The focus on systematically examining the nature and context of the causal linkages between the
program and the desired set of outcomes is not likely to produce the type of law-like general statements
that lend themselves well to generalization. However, and as mentioned above, the true capacity of CA
in relation to statistical generalizability remains contingent upon the specific design and methods employed.
Third and finally, I think the real area of improvement in relation to CA and validity revolves
around internal validity. While Mayne provides different strategies for identifying the most salient
external factors, he never goes into any detail on how to gauge the influence of these. The importance
of an operational framework for identifying and assessing the influence of external factors is
particularly clear given the very aim of contribution analysis, especially in relation to contribution stories
of direct and indirect influence. The advantage that CA offers in complex contexts is that it aims to
determine the relative, rather than the specific, contribution of a program to a set of outcomes of
interest. This being the case, explicit guidance on how to systematically gauge the magnitude of the influence from external factors is needed. A more operational version of this aspect of CA provides a necessary stepping stone to strengthen the ability of CA to approximate the contribution of programs. The consequences of this missing stepping stone are real. As just one example, consider Michael Patton’s use of contribution analysis in his advocacy impact evaluation of a stealth campaign:
Based on a thorough review of the campaign’s activities, interviews with key informants and key knowledgeables, and careful analysis of the Supreme Court decision, we conclude that: The campaign contributed significantly to the Supreme Court decision (Patton 2008, my italics).
One might wonder how to interpret the vague – yet heavily loaded – quantifier “significant” in the
above conclusion. How was it determined that the contribution was significant as opposed to moderate?
What does it really mean that the campaign contributed significantly? Some sort of systematic approach to not only identify but also gauge the influence of external factors is in my mind called for. I would dare argue that the methodological soundness of CA demands a consistent and rigorous use of strategies that aim to reduce the threat of external factors.
I am well aware that pushing for the increased application of CA requires more than a
theoretical discussion of validity issues. In closing this paper, I would like to share what I think are
some of the related issues that ought to be discussed and that may serve to strengthen the conceptual
and practical advancement of CA. Admittedly, these are only the beginnings.
First, I suggest we need to examine the underlying concept of causality that CA builds on.
Examining the underlying concept of causality should involve an examination of the counterfactual framework that Mayne is positioning CA against when arguing that contribution analysis does not require a counterfactual. CA is often positioned as an alternative to counterfactual designs in settings where comparison and control groups are unfeasible. That being said,
CA still deals with counterfactual-based questions and is – in my mind – certainly compatible with an explicit counterfactual. As White (2010) notes:
Although it may not always be necessary or useful to make the counterfactual explicit, in attributing impact there is much to be said for an explicit counterfactual. An explicit counterfactual does not necessarily mean that one needs a control group.
As he goes on to argue, the counterfactual may come in the form of different variants of interrupted time-series design. If we accept this conceptualization of the explicit counterfactual, there is no reason why CA could not build on an explicit counterfactual. We might as well keep our doors open to and be aware of this particular type of CA.
Second, I suggest that we sharply distinguish between designs and analytical strategies, as these are often conflated. An
evaluation design specifies the frequency and placement of measurement points in relation to the
intervention being evaluated. It also specifies the demand for a control or comparison group. In doing
so a design delineates the overarching structure of the data collection. An analytical strategy specifies
how the data derived from the measurement points will be analyzed and connected with the questions
to be answered as part of the evaluation. One might argue that simply comparing experimental designs
with an analytical strategy is like comparing apples and oranges. I agree. However, I also recognize
that these comparisons are being made—and will continue to be made—and that contribution analysis,
as noted by Mayne, is “often perceived as a very weak substitute for the more traditional experimental
approaches to assessing causality” (n.d.: 1). In advancing CA, I think we have to accept that these
comparisons will be made and seek to inform and frame them as best as possible. This involves holding
on to the important distinction between a design and an analytical strategy and recognizing that the
internal validity of causal claims is contingent upon the evaluation design and the analytical strategy collectively.
Third, and in direct extension of the two points above, it may prove rewarding to examine the types of designs that will strengthen contribution stories based on CA. This has me wondering: Is there really
any reason why we can’t combine a counterfactual pre-/post design with CA? Are there certain
counterfactual or non-counterfactual designs that lend themselves particularly well to CA? Are there
types of designs that will strengthen and connect particularly well with the three different types of
contribution stories? Are there certain types of counterfactual designs that are required to support
contribution stories of direct or indirect influence? In answering these questions we have to be clear on
the distinction between counterfactuals and control groups, and between designs and analytical strategies.
Fourth and finally, I would argue that we should continue the methodological discussion on
how to enhance and assess the quality of contribution stories. How would we recognize a
methodologically sound CA if it were right in front of us? Mayne points towards five criteria, but are
there other relevant quality markers? I’m thinking here of a set of quality markers equivalent – but not
identical – to the quality markers typically employed in research. These quality markers may serve to guide the development and appraisal of contribution stories.
Concluding remarks
Contribution analysis (CA) presents a promising and viable alternative to the traditional
counterfactual impact designs; indeed, it is my strong belief in the potential of CA that motivates this paper. In advancing CA, the need arises for addressing the validity issues pertaining to CA. Validity is at its core about the extent to
which we can invest our trust in a set of inferences. As such, the foundation of confidence in any
methodologically sound design, method or analytical strategy that aims to produce inferences—
especially inferences of cause-and-effect relations—is contingent upon the extent to which attention is
awarded to validity issues. In order to pave the way for the increased practical application of CA we
need to address validity issues related to CA. However, we also need to make sure that the concept of
validity that we employ in building credible causal stories – by way of CA or other strategies – is
applicable and relevant to the field of evaluation. It is my modest hope that this paper will foster further discussion on these issues.
References
Bickman, L. 1987. The Function of Program Theory: Using Program Theory in Evaluation, New
Directions for Evaluation, 33 pp. 5-18
Campbell, D.T., & Stanley, J. 1963. Experimental and Quasi-experimental Designs for Research.
Chicago: Rand McNally
Chen, H. 1987. The Theory-driven Approach to Validity. Evaluation and Program Planning, 10, pp.
95-103.
Chen, H. 1988. Validity in evaluation research: A critical assessment of current issues, Policy and
Politics, 16 (1), pp. 1-16.
Chen, H. 2010. The Bottom-Up Approach to Integrative Validity: A New Perspective for Program
Evaluation. Evaluation and Program Planning, 33, pp. 205-214.
Davidson, E.J. 2000. Ascertaining Causality in Theory-Based Evaluation, New Directions for
Evaluation, 87, pp. 17-26.
House, E. 2001. Unfinished Business: Causes and Values, The American Journal of Evaluation 22 (3),
pp. 309-315.
Kane, M. 2001. Current Concerns in Validity Theory, Journal of Educational Measurement, 38 (4), pp.
319-342.
Mayne, J. 1999. Addressing Attribution through Contribution Analysis: Using Performance Measures
Sensibly, discussion paper, Office of the Auditor General of Canada.
Mayne, J. 2001. Addressing Attribution through Contribution Analysis: Using Performance Measures
Sensibly, Canadian Journal of Program Evaluation, 16 (1), pp. 1-24.
Mayne, J. 2008. Contribution analysis: An approach to exploring cause and effect, ILAC Brief 16,
Institutional Learning and Change (ILAC) Initiative, Rome, Italy.
Mayne, J. n.d. Addressing Cause and Effect in Simple and Complex Settings through Contribution Analysis, in R. Schwartz, K. Forss, and M. Marra (Eds.), Evaluating the Complex, New York, Transaction Publishers (in print).
Messick, S. 1989. Validity in R. L. Linn (Ed.), Educational measurement (3rd ed.), pp. 13-103. New
York, American Council on Education and Macmillan.
Patton, M. 2008. Advocacy Impact Evaluation. JMDE, 5(9): pp. 1-10.
Rogers, P. et al. 2000. Program Theory Evaluation: Practice, Promise, and Problems, New Directions
for Evaluation, 87, pp. 5-13.
Rogers, P. 2007. Theory-Based Evaluation: Reflections Ten Years On, New Direction for Evaluation,
114, pp. 63-67.
Scheirer, A.M. 1987. Program Theory and Implementation Theory: Implications for Evaluators, New
Directions for Program Evaluation, 33, pp. 59-76.
White, H. 2010. A contribution to current debates in impact evaluation, Evaluation, 16 (2), pp. 153-164.
i
One published example is that of Michael Patton and his employment of contribution analysis in evaluating a stealth
campaign (Patton, 2008).
ii
The literature on internal and external validity in research and evaluation is extensive (see Chen 1988 & 2010 as well as
Kane 2001 for a good overview), and space does not allow for a detailed account of the development of the term here.
However, some of the trends are particularly relevant in relation to CA and therefore merit our attention. First of all, the
concept of validity in the Campbellian tradition pertains specifically to the research design; that is, it is the research design
that is being validated. Over the years the common consensus has moved towards validity pertaining to the claims and
inferences produced by different designs. Stated differently, it is the causal claims and inferences themselves that are being validated. As a result, any credible combination of design and analytical strategy that seeks to produce sound causal
claims has to address internal validity.
iii
New interpretations and conceptual advancement in the area of validity have come most often from the research
community, especially from researchers in the area of psychometrics where test and instrument validation is central (see
Kane 2001 and Messick 1989 among others).
iv
Meta-evaluation is the exception.
v
The concept of consequential validity was introduced by Messick in the context of instrumental validity, that is, the
validation of tests and measurement instruments. However, it certainly appears relevant given the heavy focus on utilization
in the field of evaluation.