
Political Methodology

Committee on Concepts and Methods


Working Paper Series

October 2005

Comparability
A Key Issue in Research Design

John Gerring
Boston University

(jgerring@bu.edu)

Craig W. Thomas
University of Massachusetts, Amherst

(cthomas@polsci.umass.edu)

C&M
The Committee on Concepts and Methods
www.concepts-methods.org

IPSA
International Political Science Association
www.ipsa.ca

CIDE
Teaching and Research in the Social Sciences
www.cide.edu
Editor

Andreas Schedler (CIDE, Mexico City)

Editorial Board

José Antonio Cheibub, Yale University
David Collier, University of California, Berkeley
Michael Coppedge, University of Notre Dame
John Gerring, Boston University
George J. Graham, Vanderbilt University
Russell Hardin, New York University
Evelyne Huber, University of North Carolina at Chapel Hill
James Johnson, University of Rochester
Gary King, Harvard University
Bernhard Kittel, University of Amsterdam
James Mahoney, Brown University
Gerardo L. Munck, University of Southern California, Los Angeles
Guillermo O’Donnell, University of Notre Dame
Frederic C. Schaffer, Massachusetts Institute of Technology
Ian Shapiro, Yale University
Kathleen Thelen, Northwestern University

The C&M working paper series are published by the Committee on Concepts and Methods (C&M), the Research Committee No. 1 of the International Political Science Association (IPSA), hosted at CIDE in Mexico City. C&M working papers are meant to share work in progress in a timely way before formal publication. Authors bear full responsibility for the content of their contributions. All rights reserved.

The Committee on Concepts and Methods (C&M) promotes conceptual and methodological discussion in political science. It provides a forum of debate between methodological schools who otherwise tend to conduct their deliberations on separate tables. It publishes two series of working papers: “Political Concepts” and “Political Methodology.”

Political Concepts contains work of excellence on political concepts and political language. It seeks to include innovative contributions to concept analysis, language usage, concept operationalization, and measurement.

Political Methodology contains work of excellence on methods and methodology in the study of politics. It invites innovative work on fundamental questions of research design, the construction and evaluation of empirical evidence, theory building and theory testing. The series welcomes, and hopes to foster, contributions that cut across conventional methodological divides, as between quantitative and qualitative methods, or between interpretative and observational approaches.

Submissions. All papers are subject to review by either a member of the Editorial Board or an external reviewer. Only English-language papers can be admitted. Authors interested in including their work in the C&M Series may seek initial endorsement by one editorial board member. Alternatively, they may send their paper to workingpapers@concepts-methods.org.

The C&M webpage offers full access to past working papers. It also permits readers to comment on the papers.

www.concepts-methods.org
When you can measure what you are speaking about, and express it in numbers, you
know something about it; but when you cannot measure it, when you cannot express it in
numbers, your knowledge is of a meager and unsatisfactory kind: it may be the beginning
of knowledge, but you have scarcely, in your thoughts, advanced to the stage of science,
whatever the matter may be.
-- Lord Kelvin1

Not everything that can be counted counts, and not everything that counts can be counted.
-- Albert Einstein2

What things should a researcher count, and when should she count them? Are all things
countable worth counting? What is “qualitative” evidence, anyway? Is it simply non-quantitative
evidence? If so, what does “quantitative” mean? What is the relationship between these two kinds of
evidence and causal inference? Under what circumstances do qualitative methods for causal analysis
offer an advantage over quantitative methods?
Perhaps no division in the social sciences is so persistent, so nettlesome, and so poorly
understood as the division between quantitative and qualitative ways of knowing. The cleavage can
be traced back to the first applications of statistics within the disciplines of economics, political
science, and sociology, and became increasingly acute in the late twentieth century as quantitative
approaches gained in stature, grew in complexity, and pushed qualitative empirical analysis out of the
limelight (Barnes 1925; Becker 1934; Bernard 1928; Gosnell 1933; Hammersley 1989; Jocher 1928;
Stouffer 1931; Teggart 1939/1967; Waller 1934; White 1930; Znaniecki 1934). During this period, the
division between qualitative and quantitative methods became associated – unfortunately and
inappropriately – with the rival epistemological positions of positivism and interpretivism. Smith
(1989: 29) summarizes the now-familiar stand-off:
On the one hand, there are those who argue that only through the application of quantitative
measurements and methods can the social sciences ever hope to become ‘real’ sciences; on
the other hand, there are those who claim that the subject matter of the social sciences is
simply not amenable to quantification and all attempts to impose such measures and
methods upon social behavior is just so much nonsense.3
While there have been many attempts to shed light on this persistent division in the social sciences,
work on this question is generated primarily by writers who occupy one of the two camps. These
writers tend to be either strong partisans or visceral opponents of the “quantitative worldview.” A
chronic dualism besets these debates, in large part because the distinction between quantitative and qualitative forms of descriptive and causal inference has been folded into the debate between positivism and interpretivism.
Positivists (aka naturalists), usually identified with quantitative methods, present their
perspective as hegemonic: there is, or ought to be, only one logic of inference (Blalock 1982, 1989;
Friedman 1953; Goldthorpe 2000; King, Keohane, Verba 1994; Lazarsfeld, Rosenberg 1955;
Lieberson 1985; Wilson 1998).4 The conclusion of these scholars is either that there are no important

1 Quoted in Kaplan (1964: 172).


2 The origin of this quote is in dispute. It appears to have been on a sign that hung outside Einstein’s
office at the Institute for Advanced Study. Though it may not have been written by him, it is nonetheless
generally attributed to him, and must have received his approbation.
3 See also Bryman (1984: 76).
4 Beck (2004: 1) likens this to “the view of the Inquisition that we are all God’s children.”
distinctions between qualitative and quantitative work or, to the extent that such distinctions exist,
they are to the detriment of qualitative scholarship. “When possible, quantify,” is the motto of this
camp (Benoit 2005). Where quantification is not possible, this camp encourages qualitative scholars
to follow the logic of quantitative reasoning.
Defenders of qualitative work typically emphasize the limits of quantification and the
insights that can be gained through an interpretive approach to social action. Rather than a unified
logic, interpretivists suggest that there might be multiple logics at work in the social sciences. These
multiple logics stem from epistemological or ontological commitments, which may themselves be
culturally prescribed, political, or historical in origin (MacIntyre 1971/1978; Rabinow and Sullivan
1979; Shweder 1996; Shweder and LeVine 1984; Taylor 1962; Winch 1958). Yet the intent of many of these authors is often as polemical as that of their opponents.
Thus are the lines of battle cast. On the one side, positivists claim there are no important
divisions between qualitative and quantitative work; this is the hegemonic vision of naturalism. On
the other side, interpretivists (not to mention post-structuralists) claim that the social sciences are
irreconcilably divided, perhaps even incommensurable. We believe that these positions are unhelpful
and, in certain important respects, misleading. The methodological tools of social science are neither
uniform (with the same standards applying for both quant and qual) nor dichotomous (with a bright
line separating quant and qual). While we agree with the general sentiment expressed by the
unificationists – and certainly with the goal of scientific cumulation across fields and methods – we
find that these sorts of pronouncements are primarily rhetorical. That is, they either ignore or
condemn differences among well-established traditions of scholarship that are quite real and
eminently justifiable. Yet, these differences are not well accounted for by the usual dichotomous logic
– i.e., quantitative versus qualitative, large-N versus small-N, numbers versus narrative, positivism
versus interpretivism, formal methods versus informal intuition. Things are more complicated.5
We present a new way of thinking about these issues that rests upon the key concept of
comparability. At the level of observations, we argue in the first part of the paper that the principal
factor separating qualitative observations from quantitative observations is the relative comparability
of evidence. Quantitative observations presume a population of things that can be readily measured,
counted, and hence compared. Qualitative observations, by contrast, presume an empirical field
where individual pieces of evidence are not directly comparable to one another, at least not to the
extent that they can be precisely measured. In this sense, quantitative work is appropriately labeled
nomothetic, and qualitative work idiographic. The key point is that the difference between these two
kinds of observations rests on the presumed comparability of adjacent observations, not (at least not
directly) on the size of N, the style of presentation (numbers or narrative), epistemology, ontology, or
the formal structure of the method.
In the second part of the paper, we address strategies of causal analysis. Here, we argue that
the traditional division between quantitative and qualitative research designs is both misleading and
insufficient. Instead, we propose that the social sciences are most usefully conceptualized in three
categories, based upon the number and comparability of observations: Mathematical, Mill-ean, and
Process Tracing. Mathematical methods presume a relatively large sample of precisely comparable
observations. Mill-ean methods presume a relatively small sample of comparable observations.
Process-Tracing methods differ greatly from Mill-ean and Mathematical methods because they rely
on a set of disparate – and hence non-comparable – observations, which may be quantitative or
qualitative. The key point about Process Tracing is that each observation typically comprises a sample
of one (N=1); hence, adjacent observations are understood to be causally significant, but not
descriptively comparable. These three methodological categories encompass the entire work of social
science, with the critical differentiating feature being the comparability, and number, of adjacent
observations.

5 We speak of a general tendency in the social sciences. As always, there are exceptions. More subtle
treatments of these issues can be found in Brady and Collier (2004), Elman (2005), George and Bennett (2005),
Lieberman (2005), Mahoney and Rueschemeyer (2003), and elsewhere.

Our purpose in this paper, it should be stressed, is not to reinforce existing cleavages in the
social sciences. Nor, for that matter, do we intend to establish a new tripartite division among rival
camps. Rather, we wish to draw attention to what we see as the most important underlying issue in
these debates, an issue that has not received much recognition. If analysis is based on comparison,
the central methodological question is what we can reasonably compare, and how precise those
comparisons can be. Here, there are legitimate differences of opinion, and they are not the sort that
can be empirically proven. In the concluding section of the paper we offer some speculative
arguments about why scholars might be inclined towards assumptions of comparability or non-
comparability.

COMPARABILITY
The key concept in this study is comparability, so it is important that this term be carefully
defined.6 We understand comparable observations as members of the same population and therefore
potential members of the same sample. They are examples of a similar phenomenon. They are apples,
rather than apples and oranges, to use the time-honored metaphor. Note that comparing apples and
oranges is not prohibited; however, to do so we must adopt a higher-order concept – e.g., fruit –
according to which apples and oranges are similar. Comparison, writes Fredrik Barth (1999: 78),
“involves identifying two forms as ‘variants’ of ‘the same,’ which means constructing an over-arching
category within which the two forms can be included, and compared and contrasted.” This common-
sense meaning of comparability is widely understood and agreed upon. But what does it mean for
items to be comparable within the context of social science research? Surely, it is more than shared
membership in an arbitrary linguistic category.
First, comparable observations must share a set of relevant descriptive attributes
(dimensions). This is what makes them comparable. The observations need not demonstrate the
same values for those attributes. Each observation in a sample may “score” differently on each
attribute in either quantitative or qualitative terms – e.g., high/low, present/absent, and so forth. But
each observation must be score-able on some scale, and the attribute must mean (roughly or precisely)
the same thing across the contexts in which it is being compared. We label this descriptive comparability,
and argue that it is a fundamental feature of conceptual validity. The defining attributes of a concept
must be valid across the designated observations. Otherwise, we say that a concept is being
“stretched” (Collier and Mahon 1993; Sartori 1970).7
A second kind of comparability refers to the inter-relationship of two factors in a causal
analysis, the cause (or vector of causal factors), X, and the outcome, Y. The specified X/Y
relationship must hold across the chosen observations. We label this causal comparability. This idea is
familiar to statisticians, who often invoke the assumption of unit homogeneity as part of their models.
“For a data set with n observations, unit homogeneity is the assumption that all units with the same
value of the explanatory variables have the same expected value of the dependent variable” (King,
Keohane, Verba 1994: 91).
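Stated formally – a minimal rendering in our own notation, not King, Keohane, and Verba’s – the assumption reads:

```latex
% Unit homogeneity: units sharing the same value of the explanatory
% variables share the same expected value of the dependent variable.
\[
  \mathbb{E}\left[\, Y_i \mid X_i = x \,\right]
  \;=\;
  \mathbb{E}\left[\, Y_j \mid X_j = x \,\right]
  \qquad \text{for all units } i, j \text{ and all values } x .
\]
```

On this rendering, causal comparability is the claim that the X/Y relationship is the same function of x for every unit in the sample.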
Thus, there are two kinds of comparability: descriptive and causal. The first is presumed in
the second. If a sample of observations is assumed to be causally comparable, then it must also be
descriptively comparable. In statistical research, the assumption of unit homogeneity makes this
explicit, but it must also be true more generally, for causal comparability can exist only in the
presence of descriptive comparability.

6 Note that there are a large number of neighboring terms – commensurability, consonance, equivalence, homogeneity, homology, similarity – all of which we shall treat as synonyms. For work on these topics see Barth (1999), DeFelice (1980), Urban (1999), Van Deth (1998), Zelditch (1971).
7 Descriptive comparability is closely related to measurement validity (Adcock and Collier 2001). However, we do not wish to presuppose that in order to achieve descriptive validity a concept must be measurable. Note that problems of conceptual stretching may be ameliorated by the use of multi-dimensional typologies (Elman 2005), in which case a single concept is (usually) sub-divided into several concepts, each pertaining to a different empirical realm.

DESCRIPTION: QUALITATIVE AND QUANTITATIVE

What makes a descriptive statement about the world “qualitative” rather than “quantitative”?
The question is so apparently obvious that it is difficult to reflect upon. Indeed, a recent lexicon
focused on qualitative inquiry notes, in the entry for this term, that “the adjective does not clearly
signal a particular meaning” (Schwandt 1997: 129). Rather, it is used as a “blanket designation for all
forms of [hermeneutic] inquiry including ethnography, case study research, naturalistic inquiry,
ethnomethodology, life history methodology, narrative inquiry, and the like.” In sum, concludes the
author of this dictionary of qualitative methods (with no apparent sense of irony), “‘qualitative research’ is simply not a very useful term for denoting a specific set of characteristics of inquiry”
(Ibid. 130).8 The inherent fuzziness of these concepts prompts some scholars to argue that the qualitative/quantitative debate must be a red herring, for neither term is very specific (Brodbeck 1968: 573-4; Hammersley 1992). Alternatively, “the transformation of quantity into quality, or conversely, is a semantic or logical process, not a matter of ontology” (Kaplan 1964: 207). In other words, the distinction is not very important, and is a distraction from the real issues of social science methodology.
Yet, these two concepts, and the attendant debates, refuse to be banished. Books, articles,
courses, institutes, and arguments continue to bear these names. There must be a there, there,
somewhere. One possibility is that the distinction between qual and quant is simply a matter of
numbers versus narrative, counting versus recounting. Wherever one sees a number, quantitative
inquiry is taking place; wherever one espies a word, qualitative inquiry is at work. By this off-hand
definition, all work in the social sciences includes both elements, a rather unhelpful conclusion.
Alternatively, one could calculate a specific ratio of numbers to words in a given work in order to
attain a quantitative/qualitative score. But this does not seem to be what most authors have in mind
when they invoke these categories.

COMPARABILITY: EXPLICIT-NESS AND PRECISION


We propose that what is usually at issue when these terms crop up in social science settings
is tethered to the issue of comparability. To “quantify” an observation, we submit, is to formulate it
in terms that can be explicitly and precisely compared across a large number of observations, i.e., where
a concept can be expressed on a numerical scale, a metric, a variable (we use these three words
interchangeably). Note that a number by itself (e.g., “70”) is not a quantitative observation. Its link to
the empirical world depends upon its connection to a metric such as temperature. “Seventy degrees
Fahrenheit” is a quantitative observation, while “70” is not. Similarly, a phone number does not
measure anything, for it is not based on a scale. As such, it is more properly classified as qualitative
than quantitative. Qualitative observations rely primarily on natural language, though they may also
include numbers so long as those numbers are not connected to particular scales. Quantitative
observations combine natural-language words (nouns, verbs, or adjectives) with numbers according
to some pre-assigned metric. It is a question of measurement, which “in the most general terms, can
be regarded as the assignment of numbers to objects (or events or situations) in accord with some
rule” (Kaplan 1964: 177).
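This distinction can be made concrete in a short sketch (our own illustration, not drawn from any work cited here; the class names are invented):

```python
from dataclasses import dataclass

@dataclass
class QualitativeObservation:
    """A natural-language description, with no pre-assigned metric."""
    text: str                      # e.g., "the room is warm"

@dataclass
class QuantitativeObservation:
    """A number tied to an explicit scale (a metric, a variable)."""
    value: float                   # e.g., 70.0
    scale: str                     # e.g., "degrees Fahrenheit"

    def __gt__(self, other: "QuantitativeObservation") -> bool:
        # Comparison is defined only within a shared scale; the scale
        # is what makes quantitative observations explicitly comparable.
        if self.scale != other.scale:
            raise ValueError("observations on different scales are not comparable")
        return self.value > other.value

room_a = QuantitativeObservation(70.0, "degrees Fahrenheit")
room_b = QuantitativeObservation(60.0, "degrees Fahrenheit")
print(room_a > room_b)  # True: a precise, explicit comparison

# A bare "70" or a phone number carries no scale, so on our account
# it remains qualitative: there is nothing to which it can be compared.
```

The point of the sketch is that a quantitative observation carries its comparison-set with it, via the scale; a qualitative observation does not.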
Let us begin with a discussion of the concept of “precision.” Note that to simply re-code a
dichotomous natural-language category as a series of binary numbers does not make it any more
precise. Thus, 0/1 is no more precise than “pregnant/not pregnant.” However, numerical scales
offer the possibility of greater precision when the number of categories surpasses the categories
inherent in natural language, as well as in circumstances where these categories can be understood as

8 For additional attempts at definition, see Creswell (1998: 14-6), Denzin and Lincoln (1994: 2), Strauss and Corbin (1998: 10-11).

positions on a continuous (interval) scale. To say that one room is “warmer” than another is
comparative, but it is less precise than saying that one room is 70 degrees Fahrenheit and the other is
60 degrees Fahrenheit. Thus, in many situations the use of a quantitative idiom allows for more
precise comparisons across units. In all situations, the use of a scale is at least as precise as natural
language (in the sense that no precision is lost in the translation of words to numbers). Again, to use
a quantitative idiom does not entail great precision; it entails the possibility of great precision (as well as
a more explicit set of comparisons).
Precision should not be confused with certainty. Qualitative or quantitative statements may be
uttered with more or less confidence. For example, one might say “I would guess that the room is 70
degrees” or “I would guess that the room is warm.” With quantitative statements, a mathematical
indicator of uncertainty may accompany the point score, as in World Bank estimates of the quality of
governance cross-nationally – where, as it happens, the standard errors are extraordinarily high (Kaufmann, Kraay, and Zoido-Lobatón 1999). When we say that quantitative observations are
more precise we are referring to the point estimate, not the degree of uncertainty (or dispersion).
As we said, quantitative statements are both more precise (at least potentially) and more
explicit. This is because the very act of creating a numerical scale requires a set of explicit comparisons
and an explicit comparison-set – a domain. Scales cannot be developed in highly specific contexts.
Imagine developing a barometer of a single event, say, the terrorist attacks that occurred on
September 11, 2001. The idea is nonsensical because scales make sense only relative to classes of
events. One can have a terrorism meter, but not a 9/11 meter (unless of course that event is being
used as a metric for understanding other events, in which case it becomes a comparative metric).
Weather can be measured precisely because, for one thing, there is lots of weather to measure, and
temperature is thought to have the same general meaning in many different contexts. Granted, all
scales are bounded; there is no universal scale (a scale applying to everything). Some things, like
metaphysics, have no temperature; the concept of temperature (and whatever scale might be used to
measure it) does not apply in this domain. The point is that relative to words, which are only loosely
and implicitly comparative, scales are precisely and explicitly comparative, and their range is usually
quite broad (otherwise, why bother to develop a systematic scale?).
Now, it is true that some natural-language adjectives, such as “warm,” are explicitly
comparative. But most words are more ambiguous. This is apparent in the extra locutions that are
necessary in order to render ordinary language comparative. One must clarify “warmer than,” “more
chair-like than,” and so forth; whereas, to append such judgments to a numerical scale is redundant.
(One does not say, “70 degrees F, warmer than 65 degrees F.”) Numerical scales are already
comparative, and no matter what one does with a numerical observation it cannot lose its precise,
explicitly comparative quality.
To be sure, if one labels an object with a noun – e.g., “chair” – one is implicitly (if not
explicitly) comparing it with other objects: non-chairs. Language has this universal aspect; if we call
something X, we imply that other things are not-X, or less-X or more-X. However, the comparisons
are vague. It is unclear, for example, where a chair leaves off and a stool begins, for few words – and
very few key words in social science – have crisp boundaries. More importantly, most words are
multivalent; they have more than one attribute and consequently can mean more than one thing.
Thus, to say that an object is “not-A” could mean a number of different things, depending on the
attribute(s) that are intended by an author, or understood by the reader. Moreover, a word usually
gains meaning by its context, and this context is undefined in settings other than that which the
author is studying. Additionally, the other objects that are not-X are typically not defined, in which
case the larger population of cases (the domain of the inference) remains implicit. Finally, words are
contingent upon a particular natural language, and this imposes another sort of contextual boundary
against comparison. (By contrast, the number “5” and the operator “=” mean the same thing
everywhere – since the adoption of a uniform language of mathematics – and they also mean the
same thing in all contexts that they might be employed.)
Frequently, natural-language comparisons are without any obvious comparative reference-
point. The statement “Caesar crossed the Rubicon” is comparative in so many possible ways that it

might be considered non-comparative: he did not cross it; he did not cross the Tiber; it was not
Brutus who crossed the Rubicon; and so forth. If this is “comparative,” it is in the most minimal
sense. Yet, it is important for our purposes to recognize that comparison is a matter of degree.
Qualitative observations can be more or less comparative, but quantitative observations are almost
always more precisely and explicitly comparative.
One final clarification is in order. We have said that all quantitative statements about the
world invoke a class of events; these form the basis for the metric. However, it does not follow that
quantitative statements about the world are necessarily broader in scope than qualitative statements.
Indeed, the very fuzziness of natural language issues a license to generalize – for one can avoid saying
anything terribly specific – while the exactness of quantification may rein in the temptation to
generalize. It follows that qualitative statements can be either very restricted in scope (as in our
previous example about the singular event of 9/11) or extremely broad. Saying something in words
does not affect the scope of the inference. Saying something with a metric, however, presupposes a
class of referents, which is to say it must make reference to more than one discrete event (and these
reference-points must be fairly precise and explicit).

TRADEOFFS: THICK AND THIN DESCRIPTION


In principle, any qualitative observation can be converted into quantitative form, as attested
by the plethora of methods designed to perform precisely this function, e.g., NUD-IST software
(Gahan and Hannibal 1999), Computer Assisted Qualitative Data Analysis Software (CAQDAS) (e.g.,
Fielding and Lee 1998), various narrative-based methods (Abbott 1992; Abell 2004; Buthe 2002;
Griffin 1993), as well as more generic forms of content analysis (Krippendorff 2003). There is no
such thing as a non-quantifiable observation because any single statement that can be made about
one phenomenon could also be made about another phenomenon, thus providing the possibility of
some sort of scale.
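The generic move these tools perform can be sketched schematically (our own toy example; the codebook is invented and far cruder than anything in the packages cited above):

```python
from collections import Counter

# Hypothetical coding scheme mapping keywords to analytic categories.
CODEBOOK = {
    "strike": "labor_unrest",
    "protest": "labor_unrest",
    "election": "electoral_politics",
    "ballot": "electoral_politics",
}

def code_document(text: str) -> Counter:
    """Count category 'hits' in a text, yielding a crude numerical scale."""
    counts = Counter()
    for word in text.lower().split():
        word = word.strip(".,;:!?")
        if word in CODEBOOK:
            counts[CODEBOOK[word]] += 1
    return counts

doc = "The strike followed the disputed election; a second protest was planned."
print(code_document(doc))
# Counter({'labor_unrest': 2, 'electoral_politics': 1})
```

Note what the counting presupposes: every coded instance is treated as an example of ‘the same’ category – that is, as comparable.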
Yet, it is not clear that one would always want to make this transposition from words to
words-with-numbers (variables). Indeed, there are usually costs associated with this conversion. The
tradeoff may be understood in terms of precision and explicit-ness on the one hand, and depth (or
richness) on the other. More concisely, the analyst has the option of describing thinly or describing
thickly. Begin with the fact that words are usually multivalent (Robinson 1954); they generally carry a
variety of attributes, some of which may not even be logically consistent. This is particularly true of
key words in social science – e.g., democracy, justice, corporatism, political party, and so forth. When
one of these words is converted into a measurable variable, which is to say into a precise scale, a
researcher is generally forced to drop one or more of its attributes. For not all of these attributes will
be precisely applicable to the class of phenomena that the concept is now (quite explicitly) intended
to cover. Recall that we are not necessarily describing an expansion in scope, for natural language can
reach as far as mathematical variables. But in making the comparison precise and explicit, it is usually
necessary to narrow the definition of the natural-language concept. To be sure, it could be that the
intension of the natural-language concept is also quite a bit narrower than the full set of attributes
normally (in ordinary language) associated with the term. An author is free to define a term as she
sees fit; qualitative work is not wedded to ordinary language (Sartori 1984). The point is, in creating a
variable one is forced to make explicit choices about which definitional attributes properly apply to a
class of phenomena, and which do not. This is likely to prompt some narrowing of the semantic
options. And this is why we consider the choice to quantify a concept as a move toward thin
description. More explicit comparisons are being made, but they are narrowed down to one or
several dimensions (the chosen attributes of the core concept).
Similarly, if one chooses to particularize, rather than to generalize, natural language is the
obvious vehicle of choice. As we have pointed out, it is inappropriate to construct a scale when the
class of instances under investigation is one or several. A scale presupposes a population. By contrast,
a word can be used in a highly specific context; it does not presuppose an explicit comparison with
other instances. This means that in describing the singularity of an event one is drawn towards the

implements of natural language. The lack of perfect commensurability between words used in one
context and the same words used in another context gives the researcher latitude to elucidate
what is different – categorically (qualitatively), not marginally (quantitatively) – about that
phenomenon. A very high score on some scale can be (indeed, must be) indicated with a quantitative
metric. But a very different kind of score requires a word, perhaps a series of words.
In short, there are gains and losses in the transposition of words to numbers, and vice-versa.
What is interesting about this classic debate is that both may be described as “reductionist.”
Quantitative studies are often accused of reducing reality, and in the process distorting that reality to
fit the austere requirements of the quantitative format. Each piece of reality must be sliced up into
variables and these must be comparable across all observations. Qualitative studies are accused of a
different kind of reduction, in which a subject is shrunk down to a highly particular context – the
country, neighborhood, or event of special interest.
From our perspective, these contrasting notions of reduction/expansion are best understood
as arguments over comparability. Scholars inclined toward the tools provided by natural language are
often keen to explore a wide variety of different aspects in a particular setting. They wish to explore
multiple dimensions of one thing. (“Dimension” is employed as a synonym for “variable” in this
context.) In Howard Becker’s (1996: 65) words, one is “trying to find out something about every
topic the research touches on, even tangentially.” Thus, Lizabeth Cohen’s (1991) history of the New
Deal follows the narrative of this extraordinary epoch in one city (Chicago) in extraordinary detail.
Clifford Geertz (1980) focuses on the “theatre state” in Bali, but his analysis touches on virtually all
aspects of Balinese culture, economy, and society. Qualitative analysis is thus often focused inward,
like a vast funnel. Many comparisons are made, but they are all understood as features of the same
general topic, existing in one time and place. Natural language is well suited to this purpose, for it is rich,
textured, context-specific, and multivalent. It elucidates a wealth of details about a person, event, or
situation. This is why we find a natural affinity between qualitative tools and ethnographic, historical,
and – more broadly – interpretivist styles of research. By contrast, scholars inclined toward a
numerical understanding of the world are drawn toward comparisons that are broad and thin. They
intend to explore one particular dimension of many things.
The interesting aspect of this familiar contrast is that both qualitative and quantitative
scholars perceive their work as conforming to the natural bend of the universe. Qualitative scholars
usually assume a case-centered approach. Different aspects of the same cases can be compared; they go
together. Quantitative scholars are drawn toward a dimensional approach to comparability. A single
aspect (dimension) of an entity is assumed to be comparable across multiple cases (Ragin 1997).
While for a qualitative scholar it would seem natural to explore everything about A, for a quantitative
scholar it would seem more natural to explore one thing about A, B, and C. Underlying scholars’
choice of method are certain assumptions about cross-case comparability. The tools we choose –
words or numbers – are, in part, the expression of our relative confidence in the ability to compare
across entities in a given research context.9

CAUSAL ANALYSIS: MATHEMATICAL, MILL-EAN, AND PROCESS TRACING

Thus far, we have argued that the venerable distinction between qualitative and quantitative
description in the social sciences is closely linked to, indeed largely derivative of, an underlying issue,
that of comparability. Quantitative work presumes a high level of comparability across observations
(either cross-case or within-case), reflected in the explicit and precise nature of a metric. Qualitative
work usually presumes that such comparisons are problematic and that, therefore, they should either

9 This is not to deny that many scholars use both words and numbers. The point is that within a
given context the likelihood of choosing one or the other strategy is influenced by assumptions about case-
comparability.

be avoided (in favor of a small scope) or left ambiguous (as ordinary language does). This is the
principal methodological justification for doing work that is quantitative or qualitative.
In this section we shift our focus from description to causal analysis, and to questions of
research design. However, the thrust of our argument will be familiar. We argue that the key divisions
in research design are best understood according to assumptions about comparability, specifically, the
number of observations that are deemed comparable within a given sample. This leads us to
distinguish three broad categories of research designs: 1) Mathematical (large-N), 2) Mill-ean (small-N),
and 3) Process Tracing (N=1).
Table 1 illustrates the defining features of these genres, most of which follow more or less
ineluctably from differences in sample size, which itself is driven largely by the assumed
comparability of adjacent observations. Since these are extraordinarily broad groupings,
encompassing all disciplines in the social sciences, and since the categories themselves are internally
diverse, it seems appropriate to refer to them as methodological genres, a term that also captures
salient differences in exposition. In any case, it should be clear that when speaking about
“Mathematical methods” or “Mill-ean methods” or “Process Tracing methods” we are speaking
about a diverse group of approaches.
We also recognize that there is a certain degree of genre-crossing within any given study.
Individual studies may employ a mix of Mathematical, Mill-ean, and Process-Tracing methods. This
may occur for several reasons. For example, when studies move across levels of analysis it is
common for authors to shift analytic tools. Indeed, different methods may be required when
analyzing cross-case and within-case evidence. However, at any given point in a study, the author
must employ one of these three genres of research design. In this respect, the classification is
mutually exclusive and exhaustive. Moreover, most individual studies can be readily classified
according to the predominant mode of analysis that the author employs. This usually corresponds to
the principal unit of analysis. Most studies in the social sciences today fit one of these three types,
even if they do not fall exclusively in one box.10

10 It should be clarified that this tripartite typology refers to methods of causal analysis, not to methods
for selecting the cases or observations. Prior to causal analysis, we assume that researchers have carefully
selected observations and cases (either randomly or purposefully), and that researchers have generated data
appropriately (either by experimental manipulation or some natural process). These observations may contain
quasi-experimental characteristics or they may be far from the experimental ideal. Causal analysis may be
conducted across cases or within cases. For our purposes, these issues are extraneous, though by no means
trivial. In bypassing them we do not intend to downplay them. Our intention, rather, is to focus narrowly on
what analysts do with data once cases have been chosen, the data have been generated, and the relevant
observations have been defined. This topic, we believe, is much less well understood.

Table 1:
Three Genres of Research Design

                              Mathematical          Mill-ean                    Process Tracing

No. of comparable obs (N):    Large                 Small                       One

Total obs:                    Large                 Small                       Indeterminate

Individual obs:               Always quantitative   Usually qualitative         Quant or Qual

Presentation of obs:          Rectangular dataset   Table or prose              Prose

Analytic techniques:          Statistics,           Most-similar,               Counterfactual,
                              Boolean algebra       Most-different              Pattern-matching

Covariation:                  Real                  Real                        Real and imagined

Stability, replicability:     High                  Moderate                    Low

Familiar labels:              Statistics, QCA       Comparative,                Historical, Narrative,
                                                    Comparative-historical,     Ethnographic, Legal,
                                                    Small-N cross-case study    Journalistic,
                                                                                Single-case study

MATHEMATICAL METHODS
The Mathematical genre will be familiar to most readers because it is represented by
hundreds of methods texts and courses. Here, the analysis is typically conducted upon a large sample
of highly comparable observations contained in a standard rectangular dataset, using some
mathematical algorithm to establish covariational patterns within the sample. For better or worse,
this is the standard template upon which contemporary understandings of research design in the
social sciences are based. For some, it appears to be the sine qua non of social science research (Beck
2004; Blalock 1982, 1989; Goldthorpe 2000; King, Keohane, Verba 1994; Lieberson 1985; for general
discussion see Brady and Collier 2004).
Our use of the term “Mathematical” does not presuppose any particular assumptions about
how this analysis is carried out. If statistical, the model may be linear or non-linear, additive or non-
additive, static or dynamic, probabilistic or deterministic, and so forth. The only assumption that
statistical models must make is that the observations are comparable to one another – or, if they are not,
that non-comparabilities can be corrected for by the modeling procedure (e.g., by weighting
techniques, selection procedures, matching cases, and so forth). For statisticians, the assumption of
unit homogeneity is paramount. These requirements apply whether the observations are defined
spatially (a cross-sectional research design), temporally (a time-series research design), or both (a
time-series cross-section research design). By extension, the same requirements apply whether the
analysis is probabilistic (as in most statistical methods) or deterministic (as in some versions of
Qualitative Comparative Analysis [Ragin 1987, 2000]).
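As a minimal illustration of the genre (toy data invented for the purpose; any of the model variants just mentioned would serve equally well), consider a rectangular dataset submitted to ordinary least squares:

```python
import numpy as np

# A rectangular dataset: each row is one observation, assumed to be
# comparable to every other row; each column is one variable.
#              X1    X2     Y
data = np.array([
    [1.0,  2.0,  4.1],
    [2.0,  1.0,  5.6],
    [3.0,  4.0,  8.9],
    [4.0,  3.0, 10.6],
    [5.0,  6.0, 13.9],
    [6.0,  5.0, 15.6],
])

X = np.column_stack([np.ones(len(data)), data[:, :2]])  # add intercept
y = data[:, 2]

# OLS: extract the covariational pattern beta in y = X @ beta + error.
# The toy outcome was built from intercept 1, slopes 2 and 0.5 (plus
# small noise), so the estimates land near [1, 2, 0.5].
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)
```

The algorithm is indifferent to what the rows mean; it requires only that they be comparable – unit homogeneity, again.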
As a rule, Mathematical work employs a sample that remains fairly stable throughout the
course of a single study. Granted, researchers may exclude or down-weight outliers and high-leverage
observations, and they may conduct sub-sample analyses. They may even interrogate different
datasets in the course of a longer study, or recode the sample to conduct sensitivity analyses.
However, in all these situations there is an explicit, well-defined sample of highly comparable
observations that provides the evidentiary basis for causal inference. The importance of this issue will
become apparent as we proceed.

MILL-EAN METHODS
The two most familiar Mill-ean methods are most-similar analysis (aka the method of agreement) and most-different analysis (aka the method of difference), both of which can be traced back to J.S. Mill’s nineteenth-century classic, System of Logic (1843/1872). In most-similar analysis, cases are
chosen so as to be similar on all irrelevant dimensions and dissimilar on both the hypothesized causal
factor and the outcome of interest. In most-different analysis, cases are chosen to maximize
difference among the cases on all causal factors (except one), while maintaining similarity on the
outcome. The most-similar research design is more common, and probably better grounded,
than the most-different research design (Gerring 2006: ch 5).
The details of these research designs are not important here. What is important is that the
cross-case component of the analysis be fairly explicit. There must be a recognizable sample within
which the chosen cases are analyzed. In other words, there must be significant cross-case variation
and this variation must comprise an important element of the overall analysis. “Comparative-
historical” work is similar to the foregoing except that the analysis also incorporates a significant
over-time component (Mahoney and Rueschemeyer 2003). Cases are thus examined spatially and
temporally, and the temporal analysis usually includes a change in one or more of the key variables,
thus introducing an intervention (“treatment”) into the analysis.11

11 Our discussion thus far has approached Mill-ean methods according to the primary unit of analysis, usually referred to as a “case” (a spatially and temporally delimited unit that lies at the same level of analysis as the principal inference). To be sure, this genre of work may also exploit within-case variation, which might be large-N (e.g., a mass survey of individual respondents or a time-series analysis of some process), small-N (e.g., a comparison among a half-dozen regional units), or a series of N=1 observations (e.g., a study of a particular decision or set of decisions within the executive). In short, the within-case components of Mill-ean methods are indeterminate; they may be Mathematical, Mill-ean, or Process Tracing. The fact that a single study may employ more than one method is not disturbing; as we observed, a change in an author’s level of analysis often corresponds to a change in research design.
Mill-ean methods, like Mathematical methods, are based upon a relatively stable sample of
comparable cases. Granted, there are likely to be some shifts in focus over the course of a longer
study. Sometimes, a researcher will choose to focus on a series of nested sub-samples, e.g., paired
comparisons (Collier and Collier 1991). The small size of the sample means that any change in the
chosen cases will have a substantial impact on the sample, and perhaps on the findings of the study.
Ceteris paribus, small samples are less stable than large samples.
Because Mill-ean methods must employ cases that are relatively comparable to one another,
they may be represented in a standard, rectangular dataset where the various dimensions of each case
are represented by discrete variables. Yet, because there are few cases (by definition), it is rare to see a
dataset presentation of the evidence. Instead, scholars typically rely on small tables, 2x2 matrices,
simple diagrams, or prose.
The most important difference between Mathematical methods and Mill-ean methods is that
the latter employ small samples that may be analyzed without the assistance of interval scales and formal mathematical models. Indeed, statistics are relatively powerless when faced with samples of a dozen or fewer. A simple bivariate analysis may be conducted, but this does not go much further than what could be observed visually in a table or a scatterplot.
Another difference from the Mathematical framework is that Mill-ean methods presuppose a fairly simple coding of variables, usually in a dichotomous manner. Similarities and differences across cases must be clear and distinct; otherwise they cannot be interpreted (due to the small-N problem).
Thus, continuous variables are usually dichotomized into high/low, present/absent, strong/weak,
and so forth. Simple coding schemes, and the absence of probability distributions, impose a
deterministic logic on Mill-ean methods, such that causal factors (or combinations of factors) must
be understood as necessary, sufficient, or necessary and sufficient. Deterministic assumptions may
also be employed in Mathematical methods, particularly Boolean methods, but they are not de
rigueur. The smaller the sample size, the more difficult it is to incorporate continuous causal factors
and probabilistic logic. However, in other respects – aside, that is, from sample size – Mathematical
and Mill-ean methods are quite similar. They both rely on covariational patterns across observations
that are presumed to be comparable to one another.
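The covariational core of a Mill-ean analysis can be sketched in a few lines (a toy example of our own; the cases and codings are invented). With cases matched on background factors, whatever covaries with the outcome survives as the candidate cause:

```python
# Hypothetical dichotomous codings for a most-similar design: two cases
# agree on background factors A and B but differ on the hypothesized
# cause X and the outcome Y.
cases = {
    "Case 1": {"A": 1, "B": 1, "X": 1, "Y": 1},
    "Case 2": {"A": 1, "B": 1, "X": 0, "Y": 0},
}

def candidate_causes(cases, outcome="Y"):
    """Return factors whose codings covary (positively) with the outcome."""
    factors = [f for f in next(iter(cases.values())) if f != outcome]
    return [
        f for f in factors
        if all(c[f] == c[outcome] for c in cases.values())
    ]

print(candidate_causes(cases))  # ['X']: A and B, being constant, are eliminated
```

Nothing here requires an interval scale; but note how heavily the inference leans on the clear, dichotomous coding of each case.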

PROCESS-TRACING METHODS
Process Tracing, in our understanding, refers to any method of causal analysis in which the
researcher analyzes a series of noncomparable observations within a single case.12 Studies that employ
Process Tracing typically consist of many observations (either qualitative or quantitative), each
making a slightly different point, but all related to some overall argument (i.e., the primary inference).
Since the observations are not comparable to one another, the presentation is delivered in prose (or

12 The term “process tracing” is ambiguous, having been appropriated for a variety of uses. For some writers, it refers to any investigation (qualitative or quantitative) into causal mechanisms (George and Bennett
2005). There is, to be sure, a strong affinity between this technique, as we describe it, and a researcher’s insight
into causal paths. However, it may be a mistake to define process tracing as the search for causal mechanisms.
After all, this is also an objective of Mathematical and Mill-ean studies. In short, while Process-Tracing
methods give more attention to causal mechanisms, this should not be considered a defining feature. Process
tracing relies on the use of “causal-process observations” (following Brady and Collier 2004) to analyze causal
relationships. Other labels for this style of causal inference include colligation, narrative explanation, pattern-
matching, sequential explanation, genetic explanation, and causal chain explanation. For general discussion, see
Brady (2004), George and Bennett (2005: ch 8), Little (1995: 43-4), Scriven (1976), Seawright and Collier (2004),
Tarrow (1995: 472). For examples, see Geddes (2003: ch 2), George and Smoke (1974), Goldstone (2003: 50-1),
George and Bennett (2005: appendix).

“narrative analysis” [Mahoney 1999]). However, it is the absence of comparability among adjacent
observations – not the use of prose – that makes this approach so distinctive, and so mysterious.
Process-Tracing methods do not conform to standard notions of methodological rigor because most
elements of a “research design,” in the usual sense of the term, are absent. There is, for example, no
formally defined sample of observations, as with Mathematical and Mill-ean methods. Moreover, the
methods for making causal inferences that link observations into a causal chain are often not
explicitly stated. Consequently, Process-Tracing studies give the impression of being informal, ad hoc
– one damn observation after another.
The skepticism of mainstream methodologists is not difficult to comprehend. William Riker
(1985: 62-3; see also Beck 2004) regards process tracing as “scientifically impossible.”
Tracing a process, and imposing a pattern is, of course, no more and no less than writing
history. Although some nineteenth-century historians claimed to be scientific, such a claim
has seldom been put forward in this century until now, when it rises up, camouflaged, in
social science. There was good reason for abandoning the claim: Historical explanation is
genetic. It interprets cause as no more than temporal sequence, which, in the philosophy of
science, is precisely what has long been denounced as inadequate. Causality in science is a
necessary and sufficient condition; and, although temporal sequence is one of several
necessary conditions, it is not sufficient. . . Process-tracing of the history of an event, even
the comparison of several traced processes, does not give one generalizations or theory.
However, we shall argue that the wayward reputation of Process Tracing is only partially deserved.
Indeed, inferences drawn from Process-Tracing methods may be more secure, at least in some
instances, than inferences based on Mathematical or Mill-ean methods. There are strong arguments
for the employment of non-comparable (N=1) observations in social science.
We begin with an extended example drawn from Henry Brady’s (2004: 269-70) reflections
on his study of the Florida election results in the 2000 presidential election (conducted in tandem
with a team of methodologists). In the wake of this close election, at least one commentator suggested that because several networks called the state for Gore prior to the closing of the polls in the Panhandle section of the state, Republican voters might have been discouraged from going to the polls, which in turn might have affected the margin (which was razor thin and bitterly contested in
the several months after the election) (Lott 2000). In order to address the question, Brady stitches
together isolated pieces of evidence in an inferential chain. He begins with the timing of the media
calls – ten minutes before the closing of the polls in the Panhandle. “If we assume that voters go to
the polls at an even rate throughout the day,” Brady continues, “then only 1/72nd (ten minutes over
twelve hours) of the [379,000 eligible voters in the panhandle] had not yet voted when the media call
was made.” This is probably a reasonable assumption. (“Interviews with Florida election officials
and a review of media reports suggest that, typically, no rush to the polls occurs at the end of the day
in the panhandle.”) This means that “only 4,200 people could have been swayed by the media call of
the election, if they heard it.” He then proceeds to estimate how many of these 4,200 might have
heard the media calls, how many of those who heard it were inclined to vote for Bush, and how many of these might have been dissuaded, by the announcement, from going to the polls in the closing minutes of
the day. Brady concludes: “the approximate upper bound for Bush’s vote loss was 224 and . . . the
actual vote loss was probably closer to somewhere between 28 and 56 votes.”
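The arithmetic skeleton of this chain is easy to reconstruct (a sketch of our own; only the 4,200 figure comes from the passage quoted above, and the three attenuation rates below are hypothetical placeholders, not Brady’s published estimates):

```python
# Back-of-envelope bounding logic in the style of Brady's argument.
not_yet_voted  = 4_200  # ~1/72 of panhandle voters: ten minutes of a 12-hour day

p_heard_call   = 0.20   # hypothetical: share who heard the media call in time
p_bush_leaning = 0.65   # hypothetical: share of those inclined to vote for Bush
p_deterred     = 0.10   # hypothetical: share actually dissuaded from voting

vote_loss = not_yet_voted * p_heard_call * p_bush_leaning * p_deterred
print(round(vote_loss))  # 55, with these placeholder rates
```

Each multiplier rests on a different kind of observation, drawn from a different source – poll-closing times, media reports, interviews with election officials – which is precisely why the pieces cannot be pooled into a single sample.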
Brady’s conclusions rest not on a formal research design but rather on isolated observations
(both qualitative and quantitative) combined with deductive inferences: How many voters “had not
yet voted when the media called the election for Gore? How many of these voters heard the call? Of
these, how many decided not to vote? And of those who decided not to vote, how many would have
voted for Bush?” (Brady 2004: 269). This is the sort of detective work that fuels the typical Process-
Tracing study, and it is not a sort that can be represented in a rectangular dataset. The reason is that
the myriad pieces of evidence are not comparable to each other. They all support the central
argument – they are not “random” – but they do not comprise observations in a larger sample. They
are more correctly understood as a series of N=1 (one-shot) observations – or perhaps the more
ambiguous phrase “pieces of evidence” is appropriate. In any case, Brady’s observation about the
timing of the call – ten minutes before the closing of the poll – is followed by a second piece of

evidence, the total number of people who voted on that day, and a third and a fourth. It would be
impossible to string these together into a large, or even moderately-sized, sample, because each
element is disparate. Being disparate, they cannot be counted. While the analytic procedure seems
messy, we are convinced by its conclusions – more convinced, indeed, than by the large-N analysis
that Brady is arguing against. Thus, it seems reasonable to suppose that, in some circumstances at
least, Process Tracing is more scientific than sample-based inferences, even though its method is
difficult to describe.
This is the conundrum of Process-Tracing research. We are often convinced by the results,
but we cannot explain – at least not in any formal fashion – why. Our confidence appears to rest on
highly specific propositions and highly specific observations. There is little we can say, in general,
about “Brady’s research design” or other Process-Tracing research designs. It is no surprise that
Process Tracing receives little or no attention from traditional methods texts, structured as they are
around the quantitative template (e.g., King, Keohane, and Verba 1994). These methods texts do not
tell us why a great deal of research in the social sciences, including a good deal of case study research,
succeeds or fails.
While sample-based methods (both Mill-ean and Mathematical) can be understood according
to their covariational properties, Process-Tracing methods invoke a more complex logic, one that is
analogous to detective work, legal briefs, journalism, traditional historical accounts, and single-case
studies. The analyst seeks to make sense of a congeries of disparate evidence, some of which may
explain a single event or decision. The research question is always singular, though the ramifications
of the answer may be generalizable. Who shot JFK? Why did the US invade Iraq? What caused the
outbreak of World War One? Process-Tracing methods are, by definition, case-based. If a researcher
begins to draw comparisons with other assassinations or other wars, then she is using (at least
implicitly) a Mill-ean method, which means that all the standards of rigor for Mill-ean methods
pertain and the researcher is entering a different methodological context.
It is important to note that the observations enlisted in a Process-Tracing case study may be
either qualitative or quantitative. Brady employs a good deal of quantitative evidence. However,
because each quantitative observation is quite different from the others they do not collectively
constitute a sample. Each observation is sampled from a different population. This means that each
quantitative observation is qualitatively different. Again, it is the comparability of adjacent
observations, and the number of those observations, not the nature of the individual observations,
that define a study as Mathematical, Mill-ean, or Process Tracing.
Note also that because each observation is qualitatively different from the next, the entire set
of observations in a Process-Tracing study is indeterminate and unstable. The “sample” (we use this
term advisedly) shifts from observation to observation. Because of this, we refer to samples of 1, or
N=1 observations. A careful reader might object that the notion of an “observation” implies the
existence of other comparable observations in a larger population. We accept that this is true for
most observations. The issue is not whether comparable observations exist, but rather whether those
other observations are considered (i.e., sampled and analyzed) in the case study. If they are not
considered, then we have a set of N=1 observations. Regardless of how carefully one seeks to define
these things, there should be no disagreement on our basic point that samples, populations, and
sampling techniques are not well specified in Process-Tracing methods. If they are well specified,
then we are working in the realm of Mill-ean or Mathematical methods.
There are likely to be a number of non-comparable observations in a single Process-Tracing
study; indeed, the cumulative number of observations may be quite large. However, because these
observations are not well defined, it is difficult to say exactly how many there are. Non-comparable
observations are, by definition, difficult to count. Recall, from our previous discussion, that the act of
counting presumes comparability among the things being counted. Process-Tracing evidence lacks
this quality; this is why it is resistant to the N question. In an effort to count, one may of course
resort to lists of what appear to be distinct pieces of evidence. This approximates the numbering
systems commonly employed in legal briefs. But lists can always be composed in multiple ways, so
the total number of observations remains an open question. We do not know, and by the nature of
the analysis cannot know, precisely how many observations are present in studies such as Richard
Fenno’s Home Style (1978), Herbert Kaufman’s The Forest Ranger (1960), Clifford Geertz’s Negara (1980),
and Jeffrey Pressman and Aaron Wildavsky’s Implementation (1973). Process-Tracing observations are
not different examples of the same thing; they are, instead, different things. Consequently, it is not clear
where one observation ends and another begins. They flow seamlessly together. Thus, we cannot re-
read Fenno, Kaufman, Geertz, or Pressman and Wildavsky with the aid of a calculator and hope to
discover their true N, nor would we gain much – if any – analytic leverage by so doing. Quantitative
researchers are inclined to assume that if observations cannot be counted they must not be there, or
– more charitably – that there must be very few of them. Qualitative researchers may insist that they
have many “rich” observations at their disposal, which provide them with the opportunity for thick
description. But they are unable to say, precisely, how many observations they have, or where these
observations are located, or how many observations are needed for thick description. Indeed, the observations
themselves remain undefined.
This ambiguity is not, in our opinion, troublesome, for the number of observations in a
Process-Tracing study does not bear directly on the usefulness or truthfulness of that study. While
the number of observations in a sample drawn from a well-defined population contains information
directly relevant to any inferences that might be drawn from that sample, the number of observations
in a Process-Tracing study (assuming one could estimate the N) has no obvious relevance to
inferences that might be drawn from that study. Consider that if it were merely quantity that mattered,
we might safely conclude that longer studies, which presumably contain more observations, are more
reliable or valid than shorter studies. Yet it is laughable to assert that long books are more
convincing than short books. It is the quality of the observations and how they are analyzed, not the
quantity of observations, that is relevant in evaluating the truth-claims of a Process-Tracing study.
Thus, the N=1 designation that we have attached to Process-Tracing evidence should not be
understood as pejorative. In some circumstances, one lonely observation (qualitative or quantitative)
is sufficient to sustain an inference. This is quite common, for example, when the author is attempting
to reject a necessary or sufficient condition. If we are inquiring into the cause of Joe’s demise, and we
know that he was shot at close range, we can eliminate suspects who were not in the general vicinity.
One observation – say, a videotape from a surveillance camera – is sufficient to provide conclusive
proof that a suspect was not, in fact, the killer, even though the evidence is neither quantitative nor
comparable to other pieces of evidence.
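The eliminative logic can be rendered schematically. The following Python sketch is purely
illustrative – the suspects and the decisive observation are invented for exposition, and no such
formalization appears in the original example:

# Purely illustrative sketch: a single decisive N=1 observation suffices to
# reject a necessary condition (presence at the scene) for a suspect's guilt.
suspects = {"Smith", "Jones", "Garcia"}

# One observation -- say, a surveillance tape -- places Jones elsewhere at
# the time of the shooting.
placed_elsewhere = {"Jones"}

# Eliminative inference: presence is necessary for being the killer, so
# anyone placed elsewhere is ruled out on the strength of one observation.
remaining_suspects = suspects - placed_elsewhere
print(remaining_suspects)  # e.g. {'Smith', 'Garcia'} (set order may vary)

The sketch simply restates the point in the text: the tape is neither quantitative nor comparable to
other pieces of evidence, yet it does decisive inferential work.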
Process-Tracing methods apply only to situations in which the researcher is attempting to
reconstruct a sequence of events occurring within a single case – i.e., a relatively bounded unit such
as a nation, family, legislature, or decision-making body. That case may be quite broad, and might even
encompass the whole world, but it must be understood as a single unit for purposes of the analysis.
All Process-Tracing methods are inherently within-case analyses. If several cases are analyzed, the
researcher has adopted a different style of analysis, one in which there is a specifiable sample (either
large-N or small-N). The researcher may, for example, have begun with a Process-Tracing analysis
within one case study, and later switched levels of analysis by comparing that case study with other
case studies using Mill-ean or Mathematical methods.
The specific techniques employed in Process-Tracing studies – including pattern-matching
and counterfactual thought-experiments – have been explored by other authors (Campbell 1975/1988;
George and Bennett 2005; Hall 2003; Roberts 1996; Tannenwald 1999). At present, we wish simply
to call attention to a fundamentally puzzling aspect of this methodological genre. Process Tracing
rests on very proximate evidence (observations lying close to the “scene of the crime”) and, at the
same time, on very general assumptions about the theory at hand or the way the world works.
Process Tracing thus lies at both extremes of the inductive-deductive spectrum. Sample-based studies,
by contrast, generally require fewer deductive assumptions and, at the same time, are more removed
from the facts of the case. The extreme quality of Process Tracing – which bounces back and forth
from Big Theory to detailed observation – contributes to its “unstable” reputation. However, there
are often good reasons for this back-and-forth, as our previous discussion indicates.
CONCLUSION

According to the standard view, approaches in the social sciences can be understood as
either qualitative or quantitative (or some mix thereof). Scholars differ in their opinions about the
utility of this distinction. Some dismiss it as a red herring; others feel that there is good justification
for the division. We feel that there is no plausible way of discarding the distinction, but that it is
greatly misunderstood. The key to this misunderstanding, we have argued, lies in
assumptions about comparability.
When thinking about comparability, it is vital to distinguish between descriptive statements
and causal analysis, the respective foci of the first and second parts of this paper. With respect to descriptive
propositions about the world, there is indeed a difference in basic-level assumptions between
statements that are quantitative (i.e., understood through a numerical scale, a metric, or a variable)
and those which are qualitative (expressed in natural language), even if the focus is ostensibly on a
single observation. Quantitative descriptive statements presuppose a class of comparable cases that
can be compared in an explicit and precise manner. To measure a phenomenon, X, is to impose a very
specific metric on it, one that is explicitly comparative (since other phenomena in this same category
are assumed to be score-able). Qualitative descriptive statements do not make any such
presuppositions. There may or may not be an identifiable class of comparable cases that can be
measured along some set of dimensions. Often, the assumption is quite the reverse – particularizing
rather than generalizing. Thus, we argued that to quantify something is to compare in an explicit and
precise manner. To qualify is to leave such comparisons open; one may or may not engage explicit
comparisons with adjacent cases, and any comparisons made are unlikely to be very precise. While
this might seem to indicate a distinct advantage for quantitative work, we also showed that there are
costs to assuming a quantitative idiom. Not only must the cases be (actually) comparable, but there is
usually some loss of information since words are usually multivalent and metrics are usually
unidimensional (or at most combine several dimensions). It is not clear that we always gain in analytic
leverage by moving from words to numbers. We do, however, make different sorts of comparisons.
The choice between math and natural language as tools of social science is, therefore, highly
consequential. Methodological tools help us to reconstruct the empirical world; they are
not theory-neutral. In this respect, the division between math and language is akin to the influence
that early anthropologists and linguists assigned to particular languages. Different languages divide up the world
into different packages; they encourage us to visualize things in different ways.13 So, arguably, do the
different “languages” of mathematics and natural language. Quantitative tools help us to compare,
and hence to generalize; qualitative tools encourage us to differentiate.
13 This view is ascribed to the work of Edward Sapir (1884-1939) and Benjamin Whorf (1897-1941),
and is known generally as the Sapir-Whorf hypothesis (Black 1962).
It is quite another thing, however, to disentangle the causal priority of methodologies and
ontologies. Do cultural anthropologists use qualitative tools because they envision a lumpy universe,
or do they see a lumpy universe because they perceive it with qualitative tools? About all one can say
with any degree of confidence is that methods and ontologies are, at the very least, mutually
reinforcing. This may help to account for the
virulence and endurance of this central cleavage in the social sciences today.
In pointing to the problem of comparability as the key feature of this debate, we hope to
have contributed to a demystification of this longstanding distinction, perhaps even to a
reconciliation of the two camps. It is not merely a matter of numbers versus words, or a debate about
what can or cannot be quantified. More fundamentally, the venerable debate reflects deep
disagreements over how precise, explicit, and extensive social-science comparisons ought to be.
Those who resist numerical analysis, like Alasdair MacIntyre (1971/1978), are dubious about the
validity of comparisons. They see no need to enhance the precision or explicitness of comparisons
because they do not seek to compare in the first place, or because they seek a more restricted ambit of
comparative reference-points. Those who embrace quantification, like Gabriel Almond and Sidney
Verba (1963) – the authors of the five-nation Civic Culture study that MacIntyre
attacked – are more comfortable with such comparisons. This debate has been engaged at many
levels – across individuals, across levels of government, across cultures, across time-periods – and
over many years.
In the second part of the paper we moved from descriptive statements to causal analysis.
Here, we argued that there are three genres of research design – Mathematical, Mill-ean, and Process
Tracing – each with a distinctive style of evidence and analysis. Mathematical methods are based on
large and relatively stable samples of highly comparable observations, presented in rectangular dataset
formats, and analyzed with statistical or Boolean techniques. Mill-ean methods employ small, stable
samples of comparable cases, usually presented in a table or figure, and usually analyzed with the
most-similar method. Process-Tracing methods employ non-comparable observations within a single
case study, presented in prose, and analyzed with a variety of techniques such as pattern-matching
and counterfactual thought-experiments.
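To fix ideas, the structural contrast among the three genres can be sketched in miniature. The
following Python fragment is purely illustrative – the variables, cases, and items of evidence are
invented, and no such formalization appears in the studies discussed above:

# A minimal, invented sketch of the three evidentiary structures.
# All names and values are hypothetical.

# Mathematical: many comparable observations of identical shape -- a
# "rectangular" dataset in which N is well defined and meaningful.
mathematical_sample = [
    {"country": "A", "gdp_growth": 2.1, "democracy": 1},
    {"country": "B", "gdp_growth": 0.4, "democracy": 0},
    {"country": "C", "gdp_growth": 3.3, "democracy": 1},
]
print("Mathematical N =", len(mathematical_sample))

# Mill-ean: a small, stable sample of comparable cases, of the sort
# usually presented in a most-similar table.
millean_sample = [
    {"case": "Country A", "treatment": True,  "outcome": "reform"},
    {"case": "Country B", "treatment": False, "outcome": "no reform"},
]
print("Mill-ean N =", len(millean_sample))

# Process Tracing: disparate pieces of within-case evidence, each drawn
# from a different implicit population. The list can be enumerated, but
# the count carries no inferential weight -- the "N question" has no
# stable answer.
process_tracing_evidence = [
    "minutes of the decisive cabinet meeting",
    "a memoir passage recalling the minister's intent",
    "budget figures showing funds shifted before the announcement",
]
print("Items listed:", len(process_tracing_evidence), "(not a sample)")

The point of the sketch is structural: the first two collections can be meaningfully counted and
compared row by row; the third can be listed, but not sampled.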
In certain respects, this tripartite typology preserves the traditional distinction between
quantitative and qualitative work. The Mathematical genre is more or less equivalent to the usual
meaning of the term “quantitative” (though we include Boolean approaches here, which some have
previously categorized as qualitative). What the conventional dichotomy misses, however, is the fact
that individual observations may be either qualitative or quantitative in a Process-Tracing style of
analysis. More important, the traditional quant/qual classification misses the distinction between
causal analysis based on comparable samples (whether Mathematical or Mill-ean) and causal analysis
based on N=1 samples (Process Tracing).
Arguably, the most important division in social science work today is not between quant and
qual, but rather between work that relies on samples composed of comparable observations (either
large-N or small-N) and work based on evidence drawn from a concatenation of disparate N=1
observations. The difference between large-N (Mathematical) and small-N (Mill-ean) is merely a
matter of degrees. By contrast, the difference between sample-based research and causal analysis
based on a series of isolated (N=1) observations is a difference of kind. It is for this reason, we
believe, that Process-Tracing methods have proven so vexing to social science methodologists.
Indeed, Mill-ean methods have received more attention, and a great deal more respectability, than
Process-Tracing methods. Even to mention the latter is to encounter bewilderment (e.g., Riker [quoted
above]; Beck 2004). Yet, the apparently cryptic approach of “Process Tracing” is not necessarily a
methodological weakness. In many instances, as we have shown, it is a necessity.
REFERENCES

Abbott, Andrew. 1992. “From Causes to Events: Notes on Narrative Positivism.” Sociological
Methods and Research 20:4 (May) 428-55.
Abell, Peter. 2004. “Narrative Explanation: An Alternative to Variable-Centered Explanation?”
Annual Review of Sociology 30, 287-310.
Adcock, Robert and David Collier. 2001. “Measurement Validity: A Shared Standard for Qualitative
and Quantitative Research.” American Political Science Review 95, 529-46.
Almond, Gabriel A. and Sidney Verba. 1963. The Civic Culture: Political Attitudes and Democracy in Five
Nations. Princeton: Princeton University Press.
Barnes, Harry Elmer (ed). 1925. The History and Prospects of the Social Sciences. New York: Alfred A.
Knopf.
Barth, Fredrik. 1999. “Comparative Methodologies in the Analysis of Anthropological Data.” In
John R. Bowen and Roger Petersen (eds), Critical Comparisons in Politics and Culture (Cambridge:
Cambridge University Press).
Beck, Nathaniel. 2004. “Is Causal-Process Observation an Oxymoron?: A Comment on Brady and
Collier, Rethinking Social Inquiry.” Ms.
Becker, Howard. 1934. “Culture Case Study and Ideal-typical Method.” Social Forces 12:3, 399-405.
Becker, Howard S. 1996. “The Epistemology of Qualitative Research.” In Richard Jessor, Anne
Colby, and Richard A. Shweder (eds), Ethnography and Human Development: Context and Meaning in
Social Inquiry (Chicago: University of Chicago Press) 53-71.
Benoit, Kenneth. 2005. “How Qualitative Research Really Counts.” Qualitative Methods: Newsletter of
the American Political Science Association Organized Section on Qualitative Methods 3:1 (Spring) 9-12.
Bernard, L.L. 1928. “The Development of Method in Sociology.” The Monist 38 (April) 292-320.
Black, Max. 1962. Models and Metaphors: Studies in Language and Philosophy. Ithaca: Cornell University
Press.
Blalock, Hubert M., Jr. 1982. Conceptualization and Measurement in the Social Sciences. Beverly Hills: Sage.
Blalock, Hubert M., Jr. 1989. “The Real and Unrealized Contributions of Quantitative Sociology.”
American Sociological Review 54:3 (June) 447-60.
Brady, Henry E. 2004. “Data-Set Observations versus Causal-Process Observations: The 2000 U.S.
Presidential Election.” In Henry E. Brady and David Collier (eds), Rethinking Social Inquiry: Diverse
Tools, Shared Standards (Lanham: Rowman & Littlefield).
Brady, Henry E. and David Collier. 2004. Rethinking Social Inquiry: Diverse Tools, Shared Standards.
Lanham, MD: Rowman & Littlefield.
Brodbeck, May (ed). 1968. Readings in the Philosophy of the Social Sciences. New York: Macmillan.
Brown, Christine and Keith Lloyd. 2001. “Qualitative Methods in Psychiatric Research.” Advances in
Psychiatric Treatment 7, 350-6.
Bryman, Alan. 1984. “The Debate about Quantitative and Qualitative Research: A Question of
Method or Epistemology?” British Journal of Sociology 35:1 (March) 75-92.
Buthe, Tim. 2002. “Taking Temporality Seriously: Modeling History and the Use of Narratives as
Evidence.” American Political Science Review 96:3 (September).
Campbell, Donald T. 1975/1988. “‘Degrees of Freedom’ and the Case Study.” In Methodology and
Epistemology for Social Science, ed. E. Samuel Overman (Chicago: University of Chicago Press).
Cohen, Lizabeth. 1991. Making a New Deal: Industrial Workers in Chicago, 1919-1939. Cambridge:
Cambridge University Press.
Collier, David. 1993. “The Comparative Method.” In Ada W. Finifter (ed), Political Science: The State
of the Discipline II (Washington, DC: American Political Science Association).
Collier, David and James E. Mahon, Jr. 1993. “Conceptual 'Stretching' Revisited: Adapting
Categories in Comparative Analysis.” American Political Science Review 87:4 (December).
Collier, Ruth Berins and David Collier. 1991. Shaping the Political Arena: Critical Junctures, the Labor
Movement, and Regime Dynamics in Latin America. Princeton: Princeton University Press.
Creswell, John W. 1998. Qualitative Inquiry and Research Design. Thousand Oaks: Sage.
DeFelice, E. Gene. 1980. “Comparison Misconceived: Common Nonsense in Comparative
Politics.” Comparative Politics 13:1 (October) 119-26.
Denzin, Norman K. and Yvonna S. Lincoln (eds). 1994. Handbook of Qualitative Research. Thousand
Oaks: Sage.
Eckstein, Harry. 1975. “Case Studies and Theory in Political Science.” In Fred I. Greenstein and
Nelson W. Polsby (eds), Handbook of Political Science, vol. 7. Political Science: Scope and Theory (Reading,
MA: Addison-Wesley).
Elman, Colin. 2005. “Explanatory Typologies in Qualitative Studies of International Politics.”
International Organization 59:2 (Spring) 293-326.
Fenno, Richard F., Jr. 1978. Home Style: House Members in their Districts. Boston: Little, Brown.
Fielding, N.G. and R.M. Lee. 1998. Computer Analysis and Qualitative Research. Thousand Oaks: Sage.
Freedman, David A. 2004. Statistical Models: Theory and Practice. Ms.
Friedman, Milton. 1953. “The Methodology of Positive Economics.” In Essays in Positive Economics
(Chicago: University of Chicago Press).
Gahan, C. and M. Hannibal. 1999. Doing Qualitative Research Using QSR NUD*IST. Thousand Oaks:
Sage.
Geddes, Barbara. 2003. Paradigms and Sand Castles: Theory Building and Research Design in
Comparative Politics. Ann Arbor: University of Michigan Press.
Geertz, Clifford. 1980. Negara: The Theatre State in Bali. Princeton: Princeton University Press.
George, Alexander L. and Andrew Bennett. 2005. Case Studies and Theory Development in the Social
Sciences. Cambridge, MA: MIT Press.
George, Alexander L. and Richard Smoke. 1974. Deterrence in American Foreign Policy: Theory and
Practice. New York: Columbia University Press.
Gerring, John. 2001. Social Science Methodology: A Criterial Framework. Cambridge: Cambridge
University Press.
Gerring, John. 2006. Case Study Research: Principles and Practices. Cambridge: Cambridge University
Press.
Goldstone, Jack A. 2003. “Comparative Historical Analysis and Knowledge Accumulation in the
Study of Revolutions.” In James Mahoney and Dietrich Rueschemeyer (eds), Comparative Historical
Analysis in the Social Sciences (Cambridge: Cambridge University Press).
Goldthorpe, John H. 2000. On Sociology: Numbers, Narratives, and the Integration of Research and Theory.
Oxford: Oxford University Press.
Gosnell, Harold F. 1933. “Statisticians and Political Scientists.” American Political Science Review 27:3
(June) 392-403.
Griffin, Larry J. 1992. “Temporality, Events, and Explanation in Historical Sociology: An
Introduction.” Sociological Methods and Research 20:4 (May) 403-27.
Hall, Peter A. 2003. “Aligning Ontology and Methodology in Comparative Politics.” In James
Mahoney and Dietrich Rueschemeyer (eds), Comparative Historical Analysis in the Social Sciences
(Cambridge: Cambridge University Press).
Hammersley, Martyn. 1989. The Dilemma of Qualitative Method: Herbert Blumer and the Chicago Tradition.
London: Routledge & Kegan Paul.
Hammersley, Martyn. 1992. “Deconstructing the Qualitative-Quantitative Divide.” In Julie
Brannen (ed), Mixing Methods: Qualitative and Quantitative Research (Aldershot: Avebury).
Jocher, Katharine. 1928. “The Case Study Method in Social Research.” Social Forces 7, 512-5.
Kaplan, Abraham. 1964. The Conduct of Inquiry: Methodology for Behavioral Science. San Francisco:
Chandler Publishing.
Kaufman, Herbert. 1960. The Forest Ranger: A Study in Administrative Behavior. Baltimore: Johns
Hopkins University Press.
Kaufmann, Daniel, Aart Kraay and Pablo Zoido-Lobaton. 1999. “Aggregating Governance
Indicators.” Manuscript, World Bank.
King, Gary, Robert O. Keohane, and Sidney Verba. 1994. Designing Social Inquiry: Scientific Inference in
Qualitative Research. Princeton: Princeton University Press.
Krippendorff, Klaus. 2003. Content Analysis: An Introduction to Its Methodology. Thousand Oaks: Sage.
Lazarsfeld, Paul F. and Morris Rosenberg. 1955. The Language of Social Research. Glencoe, IL: Free
Press.
Lieberman, Evan S. 2005. “Nested Analysis as a Mixed-Method Strategy for Comparative
Research.” American Political Science Review (August).
Lieberson, Stanley. 1985. Making it Count: The Improvement of Social Research and Theory. Berkeley:
University of California Press.
Little, Daniel. 1995. “Causal Explanation in the Social Sciences.” Southern Journal of Philosophy 34
(supplement) 31-56.
Lott, John R., Jr. 2000. “Gore Might Lose a Second Round: Media Suppressed the Bush Vote.”
Philadelphia Inquirer (November 14) 23A.
MacIntyre, Alasdair. 1971/1978. “Is a Science of Comparative Politics Possible?” In Against the Self-
Images of the Age: Essays on Ideology and Philosophy (London: Duckworth).
Mahoney, James. 1999. “Nominal, Ordinal, and Narrative Appraisal in Macro-Causal Analysis.”
American Journal of Sociology 104:4 (January) 1154-96.
Mahoney, James and Dietrich Rueschemeyer (eds). 2003. Comparative Historical Analysis in the Social
Sciences. Cambridge: Cambridge University Press.
Mill, John Stuart. 1843/1872. A System of Logic, 8th ed. London: Longmans, Green.
Pressman, Jeffrey L. and Aaron Wildavsky. 1973. Implementation. Berkeley: University of California
Press.
Rabinow, Paul and William M. Sullivan (eds). 1979. Interpretive Social Science: A Reader. Berkeley:
University of California Press.
Ragin, Charles C. 1987. The Comparative Method: Moving Beyond Qualitative and Quantitative Strategies.
Berkeley: University of California Press.
Ragin, Charles C. 1997. “Turning the Tables: How Case-Oriented Research Challenges Variable-
Oriented Research.” Comparative Social Research 16.
Ragin, Charles C. 2000. Fuzzy-Set Social Science. Chicago: University of Chicago Press.
Roberts, Clayton. 1996. The Logic of Historical Explanation. University Park: Pennsylvania State
University Press.
Robinson, Richard. 1954. Definition. Oxford: Clarendon Press.
Sartori, Giovanni. 1970. “Concept Misformation in Comparative Politics.” American Political Science
Review 64:4 (December) 1033-46.
Sartori, Giovanni. 1984. “Guidelines for Concept Analysis.” In Social Science Concepts: A Systematic
Analysis (Beverly Hills: Sage) 15-48.
Schwandt, Thomas A. 1997. Qualitative Inquiry: A Dictionary of Terms. Thousand Oaks: Sage.
Scriven, Michael. 1976. “Maximizing the Power of Causal Investigations: The Modus Operandi
Method.” In G.V. Glass (ed), Evaluation Studies Review Annual (Beverly Hills: Sage) 101-18.
Seawright, Jason and David Collier. 2004. “Glossary.” In Henry E. Brady and David Collier (eds),
Rethinking Social Inquiry: Diverse Tools, Shared Standards (Lanham: Rowman & Littlefield) 273-313.
Shweder, Richard A. 1996. “Quanta and Qualia: What is the ‘Object’ of Ethnographic Method?” In
Richard Jessor, Anne Colby, and Richard A. Shweder (eds), Ethnography and Human Development:
Context and Meaning in Social Inquiry (Chicago: University of Chicago Press).
Shweder, Richard A. and Robert A. LeVine (eds). 1984. Culture Theory: Essays on Mind, Self, and
Emotion. Cambridge: Cambridge University Press.
Smith, Charles W. 1989. “The Qualitative Significance of Quantitative Representation.” In Barry
Glassner and Jonathan D. Moreno (eds), The Qualitative-Quantitative Distinction in the Social Sciences
(Boston Studies in the Philosophy of Science, 112) 29-42.
Stouffer, Samuel A. 1931. “Experimental Comparison of a Statistical and a Case History
Technique of Attitude Research.” Publications of the American Sociological Society 25, 154-56.
Strauss, Anselm and Juliet Corbin. 1998. Basics of Qualitative Research: Techniques and Procedures for
Developing Grounded Theory. Thousand Oaks: Sage.
Tannenwald, Nina. 1999. “The Nuclear Taboo: The United States and the Normative Basis of
Nuclear Non-Use.” International Organization 53:3 (Summer) 433-68.
Tarrow, Sidney. 1995. “Bridging the Quantitative-Qualitative Divide in Political Science.” American
Political Science Review 89:2 (June) 471-74.
Taylor, Charles. 1964. The Explanation of Behaviour. London: Routledge & Kegan Paul.
Teggart, Frederick J. 1939/1967. Rome and China: A Study of Correlations in Historical Events. Berkeley:
University of California Press.
Tetlock, Philip E. and Aaron Belkin (eds). 1996. Counterfactual Thought Experiments in World Politics.
Princeton: Princeton University Press.
Urban, Greg. 1999. “The Role of Comparison in the Light of the Theory of Culture.” In John R.
Bowen and Roger Petersen (eds), Critical Comparisons in Politics and Culture (Cambridge: Cambridge
University Press).
Van Deth, Jan W. (ed). 1998. Comparative Politics: The Problem of Equivalence. New York: Routledge.
Waller, Willard. 1934. “Insight and Scientific Method.” American Journal of Sociology 40:3 (November)
285-97.
White, Leonard D. (ed). 1930. The New Social Science. Chicago: University of Chicago Press.
Wilson, Edward O. 1998. Consilience: The Unity of Knowledge. New York: Alfred A. Knopf.
Winch, Peter. 1958. The Idea of a Social Science, and its Relation to Philosophy. London: Routledge &
Kegan Paul.
Zelditch, M., Jr. 1971. “Intelligible Comparisons.” In I. Vallier (ed), Comparative Methods in Sociology:
Essays on Trends and Applications (Berkeley: University of California Press) 267-307.
Znaniecki, Florian. 1934. The Method of Sociology. New York: Rinehart.