
Techniques for Choosing Cases 87

5

Techniques for Choosing Cases

Gerring, J. (2007). “Chapter 5: Techniques for Choosing Cases”. In Case Study Research, Cambridge: Cambridge University Press, pp. 86-150.


Case study analysis focuses on a small number of cases that are expected to provide insight into a causal relationship across a larger population of cases. This presents the researcher with a formidable problem of case selection. Which cases should be chosen?

In large-sample research, case selection is usually handled by some version of randomization. If a sample consists of a large enough number of independent random draws, the selected cases are likely to be fairly representative of the overall population on any given variable. Furthermore, if cases in the population are distributed homogeneously across the ranges of the key variables, then it is probable that some cases will be included from each important segment of those ranges, thus providing sufficient leverage for causal analysis. (For situations in which cases with theoretically relevant values of the variables are rare, a stratified sample that oversamples some subset of the population may be employed.)

A demonstration of the fact that random sampling is likely to produce a representative sample is shown in Figure 5.1, a histogram of the mean values of 500 random samples, each consisting of 1,000 cases. For each case one variable has been measured: a continuous variable that falls somewhere between zero and one. In the population, the mean value of this variable is 0.5. How representative are the random samples? One good way of judging this is to compare the means of each of the 500 random samples to the population mean. As can be seen in the figure, all of the sample means are very close to the population mean. So random sampling was a success, and each of the 500 samples turns out to be fairly representative of the population.

FIGURE 5.1. Sample means of large-sample draws. A histogram showing the mean values of one variable in 500 samples of 1,000 cases each. Population mean = 0.5. (Horizontal axis: mean value of X, 0.0 to 1.0.)

However, in case study research the sample is small (by definition), and this makes randomization problematic. Consider what would happen if the sample size were changed from 1,000 cases to only 5 cases. The results are shown in Figure 5.2. On average, these small-N random samples produce the right answer, so the procedure culminates in results that are unbiased. However, many of the sample means are rather far from the population mean, and some are quite far indeed. Hence, even though this case-selection technique produces representative samples on average, any given sample may be wildly unrepresentative. In statistical terms, the problem is that small sample sizes tend to produce estimates with a great deal of variance - sometimes referred to as a problem of precision. For this reason, random sampling is unreliable in small-N research. (Note that in this chapter "N" refers to cases, not observations.) Moreover, there is no guarantee that a few cases, chosen randomly, will provide leverage into the research question that animates an investigation. The sample might be representative, but uninformative.

If random sampling is inappropriate as a selection method in case study research, how, then, is one to choose a sample comprised of one or several cases? Keep in mind that the goals of case selection
II. Doing Case Studies
remain the same regardless of the size of the chosen sample. Large-N cross-case analysis and case study analysis both aim to identify cases that reproduce the relevant causal features of a larger universe (representativeness) and provide variation along the dimensions of theoretical interest (causal leverage). In case study research, however, these goals must be met through purposive (nonrandom) selection procedures. These may be enumerated according to nine techniques, from which we derive nine case study types: typical, diverse, extreme, deviant, influential, crucial, pathway, most-similar, and most-different.

FIGURE 5.2. Sample means of small-sample draws. A histogram showing the mean values of one variable in 500 samples of 5 cases each. Population mean = 0.5. (Horizontal axis: mean value of X, 0.0 to 1.0.)

Table 5.1 summarizes each type, including its general definition, a technique for identifying it within a population of potential cases, its uses, and its probable representativeness. While each of these techniques is normally practiced on one or several cases (the diverse, most-similar, and most-different methods require at least two), all may employ additional cases - with the proviso that, at some point, they will no longer offer an opportunity for in-depth analysis and will thus no longer be case studies in the usual sense.

TABLE 5.1. Techniques of case-selection

1. Typical
• Definition: Cases (one or more) are typical examples of some cross-case relationship.
• Cross-case technique: A low-residual case (on-lier).
• Uses: Hypothesis testing.
• Representativeness: By definition, the typical case is representative.

2. Diverse
• Definition: Cases (two or more) illuminate the full range of variation on X1, Y, or X1/Y.
• Cross-case technique: Diversity may be calculated by (a) categorical values of X1 or Y (e.g., Jewish, Catholic, Protestant), (b) standard deviations of X1 or Y (if continuous), or (c) combinations of values (e.g., based on cross-tabulations, factor analysis, or discriminant analysis).
• Uses: Hypothesis generating or hypothesis testing.
• Representativeness: Diverse cases are likely to be representative in the minimal sense of representing the full variation of the population (though they might not mirror the distribution of that variation in the population).

3. Extreme
• Definition: Cases (one or more) exemplify extreme or unusual values on X1 or Y relative to some univariate distribution.
• Cross-case technique: A case lying many standard deviations away from the mean of X1 or Y.
• Uses: Hypothesis generating (open-ended probe of X1 or Y).
• Representativeness: Achievable only in comparison to a larger sample of cases.

4. Deviant
• Definition: Cases (one or more) deviate from some cross-case relationship.
• Cross-case technique: A high-residual case (outlier).
• Uses: Hypothesis generating (to develop new explanations of Y).
• Representativeness: After the case study is conducted, it may be corroborated by a cross-case test, which includes a general hypothesis (a new variable) based on the case study research. If the case is now an on-lier, it may be considered representative of the new relationship.

5. Influential
• Definition: Cases (one or more) with influential configurations of the independent variables.
• Cross-case technique: Hat matrix or Cook's distance.
• Uses: Hypothesis testing (to verify the status of cases that may influence the results of a cross-case analysis).
• Representativeness: Not pertinent, given the goals of the influential-case study.

6. Crucial
• Definition: Cases (one or more) are most- or least-likely to exhibit a given outcome.
• Cross-case technique: Qualitative assessment of relative crucialness.
• Uses: Hypothesis testing (confirmatory or disconfirmatory).
• Representativeness: Assessable by reference to prior expectations about the case and the population.

7. Pathway
• Definition: Cases (one or more) where X1, and not X2, is likely to have caused a positive outcome (Y = 1).
• Cross-case technique: Cross-tab (for categorical variables) or residual analysis (for continuous variables).
• Uses: Hypothesis testing (to probe causal mechanisms).
• Representativeness: May be tested by examining residuals for the chosen cases.

8. Most-similar
• Definition: Cases (two or more) are similar on specified variables other than X1 and/or Y.
• Cross-case technique: Matching.
• Uses: Hypothesis generating or hypothesis testing.
• Representativeness: May be tested by examining residuals for the chosen cases.

9. Most-different
• Definition: Cases (two or more) are different on specified variables other than X1 and Y.
• Cross-case technique: The inverse of the most-similar method of large-N case selection (see above).
• Uses: Hypothesis generating or hypothesis testing (eliminating deterministic causes).
• Representativeness: May be tested by examining residuals for the chosen cases.

The main point of this chapter is to show how case-selection procedures rest, at least implicitly, upon an analysis of a larger population of potential cases. The case(s) identified for intensive study is chosen from a population, and the reasons for this choice hinge upon the way in which it is situated within that population. This is the origin of the terminology just listed - typical, diverse, extreme, and so on. It follows that case-selection procedures in case study research may build upon prior cross-case analysis and depend, at the very least, upon certain assumptions about a broader population. This, in turn, reinforces a central perspective of the book: case study analysis does not exist, and is impossible to conceptualize, in isolation from cross-case analysis.

To be sure, the sort of cross-case analysis that might be possible in a given research context rests on how large the population of potential cases is, on how much information one has about these cases, on what sort of general model might be constructed, and with what degree of confidence that model might be applied. In order for most quantitative (statistical) case-selection techniques to be fruitful, several caveats must be satisfied. First, the inference must pertain to more than several cases; otherwise, statistical analysis is usually problematic. Second, relevant data must be available for that population, or a significant sample of that population, on key variables, and the researcher must feel reasonably confident in the accuracy and conceptual validity of these variables. Third, all the standard considerations of statistical research (e.g., identification, specification, robustness) must be carefully considered and, wherever possible, investigated. I shall not dilate further on these familiar issues except to warn the researcher against the unthinking use of statistical techniques.1

1 Gujarati (2003); Kennedy (2003). Interestingly, the potential of cross-case statistics in helping to choose cases for in-depth analysis is recognized in some of the earliest discussions of the case study method (e.g., Queen 1928: 226).

When these requirements are not met, the researcher must employ a qualitative approach to case selection. Thus, the point of this chapter is not to insist upon quantitative techniques of case selection in case study research. My purpose, rather, is to elucidate general principles that might guide the process of case selection in case study research, whether the technique is quantitative or qualitative. Some of these principles are already widely known and widely practiced. Others are less common, or less well understood. Most of these methods are viable - indeed, are virtually identical - in qualitative and quantitative contexts. Hence, the statistical sections of this chapter usually simply reformulate the logic of qualitative case-selection procedures as they might be applied to large populations where the foregoing caveats apply.

Typical Case

In order for a focused case study to provide insight into a broader phenomenon, it must be representative of a broader set of cases. It is in this context that one may speak of a typical-case approach to case selection. The typical case exemplifies what is considered to be a typical set of values, given some general understanding of a phenomenon. By construction, the typical case is also a representative case; I employ these two terms synonymously.2 (The antonym, deviance, is discussed in a later section.)

2 The latter term is often employed in the psychological literature (e.g., Hersen and Barlow 1976: 24).

Some typical cases serve an exploratory role. Here, the author chooses a case based upon a set of descriptive characteristics and then probes for causal relationships. Robert and Helen Lynd selected a single city "to be
as representative as possible of contemporary American life." Specifically, they were looking for a city with

1) a temperate climate; 2) a sufficiently rapid rate of growth to ensure the presence of a plentiful assortment of the growing pains accompanying contemporary social change; 3) an industrial culture with modern, high-speed machine production; 4) the absence of dominance of the city's industry by a single plant (i.e., not a one-industry town); 5) a substantial local artistic life to balance its industrial activity . . . ; and 6) the absence of any outstanding peculiarities or acute local problems which would mark the city off from the midchannel sort of American community.3

3 Lynd and Lynd (1929/1956), quoted in Yin (2004: 29-30).

After examining a number of options, the Lynds decided that Muncie, Indiana, was more representative than, or at least as representative as, other midsized cities in America, thus qualifying as a typical case.

This is an inductive approach to case selection. Note that typicality may be understood according to the mean, median, or mode on a particular dimension; there may be multiple dimensions (as in the foregoing example); and each may be differently weighted (some dimensions may be more important than others). Where the selection criteria are multidimensional and a large sample of potential cases is in play, some form of factor analysis may be useful in identifying the most-typical case(s). Although the Lynds did not employ a statistical model to evaluate potential cases, it is easy to see how they might have done so, at least along the first five criteria. (The final criterion would be difficult to operationalize in a large sample, since it involves "peculiarities" of any sort.)

However, the more common employment of the typical-case method involves a causal model of some phenomenon of theoretical interest. Here, the researcher has identified a particular outcome (Y), and perhaps a specific X1/Y hypothesis, which she wishes to investigate. In order to do so, she looks for a typical example of that causal relationship. Intuitively, one imagines that a case selected according to the mean values of all parameters must be a typical case relative to some causal relationship. However, this is by no means assured.

Suppose that the Lynds were primarily interested in explaining feelings of trust/distrust among members of different social classes (one of the implicit research goals of the Middletown study). This outcome is likely to be affected by many factors, only some of which are included in their six selection criteria. So choosing cases with respect to a causal hypothesis involves, first of all, identifying the relevant variables. It involves, secondly, the selection of a case that has "typical" values relative to the overall causal model; it is well explained.

Note that cases with atypical scores on a particular dimension (e.g., very high or very low) may still be typical examples of a causal relationship. Indeed, they may be more typical than cases whose values lie close to the mean.

Note also that because the typical case embodies a typical value on some set of variables, the variance of interest to the researcher must lie within that case. Specifically, the typical case of some phenomenon may be helpful in exploring causal mechanisms and in solving identification problems (e.g., endogeneity between X1 and Y, an omitted variable that may account for X1 and Y, or some other spurious causal association). Depending upon the results of the case study, the author may confirm an existing hypothesis, disconfirm that hypothesis, or reframe it in a way that is consistent with the findings of the case study.

Cross-Case Technique

How might one identify a typical case from a large population of potential cases? If the causal relationship involves only a single independent variable and if the relationship is quite strong, it may be possible to identify typical cases simply by eyeballing the evidence. A strong positive association between X1 and Y means that a case with similar (high, low, or middling) values on X1 and Y is probably a typical case. However, there are few bivariate causal relationships in social science. Usually, more than one causal factor must be evaluated, even if the additional variables serve only as controls. Moreover, without some overall assessment of the cross-case evidence it may be difficult to say whether the general relationship is positive or negative, strong or weak. Thus, in any large-N sample (i.e., whenever the number of potential cases is great) it is advisable to perform a formal cross-case analysis in order to identify "typical" cases.

Suppose that an arbitrary case in the population, denoted as case i, has a known score on each of several relevant variables. For the sake of economy of language, let the variables involved in the relationship be labeled $y_i$ and $x_{1,i}, \dots, x_{K,i}$, where $y_i$ is the score of case i on one variable and each of the $x_{K,i}$'s is the score of case i on one of the K other variables under consideration. Thus, the relationship involves a total of K + 1 variables. K can be any integer greater than or equal to 1.
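As a rough illustration of this notation, and of the earlier caution that closeness to the mean is not the same as typicality, the sketch below fits a one-variable (K = 1) least-squares line to a small invented population and ranks the cases by the size of their residuals, the low-residual criterion that Table 5.1 lists for the typical case. The case labels, the scores, and the helper names `fit_line` and `residuals` are all hypothetical, chosen only for illustration; this is a minimal sketch under those assumptions, not the procedure used in the chapter.

```python
from statistics import mean

# Hypothetical population: case label -> (x1 score, y score). All values invented.
cases = {
    "A": (1.0, 2.2), "B": (2.0, 3.8), "C": (3.0, 6.1),
    "D": (4.0, 8.0), "E": (5.0, 14.0), "F": (6.0, 11.9),
    "G": (7.0, 14.2), "H": (8.0, 15.9), "I": (9.0, 18.1),
}


def fit_line(points):
    """Ordinary least squares for y = b0 + b1 * x with a single predictor (K = 1)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def residuals(population):
    """Residual of each case: observed y minus the y predicted from its x1 score."""
    b0, b1 = fit_line(list(population.values()))
    return {label: y - (b0 + b1 * x) for label, (x, y) in population.items()}


res = residuals(cases)
x_bar = mean([x for x, _ in cases.values()])

# List cases from smallest to largest absolute residual (closest to the line first).
for label in sorted(res, key=lambda k: abs(res[k])):
    x, y = cases[label]
    print(f"{label}: x1={x:4.1f} (distance from mean {abs(x - x_bar):.1f}), "
          f"y={y:5.1f}, residual={res[label]:+.2f}")
```

In this invented population, case I has the most extreme score on x1 yet lies close to the fitted line, whereas case E sits exactly at the mean of x1 and has by far the largest residual.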
With these symbols, the established relationships among the variables can be expressed mathematically. The idea is to find a function, $f()$, such that the average score of y for cases with some specific set of scores on $x_1, \dots, x_K$ is equal to $f(x_1, \dots, x_K)$. Thus, the function $f()$ should be chosen to capture the key ideas about the relationship of interest. A familiar example may make this discussion clearer.

Often, researchers choose an additive (linear) function to play the role of $f()$. Using traditional statistical notation, in which the average score of $y_i$ across infinite repetitions of case i is denoted by its expectation, $E(y_i)$, a linear function represents a relationship in which:

$$E(y_i) = \beta_0 + \beta_1 x_{1,i} + \dots + \beta_K x_{K,i} \tag{5.1}$$

Each of the $\beta_K$'s in this equation represents an unknown constant. Regression analysis allows researchers to use known information about the y and $x_1, \dots, x_K$ variables for a set of cases to estimate these unknown constants. Estimates of $\beta_K$ will be denoted here as $b_K$.

Using this terminology, we can now develop a formula for the degree to which a particular case is typical in light of a given relationship. A case is "typical" in the terms of small-N methodology to the extent that its score on the y variable is close to the average score on that variable for a case with the same scores on the $x_1, \dots, x_K$ variables, as given by equation 5.1. That is,

$$\mathrm{Typicality}(i) = -\mathrm{abs}\left[y_i - E(y_i \mid x_{1,i}, \dots, x_{K,i})\right] = -\mathrm{abs}\left[y_i - (b_0 + b_1 x_{1,i} + \dots + b_K x_{K,i})\right] \tag{5.2}$$

According to this discussion, the typicality of a case with respect to a particular relationship is simply -1 times the absolute value of that case's error term (its residual) in regression analysis. This measure of typicality ranges, in theory, from negative infinity to zero. When a case falls close to the regression line, its typicality will be just below zero. When a case falls far from the regression line, its typicality will be far below zero. Typical cases have small residuals.

In a large-N sample, there will often be many cases with high (i.e., near-zero) typicality scores. In such situations, researchers may elect not to focus on the cases with the highest estimated typicality, for such estimates may not be accurate enough to distinguish among several almost-identical cases. Instead, researchers may choose to randomly select from the set of cases with very high typicality, or to choose from among these cases according to additional criteria, such as those to be discussed below, or by reason of practicality (cost, convenience, etc.). However, scholars should try to avoid selecting from among the set of typical cases in a way that is correlated with relevant omitted variables; such selection procedures complicate the task of causal inference.

Consider the (presumably causal) relationship between economic development and level of democracy.4 Democracy is understood here as a continuous concept along a twenty-one-point scale, from -10 (most autocratic) to +10 (most democratic).5 Economic development is measured in standard fashion by per capita GDP.6 Figure 5.3 displays this relationship in the form of a bivariate scatterplot. The classical result is strikingly illustrated: wealthy countries are almost exclusively democratic. (For heuristic purposes, certain simplifying assumptions are adopted. I shall assume, for example, that this measure of democracy is continuous and unbounded.7 I shall assume, more importantly, that the true relationship between economic development and democracy is log-linear, positive, and causally asymmetric, with economic development treated as exogenous and democracy as endogenous.8)

4 Lipset (1959). Whether economic development has only the effect of maintaining democratic regimes (Przeworski et al. 2000) or also of causing regime transitions (Boix and Stokes 2003) is not relevant to the present discussion, where I assume a simple linear relationship between wealth and democracy.
5 This scoring derives from the Polity2 variable in the Polity IV dataset (Marshall and Jaggers 2005).
6 Data are drawn from the Penn World Tables dataset (Summers and Heston 1991).
7 But see Treier and Jackman (2003).
8 But see Gerring et al. (2005) and Przeworski et al. (2000).

Given this general relationship, how might a set of "typical" cases be selected? Recall that the Y variable is simply the democracy score, and there is only one independent variable: logged per capita GDP. Hence, the simplest relevant model is:

$$E(\mathrm{Polity}_i) = \beta_0 + \beta_1\,\mathrm{GDP}_i \tag{5.3}$$

For our purposes, the most important feature of this model is the residuals for each case. Figure 5.4 shows a histogram of these residuals. Obviously, a fairly large number of cases have quite low residuals and therefore might be considered typical. A higher proportion of cases fall far below the regression line than far above it, suggesting that the model may be
incomplete. Hopefully, within-case analysis will be able to shed light on the reasons for the asymmetry.9

9 In this example, the asymmetry is probably due to the failure of the model to take into account the restricted range of the dependent variable, as discussed earlier.

FIGURE 5.3. The presumed relationship between economic development and democracy. A scatterplot showing level of democracy (on the vertical axis) and level of wealth (on the horizontal axis) of all available countries in 1995. N = 131. (Horizontal axis: logged 1995 per capita GDP.)

FIGURE 5.4. Potential typical cases. A histogram of the residuals from a robust regression of level of democracy on logged per capita GDP. (Horizontal axis: residual from robust regression, -20 to 10.)

Because of the large number of cases with quite small residuals, the researcher will have a range of options for selecting typical cases. Indeed, in this example, twenty-six cases have a typicality score between 0 and -1. Any or all of these might reasonably be selected as typical cases with respect to the model described in equation 5.3.

Conclusion

Typicality responds to the first desideratum of case selection, that the chosen case be representative of a population of cases (as defined by the primary inference). Even so, it is important to remind ourselves that a single-minded pursuit of representativeness does not ensure that this desideratum will be achieved. Indeed, the issue of case representativeness is not an issue that can ever be definitively settled in a case study format. When one refers to a "typical case" one is saying, in effect, that the probability of a case's representativeness is high, relative to other cases.

Note that the measure of typicality introduced here, the size of a case's residual, can be misleading if the statistical model is misspecified. And it provides little insurance against errors that are purely stochastic. A case may lie directly on the regression line but still be, in some important respect, atypical. For example, it might have an odd combination of values; the interaction of variables might be different from that in other cases; or unusual causal mechanisms might be at work. Most important, an analysis of residuals does not address problems of sample bias. If the large-N sample is not representative of the relevant population, then any analysis based on the former is apt to be flawed. Typicality does not ensure representativeness. For these reasons, it is important to supplement a statistical analysis of cases with evidence drawn from the case in question (the case study itself) and with our general knowledge of the world. One should never judge a case solely by its residual. Yet, all other things being equal, a case with a low residual is less likely to be unusual than a case with a high residual, and to this extent the method of case selection outlined here may be a helpful guide to case study researchers faced with a large number of potential cases.

Diverse Case

A second case-selection strategy has as its primary objective the achievement of maximum variance along relevant dimensions. I refer to this as a
diverse-case method. For obvious reasons, this method requires the selection of a set of cases - at minimum, two - that are intended to represent the full range of values characterizing X1, Y, or some particular X1/Y relationship.10

10 This method has not been given much attention by qualitative methodologists; hence, the absence of a generally recognized name. It bears some resemblance to J. S. Mill's Joint Method of Agreement and Difference (Mill 1843/1872), which is to say, a mixture of most-similar and most-different analysis, as discussed later. Patton (2002: 234) employs the concept of "maximum variation (heterogeneity) sampling."

Where the individual variable of interest is categorical (on/off, red/black/blue, Jewish/Protestant/Catholic), the identification of diversity is readily apparent. The investigator simply chooses one case from each category. For a continuous variable, the choices are not so obvious. However, the researcher is well advised to choose both extreme values (high and low), and perhaps the mean or median as well. One may also look for break-points in the distribution that seem to correspond to categorical differences among cases. Or one may follow a theoretical hunch about which threshold values count - that is, which ones are likely to produce different values on Y.

Another sort of diverse case takes account of the values of multiple variables (i.e., a vector) rather than a single variable. If these variables are categorical, the identification of causal types rests upon the intersection of each category. Two dichotomous variables produce a matrix with four cells; three dichotomous variables produce a matrix of eight cells, and so forth. If all variables are deemed relevant to the analysis, the selection of diverse cases mandates the selection of one case drawn from within each cell. Let us say that an outcome is thought to be affected by sex, race (black/white), and marital status. Here, a diverse-case strategy of case selection would identify one case within each of these intersecting cells - a total of eight cases. Again, things become more complicated when one or more of the factors is continuous, rather than categorical. Here, the diversity of case values does not fall neatly into cells. Rather, these cells must be created by fiat - for example, high, medium, low.

It will be seen that where multiple variables are under consideration, the logic of diverse-case analysis rests upon the logic of typological theorizing - where different combinations of variables are assumed to have effects on an outcome that vary across types. George and Bennett define a typological theory as

a theory that specifies independent variables, delineates them into nominal, ordinal, or interval categories, and provides not only hypotheses on how these variables operate singly, but contingent generalizations on how and under what conditions they behave in specified conjunctions or configurations to produce effects on specified dependent variables. We call specified conjunctions or configurations of the variables "types." A fully specified typological theory provides hypotheses on all of the mathematically possible types relating to a phenomenon, or on the full 'property space,' to use Lazarsfeld's term. Typological theories are rarely fully specified, however, because researchers are usually interested only in the types that are relatively common or that have the greatest implications for theory-building or policy-making.11

11 George and Bennett (2005: 235). See also Elman (2005) and Lazarsfeld and Barton (1951).

George and Smoke, for example, wish to explore different types of deterrence failure - by "fait accompli," by "limited probe," and by "controlled pressure." Consequently, they wish to find cases that exemplify each type of causal mechanism.12

12 More precisely, George and Smoke (1974: 534, 522-36, Chapter 18; see also discussion in Collier and Mahoney 1996: 78) set out to investigate causal pathways and discovered, in the course of their investigation of many cases, these three causal types. But for our purposes what is important is that the final sample include at least one representative of each "type."

Diversity may thus refer to a range of variation on X1 or Y, or to a particular combination of causal factors (with or without a consideration of the outcome). In each instance, the goal of case selection is to capture the full range of variation along the dimension(s) of interest.

Cross-Case Technique

Since diversity can mean many things, its employment in a large-N setting is necessarily dependent upon how it is understood. If it is understood to pertain only to a single variable (X1 or Y), then the task is fairly simple, as we have discussed. Univariate traits are usually easy to discover in a large-N setting through descriptive statistics or through visual inspection of the data.

13 See Cochran (1977).
14 See Ragin (2000).

Where diversity refers to particular combinations of variables, the relevant cross-case technique is some version of stratified random sampling (in a probabilistic setting)13 or Qualitative Comparative Analysis (in a deterministic setting).14 If the researcher suspects that a causal relationship is affected not only by combinations of factors but also by their sequencing,

then the technique of analysis must incorporate temporal elements.15 Thus, the method of identifying causal types rests upon whatever method of identifying causal relationships is presumed to exist.

Note that the identification of distinct case types is intended to identify groups of cases that are internally homogeneous (in all respects that might affect the causal relationship of interest). Thus, the choice of cases within each group should not be problematic, and may be accomplished through random sampling. However, if there is suspected diversity within each category, then measures should be taken to assure that the chosen cases are typical of each category. A case study should not focus on an atypical member of a subgroup.

Indeed, considerations of diversity and typicality often go together. Thus, in a study of globalization and social welfare systems, Duane Swank first identifies three distinctive groups of welfare states: "universalistic" (social democratic), "corporatist conservative," and "liberal." Next, he looks within each group to find the most-typical cases. He decides that the Nordic countries are more typical of the universalistic model than the Netherlands, since the latter has "some characteristics of the occupationally based program structure and a political context of Christian Democratic-led governments typical of the corporatist conservative nations."16 Thus, the Nordic countries are chosen as representative cases within the universalistic case type, and are accompanied in the case-study portion of his analysis by other cases chosen to represent the other welfare state types (corporatist conservative and liberal).

Conclusion

Encompassing a full range of variation is likely to enhance the representativeness of the sample of cases chosen by the researcher. This is a distinct advantage. Of course, the inclusion of a full range of variation may distort the actual distribution of cases across this spectrum. If there are more "high" cases than "low" cases in a population and the researcher chooses only one high case and one low case, the resulting sample of two is not perfectly representative. Even so, the diverse-case method often has stronger claims to representativeness than any other small-N sample (including the typical case). The selection of diverse cases has the additional advantage of introducing variation on the key variables of interest. A set of diverse cases is, by definition, a set of cases that encompasses a range of high and low values on relevant dimensions.

There is, therefore, much to recommend this method of case selection. I suspect that these advantages are commonly understood and are applied on an intuitive level by case study researchers. However, the lack of a recognizable name - and an explicit methodological defense - has made it difficult for case study researchers to identify this method of case selection, and to explain its logic to readers.

Extreme Case

The extreme-case method selects a case because of its extreme value on an independent or dependent variable of interest.17 Thus, studies of domestic violence may choose to focus on extreme instances of abuse.18 Studies of altruism may focus on those rare individuals who risk their lives to help others (e.g., Holocaust resisters).19 Studies of ethnic politics may focus on the most heterogeneous societies (e.g., Papua New Guinea) in order to better understand the role of ethnicity in a democratic setting.20 Studies of industrial policy often focus on the most successful countries (e.g., the NICs),21 and so forth.22

Often an extreme case corresponds to a case that is considered to be prototypical or paradigmatic of some phenomena of interest. This is because concepts are often defined by their extremes, that is, their ideal types. German fascism defines the concept of fascism in part because it offers the most extreme example of that phenomenon. However, the methodological value of this case, and others like it, derives from its extremity (along some dimension of interest), not from its theoretical status or its status in the literature on a subject.

15 Abbott (2001); Abbott and Forrest (1986); Abbott and Tsay (2000).
16 Swank (2002: 11). See also Esping-Andersen (1990).
17 It does not make sense to apply the extreme-case method in a confirmatory/disconfirmatory analysis. If a particular causal relationship is at issue, then both $X_1$ and Y must be taken into account when choosing cases, as described in the various scenarios that follow. At present, therefore, we shall assume that the researcher has a general question in mind, but not a specific hypothesis.
18 Browne (1987).
19 Monroe (1996).
20 Reilly (2000/2001).
21 Deyo (1987).
22 For further examples, see Collier and Mahoney (1996); Geddes (1990); and Tendler (1997).

The notion of "extreme" may now be defined more precisely. An extreme value is an observation that lies far away from the mean of a given

distribution. For a continuous variable, the distance from the mean may be in either direction (positive or negative). For a dichotomous variable (present/absent), I understand extreme to mean unusual. If most cases are positive along a given dimension, then a negative case constitutes an extreme case. If most cases are negative, then a positive case constitutes an extreme case. All things being equal, one is concerned not only with cases where something "happened," but also with cases where something did not. It is the rareness of the value that makes a case valuable, in this context, not its positive or negative value.23 Thus, if one is studying state capacity, a case of state failure is probably more informative than a case of state endurance simply because the former is more unusual. Similarly, if one is interested in incest taboos, a culture where the incest taboo is absent or weak is probably more useful than a culture where it is present. Fascism is more important than nonfascism; and so forth. There is a good reason, therefore, why case studies of revolution tend to focus on "revolutionary" cases. Theda Skocpol had much more to learn from France than from Austro-Hungary, since France was more unusual than Austro-Hungary within the population of nation-states that Skocpol was concerned to explain.24 The reason is quite simple: there are fewer revolutionary cases than nonrevolutionary cases; thus, the variation that one wishes to explore as a clue to causal relationships is encapsulated in these cases, viewed against a backdrop of nonrevolutionary cases.

23 Traditionally, methodologists have conceptualized cases as having "positive" or "negative" values (e.g., Emigh 1997; Mahoney and Goertz 2004; Ragin 2000: 60; Ragin 2004: 126).
24 Skocpol (1979).

Cross-Case Technique

As stated, extreme cases lie far from the mean of a variable. Extremity (E), for the ith case, can be defined in terms of the sample mean ($\bar{X}$) and the standard deviation (s) for that variable:

$$E_i = \frac{|X_i - \bar{X}|}{s} \tag{5.4}$$

This definition of extremity is the absolute value of the standardized ("Z") score for the ith case. Cases with a large $E_i$ qualify as extreme. Sometimes, the only criterion is a relative one. The researcher wishes to find the most extreme case(s) available. At other times, it may be helpful to set an arbitrary threshold. Under assumptions of normality, cases with an extremeness score smaller than two would generally not be considered extreme. If the researcher wishes to be more conservative in classifying cases as extreme, a higher threshold may be employed. In general, the choice of threshold is left to the researcher, to be made in a way that is appropriate to the research problem at hand.

FIGURE 5.5. Potential extreme cases. A histogram of the "extremeness" of all countries on the dimension of democracy, as measured by standard deviations from the mean (absolute value). Horizontal axis: Extremeness.

The mean of our democracy variable is 2.76, suggesting that the countries in the 1995 dataset tend to be somewhat more democratic than authoritarian (zero is defined as the break-point between democracy and autocracy). The standard deviation is 6.92, implying that there is a fair amount of scatter around the mean.

Figure 5.5 shows a histogram of the extremeness scores for all countries on level of democracy. As can easily be seen, no cases have extremeness scores greater than two. The two countries with the highest scores are Qatar and Saudi Arabia. These countries, which both have a democracy score of -10 for 1995, are probably the two best candidates for extreme-case analysis.
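The extremeness score of equation 5.4 is straightforward to compute. What follows is only a minimal Python sketch, not the author's procedure: the function name `extremeness`, the case labels, and the toy democracy scores are hypothetical, and the sample standard deviation (N - 1 denominator) is assumed because the text does not say which estimator is intended.

```python
from statistics import mean, stdev


def extremeness(values):
    """Return E_i = |X_i - mean| / s for every case (equation 5.4).

    Assumption: s is the sample standard deviation (N - 1 denominator).
    """
    x_bar = mean(values)
    s = stdev(values)
    return [abs(x - x_bar) / s for x in values]


# Hypothetical democracy scores on a -10..10 scale (illustrative only,
# not the 1995 dataset discussed in the text).
scores = {"A": -10, "B": -7, "C": 0, "D": 6, "E": 8, "F": 9, "G": 10, "H": 10}

e = dict(zip(scores, extremeness(list(scores.values()))))

# Relative criterion: the single most extreme case available.
most_extreme = max(e, key=e.get)

# Absolute criterion: cases at or beyond a chosen threshold (two is the
# conventional cut-off mentioned in the text under normality).
threshold = 2.0
beyond = [case for case, score in e.items() if score >= threshold]

print(most_extreme, round(e[most_extreme], 2), beyond)
```

The same arithmetic can be checked against the figures reported in the text: with a mean of 2.76 and a standard deviation of 6.92, a democracy score of -10 gives E = |-10 - 2.76| / 6.92, or roughly 1.84, which is consistent with the statement that no case exceeds two.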

Conclusion

The extreme-case method appears to violate the social science folk wisdom warning us not to "select on the dependent variable."25 Selecting cases on the dependent variable is indeed problematic if a number of cases are chosen, all of which lie on one end of a variable's spectrum (they are all positive or negative), and if the researcher then subjects this sample to cross-case analysis as if it were representative of a population.26 Results for this sort of analysis would almost assuredly be biased. Moreover, there will be little variation to explain, since the values of each case are explicitly constrained.

However, this is not the proper employment of the extreme-case method. (It is more appropriately labeled an extreme-sample method.) The extreme-case method refers back to a larger sample of cases that lie in the background of the analysis and provide a full range of variation as well as a more representative picture of the population. It is a self-conscious attempt to maximize variance on the dimension of interest, not to minimize it. If this population of cases is well understood - through the author's own cross-case analysis, through the work of others, or through common sense - then a researcher may justify the selection of a single case exemplifying an extreme value for within-case analysis. If not, the researcher may be well advised to follow a diverse-case method (see the earlier discussion).

By way of conclusion, let us return to the problem of representativeness. In the context of causal analysis, representativeness refers to a case that exemplifies values on $X_1$ and Y that conform to a general pattern. In a cross-case model, the representativeness of an individual case is gauged by the size of its residual. The representative case is therefore a typical case (as already discussed), not a deviant case (as will be discussed). It will be seen that an extreme case may be typical or deviant. There is simply no way to tell, because the researcher has not yet specified a causal proposition. Once such a causal proposition has been specified, we may then ask whether the case in question is similar to some population of cases (in all respects that might affect the $X_1$/Y relationship of interest). It is at this point that it becomes possible to say, within the context of a cross-case statistical model, whether a case lies near to, or far from, the regression line. However, this sort of analysis means that the researcher is no longer pursuing an extreme-case method. The extreme-case method is purely exploratory - a way of probing possible causes of Y, or possible effects of $X_1$, in an open-ended fashion. If the researcher has some notion of what additional factors might affect the outcome of interest, or of what relationship the causal factor of interest has to Y, then she ought to pursue one of the other methods explored elsewhere in this chapter. This also implies that an extreme-case method may transform into a different kind of approach as a study evolves, that is, as a more specific hypothesis comes to light. Useful "extreme" cases at the outset of a study may prove less useful at a later stage of analysis.

Deviant Case

The deviant-case method selects the case(s) that, by reference to some general understanding of a topic (either a specific theory or common sense), demonstrates a surprising value. Barbara Geddes notes the importance of deviant cases in medical science, where researchers are habitually focused on that which is pathological (according to standard theory and practice). The New England Journal of Medicine, one of the premier journals of the field, carries a regular feature entitled "Case Records of the Massachusetts General Hospital." These articles bear titles like the following: "An 80-Year-Old Woman with Sudden Unilateral Blindness" or "A 76-Year-Old Man with Fever, Dyspnea, Pulmonary Infiltrates, Pleural Effusions, and Confusion."27 Similarly, medical researchers are keen to investigate those rare individuals who have not succumbed, despite repeated exposure, to the AIDS virus.28 Why are they resistant? What is different about these people? What can we learn about AIDS in other patients by observing people who have built-in resistance to this disease?

25 Geddes (1990); King, Keohane, and Verba (1994). See also discussions in Brady and Collier (2004); Collier and Mahoney (1996); and Rogowski (1995).
26 The exception would be a circumstance in which the researcher intends to disprove a deterministic argument (Dion 1998).
27 Geddes (2003: 131). For other examples of case work from the annals of medicine, see "Clinical Reports," in The Lancet; "Case Studies" in The Canadian Medical Association Journal; and various issues of the Journal of Obstetrics and Gynecology, often devoted to clinical cases (discussed in Jenicek 2001: 7). For examples from the subfield of comparative politics, see Kazancigil (1994).
28 Buchbinder and Vittinghoff (1999); Haynes, Pantaleo, and Fauci (1996).

Case studies in psychology and sociology are often comprised of deviant (in the social sense) persons or groups. In economics, case studies may consist of countries or businesses that overperform (e.g., Botswana, Microsoft) or underperform (e.g., Britain through most of the twentieth century; Sears in recent decades) relative to some set of expectations. In

political science, case studies may focus on countries where the welfare state is more developed (e.g., Sweden) or less developed (e.g., the United States) than one would expect, given a set of general expectations about welfare state development.

In all fields, the deviant case is closely linked to the investigation of theoretical anomalies. Indeed, to say "deviant" is to imply "anomalous."29 Note that while extreme cases are judged relative to the mean of a single distribution (the distribution of values along a single dimension), deviant cases are judged relative to some general model of causal relations. The deviant-case method selects cases that, by reference to some general cross-case relationship, demonstrate a surprising value. They are "deviant" in that they are poorly explained by the multivariate model. The important point is that deviantness can only be assessed relative to the general (quantitative or qualitative) model employed.

This means that the relative deviantness of a case is likely to change whenever the general model is altered. For example, the United States is a deviant welfare state when this outcome is gauged relative to societal wealth. But it is less deviant - and perhaps not deviant at all - when certain additional (political and societal) factors are included in the model, as discussed in the epilogue. Deviance is model-dependent. Thus, when discussing the concept of the deviant case, it is helpful to ask the following question: relative to what general model (or set of background factors) is Case A deviant?

The purpose of a deviant-case analysis is usually to probe for new - but as yet unspecified - explanations. (If the purpose is to disprove an extant theory, I shall refer to the study as a crucial case, as will be discussed later.) Thus, the deviant-case method is only slightly more determinate than the extreme-case method. It, too, is an exploratory form of research. The researcher hopes that causal processes within the deviant case will illustrate some causal factor that is applicable to other (deviant) cases. This means that a deviant-case study usually culminates in a general proposition - one that may be applied to other cases in the population.

29 For a discussion of the important role of anomalies in the development of scientific theorizing, see Elman (2003) and Lakatos (1978). For examples of deviant-case research designs in the social sciences, see Amenta (1991); Coppedge (2004); Eckstein (1975); Emigh (1997); and Kendall and Wolf (1949/1955).

Cross-Case Technique

In statistical terms, deviant-case selection is the opposite of typical-case selection. Where a typical case lies as close as possible to the prediction of a formal, mathematical representation of the hypothesis at hand, a deviant case lies as far as possible from that prediction. Referring back to the model developed in equation 5.3, we can define the extent to which a case deviates from the predicted relationship as follows:

$$\mathrm{Deviance}(i) = \mathrm{abs}\left[y_i - E(y_i \mid x_{1,i}, \ldots, x_{K,i})\right] = \mathrm{abs}\left[y_i - (b_0 + b_1 x_{1,i} + \cdots + b_K x_{K,i})\right] \tag{5.5}$$

Deviance ranges from 0, for cases exactly on the regression line, to a theoretical limit of infinity. Researchers will usually be interested in selecting from the cases with the highest overall estimated deviance.

In our running example, a two-variable model with economic development ($X_1$) and democracy (Y), the most deviant cases fall below the regression line. This can be seen in Figure 5.4. In fact, all eight cases with a deviance score of more than ten have negative residuals; their scores on the outcome are lower than they "should" be, given their level of development. These eight cases are Croatia, Cuba, Indonesia, Iran, Morocco, Singapore, Syria, and Uzbekistan. Our general model of democracy does not explain these cases very well. Quite possibly, we could develop a better model if we understood what - aside from GDP per capita - might be driving the choice of regime type in these polities. This is the usual purpose for which deviant-case analysis is employed.

Conclusion

As I have noted, the deviant-case method is an exploratory form of analysis. As soon as a researcher's exploration of a particular case has identified a factor to explain that case, it is no longer (by definition) deviant. (The exception would be a circumstance in which a case's outcome is deemed to be accidental or idiosyncratic, and therefore inexplicable by any general model.) If the new explanation can be accurately measured as a single variable (or set of variables) across a larger sample of cases, then a new cross-case model is in order. In this fashion, a case study initially framed as a deviant case is likely to be transformed into some other sort of analysis.

This feature of the deviant-case study also helps to resolve doubts about its representativeness. Evidently, the representativeness of a deviant case is problematic, since the case in question is, by construction, atypical. However, this problem can be mitigated if the researcher generalizes whatever proposition is provided by the case study to other cases. In a large-N model, this is accomplished by the creation of a variable to represent the new hypothesis that the case study has identified. This may require some

original coding of cases (in addition to the case under intensive study). However, so long as the underlying information for this coding is available, it should be possible to test the new hypothesis in a cross-case model. If the new variable is successful in explaining the studied case, it should no longer be deviant; or, at the very least, it will be less deviant. In statistical terms, its residual will have shrunk. It is now typical, or at least more typical, and this relieves concerns about possible unrepresentativeness.

Influential Case

Sometimes the choice of a case is motivated solely by the need to verify the assumptions behind a general model of causal relations. Here, the analyst attempts to provide a rationale for disregarding a problematic case, or a set of problematic cases. That is to say, she attempts to show why apparent deviations from the norm are not really deviant, or do not challenge the core of the theory, once the circumstances of the special case or cases are fully understood. A cross-case analysis may, after all, be marred by several classes of problems, including measurement error, specification error, errors in establishing proper boundaries for the inference (the scope of the argument), and stochastic error (fluctuations in the phenomenon under study that are treated as random, given available theoretical and empirical resources). If poorly fitting cases can be explained away by reference to these kinds of problems, then the theory of interest is that much stronger. This sort of deviant-case analysis answers the question, "What about Case A (or cases of Type A)? How does that (seemingly disconfirming) case fit the model?"

Because its underlying purpose, as well as the appropriate techniques for case identification, is different from that of the deviant-case study, I offer a new term for this method. The influential case is a case that appears at first glance to invalidate a theory, or at least to cast doubt upon a theory. Possibly, upon closer inspection, it does not. Indeed, it may end up confirming that theory - perhaps in some slightly altered form. In this guise, the influential case is the "case that proves the rule."

A simple version of influential-case analysis involves the confirmation of a key case's score on some critical dimension. This is essentially a question of measurement. Sometimes cases are poorly explained simply because they are poorly understood. A close examination of a particular context may reveal that an apparently falsifying case has been miscoded. If so, the initial challenge presented by that case to some general theory has been obviated.

However, the more usual employment of the influential-case method culminates in a substantive reinterpretation of the case - perhaps even of the general model. It is not just a question of measurement. Consider Thomas Ertman's study of state building in Western Europe. As summarized by Gerardo Munck, this study argues

that the interaction of a) the type of local government during the first period of statebuilding, with b) the timing of increases in geopolitical competition, strongly influences the kind of regime and state that emerge. [Ertman] tests this hypothesis against the historical experience of Europe and finds that most countries fit his predictions. Denmark, however, is a major exception. In Denmark, sustained geopolitical competition began relatively late and local government at the beginning of the state building period was generally participatory, which should have led the country to develop 'patrimonial constitutionalism.' But in fact, it developed 'bureaucratic absolutism.' Ertman carefully explores the process through which Denmark came to have a bureaucratic absolutist state and finds that Denmark had the early marks of a patrimonial constitutionalist state. However, the country was pushed off this developmental path by the influence of German knights, who entered Denmark and brought with them German institutions of local government. Ertman then traces the causal process through which these imported institutions pushed Denmark to develop bureaucratic absolutism, concluding that this development was caused by a factor well outside his explanatory framework.30

Ertman's overall framework is confirmed insofar as he has been able to show, by an in-depth discussion of Denmark, that the causal processes stipulated by the general theory hold even in this apparently disconfirming case. Denmark is still deviant, but it is so because of "contingent historical circumstances" that are exogenous to the theory.31

The reader will have noted that influential-case analysis is similar to deviant-case analysis. Both focus on outliers, unusual cases (relative to the theory at hand). However, as we shall see, they focus on different kinds of unusual cases. Moreover, the animating goals of these two research designs are quite different. The influential-case analysis begins with the aim of confirming a general model, while the deviant-case study has the aim of generating a new hypothesis that modifies an existing general model. The confusion between these two case-study types stems from the fact that the same case study may fulfill both objectives - qualifying a general model and, at the same time, confirming its core hypothesis.

30 Munck (2004: 118). See also Ertman (1997).
31 Ertman (1997).

In their study of Roberto Michels's "iron law of oligarchy," Lipset, Trow, and Coleman choose to focus on an organization - the International



Typographical Union - that appears to violate the central presupposi· of the leverage of each case can be derived from the diagonal of ,he hat
tion.n The ITU. as noted bv one of the authors, has "a long-term two­ matrix. Specifically, the leverage of case t is given by the number in the
party systen1 wi,th free elccrions and frequent turnov� r in �ffice': and is Ui) position in the- hat rnatrix, or H i,i, 37
thus anything but oligarchic. 3 3 Thus, it calls into question M,chels s grand For any X n1atrix, the diagonal ea tries in the hat n1atrix will automat­
_ _ ically adJ up to K + 1 . Hence, interpretations of 1he leverage scores for
generalization about organizational b � havior. "f'he autho: s expla\n this
curious result hv rhe extraordinarily high level of educanon among the different cases will necessarily depend on the overall number of cases.
me�bers of this.union. Thus, Michcis's law is shown to be valid for most (]early\ any case with a score near one is. a case \Vith a great deal o.1: lever­
orgar:izations, but not all. It is vaEd with qualifications. Note that the age. In n1ost regressiorl' situations, however, no Cd.Se has a score that high.
respecificatiou of the original model (in effect, Lipset, Trow, and Cole­ l'\ standard rule of thutnb is to pay close attention to cases with a Ievrrao-e b

n:ian introduce a new control variable or boundary cond1t1on) 1nvolves score higher than 2 ( K + 1) iN. Cases with a leverage score above this
the exploration of a new hypothesis. In this respect, the use of an influ­ value are good candidates for influential··case selection.
ential case to confirm :1n existing rheory is quite similar to the use of a A..n interesting feature of the hat 111atrix i s that it does not depend on
deviant case to unearth a new· theory, the values of the dependent variable. Indeed, the Y vector does not appear
in equation 5.6. This means that the measure of leverage derived from the
hat matrix is, in effect, a 111easure of potential influence. It -::elJs us how
Cross-Case Technique
s that, if counterfactuaUy much difference the case \vould make in the final estin1ate if it were to
Influential cases in regression are those case
ble, would most sub_stan­ have an unusual score on the dependent vafJablc, but it docs not tell us
assigned a different value on the dependent varia
titative n1easures ot lllflu­ hovv much difference each case actually n1ade in the final esri1nate,
tial!v change the resulting estimates. Two quan
nostics, 34 The first, often ...;\nalysts involved in selecting influential cases will sornetin1es be inter­
enc� are con1mon ly applied in regression diag
from what is called the hat L'sted in measures of potential influence, because such 1neasures are rele­
referred to as the "leverage" of a case, derives
penJent .variables. for all o : vant in selecting cases when there n1ay be so111e a priori uncertainty about
matrix.35 Suppose that the scores on the inde
tl-:e matrix X, ¥.rh1ch has N scores on the dependent variable, 11uch of the infonnation in such case
the cases in a regression are represented by
K + 1 columns (representing studies conies from a careful, in·-depth rneasi:rement of the dependent
row s (representing each of the N cases) and
a constant). Further, allow variable - which may sometimes be unkno\vn� or only approxitn01tely
the K independent varia bles and allowing for
variable for a!l of the case&, known, before the case study begins. The measure of levera�e derived
y ro represent the scores on the dependent fron1 the hat n1:1trix is appropriate for such situations because lt does not
mn.
Therefore. Y will have N rows and only one colu
ix, H, is as follows: n:quire actual scores for the dependent variable,
Using these symbols, rhc formula for the hat matr A second co1nmonly discussed n1easure of influt:'nce i n statistics is
i5.6l
H = X(XTx:- 1 x'f { 'ook's distance. This statistic is a 111easure of the extent to which the estl-
ix transpose operation, 11,atcs of the fi i parameters would change if a given case were omitted frorr1
ln this equation, the syrr1bol "T" represents a matr
rse operation.36 i\ measure l lie analysis. Because regression analysis typicaHy includes 1nore than one
and the symbol •• --- 1 represents a matrix inve
1

/f i pal'J1neter, a measure of influence requires some method of combining


'

t!H· differences in each parameter to produce an overall n1easure of a


32 Lipse:j Trow, and Colcr:nan {1956 ). r.1st•'s influence. The Cook\; distance statistic resolves this dilen1ma by
:n Lipset (1959: 70).
34 Belsey, Kuh, and Welsch (2004). _. _ . . . ,
::s Tbis somcwhar;;.urious name dcnves fr�>nt
the fact that, 1t the hat n1atr:x 1!:'. mult1phcd ,iy
the vector containing voiues of the dependent val'iable, the resu!t is the vector o_f �tt,:d
values for each c.isc. Typically, the ve(.-tor of fitted values for :he depe:1d_ent var13hk ,, 1' !'lw di,, ll'i·;io11 lwn· i11;•. , I \ ,.., i l k H�L' {if thrt hat 111atnx ::1 lincar regression. Anal ysts rnay
disti ngui:,hed from rhe acrn-al ve.:tor uf value '> on the dependent varia�lc by the uv· o! .d•.,, lw 1:jji-n",'1·,i 1 1 1 ·,Hu.111,,11•. du1 d1) 1111! rt• ..:n11h'.c linear regression pn:,blerns., e.g.,
cvs : 1 w fit,cd v.i!HI'.',, �·.t1 1 b, wl•i"t" !�w ,l1· 1,,·:1t:,· ,; 1,11 ·,1!,k · . ,111 1 ,,11,,1;1,a1� (n-,:.1\q.�nri,.:;,f. Snrncti ines, t'.:ese situati 0 ns
rbe ..�,, or "h.at" symbcL Hence, the D.:.n ni,ull'.rix. ,,.,..-b1ei1 yrndu , 111 iw.1>, \>nn111 ,,L1:, ,! 1, 1 1 l n n , jp I· ,,- , q L 1,! )•,,·11tT.d11rd liiw.1rmt><lt·i�, which includes
sni<l to put �he- !lat on the dcpc:;dl'l11 v;iri:ih ll!i,

I•\\ ', i'.1 111'1 ,I, . 1 1 1 , I ,, I, 11 "' I I \ I :'-, 1, i!ll.11 ·J1 11d l'l, ·!d1'I I 'IX')).
\;, Sec ( ;r,'\''.W l_)_ '.102) '.c1 hril't' rrv), w.
I! • >I 'I! ' ,I I! j ,I
,l

l .
taking a weighted sum of the squared parameter differences associated with deleting a specific case. Specifically, the formula for Cook's distance is:

$$\frac{(b_{-i} - b)^{T} X^{T} X (b_{-i} - b)}{(K + 1)\,\mathrm{MSE}} \tag{5.7}$$

In this formula, b represents all of the parameter estimates from the regression using the whole set of cases, and $b_{-i}$ represents the parameter estimates from the regression that excludes the ith case. X, as above, represents the matrix of independent variables. K is the total number of independent variables (not including the constant, which is allowed for in the formula by the use of K + 1). Finally, MSE stands for the mean squared error, which is a measure of the amount of variation in the dependent variable not linearly associated with the independent variables.38

This somewhat intimidating mathematical notation gives precise expression to the intuitive idea, discussed earlier, of measuring influence as a weighted sum of the differences that result in each parameter estimate when a single case is deleted from the sample. One disadvantage of this formula is that it requires a number of extra regressions to be run in order to compute measures of influence for each case. The overall regression must of course be computed, and then an additional regression, with one case deleted, is required for each case.

Fortunately, matrix-algebraic manipulation demonstrates that the expression for Cook's distance given in equation 5.7 is equivalent to the following, computationally much easier expression:

$$\frac{r_i^{2}\, H_{i,i}}{(K + 1)(1 - H_{i,i})} \tag{5.8}$$

In this expression, $H_{i,i}$ refers to the measure of leverage for the ith case, taken from the diagonal of the hat matrix, as already discussed. K once again represents the number of independent variables. Finally, $r_i$ is a special, modified version of the ith case's regression residual, known as the Studentized residual, which needs to be separately computed.

The Studentized residual is designed so that the residuals for all cases will have the same variance. If the standard regression residual for case i is denoted by $E_i$, then the Studentized residual, $r_i$, can be computed as follows. (All symbols in this expression are as previously defined.)

$$r_i = \frac{E_i}{\sqrt{\mathrm{MSE}\,(1 - H_{i,i})}} \tag{5.9}$$

As can be seen from an inspection of equations 5.8 and 5.9, Cook's distance for a case depends primarily on two quantities: the size of the regression residual for that case and the leverage for that case. The most influential cases are those with substantial leverage that lie significantly off the regression line.

Cook's distance for a given case provides a summary of the overall difference that the decision to include that case makes for the parameter estimates. Cases with a large Cook's distance contribute quite a lot to the inferences drawn from the analysis. In this sense, such cases are vital for maintaining analytic conclusions. Discovering a significant measurement error on the dependent variable or an important omitted variable for such a case may dramatically revise estimates of the overall relationships. Hence, it may be reasonable to select influential cases for in-depth study.

To summarize, three statistical concepts have been introduced in this section. The hat matrix provides a measure of leverage, or potential influence. Based solely on each case's scores on the independent variables, the hat matrix tells us how much a change in (or a measurement error on) the dependent variable for that case would affect the overall regression line. Cook's distance goes further, considering scores on both the independent and the dependent variables in order to tell us how much the overall regression estimates would be affected if each case were to be dropped from the analysis. This produces a measure of how much actual influence each case has on the overall regression.

Either the hat matrix or Cook's distance may serve as an acceptable measure of influence for selecting case studies, although the differences just discussed must be kept in mind. In the following examples, Cook's distance will be used as the primary measure of influence because our interest is in whether any particular cases might be influencing the coefficient estimates in our democracy-and-development regression. A third concept, the Studentized residual, was introduced as a necessary element in computing Cook's distance. (The hat matrix is, of course, also a necessary ingredient in Cook's distance.)
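The three concepts summarized above can be sketched numerically. The following is a minimal illustration, not the procedure used for the democracy-and-development regression: it assumes NumPy is available, the function names and the ten data points are invented for the example, and the check at the end simply confirms on this toy data that the case-deletion formula (equation 5.7) and the shortcut built from leverage and Studentized residuals (equations 5.8 and 5.9) agree.

```python
import numpy as np


def design_matrix(x):
    """Add a constant column to the independent variables: N rows, K + 1 columns."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    return np.column_stack([np.ones(len(x)), x])


def influence_measures(x, y):
    """Return leverage, Studentized residuals, and Cook's distance (eqs. 5.6, 5.9, 5.8)."""
    X = design_matrix(x)
    y = np.asarray(y, dtype=float)
    n, k_plus_1 = X.shape
    H = X @ np.linalg.inv(X.T @ X) @ X.T                   # hat matrix, eq. 5.6
    leverage = np.diag(H)                                  # H_{i,i}
    resid = y - H @ y                                      # E_i: actual minus fitted
    mse = resid @ resid / (n - k_plus_1)                   # divide by N - K - 1
    r = resid / np.sqrt(mse * (1 - leverage))              # eq. 5.9
    cook = r**2 * leverage / (k_plus_1 * (1 - leverage))   # eq. 5.8
    return leverage, r, cook


def cook_by_deletion(x, y):
    """Cook's distance via one extra regression per deleted case (eq. 5.7)."""
    X = design_matrix(x)
    y = np.asarray(y, dtype=float)
    n, k_plus_1 = X.shape
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ b
    mse = resid @ resid / (n - k_plus_1)
    out = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        b_minus_i = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
        diff = b_minus_i - b
        out[i] = diff @ (X.T @ X) @ diff / (k_plus_1 * mse)
    return out


# Hypothetical data: nine cases on the line y = 2 + 0.5x, plus one case
# that sits far out on x and well off that line.
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 30], dtype=float)
y = 2 + 0.5 * x
y[-1] = 0.0

leverage, studentized, cook = influence_measures(x, y)
cook_loo = cook_by_deletion(x, y)
most_influential = int(np.argmax(cook))
print(most_influential, np.allclose(cook, cook_loo))
```

In this toy setup the last case combines high leverage with a sizable residual, which is the combination the text identifies as most influential; a researcher following the influential-case strategy would examine that case first.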

38 Specifically, the MSE is found by summing the squared residuals from the full regression and then dividing by N - K - 1, where N is the number of cases and K is the number of independent variables.

Figure 5.6 shows the Cook's distance scores for each of the countries in the 1995 per capita GDP and democracy dataset. Most countries have
quite low scores. The three most serious exceptions to this generalization are the numbered lines in the figure: Jamaica (74), Japan (75), and Nepal (105). Of these three, Nepal is clearly the most influential by a wide margin. Hence, any study of influential cases would want to start with an in-depth consideration of Nepal.

FIGURE 5.6. Potential influential cases. The Cook's distance scores for an OLS regression of democracy on logged per capita GDP. The three numbered cases have high Cook's distance scores. [Horizontal axis: country code, 0 to 120; numbered cases: 74, 75, 105.]

Conclusion
The use of an influential-case strategy of case selection is limited to instances in which a researcher has reason to be concerned that her results are being driven by one or a few cases. This is most likely to be true in small to moderate-sized samples. Where N is very large - greater than 1,000, let us say - it is extremely unlikely that a small set of cases (much less an individual case) will play an "influential" role. Of course, there may be influential sets of cases - for example, countries within a particular continent or cultural region, or persons of Irish extraction. Sets of influential observations are often problematic in a time-series cross-section dataset where each unit (e.g., country) contains multiple observations (through time) and hence may have a strong influence on aggregate results. Still, the general rule is: the larger the sample, the less important individual cases are likely to be and, hence, the less likely a researcher is to use hat matrix and Cook's distance statistics for purposes of case selection. In these instances, it may not matter very much what values individual cases display. (It may of course matter for the purpose of investigating causal mechanisms. However, for this purpose one would not employ influential statistics to choose cases.)

Crucial Case
Of all the extant methods of case selection, perhaps the most storied - and certainly the most controversial - is the crucial-case method, introduced to the social science world several decades ago by Harry Eckstein. In his seminal essay, Eckstein describes the crucial case as one "that must closely fit a theory if one is to have confidence in the theory's validity, or, conversely, must not fit equally well any rule contrary to that proposed."39 A case is "crucial" in a somewhat weaker - but much more common - sense when it is most, or least, likely to fulfill a theoretical prediction. A "most-likely" case is one that, on all dimensions except the dimension of theoretical interest, is predicted to achieve a certain outcome, and yet does not. It is therefore used to disconfirm a theory. A "least-likely" case is one that, on all dimensions except the dimension of theoretical interest, is predicted not to achieve a certain outcome, and yet does so. It is therefore used to confirm a theory. In all formulations, the crucial case offers a most-difficult test for an argument, and hence provides what is perhaps the strongest sort of evidence possible in a nonexperimental, single-case setting.

Since the publication of Eckstein's influential essay, the crucial-case approach has been claimed in a multitude of studies across several social science disciplines and has come to be recognized as a staple of the case study method.40 Yet the idea of any single case playing a crucial (or "critical") role is not widely accepted among most methodologists.41 (Even its progenitor seems to have had doubts.)

39 Eckstein (1975: 118).
40 For examples of the crucial-case method, see Bennett, Lepgold, and Unger (1994); Desch (2002); Goodin and Smitsman (2000); Kemp (1986); and Reilly and Phillpot (2003). For general discussion, see George and Bennett (2005); Levy (2002a); and Stinchcombe (1968: 24-8).
41 [illegible]

Unfortunately, discussion of this method has focused misleadingly on what are presumed to be largely inductive issues. Are there good crucial
1. 1 1 0 >11 I 1 1 1 1 1 I )
cases out there in the empirical world? Have social scientists done a good job in identifying them? Yet the practicability of this method rests on issues that are largely deductive in nature, as we shall see.

The Confirmatory (Least-Likely) Crucial Case
Let us begin with the confirmatory (a.k.a. least-likely) crucial case. The implicit logic of this research design may be summarized as follows. Given a set of facts, we are asked to contemplate the probability that a given theory is true. While the facts matter, to be sure, the effectiveness of this sort of research also rests upon the formal properties of the theory in question. Specifically, the degree to which a theory is amenable to confirmation is contingent upon how many predictions can be derived from the theory and on how "risky" each individual prediction is. In Popper's words,

Confirmations should count only if they are the result of risky predictions; that is to say, if, unenlightened by the theory in question, we should have expected an event which was incompatible with the theory - an event which would have refuted the theory. Every 'good' scientific theory is a prohibition; it forbids certain things to happen. The more a theory forbids, the better it is.42

A risky prediction is therefore one that is highly precise and determinate, and thus unlikely to be explainable by other causal factors (external to the theory of interest) or through stochastic processes. A theory produces many such predictions if it is fully elaborated, issuing predictions not only on the central outcome of interest but also on specific causal mechanisms, and if it is broad in purview. (The notion of riskiness may be conceptualized within the Popperian lexicon as degrees of falsifiability.)

These points can also be articulated in Bayesian terms. Colin Howson and Peter Urbach explain: "The degree to which h [a hypothesis] is confirmed by e [a set of evidence] depends . . . on the extent to which P(e|h) exceeds P(e), that is, on how much more probable e is relative to the hypothesis and background assumptions than it is relative just to background assumptions." Again, "confirmation is correlated with how much more probable the evidence is if the hypothesis is true than if it is false."43 Thus, the stranger the prediction offered by a theory - relative to what we would normally expect - the greater the degree of confirmation that will be afforded by the evidence. As an intuitive example, Howson and Urbach offer the following:

If a soothsayer predicts that you will meet a dark stranger sometime and you do in fact, your faith in his powers of precognition would not be much enhanced: you would probably continue to think his predictions were just the result of guesswork. However, if the prediction also gave the correct number of hairs on the head of that stranger, your previous scepticism would no doubt be severely shaken.44

While these Popperian/Bayesian insights45 are relevant to all empirical research designs, they are especially relevant to case study research designs, for in these settings a single case (or, at most, a small number of cases) is required to bear a heavy burden of proof. It should be no surprise, therefore, that Popper's idea of "riskiness" was appropriated by case study researchers like Harry Eckstein to validate the enterprise of single-case analysis. (Although Eckstein does not cite Popper, the intellectual lineage is clear.) Riskiness, here, is analogous to what is usually referred to as a "most-difficult" research design, which in a case study research design would be understood as a least-likely case. Note also that the distinction between a must-fit case and a least-likely case - that, in the event, actually does fit the terms of a theory - is a matter of degree. Cases are more or less crucial for confirming theories. The point is that, in some circumstances, the riskiness of the theory may compensate for a paucity of empirical evidence.

The crucial-case research design is, perforce, a highly deductive enterprise; much depends on the quality of the theory under investigation. It follows that the theories most amenable to crucial-case analysis are those that are lawlike in their precision, degree of elaboration, consistency, and scope. The more a theory attains the status of a causal law, the easier it will be to confirm, or to disconfirm, with a single case.

42 Popper (1963: 36). See also Popper (1934/1968).
43 Howson and Urbach (1989: 86).
44 Ibid.
45 A third position, which purports to be neither Popperian nor Bayesian, has been articulated by Mayo (1996: chapter 6). From this perspective, the same idea is articulated as a matter of "severe tests."
46 [illegible]

Indeed, risky predictions are common in natural science fields such as physics, which in turn served as the template for the deductive-nomological ("covering-law") model of science that influenced Eckstein and others in the postwar decades.46 A frequently cited example is the first important empirical demonstration of the theory of relativity, which took the form of a single-event prediction on the occasion of the May 29, 1919, solar eclipse. Stephen Van Evera describes the impact of this prediction on the validation of Einstein's theory.

Einstein's theory predicted that gravity would bend the path of light toward a gravity source by a specific amount. Hence it predicted that during a solar eclipse stars near the sun would appear displaced - stars actually behind the sun would appear next to it, and stars lying next to the sun would appear farther from it - and it predicted the amount of apparent displacement. No other theory made these predictions. The passage of this one single-case-study test brought the theory wide acceptance because the tested predictions were unique - there was no plausible competing explanation for the predicted result - hence the passed test was very strong.47

The strength of this test is the extraordinary fit between the theory and a set of facts found in a single case, and the corresponding lack of fit between all other theories and this set of facts. Einstein offered an explanation of a particular set of anomalous findings that no other existing theory could make sense of. Of course, one must assume that there was no - or limited - measurement error. And one must assume that the phenomenon of interest is largely invariant; light does not bend differently at different times and places (except in ways that can be understood through the theory of relativity). And one must assume, finally, that the theory itself makes sense on other grounds (other than the case of special interest); it is a plausible general theory. If one is willing to accept these a priori assumptions, then the 1919 "case study" provides a very strong confirmation of the theory. It is difficult to imagine a stronger proof of the theory from within an observational (nonexperimental) setting.

In social science settings, by contrast, one does not commonly find single-case studies offering knock-out evidence for a theory. This is, in my view, largely a product of the looseness (the underspecification) of most social science theories. George and Bennett point out that while the thesis of the democratic peace is as close to a "law" as social science has yet seen, it cannot be confirmed (or refuted) by looking at specific causal mechanisms because the causal pathways mandated by the theory are multiple and diverse. Under the circumstances, no single-case test can offer strong confirmation of the theory (though, as we shall discuss, the theory may be disconfirmed with a single case).48

However, if one adopts a softer version of the crucial-case method - the least-likely (most difficult) case - then possibilities abound. Lily Tsai's investigation of governance at the village level in China employs several in-depth case studies of villages that are chosen (in part) because of their least-likely status relative to the theory of interest. Tsai's hypothesis is that villages with greater social solidarity (based on preexisting religious or familial networks) will develop a higher level of social trust and mutual obligation and, as a result, will experience better governance. Crucial cases, therefore, are villages that evidence a high level of social solidarity but that, along other dimensions, would be judged least-likely to develop good governance - that is, they are poor, isolated, and lack democratic institutions or accountability mechanisms from above. "Li Settlement," in Fujian province, is such a case. The fact that this impoverished village nonetheless boasts an impressive set of infrastructural accomplishments such as paved roads with drainage ditches (a rarity in rural China) suggests that something rather unusual is going on here. Because her case is carefully chosen to eliminate rival explanations, Tsai's conclusions about the special role of social solidarity are difficult to gainsay. How else would one explain this otherwise anomalous result? This is the strength of the least-likely case, where all other plausible explanations for an outcome have been mitigated.49

Jack Levy refers to this, evocatively, as a "Sinatra inference": if it can make it here, it can make it anywhere.50 Thus, if social solidarity has the hypothesized effect in Li Settlement, it should have the same effect in more propitious settings (e.g., where there is greater economic surplus). The same implicit logic informs many case study analyses where the intent of the study is to confirm a hypothesis on the basis of a single case (without extensive cross-case analysis). Indeed, I suspect that, implicitly, most case study work that focuses on a single case and is not nested within a cross-case analysis relies largely on the logic of the least-likely case. Rarely is this logic made explicit, except perhaps in a passing phrase or two. Yet the deductive logic of the "risky" prediction may in fact be central to the case study enterprise. Whether a case study is convincing or not often rests on the reader's evaluation of how strong the evidence for an argument might be, and this in turn - wherever cross-case evidence is limited and no manipulated treatment can be devised - rests upon an estimation of the degree of "fit" between a theory and the evidence at hand, as discussed.
47 Van Evera (1997: 66-7). See also Eckstein (1975) and Popper (1963).
48 George and Bennett (2005: 209).
49 Tsai (2007). It should be noted that Tsai's conclusions do not rest solely on this crucial case. Indeed, she employs a broad range of methodological tools, encompassing case study and cross-case methods.
50 Levy (2002a: 144). See also Khong (1992: 49); Sagan (1995: 49); and Shafer (1988: 14-6).
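The Bayesian reading of a "risky" prediction quoted earlier from Howson and Urbach can be made concrete with a small calculation. This is only a sketch: the prior and the likelihoods assigned to the two soothsayer predictions are invented for illustration, and the function names are not drawn from any source.

```python
def probability_of_evidence(prior, p_e_given_h, p_e_given_not_h):
    """P(e): how probable the evidence is on background assumptions alone."""
    return prior * p_e_given_h + (1 - prior) * p_e_given_not_h


def confirmation_ratio(prior, p_e_given_h, p_e_given_not_h):
    """P(e|h) / P(e): the extent to which P(e|h) exceeds P(e)."""
    return p_e_given_h / probability_of_evidence(prior, p_e_given_h, p_e_given_not_h)


def posterior(prior, p_e_given_h, p_e_given_not_h):
    """P(h|e) by Bayes' rule."""
    return prior * confirmation_ratio(prior, p_e_given_h, p_e_given_not_h)


# Assumed numbers: a sceptic gives the soothsayer's powers a prior of 0.01.
PRIOR = 0.01
# "You will meet a dark stranger": very likely to come true by guesswork alone.
vague = dict(p_e_given_h=1.0, p_e_given_not_h=0.9)
# "...with exactly this many hairs": almost impossible to guess correctly.
risky = dict(p_e_given_h=1.0, p_e_given_not_h=1e-5)

posterior_vague = posterior(PRIOR, **vague)
posterior_risky = posterior(PRIOR, **risky)
print(round(posterior_vague, 4), round(posterior_risky, 4))
```

Under these assumed values the vague prediction barely moves the sceptic's belief, while the precise one moves it almost all the way, which is the sense in which the riskiness of a theory's prediction may compensate for having only a single case.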

The Disconfirmatory (Most-Likely) Crucial Case
A central Popperian insight is that it is easier to disconfirm an inference than to confirm that same inference. (Indeed, Popper doubted that any inference could be fully confirmed, and for this reason preferred the term "corroborate.") This is particularly true of case study research designs, where evidence is limited to one or several cases. The key proviso is that the theory under investigation must take a consistent (a.k.a. invariant, deterministic) form, even if its predictions are not terrifically precise, well elaborated, or broad.

As it happens, there are a fair number of invariant propositions floating around the social science disciplines.51 In Chapter Three, we discussed an older theory that stipulated that political stability would occur only in countries that are relatively homogeneous, or where existing heterogeneities are mitigated by cross-cutting cleavages.52 Arend Lijphart's study of the Netherlands, a peaceful country with reinforcing social cleavages, is commonly viewed as refuting this theory on the basis of a single in-depth case analysis.53

Heretofore, I have treated causal factors as dichotomous. Countries have either reinforcing or cross-cutting cleavages, and they have regimes that are either peaceful or conflictual. Evidently, these sorts of parameters are often matters of degree. In this reading of the theory, cases are more or less crucial. Accordingly, the most useful - that is, most crucial - case for Lijphart's purpose is one that has the most segregated social groups and the most peaceful and democratic track record. In these respects, the Netherlands was a very good choice. Indeed, the degree of disconfirmation offered by this case study is probably greater than the degree of disconfirmation that might have been provided by another case, such as India or Papua New Guinea - countries where social peace has not always been secure. The point is that where variables are continuous rather than dichotomous, it is possible to evaluate potential cases in terms of their degree of crucialness.

Note that when disconfirming a causal argument, background causal factors are irrelevant (except as they might affect the classification of the case within the population of an inference). It does not matter how the Netherlands, India, and Papua New Guinea score on other factors that affect democracy and social peace.

Granted, it may be questioned whether presumed invariant theories are really invariant; perhaps they are better understood as probabilistic. Perhaps, that is, the theory of cross-cutting cleavages is still true, probabilistically, despite the apparent Dutch exception. Or perhaps the theory is still true, deterministically, within a subset of cases that does not include the Netherlands. (This sort of claim seems unlikely in this particular instance, but it is quite plausible in many others.) Or perhaps the theory is in need of reframing; it is true, deterministically, but applies only to cross-cutting ethnic/racial cleavages, not to cleavages that are primarily religious. One may quibble over what it means to "disconfirm" a theory. The point is that the crucial case has, in all these circumstances, provided important updating of a theoretical prior.

Conclusion
In this section, I have argued that the degree to which crucial cases can provide decisive confirmation or disconfirmation of a theory is in large part a product of the structure of the theory to be tested. It is a deductive matter rather than an inductive matter, strictly speaking. In this respect, a "positivist" orientation toward the work of social science may lead to a greater appreciation of the case study format - not a denigration of that format, as is usually supposed. Those who, with Eckstein, embrace the notion of covering laws are likely to be attracted to the idea of cases that are crucial. By the same token, those who are impressed by the irregularity and complexity of social behavior are unlikely to be persuaded by crucial-case studies, except as a method of disconfirming absurdly rigid causal laws.

I have shown, relatedly, that it is almost always easier to disconfirm a theory than to confirm it with a single case. Thus, a theory that is understood to be deterministic may be disconfirmed by a case study, properly chosen. This is the most common employment of the crucial-case method in social science settings.

51 Goertz and Levy (forthcoming); Goertz and Starr (2003).
52 Almond (1956); Bentley (1908/1967); Lipset (1960/1963); Truman (1951).
53 Lijphart (1968). See also discussions in Eckstein (1975) and Lijphart (1969). For additional examples of case studies disconfirming general propositions of a deterministic nature, see Allen (1965); Lipset, Trow, and Coleman (1956); Njolstad (1990); Reilly (2000/2001); and the discussions in Dion (1998) and Rogowski (1995).

Note that the crucial-case method of case selection cannot be employed in a large-N context. This is because the method of selection would render the case study redundant. Once one identifies the relevant parameters and the scores of all cases on those parameters, one has in effect constructed a cross-case model that will, by itself, confirm or disconfirm the theory in question. The case study is thenceforth irrelevant, at least as a means of confirmation or disconfirmation. It remains highly relevant as a means of
exploring causal mechanisms, of course. However, because this objective Consider the following examples culled by Bear Braumoeller and
is quite different from that which is usually associated with the term, I drawn from diverse fields of political science.55 The decision to seek an
enlist a new tern1 for this technique. alliance is motivated by the search for either autonomy or security.56 Con­
quest is prevented by either deterrence or defense.57 Civilian intervention
in 111ilitary affairs is caused by either political isolation or geographical
Pathway Case encirclement.58 War is the product of miscalcnlation or loss of control. 59
One of the most important functions of case study research is the elucida­ Nonvoting is caused by ignorance, indifference, dissatisfaction, or inac­
tion of causal mechanisms. This is well established (see Chapter Three). tivity.6 0 Voting decisions are influenced either by high levels of informa­
But what sort of case is most useful for this purpose? Although all case tion or by the use of candidate gender as a proxy for social information.6 1
studies presun1ably shed light on causal mechanisms, not all cases are Democratization conies about through leadership-initiated reform, a con­
equally transparent. In situations where a causal hypothesis is clear and trolled opening to opposition, or the collapse of an authoritarian regime. 62
has already been confirmed by cross-case analysis, researchers are well These, and many other, social science arguments take the form of causal
advised to focus on a case where the causal effect of one factor can substitutability - multiple paths to a given outcome.
be isolated from other potentially confounding factors. I shall call this a pathway case to indicate its uniquely penetrating insight into causal mechanisms.

To clarify, the pathway case exists only in circumstances where cross-case covariational patterns are well studied but where the mechanism linking X1 and Y remains dim. Because the pathway case builds on prior cross-case analysis, the problem of case selection must be situated within that sample. There is no stand-alone pathway case. Thus, the following discussion focuses on how to select one (or a few) cases from a cross-case sample.

Cross-Case Technique with Binary Variables

The logic of the pathway case is clearest in situations of causal sufficiency - where a causal factor of interest, X1, is sufficient by itself (though perhaps not necessary) to cause a particular outcome, Y, understood as a unidirectional or asymmetric causal relationship. The other causes of Y, about which we need make no assumptions, are designated as a vector, X2.

Note that wherever various causal factors are deemed to be substitutable for one another, each factor is conceptualized (individually) as sufficient.54 Situations of causal equifinality presume causal sufficiency on the part of each factor or set of conjoint factors. The QCA technique, for example, presumes causal sufficiency for each of the designated causal paths.

For heuristic purposes, it will be helpful to pursue one of these examples in greater detail. For consistency, I focus on the last of the exemplars - democratization. The literature, according to Braumoeller, identifies three main avenues of democratization (there may be more, but for present purposes let us assume that the universe is limited to three). The case study format constrains us to analyze one at a time, so let us limit our scope to the first one - leadership-initiated reform. So considered, a causal-pathway case would be one with the following features: (a) democratization, (b) leadership-initiated reform, (c) no controlled opening to the opposition, (d) no collapse of the previous authoritarian regime, and (e) no other extraneous factors that might affect the process of democratization. In a case of this type, the causal mechanisms by which leadership-initiated reform may lead to democratization will be easiest to study. Note that it is not necessary to assume that leadership-initiated reform always leads to democratization; it may or may not be a deterministic cause. But it is necessary to assume that leadership-initiated reform can sometimes lead to democratization. This covariational assumption about the relationship

Footnotes:
Ibid. My chosen examples are limited to those that might plausibly be modeled with dichotomous variables. For further discussion and additional examples, see Most and Starr (1984) and Cioffi-Revilla and Starr (1995).
Morrow (1991: 905).
Schelling (1966: 78).
[Further citation footnotes on this page are illegible in the source.]
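The selection rule for the democratization example can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a procedure given in the text: the country labels, field names, and codings are all hypothetical, the two rival causes stand in for the other avenues of democratization, and feature (e), the absence of other extraneous factors, is not encoded.

```python
# Hypothetical cross-case sample; every coding below is invented for illustration.
# Each causal factor and the outcome are dichotomous (0 = absent, 1 = present).
CASES = [
    {"case": "Country 1", "leadership_reform": 1, "controlled_opening": 0,
     "regime_collapse": 0, "democratized": 1},
    {"case": "Country 2", "leadership_reform": 1, "controlled_opening": 1,
     "regime_collapse": 0, "democratized": 1},
    {"case": "Country 3", "leadership_reform": 0, "controlled_opening": 0,
     "regime_collapse": 1, "democratized": 1},
    {"case": "Country 4", "leadership_reform": 1, "controlled_opening": 0,
     "regime_collapse": 0, "democratized": 0},
    {"case": "Country 5", "leadership_reform": 0, "controlled_opening": 0,
     "regime_collapse": 0, "democratized": 0},
]

X1 = "leadership_reform"                        # causal factor of interest
X2 = ["controlled_opening", "regime_collapse"]  # rival (substitutable) causes
Y = "democratized"                              # outcome of interest


def is_pathway_case(case):
    """True when Y and X1 are present and every rival cause in X2 is absent."""
    return case[Y] == 1 and case[X1] == 1 and all(case[x] == 0 for x in X2)


pathway_cases = [c["case"] for c in CASES if is_pathway_case(c)]
print(pathway_cases)
```

In this invented sample only "Country 1" passes the screen; "Country 2" is excluded because a rival cause is also present, and "Country 4" because the outcome does not occur.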

between X1 and Y is presumably sustained by the cross-case evidence (if it is not, there is no justification for a pathway case study).

Now let us move from these examples to a general-purpose model. For heuristic purposes, let us presume that all variables in that model are dichotomous (coded as zero or one) and that the model is complete (all causes of Y are included). All causal relationships will be coded so as to be positive: X1 and Y covary as do X2 and Y. This allows us to visualize a range of possible combinations at a glance.

Recall that the pathway case is always focused, by definition, on a single causal factor, denoted X1. (The researcher's focus may shift to other causal factors, but may focus only on one causal factor at a time.) In this scenario, and regardless of how many additional causes of Y there might be (denoted X2, a vector of controls), there are only eight relevant case types, as illustrated in Table 5.2. Identifying these case types is a relatively simple matter, and can be accomplished in a small-N sample by the construction of a truth table (modeled after Table 5.2) or in a large-N sample by the use of cross-tabs.

TABLE 5.2. Pathway case with dichotomous causal factors
[Table body (case types A-H scored on X1, X2, and Y) not recoverable from the source.]
X1 = the variable of theoretical interest. X2 = a vector of controls (a score of zero indicates that all control variables have a score of zero, while a score of one indicates that all control variables have a score of one). Y = the outcome of interest. A-H = case types (the N for each case type is indeterminate). H = pathway case. Sample size = indeterminate.
Assumptions: (a) all variables can be coded dichotomously; (b) all independent variables are positively correlated with Y in the general case; (c) X1 is (at least sometimes) a sufficient cause of Y.

Note that the total number of combinations of values depends on the number of control variables, which we have represented with a single vector, X2. If this vector consists of a single variable, then there are only eight case types. If this vector consists of two variables (X2a, X2b), then the total number of possible combinations increases from eight (2^3) to sixteen (2^4), and so forth. However, none of these combinations is relevant for present purposes except those where X2a and X2b have the same value (zero or one). "Mixed" cases are not causal pathway cases, for reasons that should become clear.

The pathway case, following the logic of the crucial case, is one where the causal factor of interest, X1, correctly predicts Y's positive value (Y = 1) while all other possible causes of Y (represented by the vector, X2) make "wrong" predictions. If X1 is - at least in some circumstances - a sufficient cause of Y, then it is these sorts of cases that should be most useful for tracing causal mechanisms. There is only one such case in Table 5.2 - H. In all other cases, the mechanism running from X1 to Y would be difficult to discern, because the outcome to be explained does not occur (Y = 0), because X1 and Y are not correlated in the usual way (violating the terms of our hypothesis), or because other confounding factors (X2) intrude. In case A, for example, the positive value on Y could be a product of X1 or X2. Consequently, an in-depth examination of cases A-G is not likely to be very revealing.

Keep in mind that because we already know from our cross-case examination what the general causal relationships are, we know (prior to the case study investigation) what constitutes a correct or incorrect prediction. In the crucial-case method, by contrast, these expectations are deductive rather than empirical. This is what differentiates the two methods. And this is why the causal-pathway case is useful principally for elucidating causal mechanisms rather than for verifying or falsifying general propositions (which are already apparent from the cross-case evidence).63

Now let us complicate matters a bit by imagining a scenario in which at least some of these substitutable causes are conjoint (a.k.a. conjunctural). That is, several combinations of factors - Xa + Xb or Xc + Xd - are sufficient to produce the outcome, Y. This is known in philosophical circles as an INUS condition,64 and it is the pattern of causation assumed in most

Footnotes:
63 Of course, we should leave open the possibility that an investigation of causal mechanisms might invalidate a general claim, if that claim is utterly contingent upon a specific set of causal mechanisms and the case study shows that no such mechanisms are present. However, this is rather unlikely in most social science settings. Usually, the result of such a finding will be a reformulation of the causal processes by which X1 causes Y - or, alternatively, a realization that the case under investigation is aberrant (atypical of the general population of cases).
64 An INUS condition refers to an insufficient but necessary part of a condition which is itself unnecessary but sufficient for a particular result. Thus, when one identifies a short circuit as the "cause" of a fire, one is saying, in effect, that the fire was caused by a short

QCA (Qualitative Comparative Analysis) models.65 Here, everything that has been said so far must be adjusted so that X1 refers to a set of causes (e.g., Xa + Xb) and X2 refers to a vector of sets (e.g., Xc + Xd, Xe + Xf, Xg + Xh, ...). The scoring of all these variables makes matters more difficult than in the previous set of examples. However, the logical task is identical, and can be accomplished in a similar fashion, that is, in small-N datasets with truth tables and in large-N datasets with cross-tabs. Case H now refers to a conjunction of causes, but it is still the only possible pathway case.

Cross-Case Technique with Continuous Variables

Finally, we must tackle the most complicated scenario - when all (or most) variables of concern to the model are continuous, rather than dichotomous. Here, the job of case selection is considerably more complex, for causal "sufficiency" (in the usual sense) cannot be invoked. It is no longer plausible to assume that a given cause can be entirely partitioned, that is, that all rival factors can be eliminated. Even so, the search for a pathway case may be viable.

What we are looking for in this scenario is a case that satisfies two criteria: (1) it is not an outlier (or at least not an extreme outlier) in the general model, and (2) its score on the outcome (Y) is strongly influenced by the theoretical variable of interest (X1), taking all other factors into account (X2). In this sort of case it should be easiest to identify the causal mechanisms that lie between X1 and Y.

In a large-N sample, these two desiderata may be judged by a careful attention to the residuals attached to each case. Recall that the question of deviance, which we have discussed in previous sections, is a matter of degree. Cases are more or less typical/deviant relative to a general model, as judged by the size of their residuals. It is easy enough to exclude cases with very high residuals (e.g., |standardized residual| > 2). For cases that lie closer to their predicted value, small differences in the size of residuals may not matter so much. But, ceteris paribus, one would prefer a case that lies closer to the regression line.

Achieving the second desideratum requires a bit of manipulation. In order to determine which (non-outlier) cases are most strongly affected by X1, given all the other parameters in the model, one must compare the size of the residuals (their absolute value) for each case in a reduced-form model, Y = Constant + X2 + Res_reduced, to the size of the residuals for each case in a full model, Y = Constant + X2 + X1 + Res_full. The pathway case is that case, or set of cases, that shows the greatest difference between the residuals for the reduced-form model and the full model (ΔResidual). Thus,

Pathway = |Res_reduced - Res_full|, if |Res_reduced| > |Res_full|     (5.10)

Note that the residual for a case must be smaller in the full model than in the reduced-form model; otherwise, the addition of the variable of interest (X1) pulls the case away from the regression line. We want to find a case where the addition of X1 pushes the case toward the regression line, that is, it helps to "explain" the case.

As an example, let us suppose that we are interested in exploring the effect of mineral wealth on the prospects for democracy in a society. According to a good deal of work on this subject, countries with a bounty of natural resources - particularly oil - are less likely to democratize (or, once having undergone a democratic transition, are more likely to revert to authoritarian rule).66 The cross-country evidence is robust. Yet, as is often the case, causal mechanisms remain rather obscure. Consider the following list of possible causal pathways, summarized by Michael Ross:

A 'rentier effect' ... suggests that resource-rich governments use low tax rates and patronage to relieve pressures for greater accountability; a 'repression effect' ... argues that resource wealth retards democratization by enabling governments to boost their funding for internal security; and a 'modernization effect' ... holds that growth based on the export of oil and minerals fails to bring about the social and cultural changes that tend to produce democratic government.67

Are all three causal mechanisms at work? Although Ross attempts to test these factors in a large-N cross-country setting, his answers remain rather

Footnotes:
64 (continued) circuit in conjunction with some other background factors (e.g., oxygen) that were also necessary to that outcome. But one is not implying that a short circuit was necessary to that fire, which might have been (under different circumstances) caused by other factors. See Mackie (1965/1993).
65 Ragin (2000).
66 Barro (1999); Humphreys (2005); Ross (2001).
67 Ross (2001: [page number illegible]).
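The residual comparison in equation 5.10 can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the author's procedure: the data are invented, the helper names (solve, ols_residuals, pathway_scores) are hypothetical, and raw rather than standardized residuals are used to keep the example short.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols_residuals(columns, y):
    """Residuals from an OLS regression of y on a constant plus the given columns."""
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return [yi - sum(b * v for b, v in zip(beta, row)) for row, yi in zip(X, y)]


def pathway_scores(res_reduced, res_full):
    """Equation 5.10: |Res_reduced - Res_full| if |Res_reduced| > |Res_full|, else None."""
    return [abs(r - f) if abs(r) > abs(f) else None
            for r, f in zip(res_reduced, res_full)]


# Invented data: one control (x2) and the variable of interest (x1).
x2 = [0, 1, 2, 3, 4, 5]
x1 = [0, 0, 6, 0, 0, 0]
y = [0, 1, 8, 3, 4, 5]

res_reduced = ols_residuals([x2], y)     # Y = Constant + X2 + Res_reduced
res_full = ols_residuals([x2, x1], y)    # Y = Constant + X2 + X1 + Res_full
scores = pathway_scores(res_reduced, res_full)

eligible = [i for i, s in enumerate(scores) if s is not None]
best = max(eligible, key=lambda i: scores[i])
print("pathway case index:", best)
```

In this toy sample the third case (index 2) is poorly explained by the control alone and is pulled onto the regression line once x1 is added, so it receives the largest score.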

speculative.68 Let us see how this might be handled by a pathway-case approach.

The factor of theoretical interest, oil wealth, may be operationalized as per capita oil production (barrels of oil produced, divided by the total population of a country).69 As previously, we measure democracy with a continuous variable coded from -10 (most authoritarian) to +10 (most democratic). Additional factors in the model include GDP per capita (logged), Muslims (as percent of the population), European language (percent speaking a European language), and ethnic fractionalization (1 - likelihood of two randomly chosen individuals belonging to the same ethnic group).70 These are regarded as background variables (X2) that may affect a country's propensity to democratize. The full model, limited to 1995 (as in previous analyses), is as follows:

Democracy = -3.71 Constant + 1.258 GDP + -.075 Muslim + 1.843 European + -2.093 Ethnic fract + -7.662 Oil     (5.11)
Adjusted R^2 = .450 (N = 149)

The reduced-form model is identical except that the variable of theoretical interest, Oil, is removed.

Democracy = -.831 Constant + .909 GDP + -.086 Muslim + 2.242 European + -3.023 Ethnic fract     (5.12)
Adjusted R^2 = .428 (N = 149)

What does a comparison of the residuals across equations 5.11 and 5.12 reveal? Table 5.3 displays the highest ΔResidual cases. Several of these may be summarily removed from consideration by virtue of the fact that |Res_reduced| < |Res_full|. Thus, we see that the inclusion of Oil increases the residual for Norway; this case is apparently better explained without the inclusion of the variable of theoretical interest. Needless to say, this is not a good case to explore if we wish to examine the causal mechanisms that lie between natural resource wealth and democracy. (It might, however, be a good case for model diagnostics, as discussed in the previous section on influential cases.)

TABLE 5.3. Possible pathway cases where variables are scalar and assumptions probabilistic

Country                  Res_reduced   Res_full   ΔResidual
Iran                        -.282        -.456       .175
Turkmenistan               -1.220       -1.398       .178
Mauritania                  -.076        -.255       .179
Turkey                      2.261        2.069       .192
Switzerland                  .177        -.028       .205
Venezuela                    .148         .355      -.207
Belgium                      .518         .310       .208
Morocco                     -.540        -.776       .236
Jordan                       .382         .142       .240
Djibouti                    -.451        -.696       .245
Bahrain                    -1.411       -1.673       .262
Luxembourg                   .559         .291       .269
Singapore                  -1.593       -1.864       .271
Oman                       -1.270        -.981      -.289
Gabon                      -1.743       -1.418      -.325
Saudi Arabia               -1.681       -1.253      -.428
Norway                       .315        1.285      -.971
United Arab Emirates       -1.256        -.081     -1.175
Kuwait                     -1.007         .925     -1.932

Res_reduced = the standardized residual for a case obtained from the reduced model (without Oil) - equation 5.12.
Res_full = the standardized residual for a case obtained from the full model (with Oil) - equation 5.11.
ΔResidual = Res_reduced - Res_full. Listed in order of absolute value.

Among cases where the residual declines from the reduced to the full model, several are clear-cut favorites as pathway cases. The United Arab Emirates and Kuwait have the highest ΔResidual values and also have fairly modest residuals in the full model (Res_full), signifying that these cases are not extreme outliers; indeed, according to the parameters of this model, the United Arab Emirates would be regarded as a typical case. The

Footnotes:
68 Ross tests these various causal mechanisms with cross-country data, employing various proxies for these concepts in the benchmark model and observing the effect of these - presumably intermediary - effects on the main variable of interest (oil resources). This is a good example of how cross-case evidence can be mustered to shed light on causal mechanisms; one is not limited to case study formats, as discussed in Chapter Three. Still, as Ross notes (2001: 356), these tests are by no means definitive. Indeed, the coefficient on the key oil variable remains fairly constant, except in circumstances where the sample is severely constrained.
69 Derived from Humphreys (2005).
70 GDPpc data are from World Bank (2003). Muslims and European language are coded by the author. Ethnic fractionalization is drawn from Alesina et al. (2003).
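The screening of Table 5.3 can be retraced with a short Python sketch. The residuals are those printed in the table; the function name and the outlier cutoff of 2 are assumptions for illustration (the cutoff borrows the text's example of |standardized residual| > 2 and is not a rule stated for this table).

```python
# (country, Res_reduced, Res_full), as printed in Table 5.3.
TABLE_5_3 = [
    ("Iran", -0.282, -0.456), ("Turkmenistan", -1.220, -1.398),
    ("Mauritania", -0.076, -0.255), ("Turkey", 2.261, 2.069),
    ("Switzerland", 0.177, -0.028), ("Venezuela", 0.148, 0.355),
    ("Belgium", 0.518, 0.310), ("Morocco", -0.540, -0.776),
    ("Jordan", 0.382, 0.142), ("Djibouti", -0.451, -0.696),
    ("Bahrain", -1.411, -1.673), ("Luxembourg", 0.559, 0.291),
    ("Singapore", -1.593, -1.864), ("Oman", -1.270, -0.981),
    ("Gabon", -1.743, -1.418), ("Saudi Arabia", -1.681, -1.253),
    ("Norway", 0.315, 1.285), ("United Arab Emirates", -1.256, -0.081),
    ("Kuwait", -1.007, 0.925),
]

OUTLIER_CUTOFF = 2.0  # assumed threshold on |Res_full| for "extreme outlier"


def pathway_ranking(rows, cutoff=OUTLIER_CUTOFF):
    """Keep cases whose residual shrinks once Oil is added, then rank by |ΔResidual|."""
    kept = []
    for country, reduced, full in rows:
        explained_better = abs(reduced) > abs(full)   # condition in equation 5.10
        not_outlier = abs(full) <= cutoff             # first desideratum
        if explained_better and not_outlier:
            kept.append((country, round(abs(reduced - full), 3)))
    return sorted(kept, key=lambda item: item[1], reverse=True)


ranking = pathway_ranking(TABLE_5_3)
for country, score in ranking[:3]:
    print(country, score)
```

Under these assumptions Kuwait and the United Arab Emirates head the ranking, Norway is dropped because its residual grows in the full model, and Turkey is dropped only because of the assumed cutoff.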

analysis suggests, therefore, that researchers seeking to explore the effect of oil wealth on regime type might do well to focus on these two cases, since their patterns of democracy cannot be well explained by other factors such as economic development, religion, European influence, or ethnic fractionalization. The presence of oil wealth in these countries would appear to have a strong independent effect on the prospects for democratization in these countries, an effect that is well modeled by our general theory and by the available cross-case evidence. And this effect should be interpretable in a case-study format - more interpretable, at any rate, than it would be in other cases.

Conclusion

The logic of causal "elimination" is much more compelling where variables are dichotomous and where causal sufficiency can be assumed (X1 is sufficient by itself, at least in some circumstances, to cause Y). Where variables are continuous, the strategy of the pathway case is more dubious, for potentially confounding causal factors (X2) cannot be neatly partitioned. Even so, this discussion has shown why the selection of a pathway case is a logical approach to case study analysis in many circumstances.

The exceptions may be briefly noted. Sometimes, where all variables in a model are dichotomous, there are no pathway cases, that is, no cases of type H (in Table 5.2). This is known as the "empty cell" problem, or a problem of severe causal multicollinearity. The universe of observational data does not always oblige us with cases that allow us to test a given hypothesis independently of all others.

Where variables are continuous, the analogous problem is that of a causal variable of interest (X1) that has only minimal effects on the outcome of interest. That is, its role in the general model is quite minor (as judged by its standardized coefficient or by F-tests comparing the reduced-form model and the full model). In these situations, the only cases that are strongly affected by X1 - if there are any at all - may be extreme outliers, and these sorts of cases are not properly regarded as providing confirmatory evidence for a proposition, for reasons that are abundantly clear by now.

Finally, it must be underlined that the identification of a causal-pathway case does not obviate the utility of exploring other cases. However, this sort of multicase investigation moves beyond the logic of the causal-pathway case, underlining a point that we shall return to in the concluding section of the chapter: case-selection procedures often combine different logics.

Despite the technical nature of this discussion, it should be noted that when researchers refer to a particular case as an "example" of a broader phenomenon, they are often referring to a pathway case. This sort of case illustrates the causal relationship of interest in a particularly vivid manner, and therefore may be regarded as a common trope among case study researchers.

Most-Similar Case

The most-similar method, unlike the previous methods, employs a minimum of two cases.71 In its purest form, the chosen pair of cases is similar in all respects except the variable(s) of interest.

If the study is exploratory (i.e., hypothesis-generating), the researcher looks for cases that differ on the outcome of theoretical interest but are similar on various factors that might have contributed to that outcome, as illustrated in Table 5.4 (A). This is a common form of case selection at the initial stage of research. Often, fruitful analysis begins with an apparent anomaly: two cases are apparently quite similar, and yet demonstrate surprisingly different outcomes. The hope is that intensive study of these cases will reveal one - or at most several - factors that differ across these cases. These differing factors (X1) are the putative causes.

Sometimes, a researcher begins with a strong hypothesis, in which case her research design is confirmatory (hypothesis-testing) from the get-go. That is, she strives to identify cases that exhibit different scores on the factor of interest and similar scores on all other possible causal factors, as illustrated in the second (hypothesis-testing) diagram in Table 5.4 (B). If she discovers such a case, it is regarded as providing confirmatory evidence for the proposition, as well as fodder for an exploration of causal mechanisms.

The point is that the purpose of a most-similar research design, and hence its basic set-up, may change as a researcher moves from an exploratory to a confirmatory mode of analysis. However, regardless of

Footnotes:
71 Sometimes, the most-similar method is known as the "method of difference," after its inventor (Mill 1843/1872). For later treatments see Cohen and Nagel (1934); Eggan (1954); Gerring (2001: Chapter 9); Lijphart (1971, 1975); Meckstroth (1975); Przeworski and Teune (1970); and Skocpol and Somers (1980).

where one begins, the results, when published, look like a hypothesis-testing research design. Question marks have been removed: (A) becomes (B) in Table 5.4. Consequently, the notion of a "most-similar" analysis is usually understood as a tool for understanding a specific X1/Y relationship.

TABLE 5.4. Most-similar analysis with two case types
(A) Hypothesis-generating (Y-centered): [table body not recoverable from the source]
(B) Hypothesis-testing (X1/Y-centered): [table body not recoverable from the source]
X1 = the variable of theoretical interest. X2 = a vector of controls. Y = the outcome of interest.

As an example, let us consider Leon Epstein's classic study of party cohesion, which focuses on two similar countries, the United States and Canada. Canada has highly disciplined parties whose members vote together on the floor of the House of Commons, while the United States has weak, undisciplined parties whose members often defect on floor votes in Congress. In explaining these divergent outcomes, persistent over many years, Epstein first discusses possible causal factors that are held more or less constant across the two cases. Both the United States and Canada inherited English political cultures; both have large territories and heterogeneous populations; both are federal; and both have fairly loose party structures with strong regional bases and a weak center. These are the "control" variables (X2). Where they differ is in one constitutional feature: Canada is parliamentary, while the United States is presidential. And it is this institutional difference that Epstein identifies as the differentiating cause (X1).72

Several caveats apply to any most-similar analysis (in addition to the usual set of assumptions applying to all case study analysis). First, one must code cases dichotomously (high/low, present/absent). This is straightforward if the underlying variables are also dichotomous (e.g., federal/unitary). However, it is often the case that variables of concern in the model are continuous (e.g., party cohesion). In this setting, the researcher must "dichotomize" the scoring of cases so as to simplify the two-case analysis. This is relatively unproblematic if the actual scores on this dimension are quite different (on X1 and Y) or virtually identical (on X2). Unfortunately, the empirical universe does not always oblige the requirements of Millean-style analysis, and in these instances the logic of most-similar comparison becomes questionable.

Some flexibility is admissible on the vector of controls (X2) that are "held constant" across the cases. Nonidentity is tolerable if the deviation runs counter to the predicted hypothesis. For example, Epstein describes both the United States and Canada as having strong regional bases of power, a factor that is probably more significant in recent Canadian history than in recent American history. However, because regional bases of power should lead to weaker parties, rather than to stronger parties, this element of nonidentity does not challenge Epstein's conclusions. Indeed, it sets up a most-difficult research scenario, as discussed earlier. At the same time, Epstein's description of Canadian and American parties as "loose" might be questioned. Arguably, American parties, dominated in the latter twentieth century by direct primaries (open to all who declare themselves a member of a party and, in some states, even to those who are members of the opposing party), are considerably more diffuse than Canadian parties. The problem of coding continuous variables in a dichotomous manner is threatening to any most-similar analysis.

In one respect, however, the requirements for case control are not so stringent. Specifically, it is not usually necessary to measure control variables (at least not with a high degree of precision) in order to control for them. If two countries can be assumed to have similar cultural heritages, one needn't worry about constructing variables to measure that heritage. One can simply assert that, whatever they are, they are more or less constant across the two cases. This is similar to the technique employed in a randomized experiment, where the researcher typically does not attempt to measure all the factors that might affect the causal relationship of interest. She assumes, rather, that these unknown factors have been neutralized across the treatment and control groups by randomization. This can be a huge advantage over large-N cross-case methods, where each case must be assigned a specific score on all relevant control variables - often a highly questionable procedure, and one that must impose strong assumptions

Footnotes:
72 For further examples of the most-similar method, see Brenner (1976); Hamilton (1977); Lipset (1968); Miguel (2004); Moulder (1977); and Posner (2004).
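The dichotomous coding behind a most-similar comparison can be sketched as follows, using the Epstein example. The control codings follow the description in the text; the numeric cohesion scores, the cutoff, and all field and function names are hypothetical and serve only to show how a continuous variable might be dichotomized before the pair is checked.

```python
COHESION_CUTOFF = 0.5  # hypothetical cutoff separating "high" from "low" party cohesion

CONTROLS = [           # X2: factors described in the text as held roughly constant
    "english_political_culture", "large_territory",
    "heterogeneous_population", "federal", "loose_party_structure",
]


def dichotomize(score, cutoff):
    """Code a continuous score as 1 (high) or 0 (low)."""
    return 1 if score >= cutoff else 0


# Cohesion scores are invented; only their ordering reflects the text.
canada = {"parliamentary": 1, "cohesion_score": 0.9, **{c: 1 for c in CONTROLS}}
usa = {"parliamentary": 0, "cohesion_score": 0.3, **{c: 1 for c in CONTROLS}}
for country in (canada, usa):
    country["high_cohesion"] = dichotomize(country["cohesion_score"], COHESION_CUTOFF)


def is_most_similar_pair(a, b, x1, controls, y):
    """True when the pair differs on X1 and Y but matches on every control in X2."""
    differs = a[x1] != b[x1] and a[y] != b[y]
    held_constant = all(a[c] == b[c] for c in controls)
    return differs and held_constant


result = is_most_similar_pair(canada, usa, "parliamentary", CONTROLS, "high_cohesion")
print(result)
```

The check is only as good as the cutoff: moving COHESION_CUTOFF, or recoding a contested control such as "loose_party_structure", can change the verdict, which is the coding problem the text warns about.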

about the shape of the underlying causal relationship (usually presumed to be linear).

Cross-Case Technique

The most useful statistical tool for identifying cases for in-depth analysis in a most-similar setting is some variety of "matching" strategy.73 Statistical estimates of causal effects based on matching techniques have been a major topic in quantitative methodology over the last twenty-five years, first in statistics74 and subsequently in econometrics75 and political science.76

Matching techniques are based on an extension of experimental logic. In a randomized experiment, elaborate statistical models are unnecessary for causal inference because, for a large enough selection of cases, the treatment group and the control group have a high probability of being similar in their background characteristics (X2). Hence, a simple difference-of-means test is often sufficient to analyze the effects of a treatment variable (X1) across groups.

In observational studies where the hypothesized causal factor (X1) is dichotomous, the situation is superficially the same. For purposes of discussion, we shall refer to cases with a "high" score on X1 as members of the treatment group, and to cases with "low" scores as members of the control group. Thus are observational studies translated into the lexicon of experimental analysis.

However, in observational studies it is unusual to find cases that differ on X1 but not on various background characteristics (X2) that might affect the outcome of interest. For example, countries that are strongly democratic (or strongly authoritarian) are likely to be similar in more than one respect. This greatly complicates the analysis of X1's independent effect on the outcome.

The traditional approach to this problem is to introduce a variable for each potential confounder in a regression model of causal relationships. But this standard-issue technique requires a strong set of assumptions about the behavior of the various factors introduced into the model. Matching techniques have been developed as an explicit alternative to the control-variable approach. This alternative begins by identifying a set of variables (other than the dependent variable or the main independent variable) on which the cases are to be matched. Then, for each case in the treatment group, the researcher identifies as many cases as possible from the control group with the exact same scores on the matching variables (the covariates). Finally, the researcher looks at the difference on the dependent variable between the cases in the treatment group and the matching cases in the control group. If the set of matching variables is broad enough to include all confounders, the average difference between the treatment-group and the matching control-group cases should provide a good estimate of the causal effect. Even in a situation in which the set of matching variables includes some, but not all, confounders, matching may produce better causal inferences than regression models because cases that match on a set of explicitly selected variables are also more likely to be similar on unmeasured confounders.77

Unfortunately, the relatively simple matching procedure just described, known as exact matching, is often impossible. This procedure typically fails for continuous variables such as wealth, age, and distance, since there may be no two cases with the same score on a continuous variable. For example, there is no undemocratic country with the exact same per capita GDP as the United States. Note that the larger the number of covariates, the lower the likelihood of finding exact matches.

In situations where exact matching is infeasible, researchers may instead employ approximate matching, where cases from the control group that are close enough to matching cases from the treatment group are accepted as matches. Major weaknesses of this approach include the fact that the definition of "close enough" is inevitably arbitrary, as well as the fact that, for large sets of matching variables, few treatment cases are likely to have even approximate matches.

To deal with situations in which exact matching is impossible, methodologists have developed an alternative procedure known as propensity-score matching. This approach suggests a somewhat different definition of similarity than the previous two. Rather than focusing on sharing scores on the matching variables, propensity-score matching focuses on sharing a similar estimated probability of having been in the treatment group,

Footnotes:
73 For good introductions, see Ho et al. (2004); Morgan and Harding (2005); Rosenbaum (2004); and Rosenbaum and Silber (2001). For a discussion of matching procedures in Stata, see Abadie et al. (2001).
74 Rosenbaum and Rubin (1985); Rosenbaum (2004).
75 Hahn (1998).
76 Ho et al. (2004); Imai (2005).
77 However, matching is clearly inferior to a well-designed and well-executed randomized experiment. The benefits of matching extend only so far as equivalence on the variables explicitly included and any unmeasured variables that fortuitously happen to be similar across cases. By contrast, randomization handles all unmeasured variables.
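The exact-matching procedure described above can be sketched in plain Python. This is an illustrative sketch only: the cases, covariate names, and outcome values are invented, and the function name exact_matching_effect is hypothetical.

```python
from collections import defaultdict


def exact_matching_effect(cases, treatment, covariates, outcome):
    """Average treated-minus-matched-control difference under exact matching.

    Returns (estimate, number of treated cases with at least one exact match).
    """
    # Index control-group outcomes by their full covariate profile.
    controls = defaultdict(list)
    for case in cases:
        if case[treatment] == 0:
            controls[tuple(case[v] for v in covariates)].append(case[outcome])

    differences = []
    for case in cases:
        if case[treatment] == 1:
            matches = controls.get(tuple(case[v] for v in covariates))
            if matches:  # treated cases without an exact match are dropped
                differences.append(case[outcome] - sum(matches) / len(matches))

    estimate = sum(differences) / len(differences) if differences else None
    return estimate, len(differences)


# Invented data: "treated" plays the role of X1, the covariates the role of X2.
CASES = [
    {"id": "T1", "treated": 1, "region": "A", "federal": 1, "outcome": 7.0},
    {"id": "T2", "treated": 1, "region": "B", "federal": 0, "outcome": 5.0},
    {"id": "T3", "treated": 1, "region": "C", "federal": 1, "outcome": 9.0},
    {"id": "C1", "treated": 0, "region": "A", "federal": 1, "outcome": 4.0},
    {"id": "C2", "treated": 0, "region": "A", "federal": 1, "outcome": 6.0},
    {"id": "C3", "treated": 0, "region": "B", "federal": 0, "outcome": 4.0},
]

effect, n_matched = exact_matching_effect(CASES, "treated", ["region", "federal"], "outcome")
print(effect, n_matched)
```

Here "T3" has no exact match and drops out, which illustrates why exact matching tends to fail as covariates multiply or become continuous. An approximate-matching variant would replace the equality test on covariate profiles with a distance threshold, whose width is the arbitrary "close enough" the text mentions.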

conditional on the matching variables. In other words, when looking for a match for a specific case in the treatment group, researchers look for cases in the control group that - before the score on the independent variable is known - would have been as likely to be in the treatment group as actually chosen cases. This is accomplished by a two-stage analysis, the first stage of which approaches the key independent variable, X1, as a dependent variable and the matching variables as independent variables. (This is similar in spirit to selection models, where a two-stage approach to causal inference is adopted.) Once this model has been estimated, the coefficient estimates are disregarded. Instead, the second stage of the analysis employs the fitted values for each case, which tell us the probability of that case being assigned to the treatment group, conditional on its scores on the matching variables. These fitted values are referred to as propensity scores. The final step in the process is to choose matches for each case in the treatment group. This is accomplished by selecting cases from the control group with similar propensity scores.
The end result of this procedure is a set of matched cases that can be compared in whatever way the researcher deems appropriate. These are the "most-similar" cases, returning to the qualitative terminology. Rosenbaum and Silber summarize the results of recent medical studies:

Unlike model-based adjustments, where patients vanish and are replaced by the coefficients of a model, in matching, ostensibly comparable patterns are compared directly, one by one. Modern matching methods involve statistical modeling and combinatorial algorithms, but the end result is a collection of pairs or sets of people who look comparable, at least on average. In matching, people retain their integrity as people, so they can be examined and their stories can be told individually.78

Matching, conclude the authors, "facilitates, rather than inhibits, thick description."79
Indeed, the same matching techniques that have been used successfully in observational studies of medical treatments might also be adapted to the study of nation-states, political parties, cities, or indeed any paired cases in the social sciences. Suppose that, in order to study the relationship between wealth and democracy, the researcher wishes to select a case that is as similar as possible to Costa Rica in background variables, while being as different as possible on per capita GDP, the variable of theoretical interest, and the outcome of interest, democracy.
In order to select most-similar cases for the study of the relationship between wealth and democracy, one must arrive at a statistical model of the causes of a country's wealth. Obviously, such a proposition is complex. Since this is an illustrative example, we shall be satisfied with a cartoon model that includes only a few independent variables. A country's wealth will be assumed to be a function of the origin of its legal system (measured by dummy variables for English legal heritage, French legal heritage, socialist legal heritage, German legal heritage, and Scandinavian legal heritage) and its geographic endowments (measured by the distance of each country's capital city from the equator).
The first step in selecting most-similar cases is to run a nonparametric regression with these independent variables and logged per capita GDP (the independent variable of theoretical interest) as the dependent variable. The fitted values from this regression serve as propensity scores, and cases with similar propensity scores are interpreted as matching. The propensity score for our focus case, Costa Rica, is 7.63. Examining the propensity-score data, one sees that Benin has a propensity score of 7.58 - quite similar to Costa Rica's. At the same time, Benin's per capita GDP of $1,163 is substantially different from Costa Rica's per capita GDP of $5,486, as are their democracy scores in 1995 (Benin is much less democratic than Costa Rica). Hence, Costa Rica and Benin may be viewed as most-similar cases for testing the relationship between wealth and democracy, as illustrated in Table 5.5. An in-depth analysis of these two cases may shed light on the causal pathways between economic development and democracy. Indeed, these two cases are probably more informative than other two-case comparisons precisely because the case-selection procedure has identified countries whose other attributes are roughly equal in their propensity to democracy/authoritarianism. This means that the differences on the variable of theoretical interest (GDP per capita) and the outcome (democracy) can be given a causal interpretation - an interpretation that would probably not be suggested by a qualitative assessment of these two countries (which are quite different in culture, region, and historical experience).
It is important to keep in mind that the quality of the "match" depends entirely on the quality of the statistical model used to generate the propensity scores. A superficial model like the one used here may produce rather superficial matches. Yet, in a large-N context - where dozens, if not thousands, of cases vie for inclusion - a formal approach to case selection offers significant advantages. At the very least, one's assumptions are rendered transparent.

78 Rosenbaum and Silber (2001: 223).
79 Ibid.
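The two-stage logic of the worked example can be sketched in code. What follows is a minimal illustration in Python, not the procedure actually used for the Costa Rica and Benin comparison: it assumes ordinary least squares in place of a nonparametric regression, and it assumes a hypothetical tolerance (here called a caliper) for deciding when two propensity scores count as similar. The function names are invented for illustration. In the usage lines, the scores and GDP figures for Costa Rica and Benin are those reported in the text; the two other cases are invented.

```python
import numpy as np


def propensity_scores(matching_vars, x1):
    """Stage 1: regress x1 on the matching variables and return fitted values.

    OLS is an assumption made for brevity; the text describes a nonparametric
    regression. The coefficient estimates themselves are discarded.
    """
    z = np.asarray(matching_vars, dtype=float)
    target = np.asarray(x1, dtype=float)
    design = np.column_stack([np.ones(len(z)), z])  # intercept + matching variables
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return design @ coef


def most_similar_match(focus, scores, x1, caliper=0.1):
    """Stage 2: pick a partner for the focus case.

    Among cases whose propensity score lies within `caliper` (a hypothetical
    tolerance) of the focus case's score, return the index of the case whose
    value on x1 differs most from the focus case. Return None if no case
    falls within the caliper.
    """
    scores = np.asarray(scores, dtype=float)
    values = np.asarray(x1, dtype=float)
    gaps = np.abs(scores - scores[focus])
    candidates = [i for i in range(len(scores)) if i != focus and gaps[i] <= caliper]
    if not candidates:
        return None
    return max(candidates, key=lambda i: abs(values[i] - values[focus]))


# Usage: scores and GDP for Costa Rica and Benin are taken from the text;
# "Hypothetical C" and "Hypothetical D" are invented cases.
names = ["Costa Rica", "Benin", "Hypothetical C", "Hypothetical D"]
scores = [7.63, 7.58, 6.00, 7.60]
log_gdp = np.log([5486.0, 1163.0, 5000.0, 5200.0])
match = most_similar_match(0, scores, log_gdp)
print(names[0], "is paired with", names[match])
```

With these inputs the sketch pairs Costa Rica with Benin: both Benin and the invented case D have similar propensity scores, but Benin differs far more on logged GDP. As the text cautions, a match of this kind is only as good as the model that generates the scores.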

TABLE 5.5. Paired cases resulting from matching procedure

Cases        GDP per capita   Propensity score   Democracy
Benin        $1,163           7.58               [not legible]
Costa Rica   $5,486           7.63               [not legible]

Conclusion
The most-similar method is one of the oldest recognized techniques of qualitative analysis, harking back to J. S. Mill's classic study, System of Logic (first published in 1843). By contrast, matching statistics are a relatively new technique in the arsenal of the social sciences, and have rarely been employed for the purpose of selecting cases for in-depth analysis. Yet, as suggested in the foregoing discussion, there may be a fruitful interchange between the two approaches. Indeed, the current popularity of matching among statisticians - relative, that is, to garden-variety regression models - rests upon what qualitative researchers would recognize as a "case-based" approach to causal analysis. If Rosenbaum and Silber are correct, it may be perfectly reasonable to appropriate this large-N method of analysis for case study purposes.
To be sure, the purpose of a case study is somewhat different in situations where a large-N cross-case analysis has already been conducted. Here, the general causal relationship is usually clear. We know from our cross-case study that GDP per capita is strongly associated with democracy; there is a strong presumption of causality. Of course, the case study analysis may give us reasons to doubt. Perhaps the causal pathways from economic development to regime type are difficult to identify. Perhaps the presumed causal pathways, as identified by previous research or theoretical hunch, are simply not in evidence. Even so, the usual purpose of a case study analysis in this setting is to corroborate an initial cross-case finding.
By contrast, if there is no prior cross-case investigation - at least none of a formal nature - the case study performs a somewhat different role. Here, we will be more interested in the covariational patterns that are discovered between X1 and Y. Thus, Epstein's study of American and Canadian political parties is notable for its principal finding: that the underlying cause of party cohesion is to be found in the structure of the executive (parliamentary/presidential). Indeed, Epstein spends relatively little time in this article discussing possible causal mechanisms; his principal focus is on "scoring" the relevant variables, as discussed. By the same token, if Epstein had already conducted a large-N cross-case analysis prior to his case study, and if this cross-case analysis had revealed a strong pattern between executive type and party cohesion, his two-case analysis of the United States and Canada (cases that we presume would have very similar propensity scores) would now serve a rather different purpose. Evidently, the function of the most-similar case study shifts subtly but importantly when the case-selection procedure is, itself, a mode of analysis, offering strong prima facie evidence of a causal relationship.
As with other methods of case selection, the most-similar method is prone to problems of non-representativeness. If this technique is employed in a qualitative fashion (without a systematic cross-case selection strategy), potential biases in the chosen cases must be addressed in a speculative way. If the researcher employs a matching technique of case selection within a large-N sample, the problem of potential bias can be addressed by assuring a choice of cases that are not extreme outliers, as judged by their residuals in the full model. Most-similar cases should also be "typical" cases, though some scope for deviance around the regression line may be acceptable for purposes of finding a good fit among cases.

Most-Different Cases
A final case-selection method is the reverse image of the previous method. Here, variation on independent variables is prized, while variation on the outcome is eschewed. Rather than looking for cases that are most-similar, one looks for cases that are most-different. Specifically, the researcher tries to identify cases where just one independent variable (X1), as well as the dependent variable (Y), covary, while all other plausible factors (X2a-d) show different values.80

80 The most-different method is sometimes referred to as the "method of agreement," following its inventor, J. S. Mill (1843/1872). See also DeFelice (1986); Gerring (2001: [pages illegible]); Lijphart (1971, 1975); Meckstroth (1975); Przeworski and Teune (1970); and Skocpol and Somers (1980). For examples of this method, see Collier and Collier (1991/2002); Converse and Dupeux (1962); Karl (1997); Moore (1966); Skocpol (1979); and Yashar (2005: 23). However, most of these studies are described as combining most-similar and most-different methods.
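The selection rule just stated for most-different cases can be made concrete with a small sketch. The Python below is an illustration under stated assumptions, not a procedure taken from the text: it assumes every variable has already been dichotomized (coded 0 or 1), and the function name, the case labels, and the scores are all hypothetical.

```python
from itertools import combinations


def most_different_pairs(cases, x1, y, controls):
    """Return pairs of cases that agree on x1 and y but differ on every control.

    `cases` maps a case label to a dict of dichotomized (0/1) scores.
    """
    pairs = []
    for a, b in combinations(sorted(cases), 2):
        ca, cb = cases[a], cases[b]
        same_core = ca[x1] == cb[x1] and ca[y] == cb[y]
        all_controls_differ = all(ca[c] != cb[c] for c in controls)
        if same_core and all_controls_differ:
            pairs.append((a, b))
    return pairs


# Hypothetical, dichotomized scores for four invented cases.
cases = {
    "A": {"X1": 1, "X2a": 1, "X2b": 0, "X2c": 1, "X2d": 0, "Y": 1},
    "B": {"X1": 1, "X2a": 0, "X2b": 1, "X2c": 0, "X2d": 1, "Y": 1},
    "C": {"X1": 1, "X2a": 1, "X2b": 1, "X2c": 0, "X2d": 1, "Y": 1},
    "D": {"X1": 0, "X2a": 0, "X2b": 1, "X2c": 0, "X2d": 1, "Y": 0},
}
controls = ["X2a", "X2b", "X2c", "X2d"]
pairs = most_different_pairs(cases, "X1", "Y", controls)
print(pairs)
```

Only cases A and B qualify here: they share X1 and Y and differ on all four controls. Requiring a difference on every control is the strictest reading of the rule; a looser variant could accept pairs that differ on most controls.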

TABLE 5.6. Most-different analysis with two cases
[Table body not recoverable from the source.]
X1 = the variable of theoretical interest. X2a-d = a vector of controls. Y = the outcome of interest.

The simplest form of this two-case comparison is illustrated in Table 5.6. Cases A and B are deemed "most-different," though they are similar in two essential respects - the causal variable of interest and the outcome.
As an example, I follow Marc Howard's recent work, which explores the enduring impact of communism on civil society.81 Cross-national surveys show a strong correlation between former communist regimes and low social capital, controlling for a variety of possible confounders. It is a strong result. Howard wonders why this relationship is so strong and why it persists, and perhaps even strengthens, in countries that are no longer socialist or authoritarian. In order to answer this question, he focuses on two most-different cases, Russia and East Germany. These two countries were quite different - in all ways other than their communist experience - prior to the Soviet era, during the Soviet era, and in the post-Soviet era, as East Germany was absorbed into West Germany. Yet they both score near the bottom of various cross-national indices intended to measure the prevalence of civic engagement in the current era. Thus, Howard's case selection procedure meets the requirements of the most-different research design: variance is found on all (or most) dimensions aside from the key factor of interest (communism) and the outcome (civic engagement).82
What leverage is brought to the analysis by this approach? Howard's case studies combine evidence drawn from mass surveys and from in-depth interviews of small, stratified samples of Russians and East Germans. (This is a good illustration, incidentally, of how quantitative and qualitative evidence can be fruitfully combined in the intensive study of several cases.) The product of this analysis is the identification of three causal pathways that, Howard claims, help to explain the laggard status of civil society in post-communist polities: "the mistrust of communist organizations, the persistence of friendship networks, and the disappointment with post-communism."83 Simply put, Howard concludes, "a great number of citizens in Russia and Eastern Germany feel a strong and lingering sense of distrust of any kind of public organization, a general satisfaction with their own personal networks (accompanied by a sense of deteriorating relations within society overall), and disappointment in the developments of post-communism."84
Results obtained from the analysis of East Germany and Russia are presumed to apply in other post-communist polities (e.g., Lithuania, Poland, Bulgaria, Albania). Indeed, by choosing a heterogenous sample, Howard solves potential problems of representativeness in his restricted sample. However, this sample is not representative across the entire population of the inference, which is intended to cover all countries, not just communist ones. (To argue that communism impedes the development of civil society is to imply that noncommunism stimulates the development of civil society. The chosen sample is truncated [censored] on the dependent variable.)
Equally problematic is the lack of variation on key causal factors of interest - communism and its putative causal pathways. For this reason, it is generally difficult to reach conclusions about the causal status of these factors on the basis of the most-different analysis alone. It is possible, that is, that the three causal pathways identified by Howard also operate within polities that have never experienced communist rule. If so, they are not properly regarded as causal.
Nor does it seem possible to conclusively eliminate rival hypotheses on the basis of this most-different analysis. Indeed, this is not Howard's intention. He wishes merely to show that whatever influence on civil society might be attributed to economic, cultural, and other factors does not exhaust this subject.

81 Howard (2003). In the following discussion I treat the terms "social capital," "civil society," and "civic engagement" interchangeably.
82 Howard (2003: [pages illegible]).
83 Ibid., [page illegible].
84 Ibid., [page illegible].

My considered judgment, based on the foregoing methodological dilemmas, is that the most-different research design provides only minimal insight into the problem of why communist systems appear to suppress civic engagement, years after their disappearance. Fortunately, this is not the only research design employed by Howard in his admirable study. Indeed, the author employs two other small-N cross-case methods, as well as a large-N cross-country statistical analysis. In my opinion, these methods do most of the analytic work. East Germany may be regarded

as a causal-pathway case (as discussed earlier). It has all the attributes normally assumed to foster civic engagement (e.g., a growing economy, multiparty competition, civil liberties, a free press, close association with Western European culture and politics), but nonetheless shows little or no improvement on this dimension during the post-transition era.85 It is plausible to attribute this lack of change to its communist past, as Howard does. The contrast between East and West Germany provides a most-similar analysis, since the two polities share virtually everything except a communist past. This variation is also deftly exploited by Howard. In short, Howard's conclusions are justifiable, but not on the basis of most-different analysis.
I do not wish to dismiss the most-different research method entirely. Surely, Howard's findings are stronger with the intensive analysis of Russia than they would be without. Yet if one strips away the pathway case (East Germany) and the most-similar analysis (East/West Germany), there is little left upon which to base an analysis of causal relations (aside from the large-N cross-national analysis). Indeed, most scholars who employ the most-different method do so in conjunction with other methods.86 It is rarely, if ever, a stand-alone method.87

Conclusion
Generalizing from this discussion of Marc Howard's work, I offer the following summary remarks on the most-different method of case analysis. (I leave aside issues faced by all case study analyses, issues that formed the basis of Chapter Three.)

85 Ibid., 8.
86 See, e.g., Collier and Collier (1991/2002); Karl (1997); Moore (1966); Skocpol (1979); and Yashar (2005: 23). Karl (1997), which affects to be a most-different system analysis (20), is a particularly clear example of this. Her study, focused ostensibly on petro-states (states with large oil reserves), makes two sorts of inferences. The first concerns the (usually) obstructive role of oil in political and economic development. The second sort of inference concerns variation within the population of petro-states, showing that some countries (e.g., Norway, Indonesia) manage to avoid the pathologies brought on elsewhere by oil resources. When attempting to explain the constraining role of oil on petro-states, Karl usually relies on contrasts between petro-states and non-petro-states (e.g., Chapter 10). Only when attempting to explain differences among petro-states does she restrict her sample to petro-states. In my opinion, very little use is made of the most-different research design.
87 This was recognized, at least implicitly, by Mill (1843/1872: 258-9). Skepticism has been echoed by methodologists in the intervening years (e.g., Cohen and Nagel 1934: [pages illegible]; Gerring 2001: [pages illegible]; Skocpol and Somers 1980). Indeed, explicit defenses of the most-different method are rare (but see DeFelice 1986).

Let us begin with a methodological obstacle that is faced by both Millean styles of analysis - the necessity of dichotomizing every variable in the analysis. Recall that, as with most-similar analysis, differences across cases must be sizeable enough to be interpretable in an essentially dichotomous fashion (e.g., high/low, present/absent), and similarities must be close enough to be understood as essentially identical (e.g., high/high, present/present). Otherwise the results of a Millean-style analysis are not interpretable. The problem of "degrees" is deadly if the variables under consideration are by nature continuous (e.g., GDP). This is a particular concern in Howard's analysis, where East Germany scores somewhat higher than Russia in civic engagement; they are both low, though Russia is considerably lower. Howard assumes that this divergence is minimal enough to be understood as a difference of degree rather than of kind, a judgment that might be questioned. In these respects, most-different analysis is no more secure - but also no less - than most-similar analysis.
In one respect, most-different analysis is superior to most-similar analysis. If the coding assumptions are sound, the most-different research design may be useful for eliminating necessary causes. Causal factors that do not appear across the chosen cases - e.g., X2a-d in Table 5.6 - are evidently unnecessary for the production of Y. However, it does not follow that the most-different method is the best method for eliminating necessary causes. Note that the defining feature of this method is the shared element across cases - X1 in Table 5.6. This feature does not help one to eliminate necessary causes. Indeed, if one were focused solely on eliminating necessary causes, one would presumably seek out cases that register the same outcomes and have maximum diversity on other attributes. In Table 5.6, this would be a set of cases that satisfy conditions X2a-d, but not X1. Thus, even the presumed strength of the most-different analysis is not so strong.
Usually, case study analysis is focused on the identification (or clarification) of causal relations, not on the elimination of possible causes. In this setting, the most-different technique is useful, but only if assumptions of "causal uniqueness" hold. By this I mean a situation in which a given outcome is the product of only one cause: Y cannot occur except in the presence of X1. X1 is necessary, and in some situations (given certain background conditions) sufficient, to cause Y.88
Consider the following hypothetical example. Suppose that a new disease, about which little is known, has appeared in Country A. There are


hundreds of infected persons across dozens of affected communities in that country. In Country B, located at the other end of the world, several new cases of the disease surface in a single community. In this setting, we can imagine two sorts of Millean analyses. The first examines two similar communities within Country A, one of which has developed the disease and the other of which has not. This is the most-similar style of case comparison and focuses accordingly on the identification of a difference between the two cases that might account for variation across the sample. A second approach focuses on (highly dissimilar) communities where the disease has appeared across the two countries and searches for any similarities that might account for these similar outcomes. This is the most-different research design.
Both are plausible approaches to this particular problem, and we can imagine epidemiologists employing them simultaneously. However, the most-different design demands stronger assumptions about the underlying factors at work. It supposes that the disease arises from the same cause in any setting. This may be a reasonable operating assumption when one is dealing with certain natural phenomena like diseases. Even so, there are many exceptions. Death, for example, has many causes. For this reason, it would not occur to us to look for most-different cases of high mortality around the world. In order for the most-different research design to effectively identify a causal factor at work in a given outcome, the researcher must assume that X1 - the factor held constant across the diverse cases - is the only possible cause of Y (see Table 5.6). This assumption rarely holds in social scientific settings, for most outcomes of interest to anthropologists, economists, political scientists, and sociologists have multiple causes. There are many ways to win an election, to build a welfare state, to get into a war, to overthrow a government, or - returning to Marc Howard's work - to build a strong civil society. And it is for this reason that most-different analysis is rarely applied in social science work and, where applied, is rarely convincing.
If this seems a tad severe, there is a more charitable way of approaching the most-different method. Arguably, this is not a pure "method" at all but merely a supplement, a way of incorporating diversity in the subsample of cases that provide the unusual outcome of interest. If the unusual outcome is revolution, one might wish to encompass a wide variety of revolutions in one's analysis. If the unusual outcome is post-communist civil society, it seems appropriate to include a diverse set of post-communist polities in one's sample of case studies, as Marc Howard does. From this perspective, the most-different method (so-called) might be better labeled a diverse-case method, as explored earlier.

Conclusion
In order to be a case of something broader than itself, the chosen case must be representative (in some respects) of a larger population. Otherwise - if it is purely idiosyncratic ("unique") - it is uninformative about anything other than itself. A study based on a nonrepresentative sample has no (or very little) external validity. To be sure, no phenomenon is purely idiosyncratic; the notion of a unique case is a matter that would be difficult to define. One is concerned, as always, with matters of degree. Cases are more or less representative of some broader phenomenon and, on that score, may be considered better or worse subjects for intensive analysis. (The one exception, as noted, is the influential case.)
Of all the problems besetting case study analysis, perhaps the most persistent - and the most persistently bemoaned - is the problem of sample bias. Lisa Martin finds that the overemphasis of international relations scholars on a few well-known cases of economic sanctions - most of which failed to elicit any change in the sanctioned country - "has distorted analysts' view of the dynamics and characteristics of economic sanctions."90 Barbara Geddes charges that many analyses of industrial policy have focused exclusively on the most successful cases - primarily the East Asian NICs - leading to biased inferences.91 Anna Breman and Carolyn Shelton show that case study work on the question of structural adjustment is systematically biased insofar as researchers tend to focus on disaster cases - those where structural adjustment is associated with very poor health and human development outcomes. These cases, often located in sub-Saharan Africa, are by no means representative of the entire population. Consequently, scholarship on the question of structural adjustment is highly skewed in a particular ideological direction (against neoliberalism).92

These examples might be multiplied many times. Indeed, for many topics the most-studied cases are acknowledged to be less than representative. It is worth reflecting upon the fact that our knowledge of the world is heavily colored by a few "big" (populous, rich, powerful) countries, and that a good portion of the disciplines of economics, political science, and sociology are built upon scholars' familiarity with the economics, political science, and sociology of one country, the United States.93 Case study work is particularly prone to problems of investigator bias because so much rides on the researcher's selection of one case (or a few cases). Even if the investigator is unbiased, her sample may still be biased simply by virtue of "random" error (which may be understood as measurement error, error in the data-generation process, or an underlying causal feature of the universe).
There are only two situations in which a case study researcher need not be concerned with the representativeness of her chosen case. The first is the influential-case research design, where a case is chosen because of its possible influence on a cross-case model, and hence is not expected to be representative of a larger sample. The second is the deviant-case method, where the chosen case is employed to confirm a broader cross-case argument to which the case stands as an apparent exception. Yet in the latter instance, the chosen case is expected to be representative of a broader set of cases - those, in particular, that are poorly explained by the extant model.
In all other circumstances, cases must be representative of the population of interest in whatever ways might be relevant to the proposition in question. Note that where a researcher is attempting to disconfirm a deterministic proposition, the question of representativeness is perhaps more appropriately understood as a question of classification: is the chosen case appropriately classified as a member of the designated population? If so, then it is fodder for a disconfirming case study.
If the researcher is attempting to confirm a deterministic proposition, or to make probabilistic arguments about a causal relationship, then the problem of representativeness is of the more usual sort: is Case A unit-homogenous relative to other cases in the population? This is not an easy matter to test. However, in a large-N context the residual for that case (in whatever model the researcher has greatest confidence in) is a reasonable place to start. Of course, this test is only as good as the model at hand. Any incorrect specifications or incorrect modeling procedures will likely bias the results and give an incorrect assessment of each case's "typicality." In addition, there is the possibility of stochastic error, errors that cannot be modeled in a general framework. Given the explanatory weight that individual cases are asked to bear in a case study analysis, it is wise to consider more than just the residual test of representativeness. Deductive logic and an in-depth knowledge of the case in question are often more reliable tools than the results of a rather superficial cross-case model.
In any case, there is no dispensing with the question. Case studies (with the two exceptions already noted) rest upon an assumed synecdoche: the case should stand for a population. If this is not true, or if there is reason to doubt this assumption, then the utility of the case study is brought severely into question.
Fortunately, there is some safety in numbers. Insofar as case study evidence is combined with cross-case evidence, the issue of sample bias is mitigated. Indeed, the skepticism about case study work that one commonly encounters in the social sciences today is, in my view, a product of a too-literal interpretation of the case study method. A case study tout court is thought to mean a case study tout seul. Insofar as case studies and cross-case studies can be enlisted within the same investigation (either in the same study or by reference to other studies of the same subject), problems of representativeness are less worrisome. This is the virtue of cross-level work, a.k.a. "triangulation."

93 Wahlke (1979: 13) writes of the failings of the "behavioralist" mode of political science analysis. "It rarely aims at generalization; research efforts have been confined essentially to case studies of single political systems, most of them dealing with the American system."

Ambiguities
Before concluding, I wish to draw attention to two ambiguities in case-selection strategies for case study research. The first concerns the admixture of several case-selection strategies. The second concerns the changing status of a case as a study proceeds.
Some case studies follow only one strategy of case selection. They are typical, diverse, extreme, deviant, influential, crucial, pathway, most-similar, or most-different research designs, as discussed. However, many case studies mix and match among these case-selection strategies. Indeed, insofar as all case studies seek representative samples, they are all in search of "typical" cases. Thus, it is common for writers to declare that their case is, for example, both extreme and typical; it has an extreme value on X1 or Y but is not, in other respects, idiosyncratic. There is not much that

one can say about these combinations of strategies except that, where the cases allow for a variety of empirical strategies, there is no reason not to pursue them. And where the same case legitimately serves several functions at once (without further effort on the researcher's part), there is little cost to a multipronged approach to case analysis.

The second issue that deserves emphasis is the changing status of a case during the course of a researcher's investigation - which may last for years, if not decades. The problem is particularly acute when a researcher begins in an exploratory mode and then proceeds to hypothesis testing (that is, she develops a specific X1/Y proposition), or when the operative hypothesis or key control variable changes (a new causal factor is discovered or another outcome becomes the focus of analysis). Things change. And it is the mark of a good researcher to keep her mind open to new evidence and new insights. Too often, methodological discussions give the misleading impression that hypotheses are clear and remain fixed over the course of a study's development. Nothing could be further from the truth. The unofficial transcripts of academia - accessible in informal settings, where researchers let their guards down (particularly if inebriated) - are filled with stories about dead ends, unexpected findings, and drastically revised theory chapters. It would be interesting, in this vein, to compare published work with dissertation prospectuses and fellowship applications. I doubt that the correlation between these two stages of research is particularly strong.

Research, after all, is about discovery, not simply the verification or falsification of existing hypotheses. That said, it is also true that research on a particular topic should move from hypothesis generating to hypothesis testing. This marks the progress of a field, and of a scholar's own work. As a rule, research that begins with an open-ended (X- or Y-centered) analysis should conclude with a determinate X1/Y hypothesis.

The problem is that research strategies that are ideal for exploration are not always ideal for confirmation. I discussed this trade-off in Chapter Three as it pertains to the cross-case/case study dilemma. It also applies to various methods of case study analysis, as presented in this chapter. The extreme-case method is inherently exploratory, since there is no clear causal hypothesis; the researcher is concerned merely to explore variation on a single dimension (X1 or Y). Other methods can be employed in either an open-ended (exploratory) or a hypothesis-testing (confirmatory/disconfirmatory) mode. The difficulty is that once the researcher has arrived at a determinate hypothesis, the originally chosen research design may no longer be so well constructed.

This is unfortunate, but inevitable. One cannot construct the perfect research design until (a) one has a specific hypothesis and (b) one is reasonably certain about what one is going to find "out there" in the empirical world. This is particularly true of observational research designs, but it also applies to many experimental research designs: usually, there is a "good" (informative) finding, and a finding that is less insightful. In short, the perfect case study research design is usually apparent only ex post facto.

There are three ways to handle this. One can explain, straightforwardly, that the initial research was undertaken in an exploratory fashion, and therefore not constructed to test the specific hypothesis that is - now - the primary argument. Alternatively, one can try to redesign the study after the new (or revised) hypothesis has been formulated. This may require additional field research or perhaps the integration of additional cases or variables that can be obtained through secondary sources or consultation of experts. A final approach is to simply jettison, or deemphasize, that portion of the research that no longer addresses the (revised) key hypothesis. A three-case study may become a two-case study, and so forth. Lost time and effort are the costs of this downsizing.

In the event, practical considerations will probably determine which of these three strategies, or combinations of strategies, is to be followed. The point to remember is that revision of one's cross-case research design is normal and to be expected. Not all twists and turns on the meandering trail of truth can be anticipated.

Are There Other Methods of Case Selection?

At the outset of this chapter, I summarized the task of case selection as a matter of achieving two objectives: representativeness (typicality) and variation (causal leverage). Evidently, there are other objectives as well. For example, one wishes to identify cases that are independent of each other. If chosen cases are affected by each other, the problem (sometimes known as Galton's problem or a problem of diffusion) must be corrected before analysis can take place. I have neglected this issue because it is usually apparent to the researcher and, in any case, there are no easy techniques that might be utilized to correct for such biases.94

94 [...] factors impinging upon case selection, see Gerring [...]

I have also disregarded pragmatic/logistical issues that might affect case selection. Evidently, case selection is often influenced by a researcher's
familiarity with the language of a country, a personal entrée into that locale, special access to important data, or funding that covers one archive rather than another. Pragmatic considerations are often - and quite rightly - decisive in the case-selection process.

A final consideration concerns the theoretical prominence of a particular case within the literature on a subject. Researchers are sometimes obliged to study cases that have received extensive attention in previous studies. These are sometimes referred to as "paradigmatic" cases or "exemplars."95

However, neither pragmatic/logistical utility nor theoretical prominence qualifies as a methodological factor in case selection. That is, these features of a case have no bearing on the validity of the findings stemming from a study. As such, it is appropriate to grant these issues a peripheral status in this chapter, as I have elsewhere in the book.

One final caveat must be issued. While it is traditional to make a distinction between the tasks of case selection and case analysis, a close look at these processes reveals them to be indistinct and overlapping. One cannot choose a case without considering the sort of analysis that it might be subjected to, and vice versa. Thus, the reader should consider choosing cases by employing the nine techniques laid out in this chapter along with any considerations that might be introduced by virtue of a case's quasi-experimental qualities (Chapter Six) and its potential for process tracing (Chapter Seven), subjects to which we now turn.

95 Flyvbjerg (2004: 427).
