Cognitive Neuropsychology
Publication details, including instructions for authors and subscription
information:
http://www.tandfonline.com/loi/pcgn20
To cite this article: Simon Fischer-Baum (2013) Making sense of deviance: Identifying dissociating
cases within the case series approach, Cognitive Neuropsychology, 30:7-8, 597-617, DOI:
10.1080/02643294.2013.846903
Cognitive Neuropsychology, 2013
Vol. 30, Nos. 7–8, 597–617, http://dx.doi.org/10.1080/02643294.2013.846903
The case series approach in cognitive neuropsychology provides a means to test theories that make
quantitative predictions about associations between different components of the cognitive system
[Schwartz, M. F., & Dell, G. S. (2010). Case series investigations in cognitive neuropsychology.
Downloaded by [University of Waterloo] at 14:33 11 October 2014
Cognitive Neuropsychology, 27, 477–494]. However, even when the predicted association is borne
out, the study may include outliers: observations that deviate significantly from the rest of the
data. These outliers may reveal individual cases whose cognitive impairments dissociate from
other cases included in the study. These dissociating cases can pose a significant challenge to
the theory being tested. Using a recent case series that investigated the underlying causes of
letter perseveration in spelling [Fischer-Baum, S., & Rapp, B. (2012). Underlying cause(s) of
letter perseveration errors. Neuropsychologia, 50, 305–318], I discuss statistical and theoretical
issues that arise when using outlier detection techniques to identify dissociating cases in a case
series study.
A common critique of the single-case approach to neuropsychology has been its focus on identifying
dissociations between cognitive functions at the expense of identifying how cognitive functions
are associated (Patterson & Plaut, 2009). Case series studies offer a complementary approach to
the single-case design, providing a clear means for testing theoretically motivated associations. A
typical case series study combines data from multiple behavioural measures across a large number
of individual cases and analyses how these measures covary across participants (Schwartz &
Dell, 2010; cf. Olson & Romani, 2011). To the extent that these different behavioural measures
tap into different cognitive functions, associations between these measures reflect associations
between the underlying cognitive functions. The relative merits of the case series methodology in
cognitive neuropsychology have been discussed extensively in this journal over the past several
Correspondence should be addressed to Simon Fischer-Baum, Department of Psychology, Rice University, MS-25, P.O. Box
1891, Houston, TX 77251, USA. (E-mail: sjf2@rice.edu).
I am grateful to Gary Dell, Matt Goldrick, Michael McCloskey, Fred Oswald, and Brenda Rapp for helping me work through
these issues of case series, outliers, and cognitive neuropsychology, and to David Kajander for reading through drafts. Portions of this
work were presented in a symposium on case series investigations at the 50th annual meeting of the Academy of Aphasia. The
remaining work was moulded by the discussion that took place at that meeting.
years (Bub, 2011; Dell & Schwartz, 2011; Goldrick, 2011; Lambon Ralph, Patterson, & Plaut, 2011;
Nickels, Howard, & Best, 2011; Olson & Romani, 2011; Rapp, 2011; Schwartz & Dell, 2010;
Shallice & Buiatti, 2011). Generally, these researchers agree that case series studies provide a
means to test certain types of hypotheses that are difficult to test with the single-case approach.
However, a number of these researchers noted the importance of evaluating not only the associations
that exist in the case series, but also whether each individual in the sample is a legitimate
function of the underlying pattern of association (Dell & Schwartz, 2011; Goldrick, 2011; Nickels
et al., 2011; Rapp, 2011; Schwartz & Dell, 2010; cf. Lambon Ralph et al., 2011). According to
Nickels et al. (2011, p. 481), the logic of cognitive neuropsychology dictates that even a single
individual who deviates significantly from the rest of the sample “may be a real example of a
dissociating pattern that can falsify a theory.”

The goal of this paper is to provide a framework for evaluating case series studies for these
dissociating cases. I discuss a recent case series on the underlying causes of letter perseverations
in spelling as a concrete example (Fischer-Baum & Rapp, 2012). I then propose a statistical method
that uses the recursive externally studentized residual test (Weisberg, 1985) as a tool for
detecting outliers, individual cases that deviate significantly from the association being tested in
the case series. This method is similar to a recent proposal by Crawford and Garthwaite (2006,
2007a, 2007b) for comparing a single neuropsychological case study to a population of unimpaired
control participants whose performance is characterized by a regression equation.

The second half of the paper explores this outlier detection method in more detail by applying it
to simulated data. Across six simulations I explore the effects of the number of participants, the
number of observations per participant, the frequency of dissociating cases in the population, the
placement of the decision criterion, and the role of follow-up testing on the ability to detect
dissociating cases in a case series study. By carefully analysing our case series for dissociating
cases, we can use case series data to develop theories that explain not only the associations that
are observed across large samples of brain-damaged individuals, but also the dissociations.

THE CASE SERIES APPROACH TO LETTER PERSEVERATIONS IN DYSGRAPHIA

Fischer-Baum and Rapp (2012) addressed the issue of dissociating cases in a case series in an
investigation of the underlying causes of letter perseveration errors in dysgraphia. Perseveration
errors are the inappropriate repetition of a previous response in the place of the current target.
The errors investigated in that paper were the perseveration of individual letters from a previous
written response into a subsequent response (e.g., the perseveration of the R in the error “edge”
spelled as ERGE, immediately following the response FRENCE). Two accounts for why these
perseveration errors were produced were considered: a “failure-to-inhibit” account, in which
perseverations arise because of a failure to inhibit previous responses (e.g., Hauser, 1999), and a
“failure-to-activate” account, in which perseveration errors arise because the current target
receives abnormally little activation (e.g., Cohen & Dehaene, 1998; Dell, Burger, & Svec, 1997).

To assess these two accounts, Fischer-Baum and Rapp (2012) analysed the spelling errors produced
by a series of dysgraphic individuals. Following work on verbal perseverations by Martin and Dell
(2004, 2007), they reasoned that a failure-to-activate deficit should lead to both perseverative
and nonperseverative intrusions, and they predicted an association between an individual’s rates
of producing these two error types. This predicted association was borne out across the case
series. Figure 1 reproduces the association between perseverative and nonperseverative intrusions
in the 12 individuals with acquired dysgraphia studied by Fischer-Baum and Rapp (2012).
method suited to determining whether individual data points lie surprisingly far from a fitted
regression line.1 This test requires the assumptions typical for linear regression models, namely
that the error from the fit of the regression line is normally distributed, and that errors are
mutually uncorrelated. Using this method, we can evaluate whether an individual’s rate of producing
perseveration errors differs significantly from the rate expected given that individual’s rate of
producing nonperseverative intrusions. Included in Appendix A and Appendix B are two Matlab
programs that can be used, in combination with Statistics Toolbox, to carry out this outlier
detection technique. Other statistical packages (including free packages like R) come equipped with
functions that calculate the externally studentized residual.

The basic logic of any outlier detection test is as follows: First, each point has to be evaluated
for how far it deviates from the other data points in the sample. Second, a decision criterion must
be set to evaluate whether the degree to which a data point deviates from the rest of the sample is
large enough to merit identification as an outlier. The externally studentized residual is a
measure of how far a single data point deviates from the rest of the sample. The residual of each
data point is simply how far that point falls from the point predicted by the regression line. For
the case series described above, the residual is a measure of how far an individual’s observed rate
of producing perseverations deviates from the rate that would be predicted given that individual’s
rate of nonperseverative intrusion. To say the residual is calculated externally means that when
evaluating how far the nth case deviates from a predicted point, that prediction is derived using
all of the data except for the nth case. In the example case series above, to determine the
residual of the apparent outlier, that individual’s observed rate of producing perseverations (25%)
is compared to a predicted rate of producing perseverations, derived from (a) that individual’s
observed rate of producing nonperseverative intrusions (4%) and (b) the regression line given the
other 11 individuals in the study. Finally, the residuals are studentized, or converted to a
t-statistic that takes into account the variance in residuals (calculated over all of the data
except for the nth case) as well as the leverage of the data point (i.e., the influence of the data
point on its fitted value; high leverage indicates that its residual matters more).2 The apparent
outlying case in Figure 1 has an externally studentized residual of 6.3. An externally studentized
residual is calculated for all individuals included in the case series; from the data shown in
Figure 1, an externally studentized residual is calculated for all 12 of the cases, with the other
11 cases ranging from −1.2 to 0.5.

The second stage of outlier detection is to define a decision criterion for determining whether a
single data point deviates enough from the rest of the sample to be identified as an outlier.
Because the externally studentized residuals have been converted to t-statistics, the decision
criterion can be set using the null hypothesis statistical testing approach;
1 Fischer-Baum and Rapp (2012) describe a Mahalanobis distance procedure to detect outliers following Penny (1996). However,
a further review of the statistics literature, and some simple simulations, has led me to the conclusion that the externally studentized
residual test would have been more appropriate with our data set. The Mahalanobis distance technique measures how far a given data
point is from the centre of mass of the whole data set, while the externally studentized residual test measures how far the data point is
from the regression line. Consider a data set that includes a data point with a much greater rate of perseverative and nonperseverative
intrusions than the rest of the individuals in the sample, but whose data point falls near the regression line. Given the theory being
tested, it would be incorrect to conclude that the individual represented by this data point suffers a qualitatively different deficit from
the others in the sample; instead, it appears that this individual has a quantitatively more extreme version of the same deficit. Yet,
because this data point is far from the centre of mass of the whole data set, it will be identified as an outlier by the Mahalanobis
distance technique. Because it falls near the regression line, it will not be identified as an outlier by the externally studentized residual
test.
2 Error is larger for more extreme values of the predictor variable, causing the studentized residual to be smaller. Therefore, a
larger difference between the observed and predicted values is needed for a point to be identified as an outlier at these extreme
values (see Crawford & Garthwaite, 2006, for further discussion of this point).
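For reference, one standard way to write the externally studentized residual for a simple linear regression on n cases is sketched below. The notation is an assumption of this sketch rather than the article's own: a subscript (i) marks a quantity computed with case i left out, so \hat{y}_{i(i)} is the prediction for case i from the line fitted to the other n − 1 cases, and s_{(i)} is the residual standard deviation of that fit.

```latex
t_i = \frac{y_i - \hat{y}_{i(i)}}
           {s_{(i)} \sqrt{1 + \dfrac{1}{n-1}
             + \dfrac{\left(x_i - \bar{x}_{(i)}\right)^2}
                     {\sum_{j \neq i} \left(x_j - \bar{x}_{(i)}\right)^2}}}
```

The term under the square root grows as x_i moves away from the mean of the predictor, which is why a larger gap between observed and predicted values is needed to flag a case at extreme predictor values.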
critical t-values can be identified such that we are more than 95% confident that an externally
studentized residual is different from zero. Because we need to independently determine whether
each of the 12 residuals is significantly different from zero, it is important to make some
correction for multiple comparisons. To address this problem, Weisberg (1985) suggests Bonferroni
correction. Most of the simulations described below follow Weisberg’s suggestion and use this
conservative decision criterion for identifying outliers, though the fifth simulation explores the
effect of relaxing the decision criterion on the effectiveness of outlier detection. For the
specific example above, because there are 12 values being evaluated, for a significance level of
.05, with a Bonferroni correction, the critical value is based on an alpha of .05/12 or .0042. The
degrees of freedom for the significance tests, in the case of simple linear regression, are
(n − 2), where n is the number of data points in the analysis. In the example case series, the
critical t-values for a two-tailed test with a p < .0042 and 10 degrees of freedom are +3.70 and
−3.70. Since the observed externally studentized residual for the apparent outlying case (6.3) is
more extreme than the critical value, the null hypothesis that this residual is no different from 0
can be rejected, and this data point is identified as an outlier.

Even though no other participant in the study has a residual value more extreme than the critical
value, it may be premature to conclude that none of these other individuals are outliers. The
problem of masking, in which one extreme outlier increases the overall variability of the dataset
and makes the detection of other outliers more difficult, is well known in the statistics
literature (Ben-Gal, 2005). To address this problem, I suggest recursive application of the
statistic described above until no more outliers are detected. Those outliers detected during a
single run of the statistical analysis should be deleted from the data set, and the statistical
analysis should be run again.3 In the case series example, the one outlying case is deleted from
the analysis, and externally studentized residuals are recalculated for the remaining 11 cases.
While these studentized residuals are now larger, none are significantly different from 0. The
application of the recursive externally studentized residual test identifies a single outlier in
the Fischer-Baum and Rapp (2012) case series.

If we can show that this outlier is generated by a different mechanism from the mechanism that
generated the data for the other 11 participants in the study (Freeman, 1980), then its very
presence poses a significant problem for the necessary-and-sufficient failure-to-activate
hypothesis. However, there remains a possibility that this outlier is a false alarm; the same
mechanism that generated the rest of the data generated this data point, and it is because of
random noise that this individual is identified as an outlier. With real data, identifying whether
an outlier is a hit or a false alarm is an ill-defined problem. We certainly cannot claim that an
outlier is a false alarm simply because we do not understand the deficit that is giving rise to
this pattern of performance. Simulation studies provide a clear opportunity to investigate the
problem of false alarms in outlier detection. In a simulation, the mechanism that generates the
data for all of the participants can be prespecified, so we know which individuals in the
simulation are truly dissociating cases and which are not. False alarms in simulation studies are
simply those individuals identified as outliers, despite not being one of the individuals
prespecified as a truly dissociating case.

How much we need to worry about false alarms depends on the statistical power of the outlier
detection technique. In part, the statistical power of the outlier detection technique depends on
the design of the case series investigation: how many participants are included in the study? How
much data are collected for each participant? Other aspects of the investigation outside of the
control of the experimenter, that is, the
3 One concern about this recursive application of the externally studentized residual test is that it will make false alarms more
common; once one or two nonoutliers are incorrectly labelled as outliers, the technique may start to identify false alarms on other
data points. The results of the simulation discussed below show that such false alarms are rare with this technique.
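The recursive procedure discussed in the text and in footnote 3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Matlab programs from the appendices: the function names and the rates are hypothetical, the p-value is obtained by numerically integrating the t density so that only the standard library is needed, and the degrees of freedom default to n − 2 as stated in the text (the `df_offset` parameter is exposed because other presentations of this test use n − 3).

```python
import math

def externally_studentized(x, y):
    """Externally studentized residual of each case in a simple linear regression."""
    n = len(x)
    values = []
    for i in range(n):
        # Fit the regression line to every case except case i.
        xs = [v for j, v in enumerate(x) if j != i]
        ys = [v for j, v in enumerate(y) if j != i]
        m = n - 1
        mean_x, mean_y = sum(xs) / m, sum(ys) / m
        sxx = sum((a - mean_x) ** 2 for a in xs)
        slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys)) / sxx
        intercept = mean_y - slope * mean_x
        sse = sum((b - intercept - slope * a) ** 2 for a, b in zip(xs, ys))
        s2 = sse / (m - 2)  # residual variance without case i (assumes an imperfect fit)
        # Prediction error for case i; the scale term grows with leverage.
        deleted = y[i] - (intercept + slope * x[i])
        scale = s2 * (1 + 1 / m + (x[i] - mean_x) ** 2 / sxx)
        values.append(deleted / math.sqrt(scale))
    return values

def two_tailed_p(t, df, step=0.01):
    """Two-tailed p-value of a t statistic, by Simpson integration of the t density."""
    upper = abs(t)
    if upper == 0.0:
        return 1.0
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(v):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(v * v / df))

    intervals = 2 * max(1, math.ceil(upper / (2 * step)))  # Simpson needs an even count
    h = upper / intervals
    total = pdf(0.0) + pdf(upper)
    for k in range(1, intervals):
        total += (4 if k % 2 else 2) * pdf(k * h)
    return max(0.0, 1.0 - 2.0 * total * h / 3)

def recursive_outliers(x, y, alpha=0.05, df_offset=2):
    """Indices flagged as outliers, re-testing until a pass finds none."""
    remaining = list(range(len(x)))
    flagged = []
    while len(remaining) >= 4:
        n = len(remaining)
        t_values = externally_studentized([x[i] for i in remaining],
                                          [y[i] for i in remaining])
        cutoff = alpha / n  # Bonferroni-corrected criterion for this pass
        hits = [remaining[k] for k, t in enumerate(t_values)
                if two_tailed_p(t, n - df_offset) < cutoff]
        if not hits:
            break
        flagged.extend(hits)
        remaining = [i for i in remaining if i not in hits]
    return flagged

# Hypothetical rates (%), NOT the data from Fischer-Baum and Rapp (2012).
nonperseverative = [2, 3, 5, 6, 8, 9, 11, 12, 14, 15, 17, 4]
perseverative = [1.2, 1.4, 2.7, 2.9, 4.2, 4.4, 5.6, 6.1, 7.1, 7.4, 8.6, 25.0]
print(recursive_outliers(nonperseverative, perseverative))
```

Comparing each p-value with alpha / n is equivalent to comparing the studentized residual with the Bonferroni-corrected critical t-value; the p-value form is used here only because it avoids inverting the t distribution.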
prevalence of the different underlying deficits in the population, will also affect the
statistical power of the outlier detection technique. These variables are explored in a series of
simulation studies below.

BENEFITS OF MORE PARTICIPANTS AND MORE DATA PER PARTICIPANTS

Case series researchers face a trade-off between including more participants in the study and
collecting more data from each participant. Fischer-Baum and Rapp (2012) collected hundreds of
spelling trials for each of the participants included in the case series. Because of the intensive
investigation of each individual in the study, it was infeasible to include many participants.
However, there may be statistical concerns with having too small a sample in a case series, as it
may increase sampling error variance. The goal of a case series study is to draw inferences from
our sample that generalize to the population of all individuals that meet the inclusion criteria
for the study. The importance of inclusion criteria has been raised in recent discussions of the
case series methodology (Dell & Schwartz, 2011; Olson & Romani, 2011; Shallice & Buiatti, 2011).
Regardless of inclusion criteria, it is unlikely that the entire population is being investigated
in the study. Even when a random and representative sample is drawn from the population of
interest, statistics will vary randomly from sample to sample. As the size of the sample increases,
the sampling error variance will decrease.

Researchers who include more participants in their study, in order for their results to better
generalize to the population as a whole, may collect fewer data from each participant. This
decision introduces an additional source of noise, measurement error variance. For each participant
in the sample, we are trying to measure some true underlying value (in my example, an individual’s
tendency to perseverate). But we can only measure that underlying value from the actual data
collected from each individual. As with sampling error variance, because our actual measurement is
based on a limited number of observations, these measures will vary randomly. Consider an
individual whose true underlying tendency is to produce a perseveration error on 10% of the trials.
If this individual was given only 10 trials of the task, how likely is it that she would make
exactly one perseveration error? Following the binomial distribution, the probability that she will
produce exactly one perseveration error (and therefore have an estimated perseverative tendency of
.1, matching her true underlying tendency) is roughly .4. The probability that she will not produce
a single perseveration error (and therefore have an estimated perseverative tendency of .0) is
about .35, and the probability that she will produce at least two perseverations in 10 trials (and
have an estimated perseverative tendency of at least .2) is about .26. Because only limited data
are being collected from this individual, our measure of the tendency to perseverate is going to
differ from the underlying value.

As with sampling error variance, one way to reduce measurement error variance is to collect more
observations from each individual in the sample. The effects of collecting more data are
illustrated in Figure 2. In Fischer-Baum and Rapp (2012), each participant was asked to spell at
least 300 words (sometimes many more), and each produced more than 1000 letters. The error bars in
Figure 2a show the bootstrapped 95% confidence intervals (on x- and y-values independently) around
each participant’s estimated values of the tendency to produce perseverative and nonperseverative
intrusions (Simon & Bruce, 2000); these confidence intervals indicate the range of true underlying
values that could plausibly yield the observed error rates given the actual number of letters
produced. The narrow error bars indicate that there is little measurement error variance in this
study; estimates of each individual’s tendency to produce perseverative and nonperseverative
intrusions appear to be quite precise. Figure 2b shows the bootstrapped 95% confidence interval
assuming that only 75 observations had been collected from each individual. The resulting error
bars are wide, indicating that there is a much
of observations per participant on hit rate (Figure 4a). With 20 observations per participant,
only 7% of the dissociating cases were correctly identified. However, when 1000 observations were
collected per participant, 71% of the dissociating cases were correctly identified. Again, the
false-alarm rate in these simulations was extremely low, consistently less than 0.5% across all
conditions (Figure 4b). There appears to be a slight cost, in terms of false alarms, of increasing
the number of observations, from approximately 0.2% of the truly associated individuals when 20
observations were collected per participant to about 0.4% when 1000 observations were collected per
participant. However, as shown in Figure 4c, this slight increase in false-alarm rate was
overwhelmed by the large increases in hit rate when d′ was calculated, with the d′ sensitivity of
the outlier detection technique around 1.4 with 20 observations per participant and 3.2 with 1000
observations per participant.

Simulations 1 and 2 demonstrate that both sampling and measurement error variance cause a decrease
in the sensitivity of the recursive externally studentized residual outlier detection technique.
The sensitivity of the technique is improved both by increasing the number of participants,
decreasing sampling error variance, and by increasing the number of observations per participant,
decreasing measurement error variance. Strikingly, the benefits of increasing the number of
participants and increasing the observations per participant appear to affect outlier detection
differentially. In Simulation 1, the effect of increasing the number of participants shows up in a
decrease in false-alarm rate, while in Simulation 2, the effect of increasing observations per
participant shows up primarily in an increase in hit rate. If one of the goals of the case series
investigation is to be able to detect dissociating cases, as many researchers have argued, there
are independent motivations for testing more participants and for collecting more data for each
participant. Of course, there are costs, in terms of both time and other resources, of running
studies that are too large. Finding additional participants for the study and bringing the same
participants back for multiple testing sessions can be time consuming. Simulation 3 looks at the
trade-off between including more participants in the study and collecting more data per
participant. With finite time and resources, is it better to collect more data from a smaller
sample, or include more participants but test each participant less?

Simulation 3: Trade-off between participants and observations per participant

Simulation 3 looks at the direct trade-off between including more participants in the study and
collecting more observations per participant. A new value, the total number of trials in the
study, is defined as the product of the number of participants and the observations per
participant. For example, if the study includes 50 participants, and 40 observations are collected
for each participant, then the study includes a total of 2000 trials. The same number of trials
could be distributed in different ways; an alternative 2000-trial study could have included 200
participants, with only 10 observations collected for each participant. Assuming an equal cost for
each trial, how should trials be distributed between the number of participants and the number of
observations per participant if the goal is to maximize the performance of outlier detection?

To address this question, I looked at eight levels of total numbers of trials: 1000, 2000, 3000,
4000, 6000, 8000, 12,000, and 20,000 trials per study. The proportion of dissociating cases was
held constant at 10%, and random data sets were created for different combinations of number of
participants and observations per participant, resulting in these numbers of trials. For example,
the 2000-trial simulations included random data sets generated with 10 participants and 200
observations per participant, 20 participants and 100 observations per participant, 40 and 50, 50
and 40, 80 and 25, 100 and 20, and, finally, 200 participants and 10 observations per participant.
The maximum number of participants was set to be 200, as studies including more participants than
that seem infeasible. For each of these number of participants and
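The d′ values reported for Simulation 2 can be reproduced from the quoted hit and false-alarm rates. The sketch below assumes the standard equal-variance signal detection definition, d′ = z(hit rate) − z(false-alarm rate); this section does not state the formula used, so that definition and the function name are assumptions.

```python
from statistics import NormalDist

def d_prime(hit_rate, false_alarm_rate):
    """Equal-variance signal detection sensitivity: z(hits) minus z(false alarms)."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(false_alarm_rate)

# Approximate rates quoted in the text for Simulation 2.
low_data = d_prime(0.07, 0.002)    # 20 observations per participant
high_data = d_prime(0.71, 0.004)   # 1000 observations per participant

print(f"d' with 20 observations:   {low_data:.1f}")
print(f"d' with 1000 observations: {high_data:.1f}")
```

Under this definition the doubling of the false-alarm rate barely moves d′, while the rise in hit rate from 7% to 71% accounts for nearly all of the gain in sensitivity.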
participant. Simulation 3 shows that the benefit of collecting more data per participant mostly
outweighs the benefits of including more participants.

Of course, identifying dissociating cases is not the only goal of a case series investigation.
Another goal is being able to identify associations that exist in the data. Simulation 1 shows that
small sample sizes have higher correlations between perseverative and nonperseverative intrusions
on average than larger sample sizes, but those correlations are less likely to be significant. With
small sample sizes, there is less power to detect the true association that exists for 90% of the
individuals in the sample. Given these two goals in a case series study, the simulations show that
the study should include a sample of at least 30 participants and at least 400 observations for
each individual, assuming a 90–10 split between associated and dissociating cases. In real case
series, there is no way to know how frequently the dissociating pattern will be observed.
Simulation 4 explores the effects of different proportions of dissociating cases in the sample.
However, before moving on to the next simulation, I consider the situation in which no outliers are
identified in the sample.

WHAT DOES IT MEAN IF NO OUTLIERS ARE DETECTED?

As Nickels et al. (2011) highlight, the presence of a single dissociating case in a case series has
the possibility of falsifying the theory being tested. But what if no such case is identified? Such
a situation appears to support the theory being tested; however, there are several issues that
arise from taking this failure to identify a dissociating case as strong evidence supporting the
hypothesis.

The first is that the outlier detection technique outlined above is a form of the null hypothesis
test. A failure to identify an outlier should have the same status as a failure to reject the null
hypothesis; it may be consistent with a theory but it does not constitute strong evidence in favour
of that theory. Indeed, even at its best, the outlier detection technique only identifies around
70% of dissociating cases. In Simulation 1, regardless of the number of participants in the study,
only 35% of the dissociating cases were detected, meaning that approximately twice as many
dissociating cases went undetected as were identified. In order to limit false alarms in the
outlier detection technique, many dissociating cases are missed, and these misses make interpreting
the null result difficult.

Including more participants in the study increases the certainty that the failure to find an
outlier means that no dissociating case exists in the sample. The simulations above stipulated that
90% of the individuals in the sample had the true underlying association, while the remaining 10%
were dissociating cases. This means that in a sample of 10 participants, only a single dissociating
case was present, while in a sample of 100 participants, 10 dissociating cases were present. While
the likelihood of missing an individual dissociating case was relatively high (65%) regardless of
the number of dissociating cases in the simulation, the likelihood of missing both outliers in a
20-participant study is lower than the likelihood of missing the single outlier in a 10-participant
study, and the likelihood of missing all 10 outliers in a 100-participant study is lower still. The
results of Simulation 1 can be reanalysed to look at the likelihood that no outliers were detected
in samples with different numbers of participants. The percentage of data sets with no outliers
detected fell drastically as the number of participants in the study increased, from 60.4% of the
studies with only 10 participants to 12.9% with 50 participants, 2.2% with 100 participants, and
0.1% with 200 participants.

The simulation stipulates not only that dissociating cases are present, but also that they are
relatively frequent, consisting of exactly 10% of the cases in the sample. If these dissociating
cases are more infrequent, say affecting only 2% of the cases in the sample, then even with large
samples it may be unlikely that even a single dissociating case will be identified. We have no way
of estimating the frequency with which certain rare patterns are observed, and, historically,
cognitive neuropsychology has not been concerned by the relative
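The argument that larger samples make it less likely that every dissociating case is missed can be illustrated with a back-of-the-envelope calculation. The sketch below makes a simplifying assumption that is mine and not the article's: each dissociating case is missed independently with the 65% probability quoted above, and false alarms are ignored. The simulated percentages reported in the text come from full data sets and therefore differ from these figures.

```python
def prob_all_missed(n_dissociating, miss_rate=0.65):
    """Chance that every dissociating case is missed, assuming independent misses."""
    return miss_rate ** n_dissociating

# 10% of each sample is stipulated to be dissociating cases, as in the simulations.
for n_participants in (10, 20, 50, 100, 200):
    n_dissociating = n_participants // 10
    chance = prob_all_missed(n_dissociating)
    print(f"{n_participants:>3} participants, {n_dissociating:>2} dissociating: "
          f"P(all missed) = {chance:.4f}")
```

Under this assumption the chance of missing every dissociating case drops from .65 with one such case to about .42 with two and to roughly .01 with ten, the same qualitative pattern as the reanalysis of Simulation 1.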
0      −.14    <0.1    —
10     −.09    0.2     0.3
20     −.03    1.6     0.5
30     .04     4.5     0.8
40     .12     14.1    1.4
50     .21     31.3    2.7
60     .31     57.6    6.8
70     .44     83.8    21.0
80     .58     97.0    48.9
90     .76     99.9    62.9
98     .93     100     64.1
100    .98     100     —

Figure 6. Percentage of data sets with a significant correlation between perseverative and
nonperseverative intrusions and without a single outlier detected as a function of the frequency of
dissociating cases in the sample (number of participants = 50, observations per participant = 500).
4
When there is no underlying association between perseverative and nonperseverative intrusions, the correlation is slightly negative; the rates of perseverative and nonperseverative intrusions are not entirely independent in this sample. The sum of perseverative and nonperseverative intrusion rates and the proportion of correct responses must be equal to 1.0. Therefore, if the perseverative error rate is very high (e.g., >50%) then the nonperseverative rate must be lower (e.g., <50%) and vice versa.
finding a significant correlation in a case series investigation does not mean that even the majority of individuals in the sample show the underlying association. Second, too many dissociating cases limit the ability to detect outliers.

This second issue is further illustrated in Figure 6. This figure plots the proportion of data sets in which (a) there was a significant correlation, and (b) no outliers were detected as a function of the percentage of the sample made up of associated cases. Such data sets are common both when dissociating cases are very rare (e.g., when 98% of the sample are associated cases, 30% of the data sets have a significant correlation without a single outlier detected) and when dissociating cases are reasonably common (e.g., when 60% of the sample are associated cases, 23% of all data sets have a significant correlation without a single outlier detected). The size of the correlation coefficient can be used to distinguish these two causes of a failure to detect an outlier. When dissociating cases are rare, correlations will be very strong, and when dissociating cases are common, correlations will be weak to moderate. In Table 1, the average correlation coefficient is .93 when 98% of the data are generated from associated cases and only .31 when 60% of the data are generated from associated cases. In a high-powered case series, if the predicted association has a moderate correlation coefficient, and no outliers are detected, it is more likely that no outliers are detected because the dissociating pattern is so common than because the dissociating pattern does not exist.

of increasing the number of misses. A fifth simulation explores the effects that shifting the decision criterion has on hits and false alarms in outlier detection.

Simulation 5: Varying the decision criterion

In this simulation, 10,000 data sets were created with 50 participants, 100 data points per participant and a 90–10 split between associated and dissociating cases. Each of the 10,000 data sets was evaluated for outliers using 11 different uncorrected alpha-values for statistical testing (.2, .1, .05, .02, .01, .005, .002, .001, .0005, .0002, .0001). Note that this set includes an uncorrected alpha value (.001) that is equivalent to a Bonferroni-corrected alpha value of .05 with 50 participants. An average false-alarm rate and hit rate were calculated for each of these different alpha-levels and were plotted as an ROC-curve (Figure 7). In addition, for each decision criterion, the proportion of the 10,000 data sets that contained at least one false alarm was calculated.

There is a clear cost, in terms of false alarms, when the decision criterion is made more liberal, that is, by not correcting for multiple comparisons. With an uncorrected alpha of .05, the rate of hits is relatively high (68%), but so is the rate of false alarms (20%). Because associated cases are nine times more common in each data set than dissociating cases, at this hit and false-alarm rate an individual identified as an outlier is nearly three times more likely to be a false alarm than a hit. When using this
uncorrected alpha, 99% of the data sets had at least one false alarm. Without correcting for multiple comparisons, it becomes difficult to say with any confidence whether truly dissociating cases exist in the data set.

In contrast, when a Bonferroni-corrected alpha of .05 is used as the decision criterion, the hit rate is much lower (35%), the false-alarm rate is lower still (<0.3%), and only 11% of all data sets contain at least one false alarm. With these hit rates and false-alarm rates, a data point detected as an outlier is 15 times more likely to be a truly dissociating case than a false alarm. In this paper, I have focused on outlier detection in case series investigations with the goal of identifying single cases who show the dissociating pattern; this goal is better served by setting a conservative decision criterion, as the cost of mistakenly rejecting the null hypothesis that all of the individuals suffer from the same underlying deficit is high. Incorrectly identified dissociations limit the impact of single-case cognitive neuropsychology. Less conservative corrections that still keep the familywise error rate below a specified alpha-level, like the Holm–Bonferroni correction

[Figure 7, caption fragment] the same sets of simulated data with outlier detection run at different alpha levels. Highlighted in the figure is the point that represents a Bonferroni-corrected alpha of .05 and an uncorrected alpha of .05 (number of participants = 50, observations per participant = 100, frequency of dissociating cases = 10%).

outlier detection with this goal in mind, but the decision criterion should be made less conservative. In these sorts of studies, it may be appropriate to use an uncorrected alpha for outlier detection.

THE ROLE OF FOLLOW-UP TESTING IN DISTINGUISHING MISSES AND FALSE ALARMS

As the simulations above illustrate, simply detecting an outlier in a case series is not enough to conclude that the theory being tested is wrong; there is always a possibility that the data point detected as an outlier is actually a false alarm. Nickels et al. (2011) point out the importance of follow-up testing to determine whether cases identified as outliers are truly dissociating cases, or simply appear to be dissociating cases due to some confounding factor. For these apparent dissociating cases, some theoretical account must be given to explain how the mechanism that is generating their responses differs from the mechanism generating the associations in the other cases in the series (see footnote 5). Follow-up testing, both for the cases detected as outliers and for the other cases in the sample, can be used to show that the case detected as an outlier truly has a different deficit from the other individuals in the sample (see Dell & Schwartz, 2011, for additional discussion of this point).
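The comparison between decision criteria in Simulation 5 can be checked with simple expected-count arithmetic. The sketch below uses only the hit and false-alarm rates quoted in the text; the variable names are illustrative, and the Bonferroni false-alarm rate is entered at its reported upper bound (0.3%), so the second ratio is a lower bound rather than the exact figure from the simulation.

```matlab
% Expected hits and false alarms per data set, using the rates reported
% for Simulation 5 (50 participants, 10% dissociating cases).
n_participants = 50;
n_dissociating = 0.10 * n_participants;            % 5 dissociating cases
n_associated   = n_participants - n_dissociating;  % 45 associated cases

% Uncorrected alpha of .05: hit rate 68%, false-alarm rate 20%.
hits_uncorrected = 0.68 * n_dissociating;          % expected hits
fas_uncorrected  = 0.20 * n_associated;            % expected false alarms
fa_to_hit_ratio  = fas_uncorrected / hits_uncorrected;

% Bonferroni-corrected alpha of .05: hit rate 35%, false-alarm rate < 0.3%.
% Only an upper bound on the false-alarm rate is reported, so this ratio
% is a lower bound on the number of hits per false alarm.
hits_corrected  = 0.35 * n_dissociating;
fas_corrected   = 0.003 * n_associated;
hit_to_fa_ratio = hits_corrected / fas_corrected;

fprintf('Uncorrected: %.2f false alarms per hit\n', fa_to_hit_ratio);
fprintf('Bonferroni:  at least %.1f hits per false alarm\n', hit_to_fa_ratio);
```

With these inputs the uncorrected criterion yields roughly 2.6 false alarms per hit ("nearly three times"), and the corrected criterion yields about 13 hits per false alarm at the upper-bound false-alarm rate, which is consistent with the reported figure of 15 if the true false-alarm rate is somewhat below 0.3%.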
5
Keep in mind, an outlier should not be discarded as a false alarm simply because the researcher cannot think of a theoretical
explanation for why that individual differs from the rest of the sample.
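The text mentions the Holm–Bonferroni correction as a less conservative alternative that still controls the familywise error rate. A minimal sketch of the standard step-down procedure follows; it is not taken from the paper, and the p-values are invented purely for illustration (in practice they would come from the per-participant residual tests).

```matlab
% Holm-Bonferroni step-down procedure on a vector of per-participant
% p-values (the p-values here are made up for illustration).
p_values = [0.0004 0.0300 0.0090 0.2500 0.0450 0.0020];
alpha = 0.05;
m = numel(p_values);

[sorted_p, order] = sort(p_values);          % ascending p-values
is_outlier = false(1, m);
for k = 1:m
    % The k-th smallest p-value is tested against alpha / (m - k + 1).
    if sorted_p(k) <= alpha / (m - k + 1)
        is_outlier(order(k)) = true;
    else
        break;                               % stop at the first failure
    end
end

% Plain Bonferroni for comparison: every test uses alpha / m.
is_outlier_bonferroni = p_values <= alpha / m;

disp(is_outlier);
disp(is_outlier_bonferroni);
```

In this toy example the third participant (p = .009) is flagged under Holm–Bonferroni but not under plain Bonferroni, which shows how the step-down thresholds recover some hits without relaxing the familywise alpha.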
Nickels et al. (2011) argue that the outlier needs to dissociate on follow-up testing from all of the others in the sample in order to demonstrate that it is a truly dissociating case. This argument ignores the fact that the outlier detection technique may have missed some truly dissociating cases. As the simulations show, the Bonferroni-corrected, externally studentized residual test is conservative, making the likelihood of missing truly dissociating cases surprisingly high. Follow-up testing can help to identify these misses as well. Despite not being detected as outliers, misses will pattern with the identified dissociating case on the follow-up testing rather than patterning with the rest of the sample. A sixth simulation examines the role of additional testing in distinguishing hits from false alarms and identifying misses.

Simulation 6: The role of additional testing

The sixth simulation took a constant number of participants (50), number of observations (100), and frequency of dissociating cases in the sample (10%). The outlier detection technique was applied to these simulated data sets. Follow-up testing was then simulated in the simplest possible manner: The same simulated participants, with the same underlying rates of producing perseverative and nonperseverative intrusions, were administered a second experiment with 500 observations. Outlier detection was then applied to these new data sets, and a computer program tabulated how many individuals identified as outliers in the first experiment were also identified as outliers in the second experiment.

In the first experiment, an average of 1.9 outliers were detected for each data set, with 1.8 being truly dissociating cases (hit rate: 35%) and 0.1 being associated cases (false-alarm rate: <0.3%). Outliers in Experiment 1 also tended to be outliers in Experiment 2. An average of 1.5 cases were identified as outliers in both experiments (77%), and nearly all of these cases were truly dissociating cases (>99.9%). Indeed, over 80% of the truly dissociating cases identified as an outlier in Experiment 1 were also identified as an outlier in Experiment 2, while this was true of only 1% of the associated cases. Being identified as an outlier in both experiments is an excellent indicator that the first result was a hit and not a false alarm; an individual that was an outlier in both experiments was more than 1000 times more likely to be a truly dissociating case than an associated case mistakenly identified as an outlier.

A second issue is whether follow-up testing can help distinguish misses from correct rejections. In the first experiment, an average of 48.1 individuals were not identified as outliers; of these, 44.9 were correctly rejected associated cases, and 3.2 were missed truly dissociating cases. Of these individuals, an average of 0.4 cases were identified as outliers in the second experiment (<1%). However, the majority of the cases that were detected as an outlier in Experiment 2, despite not being detected as an outlier in Experiment 1, were truly dissociating cases (75%). Being identified as an outlier in the second experiment, despite not being identified as an outlier in the first experiment, is a decent indication that the first result was a miss and not a correct rejection.

This simulation shows the role that additional testing can play in distinguishing hits and misses from false alarms and correct rejections. Outlier detection on a second experiment provides clear information about how to interpret the results of the first experiment. Individuals who are detected as outliers on both experiments are almost certainly truly dissociating cases, while individuals detected as outliers in the second experiment but not the first were probably misses. Nickels et al. (2011) and Schwartz and Dell (2010) argue that follow-up testing should be used to better understand the underlying deficit of the dissociating cases. This simulation shows the power that even simple retesting (a direct replication of the findings with the same individuals performing the same task) can have in identifying truly dissociating cases in a case series.

CONCLUSIONS

Case series investigations provide a clear tool for testing predicted associations between cognitive
functions in individuals with cognitive impairments. When testing for these associations, it is critical to additionally look for dissociating cases, individuals whose pattern of performance deviates from the predicted association, particularly for those studies in which a single dissociating case can falsify the theory being tested. Determining whether a case series includes individuals whose data are unpredicted from the underlying association (called outliers in the statistics literature) is a difficult statistical problem. This paper offers one formal approach to testing case series for outliers: the recursive externally studentized residual test. Through a series of simulation studies, I have identified some statistical concerns that arise when testing a case series for outliers. Briefly, it is important for case series investigations to include large numbers of participants. Increasing the number of participants decreases the likelihood of falsely identifying an individual as a dissociating case and increases confidence that the dissociating pattern is not attested when no outliers are detected. Furthermore, it is important in a case series investigation to collect many observations from each participant. Increasing the number of observations per participant has a sizeable effect on the likelihood of correctly detecting the truly dissociating cases in the sample. Finally, if outliers are detected, it is important to do follow-up testing, in order to determine whether the detected outlier is a truly dissociating case or a false alarm, as well as to identify other dissociating cases that may have been missed by the outlier detection technique in a single experiment.

While the Fischer-Baum and Rapp (2012) study of perseveration errors was used as a concrete example of how outlier detection might be applied to case series data, the methods proposed in this paper should be applicable to a wide range of case series studies. This method is clearly appropriate for detecting outliers in case series studies that use linear regression analyses, in which there is a clear prediction being tested about the causal effect of one of the variables on the other (e.g., the lower a patient's performance on test X, the worse their predicted performance on test Y). However, this method is not appropriate for studies in which the hypothesis being tested does not attribute a causal role to one of the variables and treats all of the variables as having equal status (see Crawford & Garthwaite, 2005, 2007a, 2007b; Crawford, Garthwaite, & Porter, 2010 for discussion of appropriate tests to employ in these conditions).

While the example discussed in this paper looks only at the relationship between two variables, the externally studentized residual test generalizes to multiple linear regression (Weisberg, 1985). Specifically, the method generalizes to cases in which multiple causal variables are used to make a prediction about a single dependent variable. As with the simple linear regression, the observed value on that dependent variable can be compared to the predicted value based on the other explanatory variables, by computing an externally studentized residual and comparing that value to some decision criterion (see footnote 6). Extending this technique to multiple regression is critical for case series investigations, as individuals in a large case series are likely to differ on many variables (e.g., age, education, speed of processing) that may be related to the measures being tested. The problem of covariates is beyond the scope of the present paper, but I refer the reader to Crawford, Garthwaite, and Ryan (2011) for a discussion of this issue, as well as methods for comparing a case study to a control population in the presence of covariates.

Following all of the directives from this paper makes for infeasibly large case series investigations; while running hundreds of participants on dozens of tests each composed of thousands of trials will result in the best data for case series investigations, a single lab has neither the time nor the money to carry out such a study. Simulation 3 offered some suggestions for how to balance some of these competing recommendations, advocating increasing the number of observations per participant at the expense of including more participants. An alternative perspective would be to make our case
6
The degrees of freedom for the generalized externally studentized residual test is n – p – 1, in which n is the total number of
data points, and p is the total number of parameters (2 in simple linear regression, >2 in multiple linear regression).
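To make the generalization to multiple regression concrete, the following sketch computes externally studentized residuals from the standard closed-form expression (residual, leverage, and leave-one-out variance) using base MATLAB only. The toy data, variable names, and the choice of two explanatory variables are assumptions made for illustration; they are not from the paper, and this is not the code used for the reported simulations.

```matlab
% Toy data: two explanatory variables and one dependent variable for
% eight hypothetical participants (values are made up).
x1 = [0.10 0.20 0.25 0.40 0.55 0.60 0.75 0.90]';
x2 = [34 51 47 62 58 70 45 66]';
y  = [0.12 0.18 0.30 0.37 0.52 0.95 0.71 0.88]';

n = numel(y);
X = [ones(n,1) x1 x2];          % design matrix with an intercept column
p = size(X, 2);                 % number of parameters (3 here)

b = X \ y;                      % least-squares coefficients
e = y - X * b;                  % ordinary residuals
h = diag(X * ((X' * X) \ X'));  % leverage of each participant

% Residual variance estimated with participant i left out.
s2     = sum(e.^2) / (n - p);
s2_del = ((n - p) * s2 - e.^2 ./ (1 - h)) / (n - p - 1);

% Externally studentized residuals, referred to t with n - p - 1 df.
t_ext = e ./ sqrt(s2_del .* (1 - h));
df = n - p - 1;

disp([t_ext, h]);
fprintf('degrees of freedom: %d\n', df);
```

Each value in `t_ext` would then be compared with the chosen decision criterion on a t distribution with `n - p - 1` degrees of freedom, matching the degrees of freedom given in footnote 6.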
Martin, N., & Dell, G. S. (2004). Perseveration and anticipations in aphasia: Primed intrusions from the past and future. Seminars in Speech and Language, 26, 349–362.
Martin, N., & Dell, G. S. (2007). Common mechanisms underlying perseverative and non-perseverative sound and word substitutions. Aphasiology, 21, 1002–1017.
Mirman, D., Strauss, T. J., Brecher, A., Walker, G. M., Sobel, P., Dell, G. S., & Schwartz, M. F. (2010). A large, searchable, web-based database of aphasic performance on picture naming and other tests of cognitive function. Cognitive Neuropsychology, 27, 495–504.
Nickels, L., Howard, D., & Best, W. (2011). On the use of different methodologies in cognitive neuropsychology: Drink deep and from several sources. Cognitive Neuropsychology, 28, 475–485.
Olson, A. C., & Romani, C. (2011). Model evaluation and case series data. Cognitive Neuropsychology, 28, 486–499.
Patterson, K., & Plaut, D. C. (2009). “Shallow draughts intoxicate the brain”: Lessons from cognitive science for cognitive neuropsychology. Topics in Cognitive Science, 1, 39–58.
Penny, K. I. (1996). Appropriate critical values when testing for a single multivariate outlier by using the Mahalanobis distance. Applied Statistics, 45, 73–81.
Rapp, B. (2011). Case series in cognitive neuropsychology: Promise, perils and proper perspective. Cognitive Neuropsychology, 28, 435–444.
Schwartz, M. F., & Dell, G. S. (2010). Case series investigations in cognitive neuropsychology. Cognitive Neuropsychology, 27, 477–494.
Shallice, T., & Buiatti, T. (2011). Types of case series – The anatomically based approach: Commentary on M.F. Schwartz & G.S. Dell: Case series investigations in cognitive neuropsychology. Cognitive Neuropsychology, 28, 500–514.
Simon, J. L., & Bruce, P. (2000). Resampling stats. Arlington, VA: Resampling Stats Inc.
Weisberg, S. (1985). Applied linear regression (2nd ed.). New York: Wiley.
APPENDIX A
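function [studentized_residuals] = externally_studentized_residuals(Data, alpha)
% Data: n-by-2 matrix; column 1 is the predictor, column 2 the outcome.
% The function line and the Critical_values computation are missing from
% this copy. They are reconstructed here as assumptions: a two-tailed,
% Bonferroni-corrected t criterion with n - p - 1 = n - 3 degrees of freedom.
% Requires the Statistics Toolbox (regstats, tinv).
Number_of_subjects = size(Data,1);
alpha_corrected = alpha / Number_of_subjects;
Critical_values = tinv([alpha_corrected/2, 1 - alpha_corrected/2], Number_of_subjects - 3);
reg_results = regstats(Data(:,2),Data(:,1),'linear','studres');
studentized_residuals = zeros(Number_of_subjects,2);
studentized_residuals(:,1) = reg_results.studres;
% Column 2 codes the decision: 1 = above prediction, -1 = below, 0 = not flagged.
for i = 1:Number_of_subjects
    if studentized_residuals(i,1) > Critical_values(2)
        studentized_residuals(i,2) = 1;
    elseif studentized_residuals(i,1) < Critical_values(1)
        studentized_residuals(i,2) = -1;
    end
end
end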
APPENDIX B
function [recursive_outlier_detection] = recursive_outlier_test(Data, alpha)
% The function line is missing from this copy; the name
% recursive_outlier_test is an assumed placeholder.
% Returns an n-by-1 vector: 1 or -1 for cases flagged above or below the
% regression prediction on any pass, 0 otherwise.
recursive_outlier_detection = zeros(size(Data,1),1);
studentized_residuals = externally_studentized_residuals(Data, alpha);
DataPrime = Data;
IndexData = 1:size(Data,1);
% Refit without the flagged cases until a pass flags no one.
while sum(abs(studentized_residuals(:,2))) > 0
    DataPrime2 = [];
    IndexData2 = [];
    for i = 1:size(studentized_residuals,1)
        if studentized_residuals(i,2) == 0
            % Keep unflagged cases, with their original row indices.
            DataPrime2 = vertcat(DataPrime2, DataPrime(i,:));
            IndexData2 = horzcat(IndexData2, IndexData(i));
        else
            recursive_outlier_detection(IndexData(i)) = studentized_residuals(i,2);
        end
    end
    DataPrime = DataPrime2;
    IndexData = IndexData2;
    studentized_residuals = externally_studentized_residuals(DataPrime, alpha);
end
end