This article was downloaded by: [University of Waterloo]

On: 11 October 2014, At: 14:33


Publisher: Routledge
Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office:
Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

Cognitive Neuropsychology
Publication details, including instructions for authors and subscription
information:
http://www.tandfonline.com/loi/pcgn20

Making sense of deviance: Identifying dissociating cases within the case series approach
Simon Fischer-Baum ᵃ
ᵃ Department of Psychology, Rice University, Houston, TX, USA
Published online: 29 Oct 2013.

To cite this article: Simon Fischer-Baum (2013) Making sense of deviance: Identifying dissociating
cases within the case series approach, Cognitive Neuropsychology, 30:7-8, 597-617, DOI:
10.1080/02643294.2013.846903

To link to this article: http://dx.doi.org/10.1080/02643294.2013.846903

Cognitive Neuropsychology, 2013
Vol. 30, Nos. 7–8, 597–617, http://dx.doi.org/10.1080/02643294.2013.846903

Making sense of deviance: Identifying dissociating cases within the case series approach
Simon Fischer-Baum
Department of Psychology, Rice University, Houston, TX, USA

The case series approach in cognitive neuropsychology provides a means to test theories that make
quantitative predictions about associations between different components of the cognitive system
[Schwartz, M. F., & Dell, G. S. (2010). Case series investigations in cognitive neuropsychology.
Cognitive Neuropsychology, 27, 477–494]. However, even when the predicted association is borne
out, the study may include outliers: observations that deviate significantly from the rest of the
data. These outliers may reveal individual cases whose cognitive impairments dissociate from
other cases included in the study. These dissociating cases can pose a significant challenge to
the theory being tested. Using a recent case series that investigated the underlying causes of
letter perseveration in spelling [Fischer-Baum, S., & Rapp, B. (2012). Underlying cause(s) of
letter perseveration errors. Neuropsychologia, 50, 305–318], I discuss statistical and theoretical
issues that arise when using outlier detection techniques to identify dissociating cases in a case
series study.

Keywords: Cognitive neuropsychology; Outlier detection; Case series; Single-case studies; Associations and dissociations.

A common critique of the single-case approach to neuropsychology has been its focus on identifying
dissociations between cognitive functions at the expense of identifying how cognitive functions
are associated (Patterson & Plaut, 2009). Case series studies offer a complementary approach to
the single-case design, providing a clear means for testing theoretically motivated associations. A
typical case series study combines data from multiple behavioural measures across a large number
of individual cases and analyses how these measures covary across participants (Schwartz &
Dell, 2010; cf. Olson & Romani, 2011). To the extent that these different behavioural measures
tap into different cognitive functions, associations between these measures reflect associations
between the underlying cognitive functions. The relative merits of the case series methodology in
cognitive neuropsychology have been discussed extensively in this journal over the past several

Correspondence should be addressed to Simon Fischer-Baum, Department of Psychology, Rice University, MS-25, P.O. Box
1891, Houston, TX 77251, USA. (E-mail: sjf2@rice.edu).
I am grateful to Gary Dell, Matt Goldrick, Michael McCloskey, Fred Oswald, and Brenda Rapp for helping me work through
these issues of case series, outliers, and cognitive neuropsychology, and to David Kajander for reading through drafts. Portions of this
work were presented in a symposium on case series investigations at the 50th annual meeting of the Academy of Aphasia. The
remaining work was moulded by the discussion that took place at that meeting.

© 2013 Taylor & Francis



years (Bub, 2011; Dell & Schwartz, 2011; Goldrick, 2011; Lambon Ralph, Patterson, & Plaut, 2011;
Nickels, Howard, & Best, 2011; Olson & Romani, 2011; Rapp, 2011; Schwartz & Dell, 2010; Shallice &
Buiatti, 2011). Generally, these researchers agree that case series studies provide a means to test
certain types of hypotheses that are difficult to test with the single-case approach. However, a
number of these researchers noted the importance of evaluating not only the associations that exist
in the case series, but also whether each individual in the sample is a legitimate function of the
underlying pattern of association (Dell & Schwartz, 2011; Goldrick, 2011; Nickels et al., 2011;
Rapp, 2011; Schwartz & Dell, 2010; cf. Lambon Ralph et al., 2011). According to Nickels et al.
(2011, p. 481), the logic of cognitive neuropsychology dictates that even a single individual who
deviates significantly from the rest of the sample “may be a real example of a dissociating pattern
that can falsify a theory.”

The goal of this paper is to provide a framework for evaluating case series studies for these
dissociating cases. I discuss a recent case series on the underlying causes of letter perseverations
in spelling as a concrete example (Fischer-Baum & Rapp, 2012). I then propose a statistical
method that uses the recursive externally studentized residual test (Weisberg, 1985) as a tool for
detecting outliers, individual cases that deviate significantly from the association being tested in
the case series. This method is similar to a recent proposal by Crawford and Garthwaite
(2006, 2007a, 2007b) for comparing a single neuropsychological case study to a population of
unimpaired control participants whose performance is characterized by a regression equation.

The second half of the paper explores this outlier detection method in more detail by applying
it to simulated data. Across six simulations I explore the effects of the number of participants,
the number of observations per participant, the frequency of dissociating cases in the population,
the placement of the decision criterion, and the role of follow-up testing on the ability to detect
dissociating cases in a case series study. By carefully analysing our case series for dissociating
cases, we can use case series data to develop theories that explain not only the associations that
are observed across large samples of brain-damaged individuals, but also the dissociations.

THE CASE SERIES APPROACH TO LETTER PERSEVERATIONS IN DYSGRAPHIA

Fischer-Baum and Rapp (2012) addressed the issue of dissociating cases in a case series in an
investigation of the underlying causes of letter perseveration errors in dysgraphia. Perseveration
errors are the inappropriate repetition of a previous response in the place of the current target.
The errors investigated in that paper were the perseveration of individual letters from a previous
written response into a subsequent response (e.g., the perseveration of the R in the error “edge”
spelled as ERGE, immediately following the response FRENCE). Two accounts for why these
perseveration errors were produced were considered: a “failure-to-inhibit” account, in which
perseverations arise because of a failure to inhibit previous responses (e.g., Hauser, 1999) and a
“failure-to-activate” account, in which perseveration errors arise because the current target
receives abnormally little activation (e.g., Cohen & Dehaene, 1998; Dell, Burger, & Svec, 1997).

To assess these two accounts, Fischer-Baum and Rapp (2012) analysed the spelling errors
produced by a series of dysgraphic individuals. Following work on verbal perseverations by
Martin and Dell (2004, 2007), they reasoned that a failure-to-activate deficit should lead to
both perseverative and nonperseverative intrusions, and they predicted an association between
an individual’s rate of producing both of these error types. This predicted association was borne
out across the case series. Figure 1 reproduces the association between perseverative and
nonperseverative intrusions in the 12 individuals with acquired dysgraphia studied by Fischer-Baum
and Rapp (2012).


Figure 1. Observed relationship between the rate of producing letter perseveration errors and
nonperseverative intrusions in spelling-to-dictation in a sample of 12 dysgraphic individuals.
[Reprinted from Neuropsychologia, 50, Fischer-Baum, S., & Rapp, B. Underlying cause(s) of letter
perseveration errors, 305–318. Copyright (2012), with permission from Elsevier.]

What can really be concluded from this association? It would be incorrect to conclude from this
analysis that a failure-to-activate deficit is the only deficit that produces letter perseverations
or that all individuals in the sample perseverate because of a failure-to-activate deficit. The
association alone also cannot tell us whether all individuals who have a failure-to-activate deficit
produce a significant number of perseverations. At best, it indicates that it is unlikely that a
failure-to-inhibit deficit explains all of the letter perseverations. But the goal of these studies
was to identify the deficit that caused individuals to perseverate. If the theoretical claim being
tested is that a failure-to-activate deficit is both necessary and sufficient for persons with
aphasia or dysgraphia to produce perseverations, then a single individual whose pattern of
performance deviates significantly from the predicted association would falsify that claim. An
individual whose rate of perseveration was much higher than expected by their rate of
nonperseverative intrusions would challenge the claim that the failure-to-activate deficit is the
only deficit that can cause perseveration, while an individual who produced few perseverations
despite many nonperseverative intrusions would challenge the claim that having a
failure-to-activate deficit alone is enough for perseveration to occur.

A cursory glance at Figure 1 reveals a single point with an extremely high rate of perseveration
errors (25% of all letters produced are perseverations) that would not be predicted from his
relatively low rate of nonperseverative intrusions (4% of all letters produced). This deviating
data point may be the type of dissociating case that falsifies the necessary and sufficient
failure-to-activate theory. But how can we determine that this data point truly deviates from the
predicted association? And even if we determine that this data point is an outlier, how do we
know whether or not the individual who generated that data point is a “real example of a
dissociating pattern”? The remainder of the paper outlines statistical methods that address these
questions.

RECURSIVE EXTERNALLY STUDENTIZED RESIDUAL TEST: A METHOD FOR OUTLIER DETECTION

The issue of identifying outliers in correlational analyses has received a great deal of attention
in the statistics literature. Freeman (1980, p. 350) defines outliers as “any observation that has
not been generated by the mechanism that generated the majority of the observations in the dataset”.
Following Freeman’s definition of outliers, the identification of an outlier in the case series
described above seems to indicate that the data point was not generated by the failure-to-activate
deficit, but rather by some other deficit. Of course, like any detection procedure, a sensitive
outlier detection technique maximizes hits (properly identifying those individuals generated by a
different mechanism from that for the majority of observations in the data set) while avoiding
false alarms (identifying as an outlier data points generated by the same mechanism as that for the
majority of the data points).

Weisberg (1985) proposes the externally studentized residual test as an outlier detection


method suited to determining whether individual data points lie surprisingly far from a fitted
regression line.¹ This test requires the assumptions typical for linear regression models, namely
that the error from the fit of the regression line is normally distributed, and that errors are
mutually uncorrelated. Using this method, we can evaluate whether an individual’s rate of producing
perseveration errors differs significantly from the rate expected given that individual’s rate of
producing nonperseverative intrusions. Included in Appendix A and Appendix B are two Matlab
programs that can be used, in combination with Statistics Toolbox, to carry out this outlier
detection technique. Other statistical packages (including free packages like R) come equipped with
functions that calculate the externally studentized residual.

The basic logic of any outlier detection test is as follows: First, each point has to be evaluated
for how far it deviates from the other data points in the sample. Second, a decision criterion must
be set to evaluate whether the degree to which a data point deviates from the rest of the sample is
large enough to merit identification as an outlier.

The externally studentized residual is a measure of how far a single data point deviates from the
rest of the sample. The residual of each data point is simply how far that point falls from the
point predicted by the regression line. For the case series described above, the residual is a
measure of how far an individual’s observed rate of producing perseveration deviates from the rate
that would be predicted given that individual’s rate of nonperseverative intrusion. To say the
residual is calculated externally means that when evaluating how far the nth case deviates from a
predicted point, that prediction is derived using all of the data except for the nth case. In the
example case series above, to determine the residual of the apparent outlier, that individual’s
observed rate of producing perseverations (25%) is compared to a predicted rate of producing
perseverations, derived from (a) that individual’s observed rate of producing nonperseverative
intrusions (4%) and (b) the regression line given the other 11 individuals in the study.
Finally, the residuals are studentized, or converted to a t-statistic that takes into account the
variance in residuals (calculated over all of the data except for the nth case) as well as the
leverage of the data point (i.e., the influence of the data point on its fitted value; high leverage
indicates that its residual matters more).² The apparent outlying case in Figure 1 has an externally
studentized residual of 6.3. An externally studentized residual is calculated for all individuals
included in the case series; from the data shown in Figure 1, an externally studentized residual is
calculated for all 12 of the cases, with the other 11 cases ranging from −1.2 to 0.5.

The second stage of outlier detection is to define a decision criterion for determining whether a
single data point deviates enough from the rest of the sample to be an outlier. Because the
externally studentized residuals have been converted to t-statistics, the decision criterion can be
set using the null hypothesis statistical testing approach;

¹ Fischer-Baum and Rapp (2012) describe a Mahalanobis distance procedure to detect outliers following Penny (1996). However,
a further review of the statistics literature, and some simple simulations, has led me to the conclusion that the externally studentized
residual test would have been more appropriate with our data set. The Mahalanobis distance technique measures how far a given data
point is from the centre of mass of the whole data set, while the externally studentized residual test measures how far the data point is
from the regression line. Consider a data set that includes a data point with a much greater rate of perseverative and nonperseverative
intrusions than the rest of the individuals in the sample, but whose data point falls near the regression line. Given the theory being
tested, it would be incorrect to conclude that the individual represented by this data point suffers a qualitatively different deficit from
the others in the sample; instead, it appears that this individual has a quantitatively more extreme version of the same deficit. Yet,
because this data point is far from the centre of mass of the whole data set, it will be identified as an outlier by the Mahalanobis
distance technique. Because it falls near the regression line, it will not be identified as an outlier by the externally studentized residual
test.
² Error is larger for more extreme values of the predictor variable, causing the studentized residual to be smaller. Therefore, a
larger difference between the observed and predicted values is needed for a point to be identified as an outlier at these extreme
values (see Crawford & Garthwaite, 2006, for further discussion of this point).
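The externally studentized residual described in this section can be sketched in a few lines for the simple (one-predictor) regression case. This is a minimal illustration using the standard textbook identity for the deleted residual variance, not a transcription of the Matlab programs in the appendices; the function name and the example rates are hypothetical and are not the Fischer-Baum and Rapp (2012) data.

```python
import math

def externally_studentized_residuals(x, y):
    """Externally studentized residual of each (x, y) pair under simple linear regression.

    Each residual is scaled by a variance estimate that leaves that case out,
    and by the case's leverage.
    """
    n = len(x)
    if n < 4:
        raise ValueError("need at least 4 points (n - 3 degrees of freedom)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sse = sum(e * e for e in residuals)
    result = []
    for xi, e in zip(x, residuals):
        leverage = 1.0 / n + (xi - mean_x) ** 2 / sxx
        # Residual variance re-estimated with this case deleted (n - 3 df).
        s2_deleted = (sse - e * e / (1.0 - leverage)) / (n - 3)
        result.append(e / math.sqrt(s2_deleted * (1.0 - leverage)))
    return result

# Hypothetical rates: x = nonperseverative intrusion rate, y = perseveration rate.
# The last case has many perseverations relative to its intrusion rate.
example_x = [0.02, 0.05, 0.08, 0.10, 0.15, 0.20, 0.04]
example_y = [0.03, 0.05, 0.07, 0.11, 0.14, 0.21, 0.25]
for value in externally_studentized_residuals(example_x, example_y):
    print(round(value, 2))
```

The deleted-variance shortcut avoids refitting the regression once per case; refitting the line without each case in turn and dividing the prediction error by its standard error gives the same values.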


critical t-values can be identified such that we are more than 95% confident that an externally
studentized residual is different from zero. Because we need to independently determine whether
each of the 12 residuals is significantly different from zero, it is important to make some
correction for multiple comparisons. To address this problem, Weisberg (1985) suggests Bonferroni
correction. Most of the simulations described below follow Weisberg’s suggestion and use this
conservative decision criterion for identifying outliers, though the fifth simulation explores the
effect of relaxing the decision criterion on the effectiveness of outlier detection. For the
specific example above, because there are 12 values being evaluated, for a significance level of
.05, with a Bonferroni correction, the critical value is based on an alpha of .05/12 or .0042. The
degrees of freedom for the significance tests, in the case of simple linear regression, are (n − 2),
where n is the number of data points in the analysis. In the example case series, the critical
t-values for a two-tailed test with a p < .0042 and 10 degrees of freedom are +3.70 and −3.70.
Since the observed externally studentized residual for the apparent outlying case (6.3) is more
extreme than the critical value, the null hypothesis that this residual is no different from 0 can
be rejected, and this data point is identified as an outlier.

Even though no other participant in the study has a residual value more extreme than the critical
value, it may be premature to conclude that none of these other individuals are outliers. The
problem of masking, in which one extreme outlier increases the overall variability of the
dataset, making the detection of other outliers more difficult, is well known in the statistics
literature (Ben-Gal, 2005). To address this problem, I suggest recursive application of the
statistic described above until no more outliers are detected. Those outliers detected during a
single run of the statistical analysis should be deleted from the data set, and the statistical
analysis should be run again.³ In the case series example, the one outlying case is deleted from the
analysis, and externally studentized residuals are recalculated for the remaining 11 cases. While
these studentized residuals are now larger, none are significantly different from 0. The
application of the recursive externally studentized residual test identifies a single outlier in the
Fischer-Baum and Rapp (2012) case series.

If we can show that this outlier is generated by a different mechanism from the mechanism that
generated the data for the other 11 participants in the study (Freeman, 1980), then its very
presence poses a significant problem for the necessary-and-sufficient failure-to-activate
hypothesis. However, there remains a possibility that this outlier is a false alarm; the same
mechanism that generated the rest of the data generated this data point, and it is because of
random noise that this individual is identified as an outlier. With real data, identifying whether
an outlier is a hit or a false alarm is an ill-defined problem. We certainly cannot claim that an
outlier is a false alarm simply because we do not understand the deficit that is giving rise to this
pattern of performance. Simulation studies provide a clear opportunity to investigate the problem
of false alarms in outlier detection. In a simulation, the mechanism that generates the data for all
of the participants can be prespecified, so we know which individuals in the simulation are truly
dissociating cases and which are not. False alarms in simulation studies are simply those
individuals identified as outliers, despite not being one of the individuals prespecified as a
truly dissociating case.

How much we need to worry about false alarms depends on the statistical power of the outlier
detection technique. In part, the statistical power of the outlier detection technique depends on
the design of the case series investigation: how many participants are included in the study?
How much data are collected for each participant? Other aspects of the investigation outside of the
control of the experimenter, that is, the

³ One concern about this recursive application of the externally studentized residual test is that it will make false alarms more
common; once one or two nonoutliers are incorrectly labelled as outliers, the technique may start to identify false alarms on other
data points. The results of the simulation discussed below show that such false alarms are rare with this technique.
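The Bonferroni decision criterion and the recursive application of the test can be sketched as follows. This is a hedged illustration only: all function names are hypothetical, the t-distribution tail is approximated by numerical integration so that only the standard library is needed, and the degrees of freedom default to n − 2 as stated in the text (many regression references use n − 3 for the externally studentized residual in simple regression, which can be passed in instead).

```python
import math

def t_upper_tail(t, df, steps=4000):
    """Approximate P(T > t) for Student's t (t >= 0) using Simpson's rule."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    pdf = lambda u: math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))
    h = t / steps
    total = pdf(0.0) + pdf(t)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * pdf(k * h)
    return 0.5 - total * h / 3

def bonferroni_critical_t(n, alpha=0.05, df=None):
    """Two-tailed critical |t| at level alpha / n, found by bisection."""
    df = n - 2 if df is None else df  # n - 2 follows the text; an assumption to revisit
    target = alpha / n / 2
    lo, hi = 0.0, 1.0
    while t_upper_tail(hi, df) > target:
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def studentized(x, y):
    """Externally studentized residuals for simple linear regression."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    resid = [b - (my + slope * (a - mx)) for a, b in zip(x, y)]
    sse = sum(e * e for e in resid)
    out = []
    for a, e in zip(x, resid):
        h = 1.0 / n + (a - mx) ** 2 / sxx
        out.append(e / math.sqrt((sse - e * e / (1 - h)) / (n - 3) * (1 - h)))
    return out

def recursive_outliers(x, y, alpha=0.05):
    """Flag cases beyond the Bonferroni cutoff, delete them, and repeat until none remain."""
    active = list(range(len(x)))
    flagged = []
    while len(active) >= 4:
        crit = bonferroni_critical_t(len(active), alpha)
        t = studentized([x[i] for i in active], [y[i] for i in active])
        hits = [i for i, ti in zip(active, t) if abs(ti) > crit]
        if not hits:
            break
        flagged.extend(hits)
        active = [i for i in active if i not in hits]
    return sorted(flagged)

print(round(bonferroni_critical_t(12), 2))  # cutoff for 12 cases at alpha = .05
```

Recomputing the cutoff on every pass keeps the correction matched to the number of cases still under test, which is one reasonable reading of the recursive procedure; a fixed cutoff based on the original sample size would be slightly more conservative.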


prevalence of the different underlying deficits in the population, will also affect the statistical
power of the outlier detection technique. These variables are explored in a series of simulation
studies below.

BENEFITS OF MORE PARTICIPANTS AND MORE DATA PER PARTICIPANT

Case series researchers face a trade-off between including more participants in the study
and collecting more data from each participant. Fischer-Baum and Rapp (2012) collected hundreds
of spelling trials for each of the participants included in the case series. Because of the
intensive investigation of each individual in the study, it was infeasible to include many
participants. However, there may be statistical concerns with having too small a sample in a case
series, as it may increase sampling error variance. The goal of a case series study is to draw
inferences from our sample that generalize to the population of all individuals that meet the
inclusion criteria for the study. The importance of inclusion criteria has been raised in recent
discussions of the case series methodology (Dell & Schwartz, 2011; Olson & Romani, 2011; Shallice &
Buiatti, 2011). Regardless of inclusion criteria, it is unlikely that the entire population is being
investigated in the study. Even when a random and representative sample is drawn from the
population of interest, statistics will vary randomly from sample to sample. As the size of the
sample increases, the sampling error variance will decrease.

Researchers who include more participants in their study, in order for their results to better
generalize to the population as a whole, may collect fewer data from each participant. This decision
introduces an additional source of noise, measurement error variance. For each participant in the
sample, we are trying to measure some true underlying value (in my example, an individual’s
tendency to perseverate). But we can only measure that underlying value from the actual data
collected from each individual. As with sampling error variance, because our actual measurement is
based on a limited number of observations, these measures will vary randomly. Consider an
individual whose true underlying tendency is to produce a perseveration error on 10% of the trials.
If this individual was given only 10 trials of the task, how likely is it that she would make
exactly one perseveration error? Following the binomial distribution, the probability that she will
produce exactly one perseveration error (and therefore have an estimated perseverative tendency of
.1, matching her true underlying tendency) is roughly .4. The probability that she will not
produce a single perseveration error (and therefore have an estimated perseverative tendency of .0)
is about .35, and the probability that she will produce at least two perseverations in 10 trials
(and have an estimated perseverative tendency of at least .2) is about .26. Because only limited
data are being collected from this individual, our measure of the tendency to perseverate is going
to differ from the underlying value.

As with sampling error variance, one way to reduce measurement error variance is to collect
more observations from each individual in the sample. The effects of collecting more data are
illustrated in Figure 2. In Fischer-Baum and Rapp (2012), each participant was asked to spell
at least 300 words (sometimes many more), and each produced more than 1000 letters. The error
bars in Figure 2a show the bootstrapped 95% confidence intervals (on x- and y-values
independently) around each participant’s estimated values of the tendency to produce perseverative
and nonperseverative intrusions (Simon & Bruce, 2000); these confidence intervals indicate the
range of true underlying values that could plausibly yield the observed error rates given the
actual number of letters produced. The narrow error bars indicate that there is little measurement
error variance in this study; estimates of each individual’s tendency to produce perseverative and
nonperseverative intrusions appear to be quite precise. Figure 2b shows the bootstrapped 95%
confidence interval assuming that only 75 observations had been collected from each individual. The
resulting error bars are wide, indicating that there is a much


larger range of true underlying values that could have plausibly yielded the observed result.

Figure 2. Bootstrapped 95% confidence interval around both the rate of perseveration errors and the
rate of nonperseverative intrusions based on the data shown in Figure 1. Figure 2a shows the
confidence intervals given the actual number of letters produced by each individual in the sample,
while Figure 2b shows the confidence intervals assuming that each individual only produced 75
responses.

Both sampling and measurement error variance can contribute to poor performance of outlier
detection techniques. Collecting more data minimizes each source of noise in the sample, but the
different sources of noise benefit from different types of additional data collection. The first
three simulation studies examine the relative contributions of running more participants and
collecting more data for each participant on the sensitivity of outlier detection.

The general structure of the simulations

For all of the simulations in this paper, a simplified situation was considered in which a
participant’s response was (a) a correctly produced letter, (b) a perseveration error, or (c) an
intrusion that was not a perseveration. Two types of participants were included in the simulations:
truly associated participants whose probability of producing a perseveration error and probability
of producing a nonperseverative intrusion were identical, and dissociating cases, who were
participants for whom those error rates were uncorrelated. These dissociating cases are the
outliers in these simulations, as their data are generated by a different mechanism from that for
the truly associated participants. The goal of these simulation studies is to evaluate how well our
outlier detection technique performs at correctly identifying these cases without falsely
identifying the associated participants as outliers. For all simulations except Simulation 4, 90%
of the participants were truly associated participants.

For all participants, a random value between 0.0 and 1.0 was generated for their probability of
correctly producing a letter. For the truly associated participants, the remaining proportion was
then evenly divided between perseverative and nonperseverative intrusions. For example, a truly
associated participant could have had a probability of .6 for correctly producing a letter, .2 for
producing perseverations, and .2 for producing nonperseverative intrusions. For the dissociating
cases, all three true probabilities were randomly selected, and normalized, ensuring that they
added up to 1.0.

Error was introduced into the simulation through resampling procedures. Each simulation
specified the number of observations collected for each participant. Each observation was classified
as a perseveration, a nonperseverative intrusion, or a correct response, following the underlying
probability distribution for that participant. An observed perseveration rate and rate of producing
nonperseverative intrusions were then calculated for each participant by dividing the number of
observed perseverations and the number of observed nonperseverative intrusions by the total number
of observations. In the limit, the observed data


should match the underlying probabilities of these
error types. However, because a finite number of
observations were collected, the observed values
deviate from the underlying probabilities, with
error that approximates a normal distribution.
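How far an observed rate can stray from the underlying probability when few observations are collected can be checked directly with the binomial distribution. The sketch below reruns the worked example given earlier in the paper (a true perseveration tendency of 10% observed over 10 trials); the variable and function names are illustrative only.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k events in n independent trials with event probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n_trials = 10   # trials given to the individual
p_true = 0.10   # true underlying tendency to perseverate

p_zero = binomial_pmf(0, n_trials, p_true)   # estimated tendency of .0
p_one = binomial_pmf(1, n_trials, p_true)    # estimated tendency of .1 (matches the truth)
p_two_or_more = 1.0 - p_zero - p_one         # estimated tendency of .2 or more

print(f"P(0 errors)   = {p_zero:.3f}")
print(f"P(1 error)    = {p_one:.3f}")
print(f"P(>=2 errors) = {p_two_or_more:.3f}")
```

With only 10 trials, the estimate matches the true value in well under half of the cases, which is the measurement error variance that the resampling step introduces into the simulated data.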
Using this procedure, a single randomly gener-
ated data set was created given (a) a number of par-
ticipants, (b) the number of observations for each
participant, and (c) the proportion of dissociating
cases. The correlation between the perseverative
and nonperseverative intrusion rates was then cal-
culated and tested for significance. The recursive
externally studentized residual test was applied to
the data set in order to identify outliers, and
both hit rate—the proportion of dissociating
cases correctly identified—and false-alarm rate—
the proportion of associated cases falsely identified
as outliers—were calculated. For a given number
of participants, number of observations for a par-
ticipant, and proportion of dissociating cases,
10,000 randomly generated data sets were
created, and the correlation coefficient, signifi-
cance, hit rate, and false-alarm rate were calculated
for each of the 10,000 data sets.
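The hit and false-alarm rates defined above can be tabulated for a single data set as in the following sketch. The function name and the use of participant indices are hypothetical, and the outlier test itself is not implemented here: `flagged` stands in for whatever set of participants the recursive externally studentized residual test returns.

```python
def hit_and_false_alarm_rates(flagged, dissociating, n_participants):
    """Return (hit rate, false-alarm rate) for one simulated data set.

    flagged       -- indices of participants identified as outliers
    dissociating  -- indices of participants generated as dissociating cases
    Assumes both the dissociating and the associated group are non-empty.
    """
    flagged = set(flagged)
    dissociating = set(dissociating)
    associated = set(range(n_participants)) - dissociating
    hit_rate = len(flagged & dissociating) / len(dissociating)
    false_alarm_rate = len(flagged & associated) / len(associated)
    return hit_rate, false_alarm_rate

# Hypothetical data set: 50 participants, 5 dissociating cases (10%).
# Two dissociating cases are flagged, plus one associated case flagged in error.
hit_rate, fa_rate = hit_and_false_alarm_rates(
    flagged={0, 3, 17}, dissociating={0, 1, 2, 3, 4}, n_participants=50
)
```

In this example the hit rate is 2 of 5 dissociating cases and the false-alarm rate is 1 of 45 associated cases; averaging such pairs over many generated data sets gives the rates reported for each simulation condition.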

Simulation 1: Varying number of participants
The first simulation looked at the effects of reducing sampling error variance by increasing the number of participants in the study. Held constant in this simulation were the proportion of dissociating cases (10%) and the number of observations collected per participant (100). The number of participants varied in increments of 10 from 10 to 200 participants.
Overall, the ability to detect the underlying association between perseverative and nonperseverative intrusions was good across number of participants. The average correlation ranged from .72, when the sample included 10 participants, to .69, when the sample included 200 participants. When at least 30 participants were included in the sample, that correlation was significant in more than 95% of the randomly generated data sets. With 10 participants, the correlation was only significant in 72% of the data sets, and that number increased to 89% of the data sets when 20 participants were included in the study.

Figure 3. (a) Hit rate, (b) false-alarm rate, and (c) d′ for simulated data sets from Simulation 1 with a constant number of observations per participant (100) and proportion of the dissociating cases in the sample (10%) and a variable number of participants in the study (range: 10–200).

Figure 3 shows the results of applying the outlier detection technique to these simulated data sets in terms of hit rate (Figure 3a) and false-alarm rate (Figure 3b). The first result to note is that varying the number of participants has no effect on the hit rate in these simulations. Whether there are 10 participants or 200 participants in the study, the outlier detection technique correctly identifies approximately 35% of the dissociating cases. The second result to note is how
low the false-alarm rate is in all of these simulated
data. Because of Bonferroni correction, the detec-
tion technique proposed here is quite conservative.
At most about 1% of the associated individuals are
falsely identified as outliers. It should be noted that
increasing the number of participants in the study
decreases the false-alarm rate. With 10 participants,
an average of 1.1% of the associated participants
were identified as outliers, while with 200
participants the rate of false alarms was less than
0.1%. Following signal detection theory, the sensitivity
of outlier detection can be expressed with d′,
a measure that takes into account both hit rate and
false-alarm rate. Figure 3c shows how the d′ of the
recursive externally studentized residual detection
technique varies as a function of number of participants;
as the number of participants increases, d′
increases as well, from a d′ of 1.9 for data sets
generated with 10 participants to a d′ of 2.8 for
data sets generated with 200 participants. All else
being equal, reducing sampling error variance by
increasing the number of participants in the
study makes the recursive externally studentized
residual test more accurate, as it decreases the like-
lihood of producing a false alarm.
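For readers who wish to check the sensitivity values, d′ under the standard equal-variance signal detection model is the difference between the z-transformed hit rate and the z-transformed false-alarm rate. The sketch below assumes that d′ is computed from the average rates reported above; the function name is hypothetical, and the exact procedure used to produce Figure 3c may differ.

```python
from statistics import NormalDist

def d_prime(hit_rate, false_alarm_rate):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate)."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(false_alarm_rate)

# Rates reported for 10 participants in Simulation 1:
# about 35% hits and 1.1% false alarms.
d_10 = d_prime(0.35, 0.011)  # close to the reported value of 1.9
```

Note that `inv_cdf` is undefined for rates of exactly 0 or 1, so data sets with no false alarms at all would need some adjustment before d′ could be computed per data set.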

Simulation 2: Varying number of observations per participant
The second simulation looked at the effect of lowering measurement error variance by increasing observations per participant on outlier detection. The number of participants was held constant (at 50) across the simulated data sets, as was the percentage of dissociating cases (10%, or 5 in all data sets), while the number of observations collected for each participant varied by increments of 20 from 20 to 1000 observations.
Significant correlations between rate of perseverative and rate of nonperseverative intrusions were observed in nearly all of the randomly generated data sets. The average correlation coefficients ranged from .45, with 20 observations per participant, to .77, with 1000 observations per participant. At all levels of observation-per-participant, more than 90% of the random data sets had significant correlations. When at least 40 observations were collected per participant, more than 95% of the random data sets had a significant correlation.

Figure 4. (a) Hit rate, (b) false-alarm rate, and (c) d′ for simulated data sets from Simulation 2 with a constant number of participants in the study (50) and proportion of the dissociating cases in the sample (10%) and a variable number of observations per participant (range: 20–1000).

Figure 4 shows the performance of the outlier detection technique over these random data sets. Unlike increasing the number of participants, there is a clear benefit of increasing the number
of observations per participant on hit rate (Figure 4a). With 20 observations per participant, only 7% of the dissociating cases were correctly identified. However, when 1000 observations were collected per participant, 71% of the dissociating cases were correctly identified. Again, the false-alarm rate in these simulations was extremely low, consistently less than 0.5% across all conditions (Figure 4b). There appears to be a slight cost, in terms of false alarms, of increasing the number of observations, from approximately 0.2% of the truly associated individuals when 20 observations were collected per participant to about 0.4% when 1000 observations were collected per participant. However, as shown in Figure 4c, this slight increase in false-alarm rate was overwhelmed by the large increases in hit rate when d′ was calculated, with the d′ sensitivity of the outlier detection technique around 1.4 with 20 observations per participant and 3.2 with 1000 observations per participant.
Simulations 1 and 2 demonstrate that both sampling and measurement error variance cause a decrease in the sensitivity of the recursive externally studentized residual outlier detection technique. The sensitivity of the technique is improved both by increasing the number of participants, decreasing sampling error variance, and by increasing the number of observations per participant, decreasing measurement error variance. Strikingly, the benefits of increasing the number of participants and increasing the observations per participant appear to affect outlier detection differentially. In Simulation 1, the effects of increasing the number of participants show up in a decrease in false-alarm rate, while in Simulation 2, the effects of increasing observations per participant show up primarily in an increase in hit rate. If one of the goals of the case series investigation is to be able to detect dissociating cases, as many researchers have argued, there are independent motivations for testing more participants and for collecting more data for each participant. Of course, there are costs, in terms of both time and other resources, of running studies that are too large. Finding additional participants for the study and bringing the same participants back for multiple testing sessions can be time consuming. Simulation 3 looks at the trade-off between including more participants in the study and collecting more data per participant. With finite time and resources, is it better to collect more data from a smaller sample, or include more participants but test each participant less?

Simulation 3: Trade-off between participants and observations per participant
Simulation 3 looks at the direct trade-off between including more participants in the study and collecting more observations per participant. A new value, the total number of trials in the study, is defined as the product of the number of participants and the observations per participant. For example, if the study includes 50 participants, and 40 observations are collected for each participant, then the study includes a total of 2000 trials. The same number of trials could be distributed in different ways; an alternative 2000-trial study could have included 200 participants, with only 10 observations collected for each participant. Assuming an equal cost for each trial, how should trials be distributed between the number of participants and the number of observations per participant if the goal is to maximize the performance of outlier detection?
To address this question, I looked at eight levels of total numbers of trials: 1000, 2000, 3000, 4000, 6000, 8000, 12,000, and 20,000 trials per study. The proportion of dissociating cases was held constant at 10%, and random data sets were created for different combinations of number of participants and observations per participant, resulting in these numbers of trials. For example, the 2000-trial simulations included random data sets generated with 10 participants and 200 observations per participant, 20 participants and 100 observations per participant, 40 and 50, 50 and 40, 80 and 25, 100 and 20, and, finally, 200 participants and 10 observations per participant. The maximum number of participants was set to be 200, as studies including more participants than that seem infeasible. For each of these number of participants and
observations per participant pairs, 10,000 random data sets were generated, and hit rate, false-alarm rate, and d′ were calculated.

Figure 5. The d′ for a given total number of trials in the study as the number of participants increases (and the number of observations per participant decreases). Darker lines reflect more total trials in the study.

Figure 5 plots the d′ as a function of number of participants and total number of trials. Unsurprisingly, increasing the total number of trials (the darkening lines) increased d′. The greater the total number of trials in the study, the better the recursive externally studentized residual test performed at detecting dissociating cases. However, within a given total number of trials (shown here as a single line in the figure), it was generally the case that it is better to run fewer participants and collect more observations per participant. For small total numbers of trials (1000–4000), the maximum d′ was found with only 10 participants (and 100, 200, 300, and 400 observations per participant), and d′ decreased monotonically as the number of participants increased. With 6000 and 8000 trials, d′ was greater with 20 participants (300 and 400 observations per participant, respectively) than with 10 participants, but monotonically decreased with more than 20 participants. For 12,000 trials, the same level of sensitivity was observed with 20 participants (and 600 observations/participant) and 30 participants (and 400 observations/participant), with decreasing sensitivity for smaller or larger numbers of participants. For 20,000 trials, the maximum sensitivity was observed with 40 participants (and 500 observations/participant).
Assuming an even trade-off in the costs associated with running more participants and collecting more data from each participant, the results of Simulation 3 suggest that it is better to collect more data for each participant than it is to include more participants. While it is not always the case that the outlier detection technique performs best with the fewest number of participants (10), it appears that increasing the number of participants is only beneficial once there is a relatively large number of observations (400–600) collected per participant.
The assumption of an even trade-off between participants and observations may not hold for all case series studies. For most case series studies, it is more costly, in terms of both time and money, to recruit and consent additional participants than it is to collect more data per participant by bringing them back for additional testing sessions or even keeping them longer in a single session. This problem is particularly salient in those case series with restrictive inclusion and exclusion criteria, requiring an individual either to show intact performance in a number of cognitive tasks (e.g., Nickels et al., 2011) or to have specifically localized brain damage (e.g., Shallice & Buiatti, 2011). However, there are circumstances in which it is impossible to collect a large number of observations per participant, that is, case series with individuals in the acute stage of stroke aphasia in which all of the behavioural testing has to be carried out within a small window of time immediately following the stroke (e.g., Hillis, Kane, Barker, Beauchamp, & Wityk, 2001). In those cases, the costs of increasing the number of observations per participant may be greater than the costs of including more participants.

Summary of Simulations 1–3
The first three simulations provide some suggestions for how a case series study should be designed to identify dissociating cases present in the sample. Simulations 1 and 2 show that there are independent benefits on the ability to detect outliers with both increasing the number of participants and increasing the number of observations per
participant. Simulation 3 shows that the benefit of collecting more data per participant mostly outweighs the benefits of including more participants.
Of course, identifying dissociating cases is not the only goal of a case series investigation. Another goal is being able to identify associations that exist in the data. Simulation 1 shows that small sample sizes have higher correlations between perseverative and nonperseverative intrusions on average than larger sample sizes, but those correlations are less likely to be significant. With small sample sizes, there is less power to detect the true association that exists for 90% of the individuals in the sample. Given these two goals in a case series study, the simulations show that the study should include a sample of at least 30 participants and at least 400 observations for each individual, assuming a 90–10 split between associated and dissociating cases. In real case series, there is no way to know how frequently the dissociating pattern will be observed. Simulation 4 explores the effects of different proportions of dissociating cases in the sample. However, before moving on to the next simulation, I consider the situation in which no outliers are identified in the sample.

WHAT DOES IT MEAN IF NO OUTLIERS ARE DETECTED?

As Nickels et al. (2011) highlight, the presence of a single dissociating case in a case series has the possibility of falsifying the theory being tested. But what if no such case is identified? Such a situation appears to support the theory being tested; however, there are several issues that arise from taking this failure to identify a dissociating case as strong evidence supporting the hypothesis.
The first is that the outlier detection technique outlined above is a form of the null hypothesis test. A failure to identify an outlier should have the same status as a failure to reject the null hypothesis; it may be consistent with a theory but it does not constitute strong evidence in favour of that theory. Indeed, even at its best, the outlier detection technique only identifies around 70% of dissociating cases. In Simulation 1, regardless of the number of participants in the study, only 35% of the dissociating cases were detected, meaning that approximately twice as many dissociating cases went undetected as were identified. In order to limit false alarms in the outlier detection technique, many dissociating cases are missed, and these misses make interpreting the null result difficult.
Including more participants in the study increases the certainty that the failure to find an outlier means that no dissociating case exists in the sample. The simulations above stipulated that 90% of the individuals in the sample had the true underlying association, while the remaining 10% were dissociating cases. This means that in a sample of 10 participants, only a single dissociating case was present, while in a sample of 100 participants, 10 dissociating cases were present. While the likelihood of missing an individual dissociating case was relatively high (65%) regardless of the number of dissociating cases in the simulation, the likelihood of missing both outliers in a 20-participant study is lower than the likelihood of missing the single outlier in a 10-participant study, and the likelihood of missing all 10 outliers in a 100-participant study is lower still. The results of Simulation 1 can be reanalysed to look at the likelihood that no outliers were detected in samples with different numbers of participants. The percentage of data sets with no outliers detected fell drastically as the number of participants in the study increased, from 60.4% of the studies with only 10 participants to 12.9% with 50 participants, 2.2% with 100 participants, and 0.1% with 200 participants.
The simulation stipulates not only that dissociating cases are present, but also that they are relatively frequent, constituting exactly 10% of the cases in the sample. If these dissociating cases are more infrequent, say affecting only 2% of the cases in the sample, then even with large samples it may be unlikely that even a single dissociating case will be identified. We have no way of estimating the frequency with which certain rare patterns are observed, and, historically, cognitive neuropsychology has not been concerned by the relative
frequency of different patterns of cognitive impairment (e.g., Coltheart, 2001). Therefore, at best what can be concluded from a case series study in which no outliers are detected is that there has yet to be evidence presented that contradicts the theory supported by the case series.
Rareness of the dissociating pattern is not the only cause of a failure to detect any outliers. Indeed, having too many individuals in the sample with the dissociating pattern can also limit the effectiveness of the outlier detection technique. One assumption of any outlier detection technique is that a relatively small number of data points are produced by this alternative mechanism (Freeman, 1980). As a result, outlier detection techniques perform poorly in situations in which the frequency of dissociating cases is high. But even in data sets with large numbers of dissociating cases, correlations supporting the underlying association for some of the individuals in the sample may still be significant. Simulation 4 explores the effects of the frequency of dissociating cases on both the likelihood of finding a significant correlation and the hit rate of detecting outliers.

Simulation 4: Varying frequency of dissociating cases
The fourth simulation holds constant both the number of participants (50) and the number of observations (500) while varying the frequency of dissociating cases in the sample. As in Simulations 1–3, for each random data set, the correlation coefficient, the significance of the correlation, and the hit and false-alarm rates were calculated. Additionally, the percentage of random data sets in which (a) the correlation was significant, and (b) there were no outliers detected was calculated.

Table 1. Effect of the frequency of associated cases on the likelihood of a significant correlation and the sensitivity of the outlier detection method

% Associated   Mean correlation   % Significant (α = .05)   % True outliers detected
0              −.14               <0.1                      -
10             −.09               0.2                       0.3
20             −.03               1.6                       0.5
30             .04                4.5                       0.8
40             .12                14.1                      1.4
50             .21                31.3                      2.7
60             .31                57.6                      6.8
70             .44                83.8                      21.0
80             .58                97.0                      48.9
90             .76                99.9                      62.9
98             .93                100                       64.1
100            .98                100                       -

Figure 6. Percentage of data sets with a significant correlation between perseverative and nonperseverative intrusions and without a single outlier detected as a function of the frequency of dissociating cases in the sample (number of participants = 50, observations per participant = 500).

Table 1 shows the average correlation coefficient, the percentage of the random data sets with significant correlations, and the percentage of dissociating cases identified as outliers for different relative proportions of dissociating and associated cases. When the proportion of associated cases was low (<30% of all participants), the likelihood of finding a significant correlation between perseverative and nonperseverative intrusions was also low (<5% of the data sets had a significant correlation), as was the ability to detect dissociating cases (<1% of the dissociating cases were identified as outliers).4 When the proportion of associated cases was high (>80%

4
When there is no underlying association between perseverative and nonperseverative intrusions, the correlation is slightly nega-
tive; the rates of perseverative and nonperseverative intrusions are not entirely independent in this sample. The sum of perseverative
and nonperseverative intrusion rates and the proportion of correct responses must be equal to 1.0. Therefore, if the perseverative error
rate is very high (e.g., >50%) then the nonperseverative rate must be lower (e.g., <50%) and vice versa.
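The effect of this sum-to-one constraint can be seen in a small sketch. It repeatedly samples a single hypothetical participant whose underlying rates are fixed, so any correlation between the two observed rates arises only from the constraint. The probabilities, sample sizes, and names below are illustrative assumptions, and the sketch is not meant to reproduce the mean correlation reported for samples with no associated cases, which also depends on how underlying rates vary across participants.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(1)
n_obs, p_persev, p_nonpersev = 100, 0.3, 0.3  # hypothetical underlying rates
weights = [p_persev, p_nonpersev, 1.0 - p_persev - p_nonpersev]

persev_rates, nonpersev_rates = [], []
for _ in range(2000):
    # P = perseveration, N = nonperseverative intrusion, C = correct response
    trials = rng.choices("PNC", weights=weights, k=n_obs)
    persev_rates.append(trials.count("P") / n_obs)
    nonpersev_rates.append(trials.count("N") / n_obs)

r = pearson_r(persev_rates, nonpersev_rates)
# Correlation between two category counts of a multinomial distribution.
expected_r = -math.sqrt(
    p_persev * p_nonpersev / ((1.0 - p_persev) * (1.0 - p_nonpersev))
)
```

The observed correlation is negative even though nothing links the two error types beyond the fact that each trial can fall into only one category.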

of all participants), the correlations were significant in nearly all of the data sets (>95% of the data sets), and the ability to detect dissociating cases as outliers was reasonably good (>45% of the dissociating cases). Problems arose when associated and dissociating cases were roughly equally common in the sample. For example, when the sample had a 50–50 split of associated and dissociating individuals, nearly a third of the random data sets had a significant correlation, while only 2.7% of the dissociating cases were detected as outliers. The results of this simulation highlight two critical issues when interpreting case series studies. First, finding a significant correlation in a case series investigation does not mean that even the majority of individuals in the sample show the underlying association. Second, too many dissociating cases limit the ability to detect outliers.
This second issue is further illustrated in Figure 6. This figure plots the proportion of data sets in which (a) there was a significant correlation, and (b) no outliers were detected as a function of the percentage of the sample made up of associated cases. Such data sets are common both when dissociating cases are very rare (e.g., when 98% of the sample are associated cases, 30% of the data sets have a significant correlation without a single outlier detected) and when dissociating cases are reasonably common (e.g., when 60% of the sample are associated cases, 23% of all data sets have a significant correlation without a single outlier detected). The size of the correlation coefficient can be used to distinguish these two causes of a failure to detect an outlier. When dissociating cases are rare, correlations will be very strong, and when dissociating cases are common, correlations will be weak to moderate. In Table 1, the average correlation coefficient is .93 when 98% of the data are generated from associated cases and only .31 when 60% of the data are generated from associated cases. In a high-powered case series, if the predicted association has a moderate correlation coefficient, and no outliers are detected, it is more likely that no outliers are detected because the dissociating pattern is so common than because the dissociating pattern does not exist.

SHIFTING THE DECISION CRITERION

One potential concern with the outlier detection technique discussed in this paper is that it is overly conservative. In detection theory, the decision criterion can be shifted to decrease the number of misses, at the expense of increasing the number of false alarms. For the previous four simulations, the criterion is based on the conservative Bonferroni-corrected critical value, which errs against identifying data points as outliers and limits the number of false alarms at the expense of increasing the number of misses. A fifth simulation explores the effects that shifting the decision criterion has on hits and false alarms in outlier detection.

Simulation 5: Varying the decision criterion
In this simulation, 10,000 data sets were created with 50 participants, 100 data points per participant and a 90–10 split between associated and dissociating cases. Each of the 10,000 data sets was evaluated for outliers using 11 different uncorrected alpha-values for statistical testing (.2, .1, .05, .02, .01, .005, .002, .001, .0005, .0002, .0001). Note that included in here is an uncorrected alpha value (.001) that is equivalent to a Bonferroni-corrected alpha value of .05 with 50 participants. An average false-alarm rate and hit rate were calculated for each of these different alpha-levels and were plotted as an ROC-curve (Figure 7). In addition, for each decision criterion, the proportion of the 10,000 data sets that contained at least one false alarm was calculated.
There is a clear cost, in terms of false alarms, when the decision criterion is made more liberal, that is, by not correcting for multiple comparisons. With an uncorrected alpha of .05, the rate of hits is relatively high (68%), but so is the rate of false alarms (20%). Because associated cases are nine times more common in each data set than dissociating cases, at this hit and false-alarm rate an individual identified as an outlier is nearly three times more likely to be a false alarm than a hit. When using this

uncorrected alpha, 99% of the data sets had at least one false alarm. Without correcting for multiple comparisons, it becomes difficult to say with any confidence whether truly dissociating cases exist in the data set.

Figure 7. Receiver-operating characteristic (ROC) curve based on the same sets of simulated data with outlier detection run at different alpha levels. Highlighted in the figure are the points that represent a Bonferroni-corrected alpha of .05 and an uncorrected alpha of .05 (number of participants = 50, observations per participant = 100, frequency of dissociating cases = 10%).

In contrast, when a Bonferroni-corrected alpha of .05 is used as the decision criterion, the hit rate is much lower (35%), the false-alarm rate is lower still (<0.3%), and only 11% of all data sets contain at least one false alarm. With these hit rates and false-alarm rates, a data point detected as an outlier is 15 times more likely to be a truly dissociating case than a false alarm. In this paper, I have focused on outlier detection in case series investigations with the goal of identifying single cases who show the dissociating pattern; this goal is better served by setting a conservative decision criterion, as the cost of mistakenly rejecting the null hypothesis that all of the individuals suffer from the same underlying deficit is high. Incorrectly identified dissociations limit the impact of single-case cognitive neuropsychology. Less conservative corrections that still keep the familywise error rate below a specified alpha-level, like the Holm–Bonferroni correction (Holm, 1979), may be suitable alternatives to the Bonferroni correction for the outlier procedures outlined here.
A case-series investigator may want to identify outliers with goals other than identifying these dissociating cases. For example, a researcher may want to limit the investigation only to those individuals who best show the underlying association. In these investigations, a miss (including in the analysis an individual who actually suffers from a different deficit from the others in the sample) may have more impact than a false alarm. The externally studentized residual test may still be appropriate for outlier detection with this goal in mind, but the decision criterion should be made less conservative. In these sorts of studies, it may be appropriate to use an uncorrected alpha for outlier detection.

THE ROLE OF FOLLOW-UP TESTING IN DISTINGUISHING MISSES AND FALSE ALARMS

As the simulations above illustrate, simply detecting an outlier in a case series is not enough to conclude that the theory being tested is wrong; there is always a possibility that the data point detected as an outlier is actually a false alarm. Nickels et al. (2011) point out the importance of follow-up testing to determine whether cases identified as outliers are truly dissociating cases, or simply appear to be dissociating cases due to some confounding factor. For these apparent dissociating cases, some theoretical account must be made to explain how the mechanism that is generating their responses differs from the mechanism generating the associations in the other cases in the series.5 Follow-up testing, both for the cases detected as outliers and for the other cases in the sample, can be used to show that the case detected as an outlier truly has a different deficit from the other individuals in the sample (see Dell & Schwartz, 2011, for additional discussion of this point).
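How likely a detected outlier is to be a false alarm depends on the base rates of associated and dissociating cases as well as on the hit and false-alarm rates. The sketch below works through the expected counts per data set for the two decision criteria discussed in Simulation 5, using the rates reported there. The function name is hypothetical, and because the Bonferroni-corrected false-alarm rate is reported only as an upper bound (<0.3%), the second ratio is a lower bound rather than an exact value.

```python
def expected_flagged(n_participants, prop_dissociating, hit_rate, false_alarm_rate):
    """Expected numbers of (hits, false alarms) among flagged cases per data set."""
    n_dissociating = n_participants * prop_dissociating
    n_associated = n_participants - n_dissociating
    return n_dissociating * hit_rate, n_associated * false_alarm_rate

# Uncorrected alpha of .05: 68% hits, 20% false alarms.
hits_u, fas_u = expected_flagged(50, 0.10, 0.68, 0.20)
fa_to_hit_uncorrected = fas_u / hits_u  # about 2.6 false alarms per hit

# Bonferroni-corrected alpha of .05: 35% hits, false-alarm rate below 0.3%.
# Using 0.3% as an upper bound gives a lower bound on hits per false alarm.
hits_b, fas_b = expected_flagged(50, 0.10, 0.35, 0.003)
hit_to_fa_bonferroni_min = hits_b / fas_b  # at least about 13
```

The first ratio corresponds to the statement that a detected outlier is nearly three times more likely to be a false alarm than a hit under the uncorrected criterion. The reported factor of 15 under the Bonferroni-corrected criterion is consistent with a false-alarm rate somewhat below the 0.3% bound.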

5
Keep in mind, an outlier should not be discarded as a false alarm simply because the researcher cannot think of a theoretical
explanation for why that individual differs from the rest of the sample.

Cognitive Neuropsychology, 2013, 30 (7 –8) 611


FISCHER-BAUM

Nickels et al. (2011) argue that the outlier needs only to dissociate on follow-up testing from all of the others in the sample in order to demonstrate that it is a truly dissociating case. This argument ignores the fact that the outlier detection technique may have missed some truly dissociating cases. As the simulations show, the Bonferroni-corrected, externally studentized residual test is conservative, making the likelihood of missing truly dissociating cases surprisingly high. Follow-up testing can help to identify these misses as well. Despite not being detected as outliers, misses will pattern with the identified dissociating case on the follow-up testing rather than patterning with the rest of the sample. A sixth simulation examines the role of additional testing in distinguishing hits from false alarms and identifying misses.

Simulation 6: The role of additional testing

The sixth simulation took a constant number of participants (50), number of observations (100), and frequency of dissociating cases in the sample (10%). The outlier detection technique was applied to these simulated data sets. Follow-up testing was then simulated in the simplest possible manner: The same simulated participants, with the same underlying rates of producing perseverative and nonperseverative intrusions, were administered a second experiment with 500 observations. Outlier detection was then applied to these new data sets, and a computer program tabulated how many individuals identified as outliers in the first experiment were also identified as outliers in the second experiment.

In the first experiment, an average of 1.9 outliers were detected for each data set, with 1.8 being truly dissociating cases (hit rate: 35%) and 0.1 being associated cases (false-alarm rate: <0.3%). Outliers in Experiment 1 also tended to be outliers in Experiment 2. An average of 1.5 cases were identified as outliers in both experiments (77%), and nearly all of these cases were truly dissociating cases (>99.9%). Indeed, over 80% of the truly dissociating cases identified as an outlier in Experiment 1 were also identified as an outlier in Experiment 2, while this was true of only 1% of the associated cases. Being identified as an outlier in both experiments is an excellent indicator that the first result was a hit and not a false alarm; an individual that was an outlier in both experiments was more than 1000 times more likely to be a truly dissociating case than an associated case mistakenly identified as an outlier.

A second issue is whether follow-up testing can help distinguish misses from correct rejections. In the first experiment, an average of 48.1 individuals were not identified as outliers; of these, 44.9 were correctly rejected associated cases, and 3.2 were missed truly dissociating cases. Of these individuals, an average of 0.4 cases were identified as outliers in the second experiment (<1%). However, the majority of the cases that were detected as an outlier in Experiment 2, despite not being detected as an outlier in Experiment 1, were truly dissociating cases (75%). Being identified as an outlier in the second experiment, despite not being identified as an outlier in the first experiment, is a decent indication that the first result was a miss and not a correct rejection.

This simulation shows the role that additional testing can play in distinguishing hits and misses from false alarms and correct rejections. Outlier detection on a second experiment provides clear information about how to interpret the results of the first experiment. Individuals who are detected as outliers on both experiments are almost certainly truly dissociating cases, while individuals detected as outliers in the second experiment but not the first were probably misses. Nickels et al. (2011) and Schwartz and Dell (2010) argue that follow-up testing should be used to better understand the underlying deficit of the dissociating cases. This simulation shows the power that even simple retesting (a direct replication of the findings with the same individuals performing the same task) can have in identifying truly dissociating cases in a case series.

CONCLUSIONS

Case series investigations provide a clear tool for testing predicted associations between cognitive


functions in individuals with cognitive impairments. When testing for these associations, it is critical to additionally look for dissociating cases, individuals whose pattern of performance deviates from the predicted association, particularly for those studies in which a single dissociating case can falsify the theory being tested. Determining whether a case series includes individuals whose data are unpredicted from the underlying association (called outliers in the statistics literature) is a difficult statistical problem. This paper offers one formal approach to testing case series for outliers: the recursive externally studentized residual test. Through a series of simulation studies, I have identified some statistical concerns that arise when testing a case series for outliers. Briefly, it is important for case series investigations to include large numbers of participants. Increasing the number of participants decreases the likelihood of falsely identifying an individual as a dissociating case and increases confidence that the dissociating pattern is not attested when no outliers are detected. Furthermore, it is important in a case series investigation to collect many observations from each participant. Increasing the number of observations per participant has a sizeable effect on the likelihood of correctly detecting the truly dissociating cases in the sample. Finally, if outliers are detected, it is important to do follow-up testing, in order to determine whether the detected outlier is a truly dissociating case or a false alarm, as well as to identify other dissociating cases that may have been missed by the outlier detection technique in a single experiment.

While the Fischer-Baum and Rapp (2012) study of perseveration errors was used as a concrete example of how outlier detection might be applied to case series data, the methods proposed in this paper should be applicable to a wide range of case series studies. This method is clearly appropriate for detecting outliers in case series studies that use linear regression analyses, in which there is a clear prediction being tested about the causal relationship of one of the variables on the other (e.g., the lower a patient’s performance on test X, the worse their predicted performance on test Y). However, this method is not appropriate for studies in which the hypothesis being tested does not attribute a causal role to one of the variables and treats all of the variables as having equal status (see Crawford & Garthwaite, 2005, 2007a, 2007b; Crawford, Garthwaite, & Porter, 2010, for discussion of appropriate tests to employ in these conditions).

While the example discussed in this paper looks only at the relationship between two variables, the externally studentized residual test generalizes to multiple linear regression (Weisberg, 1985). Specifically, the method generalizes to cases in which multiple causal variables are used to make a prediction about a single dependent variable. As with the simple linear regression, the observed value on that dependent variable can be compared to the predicted value based on the other explanatory variables, by computing an externally studentized residual and comparing that value to some decision criterion.⁶ Extending this technique to multiple regression is critical for case series investigations, as individuals in a large case series are likely to differ on many variables (e.g., age, education, speed of processing) that may be related to the measures being tested. The problem of covariates is beyond the scope of the present paper, but I refer the reader to Crawford, Garthwaite, and Ryan (2011) for a discussion of this issue, as well as methods for comparing a case study to a control population in the presence of covariates.

6
The degrees of freedom for the generalized externally studentized residual test are n – p – 1, in which n is the total number of data points, and p is the total number of parameters (2 in simple linear regression, >2 in multiple linear regression).

Following all of the directives from this paper makes for infeasibly large case series investigations; while running hundreds of participants on dozens of tests each composed of thousands of trials will result in the best data for case series investigations, a single lab has neither the time nor the money to carry out such a study. Simulation 3 offered some suggestions for how to balance some of these competing recommendations, advocating increasing the number of observations per participant at the expense of including more participants. An alternative perspective would be to make our case


series investigations an ongoing community project. New cases investigated in labs around the world can be added to existing case series by administering the same or very similar tests and carrying out the same analysis techniques. These new cases can be added to the case series database, and those case series can be reassessed for both associations and outliers. The Moss Aphasia Psycholinguistics Project Database already operates in this spirit, making available both the tests administered to a large sample of aphasic individuals and the data from those individuals (Mirman et al., 2010). Rapp (2011) pointed out the temptation to ignore potentially relevant case studies when drawing conclusions from the results of a large case series. A mechanism by which these individual cases can be integrated to make larger case series could minimize this temptation.

REFERENCES

Ben-Gal, I. (2005). Outlier detection. In O. Maimon & L. Rokach (Eds.), Data mining and knowledge discovery handbook: A complete guide for practitioners and researchers (pp. 131–146). Kluwer Academic Publishers.
Bub, D. (2011). Facing the challenge of variation in neuropsychological populations: Lessons from biology. Cognitive Neuropsychology, 28, 445–450.
Cohen, L., & Dehaene, S. (1998). Competition between past and present. Assessment and interpretation of verbal perseverations. Brain, 121, 1641–1659.
Coltheart, M. (2001). Assumptions and methods in cognitive neuropsychology. In B. Rapp (Ed.), Handbook of cognitive neuropsychology (pp. 3–21). Philadelphia: Psychology Press.
Crawford, J. R., & Garthwaite, P. H. (2005). Testing for suspected impairments and dissociations in single-case studies in neuropsychology: Evaluation of alternatives using Monte Carlo simulations and revised tests for dissociations. Neuropsychology, 19, 318–331.
Crawford, J. R., & Garthwaite, P. H. (2006). Comparing patients’ predicted test scores from a regression equation with their obtained scores: A significance test and point estimate of abnormality with accompanying confidence limits. Neuropsychology, 20, 259–271.
Crawford, J. R., & Garthwaite, P. H. (2007a). Comparison of a single case to a control or normative sample in neuropsychology: Development of a Bayesian approach. Cognitive Neuropsychology, 24, 343–372.
Crawford, J. R., & Garthwaite, P. H. (2007b). Using regression equations built from summary data in the neuropsychological assessment of the individual case. Neuropsychology, 21, 611–620.
Crawford, J. R., Garthwaite, P. H., & Porter, S. (2010). Point and interval estimates of effect sizes in the case-controls design in neuropsychology: Rationale, methods, implementations, and proposed reporting standards. Cognitive Neuropsychology, 27, 245–260.
Crawford, J. R., Garthwaite, P. H., & Ryan, K. (2011). Comparing a single case to a control sample: Testing for neuropsychological deficits and dissociations in the presence of covariates. Cortex, 47, 1166–1178.
Dell, G. S., Burger, L., & Svec, W. (1997). Language production and serial order: A functional analysis and a model. Psychological Review, 104, 123–147.
Dell, G. S., & Schwartz, M. F. (2011). Who’s in and who’s out? Inclusion criteria, model evaluation, and the treatment of exceptions in case series. Cognitive Neuropsychology, 28, 515–520.
Fischer-Baum, S., & Rapp, B. (2012). Underlying cause(s) of letter perseveration errors. Neuropsychologia, 50, 305–318.
Freeman, P. R. (1980). On the number of outliers in data from a linear model. Trabajos de estadística y de investigación operativa, 31, 349–365.
Goldrick, M. (2011). Theory selection and evaluation in case series research. Cognitive Neuropsychology, 28, 451–465.
Hauser, M. D. (1999). Perseveration, inhibition and the prefrontal cortex: A new look. Current Opinion in Neurobiology, 9, 214–222.
Hillis, A. E., Kane, A., Barker, P., Beauchamp, N., & Wityk, R. (2001). Neural substrates of the cognitive processes underlying reading: Evidence from Magnetic Resonance Perfusion Imaging in hyperacute stroke. Aphasiology, 15, 919–931.
Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6, 65–70.
Lambon Ralph, M. A., Patterson, K., & Plaut, D. C. (2011). Finite case series or infinite single-case studies? Comments on “Case series investigations in cognitive neuropsychology” by Schwartz and Dell (2010). Cognitive Neuropsychology, 28, 466–474.


Martin, N., & Dell, G. S. (2004). Perseveration and anticipations in aphasia: Primed intrusions from the past and future. Seminars in Speech and Language, 26, 349–362.
Martin, N., & Dell, G. S. (2007). Common mechanisms underlying perseverative and non-perseverative sound and word substitutions. Aphasiology, 21, 1002–1017.
Mirman, D., Strauss, T. J., Brecher, A., Walker, G. M., Sobel, P., Dell, G. S., & Schwartz, M. F. (2010). A large, searchable, web-based database of aphasic performance on picture naming and other tests of cognitive function. Cognitive Neuropsychology, 27, 495–504.
Nickels, L., Howard, D., & Best, W. (2011). On the use of different methodologies in cognitive neuropsychology: Drink deep and from several sources. Cognitive Neuropsychology, 28, 475–485.
Olson, A. C., & Romani, C. (2011). Model evaluation and case series data. Cognitive Neuropsychology, 28, 486–499.
Patterson, K., & Plaut, D. C. (2009). “Shallow draughts intoxicate the brain”: Lessons from cognitive science for cognitive neuropsychology. Topics in Cognitive Science, 1, 39–58.
Penny, K. I. (1996). Appropriate critical values when testing for a single multivariate outlier by using the Mahalanobis distance. Applied Statistics, 45, 73–81.
Rapp, B. (2011). Case series in cognitive neuropsychology: Promise, perils and proper perspective. Cognitive Neuropsychology, 28, 435–444.
Schwartz, M. F., & Dell, G. S. (2010). Case series investigations in cognitive neuropsychology. Cognitive Neuropsychology, 27, 477–494.
Shallice, T., & Buiatti, T. (2011). Types of case series – The anatomically based approach: Commentary on M.F. Schwartz & G.S. Dell: Case series investigations in cognitive neuropsychology. Cognitive Neuropsychology, 28, 500–514.
Simon, J. L., & Bruce, P. (2000). Resampling stats. Arlington, VA: Resampling Stats Inc.
Weisberg, S. (1985). Applied linear regression (2nd ed.). New York: Wiley.


APPENDIX A

Matlab function for computing externally studentized residuals

function [studentized_residuals] = externally_studentized_residuals(Data, alpha)
% externally_studentized_residuals: Detection of outliers in a linear regression
% Input is Data, an N x 2 matrix for N subjects in a case series, with x and y
% values in each column, and alpha, the significance level for rejecting the
% null hypothesis, typically .05
% Computes an externally studentized residual for each subject in an analysis
% and compares that to a critical value from a Bonferroni-corrected alpha
% Output is two columns. Left column is the studentized residual for each case.
% The right column marks with (-1) values significantly below the predicted
% value and (+1) values significantly higher than the predicted value.
% Requires the Statistics Toolbox (tinv, regstats).

[Number_of_subjects, Dimensions_in_Data] = size(Data);

studentized_residuals = zeros(Number_of_subjects, 2);

% Two-tailed critical values, Bonferroni-corrected for N tests.
% Degrees of freedom are n - p - 1, with p = 2 parameters in simple linear regression.
Critical_values = tinv([(alpha/2)/Number_of_subjects 1-(alpha/2)/Number_of_subjects], Number_of_subjects-Dimensions_in_Data-1);

% The 'studres' field of regstats holds the externally studentized residuals.
reg_results = regstats(Data(:,2), Data(:,1), 'linear', 'studres');

studentized_residuals(:,1) = reg_results.studres;

for i = 1:Number_of_subjects
    if studentized_residuals(i,1) > Critical_values(2)
        studentized_residuals(i,2) = 1;
    elseif studentized_residuals(i,1) < Critical_values(1)
        studentized_residuals(i,2) = -1;
    end
end
end
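
The Conclusions and Footnote 6 note that the test generalizes to multiple linear regression, with n – p – 1 degrees of freedom. The following is a minimal sketch of that generalization and is not code from the original study: the function name studres_multi and its argument layout are hypothetical, and the residuals are computed from the standard hat-matrix formulas rather than with regstats. It assumes the Statistics Toolbox is available for tinv.

```matlab
function [t, flags] = studres_multi(X, y, alpha)
% studres_multi (hypothetical name): externally studentized residual test
% for multiple linear regression, as a sketch under the assumptions above.
% X     - n x k matrix of explanatory variables (no intercept column)
% y     - n x 1 vector of the dependent variable
% alpha - family-wise significance level, typically .05
% t     - n x 1 externally studentized residuals
% flags - n x 1 vector: +1 above prediction, -1 below, 0 not an outlier

n  = size(X, 1);
Xd = [ones(n, 1) X];               % design matrix with intercept
p  = size(Xd, 2);                  % number of parameters (k + 1)

b = Xd \ y;                        % least-squares coefficients
e = y - Xd * b;                    % ordinary residuals

% Leverages: diagonal of the hat matrix Xd * inv(Xd'*Xd) * Xd'
h = sum((Xd / (Xd' * Xd)) .* Xd, 2);

% Residual variance estimated with case i deleted (n - p - 1 df)
sse    = sum(e .^ 2);
s2_del = (sse - (e .^ 2) ./ (1 - h)) ./ (n - p - 1);

t = e ./ sqrt(s2_del .* (1 - h));

% Two-tailed Bonferroni-corrected critical value, df = n - p - 1
crit  = tinv(1 - (alpha / 2) / n, n - p - 1);
flags = (t > crit) - (t < -crit);
end
```

With a single explanatory variable, a call such as studres_multi(Data(:,1), Data(:,2), alpha) should reproduce the residuals that regstats returns in its studres field, which offers a quick check of the sketch against the function above.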

APPENDIX B

Matlab function for recursive application of the externally studentized residual test

function [recursive_outlier_detection] = externally_studentized_recursive(Data, alpha)
% Input is Data, an N x 2 matrix for N subjects in a case series, with x and y
% values in each column, and alpha, the significance level for rejecting the
% null hypothesis, typically .05
% Program applies the externally_studentized_residuals program recursively.
% It outputs a single column, with one value for each case in the case series.
% A value of 0 means that the individual is not identified as an outlier
% A value of +1 means that the individual is an outlier in the positive direction
% A value of -1 means that the individual is an outlier in the negative direction

Number_of_subjects = size(Data, 1);
recursive_outlier_detection = zeros(Number_of_subjects, 1);
studentized_residuals = externally_studentized_residuals(Data, alpha);
DataPrime = Data;
IndexData = 1:Number_of_subjects;
% Repeat until a pass over the remaining cases detects no new outliers.
while sum(abs(studentized_residuals(:,2))) > 0
    DataPrime2 = [];
    IndexData2 = [];
    for i = 1:size(studentized_residuals, 1)
        if studentized_residuals(i,2) == 0
            % Not an outlier on this pass: keep the case for the next pass.
            DataPrime2 = vertcat(DataPrime2, DataPrime(i,:));
            IndexData2 = horzcat(IndexData2, IndexData(i));
        else
            % Record the direction of the outlier at its original index.
            recursive_outlier_detection(IndexData(i)) = studentized_residuals(i,2);
        end
    end
    DataPrime = DataPrime2;
    IndexData = IndexData2;
    studentized_residuals = externally_studentized_residuals(DataPrime, alpha);
end
end
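
Simulation 6 tabulates which individuals are flagged as outliers in each of two experiments. The bookkeeping might be sketched as follows, assuming two flag vectors in the format returned by externally_studentized_recursive, one per experiment, with participants in the same order. The function name tabulate_followup, the field names, and the separate count for outliers that change direction between experiments are illustrative assumptions, not details given in the paper.

```matlab
function counts = tabulate_followup(flags_exp1, flags_exp2)
% tabulate_followup (hypothetical name): cross-tabulate outlier flags
% (-1, 0, +1) from an initial experiment and a follow-up experiment.

flags_exp1 = flags_exp1(:);        % force column vectors
flags_exp2 = flags_exp2(:);

out1 = flags_exp1 ~= 0;            % outlier in Experiment 1
out2 = flags_exp2 ~= 0;            % outlier in Experiment 2
same_direction = flags_exp1 == flags_exp2;

% Outlier in both experiments, same direction: first result likely a hit
counts.both        = sum(out1 & out2 & same_direction);
% Outlier only in Experiment 1: first result possibly a false alarm
counts.first_only  = sum(out1 & ~out2);
% Outlier only in Experiment 2: first result possibly a miss
counts.second_only = sum(~out1 & out2);
% Outlier in neither experiment: likely a correct rejection
counts.neither     = sum(~out1 & ~out2);
% Outlier in both experiments but in opposite directions (assumed category)
counts.opposite    = sum(out1 & out2 & ~same_direction);
end
```

In the terms of Simulation 6, cases counted in both are almost certainly truly dissociating, while cases counted in second_only were probably missed in the first experiment.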
