
Law and Human Behavior, Vol. 29, No. 1, February 2005 (© 2005)
DOI: 10.1007/s10979-005-1400-8

Improving Decision Making in Forensic Child Sexual Abuse Evaluations
Steve Herman1

Mental health professionals can assist legal decision makers in cases of allegations
of child sexual abuse by collecting data using forensic interviews, psychological test-
ing, and record reviews, and by summarizing relevant findings from social science
research. Significant controversy surrounds another key task performed by mental
health professionals in most child sexual abuse evaluations, i.e., deciding whether or
not to substantiate unconfirmed abuse allegations. The available evidence indicates
that, on the whole, these substantiation decisions currently lack adequate psychomet-
ric reliability and validity: an analysis of empirical research findings leads to the con-
clusion that at least 24% of all of these decisions are either false positive or false
negative errors. Surprisingly, a reanalysis of existing research also indicates that it
may be possible to develop reliable, objective procedures to improve the consistency
and quality of decision making in this domain. A preliminary, empirically-grounded
procedure for making substantiation decisions is proposed.
KEY WORDS: reliability, validity, and accuracy of substantiation decisions in forensic evaluations of
child sexual abuse allegations by mental health professionals.

Child Protective Services (CPS) agencies in the United States receive approximately
250,000 reports of alleged child sexual abuse each year (Department of Health and
Human Services [DHHS], 2004; see also Peddle & Wang, 2001). Approximately one
third of these reports are screened out prior to investigation, but at least 150,000
reports are the objects of forensic child sexual abuse evaluations (FCSAEs) per-
formed by CPS caseworkers. The number of FCSAEs that are performed each year
by mental health professionals (MHPs)2 who do not work for CPS agencies is not
known, but is probably quite large, in the tens or hundreds of thousands. For ex-
ample, Davies et al. (1996) report that more than 1,000 FCSAEs are performed
annually at a single hospital-based site in San Diego, California.

1 103 Prices Switch Road, Warwick, New York 10990; e-mail: drsteveherman@yahoo.com.
2 The phrase MHP is used in a broad sense to refer to anyone who might perform a psychosocial evalua-
tion in a mental health or social work context, including CPS caseworkers. The phrase FCSAE is used
in a narrow sense to refer only to evaluations conducted by MHPs.


The stakes for the children and adults who are the objects of FCSAEs are
high. Well conducted evaluations can result in the discovery of critical factual
information, and can lead to the substantiation of genuine allegations or the refu-
tation of false allegations (Horner, Guyer, & Kalter, 1993). However, when a low
quality investigation fails to substantiate genuine allegations of sexual abuse, vul-
nerable children may be left unprotected and perpetrators may go on to victimize
other children in the future (Sbraga & O’Donohue, 2003). On the other hand, a
low quality investigation that results in an erroneous decision to substantiate abuse
when no abuse has occurred or when the perpetrator has not been correctly identi-
fied can lead to the wrongful destruction of the lives of innocent children, adults, and
families (Wakefield & Underwager, 1994). Even investigations that ultimately end
with a finding that abuse allegations are unfounded can cause serious, irreparable
harm to children and their families (Bernet, 1997; Besharov, 1994; Fincham, Beach,
Moore, & Diener, 1994; Pillai, 2002; Richardson, 1990; for a dissenting opinion see
Finkelhor, 1993).

PRACTITIONER PERFORMANCE IN FORENSIC CHILD SEXUAL ABUSE EVALUATIONS

Because of the high stakes, it is important to know how well practitioners are
currently performing in FCSAEs. In order to evaluate the quality of these evalua-
tions, it is necessary to distinguish among several different component tasks that are
common to all forensic mental health evaluations. According to Heilbrun’s (2003)
analysis, tasks performed in forensic mental health evaluations can be conceptu-
alized as falling into four broad phases; namely, preparation, data collection, data
interpretation, and communication of results. This article focuses on the data inter-
pretation phase of FCSAEs. Specifically, the two primary goals of this article are:
(a) to evaluate the accuracy of practitioner judgments in the central data interpre-
tation task in most FCSAEs—the development and rendering of an expert opinion
about the validity of unconfirmed allegations of child sexual abuse, and (b) to pro-
pose a method for improving the consistency and quality of these expert opinions.
Before we turn to the main goals of this article, let us first briefly examine the data
on practitioner performance in the other three phases of the FCSAE.

Preparation
Given the potentially severe consequences of poorly conducted FCSAEs,
one might argue that only the most highly trained and well prepared mental
health practitioners—board-certified forensic psychiatrists and psychologists, and
doctoral-level social workers with forensic training—should perform FCSAEs. In
reality, however, “much of the investigation done in cases of child sexual abuse ap-
pears to be done by the least trained professionals and paraprofessionals” (Lawlor,
1998, pp. 105–106). Many FCSAEs are performed by CPS caseworkers who have
no licensure in any mental health profession. In one survey study of MHPs who
perform child sexual abuse evaluations in California, 78 of the respondents de-
scribed themselves as social workers. Of these 78, only 66% possessed a graduate or
undergraduate degree in social work or a related field and only 12% were licensed
(Shumaker, 2000). The inadequate professional preparation of many practitioners
of FCSAEs is a given that seems unlikely to change in the foreseeable future, primar-
ily because of the high cost of evaluations conducted by the most highly qualified
MHPs.

Data Collection and Communication of Results


Some of the most important contributions made by MHPs to the successful
resolution of cases of alleged child sexual abuse arise from their roles as data col-
lectors and communicators of results, and do not necessarily require the rendering
of an expert opinion as to the validity of abuse allegations (Horner et al., 1993).
For example, a well-conducted, nonleading, videotaped interview with an abused
child may convince a perpetrator to confess or accept a plea bargain, sparing the
child the possibly traumatic drama of a long, drawn out court battle. Unfortunately,
some of the skills required to perform effectively as a data collector in this domain,
such as the ability to perform effective forensic interviews with children and adults,
are quite difficult to learn and are not taught to most MHPs as a standard part of
their academic training (Benedek et al., 1998; Lawlor, 1998). Another problem is
that the therapeutic interview techniques that are taught to MHPs—for example,
focusing on understanding therapy clients’ subjective reality—may make it more
difficult to conduct objective forensic interviews (Faust & Ziskin, 1988). As a result,
even experienced practitioners of FCSAEs often demonstrate poor forensic inter-
viewing skills (Warren & Marsil, 2002). Furthermore, the research on child sexual
abuse and child sexual abuse allegations is large, complex, constantly growing and
changing, and self-contradictory at times. Many MHPs lack the time, knowledge,
or skills necessary to effectively tackle this massive and confusing corpus (Berliner,
1998). Finally, many clinicians have erroneous implicit or explicit beliefs about al-
legations of child sexual abuse that may reduce the accuracy of their evaluations
(Conte, Sorenson, Fogarty, & Rosa, 1991; Jackson & Nuttall, 1993; Mason, 1998;
Oberlander, 1995). In summary, many practitioners lack adequate levels of training,
knowledge, and skill to perform high quality forensic interviews of children or to ac-
curately describe to legal decision makers the factors that distinguish between true
and false allegations of abuse.

Data Interpretation
The central data interpretation task in most FCSAEs is that of arriving at a
decision to classify abuse allegations as either substantiated or not substantiated.
However, in some cases the opinion of the evaluator as to whether or not abuse oc-
curred is superfluous because there is absolutely clear and convincing corroborative
evidence that abuse has occurred—for example, in the form of indisputable medical
evidence, documentary evidence such as photographs or videotapes, credible eye-
witnesses, or a credible confession by the perpetrator. One study of 399 FCSAEs
found that there was strong corroborating evidence in 30% of all of the cases re-
viewed (Elliott & Briere, 1994), implying that 70% of the cases lacked strong cor-
roborating evidence. The focus of the current article is on the approximately 70%
of cases in which evaluator decisions are most likely to play a decisive role because
of the lack of strong external corroborating evidence.
In contrast to the widely accepted legitimacy of the data collection and com-
munication roles for qualified MHPs in FCSAEs, there is considerable controversy
surrounding the central data interpretation role. In fact, there is a widespread (but
not universal) consensus among experts in the field that decisions by clinicians to
either substantiate or not substantiate uncorroborated sexual abuse allegations cur-
rently lack a firm scientific foundation (Fisher & Whiting, 1998; Goodman, Emery,
& Haugaard, 1998; Horner et al., 1993; Poole & Lindsay, 1998) and that such opin-
ions may be based on little more than a clinician’s subjective opinion or hunch that
abuse did or did not occur (cf. Benedek et al., 1998). Melton and Limber (1989) and
Fisher (1995) have argued that, given the current lack of a reliable scientific foun-
dation, it is irresponsible or even unethical for MHPs to offer expert opinions about
whether or not abuse has occurred. The comments made by Horner et al. in 1993 still
hold true: “in cases of alleged child sexual abuse, clinical experts have yet to demon-
strate that they possess any unique ability to find the truth, to determine the credibil-
ity of persons giving testimony, or to divine either the past or future from immediate
clinical observation and facts” (p. 930). Expressions of the opposing viewpoint, that
it is scientifically and ethically legitimate for MHPs to make substantiation deci-
sions in cases of alleged child sexual abuse, come primarily from clinicians, profes-
sional organizations that represent clinicians, and members of the legal profession
(American Academy of Child and Adolescent Psychiatry, 1997; American Profes-
sional Society on the Abuse of Children, 1997; Berliner & Conte, 1993; Oberlander,
1995; cf. Myers & Stern, 2002). The contradiction between the consensus among sci-
entific experts on child sexual abuse and actual practice in most FCSAEs reflects the
growing gap between the science and practice of psychology in this and many other
domains (Tavris, 2003).
Despite the lack of a firm scientific foundation, it is very unlikely that MHPs are
going to voluntarily abstain from making substantiation decisions in the foreseeable
future. Indeed, CPS caseworkers are obliged to offer these opinions as part of their
job duties, and non-CPS evaluators often face strong implicit and explicit pressures to
produce definitive substantiation decisions. Of course, it might still be the case that
evaluator opinions in FCSAEs are generally accurate because—one could argue—
experienced clinicians are intuitively able to ferret out the truth in these cases. If that
were true, it would mean that our current reliance on practitioner substantiation
decisions would be justified. But is it true?

HOW ACCURATE ARE SUBSTANTIATION DECISIONS?

In practical terms, most evaluator opinions in FCSAEs boil down to a deci-
sion to either substantiate or not substantiate abuse allegations. In a minority of
cases, evaluators decide that there is good reason to suspect abuse but not enough
evidence to substantiate. These ambiguous cases are usually described as inconclu-
sive or indicated. In practice, only a small percentage of evaluator decisions fall into
this inconclusive category. For example, Child Maltreatment 2002 (DHHS, 2004)
included detailed data on investigation disposition for all physical and sexual child
abuse and child neglect cases reported to CPS agencies in 49 states in 2002. Only
nine of these states classified any cases as indicated and significant numbers of indi-
cated cases were reported in only seven of these nine states. In these seven states,
about 23% of all cases were classified as indicated. Overall, across all 49 report-
ing states, 27% of all cases of child abuse and neglect were classified as substanti-
ated, 4% as indicated, and 60% as unsubstantiated (the remaining 9% fell into six
other small categories). Thus, for the most part, FCSAEs are diagnostic classifica-
tion procedures with a dichotomous outcome. The reliability, validity, and accuracy
of FCSAEs can therefore be described using the concepts and statistics that are used
to describe other dichotomous diagnostic tests in psychology and medicine (for in-
troductions see Greenhalgh, 1997 and Streiner, 2003; also see Kraemer, 1992).
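To make this framing concrete, the following sketch (in Python; not part of the original article) shows how the standard statistics for a dichotomous diagnostic decision can be computed from a 2 × 2 table. The function name and the illustrative counts are hypothetical; the counts were chosen so that the base rate, substantiation rate, and validity coefficient correspond to the estimates used later in this article (.40, .40, and φ = .50).

```python
import math

def dichotomous_test_stats(tp, fp, fn, tn):
    """Standard accuracy statistics for a dichotomous classification decision,
    computed from a 2 x 2 table of decisions against a (hypothetical) gold standard.

    tp: abuse occurred, substantiated        fp: no abuse, substantiated
    fn: abuse occurred, not substantiated    tn: no abuse, not substantiated
    """
    n = tp + fp + fn + tn
    base_rate = (tp + fn) / n        # proportion of cases in which abuse occurred
    selection_rate = (tp + fp) / n   # proportion of cases substantiated
    sensitivity = tp / (tp + fn)     # P(substantiate | abuse occurred)
    specificity = tn / (tn + fp)     # P(not substantiate | no abuse)
    error_rate = (fp + fn) / n       # overall proportion of erroneous decisions
    # phi: the Pearson correlation for a 2 x 2 table (the validity coefficient)
    phi = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (fn + tn) * (tp + fn) * (fp + tn))
    # Cohen's kappa: observed agreement with the gold standard, corrected for chance
    p_obs = (tp + tn) / n
    p_chance = (base_rate * selection_rate
                + (1 - base_rate) * (1 - selection_rate))
    kappa = (p_obs - p_chance) / (1 - p_chance)
    return dict(base_rate=base_rate, selection_rate=selection_rate,
                sensitivity=sensitivity, specificity=specificity,
                error_rate=error_rate, phi=phi, kappa=kappa)

# Illustrative counts for 100 hypothetical evaluations (base rate .40,
# substantiation rate .40, phi = .50, overall error rate .24)
print(dichotomous_test_stats(tp=28, fp=12, fn=12, tn=48))
```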
Evaluator decisions in cases of unconfirmed abuse allegations are based on the
individual clinician’s review and analysis of the available data in light of the clini-
cian’s own personal experiences, knowledge of the research, and implicit or explicit
beliefs and values. The process of making decisions in this fashion is called clini-
cal judgment, which is often contrasted with actuarial or mechanical judgment, in
which data are weighed and judgments are arrived at using predefined formulas and
procedures (Dawes, Faust, & Meehl, 1989). Currently, there are no scientifically val-
idated actuarial decision procedures specifically designed to assist clinicians in the
evaluation of allegations of child sexual abuse.
Direct empirical evidence bearing on the validity and accuracy of clinical judg-
ments about allegations of child sexual abuse is difficult or impossible to obtain.
Experimental studies face insurmountable ethical barriers to achieving high lev-
els of internal and external validity. The accuracy of clinical judgments in this field
will never be directly studied in controlled experiments because children obviously
cannot be sexually abused or subjected to intensive suggestive interviews by adults
in order to determine whether or not clinicians can distinguish between children
known to be sexually abused and those who have not been abused but who are the
objects of suspicions or allegations of abuse by adults. Nonexperimental field studies
also face insurmountable difficulties to achieving high levels of validity. The main
problem is that in most real-world cases of allegations of child sexual abuse there
is no reliable gold standard to which clinician judgments can be compared. Only a
minority of all sexual abuse allegations are ultimately confirmed by clear and indis-
putable corroborative evidence. In cases of false allegations, clear and absolutely
convincing evidence that the allegations are false is almost always absent because it
is very difficult, if not impossible, to prove a blanket negative—that a particular child
was never sexually abused. Nevertheless, some authors have provided useful guide-
lines for attempting to establish the ground truth in studies of clinician judgments
about child sexual abuse allegations (Horowitz, Lamb, Esplin, Boychuk, & Reiter-
Laverly, 1995). Furthermore, there are a number of interesting studies that have
investigated the use of quasi-objective methods such as polygraph tests (Abrams,
1995) and criterion based content analysis and statement validity analysis (Raskin
& Esplin, 1991) to evaluate abuse allegations. Unfortunately, the reliability, validity,
and accuracy of these quasi-objective methods do not currently appear to be suffi-
cient to justify their use in many legal settings or as gold standards for the purpose
of the evaluation of the accuracy of clinician judgments (Berliner & Conte, 1993;
Cross & Saxe, 1992; Lamb, Sternberg, Esplin, Hershkowitz, & Orbach, 1997; Lamb,
Sternberg, Esplin, Hershkowitz, Orbach, & Hovav, 1997; Ruby & Brigham, 1998;
Saxe & Ben-Shakhar, 1999). In short, we have no universally accepted criterion that
can be used to evaluate the accuracy of clinician judgments in most real cases of
child sexual abuse.
In the absence of a reliable gold standard, there are a number of statistical
methods falling under the general rubric of latent class analyses that could be used to
provide precise estimates of the reliability, validity, and accuracy of clinician judg-
ments in FCSAEs (cf. Faraone & Tsuang, 1994), but these methods have not yet
been applied in this domain. In the meantime, there are a number of existing empir-
ical studies that do provide data that is relevant to the assessment of the validity and
accuracy of these judgments, including data (a) from studies that have attempted to
directly assess the accuracy of clinician judgments using some type of gold standard,
(b) about the reliability of clinician judgments, (c) about the factors that influence
clinician judgments, and (d) about the reliability, validity, and accuracy of clinician
judgments in similar domains.

The Reliability and Validity of Judgments About Child Sexual Abuse Allegations
There are a handful of quasi-experimental and field studies that have attempted
to evaluate the reliability, validity, and accuracy of clinician judgments about alle-
gations of child sexual abuse. Assessing reliability is more feasible than directly as-
sessing validity and accuracy because reliability assessments do not rely on a gold
standard: we can observe that two clinicians disagree about a particular case without
necessarily knowing which one is correct (Garb, 1998). Information about reliability
can be useful in estimating validity and accuracy if (and only if) it turns out that the
reliability is low. If reliability is low then we can logically deduce that validity is also
low. If the validity is low, and the base and selection (substantiation) rates are nei-
ther very low (near .00) nor very high (near 1.00), then we can deduce that accuracy
must also be low. If reliability is high then it tells us little or nothing about validity
and accuracy, since reliability is a necessary but not sufficient condition for validity
(Crocker & Algina, 1986).
Realmuto and his colleagues used a quasi-experimental design in two small
studies that attempted to directly evaluate the ability of clinicians to detect past sex-
ual abuse in children. In the first experiment Realmuto, Jensen, and Wescoe (1990)
examined the ability of an experienced child psychiatrist to use an interview with
anatomically correct dolls to correctly classify children whose prior sexual abuse
status was not known to the interviewer. In the second experiment clinical judges
attempted to correctly classify some of the same children after viewing videotapes of
interviews conducted during the first experiment (Realmuto & Wescoe, 1992). The
legal and psychiatric records of the sexually abused children were carefully reviewed
by two clinicians in order to confirm that abuse had occurred. The nonabused chil-
dren were drawn from clinical and nonclinical samples for which there was no reason
to suspect abuse.
The internal validity of this experiment, or any other similar experiment, is in-
evitably limited by the possibility of classification errors. Thus, it is possible that
some of the children who were classified as abused by the experimenter had not
actually been abused and vice-versa. The external or ecological validity of these ex-
periments with respect to actual FCSAEs is also limited because of the restrictive
conditions that were placed on the interview process. Because of legitimate ethi-
cal concerns about the danger of exposing nonabused children to questions about
sexual abuse, each child was interviewed only one time for 20 min and the inter-
viewer was not allowed to ask direct questions about sexual abuse unless the child
raised the topic or divulged suspicious material. Despite these limitations, the data
obtained from these unique experiments is interesting and useful.
In the first experiment a child psychiatrist interviewed 15 children; 6 of these
children had been classified as sexually abused by the experimenters (Realmuto
et al., 1990). There was almost no correlation (φ = .04) between the interviewer’s
classification decisions and the experimenter-determined prior abuse status. In the
second experiment (Realmuto & Wescoe, 1992), 14 clinical judges each viewed 13
videotaped interviews from the first experiment. The experimenters had previously
substantiated abuse for 4 of these 13 children. The 14 judges were blind to the prior
abuse status of the children. Each clinician made a judgment to substantiate or not
substantiate abuse in each of the 13 cases, for a total of 182 judgments. The 14 judges
demonstrated a low level of interrater agreement as measured by Cohen’s kappa
(κ = .36). The observed correlation (φ = .08) between the 182 evaluator classifica-
tions and the experimenter-determined abuse status was negligible.
In order to make a rough comparison of diagnostic validities across studies with
different reliabilities, the author proposes an intuitively appealing, reliability-free
index of validity, v, the ratio of the achieved validity coefficient to the maximum
possible validity coefficient, v = φ/φmax . In order to calculate φmax we need an esti-
mate of the interrater reliability, φxx . Realmuto and Wescoe (1992) do not provide
a direct estimate of the average correlation, φxx , between pairs of judges. However,
since κ generally is close to φxx (Kraemer et al., 1999), we use κ as an estimate for
φxx. The square root of the reliability is an upper bound estimate for the correlation
between evaluator classification and abuse status (Crocker & Algina, 1986), so
φmax = √φxx = √.36 = .60. Thus, in this experiment, v = .08/.60 = .13 (cf. Ruscio, 1998).
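As a minimal illustration of this index (a sketch only, with a hypothetical function name; it simply assumes, as the text does, that κ can stand in for the interrater correlation φxx):

```python
import math

def v_index(phi, kappa):
    """Reliability-free validity index v = phi / phi_max, where phi_max is the
    square root of the interrater reliability and kappa is used, as in the text,
    as an estimate of the interrater correlation phi_xx."""
    phi_max = math.sqrt(kappa)
    return phi / phi_max

# Realmuto and Wescoe (1992): phi = .08, kappa = .36, so v = .08 / .60
print(round(v_index(0.08, 0.36), 2))   # -> 0.13
```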
McGraw and Smith (1992) reevaluated 18 cases of alleged child sexual abuse
that had already been evaluated by CPS caseworkers. The CPS caseworkers had
substantiated abuse in only 1 of the 18 cases whereas McGraw and Smith substan-
tiated abuse in 8 of the 18 cases. We can estimate the interrater reliability of the
evaluators in this study as κ = .14 by comparing judgments made by McGraw and
Smith (considered to be a single evaluator) with judgments made by the CPS case-
workers (also considered to be a single evaluator).
Shumaker (2000) created several short case vignettes that described allegations
of child sexual abuse. A panel of 10 experts in child sexual abuse unanimously
agreed that the abuse allegations described in two of the vignettes had no merit.
Shumaker asked MHPs who routinely perform child sexual abuse evaluations to
decide whether or not the allegations described in the vignettes had merit. Of the
131 MHPs who rated one of the two no-merit vignettes, 59 concluded that the
allegations had merit whereas 72 concluded that the allegations had no merit. Pro-
fessional affiliation was strongly associated with raters’ judgments: 79% of 52 social
workers concluded that a no-merit allegation had merit versus only 13% of the 45
psychologists and 35% of the 34 counselors. In other words, much of the variance
in the evaluators’ judgments can be accounted for by their professional affiliation,
a nonevidentiary factor that has no logical bearing on whether or not the abuse al-
legations in the vignettes had merit or, for that matter, whether or not actual abuse
allegations that these professionals evaluate on the job have merit. We can roughly
estimate the level of interrater agreement for the 131 clinicians who rated a no-merit
vignette as κ = .00 (see footnote 3).
Results from experiments by Horner and his colleagues (Horner & Guyer,
1991a, 1991b; Horner, Guyer, & Kalter, 1992) are consistent with Shumaker (2000).
A real case of alleged sexual abuse of a 3-year-old girl by her father, who was in-
volved in a custody dispute with the mother, was presented to a total of 129 MHPs in
a series of experiments. The presentation was made by an experienced clinical psy-
chologist who had conducted a court-ordered custody evaluation in the case. The
presentation included excerpts of videotaped interviews with the child and video-
tapes of the child interacting with the accused father. The MHPs were able to ask
questions of the psychologist who made the presentation. Following the case pre-
sentation, the professionals were asked to estimate the probability that the sexual
abuse allegations were true. The range of probability estimates from the 129 MHPs
was from 0 (absolutely certain that the allegations were false) to 1.00 (absolutely
certain that the allegations were true).
Horner and Guyer (1991a, 1991b) and Horner et al. (1992) did not require
participants to make a dichotomous decision to either substantiate or not substan-
tiate abuse, so it is difficult to directly compare their results to those of the other
studies discussed in this article. However, the weighted mean for the first proba-
bility estimate (each participant made two judgments of the probability that abuse
had occurred, one directly after the presentation and the second after a group dis-
cussion) can be calculated from their published results as approximately .40 with
an approximate standard deviation of .23. Assuming a roughly normal distribution,
these figures suggest that approximately 65% of the participants judged that the
probability of abuse was less than .50. Let us further assume that, if forced to make a
dichotomous judgment, participants who rated the probability as less than .50 would
not have substantiated abuse and that the remainder would have substantiated. We
can apply the same method used in footnote 3 to estimate the level of interrater
agreement among the 129 judges as κ = .08.

3 Since results for both of the no-merit vignettes were very similar, we collapse the data from these two
vignettes and consider them together. A rough estimate of κ is calculated by using the minimum possible
expected chance agreement rate (assuming all evaluators have the same overall substantiation rate),
p_chance = .50. The proportion of agreement, p_agree, is calculated as the ratio of the number of all possible
consistent pairs of evaluators [(59 choose 2) + (72 choose 2)] to the number of all possible pairs of
evaluators (131 choose 2). For Shumaker (2000), the formula is:

κ ≈ (p_agree − p_chance)/(1 − p_chance) = {[(59 choose 2) + (72 choose 2)]/(131 choose 2) − .50}/(1 − .50) = .00.
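The rough κ estimates obtained with this method can be reproduced with a few lines of code. This is an illustrative sketch only: the function name is hypothetical, and the 84/45 split used for the Horner et al. judges is an assumption derived from the approximately 65%/35% division reported in the text.

```python
from math import comb

def rough_kappa(n_yes, n_no, p_chance=0.50):
    """Rough interrater agreement (kappa) for a single case judged by many raters:
    p_agree is the proportion of rater pairs that agree, and chance agreement is
    fixed at its minimum possible value of .50, as in footnote 3."""
    p_agree = (comb(n_yes, 2) + comb(n_no, 2)) / comb(n_yes + n_no, 2)
    return (p_agree - p_chance) / (1 - p_chance)

# Shumaker (2000): 59 of 131 raters said a no-merit allegation had merit, 72 said it did not
print(round(rough_kappa(59, 72), 2))   # -> 0.0

# Horner et al.: roughly 65% vs. 35% of 129 judges (84 and 45 are assumed counts)
print(round(rough_kappa(84, 45), 2))   # -> 0.08
```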
In Finlayson and Koocher (1991), 269 pediatric psychologists rated how certain
they were on a scale from 0 to 100% that sexual abuse was occurring in four differ-
ent written vignettes describing child sexual abuse allegations. Based on data in the
article it is possible to determine for each vignette the percentage of respondents
that were 0–49% certain that abuse was occurring and the percentage that were 50–
100% certain. Applying the same method used to analyze the data from Horner and
Guyer (1991a, 1991b) and Horner et al. (1992), we can estimate κ for each of the
four vignettes as .94, .51, .05, and .17, M = .42.
Bow, Quinnell, Zaroff, and Assemany (2002) surveyed 84 forensic psycholo-
gists who perform child custody evaluations in cases of allegations of child sexual
abuse. These psychologists had an average of 17 years of clinical experience with
child sexual abuse and had each performed an average of 52 custody evaluations
that included allegations of sexual abuse. They reported an average substantiation
rate of .30. According to Bow (personal communication, 2003), the standard devia-
tion of the self-reported substantiation rates was .22 and the range was from .00 to
1.00. Similarly, in Jackson and Nuttall (1993), 656 clinicians rated their level of be-
lief in the validity of child sexual abuse allegations described in 16 different written
vignettes on a 1 (“very confident sexual abuse did not occur”) to 6 (“very confident
sexual abuse occurred”) scale. For each of the 16 vignettes, the ratings made by the
clinicians ranged from 1 to 6.

Factors that Influence Judgments About Child Sexual Abuse Allegations
Another rough indicator of the validity of clinician judgments about allegations
of child sexual abuse comes from studies that have examined factors that actually
influence clinicians’ substantiation decisions. To the extent that these decisions are
influenced or determined on the basis of nonevidentiary factors that are either log-
ically or empirically unrelated to the truth or falsity of abuse allegations, we should
have less confidence in them. To the extent that these decisions are influenced or de-
termined by factors that are associated with the truth or falsity of abuse allegations,
we should have more confidence in them. An example of a factor that is logically
unrelated to the truth or falsity of an abuse allegation, but that has been shown to
influence clinician judgments about the veracity of abuse allegations, is the clini-
cian’s own abuse history: clinicians who have themselves been sexually abused are
more likely to believe abuse allegations than clinicians who have not been sexually
abused (Jackson & Nuttall, 1993). Other logically unrelated factors associated with
clinicians’ level of belief in child sexual abuse allegations in the study by Jackson and
Nuttall were clinician’s age (younger more likely to believe), gender (women more
likely to believe), and discipline (social workers more likely to believe). Jackson
and Nuttall also found that other factors that have little or no empirical association
with the truth or falsity of abuse allegations—for example, the race of the alleged
perpetrator or the alleged victim—had a strong influence on clinician judgments.
Specifically, their results indicated that clinicians were more likely to believe abuse
allegations when the alleged perpetrator was described as a White American or the
alleged victim was described as an African-American.
Mason (1991, 1998) reviewed 122 appellate court decisions in child sexual abuse
cases and found that MHPs who testified held inconsistent views about what fea-
tures were indicative of sexual abuse. For example, some MHPs testified that re-
traction, recantation, and conflicting reports were characteristic of genuine cases of
sexual abuse, while others testified that children never lie about abuse and that con-
sistency in reports of abuse over time is characteristic of genuine reports. Evidence
for significant inconsistencies in practitioners’ beliefs about child sexual abuse and
how FCSAEs should be conducted has been found in other survey studies (Conte
et al., 1991; Kendall-Tackett & Watson, 1992; Oberlander, 1995).
Other studies indirectly address issues relevant to clinician judgments about
child sexual abuse. For example, many clinicians believe that they are able to de-
tect when children are telling the truth about sexual abuse based on the amount of
detail and other characteristics of children’s abuse narratives (Benedek et al., 1998;
Haskett, Wayland, Hutcheson, & Tavana, 1995; Oberlander, 1995). However, there
is now considerable evidence indicating that even experienced clinicians have virtu-
ally no ability to reliably detect the difference between true and false narratives of
past events told by children who give false reports or develop false memories as a re-
sult of suggestive interviewing (Ceci & Bruck, 1995; Ceci, Huffman, & Smith, 1994;
Ceci, Loftus, Leichtman, & Bruck, 1994; Leichtman & Ceci, 1995). Furthermore, a
significant body of research evidence indicates that, with a few exceptions, most peo-
ple (including most law enforcement personnel and most MHPs) have little ability
to detect motivated lying by adults (Ekman & O’Sullivan, 1991; Ekman, O’Sullivan,
& Frank, 1999) and, furthermore, most people tend to be overconfident about their
ability to detect lying (DePaulo, Charlton, Cooper, Lindsay, & Muhlenbruck, 1997;
Elaad, 2003; Kassin, 2002).

The Reliability and Validity of Clinical Judgment in Related Domains


There are few empirical studies that provide direct information on the relia-
bility, validity, and accuracy of clinical judgments about allegations of child sexual
abuse in FCSAEs. There are other, better studied, domains of forensic judgment
that require skills and knowledge similar to those required to make accurate judg-
ments in FCSAEs. Here are some examples: a large (N = 1,784 families), method-
ologically sound study examined the ability of CPS caseworkers to use clinical judg-
ment to predict the likelihood of future abuse or neglect substantiations in families
they had investigated (Baird & Wagner, 1999, 2000). The judgments of these case-
workers showed very low reliability and validity (κ = .18, φ = .04, v = .10).4 An-
other type of CPS caseworker decision that has a significant overlap with substanti-
ation decisions is the decision about whether or not to place a child in foster care. On
the basis of his review of a number of analogue studies of foster care placement de-
cisions, Lindsey (1992) concluded that an upper-bound estimate for the interrater
reliability of these decisions was .25 (this figure probably represents an interrater
correlation, although this is not entirely clear from the text of the article). Another
related forensic judgment task is the assessment of the risk of recidivism for sex
offenders. A meta-analysis of 10 studies (N = 1,453) found a very small average cor-
relation (r = .10) between clinical predictions of risk and recidivism (Hanson, 1998;
Hanson & Bussiere, 1996).

4 The correlation between caseworker risk assessments and future substantiation was calculated by col-
lapsing the three categories of risk assignments used in this study (Low, Medium, High) into two
(Low/Medium, High).
The reliability, validity, and accuracy of clinical judgment are disappointingly low
even in domains that are closely related to the skills and knowledge we expect clin-
icians to have, for example, the ability to diagnose mental illness, to assess person-
ality traits, or to predict the likelihood of future behavior. Even in these tasks, the
majority of studies have found average interrater reliability to be only fair. For ex-
ample, Garb (1998) reports 85 reliability estimates from 24 different studies of clini-
cal judgments by MHPs. The statistics reported in these studies included κ, intraclass
correlations, and interrater correlations. The mean reliability was .47 (SD = .29).
The relative validity of clinical and actuarial judgments in psychology and medicine
was examined by Grove, Zald, Lebow, Snitz, and Nelson (2000) in a meta-analysis of
136 studies. Of the 83 studies that examined psychosocial diagnoses and predictions
of human behavior rather than medical diagnoses, 34 used a correlation coefficient
to describe the accuracy of clinician judgments. Of the 34 studies that reported cor-
relations, the 4 that examined forensic judgments had an average correlation of .17;
the average correlation for the 30 nonforensic studies was .36.

Summary
Table 1 summarizes reliability and validity data from 76 different studies of clin-
ical judgment in forensic and nonforensic domains.

Table 1. Reliability and Validity of Clinical Judgments

Citation | Reliability | Validity
Clinical judgments about child sexual abuse allegations
Realmuto, Jensen, and Wescoe (1990) | | .04
Finlayson and Koocher (1991) | .42 |
Horner and Guyer (1991a, 1991b) and Horner et al. (1992) | .08 |
Realmuto and Wescoe (1992) | .36 | .08
McGraw and Smith (1992) | .14 |
Shumaker (2000) | .00 |
Clinical judgments in other forensic domains
Lindsey (1992) | .25 |
Hanson and Bussiere (1996) and Hanson (1998) (mean of 10 studies) | | .10
Baird and Wagner (1999, 2000) | .18 | .04
Grove et al. (2000) (mean of 4 studies)a | | .17
Clinical judgments in nonforensic domains
Grove et al. (2000) (mean of 30 studies)a | | .36
Garb (1998) (mean of 85 estimates from 24 studies) | .47 |

Note. Some statistics in this table represent extrapolations from data presented in the cited articles; see
the text of the current article for details. Reliability is estimated by Cohen's kappa, except for Garb
(1998) and Lindsey (1992); see the text of the current article for details. Validity is estimated by the
correlation coefficient, φ or r.
a The Grove et al. (2000) meta-analysis of 136 studies included 30 studies that described the accuracy
of nonforensic clinical judgments in terms of a correlation and 4 studies that described the accuracy of
forensic clinical judgments in terms of a correlation.

The average validity coefficient
for clinician judgments about whether or not a child had been sexually abused in the
two experiments by Realmuto et al. (Realmuto & Wescoe, 1992) was φ = .06. The
average validity in 16 studies of clinical forensic judgments in domains other than
that of child sexual abuse was φ = .11. Note that the average validity for clinical
judgments in the 30 nonforensic studies from the Grove et al. (2000) meta-analysis
was φ = .36, more than three times the average from the 16 forensic studies. Taken
together, these results suggest that the actual overall validity coefficient for clini-
cians’ judgments about the validity of unconfirmed allegations of child sexual abuse
(a forensic domain) falls within the range .05–.25.
Another way to form a rough upper-bound estimate of the validity coefficient
is by looking at reliability. The average interrater reliability in five studies of clini-
cian judgments about unconfirmed allegations of child sexual abuse as estimated by
κ was .20 (see Table 1). The square root of the reliability represents a theoretical
upper bound on the validity coefficient; in this case, φmax = √φxx = √.20 ≈ .45. Note that
it is very unlikely that the actual validity coefficient will attain 100% of its theoret-
ically possible maximum value or, in other words, that v = 1.00. In Realmuto and
Wescoe (1992), v = .13, and in Baird and Wagner (1999, 2000), v = .10. If clinician
judgments in this domain were able to attain as much as 50% of their maximum
potential validity, v = .50, then the validity coefficient would be φ = v × φmax = .23.
Although this somewhat informal analysis does not permit a precise statistical es-
timation of the validity of clinician judgments, it does suggest that the validity co-
efficient is probably less than φ = .25 and is almost certainly less than φ = .50. For
the purposes of this discussion, we use .50 as a liberal upper-bound estimate for the
actual validity coefficient. Note that φ = .50 exceeds all of the validity coefficients
displayed in Table 1.
In order to construct a scenario that will allow us to estimate upper bounds for
clinician accuracy, we also need to estimate (a) the overall substantiation rate in
FCSAEs, and (b) the base rate of true allegations in the population of all cases of
alleged child sexual abuse that are subjected to FCSAEs. The largest relevant stud-
ies are those based on data collected from CPS agencies by the DHHS and sum-
marized in the annual Child Maltreatment reports (DHHS, 2004). These studies
indicate that CPS caseworkers substantiate abuse in about 27% of the FCSAEs that
they perform each year. Smaller, more detailed, studies of substantiation rates in
samples of investigations by CPS caseworkers have yielded significantly higher sub-
stantiation rates: the weighted mean substantiation rate was .53 (range = .43–.63) in
the four smaller studies of CPS investigations of child sexual abuse allegations in-
cluded in Table 2. The weighted mean substantiation rate for five studies of non-CPS
FCSAEs shown in Table 2 was .47 (range = .25–.62). The reason for the discrep-
ancy between the substantiation rate from the large, national CPS surveys and rates
from the smaller studies of CPS and non-CPS investigations is unclear. One possible
explanation is that the national statistics are based on reports from the states and
about 20 states do not distinguish between screened-in and screened-out reports in
calculating substantiation rates (Peddle & Wang, 2001), which means that screened-
out reports may end up being categorized as unsubstantiated, and could thereby
deflate the overall estimate of the substantiation rate (cf. Haskett et al., 1995).
For the purposes of this article, we will use .40 as a rough estimate of the overall
substantiation rate in evaluations of child sexual abuse allegations in the United
States.

Table 2. Substantiation Rates in Forensic Child Sexual Abuse Evaluations

Citation | N | Substantiation rate
National survey of Child Protective Services (CPS) evaluations
Department of Health and Human Services (2004) | 170,000 | .27
Smaller studies of CPS evaluations
Everson and Boat (1989) | 1,249 | .56
Jones and McGraw (1987) | 576 | .53
Oates et al. (2000) | 551 | .43
Haskett et al. (1995) | 175 | .63
Studies of non-CPS evaluations
Elliott and Briere (1994) | 399 | .62
Bowen and Aldous (1999) | 393 | .48
Wakefield and Underwager (1989) | 319 | .39
Keary and Fitzpatrick (1994) | 251 | .52
Drach et al. (2001) | 247 | .25
We have no way of directly estimating the base rate of true allegations. How-
ever, we use the mean substantiation rate as our best available estimate; in other
words, we make the assumption that the true base rate of genuine allegations in
the population of all cases of allegations of child sexual abuse that are investigated
is close to .40. Note that we are not assuming that all substantiation decisions are
correct, just that the rates of false positive and false negative errors are similar. Pre-
cise estimates of the base and substantiation rates, and the numerical equivalence
of false positive and false negative error rates, are not crucial to the arguments pre-
sented here.
Using these estimates, we can calculate accuracy statistics for an Optimistic
Scenario that provides an upper-bound estimate for the overall accuracy of current
clinical judgments in FCSAEs: under the Optimistic Scenario, 24% of all substanti-
ation decisions will be erroneous. If substantiation decisions were made completely
at random (with estimated base and substantiation rates of .40), then we would ex-
pect a chance error rate of 48%. Changes in the assumed substantiation and base
rates would have little impact on the overall error rate estimate: if we chose .30 as
our estimate of the base and substantiation rates, then the error rate would be 21%;
if we chose .50, the error rate would be 25%. Using a lower, more realistic, estimate
of the validity coefficient would have a substantial impact on the estimated error
rate: a validity coefficient of φ = .25 instead of φ = .50 yields an error rate of 36%.
In short, this analysis indicates that the overall accuracy of substantiation decisions
by MHPs in cases of unconfirmed allegations of child sexual abuse is quite low, even
under generous assumptions.
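The error-rate figures in the preceding paragraph can be reproduced by constructing the 2 × 2 table implied by a given validity coefficient and the marginal base and substantiation rates. The following sketch (hypothetical function name; an illustration of the arithmetic under these assumptions, not necessarily the author's exact derivation) yields the same numbers as those reported above.

```python
import math

def implied_error_rate(phi, base_rate, substantiation_rate):
    """Overall proportion of erroneous substantiation decisions implied by a
    validity coefficient (phi) and the marginal base and substantiation rates,
    using the standard 2 x 2 table with fixed marginals and correlation phi."""
    p, q = base_rate, substantiation_rate
    both_yes = p * q + phi * math.sqrt(p * (1 - p) * q * (1 - q))  # correct substantiations
    return p + q - 2 * both_yes  # false negatives plus false positives

print(round(implied_error_rate(0.50, 0.40, 0.40), 2))  # Optimistic Scenario -> 0.24
print(round(implied_error_rate(0.00, 0.40, 0.40), 2))  # decisions made at random -> 0.48
print(round(implied_error_rate(0.50, 0.30, 0.30), 2))  # base/substantiation rates of .30 -> 0.21
print(round(implied_error_rate(0.50, 0.50, 0.50), 2))  # base/substantiation rates of .50 -> 0.25
print(round(implied_error_rate(0.25, 0.40, 0.40), 2))  # more realistic validity -> 0.36
```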
Drawing conclusions from anecdotal evidence is always risky; however, the
above analysis is consistent with an impression derived from media reports that
poorly conducted FCSAEs and errors in substantiation decisions by MHPs have had
dire consequences for many children and adults in numerous criminal cases over the
past 20 years (e.g., Coleman, 1989; Humphrey, 1985; Johnson, 2004; Leslie, 2004;
Nathan & Snedeker, 2001; Rabinowitz, 2004; Rosenthal, 1995; San Diego County
Grand Jury, 1992, 1994; Seattle Post Intelligencer, 1998).

IMPROVING THE QUALITY OF SUBSTANTIATION DECISIONS

In the past, two main approaches have been taken to improving clinician per-
formance in FCSAEs: (a) providing additional training to practitioners, most often
focusing on improving forensic interviewing skills, and (b) creating and promulgat-
ing guidelines for the practice of FCSAEs. The prognosis for achieving substantial
improvements in the performance of the average practitioner of FCSAEs through
additional training is poor: a number of empirical studies have demonstrated that
additional training alone does not result in any significant, sustained improvement in
forensic interviewing skills, even when such training is intensive (Warren & Marsil,
2002). For example, one 10-day seminar on interviewing skills for practitioners of
FCSAEs had no positive impact on overall interviewing skills, and overall interview
performance was actually lower following the 10-day training at one of the research
sites (Stevenson, Leung, & Cheung, 1992). Only when intensive training is combined
with subsequent ongoing, intensive on-the-job supervision and feedback has there
been a measurable positive impact on interviewing skills (Lamb, Sternberg, Orbach,
Esplin, & Mitchell, 2002). However, Lamb et al. found that once the intensive on-
the-job supervision was discontinued, child interviewers tended to revert to poor
interviewing practices. Because of the expense, effort, and the need for permanent
intensive supervision required under the Lamb et al. model, it is unlikely that this
model will ever find widespread application in real-world settings where FCSAEs
take place.
Although it is difficult to say what, if any, effect the creation and promulga-
tion of guidelines for practice (e.g., American Academy of Child and Adolescent
Psychiatry, 1997; American Professional Society on the Abuse of Children, 1997;
Committee on Ethical Guidelines for Forensic Psychologists, 1991; Home Office,
1992; Lamb, 1994) has had on the quality of FCSAEs, a few studies (Davies &
Westcott, 1999; Sternberg, Lamb, Davies, & Westcott, 2001) and observations from
the field (cf. Lawlor, 1998) suggest that the impact of these guidelines on the perfor-
mance of the average practitioner has not been substantial.
One approach that has not yet been tried, but has been shown to improve the
accuracy of clinical judgments in other domains, is the use of objective decision-
making procedures based on actuarial analyses of empirical data. The accuracy of
actuarial decision-making procedures has equaled or surpassed that of clinical judg-
ment in assessing and predicting human behavior across a wide range of domains. In
the Grove et al. (2000) meta-analysis of 136 studies that compared clinical and actu-
arial judgments in psychology and medicine, actuarial methods were more accurate
in approximately 40% of the studies. In approximately 50%, clinical and actuarial
judgments were equivalent in accuracy, and in only about 10% did clinical judgment
outperform actuarial judgment. In the 10 studies in this meta-analysis that compared
actuarial and clinical judgments in forensic contexts, there was an even more marked
trend for the superiority of actuarial over clinical judgment: the mean effect size for
the advantage of actuarial over clinical judgment in the 10 forensic studies was .89;
the mean effect size in the 41 clinical-personality studies was .19.
There are no reliable studies that have compared actuarial and clinical judg-
ments in FCSAEs because, so far, there are no validated actuarial decision
procedures for FCSAEs. However, studies have compared actuarial and clinical
judgment in related forensic domains. In the previously discussed meta-analysis of
10 studies of predictions of recidivism among sex offenders by Hanson (1998) and
Hanson and Bussiere (1996), the average correlation between clinical predictions of
risk and actual recidivism was .10. On the other hand, Hanson found that a sin-
gle variable that was available in some of these studies, number of prior sexual
offenses, correlated .19 with recidivism, and a simple 4-item actuarial instrument
achieved a correlation of .27 with recidivism (Hanson, 1997). In the previously dis-
cussed study of CPS assessments of risk for future child abuse and neglect, Baird
and Wagner (1999, 2000) compared the accuracy of clinical judgment to the accu-
racy of an actuarial method based on a numerical score computed by simple addi-
tion from the Michigan Assessment (Children’s Research Center, 1999), a 12-item
questionnaire that was filled in by caseworkers. Predictions based solely on numer-
ical scores from the Michigan Assessment achieved higher reliability and validity
(κ = .56, φ = .18, v = .24) than did those of clinicians who relied on their own clin-
ical judgment (κ = .18, φ = .04, v = .10). Milner’s Child Abuse Potential Inventory
represents an actuarial approach that has been shown to be useful in the postdiction
and the prediction of the physical abuse of children (Milner, 1994).
There are two general ways in which actuarial methods can be applied to the
modeling of decision-making tasks. If a reliable gold standard exists, then statisti-
cal techniques or neural networks can be applied to existing datasets in order to
develop models that predict the hard outcomes recorded in the datasets. However,
reliable gold standards are absent in many areas of clinical judgment by MHPs. For
example, there are no reliable, objective, medical tests that can be used to diag-
nose most mental illnesses. Nevertheless, actuarially-derived decision procedures in
the form of validated psychological tests and explicit written criteria such as those
found in the Diagnostic and Statistical Manual of Mental Disorders, fourth edition
(American Psychiatric Association, 1994) have found wide application in the diag-
nosis of mental illness. When no gold standard is available, actuarial analyses are
ultimately based on predictions of clinical judgments rather than independent out-
comes or indicators. The clinical judgments that are used as outcomes in the devel-
opment of these models tend to be those of the best clinicians operating under op-
timal conditions; this is one reason why these decision procedures can outperform
the average clinician. Surprisingly, we have known for many years that actuarial
models can also sometimes outperform the very clinicians on whom the models are
based (Blenkner, 1954; Meehl, 1954; National Council on Crime and Delinquency,
2000). This can happen because statistical decision procedures automatically elimi-
nate certain sources of error and bias that affect clinical judgment, one of which is
that clinicians do not consistently weight predictors. Is it possible, then, to develop
an actuarial model to predict substantiation decisions made by the most qualified
practitioners of FCSAEs operating under optimal conditions?

A Simple Predictive Model for High Quality Substantiation Decisions


Keary and Fitzpatrick (1994) examined 251 FCSAEs that took place in a hospi-
tal setting in Ireland. From information provided in the article, it appears that these
FCSAEs were of high quality: the substantiation decisions were made by consensus
among teams of medical and mental health evaluators, children were interviewed no
more than three times, and a deliberate effort was made to adhere to published best
practice standards for child sexual abuse evaluations. Evaluators classified cases into
only two categories: substantiated and unsubstantiated.
Data presented in the article by Keary and Fitzpatrick (1994) can be used to
assess the association between disclosure of abuse by children prior to formal in-
vestigation, disclosure of abuse by children during formal investigative interviews,
and evaluators’ final substantiation decisions. Keary and Fitzpatrick focus on the
association between prior disclosure and disclosure during formal investigation, but
not on the associations between the two disclosure status variables and evaluators’
final substantiation decisions. The current author has reanalyzed the data presented
by Keary and Fitzpatrick in order to assess the accuracy of three different meth-
ods for predicting the substantiation decisions in their dataset. The first prediction
method proposed by the current author uses the Prior Allegation Rule: If the child
made a statement of abuse prior to the onset of formal investigation then substanti-
ate abuse, otherwise do not substantiate abuse. The second proposed method uses
the Investigation Allegation Rule: If the child made a statement of abuse during the
first three formal interviews then substantiate abuse, otherwise do not substantiate
abuse. The third proposed method uses the Consistent Allegation Rule: If the child
made a statement of abuse prior to the onset of formal investigation and also during
the first three formal interviews then substantiate abuse; if the child did not make
a statement of abuse prior to the onset of formal investigation and did not make a
statement of abuse during the first three formal interviews then classify as unsub-
stantiated; otherwise, classify the case as inconclusive.
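For concreteness, the three proposed rules can be expressed as simple classification functions. This is an illustrative sketch only: the function and argument names are hypothetical, and the two Boolean inputs correspond to the two disclosure-status variables described above.

```python
def prior_allegation_rule(statement_before_investigation):
    """Substantiate if the child alleged abuse before the formal investigation began."""
    return "substantiated" if statement_before_investigation else "unsubstantiated"

def investigation_allegation_rule(statement_in_interviews):
    """Substantiate if the child alleged abuse during the first three formal interviews."""
    return "substantiated" if statement_in_interviews else "unsubstantiated"

def consistent_allegation_rule(statement_before_investigation, statement_in_interviews):
    """Substantiate when both statements are present, classify as unsubstantiated
    when both are absent, and classify discrepant cases as inconclusive."""
    if statement_before_investigation and statement_in_interviews:
        return "substantiated"
    if not statement_before_investigation and not statement_in_interviews:
        return "unsubstantiated"
    return "inconclusive"

# A discrepant case: statement before the investigation but none during the interviews
print(consistent_allegation_rule(True, False))   # -> inconclusive
```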
These three proposed rules all achieve relatively high levels of predictive accu-
racy with regard to the final substantiation decisions in Keary and Fitzpatrick (1994).
Table 3 describes the predictive accuracy of each of the three proposed rules with
respect to the Keary and Fitzpatrick dataset.

Table 3. Validity and Accuracy of Three Proposed Rules for Predicting Substantiation Decisions,
As Applied to Data from Keary and Fitzpatrick (1994)

Rule | φ | Error rate | False positive rate | False negative rate
Prior Allegation Rule | .65 | .18 | .15 | .20
Investigation Allegation Rule | .85 | .08 | .06 | .09
Consistent Allegation Rule (all 251 cases)a | .81 | .20 | |
Consistent Allegation Rule (216 consistent cases only)b | .87 | .06 | .02–.07 | .06–.11

Note. The false positive rate refers to the proportion of all cases classified as substantiated by the rule,
but as unsubstantiated by evaluators. The false negative rate is the opposite.
a The 35 discrepant disclosure cases classified by the rule as inconclusive are counted as errors. A
rough value for the correlation is calculated by assigning inconclusive cases a numerical value mid-
way between substantiated and unsubstantiated cases.
b The 35 discrepant disclosure cases are excluded. The false positive and false negative rates cannot be
calculated exactly from the data in the article, but minimum and maximum values can be calculated,
hence the ranges.

The accuracy of the Consistent Allegation Rule cannot be directly compared to that of the other two rules on the whole
dataset because it classifies the 35 (14%) discrepant disclosure cases into a third, in-
conclusive category. Table 3 shows results for the Consistent Allegation Rule when
it is applied to all 251 cases and when it is applied only to the 216 consistent disclo-
sure cases. If the evaluators in Keary and Fitzpatrick had been allowed to classify
some cases as inconclusive, then it is possible that the Consistent Allegation Rule
might have achieved a higher overall hit rate than either of the other two rules. An-
other advantage to the Consistent Allegation Rule as compared with the other two
rules is that it incorporates information about abuse statements made by the child
both prior to and during the formal investigation. Furthermore, an error that results
from classifying as inconclusive a case that evaluators classified as substantiated or
unsubstantiated is a less serious error than an error that results from classifying as
unsubstantiated a case that evaluators classified as substantiated (or vice-versa). For
these reasons, we focus our discussion on the Consistent Allegation Rule.
Support for the hypothesis that the Consistent Allegation Rule is able to accu-
rately predict most substantiation decisions can be found in at least one other study
of FCSAEs. For the subset of 320 cases classified as either abused or nonabused in
Elliott and Briere (1994), the Consistent Allegation Rule would have correctly pre-
dicted 281 (88%) of the 320 substantiation decisions (φ = .74). The two un-
derlying empirical hypotheses encapsulated in the Consistent Allegation Rule are:

(1) The existence of a statement of abuse by the child prior to the onset of
formal investigation and a statement of abuse during the first two to three
formal investigative interviews are characteristic of abuse allegations that
are later substantiated by teams of highly qualified experts operating under
optimal conditions.
(2) The absence of a statement of abuse by the child prior to the onset of formal
investigation and the absence of a statement of abuse during the first two
to three formal investigative interviews are characteristic of abuse allega-
tions that are later classified as unsubstantiated by teams of highly qualified
experts operating under optimal conditions.

A number of other empirical studies support these two hypotheses. Three stud-
ies (Bradley & Wood, 1996; Faller & Henry, 2000; Goodman-Brown, Edelstein,
Goodman, Jones, & Gordon, 2003) examined a total of 751 cases of alleged sexual
abuse. Allegations of abuse made by the child prior to the onset of formal inves-
tigation occurred in 78% of these 751 cases. On the other hand, spontaneous false
allegations of abuse by nonabused children are quite rare. For example, Oates et al.
(2000) found that deliberate false “disclosures” of abuse emanating from children
without prompting or pressure from adults occurred in only 2.5% of the 551 cases
of abuse allegations that they examined. Although there is some controversy on this
issue, reliable studies also suggest that denials of abuse by abused children during
formal interviews are uncommon, despite the belief of many practitioners that ex-
plicit denial of abuse by the child is a common feature of genuine abuse allegations
(Mason, 1998). Bradley and Wood (1996) found initial denials of sexual abuse dur-
ing formal interviews by the child in only 6% of 234 substantiated cases they exam-
ined. Faller and Henry (2000) found that only 4.6% of 323 allegedly abused children
whose cases had been referred for criminal prosecution had initially denied abuse.
These consistent results from methodologically sound studies are in conflict with re-
sults from a smaller study (Sorensen & Snow, 1991). Sorensen and Snow claimed
that 75% of a sample of 116 abused children that they had seen as evaluators or
psychotherapists had denied abuse at some point. However, other researchers have
offered convincing critiques of the methods and objectivity of the Sorensen and
Snow study (Ceci & Bruck, 1995; Poole & Lindsay, 1998).
As it stands, the Consistent Allegation Rule could not be implemented as a
comprehensive procedure for making substantiation decisions in real-world settings be-
cause it fails to deal adequately with a small number of special cases. Modification of
the rule is necessary because (a) in a small number of cases, there will be convincing
external evidence that abuse has occurred, but children will make discrepant disclo-
sures or no disclosure at all (e.g., Lawson & Chaffin, 1992; Sjoeberg & Lindblad,
2002), and (b) in a small minority of cases in which children have made a state-
ment of abuse prior to investigation and during the formal interviews, there may be
convincing external evidence that the allegations are false or that the perpetrator
has not been correctly identified by the child. The data from Keary and Fitzpatrick
(1994) suggest that at most 12% of the 251 cases that they evaluated fell into the
first category and at most 3% fell into the second category. Elliott and Briere (1994)
found that the presence of external evidence of abuse combined with no disclosure of
abuse or a discrepant disclosure of abuse occurred in only 10% of 399 FCSAEs they
reviewed. To mechanically classify as unsubstantiated the 10–15% of all cases in
which the child has made no statement of abuse or has made discrepant statements,
but in which there is strong external evidence that abuse has occurred, would be
offensive to reason and justice. The Modified Consistent Allegation Rule in Fig. 1
allows for the rational and flexible handling of these relatively uncommon cases. Be-
cause of these modifications, it is possible that the Modified Consistent Allegation
Rule could achieve an even higher level of predictive accuracy than the Consistent
Allegation Rule.

Fig. 1. The Modified Consistent Allegation Rule.

SUMMARY, RECOMMENDATIONS, AND CONCLUSION

An analysis of existing empirical research indicates that clinician judgments
about the validity of unconfirmed allegations of child sexual abuse in FCSAEs have
low levels of reliability, validity, and accuracy. The average interrater reliability as
estimated by Cohen’s kappa was .20 in five studies of clinical judgments about alle-
gations of child sexual abuse (Table 1). Kappas below .40 are generally considered
poor, values from .40 to .59 fair, .60 to .74 good, and .75 or above excellent (Garb,
1998). Although other researchers have already clearly articulated the serious im-
plications of the problem of the low reliability of these judgments (e.g., Benedek
et al., 1998; Jackson & Nuttall, 1993; Poole & Lindsay, 1998), this article is the first
attempt to use reliability and validity data derived from empirical studies to form a
quantitative upper-bound estimate for validity and accuracy in this domain.
The empirical data on reliability and validity are quite consistent across a num-
ber of studies and, taken together, suggest that (a) the average validity coefficient
for clinician substantiation decisions in cases of unconfirmed allegations of child sexual
abuse probably falls somewhere in the range of .05–.25, and (b) it is very unlikely
that the validity coefficient exceeds .50. Using .50 as a liberal upper-bound estimate,
and with reasonable estimates for the base and substantiation rates, we determined
that at least 24% of all substantiation decisions in cases of unconfirmed allegations
of child sexual abuse are erroneous. This translates into at least 25,000 erroneous
substantiation decisions (false positives and false negatives) per year by CPS case-
workers and thousands of additional errors by non-CPS evaluators.
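
The arithmetic behind a lower bound of this kind can be sketched as follows; the base
rate and substantiation rate in the example are placeholders chosen to illustrate the
calculation, not necessarily the estimates relied on above.

    import math

    def min_error_rate(phi_max, base_rate, subst_rate):
        # Lower bound on the overall error rate implied by an upper bound (phi_max)
        # on the correlation between true abuse status and the decision to substantiate.
        p, q = base_rate, subst_rate
        # Largest possible probability that a case is both truly abusive and substantiated.
        p_both = p * q + phi_max * math.sqrt(p * (1 - p) * q * (1 - q))
        # Errors are true-but-unsubstantiated plus untrue-but-substantiated cases.
        return p + q - 2 * p_both

    # Placeholder rates for illustration; phi_max = .50 is the liberal upper bound above.
    print(round(min_error_rate(phi_max=0.50, base_rate=0.50, subst_rate=0.50), 2))  # 0.25
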
The estimated minimum value of 24% for the error rate in current substantia-
tion decisions in FCSAEs that involve unconfirmed allegations of child sexual abuse
casts doubt on the utility of these decisions in clinical contexts and on their admis-
sibility in legal contexts. The Daubert decision of the United States Supreme Court
(Daubert v. Merrell Dow Pharmaceuticals, Inc., 1993) is now a controlling precedent
in Federal courts and in about half of all state courts (Huron Consulting Group,
2004). Daubert makes explicit reference to the magnitude of the known or potential
error rate as one of the factors to be considered in determining the admissibility of
expert witness testimony. Daubert does not specify a cutoff value for the
error rate that would make a diagnostic test inadmissible. This is not a deficiency
in Daubert, since no single cutoff can or should be specified: expert evidence that is
accurate enough to be admissible for one purpose or under one standard of proof
may not be accurate enough to be admissible for another purpose or under another
standard of proof. Deciding when the error rate of a diagnostic test becomes too
high to permit the results of the test to be admitted as legal evidence is a contextual
sociolegal value judgment that must be resolved by policy makers and the courts
and cannot be resolved by scientific debate or empirical evidence. However, empir-
ical analyses can contribute to sociolegal deliberations by clarifying the error rates
and the likely consequences of different types of accurate and inaccurate judgments
(Horner & Guyer, 1991a). Legal decision makers might be more likely to prohibit
the admission of expert evidence that is directly or indirectly derived from a substan-
tiation decision by a MHP if they were informed that such testimony is based on a
diagnostic procedure that has an overall error rate that is almost certainly greater
than 24%.
A further problem for legal decision makers is that, although there are almost
certainly some MHPs who are more accurate than others, our ability to reliably dis-
tinguish between more and less accurate experts is quite limited. Extensive research
has demonstrated the inadequacy of some traditional measures—for example, years
of clinical experience and experts’ confidence in their opinions—for the assessment
of experts’ accuracy across numerous domains of clinical judgment (Garb, 1989,
1998). The counterintuitive finding that years of clinical experience has no mea-
surable association with accuracy in certain domains is largely due to the fact that
MHPs almost never receive reliable feedback on the accuracy of certain types of
judgments, for example, judgments about unconfirmed allegations of child sexual
abuse, which makes it difficult or impossible for them to learn from experience in
these particular domains (Ziskin, 1995). Even academic and professional training
appears to have only a weak association with clinical expertise and accuracy across
a range of clinical judgment tasks (Garb, 1998). In sum, apart from the difficult-to-
gauge advantage conferred by advanced professional and academic training, legal
decision makers have few reliable methods for judging whether an individual ex-
pert’s error rate is likely to be significantly higher or lower than average (Horner &
Guyer, 1991a; Ziskin, 1995).
Despite the problem of high error rates, doubts about the legal admissibility
of evidence based on substantiation decisions, and the difficulties of distinguishing
between accurate and inaccurate experts, cases that involve substantiation decisions
by MHPs will continue to be adjudicated in many legal contexts, and legal decision
makers are going to continue to have to evaluate the accuracy of these substantia-
tion decisions. In some cases, decision makers are faced with conflicting expert opin-
ions about abuse allegations. When evaluations and evaluators must be evaluated,
legal decision makers should focus primarily on the details of how an evaluation was
conducted rather than on the years of clinical experience or the level of confidence that
the expert expresses in his or her opinion (see Benedek et al., 1998). If a particu-
lar evaluation was conducted according to available best practice standards and has
most of the features described below under Recommendations for Clinical Practice,
then it is probably more likely to be accurate than an evaluation that does not meet
these standards. An evaluator’s substantiation rate may also provide information
about accuracy: if an individual evaluator has a substantiation rate that is very low,
say less than 20%, the probability of false negatives by that evaluator will be higher
than average; if an individual evaluator has a substantiation rate that is very high,
say greater than 80%, the probability of false positives will be higher than average.
Unfortunately, all of these methods for evaluating the accuracy of evaluators and
their evaluations are imprecise. One of the main conclusions of the current analysis
is that we need to find better ways to make these substantiation decisions in the first
place.
The current finding of low overall accuracy in clinician judgments about uncon-
firmed allegations of child sexual abuse is consistent with the almost universal con-
sensus among top scientific experts that these evaluations currently have no firm sci-
entific basis. However, somewhat contrary to the prevailing consensus, the current
analysis also suggests that there may be a scientific basis for creating empirically-
grounded objective procedures for making substantiation decisions. One procedure
proposed by the author, the Modified Consistent Allegation Rule, correctly predicts
80–100% of the 251 substantiation decisions made in Keary and Fitzpatrick (1994)
and 88–100% of the 320 definitive substantiation decisions in Elliott and Briere
(1994). The predictive accuracy is given as a range because of flexibility in the
handling of the approximately 10–20% of cases that the Rule classifies
into the “inconclusive” category.
As one anonymous reviewer pointed out, the current analysis does not conclu-
sively prove that the adoption of the Modified Consistent Allegation Rule, or some
other similar rule, would actually improve the accuracy of substantiation decisions
with respect to reality, since we have no reliable gold standard that we can use to di-
rectly test the accuracy of substantiation decisions predicted by the rule. The current
analysis shows only that we may be able to predict substantiation decisions that are
made by the most qualified experts operating under optimal conditions. However,
there are a number of arguments that do support the hypothesis that the universal
adoption of this rule would improve the overall accuracy of substantiation decisions:

(1) The adoption of this rule would automatically eliminate some well under-
stood sources of bias and error in clinician judgments, for example, the bias
towards substantiation by clinicians who have themselves been victims of
sexual abuse (Jackson & Nuttall, 1993). As Berliner and Conte (1993) point
out, “professional bias, whether it results from the psychological needs of
the professional, the commercial sale of professional opinion as in some
expert testimony, or other factors appears to be a significant problem [in
FCSAEs]” (p. 112); also see Benedek et al. (1998).
(2) Past studies have found that, although clinicians may be able to correctly
identify the factors that predict outcomes in certain judgment tasks, their
accuracy often suffers because they do not apply consistent weightings to
predictors (Baird & Wagner, 1999, 2000; National Council on Crime and
Delinquency, 2000).
(3) In many past studies, the accuracy of judgments made according to actuar-
ial rules has equaled or exceeded that of clinical judgments in almost every
domain, and the advantage of actuarial rules may be even greater in foren-
sic domains (Grove et al., 2000).
(4) The proposed rule predicts substantiation decisions made under optimal
conditions by the most highly qualified professionals working in teams. It is
reasonable to assume that the accuracy of these decisions will exceed that
of individual practitioners who may be more subject to biases and may lack
adequate professional preparation.
(5) The proposed rule is consistent with one commonsense approach to eval-
uating abuse allegations (a child who says he was abused is more likely to
have been abused than one who does not say he has been abused or denies
abuse) and, by focusing on abuse statements made during the first three
formal interviews, it is consistent with the perception of some of the top ex-
perts in the field that abuse allegations that are made by children only after
many interviews by MHPs and others who suspect that abuse has occurred
are less likely to be valid (Benedek et al., 1998; Bruck, Ceci, & Hembrooke,
1998; Poole & Lindsay, 1998).
(6) The strict application of the Modified Consistent Allegation Rule could
very possibly have prevented numerous costly fiascos that have wreaked
havoc in the lives of many children and adults, for example, the day-care
sex abuse scandals of McMartin (Coleman, 1989), Little Rascals (Bruck,
1998), and Wee Care (Rosenthal, 1995). Besides the costs in human suffer-
ing, these fiascos are expensive: the McMartin prosecution took 6 years and
cost California taxpayers $15 million, making it the most expensive prose-
cution in history (Fukurai & Butler, 1994). In these three cases, a total of
approximately 600 children made what were almost certainly false allega-
tions of abuse. The strict application of the proposed rule would probably
have correctly classified all or almost all of the false allegations in these
cases because, with one possible exception in the Little Rascals case, not
a single one of these 600 children made a clear statement of sexual abuse
prior to intensive questioning by adults involved in the investigations.

Recommendations for Research


Replication and Enhancement of the Predictive Model
Although the analysis of existing research in this article provides a strong ar-
gument for the immediate adoption of some version of the Modified Consistent
Allegation Rule as a rational basis for making substantiation decisions, it is not
likely that this Rule, or some variant, will be widely adopted until the findings
that underlie the current analysis are convincingly replicated. It is also possible
that predictive accuracy could be increased through the inclusion of additional psy-
chosocial predictor variables. For example, some researchers and clinicians have
suggested that allegations of abuse may be more likely to be true when the child’s
initial disclosure of abuse is made spontaneously (either accidentally or intention-
ally), without intensive prior questioning by suspicious adults (cf. Bruck et al., 1998;
Gardner, 1995, 1998; Poole & Lindsay, 1998). The data from Keary and Fitzpatrick
(1994) and Elliott and Briere (1994) do not address the issue of the spontaneity of the
child’s initial abuse allegation, and this issue is not addressed in the current version
of the Modified Consistent Allegation Rule.
There are other psychosocial variables that should also be investigated in future
research. For example, the presence of a custody dispute in cases of alleged sexual
abuse has shown a weak to moderate association with subsequent decisions not to
substantiate sexual abuse in a number of studies that allow for direct comparisons of
custody and noncustody cases drawn from the same clinical populations (Bowen &
Aldous, 1999; Haskett et al., 1995; Paradise, Rostain, & Nathanson, 1988; Trocmé
et al., 2001; Wakefield & Underwager, 1989). These studies reviewed a combined
total of 1,645 cases of alleged child sexual abuse of which 250 included a custody
dispute component. The overall substantiation rate for cases with a custody dis-
pute component was 18% versus 49% for the noncustody cases (φ = .22). There are
numerous other psychosocial variables that may provide some small amount of in-
cremental validity, for example, the age of the alleged perpetrator; the relationship
of the alleged perpetrator to the alleged victim; prior investigations or substantia-
tions of abuse for either the alleged victim or the alleged perpetrator; the content
of the abuse allegations (bizarre or nonbizarre allegations); and certain aspects of
the psychosocial history of the alleged perpetrator (e.g., substance abuse, violence,
impulsive behavior), the alleged victim (e.g., past history of true or false allegations
of abuse), and the adult who is alleging that abuse occurred (e.g., signs of men-
tal illness, especially paranoid symptoms, cf. Benedek et al., 1998; Gardner, 1995;
Rogers, 1992). The incremental validity of these and other variables over the Modi-
fied Consistent Allegation Rule may be small because of (a) a ceiling effect—the
Modified Consistent Allegation Rule already appears to account for much of the
variance in substantiation decisions—and (b) the likelihood that other predictor
variables are correlated with the two main predictors already incorporated into the Modified
Consistent Allegation Rule.
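
As an approximate check on the aggregate association between custody disputes and
nonsubstantiation reported above, the φ of .22 can be recovered from the combined
figures; the cell counts below are rounded from the reported percentages and totals.

    import math

    # 250 custody-dispute cases at an 18% substantiation rate; 1,395 remaining
    # cases at a 49% substantiation rate (counts rounded from the percentages).
    custody_sub, custody_unsub = 45, 205
    other_sub, other_unsub = 684, 711

    num = custody_sub * other_unsub - custody_unsub * other_sub
    den = math.sqrt((custody_sub + custody_unsub) * (other_sub + other_unsub) *
                    (custody_sub + other_sub) * (custody_unsub + other_unsub))
    print(round(abs(num) / den, 2))  # approximately 0.22
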
Research designed to replicate and extend the current analysis is urgently
needed and should proceed by (a) creating a comprehensive list of candidate psy-
chosocial predictor variables, (b) developing and testing a questionnaire designed
to record important features of FCSAEs, including the values of specific predictor
variables and outcomes, (c) selecting field research sites where high quality FCSAEs
are being conducted, (d) performing record reviews of completed FCSAEs at the se-
lected sites using the previously developed questionnaire to record data, (e) using
statistical techniques and possibly neural networks (e.g., Marshall & English, 2000)
to develop and cross validate predictive models, (f) expanding the Modified Consis-
tent Allegation Rule as needed, or creating another decision procedure, preferably
one that could be reduced to an easily disseminated written format, and (g) planning
for the dissemination of the decision procedure by seeking endorsements from sci-
entific and legal experts in the field in order to provide policy makers and profes-
sional organizations with arguments and incentives for recommending or mandating
the adoption of the decision procedure.
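
Step (e) of this research program could be prototyped with standard statistical software.
The sketch below assumes a hypothetical dataset coded from record reviews and uses
cross-validated logistic regression to estimate how well a set of candidate predictors
reproduces evaluators' final decisions; it is illustrative only and implies nothing about
which predictors would survive validation.

    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    def cross_validated_accuracy(X, y):
        # X: one row per completed FCSAE with candidate predictors (e.g., prior
        #    disclosure, disclosure in the first interviews, custody dispute present).
        # y: the evaluators' final substantiation decision (0/1).
        model = LogisticRegression(max_iter=1000)
        scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
        return scores.mean()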

Precise Estimates of Clinician Accuracy


The current analysis has relied on a somewhat informal method for estimat-
ing a liberal upper bound on the accuracy of clinical judgments about child sexual
abuse evaluations. The method employed here cannot be used to form a precise sta-
tistical estimate of clinician accuracy complete with a confidence interval. Research
designed to provide more precise estimates of the accuracy of clinician judgments
about allegations of child sexual abuse would be desirable. The application of latent
class analyses may make it possible to form more precise estimates of the accuracy
of clinical judgments about allegations of child sexual abuse, even in the absence of
a reliable gold standard (cf. Faraone & Tsuang, 1994; Hui & Xiao, 1998; Weller &
Mann, 1997). These models work by comparing decisions made by multiple raters
in response to the same series of cases. An additional benefit is that these models
allow for the estimation of the accuracy of specific raters, which could help us form
a more precise idea of the advantage in accuracy conferred by advanced academic
and professional training. It should be noted that there are certain insurmountable
problems with the applicability of at least some latent class models to the analysis
of clinical judgments about child sexual abuse. Specifically, it will be impossible to
achieve both high ecological validity and full statistical independence among clini-
cians’ judgments when multiple clinicians are evaluating the same cases (for a cogent
discussion of a similar problem with respect to research on the polygraph see Saxe
& Ben-Shakhar, 1999).
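
To make the latent class approach concrete, the following is a minimal sketch of an
expectation-maximization estimator for a two-class model with conditionally independent
binary raters; it is not drawn from the specific models in the papers cited above, and,
as just noted, the conditional independence assumption is difficult to satisfy in this
domain.

    import numpy as np

    def two_class_lca(y, n_iter=200, seed=0):
        # y: (n_cases, n_raters) array of 0/1 substantiation decisions on the same cases.
        # Returns estimated prevalence of the latent "abuse" class, each rater's hit
        # rate P(substantiate | abuse), and false-alarm rate P(substantiate | no abuse).
        y = np.asarray(y, dtype=float)
        rng = np.random.default_rng(seed)
        pi = 0.5
        hit = rng.uniform(0.6, 0.9, size=y.shape[1])
        fa = rng.uniform(0.1, 0.4, size=y.shape[1])
        for _ in range(n_iter):
            # E-step: posterior probability that each case belongs to the abuse class.
            like1 = pi * np.prod(hit ** y * (1 - hit) ** (1 - y), axis=1)
            like0 = (1 - pi) * np.prod(fa ** y * (1 - fa) ** (1 - y), axis=1)
            w = like1 / (like1 + like0)
            # M-step: update prevalence and per-rater response probabilities.
            pi = w.mean()
            hit = (w[:, None] * y).sum(axis=0) / w.sum()
            fa = ((1 - w)[:, None] * y).sum(axis=0) / (1 - w).sum()
        return pi, hit, fa
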
A well constructed study using latent class models to estimate the accuracy of
clinical judgments about allegations of child sexual abuse would probably have to
rely on written case vignettes or extracts from actual case files, possibly with the
addition of video or audiotape recordings. If the cases are carefully chosen (or con-
structed), using a research design similar to that of Jackson and Nuttall (1993), this
research could also be used to examine associations between predictor variables
and substantiation decisions and could help to cross validate findings from the pre-
viously proposed field studies. A study of this type might also be used to evaluate
the effects of case variables such as the age or race of the child on the accuracy
of evaluator judgments. There is another interesting question about the nature of
MHPs’ judgments about child sexual abuse allegations that could be addressed in a
study of this type: do most clinicians tend to mentally classify allegations into two
categories (substantiated and unsubstantiated) or three categories (substantiated,
indicated, and unsubstantiated), or do some or all clinicians tend to think of the va-
lidity of abuse allegations in terms of a continuum of probability? This issue could
be studied empirically by asking evaluators to classify abuse allegations into
either two or three categories and then also asking them to assign a numerical prob-
ability rating indicating their level of certainty (0–100%) that the allegations are
true.

Recommendations for Clinical Practice


The main recommendation that follows from the current analysis is that the re-
finement, cross validation, dissemination, and mandated adoption of an objective,
empirically-grounded procedure for making substantiation decisions in FCSAEs is
an urgent research and policy priority. This will take time. In the interim, CPS case-
workers and other MHPs are probably making thousands of erroneous substanti-
ation decisions each year. While we await the rationalization of decision-making
procedures in this domain, we have a professional and ethical responsibility to try
to (a) decrease these errors, and (b) reduce secondary trauma inflicted on children
and families as a result of poorly conducted FCSAEs. We can work towards this
goal by finding better ways to encourage and enforce closer adherence to existing
published guidelines, ethics codes, consensus statements, relevant legal precedents
and rules, and recommendations resulting from empirical research. A comprehen-
sive review of this literature and research is beyond the scope of this article. How-
ever, there are a number of relatively simple principles that can be extracted from
published guidelines and empirical research that, if they could be more widely im-
plemented, would most likely lead to a reduction in the risk of harm to the public.
These principles and guidelines can be divided into two categories: (a) practices that
could be immediately implemented in all or almost all FCSAEs at little or no cost,
and (b) principles that are too costly or impractical to implement in all settings, but
which should be followed whenever possible.

Recommendations that Could be Immediately Implemented at Little or No Cost


(1) Children should be interviewed no more than two to three times, and
at least two interviews are preferred to a single interview (American
Academy of Child and Adolescent Psychiatry, 1997; Quinn, 2002). In a
chapter on FCSAEs in the key text in the field of forensic child psychia-
try, Quinn (2002) writes: “Having more than three interviews increases the
chance that the child will feel coerced into elaborating the story or may in-
crease the degree of contamination of the child’s story by the interviewer.”
(p. 155) However, see the study by Carnes, Nelson-Gardell, Wilson, and
Orgassa (2001), which suggests that, in a small percentage of cases, as many
as five, but no more than five, child interviews may be necessary.
(2) All forensic child interviews should be electronically recorded from start
to finish, preferably on videotape, because research has shown that inter-
viewers have poor recall for what was actually said in forensic interviews
(Benedek et al., 1998; Berliner & Lieb, 2001; Ceci & Bruck, 2000; Pathak
& Thompson, 1999; Warren & Woodall, 1999).
(3) The roles of therapist and forensic evaluator should be kept separate and
should never be performed by the same individual except under very rare
circumstances, for example, in isolated communities where only one MHP
is available (Committee on Ethical Guidelines for Forensic Psychologists,
1991; Shuman, Greenberg, Heilbrun, & Foote, 1998; Strasburger, Gutheil,
& Brodsky, 1997).
(4) Structured interview protocols should be used in all child interviews (Poole
& Lamb, 1998; Sternberg, Lamb, Orbach, Esplin, & Mitchell, 2001; Walker,
2002) even if training in their use is not practical or immediately available.
The National Institute of Child Health and Human Development Protocol
for Investigative Interviews of Alleged Sex-abuse Victims (a copy of this pro-
tocol is available in Orbach, Hershkowitz, Lamb, Esplin, & Horowitz, 2000)
is sufficiently simple and self-explanatory to be used as is without provid-
ing interviewers with any additional training. For advanced practitioners,
Poole and Lamb (1998) offer a valuable, detailed description of how to
conduct a protocol-based child interview.
(5) Practitioners should adhere more closely to already published practice
guidelines and ethical standards for FCSAEs and forensic mental health
assessment in general (e.g., American Academy of Child and Adolescent
Psychiatry, 1997; American Professional Society on the Abuse of Children,
1997; Heilbrun, 2001). Licensed MHPs should adhere closely to the ethical
standards of their professions and to ethical standards for forensic mental
health practice (e.g., Committee on Ethical Guidelines for Forensic Psy-
chologists, 1991; Kuehnle, 1998). These guidelines and standards should be
digested, adapted, summarized, and disseminated nationally for use in CPS
agencies and other institutional settings where FCSAEs are conducted.
(6) There are three key texts that should be required reading for practitioners
of FCSAEs: Jeopardy in the Courtroom: A Scientific Analysis of Children’s
Testimony (Ceci & Bruck, 1995), Investigative Interviews of Children: A
Guide for Helping Professionals (Poole & Lamb, 1998), and Expert Wit-
nesses in Child Abuse Cases: What Can and Should Be Said in Court (Ceci
& Hembrooke, 1998). These three books are a starting point; there are
many other books, articles, and legal precedents with which practitioners
of FCSAEs who testify in child abuse cases should be familiar.5 The prin-
ciples for practice that are stated or implied in these books, articles, rules
and decisions should be digested, adapted, summarized, and disseminated
for use in CPS agencies and other settings where FCSAEs are conducted.

5 For example, Benedek et al. (1998), Bruck and Ceci (1995), Bruck et al. (1998), Bruck, Ceci, and
Hembrooke (2002), Campbell (1998), Daubert v. Merrell Dow Pharmaceuticals, Inc. (1993), Dawes et al.
(1989), Faust and Ziskin (1988), Federal Rules of Evidence (2004), Fincham et al. (1994), Frye v. United
States (1923), Garb (1998), Gardner (1995, 1998), General Electric Co. v. Joiner (1997), Heilbrun (2001),
Horner and Guyer (1991a, 1991b), Horner et al. (1992, 1993), Idaho v. Wright (1990), Jackson and
Nuttall (1993), Kuehnle (1996), Kumho Tire Co., Ltd. v. Carmichael (1999), Melton and Limber (1989),
Melton, Petrila, Poythress, and Slobogin (1997), Poole and Lindsay (1998), Quinn (2002), Wood (1996),
and Ziskin (1995).

Recommendations that Should be Implemented Whenever Possible


(1) In civil and criminal sexual abuse cases requiring courtroom testimony by
mental health experts, court-appointed experts should be used more of-
ten and the reliance on mental health experts hired by one side or the
other should be avoided when possible (Grove, Barden, Garb, & Lilienfeld,
2002). The American Association for the Advancement of Science (n.d.)
has undertaken an impressive project that is designed to facilitate the in-
creased use of court-appointed scientific experts in civil and criminal cases.
(2) Forensic interviews in FCSAEs should be conducted by—or at least super-
vised by—a qualified doctoral-level forensic psychologist or social worker,
or by a forensic psychiatrist (cf. American Academy of Child and Adoles-
cent Psychiatry, 1997).
(3) FCSAEs should be conducted by multidisciplinary teams that include at
least one forensically-trained, doctoral-level MHP or forensic psychiatrist
and at least one medical professional who is an expert in the medical diag-
nosis of child sexual abuse. The presence of CPS caseworkers, police inves-
tigators, and prosecuting attorneys on the team may also be desirable (cf.
Governor’s Task Force on Children’s Justice, 1998).
A project that would probably benefit many children and adults who become
entangled in cases of child sexual abuse allegations would be the creation of a short,
cogent, consensus summary of the scientific findings and relevant legal precedents
specifically designed to help frontline legal decision makers make more scientifically
informed decisions. What is needed is a shorter and more focused version of the
excellent, but lengthy and hard to find, Legal and Mental Health Perspectives on
Child Custody Law: A Deskbook for Judges by Benedek et al. (1998). Ideally, as
in the case of Benedek et al., this summary should be prepared as the consensus
statement of a committee of top scientific and legal experts and distributed via the
Web and in print to legal decision makers and practitioners of FCSAEs.

CONCLUSION

Many practitioners of FCSAEs lack adequate professional preparation and cur-
rently perform poorly as data collectors, data interpreters, and as communicators
and educators of legal decision makers. Furthermore, substantiation decisions in
FCSAEs currently lack a firm scientific foundation. Achieving closer adherence to
already available best practice guidelines and other principles derived from empir-
ical research is a desirable goal, but existing standards do not adequately address a
key reason for the currently high error rates in FCSAEs: the reliance on unassisted
clinical judgment for making substantiation decisions. Consequently, closer adher-
ence to current best practice standards, even if it could be achieved, might not lead
to significant reductions in decision errors. The approach taken here is a construc-
tive one of (a) acknowledging that substantiation decisions are going to continue to
be made in large numbers by MHPs, and (b) proposing a method for improving the
consistency, quality, and, probably, the accuracy of these decisions.
The current analysis suggests that the best hope for achieving significant im-
provements in the overall quality of FCSAEs does not lie with voluntary changes
in behavior by practitioners or with that attractive—but expensive and ineffective—
panacea, “more training,” but rather with the dissemination, adoption, and enforce-
ment of clear practice guidelines and, especially, with the creation and widespread
adoption of an empirically-grounded, objective procedure for making substantia-
tion decisions. Only top-down, systemic changes endorsed by scientific and legal
experts and mandated by the courts, government policy makers, licensing boards,
and professional organizations stand a reasonable chance of significantly reducing
the current substantial risks to the public from poorly conducted FCSAEs.

REFERENCES

Abrams, S. (1995). False memory syndrome vs. total repression. Journal of Psychiatry and Law, 23, 283–
293.
American Academy of Child and Adolescent Psychiatry. (1997). Practice parameters for the forensic
evaluation of children and adolescents who may have been physically or sexually abused. Journal of
the American Academy of Child and Adolescent Psychiatry, 36, 423–442.
American Association for the Advancement of Science. (n.d.). Court appointed scientific experts: A
demonstration project of the American Association for the Advancement of Science. Retrieved July
7, 2004 from http://www.aaas.org/spp/case/case.htm
American Professional Society on the Abuse of Children. (1997). Psychosocial evaluation of suspected
sexual abuse in children (2nd ed.). Chicago: Author.
American Psychiatric Association. (1994). Diagnostic and statistical manual of mental disorders (4th ed.).
Washington, DC: Author.
Baird, C., & Wagner, D. (1999). Risk assessment in Child Protective Services: Consensus and actuarial
model reliability. Child Welfare, 78, 723–737.
Baird, C., & Wagner, D. (2000). The relative validity of actuarial- and consensus-based risk assessment
systems. Children and Youth Services Review, 22, 839–871.
Benedek, E. P., Derdeyn, A. P., Effron, E. J., Guyer, M. J., Hayden, K. S., Jurow, G. L., et al. (1998).
Legal and mental health perspectives on child custody law: A deskbook for judges. St. Paul, MN: West
Group.
Berliner, L. (1998). The use of expert testimony in child sexual abuse cases. In S. J. Ceci & H. Hembrooke
(Eds.), Expert witnesses in child abuse case: What can and should be said in court (pp. 11–27).
Washington, DC: American Psychological Association.
Berliner, L., & Conte, J. R. (1993). Sexual abuse evaluations: Conceptual and empirical obstacles. Child
Abuse and Neglect, 17, 111–125.
Berliner, L., & Lieb, R. (2001). Child sexual abuse investigations: Testing documentation meth-
ods. Olympia, WA: Washington State Institute for Public Policy. Retrieved July 1, 2004 from
http://www.wsipp.wa.gov/rptfiles/pilotprojects.pdf
Bernet, W. (1997). Case study: Allegations of abuse created in a single interview. Journal of the American
Academy of Child and Adolescent Psychiatry, 36, 966–970.
Besharov, D. J. (1994). Responding to child sexual abuse: The need for a balanced approach. Fu-
ture of Children, 4, 135–155. Retrieved July 1, 2004 from http://www.futureofchildren.org/usr doc/
Vol4no2art8.pdf
Blenkner, M. (1954). Predictive factors in the initial interview in family casework. Social Science Review,
28, 65–73.
Bow, J. N., Quinnell, F. A., Zaroff, M., & Assemany, A. (2002). Assessment of sexual abuse allegations
in child custody cases. Professional Psychology: Research and Practice, 33, 566–575.
Bowen, K., & Aldous, M. B. (1999). Medical evaluation of sexual abuse in children without disclosed or
witnessed abuse. Archives of Pediatric and Adolescent Medicine, 153, 1160–1164.
Bradley, A. R., & Wood, J. M. (1996). How do children tell? The disclosure process in child sexual abuse.
Child Abuse and Neglect, 20, 881–891.
Bruck, M. (1998). The trials and tribulations of a novice expert witness. In S. J. Ceci & H. Hembrooke
(Eds.), Expert witnesses in child abuse case: What can and should be said in court (pp. 85–104).
Washington, DC: American Psychological Association.
Bruck, M., & Ceci, S. J. (1995). Amicus brief for the case of State of New Jersey v Michaels presented by
Committee of Concerned Social Scientists. Psychology, Public Policy, and Law, 1, 272–322.
Bruck, M., Ceci, S. J., & Hembrooke, H. (1998). Reliability and credibility of young children’s reports.
From research to policy and practice. American Psychologist, 53, 136–151.
Bruck, M., Ceci, S. J., & Hembrooke, H. (2002). The nature of children’s true and false narratives. De-
velopmental Review, 22, 520–554.
Campbell, T. W. (1998). Smoke and mirrors: The devastating effect of false sexual abuse claims. New
York: Insight Books/Plenum Press.
Carnes, C. N., Nelson-Gardell, D., Wilson, C., & Orgassa, U. C. (2001). Extended forensic evaluation
when sexual abuse is suspected: A multisite field study. Child Maltreatment, 6, 230–242.
Ceci, S. J., & Bruck, M. (1995). Jeopardy in the courtroom: A scientific analysis of children’s testimony.
Washington, DC: American Psychological Association.
Ceci, S. J., & Bruck, M. (2000). Why judges must insist on electronically preserved recordings of child
interviews. Court Review, 37, 10–12.
Ceci, S. J., & Hembrooke, H. (Eds.). (1998). Expert witnesses in child abuse cases: What can and should
be said in court. Washington, DC: American Psychological Association.
Ceci, S. J., Huffman, M. L. C., & Smith, E. (1994). Repeatedly thinking about a non-event: Source misat-
tributions among preschoolers. Consciousness and Cognition: An International Journal, 3, 388–407.
Ceci, S. J., Loftus, E. F., Leichtman, M. D., & Bruck, M. (1994). The possible role of source misattri-
butions in the creation of false beliefs among preschoolers. International Journal of Clinical and
Experimental Hypnosis, 42, 304–320.
Children’s Research Center. (1999). The improvement of child protective services with structured decision
making: The CRC model. Madison, WI: Author. Retrieved October 25, 2003 from http://www.nccd-
crc.org/crc/crc sdm book.pdf
Coleman, L. (1989). Learning from the McMartin hoax. Issues in Child Abuse Accusations, 1. Retrieved
October 25, 2003 from http://www.ipt-forensics.com/journal/volume1/j1 2 7.htm
Committee on Ethical Guidelines for Forensic Psychologists. (1991). Specialty guidelines for foren-
sic psychologists. Law and Human Behavior, 15, 655–665. Retrieved October 25, 2003 from
http://www.unl.edu/ap-ls/foren.pdf
Conte, J. R., Sorenson, M. A., Fogarty, L., & Rosa, J. D. (1991). Evaluating children’s reports of sexual
abuse: Results from a survey of professionals. American Journal of Orthopsychiatry, 61, 428–437.
Crocker, L., & Algina, J. (1986). Introduction to classical and modern test theory. New York: Holt,
Rinehart, and Winston, Inc.
Cross, T. P., & Saxe, L. (1992). A critique of the validity of polygraph testing in child sexual abuse cases.
Journal of Child Sexual Abuse, 1, 19–33.
Daubert v. Merrell Dow Pharmaceuticals, Inc., 509 U.S. 579, 113 S.Ct. 2786 (1993).
Davies, D., Cole, J., Albertella, G., McCulloch, L., Allen, K., & Kekevian, H. (1996). A model for con-
ducting forensic interviews with child victims of abuse. Child Maltreatment, 1, 189–199.
Davies, G. M., & Wescott, H. L. (1999). Interviewing child witnesses under the Memorandum of Good
Practice: A research review. London: Home Office.
Dawes, R. M., Faust, D., & Meehl, P. E. (1989). Clinical versus actuarial judgment. Science, 243, 1668–
1674.
Department of Health and Human Services. (2004). Child maltreatment 2002. Washington, DC: Author.
Retrieved June 13, 2004, from http://nccanch.acf.hhs.gov/general/stats/index.cfm.
DePaulo, B. M., Charlton, K., Cooper, H., Lindsay, J. J., & Muhlenbruck, L. (1997). The accuracy-
confidence correlation in the detection of deception. Personality and Social Psychology Review, 1,
346–357.
Drach, K. M., Wientzen, J., & Ricci, L. R. (2001). The diagnostic utility of sexual behavior problems
in diagnosing sexual abuse in a forensic child abuse evaluation clinic. Child Abuse and Neglect, 25,
489–503.
Ekman, P., & O’Sullivan, M. (1991). Who can catch a liar? American Psychologist, 46, 913–920.
Ekman, P., O’Sullivan, M., & Frank, M. G. (1999). A few can catch a liar. Psychological Science, 10,
263–266.
Elaad, E. (2003). Effects of feedback on the overestimated capacity to detect lies and the underestimated
ability to tell lies. Applied Cognitive Psychology, 17, 349–363.
Elliott, D. M., & Briere, J. (1994). Forensic sexual abuse evaluations of older children: Disclosures and
symptomatology. Behavioral Sciences and the Law, 12, 261–277.
Everson, M. D., & Boat, B. W. (1989). False allegations of sexual abuse by children and adolescents.
Journal of the American Academy of Child and Adolescent Psychiatry, 28, 230–235.
Faller, K. C., & Henry, J. (2000). Child sexual abuse: A case study in community collaboration. Child
Abuse and Neglect, 24, 1215–1225.
Faraone, S. V., & Tsuang, M. T. (1994). Measuring diagnostic accuracy in the absence of a “gold stan-
dard.” American Journal of Psychiatry, 151, 650–657.
Faust, D., & Ziskin, J. (1988). The expert witness in psychology and psychiatry. Science, 241, 31–35.
Federal Rules of Evidence, 28 App. U.S.C. (2004).
Fincham, F. D., Beach, S. R. H., Moore, T., & Diener, C. (1994). The professional response to child
sexual abuse: Whose interests are being served? Family Relations, 43, 244–254.
Finkelhor, D. (1993). The main problem is still underreporting, not overreporting. In R. J. Gelles &
D. R. Loseke (Eds.), Current controversies on family violence (pp. 273–287). Newbury Park, CA:
Sage.
Finlayson, L. M., & Koocher, G. P. (1991). Professional judgment and child abuse reporting in sexual
abuse cases. Professional Psychology: Research and Practice, 22, 464–472.
Fisher, C. B. (1995). American Psychological Association’s (1992) Ethics Code and the validation of
sexual abuse in day-care settings. Psychology, Public Policy, and Law, 1, 461–478.
Fisher, C. B., & Whiting, K. A. (1998). How valid are child sexual abuse validations? In S. J. Ceci &
H. Hembrooke (Eds.), Expert witnesses in child abuse case: What can and should be said in court
(pp. 159–184). Washington, DC: American Psychological Association.
Frye v. United States, 293 F. 1013 (D. C. Cir. 1923).
Fukurai, H., & Butler, E. W. (1994). Sociologists in action: The McMartin sexual abuse case, litigation,
justice, and mass hysteria. American Sociologist, 25, 44–71.
Garb, H. N. (1989). Clinical judgment, clinical training, and professional experience. Psychological Bul-
letin, 105, 387–396.
Garb, H. N. (1998). Studying the clinician: Judgment research and psychological assessment. Washington,
DC: American Psychological Association.
Gardner, R. A. (1995). Protocols for the sex-abuse evaluation. Cresskill, NJ: Creative Therapeutics.
Gardner, R. A. (1998). The parental alienation syndrome (2nd ed.). Cresskill, NJ: Creative Therapeutics,
Inc.
General Electric Co. v. Joiner, 118 S. Ct. 512 (1997).
Goodman, G. S., Emery, R. E., & Haugaard, J. J. (1998). Developmental psychology and law: Divorce,
child maltreatment, foster care, and adoption. In I. Siegel & A. Renninger (Eds.), Child psychology
in practice (pp. 775–876). New York: Wiley.
Goodman-Brown, T. B., Edelstein, R. S., Goodman, G. S., Jones, D. P. H., & Gordon, D. S. (2003). Why
children tell: A model of children’s disclosure of sexual abuse. Child Abuse and Neglect, 27, 525–
540.
Governor’s Task Force on Children’s Justice. (1998). A model child abuse protocol: Coordinated inves-
tigative team approach. Michigan: State of Michigan Family Independence Agency. Retrieved July
15, 2004 from http://www.michigan.gov/documents/fia-pub794 13083 7.pdf
Greenhalgh, T. (1997). How to read a paper: Papers that report diagnostic or screening tests.
British Medical Journal, 315, 540–543. Retrieved July 1, 2004 from http://bmj.bmjjournals.com/cgi/
content/full/315/7107/540.
Grove, W. M., Barden, R. C., Garb, H. N., & Lilienfeld, S. O. (2002). Failure of Rorschach-
Comprehensive-System-based testimony to be admissible under the Daubert–Joiner–Kumho stan-
dard. Psychology, Public Policy, and Law, 8, 216–234.
Grove, W. M., Zald, D. H., Lebow, B. S., Snitz, B. E., & Nelson, C. (2000). Clinical versus mechanical
prediction: A meta-analysis. Psychological Assessment, 12, 19–30.
Hanson, R. K. (1997). The development of a brief actuarial risk scale for sex offense recidivism. Ottawa,
Ontario, Canada: Department of the Solicitor General of Canada. Retrieved October 25, 2003 from
http://www.sgc.gc.ca/publications/corrections/199704 e.pdf
Hanson, R. K. (1998). What do we know about sex offender risk assessment? Psychology, Public Policy,
and Law, 4, 50–72.
Hanson, R. K., & Bussiere, M. T. (1996). Predictors of sexual offender recidivism: A meta-analysis (User
Report No. 96-04). Ottawa, Ontario, Canada: Department of the Solicitor General of Canada. Re-
trieved July 1, 2004 from http://www.psepc-sppcc.gc.ca/publications/corrections/199604 e.pdf
Haskett, M. E., Wayland, K., Hutcheson, J. S., & Tavana, T. (1995). Substantiation of sexual abuse alle-
gations: Factors involved in the decision-making process. Journal of Child Sexual Abuse, 4, 19–47.
Heilbrun, K. (2001). Principles of forensic mental health assessment. New York: Kluwer Academic/
Plenum Publishers.
Heilbrun, K. (2003). Principles of forensic mental health assessment: Implications for the forensic assess-
ment of sexual offenders. Annals of the New York Academy of Science, 989, 167–184.
Home Office. (1992). Memorandum of good practice on video recorded interviews with child witnesses for
criminal proceedings. London: Her Majesty’s Stationery Office.
Horner, T. M., & Guyer, M. J. (1991a). Prediction, prevention, and clinical expertise in child custody
cases in which allegations of child sexual abuse have been made: I. Predictable rates of diagnos-
tic error in relation to various clinical decision making strategies. Family Law Quarterly, 25, 217–
252.
Horner, T. M., & Guyer, M. J. (1991b). Prediction, prevention, and clinical expertise in child custody
cases in which allegations of child sexual abuse have been made: II. Prevalence rates of child sexual
abuse and the precision of “tests” constructed to diagnose it. Family Law Quarterly, 25, 381–409.
Horner, T. M., Guyer, M. J., & Kalter, N. M. (1992). Prediction, prevention, and clinical expertise in
child custody cases in which allegations of child sexual abuse have been made: III. Studies of expert
opinion formation. Family Law Quarterly, 25, 381–409.
Horner, T. M., Guyer, M. J., & Kalter, N. M. (1993). Clinical expertise and the assessment of child
sexual abuse. Journal of the American Academy of Child and Adolescent Psychiatry, 32, 925–
931.
Horowitz, S. W., Lamb, M. E., Esplin, P. W., Boychuk, T., & Reiter-Laverly, L. (1995). Establishing the
ground truth in studies of child sexual abuse. Expert Evidence: The International Digest of Human
Behaviour Science and Law, 4, 42–51.
Hui, S. L., & Xiao, H. Z. (1998). Evaluation of diagnostic tests without gold standards. Statistical Methods
in Medical Research, 7, 354–370.
Humphrey, H. H. (1985). Report On Scott County Investigations. Minneapolis, MN: Minnesota Attorney
General.
Huron Consulting Group. (2004). States that follow Daubert. Retrieved July 10, 2004 from
http://www.huronconsultinggroup.com/uploadedfiles/daubert states4.pdf
Idaho v. Wright, 497 U.S. 805 (1990).
Jackson, H., & Nuttall, R. (1993). Clinician responses to sexual abuse allegations. Child Abuse and Ne-
glect, 17, 127–143.
Johnson, J. (2004, May 1). Conviction tossed after 19 years. A man’s molestation trial is nullified after
several witnesses retract testimony they gave as children. Los Angeles Times, p. 1 (California; Metro;
Metro Desk; Part B).
Jones, D. P., & McGraw, J. M. (1987). Reliable and fictitious accounts of sexual abuse to children. Journal
of Interpersonal Violence, 2, 27–45.
Kassin, S. M. (2002). Human judges of truth, deception, and credibility: Confident but erroneous. Car-
dozo Law Review, 23, 809–816.
Keary, K., & Fitzpatrick, C. (1994). Children’s disclosure of sexual abuse during formal investigation.
Child Abuse and Neglect, 18, 543–548.
Kendall-Tackett, K. A., & Watson, M. W. (1992). Use of anatomical dolls by Boston-area professionals.
Child Abuse and Neglect, 16, 423–428.
Kraemer, H. C. (1992). Evaluating medical tests: Objective and quantitative guidelines. Newbury Park,
CA: Sage.
Kraemer, H. C., Kazdin, A. E., Offord, D. R., Kessler, R. C., Jensen, P. S., & Kupfer, D. J. (1999).
Measuring the potency of risk factors for clinical or policy significance. Psychological Methods, 4,
257–271.
Kuehnle, K. (1996). Assessing allegations of child sexual abuse. Sarasota, FL: Professional Resource
Press/Professional Resource Exchange, Inc.
Kuehnle, K. (1998). Ethics and the forensic expert: A case study of child custody involving allegations of
child sexual abuse. Ethics and Behavior, 8, 1–18.
Kumho Tire Co., Ltd. v. Carmichael, 119 S. Ct. 1167 (1999).
Lamb, M. E. (1994). The investigation of child sexual abuse: An interdisciplinary consensus statement.
Child Abuse and Neglect, 18, 1021–1028.
Lamb, M. E., Sternberg, K. J., Esplin, P. W., Hershkowitz, I., & Orbach, Y. (1997). Assessing the credi-
bility of children’s allegations of sexual abuse: A survey of recent research. Learning and Individual
Differences, 9, 175–194.
Lamb, M. E., Sternberg, K. J., Esplin, P. W., Hershkowitz, I., Orbach, Y., & Hovav, M. (1997). Criterion-
based content analysis: A field validation study. Child Abuse and Neglect, 21, 255–264.
Lamb, M. E., Sternberg, K. J., Orbach, Y., Esplin, P. W., & Mitchell, S. (2002). Is ongoing feedback
necessary to maintain the quality of investigative interviews with allegedly abused children? Applied
Developmental Science, 6(1), 35–41.
Lawlor, R. J. (1998). The expert witness in child sexual abuse cases: A clinician’s view. In S. J. Ceci &
H. Hembrooke (Eds.), Expert witnesses in child sexual abuse cases: What can and should be said in
court (pp. 105–122). Washington, DC: American Psychological Association.
Lawson, L., & Chaffin, M. (1992). False negatives in sexual abuse disclosure interviews: Incidence and
influence of caretaker’s belief in abuse in cases of accidental abuse discovery by diagnosis of STD.
Journal of Interpersonal Violence, 7, 532–542.
Leichtman, M. D., & Ceci, S. J. (1995). The effects of stereotypes and suggestions on preschooler’s re-
ports. Developmental Psychology, 31, 568–578.
Leslie, J. (2004, July 15). Experts doubt child molester’s guilt. WXIA-TV Atlanta. Retrieved July 17, 2004
from http://www.11alive.com/news/news article.aspx?storyid=49215
Lindsey, D. (1992). Reliability of the foster care placement decision: An analysis of national survey data.
Research on Social Work Practice, 2, 65–80.
Marshall, D. B., & English, D. J. (2000). Neural network modeling of risk assessment in Child Protective
Services. Psychological Methods, 5, 102–124.
Mason, M. A. (1991). A judicial dilemma: Expert witness testimony in child sex abuse cases. Journal of
Psychiatry and Law, 19, 185–219.
Mason, M. A. (1998). Expert testimony regarding the characteristics of sexually abused children: A con-
troversy on both sides of the bench. In S. J. Ceci & H. Hembrooke (Eds.), Expert witnesses in child
abuse cases (pp. 217–234). Washington, DC: American Psychological Association.
McGraw, J. M., & Smith, H. A. (1992). Child sexual abuse allegations amidst divorce and custody pro-
ceedings: Refining the validation process. Journal of Child Sexual Abuse, 1, 49–62.
Meehl, P. E. (1954). Clinical versus statistical prediction: A theoretical analysis and a review of the evidence.
Minneapolis: University of Minnesota Press.
Melton, G. B., & Limber, S. (1989). Psychologists’ involvement in cases of child maltreatment. Limits of
role and expertise. American Psychologist, 44, 1225–1233.
Melton, G. B., Petrila, J., Poythress, N. G., & Slobogin, C. (1997). Psychological evaluations for the courts
(2nd ed.). New York: Guilford.
Milner, J. S. (1994). Assessing physical child abuse risk: The Child Abuse Potential inventory. Clinical
Psychology Review, 14, 547–583.
Myers, J. E. B., & Stern, A. E. (2002). Expert testimony. In J. E. B. Myers, L. Berliner, J. Briere, C. T.
Hendrix, C. Jenny, & T. A. Reid (Eds.), The APSAC handbook on child maltreatment (pp. 379–401).
Thousand Oaks, CA: Sage.
Nathan, D., & Snedeker, M. (2001). Satan’s silence: Ritual abuse and the making of a modern American
witch hunt. Lincoln, NE: Authors Choice Press.
National Council on Crime and Delinquency. (2000). Child abuse and neglect: Improving consistency in
decision-making. Retrieved July 6, 2004, from http://www.nccd-crc.org/crc/pubs/focusreliab.pdf
Oates, R. K., Jones, D. P., Denson, D., Sirotnak, A., Gary, N., & Krugman, R. D. (2000). Erroneous
concerns about child sexual abuse. Child Abuse and Neglect, 24, 149–157.
Oberlander, L. B. (1995). Psycholegal issues in child sexual abuse evaluations: A survey of forensic men-
tal health professionals. Child Abuse and Neglect, 19, 475–490.
Orbach, Y., Hershkowitz, I., Lamb, M. E., Esplin, P. W., & Horowitz, D. (2000). Assessing the value of
structured protocols for forensic interviews of alleged child abuse victims. Child Abuse and Neglect,
24, 733–752.
Paradise, J. E., Rostain, A. L., & Nathanson, M. (1988). Substantiation of sexual abuse charges when
parents dispute custody or visitation. Pediatrics, 81, 835–839.
Pathak, M. K., & Thompson, W. C. (1999). From child to witness to jury: Effects of suggestion on the
transmission and evaluation of hearsay. Psychology, Public Policy, and Law, 5, 372–387.
Peddle, N., & Wang, C. T. (2001). Current trends in child abuse prevention, reporting, and fatalities:
The 1999 fifty state survey (Working Paper No. 808). Chicago, IL: Prevent Child Abuse America.
Retrieved February 10, 2002, from http://www.preventchildabuse.org/research ctr/fact sheets/
1999 50 survey.pdf
Pillai, M. (2002). Allegations of abuse: The need for responsible practice. Medical Science and the Law,
42, 149–159.
Poole, D. A., & Lamb, M. E. (1998). Investigative interviews of children: A guide for helping professionals.
Washington, DC: American Psychological Association.
Poole, D. A., & Lindsay, D. S. (1998). Assessing the accuracy of young children’s reports: Lessons from
the investigation of child sexual abuse. Applied and Preventive Psychology, 7, 1–26.
Quinn, K. M. (2002). Interviewing children for suspected sexual abuse. In D. Schetky & E. P.
Benedek (Eds.), Principles and practice of child and adolescent forensic psychiatry. Washington, DC:
American Psychiatric Association.
Rabinowitz, D. (2004). No crueler tyrannies: Accusation, false witness, and other terrors of our times. New
York: Free Press.
Raskin, D. C., & Esplin, P. W. (1991). Statement validity assessment: Interview procedures and content
analysis of children’s statements of sexual abuse. Behavioral Assessment, 13, 265–291.
Realmuto, G. M., Jensen, J., & Wescoe, S. (1990). Specificity and sensitivity of sexually anatomically
correct dolls in substantiating abuse: A pilot study. Journal of the American Academy of Child Ado-
lescent Psychiatry, 29, 743–746.
Realmuto, G. M., & Wescoe, S. (1992). Agreement among professional about a child’s sexual abuse
status: Interviews with sexually anatomically correct dolls as indicators of abuse. Child Abuse and
Neglect, 16, 719–725.
Richardson, D. W. (1990). The effects of a false allegation of child sexual abuse on an intact middle
class family. Issues in Child Abuse Accusations, 2. Retrieved February 1, 2003 from http://www.ipt-
forensics.com/journal/volume2/j2 4 7.htm
Rogers, M. L. (1992). Delusional disorder and the evolution of mistaken sexual allegations in child cus-
tody cases. American Journal of Forensic Psychology, 10, 47–69.
Rosenthal, R. (1995). State of New Jersey v Margaret Kelly Michaels: An overview. Psychology, Public
Policy, and Law, 1, 246–271.
Ruby, C. L., & Brigham, J. C. (1998). Can criteria-based content analysis distinguish between
true and false statements of African-American speakers? Law and Human Behavior, 22, 369–
388.
Ruscio, J. (1998). Information integration in child welfare cases: An introduction to statistical decision
making. Child Maltreatment, 3, 143–156.
San Diego County Grand Jury. (1992). Child sexual abuse, assault, and molest issues (Report
No. 8). San Diego, CA: Author. Retrieved October 2, 2002, from http://www.co.san-diego.ca.us/
cnty/cntydepts/safety/grand/reports/report8.html
San Diego County Grand Jury. (1994). Analysis of child molestation issues (Report No. 7). San Diego,
CA: Author. Retrieved July 1, 2004, from http://www.co.san-diego.ca.us/cnty/cntydepts/safety/
grand/reports/report7a.html
Saxe, L., & Ben-Shakhar, G. (1999). Admissibility of polygraph tests: The application of scientific stan-
dards post-Daubert. Psychology, Public Policy, and Law, 5, 203–223.
Sbraga, T. P., & O’Donohue, W. T. (2003). Post hoc reasoning in possible cases of child sexual
abuse: Symptoms of inconclusive origins. Clinical Psychology: Science and Practice, 10, 320–
334.
Seattle Post-Intelligencer. (1998). Special report: A record of abuses in Wenatchee. Retrieved July 10,
2004, from http://seattlepi.nwsource.com/powertoharm/
Shumaker, K. R. (2000). Measured professional competence between and among different mental health
disciplines when evaluating and making recommendations in cases of suspected child sexual abuse.
Dissertation Abstracts International, 60(11), 5791B. (UMI No. 9950748)
Shuman, D. W., Greenberg, S., Heilbrun, K., & Foote, W. E. (1998). An immodest proposal: Should
treating mental health professionals be barred from testifying about their patients? Behavioral Sci-
ences and the Law, 16, 509–523.
Sjoeberg, R. L., & Lindblad, F. (2002). Limited disclosure of sexual abuse in children whose experiences
were documented by videotape. American Journal of Psychiatry, 159, 312–314.
Sorensen, T., & Snow, B. (1991). How children tell: The process of disclosure in child sexual abuse. Child
Welfare, 70, 3–15.
Sternberg, K. J., Lamb, M. E., Davies, G. M., & Westcott, H. L. (2001). The Memorandum of Good
Practice: Theory versus application. Child Abuse and Neglect, 25, 669–681.
Sternberg, K. J., Lamb, M. E., Orbach, Y., Esplin, P. W., & Mitchell, S. (2001). Use of a structured
investigative protocol enhances young children’s responses to free-recall prompts in the course of
forensic interviews. Journal of Applied Psychology, 86, 997–1005.
Stevenson, K. M., Leung, P., & Cheung, K. M. (1992). Competency-based evaluation of interviewing
skills in child sexual abuse cases. Social Work Research and Abstracts, 28, 11–16.
Strasburger, L. H., Gutheil, T. G., & Brodsky, A. (1997). On wearing two hats: Role conflict in serving as
both psychotherapist and expert witness. American Journal of Psychiatry, 154, 448–455.
Streiner, D. L. (2003). Diagnosing tests: Using and misusing diagnostic and screening tests. Journal of
Personality Assessment, 81, 209–219.
Tavris, C. (2003, February 28). Mind games: Psychological warfare between therapists and scientists.
Chronicle of Higher Education. Retrieved October 25, 2003, from http://chronicle.com/free/v49/
i25/25b00701.htm
Trocmé, N., MacLaurin, B., Fallon, B., Daciuk, J., Billingsley, D., Tourigny, M., et al. (2001). Canadian
Incidence Study of Reported Child Abuse and Neglect: Final Report. Ottawa, Ontario, Canada: Min-
ister of Public Works and Government Services Canada.
Wakefield, H., & Underwager, R. (1989, April). Techniques for interviewing children in sexual abuse
cases. Paper presented at the Fifth Annual Symposium in Forensic Psychology, San Diego, CA.
Retrieved October 2, 2002, from http://fact.on.ca/Info/vac/under2.pdf
Wakefield, H., & Underwager, R. (1994). The alleged child victim and real victims. In J. J. Krivacska
& J. Money (Eds.), Handbook of forensic sexology (pp. 223–264). Buffalo, NY: Prometheus Books.
Retrieved June 13, 2004, from http://www.ipt-forensics.com/library/alleged.htm
Walker, N. E. (2002). Forensic interviews of children: The components of scientific validity and le-
gal admissibility. Law and Contemporary Problems, 65, 149–178. Retrieved July 7, 2004, from
http://www.law.duke.edu/shell/cite.pl?65+Law+&+Contemp.+Probs.+149+(Winter+2002)
Warren, A. R., & Marsil, D. F. (2002). Children as victims and witnesses in the criminal trial process:
Why children’s suggestibility remains a serious concern. Law and Contemporary Problems, 65, 127–
147.
Warren, A. R., & Woodall, C. E. (1999). The reliability of hearsay testimony: How well do interviewers
recall their interviews with children? Psychology, Public Policy, and Law, 5, 355–371.
Weller, S. C., & Mann, N. C. (1997). Assessing rater performance without a “gold standard” using con-
sensus theory. Medical Decision Making, 17, 71–79.
Wood, J. M. (1996). Weighing evidence in sexual abuse evaluations: An introduction to Bayes’s Theorem.
Child Maltreatment, 1, 25–36.
Ziskin, J. (1995). Coping with psychiatric and psychological testimony (5th ed.). Los Angeles: Law and
Psychology Press.