
Child Abuse & Neglect 35 (2011) 613–620


Validity of Criteria-Based Content Analysis (CBCA) at trial in free-narrative interviews
Paolo Roma (a, ∗), Pietro San Martini (b), Ugo Sabatello (c), Roberto Tatarelli (a), Stefano Ferracuti (a)

(a) NESMOS, Department of Neuroscience, Mental Health and Sensory Organs, Sant'Andrea Hospital, Faculty of Medicine and Psychology, "Sapienza" University of Rome, Italy
(b) Department of Dynamic and Clinical Psychology, Faculty of Medicine and Psychology, Rome, Italy
(c) Department of Child and Adolescent Psychiatry, Faculty of Medicine and Dentistry, Rome, Italy

Article info
Article history: Received 20 September 2009; received in revised form 13 January 2011; accepted 20 April 2011
Keywords: CBCA; Child witness testimony; Child sexual abuse allegation; Suggestive interview

Abstract
Objective: The reliability of child witness testimony in sexual abuse cases is often controversial, and few tools are available for assessing it. Criteria-Based Content Analysis (CBCA) is a widely used instrument for evaluating psychological credibility in cases of suspected child sexual abuse. Only a few studies have evaluated CBCA scores in children suspected of being sexually abused. We designed this study to investigate the reliability of CBCA in discriminating allegations of child sexual abuse during court hearings by comparing CBCA results with the court's final, unappealable sentence. We then investigated whether CBCA scores correlated with age, and whether some criteria distinguished better than others between cases of confirmed and unconfirmed abuse.
Methods: From a pool of 487 child sexual abuse cases, confirmed and unconfirmed cases were selected using various criteria, including child IQ ≥ 70, agreement between the final trial outcome and the opinion of 3 experts, presence of at least 1 independent validating informative component in cases of confirmed abuse, and absence of suggestive questions during the child's testimony. This screening yielded a study sample of 60 confirmed and 49 unconfirmed cases. The 14-item version of CBCA was applied to child witness testimony by 2 expert raters.
Results: Of the 14 criteria tested, 12 achieved satisfactory inter-rater agreement (Maxwell's Random Error). Analyses of covariance, with case group (confirmed vs. unconfirmed) and gender as independent variables and age as a covariate, showed no main effect of gender. Analyses of the interaction showed that the simple effects of abuse were significant in both sexes. Nine CBCA criteria were satisfied more often in confirmed than in unconfirmed cases; seven criteria increased with age.
Conclusion: CBCA scores distinguish between confirmed and unconfirmed cases. The criteria that best distinguish between the 2 groups are Quantity of Details, Interactions, and Subjective Experience. CBCA scores correlate positively with age, independently of abuse; all the criteria tested except 2 (Unusual Details and Misunderstood Details) increase with age. The agreement rate could be increased by merging the criteria Unusual Details and Superfluous Details, which achieve low inter-rater agreement when scored separately.
Practice implication: Given its ability to distinguish between confirmed and unconfirmed cases of suspected child abuse, the CBCA could be a useful tool for expert opinion. Because our strict selection criteria make it difficult to generalize our results, further studies should investigate whether the CBCA is equally useful in the cases we excluded from our study (for example, cases of mental retardation).
© 2011 Elsevier Ltd. All rights reserved.

∗ Corresponding author address: UOC Psichiatria, Via di Grottarossa 1035, Roma 00189, Italy.

0145-2134/$ – see front matter © 2011 Elsevier Ltd. All rights reserved.
doi:10.1016/j.chiabu.2011.04.004

Introduction

Child testimony in cases of alleged sexual abuse is a controversial matter. A major problem is the truthfulness of the
child’s narrative, often the most important or only evidence brought to court. Few methods are available for assessing child
credibility, and the few currently available tools have not been tested on large populations.
One of the most popular tools for assessing the credibility of child witness testimony in sexual offence trials is the
Statement Validity Assessment (SVA) (Vrij, 2000). The core component of SVA is Criteria-Based Content Analysis (CBCA), a
detailed psychological analysis of child witness testimony. CBCA rests on the hypothesis that accounts of events that really happened differ in content and quality from accounts of events that were not actually experienced (Undeutsch,
1982, 1989). In its original form, CBCA was based on 19 criteria (Steller & Kohnken, 1989). Later, Raskin, Esplin, and Horowitz
(1991) redefined the criteria and eliminated 5 criteria unrelated to the basic memory concept embodied in the Undeutsch
hypothesis, thus proposing a CBCA version comprising 14 criteria.
Several reviews have fully described CBCA and its rationale (Ceci, Kulkofsky, Klemfuss, Sweeney, & Bruck, 2007; Lamb et al., 1997; Vrij, 2000, 2005). Some studies evaluated CBCA results in relation to factors such as age (Anson, Golding, & Gully, 1993; Buck, Warren, Betman, & Brigham, 2002), fantasy proneness (Merckelbach, 2004), questioning styles, and the effects of multiple interviews (Davies, Westcott, & Horan, 2000; Hershkowitz, Lamb, Sternberg, & Esplin, 1997; Vrij, Mann, Kristen, & Fisher, 2007). Others focussed on the validity of the tool, evaluating differences in total CBCA scores between supposedly truthful and false accounts. Several of these studies took place in laboratories (Akehurst, Kohnken, & Hofer, 2001; Berliner & Conte, 1993; Landry & Brigham, 1992; Porter & Yuille, 1996; Ruby & Brigham, 1997; Vrij, Akehurst, Soukara, & Bull, 2002). Only 4 studies have evaluated the CBCA in the field, using real-life events.
In the first of these field studies, Esplin et al. (1988) scored each criterion as not present (0), present (1), or strongly present (2). They classified 20 child abuse cases as confirmed, based on confessions by the perpetrator or strong physical evidence, and 20 cases as unconfirmed, based on children's recantations, judicial dismissal, lack of prosecution, or absence of corroborating evidence. They found that the total scores for the 19 criteria were significantly higher for confirmed than for unconfirmed cases, with no overlap between the 2 distributions.
In a subsequent study, Raskin and Esplin (1991a) based their discrimination between "confirmed" and "doubtful" cases on the results of polygraph and medical examinations, confessions, and eyewitness accounts. They used a 3-point (0–1–2) scale to evaluate each CBCA criterion in 40 statements (20 confirmed; 20 doubtful) and reported that the single criteria were distributed differently in the 2 groups, with no overlap between the distributions of scores assigned to statements known to involve confirmed and doubtful incidents. Later, after Wells and Loftus (1991) pointed out that the classification criteria actually evaluated how convincing the statement was, thus producing a "circular classification", Raskin and Esplin reanalyzed their data to ensure that independent case facts were used to assess the plausibility of abuse. The new results (Raskin & Esplin, 1991b) confirmed the previous findings.
In their field validation study, Lamb et al. (1997) analyzed the investigative interviews of 98 Israeli Jewish children (aged 4–13 years) who were alleged victims of child sexual abuse, including only cases that contained evidence of actual physical contact between a known accused and the child and a corroborative element (such as medical evidence or witnesses). The investigators used Independent Case Fact Scales (ICFS) to determine whether the independent case facts made the allegation seem Very Likely, Quite Likely, Questionable, Quite Unlikely, or Very Unlikely. They used a 0–1 scoring system for the CBCA (presence/absence of the criterion) and considered only the 14 memory-related CBCA criteria (Raskin et al., 1991). For the statistical analyses they collapsed the Very and Quite Likely groups into a plausible group (76 cases) and the Very and Quite Unlikely groups into an implausible group (13 cases). This analysis revealed a significantly larger number of CBCA criteria in the plausible than in the implausible cases (M = 6.74, SD = 2.20 vs. M = 4.85, SD = 2.7). Because the difference between the 2 groups, albeit significant, was extremely small, they recommended extreme caution in using the CBCA system in forensic settings.
In a similar study, Craig, Scheibe, Raskin, Kircher, and Dodd (1999) examined 48 depositions (obtained during the police investigation) from children aged 3–16 years who were alleged victims of sexual abuse, using the same CBCA criteria and scoring system as Lamb et al. (1997). A statement was classified as confirmed (35 cases) if the alleged aggressor confessed, failed a polygraph test, or both, and as highly doubtful (13 cases) if the child provided a detailed and credible recantation or the accused passed a polygraph test. The average CBCA score for the confirmed cases was slightly, though significantly, higher than that for the doubtful cases (M = 7.2, SD = 2.2 vs. M = 5.7, SD = 3.2).
All 4 field studies found small differences in total CBCA scores, with higher values in cases of confirmed than of unconfirmed abuse. The studies were poorly homogeneous because they investigated different numbers of CBCA criteria and used different rating scales (0, 1, 2 vs. 0, 1). Together, these 4 studies analyzed statements made during police questioning by 226 children suspected of being abused, and included more plausible or confirmed cases than unconfirmed, implausible, or highly doubtful cases (161 vs. 65). The number of cases studied was nevertheless too small to confirm or refute the utility and validity of the CBCA. Finally, all the cited studies considered pre-trial investigation narratives instead of narratives given in the courtroom during the hearing. This distinction is important because courtroom narratives are usually quantitatively and qualitatively richer than pre-trial narratives, partly because they draw on the court proceedings as a whole.
A major problem in CBCA field research is the assignment of cases to the confirmed or the unconfirmed group. None of the aforementioned studies took the court's final judgement into account; all assigned cases to groups solely on the basis of an opinion expressed by the researchers. For this reason, the relation between the CBCA score and the legal verdict remains unknown. Although a verdict does not establish whether a statement is true or false, it is a major variable that reflects the court proceedings overall (including witness statements, corroborating evidence, and environmental investigations).
Our primary aim in this field validation study, conducted in a sample of child witnesses of both sexes, was to determine whether overall CBCA scores were higher in court-confirmed than in unconfirmed cases of child sexual abuse. Our secondary aims were to look for a possible correlation between CBCA scores and age and to identify the CBCA criteria best able to distinguish between confirmed and unconfirmed cases. To provide reliable data we used strict selection criteria and, most importantly, to reduce possible circularity in classification we assigned cases according to a new dual judgement procedure (the trial verdict and the opinion of a panel of 3 experts blinded to the child's testimony in court). To assess the reliability of the CBCA in distinguishing between confirmed and unconfirmed cases, we used the 14-item CBCA validated in the 2 preceding field studies (Craig et al., 1999; Lamb et al., 1997).

Method

Sample selection

The initial sample consisted of 487 forensic cases of suspected sexual abuse in children under 16 years (92 boys and 396 girls), collected from courts throughout Italy, from the forensic practice of 2 of the authors, and from colleagues. The data for the study sample were analyzed exclusively by 2 other authors who were not engaged in forensic practice and by their research collaborators. The 487 cases concerned trials held throughout Italy between 1990 and 2002. The files included interviews with the children (investigative interviews and the trial interview), psychodiagnostic evaluations of the child, the legal proceedings, and the final unappealable verdict. The cases were selected as follows.
From the total pool, we chose cases in which the child (a) was born and educated in Italy; (b) had no language deficiencies; (c) was >3 years and <15 years old; (d) had an IQ of >70, as evaluated by the WISC-R (Wechsler, 1986); (e) had no history of neurological or neuropsychiatric illness; (f) had undergone no more than 2 interviews before testifying in court; and (g) had produced at least 1,000 words during the testimony in court (not including articles, prepositions, and conjunctions). Two further criteria required (h) that the accusation concerned physical contact between the alleged perpetrator and the child and (i) that the perpetrator was known. Criteria a, b, and c excluded primary language deficits that could interfere with the testimony; criteria d and e ruled out serious conditions that could distort the child's linguistic production and competence; criterion f limited the child's learning of the interview procedures (Stromwell, Bengtsson, Leander, & Granhag, 2004); criterion g excluded interviews with insufficient verbal production (as suggested by Raskin & Esplin, 1991c); and criteria h and i restricted the study to a specific range of cases (as suggested by Lamb et al., 1997). This strict selection procedure eliminated 248 cases and left 239 cases for study, of which 158 were confirmed cases (the accused had been found guilty) and 81 were unconfirmed cases (the accused had been acquitted).
On the basis of the reasons given for the judgement, the 158 confirmed cases were then grouped according to whether independent validation information was available. The independent validation information considered included telephone tapping, independent witnesses, pornographic material involving the supposed victim, and biological evidence of abuse (DNA). We also included among these criteria the alleged abuser's confession, provided it had not been retracted (we acknowledge the limitations of this criterion, because someone might incriminate himself for other reasons, although this is unlikely given the extensive enquiries undertaken in cases of sexual abuse). Of the 158 confirmed cases, we included the 92 that met the rule requiring one or more independent sources of information and eliminated the 66 cases that had no independent information. None of the 81 acquitted cases had any independent validation information corroborating the offence. This selection yielded a sample of 173 cases (92 confirmed and 81 unconfirmed), from which we excluded cases in which the interview showed evidence that the interviewer had used suggestive or leading questions when discussing the supposed abuse (the questions put to the child about the presumed sexual abuse were evaluated by three postgraduate psychologists trained on the battery of erroneous questions proposed in two previous publications (Ferracuti, Cannoni, Roma, Carbone, & Mancini, 2002; Roma, Carbone, & Ferracuti, 2005), which classify as suggestive any intervention that contains forced-choice questions, changes elements reported by the child, introduces elements not reported by the child, repeats questions, or promises a reward or punishment). All the remaining case files (81 confirmed and 66 unconfirmed) were submitted to the panel of 3 experts, who gave their opinion on the truthfulness of the allegation independently of the child's narrative in the courtroom. (The three experts were two psychologists and one child neuropsychiatrist, all of whom regularly worked as forensic experts in child sexual abuse cases. The experts had all the relevant information available except the child's testimony in court and the legal outcome. They were required to assess each case individually and give an expert opinion on whether the abuse was likely or unlikely, by examining witness, medical, biological, and other objective evidence. The experts worked independently and reached agreement at their final meeting. This procedure eliminated any influence of the child's narrative, so that the experts gave an external evaluation of the case. Their opinion was especially valuable in cases where the acquittal verdict depended on procedural aspects.) If the panel's classification and the court's verdict agreed, the case was retained. Of the 147 cases examined, 133 were retained (74 confirmed and 59 unconfirmed). Of these, 109 involved "helped depositions" at trial (in which a psychologist, a child neuropsychiatrist, or a social worker received the parties' questions through the judge and put them to the child in the form most suitable for the child's chronological age), whereas in the remaining 24 cases the judge was the sole interviewer. To eliminate further distorting variables, we included only cases with "helped depositions".
This was done because, when the child is questioned directly by the judge without expert help, courtroom interruptions break the continuity of the narrative. In "helped depositions", by contrast, the children's interviews proceeded freely, with few interventions by the examiner.

Table 1
Descriptive data on confirmed vs. unconfirmed abuse (total sample and by age group).

Age group | Abuse | N | Mean age (SD) | Mean CBCA score (SD; range)
Pre-school age (4–5 years) | Confirmed | F = 8, M = 4, Tot = 12 | 4.83 (0.38) | 6.92 (0.90; 6–9)
Pre-school age (4–5 years) | Unconfirmed | F = 12, M = 6, Tot = 18 | 4.77 (0.42) | 3.00 (1.19; 1–5)
Elementary school age (6–10 years) | Confirmed | F = 28, M = 3, Tot = 31 | 8.38 (1.62) | 7.58 (1.18; 6–10)
Elementary school age (6–10 years) | Unconfirmed | F = 17, M = 3, Tot = 20 | 7.65 (1.42) | 4.25 (1.07; 3–6)
Lower secondary school age (11–14 years) | Confirmed | F = 12, M = 5, Tot = 17 | 12.82 (1.87) | 8.23 (1.10; 7–10)
Lower secondary school age (11–14 years) | Unconfirmed | F = 9, M = 2, Tot = 11 | 12.09 (1.04) | 5.55 (1.21; 4–7)
Total sample (4–14 years) | Confirmed | F = 48, M = 12, Tot = 60 | 9.21 (3.57) | 7.63 (1.18; 6–10)
Total sample (4–14 years) | Unconfirmed | F = 38, M = 11, Tot = 49 | 7.81 (3.30) | 4.08 (1.48; 1–7)

Acronyms: N = number; SD = standard deviation; F = female; M = male; Tot = total.
The final sample therefore consisted of 60 confirmed and 49 unconfirmed child sexual abuse allegations. All but 12 judgements were delivered by the appeal court. The final verdict in the 49 unconfirmed cases established that the child's testimony was not trustworthy, thus allowing a full acquittal, and often referred to environmental manipulation of the child's memory. None of the experts were compensated for their work.

Sample features

The total selected group included 109 child witnesses (23 boys and 86 girls) aged 4–14 years (M = 8.58; SD = 3.51). Both groups ranged from 4 to 14 years in age but differed in mean age (confirmed, M = 9.21, SD = 3.57 vs. unconfirmed, M = 7.81, SD = 3.30; t(107) = 2.106, p = .038). Within the school-age subgroups, by contrast, mean age did not differ between groups (Table 1).
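The reported t(107) = 2.106 can be approximately reproduced from the summary statistics above. A minimal sketch, assuming Student's pooled-variance two-sample t-test (the text does not name the variant, but 107 degrees of freedom equals 60 + 49 − 2, which is what the pooled test uses); the helper name `pooled_t_from_summary` is hypothetical:

```python
import math


def pooled_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Student's two-sample t statistic (equal variances assumed) from group summaries.

    Hypothetical helper for illustration; returns (t, degrees of freedom).
    """
    df = n1 + n2 - 2
    # Pooled variance: each group's variance weighted by its degrees of freedom.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    # Standard error of the difference between the two means.
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df


# Mean ages of confirmed (n = 60) and unconfirmed (n = 49) cases, from the text.
t, df = pooled_t_from_summary(9.21, 3.57, 60, 7.81, 3.30, 49)
print(f"t({df}) = {t:.3f}")
```

With the rounded means and SDs this gives t of about 2.107, matching the reported 2.106 up to rounding; the p value would additionally require the t distribution, which this stdlib-only sketch omits.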

CBCA scoring

As in the 2 most recent field studies (Craig et al., 1999; Lamb et al., 1997), we used the 14-item CBCA version (Table 2) of Raskin et al. (1991). Two further clinical and forensic psychologists, working independently and blind to the trial outcome, examined all the selected interviews according to the CBCA criteria defined by Raskin and Esplin (1991a, pp. 156–157). Both psychologists were expert raters who had attended university postgraduate master's courses in forensic psychology that focussed on the tool. To gain further experience in CBCA scoring, they had also received additional training on 10 child sexual abuse depositions in 3 sessions, each lasting about 2.5 h. After scoring the transcripts individually, the 2 psychologists were required to reach agreement on each case, and they gave a written report on any difficulties or problems encountered in the individual and joint scoring. In all cases the 2 raters reached agreement. For each protocol, a criterion was scored 1 if it was satisfied and 0 if it was not; the sum of the 14 criterion scores yielded the total score, ranging from 0 to 14. The psychologists who rated the protocols took no part in the other study procedures.
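The scoring rule just described is a simple sum of binary indicators. A minimal sketch, assuming each jointly agreed protocol is stored as a mapping from criterion name to 0/1; the data layout, the list `CBCA_CRITERIA`, and the function `total_cbca_score` are hypothetical illustrations, not part of the original procedure:

```python
from typing import Dict, List

# Criterion names following Table 2 (14-item version of Raskin et al., 1991).
CBCA_CRITERIA: List[str] = [
    "Logical structure and coherence",
    "Unstructured production",
    "Quantity of details",
    "Embedding of events in context",
    "Description of the interaction",
    "Reproduction of conversation",
    "Unexpected complication",
    "Unusual details",
    "Superfluous details",
    "Accurately reported details misunderstood",
    "References to other sexually toned events",
    "References to one's feelings or thoughts",
    "Attribution of feelings to the perpetrator",
    "Spontaneous correction or addition",
]


def total_cbca_score(protocol: Dict[str, int]) -> int:
    """Sum of binary criterion scores (1 = satisfied, 0 = not satisfied); range 0-14."""
    missing = [c for c in CBCA_CRITERIA if c not in protocol]
    if missing:
        raise ValueError(f"unscored criteria: {missing}")
    for criterion in CBCA_CRITERIA:
        if protocol[criterion] not in (0, 1):
            raise ValueError(f"{criterion!r} must be scored 0 or 1")
    return sum(protocol[c] for c in CBCA_CRITERIA)


# Hypothetical agreed protocol: the first 7 criteria satisfied, the rest not.
example = {c: (1 if i < 7 else 0) for i, c in enumerate(CBCA_CRITERIA)}
print(total_cbca_score(example))  # 7
```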
All data were analyzed with the SPSS 11 package (p values ≤ .05 were considered statistically significant). Data were analyzed to determine (a) the index of inter-rater agreement in assigning scores for the single criteria (Maxwell's Random Error, RE; Maxwell, 1977); (b) differences in total score between confirmed and unconfirmed abuse, controlling for age and gender (2-way analysis of covariance, ANCOVA); (c) differences in frequency rates for each criterion between the confirmed and unconfirmed abuse groups (chi-square tests); and (d) differences in frequency rates for each criterion across age classes (chi-square tests).

Table 2
Criteria definitions and Maxwell's Random Error (RE) coefficients.

Criterion | RE | Satisfied if the child's narration of the supposed offence...
1. Logical structure and coherence | .93 | ...is coherent: the different and independent details form a logical and consistent account of a sequence of events.
2. Unstructured production | .76 | ...has no underlying pattern or structure.
3. Quantity of details | .82 | ...has key details about location, circumstances, time, people, and objects.
4. Embedding of events in context | .93 | ...places the event in a spatial and temporal context congruous with the child's usual life experience.
5. Description of the interaction | .83 | ...has at least 1 description of interrelated actions during the event.
6. Reproduction of conversation | .82 | ...has at least 1 verbatim reproduction of conversation during the event.
7. Unexpected complication | .91 | ...has at least 1 unforeseen interruption of the event.
8. Unusual details | .61 | ...has at least 1 detail that has a low probability of occurrence.
9. Superfluous details | .52 | ...has at least 1 detail that is peripheral to the event.
10. Accurately reported details misunderstood | .93 | ...has at least 1 detail that the child reports accurately but does not understand.
11. References to other sexually toned events | .80 | ...has at least 1 report of a sexual event external to the event in question (from the interview or from the files).
12. References to one's feelings or thoughts | .90 | ...has at least 1 detail referring to the child's cognitive and emotional state.
13. Attribution of feelings to the perpetrator | .93 | ...has at least 1 detail referring to the perpetrator's cognitive and emotional state.
14. Spontaneous correction or addition | .72 | ...has at least 1 spontaneous correction of statements.

Table 3
CBCA estimated mean scores (SD) for confirmed and unconfirmed cases. Means are estimated at the mean age of the sample (M = 8.58 years; SD = 3.51).

Gender | Confirmed | Unconfirmed | Total (confirmed + unconfirmed)
Male | 7.83 (1.03) | 3.27 (.79) | 5.67 (2.13)
Female | 7.58 (1.21) | 4.31 (1.56) | 6.14 (2.13)
Total (male + female) | 7.63 (1.17) | 4.08 (1.48) | –

Results

Inter-rater agreements

Of the 14 criteria tested, 12 showed a satisfactory level of inter-rater agreement as measured by Maxwell's Random Error (RE) coefficient (.71 < RE < .94; Table 2), whereas criterion 8 (Unusual Details) and criterion 9 (Superfluous Details) showed lower levels (.61 and .52, respectively). The scoring difficulties reported by the psychologists indicated that these 2 criteria partially overlapped: the raters reported many instances in which a detail could be interpreted either as unusual or as superfluous.
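Maxwell's RE is not defined in the text. For two raters making binary present/absent judgements it is commonly given as RE = 2P_o − 1, where P_o is the proportion of protocols on which the raters agree, so RE runs from −1 (total disagreement) to 1 (perfect agreement). A minimal sketch under that assumption, with made-up ratings; the function name `maxwell_re` is hypothetical:

```python
from typing import Sequence


def maxwell_re(rater_a: Sequence[int], rater_b: Sequence[int]) -> float:
    """Maxwell's (1977) random error coefficient for two raters' binary codes.

    Assumes the commonly cited form RE = 2 * P_o - 1, where P_o is the
    observed proportion of agreement.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    agreements = sum(a == b for a, b in zip(rater_a, rater_b))
    p_o = agreements / len(rater_a)
    return 2 * p_o - 1


# Hypothetical presence/absence codes for one criterion across 10 protocols.
rater_a = [1, 1, 0, 0, 1, 0, 1, 1, 0, 1]
rater_b = [1, 1, 0, 1, 1, 0, 1, 1, 0, 0]
print(round(maxwell_re(rater_a, rater_b), 2))  # 0.6 (8 of 10 agreements)
```

Read the other way, P_o = (RE + 1)/2, so under this formula the RE of .52 for Superfluous Details would correspond to the raters agreeing on 76% of protocols, and the .93 of criterion 1 to about 96.5%.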

Analysis of covariance

Within each of the 3 age classes, the CBCA score ranges of confirmed and unconfirmed cases overlapped by at most 1 point (Table 1). No case achieved the minimum (0) or maximum (14) score.
Age and CBCA scores were substantially and significantly correlated (r = .42, p = .012; for males, r = .43, p = .39; for females, r = .42, p = .001). To avoid confounding due to the different mean ages of confirmed and unconfirmed cases, and to increase statistical power, the subsequent analyses treated age as a covariate.
Differences in CBCA scores between confirmed and unconfirmed cases, controlling also for the effect of gender, were tested in a 2-way ANCOVA with case group (confirmed vs. unconfirmed) and gender as independent variables and age as a covariate (Table 3). The analysis showed no main effect of gender (F(1, 104) = 1.28, p = .260), a significant main effect of confirmed abuse (F(1, 104) = 166.03, p < .001), and a significant interaction (F(1, 104) = 4.04, p = .047). Subsequent analyses of the interaction showed that the simple effects of confirmed abuse were significant both in males (F(1, 104) = 70.65, p < .001) and in females (F(1, 104) = 139.49, p < .001).

Table 4
Number (%) of cases satisfying each criterion, with chi-square tests between confirmed and unconfirmed cases and between age classes.

Criterion | Confirmed (60 cases) | Unconfirmed (49 cases) | χ²ᵃ (E.S.ᵇ) | <6 years (24 cases) | 6–10 years (51 cases) | >10 years (28 cases) | χ²
1 | 57 (95%) | 39 (79.59%) | 4.718 (.236) | 20 (83.33%) | 46 (90.20%) | 25 (89.29%) | .441
2 | 43 (73.47%) | 36 (71.67%) | 1.87 (.020) | 18 (73.33%) | 38 (74.51%) | 19 (67.86%) | .416
3 | 57 (95%) | 30 (61.22%) | 7.061* (.419) | 14 (60.00%) | 46 (90.20%) | 23 (82.14%) | 10.818*
4 | 53 (88.33%) | 26 (53.06%) | 15.101* (.393) | 9 (36.67%) | 43 (84.31%) | 25 (89.20%) | 26.834*
5 | 44 (73.33%) | 13 (26.53%) | 21.844* (.466) | 7 (30.00%) | 27 (52.94%) | 21 (75.00%) | 11.772*
6 | 37 (61.77%) | 12 (24.49%) | 13.601* (.372) | 5 (20.00%) | 27 (52.94%) | 16 (57.14%) | 10.545
7 | 21 (35%) | 9 (18.37%) | 2.953* (.185) | 10 (43.33%) | 8 (15.69%) | 9 (32.14%) | 7.641
8 | 14 (23.30%) | 11 (22.45%) | .771 (.010) | 10 (40.33%) | 9 (17.61%) | 3 (10.74%) | 10.235
9 | 56 (93.33%) | 25 (51.00%) | 23.129 (.382) | 9 (39.37%) | 41 (80.40%) | 25 (90.73%) | 29.169*
10 | 25 (41.67%) | 7 (14.29%) | 8.475* (.299) | 10 (40.00%) | 18 (35.29%) | 2 (7.14%) | 9.168
11 | 20 (33.33%) | 14 (28.57%) | .106 (.051) | 4 (16.67%) | 18 (35.29%) | 11 (39.29%) | 4.204
12 | 31 (51.67%) | 3 (6.12%) | 23.989* (.489) | 2 (10.00%) | 17 (33.33%) | 14 (50.00%) | 11.001*
13 | 14 (23.33%) | 0 (0%) | 11.117* (.347) | 0 (0.00%) | 5 (9.80%) | 9 (32.14%) | 14.158*
14 | 23 (38.33%) | 3 (6.12%) | 13.685* (.376) | 16 (6.67%) | 11 (21.57%) | 13 (46.43%) | 12.882*

ᵃ With continuity correction.
ᵇ Effect size according to Cohen's convention.
* p ≤ .004.
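The table footnotes can be made concrete for criterion 1 (57 of 60 confirmed vs. 39 of 49 unconfirmed cases). The sketch below computes a 2 × 2 chi-square with Yates' continuity correction together with the phi coefficient. Our assumption, inferred only from the reported numbers, is that the χ² column uses the correction while the E.S. column is phi computed from the uncorrected χ²; under that reading the sketch reproduces 4.718 and .236 for criterion 1 (and 23.989 and .489 for criterion 12). The function name is hypothetical:

```python
import math
from typing import Tuple


def chi2_yates_and_phi(yes1: int, n1: int, yes2: int, n2: int) -> Tuple[float, float]:
    """2 x 2 test of 'criterion satisfied' by group.

    Returns (chi-square with Yates' continuity correction,
             phi computed from the uncorrected chi-square).
    """
    observed = [[yes1, n1 - yes1], [yes2, n2 - yes2]]
    n = n1 + n2
    row_totals = [n1, n2]
    col_totals = [yes1 + yes2, n - (yes1 + yes2)]
    chi2_corrected = 0.0
    chi2_uncorrected = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            deviation = abs(observed[i][j] - expected)
            chi2_uncorrected += deviation ** 2 / expected
            # Yates: shrink each |O - E| by 0.5, never below zero.
            chi2_corrected += max(deviation - 0.5, 0.0) ** 2 / expected
    phi = math.sqrt(chi2_uncorrected / n)
    return chi2_corrected, phi


# Criterion 1 from Table 4: 57/60 confirmed vs. 39/49 unconfirmed.
chi2, phi = chi2_yates_and_phi(57, 60, 39, 49)
print(f"chi2 = {chi2:.3f}, phi = {phi:.3f}")  # chi2 = 4.718, phi = 0.236
```

Phi for a 2 × 2 table is conventionally read against Cohen's benchmarks of .10, .30, and .50, which is how the effect sizes are interpreted in the next section.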

Frequency of criteria in confirmed/unconfirmed cases and in age class

To keep the experimentwise error rate at .05 for each of the 2 sets of 14 comparisons, the per-comparison alpha was set at .004 using the formula $\alpha_{ew} \le 1 - (1 - \alpha_c)^{14}$. Of the 14 criteria tested, 9 were satisfied significantly more often in confirmed than in unconfirmed cases (Table 4). No significant differences were found for criteria 1, 2, 8, 9, and 11. According to Cohen's conventions, the effect sizes for criteria 3, 5, and 12 were close to large (.50), and those for criteria 4, 6, 10, and 13 close to medium (.30).
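The per-comparison threshold follows from solving the experimentwise formula in the text for α_c, a rule usually known as the Šidák correction. A minimal sketch, assuming 14 independent comparisons per set as stated; the function names are hypothetical:

```python
def per_comparison_alpha(alpha_ew: float, m: int) -> float:
    """Solve alpha_ew = 1 - (1 - alpha_c) ** m for alpha_c (Sidak correction)."""
    return 1 - (1 - alpha_ew) ** (1 / m)


def experimentwise_alpha(alpha_c: float, m: int) -> float:
    """Probability of at least one false positive across m independent tests."""
    return 1 - (1 - alpha_c) ** m


alpha_c = per_comparison_alpha(0.05, 14)
print(round(alpha_c, 5))                          # 0.00366
print(round(experimentwise_alpha(0.004, 14), 3))  # 0.055
```

The exact threshold is about .0037; the rounded .004 used in the text corresponds to an experimentwise rate of roughly .055 rather than exactly .05.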
Chi-square tests evaluating the influence of age class on the frequency of the single criteria in the total sample showed that for seven criteria (3, 4, 5, 9, 12, 13, 14) the frequency generally increased with age (Table 4). Differences for criteria 6, 8, and 10 were marginally significant (p ≤ .010): the frequency of criterion 6 increased with age, whereas that of the other 2 criteria decreased.

Discussion

As our working hypothesis predicted, our findings in this field validation study in child witnesses of both sexes show that overall CBCA scores are higher in confirmed than in unconfirmed cases of child sexual abuse. The distribution of the criteria between the confirmed and unconfirmed groups shows that 9 of the 14 CBCA criteria are more frequently satisfied in confirmed cases. The largest differences between groups involved the criteria Quantity of Details (criterion 3), Interactions (5), and Subjective Experience (12). Independently of group (confirmed/unconfirmed), the total CBCA score increases with age in boys and girls. Taken singly, at least 7 of the 14 CBCA criteria increased in frequency with age. The exceptions were Unusual Details (8) and Misunderstood Details (10), which showed the opposite trend, albeit only marginally significant. This finding might indicate that younger children report unusual details more often because they find them especially interesting. Younger children probably also have greater difficulty in understanding elements they find unfamiliar (Lamers-Winkelman & Buffing, 1996). Another interesting finding, partly reported by others (Boychuk, 1991; Santtila, Roppola, Runtti, & Niemi, 2000), was that children younger than six never reported criterion 13 (Mental Status of the Perpetrator), possibly because at this age theory of mind is only partly developed (Carruthers, 1996).
When we investigated the reliability of the scoring method, we found that all criteria except 8 and 9 (Unusual Details and Superfluous Details) yielded high coefficients of inter-rater agreement. Low RE scores on criteria 8 and 9 were also found by Horowitz et al. (1997). The raters' written reports showed that the 2 criteria were indeed difficult to differentiate, because they often identified similar elements. For example, in 1 interview a girl stated that in the penthouse where the sexual abuse took place the abuser coughed because of the dust or the house cat's fur. This detail could be evaluated as superfluous (insofar as it says nothing about the crime) or as unusual (since not everyone is allergic to cat fur). In another case, a boy testified that while he was screaming during the abuse, the perpetrator was listening to loud music through headphones. Again, this detail is both unusual and superfluous. In yet another case, a girl stated that the abuser cleaned the leather-covered sofa with a paper handkerchief decorated with cartoons. The cartoons are both a superfluous and an uncommon detail (paper handkerchiefs are usually white). When we collapsed the scores for the 2 criteria into 1, we obtained a high RE (.89). This finding may suggest that future investigations should use a single criterion for "Unusual and Superfluous Details" (specifically, we summed the scores for the two criteria and dichotomized the sum into present, for a sum of 1 or 2, and absent, for a sum of 0).
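The merging rule described above amounts to a logical OR of the two binary scores. The sketch below, on deliberately extreme hypothetical data, shows why merging can raise agreement when raters notice the same details but disagree on the label. It assumes the commonly cited form of Maxwell's RE (2P_o − 1); both function names are hypothetical:

```python
from typing import List, Sequence


def merge_unusual_superfluous(unusual: Sequence[int], superfluous: Sequence[int]) -> List[int]:
    """Collapse criteria 8 and 9: sum the two 0/1 scores, code a sum of 1-2 as present (1), 0 as absent (0)."""
    return [1 if u + s >= 1 else 0 for u, s in zip(unusual, superfluous)]


def maxwell_re(rater_a: Sequence[int], rater_b: Sequence[int]) -> float:
    """Assumed form of Maxwell's RE for binary codes: 2 * (proportion of agreement) - 1."""
    agreements = sum(a == b for a, b in zip(rater_a, rater_b))
    return 2 * agreements / len(rater_a) - 1


# Hypothetical codes for 6 protocols: both raters notice the same details
# but label them differently (unusual vs. superfluous).
a_unusual, a_superfluous = [1, 0, 0, 1, 0, 0], [0, 1, 0, 0, 1, 0]
b_unusual, b_superfluous = [0, 1, 0, 0, 1, 0], [1, 0, 0, 1, 0, 0]

print(round(maxwell_re(a_unusual, b_unusual), 2))  # -0.33 for criterion 8 alone
merged_a = merge_unusual_superfluous(a_unusual, a_superfluous)
merged_b = merge_unusual_superfluous(b_unusual, b_superfluous)
print(maxwell_re(merged_a, merged_b))  # 1.0 after merging
```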
A strength of our study is the rigorous sample selection, which required cases to be confirmed by external criteria and avoided biases such as language deficits, mental retardation, and erroneous interview techniques. Applying the 14-item CBCA to these strictly selected free-narrative cases is of minor importance for the purposes of justice, because the CBCA is often superfluous in such cases, but it yields a new finding of major importance for research, because it suggests that the CBCA can, in most cases, predict the trial verdict in confirmed and unconfirmed child sexual abuse.
Our study has several limitations. First, the generalizability of our results is limited by the strict criteria used to select our cases from the initial pool. Only 22% of the entire pool was selected (109 of 487 cases). Most cases were contaminated by at least 1 of the factors we controlled for when selecting subjects, a topic that deserves more attention in future research. Second, because we sought to compare the CBCA results with the court's verdict, we examined the child's testimony at trial: the Italian judicial system accepts only verbal evidence given by the alleged victim at trial (previous statements to the police, the forensic psychologist, or other authorities have no value at trial), and our experience in this field suggests that the child's narrative in the courtroom, during the debate between the 2 parties, is richer and potentially more meaningful than evidence collected in other settings. The court hearing is nevertheless the last in a series of hearings and is therefore the most likely to have been altered. Although we tried to control for this possibility by excluding all interviews contaminated by erroneous interview techniques, we cannot rule out the presence of other confounding factors. A third possible limitation in generalizing our results is that, unlike some other judicial systems, the Italian system does not provide for a lay jury in cases of sexual abuse but always requires a panel of judges with expertise in the subject. Fourth, because our study examined well-defined cases, the reliability of the CBCA in less well-defined cases remains an open question. Finally, we acknowledge possible bias from the fact that the forensic experts were not independent of the researchers.
In conclusion, our data, obtained in a sample of child witnesses of both sexes, are in line with the results of previous field studies on CBCA showing that cases of confirmed abuse achieve on average higher CBCA scores than cases of unconfirmed abuse. Regardless of the type of abuse, the CBCA score depends strongly on the age of the child witness (Craig et al., 1999; Horowitz et al., 1997; Lamers-Winkelman & Buffing, 1996; Poole & Lamb, 1998), suggesting that cognitive maturation, by increasing children's narrative ability, is a crucial factor in determining the number of satisfied criteria. The dual judgement procedure (the trial verdict plus the opinion of a panel of 3 experts) reduces circularity in classification, excludes doubtful cases, and selects the better-defined cases, whose CBCA scores differentiate between confirmed and unconfirmed cases in all age classes.

References

Akehurst, L., Kohnken, G., & Hofer, E. (2001). Content credibility of accounts derived from live and video presentations. Legal and Criminological Psychology,
6, 65–83.
Anson, D. A., Golding, S. L., & Gully, K. J. (1993). Child sexual abuse allegations: Reliability of criteria-based content analysis. Law and Human Behavior, 17,
331–341.
Berliner, L., & Conte, J. R. (1993). Sexual abuse evaluations: Conceptual and empirical obstacles. Child Abuse & Neglect, 17, 111–125.
Boychuk, T. D. (1991). Criteria-based content analysis of children’s statements about sexual abuse: A field-based validation study. Unpublished doctoral
dissertation, Arizona State University.
Buck, J. A., Warren, A. R., Betman, S., & Brigham, J. C. (2002). Age differences in Criteria-Based Content Analysis scores in typical child sexual abuse interviews.
Applied Developmental Psychology, 23, 267–283.
Carruthers, P. (1996). Simulation and self-knowledge: A defense of the theory-theory. In P. Carruthers, & P. K. Smith (Eds.), Theories of mind. Cambridge:
Cambridge University Press.
Ceci, S., Kulkofsky, S., Klemfuss, J., Sweeney, C., & Bruck, M. (2007). Unwarranted assumptions about children’s testimonial accuracy. Annual Review of
Clinical Psychology, 3, 311–328.
Craig, R. A., Scheibe, R., Raskin, D. C., Kircher, J. C., & Dodd, D. H. (1999). Interviewer questions and content analysis of children’s statements of sexual abuse.
Applied Developmental Science, 3, 77–85.
Davies, G. M., Westcott, H. L., & Horan, N. (2000). The impact of questioning style on the content of investigative interviews with suspected child sexual
abuse victims. Psychology, Crime, and Law, 6, 81–97.
Esplin, P. W., Boychuk, T., & Raskin, D. C. (1988). A field validity study of Criteria-Based Content Analysis of children's statements in sexual abuse cases. Paper presented at the NATO Advanced Study Institute on Credibility Assessment, Maratea, Italy, June 1988.
Ferracuti, S., Cannoni, E., Roma, P., Carbone, G., & Mancini, P. (2002). Analisi della struttura dell’intervista nel corso di audizioni protette di minori supposte
vittime di abuso sessuale. Psichiatria dell’infanzia e dell’adolescenza, 69, 749–758.
Hershkowitz, I., Lamb, M. E., Sternberg, K. J., & Esplin, P. W. (1997). The relationships among interviewer utterance type, CBCA scores and the richness of
children’s responses. Legal and Criminological Psychology, 2, 169–176.
Horowitz, S. W., Lamb, M. E., Esplin, P. W., Boychuk, T. D., Krispin, O., & Reiter-Lavery, L. (1997). Reliability of Criteria-Based Content Analysis of child witness
statements. Legal and Criminological Psychology, 2, 11–21.
Lamb, M. E., Sternberg, K. J., Esplin, P. W., Hershkowitz, I., Orbach, Y., & Hovav, M. (1997). Criterion-Based Content Analysis: A field validation study. Child
Abuse & Neglect, 21, 255–264.
Lamers-Winkelman, F., & Buffing, F. (1996). Children's testimony in the Netherlands: A study of statement validity analysis. Criminal Justice and
Behavior, 23(2), 304–321.
Landry, K., & Brigham, J. C. (1992). The effect of training in Criteria-Based Content Analysis on the ability to detect deception in adults. Law and Human
Behavior, 16, 663–675.
Maxwell, A. E. (1977). Coefficients of agreement between observers and their interpretation. British Journal of Psychiatry, 130, 79–83.
Merckelbach, H. (2004). Telling a good story: Fantasy proneness and the quality of fabricated memories. Personality and Individual Differences, 37(7),
1371–1382.
Poole, D., & Lamb, M. (1998). Children as witnesses: The tragedy and the dilemma. In Investigative interviews of children: A guide for helping professionals (pp. 7–32). Washington, DC: American Psychological Association.
Porter, S., & Yuille, J. C. (1996). The language of deceit: An investigation of the verbal clues to deception in the interrogation context. Law and Human
Behavior, 20, 443–459.
Raskin, D. C., & Esplin, P. W. (1991). Assessment of children's statements of sexual abuse. In J. Doris (Ed.), The suggestibility of children's recollections (pp.
153–164). Washington, DC: American Psychological Association.
Raskin, D. C., & Esplin, P. W. (1991). Response to Wells, Loftus, and McGough. In J. Doris (Ed.), The suggestibility of children's recollections (pp. 153–164).
Washington, DC: American Psychological Association.
Raskin, D. C., & Esplin, P. W. (1991). Statement Validity Assessment: Interview procedure and content analysis of children’s statements of sexual abuse.
Behavioral Assessment, 13(3), 265–291.
Raskin, D. C., Esplin, P. W., & Horowitz, S. (1991). Investigative interviews and assessment of children in sexual abuse cases. Unpublished manuscript, Department
of Psychology, University of Utah.
Roma, P., Carbone, G., & Ferracuti, S. (2005). Utilità della CBCA nella valutazione dell’attendibilità del minore. Psichiatria dell’infanzia e dell’adolescenza, 72(2),
339–350.
Ruby, C. L., & Brigham, J. C. (1997). The usefulness of the Criteria-Based Content Analysis technique in distinguishing between truthful and fabricated
allegations. Psychology, Public Policy, and Law, 3, 705–737.
Santtila, P., Roppola, H., Runtti, M., & Niemi, P. (2000). Assessment of child witness statements using Criteria-Based Content Analysis (CBCA): The effects of
age, verbal ability, and interviewer’s emotional style. Psychology, Crime, and Law, 6, 159–179.
Steller, M., & Kohnken, G. (1989). Criteria-Based Content Analysis. In D. C. Raskin (Ed.), Psychological methods in criminal investigation and evidence (pp.
217–245). New York: Springer-Verlag.
Strömwall, L. A., Bengtsson, L., Leander, L., & Granhag, P. A. (2004). Assessing children's statements: The impact of a repeated experience on CBCA and RM
ratings. Applied Cognitive Psychology, 18, 653–668.
Undeutsch, U. (1982). Statement reality analysis. In A. Trankell (Ed.), Reconstructing the past: The role of psychologists in criminal trials (pp. 27–56). Deventer,
the Netherlands: Kluwer.
Undeutsch, U. (1989). The development of statement reality analysis. In J. C. Yuille (Ed.), Credibility assessment (pp. 101–120). Dordrecht, The Netherlands:
Kluwer.
Vrij, A. (2000). Detecting lies and deceit: The psychology of lying and its implications for professional practice. Chichester, England: Wiley.
Vrij, A. (2005). Criteria-Based Content Analysis: A qualitative review of the first 37 studies. Psychology, Public Policy, and Law, 11(1), 3–41.
Vrij, A., Akehurst, L., Soukara, S., & Bull, R. (2002). Will the truth come out? The effect of deception, age, status, coaching, and social skills on CBCA scores.
Law and Human Behavior, 26, 261–283.
Vrij, A., Mann, S., Kristen, S., & Fisher, R. (2007). Cues to deception and ability to detect lies as a function of police interview styles. Law and Human Behavior,
31(5), 499–518.
Wechsler, D. (1986). WISC-R. In Scala di intelligenza Wechsler per bambini—Riveduta. Firenze: Organizzazioni Speciali.
Wells, G. L., & Loftus, E. F. (1991). Commentary: Is this child fabricating? Reaction to a new assessment technique. In J. Doris (Ed.), The suggestibility of
children's recollections (pp. 153–164). Washington, DC: American Psychological Association.