
Computer Utilization and Clinical Judgment
in Psychological Assessment Reports


Elizabeth O. Lichtenberger
Alliant International University, San Diego, CA

The process of assessment report writing is a complex one, involving both
the statistical evaluation of data and clinical methods of data interpreta-
tion to appropriately answer referral questions. Today, the data generated
in a psychological assessment are often analyzed, at least in part, by
computer. In this article, the author focuses on the interaction between
the decision-making processes of human clinicians and computer-based
test interpretations. The benefits of and problems with computers in
assessment are highlighted and are presented alongside research on the
validity of automated assessment, as well as research comparing clinicians
and computers in the decision-making process. The author concludes that
clinical judgment and computer-based test interpretation each have
weaknesses. However, by using certain strategies to reduce clinicians’
susceptibility to errors in decision making and to ensure that only valid
computer-based test interpretations are used, clinicians can optimize the
accuracy of the conclusions they draw in their assessment reports.
© 2005 Wiley Periodicals, Inc. J Clin Psychol 62: 19–32, 2006.

Keywords: psychological assessment report; computer utilization; clinical judgment; report writing

The role of computers in psychological assessment has increased dramatically in the past
four decades. Although many clinicians in the 21st century embrace technologies that
have the potential to assist in conducting accurate assessments, there is ongoing contro-
versy about the validity of such computer technology and the extent to which it should be
used in the clinical process. Indeed, an entire special section of the March 2000 issue of
Psychological Assessment was devoted to “The Use of Computers for Making Judgments
and Decisions.” This discussion followed from debates of 20 or more years ago on the topic
of using computers in assessment (Matarazzo, 1983; Shectman, 1979; Tallent, 1987).

Correspondence concerning this article should be addressed to: Elizabeth O. Lichtenberger; e-mail:
drlichtenberger@aol.com

JOURNAL OF CLINICAL PSYCHOLOGY, Vol. 62(1), 19–32 (2006) © 2006 Wiley Periodicals, Inc.
Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/jclp.20197

Not only has the frequency of computer use by psychologists increased, but comput-
ers are predicted to “become increasingly important for psychological assessment” over
the next decades (Garb, 2000, p. 31). Many clinicians and researchers have contemplated
the anticipated effect of new technology in the assessment field (Garb, 2000; Groth-
Marnat, 2000; Snyder, 2000). Predicting the developments in clinical assessment, Groth-
Marnat (2000) envisions that computers will not only be used to optimize and enhance
incremental validity but will also provide new options for assessment and for evaluation
of assessment data, such as interactive virtual reality, computer analysis of narrative
information from spoken interactions, large interactive norms, and more advanced
multivariate predictive models. As technology advances and computers are increasingly
integrated into the assessment process, two goals should guide the field: (a) to more
effectively integrate clinical and mechanical prediction methods, and (b) to promote a
more productive partnership between clinicians and the researchers who develop the
computer programs used in assessment (Snyder, 2000).
To achieve those goals, clinicians and researchers must first recognize the current
state of computer utilization and its interface with clinical judgment in psychological
assessment. To that end, I will review the assets and the problems with computers in
psychological assessment. A review of some of the pertinent research on the topics of
computer-based test interpretation, actuarial prediction, and clinical judgment will be
integrated. Finally, suggestions will be offered to improve the use of computers in assess-
ment to enhance clinical judgment in the process of writing psychological reports.

Benefits of Computers in Assessment


Computers can do many things well in a psychological evaluation. They are widely used
in assessment to improve clerical efficiency, to store data, and to generate interpretive
hypotheses (Groth-Marnat, 2000; Lichtenberger, Mather, Kaufman, & Kaufman, 2004). They
can process an immense amount of data and translate these data into behavioral state-
ments that “constitute psychological reports of acceptable syntax” (Tallent, 1987, p. 96).
Administration of tests via a computerized format offers certain benefits over tradi-
tional means as well. For example, some researchers have found that asking questions
about sensitive behaviors by computer produces more candid responding as compared
with traditional methods of assessment (Feigelson & Dwight, 2000). Similarly, research
has shown that examinees perceive Internet-based personality testing to be more com-
fortable and less intimidating than traditional paper-and-pencil versions of tests, and
that they prefer it (Salgado & Moscoso, 2003). Although there is great interest in expanding
the availability of psychological assessment services through the Internet, there are a
number of potential problems associated with this venue that need to be addressed before
wider clinical use of Internet-based tests develops (Butcher, Perry, & Hahn, 2004). For
example, the equivalence between Internet-based and traditional versions of tests needs to
be evaluated, norms need to be developed for tests based on administration via the Inter-
net, and the security of test items when administered over the Internet needs to be assured
(Butcher et al., 2004). When these aspects of Internet-based assessment instruments are
further developed and fully evaluated, then this method of assessment may reveal further
benefits.
Computerized adaptive tests (typically seen in ability testing) can select particular
items to match each examinee’s level of ability, which provides higher measurement
precision than conventional tests (Hontangas, Olea, Ponsoda, Revuelta, & Wise, 2004).
Because computerized adaptive testing tailors items to each client, unnecessary items are
eliminated—therefore creating a more efficient and precise assessment (Butcher et al.,
2004; Moreno, Wetzel, McBride, & Weiss, 1984). Although computerized adaptive test-
ing has been applied successfully to ability and aptitude testing, its application to per-
sonality assessment has not been investigated as frequently. In the realm of personality
assessment, computerized adaptive testing has been most frequently applied to the Min-
nesota Multiphasic Personality Inventory (MMPI) and MMPI-2 (Ben-Porath, Slutske, &
Butcher, 1989; Roper, Ben-Porath, & Butcher, 1991). Although these studies of the MMPI
and MMPI-2 have shown that the item savings for the computerized adaptive versions
were about 27% in comparison to the conventional booklet version, questions regarding
the external validity of the computerized adaptive personality tests remain (Butcher et al.,
2004). The external validity of the computerized adaptive version of the MMPI-2 has
been demonstrated in some studies (e.g., Handel, Ben-Porath, & Watt, 1999), but the
validity of each computerized adaptive test needs to be demonstrated prior to use. Butcher
et al. (2004) note that unless time is a critical factor in the assessment (i.e., when neither
the client nor the clinician is pressed for time), there are no practical advantages to using
computerized adaptive testing in personality test administration.
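To make the adaptive logic concrete, the following is a minimal sketch of the item-selection
loop that underlies computerized adaptive testing, here under a simple one-parameter (Rasch)
model with maximum-information item selection. The item pool, parameter values, ability-update
rule, and stopping rule are illustrative assumptions for this article, not features of any
operational CAT system discussed above.

```python
import math

# Hypothetical illustration of the core CAT loop: estimate ability, pick the
# most informative unused item, update the estimate, and repeat.
# Item difficulties are invented for the example (Rasch/1PL model).

ITEM_DIFFICULTIES = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]  # b parameters

def prob_correct(theta, b):
    """P(correct) under a one-parameter logistic (Rasch) model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def item_information(theta, b):
    """Fisher information of an item at ability theta (for the 1PL: p * q)."""
    p = prob_correct(theta, b)
    return p * (1.0 - p)

def update_theta(theta, responses, step=0.5):
    """One crude gradient-style update of the ability estimate from all
    responses so far; operational CATs use ML or Bayesian estimation."""
    gradient = sum(score - prob_correct(theta, b) for b, score in responses)
    return theta + step * gradient

def run_cat(answer_item, max_items=5):
    theta, responses, remaining = 0.0, [], list(ITEM_DIFFICULTIES)
    for _ in range(max_items):
        # select the unused item that is most informative at the current theta
        b = max(remaining, key=lambda d: item_information(theta, d))
        remaining.remove(b)
        score = answer_item(b)          # 1 = correct, 0 = incorrect
        responses.append((b, score))
        theta = update_theta(theta, responses)
    return theta

if __name__ == "__main__":
    import random
    random.seed(0)
    # Simulated examinee whose true ability is 0.8
    examinee = lambda b: int(random.random() < prob_correct(0.8, b))
    print(f"Estimated ability: {run_cat(examinee):.2f}")
```

In an operational system the loop would use maximum-likelihood or Bayesian ability estimation
and a precision-based stopping rule rather than a fixed number of items; the sketch is meant
only to show why tailoring items to the examinee eliminates uninformative items.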
Additional benefits of integrating computers into the administration of tests have
been noted with multimedia computerized assessments (e.g., using full-motion video
presented on the computer), which have been shown to elicit more positive affective
reactions by test takers than traditional paper-and-pencil assessments (Richman-Hirsch,
Olson-Buchanan, & Drasgow, 2000). Thus, computerized versions of tests can be used to
increase the ecological validity of a test. Although these uses of computer technology in
the administration of tests are promising, the current use of computers in clinical assess-
ment remains centered on clerical efficiency, storing data, and generating interpretive
hypotheses.

Problems Associated With Computers in Assessment


Computer-based test interpretation (CBTI) has excited many but has also raised contro-
versy. Groth-Marnat (2000) recognized that “computers (and technologies in general)
offer wondrous solutions, but, at the same time, they open up new dilemmas for the
profession and society” (p. 359). One of the major problems of new computer technology
is that many automated assessment programs for interpreting psychological test results
are not validated (Adams & Heaton, 1985; Butcher, Perry, & Atlis, 2000; Garb, 1998,
2000; Matarazzo, 1986; Moreland, 1985). It is also possible that some automated assess-
ment programs are biased (Garb, 2000). Bias can be introduced into automated assess-
ment programs because they are developed by expert clinical judges, whose own biases
may be carried into the interpretive rules they write.
Snyder (2000) points out some problems associated with CBTIs: They (a) promote
overly passive attitudes toward clinical evaluation, (b) create an unwarranted impression
of scientific precision, (c) rely on generalities precluding differential descriptions of respon-
dents, and (d) promote misuse through increased availability to inadequately trained con-
sumers. Groth-Marnat (2000) highlights other problems that technology creates such as:
(a) increased access to information increases client confidentiality concerns; (b) software
quickly becomes outdated and clinicians may continue to use obsolete software; (c) com-
plete narrative computerized reports may tempt clinicians to present them as their own;
and (d) as technology is increasingly used, the role of the clinician in the decision making
process is potentially reduced.
An additional problem with some CBTIs is the length of reports that computers
generate and the fact that statements are made about nearly every score collected. Mat-
arazzo (1983) expressed concern about the large number of “valid-sounding statements”
that many computer reports produce. Similarly, others have commented about computer-
generated reports that present “everything you could possibly tell about a person from the
test” (Butcher, 1978, p. 950). The underlying concern is that CBTIs sometimes present an
overabundance of information, which practitioners need to wade through carefully. Typ-
ically, not all computer-generated statements should be used in a final report.
Clinicians may be tempted to reach beyond their scope of expertise when computer-
based test interpretations are at their disposal. Matarazzo (1983) pointed out that com-
puters often generate “valid-sounding narrative statements” and computers have the
“spurious appearance of objectivity and infallibility as a halo effect from the computer”
(p. 323). Thus, a danger of computer-generated reports is that having a printout of scores
interpreted by a seemingly infallible machine may make unknowledgeable clinicians
attempt to incorporate data that they are not qualified to handle. Poorly trained clinicians
armed with computer-generated reports are likely to feel that they know much more than
they do (Eichman, 1972). However, Fowler and Butcher (1986) point out that “as long
as there have been tests, there have been poorly trained professionals who have used
them badly” (p. 95). Thus, computerized tests and CBTIs do not cause the problems
themselves, and the solution to this particular problem has more to do with proper train-
ing, licensing, and ethical accountability, than changing computer-based testing.
Publishers of computer-generated reports do try to limit their services to well-trained
professionals (or those who will consult with such professionals). For example, test pub-
lishers typically require that users of their products meet certain “Test User Qualifications”
that include information such as appropriate training, professional credentials, and educa-
tional background. However, it is difficult to make certain that those who cannot or will
not use such reports responsibly do not obtain them. Companies that develop such computer pro-
grams cannot be responsible for monitoring the ethical use of their product by clinicians;
hence, professional organizations such as the American Psychological Association (APA)
and the National Association of School Psychologists (NASP) have created guidelines that
focus on the use of computers in assessment. These guidelines address the ethical, clinical,
and professional issues associated with computerized test administration, scoring, inter-
pretation, and computer-generated reports (e.g., NASP Professional Conduct Manual; NASP,
2000; Ethical Principles of Psychologists and Code of Conduct; APA, 2002).

Research Related to Computers in Assessment


The research related to computers in the assessment process has focused on two related,
but separate types of computer programs: actuarial assessment programs (involving sta-
tistical or actuarial prediction) and automated assessment programs (otherwise known as
computer-based test interpretation). Statistical prediction involves mathematical equa-
tions such as linear-regression equations and Bayesian rules, whereas automated assess-
ment consists of a series of if–then statements that are written by expert clinicians (Garb,
1998). Statistical prediction rules are usually empirically based, whereas automated assess-
ment programs are based on the beliefs of expert clinicians (who typically draw upon
published research and clinical experience). The review of research on automated assess-
ment presented here focuses on the validity of these programs, and the review of research
on statistical prediction focuses on how such prediction compares with clinical (human)
prediction.
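To illustrate the distinction drawn above, the following sketch contrasts a toy actuarial rule
(a fixed equation that converts scale scores into a predicted probability) with a toy
automated-assessment rule base (expert-style if–then statements that map scores to canned
narrative text). The scale names, weights, cutoffs, and narrative statements are invented for
illustration and are not taken from any published prediction rule or CBTI system.

```python
import math

# Illustrative contrast (all numbers, scale names, and statements are
# hypothetical): a statistical/actuarial rule is a fixed equation, whereas an
# automated-assessment (CBTI) rule base is a set of expert-written if-then
# statements producing narrative text.

def actuarial_risk(scores):
    """Toy statistical prediction: a linear combination of scale scores
    passed through a logistic transform (weights invented for the example)."""
    z = -1.5 + 0.03 * scores["scale_a"] + 0.02 * scores["scale_b"]
    return 1.0 / (1.0 + math.exp(-z))

def automated_narrative(scores):
    """Toy automated assessment: if-then rules keyed to T-score cutoffs,
    of the kind an expert clinician might write for a CBTI system."""
    statements = []
    if scores["scale_a"] >= 65:
        statements.append("Elevations on Scale A suggest somatic concerns.")
    if scores["scale_b"] >= 65:
        statements.append("Elevations on Scale B suggest depressive features.")
    if not statements:
        statements.append("No clinical-range elevations were obtained.")
    return statements

client = {"scale_a": 72, "scale_b": 58}
print(f"Actuarial predicted probability: {actuarial_risk(client):.2f}")
for sentence in automated_narrative(client):
    print(sentence)
```

The contrast is the point: the first function is empirically fitted and yields a number to be
compared against a criterion, whereas the second encodes expert judgment and yields prose that
still requires clinical evaluation.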

Validity of Automated Assessment


The validity of CBTI systems depends on the validity of the tests for which they are
developed. If a measure is valid, then “the validity of the CBTI depends on how closely
the developer of the system conforms to the actuarial finding for the instrument” (Snyder,
2000, p. 54; see also Butcher, 1995). However, most CBTI systems are not based solely
on actuarial data; expert clinicians typically help develop the narrative statements asso-
ciated with the actuarial data. Errors are created when miscommunications occur between
programmers of the CBTI system and the expert clinicians (Snyder, 2000). To determine
the validity of CBTI systems, ideally, external criterion validity data should be evaluated.
Such data examine the congruence of statements from the computer-based narrative with
independent observations or ratings of the test respondent. Unfortunately, few such
external criterion studies of CBTI systems exist (Garb, 2000). Another, less rigorous
type of validity study is also reported for CBTIs: consumer (or customer) satisfaction
studies (Moreland, 1985; Snyder, 2000). Although consumer satisfaction studies are not considered
by some to be as valuable as external criterion studies, they do merit consideration.
The field of personality assessment has offered the most research on the validity of
CBTI systems, using both consumer satisfaction and external criterion studies. Typically,
consumer satisfaction studies require consumers to rate how accurate, clear, and useful a
CBTI is (Snyder, 2000; Snyder, Widiger, & Hoover, 1990). The MMPI and MMPI-2
narratives have been studied most frequently in terms of consumer satisfaction. One of
the clear results of the consumer satisfaction studies is that consumers assign greater
accuracy to narratives with a larger number of nonspecific statements (i.e., statements
that could accurately apply to just about anyone—“the Barnum Effect”; Moreland, 1985;
Snyder, 2000). Although some studies have tried to control for the Barnum Effect by
contrasting ratings of bogus and actual CBTIs, Snyder and colleagues (1990) noted that
even this methodology can be compromised by “halo effects.” Hoover and Snyder (1991)
tried to reduce the confound by interweaving bogus statements with real statements within
a single report (using the Marital Satisfaction Inventory), but this methodology has been
infrequently used by other researchers for studying other measures (Snyder, 2000). Other
methodological considerations can help improve future consumer satisfaction validity
studies: rate specific statements and paragraphs in addition to the global narrative, assess
interrater reliability, use a large representative sample of users, and tap a range of behav-
iors covered by the interpretive system (Moreland, 1985; Snyder, 2000; Snyder et al.,
1990).
In terms of external criterion studies, the MMPI and MMPI-2 narratives have been
evaluated most extensively (Butcher et al., 2000; Moreland, 1985, 1987). Results from
such studies have been mixed, as the small sampling of studies reviewed here suggests.
For example, Hedlund, Morgan, and Master (1972) found that there was a false positive
rate of 62% when clinicians’ ratings based on discharge summaries were compared to
narrative statements produced from MMPI reports. This contrasted with data reported by
Butcher et al. (1998) that examined the utility of computer-based MMPI-2 reports in
Australia, France, Norway, and the United States. Butcher et al. (1998) found that in two
thirds of the records evaluated, clinicians judged 80 to 100% of the computer-generated
narrative statements about the MMPI-2 to be appropriate, indicating that those reports
were “highly accurate.” Additionally, in 87% of these MMPI-2 reports, at least 60% of the
computer-generated narratives were believed to be “appropriate.” The validity of MMPI
narrative reports varied in a study by Eyde, Kowal, and Fishburne (1991). The authors
found that validity depended on whether the MMPI profile was in the clinical versus
subclinical range: the subclinical (normal-range) cases had a higher percentage of
narrative sentences that could not be rated, whereas the inpatient clinical cases had a
low percentage of such sentences. Shores
and Carstairs (1998) demonstrated the validity of using MMPI-2 computerized reports
in detecting malingering. When groups of students were instructed to “fake-good,”
“fake-bad,” or were given the standard MMPI-2 instructions, the MMPI-2 computerized
report correctly classified 94% of the fake-good group, 100% of the fake-bad group, and
78% of the group completing the MMPI-2 with standard instructions (Shores & Carstairs,
1998). This study supports the use of the MMPI-2 computerized report in the forensic
realm for the detection of malingering.
Comprehensive reviews of CBTI validity are available elsewhere (e.g., Moreland,
1987; Butcher et al., 2000). However, those reviewed here highlight the fact that the
validity of individual CBTI systems varies widely. Butcher and colleagues (2000) remind
clinicians that although a computer-based program may be validated in a certain setting,
it is not necessarily valid or appropriate for other clinical settings (e.g., clinical vs. foren-
sic settings). Thus, clinicians are urged, where possible, to evaluate carefully the validity
of each CBTI they use in various settings before integrating the data into their written
assessment reports.

Comparison of Clinicians and Computers in the Decision-Making Process


In a classic text, Meehl (1954) compared the accuracy of clinical and actuarial prediction
and concluded that actuarial judgment was more accurate than clinical judgment. The
decades of research that have followed have generally supported Meehl’s
conclusion. “Mechanical prediction (whether from statistical, actuarial, or alternative
algorithmic bases) has the potential to outperform clinical prediction by individual judges
on the basis of subjective processing of the same data, given reasonably reliable and valid
indicators of the criterion” (Snyder, 2000, p. 57).
Garb (1998) completed a thorough review of research on clinical judgment. He con-
cluded that clinical judgment is often of questionable accuracy. Several factors may affect
the accuracy of clinicians’ judgment: primacy effects, confirmatory bias, hindsight bias,
attribution bias, not taking into consideration base-rate frequency, and the clinician’s
personality type. In research reviewed by Garb (1998) on clinical versus statistical pre-
diction, often clinicians were given limited information with which to make diagnoses
and personality ratings. Given that clinicians conducting an assessment typically have
access to the test data, as well as behavioral observations, clinical interview data, and
medical and psychological history, it is reasonable to question whether clinicians’ accu-
racy would improve in these studies if given more data. However, Garb (1984) noted that
the validity of clinical judgments does not always increase when clinicians are given
additional information, although results from empirical studies do indicate that the
validity of clinical judgments increases when demographic and history data are added to
psychometric data. Some research shows that when clinicians have access to clinical interview data as
an additional predictor, they do relatively worse compared with mechanical prediction
(Grove, Zald, Lebow, Snitz, & Nelson, 2000). However, Westen and Weinberger (2004)
found that clinicians can provide valid and reliable data if their inferences are quantified
using psychometric instruments designed for expert observers. Future research will need
to be conducted to determine the incremental validity of such quantification of clinician’s
inferences (e.g., a Q-sort instrument akin to a clinician-report MMPI) in predicting clin-
ically relevant client-report measures such as the MMPI-2.
Grove and colleagues (2000) conducted a meta-analytic study of clinical versus
mechanical prediction on 136 studies in the fields of psychology, medicine, and human
behavior. On average, mechanical-prediction techniques were 10% more accurate than
clinical predictions. However, clinical predictions were “often as accurate as mechanical
predictions” (Grove et al., 2000, p. 19), and in 6 to 16% of the studies, clinical predictions
were more accurate. The overall superiority of the mechanical predictions did not vary
depending on judgment task, type of judge, or judges’ amounts of experience. Although
mechanical prediction does seem generally more accurate than clinical prediction, for
many decisions or referral questions, mechanical models are not available.

Goal for the Future: Improve Clinician–Computer Interaction to Enhance
Clinical Judgment in Assessment Reports

The assessment process is not simply an analysis of numbers and a comparison of values
to base rates. Assessment involves testing, obtaining a data set, and generating a series of
hypotheses based on that data. However, it is not a simple process: “Assessment is a
complex set of activities that seeks solutions to specific problems, whereas testing fits a
straightforward actuarial paradigm. Testing is readily adapted to an automated data pro-
cessing system, whereas the complex of assessment procedures cannot be so managed”
(Tallent, 1987, p. 95). Both psychometrics and clinical methods are important in assess-
ment. As noted, computers do an excellent job with the actuarial aspects of assessment,
and they even do a decent job of generating hypotheses about behavior, personality, and
cognitive ability. However, they have not yet been designed to answer complex referral
questions that necessarily must draw on data from multiple domains such as behavioral
observations, medical history, family history, personality functioning, as well as cogni-
tive functioning. Although clinical judgment is imperfect, trained clinicians armed with
theory and knowledge of tests and psychometrics are the key figures in the assessment
process. Clinicians weave together data from diverse sources and use problem solving
to formulate the best diagnostic picture of a client, to develop the most useful treatment
plan, and to present a complete picture of that client in a written report.
Psychological assessment is a process of solving problems (often answering com-
plex questions), but it is a variable process that “cannot be reduced to a finite set of
specific rules” (Maloney & Ward, 1976, p. 5). Computer-based test interpretations are
based mainly on statistical (actuarial) predictions derived from a set of rules (e.g., if–then
statements) linked to narrative interpretations designed by clinical experts. Although it is
debatable whether answering the questions in a complex psychological assessment can
be thoroughly done by a computer program that operates based on a finite set of rules, it
is clear that computer-derived data are important, valuable, and typically
fairly accurate (e.g., Grove et al., 2000). In assessment, the actuarial component must be
integrated with the clinical component as “scores are not meant to replace psychological
thinking; they are designed to facilitate it, and as such they can be relegated to the back-
ground when this is warranted by the logic of the problem” (Schafer, 1949, p. 331).
We need to determine how to create an optimal interaction between clinicians and
computers so that the best of the actuarial and the best of the clinical worlds are inte-
grated. Each client presents a unique situation and clinicians must be prepared to focus on
the essentials of each individual’s situation. For each individual, computer-generated
reports typically include a multitude of hypotheses and a plethora of narrative statements.
Some of those statements are accurate but too broad (e.g., they could accurately describe
anyone), whereas other statements are specific hypotheses that answer a specific facet of
a client’s referral question but need further data for validation. Because of the large
amount of data generated during an assessment (both computer-generated and clinician-
generated), diagnosticians typically know more about clients than they should include in
their written reports. “In fact, part of the task of all psychologists is to choose just what
needs to be communicated out of the wealth of material at their disposal” (Shect-
man, 1979, p. 738). Thus, human evaluators must use clinical methods to formulate a
psychological report that pulls the most relevant, valid information from the testing
(computer-generated and other information) and applies it to the specific context of the
client’s life.
For the combination of computer-based methods (mechanical prediction and CBTI)
and human clinical methods to produce the strongest, most accurate written report,
clinicians need to critically evaluate both: the computer-based test interpretations
themselves and their own functioning as data processors. Any CBTI should be evaluated
carefully before its data are integrated into an assessment report. Snyder (2000, p. 56) sum-
marized the work of Butcher (1995), Moreland (1985), and Roid (1985) and listed several
points to consider in evaluating CBTI systems. Each of Snyder’s evaluative points speaks
to the validity of the CBTI and the validity of the test upon which the system was based.
It is important that a CBTI system has independent scholarly reviews and that the system
user’s guide describes the interpretive approach and the basis for decision rules.
In addition to evaluating CBTIs, clinicians need to evaluate themselves in the clini-
cal decision making process. Many studies have shown that clinicians are unaware of (or
underestimate) sources of error in their own decisions (Garb, 1998). The errors that cli-
nicians face in their own decision making process during a psychological assessment
were clearly articulated by Garb (1998). The following are examples of some of the
errors that clinicians encounter (as summarized by Snyder, 2000):

• Clinicians’ judgments are unduly influenced by data collected early or late in the
assessment (primacy and recency effects).
• Clinicians’ judgments are unduly influenced by information already available.
• Clinicians tend to use and remember information that confirms, but does not refute, a
hypothesis.
• Clinicians attend to information that is most readily available or easily recalled.
• Clinicians have the tendency to assign high probabilities to prototypical combina-
tions of characteristics.
• Clinicians have the tendency to recall occasions when two phenomena co-vary
rather than occasions when they do not.
• Clinicians overestimate the accuracy of their explanation for an observation.

Adding to the errors in clinical judgment is the fact that clinicians often do not
receive adequate feedback on the accuracy of their judgments (Einhorn & Hogarth, 1978;
Garb, 1998). This lack of feedback gives them little opportunity to modify poor judgment
practices. However, if clinicians do evaluate themselves during the decision-making pro-
cess in assessments, they may be able to reduce their susceptibility to errors. According to
the research reviewed by Garb (1998), clinicians can become more accurate when they
use certain strategies to counteract biases. These strategies include, for example, evalu-
ating alternatives when making judgments, incorporating situational determinants along
with intrapersonal ones, and decreasing reliance on personal memory.

Strategies for Optimizing Computers and Clinical Judgment

As clinicians of the 21st century, we sit poised with exciting technological advances at
our fingertips. To date, the research indicates that although statistical-prediction pro-
grams often offer valid summaries of psychological assessment data, these programs are
not currently capable of answering complex referral questions that are presented by many
clients. Computer-based test interpretations yielding complete narrative reports are
available with many psychological tests from personality measures to measures of cog-
nitive ability. Many of these CBTIs have not been validated, so clinicians, with their
fallible judgment, remain the key component in writing psychological assessment
reports. Clinicians must therefore take ultimate responsibility for conscientiously using
the technology that is available to them, as well as using the techniques available to
improve their own clinical judgment when formulating a written assessment report. In
the remaining paragraphs, strategies for optimizing the interaction
between clinicians and CBTIs are suggested for three domains: clinicians, training, and
test developers.

Strategies for Clinicians

Data from a test or computerized narratives are only as good as the clinician who uses
them. A final report presented to a client or referral source will be most useful if it is well
written, uses clear and concise language, and offers direct, specific recommendations. Skilled cli-
nicians create such well-written reports by using tenets of report writing articulated in
many books and professional articles. Much has been written about assessment report
writing and information from such sources should be incorporated when writing assess-
ment reports, whether they are based on raw data from the test itself or on a CBTI. For the
interested reader, information on report writing can be found by consulting some of the
following: Butcher (2002); Garb (1992); Groth-Marnat (2003); Harvey (1997); Lichten-
berger et al. (2004); and Tallent (1993).
Although the future holds promise of more interaction between humans and comput-
ers in the assessment process, currently these interactions are confined to written mate-
rial. As they exist today, computer-based test interpretations do not account for potentially
critical nonverbal cues in assessment (e.g., speech patterns, vocal tone, facial expres-
sions), or data from multiple other sources such as clinical history and other tests admin-
istered in an assessment battery (Butcher et al., 2000). Thus, clinicians must supplement
the computer-based test interpretations (at least those that are deemed valid) with key
behavioral observations, clinical history, and additional test data when answering com-
plex referral questions posed in each individual assessment. The most accurate conclu-
sions drawn in a report will be made by those clinicians who bear in mind that their own
judgments need to be double-checked for accuracy.
Clinicians can work through several steps when proceeding from the CBTI narrative to
the written report in order to produce the most accurate and least biased document. For
example, such steps may include the following (a brief sketch of this workflow appears
after the list):

1. Evaluate each major statement or hypothesis made in the CBTI narrative to deter-
mine if there are pieces of data from other sources that refute or confirm the
hypotheses. Garb (1998) stated “Results from empirical studies indicate that the
validity of clinical judgments does increase when demographic and history data
are added to psychometric data” (p. 218). In addition, one of the strategies against
bias suggested by Garb (1998) for clinicians is to “consider alternatives when
making judgments.” Thus, carefully evaluating hypotheses generated from the
CBTI with other data follows this suggested strategy against bias. A hypothesis
derived from the CBTI can be confirmed with one piece of supplementary data,
but two pieces of confirmatory data are preferable. If one or more pieces of data
contradict the hypothesis, then that CBTI-derived hypothesis may not be valid for
that client. You may reject that hypothesis based on the contradictory data or you
may choose to administer supplemental testing (or collect other collateral data)
before making a final determination about the hypothesis. Other sources of data
to review in this process include:
a. Clinical interview data
b. Collateral data (e.g., medical records, school records, information from
psychologist)
c. Behavioral observations during the assessment (verbal and nonverbal)
d. Supplemental test data
e. Prior test data
2. Determine whether other hypotheses exist to explain the behavior or symptoms in
addition to those derived from the CBTI narrative (e.g., medical or organic expla-
nations, drug or alcohol use, situational determinants, other domains not tapped
by your CBTI). Evaluate whether these alternative hypotheses have adequate
support by examining other sources of data as you did in step 1 above.
3. Review written notes taken during the assessment to decrease reliance on per-
sonal memory. Because, as Garb (1998) noted, faulty clinician memory adds to bias
in clinical judgment, it is important to develop and double-check written notes
throughout the assessment process.
4. When available, use computer-based (or other) actuarial prediction formulas to
evaluate the supplemental data that are not included in the CBTI narrative.
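The following is a minimal sketch of how the evaluation described in step 1 might be organized;
the data-source labels, the preference for two confirmations, and the decision wording are
illustrative assumptions rather than a validated procedure.

```python
# Assumed structure (not a published tool) for step 1 above: tally confirming
# and contradicting evidence for each CBTI hypothesis across the other data
# sources before deciding what enters the written report.

from dataclasses import dataclass, field

SOURCES = ["interview", "collateral", "observation", "supplemental_tests", "prior_tests"]

@dataclass
class HypothesisCheck:
    statement: str
    confirming: list = field(default_factory=list)    # (source, note) pairs
    contradicting: list = field(default_factory=list)

    def add(self, source, note, supports):
        assert source in SOURCES, "use one of the data sources listed in step 1"
        (self.confirming if supports else self.contradicting).append((source, note))

    def decision(self):
        if self.contradicting:
            return "reject or gather more data"            # contradiction found
        if len(self.confirming) >= 2:
            return "retain (preferred: two or more confirmations)"
        if len(self.confirming) == 1:
            return "retain tentatively (one confirmation)"
        return "hold: no independent support yet"

# Hypothetical example
h = HypothesisCheck("Reported anxiety interferes with sustained attention.")
h.add("interview", "Client describes worry during work tasks.", supports=True)
h.add("observation", "Fidgeting and requests to repeat instructions.", supports=True)
print(h.decision())
```

The value of such a structure is simply that it forces each CBTI-derived hypothesis to be
checked against independent data before it reaches the written report.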

Strategies for Training

Although specific strategies to aid clinicians in integrating CBTIs into their final written
report are useful, these suggestions will only be valuable if the clinician first has ade-
quate training in assessment. Any test in the field of psychology, whether computerized
or not, can be misused and misinterpreted by those who are not properly trained. Respon-
sibility for adequate training in assessment falls on the shoulders of many: formal doc-
toral programs in psychology, supervisors of trainees, ethics boards, test developers, and
clinicians themselves. Adequate training begins with formal schooling and supervised
clinical experience. Training programs need to teach assessment methods that include
CBTI narratives, as well as how to evaluate and use such narratives properly. Keeping up-to-date
with the latest assessment technology in the classroom is vital. Similarly, supervisors
need to ensure that trainees are not simply cutting and pasting parts of a CBTI narrative
for use in their final reports. Supervisors are responsible for guiding their trainees, through
hands-on learning, in how to adapt CBTIs appropriately and integrate them with other
essential data to answer complex referral questions. Ethics boards and licensing boards need to
address the issues of CBTIs as well. As mentioned earlier, ethical guidelines do exist for
the use of computers in assessment, but it is critical that these guidelines are kept up-to-
date as the field and the technology progresses. Test developers need to restrict as much
as possible the purchase of their products to only people with the appropriate credentials
and training. “Test User Qualifications” offer good general guidelines regarding who can
make a purchase, but perhaps test developers could offer more in-depth training on some
of their computerized assessment instruments for those clinicians who feel that they
need additional training before being able to use the instrument responsibly. At the fore-
front of training issues are the clinicians themselves. Each individual needs to be aware
of his or her knowledge base, to recognize when more training on a computerized test (or
a new version of a test) is necessary, and to know when to refer to another clinician if his
or her skills in the relevant technology are not up to par. To
help improve clinical skills, clinicians may consider obtaining supervision or consulta-
tion on using CBTIs in writing reports as such feedback has been shown to increase
accuracy of clinicians’ judgments (Garb, 1998).

Strategies for Test Developers

Clinicians may actively implement methods to decrease their biases, fine-tune their judg-
ment, and evaluate the CBTIs they use, as suggested in the above section. However,
developers of computerized tests can also take action and modify their software to opti-
mize the interaction of clinicians with the programs. Ideally, CBTI reports or the software
itself should incorporate procedures that assist clinicians in evaluating and modifying
CBTI narratives. For example, pop-ups or decision trees could be built into the software
so that clinicians could interact with the program to answer a series of questions to
determine whether the CBTI hypotheses for each individual client are refuted or con-
firmed by other sources. Even qualitative data could be quantified and entered into an
interpretive algorithm to help refute or support CBTI hypotheses. Such additions to a
program could also help remind clinicians to consider alternative hypothetical explana-
tions for the results. Additions to computer programs such as those suggested here should,
of course, be adopted only if the empirical research literature demonstrates that demo-
graphic, medical, or life-history variables moderate the relation between test scores and
the targeted hypothesis.
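As one illustration of how such interactive checks might look inside CBTI software, the sketch
below prompts the clinician about confirming and contradicting data before a hypothesis enters
the narrative, and it appends a moderator-based note only when that moderator is flagged as
empirically supported. The prompts, moderator names, and flags are hypothetical and are not
part of any existing CBTI product.

```python
# Hypothetical prompt flow a CBTI program could embed: before a hypothesis
# enters the narrative, the software asks the clinician about corroborating
# data, and it applies a moderator-based note only when that moderator is
# flagged as empirically supported.

MODERATORS = {
    # invented flags; in practice each entry would require research support
    "recent_medical_condition": {"empirically_supported": True},
    "preferred_coping_style": {"empirically_supported": False},
}

def review_hypothesis(statement, ask=input):
    confirmed = ask(f'Does any non-test data source support: "{statement}"? (y/n) ')
    refuted = ask(f'Does any non-test data source contradict: "{statement}"? (y/n) ')
    if refuted.strip().lower().startswith("y"):
        return None                          # drop or defer the hypothesis
    note = "" if confirmed.strip().lower().startswith("y") else " [unconfirmed]"
    return statement + note

def apply_moderators(statement, client_flags):
    for name in client_flags:
        if MODERATORS.get(name, {}).get("empirically_supported"):
            statement += f" (consider moderator: {name})"
    return statement

# Scripted answers stand in for the clinician's responses in this example:
# the hypothesis is supported ("y") and not contradicted ("n").
scripted = iter(["y", "n"])
draft = review_hypothesis("Elevated scores suggest difficulty with sustained attention.",
                          ask=lambda prompt: next(scripted))
if draft:
    print(apply_moderators(draft, ["recent_medical_condition"]))
```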
Software could be further refined to incorporate research on optimizing clinical judg-
ment in common clinical situations (assessing suicide risk, dangerousness, optimizing
interventions, selecting the optimal medication, predicting recidivism, etc.) by asking
clinicians for information regarding demographics, client characteristics, and history of
the problem, as well as relevant test results. Again, such modifications should be made
only if these types of data have been shown empirically to moderate the relationship
between test scores and dangerousness, outcome of interventions, and so on. Software has been
developed to aid in the complex process of treatment planning; it could be used as a
model for developing analogous components for CBTI software used in complex assess-
ments. For example, the Systematic Treatment Selection (STS) system
(www.systematictreatmentselection.com) developed by Beutler and Williams (1999) follows rigorous,
research-based strategies to assist clinicians in designing optimal treatment plans for
individual psychotherapy by walking clinicians through a series of question–answer steps
posed by the STS computer program. Other examples of software include the Level of
Service Inventory-Revised (LSI-R; Andrews & Bonta, 1995), which helps predict parole
outcome, success in correctional halfway houses, institutional misconducts, and recidi-
vism based on information about an offender obtained through criminal records, inter-
views, etc. Another example is the Suicide Probability Scale Computer Report (Cull &
Gill, 1988), which is an empirically validated measure of suicide potential that a clinician
can obtain based on client responses to 36 items that describe particular feelings and
behaviors.
In summary, clinicians need to take CBTI one step further by manually evaluating
the appropriateness and accuracy of CBTI hypotheses, and test developers need to
incorporate methods by which the software itself assists clinicians in making such eval-
uations; in doing so, we will optimize the assessment process. As computer-based test
interpretations are advanced and judgments of human clinicians are refined, the marriage
of the two in an assessment report will continue to be strengthened throughout the 21st
century.

References
Adams, K.M., & Heaton, R.K. (1985). Automated interpretation of neuropsychological test data.
Journal of Consulting and Clinical Psychology, 53, 790–802.
American Psychological Association. (2002). Ethical principles of psychologists and code of con-
duct. American Psychologist, 57(12), 1060–1073.
Andrews, D., & Bonta, J. (1995). Level of Service Inventory—Revised for Windows. North
Tonawanda, NY: Multi-Health Systems.
Ben-Porath, Y.S., Slutske, W.S., & Butcher, J.N. (1989). A real-data simulation of computerized
administration of the MMPI. Psychological Assessment: A Journal of Consulting and Clinical
Psychology, 1, 18–22.
Beutler, L.E., & Williams, O.B. (1999). Systematic treatment selection (STS): A software package
for treatment planning [Computer software]. Ventura, CA: Center for Behavioral Health
Technology.
Butcher, J.N. (1978). Review of Minnesota Multiphasic Personality Inventory: Behaviordyne Psy-
chodiagnostic Laboratory Services. In O.K. Buros (Ed.), Eighth mental measurements year-
book (pp. 949–952). Highland Park, NJ: Gryphon.
Butcher, J.N. (1995). How to use computer-based reports. In J.N. Butcher (Ed.), Clinical person-
ality assessment: Practical approaches. New York: Oxford University Press.
Butcher, J.N. (2002). How to use computer-based reports. In J.N. Butcher (Ed.), Clinical person-
ality assessment: Practical approaches (2nd ed.). New York: Oxford University Press.
Butcher, J.N., Berah, E., Ellersten, B., Miach, P., Lim, J., Nezami, E., et al. (1998). Objective
personality assessment: Computer-based MMPI-2 interpretation in international clinical set-
tings. In C. Belar (Ed.), Comprehensive clinical psychology: Sociocultural and individual
differences (pp. 277–312). New York: Elsevier.
Butcher, J.N., Perry, J.N., & Atlis, M.M. (2000). Validity and utility of computer-based interpreta-
tion. Psychological Assessment, 12(1), 6–18.
Butcher, J.N., Perry, J., & Hahn, J. (2004). Computers in clinical assessment: Historical develop-
ments, present status, and future challenges. Journal of Clinical Psychology, 60(3), 331–345.
Cull, J.G., & Gill, W.S. (1988). Suicide Probability Scale disk [Computer software]. Los Angeles,
CA: Western Psychological Services.
Eichman, W.J. (1972). Minnesota Multiphasic Personality Inventory: Computerized scoring and
interpreting services. In O.K. Buros (Ed.), Seventh mental measurements yearbook (pp. 253–
255). Highland Park, NJ: Gryphon.
Einhorn, H.J., & Hogarth, R.M. (1978). Confidence in judgment: Persistence of the illusion of
validity. Psychological Review, 85, 395–416.
Eyde, L., Kowal, D.M., & Fishburne, F.J. (1991). The validity of computer-based test interpreta-
tions of the MMPI. In T.B. Gutkin & S.L. Wise (Eds.), The computer and the decision-making
process (pp. 75–123). Hillsdale, NJ: Erlbaum.
Feigelson, M.E., & Dwight, S.A. (2000). Can asking questions by computer improve the candid-
ness of responding? A meta-analytic perspective. Consulting Psychology Journal: Practice and
Research, 52(4), 248–255.
Fowler, R.D., & Butcher, J.N. (1986). Critique of Matarazzo’s views on computerized testing: All
sigma and no meaning. American Psychologist, 41, 94–96.
Garb, H.N. (1984). The incremental validity of information used in personality assessment. Clinical
Psychology Review, 4, 641–655.
Garb, H.N. (1992). The debate over the use of computer based test reports. The Clinical Psychol-
ogist, 45, 95–100.
Garb, H.N. (1998). Studying the clinician: Judgment research and psychological assessment. Wash-
ington, DC: American Psychological Association.
Garb, H.N. (2000). Computers will become increasingly important for psychological assessment:
Not that there’s anything wrong with that! Psychological Assessment, 12(1), 31–39.

Groth-Marnat, G. (2000). Visions of clinical assessment: Then, now, and a brief history of the
future. Journal of Clinical Psychology, 56(3), 349–365.
Groth-Marnat, G. (2003). Handbook of psychological assessment (4th ed.). New York: Wiley.
Grove, W.M., Zald, D.H., Lebow, B.S., Snitz, B.E., & Nelson, C. (2000). Clinical versus mechan-
ical prediction: A meta-analysis. Psychological Assessment, 12(1), 19–30.
Handel, R.W., Ben-Porath, Y.S., & Watt, M. (1999). Computerized adaptive assessment with the
MMPI-2 in a clinical setting. Psychological Assessment, 11, 369–380.
Harvey, V.S. (1997). Improving readability of psychological reports. Professional Psychology:
Research and Practice, 28, 271–274.
Hedlund, J.L., Morgan, D.W., & Master, F.D. (1972). The Mayo Clinic automated MMPI program:
Cross-validation with psychiatric patients in an army hospital. Journal of Clinical Psychology,
28, 505–510.
Hontangas, P., Olea, J., Ponsoda, V., Revuelta, J., & Wise, S.L. (2004). Assisted self-adaptive test-
ing: A comparative study. European Journal of Psychological Assessment, 20(1), 2–9.
Hoover, D.W., & Snyder, D.K. (1991). Validity of the computerized interpretive report for the
Marital Satisfaction Inventory: A customer satisfaction study. Psychological Assessment, 3,
213–217.
Lichtenberger, E.O., Mather, N., Kaufman, N.L., & Kaufman, A.S. (2004). Essentials of assess-
ment report writing. New York: Wiley.
Maloney, M.P., & Ward, M.P. (1976). Psychological assessment: A conceptual approach. New
York: Oxford University Press.
Matarazzo, J.D. (1983, July 22). Computerized psychological testing. Science, 221, 323.
Matarazzo, J.D. (1986). Computerized clinical psychological test interpretations: Unvalidated plus
all mean and no sigma. American Psychologist, 41, 14–24.
Meehl, P.E. (1954). Clinical versus statistical prediction: A theoretical analysis and review of the
evidence. Minneapolis: University of Minnesota Press.
Moreland, K.L. (1985). Validation of computer-based test interpretations: Problems and prospects.
Journal of Consulting and Clinical Psychology, 53, 816–825.
Moreland, K.L. (1987). Computerized psychological assessment: What’s available. In J.N. Butcher
(Ed.), Computerized psychological assessment: A practitioner’s guide (pp. 64–86). New York:
Basic Books.
Moreno, K.E., Wetzel, C.D., McBride, J.R., & Weiss, D.J. (1984). Relationship between correspond-
ing Armed Services Vocational Aptitude Battery (ASVAB) and computerized adaptive testing
(CAT) subtests. Applied Psychological Measurement, 8, 155–163.
National Association of School Psychologists. (2000). Professional conduct manual. Principles for
professional ethics. Bethesda, MD: Author.
Richman-Hirsch, W.L., Olson-Buchanan, J.B., & Drasgow, F. (2000). Examining the impact of
administration medium on examinee perceptions and attitudes. Journal of Applied Psychol-
ogy, 85(6), 880–887.
Roid, G.H. (1985). Computer-based test interpretation: The potential of quantitative methods of
test interpretation. Computers in Human Behavior, 1, 207–219.
Roper, B.L., Ben-Porath, Y.S., & Butcher, J.N. (1991). Comparability and validity of computerized
adaptive testing with the MMPI-2. Journal of Personality Assessment, 65, 358–371.
Salgado, J.F., & Moscoso, S. (2003). Internet-based personality testing: Equivalence of measures
and assessees’ perceptions and reactions. International Journal of Selection and Assessment,
11, 194–205.
Schafer, R. (1949). Psychological tests in clinical research. Journal of Consulting Psychology, 13,
328–334.
Shectman, F. (1979). Using computers in clinical practice. New York: Haworth.
Shores, E.A., & Carstairs, J.R. (1998). Accuracy of the MMPI-2 computerized Minnesota Report in
identifying fake-good and fake-bad response sets. Clinical Neuropsychologist, 12(1), 101–106.

Snyder, D.K. (2000). Computer-assisted judgment: Defining strengths and liabilities. Psychologi-
cal Assessment, 12(1), 52–60.
Snyder, D.K., Widiger, T.A., & Hoover, D.W. (1990). Methodological considerations in validating
computer-based test interpretations: Controlling for response bias. Psychological Assessment,
2, 470–477.
Tallent, N. (1987). Computer-generated psychological reports: A look at the modern psychometric
machine. Journal of Personality Assessment, 51(1), 95–108.
Tallent, N. (1993). Psychological report writing. Englewood Cliffs, NJ: Prentice Hall.
Westen, D., & Weinberger, J. (2004). When clinical description becomes statistical prediction.
American Psychologist, 59(7), 595–613.
