

Copyright 2004, Lawrence Erlbaum Associates, Inc.

Rorschach Assessment of Changes Following Psychotherapy: A Meta-Analytic Review

Cato Grønnerød
Department of Psychology
University of Oslo

I examined Rorschach assessment of personality changes following psychotherapy. I conducted a comprehensive literature search to find all studies using the Rorschach method at least twice for the same participant in connection with psychotherapy. I conducted meta-analyses for 38 samples, and I performed regression analyses to identify moderating factors. Across all Rorschach scores, the total weighted sample effect size was r = .26, and nearly half the variables obtained effect sizes higher than .30. Several moderating factors were found. Most important, effect sizes increased with longer and more intensive therapy. More concern for interscorer reliability was associated with larger effect sizes, whereas a higher degree of scorer blinding was associated with smaller effect size magnitudes. Predicted levels of change based on the regression models indicated substantial increases in effect size with longer therapies. The data indicate that many elements in the Rorschach are valid indicators of change despite the poor reputation the method has acquired within psychotherapy research.

The Rorschach method is viewed by many as the clinical tool par excellence. Critics, on the other hand, have been quite strong in their dismissal of the Rorschach in clinical and forensic settings (e.g., Garb, 1999; Garb, Wood, & Nezworski, 2000; Wood, Lilienfeld, Garb, & Nezworski, 2000). In psychotherapy research, the Rorschach method, along with other so-called projective instruments, has largely been discarded in recent years because of poor psychometric qualities, reliance on inference, and their reflection of orientations that emphasize unconscious processes (Lambert & Hill, 1994, p. 73). Although Lambert and Hill acknowledge the fact that the Comprehensive System (CS; quoting the second edition; Exner, 1986) has overcome some of these problems, the Rorschach method is still viewed as too time consuming and costly. Self-report and observer inventories are preferred. During the last decades, studies using the Rorschach method have gradually disappeared from reviews on psychotherapy (e.g., Garfield, 1978, 1994; Meltzoff & Kornreich, 1970).

The Rorschach method is not widely used in psychotherapy research, but among the studies that exist, even fewer have used the method to evaluate therapy progress by testing before and after treatment. Uncertainty regarding temporal stability might be one reason for this. However, in a recent meta-analysis (Grønnerød, 2003), I concluded that temporal stability levels are moderate to high. The varying levels of stability seen among some variables were, among other factors, related to the length of the retest periods (i.e., longer intervals were associated with less stability) and the number of participants in the samples (i.e., larger samples were associated with higher stability). Furthermore, some of the variables with the lowest stability levels reflect state-like features of personality, which cannot be expected to remain as stable as those reflecting trait-like features. Ample evidence can also be found of acceptable and often excellent interscorer reliability levels in Rorschach studies (Acklin, McDowell, Verschell, & Chan, 2000; McDowell & Acklin, 1996; Meyer, 1997a, 1999a; Meyer et al., 2002). Concerns about reliability should therefore not be a decisive factor against the Rorschach method.

The general scepticism toward the Rorschach method might also be an important factor when assessment methods are selected for a study. Some previous meta-analyses on the validity of the Rorschach method (Atkinson, Quarrington, Alp, & Cyr, 1986; Parker, 1983; Parker, Hanson, & Hunsley, 1988; West, 1998) have been criticized for various faults and biases (Garb, Florio, & Grove, 1998, 1999; Garb et al., 2000), although some of the critical arguments have themselves been disputed (Hiller, Rosenthal, Bornstein, Berry, & Brunell-Neuleib, 1999; Parker, Hunsley, & Hanson, 1999). Other analyses have clearly indicated, however, that the Rorschach method can yield acceptable effect size magnitudes. Hiller et al. (1999) found unweighted and weighted effect sizes of r = .29 and r = .26, respectively, based on criterion-oriented validity studies. The Rorschach method obtained larger effect sizes than the Minnesota Multiphasic Personality Inventory (MMPI; Hathaway & McKinley, 1943) when behavioral outcome criteria were used such as hospitalization, discontinuation from therapy, and so forth. The MMPI, on the other hand, showed larger effect sizes when it was compared with other self-report criteria. Hiller et al. (1999) suggested that the MMPI and other self-report instruments share method variance through a similar focus on how people present themselves. The Rorschach method, on the other hand, might be more related to what people actually do as defined by behavioral criteria.

Hiller et al. (1999) examined the validity of various variables from the Rorschach method in a diverse set of studies. Two other meta-analyses have also shown encouraging results for more focused assessments using specific variables, namely the Rorschach Prognostic Rating Scale (Meyer & Handler, 1997, 2000) and the Rorschach Oral Dependency Scale (Bornstein, 1999). Both studies concluded that the Rorschach method can be a highly valid predictor of objective criteria such as psychotherapy outcome or dependency behavior. Meyer and Archer (2001) summarized data from all available meta-analyses and reanalyzed the Parker et al. (1988) data based on the criticism raised against this study (Garb et al., 1998). Meyer and Archer found that the Rorschach method performed neither better nor worse than other assessment methods. Across all methods, effect sizes generally hovered around r = .30. For the Rorschach, effect sizes from general and focused meta-analyses ranged from .14 to .46 (see Meyer & Archer's Table 4, p. 493; a simple average of listed studies yielded r = .34).

Meta-analyses of psychotherapy studies have yielded results in the same range as the Rorschach studies. Smith and Glass (1977) in their classic study of benefits from psychotherapy found an average effect size of r = .32 (d = .68). On average, though, effect size levels for the benefits of treatment seem to have been somewhat lower. Lipsey and Wilson (1993) summarized data from 302 meta-analyses studying various kinds of treatments. The average effect size from these studies was r = .24 (converted from d = .50, assuming equal number of participants in treatment and comparison groups), whereas a more limited selection of 156 carefully controlled studies yielded r = .23 (d = .47) on average. Forty-one of the 302 meta-analyses examined psychotherapy studies only and are therefore more comparable with the purpose of this study, and they yielded an average of r = .33 (d = .63, simple average computed from Lipsey & Wilson's, 1993, Table 1, Point 1.1 and 1.2). Later meta-analyses of the benefits from psychotherapy have obtained similar results. Grissom (1996) summarized 68 meta-analyses and found that the typical difference between a therapy and a control group was r = .35 (d = .75). Westen and Morrison (2001) found effect sizes of r = .15, r = .37, and r = .41 for manualized treatments of depression, panic disorder, and generalized anxiety disorder, respectively (ds = .3, .8, and .9, respectively). Martin, Garske, and Davis (2000) found the association between working alliance and outcome was r = .22. Finally, Shadish, Navarro, Matt, and Philips (2000) examined the benefits of psychotherapy using studies in clinically representative conditions. Shadish et al. found effect sizes of r = .20 and .15 for random effects and fixed effects models, respectively.

Rorschach studies and psychotherapy studies in general thus have seemed to yield similar effect size magnitudes. The question remains how the Rorschach performs in the field of psychotherapy evaluations. Meyer and Archer (2001) called for more data on the Rorschach's locus of effectiveness to enhance effectiveness and utility of Rorschach testing. As pointed out in another extensive literature review (Meyer et al., 2001), different assessment instruments yield different types of information, and getting a clearer picture of the strengths and weaknesses of the Rorschach method is therefore important. To achieve this, Meyer and Archer stressed the importance of doing meta-analyses directed toward more limited areas of application such as more specific scorings, interpretations, and contexts. Global meta-analyses may give useful information on the general yield of an instrument compared with other instruments but are inherently limited by their lack of focus on more limited and specific applications. Some Rorschach variables may be valuable in some contexts but not others; some may be generally valuable across contexts, whereas others may not prove valuable at all.

This analysis is an attempt to move in such a direction by aiming at the very practical and everyday clinical task of assessing progress in psychotherapy. I examined studies using the Rorschach method to assess personality changes following psychotherapy to clarify whether this dismissal of the Rorschach method in psychotherapy research is warranted. Furthermore, I make a tentative comparison with temporal stability data from a previous meta-analysis (Grønnerød, 2003). Considering the two topics together is important, as temporal stability has a direct bearing on the validity of the results. According to general psychometric theory, reliability is a prerequisite for validity (e.g., Grove, Andreassen, McDonald-Scott, Keller, & Shapiro, 1981). Reliability limits the validity that can be obtained for a measure (the level of validity is limited to the square root of the reliability). Short-term mood fluctuations and long-term personality changes can also be observed in temporal stability data. Most of the constructs I want to measure are not entirely stable. The challenge for psychotherapy research in this regard is to identify clinically meaningful changes as opposed to changes that might be statistically significant but do not bear any clinical importance.

In this study, I evaluated the usefulness of the Rorschach method. Whether psychotherapy is effective or not is not an issue here, only whether the Rorschach method can detect actual changes. Psychotherapy research has shown that psychotherapy is effective (e.g., Bergin & Garfield, 1994; Grissom, 1996; Lipsey & Wilson, 1993; Seligman, 1995), and this general conclusion is used as a basis for this study. I
touch on the relative strengths and weaknesses of the Rorschach method compared with other instruments in psychotherapy evaluation. Specifically, I report change data from other instruments, but I make no attempt to gather representative data for these instruments to compare with the Rorschach method.

STUDY 1: GLOBAL ESTIMATES OF CHANGE

Method

Literature search. I conducted a comprehensive literature search in the PsycINFO database from 1921 to October 2003, aiming to find all studies that reported changes following psychotherapy with the Rorschach method. The search terms were also used in Grønnerød (2003) to find temporal stability studies. I included journal articles, books, and dissertations in the search and used the terms stabilit* or retest* or (consisten* and time), combined with (therap* or psychotherap* or treat*) and (outcom* or contin* or prognos* or predict* or improv* or chang*). I excluded studies of specific effects of drugs, medicines, surgery, or other purely somatic treatments together with studies of children and adolescents, meaning that the mean age of the participants at retest had to be at least 18 years. I also excluded group and consensus Rorschach studies. All these restrictions were implemented in the search with the following exclusion terms: family rorschach*, joint rorschach*, consensus rorschach*, group rorschach*, ag=childhood, surgic*, brain damag*, brain lesion*, and case stud*. Language was restricted to English, Norwegian, Swedish, and Danish.

This initial PsycINFO search returned 562 references. In addition, I reviewed indexes and reference lists in other available material¹ including standard Rorschach textbooks (Beck, 1950; Klopfer, Ainsworth, Klopfer, & Holt, 1954; Piotrowski, 1957; Rapaport, 1949; Schachtel, 1966; Schafer, 1954), Rorschach reviews (Goldfried, Stricker, & Weiner, 1971; Jensen, 1959; Lerner, 1975; Rickers-Ovsiankina, 1960), and psychotherapy reviews (Lambert & Hill, 1994; Meltzoff & Kornreich, 1970; Orlinsky, Grawe, & Parks, 1994). As work progressed and studies were retrieved in full text, I checked all reference lists for relevant studies that were not identified in the PsycINFO search.

I retrieved and examined more than 600 study abstracts to find studies in which participants were undergoing psychotherapy, were tested at the beginning and at least once during or after psychotherapy, and in which Rorschach data reflecting changes from baseline to retest was analyzed. Because of poor indexing of many studies in the PsycINFO database, some studies slipped through the exclusion criteria. During examination of the abstracts, I enforced the search exclusion criteria and a number of other exclusion criteria, as explained in the following, before a study was retrieved in full text. I liberally defined psychotherapy as a nonexperimental therapeutic intervention over a minimum period of 1 month of an established type such as psychoanalysis, client-centered therapy, or brief cognitive therapy. I excluded nonstandard administrations of the Rorschach card, such as group administrations, modified cards, alternative series (e.g., Behn-Rorschach Test; Zulliger, 1941) as well as studies in which the participants were given only some of the 10 cards. I also excluded studies involving participants with disabilities such as mental retardation or deafness. Medication was defined as active treatment in this context, and therefore, I only included studies comparing psychotherapy groups with nonmedicated control groups.

Based on the examination of the abstracts, I retrieved 158 studies in full text. These were studies that either were clearly relevant or studies in which it could not be determined from the abstract or the keywords whether they fulfilled the inclusion and exclusion criteria or not. If there was any doubt, I retrieved the study in full text. The full text examination showed that the majority of studies fell clearly short of the criteria. Other studies seemingly reported relevant data, but I excluded them on closer examination as explained in the Appendix. In addition, Glatt and Karon (1974) included participants not undergoing therapy, but B. Karon (personal communication, October 7, 2003) kindly provided the raw data from the participants who were in therapy. Kavanagh (1985) only reported data from the variables that were significant, and Kavanagh could not be reached to provide additional data. From the Method section, it was possible, however, to determine which other variables were examined that were not significant. A secondary source (Hamlin & Albee, 1948) was used to code data from Muench (1947), which could not be retrieved despite persistent attempts. Several attempts were made to contact the authors of the excluded studies listed in the Appendix to inquire if additional data was available. No additional data was obtained from these contacts.²

From the set of 158 studies retrieved in full text, 24 studies had reported relevant and usable data and were included in these analyses.

¹ I thank Gregory J. Meyer for providing additional references that were not located in the search process.

² The following authors were contacted during August to October 2003, but they were not able to provide additional data: Noble A. Endicott (Endicott & Endicott, 1963), Harold S. Zamansky (Zamansky & Goldman, 1960), John E. Exner, Jr. (regarding retest period in the Exner, 1974, study) and Judy L. Kantrowitz (Kantrowitz, Katz, & Paolitto, 1990; Kantrowitz, Katz, Paolitto, Sashin, & Solomon, 1987a, 1987b; Kantrowitz, Paolitto, Sashin, Solomon, & Katz, 1986). The authors of the following studies could not be reached: Barry, Blyth, and Albrecht (1952); Cadman, Misbach, and Brown (1954); Rioch (1949); Kavanagh (1985); Schwager and Spear (1981); Dudek (1970; regarding retest period for the psychotherapy group); and Spero (1984).
Coding. In some cases, a study provided data from one sample of participants, and the study would then be coded as one sample. Other studies reported data from two separate samples of participants, each receiving a different type of therapy. I coded these subsamples as separate samples in these analyses. Furthermore, Lipsey and Wilson (2001) recommended that data from simple pretest to posttest comparison analyses and from treatment versus control analyses should be analyzed separately because pretest to posttest comparison designs will generally yield higher effect sizes than treatment versus control designs. In cases in which both these types of data were reported for the same participants, I coded the studies as two separate samples that each went into separate analyses.

I considered all data reflecting changes from baseline to the final testing. Exner and Andronikof-Sanglade (1992), Glatt and Karon (1974), Haimowitz and Haimowitz (1952), and Weiner and Exner (1991) reported data from two or three retests, but only the final retest was coded in Study 1 (Study 2 included all retests). Doherty (2001) and Glatt and Karon (1974) both reported data based on the same participants, and I therefore coded them as one sample, identified as Glatt and Karon (1974). In cases in which the study reported data from other methods alongside the Rorschach, I also coded this data to allow a comparison of the Rorschach method with other methods.

I did not code subgroups within samples separately unless the subgroup represented a specific form of therapy. Blatt, Ford, Berman, Cook, and Meyer (1988) presented data for subgroups classified as anaclitic and introjective, but I coded only the change data for the whole sample. Similarly, I did not consider subgroups in Glatt and Karon (1974) treated by experienced versus inexperienced therapists.

I thus divided the 24 included studies into 29 samples. Twenty-two of these reported pretest to posttest analysis data, and the remaining 7 samples reported treatment versus control analysis data. One study reported both pretest to posttest data as well as control data for the same participants (Dudek, 1970). I coded different kinds of result statistics (r, t, F, χ², p, and z) from Rorschach data as well as data from other assessment instruments individually as reported (I call these result statistics "entries" in this study). I only coded p values in cases in which no other more exact statistics were reported or could be calculated from raw data. In cases in which only probability limits were given (e.g., p < .05), I coded the limits as reported. I assigned an effect size of zero to results reported as nonsignificant. If probability levels were based on two-tailed tests, I converted them to one-tailed values, that is, the p value was divided by two. This procedure allows the direction of the effect to be coded (Rosenthal,

In five samples in which only probability limits were reported (Coons, 1957; Coons & Peacock, 1970; Exner, 1974; Exner & Andronikof-Sanglade, 1992; Weiner & Exner, 1991), I used the raw data to calculate χ² values, which in turn were converted to effect size estimates. In another sample (Baker, 1998), I used means and standard deviations to calculate an independent samples t test. The appropriate test for these data would be a paired samples t test, but the required data were not reported. Treating the samples as independent effectively underestimates the difference, thus providing a conservative estimate in this context. Campo, Dow, and Tuset (1988) reported z-test data for pretest to posttest differences, whereas the appropriate test for the reported design would have been a paired samples t test. I therefore coded the data as t values, which resulted in a more conservative estimate of the differences. Krout, Krout, and Dulin (1952) contained a few discrepancies between the tables and the text, but only the table data was considered. The Weiner and Exner (1991) study contained two errors, but the correct data was kindly provided by I. B. Weiner (personal communication, August 5, 2003).³

All authors had made predictions about the expected direction of the changes for each of the variables they examined. The studies could therefore be classified as confirmatory studies in which a number of variables thought to be important for assessment of changes had been selected by the authors based on theory and/or previous empirical data. Each entry was classified as representing an effect in the predicted direction, a null result, or an effect in the opposite of the predicted direction. All effect sizes that were in the direction that had been hypothesized by the author(s) were given a positive sign, and all in the opposite direction were given a negative sign. I coded data that resulted in effect sizes of zero as null results.

I did not code the Diagnostic Interview for Borderline (Gunderson, Kolb, & Austin, 1981) data reported in Baker (1998) and the Bell Adjustment Inventory (Bell, 1934) data in Hamlin and Albee (1948). In both cases, high scores on these instruments were used as selection criteria for the samples, and changes in the scores from pretest to posttest were tested and found significant. This design creates conditions for regression toward the mean to mix with true changes. Because the exact amount of regression artifacts in nonrandomized pretest to posttest designs cannot be established (Campbell & Kenny, 1999), I excluded the data as a precaution.

I did not check the coding for reliability, which may be regarded as a weakness of this study. After the initial coding, however, I made two thorough checks of the complete coding with more than a year passing in between each check. In both rounds, errors were detected and corrected, and significantly fewer errors were detected in the last round. Although this procedure cannot replace a proper reliability check, it should inspire confidence in the accuracy of this coding.

³ In Weiner and Exner's (1991) Table 3, in the data for Adj D > 0 from the first testing of the short-term sample, the correct percentage is 23, not 3 as reported. In the same table, the correct entries for Sum Shading > FM + m are 32, 36, 13, 14**, 10, 11, 9, and 10.
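The conversions mentioned in this article (one-tailed p values, t, χ², and d to r) can be illustrated with a short sketch. The article does not print its formulas, so the code below assumes the standard textbook conversions: r = Z/√N for a one-tailed p, r = t/√(t² + df), r = √(χ²/N) for 1 degree of freedom, and r = d/√(d² + 4) for equal group sizes. All function names are hypothetical. Under these assumptions, and taking N = 36 from Table 1, the sketch reproduces the Barendregt (1957) values reported under Analyses.

```python
from math import sqrt
from statistics import NormalDist, mean


def r_from_one_tailed_p(p, n, predicted=True):
    """Assumed conversion: r = Z / sqrt(N), Z being the normal deviate for p."""
    z = NormalDist().inv_cdf(1.0 - p)
    r = z / sqrt(n)
    # Effects opposite to the author's prediction are coded with a negative sign.
    return r if predicted else -r


def r_from_two_tailed_p(p, n, predicted=True):
    """Halve a two-tailed p first, as the coding rules describe."""
    return r_from_one_tailed_p(p / 2.0, n, predicted)


def r_from_t(t, df):
    """Assumed conversion: r = t / sqrt(t^2 + df); the sign of t carries over."""
    return t / sqrt(t * t + df)


def r_from_chi2(chi2, n):
    """Assumed conversion: r = sqrt(chi2 / N) for a 1-df chi-square."""
    return sqrt(chi2 / n)


def r_from_d(d):
    """Assumed conversion: r = d / sqrt(d^2 + 4), equal group sizes."""
    return d / sqrt(d * d + 4.0)


# Barendregt (1957): N = 36, probability limits p = .05, .46, and .08,
# treated here as one-tailed values in the predicted direction.
barendregt_rs = [r_from_one_tailed_p(p, 36) for p in (0.05, 0.46, 0.08)]
barendregt_mean = mean(barendregt_rs)  # averaged before rounding

print([round(r, 2) for r in barendregt_rs], round(barendregt_mean, 2))
print(round(r_from_d(0.50), 2))  # Lipsey and Wilson (1993): d = .50
```

Run as written, the sketch gives .27, .02, and .23 with a sample average of .18, and d = .50 converts to r = .24, matching the figures quoted in the text. That agreement suggests, but does not prove, that these are the conversions the study used.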
Analyses. The correlation coefficient r was used as an estimator of effect size magnitude. Rosenthal (1991a) presented methods to convert a variety of statistics into r. The advantage of using r over the other commonly used effect size statistic, Cohen's (1988) d, is that the same conversion is used for dependent and independent t tests and that less data is generally needed from the studies to compute effect sizes. The r values should not be interpreted as traditional correlation coefficients; instead, they represent a general indicator of the size of an effect on a scale ranging from −1 to +1. Although r may not be widely used in the therapy literature, a number of meta-analyses involving the Rorschach method have used the correlation coefficient as the effect size estimator (e.g., Bornstein, 1998, 1999; Hiller et al., 1999; Parker et al., 1988; Romney, 1990). Hemphill (2003) summarized information from two reviews of meta-analytic findings. One review covered different forms of psychological intervention, and the second covered different assessment instruments. Hemphill suggested the following empirically based guidelines for interpreting effect size magnitudes: low r < .20, middle r = .20 to .30, and high r > .30.

I did coding and basic effect size conversions by manually entering data and formulas onto a Quattro Pro spreadsheet (Corel, 2000). The data was then transferred to SPSS (2001) for analysis.

To obtain an average effect size for a sample, all calculated effect sizes from that sample were used to compute an average. As an example (see Table 1), the three probability limits of p = .05, .46, and .08 reported in Barendregt (1957) were converted into effect sizes of r = .27, .02, and .23, respectively. They were in turn combined into a total average effect size of r = .18 for that sample (when exact, nonrounded effect sizes were used).

To obtain the global estimates of change effect sizes across samples, a weighted average based on nontransformed sample correlational effect sizes was calculated using the sample size N as the weight. Field (2001) argued that weighted averages give the most accurate estimation of the population average. Field also demonstrated that averages based on Fisher's Zr transformed correlations can lead to overestimations when the population effect sizes are variable and especially so for large effects. Conversely, averages based on raw correlations slightly underestimated the population effect sizes, but to a lesser degree than the overestimation based on transformed correlations. The single predictor–criterion relationship in this study (i.e., changes due to treatment) also calls for weighted averages based on nontransformed correlations to be used.

Heterogeneity was estimated by calculating Q, a χ²-distributed estimate. Significant Q values indicate that systematic factors are associated with variations between estimates. I address these factors in Study 2 following.

I did not correct estimates for attenuation (or level of reliability) for two reasons. First, estimates could be corrected for imperfections before entering the analyses, but in line with Rosenthal (1991b), the real-world levels are preferred. On the practical side, too many studies failed to report interscorer reliability levels, and the type of statistics varied. Also, conversion from percentage agreement to kappa-type statistics is problematic (Acklin, 1999; Grønnerød, 1999; McDowell & Acklin, 1996; but see also Meyer, 1997a, 1999b).

In addition to the analysis of variables averaged within samples, I also analyzed different Rorschach scores and variables across samples (the term "variable" is used for convenience in this context). In cases in which a sample had more than one effect size for a variable, I averaged the results within the sample first. Each sample thus contributed only one entry for each variable. Most often, a single score defined a variable. As an example, I coded entries from nine samples for the Human Movement score, M, and these entries defined the cases for the analysis of the variable M. In other instances, the number of entries for a specific score was rather small, and a more encompassing variable was constructed. Form was one such example, which included four entries for Lambda > .99, two for F%, and one for NonF%. Two special variables were also defined, Constructs and Global. The Constructs variable consisted of clinical ratings, which in these studies were based on psychoanalytic theory, and included definitions for, for example, Anxiety, Attitude Toward Self, Hostility, and so forth. Only examples were given for the scoring criteria, but judging from these reports, they were based both on traditional scoring categories, such as Movement, Shading, and so forth, as well as more informal clinical evaluations of, for example, the presence of anxiety in the Rorschach data. The Global variable consisted of instances in which clinicians identified which one of two protocols was taken before and which one was taken after treatment. No specific criteria of change were given to the judges of the studies in advance, so the judges decided for themselves what constituted important signs of improvement.

In total, I defined a list of 32 variables. I calculated weighted averages and levels of heterogeneity for each of these (e.g., the effect size for Form Level was rw = .25, with a nonsignificant level of heterogeneity).

Finally, I compared the Rorschach effect sizes to effect sizes obtained from other assessment instruments in the same samples. These analyses should only be regarded as tentative, however, based on two major limitations. First, contrary to the Rorschach results, the results from other instruments did not provide a representative selection of studies using these instruments. A full meta-analysis of how any particular instrument performs when evaluating therapy changes might yield different effect size magnitudes than those reported in these studies. Consequently, the other instruments were only coded according to broad methodological types without further individual attention. The types were self-report, ability, and case rating. I discarded two instruments that did not fit these categories (the Thematic Apperception Test [Murray, 1938] and the Id Ego Superego Test [Dombrose & Slobin,
1958]) from the analyses because both came from the same sample (LaFrance, 1971). Second, the included samples varied as to how many other instruments were used. In studies in which many instruments were used, all were compared to the same Rorschach effect size and these pairs of observations were therefore not independent.

In this study, I took several steps to ensure that the coding and analyses were conservative. Given the force of recent criticisms of the Rorschach method, an underestimation of effect sizes was more desirable than an overestimation. This should also be taken into account when the results are evaluated.

Results

Table 1 summarizes the key descriptive characteristics of the 29 included samples and their obtained effect sizes (see Table 6 in Study 2 for a complete overview of the sample coding and average effects). The total number of participants in all samples amounted to 1,202. Average retest periods varied from 6 weeks to 5 years, with an average of 18.7 months. I coded a total of 266 Rorschach entries, and I also coded 201 entries from various other assessment methods from 10 of the samples.

TABLE 1
Studies Evaluating Change Following Psychotherapy Divided Into Samples

Study | Diagnostic Category | Treatment Modality | Retest Period (months) | N | rw
Baker (1998) | Borderline | NR | 38 | 22 | −.06
Barendregt (1957) | Bronchial asthma | Group therapy | 19 | 36 | .18
Blatt et al. (1988) | Schizophrenia | Intensive psychotherapy | 15 | 90 | .07
Campo, Dow, & Tuset (1988) | NR | Psychoanalysis; psychoanalytically oriented therapy | 18 | 30 | .14
Carr (1949) | NR | Nondirective psychotherapy | NR | 9 | .21
Coons (1957), Interaction Group–Control | Mostly schizophrenia | Interaction oriented group therapy | 2 | 41 | .36
Coons (1957), Insight Group–Control | Mostly schizophrenia | Insight oriented group therapy | 2 | 42 | .07
Coons & Peacock (1970) | Mostly schizophrenia | Group therapy | 1 | 56 | −.14
Dudek (1970), Psychoanalytic Group I | Neurosis | Psychoanalysis | NR | 16 | .46
Dudek (1970), Psychoanalytic Group II | Schizophrenia | Psychoanalysis | 20 | 10 | .13
Dudek (1970), Psychotherapy Groups–Control | Neurosis and schizophrenia | Psychoanalysis | 67 | 55 | .63
Exner (1974) | Depression, schizophrenia, and character disorder | NR | NR | 181 | .18
Exner & Andronikof-Sanglade (1992), Brief, second retest | Mixed | Brief therapy, various modalities | 10 | 35 | .19
Exner & Andronikof-Sanglade (1992), Short, second retest | Mixed | Short therapy, various modalities | 25 | 35 | .39
Gaylin (1966) | NR | NR | 4 | 57 | .19
Glatt & Karon (1974), Psychotherapy subgroup, third retest | Schizophrenia | Psychoanalysis | 20 | 15 | .20
Gunderson et al. (1984)a, second retest | Schizophrenia | Insight-oriented therapy; reality-adaptive, supportive therapy | 24 | 74 | .20
Haimowitz & Haimowitz (1952), second retest | NR | Client-centered therapy and/or group therapy | 15 | 10 | .80
Hamlin, Berger, & Cummings | NR | NR | 15 | 16 | .77
Kavanagh (1985) | Mixed | Psychoanalysis; psychoanalytic | 33 | 33 | .05
Krout, Krout, & Dulin (1952), Psychoanalytic group | Mixed | Psychoanalytic therapy | 12 | 19 | .43
Krout, Krout, & Dulin (1952), Nonanalytic group | Mixed | Nonanalytic therapy | 13 | 14 | .37
LaFrance (1971) | Drug dependent convicts | Group therapy | 6 | 18 | .81
Mintz, Schmeidler, & Bristol (1956) | NR | Psychoanalysis | 20 | 20 | .66
Peterson (1954) | NR | NR | 3 | 42 | .08
Peyman (1956) | Schizophrenia | Group therapy | 6 | 32 | .41
Rice (1973) | NR | Client-centered therapy | 2 | 48 | .27
Weiner & Exner (1991), Short, third retest | Mixed | Rational emotive therapy; Gestalt therapy; modeling therapy; assertiveness therapy | 48 | 88 | .28
Weiner & Exner (1991), Long, third retest | Mixed | Dynamically oriented therapy | 48 | 88 | .43

Note. N = 1,202. Retest period = the average retest period for all participants in the sample reported in whole months; rw = the average weighted effect size obtained for each sample; NR = not reported.
a Design presented in Stanton et al. (1984). b Secondary source for Muench (1947).
Two samples had obtained negative effect sizes. In Baker (1998), the group of borderline participants showed a mixed pattern of changes on the Rorschach, with the average tipping below zero (r = −.06). Some indexes were also positive, however, so the picture was not clear. In Coons and Peacock (1970), an interaction effect occurred with a parallel treatment given to half the group. The subgroup that received psychotherapy and a random ward treatment showed a higher degree of change (r = .11) than those who received organized ward treatment (r = −.38). Splitting the sample into two subsamples to separate the effect of the ward treatment would compromise the coding of data from other methods in the study. The overall average sample effect size would also remain the same because the two subgroups were equally large. For both reasons, the results were coded as belonging to a single sample, resulting in a negative effect size overall.

Average sample-based effect sizes are shown in Table 2. The average for the 22 pretest to posttest design samples was rw = .25; for the 7 treatment versus control design samples it was rw = .28. The difference between the two was not significant, t(27) = .232, p = .818. The total average for all 29 samples was rw = .26. Heterogeneity was not significant for any of the three sample groups. A fail-safe N analysis indicated that 404 studies with null results are needed to invalidate the results for the pretest to posttest design sample group, and 59 studies are needed for the treatment versus control design samples. For all 29 samples, the fail-safe N was 790. Based on the lack of difference between the two sample subgroups, the average of all 29 samples is referred to as the main result in the following.

TABLE 2
Sample-Based Effect Sizes

Sample Group                       k    N      Q       rw    CI           FSN
Pretest to posttest design         22   948    18.38   .25   .19 to .31   404
Treatment versus control design    7    280    5.85    .28   .17 to .39   59
All                                29   1202   24.14   .26   .20 to .31   790

Note. Other averages were also calculated: unweighted averages based on raw correlations, unweighted averages based on Fisher's Zr-transformed correlations, and weighted averages based on Fisher's Zr-transformed correlations (weighted by the square root of N − 3, the inverse of the standard error of Zr). All were somewhat higher than the averages listed in the table but not dramatically different. The corresponding averages across all samples were .29, .34, and .30 for unweighted raw r, unweighted Zr, and weighted Zr, respectively. k = the number of samples; N = the total number of participants included in the samples; Q = χ²-distributed heterogeneity statistic; rw = the average r weighted by N; CI = 95% confidence interval; FSN = fail-safe N.

Table 3 presents the average effect sizes for the 32 variables reported in the samples. Weighted effect sizes varied between .00 and .58. The average effect size for the 32 variables was r = .29. The diverse impact of treatment on the different variables was clearly illustrated. Using Hemphill's (2003) empirically based classifications, only 10 variables were of small magnitude (r < .20), whereas 14 were in the high range (r > .30). Only 8 variables showed significant heterogeneity. More than half of the variables consisted of entries from less than five samples, some from as few as one and two, and the data from these variables must obviously be interpreted with caution.

TABLE 3
Combined Level and Heterogeneity for 32 Variables

Variable                   k    N     rw    Q
Sum Shd > FM + m           4    246   .58   2.82
D score                    4    246   .58   3.69
Constructs                 3    43    .50   0.66
FC                         2    26    .50   4.41*
Color Balance              6    310   .49   24.55***
Organization               5    303   .45   15.43**
Global                     6    222   .44   9.32
Populars                   4    97    .38   1.31
DEPI                       4    246   .37   2.19
FM + m                     2    26    .36   2.71
Afr                        6    272   .36   7.91
CDI                        4    246   .36   9.30*
All H Content              5    259   .34   15.48**
Shading                    6    272   .32   9.07
Adj D                      4    246   .30   10.46*
M                          9    430   .30   25.59**
S                          2    70    .28   0.60
Form                       7    329   .27   12.34
R                          5    187   .26   7.46
r & (2)                    5    352   .22   1.50
Miscellaneous Variables    10   329   .21   7.51
EA & EB                    5    575   .20   3.78
a & p                      4    246   .17   2.61
Miscellaneous Content      5    303   .17   5.79
Originals                  1    57    .16
Special Scores             6    358   .14   8.35
Form Level                 8    437   .12   5.25
SumC                       3    116   .10   5.72
W & D                      4    89    .10   8.91*
Depth                      2    176   .09   0.00
Object Relations           2    123   .03   0.01
A%                         2    26    .00   0.00

Note. Table sorted by effect size. k = number of samples; N = the number of participants in the k samples; rw = the weighted average; Q = χ²-distributed heterogeneity statistic with df = k − 1; Constructs = clinical evaluations of personality constructs such as anxiety, attitude toward others, and libido; Color Balance = (FC+) (FC) CFC, CF + C > FC + 1, FC/FC + CF + C; Organization = Zd < −3.0, Zf; Global = clinical evaluation of change or improvement for the protocol as a whole; All H Content = H, H < (H) + Hd + (Hd), Pure H < 2; Shading = c, T = 0, T > 1; S = S > 2; Form = F%, NonF%, Lambda > .99; r & (2) = Fr + rF, Fr + rF > 0, 3r + (2)/R < .33, 3r + (2)/R > .43; Miscellaneous Variables = Adaptive Regression, Primary Process Thinking, Human Experience Variable, Rorschach Prognostic Rating Scale, Annihilation Anxiety, Bühler's Basic Rorschach Score, Carr's signs, Muench's signs, Haimowitz's signs, Determinant changes, Damage content, Hostile content, Oppression content; EA & EB = Ambitence, EA, EA < 7, Extratensive, Introversive; a & p = Mp > Ma, p > a + 1; Miscellaneous Content = Content changes, Intellect > 5; Special Scores = Confab, Contam, Fab, SumSpSc > 6, TDI, WSum6; Form Level = F+%, X+% < .70, X−% > .20, FQ+/o change, FQu change, FQ− change; W & D = D+/D, W+/W, W, (W Dd)/R; Depth = FD > 2; Object Relations = various DCOS and MOA indexes. See Table 6 for references.
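The sample-based summary statistics named in the Table 2 note can be illustrated with a minimal Python sketch. The article does not print its formulas, so the following are assumptions: Q is computed in the common form with weights N − 3 on Fisher Zr values, and the fail-safe N follows Rosenthal's formula with each sample's standard normal deviate approximated as r times the square root of N. The three effect sizes and sample sizes are hypothetical and are not taken from the tables.

```python
import math


def weighted_mean_r(rs, ns):
    """Average r weighted by sample size N (the rw reported in the tables)."""
    return sum(r * n for r, n in zip(rs, ns)) / sum(ns)


def weighted_mean_zr(rs, ns):
    """Fisher Zr average weighted by sqrt(N - 3), back-transformed to r.

    The weight follows the wording of the Table 2 note (the inverse of the
    standard error of Zr).
    """
    weights = [math.sqrt(n - 3) for n in ns]
    z_bar = sum(w * math.atanh(r) for w, r in zip(weights, rs)) / sum(weights)
    return math.tanh(z_bar)


def heterogeneity_q(rs, ns):
    """Chi-square distributed heterogeneity statistic with df = k - 1.

    Assumed form: sum of (N - 3) * (Zr - weighted mean Zr) ** 2.
    """
    zs = [math.atanh(r) for r in rs]
    ws = [n - 3 for n in ns]
    z_bar = sum(w * z for w, z in zip(ws, zs)) / sum(ws)
    return sum(w * (z - z_bar) ** 2 for w, z in zip(ws, zs))


def fail_safe_n(rs, ns, z_crit=1.645):
    """Rosenthal's fail-safe N: (sum of Z) ** 2 / z_crit ** 2 - k.

    Assumption: each sample's Z is approximated by r * sqrt(N).
    """
    z_sum = sum(r * math.sqrt(n) for r, n in zip(rs, ns))
    return z_sum ** 2 / z_crit ** 2 - len(rs)


# Hypothetical effect sizes and sample sizes, for illustration only.
rs = [0.10, 0.30, 0.50]
ns = [20, 40, 60]
print(round(weighted_mean_r(rs, ns), 3))
print(round(weighted_mean_zr(rs, ns), 3))
print(round(heterogeneity_q(rs, ns), 2), "with df =", len(rs) - 1)
print(round(fail_safe_n(rs, ns), 1))
```

Weighting by N pulls the average toward the larger samples, which is why the N-weighted values in Table 2 can differ from the unweighted averages mentioned in its note.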
Table 4 shows the Rorschach results compared with the other instruments reported in 10 of the samples. Based on the lack of significant differences between pretest to posttest design samples and treatment versus control samples, all samples were analyzed together. Also note carefully that the effect sizes were weighted by sample size in the analyses, which caused the high degrees of freedom.

First of all, the 19 samples that only reported Rorschach results obtained significantly higher Rorschach effect size magnitudes than the 10 samples that reported results from several instruments: Rorschach only, rw = .29; Rorschach and others, rw = .20, t(1,230) = 7.944, p < .000, two-tailed. In the 10 samples that reported multiple instruments, the Rorschach effect sizes were significantly smaller than the effect sizes of other assessment instruments. Interestingly enough, the difference was caused by ability tests and case ratings, which were higher in magnitude than Rorschach and self-report methods.

TABLE 4
Effect Size Magnitudes for the Rorschach Method Compared to Other Assessment Methods in 10 Samples

Grouping                    Test         df       p        E    rw
Rorschach versus others     t = 2.808    1, 504   .005
  Rorschach                                                10   .20
  All other instruments                                    17   .23
Instrument type             F = 13.383   1, 502   < .000
  Rorschach                                                10   .20
  Self-report                                              5    .18
  Ability                                                  5    .23
  Case rating                                              7    .25

Note. N = 471 in 10 samples. E = the number of effect sizes entered from the 10 samples; rw = the weighted average; self-report = Katz Adjustment Scale (2), Soskis Attitude Towards Illness, Camarillo Dynamic Assessment Scale, Client Outcome Rating; ability = Wechsler Adult Intelligence Scale (4), Visual-Verbal Test; case rating = Menninger Scales, Fairweather, Strauss-Harder Symptoms, Hospital Adjustment Scale, Psychotherapy Outcome Interview, Behavior Rating, Therapist's outcome rating. See Table 6 for references.

Summary

Sample level effect sizes were consistently situated in the upper middle range of effects typically observed in psychological studies (Hemphill, 2003). The results can be interpreted to reflect two underlying findings. First, these effect sizes were on the same level and even somewhat larger than the average effect size of r = .23 based on 156 meta-analyses of treatment studies (Lipsey & Wilson, 1993). This shows that in this group of studies, psychotherapy has had an effect that is comparable to other psychotherapy studies.

Second, the Rorschach method reflects these changes in the same magnitude as other instruments. This is indicated both by these findings compared to prior meta-analyses and by the findings in Table 4 in which the same studies had compared the Rorschach method to other assessment instruments. The comparison in Table 4 indicates that ability tests and outcome ratings have shown somewhat higher levels of change, but the studies that only have reported Rorschach data showed significantly larger effect sizes than the multimethod studies. Thus, the question is raised whether the 10 multimethod samples are representative of Rorschach studies. If they are not, which there is some reason to believe, the main implication would be that the Rorschach method is neither inferior nor superior when making assessments of personality changes.

Practically half the variables in Table 3 displayed effect sizes in the high range (r > .30), indicating that some variables are more useful than others when making assessments of personality changes. Also notice that the variables Constructs and Global both obtained impressive effect sizes of rw = .50 and rw = .44, respectively. These results are important because the variables represent a more holistic approach to personality assessment that has more direct links to core concepts in clinical practice.

STUDY 2: MODERATOR INFLUENCES ON CHANGE ESTIMATES

Method

Literature search. The same 24 studies selected in Study 1 were used in this study. The coding was more elaborate, however, and included additional samples as detailed in the following.

Coding. Exner and Andronikof-Sanglade (1992), Glatt and Karon (1974), Haimowitz and Haimowitz (1952), and Weiner and Exner (1991) have all reported data from two or three retests. Each retest in these studies has been coded as a separate sample and added to the samples from Study 1. Although this coding creates a degree of statistical dependence among moderator variables, it allows a more thorough analysis of the influence of treatment duration on effect size magnitudes.

The 24 included studies were thus divided into 38 samples, 9 more than in Study 1. All the additional samples reported pretest to posttest analysis data, making a total of 31 samples in this group. The remaining 7 samples reported treatment versus control analysis data. The two sample subgroups were still not significantly different, t(36) = 0.330, p = .743. All 38 samples were therefore analyzed together in this study.

The effect size coding of the additional samples was performed as reported in Study 1. In addition, a number of sample characteristics were coded as moderator variables, as presented in Table 5. Concern for Interscorer Reliability and Blindscoring were coded because of their methodological
TABLE 5
Moderator Variable Coding

Variable Code Description

Sample level
Concern for Interscorer Reliability 0 Not reported or no control
1 Partial control: Inappropriate method (percentage agreement), statements that reliability checks
were made but without report of data, calculations not based on data from reported subjects, low
agreement with appropriate method (mean kappa or ICC below .60)
2 Full control: Kappa or ICC, high agreement (M > .59)
Blindscoring 0 Not reported or no blinding
1 Partial blinding: Ambiguous report of blinding procedure, or a degree of blinding was applied, for
example, the same person collected all protocols and then scored them in random order (not by
pretest and posttest pairs)
2 Full blinding: Scorer(s) completely blind to pretest and posttest information
Publication Type 0 Unpublished/non-peer-reviewed: Book, dissertation, report, secondary source
1 Minor journal: Peer-reviewed journal, not coded as major
2 Major journal: Journal of Personality Assessment, Journal of Consulting and Clinical Psychology,
Journal of Clinical Psychology, Psychotherapy
Retest Period Average retest period for the sample converted into months
Therapy Load (Therapy Type + 1) × Sessions per Week
Helper variables for Therapy Load
Sessions per Week Average number of sessions per week in therapy; missing values replaced by 1
Therapy type 0 Group therapy
1 Individual and group therapy
2 Individual therapy; default if not reported

Note. Only moderator variables included in one or more of the regression models are listed. ICC = intraclass correlation coefficient.
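The Therapy Load index in Table 5 can be sketched in a few lines of Python. The definition is read here as a product, (Therapy Type + 1) × Sessions per Week, an interpretation that reproduces the Therapy Load values listed in Table 6 (for example, individual therapy at 4.0 sessions per week gives 12). The function name, the dictionary of type labels, and the rounding to whole numbers are assumptions made for illustration.

```python
# Therapy Type codes as listed in Table 5.
THERAPY_TYPE_CODES = {"group": 0, "mixed": 1, "individual": 2}


def therapy_load(therapy_type=None, sessions_per_week=None):
    """Return (Therapy Type + 1) * Sessions per Week.

    Defaults follow Table 5: an unreported therapy type is treated as
    individual therapy (code 2) and a missing session count is replaced by 1.
    Rounding to a whole number is an assumption based on the integer
    Therapy Load values that appear in Table 6.
    """
    type_code = 2 if therapy_type is None else THERAPY_TYPE_CODES[therapy_type]
    sessions = 1.0 if sessions_per_week is None else sessions_per_week
    return round((type_code + 1) * sessions)


print(therapy_load("individual", 4.0))  # individual therapy, 4 sessions a week
print(therapy_load("group", 5.0))       # group therapy, 5 sessions a week
print(therapy_load())                   # nothing reported: defaults apply
```

Adding 1 to the therapy type code keeps group therapy (code 0) from collapsing the index to zero, so session frequency still contributes for every sample.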

importance. As previously noted, reliability constrains validity, and poor interscorer reliability may have a detrimental effect on effect sizes. Also, given the time span of the relevant studies and the increased focus on interscorer reliability in Rorschach data in recent years, the effects of reliability on effect sizes in this context were of interest. When nothing was mentioned about interscorer reliability, the study was given the code 0. Other studies presented agreement estimates that were problematic, insufficient, or inappropriate, and these were coded 1. This coding included the use of percentage agreement, which has been criticized as inappropriate, especially for low base rate data (Grønnerød, 1999; Wood, Nezworski, & Stejskal, 1996). Similarly, if reliability checks were not made of the data in the study but rather a reference was made to the scorers' previous achievements, the study was also coded 1. Finally, if agreement levels were low, although using appropriate estimates, it was also coded 1. The study was coded 2 only when appropriate methods were used and interscorer reliability was acceptable (> .59).

Blindscoring is important when repeated measures are applied in clinical contexts, as they were in many of the included studies. Expectancy effects are well documented (e.g., Rosenthal & Rubin, 1978), and blindscoring has therefore become a central methodological feature of high quality studies. When no mention was made of blinding, the study was coded 0. In cases in which the report of the blinding procedure was ambiguous or unclear, the study was coded 1. This was also the case when a pseudoblinding was applied, as when the administrator scored protocols at random and not ordered in pretest and posttest pairs. The study was coded 2 only when the scorers were different from the administrators, and the protocols were masked for pretest and posttest information during scoring.

The Publication Type moderator addressed possible publication bias. Publications in core journals were coded 2, publications in other peer-reviewed journals were coded 1, and the remaining publications were coded 0. Publication bias was an issue in the debate over a previous meta-analysis involving the Rorschach method (Parker et al., 1988). In this literature search, I aimed at finding all studies reporting relevant data and thereby also provided a good opportunity to examine the possible influence of publication bias.

Retest Period was thought to be of major importance in this context. Psychotherapy research has generally concluded that longer therapies are associated with higher effect size magnitudes (Orlinsky et al., 1994), and the same result could therefore be expected in this data. Therapy duration was highly similar to the retest period of the final test in most cases. Initially, therapy duration was coded as a separate moderator but was finally abandoned because of scarce information in the included studies; and in the group of studies in which it was reported, the correlation with Retest Period was very high.

Therapy Load was calculated as an index of therapy intensity based on the assumption that individual therapy in general can be regarded as more intensive than group therapy and that more frequent sessions can be expected to result in a higher degree of change.

Several other moderators were initially also coded but were either not influential in the analyses (the exclusion limit was p > .15 as described in the Analyses section following), or they were highly problematic. Therefore, I give only a
brief description of these moderators here.⁴ The sample level moderator variables were Attrition (addressing the concern that patients dropping out could be different from those who stayed in therapy), Number of Scores (a measure of scoring complexity that again could be related to level of reliability), Publication Year (addressing possible changes in methodological quality over the years), Randomized Design (many studies used naturalized samples that could be thought to differ from more experimentally oriented designs), Schizophrenia versus Nonschizophrenia (schizophrenia patients may show lower extent of change due to more chronic illness), Scoring System (addressing possible differences between the CS, other standard systems, and nonstandard systems) and Severity of Pathology (based on the same concern as with the schizophrenic participant; this variable was rated along the neurotic-schizophrenic spectrum). Publication Year and Scoring System were influential in the analyses, but both contributed to overfitted regression models and were therefore discarded.

⁴The full list of moderator variable codings is available from Cato Grønnerød's Web site at

Analyses. The focus of Study 2 was to establish whether moderator variables were systematically related to variations in effect size levels. This meant that focus was shifted away from absolute effect size levels, as analyzed in Study 1, to the relative influence of moderator variables on effect size magnitudes. Average levels were therefore of no special interest and should be expected to be generally lower in magnitude due to the inclusion of several retests. This follows the previously mentioned finding from psychotherapy research that longer therapies are generally associated with larger effect sizes. We should therefore expect that at least on average, the first retests yield smaller effects than subsequent retests.

Moderator influences were analyzed with the stepwise linear regression procedure in SPSS (2001). Outliers were winsorized at 3 SDs from the mean before the data entered the regression analyses. The method of weighted least squares (WLS) was applied in the regression analysis, using the sample size as the weight. WLS is needed when the sample variance differs between estimates, and the procedure is generally more robust when the data have skewed distributions and moderators are intercorrelated (Steel & Kammeyer-Mueller, 2002).

Meyer (2000) noted that the traditional alpha level of p = .05 is too strict when building stepwise regression models, especially when the number of cases in the analysis is small. Following Meyer's (2000) recommendations, alpha levels were set to p = .10 for inclusion and p = .15 for exclusion. The moderators Concern for Interscorer Reliability, Blindscoring, and Publication Type were treated as continuous variables because they reflected an assumed increase in quality with increasing values.

A relatively new development within the field of meta-analysis is to use regression models to predict the levels obtainable from an ideal study or in specific modifications of single studies (e.g., Roberts & DelVecchio, 2000; Shadish et al., 2000). The obtained regression equation provides an account of the relative influence of the various moderators but also provides a possibility to enter the best possible values for each moderator back into the equation. In practice, this is done by multiplying the ideal moderator values with their corresponding unstandardized regression coefficients. This procedure allows a researcher to address specific shortcomings in the data set and estimate what kind of change can be achieved when the effects of these limitations are contained. In line with Grønnerød (2003), I applied this method in these analyses.

I conducted two analyses. The first analysis was sample based and included the average effect size across all Rorschach scores in each of the 38 samples listed in Table 6. The moderators presented in Table 5 were used as potential predictors of the sample-based effect sizes.

In the second analysis, regression analyses were performed separately for variables that were used in at least five samples. Note carefully, however, that compared to the data listed in Table 3, these analyses included data from all the retest intervals, and therefore, the number of entries and average levels of change were in many cases different from those listed in Table 3. The 2 miscellaneous variables were omitted, which left 13 variables to be analyzed. As an example, 6 entries for the Affect Ratio, Afr, defined the cases of one regression analysis (no Afr data was available from the studies with several retests, so these cases were equivalent to those listed in Table 3) as did 17 entries for M (the 9 entries from the final retests in Table 3 and an additional 8 entries from intermediate retests), 16 entries for Form Level (8 from final retests and 8 from intermediate retests), and so forth. Given the low number of entries for most variables, the potential moderator variables were restricted to Retest Period and Therapy Load.

The inclusion of several retests of the same participants creates a degree of dependence between the moderator variables and the effect sizes themselves. First, when results from several retests are reported, they are not independent results. The baseline will be the same for all retests, thereby limiting the variance between the various retests. Second, when all retests in a study are defined as separate samples, all the sample-level moderators except Retest Period will be identical, for example, Concern for Interscorer Reliability was the same for all retests, Publication Type was the same, and so forth (note, however, that because of attrition, the number of participants was not always the same for all retests within a study). This coding practice will effectively weight the moderators according to the number of retests from a study. As previously noted, WLS regression has proven to be
more robust in the face of moderator nonindependence but can only partly rectify its influence. To assess the extent of the problem, I ran a control analysis including only the final retest of each sample to remove moderator interdependence. If the obtained model from this analysis was similar to the main analysis model, the impact of nonindependence should not be decisive in these results.

Results

Table 6 shows the complete coding of all retest intervals for all samples including both Rorschach data and data from other methods (parts of this data were also presented in Table 1). According to a common guideline (Curran, West, & Finch, 1996), none of the moderators were highly skewed, and no transformations were therefore necessary.

Table 7 shows all generated regression models. Statistically significant prediction equations were computed for the sample-level analysis and for 4 of the 13 specific variables. For the sample-level analysis, Retest Period proved most influential. Increased length of time in therapy was related to increased effect sizes. The next factors were related to methodology. Effect sizes were higher in magnitude in studies coded with higher levels of Concern for Interscorer

TABLE 6
Coding and Effect Sizes for All 38 Samples

Study Scoring System Moderator Codinga N E PP rw TC rw

Baker (1998) CS 38/P/n/N/f/S/1.0/m/34/2/m/60/92/U 22 4 −.06

Barendregt (1957) Own 19/N/n/S/f/N/2.0/lo/46/2/g/0/3/Mi 36 4 .18
Blatt et al. (1988) ABZb 15/N/r/N/f/S/4.0/m/21/12/i/10/41/Mi 90 23 .07
Menninger scales (Harty et al., 1981) 6 .33
Fairweather symptoms (Fairweather et al., 1960) 1 .14
StraussHarder symptoms (Strauss & Harder, 4 .16
WAIS (Wechsler, 1944) 3 .18
Campo, Dow, & Tuset (1988) Own 18/N/n/S/n/N/3.0/lo/27/9/i/0/9/Mi 30 9 .14
Carr (1949) Beck NR/N/n/N/n/S/NR/m/NR/3/g/0/27/Ma 9 2 .21
Coons (1957)
Interaction Group Control Clin.ev.c 2.3/P/r/N/f/N/3.0/m/NR/3/g/3/1/Mi 41 1 .36
WAIS 3 .27
Insight Group Control Clin.ev. 2.3/P/r/N/f/N/3.0/m/NR/3/g/3/1/Mi 42 1 .07
WAIS 3 .02
Coons & Peacock (1970) Clin.ev. 1.4/N/r/N/f/N/5.0/m/29/5/g/5/1/Mi 56 6 −.14
WAIS 1 .35
Hospital Adjustment Scale (McReynolds & 1 .18
Ferguson, 1953)
Dudek (1970)
Psychoanalytic Group I Klopfer NR/N/n/S/n/C/2.0/lo/29/6/i/0/3/U 16 11 .46
Psychoanalytic Group II Klopfer 20/N/n/S/n/C/2.0/h/29/6/i/0/3/U 10 11 .13
Psychotherapy Groups Control Klopfer 67/N/n/N/n/C/2.0/m/32/6/i/0/3/U 55 1 .63
Exner (1974) CS NR/N/n/N/n/C/NR/m/0/NR/g/0/3/U 181 4 .18
Exner & Andronikof-Sanglade (1992)
Brief, first retest CS 3.5/P/n/S/p/C/1.0/lo/25/3/i/0/45/Ma 35 27 .25
Katz Adjustment Scale (Katz & Lyerly, 1963) 7 .44
Brief, second retest CS 10/P/n/S/p/C/1.0/lo/25/3/i/0/45/Ma 35 27 .19
Katz Adjustment Scale 7 .44
Short, first retest CS 10/P/n/S/p/C/1.0/lo/25/3/i/0/45/Ma 35 27 .38
Katz Adjustment Scale 7 .44
Short, second retest CS 25.5/P/n/S/p/C/1.0/lo/25/3/i/0/45/Ma 35 27 .39
Katz Adjustment Scale 7 .44
Gaylin (1966) Standardd 4/N/n/S/f/S/1.0/lo/29/3/i/0/19/Ma 57 13 .19
Glatt & Karon (1974), Psychotherapy subgroup
First retest Standard 6/P/r/S/f/S/1.0/h/27/3/i/0/21/Ma 14 8 .39
Second retest Standard 12/P/r/S/f/S/1.0/h/27/3/i/0/21/Ma 11 8 .18
Third retest Standard, HBBe 20/P/r/S/f/S/1.0/h/27/3/i/0/21/Ma 15 12 .20
Gunderson et al. (1984), second retest Holtf 24/P/n/S/f/S/1.4/h/27/4/i/22/108/Mi 74 4 .20
Visual-Verbal Test (Feldman & Drasgow, 1951) 2 .28
Camarillo Dynamic Assessment Scale (May & 4 .01
Dixon, 1969)
Soskis Attitude Towards Illness Questionnaire 4 .02
(Soskis & Bowers, 1969)
Psychotherapy Outcome Interview (Gunderson 8 .21
& Gomes-Schwartz, 1980)
Haimowitz & Haimowitz (1952),
First retest, posttherapy Own 2.4/F/n/N/f/N/1.5/m/35/3/m/0/10/U 56 10 .27
Second retest, followup Own 15.4/F/n/N/f/N/1.5/m/35/3/m/82/10/U 10 1 .80

TABLE 6 (Continued)


Hamlin, Berger, & Cummings (1952) Muenchg 15.4/P/n/N/n/S/1.0/m/0/3/i/0/20/Ma 16 2 .77

Kavanagh (1985) DCOS, MOAh 32.7/P/n/N/f/S/4.0/m/32/12/i/0/33/Mi 33 .05
Krout, Krout, & Dulin (1952)
Psychoanalytic group Own 12/P/n/N/n/N/4.0/m/27/12/i/0/37/Ma 19 10 .43
Nonanalytic group Own 13/P/n/N/n/N/1.0/m/27/3/i/0/37/Ma 14 10 .37
LaFrance (1971) Holt 6/P/n/N/n/S/1.0/m/25/1/g/0/52/Mi 18 1 .81
TAT 1 .58
Id Ego Superego Test 1 .81
Behavior rating 1 .80
Mintz, Schmeidler, & Bristol (1956) Clin.ev. 20/P/n/N/f/N/3.0/m/30/9/i/0/1/Ma 20 1 .66
Peterson (1954) Standard 3/N/n/S/n/S/NR/lo/0/NR/i/0/30/Ma 42 12 .08
Peyman (1956) RPRSi 6/N/r/S/n/S/2.0/h/31/2/g/11/58/Mi 32 6 .41
Rice (1973) Standard 2.3/N/n/S/n/S/2.0/lo/28/6/i/0/3/Ma 48 1 .27
Therapist outcome rating 1 .37
Client outcome rating 1 .30
Weiner & Exner (1991)
Short, first retest CS 13/N/n/S/f/C/1.0/lo/23/3/i/0/48/Ma 88 27 .23
Short, second retest CS 29/N/n/S/f/C/1.0/lo/23/3/i/0/48/Ma 88 27 .26
Short, third retest CS 48/N/n/S/f/C/1.0/lo/23/3/i/0/48/Ma 88 27 .28
Long, first retest CS 13/N/n/S/f/C/2.4/lo/26/7/i/0/48/Ma 88 27 .18
Long, second retest CS 29/N/n/S/f/C/2.4/lo/26/7/i/0/48/Ma 88 27 .35
Long, third retest CS 48/N/n/S/f/C/2.4/lo/26/7/i/0/48/Ma 88 27 .43

Note. E = the number of entries coded from the study; PP = pretest to posttest design data; TC = treatment versus control design data; rw = weighted average
sample effect size; first line of each study presents Rorschach data, thereafter other multiple assessment instruments as specified; CS = Comprehensive System;
WAIS = Wechsler Adult Intelligence Scale; TAT = Thematic Apperception Test.
ᵃKey to moderator coding: Retest Period (months)/Concern for Interscorer Reliability (N = not reported; P = partial; F = full)/Randomized Design (n = not randomized; r = randomized)/Schizophrenia (N = nonschizophrenia; S = schizophrenia)/Blindscoring (n = no blinding; p = partial; f = full)/Scoring System (N = nonstandard; S = standard; C = CS)/Sessions per Week/Severity of Pathology (lo = low; m = middle; h = high)/Subject Age/Therapy Load/Therapy Type (g = group; m = mixed; i = individual)/Attrition (percentage)/Number of Scores/Publication Type (U = unpublished; Mi = Minor; Ma = Major).
ᵇAllison (A), Blatt (B), and Zimet (Z; 1988), including Developmental Concept of the Object Scale (DCOS; Blatt, Brenneis, Schimek, & Glick, 1976) and Mutuality of Autonomy Scale (MOA; Urist, 1977).
ᶜGlobal clin.ev. (clinical evaluation) of improvement from test to retest.
ᵈVarious basic scores from Rorschach (1921/1942), Beck (1950), Klopfer, Ainsworth, Klopfer, and Holt (1954), and/or Friedman (1953).
ᵉHurvich, Benveniste, and Brody's (HBB; 1998) Annihilation Anxiety scoring.
ᶠHolt's (1956) Primary Process scoring.
ᵍMuench's (1947) adjustment and improvement signs.
ʰSee footnote b.
ⁱKlopfer et al.'s (1954) Rorschach Prognostic Rating Scale (RPRS).
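The slash-separated Moderator Coding strings in Table 6 follow the field order given in the key above, so they can be unpacked mechanically. The Python sketch below is illustrative only: the field names and the function are hypothetical, numeric fields are converted to floats, and NR (not reported) is mapped to None.

```python
# Field order taken from the key to moderator coding in the Table 6 note.
FIELDS = [
    "retest_period_months", "interscorer_reliability", "randomized_design",
    "schizophrenia", "blindscoring", "scoring_system", "sessions_per_week",
    "severity_of_pathology", "subject_age", "therapy_load", "therapy_type",
    "attrition_percent", "number_of_scores", "publication_type",
]

# Fields that hold numbers rather than letter codes.
NUMERIC_FIELDS = {
    "retest_period_months", "sessions_per_week", "subject_age",
    "therapy_load", "attrition_percent", "number_of_scores",
}


def parse_moderator_coding(code):
    """Split one Table 6 coding string into a dict keyed by moderator name."""
    parts = code.split("/")
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    record = {}
    for name, raw in zip(FIELDS, parts):
        if raw == "NR":            # not reported
            record[name] = None
        elif name in NUMERIC_FIELDS:
            record[name] = float(raw)
        else:                      # letter codes are kept as written
            record[name] = raw
    return record


# Coding string for Baker (1998) as listed in Table 6.
baker = parse_moderator_coding("38/P/n/N/f/S/1.0/m/34/2/m/60/92/U")
print(baker["retest_period_months"], baker["blindscoring"], baker["therapy_load"])
```

The length check rejects strings in which a separator has been dropped, which is a useful guard when the codings are transcribed by hand.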

TABLE 7
Regression Models

Model                                    k    R      Constant   r      B       SE B   β
Sample                                   38   .664   .150
  Retest Period                                                 .38    .058    .002   .538***
  Scorer Blinding                                               −.37   −.103   .031   −.446***
  Concern for Interscorer Reliability                           .15    .132    .049   .410***
  Publication Type                                              .05    .072    .039   .269*
M                                        17   .561   .141
  Retest Period                                                 .56    .007    .003   .561**
Color Balance                            14   .833   .191
  Retest Period                                                 .76    .007    .002   .655***
  Therapy Load                                                  .55    .030    .014   .366*
Form Level                               16   .555   .238
  Therapy Load                                                  −.55   −.015   .006   −.555**
All H Content                            13   .587   .080
  Therapy Load                                                  .59    .050    .021   .587**

Note. N = 1,202. Models were not generated for R, Afr, Form, Shading, EA & EB, r & (2), Organization, Special Scores, or Global. The remaining variables in
Table 3 were drawn from less than five samples. The number of participants the variables were based on was the same as in Table 3. k = the number of samples
contributing data to the analyses; r = the univariate correlation between the effect size and the moderator.
*p < .10. **p < .05. ***p < .01.
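The analysis steps described in the Analyses section (winsorizing outliers at 3 SDs, weighted least squares with sample size as the weight, and entering ideal moderator values into the obtained equation) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the SPSS stepwise procedure used in the article: no stepwise selection is performed, the helper names are hypothetical, and the five data points are invented so that the fit can be checked by hand.

```python
import statistics


def winsorize(values, n_sd=3.0):
    """Clip values lying more than n_sd standard deviations from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    low, high = mean - n_sd * sd, mean + n_sd * sd
    return [min(max(v, low), high) for v in values]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def wls_fit(rows, y, weights):
    """Return [constant, B1, B2, ...] minimizing sum(w * residual ** 2)."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    xtwx = [[sum(w * x[a] * x[b] for x, w in zip(design, weights))
             for b in range(p)] for a in range(p)]
    xtwy = [sum(w * x[a] * yi for x, yi, w in zip(design, y, weights))
            for a in range(p)]
    return solve(xtwx, xtwy)


def predict(coefs, ideal_values):
    """Constant plus each unstandardized B times its ideal moderator value."""
    return coefs[0] + sum(b * v for b, v in zip(coefs[1:], ideal_values))


# Invented samples: (retest period in months, blinding code), effect size, N.
rows = [(3, 0), (6, 2), (12, 1), (24, 0), (48, 2)]
effect_sizes = winsorize([0.15 + 0.005 * m - 0.05 * b for m, b in rows])
sample_sizes = [20, 35, 50, 15, 88]

coefs = wls_fit(rows, effect_sizes, sample_sizes)
# Predicted effect size for a 24-month therapy scored fully blind (code 2).
print(round(predict(coefs, (24, 2)), 3))
```

Because the invented effect sizes lie exactly on a plane, the fitted coefficients recover the generating values; with real data the weights would pull the fit toward the larger samples.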
Reliability and lower in magnitude in studies coded with higher levels of Blindscoring. Finally, increasing values on the Publication Type moderator were related to slight increases in effect size levels. In a control analysis based only on the final retest interval, a rather different model was returned including Randomized Design. The model was somewhat overfitted, though, based on the moderator's relatively high standard error (.090) and restricted range and rather unlikely predicted values. Whether the control model suggested that moderator nonindependence did cause problems is therefore uncertain, but some caution should probably be exercised in the interpretation of the main model.

For the specific variables, Retest Period and/or Therapy Load was a factor in all four variable models. Increasing retest periods and increasing intensity, as defined by Therapy Load, were associated with increased effect size magnitudes except for one case. Form Level effect sizes seemed to decrease with increasing therapy load, a finding that should be interpreted with caution. The raw data indicated that moderator dependence was a problem in this particular variable because three of the four included studies each contributed data from two or three retest intervals.

The predicted increases in levels of change were substantial and impressive. The model for the sample-level results predicted an increase up to r = .49 after 2 years. Human movement (M) and Color Balance both showed steady increases with time, a finding among the most interesting in this analysis.

Summary

The most important and clear finding from the regression analyses of moderator influences was that increasing length and intensity of the therapy were associated with higher degrees of personality changes. Such a finding is important and portrays a meaningful development as therapy progresses. The validity of the results from the global estimates of change (Study 1) is thus strengthened even further.

Second, two important factors related to methodology were revealed. A higher level of Concern for Interscorer Reliability was associated with larger change estimates, indicating that the authors who conscientiously attended to the reliability of their data seemed to be rewarded by overall higher levels of validity. The other factor demonstrated
Table 8 presents the predicted levels obtained by entering a methodological pitfall. A higher degree of blindscoring
ideal values for the included moderators based on the ob- was associated with smaller effect sizes, which could very
tained regression models in Table 7. The ideal value for well be explained by expectancy effects. When scorers or
Therapy Load was defined as individual therapy twice a judges know which protocol is taken pretherapy and which
week. Retest Period, which was effectively defined as ther- is taken posttherapy, it seems that the postprotocols were
apy duration in this context, was defined as brief therapy of 6 rated more favorably than preprotocols, thereby inflating
months, short-term therapy of 1 year, and long-term therapy the degree of change. The relationship between Publication
of 2 years. Publication Type was defined as a major journal, Type and effect size levels could also be taken as an indica-
and Concern for Interscorer Reliability and Blindscoring tion of methodological quality, although only indirectly.
were both set to the highest level. One can assume that peer-reviewed journals publish studies
of higher quality than non-peer-reviewed publications. The
question is whether the difference between major and mi-
nor journals can be interpreted in the same way. There
TABLE 8 should be no reason to doubt the standard of the journals
Observed Effect Size Levels for Final categorized as major, however, and the same interpretation
Retests and Predicted Effect Size Levels is therefore reasonable.
Based on the Regression Models
The predicted levels of change over therapy duration
Predicted showed impressive gains in effect sizes. Especially the sam-
ple model result was impressive because the model included
Model Observed rw 6m 1y 2y a negative factor in Blindscoring. A note of caution should be
Sample .25 .39 .42 .49 made about extending the predictions too far. The assump-
Variable tion of a linear relationship may not hold and especially not
M .30 .18 .22 .30 when the relationship between time and effect size is
Color Balance .49 .53 .57 .65
Form Level .12 .15 stretched beyond the time scope of these studies. Other stud-
All H Content .34 .38 ies have in fact indicated a logarithmic relationship between
Note. Sample average drawn from Table 2 and variable averages from dose and effect, with larger gains in the beginning of treat-
Table 3. rw = weighted averages from final retests; 6 m = a 6-month retest ment and then smaller and smaller gains as therapy pro-
period; 1y = 1 year; 2 y = 2 years. Values entered in sample model: Concern gresses (Howard, Moras, Brill, Martinovitch, & Lutz, 1996).
for Interscorer Reliability = 2 (full control); Blindscoring = 2 (full blinding); This data was not extensive and coherent enough to allow a
Publication Type = 2 (major journal); Retest Period as specified in column
thorough examination of possible curvilinear relationships.
header. Values entered in the variable models: Sessions per Week = 2;
Therapy Type = 2 (individual therapy), resulting in Therapy Load = 6; Retest The results do show, however, that when retest period and
Period entered as specified in column header. Levels for the models that did specific methodological issues are considered, the Ror-
not include Retest Period as a moderator are presented in the 6 m column. schach may yield substantial effect sizes over time.
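The predicted levels in Table 8 come from evaluating a linear regression equation at chosen moderator values. The sketch below illustrates this for the M model only, under two assumptions that the tables themselves do not confirm: that the third value on the M row of Table 7 (.141) is the model intercept, and that Retest Period is measured in months, with .007 as the unstandardized slope. The function name and constants are illustrative, not taken from the study.

```python
def predict_effect_size(intercept, slope_per_month, months):
    """Linear prediction: effect size = intercept + slope * retest period."""
    return intercept + slope_per_month * months


# Assumed readings of the M row in Table 7 (see the caveats above).
M_INTERCEPT = 0.141        # assumption: third value on the M row is the constant
M_SLOPE_PER_MONTH = 0.007  # assumption: B for Retest Period, in r units per month

for label, months in (("6m", 6), ("1y", 12), ("2y", 24)):
    predicted = predict_effect_size(M_INTERCEPT, M_SLOPE_PER_MONTH, months)
    print(f"{label}: predicted r = {predicted:.3f}")
```

Under these assumptions the values land close to the .18, .22, and .30 reported for M in Table 8; small differences would be expected from rounding of the published coefficients. Because the equation is a straight line, it keeps rising at the same rate beyond 2 years, which is why extrapolation outside the observed retest periods calls for caution.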
DISCUSSION

In this meta-analysis of studies using the Rorschach method to assess personality changes following psychotherapy, the method showed moderate sensitivity. Overall results were strong. A combined sample effect size of rw = .26 is a moderate effect size according to Hemphill's (2003) guidelines but still substantial in this context. Meyer et al. (2001) noted that effect sizes from a wide selection of psychological assessment studies based on appropriate criteria and without method confounds typically fall between .15 and .30. Meyer et al. (2001) argued that one generally should be pleased (p. 134) when one encounters effect sizes around .30. Other meta-analyses of psychotherapy studies have yielded much the same effect size magnitudes as these Rorschach results as shown in the beginning of the article. These results also show that the Rorschach method seemed to perform as well as other instruments. The authors of the original studies clearly found the selection of instruments to be relevant and useful, most likely because their selection provided a multifaceted account of changes in various personality domains. It is encouraging to find that the Rorschach method clearly also had its place. From the perspective of this analysis, the negative reputation that the Rorschach method has acquired within the field of psychotherapy research should be reconsidered.

These samples are based on a variety of participants with varying types of pathology, with very different types of therapies of varying duration, and with a variety of predictors and designs. They represent a selection of clinically representative studies mostly being conducted with naturalized samples from everyday clinical practice. This feature is a strength concerning the generalizability of the results. The fail-safe N analysis showed that 790 unpublished studies with null results need to be included before these results are rendered nonsignificant, another indication that the findings are sturdy.

The predicted levels obtained from the regression models inspire further confidence in the results. Substantial increases were found in effect size magnitudes when retest periods and methodological features were considered. Most important, longer and more intensive psychotherapy yielded larger effect sizes than shorter and less intensive therapy. This finding is in line with the broader psychotherapy research literature, which has generally concluded that longer therapies are associated with greater improvement (Orlinsky et al., 1994). Howard, Kopta, Krause, and Orlinsky (1986) showed that increasing doses of therapy resulted in higher ratings of satisfactory outcome. The relationship was logarithmic, indicating that the quickest gains were achieved in the early phases of therapy and that the gains become smaller and smaller as therapy length increases. Howard, Lueger, Maling, and Martinovitch (1993) conceptualized the change pattern in the phase model of psychotherapy. The dose–effect relationship has been confirmed in another important data set, the Consumer Reports study (Seligman, 1995, 1996). Studies included in this analysis that reported several retests for various samples (Exner & Andronikof-Sanglade, 1992; Gunderson et al., 1984; Haimowitz & Haimowitz, 1952; Weiner & Exner, 1991) also lend strong support to this conclusion in which the increase in effect sizes with longer therapy duration was clearly demonstrated within the same groups. Only one study (Glatt & Karon, 1974) defied this trend: the first retest reached the highest level, and the effect size then sank from r = .39 to .18 and finally .20. The proportion of missing data was relatively high in this material, which could account for the somewhat atypical pattern.

Much of this data is based on pretest to posttest comparison designs, and I showed in a previous study (Grønnerød, 2003) that temporal stability decreases with increasing test intervals. The largest portion of the data showed changes in the direction predicted by the examiners, however, thus contradicting the assertion that low temporal stability should be related to larger effect sizes. Such a result would be unlikely if changes were unrelated to therapeutic interventions. Rather, one would see a picture of changes going in either way despite the predicted directions. I therefore conclude that temporal stability is not a major concern in this context.

Clinicians who are using the Rorschach method in their daily work might find the substantial effect sizes obtained for construct evaluations and global clinical evaluations of particular interest. Clinical use of the Rorschach method has often been criticized for lacking scientific credibility and even being speculative. These data contradict such a general assertion, and show that proper, conceptually driven, theoretically based use of the Rorschach can yield highly valid results. The locus of responsibility is thus shifted away from the Rorschach as a method and onto the user who must carry the responsibility of treating Rorschach data according to clinically and scientifically sound principles.

On the other hand, the diversity of the included studies makes this analysis too diffuse to say which types of therapy, of what length, and for what kind of disorders the Rorschach method is most sensitive. Very few studies have used the Rorschach method to assess therapy progress, a surprisingly low number compared with the large amount of research on the Rorschach method in general and compared with the method's widespread use in clinical settings. In this analysis, the low number of samples resulted in a clustering of certain types of data within single samples; for example, object relations data came from only two studies, and almost all CS data from two others. The regression models were in many cases somewhat fragile. During the course of this work, it was noticeable how small changes in the data selection or type of analysis caused the models to change quite quickly. This posed a particular problem for the variable-level analyses in Tables 3, 7, and 8 for which the data was the most scarce. A reasonable question to ask is therefore whether the variation in the levels of change across different variables as seen in Table 3 is a result of true differences or a lack of data.
Looking at the substantial interpretation of the various variables, identifying patterns is difficult, probably also due to the restricted amount of data. One also needs to consider the possibility that some personality characteristics may change but are not very well captured by Rorschach variables. With these problems in mind, a gross and tentative grouping of the variables can be made.

In the lower half of Table 3, we find several variables related to self-concept, self-perception, and interpersonal relations such as reflection and pairs (r & (2)), Depth, Object Relations, and Active and Passive movement (a & p). The exceptions were Human Movement (M), Human Content (All H Content), and Shading, which were in the range of .30 to .34. Human Movement also showed a strong increase in predicted levels of change with time (Table 8).

In the upper half of Table 3, one finds nearly all variables related to stress and control capacity on one hand and affect on the other. Variables related to stress and coping, such as the D score, Coping Deficit Index (CDI), and Adjusted D score (Adj D), were in the upper portion of the list. An even more marked feature was the clustering of affective features in the upper half, with the sole exception of the weighted Color scores (SumC). Color Balance, Form Color (FC), Stimulation (SumShd > FM + m), Affective Ratio (Afr), and partly Shading and the Depression Index (DEPI) all displayed large effect sizes. In addition, Color Balance showed a strong gain in predicted degree of change over time (Table 8).

The remaining variables in Table 3 related to information processing (W & D, Form, Organization) as well as mediational and ideational aspects of personality (Populars, Form Level, FM + m, Special Scores). These variables were more spread across the range of effect size magnitudes and did not seem to form any particular pattern.

The clustering may suggest that features related to self-perception and to some extent interpersonal relationships are less susceptible to change than affective and coping features. Interpersonal relationships and self-perceptions are fundamental features of personality, and problems related to these areas often form core issues in long-term therapies. Given the limited average length of the therapies, less change could and perhaps even should be expected for these variables. The fact that Human Movement started out on a relatively low level and then showed increases with time in treatment may suggest, however, that ideational features of interpersonal relationships are susceptible to changes given enough time. If this interpretation is feasible, then thinking differently about relations to other people may in turn provide the basis for changing perceptions about one's self, which may be a much slower process. These features correspond to characterological symptoms in the phase model of psychotherapy outcome (Howard et al., 1993), which showed a slower recovery rate than acute and chronic distress.

Helping a person to handle affects and stress more efficiently, on the other hand, should be a more manageable task in shorter therapies, which could explain the higher effect sizes found for these variables. Affect and stress are more at the front of the clinical picture and therefore more easily accessible for the patient and the therapist. Again, this corresponds quite well with the phase model, which showed the most rapid recovery rates for acute distress and then somewhat slower for chronic distress symptoms. The increase in predicted levels for Color Balance might suggest that therapy is quite effective in bringing affects under more deliberate control and in increasing the efficiency of secondary processes. In more cognitive terms, the patient is better able to cope with stress, anxiety, and other emotions.

A further note should be taken of the predicted increases in M and Color Balance over time. Initially, this result might suggest a link to Hermann Rorschach's conceptualizations of Erlebnistypus (Rorschach, 1921/1942) and to current conceptualizations of Experience Actual (EA; Exner, 2003). If one looks at the components of EA, which are M and SumC, one finds, however, that SumC showed a weak effect. SumC, and thereby EA, has been criticized for being a problematic indicator of problem-solving resources (Lajoie, 1998), and these results may underline this criticism. If one is to infer that therapy leads to improved functioning, which should be a reasonable inference, it seems that Color Balance would be a better conceptualization of affective coping resources than SumC.

For specific variables examined in this study and in Grønnerød (2003), the effect sizes for changes due to treatment were correlated with the effect size for temporal stability. The correlation was close to zero. The temporal stability levels for this group of variables were high, however. Except for variables reflecting state-like features (FM + m, Shading) and Animal Content, all stability coefficients ranged from .70 to .84, in many cases with even higher predicted levels based on regression models with moderators. To compare more directly, temporal stability levels for all types of variables in a scoring system are needed. This is especially pertinent for those variables that are important for interpretations and for the CS with its widespread use and reputation for scientific vigor. Even more important, the reliability of each variable, both interscorer reliability and temporal stability, should be considered within each study when data is analyzed. Only this can give a direct assessment of how reliability levels affect effect size magnitudes.

These results can be used as a starting ground for improvements in methodology. The particular challenges facing researchers using the Rorschach method have been pertinently pointed out several times (e.g., Exner, 1995), and these results add further weight to the importance of these concerns. Researchers should ensure that proper blinding is applied in administration and scoring. Expectancy effects are well documented in drug treatment studies (Fisher & Greenberg, 1997) and in other contexts (Rosenthal & Rubin, 1978), showing that researchers have a tendency to find what they are looking for. These results also give the same suggestion.
Ensuring proper blinding of test–retest information will most likely result in lower effect size magnitudes, but the results will also be more valid.

Next, the reliability of the data should be secured by a proper interscorer reliability analysis. The reliability should also be incorporated into the analyses, for example, by discarding variables with low reliability. The recent debate concerning interscorer reliability in Rorschach studies (e.g., Meyer, 1997a, 1997b, 1999b; Meyer et al., 2002; Wood et al., 1996; Wood, Nezworski, & Stejskal, 1997) has been helpful in highlighting the importance of securing reliability, and methods have been presented for doing so. The problem should therefore be smaller today than when many of the studies included in this analysis were published.

Two of the included studies, Exner and Andronikof-Sanglade (1992) and Weiner and Exner (1991), deserve special attention and can serve as illustrations of a general point. These studies are important because they have been written by central figures behind the CS and therefore serve as models for other studies (e.g., Abraham, Lepisto, Lewis, Schultz, & Finkelberg, 1994; Carlson & Lindgren, 2002; de Ruiter & Hildebrand, 2000). They should be rewarded for their general focus on clinically important aspects of the data, but concerns can also be raised. In these studies, the number and percentage of persons in a group who were outside a cutoff point on a specific variable on the first test were presented. This figure was then compared with the same number and percentage on retest on the same variable. In this way, a dichotomy was created based on whether a person was inside or outside an area assumed to have interpretational importance. Significant differences between the groups were tested by a chi-square analysis with correction for continuity. The results of the analyses were represented as probability limits for each variable.

First, applying a type of analysis in which the participants to a larger extent are their own controls is preferable rather than only comparing group means or frequencies. More emphasis is thus put on idiographical aspects of the data, which should increase the clinical validity of the results. In psychotherapy research, this is very important given the highly individual configuration of pathology and personality changes. The possibility of individual changes canceling each other at the group level should be constricted. If two participants cross the cutoff point in opposite directions from test to retest, the resulting group change will be zero. Data from the stability analysis also clearly suggested the advantages of test–retest correlations over other statistics in which only group differences are tested. Second, general cutoff points might not be relevant for the participants at hand. The more severe the pathology, the more problematic this can be, as initial deviations from cutoff points could be larger. This effectively reduces the statistical power of the study. Given the quite strong results in the mentioned studies, this remains a principal objection, but other studies that have followed the same analysis have failed to obtain significant results. Another objection can also be raised. When less seriously impaired participants are studied, a larger portion of the group will be expected to move across the cutoff points. The observed changes might be too minute to have any clinical significance, or the changes might be caused by random factors. Reporting the magnitude of the observed changes is therefore important also, as Abraham et al. (1994) did (this study examined adolescents and was therefore not included in this analysis). Otherwise, depending on the actual participants and the choice of cutoff levels, the possibility of both Type I and Type II errors is immanent in these types of analyses.

It is advisable that the units of analysis are as close as possible to clinically important aspects of the data. To enhance methodology in this field further, more care should be taken to develop methods to assess changes that focus on the clinical usefulness of selected elements in the Rorschach method. These methods should consider many structural and substantial aspects of the data. First, the distributional characteristics are important. Skewness reduces test–retest correlations (Dunlap, Chen, & Greer, 1994), and this needs to be accounted for in temporal stability studies and possibly also in change assessments. The reliability of the variable is important, and the analysis should directly incorporate the effects of interscorer reliability and temporal stability. This could be done according to classic reliability theory as a correction for attenuation or perhaps more appropriately according to generalizability theory (e.g., Shavelson & Webb, 1991). On the substantial side, such methods should take clinically meaningful cutoff points, the magnitude of the change, and the expected change for any given individual into account (for a recent review of such methods, see Wise, 2004). The expected pattern of changes based on each person's individual personality configuration should be considered, especially because of the Rorschach method's alleged ability to capture these configurations.

These analyses do not suggest that the Rorschach method necessarily should be the preferred instrument in psychotherapy studies. What can be concluded, however, is that these data show that the Rorschach method is not inferior in this context. It is a time-consuming method, and in large-scale projects, this might be a decisive factor. The relatively low number of participants in each of these samples might also speak of this concern. On the other hand, it is a question of gains and losses related to the purpose at hand. Several authors of these studies have pointed to the important fact that different participants change differently according to their psychopathology and that the Rorschach method meaningfully captures the individual nature of these changes. Given the positive results of this study, the Rorschach's idiographic nature does not come at the expense of more quantitative use. Rather, they are complementary aspects of the same method, which should encourage more use of the Rorschach method in psychotherapy research in the future.
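The cancellation problem raised above for cutoff-based frequency comparisons can be made concrete with a small sketch. The participant data are invented, and the hand-written 2 x 2 chi-square with Yates's continuity correction is a generic textbook form, not a reconstruction of the analyses in the cited studies.

```python
def yates_chi_square(a, b, c, d):
    """Chi-square for the 2 x 2 table [[a, b], [c, d]] with continuity correction."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    # The corrected difference is floored at zero so the correction cannot
    # create a spurious effect when the two rows are identical.
    diff = max(0.0, abs(a * d - b * c) - n / 2)
    return n * diff ** 2 / denom


# True = participant is outside the cutoff on a hypothetical variable.
pre = [True] * 5 + [False] * 5
post = list(pre)
post[0] = False  # one participant moves inside the cutoff
post[5] = True   # another moves outside, in the opposite direction

# Group-level view: counts outside/inside at test versus retest.
group_chi2 = yates_chi_square(sum(pre), len(pre) - sum(pre),
                              sum(post), len(post) - sum(post))

# Individual-level view: how many participants changed category.
individual_changes = sum(p != q for p, q in zip(pre, post))

print(group_chi2, individual_changes)
```

The group frequencies are identical at test and retest, so the chi-square is zero even though two participants changed category. An analysis in which participants serve as their own controls would register both changes.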
ACKNOWLEDGMENTS

A paper with a brief outline of an early version of this study was presented at the Seventeenth International Congress of Rorschach and Projective Methods in Rome, Italy, September 2002. The results were presented at the seventh European Rorschach Association Congress in Stockholm, Sweden, August 2004. I thank Ellen Hartman for valuable comments on the text and Monica Martinussen for good advice on meta-analytic techniques at the start of this project. Jukka Nyblom provided valuable help with some of the raw data analyses and suggested the weighting in the intermethod analyses. Stine Vogt provided useful language corrections for an early version of this manuscript. Sarah Elisabeth Zeld kindly did the actual job of recovering the Glatt and Karon (1974) data from the archives.

REFERENCES

References marked with an asterisk indicate studies included in the meta-analysis.

Abraham, P. P., Lepisto, B. L., Lewis, M. G., Schultz, L., & Finkelberg, S. (1994). An outcome study: Changes in Rorschach variables of adolescents in residential treatment. Journal of Personality Assessment, 62, 505–514.
Acklin, M. W. (1999). Behavioral science foundations of the Rorschach Test: Research and clinical applications. Assessment, 6, 319–326.
Acklin, M. W., McDowell, C. J., Verschell, M. S., & Chan, D. (2000). Interobserver agreement, intraobserver reliability, and the Rorschach Comprehensive System. Journal of Personality Assessment, 74, 15–47.
Allison, J., Blatt, S. J., & Zimet, C. N. (1988). The interpretations of psychological tests. Washington, DC: Hemisphere.
Atkinson, L., Quarrington, B., Alp, I. E., & Cyr, J. J. (1986). Rorschach validity: An empirical approach to the literature. Journal of Clinical Psychology, 42, 360–362.
*Baker, L. C. (1998). Rorschach assessment in borderline personality disorder: A follow-up study. Unpublished doctoral dissertation, Michigan State University, East Lansing.
*Barendregt, J. T. (1957). A psychological investigation of the effect of group psychotherapy in patients with bronchial asthma. Journal of Psychosomatic Research, 2, 115–119.
Barry, J. R., Blyth, D. D., & Albrecht, R. (1952). Relationships between Rorschach scores and adjustment level. Journal of Consulting Psychology, 16, 30–36.
Beck, S. J. (1950). Rorschach's Test: Vol. 1. Basic processes (2nd ed.). New York: Grune & Stratton.
Bergin, A. E., & Garfield, S. L. (1994). Handbook of psychotherapy and behavior change (4th ed.). New York: Wiley.
Berman, A. L. (1972). Videotape self-confrontation of schizophrenic ego and thought processes. Journal of Consulting and Clinical Psychology, 39, 78–85.
Blatt, S. J., Brenneis, C. B., Schimek, J. G., & Glick, M. (1976). Normal development and psychopathological impairment of the concept of the object on the Rorschach. Journal of Abnormal Psychology, 85, 364–373.
*Blatt, S. J., Ford, R. Q., Berman, W., Cook, B., & Meyer, R. (1988). The assessment of change during the intensive treatment of borderline and schizophrenic young adults. Psychoanalytic Psychology, 5, 127–158.
Bornstein, R. F. (1998). Interpersonal dependency and physical illness: A meta-analytic review of retrospective and prospective studies. Journal of Research in Personality, 32, 480–497.
Bornstein, R. F. (1999). Criterion validity of objective and projective dependency tests: A meta-analytic assessment of behavioral prediction. Psychological Assessment, 11, 48–57.
Cadman, W. H., Misbach, L., & Brown, D. V. (1954). An assessment of roundtable therapy. Psychological Monographs, 68, 384.
Campbell, D. T., & Kenny, D. A. (1999). A primer on regression artifacts. New York: Guilford.
*Campo, V., Dow, N., & Tuset, A. (1988). Rorschach, O.R.T. and follow-up. British Journal of Projective Psychology, 33(2), 31–53.
Carlson, A. M., & Lindgren, T. (2002, September). Methodological problems when using the Rorschach in the measurement of psychotherapy outcome. Paper presented at the 17th International Congress of Rorschach and Projective Methods, Rome, Italy.
*Carr, A. C. (1949). An evaluation of nine nondirective psychotherapy cases by means of the Rorschach. Journal of Consulting Psychology, 13, 196–205.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.
*Coons, W. H. (1957). Interaction and insight in group psychotherapy. Canadian Journal of Psychology, 11, 1–8.
*Coons, W. H., & Peacock, E. P. (1970). Interpersonal interaction and personality change in group psychotherapy. Canadian Psychiatric Association Journal, 15, 347–355.
Corel. (2000). Quattro Pro (Version 9.0 for Windows) [Computer software]. Ottawa, Ontario, Canada: Author.
Curran, P. J., West, S. G., & Finch, J. F. (1996). The robustness of test statistics to nonnormality and specification error in confirmatory factor analysis. Psychological Methods, 1, 16–29.
de Ruiter, C., & Hildebrand, M. (2000, September). The Rorschach in the assessment of treatment outcome in forensic psychiatric patients. Paper presented at the 6th Congress of the European Rorschach Association, Paris, France.
Doherty, J. P. (2001). The assessment of annihilation anxiety in the psychotherapy of schizophrenia. Unpublished doctoral dissertation, Central Michigan University, Mount Pleasant.
Dombrose, L. A., & Slobin, M. S. (1958). The IES test. Perceptual and Motor Skills, 8, 347–389.
*Dudek, S. Z. (1970). Effects of different types of therapy on the personality as a whole. Journal of Nervous and Mental Disease, 150, 329–345.
Dunlap, W. P., Chen, R., & Greer, T. (1994). Skew reduces test-retest reliability. Journal of Applied Psychology, 79, 310–313.
Endicott, N. A., & Endicott, J. (1963). Improvement in untreated psychiatric patients. Archives of General Psychiatry, 9, 575–585.
*Exner, J. E., Jr. (1974). The Rorschach: A Comprehensive System: Vol. 1. Basic foundations. New York: Wiley.
Exner, J. E., Jr. (1978). The Rorschach: A Comprehensive System: Vol. 2. Current research and advanced interpretation. New York: Wiley.
Exner, J. E., Jr. (1986). The Rorschach: A Comprehensive System: Vol. 1. Basic foundations (2nd ed.). New York: Wiley.
Exner, J. E., Jr. (Ed.). (1995). Issues and methods in Rorschach research. New York: Lawrence Erlbaum Associates, Inc.
Exner, J. E., Jr. (2003). The Rorschach: A Comprehensive System: Vol. 1. Basic foundations and principles of interpretation (4th ed.). New York: Wiley.
*Exner, J. E., Jr., & Andronikof-Sanglade, A. (1992). Rorschach changes following brief and short-term therapy. Journal of Personality Assessment, 59, 59–71.
Fairweather, G., Simon, R., Beghard, M., Weingarten, E., Holland, J., Sanders, R., et al. (1960). Relative effectiveness of psychotherapeutic programs: A multicriteria comparison of four programs for three different patient groups. Psychological Monographs, 74, 171–185.
Feldman, M. J., & Drasgow, J. A. (1951). A visual verbal test for schizophrenia. Psychiatric Quarterly Supplement, 25, 55–64.
Field, A. P. (2001). Meta-analysis of correlation coefficients: A Monte Carlo comparison of fixed- and random-effects methods. Psychological Methods, 6, 161–180.
Fisher, S., & Greenberg, R. P. (1997). The curse of the placebo: Fanciful pursuit of a pure biological therapy. In S. Fisher & R. P. Greenberg (Eds.), From placebo to panacea (pp. 3–56). New York: Wiley.
Fishman, D. B. (1973). Rorschach adaptive regression and change in psychotherapy. Journal of Personality Assessment, 37, 218–244.
Friedman, H. (1953). Perceptual regression in schizophrenia. An hypothesis suggested by the use of the Rorschach test. Journal of Projective Techniques, 17, 171–185.
Garb, H. N. (1999). Call for a moratorium on the use of the Rorschach Inkblot Test in clinical and forensic settings. Assessment, 6, 313–317.
Garb, H. N., Florio, C. M., & Grove, W. M. (1998). The validity of the Rorschach and the Minnesota Multiphasic Personality Inventory: Results from meta-analyses. Psychological Science, 9, 402–404.
Garb, H. N., Florio, C. M., & Grove, W. M. (1999). The Rorschach controversy: Reply to Parker, Hunsley, and Hanson. Psychological Science, 10, 293–294.
Garb, H. N., Wood, J. M., & Nezworski, M. T. (2000). Projective techniques and the detection of child sexual abuse. Child Maltreatment: Journal of the American Professional Society on the Abuse of Children, 5, 161–168.
Garfield, S. L. (1978). Research on client variables in psychotherapy. In A. E. Bergin & S. L. Garfield (Eds.), Handbook of psychotherapy and behavior change (3rd ed.). New York: Wiley.
Garfield, S. L. (1994). Research on client variables in psychotherapy. In A. E. Bergin & S. L. Garfield (Eds.), Handbook of psychotherapy and behavior change (4th ed.). New York: Wiley.
*Gaylin, N. L. (1966). Psychotherapy and psychological health: A Rorschach function and structure analysis. Journal of Consulting Psychology, 30, 494–500.
Gibson, R. L., Snyder, W. U., & Ray, W. S. (1955). The counseling process. Journal of Counseling Psychology, 2, 83–90.
*Glatt, C. T., & Karon, B. P. (1974). A Rorschach validation study of the ego regression theory of psychopathology. Journal of Consulting and Clinical Psychology, 42, 569–576.
Goldfried, M. R., Stricker, G., & Weiner, I. B. (1971). Rorschach handbook of clinical and research applications. Englewood Cliffs, NJ: Prentice-Hall.
Grissom, R. J. (1996). The magical number .7 ± .2: Meta-meta-analysis of the probability of superior outcome in comparisons involving therapy, placebo, and control. Journal of Consulting and Clinical Psychology, 64, 973–982.
Grønnerød, C. (1999). Rorschach interrater agreement estimates: An empirical evaluation. Scandinavian Journal of Psychology, 40, 115–120.
Grønnerød, C. (2003). Temporal stability in the Rorschach method: A meta-analytic review. Journal of Personality Assessment, 80, 272–293.
Grove, W. M., Andreassen, N. C., McDonald-Scott, P., Keller, M. B., & Shapiro, R. W. (1981). Reliability studies of psychiatric diagnosis. Archives of General Psychiatry, 38, 408–413.
*Gunderson, J. G., Frank, A. F., Katz, H. M., Vannicelli, M. L., Frosch, J. P., & Knapp, P. H. (1984). Effects of psychotherapy in schizophrenia: II. Comparative outcome of two forms of treatment. Schizophrenia Bulletin, 10, 564–598.
Gunderson, J. G., & Gomes-Schwartz, B. (1980). The quality of outcome from psychotherapy of schizophrenia. In M. B. B. J. S. Strauss, Jr., T. W. Downey, S. Fleck, S. Jackson, & I. Levine (Eds.), The psychotherapy of schizophrenia (pp. 257–275). New York: Plenum.
Gunderson, J. G., Kolb, J. E., & Austin, Y. (1981). The diagnostic interview for borderline patients. American Journal of Psychiatry, 142, 277–288.
*Haimowitz, N., & Haimowitz, M. (1952). Personality changes in client-centered psychotherapy. In W. Wolff & J. A. Precker (Eds.), Success in psychotherapy (pp. 63–93). New York: Grune & Stratton.
Hamlin, R. M., & Albee, G. W. (1948). Muench's tests before and after nondirective therapy: A control group for his subjects. Journal of Consulting Psychology, 12, 412–416.
*Hamlin, R. M., Berger, B., & Cummings, S. T. (1952). Changes in adjustment following psychotherapy as reflected in Rorschach signs. In W. Wolff & J. A. Precker (Eds.), Success in psychotherapy (pp. 94–111). New York: Grune & Stratton.
Harrower, M. (1958). Personality change and development, as measured by projective techniques. New York: Grune and Stratton.
Harrower, M. (1965). Psychodiagnostic testing: An empirical approach: Based on a follow-up of 2000 cases. Springfield, IL: Thomas.
Harty, M., Cerney, M., Colson, D., Coyne, L., Freiswyk, S., Johnston, S., et al. (1981). Correlates of change and long-term outcome for intensively treated hospital patients: An exploratory study. Bulletin of the Menninger Clinic, 45, 209–228.
Hathaway, S. R., & McKinley, J. C. (1943). The Minnesota Multiphasic Personality Inventory. Minneapolis: University of Minnesota Press.
Hemphill, J. F. (2003). Interpreting the magnitudes of correlation coefficients. American Psychologist, 58, 78–79.
Hiller, J. B., Rosenthal, R., Bornstein, R. F., Berry, D. T. R., & Brunell-Neuleib, S. (1999). A comparative meta-analysis of Rorschach and MMPI validity. Psychological Assessment, 11, 278–296.
Holt, R. R. (1956). Gauging primary and secondary processes in Rorschach responses. Journal of Projective Techniques, 20, 14–25.
Howard, K. I., Kopta, S. M., Krause, M. S., & Orlinsky, D. E. (1986). The dose-effect relationship in psychotherapy. American Psychologist, 41, 159–164.
Howard, K. I., Lueger, R. J., Maling, M. S., & Martinovitch, Z. (1993). A phase model of psychotherapy outcome: Causal mediation of change. Journal of Consulting and Clinical Psychology, 61, 678–685.
Howard, K. I., Moras, K., Brill, P. L., Martinovitch, Z., & Lutz, W. (1996). Evaluation of psychotherapy: Efficacy, effectiveness, and patient progress. American Psychologist, 51, 1059–1064.
Hurvich, M., Benveniste, P., & Brody, S. (1998). Scoring manual for the RCS. Unpublished manuscript.
Jensen, A. R. (1959). The reliability of projective techniques: Review of the literature. Acta Psychologica, Amsterdam, 16, 108–136.
Kamil, L. J. (1970). Psychodynamic changes through systematic desensitization. Journal of Abnormal Psychology, 76, 199–205.
Kantrowitz, J. L., Katz, A. L., & Paolitto, F. (1990). Follow-up of psychoanalysis five to ten years after termination: I. Stability of change. Journal of the American Psychoanalytic Association, 38, 471–496.
Kantrowitz, J. L., Katz, A. L., Paolitto, F., Sashin, J., & Solomon, L. (1987a). Changes in the level and quality of object relations in psychoanalysis: Follow-up of a longitudinal, prospective study. Journal of the American Psychoanalytic Association, 35, 23–46.
Kantrowitz, J. L., Katz, A. L., Paolitto, F., Sashin, J., & Solomon, L. (1987b). The role of reality testing in psychoanalysis: Follow-up of 22 cases. Journal of the American Psychoanalytic Association, 35, 367–385.
Kantrowitz, J. L., Paolitto, F., Sashin, J., Solomon, L., & Katz, A. L. (1986). Affect availability, tolerance, complexity, and modulation in psychoanalysis: Follow-up of a longitudinal, prospective study. Journal of the American Psychoanalytic Association, 34, 529–559.
Karon, B. P., & Vandenbos, G. R. (1972). The consequences of psychotherapy for schizophrenic patients. Psychotherapy, 9, 111–119.
Katz, M. M., & Lyerly, S. (1963). Methods for measuring adjustment and social behavior in the community. Psychological Reports, 13, 503–535.
*Kavanagh, G. G. (1985). Changes in patients' object representations during psychoanalysis and psychoanalytic psychotherapy. Bulletin of the Menninger Clinic, 49, 546–564.
Klopfer, B., Ainsworth, M., Klopfer, W., & Holt, R. (1954). Developments in the Rorschach technique: Vol. 1. Technique and theory. New York: World Book.
*Krout, J., Krout, M. H., & Dulin, T. J. (1952). Rorschach test-retest as a gauge of progress in psychotherapy. Journal of Clinical Psychology, 8, 380–384.
*LaFrance, S. C. (1971). Developing ego strength in drug dependent persons through group therapy. Correctional Psychologist, 4(5), 171–182.
Lajoie, G. (1998, August). The uncertain validity of the D Score – A demonstration. Paper presented at the 5th Congress of the European Rorschach Association, Madrid, Spain.
Lambert, M. J., & Hill, C. E. (1994). Assessing psychotherapy outcome and process. In A. E. Bergin & S. L. Garfield (Eds.), Handbook of psychotherapy and behavior change (4th ed.). New York: Wiley.
Lerner, B. (1972). Therapy in the ghetto: Political impotence and personal disintegration. Baltimore: Johns Hopkins Press.
Lerner, B., & Fiske, D. W. (1973). Client attributes and the eye of the beholder. Journal of Consulting and Clinical Psychology, 40, 272–277.
Lerner, P. M. (1975). Handbook of Rorschach scales. New York: International Universities Press.
Lindberg, D. (1981). A controlled study of 5 years' treatment with psychotherapy in combination with depot neuroleptics in schizophrenia. Acta Psychiatrica Scandinavica: Supplementum, 289, 55–66.
Lipsey, M. W., & Wilson, D. B. (1993). The efficacy of psychological, educational, and behavioral treatment. American Psychologist, 48, 1181–1209.
Lipsey, M. W., & Wilson, D. B. (2001). Practical meta-analysis. Thousand Oaks, CA: Sage.
Martin, D. J., Garske, J. P., & Davis, M. K. (2000). Relation of the therapeutic alliance with outcome and other variables: A meta-analytic review. Journal of Consulting and Clinical Psychology, 68, 438–450.
May, P. R. A., & Dixon, W. J. (1969). The Camarillo Dynamic Assessment Scale. Bulletin of the Menninger Clinic, 33, 1–35.
McDowell, C., & Acklin, M. W. (1996). Standardizing procedures for calculating Rorschach interrater reliability: Conceptual and empirical foundations. Journal of Personality Assessment, 66, 308–320.
McReynolds, P., & Ferguson, J. T. (1953). Clinical manual for the Hospital Adjustment Scale. Palo Alto, CA: Leland Stanford University Press.
Meltzoff, J., & Kornreich, M. (1970). Research in psychotherapy. New York: Atherton.
Meyer, G. J. (1997a). Assessing reliability: Critical corrections for a critical examination of the Rorschach Comprehensive System. Psychological Assessment, 9, 480–489.
Meyer, G. J. (1997b). Thinking clearly about reliability. More critical corrections regarding the Rorschach Comprehensive System. Psychological Assessment, 9, 495–498.
Meyer, G. J. (1999a, July). A cross national review of Rorschach interscorer reliability. Paper presented at the 16th International Congress of Rorschach and Projective Methods, Amsterdam, The Netherlands.
Meyer, G. J. (1999b). Simple procedures to estimate chance agreement and kappa for the interrater reliability of response segments using the Rorschach Comprehensive System. Journal of Personality Assessment, 72, 230–255.
Meyer, G. J. (2000). On the science of Rorschach research. Journal of Personality Assessment, 75, 46–81.
Meyer, G. J., & Archer, R. P. (2001). The hard science of Rorschach research: What do we know and where do we go? Psychological Assessment, 13, 486–502.
Meyer, G. J., Finn, S. E., Eyde, L., Kay, G. G., Moreland, K., Dies, R. R., et al. (2001). Psychological testing and psychological assessment: A review of evidence and issues. American Psychologist, 56, 128–165.
Meyer, G. J., & Handler, L. (1997). The ability of the Rorschach to predict subsequent outcome: A meta-analysis of the Rorschach Prognostic Rating Scale. Journal of Personality Assessment, 69, 1–38.
Meyer, G. J., & Handler, L. (2000). Correction to Meyer and Handler (1997). Journal of Personality Assessment, 74, 504–506.
Meyer, G. J., Hilsenroth, M. J., Baxter, D., Exner, J. E., Jr., Fowler, J. C., Piers, C. C., et al. (2002). An examination of interrater reliability for scoring the Rorschach Comprehensive System in eight data sets. Journal of Personality Assessment, 78, 219–274.
*Mintz, E. E., Schmeidler, G. R., & Bristol, M. (1956). Rorschach changes during psychoanalysis. Journal of Projective Techniques, 20, 414–417.
Muench, G. A. (1947). An evaluation of non-directive psychotherapy. International Review of Applied Psychology, 13, 17–38.
Murray, H. A. (1938). Explorations in personality. New York: Oxford.
Orlinsky, D. E., Grawe, K., & Parks, B. K. (1994). Process and outcome in psychotherapy: Noch einmal. In A. E. Bergin & S. L. Garfield (Eds.), Handbook of psychotherapy and behavior change (4th ed., pp. 270–376). New York: Wiley.
Parker, K. C. H. (1983). A meta-analysis of the reliability and validity of the Rorschach. Journal of Personality Assessment, 47, 227–231.
Parker, K. C. H., Hanson, R. K., & Hunsley, J. (1988). MMPI, Rorschach, and WAIS: A meta-analytic comparison of reliability, stability, and validity. Psychological Bulletin, 103, 367–373.
Parker, K. C. H., Hunsley, J., & Hanson, R. K. (1999). Old wine from old skins sometimes tastes like vinegar: A response to Garb, Florio, and Grove. Psychological Science, 10, 291–292.
*Peterson, A. O. D. (1954). A comparative study of Rorschach scoring methods in evaluating personality changes resulting from psychotherapy. Journal of Clinical Psychology, 10, 190–192.
*Peyman, D. A. R. (1956). An investigation of the effects of group psychotherapy on chronic schizophrenic patients. Group Psychotherapy, 9, 35–39.
Piotrowski, Z. A. (1957). Perceptanalysis. New York: Macmillan.
Piotrowski, Z. A., & Schreiber, M. (1952). Rorschach perceptanalytic measurement of personality changes during and after intensive psychoanalytically oriented psychotherapy. In G. Bychowski & J. L. Despert (Eds.), Specialized techniques in psychotherapy (pp. 337–361). New York: Basic.
Pruitt, W. A., & Spilka, B. (1964). Rorschach empathy-object relationship scale. Journal of Projective Techniques and Personality Assessment, 28, 331–336.
Rapaport, D. (1949). Diagnostic psychological testing. New York: International Universities Press.
*Rice, L. N. (1973). Client behavior as a function of therapist style and client resources. Journal of Counseling Psychology, 20, 306–311.
Rickers-Ovsiankina, M. A. (Ed.). (1960). Rorschach psychology. New York: Wiley.
Rioch, M. J. (1949). The use of the Rorschach test in the assessment of change in patients under psychotherapy. Psychiatry: Journal for the Study of Interpersonal Processes, 12, 427–434.
Roberts, B. W., & DelVecchio, W. F. (2000). The rank-order consistency of personality traits from childhood to old age: A quantitative review of longitudinal studies. Psychological Bulletin, 126, 3–25.
Romney, D. M. (1990). Thought disorder in the relatives of schizophrenics: A meta-analytic review of selected published studies. Journal of Nervous and Mental Disease, 178, 481–486.
Rorschach, H. (1942). Psychodiagnostics (9th ed., P. Lemkau, Trans.). Bern, Switzerland: Hans Huber Verlag. (Original work published 1921)
Rosenthal, R. (1991a). Meta-analytic procedures in social research (Rev. ed.). Baltimore: Sage.
Rosenthal, R. (1991b). Quality-weighting of studies in meta-analytic research. Psychotherapy Research, 1, 25–28.
Rosenthal, R., & Rubin, D. B. (1978). Interpersonal expectancy effects: The first 345 studies. The Behavioral and Brain Sciences, 3, 377–415.
Schachtel, E. G. (1966). Experiential foundations of Rorschach's test. New York: Basic.
Schafer, R. (1954). Psychoanalytic interpretation in Rorschach testing. New York: Grune & Stratton.
Schwager, E., & Spear, W. E. (1981). New perspectives on psychological tests as measures of change. Bulletin of the Menninger Clinic, 45, 527–541.
Seligman, M. E. P. (1995). The effectiveness of psychotherapy: The Consumer Reports study. American Psychologist, 50, 965–974.
Seligman, M. E. P. (1996). Science as an ally of practice. American Psychologist, 51, 1072–1079.
Shadish, W. R., Navarro, A. M., Matt, G. E., & Phillips, G. (2000). The effects of psychological therapies under clinically representative conditions: A meta-analysis. Psychological Bulletin, 126, 512–529.
Shavelson, R. J., & Webb, N. M. (1991). Generalizability theory: A primer (Vol. 1). Thousand Oaks, CA: Sage.
Smith, M. L., & Glass, G. V. (1977). Meta-analysis of psychotherapy outcome studies. American Psychologist, 32, 752–760.
Soskis, D. A., & Bowers, M. B., Jr. (1969). The schizophrenic experience: A follow-up study of attitude and post-hospital adjustment. Journal of Nervous and Mental Disease, 149, 443–449.
Spero, M. H. (1984). Some pre- and post-treatment characteristics of cult devotees. Perceptual and Motor Skills, 58, 749–750.
SPSS. (2001). Statistical Package for the Social Sciences (Version 11.0.0 for Windows) [Computer software]. Chicago: Author.
Stanton, A. H., Gunderson, J. G., Knapp, P. H., Frank, A. F., Vannicelli, M. L., Schnitzer, R., et al. (1984). Effects of psychotherapy in schizophrenia: I. Design and implementation of a controlled study. Schizophrenia Bulletin, 10, 520–563.
Steel, P. D., & Kammeyer-Mueller, J. D. (2002). Comparing meta-analytic moderator estimation techniques under realistic conditions. Journal of Applied Psychology, 87, 96–111.
Strauss, J. C., & Harder, D. W. (1981). The Case Record Rating Scale: A method for rating symptoms and social functions data from case records. Psychiatry Research, 4, 333–345.
Urist, J. (1977). The Rorschach test and the assessment of object relations. Journal of Personality Assessment, 41, 3–9.
Vidalis, A. A., Preston, T. D., & Baker, G. H. (1990). Is day hospital treatment effective, and can success be predicted? International Journal of Social Psychiatry, 36, 137–142.
Watkins, J. G. (1949). Evaluating success in psychotherapy. The American Psychologist, 4, 396.
Wechsler, D. (1944). The measurement of adult intelligence. Baltimore: Williams and Wilkins.
*Weiner, I. B., & Exner, J. E., Jr. (1991). Rorschach changes in long-term and short-term psychotherapy. Journal of Personality Assessment, 56, 453–465.
West, M. M. (1998). Meta-analysis of studies assessing the efficacy of projective techniques in discriminating child sexual abuse. Child Abuse and Neglect, 22, 1151–1166.
Westen, D., & Morrison, K. (2001). A multidimensional meta-analysis of treatments for depression, panic, and generalized anxiety disorder: An empirical examination of the status of empirically supported therapies. Journal of Consulting and Clinical Psychology, 69, 875–899.
Wise, E. (2004). Methods for analyzing psychotherapy outcomes: A review of clinical significance, reliable change, and recommendations for future directions. Journal of Personality Assessment, 82, 50–59.
Wode-Helgodt, B., Berg, G., Petterson, U., Rydelius, P. A., & Trollehed, H. (1988). Group therapy with schizophrenic patients in outpatient departments. Acta Psychiatrica Scandinavica, 78, 304–313.
Wood, J. M., Lilienfeld, S. O., Garb, H. N., & Nezworski, M. T. (2000). The Rorschach test in clinical diagnosis: A critical review, with a backward look at Garfield (1947). Journal of Clinical Psychology, 56, 395–430.
Wood, J. M., Nezworski, M. T., & Stejskal, W. J. (1996). The Comprehensive System for the Rorschach: A critical examination. Psychological Science, 7, 3–10.
Wood, J. M., Nezworski, M. T., & Stejskal, W. J. (1997). The reliability of the Comprehensive System for the Rorschach: A comment on Meyer (1997). Psychological Assessment, 9, 490–494.
Zamansky, H. S., & Goldman, A. E. (1960). A comparison of two methods of analyzing Rorschach data in assessing therapeutic change. Journal of Projective Techniques, 24, 75–82.
Zulliger, H. (1941). Behn-Rorschach Test. Bern, Switzerland: Hans Huber Verlag.
Studies Excluded From the Meta-Analysis Despite Appropriate Variables

Study Reason

Barry, Blyth, & Albrecht (1952) Sample included participants who were not in psychotherapy
Berman (1972) Experimental intervention using video
Cadman, Misbach, & Brown (1954) Sample included participants who were not in psychotherapy
Doherty (2001) The study was based on the same participants as in Glatt and Karon (1974), but the data was included in the Glatt
and Karon sample's third retest
Endicott & Endicott (1963) Sample included participants who were not in psychotherapy
Exner (1978) Reported test-retest correlations, which cannot be meaningfully converted to effect sizes reflecting changes
Fishman (1973) Did not analyze changes in Rorschach data from pretest to posttest
Gibson, Snyder, & Ray (1955) Reported factor analysis of data from multiple tests based on the same participants as in Peterson (1954), which
was included in this analysis
Harrower (1958) Reported only frequency data
Harrower (1965) Reported only frequency data
Kamil (1970) Alternative administration Rorschach cards
Kantrowitz, Katz, Paolitto, Sashin, & Solomon (1987a) Data was based on several assessment instruments (TAT, Rorschach, Draw-a-Person, Cole Animal, & WAIS); no specific Rorschach data was reported
Kantrowitz, Katz, Paolitto, Sashin, & Solomon (1987b) Same as previous
Kantrowitz, Katz, & Paolitto (1990) Same as previous
Kantrowitz, Paolitto, Sashin, Solomon, & Katz (1986) Same as previous
Karon & Vandenbos (1972) The study was based on the same participants as in Glatt & Karon (1974) and included participants who were not in
psychotherapy; data was not available from Karon
Lerner (1972) Alternative administration Rorschach cards
Lerner & Fiske (1973) Alternative administration Rorschach cards
Lindberg (1981) Criterion variables possibly also used as selection criteria: "The diagnosis of schizophrenia was to be confirmed by
the projective tests the Ro [Rorschach] and the HIT [Holtzman Inkblot Technique]" (p. 58); the author did not
specify which indexes were considered as a confirmation of schizophrenia, but neither did he specifically
exclude those that were under investigation
Piotrowski & Schreiber (1952) Reported only data from variables that were significant, and it was not possible to determine which other variables
were also investigated
Pruitt & Spilka (1964) Did not analyze changes in Rorschach data from pretest to posttest
Rioch (1949) Reported only frequency data
Schwager & Spear (1981) Excluded from the analyses one patient who was atypical compared to the others and another who did not show
indications of change
Spero (1984) Reported only data from variables that were significant, and it was not possible to determine which other variables
were also investigated
Vidalis, Preston, & Baker (1990) Alternative administration Rorschach cards
Watkins (1949) Reported only frequency data
Wode-Helgodt, Berg, Petterson, Rydelius, & Trollehed (1988) Control participants received medication
Zamansky & Goldman (1960) Sample included participants who were not in psychotherapy

Note. TAT = Thematic Apperception Test; WAIS = Wechsler Adult Intelligence Scale.

Cato Grønnerød
Department of Psychology
Pb 111
FIN-80101 Joensuu, Finland

Received October 2, 2002

Revised May 6, 2004