
Journal of Consulting and Clinical Psychology
1989, Vol. 57, No. 1, 138-147
Copyright 1989 by the American Psychological Association, Inc.
0022-006X/89/$00.75

Power to Detect Differences Between Alternative Treatments


in Comparative Psychotherapy Outcome Research

Alan E. Kazdin and Debra Bass


Western Psychiatric Institute and Clinic
University of Pittsburgh School of Medicine

Comparative studies of psychotherapy often find few or no differences in the outcomes that alternative treatments produce. Although these findings may well reflect the comparability of alternative treatments, as a rule, studies are often not sufficiently powerful to detect the sorts of effect sizes likely to be found when two or more treatments are contrasted. The present survey evaluated the power of psychotherapy outcome studies to detect differences for contrasts of two or more treatments and treatment versus no-treatment control conditions. Outcome studies (N = 85) were drawn from nine journals over a 3-year period (1984-1986). Data in each article were examined first to provide estimates of effect sizes and then to evaluate statistical power at posttreatment and follow-up. The findings indicate that the power of studies to detect differences between treatment and no treatment is generally quite adequate given the relatively large effect sizes usually evident for this comparison. On the other hand, the power is relatively weak to detect the small-to-medium effect sizes likely to be evident when alternative treatments are contrasted with each other. Thus, the equivalent outcomes that treatments produce (i.e., "no difference") may be due to the relatively weak power of the tests. The implications for interpreting current outcome studies and for designing future comparative studies are highlighted.

In psychotherapy research, comparative outcome studies address the question of which of two or more techniques is the most effective for a particular clinical problem and patient population. Often the constituent treatments reflect conflicting conceptual views about the nature of dysfunction, the focus of treatment, and the techniques required to produce change. Consequently, comparative studies generate tremendous interest (see Heimberg & Becker, 1984; Kazdin, 1986; Lambert, Shapiro, & Bergin, 1986; Luborsky, Singer, & Luborsky, 1975; Rachman & Wilson, 1980; Stiles, Shapiro, & Elliott, 1986).

Outcome evidence on the relative effectiveness of alternative treatments has been evaluated extensively. Conclusions have been drawn from individual comparative outcome investigations (e.g., Sloane, Staples, Cristol, Yorkston, & Whipple, 1975), box-score (e.g., Luborsky et al., 1975) and narrative reviews (e.g., Kazdin & Wilson, 1978; Lambert et al., 1986; Rachman & Wilson, 1980), and meta-analyses (see Brown, 1987). Although individual studies and large-scale reviews occasionally argue for the superiority of one technique over another, evaluations of the literature usually suggest that treatments tend not to differ or at least not to differ very much in the outcomes they produce.

The absence of clear outcome differences between alternative treatments has served as an important point of departure regarding the nature of therapy and the processes that account for change (see Stiles et al., 1986). As a prominent example, Frank (1982) suggested that therapeutic change results from several features that are common among different techniques. Thus, no outcome differences between treatments might be expected given their common ingredients. This view has been bolstered in part by the recognition that different techniques often appear more diverse in theory than they do in clinical practice (e.g., Klein, Dittmann, Parloff, & Gill, 1969; Sloane et al., 1975). Common ingredients, particularly the special relationship between client and therapist and the provision of support, empathy, and concern, are pervasive among alternative techniques (Waterhouse & Strupp, 1984).

Current estimates suggest that well over 400 psychotherapy techniques are in use for adults (Karasu, personal communication, March 1, 1985) and that over 230 techniques are in use for children (Kazdin, 1988). If we assume that many of these techniques are effective in producing therapeutic change, it is difficult to conceive that they vary in effectiveness or operate through different therapeutic processes. Yet, this assumption is not tantamount to stating that the results from viable treatment contenders for a given clinical problem will be similar. Whether treatment outcomes differ can only be evaluated empirically.

The finding that treatments do not differ in many, if not most, tests may mean that treatments are approximately equal in their effects. Yet, it is important to know if the studies are, as a rule, designed to detect outcome differences. When two or more active interventions are expected to produce change, the investigation must be sufficiently sensitive to detect what could prove to be relatively small differences.

Completion of this article was facilitated by Research Scientist Development Award MH00353 and by Grant MH35408 from the National Institute of Mental Health. We are extremely grateful to Jacob Cohen, Larry V. Hedges, and Kenneth I. Howard. Special thanks are also extended to Helena C. Kraemer, who provided comments and guidance on prior drafts.

Correspondence concerning this article should be addressed to Alan E. Kazdin, Western Psychiatric Institute and Clinic, 3811 O'Hara Street, Pittsburgh, Pennsylvania 15213.


A critical research issue is the extent to which an investigation can detect differences between groups when differences exist within the population. This notion is referred to as statistical power and reflects the probability that the test will lead to rejection of the null hypothesis.1 Power is a function of the criterion for statistical significance (alpha), sample size (N), and the difference that exists between groups (effect size).

Although power is an issue in virtually all research, it raises special issues in studies where two or more conditions (groups) are not significantly different (Fagley, 1985; Freiman, Chalmers, Smith, & Kuebler, 1978; Kazdin, 1980). The absence of significant differences can contribute to knowledge under a variety of circumstances. However, an essential precondition is that the investigation be sufficiently powerful to detect meaningful differences. In the vast majority of psychotherapy outcome studies that have contrasted two or more treatments, the power may have been relatively weak due to small sample sizes.

There are many reasons to suspect that outcome studies as a general rule provide weak tests. Over 25 years ago, Cohen (1962) examined clinical research published in the Journal of Abnormal and Social Psychology for a 1-year period (1960). Over 2,000 statistical tests were identified (from 70 articles) that were considered to reflect direct tests of the hypotheses. To evaluate power, Cohen examined different effect sizes, that is, the magnitude of the differences between alternative groups based on standard deviation units. Cohen distinguished three levels of effect sizes (small = .25, medium = .50, and large = 1.00) and evaluated the power of published studies to detect differences at these levels, assuming alpha = .05 and using nondirectional (two-tailed) tests.

The results indicated that power was generally weak for detecting differences equivalent to small and medium effect sizes. For example, the mean power of studies to detect differences reflecting small and medium effect sizes was .18 and .48, respectively. This means that, on the average, studies had slightly less than a 1 in 5 chance to detect small effect sizes and less than a 1 in 2 chance to detect medium effect sizes. These levels are considerably below the recommended level of power, .80 (4 in 5 chance).2 Cohen concluded that the power of the studies was weak and that sample sizes in future studies should routinely be increased (see also Cohen, 1977).

A more recent analysis examined if the situation has improved since Cohen's earlier portrayal. Rossi, Rossi, and Cottrill (1984) sampled research from the Journal of Personality and Social Psychology and the Journal of Abnormal Psychology, journals that are descendants of the publication Cohen analyzed. The data were obtained from 142 articles in the 1982 volume of each journal. Rossi et al. found that 3%, 26%, and 69% of the studies in 1982 had power above .80 for detecting small, medium, and large effects, respectively. This compared with parallel data from Cohen (1962) of 0%, 9%, and 79%, respectively. Although slight increases in power were evident for detecting small and medium effects, the vast majority of studies were quite weak with regard to detecting such effects.

The applicability of these findings to psychotherapy outcome research might be challenged. The surveys of Cohen (1962) and Rossi et al. (1984) covered only a 1-year period, sampled a small number of journals, and encompassed diverse areas of research within clinical and social psychology. Perhaps, more to the point, the surveys did not focus on psychotherapy outcome exclusively or primarily. Yet, the characteristics of psychotherapy outcome studies may make the conclusions even more applicable.

To begin with, small sample sizes may be dictated in part by the subject matter due to difficulties in recruiting, treating, and retaining large samples of homogeneous subjects (Kraemer, 1981). In contemporary outcome research, studies typically include 20 or fewer subjects per group (e.g., Cross, Sheehan, & Khan, 1982; DiLoreto, 1971; Liberman & Eckman, 1981; Rush, Beck, Kovacs, & Hollon, 1977). Indeed, it is not difficult to find studies comparing alternative treatment and control conditions in which the sample sizes are 10 or fewer cases per condition (e.g., Forman, 1980; Yu, Harris, Solovitz, & Franklin, 1986). Inadequate levels of power could be a rival explanation for the absence of treatment differences.

A second problem of psychotherapy research is the loss of subjects over time. Apart from the possible selection biases that attrition can introduce, sample sizes (and power) may be markedly reduced by the end of treatment and by the follow-up assessment (Garfield, 1980). The toll of attrition on the design is often high. In some treatment programs, between 40% and 50% of subjects may drop out of treatment (e.g., Fleischman, 1981; Patterson, 1974; Viale-Val, Rosenthal, Curtiss, & Marohn, 1984). In one large-scale comparative outcome study, almost 90% of the cases (396 of 450) that completed treatment were lost at follow-up 1 year later (Feldman, Caplinger, & Wodarski, 1983). Thus, in psychotherapy outcome studies, sample size and power may weaken considerably over time. If the conclusion of no differences between treatments is not evident at posttreatment, it may well be reached at follow-up.

One may be able to discern from psychotherapy research the approximate effect sizes for classes of comparisons such as treatment versus no treatment and treatment versus treatment. Indeed, such effect sizes have often been examined in the context of meta-analysis. Effect sizes obtained after experiments are completed provide an estimate of population effect sizes (Cohen, 1973). It is important to examine effect sizes for comparisons of alternative therapy techniques, not only to determine the magnitude of the differences with which we may be working, but also to interpret studies in which critical tests of treatment are provided and few or no differences emerge.

From the published research, one can estimate the power of studies to detect differences of the magnitudes that emerge for comparisons of alternative treatment conditions.

1 Power (1 - beta) is the probability of rejecting the null hypothesis when it is false. Stated differently, power is the likelihood of finding differences between the treatments when, in fact, the treatments are truly different in their outcomes.

2 The level of power that is adequate is not easily specified or justified mathematically. As with the level of confidence (alpha), the decision is based on convention about the margin of protection one should have against falsely accepting the null hypothesis (beta). Cohen (1965) recommended adoption of the convention that beta = .20 and hence power (1 - beta) = .80 when alpha = .05. This translates to the 4 in 5 likelihood of detecting an effect when a difference exists in the population. Although power ≥ .80 is used as a criterion for discussion in the present article, a higher level (.90, .95) is often encouraged as the acceptable criterion (e.g., Freiman, Chalmers, Smith, & Kuebler, 1978; Friedman, Furberg, & DeMets, 1985).

There is, of course, the bias that derives from consulting only published studies. Such studies may be more prone to yield significant effects, whereas those that did not yield such effects may be unpublished and relegated to the investigator's file drawer (Rosenthal, 1979). However, comparative outcome studies with few or no differences are often published (e.g., Sloane et al., 1975). The reasons include the keen interest in the comparisons in their own right, the likely differences between the treatments and the no-treatment or waiting-list control condition, and the evaluation of therapeutic processes common to alternative techniques.

In the present article, we examine the power of treatment outcome studies to detect differences when alternative treatments are compared. The question is important because the general absence of differences between two or more treatments tested in a given study may be viewed differently, depending on whether the experimental tests have adequate power. It might well be that the outcome evidence argues generally for the absence of major treatment differences. To argue this position requires some assurance that the tests comparing different treatments are sufficiently powerful. The question can be addressed by examining the effect sizes of current outcome studies to detect differences between alternative treatment conditions and the sample sizes that are used.

In meta-analyses of psychotherapy, the usual goal is to compare treatment against a control group and to draw conclusions about the magnitude of effect sizes for specific techniques (e.g., behavior therapy vs. cognitive therapy). A related goal in the present article was to examine the power of psychotherapy studies to detect differences when two or more treatments are included. However, the specific techniques and the direction of their differences in relation to each other were not of interest. The objective was to examine the power to detect differences with different types or classes of comparisons.

The power of psychotherapy studies was examined by estimating effect sizes from findings of psychotherapy outcome studies for the comparisons of interest. Effect sizes and power were estimated for comparisons at posttreatment and follow-up because of the changes in sample sizes (attrition) over the course of therapy studies and the different and occasionally diametrically opposed outcomes that alternative treatments can produce at different points in time (see Kazdin, 1988). Treatment studies were surveyed from nine journals over a 3-year period. Statistical tests were examined from the psychotherapy studies to estimate their power to detect differences where they exist in comparing alternative treatment conditions.

Method

Studies of Interest

Psychotherapy outcome studies were selected for the present evaluation. Psychotherapy was defined to include interventions designed to decrease distress, psychological symptoms, and maladaptive behavior or designed to improve adaptive and prosocial functioning through the use of interpersonal interaction, counseling, or activities following a specific treatment plan (see Garfield, 1980; Walrond-Skinner, 1986). Excluded from the definition were interventions using medication as a form of treatment or interventions directed singularly at educational goals (e.g., reading improvement). Outcome investigations referred to studies designed to measure some facet of psychological adjustment or functioning after treatment was completed (posttreatment). At least two groups or conditions were required for the study to be included. The groups could include any combination of treatment and control conditions. Although primary interest was in studies comparing two or more treatments, all psychotherapy outcome studies were included if there were at least two groups. This inclusion criterion was adopted to permit evaluation of power for the different comparisons (one treatment vs. another treatment vs. no-treatment).

To provide a sample of psychotherapy outcome research, nine journals were studied for a 3-year period (1984-1986). Four journals were selected because they were the most frequent contributors to meta-analyses of psychotherapy outcome research (Shapiro & Shapiro, 1982; Smith, Glass, & Miller, 1980). The journals included the Journal of Consulting and Clinical Psychology, the Journal of Counseling Psychology, Behavior Therapy, and Behaviour Research and Therapy. In addition, five other journals were selected because they reflect disciplines (psychiatry) in which treatment outcome research is published, they are often viewed as publication outlets for clinical trials, or their publication domain explicitly delineates psychotherapy research. Thus, these latter journals were selected largely on the basis of face validity, that is, because they include therapy research and appear by virtue of publication domain to be a primary outlet. For this group, five journals were selected: Archives of General Psychiatry, American Journal of Psychiatry, Psychotherapy: Theory, Research and Practice, British Journal of Psychiatry, and British Journal of Clinical Psychology. From the different journals for a 3-year period, 120 psychotherapy outcome studies were identified. Of these, 85 (70.8%) provided sufficient statistical information to compute effect sizes.3

Comparisons of Interest

To evaluate the statistical power of these psychotherapy studies, three types of comparisons were made: (a) comparison of each treatment with the other treatments included in the study, (b) comparison of treatment with a no-treatment control condition, and (c) comparison of treatment with an active, nonspecific (attention placebo) control condition. Of primary interest was the first comparison, which addresses the question of the effect size and power when alternative treatments are compared. The comparisons involving treatment and alternative control conditions provide important baseline data regarding the power for detecting different types of effects in outcome research.

Treatments consisted of alternative psychotherapy techniques given the definition noted previously. No-treatment control referred to any group from which treatment was withheld during the pre- to posttreatment interval for the treatment group. Active control referred to groups in which an attention placebo or discussion control procedure was used to control for such factors as meeting with a therapist and attending sessions.4

3 A list of the studies for the present analyses is available on request.

4 In meta-analyses of psychotherapy, effect sizes obtained by comparing treatment versus alternative types of control groups are occasionally no different (Casey & Berman, 1985). For the present purposes, the distinction between no-treatment (including waiting-list) control and attention-placebo control was nevertheless retained. The rationale was the likelihood that effect size and power would be lower for detecting differences between treatment and an attention-placebo group than between treatment and a no-treatment group. The differences could have important implications for designing studies if sample sizes need to be planned in accord with different expected effect sizes. Also, evaluations of alternative treatments have occasionally noted that a strong placebo is often no more or less effective than a technique considered to be more well-grounded in theory, research, or practice (Kazdin & Wilcoxon, 1976; Prioleau, Murdock, & Brody, 1983). Thus, no-treatment and active control conditions reflect different substantive issues.

Measure and Calculation of Effect Size

Within each study, an effect size was calculated between each pair of groups on each outcome measure. Effect size (ES) was defined as (m1 - m2)/s, where m1 and m2 refer to two group means (treatment or control) and s is the pooled within-group standard deviation.5 Each ES was classified as coming from a comparison of two treatments, of treatment versus no treatment, or of treatment versus an active control. Also, each ES was classified as evaluated at posttreatment or at follow-up. If multiple follow-up assessments were reported, only the last (longest duration) assessment was considered. Thus, six types of ESs might have been derived from a single study. Within a study, multiple values of each effect size might have been obtained if there were several outcome measures. The ESs of each type within a study were averaged, so that each study contributed no more than one mean ES per comparison of interest.6

Of central interest was the case in which two or more treatments (T1, T2, . . . , Tk) were compared. In these cases, T1 and T2 were each separate treatments (e.g., cognitive therapy, behavior therapy). The specific conditions that constituted T1 and T2 within a given study or across studies were not of interest. There was no interest, for the present purposes, in drawing conclusions about the relative merits of alternative psychotherapy techniques. Across all studies, the identity of the treatment conditions designated as T1 or T2 varied. The goal was to evaluate the power of comparative studies to detect differences between alternative treatments. Consequently, in computing ESs for treatment versus treatment comparisons, the absolute value of the ES (|m1 - m2|/s) was calculated. Thus, the mean ES for a study with two treatments would be the mean of the absolute value differences between each of these treatments for each of the outcome measures. Similarly, for comparative purposes, the absolute magnitude of the differences was used to compute ESs between treatment and control conditions.7 Effect sizes were calculated from the means and standard deviations reported in the studies. When this information was unavailable, ESs were calculated from other reported statistics. Techniques for estimating ESs from other data sources are described elsewhere (Smith et al., 1980).

Results

Descriptive Characteristics

From the nine journals, 85 psychotherapy outcome studies were used to compute effect sizes and to estimate power. Two of the journals (American Journal of Psychiatry, Psychotherapy: Theory, Research and Practice) yielded no studies for the present analysis. Effect sizes for the comparisons of interest were computed for each of the measures within a study. A total of 2,501 ESs were computed: 1,752 at posttreatment and 749 at follow-up. However, each study could yield only one ES for each of the comparisons of interest (e.g., treatment vs. no treatment) at posttreatment and follow-up. Thus, the number of ESs varied depending on the groups and assessment periods included in the study.

Table 1 presents the number of studies available from each journal, the sample sizes for the studies, the group composition, the types and time points of comparisons, and the study length. The journals are presented separately in this initial table for descriptive purposes. Individual journals were not of interest and, hence, are combined in subsequent analyses. The table details information relevant to the evaluation of power but also portrays several basic characteristics of psychotherapy outcome studies.

The mean sample size (N) for all studies was 53 subjects (SD = 39.2); the mean number of groups was 3 (SD = 1.2). All 85 studies included at least one treatment group, given that this was a selection criterion for inclusion; 75 (88.2%) of the studies included two or more treatment conditions; 40 (47.1%) of the studies included a no-treatment or waiting-list control condition. Only 8 (9.4%) studies included an active control (e.g., attention-placebo) condition. In terms of outcome evaluation, all studies included posttreatment and 67.1% of the studies included a follow-up assessment.8 The mean duration of the longest follow-up in the studies was 8.0 months (SD = 7.5).

Sample Sizes and Estimates of Effect Size

Table 2 summarizes the sample sizes observed from the studies surveyed. As noted in this table and in the others that follow, medians, 25th, and 75th percentiles are presented. The median (50th percentile) is presented as a measure of central tendency uninfluenced by extreme data points (outliers) that might affect both the mean and the standard deviation. The 25th and 75th percentiles are presented as the interval that captures 50% of the studies about that median. The discussion focuses on means to make the present comments parallel with other evaluations in which effect sizes of psychotherapy research have been examined.

5 In computing effect sizes, not all researchers have used the pooled standard deviation as the estimate of variance. The pooled estimate was used in the present analyses because it is readily estimated from studies in which t and F tests are reported but standard deviations for individual groups are omitted. In addition, the pooled estimate may be a less-biased estimate than the standard deviation of the control group (Hedges & Olkin, 1985).

6 Within a given study, more than one outcome measure was likely to be included. In some previous meta-analyses, separate outcome measures have been used as separate ESs for the data analyses. The issues raised with this procedure include the undue weight given to studies with a large number of outcome measures and the nonindependence of observations (ESs) for any data analyses. An alternative strategy is to calculate a mean ES for a given comparison (e.g., treatment vs. no-treatment) by averaging the ESs from the individual outcome measures. Thus, for a given study with a treatment and no-treatment control condition, one ES would be generated for that comparison based on the mean of ESs for all of the outcome measures. Alternative strategies for handling multiple outcomes within a study have different limitations. A major limitation of the present method is that it assumes that all outcome measures in a study should be weighted equally (i.e., are equally important). There remains no consensus on how to prioritize outcome measures to resolve this concern (see Brown, 1987).

7 In meta-analyses, comparisons of treatment versus control groups do not use absolute effect size. In such analyses, the interest is in identifying positive or negative effect sizes, that is, in whether the treatment group was better or worse than the control group. Effect size data based on directional differences of treatment versus control conditions for the present survey are available from the authors.

8 This figure is an underestimate of follow-up evaluations of psychotherapy studies because investigators occasionally publish follow-up data in publications separate from the original article in which posttreatment data were presented.
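To make the effect size computations concrete, the following sketch (an editorial illustration; the means, standard deviations, and ns are invented) derives the pooled within-group standard deviation, the absolute ES for each outcome measure, and the single mean ES a study contributes for one comparison. The conversion from a reported t statistic is the standard one alluded to in Footnote 5.

    import math

    def pooled_sd(sd1, n1, sd2, n2):
        # Pooled within-group standard deviation for two groups.
        return math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2)
                         / (n1 + n2 - 2))

    def abs_es(m1, sd1, n1, m2, sd2, n2):
        # ES = |m1 - m2| / s, with s the pooled within-group SD.
        return abs(m1 - m2) / pooled_sd(sd1, n1, sd2, n2)

    def es_from_t(t, n1, n2):
        # Standard recovery of |ES| when only a two-sample t is reported.
        return abs(t) * math.sqrt(1.0 / n1 + 1.0 / n2)

    # Hypothetical study: two treatments (n = 12 each), three outcome
    # measures; the study contributes one mean |ES| for this comparison.
    measures = [(24.0, 6.0, 12, 21.5, 5.5, 12),
                (10.2, 3.1, 12, 9.0, 2.8, 12),
                (55.0, 11.0, 12, 50.0, 12.0, 12)]
    study_es = sum(abs_es(*m) for m in measures) / len(measures)
    print(round(study_es, 2))  # ~0.42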

Table 1
Psychotherapy Outcome Studies: Sources and Study Characteristics

Variable                    All journals   JCCP     JCP      BT      BRT      AGP      BJP     BJCP

No. of studies                   85          38       10       16      16       2        2       1
Sample size
  M                            53.60       55.63    81.10    38.44   45.31    71.00    42.50   64.00
  SD                           39.21       31.97    79.45    18.00   29.84    42.43    31.82    0
  Range                       12-298      12-146   31-298    14-66  17-131   41-101    20-65    –
No. of groups
  M                             3.14        3.05     3.70     2.56    3.31     4.00     2.00    8.00
  SD                            1.20         .93     1.06      .63    1.35     2.83     0       0
  Range                          2-8         2-6      2-6      2-4     2-6      2-6      –       –
No. of studies with
  Two or more treatments        75a          34        9       13      15       2        1       1
  No-treatment control          40b          19        7        9       4       0        1       0
  Active control                 8c           5        0        0       3       0        0       0
  Follow-up assessment          57d          27        4       11      12       2        1       0
Follow-up length (months)
  M                             8.03        7.14     5.00     8.73   11.75     3.50     1.00    0
  SD                            7.55        5.11     4.97     6.23   12.48     3.54     0       0
  Range                         1-46        1-18     1-12     1-24    2-46      1-6     0       0

Note. JCCP = Journal of Consulting and Clinical Psychology; JCP = Journal of Counseling Psychology; BT = Behavior Therapy; BRT = Behaviour Research and Therapy; AGP = Archives of General Psychiatry; BJP = British Journal of Psychiatry; BJCP = British Journal of Clinical Psychology. Dash denotes that the high and low numbers for the range were the same or that only one entry was available.
a 88.24%.  b 47.06%.  c 9.41%.  d 67.06%.

As is evident in the table, the sample sizes were quite similar across treatment and control groups. Consideration of the data pooled across groups revealed that the mean group sizes at posttreatment and follow-up were 16.1 and 15.3, respectively. Examination of the 75th percentile conveyed that three fourths of the studies included fewer than 20 subjects per group.

Table 2
Sample Sizes (n) Obtained From Surveyed Studies

Group                        Posttreatment     Follow-up

Treatment
  M                          16.01 (11.12)     14.56 (7.83)
  Mdn                        12.00             13.00
  25th-75th percentiles      10.00-19.00       10.00-17.00
  Range                      6-107             3-74
No-treatment control
  M                          16.49 (16.52)     19.91 (21.70)
  Mdn                        12.00             13.00
  25th-75th percentiles      10.00-18.00       9.00-20.00
  Range                      5-114             7-84
Active control
  M                          16.89 (9.74)      20.17 (10.28)
  Mdn                        12.00             16.50
  25th-75th percentiles      12.00-24.50       11.50-32.50
  Range                      5-34              10-34
All
  M                          16.12 (12.09)     15.25 (9.81)
  Mdn                        12.00             13.50
  25th-75th percentiles      10.00-19.00       10.00-17.00
  Range                      5-114             3-84

Note. Standard deviations are expressed in parentheses.

Table 3 presents estimated effect sizes for the comparisons of interest. As a potential guideline for the interpretation of the data, it is useful to bear in mind Cohen's (1977) classification of small, medium, and large ESs at .20, .50, and .80.9 When two or more treatments were compared with each other, the mean absolute ESs at posttreatment and follow-up were .50 and .47, respectively. These fall within the range of medium ESs. The mean absolute ES across all outcome studies comparing treatment versus no treatment was .85 at posttreatment and .89 at follow-up. These ESs fall within the range of large ESs. Few studies (8 at posttreatment, 5 at follow-up) were available that compared treatment versus active control conditions. The mean ES for this comparison was .38 at posttreatment and .32 at follow-up, both between small and medium ESs.10

The range of ESs obtained for the comparisons of interest is illustrated in Figure 1. This figure conveys the median ES and the 25th and 75th percentiles, a range that includes 50% of the studies for the comparison of interest. The data convey clearly that the comparisons of alternative treatments span the small-to-medium range. In contrast, the ESs for treatment versus no treatment are in the medium-to-large range.

9 The magnitudes for small, medium, and large effect sizes have been revised from those specified originally by Cohen (1962), which were .25, .50, and 1.00. The effect sizes of .20, .50, and .80 reflect the magnitudes in more recent references (Cohen, 1977).

10 Because of the small number of studies, the data for this comparison will not be discussed further. However, estimates of ES, power, and related information are presented in the tables.

Estimated Power

If ESs fall within the ranges noted previously, one can estimate the extent to which studies, as currently designed, are likely to be powerful enough to detect significant differences between alternative types of comparisons.

Table 3
Estimated Effect Sizes for Comparisons of Interest

Comparison                     Posttreatment     Follow-up

Treatment vs. treatment
  M                            .50 (.31)         .47 (.32)
  Mdn                          .47               .40
  25th-75th percentiles        .26-.66           .27-.64
  Range                        .04-1.52          .02-1.76
  No. of articles              75                42
Treatment vs. no treatment
  M                            .85 (.47)         .89 (.46)
  Mdn                          .78               .87
  25th-75th percentiles        .54-1.05          .52-1.15
  Range                        .30-2.67          .26-1.85
  No. of articles              40                10
Treatment vs. active control
  M                            .38 (.17)         .32 (.24)
  Mdn                          .44               .27
  25th-75th percentiles        .19-.53           .13-.53
  Range                        .15-.59           .02-.66
  No. of articles              8                 5

Note. Standard deviations are expressed in parentheses.

Table 4
Estimated Power to Detect Differences Between Alternative Treatments and Treatment Versus Control Conditions

Comparison                     Posttreatment     Follow-up

Treatment vs. treatment
  Mdn                          .74               .63
  25th-75th percentiles        .27-.97           .38-.87
  Range                        .05-.995          .03-.995
  No. of articles              75                42
Treatment vs. no treatment
  Mdn                          .995              .995
  25th-75th percentiles        .89-.995          .92-.995
  Range                        .37-.995          .73-.995
  No. of articles              40                10
Treatment vs. active control
  Mdn                          .51               .53
  25th-75th percentiles        .25-.79           .19-.84
  Range                        .14-.87           .03-.97
  No. of articles              8                 5


Based on sample sizes and estimated ESs, power can be estimated from the tables provided by Cohen (1977). For the purpose of identifying power, alpha was set at .05 for two-tailed tests of significance. Also, the mean group size (n) was used in cases in which groups were unequal in size or individual group size could not be determined (Cohen, 1977). The use of an arithmetic mean for the groups facilitates the use of power tables. Assuming equal group sizes also maximizes the power for the sample size of a study and, consequently, provides an optimal level of power for the study given the fixed sample size. Power functions are nonlinear. Because the arithmetic average (mean) does not weight the different levels appropriately, median levels of power are discussed. The median estimated levels of power for the different comparisons of interest, accompanied by 25th and 75th percentiles, are presented in Table 4.

The primary question concerned the power to detect differences when two or more alternative treatments are compared. As noted in Table 4, the power to detect differences for this type of comparison yielded a median of .74 at posttreatment and of .63 at follow-up. Thus, the median chance of a study to detect a difference at posttreatment and at follow-up is about 7 in 10 and 3 in 5, respectively. For individual studies comparing two or more treatments, 45.3% (34 of 75) and 28.6% (12 of 42), respectively, met the criterion of power ≥ .80 needed to detect differences between treatment groups at posttreatment and follow-up. The majority (54.7% and 71.4%) of studies that compared alternative treatments did not meet the recommended level of power.

The table indicates relatively strong power for the test of treatment versus no treatment. Median power was estimated at .995 at posttreatment and follow-up for this comparison. Using .80 as a criterion for adequate power, outcome studies generally are sufficiently powerful to detect differences between treatment versus no treatment. As for the individual outcome studies, 82.5% (33 of 40) and 90.0% (9 of 10) met the criterion of power ≥ .80 for the comparison of treatment versus no treatment at posttreatment and follow-up, respectively.
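The estimation procedure just described can be sketched in a few lines; this is an editorial reconstruction of the mechanics that substitutes a normal approximation for Cohen's (1977) tables, with invented group sizes and an invented ES for illustration.

    import math

    def normal_cdf(x):
        # Cumulative distribution function of the standard normal.
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    def estimated_power(es, group_sizes, z_crit=1.96):
        # alpha = .05, two-tailed (z_crit = 1.96). Unequal groups are
        # handled, as in the survey, via the arithmetic mean group size.
        n_bar = sum(group_sizes) / len(group_sizes)
        nc = es * math.sqrt(n_bar / 2.0)
        return normal_cdf(nc - z_crit) + normal_cdf(-nc - z_crit)

    # Hypothetical study: groups of 14 and 10 subjects, observed |ES| = .90.
    print(round(estimated_power(0.90, [14, 10]), 2))  # ~0.60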

[Figure 1. Effect sizes (ESs) obtained from comparisons of treatment (T) with another treatment or with no treatment (NT) separately for posttreatment (P) and follow-up (FU). The column for each comparison reflects the range of ESs from the 25th to the 75th percentiles; the horizontal line within each column reflects the median ES.]

Sample Sizes for Psychotherapy Research

Based on data obtained in the present survey, more specific comments can be made about requisite sample sizes when alternative forms of psychotherapy are compared. Assume for a moment that an investigator wishes to compare two treatments and to retain generally acceptable levels of alpha, beta, and power (1 - beta); in this case, alpha = .05 for a two-tailed test and beta = .20 so that power = .80. For the purpose of illustration, sample size and ESs will be considered for the different comparisons in relation to the posttreatment evaluation of outcome.

In the present survey, the median ES for comparing two treatments was .47 at posttreatment, with a range from .26 to .66 representing the 25th and 75th percentiles. Figure 2 conveys the corresponding sample sizes needed for these ESs. A sample size of 71 per group would be needed to retain power at the desired level for the median ES, with a corresponding range from 232 to 36 subjects for the 25th and 75th percentiles.11 Examination of the figure shows the number of subjects per group actually used. The median number of subjects for studies comparing alternative treatments was 12.0, with a range from 10 to 19 for the 25th and 75th percentiles. These actual sample sizes are far below the number needed to detect treatment differences.

If the investigator wishes to compare treatment versus no treatment, the median ES is .78 at posttreatment. This ES is bounded by ESs of .54 and 1.05, which reflect the 25th and 75th percentiles and represent 50% of the studies surveyed. Figure 2 plots the corresponding sample sizes for these ESs. A sample size of 27 per group would be needed for the desired power with the median ES for this comparison; the corresponding range of 54 to 14 subjects per group would be required for the 25th and 75th percentiles. These numbers are larger but closer to the actual sample sizes used. Figure 2 shows the actual sample sizes, with a median of 12 at posttreatment bounded by a range from 10 to 18 for the respective percentiles.

Discussion

Psychotherapy outcome research was examined in nine journals over a 3-year period (1984-1986) to assess the extent to which studies are sufficiently powerful to detect differences between alternative treatment and control conditions. Effect sizes were estimated at posttreatment and follow-up, and these were used along with sample size data to evaluate power. The results can be discussed with reference to Cohen's (1977) criteria for small (.20), medium (.50), and large (.80) effect sizes. The present results indicate that comparisons of alternative treatments yield effect sizes close to the medium level (mean ESs = .50 at posttreatment and .47 at follow-up) and that comparisons of treatment versus no treatment tend to yield relatively large effect sizes (mean ESs = .85 at posttreatment and .89 at follow-up).

The question of interest is: To what extent are psychotherapy outcome studies sufficiently powerful to detect differences given these effect sizes? Using estimated effect sizes and sample sizes from the studies themselves and adopting an alpha of .05 (for two-tailed tests), we obtained power estimates. In evaluating the findings, we considered power ≥ .80 a criterion of adequate sensitivity of a test.

Power for comparisons of alternative treatments was below the recommended level (medians = .74 for posttreatment and .63 for follow-up). Indeed, for studies comparing two or more active treatments, fewer than half at posttreatment and fewer than one third at follow-up (45.3% and 28.6% of the studies) had power at or above the recommended level. The power to detect differences for comparisons of treatment versus no treatment was quite adequate (medians = .995 for both posttreatment and follow-up). Also, the majority of studies (82.5% and 90.0%) comparing treatment with a no-treatment control met or surpassed the power of .80 at posttreatment and follow-up.

A number of implications can be drawn from the present survey. First, power in psychotherapy outcome research is generally low for detecting small and medium effects. The need for improved power remains, and several recommendations made over 20 years ago (Cohen, 1965) continue to be timely. The concern with weak power in relation to psychotherapy research echoes points voiced in relation to research in clinical, social, educational, and applied psychology more generally (e.g., Brewer, 1972; Chase & Chase, 1976; Cohen, 1962; Rossi et al., 1984). Psychological research, of course, is not alone on this score. The absence of differences in major clinical trials of alternative interventions in medicine and public health may, in many instances, be attributed to insufficient power (see Freiman et al., 1978).

A second and more focused implication of the present results pertains to the special issues regarding comparative psychotherapy outcome research (Heimberg & Becker, 1984; Kazdin, 1986). In evaluations of alternative psychotherapy techniques, both in individual studies and in literature reviews, a frequent interpretation is that there are few or no outcome differences between alternative techniques. Perhaps most treatment techniques are not very different in the effects they produce, and the common processes of therapy advanced to explain the null-hypothesis findings are well-based. However, in light of the present findings, it is appropriate and parsimonious to raise another prospect. Possibly, studies that compare alternative treatments are not sufficiently powerful to detect differences between treatments unless the effect sizes are large.

Given the complexities of clinical problems and psychotherapy and the limitations and variability of outcome assessment, large effects, even if they are evident in the population, might be difficult to obtain in a given investigation. The present survey suggests that effect sizes for comparisons of alternative treatments are likely to be in the small-to-medium range. Given the usual sample sizes that are used, the majority of studies may not be sufficiently powerful to detect such differences.

There are several limitations of the present survey. To begin with, the methods of estimating and evaluating effect sizes in psychotherapy research are not entirely free from controversy.

11 The requisite sample (N) or group (n) sizes for a given alpha, power, and effect size of various increments can be obtained from various published tables (see Cohen, 1977; Hinkle & Oliver, 1983; Kraemer & Thiemann, 1987). For the present purposes, the requisite group sizes, when we assumed an equal number per group, were obtained by direct calculation (see Lachin, 1981; Snedecor & Cochran, 1980). For a two-tailed test of means from two independent samples of equal group sizes,

n = 2(z[1 - alpha/2] + z[1 - beta])^2 / ES^2.

For a two-tailed test with alpha = .05 and beta = .20, this translates to

n = (1.96 + .842)^2 (2) / ES^2.
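Footnote 11's direct calculation is easily reproduced; the sketch below is an editorial illustration, and simple rounding may differ by a subject or so from the published figures.

    def n_per_group(es, z_alpha=1.96, z_beta=0.842):
        # n = 2(z[1 - alpha/2] + z[1 - beta])^2 / ES^2, for alpha = .05
        # (two-tailed) and power = .80 (beta = .20).
        return 2 * (z_alpha + z_beta) ** 2 / es ** 2

    # The median and 25th/75th-percentile ESs for treatment vs. treatment
    # at posttreatment reproduce the group sizes cited in the text:
    for es in (0.47, 0.26, 0.66):
        print(es, round(n_per_group(es)))  # 71, 232, and 36 per group

Under this approximation, inverting the formula at the survey's median group size of 12 shows that only effects of roughly ES = 1.14 or larger could be detected with power of .80.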

Significant issues such as the appropriate weighting of alternative outcome measures, the appropriate standard deviation unit, the extent to which the sample effect sizes within and among different studies can be pooled to estimate population effect sizes, the nonindependence of effect sizes computed for a given study for the types of comparisons of interest, and the exclusion of unpublished studies (among others) have been carefully discussed, but not entirely resolved, in other secondary analyses. The present method of computing effect sizes followed several practices used in prior meta-analyses and is subject to similar issues and concerns. The present survey suffers from greater interpretive obstacles by combining studies, effect sizes, and power estimates across broad classes of treatment in which the identity of the type of treatment, patient, measure, and other study characteristics were ignored. The impact of these and other characteristics of therapy on effect sizes has been studied in large-scale meta-analyses (e.g., Shapiro & Shapiro, 1982; Smith et al., 1980).

[Figure 2. Sample sizes required for comparisons of treatment (T) with another treatment or with no treatment (NT) separately for posttreatment (P) and follow-up (FU). The sample sizes required are based on alpha = .05 for a two-tailed test, with power set at .80 and an assumption of equal ns per group. The column for each comparison reflects the range of sample sizes from the 25th to the 75th percentiles; the horizontal line within each column corresponds to the median sample size needed. The upper panel ("Required ns for power = .80") represents the sample sizes required for power = .80 given the effect sizes obtained for the different types of comparisons; the lower panel ("Obtained ns from Survey") conveys the actual sample sizes used in studies of the present survey.]

The purpose of the present survey was to examine broad classes of comparisons to estimate power. The estimates, when viewed in the range of 25th and 75th percentiles, provide an interval that suggests reasonable consistency, even though the distinctions between different techniques and studies were not made. One might want to examine whether the power of studies of particular treatments is weaker than that of others and to make technique distinctions. However, this was beyond the present goal.

A feature directly related to different characteristics of the therapy studies was neglected in the present survey and may have important implications for power. The present analysis focused on the power to detect differences between treatments. The analyses examined main effects of a single variable (treatment technique). This focus is tantamount to asking the question, does psychotherapy work? The question has long been rejected as much too general to warrant serious attention (e.g., Edwards & Cronbach, 1952). The global question has been replaced by a more specific focus aimed at identifying which type of treatment works best for which type of client, as administered by which type of therapist under which circumstances (Kiesler, 1966; Paul, 1967). This question focuses on interactions (e.g., Treatment × Patient × Therapist × Setting) rather than simply on the efficacy of treatment or the relative effectiveness of different treatments. Yet, testing interactions in factorial studies is likely to divide samples into smaller groups than those used to test main effects of treatment. If power is weak for testing the main effect of treatment techniques, a fortiori, it is likely to be weak for testing the interactions. This concern has been voiced in relation to other areas of psychological research in which the inadequacy of power to test statistical interactions is common (Chase & Tucker, 1975; Cohen, 1965).

Another limitation of the present evaluation is the exclusive focus on power and the limited discussion even within this domain. The focus has suggested that power needs to be increased in psychotherapy outcome studies given the magnitude of effect sizes usually reported. The discussion has emphasized that power is a function of alpha, sample size, and effect size. Sample size seems to be the only variable for the investigator to manipulate and improve given that the adherence to alpha (at .05 or .01) has become a strongly entrenched convention (Cohen, 1977) and that effect size, at first blush, seems merely to reflect the state of affairs evident in nature, that is, an estimate of true population differences.

Actually, power entails much more than the factors emphasized in this article. Power is a function of effect size, which is influenced in manifold ways by the care and consistency with which an investigation is conducted. Stated generally, effect size can be increased or rather optimized in an investigation by reducing error variance. Methodological features such as selecting homogeneous sets of patients, ensuring the integrity of treatment, standardizing the assessment conditions, carefully choosing outcome measures, and similar practices increase the power of an investigation by reducing variability in its execution.

Although the present focus was limited to consideration of power, power itself can be augmented by the careful execution of the study.

The present survey illustrates the utility of power analyses in evaluating the results of previously conducted research. Such analyses can estimate the likelihood that research can detect differences given sample and effect sizes. Another and more important use of power analysis is the planning of a study to ensure that sample sizes can detect an effect of a given magnitude (Cohen, 1977; Kraemer & Thiemann, 1987; Meinert, 1986). Psychotherapy outcome research can profit greatly from this use of power analyses. The usual impediment for incorporating power in the design of research is hesitancy in estimating effect size. However, data from meta-analyses of psychotherapy (Brown, 1987) as well as descriptive analyses such as those found in the present survey provide estimates of the approximate effect sizes for alternative types of comparisons. As guidelines, they may be used to identify the sample sizes required for adequate power.

It is possible, of course, that large sample sizes needed to provide adequate comparisons of treatment are not available for particular clinical problems or settings. These limitations do not necessarily warrant undertaking a comparison that has minimal chances of detecting differences. The levels of confidence adopted to protect against Type I error (alpha) and Type II error (beta) and the resulting power (1 - beta) may need to be reconsidered. Depending on the specific question, the comparison of interest, and the cost and consequences of no differences between treatments, one might wish to vary alpha or beta (Meinert, 1986). The issue requires consideration in advance of the execution of the study. Given the sample sizes usually used in comparative outcome studies and the likely effect sizes usually reported, differences are not likely to be detected with acceptable levels of power.

The neglect of power can have major implications for interpreting research, perhaps especially so in the evaluation of psychotherapy. As noted previously, psychotherapy research is an area in which the absence of differences is often taken to be quite significant from conceptual and clinical perspectives. It may well be that treatments are similar in the outcomes they produce and that findings of no difference reflect the actual state of the population. However, there remains a plausible alternative hypothesis. The present evaluation suggests that the power of studies comparing alternative treatments is relatively weak. The results are not intended to fuel further the search for the superiority of one treatment over another. More sophisticated questions involving interactions of treatment with variables or other conditions of interest need to be studied. However, the points raised in relation to power in the present survey may need to be considered even further. Unless sample sizes are substantially increased from current levels, finer grained analyses of treatments are likely to be associated with even lower power as the division of the sample for the comparisons of interest results in smaller units for analyses.

The present survey evaluated alternative classes of comparisons that are made in comparative outcome studies. The implications of weak power for more specific substantive questions (e.g., Treatment × Patient × Therapist × Setting interactions) were not addressed. Even so, the general conclusion that might be drawn is worth underscoring. The absence of differences between treatments in comparative outcome studies may not be interpreted unambiguously without improved power. In the context of psychotherapy research, as opposed to government and politics, it may be the absence rather than the presence of power that corrupts.

References

Brewer, J. K. (1972). On the power of statistical tests in the American Educational Research Journal. American Educational Research Journal, 9, 391-401.
Brown, J. (1987). A review of meta-analyses conducted on psychotherapy outcome research. Clinical Psychology Review, 7, 1-23.
Casey, R. J., & Berman, J. S. (1985). The outcome of psychotherapy with children. Psychological Bulletin, 98, 388-400.
Chase, L. J., & Chase, R. B. (1976). A statistical power analysis of applied psychological research. Journal of Applied Psychology, 61, 234-237.
Chase, L. J., & Tucker, R. K. (1975). A power-analytic examination of contemporary communication research. Speech Monographs, 42, 29-41.
Cohen, J. (1962). The statistical power of abnormal-social psychological research: A review. Journal of Abnormal and Social Psychology, 65, 145-153.
Cohen, J. (1965). Some statistical issues in psychological research. In B. B. Wolman (Ed.), Handbook of clinical psychology (pp. 95-121). New York: McGraw-Hill.
Cohen, J. (1973). Statistical power analysis and research results. American Educational Research Journal, 10, 225-230.
Cohen, J. (1977). Statistical power analysis for the behavioral sciences (2nd ed.). New York: Academic Press.
Cross, D. G., Sheehan, P. W., & Khan, J. A. (1982). Short- and long-term follow-up of clients receiving insight-oriented therapy and behavior therapy. Journal of Consulting and Clinical Psychology, 50, 103-112.
DiLoreto, A. O. (1971). Comparative psychotherapy: An experimental analysis. Chicago: Aldine-Atherton.
Edwards, A. L., & Cronbach, L. J. (1952). Experimental design for research in psychotherapy. Journal of Clinical Psychology, 8, 51-59.
Fagley, N. S. (1985). Applied statistical power analysis and the interpretation of nonsignificant results by research consumers. Journal of Counseling Psychology, 32, 391-396.
Feldman, R. A., Caplinger, T. E., & Wodarski, J. S. (1983). The St. Louis conundrum: The effective treatment of antisocial youths. Englewood Cliffs, NJ: Prentice-Hall.
Fleischman, M. J. (1981). A replication of Patterson's "Intervention for boys with conduct problems." Journal of Consulting and Clinical Psychology, 49, 342-351.
Forman, S. G. (1980). A comparison of cognitive training and response cost procedures in modifying aggressive behavior of elementary school children. Behavior Therapy, 11, 594-600.
Frank, J. D. (1982). Therapeutic components shared by all psychotherapies. In J. H. Harvey & M. M. Parks (Eds.), Psychotherapy research and behavior change (Vol. 1, pp. 5-37). Washington, DC: American Psychological Association.
Freiman, J. A., Chalmers, T. C., Smith, H., & Kuebler, R. R. (1978). The importance of beta, the Type II error, and sample size in the design and interpretation of the randomized control trial. New England Journal of Medicine, 299, 690-694.
Friedman, L. M., Furberg, C. D., & DeMets, D. L. (1985). Fundamentals of clinical trials (2nd ed.). Littleton, MA: PSG Publications.
Garfield, S. L. (1980). Psychotherapy: An eclectic approach. New York: Wiley.
Hedges, L. V., & Olkin, I. (1985). Statistical methods for meta-analysis. Orlando, FL: Academic Press.

Heimberg, R. G., & Becker, R. E. (1984). Comparative outcome research. In M. Hersen, L. Michelson, & A. S. Bellack (Eds.), Issues in psychotherapy research (pp. 251-283). New York: Plenum Press.
Hinkle, D. E., & Oliver, J. D. (1983). How large should the sample be? A question with no simple answer? Or . . . Educational and Psychological Measurement, 43, 1051-1060.
Kazdin, A. E. (1980). Research design in clinical psychology. New York: Harper & Row.
Kazdin, A. E. (1986). Comparative outcome studies of psychotherapy: Methodological issues and strategies. Journal of Consulting and Clinical Psychology, 54, 95-105.
Kazdin, A. E. (1988). Child psychotherapy: Developing and identifying effective treatments. New York: Pergamon Press.
Kazdin, A. E., & Wilcoxon, L. A. (1976). Systematic desensitization and nonspecific treatment effects: A methodological evaluation. Psychological Bulletin, 83, 729-758.
Kazdin, A. E., & Wilson, G. T. (1978). Evaluation of behavior therapy: Issues, evidence, and research strategies. Cambridge, MA: Ballinger.
Kiesler, D. J. (1966). Some myths of psychotherapy research and the search for a paradigm. Psychological Bulletin, 65, 110-136.
Klein, M. H., Dittmann, A. T., Parloff, M. B., & Gill, M. M. (1969). Behavior therapy: Observations and reflections. Journal of Consulting and Clinical Psychology, 33, 259-266.
Kraemer, H. C. (1981). Coping strategies in psychiatric clinical research. Journal of Consulting and Clinical Psychology, 49, 309-319.
Kraemer, H. C., & Thiemann, S. (1987). How many subjects? Statistical power analysis in research. Newbury Park, CA: Sage.
Lachin, J. M. (1981). Introduction to sample size determination and power analysis for clinical trials. Controlled Clinical Trials, 2, 93-113.
Lambert, M. J., Shapiro, D. A., & Bergin, A. E. (1986). The effectiveness of psychotherapy. In S. L. Garfield & A. E. Bergin (Eds.), Handbook of psychotherapy and behavior change (3rd ed., pp. 157-211). New York: Wiley.
Liberman, R. P., & Eckman, T. (1981). Behavior therapy vs. insight-oriented therapy for repeated suicide attempters. Archives of General Psychiatry, 38, 1126-1130.
Luborsky, L., Singer, B., & Luborsky, L. (1975). Comparative studies of psychotherapies: Is it true that "everyone has won and all must have prizes"? Archives of General Psychiatry, 32, 995-1008.
Meinert, C. L. (1986). Clinical trials: Design, conduct, and analysis. New York: Oxford University Press.
Patterson, G. R. (1974). Interventions for boys with conduct problems: Multiple settings, treatments, and criteria. Journal of Consulting and Clinical Psychology, 42, 471-481.
Paul, G. L. (1967). Outcome research in psychotherapy. Journal of Consulting Psychology, 31, 109-118.
Prioleau, L., Murdock, M., & Brody, N. (1983). An analysis of psychotherapy versus placebo studies. The Behavioral and Brain Sciences, 8, 275-285.
Rachman, S. J., & Wilson, G. T. (1980). The effects of psychological therapy (2nd ed.). Oxford, England: Pergamon Press.
Rosenthal, R. (1979). The "file drawer problem" and tolerance for null results. Psychological Bulletin, 86, 638-641.
Rossi, J. S., Rossi, S. R., & Cottrill, S. D. (1984, April). Statistical power of research in social and abnormal psychology: What have we gained in 20 years? Paper presented at the meeting of the Eastern Psychological Association, Baltimore, MD.
Rush, A. J., Beck, A. T., Kovacs, M., & Hollon, S. (1977). Comparative efficacy of cognitive therapy and pharmacotherapy in the treatment of depressed outpatients. Cognitive Therapy and Research, 1, 17-38.
Shapiro, D. A., & Shapiro, D. (1982). Meta-analysis of comparative therapy outcome studies: A replication and refinement. Psychological Bulletin, 92, 581-604.
Sloane, R. B., Staples, F. R., Cristol, A. H., Yorkston, N. J., & Whipple, K. (1975). Psychotherapy versus behavior therapy. Cambridge, MA: Harvard University Press.
Smith, M. L., Glass, G. V., & Miller, T. I. (1980). The benefits of psychotherapy. Baltimore, MD: Johns Hopkins University Press.
Snedecor, G. W., & Cochran, W. G. (1980). Statistical methods (7th ed.). Ames: Iowa State University Press.
Stiles, W. B., Shapiro, D. A., & Elliott, R. (1986). "Are all psychotherapies equivalent?" American Psychologist, 41, 165-180.
Viale-Val, G., Rosenthal, R. H., Curtiss, G., & Marohn, R. C. (1984). Dropout from adolescent psychotherapy: A preliminary study. Journal of the American Academy of Child Psychiatry, 23, 562-568.
Walrond-Skinner, S. (1986). Dictionary of psychotherapy. London: Routledge & Kegan Paul.
Waterhouse, G. J., & Strupp, H. H. (1984). The patient-therapist relationship: Research from the psychodynamic perspective. Clinical Psychology Review, 4, 77-92.
Yu, P., Harris, G. E., Solovitz, B. L., & Franklin, J. L. (1986). A social problem-solving intervention for children at high risk for later psychopathology. Journal of Clinical Child Psychology, 15, 30-40.

Received February 1, 1988
Revision received May 10, 1988
Accepted May 23, 1988
