Cohen (1992) Statistical Power
Author(s): Jacob Cohen
Source: Current Directions in Psychological Science, Vol. 1, No. 3 (Jun., 1992), pp. 98-101
Published by: Sage Publications, Inc. on behalf of Association for Psychological Science
Stable URL: http://www.jstor.org/stable/20182143
CURRENT DIRECTIONS IN PSYCHOLOGICAL SCIENCE
Published by Cambridge University Press

The power of a statistical test of a [...] Statistical power depends on the significance criterion (α), the sample size (N), and the population effect size (ES). The importance of power analysis arises from the fact that most empirical [...] phenomena under study.

A typical H0 is that a population product-moment correlation, r, is zero, to be tested at the two-sided .05 level (α₂ = .05). When this H0 is tested on a sample of N cases randomly drawn from a population in which r indeed does equal zero, researchers risk mistakenly rejecting the H0 when it is true, a Type I error, whose rate (.05) is controlled by the α criterion. They also risk mistakenly accepting the H0 as tenable when it is false, a Type II error, whose probability is called β. Power is thus 1 - β, the probability of not accepting the H0 when it is false, that is, the probability of successfully rejecting the H0.

The outcome of a statistical test depends on the degree to which the H0 is false, that is, on the magnitude of the population ES, which in this case is the absolute size of the population r: the larger the r, the greater the likelihood that the H0 will be rejected. It is also true that the outcome depends on N, a larger sample being more likely to result in [rejecting the H0]. Thus, at α₂ = .05, for example, if the population r is .30, when N is 40, the power of the standard t test of a sample r turns out to equal .48, whereas when N is 80, power is .78. If the population r is .40, when N is 40, power is .74, but when N is 80, power is .96. Finally, the test outcome depends also on α, the risk of a Type I error. A smaller and therefore more stringent α criterion, say, α₂ = .01, for any given population r and N, would result in smaller power. For example, with population r = .30 and N = 80, while power at α₂ = .05 is .78, power at α₂ = .01 is only .56.¹ Note also that at any given value of α, a two-sided test is more stringent than a one-sided test.

[...] a), the form used in meta-analytic power reviews of research areas or journals.

[...] Regression/Correlation Analysis for the Behavioral Sciences (2nd ed., [...]

EFFECT SIZE

[...] the null hypothesis, H0, and the alternate hypothesis of interest, H1. For testing a sample r, the H0 is that the population r is zero, and the H1 posits a specific nonzero value, for example, .30. Thus, the ES in this example is simply the difference: .30 - .00. Every statistical test has its own ES index, a continuous value that runs from zero, when the H0 is true. Each ES index is a pure (i.e., scale-free) value that measures, in terms appropriate to it, the discrepancy between the H0 and the H1. For example, [the ES index for] the difference between means in the classical independent t test is d, the difference between the population means standardized by dividing this difference by the common within-population standard deviation. (The difference is absolute for two-sided tests and is either positive or negative [for one-sided tests].)

As another example, for testing the departure of a population proportion (P) from .50, the ES index is g = P - .50. If an investigator believes that there is a sex difference in the incidence of dyslexia such that boys are at different risk from girls, in a sample of dyslexic children, she [...]

[...] by the common within-population standard deviation of the observations.¹
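The two ES indices defined above can be sketched in a few lines of Python. This is only an illustration: the function names, the numeric inputs, and the use of a pooled within-group standard deviation to estimate d from samples are assumptions of the sketch, not taken from the article.

```python
from math import sqrt
from statistics import mean, variance


def effect_size_d(mean_a, mean_b, within_sd):
    """d: difference between two population means in within-population SD units."""
    return (mean_a - mean_b) / within_sd


def effect_size_g(p):
    """g: departure of a population proportion P from .50."""
    return p - 0.50


def sample_d(group_a, group_b):
    """Sample estimate of d using a pooled within-group SD (an assumption of this sketch)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical values, chosen only for illustration.
print(effect_size_d(105.0, 100.0, 10.0))  # means half a standard deviation apart
print(effect_size_g(0.65))                # 65% boys among dyslexic children
print(sample_d([1, 2, 3], [2, 3, 4]))     # signed estimate from two small samples
```

For a two-sided test the sign of d is dropped, so `abs(sample_d(...))` would be the value carried into a power calculation.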
Investigators in the social sciences find specifying the ES the most difficult aspect of power analysis. This is at least partly due to the relatively low level of consciousness about magnitudes in those disciplines. The conquest of psychological science by Fisherian null hypothesis testing (where the alternative to the H0 is simply its negation, so that no H1 is specified) has had the unfortunate effect of emphasizing the magnitudes of p values from significance tests rather than the magnitudes of the psychological phenomena under study.³ A salutary side [...] readiness [...] previous [...] variables, and [...] for small, medium, and large ESs.¹
Another means of facilitating the understanding of the various ES indices is by transforming them into other measures. For example, many of the ES indices [...] between normal distributions [...] intuition. [...]
Because the ES indices are not generally familiar, I have proposed as conventions, or operational definitions, "small," "medium," and "large" values of each ES index to provide the user with some sense of its scale.¹ It was my intent that medium ES represent an effect of a size likely to be apparent to the naked eye of a careful observer, that small ES be noticeably smaller yet not trivial, and that large ES be the same distance above medium as small is below it. I also made an effort to make these conventions comparable across different statistical tests.

For example, for the test that r = 0, small, medium, and large ESs are, respectively, population rs .10, .30, and .50. For the test that two population means are equal, the same ESs, in order, are d = .20, .50, and .80. The .20 ES is exemplified by the mean IQ difference between twins and nontwins (the latter being larger), the .50 ES by the mean IQ difference between clerical and semiskilled workers, and the .80 ES by the mean IQ difference between Ph.D.s and college freshmen. In the analysis of variance test of the H0 that the populations have equal means, the index is f (the standardized standard deviation of the means), [and the corresponding values] are, respectively, .10, .25, and .40.

α, THE SIGNIFICANCE CRITERION

The probability of mistakenly rejecting the H0, α, represents a research policy: the maximum risk one is prepared to take of making this error. It has become conventional [...]
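How the choice of α, together with N and the population ES, moves power can be sketched numerically for the test that a population r is zero. The sketch below assumes the Fisher z normal approximation, which is not necessarily how the values quoted in the article were computed; the function names `power_r` and `n_for_r` are hypothetical.

```python
from math import atanh, ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def power_r(r, n, alpha=0.05, two_sided=True):
    """Approximate power of the test that a population correlation is zero.

    Assumption of this sketch: Fisher's z transform of the sample r is treated
    as normal with standard error 1 / sqrt(n - 3).
    """
    shift = atanh(r) * sqrt(n - 3)             # expected test statistic under H1
    tail = alpha / 2 if two_sided else alpha
    z_crit = STD_NORMAL.inv_cdf(1 - tail)      # rejection threshold
    power = 1 - STD_NORMAL.cdf(z_crit - shift)
    if two_sided:
        power += STD_NORMAL.cdf(-z_crit - shift)  # rejection in the opposite tail
    return power


def n_for_r(r, alpha=0.05, power=0.80):
    """Approximate N for a desired power (two-sided test, same approximation;
    the negligible opposite-tail term is ignored)."""
    z_sum = STD_NORMAL.inv_cdf(1 - alpha / 2) + STD_NORMAL.inv_cdf(power)
    return ceil((z_sum / atanh(r)) ** 2 + 3)


if __name__ == "__main__":
    for alpha in (0.05, 0.01):
        for r, n in ((0.30, 40), (0.30, 80), (0.40, 40), (0.40, 80)):
            print(f"alpha2={alpha:.2f} r={r:.2f} N={n}: power ~ {power_r(r, n, alpha):.2f}")
    print(n_for_r(0.30, alpha=0.01, power=0.99), n_for_r(0.30, alpha=0.05, power=0.99))
```

Under this approximation the power values land within roughly .01 of the figures quoted in the article for population r = .30 and .40 at N = 40 and 80, and a one-sided test at the same α gives higher power than the two-sided one. Solving the same approximation for N gives 254 and 195 cases for population r = .30 at .99 power with α₂ = .01 and .05, which agrees with the sample sizes the article quotes for that case.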
DETERMINING SAMPLE SIZE

In planning research, deciding the sample sizes is crucial. Because costs [are] approximately linear [in the number of] subjects, cost-effectiveness demands that this decision be appropriate.

To determine the necessary sample size, one needs to posit the α, ES, and desired power. I have proposed as a convention that in the absence of any other basis for setting the value for desired power, .80 be used.¹ In scientific research, it is typically more serious to make a false [...] .80 power [...]

[...] importance.⁴

When asked in connection with a particular investigation what α and power are desired, a neophyte researcher might suggest α₂ = .01 and some very large value for power, [...] between means, if a medium ES (d = .5) exists in the population, these specifications require 194 cases in each of the two samples. Similarly, they require that if population r = .30, a test of the significance of a sample r have 254 cases. For α₂ = .05 and .99 power, the N requirements are, respectively, 148 and 195.

[...] incidence of dyslexia. If in a population of dyslexic children half are boys, there is no sex difference, so H0 is P = .50. Departure from .50 would render H0 false. The ES index for this test is g = P - .50, the [...] looking up various combinations of α₂ and g that would result in Ns within the desired range and noting the resulting power. From this table, she could choose a set of specifications.

Power
1/6   .15   1/6   .10   .15   .10   .15   1/6   .15
.75   .75   .85   .50   .85   .60   .90   .95   .95
92    98    98    96    97    90    91    92    90

DETERMINING POWER

There is a useful [role for] power analysis in assessing research, particularly research in which nonsignificant results were obtained. Given the N employed and α, one needs only to posit the population ES to determine power. The sample ES found, or one or more ES values posited by the assessor, may serve this purpose. It is a common finding that power was poor for plausible ESs, usually because of small N.

In 1962, I reviewed the articles in the 1960 volume of the Journal of Abnormal and Social Psychology from the perspective of power.⁵ I determined power for each statistical test in each article using the N employed at α₂ = .01, .05, and .10 for conventional definitions of small, medium, and large ES. I found, for example, that the median power to detect a medium ES at α₂ = .05 was .46. The many power surveys done in the biosocial sciences since that time have had similar results. For example, a similar review by Sedlmeier and Gigerenzer of [...] lower still (.37) when an experimentwise α criterion was employed. Even worse was the finding that in 11% of the studies, the H0 was taken as the research hypothesis and nonsignificance taken as confirmation: The median power of these studies to detect a medium ES at α₂ = .05 was .25!

CONCLUSION

Acknowledgments: I am, as always, [...]

Copyright © 1992 American Psychological Society
Notes
1. J. Cohen, Statistical Power Analysis for the
Behavioral Sciences, 2nd ed. (Erlbaum, Hillsdale,
NJ, 1988). This is the source of the system of power
analysis described here; the power values and sam
ple sizes of the illustrations derive from this book's
tables.
2. J. Neyman and E.S. Pearson, On the use and
interpretation of certain test criteria for purposes of
Rand R. Wilcox
rwilcox@wilcox.usc.edu

Certainly, one of the most common goals in applied research is comparing two or more groups in terms of some measure of location, that is, a quantity intended to represent the "typical" subject or object [...] be used in applied work. [...] consequences.

In this article, I review the problem that arises in using conventional statistical methods to compare group means and then discuss some solutions. Standard nonparametric methods do not correct the problem, nor do some of the better known improvements for comparing means. There are, however, new methods that can help applied researchers.

[In] choosing a procedure for comparing groups, it helps to keep three common goals in mind:

1. Control the probability of a Type I error when the distributions are identical.
2. Compute accurate confidence intervals for the difference between two measures of location.
3. Achieve reasonably high power when the two groups differ in terms of some measure of location.

When using the Mann-Whitney U test, but [...] when the distributions differ.

[...] Goal [...] groups, [...] appears to perform very well.²

For Goal 2, Student's t test appears to perform reasonably when equal sample sizes are used, but for unequal sample sizes, serious problems arise. In particular, Cressie and Whitford³ described general circumstances under which, no matter how