
Chapter 17

Power

OBJECTIVES FOR CHAPTER 17

After studying the text and working the problems in this chapter, you should be able to:

1. Define statistical power with a formula and with words


2. List and explain the factors that affect power
3. Calculate the amount of power in one-sample t tests, independent-samples t tests, and
paired-samples t tests
4. Calculate the amount of power in a test of a Pearson r
5. Estimate the sample size needed for t-test designs and correlation studies
6. Decide which of two studies is better, even though both retained the null hypothesis

Power is a probability – the probability of rejecting the null hypothesis when it is false. This
chapter explains the concept of statistical power for three kinds of t tests and for a Pearson
correlation coefficient, r. A test is powerful if it is likely to reject a false null hypothesis.

The textbook, Tales of Distributions, has a heavy emphasis on null hypothesis statistical
testing (NHST), which was introduced in Chapter 8, Hypothesis Testing and Effect Size:
One-Sample Designs. You may recall that to test a null hypothesis, you tentatively assume
that it is true and then, on the basis of sample data, conclude that the null hypothesis is false
or that you don’t have good evidence that it is false. After the statistical analysis is finished,
you interpret the results. From Chapter 8 onward, you worked with several different
statistical tests. Each was developed for a particular design (such as two groups) or for a
particular kind of dependent variable (such as ranks). In every case, however, these tests are
based on the logic of null hypothesis statistical testing.

For every NHST technique, there is a null hypothesis, which is symbolized H0. The form of
the null hypothesis in the textbook is a hypothesis of equivalence. An example you’ve seen
several times is H0: µ0 = µ1. Of course, the truth is that this null hypothesis may be true or
that it may be false. It is the job of the statistical test to provide information that allows you to
move in the direction of one of these conclusions.
A REVIEW OF POTENTIAL MISTAKES
Any conclusion, of course, could be mistaken. So far, you have concentrated on minimizing
a particular kind of mistake, the mistake of rejecting the null hypothesis when it is true. If
you reject a null hypothesis that is true, you make a Type I error. Although you cannot totally
eliminate the possibility of a Type I error, hypothesis testing allows you to control it with the
α level you adopt. If α = .05, you are assured that the probability of a Type I error is no
greater than .05.

As you no doubt recall, there is a second way you can make a mistake in a statistical analysis.
You make this mistake if you retain a false null hypothesis. Retaining the null hypothesis
when it is false is a Type II error. Table 17.1 illustrates the circumstances in which Type I
and Type II errors are possible. Note that a Type I error is possible only when the null
hypothesis is true and a Type II error is possible only when the null hypothesis is false.

Table 17.1 Type I and Type II errors

                       The true situation in the population
                       H0 true               H0 false
Retain H0              Correct decision      Type II error
Reject H0              Type I error          Correct decision

The probability of a Type II error is symbolized with the Greek beta, β. To review the factors
that influence the value of β, see pages 218-219 in the textbook. Learning the material in this
chapter will improve your understanding of the factors that influence β. This chapter
attends to the situation in which the null hypothesis is false, which is the right-hand
column of Table 17.1.

POWER DEFINED AND EXPLAINED


Note in Table 17.1 that when the null hypothesis is false, there are two possible outcomes:
a Type II error and a correct decision. The probability of a Type II error is β and the
probability of a correct decision is 1 - β. Increasing the probability of a correct decision is
the approach that statisticians take in dealing with false H0's. Thus, the attention in this
chapter is on the value 1 − β and how to maximize it.

The expression 1 − β is the power of a statistical test. A test with power = .90 means that
if the null hypothesis is false (to the degree you think it is), you have a .90 probability of
rejecting the null hypothesis with the test.

Consider a statistical test that has a power of .25. Such a test has a .75 probability of failing
to detect a difference that is actually there. You can imagine that if you knew this before you
began your experiment, you might well take steps to increase the power. If you couldn’t
increase the power, you might decide to abandon the project and spend your time and effort
on something with a greater than a .25 chance of success.
ADEQUATE POWER
How much power is enough? This is another of those statistical questions for which there is a
commonly accepted conventional answer. The conventional answer is that adequate power is
.80.¹ Sometimes students react to this “convention” by wondering why smart people would
adopt a standard that allows them to miss detecting a difference once in every five
experiments (if their assumptions are correct).

The explanation for this (if you are such a wondering student) is that resources are limited and
adding power requires additional resources. In most situations, researchers can increase power
only by adding observations. In many cases, however, it is expensive to obtain observations,
and as power is increased beyond .80, the sample size requirements rise steeply.
Thus, for most problems that have an effect size index that is small or medium, you cannot
afford to absolutely ensure that you will find the suspected effect because the resources
needed for additional observations just aren’t available.

ASSUMPTIONS AND ACCURACY


Calculating the power of a statistical test always requires that you make assumptions. These
are assumptions that you cannot be sure are true. Because of this uncertainty, the results of a
power analysis are never treated as exact answers. Not having exact answers is not a
problem for researchers, however, because what they want to know is whether a statistical
test has power equal to about .80 or whether it is in the .40 range. Researchers recognize that
making a distinction between power probabilities of .67 and .63 just isn’t helpful.

The fact that researchers do not need precision is also helpful for those who are just
learning about power analysis because it allows textbook writers to use the normal curve to
explain power, even though the probabilities that result are only approximately correct. As a
result, a power analysis does not change for different-sized samples; the normal curve fits
all.

Power is a concept that applies to every NHST statistic. This chapter, however, explains
only how to calculate the power of three kinds of t tests and tests of the significance of r.
Corresponding methods of calculating power exist for other statistics.

¹ You have already worked with one convention using the number .80. A d value of 0.80,
however, is quite different from the idea that power = .80. An effect size index of 0.80 means
you have a large effect size. The index, d, is an expression of a difference between means, per
standard deviation unit. Power, however, is expressed as a probability, the probability of
rejecting a false null hypothesis.
TWO DIFFERENT SITUATIONS AS EXAMPLES
Figure 17.1 and Figure 17.2 show two different situations in which the null hypothesis is
false. Figure 17.1 shows situation A. In Figure 17.1, the population on the left is the null
hypothesis population, but the sample is drawn from the population on the right. The sample
is from a population with a mean, µ1, which differs from the null hypothesis population by
an amount that produces a d value of 0.20.

[Figure: two population curves for situation A, d = 0.20; the null hypothesis population (µ0) and population 1 (µ1)]
Figure 17.1 Situation A in which the null hypothesis is false. The degree of falseness is
indicated by a d value of 0.20

[Figure: two population curves for situation B, d = 0.80; population 0 (µ0) on the left and population 2 (µ2) on the right]

Figure 17.2 Situation B in which the null hypothesis is false. The degree of falseness is
indicated by a d value of 0.80

Figure 17.2 (situation B) likewise has the null hypothesis population on the left. The
sample, however, comes from the population on the right. The difference between the two
population means produces a d value of 0.80. The two figures illustrate a small effect size (d
= 0.20 in Figure 17.1) and a large effect size (d = 0.80 in Figure 17.2).

Suppose a one-sample t test was calculated on a sample from population 1 in situation A and a
separate one-sample t test was calculated on a sample from population 2
in situation B. That is, in situation A, a sample from population 1 is tested against H0: µ0 =
µ1 and likewise, in situation B, a sample from population 2 is tested against H0: µ0 = µ2.
Which of the two t tests is more likely to reject the null hypothesis? Choose situation A or
situation B before going on. Answer: __________

A one-sample t test is more likely to reject the null hypothesis in situation B than in situation
A. However, is rejecting the null hypothesis the right decision in both cases? Or is rejecting
the null hypothesis an error? Answer: ___________

Rejecting the null hypothesis is the correct thing to do in both cases. Failure to reject
would be a Type II error. But what is the probability of rejecting the null hypothesis in
situation A? In situation B? For probabilities, you need a power analysis.

SAMPLING DISTRIBUTIONS OF THE MEAN


The two large curves in Figure 17.3 are the same two population curves you saw in Figure
17.1. The two small curves in Figure 17.3 show the sampling distribution of the mean for
each population when N = 9.²

² A graph of a sampling distribution of the mean shows the means of all the samples of a
particular size from a population. Thus, in Figure 17.3, the sampling distribution of the
mean for population 0 is the small curve on the left. The sampling distribution of the mean
for population 1 is the small curve on the right.
[Figure: the situation A populations (d = 0.20), each with its sampling distribution of the mean for N = 9; population 0 on the left, population 1 on the right]

Figure 17.3 Populations from Figure 17.1 and sampling distributions of the mean for
N = 9. Critical region for α = .05 is shaded.

In Figure 17.3, the expected values of the mean (mean of the sampling distributions) are
µ0 and µ1, the same as the population means. Focus on the sampling distribution of the
mean for the null hypothesis population. The rejection region for a two-tailed test is
shaded. Estimate what proportion of the means from population 1 has values that are
shaded in the sampling distribution on the left. Answer: ______

This is a hard question. Here is more information to help you either answer or confirm your
answer. The sampling distribution of the null hypothesis is on the left; the shaded portions
show the rejection region. All sample means from population 1 with values that are the same
as values in the shaded regions of population 0 lead to a rejected null hypothesis. What
proportion of the means from population 1 are in the rejection regions? Make an eyeball
estimate (attending to both shaded regions). Answer: ______

To my eye, about 20-25 percent of the right side and one percent of the left side of the
sampling distribution of population 1 are in the rejection region of the null hypothesis
sampling distribution. Thus, an eyeball estimate is that power is .21 to .26 in the case where
d = 0.20 and N = 9.
PROBLEMS (Answers at the end of the chapter)

1. “How much power is enough?” What is the conventional answer to the question?
2. Write definitions of a Type I error and a Type II error. What symbols are used for
the probability of each of these errors?
*3. Figure 17.4 shows situation B (d = 0.80) and sampling distributions of the mean
for N = 9. Use Figure 17.4 to estimate the power available to detect that the
populations are different.

[Figure: the situation B populations (d = 0.80), each with its sampling distribution of the mean for N = 9; population 0 on the left, population 2 on the right]

Figure 17.4 Populations from Figure 17.2 and sampling distributions of the mean for
N = 9. Critical region for α = .05 is shaded.

FACTORS THAT AFFECT POWER


A list of factors that affect power follows. Understanding these factors is necessary for
understanding a power analysis. After the factors are explained, you will calculate power
values. Factors that affect power are:
1. α
2. the actual difference between population means
3. N
4. using a one-tailed or a two-tailed statistical test.

THE EFFECT OF α
Here’s the rule about α and power: If α is changed to larger values (e.g., from .01 to .05 to
.10), power increases. This makes sense; as you make it easier to reject the null hypothesis
by changing α, the more likely you are to reject H0. Being more likely to reject the null
hypothesis overall means that you are more likely to reject H0 when it is false. Thus, power
is increased because you are more likely to reject a false null hypothesis.

[Figure: enlarged sampling distributions from situation B (d = 0.80), with critical region cutoffs marked for α = .10, .05, and .01]

Figure 17.5 Enlarged versions of sampling distributions in Figure 17.4. Critical regions
for α levels of .10, .05, and .01 are shaded light, medium, and dark.

Figure 17.5 provides a picture of the effects of changing α. It is an enlarged version of the
sampling distributions in situation B in Figure 17.4. The rejection regions for α levels of .10,
.05, and .01 are shaded as light, medium, and dark, respectively. Look at Figure 17.5 and
estimate the amount of the sampling distribution from population 2 that is to the right of each
of the three alpha levels.
for α = .10: _____
for α = .05: _____
for α = .01: _____

To my eye these proportions are: for α = .10, about .80; for α = .05, about .70; and for
α = .01, about .50.

Thus, as α is reduced, so is power.
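You can check these eyeball estimates numerically. The sketch below (not from the text; the function name power_from_delta is mine) uses the normal-curve approximation this chapter relies on, with δ = d√N = 0.80√9 = 2.40 for situation B:

```python
from statistics import NormalDist

def power_from_delta(delta, alpha):
    """Two-tailed power from delta via the normal-curve approximation
    behind Table M: the area of the alternative's sampling distribution
    that falls in the null hypothesis's rejection regions."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)   # two-tailed critical z
    return nd.cdf(delta - z_crit) + nd.cdf(-delta - z_crit)

# Situation B with N = 9: delta = 0.80 * sqrt(9) = 2.40
for alpha in (0.10, 0.05, 0.01):
    print(alpha, round(power_from_delta(2.40, alpha), 2))
# prints .77, .67, and .43 -- matching Table M's row for delta = 2.40
```

The computed values run close to the eyeball estimates, though the α = .01 value comes out nearer .43 than the eyeballed .50.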


THE EFFECT OF THE ACTUAL DIFFERENCE BETWEEN POPULATION MEANS
The second factor, the size of the difference between population means, is one that doesn’t
need much additional explanation. Early in this chapter, when you looked at Figure 17.1 and
Figure 17.2, you probably said that a t test would be more likely to detect the false null
hypothesis in situation B than it would in situation A. The population means in situation B are
farther apart than they are in situation A.

In situation A, power was approximately .11 (text example). In situation B power was
approximately .70 (problem 3). In both cases, sample size was 9. In summary, the greater
the difference between the populations you are sampling from, the greater the power of your
statistical test. This issue is sometimes referred to as the degree of falseness of H0.

THE EFFECT OF N
The rule for the effect of sample size (N) on power is that as N increases, power increases.
One way to understand this relationship more thoroughly is to examine the algebra in a one-
sample t test. Mentally work your way through the formula that follows, starting on the right
side and working back to the left. Imagine that N increases and see what happens to t.
Convince yourself that as N increases, power increases.

t = (X̄ − µ0) / s_X̄ = (X̄ − µ0) / (ŝ/√N)

The effect of increasing N is to make s_X̄ smaller and, of course, a smaller s_X̄ increases the
numerical value of t. The larger t is, the more likely the rejection of the null hypothesis.
Thus, power increases even though the difference between means stays the same.

Figure 17.6 Sampling distribution of the mean with N = 9 (upper panel) and sampling
distribution of the mean with N = 36 (lower panel). Situation B.

A second explanation of how N affects power is with pictures rather than algebra. The upper
panel of Figure 17.6 shows sampling distributions from Situation B for N = 9, which is a
repeat of Figure 17.5. The bottom panel of Figure 17.6 shows sampling distributions from
Situation B when N is increased to 36. In both panels, rejection regions are shaded (.05 level,
two-tailed test). When N = 9 (top panel), my eyeball estimate of power remains about .70.
Look at the lower panel sampling distributions for N = 36. What is your estimate of the power
in this situation? Answer: ______

My eyeball estimate of power in the bottom panel of Figure 17.6 is about .99 to
1.00. Thus, as sample size increases, so does power.
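The same normal-curve approximation confirms the effect of N. A minimal sketch (the function name power is my own choice) comparing N = 9 with N = 36 in situation B:

```python
import math
from statistics import NormalDist

def power(d, n, alpha=0.05):
    """Approximate two-tailed power for a one-sample design,
    using delta = d * sqrt(N) and the normal curve."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    delta = d * math.sqrt(n)
    return nd.cdf(delta - z_crit) + nd.cdf(-delta - z_crit)

print(round(power(0.80, 9), 2))    # N = 9  -> about .67
print(round(power(0.80, 36), 2))   # N = 36 -> about 1.00
```

Quadrupling N moves power from about .67 to essentially 1.00, just as the figure suggests.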

THE EFFECT OF A ONE- OR A TWO-TAILED TEST


The rule for one- and two-tailed tests is that one-tailed tests are more powerful than two-
tailed tests. Look again at Figure 17.5, which shows the two sampling distributions of the
mean from situation B. My eyeball estimate of power was .70 when α is .05. I got this by
estimating that 70 percent of population 2 lay to the right of the α = .05 cutoff point. The
other tail of the sampling distribution can be ignored; none of population 2 is to the left of
the .05 cutoff point.

What if the test had been a one-tailed test with α still at .05? Looking at Figure 17.5, you
can see that the rejection region for a two-tailed test with α = .10 consists of .05 on the right
and .05 on the left. Thus, the rejection region for a one-tailed test with α = .05 is the entire
shaded area on the right. My estimate of the power in this
case is .80. Thus, a one-tailed test is more powerful than a two-tailed test. (Remember,
however, that a one-tailed test is completely incapable of detecting a difference in which the
ordinal position of the two means is the opposite of what you expect.)
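The one- versus two-tailed comparison can also be computed rather than eyeballed. This sketch (my arrangement, not the textbook's) puts all of α in the expected tail for the one-tailed case:

```python
from statistics import NormalDist

nd = NormalDist()
delta = 2.40                            # situation B with N = 9: 0.80 * sqrt(9)
z_two = nd.inv_cdf(1 - 0.05 / 2)        # two-tailed critical z (about 1.96)
z_one = nd.inv_cdf(1 - 0.05)            # one-tailed critical z (about 1.645)
two_tailed = nd.cdf(delta - z_two) + nd.cdf(-delta - z_two)
one_tailed = nd.cdf(delta - z_one)      # all of alpha in the expected tail
print(round(two_tailed, 2), round(one_tailed, 2))   # 0.67 0.77
```

The computed one-tailed power (.77) is a bit below the eyeball estimate of .80, but the ordering holds: the one-tailed test is more powerful.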

PROBLEMS
4 From memory, list four factors that affect power.
5 Write a sentence about each of the four factors, explaining how power changes as that
factor is reduced or is changed.

STEPS IN CONDUCTING A POWER ANALYSIS

ESTIMATING OR CHOOSING AN EFFECT SIZE


An important step in a power analysis is to either estimate or to choose a value for d, the
effect size index. From your earlier study, you know that the formula for finding the effect
size index, d, is
d = (µ1 − µ0) / σ

In most situations, you don't have any way to know the value of population parameters
such as µ and σ. Even estimates require data. Nevertheless, to do a power analysis, you
must start with a value for d. There are several ways to meet this requirement.

• Previous research. Sometimes you can estimate d by using statistics from published
studies on similar topics.
• Practical considerations. In some cases, a researcher can identify a minimum
difference that would be worth finding out about. Oftentimes, these situations are ones in
which a researcher is doing applied research.
• Conventional values of small, medium, and large. As you may recall from Chapter 4
in Tales, Jacob Cohen proposed that d values of .20, .50, and .80 be labeled small, medium,
and large. If neither of the first two solutions works, the researcher might choose one of
these conventional values for d.

ESTIMATING POWER
The actual process of estimating power involves a formula and a table. Look at Table M now
(last page in this chapter). The figures in the body of the table are power probabilities. To
enter the table, select an α value (which will be for a two-tailed test). In addition, you will
need a value for δ (delta), which is explained below. At the intersection of α and δ, you have
the approximate power available for your particular analysis.

The formula for δ is

δ = d[f(N)]

The value of d comes from the first step in your power analysis. The expression f(N) is a
general expression. Its specific form depends on which statistical test you are
finding power for. For a specific statistical test, there is a specific formula for f(N). Once you
have a value for δ, enter Table M to find an estimate of the power available for your test.
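If you would rather compute than interpolate, the normal-curve approximation that Table M tabulates can be written directly. A minimal sketch (table_m_power is my name, not the textbook's):

```python
from statistics import NormalDist

def table_m_power(delta, alpha=0.05):
    """Approximate two-tailed power, the quantity Table M tabulates:
    normal-curve area beyond the critical values, centered at delta."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    return nd.cdf(delta - z_crit) + nd.cdf(-delta - z_crit)

print(round(table_m_power(2.80, 0.05), 2))   # Table M's 2.80 row, alpha .05: .80
```

Checking a few rows of Table M against this function is a good way to convince yourself the two agree.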

ONE SAMPLE t TEST

POWER
For a one-sample t test, f(N) = √N. Thus, for a one-sample t test,

δ = d[f(N)] = d√N

You may recall from Chapter 8 in the textbook that you analyzed data on the weight of
tortilla chips in packages that claimed to contain 269.3 grams. The mean weight of the 8
packages I weighed was 270.675 grams; the standard deviation was 0.523 grams. I'll use
those data to find a value for d. With d in hand, we can find how much power went into that
one-sample t test.
d = (µ1 − µ0) / σ = (270.675 − 269.3) / 0.523 = 2.63

With N = 8, δ = d√N = 2.63√8 = 7.44. To find power, enter Table M with the δ value of
7.44. The largest δ value in Table M is 5.00, which indicates power of .99 for an α level of
.01. Thus, given the very large effect size of 2.63, even a small sample of 8 was more than
adequate to detect the fact that the Frito-Lay company puts more in their tortilla chip
packages than the advertised amount.
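The arithmetic above can be sketched in a few lines, using the Chapter 8 numbers as given in the text:

```python
import math

# Tortilla-chip data from the text: claimed 269.3 g, sample of 8
mu_claim, mean, sd, n = 269.3, 270.675, 0.523, 8
d = (mean - mu_claim) / sd           # effect size index
delta = d * math.sqrt(n)             # one-sample t: delta = d * sqrt(N)
print(round(d, 2), round(delta, 2))  # 2.63 7.44
```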

Here’s a second problem, which is based on an example in Howell (2002). A clinical


psychologist thinks that people who seek psychological help are more intelligent than those
in the general population. (IQ scores have a mean of 100 and a standard deviation of 15.)
Based on her experience, she thinks that patients are about five IQ points higher than
average. In addition, she knows she can fairly easily obtain IQ scores on a sample of 25
patients. Her power analysis follows.
d = (µ1 − µ0) / σ = (105 − 100) / 15 = 0.33

With N = 25, δ = d√N = 0.33√25 = 1.65. From Table M, she discovers that, with α = .05
(two-tailed test), power is between .36 and .40. Using linear interpolation, she concludes
that the power for her experiment is about .38.
Power of .38 is not very encouraging. Her most practical solution is to increase sample size.
How much of an increase is needed? The next section shows how to determine N for a
specified amount of power.

DETERMINING N
Let's suppose that our clinical psychologist has decided to determine the sample size
required for power = .80. Again, she lets d = 0.33. From Table M, she finds that power of
.80 corresponds to a δ value of 2.80. She looks at the basic formula for delta,

δ = d√N

and sees that she has all the elements except N. So, she solves for N and then substitutes
the knowns into the formula for N.

N = (δ/d)² = (2.80/0.33)² = (8.48)² = 71.91 ≈ 72

Thus, to have an 80 percent chance of detecting an IQ difference of 5 points using a two-
tailed test with α = .05, she needs IQ scores from 72 patients.
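The sample-size step is easy to automate. A minimal sketch (the function name n_needed_one_sample is mine), rounding up because you can't test a fraction of a patient:

```python
import math

def n_needed_one_sample(delta, d):
    """Sample size for a one-sample t test: N = (delta/d)**2, rounded up."""
    return math.ceil((delta / d) ** 2)

# Power .80 at alpha = .05 corresponds to delta = 2.80 in Table M
print(n_needed_one_sample(2.80, 0.33))   # 72 patients
```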
For the last example in this section, let’s return to tortilla chips. Suppose that you want to
conduct your own study of tortilla weights, but you haven’t discovered the data in the
textbook. How can you get an estimate of d? Let’s try reason.

Perhaps you might reason like this: Because chips are very cheap to produce and public
outcry over packages sold light (underweight) is a big problem, a manufacturer will make
sure that the customer gets more than the package claims. Using this reasoning, you decide
that the effect size is big.

Begin by assuming an effect size index of 1.00. Next, choose a large value for power, say
.95. With these decisions in hand, you can find the N needed for your experiment. Start
with the formula for δ for a one-sample t test.

δ = d√N

Once again, by rearranging this formula and solving for N, you get

N = (δ/d)²

The next step is to enter Table M. Choose an α value of .05 and then go down until
you find a power value of .95. The δ value is 3.60. Now you have the elements you
need to solve for N.

N = (δ/d)² = (3.60/1.00)² = 12.96 ≈ 13

Thus, the plan for the experiment calls for the purchase of 13 bags of tortilla chips with the
expectation that if the effect size index is 1.00, you would have a .95 probability
of detecting that Frito-Lay puts more chips or less chips in their bags. (Being able to detect
more or less comes from the fact that Table M gives values for a two-tailed test.)

INDEPENDENT SAMPLES t TEST

POWER
In the case of the independent-samples t test,

f(N) = √(N/2)

The value for N depends on whether or not the sample sizes are equal. When both
samples are the same size, then N = N1 = N2. When N1 ≠ N2, the value for N is given
by the formula

N = 2(N1)(N2) / (N1 + N2)

Thus, for an independent-samples t test,

δ = d√(N/2)

What are the effects on children of spending their days in a child-care center? Sandra Scarr
investigated this question; in addition, she wrote a review article for the February, 1998
American Psychologist. The problem that follows is based on material in her article.

Her analysis showed that the behavioral adjustment scores of children who spent 24 or more
months in a day-care center were not significantly different from those who had been cared
for at home. At this point, there are two possible interpretations. One is that the behavior of
day-care children (all of them) is not different from the behavior of children cared for at home
(all of them). The other possibility is that the sample data produced a Type II error. That is,
there is a difference in the populations but the sample data failed to detect it. A power
analysis helps you decide between these two alternatives.

From the data in Scarr’s report, a d value of 0.15 was calculated. The N in this study was
1100, with 550 in each group. Working from the formula that appeared previously,

δ = d√(N/2) = 0.15√(1100/2) = 0.15(23.45) = 3.52
From Table M, you find that the power available in this study was .94 for a two-tailed test
with α = .05. Thus, our analysis shows that a great deal of power was devoted to detecting
any difference that exists between maternal and non-maternal child care, but that no
significant difference was found. With all this power, a Type II error seems unlikely. In any
case, if there is a difference, the power analysis suggests that the effect size index is quite
small.

DETERMINING N
A common question among all researchers, from beginners to experts, is, “How many
participants will I need for my study?” Let’s eavesdrop on a conversation with a local
research methods guru. Put yourself in the scenario that follows.

You: How many participants do I need in my two-group study?
Me: I'd just run as many as I could, without beating myself to death. Just do a reasonable number.
You: Isn't there some scientific way to decide?
Me: Sure, let's do a power analysis and see what it turns up for you.
You: Do I need an Ouija Board?
Me: Nope, but a calculator would be handy. Have one in your backpack?
You: Sure, always!
Me: O.K., how much power do you want?
You: Well, .80, I guess. Isn't that the right answer?
Me: It's a good answer; we'll work with that. How big of an effect size index are you
working with? Do you have data to make an estimate from?


You: No, not really. I’m using a comprehension test that I made up. I’ll just assume that I
am working with a medium effect size index. I just know that the two levels of my
independent variable will produce a difference in the dependent variable.
Me: Hmmmm. (nods) Let’s see, we have enough information to do the analysis and get an
answer to your question. Let’s assume that you are able to get equal numbers of participants
in your two groups. We’ll find the N for each group and then multiply by 2.

With equal numbers in each group, N1 = N2 = N. We’ll start with the formula for δ
for an independent-samples t test,

δ = d√(N/2)

Now, let's isolate N,

N = 2(δ/d)²

O.K., we'll plug in your medium effect size for d and the δ value for .80 power…

N = 2(δ/d)² = 2(2.80/0.50)² = 2(31.36) = 62.72

Now, we can multiply the 62.72 by 2… 2(62.72) = 125.44

You: Uh-Oh.
Me: Yeah, too bad, but better to know than to delude yourself. For your independent project,
it’s no deadly sin to have low power. The main purpose of an independent project is to get
you started doing research on your own.
You: Yeah, but…. Anyway, I’m still thinking …..

This example shows you a fact about independent-samples t tests: they are not very
powerful. That is, you need more than 100 participants to have enough power to detect a
medium effect size in four experiments out of five.
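The guru's arithmetic, as a sketch (variable names are mine); note the doubling that makes independent-samples designs expensive:

```python
delta, d = 2.80, 0.50                # power .80 (Table M) and a medium effect size
n_per_group = 2 * (delta / d) ** 2   # solved from delta = d * sqrt(N/2)
total = 2 * n_per_group              # two groups
print(round(n_per_group, 2), round(total, 2))   # 62.72 125.44
```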

PAIRED-SAMPLES t TEST

POWER
For a paired-samples t test, f(N) = √N. Thus,

δ = d√N

where N is the number of pairs of scores in the analysis.

DETERMINING N
The formula for N for a paired-samples t test is the same as that for the one-sample t
test you learned earlier.

N = (δ/d)²
I’ll work just one problem here and see if it leads to any insight. What sample size is needed
to detect a medium effect size with a power of .80? By now, perhaps you can fill in the
formula from memory rather than by consulting tables? Here’s the formula:
N = (δ/d)² = (2.80/0.50)² = 31.36

Because N means pairs in paired-samples designs, about 62 participants are needed to
detect a medium effect size in four experiments out of five.
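The paired-samples count above can be sketched the same way (variable names are mine; the participant count assumes, as the text does, that each pair is two different people):

```python
# Paired-samples t: N is the number of PAIRS, so N = (delta/d)**2 pairs
pairs = (2.80 / 0.50) ** 2      # power .80 (delta = 2.80), medium d = 0.50
participants = 2 * pairs        # two different people per pair
print(round(pairs, 2), round(participants, 2))   # 31.36 62.72
```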

Me: How’s your thinking going?


You: Ha! I got my idea!
Me: I figured you would.
PEARSON r

POWER
For a Pearson r,

f(N) = √(N − 1)

where N is the number of pairs in the data.

The effect size index for a Pearson correlation coefficient is some symbol which hasn't
been agreed upon by statisticians. For now, we'll use r. (That is what Cohen uses.) Using r
as the symbol for an effect size index has the advantage of telling us it is associated with a
Pearson correlation coefficient. The disadvantage of using r is that it already has another
meaning – it is the symbol for a Pearson correlation coefficient.

To find δ so you can enter Table M to find power,

δ = d[f(N)] = d√(N − 1)

How much power is available to detect a moderate correlation coefficient of .30 using 30
pairs of scores? A sample of 30 is a commonly recommended sample size.

δ = .30√(30 − 1) = .30(5.39) = 1.62

Consulting Table M in the α = .05 column, you find that power = .36 for δ = 1.62. Thus, for a
correlation study the commonly recommended sample size of 30 pairs is not a good
recommendation. A sample size of 30 pairs doesn’t provide enough power to detect a
moderate sized r. So, how about 100 pairs? Now,

δ = .30√(100 − 1) = .30(9.95) = 2.98

For δ = 2.98, power is about .85. Thus, for detecting a modest .30 correlation coefficient, a
better recommendation is a sample size of 100 pairs.
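Both correlation examples can be run through the same normal-curve approximation. A sketch (r_power is my name, not the textbook's):

```python
import math
from statistics import NormalDist

def r_power(r, n_pairs, alpha=0.05):
    """Approximate power for testing a Pearson r:
    delta = r * sqrt(N - 1), then the Table M normal-curve logic."""
    nd = NormalDist()
    delta = r * math.sqrt(n_pairs - 1)
    z_crit = nd.inv_cdf(1 - alpha / 2)
    return delta, nd.cdf(delta - z_crit) + nd.cdf(-delta - z_crit)

for n in (30, 100):
    delta, p = r_power(0.30, n)
    print(n, round(delta, 2), round(p, 2))
# 30 pairs: delta 1.62, power roughly .36; 100 pairs: delta 2.98, power about .85
```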

DETERMINING N
Solving the equation δ = d√(N − 1) for N,

N = (δ/d)² + 1

I’ll use the formula for N to determine a sample size that has a .95 probability of detecting
an r = .40. For this problem, δ = 3.60.
N = (δ/d)² + 1 = (3.60/.40)² + 1 = 81 + 1 = 82 pairs
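The same computation in a couple of lines (variable names mine), using the Table M δ of 3.60 for power = .95 at α = .05:

```python
delta, r = 3.60, 0.40          # power .95 at alpha .05 -> delta = 3.60 (Table M)
n_pairs = (delta / r) ** 2 + 1
print(round(n_pairs))          # 82 pairs
```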
CONCLUSION

Scientific experimenters use NHST statistics to reach conclusions about natural phenomena.
As also happens with other approaches, statistical conclusions can be wrong at times. One of
the beauties of statistics is that a power analysis allows you to calculate the probability of
being right, if your preliminary conclusions about nature are correct.

This chapter provides an introduction to the statistical power of NHST tests. Jacob Cohen’s
4-page, 1992 primer on power in Current Directions in Psychological Science is another
good introduction. Of course, power can be calculated for tests other than t-tests and
correlation coefficients. Howell (2002) shows power calculations for ANOVA and factorial
ANOVA. Cohen’s article shows power formulas for chi square.
PROBLEMS

6. Some years ago, the mean ACT score was about 23 for freshmen at fairly selective
liberal arts schools. Now it is about 27. The standard deviation on this national test is
6. Suppose some group wanted to “scientize” its claim that scores are significantly
higher now by running a t test on a random sample of 100 students from each of the
two eras. How much power would there be in such a test?

7. Your statistics textbook describes an early study on smoking and lung cancer in
which the consumption rate and cancer rate for 11 countries was correlated. How much
power was available to detect this relationship, if the relationship is r = .50?

8. Suppose that the population correlation coefficient for the relationship between
smoking and lung cancer is .75 (which it is). How much power is available to detect that
there is a relationship if the study is based on 11 countries?

9. A pharmaceutical lab had tons of data on the latency of rats’ jumping response to a
light that preceded a shock. The shock occurred unless the rat responded by jumping a
barrier. The latency to the light was 4.0 seconds with a standard deviation of 1.5 seconds.
The researchers tested a drug that was expected to cause time distortion. They decided that
any drug effect that changed latency less than one-half second was trivial. (This decision was
based on changes caused by other drugs.) What size sample is needed for their one-sample t
test, if they want to have power of .80?

10. Create a table. The first column lists the four statistical tests in this chapter. The next
three columns are labeled small, medium, and large. The spanner over these three columns is
effect size index. Conventional values go in these cells. (Use .10, .30, and .50 as small,
medium, and large effect size indexes for r.) The fifth column is labeled “formula for f (N)”;
the cells contain the formulas for each test. Columns 6-8 are labeled, small, medium, and
large. The spanner is “Sample size for power = .80.” Compute sample sizes appropriate for
these 12 cells.

11. Review the Objectives at the beginning of the chapter.


ANSWERS TO PROBLEMS

1 Adequate power is .80.


2 A Type I error (the probability of which is symbolized α) is rejecting the null
hypothesis when it is true. A Type II error (the probability of which is symbolized β) is
failing to reject the null hypothesis when it is false.
3 To my eye about 70 percent of the sampling distribution of the mean from
Population 2 is to the right of the right side critical region of the null hypothesis curve.
None is to the left of the left side critical region.
4 α, the actual difference between population means, N, and a one-tailed or two-tailed
statistical test.
5 As α is reduced (as from .05 to .01), power is reduced. As the actual difference
between population means is reduced, power is reduced. As N is reduced, power is reduced.
A two-tailed statistical test is less powerful than a one-tailed test.

6. d = (23 − 27)/6 = −.667
δ = d√(N/2) = .667√(100/2) = .667(7.071) = 4.72
Power ≥ .99

Interpretation: There is plenty of power to detect a difference of 4 ACT points with


samples of 100.

7. δ = .50√(11 − 1) = .50(3.162) = 1.58
Power = .35
Interpretation: There is not much power to detect a correlation coefficient of .50 using just
11 pairs of data.

8. δ = .75√(11 − 1) = .75(3.162) = 2.37
Power = .66
Interpretation: Even for a correlation coefficient of .75, there is not much power to
detect it using just 11 pairs of data.
9 The minimum change in mean latency that was worth detecting was 0.5 seconds.
Thus, the minimum effect size index is

d = 0.5/1.5 = 0.33

N = (δ/d)² = (2.80/0.33)² = 71.99 ≈ 72

2(72) = 144 rats

Interpretation: The researchers need a gross of rats (144) to detect that the drug
distorts time perception to a degree that they consider worthwhile to know.

10.

                   Effect size index       Formula       Sample size for power = .80
Test               Small   Med    Large    for f(N)      Small        Med          Large
One-sample t       .20     .50    .80      √N            196          31.36        12.25
Independent t      .20     .50    .80      √(N/2)        784          125.44       49
Paired t           .20     .50    .80      √N            392          62.72        24.5
Correlation r      .10     .30    .50      √(N − 1)      785 pairs    88.11 pairs  32.36 pairs

(For the independent-samples t test, sample sizes are totals across both groups; for the
paired-samples t test, sample sizes are participants, twice the number of pairs.)

Table M Approximate Power as a Function of Delta and Significance Level

            Alpha Level for a Two-Tailed Test
delta (δ)   .10    .05    .02    .01
1.00        .26    .17    .09    .06
1.10        .29    .19    .11    .07
1.20        .33    .22    .13    .08
1.30        .37    .25    .15    .10
1.40        .40    .29    .18    .12
1.50        .44    .32    .20    .14
1.60        .48    .36    .23    .16
1.70        .52    .40    .26    .19
1.80        .56    .44    .30    .22
1.90        .60    .48    .33    .25
2.00        .64    .52    .37    .28
2.10        .68    .56    .41    .32
2.20        .71    .59    .45    .35
2.30        .74    .63    .49    .39
2.40        .77    .67    .53    .43
2.50        .80    .71    .57    .47
2.60        .83    .74    .61    .51
2.70        .85    .77    .64    .55
2.80        .88    .80    .68    .59
2.90        .90    .83    .72    .63

Source: Computed by the author with SPSS.
