
Paper 265-25

Sample Size Computations and Power Analysis with the SAS® System

John M. Castelloe, SAS Institute Inc., Cary, NC

Abstract

Statistical power analysis characterizes the ability of a study to detect a meaningful effect size—for example, the difference between two population means. It also determines the sample size required to provide a desired power for an effect of scientific interest. Proper planning reduces the risk of conducting a study that will not produce useful results and determines the most sensitive design for the resources available. Power analysis is now integral to the health and behavioral sciences, and its use is steadily increasing wherever empirical studies are performed. SAS Institute is working to implement power analysis for common situations such as t-tests, ANOVA, comparison of binomial proportions, equivalence testing, survival analysis, contingency tables and linear models, and eventually for a wide range of models and designs. An effective graphical user interface reveals the contribution to power of factors such as effect size, sample size, inherent variability, type I error rate, and choice of design and analysis. This presentation demonstrates issues involved in power analysis, summarizes the current state of methodology and software, and outlines future directions.

Introduction

Suppose you have performed a small study and are disappointed to find that the results are unexpectedly insignificant. Where did you go wrong? You may need to do a larger study to detect the effect you suspect, but how much larger?

Alternatively, suppose you have performed a large study and found a hugely significant effect. In follow-up studies, can you make more efficient use of resources by using smaller sample sizes?

Power analysis can optimize the resource usage and design of a study, improving chances of conclusive results with maximum efficiency. Power analysis is most effective when performed at the study planning stage, and as such it encourages early collaboration between researcher and statistician. It also focuses attention on effect sizes and variability in the underlying scientific process, concepts that both researcher and statistician should consider carefully at this stage. Muller and Benignus (1992) and O'Brien and Muller (1993) provide cogent discussions of these and related concepts. These references also provide a good general introduction to power analysis.

There are many factors involved in a power analysis, such as the research objective, design, data analysis method, power, sample size, type I error, variability, and effect size. By performing a power analysis, you can learn about the relationships between these factors, optimizing those that are under your control and exploring the implications of those that are fixed.

For the purposes of statistical testing, the research objective is usually to use a feasible sample of data to assess a given hypothesis, H1, that some effect exists in a much larger population of potential data. If the sample data lead you to conclude that H1 is true, but the opposite is really the case—that is, if the (null) hypothesis H0 is true that there really is no effect—this is called a type I error. The probability of a type I error is usually designated "alpha" or α, and statistical tests are designed to ensure that α is suitably small (for example, less than 0.05). But it is also important to control the probability β of making the opposite (type II) error—that is, concluding H0, that there is no effect, when there really is one. The probability 1 − β of correctly rejecting H0 when it is false is traditionally called the power of the test. (Note, however, that another more technical definition of power is the probability of rejecting H0 for any given set of circumstances, even those corresponding to H0 being true.)

Power analysis is often problematic in practice, being performed infrequently or improperly. There are several reasons for this: it is technically complicated, usually under-represented in statistical curricula, and often not performed early enough to be effective (that is, in the study planning stage). Good software tools for power analysis can alleviate these difficulties and help you to benefit from these techniques.
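The relationship between α and power described above can be seen by simulation. The following is an illustrative sketch (not part of the paper, and in Python rather than SAS, assuming the scipy library is available): simulated two-sample studies are repeatedly tested, and the fraction of rejections estimates α when there is no true effect and estimates power, 1 − β, when there is one.

```python
# Illustrative sketch: estimating alpha and power by simulation
# for a two-sample t-test (assumes numpy and scipy are available).
import numpy as np
from scipy import stats

rng = np.random.default_rng(12345)
ALPHA = 0.05

def rejection_rate(mu1, mu2, sd=15.0, n=25, reps=4000):
    """Fraction of simulated studies in which H0 (no effect) is rejected."""
    rejections = 0
    for _ in range(reps):
        group_a = rng.normal(mu1, sd, n)
        group_b = rng.normal(mu2, sd, n)
        if stats.ttest_ind(group_a, group_b).pvalue < ALPHA:
            rejections += 1
    return rejections / reps

# With no true effect, the rejection rate estimates alpha (type I error rate).
print(rejection_rate(120, 120))   # near 0.05
# With a true effect, the rejection rate estimates power, 1 - beta.
print(rejection_rate(120, 132))   # near 0.79
```

The parameter values here anticipate the clinical-trial example later in the paper, where the same design is shown analytically to have about 79% power.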

Some Power Analysis Scenarios

There are several different varieties of power analysis. Here are a few simple scenarios:

- A statistician in a manufacturing company is reviewing a proposed experiment designed to assess the effect of various operating conditions on the quality of a product. He would like to conduct a power analysis to see if the planned number of replications and experimental units will be sufficient to detect the postulated effects.

- An advertising executive is interested in studying several alternative marketing strategies, with the aim of deciding how many and which strategies to implement. She would like to get a ballpark idea of how many mailings are necessary to detect differences in response rates.

- A study performed by a behavioral scientist (without a prior power analysis) did not detect a significant difference between success rates of two alternative therapies. He is considering repeating the study, but would first like to assess the power of the first study to detect the minimal effect size in which he is interested. A finding of low power would encourage him to repeat the study with a larger sample size or more efficient design.

Perhaps the most basic distinction in power analysis is that between prospective and retrospective analyses. In the examples above, the first two are prospective, while the third is retrospective. A prospective power analysis looks ahead to a future study, while a retrospective power analysis attempts to characterize a completed study. Sometimes the distinction is a bit fuzzy: for example, a retrospective analysis for a recently completed study can become a prospective analysis if it leads to the planning of a new study to address the same research objectives with improved resource allocation.

Although a retrospective analysis is the most convenient kind of power analysis to perform, it is often uninformative or misleading, especially when power is computed for the observed effect size. See the section "Effective Power Analysis" for more details.

Power analyses can also be characterized by the factor(s) of primary interest. For example, you might want to estimate power, determine required sample size, or assess detectable effect sizes. Sometimes the research goal involves the largest acceptable confidence interval width instead of the significance of a hypothesis test; in this case, rather than considering the criterion of power, you would focus on the probability of achieving the desired confidence interval precision. There are also Bayesian approaches to sample size determination for estimating parameters or maximizing utility functions.

Example: Prospective Analysis for a Clinical Trial

The purpose of this example is to introduce some of the issues involved in power analysis and to demonstrate the use of some simple SAS® software tools for addressing them. Let's say you are a clinical researcher wanting to compare the effect of two drugs, A and B, on systolic blood pressure (SBP). You have enough resources to recruit 25 subjects for each drug. Will this be enough to ensure a reasonable chance of establishing a significant result if the mean SBPs of patients on each drug really differ? In other words, will your study have good power? The answer depends on many factors:

- How big is the underlying effect size that you are trying to detect? That is, what is the population difference in mean SBP between patients using drug A and patients using drug B? Of course, this is unknown; that is why you are doing the study! But you can make an educated guess or set a goal for the detectable effect size. Then the power analysis determines the chance of detecting this conjectured effect size. For example, suppose you have some results from a previous study involving drug A, and you believe that the mean SBP for drug B differs by about 10% from the mean SBP for drug A. If the mean SBP for drug A is 120, you thus posit an effect size of 12.

- What is the inherent variability in SBP? Suppose previous studies involving drug A have shown the standard deviation of SBP to be between 11 and 15, and that the standard deviations are expected to be roughly the same for the two drugs. You want to consider this range of variability in your power analysis.

- What data analysis method and level of type I error should you use? You decide to use the simple approach of a two-sample t-test (assuming equal variances) with α = 0.05. To be conservative you use a two-sided test, although you suspect the mean SBP for drug B is higher.

With these specifications, the power can be computed using the noncentral F distribution. The following SAS statements compute this power for the standard deviation of 15:

data twosample;
   Mu1=120; Mu2=132; StDev=15;
   N1=25; N2=25; Alpha=0.05;
   NCP = (Mu2-Mu1)**2 / ((StDev**2)*(1/N1 + 1/N2));
   CriticalValue = FINV(1-Alpha, 1, N1+N2-2, 0);
   Power = SDF('F', CriticalValue, 1, N1+N2-2, NCP);
run;

proc print data=twosample;
run;
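As an illustrative cross-check outside SAS (not part of the paper; assumes the scipy library), the same noncentral-F computation can be written in Python:

```python
# Python equivalent of the SAS two-sample power computation (illustrative).
from scipy.stats import f, ncf

mu1, mu2, sd, n1, n2, alpha = 120, 132, 15, 25, 25, 0.05
ncp = (mu2 - mu1) ** 2 / (sd ** 2 * (1 / n1 + 1 / n2))  # noncentrality, = 8.0
crit = f.ppf(1 - alpha, 1, n1 + n2 - 2)                 # central F critical value (FINV)
power = ncf.sf(crit, 1, n1 + n2 - 2, ncp)               # upper-tail probability (SDF)
print(round(power, 2))  # 0.79
```

This agrees with the roughly 79% power reported below for the SAS statements.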

The noncentrality parameter NCP is calculated from the conjectured means Mu1 and Mu2, sample sizes N1 and N2, and common standard deviation StDev. The critical value of the test statistic is then computed, and the power is the probability of a noncentral-F random variable with noncentrality parameter NCP, one numerator degree of freedom, and N1 + N2 − 2 denominator degrees of freedom exceeding this critical value. This probability is computed using the DATA step function SDF, which calculates survival distribution function values. In general, SDF = 1 − CDF; the SDF form is more accurate for values in the upper tail. The CDF and SDF functions, introduced in Release 6.11 and 6.12 of the SAS System, respectively, are documented in SAS Institute Inc. (1999b). Their use is recommended for applications requiring their enhanced numerical accuracy.

The resulting power is about 79%. If you would really like a power of 85% or more when the standard deviation is 15, then you will need more subjects. How many? One way to investigate required sample size is to construct a power curve, as shown in Figure 1. This curve was generated using the Sample Size task in the Analyst Application. Note that a sample size of 30 for each group would be sufficient to achieve 85% power.

Figure 1. Power Curve for Two-Sample t-test

Now suppose that a colleague brings to your attention the possibility of using a simple AB/BA cross-over design. Half of the subjects would get 6 weeks on drug A, a 4-week washout period, and then 6 weeks on drug B; the other half would follow the same pattern but with drug order switched. Assuming there are no period or carry-over effects, you can use a paired t-test to assess the difference between the two drugs. Each pair consists of the SBP for a patient while using drug A and the SBP for that same patient while using drug B. Suppose previous studies have shown that there is correlation of roughly ρ = 0.8 between pairs of SBP measurements for each subject. What would the power for the study be if you use this cross-over design with 25 subjects? You simply need to calculate the standard deviation of a pair difference, which is given by

   σ_D = √(σ1² + σ2² − 2ρσ1σ2)

where σ1 and σ2 are the standard deviations for the two drug types (assumed to be equal in this case). The resulting values are σ_D = 6.96 when σ1 = σ2 = 11, and σ_D = 9.49 when σ1 = σ2 = 15. The following SAS statements compute the power for the larger standard deviation:

data paired;
   Mu1=120; Mu2=132; StDev1=15; StDev2=15;
   Corr=0.8; N=25; Alpha=0.05;
   StDevDiff = sqrt(StDev1**2 + StDev2**2 - 2*Corr*StDev1*StDev2);
   NCP = (Mu2-Mu1)**2 / (StDevDiff**2/N);
   CriticalValue = FINV(1-Alpha, 1, N-1, 0);
   Power = SDF('F', CriticalValue, 1, N-1, NCP);
run;

proc print data=paired;
run;
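The paired-design computation can be cross-checked the same way (again an illustrative Python sketch, not part of the paper; assumes scipy). The pair-difference standard deviation folds in the within-subject correlation, which is what drives the power gain:

```python
# Python equivalent of the paired (cross-over) power computation (illustrative).
from math import sqrt
from scipy.stats import f, ncf

mu1, mu2, sd1, sd2, corr, n, alpha = 120, 132, 15, 15, 0.8, 25, 0.05
sd_diff = sqrt(sd1**2 + sd2**2 - 2 * corr * sd1 * sd2)  # about 9.49 when both SDs are 15
ncp = (mu2 - mu1) ** 2 / (sd_diff ** 2 / n)             # noncentrality, = 40.0
crit = f.ppf(1 - alpha, 1, n - 1)                       # central F critical value (FINV)
power = ncf.sf(crit, 1, n - 1, ncp)                     # upper-tail probability (SDF)
print(power > 0.99)  # True
```

The much larger noncentrality parameter (40 versus 8 for the two-sample design) reflects how strongly the correlation of 0.8 shrinks the variance of the pair differences.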

The resulting power is over 99% with 25 subjects. A power curve generated using the Analyst Application, displayed in Figure 2, reveals that 85% power
