To cite: Quach NE, Yang K, Chen R, et al. Post-hoc power analysis: a conceptually valid approach for power based on observed study data. General Psychiatry 2022;35:e100764. doi:10.1136/gpsych-2022-100764

Received 01 July 2022
Accepted 23 August 2022

© Author(s) (or their employer(s)) 2022. Re-use permitted under CC BY-NC. No commercial re-use. See rights and permissions. Published by BMJ.

1 Division of Biostatistics and Bioinformatics, Herbert Wertheim School of Public Health and Human Longevity Science, UC San Diego, La Jolla, California, USA
2 Department of Orthopedics, Emory Healthcare, Emory University, Atlanta, Georgia, USA
3 Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, Shanghai, China

Correspondence to Dr Xinlian Zhang; xizhang@health.ucsd.edu

Abstract
Power analysis is a key component of planning prospective studies such as clinical trials. However, some journals in biomedical and psychosocial sciences request power analysis for data already collected and analysed before accepting manuscripts for publication. Many have raised concerns about the conceptual basis for such post-hoc power analyses. More recently, Zhang et al showed by using simulation studies that such power analyses do not indicate true power for detecting statistical significance, since post-hoc power estimates vary in the range of practical interest and can be very different from the true power. On the other hand, journals' requests for information about the reliability of statistical findings in a manuscript with small sample sizes are justified, since the sample size plays an important role in the reproducibility of statistical findings. The problem is the wording of the journals' request: the current power analysis paradigm is not designed to address journals' concerns about the reliability of statistical findings. In this paper, we propose an alternative formulation of power analysis to provide a conceptually valid approach to the journals' wrongly worded but practically significant concern.

Introduction
Power analysis is critical to designing and planning prospective studies in biomedical and psychosocial research. It provides the critically important sample sizes needed to detect statistically significant and clinically meaningful treatment differences and to evaluate cost–benefit ratios, so that studies can be conducted with minimal resources without compromising scientific integrity and rigour. Thus, power analysis is informative for prospective studies, that is, studies that are yet to be conducted. However, the last author of this paper has been receiving numerous requests from domain experts to perform power analysis for data already analysed and reported in submitted manuscripts. Although the reasons for such 'post-hoc' power analysis are never provided by the journals considering publication of the manuscripts, our understanding is that they are likely due to the sample sizes of the data analysed, that is, whether the limited sample sizes are sufficient to reliably detect the significant treatment differences reported in the manuscripts.

As statistical power describes the probability, or likelihood, of an event occurring in the future, such as a statistically significant treatment or exposure effect in a study, post-hoc power analysis is clearly flawed: power analysis is being performed for an event that has already occurred (ie, the treatment or exposure difference already exists in the study data), regardless of whether the difference is statistically significant. Many have raised concerns on such conceptual grounds.1–5 Despite these efforts, some journals continue to request post-hoc power analysis as part of their decision-making process in publishing manuscripts. On the other hand, even if an approach or method is conceptually flawed, it may still provide useful information. For example, for addressing missing follow-up data in longitudinal studies, last observation carried forward (LOCF) is conceptually flawed when used as a general statistical strategy to deal with missing data during follow-up assessments. However, in some cases, LOCF is still used to provide information about treatment differences. Consider a longitudinal study on a disease of interest in which the subjects' health conditions deteriorate over time. Estimates of changes over time under LOCF provide information on the mean change of health conditions in the best scenario, since missing follow-up data are likely due to deteriorated health conditions.

Unfortunately, this is not the case with post-hoc power analysis. Zhang et al1 examined the utility of post-hoc power analysis in comparing two groups using simulation and found that post-hoc power estimates are generally not informative about the true treatment difference unless used for a large effect size and/or a large sample size. For medium
Figure 1 Histogram of power from existing post-hoc power function, along with power from the prospective power function
for sample size n=50, based on 1000 Monte Carlo runs for effect size (A) Δ=0.2, (B) Δ=0.5 and (C) Δ=0.8.
Figure 2 Histogram of power from a proposed post-hoc power function, along with power from the prospective power
function for sample size n=50, based on 1000 Monte Carlo runs for effect size (A) Δ=0.2, (B) Δ=0.5 and (C) Δ=0.8.
as a function of manufacturers in addition to differences between SUVs and sedans using linear regression, we will get different estimates of regression parameters (coefficients) and standard errors, but the same test statistics (F and t statistics) and p values.

In clinical research, the magnitude of d is used to indicate meaningful treatment differences or exposure effects. This is because statistical significance is a function of sample size, and any small treatment difference can become statistically significant with a sufficiently large sample size. Thus, statistical significance cannot be used to define the magnitude of treatment or exposure effects. Defined only by the population parameters, the effect size is a meaningful measure of the magnitude of treatment or exposure effects. Equation 3 indicates that both the null and alternative hypotheses involve only population parameters. This characterisation of the statistical hypothesis is critically important, since this fundamental assumption is violated when performing post-hoc power analysis.

For power analysis, we want to determine the probability of rejecting the null H0 in favour of the alternative Ha under Ha for the hypothesis in Equation 3. To compute power, we need to specify the H0, Ha, type I error α, and the cumulative distribution function Φ of the standard normal distribution N(0,1). If we condition on the null H0 instead of Ha in Equation 4, we obtain the type I error, as in Equation 5:

\text{type I error}(n_1, n_2, \alpha, H_0) = \Pr(\text{Reject } H_0 \mid H_0) = \Pr(|Z| \ge Z_{\alpha/2}) = 2\left[1 - \Phi(Z_{\alpha/2})\right] = \alpha \quad (5)

which is the probability of rejecting the null H0 when the null is true. For power analysis, we generally set the type I error at α=0.05, so that Z_{α/2} = Z_{0.025} = 1.96.

In practice, we often set power at some prespecified level and then perform power analysis to determine the minimum sample size needed to detect a prespecified effect size Δ with the desired level of power. For example, if we want to determine the sample size n per group to achieve, say, 0.8 power with equal samples between the two groups, we can obtain such minimum n by solving for n in the following equation7:

\psi(n, n, \alpha, H_a(d = \Delta)) \ge 0.8 \quad (6)

When applying Equation 3 for a post-hoc power analysis in a study, we substitute the observed effect size Δ̂n in Equation 4, which gives the post-hoc power function in Equation 7:
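As a concrete illustration of Equation 6, the following Python sketch searches for the smallest per-group n that reaches the target power for a two-sample z-test with known variance. This is an assumption-laden illustration (SciPy-based, function names our own), not the authors' code:

```python
from math import sqrt
from scipy.stats import norm

def power_z(n1, n2, delta, alpha=0.05):
    """Prospective power of the two-sample z-test to detect effect size delta
    (Cohen's d) with known variance: the sum of the two rejection-tail
    probabilities (the lower tail is negligible for delta > 0)."""
    z = norm.ppf(1 - alpha / 2)          # 1.96 for alpha = 0.05
    se = sqrt(1 / n1 + 1 / n2)
    return (1 - norm.cdf(z - delta / se)) + norm.cdf(-z - delta / se)

def min_n_per_group(delta, target=0.8, alpha=0.05):
    """Smallest equal per-group n satisfying Equation 6: psi(n, n, ...) >= target."""
    n = 2
    while power_z(n, n, delta, alpha) < target:
        n += 1
    return n

print(min_n_per_group(0.5))  # → 63 per group for a medium effect size
```

The same search with a large effect size (Δ=0.8) returns a much smaller n, which is why post-hoc power estimates behave better for large effect sizes, as noted in the Introduction.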
Figure 3 Histogram of power from existing post-hoc power function, along with power from the prospective power function
for sample size n=100, based on 1000 Monte Carlo runs for effect size (A) Δ=0.2, (B) Δ=0.5 and (C) Δ=0.8.
\psi(n_1, n_2, \alpha, H_a(d = \hat{\Delta}_n)) =
\begin{cases}
1 - \Phi\left(Z_{1-\alpha/2} - \hat{\Delta}_n \big/ \sqrt{1/n_1 + 1/n_2}\right), & \text{if } \hat{\Delta}_n > 0 \\
\Phi\left(-Z_{1-\alpha/2} - \hat{\Delta}_n \big/ \sqrt{1/n_1 + 1/n_2}\right), & \text{if } \hat{\Delta}_n < 0
\end{cases} \quad (7)

Equation 7 only indicates the probability of detecting the sample effect size Δ̂n, which can be quite different from power estimates computed based on the true population effect size Δ. Except for large sample sizes n, power estimates based on Δ̂n can be highly variable, rendering them uninformative about the true effect size Δ.

On the other hand, post-hoc power analysis based on Equation 7 also presents a conceptual challenge. Under the current power analysis paradigm, power is the probability of detecting a population-level effect size Δ. This effect size is specified with complete certainty. For example, if we set Δ=0.5, it means that we know that the difference between the two population means of interest is 0.5. For post-hoc power analysis, we compute power using Δ̂n as if this were the difference between the two population means. Due to sampling variability, Δ̂n varies around Δ, and the two can be substantially different. Indeed, as illustrated in Zhang et al,1 post-hoc power based on Equation 7 can be misleading when used to indicate power for Δ based on Equation 4.

Thus, for post-hoc power analysis to be conceptually consistent with power analysis for prospective studies and informative about the population effect size Δ, we must account for the sampling variability in Δ̂n. Although Δ̂n ≠ Δ, Δ̂n is generally informative about Δ, with diminishing uncertainty as the sample size increases, a phenomenon known as the central limit theorem (CLT) in the theory of statistics.8 By quantifying the variability in Δ̂n and incorporating such variability in specifying the alternative hypothesis, we can develop a new post-hoc power analysis to inform our ability to detect Δ.

By the CLT, the variability of Δ̂n is described by a normal distribution N(Δ, 1/n1 + 1/n2). Thus, in the absence of knowledge about Δ, values closer to Δ̂n are better candidates for Δ, while values more distant from Δ̂n are less likely to be good candidates for Δ. By giving more weight to values closer to Δ̂n and less weight to values more distant from Δ̂n, the normal distribution centred at Δ̂n, N(Δ̂n, 1/n1 + 1/n2), quantifies our uncertainty about Δ. Thus, for post-hoc power analysis, we replace the alternative hypothesis in Equation 3, which involves a known population effect size Δ, with a set of candidate values for Δ whose candidacy is described by the distribution N(Δ̂n, 1/n1 + 1/n2):

H_0: d = 0 \quad \text{vs} \quad H_a: d \sim N\left(\hat{\Delta}_n, \tfrac{1}{n_1} + \tfrac{1}{n_2}\right) \quad (8)

The hypothesis in Equation 8 is fundamentally different from the hypothesis in Equation 3 for regular power analysis for prospective studies. Unlike Equation 3, there is more than one candidate value for Δ, and post-hoc power analysis must consider all such candidate ds with their relative informativeness for Δ described by the distribution N(Δ̂n, 1/n1 + 1/n2). Thus, a sensible way to achieve this is to average power estimates over all such candidates according to their relative informativeness for Δ described by this distribution. However, since there are infinitely many such Δs, we need to use integrals in calculus to perform the averaging.

Let f_{Δ̂n}(Δ) denote the density function of the normal distribution N(Δ̂n, 1/n1 + 1/n2). Then, by averaging the power function ψ(n1, n2, α, Ha(d = Δ)) in Equation 7 over all plausible values of Δ weighted by the distribution N(Δ̂n, 1/n1 + 1/n2), we obtain the power for the hypothesis in Equation 8:
Figure 4 Histogram of power from proposed post-hoc power function, along with power from the prospective power function
for sample size n=100, based on 1000 Monte Carlo runs for effect size (A) Δ=0.2, (B) Δ=0.5 and (C) Δ=0.8.
\tau\left(n_1, n_2, \alpha, H_a\left(d : N\left(\hat{\Delta}_n, \tfrac{1}{n_1} + \tfrac{1}{n_2}\right)\right)\right) = \int_{-\infty}^{\infty} \psi(n_1, n_2, \alpha, H_a(d = x))\, f_{\hat{\Delta}_n}(x)\, dx \quad (9)

where \int_{-\infty}^{\infty} h(x)\, dx denotes the value of the integral of the function h(x).

For example, within the current study context, we may start without any knowledge about d, in which case we can use a non-informative prior distribution, or a constant. After obtaining Δ̂n from a real study with sample size
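Equation 9 has no closed form, but the averaging is a one-dimensional integral that is straightforward to evaluate numerically. A possible sketch in Python with SciPy (an illustration under our own naming, not the authors' R code) is:

```python
from math import sqrt
from scipy.stats import norm
from scipy.integrate import quad

def psi(n1, n2, d, alpha=0.05):
    """Power function of Equation 7 evaluated at candidate effect size d."""
    z = norm.ppf(1 - alpha / 2)
    se = sqrt(1 / n1 + 1 / n2)
    if d > 0:
        return 1 - norm.cdf(z - d / se)
    return norm.cdf(-z - d / se)

def tau(n1, n2, d_hat, alpha=0.05):
    """Proposed post-hoc power (Equation 9): psi averaged over candidate effect
    sizes weighted by the N(d_hat, 1/n1 + 1/n2) density of the observed d_hat."""
    sd = sqrt(1 / n1 + 1 / n2)
    integrand = lambda x: psi(n1, n2, x, alpha) * norm.pdf(x, loc=d_hat, scale=sd)
    # integrate over +/- 8 sd around d_hat; the density is negligible beyond that
    value, _ = quad(integrand, d_hat - 8 * sd, d_hat + 8 * sd)
    return value

# Averaging damps the dependence on the observed effect size: tau moves less
# than psi as d_hat varies over values plausible under the same true Delta.
for d_hat in (0.3, 0.5, 0.7):
    print(d_hat, round(psi(50, 50, d_hat), 3), round(tau(50, 50, d_hat), 3))
```

This reduced sensitivity to Δ̂n is exactly the smaller standard deviation of the proposed post-hoc power reported in the simulation results below.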
common (population) variance. We set the population parameters as follows:

\mu_1 = 0, \quad \mu_2 = \mu_1 + \Delta, \quad \sigma^2 = 1. \quad (12)

Since σ²=1, the difference between the means, Δ, is Cohen's d, which is interpreted as Δ=0.2, Δ=0.5 and Δ=0.8 for small, medium and large effect sizes, respectively.6 For convenience, we assume a common sample size for both groups, that is, n1=n2=n. We set Δ and n to different values so we can see how power estimates from the three different approaches change as a function of the two parameters.

Given all these parameters, we can readily evaluate the prospective power function in Equation 6. For post-hoc power analysis, power is not a constant and varies according to the observed effect size Δ̂n. We use Monte Carlo simulation to capture such variability. Given the parameters in Equation 12 and sample size n, we simulate samples Yik from N(µk, σ²), compute the sample effect size Δ̂n, and evaluate the post-hoc power functions in Equation 7 and Equation 9. Thus, unlike prospective power, both post-hoc power functions depend on the observed effect size Δ̂n. The difference is that Equation 7 treats Δ̂n as the true effect size, while Equation 9 acknowledges sampling variability in Δ̂n and uses its sampling distribution to inform the underlying effect size Δ.

The variance of Δ̂n is Var(Δ̂n) = 2/n, which will be close to 0, and Δ̂n close to Δ, for a very large sample size n. In this case, both post-hoc powers will be close to the prospective power. For small and moderate sample sizes, all three power values will differ from each other. Given Δ, the prospective power has only one value for each sample size, while the two post-hoc power approaches will have different values for different observed Δ̂n. For small and moderate sample sizes, post-hoc power values will have large variabilities and may not be informative about the power based on the true Δ.

Shown in figure 1 are the histograms of power from the existing post-hoc power approach based on 1000 Monte Carlo runs for effect size Δ=0.2 (figure 1A), Δ=0.5 (figure 1B) and Δ=0.8 (figure 1C), along with power from the prospective power function (vertical line) for sample size n=50. As expected, power increased in all cases as the true effect size Δ became larger; the sample means for the post-hoc power were 0.167, 0.443 and 0.761, respectively. For all three effect sizes, there was a large amount of variability in the post-hoc power, covering the entire range of the power function between 0 and 1.

Shown in figure 2 are the histograms of power from the proposed post-hoc power approach based on 1000 Monte Carlo runs with the mean difference Δ=0.2 (figure 2A), Δ=0.5 (figure 2B) and Δ=0.8 (figure 2C), along with power from the prospective power function (the vertical line) for sample size n=50. As in figure 1, the mean power increased from 0.259 to 0.463 to 0.711 when Δ changed from 0.2 to 0.5 to 0.8. Unlike in figure 1, power was larger than 0.1 in all three cases. Moreover, the sample SDs for the three effect sizes were 0.111, 0.177 and 0.162 for the proposed method, compared with 0.151, 0.239 and 0.194 for the traditional power method. The smaller variability of the proposed post-hoc power was the result of being less sensitive to variability in Δ̂n, compared with the existing post-hoc power.

Shown in figure 3 are the histograms of power from the existing post-hoc power approach based on 1000 Monte Carlo runs for effect size Δ=0.2 (figure 3A), Δ=0.5 (figure 3B) and Δ=0.8 (figure 3C), along with power from the prospective power function (the vertical line) for sample size n=100. As in figure 1, as Δ changed from 0.2 to 0.5 to 0.8, the mean power increased from 0.226 to 0.670 to 0.951. Again, post-hoc power was quite variable, covering the entire range in all cases except for Δ=0.8, where the minimum was 0.48. With Δ=0.8 and n=100, the smaller sampling variability of Δ̂n along with the large effect size led to power values close to 1, greatly reducing the variability of post-hoc power, as compared with n=50.

Shown in figure 4 are the histograms of power from the proposed post-hoc power approach based on 1000 Monte Carlo runs with the mean difference Δ=0.2 (figure 4A), Δ=0.5 (figure 4B) and Δ=0.8 (figure 4C), along with power from the prospective power function (the vertical line) for n=100. As in figure 3, the mean power grew from 0.303 to 0.636 to 0.901 as Δ increased from 0.2 to 0.5 to 0.8. The sample SDs for Δ=0.2, Δ=0.5 and Δ=0.8 were 0.135, 0.174 and 0.086, compared with 0.184, 0.221 and 0.071 for the existing power in figure 3. Except for Δ=0.8, the proposed post-hoc power was less variable than its existing counterpart. For Δ=0.8, the trend was reversed, and post-hoc power was more variable for the proposed than for the traditional approach. This is made clear by comparing the two histograms in figure 3C and figure 4C. For Δ=0.8, there was much less variability in power from the traditional than from the proposed approach; about 70% of power values were between 0.95 and 1 for the traditional approach, compared with about 30% for the proposed approach. Thus, this smaller variability of power from the traditional approach again reflects its higher sensitivity to the observed Δ̂n compared with the proposed alternative.

Discussion
In this paper, we proposed a new approach for post-hoc power analysis. Unlike the existing approach, the proposed alternative is conceptually sound and can be applied to gauge the variability of statistical findings and thus inform the reliability of such findings. The approach is illustrated using simulated data by comparison with the traditional approach. The simulation study results show that the proposed approach is less sensitive to observed effect sizes and is more informative about power estimates based on the underlying true and observed effect sizes. The simulation results also show that, even when performed correctly, post-hoc power analysis may yield power values that are different from the power based on the underlying effect size under the current paradigm. On the other hand, we believe that the current paradigm may not provide useful power estimates for real prospective studies. As true effect sizes are rarely known for certain in most studies, treating our guessed effect sizes as such ground truth in computing power is flawed practice. Thus, power analysis for prospective studies also needs to account for uncertainty about true effect sizes to provide practical and useful power estimates. Work is underway to extend the proposed post-hoc power approach to power analysis for prospective studies.

Twitter Natalie E Quach @NatalieEQuach

Contributors All authors participated in the discussion of the statistical issues and worked together to develop this paper. JT, XMT, XZ and MX reviewed the literature and discussed the conceptual issues with the conventional post-hoc power analysis approach. All authors contributed to the development of the new post-hoc power analysis approach proposed in this paper. NEQ, KY and RC developed the simulation algorithms and associated R code and performed the simulation study. JT, XMT, XZ, MX and NEQ worked together to draft and finalise the manuscript.

Funding The project described was partially supported by the National Institutes of Health (grant UL1TR001442) of CTSA funding.

Disclaimer The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.

Competing interests None declared.

Patient consent for publication Not required.

Provenance and peer review Commissioned; externally peer reviewed.

Open access This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.

ORCID iD
Natalie E Quach http://orcid.org/0000-0001-9382-5826

References
1 Zhang Y, Hedo R, Rivera A, et al. Post hoc power analysis: is it an informative and meaningful analysis? Gen Psychiatr 2019;32:e100069.
2 Hoenig JM, Heisey DM. The abuse of power: the pervasive fallacy of power calculations for data analysis. Am Stat 2001;55:19–24.
3 Korn EL. Projection from previous studies. A caution. Control Clin Trials 1990;11:67–9.
4 Kraemer HC, Mintz J, Noda A, et al. Caution regarding the use of pilot studies to guide power calculations for study proposals. Arch Gen Psychiatry 2006;63:484–9.
5 Levine M, Ensom MH. Post hoc power analysis: an idea whose time has passed? Pharmacotherapy 2001;21:405–9.
6 Cohen J. Statistical power analysis for the behavioral sciences. 2nd ed. Mahwah (NJ): LEA, 1988.
7 Tu XM, Kowalski J, Zhang J, et al. Power analyses for longitudinal trials and other clustered designs. Stat Med 2004;23:2799–815.
8 Tang W, He H, Tu XM. Applied categorical and count data analysis. Boca Raton (FL): Chapman & Hall/CRC, 2012.
Natalie E Quach has been a Biostatistics PhD student in the Division of Biostatistics and Bioinformatics at
the UC San Diego Herbert Wertheim School of Public Health and Human Longevity Science since 2022. She
graduated with Latin honors from UC San Diego, USA in 2022, receiving a Bachelor of Science in Applied
Mathematics. She is currently a researcher and teaching assistant at UC San Diego. Her main research
interests include causal inference and clinical trials.