1 L-Biostat School of Public Health, KU Leuven, University of Leuven, Leuven, Belgium
Semin Liver Dis 2018;38:97–102.
Address for correspondence: Emmanuel Lesaffre, Dr. Sc., L-Biostat School of Public Health, KU Leuven University of Leuven, Kapucijnenvoer 35, Leuven 3000, Belgium (e-mail: emmanuel.lesaffre@kuleuven.be).
Abstract For decades, the superiority trial has been the most popular design to assess the
efficacy of newly developed drugs in a randomized controlled clinical trial. In a
superiority trial, the aim is to show that the new (experimental) treatment is better
than the standard treatment or placebo. However, it becomes increasingly difficult to
Sorafenib was the first and the only drug that improved overall survival (OS) in patients with advanced hepatocellular carcinoma (HCC).1 The drug is globally approved for this indication, but is also associated with major toxicities and 30% of patients stop the treatment because of intolerance.2 Therefore, there is still the need for developing new drugs that may be better suited for those who do not tolerate sorafenib or for whom the drug is inefficacious. Two strategies have been developed when evaluating newly developed drugs to treat HCC patients.

In the first strategy, the efficacy of the new agent is evaluated in a placebo-controlled trial for the second line treatment of those patients for whom sorafenib turned out not to be appropriate. It is then hoped that the new agent has greater efficacy than placebo. In the second strategy, the efficacy and safety of the new agent is compared with that of sorafenib as the first line treatment of HCC patients. In this case, a cautious approach could then be to demonstrate at least as high efficacy of the experimental treatment as sorafenib, with perhaps some additional benefits.

In the first scenario, one uses a superiority test to verify the greater efficacy than placebo in second line treatment. In the second scenario, a noninferiority (NI) test evaluates the new agent against sorafenib in first line treatment.

In this article, superiority testing is contrasted with NI testing. Both statistical as well as clinical aspects are discussed. Focus is on the NI test, which has become increasingly popular in drug research in the last two decades. Experience with NI tests revealed their positive and negative aspects.

To introduce the theoretical concepts, one generically speaks of the experimental (E) treatment (new agent) and the control (C) treatment (standard treatment or placebo). Illustrations will be based on the results from recent clinical trials on advanced liver cancer. More specifically, the superiority studies like BRISK-PS,3 EVOLVE-1,4 and REACH2 for treating HCC patients in second line will be considered. To illustrate NI concepts, use is made of the studies such as BRISK-FL,5 LIGHT,6 and the recently published study7 comparing lenvatinib with sorafenib in the first line treatment of HCC patients. But, also NI studies from other therapeutic areas will be taken to illustrate concepts.

result at a two-sided significance level of 0.05. A statistically significant result is claimed when p-value is < 0.05. For the EVOLVE-1 study, the sample size computation was based on HR = 0.714, while for the REACH study HR = 0.75 was assumed. In each of these studies, the aim is to reject the null hypothesis given by:
Fig. 1 Observed hazard ratios and two-sided 95% confidence intervals (CIs)/one-sided 97.5% CIs published in studies comparing experimental
trials to treat hepatocellular carcinoma (HCC) patients versus sorafenib in the first and second line treatments. Superiority (3 bottom) and
noninferiority (NI) (top 2) results are reported. The dotted horizontal lines extend the two-sided 95% CI to the one-sided 97.5% CI. The dotted
vertical lines define the NI margins chosen in the respective studies.
In ►Fig. 1, a hypothetical interval of equivalence is indicated, which runs from 0.9 to 1/0.9 = 1.11. An alternative to the standard superiority test is to assess whether the HR is at most 0.9. The null hypothesis for this test becomes

H0: HR ≥ δSUP = 0.9. (2)

The aim of this superiority test is in fact to show that the true HR is less than 0.9, i.e., that the experimental treatment provides a clinically attractive benefit to the patients. This implies that the two-sided 95% CI or the one-sided 97.5% CI lies completely to the left of HR = 0.9. This kind of superiority test was advocated already in 1986 by Spiegelhalter and Freedman,8 and is considered a more clinically useful superiority test, but has not been followed up by the medical community. Of course, one practical problem is the choice of δSUP, which may be a subject of discussion.

tolerated by some HCC patients, brivanib could be added to the palette of treatments and may open other treatment options for the oncologists. The same is true for linifanib, which was compared with sorafenib in the LIGHT study. The NI margin for the BRISK-FL study was set at δINF = 1.08, while for the LIGHT study δINF = 1.0491 was chosen. At the end of the study, NI could not be claimed for either of the two studies. In the BRISK-FL trial the observed HR was 1.06, with a two-sided 95% CI equal to (0.93, 1.22), hence crossing the boundary of 1.08. For the LIGHT study, the results are HR = 1.046 with a two-sided 95% CI = (0.896, 1.221), again crossing the boundary, now equal to 1.0491. In ►Fig. 1, the results of the two trials together with their chosen NI margins are displayed.

Three Classical Designs
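The way a confidence interval for the HR is read against a margin in the superiority, equivalence, and noninferiority settings can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class and function names are hypothetical, HR < 1 is taken to favor the experimental treatment, and the equivalence check (CI entirely inside the interval from δ to 1/δ) is the usual convention rather than something spelled out in the text. The CI limits and margins are the BRISK-FL and LIGHT values quoted above.

```python
from dataclasses import dataclass


@dataclass
class HazardRatioCI:
    """Two-sided 95% CI for the hazard ratio of E versus C (hypothetical helper)."""
    lower: float
    upper: float


def is_superior(ci: HazardRatioCI, delta_sup: float = 1.0) -> bool:
    # Superiority: the whole CI lies to the left of delta_sup
    # (1.0 for the standard test, 0.9 for the stricter test of H0: HR >= 0.9).
    return ci.upper < delta_sup


def is_noninferior(ci: HazardRatioCI, delta_inf: float) -> bool:
    # Noninferiority: the upper CI limit stays below the prespecified NI margin.
    return ci.upper < delta_inf


def is_equivalent(ci: HazardRatioCI, delta: float = 0.9) -> bool:
    # Equivalence (assumed convention): the CI lies inside (delta, 1/delta).
    return delta < ci.lower and ci.upper < 1.0 / delta


brisk_fl = HazardRatioCI(lower=0.93, upper=1.22)
light = HazardRatioCI(lower=0.896, upper=1.221)

print(is_noninferior(brisk_fl, delta_inf=1.08))   # False: 1.22 crosses 1.08
print(is_noninferior(light, delta_inf=1.0491))    # False: 1.221 crosses 1.0491
print(is_superior(brisk_fl, delta_sup=0.9))       # False
```

The margins must be fixed before the data are seen; the functions only formalize the comparison, they do not help in choosing δSUP or δINF.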
scarce in HCC trials. A critical reflection about the positive and negative aspects of NI designs is given below.

Reasons for Choosing a Noninferiority Design
The standard design for RCTs is still the superiority trial, but there may be reasons why this design is not to be preferred. Here are two main reasons:

• Placebo-controlled trials are not acceptable. Since sorafenib has been shown to be an effective and established drug to treat HCC patients, placebo-controlled RCTs to test first line treatments in this indication are not ethically justifiable. The ASSENT II study9,10 was one of the first NI trials to test the efficacy of thrombolytics for treating acute myocardial infarct patients. The aim of the ASSENT II study was to test the efficacy of tenecteplase based on 30-day mortality rate. At the planning of the trial, alteplase (manufactured by the same drug company) was the

statistical arguments. Choosing a NI margin using only clinical arguments may be tough, especially for OS in life-threatening diseases such as hepatocellular cancer, since allowing increased mortality for the new drug may be difficult to justify. This approach is an example of a direct comparison between E and C.

One may motivate the choice of the NI margin using an indirect comparison with a (putative) placebo treatment. This approach has been used in the comparison of novel oral anticoagulants (NOACs) with warfarin for the treatment of patients suffering from atrial fibrillation. Indeed, several NI trials were recently conducted to compare the efficacy of these NOACs with warfarin in preventing (ischemic and hemorrhagic) stroke or systemic embolism, see, e.g., the ARISTOTLE study.12 These studies make use of the half rule. Loosely speaking, this rule defines the NI margin for the HR such that the efficacy of the new drug is likely to be
problem, and that is why the recommended approach for a NI trial is to perform the ITT and PP analysis and hope that the two analyses will confirm each other.

Sample Size Calculations
In a superiority trial, the power and sample size depend on the assumed clinically important treatment effect δCLIN. In a NI trial, the power and sample size depend on the choice of the NI margin δNI. When δCLIN = δNI, the necessary sample sizes in the superiority and the NI trial are equal, provided that the sample size in the NI trial is determined assuming that the two drugs have equal efficacy. However, because δCLIN is typically (much) larger than δNI, the calculated sample size for a NI trial is often much larger than that of a superiority trial.

Combining Noninferiority and Superiority in One Trial
Suppose that, at the analysis stage of a NI trial, the results

one wishes to show. This is even more the case for NI studies. This is the “ugly” part of NI studies. For instance, it is not uncommon to see publications where a nonsignificant result is a posteriori interpreted as “no difference between treatment groups” or is turned into a NI claim with a NI margin defined after having seen the data. This is done in the SARAH trial where HCC patients were randomized to either selective internal radiotherapy (SIRT) or sorafenib.15 This open-label study was designed as a superiority trial, but failed to show a significant difference between OS of the two treatments. The authors did not make an explicit NI claim, but interpreted (between the lines, as often happens) their nonsignificant result as “no difference.” Because the patients appear to tolerate SIRT better, the authors concluded that SIRT may have a lot to offer to future HCC patients. This conclusion was taken despite the open-label character of the study, making it easier that
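The remarks in the Sample Size Calculations section can be illustrated with a rough event-count calculation. The sketch below is not the computation used in any of the cited trials. It assumes Schoenfeld's approximation for the number of events in a 1:1 randomized trial, a two-sided significance level of 0.05, and a power of 80% (the power is an assumption, not taken from the text). The function name `required_events` is hypothetical, and distances are measured on the log-HR scale.

```python
from math import ceil, log
from statistics import NormalDist


def required_events(hr_assumed: float, hr_null: float,
                    alpha_two_sided: float = 0.05, power: float = 0.80) -> int:
    """Approximate number of events (Schoenfeld, 1:1 allocation).

    hr_assumed: HR assumed to hold when planning the trial.
    hr_null:    HR on the boundary of the null hypothesis
                (1.0 for standard superiority, the NI margin for a NI trial).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha_two_sided / 2)
    z_beta = NormalDist().inv_cdf(power)
    distance = log(hr_assumed) - log(hr_null)  # log-HR distance to detect
    return ceil(4 * (z_alpha + z_beta) ** 2 / distance ** 2)


# Superiority planned for HR = 0.75 against H0: HR = 1.
events_sup = required_events(hr_assumed=0.75, hr_null=1.0)

# NI trial with the same distance: equal efficacy assumed (HR = 1), margin 1/0.75.
events_ni_same = required_events(hr_assumed=1.0, hr_null=1 / 0.75)

# NI trial with a tight margin of 1.08, equal efficacy assumed.
events_ni_tight = required_events(hr_assumed=1.0, hr_null=1.08)

print(events_sup, events_ni_same, events_ni_tight)
```

With equal distances on the log-HR scale the superiority and NI calculations coincide, while a margin close to 1 shrinks the distance and inflates the required number of events considerably.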
• Check the conduct of the trial: All aspects which reduce the quality of the trial will help to “show” noninferiority.
• Noninferiority CANNOT be defined/claimed a posteriori.

References
1 Ribeiro de Souza A, Reig M, Bruix J. Systemic treatment for advanced hepatocellular carcinoma: the search of new agents to join sorafenib in the effective therapeutic armamentarium. Expert Opin Pharmacother 2016;17(14):1923–1936
2 Zhu AX, Park JO, Ryoo B-Y, et al; REACH Trial Investigators. Ramucirumab versus placebo as second-line treatment in patients with advanced hepatocellular carcinoma following first-line therapy with sorafenib (REACH): a randomised, double-blind, multicentre, phase 3 trial. Lancet Oncol 2015;16(07):859–870
3 Llovet JM, Decaens T, Raoul J-L, et al. Brivanib in patients with advanced hepatocellular carcinoma who were intolerant to sorafenib or for whom sorafenib failed: results from the randomized
8 Spiegelhalter DJ, Freedman LS. A predictive approach to selecting the size of a clinical trial, based on subjective clinical opinion. Stat Med 1986;5(01):1–13
9 Van De Werf F, Adgey J, Ardissino D, et al; Assessment of the Safety and Efficacy of a New Thrombolytic (ASSENT-2) Investigators. Single-bolus tenecteplase compared with front-loaded alteplase in acute myocardial infarction: the ASSENT-2 double-blind randomised trial. Lancet 1999;354(9180):716–722
10 Lesaffre E, Bluhmki E, Wang-Clow F, et al. The general concepts of an equivalence trial, applied to ASSENT-2, a large-scale mortality study comparing two fibrinolytic agents in acute myocardial infarction. Eur Heart J 2001;22(11):898–902
11 Bingham CO III, Sebba AI, Rubin BR, et al. Efficacy and safety of etoricoxib 30 mg and celecoxib 200 mg in the treatment of osteoarthritis in two identically designed, randomized, placebo-controlled, non-inferiority studies. Rheumatology (Oxford) 2007;46(03):496–507
12 Granger CB, Alexander JH, McMurray JJV, et al; ARISTOTLE Committees and Investigators. Apixaban versus warfarin in patients with atrial fibrillation. N Engl J Med 2011;365(11):981–992