
PharmacoEconomics - Open

https://doi.org/10.1007/s41669-023-00391-5

ORIGINAL RESEARCH ARTICLE

Impact of Extrapolation Model Choices on the Structural Uncertainty in Economic Evaluations for Cancer Immunotherapy: A Case Study of Checkmate 067
Taihang Shao1,2 · Mingye Zhao1,2 · Leyi Liang1,2 · Lizheng Shi3 · Wenxi Tang1,4 

Accepted: 16 January 2023


© The Author(s) 2023

Abstract
Objectives  The aim of this study was to compare the performance of different extrapolation modeling techniques and analyze
their impact on structural uncertainties in the economic evaluations of cancer immunotherapy.
Methods  Individual patient data were reconstructed from the published Checkmate 067 Kaplan–Meier curves. Standard
parametric models and six flexible techniques were tested, including fractional polynomials, restricted cubic splines, Royston–
Parmar models, generalized additive models, parametric mixture models, and mixture cure models. Mean squared errors
(MSE) and bias relative to the raw survival plots were used to assess model fit and extrapolation performance. Variability of
estimated incremental cost-effectiveness ratios (ICERs) across models was used to characterize the structural uncertainty
in economic evaluations. All indicators were analyzed and compared under cut-offs of 3 years and 6.5 years, respectively,
to examine the impact of model choice under different data maturity. R code for reproducing this study can be found on GitHub.
Results  The flexible techniques in general performed better than standard parametric models with smaller MSE irrespective
of the data maturity. Survival outcomes projected by long-term extrapolation using immature data differed from those with
mature data. Although a best-performing model was not found because several models had very similar MSE in this case,
the variability of modeled ICERs significantly increased when prolonging simulation cycles.
Conclusions  Flexible techniques show better performance in the case of Checkmate 067, regardless of data maturity. Model
choices affect ICERs of cancer immunotherapy, especially when dealing with immature survival data. When researchers lack
evidence to identify the ‘right’ model, we recommend identifying and revealing the model impacts on structural uncertainty.

Key Points for Decision Makers

High structural uncertainty existed in extrapolation models when simulating the long-term economic evaluation results of cancer immunotherapies, especially when dealing with immature data.

Flexible techniques show better performances when standard parametric models are not flexible enough to capture the complexity of survival hazards from cancer immunotherapy. Model validity will be reinforced if external evidence exists.

Identifying and reporting the structural uncertainty caused by extrapolation model selection in an economic evaluation is recommended as researchers lack evidence to identify the ‘right’ model among several models in most cases.

Taihang Shao and Mingye Zhao contributed equally to this work.

* Lizheng Shi
  lshi1@tulane.edu
* Wenxi Tang
  tokammy@cpu.edu.cn

1 School of International Pharmaceutical Business, China Pharmaceutical University, Nanjing 211198, China
2 Center for Pharmacoeconomics and Outcomes Research, China Pharmaceutical University, Nanjing 211198, China
3 Department of Global Health Management and Policy, School of Public Health and Tropical Medicine, Tulane University, New Orleans, LA 70118, USA
4 Department of Public Affairs Management, School of International Pharmaceutical Business, China Pharmaceutical University, Nanjing 211198, China

T. Shao et al.

1 Introduction

Decision making for anti-cancer drugs usually requires a lifetime projection including survival benefit and cost when mature data are quite often unobtainable [1]. Especially with the emergence of immune checkpoint inhibitors (ICIs), time-to-event data with long-enough follow-up that could fully capture the complex survival hazards are rarely available. To be specific, median overall survival (OS) is not necessarily reached in these clinical trials. Meanwhile, individual patient data (IPD) are usually unavailable for peer replication. This results in high uncertainties when conducting an economic evaluation under these conditions. Modeling techniques are therefore used both to capture the key features of survival functions (model fitness) and to simulate the survival data to a longer term (extrapolation performance).

The most widely used parametric models in economic evaluation are standard parametric models [2], including exponential, Weibull, Gompertz, lognormal, and log-logistic [3]. A study by Djalalov et al. (2019) introduced a method to fit parametric survival distributions and provided a systemic approach to estimating transition probabilities from survival data using parametric distributions [3]. However, survival curves for ICIs tend to be more complex and variable in shape, with declining survival after the initial phase followed by plateaus [4]. Note that standard parametric models are limited in the types of hazard functions they can reproduce, which means that they may not be flexible enough to model survival curves during all phases where there are multiple changes in the slope of the hazard function [4, 5].

In comparison, flexible parametric models including fractional polynomials (FP), restricted cubic splines (RCS), Royston–Parmar (RP) models, and generalized additive models (GAM) can capture the inflection points of survival curves [6–8]. Other models including landmark models, parametric mixture models (PMM), and mixture cure models (MCM) can also model complex hazard shapes. A rich literature has demonstrated the improved fit performance of these flexible models [4, 6, 8–15]. It can be concluded that flexible models performed better in fitting and extrapolating survival outcomes than standard parametric models. However, selecting survival models based only on goodness-of-fit (GOF) statistics is unsuitable since good within-sample fit does not guarantee good extrapolation performance. The National Institute for Health and Care Excellence (NICE) has already published some guidelines regarding the process of fit and extrapolation [16, 17]. Some current studies have also discussed different flexible modeling techniques in extrapolating survival outcomes regarding immunotherapies [7, 18–22]. Through these studies, we can conclude that methods that provide more degrees of freedom may accurately represent survival for anti-cancer drugs, particularly if data are more mature or external data are available to inform the long-term extrapolations.

Despite rich literature focused on survival extrapolation, few studies evaluated the impacts of model selection on economic evaluations for cancer immunotherapy. Several items, including the immature survival data and the long-term extrapolation, which may lead to high uncertainty in economic evaluations of cancer immunotherapy, still need to be examined when using flexible models [19]. In this work, we aimed to evaluate the GOF and extrapolation performance of different modeling techniques through a case study of Checkmate 067. We present the results of the model differences in extrapolated survival outcomes, and the resulting structural uncertainties in economic evaluation. Further recommendations on how to deal with immature data and model selection through this case study are also provided.

2 Methods

The study process for this article is shown in Fig. 1. All algorithms for fit, extrapolation, and economic evaluation in this study were implemented in R (version 4.0.2, https://www.r-project.org/). R code for reproducing this study can be found on GitHub (https://github.com/TaihangShao/uncertainty-of-CEA-flexible-extrapolation-techniques).

2.1 Clinical Data Sources

The clinical trial selected for this research was Checkmate 067 (a phase III, randomized, double-blind study of nivolumab monotherapy or nivolumab plus ipilimumab versus ipilimumab monotherapy in subjects with previously untreated unresectable or metastatic melanoma) because it has the longest survival data so far for cancer immunotherapy [23, 24]. We extracted both progression-free survival (PFS) and OS data of nivolumab plus ipilimumab (NI) and ipilimumab (I) from this study. IPD was obtained through reconstruction. Results published in 2017 with 3 years of data (with a minimum follow-up of 36 months) [24] and 2021 with 6.5 years (minimum follow-up of 77 months) [23] were all included to test model performance with data maturity. The 3-year data were used for fit and extrapolation (considered immature in this study) [24], while the 6.5-year data were used for fit and validation (considered mature in this study) [23]. The assumption here is that the data were immature when the median OS had not been reached, and mature otherwise.
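The maturity criterion in Section 2.1 (median OS reached or not) can be checked directly on reconstructed IPD. The following is a minimal Python sketch rather than the authors' R code: it computes a Kaplan–Meier estimate and reports whether the median has been reached. The toy cohort, the function names `kaplan_meier` and `median_survival`, and the time unit (months) are illustrative assumptions and do not come from Checkmate 067.

```python
from typing import List, Optional, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Return the (time, survival) steps of a Kaplan-Meier estimate.

    times  : follow-up time per patient
    events : 1 if the event was observed, 0 if the patient was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    steps = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients that share the same time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / at_risk
            steps.append((t, surv))
        at_risk -= removed
    return steps


def median_survival(steps: List[Tuple[float, float]]) -> Optional[float]:
    """First time at which S(t) <= 0.5, or None if the median is not reached."""
    for t, s in steps:
        if s <= 0.5:
            return t
    return None


# Hypothetical toy cohort (months), NOT Checkmate 067 data.
times = [2, 5, 7, 9, 12, 15, 20, 24, 30, 36]
events = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]

km = kaplan_meier(times, events)
median = median_survival(km)
is_immature = median is None  # median not reached -> treated as immature
```

In this toy cohort the estimated survival never falls to 0.5 within follow-up, so the data would be labeled immature under the study's working definition.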
Impact of Extrapolation Model Choices in Economic Evaluations for Cancer Immunotherapy

Fig. 1  Flow chart of study process. AIC Akaike information criterion, ICER incremental cost-effectiveness ratio, IPD individual patient data,
MSE mean squared errors

2.2 Model Fit, Extrapolation, and Validation

We used GetData Graph Digitizer (version 2.26) to extract survival plots from PFS and OS curves. Guyot’s method, as NICE recommended, was used to reconstruct individual patient data through the SurvHE package in R [25–27]. In this study, 3-year and 6.5-year survival data were both used to test the model GOF (as compared with the original KM data). We further extrapolated 3-year survival data to 6.5 years to compare the GOF with the original 6.5-year KM data, for the purpose of testing the model impact when only immature data is available.

The models used for fit and extrapolation in our study included standard parametric models (exponential, Weibull, Gompertz, lognormal, log-logistic, gamma, and generalized gamma distributions) and six flexible models (FP, RCS, RP, GAM, PMM, and MCM) [1]. Note that generalized gamma has three parameters that can assume a variety of hazard shapes (e.g., unimodal, monotonically increasing or decreasing, or bathtub) [19]. Landmark models that require patients’ response status for choosing a landmark time point were not included in this study since we only had summary data instead of detailed IPD information [17, 18]. For FP models, the best models for both first-order and second-order were included. For RP models, the best models for all three scales (‘odds’, ‘normal’, and ‘hazard’) were considered. For RCS, GAM, PMM, and MCM models, we only considered the models with the best performance. Therefore, we included a total of 16 models in the comparisons. Methodologies and details of implementing these modeling techniques are provided in Supplementary file 3 (see electronic supplementary material [ESM]).
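To illustrate why a model with a cure fraction can reproduce the survival plateau that a single standard distribution tends to miss, the sketch below compares a Weibull survival function with a mixture cure survival function of the usual form S(t) = π + (1 − π)·S_u(t). This is a Python illustration under assumed parameter values, with a Weibull chosen for the uncured group and background mortality ignored. It is not the specification used in the study, which is described in Supplementary file 3.

```python
import math


def weibull_survival(t: float, shape: float, scale: float) -> float:
    """Weibull survival function S(t) = exp(-(t / scale) ** shape)."""
    return math.exp(-((t / scale) ** shape))


def mixture_cure_survival(t: float, cure_fraction: float,
                          shape: float, scale: float) -> float:
    """Mixture cure survival: a 'cured' fraction never has the event,
    while the remaining patients follow a Weibull distribution."""
    return cure_fraction + (1.0 - cure_fraction) * weibull_survival(t, shape, scale)


# Hypothetical parameters chosen only for illustration (time in years).
shape, scale, cure_fraction = 1.2, 2.0, 0.4
years = [0, 1, 3, 6.5, 20]

standard = [weibull_survival(t, shape, scale) for t in years]
cure = [mixture_cure_survival(t, cure_fraction, shape, scale) for t in years]

for t, s_std, s_cure in zip(years, standard, cure):
    print(f"t={t:>4}: Weibull={s_std:.3f}  mixture cure={s_cure:.3f}")
```

With these assumed values the Weibull curve keeps falling toward zero over a 20-year horizon, while the mixture cure curve levels off near the cure fraction. This is the kind of divergence in the extrapolated tail that drives differences in long-horizon outcomes.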

GOF for specific types of models was checked by Akaike’s information criterion (AIC) and visual inspection. AIC is described by the following formula [28]:

$$\mathrm{AIC} = -2\log L + 2k$$

L refers to the likelihood of the model and k refers to the number of parameters.

GOF between models was checked by two indicators. The primary measure was mean squared errors (MSE), and the secondary measure was bias [8, 15]. MSE penalizes positive and negative deviations from the estimand of the mean equally. Besides, MSE can be interpreted as penalizing both for bias (how close the model estimates are to the truth) and for variance (how much the estimates vary across cycles). The separate bias measure indicates how much of the MSE is attributable to bias rather than variance.

MSE and bias are described by the following formulas:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2$$

$$\mathrm{bias} = \frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)$$

n refers to the number of samples, $Y_i$ refers to the real value, and $\hat{Y}_i$ refers to the predicted value.

Extrapolation performance was also evaluated by MSE and bias. That is, we checked the MSE/bias of model-estimated survival data and observed survival data at each time point from 3 years [3, 6, 29]. The check of extrapolation performance was also supplemented with visual inspection. For AIC and MSE, smaller values indicated better model performance; for bias, performance was examined by how close values are to zero.

To improve the accuracy of extrapolating survival outcomes [17, 18], we conducted the external validation. However, due to lack of data, the only external data found was from I-OS (overall survival data of ipilimumab), for which the longest follow-up time of 10 years has been reached. Therefore, we only conducted the external validation for 3-year data of I-OS. The choice of external data was made following a recently published guideline [19]. Other external data were obtained from a pooled analysis of long-term survival data from 12 studies of ipilimumab in unresectable or advanced melanoma [30]. A KM curve was reconstructed. The MSE/bias of model-estimated survival data and external data at each time point between 3 years and 10 years was checked.

2.3 Economic Evaluation

For economic evaluation, we considered three groups of models, including models with best fit of 3-year data, models with the best extrapolation performance after the third year, and models with the best fit of 6.5-year data. Four survival outcomes were needed for each cost-effectiveness analysis, including both PFS and OS for the treatment and comparator. For every outcome, we sought the best model with the lowest MSE. Therefore, a total of $3^4$ (81) combinations were considered for a specific group of models. Model combinations were excluded only when they were not clinically realistic (PFS higher than OS at specific time points).

There already existed a rich literature that studied the cost effectiveness of nivolumab plus ipilimumab versus ipilimumab monotherapy in subjects with previously untreated unresectable or metastatic melanoma according to a current systematic review [31]. Therefore, we chose to reproduce and simplify high-quality research instead of building a new model. Kohn et al. evaluated the cost effectiveness of five first-line immunotherapies for advanced melanoma with a Markov model [32]. This research considered both nivolumab plus ipilimumab and ipilimumab monotherapy as their first-line therapies. The parameters reported in this study were comprehensive and had authoritative sources. Medication, dosage, and subsequent treatment were also close to the original Checkmate 067 (first-line NI + I followed by second-line carboplatin and paclitaxel; first-line I followed by second-line NI). Thus, based on Kohn’s study, we constructed a simplified partitioned survival model since some parts of the original model were not needed in our study. A summary of Kohn’s study is reported in Supplementary file 4 (see ESM). Parameters including costs, utilities, and incidence of adverse events were obtained from the original study. End-of-life cost was not reported by Kohn et al., and we obtained it from a similar study [33]. The detailed parameter inputs are shown in Table 1, Supplementary file 1 (see ESM). Simulation times were set at 6.5 years and 20 years. Outcomes included incremental costs, incremental quality-adjusted life-years (QALYs), and incremental cost-effectiveness ratio (ICER).

The impacts of model choices were reflected on the structural uncertainties of economic evaluation. We used the variability of the modeled ICERs to measure this structural uncertainty. Tornado diagrams were drawn to show the variability. However, to quantify and visualize this structural uncertainty, we used the distances of modeled ICER plots to the referencing ICER to indicate the model variability. We first defined the referencing ICER as the one with the smallest distance among all modeled ones, and then decided the variability by calculating the distance between each observed ICER and the reference dot. For distance calculation, we did it in two steps: first to standardize all modeled

outcomes (cohort point estimates of incremental QALYs and incremental costs for each PFS and OS using selected model), and second to calculate the discrete degree of modeled ICER estimate to the referencing one. The standardization process was conducted as follows:

$$Z = \frac{X - \mu}{\sigma}$$

Z refers to the standardized data, X refers to the original data, μ refers to the mean, σ refers to the standard deviation. Therefore, new outcome points (x = standardized incremental costs, y = standardized incremental QALYs) were obtained after the standardization.

To measure the discrete degree between the ICER points and the reference point, the Euclidean Distance was used (see following formula).

$$\mathrm{dist} = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$$

dist refers to the Euclidean Distance between two points, x and y refer to the coordinate of points. The mean and standard deviation of these calculated distances were then determined. The larger mean and the wider standard deviation indicated a more discrete degree of the results. Note that this study did not evaluate parameter uncertainty because we only focused on the uncertainty caused by the choice of extrapolation models.

3 Results

3.1 Assessment of Fit and Extrapolation Performance Among Different Models

Figure 2 shows the visualized results of fit and extrapolation performance among different models. Detailed results are shown in Supplementary file 1 Tables 2–4 (including MSE, estimated log-likelihood, AIC, and coefficients) (see ESM).

3.1.1 Goodness-of-Fit (GOF)

Smoothed hazard plots and survival plots based on observed and modeled data are shown as S1 Figs. 1–2 and 5–6 (see ESM). Visually, almost all the models provided a good fit of the observed hazard data for both 3-year and 6.5-year OS data (S1 Figs. 2 and 6, see ESM), although there were differences in the extent to which local fluctuations were captured (S1 Figs. 1 and 5, see ESM). However, most models failed to capture the steep descents in the early stages of PFS. They either underestimated or overestimated the survival rate, which was particularly obvious in I-PFS (progression-free survival data of ipilimumab) (S1 Figs. 2 and 6, see ESM).

3.1.2 GOF Under Flexible Techniques and Data Maturity

According to Fig. 2, the average MSE of flexible modeling techniques was less than that of standard parametric models, which indicated that flexible modeling techniques had better GOF. The bias of 6.5-year modeled data with standard parametric models was greater than that of 3-year modeled data. However, it was the opposite in the flexible techniques. For specific models, RP models performed well when fitting the data (MSE always ranked top 3). To compare between model groups, the 6.5-year data fit group had a smaller MSE than the 3-year data fit group. This indicated that GOF could be improved by modeling with longer follow-up data.

3.1.3 Goodness of Extrapolation

More variation could be seen in the extrapolated parts of the survival curves (S1 Fig. 3, see ESM). It was found that more flexible techniques outperformed standard parametric models on average. Gompertz and FP models performed well for extrapolation. Interestingly, it could be observed in Fig. 2 (all models included) that the top-ranked models in the 3-year data extrapolate group were different from those in the 3-year data fit group. This indicated that models with best fit are not always the ones with best extrapolation performance. Notably, according to Fig. 2 (top five models included), although we provided the models that rank top for each comparison, several models had similar MSE results and the statistical difference among them could not be assessed.

3.2 External Validation

Details of external validation are shown in Supplementary file 2 (see ESM). MSE results of different models are shown in S2 Table 1 and survival plots are shown in S2 Fig. 2. By comparing the modeled I-OS data with 10-year external data, we found that RCS and GAM, which performed well in extrapolating the 3-year data for longer horizons, also showed a better performance when validated by the external data. This indicated that RCS and GAM might provide a good long-term extrapolation performance. However, second-order FP, which showed the best performance in extrapolating 3-year data to 6.5 years, had a poor performance in external validation due to overfit. In addition, it was hard to tell which model performed better without external data since the GOF statistics were close (MSE of six models had a difference within 1).

3.3 Economic Evaluation Results

The process of economic evaluation is given in Supplement 4 (see ESM). For a total of 81 potential modeled curves,

Fig. 2  Visualized results of fit and extrapolation performance among different models. ‘3-year data fit’ means that this MSE is calculated by fitted 3-year data and original 3-year data; ‘3-year data extrapolate’ means that this MSE is calculated by extrapolated 6.5-year data and original 6.5-year data; ‘6.5-year data fit’ means that this MSE is calculated by fitted 6.5-year data and original 6.5-year data. Current MSE value = original MSE value * 10,000. The lower the point is located, the lower the MSE value it presents, meaning the model had better performance. Exp exponential, FP fractional polynomial, GAM generalized additive models, gengamma generalized gamma, IPI ipilimumab, lnorm lognormal, llogis log-logistic, mix-cure mixture cure model, MSE mean squared errors, NIV nivolumab plus ipilimumab, OS overall survival, param-mix parametric mixture model, PFS progression-free survival, RCS restricted cubic spline models, RP Royston-Parmar models
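The indicators plotted in Fig. 2 follow the AIC, MSE, and bias definitions given in Section 2.2. As a minimal Python sketch (not the authors' R implementation), the functions below compute them and apply the ×10,000 scaling mentioned in the caption. The survival probabilities, the yearly time grid, and the choice to score extrapolation only on points after year 3 are illustrative assumptions.

```python
from typing import Sequence


def _check(observed: Sequence[float], predicted: Sequence[float]) -> None:
    if len(observed) == 0 or len(observed) != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equally long")


def mse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean squared error: (1/n) * sum((Y_i - Yhat_i) ** 2)."""
    _check(observed, predicted)
    return sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted)) / len(observed)


def bias(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean signed error: (1/n) * sum(Y_i - Yhat_i)."""
    _check(observed, predicted)
    return sum(y - y_hat for y, y_hat in zip(observed, predicted)) / len(observed)


def aic(log_likelihood: float, n_params: int) -> float:
    """AIC = -2 log L + 2k."""
    return -2.0 * log_likelihood + 2.0 * n_params


# Hypothetical survival probabilities at years 1..6 (not trial data).
observed = [0.80, 0.65, 0.55, 0.50, 0.47, 0.45]
predicted = [0.78, 0.66, 0.56, 0.46, 0.41, 0.37]

fit_window = slice(0, 3)     # years 1-3: within-sample fit
extrap_window = slice(3, 6)  # years 4-6: extrapolated region

mse_fit = mse(observed[fit_window], predicted[fit_window])
bias_fit = bias(observed[fit_window], predicted[fit_window])
mse_extrap = mse(observed[extrap_window], predicted[extrap_window])
bias_extrap = bias(observed[extrap_window], predicted[extrap_window])

scaled_fit = 10_000 * mse_fit        # scaling used for display in Fig. 2
scaled_extrap = 10_000 * mse_extrap
```

In this made-up example the model fits the first three years closely but drifts in the extrapolated region, which mirrors the paper's point that a good within-sample fit does not guarantee good extrapolation.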

45 were included with 3-year fitted data combined with a 20-year extrapolated data, and 54 models were included with extrapolated data from the beginning. Table 1 shows the summary results of the impacts of model selection on economic evaluation. According to Table 1, it was obvious that the estimated ICER varied by the simulation time and model selection. However, it was hard to evaluate the association between them. The 3-year data fit group had the largest summed mean of distances away from reference ICERs regardless of simulation time, while the 6.5-year data fit group had the smallest. The 3-year data extrapolate group performed almost the same as the 3-year data fit group when the simulation time was set to 6.5 years. However, with a 20-year horizon, the 3-year data extrapolate group performed better than the 3-year data fit group. However, no significant statistical difference could be observed among the three groups.

Tornado diagrams are provided in Supplementary file 4, Figs. 1–6 (see ESM). Based on S4 Figs. 1–6, we found that the model choice for a specific survival curve might lead to significant changes in estimated ICER (e.g., selecting the RP-hazard model in I-OS always brought huge fluctuations in estimations). Standardized ICERs for three groups with different study horizons are shown in Fig. 3. According to

Table 1  Summary results of the impacts of model selection on economic evaluation

| Simulation time | Model group | Variability of ICER: best model | Variability of ICER: ICER of the best model | Variability of ICER: range of differences in ICERs | Discrete degree of ICER: mean | Discrete degree of ICER: SD | Discrete degree of ICER: reference model |
|---|---|---|---|---|---|---|---|
| 6.5 years | 6.5-year data fit (n = 81) | Mixcure / RP_normal / RP_hazard / RP.hazard | 57812.02803 | [−368.23, 32030.62] | 1.66 | 1.02 | RCS / RP_normal / RP_normal / RP.hazard |
| 6.5 years | 3-year data fit (n = 81) | Gompertz / RP_normal / RP_odds / FP2 | 73801.48638 | [−14075.37, 7586.33] | 2.07 | 0.79 | GAM / RP_normal / RP_odds / FP2 |
| 6.5 years | 3-year data extrapolate (n = 54) | Mixcure / GAM / FP2 / FP2 | 53733.05215 | [−9779.14, 21241.05] | 2.04 | 1.14 | Gompertz / GAM / GAM / Gompertz |
| 20 years | 6.5-year data fit (n = 81) | Mixcure / RP_normal / RP_hazard / RP.hazard | −16154.67858 | [−2867.83, 17754.82] | 1.85 | 0.69 | RCS / RP_odds / RP_normal / GAM |
| 20 years | 3-year data fit (n = 45) | Gompertz / RP_normal / RP_hazard / FP2 | 23201.78262 | [−15501.65, 12899.22] | 2.47 | 1.00 | FP2 / GAM / RP_normal / Gompertz |
| 20 years | 3-year data extrapolate (n = 54) | Mixcure / GAM / FP2 / FP2 | −87767.92543 | [−5627.76, 93277.87] | 2.00 | 1.33 | Mixcure / GAM / GAM / Gompertz |

For variability of ICER, the best models were selected according to the MSE for each curve. The range of differences in ICERs was calculated to reflect the upper and lower range of the differences between modeled ICERs and the best model
For the discrete degree of ICER, the mean and SD were calculated by the distances between result points and the reference point. The reference point was the point that had the closest distance to all the other points. Result points were standardized from incremental costs and incremental QALYs calculated by the economic evaluations. Best models follow the order ‘NI-PFS, NI-OS, I-PFS, I-OS’ to correspond to the survival curves. 3-year data was considered immature. 6.5-year data was considered mature
FP fractional polynomial, GAM generalized additive models, ICER incremental cost-effectiveness ratio, I-OS overall survival data of ipilimumab, I-PFS progression-free survival data of ipilimumab, MSE mean squared errors, mixcure mixture cure model, NI-OS overall survival data of nivolumab plus ipilimumab, NI-PFS progression-free survival data of nivolumab plus ipilimumab, QALYs quality-adjusted life-years, RCS restricted cubic spline models, RP Royston-Parmar models, SD standard deviation
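The group sizes (n) and the "range of differences in ICERs" column in Table 1 come from enumerating model combinations, dropping clinically unrealistic ones, and comparing each remaining ICER with that of the best combination. The Python sketch below illustrates that logic under stated assumptions: three placeholder candidates (`m1` to `m3`) per curve, a single check time for the PFS ≤ OS rule, invented survival and cost-effectiveness numbers, and a sign convention of modeled ICER minus best-model ICER. None of these values or names come from the study.

```python
from itertools import product

CURVES = ("NI-PFS", "NI-OS", "I-PFS", "I-OS")
MODELS = ("m1", "m2", "m3")  # placeholder labels, three candidates per curve

# Hypothetical survival probabilities at one check time (not study results).
surv_at_check = {
    "NI-PFS": {"m1": 0.36, "m2": 0.38, "m3": 0.52},
    "NI-OS": {"m1": 0.50, "m2": 0.53, "m3": 0.55},
    "I-PFS": {"m1": 0.08, "m2": 0.10, "m3": 0.12},
    "I-OS": {"m1": 0.22, "m2": 0.25, "m3": 0.11},
}


def is_plausible(combo):
    """Keep a combination only if PFS does not exceed OS in either arm."""
    s = {curve: surv_at_check[curve][model] for curve, model in zip(CURVES, combo)}
    return s["NI-PFS"] <= s["NI-OS"] and s["I-PFS"] <= s["I-OS"]


all_combos = list(product(MODELS, repeat=len(CURVES)))  # 3 ** 4 combinations
kept = [combo for combo in all_combos if is_plausible(combo)]


def icer(incremental_cost: float, incremental_qaly: float) -> float:
    """Incremental cost-effectiveness ratio."""
    return incremental_cost / incremental_qaly


# Hypothetical (incremental cost, incremental QALYs) for three combinations.
results = {"best": (60000.0, 1.0), "alt_a": (66000.0, 1.2), "alt_b": (45000.0, 0.6)}
icers = {name: icer(*pair) for name, pair in results.items()}

# Differences relative to the best-fitting combination, then their range.
diffs = [value - icers["best"] for name, value in icers.items() if name != "best"]
range_of_differences = (min(diffs), max(diffs))
```

A real analysis would compare whole curves over the model horizon rather than a single time point, which is one plausible reason why fewer combinations survive at the 20-year horizon than at 6.5 years.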

Fig. 3, more discrete results could be observed when simulation time progressed. The 3-year data fit group appeared more scattered than the other two groups regardless of the simulation time, and the 3-year data extrapolate group appeared less scattered than the 6.5-year data fit group. A possible reason was that many groups of models were excluded from the 3-year data extrapolate group.

4 Discussion

In this study, we explored the effect of modeling technique selection on fitting and extrapolation of survival curves through a case study of cancer immunotherapy. A simplified partitioned survival model was constructed to evaluate the impacts of model selection on the structural uncertainties in economic evaluation, including the variability of estimated ICER and the discrete degree of ICER.

Model selection could influence the prediction of survival outcomes, leading to the uncertainty of economic evaluation. Based on our results targeted on 3-year data, we found that models selected only based on GOF statistics did not show a superior MSE when validated by the 6.5-year original data, and the economic evaluation showed that the results from choosing models through GOF were more discrete than choosing models through goodness of extrapolation. This showed that selecting survival models based only on GOF statistics was unsuitable and might lead to biased cost-effectiveness results. An alternative approach was to search for external evidence [17, 26]. In our case study, models with good extrapolation performance could be identified through external validation. In addition, over-fitted models could also be identified. A recently published guide has pointed out the potential available sources of external evidence (e.g. long-term survival data of the same products used in the same indication or more mature data from the same products but used in a later line of treatment for the same disease) [19]. However, despite several available approaches [26, 34, 35], a standard approach to using external evidence still needs future studies. A necessary point that should be highlighted is that although researchers could identify the best model through statistical indicators, there still existed uncertainty in estimated ICER under different model choices and study horizons. All these results should be reported in an economic evaluation of cancer immunotherapy to show the structural uncertainty [19] (e.g., tornado diagrams).

Fig. 3  Standardized economic evaluation result points for three groups of models under different simulation times. Red points refer to the reference points. Reference points were selected as the point with the closest distance to all the other points. Result points were standardized from incremental costs and incremental QALYs calculated in the economic evaluations. Black dashed lines represent the line y = 0. 3-year data was considered immature. 6.5-year data was considered mature. The three groups of models refer to (1) models with best fit of 3-year data, (2) models with best extrapolation performance of 3-year data, and (3) models with best fit of 6.5-year data. ICER incremental cost-effectiveness ratio, QALYs quality-adjusted life-years
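The standardized points and reference points shown in Fig. 3 follow the two-step procedure of Section 2.3: z-score standardization of incremental costs and incremental QALYs, then Euclidean distances to a reference point. A minimal Python sketch of those steps is given below. The input values are hypothetical, and two details are assumptions not stated in the text: the population standard deviation is used for standardization, and the reference point itself is left out when summarizing the distances.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def standardize(values: Sequence[float]) -> List[float]:
    """Z = (X - mu) / sigma, using the population SD (an assumption)."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


def euclidean(p: Point, q: Point) -> float:
    """dist = sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2)."""
    return math.sqrt((q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)


def find_reference(points: Sequence[Point]) -> int:
    """Index of the point with the smallest summed distance to all others."""
    totals = [sum(euclidean(p, q) for q in points) for p in points]
    return totals.index(min(totals))


# Hypothetical incremental costs and QALYs for five model combinations.
inc_costs = [50000.0, 60000.0, 70000.0, 80000.0, 90000.0]
inc_qalys = [1.0, 1.3, 1.2, 1.1, 1.4]

# Step 1: standardize each outcome to get points (x = cost, y = QALY).
points = list(zip(standardize(inc_costs), standardize(inc_qalys)))

# Step 2: pick the reference point and measure the distance of every other point.
ref = find_reference(points)
distances = [euclidean(points[ref], p) for i, p in enumerate(points) if i != ref]

mean_distance = mean(distances)  # larger mean -> more dispersed results
sd_distance = pstdev(distances)  # wider SD -> more dispersed results
```

Standardizing first puts costs and QALYs on a common scale, so that the distance is not dominated by the outcome with the larger numerical range.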

Data maturity could also influence the survival outcomes and economic outcomes. Our findings showed that the estimated ICERs calculated from immature data appeared more discrete than those from mature data. This indicated that different model selections based on immature data brought more uncertainty. Unfortunately, a high proportion of current cost-effectiveness analyses for cancer immunotherapy were conducted based on immature data. Although using flexible techniques can be helpful in reducing the uncertainty of capturing complex survival hazards, few studies take a full model choice into consideration. A recent systematic review of French health technology assessment (HTA) reports indicated that only one study applied a flexible technique among 11 assessed targeted cancer immunotherapies [36]. Although using external evidence could be helpful when dealing with immature data [17, 19], results generated from the immature data should be carefully considered for decision making because of unaddressed uncertainties.

Our study validated several peer studies and guidelines [4, 6–15, 18]. Firstly, models with the best GOF might not necessarily provide improved extrapolation performance. Secondly, extrapolation uncertainty would be raised with prolonged model horizon. Thirdly, external evidence can be helpful to validate the model choice, especially when dealing with immature data. Among previous studies, two have already compared the different extrapolation models through the case study of Checkmate 067 [4, 18]. Gibson et al. compared RCS with standard parametric models and found that RCS performed better in modeling PFS [4]. Federico et al. included six survival models to fit the OS from different data cuts [18]. They both found that survival models explicitly incorporating survival heterogeneity showed greater accuracy for earlier data cuts than standard parametric models. However, the two studies only focused on either PFS or OS outcomes. Our study further explored the impacts of model selection on economic evaluation. Our study further proved that estimated ICERs could show great variability with model choices and horizons. We also present this kind of structural uncertainty in a visualized and quantitative way to make it easier to understand. Finally, we suggest that sometimes researchers might lack evidence to select the ‘best’ model, so reporting the uncertainty is recommended.

However, this study is not without limitations. First, the lack of original IPD and the use of reconstructed IPD might introduce some bias, although one study showed that using reconstructed IPD had little influence on economic evaluation results [3]. Second, we considered the effects of different models and simulation time in our economic evaluation; however, we ignored the impact of sample size and the parameters. In other words, the sample size and the parameters were held constant in our analysis. Third, the economic model we used was simplified based on published studies. The original model was a Markov model [32], whereas we used a partitioned survival model. Deficiencies in model structures and model assumptions might bias the cost-effectiveness results. Thus, we only focused on the uncertainty of the results instead of their practical significance. Finally, using a single case study might also be viewed as a limitation. Further studies that include more cancer immunotherapies and more cancer types could help to test the generalizability of our findings.

5 Conclusions

Flexible techniques present better performance in the case of Checkmate 067 regardless of data maturity. Model selection matters for the ICERs of cancer immunotherapy, especially when dealing with immature survival data. Finally, in the usual case where researchers lack evidence to identify the ‘right’ model, a recommended approach is to identify and report these structural uncertainties even when external data could help to exclude some of the models considered.

Supplementary Information  The online version contains supplementary material available at https://doi.org/10.1007/s41669-023-00391-5.

Declarations

Funding  General Program of National Natural Science Foundation of China (72174207).

Consent  Not applicable.

Ethics approval  Not applicable.

Conflicts of interest  All authors declared that they have no conflict of interest.

Availability of data and material  All data generated or analyzed during this study are included in this published article and supplementary files.

Code availability  R codes for this study are available on GitHub (https://github.com/TaihangShao/uncertainty-of-CEA-flexible-extrapolation-techniques)

Author contributions  Conceptualization: all authors; methodology: TS and MZ; formal analysis and investigation: TS and MZ; writing: original draft preparation: TS, MZ and LL; writing: review and editing: WT and LS; funding acquisition: WT; resources: WT; supervision: WT and LS.

Open Access  This article is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License, which permits any non-commercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc/4.0/.

References

1. Ishak KJ, Kreif N, Benedict A, Muszbek N. Overview of parametric survival analysis for health-economic applications. Pharmacoeconomics. 2013;31(8):663–75.
2. The National Institute for Health and Care Excellence. NICE DSU technical support document 14: Survival analysis for economic evaluations alongside clinical trials—extrapolation with patient-level data. 2022. https://www.sheffield.ac.uk/sites/default/files/2022-02/TSD14-Survival-analysis.updated-March-2013.v2.pdf. Accessed 3 Apr 2022.
3. Djalalov S, Beca J, Ewara EM, Hoch JS. A comparison of different analysis methods for reconstructed survival data to inform cost-effectiveness analysis. Pharmacoeconomics. 2019;37(12):1525–36.
4. Gibson E, Koblbauer I, Begum N, Dranitsaris G, Liew D, McEwan P, et al. Modelling the survival outcomes of immuno-oncology drugs in economic evaluations: a systematic approach to data analysis and extrapolation. Pharmacoeconomics. 2017;35(12):1257–70.
5. Crowther MJ, Lambert PC. A general framework for parametric survival analysis. Stat Med. 2014;33(30):5280–97.
6. Kearns B, Stevenson MD, Triantafyllopoulos K, Manca A. Generalized linear models for flexible parametric modeling of the hazard function. Med Decis Making. 2019;39(7):867–78.
7. Klijn SL, Fenwick E, Kroep S, Johannesen K, Malcolm B, Kurt M, et al. What did time tell us? A comparison and retrospective validation of different survival extrapolation methods for immuno-oncologic therapy in advanced or metastatic renal cell carcinoma. Pharmacoeconomics. 2021;39(3):345–56.
8. Kearns B, Stevenson MD, Triantafyllopoulos K, Manca A. The extrapolation performance of survival models for data with a cure fraction: a simulation study. Value Health. 2021;24(11):1634–42.
9. Su D, Wu B, Shi L. Cost-effectiveness of atezolizumab plus bevacizumab vs sorafenib as first-line treatment of unresectable hepatocellular carcinoma. JAMA Netw Open. 2021;4(2):e210037.
10. Whittington MD, McQueen RB, Ollendorf DA, Kumar VM, Chapman RH, Tice JA, et al. Long-term survival and cost-effectiveness associated with axicabtagene ciloleucel vs chemotherapy for treatment of B-Cell lymphoma. JAMA Netw Open. 2019;2(2):e190035.
11. Gallacher D, Kimani P, Stallard N. Extrapolating parametric survival models in health technology assessment: a simulation study. Med Decis Making. 2021;41(1):37–50.

12. Gallacher D, Kimani P, Stallard N. Extrapolating parametric survival models in health technology assessment using model averaging: a simulation study. Med Decis Making. 2021;41(4):476–84.
13. Gray J, Sullivan T, Latimer NR, Salter A, Sorich MJ, Ward RL, et al. Extrapolation of survival curves using standard parametric models and flexible parametric spline models: comparisons in large registry cohorts with advanced cancer. Med Decis Making. 2021;41(2):179–93.
14. Grant TS, Burns D, Kiff C, Lee D. A case study examining the usefulness of cure modelling for the prediction of survival based on data maturity. Pharmacoeconomics. 2020;38(4):385–95.
15. Kearns B, Stevenson MD, Triantafyllopoulos K, Manca A. Comparing current and emerging practice models for the extrapolation of survival data: a simulation study and case-study. BMC Med Res Methodol. 2021. https://doi.org/10.1186/s12874-021-01460-1.
16. The National Institute for Health and Care Excellence. Guide to the methods of technology appraisal 2013. https://www.nice.org.uk/process/pmg9/chapter/foreword. Accessed 3 Apr 2022.
17. The National Institute for Health and Care Excellence. NICE DSU technical support document 21: Flexible Methods for Survival Analysis. 2022. https://www.sheffield.ac.uk/sites/default/files/2022-02/TSD21-Flex-Surv-TSD-21_Final_alt_text.pdf. Accessed 3 Apr 2022.
18. Federico PV, Kurt M, Zhang L, Butler MO, Michielin O, Amadi A, et al. Heterogeneity in survival with immune checkpoint inhibitors and its implications for survival extrapolations: a case study in advanced melanoma. MDM Policy Pract. 2022;7(1):97836411.
19. Palmer S, Borget I, Friede T, Husereau D, Karnon J, Kearns B, et al. A guide to selecting flexible survival models to inform economic evaluations of cancer immunotherapies. Value Health. 2023;26(2):185–92. https://doi.org/10.1016/j.jval.2022.07.009.
20. Bullement A, Latimer NR, Bell GH. Survival extrapolation in cancer immunotherapy: a validation-based case study. Value Health. 2019;22(3):276–83.
21. Cooper M, Smith S, Williams T, Aguiar-Ibanez R. How accurate are the longer-term projections of overall survival for cancer immunotherapy for standard versus more flexible parametric extrapolation methods? J Med Econ. 2022;25(1):260–73.
22. Filleron T, Bachelier M, Mazieres J, Perol M, Meyer N, Martin E, et al. Assessment of treatment effects and long-term benefits in immune checkpoint inhibitor trials using the flexible parametric cure model: a systematic review. JAMA Netw Open. 2021;4(12):e2139573.
23. Wolchok JD, Chiarion-Sileni V, Gonzalez R, Grob JJ, Rutkowski P, Lao CD, et al. Long-term outcomes with nivolumab plus ipilimumab or nivolumab alone versus ipilimumab in patients with advanced melanoma. J Clin Oncol. 2022;40(2):127–37.
24. Wolchok JD, Chiarion-Sileni V, Gonzalez R, Rutkowski P, Grob JJ, Cowey CL, et al. Overall survival with combined nivolumab and ipilimumab in advanced melanoma. N Engl J Med. 2017;377(14):1345–56.
25. The National Institute for Health and Care Excellence. CHTE2020 sources and synthesis of evidence; update to evidence synthesis methods. 2020. https://nicedsu.sites.sheffield.ac.uk/methods-development/chte2020-sources-and-synthesis-of-evidence. Accessed 14 Apr 2022.
26. Guyot P, Ades AE, Beasley M, Lueza B, Pignon JP, Welton NJ. Extrapolation of survival curves from cancer trials using external information. Med Decis Making. 2017;37(4):353–66.
27. Guyot P, Ades AE, Ouwens MJ, Welton NJ. Enhanced secondary analysis of survival data: reconstructing the data from published Kaplan-Meier survival curves. BMC Med Res Methodol. 2012;12:9.
28. Jørgensen SE. Model Selection and Multimodel Inference. Ecol Model. 2004.
29. Liu XR, Pawitan Y, Clements M. Parametric and penalized generalized survival models. Stat Methods Med Res. 2018;27(5):1531–46.
30. Schadendorf D, Hodi FS, Robert C, Weber JS, Margolin K, Hamid O, et al. Pooled analysis of long-term survival data from phase II and phase III trials of ipilimumab in unresectable or metastatic melanoma. J Clin Oncol. 2015;33(17):1889–94.
31. Gorry C, McCullagh L, Barry M. Economic evaluation of systemic treatments for advanced melanoma: a systematic review. Value Health. 2020;23(1):52–60.
32. Kohn CG, Zeichner SB, Chen Q, Montero AJ, Goldstein DA, Flowers CR. Cost-effectiveness of immune checkpoint inhibition in BRAF wild-type advanced melanoma. J Clin Oncol. 2017;35(11):1194–202.
33. Bensimon AG, Zhou ZY, Jenkins M, Song Y, Gao W, Signorovitch J, et al. Cost-effectiveness of pembrolizumab for the adjuvant treatment of resected high-risk stage III melanoma in the United States. J Med Econ. 2019;22(10):981–93.
34. Jackson C, Stevens J, Ren S, Latimer N, Bojke L, Manca A, et al. Extrapolating survival from randomized trials using external data: a review of methods. Med Decis Making. 2017;37(4):377–90.
35. Soikkeli F, Hashim M, Ouwens M, Postma M, Heeg B. Extrapolating survival data using historical trial-based a priori distributions. Value Health. 2019;22(9):1012–7.
36. Grumberg V, Roze S, Chevalier J, Borrill J, Gaudin AF, Branchoux S. A review of overall survival extrapolations of immune-checkpoint inhibitors used in health technology assessments by the French health authorities. Int J Technol Assess Health Care. 2022;38(1):e28.
