
British Journal of Cancer www.nature.com/bjc

ARTICLE
Epidemiology

Multistate models for the natural history of cancer progression


Li C. Cheung1✉, Paul S. Albert1, Shrutikona Das1 and Richard J. Cook2

This is a U.S. Government work and not under copyright protection in the US; foreign copyright protection may apply 2022

BACKGROUND: Multistate models can be effectively used to characterise the natural history of cancer. Inference from such models
has previously been useful for setting screening policies.
METHODS: We introduce the basic elements of multistate models and the challenges of applying these models to cancer data. Through simulation studies, we examine (1) the impact of assuming time-homogeneous Markov transition intensities when the intensities depend on the time since entry to the current state (i.e., the process is time-inhomogeneous semi-Markov) and (2) the effect on precancer risk estimation when observation times depend on an unmodelled intermediate disease state.
RESULTS: In the settings we examined, misspecifying a time-inhomogeneous semi-Markov process as a time-homogeneous Markov process resulted in biased estimates of the mean sojourn times. When screen-detection of the intermediate disease led to more frequent future screening assessments, the bias was minimal compared with when screen-detection of the intermediate disease led to less frequent screening.
CONCLUSIONS: Multistate models are useful for estimating parameters governing the process dynamics in cancer, such as transition rates, sojourn time distributions, and absolute and relative risks. As with most statistical models, care should be taken to use appropriate specifications and assumptions to avoid incorrect inference.
British Journal of Cancer (2022) 127:1279–1288; https://doi.org/10.1038/s41416-022-01904-5

BACKGROUND
Multistate models characterise the movement of individuals through successive states in a disease process. These are often used to describe the occurrence of complications and mortality in patients receiving treatment for cancer [1–3] and other diseases [4, 5]. They have also proven useful in describing the natural history of diseases, including, for example, the progression through different risk strata for cardiovascular disease [6], the impact of alcohol consumption on dementia and the mediating role of cardiometabolic disease [7], and the transition of individuals through progressively advanced states of retinopathy among diabetics [8].

The notion of cancer as a multistate process originated in Armitage and Doll's theory of carcinogenesis [9], which conceptualises the cancer process as a series of mutational events leading to malignancy. Since then, the clonal evolution of cancer has been demonstrated repeatedly [10, 11], and several clinically identifiable cancer precursor states have been discovered [12]. Multistate models characterising the transitions across healthy, preclinical, and clinical cancer states have been used in breast [13–22], colorectal [23–26], cervical [27–30], liver [31], lung [32], prostate [33–35] and gastric [36] cancers. In cancers caused by viruses, such as cervical (human papillomavirus (HPV)) and nasopharyngeal (Epstein–Barr) cancers, it has been useful to include an infection state in the multistate process [28, 30, 37]. The availability of large-scale cohorts featuring longitudinal multi-omics data [38] means that even richer classes of models for the natural history of cancer can be developed.

In this paper, we introduce the basic elements of multistate models and how they can be used, describe some of the methodological challenges unique to studies involving cancer and, through simulation studies, examine two modelling challenges that can impact inferences aiming to inform cancer screening practice. In the first simulation study, we investigate the impact of misspecifying transition intensities on inferences about sojourn times in disease states. In the second simulation study, we target the absolute risk of disease based on a two-state survival model but examine the impact of an intermittent observation process that depends on an intermediate disease state.

METHODS
Introduction to multistate models
Figure 1a displays a general K-state model in which arrows depict transitions that can be made directly; here they can be made from each state to any other state, but many models feature restrictions. The two-state survival model (Fig. 1b), competing risk model (Fig. 1c) and illness-death model (Fig. 1d) are examples of useful multistate models that have been widely studied and applied in cancer research. We let X(t) denote the state occupied by an individual at age t, taking one of the values 1, 2, …, K, and let $\mathcal{H}(t) = \{X(s), 0 \le s \le t\}$ be the history of state occupancy up to age t, t > 0. We consider a set of fixed attributes recorded in a covariate vector z. The risk of movement between states i and j may be characterised

1Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD, USA. 2Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, ON, Canada. ✉email: li.cheung@nih.gov

Received: 10 March 2022 Revised: 21 June 2022 Accepted: 28 June 2022


Published online: 11 July 2022

Published on Behalf of CRUK


L.C. Cheung et al.
1280
by transition intensities

$$\lambda_{ij}(t \mid \mathcal{H}(t^-), z) = \lim_{\Delta t \to 0} \frac{P\{X(t+\Delta t) = j \mid X(t^-) = i, \mathcal{H}(t^-), z\}}{\Delta t}$$

for j ≠ i, i, j = 1, …, K, where $\mathcal{H}(t^-)$ denotes the history of state occupancy up to, but not including, age t. For Markov models, transition intensities depend on the history only through the state currently occupied (i.e., $\lambda_{ij}(t \mid \mathcal{H}(t^-), z) = \lambda_{ij}(t \mid z)$), while for semi-Markov models, transition intensities depend on s, the time since entry to the current state (i.e., $\lambda_{ij}(t \mid \mathcal{H}(t^-), z) = \lambda_{ij}(s, z)$). For both Markov and semi-Markov models, there can be trends in the transition intensities whereby the risk of transition changes as a function of the basic timescale. Models wherein covariates affect transition intensities multiplicatively (i.e., $\lambda_{ij}(t \mid \mathcal{H}(t^-), z) = \lambda_{ij0}(t \mid \mathcal{H}(t^-)) \exp(z'\beta_{ij})$) are most common, but additive models with $\lambda_{ij}(t \mid \mathcal{H}(t^-), z) = \lambda_{ij0}(t \mid \mathcal{H}(t^-)) + z'\beta_{ij}$ are also possible [39], with $\lambda_{ij0}(t \mid \mathcal{H}(t^-))$ a baseline intensity characterising the risk of transition for an individual with covariates equal to the reference level. The transition intensities can be arranged in the form of a K × K matrix Q whose off-diagonal entries are the intensities and whose diagonal entries are set so that each row sums to zero; we show in the Web Appendix how this can be used to calculate transition probabilities [40].

Misclassification of states may arise from imperfect laboratory, screening
or diagnostic tests. In such cases (for example, see Fig. 1e), the underlying multistate process is often assumed to be governed by a set of transition intensities, but given this latent process, the observed state is governed by a matrix of misclassification probabilities E. A hidden multistate model is fitted by finding the values of the unknown parameters in Q and E that maximise the likelihood, defined as proportional to the probability of the observed path; hidden Markov models are the most commonly adopted in this setting.
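A minimal Python sketch (assuming NumPy and SciPy) of how Q yields transition probabilities for a time-homogeneous Markov process, via the standard relation P(t) = exp(Qt), and how a misclassification matrix E maps latent-state probabilities to observed-state probabilities. The intensities are the z = 0 values implied by the true parameters listed in the Table 2 footnotes and serve only as an illustration; the matrix E, in which a cancer precursor is detected with probability 0.7, is a hypothetical assumption and not a value from this study.

```python
import numpy as np
from scipy.linalg import expm

# States of the model in Fig. 2a, 0-based here: 0 = healthy, 1 = cancer precursor,
# 2 = clinical cancer, 3 = death (the text numbers them 1-4).
# Illustrative intensities at z = 0, implied by the true parameters in the Table 2 footnotes.
q12, q14 = np.exp(-5.5), np.exp(-4.0)
q23, q24 = np.exp(-3.0), np.exp(-4.0)
q34 = 1.0  # exp(b34) = 1

Q = np.array([
    [0.0, q12, 0.0, q14],
    [0.0, 0.0, q23, q24],
    [0.0, 0.0, 0.0, q34],
    [0.0, 0.0, 0.0, 0.0],  # death is absorbing
])
np.fill_diagonal(Q, -Q.sum(axis=1))  # diagonal entries make each row sum to zero


def transition_probs(Q, t):
    """P(t) = expm(Q t); entry (i, j) is P(X(t) = j | X(0) = i) for a time-homogeneous Markov process."""
    return expm(Q * t)


# Hypothetical misclassification matrix, E[i, k] = P(observed state k | latent state i):
# a precursor is detected with probability 0.7 and otherwise recorded as healthy.
E = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.3, 0.7, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])

P10 = transition_probs(Q, 10.0)
observed = P10 @ E  # row i: observed-state distribution at t = 10 given latent state i at t = 0

print(np.round(P10[0], 4))       # latent-state probabilities after 10 time units, starting healthy
print(np.round(observed[0], 4))  # corresponding observed-state probabilities
```

Because death is absorbing, the last row of P(t) stays at (0, 0, 0, 1) for every t, and each row of P(t)E remains a probability distribution because every row of E sums to one.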
For multistate models geared toward modelling natural history, the likelihood construction may be further complicated by intermittent observation of individuals and the consequence that the exact transition times between healthy and preclinical states are unknown. Moreover, individuals who die or are censored “cancer-free” could have been in any healthy/preclinical state prior to death or censoring. Because of these complications, it is computationally convenient to adopt time-homogeneous Markov models (i.e., $\lambda_{ij}(t \mid z) = \lambda_{ij}(z)$), but this assumption may not always be appropriate for cancer.
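The sketch below illustrates, under stated assumptions, how these complications enter the likelihood for the four-state model in Fig. 2a when it is treated as a time-homogeneous Markov process: each pair of consecutive assessments contributes an entry of P(Δt) = exp(QΔt), and for an individual who dies or is censored cancer-free after the last assessment, the unknown healthy/precursor state is summed over. The helper names (`make_Q`, `loglik_panel`), the data layout, and the rates are hypothetical; this is the generic panel-data construction, not necessarily the exact likelihood given in the Web Appendix.

```python
import numpy as np
from scipy.linalg import expm

# State indices for the four-state model of Fig. 2a (0-based here; the text numbers them 1-4).
HEALTHY, PRECURSOR, CANCER, DEATH = 0, 1, 2, 3


def make_Q(q12, q14, q23, q24, q34):
    """Intensity matrix for healthy -> precursor -> clinical cancer, with death from every state."""
    Q = np.zeros((4, 4))
    Q[HEALTHY, PRECURSOR], Q[HEALTHY, DEATH] = q12, q14
    Q[PRECURSOR, CANCER], Q[PRECURSOR, DEATH] = q23, q24
    Q[CANCER, DEATH] = q34
    np.fill_diagonal(Q, -Q.sum(axis=1))  # rows sum to zero
    return Q


def loglik_panel(Q, visit_times, visit_states, end_time, end_type):
    """Hypothetical helper: log-likelihood contribution of one individual never diagnosed with cancer.

    visit_times, visit_states: ages and screen-assessed states (HEALTHY or PRECURSOR).
    end_type: "censored" (alive and cancer-free at end_time) or
              "death" (died at end_time without a cancer diagnosis).
    """
    ll = 0.0
    # Intermittent observation: only the states at consecutive visits are known.
    for k in range(1, len(visit_times)):
        dt = visit_times[k] - visit_times[k - 1]
        ll += np.log(expm(Q * dt)[visit_states[k - 1], visit_states[k]])

    # After the last visit, the individual may be healthy or in the precursor state.
    p_last = expm(Q * (end_time - visit_times[-1]))[visit_states[-1]]
    unknown = [HEALTHY, PRECURSOR]
    if end_type == "censored":
        ll += np.log(p_last[unknown].sum())
    elif end_type == "death":
        # Sum over the unobserved state just before death, times the death intensity from it.
        ll += np.log(sum(p_last[j] * Q[j, DEATH] for j in unknown))
    else:
        raise ValueError("end_type must be 'censored' or 'death'")
    return ll


# Usage with illustrative rates (z = 0 values from the Table 2 settings).
Q = make_Q(np.exp(-5.5), np.exp(-4.0), np.exp(-3.0), np.exp(-4.0), 1.0)
ll_cens = loglik_panel(Q, [0.0, 1.0, 2.0, 3.0], [HEALTHY, HEALTHY, PRECURSOR, PRECURSOR], 4.5, "censored")
ll_death = loglik_panel(Q, [0.0, 1.0, 2.0], [HEALTHY, HEALTHY, HEALTHY], 6.0, "death")
print(ll_cens, ll_death)
```

The sum over the unobserved state is what makes deaths without a prior precursor detection informative about both the healthy and precursor intensities.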
In Fig. 2a, we illustrate a multistate model with four states: healthy, precursor to clinical cancer (i.e., a definable pathologic state that progresses to clinical cancer, such as precancer or preclinical cancer), clinical cancer, and death. In the Web Appendix, we give examples of how to construct the likelihood for a time-homogeneous Markov model and for a semi-Markov model in which the transition from the cancer precursor to the clinical cancer state follows a gamma distribution. The gamma distribution is attractive for modelling time-inhomogeneous cancer transitions, as it is relatively tractable but allows the risk of a transition to depend on the length of time in the present state. It is also a natural choice because the transition to a cancer state may arise from several sequential mutation events with independent and identically distributed exponential waiting times between them [41, 42].
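As a quick numerical illustration of these properties (a sketch, assuming NumPy and SciPy), take a gamma sojourn time with shape 2, as in the simulations, and rate r = exp(−3 + log 2), the z = 0 value from the Table 2 footnotes. The hazard r²s/(1 + rs) of a gamma(2, r) sojourn time rises with the time s already spent in the state, whereas an exponential hazard is constant, and the sum of two independent exponential(r) waiting times has mean 2/r, the gamma mean sojourn time.

```python
import numpy as np
from scipy import stats

shape = 2                          # gamma shape used for the precursor -> cancer transition
rate = np.exp(-3.0 + np.log(2.0))  # illustrative z = 0 rate from the Table 2 footnotes


def gamma2_hazard(s, rate):
    """Hazard of a gamma(shape=2, rate) sojourn time: rate**2 * s / (1 + rate * s)."""
    return rate**2 * s / (1.0 + rate * s)


# Check the closed form against scipy (hazard = pdf / survival function; scipy uses scale = 1/rate).
s = np.array([0.5, 5.0, 20.0, 60.0])  # time already spent in the precursor state
dist = stats.gamma(a=shape, scale=1.0 / rate)
hazard_scipy = dist.pdf(s) / dist.sf(s)

# The sum of two independent exponential(rate) waiting times, e.g. two sequential events,
# is gamma(2, rate); its Monte Carlo mean should be close to shape / rate.
rng = np.random.default_rng(2022)
waits = rng.exponential(scale=1.0 / rate, size=(200_000, shape)).sum(axis=1)

print(np.round(gamma2_hazard(s, rate), 4))  # increases with s, approaching `rate`
print(round(waits.mean(), 2), round(shape / rate, 2))
```

With these values, 2/r = e³ ≈ 20.1, which equals the mean 1/e⁻³ of the exponential sojourn time with rate exp(−3) used in the first set of simulations; under this parameterisation the two settings share the mean sojourn time at z = 0 and differ only in the shape of the hazard.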

Fig. 1 Examples of multistate models. a General K-state process where each state can transition to the other K−1 states. b Two-state survival process. c K−1 competing risks. d Reversible illness-death process. e Two-state survival process with misclassification of states.

Use of multistate models to describe the natural history of cancer
Multistate models have been extensively applied to cancer cohorts, and in particular, breast cancer screening cohorts, to describe the natural history and inform screening practice. A three-state model (healthy, preclinical cancer and clinical cancer; such as Fig. 2a without the death state) was initially proposed to jointly estimate the average time spent in the screen-detectable preclinical breast cancer state and the sensitivity of mammography [13, 14]. Later models assessed overdiagnosis from breast cancer screening by expanding the state space to include non-progressive preclinical cancer states [15–18] or a competing mortality state [19], considered the effects of covariates on the sojourn times [19, 20] and estimated the mortality reduction from screening [21, 22]. Estimates of the mean sojourn time in the screen-detectable preclinical breast cancer state

Fig. 2 Examples of multistate cancer processes. a Progressive cancer process with four states: healthy, preclinical cancer, clinical cancer, and death (1 = Healthy, 2 = Cancer precursor, 3 = Clinical cancer, 4 = Death). b Cancer process with three states: HPV-negative, HPV-positive without precancer, and HPV-positive with precancer (1 = HPV-negative, 2 = HPV-positive, precancer-free, 3 = HPV-positive, cervical precancer). Acquired HPV infections can either clear or progress (transitions: acquisition 1 → 2, clearance 2 → 1, progression 2 → 3).

British Journal of Cancer (2022) 127:1279 – 1288


have ranged from 2 to 4 years, with shorter sojourn times and poorer screening sensitivity found in younger women; this indicates that shorter screening intervals would be needed to achieve the same breast cancer mortality reduction [19, 22]. This evidence influenced the American Cancer Society's recommendations that high-risk women aged 40–54 be screened annually before transitioning to biennial screening at age 55 [43].

Applications of multistate models to describe the natural history of cancer largely assume a time-homogeneous Markov process; however, this assumption may not be reasonable. Several prior works have modelled the sojourn time distribution under semi-Markov assumptions via piecewise exponential [44, 45], Weibull [45] or Gompertz [34] intensities. Etzioni and Shen [46] proposed a non-parametric semi-Markov method based on a computationally intensive EM algorithm [47] to estimate the sojourn time distribution for breast cancer. Kang and Lagakos [30] proposed a non-parametric semi-Markov approach for reversible processes to estimate time-dependent transition intensities for HPV type 16 clearance and progression to cervical precancer. In the next section, we describe a simulation study to examine the bias that can result when a time-inhomogeneous semi-Markov process is modelled as a time-homogeneous Markov process.

Impact of misspecifying a time-inhomogeneous semi-Markov process as time-homogeneous Markov
Specifying the form of transition intensities can be challenging, particularly when disease processes are under intermittent observation, so we investigate the impact of model misspecification via simulation. Specifically, we study inferences based on a time-homogeneous Markov assumption when the true intensity for the transition from the cancer precursor to the clinical cancer state is semi-Markov and of a gamma form. We considered a four-state progressive disease process with healthy, cancer precursor, clinical cancer, and death states (Fig. 2a). The goal of the multistate analysis is to characterise natural history, and in particular, the mean sojourn time in the cancer precursor state, to guide the specification of screening intervals. In the first set of simulations, all transition intensities were specified to be time-homogeneous. In the second set of simulations, the transition intensity from the cancer precursor to clinically detectable cancer had a gamma form with shape parameter equal to two (see Web Appendix). In these simulations, we allowed transition intensities to differ by individual covariates and examined the effect of misspecification with binary (following a Bernoulli distribution with a success probability of 0.5) and continuous (following a standard normal distribution) covariates.

We assumed a 15-year cohort study in which 10,000 individuals, initially cancer-free, provided biospecimens at annual study visits. At the end of the study, the biospecimens from each visit were assessed for the presence of a cancer precursor. We assumed that dates of cancer diagnosis and death are available for all participants from registries, with individuals who are cancer-free or alive right-censored at the end of the study. We considered two simulated visit processes: (1) all individuals attended visits until the end of the study; (2) individuals attended between 1 and 15 visits with a 6.7% annual drop-out (i.e., the number of visits followed a discrete uniform distribution). The time between visits followed a normal distribution with a mean of 1 year and a standard deviation of a quarter of a year.

We fitted time-homogeneous Markov models and the correctly specified semi-Markov models to each of the 500 simulated datasets and examined bias (reported as the mean and Monte Carlo standard error) and 95% coverage probabilities for the parameters of the transition intensities and for the mean sojourn time in each state. We chose 500 iterations so that all mean relative biases were estimated with Monte Carlo standard errors of <1%. A summary of the simulations and the location of the results are given in Table 1a. Further details on the simulation parameters and estimation approaches are in the Web Appendix.

Performance of survival models when screening intervals depend on intermediate disease states
The use of cancer risk models to determine who to screen has been proposed for breast [48], colorectal [49], lung [50–54] and oral cavity [55] cancer. Lung cancer risk models have also been proposed for managing individuals with lung nodules once screening is underway [56, 57]. In screening to prevent cervical cancer, which has a clinically identifiable precancer, current US consensus management guidelines [58] use precancer risk estimates [59–61] to determine whether women with abnormal screening results should undergo immediate treatment, colposcopy, or return for retesting in 1, 3, or 5 years. Precancer/cancer risk estimates used in screening are largely based on two-state (healthy, precancer/cancer) survival models or competing risk (healthy, precancer/cancer, and competing mortality) models that assume disease-independent visit times and censoring. However, in populations undergoing screening, the frequency of visits can depend on the observation of intermediate disease states, violating this statistical assumption. In addition, the observation scheme in the data used to estimate risks may not match the observation scheme used in clinical practice (e.g., 3-year retesting intervals following a negative HPV result in the well-studied Kaiser Permanente Northern California cervical cancer screening cohort [60] vs. 5-year retesting intervals under current US guidelines [58]). Through simulation studies, we examined whether dependent visit times bias inferences using two-state survival models, whether these inferences are portable to settings with a different observation scheme, and whether multistate models that include the intermediate disease states can reduce bias and enhance portability.

Our simulation study mirrored the cervical precancer screening process, with 10,000 individuals acquiring human papillomavirus (HPV) infections that can either clear or progress to cervical precancer (Fig. 2b). We allowed individuals to begin in one of three states (HPV-negative, HPV-positive and cervical precancer) following the distributions observed at enrolment in the Kaiser Permanente Northern California cohort [60]. The time to acquisition of HPV was assumed to follow an exponential distribution, compatible with the notion that sexual behaviour does not change over the course of the study. Time to HPV clearance and progression to precancer were governed by time-homogeneous transition intensities. Individuals who cleared an HPV infection could reacquire HPV infection up to three times.

We compared risks of cervical precancer estimated from data using four different observation schemes:

1) Fixed intervals: observation at 6-month intervals over 10 years, mirroring the fixed observation scheme used in trials [62–65];
2) Random intervals: a mixed-case interval-censoring scheme in which observation times can vary but have a mean of 1 year, mirroring the annual cytology observation scheme often used before the advent of HPV testing [66];
3) Doctor's care: test-dependent follow-up in which observation intervals can vary but have a mean of 1 or 3 years, depending on whether individuals test HPV-positive or HPV-negative, which mirrors the observation scheme used in the Kaiser Permanente Northern California cohort [59, 67]; and
4) Follow-up avoidance: individuals testing HPV-negative have mean observation intervals of 3 years, but individuals testing HPV-positive avoid screening, with a mean observation interval of 5 years. Such observation patterns might arise from fear or embarrassment over the HPV-positive result.

In clinical practice, the need for colposcopy of HPV-positive individuals is determined by an additional triage test (e.g., cervical cytology, HPV genotyping [61] or p16/Ki-67 dual-stain [68]), but for the purpose of evaluating estimation bias due to observation schemes, we assumed all HPV-positive individuals are sent to colposcopy. Thirty percent of participants were simulated to complete the 10-year follow-up, with a drop-out rate of 6.7% per year in all observation schemes except the fixed intervals scheme.

For each dataset (500 for each observation scheme), we estimated the prevalent (immediately detectable by colposcopy/biopsy) and cumulative risk of cervical precancer for three common clinical scenarios: HPV-negative at the first observation, HPV-positive at the first observation (prevalent HPV infections), and newly detected with HPV at a follow-up observation (incident HPV infections). We also examined whether precancer risk estimates for incident HPV infections using data from the four observation schemes reflect the precancer risk under the observation scheme currently recommended for primary HPV screening in the US: test-dependent assessment intervals in which HPV-positive individuals are retested in 1 year and HPV-negative individuals are retested in 5 years [58]. To estimate precancer risk, we compared two approaches: (1) obtaining the non-parametric maximum likelihood estimate (NPMLE) of the survival distribution for a two-state (healthy and precancer) model using the expectation-maximisation-iterative convex minorant (EM-ICM) algorithm for interval-censored event times [69], and (2) fitting a multistate model via the msm package in R [70], which can handle intermittent observation of time-homogeneous Markov processes while modelling HPV as an intermediate state in the healthy to precancer process. Precancer risks are then estimated using the transition probability matrix. A summary of the simulations and the location of the results are given in Table 1b. Further

Table 1. Summary of simulations.

(a) Performance of multistate models under different model specifications

| Simulated transition intensities | Modelled transition intensities | Visit process | Covariates | Location of resultsᵃ |
|---|---|---|---|---|
| Exponential | Exponential | 15 annual visits | Continuousᵇ | T2 |
| | | | Binaryᶜ | TS2 |
| | | 1–15 annual visitsᵈ | Continuousᵇ | TS3 |
| | | | Binaryᶜ | TS4 |
| Exponential except cancer precursor to cancer transition, which is gamma | Exponential | 15 annual visits | Continuousᵇ | T2 |
| | | | Binaryᶜ | TS2 |
| | | 1–15 annual visitsᵈ | Continuousᵇ | TS3 |
| | | | Binaryᶜ | TS4 |
| Exponential except cancer precursor to cancer transition, which is gamma | Exponential except cancer precursor to cancer transition, which is gamma | 15 annual visits | Continuousᵇ | T2 |
| | | | Binaryᶜ | TS2 |
| | | 1–15 annual visitsᵈ | Continuousᵇ | TS3 |
| | | | Binaryᶜ | TS4 |

(b) Cumulative risk of precancer under different observation schemes

| Simulated transition intensities | Estimation approach | Visit process | Clinical scenario | Location of resultsᵃ |
|---|---|---|---|---|
| Exponential | Two-state (HPV−, cervical precancer) non-parametric survival approachᵉ | Fixed intervals (every 6 months) | HPV− at initial visit | F3a, TS5 |
| | | | HPV+ at initial visit | F3a, TS6 |
| | | | HPV+ at follow-up visit | F3b |
| | | Random intervalsᶠ | HPV− at initial visit | F3a, TS5 |
| | | | HPV+ at initial visit | F3a, TS6 |
| | | | HPV+ at follow-up visit | F3b |
| | | Test-dependent, doctor's careᵍ | HPV− at initial visit | F3a, TS5 |
| | | | HPV+ at initial visit | F3a, TS6 |
| | | | HPV+ at follow-up visit | F3b |
| | | Test-dependent, follow-up avoidanceʰ | HPV− at initial visit | F3a, TS5 |
| | | | HPV+ at initial visit | F3a, TS6 |
| | | | HPV+ at follow-up visit | F3b |
| | Three-state (HPV−, HPV+, cervical precancer) time-homogeneous Markov modelⁱ | Fixed intervals (every 6 months) | HPV− at initial visit | F3a, TS5, TS7 |
| | | | HPV+ at initial visit | F3a, TS6, TS7 |
| | | | HPV+ at follow-up visit | F3b, TS7 |
| | | Random intervalsᶠ | HPV− at initial visit | F3a, TS5, TS7 |
| | | | HPV+ at initial visit | F3a, TS6, TS7 |
| | | | HPV+ at follow-up visit | F3b, TS7 |
| | | Test-dependent, doctor's careᵍ | HPV− at initial visit | F3a, TS5, TS7 |
| | | | HPV+ at initial visit | F3a, TS6, TS7 |
| | | | HPV+ at follow-up visit | F3b, TS7 |
| | | Test-dependent, follow-up avoidanceʰ | HPV− at initial visit | F3a, TS5, TS7 |
| | | | HPV+ at initial visit | F3a, TS6, TS7 |
| | | | HPV+ at follow-up visit | F3b, TS7 |

HPV−/+ human papillomavirus negative/positive.
ᵃ Additional abbreviations for location of results: T = table, F = figure, S = supplement.
ᵇ Covariates simulated according to a standard normal distribution.
ᶜ Covariates simulated according to a Bernoulli distribution with success probability = 0.5.
ᵈ Number of visits attended simulated according to a discrete uniform (1, 15) distribution.
ᵉ Estimated using the EM-ICM algorithm.
ᶠ Observation process simulated according to a mixed-case interval-censoring scheme.
ᵍ Test-dependent visit intervals with mean of 3 years for HPV− and 1 year for HPV+.
ʰ Test-dependent visit intervals with mean of 3 years for HPV− and 5 years for HPV+.
ⁱ Estimated using the msm package.

details on the simulation of the latent disease process, the observation schemes, and the estimation approaches can be found in the Web Appendix.

RESULTS
Impact of misspecifying a time-inhomogeneous semi-Markov process as time-homogeneous Markov
The simulated datasets each consisted of 10,000 individuals, among whom clinical cancer rarely occurred (a mean of 2% for simulations with a continuous covariate; 7–7.5% for simulations with a binary covariate). In simulations without individual drop-out, 5% and 11% of individuals were observed with the cancer precursor in simulations with continuous and binary covariates, respectively. In simulations with 6.7% annual individual drop-out, these proportions were approximately halved. Across simulation settings, 24–29% of the individuals died without progressing to clinical cancer; for most of these individuals, it was unknown whether they reached the cancer precursor state prior to death. The proportions with cancers and deaths approximate the projected proportions with lung cancers and deaths over a 15-year period in the general (simulations with continuous covariate) and high-risk (simulations with binary covariate) populations of US ever-smokers aged 40–84 years [71].

In simulations with no drop-out, estimates of transition intensity parameters and sojourn times from the correctly specified models were unbiased, but when the semi-Markov process was misspecified as a time-homogeneous Markov process, the mean sojourn time for the reference covariate level was overestimated by as much as 28% (Table 2 and Supplemental Table S2). Compared with simulations with no drop-out, 6.7% annual drop-out resulted in greater absolute bias (1.5–2.9% vs. 0–0.6%) and poorer 95% coverage probabilities (20–95% vs. 94–96%) for intensity parameters governing transitions from the healthy or cancer precursor states to the death state in the correctly specified models (Supplemental Tables S3 and S4). This is because the time from the last healthy assessment to death or censoring is longer, leading to greater uncertainty about whether individuals remained healthy or reached the cancer precursor state.

Performance of survival models when screening intervals depend on intermediate disease states
Performance of the two-state survival approach varied by observation scheme (Fig. 3a and Supplemental Tables S5 and S6). When the assessment times were independent of the modelled disease process or when individuals were more intensely observed following the detection of HPV (e.g., doctor's care), the estimated risks of precancer following initial HPV results were unbiased. However, when individuals were less intensely observed following the detection of HPV (e.g., follow-up avoidance), the survival approach underestimated the precancer risk until most women previously detected with HPV had returned for screening. By contrast, precancer risks following initial HPV results estimated using multistate models were similar to the true underlying precancer risk for all four observation schemes (Fig. 3a and Supplemental Tables S5 and S6). Because the multistate models include the intermediate HPV state, the observation schemes are non-informative of the disease process.

Among individuals detected with new HPV infections in follow-up, the precancer risk differed according to the observation scheme used in the data, with observation schemes that have longer assessment intervals following an HPV-negative test result having greater prevalent (i.e., immediately detectable by colposcopy/biopsy at the time of new HPV detection) precancer risk (Fig. 3b). Because the assessment intervals used in the fitted data are shorter than those currently recommended in the US [58], the prevalent precancer risks estimated using a two-state survival approach, while unbiased for the observation scheme used in the fitted data, cannot be used to correctly infer the prevalent precancer risk that would occur in US clinical practice. In addition, under a follow-up avoidant observation scheme, the two-state survival approach would underestimate the precancer risk among individuals detected with new HPV infections but without prevalent precancers. In contrast, the three-state Markov model can use data from any observation scheme to recover the underlying transition rates (Supplemental Table S7) and estimate precancer risk that is close to the precancer risk simulated according to the US-recommended observation scheme (Fig. 3b).

DISCUSSION
Multistate models can be used to better understand the natural history of cancer, including the time spent in different preclinical and clinical cancer states and the effect of covariates on transitions. In this article, we examined two challenges that might affect estimates used to inform cancer screening practice. First, the use of a time-homogeneous Markov process when state transitions follow a time-inhomogeneous semi-Markov process will result in substantially biased estimates of mean sojourn times. Second, we found that precancer/cancer risks estimated using two-state survival approaches can be robust to observation schemes in which the intensity of assessments increases upon detecting an unmodelled intermediate disease state (e.g., doctor's care), but the bias increases when the intensity of assessments decreases upon detection of intermediate disease (e.g., follow-up avoidance). When the intermediate disease state is included in a multistate model, such observation schemes are conditionally independent, and consistent risk estimates can be obtained without also modelling the visit process [40, 72, 73].

In our simulations evaluating model misspecification, we considered a scenario in which all state transitions followed an exponential distribution except the cancer precursor to clinical cancer transition, which followed a gamma distribution. Piecewise-constant baseline intensities [42, 43] offer a flexible approach to modelling time-inhomogeneous intensities, and along with these, intensities with Weibull [43] or Gompertz [34] form can be used for semi-Markov transitions. Non-parametric methods can be used to estimate marginal features in some settings [46, 74], which can be useful for checking parametric assumptions. Our simulation focused on illustrating the potential bias from model misspecification in the estimation of mean sojourn time, given its importance in determining cancer screening intervals [19, 22]. Of course, the impact of model misspecification will depend on the target of inference and intended use.

In our study of risk estimation under dependent observation intervals, we modelled our simulation after the HPV and cervical precancer process. We used observation schemes found in previously conducted trials/cohorts [59, 62–67], as well as a hypothetical observation scheme in which patients avoid follow-up screening after detection of HPV. In our simulation study, we found that the precancer risks for newly detected HPV infections differed by observation scheme due to different lag times between HPV acquisition and detection. This finding is relevant to US risk-based guidelines for cervical precancer [58], as the data used to estimate risks come from different observation schemes [59], most of which differ from the current recommendations of 1-year retesting for HPV-positive, cytology-negative individuals and 5-year retesting for HPV-negative individuals [58].

A prior study focusing on prostate cancer similarly concluded that estimated cancer risk may differ by observation scheme, and that the multistate modelling framework is more suitable for comparing risks across studies [35]. Because multistate models can recover the underlying transition intensities, they can also be used to estimate precancer/cancer risks following the detection of an intermediate disease state under a counterfactual observation scheme, either as a closed-form equation or as the basis of

Table 2. Performance of a multistate model under correctly specified vs. misspecified transition intensities: continuous covariate, no dropouts.

Settings (simulated transition intensities; modelled transition intensities):
A: exponential; exponential.
B: exponential for all except the 2→3 transition, which is gamma; exponential.
C: exponential for all except the 2→3 transition, which is gamma; exponential for all except the 2→3 transition, which is gamma.

| Parameter^a | A: relative bias, mean (SE), %^b | A: ASE/ESD^b | A: 95% CP, %^b | B: relative bias, mean (SE), %^b | B: ASE/ESD^b | B: 95% CP, %^b | C: relative bias, mean (SE), %^b | C: ASE/ESD^b | C: 95% CP, %^b |
|---|---|---|---|---|---|---|---|---|---|
| b_121 | 0.0 (0.0) | 0.99 | 94.6 | −0.4 (0.0) | 1.01 | 95.0 | 0.0 (0.0) | 1.00 | 96.0 |
| b_122 = b_232^c,d | −0.1 (0.3) | 0.98 | 94.2 | 8.1 (0.3) | 1.04 | 83.2 | 0.1 (0.3) | 1.02 | 95.4 |
| b_141 = b_241^e | 0.0 (0.0) | 0.98 | 95.4 | 0.0 (0.0) | 0.95 | 94.6 | 0.0 (0.0) | 0.95 | 94.6 |
| b_142 = b_242^e | −0.3 (0.5) | 0.98 | 94.8 | −0.0 (0.5) | 0.97 | 95.6 | 0.0 (0.5) | 0.97 | 95.6 |
| b_231^d | −0.1 (0.1) | 0.97 | 95.2 | −50 (0.15) | 1.12 | 0.0 | −0.1 (0.1) | 0.99 | 95.4 |
| exp(b_34)^f | 0.3 (0.3) | 1.07 | 97.0 | 1.0 (0.4) | 1.02 | 96.0 | 1.0 (0.4) | 1.02 | 96.0 |
| MST in state 1^g | 0.0 (0.1) | 1.01 | 95.6 | 0.3 (0.1) | 0.97 | 95.0 | 0.0 (0.1) | 0.97 | 95.2 |
| MST in state 2^g | 0.3 (0.3) | 0.97 | 94.8 | 28 (0.3) | 1.12 | 0.8 | 0.2 (0.2) | 0.99 | 95.8 |
| MST in state 3^g | −0.2 (0.3) | 1.08 | 96.0 | −0.2 (0.4) | 1.03 | 95.4 | −0.3 (0.4) | 1.03 | 95.4 |

MST mean sojourn time, SE standard error, ASE asymptotic standard error, ESD empirical standard deviation, CP coverage probability.
^a States are 1 = healthy, 2 = cancer precursor state, 3 = clinical cancer state, 4 = death. Transitions from state i to state j have rate parameter q_ij = exp(b_ij1 + b_ij2 z), where the covariate z follows a standard normal distribution. True parameter values are b_121 = −5.5, b_122 = b_232 = 0.5, b_141 = b_241 = −4, b_142 = b_242 = 0.2, exp(b_34) = 1, and b_231 = −3 (when an exponential rate parameter) or b_231 = −3 + log(2) (when a gamma rate parameter).
^b Presented results are based on 500 simulation runs. Relative bias is the average difference between the estimated and true parameter values, divided by the true parameter value and reported on the percentage scale. Positive bias indicates overestimation and negative bias indicates underestimation.
^c The covariate effect for the transition from the healthy state to the cancer precursor state is the same as for the transition from the cancer precursor state to the clinical cancer state. This constraint is used in both data simulation and model estimation.
^d Parameters of the 2→3 transition, whose rate parameter q_23 = exp(b_231 + b_232 z) belongs to an exponential or a gamma distribution depending on the simulation or model setting.
^e The transition intensity for death among cancer-free individuals does not change on reaching the cancer precursor state. This constraint is applied in both data simulation and model estimation.
^f Because relative bias is not defined for b_34 = 0 (its true value), we assessed relative bias against exp(b_34) = 1.
^g Mean sojourn time is for the reference covariate level (z = 0), with true values of 44.6 (state 1), 14.7 (state 2, exponential 2→3 transition), 15.6 (state 2, gamma 2→3 transition) and 1 (state 3).
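The reference-level mean sojourn times in footnote g can be reproduced from the true parameter values in footnote a. The minimal Python sketch below does that arithmetic under two assumptions of ours that the table does not state: the gamma 2→3 transition has shape 2 (so that, with rate exp(−3 + log 2), its mean equals the exponential mean exp(3)), and death competes as an independent exponential transition. For an integer (Erlang) shape k with rate r and a competing exponential rate d, E[min(T, D)] = Σ_{n=0}^{k−1} r^n / (r + d)^{n+1}, and k = 1 recovers the exponential case 1/(r + d).

```python
import math

# True parameter values at the reference covariate level z = 0 (Table 2, footnote a).
B121 = -5.5                        # log rate, healthy -> precursor
B141 = B241 = -4.0                 # log rate, healthy/precursor -> death
B231_EXP = -3.0                    # log rate, precursor -> cancer (exponential setting)
B231_GAMMA = -3.0 + math.log(2.0)  # log rate parameter, precursor -> cancer (gamma setting)
B34 = 0.0                          # log rate, cancer -> death
GAMMA_SHAPE = 2                    # ASSUMPTION: gamma shape for 2->3; not given in the table


def mean_sojourn_erlang_vs_exp(shape: int, rate: float, competing_rate: float) -> float:
    """E[min(T, D)] for independent T ~ Gamma(shape, rate) with integer shape and
    D ~ Exponential(competing_rate):
    sum over n = 0..shape-1 of rate**n / (rate + competing_rate)**(n + 1)."""
    total_rate = rate + competing_rate
    return sum(rate**n / total_rate ** (n + 1) for n in range(shape))


def mean_sojourn_exponential(*rates: float) -> float:
    """Mean time in a state whose exits are all exponential: 1 / (sum of exit rates)."""
    return 1.0 / sum(rates)


mst_state1 = mean_sojourn_exponential(math.exp(B121), math.exp(B141))
mst_state2_exp = mean_sojourn_exponential(math.exp(B231_EXP), math.exp(B241))
mst_state2_gamma = mean_sojourn_erlang_vs_exp(
    GAMMA_SHAPE, math.exp(B231_GAMMA), math.exp(B241)
)
mst_state3 = mean_sojourn_exponential(math.exp(B34))

for label, value in [
    ("state 1", mst_state1),
    ("state 2, exponential 2->3", mst_state2_exp),
    ("state 2, gamma 2->3", mst_state2_gamma),
    ("state 3", mst_state3),
]:
    print(f"{label}: {value:.1f}")
```

Running it prints 44.6, 14.7, 15.6 and 1.0, in line with footnote g. The agreement for state 2 in the gamma setting is what suggests a shape of 2, but that remains an inference rather than a stated design choice.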

[Figure 3 plot not reproduced. Panel a: cumulative risk vs. time (years, 0 to 10) by HPV status at study start (HPV+, HPV−). Panel b: cumulative risk vs. time (years, 0 to 10) following detection of a new HPV infection (new HPV+). Legend: true risk, and two-state survival (Surv.) and time-homogeneous Markov (Markov) estimates under the fixed visits, random visits, DC and FA observation schemes.]

Fig. 3 Comparison of risk estimation approaches in data generated under various observation schemes. The risk estimation approaches compared are a two-state survival [Surv.] approach and a time-homogeneous Markov [Markov] model. The observation schemes considered are: [fixed visits], in which visits occur every 6 months; [random visits], in which visits are generated using a mixed-case interval-censoring scheme with intervals between visits normally distributed with a mean of 1 year; doctor’s care [DC], in which intervals between visits are normally distributed with means of 1 and 3 years following positive and negative HPV results, respectively; and follow-up avoidance [FA], in which intervals between visits are normally distributed with means of 5 and 3 years following positive and negative HPV results, respectively.
a Precancer risk stratified by HPV status at study start (denoted as time 0). For precancer risks following HPV+ at study start, the true risk for the simulated data (red curve), all Markov-estimated risks (orange curves), and survival-estimated risks under fixed visits (blue curve, solid lines) overlap; survival-estimated risks under random visits and doctor’s care (blue curves, uniformly dashed lines) overlap with one another for t < 1 year and overlap with the true risk for t ≥ 1 year. For precancer risks following HPV− at study start, the true risk for the simulated data (black curve), all Markov-estimated risks (purple curves), and all survival-estimated risks (dark green curves) overlap.
b Precancer risk following the detection of incident HPV infections (time of detection denoted as time 0). Markov-estimated risks are for the observation scheme currently recommended for primary HPV screening in the US: 1-year and 5-year retesting intervals following HPV-positive and HPV-negative results, respectively. All Markov-estimated risks (orange curves) overlap. All presented risk curves are averages of 500 simulation runs.
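The four observation schemes in Fig. 3 differ only in how the gap to the next visit is drawn, and both estimation approaches then see each transition only as an interval between visits. The minimal Python sketch below generates visit times for each scheme and converts a true onset time into its censoring interval (L, R]. The standard deviation of the normal gaps, the floor that keeps gaps positive, the 10-year horizon and the example HPV path are hypothetical choices; the caption reports only the mean gaps.

```python
import math
import random
from typing import Callable, List, Optional, Tuple

# HYPOTHETICAL settings: the caption reports only the mean gap for each scheme.
GAP_SD = 0.25   # standard deviation (years) of the normally distributed gaps
MIN_GAP = 0.1   # floor (years) so a normal draw never yields a non-positive gap


def mean_gap(scheme: str, last_result_positive: bool) -> float:
    """Mean years to the next visit for the schemes described in the Fig. 3 caption."""
    if scheme == "random":   # independent of test results
        return 1.0
    if scheme == "DC":       # doctor's care: HPV+ individuals recalled sooner
        return 1.0 if last_result_positive else 3.0
    if scheme == "FA":       # follow-up avoidance: HPV+ individuals return later
        return 5.0 if last_result_positive else 3.0
    raise ValueError(f"unknown scheme: {scheme}")


def visit_times(scheme: str, hpv_positive_at: Callable[[float], bool],
                horizon: float = 10.0,
                rng: Optional[random.Random] = None) -> List[float]:
    """Visit times after study start (time 0), up to `horizon` years.

    `hpv_positive_at(t)` is a hypothetical stand-in for the HPV result observed
    at time t; under DC and FA the next gap depends on that result.
    """
    rng = rng or random.Random(0)
    times: List[float] = []
    t = 0.0
    while True:
        if scheme == "fixed":
            gap = 0.5                                 # every 6 months
        else:
            mu = mean_gap(scheme, hpv_positive_at(t))
            gap = max(MIN_GAP, rng.gauss(mu, GAP_SD))
        t += gap
        if t > horizon:
            return times
        times.append(t)


def censoring_interval(onset: float, visits: List[float]) -> Tuple[float, float]:
    """Interval (L, R] known to contain a transition at `onset`: L is the last
    visit before onset (0 if none) and R the first visit at or after it (inf if none)."""
    left = max((v for v in visits if v < onset), default=0.0)
    right = min((v for v in visits if v >= onset), default=math.inf)
    return left, right


def becomes_positive_at_2(t: float) -> bool:
    """Hypothetical HPV path: negative until year 2, positive afterwards."""
    return t >= 2.0


if __name__ == "__main__":
    for scheme in ("fixed", "random", "DC", "FA"):
        visits = visit_times(scheme, becomes_positive_at_2)
        print(scheme, [round(v, 2) for v in visits], censoring_interval(4.3, visits))
```

Because DC shortens and FA lengthens follow-up after a positive result, the width of (L, R] depends on the disease path itself; this is the informative-observation setting the simulations examine.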

microsimulation models [75]. These risk estimates can be used to determine whether more invasive procedures are required [59]. However, the assumptions of the multistate model should be assessed. In simulations examining the effect of informative observation times, we assumed that the competing transitions of HPV clearance and progression follow exponential distributions; however, current evidence indicates that these transition intensities vary with time [76, 77]. Our first set of simulations shows that treating a semi-Markov process as Markov can result in considerable bias and can misinform screening recommendations. Modelling a reversible semi-Markov process with panel data is particularly difficult, as available methods apply only when individuals are observed at prespecified fixed times [30, 78].
We examined the scenario in which the observation process depends on an unmodelled intermediate disease state and showed that bias in inference can be reduced by expanding the model to include that state. Other interrelationships between the disease process and the observation scheme can occur and are not as easily resolved, particularly when they depend on unobserved characteristics such as shared random effects [72, 73].
In our simulation scenario, we considered HPV positivity as an intermediate disease state in the cervical precancer process; in practice, the clinician may also have HPV genotyping, cytology, and colposcopy/biopsy results. While these can be included as additional states in a multistate model [28], the complexity can quickly grow as the number of states in the model increases. An alternative to multistate models is to use biologically inspired tumour growth models that relate the unobserved natural history of the tumour to patient outcomes [79, 80]. Such models are relatively new but have the advantage of representing tumour growth as a continuous process.
For both the multistate model and the two-state survival approaches, we used methods that account for intermittent observations, which result in interval-censored transition times [81, 82]. Our findings on the robustness of risk estimation for two-state survival approaches apply only to approaches that properly account for interval censoring. Prior articles have shown that ignoring interval censoring can lead to invalid inferences [83, 84].
In addition, risk analyses in cancer data often exclude individuals with clinical cancer at enrolment and assume that the remaining individuals are in a healthy state. Without assessment for preclinical states, this assumption can lead to biased inference, especially when compounded by ignoring interval censoring [76]. In hidden Markov models [28] and two-state survival models [76], missing assessments or partial non-assessment, which arises when the disease state is defined by multiple tests, can be handled under a missing at random (missingness depends on covariates) assumption. In semi-Markov models, occupancy of states other than the healthy state at enrolment can be challenging for estimation due to left truncation [85] and requires assuming a time prior to enrolment at which all individuals were in the healthy state.
Standard multistate models typically assume population homogeneity in transition rates given observed covariates, but in cancer a substantial portion of the “at-risk” population never progresses beyond the normal healthy state. Mover–stayer models have been proposed to account for this latent population heterogeneity, treating the model as a mixture of two independent Markov chains: one in which individuals remain in the normal healthy state and another with an unknown transition matrix to be estimated [28, 86]. Mover–stayer models have also been used to characterise heterogeneity in polyp progression to colorectal cancer [87] and change in breast cancer tumour malignancy [88].
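To make the mixture structure of a mover–stayer model concrete, the minimal Python sketch below computes the state-occupancy probabilities it implies for someone who starts in the healthy state. It assumes a discrete-time chain with a one-step transition matrix for the movers; the stayer proportion and the matrix entries are hypothetical numbers, not estimates from any cited study, and death is ignored with cancer treated as absorbing purely to keep the sketch small. It also computes, by Bayes' rule, the probability of being a stayer given that an individual is still healthy at time t, which shows how the still-healthy group becomes increasingly dominated by stayers over time.

```python
from typing import List

Matrix = List[List[float]]


def mat_mult(a: Matrix, b: Matrix) -> Matrix:
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_power(p: Matrix, steps: int) -> Matrix:
    """p raised to a non-negative integer power (identity when steps == 0)."""
    n = len(p)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(steps):
        result = mat_mult(result, p)
    return result


def mover_stayer_occupancy(pi_stayer: float, p_mover: Matrix, t: int) -> List[float]:
    """P(state j at step t) for someone healthy (state index 0) at step 0.

    Stayers (probability pi_stayer) never leave the healthy state; movers
    follow the Markov chain with one-step transition matrix p_mover.
    """
    mover_row = mat_power(p_mover, t)[0]
    return [pi_stayer * (1.0 if j == 0 else 0.0) + (1.0 - pi_stayer) * mover_row[j]
            for j in range(len(mover_row))]


def prob_stayer_given_healthy(pi_stayer: float, p_mover: Matrix, t: int) -> float:
    """Posterior probability of being a stayer given still healthy at step t."""
    p_healthy_mover = mat_power(p_mover, t)[0][0]
    return pi_stayer / (pi_stayer + (1.0 - pi_stayer) * p_healthy_mover)


if __name__ == "__main__":
    # HYPOTHETICAL values, for illustration only.
    pi_stayer = 0.6
    p_mover = [[0.90, 0.10, 0.00],   # healthy -> healthy / precursor / cancer
               [0.20, 0.70, 0.10],   # precursor may regress to healthy
               [0.00, 0.00, 1.00]]   # cancer treated as absorbing in this sketch
    for t in (0, 1, 5, 10):
        occupancy = mover_stayer_occupancy(pi_stayer, p_mover, t)
        posterior = prob_stayer_given_healthy(pi_stayer, p_mover, t)
        print(t, [round(x, 3) for x in occupancy], round(posterior, 3))
```

In practice pi_stayer and the mover transition matrix are estimated jointly from the panel data, typically with the misclassification and interval-censoring issues discussed above; the sketch only evaluates the model for given values.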

Alternatively, population heterogeneity can be modelled using a frailty term representing unobserved covariates [87], provided that this frailty term does not also influence the observation process.
Given the low incidence of cancer and the long period over which it develops, it may not be feasible to measure biomarkers or assess potential cancer precursors in prospective cohorts. Two-phase studies, in which individuals who reached higher disease states are oversampled, are increasingly common (e.g., omics testing using stored specimens). Such data require accounting for the observed process and the covariates used for sample selection, either in the multistate model likelihood or through weighted likelihood estimating functions [40, 89].
Multistate models can be a useful tool for improving understanding of the cancer process, which can inform cancer screening and prevention strategies. As with all statistical models, correct inference depends on proper specification and assumptions. Our article examined two aspects of this using simulation studies and highlighted some of the methodological challenges specific to cancer.

CODE AVAILABILITY
All code used for data simulation and analysis is available on GitHub (https://github.com/liccheung/multistate.model.simulations).

REFERENCES
1. Beesley LJ, Shuman AG, Mierzwa ML, Bellile EL, Rosen BS, Casper KA, et al. Development and assessment of a model for predicting individualized outcomes in patients with oropharyngeal cancer. JAMA Netw Open. 2021;4:e2120055.
2. Beesley LJ, Morgan TM, Spratt DE, Singhal U, Feng FY, Furgal AC, et al. Individual and population comparisons of surgery and radiotherapy outcomes in prostate cancer using Bayesian multistate models. JAMA Netw Open. 2019;2:e187765.
3. Le-Rademacher JG, Peterson RA, Therneau TM, Sanford BL, Stone RM, Mandrekar SJ. Application of multi-state models in cancer clinical trials. Clin Trials. 2018;15:489–98.
4. Upshaw JN, Konstam MA, Klaveren D, Noubary F, Huggins GS, Kent DM. Multistate model to predict heart failure hospitalizations and all-cause mortality in outpatients with heart failure with reduced ejection fraction: model derivation and external validation. Circ Heart Fail. 2016;9:e003146.
5. van Vught LA, Klein Klouwenberg PM, Spitoni C, Scicluna BP, Wiewel MA, Horn J, et al. Incidence, risk factors, and attributable mortality of secondary infections in the intensive care unit after admission for sepsis. J Am Med Assoc. 2016;315:1469–79.
6. Lindbohm JV, Sipila PN, Mars NJ, Pentti J, Ahmadi-Abhari S, Brunner EJ, et al. 5-year versus risk-category-specific screening intervals for cardiovascular disease prevention: a cohort study. Lancet Public Health. 2019;4:e189–99.
7. Sabia S, Fayosse A, Dumurgier J, Dugravot A, Akbaraly T, Britton A, et al. Alcohol consumption and risk of dementia: 23 year follow-up of Whitehall II cohort study. BMJ. 2018;362:k2927.
8. Group DER, Nathan DM, Bebu I, Hainsworth D, Klein R, Tamborlane W, et al. Frequency of evidence-based screening for retinopathy in type 1 diabetes. N Engl J Med. 2017;376:1507–16.
9. Armitage P, Doll R. The age distribution of cancer and a multi-stage theory of carcinogenesis. Br J Cancer. 1954;8:1–12.
10. Nowell PC. The clonal evolution of tumor cell populations. Science. 1976;194:23–8.
11. Greaves M. Cancer causation: the Darwinian downside of past success? Lancet Oncol. 2002;3:244–51.
12. Wacholder S. Precursors in cancer epidemiology: aligning definition and function. Cancer Epidemiol Biomark Prev. 2013;22:521–7.
13. Duffy SW, Chen HH, Tabar L, Day NE. Estimation of mean sojourn time in breast cancer screening using a Markov chain model of both entry to and exit from the preclinical detectable phase. Stat Med. 1995;14:1531–43.
14. Chen HH, Duffy SW, Tabar L. A Markov chain method to estimate the tumour progression rate from preclinical to clinical phase, sensitivity and positive predictive value for mammography in breast cancer screening. J R Stat Soc D-Sta. 1996;45:307–17.
15. Duffy SW, Agbaje O, Tabar L, Vitak B, Bjurstam N, Bjorneld L, et al. Overdiagnosis and overtreatment of breast cancer: estimates of overdiagnosis from two trials of mammographic screening for breast cancer. Breast Cancer Res. 2005;7:258–65.
16. Yen MF, Tabar L, Vitak B, Smith RA, Chen HH, Duffy SW. Quantifying the potential problem of overdiagnosis of ductal carcinoma in situ in breast cancer screening. Eur J Cancer. 2003;39:1746–54.
17. Olsen AH, Agbaje OF, Myles JP, Lynge E, Duffy SW. Overdiagnosis, sojourn time, and sensitivity in the Copenhagen mammography screening program. Breast J. 2006;12:338–42.
18. Yen AM, Chen HH. Modeling the overdetection of screen-identified cancers in population-based cancer screening with the Coxian phase-type Markov process. Stat Med. 2020;39:660–73.
19. Taghipour S, Banjevic D, Miller AB, Montgomery N, Jardine AK, Harvey BJ. Parameter estimates for invasive breast cancer progression in the Canadian National Breast Screening Study. Br J Cancer. 2013;108:542–8.
20. Wu YY, Yen MF, Yu CP, Chen HH. Individually tailored screening of breast cancer with genes, tumour phenotypes, clinical attributes, and conventional risk factors. Br J Cancer. 2013;108:2241–9.
21. Uhry Z, Hedelin G, Colonna M, Asselain B, Arveux P, Rogel A, et al. Multi-state Markov models in cancer screening evaluation: a brief review and case study. Stat Methods Med Res. 2010;19:463–86.
22. Duffy SW, Day NE, Tabar L, Chen HH, Smith TC. Markov models of breast tumor progression: some age-specific results. J Natl Cancer Inst Monogr. 1997;22:93–7.
23. Launoy G, Smith TC, Duffy SW, Bouvier V. Colorectal cancer mass-screening: estimation of faecal occult blood test sensitivity, taking into account cancer mean sojourn time. Int J Cancer. 1997;73:220–4.
24. Prevost TC, Launoy G, Duffy SW, Chen HH. Estimating sensitivity and sojourn time in screening for colorectal cancer: a comparison of statistical approaches. Am J Epidemiol. 1998;148:609–19.
25. Chen TH, Yen MF, Lai MS, Koong SL, Wang CY, Wong JM, et al. Evaluation of a selective screening for colorectal carcinoma: the Taiwan Multicenter Cancer Screening (TAMCAS) project. Cancer. 1999;86:1116–28.
26. Chen CD, Yen MF, Wang WM, Wong JM, Chen TH. A case-cohort study for the disease natural history of adenoma-carcinoma and de novo carcinoma and surveillance of colon and rectum after polypectomy: implication for efficacy of colonoscopy. Br J Cancer. 2003;88:1866–73.
27. van Oortmarssen GJ, Habbema JD. Duration of preclinical cervical cancer and reduction in incidence of invasive cancer following negative pap smears. Int J Epidemiol. 1995;24:300–7.
28. Aron J, Albert PS, Wentzensen N, Cheung LC. Hidden mover-stayer model for disease progression accounting for misclassified and partially observed diagnostic tests: application to the natural history of human papillomavirus and cervical precancer. Stat Med. 2021;40:3460–76.
29. Taguchi A, Hara K, Tomio J, Kawana K, Tanaka T, Baba S, et al. Multistate Markov model to predict the prognosis of high-risk human papillomavirus-related cervical lesions. Cancers. 2020;12:270.
30. Kang M, Lagakos SW. Statistical methods for panel data from a semi-Markov process, with application to HPV. Biostatistics. 2007;8:252–64.
31. Kay R. A Markov model for analysing cancer markers and disease states in survival studies. Biometrics. 1986;42:855–65.
32. Chien CR, Lai MS, Chen TH. Estimation of mean sojourn time for lung cancer by chest X-ray screening with a Bayesian approach. Lung Cancer. 2008;62:215–20.
33. Wu GH, Auvinen A, Maattanen L, Tammela TL, Stenman UH, Hakama M, et al. Number of screens for overdetection as an indicator of absolute risk of overdiagnosis in prostate cancer screening. Int J Cancer. 2012;131:1367–75.
34. Bhatt R, van den Hout A, Pashayan N. A multistate survival model of the natural history of cancer using data from screened and unscreened population. Stat Med. 2021;40:3791–807.
35. Lange JM, Gulati R, Leonardson AS, Lin DW, Newcomb LF, Trock BJ, et al. Estimating and comparing cancer progression risks under varying surveillance protocols. Ann Appl Stat. 2018;12:1773–95.
36. Liu CY, Wu CY, Lin JT, Lee YC, Yen AM, Chen TH. Multistate and multifactorial progression of gastric cancer: results from community-based mass screening for gastric cancer. J Med Screen. 2006;13:S2–5.
37. Chen HH, Prevost TC, Duffy SW. Evaluation of screening for nasopharyngeal carcinoma: trial design using Markov chain models. Br J Cancer. 1999;79:1894–900.
38. Division of Cancer Epidemiology and Genetics, NCI. Connect for Cancer Prevention Study. https://dceg.cancer.gov/research/who-we-study/cohorts/connect. Accessed 7 July 2022.
39. Aalen OO, Borgan O, Fekjaer H. Covariate adjustment of event histories estimated from Markov chains: the additive approach. Biometrics. 2001;57:993–1001.
40. Cook RJ, Lawless JF. Multistate models for the analysis of life history data. Boca Raton, FL: CRC Press; 2018.
41. Yang Y, Nair VN. Parametric inference for time-to-failure in multi-state semi-Markov models: A comparison of marginal and process approaches. Can J Stat/La Rev Canadienne de Statistique. 2011;39:537–55.

42. Cook RJ, Lawless JF, Lakhal-Chaieb L, Lee K-A. Robust estimation of mean functions and treatment effects for recurrent events under event-dependent censoring and termination: application to skeletal complications in cancer metastatic to bone. J Am Stat Assoc. 2009;104:60–75.
43. Oeffinger KC, Fontham ET, Etzioni R, Herzig A, Michaelson JS, Shih YC, et al. Breast cancer screening for women at average risk: 2015 guideline update from the American Cancer Society. J Am Med Assoc. 2015;314:1599–614.
44. Shen Y, Zelen M. Robust modeling in screening studies: estimation of sensitivity and preclinical sojourn time distribution. Biostatistics. 2005;6:604–14.
45. Hsieh HJ, Chen TH, Chang SH. Assessing chronic disease progression using non-homogeneous exponential regression Markov models: an illustration using a selective breast cancer screening in Taiwan. Stat Med. 2002;21:3369–82.
46. Etzioni R, Shen Y. Estimating asymptomatic duration in cancer: the AIDS connection. Stat Med. 1997;16:627–44.
47. Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B. 1977;39:1–38.
48. Gail MH, Brinton LA, Byar DP, Corle DK, Green SB, Schairer C, et al. Projecting individualized probabilities of developing breast cancer for white females who are being examined annually. J Natl Cancer Inst. 1989;81:1879–86.
49. Freedman AN, Slattery ML, Ballard-Barbash R, Willis G, Cann BJ, Pee D, et al. Colorectal cancer risk prediction tool for white men and women without known susceptibility. J Clin Oncol. 2009;27:686–93.
50. Katki HA, Kovalchik SA, Berg CD, Cheung LC, Chaturvedi AK. Development and validation of risk models to select ever-smokers for CT lung cancer screening. J Am Med Assoc. 2016;315:2300–11.
51. Bach PB, Kattan MW, Thornquist MD, Kris MG, Tate RC, Barnett MJ, et al. Variations in lung cancer risk among smokers. J Natl Cancer Inst. 2003;95:470–8.
52. Tammemagi MC, Katki HA, Hocking WG, Church TR, Caporaso N, Kvale PA, et al. Selection criteria for lung-cancer screening. N Engl J Med. 2013;368:728–36.
53. Marcus MW, Chen Y, Raji OY, Duffy SW, Field JK. LLPi: Liverpool Lung Project risk prediction model for lung cancer incidence. Cancer Prev Res. 2015;8:570–5.
54. Cassidy A, Myles JP, van Tongeren M, Page RD, Liloglou T, Duffy SW, et al. The LLP risk model: an individual risk prediction model for lung cancer. Br J Cancer. 2008;98:270–6.
55. Cheung LC, Ramadas K, Muwonge R, Katki HA, Thomas G, Graubard BI, et al. Risk-based selection of individuals for oral cancer screening. J Clin Oncol. 2021;39:663–74.
56. Robbins HA, Cheung LC, Chaturvedi AK, Baldwin DR, Berg CD, Katki HA. Management of lung cancer screening results based on individual prediction of current and future lung cancer risks. J Thorac Oncol. 2021;17:252–63.
57. Robbins HA, Berg CD, Cheung LC, Chaturvedi AK, Katki HA. Identification of candidates for longer lung cancer screening intervals following a negative low-dose computed tomography result. J Natl Cancer Inst. 2019;111:996–9.
58. Perkins RB, Guido RS, Castle PE, Chelmow D, Einstein MH, Garcia F, et al. 2019 ASCCP risk-based management consensus guidelines for abnormal cervical cancer screening tests and cancer precursors. J Low Genit Tract Dis. 2020;24:102–31.
59. Cheung LC, Egemen D, Chen X, Katki HA, Demarco M, Wiser AL, et al. 2019 ASCCP risk-based management consensus guidelines: methods for risk estimation, recommended management, and validation. J Low Genit Tract Dis. 2020;24:90–101.
60. Egemen D, Cheung LC, Chen X, Demarco M, Perkins RB, Kinney W, et al. Risk estimates supporting the 2019 ASCCP risk-based management consensus guidelines. J Low Genit Tract Dis. 2020;24:132–43.
61. Demarco M, Egemen D, Raine-Bennett TR, Cheung LC, Befano B, Poitras NE, et al. A study of partial human papillomavirus genotyping in support of the 2019 ASCCP risk-based management consensus guidelines. J Low Genit Tract Dis. 2020;24:144–7.
62. Wright TC Jr, Stoler MH, Behrens CM, Apple R, Derion T, Wright TL. The ATHENA human papillomavirus study: design, methods, and baseline results. Am J Obstet Gynecol. 2012;206:46.e1–e11.
63. Stoler MH, Wright TC Jr, Parvu V, Vaughan L, Yanson K, Eckert K, et al. The Onclarity human papillomavirus trial: design, methods, and baseline results. Gynecol Oncol. 2018;149:498–505.
64. Schiffman M, Adrianza ME. ASCUS-LSIL Triage Study. Design, methods and characteristics of trial participants. Acta Cytol. 2000;44:726–42.
65. Herrero R, Hildesheim A, Rodriguez AC, Wacholder S, Bratti C, Solomon D, et al. Rationale and design of a community-based double-blind randomized clinical trial of an HPV 16 and 18 vaccine in Guanacaste, Costa Rica. Vaccine. 2008;26:4795–808.
66. Herrero R, Schiffman MH, Bratti C, Hildesheim A, Balmaceda I, Sherman ME, et al. Design and methods of a population-based natural history study of cervical neoplasia in a rural province of Costa Rica: the Guanacaste Project. Rev Panam Salud Publica. 1997;1:362–75.
67. Katki HA, Kinney WK, Fetterman B, Lorey T, Poitras NE, Cheung L, et al. Cervical cancer risk for women undergoing concurrent testing for human papillomavirus and cervical cytology: a population-based study in routine clinical practice. Lancet Oncol. 2011;12:663–72.
68. Clarke MA, Cheung LC, Castle PE, Schiffman M, Tokugawa D, Poitras N, et al. Five-year risk of cervical precancer following p16/Ki-67 dual-stain triage of HPV-positive women. JAMA Oncol. 2019;5:181–6.
69. Wellner JA, Zhan Y. A hybrid algorithm for computation of the nonparametric maximum likelihood estimator from censored data. J Am Stat Assoc. 1997;92:945–59.
70. Jackson C. Multi-state models for panel data: the msm package for R. J Stat Softw. 2011;38:1–28.
71. Cheung LC, Berg CD, Castle PE, Katki HA, Chaturvedi AK. Life-gained-based versus risk-based selection of smokers for lung cancer screening. Ann Intern Med. 2019;171:623–32.
72. Gruger J, Kay R, Schumacher M. The validity of inferences based on incomplete observations in disease state models. Biometrics. 1991;47:595–605.
73. Cook RJ, Lawless JF. Statistical issues in modeling chronic disease in cohort studies. Stat Biosci. 2014;6:127–61.
74. de Una-Alvarez J, Meira-Machado L. Nonparametric estimation of transition probabilities in the non-Markov illness-death model: a comparative study. Biometrics. 2015;71:364–75.
75. Campos NG, Demarco M, Bruni L, Desai KT, Gage JC, Adebamowo SN, et al. A proposed new generation of evidence-based microsimulation models to inform global control of cervical cancer. Prev Med. 2021;144:106438.
76. Cheung LC, Pan Q, Hyun N, Schiffman M, Fetterman B, Castle PE, et al. Mixture models for undiagnosed prevalent disease and interval-censored incident disease: applications to a cohort assembled from electronic health records. Stat Med. 2017;36:3583–95.
77. Katki HA, Schiffman M, Castle PE, Fetterman B, Poitras NE, Lorey T, et al. Five-year risks of CIN 3+ and cervical cancer among women who test Pap-negative but are HPV-positive. J Low Genit Tract Dis. 2013;17:S56–63.
78. Aralis H, Brookmeyer R. A stochastic estimation procedure for intermittently-observed semi-Markov multistate models with back transitions. Stat Methods Med Res. 2019;28:770–87.
79. Gasparini A, Humphreys K. Estimating latent, dynamic processes of breast cancer tumour growth and distant metastatic spread from mammography screening data. Stat Methods Med Res. 2022;31:862–81.
80. Abrahamsson L, Isheden G, Czene K, Humphreys K. Continuous tumour growth models, lead time estimation and length bias in breast cancer screening studies. Stat Methods Med Res. 2020;29:374–95.
81. Schick A, Yu Q. Consistency of the GMLE with mixed case interval-censored data. Scand J Stat. 2000;27:45–55.
82. Zhang Z, Sun J. Interval censoring. Stat Methods Med Res. 2010;19:53–70.
83. Panageas KS, Ben-Porat L, Dickler MN, Chapman PB, Schrag D. When you look matters: the effect of assessment schedule on progression-free survival. J Natl Cancer Inst. 2007;99:428–32.
84. Sutradhar R, Barbera L. Multistate models for examining the progression of intermittently measured patient-reported symptoms among patients with cancer: the importance of accounting for interval censoring. J Pain Symptom Manag. 2021;61:54–62.
85. Tolusso D, Cook RJ. Robust estimation of state occupancy probabilities for interval-censored multistate data: an application involving spondylitis in psoriatic arthritis. Commun Stat - Theory Methods. 2009;38:3307–25.
86. Albert PS. A Mover-Stayer model for longitudinal marker data. Biometrics. 1999;55:1252–7.
87. Yen AMF, Chen THH, Duffy SW, Chen C-D. Incorporating frailty in a multi-state model: application to disease natural history modelling of adenoma-carcinoma in the large bowel. Stat Methods Med Res. 2010;19:529–46.
88. Chen HH, Duffy SW, Tabar L. A mover-stayer mixture of Markov chain models for the assessment of dedifferentiation and tumour progression in breast cancer. J Appl Stat. 1997;24:265–78.
89. Hsu C-Y, Hsu W-F, Yen AM-F, Chen H-H. Sampling-based Markov regression model for multistate disease progression: applications to population-based cancer screening program. Stat Methods Med Res. 2020;29:2198–216.

ACKNOWLEDGEMENTS
This work utilised the computational resources of the NIH HPC Biowulf cluster (http://hpc.nih.gov).

AUTHOR CONTRIBUTIONS
LCC had full access to the data in the study and takes final responsibility for the decision to submit it for publication. LCC, PSA, and RJC conceived and designed the work. LCC drafted the manuscript. LCC and SD created the figures. All authors participated in statistical analysis and interpretation and in critical revision of the manuscript.

FUNDING
This work was funded in part by the Intramural Research Program of the US National Institutes of Health (NIH)/National Cancer Institute.

COMPETING INTERESTS
The authors declare no competing interests.

ETHICS APPROVAL AND CONSENT TO PARTICIPATE
The research was carried out on simulated data. No ethics approval was necessary.

CONSENT TO PUBLISH
None.

ADDITIONAL INFORMATION
Supplementary information: The online version contains supplementary material available at https://doi.org/10.1038/s41416-022-01904-5.
Correspondence and requests for materials should be addressed to Li C. Cheung.
Reprints and permission information is available at http://www.nature.com/reprints
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
