com/bjc
ARTICLE
Epidemiology
This is a U.S. Government work and not under copyright protection in the US; foreign copyright protection may apply 2022
BACKGROUND: Multistate models can be effectively used to characterise the natural history of cancer. Inference from such models has previously been useful for setting screening policies.
METHODS: We introduce the basic elements of multistate models and the challenges of applying these models to cancer data. Through simulation studies, we examine (1) the impact of assuming time-homogeneous Markov transition intensities when the intensities depend on the time since entry to the current state (i.e., the process is time-inhomogeneous semi-Markov) and (2) the effect on precancer risk estimation when observation times depend on an unmodelled intermediate disease state.
RESULTS: In the settings we examined, we found that misspecifying a time-inhomogeneous semi-Markov process as a time-homogeneous Markov process resulted in biased estimates of the mean sojourn times. When screen-detection of the intermediate disease leads to more frequent future screening assessments, there was minimal bias induced compared to when screen-detection
BACKGROUND
Multistate models characterise the movement of individuals through successive states in a disease process. These are often used to describe the occurrence of complications and mortality in patients receiving treatment for cancer [1–3] and other diseases [4, 5]. They have also proven useful in describing the natural history of diseases, including, for example, the progression through different risk strata for cardiovascular disease [6], the impact of alcohol consumption on dementia and the mediating role of cardiometabolic disease [7], and the transition of individuals through progressively advanced states of retinopathy among diabetics [8].
The notion of cancer as a multistate process originated in Armitage and Doll’s theory of carcinogenesis [9], which conceptualises the cancer process as a series of mutational events leading to malignancy. Since then, the clonal evolution of cancer has been demonstrated repeatedly [10, 11], and several clinically identifiable cancer precursor states have been discovered [12]. Multistate models characterising the transitions across healthy, preclinical, and clinical cancer states have been used in breast [13–22], colorectal [23–26], cervical [27–30], liver [31], lung [32], prostate [33–35] and gastric [36] cancers. In cancers caused by viruses, such as cervical (human papillomavirus (HPV)) and nasopharyngeal (Epstein–Barr virus) cancers, it has been useful to include an infection state in the multistate process [28, 30, 37]. The availability of large-scale cohorts featuring longitudinal multi-omics data [38] means that even richer classes of models for the natural history of cancer can be developed.
In this paper, we introduce the basic elements of multistate models and how they can be used, describe some of the methodological challenges unique to studies involving cancer and, through simulation studies, examine two modelling challenges that can affect inferences intended to inform cancer screening practice. In the first simulation study, we investigate the impact of misspecifying transition intensities on inferences about sojourn times in disease states. In the second simulation study, we target the absolute risk of disease based on a two-state survival model but examine the impact of an intermittent observation process that depends on an intermediate disease state.

METHODS
Introduction to multistate models
Figure 1a displays a general K-state model in which arrows depict transitions that can be made directly; here, transitions can be made from each state to any other state, but many models feature restrictions. The two-state survival model (Fig. 1b), competing risk model (Fig. 1c) and illness-death model (Fig. 1d) are examples of useful multistate models that have been widely studied and applied in cancer research. We let X(t) denote the state occupied by an individual at age t, taking one of the values 1, 2, …, K, and let $H(t) = \{X(s);\ 0 \le s \le t\}$ be the history of state occupancy up to age t, t > 0. We consider a set of fixed attributes recorded in a covariate vector z. The risk of movement between states i and j may be characterised
¹Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD, USA. ²Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, ON, Canada. ✉email: li.cheung@nih.gov
[Fig. 2 diagram. a: 1 = Healthy, 2 = Cancer precursor, 3 = Clinical cancer, 4 = Death. b: 1 = HPV-negative, precancer-free; 2 = HPV-positive, precancer-free; 3 = HPV-positive, cervical precancer; arrows labelled Acquisition, Progression and Clearance.]
Fig. 2 Examples of multistate cancer processes. a Progressive cancer process with four states: healthy, preclinical cancer, clinical cancer, and death. b Cancer process with three states: HPV-negative, HPV-positive without precancer, and HPV-positive with precancer. Acquired HPV infections can either clear or progress.
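A minimal sketch of how transition probabilities for the four-state progressive process in Fig. 2a follow from its intensities, assuming the process is time-homogeneous Markov: with the intensity matrix Q built from qij = exp(bij1 + bij2z), the transition probability matrix is P(t) = exp(Qt). The parameter values are the true values given in the footnotes of the simulation table below (exponential 2→3 setting); we additionally assume that the 3→4 intensity has no covariate effect. The function names and the hand-rolled scaling-and-squaring matrix exponential are our own illustration, not the authors' code; in practice a library routine such as scipy.linalg.expm would normally be used.

```python
import math

def rate(b1, b2, z):
    """Intensity q_ij = exp(b_ij1 + b_ij2 * z), as in the simulation table footnote."""
    return math.exp(b1 + b2 * z)

def generator(z):
    """Intensity matrix Q for Fig. 2a (0-based: 0 healthy, 1 precursor, 2 clinical cancer, 3 death)."""
    q12 = rate(-5.5, 0.5, z)
    q23 = rate(-3.0, 0.5, z)  # exponential 2->3 setting; b232 = b122 = 0.5
    q14 = rate(-4.0, 0.2, z)
    q24 = q14                 # b141 = b241 and b142 = b242
    q34 = 1.0                 # exp(b34) = 1; ASSUMPTION: no covariate effect on 3->4
    Q = [[0.0] * 4 for _ in range(4)]
    Q[0][1], Q[0][3] = q12, q14
    Q[1][2], Q[1][3] = q23, q24
    Q[2][3] = q34
    for i in range(4):
        Q[i][i] = -sum(Q[i][j] for j in range(4) if j != i)  # each row of Q sums to zero
    return Q

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def expm(A, terms=20):
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    n = len(A)
    norm = max(sum(abs(x) for x in row) for row in A)  # infinity norm
    s = max(0, math.ceil(math.log2(norm)) + 1) if norm > 0 else 0
    scaled = [[x / 2 ** s for x in row] for row in A]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in matmul(term, scaled)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(s):  # undo the scaling: exp(A) = exp(A / 2^s)^(2^s)
        result = matmul(result, result)
    return result

def transition_probs(t, z=0.0):
    """P(t) = exp(Q t) for a time-homogeneous Markov process."""
    return expm([[q * t for q in row] for row in generator(z)])

P = transition_probs(10.0)
print([round(p, 3) for p in P[0]])  # 10-year state probabilities starting healthy, z = 0
```

Because state 1 cannot be re-entered, P11(t) reduces to exp(−(q12 + q14)t), which gives a simple check on the numerical result.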
L.C. Cheung et al.

| Modelled transition intensities | Exponential | | | Exponential | | | Exponential for all except the 2→3 transition, which is gamma | | |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Parameterᵃ | Relative bias, mean (SE), %ᵇ | ASE/ESDᵇ | 95% CP, %ᵇ | Relative bias, mean (SE), %ᵇ | ASE/ESDᵇ | 95% CP, %ᵇ | Relative bias, mean (SE), %ᵇ | ASE/ESDᵇ | 95% CP, %ᵇ |
| b121 | 0.0 (0.0) | 0.99 | 94.6 | −0.4 (0.0) | 1.01 | 95.0 | 0.0 (0.0) | 1.00 | 96.0 |
| b122 = b232ᶜ,ᵈ | −0.1 (0.3) | 0.98 | 94.2 | 8.1 (0.3) | 1.04 | 83.2 | 0.1 (0.3) | 1.02 | 95.4 |
| b141 = b241ᵉ | 0.0 (0.0) | 0.98 | 95.4 | 0.0 (0.0) | 0.95 | 94.6 | 0.0 (0.0) | 0.95 | 94.6 |
| b142 = b242ᵉ | −0.3 (0.5) | 0.98 | 94.8 | −0.0 (0.5) | 0.97 | 95.6 | 0.0 (0.5) | 0.97 | 95.6 |
| b231ᵈ | −0.1 (0.1) | 0.97 | 95.2 | −50 (0.15) | 1.12 | 0.0 | −0.1 (0.1) | 0.99 | 95.4 |
| exp(b34)ᶠ | 0.3 (0.3) | 1.07 | 97.0 | 1.0 (0.4) | 1.02 | 96.0 | 1.0 (0.4) | 1.02 | 96.0 |
| MST in state 1ᵍ | 0.0 (0.1) | 1.01 | 95.6 | 0.3 (0.1) | 0.97 | 95.0 | 0.0 (0.1) | 0.97 | 95.2 |
| MST in state 2ᵍ | 0.3 (0.3) | 0.97 | 94.8 | 28 (0.3) | 1.12 | 0.8 | 0.2 (0.2) | 0.99 | 95.8 |
| MST in state 3ᵍ | −0.2 (0.3) | 1.08 | 96.0 | −0.2 (0.4) | 1.03 | 95.4 | −0.3 (0.4) | 1.03 | 95.4 |

MST mean sojourn time, SE standard error, ASE asymptotic standard error, ESD empirical standard deviation, CP coverage probability.
ᵃ States are 1 = healthy, 2 = cancer precursor state, 3 = clinical cancer state, 4 = death. Transitions from state i to j have rate parameter qij = exp(bij1 + bij2z), where covariate z follows a standard normal distribution. True parameter values are b121 = −5.5, b122 = b232 = 0.5, b141 = b241 = −4, b142 = b242 = 0.2, exp(b34) = 1, b231 = −3 (when it is an exponential rate parameter) and b231 = −3 + log(2) (when it is a gamma rate parameter).
ᵇ Presented results are based on 500 simulation runs. Relative bias is the average difference between the estimated and true parameter, standardised by dividing by the true parameter value and reported on the percentage scale. Positive bias indicates overestimation and negative bias indicates underestimation.
ᶜ The covariate effect for the transition from the healthy to the cancer precursor state is the same as for the transition from the cancer precursor to the clinical cancer state. This constraint is used in both data simulation and model estimation.
ᵈ Transition intensity whose rate parameter q23 = exp(b231 + b232z) belongs to an exponential or a gamma distribution, depending on the simulation or model setting.
ᵉ The transition intensity for death among cancer-free individuals does not change upon reaching the cancer precursor state. This constraint is applied in both data simulation and model estimation.
ᶠ Because relative bias is not well defined for b34 = 0, we assessed relative bias against exp(b34) = 1.
ᵍ Mean sojourn time is for the reference covariate level (z = 0), with values of 44.6 (state 1), 14.7 (state 2, exponential 2→3 transition), 15.6 (state 2, gamma 2→3 transition) and 1 (state 3).
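The mean sojourn times quoted in footnote g can be checked from the true parameter values. When every exit from a state is exponential, the mean sojourn time is the reciprocal of the summed exit rates. For the gamma setting, the shape parameter is not stated in the table; the sketch below assumes a shape of 2, so that the log(2) shift in b231 keeps the mean 2→3 time equal to that of the exponential setting. For integer shape m, gamma rate λ and a competing exponential death rate μ, the expected sojourn E[min(T, D)] equals the sum over k < m of λ^k/(λ + μ)^(k+1). All helper names are hypothetical; relative_bias_pct simply encodes the definition in footnote b.

```python
import math

def rate(b1, b2=0.0, z=0.0):
    """Intensity exp(b_ij1 + b_ij2 * z); z = 0 is the reference covariate level."""
    return math.exp(b1 + b2 * z)

def mst_all_exponential(exit_rates):
    """Mean sojourn time when every exit is exponential: 1 / (sum of exit rates)."""
    return 1.0 / sum(exit_rates)

def mst_gamma_with_exp_competitor(shape, gamma_rate, exp_rate):
    """E[min(T, D)] for T ~ Gamma(integer shape, gamma_rate), D ~ Exp(exp_rate).
    Integrates S_T(t) * S_D(t) in closed form: sum_{k < shape} lam^k / (lam + mu)^(k + 1)."""
    lam, mu = gamma_rate, exp_rate
    return sum(lam ** k / (lam + mu) ** (k + 1) for k in range(shape))

def relative_bias_pct(estimates, truth):
    """Footnote b: mean of (estimate - truth) / truth, on the percentage scale."""
    return 100.0 * sum((e - truth) / truth for e in estimates) / len(estimates)

q12, q14 = rate(-5.5), rate(-4.0)      # healthy -> precursor, healthy -> death
q23_exp, q24 = rate(-3.0), rate(-4.0)  # precursor -> cancer (exponential), precursor -> death
q23_gamma = rate(-3.0 + math.log(2))   # gamma rate parameter for 2 -> 3
q34 = rate(0.0)                        # exp(b34) = 1

mst1 = mst_all_exponential([q12, q14])
mst2_exp = mst_all_exponential([q23_exp, q24])
mst2_gamma = mst_gamma_with_exp_competitor(2, q23_gamma, q24)  # ASSUMPTION: gamma shape = 2
mst3 = mst_all_exponential([q34])
print(round(mst1, 1), round(mst2_exp, 1), round(mst2_gamma, 1), round(mst3, 1))
# prints: 44.6 14.7 15.6 1.0
```

Under the shape-2 assumption, the mean sojourn time in state 2 differs between the two settings even though the mean 2→3 time is the same, because with death competing, the expected sojourn depends on the whole 2→3 distribution rather than only its mean.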
[Fig. 3 plot, panels a and b: cumulative risk versus time (years, 0 to 10).]
Fig. 3 Comparison of risk estimation approaches in data generated under various observation schemes. The risk estimation approaches compared are a two-state survival [Surv.] approach and a time-homogeneous Markov [Markov] model. The observation schemes considered are: [fixed visits], in which visits occur every 6 months; [random visits], in which visits are generated using a mixed-case interval-censoring scheme with intervals between visits normally distributed with a mean of 1 year; doctor’s care [DC], in which intervals between visits are normally distributed with means of 1 and 3 years following positive and negative HPV results, respectively; and follow-up avoidance [FA], in which intervals between visits are normally distributed with means of 5 and 3 years following positive and negative HPV results, respectively. a Precancer risk stratified by HPV status at study start (denoted as time 0). For precancer risks following HPV+ at the study start, the true risk for the simulated data (red curve), all Markov-estimated risks (orange curves), and survival-estimated risks under fixed visits (blue curve, solid lines) overlap; survival-estimated risks under random visits and doctor’s care (blue curves, uniformly dashed lines) overlap with one another for t < 1 year and overlap with the true risk for t ≥ 1 year. For precancer risks following HPV− at the study start, the true risk for the simulated data (black curve), all Markov-estimated risks (purple curves), and all survival-estimated risks (dark green curves) overlap. b Precancer risk following the detection of incident HPV infections (time of detection denoted as time 0). Markov-estimated risks are for the observation scheme currently recommended for primary HPV screening in the US: 1-year and 5-year retesting intervals following HPV-positive and HPV-negative results, respectively. All Markov-estimated risks (orange curves) overlap. All presented risk curves are averages of 500 simulation runs.
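The observation schemes in the Fig. 3 caption can be sketched as a visit generator in which the gap to the next visit depends on the HPV result at the current visit; this dependence is what makes the DC and FA schedules informative. The normal standard deviation (0.25 years), the truncation of gaps at 0.05 years, the use of plain normal gaps for "random visits" and the HPV trajectory are all assumptions for illustration, not the settings used in the simulations.

```python
import random

# Mean gap (years) after an HPV-positive vs HPV-negative result, from the Fig. 3 caption.
MEAN_GAP = {"fixed": (0.5, 0.5), "random": (1.0, 1.0), "DC": (1.0, 3.0), "FA": (5.0, 3.0)}

def next_gap(hpv_positive, scheme, rng, sd):
    if scheme == "fixed":
        return 0.5  # visits every 6 months
    mean = MEAN_GAP[scheme][0 if hpv_positive else 1]
    # Normal gap, truncated at 0.05 years (ASSUMPTION) so time always moves forward.
    return max(0.05, rng.gauss(mean, sd))

def visit_schedule(hpv_status_at, scheme, horizon, rng, sd=0.25):
    """Visit times (years since study start) up to `horizon`; each gap depends on the
    HPV result observed at the current visit. sd = 0.25 is an ASSUMPTION."""
    visits, t = [0.0], 0.0
    while True:
        t += next_gap(hpv_status_at(t), scheme, rng, sd)
        if t > horizon:
            return visits
        visits.append(t)

# HYPOTHETICAL infection trajectory: HPV-positive from year 2 until clearance at year 6.
def status(t):
    return 2.0 <= t < 6.0

rng = random.Random(1)
for scheme in ("fixed", "random", "DC", "FA"):
    print(scheme, [round(v, 2) for v in visit_schedule(status, scheme, 10.0, rng)])
```

Setting sd=0.0 makes the dependence easy to see: with the hypothetical trajectory above, DC yields visits at 0, 3, 4, 5, 6 and 9 years, whereas FA yields visits only at 0, 3 and 8 years, so the HPV-positive period is observed far less densely under FA.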
microsimulation models [75]. These risk estimates can be used to determine whether more invasive procedures are required [59]. However, the assumptions of the multistate model should be assessed. In simulations examining the effect of informative observation times, we assumed that the competing transitions of HPV clearance and progression follow exponential distributions; however, current evidence indicates that these transition intensities vary with time [76, 77]. Our first set of simulations shows that treating a semi-Markov process as Markov can result in considerable bias and can misinform screening recommendations. Modelling a reversible semi-Markov process with panel data is particularly difficult, as the available methods apply only when individuals are observed at prespecified fixed times [30, 78].
We examined the scenario in which the observation process depends on an unmodelled intermediate disease state and showed that bias in inference can be reduced by expanding the model to include that state. Other interrelationships between the disease process and the observation scheme can occur and are not as easily resolved, particularly when the observation process depends on unknown characteristics such as shared random effects [72, 73].
For both the multistate model and the two-state survival approaches, we used methods that accounted for intermittent observations, which result in interval-censored transition times [81, 82]. Our findings on the robustness of risk estimation for two-state survival approaches apply only to approaches that properly account for interval censoring; prior articles have shown that ignoring interval censoring can lead to invalid inferences [83, 84]. In addition, risk analyses in cancer data often exclude individuals with clinical cancer at enrolment and assume that the remaining individuals are in a healthy state. Without assessment for preclinical states, this assumption can lead to biased inference, especially when compounded by ignoring interval censoring [76]. In hidden Markov models [28] and two-state survival models [76], missing assessments or partial non-assessment, which arises when the disease state is defined by multiple tests, can be handled under a missing-at-random assumption (missingness depends on covariates). In semi-Markov models, occupancy of states other than the healthy state at enrolment can complicate estimation because of left truncation [85], and handling it requires assuming a time before enrolment at which all individuals were in the healthy state.
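A minimal sketch of why interval censoring must be accounted for, in a deliberately simple setting assumed for illustration: exponential time to event, annual visits and administrative censoring at 10 years. An event known only to lie in (L, R] contributes S(L) − S(R) to the likelihood, whereas treating the visit at which the event is detected as the exact event time overstates person-time. The data-generating setup, the golden-section maximiser and all function names are our own choices, not the estimation methods cited in [81, 82].

```python
import math
import random

def simulate(n, lam, visit_gap, follow_up, rng):
    """ILLUSTRATIVE data: exponential event times seen only at visits every `visit_gap` years."""
    data = []
    for _ in range(n):
        t = rng.expovariate(lam)
        if t <= follow_up:
            k = max(1, math.ceil(t / visit_gap))
            data.append(((k - 1) * visit_gap, k * visit_gap))  # event known to lie in (L, R]
        else:
            data.append((follow_up, math.inf))                  # event-free at last visit
    return data

def log_lik(lam, data):
    """Interval-censored exponential log-likelihood: sum of log[S(L) - S(R)], S(t) = exp(-lam t)."""
    ll = 0.0
    for left, right in data:
        ll -= lam * left                                          # log S(L)
        if not math.isinf(right):
            ll += math.log(-math.expm1(-lam * (right - left)))    # log(1 - exp(-lam (R - L)))
    return ll

def golden_max(f, lo, hi, tol=1e-7):
    """Golden-section search for the maximiser of a unimodal function on [lo, hi]."""
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
        else:
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
    return (a + b) / 2

data = simulate(5000, 0.2, 1.0, 10.0, random.Random(2022))
lam_ic = golden_max(lambda lam: log_lik(lam, data), 1e-4, 5.0)

# Naive analysis: treat the detecting visit R as the exact event time.
events = sum(1 for _, r in data if not math.isinf(r))
exposure = sum(l if math.isinf(r) else r for l, r in data)
lam_naive = events / exposure
print(round(lam_ic, 3), round(lam_naive, 3))  # true rate is 0.2
```

In this setting the interval-censored estimate stays close to the true rate, while the naive estimate is biased downwards because every event time is pushed to the next visit.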
In our simulation scenario, we considered HPV-positive as an intermediate disease state in the cervical precancer process; in practice, the clinician may also have HPV genotyping, cytology, and colposcopy/biopsy results. While these can be included as additional states in a multistate model [28], the model's complexity can grow quickly as the number of states increases. An alternative to multistate models is to use biologically inspired tumour growth models that relate the unobserved natural history of the tumour to patient outcomes [79, 80]. Such models are relatively new but have the advantage of representing tumour growth as a continuous process.
Standard multistate models typically assume population homogeneity in transition rates given observed covariates, but in cancer a substantial portion of the “at-risk” population never progresses beyond the normal healthy state. Mover–stayer models have been proposed to account for this latent population heterogeneity; the model is considered a mixture of two independent Markov chains, one in which individuals remain in the normal healthy state and the other with an unknown transition matrix to be estimated [28, 86]. Mover–stayer models have also been used to characterise heterogeneity in polyp progression to colorectal cancer [87] and change in breast cancer tumour malignancy [88].
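A minimal mover–stayer sketch under strong simplifying assumptions: the mover chain has only two states (healthy → progressed) with an exponential progression rate, progression times are observed exactly with administrative censoring, and the stayer fraction π and mover rate λ are estimated by expectation-maximisation (EM). This is a hypothetical illustration of the mixture idea, not the model specifications used in [28, 86–88], which involve richer transition matrices and intermittent observation.

```python
import math
import random

def simulate(n, stayer_frac, lam, follow_up, rng):
    """ILLUSTRATIVE data: movers progress at exponential rate `lam`; stayers never leave."""
    data = []  # (observed time, progressed?)
    for _ in range(n):
        if rng.random() < stayer_frac:
            data.append((follow_up, False))
        else:
            t = rng.expovariate(lam)
            data.append((t, True) if t <= follow_up else (follow_up, False))
    return data

def fit_mover_stayer(data, iters=300, pi=0.5, lam=0.1):
    """EM for the stayer fraction pi and the mover progression rate lam."""
    n = len(data)
    events = sum(1 for _, progressed in data if progressed)
    for _ in range(iters):
        # E-step: posterior probability of being a mover (1 for anyone who progressed).
        w = []
        for t, progressed in data:
            if progressed:
                w.append(1.0)
            else:
                mover = (1 - pi) * math.exp(-lam * t)
                w.append(mover / (pi + mover))
        # M-step: closed-form updates for the mixture weight and the exponential rate.
        pi = 1 - sum(w) / n
        lam = events / sum(wi * t for wi, (t, _) in zip(w, data))
    return pi, lam

pi_hat, lam_hat = fit_mover_stayer(simulate(4000, 0.6, 0.3, 10.0, random.Random(7)))
print(round(pi_hat, 3), round(lam_hat, 3))  # simulated with pi = 0.6, lam = 0.3
```

Unlike a homogeneous Markov chain, the fitted mixture implies a cumulative progression risk of (1 − π)(1 − e^(−λt)), which plateaus at 1 − π rather than approaching 1.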
ETHICS APPROVAL AND CONSENT TO PARTICIPATE
The research was carried out on simulated data. No ethics approval was necessary.