
Chapter 1

Multiple-spells competing risks model using copula

Simon M. S. Lo

2 CHAPTER 1. MULTIPLE-SPELLS COMPETING RISKS MODEL

Abstract

We show that when multiple spells are available, the competing risks model can
be identified with an assumed form of copula even if its dependency parameter is
unknown. Specifically, the competing latent marginal distributions, cause-specific
cumulative incidence functions and the overall survival functions can be identified
nonparametrically, while the copula’s dependency parameter can be identified with
an assumed parametric form. We show in simulations that even if the parametric
form of the copula is unknown and sensitivity analysis is conducted, the width of the
bounds of the estimated latent marginal distributions can be reduced considerably
compared with the case when only single spells are available, in which the dependency
parameter cannot be identified. We propose a maximum likelihood estimator using a flexible parametric function for the cause-specific cumulative incidence functions of the observed multiple durations. As the underlying assumptions on the data generating process are completely different from those of the multivariate mixed proportional hazard model, our estimator provides a pragmatic alternative for practitioners when different rival models are to be fitted and compared.

Keywords: Dependent censoring, Archimedean copula, Multiple spells


JEL: C24, C41

1.1 Motivation
It is well known that in a competing risks model only the overall survival function and the risk-specific cumulative incidence functions (also called subdistribution functions) are identified; the latent marginal distributions are not identified nonparametrically without imposing some restrictions on the model (Cox, 1962; Tsiatis, 1975). While the risk-specific cumulative incidence functions are the focus of research interest in biostatistics (Kalbfleisch and Prentice, 2002), economists are mainly interested in studying the latent marginal distributions (van den Berg, 2001). The latent marginal distributions can be identified by exploiting the variation induced by covariates if they are assumed to follow a mixed proportional hazard model or an accelerated failure time model (Heckman and Honoré, 1989). An alternative approach does not require covariates but assumes that the copula of the joint distribution is completely known, i.e. both its parametric form and its parameter are known, while leaving the latent marginal distributions unspecified (Zheng and Klein, 1995).

The main advantage of the first approach is its flexibility, as it requires only semiparametric assumptions on the latent marginals while leaving the copula unspecified. But the corresponding estimator relies heavily on observations with failure times close to zero (Fermanian, 2003), and some critical issues remain unsolved, see e.g. Lee and Lewbel (2013). Thus, in practice, additional parametric assumptions on the marginal distributions as well as on the joint distribution are often imposed (Crowder, 2012).

In contrast, the second approach can be easily implemented by a corresponding maximum likelihood estimator, called the copula graphic estimator (CGE), and it avoids imposing restrictions on the latent marginal distributions that are hard to justify in certain circumstances, see e.g. Fitzenberger and Wilke (2010). In the presence of covariates, Lo and Wilke (2014) show that the conditional marginal distributions and the covariate effects can be analyzed nonparametrically when the parametric forms of the cumulative incidence functions are known. They argue that while the latent marginal distributions are unobservable from the data, the choice of a suitable model for the cumulative incidence functions can be verified empirically. The major shortcoming of the CGE, however, is that the copula, including its parametric form and its dependency parameter, is often unknown and untestable. For this reason, the CGE is usually applied in sensitivity analyses, which try different parametric classes of copulas and let the dependency parameter vary over all possible values. The dependency parameter of the copula determines the global dependence structure between the competing risks, which can be summarized by the usual rank dependence measures, e.g. Kendall’s τ. On the other hand, different parametric classes of copulas have different tail dependence properties between the competing risks (Nelson, 2006).

Previous studies show that the estimated bounds for the latent marginal distributions using the CGE are often wide, and that the unknown dependency parameter, rather than the choice of the parametric class of the copula, is the major cause (Lo and Wilke, 2014). Motivated by these findings, this paper shows that when multiple spells are available and are jointly modelled by an Archimedean copula, the dependency parameter can be identified and estimated. Under certain conditions, the cause-specific cumulative incidence functions and the overall survival functions of the multiple durations can be identified nonparametrically except for a normalizing constant. Finally, the latent marginal distributions can be identified nonparametrically using the standard CGE. Even if the parametric family of the copula is unknown, we show with simulations that the width of the bounds of the latent marginal distributions estimated in a sensitivity analysis can be reduced considerably compared with the case where only single spells are available and the dependency parameter cannot be identified.

We propose a maximum likelihood estimator for the copula parameter and the parametric cause-specific cumulative incidence functions of the observed multiple durations. The parametric version of our model is similar to the multiple-spells parametric multivariate mixed proportional hazard model (MMPHM), in which the mixture distribution plays a role similar to that of the copula. But the assumptions here are made on the cause-specific cumulative incidence functions rather than on the latent marginal distributions. These imply completely different restrictions on the data generating process. Hence, our model provides a pragmatic alternative for practitioners when different rival models are to be fitted and compared, especially when the MMPHM fits the data poorly.

In Section 1.2 we provide identification results and propose a maximum likelihood estimator for the model. In Section 1.3 we run a series of simulations to investigate the performance of the estimators. We conclude in the last section.

1.2 The Model

1.2.1 Identification

Consider a competing risks model with two risks, $a$ and $b$, and multiple spells, in a population of size $m$. Let $n_i\,(>1)$ be the number of pairs of latent marginal survival functions for individual $i$. We assume that the probability of observing individuals with the same number of spells is positive, i.e.

Assumption 1.1 Let $N_i$ follow a discrete probability distribution such that $\lim_{m\to\infty}\Pr(N_i = z) > 0$ for some positive integer $z < \infty$. $N_i$ is independent of the latent marginal durations.

For each individual $i$ there are then $2n_i$ spells of latent marginal durations. Let $n = \sum_{i=1}^{m} n_i$, so that there are $2n$ spells of latent marginal durations in the population. Let the $k$-th spell of the latent marginal durations of individual $i$ with destination state $r$, $T_{rik}$, have latent marginal survival function $S_r(t_{rik}) = \Pr(T_{rik} > t_{rik}) = U_{rik}$ for $r = a$ or $b$. Assuming that the latent marginal survivals $U_{rik}$ are jointly distributed according to an Archimedean copula, the joint survival function of the latent marginal durations equals

$$
S(t_{ai1}, t_{bi1}, \ldots, t_{ain_i}, t_{bin_i}) = C_\theta^{(2n_i)}\big(S_a(t_{ai1}), S_b(t_{bi1}), \ldots, S_a(t_{ain_i}), S_b(t_{bin_i})\big), \tag{1.1}
$$

where $C_\theta^{(p)}(X_1, \ldots, X_p) = \varphi_\theta^{-1}\big(\sum_{j=1}^{p} \varphi_\theta(X_j)\big)$ is a $p\,(\ge 2)$-dimensional Archimedean copula with copula generator $\varphi_\theta$ and unknown dependency parameter $\theta \in \mathbb{R}$. For (1.1) to hold for any $n_i > 0$, we require
Assumption 1.2 $\varphi_\theta$ is a continuous, strictly decreasing function from $[0, 1]$ to $[0, \infty]$ such that $\varphi_\theta(0) = \infty$ and $\varphi_\theta(1) = 0$, and its inverse $\varphi_\theta^{-1}$ is completely monotonic on $[0, \infty)$.

Assumption 1.2 restricts the possible set of $\theta$ such that the marginals in (1.1) are positively dependent (Nelson, 2006, p.152). $C_\theta^{(2n_i)}$ in (1.1) is equivalent to a fully nested hierarchical copula that uses an $n_i$-dimensional copula to join the $n_i$ pairs of competing latent marginals (Joe, 1997):

$$
C_\theta^{(2n_i)}(U_{ai1}, U_{bi1}, \ldots, U_{ain_i}, U_{bin_i}) = C_\theta^{(n_i)}\big(C_\theta^{(2)}(U_{ai1}, U_{bi1}), \ldots, C_\theta^{(2)}(U_{ain_i}, U_{bin_i})\big), \tag{1.2}
$$

where $C_\theta^{(2)}(U_{aik}, U_{bik})$ in (1.2), evaluated at $t_{aik} = t_{bik} = t_{ik}$, is the survival function of the identified minimum of the latent marginal durations, $T_{ik} = \min\{T_{aik}, T_{bik}\}$, which is given by

$$
C_\theta^{(2)}\big(S_a(t_{ik}), S_b(t_{ik})\big) = S(t_{ik}) = \exp(-\Lambda(t_{ik})), \tag{1.3}
$$

where $S(t) = \Pr(T > t)$ is the overall survival function and $\Lambda(t) = \int_0^t \lambda(u)\,du$ is the integrated hazard function. Let $\lambda_r$ be the cause-specific hazard function, such that $\lambda = \lambda_a + \lambda_b$, and let $\Lambda_r(t) = \int_0^t \lambda_r(u)\,du$. The pseudo cause-specific survival function (Jeong and Fine, 2006) is

$$
W_r(t) = \exp(-\Lambda_r(t)), \tag{1.4}
$$

such that $S(t) = W_a(t)W_b(t)$. The cumulative incidence function for risk $r$ is $Q_r(t) = \Pr(T \le t, \delta_r = 1) = \int_0^t \lambda_r(u)S(u)\,du$, and it is related to the copula by

$$
Q_a'(t_{ik}) = -\left.\frac{\partial C_\theta^{(2)}\big(S_a(t_{aik}), S_b(t_{bik})\big)}{\partial t_{aik}}\right|_{t_{aik} = t_{bik} = t_{ik}}. \tag{1.5}
$$
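To make (1.1) to (1.5) concrete, the sketch below evaluates a $p$-dimensional Archimedean copula, checks the nested representation (1.2) for $n_i = 2$, and computes the sub-densities $Q_a'$ and $Q_b'$ from (1.5) by central differences. The Clayton generator with $\theta = 2$, the exponential latent marginals and all function names are illustrative assumptions, not part of the model.

```python
import math

THETA = 2.0  # assumed Clayton parameter (illustration only)


def phi(u, theta=THETA):
    """Clayton generator: phi(u) = (u^-theta - 1) / theta."""
    return (u ** -theta - 1.0) / theta


def phi_inv(s, theta=THETA):
    """Inverse generator: (1 + theta*s)^(-1/theta)."""
    return (1.0 + theta * s) ** (-1.0 / theta)


def archimedean(us, theta=THETA):
    """C^(p)(u_1, ..., u_p) = phi_inv(sum_j phi(u_j))."""
    return phi_inv(sum(phi(u, theta) for u in us), theta)


def S_a(t):  # assumed latent marginal survival of risk a (rate 1)
    return math.exp(-1.0 * t)


def S_b(t):  # assumed latent marginal survival of risk b (rate 0.5)
    return math.exp(-0.5 * t)


def S(t):
    """Overall survival (1.3): C^(2)(S_a(t), S_b(t))."""
    return archimedean([S_a(t), S_b(t)])


def Q_a_prime(t, h=1e-6):
    """(1.5): minus the partial derivative of C^(2) in t_a, at t_a = t_b = t."""
    return -(archimedean([S_a(t + h), S_b(t)]) - archimedean([S_a(t - h), S_b(t)])) / (2 * h)


def Q_b_prime(t, h=1e-6):
    """Same as (1.5) with the roles of a and b swapped."""
    return -(archimedean([S_a(t), S_b(t + h)]) - archimedean([S_a(t), S_b(t - h)])) / (2 * h)


# Nested representation (1.2) with n_i = 2: four latent survival values.
us = [0.7, 0.4, 0.9, 0.55]
flat = archimedean(us)
nested = archimedean([archimedean(us[:2]), archimedean(us[2:])])

# The two sub-densities add up to the density of T = min(T_a, T_b).
t = 0.8
density = -(S(t + 1e-6) - S(t - 1e-6)) / 2e-6
print(flat, nested)
print(Q_a_prime(t) + Q_b_prime(t), density)
```

The associativity of the generator sum is what makes the flat $2n_i$-dimensional copula and the nested form agree to rounding error.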

The risk indicator for the $k$-th spell of individual $i$ is $\delta_{ik} = 1$ if $\arg\min_{r \in \{a,b\}} T_{rik} = a$, and $\delta_{ik} = 0$ otherwise. When $C_\theta^{(2)}(U_{aik}, U_{bik})$ in (1.2) is replaced by $S(t_{ik})$, the joint distribution (1.1) becomes the joint distribution of the $T_{ik}$, given by

$$
S(t_{i1}, \ldots, t_{in_i}) = C_\theta^{(n_i)}\big(S(t_{i1}), \ldots, S(t_{in_i})\big). \tag{1.6}
$$

Since $\{T_{ik}\}_{k=1,\ldots,n_i}$ are mutually dependent through the copula $C_\theta^{(n_i)}$, identification of (1.6) requires

Assumption 1.3 For some a priori chosen $t_0$, it holds that $\Lambda(t_0) = c$ for some $c \in (0, \infty)$.

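Assumption 1.4 $C_\theta^{(n_i)}$ is totally ordered in $\theta$, i.e. either $C_{\theta_1}^{(n_i)}(u_1, \ldots, u_{n_i}) \le C_{\theta_2}^{(n_i)}(u_1, \ldots, u_{n_i})$ for all $(u_1, \ldots, u_{n_i}) \in [0, 1]^{n_i}$ whenever $\theta_1 \le \theta_2$, or the same inequality holds whenever $\theta_1 \ge \theta_2$.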
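Assumption 1.4 is what lets a single diagonal value pin down $\theta$ in the proof of Proposition 1.1. As a minimal sketch, assuming a Clayton family (whose copulas increase pointwise in $\theta$ for $\theta > 0$) and hypothetical values of $\theta$, $z$ and $c$, the code below recovers $\theta$ from $S(t_0, \ldots, t_0)$ by bisection and then recovers $S(t)$ by inverting the diagonal relation (1.7).

```python
import math


def phi(u, theta):
    """Clayton generator (assumed family for this illustration)."""
    return (u ** -theta - 1.0) / theta


def phi_inv(s, theta):
    return (1.0 + theta * s) ** (-1.0 / theta)


def diagonal(s, z, theta):
    """C^(z)_theta(s, ..., s) = phi_inv(z * phi(s)), the diagonal in (1.7)."""
    return phi_inv(z * phi(s, theta), theta)


def recover_theta(joint_at_t0, c, z, lo=1e-6, hi=50.0, tol=1e-12):
    """Solve C^(z)_theta(exp(-c), ..., exp(-c)) = joint_at_t0 for theta by bisection."""
    s0 = math.exp(-c)  # S(t0) is pinned down by the normalization Lambda(t0) = c
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if diagonal(s0, z, mid) < joint_at_t0:
            lo = mid  # the Clayton diagonal increases in theta: move up
        else:
            hi = mid
    return 0.5 * (lo + hi)


def recover_S(joint_at_t, z, theta):
    """Invert (1.7): S(t) = phi_inv(phi(S(t, ..., t)) / z)."""
    return phi_inv(phi(joint_at_t, theta) / z, theta)


# Hypothetical "population" quantities generated from a true theta and S(t).
true_theta, z, c = 1.5, 2, 0.7
observed_t0 = diagonal(math.exp(-c), z, true_theta)
theta_hat = recover_theta(observed_t0, c, z)

true_S = 0.35
S_hat = recover_S(diagonal(true_S, z, true_theta), z, theta_hat)
print(theta_hat, S_hat)
```

Bisection is used only because it needs nothing beyond the monotonicity that Assumption 1.4 supplies.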

Given the normalizing Assumption 1.3, $S(t)$ is identified up to a monotonic transformation of the time scale. A sufficient condition for Assumption 1.4 is that $\varphi_{\theta_1}'(q)/\varphi_{\theta_2}'(q)$ is non-decreasing or non-increasing in $q \in [0, 1]$ (Nelson, 2006, p.137). This condition holds for many popular copula families, including the Clayton, Gumbel, Frank, Ali-Mikhail-Haq, and Joe families. We have the following identification results:

Proposition 1.1 If Assumptions 1.1 to 1.4 hold, $\theta$, $S(t)$, $Q_r(t)$, and $S_r(t)$ in (??), (??) and (??) are identified except for a normalizing constant.

Proof. Without loss of generality, let $n_i = z$, and let the number of spells with $\delta_{ik} = 1$ be $z_a$. Given Assumption 1.3 and evaluated at $t_0$, (1.6) equals $S(t_0, \ldots, t_0) = C_\theta^{(z)}(\exp(-c), \ldots, \exp(-c))$. Given Assumption 1.1, the $z$-dimensional joint distribution function on the LHS can be evaluated from the population conditional on $n_i = z$. Given that $\varphi_\theta$ is a continuous, strictly decreasing function in $\theta$, Assumption 1.4 implies that the RHS is a strictly decreasing or increasing function of $\theta$, and thus $\theta$ can be identified. Next, evaluated at any $t$, (1.6) equals

$$
S(t, \ldots, t) = C_\theta^{(z)}\big(S(t), \ldots, S(t)\big) = \varphi_\theta^{-1}\Big(\sum_{i=1}^{z} \varphi_\theta(S(t))\Big) = \varphi_\theta^{-1}\big(z\,\varphi_\theta(S(t))\big), \tag{1.7}
$$
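which is a strictly increasing function of $S(t)$ under Assumption 1.2. $S(t)$ is then identified. To identify $Q_r(t)$, we obtain the joint density function of $\{(T_{ik}, \delta_{ik})\}_{k=1,\ldots,z}$, which is given by the $z$-th order derivative of (1.1), where $\partial t_k$ denotes differentiation with respect to $t_{ak}$ if $\delta_k = 1$ and with respect to $t_{bk}$ if $\delta_k = 0$:

$$
\begin{aligned}
f\big((t_1, \delta_1), \ldots, (t_z, \delta_z)\big)
&= (-1)^z \left.\frac{\partial^z C_\theta^{(2z)}\big(S_a(t_{a1}), S_b(t_{b1}), \ldots, S_a(t_{az}), S_b(t_{bz})\big)}{\partial t_1 \cdots \partial t_z}\right|_{t_{ak} = t_{bk} = t_k,\ \forall k = 1, \ldots, z} \\
&= \left.\frac{\partial^z C_\theta^{(z)}\big(C_\theta^{(2)}(S_a(t_{a1}), S_b(t_{b1})), \ldots, C_\theta^{(2)}(S_a(t_{az}), S_b(t_{bz}))\big)}{\partial C_\theta^{(2)}(S_a(t_{a1}), S_b(t_{b1})) \cdots \partial C_\theta^{(2)}(S_a(t_{az}), S_b(t_{bz}))}\right|_{C_\theta^{(2)}(S_a(t_{ak}), S_b(t_{bk})) = S(t_k),\ \forall k = 1, \ldots, z} \\
&\quad \times \prod_{k \mid \delta_k = 1} Q_a'(t_k) \prod_{k \mid \delta_k = 0} Q_b'(t_k).
\end{aligned} \tag{1.8}
$$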

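Consider the case $z = z_a$, and evaluate $t_k$ at any $t^*$ for all $k$. Since $\theta$ and $S(t^*)$ are known, $Q_a'(t^*)$ is identified as

$$
Q_a'(t^*) = \left[ f\big((t^*, \delta_1), \ldots, (t^*, \delta_z)\big) \Bigg/ \left.\frac{\partial^z C_\theta^{(z)}\big(C_\theta^{(2)}(\cdot,\cdot), \ldots, C_\theta^{(2)}(\cdot,\cdot)\big)}{\partial C_\theta^{(2)}(\cdot,\cdot) \cdots \partial C_\theta^{(2)}(\cdot,\cdot)}\right|_{C_\theta^{(2)}(\cdot,\cdot) = S(t^*),\ \forall k} \right]^{1/z_a}.
$$

$Q_b'(t^*)$ is identified analogously from the case $z_a = 0$, with exponent $1/z$.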

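Finally, $S_r(t)$ in (1.1) is identified by the CGE (Rivest and Wells, 2001), which is given by

$$
S_r(t) = \varphi_\theta^{-1}\left(-\int_0^t \varphi_\theta'(S(u))\,Q_r'(u)\,du\right). \tag{1.9}
$$

This follows because, by (1.3), $\varphi_\theta(S(t)) = \varphi_\theta(S_a(t)) + \varphi_\theta(S_b(t))$, and differentiating $C_\theta^{(2)}$ in (1.5) gives $\varphi_\theta'(S(t))\,Q_a'(t) = -\frac{d}{dt}\varphi_\theta(S_a(t))$; integrating from 0 to $t$ with $\varphi_\theta(S_a(0)) = \varphi_\theta(1) = 0$ yields (1.9).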

This completes the proof. 
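The CGE step (1.9) can be checked numerically: if the identified $S$ and $Q_a'$ come from a known copula, plugging them into (1.9) should return the latent marginal. A minimal sketch, assuming a Clayton copula with $\theta = 2$ over exponential latent marginals (all values hypothetical) and a midpoint rule for the integral:

```python
import math

# Hypothetical DGP: Clayton copula over exponential latent marginals.
THETA, LAM_A, LAM_B = 2.0, 1.0, 0.5


def phi(u):
    return (u ** -THETA - 1.0) / THETA


def phi_prime(u):
    """Derivative of the Clayton generator."""
    return -(u ** (-THETA - 1.0))


def phi_inv(s):
    return (1.0 + THETA * s) ** (-1.0 / THETA)


def S_a_true(t):
    return math.exp(-LAM_A * t)


def S_b_true(t):
    return math.exp(-LAM_B * t)


def S(t):
    """Identified overall survival, (1.3)."""
    return phi_inv(phi(S_a_true(t)) + phi(S_b_true(t)))


def Q_a_prime(t):
    """Identified sub-density of risk a; closed form of (1.5) for this Clayton DGP:
    lam_a * S(t)^(1 + theta) * S_a(t)^(-theta)."""
    return LAM_A * S(t) ** (1.0 + THETA) * S_a_true(t) ** (-THETA)


def cge(t, n=4000):
    """Copula graphic estimator (1.9), uses only S, Q_a' and the generator."""
    h = t / n
    integral = h * sum(phi_prime(S((k + 0.5) * h)) * Q_a_prime((k + 0.5) * h) for k in range(n))
    return phi_inv(-integral)


for t in (0.5, 1.0, 2.0):
    print(t, cge(t), S_a_true(t))
```

For this Clayton case $-\varphi_\theta'(S(u))Q_a'(u)$ reduces to $\lambda_a S_a(u)^{-\theta}$, so the integral also has a closed form; the numerical route mirrors how (1.9) would be applied to estimated curves.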

The identification results for $\theta$ and $S(t)$ are similar to those for the univariate mixed proportional hazard model (MPHM) with multiple spells discussed by Honoré (1993). In fact, the MPHM is a special case of (1.6) in which the inverse of the copula generator equals the Laplace transform of a frailty distribution, $\varphi^{-1}(s) = \mathcal{L}(s)$, and the individual frailty $v$ enters the hazard multiplicatively, such that $S(t) = \int_0^\infty \exp(-\Lambda(t)v)\,dG(v) = \mathcal{L}(\Lambda(t))$, where $G$ denotes the frailty distribution. These extra and crucial assumptions allow for nonparametric identification of the frailty distribution. This is because the joint distribution evaluated at $t$ can be transformed as $S(t, \ldots, t) = \mathcal{L}\big(\sum_{i=1}^{z} \mathcal{L}^{-1}(S(t))\big) = \mathcal{L}(z\Lambda(t))$, and the relationship between $S(t, \ldots, t)$ and $\Lambda(t)$ can be used to trace out $\mathcal{L}$ nonparametrically, and thus the entire copula function. The Laplace transform effectively reduces the $z$-dimensional copula function to a univariate frailty distribution. This is not possible in (1.7), however, as the summation cannot be passed through $\varphi_\theta$ as it can in the MPHM. The relationship between $S(t, \ldots, t)$ and $S(t)$ in (1.7) can only trace out the copula along its 45° line through the origin (its diagonal), which identifies the entire copula only within a given family. As discussed in Honoré (1993), one limitation of the Laplace transform that might be relevant in applications is that the frailty in the MPHM is constrained to be identical for each spell. Elbers and Ridder (1982) discuss another version of the MPHM, in which different frailties for each duration are modelled by a multivariate frailty distribution. In this version of the MPHM, the multivariate frailty distribution can be identified nonparametrically by exploiting the variation of the observed covariates, provided that the movements of the covariates are sufficient to trace out the entire support of the underlying copula; stricter assumptions on the frailty distribution and the hazard function are also necessary. In contrast, the copula function in Proposition 1.1 is not limited to a Laplace transform, and the multivariate copula function can be identified without the assistance of covariates, but the copula needs to be specified parametrically.
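The special case described above can be verified numerically for a gamma frailty. Assuming, for illustration, a gamma frailty with mean 1 and variance $\theta$, its Laplace transform $\mathcal{L}(s) = (1+\theta s)^{-1/\theta}$ is the inverse Clayton generator, so $\varphi_\theta^{-1}(z\varphi_\theta(S(t)))$ from (1.7) coincides with $\mathcal{L}(z\Lambda(t))$. The Weibull-type $\Lambda$ and the parameter values are hypothetical.

```python
THETA = 0.8  # assumed frailty variance


def laplace(s, theta=THETA):
    """Laplace transform of a Gamma(shape=1/theta, scale=theta) frailty."""
    return (1.0 + theta * s) ** (-1.0 / theta)


def laplace_inv(u, theta=THETA):
    """Inverse Laplace transform, which is also the Clayton generator phi(u)."""
    return (u ** -theta - 1.0) / theta


def Lambda(t):
    """Hypothetical baseline integrated hazard."""
    return t ** 1.3


z = 3
rows = []
for t in (0.2, 1.0, 2.5):
    S_t = laplace(Lambda(t))                      # S(t) = L(Lambda(t))
    via_copula = laplace(z * laplace_inv(S_t))    # phi^-1(z * phi(S(t))), as in (1.7)
    via_frailty = laplace(z * Lambda(t))          # L(z * Lambda(t))
    rows.append((t, via_copula, via_frailty))
    print(t, via_copula, via_frailty)
```

With a generic Archimedean generator there is no frailty representation to fall back on, which is why the diagonal alone only identifies the copula within a given family.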

Note that the results of Honoré (1993) and Elbers and Ridder (1982) concern the identification of univariate duration models only, i.e. $S(t)$ in (1.6). Our results go beyond this and include the identification of the competing risks model in (1.1). Abbring and van den Berg (2003) study the identification of the competing risks model using the multivariate MPHM (MMPHM), which is comparable to our case once $S(t_i)$ is identified in the first step. They show that the availability of multiple spells facilitates identification of the dependence structure between the competing risks without requiring covariate variation, provided that there is sufficient variation in the latent marginal distributions across spells. For instance, in a two-risk model, the joint distribution equals $\mathcal{L}\big(\sum_k \Lambda_{a,k}(t_k), \sum_k \Lambda_{b,k}(t_k)\big)$. They require that the integrated hazard function for the $j$-th spell, $\Lambda_{a,j}(t_j)$, is not identical to the integrated hazard function for the $l$-th spell, $\Lambda_{a,l}(t_l)$, for any $j \ne l$. Otherwise, similar to the argument above, only the 45° line through the origin of the copula function can be traced out. This implies that the latent marginal distributions are spell-specific, which could be difficult to justify in some applications. If this assumption does not hold, identification requires covariate variation and assumptions on the frailty distribution similar to those discussed by Elbers and Ridder (1982). These are not required in Proposition 1.1. Moreover, the identification results of Abbring and van den Berg (2003) are based on the assumption that the latent marginal distributions take the MPHM form while making no assumption on the copula. Proposition 1.1, on the other hand, is based on the assumption that the copula function is known up to the dependency parameter while the latent marginal functions in (1.1) are completely unspecified. As a final remark, similar to the multiple-spells univariate MPHM (van den Berg, 2001), our model is over-identified, except in the special case in which all individuals have the same number of spells.

1.2.2 Estimation

In this section we propose a parametric maximum likelihood estimator for the model (??), (??) and (??). To increase flexibility, we model the cause-specific pseudo survival function in (1.4) by an odds-rate transformation model (Dabrowska and Doksum, 1988):

$$
W_r(t) = \big(1 + \alpha_r H_r(t)\big)^{-1/\alpha_r}. \tag{1.10}
$$

$H_r(t)$ can be interpreted as a flexible transformation of the duration variable, with $\alpha_r \in \mathbb{R}$. In general $\Lambda_r(t) = \alpha_r^{-1}\ln\big(1 + \alpha_r H_r(t)\big)$; for example, as $\alpha_r \to 0$, $\Lambda_r(t) \to H_r(t)$, and when $\alpha_r = 1$, $W_r(t) = (1 + H_r(t))^{-1}$ and $H_r(t)$ is the odds function. We model $H_r(t)$ with the Gompertz function

$$
H_r(t) = \nu_r\big(\exp(\rho_r t) - 1\big)/\rho_r, \tag{1.11}
$$

with $\nu_r, \rho_r \in \mathbb{R}_+$. The cause-specific cumulative incidence functions and the overall survival function can be derived from (1.10) and (1.11) using (??), (??) and (??). The resulting overall survival function is flexible enough to cover a variety of general shapes (see Appendix 3). The likelihood function for the joint distribution of the $n_i$ durations of each of the $m$ individuals is

$$
L(\theta, \alpha, \nu, \rho; X) = \prod_{i=1}^{m} \left[ \left.\frac{\partial^{n_i} C_\theta^{(n_i)}\big(C_\theta^{(2)}(S_a(t_{ai1}), S_b(t_{bi1})), \ldots, C_\theta^{(2)}(S_a(t_{ain_i}), S_b(t_{bin_i}))\big)}{\partial C_\theta^{(2)}(S_a(t_{ai1}), S_b(t_{bi1})) \cdots \partial C_\theta^{(2)}(S_a(t_{ain_i}), S_b(t_{bin_i}))}\right|_{C_\theta^{(2)}(S_a(t_{aik}), S_b(t_{bik})) = S(t_{ik})} \times \prod_{k=1}^{n_i} \big(\Lambda_a'(t_{ik})\big)^{\delta_{ik}} \big(\Lambda_b'(t_{ik})\big)^{1 - \delta_{ik}} S(t_{ik}) \right], \tag{1.12}
$$

with $\alpha = (\alpha_a, \alpha_b)'$, $\nu = (\nu_a, \nu_b)'$, $\rho = (\rho_a, \rho_b)'$, and $X = (X_1, \ldots, X_m)'$ with $X_i = \big((T_{i1}, \delta_{i1}), \ldots, (T_{in_i}, \delta_{in_i})\big)'$. After $\theta$, $\alpha$, $\nu$ and $\rho$ are estimated, the estimator for $S_r(t)$ in (1.1) is obtained by plugging the estimated values into the CGE (1.9). The estimator for $S_r(t)$ is consistent and asymptotically normal with a known asymptotic covariance matrix. Other parametric models for $S(t)$ and $Q_r(t)$ can be used in (1.12), and standard goodness-of-fit tests for non-nested models can be applied to compare different models.
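As a sketch of how the pieces of (1.10) to (1.12) fit together, the code below builds $\Lambda_r$, $\lambda_r$, $S$ and $Q_r$ for the odds-rate/Gompertz specification and evaluates one individual's log-likelihood factor in (1.12) for $n_i = 2$. The parameter values are those of the simulation design in Section 1.3 (risks indexed 1 and 2); the Clayton copula with $\theta = 2\tau/(1-\tau)$ (the Clayton value for Kendall's $\tau = 0.4$), the midpoint rule and all function names are assumptions for illustration.

```python
import math

# risk -> (nu, rho, alpha), values from the simulation design in Section 1.3
PARAMS = {1: (2.0, 1.0, 0.5), 2: (1.0, 2.0, 1.5)}


def H(t, nu, rho):
    """Gompertz transformation (1.11)."""
    return nu * math.expm1(rho * t) / rho


def Lambda_r(t, r):
    """Cause-specific integrated hazard implied by (1.10): ln(1 + alpha*H) / alpha."""
    nu, rho, alpha = PARAMS[r]
    return math.log1p(alpha * H(t, nu, rho)) / alpha


def lambda_r(t, r):
    """Cause-specific hazard, the t-derivative of Lambda_r: H'(t) / (1 + alpha*H(t))."""
    nu, rho, alpha = PARAMS[r]
    return nu * math.exp(rho * t) / (1.0 + alpha * H(t, nu, rho))


def S(t):
    """Overall survival S(t) = W_1(t) * W_2(t) = exp(-Lambda_1(t) - Lambda_2(t))."""
    return math.exp(-Lambda_r(t, 1) - Lambda_r(t, 2))


def Q(t, r, n=20000):
    """Cumulative incidence Q_r(t) = int_0^t lambda_r(u) S(u) du, midpoint rule."""
    h = t / n
    return h * sum(lambda_r((k + 0.5) * h, r) * S((k + 0.5) * h) for k in range(n))


def clayton_density(u1, u2, theta):
    """Mixed partial d^2 C / du1 du2 of the bivariate Clayton copula (assumed family)."""
    a = u1 ** -theta + u2 ** -theta - 1.0
    return (1.0 + theta) * (u1 * u2) ** (-theta - 1.0) * a ** (-2.0 - 1.0 / theta)


def loglik_individual(times, deltas, theta):
    """Log of one factor of (1.12) with n_i = 2; deltas[k] = 1 if spell k ended in risk 1."""
    ll = math.log(clayton_density(S(times[0]), S(times[1]), theta))
    for t, d in zip(times, deltas):
        ll += math.log(lambda_r(t, 1 if d == 1 else 2)) + math.log(S(t))
    return ll


q1, q2 = Q(10.0, 1), Q(10.0, 2)
theta_clayton = 2 * 0.4 / (1 - 0.4)  # Clayton parameter for Kendall's tau = 0.4
print(q1, q2, q1 + q2)
print(loglik_individual((0.3, 0.5), (1, 0), theta_clayton))
```

Since $S(10)$ is negligible under these parameters, $Q_1(10) + Q_2(10)$ should be close to 1; summing `loglik_individual` over individuals and maximizing over the parameters would give the estimator described above.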

1.3 Simulations
In this section we study the finite-sample properties of the estimator using simulations. In all simulations, we set $\theta$ such that the implied Kendall’s $\tau$ is 0.4. The parameters of the odds-rate transformation model in (1.10) and the Gompertz function in (1.11) are $(\nu_1, \rho_1, \alpha_1) = (2, 1, 0.5)$ and $(\nu_2, \rho_2, \alpha_2) = (1, 2, 1.5)$. The simulated $Q_1(\infty)$ and $Q_2(\infty)$ are .6 and .4 respectively. We simulate 400 samples, each of 400 pairs of competing latent marginal durations, with $m = 200$ and $n_i = 2$ for all $i$. We assess the performance of the estimators by computing the mean squared error (MSE) and the squared bias (SB) of the estimated Kendall’s $\tau$, with $\mathrm{MSE}(\tau) = E(\hat\tau - \tau)^2$ and $\mathrm{SB}(\tau) = \big(E(\hat\tau) - \tau\big)^2$. The expected values are approximated by averaging over the 400 simulated samples. We compute the average mean squared error (AMSE) and the average squared bias (ASB) of the estimated latent marginal survival distributions,

$$
\mathrm{AMSE}(S_r) = \frac{1}{L}\sum_{l=1}^{L} E\big(\hat S_r(t_l) - S_r(t_l)\big)^2, \qquad \mathrm{ASB}(S_r) = \frac{1}{L}\sum_{l=1}^{L}\big(E(\hat S_r(t_l)) - S_r(t_l)\big)^2,
$$

for risk $r = 1, 2$, where $t_1, \ldots, t_L$ are $L = 100$ equidistant grid points on the support of $T$. To adjust for the scale effect on the bias measures, we normalize the MSE/AMSE by computing the relative MSE/AMSE (RMSE/RAMSE), with $\mathrm{RMSE}(\tau) = \mathrm{MSE}(\tau)/\tau$ and $\mathrm{RAMSE}(S_r) = \mathrm{AMSE}(S_r)/\bar S_r$, where $\bar S_r$ is the average value of $S_r(t_l)$ over the 100 grid points. As the biases differ along the duration axis, we also report the maximum values of MSE and SB across the grid points, denoted by MMSE and MSB. Figures plotting the actual $S_r(t)$ and the estimated $\hat S_r(t)$ are also included.
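The grid-based measures defined above are plain Monte Carlo averages. A minimal sketch, assuming the replications are stored as lists of $\hat S_r(t_l)$ on a common grid (the function name and the toy numbers are hypothetical):

```python
def survival_metrics(estimates, truth):
    """AMSE, ASB, MMSE, MSB and RAMSE of an estimated survival curve on a grid.

    estimates: list of replications, each a list of S_hat(t_l) over the grid
    truth:     list of true S(t_l) over the same grid
    """
    R, L = len(estimates), len(truth)
    mse, sb = [], []
    for l in range(L):
        vals = [rep[l] for rep in estimates]
        mean = sum(vals) / R                                     # E(S_hat(t_l)) by Monte Carlo
        mse.append(sum((v - truth[l]) ** 2 for v in vals) / R)   # E(S_hat - S)^2
        sb.append((mean - truth[l]) ** 2)                        # (E S_hat - S)^2
    amse = sum(mse) / L
    s_bar = sum(truth) / L                                       # average of S_r over the grid
    return {"AMSE": amse, "ASB": sum(sb) / L, "MMSE": max(mse),
            "MSB": max(sb), "RAMSE": amse / s_bar}


# Toy check: two replications on a three-point grid (made-up numbers).
truth = [0.9, 0.5, 0.1]
reps = [[0.92, 0.52, 0.12], [0.90, 0.50, 0.14]]
m = survival_metrics(reps, truth)
print(m)
```

The same two averages applied to a scalar give MSE and SB for Kendall's $\tau$, and dividing the MSE by $\tau$ gives the RMSE.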

In the first part of the simulations, we study the bias of the CGE when the copula function is misspecified. We consider three copulas with different tail dependence properties, namely the Frank copula (no tail dependence), the Gumbel copula (upper tail dependence), and the Clayton copula (lower tail dependence). We try all 3×3 combinations of these copulas to simulate and estimate the data, while the cause-specific pseudo survival functions are simulated and estimated by the odds-rate transformation model with Gompertz function. Results are reported in Table 1.1. In the first column block the data are simulated with the Frank copula and estimated by the Frank, Gumbel and Clayton copulas in the first, second and third row blocks, respectively. In the second and third column blocks, the data are simulated with the Gumbel and the Clayton copula, respectively. The AMSE, ASB, MSB and MMSE are reported in units of 1e-2 and the RAMSE/RMSE in percent.

Table 1.1: Average mean squared error (AMSE), average squared bias (ASB), maximum squared bias (MSB) and maximum MSE (MMSE) reported in units of 1e-2, and relative AMSE (RAMSE) reported in %. Data simulated and estimated with different copulas.

Simulated by:     Frank                   Gumbel                  Clayton
                  S1     S2     τ         S1     S2     τ         S1     S2     τ
Estimated by Frank
  AMSE/MSE        0.01   0.03   0.07      0.03   0.07   0.11      0.09   0.41   0.10
  ASB/SB          0.00   0.00   0.00      0.01   0.04   0.01      0.07   0.37   0.01
  MMSE            0.04   0.05   0.07      0.13   0.17   0.11      0.25   0.83   0.10
  MSB             0.00   0.01   0.00      0.08   0.12   0.01      0.21   0.79   0.01
  RAMSE/RMSE      0.07%  0.14%  0.18%     0.17%  0.29%  0.28%     0.48%  1.87%  0.25%
Estimated by Gumbel
  AMSE/MSE        0.02   0.09   0.47      0.01   0.04   0.08      0.06   0.39   0.49
  ASB/SB          0.01   0.04   0.40      0.00   0.00   0.00      0.05   0.35   0.34
  MMSE            0.09   0.22   0.47      0.04   0.05   0.08      0.21   0.95   0.49
  MSB             0.06   0.18   0.40      0.00   0.01   0.00      0.18   0.91   0.34
  RAMSE/RMSE      0.13%  0.36%  1.18%     0.08%  0.16%  0.20%     0.35%  1.81%  1.21%
Estimated by Clayton
  AMSE/MSE        0.02   0.10   0.56      0.02   0.10   0.71      0.01   0.02   0.08
  ASB/SB          0.00   0.07   0.48      0.01   0.06   0.60      0.00   0.00   0.00
  MMSE            0.06   0.16   0.56      0.11   0.22   0.71      0.04   0.05   0.08
  MSB             0.02   0.14   0.48      0.10   0.19   0.60      0.00   0.00   0.00
  RAMSE/RMSE      0.09%  0.40%  1.40%     0.13%  0.39%  1.78%     0.07%  0.08%  0.20%
The results simulated and estimated with the same copula can be viewed as benchmarks for comparison: their ASB/SB are all zero and their AMSE/MSE are mainly driven by sampling variance. Their maximum SB are also very close to zero, and their RAMSE/RMSE are all no greater than 0.2%. For those estimated with a wrong copula, the AMSE/MSE and ASB/SB are all of order less than $10^{-2}$ and the RAMSE/RMSE are all less than 2%. For the estimates of $S_1$ and $S_2$, the maximum values (MMSE and MSB) are noticeably higher. This is most obvious for the data simulated by the Clayton copula, which has lower tail dependence and thus stronger dependence between longer durations. The MSB for $S_2$ is as high as .0091, which corresponds to a bias of about 9.5 percentage points. Panels (e) and (f) of Figure 1.1 indicate that the maximum bias appears in the region of longer durations, specifically where the survival function is about 0.1, i.e. only one-tenth of the sample remains at risk. These observations suggest that estimating with a misspecified copula that has different tail dependence properties causes neither substantial bias in the estimated global dependence (represented by Kendall’s $\tau$) nor substantial bias in the overall performance of the estimated latent survival functions (represented by AMSE/ASB and RAMSE/RMSE), but the estimator is relatively weaker at capturing local variation along the duration axis, especially when few observations remain in the risk set. For the estimates of Kendall’s $\tau$, no obvious patterns are observed. The largest SB is found in the model simulated by Gumbel and estimated by Clayton, with a value of .006, which corresponds to a bias of 0.077, i.e. the estimated Kendall’s $\tau$ is around .32.
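Because Table 1.1 reports squared quantities in units of 1e-2, the bias figures quoted in the paragraph above follow from a square root. A small arithmetic check (a sketch; the helper name is hypothetical). A squared bias carries no sign, so an SB of .006 is equally consistent with an average $\hat\tau$ of about .32 (downward bias) or about .48 (upward bias).

```python
import math

UNIT = 1e-2  # Table 1.1 reports squared errors in units of 1e-2


def bias_from_squared(value_in_units):
    """Absolute bias implied by a squared bias reported in units of 1e-2."""
    return math.sqrt(value_in_units * UNIT)


msb_s2 = bias_from_squared(0.91)  # MSB of S2: simulated by Clayton, estimated by Gumbel
sb_tau = bias_from_squared(0.60)  # SB of tau: simulated by Gumbel, estimated by Clayton
tau = 0.4
print(round(msb_s2, 4), round(sb_tau, 4), round(tau - sb_tau, 3), round(tau + sb_tau, 3))
```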

Figure 1.1: Comparison of actual (solid line) and estimated (dashed line) latent marginal survival functions from Table 1.1. Panels: (a) simulated by Frank, estimated by Gumbel; (b) simulated by Frank, estimated by Clayton; (c) simulated by Gumbel, estimated by Frank; (d) simulated by Gumbel, estimated by Clayton; (e) simulated by Clayton, estimated by Frank; (f) simulated by Clayton, estimated by Gumbel. Each panel plots S1 and S2 (0 to 1) against the grid points (0 to 100).

Next, we study the bias of the CGE when the cause-specific pseudo survival functions in (1.10) and (1.11) are misspecified. The data are simulated by the Frank copula and the odds-rate transformation model (ORT) with Gompertz function (GOM), and are estimated by the Frank copula but with other models of the cause-specific pseudo survival function. These include the log-normal accelerated failure time model (LNAFT) with

$$
h_r(t) = \frac{\Phi'\big(\frac{\ln t - \nu_r}{\rho_r}\big)}{\rho_r t\,\big[1 - \Phi\big(\frac{\ln t - \nu_r}{\rho_r}\big)\big]}, \qquad W_r(t) = 1 - \Phi\Big(\frac{\ln t - \nu_r}{\rho_r}\Big),
$$

where $\Phi$ is the standard normal distribution function, and the log-logistic proportional odds model (LLPOM) with

$$
h_r(t) = \frac{\nu_r \rho_r (\nu_r t)^{\rho_r - 1}}{1 + (\nu_r t)^{\rho_r}}, \qquad W_r(t) = \frac{1}{1 + (\nu_r t)^{\rho_r}}.
$$

We also estimate the model with the Frank copula and the MMPHM using a Weibull baseline hazard. The MMPHM makes assumptions not on the cause-specific pseudo survival functions but on the latent marginal survival functions, i.e. $S_r(t) = \exp(-\nu_r t^{\rho_r})$ in (1.1).
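Each hazard and pseudo survival pair above must satisfy $h_r(t) = -\frac{d}{dt}\ln W_r(t)$. A minimal check, taking $\Phi'$ as the standard normal density and using hypothetical parameter values, compares each closed-form hazard with a numerical derivative of $-\ln W_r$:

```python
import math


def norm_pdf(x):
    """Standard normal density, Phi'(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_sf(x):
    """Standard normal survival function 1 - Phi(x), via erfc for tail accuracy."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def W_lnaft(t, nu, rho):
    return norm_sf((math.log(t) - nu) / rho)


def h_lnaft(t, nu, rho):
    z = (math.log(t) - nu) / rho
    return norm_pdf(z) / (rho * t * norm_sf(z))


def W_llpom(t, nu, rho):
    return 1.0 / (1.0 + (nu * t) ** rho)


def h_llpom(t, nu, rho):
    return nu * rho * (nu * t) ** (rho - 1.0) / (1.0 + (nu * t) ** rho)


def numeric_hazard(W, t, nu, rho, eps=1e-6):
    """-d/dt ln W(t) by central differences."""
    return -(math.log(W(t + eps, nu, rho)) - math.log(W(t - eps, nu, rho))) / (2 * eps)


checks = []
for t in (0.3, 1.0, 2.0):
    checks.append((h_lnaft(t, 0.2, 0.8), numeric_hazard(W_lnaft, t, 0.2, 0.8)))
    checks.append((h_llpom(t, 1.5, 1.7), numeric_hazard(W_llpom, t, 1.5, 1.7)))
for closed_form, numeric in checks:
    print(closed_form, numeric)
```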

Table 1.2: Average mean squared error (AMSE), average squared bias (ASB), maximum squared bias (MSB), maximum MSE (MMSE), and relative AMSE (RAMSE), estimated by misspecified cause-specific pseudo survival functions (CSPSF) and the MMPHM.

                  Estimated by LNAFT        Estimated by LLPOM        Estimated by MMPHM
                  S1     S2     τ           S1     S2     τ           S1     S2     τ
  AMSE/MSE        0.30   0.34   0.09        0.17   0.15   0.08        4.10   6.60   21.5
  ASB/SB          0.29   0.31   0.01        0.17   0.13   0.01        4.04   6.49   21.3
  MMSE            0.58   0.53   0.09        0.29   0.27   0.08        12.46  14.51  21.5
  MSB             0.55   0.51   0.01        0.29   0.25   0.01        12.38  14.43  21.3
  RAMSE/RMSE      1.63%  1.36%  0.22%       0.94%  0.59%  0.20%       21.98% 26.46% 53.80%

Comparing Table 1.2 with Table 1.1, we find that the estimator for the latent marginal survivals is generally more sensitive to misspecified cause-specific pseudo survival functions than to a misspecified copula. However, while the AMSE, ASB, MSB, and RAMSE for the models estimated by LNAFT and LLPOM are still comparable to those in Table 1.1, the results estimated by the MMPHM are considerably poorer than those of the other models. The ASB for $S_2$ is 6.49, which means a bias of about 25 percentage points on average, and the MSB is 14.43, which means a maximum bias of 38 percentage points. Most interestingly, the model estimated by the MMPHM is the only model in Tables 1.1 and 1.2 that produces considerable bias in the estimated Kendall’s $\tau$. In fact, the estimated Kendall’s $\tau$ for the MMPHM is .86, which is much larger than .4 and results in an SB of 21.3. Figure 1.2 shows that the biases of the estimated latent marginal survival functions appear along the entire duration axis, which indicates that the estimator performs poorly not only locally but also globally. This is reflected in the high values of RAMSE and RMSE for the MMPHM. Note that the difference between the MMPHM and the other models is that the MMPHM is built on the latent marginal survival functions, while the ORT, LNAFT, and LLPOM are built on the cause-specific pseudo survival functions, and thus on the cause-specific cumulative incidence functions. As is suggested in (??), the relationship between the latent marginal survivals and the cause-specific cumulative incidence functions is complicated, which makes the data generated by these two mechanisms fundamentally different; this is most likely the major cause of the biased estimates reported in Table 1.2. These results confirm that when the underlying data generating process is unknown, the CGE serves as an important rival to the MMPHM, so that practitioners can fit and compare both models in order to find the better fit.
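The MMPHM figures in the paragraph above follow from Table 1.2 by the same square-root conversion, assuming the units of 1e-2 used in Table 1.1. A short arithmetic check; the small gap between the recomputed SB of $\tau$ and the tabulated 21.3 is consistent with the mean estimate .86 being rounded.

```python
import math

UNIT = 1e-2  # assumed reporting unit, as in Table 1.1

asb_s2, msb_s2 = 6.49 * UNIT, 14.43 * UNIT
avg_bias_s2 = math.sqrt(asb_s2)   # root of the average squared bias of S2
max_bias_s2 = math.sqrt(msb_s2)   # root of the maximum squared bias of S2

tau_true, tau_hat_mean = 0.4, 0.86
sb_tau_units = (tau_hat_mean - tau_true) ** 2 / UNIT   # squared bias of tau, units of 1e-2
rmse_tau = 21.5 * UNIT / tau_true                       # RMSE(tau) = MSE(tau) / tau
print(round(avg_bias_s2, 3), round(max_bias_s2, 3), round(sb_tau_units, 2), round(rmse_tau, 4))
```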

Finally, we compare sensitivity analyses using multiple spells with those using single spells. Both assume that the correct cause-specific pseudo survival functions are known. For the case of multiple spells, we assume that the copula’s family and its dependency parameter are unknown. The sensitivity analyses are done by fitting the data with different copulas. We plot the $\hat S_r(t)$ estimated by different copulas together in Panel (a) of Figure 1.3. For the single-spell case, we assume that the Frank copula is known but its dependency parameter is unknown. We estimate the latent marginals with the standard single-spell CGE using different assumed values of Kendall’s $\tau$: -.9, -.7, -.5, -.3, -.1, .1, .3, .5, .7, and .9. The different estimates are combined in Panel (b) of Figure 1.3. The results show that the width of the bounds of the estimated $\hat S_r(t)$ is much more sensitive to a misspecified Kendall’s $\tau$ than to a misspecified copula. This suggests that even when the correct copula is unknown in practice, the availability of multiple spells, which makes identification of the dependency parameter possible, can considerably reduce the uncertainty of the CGE estimates.
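For the single-spell sensitivity analysis, each assumed Kendall's $\tau$ must first be mapped to a Frank parameter before (1.9) is applied. A minimal sketch, assuming the standard Frank relation $\tau = 1 - \frac{4}{\theta}\big(1 - D_1(\theta)\big)$ with the Debye function $D_1$, the Frank generator $\varphi_\theta(u) = -\ln\frac{e^{-\theta u}-1}{e^{-\theta}-1}$, and hypothetical "identified" curves from two independent exponential risks (rates 1 and 0.5); the spread of the estimates at $t = 1$ illustrates the width of the single-spell bounds.

```python
import math


def debye1(theta, n=1000):
    """D_1(theta) = (1/theta) * int_0^theta t / (e^t - 1) dt, midpoint rule."""
    h = theta / n
    total = sum(((k + 0.5) * h) / math.expm1((k + 0.5) * h) for k in range(n))
    return total * h / theta


def frank_tau(theta):
    """Kendall's tau of the Frank copula, theta != 0."""
    return 1.0 - 4.0 / theta * (1.0 - debye1(theta))


def frank_theta(tau, hi=100.0, tol=1e-10):
    """Invert frank_tau by bisection, using the symmetry tau(-theta) = -tau(theta)."""
    target, lo, up = abs(tau), 1e-6, hi
    while up - lo > tol:
        mid = 0.5 * (lo + up)
        if frank_tau(mid) < target:
            lo = mid
        else:
            up = mid
    return math.copysign(0.5 * (lo + up), tau)


def phi_prime(u, th):
    """Derivative of the Frank generator."""
    return th * math.exp(-th * u) / math.expm1(-th * u)


def phi_inv(s, th):
    """Inverse Frank generator."""
    return -math.log1p(math.exp(-s) * math.expm1(-th)) / th


# Hypothetical identified quantities: independent exponential risks.
def S(u):
    return math.exp(-1.5 * u)


def Q_a_prime(u):
    return math.exp(-1.5 * u)


def cge_frank(t, th, n=2000):
    """Single-spell CGE (1.9) under an assumed Frank parameter th."""
    h = t / n
    integral = h * sum(phi_prime(S((k + 0.5) * h), th) * Q_a_prime((k + 0.5) * h) for k in range(n))
    return phi_inv(-integral, th)


taus = [-.9, -.7, -.5, -.3, -.1, .1, .3, .5, .7, .9]
estimates = [cge_frank(1.0, frank_theta(tau)) for tau in taus]
print(min(estimates), max(estimates), math.exp(-1.0))
```

With multiple spells, $\theta$ is estimated rather than swept over this grid, which is the source of the narrower bounds reported above.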
Figure 1.2: Comparison of actual (solid line) and estimated (dashed line) latent marginal survival functions from Table 1.2. Panels: (a) estimated by LNAFT; (b) estimated by LLPOM; (c) estimated by MMPHM. Each panel plots S1 and S2 (0 to 1) against the grid points (0 to 100).

Figure 1.3: Comparison of sensitivity analysis using multiple spells and using single spells. Simulated by the Frank copula and the odds-rate transformation model with Gompertz function (black line). Panels: (a) different estimates by different copulas with unknown Kendall’s τ (grey lines); (b) different estimates by the Frank copula with different Kendall’s τ (grey lines). Each panel plots the latent marginal survival functions (0 to 1) against the grid points (0 to 100).

1.4 Conclusions

Zheng and Klein (1995) show that the competing risks model can be identified nonparametrically if the copula is completely known. While this approach avoids making assumptions on the latent marginals, it requires a rather strong assumption on the copula, particularly on its dependency parameter. Our simulation studies confirm previous findings (Lo and Wilke, 2014) that the identified latent marginals are more sensitive to a misspecified dependency parameter of the copula than to a misspecified parametric form. We show in this paper that the availability of multiple spells improves the identification results by relaxing the identifying assumptions. By assuming that the multiple spells of the competing latent durations are jointly distributed according to a parametric copula, the dependency parameter of the copula can be identified from the joint distribution of the multiple spells of observable durations. Given the identified dependency parameter, the identification results of Zheng and Klein (1995) can be applied to identify the competing latent marginals nonparametrically. We show that even when the correct parametric form of the copula is unknown in an application and sensitivity analyses are conducted, the bounds of the CGE using multiple spells are considerably narrower than when only single spells are available. We propose a parametric maximum likelihood estimator in which different forms of cause-specific pseudo survival functions can be fitted and compared, including the flexible odds-rate transformation model. Models built on the cause-specific pseudo survival functions impose very different restrictions on the underlying DGP from models built on the competing latent marginal survival functions, so the bias of the estimates can be substantial when the underlying DGP is misspecified, even if the parametric copula is correct. We therefore claim that the CGE using multiple spells serves as a pragmatic alternative to the popular MMPHM when different rival models are to be fitted and compared.
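For readers who want to experiment with the single-spell sensitivity analysis discussed above, the following is a minimal Python sketch, not the estimator used in this chapter. It implements the Archimedean closed form of the copula-graphic estimator given by Rivest and Wells (2001), phi(S1(t)) = sum over cause-1 event times T_i <= t of [phi(pi(T_i)) - phi(pi(T_i-))], where phi is the copula generator and pi is the empirical overall survival function, using the Frank copula, and then takes the pointwise envelope of the estimates over a grid of assumed Kendall's τ values. It assumes two risks, no independent censoring, no tied durations and moderate dependence (|τ| <= 0.8); all function names are hypothetical, and the sketch does not implement the multiple-spells identification of the dependency parameter.

```python
import math
import random


def frank_phi(s, theta):
    """Frank generator phi(s) = -log((e^(-theta*s) - 1) / (e^(-theta) - 1)).

    theta == 0 is treated as the independence copula, phi(s) = -log(s).
    """
    if s <= 0.0:
        return math.inf
    if theta == 0.0:
        return -math.log(s)
    return -math.log(math.expm1(-theta * s) / math.expm1(-theta))


def frank_phi_inv(x, theta):
    """Inverse generator: maps a phi-scale value back to a survival probability."""
    if math.isinf(x):
        return 0.0
    if x <= 0.0:
        return 1.0
    if theta == 0.0:
        return math.exp(-x)
    return -math.log1p(math.exp(-x) * math.expm1(-theta)) / theta


def frank_tau(theta, n=400):
    """Kendall's tau of the Frank copula: tau = 1 - (4 / theta) * (1 - D1(theta))."""
    if abs(theta) < 1e-8:
        return 0.0

    def integrand(t):
        return 1.0 if t == 0.0 else t / math.expm1(t)

    # Composite Simpson's rule for the Debye function D1(theta) = (1/theta) * int_0^theta t/(e^t - 1) dt
    h = theta / n
    total = integrand(0.0) + integrand(theta)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * integrand(k * h)
    debye1 = total * h / 3.0 / theta
    return 1.0 - 4.0 / theta * (1.0 - debye1)


def frank_theta_from_tau(tau, lo=-100.0, hi=100.0, iters=100):
    """Invert frank_tau by bisection (tau is increasing in theta)."""
    if abs(tau) > 0.8:
        raise ValueError("this sketch assumes |tau| <= 0.8 for numerical stability")
    if tau == 0.0:
        return 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if frank_tau(mid) < tau:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def copula_graphic(times, causes, theta, cause=1):
    """Archimedean copula-graphic estimate of the latent marginal survival of `cause`.

    Assumes two competing risks, no independent censoring and no tied times.
    Returns [(t, S_hat(t)), ...] at the event times of `cause`.
    """
    n = len(times)
    phi_sum = 0.0
    estimate = []
    for i, (t, c) in enumerate(sorted(zip(times, causes))):
        pi_before = (n - i) / n      # empirical overall survival just before t
        pi_after = (n - i - 1) / n   # empirical overall survival at t
        if c == cause:
            phi_sum += frank_phi(pi_after, theta) - frank_phi(pi_before, theta)
            estimate.append((t, frank_phi_inv(phi_sum, theta)))
    return estimate


def sensitivity_bounds(times, causes, taus, cause=1):
    """Pointwise lower/upper envelope of the Frank-copula CGE over a grid of Kendall's tau."""
    curves = [copula_graphic(times, causes, frank_theta_from_tau(tau), cause) for tau in taus]
    grid = [t for t, _ in curves[0]]
    columns = list(zip(*[[s for _, s in curve] for curve in curves]))
    return grid, [min(col) for col in columns], [max(col) for col in columns]


if __name__ == "__main__":
    # Hypothetical single-spell data: independent exponential latent durations.
    rng = random.Random(0)
    latent = [(rng.expovariate(1.0), rng.expovariate(0.5)) for _ in range(200)]
    times = [min(a, b) for a, b in latent]
    causes = [1 if a < b else 2 for a, b in latent]
    grid, lower, upper = sensitivity_bounds(times, causes, [-0.4, 0.0, 0.4, 0.8])
    width = max(u - l for l, u in zip(lower, upper))
    print(f"{len(grid)} cause-1 event times, widest pointwise bound: {width:.3f}")
```

With theta = 0 (independence) the sketch reduces to the Kaplan-Meier estimator of the cause-1 latent survival that treats cause 2 as censoring. Because the envelope is a pointwise minimum and maximum, widening the τ grid can only widen the bounds; this is the single-spell uncertainty that identifying the dependency parameter from multiple spells is meant to remove.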

References

Abbring, J. and van den Berg, G. (2003) The identifiability of the mixed proportional hazards competing risks model. Journal of the Royal Statistical Society, Series B, 65, 701-710.
Cox, D.R. (1962) Renewal Theory. London: Methuen.
Crowder, M. (2012) Multivariate Survival Analysis and Competing Risks. Boca Raton, USA: CRC.
Dabrowska, D. and Doksum, K. (1988) Estimation and testing in a two-sample generalized odds-rate model. Journal of the American Statistical Association, 83, 744-749.
Elbers, C. and Ridder, G. (1982) True and spurious duration dependence: The identification of the proportional hazard model. Review of Economic Studies, 49, 403-409.
Fermanian, J. (2003) Nonparametric estimation of competing risks model with covariates. Journal of Multivariate Analysis, 85, 156-191.
Fitzenberger, B. and Wilke, R. (2010) New insights into unemployment duration and post unemployment earnings in Germany. Oxford Bulletin of Economics and Statistics, 72, 796-826.
Heckman, J. and Honoré, B. (1989) The identifiability of the competing risks model. Biometrika, 76, 325-330.
Honoré, B. (1993) Identification results for duration models with multiple spells. Review of Economic Studies, 60, 241-246.
Jeong, J. and Fine, J. (2006) Direct parametric inference for the cumulative incidence function. Journal of the Royal Statistical Society, Series C, 55, 187-200.
Joe, H. (1997) Multivariate Models and Dependence Concepts. London: Chapman and Hall.
Kalbfleisch, J. and Prentice, R. (2002) The Statistical Analysis of Failure Time Data. Wiley.
Lee, S. and Lewbel, A. (2013) Nonparametric identification of accelerated failure time competing risks models. Econometric Theory, 29, 905-919.
Lo, S. and Wilke, R. (2014) A regression model for the Copula-Graphic Estimator. Journal of Econometric Methods, 3, 20-41.
Nelsen, R. (2006) An Introduction to Copulas, 2nd Edition. New York: Springer.
Rivest, L. and Wells, M. (2001) A martingale approach to the Copula-Graphic Estimator for the survival function under dependent censoring. Journal of Multivariate Analysis, 79, 138-155.
Tsiatis, A. (1975) A nonidentifiability aspect of the problem of competing risks. Proceedings of the National Academy of Sciences, 72, 20-22.
Van den Berg, G. (2001) Duration models: Specification, identification and multiple durations. In Handbook of Econometrics, Vol. 5. North-Holland.
Zheng, M. and Klein, J. (1995) Estimates of marginal survival for dependent competing risks based on assumed copula. Biometrika, 82, 127-138.

Appendix 3

Figure 1.4: Examples of the odds-rate transformation model using the Gompertz function.
(a) ν = .01, ρ = 3, α = 2.5, 1.5, .5, -.5, -1.5 (b) ν = .5, ρ = 0.5, α = 2.5, 1.5, .5, -.5, -1.5
[Each panel plots curves between 0 and 1 against t (0 to 4).]