To cite this article: A. A. Sunethra & M. R. Sooriyarachchi (2020): A novel method for joint
modeling of survival data and count data for both simple randomized and cluster randomized data,
Communications in Statistics - Theory and Methods, DOI: 10.1080/03610926.2020.1713366
1. Introduction
1.1. Response variables found in medical data
In medical data, the health status of a patient can be measured in several ways, such as
the time to survival from the disease, the stage or severity of the disease and the
count of disease recurrences. For example, recurrences of tumors among cancer patients,
recurrences of seizures among epileptic patients and recurrences of myocardial
infarctions among cardiovascular patients are all disease recurrences that mark disease
progression. In diseases with such recurrent episodes, it is of paramount importance
that the treatment/medication given to patients has an impact on both the number of
disease recurrences and the time to survival from the disease, and hence risk factors
can be identified with respect to both of these variables. Such an analysis would reveal
whether the treatment has reduced the number of disease recurrences and increased the
survival time. As for suitable statistical models for these response variables, a
Poisson/Negative Binomial regression model, which falls into the category of generalized
linear models, would be suitable for the number of disease recurrences, while survival
regression models, a specialized form of regression used especially on medical data,
would be suitable for the survival time.
Often these responses are correlated with each other, such that patients with a higher
number of recurrences would have a shorter survival time or vice versa. When two or
more response variables are found to be correlated with each other, fitting a bivariate
model is preferable to fitting separate univariate models to each response variable,
because a bivariate/joint model will capture the dependency between the responses.
Alternative approaches found in the literature for this kind of data only analyze the
survival variable as the response, with other correlated variables such as the stage of
the disease and the number of disease recurrences taken as predictor variables of the
survival model (Verity, Hosking, and Easter 2008; Kwong and Hutton 2003). This approach
is criticized by the authors since it treats variables of disease progression, such as
the number of disease recurrences and the stage of the disease, as constant/fixed for a
given patient by using them as explanatory/predictor variables in a statistical model.
However, the literature suggests it is better to specify these variables as response
variables, mainly because such variables have a random component within a given patient
(Cowling 2003; Cowling, Hutton, and Shaw 2006). This background established the need
for bivariate statistical models for the case of two correlated responses found in
medical data. A joint model (bivariate model) models the two responses simultaneously.
diseases whereas the stage of the disease/type of the disease can be a fixed component
for certain diseases.
The dearth of studies which have considered bivariate modeling of survival and count
responses further strengthens the choice of the response variables in this study.
However, it is noteworthy that the methodology behind the development of the joint
model/bivariate model for the responses of this study can be applied to other choices of
responses as well.
1.3. Objectives
The prime objective of this study is to propose an efficient methodology for analyzing cor-
related survival and count responses for both simple randomized and cluster randomized
data designs. To meet the intended objective, four variants of joint models were considered
which consisted of two types of joint models and two types of data designs.
The literature on joint models for survival data is dominated by joint modeling of survival
and longitudinal data. In contrast, this study considers the joint modeling of a survival
response and a count response variable which is observed only once, i.e., a
non-longitudinal response variable. Therefore, this study addresses a rather neglected area
in joint modeling of survival data (Wang 2013; Lai, Lavori, and Shih 2012). However, the
methods used in the development of joint models for survival and longitudinal data were
studied to gain insight into the theory of joint model development. Therefore, Section 2
covers the literature on joint models for survival and longitudinal data, followed by the
literature on joint models for survival and non-longitudinal data.
2. Literature review
2.1. Joint modeling of survival and longitudinal data
In the recent past, a large literature has developed on joint modeling of survival and
longitudinal data (Wu et al. 2012). Such joint models have been developed by specifying
two sub-models, one for the survival response and one for the longitudinal responses,
with a two-stage modeling approach (Rizopoulos 2012). Even though this approach has been
shown to be suitable for joint modeling of survival and longitudinal biomarker data, it
is not suitable for the joint modeling scenario considered in this study, because the
count response variable considered here is not a longitudinally measured biomarker.
As for the methodologies considered in these joint models, survival data have usually
been modeled using semi-parametric models popularly known as Cox models, while
longitudinal responses are assumed to follow the Normal distribution, which is suitable
only for continuous biomarkers. For example, the joint model developed by Rizopoulos
(2012), which considers the CD4 cell count as the longitudinal response, uses the square
root of the CD4 cell counts and assumes a Normal distribution. In contrast, this research
extends the normal response to a count response from the Exponential family and
extends the semi-parametric Cox model for the survival response to parametric
survival models, and falls into the category of joint modeling of survival data and
non-longitudinal data.
4 A. A. SUNETHRA AND M. R. SOORIYARACHCHI
for each patient. Therefore, it can be suspected that the use of patient specific random
effects to join the two responses is more sensible than cluster specific random
effects. The joint model proposed by them postulates an association at an
upper level of the data hierarchy. In contrast, the joint model developed in this study
uses patient level random effects for joining the responses and hence is applicable to
non-clustered data as well, whereas for the case of non-independent data, the model will be
developed with two levels of random effects, separately for joining the two responses of
the same patient and for joining the patients in the same cluster.
In summary, the joint model development in this study addresses a gap in the literature with
respect to three aspects: the types of response variables considered, the structure of data
considered and the methodology/theory used for model development. The methodology used in
this research for joint model development falls into the class of random effects models,
which were originally developed for modeling correlated data (Verbeke 2011). The two
responses considered in this study are of different types, which is the main reason for
choosing this approach for joint model development in this study.
The novelty of this study is the use of patient level random effects for developing
joint models, particularly with survival data. The use of patient level (observation
level) random effects results in a complex integration of the joint model likelihood,
because the vector of random effects defined at observation level is larger than the
vector of random effects defined for groups/clusters of patients.
Little attention has been paid to joint models of survival data with another single
response variable (Wang 2013). However, at the early stages of a clinical trial, only a
single observation of the longitudinal response might be available, and it might be
worthwhile to use this single response variable to evaluate the efficacy of the treatments
on the survival of the patients, which will in turn provide insight into modifications to
the treatment plans at an early stage of the clinical trial (Lai, Lavori, and Shih 2012).
3. Theory
The study initially considered the development of the joint model for simple random-
ized data and subsequently, the extension of the joint model for cluster randomized
data was considered. Hence, the theoretical development of the joint model for non-
clustered, simple randomized data is explained initially in Section 3.1.
where $\theta = (\beta, \gamma, \sigma_v^2)$ denotes the parameter vector of the joint model.
The integral in Equation (1) cannot be calculated analytically for several generalized
linear mixed models. Even where analytical expressions exist, they tend to be cumbersome
(Molenberghs et al. 2010; Molenberghs, Verbeke, and Demetrio 2007). Therefore, numerical
approximations were considered for maximizing the joint model likelihood (Pinheiro and
Bates 1995), where numerical approximations of the integrals are used for calculating the
log-likelihood and the score vector. Among the numerical approximation methods, the
adaptive Gaussian quadrature method was used for approximating the integrals when
estimating the joint models in this study.
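The idea of integrating a random effect out of the likelihood numerically can be sketched as follows. This is a minimal illustration only: it uses a fixed (non-adaptive) 5-point Gauss-Hermite rule rather than the adaptive rule used in the study, and it assumes a shared random effect $v$ entering both linear predictors with unit scaling. The names `eta_count` and `eta_surv` are hypothetical stand-ins for the fixed-effect parts of the count and survival models.

```python
import math

# Closed-form 5-point Gauss-Hermite rule for integrals of exp(-x^2) * g(x).
SQRT_PI = math.sqrt(math.pi)
R10 = math.sqrt(10.0)
X_IN = math.sqrt((5.0 - R10) / 2.0)
X_OUT = math.sqrt((5.0 + R10) / 2.0)
W_IN = SQRT_PI * (7.0 + 2.0 * R10) / 60.0
W_OUT = SQRT_PI * (7.0 - 2.0 * R10) / 60.0
NODES = [-X_OUT, -X_IN, 0.0, X_IN, X_OUT]
WEIGHTS = [W_OUT, W_IN, 8.0 * SQRT_PI / 15.0, W_IN, W_OUT]


def normal_expectation(g, sd):
    """Approximate E[g(v)] for v ~ N(0, sd^2) via the substitution v = sqrt(2)*sd*x."""
    total = sum(w * g(math.sqrt(2.0) * sd * x) for x, w in zip(NODES, WEIGHTS))
    return total / SQRT_PI


def conditional_likelihood(t, delta, y, eta_count, eta_surv, sigma, v):
    """f(t|v)^delta * S(t|v)^(1-delta) * f(y|v) for one patient (assumed model)."""
    log_lam = eta_count + v                       # Poisson log-mean
    poisson = math.exp(-math.exp(log_lam) + y * log_lam - math.lgamma(y + 1))
    z = (math.log(t) - (eta_surv + v)) / sigma    # standardized log-time
    if delta == 1:   # event observed: lognormal density
        surv_part = math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma * t)
    else:            # censored: lognormal survival function
        surv_part = 1.0 - 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return surv_part * poisson


def marginal_likelihood(t, delta, y, eta_count, eta_surv, sigma, sigma_v):
    """Integrate the shared random effect out of the conditional likelihood."""
    return normal_expectation(
        lambda v: conditional_likelihood(t, delta, y, eta_count, eta_surv, sigma, v),
        sigma_v,
    )
```

An adaptive rule would additionally recenter and rescale the nodes around the mode of each patient's integrand, which typically needs fewer nodes for the same accuracy.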
3.1.2.1. Derivations over the model. Once a joint model is fitted, it is of interest to
derive marginal means, marginal variances of the two responses, covariance and correl-
ation between the two responses over the fitted model. The correlation that is resultant
from this model between the two responses takes the form of:
Therefore, the correlation between the two responses is dictated by the variance of the
random effects ($\sigma_v^2$), the variance parameters of the Poisson model
($\beta_0, \beta_1$) and of the lognormal model ($\sigma^2$). As per correlation formula
(2), this joint model can accommodate only positive correlations between the responses,
since (2) yields only positive values for any combination of the parameters involved. As
expected, as $\sigma_v^2 \to 0$, $\mathrm{Corr}(T_i, Y_i) \to 0$. Further, it is
important to note that as $\sigma_v^2 \to \infty$,
$\mathrm{Corr}(T_i, Y_i) \to \frac{1}{[\exp(\sigma^2)]^{1/2}}$; i.e., when the variance
of the random effects is increased, the maximum (limiting) correlation that the joint
model approaches is $\exp(\sigma^2)^{-1/2}$.
Therefore, the joint model developed with shared random effects cannot model high
correlations between the survival and count responses if the variance of the survival
responses is high. These derivations clearly show that shared random effects models
are subject to strong assumptions on the plausible association/correlation between
the two responses. Similarly, Teixeira-Pinto and Normand (2009) highlighted these
restrictions with respect to joint modeling of continuous and binary responses, and
Verbeke et al. (2014) also showed these restrictions in shared random effects
models for joint modeling of multivariate longitudinal data. This provided insight that
the use of separate random effects for each response, which follow a joint distribution,
can mitigate the limitations imposed by the use of shared random effects. In the presence
of negatively correlated responses, some indirect methods for shared random effects
models are suggested in the literature (Teixeira-Pinto and Normand 2009; Choi
et al. 2015).
Even though these indirect methods are applicable for responses like binary, ordinal
and continuous variables (modeled with the Normal distribution), they are not applicable
for the survival and count responses considered in this study.
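The limiting behaviour of the shared random effects correlation described above can be checked numerically. The sketch below is our own derivation from standard lognormal moment identities, not a transcription of formula (2). It assumes $Y_i \mid v_i \sim \mathrm{Poisson}(\exp(\eta_1 + v_i))$, $\log T_i \mid v_i \sim N(\eta_2 + v_i, \sigma^2)$ and $v_i \sim N(0, \sigma_v^2)$ with unit scaling; `eta_count` and `eta_surv` are hypothetical names for $\eta_1$ and $\eta_2$.

```python
import math


def shared_effect_correlation(eta_count, eta_surv, sigma2, sigma_v2):
    """Corr(T, Y) under the assumed shared random effect model.

    Uses E[exp(a*v)] = exp(a^2 * sigma_v2 / 2) for v ~ N(0, sigma_v2).
    """
    # Cov(T, Y) = E[T*Y] - E[T]*E[Y]
    cov = (math.exp(eta_count + eta_surv + sigma2 / 2.0)
           * math.exp(sigma_v2) * math.expm1(sigma_v2))
    # Var(T): T is lognormal with log-variance sigma2 + sigma_v2
    var_t = (math.exp(2.0 * eta_surv + sigma2 + sigma_v2)
             * math.expm1(sigma2 + sigma_v2))
    # Var(Y) = E[lambda] + Var(lambda) for a Poisson mixture
    mean_y = math.exp(eta_count + sigma_v2 / 2.0)
    var_y = mean_y + math.exp(2.0 * eta_count + sigma_v2) * math.expm1(sigma_v2)
    return cov / math.sqrt(var_t * var_y)


# Correlation vanishes as sigma_v2 -> 0 and approaches exp(-sigma2 / 2) as it grows.
small = shared_effect_correlation(0.5, 1.0, 1.0, 1e-10)
large = shared_effect_correlation(0.5, 1.0, 1.0, 50.0)
```

Under these assumptions the ceiling $\exp(-\sigma^2/2)$ falls quickly as $\sigma^2$ grows, which is the restriction on high correlations noted in the text.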
As joint random effects models are known for relaxing the restrictions imposed by
the shared random effects models (Verbeke et al. 2014), the development of a joint
model was extended to the use of separate, joint random effects which is suitable for
both negative and positive correlations. The development of the joint model with joint
random effects is explained in Section 3.1.3.
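\[
f(y_i \mid v_{1i}) = \frac{\exp\{-\exp(X_{1i}^{T}\beta + \eta v_{1i})\}\,\{\exp(X_{1i}^{T}\beta + \eta v_{1i})\}^{y_i}}{y_i!},
\qquad
f(v_{1i}, v_{2i}) = N_2\!\left(
\begin{pmatrix} 0 \\ 0 \end{pmatrix},
\begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}
\right)
\]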
As with the joint model with shared random effects, the method of maximum likelihood
was used for model estimation. The parameter estimation involved
approximating the joint likelihood in (4) numerically, using
adaptive Gaussian quadrature rules.
The correlation between the two responses can be derived as:
As per correlation formula (5), when $\rho < 0$ the joint model accommodates negative
correlations between the responses, and with $\rho > 0$ positive correlations can be
modeled. As for the magnitude of the correlation between the responses, as can be
inferred from the formula above, it depends on the variance parameters of the marginal
models (i.e., $\beta_0$, $\beta_1$, $\sigma^2$) and the variances and covariance of the
random effects ($\sigma_1^2$, $\sigma_2^2$, $\rho$). Therefore, the level of the
correlation can be adjusted by varying the values of these parameters, and the sign of
the correlation can be adjusted through the sign of the parameter $\rho$.
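Why the sign of the correlation follows the sign of $\rho$ can be seen from a short covariance calculation. The sketch below is our own working from lognormal moment identities, not a reproduction of formula (5). It assumes $\log\lambda_i = X_{1i}^{T}\beta + v_{1i}$ and $\log T_i \mid v_{2i} \sim N(X_{2i}^{T}\gamma + v_{2i}, \sigma^2)$ with unit scaling, and that the two responses are independent given $(v_{1i}, v_{2i})$.

```latex
\begin{align*}
E[Y_i] &= e^{X_{1i}^{T}\beta + \sigma_1^2/2}, \qquad
E[T_i] = e^{X_{2i}^{T}\gamma + (\sigma^2 + \sigma_2^2)/2},\\
E[T_i Y_i] &= e^{X_{1i}^{T}\beta + X_{2i}^{T}\gamma + \sigma^2/2}\,
              E\!\left[e^{v_{1i} + v_{2i}}\right]
            = e^{X_{1i}^{T}\beta + X_{2i}^{T}\gamma + \sigma^2/2}\,
              e^{(\sigma_1^2 + \sigma_2^2 + 2\rho\sigma_1\sigma_2)/2},\\
\operatorname{Cov}(T_i, Y_i) &= E[T_i Y_i] - E[T_i]\,E[Y_i]
            = E[T_i]\,E[Y_i]\left(e^{\rho\sigma_1\sigma_2} - 1\right).
\end{align*}
```

Since $E[T_i]$ and $E[Y_i]$ are positive, the covariance is negative, zero or positive exactly when $\rho$ is, under the stated assumptions.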
A major contribution of the theoretical development in this study is the extension
of the joint model to cluster randomized data. The theory behind the development
of the joint model for cluster randomized data is explained in Section 3.2.
responses of two patients of the same center ($\mathrm{Cov}(Y_{ij}, Y_{i'j})$,
$\mathrm{Cov}(T_{ij}, T_{i'j})$, $\mathrm{Cov}(Y_{ij}, T_{i'j})$), while the covariances
between responses of patients in different centers are zero.
Assume $Y_{ij}$ to come from a Poisson distribution with a mean rate of $\lambda_{ij}$ and
$T_{ij}$ to follow the lognormal distribution with mean $\mu_{ij}$ and variance
$\sigma_{ij}$. Let $X_{1ij}$, $X_{2ij}$ be vectors of explanatory variables associated with
the distributions of $Y_{ij}$ and $T_{ij}$ respectively.
In the presence of clustered data, a better choice for the univariate models would be
a generalized linear mixed model for the count response and a random effects survival
regression model for the survival response, which correspond to univariate random
effects models for the responses separately. Hence, the univariate models for the two
responses are as follows:
\[
Y_{ij} \mid u_j \sim \mathrm{Poisson}(\lambda_{ij}), \qquad \log(\lambda_{ij}) = X_{1ij}'\beta + u_j,
\]
\[
T_{ij} \mid u_j \sim \mathrm{Lognormal}(\mu_{ij}, \sigma_{ij}), \qquad \mu_{ij} = X_{2ij}'\gamma + u_j, \qquad u_j \sim \mathrm{Normal}(0, \sigma_u^2).
\]
As noted above, the separate univariate models are adjusted for the correlation present
among the patients of the same cluster, while patients across clusters are regarded as
independent. Therefore, these separate models are suitable when the two responses
$Y_{ij}$ and $T_{ij}$ are not correlated/clustered at patient level. As these two separate
random effects models do not represent the correlation/dependence between the responses of
the same patient, a joint model that models both responses simultaneously will be
specified at patient level, which is a better choice of statistical model for correlated
responses. Joining the above two univariate random effects models at patient level can
induce the association between the responses which are clustered/correlated at the
patient level. Initially, the development of the model with shared random effects is
presented in this section.
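where the marginal distributions are
\[
f(t_{ij}; \mu_{ij}, \sigma_{ij} \mid v_{ij}, u_j) = \frac{1}{\sqrt{2\pi}\,\sigma_{ij}\, t_{ij}}
\exp\!\left\{-\frac{1}{2}\left(\frac{\log(t_{ij}) - \mu_{ij}}{\sigma_{ij}}\right)^{2}\right\},
\qquad
S(t_{ij}; \mu_{ij}, \sigma_{ij} \mid v_{ij}, u_j) = 1 - \Phi_{nor}\!\left(\frac{\log(t_{ij}) - \mu_{ij}}{\sigma_{ij}}\right),
\]
\[
f(y_{ij}; \lambda_{ij} \mid v_{ij}, u_j) = \frac{\exp(-\lambda_{ij})\,(\lambda_{ij})^{y_{ij}}}{y_{ij}!},
\qquad
f(v_{ij}) = \frac{1}{\sqrt{2\pi}\,\sigma_v} \exp\!\left\{-\frac{1}{2}\left(\frac{v_{ij}}{\sigma_v}\right)^{2}\right\},
\qquad
f(u_j) = \frac{1}{\sqrt{2\pi}\,\sigma_u} \exp\!\left\{-\frac{1}{2}\left(\frac{u_j}{\sigma_u}\right)^{2}\right\}.
\]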
Recall that $f(t_{ij}; \mu_{ij}, \sigma_{ij} \mid v_{ij}, u_j)$ denotes the probability
density of the lognormal distribution, which gives the probability of the observed
survival data (for $\delta_{ij} = 1$), and $S(t_{ij}; \mu_{ij}, \sigma_{ij} \mid v_{ij}, u_j)$
denotes the survival function of the lognormal distribution, which gives the probability
of the censored survival data (for $\delta_{ij} = 0$). The probability density of the
Poisson distribution is given by $f(y_{ij}; \lambda_{ij} \mid v_{ij}, u_j)$, while
$f(v_{ij})$ and $f(u_j)$ denote the distributions of the random effects.
For simplicity, the shape parameter of the lognormal distribution ($\sigma_{ij}$) was
treated as a constant parameter and, for identifiability reasons, the scaling parameter of
the random effects was set at unity ($\eta = 1$), which allocates a similar scale for the
two responses (Liu, Wolfe, and Kalbfleisch 2007).
3.2.1.1. Estimation. Let $O_{ij}$ denote the observed data per patient, that is
$O_{ij} = (y_{ij}, t_{ij}, \delta_{ij}, X_{1ij}, X_{2ij})$, and let
$\theta = (\beta, \gamma, \sigma_v^2, \sigma_u^2)$ denote the parameter vector of the joint
model. Then, the likelihood for $O_{ij}$ is:
\[
L(O_{ij}) = \iint f(t_{ij} \mid v_{ij}, u_j; \theta)^{\delta_{ij}}\,
S(t_{ij} \mid v_{ij}, u_j; \theta)^{1-\delta_{ij}}\,
f(y_{ij} \mid v_{ij}, u_j; \theta)\, f(v_{ij}; \theta)\, f(u_j; \theta)\, dv_{ij}\, du_j \tag{7}
\]
The joint likelihood of the data can be formulated as
$L(O; \theta) = \prod_{j=1}^{m} \prod_{i=1}^{n_j} L(O_{ij})$.
The parameter estimation of the joint model involves maximizing the above joint
likelihood, which involves two levels of nested random effects. This is undoubtedly a
complex likelihood function to maximize: the literature has highlighted the complexities
in maximizing likelihoods with nested random effects even for univariate models with a
single response variable (Raudenbush, Yang, and Yosef 2000; Pinheiro and Chao 2006;
Rabe-Hesketh, Skrondal, and Pickles 2005). In contrast to likelihood functions with
standard distributions, the likelihood in (7) consists of a user-specified likelihood
function with components from the separate distributions assumed for the survival and
count responses. As the Gaussian quadrature method has been identified as a suitable
method for numerically approximating likelihood functions with nested random effects in
univariate models (Pinheiro and Chao 2006), the same was used for approximating the
joint model likelihood given in (7).
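The nested structure of the per-patient likelihood can be sketched by nesting one quadrature rule inside another. The code below is an illustration under assumptions, not the estimation routine used in the study: it uses a fixed 5-point Gauss-Hermite rule instead of an adaptive one, takes the double integral patient by patient as the likelihood is written above, and assumes that both random effects enter both linear predictors additively with unit scaling. `eta_count` and `eta_surv` are hypothetical names for the fixed-effect linear predictors.

```python
import math

# Closed-form 5-point Gauss-Hermite rule for integrals of exp(-x^2) * g(x).
SQRT_PI = math.sqrt(math.pi)
R10 = math.sqrt(10.0)
X_IN, X_OUT = math.sqrt((5.0 - R10) / 2.0), math.sqrt((5.0 + R10) / 2.0)
W_IN = SQRT_PI * (7.0 + 2.0 * R10) / 60.0
W_OUT = SQRT_PI * (7.0 - 2.0 * R10) / 60.0
NODES = [-X_OUT, -X_IN, 0.0, X_IN, X_OUT]
WEIGHTS = [W_OUT, W_IN, 8.0 * SQRT_PI / 15.0, W_IN, W_OUT]


def normal_expectation(g, sd):
    """Approximate E[g(z)] for z ~ N(0, sd^2) with the 5-point rule."""
    total = sum(w * g(math.sqrt(2.0) * sd * x) for x, w in zip(NODES, WEIGHTS))
    return total / SQRT_PI


def conditional_likelihood(t, delta, y, eta_count, eta_surv, sigma, effect):
    """f(t)^delta * S(t)^(1-delta) * f(y), given the summed random effect v + u."""
    log_lam = eta_count + effect
    poisson = math.exp(-math.exp(log_lam) + y * log_lam - math.lgamma(y + 1))
    z = (math.log(t) - (eta_surv + effect)) / sigma
    if delta == 1:   # event observed: lognormal density
        surv_part = math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma * t)
    else:            # censored: lognormal survival function
        surv_part = 1.0 - 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return surv_part * poisson


def patient_likelihood(t, delta, y, eta_count, eta_surv, sigma, sigma_v, sigma_u):
    """Double integral: patient-level v (inner) nested in cluster-level u (outer)."""
    def inner(u):
        return normal_expectation(
            lambda v: conditional_likelihood(
                t, delta, y, eta_count, eta_surv, sigma, v + u),
            sigma_v,
        )
    return normal_expectation(inner, sigma_u)


def joint_log_likelihood(records, sigma, sigma_v, sigma_u):
    """Sum of log per-patient likelihoods over (t, delta, y, eta_count, eta_surv)."""
    return sum(
        math.log(patient_likelihood(t, d, y, e1, e2, sigma, sigma_v, sigma_u))
        for (t, d, y, e1, e2) in records
    )
```

Each per-patient evaluation costs 25 integrand evaluations with 5 nodes per level, which illustrates why the cost of nested quadrature grows quickly with the number of nodes and levels.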
The above specification resembles a joint model with nested random effects, where
patient level random effects are nested within cluster level random effects. It can be
considered an extension of the joint models available in the literature, which use
center level random effects both for inducing the correlation between the responses
clustered at patient level and for the correlation between the patients within the
same center. Therefore, the use of separate random effects which follow a nested
structure can be considered a novel contribution of this study to the field of joint
modeling. The correlation between the two responses of the same patient can be derived as
follows:
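\[
\mathrm{corr}(T_{ij}, Y_{ij}) = \frac{\mathrm{cov}(T_{ij}, Y_{ij})}{\sqrt{\mathrm{Var}(T_{ij})}\,\sqrt{\mathrm{Var}(Y_{ij})}}
\]
\[
= \frac{\left(e^{\beta_0 + \beta_1 X_{1ij}}\right)^{1/2}\, e^{\sigma_v^2 + \sigma_u^2}\,\left(e^{\sigma_v^2} - 1\right)\left(e^{\sigma_u^2} - 1\right)}
{\left[e^{\sigma_v^2 + \sigma_u^2}\left(e^{\sigma^2 + \sigma_u^2 + \sigma_v^2} - e^{\sigma_u^2} - e^{\sigma_v^2} + 1\right)\right]^{1/2}
\left[e^{(\sigma_v^2 + \sigma_u^2)/2} + e^{\beta_0 + \beta_1 X_{1ij}}\, e^{\sigma_v^2 + \sigma_u^2}\left(e^{\sigma_v^2} - 1\right)\left(e^{\sigma_u^2} - 1\right)\right]^{1/2}} \tag{8}
\]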
As per formula (8), $\mathrm{corr}(T_{ij}, Y_{ij})$ depends on the variances of the two
random effects ($\sigma_u^2$, $\sigma_v^2$) and the variance parameters of the two marginal
distributions ($\beta_0$, $\beta_1$, $\sigma^2$). As per formula (8), the correlation
between the responses tends to zero when at least one variance parameter of the random
effects tends to zero, that is, when $\sigma_u^2 \to 0$ or $\sigma_v^2 \to 0$. The use of
nested random effects, where the patient level random effects are nested within the
cluster level random effects, can be considered as the reason for this behavior of the
correlation between the responses. As for the sign of the correlation, this model can
accommodate only positive correlations between the responses. The correlations
$\mathrm{corr}(T_{ij}, T_{i'j})$ and $\mathrm{corr}(Y_{ij}, Y_{i'j})$ can also be derived in
a similar fashion. When $\sigma_v^2 \to \infty$, the correlation of the two responses
approaches the following limit:
\[
\mathrm{corr}(T_{ij}, Y_{ij}) \to \frac{1}{[\exp(2\sigma_u^2 + \sigma^2)]^{1/2}}
\]
Thus, the maximum correlation the joint model can accommodate is governed by the
variance of the survival response ($\sigma^2$) and the cluster variance ($\sigma_u^2$). A
major limitation of the shared random effects joint model is that only positive
correlations between the responses can be modeled. Therefore, the development of the joint
model with joint random effects is explained in the following section.
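Then, the joint distribution of the two responses can be derived as:
\[
f(t_{ij}, y_{ij}) = \iint f(t_{ij}, y_{ij} \mid v, u_j)\, f(v)\, f(u_j)\, dv\, du_j
= \iint f(t_{ij} \mid v_{2ij}, u_j)^{\delta_{ij}}\, S(t_{ij} \mid v_{2ij}, u_j)^{1-\delta_{ij}}\,
f(y_{ij} \mid v_{1ij}, u_j)\, f(v)\, f(u_j)\, dv\, du_j \tag{9}
\]
where the marginal distributions are
\[
f(t_{ij}; \mu_{ij}, \sigma_{ij} \mid v_{2ij}, u_j) = \frac{1}{\sqrt{2\pi}\,\sigma_{ij}\, t_{ij}}
\exp\!\left\{-\frac{1}{2}\left(\frac{\log(t_{ij}) - \mu_{ij}}{\sigma_{ij}}\right)^{2}\right\},
\qquad
S(t_{ij}; \mu_{ij}, \sigma_{ij} \mid v_{2ij}, u_j) = 1 - \Phi_{nor}\!\left(\frac{\log(t_{ij}) - \mu_{ij}}{\sigma_{ij}}\right),
\]
\[
f(y_{ij}; \lambda_{ij} \mid v_{1ij}, u_j) = \frac{\exp(-\lambda_{ij})\,(\lambda_{ij})^{y_{ij}}}{y_{ij}!},
\qquad
f(v_{1ij}, v_{2ij}) = \frac{1}{\sqrt{(2\pi)^{2}\,|\Sigma|}} \exp\!\left\{-\frac{1}{2}\, v^{T} \Sigma^{-1} v\right\},
\qquad
f(u_j) = \frac{1}{\sqrt{2\pi}\,\sigma_u} \exp\!\left\{-\frac{1}{2}\left(\frac{u_j}{\sigma_u}\right)^{2}\right\}.
\]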
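3.2.2.1. Estimation. Let $O_{ij}$ denote the observed data per patient, i.e.,
$O_{ij} = (y_{ij}, t_{ij}, \delta_{ij}, X_{1ij}, X_{2ij})$, and let
$\theta = (\beta, \gamma, \Sigma, \sigma_u^2)$ denote the parameter vector of the joint
model. Then, the likelihood for $O_{ij}$ is
\[
L(O_{ij}; \theta) = \iint f(t_{ij} \mid v_{2ij}, u_j; \theta)^{\delta_{ij}}\,
S(t_{ij} \mid v_{2ij}, u_j; \theta)^{1-\delta_{ij}}\,
f(y_{ij} \mid v_{1ij}, u_j; \theta)\, f(v; \theta)\, f(u_j; \theta)\, dv\, du_j
\]
Then, the joint likelihood of the data is
$L(O; \theta) = \prod_{j=1}^{m} \prod_{i=1}^{n_j} L(O_{ij}; \theta)$.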
The parameter estimation of the joint model involves maximizing the above joint
likelihood, which involves two levels of nested random effects. The parameter estimation
is therefore an intricate process, which requires integrating over two levels of nested
random effects. The numerical integration method of adaptive Gaussian quadrature was
used here as well.
The correlation between the two responses of a patient can be derived as:
\[
\mathrm{corr}(T_{ij}, Y_{ij}) = \frac{\mathrm{cov}(T_{ij}, Y_{ij})}{\sqrt{\mathrm{cov}(T_{ij}, T_{ij})}\,\sqrt{\mathrm{cov}(Y_{ij}, Y_{ij})}}
\]
\[
= \frac{\cdots}
{\left[e^{\sigma_u^2 + \sigma_2^2}\left(e^{\sigma_u^2 + \sigma_2^2 + \sigma^2} - e^{\sigma_u^2} - e^{\sigma_2^2} + 1\right)\right]^{1/2}
\left[e^{(\sigma_1^2 + \sigma_u^2)/2} + e^{\beta_0 + \beta_1 X_{1ij}}\, e^{\sigma_1^2 + \sigma_u^2}\left(e^{\sigma_1^2} - 1\right)\left(e^{\sigma_u^2} - 1\right)\right]^{1/2}} \tag{10}
\]
As per the correlation formula in (10), when $\rho < 0$ the joint model accommodates
negative correlations between the responses, and with $\rho > 0$ positive correlations can
be modeled.
4. Example
The data set was obtained from a multi-center randomized controlled trial on epilepsy,
which was conducted at 81 hospitals/centers with the objective of comparing the two
treatment policies of immediate and deferred antiepileptic treatment. For a concise
explanation of the trial, see Marson et al. (2002).
Out of the several outcomes observed, the time from randomization to first seizure of
any type was considered as the survival response, which is an internationally agreed
outcome on treatment evaluation for Epilepsy (ILAE Commission on Antiepileptic Drugs
1998). The other response was the count of seizures experienced by the patient during
the follow up period after randomization to the treatment in the trial. The demographic
variables such as age (in years), sex (1 = male, 0 = female) and clinical features such as
the type of epilepsy (0 = partial and 1 = generalized) and type of treatment
(1 = immediate, 0 = deferred) were considered as the predictor variables.
[Figure: probability plots of time_seizure. Panels: Exponential, Normal. Vertical axis: Percent; horizontal axis: time_seizure.]
Binomial distribution indicated to capture both the over dispersion and the excess of
zeros present in the count data (Schmidt, Pereira, and Vieira 2008).
$t_p$ is the time by which a proportion $p$ of patients has had their first seizure ($p$th quantile).
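and center level random effects are desirable. Then, the joint model can be written as
\[
Y_{ij} \mid v_{1ij}, u_j \sim \text{Negative Binomial}(\lambda_{ij}, \alpha), \qquad
\log(\lambda_{ij}) = X_{1ij}^{T}\beta + v_{1ij} + u_j,
\]
\[
T_{ij} \mid v_{2ij}, u_j \sim \mathrm{Lognormal}(\mu_{ij}, \sigma_{ij}), \qquad
\mu_{ij} = X_{2ij}^{T}\gamma + v_{2ij} + u_j,
\]
where
\[
v = (v_{1ij}, v_{2ij}) \sim \mathrm{Normal}(0, \Sigma), \qquad
u_j \sim \mathrm{Normal}(0, \sigma_u^2), \qquad
\Sigma = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}.
\]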
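How a Negative Binomial count model differs from a Poisson one can be illustrated with a small sketch. The parameterization is an assumption on our part: the text only writes Negative Binomial($\lambda_{ij}$, $\alpha$), and the code below takes the common mean-dispersion form with mean $\lambda$ and variance $\lambda + \alpha\lambda^2$. The numeric values are hypothetical and are not estimates from the study.

```python
import math


def neg_binomial_pmf(y, lam, alpha):
    """P(Y = y) with mean lam and variance lam + alpha * lam^2 (assumed form)."""
    r = 1.0 / alpha  # size parameter implied by the dispersion alpha
    log_p = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
             + r * math.log(r / (r + lam))
             + y * math.log(lam / (r + lam)))
    return math.exp(log_p)


def poisson_pmf(y, lam):
    """P(Y = y) for a Poisson count with mean lam."""
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))


# Same mean, hypothetical values: the Negative Binomial puts more mass at zero
# and has a larger variance than the Poisson.
lam, alpha = 3.0, 0.5
p0_nb = neg_binomial_pmf(0, lam, alpha)
p0_pois = poisson_pmf(0, lam)
```

With the same mean, the extra dispersion parameter lets the count model carry both a larger variance and a higher probability of zero counts.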
4.3.2. Analyzing time to seizure and count of seizures with univariate and
joint models
For each model, the explanatory variables of Age, Sex, Epilepsy Type and Treatment
Type were considered initially both as main effects and as interaction terms. But, none
of the two-way interactions yielded significant improvement to the model fit. Therefore,
only main effects of the explanatory variables were considered in model fitting based on
the principle of parsimony. Then, the significance of squared terms of the Age variable
was considered which indicated improvements on the model fit of the count model
only. The model selection was done with backward elimination based on the p-values of
the model parameters and the AIC of the models (Park and Qiu 2014; Müller, Scealy,
and Welsh 2013; Zhang et al. 2014). The final models of the univariate random effects
models and the joint model are in Table 2.
It should be noted that variable selection into the final models did not depend
solely on the p-values of the parameter estimates, because the literature prefers the
use of model fit statistics such as AIC and BIC for variable selection in random effects
models (Müller, Scealy, and Welsh 2013; Bolker et al. 2009; Park and Qiu 2014). Then, the
final models resulting from the univariate models and the joint model were compared
critically to evaluate the new joint model against the separate univariate models with
respect to
fixed parameter estimates, random effects estimates and the standard errors of the
parameter estimates. Both models had center level random effects to adjust for
correlation between the patients within the same center, and the variance parameter of
the center level random effects ($\sigma_u^2$) was significant in both the univariate
models and the joint model, which justifies the inclusion of center level correlations
in the marginal models of both responses. It is noteworthy that the standard errors
of the center level random effects (standard errors of $\sigma_u^2$) were lower in the
joint model than in the two univariate models, where models with lower standard errors
for random effects are considered better. The patient level random effects were used only
in the joint model, to specify the correlation between the two responses of the same
patient, and their variance-covariance parameters ($\sigma_{v1}^2$, $\sigma_{v2}^2$,
$\sigma_{v1v2}$) were all highly significant. The negative covariance estimate for
$\sigma_{v1v2}$ indicates that the model has captured the negative correlation between the
responses. Then, the fixed parameter estimates of the two models were compared. The final
univariate survival model and the joint model retained the same set of significant
explanatory variables. The signs of the coefficients were also the same in both models,
indicating that both models implied the same direction of association between the time
to seizure and each explanatory variable. However, the estimates differed in magnitude,
particularly the intercepts (3.13, 6.05), the coefficients for the Epilepsy Type
(.468, .91) and the variance parameter ($\sigma^2$) of the survival times (2.9, 1.17).
The estimate of the variance of the survival times ($\sigma^2$) was higher in the
univariate model (2.9) than in the joint model (1.17), indicating that the joint
model has controlled the dispersion of the time to seizure response variable. Thus, these
differences in the parameter estimates give different survival estimates
from the two modeling approaches.
To explain the impact of the different parameter estimates, the parameter estimate
for the treatment variable is considered here. As per the coefficient for Treatment
type from the univariate model, the expected count of seizures for patients on
immediate treatment is exp(-0.331) ≈ 0.72 times the expected count for patients in the
deferred treatment group (about 28% lower), whereas the joint model estimates a ratio
of exp(-0.733) ≈ 0.48 (about 52% lower), when other variables are held constant.
Considering the impact of treatment type on the time to first seizure, coefficients were
positive in both the univariate model (0.813) and the joint model (0.904), indicating a
longer seizure-free duration for immediate treatment than for deferred treatment.
Considering the impact of age on the count of seizures, both the univariate and the joint
model gave negative coefficients, indicating that the average number of seizures decreases
as age increases. This may surprise some readers; however, past literature indicates
that the incidence of epileptic seizures is high in children, declines in younger
adults and increases in the elderly (above 55 years) (Stephen and Brodie 2000). In line
with this, the age cohort considered in the study had an average age of 30 years and a
median of 24 years, implying that most of these patients are children and young adults
below 55 years, and hence our results tally with the literature.
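The arithmetic behind these interpretations can be reproduced in a few lines. The sketch assumes the treatment coefficients of the count models are -0.331 (univariate) and -0.733 (joint), which are the signs under which the quoted 72% and 48% arise, and it assumes log-link count models and a lognormal survival model in which a coefficient multiplies the median time to seizure by its exponential.

```python
import math


def ratio(coef):
    """Multiplicative effect exp(coef) implied by a log-scale coefficient."""
    return math.exp(coef)


def percent_change(coef):
    """Percentage change in the expected response implied by the coefficient."""
    return 100.0 * (math.exp(coef) - 1.0)


# Expected seizure count, immediate vs deferred treatment (assumed signs).
count_ratio_univariate = ratio(-0.331)   # about 0.72, i.e. about 28% lower
count_ratio_joint = ratio(-0.733)        # about 0.48, i.e. about 52% lower

# Median time to first seizure, immediate vs deferred treatment.
time_ratio_univariate = ratio(0.813)     # about 2.25 times longer
time_ratio_joint = ratio(0.904)          # about 2.47 times longer
```

A ratio of 0.72 therefore corresponds to a reduction of about 28%, not 72%, in the expected count.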
Considering the standard errors of the parameter estimates, the joint model had
lower standard errors than the univariate models for most of the parameter estimates
which also indicated better performance of the joint model than the univariate models. In
summary, comparison of the parameter estimates, their associated standard errors and the
highly significant random effects points to better performance of the joint model over the
univariate models. Then, model diagnostics and validations were considered.
[Figure 2 plot: estimated survival probabilities (vertical axis, 0.4 to 1.0) for the New Joint Model, Univariate Model, Center-Joint Model and Actuals.]
Figure 2. Survival estimates of the univariate, center-joint model and new joint model.
joint model proposed in this study. Thus, it indicates that the use of patient level
random effects in the new joint model has been effective in controlling the dispersion in
the data (Harrison 2014, 2015). Then, the model diagnostics and the validation of the
center-level joint model were considered.
It was clearly indicated that the new joint model fits the data better than the existing
joint modeling approach considered in the center-level joint model.
Then, the survival probabilities of the test dataset were estimated as per the final cen-
ter level joint model. The estimated survival probabilities of the three types of models
(univariate, center-level joint model and new joint model) were plotted in the same fig-
ure for better comparison of the models.
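For a lognormal survival model, an estimated survival probability follows directly from the location and scale of the log survival time. The sketch below is illustrative only: `mu` stands for a patient's fitted linear predictor (fixed effects plus any predicted random effects) and `sigma` for the lognormal scale, and the numeric values are hypothetical rather than estimates from the study.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard Normal distribution


def lognormal_survival(t, mu, sigma):
    """S(t) = 1 - Phi((log t - mu) / sigma): probability of no event by time t."""
    return 1.0 - STD_NORMAL.cdf((math.log(t) - mu) / sigma)


def lognormal_quantile(p, mu, sigma):
    """t_p: time by which a proportion p of patients has had the event."""
    return math.exp(mu + sigma * STD_NORMAL.inv_cdf(p))


# Hypothetical linear predictor and scale, for illustration only.
mu, sigma = 2.0, 1.1
median_time = lognormal_quantile(0.5, mu, sigma)        # equals exp(mu)
surv_at_median = lognormal_survival(median_time, mu, sigma)
```

Evaluating `lognormal_survival` over a grid of times for each model's fitted `mu` and `sigma` gives curves that can be compared with the observed survival proportions.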
The estimated survival probabilities given in Figure 2 clearly demonstrate that the newly
proposed joint model of this study outperformed the center-level joint model, which is
fitted as per the currently used approach for joint modeling of clustered data. However,
even the center-level joint model was better than fitting separate univariate models.
In summary, the analysis of the example data using separate univariate models, the
center-level joint model and the new joint model indicated that the joint model proposed
in this study is better than the other models.
5. Discussion
5.1. Important conclusions from the methodology
The main aspiration in the conduct of this research was to develop a bivariate model
consisting of a survival response and a count response variable. The development of the
joint model/bivariate model deployed two popularly known approaches of joint model
20 A. A. SUNETHRA AND M. R. SOORIYARACHCHI
development, namely the shared random effects models and joint random effects mod-
els. Another key aspect that was considered in the model development was the underly-
ing design of the data where two popular designs of simple randomized designs and
cluster randomized designs were considered. Therefore, four types of joint models were
considered combining the two types of models and two types of designs.
The methodological contributions of this study are the use of nested random effects models
for joint model development, the development of a joint model for both simple randomized
and cluster randomized data, the development of joint models for both positively and
negatively correlated responses, and the development of joint models consisting of fully
parametric marginal models.
It is noteworthy that the univariate model for the count response was also a random
effects model, defined with center-level random effects, which resulted in these
ill-behaved residuals (Figure 3).
Therefore, the use of patient-level random effects, which was the additional level of
random effects present in the joint model, was suspected to be the reason the dispersion
of the count responses was controlled in the joint model. The use of patient-level random
effects resembles the use of observation-level random effects, which has been identified
as a method of controlling the dispersion of many random variables
(Harrison 2014, 2015).
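The role of an observation-level random effect can be illustrated with a small simulation.
The sketch below is not the joint model of this study; it assumes a Poisson count whose log
rate receives an independent normal term for every observation, and the sample size, mean
and standard deviation are arbitrary illustrative choices. It compares the variance-to-mean
ratio of the simulated counts with and without that term.

```python
import math
import random
import statistics


def sample_poisson(rate, rng):
    """Draw one Poisson variate with Knuth's multiplication method."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_counts(n, log_mean, olre_sd, rng):
    """Simulate n counts with log rate = log_mean + e_i, e_i ~ Normal(0, olre_sd^2)."""
    return [sample_poisson(math.exp(log_mean + rng.gauss(0.0, olre_sd)), rng)
            for _ in range(n)]


def dispersion_ratio(counts):
    """Variance-to-mean ratio: about 1 for Poisson data, above 1 when overdispersed."""
    return statistics.variance(counts) / statistics.mean(counts)


rng = random.Random(2020)  # fixed seed so the run is reproducible

# Illustrative settings (assumptions): 5000 observations, baseline mean count of 3.
plain = simulate_counts(5000, math.log(3.0), 0.0, rng)      # no observation-level term
with_olre = simulate_counts(5000, math.log(3.0), 0.7, rng)  # observation-level term, sd 0.7

ratio_plain = dispersion_ratio(plain)
ratio_olre = dispersion_ratio(with_olre)
print(f"variance/mean without observation-level effect: {ratio_plain:.2f}")
print(f"variance/mean with observation-level effect:    {ratio_olre:.2f}")
```

Counts generated with the extra term show a variance well above their mean, which a Poisson
model with only fixed effects, or only cluster-level effects, cannot reproduce. Including an
observation-level (here, patient-level) random effect gives the fitted model a parameter
that can absorb this extra variation.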
Deriving sample size formulae for the two designs of simple randomized data and cluster
randomized data.
Carrying out simulation studies to test the models developed.
Developing goodness-of-fit test methods for the developed joint model.
Implementing alternative methods of model estimation for the joint model developed for
cluster randomized data.
5.4. Summary
The main aim of this study was to develop a joint model for the two responses of survival
and count, which was accomplished by developing four distinct models that differed with
respect to the nature of the correlation between the responses and with respect to the
design of the data. In the analysis of an actual epilepsy dataset, the joint model
performed better than the univariate models in terms of parameter estimates, standard
errors, model diagnostics and validation. The joint model proposed by the study for the
example consisted of two levels of nested random effects, whereas the literature considered
only a single level of random effects for joint modeling. Therefore, the proposed joint
model was also compared with the comparable joint model defined with a single level of
random effects. The performance of the proposed joint model was clearly better than that of
the joint model with a single level of random effects. In summary, the joint models
developed in this study showed superior performance in analyzing correlated survival and
count responses for the example. Therefore, this research contributed an improved
methodology for analyzing correlated survival and count data.
Acknowledgment
The generous support rendered by Prof. Tony Marson of the University of Liverpool and
Prof. Jane L. Hutton of the University of Warwick in finding a suitable dataset for this
study is deeply appreciated.
References
Austin, P., H. Stryhn, G. Leckie, and J. Merlo. 2018. Measures of clustering and heterogeneity in
multilevel Poisson regression analyses of rates/count data. Statistics in Medicine 37 (4):572–89.
doi:10.1002/sim.7532.
Bolker, B., M. Brooks, C. Clark, S. Geange, J. Poulsen, M. Stevens, and J.-S. White. 2009.
Generalized linear mixed models: A practical guide for ecology and evolution. Trends in
Ecology & Evolution 24 (3):127–35. doi:10.1016/j.tree.2008.10.008.
Carroll, R., D. Ruppert, L. Stefanski, and C. Crainiceanu. 2006. Measurement error in nonlinear
models: A modern perspective. India: CRC Press.
Choi, J., J. Cai, D. Zeng, and A. Olshan. 2015. Joint analysis of survival time and longitudinal
categorical outcomes. Statistics in Biosciences 1:19–47. doi:10.1007/s12561-013-9091-z.
Cowling, B. J. 2003. Survival models for censored point processes. Doctoral dissertation,
University of Warwick.
Cowling, B. J., J. L. Hutton, and J. E. Shaw. 2006. Joint modelling of event counts and survival
times. Journal of the Royal Statistical Society: Series C (Applied Statistics) 55 (1):31–9.
doi:10.1111/j.1467-9876.2005.00529.x.
Gueorguieva, R. 2001. A multivariate generalized linear mixed model for joint modelling of
clustered outcomes in the exponential family. Statistical Modelling: An International Journal
1 (3):177–93. doi:10.1177/1471082X0100100302.
Harrison, X. 2014. Using observation-level random effects to model overdispersion in count data
in ecology and evolution. PeerJ 2:e616. doi:10.7717/peerj.616.
Harrison, X. 2015. A comparison of observation-level random effect and beta-binomial models
for modelling overdispersion in binomial data in ecology & evolution. PeerJ 3:e1114.
doi:10.7717/peerj.1114.
Henderson, R., P. Diggle, and A. Dobson. 2000. Joint modelling of longitudinal measurements and
event time data. Biostatistics (Oxford, England) 1 (4):465–80. doi:10.1093/biostatistics/1.4.465.
Hougaard, P. 2000. Analysis of multivariate survival data. New York, NY: Springer.
ILAE Commission on Antiepileptic Drugs. 1998. Considerations on designing clinical trials to
evaluate the place of new antiepileptic drugs in the treatment of newly diagnosed and chronic
patients with epilepsy. Epilepsia 39 (7):799–803.
Kwong, G., and J. Hutton. 2003. Choice of parametric models in survival analysis: Applications
to monotherapy for epilepsy and cerebral palsy. Journal of the Royal Statistical Society: Series C
(Applied Statistics) 52:153–68. doi:10.1111/1467-9876.00395.
Lai, T., P. Lavori, and M. Shih. 2012. Sequential design of phase II-III cancer trials. Statistics in
Medicine 31 (18):1944–60. doi:10.1002/sim.5346.
Liu, L., R. Wolfe, and J. Kalbfleisch. 2007. A shared random effects model for censored medical
costs and mortality. Statistics in Medicine 26 (1):139–55. doi:10.1002/sim.2535.
Marson, A. G., P. R. Williamson, H. Clough, and J. L. Hutton. 2002. Carbamazepine versus
valproate monotherapy for epilepsy: A meta-analysis. Epilepsia 43 (5):505–13.
doi:10.1046/j.1528-1157.2002.20801.x.