You are on page 1of 20

ARTICLE - ARXIV.

ORG

Discrete Bilal distribution with right-censored data

Bruno Caparroz Lopes de Freitasa , Jorge Alberto Achcarb , Marcos Vinicius de


Oliveira Peresb , Edson Zangiacomi Martinezb
a
State University of Maringá, Master Program in Biostatistics, Maringá, Brazil; b Ribeirão
arXiv:2111.01943v1 [stat.ME] 2 Nov 2021

Preto Medical School, University of São Paulo (USP), Ribeirão Preto, Brazil

ARTICLE HISTORY
Compiled November 4, 2021

ABSTRACT
This paper presents inferences for the discrete Bilal (DB) distribution introduced
by Altun et al. (2020). We consider parameter estimation for DB distribution in the
presence of randomly right-censored data. We use maximum likelihood and Bayesian
methods for the estimation of the model parameters. We also consider the inclusion
of a cure fraction in the model. The usefulness of the proposed model was illustrated
with three examples considering real datasets. These applications suggested that the
model based on DB distribution performs at least as good as some other traditional
discrete models as the DsFx-I, discrete Lindley, discrete Rayleigh, and discrete Burr-
Hatke distributions. R codes are provided in an appendix at the end of the paper
so that reader can carry out their own analysis.

KEYWORDS
Survival analysis; Maximum likelihood estimation; Cure fraction; Bayesian
inference; Discrete distributions; Censored data.

1. Introduction

Survival analysis is one of the statistical techniques most commonly encountered in the
medical literature (Flynn, 2012). These methods are applied when the time until the
occurrence of an event is the object of interest. Examples in medical research include
the time to respond to treatment, relapse-free survival time, time to death, time to
device failure, and time to regain mobility (Myers, 2007). The Kaplan-Meier plots, log-
rank tests, and Cox (proportional hazards) regression model are the most widely used
survival analysis techniques in medical studies (Le Rademacher and Wang, 2021). As
an alternative to the traditional proportional hazard model, parametric models have
become popular in the last decades. The parametric models assume that the time-to-
event variable follows a known probability distribution, such as Weibull, gamma, or
the log-normal distributions. Among the discrete distributions proposed in the sta-
tistical literature to model time-to-event data, we have the discrete Weibull distribu-
tion (Nakagawa and Osaki, 1975), the discrete Lindley distribution (Gómez-Déniz and
Calderı́n-Ojeda, 2012), the exponentiated discrete Weibull distribution (Nekoukhou
and Bidram, 2015; Cardial et al., 2020; Freitas et al., 2021), the discrete generalized

Corresponding author: Ribeirão Preto Medical School, Av. Bandeirantes 3900, University of São Paulo (USP),
Ribeirão Preto, 14049-900, Brazil. E-mail: edson@fmrp.usp.br
Rayleigh distribution (Alamatsaz et al., 2016), and the discrete Sushila distribution
(Oliveira et al., 2019).
Let X be a random variable denoting a survival time, and let x be an observation
of X. The continuous Bilal distribution introduced by Abd-Elrahman (2013) has a
probability density function (pdf) given by

6 2x x
fX (x) = e− θ 1 − e− θ , x ≥ 0, θ > 0
θ
and probability accumulated distribution function given by
2x x
FX (x) = 1 − e− θ 3 − 2e− θ .

The survival function, that is, the probability that an individual survives at least until
time x, is given by SX (x) = 1 − FX (x). The author named this distribution as Bilal
since this is his youngest son’s name (Abd-Elrahman, 2013). Classical and Bayesian
approaches to find an estimated value for the parameter θ of a Bilal distribution based
on a given type-2 right censoring sample are given by Abd-Elrahman and Niazi (2017).
Generalizations of the Bilal distribution are found in the works of Abd-Elrahman
(2019) and Shi et al. (2019).
To obtain a discrete version of the Bilal distribution, Altun et al. (2020) considered
that the random variable T has probability mass function (pmf) given by

P (T = t) = P (t ≤ X ≤ t + 1) = SX (t) − SX (t + 1), t ∈ N0 , (1)

where X is the underlying continuous random variable, T = [X] (the largest integer
less than or equal to X), and SX (x) = P (X > x) (see Methodology-IV
  in the article
− 2t − t
by Chakraborty (2015)). Thus, replacing SX (t) by e θ 3 − 2e θ and SX (t + 1) by
2(t+1)
 t+1

e− θ 3 − 2e− θ in the expression (1), we have

2t
 t
 2(t+1)
 t+1

P (T = t) = e− θ 3 − 2e− θ − e− θ 3 − 2e− θ . (2)

1
Let us assume the parameter transformation p = e− θ , 0 < p < 1. From (2) and,
following the notation of Altun et al. (2020), the pmf of the discrete Bilal distribution
is given by

f (t) = P (T = t) = 2(p3 − 1)p3t − 3(p2 − 1)p2t , t ∈ N0 , (3)

where 0 < p < 1. The corresponding probability accumulated distribution function is


given by

F (t) = P (T ≤ t) = 1 − (3 − 2pt+1 )p2(t+1) , t ∈ N0 ,

and the survival function is thus given by

S(t) = 1 − F (t) = P (T > t) = (3 − 2pt+1 )p2(t+1) , t ∈ N0 .

To simplify the obtaining of estimators for the parameters of the discrete Bilal distri-
bution, we consider the reparameterization p = e−β , where β > 0. Thus, the pmf is

2
given by

f (t) = P (T = t) = 2(e−3β − 1)e−3βt − 3(e−2β − 1)e−2βt , t ∈ N0 ,

where the corresponding probability accumulated distribution function is given by


h i
F (t) = P (T ≤ t) = 1 − 3 − 2e−β(t+1) e−2β(t+1) , t ∈ N0 ,

and the survival function is


h i
S(t) = 3 − 2e−β(t+1) e−2β(t+1) , t ∈ N0 .

The corresponding hazard function is given by

P (T = t) f (t) 2(e−3β − 1)e−βt − 3(e−2β − 1)


h(t) = P ( T = t| T ≥ t) = = = .
P (T ≥ t) S(t − 1) 3 − 2e−βt

Altun et al. (2020) showed that the mean and the variance of a random variable T
that follows a discrete Bilal distribution with parameter β are respectively given by

e−2β e−2β + e−β + 3



E(T ) = −2β
(e + e−β + 1) (1 − e−2β )

and

e−2β 3e−4β + 4e−3β − e−2β + 4e−β + 3



V ar(T ) = 2 2 .
(e−2β + e−β + 1) (e−2β − 1)

Let us denote a discrete Bilal distribution with parameter β as DB(β). Figure 1


presents graphs of the pmf, survival function, and hazard function of the discrete
Bilal (DB) distribution considering different values for β. We can note that the DB
distribution has an increasing hazard function.
The novelty of the present article consists in introducing the DB distribution to
model lifetime data in the presence of right-censored time-to-event data. We also
consider the inclusion of a cure fraction in the model. This paper is organized as follows.
Section 2 presents the maximum-likelihood (ML) estimation for the parameter of the
DB distribution based on complete and censored data. ML estimation in the presence
of censored data and a cure fraction is also discussed in this section. In addition,
Section 2 presents a Bayesian framework for the model. Three examples considering
real data from the medical literature are used in Section 3 to illustrate the usefulness
of this model to a broad range of problems. Finally, in Section 4, some concluding
remarks are presented. The computational codes used in this article are provided in
the Appendix.

3
(a) f(t), β = 0.05 (b) S(t), β = 0.05 (c) h(t), β = 0.05
0.05 1.0 ●●
0.10
●●●● ●● ●●
●● ●● ●● ●●● ●●● ●●●●
0.04 ● ●● 0.08 ●● ●● ●●
0.8 ● ● ● ●●
● ●● ●
●● ●●●
0.03 ●● ●● 0.06 ● ●●

S(t)
0.6

h(t)
● ●●

f(t)
●● ●●
●● ●● ●
0.02 ●● 0.04 ●
● ●● 0.4 ●● ●
●● ●● ●
●● ●●
0.01 ●●● 0.02

0.2 ● ●● ●● ●
●● ●● ●● ●
0.00 0.0 0.00

0 2 4 6 8 11 14 17 20 23 26 29 0 2 4 6 8 11 14 17 20 23 26 29 32 35 0 2 4 6 8 11 14 17 20 23 26 29 32 35

(d) f(t), β = 0.1 (e) S(t), β = 0.1 (f) h(t), β = 0.1


0.10 1.0 0.20

● ● ● ● ● ● ● ●
0.08 ● ● ● ● ● ●
● 0.8 ●
0.15 ● ● ●
● ● ● ●
● ● ● ● ●
0.06 ●

S(t)
0.6

h(t)
● ●
f(t)

0.10 ●
● ● ●
0.04 ● 0.4 ●
● ● ●
● ●
● ● ● ● ● 0.05
0.02 ● ● 0.2 ● ●
● ● ● ● ● ● ●
● ● ● ●
0.00 0.0 0.00

0 2 4 6 8 10 12 14 16 18 20 0 2 4 6 8 10 12 14 16 18 20 0 2 4 6 8 10 12 14 16 18 20

(g) f(t), β = 0.16 (h) S(t), β = 0.15 (i) h(t), β = 0.15


0.15 1.0 ● ● ● ● ●
● 0.25 ● ● ● ● ● ●
● ● ● ● ●
0.8 ● ● ●
● ● 0.20 ●
● ●
0.10 ● ●

S(t)
0.6

h(t)
● 0.15
f(t)


● ●
● 0.4 0.10
0.05 ● ● ●
● ●
● 0.2 ● 0.05 ●
● ● ● ●
● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ● ● ● ●
0.00 0.0 0.00

0 2 4 6 8 10 12 14 16 18 20 0 2 4 6 8 10 12 14 16 18 20 0 2 4 6 8 10 12 14 16 18 20

(j) f(t), β = 0.2 (k) S(t), β = 0.2 (l) h(t), β = 0.2


0.20 1.0 0.4
● ● ●
● ● ● ● ● ● ● ● ● ● ●
0.15 ● 0.8 0.3 ● ● ● ●
● ● ●

● ●
S(t)

0.6

h(t)

f(t)

0.10 ●
0.2 ●
● ●

0.4

0.05 ● ● 0.1 ●
● 0.2 ●
● ● ● ●
● ● ● ● ● ● ● ● ● ●
0.00 ● ● ● ● ● ● 0.0 ● ● ● ● ● ● 0.0

0 2 4 6 8 10 12 14 16 18 20 0 2 4 6 8 10 12 14 16 18 20 0 2 4 6 8 10 12 14 16 18 20

(m) f(t), β = 0.5 (n) S(t), β = 0.5 (o) h(t), β = 0.5


0.4 1.0 0.8
● ●
0.8 ● ● ● ● ● ● ● ●
0.3 0.6 ●
● ●
S(t)

0.6

h(t)
f(t)

0.2 ●
0.4
0.4

0.1 ●
0.2
0.2

● ●
0.0 ● ● ● ● ● ● 0.0 ● ● ● ● ● ● ● 0.0

0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10 1 2 3 4 5 6 7 8 9 10

Figure 1. The pmf, survival function and hazard function of DB(β) for different values of β.

2. Methods

2.1. Maximum likelihood estimation for complete data


After some algebra we can see that the equation (3) is equivalent to

f (t) = P (T = t) = p2t (p − 1) 2pt (p2 + p + 1) − 3p − 3 .


 
(4)

Let T1 ,...,Tn be a random sample of failure times from a DB survival distribution.


Considering p = e−β and the expression (4), the likelihood function for the parameter
β is given by
n
Y h i
L( β| t) = e−2βti (e−β − 1) 2e−βti (e−2β + e−β + 1) − 3e−β − 3 ,
i=1

and the corresponding log-likelihood function is given by


n
X n
X h i
`( β| t) = −2β ti + n log(e−β − 1) + log 2e−βti (e−2β + e−β + 1) − 3e−β − 3 .
i=1 i=1

4
By deriving the log-likelihood function with respect to β, we have the following equa-
tion:
n n
2e−β(ti −1) 1 + 2e−β + 2ti e−βti e−2β + e−β + 1 − 3e−β
 
d` X ne−β X
= −2 ti − −β − .
dβ e −1 2e−βti (e−2β + e−β + 1) − 3e−β − 3
i=1 i=1
(5)
The maximum-likelihood (ML) estimator βbM L for β is obtained by equating the right-
hand side of (5) to zero and solving for β. Nevertheless, the resulting expression has not
a closed-form solution and so numerical methods are needed to find the ML estimate
for β. In this article, we use the maxLik package in R program to obtain the ML
estimate of the parameter β (Henningsen and Toomet, 2011). A confidence interval
for β can be constructed from the asymptotic normality of the ML estimate considering
large sample sizes, given by
 
βbM L ∼ N β, Vd
ar(βbM L ) ,

where, in the single-parameter case, Vd


ar(βbM L ) is the estimated variance for βbM L .
Therefore, an approximate 100(1 − υ)% Wald-type confidence interval (CI) for β is
given by
q
βbM L ∓ zυ/2 Vd
ar(βbM L ),

where zυ denotes the upper υ-th percentile of the standard normal distribution. The
asymptotic variance of a ML estimator can be estimated by the negative of the inverse
of the second derivative of the log-likelihood function evaluated at βbM L . Thus, we have
n
d2 ` ne−β X Ai
2
= − 2 + 2, (6)
dβ −β
(e − 1) −βt −2β + e + 1) − 3e−β − 3]
−β
i=1 [2e
i (e

where

Ai = −2e−β(ti −1) (3 − 2e−βti + 6e−2β + 12e−β ) + 2ti e−β(ti −1) (3 + 9e−β − 6e−2β + 4e−βti )
   
+6ti e−βti 2e−2β − e−2β − e−β + ti + 3e−β 2e−βti − 2e−2β e−βti − 3
 
+4e−2β (6ti − e−β + 6ti e−β + 4ti e−2β − 4) + 6t2i e−β(1+ti ) 2e−β + e−2β + 2 .

The negative second derivative of the log-likelihood function is the observed informa-
tion denoted by In (β), that is,

d2 `
In (β) = − .
dβ 2

The expected value of In (β), say in (β), is called the expected Fisher information.
The asymptotic variance of βbM L is given by the inverse of the expected information
evaluated at the ML estimate of β, that is,
h i−1
Vd
ar(βbM L ) = in (βbM L ) .

5
2.2. Maximum-likelihood estimation in presence of censored data
Considering a random sample (ti , di ) of size n, i = 1, · · · , n, the contribution of the
ith individual to the likelihood function is given by

Li = [f (ti )]di [S(ti )]1−di ,

where di is a censoring indicator variable, that is, di = 1 for an observed survival


time and di = 0 for a right-censored survival time. Assuming the data with a DB
distribution, the likelihood function for the parameter β is given by
n
Y h idi
L( β| t, d) = e−2βti di (e−β − 1)di 2e−βti (e−2β + e−β + 1) − 3e−β − 3
i=1
h i(1−di )
× 3 − 2e−β(ti +1) e−2β(ti +1)(1−di ) ,

and the corresponding log-likelihood function is


n
X n
X
`( β| t, d) = 2β ti di + log(e−β − 1) di
i=1 i=1
n
X h i
+ di log 2e−βti (e−2β + e−β + 1) − 3e−β − 3
i=1
Xn h i n
X
+ (1 − di ) log 3 − 2e−β(ti +1) − 2β (ti + 1) (1 − di ) .
i=1 i=1

By deriving the log-likelihood function with respect to β, we have


n n
d` X e−β X
= 2 ti di − −β di
dβ e −1
i=1 i=1
n
2e−βti e−β + 2e−2β + ti + ti e−2β + ti e−β − 3e−β

X
− di
2e−βti (e−2β + e−β + 1) − 3e−β − 3
i=1
n n
X (ti + 1)e−β(ti +1) X
−2 (1 − di ) − 2 (ti + 1) (1 − di ) .
i=1
2e−β(ti +1) − 3 i=1

Setting this expression equal to zero, we get the corresponding score equation whose
numerical solution leads to the ML estimator. The second derivative of the log-
likelihood function with respect to β is given by
n n
d2 ` X e−β X Bi
= − di 2 − di 2
dβ 2 (e−β − 1) [2e−βti (e−2β + e−β + 1) − 3e−β − 3]
i=1 i=1
n
X e−β(ti +1) (ti + 1)2
−6 (1 − di )  2 ,
i=1 2e−β(ti +1) − 3

6
where
 
Bi = 6t2i e−βti e−3β + 2e−2β + 2e−β e−βti + 1
   
+12ti e−βti e−2β e−β + 2 + 6e−βti e−β e−2β + 4e−β + 2
 
−4e−2βti e−β e−2β + 4e−β + 1 − 9e−β .

Approximated CI for β also could be obtained based on the asymptotical normality


of the ML estimate for β in similar way as described in the previous subsection.
Given that the time-to-event variable is discrete, randomized quantile residuals can
be used to test model adequacy (Dunn and Smyth, 1996). These residuals are given
by ri = Φ−1 (ui ), i = 1, ..., n, where Φ(·) is the standard
h normal distribution function i
and ui is an uniform random variable on the interval F (ti − 1, βbM L ), F (ti , βbM L ) if
h i
di = 1 and F (ti , βbM L ), 1 if di = 0. Let us consider that F (ti , βbM L ) is the cumulative
distribution function of the DB distribution based on the ML estimate for β. The
randomized quantile residuals are expected to follow the standard normal distribution
if the model is correct. Hence, a normal Q-Q plot can be used to visually check the
normality of these residuals.

2.3. Maximum-likelihood estimation including censored data and a cure


fraction
A fundamental characteristic of the traditional survival analysis methods is that the
survival function S(t) converges to zero when the time variable tends to infinity. In
the applications of survival methods to medical data, this implies assuming that all
individuals under study are susceptible to the event of interest. However, there are
situations where this assumption is not satisfied (Othus et al., 2012). For example,
in randomized trials evaluating the efficacy of a treatment for a disease of interest,
it is possible that some patients may be cured of the disease due to the treatment
under study. If the event of interest is death due to this disease, these patients are
no longer subject to this event. The presence of cured individuals in a data set is
usually suggested by a of stable plateau at the right tail of the Kaplan–Meier non-
parametric estimator of the survival function, with heavy censoring in this portion of
the plot (Corbière et al., 2009). Different parametric and non-parametric approaches
that consider the presence of immune individuals have been proposed in the literature
(Maller and Zhou, 1996; Amico and Van Keilegom, 2018; Peng and Yu, 2021). These
approaches include the mixture model, which explicitly includes a parameter account-
ing for a fraction of immune individuals (Lambert, 2007; Martinez et al., 2013). This
model assumes that the probability of observing a survival time greater than or equal
to some fixed value t is given by the survival function

S(t) = η + (1 − η)S0 (t),

where η is the proportion of immune, cured or not susceptible individuals, and S0 (t)
is the baseline survival function for the susceptible individuals (Farewell, 1982). Con-
sidering a random sample (ti , di ) of size n, i = 1, · · · , n, the contribution of the ith

7
individual to the likelihood function is given by

Li = [f (ti )]di [S (ti )]1−di = [(1 − η)f0 (ti )]di [η + (1 − η)S0 (t)]1−di ,

where di is a binary censoring indicator variable and f0 (t) is the corresponding baseline
probability function. Assuming the mixture model based on the DB distribution, the
likelihood function for β and η is given by
n
Y h id i
L( β, η| t, d) = (1 − η)di e−2βdi ti (e−β − 1)di 2e−βti (e−2β + e−β + 1) − 3e−β − 3
i=1
n h i o1−di
× η + (1 − η) 3 − 2e−β(ti +1) e−2β(ti +1) .

The log-likelihood function in this case is


n
X n
X n
X
`( β, η| t, d) = di log(1 − η) − 2β di ti + di log(e−β − 1)
i=1 i=1 i=1
n
X h i
+ di log 2e−βti (e−2β + e−β + 1) − 3e−β − 3
i=1
n
X n h i o
+ (1 − di ) log η + (1 − η) 3 − 2e−β(ti +1) e−2β(ti +1) .
i=1

The first derivative of the log-likelihood function with respect to β is given by


n n
∂` X X e−β
= −2 di ti − di
∂β e−β − 1
i=1 i=1
n −βt ti + e−β 2e−β + ti e−β + ti + 1 − 3e−β
 
X 2e i

− di
2e−βti (e−2β + e−β + 1) − 3e−β − 3
i=1
n
(ti + 1) e−2β(ti +1) e−β(ti +1) − 1
 
X
+6 (η − 1) (1 − di ) n  1 o,
η + (1 − η) e−2β(ti +1) 3+2 e −2β(ti +1) 2
i=1

and the first derivative of the log-likelihood function with respect to η is given by

n n  −β(t +1) 2
+ 1 e−β(ti +1) − 1

∂` 1 X X 2e i

=− di + (1 − di ) n 1 o.
∂η 1−η 
η + (1 − η) e−2β(ti +1) 3 + 2 e−2β(ti +1) 2
i=1 i=1

Setting these expressions equal to zero and solving them simultaneously we get the ML
estimators of the parameters β and η. Although we cannot obtain explicit expressions
for the ML estimators for these parameters, they can be estimated numerically using
iterative algorithms such as the Newton-Raphson method and its variants.
The second partial derivatives of the ML function are given as follows:
n n
∂2` X e−β X Ci
2
= − d i 2 + 6 (η − 1) (1 − di ) (ti + 1)2 e−2β(ti +1) ,
∂β (e−β − 1) Di
i=1 i=1

8
where
( )
 −β ti +1

Ci = η 2 − 3e + 5 (η − 1) e−3β(ti +1)

and

Di = η 2 + 6η (1 − η) e−2β(ti +1) + 4ηe−3β(ti +1) + 9 (1 − 2η) e−4β(ti +1) + 4 (1 − η)2 e−6β(ti +1)
(
−β ti +1 )
+12 (1 − 2η) e−4β(ti +1) e + 9η 2 e−4β(ti +1) − 4η 2 e−3β(ti +1) + 12η 2 e−5β(ti +1) ,

n n  −β(t +1) 2 
+ 1 e−β(ti +1) − 1 3e−2β(ti +1) + 2e−3β(ti +1) − 1
 
∂2` 1 X X 2e i

= di + (1−di ) ,
∂η 2 (η − 1)2 i=1 i=1
Ei

where

Ei = η 2 + 6η (1 − η) e−2β(ti +1) + 4η 1 − η 2 e−3β(ti +1)



h i
+ (1 − η)2 9e−4β(ti +1) + 12e−5β(ti +1) + 4e−6β(ti +1) ,

and
n
(ti + 1) e−2β(ti +1) e−β(ti +1) − 1
 
∂2` X
=6 (1 − di ) ,
∂β∂η Fi
i=1

where
h i h i
Fi = η 2 +2η (1 − η) 3e−2β(ti +1) + 2e−3β(ti +1) +(1 − η)2 9e−4β(ti +1) + 12e−5β(ti +1) + 4e−6β(ti +1) .

The asymptotic multivariate normal distribution of the ML estimators βbM L and


ηbM L is denoted by
     
βbM L β V11 V12
∼N , ,
ηbM L η V21 V22

where V11 is the variance of βbM L , V22 is the variance of ηbM L , and V12 = V12 is the
covariance between βbM L and ηbM L . Approximate 100(1 − υ)% Wald-type CIs for β and
η are, respectively, given by
q q
βbM L ∓ zυ/2 Vd
ar(βbM L ) and ηbM L ∓ zυ/2 Vd
ar(ηbM L ),

where zυ denotes the upper υ-th percentile of the standard normal distribution. The
asymptotic variances of the ML estimators are given by the elements of the inverse of
the Fisher’s information matrix. The expected information matrix is given by

 2   2  
∂ ` ∂ `
−E ∂β 2 −E ∂β∂η 
in (β, η) =     ,
∂2` ∂2`
−E ∂β∂η −E ∂η 2

9
where the derivatives are provided above. R code for implementing this procedure is
presented in the Appendix.

2.4. Bayesian analysis


The Bayesian approach is an alternative to the ML estimation of the parameters. In
the Bayesian inference it is necessary to specify a prior distribution for each unknown
parameter (Gelman et al., 2013). From the Bayes’ theorem, the posterior distribution
of a particular parameter under the model specification is proportional to its prior
distribution multiplied by the likelihood of the data. Considering the discrete Bilal
distribution, we can assume a gamma prior distribution to the β parameter. That is,
β ∼ Gamma(aβ , bβ ), where aβ and bβ are known hyperparameters and Gamma(a, b)
denotes a gamma distribution with mean a/b and variance a/b2 . In the case of the
model in the presence of a cure fraction, we can consider a prior η ∼ Beta(aη , bη )
to obtain a Bayesian estimate of η, where aη and bη are known hyperparameters and
Beta(a, b) denotes a beta distribution with mean a/(a+b) and variance ab/[(a+b)2 (a+
b + 1)]. Further, we assumed prior independence between the parameters β and η.
In this article, posterior summaries of interest were obtained using standard Markov-
chain Monte Carlo (MCMC) procedures as the Gibbs sampling. The simulation algo-
rithm generated 1,005,000 samples of the joint posterior distribution of interest with a
burn-in phase of 5,000 simulated samples to eliminate the effect of the initial values in
the iterative procedure and considered a thinning interval of size 200 to have approx-
imately independent samples. The Bayes estimates of the parameters were obtained
as the mean of samples drawn from the joint posterior distribution. The convergence
of the simulated sequences was monitored by using traceplots and the Geweke diag-
nostic (Geweke, 1992). The Geweke convergence diagnostic is based on a z score that
compares the difference in the two means of non-overlapping sections of a simulated
Markov chain, divided by the asymptotic standard error of the difference. This z score
asymptotically follows a standardized normal distribution, so we obtain convergence
for a chain if its correspondent absolute z score is less than 1.96. Posterior summaries
of interest were obtained using the MCMCpack package of the R software (Martin and
Quinn, 2006). See Appendix for details about the R code used in this article.

3. Examples

In this section, we illustrate the estimation procedure proposed here with three ex-
amples from the literature. We compare the fits of the discrete Bilal distribution with
some competitive models such as DsFx-I (Eliwa and El-Morshedy, 1996), discrete
Lindley (Gómez-Déniz and Calderı́n-Ojeda, 2012), discrete Rayleigh (Roy, 2004), and
discrete Burr-Hatke (El-Morshedy et al., 2020) distributions. All these distributions
have only one parameter to estimate. The fitted models are compared using Akaike
and Bayesian information criteria (AIC and BIC). When the sample size is small,
many authors have suggested using the corrected AIC (AICC) as an alternative to
AIC (Hurvich and Tsai, 1989).

10
1.0

● ●
● ●
● ● ● Kaplan−Meier estimates
● ●
● ● DB model (AIC 132.6, BIC 133.7, AICC 132.8)
● ● ●

0.8

● DsFx−I model (AIC 135.3, BIC 136.3, AICC 135.5)
● ● ● ● Discrete Lindley model (AIC 133.2, BIC 134.3, AICC 133.4)

Survival function ● ● ● ● Discrete Rayleigh model (AIC 136.0, BIC 137.0, AICC 136.2)

0.6
● ● ● ● Discrete Burr−Hatke model (AIC 178.8, BIC 180.0, AICC 179.0)
● ●
● ●
● ● ●
● ●
0.4


● ●


● ● ●

● ●
● ●

0.2

● ●
● ●

● ●
● ●
● ●
● ●
● ●
● ●
● ●
● ● ● ● ●
● ● ●
● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ●
● ● ●
● ● ●


0.0

0 5 10 15 20 25

Time (weeks)

Figure 2. Survival function for the acute leukemia patients’ data estimated by the Kaplan-Meier method
and by using the models based on the DB, DsFx-I, discrete Lindley, discrete Rayleigh, and discrete Burr-Hatke
distributions.

(a) DB model (b) DsFx−I model (c) Discrete Lindley model


● ●

1.5
● ●

Sample Quantiles

Sample Quantiles

1.0

● ●
●● ●●

0.5
●● ●●

● ●● ●●●●
●●
0.0

1

−0.5
●● ●●
●● ●●

● ● ●

K−S, p−value 0.9238
● ● ● ●
−1.0

−1.5
● K−S, p−value 0.7939
Sample Quantiles


● ●


● ● ● −2 −1 0 1 2 −2 −1 0 1 2
0

Theoretical Quantiles Theoretical Quantiles



● (d) Discrete Rayleigh model (e) Discrete Burr−Hatke model
● ●

2.0

2


● ●
Sample Quantiles

Sample Quantiles
● ●
● ●● ● ●
−1

● ●●●●

1.0
● ●● ● ●●
●● ●●
0

● ●

●●● ●

K−S, p−value 0.8474 ●


−1

0.0
● ●
● ●●

−2


● K−S, p−value < 0.0001
−2

● ● K−S, p−value 0.1068


−1.0
−3

−2 −1 0 1 2 −2 −1 0 1 2 −2 −1 0 1 2

Theoretical Quantiles Theoretical Quantiles Theoretical Quantiles

Figure 3. Randomized quantile residuals for the models based on the (a) DB, (b) DsFx-I, (c) discrete
Lindley, (d) discrete Rayleigh, and (e) discrete Burr-Hatke distributions, fitted to data from patients with
acute leukemia.

3.1. Patients with acute leukemia


In this subsection, a numerical example with complete data is presented to illustrate
the applicability of the discrete Bilal distribution. A total of 97 patients with acute
leukemia participated in a clinical trial investigating the effect of 6-mercaptopurine on
the duration of steroid-induced remissions (Freireich et al., 1963). The remission times
for the n = 21 patients treated with placebo were 1, 1, 2, 2, 3, 4, 4, 5, 5, 8, 8, 8, 8, 11,
11, 12, 12, 15, 17, 22, and 23 weeks. Using the maxLik package of the R software, we
obtained an ML estimate of βbM L = 0.09085 for the parameter β of the discrete Bilal
distribution, with a standard error of 0.01431. An approximate 95% Wald-type CI for
β is (0.0628, 0.1189). Figure 2 compares the survival function estimated by the Kaplan-
Meier method and fitted by parametric models based on the discrete Bilal, DsFx-I,
discrete Lindley, discrete Rayleigh, and discrete Burr-Hatke distributions. Figure 2 also
shows the corresponding AIC, BIC, and AICC values. Figure 3 shows the resulting

11
residual analysis and the corresponding p-values for the Kolmogorov-Smirnov (K-S)
test for normality. The model based on the DB distribution appears to fit the data
reasonably well, as does the model based on the Lindley distribution. Figures 2 and
3 suggest that the discrete Rayleigh and discrete Burr-Hatke distributions do not fit
the data well.
(a) Traceplot (b) Posterior density (c) ACF plot

1.0
30

0.14
25

0.8
0.12 20

0.6
Density

ACF
0.10 15
0.09111

0.4
0.08 10

0.2
0.06 5

0.0
0 ● ● ●
0.04
0e+00 2e+05 4e+05 6e+05 8e+05 1e+06 0.04 0.06 0.08 0.10 0.12 0.14 0.16 0 5 10 15 20
Iterations Lag

Figure 4. Posterior samples for the model parameters of the DB distribution applied to data from patients
with acute leukemia. (a) Traceplot of posterior samples, (b) histogram and posterior density with the corre-
spondent 95% HDI (blue line), and (c) auto-correlation function (ACF) plot for the posterior samples of the
model parameters.

For a Bayesian data analysis, it was assumed an approximately non-informative


gamma prior distribution for the parameter β of the DB distribution, that is,
β ∼ Gamma(0.001, 0.001). Posterior samples for β are described in Figure 4. The
traceplot shown in panel (a) shows the evolution of the MCMC draws over the it-
erations, indicating that the generated samples reached good convergence. The plot
of the autocorrelation function (ACF) shows that the posterior samples are uncorre-
lated (panel (c)). The corresponding Geweke z-score is 0.183, also suggesting satis-
factory convergence of the samples to a stable distribution. The posterior mean for
β is βbBayes = 0.09111, and the corresponding 95% HDI (highest density interval) is
(0.0650, 0.1203). The 95% HDI is plotted on the histogram shown in panel (b) of Fig-
ure 4. We can note that the ML and the Bayesian estimates are fairly close to each
other.

3.2. Hospitalized patients with COVID-19


The study by (Paranjpe et al., 2020) assessed the association between administration
of in-hospital anticoagulation and survival in a large cohort of hospitalized patients
with COVID-19 in the Mount Sinai Health System, New York City. In this example,
we consider a subsample of n = 161 patients who required mechanical ventilation and
were not treated with in-hospital systemic anticoagulation. The variable of interest is
the time from admission to death, in days. We have 15 (9.3%) censored observations.
As the data were available in figures and not in numerical form, we used the open-
source software WebPlotDigitizer, a web based tool to extract numerical data from
images (Drevon et al., 2017; Rohatgi, 2020). In this example, the ML estimate for
the parameter β of the discrete Bilal distribution is βbM L = 0.07047 (standard error
0.00413, 95% Wald-type CI 0.0624 to 0.0786). Figure 5 compares the survival function

12
1.0

● ● ●
● ● ●
● ● ● Kaplan−Meier estimates

● ● ● ● DB model (AIC 1019.9, BIC 1023.0, AICC 1019.9)
● ●

0.8
● ● DsFx−I model (AIC 1019.0, BIC 1022.1.3, AICC 1019.0)
● ●
● ●
● ●
● Discrete Lindley model (AIC 1014.6.2, BIC 1017.6, AICC 1014.6)
● ●
● Discrete Rayleigh model (AIC 1054.6, BIC 1057.7, AICC 1054.6)
Survival function ● ●
● ●

0.6
● ●
● ● ● Discrete Burr−Hatke model (AIC 1359.3, BIC 1362.3, AICC 1359.3)
● ● ●
● ● ●
● ●

● ●
0.4



● ●
● ● ●

● ● ●
● ●
● ●

● ●
0.2

● ●
● ●
● ●
● ●
● ●
● ●
● ● ●
● ● ●
● ● ●
● ● ● ● ●
● ● ● ● ●
● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ●
0.0

0 5 10 15 20 25 30

Days since admission

Figure 5. Survival function for the COVID-19 patients’ data estimated by the Kaplan-Meier method and
by using the models based on the DB, DsFx-I, discrete Lindley, discrete Rayleigh, and discrete Burr-Hatke
distributions.

estimated by the Kaplan-Meier method and fitted by parametric models based on the
DB and other distributions. Figure 6 shows the randomized quantile residuals from
the fitted models. We note that the DB distribution fitted the data as well as the
DsFx-I and the discrete Lindley distributions, but the models based on the discrete
Rayleigh and discrete Burr-Hatke distributions did not fit the data well.
In the Bayesian analysis, as in the previous example, we assumed a gamma prior dis-
tribution for the parameter β given by β ∼ Gamma(0.001, 0.001). Figure 7 describes
the posterior samples for β. The traceplot in panel (a) shows that the MCMC algorithm
has stabilized, and the corresponding Geweke z-score is 0.033, also suggesting satisfac-
tory convergence. Panel (b) describes the shape of the posterior distribution and shows
the 95% HDI. The posterior mean for β is βbBayes = 0.07051, and the corresponding
95% HDI is (0.0626, 0.0787). This MCMC estimate is closer to the corresponding ML
estimate. The ACF plot shows that autocorrelations are not significantly different from
zero (panel (c)).

3.3. Recurrence rates of pelvic tumors with marginal or intracapsular


margins
In this example, we consider a model for survival data with a cure fraction based on
the DB distribution. Let us consider the data from a study undertaken at the Muscu-
loskeletal Oncology Center of the First Affiliated Hospital of Sun Yat-Sen University,
China, between 2003 and 2013 (Wang et al., 2015). The objective of this study was to
evaluate the effectiveness of reconstruction with a modular hemipelvic endoprosthesis
after pelvic tumor resection. The recurrence times of pelvic tumors with marginal or
intracapsular margins were 3, 7, 11+, 18, 22+, 25, 28, 32+, 34+, 35, 35+, 36+, 40+,
40+, 41, 54+, 66+, 76+, 84+, 88+, and 92+ months, where + denotes a censored
observation. By applying the model described in subsection 2.3 to these data, we ob-
tained the ML estimates βbM L = 0.02859 (standard error 0.01047, 95%CI 0.0081 to
0.0491) and ηbM L = 0.57985 (standard error 0.13965, 95%CI 0.3061 to 0.8536). Figure
8 compares the Kaplan-Meier estimates and the ML estimates of the survival function
corresponding to the model based on DB distribution and the concurrent discrete dis-

13
(a) DB model (b) DsFx−I model (c) Discrete Lindley model
● ●
● ●
● ●

2
● ●● ●●
●●●

2
●●

Sample Quantiles

Sample Quantiles
●● ●
●●
●●●
●●●●
● ●
● ●●
●●
●●
●● ●

●●●


●●
●●●
●●
● ● ●●

1
●● ●

1


●●
● ●●

●● ●



●●
● ●

●●


●●

2

●●
● ●

●●

● ●

●●
●●

●●
●●

●●

●●
●●

●●
●●
●●
● ●●
● ●
●●
● ●

0

●●●●●●

● ●

0

● ●
●●



●●

● ●



● ●●

●● ●●


●●

●●● ●●



●●

●●

● ●●

●●
● ●
●●


●●



●●

●●

● ●
●●

−1

−1
●●● ●

●●
●●
●●●●●● ●
●●
●●



●●
●●






●●

●●
●● ●●

● ●●● K−S, p−value 0.1712 ●
●●




● ●● ●●●

1
● ●●●

−2
●●

−2




● K−S, p−value 0.5905

●●●

●● ●
Sample Quantiles
● ● ●

●●
●●

●●
●●●

● ●




●●●
● −2 −1 0 1 2 −2 −1 0 1 2

●●

●●




●●



0

●●

●●

●●






● Theoretical Quantiles Theoretical Quantiles
●●




●●


●●



●●






●●
●●

● (d) Discrete Rayleigh model (e) Discrete Burr−Hatke model

4

●●
●●

4

−1

●●

● ●

Sample Quantiles

Sample Quantiles

3

3
● ●

● ●●

●● ●●●
●●● ●●●●
●●

2
●● ● ●●

2
● ●●
●●

●● ●●

●●
●●
● ●
● ●●
●●
●●


●●
● ●●●●

●●
●● ●●
●● ●●


●●
●●●
● ●●

●●
● ●●

●●

●●
● ●


●●


●● ●●
●●
●●
●●●

●●

1
●● ●
●●
●●

●●●

●●●

1

● ●
●●


●●●


●●


●● ●


●●●
−2




● ●●
●●

●●





● ●●
●●
●●




● ●●
● ● ●

−1 0
●●
●●

●● ●●●●

●●
● ●

●●



● ●
● ●



●●


−1

●●

● ●
●●


●●


●●

● ●

● ●









●●

●●


●●

●● K−S, p−value < 0.0001

●●●●●●
● K−S, p−value < 0.0001
−3

● K−S, p−value 0.4581 ●●●●

−3

−3
● ●

−2 −1 0 1 2 −2 −1 0 1 2 −2 −1 0 1 2

Theoretical Quantiles Theoretical Quantiles Theoretical Quantiles

Figure 6. Randomized quantile residuals for the models based on the (a) DB, (b) DsFx-I, (c) discrete Lindley,
(d) discrete Rayleigh, and (e) discrete Burr-Hatke distributions, fitted to data from hospitalized patients with
COVID-19.

(a) Traceplot (b) Posterior density (c) ACF plot

1.0
0.085 100

0.080

0.8
80

0.075

0.6
60
Density

0.07051

ACF
0.070

0.4
40
0.065

0.2
20
0.060 0.0

0 ● ● ●

0e+00 2e+05 4e+05 6e+05 8e+05 1e+06 0.055 0.060 0.065 0.070 0.075 0.080 0.085 0 5 10 15 20
Iterations Lag

Figure 7. Posterior samples for the parameter of the DB distribution applied to data from hospitalized
patients with COVID-19. (a) Traceplot of posterior samples, (b) histogram and posterior density with the
correspondent 95% HDI (blue line), and (c) auto-correlation function (ACF) plot for the posterior samples of
the model parameter.

tributions. We can note that the results provided by models based on DB and discrete
Lindley distributions are almost identical and are overlapping on the graph. Figure 9
describes the randomized quantile residuals from the fitted models. The model based
on the DB distribution appears to fit the data reasonably well, as does the mod-
els based on the DsFx-I and the Lindley distribution. Models based on the discrete
Rayleigh and discrete Burr-Hatke distributions did not fit the data well.
Assuming the Bayesian model introduced in Section 2.4, we assumed prior dis-
tributions β ∼ Gamma(0.001, 0.001) and η ∼ Beta(1, 1), that are approximately
non-informative priors for the model parameters. Posterior means for β and η are
βbBayes = 0.02597 (95% HDI 0.0082 to 0.0464) and ηbBayes = 0.51048 (95% HDI 0.1879
to 0.8286), respectively. Figure 10 describes the posterior samples for the parameters
β and η. Panels (a) and (d) show that the MCMC chains reached satisfactory conver-
gence for both parameters and the Geweke Z scores for β and η chains are -0.060 and
-0.729, respectively. Panels (b) and (e) describe the posterior densities and the 95%
HDI. Panels (c) and (f) shows that autocorrelations within each chain were reasonably

14
1.0

●●●●●●●●●
●●●●● ●●●
●●●●●● ●●
●●
●●
●●●●● ●●●
●● ●●
●● ●●●●●
●● ●
● ●●●●●●●
●●●

0.8
●● ●
●●●
●●
●●
● ●● ●●
●●
● ●●
●●●●
●● ●●●● ●●
●●●● ●●● ●●●●
Survival function
●●●●●●●●●●●● ●●
●●●● ●
●● ●●●● ●●●●●●
●●●●●●●●●●●●●●●●● ● ●● ●●
● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
● ●●
●●●●
●●
●●●●●
●●●●●●●●
●●●●●
●●● ●●●
●●●
●●

0.6
●●● ●●● ● ●●●●●●
●● ●●●●●●
●●●●●●●●●●●●●●●● ●●
●●●●
●●●●●●●●●●
●●●●
●●●●
●●●●●●
●● ● ● ● ●● ●● ●● ●●
●●

Kaplan−Meier estimates
0.4

● DB model (AIC 84.2, BIC 86.3, AICC 84.4)


● DsFx−I model (AIC 84.7, BIC 86.8, AICC 84.9)
● Discrete Lindley model (AIC 84.1, BIC 86.2, AICC 84.4)
0.2

● Discrete Rayleigh model (AIC 83.9, BIC 86.0, AICC 84.1)


● Discrete Burr−Hatke model (AIC 112.8, BIC 114.9, AICC 113.0)
0.0

0 20 40 60 80 100

Follow up (months)

Figure 8. Survival function for the pelvic tumors patients’ data estimated by the Kaplan-Meier method and
by using the models based on the DB, DsFx-I, discrete Lindley, discrete Rayleigh, and discrete Burr-Hatke
distributions.

(a) DB model (b) DsFx−I model (c) Discrete Lindley model

1.5
● ●

2


Sample Quantiles

Sample Quantiles

1.0

●● ●

0.5
● ● ●
●● ●

● ●●●●

●●●
0.0
●● ●

● ● ●●

−0.5
●●
● ● ● ●

1


● ●
−1.0


K−S, p−value 0.1004 K−S, p−value 0.127

−1.5
Sample Quantiles

● ● ● ● ●

● −2 −1 0 1 2 −2 −1 0 1 2

0

● ● Theoretical Quantiles Theoretical Quantiles


● ●


● (d) Discrete Rayleigh model (e) Discrete Burr−Hatke model

3
● ●


Sample Quantiles

Sample Quantiles
1
−1


2
● ●
● ●
●●●
●● ●
●●
0

1


● ● ●●
● ●● ● ●
●●

0
● ●●●
−1


● ● ● ● ●●

−2

K−S, p−value 0.9857 ●


K−S, p−value < 0.0001 −1
K−S, p−value < 0.0001
−2


−2

−2 −1 0 1 2 −2 −1 0 1 2 −2 −1 0 1 2

Theoretical Quantiles Theoretical Quantiles Theoretical Quantiles

Figure 9. Randomized quantile residuals for the models based on the (a) DB, (b) DsFx-I, (c) discrete Lindley,
(d) discrete Rayleigh, and (e) discrete Burr-Hatke distributions, fitted to data from patients with pelvic tumors.

low.

4. Concluding Remarks

The literature contains few articles on the DB distribution introduced by (Altun et al.,
2020). Hence the contribution of the present article is the introduction of parameter
estimation for DB distribution considering the inclusion of right-censored data and a
cure fraction. Three applications to real datasets show that the model based on DB
distribution performs at least as good as some other traditional discrete models as
the DsFx-I, discrete Lindley, discrete Rayleigh, and discrete Burr-Hatke distributions.
Therefore, the model based on DB distribution showed to be a suitable way to analyze
discrete survival data, even including the presence of immune individuals. Moreover,
the model can be easily implemented in computational programs as R, as showed in

15
(a) Traceplot (b) Posterior density (c) ACF plot
40
0.08

0.8
0.06 30

Parameter β

Density

ACF
20

0.4
0.04

0.02597
0.02 10

0.0
0 ● ● ●
0e+00 4e+05 8e+05 0.00 0.02 0.04 0.06 0.08 0 5 10 15 20
Iterations Lag

(d) Traceplot (e) Posterior density (f) ACF plot


3.0
0.8

0.8
2.5
Parameter η

0.6 2.0

Density
0.51048

ACF
1.5

0.4

0.4
1.0
0.2 0.5

0.0
0.0 0.0 ● ● ●
0e+00 4e+05 8e+05 0.0 0.2 0.4 0.6 0.8 0 5 10 15 20
Iterations Lag

Figure 10. Posterior samples for the model parameters of the DB distribution applied to data from patients
with pelvic tumors. (a) Traceplots of posterior samples, (b) histograms and posterior densities with the corre-
spondent 95% HDI (blue lines), and (c) auto-correlation function (ACF) plots for the posterior samples of the
model parameters.

the Appendix. Currently, we find many examples of application of cure rate models to
medical data (Gallardo et al., 2021; Leão, 2020; Rafati et al., 2020), which makes these
models attractive to be assumed in lifetime data analysis. The methods introduced in
this paper could be very helpful to researchers dealing with discrete survival data.

References

[1] Abd-Elrahman, A. M. (2013). Utilizing ordered statistics in lifetime distributions produc-


tion: A new lifetime distribution and applications, Journal of Probability and Statistical
Science, 11, 153–164.
[2] Abd-Elrahman, A. M. (2019). Reliability estimation under type-II censored data from the
generalized Bilal distribution, Journal of the Egyptian Mathematical Society, 27, 1-15.
[3] Abd-Elrahman, A. M. and Niazi, S. F. (2017). Approximate Bayes estimators applied to
the Bilal model, Journal of the Egyptian Mathematical Society, 25, 65-70.
[4] Alamatsaz, M.H., Dey, S., Dey, T., and Harandi, S.S. (2016). Discrete generalized Rayleigh
distribution, Pakistan Journal of Statistics, 32, 1–20.
[5] Altun, E., El-Morshedy, M., and Eliwa, M. S. (2020). A study on discrete Bilal distri-
bution with properties and applications on integer-valued autoregressive process, Revstat
Statistical Journal, 18, 70–99.
[6] Amico, M. and Van Keilegom, I. (2018). Cure models in survival analysis, Annual Review
of Statistics and Its Application, 5, 311–342.
[7] Cardial, M.R.P., Fachini-Gomes, J.B., and Nakano, E. Y. (2020). Exponentiated discrete
Weibull distribution for censored data, Brazilian Journal of Biometrics, 38, 35–56.
[8] Chakraborty, S. (2015). Generating discrete analogues of continuous probability
distributions-A survey of methods and constructions, Journal of Statistical Distributions
and Applications, 2, 1–30.
[9] Corbière, F., Commenges, D., Taylor, J. M., and Joly, P. (2009). A penalized likelihood
approach for mixture cure models, Statistics in Medicine, 28, 510–524.
[10] Drevon, D., Fursa, S. R., and Malcolm, A. L. (2017). Intercoder reliability and validity of
WebPlotDigitizer in extracting graphed data, Behavior Modification, 41, 323–339.
[11] Dunn, P. K. and Smyth, G. K. (1996). Randomized quantile residuals, Journal of Com-
putational and Graphical Statistics, 5, 236–244.

16
[12] Eliwa, M. S. and El-Morshedy, M. (2021). A one-parameter discrete distribution for over-
dispersed data: Statistical and reliability properties with applications, Journal of Applied
Statistics, 1–21.
[13] El-Morshedy, M., Eliwa, M. S. and Altun, E. (2020). Discrete Burr-Hatke distribution
with properties, estimation methods and regression model, IEEE Access, 8, 74359–74370.
[14] Farewell, V. T. (1982). The use of mixture models for the analysis of survival data with
long-term survivors, Biometrics, 38, 1041–1046.
[15] Flynn, R. (2012). Survival analysis, Journal of Clinical Nursing, 21, 2789–2797.
[16] Freireich, E. J., Gehan, E., Frei-3rd, E., Schroeder, L. R., Wolman, I. J., Anbari, R.,
Burgert, E. O., Mills, S. D., Pinkel, D., and Selawry, O. S. (1963). The effect of 6-
mercaptopurine on the duration of steroid induced remissions in acute leukemia - A model
for evaluation of other potentially useful therapy, Blood, 21, 699–716.
[17] Freitas, B. C. L., Oliveira-Peres, M. V., Achcar, J. A., and Martinez, E. Z. (2021). Classical
and Bayesian inference approaches for the exponentiated discrete Weibull model with
censored data and a cure fraction, Pakistan Journal of Statistics and Operation Research,
17, 467–481.
[18] Gallardo, D. I., Castro, M., and Gómez, H. W. (2021). An alternative promotion time
cure model with overdispersed number of competing causes: an application to melanoma
data, Mathematics, 9, 1815.
[19] Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., and Rubin, D. B.
(2013). Bayesian data analysis, 3rd ed., Chapman Hall.
[20] Geweke, J. (1992). Evaluating the accuracy of sampling-based approaches to the calcula-
tion of posterior moments, Bayesian Statistics, 4, 641—649.
[21] Gómez-Déniz, E. and Calderı́n-Ojeda, E. (2011). The discrete Lindley distribution: prop-
erties and applications, Journal of Statistical Computation and Simulation, 81, 1405–1416.
[22] Henningsen, A. and Toomet, O. (2011). maxLik: A package for maximum likelihood esti-
mation in R, Computational Statistics, 26, 443–458.
[23] Hurvich, C. M. and Tsai, C. L. (1989). Regression and time series model selection in small
samples, Biometrika, 76, 297—307.
[24] Lambert, P. C. (2007). Modeling of the cure fraction in survival studies, The Stata Journal,
7, 351–375.
[25] Le Rademacher, J. and Wang, X. (2021). Time-to-event data: An overview and analysis
considerations, Journal of Thoracic Oncology, 16, 1067–1074.
[26] Leão, J., Bourguignon, M., Gallardo, D. I., Rocha, R., and Tomazella, V. (2020). A
new cure rate model with flexible competing causes with applications to melanoma and
transplantation data, Statistics in Medicine, 39, 3272–3284.
[27] Maller, R. A. and Zhou, X. (1996). Survival analysis with long-term survivors, John Wiley
& Sons, New York.
[28] Martin, A. D. and Quinn, K. M. (2006). Applied Bayesian inference in R using MCMC-
pack, R News, 6, 2–7.
[29] Martinez, E. Z and Achcar, J. A. and Jácome, A. A. and Santos, J. S (2013). Mixture
and non-mixture cure fraction models based on the generalized modified Weibull distri-
bution with an application to gastric cancer data, Computer Methods and Programs in
Biomedicine, 112, 343–355.
[30] Myers, J. (2007). Survival analysis techniques in clinical research, The Journal of the
Kentucky Medical Association, 105, 545–550.
[31] Nakagawa, T. and Osaki, S. (1975). The discrete Weibull distribution, IEEE Transactions
on Reliability, 24, 300-301.
[32] Nekoukhou, V. and Bidram, H. (2015). The exponentiated discrete Weibull distribution,
SORT Statistics and Operations Research Transactions, 39, 127–146.
[33] Oliveira, R. P., Oliveira-Peres, M. V. Martinez, E. Z., and Achcar, J. A. (2019). Use of a
discrete Sushila distribution in the analysis of right-censored lifetime data, Model Assisted
Statistics and Applications, 14, 255–2681.
[34] Othus, M., Barlogie, B., LeBlanc, M. L., and Crowley, J. J. (2012). Cure models as a

17
useful statistical tool for analyzing survival, Clinical Cancer Research, 18, 3731–3736.
[35] Paranjpe, I., Fuster, V., Lala, A., Russak, A. J., Glicksberg, B. S., Levin, M. A., Charney,
A. W., Narula, J., Fayad, Z. A., Bagiella, E., Zhao, S., and Nadkarni, G. N. (2020). As-
sociation of treatment dose anticoagulation with in-hospital survival among hospitalized
patients with COVID-19, Journal of the American College of Cardiology, 76, 122–124.
[36] Peng, Y. and Yu, B. (2021). Cure models: Methods, applications, and implementation,
CRC Press.
[37] Rafati, S., Baneshi, M. R., and Bahrampour, A. (2020). Factors affecting long-survival of
patients with breast cancer by non-mixture and mixture cure models using the Weibull,
log-logistic and Dagum distributions: a Bayesian approach, Asian Pacific Journal of Can-
cer Prevention: APJCP, 21, 485.
[38] Rohatgi, A. (2004). WebPlotDigitizer version 4.4, Available from:
https://automeris.io/WebPlotDigi-tizer/.
[39] Roy, D. (2004). Discrete Rayleigh distribution, IEEE Transactions on Reliability, 53,
255–260.
[40] Shi, X., Shi, Y. and Zhou, K. (2021). Estimation for entropy and parameters of generalized
Bilal distribution under adaptive type II progressive hybrid censoring scheme, Entropy,
23, 206.
[41] Wang, B., Xie, X., Yin, J., Zou, C., Wang, J., Huang, G., Wang, Y., and Shen, J. (2015).
Reconstruction with modular hemipelvic endoprosthesis after pelvic tumor resection: a
report of 50 consecutive cases, PLoS One, 10, e0127263.

Appendice: R Codes

Under the frequentist approach, the following R code is used to implement the model
for survival data with a cure fraction based on the DB distribution, as presented in
subsection 2.3. We used the function maxLik of the maxLik package (Henningsen and
Toomet, 2011) for the maximization of the likelihood function.
# Reading data (Wang et al., 2015)
t <- c(3,7,11,18,22,25,28,32,34,35,35,36,40,40,41,54,66,76,84,88,92)
d <- c(1,1,0,1,0,1,1,0,0,1,0,0,0,0,1,0,0,0,0,0,0)
n <- length(t) # the sample size
K <- 2 # number of parameters

# Loading the maxLik package


library(maxLik)
# The likelihood function
log.f <- function(parms) {
beta <- parms[1]
eta <- parms[2]
if (parms[1]<0) return(-Inf)
if (parms[2]<0) return(-Inf)
if (parms[2]>1) return(-Inf)
p <- exp(-beta)
St0 <- (3-2*p^(t+1))*p^(2*(t+1))
ft0 <- p^(2*t)*(p-1)*(2*p^t*(p^2+p+1)-3*p-3)
St <- eta + (1-eta)*St0
ft <- (1-eta)*ft0
like <- ft^d * St^(1-d)

18
L <- sum(log(like))
if (is.na(L)==TRUE) {return(-Inf)} else {return(L)} }
# Obtaining the ML estimates
mle <- c()
mle <- maxLik(logLik=log.f,start=c(0.08,0.6))
summary(mle)
betaDB <-mle$estimate[1]
etaDB <-mle$estimate[2]
s <- vcov(mle)
# The 95% confidence intervals
llimDB <- round(betaDB - qnorm(0.975) * sqrt(s[1,1]),4)
ulimDB <- round(betaDB + qnorm(0.975) * sqrt(s[1,1]),4)
llimDBe <- round(etaDB - qnorm(0.975) * sqrt(s[2,2]),4)
ulimDBe <- round(etaDB + qnorm(0.975) * sqrt(s[2,2]),4)
cat("n = ",n,"\n")
cat("Beta = ",betaDB, "95%CI: (",llimDB,",",ulimDB, ") \n")
cat("Eta = ",etaDB, "95%CI: (",llimDBe,",",ulimDBe, ") \n")
# Calculating AIC, BIC and AICC
aic <- AIC(mle)
bic <- AIC(mle,k = log(n))
aicc <- aic + (2*K^2+2*K)/(n-K-1)
cat("AIC = ",aic,", BIC = ",bic,", AICC = ",aicc,"\n")
This is the R code for the Bayesian model for survival data with a cure fraction
based on the DB distribution, as presented in subsection 2.4:
# The log posterior function
log.post <- function(t,d,parms) {
beta <- parms[1]
eta <- parms[2]
if (parms[1]<0) return(-Inf)
if (parms[2]<0) return(-Inf)
if (parms[2]>1) return(-Inf)
p <- exp(-beta)
St0 <- (3-2*p^(t+1))*p^(2*(t+1))
ft0 <- p^(2*t)*(p-1)*(2*p^t*(p^2+p+1)-3*p-3)
St <- eta + (1-eta)*St0
ft <- (1-eta)*ft0
like <- ft^d * St^(1-d)
log.like <- sum(log(like))
prior <- dgamma(beta,0.001,0.001)*dbeta(eta,1,1)
log.prior <- log(prior)
L <- log.like + log.prior
if (is.na(L)==TRUE) {return(-Inf)} else {return(L)} }

# Obtaining the MCMC estimates


posterior <- MCMCmetrop1R(log.post,theta.init=c(beta=0.05,eta=0.6),
burnin=10000, mcmc=1000000, thin=200, logfun=T, t=t, d=d, verbose=100000,
tune = 1)
varnames(posterior) <- c("beta","eta")
summary(posterior)

19
# Obtaining the HPD intervals
HPDinterval(posterior, prob = 0.95)
# Geweke z scores
geweke.diag(posterior)

20

You might also like