
Bayesian analysis of Lung cancer patients

June 15, 2023

1 Introduction
Competing risk analysis is used to study the risk from a specific cause when an
event can occur due to more than one mutually exclusive cause. For instance, a
lung cancer patient may die of heart failure, kidney failure, or a road accident
rather than of the cancer itself. In such a scenario, a competing risk model can
be employed to better understand the pattern and behavior of a specific cause of
death. Typically, competing risk data are presented as a pair (X, S), where the
random variable X denotes the time to death of an individual and S the cause of
death. In some cases the cause of death is completely unknown, or only partial
information about it is available. Such data are termed masked data in the
literature. Masking occurs due to unavoidable circumstances: a patient may die
outside a hospital or medical setting, the death certificate may be incomplete
or inaccurate, or a postmortem examination needed to diagnose the cause of death
may not be possible or feasible. In these scenarios the exact cause of death
cannot be reported to the registry. In real-life situations, it is challenging
or even impossible to observe and document the exact cause of death of each
patient. This ambiguity in competing risk data makes the analysis hard, and the
estimation procedure therefore requires advanced computational techniques.
In this article, we use a Bayesian approach for the competing risk analysis of
lung cancer patients. The data on lung cancer patients are taken from the
Surveillance, Epidemiology, and End Results (SEER) program. The data contain the
survival time and the cause of death of each patient. However, for some patients
the cause of death is reported as missing. Lung cancer patients may experience
multiple health issues or complications, which makes it challenging to pinpoint
the exact cause of death. For example, they may have other underlying health
conditions or may experience complications related to their cancer treatment.
Sometimes the medical records or documentation of a patient's health history do
not provide enough information to determine the specific cause of death. This
can happen if certain details are missing or if the records are not thorough
enough to assess the situation accurately. Different healthcare professionals
may have varying opinions or perspectives on the cause of death. This can be
especially true when multiple factors are involved, and there may be
disagreement about which factor had the most significant impact. In certain
cases, cause-of-death information may be withheld or masked to protect the
privacy of the patient or their family. This can occur if disclosing specific
details could lead to the identification of the individual or if there are legal
or ethical considerations involved.

2 Modeling
To model the lifetime of lung cancer patients, we consider a study in which N
patients are enrolled and each patient is exposed to J = 3 mutually exclusive
competing causes of death: lung cancer (j = 1), other cancer (j = 2), and
non-cancer (j = 3). Suppose that only n of these patients died and the remaining
nc = N − n were censored from the study. Let Xij (i = 1, . . . , n; j = 1, 2, 3)
denote the random lifetime of the ith patient under the jth risk. Since all
risks act on a patient simultaneously, we observe the time of death as the
minimum of the Xij, denoted Xi = min(Xi1, Xi2, Xi3). Along with each Xi, the
cause of death of the patient is reported as a set Si ⊆ {1, 2, 3}. The set Si
contains the true cause of death Ki ∈ {1, 2, 3}; it takes the value {1}, {2}, or
{3} when the cause of death is observed exactly, and the value {1, 2}, {2, 3},
{1, 3}, or {1, 2, 3} when the cause of death is masked. Here, Si = {1, 2}
indicates that the cause of death may be either lung cancer or other cancer.
Similarly, Si = {1, 2, 3} indicates that the cause of death may be lung cancer,
other cancer, or non-cancer, that is, the information on the cause of death is
completely missing. In the SEER data set, the cause of death is reported as
either exactly known or completely missing. Hence, we consider only two cases:
Si = {j} and Si = {1, 2, 3}. Notice that, when the cause of death is reported
exactly, Si contains a single element, whereas Si has more than one element when
the cause of death is
masked. For this data set, the likelihood function is

P(x \mid \psi) = \prod_{i=1}^{n} \left[ \sum_{j \in S_i} P(S_i = s_i \mid K_i = j, X_i = x_i) \, h_j(x_i) \prod_{j=1}^{J} \bar{F}_j(x_i) \right] \prod_{l=1}^{n_c} \prod_{j=1}^{J} \bar{F}_j(x_{l+n}),   (1)

where hj(xi) and F̄j(xi) denote the cause-specific hazard and survival functions
corresponding to cause j, xl+n (l = 1, . . . , nc) is the censoring time of the
lth censored patient, and P(Si = si | Ki = j, Xi = xi) is the conditional
probability of observing the cause of death as si given that the ith patient
died of cause Ki = j at time xi. When Si = {j} (j = 1, 2, 3), it is the
probability of exact detection, whereas when Si = {1, 2, 3}, it is the
probability that the cause of death of the ith patient is masked.
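
The likelihood (1) is easier to evaluate on the log scale. The following is a minimal sketch of that evaluation, not the authors' implementation. The function name `log_likelihood`, its argument layout, the exponential cause-specific hazards, the rates in `theta`, and the constant reporting probabilities 0.8 and 0.2 are all illustrative assumptions; the text does not fix a lifetime distribution at this point.

```python
import math


def log_likelihood(deaths, censored, hazard, survival, p_report, J=3):
    """Log of likelihood (1).

    deaths:   list of (x_i, S_i) pairs, with S_i a frozenset of candidate causes
    censored: list of censoring times x_{l+n}
    hazard(j, x), survival(j, x): cause-specific hazard h_j and survival F-bar_j
    p_report(S, j, x): P(S_i = S | K_i = j, X_i = x)
    """
    causes = range(1, J + 1)
    total = 0.0
    for x, S in deaths:
        # Sum over the causes that are compatible with the reported set S_i.
        mix = sum(p_report(S, j, x) * hazard(j, x) for j in S)
        # The patient survived every cause up to x: product of all F-bar_j(x).
        total += math.log(mix) + sum(math.log(survival(j, x)) for j in causes)
    for x in censored:
        # Censored patients contribute only the overall survival term.
        total += sum(math.log(survival(j, x)) for j in causes)
    return total


# Illustrative (assumed) ingredients: exponential lifetimes with made-up rates.
theta = {1: 0.5, 2: 0.2, 3: 0.1}


def hazard(j, x):
    return theta[j]


def survival(j, x):
    return math.exp(-theta[j] * x)


def p_report(S, j, x):
    # Assumed constant reporting model: exact with 0.8, fully masked with 0.2.
    return 0.8 if len(S) == 1 else 0.2


deaths = [(1.0, frozenset({1})), (2.5, frozenset({1, 2, 3}))]
censored = [3.0]
print(log_likelihood(deaths, censored, hazard, survival, p_report))
```

Passing the hazard, survival, and reporting probability as functions keeps the sketch independent of any particular lifetime or masking model, so a time-dependent masking probability can be substituted without changing `log_likelihood`.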
In the literature, authors have modeled the masking probability in different
ways. Some consider the masking probabilities to be independent of the cause of
death, meaning that they are the same for every cause. Others consider them to
be dependent on the cause, so that they are not the same for all causes of
death. While analysing the lung cancer patient data, we observed that fewer
deaths were reported with a missing cause of death as the survival time of the
patients increased. This trend is plausible in general, because the treatment of
lung cancer often involves advanced and potentially costly methods, and the
cause of death is less likely to be reported as missing when the patient is
receiving advanced treatment. Therefore, we take the masking probability to
depend on the time of death. We model the masking probability using the
cumulative distribution function of the exponential distribution as follows:

P (Si = {j} |Ki = j, Xi = xi ) = exp(−λj xi ) (2)

which implies

P (Si = {1, 2, 3} |Ki = j, Xi = xi ) = 1 − exp(−λj xi ) (3)
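
Equations (2) and (3) can be coded directly and used to generate masked competing risk data for checking an estimation routine. The sketch below is illustrative only. The helper names `p_exact`, `p_masked`, and `simulate_patient`, the exponential latent lifetimes, and all parameter values are assumptions, and censoring is ignored. Under equation (2), the probability of exact reporting equals 1 at x = 0 and decays with x at rate λj.

```python
import math
import random


def p_exact(lam_j, x):
    """Equation (2): probability that cause j is reported exactly at time x."""
    return math.exp(-lam_j * x)


def p_masked(lam_j, x):
    """Equation (3): probability that the cause is reported as {1, 2, 3}."""
    return 1.0 - math.exp(-lam_j * x)


def simulate_patient(theta, lam, rng):
    """Draw one uncensored (x_i, S_i) pair.

    theta[j]: rate of an exponential latent lifetime X_ij (assumed model)
    lam[j]:   masking parameter lambda_j from equations (2) and (3)
    """
    latent = {j: rng.expovariate(theta[j]) for j in theta}
    k = min(latent, key=latent.get)  # true cause K_i
    x = latent[k]                    # X_i = min over j of X_ij
    if rng.random() < p_exact(lam[k], x):
        return x, frozenset({k})     # cause observed exactly
    return x, frozenset(theta)       # cause fully masked: {1, 2, 3}


rng = random.Random(2023)
theta = {1: 0.5, 2: 0.2, 3: 0.1}    # made-up lifetime rates
lam = {1: 0.3, 2: 0.3, 3: 0.3}      # made-up masking parameters
sample = [simulate_patient(theta, lam, rng) for _ in range(5)]
for x, S in sample:
    print(round(x, 3), sorted(S))
```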
