MLE under Survival Data: Type I and Random Censoring and K-M Estimator


Module 16

Saurav De

Department of Statistics
Presidency University

An illustration under type I censoring:

Consider exponential life of an item with mean θ.


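Under type I censoring the likelihood of θ will be

$$L(\theta) = \prod_{i=1}^{n} \left[f_\theta(t_i)\right]^{\delta_i} \left[\bar{F}_\theta(L_i)\right]^{(1-\delta_i)}$$

where $f_\theta(t) = \frac{1}{\theta}\exp\{-t/\theta\}$ and $\bar{F}_\theta(t) = \exp\{-t/\theta\}$.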

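If we define a set A = {i | δi = 1}, we can write

$$L(\theta) = \prod_{i\in A} \frac{1}{\theta}\exp\{-t_i/\theta\} \cdot \prod_{i\in A^c} \exp\{-L_i/\theta\}
= \frac{1}{\theta^{r}} \exp\left\{-\frac{1}{\theta}\left[\sum_{i\in A} t_i + \sum_{i\in A^c} L_i\right]\right\}$$

where $r = \sum_{i\in A}\delta_i = \sum_{i=1}^{n}\delta_i$ = # uncensored cases observed.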

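Now, with $y_i = \min(t_i, L_i)$ denoting the observed time for the ith item, we can write

$$\sum_{i\in A} t_i + \sum_{i\in A^c} L_i = \sum_{i=1}^{n} y_i
\implies L(\theta) = \frac{1}{\theta^{r}} \exp\left\{-\sum_{i=1}^{n} y_i/\theta\right\}$$

Hence we get the MLE of θ as $\hat\theta = \frac{1}{r}\sum_{i=1}^{n} y_i$.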

Note: Here r is not fixed but a random variable.
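
As an illustration, the closed-form MLE can be checked numerically on simulated data. The sketch below is only a minimal example: the values of n, theta_true and the common censoring limit are assumptions chosen for illustration, not values from this module.

```r
# Minimal sketch: MLE of the exponential mean under Type I censoring.
# n, theta_true and the censoring limits L are illustrative assumptions.
set.seed(1)
n <- 200
theta_true <- 5
L <- rep(8, n)                            # fixed censoring limits L_i
life <- rexp(n, rate = 1 / theta_true)    # latent lifetimes T_i
y <- pmin(life, L)                        # observed times y_i = min(t_i, L_i)
delta <- as.integer(life <= L)            # 1 = failure observed, 0 = censored
r <- sum(delta)                           # number of uncensored cases (random)
theta_hat <- sum(y) / r                   # closed-form MLE

# Numerical maximisation of log L(theta) = -r log(theta) - sum(y) / theta
loglik <- function(theta) -r * log(theta) - sum(y) / theta
opt <- optimize(loglik, interval = c(0.1, 50), maximum = TRUE)
c(closed_form = theta_hat, numerical = opt$maximum)
```

The two numbers should agree up to the tolerance of optimize(); r changes from sample to sample, which is why it is treated as a random variable.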

=⇒ It is difficult to find the exact sampling distribution of θ̂.


Therefore consider the asymptotic distribution of θ̂.
Asymptotic distribution of MLE =⇒

$$\hat\theta \equiv \hat\theta_n \xrightarrow{D} N\left(\theta,\ I^{-1}(\theta)\right)$$

where I(θ) = Fisher's Information

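i.e. $I(\theta) = E\left[-\frac{\partial^2}{\partial\theta^2}\log L(\theta)\right]$

Now $L(\theta) = \frac{1}{\theta^{r}} \exp\left\{-\sum_{i=1}^{n} y_i/\theta\right\}$

$$\implies \log L(\theta) = -r\log\theta - \frac{\sum_{i=1}^{n} y_i}{\theta}
\implies \frac{\partial}{\partial\theta}\log L(\theta) = -\frac{r}{\theta} + \frac{\sum_{i=1}^{n} y_i}{\theta^2}$$

And

$$\frac{\partial^2}{\partial\theta^2}\log L(\theta) = \frac{r}{\theta^2} - \frac{2\sum_{i=1}^{n} y_i}{\theta^3}$$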
$$\implies E\left[-\frac{\partial^2}{\partial\theta^2}\log L(\theta)\right] = \frac{2\sum_{i=1}^{n} E(Y_i)}{\theta^3} - \frac{E(r)}{\theta^2} \quad \ldots\ldots (*)$$

Now
E (Yi ) = E (Yi | δi = 1)Pθ [δi = 1] + E (Yi | δi = 0)Pθ [δi = 0] . . . . . . (∗∗)

As Pθ [Yi = Li | δi = 0] = 1 so E (Yi | δi = 0) = Li .

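Also

$$E(Y_i \mid \delta_i = 1) = E(T_i \mid T_i \le L_i) = \frac{\int_0^{L_i} t\,\frac{1}{\theta}\exp\{-t/\theta\}\,dt}{F_\theta(L_i)}
= \theta\left(1 - \exp\{-L_i/\theta\}\right)^{-1}\left[1 - \left(\frac{L_i}{\theta} + 1\right)\exp\{-L_i/\theta\}\right]$$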

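On simplification (∗∗) =⇒

$$E(Y_i) = \theta\left[1 - \left(\frac{L_i}{\theta} + 1\right)\exp\{-L_i/\theta\}\right] + L_i\exp\{-L_i/\theta\}
= \theta\left(1 - \exp\{-L_i/\theta\}\right)$$

Also

$$E(r) = \sum_{i=1}^{n} E(\delta_i) = \sum_{i=1}^{n} P_\theta[\delta_i = 1] = \sum_{i=1}^{n}\left(1 - \exp\{-L_i/\theta\}\right) = Q \text{ say}$$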
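
The closed form for E(Yi) can be verified numerically by adding the uncensored and censored contributions in (∗∗). The following is a small sketch; the values of theta and Li are illustrative assumptions.

```r
# Numerical check of E(Y_i) = theta * (1 - exp(-L_i / theta)).
# theta and Li are illustrative values, not taken from the slides.
theta <- 5
Li <- 8
# Uncensored part: E(Y_i | delta_i = 1) P[delta_i = 1] = integral of t f(t) on (0, L_i)
part_obs <- integrate(function(t) t * dexp(t, rate = 1 / theta), 0, Li)$value
# Censored part: E(Y_i | delta_i = 0) P[delta_i = 0] = L_i * P[T_i > L_i]
part_cens <- Li * exp(-Li / theta)
ey_numeric <- part_obs + part_cens
ey_closed <- theta * (1 - exp(-Li / theta))
c(numeric = ey_numeric, closed_form = ey_closed)
```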

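Hence finally (∗) =⇒

$$I(\theta) = \frac{2\theta\sum_{i=1}^{n}\left(1 - \exp\{-L_i/\theta\}\right)}{\theta^3} - \frac{Q}{\theta^2} = \frac{Q}{\theta^2}$$

$$\implies \hat\theta_n - \theta \xrightarrow{D} N\left(0,\ \frac{\theta^2}{Q}\right)$$

Note: MLE of Q will be $\hat{Q} = \sum_{i=1}^{n}\left(1 - \exp\{-L_i/\hat\theta\}\right)$

(by invariance property of MLE)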
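
A short sketch of how the plug-in quantities might be computed in practice is given below. The data vectors are hypothetical and serve only to show the arithmetic; the interval is the usual large-sample Wald interval based on the asymptotic variance θ²/Q.

```r
# Sketch: plug-in standard error of theta_hat under Type I censoring.
# The data below are hypothetical, used only to show the arithmetic.
y     <- c(2.1, 8.0, 3.4, 8.0, 5.7, 1.2, 8.0, 6.3)   # observed times y_i
delta <- c(1,   0,   1,   0,   1,   1,   0,   1)     # 1 = failure observed
L     <- rep(8, length(y))                           # censoring limits L_i

theta_hat <- sum(y) / sum(delta)                     # MLE of theta
Q_hat <- sum(1 - exp(-L / theta_hat))                # MLE of Q by invariance
se_hat <- theta_hat / sqrt(Q_hat)                    # sqrt(theta^2 / Q) at theta_hat
ci <- theta_hat + c(-1, 1) * qnorm(0.975) * se_hat   # approximate 95% Wald interval
list(theta_hat = theta_hat, Q_hat = Q_hat, se = se_hat, ci = ci)
```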

Note: For large n, the normality assumption for θ̂ is good. However for
small n, it is rather poor. There are alternative approximate methods that
can be recommended in the context of asymptotic normality even for small
n. One such approach discussed below is due to D. A. Sprott (1973,
Biometrika).
Sprott showed that the distribution of the transformed estimator $\hat\phi = \hat\theta^{-1/3}$ is closer to normal than that of θ̂ itself, even for small n.

Obviously here $\phi = \theta^{-1/3}$. Also from Taylor's expansion we get

$$E(\hat\phi) \approx \phi = \theta^{-1/3}$$


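And

$$V(\hat\phi) \approx \left(\frac{\partial\phi}{\partial\theta}\right)^{2} V_{asy}(\hat\theta)
= \left(-\frac{1}{3}\theta^{-4/3}\right)^{2}\frac{\theta^{2}}{Q} = \frac{\theta^{-2/3}}{9Q} = \frac{\phi^{2}}{9Q}$$

Thus for testing any hypothesis or constructing any confidence interval of some parametric function of θ, we can start with

$$\frac{\hat\phi - \phi}{\sqrt{\phi^{2}/(9Q)}} \xrightarrow{D} N(0, 1) \quad\text{or,}\quad \frac{\hat\phi - \phi}{\sqrt{\hat\phi^{2}/(9\hat{Q})}} \xrightarrow{D} N(0, 1).$$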
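
One way to use the second pivot is to build an interval for φ and transform it back to θ. The sketch below assumes hypothetical values of theta_hat and Q_hat; since θ = φ^(-3) is decreasing in φ, the end points swap on back-transformation.

```r
# Sketch: approximate 95% CI for theta through phi = theta^(-1/3).
# theta_hat and Q_hat are hypothetical values standing in for actual estimates.
theta_hat <- 8.54
Q_hat <- 4.87
phi_hat <- theta_hat^(-1/3)
se_phi <- phi_hat / (3 * sqrt(Q_hat))          # sqrt(phi^2 / (9 Q)) at the estimates
z <- qnorm(0.975)
ci_phi <- phi_hat + c(-1, 1) * z * se_phi      # interval on the phi scale
ci_theta <- rev(ci_phi^(-3))                   # back-transform; order reverses
list(ci_phi = ci_phi, ci_theta = ci_theta)
```

The resulting interval for θ is not symmetric about theta_hat, unlike the Wald interval built directly on the θ scale.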

Random Censoring : the most frequently used censoring in statistics.

Random censoring is very similar to type I censoring. The only difference is that here the censoring times are also random variables.

A simple random censoring process is one in which the ith item is assumed to have lifetime Ti and censoring time Ci, with Ti and Ci independently distributed continuous random variables.

Let Ti be iid with common pdf fθ (·) and common cdf Fθ (·)
Let Ci be iid with common pdf g (·) and common cdf G (·).
Define (Yi , δi ) same way as in type I censoring.
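Then the likelihood function of θ for given data set {(ti , δi ), i = 1, . . . , n} on the paired random variables (Yi , δi ) will be

$$L(\theta) = \prod_{i=1}^{n} \left[f_\theta(t_i)\,\bar{G}(t_i)\right]^{\delta_i} \left[g(t_i)\,\bar{F}_\theta(t_i)\right]^{1-\delta_i}$$

where $\bar{F}_\theta = 1 - F_\theta$ and $\bar{G} = 1 - G$ are the survival functions of Ti and Ci.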

Verification of the form of likelihood:

Under random censoring we know

Pθ [yi < Yi ≤ yi + ∆yi , δi = 0] = Pθ [yi < Ci ≤ yi + ∆yi , Ti > yi ]

(as δi = 0 ⇔ Ti > Ci ⇔ Yi = Ci )

= P [yi < Ci ≤ yi + ∆yi ] Pθ [Ti > yi ]

(as Ti and Ci are independently distributed)

=⇒ with δi = 0 the likelihood corresponding to ith unit will be

$$L_i(\theta) = \lim_{\Delta y_i \to 0} \frac{P_\theta[y_i < Y_i \le y_i + \Delta y_i,\ \delta_i = 0]}{\Delta y_i} = g(y_i)\,\bar{F}_\theta(y_i)$$

On the other hand

Pθ [yi < Yi ≤ yi + ∆yi , δi = 1] = Pθ [yi < Ti ≤ yi + ∆yi , Ci > yi ]

(as now δi = 1 ⇔ Ti ≤ Ci ⇔ Yi = Ti )

= Pθ [yi < Ti ≤ yi + ∆yi ] P [Ci > yi ]


=⇒ now $L_i(\theta) = f_\theta(y_i)\,\bar{G}(y_i)$

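Thus we have

$$L_i(\theta) = \begin{cases} g(t_i)\,\bar{F}_\theta(t_i) & \text{with } \delta_i = 0 \\ f_\theta(t_i)\,\bar{G}(t_i) & \text{with } \delta_i = 1 \end{cases}$$

Combining these two forms we can write

$$L_i(\theta) = \left[f_\theta(t_i)\,\bar{G}(t_i)\right]^{\delta_i} \left[g(t_i)\,\bar{F}_\theta(t_i)\right]^{1-\delta_i}$$

for all i.

=⇒ the joint likelihood function of θ based on given data set on independent (Yi , δi ) pairs will be

$$L(\theta) = \prod_{i=1}^{n} \left[f_\theta(t_i)\,\bar{G}(t_i)\right]^{\delta_i} \left[g(t_i)\,\bar{F}_\theta(t_i)\right]^{1-\delta_i}$$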

Informative and Noninformative Random Censoring :

Let θ : Parameter of the probability distribution of lifetime variable, Ti .


If the probability distribution of the censoring variable Ci also involves θ as
its parameter, the random censoring is informative.
(Reason : Then censoring variables also add information about θ.)

Otherwise, it is noninformative.

e.g. Ti ∼ Exponential (θ) , Ci ∼ Exponential (γθ)

=⇒ informative censoring.

Ti ∼ Exponential (θ) , Ci ∼ Exponential (φ); φ independent of θ

=⇒ noninformative censoring.
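
Under noninformative censoring the factors g and Ḡ in the likelihood do not involve θ, so they can be dropped when maximising over θ. The sketch below illustrates this for exponential lifetimes: the part of the log-likelihood that depends on θ is maximised numerically and compared with Σyi/r, which follows from the same algebra as in the type I case. The values theta_true, phi_true and n are assumptions chosen for illustration.

```r
# Sketch: noninformative random censoring with exponential lifetimes.
# theta_true, phi_true and n are illustrative assumptions.
set.seed(2)
n <- 300
theta_true <- 5     # mean lifetime
phi_true <- 10      # mean censoring time, unrelated to theta
life <- rexp(n, rate = 1 / theta_true)
cens <- rexp(n, rate = 1 / phi_true)
y <- pmin(life, cens)                  # observed time Y_i
delta <- as.integer(life <= cens)      # 1 if T_i <= C_i, else 0

# Theta-dependent part of log L: f^delta * Fbar^(1 - delta);
# the g and G-bar factors are constants in theta and are omitted.
loglik <- function(theta) {
  sum(delta * dexp(y, rate = 1 / theta, log = TRUE) +
      (1 - delta) * pexp(y, rate = 1 / theta, lower.tail = FALSE, log.p = TRUE))
}
opt <- optimize(loglik, interval = c(0.1, 50), maximum = TRUE)
c(numerical = opt$maximum, closed_form = sum(y) / sum(delta))
```

With informative censoring the omitted factors would also depend on θ and could not be dropped in this way.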
Kaplan-Meier (K-M) Estimator/Product Limit (PL) Estimator of
survival Function

Let t1 , . . . , tn be uncensored sample observations on failure times.

Then a nonparametric estimate of the survival function S(t) at the time point t is given by

$$R_n(t) = \frac{\#\,\text{observations} \ge t}{n} \quad \ldots\ldots (1)$$
This is basically the complementary empirical distribution function at the
point t.

But usually we cannot expect uncensored failure data due to many practical limitations.

Consider a type I censored sample.

In this case # lifetimes or failure times ≥ t may not be known exactly. ⇒ we need some modification in (1).

The estimator obtained by modifying (1) to account appropriately for type I censoring is called the PL estimate of the survival function.

The PL estimator is also known as the K-M estimator, after Kaplan and Meier, the authors who first discussed its properties.

Let there be n items and k(≤ n) distinct failure times t1 < t2 < . . . < tk
observed.
Let dj = # failures at time point tj
In addition to failure times, there are censoring times Li for items whose
lifetimes are not observed due to some reasons.
Let nj = # items at risk of failing at tj (i.e. # items that are functioning and uncensored just prior to tj ).
Then the K-M estimator is defined as

$$\hat{R}_n(t) = \prod_{j:\,t_j \le t} \frac{n_j - d_j}{n_j}$$

where $n_{j+1} = n_j - d_j - c_j$; $c_j$ = # items censored in $[t_j, t_{j+1})$.
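
The definition translates almost directly into code. The following is a minimal sketch: km_estimate() is a hypothetical helper written for this illustration (it is not a function from any package), and the small data set is made up.

```r
# Sketch of the product-limit estimator computed from first principles.
# km_estimate() is a hypothetical helper, not a function from any package.
km_estimate <- function(time, status) {
  tj <- sort(unique(time[status == 1]))                        # distinct failure times
  nj <- sapply(tj, function(s) sum(time >= s))                 # at risk just before t_j
  dj <- sapply(tj, function(s) sum(time == s & status == 1))   # failures at t_j
  data.frame(time = tj, n.risk = nj, n.event = dj,
             survival = cumprod((nj - dj) / nj))               # running product
}

# Hypothetical data: status 1 = failure, 0 = censored
time   <- c(1, 3, 3, 4, 6, 7, 9)
status <- c(1, 1, 0, 1, 0, 1, 0)
res <- km_estimate(time, status)
res
```

Here an item censored at the same instant as a failure is counted as at risk at that instant, which is the usual convention.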

Result : The K-M estimator is a nonparametric MLE of the survival function S(t).

Justification : Let T : Lifetime of a randomly chosen item.


=⇒ S(t) = P[T > t] at the time point t.
Let t1 < t2 < . . . < tk ≤ t < tk+1 .
Then we can look upon S(t) as

S(t) = P[T > t1 ]P[T > t2 | T > t1 ] . . . P[T > t | T > tk ]

Let λj = Probability that a randomly chosen item will fail at the time $t_j$ given that it survived at $t_{j-1}$.

=⇒

$$S(t) = (1 - \lambda_1)(1 - \lambda_2)\cdots(1 - \lambda_k)\cdot 1 = \prod_{j:\,t_j \le t} (1 - \lambda_j)$$

As $n_j$ = # items under the risk of failing at $t_j$ given that they survived at $t_{j-1}$ (like # independent Bernoulli trials) and

$d_j$ = # items actually failed at $t_j$ (like random # successes out of $n_j$ trials)

=⇒ $d_j \sim \text{Binomial}(n_j, \lambda_j)$, for all j

We already know that for the binomial distribution the MLE of λj is $\hat\lambda_j = \frac{d_j}{n_j}$

=⇒ by invariance property, the MLE of S(t) will be

$$\hat{R}_n(t) = \prod_{j:\,t_j \le t} (1 - \hat\lambda_j) = \prod_{j:\,t_j \le t} \left(1 - \frac{d_j}{n_j}\right) \longrightarrow \text{K-M estimator}$$

Note : dj always follows a binomial distribution irrespective of the parent population distribution Fθ of the lifetime Ti .
=⇒ the K-M estimator is distribution free, i.e. it is a nonparametric estimator of the survival function and also the MLE of it under the type I censored case.

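The estimated asymptotic variance of the K-M estimator R̂n (t) :

$$\log \hat{R}_n(t) = \sum_{j:\,t_j \le t} \log(1 - \hat\lambda_j).$$

So, from delta method, the asymptotic variance of log R̂n (t) will be

$$V\left(\log \hat{R}_n(t)\right) \approx \sum_{j:\,t_j \le t} \left(\frac{\partial}{\partial\hat\lambda_j}\log(1 - \hat\lambda_j)\Big|_{\hat\lambda_j = \lambda_j}\right)^{2} V(\hat\lambda_j)
= \sum_{j:\,t_j \le t} (1 - \lambda_j)^{-2}\,\frac{\lambda_j(1 - \lambda_j)}{n_j}
= \sum_{j:\,t_j \le t} \frac{\lambda_j}{n_j(1 - \lambda_j)}$$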

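So the estimated asymptotic variance of log R̂n (t) will be

$$\hat{V}\left(\log \hat{R}_n(t)\right) \approx \sum_{j:\,t_j \le t} \frac{\hat\lambda_j}{n_j(1 - \hat\lambda_j)} = \sum_{j:\,t_j \le t} \frac{d_j}{n_j(n_j - d_j)}.$$

Again, using delta method we have

$$\hat{V}\left(\hat{R}_n(t)\right) \approx \left(\frac{\partial \hat{R}_n(t)}{\partial \log \hat{R}_n(t)}\right)^{2} \hat{V}\left(\log \hat{R}_n(t)\right)
= \left[\hat{R}_n(t)\right]^{2} \sum_{j:\,t_j \le t} \frac{d_j}{n_j(n_j - d_j)}$$

This is known as Greenwood's formula (1926) for asymptotic variance of K-M survival estimator.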
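
Greenwood's formula is easy to evaluate from the risk-set and failure counts alone. In the sketch below the counts are read off the n.risk and n.event columns of the survfit output shown later in this module (the last row, with nj = dj = 1, is left out because the formula divides by nj − dj); the variable names are chosen for this illustration.

```r
# Sketch: Greenwood standard errors from risk-set and failure counts.
# nj, dj are the first six n.risk / n.event values of the survfit output
# printed later in this module.
nj <- c(20, 19, 14, 11, 6, 3)       # items at risk just before t_j
dj <- c(1, 4, 2, 3, 1, 1)           # failures at t_j
surv <- cumprod((nj - dj) / nj)                 # K-M estimate at each t_j
var_log <- cumsum(dj / (nj * (nj - dj)))        # estimated variance of log R_n(t_j)
se_greenwood <- surv * sqrt(var_log)            # square root of Greenwood's formula
data.frame(n.risk = nj, n.event = dj,
           survival = round(surv, 3), std.err = round(se_greenwood, 4))
```

The std.err column obtained this way should agree with the std.err column of that output to the printed precision.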
Example. Calculate the K-M estimate of S(t) for the following data,
where δi = 1 if individual i died at time ti and δi = 0 if individual i was
censored at that time, for i = 1, . . . , 8.

i ti δi
1 2 1
2 5 1
3 8 1
4 11 0
5 12 0
6 15 1
7 20 1
8 23 0

The K-M estimates R̂n (ti ) of the survival function at different time points ti are given in the following table:

i   ti   δi   ni   1 − λ̂j   R̂n (ti )
1   2    1    8    7/8       7/8 = 0.875
2   5    1    7    6/7       7/8 . 6/7 = 0.750
3   8    1    6    5/6       7/8 . 6/7 . 5/6 = 0.625
4   11   0    5    5/5       7/8 . 6/7 . 5/6 . 5/5 = 0.625
5   12   0    4    4/4       7/8 . 6/7 . 5/6 . 5/5 . 4/4 = 0.625
6   15   1    3    2/3       7/8 . 6/7 . 5/6 . 5/5 . 4/4 . 2/3 = 0.417
7   20   1    2    1/2       7/8 . 6/7 . 5/6 . 5/5 . 4/4 . 2/3 . 1/2 = 0.208
8   23   0    1    1/1       7/8 . 6/7 . 5/6 . 5/5 . 4/4 . 2/3 . 1/2 . 1/1 = 0.208
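
The table can be reproduced with a few lines of vector arithmetic. This is only a sketch that relies on the eight times in the example being distinct and already sorted, so that the risk set shrinks by one at every row; the variable names are chosen for this illustration.

```r
# Sketch: reproducing the worked example with plain vector arithmetic.
ti    <- c(2, 5, 8, 11, 12, 15, 20, 23)   # observed times from the example
delta <- c(1, 1, 1, 0, 0, 1, 1, 0)        # 1 = died, 0 = censored
ni <- rev(seq_along(ti))                  # 8, 7, ..., 1 at risk (distinct sorted times)
step <- (ni - delta) / ni                 # 1 - lambda_hat; equals 1 at censored times
Rn <- cumprod(step)                       # K-M estimate at each t_i
data.frame(i = seq_along(ti), ti, delta, ni, Rn = round(Rn, 3))
```

At censored times the factor is 1, so the estimate stays flat there, exactly as in rows 4, 5 and 8 of the table.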

Example. Consider the following failure time data measured in some suitable unit :

6 4 4 10 5∗ 5 8∗ 11 6 8∗ 6∗ 8 4∗ 4 4 7∗ 6 10∗ 3 5
where ∗ denotes the reading is right censored, not exact failure time.

Using R program, find the estimates of the survival function at different time points assuming (i) the failure time T ∼ shifted Exponential distribution with density $f(t) = \frac{1}{\sigma} e^{-(t-\alpha)/\sigma}$, t ≥ α > 0 and (ii) T ∼ F , where F is absolutely continuous.

Solution. The given data are Type I censored.

(i) Assuming a shifted exponential failure distribution, we get from Module 15 that the ML estimate of (α, σ) is

$$(\hat\alpha, \hat\sigma) = \left(y_{(1)},\ \frac{\sum_{i=1}^{n} (y_i - y_{(1)})}{r}\right)$$

where the yi's are the potential responses, y(1) = min {y1 , . . . , yn } and r = # uncensored cases.

Also the survival function at time point t will be $S(t) = P[T > t] = e^{-(t-\alpha)/\sigma}$. Hence its (ML) estimate will be $\hat{S}_n(t) = e^{-(t-\hat\alpha)/\hat\sigma}$. Using R program we will find these estimates.

Assumption (ii) leads to the nonparametric estimation of the survival
function under Type I censoring. Hence we will use K-M estimator to
estimate the survival probability at each time point. The R program in
support of this computation and the findings are as follows.
R program for the estimation of survival probabilities :
R Code and Output :
> library(survival)
> # t: observed times; del: 1 = exact failure time, 0 = right censored
> t = c(6, 4, 4, 10, 5, 5, 8, 11, 6, 8, 6, 8, 4, 4, 4, 7, 6, 10, 3, 5)
> del = c(1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1)
> df = data.frame(t, del)
> # parametric: ML estimates under the shifted exponential model
> alpha = min(t)
> sigma = sum(t - alpha) / sum(del)
> Sn1 = exp(-(t - alpha) / sigma)

> Sn1
 [1] 0.5436906 0.8161762 0.8161762 0.2412612 0.6661436 0.6661436 0.3621760
 [8] 0.1969117 0.5436906 0.3621760 0.5436906 0.3621760 0.8161762 0.8161762
[15] 0.8161762 0.4437473 0.5436906 0.2412612 1.0000000 0.6661436
> # non-parametric
> fit = survfit(Surv(df$t, df$del) ~ 1, df)

> summary(fit)
Call: survfit(formula = Surv(df$t, df$del) ~ 1, data = df)

 time n.risk n.event survival std.err lower 95% CI upper 95% CI
    3     20       1    0.950  0.0487       0.8591        1.000
    4     19       4    0.750  0.0968       0.5823        0.966
    5     14       2    0.643  0.1087       0.4616        0.895
    6     11       3    0.468  0.1170       0.2862        0.764
    8      6       1    0.390  0.1207       0.2123        0.715
   10      3       1    0.260  0.1331       0.0951        0.709
   11      1       1    0.000     NaN           NA           NA
> Sn2 = fit$surv
> Sn2
[1] 0.9500000 0.7500000 0.6428571 0.4675325 0.4675325 0.3896104 0.2597403
[8] 0.0000000
> # K-M plot
> plot(fit, xlab = "t", ylab = expression(S(t)), main = "Kaplan-Meier Plot")
The Kaplan-Meier Plot corresponding to the problem :
