
Statistics Seminar, 8th talk:

Nonparametric estimation for interval censored data


Martina Albers, Nanina Anderegg, Urs Müller
Monday, May 2, 2011

1 Interval Censoring

Current Status Censoring / Interval Censoring Case 1:

• X : the failure time, where X ∼ F


• T : observation time, where T ∼ G
• X is independent of T

• n observations which are iid copies of (T, ∆) = (T, 1{X ≤ T })


The goal is to estimate the distribution function of X , i.e. F (x) = P [X ≤ x].
Interval Censoring Case k : (here for k = 2)

• X : the failure time, where X ∼ F

• (T1 , T2 ): observation times, where (T1 , T2 ) ∼ G


• X is independent of (T1 , T2 )
• n observations which are iid copies of

(T, ∆) = ((T1 , T2 ) , 1{X ≤ T1 }, 1{T1 < X ≤ T2 }, 1{X > T2 })

The goal is again to estimate the distribution function of X .


Mixed Case Interval Censoring: Instead of having a fixed k, the number
of observation times may also vary from subject to subject; e.g., the first patient is
tested twice, the second patient three times, and the third patient only once. We
therefore define a random variable K denoting the number of observation times.
Bivariate Interval Censored Data:

• (X, Y ): the failure times, where (X, Y ) ∼ F


• U = (U1 , U2 ), V = (V1 , V2 ): observation times, where (U, V ) ∼ G

• (X, Y ) are independent of (U, V )


• n observations which are iid copies of (U, V, ∆), where

∆ = (∆11 , ∆12 , ∆13 , ∆21 , ∆22 , ∆23 , ∆31 , ∆32 , ∆33 )

The variable ∆ij is defined as ∆ij = 1{(X, Y ) ∈ Rij }, where the rectangles Rij are

R11 = (0, U1 ] × (0, V1 ]    R12 = (U1 , U2 ] × (0, V1 ]    R13 = (U2 , ∞) × (0, V1 ]
R21 = (0, U1 ] × (V1 , V2 ]  R22 = (U1 , U2 ] × (V1 , V2 ]  R23 = (U2 , ∞) × (V1 , V2 ]
R31 = (0, U1 ] × (V2 , ∞)    R32 = (U1 , U2 ] × (V2 , ∞)    R33 = (U2 , ∞) × (V2 , ∞)    (1)

The goal is again to estimate the distribution function of (X, Y ), i.e.
F (x, y) = P [X ≤ x, Y ≤ y].
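The indicator construction above can be sketched in code. The function name and the 3×3 matrix layout (rows indexed by the V-intervals, columns by the U-intervals, matching the Rij) are choices made only for this illustration.

```python
import numpy as np

def bivariate_delta(x, y, u1, u2, v1, v2):
    """Return the 3x3 indicator matrix (Delta_ij) = 1{(X, Y) in R_ij},
    where the R_ij are the rectangles formed by the cut points
    0 < U1 < U2 on the x-axis and 0 < V1 < V2 on the y-axis."""
    # Column index from X: (0, U1], (U1, U2], (U2, inf)
    col = 0 if x <= u1 else (1 if x <= u2 else 2)
    # Row index from Y: (0, V1], (V1, V2], (V2, inf)
    row = 0 if y <= v1 else (1 if y <= v2 else 2)
    delta = np.zeros((3, 3), dtype=int)
    delta[row, col] = 1   # (X, Y) lies in exactly one rectangle
    return delta
```

Exactly one entry of ∆ is 1, since the nine rectangles partition the positive quadrant.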

2 The Nonparametric MLE for Current Status Data

The likelihood of n iid observations (ti , δi ), i = 1, . . . , n, can be written as

∏_{i=1}^{n} F (ti )^{δi} (1 − F (ti ))^{1−δi} g(ti ),    (2)

and thus the nonparametric MLE F̂ maximizes

Ln (F ) = ∏_{i=1}^{n} F (ti )^{δi} (1 − F (ti ))^{1−δi}    (3)

and is well-defined. By using the notation of observed sets

Ri = (0, ti ] if δi = 1,   Ri = (ti , ∞) if δi = 0,   i = 1, . . . , n,    (4)

the likelihood of the nonparametric MLE F̂ can be rewritten as

Ln (F ) = ∏_{i=1}^{n} PF (Ri ),    (5)

where PF (Ri ) is the probability under the distribution F that X ∈ Ri .
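A minimal sketch of evaluating the log of (5) for current status data, assuming a candidate F supplied as a plain function; the names and the exponential example are illustrative only, not from the notes.

```python
import numpy as np

def log_likelihood(F, data):
    """Compute log Ln(F) = sum_i log P_F(R_i) for current status data,
    where F is a function t -> F(t) and data is a list of (t_i, delta_i).
    P_F(R_i) = F(t_i)     if delta_i = 1  (R_i = (0, t_i]),
             = 1 - F(t_i) if delta_i = 0  (R_i = (t_i, inf))."""
    ll = 0.0
    for t, d in data:
        p = F(t) if d == 1 else 1.0 - F(t)
        ll += np.log(p)
    return ll

# Hypothetical example: candidate F = Exp(1) cdf on three observations.
data = [(0.5, 1), (1.0, 0), (2.0, 1)]
F = lambda t: 1.0 - np.exp(-t)
ll = log_likelihood(F, data)
```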

3 Finite sample properties and computation of the MLE

Reducing the optimization problem:

The optimization problem

sup_{F ∈ F} ln (F )  with  ln (F ) = log Ln (F ) = Σ_{i=1}^{n} log PF (Ri ),    (6)

where F is the space of all distribution functions on the appropriate space, is an
infinite dimensional optimization problem. We can reduce it to a finite dimensional
optimization problem by looking at the maximal intersections A1 , . . . , Am of
the observed sets R1 , . . . , Rn , i.e. the areas where there is maximal overlap of
the observed sets. Let α1 , . . . , αm be the masses assigned to the corresponding
sets A1 , . . . , Am . It can be shown that

ln (α) = Σ_{i=1}^{n} log (C^T α)_i ,    (7)

where C is an m × n matrix, called the clique matrix, with entries Cji = 1{Aj ⊆
Ri }. We then get a finite dimensional convex optimization problem:

ln (α̂) = max_{α ∈ A} ln (α),    (8)

where A = {α ∈ R^m : αj ≥ 0, j = 1, . . . , m, Σ_{j=1}^{m} αj = 1}.    (9)

Existence and (non-)uniqueness of the MLE:

Theorem 1. The MLE α̂ defined by (8) exists.

Let PF (R) denote the vector (PF (R1 ), . . . , PF (Rn )).

Theorem 2. The log likelihood (8) is strictly concave in PF (R). Thus, the MLE
estimates the probabilities PF (R1 ), . . . , PF (Rn ) of the observation rectangles
uniquely.

However, the log likelihood is concave in F and α, but not strictly concave,
which means that two different functions F1 , F2 ∈ F can yield the same vector
PF (R). Similarly, two different α1 , α2 ∈ A can yield the same vector PF (R).
Thus, we cannot estimate F or α uniquely. See the slides for an example.

Theorem 3. The MLE α̂ is unique if the clique matrix C has rank m.

4 Characterization and convex minorants for Current Status Data

Let T(1) , . . . , T(n) denote the order statistics of T1 , . . . , Tn , and let ∆(1) , . . . , ∆(n)
be the corresponding ∆ values, i.e., ∆(i) = ∆j if T(i) = Tj . Furthermore, let
Y = {y ∈ R^n : 0 < y1 ≤ · · · ≤ yn < 1}, and define ŷ ∈ Y by ŷi ≡ F̂n (T(i) ).
Proposition 1. ([GW92], Proposition 1.1, page 39) The vector ŷ ∈ Y is the
MLE if and only if

Σ_{i≥j} ( ∆(i) /ŷi − (1 − ∆(i) )/(1 − ŷi ) ) ≤ 0,  for all j = 1, . . . , n,    (10)

Σ_{i=1}^{n} ( ∆(i) /ŷi − (1 − ∆(i) )/(1 − ŷi ) ) ŷi = 0.    (11)

Corollary 1. The vector ŷ ∈ Y is the MLE if and only if

Σ_{i<j} ( ∆(i) − ŷi ) ≥ 0,  for all j = 1, . . . , n + 1,    (12)

and equality holds if ŷj > ŷj−1 (with ŷ0 = 0 and ŷn+1 = 1).
Proposition 2. Let P = {Pi = (i, Σ_{j≤i} ∆(j) ), i = 0, . . . , n}. Let H be the
greatest convex minorant of P . Then ŷ is the MLE if and only if, for all i =
1, . . . , n, ŷi equals the left derivative of H at i.
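Proposition 2 translates directly into the pool-adjacent-violators algorithm (PAVA): the left derivative of the greatest convex minorant of the cumulative sum diagram equals the isotonic regression of the ordered ∆ values. This equivalence is standard, but the implementation below (and its names) is our sketch, not part of the notes.

```python
import numpy as np

def current_status_mle(t, delta):
    """Compute the MLE (y_1, ..., y_n) for current status data as the
    left derivative of the greatest convex minorant of the cumulative
    sum diagram P_i = (i, sum_{j<=i} Delta_(j)), via PAVA."""
    order = np.argsort(t)
    d = np.asarray(delta, dtype=float)[order]  # Delta values sorted by T
    # PAVA: keep blocks of (mean, size); pool while the means decrease.
    vals, sizes = [], []
    for di in d:
        vals.append(di)
        sizes.append(1)
        while len(vals) > 1 and vals[-2] >= vals[-1]:
            s = sizes[-2] + sizes[-1]
            v = (vals[-2] * sizes[-2] + vals[-1] * sizes[-1]) / s
            vals[-2:] = [v]
            sizes[-2:] = [s]
    return np.repeat(vals, sizes)  # MLE at the ordered times T_(1), ..., T_(n)
```

For example, ordered indicators (0, 1, 0, 1) yield the MLE (0, 1/2, 1/2, 1): the middle violation is pooled into one block of mean 1/2.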

5 Asymptotic Theory

• The MLE for current status data is globally and locally consistent.

• The MLE for current status data converges globally and locally with rate
n1/3 to F0 .
• n^{1/3} (F̂n − F0 ) converges in distribution to the slope at the point 0 of the
greatest convex minorant of a Brownian motion plus a parabola.
The likelihood ratio test statistic has asymptotic distribution D = ∫ (S^2 (t) − S0^2 (t)) dt,
where S is the slope process of the greatest convex minorant of a two-sided
Brownian motion plus a parabola, and S0 is the slope process of the greatest
convex minorant of a two-sided Brownian motion plus a parabola under the
constraint that the slopes are ≥ 0 for all t ≥ 0 and ≤ 0 for all t ≤ 0.

Let λn (θ) be the likelihood ratio test statistic for the null hypothesis H0 : F (t0 ) = θ
against the alternative H1 : F (t0 ) ≠ θ. Then, for 0 < α < 1 and dα such that
P (D > dα ) = α, we obtain the confidence sets Cn,α = {θ : 2 log λn (θ) ≤ dα }.
Proposition 3. Suppose that F and G have densities f and g which are positive
and continuous in a neighbourhood of t0 . Then
PF,G (F (t0 ) ∈ Cn,α ) → P (D ≤ dα ) = 1 − α, as n → ∞.

References

[GW92] P. Groeneboom and J. A. Wellner, Information bounds and nonparametric
maximum likelihood estimation, Birkhäuser Verlag, Basel, 1992.

[Maa07a] Marloes H. Maathuis, Survival analysis for interval censored data,
part 1, Seminar of Statistics, ETH Zurich, 2007,
http://stat.ethz.ch/∼maathuis/teaching/fall07/notes1a.pdf (last accessed
April 29, 2011).

[Maa07b] Marloes H. Maathuis, Survival analysis for interval censored data,
part 2, Seminar of Statistics, ETH Zurich, 2007,
http://stat.ethz.ch/∼maathuis/teaching/fall07/notes2b.pdf (last accessed
April 29, 2011).
