You are on page 1of 29

Lecture 4: Survival Models and Analysis

Estimating the Survival Function

December 1, 2021

Survival Models and Analysis December 1, 2021 1 / 29


Survival Models and Analysis
Estimating the Survival Function
Other Justifications for KM Estimator

Other Justifications for KM Estimator


I. Likelihood-based derivation(Cox and Oakes)
For a discrete failure time variable, define:
dj number of failures at aj .
rj number of individuals at risk at aj (including those censored at aj )
λj Pr(death) in j-th interval (conditional on survival to start of
interval)
The likelihood is that of g independent binomials:
g
d
Y
L(λ) = λj j (1 − λj )rj −dj
j=1
Therefore, the maximum likelihood estimator of λj is:

dj
λ̂j =
rj
Survival Models and Analysis December 1, 2021 2 / 29
Survival Models and Analysis
Estimating the Survival Function
Other Justifications for KM Estimator

Proof:

g
d
Y
L(λ) = λj j (1 − λj )rj −dj
j=1
d
X
LogL(λ) = [log λj j (1 − λj )rj −dj ]
X
= [dj log λj + (rj − dj )log (1 − λj )]
∂logL(λ) dj rj − dj
= − =0
∂λ λj 1 − λj
dj r j − dj
=
rj 1 − λj
dj (1 − λj ) = λj (rj − dj )
dj − λj rj = 0
dj
λj =
rj
Survival Models and Analysis December 1, 2021 3 / 29
Survival Models and Analysis
Estimating the Survival Function
Other Justifications for KM Estimator

Now we plug in the MLE’s of λ to estimate S(t):


Y
Ŝ(t) = (1 − λ̂j )
j:aj <t
Y dj
= (1 − )
rj
j:aj <t

II. Redistribute to the right justification (Efron, 1967)


In the absence of censoring, Ŝ(t) is just the proportion of individuals with
T ≥ t.
The idea behind Efron’s approach is to spread the contributions of
censored observations out over all the possible times to their right.
Algorithm:
• Step (1): arrange the n observed times (deaths or censorings) in
increasing order. If there are ties, put censored after deaths.
• Step (2): Assign weight (1/n) to each time.
Survival Models and Analysis December 1, 2021 4 / 29
Survival Models and Analysis
Estimating the Survival Function
Other Justifications for KM Estimator

• Step (3): Moving from left to right, each time you encounter a censored
observation, distribute its mass to all times to its right.
• Step (4): Calculate Ŝj by subtracting the final weight for time j from
Ŝj−1
Example of “redistribute to the right” algorithm
Consider the event times: 2, 2.5+, 3, 3, 4, 4.5+, 5, 6, 7. The algorithm
goes as follows:

Survival Models and Analysis December 1, 2021 5 / 29


Survival Models and Analysis
Estimating the Survival Function
Other Justifications for KM Estimator

This comes out the same as the product limit approach.


e.g Step 3(a) and 3(b)

0.11
0.25 = 0.22 + 2( )
7
0.11
0.13 = 0.11 +
7
0.13
0.17 = 0.13 + ...
3
Step (4):

1 − 0.11 = 0.889 0.508 − 0 = 0.508


0.889 − 0 = 0.889 0.508 − 0.17 = 0.339
0.889 − 0.25 = 0.635 0.339 − 0.17 = 0.169
0.635 − 0.13 = 0.508 0.169 − 0.17 = 0.000
Survival Models and Analysis December 1, 2021 6 / 29
Survival Models and Analysis
Estimating the Survival Function
Properties of the KM Estimator

Properties of the KM Estimator


In the case of no censoring
# deaths at t or greater
Ŝ(t) =
n
Where n is the number of individuals in the study. This is just like an
estimated probability from a binomial distribution, so we have:
Ŝ(t) ' N (S(t), S(t)[1 − S(t)]/n)
How does censoring affect this?
• Ŝ(t) is still approximately normal
• The mean of Ŝ(t) converges to the true S(t)
• The variance is a bit more complicated (since the denominator n
includes some censored observations).
Once we get the variance, then we can construct (pointwise) (1 − α)%
confidence bands about Ŝ(t): Ŝ(t) ± Z1−α/2 se[Ŝ(t)]
Survival Models and Analysis December 1, 2021 7 / 29
Survival Models and Analysis
Estimating the Survival Function
Properties of the KM Estimator

Greenwood’s formula

We can think of the KM estimator as


Y
Ŝ(t) = (1 − λ̂j )
j:τj <t

where λ̂j = dj /rj .


Since the λ̂j ’s are just binomial proportions, we can apply standard
likelihood theory to show that each λ̂j is approximately normal, with mean
the true λj , and
λ̂j (1 − λ̂j )
var (λ̂j ) ≈
rj
Also, the λ̂j ’s are independent in large enough samples. Since Ŝ(t) is a
function of the λj ’s, we can estimate its variance using the delta method:
Survival Models and Analysis December 1, 2021 8 / 29
Survival Models and Analysis
Estimating the Survival Function
Properties of the KM Estimator

Delta method: If Y is normal with mean µ and variance σ 2 , then g (Y ) is


approximately normally distributed with mean g (µ) and variance
[g 0 (µ)]2 σ 2 .
Two specific examples of the delta method:
Given that g (Y ) ∼ N{g (µ), [g 0 (µ)]2 σ 2 }
(A) Z = log (Y ) 
 2 
then Z ∼ N log (µ), µ1 σ 2
(B) Z = exp(Y ) h i
then Z ∼ N e µ , [e µ ]2 σ 2
The examples above use the following results from calculus:
 
d 1 du
logu =
dx u dx
 
d u u du
e =e
dx dx
Survival Models and Analysis December 1, 2021 9 / 29
Survival Models and Analysis
Estimating the Survival Function
Properties of the KM Estimator

Greenwood’s formula (continued)

λ̂j (1 − λ̂j )
Var (λ̂j ) =
rj
 
dj dj
rj 1 − rj
=
rj
dj (rj − dj )
=
rj2
 

(1 − λˆj )
Y
Var (Ŝ(t)) = Var 
j:aj <t

Instead of dealing with Ŝ(t) directly, we will look at its log:


Determine the var [log Ŝ(t)]
Survival Models and Analysis December 1, 2021 10 / 29
Survival Models and Analysis
Estimating the Survival Function
Properties of the KM Estimator

Y
log [Ŝ(t)] = log (1 − λ̂j )
j
X
= log (1 − λ̂j )
j

Thus, by approximate independence of the λ̂j ’s,


  X h i
Var log [Ŝ(t)] = var log (1 − λ̂j )
j
 !2 
X −1
by (A) =  var (λ̂j )
j
1 − λ̂j
 !2 " #
X 1 λ̂j (1 − λ̂j ) 
=
 1 − λ̂j rj 
j

Survival Models and Analysis December 1, 2021 11 / 29


Survival Models and Analysis
Estimating the Survival Function
Properties of the KM Estimator

dj
X λ̂j X rj
= = d
j
(1 − λ̂j )r j j (1 − rjj )rj
X dj

=
rj (rj − dj
j
h i
Now Ŝ(t) = exp log Ŝ(t)
Var (Ŝ(t)) = Var
h (exp(log Ŝ(t)))
i
By (B) Z ∼ N e µ , [e µ ]2 σ 2

h i2
Var (Ŝ(t)) = e log Ŝ(t) var (log Ŝ(t))
h i2 X  dj

= Ŝ(t) Greenwood’s formula
rj (rj − dj
j

Survival Models and Analysis December 1, 2021 12 / 29


Survival Models and Analysis
Estimating the Survival Function
Properties of the KM Estimator

Confidence intervals

For a 95% confidence interval, we could use;


h i
Ŝ(t) ± Z1−α/2 se Ŝ(t)

where se[Ŝ(t)] is calculated using Greenwood’s formula.

Problem:

This approach can yield values > 1 or < 0. Better approach: Get a 95%
confidence interval for

L(t) = log (−log (S(t)))

Since this quantity is unrestricted, the confidence interval will be in the


proper range when we transform back.
Survival Models and Analysis December 1, 2021 13 / 29
Survival Models and Analysis
Estimating the Survival Function
Properties of the KM Estimator

To see why this works, note the following:

• Since Ŝ(t) is an estimated probability


• Taking the log of Ŝ(t) has bounds:
−∞ ≤ log [Ŝ(t)] ≤ 0
• Taking the opposite:
0 ≤ −log [Ŝ(t)] ≤ ∞
• Taking the log again:
h i
−∞ ≤ log −log [Ŝ(t)] ≤ ∞
To transform back, reverse steps with S(t) = exp(−exp(L(t))

Survival Models and Analysis December 1, 2021 14 / 29


Survival Models and Analysis
Estimating the Survival Function
Properties of the KM Estimator

Log-log Approach for Confidence Intervals:


(a) Define L(t) = log (−log (S(t)))
(b) Form a 95% confidence interval for L(t) based on L̂(t), yielding
[L̂(t) − A, L̂(t)) + A]
(c) Since S(t) = exp(−exp(L(t)), the confidence bounds for the 95% CI
on S(t) are h i
exp(−e (L̂(t)+A) ), exp(−e (L̂(t)−A) )
(Note that the upper and lower bound switch)
(d) Substituting L̂(t) = log (−log (Ŝ(t))) back into the above bounds, we
get confidence bounds of
 A −A

[Ŝ(t)]e , [Ŝ(t)]e
What is A?  
• A is 1.96se L̂(t) . To calculate this, we need to calculate
  h i
Var L̂(t) = Var log (−log (Ŝ(t))
Survival Models and Analysis December 1, 2021 15 / 29
Survival Models and Analysis
Estimating the Survival Function
Properties of the KM Estimator

• From our previous calculations, we know


X dj

Var (log [Ŝ(t)]) =
rj (rj − dj
j

• Applying the delta method as in example (A), we get:


  h i
Var L̂(t) = Var log (−log (Ŝ(t))
" #2 " #2
1 1 X dj

= Var (log Ŝ(t)) =
log Ŝ(t) log Ŝ(t) rj (rj − dj )
j

• We take the square root of the above to get se(L̂(t)), and then form the
confidence intervals as:
±1.96se(L̂(t))
Ŝ(t)e
This is the approach that Stata uses. R gives an option to calculate these
bounds (conf.type="log-log" in surv.fit
Survival Models and Analysis December 1, 2021 16 / 29
Survival Models and Analysis
Estimating the Survival Function
Properties of the KM Estimator

Summary of Confidence Intervals on S(t)


• Calculate Ŝ(t) ± 1.96se[Ŝ(t)] where se[Ŝ(t)] is calculated using
Greenwood’s formula, and replace negative lower bounds by 0 and upper
bounds greater than 1 by 1.

This approach was Recommended by Collett, it is the default using SAS


and is not very satisfactory.
• Use a log transformation to stabilize the variance and allow for
non-symmetric confidence intervals. This is what is normally done for
the confidence interval of an estimated odds ratio.
Use h i X dj
Var log (Ŝ(t)) =
(rj − dj )rj
j
(Greenwood’s formula) This is the default in R.
• Use the log-log transformation just described. It is somewhat
complicated, but always yields proper bounds. This is the default in
Stata.
Survival Models and Analysis December 1, 2021 17 / 29
Survival Models and Analysis
Estimating the Survival Function
Properties of the KM Estimator

Different software might use different approaches to calculate the CI.


Software for Kaplan-Meier Curves
• Stata -stset and sts commands
• SAS - proc lifetest
• R -surv.fit(time,status)

Defaults for Confidence Interval Calculations


h i
• Stata -”log-log” ⇒ L̂(t) ± 1.96se L̂(t)
where L(t) = log [−log (S(t))]
h i
• SAS - ”plain” ⇒ Ŝ(t) ± 1.96se Ŝ(t)
h i
• R - ”log” ⇒ log Ŝ(t) ± 1.96se log Ŝ(t)

but R will also give either of the other two options if you request them.
Survival Models and Analysis December 1, 2021 18 / 29
Survival Models and Analysis
Estimating the Survival Function
Properties of the KM Estimator

R Commands

library(survival) Loading the survival package

install.packages("survminer") Installing package survminer

library(survminer) Loading the survminer package

time<-c(6, 6, 6, 6, 7, 9, 10, 10, 11, 13, 16, 17, 19, 20, 22,
23, 25, 32, 32, 34, 35) Creating the time variable.

status<-c(0,1,1,1,1,0,0,1,0,1,1,0,0,0,1,1,0,0,0,0,0) Creating
the event variable
Survival Models and Analysis December 1, 2021 19 / 29
Survival Models and Analysis
Estimating the Survival Function
Properties of the KM Estimator

data<-data.frame(time,status) Creating the survival dataframe


surv.object<-Surv(time,status) Creating the survival object

fit<-survfit(surv.object1, data=data) Fit survival data using


the Kaplan-Meier method

summary(fit)

ggsurvplot(fit,data=data) Plotting the survival curve

Survival Models and Analysis December 1, 2021 20 / 29


Survival Models and Analysis
Estimating the Survival Function
Properties of the KM Estimator

Kaplan Meier plot

Survival Models and Analysis December 1, 2021 21 / 29


Survival Models and Analysis
Estimating the Survival Function
Properties of the KM Estimator

Stata Commands
Create a file called “leukemia.dat” with the raw data, with a column for
treatment, weeks to relapse (i.e., duration of remission), and relapse
status:

Survival Models and Analysis December 1, 2021 22 / 29


Survival Models and Analysis
Estimating the Survival Function
Properties of the KM Estimator

If the dataset has already been created and loaded into Stata, then you
can substitute the following commands for initializing the data:

Survival Models and Analysis December 1, 2021 23 / 29


Survival Models and Analysis
Estimating the Survival Function
Properties of the KM Estimator

Survival Models and Analysis December 1, 2021 24 / 29


Survival Models and Analysis
Estimating the Survival Function
Properties of the KM Estimator

SAS Commands for Kaplan Meier Estimator - PROC LIFETEST

The SAS command for the Kaplan-Meier estimate is:

time failtime*censor(1);
time failtime*failind(0);

The first variable is the failure time, and the second is the failure or
censoring indicator. In parentheses you need to put the specific numeric
value that corresponds to censoring.

The upper and lower confidence limits on Ŝ(t) are included in the data set
“OUTSURV” when specified. The upper and lower limits are called:
sdf ucl, sdf lcl.

Survival Models and Analysis December 1, 2021 25 / 29


Survival Models and Analysis
Estimating the Survival Function
Properties of the KM Estimator

SAS Commands

Survival Models and Analysis December 1, 2021 26 / 29


Survival Models and Analysis
Estimating the Survival Function
Properties of the KM Estimator

Survival Models and Analysis December 1, 2021 27 / 29


Survival Models and Analysis
Estimating the Survival Function
Properties of the KM Estimator

Figure 1: KM Survival Estimate and Confidence intervals (type ‘log-log’)


Survival Models and Analysis December 1, 2021 28 / 29
Survival Models and Analysis
Estimating the Survival Function
Properties of the KM Estimator

Means,P Medians, Quantiles based on the KM


Mean: kj=1 τj Pr (T = τj )
Median: by definition, this is the time, τ , such that S(τ ) = 0.5. However,
in practice, it is defined as the smallest time such that ŝ(τ ) ≤ 0.5. The
median is more appropriate for censored survival data than the mean.
For the treated leukemia patients, we find:

Ŝ(22) = 0.5378
Ŝ(23) = 0.4482

The median is thus 23. This can also be seen visually on the graph to the
left.
Lower quartile (25th percentile):
the smallest time (LQ) such that Ŝ(LQ) ≤ 0.75
Upper quartile (75th percentile):
the smallest time (UQ) such that Ŝ(UQ) ≤ 0.25
Survival Models and Analysis December 1, 2021 29 / 29

You might also like