Survival - Notes (Lecture 4)

Lecture 4: Survival Models and Analysis
Estimating the Survival Function
December 1, 2021
Survival Models and Analysis December 1, 2021 1 / 29

Survival Models and Analysis
Other Justifications for KM Estimator

I. Likelihood-based derivation(Cox and Oakes)
For a discrete failure time variable, define:
dj number of failures at aj .
rj number of individuals at risk at aj (including those censored at aj )
λj Pr(death) in j-th interval (conditional on survival to start of
interval)
The likelihood is that of g independent binomials:
g
d
Y
L(λ) = λj j (1 − λj )rj −dj
j=1
Therefore, the maximum likelihood estimator of λj is:
dj
λ̂j =
rj
Proof:
g
d
Y
L(λ) = λj j (1 − λj )rj −dj
j=1
d
X
LogL(λ) = [log λj j (1 − λj )rj −dj ]
X
= [dj log λj + (rj − dj )log (1 − λj )]
∂logL(λ) dj rj − dj
= − =0
∂λ λj 1 − λj
dj r j − dj
=
rj 1 − λj
dj (1 − λj ) = λj (rj − dj )
dj − λj rj = 0
dj
λj =
rj
Now we plug in the MLE’s of λ to estimate S(t):

Y
Ŝ(t) = (1 − λ̂j )
j:aj <t
Y dj
= (1 − )
rj
j:aj <t
II. Redistribute to the right justification (Efron, 1967)

In the absence of censoring, Ŝ(t) is just the proportion of individuals with
T ≥ t.
The idea behind Efron’s approach is to spread the contributions of
censored observations out over all the possible times to their right.
Algorithm:
• Step (1): arrange the n observed times (deaths or censorings) in
increasing order. If there are ties, put censored after deaths.
• Step (2): Assign weight (1/n) to each time.
• Step (3): Moving from left to right, each time you encounter a censored
observation, distribute its mass to all times to its right.
• Step (4): Calculate Ŝj by subtracting the final weight for time j from
Ŝj−1
Example of “redistribute to the right” algorithm
Consider the event times: 2, 2.5+, 3, 3, 4, 4.5+, 5, 6, 7. The algorithm
goes as follows:

This comes out the same as the product limit approach.

e.g Step 3(a) and 3(b)
0.11
0.25 = 0.22 + 2( )
7
0.11
0.13 = 0.11 +
7
0.13
0.17 = 0.13 + ...
3
Step (4):
1 − 0.11 = 0.889 0.508 − 0 = 0.508

0.889 − 0 = 0.889 0.508 − 0.17 = 0.339
0.889 − 0.25 = 0.635 0.339 − 0.17 = 0.169
0.635 − 0.13 = 0.508 0.169 − 0.17 = 0.000
Properties of the KM Estimator

In the case of no censoring
# deaths at t or greater
Ŝ(t) =
n
Where n is the number of individuals in the study. This is just like an
estimated probability from a binomial distribution, so we have:
Ŝ(t) ' N (S(t), S(t)[1 − S(t)]/n)
How does censoring affect this?
• Ŝ(t) is still approximately normal
• The mean of Ŝ(t) converges to the true S(t)
• The variance is a bit more complicated (since the denominator n
includes some censored observations).
Once we get the variance, then we can construct (pointwise) (1 − α)%
confidence bands about Ŝ(t): Ŝ(t) ± Z1−α/2 se[Ŝ(t)]
Greenwood’s formula
We can think of the KM estimator as

Y
Ŝ(t) = (1 − λ̂j )
j:τj <t
where λ̂j = dj /rj .

Since the λ̂j ’s are just binomial proportions, we can apply standard
likelihood theory to show that each λ̂j is approximately normal, with mean
the true λj , and
λ̂j (1 − λ̂j )
var (λ̂j ) ≈
rj
Also, the λ̂j ’s are independent in large enough samples. Since Ŝ(t) is a
function of the λj ’s, we can estimate its variance using the delta method:
Delta method: If Y is normal with mean µ and variance σ 2 , then g (Y ) is

approximately normally distributed with mean g (µ) and variance
[g 0 (µ)]2 σ 2 .
Two specific examples of the delta method:
Given that g (Y ) ∼ N{g (µ), [g 0 (µ)]2 σ 2 }
(A) Z = log (Y )
2
then Z ∼ N log (µ), µ1 σ 2
(B) Z = exp(Y ) h i
then Z ∼ N e µ , [e µ ]2 σ 2
The examples above use the following results from calculus:

d 1 du
logu =
dx u dx

d u u du
e =e
dx dx
Greenwood’s formula (continued)
λ̂j (1 − λ̂j )
Var (λ̂j ) =
rj

dj dj
rj 1 − rj
=
rj
dj (rj − dj )
=
rj2
 
(1 − λˆj )
Y
Var (Ŝ(t)) = Var 
j:aj <t
Instead of dealing with Ŝ(t) directly, we will look at its log:

Determine the var [log Ŝ(t)]
Y
log [Ŝ(t)] = log (1 − λ̂j )
j
X
= log (1 − λ̂j )
j
Thus, by approximate independence of the λ̂j ’s,

X h i
Var log [Ŝ(t)] = var log (1 − λ̂j )
j
 !2 
X −1
by (A) =  var (λ̂j )
j
1 − λ̂j
 !2 " #
X 1 λ̂j (1 − λ̂j ) 
=
 1 − λ̂j rj 
j

dj
X λ̂j X rj
= = d
j
(1 − λ̂j )r j j (1 − rjj )rj
X dj

=
rj (rj − dj
j
h i
Now Ŝ(t) = exp log Ŝ(t)
Var (Ŝ(t)) = Var
h (exp(log Ŝ(t)))
i
By (B) Z ∼ N e µ , [e µ ]2 σ 2
h i2
Var (Ŝ(t)) = e log Ŝ(t) var (log Ŝ(t))
h i2 X dj

= Ŝ(t) Greenwood’s formula
rj (rj − dj
j

Confidence intervals
For a 95% confidence interval, we could use;

h i
Ŝ(t) ± Z1−α/2 se Ŝ(t)
where se[Ŝ(t)] is calculated using Greenwood’s formula.
Problem:
This approach can yield values > 1 or < 0. Better approach: Get a 95%
confidence interval for
L(t) = log (−log (S(t)))
Since this quantity is unrestricted, the confidence interval will be in the

proper range when we transform back.
To see why this works, note the following:
• Since Ŝ(t) is an estimated probability

• Taking the log of Ŝ(t) has bounds:
−∞ ≤ log [Ŝ(t)] ≤ 0
• Taking the opposite:
0 ≤ −log [Ŝ(t)] ≤ ∞
• Taking the log again:
h i
−∞ ≤ log −log [Ŝ(t)] ≤ ∞
To transform back, reverse steps with S(t) = exp(−exp(L(t))

Log-log Approach for Confidence Intervals:

(a) Define L(t) = log (−log (S(t)))
(b) Form a 95% confidence interval for L(t) based on L̂(t), yielding
[L̂(t) − A, L̂(t)) + A]
(c) Since S(t) = exp(−exp(L(t)), the confidence bounds for the 95% CI
on S(t) are h i
exp(−e (L̂(t)+A) ), exp(−e (L̂(t)−A) )
(Note that the upper and lower bound switch)
(d) Substituting L̂(t) = log (−log (Ŝ(t))) back into the above bounds, we
get confidence bounds of
A −A

[Ŝ(t)]e , [Ŝ(t)]e
What is A?
• A is 1.96se L̂(t) . To calculate this, we need to calculate
h i
Var L̂(t) = Var log (−log (Ŝ(t))
• From our previous calculations, we know

X dj

Var (log [Ŝ(t)]) =
rj (rj − dj
j
• Applying the delta method as in example (A), we get:

h i
Var L̂(t) = Var log (−log (Ŝ(t))
" #2 " #2
1 1 X dj

= Var (log Ŝ(t)) =
log Ŝ(t) log Ŝ(t) rj (rj − dj )
j
• We take the square root of the above to get se(L̂(t)), and then form the
confidence intervals as:
±1.96se(L̂(t))
Ŝ(t)e
This is the approach that Stata uses. R gives an option to calculate these
bounds (conf.type="log-log" in surv.fit
Summary of Confidence Intervals on S(t)

• Calculate Ŝ(t) ± 1.96se[Ŝ(t)] where se[Ŝ(t)] is calculated using
Greenwood’s formula, and replace negative lower bounds by 0 and upper
bounds greater than 1 by 1.
This approach was Recommended by Collett, it is the default using SAS

and is not very satisfactory.
• Use a log transformation to stabilize the variance and allow for
non-symmetric confidence intervals. This is what is normally done for
the confidence interval of an estimated odds ratio.
Use h i X dj
Var log (Ŝ(t)) =
(rj − dj )rj
j
(Greenwood’s formula) This is the default in R.
• Use the log-log transformation just described. It is somewhat
complicated, but always yields proper bounds. This is the default in
Stata.
Different software might use different approaches to calculate the CI.

Software for Kaplan-Meier Curves
• Stata -stset and sts commands
• SAS - proc lifetest
• R -surv.fit(time,status)
Defaults for Confidence Interval Calculations

h i
• Stata -”log-log” ⇒ L̂(t) ± 1.96se L̂(t)
where L(t) = log [−log (S(t))]
h i
• SAS - ”plain” ⇒ Ŝ(t) ± 1.96se Ŝ(t)
h i
• R - ”log” ⇒ log Ŝ(t) ± 1.96se log Ŝ(t)
but R will also give either of the other two options if you request them.
R Commands
library(survival) Loading the survival package
install.packages("survminer") Installing package survminer
library(survminer) Loading the survminer package
time<-c(6, 6, 6, 6, 7, 9, 10, 10, 11, 13, 16, 17, 19, 20, 22,
23, 25, 32, 32, 34, 35) Creating the time variable.
status<-c(0,1,1,1,1,0,0,1,0,1,1,0,0,0,1,1,0,0,0,0,0) Creating
the event variable
data<-data.frame(time,status) Creating the survival dataframe

surv.object<-Surv(time,status) Creating the survival object
fit<-survfit(surv.object1, data=data) Fit survival data using

the Kaplan-Meier method
summary(fit)
ggsurvplot(fit,data=data) Plotting the survival curve

Kaplan Meier plot

Stata Commands
Create a file called “leukemia.dat” with the raw data, with a column for
treatment, weeks to relapse (i.e., duration of remission), and relapse
status:

If the dataset has already been created and loaded into Stata, then you
can substitute the following commands for initializing the data:


SAS Commands for Kaplan Meier Estimator - PROC LIFETEST
The SAS command for the Kaplan-Meier estimate is:
time failtime*censor(1);
time failtime*failind(0);
The first variable is the failure time, and the second is the failure or
censoring indicator. In parentheses you need to put the specific numeric
value that corresponds to censoring.
The upper and lower confidence limits on Ŝ(t) are included in the data set
“OUTSURV” when specified. The upper and lower limits are called:
sdf ucl, sdf lcl.

SAS Commands


Figure 1: KM Survival Estimate and Confidence intervals (type ‘log-log’)

Means,P Medians, Quantiles based on the KM

Mean: kj=1 τj Pr (T = τj )
Median: by definition, this is the time, τ , such that S(τ ) = 0.5. However,
in practice, it is defined as the smallest time such that ŝ(τ ) ≤ 0.5. The
median is more appropriate for censored survival data than the mean.
For the treated leukemia patients, we find:
Ŝ(22) = 0.5378
Ŝ(23) = 0.4482
The median is thus 23. This can also be seen visually on the graph to the
left.
Lower quartile (25th percentile):
the smallest time (LQ) such that Ŝ(LQ) ≤ 0.75
Upper quartile (75th percentile):
the smallest time (UQ) such that Ŝ(UQ) ≤ 0.25

Survival - Notes (Lecture 4)

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Survival - Notes (Lecture 4)

Uploaded by

Copyright:

Available Formats

Lecture 4: Survival Models and Analysis

Estimating the Survival Function

Survival Models and Analysis December 1, 2021 1 / 29

Other Justifications for KM Estimator

Now we plug in the MLE’s of λ to estimate S(t):

II. Redistribute to the right justification (Efron, 1967)

Survival Models and Analysis December 1, 2021 5 / 29

This comes out the same as the product limit approach.

1 − 0.11 = 0.889 0.508 − 0 = 0.508

Properties of the KM Estimator

We can think of the KM estimator as

where λ̂j = dj /rj .

Delta method: If Y is normal with mean µ and variance σ 2 , then g (Y ) is

Greenwood’s formula (continued)

Instead of dealing with Ŝ(t) directly, we will look at its log:

Thus, by approximate independence of the λ̂j ’s,

Survival Models and Analysis December 1, 2021 11 / 29

Survival Models and Analysis December 1, 2021 12 / 29

For a 95% confidence interval, we could use;

where se[Ŝ(t)] is calculated using Greenwood’s formula.

L(t) = log (−log (S(t)))

Since this quantity is unrestricted, the confidence interval will be in the

To see why this works, note the following:

• Since Ŝ(t) is an estimated probability

Survival Models and Analysis December 1, 2021 14 / 29

Log-log Approach for Confidence Intervals:

• From our previous calculations, we know

• Applying the delta method as in example (A), we get:

Summary of Confidence Intervals on S(t)

This approach was Recommended by Collett, it is the default using SAS

Different software might use different approaches to calculate the CI.

Defaults for Confidence Interval Calculations

library(survival) Loading the survival package

install.packages("survminer") Installing package survminer

library(survminer) Loading the survminer package

data<-data.frame(time,status) Creating the survival dataframe

fit<-survfit(surv.object1, data=data) Fit survival data using

ggsurvplot(fit,data=data) Plotting the survival curve

Survival Models and Analysis December 1, 2021 20 / 29

Kaplan Meier plot

Survival Models and Analysis December 1, 2021 21 / 29

Survival Models and Analysis December 1, 2021 22 / 29

Survival Models and Analysis December 1, 2021 23 / 29

Survival Models and Analysis December 1, 2021 24 / 29

SAS Commands for Kaplan Meier Estimator - PROC LIFETEST

The SAS command for the Kaplan-Meier estimate is:

Survival Models and Analysis December 1, 2021 25 / 29

Survival Models and Analysis December 1, 2021 26 / 29

Survival Models and Analysis December 1, 2021 27 / 29

Figure 1: KM Survival Estimate and Confidence intervals (type ‘log-log’)

Means,P Medians, Quantiles based on the KM

You might also like