Statistics 581

SURVIVAL DATA ANALYSIS - LIFE TABLE METHODS

Wei Zhang, Synergic Resources Corporation

NESUG '93 Proceedings

Key words: Censored observations, Cohort, Hazard function.

Abstract

Even though SAS PC DOS version 6.04 has been released for quite some
time, its LIFETEST procedure, especially the life table option, may still
be new to some SAS statistical users. This tutorial focuses on life table
techniques for estimating the survival function, the hazard function, and
lifetime (median residual lifetime). The use of the SAS LIFETEST procedure
to create life tables is discussed along with a detailed example of its
application. The example is from a utility conservation program. However,
the technique can be used in many other areas, such as pharmaceutical
research, marketing research, and demographic research. The key concept of
censored observations in survival data analysis is introduced and the SAS
coding techniques are provided. The distinction between the medical
(clinical) life table and the population (demographic) life table is also
discussed.

Introduction

Although life table methods have been widely used by demographers and
medical researchers, they are not well known to many others. Some
researchers use parametric approaches for every kind of survival data
estimation without realizing that some nonparametric methods, such as life
tables, can be good alternatives because they are easy to use and to
understand.

It is inappropriate to think that parametric methods are always better
than nonparametric methods. Nonparametric or distribution-free methods are
less efficient than parametric methods when survival times follow a
theoretical distribution and more efficient when no suitable theoretical
distribution is known (Elisa T. Lee, 1992). According to Lee,
nonparametric methods should be used to analyze survival data before
attempting to fit a theoretical distribution. In the following sections,
some general concepts about survival analysis and its applications are
introduced. Then life table methods, together with the SAS LIFETEST
procedure, are described in detail.

What is Survival Data Analysis?

Survival data analysis centers on the period of time it takes a group of
individuals (persons or things) to reach a predefined event, called
failure. Survival data can come from any type of subject, such as the
durations of employment in companies, the periods during which the stock
market is up, and the survival times of patients in a clinical trial. In
many studies, researchers follow subjects until one of their
characteristics disappears. However, the study may be completed before the
characteristic disappears, or subjects may withdraw in the middle of a
study. In these cases, the survival times are incomplete. Incomplete
survival times are called censored times. Completed survival times are
called event times. All survival analysis techniques take into account the
fact that not all subjects being studied have complete survival times at
the point when the data are collected. Therefore, the techniques in
survival analysis differ from conventional statistical methods, either
parametric or nonparametric, because survival data are almost always
incomplete.

The concept of censored observations is very important for understanding
survival analysis. Figure 1 illustrates how censored observations are
defined.

Figure 1. An Example of Censored Observations
[Figure: lifetimes of measures A, B, and C plotted against year.]

As illustrated in Figure 1, observations A and B have completed their
lifetimes (e.g., A started in 1981 and failed in 1983). However, the
lifetime of observation C was not completed at the time of data collection
in 1993 (the survey cut-off date), i.e., observation C was still alive.
Thus, observation C is a censored observation. Although observation C's
complete lifetime was not available, it should not be excluded from the
survival analysis, because in estimating a lifetime for a group we need
the total number exposed to a risk of failure (this includes both failures
and censored observations). In other words, any observation exposed to a
risk of failure, for the purpose of this study, has either a complete or
an incomplete lifetime history and must be included in the survival
analysis. Including censored observations in calculating a group lifetime
is the unique feature of survival analysis. In the above example, the nine
years (1984-1993) of life history from observation C contribute
information that is useful in estimating a group lifetime.

SAS has three procedures for survival analysis: LIFETEST, LIFEREG, and
PHREG. The LIFETEST procedure uses nonparametric methods, including life
tables. The other two use regression models: LIFEREG fits parametric
models, and PHREG fits the semiparametric Cox proportional hazards model.
Primarily, LIFETEST is used to estimate the survival time, while LIFEREG
and PHREG are used to investigate the factors affecting the survival time.

Where Can Survival Analysis Be Used?

Survival analysis can be used in many different areas. Medical researchers
may use it to study patients' survival time after a medical treatment.
Marketing researchers at telephone or credit card companies can use
survival analysis to estimate the average length of stay of their
customers. The example given in this paper is from a research project for
a utility company. The subject studied was the water heater tank wrap, one
kind of insulation measure from an energy conservation program. The
methodology used in this study can be applied to many other types of
studies involving survival data.

Life Table Methods

There are two general types of life tables: the population or demographic
life table and the medical or clinical life table. In SAS LIFETEST, the
life table option produces a medical life table by default. Each type of
life table can be further classified as a cohort (longitudinal) life table
or a period (cross-sectional) life table. In the example given in this
paper, a period life table was used for the analysis. The main reason for
this choice is that the insulation measures in the survey data were
installed from 1981 to 1988 and are therefore not in the same cohort (see
Note 1). The period life table is capable of including eight years of data
in one life table and generating meaningful statistical estimates from
these observations even though the data are from different cohorts. The
period life table is a mathematical model of the life history of a
hypothetical cohort. The key assumption of the period life table is that
the hazard function (a point estimate of age-specific failure experience)
during the current time represents the failure experience of the whole
cohort. That is, we assume the failure rate of one type of measure that is
"x" years old now to be the same failure rate that the measures "(x-1)"
years old today will have next year, when they become "x" years old.

The following statistical estimates are generated in the life table
analysis: survivorship function, probability density function, hazard
function, effective sample size, and conditional probability of failure.
Definitions of these statistical terms can be found in Appendix A. As
mentioned before, the key concept in survival analysis is the censored
observation. Therefore, it is important to understand how life table
methods handle censored observations. The effective sample size takes care
of the censoring problem. As can be seen from its definition, the
effective sample size equals the actual sample size minus one half of the
censored observations in the corresponding interval. The reason that
censored observations should be subtracted from the actual sample size is
that, from the time the observations are censored, they are no longer in a
position to provide useful information for survival estimation. But before
the time they are censored, they can provide such information. This is why
censored observations cannot be excluded from the survival data at the
beginning. The reason that only one half of the censored observations is
subtracted from the actual sample size is that actual as well as
effective sample sizes are point estimates. Normally, sample sizes in
survival data are defined at the mid-point of each age interval. Thus, if
censored observations are symmetrically distributed within each
survival-year interval, only one half of the censored observations needs
to be subtracted from the actual sample size to get the effective sample
size.

SAS LIFETEST Procedure

PROC LIFETEST has a life table option, LT (METHOD=LT). If this option is
not specified, PROC LIFETEST uses the product-limit method by default for
survival estimation. The product-limit estimate can be considered a
special case of the life table estimate in which each interval contains
only one observation. The basic input data for constructing a life table
are the following three dates: the date of installation (starting date),
the date of removal (ending date), and the date of data collection (survey
cut-off date). The survival history contributed to the life table by the
event observations is obtained by subtracting the date of installation
from the date of removal; the survival history contributed by the censored
observations is obtained by subtracting the date of installation from the
date of data collection. The detailed SAS code for a life table is
provided in Appendix B. As can be seen from the SAS code, all one has to
do in life table methods is to make the durations of survival time
available for both events and censored observations. These durations are
then used as the variable in the TIME statement of PROC LIFETEST. The
PLOTS= option in LIFETEST can produce not only plots of the estimated
survival function, hazard function, and density function against time, but
also a plot of the negative log of the estimated survival function against
time (by specifying LS) and a plot of the log of the negative log of the
estimated survival function against log time (by specifying LLS). The LS
and LLS plots provide an empirical check of the appropriateness of the
exponential model and the Weibull model, respectively, for the survival
data. The LLS plot is especially useful for checking the proportional
hazards model assumption before attempting to use PHREG. The OUTSURV=
option writes the estimates and their 95% confidence limits to an output
data set.

Results of the Survival Analysis

From these life tables, we are able to examine both the level of measure
retention and the patterns related to this retention in greater detail
than is possible through more simplistic univariate analysis techniques.

Table 1 summarizes the major statistical estimates from the life table.
More detailed findings can be found in Appendix C. Both point and interval
estimates of the survivorship function and hazard function are provided.
The point estimates can be used as a framework for comparison with
estimates of measure life developed by engineers. The interval estimates
can serve as a range within which the true measure life lies. It should be
noted that both the upper and lower confidence limits for the life table
estimates were calculated under the assumption of a normal distribution at
a confidence level of 95%. However, a survival distribution is often
skewed or otherwise departs from a normal distribution. Since the survival
distribution may not be exactly normal, the level of confidence associated
with the interval estimates (i.e., the range within which the true measure
life lies) may deviate slightly from 95%. Therefore, caution should be
taken when using the interval estimates for planning and/or forecasting.

Table 1.
Survival Analysis for the Water Heater Tank Wrap

Year          Survival    Year           Hazard
(Interval)    Function    (Mid Point)    Function

0-1           1.000        0.5           0.002
1-2           0.998        1.5           0.010
2-3           0.988        2.5           0.028
3-4           0.961        3.5           0.018
4-5           0.944        4.5           0.038
5-6           0.909        5.5           0.045
6-7           0.869        6.5           0.052
7-8           0.825        7.5           0.048
8-9           0.786        8.5           0.061
9-10          0.740        9.5           0.076
10-11         0.689       10.5           0.118
11-12         0.609       11.5           0.095
12-13         0.554       12.5           0.138
13-14         0.482       13.5           0.566

The following are the most important results from the life table analysis:

• Median Residual Lifetime
• Mortality Patterns
• Probability of Survival

• Median Residual Lifetime

A very important statistic estimated from the life table is the median
residual lifetime. This is the amount of time that elapses before the
number of at-risk units is reduced to one half; it is also known as the
median future lifetime. For this study, this estimate is available only
for the first three years because of the limited number of years for which
data were available and the relatively high frequency with which these
measures were observed to be still in place. The median residual lifetime
can be used as a proxy for the water heater tank wrap lifetime. See Table
1 in Appendix C for details.

• Mortality Patterns

Using a life table statistic called the hazard function, it is possible to
identify trends in measure failure over the life of a measure. In other
words, at what time during the life of a measure is failure most likely to
occur? Mortality patterns have been explored using instantaneous failure
rates for each age interval, as represented by the hazard function. The
value of the hazard function for each year can be compared with more
traditional engineering estimates. Mortality patterns are illustrated in
Figure 2.

The failure rate for the tank wrap increases as the years increase (with
some fluctuations). It is not easy to explain these fluctuations without
prior knowledge about the tank wrap. In general, sophisticated modelling
techniques may be needed to further examine the fluctuations in the hazard
function.

Figure 2. Hazard Function h(t) Estimates
[Figure: h(t) for the tank wrap plotted against year.]

• Probability of Survival

The survivorship function generated in the life tables provides the
probability that, at any given time following installation, the measure
will still be in place. These probabilities are referred to as "Survival
Function Estimates" and are presented in Table 1 and Figure 3.

Figure 3. Survival Function S(t) Estimates
[Figure: S(t) for the tank wrap plotted against year.]

Tips and Pitfalls

• Measure Retention by Income Group

A comparison analysis can be conducted with life tables by dividing the
data into two (or more) mutually exclusive strata. In the given example, a
separate survival analysis was performed by using high and low
income groups. To do this, a new variable called 'group' was created. The
STRATA statement was used in PROC LIFETEST with the 'group' variable. The
SAS code for this analysis is also in Appendix B. In this example, family
income below $30,000 was defined as the low income group. Family income
greater than or equal to $30,000 was considered to be the high income
group. Comparisons were made in terms of survival level and pattern.

As illustrated in Figure 4, the survival curves by income group are
similar. However, the low income group has higher survival rates.

Figure 4. Survival Function S(t) by Income Group
[Figure: S(t) plotted against year for the low income and high income
groups.]

• Missing Values

It is not uncommon for survival data to contain missing values. Most
often, a portion of the failure dates are missing. One commonly used
solution for this type of missing value is to use the average duration of
survival time from those events whose failure dates are not missing as a
proxy to impute the missing values. For example, suppose the average
duration of survival time for events with non-missing values is six years
(use PROC MEANS, where the VAR variable is the difference between the date
of failure and the date of installation for the events). One can replace
the missing values with their date of installation plus six years. Caution
should be taken when a relatively large portion of events has missing
values. If 30% of events have missing failure dates and a six-year
lifetime is used to impute them, the hazard function can be biased (the
age-specific failure rate at age six and a half will go up sharply). In
this case, interpretations of the mortality pattern (hazard function)
should be made carefully. A simple simulation model can be used to avoid
destroying the mortality pattern. Instead of adding six years to all
observations with missing failure dates, one can give them more or less
than six years, as well as six years, so that the overall average of the
'imputation years' is still six. By this method, the mortality pattern can
remain the same as the one with no missing values, assuming that missing
failure dates are evenly distributed.

• Which Type of Life Table Should Be Used?

Depending on what kind of survival data are available, one can decide
whether a population life table or a medical life table is more
appropriate. For population life tables, a whole history of death data is
needed. In other words, to use a population life table, there should be no
censored observations at the survey cut-off day (or, if censored
observations occur, one has to estimate the future death day for each
censored observation before a population life table is constructed). A
unique statistic from population life tables is life expectancy, which is
the average number of years remaining at the beginning of an age interval.
A good general reference on how to construct population life tables is
Shryock and Siegel (1975).

For medical or clinical life tables, one does not need to know all the
death records when the data are collected. Withdrawals in the middle of a
study and survivors at the day when the data are collected are allowed.
Both can be treated as censored observations. In the given example, there
were no withdrawals. Only survivors (measures still in place) were coded
as censored observations. If withdrawals occur, they should be coded as
censored observations in the same way as the survivors.

Note:

1. Cohort is defined as a group of individuals or subjects that enter a
study at the same time (normally in the same year).

Appendix A

STATISTICAL ESTIMATE DEFINITIONS FOR LIFE TABLE ANALYSIS

• Survivorship function. This function, denoted by S(t), is defined as the probability that an
individual survives longer than t:

\[ S(t) = P(\text{an individual survives longer than } t) \]

In practice, the survivorship function is estimated as the proportion of individuals surviving longer than t:

\[ \hat{S}(t) = \frac{\text{Number of individuals surviving longer than } t}{\text{Total number of individuals}} \]

S(t) is also known as the cumulative survival rate.

• Probability density function. Like any other continuous random variable, the survival time T has
a probability density function, defined as the limit of the probability that an individual fails in the
short interval t to t+Δt per unit width Δt, or simply the probability of failure in a small interval
per unit time. It can be expressed as

\[ f(t) = \lim_{\Delta t \to 0} \frac{P\{\text{an individual dying in the interval } (t, t+\Delta t)\}}{\Delta t} \]

In practice, the probability density function f(t) is estimated as the proportion of individuals dying in an
interval per unit width:

\[ \hat{f}(t) = \frac{\text{Number of individuals dying within the interval beginning at time } t}{(\text{Total number of individuals})(\text{Interval width})} \]

The probability density function is also known as the unconditional failure rate.

• Hazard function. The hazard function h(t) of survival time T gives the conditional failure rate.
This is defined as the probability of failure during a very small time interval, assuming that the
individual has survived to the beginning of the interval, or as the limit of the probability that an
individual fails in a very short interval, t to t+Δt, given that the individual has survived to time t:

\[ h(t) = \lim_{\Delta t \to 0} \frac{P\{\text{an individual of age } t \text{ fails in the time interval } (t, t+\Delta t)\}}{\Delta t} \]

The hazard function can also be defined in terms of the survivorship function S(t) and the probability density
function f(t):

\[ h(t) = \frac{f(t)}{S(t)} \]

In practice, the hazard function is estimated as the proportion of individuals dying in an interval per unit
time, given that they have survived to the beginning of the interval:

\[ \hat{h}(t) = \frac{\text{Number of individuals dying within the interval beginning at time } t}{(\text{Number of individuals surviving at } t)(\text{Interval width})} \]

The hazard function is also known as the instantaneous failure rate or conditional failure rate.
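
As a rough numerical check, the density and hazard reported in Appendix C for the interval 2-3 can be reproduced with the actuarial midpoint forms f = S(t)q/b and h = 2q/(b(2 - q)), where q is the conditional probability of failure and b the interval width. These midpoint forms are an assumption here (the definitions above give only the simpler count-based estimates), and the variable names are illustrative.

```python
# Interval 2-3, values copied from Table 1 of Appendix C.
d = 14             # failures in the interval
n_eff = 511.0      # effective sample size n'(t)
s_start = 0.98839  # survival at the start of the interval
width = 1.0        # interval width b, in years

q = d / n_eff                        # conditional probability of failure
pdf_mid = s_start * q / width        # density at the interval midpoint (assumed form)
haz_mid = 2 * q / (width * (2 - q))  # hazard at the interval midpoint (assumed form)

print(round(pdf_mid, 6), round(haz_mid, 6))
```

Under these assumptions the hazard reduces to d / (b (n' - d/2)), that is 14/504 for this interval, which agrees with the HAZARD value for year 2 in Table 2.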

• Effective sample size. The effective sample size n'(t) has the following definition:

\[ n'(t) = n(t) - \frac{w(t)}{2} \]

or

\[ n'(t) = n'(t-1) - d(t-1) - \frac{1}{2} w(t-1) - \frac{1}{2} w(t) \]

where n(t) is the sample size (the number of units entering the interval), d(t) is the total number of failures
in each age interval, and w(t) is the number of censored observations. The effective sample size is the key
concept in life table survival analysis. It is the effective sample size that makes the adjustment for censored data.
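
The definition can be checked against the counts in Appendix C. The sketch below is a minimal Python illustration, not part of the SAS analysis; the function name and list layout are hypothetical, and the censored count of 41 for interval 10-11 is the value implied by the total of 352 censored observations.

```python
# Counts per one-year interval (0-1, 1-2, ..., 13-), from Table 1 of Appendix C.
failed   = [1, 5, 14, 9, 18, 20, 20, 15, 15, 15, 17, 8, 4, 4]
censored = [0, 0, 0, 0, 0, 29, 52, 62, 37, 32, 41, 54, 44, 1]


def effective_sample_sizes(n_start, failed, censored):
    """Return n'(t) = n(t) - w(t)/2 for each interval."""
    sizes = []
    n = n_start  # n(t): units entering the current interval
    for d, w in zip(failed, censored):
        sizes.append(n - w / 2)
        n -= d + w  # units entering the next interval
    return sizes


n_eff = effective_sample_sizes(517, failed, censored)
print(n_eff[5], n_eff[6], n_eff[13])
```

For the intervals 5-6, 6-7, and 13-, this gives 455.5, 395.0, and 4.5, matching the Effective Sample Size column of Table 1.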

• Conditional probability of failure. The conditional probability of failure q(t) is defined as the total
number of failures in each age interval divided by the effective sample size:

\[ q(t) = \frac{d(t)}{n'(t)} \]
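
To show how q(t) feeds the survival estimates, the sketch below applies the usual life-table recursion, in which the survival at the end of an interval is the survival at its start multiplied by (1 - q). The recursion is not spelled out in the text and is assumed here; the inputs are the first five rows of Table 1 in Appendix C.

```python
# Failures d(t) and effective sample sizes n'(t) for intervals 0-1 to 4-5,
# copied from Table 1 of Appendix C.
failed = [1, 5, 14, 9, 18]
n_eff = [517.0, 516.0, 511.0, 497.0, 488.0]

survival = [1.0]  # S(0) = 1 by definition
for d, n in zip(failed, n_eff):
    q = d / n                                  # conditional probability of failure
    survival.append(survival[-1] * (1.0 - q))  # probability of surviving this interval too

print([round(s, 4) for s in survival])
```

The rounded values, 1.0, 0.9981, 0.9884, 0.9613, 0.9439, and 0.9091, agree with the Survival column of Table 1 for years 0 through 5.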

Appendix B

Source Code Listing

1. Life Table

options ps=55 ls=145;

libname data '..\data';

title 'Survival Analysis for the Water Heater Tank Wrap';

data a;
   set data.nu_517;
   yr_inst1=substr(datecomp,1,2);     * yr_inst1 - year of installation;
   yr_remo1=substr(tl_dremc,3,2);     * yr_remo1 - year of removal;
   yr_inst=input(yr_inst1,2.);        * convert character to numeric;
   yr_remo=input(yr_remo1,2.);

   *** Define censoring variable: c=0 event, c=1 censored ***;
   c=0;
   if tl_inpl=1 or tl_inpl=2 or tl_inpl=4 then c=1;
   /* variable 'tl_inpl' - insulation still in place?
      tl_inpl = 1, 2, or 4 indicates that the insulation was in place
      at the time when data were collected */

   *** Treatment for censored data (the study cut-off year was 1993) ***;
   if c=1 then yr_remo=93;

   *** Duration of survival time in years. This assignment does not
       appear in the listing as printed and is assumed here so that
       diff_yr exists for the TIME statement ***;
   diff_yr=yr_remo-yr_inst;
run;

***** Life Table Procedures *****;

proc lifetest plots=(s,ls,lls,h,p) intervals=(0 to 13 by 1) method=lt outsurv=ci;
   time diff_yr*c(1);
   label diff_yr='Year';
proc print data=ci;
run;
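
The coding rule in the DATA step above (the duration runs to the removal year for events and to the 1993 cut-off for measures still in place) can be illustrated outside SAS. The following Python sketch is only an illustration; the function and record layout are hypothetical, and the two records correspond to observations A and C of Figure 1.

```python
CUTOFF_YEAR = 93  # survey cut-off year (1993), two-digit as in the listing


def duration_and_flag(yr_inst, yr_remo, in_place):
    """Return (duration in years, c), where c=1 is censored and c=0 is an event."""
    if in_place:
        # Still in place at data collection: lifetime is cut off at the survey year.
        return CUTOFF_YEAR - yr_inst, 1
    return yr_remo - yr_inst, 0


# (installation year, removal year or None, still in place?)
records = [(81, 83, False), (84, None, True)]
coded = [duration_and_flag(*r) for r in records]
print(coded)
```

Observation A contributes a completed lifetime of two years, and observation C contributes nine censored years.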

2. Life Table with Strata

   *** DATA step statements that define the income strata ***;
   if 0<=income<=7;                     *** Define range ***;
   if income<4 then group=0;            *** income < $30000 ***;
   else if income>=4 then group=1;      *** income >= $30000 ***;

proc lifetest plots=(s,ls,lls,h,p) intervals=(0 to 12 by 1) method=act;
   time diff_yr*c(1);
   strata group;
   label diff_yr='Year';
run;

Appendix C

Table 1
Survival Analysis for the Water Heater Tank Wrap
Life Table Survival Estimates

   Interval                    Effective Cond. Prob. Cond. Prob.                     Survival    Median   Median
Lower Upper  Number    Number     Sample  of Failure    Standard                     Standard  Residual Standard
             Failed  Censored       Size                   Error  Survival  Failure     Error  Lifetime    Error

    0     1       1         0      517.0     0.00193     0.00193    1.0000        0         0   12.7500  0.30711
    1     2       5         0      516.0      0.0097     0.00431    0.9981  0.00193   0.00193   11.7636   0.3076
    2     3      14         0      511.0      0.0274     0.00722    0.9884   0.0116   0.00471   10.8313   0.3061
    3     4       9         0      497.0      0.0181     0.00598    0.9613   0.0387   0.00848
    4     5      18         0      488.0      0.0369     0.00853    0.9439   0.0561    0.0101
    5     6      20        29      455.5      0.0439      0.0096    0.9091   0.0909    0.0126
    6     7      20        52      395.0      0.0506      0.0110    0.8692   0.1308    0.0149
    7     8      15        62      318.0      0.0472      0.0119    0.8252   0.1748    0.0171
    8     9      15        37      253.5      0.0592      0.0148    0.7862   0.2138    0.0190
    9    10      15        32      204.0      0.0735      0.0183    0.7397   0.2603    0.0214
   10    11      17        41      152.5      0.1115      0.0255    0.6853   0.3147    0.0240
   11    12       8        54       88.0      0.0909      0.0306    0.6089   0.3911    0.0275
   12    13       4        44       31.0      0.1290      0.0602    0.5536   0.4464    0.0312
   13             4         1        4.5      0.8889      0.1481    0.4821   0.5179    0.0430

Evaluated at the Midpoint of the Interval

   Interval                 PDF                  Hazard
                       Standard                Standard
Lower Upper       PDF     Error      Hazard       Error

    0     1   0.00193   0.00193    0.001936    0.001936
    1     2    0.0097   0.00430    0.009737    0.004355
    2     3    0.0271   0.00714    0.027778    0.007423
    3     4    0.0174   0.00575    0.018274    0.006091
    4     5    0.0348   0.00806    0.037578    0.008856
    5     6    0.0399   0.00875    0.044893    0.010036
    6     7    0.0440    0.0096    0.051948    0.011612
    7     8    0.0389    0.0098    0.048309     0.01247
    8     9    0.0465    0.0117    0.060976    0.015737
    9    10    0.0544    0.0136    0.076336    0.019695
   10    11    0.0764    0.0177    0.118056    0.028583
   11    12    0.0554    0.0188    0.095238    0.033634
   12    13    0.0714    0.0336    0.137931    0.068801
Summary of the Number of Censored and Uncensored Values

Total    Failed    Censored    %Censored

  517       165         352      68.0851

Variable Label = Definition:

SURVIVAL   the survival function estimates
SDF_LCL    the lower endpoint of the survival confidence interval
SDF_UCL    the upper endpoint of the survival confidence interval
PDF        the density function estimates
PDF_LCL    the lower endpoint of the PDF confidence interval
PDF_UCL    the upper endpoint of the PDF confidence interval
HAZARD     the hazard estimates
HAZ_LCL    the lower endpoint of the hazard confidence interval
HAZ_UCL    the upper endpoint of the hazard confidence interval

Table 2
Survival Analysis for the Water Heater Tank Wrap

OBS YEARS SURVIVAL  SDF_LCL  SDF_UCL MIDPOINT       PDF   PDF_LCL  PDF_UCL   HAZARD   HAZ_LCL  HAZ_UCL

  1     0  1.00000  1.00000  1.00000      0.5  0.001934  0.000000  0.00572  0.00194  0.000000  0.00573
  2     1  0.99807  0.99428  1.00000      1.5  0.009671  0.001235  0.01811  0.00974  0.001202  0.01827
  3     2  0.98839  0.97916  0.99763      2.5  0.027079  0.013088  0.04107  0.02778  0.013229  0.04233
  4     3  0.96132  0.94469  0.97794      3.5  0.017408  0.006134  0.02868  0.01827  0.006336  0.03021
  5     4  0.94391  0.92407  0.96374      4.5  0.034816  0.019015  0.05062  0.03758  0.020221  0.05494
  6     5  0.90909  0.88431  0.93387      5.5  0.039916  0.022776  0.05706  0.04489  0.025223  0.06456
  7     6  0.86917  0.83995  0.89840      6.5  0.044009  0.025158  0.06286  0.05195  0.029189  0.07471
  8     7  0.82517  0.79166  0.85867      7.5  0.038923  0.019631  0.05821  0.04831  0.023869  0.07275
  9     8  0.78624  0.74897  0.82351      8.5  0.046523  0.023581  0.06947  0.06098  0.030133  0.09182
 10     9  0.73972  0.69787  0.78157      9.5  0.054391  0.027719  0.08106  0.07634  0.037734  0.11494
 11    10  0.68533  0.63837  0.73229     10.5  0.076397  0.041767  0.11103  0.11806  0.062035  0.17408
 12    11  0.60893  0.55496  0.66290     11.5  0.055357  0.018455  0.09226  0.09524  0.029318  0.16116
 13    12  0.55357  0.49238  0.61477     12.5  0.071429  0.005627  0.13723  0.13793  0.003083  0.27278
 14    13  0.48214  0.39783  0.56646
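
The median residual lifetimes reported in Table 1 can be approximated from the SURVIVAL column by linear interpolation: starting from year i, find the interval in which the survival estimate falls to half of S(i), and interpolate within it. The sketch below assumes this interpolation rule, which the paper does not state explicitly; the function name is hypothetical, and the values for years 5 and 6 are taken as 0.90909 and 0.86917 to match Table 1.

```python
# SURVIVAL column of Table 2 for years 0 to 13.
survival = [1.00000, 0.99807, 0.98839, 0.96132, 0.94391, 0.90909, 0.86917,
            0.82517, 0.78624, 0.73972, 0.68533, 0.60893, 0.55357, 0.48214]


def median_residual_lifetime(survival, i, width=1.0):
    """Interpolate the time after year i at which S falls to S(i)/2."""
    target = survival[i] / 2
    for j in range(i, len(survival) - 1):
        if survival[j] >= target > survival[j + 1]:
            frac = (survival[j] - target) / (survival[j] - survival[j + 1])
            return (j - i) * width + frac * width
    return None  # survival never drops to half within the table


print([round(median_residual_lifetime(survival, i), 2) for i in range(3)])
```

For years 0, 1, and 2 this gives about 12.75, 11.76, and 10.83 years, in line with Table 1. From year 3 onward the function returns None because the survival estimate never falls to half of S(3) within the observed 13 years, which is why the table reports the statistic only for the first three years.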

Acknowledgements:
The author wishes to thank Larry Helbers and Sam Ye for their comments and valuable suggestions.

References:
1. Elisa T. Lee. (1992). Statistical Methods for Survival Data Analysis. John Wiley and Sons, Inc.

2. SAS Institute. (1992). SAS/STAT User's Guide, Version 6, Fourth Edition. SAS Institute Inc.

Author Contact:
Wei Zhang
Synergic Resources Corporation
Suite 127
111 Presidential Boulevard
Bala Cynwyd, PA 19004
(215) 667-2160
