
J. chron. Dis. 1969, Vol. 21, pp. 629-644. Pergamon Press.

Printed in Great Britain

ESTIMATING SURVIVAL FUNCTIONS FROM THE LIFE TABLE*

EDMUND A. GEHAN, Ph.D.

The University of Texas M. D. Anderson Hospital and Tumor Institute, Houston, Texas 77025,
U.S.A.

(Received 29 February 1968; in revised form 2 August 1968)

1. INTRODUCTION
THE PROCEDURE for estimating a survival curve from data given in the life table
is well known. BERKSON and GAGE [1] and CUTLER and EDERER [2] give an
actuarial or life table method of estimating a survival curve. KAPLAN and MEIER
[3] give a maximum likelihood estimate called the product-limit estimate; the life
table method is nearly equivalent to this in large samples.
The procedure for estimating other functions of survival time is not so well
known, especially to applied research workers in the life sciences. Any distribution
of survival times can be characterized by three equivalent functions which may be
defined in words as follows:
Survivorship function, $P(t)$: probability that an individual survives longer than t.
The survivorship function is often called a survival curve.
Hazard function, $\lambda(t)$: probability that an individual dies in a short interval of
time, given survival to time t. The hazard function often is termed the force of
mortality or age-specific failure rate.
Probability density function, f(t): limit of the probability an individual dies in
the short interval t to $(t + \Delta t)$ per unit width $(\Delta t)$.
These functions are mathematically equivalent, as explained in the next section,
but each illustrates a different aspect of the data.
This paper has several purposes: (1) to provide methods for estimating the
three functions of survival time from the life table; (2) to give a procedure for
estimating median remaining life-time from life table data; and (3) to indicate how
a plot of the hazard function can be utilized to distinguish among theoretical forms
of survival time distribution. In particular, it will be seen that the log-normal
distribution implies a special form of hazard function that often is not fulfilled in
applications.

*This investigation was partially supported by PHS research grants FR00258 and FR00254
from the Division of Research Facilities and Resources to The University of Texas at
Houston and to the Common Research Computer Facility, Texas Medical Center, Houston,
Texas.

If some survival data are available and the form of the theoretical distribution
is unknown, it is suggested that the various survival functions be estimated and a
theoretical distribution chosen that gives a good fit to the three survival functions.
The estimation of parameters of some survival distributions from estimates of
the hazard function is considered in GEHAN and SIDDIQUI [4].
There are often right-censored observations present in a set of survival data;
in other words, some individuals were alive when last seen but have been
lost to follow-up or were withdrawn alive at the end of the period of study.
Estimation procedures will be considered only if they are appropriate when censored
data are present. Of course, the procedures given will also be valid when there is
complete ascertainment of survival times.
The term “survival time” is used throughout the paper, though it would be
equally proper to use length of response, time to recurrence of disease, time to
development of tumor or some other function of response time. In the following
sections, we give the relationship among the survival functions, a method for
estimating the various functions from the life table, and a method for estimating
median remaining life-time. Also, the hazard functions are given for some
theoretical distributions (exponential, Weibull, log-normal, Gamma and Gompertz).
Finally, estimates of the survival functions are obtained for an example.

2. RELATIONSHIP AMONG SURVIVAL FUNCTIONS

The three equivalent survival functions are: the survivorship function, the
probability density function and the hazard function. If any of the functions is
given, the other two can be derived. These relationships have been pointed out by
BROADBENT [5], Cox [6], and BUCKLAND [7], among others. The functions are
defined in words as follows:

Survivorship function (s.f.)

The survivorship function, $P(t)$, is the probability that an individual survives
longer than t, that is
$$P(t) = \text{prob.}\,(T > t),$$
where T is a random variable designating survival time. Clearly, $P(0) = 1$, $P(\infty) = 0$,
and $P(t)$ is a non-increasing function of t. In some applications, the cumulative
distribution function, F(t), is used. This is related to $P(t)$ by
$$P(t) = 1 - F(t). \tag{2.1}$$
Estimates of $P(t)$ can be obtained by the method of BERKSON and GAGE [1] or
KAPLAN and MEIER [3]. The former method requires grouping the data and is
appropriate only when the sample size is fairly large; in this instance, the two
methods give nearly equivalent results. The latter method is appropriate for small
or large samples and grouped or ungrouped data.

Probability density function (p.d.f.)

The probability density function, f(t), is the limit of
$$\frac{\text{prob. (individual dies between } t \text{ and } (t + \Delta t))}{\Delta t} \quad \text{as } \Delta t \to 0.$$
Since T represents a survival time and is necessarily non-negative, f(t) is zero for
negative t and defined by the above definition for $t \ge 0$. It can be shown that
$$f(t) = -P'(t). \tag{2.2}$$
Hence, given a form for the s.f., the p.d.f. is obtained by differentiation.

Hazard function (h.f.)

The hazard function, $\lambda(t)$, is the probability of nearly immediate failure for an
individual known to be alive at time t. The h.f. is also known as the age-specific
failure rate, the conditional failure rate, the force of mortality or instantaneous
death rate. By definition,
$$\lambda(t) = \frac{f(t)}{1 - F(t)} = \frac{f(t)}{P(t)}. \tag{2.3}$$
The h.f., $\lambda(t)$, must be non-negative.

If $P(t)$ or F(t) is given, f(t) is obtained by differentiation and $\lambda(t)$ is obtained
using (2.3). If a form for f(t) is given, then
$$P(t) = \int_t^{\infty} f(u)\,du$$
by definition and $\lambda(t)$ is obtained from (2.3). Finally, if $\lambda(t)$ is given, it can be
shown that
$$P(t) = \exp\left[-\int_0^t \lambda(u)\,du\right] \tag{2.4}$$
and f(t) is then obtained from (2.3). Thus, given any of the three survival functions,
the other two can be derived.
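As a minimal numerical sketch of these relationships (not part of the original paper), the Python below starts from a hazard function, recovers the survivorship function by integrating the hazard as in (2.4), differentiates it as in (2.2), and checks that the ratio of density to survivorship returns the hazard. The Weibull-type hazard, the parameter values and all function names are assumptions chosen only for illustration.

```python
import math

def hazard(t, scale=0.1, shape=1.5):
    """Illustrative Weibull-type hazard (an assumption, not taken from the paper)."""
    return shape * scale * (scale * t) ** (shape - 1.0)

def survivorship_from_hazard(t, steps=20000):
    """P(t) = exp(-integral from 0 to t of the hazard), using the midpoint rule."""
    width = t / steps
    integral = sum(hazard((k + 0.5) * width) for k in range(steps)) * width
    return math.exp(-integral)

def density_from_survivorship(t, eps=1e-4):
    """f(t) = -P'(t), approximated by a central difference."""
    upper = survivorship_from_hazard(t + eps)
    lower = survivorship_from_hazard(t - eps)
    return -(upper - lower) / (2.0 * eps)

t = 5.0
P = survivorship_from_hazard(t)
f = density_from_survivorship(t)
closed_form = math.exp(-(0.1 * t) ** 1.5)  # known survivorship for this hazard

print(P, closed_form)    # numerical and closed-form survivorship
print(f / P, hazard(t))  # density over survivorship against the hazard
```

Starting from the hazard is only one of the three possible routes; the same check could begin from f(t) or from the survivorship function.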

3. ESTIMATION OF SURVIVAL FUNCTIONS FROM THE LIFE TABLE
The format of the life table is given in Table 1. Entries in the life table are as
follows:
(a) Interval $[x_i, x_{i+1})$. This column gives the groupings into which the survival
times and times to loss or withdrawal are distributed. The notation $[x_i, x_{i+1})$ denotes
the interval from $x_i$ up to but not including $x_{i+1}$. The last interval extends
theoretically to infinity. These intervals are assumed to be fixed; thus, the number of
individuals dying in each interval is a random variable which follows the
multinomial distribution.
(b) Mid-point ($x_{mi}$). The mid-point of each interval, designated $x_{mi}$, i = 1, ...,
s - 1, is included for convenience in plotting the p.d.f. and h.f. Both functions are
plotted at $x_{mi}$.
(c) Width ($h_i$). The width of each interval, $h_i = x_{i+1} - x_i$, i = 1, ..., s - 1, is
needed for the calculation of the p.d.f. and h.f. The width of the last interval, $h_s$, in
theory, is infinite; no estimate of h.f. or p.d.f. can be obtained for this interval.
(d) Number entering ith interval ($n_i'$). The number of individuals entering the first
interval is $n_1'$, the total sample size. Other entries are determined from
$n_i' = n_{i-1}' - l_{i-1} - w_{i-1} - d_{i-1}$. In hand calculation, these entries are usually determined
after the entries $l_i$, $w_i$ and $d_i$ have been made (i = 1, ..., s). Then,
$n_i' = n_{i+1}' + l_i + w_i + d_i$ (i = s, ..., 1) and, as a check, $n_1'$ should equal the total sample
size.
The individuals whose survival experience is studied could be obtained from:
(1) a cohort study, i.e., a group of individuals studied from some zero point, say
time of diagnosis, to death, or (2) a series of cohorts analyzed at a particular date;
for example, the cohorts could be all cases of a disease diagnosed in 1960, 1961,
. . . and the time of analysis could be 1968, or (3) a clinical trial where each patient
is observed from time of start of treatment to conclusion of study. In each case, the
individual’s survival time is measured from his own zero point. Examples of zero
point are time of start of treatment, time of diagnosis and time of first symptoms.
(e) Number lost to follow-up ($l_i$). This is the number of individuals who are
lost to observation for some reason and whose survival status thus became unknown
in the ith interval (i = 1, ..., s). Individuals may be lost to observation if they
move, or fail to return for treatment, etc. Every attempt should be made to trace
such cases. See (f) for the assumption made about the survival experience of such
cases.
(f) Number withdrawn alive ($w_i$). Individuals withdrawn alive are those known
to be alive at the closing date of the study. Such observations arise in clinical
trials and cohort studies. In both cases, individuals are exposed to the risk of
death for varying periods of time depending on their date of entrance on study.
An individual entering study at a point in chronological time near the closing date
will have a period of exposure to the risk of failure shorter than an individual who
enters study substantially in advance of the closing date. The time to withdrawal
alive for an individual alive at the closing date is the length of chronological
time from his entrance into study to the closing date of the study. Thus, $w_i$ is the
number of individuals withdrawn alive in the ith interval.
The assumption made for the life table calculations is that the survival experience
after the date of last contact of those lost to follow-up ($l_i$) and withdrawn alive
($w_i$) is similar to that of the individuals who remain under observation. This
assumption seems reasonable for individuals withdrawn alive. However, as CUTLER
and EDERER [2] explain, the survival experience of lost individuals may be better,
the same, or worse than individuals continuing under observation. Consequently,
it is most important to keep the percentage of individuals lost as low as possible.
(g) Number exposed to risk ($n_i$). This number is defined as $n_i = n_i' - \tfrac{1}{2}(l_i + w_i)$
(i = 1, ..., s). If there are no losses or withdrawals, then $n_i = n_i'$. Individuals lost
or withdrawn in an interval are credited with being exposed to the risk of failure
for one-half the interval. This is a basic assumption of the life table and should
be correct on the average. The presumption is that the times to loss or withdrawal
are approximately uniformly distributed throughout the interval.
(h) Number dying ($d_i$). This is the number dying in the ith interval. The time
to death for each individual is measured from his own zero point.
(i) Conditional proportion dying ($\hat{q}_i$). This is given by $\hat{q}_i = d_i/n_i$ (i = 1, ..., s - 1),
$\hat{q}_s = 1$. The proportion is conditional since it is the probability of death in the
ith interval, given exposure to the risk of death in the ith interval.

(j) Conditional proportion surviving ($\hat{p}_i$). This is given by $\hat{p}_i = 1 - \hat{q}_i$.
(k) Cumulative proportion surviving ($\hat{P}_i$ or $\hat{P}(x_i)$). This is an estimate of the
survivorship function at time $x_i$; it is often referred to as the cumulative survival
rate. The estimate is $\hat{P}_i = \hat{p}_{i-1}\hat{P}_{i-1}$ (i = 2, ..., s) and $\hat{P}_1 = 1.00$. It is the usual life
table estimate and has been given by numerous others, including BERKSON and
GAGE [1] and CUTLER and EDERER [2]. The result is based on the fact that
surviving to the start of the ith interval means surviving to the start of the (i - 1)th
interval and then surviving the (i - 1)th interval. Note that this probability is defined for
the last interval.
(l) Probability density function [$\hat{f}(x_{mi})$]. From the definition of the p.d.f.,
the natural estimate is
$$\hat{f}(x_{mi}) = \frac{\hat{P}_i - \hat{P}_{i+1}}{h_i} = \frac{\hat{P}_i \hat{q}_i}{h_i}, \qquad i = 1, \ldots, s - 1.$$
Thus, $\hat{f}(x_{mi})$ is the estimated probability of dying in the ith interval per unit width;
this is the definition of the p.d.f.
(m) Hazard function [$\hat{\lambda}(x_{mi})$]. The estimate of the h.f. for each interval is
$$\hat{\lambda}(x_{mi}) = \frac{d_i}{h_i (n_i - d_i/2)} = \frac{2\hat{q}_i}{h_i (1 + \hat{p}_i)}, \qquad i = 1, \ldots, s - 1. \tag{3.1}$$
This is the so-called actuarial estimate of hazard function and it is described in
KIMBALL [8]. In words, it is the number of deaths per unit time in the interval
divided by the average number of survivors at the mid-point of the interval. The
average number of individuals alive at $x_{mi}$ is approximated by $(n_i - d_i/2)$. Other
estimates of hazard function are given by KIMBALL [8], WATSON and LEADBETTER
[9] and SACHER [10]. No systematic study has been undertaken to compare the
various estimates of hazard function.
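The column definitions (d) to (m) can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the author's program: the function name and dictionary keys are invented here, and the input uses the first three years of the angina pectoris example, where the removals in years 1 and 2 (39 and 22) are assumed to be losses to follow-up. Only the sum of losses and withdrawals affects the results.

```python
def life_table(entering_first, lost, withdrawn, dying, widths):
    """Compute life-table columns for the intervals with finite width.

    lost, withdrawn, dying, widths hold l_i, w_i, d_i and h_i, one entry per interval.
    Returns a list of dicts, one per interval.
    """
    rows = []
    entering = entering_first  # number entering the first interval (total sample size)
    cum_surv = 1.0             # cumulative proportion surviving at the first boundary
    for l, w, d, h in zip(lost, withdrawn, dying, widths):
        exposed = entering - 0.5 * (l + w)  # number exposed to risk
        q = d / exposed                     # conditional proportion dying
        p = 1.0 - q                         # conditional proportion surviving
        rows.append({
            "entering": entering,
            "exposed": exposed,
            "q": q,
            "p": p,
            "P": cum_surv,                        # at the start of the interval
            "f": cum_surv * q / h,                # density at the mid-point
            "hazard": 2.0 * q / (h * (1.0 + p)),  # actuarial estimate (3.1)
        })
        cum_surv *= p                    # carried forward to the next boundary
        entering = entering - l - w - d  # number entering the next interval
    return rows

# First three years of the angina pectoris example.
rows = life_table(2418, lost=[0, 39, 22], withdrawn=[0, 0, 0],
                  dying=[456, 226, 152], widths=[1, 1, 1])
for r in rows:
    print(r["entering"], r["exposed"], round(r["q"], 4),
          round(r["P"], 4), round(r["f"], 4), round(r["hazard"], 4))
```

The last, open-ended interval is left out on purpose, since neither the density nor the hazard is estimated there.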
An estimate of hazard function proposed by SACHER [10] is
$$\hat{\lambda}(x_{mi}) = \frac{-\log_e \hat{p}_i}{h_i}. \tag{3.2}$$
This estimate is derived by assuming that hazard is constant within an interval,
but varies among intervals. Given survival to the ith interval, the chance of death
in the ith interval is $[1 - e^{-\lambda_i h_i}]$ and the chance of surviving is $\exp(-\lambda_i h_i)$. Assuming
the number of deaths in each interval is binomially distributed and the sample
size is $n_i$, it is easily shown that (3.2) is a maximum likelihood estimate of hazard.
GEHAN and SIDDIQUI [4] did a Monte Carlo study comparing (3.1) to (3.2) and
found that (3.1) was clearly the superior estimate since it was less biased.
Consequently, (3.1) was the estimate of hazard chosen here. Further study is needed to
determine whether or not there is an estimate of hazard generally superior to (3.1).
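To see how close the two estimates are in practice, the short sketch below evaluates both for a few conditional proportions dying. It assumes the constant-hazard estimate takes the form of minus the log of the conditional proportion surviving divided by the width, which is what the survival chance $\exp(-\lambda_i h_i)$ stated above implies; the function names are illustrative.

```python
import math

def actuarial_hazard(q, h):
    """Actuarial estimate: 2q / (h (1 + p)) with p = 1 - q."""
    return 2.0 * q / (h * (2.0 - q))

def constant_hazard_estimate(q, h):
    """Estimate assuming constant hazard within the interval: -log(p) / h."""
    return -math.log(1.0 - q) / h

for q in (0.05, 0.1886, 0.5):
    print(q, round(actuarial_hazard(q, 1.0), 4),
          round(constant_hazard_estimate(q, 1.0), 4))
```

The two agree to second order in q, so they differ noticeably only when a large fraction of those exposed die within a single interval.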
The estimates of the survival functions obtained from the life table are $\hat{P}(x_i)$
(or $\hat{P}_i$), $\hat{f}(x_{mi})$ and $\hat{\lambda}(x_{mi})$.* Each survival function illustrates a different aspect of
the data. The survivorship function, $\hat{P}(x_i)$, is useful in obtaining median and other
percentile estimates of survival time. The median, $\tilde{x}$, is that value such that
$\hat{P}(\tilde{x}) = 0.5$. Other percentiles of survival time are obtained similarly. The
survivorship function is also useful for estimating the percentage of patients surviving
longer than x time units. A common example is the proportion of patients with
a disease surviving longer than 5 yr. A plot of the hazard function, $\hat{\lambda}(x_{mi})$,
characterizes the ageing of the population. Does the risk of death per unit time
increase, remain about the same, decrease or describe a more complex course? In
a later section, it will be shown that plotting the hazard function is helpful in making
a choice of theoretical distribution. The probability density function, $\hat{f}(x_{mi})$, can
be used to estimate the proportion of deaths taking place in any interval of time.
It is also useful for estimating peaks of high frequency of death.
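The variances of the estimates of the survival functions in the ith interval are:
$$\text{Var}[\hat{P}(x_i)] \approx \hat{P}_i^{2} \sum_{j=1}^{i-1} \frac{\hat{q}_j}{n_j \hat{p}_j}, \tag{3.3}$$
$$\text{Var}[\hat{f}(x_{mi})] \approx \frac{(\hat{P}_i \hat{q}_i)^{2}}{h_i^{2}} \left\{ \sum_{j=1}^{i-1} \frac{\hat{q}_j}{n_j \hat{p}_j} + \frac{\hat{p}_i}{n_i \hat{q}_i} \right\} \tag{3.4}$$
and
$$\text{Var}[\hat{\lambda}(x_{mi})] \approx \frac{[\hat{\lambda}(x_{mi})]^{2}}{n_i \hat{q}_i} \left\{ 1 - \left[ \tfrac{1}{2} \hat{\lambda}(x_{mi}) h_i \right]^{2} \right\}. \tag{3.5}$$
All the formulae are large-sample approximations. The formula for the
Var $[\hat{P}(x_i)]$ is well-known, having been given first by GREENWOOD [11]. The other
two variance formulas do not seem to have been given before and are derived in the
appendix. These may be used to obtain approximate confidence limits for the
various survival functions.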
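A short sketch of the three variance calculations follows. The forms used for the density and hazard variances are a reconstruction, stated in the code comments, that reproduces the standard errors tabulated for the angina pectoris example; the function name is illustrative and the code assumes at least one death in every interval, since the formulas divide by the conditional proportion dying.

```python
import math

def life_table_standard_errors(exposed, dying, widths):
    """Large-sample standard errors of P, f and the hazard for each finite interval."""
    out = []
    cum_surv = 1.0       # cumulative proportion surviving at the start of the interval
    running_sum = 0.0    # sum over earlier intervals of q_j / (n_j p_j)
    for n, d, h in zip(exposed, dying, widths):
        q = d / n
        p = 1.0 - q
        lam = 2.0 * q / (h * (1.0 + p))
        # Survivorship: P_i^2 * sum_{j<i} q_j / (n_j p_j)  (Greenwood's formula)
        var_P = cum_surv ** 2 * running_sum
        # Density: (P_i q_i / h_i)^2 * (sum_{j<i} q_j / (n_j p_j) + p_i / (n_i q_i))
        var_f = (cum_surv * q / h) ** 2 * (running_sum + p / (n * q))
        # Hazard: lam^2 / (n_i q_i) * (1 - (lam h_i / 2)^2)
        var_lam = lam ** 2 / (n * q) * (1.0 - (0.5 * lam * h) ** 2)
        out.append((math.sqrt(var_P), math.sqrt(var_f), math.sqrt(var_lam)))
        running_sum += q / (n * p)
        cum_surv *= p
    return out

# Number exposed to risk and number dying, first three years of the example.
sds = life_table_standard_errors([2418.0, 1942.5, 1686.0], [456, 226, 152], [1, 1, 1])
for sd_P, sd_f, sd_lam in sds:
    print(round(sd_P, 4), round(sd_f, 4), round(sd_lam, 4))
```

Approximate confidence limits then follow in the usual way, for instance the estimate plus or minus about two standard errors.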

4. MEDIAN REMAINING LIFE-TIME

In nearly all papers concerned with life tables, an estimate of the mean life-time
or the expectation of life is derived (see, for example, KAPLAN and MEIER [3] or
CHIANG [12]). An ambiguity arises in the determination of this estimate if
$\hat{P}(x_s) > 0$ and not all individuals die in the last interval: in that case, the mean
life-time is indeterminate and often is estimated as the mean life-time limited to
time L and $\hat{P}(L)$. Since this is not easily interpreted, it suggests looking for
another type of estimate.
Estimating the median remaining life-time is simple, descriptive and can be
accomplished when $\hat{P}(x)$ is less than 0.5 for some x. The median is recommended
in elementary statistics texts for characterizing distributions skewed to the right. It
is possible to estimate it routinely from the life table. If the median and the
expectation of life are estimable, both could be estimated. When the mean is
substantially larger than the median life-time, it is an indication that there is some
proportion of long term survivors.

*A computer program is available to calculate the various survival functions and their
variances, given $w_i$, $l_i$, $d_i$ and $n_i'$ (i = 1, 2, ..., s). A copy may be obtained by writing the
author. Thanks to Mrs. Jane E. Putman for writing the program.
The median remaining life-time at time $x_i$ (i = 1, ..., s - 1) is designated $\tilde{x}_i$ and
defined as:
$$\tilde{x}_i = (x_j - x_i) + h_j\,\frac{\hat{P}_j - \hat{P}_i/2}{\hat{P}_j - \hat{P}_{j+1}}, \qquad \hat{P}_j \ge \hat{P}_i/2 > \hat{P}_{j+1}.$$
Here, $\hat{P}_j$ is the estimated proportion surviving beyond the lower limit of the class
interval containing the median. For the median to be defined, $\hat{P}_{j+1}$ must be less
than $\hat{P}_i/2$.
The variance of this estimate is approximately
$$\text{Var}(\tilde{x}_i) \approx \frac{\hat{P}_i^{2}}{4 n_i [\hat{f}(x_{mj})]^{2}} \tag{3.6}$$
where $\hat{f}(x_{mj})$ is the estimate of the probability density function in the interval
containing the median. This is an extension of the formula derived in KENDALL
and STUART [13] to the case of conditional probability density functions. The
computer program noted will also calculate estimates of $\tilde{x}_i$ and $\sqrt{\text{Var}(\tilde{x}_i)}$.
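The calculation can be sketched as follows, assuming the median remaining life-time is found by linear interpolation within the interval where the cumulative proportion surviving first falls below half its value at the starting boundary, and that its variance is the squared starting proportion divided by four times the number exposed times the squared density in that interval. Function and variable names are illustrative, and indices are 0-based.

```python
import math

def median_remaining_lifetime(i, boundaries, cum_surv, exposed, widths):
    """Median remaining life-time at boundaries[i] and its approximate standard error.

    boundaries: lower limits of the intervals; cum_surv: proportion surviving at each
    boundary; exposed, widths: number exposed and width for the finite intervals.
    Returns None when the survivorship never falls below half of cum_surv[i].
    """
    half = cum_surv[i] / 2.0
    for j in range(i, len(cum_surv) - 1):
        if cum_surv[j] >= half > cum_surv[j + 1]:
            drop = cum_surv[j] - cum_surv[j + 1]
            median = (boundaries[j] - boundaries[i]) + widths[j] * (cum_surv[j] - half) / drop
            density = drop / widths[j]  # density estimate in the interval holding the median
            variance = cum_surv[i] ** 2 / (4.0 * exposed[i] * density ** 2)
            return median, math.sqrt(variance)
    return None

# Boundaries 0 to 7 years of the angina pectoris example.
x = [0, 1, 2, 3, 4, 5, 6, 7]
P = [1.0000, 0.8114, 0.7170, 0.6524, 0.5786, 0.5193, 0.4611, 0.4172]
n = [2418.0, 1942.5, 1686.0, 1511.5, 1317.0, 1116.5, 871.5]
h = [1, 1, 1, 1, 1, 1, 1]

result = median_remaining_lifetime(0, x, P, n, h)
print(result)
print(median_remaining_lifetime(1, x, P, n, h))  # not reached within these boundaries
```

With the table truncated at 7 years, the median from year 1 cannot be located, which is the situation in which only a lower bound on the median remaining life-time can be reported.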

5. HAZARD FUNCTIONS FOR SOME SURVIVAL DISTRIBUTIONS
Most papers in the life sciences presenting survival data give the survivorship
function or survival curve only. Survivorship functions from nearly all theoretical
distributions have the form

Hence, it is difficult to distinguish among theoretical distributions from a plot of
the survivorship function.

The hazard function characterizes the ageing or wearing out process. The
simplest hazard function is constant with time; in this case, the p.d.f. is the
exponential distribution. This means that there is no ageing and failure is a random
event. Though this is a very simple distribution, it has been found to fit many kinds
of data. ZELEN [14] discusses the application of exponential models in cancer
research; EPSTEIN [15] gives its role in industrial life-testing in one of many papers
on the distribution. The exponential distribution is not always applicable; see
BERG and ROBBINS [16] for an example in which the data fit the exponential
distribution over the short but not the long term.
Survival distributions other than the exponential have hazard functions that vary
with time. A complete discussion of various survival distributions is given by
Cox [6] and BUCKLAND [7]. Some of the results pertaining to hazard functions
are summarized here:
Distribution          Hazard function

Exponential           $\lambda(x) = \lambda_0$
Remarks: There is no ageing; failure is a random event.

Weibull               $\lambda(x) = \lambda_0^{\lambda_1} \lambda_1 x^{\lambda_1 - 1}$
Remarks: Exponential distribution is a special case when $\lambda_1 = 1$. If $\lambda_1 > 1$,
there is positive ageing starting from zero; if $\lambda_1 < 1$, there is negative ageing and
hazard approaches zero as x approaches infinity.

Gompertz              $\lambda(x) = \exp\{\lambda_0 + \lambda_1 x\}$
Remarks: Exponential distribution is a special case when $\lambda_1 = 0$. If $\lambda_1 > 0$ (< 0),
there is positive (negative) ageing starting from $\exp\{\lambda_0\}$.

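Gamma                 $\lambda(x) = \dfrac{f(x)}{1 - F(x)}$
where
$$f(x) = \frac{\lambda_0 (\lambda_0 x)^{\lambda_1 - 1} e^{-\lambda_0 x}}{\Gamma(\lambda_1)}, \qquad F(x) = \int_0^x f(u)\,du.$$
Remarks: Exponential distribution is a special case when $\lambda_1 = 1$. If $0 < \lambda_1 < 1$,
there is negative ageing, with $\lambda(x) \to \infty$ as $x \to 0$, $\lambda(x) \to \lambda_0$ as $x \to \infty$. If
$\lambda_1 > 1$, there is positive ageing with $\lambda(0) = 0$, $\lambda(x) \to \lambda_0$ as $x \to \infty$.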

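Log-normal
$$\lambda(x) = \frac{\dfrac{1}{x\sqrt{2\pi\lambda_0}} \exp\left\{-\dfrac{[\log(\lambda_1 x)]^{2}}{2\lambda_0}\right\}}{1 - G\left(\dfrac{\log(\lambda_1 x)}{\sqrt{\lambda_0}}\right)}$$
where
$$G(t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{t} \exp\left(-\frac{u^{2}}{2}\right) du.$$
Remarks: Exponential distribution is not a special case. The hazard function
increases initially to a maximum and then decreases to zero as $x \to \infty$, i.e., there
is an early period of positive ageing followed by negative ageing.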

The exponential distribution is a special case of all the above distributions except
the log-normal. From the information given in the remarks, it should be evident that
the five survival distributions have a variety of types of hazard function. Suppose
a survival distribution is to be fitted to some data and there is no theoretical basis
for choosing among distributions. It is suggested that the sample hazard functions be
plotted and a distribution chosen using the information given in the remarks. Other
forms of survival distribution are given by Cox [6] and BUCKLAND [7].
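For plotting against a sample hazard function, the theoretical hazards can be evaluated with the standard library alone. The sketch below uses common textbook parameterizations (rate and shape for the Weibull and Gamma, intercept and slope for the Gompertz, mean and standard deviation of log time for the log-normal); these names and parameterizations are assumptions made here and need not match the symbols in the summary above.

```python
import math

def exponential_hazard(x, rate):
    """Constant hazard: no ageing."""
    return rate

def weibull_hazard(x, rate, shape):
    """Hazard when the survivorship function is exp(-(rate * x) ** shape)."""
    return shape * rate * (rate * x) ** (shape - 1.0)

def gompertz_hazard(x, intercept, slope):
    """Hazard that is log-linear in time."""
    return math.exp(intercept + slope * x)

def gamma_hazard(x, rate, shape, terms=200):
    """Density over survivorship; the incomplete gamma ratio is summed as a series."""
    z = rate * x
    log_z = math.log(z)
    lower = sum(math.exp((shape + k) * log_z - z - math.lgamma(shape + k + 1.0))
                for k in range(terms))
    density = rate * math.exp((shape - 1.0) * log_z - z - math.lgamma(shape))
    return density / (1.0 - lower)

def lognormal_hazard(x, mu, sigma):
    """Hazard when log(x) is normal with mean mu and standard deviation sigma."""
    z = (math.log(x) - mu) / sigma
    density = math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))
    survivorship = 0.5 * math.erfc(z / math.sqrt(2.0))
    return density / survivorship

for x in (1.0, 5.0, 20.0, 80.0):
    print(x,
          round(weibull_hazard(x, 0.05, 1.5), 4),
          round(gompertz_hazard(x, -3.0, 0.02), 4),
          round(gamma_hazard(x, 0.1, 2.0), 4),
          round(lognormal_hazard(x, 2.5, 1.0), 4))
```

The series used for the Gamma case is adequate for moderate values of rate times x; for very large arguments a dedicated incomplete gamma routine would be the safer choice.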
A common choice of survival distribution among those in the life sciences is
the log-normal. Since most probability density functions are skewed to the right,
it is natural to take logarithms to obtain a more symmetrical distribution. How-
ever, choosing the log-normal distribution implies that there is an early period of
positive ageing followed by a period of negative ageing. The model should be
examined carefully and the sample hazard functions plotted before arguing that
survival is log-normal.
The log-normal distribution is difficult to distinguish from an exponential
distribution, especially when the distinction is attempted from a plot of the survivorship
functions. Fig. 1 gives a plot of the survivorship and hazard functions for an
exponential distribution and three forms of a log-normal distribution (coeff. of
var. = 0.5, 1.0 and 1.5).
The coeff. of var. is always one for the exponential distribution. The average
survival time in all cases is 20 time units.
The curves given are those based on known values of the parameters of the
distributions and do not represent a sample of real data. It should be evident
that if there is a moderate-sized sample of survival times with a sample coefficient
of variation near one, it will be nearly impossible to distinguish between an ex-
ponential and a log-normal distribution in terms of “goodness of fit”, especially if
only the survivorship functions are compared. Note that the hazard functions for
each form of log-normal rise to a peak and later decrease. This implies the highest
risk of death per unit time is in some period after the start of the study. Since the
exponential is a special case of the Gamma, Gompertz and Weibull distributions,
one of these distributions would probably fit as well or better than a log-normal
distribution when the coefficient of variation is near one.
If there is no theoretical basis for choosing between fitting a log-normal and an
exponential distribution and both fit the data about equally well, the exponential
should probably be selected because of its simplicity.
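The comparison described above can be reproduced numerically. For a log-normal survival time, the variance of log time is log(1 + c.v.^2) and the mean of log time is the log of the mean minus half that variance; these are standard identities, while the function names and the time points 10, 20 and 40 are choices made for this sketch.

```python
import math

def lognormal_parameters(mean, cv):
    """Return (mu, sigma) of log time for a log-normal with the given mean and c.v."""
    sigma_sq = math.log(1.0 + cv ** 2)
    mu = math.log(mean) - 0.5 * sigma_sq
    return mu, math.sqrt(sigma_sq)

def lognormal_survivorship(t, mu, sigma):
    z = (math.log(t) - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def exponential_survivorship(t, mean):
    return math.exp(-t / mean)

mean = 20.0
times = (10, 20, 40)
for cv in (0.5, 1.0, 1.5):
    mu, sigma = lognormal_parameters(mean, cv)
    values = [round(lognormal_survivorship(t, mu, sigma), 3) for t in times]
    print("log-normal, c.v. =", cv, values)
print("exponential", [round(exponential_survivorship(t, mean), 3) for t in times])
```

Printing the curves side by side gives a rough sense of how much separation a sample would need to resolve; a formal comparison would still require a goodness-of-fit criterion.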

6. EXAMPLE OF SURVIVAL FUNCTION CALCULATIONS FOR MALES WITH ANGINA PECTORIS
PARKER et al. [17] presented data on the survival of 2418 males with angina
pectoris as part of a larger group of patients who were examined at the Mayo
Clinic from January 1, 1927, to December 31, 1936. They investigated the rela-
tionship of survival to age, sex, and other factors possibly influencing prognosis.
Estimates of the various survival functions are given in Table 2 and the functions
are plotted in Fig. 2.
It is evident from the hazard functions that the death rate per year is highest in
the first year after diagnosis. From the end of the first year to the start of the
tenth year, the hazard functions (death rates per year) remain relatively constant
between 0.09 and 0.12. The hazard functions are generally higher after the tenth
year. Hence, the prognosis for a patient who has survived 1 yr is better than that
for a newly diagnosed patient if factors influencing prognosis are not considered.
A similar interpretation is reached by examining the median remaining life-times
by year. Initially, the estimated median life-time is 5.3 yr. From the first year to
the sixth, the median remaining life-time is above 5.3 yr and begins decreasing
consistently only from the seventh year on.
Further interpretation will not be attempted here, especially since PARKER et al.
[17] have a complete discussion. Also, it would be misleading to interpret these
data as representing the survival of a homogeneous group of individuals.

FIG. 1. Survivorship and hazard functions for an exponential distribution and
three log-normal distributions (coefficient of variation 0.5, 1.0 and 1.5).
TABLE 2. SURVIVAL FUNCTION CALCULATIONS FOR MALES WITH ANGINA PECTORIS

Year                   Number    Number     Number     Number             Cond'l   Cond'l
after   Mid-           entering  lost to    withdrawn  exposed   Number   prop'n   prop'n
diag.   point   Width  interval  follow-up  alive      to risk   dying    dying    surviving

  0      0.5      1      2418        0         0       2418.0     456     0.1886   0.8114
  1      1.5      1      1962       39         0       1942.5     226     0.1163   0.8837
  2      2.5      1      1697       22         0       1686.0     152     0.0902   0.9098
  3      3.5      1      1523       23         0       1511.5     171     0.1131   0.8869
  4      4.5      1      1329       24         0       1317.0     135     0.1025   0.8975
  5      5.5      1      1170      107         0       1116.5     125     0.1120   0.8880
  6      6.5      1       938      133         0        871.5      83     0.0952   0.9048
  7      7.5      1       722      102         0        671.0      74     0.1103   0.8897
  8      8.5      1       546       68         0        512.0      51     0.0996   0.9004
  9      9.5      1       427       64         0        395.0      42     0.1063   0.8937
 10     10.5      1       321       45         0        298.5      43     0.1441   0.8559
 11     11.5      1       233       53         0        206.5      34     0.1646   0.8354
 12     12.5      1       146       33         0        129.5      18     0.1390   0.8610
 13     13.5      1        95       27         0         81.5       9     0.1104   0.8896
 14     14.5      1        59       23         0         47.5       6     0.1263   0.8737
 15       -       -        30        0         0         30.0       0     1.0000   0.0000

Year                                           s.d. of   s.d. of   s.d. of   Median      s.d. of
after   Mid-                                   P(x_i)    f(x_mi)   h.f.      remaining   median
diag.   point   P(x_i)   f(x_mi)   h.f.                            (x_mi)    life-time

  0      0.5    1.0000   0.1886   0.2082        -        0.0080    0.0097     5.33        0.17
  1      1.5    0.8114   0.0944   0.1235      0.0080     0.0060    0.0082     6.25        0.20
  2      2.5    0.7170   0.0646   0.0944      0.0092     0.0051    0.0076     6.34        0.24
  3      3.5    0.6524   0.0738   0.1199      0.0097     0.0054    0.0092     6.23        0.24
  4      4.5    0.5786   0.0593   0.1080      0.0101     0.0049    0.0093     6.22        0.19
  5      5.5    0.5193   0.0581   0.1186      0.0103     0.0050    0.0106     5.91        0.18
  6      6.5    0.4611   0.0439   0.1000      0.0104     0.0047    0.0110     5.60        0.19
  7      7.5    0.4172   0.0460   0.1167      0.0105     0.0052    0.0135     5.17        0.27
  8      8.5    0.3712   0.0370   0.1048      0.0106     0.0050    0.0147     4.94        0.28
  9      9.5    0.3342   0.0355   0.1123      0.0107     0.0053    0.0173     4.83        0.41
 10     10.5    0.2987   0.0430   0.1552      0.0109     0.0063    0.0236     4.69        0.42
 11     11.5    0.2557   0.0421   0.1794      0.0111     0.0068    0.0306     4.00+        -
 12     12.5    0.2136   0.0297   0.1494      0.0114     0.0067    0.0351     3.00+        -
 13     13.5    0.1839   0.0203   0.1169      0.0118     0.0065    0.0389     2.00+        -
 14     14.5    0.1636   0.0207   0.1348      0.0123     0.0080    0.0549     1.00+        -
 15       -     0.1429     -        -         0.0133       -         -          -          -
FIG. 2. Survival functions for males with angina pectoris. (Panels: survivorship
function, hazard function, probability density function; horizontal axis: year after
diagnosis.)

PARKER et al. [17] pointed out various factors influencing prognosis. These data
are typical of many sets of survival data in which patients can be divided into
sub-groups according to the value of some prognostic factors. When the sample
size is very large, survival studies can be made separately for each sub-group.
However, in small to moderate-sized samples, further work is needed on the best
methods for using the available concomitant information.
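The pattern described in this example can be checked directly from the numbers exposed to risk and the numbers dying. The sketch below uses the values for years 0 to 14 as read from Table 2; where a printed cell was illegible in the source, the figure was recomputed from the neighbouring columns, so the inputs should be treated as a reading of the table rather than as an independent data source.

```python
# Number exposed to risk and number dying for years 0 to 14 of the example.
exposed = [2418.0, 1942.5, 1686.0, 1511.5, 1317.0, 1116.5, 871.5, 671.0,
           512.0, 395.0, 298.5, 206.5, 129.5, 81.5, 47.5]
dying = [456, 226, 152, 171, 135, 125, 83, 74, 51, 42, 43, 34, 18, 9, 6]

# Actuarial hazard estimate with unit interval width.
hazard = [d / (n - d / 2.0) for n, d in zip(exposed, dying)]

# Cumulative proportion surviving at each yearly boundary, starting from 1.
cum_surv = [1.0]
for n, d in zip(exposed, dying):
    cum_surv.append(cum_surv[-1] * (1.0 - d / n))

for year, lam in enumerate(hazard):
    print(year, round(lam, 4), round(cum_surv[year], 4))
```

The first-year hazard stands well above the rest, the values for years 1 to 9 stay in a narrow band, and the later years are generally higher, which is the pattern the text describes.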

REFERENCES
1. BERKSON, J. and GAGE, R. P.: Calculation of survival rates for cancer. Proc. Staff Meet.
Mayo Clin. 25, 270, 1950.
2. CUTLER, S. J. and EDERER, F.: Maximum utilization of the life table method in analyzing
survival. J. chron. Dis. 8 (6), 699, 1958.
3. KAPLAN, E. L. and MEIER, P.: Nonparametric estimation from incomplete observations.
J. Am. statist. Ass. 53 (1), 457, 1958.
4. GEHAN, E. A. and SIDDIQUI, M. M.: In preparation.
5. BROADBENT, S.: Simple mortality rates. Appl. Statist. 7 (2), 86, 1958.
6. Cox, D. R.: Renewal Theory, Chap. I. Wiley, New York, 1962.
7. BUCKLAND, W. R.: Statistical Assessment of the Life Characteristic. Hafner, New York,
1964.
8. KIMBALL, A. W.: Estimation of mortality intensities in animal experiments. Biometrics
16 (4), 505, 1960.
9. WATSON, G. S. and LEADBETTER, M. R.: Hazard analysis, I. Biometrika 51, 175, 1964.
10. SACHER, G. A.: On the statistical nature of mortality, with special reference to chronic
radiation mortality. Radiology 67, 250, 1956.
11. GREENWOOD, M.: Reports on Public Health and Medical Subjects, No. 33, Appendix I.
The “Errors of Sampling” of the Survivorship Tables. H.M. Stationery Office, London,
1926.
12. CHIANG, C. L.: A stochastic study of the life table and its applications: I. Probability
distributions of the biometric functions. Biometrics 16 (4), 618, 1960.
13. KENDALL, M. G. and STUART, A.: The Advanced Theory of Statistics, Vol. I, Second
Edn. Hafner, New York, 1963.
14. ZELEN, M.: Application of exponential models in cancer research. J. R. statist. Soc. (Ser. A)
129 (3), 368, 1966.
15. EPSTEIN, B.: The exponential distribution and its role in life testing. Ind. Qual. Control
15 (6), 5, 1958.
16. BERG, J. W. and ROBBINS, G. F.: The failure of a model to predict cancer survival.
J. chron. Dis. 20 (10), 809, 1967.
17. PARKER, R. L., DRY, T. J., WILLIUS, F. A. and GAGE, R. P.: Life expectancy in angina
pectoris. J. Am. med. Ass. 131 (2), 95, 1946.

APPENDIX
We first find the Var $\{\hat{\lambda}(x_{mi})\}$ assuming that all individuals have failed. The result will
be given in terms of the true proportions failing in each interval. Estimates of these
proportions in the incomplete sample case will be substituted for the true proportions so that the
final variance formula is approximate and will be valid only for large samples.
If a sample of $n_1$ individuals is followed until all fail and each death is recorded as occurring
in one of s fixed intervals, the joint distribution of the numbers of deaths is multinomial.
Suppose the set-up is as follows:

                                           Interval
                                           $I_1$       $I_2$       ...  $I_i$       ...  $I_s$
Sample      Deaths                         $d_1$       $d_2$       ...  $d_i$       ...  $d_s$
            Alive entering interval        $n_1$       $n_2$       ...  $n_i$       ...  $n_s$
Population  Proportion dying in interval   $\theta_1$  $\theta_2$  ...  $\theta_i$  ...  $\theta_s$
            Deaths                         $\nu_1$     $\nu_2$     ...  $\nu_i$     ...  $\nu_s$

The sample size is $n_1$ so that $\sum_{i=1}^{s} \nu_i = n_1$. Here, $n_i' = n_i$ since there are no losses or withdrawals.
Also, $\sum_{i=1}^{s} \theta_i = 1$. It is convenient to introduce
$$m_i = \sum_{j=1}^{i-1} d_j, \qquad \mu_i = \sum_{j=1}^{i-1} \nu_j \qquad \text{and} \qquad \phi_i = \sum_{j=1}^{i-1} \theta_j.$$
The estimate of hazard for the ith interval is
$$\hat{\lambda}(x_{mi}) = \frac{d_i}{h_i (n_1 - m_i - d_i/2)}$$
and we wish to find the variance of the estimate.

Let $\delta d_i = d_i - \nu_i$,

$\delta m_i = m_i - \mu_i$.
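Then,
$$\text{Var}\left\{\frac{d_i}{h_i (n_1 - m_i - d_i/2)}\right\} = \text{Var}\left\{\frac{\nu_i}{h_i (n_1 - \mu_i - \nu_i/2)} \left(1 + \frac{\delta d_i}{\nu_i}\right) \left[1 - \frac{\delta m_i}{n_1 - \mu_i - \nu_i/2} - \frac{\delta d_i}{2 (n_1 - \mu_i - \nu_i/2)}\right]^{-1}\right\}$$
and this is approximately
$$= \text{Var}\left\{\frac{\nu_i}{h_i (n_1 - \mu_i - \nu_i/2)} \left[1 + \frac{\delta d_i}{\nu_i} + \frac{\delta m_i}{n_1 - \mu_i - \nu_i/2} + \frac{\delta d_i}{2 (n_1 - \mu_i - \nu_i/2)}\right]\right\}$$
$$= \frac{\nu_i^{2}}{h_i^{2} (n_1 - \mu_i - \nu_i/2)^{2}}\, E\left\{\frac{(\delta d_i)^{2}}{\nu_i^{2}} + \frac{(\delta m_i)^{2}}{(n_1 - \mu_i - \nu_i/2)^{2}} + \frac{(\delta d_i)^{2}}{4 (n_1 - \mu_i - \nu_i/2)^{2}} + \frac{2 (\delta d_i)(\delta m_i)}{\nu_i (n_1 - \mu_i - \nu_i/2)} + \frac{(\delta d_i)^{2}}{\nu_i (n_1 - \mu_i - \nu_i/2)} + \frac{(\delta m_i)(\delta d_i)}{(n_1 - \mu_i - \nu_i/2)^{2}}\right\} \tag{A.1}$$
since $E(\delta d_i) = E(\delta m_i) = 0$, where E is expected value.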


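With the assumption of multinomial sampling,
$$E(\delta d_i)^{2} = n_1 \theta_i (1 - \theta_i), \qquad E(\delta m_i)^{2} = n_1 \phi_i (1 - \phi_i)$$
and $E(\delta d_i\, \delta m_i) = -n_1 \theta_i \phi_i$. Substituting in (A.1) and after considerable simplification, we obtain
$$\text{Var}\{\hat{\lambda}(x_{mi})\} \approx \frac{\theta_i}{n_1 h_i^{2} (1 - \phi_i - \theta_i/2)^{2}} \left\{1 - \left[\frac{\theta_i}{2 (1 - \phi_i - \theta_i/2)}\right]^{2}\right\}. \tag{A.2}$$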

This is the formula with complete ascertainment of survival times. For incomplete samples,
we use
$$\theta_i \to \hat{P}_i \hat{q}_i, \qquad \phi_i \to 1 - \hat{P}_i, \qquad n_1 (1 - \phi_i) \to n_i,$$
where $\to$ means "is estimated by". Of course, when there are losses or withdrawals, the actual
number starting study is $n_1'$ and those alive at the start of each interval are $n_i'$ (i = 1, ..., s).
In all variance formulas, (3.3), (3.4), (3.5), and (4.1), $n_i'$ is replaced by $n_i$ since this is the estimated
number of individuals exposed to the risk of failure under the life-table assumption concerning
losses and withdrawals. This certainly affects the estimates of the survival functions and their
variances. The effect of this, while almost certainly slight in large samples, will not be
investigated here.
With the above assumptions and replacements, (A.2) becomes
$$\text{Var}\{\hat{\lambda}(x_{mi})\} \approx \frac{[\hat{\lambda}(x_{mi})]^{2}}{n_i \hat{q}_i} \left\{1 - \left[\tfrac{1}{2} \hat{\lambda}(x_{mi}) h_i\right]^{2}\right\}. \tag{A.3}$$
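The simplification step can be spelled out as follows. This is a sketch added for the reader, not a transcription of the paper: the shorthand $D_i = 1 - \phi_i - \theta_i/2$ is introduced only here, and the multinomial moments used are $E(\delta d_i)^2 = n_1\theta_i(1-\theta_i)$, $E(\delta m_i)^2 = n_1\phi_i(1-\phi_i)$ and $E(\delta d_i\,\delta m_i) = -n_1\theta_i\phi_i$.

```latex
% Shorthand for this sketch only: D_i = 1 - \phi_i - \theta_i/2,
% so that n_1 - \mu_i - \nu_i/2 = n_1 D_i and \nu_i = n_1 \theta_i.
\begin{align*}
n_1\, E\left[\left(\frac{\delta d_i}{\nu_i} + \frac{\delta m_i}{n_1 D_i}
      + \frac{\delta d_i}{2 n_1 D_i}\right)^{2}\right]
 &= \frac{1-\theta_i}{\theta_i}
  + \frac{\phi_i(1-\phi_i) + \tfrac{1}{4}\theta_i(1-\theta_i) - \theta_i\phi_i}{D_i^{2}}
  + \frac{1-\theta_i-2\phi_i}{D_i} \\
 &= \frac{1-\theta_i}{\theta_i}
  + \left(\frac{1}{D_i} - \frac{\theta_i}{4 D_i^{2}} - 1\right)
  + \left(2 - \frac{1}{D_i}\right) \\
 &= \frac{1}{\theta_i}\left[1 - \left(\frac{\theta_i}{2 D_i}\right)^{2}\right].
\end{align*}
% Multiplying by \theta_i^2 / (h_i^2 D_i^2) and dividing by n_1 gives
%   \theta_i / (n_1 h_i^2 D_i^2) * [1 - (\theta_i / (2 D_i))^2].
% Replacing \theta_i by \hat P_i \hat q_i and \phi_i by 1 - \hat P_i gives
%   D_i = \hat P_i (1 + \hat p_i)/2  and
%   \theta_i / (2 D_i) = \hat q_i / (1 + \hat p_i) = \hat\lambda(x_{mi}) h_i / 2 .
```

The second line uses $1 - \theta_i - 2\phi_i = 2D_i - 1$ and $\phi_i + \theta_i/2 = 1 - D_i$; the remaining terms in $1/D_i$ cancel.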

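We also wish to find Var $\{\hat{f}(x_{mi})\}$ where
$$\hat{f}(x_{mi}) = \frac{\hat{P}_i \hat{q}_i}{h_i}, \qquad (i = 1, \ldots, s - 1).$$
Since $\hat{P}_i = \hat{p}_1 \cdots \hat{p}_{i-1}$, the problem is to find the variance of a function of i random variables.
Using the large sample approximation formula in KENDALL and STUART [13] (p. 232), we have
$$\text{Var}\{\hat{f}(x_{mi})\} \approx \sum_{j=1}^{i} \left(\frac{\partial \hat{f}}{\partial \hat{q}_j}\right)^{2} \text{Var}\{\hat{q}_j\} + \sum_{j \ne k} \frac{\partial \hat{f}}{\partial \hat{q}_j} \frac{\partial \hat{f}}{\partial \hat{q}_k}\, \text{Cov}(\hat{q}_j, \hat{q}_k) \tag{A.4}$$
where the partial derivatives are evaluated at the expected values of the $\hat{q}_j$.
Now,
$$\frac{\partial \hat{f}}{\partial \hat{q}_j} = -\frac{\hat{P}_i \hat{q}_i}{h_i \hat{p}_j} \quad (j = 1, \ldots, i - 1), \qquad \frac{\partial \hat{f}}{\partial \hat{q}_i} = \frac{\hat{P}_i}{h_i}$$
and
$$\text{Var}\{\hat{q}_j\} \approx \frac{\hat{q}_j (1 - \hat{q}_j)}{n_j}, \qquad (j = 1, \ldots, i).$$
Also, it has been shown by CHIANG [12] that Cov $(\hat{q}_j, \hat{q}_k) = 0$, $j \ne k$.
Substituting these results in (A.4), we obtain
$$\text{Var}\{\hat{f}(x_{mi})\} \approx \frac{(\hat{P}_i \hat{q}_i)^{2}}{h_i^{2}} \left\{ \sum_{j=1}^{i-1} \frac{\hat{q}_j}{n_j \hat{p}_j} + \frac{\hat{p}_i}{n_i \hat{q}_i} \right\}.$$
This is a large sample approximation formula and is defined only when $n_i > 0$, i = 1, ..., s - 1.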
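The substitution can be checked numerically. The sketch below treats the density estimate as a function of the conditional proportions dying, applies the large-sample variance formula with numerically computed partial derivatives, binomial variances and zero covariances, and compares the result with a closed form. That closed form is our reading of the result: the squared density estimate times the Greenwood-type sum plus the extra term for the ith interval. All names are illustrative.

```python
def density_estimate(q, width):
    """Density estimate for the last interval in q: P_i * q_i / h_i."""
    cum_surv = 1.0
    for q_j in q[:-1]:
        cum_surv *= 1.0 - q_j
    return cum_surv * q[-1] / width

def delta_method_variance(q, n, width, eps=1e-6):
    """Sum of squared partial derivatives times q_j (1 - q_j) / n_j."""
    total = 0.0
    for j in range(len(q)):
        up = list(q)
        up[j] += eps
        down = list(q)
        down[j] -= eps
        partial = (density_estimate(up, width) - density_estimate(down, width)) / (2.0 * eps)
        total += partial ** 2 * q[j] * (1.0 - q[j]) / n[j]
    return total

def closed_form_variance(q, n, width):
    """Closed form obtained by inserting the partial derivatives analytically."""
    f = density_estimate(q, width)
    bracket = sum(q[j] / (n[j] * (1.0 - q[j])) for j in range(len(q) - 1))
    bracket += (1.0 - q[-1]) / (n[-1] * q[-1])
    return f ** 2 * bracket

# Conditional proportions dying and numbers exposed, first three years of the example.
q = [456 / 2418.0, 226 / 1942.5, 152 / 1686.0]
n = [2418.0, 1942.5, 1686.0]
numeric = delta_method_variance(q, n, 1.0)
closed = closed_form_variance(q, n, 1.0)
print(numeric, closed)
```

Because the density estimate is linear in each conditional proportion separately, the central differences are exact up to rounding, so the two values agree to many digits.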
