
Logistic Regression, Survival Analysis, and the Kaplan-Meier Curve

Author(s): Bradley Efron


Source: Journal of the American Statistical Association, Vol. 83, No. 402 (Jun., 1988), pp. 414-
425
Published by: American Statistical Association
Stable URL: http://www.jstor.org/stable/2288857
Accessed: 11/09/2010 09:19


We discuss the use of standard logistic regression techniques to estimate hazard rates and survival curves from censored data. These techniques allow the statistician to use parametric regression modeling on censored data in a flexible way that provides both estimates and standard errors. An example is given that demonstrates the increased structure that can be seen in a parametric analysis, as compared with the nonparametric Kaplan-Meier survival curves. In fact, the logistic regression estimates are closely related to Kaplan-Meier curves, and approach the Kaplan-Meier estimate as the number of parameters grows large.

KEY WORDS: Partial logistic regression; Parametric survival estimates; Hazard-rate curves; Parametric smoothing; Semiparametric smoothing.

1. INTRODUCTION

The Kaplan-Meier survival estimator is an important tool for analyzing censored data. Figure 1 shows a typical application. Two treatments for head and neck cancer were compared in a randomized trial. The Kaplan-Meier curves show treatment B outperforming treatment A in terms of patient survival, a result verified by standard significance tests. Good references for the Kaplan-Meier, or product-limit estimator, and life-table methods in general, include Miller (1981), Cox and Oakes (1984), Prentice and Kalbfleisch (1980), and Johnson and Elandt-Johnson (1980).

The Kaplan-Meier curve is so easy to calculate and (being totally nonparametric) requires so few assumptions that it is easy to forget its limitations. First of all, it can be inefficient compared to parametric survival estimators. This is the main point of Miller's important article "What Price Kaplan-Meier?" (Miller 1983). Secondly, survival curves are difficult to compare by eye, even in the absence of statistical noise.

Figure 2 compares treatments A and B for the cancer study in terms of their estimated hazard rates. These estimates are based on a parametric theory of survival-curve estimation, which is the main topic of this article. We now see quite a bit more structure. Both treatments start with the hazard rate near 0, followed by a high-risk period peaking at five months. The estimated hazard rates stabilize after one year, with treatment A having about 2.5 times higher risk than treatment B.

The method used to construct Figure 2, called partial logistic regression, is basically a straightforward application of logistic regression as described, for example, by Cox (1970). Section 2 gives an overview of this method, with the details filled in in Sections 3 and 4. All of this material involves a discretization of the data, even if it originally is in continuous form. Section 5 discusses the continuous limits of our discrete models. This clarifies their connection with traditional parametric survival functions such as the exponential and the Gompertz.

This article is primarily methodological. It shows how some familiar theoretical ideas (logistic regression, hazard-rate analysis, and partial likelihood) can be combined to give a simple, insightful analysis of censored data. Most of the theory has already appeared or is at least closely related to work by several authors: Cox (1975), Holford (1976), Thompson (1977), Efron (1977), Prentice and Gloeckler (1978), Pierce, Stewart, and Kopecky (1979), Anderson and Senthilselvan (1980, 1982), Padgett and Wei (1980), Mykytyn and Santner (1981), Laird and Oliver (1981), Tanner and Wong (1983, 1984), O'Sullivan (1986), Tsai (1986), and particularly Clayton (1983). These relationships are briefly discussed at the end of this article (Sec. 5, Remarks J and K).

2. PARTIAL LOGISTIC REGRESSION

This section discusses using traditional logistic regression techniques (see Cox 1970) to fit parametric survival curves to censored data. The method is easily implemented using any standard logistic regression program, gives direct estimates of the hazard rate, and provides approximate standard errors in addition to estimated survival curves and hazard rates. These parametric models are called partial logistic regressions because of a connection to Cox's (1975, ex. 2) theory of partial likelihood. The basic idea is not much different from that of the Kaplan-Meier estimator, and in fact reduces to the Kaplan-Meier estimator as the number of parameters grows large.

A concrete example will help explain the ideas and notation involved. Table 1 shows the data for arm A of the head-and-neck-cancer study, discretized by one-month intervals. The original undiscretized data appear at the bottom of the table. For each value of $i$ from 1 to $N = 47$ (the last month with any data), the table shows

$n_i$ = no. of patients at risk at the beginning of month $i$
$s_i$ = no. of patients who died during month $i$
$s_i'$ = no. of patients lost to follow-up during month $i$.   (2.1)

* Bradley Efron is Professor, Department of Statistics, Stanford University, Stanford, CA 94305. This article was stimulated by a talk of Wei-Yang Tsai concerning isotonic hazard-rate estimation.

© 1988 American Statistical Association. Journal of the American Statistical Association, June 1988, Vol. 83, No. 402, Theory and Methods.
Efron:Survival Analysis Via Kaplan-Meier 415

[Figure 1: survival probability (0.0 to 1.0) plotted against months (0 to 80) for arms A and B.]

Figure 1. Kaplan-Meier Estimated Survival Curves, Arms A and B. These estimates are taken from a study comparing radiation therapy alone (A) versus radiation plus chemotherapy (B) for the treatment of head and neck cancer. Treatment B is significantly better according to the Mantel-Haenszel test, significance level .01 (see Tables 1 and 2). "Death" actually means "recurrence of disease." The error bars indicate ± one standard error.

Table 1. Data for Arm A of the Head-and-Neck-Cancer Study Conducted by the Northern California Oncology Group, Discretized by Months

Month  n_i  s_i  s_i'      Month  n_i  s_i  s_i'
    1   51    1    0          25    7    0    0
    2   50    2    0          26    7    0    0
    3   48    5    1          27    7    0    0
    4   42    2    0          28    7    0    0
    5   40    8    0          29    7    0    0
    6   32    7    0          30    7    0    0
    7   25    0    1          31    7    0    0
    8   24    3    0          32    7    0    0
    9   21    2    0          33    7    0    0
   10   19    2    1          34    7    0    0
   11   16    0    1          35    7    0    0
   12   15    0    0          36    7    0    0
   13   15    0    0          37    7    1    1
   14   15    3    0          38    5    1    0
   15   12    1    0          39    4    0    0
   16   11    0    0          40    4    0    0
   17   11    0    0          41    4    0    1
   18   11    1    1          42    3    0    0
   19    9    0    0          43    3    0    0
   20    9    2    0          44    3    0    0
   21    7    0    0          45    3    0    1
   22    7    0    0          46    2    0    0
   23    7    0    0          47    2    1    1
   24    7    0    0
Total  628   42    9

NOTE: n_i is the number of patients at risk at the beginning of month i, s_i the number of observed deaths, s_i' the number lost to follow-up. The survival times in days for the 51 patients were 7, 34, 42, 63, 64, 74+, 83, 84, 91, 108, 112, 129, 133, 133, 139, 140, 140, 146, 149, 154, 157, 160, 160, 165, 173, 176, 185+, 218, 225, 241, 248, 273, 277, 279+, 297, 319+, 405, 417, 420, 440, 523, 523+, 583, 594, 1,101, 1,116+, 1,146, 1,226+, 1,349+, 1,412+, 1,417 ("+" indicates lost to follow-up). The table was constructed from these data, taking one month to be 30.438 days.

For example, $n_3 = 48$ patients were alive at the beginning of the third month of observation, during which $s_3 = 5$ patients died and $s_3' = 1$ patient was lost to follow-up. This left $n_4 = 42$ patients still under study at the beginning of month 4. "Lost to follow-up," or "censored," data occurred mainly because patients entered the study at different calendar times, and some of them were still alive when the data were collected at the end of the study.
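
The discretization behind Table 1 can be sketched in a few lines of Python. This is only an illustration: the function name `discretize` and the convention that month i covers the day interval ((i-1)·30.438, i·30.438] are assumptions, chosen because they reproduce the counts and totals of Table 1 from the survival times listed in its note.

```python
import math

DAYS_PER_MONTH = 30.438  # conversion stated in the note to Table 1

# Arm A survival times in days; a trailing "+" marks a patient lost to follow-up.
ARM_A = """7 34 42 63 64 74+ 83 84 91 108 112 129 133 133 139 140 140 146 149 154 157
160 160 165 173 176 185+ 218 225 241 248 273 277 279+ 297 319+ 405 417
420 440 523 523+ 583 594 1101 1116+ 1146 1226+ 1349+ 1412+ 1417""".split()


def discretize(times, days_per_month=DAYS_PER_MONTH):
    """Return lists (n, s, s_lost), one entry per month, in the notation of (2.1)."""
    events = []
    for token in times:
        censored = token.endswith("+")
        days = float(token.rstrip("+"))
        # Assumed convention: month i covers ((i - 1) * 30.438, i * 30.438] days.
        month = math.ceil(days / days_per_month)
        events.append((month, censored))
    n_months = max(month for month, _ in events)
    s = [0] * n_months        # deaths per month
    s_lost = [0] * n_months   # losses to follow-up per month
    for month, censored in events:
        if censored:
            s_lost[month - 1] += 1
        else:
            s[month - 1] += 1
    n = []                    # patients at risk at the start of each month
    at_risk = len(events)
    for i in range(n_months):
        n.append(at_risk)
        at_risk -= s[i] + s_lost[i]
    return n, s, s_lost


n, s, s_lost = discretize(ARM_A)
print(len(n), sum(n), sum(s), sum(s_lost))  # number of months and column totals
print(n[2], s[2], s_lost[2])                # month 3
```

The column totals and the month 3 entries can be checked directly against Table 1 (628, 42, 9 and 48, 5, 1).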
Table 2 shows the discretized data for arm B of the study. Here we have used $N = 61$ discrete intervals, not all of the same length. (The choice of discretization made little difference in the estimated hazard rates and survival curves; see Remark E, Sec. 3, and Remark I, Sec. 5.)

Our basic assumption is that for data of type (2.1), the number of deaths $s_i$ is binomially distributed, given $n_i$, say

$$s_i \mid n_i \sim \mathrm{Bi}(n_i, h_i) \quad \text{independently}, \quad i = 1, 2, \ldots, N. \tag{2.2}$$

In other words, $s_i$ has discrete density

$$\binom{n_i}{s_i} h_i^{s_i} (1 - h_i)^{n_i - s_i}, \qquad s_i = 0, 1, 2, \ldots, n_i.$$

Here $h_i$ is the discrete hazard rate:

$$h_i = \Pr\{\text{patient dies during $i$th interval} \mid \text{patient survives until beginning of $i$th interval}\}. \tag{2.3}$$

The binomial assumption in (2.2) is basic to most work in survival analysis. Nice discussions appear in chapter 4 of Cox and Oakes (1984) and section (5.2) of Kalbfleisch and Prentice (1980). In what follows, we consider the $n_i$ to be fixed at their observed values, and take literally the independence assumption in (2.2). Although this assumption cannot be exactly true (see Sec. 3, Remark A), it leads to reasonable conclusions under the usual assumptions for censored data.

[Figure 2: estimated hazard rate (0 to 0.20) plotted against months for arms A and B.]

Figure 2. Hazard-Rate Estimates for the Head-and-Neck Cancer Study. There is an early high-risk period for both treatments. The hazard rates stabilize after one year, with treatment A having a hazard rate roughly 2.5 times that of treatment B. (The bullets are identifying symbols for curve A, not data points.) This figure is based on a parametric analysis described in Section 2.

The survival function for our discretized situation is

$$G_i = \prod_{1 \le j < i} (1 - h_j), \tag{2.4}$$

the probability that a patient does not die during the first $i - 1$ time intervals and thus survives at least until the
beginning of the $i$th interval ($G_1 = 1$, by definition). The life-table method estimates each $h_i$ by $\tilde h_i = s_i/n_i$, the obvious choice from (2.2), and then substitutes $\tilde h_i$ for $h_i$ in (2.4) to get the life-table survival estimate

$$\tilde G_i = \prod_{1 \le j < i} (1 - \tilde h_j). \tag{2.5}$$

The Kaplan-Meier curve is the same as (2.5), except that the time intervals are chosen so small that $s_i$ never exceeds 1.

Table 2. Discretized Data for Arm B of the Head-and-Neck-Cancer Study

Month   n_i  s_i  s_i'      Month   n_i  s_i  s_i'
   .5    45    0    0        23.0    14    0    0
  1.0    45    0    0        24.0    14    1    0
  1.5    45    1    0        25.0    13    0    1
  2.0    44    0    0        26.0    12    0    0
  2.5    44    0    0        27.0    12    1    0
  3.0    44    1    0        29.0    11    0    0
  3.5    43    2    0        31.0    11    0    0
  4.0    41    3    0        33.0    11    0    0
  4.5    38    3    0        35.0    11    0    0
  5.0    35    2    0        37.0    11    0    1
  5.5    33    2    0        39.0    10    0    0
  6.0    31    2    1        41.0    10    0    1
  6.5    28    2    0        43.0     9    0    0
  7.0    26    1    0        45.0     9    0    1
  7.5    25    0    0        47.0     8    0    0
  8.0    25    0    0        49.0     8    0    0
  8.5    25    1    0        51.0     8    0    0
  9.0    24    0    0        53.0     8    1    0
 10.0    24    1    0        55.0     7    0    1
 11.0    23    1    0        57.0     6    0    0
 12.0    22    1    0        59.0     6    1    1
 13.0    21    0    0        61.0     4    0    0
 14.0    21    0    0        63.0     4    0    1
 15.0    21    1    0        65.0     3    0    0
 16.0    20    1    0        67.0     3    0    1
 17.0    19    0    0        69.0     2    0    0
 18.0    19    1    2        71.0     2    0    1
 19.0    16    0    0        73.0     1    0    0
 20.0    16    0    0        75.0     1    0    0
 21.0    16    1    1        77.0     1    0    1
 22.0    14    0    0
Total 1,123   31   14

NOTE: Discretization is by half-months until the end of month 9, by months until the end of month 27, and by two-month intervals until the end of month 77, for a total of N = 61 intervals. The survival times in days for the 45 patients were 37, 84, 92, 94, 110, 112, 119, 127, 130, 133, 140, 146, 155, 159, 169+, 173, 179, 194, 195, 209, 249, 281, 319, 339, 432, 469, 519, 528+, 547+, 613+, 633, 725, 759+, 817, 1,092+, 1,245+, 1,331+, 1,557, 1,642+, 1,771+, 1,776, 1,897+, 2,023+, 2,146+, 2,297+.

Our tactic in this paper is to estimate $h_i$ by logistic regression. Let $\lambda_i$ be the logistic parameter

$$\lambda_i = \log[h_i/(1 - h_i)], \qquad i = 1, 2, \ldots, N, \tag{2.6}$$

so $h_i = [1 + \exp(-\lambda_i)]^{-1}$. For each value of $i$, let $x_i$ be a known $1 \times p$ covariate vector. For example, we might consider a cubic logistic regression in time,

$$x_i = (1, t_i, t_i^2, t_i^3), \qquad i = 1, 2, \ldots, N, \tag{2.7}$$

where $t_i$ is the midpoint of the $i$th time interval ($t_i = i - .5$ months in Table 1). The logistic regression model is

$$\lambda_i = x_i \alpha, \tag{2.8}$$

where $\alpha$ is a $p \times 1$ vector of unknown parameters. A slightly more general form of (2.8) is used when the discretization intervals are of unequal lengths, as in Table 2. (See Sec. 3, Remark E.)

A standard logistic regression program (using the GLIM package, for example) finds the maximum likelihood estimate (MLE) $\hat\alpha$ for $\alpha$, assuming $s_i \sim \mathrm{Bi}(n_i, h_i)$ independently as in (2.2). This gives $\hat h_i = [1 + \exp(-x_i \hat\alpha)]^{-1}$ as the MLE of the hazard rate $h_i$, and $\hat G_i = \prod_{1 \le j < i} (1 - \hat h_j)$ as the MLE of the survival curve $G_i$. We call this procedure a partial logistic regression because of its connection with the theory of partial likelihood; see in particular example 2 of Cox (1975).

Figure 3 compares the life-table estimate $\tilde G_i$, (2.5), with two different partial logistic regression models. The triangles indicate $\hat G_i$ based on the cubic model (2.7). The bullets indicate $\hat G_i$ based on model (2.8), with $p = 4$ and

$$x_i = \bigl(1, \; t_i, \; (t_i - 11)_-^2, \; (t_i - 11)_-^3\bigr), \tag{2.9}$$

where $(t_i - 11)_- = \min\{(t_i - 11), 0\}$. Specification (2.9) is a cubic-linear spline, with the join point at $t = 11$ months. The logit $\lambda_i$ is allowed to vary as a cubic function of time before $t = 11$ months, but only linearly in time after 11 months. Moreover, the logit thought of as a continuous function of time, say $\lambda(t) = \alpha_1 + \alpha_2 t + \alpha_3 (t - 11)_-^2 + \alpha_4 (t - 11)_-^3$, has an everywhere-continuous first derivative, even at $t = 11$. Figure 4 shows model (2.9) applied to arm B of the head-and-neck experiment.

[Figure 3: survival probability (0.0 to 1.0) plotted against months (0 to 50) for arm A.]

Figure 3. Parametric Versus Nonparametric Survival Estimation, Arm A. A life-table estimate for arm A (jagged solid line) is compared with two parametric estimates: cubic partial logistic regression (triangles) and cubic-linear spline joined at 11 months (smooth curve, indicated by bullets).

The hazard-rate estimates in Figure 2 are based on the cubic-linear spline (2.9). The rationale for this model is simple: The head-and-neck-cancer study is typical of many medical survival situations in having more data available in the early months, and also in having a more complicated early structure. Model (2.9) is designed to match the complexity of the fitted curve to the availability of statistical information, and to the perceived need for complexity. Model (2.9), including the choice of the join point, is discussed further in Section 4.

[Figure 4: survival probability (0.0 to 1.0) plotted against months (0 to 80) for arm B.]

Figure 4. Parametric Versus Nonparametric Survival Estimates, Arm B. Life-table estimate for arm B (jagged line) is compared with partial logistic regression based on cubic-linear spline joined at 11 months [see (2.9)].

3. MAXIMUM LIKELIHOOD ESTIMATES AND STANDARD ERRORS

This section discusses calculation of maximum likelihood estimates and their standard errors, for partial logistic regression models. Using the arm A data of Table 1 as an example, we show how the parametric survival estimates approach the life-table curves and how their estimated standard errors approach those given by Greenwood's formula, as the parametric models become more complicated. The additional theory required for models involving join-point estimation, such as the cubic-linear spline (2.9), is discussed in Section 4.

Suppose, then, that we have $s_i \mid n_i \sim \mathrm{Bi}(n_i, h_i)$ as in (2.2), where the $n_i$ are considered fixed at their observed values, and the $s_i$ are taken to be independently distributed, given the $n_i$. (The independence assumption is further discussed in Remark A.) Also, assume that the logistic parameter $\lambda_i = \log[h_i/(1 - h_i)]$ follows the linear logistic model $\lambda_i = x_i \alpha$ as in (2.8), so $h_i = [1 + \exp(-\lambda_i)]^{-1}$. We occasionally write $\lambda_{i,\alpha}$ and $h_{i,\alpha}$ to emphasize the dependence on $\alpha$.

These assumptions describe a standard logistic regression model (e.g., see Cox 1970), so we will quote without proof the usual results for maximum likelihood estimation in such models. Let $s = (s_1, s_2, \ldots, s_N)'$, $n h_\alpha = (n_1 h_{1,\alpha}, n_2 h_{2,\alpha}, \ldots, n_N h_{N,\alpha})'$, and $X$ equal the $N \times p$ matrix having vector $x_i$ of (2.8) as its $i$th row. Then the $p$-dimensional score vector $\dot l_\alpha = (\ldots, (\partial/\partial\alpha_j) \log f_\alpha(s), \ldots)'$ is

$$\dot l_\alpha = X'(s - n h_\alpha). \tag{3.1}$$

The MLE of $\alpha$ is that $\hat\alpha$ that makes (3.1) equal 0.

The $p \times p$ second derivative matrix $-\ddot l_\alpha$, with $jl$th element $-(\partial^2/\partial\alpha_j \partial\alpha_l) \log f_\alpha(s)$, is given by

$$-\ddot l_\alpha = X' \operatorname{diag}(n_i V_{i,\alpha}) X. \tag{3.2}$$

Here $V_{i,\alpha} = h_{i,\alpha}(1 - h_{i,\alpha})$, and $\operatorname{diag}(n_i V_{i,\alpha})$ is the $N \times N$ diagonal matrix with diagonal elements $n_i V_{i,\alpha}$. The expected Fisher information matrix for $\alpha$, $\mathcal{I}_\alpha = E\{\dot l_\alpha(s) \dot l_\alpha(s)'\} = E\{-\ddot l_\alpha\}$, also equals $X' \operatorname{diag}(n_i V_{i,\alpha}) X$. The observed Fisher information matrix is defined to be $\mathcal{I} = \mathcal{I}_{\hat\alpha}$, or equivalently from (3.2) $\mathcal{I} = -\ddot l_{\hat\alpha} = X' \operatorname{diag}(n_i V_{i,\hat\alpha}) X$.

Estimated standard errors (SE's) for quantities of interest such as $\hat\alpha$, $\hat h_i$, and $\hat G_i$ are obtained from familiar maximum likelihood calculations:

$$\begin{aligned}
\widehat{\operatorname{cov}}(\hat\alpha) &= \mathcal{I}^{-1}, \\
\mathrm{SE}(h_{i,\hat\alpha}) &= V_{i,\hat\alpha} \, [x_i \mathcal{I}^{-1} x_i']^{1/2}, \\
\mathrm{SE}(G_{i,\hat\alpha}) &= G_{i,\hat\alpha} \Bigl[ \Bigl( \sum_{j<i} h_{j,\hat\alpha} x_j \Bigr) \mathcal{I}^{-1} \Bigl( \sum_{j<i} h_{j,\hat\alpha} x_j \Bigr)' \Bigr]^{1/2}.
\end{aligned} \tag{3.3}$$

Here $G_{i,\hat\alpha} = \prod_{j<i} (1 - h_{j,\hat\alpha})$. We usually use the shorter notation $\hat h_i = h_{i,\hat\alpha}$, $\hat G_i = G_{i,\hat\alpha}$.

Table 3 gives estimated hazard rates and their standard errors for three conditional logistic regressions (2.8), fit to the Table 1 arm A data: a linear model $x_i = (1, t_i)$, the cubic model (2.7), and the cubic-linear spline (2.9). (A

Table 3. Estimated Hazard Rates and Their Standard Errors at Selected Time Points, for Table 1 Arm A Data

                      Hazard estimate                           Standard error estimate
Month  Linear  Cubic  Cubic-linear  Life-table  LTSM     Linear  Cubic  Cubic-linear  LTSM
  1     .090   .053      .015          .020     .019     .0177   .0184     .0122      .0218
  3     .085   .076      .087          .104     .071     .0152   .0165     .0228      .0160
  5     .080   .095      .146          .200     .102     .0132   .0163     .0316      .0220
  7     .076   .106      .123          .0       .099     .0117   .0191     .0319      .0197
  9     .072   .107      .076          .095     .090     .0108   .0217     .0279      .0164
 11     .068   .098      .051          .0       .075     .0102   .0218     .0185      .0199
 15     .060   .068      .047          .083     .058     .0102   .0181     .0178      .0241
 20     .052   .034      .043          .222     .038     .0113   .0132     .0131      .0225
 25     .045   .016      .039          .0       .008     .0126   .0092     .0119      .0192
 30     .039   .010      .035          .0       .0       .0136   .0072     .0133      .0162
 35     .033   .010      .032          .0       .033     .0140   .0075     .0158      .0172
 40     .029   .021      .029          .0       .053     .0145   .0138     .0183      .0193
 45     .025   .112      .027          .0       .092     .0145   .0768     .0205      .0298
 47     .023   .266      .026          .500     .099     .0140   .1956     .0213      .0331

NOTE: There are three conditional logistic regressions: linear, cubic, and cubic-linear spline, as explained in the text; the life-table estimate s_i/n_i; and a smoothed version of the life-table estimate (LTSM), obtained from an adaptive local regression smoother applied to the life-table estimate. Estimated standard errors for the linear and cubic models were obtained from (3.3). The standard error for cubic-linear spline includes a term for the choice of the join at 11 months, as explained in Section 4. The standard error for LTSM was obtained from 25 bootstrap replications.
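
The score equation (3.1), the information matrix (3.2), and the hazard standard error in (3.3) can be sketched for the two-parameter linear model x_i = (1, t_i). The Newton-Raphson iteration below is an assumed, generic fitting method (the article only refers to a standard logistic regression program), and all function names are hypothetical. The data are the n_i and s_i columns of Table 1.

```python
import math

# Table 1 (arm A): patients at risk n_i and deaths s_i for months 1..47.
N_AT_RISK = ([51, 50, 48, 42, 40, 32, 25, 24, 21, 19, 16, 15, 15, 15, 12, 11, 11, 11, 9, 9]
             + [7] * 17 + [5, 4, 4, 4, 3, 3, 3, 3, 2, 2])
DEATHS = ([1, 2, 5, 2, 8, 7, 0, 3, 2, 2, 0, 0, 0, 3, 1, 0, 0, 1, 0, 2]
          + [0] * 16 + [1, 1] + [0] * 8 + [1])
T_MID = [i - 0.5 for i in range(1, 48)]  # interval midpoints t_i = i - 0.5


def hazards(alpha, t):
    """h_i = 1 / (1 + exp(-(a0 + a1 * t_i))) for the linear logit model."""
    a0, a1 = alpha
    return [1.0 / (1.0 + math.exp(-(a0 + a1 * ti))) for ti in t]


def score_and_information(alpha, n, s, t):
    """Score vector (3.1) and information matrix (3.2) for x_i = (1, t_i)."""
    h = hazards(alpha, t)
    resid = [si - ni * hi for ni, si, hi in zip(n, s, h)]   # s - n h
    w = [ni * hi * (1.0 - hi) for ni, hi in zip(n, h)]      # n_i V_i
    score = (sum(resid), sum(ti * ri for ti, ri in zip(t, resid)))
    info = (sum(w),
            sum(wi * ti for wi, ti in zip(w, t)),
            sum(wi * ti * ti for wi, ti in zip(w, t)))       # (I00, I01, I11)
    return score, info


def fit_linear_logit(n, s, t, iterations=25):
    """Newton-Raphson: alpha <- alpha + I^{-1} score, starting from a constant hazard."""
    p0 = sum(s) / sum(n)
    alpha = [math.log(p0 / (1.0 - p0)), 0.0]
    for _ in range(iterations):
        (u0, u1), (i00, i01, i11) = score_and_information(alpha, n, s, t)
        det = i00 * i11 - i01 * i01
        alpha[0] += (i11 * u0 - i01 * u1) / det
        alpha[1] += (i00 * u1 - i01 * u0) / det
    return alpha


def hazard_standard_errors(alpha, n, s, t):
    """SE(h_i) = V_i * sqrt(x_i I^{-1} x_i'), the second line of (3.3)."""
    h = hazards(alpha, t)
    _, (i00, i01, i11) = score_and_information(alpha, n, s, t)
    det = i00 * i11 - i01 * i01
    return [hi * (1.0 - hi) * math.sqrt((i11 - 2.0 * i01 * ti + i00 * ti * ti) / det)
            for hi, ti in zip(h, t)]


alpha_hat = fit_linear_logit(N_AT_RISK, DEATHS, T_MID)
h_hat = hazards(alpha_hat, T_MID)
se_hat = hazard_standard_errors(alpha_hat, N_AT_RISK, DEATHS, T_MID)
for month in (1, 5, 25, 47):
    print(month, round(h_hat[month - 1], 3), round(se_hat[month - 1], 4))
```

The printed hazards and standard errors can be compared with the Linear columns of Table 3. Because the model contains an intercept, the first score equation forces the fitted expected deaths, the sum of n_i ĥ_i, to equal the 42 observed deaths.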

Table 4. Residual Analysis of Four Hazard Models Fit to Table 1 Data

                             Expected deaths                    Signed deviance residuals
Month    n    s    Linear  Cubic  Cubic-linear  LTSM     Linear  Cubic  Cubic-linear  LTSM
1       51    1     4.59   2.70       .76        .97     -2.10   -1.22      .27        .03
2       50    2     4.40   3.20      2.18       2.25     -1.33    -.74     -.13       -.17
3       48    5     4.08   3.65      4.16       3.41       .46     .70      .42        .84
4       42    2     3.49   3.65      5.31       3.82      -.90    -.98    -1.73      -1.07
5-6     72   15     5.70   7.06     10.40       7.41      3.44    2.78     1.46       2.63
7-8     49    3     3.68   5.24      5.41       4.78      -.38   -1.12    -1.19       -.91
9-11    56    4     3.93   5.77      3.54       4.63       .04    -.82      .25       -.31
12-14   45    3     2.88   3.81      2.18       2.89       .07    -.45      .54        .08
15-18   45    2     2.60   2.56      2.05       2.41      -.40    -.38     -.03       -.28
19-24   46    2     2.31   1.33      1.91       1.38      -.22     .55      .06        .50
25-31   49    0     2.02    .60      1.80        .84     -2.03   -1.09    -1.91       -.41
32-38   47    2     1.57    .49      1.52       1.45       .38    1.63      .38        .44
39-47   28    1      .75   1.96       .78       1.97       .28    -.78      .24       -.79
Total  628   42

Sum of squares                                            23.71   18.24    11.02      11.04
Chi-squared significance level                             .014    .032     .201
Degrees of freedom                                         (11)     (9)      (8)

NOTE: The models are the same ones considered in Table 3. The cubic-linear spline fits the data significantly better than either the linear or cubic models.
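
The signed deviance residuals in the right panel of Table 4 follow formula (3.5) of the text. A minimal Python sketch of that formula is given here; the function name is hypothetical, and treating 0·log 0 as 0 (needed for the period with no deaths) is an assumption based on the usual limiting convention. The inputs are rows of Table 4 for the linear model, using the rounded expected deaths printed there.

```python
import math


def signed_deviance_residual(n, s, e):
    """Signed deviance residual (3.5) for one time period.

    n: person-months at risk, s: observed deaths, e: expected deaths.
    Terms of the form 0 * log(0) are taken as 0, their limiting value.
    """
    def term(observed, expected):
        return observed * math.log(observed / expected) if observed > 0 else 0.0

    deviance = 2.0 * (term(s, e) + term(n - s, n - e))
    # Rounding can push a tiny deviance slightly below zero; clamp before the root.
    root = math.sqrt(max(deviance, 0.0))
    return math.copysign(root, s - e)


# (period, n, s, expected deaths under the linear model), copied from Table 4.
rows = [("1", 51, 1, 4.59), ("5-6", 72, 15, 5.70), ("25-31", 49, 0, 2.02)]
for period, n_j, s_j, e_j in rows:
    print(period, round(signed_deviance_residual(n_j, s_j, e_j), 2))
```

Because the expected deaths are taken from the table at two-decimal precision, the results agree with the Linear residual column only up to rounding in the last digit.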

quadratic model was also used, but gave almost the same estimated hazards as the linear model.) The estimated standard errors for the linear and cubic regressions were obtained from (3.3). The standard error for the cubic-linear spline includes an additional term for estimating the join, as explained in Section 4. A semiparametric method (the smoothed life-table estimate) also appears in Table 3 and is discussed in Remark B of this section.

The linear model has the smallest estimated standard errors, but it does not fit the arm A data very well. This is seen in Table 4, where the data from Table 1 have been grouped into 13 time periods, each containing roughly 50 person-months of observation. For example, the seventh time period includes months 9-11, with 56 = 21 + 19 + 16 total person-months of observation and 4 = 2 + 2 + 0 total deaths, as shown in Table 1.

The expected deaths for each model,

$$\hat e_j = \sum_{i \,\in\, j\text{th time period}} n_i \hat h_i, \tag{3.4}$$

appear in the middle panel of Table 4. How well or poorly these match the observed deaths $s_j$ is measured by the signed deviance residuals

$$R_j = \sqrt{2} \, \operatorname{sign}(s_j - \hat e_j) \left[ s_j \log\frac{s_j}{\hat e_j} + (n_j - s_j) \log\frac{n_j - s_j}{n_j - \hat e_j} \right]^{1/2}, \tag{3.5}$$

shown in the right panel. If a model is correct, in the sense that it contains the true hazard function, then the $R_j$ should be approximate standard normal deviates, with sum of squares $\sum R_j^2$ approximately chi-squared distributed with 13 − (no. of model parameters) df (see McCullagh and Nelder 1983, sec. 2.4.3). A numerical investigation suggested that replacing 13 by 14.3 increases the accuracy of the chi-squared approximation here, but does not substantially change the observed significance levels.

The bottom of the table shows significantly too-large values of $\sum R_j^2$ for the linear and cubic models, but the cubic-linear model, with 5 df including the choice of join, has an acceptable $\chi^2_8$ significance level of .201. The differences in $\sum R_j^2$ are also significant; for example, 18.24 − 11.02 = 7.22 is .005 significant, compared with a $\chi^2_1$ distribution, indicating a genuinely improved fit in going from a cubic to a cubic-linear model.

The nonparametric life-table estimate of the hazard rate is $\tilde h_i = s_i/n_i$. This estimate is always unbiased for $h_i$, assuming $n_i > 0$, but is usually too variable to be of direct use. (The excessive variability of $\tilde h_i$ is obvious in Table 3.) Of course, we can do better with a parametric model if the parametric assumptions are correct. The following results for conditional logistic regression models are further discussed in Remark D. The ratio of asymptotic variances of the parametric hazard estimate $\hat h_i = h_{i,\hat\alpha}$, compared with the nonparametric estimate $\tilde h_i$, is

$$\frac{\operatorname{var}\{\hat h_i\}}{\operatorname{var}\{\tilde h_i\}} = z_i \Bigl[ \sum_{j=1}^N z_j' z_j \Bigr]^{-1} z_i', \qquad z_i = \sqrt{n_i V_{i,\alpha}} \; x_i. \tag{3.6}$$

This ratio is always less than 1.

Theorem. The average ratio of asymptotic variances between the parametric and nonparametric hazard-rate estimates is

$$\frac{1}{N} \sum_{i=1}^N \frac{\operatorname{var}\{\hat h_i\}}{\operatorname{var}\{\tilde h_i\}} = \frac{p}{N}. \tag{3.7}$$

In other words, if we estimate the hazard rate using a $p$-parameter conditional logistic regression model, and if the model is correct, then the asymptotic variance of the hazard estimates is reduced by a factor $p/N$, compared with the nonparametric estimates, in the sense of (3.7). As $p \to N$ the advantage of $\hat h_i$ over $\tilde h_i$ disappears. In fact, if $p = N$ and the $N \times N$ matrix $X$ is of full rank, then $\hat h_i = \tilde h_i$.
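
The theorem can be checked numerically for the simplest case p = 2, the linear model x_i = (1, t_i), where the 2 × 2 information matrix can be inverted in closed form. The sketch below is illustrative only: the function name is hypothetical and the coefficient vector `alpha_demo` is an arbitrary assumed value rather than a fitted one, since the identity (3.7) holds for any α.

```python
import math

# Table 1 (arm A): patients at risk n_i for months 1..47, with midpoints t_i = i - 0.5.
N_AT_RISK = ([51, 50, 48, 42, 40, 32, 25, 24, 21, 19, 16, 15, 15, 15, 12, 11, 11, 11, 9, 9]
             + [7] * 17 + [5, 4, 4, 4, 3, 3, 3, 3, 2, 2])
T_MID = [i - 0.5 for i in range(1, 48)]


def variance_ratios(n, t, alpha):
    """Ratios (3.6) for the two-parameter model x_i = (1, t_i)."""
    a0, a1 = alpha
    h = [1.0 / (1.0 + math.exp(-(a0 + a1 * ti))) for ti in t]
    w = [ni * hi * (1.0 - hi) for ni, hi in zip(n, h)]  # n_i V_i
    # Entries of the information matrix X' diag(n_i V_i) X and its determinant.
    i00 = sum(w)
    i01 = sum(wi * ti for wi, ti in zip(w, t))
    i11 = sum(wi * ti * ti for wi, ti in zip(w, t))
    det = i00 * i11 - i01 * i01
    # z_i [sum_j z_j' z_j]^{-1} z_i' with z_i = sqrt(n_i V_i) * (1, t_i).
    return [wi * (i11 - 2.0 * i01 * ti + i00 * ti * ti) / det for wi, ti in zip(w, t)]


alpha_demo = (-2.3, -0.03)  # hypothetical coefficients, for illustration only
ratios = variance_ratios(N_AT_RISK, T_MID, alpha_demo)
print(sum(ratios), sum(ratios) / len(ratios))  # sum is p = 2, average is p/N = 2/47
```

The individual ratios are the diagonal elements of a projection matrix, which is why each lies between 0 and 1 and why they sum to the number of parameters.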

Table 5. Estimated Survival Functions and Standard Errors

                 Estimated survival                 Estimated standard errors for log survival
Month  Linear  Cubic  Cubic-linear  Life-table     Linear  Cubic  Cubic-linear  Life-table
  1     .910   .947      .985          .980         .019   .019      .031          .033
  3     .759   .819      .860          .843         .054   .055      .059          .065
  5     .640   .677      .642          .642         .084   .086      .090          .095
  7     .545   .543      .483          .501         .109   .112      .129          .132
  9     .469   .433      .402          .397         .131   .143      .155          .168
 11     .407   .350      .359          .355         .150   .177      .186          .202
 15     .313   .250      .295          .261         .184   .236      .256          .256
 20     .236   .197      .235          .184         .222   .284      .286          .299
 25     .185   .176      .191          .184         .262   .312      .309          .325
 30     .150   .166      .159          .184         .307   .325      .319          .338
 35     .125   .158      .134          .184         .358   .341      .324          .348
 40     .107   .147      .115          .126         .413   .360      .338          .370
 45     .094   .108      .100          .126         .470   .445      .436          .493
 47     .089   .065      .095          .063         .493   .718      .686          .726

NOTE: Left panel: estimated survival G at the end of the indicated months, for four different estimators applied to the arm A data, Table 1. Right panel: estimated standard errors for log{G}. The standard error for the cubic-linear spline includes a term for the choice of join (see Sec. 4). Note that the life-table estimate is only slightly more variable than the cubic-linear spline.
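
The Life-table column in the left panel of Table 5 can be reproduced from Table 1 with the product formula (2.5). The sketch below also accumulates the standard error of the log survival using Greenwood's formula, which appears as (3.9) in this section; the function name `life_table` is hypothetical, and survival is reported at the end of each month (that is, G_{i+1} in the notation of (2.4)), as in the table note.

```python
import math

# Table 1 (arm A): patients at risk n_i and deaths s_i for months 1..47.
N_AT_RISK = ([51, 50, 48, 42, 40, 32, 25, 24, 21, 19, 16, 15, 15, 15, 12, 11, 11, 11, 9, 9]
             + [7] * 17 + [5, 4, 4, 4, 3, 3, 3, 3, 2, 2])
DEATHS = ([1, 2, 5, 2, 8, 7, 0, 3, 2, 2, 0, 0, 0, 3, 1, 0, 0, 1, 0, 2]
          + [0] * 16 + [1, 1] + [0] * 8 + [1])


def life_table(n, s):
    """Life-table survival (2.5) at the end of each interval, with the
    Greenwood standard error (3.9) of its logarithm."""
    survival, se_log = [], []
    g, var_log = 1.0, 0.0
    for ni, si in zip(n, s):
        g *= 1.0 - si / ni                   # multiply by (1 - s_i / n_i)
        if si < ni:
            var_log += si / (ni * (ni - si))  # Greenwood increment
        else:
            var_log = math.inf               # survival hits 0; log is undefined
        survival.append(g)
        se_log.append(math.sqrt(var_log))
    return survival, se_log


survival, se_log = life_table(N_AT_RISK, DEATHS)
for month in (1, 3, 5):
    print(month, round(survival[month - 1], 3), round(se_log[month - 1], 4))
```

The survival values for months 1, 3, and 5 match the Life-table column of Table 5 (.980, .843, .642). The Greenwood standard errors computed here use the observed s_i/n_i, so they are not expected to match the right panel of Table 5, which the text says was calculated assuming the cubic model to be true.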

Suppose we are interested in estimating the survival function $G_i$ rather than the hazard rate $h_i$. In this case, parametric methods offer less impressive improvements over the nonparametric life-table approach. Table 5 compares the estimated survival curves $\hat G_i$ (from the linear, cubic, and cubic-linear spline models) with the nonparametric life-table estimate (2.5). The right panel shows estimated standard errors for $\log\{\hat G_i\}$. The cubic-linear spline, which was our only parametric model giving a satisfactory fit to the data in Table 1, is only slightly less variable than the life-table estimate. In the notation of (3.3),

$$\mathrm{SE}(\log G_{i,\hat\alpha}) = \Bigl[ \Bigl( \sum_{j<i} h_{j,\hat\alpha} x_j \Bigr) \mathcal{I}^{-1} \Bigl( \sum_{j<i} h_{j,\hat\alpha} x_j \Bigr)' \Bigr]^{1/2}. \tag{3.8}$$

It is easier to compare standard errors for $\log \hat G$ than for $\hat G$ itself, because the factor $G_{i,\hat\alpha}$ in the third equation of formula (3.3) is removed. To further sharpen the comparison, all of the standard errors in Table 5 were calculated assuming that the cubic model was true.

Formula (3.8) is closely related to Greenwood's formula for the variance of the life-table estimate. Suppose in (2.8) we take $p = N$ and $x_i = e_i$, the $N$-dimensional vector $(0, 0, \ldots, 1, 0, \ldots, 0)$ with 1 in the $i$th place ($i = 1, 2, \ldots, N$). Then $X$ equals the $N \times N$ identity matrix, and (3.1) shows that the MLE $\hat h_i$ equals $s_i/n_i = \tilde h_i$, the nonparametric MLE. In this case, the MLE of the survival curve, $\hat G_i$, equals (2.5), the life-table estimate $\tilde G_i = \prod_{1 \le j < i} (1 - \tilde h_j)$. The observed information matrix $\mathcal{I} = X' \operatorname{diag}(n_i V_{i,\hat\alpha}) X$ equals $\operatorname{diag}(n_i \tilde h_i (1 - \tilde h_i))$, so (3.8) gives

$$\mathrm{SE}\{\log \tilde G_i\} = \Bigl[ \sum_{j<i} \frac{\tilde h_j^2}{n_j \tilde h_j (1 - \tilde h_j)} \Bigr]^{1/2} = \Bigl[ \sum_{j<i} \frac{s_j}{n_j (n_j - s_j)} \Bigr]^{1/2}, \tag{3.9}$$

which is Greenwood's formula (see Miller 1981, p. 45).

This calculation (as well as common sense) leads us to expect that the variability of a survival curve $\hat G_i$, obtained from a $p$-parameter conditional logistic regression, approaches the variability of the life-table estimate $\tilde G_i$ as $p \to N$. What is surprising in Table 5 is how quickly the approach takes place. Even the cubic model, with only $p = 4$ parameters, has barely 10%-15% smaller standard errors than $\tilde G_i$. On the other hand, parametric models provide much greater improvements when estimating the hazard rate, as the theorem (3.7) shows.

Remark A. The independence assumption (2.2) cannot be literally true. For example, if there is no censoring, $s_i = n_i - n_{i+1}$. In this case, the sequence $s_1, s_2, \ldots$ is completely determined by the sequence $n_1, n_2, \ldots$, in contradiction to (2.2).

Nevertheless, calculations based on (2.2) give reasonable answers under reasonable assumptions. Using the notation of (2.1), let $v_i' = (s_1, s_1', s_2, s_2', \ldots, s_{i-1}, s_{i-1}', s_i)$ and $v_i = (s_1, s_1', s_2, s_2', \ldots, s_{i-1}, s_{i-1}')$. Starting with $n = n_1$ patients at risk at the beginning of observation (which we take to be a constant, fixed at its observed value), $v_i$ is the history of deaths and losses for the first $i - 1$ time intervals; $v_i'$ is the same history extended to include $s_i$. Here we follow the usual convention that the $s_i'$ losses in any one time interval occur after the $s_i$ deaths. Note that $n_2 = n_1 - s_1 - s_1'$, $n_3 = n_2 - s_2 - s_2'$, and so forth, so there is no need to indicate $n_2, n_3, \ldots, n_i$ in $v_i$ or $v_i'$.

We assume that $s_i$, given $v_i$, has a $\mathrm{Bi}(n_i, h_{i,\alpha})$ distribution, where $h_{i,\alpha} = [1 + \exp(-x_i \alpha)]^{-1}$, as in (2.8), and that $s_i'$, given $v_i'$, has a distribution depending on a nuisance parameter vector $\xi$, but not on $\alpha$:

$$\begin{aligned}
f_{\alpha,\xi}(s_1, s_1', s_2, s_2', \ldots, s_N, s_N')
&= \Bigl[ \tbinom{n_1}{s_1} h_{1,\alpha}^{s_1} (1 - h_{1,\alpha})^{n_1 - s_1} \Bigr] f_\xi(s_1' \mid v_1') \\
&\quad \times \Bigl[ \tbinom{n_2}{s_2} h_{2,\alpha}^{s_2} (1 - h_{2,\alpha})^{n_2 - s_2} \Bigr] f_\xi(s_2' \mid v_2') \times \cdots \\
&\quad \times \Bigl[ \tbinom{n_N}{s_N} h_{N,\alpha}^{s_N} (1 - h_{N,\alpha})^{n_N - s_N} \Bigr] f_\xi(s_N' \mid v_N').
\end{aligned} \tag{3.10}$$

See Prentice and Kalbfleisch (1980, sec. 5.2) for a nice discussion of noninformative censoring, which (3.10) represents.

The log-likelihood $l_{\alpha,\xi} = \log f_{\alpha,\xi}(s_1, s_1', \ldots, s_N, s_N')$ for (3.10) can be written as

$$l_{\alpha,\xi} = l_\alpha + l_\xi, \tag{3.11}$$

where

$$l_\alpha = \log \prod_{i=1}^N \binom{n_i}{s_i} h_{i,\alpha}^{s_i} (1 - h_{i,\alpha})^{n_i - s_i}$$

and $l_\xi$ does not depend on $\alpha$. Since $l_\alpha$ is the log-likelihood for the independent binomial situation $s_i \mid n_i \sim \mathrm{Bi}(n_i, h_{i,\alpha})$ independently, we see that (a) the score vector for $\alpha$ based on (3.10) is $\dot l_\alpha = X'(s - n h_\alpha)$, as given in (3.1); (b) the MLE $\hat\alpha$ is the same as that based on the independent binomial assumptions; (c) the second derivative matrix $-\ddot l_{\alpha,\xi}$ is of block-diagonal form,

$$-\ddot l_{\alpha,\xi} = \begin{pmatrix} -\ddot l_\alpha & 0 \\ 0 & -\ddot l_\xi \end{pmatrix}, \tag{3.12}$$

where $-\ddot l_\alpha$ is as given in (3.2); (d) the estimated covariance matrix for $\hat\alpha$, obtained by taking the upper left $p \times p$ block of $(-\ddot l_{\hat\alpha,\hat\xi})^{-1}$, is $(-\ddot l_{\hat\alpha})^{-1} = \mathcal{I}^{-1}$, where as before $\mathcal{I}$ is the observed Fisher information matrix $X' \operatorname{diag}(n_i V_{i,\hat\alpha}) X$, based on the independent binomial assumptions; (e) the expected Fisher information matrix $\mathcal{I}_{\alpha,\xi} = E\{-\ddot l_{\alpha,\xi}\}$ is also of block-diagonal form, with the upper left $p \times p$ block equal to $E_{\alpha,\xi}\{-\ddot l_\alpha\}$; and (f) the approximate covariance matrix for $\hat\alpha$, obtained by taking the upper left $p \times p$ block of $\mathcal{I}_{\alpha,\xi}^{-1}$, is $(E_{\alpha,\xi}\{-\ddot l_\alpha\})^{-1}$, which by Jensen's inequality satisfies

$$(E_{\alpha,\xi}\{-\ddot l_\alpha\})^{-1} \le E_{\alpha,\xi}\{(-\ddot l_\alpha)^{-1}\}. \tag{3.13}$$

In summary, the independent binomial model (2.2)-(2.8) can be replaced by the more believable model (3.10), without changing the MLE $\hat\alpha$. The estimated covariance matrix of $\hat\alpha$, based on the observed Fisher information, also does not change. This is not true for the expected Fisher information, but (3.13) suggests that the covariance approximation based on the independent binomial model will be greater than that based on (3.10).

Remark B. The smoothed life-table hazard estimate (LTSM) in Table 3 was obtained by applying an adaptive local regression smoother ("super-smoother"; see Friedman and Stuetzle 1981) to the life-table estimates $\tilde h_i = s_i/n_i$. "Adaptive" here means that the width of the local smoothing window was chosen by the smoother itself, on the basis of a generalized cross-validation criterion. Hastie and Tibshirani (1986) discussed local smoothing estimators extensively.

Table 3 shows only moderate agreement between LTSM and the cubic-linear spline hazard estimate for arm A. The single death at 47 months in Table 1 greatly influenced LTSM. Removing this death brings LTSM down to 0 at 47 months. The agreement between LTSM and the cubic-linear spline was excellent for arm B.

Using a model such as a cubic-linear spline can be seen as a strict form of smoothing. Past 15 months, the spline model has a considerably smaller standard error than LTSM (see Table 3), but may be biased if the model is grossly wrong. Remark H of Section 5 briefly discusses the literature of smooth hazard estimators.

Remark C. The referees for this article suggested using the complementary log-log link $\phi_{i,\alpha} = \log(-\log(1 - h_{i,\alpha}))$ instead of the logit $\lambda_{i,\alpha}$ as our regression parameter. For small values of $h_{i,\alpha}$ such as those suggested by Tables 1 and 2, this makes little difference, since $\lambda_{i,\alpha}$ and $\phi_{i,\alpha}$ are nearly equal. As a check, the cubic-linear spline model was refit, using $\phi_{i,\alpha} = x_i \alpha$ instead of (2.8). The refitted hazard rate $h_{i,\hat\alpha}$ for arms A and B agreed with the cubic-linear column in Table 3 to within .3% for every value of $t$.

Remark D. Let $\hat h = (h_{1,\hat\alpha}, h_{2,\hat\alpha}, \ldots, h_{N,\hat\alpha})'$ be the vector of estimated hazards based on a partial logistic regression (2.2) and (2.8), and let $\tilde h = (\tilde h_1, \tilde h_2, \ldots, \tilde h_N)'$ be the vector of life-table hazards $\tilde h_i = s_i/n_i$. Also, let $\mathcal{I}_\alpha = X' \operatorname{diag}(n_i V_{i,\alpha}) X$, as defined after (3.2). The joint covariance matrix of $\hat h$ and $\tilde h$ is approximately

$$\operatorname{cov}\begin{pmatrix} \hat h \\ \tilde h \end{pmatrix} \doteq \begin{pmatrix} M_1 & M_1 \\ M_1 & M_2 \end{pmatrix}, \tag{3.14}$$

where $M_1$ and $M_2$ are $N \times N$ matrices: $M_1 = \operatorname{diag}(V_{i,\alpha}) X \mathcal{I}_\alpha^{-1} X' \operatorname{diag}(V_{i,\alpha})$ and $M_2 = \operatorname{diag}(V_{i,\alpha}/n_i)$. [This follows from the Taylor series expansion $\hat h - h \doteq \operatorname{diag}(V_{i,\alpha}) X \mathcal{I}_\alpha^{-1} X'(s - n h_\alpha)$ and $\tilde h - h = \operatorname{diag}(1/n_i)(s - n h_\alpha)$.]

The form of (3.14) shows that, asymptotically, $\tilde h = \hat h$ + independent extra error; the covariance of the extra error is $M_2 - M_1$, which is a positive semidefinite matrix. Note that (3.14) gives (3.6). Then,

$$\frac{1}{N} \sum_{i=1}^N \frac{\operatorname{var}\{\hat h_i\}}{\operatorname{var}\{\tilde h_i\}} = \frac{1}{N} \sum_{i=1}^N z_{i,\alpha} \Bigl[ \sum_{j=1}^N z_{j,\alpha}' z_{j,\alpha} \Bigr]^{-1} z_{i,\alpha}' = \frac{1}{N} \operatorname{tr}\{I_p\} = \frac{p}{N}, \tag{3.15}$$

which is the theorem (3.7). The asymptotic independence of $\tilde h - \hat h$ and $\hat h$ is a familiar result from general theory relating efficient estimators and unbiased estimates of 0 (see Rao 1973, sec. 5a); some version of (3.15) would hold for parameterizations other than the logistic regression (2.8).

Remark E. A more general form of the linear logistic model (2.8) is

$$\lambda_i = a_i + x_i \alpha, \qquad i = 1, 2, \ldots, N, \tag{3.16}$$

where $a_1, a_2, \ldots, a_N$ are any fixed constants. Results (3.1)-(3.3) remain valid as stated, as long as we remember that $\lambda_i$ is given by (3.16) rather than (2.8). McCullagh and Nelder (1983, p. 138) called $a_i$ an offset.

The offset form of $\lambda_{i,\alpha}$ was used in fitting conditional

logistic regression models to the arm B data of Table 2:

$$a_i = \begin{cases} \log(1/2), & i = 1, \ldots, 18 \\ \log(1), & i = 19, \ldots, 36 \\ \log(2), & i = 37, \ldots, 61. \end{cases} \qquad (3.17)$$

These $a_i$ compensate for the differing lengths of the time intervals in Table 2. [See Eq. (5.3). The fitted hazards $\hat h_i$ for arm B were adjusted to one-month intervals, for example, by multiplying $\hat h_i$ by 2 for $i = 1, \ldots, 18$, in order to make the plotted hazard rates in Fig. 2 comparable.]

Table 7. Estimated Differences Between Arms A and B

          Logit difference                               Hazard difference
Month     $\hat\lambda_A - \hat\lambda_B$   se     Penalty     $\hat h_A - \hat h_B$   se     Penalty
  1       1.31                              1.41   1.03        .011                    .012   1.00
  3        .43                               .45   1.00        .022                    .030   1.00
  5        .21                               .37   1.00        .017                    .044   1.00
  7        .39                               .36   1.02        .032                    .033   1.00
  9        .68                               .45   1.04        .035                    .024   1.00
 11        .80                               .54   1.00        .028                    .020   1.01
 15        .83                               .48   1.00        .026                    .016   1.02
 20        .85                               .42   1.01        .024                    .013   1.01
 25        .87                               .44   1.01        .022                    .013   1.00
 35        .90                               .63   1.01        .019                    .016   1.01
 45        .95                               .92   1.01        .016                    .020   1.02

NOTE: Left panel: the logit scale. Right panel: the hazard scale. The penalty ratio is now very small, so estimating the join point from the data adds little to the standard error.

4. THE CUBIC-LINEAR SPLINE MODEL
This section discusses the maximum likelihood estimation of the join point in the cubic-linear spline model (2.9). We are particularly interested in assessing the increased standard error of quantities of interest, such as the hazard-rate differences between arms A and B of the cancer study, due to the estimation of the join. The discussion here is very brief. Efron (1986, sec. 4) gave more details.

Tables 6 and 7 numerically summarize the rather technical results of Efron (1986). In Table 6 we see the MLE's of the hazard rates for arms A and B, based on the cubic-linear spline model with join at 11 months. The estimated standard errors are calculated from a formula (4.3) that adds a term to (3.3) to account for the data-based selection of the join. The penalty ratio {standard error from (4.3)}/{standard error from (3.3)} can be quite substantial, the greatest penalty in Table 6 being 1.48.

Table 7 compares the hazard rates for arms A and B. The two comparison statistics are the difference of the hazard rates, say $\hat h_{A,i} - \hat h_{B,i}$, and the difference on the logit scale, $\hat\lambda_{A,i} - \hat\lambda_{B,i} = \log(\hat h_{A,i}/\hat h_{B,i}) - \log((1 - \hat h_{A,i})/(1 - \hat h_{B,i}))$. In this case, we see that there is almost no penalty from estimating the join.

Tables 6 and 7 have a simple interpretation: Choosing the join point on the basis of the data can substantially increase the standard error of the estimated hazard rates but it has little effect on statistics comparing the two hazard rates. In Figure 2, for instance, we can have greater faith in the pictured differences of the two hazards than their individual standard errors would suggest. (It is important to note that this statement depends on choosing the same join for both estimated hazard rates.)

Table 6. Estimated Hazard Rates, Their Standard Errors, and the Penalty Ratio at Selected Time Points

          Arm A estimates              Arm B estimates
Month     $\hat h$   se     Penalty    $\hat h$   se     Penalty
  1       .015       .012   1.11       .004       .005   1.17
  3       .087       .023   1.06       .060       .021   1.02
  5       .146       .032   1.02       .129       .035   1.03
  7       .123       .032   1.29       .091       .030   1.33
  9       .076       .028   1.38       .040       .019   1.48
 11       .051       .018   1.04       .024       .009   1.00
 15       .047       .018   1.30       .021       .009   1.22
 20       .043       .013   1.17       .019       .007   1.18
 25       .039       .012   1.03       .017       .006   1.11
 35       .032       .016   1.02       .013       .005   1.10
 45       .027       .021   1.06       .010       .005   1.01

NOTE: The penalty ratio equals {se including join choice}/{se join prechosen}. The penalty ratio can be quite large.

The results in Table 6 are based on a generalization of model (2.8):

$$\lambda_{i,\alpha,\phi} = x_i(\phi)\alpha, \qquad i = 1, 2, \ldots, N, \qquad (4.1)$$

where $x_i(\phi)$ is a $1 \times p$ covariate vector depending on an unknown real-valued parameter $\phi$. We require that the vector derivative

$$\dot x_i(\phi) = \left(\ldots, \frac{\partial x_{ij}(\phi)}{\partial\phi}, \ldots\right) \qquad (4.2)$$

be defined continuously as a function of $\phi$.

The case of particular interest is where $\phi$ is the join point of a cubic-linear spline, $x_i(\phi) = (1, t_i, (t_i - \phi)_-^2, (t_i - \phi)_-^3)$, as in (2.9). Then $\dot x_i(\phi) = (0, 0, -2(t_i - \phi)_-, -3(t_i - \phi)_-^2)$, a continuous function of $\phi$ for any value $t_i$.

Suppose that $\alpha$ and $\phi$ in (4.1) are estimated by maximum likelihood. Then the estimated hazard rate $h_{i,\hat\alpha,\hat\phi} = [1 + \exp(-\lambda_{i,\hat\alpha,\hat\phi})]^{-1}$ has greater standard error than indicated in (3.3) because of the estimation of $\phi$ in addition to $\alpha$, and likewise for the other estimated parameters. The appropriate standard-error formulas require the following definitions: $X$ and $\dot X$ are the $N \times p$ matrices with $i$th rows $x_i(\hat\phi)$ and $\dot x_i(\hat\phi)$, respectively; $D = \mathrm{diag}(n_i V_{i,\hat\alpha,\hat\phi})$, where $V_{i,\hat\alpha,\hat\phi} = h_{i,\hat\alpha,\hat\phi}(1 - h_{i,\hat\alpha,\hat\phi})$; $J = X'DX$; $v = J^{-1}(X'D\dot X\hat\alpha)$; $u = \dot X\hat\alpha - Xv$; and $Q = \sum_{j=1}^{N} u_j^2 n_j V_{j,\hat\alpha,\hat\phi}$.

The asymptotic formulas for the standard errors are

$$SE(\hat\lambda_i) \approx \left[x_i J^{-1} x_i' + u_i^2/Q\right]^{1/2}$$

$$SE(\hat h_i) \approx V_{i,\hat\alpha,\hat\phi}\left[x_i J^{-1} x_i' + u_i^2/Q\right]^{1/2} \qquad (4.3)$$

i \2X / UQ11/2 SE A + 1I/2 SEGi)aj G/Q1_ hix
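One way to see where the extra term $u_i^2/Q$ in (4.3) comes from is sketched below. This is a standard partitioned-inverse argument under the usual large-sample assumptions, offered as an illustration rather than as the derivation given in Efron (1986). The symbols $w$ and $\mathcal{I}$ are introduced here only for the sketch: $w = \dot X\hat\alpha$ is the derivative of the fitted logits with respect to $\phi$, and $\mathcal{I}$ is the information matrix for the $p + 1$ parameters $(\alpha, \phi)$.

```latex
% Derivatives of the logit \lambda_i = x_i(\phi)\alpha with respect to (\alpha, \phi):
\frac{\partial \lambda_i}{\partial \alpha} = x_i, \qquad
\frac{\partial \lambda_i}{\partial \phi} = \dot x_i \alpha \equiv w_i .

% Information matrix for (\alpha, \phi), with J = X'DX:
\mathcal{I} =
\begin{pmatrix}
  X'DX & X'Dw \\
  w'DX & w'Dw
\end{pmatrix} .

% Partitioned inverse, with v = J^{-1} X'Dw and u = w - Xv:
\operatorname{var}(\hat\lambda_i)
  \approx (x_i, w_i)\, \mathcal{I}^{-1} (x_i, w_i)'
  = x_i J^{-1} x_i' + \frac{(w_i - x_i v)^2}{w'Dw - w'DX J^{-1} X'Dw}
  = x_i J^{-1} x_i' + \frac{u_i^2}{Q},

% since Q = u'Du = w'Dw - w'DX J^{-1} X'Dw .
```

In words, $u$ is the part of $w$ that is not explained by a $D$-weighted regression on the columns of $X$, and $Q$ is its weighted sum of squares. If $\phi$ is known, the last row and column of $\mathcal{I}$ drop out and only $x_i J^{-1} x_i'$ remains.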
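The first two formulas in (4.3) can be evaluated with a few lines of linear algebra. The sketch below is illustrative only: the function names `spline_rows` and `join_penalty_se` are hypothetical, and the spline basis assumes $(t - \phi)_- = \min(t - \phi, 0)$, so the fitted curve is cubic before the join and linear after it. The inputs `n`, `h`, and `alpha` stand for the numbers at risk, the fitted hazards, and the fitted coefficient vector.

```python
import numpy as np

def spline_rows(t, phi):
    """Rows x_i(phi) of a cubic-linear spline and their derivatives in phi.

    Assumes (t - phi)_- = min(t - phi, 0).
    """
    t = np.asarray(t, dtype=float)
    m = np.minimum(t - phi, 0.0)
    X = np.column_stack([np.ones_like(t), t, m**2, m**3])
    Xdot = np.column_stack([np.zeros_like(t), np.zeros_like(t),
                            -2.0 * m, -3.0 * m**2])
    return X, Xdot

def join_penalty_se(X, Xdot, n, h, alpha):
    """Standard errors with the join fixed and with the join estimated.

    Returns (se_fixed, se_join, se_hazard): the logit-scale standard error
    without the u_i^2 / Q term, the logit-scale standard error with it, and
    the hazard-scale standard error V_i * se_join.
    """
    n = np.asarray(n, dtype=float)
    h = np.asarray(h, dtype=float)
    V = h * (1.0 - h)
    d = n * V                               # diagonal of D
    J = X.T @ (d[:, None] * X)              # J = X' D X
    w = Xdot @ alpha                        # derivative of logits in phi
    v = np.linalg.solve(J, X.T @ (d * w))   # v = J^{-1} X' D Xdot alpha
    u = w - X @ v
    Q = np.sum(u**2 * d)
    base = np.einsum("ij,ji->i", X, np.linalg.solve(J, X.T))  # x_i J^{-1} x_i'
    se_fixed = np.sqrt(base)
    se_join = np.sqrt(base + u**2 / Q)
    return se_fixed, se_join, V * se_join
```

The ratio `se_join / se_fixed` corresponds to the penalty ratio reported in Table 6, provided the inputs are the fitted quantities from the actual data, which are not reproduced here.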

Comparing (4.3) with (3.3), we see that the terms in (4.3) involving $u_i$ represent the additional standard error from estimating $\phi$ by its MLE $\hat\phi$. These terms disappear if $\phi$ is assumed known, in which case the estimated standard errors from (4.3) are the same as those from (3.3).

Efron (1986, sec. 4) derived formulas (4.3), in addition to more complicated formulas for the standard errors of quantities such as $\hat h_{A,i} - \hat h_{B,i}$, which compare two estimated hazard rates. Table 7 shows that these complicated formulas are almost superfluous here. We get almost the same estimated standard errors simply by using (3.3) on the two independent arms of the study: $SE(\hat h_{A,i} - \hat h_{B,i}) = [SE(\hat h_{A,i})^2 + SE(\hat h_{B,i})^2]^{1/2}$. In other words, in this case we can ignore the effect of estimating $\phi$.

Remark F. The MLE's $(\hat\alpha_A, \hat\alpha_B, \hat\phi)$ for the cancer study were found by combining standard logistic regression for $\alpha_A$ and $\alpha_B$ with a direct search over the possible values of $\phi$. Table 8 shows part of this search. For each trial value of $\phi$, the MLE's $\hat\alpha_A$ and $\hat\alpha_B$ were found with a logistic regression program, producing estimates $\hat h_{A,i}$ and $\hat h_{B,i}$. The deviances devA and devB,

$$\mathrm{dev} = 2\sum_{i=1}^{N}\left[s_i \log\frac{s_i/n_i}{\hat h_i} + (n_i - s_i)\log\frac{1 - s_i/n_i}{1 - \hat h_i}\right],$$

get smaller as the likelihood gets larger. The value of $\phi$ that minimizes devA + devB is the MLE. Table 8 shows that $\hat\phi = 11$ for the cancer study.

Table 8. Deviances as a Function of the Join Point $\phi$ for Arms A and B

Deviance     9.5       10        11        12        13
devA       50.928    48.016    47.519    47.370*   47.669
devB       34.011*   34.083    34.557    35.205    35.889
Total      84.939    82.099    82.076*   82.575    83.558

NOTE: The total deviance is minimized for $\phi = 11$.
* Minima.

5. THE CONTINUOUS CASE

Our methods so far have depended on discretization of the data, as in Tables 1 and 2. This introduces an arbitrary feature into the analysis, although the exact form of the discretization is usually unimportant (see Remark I). This section discusses the class of continuous models obtained as limits of the conditional logistic regressions (2.2) and (2.8), as the discrete time intervals ("months" in Table 1) become small. The discussion by Clayton (1983) is closely related to the point of view taken here.

Suppose that $g_\alpha(t)$ is a probability density function on the positive axis, with survival function $G_\alpha(t) = \int_t^\infty g_\alpha(s)\,ds$ and hazard function $h_\alpha(t) = g_\alpha(t)/G_\alpha(t)$ for $t > 0$. It is convenient to assume that $g_\alpha(t)$ has a continuous second derivative in $t$, though only a continuous first derivative is actually necessary to make the connection with the earlier sections.

It is easy to see the results of discretizing this continuous situation. Suppose that the $i$th discrete time interval has center point $t_i$ and length $\Delta_i$. Then, the discrete density $g_{i,\alpha} = \int_{t_i - \Delta_i/2}^{t_i + \Delta_i/2} g_\alpha(t)\,dt$ is obtained by a standard Taylor series argument:

$$g_{i,\alpha} = g_\alpha(t_i)\Delta_i + O(\Delta_i^3). \qquad (5.1)$$

Similarly, the discrete survival function $G_{i,\alpha} = \sum_{j \ge i} g_{j,\alpha}$ and discrete hazard rate $h_{i,\alpha} = g_{i,\alpha}/G_{i,\alpha}$ are given by

$$G_{i,\alpha} = G_\alpha(t_i) + [g_\alpha(t_i)/2]\Delta_i + O(\Delta_i^2)$$

$$h_{i,\alpha} = h_\alpha(t_i)\Delta_i - \tfrac{1}{2}h_\alpha(t_i)^2\Delta_i^2 + O(\Delta_i^3). \qquad (5.2)$$

The logistic parameter $\lambda_{i,\alpha} = \log[h_{i,\alpha}/(1 - h_{i,\alpha})]$ equals

$$\lambda_{i,\alpha} = \log(\Delta_i) + \log(h_\alpha(t_i)) + \tfrac{1}{2}h_\alpha(t_i)\Delta_i + O(\Delta_i^2). \qquad (5.3)$$

Note that the parameter $\psi_{i,\alpha} = \log(-\log(1 - h_{i,\alpha}))$ discretizes more nicely, $\psi_{i,\alpha} = \log(\Delta_i) + \log(h_\alpha(t_i)) + O(\Delta_i^2)$, but as Remark C shows, this is unimportant in the present context.

In this section we consider the parametric class of continuous hazard functions

$$h_\alpha(t) = \exp[x(t)\alpha], \qquad (5.4)$$

where $\alpha$ is a $p \times 1$ vector of unknown parameters and $x(t)$ is an observed $p$-dimensional time-dependent covariate vector. For convenience, we assume that $x(t)$ has first coordinate 1 for all $t$, say $x(t) = (1, x_{(1)}(t))$, and that $x_{(1)}(t)$ has continuous second derivatives with respect to $t$. If all of the discrete time intervals have the same length $\Delta_i = \Delta$, then (5.3) gives

$$\lambda_{i,\alpha} = \log(\Delta) + x(t_i)\alpha + O(\Delta). \qquad (5.5)$$

This approaches model (2.8) as $\Delta \to 0$, with the quantity $\log(\Delta)$ in (5.5) absorbed into the $\alpha_1$ term of $x(t_i)\alpha = \alpha_1 + x_{(1)}(t_i)\alpha_{(1)}$.

The simplest case of (5.4) is where $p = 1$ and $x(t) = 1$, so $h_\alpha(t) = \exp(\alpha_1)$ for all $t > 0$. Let $\theta \equiv \exp(\alpha_1)$. The cumulative hazard $H_\alpha(t) \equiv \int_0^t h_\alpha(s)\,ds$ equals $\theta t$, giving $G_\alpha(t) = \exp[-H_\alpha(t)] = \exp(-\theta t)$ and $g_\alpha(t) = \theta\exp(-\theta t)$ for $t > 0$. In other words, the lifetime distribution $T$ corresponding to hazard function $\exp(\alpha_1) = \theta$ is a one-sided exponential distribution scaled to have expectation $\theta^{-1}$, say $T \sim Z/\theta$, where $Z$ has density $e^{-z}$ for $z > 0$.

The next-simplest case is where the log hazard rate is linear in $t$, $x(t) = (1, t)$, so that

$$h_\alpha(t) = \exp(\alpha_1 + \alpha_2 t), \qquad t > 0. \qquad (5.6)$$

Defining $\theta \equiv [\exp(\alpha_1)]/\alpha_2$, the cumulative hazard and survival functions are

$$H_\alpha(t) = \theta[\exp(\alpha_2 t) - 1]$$

$$G_\alpha(t) = \exp -\{\theta[\exp(\alpha_2 t) - 1]\}. \qquad (5.7)$$

We recognize $G_\alpha(t)$ as the survival function of a Gompertz distribution (see Johnson and Kotz 1970, p. 271). In other words, (5.6) corresponds to a Gompertz lifetime distribution,

$$T \sim (1/\alpha_2)\log[1 + Z/\theta], \qquad (5.8)$$

where $Z$ is a standard one-sided exponential. This last result assumes that $\alpha_2 > 0$, so $G_\alpha(t)$ approaches 0 as $t \to \infty$.

What if $\alpha_2 < 0$ in (5.6)? This represents the interesting case of an incomplete Gompertz distribution, where there is positive probability that the lifetime $T$ is infinite:

$$P_\alpha\{T = \infty\} = e^{\theta}, \qquad \theta \equiv [\exp(\alpha_1)]/\alpha_2 < 0. \qquad (5.9)$$

With this understanding, (5.7) remains true as stated.

All of the discrete models considered previously have the potential for estimating $P_\alpha\{T = \infty\}$ to be positive. (This happened in both arms of the cancer study; see Remark G.) This is an advantage of our approach. In practice, it is often desirable to include the possibility of a positive survival fraction, but this can be clumsy to do with the usual parametric models for lifetime distributions. Miller (1981, sec. 2.4) gave a brief discussion.

More complicated examples of model (5.4), such as $\log h_\alpha(t) = \alpha_1 + \alpha_2 t + \alpha_3 t^2$, do not yield simple expressions for the cdf or density. This is unimportant, since the estimation of parameters depends only on the log hazard rate, which is particularly easy to use for model (5.4).

The parameter vector $\alpha$ in model (5.4) is estimated as follows: Let $n(t)$ be the number of patients at risk just before time $t$. We assume that the occurrence of observed deaths is a Poisson point process, with intensity $n(t)h_\alpha(t) = n(t)e^{x(t)\alpha}$ at time $t$. This is the limiting process obtained from (2.2) by letting the discrete time intervals decrease to zero length (see Efron 1977). Suppose that out of all $n$ patients we observed $m$ deaths, at times, say, $T_1, T_2, \ldots, T_m$, with the other $n - m$ patients being lost to follow-up at various times during the study. Define $S \equiv \sum_{i=1}^{m} x(T_i)'$. The score vector $\dot l_\alpha$ for the Poisson process is

$$\dot l_\alpha = S - \int_0^\infty n(t)x(t)'e^{x(t)\alpha}\,dt, \qquad (5.10)$$

so the MLE $\hat\alpha$ is given by

$$S = \int_0^\infty n(t)x(t)'e^{x(t)\hat\alpha}\,dt. \qquad (5.11)$$

The observed Fisher information matrix for $\alpha$ is

$$-\ddot l_\alpha = \int_0^\infty n(t)x(t)'x(t)e^{x(t)\alpha}\,dt. \qquad (5.12)$$

It is easy to see that (5.10) and (5.12) are simply the continuous analogs of (3.1) and (3.2). Conversely, one can look at (3.1) and (3.2) as convenient summation approximations to the integrals in (5.10)-(5.12). The connection between the discrete and continuous cases was drawn more carefully in Efron (1977), including a derivation of (5.10)-(5.12).

Remark G. A continuous cubic-linear spline model $\log h_\alpha(t) = \alpha_1 + \alpha_2 t + \alpha_3(t - \phi)_-^2 + \alpha_4(t - \phi)_-^3$ has

$$\log h_\alpha(t_1) = \log h_\alpha(t_0) + \alpha_2(t_1 - t_0) \qquad (5.13)$$

for values of $t_1$ and $t_0$ greater than the join point $\phi$. Calculations such as those for the Gompertz distribution (5.7) give

$$G_\alpha(t_1)/G_\alpha(t_0) = \exp -\left\{\frac{h_\alpha(t_0)}{-\alpha_2}\{1 - \exp[\alpha_2(t_1 - t_0)]\}\right\}, \qquad t_1 \ge t_0 > \phi. \qquad (5.14)$$

If $\alpha_2 < 0$, then $P_\alpha\{T = \infty\}$ is positive and can be found by letting $t_1 \to \infty$ in (5.14):

$$P_\alpha\{T = \infty\} = G_\alpha(t_0)\exp -[h_\alpha(t_0)/(-\alpha_2)]. \qquad (5.15)$$

For both arms of the cancer study, the MLE $\hat\alpha_2$ was negative. Formula (5.15), with $t_0 = 47$ months for arm A and $t_0 = 77$ months for arm B, gives the following estimates for the survival fractions:

$$P_{\hat\alpha_A}\{T = \infty\} = .025, \qquad P_{\hat\alpha_B}\{T = \infty\} = .189. \qquad (5.16)$$

Of course, estimates such as (5.16) should be interpreted with caution, since they represent heroic extrapolations beyond the observed data.

Remark H. This article concentrates on the one-sample situation, where all patients have the same survival curve $G_\alpha(t)$. Model (5.4) and its discrete analog extend easily to the regression situation, where patient $j$'s survival depends on a time-varying vector $z_j(t)$ of observed covariates, say

$$h_j(t) = \exp[x(t)\alpha + z_j(t)\beta]. \qquad (5.17)$$

Model (5.17), and in particular its connection with Cox's partial likelihood, or proportional hazards, model was examined in Efron (1977). It is shown there that the fully parametric model (5.17) will usually not improve much on the partial likelihood model $h_j(t) = h_0(t)\exp[z_j(t)\beta]$, with $h_0(t)$ completely unspecified, at least not for the estimation of $\beta$. On the other hand, (5.17) can be effective in actually estimating the hazards $h_j(t)$, rather than just comparing them as the partial likelihood model does.

Remark I. Suppose that in the continuous Poisson-process situation (5.4), we discretize to situation (2.2). How much information is lost? For convenience assume that the continuous lifetime variate $T$ takes its values in the unit interval [0, 1], and that the discretization of the data is into $N$ equal subintervals, as in Table 1. Let $\mathcal{I}_\alpha(N)$ be the Fisher information matrix for $\alpha$ based on the discrete data (2.2) (taking the independence assumption literally), and let $\mathcal{I}_\alpha(\infty)$ be the Fisher information matrix based on the original continuous data. Then, as $N \to \infty$,

$$\mathcal{I}_\alpha(N) \approx \mathcal{I}_\alpha(\infty) - c/N^2, \qquad (5.18)$$

where $c = (1/12)\int_0^1 x(t)'x(t)n(t)h_\alpha(t)\,dt$. Here the function $n(t)$ is considered fixed at its observed value, even though it is random [like the $n_i$ in (2.2)].

Result (5.18) says that the information loss due to discretization goes to 0 very quickly as $N$ grows large. Various alternative discretizations were tried on the cancer-study data, such as discretizing arm A into the same intervals used for arm B in Table 2, with almost-imperceptible changes in the results.
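The discretization expansions (5.2) and (5.3) can be checked numerically in the simplest case of a constant hazard $\theta$, where the discrete hazard over an interval of length $\Delta$ is exactly $1 - e^{-\theta\Delta}$. The sketch below is only an illustration; the function names are hypothetical, and the values of `theta` and `delta` used with it are arbitrary rather than taken from the cancer study.

```python
import math

def discrete_hazard(theta, delta):
    """Exact discrete hazard for a constant continuous hazard theta."""
    return -math.expm1(-theta * delta)      # 1 - exp(-theta * delta)

def logit_exact(theta, delta):
    """Exact logistic parameter log[h / (1 - h)] of the discrete hazard."""
    h = discrete_hazard(theta, delta)
    return math.log(h / (1.0 - h))

def logit_approx(theta, delta):
    """Leading terms of (5.3): log(delta) + log(theta) + theta * delta / 2."""
    return math.log(delta) + math.log(theta) + 0.5 * theta * delta

def cloglog_exact(theta, delta):
    """Complementary log-log parameter log(-log(1 - h))."""
    h = discrete_hazard(theta, delta)
    return math.log(-math.log1p(-h))

if __name__ == "__main__":
    theta = 0.1
    for delta in (1.0, 0.5, 0.25):
        err = logit_exact(theta, delta) - logit_approx(theta, delta)
        print(delta, err)   # the error shrinks roughly like delta**2
```

For a constant hazard the complementary log-log parameter equals $\log(\Delta) + \log(\theta)$ with no remainder, which is one concrete sense in which it "discretizes more nicely" than the logit.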
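The survival-fraction calculation of Remark G can be written out directly. The sketch below assumes a log-linear hazard beyond $t_0$, as in (5.13), and uses the closed form obtained by integrating that hazard; the function names are hypothetical. The fitted values behind (5.16) are not given in the text, so any numbers passed to these functions are the reader's own.

```python
import math

def survival_ratio(h_t0, alpha2, t0, t1):
    """G(t1) / G(t0) when log h(t) = log h(t0) + alpha2 * (t - t0) for t >= t0."""
    if alpha2 == 0.0:
        # Constant hazard beyond t0: plain exponential decay.
        return math.exp(-h_t0 * (t1 - t0))
    factor = 1.0 - math.exp(alpha2 * (t1 - t0))
    return math.exp(-(h_t0 / (-alpha2)) * factor)

def cure_fraction(G_t0, h_t0, alpha2):
    """Limit of G(t1) as t1 -> infinity; positive only when alpha2 < 0."""
    if alpha2 >= 0.0:
        return 0.0
    return G_t0 * math.exp(-h_t0 / (-alpha2))
```

Setting $t_0 = 0$, so that `G_t0 = 1` and `h_t0 = exp(alpha1)`, `cure_fraction` reduces to $e^{\theta}$ with $\theta = \exp(\alpha_1)/\alpha_2$, in agreement with (5.9).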

Remark J. A data discretization different from (2.2) was used by Holford (1976) and Laird and Oliver (1981): Assume that the lifetime variate $T$ is "piecewise exponential," that is, that $T$ has a constant hazard rate within each discrete time interval. Let $\theta_i$ be the hazard rate for the $i$th interval, and let $r_i$ be the total observation time for all patients during the $i$th interval.

The likelihood function for this situation is the same as that for a model which assumes that the number of deaths $s_i$ in interval $i$ is conditionally Poisson,

$$s_i \mid r_i \sim \mathrm{Po}(\theta_i r_i) \quad \text{independently}, \qquad i = 1, 2, \ldots, N. \qquad (5.19)$$

Model (5.19) is similar to (2.2). Like (2.2), it involves $N$ parameters, $\theta_1, \theta_2, \ldots, \theta_N$, that can be estimated by generalized linear regression. It has the advantage of taking more careful account of the observed pattern of losses and deaths within each interval, since these affect $r_i$ but not $n_i$. This made little difference in the cancer study, but might be more important if the data discretization were more drastic.

Still another discretization was used by Thompson (1977), Prentice and Gloeckler (1978), and Pierce, Stewart, and Kopecky (1979). All of these methods, including (2.2), reduce the survival distribution to $N$ unknown parameters. Here we have considered still smaller models, such as the five-parameter cubic-linear spline, to better estimate the survival distribution. The other references concentrate on the situation where there is covariate information, such as in (5.17), and do not consider parametric survival models such as (2.8).

Remark K. Several authors have investigated semiparametric hazard estimates $\hat h(t)$, which are smooth functions of $t$ but do not assume a specific parametric form such as (5.4). Tanner and Wong (1983, 1984) adapted kernel estimators for use with censored data. Anderson and Senthilselvan (1980, 1982) fit the hazard by smoothing splines. O'Sullivan (1986) pursued the spline approach in a detailed study of the method of penalized likelihood. The local-regression smoother LTSM considered in Table 3 is in the spirit of these papers. Table 3 illustrates both the promise and the possible pitfalls of semiparametric smoothing techniques. The simple parametric modeling discussed in this article is a less ambitious tool that works to best advantage when the data are sparse, as in the later months of the cancer study.

Another semiparametric approach to estimating the hazard rate assumes that $h(t)$ is a monotone function of $t$ (see Mykytyn and Santner 1981; Padgett and Wei 1980; Tsai 1986). These papers essentially use isotonic regression to estimate the binomial parameters $h_i$ in (2.2). Miller (1981, p. 15) warned against assumptions of a monotone hazard rate in biostatistical situations, and in fact the hazard rates for the cancer study were nonmonotone. In cases where monotonicity is a safe assumption, however, the isotonic procedures are very attractive. As Tsai (1986) pointed out, the isotonic fitting algorithm continues to have a simple form even if the data are left-truncated as well as right-censored. The same statement applies to all of the methods proposed in this article.

The class of generalized linear models (McCullagh and Nelder 1984) includes logistic regression. The methods of this article allow the generalized-linear-model technology to be applied to survival analysis. McCullagh and Nelder (1984, chap. 9) presented a different way of attacking the same problem.

[Received February 1987. Revised September 1987.]

REFERENCES

Anderson, J. A., and Senthilselvan, A. (1980), "Smooth Estimates for the Hazard Function," Journal of the Royal Statistical Society, Ser. B, 42, 322-327.
Anderson, J. A., and Senthilselvan, A. (1982), "A Two-step Regression Model for Hazard Functions," Applied Statistics, 31, 44-51.
Clayton, D. (1983), "Fitting a General Family of Failure Time Distributions Using GLIM," Applied Statistics, 32, 102-109.
Cox, D. R. (1970), Analysis of Binary Data, London: Methuen.
Cox, D. R. (1975), "Partial Likelihood," Biometrika, 62, 269-278.
Cox, D. R., and Oakes, D. (1984), Analysis of Survival Data, London: Chapman & Hall.
Efron, B. (1977), "The Efficiency of Cox's Likelihood Function for Censored Data," Journal of the American Statistical Association, 72, 557-565.
Efron, B. (1986), "Logistic Regression, Survival Analysis, and the Kaplan-Meier Curve," Technical Report 266, Stanford University, Dept. of Statistics, and Technical Report 115, Stanford University, Div. of Biostatistics.
Friedman, J., and Stuetzle, W. (1981), "Projection Pursuit Regression," Journal of the American Statistical Association, 76, 817-823.
Hastie, T. J., and Tibshirani, R. J. (1986), "Generalized Additive Models," Statistical Science, 1, 297-318.
Holford, T. (1976), "Life Tables With Concomitant Information," Biometrics, 32, 587-597.
Johnson, N., and Elandt-Johnson, R. (1980), Survival Models and Data Analysis, New York: John Wiley.
Johnson, N., and Kotz, S. (1970), Continuous Univariate Distributions (Vol. 2), New York: John Wiley.
Laird, N., and Oliver, D. (1981), "Covariance Analysis of Censored Survival Data Using Log-Linear Analysis Techniques," Journal of the American Statistical Association, 76, 231-240.
McCullagh, P., and Nelder, J. (1984), Generalized Linear Models, London: Chapman & Hall.
Miller, R. G., Jr. (1981), Survival Analysis, New York: John Wiley.
Miller, R. G., Jr. (1983), "What Price Kaplan-Meier?" Biometrics, 39, 1077-1081.
Mykytyn, S. W., and Santner, T. J. (1981), "Maximum Likelihood Estimation of the Survival Function Based on Censored Data Under Hazard Rate Assumptions," Communications in Statistics, Part A-Theory and Methods, 10, 1369-1387.
O'Sullivan, F. (1986), "Estimation of Densities and Hazards by the Method of Penalized Likelihood," Technical Report 68, University of California, Berkeley, Dept. of Statistics.
Padgett, W. J., and Wei, L. J. (1980), "Maximum Likelihood Estimation of a Distribution Function With Increasing Failure Rate Based on Censored Observations," Biometrika, 67, 470-474.
Pierce, D., Stewart, W., and Kopecky, K. (1979), "Distribution-Free Regression Analysis of Grouped Survival Data," Biometrics, 35, 785-793.
Prentice, R., and Gloeckler, L. (1978), "Regression Analysis of Grouped Survival Data With Application to Breast Cancer Data," Biometrics, 34, 57-68.
Prentice, R., and Kalbfleisch, J. (1980), The Statistical Analysis of Failure Time Data, New York: John Wiley.
Rao, C. R. (1973), Linear Statistical Inference and Its Applications, New York: John Wiley.
Tanner, M. A., and Wong, W. H. (1983), "The Estimation of the Hazard Function From Randomly Censored Data by the Kernel Method," The Annals of Statistics, 11, 989-993.
Tanner, M. A., and Wong, W. H. (1984), "Data-Based Nonparametric Estimation of the Hazard Function With Applications to Model Diagnostics and Exploratory Analysis," Journal of the American Statistical Association, 79, 174-182.
Thompson, W. (1977), "On the Treatment of Grouped Observations in Life Studies," Biometrics, 35, 463-470.
Tsai, W.-Y. (1986), "Estimation of the Survival Function With Increasing Failure Rate Based on Left Truncated and Right Censored Data," unpublished manuscript, New York: Brookhaven National Laboratory, Applied Mathematics Dept.