
March 2005

Patrick J. Heagerty

Department of Biostatistics, University of Washington, P.O. Box 357232, Seattle,

Washington 98195-7232, U.S.A.

email: heagerty@u.washington.edu

and

Yingye Zheng

Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue North, MP 702, P.O. Box 19024,

Seattle, Washington 98109-1024, U.S.A.

Summary. The predictive accuracy of a survival model can be summarized using extensions of the proportion of variation explained by the model, or $R^2$, commonly used for continuous response models, or using extensions of sensitivity and specificity, which are commonly used for binary response models. In this article we propose new time-dependent accuracy summaries based on time-specific versions of sensitivity and specificity calculated over risk sets. We connect the accuracy summaries to a previously proposed global concordance measure, which is a variant of Kendall's tau. In addition, we show how standard Cox regression output can be used to obtain estimates of time-dependent sensitivity and specificity, and time-dependent receiver operating characteristic (ROC) curves. Semiparametric estimation methods appropriate for both proportional and nonproportional hazards data are introduced, evaluated in simulations, and illustrated using two familiar survival data sets.

Key words: Cox regression; Discrimination; Prediction; Sensitivity; Specificity.

In this article we propose a new method for characterizing the predictive accuracy of a regression model when the outcome of interest is a censored survival time. We focus on data obtained from a prospective study in which a continuous follow-up time is observed for each participant, but where follow-up can be terminated either by the occurrence of the event of interest or by censoring. Thus the essential outcome information is the combination of the status at the end of follow-up (binary) and the length of follow-up (continuous). Because censored data share features of both continuous response data and binary data, the accuracy concepts that are standard for either response type may be extended for survival outcomes.

Previous research has focused on extending the proportion of variation explained by the covariates, or $R^2$, to censored data models (Schemper and Henderson, 2000; O'Quigley and Xu, 2001). In addition, limited work has explored the use of familiar binary outcome methods such as receiver operating characteristic (ROC) curves for application in the longitudinal setting (Etzioni et al., 1999; Heagerty, Lumley, and Pepe, 2000; Slate and Turnbull, 2000). Time-dependent ROC curves offer an alternative to the use of $R^2$ extensions for survival data. However, the goal of an ROC analysis is to characterize the prognostic potential of a marker (or model) by focusing on the correct classification rates. Methods that summarize the proportion of variation explained by covariates [...] the ultimate objective. The goals of this article are to introduce new time-dependent sensitivity, specificity, and ROC concepts appropriate for survival regression models; to demonstrate the connection between time-dependent ROC methods and classical concordance summaries such as Kendall's tau or the c index (Harrell, Lee, and Mark, 1996); and to show how standard Cox regression estimation methods directly provide the ingredients needed to calculate the proposed time-dependent accuracy summaries.

1.1 Notation

Let $T_i$ be the survival time for subject $i$, and assume that we only observe the minimum of $T_i$ and $C_i$, where $C_i$ represents an independent censoring time. Define the follow-up time $X_i = \min(T_i, C_i)$, and let $\Delta_i = 1(T_i \le C_i)$ denote the censoring indicator. The survival time $T_i$ can also be represented through the counting process, $N_i^*(t) = 1(T_i \le t)$, or the corresponding increment, $dN_i^*(t) = N_i^*(t) - N_i^*(t-)$. Note that we focus on the counting process $N_i^*(t)$, which is defined solely in terms of the survival time $T_i$, rather than the more common notation $N_i(t) = 1(X_i \le t, \Delta_i = 1)$, which depends on the censoring time (Fleming and Harrington, 1991). Let $R_i(t) = 1(X_i \ge t)$ denote the at-risk indicator. We also assume that for each subject we have a collection of time-invariant covariates, $Z_i = (Z_{i1}, Z_{i2}, \ldots, Z_{ip})$.


Survival Model Predictive Accuracy and ROC Curves 93

We focus here on using Cox model methods to both generate a model score and to evaluate the prognostic potential of the model score. However, the evaluation methods that we propose can be used to summarize the accuracy of a prognostic score generated through any alternative regression or predictive method, and in this case varying coefficient methods (Hastie and Tibshirani, 1993) such as locally weighted partial likelihood estimation (Cai and Sun, 2003) provide a convenient approach for estimating key accuracy summaries.

Therefore, we briefly introduce the relevant aspects of partial likelihood estimation. Under the proportional hazards assumption, $\lambda(t \mid Z_i) = \lambda_0(t) \exp(Z_i^T \beta)$, where $\lambda(t \mid Z_i) = \lim_{\delta \to 0} \delta^{-1} P[T_i \in [t, t + \delta) \mid Z_i, T_i \ge t]$. The partial likelihood score equations can be written as

$$0 = \sum_i \Delta_i \Big[ Z_i - \sum_k \pi_k(\beta, X_i) Z_k \Big],$$

where $\pi_k(\beta, t) = R_k(t) \exp(Z_k^T \beta) / W(t)$, with $W(t) = \sum_j R_j(t) \exp(Z_j^T \beta)$. Solving these equations yields the consistent and asymptotically normal maximum partial likelihood estimator (MPLE) (Cox, 1972).

1.2 Proportion of Variance Approaches

Two main approaches exist for characterizing the proportion of variation explained by a survival model. Schemper and Henderson (2000) overview an approach where the survival time is characterized by a counting process representation, $N_i^*(t) = 1(T_i \le t)$, and time-integrated variances are used to form the summary measure. Alternatively, O'Quigley and Xu (2001) consider the proportion of variation in the covariate, $Z_i$, that is explained by the survival time $T_i$.

Schemper and Henderson (2000) build on earlier work that extends $R^2$ to Cox regression. Their approach focuses on using the counting process, $N_i^*(t)$, and marginal and conditional expectations given by the survival functions $S(t) = E[1 - N_i^*(t)]$ and $S(t \mid Z_i) = E[1 - N_i^*(t) \mid Z_i]$, respectively. Because the vital status indicator $N_i^*(t)$ is a binary variable, Schemper and Henderson (2000) propose using the marginal variance $S(t)[1 - S(t)]$ and the conditional variance $S(t \mid Z_i)[1 - S(t \mid Z_i)]$ to characterize the proportion of variation explained by the covariates $Z_i$. In particular, a finite time range $(0, \tau)$ is considered and time-average variances are formed:

$$D(\tau) = \int_0^\tau S(t)[1 - S(t)]\, f(t)\, dt \Big/ \int_0^\tau f(t)\, dt$$

$$D_Z(\tau) = \int_0^\tau E_Z\{S(t \mid Z)[1 - S(t \mid Z)]\}\, f(t)\, dt \Big/ \int_0^\tau f(t)\, dt,$$

where $f(t)$ is the marginal density of $T_i$. Our representation above differs by a factor of 2 from the proposal of Schemper and Henderson (2000) as they also consider the mean absolute deviation, $E[|N_i^*(t) - S(t)|] = 2 S(t)[1 - S(t)]$. Finally, the summary $V(\tau) = [D(\tau) - D_Z(\tau)] / D(\tau)$ is proposed as the proportion of variation explained by covariates. Similarly, our approach views survival data through the counting process representation, $N_i^*(t)$, but because $N_i^*(t)$ is a binary outcome we explore the extension of standard binary response accuracy summaries such as ROC curves rather than considering an extension of $R^2$.

O'Quigley and Xu (2001) also develop $R^2$ summaries for Cox regression. In their approach the roles of survival time and covariate are reversed, and the proportion of variation in the covariate that is explained by survival is proposed. The authors exploit partial likelihood estimation methods because the methods provide model-based estimates of the distribution of covariates conditional on survival time. Focusing on a scalar covariate, Xu and O'Quigley (2000) show that $\pi_i(\beta, t) = R_i(t) \exp(Z_i \beta) / W(t)$ can be used to estimate the distribution of the covariate, $Z_i$, conditional on the event occurring at time $t$, $\hat{P}(Z_i \le z \mid T_i = t) = \sum_j \pi_j(\hat{\beta}, t) \cdot 1(Z_j \le z)$. O'Quigley and Xu (2001) obtain estimates of the conditional variance $\mathrm{var}(Z_i \mid T_i = t)$ and propose a global summary by integrating estimates of the marginal and conditional variance over the survival distribution. Our approach is similar in that we also use $\pi_i(\beta, t)$ to estimate conditional distributions, but rather than computing variances we estimate time-dependent versions of sensitivity and specificity defined in the following section.

1.3 Overview

In Section 2 we briefly review ROC methods proposed for summarizing the accuracy of a prognostic marker or model when the outcome of interest is a survival time. We then develop new definitions of time-dependent sensitivity and specificity that are strongly connected to partial likelihood concepts. Time-dependent accuracy measures can be used to calculate time-specific ROC curves, and time-specific area under the curve (AUC) summaries. We show that a global concordance measure is the integral, or weighted average, of time-specific AUC measures. In Section 3 we discuss the estimation of time-dependent ROC and AUC summaries and provide a method that is applicable to a proportional hazards model, and a more general method that can be used to characterize any scalar prognostic score even if proportional hazards do not obtain. Finally, in Section 4 we analyze two well-known data sets. We conclude the article with a brief discussion.

2. Censored Survival and Predictive Accuracy

2.1 Background on ROC Curve Analysis

When outcomes $Y_i$ are binary the accuracy of a prediction or classification rule is typically summarized through correct classification rates defined as sensitivity, $P(p_i > c \mid Y_i = 1)$, and specificity, $P(p_i \le c \mid Y_i = 0)$, where $p_i$ is a prediction, and $c$ is a criterion for classifying the prediction as positive ($p_i > c$) or negative ($p_i \le c$). When no a priori value of $c$ is indicated the full spectrum of sensitivities and specificities can be characterized using an ROC curve that plots the true-positive rate (sensitivity) versus the false-positive rate (1 - specificity) for all $c \in (-\infty, +\infty)$.

An ROC curve provides complete information on the set of all possible combinations of true-positive and false-positive rates, but is also more generally useful as a graphical characterization of the magnitude of separation between the case and control marker distributions. If case measurements and control measurements have no overlap then the ROC curve takes the value 1 (perfect true-positive rate) for any false-positive rate greater than 0. In this situation the marker is perfect at discriminating between cases and controls.

94 Biometrics, March 2005

If the case and control distributions are identical then the ROC curve lies on the 45° line, indicating that the marker is useless for separating cases from controls.

The area under the ROC curve, or AUC, is known to represent a measure of concordance between the marker and the disease status indicator (Hanley and McNeil, 1982). Specifically, the AUC measures the probability that the marker value for a randomly selected case exceeds the marker value for a randomly selected control and is directly related to the Mann-Whitney U statistic (Hanley and McNeil, 1982; Pepe, 2003). Finally, ROC curves are particularly useful for comparing the discriminatory capacity of different potential biomarkers. For example, if for each value of specificity one marker always has a higher sensitivity, then this marker will be a uniformly better diagnostic measurement. See Zhou, McClish, and Obuchowski (2002) or Pepe (2003) for more discussion of ROC analysis.

In this section we first review previous proposals for generalizing the concepts of sensitivity and specificity for application to survival endpoints. Definitions of sensitivity and specificity are given in terms of the actual survival time $T_i$. Censoring needs to be addressed for valid estimation. We then show that a certain choice of time-dependent true-positive and false-positive definitions leads to time-dependent ROC curves and time-dependent AUC summaries that are directly related to a previously proposed concordance summary for survival data.

2.2 Extensions of Sensitivity and Specificity

For survival data there are several potential extensions of cross-sectional sensitivity and specificity. Rather than a simple binary outcome, $Y_i = 1$, a survival time can be viewed as a time-varying binary outcome by focusing on the counting process representation $N_i^*(t) = 1(T_i \le t)$. Accuracy extensions are classified according to whether the cases used to define time-dependent sensitivity are incident cases, where $T_i = t$, or equivalently $dN_i^*(t) = 1$, is used to define cases for time $t$, or cumulative cases, where $T_i \le t$ or $N_i^*(t) = 1$ is used. We also consider whether controls are static, defined as subjects with $T_i > t^*$ for a fixed value of $t^*$, or whether controls are dynamic and defined for time $t$ as those subjects with $T_i > t$. We use the superscripts $C$ and $I$ to denote different definitions of sensitivity, and use the superscripts $\bar{D}$ and $D$ to denote different definitions of specificity. In this section we focus on a scalar marker value $M_i$ that is used as a predictor of death. When our interest is in the accuracy of a regression model we will use $M_i = Z_i^T \beta$.

2.2.1 Cumulative/dynamic. For a baseline marker value, $M_i$, Heagerty et al. (2000) propose versions of time-dependent sensitivity and specificity using the definitions

$$\text{sensitivity}^C(c, t):\ P(M_i > c \mid T_i \le t) = P\{M_i > c \mid N_i^*(t) = 1\}$$

$$\text{specificity}^D(c, t):\ P(M_i \le c \mid T_i > t) = P\{M_i \le c \mid N_i^*(t) = 0\}.$$

Using this approach, at any fixed time $t$ the entire population is classified as either a case or a control on the basis of vital status at time $t$. Also, each individual plays the role of a control for times $t < T_i$, but then contributes as a case for later times, $t \ge T_i$. Cumulative/dynamic accuracy summaries are most appropriate when a specific time $t^*$ (or a small collection of times $t_1, t_2, \ldots, t_m$) [...] in discriminating between subjects who die prior to a given time $t^*$ and those that survive beyond $t^*$. ROC curves are defined as $\mathrm{ROC}_t^{C/D}(p) = \mathrm{TP}_t^C\{[\mathrm{FP}_t^D]^{-1}(p)\}$ where $\mathrm{TP}_t^C(c) = P(M_i > c \mid N_i^*(t) = 1)$, $\mathrm{FP}_t^D(c) = P(M_i > c \mid N_i^*(t) = 0)$, and $[\mathrm{FP}_t^D]^{-1}(p) = \inf_c \{c : \mathrm{FP}_t^D(c) \le p\}$. In the absence of censoring $\mathrm{ROC}_t^{C/D}(p)$ can be estimated using the empirical distribution of the marker separately among cases and controls. With censored survival times Heagerty et al. (2000) develop a nonparametric estimator based on the nearest-neighbor bivariate distribution estimator of Akritas (1994). A substantive application that demonstrates use of cumulative/dynamic ROC curves for a Cox regression model can be found in Fan et al. (2002).

2.2.2 Incident/static. Etzioni et al. (1999) and Slate and Turnbull (2000) adopt an alternative definition of time-dependent sensitivity and specificity using

$$\text{sensitivity}^I(c, t):\ P(M_i > c \mid T_i = t) = P\{M_i > c \mid dN_i^*(t) = 1\}$$

$$\text{specificity}^{\bar{D}}(c, t^*):\ P(M_i \le c \mid T_i > t^*) = P\{M_i \le c \mid N_i^*(t^*) = 0\},$$

where $dN_i^*(t) = N_i^*(t) - N_i^*(t-)$. Using this definition, each subject does not change disease status and is treated as either a case or a control. Cases are stratified according to the time at which the event occurs (incident) and controls are defined as those subjects who are event free through a fixed follow-up period, $(0, t^*)$ (static). These definitions facilitate the use of standard regression approaches for characterizing sensitivity and specificity because the event time, $T_i$, can simply be used as a covariate. To estimate the quantiles of the conditional distribution of the marker, $M_i$, given the event time, $T_i = t$, Etzioni et al. (1999) and Slate and Turnbull (2000) consider parametric methods that assume a normal distribution, but which allow the mean and variance to be functions of the measurement time, disease status, and the event time for the cases. Cai et al. (2003) propose methods for estimating time-dependent sensitivity and specificity when the event time is censored. Recently, Zheng and Heagerty (2004) have proposed regression quantile methods, which relax the parametric distributional assumptions of previous approaches.

2.2.3 Incident/dynamic. In this article we focus on the following definitions of sensitivity and specificity:

$$\text{sensitivity}^I(c, t):\ P(M_i > c \mid T_i = t) = P\{M_i > c \mid dN_i^*(t) = 1\}$$

$$\text{specificity}^D(c, t):\ P(M_i \le c \mid T_i > t) = P\{M_i \le c \mid N_i^*(t) = 0\}.$$

Using this approach a subject can play the role of a control for an early time, $t < T_i$, but then play the role of case when $t = T_i$. This dynamic status parallels the multiple contributions that a subject can make to the partial likelihood function. Here sensitivity measures the expected fraction of subjects with a marker greater than $c$ among the subpopulation of individuals who die at time $t$, while specificity measures the fraction of subjects with a marker less than or equal to $c$ among those who survive beyond time $t$. Incident sensitivity and dynamic specificity are defined by dichotomizing the risk set at time $t$ into those observed to die (cases) and those observed to survive (controls). In Section 3 we discuss how the observed marker data among risk sets can be used to estimate time-dependent accuracy concepts.
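The dichotomization of the risk set described for the incident/dynamic definitions can be illustrated with a short sketch. The function name and the toy data are assumptions made for illustration. Subjects censored exactly at time t are left out of both groups here, which is one reading of "observed to survive" and not something the text above spells out.

```python
def split_risk_set(follow_up, event, t):
    """Split the risk set at time t into incident cases and dynamic controls.

    follow_up: observed times X_i; event: indicators Delta_i (1 = death observed).
    Cases are subjects observed to die at t; controls are subjects still under
    observation after t (X_i > t). Returns two lists of subject indices.
    """
    cases = [i for i, (x, d) in enumerate(zip(follow_up, event))
             if x == t and d == 1]
    controls = [i for i, x in enumerate(follow_up) if x > t]
    return cases, controls


# Toy data: five subjects, two of them censored (event = 0).
follow_up = [2.0, 3.0, 3.0, 5.0, 7.0]
event = [1, 1, 0, 1, 0]

cases_at_2, controls_at_2 = split_risk_set(follow_up, event, 2.0)
cases_at_3, controls_at_3 = split_risk_set(follow_up, event, 3.0)
```

In this toy data the subject with index 1 appears among the controls at t = 2 and becomes the case at t = 3, mirroring the way a subject contributes to several risk sets in the partial likelihood.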


Incident sensitivity and dynamic specificity have some appealing characteristics relative to the alternative definitions. First, incident sensitivity and dynamic specificity are based on classification of the risk set at time $t$ into case(s) and controls, and are, therefore, a natural companion to hazard models. Second, the definitions easily allow extension to time-dependent covariates using $P[M_i(t) > c \mid T_i = t]$ to define incident sensitivity and $P[M_i(t) \le c \mid T_i > t]$ to define dynamic specificity with a longitudinal marker $M_i(t)$. Use of cumulative sensitivity does not permit a time-varying marker. Finally, use of incident sensitivity and dynamic specificity allows both time-specific accuracy summaries and, as shown in Section 2.4, time-averaged summaries that directly relate to a familiar global concordance measure. In contrast, methods have not been proposed for meaningfully averaging the time-specific incident/static or cumulative/dynamic accuracy summaries.

2.3 Time-Dependent ROC Curves

After selecting definitions for time-dependent sensitivity and specificity, ROC curves can be computed and interpreted. In this article we focus on incident/dynamic (I/D) ROC curves defined as the function $\mathrm{ROC}_t^{I/D}(p)$, where $p$ denotes the dynamic false-positive rate, and $\mathrm{ROC}_t^{I/D}(p)$ denotes the corresponding incident true-positive rate. Specifically, let $c^p$ be defined as the threshold that yields a false-positive rate of $p$: $P(M_i > c^p \mid T_i > t) = 1 - \text{specificity}^D(c^p, t) = p$. The true-positive rate, $\mathrm{ROC}_t^{I/D}(p)$, is the sensitivity that is obtained using this threshold, or $\mathrm{ROC}_t^{I/D}(p) = \text{sensitivity}^I(c^p, t) = P(M_i > c^p \mid T_i = t)$. Using the true- and false-positive rate functions $\mathrm{TP}_t^I(c) = \text{sensitivity}^I(c, t)$ and $\mathrm{FP}_t^D(c) = 1 - \text{specificity}^D(c, t)$ allows the ROC curve to be written as the composition of $\mathrm{TP}_t^I(c)$ and the inverse function $[\mathrm{FP}_t^D]^{-1}(p) = c^p$:

$$\mathrm{ROC}_t^{I/D}(p) = \mathrm{TP}_t^I\{[\mathrm{FP}_t^D]^{-1}(p)\}$$

for $p \in [0, 1]$. We use the notation $\mathrm{AUC}(t) = \int_0^1 \mathrm{ROC}_t^{I/D}(p)\, dp$ to denote the area under the I/D ROC curve for time $t$.

2.4 Time-Dependent AUC and Concordance

In the previous subsection we discussed how ROC methods can be used to characterize the ability of a marker to distinguish cases at time $t$ from controls at time $t$. However, in many applications no a priori time $t$ is identified, and a global accuracy summary is desired. In this subsection we show how time-dependent ROC curves are related to a standard concordance summary. The global summary we adopt is

$$C = P[M_j > M_k \mid T_j < T_k],$$

which indicates the probability that the subject who died at the earlier time has a larger value of the marker. This is not the usual form (i.e., $P[M_j > M_k \mid T_j > T_k]$), but reflects the conventions for ROC analysis.

In order to understand the relationship between this discrimination summary and ROC curves we assume independence of observations $(M_j, T_j)$ and $(M_k, T_k)$, and assume that $T_j$ is continuous such that $P(T_k = T_j) = 0$. We use $P(x)$ to denote probability or density depending on the context. These assumptions imply that the concordance summary $C$ is a weighted average of the area under time-specific ROC curves,

$$P[M_j > M_k \mid T_j < T_k] = \int_t 2\, P[\{M_j > M_k\} \mid \{T_j = t\} \cap \{t < T_k\}] \times P[\{T_j = t\} \cap \{t < T_k\}]\, dt$$

$$= \int_t \mathrm{AUC}(t)\, w(t)\, dt = E_T[\mathrm{AUC}(T) \times 2 \times S(T)]$$

with $w(t) = 2 \cdot f(t) \cdot S(t)$.

In this notation $\mathrm{AUC}(t)$ is based on the I/D definition of sensitivity and specificity, $\mathrm{AUC}(t) = P(M_j > M_k \mid T_j = t, T_k > t)$. See the Appendix for a derivation.

In practice we would typically restrict attention to a fixed follow-up period $(0, \tau)$. The concordance summary can be modified to account for finite follow-up:

$$C^\tau = \int_0^\tau \mathrm{AUC}(t)\, w^\tau(t)\, dt,$$

where $w^\tau(t) = 2 \cdot f(t) \cdot S(t) / W^\tau$, $W^\tau = \int_0^\tau 2 \cdot f(t) \cdot S(t)\, dt = 1 - S^2(\tau)$. The restricted concordance summary remains a weighted average of the time-specific AUCs with the weights rescaled such that they integrate to 1.0 over the range $(0, \tau)$. The interpretation of $C^\tau$ is a slight modification of the original concordance, where $C^\tau = P[M_j > M_k \mid T_j < T_k, T_j < \tau]$. Thus $C^\tau$ is the probability that the predictions for a random pair of subjects are concordant with their outcomes, given that the smaller event time occurs in $(0, \tau)$.

The concordance summary $C$ is directly related to Kendall's tau. Specifically, $C = K/2 + 1/2$, where $K$ denotes Kendall's tau (see Agresti, 2002, p. 60 for definition). Korn and Simon (1990) and Harrell et al. (1996) discuss the use of Kendall's tau ($K$ or $\tau_a$) with survival data and propose modifications to account for censored observations.

2.5 Example: Gaussian Marker and Log-Normal Disease Time

To illustrate time-dependent accuracy concepts we consider a simple example where the marker $M_i$ and the log of survival time $\log(T_i)$ follow a bivariate normal distribution. By convention we consider a higher marker value as indicative of earlier disease onset and, therefore, explore bivariate distributions with a negative correlation between the marker and log(time).

If $[M_i, \log(T_i)]$ has a bivariate normal distribution with mean $(0, 0)$ and unit standard deviations then time-dependent incident sensitivity and dynamic 1 - specificity are

$$P\{M_i > c \mid dN_i^*(t) = 1\} = \mathrm{TP}_t^I(c) = \Phi\left\{ \frac{\rho \log(t) - c}{\sqrt{1 - \rho^2}} \right\}$$

$$P\{M_i > c \mid N_i^*(t) = 0\} = \mathrm{FP}_t^D(c) = \frac{S_{2N}[c, \log(t); \rho]}{\Phi[-\log(t)]},$$

where $\Phi(x) = P(X < x)$ for $X \sim N(0, 1)$ and $S_{2N}[x, y; \rho] = P(X > x, Y > y)$ for $(X, Y)$ bivariate mean 0 unit normal with correlation $\rho$.

Figure 1a shows I/D ROC curves for $\rho = -0.8$. The solid line corresponds to $t = \exp(-2)$ and has an AUC of 0.923


[Figure 1. Panel (a): sensitivity versus 1-specificity, with curves for log(t) = -2, -1, 0, 1, 2. Panel (b): AUC(t) versus time, with curves for rho = -0.9, -0.8, -0.7, -0.6 and the weight function w(t).]

Figure 1. Incident/dynamic ROC and AUC plots for a bivariate (log) normal distribution. (a) Incident/dynamic ROC curves for a scalar marker and a disease time where $\{M_i, \log(T_i)\}$ is bivariate normal with $\rho = -0.8$. (b) Plots of AUC(t) for a scalar marker and a disease time where $\{M_i, \log(T_i)\}$ is bivariate normal with $\rho$ taking the values $(-0.9, -0.8, -0.7, -0.6)$.
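The quantities plotted in Figure 1 can be evaluated numerically from the bivariate normal setup of Section 2.5. The sketch below is an illustration under that setup only: the function names are hypothetical, and Simpson's rule is a numerical choice made here, not a method taken from the article.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()  # standard normal: STD.cdf is Phi, STD.pdf is the density


def tp_incident(c, log_t, rho):
    """TP_t^I(c) = P(M > c | log T = log t) for the bivariate normal example."""
    return STD.cdf((rho * log_t - c) / sqrt(1.0 - rho ** 2))


def control_density(m, log_t, rho):
    """Density of the marker among dynamic controls (log T > log t)."""
    surv_given_m = 1.0 - STD.cdf((log_t - rho * m) / sqrt(1.0 - rho ** 2))
    return STD.pdf(m) * surv_given_m / STD.cdf(-log_t)


def simpson(f, a, b, n=2000):
    """Composite Simpson's rule on [a, b] with an even number n of intervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def fp_dynamic(c, log_t, rho, upper=10.0):
    """FP_t^D(c) = P(M > c | log T > log t), integrating the control density."""
    return simpson(lambda m: control_density(m, log_t, rho), c, upper)


def auc(log_t, rho, lower=-10.0, upper=10.0):
    """AUC(t) = P(M_case > M_control), averaging TP over the control density."""
    return simpson(
        lambda c: tp_incident(c, log_t, rho) * control_density(c, log_t, rho),
        lower, upper)
```

With `rho = -0.8` and `log_t = -2`, `tp_incident(1.19, -2, -0.8)` should land close to the true-positive rate of roughly 75% quoted in the text, and `fp_dynamic(1.19, -2, -0.8)` close to 10%.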

indicating very good separation between the distribution for $M_i$ among subjects with $T_i = \exp(-2)$ as compared to the marker distribution for subjects with $T_i > \exp(-2)$. Furthermore, if the threshold value $c^{10\%} = 1.19$ were used to indicate a positive test, then by definition, only 10% of the controls (i.e., $\log(T_i) > -2$) would have a value of $M_i$ greater than 1.19. The ROC plot shows that for this false-positive rate of 10% a sensitivity, or true-positive rate, of 75% can be obtained:


$\mathrm{TP}_t^I(1.19) = 0.752$. If we consider a later time such as $\log(t) = 0$ we find less overall discrimination with an AUC of 0.741. Again, specific operating points can be identified; for example, the ROC curve shows that if the false-positive rate is again controlled at 10% then a true-positive rate of only 30% is now obtained (here $c^{10\%} = 0.320$). One of the key advantages of an ROC curve is that it facilitates comparisons across different conditions in terms of the sensitivity of a marker where the specificity is controlled at a fixed level for each condition. Here we have evaluated the temporal variation in sensitivity while controlling 1 - specificity at 10%.

In Figure 1b we show the AUC(t) functions for different values of $\rho$. For each value of $\rho$ we find a decreasing AUC(t) with increasing time. In addition, with decreasing correlation between the marker and the disease time we find uniformly decreasing values for AUC(t). A global accuracy summary can be obtained using $C$, which integrates AUC(t) using the weight function proportional to $2 \cdot f(t) \cdot S(t)$. Figure 1b also displays the weight function, which for this example is $w(t) = 2 \cdot \phi(t)[1 - \Phi(t)]$, where $\phi(x)$ and $\Phi(x)$ are the standard normal density and distribution functions, respectively. In this bivariate normal situation there exists an analytical solution for the concordance: $C = \sin^{-1}(-\rho)/\pi + 0.5$. For $\rho = -0.9$ we find $C = 0.827$, while with $\rho = -0.6$ we find $C = 0.703$. Therefore, when the marker $M_i$ and log-survival time have a correlation of $-0.9$ there is an 82.7% chance that for a random pair of observations the marker value for the earlier survival time is greater than the marker value for the larger survival time. This concordance probability is reduced to 70.3% when $\rho = -0.6$.

3. Estimation of Incident/Dynamic Time-Dependent Accuracy

In this section we propose methods for the estimation of time-dependent accuracy summaries using a single scalar marker $M_i$. When interest is in the accuracy of a survival regression model we propose using the linear predictor as a scalar marker, $M_i = Z_i^T \beta$, and then using nonparametric or semiparametric methods to characterize the time-dependent sensitivity and specificity of the model score. In particular, we discuss how the Cox model and partial likelihood concepts can be conveniently used to provide semiparametric estimates of I/D accuracy. However, the methods that we propose do not require the model score, $M_i$, to be derived from a proportional hazards model and are potentially applicable for any prognostic scale.

3.1 Estimation: $\mathrm{TP}_t^I(c)$ and $\mathrm{FP}_t^D(c)$ under Proportional Hazards

Properties of the partial likelihood function make estimation of I/D ROC curves a natural companion to Cox regression. Here we assume that the censoring time $C_i$ is independent of the failure time $T_i$ and marker $M_i$. To clearly distinguish between the general model score, $M_i = Z_i^T \beta$, and a Cox model that uses this score, we denote $\gamma$ as the proportional hazards regression parameter $\lambda(t \mid M_i) = \lambda_0(t) \exp(M_i \gamma)$. It is well known that under a proportional hazards model the weights, $\pi_i(\gamma, t) = R_i(t) \exp(M_i \gamma) / W(t)$ introduced in Section 1.1, are used to compute an estimate of the expected value of the marker given failure: $\hat{E}(M_i \mid T_i = t) = \sum_k M_k\, \pi_k(\hat{\gamma}, t)$. However, Xu and O'Quigley (2000) show that these weights can also be used to estimate the distribution of the covariate conditional on death at time $t$:

$$\widehat{\mathrm{TP}}_t^I(c) = \hat{P}(M_i > c \mid T_i = t) = \sum_k 1(M_k > c)\, \pi_k(\hat{\gamma}, t), \qquad (1)$$

where the estimate $\hat{P}(M_i > c \mid T_i = t)$ is a consistent estimator when the Cox model for $M_i$ holds. Estimation of $\gamma$ using partial likelihood provides a semiparametric estimate for $\mathrm{TP}_t^I(c)$. An empirical estimator can be used for $\mathrm{FP}_t^D(c)$:

$$\widehat{\mathrm{FP}}_t^D(c) = \hat{P}(M_i > c \mid T_i > t) = \sum_k 1(M_k > c)\, R_k(t+) / W^R(t+), \qquad (2)$$

where $R_k(t+) = \lim_{\delta \to 0} R_k(t + |\delta|)$, and $W^R(t+) = \sum_k R_k(t+)$. The term $W^R(t+)$ denotes the size of the control set at time $t$, where we define the control set as the risk set minus subjects who fail at time $t$. Essentially, $\widehat{\mathrm{FP}}_t^D(c)$ is the empirical distribution function for marker values among the control set, and $\widehat{\mathrm{TP}}_t^I(c)$ is an exponential tilt of the empirical distribution function for the marker among risk set subjects (Anderson, 1979).

3.2 Estimation: $\mathrm{TP}_t^I(c)$ and $\mathrm{FP}_t^D(c)$ under Nonproportional Hazards

In order to use equation (1) to estimate incident sensitivity the proportional hazards assumption must be satisfied. However, this aspect can be relaxed by adopting a varying-coefficient model of the form $\lambda(t \mid M_i) = \lambda_0(t) \exp[M_i \gamma(t)]$. The time-varying coefficient function $\gamma(t)$ can be estimated either in a one-step fashion based on routine Cox model residuals, or through locally weighted partial likelihood methods. Note that if proportional hazards do obtain then $\gamma(t) \equiv 1$ when $M_i = Z_i^T \beta$.

Grambsch and Therneau (1994) describe residual-based methods for assessing the proportional hazards model that can also be used to obtain estimates of time-varying coefficient functions. In order to define the residuals we adopt the following notation: $S^{(p)}(\beta, t) = \sum_k R_k(t) \exp(Z_k^T \beta)\, Z_k^{\otimes p}$, where $Z_k^{\otimes p}$ refers to $1$, $Z_k$, and $Z_k Z_k^T$ for $p = 0, 1, 2$, respectively. The scaled Schoenfeld residuals are defined for each observed ordered failure time, $t_{(j)}$, as the vector

$$r_j(\beta) = V^{-1}[\beta, t_{(j)}]\,\{Z_{(j)} - e[\beta, t_{(j)}]\},$$

where $e[\beta, t_{(j)}] = S^{(1)}[\beta, t_{(j)}] / S^{(0)}[\beta, t_{(j)}]$, $V[\beta, t_{(j)}] = S^{(2)}[\beta, t_{(j)}] / S^{(0)}[\beta, t_{(j)}] - e[\beta, t_{(j)}]\, e[\beta, t_{(j)}]^T$, and $Z_{(j)}$ denotes the covariate for the subject observed to die at time $t_{(j)}$. Grambsch and Therneau (1994) show that $E\{r_j \mid \mathcal{F}[t_{(j)}]\} \approx [\beta(t) - \beta_0]$, where $\beta_0$ is the time-averaged coefficient and $\mathcal{F}(t)$ is the right-continuous filtration specifying the survival process history. This property is used to obtain focused tests of proportionality, and to obtain estimates of the time-varying coefficient function, $\beta_k(t)$, corresponding to covariate $Z_{i,k}$. As a graphical diagnostic tool standard regression-smoothing techniques are now commonly applied to the points $[t_{(j)}, \hat{\beta}_k + r_{j,k}(\hat{\beta})]$ following a Cox model fit in order to obtain estimates of time-dependent coefficient functions, $\beta_k(t)$.
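Equations (1) and (2) and the scalar form of the scaled Schoenfeld residual can be sketched in a few lines for a single marker. This is a minimal illustration, not the authors' software: the function names are hypothetical, and the coefficient `gamma` is assumed to be supplied from a Cox model fitted elsewhere.

```python
from math import exp


def risk_set_weights(markers, follow_up, gamma, t):
    """pi_k(gamma, t) = R_k(t) exp(gamma * M_k) / W(t), with R_k(t) = 1(X_k >= t)."""
    raw = [exp(gamma * m) if x >= t else 0.0
           for m, x in zip(markers, follow_up)]
    total = sum(raw)
    if total == 0.0:
        raise ValueError("empty risk set at time t")
    return [r / total for r in raw]


def tp_incident_hat(markers, follow_up, gamma, c, t):
    """Equation (1): weighted fraction of the risk set with marker above c."""
    weights = risk_set_weights(markers, follow_up, gamma, t)
    return sum(w for w, m in zip(weights, markers) if m > c)


def fp_dynamic_hat(markers, follow_up, c, t):
    """Equation (2): empirical fraction of the control set (X_k > t) above c."""
    controls = [m for m, x in zip(markers, follow_up) if x > t]
    if not controls:
        raise ValueError("empty control set at time t")
    return sum(1 for m in controls if m > c) / len(controls)


def scaled_schoenfeld(markers, follow_up, gamma, case_marker, t):
    """Scalar scaled Schoenfeld residual (M_(j) - e) / V at failure time t."""
    weights = risk_set_weights(markers, follow_up, gamma, t)
    mean = sum(w * m for w, m in zip(weights, markers))             # e[gamma, t]
    var = sum(w * m * m for w, m in zip(weights, markers)) - mean ** 2  # V[gamma, t]
    if var == 0.0:
        raise ValueError("no marker variation in the risk set at time t")
    return (case_marker - mean) / var
```

With `gamma = 0` the weights are equal across the risk set, so `tp_incident_hat` reduces to a plain empirical fraction; a nonzero `gamma` tilts the weights toward larger or smaller marker values, which is the exponential tilt mentioned after equation (2).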


For the evaluation of the accuracy of a marker, $M_i$, the smoothing of Schoenfeld residuals can be used to obtain a simple estimate of I/D AUC(t) by exploiting standard Cox model output. First a Cox model of the form $\lambda_0(t)\exp(M_i\gamma)$ is fit, followed by use of regression-smoothing methods to obtain $\hat{\gamma}(t)$. Second, equation (2) can still be used to obtain estimates of false-positive rates, and (1) can now be evaluated using $\hat{\gamma}(t)$ rather than a constant value $\hat{\gamma}$:

$$\widehat{\mathrm{TP}}{}^{I}_{t}(c) = \hat{P}(M_i > c \mid T_i = t) = \sum_k 1(M_k > c)\, \pi_k[\hat{\gamma}(t), t]. \qquad (3)$$

By using equation (3) we are adopting the flexible semiparametric hazard model, $\lambda_0(t)\exp[M_i\gamma(t)]$, which no longer assumes proportionality, but rather only assumes smoothly varying hazard ratios over time.

More formal flexible semiparametric statistical methods can be used to estimate a varying-coefficient hazard model and subsequently produce time-dependent accuracy summaries based on minimal model assumptions. For example, Hastie and Tibshirani (1993) discuss both smooth parametric methods and nonparametric penalized likelihood methods for estimating the function $\gamma(t)$ in the model $\lambda_i(t) = \lambda_0(t)\exp[M_i\gamma(t)]$. More recently Cai and Sun (2003) characterize the properties of locally weighted partial likelihood methods used to obtain varying coefficient estimates. Using kernel weights that are specified as a function of time, $t$, allows use of local-linear estimation methods. Cai and Sun (2003) prove the pointwise consistency and asymptotic normality of the resulting function estimator, $\hat{\gamma}(t)$. Smooth parametric and/or nonparametric methods allow valid estimation of accuracy summaries such as AUC(t) based on the minimal model assumptions because models of the form $\lambda_i(t) = \lambda_0(t)\exp[M_i\gamma(t)]$ only assume linearity in $M_i$ and smoothly varying hazard ratios over time. The linearity assumption can be relaxed by using a model with single or multiple transformations of $M_i$ and a vector of time-varying coefficients.

3.3 Estimation: $\mathrm{ROC}^{I/D}_t(p)$, AUC(t), and $C^\tau$

Given estimates of $\mathrm{TP}^{I}_{t}(c)$ and $\mathrm{FP}^{D}_{t}(c)$ the area under the ROC curve at time $t$, AUC(t), and the integrated area, $C^\tau$, can be calculated. The estimated ROC curve is given as

$$\widehat{\mathrm{ROC}}{}^{I/D}_{t}(p) = \widehat{\mathrm{TP}}{}^{I}_{t}\left\{ [\widehat{\mathrm{FP}}{}^{D}_{t}]^{-1}(p) \right\},$$

where $[\widehat{\mathrm{FP}}{}^{D}_{t}]^{-1}(p) = \inf_c \{c : \widehat{\mathrm{FP}}{}^{D}_{t}(c) \le p\}$. The estimated AUC(t) is simply $\widehat{\mathrm{AUC}}(t) = \int \widehat{\mathrm{ROC}}{}^{I/D}_{t}(p)\, dp$ estimated using standard numerical integration methods such as the trapezoid rule. Finally, the estimated concordance is given by

$$\hat{C}^\tau = \int \widehat{\mathrm{AUC}}(t)\, \hat{w}^\tau(t)\, dt,$$

where $\widehat{\mathrm{AUC}}(t)$ is given above and $\hat{w}^\tau(t) = 2 \hat{f}(t) \hat{S}(t) / [1 - \hat{S}^2(\tau)]$. The Kaplan–Meier estimator can be used for $\hat{S}(t)$, and a discrete approximation to $f(t)$ can be used based on the increments in the Kaplan–Meier estimator. If Kaplan–Meier is used to estimate $f(t)$ and $S(t)$ then AUC(t) only needs to be evaluated at the observed failure times in order to calculate $\hat{C}^\tau$.

3.4 Inference for Incident/Dynamic Accuracy Summaries

Xu and O'Quigley (2000) show that the estimator $\widehat{\mathrm{TP}}{}^{I}_{t}(c)$ given in equation (1) is consistent provided that the proportional hazards model obtains, and provided the independent observations are subject to independent censoring. Parallel arguments apply for the estimator obtained using a varying-coefficient model given in equation (3) whenever a consistent estimator of $\gamma(t)$ is used. Cai and Sun (2003) show that the locally weighted MPLE is consistent under standard regularity conditions. In addition, because $\widehat{\mathrm{FP}}{}^{D}_{t}(c)$ is an empirical distribution function calculated over the control set (i.e., the risk set minus the case), consistency obtains provided the control set represents an unbiased sample (i.e., independent censoring). Therefore, consistent estimates of time-dependent sensitivity and specificity and corresponding AUC(t) and $C^\tau$ summaries are obtained under the proportional hazards assumption using equations (1) and (2), and under more general nonproportional hazards assumptions using equation (3). Finally, because the accuracy summaries are defined over the joint distribution of the marker $M_i$ and the survival time $T_i$, the nonparametric bootstrap of Efron (1979) based on resampling of observations $(M_i, X_i, \Delta_i)$ may be used to compute standard errors or to provide confidence intervals.

3.5 Discrete Times and General Hazard Models

Our motivation for developing tools to summarize predictive accuracy stems from interest in characterizing the prognostic potential of Cox models for continuous survival times. However, the basic time-dependent accuracy concepts and the estimation method outlined in Section 3.2 generalize to discrete survival times and/or alternative hazard regression models. The key to estimation of $\mathrm{TP}^{I}_{t}(c)$ presented in Sections 3.1 and 3.2 is that a hazard model can be used to reweight the empirical distribution of $M_i$ calculated over the risk set at time $t$. Equations (1) and (3) show specific details for Cox models. More generally, let $P(T_i = t \mid T_i \ge t, M_i)$ denote the hazard, where $P(t)$ represents either density for continuous survival times or probability for discrete times. A hazard regression model can be formulated as $g[P(T_i = t \mid T_i \ge t, M_i)] = \alpha(t) + M_i\gamma(t)$, where $g(x)$ is a link function. The Cox model is a special case where a log link is used; $\alpha(t) = \log\lambda_0(t)$; and $\gamma(t) \equiv \gamma$ under the proportional hazards assumption. Following arguments given in Xu and O'Quigley (2000) the general model implies:

$$P(M_i = m \mid T_i = t) \propto g^{-1}[\alpha(t) + m\gamma(t)]\, P(M_i = m \mid T_i \ge t), \qquad (4)$$

where $P(M_i = m \mid T_i \ge t)$ denotes either the marker density or probability depending on whether a continuous or discrete marker distribution is assumed. See the Appendix for a derivation. Equation (4) shows that $P(M_i = m \mid T_i = t)$ can be estimated from separate estimates of the hazard model and the distribution of the marker conditional on $T_i \ge t$. Therefore, the general estimation approach outlined in Section 3.2 can be adopted for either discrete survival times or for general hazard regression models provided that consistent estimates of $[\alpha(t), \gamma(t)]$ and $P(M_i = m \mid T_i \ge t)$ are available. Tied survival times impact choice of a method for estimating the hazard model parameters. In addition, with

Survival Model Predictive Accuracy and ROC Curves 99

discrete survival times calculation of the concordance summary $C^\tau = \int \mathrm{AUC}(t)\, w(t)\, dt$ requires modification to account for the fact that $P(T_j = T_k) \neq 0$ and, therefore, the constant 2 in the weight $w(t) = 2 f(t) S(t)$ needs to be computed as $1/P(T_j < T_k)$. Finally, Cox models are convenient because the baseline hazard, $\alpha(t) = \log\lambda_0(t)$, drops out of (4), and is thus not required for estimation of $\mathrm{TP}^{I}_{t}(c)$.

3.6 Simulations to Evaluate Incident/Dynamic Estimation

In order to demonstrate the feasibility of using Cox regression methods and the marker distribution among risk sets for estimating I/D ROC curves and global concordance we conducted a set of simulation studies.

For each of m = 500 simulated data sets a sample of n = 200 marker values, $M_i$, and survival times, $T_i$, were generated such that $(M_i, \log T_i)$ is bivariate normal with a correlation of $\rho = -0.7$. An independent log-normal censoring time was generated to yield a fixed expected fraction of censored observations (either 20% or 40% censored). For each simulated data set we estimated the I/D AUC(t) function and the concordance summary $C^\tau$ using the largest observed survival time to truncate follow-up time. We applied four methods of estimation to the censored data: maximum likelihood assuming a bivariate normal distribution for the survival time and the marker; maximum partial likelihood using the Cox model, which for this example incorrectly assumes proportional hazards; locally weighted maximum partial likelihood (MPL) estimation for the model $\lambda_0(t)\exp[M_i\gamma(t)]$ using the method of Cai and Sun (2003); and simple local linear smoothing of the scaled Schoenfeld residuals. For local MPL estimation and local linear smoothing we used an Epanechnikov kernel with a span of $n^{-1/5}$ where $n$ is the number of observations.

In order to estimate AUC(t) and $C^\tau$ using semiparametric methods the model for the survival time conditional on the marker, $\lambda_0(t)\exp[M_i\gamma(t)]$, is combined with the observed marker distribution within each risk set according to the methods described in Section 3.2. We have adopted a survival model that assumes that the log hazard increases linearly in $M_i$ for each time $t$. The true data-generating model is actually nonlinear with a concave risk function. Therefore, for this simulation our estimation used a first-order approximation to the true conditional hazard surface.

Table 1 displays the mean and standard deviation for the estimate of AUC(t) at various values of $t$ when data are generated with 20% and with 40% censoring. When 20% of the observations are censored we find that the MLE for AUC(t) has minimal bias for $\log(t)$ between $-2$ and 2. Estimates based on the locally weighted MPLE and the residual smoother yield approximately unbiased estimates for all but the most extreme values of time with some negative bias observed for both the semiparametric estimators. For example, at $\log(t) = -2$ the mean AUC(t) using the locally weighted MPLE is 0.860 (relative bias of $1 - 0.860/0.884 = 3\%$) and using the residual smoother the average is 0.881 (relative bias of

Table 1

Simulation results for estimation of I/D accuracy. Data $(M_i, \log T_i)$ were generated as bivariate normal with a correlation of $\rho = -0.7$. The sample size for each simulated data set was N = 200. The AUC(t) curve and the integrated curve, $C^\tau$, were estimated using: maximum likelihood assuming a bivariate normal model (MLE); Cox model, which assumes proportional hazards (Cox); local maximum partial likelihood for the varying-coefficient model $\lambda(t) = \lambda_0(t)\exp[\gamma(t) M_i]$ (Local MPL); and a local linear smooth of the scaled Schoenfeld residuals to estimate the varying-coefficient model (Residual smooth).

                     MLE            Cox            Local MPL      Residual smooth
Log time   AUC(t)    Mean   SD      Mean   SD      Mean   SD      Mean   SD

20% censoring
 -2.0      0.884     0.884  0.018   0.743  0.028   0.860  0.052   0.881  0.044
 -1.5      0.833     0.834  0.019   0.734  0.026   0.817  0.033   0.829  0.035
 -1.0      0.782     0.782  0.019   0.725  0.024   0.768  0.031   0.771  0.033
 -0.5      0.734     0.734  0.019   0.716  0.023   0.722  0.032   0.720  0.033
  0.0      0.693     0.693  0.018   0.707  0.021   0.688  0.034   0.686  0.034
  0.5      0.660     0.660  0.016   0.700  0.023   0.655  0.041   0.657  0.040
  1.0      0.634     0.634  0.015   0.691  0.028   0.633  0.044   0.637  0.041
  1.5      0.614     0.614  0.013   0.670  0.044   0.621  0.064   0.622  0.048
  2.0      0.598     0.598  0.012   0.600  0.075   0.579  0.076   0.573  0.060
  C        0.741     0.741  0.016   0.720  0.020   0.737  0.018   0.740  0.018

40% censoring
 -2.0      0.884     0.884  0.019   0.749  0.031   0.859  0.054   0.875  0.048
 -1.5      0.833     0.834  0.021   0.742  0.029   0.818  0.035   0.827  0.037
 -1.0      0.782     0.782  0.021   0.732  0.026   0.770  0.035   0.772  0.035
 -0.5      0.734     0.734  0.020   0.722  0.024   0.724  0.038   0.722  0.039
  0.0      0.693     0.693  0.019   0.712  0.024   0.689  0.042   0.687  0.041
  0.5      0.660     0.660  0.018   0.702  0.026   0.654  0.045   0.655  0.043
  1.0      0.634     0.635  0.016   0.689  0.035   0.633  0.057   0.637  0.048
  1.5      0.614     0.614  0.015   0.653  0.055   0.617  0.075   0.614  0.051
  2.0      0.598     0.599  0.013   0.560  0.073   0.555  0.075   0.546  0.058
  C        0.741     0.741  0.017   0.727  0.022   0.740  0.021   0.742  0.021
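The proportional hazards entries in Table 1 rest on risk-set weights proportional to $\exp(M_k\gamma)$ for the incident true-positive rate and on an empirical distribution over controls for the dynamic false-positive rate. The following Python sketch shows how AUC(t) at one time could be assembled from those two pieces. It is an illustration under stated assumptions: the function name `id_auc_at_time` is hypothetical, `gamma` is taken as given (for example from a separately fitted Cox model), controls are taken to be subjects with follow-up strictly greater than `t`, and ties are ignored.

```python
import numpy as np

def id_auc_at_time(marker, follow_up, t, gamma):
    """Incident/dynamic AUC at time t under a proportional hazards weighting."""
    marker = np.asarray(marker, dtype=float)
    follow_up = np.asarray(follow_up, dtype=float)
    at_risk = follow_up >= t           # risk set at time t
    controls = follow_up > t           # assumed control set: still event-free after t
    # Risk-set weights: R_k(t) * exp(M_k * gamma) / W(t)
    weights = at_risk * np.exp(gamma * marker)
    weights = weights / weights.sum()
    # Thresholds from "nothing positive" down to "everything positive".
    cuts = np.concatenate(([np.inf], np.sort(np.unique(marker))[::-1], [-np.inf]))
    tp = np.array([weights[marker > c].sum() for c in cuts])
    fp = np.array([controls[marker > c].sum() / controls.sum() for c in cuts])
    # Trapezoid rule over the (FP, TP) points, which run from (0, 0) to (1, 1).
    return float(np.sum(np.diff(fp) * (tp[1:] + tp[:-1]) / 2.0))

# Toy data: one subject fails at t = 2, three remain under follow-up.
marker = [1.0, 2.0, 3.0, 4.0]
follow_up = [5.0, 5.0, 5.0, 2.0]
auc_flat = id_auc_at_time(marker, follow_up, t=2.0, gamma=0.0)
auc_steep = id_auc_at_time(marker, follow_up, t=2.0, gamma=50.0)
```

With `gamma = 0` the weights are uniform over the risk set, so the marker carries little information; a very large `gamma` concentrates the weight on the highest marker value and pushes the AUC toward 1.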


weighted MPLE mean estimate is 0.579 (relative bias = $1 - 0.579/0.598 = 3\%$) and for the residual smoother the mean is 0.573 (relative bias = $1 - 0.573/0.598 = 4\%$). As expected for local regression methods Table 1 shows that the nonparametric methods yield substantially greater variances for specific values of $t$ compared to the MLE.

Incorrectly assuming proportional hazards leads to biased estimates. Table 1 shows that the estimated AUC(t) obtained using equation (1) with an estimated Cox model coefficient is negatively biased for $\log(t) < 0$. For example, at $\log(t) = -2$ we obtain a negative bias of $1 - 0.743/0.884 = 16\%$. For $\log(t) > 0$ the estimates obtained using the Cox model and equation (1) are positively biased, indicating that direct use of the proportional hazards assumption produces an estimated AUC(t) curve that is flatter than the target with early underestimation and late overestimation.

When censoring is increased to 40% similar patterns are found for all estimators. Table 1 shows that the bias in AUC(t) is slightly larger with increased censoring. For example, at $\log(t) = 2$ the mean estimate for the locally weighted MPLE is 0.555 (relative bias of $1 - 0.555/0.598 = 7\%$) and for the residual smoother it is 0.546 (relative bias of $1 - 0.546/0.598 = 9\%$). Therefore, even with 40% censoring the smooth semiparametric methods appear to perform adequately.

Finally, Table 1 also shows the results for the estimation of the global concordance summary $C^\tau$. In the simulations we estimate $C^\tau$ using the analytical results for the MLE: $C = \sin^{-1}(-\rho)/\pi + 1/2$. For the methods that adopt a varying-coefficient hazard model we set $\tau$ equal to the largest uncensored survival time in the observed data and, therefore, truncate follow-up at slightly different times for each simulated data set. However, even with 40% censoring the largest uncensored time had a median value of exp(2.30) with an interquartile range of exp(2.04) to exp(2.65), and thus typically very little mass in the survival distribution is lost because $S[\exp(2.30)] = 1 - \Phi(2.30) = 0.01$. With 20% censoring the mean estimates for the MLE, locally weighted MPLE, and residual smoother are 0.741 (SD = 0.016), 0.737 (SD = 0.018), and 0.740 (SD = 0.018), respectively. In contrast the estimate obtained naively assuming proportional hazards is negatively biased with an average estimate of 0.720 (relative bias = $1 - 0.720/0.741 = 3\%$). These results suggest that the smooth semiparametric methods yield little bias, and for this example exhibit high efficiency relative to the MLE. A similar pattern is seen with 40% censoring where slightly increased standard deviations are observed relative to results obtained with 20% censoring.

4. Examples

In this section we illustrate the proposed methods using two well-studied data sets.

4.1 VA Lung Cancer Data

Kalbfleisch and Prentice (2002) present and analyze Veterans Administration (VA) lung cancer data from a clinical trial in which males with inoperable cancer were randomized to a standard treatment or a test therapy. Baseline covariates that were considered important predictors of mortality include: patient age, histological type of tumor, and a performance status measure known as the Karnofsky score. Schemper and Henderson (2000) use these covariates plus a treatment indicator and report an $R^2$ of V = 0.24. This would suggest that the covariates explain only 24% of the time-integrated variance in survival status.

For comparison we use the same covariates and Cox regression to create estimates of $\mathrm{ROC}^{I/D}_t(p)$ for select $t$, the AUC(t) function, and the concordance summary $C^\tau$. For our analysis we terminate follow-up at 500 days. Estimated model coefficients and standard errors are given in Table 2. Using the proportional hazards assumption we can employ equations (1) and (2) to estimate time-specific I/D ROC curves, and then integrate the ROC curve to obtain AUC(t). Estimates of AUC(t) and pointwise 90% confidence intervals are displayed in Figure 2a. Over the first 60 days of follow-up the AUC(t) ranges between 0.66 and 0.73. The substantive interpretation is: on any day, $t$, between 0 and 60, the probability that a subject who dies on day $t$ has a model score greater than a subject who survives beyond day $t$ is at least 0.66. The accuracy summaries suggest good short-term discriminatory potential of the model score. The estimated AUC(t) function tends to decline over time to approximately 0.65 for $100 < t < 300$. Estimates of AUC(t) also become increasingly variable over time due to the diminishing size of the risk set. Using a follow-up of $\tau = 365$ days yields a concordance estimate of $\hat{C}^\tau = \int_0^\tau \widehat{\mathrm{AUC}}(t)\, \hat{w}^\tau(t)\, dt = 0.713$ with a standard error of 0.026. This implies that conditional on one event occurring within the first year, the probability that the model score is larger for the subject with the smaller event time is 71.3%. The concordance estimate $\hat{C}^\tau$ is relatively modest in magnitude, but is significantly different from the null value of 0.50 (95% CI for $C^\tau$: 0.661, 0.765).

Table 2

Cox regression estimates for the VA lung cancer data where follow-up is truncated at 500 days. The reference category for cell type is squamous.

Covariate            Estimate   SE       Z
Treatment             0.323     0.206    1.566
Age/10               -0.086     0.093   -0.937
Karnofsky score      -0.032     0.005   -5.931
Cell type (small)     0.841     0.270    3.116
Cell type (adeno)     1.151     0.295    3.896
Cell type (large)     0.350     0.285    1.231

To characterize the model score, $M_i = Z_i^T\hat{\beta}$, using fewer assumptions we relax the proportional hazards assumption for $M_i$ by using a varying coefficient model: $\lambda_0(t)\exp[M_i\gamma(t)]$. Note that we are still focusing on use of the Cox model with a proportional hazards assumption to generate the model score, but are relaxing the assumptions needed to characterize model accuracy. This highlights the fact that different methods can be used for generating and evaluating a survival regression model score. For the VA lung cancer data we simply use a kernel smooth of the scaled Schoenfeld residuals to estimate $\gamma(t)$. The estimate of $\gamma(t)$ suggests a decreasing log-relative hazard with increasing time (not shown).

Figure 2b shows estimates of AUC(t) based on equations (2) and (3), which relax the proportional hazards assumption.


[Figure 2, panels (a) and (b): AUC and w(t) plotted against time (days); AUC axis from 0.4 to 1.0.]

Figure 2. Incident/dynamic AUC plots for the VA lung cancer data. (a) Accuracy of the model score (linear predictor) under the assumption of proportional hazards. Estimates of I/D AUC(t) versus time with pointwise 90% confidence intervals. Using $\tau = 365$ we obtain $\hat{C}^\tau = \int_0^\tau \widehat{\mathrm{AUC}}(t)\, \hat{w}^\tau(t)\, dt = 0.713$ (SE = 0.026). (b) Accuracy of the model score (linear predictor) based on a varying-coefficient multiplicative hazard model. Estimates of I/D AUC(t) versus time with pointwise 90% confidence intervals. Using $\tau = 365$ we obtain $\hat{C}^\tau = \int_0^\tau \widehat{\mathrm{AUC}}(t)\, \hat{w}^\tau(t)\, dt = 0.738$ (SE = 0.022).
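The concordance values quoted in the caption combine AUC(t), evaluated at the observed failure times, with weights built from the Kaplan–Meier estimator. A rough Python sketch of that integration step follows. The function names are hypothetical, and one detail is an assumption rather than something taken from the article: the discrete weights $2\hat{f}(t_j)\hat{S}(t_j)$ are rescaled to sum to one over failure times up to $\tau$, which stands in for dividing by $1 - \hat{S}^2(\tau)$.

```python
import numpy as np

def kaplan_meier(follow_up, event):
    """Distinct event times and the Kaplan-Meier survival just after each one."""
    follow_up = np.asarray(follow_up, dtype=float)
    event = np.asarray(event, dtype=bool)
    times = np.unique(follow_up[event])
    surv = np.empty(len(times))
    s = 1.0
    for j, t in enumerate(times):
        n_risk = np.sum(follow_up >= t)
        n_fail = np.sum((follow_up == t) & event)
        s *= 1.0 - n_fail / n_risk
        surv[j] = s
    return times, surv

def concordance(times, surv, auc, tau):
    """Weighted average of AUC(t_j) with weights proportional to 2 * f * S."""
    auc = np.asarray(auc, dtype=float)
    keep = times <= tau
    s_prev = np.concatenate(([1.0], surv[:-1]))
    f = s_prev - surv                 # Kaplan-Meier increments approximate f(t)
    w = 2.0 * f * surv
    w = w[keep] / w[keep].sum()       # assumed normalisation over [0, tau]
    return float(np.sum(w * auc[keep]))

# Toy example: four uncensored subjects and a declining AUC(t).
times, surv = kaplan_meier([1.0, 2.0, 3.0, 4.0], [1, 1, 1, 1])
c_hat = concordance(times, surv, auc=[0.9, 0.8, 0.7, 0.6], tau=10.0)
```

Early failure times receive the largest weights because $\hat{S}(t)$ is still high there, so a score that discriminates well early can post a respectable concordance even when AUC(t) decays later.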


First, notice that the short-term accuracy of the model score remains good with AUC(t) between 0.70 and 0.78 over the first 60 days of follow-up. Second, the discriminatory ability of the model score declines substantially over time, and estimates of AUC(t) approach 0.50 after approximately 300 days, suggesting that the model score is essentially useless at discriminating incident cases from controls after 300 days. The 1-year concordance is estimated as $\hat{C}^\tau = 0.738$, a slight increase from the estimate obtained assuming proportional hazards. In this example the AUC(t) curve is particularly useful for displaying the fact that the baseline model score is good at discriminating early cases from early controls, but is of decreasing prognostic utility with increasing temporal distance from the baseline measurement. Declining prognostic value is not surprising, particularly because the Karnofsky score is actually a time-varying health status measure, but only the baseline value is available for the regression model. Figure 3 shows select estimates of I/D ROC curves based on the varying-coefficient model. Similar to the plot of AUC(t) the ROC curves show that predictive accuracy is uniformly decreasing with increasing time since baseline. For example, controlling the dynamic false-positive rate at 20% leads to an incident sensitivity of 56% at 30 days, decreasing to 45%, 42%, and 38% for 60, 90, and 120 days. The ROC curves also show details regarding the trade-off between sensitivity and specificity. If a stricter false-positive rate of 10% was desired then the corresponding sensitivity would only be 40% at 30 days and less than 30% for follow-up times of 60 days or greater.

[Figure 3: ROC curves, sensitivity versus 1-specificity, for t = 30, 60, 90, and 120 days.]

Figure 3. Incident/dynamic ROC curves for the VA lung cancer data. A model score is derived using Cox regression with Karnofsky score, age, and cell type. ROC curves are estimated using a varying-coefficient Cox model with the derived model score as the single predictor.

4.2 Mayo PBC Data

Next, we consider data from a randomized placebo-controlled trial of the drug D-penicillamine (DPCA) for the treatment of primary biliary cirrhosis (PBC) conducted at the Mayo Clinic between 1974 and 1984 (Fleming and Harrington, 1991). Among the 312 subjects randomized to the study, 125 died by the end of the follow-up. Although the study established that DPCA is not effective for the treatment of PBC, the data have been used to develop a commonly used clinical prediction model. We use this example to illustrate how ROC curves and/or AUC(t) summaries can be used to compare different model scores.

We first consider a Cox model containing five covariates: log(bilirubin), albumin, log(prothrombin time), edema, and age. Table 3 gives the regression estimates using the proportional hazards model with mortality as the response. Except for log(prothrombin time), all covariates are strong predictors of survival. The model has been used to create a widely used prognostic score. We now address the basic question: How well does the model score discriminate subjects who are likely to die from subjects who are likely to survive? In addition, we consider whether the accuracy of the score changes over time. Using the fitted linear predictor from the Cox model, we construct I/D time-dependent ROC curves and associated summaries for the Mayo model. Figure 4a plots AUC(t) evaluated at each failure time. The model score has very good discriminatory capacity for distinguishing those patients who die at time $t$ from those who live beyond time $t$. The accuracy is especially good for follow-up times less than 1000 days, with early AUC(t) estimates exceeding 0.85. The accuracy of the model score gradually decreases with time. Based on AUC(t) and the Kaplan–Meier estimator of the marginal survival distribution we estimate a concordance summary, $C^\tau$, of 0.80, with $\tau$ fixed at 4000 days for this and subsequent analysis.

Table 3

Cox regression estimates for the PBC data

Covariate            Estimate   SE       Z

Model 1
Log(bilirubin)        0.877     0.099    8.866
Edema                 0.785     0.300    2.617
Albumin              -0.944     0.237   -3.985
Age                   0.033     0.009    3.881

Model 2
Edema                 1.190     0.295    4.031
Albumin              -1.314     0.223   -5.897

To quantify the impact of a single covariate on the accuracy of prediction we fit a second Cox regression model that does not include the covariate log(bilirubin). Table 3 displays coefficient estimates for this new four-covariate model. The estimate of $C^\tau$ drops from 0.80 to 0.73 when log(bilirubin) is excluded from the model. In addition, we can use the estimated AUC(t) curves shown in Figure 4a to quantify for each follow-up time $t$ the additional predictive accuracy that


[Figure 4, panels (a) and (b): AUC (axis from 0.4 to 1.0) plotted against time (days). Legend in panel (a): 5 covariates, iAUC = 0.796; 4 covariates, iAUC = 0.733.]

Figure 4. Incident/dynamic AUC plots for the Mayo PBC data. (a) Accuracy of the model score using five covariates, log(bilirubin), log(prothrombin), edema, albumin, and age, and the model score using four covariates (+), where log(bilirubin) is excluded. Lines plot the estimates of I/D AUC(t) versus time under the assumption of proportional hazards. (b) Accuracy of the model score using five covariates, log(bilirubin), log(prothrombin), edema, albumin, and age, and the model score using four covariates (+), where log(bilirubin) is excluded. Estimation is based on a varying-coefficient multiplicative hazard model. Lines plot the estimates of I/D AUC(t) versus time.


is obtained by using bilirubin in addition to the other model marker, Mi , or covariates, Z i , would be useful. Second, we

covariates. Relative to the ve-covariate model the estimated have proposed estimators that assume a prospective study

AUC(t) for the four-covariate model is approximately 0.10 design. Extension to casecohort data may be important

units below the ve-covariate model AUC(t) for t between 0 for characterizing the accuracy of markers for rare diseases.

and 2000 days. Third, development of analytical approximations that charac-

We then relax the proportional hazard assumption and use terize the large sample distribution of the proposed estimators

the time-varying coecient models as described in Section 3.2 would facilitate approximate inference for time-dependent

to characterize the accuracy of the model score Mi = Z Ti . ROC curves, the AUC(t) curve, or the concordance summary

The bottom panel of Figure 4 displays the AUC function C . Finally, exploration of time-dependent accuracy methods

based on the estimated time-varying coecient obtained us- with a longitudinal marker, Mi (t), would be important for

ing locally weighted MPL. Early estimates of AUC(t) now ex- the common prospective medical setting in which predictive

ceed 0.90 and decline sharply to approximately 0.75 at 2000 covariate information is updated over time.

days for the ve-covariate model and to less than 0.65 at

2000 days for the four-covariate model. Using the estimated Resume

AUC(t) reveals that the Mayo model is excellent at short-

term prediction but that the predictive accuracy declines to Ladequation dun modele de survie peut etre resumee grace

a des extensions du pourcentage de variabilite expliquee par

AUC(t) < 0.80 by 1 year for the model without bilirubin, and le modele, ou R2, utilise habituellement pour les modeles

to AUC(t) < 0.80 by 5 years for the ve-covariate model. Fi- expliquant une reponse continue, ou grace a des extensions

nally, using the time-varying coecient produces a global con- de la sensibilite et specicite, utilisees habituellement pour

cordance summary of 0.80 for the ve-covariate model and predire une reponse binaire. Dans cet article nous proposons

0.72 for the model that excludes bilirubin. une version dependant du temps de ladequation, en utilisant

des fonctions du temps de la sensibilite et la specicite cal-

culees sur les groupes a risque. Nous relions les resumes de

5. Discussion ladequation a une mesure globale de la concordance, proposee

This article introduces a new version of time-dependent sen- auparavant, qui est une extension du tau de Kendall. De plus,

sitivity, specicity, and associated ROC curves that are useful nous montrons comment utiliser les resultats obtenus par un

for characterizing the predictive accuracy of a scalar marker, modele de Cox an dobtenir les estimations de la sensibilite et

such as a derived model score, when the outcome is a cen- la specicite dependant du temps ainsi que des courbes ROC

(Receiver Operating Characteristic) dependant du temps. Des

sored survival time. We show that the area under the time-

methodes destimation semi-parametrique adaptees a la fois

specic ROC curves can be plotted as a function of time to aux modeles a hasards proportionnels et non proportionnels

characterize temporal changes in accuracy, and can be inte- sont presentees, evaluees par des simulations et illustrees par

grated using the marginal distribution of the failure time to deux jeux de donnees de survie.

provide a global concordance summary. Incident sensitivity

and dynamic specicity are shown to be easily estimated us-

References

ing a tted hazard model and the empirical distribution of

the marker data within risk sets. Using only a routine Cox Agresti, A. (2002). Categorical Data Analysis, 2nd edition.

model output allows estimates of accuracy that assume pro- New York: John Wiley & Sons.

portional hazards and simple regression smoothing of scaled Akritas, M. G. (1994). Nearest neighbor estimation of a bi-

Schoenfeld residuals provides accuracy summaries appropri- variate distribution under random censoring. Annals of

ate for markers that do not satisfy proportional hazards. Sim- Statistics 22, 12991327.

ulations suggest that residual smoothing and locally weighted Anderson, J. A. (1979). Multivariate logistic compounds.

partial likelihood estimators both provide feasible and accu- Biometrika 66, 1726.

rate estimates. Cai, T., Pepe, M. S., Lumley, T., Zheng, Y., and Jenny, N. S.

Our methods explicitly decouple the generation of a pre- (2003). The sensitivity and specicity of markers for

dictive score from the evaluation of prognostic accuracy. An event times. University of Washington Technical Report

investigator may use Cox regression to create a model score 188, 130.

Mi = Z Ti that is a time-invariant linear combination of base- Cai, Z. and Sun, Y. (2003). Local linear estimation for time-

line covariates Z i . However, using the exible methods pro- dependent coecients in Coxs regression models. Scan-

posed in Section 3.2 to evaluate the prognostic potential of dinavian Journal of Statistics 30, 93111.

Mi does not require commitment to the proportional hazards Cox, D. R. (1972). Regression models and life-tables (with

assumption. A practical advantage of using Mi = Z Ti is that discussion). Journal of the Royal Statistical Society, Series

a single scoring of the baseline covariates is conducted to B, Methodological 34, 187220.

generate Mi , but if proportional hazards is clearly violated Efron, B. (1979). Bootstrap methods: Another look at the

then a more general model such as 0 (t) exp[Z Ti (t)] may be jackknife. Annals of Statistics 7, 126.

appropriate, and would lead to a time-varying score Mi (t) = Etzioni, R., Pepe, M., Longton, G., Hu, C., and Goodman,

Z Ti (t). G. (1999). Incorporating the time dimension in receiver

A number of aspects warrant additional research. First, operating characteristic curves: A case study of prostate

estimation methods proposed in Sections 3.1 and 3.2 as- cancer. Medical Decision Making 19, 242251.

sume that the censoring time is independent of the survival Fan, V., Au, D., Heagerty, P., Deyo, R., McDonell, M., and

time. Relaxation to allow conditional independence given the Fihn, S. (2002). Validation of case-mix measures derived

Survival Model Predictive Accuracy and ROC Curves 105

ical Epidemiology 55, 371380.

Concordance as Function of AUC(t)

Fleming, T. R. and Harrington, D. P. (1991). Counting Pro-

cesses and Survival Analysis. New York: John Wiley & Assume independent observations (Mj , Tj ) and (Mk , Tk ), and

Sons. assume that Tj is continuous such that P (Tk = Tj ) = 0. Let

Grambsch, P. M. and Therneau, T. M. (1994). Proportional P (x) denote probability or density depending on the context:

hazards tests and diagnostics based on weighted residu- 1

P [Tj < Tk ] = (by independence)

als (Corr: 1995, 82, 668). Biometrika 81, 515526. 2

of the area under the receiver operating characteristic P [Mj > Mk | Tj < Tk ]

(ROC) curve. Radiology 143, 2936.

Harrell, F. E., Lee, K. L., and Mark, D. B. (1996). Multi- = P [{Mj > Mk } {Tj < Tk }] 2

variable prognostic models: Issues in developing models,

evaluating assumptions and adequacy, and measuring

and reducing errors. Statistics in Medicine 15, 361 = P [{Mj > Mk } {Tj = t} {t < Tk }] 2 dt

t

387.

Hastie, T. and Tibshirani, R. (1993). Varying-coecient mod-

els. Journal of the Royal Statistical Society, Series B 55, = P [{Mj > Mk } | {Tj = t} {t < Tk }] 2

757796. t

dependent ROC curves for censored survival data and a diagnostic marker. Biometrics 56, 337–344.

Kalbfleisch, J. D. and Prentice, R. L. (2002). The Statistical Analysis of Failure Time Data. New York: John Wiley & Sons.

Korn, E. L. and Simon, R. (1990). Measures of explained variation for survival data. Statistics in Medicine 9, 487–503.

O'Quigley, J. and Xu, R. (2001). Explained variation in proportional hazards regression. In Handbook of Statistics in Clinical Oncology, J. Crowley (ed), 397–409. New York: Marcel Dekker.

Pepe, M. S. (2003). The Statistical Evaluation of Medical Tests for Classification and Prediction. Oxford: Oxford University Press.

Schemper, M. and Henderson, R. (2000). Predictive accuracy and explained variation in Cox regression. Biometrics 56, 249–255.

Slate, E. H. and Turnbull, B. W. (2000). Statistical models for longitudinal biomarkers of disease onset. Statistics in Medicine 19, 617–637.

Xu, R. and O'Quigley, J. (2000). Proportional hazards estimate of the conditional survival function. Journal of the Royal Statistical Society, Series B, Methodological 62, 667–680.

Zheng, Y. and Heagerty, P. (2004). Semiparametric estimation of time-dependent ROC curves for longitudinal marker data. Biostatistics 5, 615–632.

Zhou, X.-H., McClish, D. K., and Obuchowski, N. A. (2002). Statistical Methods in Diagnostic Medicine. New York: John Wiley & Sons.

Received August 2003. Revised March 2004. Accepted March 2004.

\[
\begin{aligned}
&\phantom{{}={}} P[\{T_j = t\} \cap \{t < T_k\}] \, dt \\
&= \int_t \mathrm{AUC}(t) \, 2 \, P[T_j = t] \, P[t < T_k] \, dt \\
&= \int_t \mathrm{AUC}(t) \, w(t) \, dt = E_T[\mathrm{AUC}(T) \, 2 \, S(T)],
\end{aligned}
\]

with $w(t) = 2 f(t) S(t)$.

Hazard as Bridge from $P(M_i = m \mid T_i \ge t)$ to $P(M_i = m \mid T_i = t)$

Let $P(x)$ denote probability or density depending on the context and specific assumptions. For either continuous or discrete survival times the conditional hazard can be defined as

\[
\lambda(t \mid M_i = m) = P(T_i = t \mid M_i = m) / P(T_i \ge t \mid M_i = m).
\]

Let $P(m)$ denote the marginal density or distribution of the marker $M$. Following Xu and O'Quigley (2000) we obtain the following general relationship:

\[
\begin{aligned}
P(M_i = m \mid T_i = t)
&= P(T_i = t \mid M_i = m) \, P(M_i = m) / P(T_i = t) \\
&= \lambda(t \mid M_i = m) \, P(T_i \ge t \mid M_i = m) \, P(M_i = m) / P(T_i = t) \\
&= \lambda(t \mid M_i = m) \, P(M_i = m \mid T_i \ge t) \, P(T_i \ge t) / P(T_i = t),
\end{aligned}
\]

\[
P(M_i = m \mid T_i = t) \propto \lambda(t \mid M_i = m) \, P(M_i = m \mid T_i \ge t).
\]
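
The proportionality relation above can be checked numerically on a small discrete example. The sketch below is illustrative only. The joint distribution `joint` and the function names `direct_conditional` and `hazard_bridge_conditional` are hypothetical and do not come from the article. It computes the distribution of the marker among subjects failing at time t in two ways: directly from the joint distribution, and by weighting the risk-set marker distribution by the conditional hazard and then normalizing.

```python
# Hypothetical joint pmf P(M = m, T = t) for a discrete marker and discrete
# survival time (made-up numbers; rows are marker values, columns are times).
joint = [
    [0.10, 0.10, 0.20],  # m = 0
    [0.20, 0.15, 0.05],  # m = 1
    [0.12, 0.05, 0.03],  # m = 2
]


def direct_conditional(joint, t):
    """P(M = m | T = t) computed straight from the joint pmf."""
    column = [row[t] for row in joint]
    total = sum(column)  # P(T = t)
    return [p / total for p in column]


def hazard_bridge_conditional(joint, t):
    """P(M = m | T = t) via hazard(t | m) * P(M = m | T >= t), normalized."""
    # P(M = m, T >= t): mass remaining at risk at time t for each marker value.
    at_risk = [sum(row[t:]) for row in joint]
    total_at_risk = sum(at_risk)  # P(T >= t)
    # Conditional hazard: P(T = t | M = m) / P(T >= t | M = m).
    hazard = [row[t] / r for row, r in zip(joint, at_risk)]
    # Marker distribution within the risk set: P(M = m | T >= t).
    marker_given_at_risk = [r / total_at_risk for r in at_risk]
    # Right-hand side of the proportionality relation, then normalized over m.
    unnormalized = [h * p for h, p in zip(hazard, marker_given_at_risk)]
    norm = sum(unnormalized)
    return [u / norm for u in unnormalized]


if __name__ == "__main__":
    for t in range(3):
        print(t, direct_conditional(joint, t), hazard_bridge_conditional(joint, t))
```

The two routes agree at every time point up to floating-point error, because the normalizing constant removed by the proportionality sign is P(T_i = t)/P(T_i ≥ t), which does not depend on m.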
