Applications in survival analysis

Article  in  Journal of Animal Science · February 1999



All content following this page was uploaded by Stephen D Kachman on 27 January 2014.



Applications in Survival Analysis
Stephen D. Kachman

J ANIM SCI 1999, 77:147-153.

The online version of this article, along with updated information and services, is located on
the World Wide Web at:
http://jas.fass.org/content/77/E-Suppl_2/147

www.asas.org

Downloaded from jas.fass.org by guest on December 31, 2011


Breeding and Genetics 5: Survival Analysis/Threshold Models

Applications in Survival Analysis1,2


STEPHEN D. KACHMAN
Department of Biometry, University of Nebraska-Lincoln, Lincoln 68583-0712

ABSTRACT

Survival or failure time traits such as herd life and days open are both important economically and pose a number of challenges to an analysis based on linear mixed models. The main features of a survival trait are that it is the time until some event occurs and that some of the observations are censored. Survival models and the associated estimation procedures provide a flexible means of modeling survival traits. In this paper I will discuss the application of survival analysis based on the Weibull distribution. The components that make up a survival model will be presented along with their interpretation. Issues related to the model construction and estimation will be presented.

1999 American Society of Animal Science and American Dairy Science Association. All rights reserved.

(Key words: survival, failure time, mixed model)

INTRODUCTION

Evaluation of traits that are measured in days, months, or years poses a number of challenges. These traits consist of the length of time between two events. For example, a breeder may be interested in the length of productive life. The trait would then be the length of time an animal is productive. The breeder is then faced with the following challenges. First, the endpoints of the interval must be defined. Second, how will a record be treated if the animal leaves the herd for a factor unrelated to production? Third, how will a record be treated if the animal is still productive when the evaluation takes place? Fourth, the distribution is heavily skewed. Survival analysis (3, 8) is an approach to analyzing traits such as these.

As its name implies, survival analysis is typically used to examine either the length of time an individual survives or the length of time until a part fails (1, 2, 4, 11). However, survival analysis is also applicable when monitoring length of time until success.

Software such as Proc Lifereg (9) for fixed effects models and, more recently, software such as Survival Kit (5, 6) for mixed models are available to analyze survival data. However, they are not of much use without a basic understanding of survival analysis. In addition, analysis of data sets encountered in animal breeding frequently tests the limits of general packages.

The objectives of this paper are to provide a brief introduction to the analysis of survival data and to discuss some of the details needed to modify existing programs to analyze survival traits.

MODEL

The time that animal $i$ fails, $T_i$, can be thought of as a random process, which depends on many factors. These factors can include fixed effects, $\beta$, such as the sex of the animal and random effects, $u$, such as the genetic merit of the animal. Typically, these are combined into a vector of risk factors for animal $i$, $\eta_i = x_i\beta + z_i u$. In an animal breeding analysis the distribution of the random effects is often assumed to be multivariate normal due to its flexibility in modeling complex covariance structures.

The probability that animal $i$ survives at least until time $t$, given its risk factor, is called the survival function

$$S(t; \eta_i) = \Pr(T_i \ge t) = 1 - F(t; \eta_i) = \int_t^{\infty} f(w; \eta_i)\, dw$$

where $T_i$ = time of failure, $F(t; \eta_i)$ = cumulative distribution function for $T_i$, and $f(t; \eta_i)$ = density function for $T_i$.

1 Presented at the 1998 ADSA/ASAS meeting in Denver, Colorado.
2 This manuscript has been assigned Journal Series No. 12524, Agricultural Research Division Office.


The challenge is then to develop a reasonable model for the survival function. Hazard functions provide one approach and will be discussed next.

Hazard Function

Models for survival analysis can be built from a hazard function, which measures the risk of failure of an individual at time $t$. The hazard function for animal $i$ at time $t$ is

$$\lambda(t; \eta_i) = \lim_{\Delta t \to 0} \frac{\Pr(T_i < t + \Delta t \mid T_i > t)}{\Delta t} = \frac{f(t; \eta_i)}{S(t; \eta_i)}.$$

Another way to look at the hazard function is that for short periods of time ($\Delta t$), the probability that an animal fails is approximately equal to $\lambda(t; \eta_i)\Delta t$. From its definition, the hazard function must be nonnegative. In addition it must be positive at time $t$ unless there is no risk of failure at time $t$. Without going into detail, the survival function can be obtained from the hazard function with the following relationship

$$S(t; \eta_i) = e^{-\Lambda(t; \eta_i)}$$

where $\Lambda(t; \eta_i) = \int_0^t \lambda(w; \eta_i)\, dw$. The cumulative distribution and density functions follow directly:

$$F(t; \eta_i) = 1 - S(t; \eta_i)$$

$$f(t; \eta_i) = \lambda(t; \eta_i)\, S(t; \eta_i).$$

If we assume that the risk of failure is constant over time, we get the following hazard function: $\lambda(t; \eta_i) = \lambda$. The resulting density and survival functions are

$$f(t; \eta_i) = \lambda e^{-\lambda t}$$

$$S(t; \eta_i) = e^{-\lambda t},$$

which is an exponential model (Figure 1).

The constant hazard function will produce a population in which the chance of an animal surviving an additional 5 yr is the same at birth and at 5 and 10 yr. A generalization of this would be for the hazard to either increase or decrease over time.

The Weibull model, which has the following hazard and survival functions

$$\lambda(t; \eta_i) = \rho\lambda(\lambda t)^{\rho - 1}$$

$$S(t; \eta_i) = e^{-(\lambda t)^{\rho}},$$

has the flexibility to model increasing or decreasing hazards. When $\rho = 1$ the Weibull distribution reduces to the exponential distribution. The Weibull model has a decreasing hazard function when $\rho < 1$ and an increasing hazard function when $\rho > 1$. The Weibull hazard and survival functions are presented in Figures 2 and 3.

Figure 2. Weibull hazard function, $\rho\lambda(\lambda t)^{\rho - 1}$, where $\lambda = 1/5$ and rate parameter ($\rho$) of 0.5 (solid), 1 (dashed), or 2 (dotted).

As can be seen from Figure 3, for a given $\lambda$ the survival functions all intersect at $t = 1/\lambda$. At $t = 1/\lambda$ the percentage survival is equal to $\exp(-1) \approx 37\%$. The rate parameter $\rho$ determines how quickly the survival function drops off. For small values of the rate parameter, the survival function drops quickly and then levels off. For large values of the rate parameter, the survival function starts off fairly level and then drops off suddenly. The effect of changes in the rate parameter can be seen in Figure 4.
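The behavior of the Weibull hazard and survival functions described above can be checked numerically. The following Python sketch is only an illustration: the function names `weibull_hazard` and `weibull_survival` and the evaluation times are assumptions made here, with `lam` standing for λ and `rho` for the rate parameter ρ. It prints the hazard at three times and the survival at t = 1/λ for the three rate parameters used in the figures.

```python
import math


def weibull_hazard(t, lam, rho):
    """Weibull hazard rho * lam * (lam * t)**(rho - 1) for t > 0."""
    return rho * lam * (lam * t) ** (rho - 1.0)


def weibull_survival(t, lam, rho):
    """Weibull survival function exp(-(lam * t)**rho)."""
    return math.exp(-((lam * t) ** rho))


lam = 1.0 / 5.0  # same value of lambda as in Figures 2 to 4
times = (1.0, 5.0, 10.0)  # illustrative evaluation times (assumption)

for rho in (0.5, 1.0, 2.0):
    hazards = [weibull_hazard(t, lam, rho) for t in times]
    surv_at_crossing = weibull_survival(1.0 / lam, lam, rho)
    print(f"rho={rho}: hazards={[round(h, 4) for h in hazards]}, "
          f"S(1/lambda)={surv_at_crossing:.4f}")
```

With ρ = 1 the hazard stays at λ for every time, with ρ = 0.5 it falls, and with ρ = 2 it rises, while the survival at t = 1/λ is exp(-1) in all three cases.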
Figure 1. Exponential survival function, $e^{-\lambda t}$, where $\lambda = 1/2.5$ (solid) or $\lambda = 1/5$ (dashed).

With the Weibull model, $\lambda$ plays the role of adjusting the intercept. To emphasize the intercept role, the Weibull survival function can be rewritten as

J. Anim. Sci. Vol. 77, Suppl. 2/J. Dairy Sci. Vol. 82, Suppl. 2/1999


$$S(t; \eta_i) = e^{-\exp[\rho \ln(t) + \rho \ln(\lambda)]} = e^{-\exp[\rho \ln(t) + \eta]} = e^{-t^{\rho} e^{\eta}} \tag{1}$$

where $\eta = \rho \ln(\lambda)$. The corresponding hazard function is $\lambda(t; \eta_i) = \rho t^{\rho - 1} e^{\eta}$.

The hazard function is now the product of two parts. The baseline part, $\lambda_0(t) = \rho t^{\rho - 1}$, models the basic shape of the hazard function and, therefore, the shape of the density and survival functions. The scaling part, $e^{\eta}$, models the relative risk above or below the baseline risk.

In many cases it is reasonable to assume that the basic shape remains constant for different risk factors but that certain risk factors either increase ($\eta > 0$) or decrease ($\eta < 0$) the overall risk of failure. Proportional hazard models are distributions for which hazard functions can be factored into a baseline hazard, which does not depend on the risk factors, and a scaling factor, which does not depend on time. Typically, the scaling function is $\exp(\eta_i)$ with $\eta_i$ being a scalar. The hazard function can then be written as

$$\lambda(t; \eta_i) = \lambda_0(t)\, e^{\eta_i}$$

where $\lambda_0(t) = \lambda(t; 0)$ = baseline hazard function. The survival function can then be written as

$$S(t; \eta_i) = e^{-\Lambda_0(t)\, e^{\eta_i}}$$

where $\Lambda_0(t) = \Lambda(t; 0)$. The role of the risk factor $\eta_i$ will be examined next.

Figure 4. Weibull density function where $\lambda = 1/5$ and rate parameter ($\rho$) of 0.5 (solid), 1 (dashed), or 2 (dotted).

Risk Factor $\eta_i$

The vector of risk factors is a linear combination of fixed and random effects

$$\eta = X\beta + Zu$$

where $\beta$ = vector of fixed effects, $u \sim N(0, G)$ = vector of random effects, and $X$ and $Z$ = known design matrices. The vector of risk factors differs from the usual mixed model equation

$$y = X\beta + Zu + e$$

in several important ways. First, it does not include a residual component ($e$). In the survival model the residual variability is modeled through the survival distribution. Second, the expected survival time, given the random effects, is not equal to $X\beta + Zu$ as in the mixed model. Third, larger values of the risk factor lead to shorter expected survival times.

Under the Weibull survival model, the survival function [1] for animal $i$ can be written as

$$S(t; \eta_i) = e^{-\exp[\rho \ln(t) + \eta_i]}. \tag{2}$$
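As a quick numerical check of the reparameterization η = ρ ln(λ) and of the proportional hazards factorization, the following Python sketch compares the two forms of the Weibull survival function and shows that the ratio of two hazards differing by a constant in the risk factor does not depend on time. The function names and the numerical values are illustrative assumptions, not part of the original paper.

```python
import math


def survival_lambda_form(t, lam, rho):
    """Weibull survival written with lambda: exp(-(lam * t)**rho)."""
    return math.exp(-((lam * t) ** rho))


def survival_eta_form(t, eta, rho):
    """Weibull survival written with the risk factor: exp(-exp(rho*ln(t) + eta))."""
    return math.exp(-math.exp(rho * math.log(t) + eta))


def hazard(t, eta, rho):
    """Baseline hazard rho * t**(rho - 1) times the scaling part exp(eta)."""
    return rho * t ** (rho - 1.0) * math.exp(eta)


lam, rho = 0.2, 2.0          # illustrative values (assumption)
eta = rho * math.log(lam)    # risk factor implied by lambda

for t in (1.0, 5.0, 10.0):
    print(t, survival_lambda_form(t, lam, rho), survival_eta_form(t, eta, rho))

# Raising the risk factor by 0.5 multiplies the hazard by exp(0.5) at every time.
ratios = [hazard(t, eta + 0.5, rho) / hazard(t, eta, rho) for t in (1.0, 5.0, 10.0)]
print(ratios, math.exp(0.5))
```

The two survival columns agree up to rounding error, and every hazard ratio equals exp(0.5), which is what "proportional" refers to.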
Figure 3. Weibull survival function, $e^{-(\lambda t)^{\rho}}$, where $\lambda = 1/5$ and rate parameter ($\rho$) of 0.5 (solid), 1 (dashed), or 2 (dotted).

The effect of changes in the risk factor on median survival time, $m_{\eta_i}$, can be found by solving $S(m_{\eta_i}; \eta_i) = 0.5$. After a bit of algebra

$$m_{\eta_i} = [-\ln(0.5)]^{1/\rho}\, e^{-\eta_i/\rho}.$$

The effect of a $\Delta$ unit change in $\eta_i$ on median survival time is

$$m_{\eta_i + \Delta} = m_{\eta_i}\, e^{-\Delta/\rho}.$$

For example, let $\rho = 2$, and the risk factor for males be 0.5 larger than the risk factor for females; then the median survival time for males would be approximately 78% ($e^{-0.5/2} \approx 0.78$) of the median survival time of comparable females. The effects of changes in the risk factor on median survival time are shown in Figure 5.

To summarize, larger risk factors correspond to higher risks and shorter survival times. In addition, an additive change in the risk factor results in a multiplicative change in median survival time.
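The male and female example can be reproduced with a few lines of Python. This is a minimal sketch: the function names and the female risk factor of -3.0 are hypothetical values chosen here for illustration; only ρ = 2 and the 0.5 difference in risk factors come from the text.

```python
import math


def weibull_median(eta, rho):
    """Median survival time [-ln(0.5)]**(1/rho) * exp(-eta/rho)."""
    return (-math.log(0.5)) ** (1.0 / rho) * math.exp(-eta / rho)


def weibull_survival(t, eta, rho):
    """Weibull survival function exp(-exp(rho*ln(t) + eta))."""
    return math.exp(-math.exp(rho * math.log(t) + eta))


rho = 2.0
eta_female = -3.0             # hypothetical risk factor for females
eta_male = eta_female + 0.5   # males carry a risk factor 0.5 larger

m_female = weibull_median(eta_female, rho)
m_male = weibull_median(eta_male, rho)

print(round(m_male / m_female, 4))                        # 0.7788, i.e. exp(-0.5/2)
print(round(weibull_survival(m_female, eta_female, rho), 4))  # 0.5 at the median
```

The ratio does not depend on the value chosen for the female risk factor, which is the sense in which an additive change in the risk factor acts multiplicatively on the median.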

Figure 5. Effect of changes in the risk factor on median survival time with rate parameter ($\rho$) of 0.5 (solid), 1 (dashed), or 2 (dotted).

ESTIMATION

The basic approaches to estimation include nonparametric, semi-parametric, and parametric. The focus of this paper is on the parametric approach. The parametric approach is better suited to handle the large complex models encountered in animal breeding. The basic parametric approach involves getting the joint likelihood of the survival times and the random effects. In simple cases, the marginal likelihood of survival time can be obtained by integrating over random effects. The marginal likelihood can also be approximated by taking a second order Taylor's series expansion of the joint log-likelihood. From a Bayesian viewpoint, that would be equivalent to obtaining the posterior mode.

Ignoring an additive constant, the joint log-likelihood for the Weibull distribution is

$$l(\beta, u, \rho) = \sum_i \left[\ln(\rho/t_i) + \rho \ln(t_i) + \eta_i - \exp(\rho \ln(t_i) + \eta_i)\right] - \tfrac{1}{2} \ln|G| - \tfrac{1}{2} u'G^{-1}u.$$

Written in a slightly more general form

$$l(\beta, u, \rho) = \sum_i \left[\ln(\lambda_0(t_i)) + \eta_i - \Lambda_0(t_i)\exp(\eta_i)\right] - \tfrac{1}{2} \ln|G| - \tfrac{1}{2} u'G^{-1}u. \tag{3}$$

Posterior mode estimates of the fixed and random effects can then be obtained by taking the first and second partial derivatives of [3]. After a little algebra, the resulting estimation equations are

$$\begin{bmatrix} X'RX & X'RZ \\ Z'RX & Z'RZ + G^{-1} \end{bmatrix} \begin{bmatrix} \hat{\beta} \\ \hat{u} \end{bmatrix} = \begin{bmatrix} X'y^* \\ Z'y^* \end{bmatrix} \tag{4}$$

where

$$R = -\frac{\partial^2 l}{\partial \eta\, \partial \eta'}, \qquad R_{ii} = \Lambda_0(T_i)\, e^{\eta_i},$$

$$y^* = \frac{\partial l}{\partial \eta} + R\eta, \qquad y_i^* = 1 - \Lambda_0(T_i)\, e^{\eta_i} + R_{ii}\eta_i,$$

which is very similar to the usual mixed model equations. Because these equations must be solved iteratively, the computational time will be several times greater than for a corresponding linear model when variance components are known. However, if the variance components must be estimated, computational times will be similar to the corresponding linear model. Approximate standard errors and tests are obtained as in the linear model case.

CENSORING

In this section the effect of censoring will be discussed. Unlike traits such as yearling weight, data on survival traits are often censored. That is, the survival time may either be known to be greater than a certain amount (right censored), less than a certain amount (left censored), or be within a certain range (double censored). Of the three types of censoring, right censoring is the most common. Right censoring can occur because an animal is removed before failure can be observed or because failure occurred after the end of data collection. Left censoring can occur because an animal failed before data collection began. Double censoring can occur if there is a break in data collection, and an animal fails somewhere in that interval.

In the following examination of censoring, time of censoring and survival time will be assumed to be independent. Attention will also be focused on handling right censoring. Conceptually other types of censoring are handled similarly.

For data that are right censored, the time of censoring is observed instead of the time of failure. Let $T_i$ = observed time at which an animal has failed or the time at which the record was censored. If a record is uncensored, then the density function of $T_i$ is

$$f(T_i; \eta_i) = \lambda(T_i; \eta_i)\, e^{-\Lambda(T_i; \eta_i)} = \lambda(T_i; \eta_i)\, S(T_i; \eta_i).$$

If a record is censored, then the probability mass function of $T_i$ is obtained by integrating $f(t; \eta_i)$ from $T_i$ to $\infty$ yielding

$$S(T_i; \eta_i) = e^{-\Lambda(T_i; \eta_i)}.$$

The log likelihood for animal $i$ is

$$l_i = W_i \ln(\lambda(T_i; \eta_i)) - \Lambda(T_i; \eta_i)$$

where $W_i = 1$ if a record is uncensored and $W_i = 0$ if a record is censored. The corresponding elements in [4] are

$$R_{ii} = \Lambda_0(T_i)\, e^{\eta_i} \tag{5}$$

and

$$y_i^* = W_i - \Lambda_0(T_i)\, e^{\eta_i} + R_{ii}\eta_i. \tag{6}$$

Censoring can lead to difficulties in parameter estimation. When animal $i$ is right censored at $T_i$, the log likelihood is

$$l_i = -\Lambda_0(T_i)\, e^{\eta_i};$$

taking the partial with respect to $\eta_i$ yields

$$\frac{\partial l_i}{\partial \eta_i} = -\Lambda_0(T_i)\, e^{\eta_i}.$$

Because $\Lambda_0(T_i)$ is positive, and $e^{\eta_i}$ is positive for all values of $\eta_i$, the partial is always negative. The implication is that if all the records in a fixed effect group are right censored, then the estimate of the risk factor for that fixed effect group will go to $-\infty$.

TIME-DEPENDENT COVARIATES

Various events in an animal's life can lead to changes in its risk of failure. For example, the underlying risk in a herd can change over time because of management, disease, or economic forces. A sharp drop in the price of milk would increase a dairy cow's risk of being culled. However, we would not expect the drop to have an impact on an animal prior to the drop taking place. These changes can also be on an individual animal basis for factors such as disease and reproductive status. Figure 6 illustrates a hazard function for an animal that becomes ill at 2 yr and recovers at 3.5 yr. The risk of failure increases during the period of illness and decreases when the animal recovers. These changes in the risk of failure can be modeled using a time-dependent covariate.

The record on the animal is broken into three conditionally independent observations: 1) a well animal with a survival time greater than 2 yr, 2) an ill animal with a survival time greater than 3.5 yr conditioned on survival until 2 yr, and 3) a recovered animal conditioned on survival until 3.5 yr. The resulting log likelihood for the animal at time $t > 3.5$ yr is

$$l_i(t) = \ln(S(2; \eta_{i0})) - \ln(S(2; \eta_{i1})) + \ln(S(3.5; \eta_{i1})) - \ln(S(3.5; \eta_{i2})) + \ln(f(t; \eta_{i2}))$$

where $\eta_{i0}$, $\eta_{i1}$, and $\eta_{i2}$ = risk factors for animal $i$ when it is well, ill, and recovered, respectively. The likelihood is obtained by observing that

$$f(t; \eta_i \mid t > C) = \frac{f(t; \eta_i)}{S(C; \eta_i)}$$

$$l_i(t \mid t > C) = \ln(f(t; \eta_i)) - \ln(S(C; \eta_i))$$

where $f(t; \eta_i \mid t > C)$ = density of animal $i$ surviving to time $t$, conditioned on its surviving to time $C$, and $l_i(t \mid t > C)$ = corresponding contribution to the log likelihood.

PROGRAMMING ISSUES

Existing mixed model programs can be modified to handle the analysis of a survival trait with relatively small changes. The changes that need to be made include repeatedly building and solving the mixed model equations with updated risk factors. Within the portion of the program that builds the mixed model equations, risk factors, $\eta_i$, adjusted weights, $R_{ii}$, and adjusted dependent variables, $y_i^*$, need to be calculated for each animal. The adjusted risk factors can be calculated within the main body of the program. The adjusted weights and dependent variables are best calculated using a link function to ease future modifications.
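The cycle of recomputing risk factors, weights, and adjusted dependent variables and then rebuilding and solving the equations can be sketched in plain Python. This is a small illustration under stated assumptions, not the author's program: the names `link`, `solve`, and `fit` are hypothetical, each record's design row combines the fixed and random effect columns, the inverse of G is taken to be diagonal, the rate parameter is treated as known, the equations are stored densely, and the four records are made-up data.

```python
import math


def link(t, eta, w, rho):
    """Weight R_ii and adjusted dependent variable y*_i, equations [5] and [6]."""
    cum_base = math.exp(rho * math.log(t))   # integrated baseline hazard t**rho
    r = cum_base * math.exp(eta)
    ystar = w - r + r * eta
    return r, ystar


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda k: abs(m[k][c]))
        m[c], m[p] = m[p], m[c]
        for k in range(c + 1, n):
            f = m[k][c] / m[c][c]
            for j in range(c, n + 1):
                m[k][j] -= f * m[c][j]
    x = [0.0] * n
    for c in reversed(range(n)):
        x[c] = (m[c][n] - sum(m[c][j] * x[j] for j in range(c + 1, n))) / m[c][c]
    return x


def fit(rows, times, uncensored, rho, ginv_diag, n_iter=50):
    """Repeatedly build and solve equations [4] with updated risk factors."""
    neff = len(ginv_diag)
    sol = [0.0] * neff
    for _ in range(n_iter):
        # Start the left-hand side from the (assumed diagonal) inverse of G.
        lhs = [[ginv_diag[j] if j == k else 0.0 for k in range(neff)]
               for j in range(neff)]
        rhs = [0.0] * neff
        for x, t, w in zip(rows, times, uncensored):
            eta = sum(xj * sj for xj, sj in zip(x, sol))
            r, ystar = link(t, eta, w, rho)
            for j in range(neff):
                rhs[j] += x[j] * ystar
                for k in range(neff):
                    lhs[j][k] += x[j] * r * x[k]
        sol = solve(lhs, rhs)
    return sol


# Made-up data: an overall mean plus two levels of one random effect.
rows = [[1.0, 1.0, 0.0], [1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [1.0, 0.0, 1.0]]
times = [2.0, 4.0, 6.0, 8.0]
uncensored = [1, 1, 0, 1]          # W_i = 0 marks a right-censored record
solution = fit(rows, times, uncensored, rho=1.0, ginv_diag=[0.0, 1.0, 1.0])
print([round(s, 4) for s in solution])
```

A fixed number of iterations keeps the sketch short; a practical program would instead monitor the change in the solutions between rounds and would exploit the sparsity of the equations.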
The basic changes needed within the main body of
the program are illustrated below:

      DO I=1,N                       ! Loop to read in the N records.
        Read in record
        ETA=0                        ! Calculate the risk factor (eta_i = ETA)
        DO J=1,NEFF                  ! based on the solution for the NEFF fixed
          ETA=ETA+X*SOL              ! and random effects (SOL) and the design
        END DO                       ! matrix (X).
        CALL LINK(Y,R,ETA,W,YSTAR)   ! Calculate the weight (R_ii = R) and
                                     ! adjusted dependent variable (y*_i = YSTAR)
                                     ! based on the failure time (T_i = Y) and
                                     ! censor code (W_i = W).
        Build LHS and RHS
      END DO

The two additions are the calculation of the risk factor for the animal, ETA, and the addition of a link subroutine, LINK().

The basics of the link subroutine, assuming $\rho$ is known, are

      SUBROUTINE LINK(Y,R,ETA,W,YSTAR)
      REAL*8 Y,R,ETA,W,YSTAR,LAMBDA,RHO
      RHO=1.D0                       ! Known rate parameter (rho = RHO)
      LAMBDA=EXP(RHO*LOG(Y))         ! Integrated baseline hazard function
                                     ! (Lambda_0(T_i) = LAMBDA)
      R=LAMBDA*EXP(ETA)              ! Weight (R_ii = R) [5]
      YSTAR=W-LAMBDA*EXP(ETA)+R*ETA  ! Adjusted dependent variable
                                     ! (y*_i = YSTAR) [6]
      RETURN
      END

Within the link subroutine, the one line that depends on the hazard function selected is the calculation of the integrated baseline hazard function, LAMBDA.

Although the basics are straightforward, the usefulness of the program will depend on several additional details. First, the above modifications depend on the rate parameter RHO being known. In practice it will need to be estimated. Estimation of the rate parameter can be treated as a second trait and estimates obtained by taking partial derivatives of the log likelihood with respect to the rate parameter. The weight matrix and adjusted dependent variables for animal $i$ are

$$R_{ii} = \begin{bmatrix} \Lambda(T_i; \eta_i, \rho_i) & \Lambda(T_i; \eta_i, \rho_i) \ln(T_i) \\ \Lambda(T_i; \eta_i, \rho_i) \ln(T_i) & \Lambda(T_i; \eta_i, \rho_i) \ln(T_i)^2 + \dfrac{1}{\rho_i^2} \end{bmatrix}$$

$$y_i^* = \begin{bmatrix} 1 - \Lambda(T_i; \eta_i, \rho_i) \\ \left(1 - \Lambda(T_i; \eta_i, \rho_i)\right) \ln(T_i) + \dfrac{1}{\rho_i} \end{bmatrix} + R_{ii} \begin{bmatrix} \eta_i \\ \rho_i \end{bmatrix}$$

where $\rho_i$ = rate parameter for animal $i$. Typically the model equation for the rate parameter is $\rho_i = \rho$ or in matrix form

$$\boldsymbol{\rho} = \mathbf{1}\rho$$

where $\boldsymbol{\rho}$ = vector of rate parameters.

In addition, simultaneous estimation of the rate parameter and the risk factor can lead to convergence problems. Often it will be necessary to initially fix the rate parameter to obtain reasonable estimates of the risk factors. Second, risk factors have a tendency to go to $\pm\infty$. Generally this effect is due to contemporary groups in which all observations are right censored or due to inclusion of time-dependent covariates. The basic way of handling this effect is to provide bounds for the risk factors. For example, bounds for $\rho \ln(T_i) + \eta_i$ would be $-7$ and 2.5. Third, time-dependent covariates can be handled by preprocessing the data to produce multiple coded records.

Figure 6. Changes in the hazard function for an animal that starts out well (solid), is ill (dashed) at time 2, and recovers (dotted) at time 3.5.
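One way to picture the preprocessing of time-dependent covariates is to split each animal's record at the times its risk factor changes, as in the example of an animal that becomes ill at 2 yr and recovers at 3.5 yr. The Python sketch below is an assumption about how such coded records might look, not the layout used by any particular program: each piece is a hypothetical tuple (start, stop, w, eta), the risk factor values are invented, and each piece contributes W ln λ(stop) - Λ(stop) + Λ(start) to the log likelihood, where the last term conditions on survival to the start of the piece.

```python
import math


def cum_hazard(t, eta, rho):
    """Weibull cumulative hazard t**rho * exp(eta)."""
    return (t ** rho) * math.exp(eta)


def log_hazard(t, eta, rho):
    """Log of the Weibull hazard rho * t**(rho - 1) * exp(eta)."""
    return math.log(rho) + (rho - 1.0) * math.log(t) + eta


def split_record(end_time, failed, changes):
    """Split one animal's record at the times its risk factor changes.

    changes is a list of (start_time, eta) pairs sorted by start_time,
    beginning at time 0. Each output piece is (start, stop, w, eta),
    where w = 1 only on the last piece of a record that ended in failure.
    """
    pieces = []
    for k, (start, eta) in enumerate(changes):
        if start >= end_time:
            break
        next_start = changes[k + 1][0] if k + 1 < len(changes) else math.inf
        stop = min(next_start, end_time)
        w = 1 if (failed and stop == end_time) else 0
        pieces.append((start, stop, w, eta))
    return pieces


def piece_loglik(start, stop, w, eta, rho):
    """Log likelihood of one piece, conditioned on survival to start."""
    return (w * log_hazard(stop, eta, rho)
            - cum_hazard(stop, eta, rho)
            + cum_hazard(start, eta, rho))


rho = 2.0
# Hypothetical risk factors while well, ill, and recovered.
changes = [(0.0, -4.0), (2.0, -3.0), (3.5, -3.8)]
pieces = split_record(end_time=5.0, failed=True, changes=changes)
total = sum(piece_loglik(s, e, w, eta, rho) for s, e, w, eta in pieces)

for piece in pieces:
    print(piece)
print(round(total, 4))
```

Summing the three pieces reproduces the five-term log likelihood given for this example in the section on time-dependent covariates, because ln S equals minus the cumulative hazard and ln f equals the log hazard minus the cumulative hazard.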


SUMMARY

A number of economically important traits measure the time until an event occurs. These traits pose a number of challenges including non-normal distributions and censoring. Survival analysis provides a set of distributions appropriate for time until event traits. In addition, estimation procedures are equipped to handle various forms of censoring. The Weibull survival model with the addition of time-dependent covariates can handle a wide variety of survival traits. Time-dependent covariates also allow the modeling of events which have an effect of limited duration.

In general, modifications to existing programs for evaluating linear mixed models to handle survival traits should be fairly minor. The challenges are primarily in handling boundary conditions and time-dependent covariates. In addition, the Survival Kit set of programs is also available.

This paper provides a brief introduction to survival analysis. A more detailed presentation can be found in Miller et al. (8), which explores various approaches to the analysis of fixed effects models. McCullagh and Nelder (7) provide an introduction from the generalized linear model perspective. Ducrocq and Casella (3) examine the analysis of mixed models from a Bayesian perspective. In addition to the parametric methods discussed in this paper, there are a variety of nonparametric (8) and semi-parametric (10) approaches, which have not been discussed here.

REFERENCES

1 Beaudeau, F., V. Ducrocq, C. Fourichon, and H. Seegers. 1995. Effect of disease on length of productive life of French Holstein dairy cows assessed by survival analysis. J. Dairy Sci. 78:103–117.
2 Ducrocq, V. 1994. Statistical analysis of length of productive life for dairy cows of the Normande breed. J. Dairy Sci. 77:855–866.
3 Ducrocq, V., and G. Casella. 1996. A Bayesian analysis of mixed survival analysis. Genet. Sel. Evol. 28:505–529.
4 Ducrocq, V., R. L. Quaas, E. J. Pollak, and G. Casella. 1988. Length of productive life of dairy cows. 1. Justification of a Weibull model. J. Dairy Sci. 71:3061–3070.
5 Ducrocq, V., and J. Sölkner. 1998. The Survival Kit - a Fortran package for the analysis of survival data. Proc. 6th World Cong. Genet. Appl. Livest. Prod. 23:447–448.
6 Ducrocq, V., and J. Sölkner. 1998. The Survival Kit, V3.0, User's Manual. Inst. Natl. Recherche Agron., Universität für Bodenkultur, Jouy-en-Josas, France.
7 McCullagh, P., and J. A. Nelder. 1989. Models for survival data. Pages 419–431 in Generalized Linear Models. 2nd ed. Chapman & Hall, London, United Kingdom.
8 Miller, R. G., Jr., G. Gong, and A. Muñoz. 1981. Survival Analysis. Wiley Series in Probability and Mathematical Statistics. John Wiley & Sons Inc., New York, NY.
9 SAS/STAT User's Guide, Version 6, 4th Edition. 1989. SAS Inst. Inc., Cary, NC.
10 Smith, S. P., and F. R. Allaire. 1986. Analysis of failure times measured on dairy cows: theoretical considerations in animal breeding. J. Dairy Sci. 69:217–227.
11 Smith, S. P., and R. L. Quaas. 1984. Productive lifespan of bull progeny groups: failure time analysis. J. Dairy Sci. 67:2999–3007.
