Survival Analysis
Stephen P. Jenkins
18 July 2005
Contents

Preface

1 Introduction
1.1 What survival analysis is about
1.2 Survival time data: some notable features
1.2.1 Censoring and truncation of survival time data
1.2.2 Continuous versus discrete (or grouped) survival time data
1.2.3 Types of explanatory variables
1.3 Why are distinctive statistical methods used?
1.3.1 Problems for OLS caused by right censoring
1.3.2 Time-varying covariates and OLS
1.3.3 'Structural' modelling and OLS
1.3.4 Why not use binary dependent variable models rather than OLS?
1.4 Outline of the book

2 Basic concepts: the hazard rate and survivor function
2.1 Continuous time
2.1.1 The hazard rate
2.1.2 Key relationships between hazard and survivor functions
2.2 Discrete time
2.2.1 Survival in continuous time but spell lengths are interval censored
2.2.2 The discrete time hazard when time is intrinsically discrete
2.2.3 The link between the continuous time and discrete time cases
2.3 Choosing a specification for the hazard rate
2.3.1 Continuous or discrete survival time data?
2.3.2 The relationship between the hazard and survival time
2.3.3 What guidance from economics?

3 Functional forms for the hazard rate
3.1 Introduction and overview: a taxonomy
3.2 Continuous time specifications
3.2.1 Weibull model and Exponential model
3.2.2 Gompertz model
3.2.3 Log-logistic model
3.2.4 Lognormal model
3.2.5 Generalised Gamma model
3.2.6 Proportional Hazards (PH) models
3.2.7 Accelerated Failure Time (AFT) models
3.2.8 Summary: PH versus AFT assumptions for continuous time models
3.2.9 A semiparametric specification: the piecewise-constant Exponential (PCE) model
3.3 Discrete time specifications
3.3.1 A discrete time representation of a continuous time proportional hazards model
3.3.2 A model in which time is intrinsically discrete
3.3.3 Functional forms for characterizing duration dependence in discrete time models
3.4 Deriving information about survival time distributions
3.4.1 The Weibull model
3.4.2 Gompertz model
3.4.3 Log-logistic model
3.4.4 Other continuous time models
3.4.5 Discrete time models

4 Estimation of the survivor and hazard functions
4.1 Kaplan-Meier (product-limit) estimators
4.1.1 Empirical survivor function
4.2 Life-table estimators

5 Continuous time multivariate models
5.0.1 Random sample of inflow and each spell monitored until completed
5.0.2 Random sample of inflow with (right) censoring, monitored until t
5.0.3 Random sample of population, right censoring but censoring point varies
5.0.4 Left truncated spell data (delayed entry)
5.0.5 Sample from stock with no re-interview
5.0.6 Right truncated spell data (outflow sample)
5.1 Episode splitting: time-varying covariates and estimation of continuous time models

6 Discrete time multivariate models
6.1 Inflow sample with right censoring
6.2 Left-truncated spell data ('delayed entry')
6.3 Right-truncated spell data (outflow sample)

7 Cox's proportional hazard model

8 Unobserved heterogeneity ('frailty')
8.1 Continuous time case
8.2 Discrete time case
8.3 What if unobserved heterogeneity is 'important' but ignored?
8.3.1 The duration dependence effect
8.3.2 The proportionate response of the hazard to variations in a characteristic
8.4 Empirical practice

9 Competing risks models
9.1 Continuous time data
9.2 Intrinsically discrete time data
9.3 Interval-censored data
9.3.1 Transitions can only occur at the boundaries of the intervals
9.3.2 Destination-specific densities are constant within intervals
9.3.3 Destination-specific hazard rates are constant within intervals
9.3.4 Destination-specific proportional hazards with a common baseline hazard function
9.3.5 The log of the integrated hazard changes at a constant rate over the interval
9.4 Extensions
9.4.1 Left-truncated data
9.4.2 Correlated risks
9.5 Conclusions and additional issues

10 Additional topics

References

Appendix
List of Tables

1.1 Examples of lifecourse domains and states
3.1 Functional forms for the hazard rate: examples
3.2 Different error term distributions imply different AFT models
3.3 Specification summary: proportional hazard versus accelerated failure time models
3.4 Classification of models as PH or AFT: summary
3.5 Ratio of mean to median survival time: Weibull model
4.1 Example of data structure
5.1 Example of episode splitting
6.1 Person and person-period data structures: example
7.1 Data structure for Cox model: example
List of Figures
Preface
These notes were written to accompany my Survival Analysis module in the masters-level University of Essex lecture course EC968, and my Essex University Summer School course on Survival Analysis.[1] (The first draft was completed in January 2002, and has been revised several times since.) The course reading list, and a sequence of lessons on how to do Survival Analysis (based around the Stata software package), are downloadable from
http://www.iser.essex.ac.uk/teaching/degree/stephenj/ec968/index.php.
Please send me comments and suggestions on both these notes and the do-it-yourself lessons:
Email: stephenj@essex.ac.uk
Post: Institute for Social and Economic Research, University of Essex, Wivenhoe Park, Colchester CO4 3SQ, United Kingdom.
Beware: the notes remain work in progress, and will evolve as and when time allows. Charts and graphs from the classroom presentations are not included (you have to get something for being present in person!). The document was produced using Scientific Workplace version 5.0 (formatted using the 'Standard LaTeX book' style).
My lectures were originally based on a set of overhead transparencies given to me by John Micklewright (University of Southampton) that he had used in a graduate microeconometrics lecture course at the European University Institute. Over the years, I have also learnt much about survival analysis from Mark Stewart (University of Warwick) and John Ermisch (University of Essex). Robert Wright (University of Stirling) patiently answered questions when I first started to use survival analysis. The Stata Reference Manuals written by the StataCorp staff have also been a big influence. They are superb, and useful as a text, not only as program manuals. I have also drawn inspiration from other Stata users. In addition to the StataCorp staff, I would specifically like to cite the contributions of Jeroen Weesie (Utrecht University) and Nick Cox (Durham University). The writing of Paul Allison (University of Pennsylvania) on survival analysis has also influenced me, providing an exemplary model of how to explain complex issues in a clear non-technical manner.
I wish to thank Janice Webb for word-processing a preliminary draft of the notes. I am grateful to those who have drawn various typographic errors to my attention, and also made several other helpful comments and suggestions. I would like to especially mention Paola De Agostini, José Diaz, Annette Jäckle, Lucinda Platt, Thomas Siedler and the course participants at Essex and elsewhere (including Frigiliana, Milan, and Wellington).
The responsibility for the content of these notes (and the web-based Lessons) is mine alone.

[1] Information about Essex Summer School courses and how to apply is available from http://www.essex.ac.uk/methods.
If you wish to cite this document, please refer to:
Jenkins, Stephen P. (2004). Survival Analysis. Unpublished manuscript, Institute for Social and Economic Research, University of Essex, Colchester, UK. Downloadable from http://www.iser.essex.ac.uk/teaching/degree/stephenj/ec968/pdfs/ec968lnotesv6.pdf
© Stephen P. Jenkins, 2005.
Chapter 1
Introduction
1.1 What survival analysis is about
This course is about the modelling of time-to-event data, otherwise known as transition data (or survival time data or duration data). We consider a particular lifecourse 'domain', which may be partitioned into a number of mutually exclusive states at each point in time. With the passage of time, individuals move (or do not move) between these states. For some examples of lifecourse domains and states, see Table 1.1.
For each given domain, the patterns for each individual are described by the time spent within each state, and the dates of each transition made (if any). Figure 1, from Tuma and Hannan (1984, Figure 3.1), shows a hypothetical marital history for an individual. There are three states (married, not married, dead) differentiated on the vertical axis, and the horizontal axis shows the passage of time t. The length of each horizontal line shows the time spent within each state, i.e. spell lengths, or spell durations, or survival times. More generally, we could imagine having this sort of data for a large number of individuals (or firms or other analytical units), together with information that describes the characteristics of these individuals (to be used as explanatory variables in multivariate models).
This course is about the methods used to model transition data, and the relationship between transition patterns and characteristics. Data patterns of the sort shown in Figure 1 are quite complex, however; in particular, there are multi-state transitions (three states) and repeat spells within a given state (two spells in the state 'not married'). Hence, to simplify matters, we shall focus on models to describe survival times within a single state, and assume that we have single-spell data for each individual. Thus, for the most part, we consider exits from a single state to a single destination.[1]
[1] Nonetheless we shall, later, allow for transitions to multiple destination states under the heading 'independent competing risk' models, and shall note the conditions under which repeated-spell data may be modelled using single-spell methods.
Domain                      State
Marriage                    married
                            cohabiting
                            separated
                            divorced
                            single
Receipt of cash benefit     receiving benefit x
                            receiving benefit y
                            receiving x and y
                            receiving neither
Housing tenure              owned outright
                            owned with mortgage
                            renter – social housing
                            renter – private
                            other
Paid work                   employed
                            self-employed
                            unemployed
                            inactive
                            retired

Table 1.1: Examples of lifecourse domains and states
We also make a number of additional simplifying assumptions:
• the chances of making a transition from the current state do not depend on transition history prior to entry to the current state (there is no state dependence);
• entry into the state being modelled is exogenous – there are no 'initial conditions' problems. Otherwise the models of survival times in the current state would also have to take account of the differential chances of being found in the current state in the first place;
• the model parameters describing the transition process are fixed, or can be parameterized using explanatory variables – the process is stationary.
The models that have been specially developed or adapted to analyze survival times are distinctive largely because they need to take into account some special features of the data, both the 'dependent' variable for analysis (survival time itself), and also the explanatory variables used in our multivariate models. Let us consider these features in turn.
1.2 Survival time data: some notable features
Survival time data may be derived in a number of different ways, and the way the data are generated has important implications for analysis. There are four main types of sampling process providing survival time data:
1. Stock sample. Data collection is based upon a random sample of the individuals that are currently in the state of interest, who are typically (but not always) interviewed at some time later, and one also determines when they entered the state (the spell start date). For example, when modelling the length of spells of unemployment insurance (UI) receipt, one might sample all the individuals who were in receipt of UI at a given date, and also find out when they first received UI (and other characteristics).
2. Inflow sample. Data collection is based on a random sample of all persons entering the state of interest, and individuals are followed until some pre-specified date (which might be common to all individuals), or until the spell ends. For example, when modelling the length of spells of receipt of unemployment insurance (UI), one might sample all the individuals who began a UI spell.
3. Outflow sample. Data collection is based on a random sample of those leaving the state of interest, and one also determines when the spell began. For example, to continue our UI example, the sample would consist of individuals leaving UI receipt.
4. Population sample. Data collection is based on a general survey of the population (i.e. where sampling is not related to the process of interest), and respondents are asked about their current and/or previous spells of the type of interest (starting and ending dates).
Data may also be generated from combinations of these sample types. For example, the researcher may build a sample of spells by considering all spells that occurred between two dates, for example between 1 January and 1 June of a given year. Some spells will already be in progress at the beginning of the observation window (as in the stock sample case), whereas some will begin during the window (as in the inflow sample case).
The longitudinal data in these four types of sample may be collected from three main types of survey or database:
1. Administrative records. For example, information about UI spells may be derived from the database used by the government to administer the benefit system. The administrative records may be the sole source of information about the individuals, or may be combined with a social survey that asks further questions of the persons of interest.
2. Cross-section sample survey, with retrospective questions. In this case, respondents to a survey are asked to provide information about their spells in the state of interest using retrospective recall methods. For example, when considering how long marriages last, analysts may use questions asking respondents whether they are currently married, or ever have been, and determining the dates of marriage and of divorce, separation, and widowhood. Similar sorts of methods are commonly used to collect information about personal histories of employment and jobs over the working life.
3. Panel and cohort surveys, with prospective data collection. In this case, the longitudinal information is built from repeated interviews (or other sorts of observation) on the sample of interest at a number of different points in time. At each interview, respondents are typically asked about their current status, and changes since the previous interview, and associated dates.
Combinations of these survey instruments may be used. For example, a panel survey may also include retrospective question modules to ask about respondents' experiences before the survey began. Administrative records containing longitudinal data may be matched into a sample survey, and so on.
The main lesson of this brief introduction to data collection methods is that, although each method provides spell data, the nature of the information about the spells differs, and this has important implications for how one should analyze the data. The rest of this section highlights the nature of the differences in information about spells. The first aspect concerns whether survival times are complete, censored, or truncated. The second and related aspect concerns whether the analyst observes the precise dates at which spells begin and end (or else survival times are only observed in intervals of time, i.e. grouped or banded) or, equivalently – at least from the analytic point of view – whether survival times are intrinsically discrete.
1.2.1 Censoring and truncation of survival time data
A survival time is censored if all that is known is that it began or ended within some particular interval of time, and thus the total spell length (from entry time until transition) is not known exactly. We may distinguish the following types of censoring:
• Right censoring: at the time of observation, the relevant event (transition out of the current state) had not yet occurred (the spell end date is unknown), and so the total length of time between entry to and exit from the state is unknown. Given entry at time 0 and observation at time t, we only know that the completed spell is of length T > t.
• Left censoring: the case when the start date of the spell was not observed, so again the exact length of the spell (whether completed or incomplete) is not known. Note that this is the definition of left censoring most commonly used by social scientists. (Be aware that biostatisticians typically use a different definition: to them, left-censored data are those for which it is known that exit from the state occurred at some time before the observation date, but it is not known exactly when. See e.g. Klein and Moeschberger, 1997.)
By contrast, truncated survival time data are those for which there is a systematic exclusion of survival times from one's sample, and the sample selection effect depends on survival time itself. We may distinguish two types of truncation:
• Left truncation: the case when only those who have survived more than some minimum amount of time are included in the observation sample ('small' survival times – those below the threshold – are not observed). Left truncation is also known by other names: delayed entry and stock sampling with follow-up. The latter term is the one most commonly used by economists, reflecting the fact that the data they use are often generated in this way. If one samples from the stock of persons in the relevant state at some time s, and interviews them some time later, then persons with short spells are systematically excluded. (Of all those who began a spell at time r < s, only those with relatively long spells survived long enough to be found in the stock at time s and thence available to be sampled.) Note that the spell start is assumed known in this case (cf. left censoring), but the subject's survival is only observed from some later date – hence 'delayed entry'.
• Right truncation: this is the case when only those persons who have experienced the exit event by some particular date are included in the sample, and so relatively 'long' survival times are systematically excluded. Right truncation occurs, for example, when a sample is drawn from the persons who exit from the state at a particular date (e.g. an outflow sample from the unemployment register).
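The length bias induced by stock sampling can be demonstrated directly. In the sketch below (Python, standard library only; all parameter values are illustrative assumptions of mine, not values from the text), spells start at random dates over a long window, and a 'stock sample' keeps only those spells in progress at a survey date s. The spells found in the stock are systematically longer than spells in general, exactly as described above.

```python
import random

random.seed(7)

def draw_spells(n, rate=0.5, window=100.0):
    """Spells with exponential lengths (mean 1/rate), whose start
    dates are uniformly distributed over [0, window]."""
    return [(random.uniform(0, window), random.expovariate(rate))
            for _ in range(n)]

s = 90.0  # survey date: sample the stock of spells in progress at time s
spells = draw_spells(50_000)
all_lengths = [length for _, length in spells]
# A spell is in the stock at s if it started at or before s and had
# not yet ended: start <= s < start + length
stock = [length for start, length in spells if start <= s < start + length]

print(f"mean length, all spells:   {sum(all_lengths) / len(all_lengths):.2f}")
print(f"mean length, stock sample: {sum(stock) / len(stock):.2f}")
```

With exponential spell lengths of mean 2, the stock sample's mean is roughly double: long spells are over-represented because they are more likely to be in progress at the survey date, which is why 'small' survival times are missing from stock samples.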
The most commonly available survival time data sets contain a combination of survival times in which either (i) both entry and exit dates are observed (completed spell data), or (ii) entry dates are observed and exit dates are not observed exactly (right-censored incomplete spell data). The ubiquity of such right-censored data has meant that the term 'censoring' is often used as a shorthand description to refer to this case. We shall do so as well.
See Figure 2 for some examples of different types of spells. *** insert and add comments ***
We assume that the process that gives rise to censoring of survival times is independent of the survival time process. There is some latent failure time for person i given by T*_i and some latent censoring time C_i, and what we observe is T_i = min{T*_i, C_i}. See the texts for more about the different types of censoring mechanisms that have been distinguished in the literature. If right-censoring is not independent – instead its determinants are correlated with the determinants of the transition process – then we need to model the two processes jointly. An example is where censoring arises through non-random sample dropout ('attrition').
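The observation rule just described is easy to mimic in a short simulation. The sketch below (Python; the exponential distributions and rate values are illustrative choices of mine, not from the text) draws latent failure times T*_i and latent censoring times C_i independently, and records the observed duration min{T*_i, C_i} together with an event indicator, which is exactly the data structure assumed for independently right-censored spells.

```python
import random

random.seed(1)

def simulate_censored_spells(n, fail_rate=0.10, cens_rate=0.05):
    """Draw latent failure and censoring times independently
    (both exponential here, purely for illustration) and apply
    the observation rule T_i = min(T*_i, C_i)."""
    spells = []
    for _ in range(n):
        t_star = random.expovariate(fail_rate)  # latent failure time T*_i
        c = random.expovariate(cens_rate)       # latent censoring time C_i
        observed = min(t_star, c)               # what the analyst sees
        event = t_star <= c                     # True: exit seen; False: right-censored
        spells.append((observed, event))
    return spells

data = simulate_censored_spells(1000)
share_censored = sum(1 for _, event in data if not event) / len(data)
print(f"share right-censored: {share_censored:.2f}")
```

With these illustrative rates, roughly a third of spells end up censored (the censoring rate divided by the sum of the two rates), which is a useful sanity check on the construction.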
1.2.2 Continuous versus discrete (or grouped) survival time data
So far we have implicitly assumed that the transition event of interest may occur at any particular instant in time; the stochastic process occurs in continuous time. Time is a continuum and, in principle, the length of an observed spell can be measured using a non-negative real number (which may be fractional). Often this is derived from observations on spell start dates and either spell exit dates (complete spells) or last observation date (censored spells). Survival time data do not always come in this form, however, and for two reasons.
The first reason is that survival times have been grouped or banded into discrete intervals of time (e.g. numbers of months or years). In this case, spell lengths may be summarised using the set of positive integers (1, 2, 3, 4, and so on), and the observations on the transition process are summarized discretely rather than continuously. That is, although the underlying transition process may occur in continuous time, the data are not observed (or not provided) in that form. Biostatisticians typically refer to this situation as one of interval censoring, a natural description given the definitions used in the previous subsection. The occurrence of tied survival times may be an indicator of interval censoring. Some continuous time models often (implicitly) assume that transitions can only occur at different times (at different instants along the time continuum), and so if there is a number of individuals in one's data set with the same survival time, one might ask whether the ties are genuine, or simply arise because survival times have been grouped at the observation or reporting stage.
The second reason for discrete time data is that the underlying transition process may be an intrinsically discrete one. Consider, for example, a machine tool set up to carry out a specific cycle of tasks, where this cycle takes a fixed amount of time. When modelling how long it takes for the machine to break down, it would be natural to model failure times in terms of the number of discrete cycles that the machine tool was in operation. Similarly, when modelling fertility, and in particular the time from puberty to first birth, it might be more natural to measure time in terms of numbers of menstrual cycles rather than number of calendar months.
Since the same sorts of models can be applied to discrete time data regardless of the reason they were generated (as we shall see below), we shall mostly refer simply to discrete time models, and contrast these with continuous time models.
Thus the more important distinction is between discrete time data and continuous time data. Models for the latter are the most commonly available and most commonly applied, perhaps reflecting their origins in the biomedical sciences. However, discrete time data are relatively common in the social sciences. One of the themes of this lecture course is that one should use models that reflect the nature of the data available. For this reason, more attention is given to discrete time models than is typically common. For the same reason, I give more explicit attention to how to estimate models using data sets containing left-truncated spells than do most texts.
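Grouping is simple to illustrate. The sketch below (Python; the monthly banding is my own illustrative choice, not something specified in the text) converts exactly-measured durations into interval-censored ones recorded as positive integers, the data form discussed above, and shows how distinct continuous durations become tied once banded.

```python
import math

def band_durations(durations, interval_width=1.0):
    """Map exactly-observed durations onto discrete intervals:
    a spell ending anywhere in (k-1, k] is recorded simply as k.
    Returns positive integers (interval indices)."""
    return [max(1, math.ceil(d / interval_width)) for d in durations]

# Exact survival times in months (illustrative values)
exact = [0.4, 1.2, 1.9, 2.0, 2.7, 11.05]
banded = band_durations(exact)  # band into whole months
print(banded)  # → [1, 2, 2, 2, 3, 12]
```

The three spells with exact lengths 1.2, 1.9, and 2.0 months become tied at interval 2: ties of this kind in a data set are the tell-tale sign of interval censoring mentioned in the text.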
1.2.3 Types of explanatory variables
There are two main types. Contrast, first, explanatory variables that describe
• the characteristics of the observation unit itself (e.g. a person's age, or a firm's size), versus
• the characteristics of the socioeconomic environment of the observation unit (e.g. the unemployment rate of the area in which the person lives).
As far as model specification is concerned, this distinction makes no difference. It may make a significant difference in practice, however, as the first type of variable is often directly available in the survey itself, whereas the second type often has to be collected separately and then matched in.
The second contrast is between explanatory variables that are
• fixed over time, whether time refers to calendar time or survival time within the current state, e.g. a person's sex; and
• time-varying, among which we may distinguish between those that vary with survival time and those that vary with calendar time.
The unemployment rate in the area in which a person lives may vary with calendar time (the business cycle), and this can induce a relationship with survival time, but it does not depend intrinsically on survival time itself. By contrast, social assistance benefit rates in Britain used to vary with the length of time that benefit had been received: Supplementary Benefit was paid at the short-term rate for spells up to 12 months long, and paid at a (higher) long-term rate for the 13th and subsequent months for spells lasting this long. (In addition, some calendar time variation in benefit generosity in real terms was induced by inflation, and by annual uprating of benefit amounts at the beginning of each financial year (April).)
Some books refer to time-dependent variables. These are either the same as the time-varying variables described above or, sometimes, variables for which changes over time can be written directly as a function of survival time. For example, given some personal characteristic summarized using variable X, and survival time t, such a time-dependent variable might be X log(t).
The distinction between fixed and time-varying covariates is relevant for both analytical and practical reasons. Having all explanatory variables fixed means that analytical methods and empirical estimation are more straightforward. With time-varying covariates, some model interpretations no longer hold. And from a practical point of view, one has to reorganise one's data set in order to incorporate them and estimate models. More about this 'episode splitting' later on.
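To preview what that reorganisation looks like, the sketch below (Python; the field names and the example spell are mine, purely for illustration) splits a single spell into episodes at the dates on which a time-varying covariate changes, so that the covariate is constant within each resulting record, the layout that estimation routines for time-varying covariates typically expect.

```python
def episode_split(spell_start, spell_end, covariate_changes):
    """Split one spell into episodes over which a time-varying
    covariate is constant. covariate_changes is a list of
    (time, value) pairs sorted by time, with the first entry
    giving the value in force at spell_start."""
    episodes = []
    for i, (t, value) in enumerate(covariate_changes):
        start = max(t, spell_start)
        if i + 1 < len(covariate_changes):
            end = min(covariate_changes[i + 1][0], spell_end)
        else:
            end = spell_end
        if start < end:
            episodes.append({"start": start, "end": end, "x": value})
    return episodes

# A 20-month spell; local unemployment rate changes at months 6 and 14
records = episode_split(0, 20, [(0, 5.1), (6, 6.3), (14, 4.8)])
for r in records:
    print(r)
# → {'start': 0, 'end': 6, 'x': 5.1}
#   {'start': 6, 'end': 14, 'x': 6.3}
#   {'start': 14, 'end': 20, 'x': 4.8}
```

One spell becomes three records, each with a single covariate value, which is why data sets with time-varying covariates contain more rows than there are spells.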
1.3 Why are distinctive statistical methods used?
This section provides some motivation for the distinctive specialist methods that have been developed for survival analysis by considering why some of the methods that are commonly used elsewhere in economics and other quantitative social science disciplines cannot be applied in this context (at least in their standard form). More specifically, what is the problem with using either (1) Ordinary Least Squares (OLS) regressions of survival times, or (2) binary dependent variable regression models (e.g. logit, probit) with transition event occurrence as the dependent variable? Let us consider these in turn.
OLS cannot handle three aspects of survival time data very well:
• censoring (and truncation)
• time-varying covariates
• 'structural' modelling
1.3.1 Problems for OLS caused by right censoring
To illustrate the (right) censoring issue, let us suppose that the 'true' model is such that there is a single explanatory variable X_i for each individual i = 1, ..., n, who has a true survival time of T*_i. In addition, in the population, a higher X is associated with a shorter survival time. In the sample, we observe T_i, where T_i = T*_i for observations with completed spells, and T_i < T*_i for right-censored observations.
Suppose too that the incidence of censoring is higher at longer survival times relative to shorter survival times. (This does not necessarily conflict with the assumption of independence of the censoring and survival processes – it simply reflects the passage of time. The longer the observation period, the greater the proportion of spells for which events are observed.)
**CHART TO INSERT**
Data 'cloud': combinations of 'true' X_i, T*_i
By OLS, we mean: regress T_i, or better still log T_i (noting that survival times are all non-negative and distributions of survival times are typically skewed), on X_i, fitting the linear relationship

log(T_i) = a + b X_i + e_i    (1.1)

The OLS parameter estimates are the solution to min_{a,b} Σ_{i=1}^{n} (e_i)². The estimated a is the vertical intercept; the estimated b is the slope of the least squares line.
Case (a): Exclude censored cases altogether. The sample data cloud is less dense everywhere, but disproportionately so at higher t.
**CHART TO INSERT**
Case (b): Treat censored durations as if complete.
**CHART TO INSERT**
There is under-recording of survival times – especially at higher t – and the sample OLS line has the wrong slope again.
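The two failure modes can be checked numerically. The simulation below (Python, standard library only; the data-generating parameter values are illustrative choices of mine, not values from the text) generates survival times that fall with X, censors spells at a fixed observation window so that long spells are censored more often, and compares the OLS slope from (a) dropping censored cases and (b) treating censored durations as complete against the slope fitted to the true, uncensored times.

```python
import math
import random

random.seed(42)

def ols_slope(xs, ys):
    """Slope of a simple least-squares regression of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# True model: log(T*_i) = a + b*X_i + e_i with b < 0 (higher X, shorter spell)
a, b, n = 3.0, -1.0, 5000
x = [random.uniform(0, 2) for _ in range(n)]
t_star = [math.exp(a + b * xi + random.gauss(0, 0.5)) for xi in x]

# Fixed observation window: spells longer than c_time are right-censored,
# so censoring is more common at longer survival times
c_time = 15.0
t_obs = [min(t, c_time) for t in t_star]
event = [t <= c_time for t in t_star]

slope_true = ols_slope(x, [math.log(t) for t in t_star])
# (a) drop censored cases
xa = [xi for xi, e in zip(x, event) if e]
ya = [math.log(t) for t, e in zip(t_obs, event) if e]
# (b) treat censored durations as if complete
yb = [math.log(t) for t in t_obs]

print(f"true slope (uncensored data): {slope_true:.2f}")
print(f"(a) censored cases dropped:   {ols_slope(xa, ya):.2f}")
print(f"(b) censored as complete:     {ols_slope(x, yb):.2f}")
```

Under this set-up, both shortcuts flatten the estimated slope relative to the true value of -1: dropping censored spells keeps only the unusually short spells at low X, while capping censored durations at the observation window under-records exactly the long spells, as the charts referred to above illustrate.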
1.3.2 Time-varying covariates and OLS
How can one handle time-varying covariates, given that OLS has only a single dependent variable and there are multiple values for the covariate in question? If one were to choose one value of a time-varying covariate at some particular time as 'representative', which of the various possibilities would one choose? For example:
• the value at the time just before transition? But completed survival times vary across people, and what would one do about censored observations (no transition observed)?
• the value of the variable when the spell started, as this is the only definition that is consistently defined? But then much information is thrown away.
In sum, time-varying covariates require some special treatment in modelling.
1.3.3 ‘Structural’ modelling and OLS
Our behavioural models of, for example, job search, marital search, etc., are framed in terms of decisions about whether to do something (and observed transitions reflect that choice). That is, the models are not formulated in terms of completed spell lengths. Perhaps, then, we should model transitions directly.
1.3.4 Why not use binary dependent variable models rather than OLS?
Given the above problems, especially the censoring one, one might ask whether one could instead use a binary dependent variable regression model (e.g. logit, probit). That is, one could get round the censoring issue (and the structural modelling one) by simply modelling whether or not someone made a transition. (Observations with a transition would have a ‘1’ for the dependent variable; censored observations would have a ‘0’.) However, this strategy is also potentially problematic:
• it takes no account of the differences in the time for which each person is at risk of experiencing the event. One could get around this by considering whether a transition occurred within some pre-specified interval of time (e.g. 12 months since the spell began), but . . .
• one still loses a large amount of information, in particular about when someone left if she or he did so.
Cross-tabulations of (banded) survival times against some categorical or categorised variable cannot be used for inference about the relationship between survival time and that variable, for the same sorts of reasons. (Cross-tabulation of a dependent variable against each explanatory variable is often used with other sorts of data to explore relationships.) In particular, the problems include:
• the dependent variable is mismeasured and censoring is not accounted for;
• time-varying explanatory variables cannot be handled easily (current values may be misleading).
1.4 Outline of the book
The preceding sections have argued that, for survival analysis, we need methods that directly account for the sequential nature of the data, and are able to handle censoring and incorporate time-varying covariates. The solution is to model survival times indirectly, via the so-called ‘hazard rate’, a concept related to the chances of making a transition out of the current state at each instant (or time period), conditional on survival up to that point. The rest of this book elaborates this strategy, considering both continuous and discrete time models.
The hazard rate is defined more formally in Chapter 2. I also draw attention to the intimate connections between the hazard, survivor, failure, and density functions. Chapter 3 discusses functional forms for the hazard rate. I set out some of the most commonly-used specifications, and explain how models may be classified into two main types: proportional hazards or accelerated failure time models. I also show, for a selection of models, what the hazard function specification implies about the distribution of survival times (including the median and mean spell lengths), and about the relationship between differences in survival times and differences in characteristics (summarised by differences in values of explanatory variables). Survival analysis is not simply about estimating model parameters; it is also about interpreting them and drawing out their implications to the fullest extent.
In the subsequent chapters, we move from concepts to estimation. The aim is to indicate the principles behind the methods used, rather than provide details (or proofs) about the statistical properties of estimators, or the numerical analysis methods required to actually derive them.
Chapter 4 discusses Kaplan-Meier (product-limit) and lifetable estimators of the survivor and hazard functions. These are designed to fit functions for the sample as a whole, or for separate subgroups; they are not multivariate regression models. Multivariate regression models in which differences in characteristics are incorporated via differences in covariate values are the subject of Chapters 5–9. The problems with using OLS to model survival time data
are shown to be resolved if one uses instead estimation based on the maximum
likelihood principle or the partial likelihood principle.
A continuing theme is that estimation has to take account of the sampling scheme that generates the observed survival time data. The chapters indicate how the likelihoods underlying estimation differ in each of the leading sampling schemes: random samples of spells (with or without right censoring), and left- and right-truncated spell data. We also examine how to incorporate time-varying covariates using ‘episode-splitting’. Chapters 5 and 6 discuss continuous time and discrete time regression models respectively, estimated using maximum likelihood. Chapter 7 introduces Cox’s semiparametric proportional hazard model for continuous time data, estimated using partial likelihood.
The remainder of the book discusses a selection of additional topics. Chapter 8 addresses the subject of unobserved heterogeneity (otherwise known as ‘frailty’). In the models considered in the earlier chapters, it is implicitly assumed that all relevant differences between individuals can be summarised by the observed explanatory variables. But what if there are unobserved or unobservable differences? The chapter discusses the impact that unobserved heterogeneity may have on estimates of regression coefficients and duration dependence, and outlines the methods that have been proposed to take account of the problem. Chapter 9 considers estimation of competing risks models. Earlier chapters consider models for exit to a single destination state; this chapter shows how one can model transitions to a number of mutually-exclusive destination states. Chapter 10 [not yet written!] discusses repeated spell data (the rest of the book assumes that the available data contain a single spell per subject). A list of references and appendices complete the book.
Chapter 2
Basic concepts: the hazard rate and survivor function
In this chapter, we define the key concepts of survival analysis, namely the hazard function and the survivor function, and show that there is a one-to-one relationship between them. For now, we shall ignore differences in hazard rates, and so on, across individuals in order to focus on how they vary with survival time. How to incorporate individual heterogeneity is discussed in the next chapter.
2.1 Continuous time
The length of a spell for a subject (person, firm, etc.) is a realisation of a continuous random variable T with a cumulative distribution function (cdf), F(t), and probability density function (pdf), f(t). F(t) is also known in the survival analysis literature as the failure function. The survivor function is S(t) = 1 − F(t); t is the elapsed time since entry to the state at time 0.
*Insert chart* of pdf, showing cdf as area under curve
Failure function (cdf):

Pr(T ≤ t) = F(t)    (2.1)

which implies, for the survivor function:

Pr(T > t) = 1 − F(t) = S(t).    (2.2)

Observe that some authors use F(t) to refer to the survivor function. I use S(t) throughout.
*Insert chart* of pdf in terms of cdf
The pdf is the slope of the cdf (Failure) function:

f(t) = lim_{Δt→0} Pr(t ≤ T ≤ t + Δt)/Δt = ∂F(t)/∂t = −∂S(t)/∂t    (2.3)

where Δt is a very small (‘infinitesimal’) interval of time. The quantity f(t)Δt is akin to the unconditional probability of having a spell of length exactly t, i.e. of leaving the state in the tiny interval of time [t, t + Δt].
The survivor function S(t) and the Failure function F(t) are each probabilities, and therefore inherit the properties of probabilities. In particular, observe that the survivor function lies between zero and one, and is a strictly decreasing function of t. The survivor function is equal to one at the start of the spell (t = 0) and is zero at infinity:

0 ≤ S(t) ≤ 1    (2.4)
S(0) = 1    (2.5)
lim_{t→∞} S(t) = 0    (2.6)
∂S(t)/∂t < 0    (2.7)
∂²S(t)/∂t² ≷ 0.    (2.8)

The density function is non-negative,

f(t) ≥ 0    (2.9)

but may be greater than one in value (the density function does not summarize probabilities).
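A tiny numerical illustration of this last point, using an exponential distribution with an arbitrarily chosen hazard of 4 (the rate is an illustrative choice, not from the text): the density near t = 0 exceeds one, while the survivor function remains a probability.

```python
import math

rate = 4.0  # exponential distribution with a high (illustrative) hazard
f = lambda t: rate * math.exp(-rate * t)   # pdf
S = lambda t: math.exp(-rate * t)          # survivor function

print(f(0.01))  # pdf near t = 0 is close to 4: a density can exceed 1
print(S(0.01))  # the survivor function is still in [0, 1]
```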
2.1.1 The hazard rate
The hazard rate is a difficult concept to grasp, some people find. Let us begin with its definition, and then return to interpretation. The continuous time hazard rate, θ(t), is defined as:

θ(t) = f(t)/[1 − F(t)] = f(t)/S(t).    (2.10)

Suppose that we let Pr(A) be the probability of leaving the state in the tiny interval of time between t and t + Δt, and Pr(B) be the probability of survival up to time t. Then the probability of leaving in the interval (t, t + Δt], conditional on survival up to time t, may be derived from the rules of conditional probability:

Pr(A|B) = Pr(A ∩ B)/Pr(B) = Pr(B|A)Pr(A)/Pr(B) = Pr(A)/Pr(B),    (2.11)

since Pr(B|A) = 1. But Pr(A)/Pr(B) = f(t)Δt/S(t). This expression is closely related to the expression that defines the hazard rate: compare it with (2.10):

θ(t)Δt = f(t)Δt/S(t).    (2.12)

Thus θ(t)Δt, for tiny Δt, is akin to the conditional probability of having a spell length of exactly t, conditional on survival up to time t. It should be stressed, however, that the hazard rate is not a probability, as it refers to the exact time t and not the tiny interval thereafter. (Later we shall consider discrete time survival data. We shall see that the discrete time hazard is a (conditional) probability.) The only restriction on the hazard rate, implied by the properties of f(t) and S(t), is that:

θ(t) ≥ 0.

That is, θ(t) may be greater than one, in the same way that the probability density f(t) may be greater than one.
The probability density function f(t) summarizes the concentration of spell lengths (exit times) at each instant of time along the time axis. The hazard function summarizes the same concentration at each point of time, but conditions the expression on survival in the state up to that instant, and so can be thought of as summarizing the instantaneous transition intensity.
To understand the conditioning underpinning the hazard rate further, consider an unemployment example. Contrast (i) the conditional probability θ(12)Δt, for tiny Δt, and (ii) the unconditional probability f(12)Δt. Expression (i) denotes the probability of (re)employment in the interval (12, 12 + Δt] for a person who has already been unemployed 12 months, whereas (ii) is the probability, for an entrant to unemployment, of staying unemployed for 12 months and leaving in the interval (12, 12 + Δt]. Alternatively, take a longevity example. Contrast the unconditional probability of dying at age 12 (for all persons of a given birth cohort) with the probability of dying at age 12 given survival up to that age.
Economists may recognise the expression for the hazard as being of the same form as the ‘inverse Mills ratio’ that is used when accounting for sample selection biases.
2.1.2 Key relationships between hazard and survivor functions
I now show that there is a one-to-one relationship between a specification for the hazard rate and a specification for the survivor function. That is, whatever functional form is chosen for θ(t), one can derive S(t) and F(t) from it, and also f(t) and H(t). Indeed, in principle, one can start from any one of these different characterisations of the distribution, and derive the others from it. In practice, one typically starts from considerations of the shape of the hazard rate:

θ(t) = f(t)/[1 − F(t)]    (2.13)
= {−∂[1 − F(t)]/∂t}/[1 − F(t)]    (2.14)
= ∂{−ln[1 − F(t)]}/∂t    (2.15)
= ∂{−ln[S(t)]}/∂t    (2.16)

using the fact that ∂ln[S(x)]/∂x = S′(x)/S(x) and S(t) = 1 − F(t). Now integrate both sides:

∫₀ᵗ θ(u)du = −ln[1 − F(u)] |₀ᵗ.    (2.17)

But F(0) = 0 and ln(1) = 0, so

ln[1 − F(t)] = ln[S(t)] = −∫₀ᵗ θ(u)du, i.e.    (2.18)
S(t) = exp[−∫₀ᵗ θ(u)du]    (2.19)
S(t) = exp[−H(t)]    (2.20)

where the integrated hazard function, H(t), is

H(t) = ∫₀ᵗ θ(u)du    (2.21)
= −ln[S(t)].    (2.22)

Observe that

H(t) ≥ 0 and ∂H(t)/∂t = θ(t).

We have therefore demonstrated the one-to-one relationships between the various concepts. In Chapter 3, we take a number of common specifications for the hazard rate and derive the corresponding survivor functions, integrated hazard functions, and density functions.
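The relationship S(t) = exp[−H(t)] can be checked numerically. The sketch below uses a Weibull-type hazard with illustrative parameter values, and a simple trapezoidal rule standing in for exact integration; it recovers the survivor function from the hazard alone and compares it with the known closed form.

```python
import math

# Weibull hazard theta(t) = alpha * t**(alpha - 1) * lam (parameter values are illustrative)
alpha, lam = 1.5, 0.8

def theta(t):
    return alpha * t ** (alpha - 1) * lam

def integrated_hazard(t, steps=100_000):
    # trapezoidal rule for H(t) = integral of theta over (0, t]; theta(0) = 0 here since alpha > 1
    h = t / steps
    vals = [theta(i * h) for i in range(1, steps + 1)]
    return h * (sum(vals) - vals[-1] / 2)

def S_from_hazard(t):
    return math.exp(-integrated_hazard(t))   # eq. (2.19)/(2.20)

def S_closed_form(t):
    return math.exp(-lam * t ** alpha)       # known Weibull survivor function

print(S_from_hazard(2.0), S_closed_form(2.0))
```

The two values agree to several decimal places, illustrating that the hazard characterises the whole distribution.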
2.2 Discrete time
As discussed earlier, discrete survival time data may arise because either (i) the time scale is intrinsically discrete, or (ii) survival occurs in continuous time but spell lengths are observed only in intervals (‘grouped’ or ‘banded’ data). Let us consider the latter case first.
2.2.1 Survival in continuous time but spell lengths are interval-censored
*Insert chart (time scale)*
We suppose that the time axis may be partitioned into a number of contiguous non-overlapping (‘disjoint’) intervals whose boundaries are the dates a_0 = 0, a_1, a_2, a_3, ..., a_k. The intervals themselves are described by

[0 = a_0, a_1], (a_1, a_2], (a_2, a_3], ..., (a_{k−1}, a_k = ∞].    (2.23)

This definition supposes that each interval (a_{j−1}, a_j] begins at the instant after the date marking the end of the previous interval (a_{j−1}); the date indexing the end of the interval (a_j) is included in the interval.¹ Observe that a_1, a_2, a_3, ... are dates (points in time), and the intervals need not be of equal length (though we will later suppose for convenience that they are).
The value of the survivor function at the time demarcating the start of the jth interval is

Pr(T > a_{j−1}) = 1 − F(a_{j−1}) = S(a_{j−1})    (2.24)

where F(.) is the Failure function defined earlier. The value of the survivor function at the end of the jth interval is

Pr(T > a_j) = 1 − F(a_j) = S(a_j).    (2.25)

The probability of exit within the jth interval is

Pr(a_{j−1} < T ≤ a_j) = F(a_j) − F(a_{j−1}) = S(a_{j−1}) − S(a_j).    (2.26)

The interval hazard rate, h(a_j), also known as the discrete hazard rate, is the probability of exit in the interval (a_{j−1}, a_j], and is defined as:

h(a_j) = Pr(a_{j−1} < T ≤ a_j | T > a_{j−1})    (2.27)
= Pr(a_{j−1} < T ≤ a_j)/Pr(T > a_{j−1})    (2.28)
= [S(a_{j−1}) − S(a_j)]/S(a_{j−1})    (2.29)
= 1 − S(a_j)/S(a_{j−1}).    (2.30)

Note that the interval hazard is a (conditional) probability, and so

0 ≤ h(a_j) ≤ 1.    (2.31)
¹One could, alternatively, define the intervals as [a_{j−1}, a_j), for j = 1, ..., k. The choice is largely irrelevant in development of the theory, though it can matter in practice, because different software packages use different conventions and this can lead to different results. (The differences arise when one splits spells into a sequence of sub-episodes – ‘episode splitting’.) I have used the definition that is consistent with Stata’s definition of intervals. The TDA package uses the other convention.
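Equation (2.30) can be put to work directly: given any continuous-time survivor function, the implied interval hazards follow from evaluating it at the interval boundaries. A sketch (the exponential survivor function and the boundary dates are illustrative choices):

```python
import math

# an assumed continuous-time survivor function (exponential with rate 0.3)
S = lambda t: math.exp(-0.3 * t)

boundaries = [0, 1, 2, 3, 4, 5]  # dates a_0 = 0, a_1, ..., a_5 (unit-length intervals)

# interval (discrete) hazard for each interval (a_{j-1}, a_j], eq. (2.30)
interval_hazards = [
    1 - S(boundaries[j]) / S(boundaries[j - 1])
    for j in range(1, len(boundaries))
]
print(interval_hazards)
```

With a constant continuous-time hazard, the interval hazards come out constant across intervals (each equal to 1 − e^{−0.3} here), as one would expect.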
In this respect, the discrete hazard rate differs from the continuous time hazard rate θ(u), which is non-negative and may be greater than one.
Although the definition used so far refers to intervals that may, in principle, be of different lengths, in practice it is typically assumed that intervals are of equal unit length (for example, a ‘week’ or a ‘month’). In this case, the time intervals can be indexed using the positive integers. Interval (a_{j−1}, a_j] may be relabelled (a_j − 1, a_j], for a_j = 1, 2, 3, 4, ..., and we may refer to this as the jth interval, and to the interval hazard rate as h(j) rather than h(a_j).
We shall assume that intervals are of unit length from now on, unless otherwise stated.
The probability of survival until the end of interval j is the product of the probabilities of not experiencing the event in each of the intervals up to and including the current one. For example, S_3 = (probability of survival through interval 1) × (probability of survival through interval 2, given survival through interval 1) × (probability of survival through interval 3, given survival through interval 2). Hence, more generally, we have:

S(j) = S_j = (1 − h_1)(1 − h_2)·····(1 − h_{j−1})(1 − h_j)    (2.32)
= Π_{k=1}^{j} (1 − h_k).    (2.33)

S(j) refers to a discrete time survivor function, written in terms of interval hazard rates. We shall reserve the notation S(a_j) or, more generally, S(t), to refer to the continuous time survivor function, indexed by a date – an instant of time – rather than an interval of time. (Of course the two functions imply exactly the same value since, by construction, the end of the jth interval is date a_j.) In the special case in which the hazard rate is constant over time (in which case survival times follow a Geometric distribution), i.e. h_j = h, all j, then

S(j) = (1 − h)^j    (2.34)
log[S(j)] = j log(1 − h).    (2.35)
The discrete time failure function, F(j), is

F(j) = F_j = 1 − S(j)    (2.36)
= 1 − Π_{k=1}^{j} (1 − h_k).    (2.37)
The discrete time density function for the interval-censored case, f(j), is the probability of exit within the jth interval:

f(j) = Pr(a_{j−1} < T ≤ a_j)    (2.38)
= S(j − 1) − S(j)
= S(j)/(1 − h_j) − S(j)
= [1/(1 − h_j) − 1] S(j)
= [h_j/(1 − h_j)] Π_{k=1}^{j} (1 − h_k).    (2.39)

Thus the discrete density is the probability of surviving up to the end of interval j − 1, multiplied by the probability of exiting in the jth interval. Expression (2.39) is used extensively in later chapters when deriving expressions for sample likelihoods. Observe that

0 ≤ f(j) ≤ 1.    (2.40)
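The product formula (2.33) and the density (2.39) translate directly into a simple recursion. A sketch, with made-up interval hazard rates:

```python
# interval hazard rates h_1, ..., h_5 (illustrative values)
h = [0.2, 0.3, 0.25, 0.4, 0.5]

S = []   # S(j) = prod_{k<=j} (1 - h_k), eq. (2.33)
f = []   # f(j) = h_j * S(j-1), eq. (2.39)
surv = 1.0
for hj in h:
    f.append(hj * surv)   # survive to end of interval j-1, then exit in interval j
    surv *= 1 - hj
    S.append(surv)

print(S)
print(f)
print(sum(f) + S[-1])  # exit probabilities plus survival beyond interval 5 sum to one
```

The final check works because the densities telescope: Σ f(j) = 1 − Π (1 − h_k).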
2.2.2 The discrete time hazard when time is intrinsically discrete
In the case in which survival times are intrinsically discrete, survival time T is now a discrete random variable with probabilities

f(j) = f_j = Pr(T = j)    (2.41)

where j ∈ {1, 2, 3, ...}, the set of positive integers. Note that j now indexes ‘cycles’ rather than intervals of equal length. But we can apply the same notation; we index survival times using the set of positive integers in both cases. The discrete time survivor function for cycle j, showing the probability of survival for j cycles, is given by:

S(j) = Pr(T > j) = Σ_{k=j+1}^{∞} f_k.    (2.42)

The discrete time hazard at j, h(j), is the conditional probability of the event at j (with conditioning on survival until completion of the cycle immediately before the cycle at which the event occurs):

h(j) = Pr(T = j | T ≥ j)    (2.43)
= f(j)/S(j − 1).    (2.44)
It is more illuminating to write the discrete time survivor function analogously to the expression for the equal-length interval case discussed above (for survival to the end of interval j), i.e.:

S_j = S(j) = (1 − h_1)(1 − h_2)·····(1 − h_{j−1})(1 − h_j)    (2.45)
= Π_{k=1}^{j} (1 − h_k).    (2.46)

The discrete time failure function is:

F_j = F(j) = 1 − S(j)    (2.47)
= 1 − Π_{k=1}^{j} (1 − h_k).    (2.48)

Observe that the discrete time density function can also be written as in the interval-censored case, i.e.:

f(j) = h_j S_{j−1} = [h_j/(1 − h_j)] S_j.    (2.49)
2.2.3 The link between the continuous time and discrete time cases
In the discrete time case, we have from (2.45) that:

log S(j) = Σ_{k=1}^{j} log(1 − h_k).    (2.50)

For ‘small’ h_k, a first-order Taylor series approximation may be used to show that

log(1 − h_k) ≈ −h_k    (2.51)

which implies, in turn, that

log S(j) ≈ −Σ_{k=1}^{j} h_k.    (2.52)

Now contrast this expression with that for the continuous time case, and note the parallels between the summation over discrete hazard rates and the integration over continuous time hazard rates:

log S(t) = −H(t) = −∫₀ᵗ θ(u)du.    (2.53)

As h_k becomes smaller and smaller, the closer the discrete time hazard h_j is to the continuous time hazard θ(t) and, correspondingly, the discrete time survivor function tends to the continuous time one.
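The quality of the first-order approximation in (2.51)–(2.52) is easy to inspect numerically. A sketch with a constant hazard h over j = 10 cycles (the hazard values are chosen for illustration):

```python
import math

def compare(h, j=10):
    # exact log S(j) versus the approximation -sum of hazards, for a constant hazard h
    exact = j * math.log(1 - h)
    approx = -j * h
    return exact, approx

for h in (0.5, 0.1, 0.01):
    exact, approx = compare(h)
    print(h, exact, approx)  # the gap shrinks as h gets smaller
```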
2.3 Choosing a specification for the hazard rate
The empirical analyst with survival time data to hand has choices to make before analyzing them. First, should the survival times be treated as observations on a continuous random variable, observations on a continuous random variable which is grouped (interval censoring), or observations on an intrinsically discrete random variable? Second, conditional on that choice, what is the shape of the all-important relationship between the hazard rate and survival time? Let us consider these questions in turn.
2.3.1 Continuous or discrete survival time data?
The answer to this question may often be obvious, and clear from consideration of both the underlying behavioural process generating survival times, and the process by which the data were recorded. Intrinsically discrete survival times are rare in the social sciences. The vast majority of the behavioural processes that social scientists study occur in continuous time, but it is common for the data summarizing spell lengths to be recorded in grouped form. Indeed virtually all data are grouped (even with survival times recorded in units as small as days or hours).
A key issue, then, is the length of the intervals used for grouping relative to the typical spell length: the smaller the ratio of the former to the latter, the more appropriate it is to use a continuous time specification.
If one has information about the day, month, and year in which a spell began, and also the day, month, and year at which subjects were last observed – so survival times are measured in days – and the typical spell length is several months or years, then it is reasonable to treat survival times as observations on a continuous random variable (not grouped). But if spell lengths are typically only a few days long, then recording them in units of days implies substantial grouping. It would then make sense to use a specification that accounted for the interval censoring. A related issue concerns ‘tied’ survival times – more than one individual in the data set with the same recorded survival time. A relatively high prevalence of ties may indicate that the banding of survival times should be taken into account when choosing the specification. For some analysis of the effects of grouping, see Bergström and Edin (1992), Petersen (1991), and Petersen and Koput (1992).
Historically, many of the methods developed for analysis of survival time data assumed that the data set contained observations on a continuous random variable (and arose in applications where this assumption was reasonable). Application of these methods to social science data, often interval-censored, was not necessarily appropriate. Today, this is much less of a problem. Methods for handling interval-censored data (or intrinsically discrete data) are increasingly available, and one of the aims of this book is to promulgate them.
2.3.2 The relationship between the hazard and survival time
The only restriction imposed so far on the continuous time hazard rate is θ(t) ≥ 0, and on the discrete time hazard rate is 0 ≤ h(j) ≤ 1. Nothing has been assumed about the pattern of duration dependence – how the hazard varies with survival time. The following desiderata suggest themselves:
• A shape that is empirically relevant (or suggested by theoretical models). This is likely to differ between applications – the functional form describing human mortality is likely to differ from that for transitions out of unemployment, for failure of machine tools, and so on.
• A shape that has convenient mathematical properties, e.g. closed-form expressions for θ(t) and S(t), and tractable expressions for summary statistics of survival time distributions such as the mean and median survival time.
2.3.3 What guidance from economics?
As an illustration of the first point, consider the extent to which economic theory might provide suggestions for what the shape of the hazard rate is like. Consider a two-state labour market, where the two states are (1) employment, and (2) unemployment. Hence the only way to leave unemployment is by becoming employed. To leave unemployment requires that an unemployed person both receives a job offer, and that that offer is acceptable. (The job offer probability is conventionally considered to be under the choice of firms, and the acceptance probability dependent on the choice of workers.) For a given worker, we may write the unemployment exit hazard rate θ(t) as the product of the job offer hazard λ(t) and the job acceptance hazard A(t):

θ(t) = λ(t)A(t).    (2.54)

A simple structural model
In a simple job search framework, the unemployed person searches across the distribution of wage offers, and the optimal policy is to adopt a reservation wage r, and accept a job offer with associated wage w only if w ≥ r. Hence,

θ(t) = λ(t)[1 − W(r(t))]    (2.55)

where W(.) is the cdf of the wage offer distribution facing the worker and r(t) is the reservation wage at unemployment duration t. How the re-employment hazard varies with duration thus depends on:
1. How the reservation wage varies with duration of unemployment. (In an infinite horizon world one would expect r to be constant; in a finite horizon world, one would expect r to decline with the duration of unemployment.)
2. How the job offer hazard λ varies with duration of unemployment. (It is unclear what to expect.)
In sum, the simple search model provides some quite strong restrictions on the hazard rate (if influences via λ are negligible).
Reduced form approach
An alternative interpretation is that the simple search model (or even more sophisticated variants) places too many restrictions on the hazard function, i.e. the restrictions may be a consequence of the simplifying assumptions used to make the model tractable, rather than intrinsic. Hence one may want to use a reduced form approach, and write the hazard rate more generally as

θ(t) = θ(X(t, s), t)    (2.56)

where X is a vector of personal characteristics that may vary with unemployment duration (t) or with calendar time (s). That is, we allow, in a more ad hoc way, for the fact that:
1. unemployment benefits may vary with duration t, and maybe also calendar time s (because of policy changes, for example); and
2. local labour market conditions may vary with calendar time (s); and
3. θ may also vary directly with survival time, t.
Examples of this include:
• Employers screening unemployed applicants on the basis of how long each applicant has been unemployed (for example, rejecting the longer-term unemployed): ∂λ/∂t < 0.
• The reservation wage falling with unemployment duration: ∂A/∂t > 0 (a resource effect).
• Discouragement (or a ‘welfare culture’ or ‘benefit dependence’ effect) may set in as the unemployment spell lengthens, leading to a decline in search intensity: ∂λ/∂t < 0.
• Time limits on eligibility for Unemployment Insurance (UI) may lead to a benefit exhaustion effect, with the re-employment hazard (θ) rising as the time limit approaches.
Observe the variety of potential influences. Moreover, some of the influences mentioned would imply that the hazard rises with unemployment duration, whereas others imply that the hazard declines with duration. The actual shape of the hazard will reflect a mixture of these effects. This suggests that it is important not to pre-impose a particular shape on the hazard function. What one needs to do is strike a balance between what a theoretical (albeit simplified) model might suggest, and flexibility in specification and ease of estimation. This will improve model fit, though of course problems of interpretation may remain. (Without the structural model, one cannot identify which influence has which effect so easily.)
Although the examples above referred to the modelling of unemployment duration, much the same issues for model selection are likely to arise in other contexts.
Chapter 3
Functional forms for the hazard rate
3.1 Introduction and overview: a taxonomy
The last chapter suggested that there is no single shape for the hazard rate that is appropriate in all contexts. In this chapter we review the functional forms for the hazard rate that have been most commonly used in the literature. What this means in terms of survivor, density and integrated hazard functions is examined in the following chapter.
We begin with an overview and taxonomy, and then consider continuous time and discrete time specifications separately. See Table 3.1. The entries in the table in the emphasized font are the ones that we shall focus on. One further distinction between the specifications, discussed later in the chapter, refers to their interpretation – whether the models can be described as
• proportional hazards models (PH), or
• accelerated failure time models (AFT).
Think of the PH and AFT classifications as describing whole families of survival time distributions, where each member of a family shares a common set of features and properties.
Note that although we specified hazard functions simply as a function of survival time in the last chapter, we now add an additional dimension to the specification – allowing the hazard rate to vary between individuals, depending on their characteristics.
Continuous time parametric    Continuous time semi-parametric    Discrete time
Exponential                   Piecewise-constant Exponential     Logistic
Weibull                       ‘Cox’ model                        Complementary log-log
Log-logistic
Lognormal
Gompertz
Generalized Gamma
Table 3.1: Functional forms for the hazard rate: examples
3.2 Continuous time specifications
In the previous chapter, we referred to the continuous time hazard rate, θ(t), and ignored any potential differences in hazard rates between individuals. Now we remedy that. We suppose that the characteristics of a given individual may be summarized by a vector of variables X and, correspondingly, now refer to the hazard function θ(t, X) and survivor function S(t, X), integrated hazard function H(t, X), and so on.
The way in which heterogeneity is incorporated is as follows. We define a linear combination of the characteristics:

β′X = β_0 + β_1X_1 + β_2X_2 + β_3X_3 + . . . + β_KX_K.    (3.1)

There are K variables observed for each person, and the βs are parameters, later to be estimated. Observe that the linear index β′X, when evaluated, is simply a single number for each individual. For the moment, suppose that the values of the Xs do not vary with survival or calendar time, i.e. there are no time-varying covariates.
3.2.1 Weibull model and Exponential model
The Weibull model is specified as:

θ(t, X) = αt^{α−1} exp(β′X)    (3.2)
= αt^{α−1}λ    (3.3)

where λ = exp(β′X), α > 0, and exp(.) is the exponential function. The hazard rate either rises monotonically with time (α > 1), falls monotonically with time (α < 1), or is constant. The last case, α = 1, is the special case of the Weibull model known as the Exponential model. For a given value of α, larger values of λ imply a larger hazard rate at each survival time. α is the shape parameter.
Beware: different normalisations and notations are used by different authors. For example, you may see the Weibull hazard written as θ(t, X) = (1/σ)t^{(1/σ)−1}λ where σ = 1/α. (The reason for this will become clearer in the next section.) Cox and Oakes (1984) characterise the Weibull model as θ(t, X) = κλ(λt)^{κ−1} = κλ^κ t^{κ−1}.
*Insert charts* showing how the hazard varies with (i) variations in α with fixed λ; (ii) variations in λ with fixed α.
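In place of the charts, a quick tabulation of the Weibull hazard (with λ fixed at 1, an illustrative choice) confirms the three regimes:

```python
# Weibull hazard theta(t) = alpha * t**(alpha - 1) * lam, tabulated for a few shapes
lam = 1.0
times = [0.25, 0.5, 1.0, 2.0, 4.0]

def weibull_hazard(t, alpha, lam=1.0):
    return alpha * t ** (alpha - 1) * lam

for alpha, label in [(0.5, "falling"), (1.0, "constant"), (1.5, "rising")]:
    row = [round(weibull_hazard(t, alpha, lam), 3) for t in times]
    print(f"alpha={alpha} ({label}):", row)
```

The α = 1 row reproduces the Exponential model's constant hazard.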
3.2.2 Gompertz model
The Gompertz model has a hazard function given by:

θ(t, X) = λ exp(γt)    (3.4)

and so the log of the hazard is a linear function of survival time:

log θ(t, X) = β′X + γt.    (3.5)

If the shape parameter γ > 0, then the hazard is monotonically increasing; if γ = 0, it is constant; and if γ < 0, the hazard declines monotonically. *Insert charts* showing how the hazard varies with (i) variations in γ with fixed λ; (ii) variations in λ with fixed γ.
3.2.3 Loglogistic Model
To specify the next model, we use the parameterisation c = exp(÷
0
A), where
0
A =
0
+
1
A
1
+
2
A
2
+
3
A
3
+. . . +
1
A
1
. (3.6)
The reason for using a parameterization based on c and
, rather than on `
and , as for the Weibull model, will become apparent later, when we compare
proportional hazard and accelerated failure time models.
The Loglogistic model hazard rate is:

θ(t, X) = [ψ^{1/γ} t^{(1/γ)−1}] / (γ[1 + (ψt)^{1/γ}])   (3.7)

where the shape parameter γ > 0. An alternative representation is:

θ(t, X) = [φψ^φ t^{φ−1}] / [1 + (ψt)^φ]   (3.8)

where φ = 1/γ. Klein and Moeschberger (1997) characterise the loglogistic model hazard function as θ(t, X) = φμt^{φ−1}/(1 + μt^φ), which is equivalent to the expression above with μ = ψ^φ.
*Insert chart* of hazard rate against time.
Observe that the hazard rate is monotonically decreasing with survival time for γ ≥ 1 (i.e. φ ≤ 1). If γ < 1 (i.e. φ > 1), then the hazard first rises with time and then falls monotonically.
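A minimal numerical sketch of these two shape regimes, using (3.8) with illustrative parameter values (not from the book):

```python
def loglogistic_hazard(t, psi, gamma):
    """Loglogistic hazard, eq (3.8): phi * psi**phi * t**(phi - 1) / (1 + (psi*t)**phi)."""
    phi = 1.0 / gamma
    return phi * psi ** phi * t ** (phi - 1) / (1.0 + (psi * t) ** phi)

# gamma >= 1 (phi <= 1): monotonically decreasing; gamma < 1 (phi > 1): rises then falls
decreasing = [loglogistic_hazard(t, 1.0, 1.0) for t in (0.5, 1.0, 2.0)]
humped = [loglogistic_hazard(t, 1.0, 0.5) for t in (0.2, 1.0, 5.0)]
```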
28 CHAPTER 3. FUNCTIONAL FORMS FOR THE HAZARD RATE
3.2.4 Lognormal model
This model has hazard rate

θ(t, X) = { [1/(tσ√(2π))] exp[ −(1/2)((ln(t) − μ)/σ)² ] } / { 1 − Φ[(ln(t) − μ)/σ] }   (3.9)

where Φ(·) is the standard Normal cumulative distribution function and characteristics are incorporated with the parameterisation μ = β′X. The hazard rate is similar to that for the Loglogistic model for the case γ < 1 (i.e. first rising and then declining).
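A minimal sketch of (3.9) using the standard Normal cdf via the error function; μ and σ values are illustrative, not from the book:

```python
import math

def norm_cdf(z):
    """Standard Normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def lognormal_hazard(t, mu, sigma):
    """theta(t) = f(t) / S(t) for ln(T) ~ Normal(mu, sigma**2), eq (3.9)."""
    z = (math.log(t) - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / (t * sigma * math.sqrt(2.0 * math.pi))
    return pdf / (1.0 - norm_cdf(z))

# the lognormal hazard first rises and then falls
h_early, h_mid, h_late = (lognormal_hazard(t, 0.0, 1.0) for t in (0.1, 1.0, 20.0))
```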
3.2.5 Generalised Gamma model
This model has a rather complicated specification involving two shape parameters. Let us label them κ and σ, as in the Stata manual: they are the shape and scale parameters respectively. The hazard function is quite flexible in shape, even including the possibility of a U shaped or so-called ‘bathtub’ shaped hazard (commonly cited as a plausible description for the hazard of human mortality looking at the lifetime as a whole). The Generalised Gamma incorporates several of the other models as special cases. If κ = 1, we have the Weibull model; if κ = 1, σ = 1, we have the Exponential model. With κ = 0, the Lognormal model results. And if κ = σ, then one has the standard Gamma distribution. These relationships mean that the Generalised Gamma is useful for testing model specification: by estimating this general model, one can use a Wald (or likelihood ratio) test to investigate whether one of the nested models provides a satisfactory fit to the data.
*Insert chart* of hazard rate against time.
Note that a number of other parametric models have been proposed. For example, there is the generalized F distribution (see Kalbfleisch and Prentice 1980).
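The likelihood-ratio comparison of a nested model against the Generalised Gamma can be sketched as follows; the maximised log-likelihood values below are made up for illustration, and the single-restriction chi-squared p-value uses the identity P(χ²₁ > x) = erfc(√(x/2)):

```python
import math

def lr_test_1df(loglik_general, loglik_restricted):
    """LR test of one nested restriction: returns the statistic and its
    chi-squared(1) p-value, using P(chi2_1 > x) = erfc(sqrt(x / 2))."""
    lr = 2.0 * (loglik_general - loglik_restricted)
    return lr, math.erfc(math.sqrt(lr / 2.0))

# illustrative (made-up) maximised log-likelihoods for the Generalised Gamma
# model and a nested special case such as the Weibull (kappa = 1)
lr_stat, p_value = lr_test_1df(-1040.2, -1043.9)
```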
3.2.6 Proportional Hazards (PH) models
Let us now return to the general case of hazard rate θ(t, X), i.e. the hazard rate at survival time t for a person with fixed covariates summarised by the vector X.
The PH specification
Proportional hazards models are also known as ‘multiplicative hazard’ models, or ‘log relative hazard’ models, for reasons that will become apparent shortly. The models are characterised by their satisfying a separability assumption:

θ(t, X) = θ₀(t) exp(β′X) = θ₀(t) λ   (3.10)

where

• θ₀(t): the ‘baseline hazard’ function, which depends on t (but not X). It summarizes the pattern of ‘duration dependence’, assumed to be common to all persons;

• λ = exp(β′X): a person-specific non-negative function of covariates X (which does not depend on t, by construction), which scales the baseline hazard function common to all persons. In principle, any non-negative function might be used to summarise the effects of differences in personal characteristics, but since exp(·) is the function that is virtually always used, we will use it.
Note that I have used one particular convention for the definition of the baseline hazard function. An alternative characterization writes the proportional hazard model as

θ(t, X) = θ₀*(t) λ*   (3.11)

where

θ₀*(t) = θ₀(t) exp(β₀) and λ* = exp(β₁X₁ + β₂X₂ + β₃X₃ + … + β_K X_K).   (3.12)

The rationale for this alternative representation is that the intercept term (β₀) is common to all individuals, and is therefore not included in the term summarizing individual heterogeneity. By contrast, the first representation treats the intercept as a regression parameter like the other elements of β. In most cases, it does not matter which characterization is used.
Interpretation
The PH property implies that absolute differences in X imply proportionate differences in the hazard at each t. For some t = t̄, and for two persons i and j with vectors of characteristics Xᵢ and Xⱼ,

θ(t̄, Xᵢ) / θ(t̄, Xⱼ) = exp(β′Xᵢ − β′Xⱼ) = exp[β′(Xᵢ − Xⱼ)].   (3.13)

We can also write (3.13) in ‘log relative hazard’ form:

log[ θ(t̄, Xᵢ) / θ(t̄, Xⱼ) ] = β′(Xᵢ − Xⱼ).   (3.14)

Observe that the right-hand side of these expressions does not depend on survival time (by assumption the covariates are not time dependent), i.e. the proportional difference result in hazards is constant.
If persons i and j are identical on all but the kth characteristic, i.e. Xᵢₘ = Xⱼₘ for all m ∈ {1, …, K}\{k}, then

θ(t̄, Xᵢ) / θ(t̄, Xⱼ) = exp[βₖ(Xᵢₖ − Xⱼₖ)].   (3.15)

If, in addition, Xᵢₖ − Xⱼₖ = 1, i.e. there is a one unit change in Xₖ, ceteris paribus, then

θ(t̄, Xᵢ) / θ(t̄, Xⱼ) = exp(βₖ).   (3.16)

The right hand side of this expression is known as the hazard ratio. It also shows the proportionate change in the hazard given a change in a dummy variable covariate from zero to one or, more precisely, a change from Xₖ = 0 to Xₖ = 1, with all other covariates held fixed.
There is a nice interpretation of the regression coefficients in PH models. The coefficient on the kth covariate Xₖ, βₖ, has the property

βₖ = ∂ log θ(t, X) / ∂Xₖ   (3.17)

which tells us that in a PH model, each regression coefficient summarises the proportional effect on the hazard of absolute changes in the corresponding covariate. This effect does not vary with survival time.
Alternatively, we can relate βₖ to the elasticity of the hazard with respect to Xₖ. Recall that an elasticity summarizes the proportional response in one variable to a proportionate change in another variable, and so is given by Xₖ ∂ log θ(t, X)/∂Xₖ = βₖXₖ. If Xₖ = log(Zₖ), so that the covariate is measured in logs, then it follows that βₖ is the elasticity of the hazard with respect to Zₖ.
If Stata is used to estimate a PH model, you can choose to report estimates of either β̂ₖ or exp(β̂ₖ), for each k.
In addition, observe that, for two persons with the same X = X̄, but different survival times:

θ(t, X̄) / θ(u, X̄) = θ₀(t) / θ₀(u)   (3.18)

so the ratio of hazards does not depend on X̄ in this case.
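A minimal numerical illustration of the hazard-ratio property (3.16), using a Weibull baseline and made-up coefficient values (not from the book):

```python
import math

def weibull_ph_hazard(t, x, alpha, beta):
    """Weibull PH hazard: theta0(t) * exp(beta'x) with theta0(t) = alpha * t**(alpha - 1)."""
    lam = math.exp(sum(b * xk for b, xk in zip(beta, x)))
    return alpha * t ** (alpha - 1) * lam

beta = [0.5, -0.2]                 # illustrative coefficients
xi, xj = [1.0, 2.0], [0.0, 2.0]    # identical except a one-unit difference in X_1

# (3.16): the hazard ratio equals exp(beta_1) and is the same at every survival time
r1 = weibull_ph_hazard(1.0, xi, 1.5, beta) / weibull_ph_hazard(1.0, xj, 1.5, beta)
r2 = weibull_ph_hazard(4.0, xi, 1.5, beta) / weibull_ph_hazard(4.0, xj, 1.5, beta)
```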
Some further implications of the PH assumption
If θ(t, X) = θ₀(t)λ and λ does not vary with survival time, i.e. the PH specification applies, then

S(t, X) = exp[ −∫₀ᵗ θ(u) du ]   (3.19)
        = exp[ −λ ∫₀ᵗ θ₀(u) du ]   (3.20)
        = [S₀(t)]^λ   (3.21)

given the ‘baseline’ survivor function, S₀(t), where

S₀(t) = exp[ −∫₀ᵗ θ₀(u) du ].   (3.22)

Thus log S(t, X) = λ log S₀(t). You can verify that it is also the case that log S(t, X) = λ* log S₀*(t), where S₀*(t) = exp[ −∫₀ᵗ θ₀*(u) du ].
Hence, in the PH model, differences in characteristics imply a scaling of the common baseline survivor function. Given the relationships between the survivor function and the probability density function, we also find that for a PH model

f(t) = f₀(t) λ [S₀(t)]^{λ−1}   (3.23)

where f₀(t) is the baseline density function, f₀(t) = f(t|X = 0).
Alternatively, (3.19) may be rewritten in terms of the integrated hazard function H(t) = −ln S(t), implying

H(t) = λH₀(t)   (3.24)

where H₀(t) = −ln S₀(t).
If the baseline hazard does not vary with survival time, θ₀(u) = θ̄ for all u, then H₀(t) = ∫₀ᵗ θ₀(u) du = θ̄ ∫₀ᵗ du = θ̄t. So, in the constant hazard rate case, a plot of the integrated hazard against survival time should yield a straight line through the origin. Indeed, if the integrated hazard plot is concave to the origin (rising but at a decreasing rate), then this suggests that the hazard rate declines with survival time rather than being constant. Similarly, if the integrated hazard plot is convex to the origin (rising but at an increasing rate), then this suggests that the hazard rate increases with survival time.
The relationship between the survivor function and baseline survivor function also implies that

ln[−ln S(t)] = ln λ + ln(−ln[S₀(t)])   (3.25)
             = β′X + ln(−ln[S₀(t)])   (3.26)

or, rewritten in terms of the integrated hazard function,

ln[H(t)] = β′X + ln[H₀(t)].   (3.27)

The left-hand side is the log of the integrated hazard function at survival time t. The result suggests that one can check informally whether a model satisfies the PH assumption by plotting the log of the integrated hazard function against survival time (or a function of time) for groups of persons with different sets of characteristics. (The estimate of the integrated hazard would be derived using a non-parametric estimator such as the Nelson-Aalen estimator referred to in Chapter 4.) If the PH assumption holds, then the plots of the estimated ln[H(t)] against t for each of the groups should have different vertical intercepts but a common shape, moving in parallel. To see this, suppose that there are two groups, labelled A and B. Hence,

ln[H_A(t)] = β′X_A + ln[H₀(t)]
ln[H_B(t)] = β′X_B + ln[H₀(t)].

The group-specific intercepts are the numbers implied by β′X_A and β′X_B, whereas the common shape arises because the remaining term, the function of t, is common to both groups.
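The parallel-curves property can be sketched numerically; here a Weibull baseline (H₀(t) = t^α) and the group index values are illustrative assumptions, not from the book:

```python
import math

# (3.27): ln H(t) = beta'X + ln H0(t), so ln H(t) curves for two groups are
# vertically parallel under PH; Weibull baseline H0(t) = t**alpha for illustration
def ln_H(t, bx, alpha):
    return bx + alpha * math.log(t)

bx_A, bx_B, alpha = 0.8, 0.2, 1.5
gap_at_1 = ln_H(1.0, bx_A, alpha) - ln_H(1.0, bx_B, alpha)
gap_at_5 = ln_H(5.0, bx_A, alpha) - ln_H(5.0, bx_B, alpha)
```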
Incorporating time-varying covariates
It is relatively straightforward in principle to generalise the PH specification to allow for time-varying covariates. For example, suppose that

θ(t, Xₜ) = θ₀(t) exp(β′Xₜ)   (3.28)

where there is now a ‘t’ subscript on the vector of characteristics X. Observe that for any given survival time t = t̄, we still have that absolute differences in X correspond to proportional differences in the hazard. However, the proportionality factor now varies with survival time rather than being constant (since the variables in X on the right-hand side of the equation are time-varying). The survivor function is more complicated too. From the standard relationship between the hazard and the survivor functions, we have:

S(t, Xₜ) = exp[ −∫₀ᵗ θ(u) du ]   (3.29)
         = exp[ −∫₀ᵗ θ₀(u) exp(β′Xᵤ) du ].   (3.30)

In general, the survivor function cannot be simply factored as in the case when covariates are constant. Progress can be made, however, by assuming that each covariate is constant within some predefined time interval. To illustrate this, suppose that there is a single covariate that takes on two values, depending on whether survival time is before or after some date s, i.e.

X = X₁ if t < s
X = X₂ if t ≥ s.   (3.31)

The survivor function (3.29) now becomes
S(t, Xₜ) = exp[ −∫₀ˢ θ₀(u) exp(βX₁) du − ∫ₛᵗ θ₀(u) exp(βX₂) du ]   (3.32)
         = exp[ −λ₁ ∫₀ˢ θ₀(u) du − λ₂ ∫ₛᵗ θ₀(u) du ]   (3.33)
         = exp[ −λ₁ ∫₀ˢ θ₀(u) du ] exp[ −λ₂ ∫ₛᵗ θ₀(u) du ]   (3.34)
         = [S₀(s)]^{λ₁} [S₀(t)]^{λ₂} / [S₀(s)]^{λ₂}.   (3.35)

The probability of survival until time t is the probability of survival until time s, times the probability of survival until time t conditional on survival up until time s (the conditioning is accounted for by the expression in the denominator of the second term in the product). The density function can be derived in the usual way as the product of the survivor function and the hazard rate.
Expressions of this form underpin the process of estimation of PH models with time-varying covariates. As we shall see in Chapter 5, there are two parts to practical implementation: (a) reorganisation of one's data set to incorporate the time-varying covariates (‘episode splitting’), and (b) utilisation of estimation routines that allow for conditioned survivor functions such as those in the second term in (3.32) – so-called ‘delayed entry’.
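The factorisation (3.35) can be checked numerically; the Weibull baseline and all parameter values below are illustrative assumptions, not from the book:

```python
import math

# Check of (3.35): with a covariate switching at date s and a Weibull baseline
# S0(t) = exp(-t**alpha), the survivor function factors into the survival to s
# times the conditional survival from s to t.
alpha, s, t = 1.5, 2.0, 5.0
lam1, lam2 = math.exp(0.4), math.exp(-0.3)

def S0(u):
    return math.exp(-u ** alpha)

# direct evaluation of the piecewise-constant-covariate survivor function
direct = math.exp(-lam1 * s ** alpha - lam2 * (t ** alpha - s ** alpha))
factored = S0(s) ** lam1 * S0(t) ** lam2 / S0(s) ** lam2
```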
3.2.7 Accelerated Failure Time (AFT) models
Let us again assume for the moment that personal characteristics do not vary with survival time.

AFT specification

The AFT model assumes a linear relationship between the log of (latent) survival time T and characteristics X:

ln(T) = β*′X + z   (3.36)

where β* is a vector of parameters (cf. (3.6)), and z is an error term. This expression may be rewritten as

Y = μ + σu   (3.37)

or

(Y − μ)/σ = u   (3.38)

where Y = ln(T), μ = β*′X, and u = z/σ is an error term with density function f(u), and σ is a scale factor (which is related to the shape parameters for the hazard function – see below). This form is sometimes referred to as a generalized linear model (GLM) or log-linear model specification.
Distribution of u               Distribution of T
Extreme Value (1 parameter)     Exponential
Extreme Value (2 parameter)     Weibull
Logistic                        Loglogistic
Normal                          Lognormal
log Gamma (3 parameter Gamma)   Generalised Gamma

Table 3.2: Different error term distributions imply different AFT models

Distributional assumptions about u determine which sort of regression model describes the random variable T: see Table 3.2.
In the case of the Exponential distribution, the scale factor σ = 1, and f(u) = exp[u − exp(u)]. In the Weibull model, σ is a free parameter, and σ = 1/α to use our earlier notation. The density is f(u) = α exp[αu − exp(αu)] in this case. For the Loglogistic model, f(u) = exp(u)/[1 + exp(u)]², and the free parameter σ = γ, to use our earlier notation.
For the Lognormal model, the error term u is normally distributed, and we have a specification of a (log)linear model that appears to be similar to the standard linear model that is typically estimated using ordinary least squares. This is indeed the case: it is the example that we considered in the Introduction. The specification underscores the claim that the reason that OLS did not provide good estimates in the example was that it could not handle censored data, rather than there being a problem with the specification itself. (The trick to handle censored data turns out to hinge on a different method of estimation, i.e. maximum likelihood. See Chapter 5.) The table also indicates that the normal linear model might be inappropriate for an additional reason: the hazard rate function may have a shape other than that assumed by the lognormal specification.
Interpretation
But why the ‘Accelerated Failure Time’ label for these models? From (3.36), and letting ψ = exp(−β*′X) = exp(−μ), it follows that:

ln(ψT) = z.   (3.39)

The term ψ, which is constant by assumption, acts like a time scaling factor. The two key cases are as follows:

• ψ > 1: it is as if the clock ticks faster (the time scale for someone with characteristics X is ψT, whereas the time scale for someone with characteristics X = 0 is T). Failure is ‘accelerated’ (survival time shortened).

• ψ < 1: it is as if the clock ticks slower. Failure is ‘decelerated’ (survival time lengthened).
We can also see this time scaling property directly in terms of the survivor function. Recall the definition of the survivor function:

S(t, X) = Pr[T > t | X].   (3.40)

Expression (3.40) is equivalent to writing:

S(t, X) = Pr[Y > ln(t) | X]   (3.41)
        = Pr[σu > ln(t) − μ]   (3.42)
        = Pr[exp(σu) > t exp(−μ)].   (3.43)

Now define a ‘baseline’ survivor function, S₀(t), which is the corresponding function when all covariates are equal to zero, i.e. X = 0, in which case μ = β₀*, and exp(−μ) = exp(−β₀*) = ψ₀. Thus:

S₀(t) = Pr[T > t | X = 0].   (3.44)

This expression can be rewritten using (3.43) as

S₀(t) = Pr[exp(σu) > tψ₀]   (3.45)

or

S₀(s) = Pr[exp(σu) > sψ₀]   (3.46)

for any s. In particular, let s = t exp(−μ)/ψ₀, and substitute this into the expression for S₀(s). Comparison with (3.43) shows that we can rewrite the survivor function as:

S(t, X) = S₀[t exp(−μ)]   (3.47)
        = S₀(ψt)   (3.48)

where ψ = exp(−μ). It follows that ψ > 1 is equivalent to having μ < 0 and ψ < 1 is equivalent to having μ > 0.
In sum, the effect of the covariates is to change the time scale by a constant (survival time-invariant) scale factor ψ = exp(−μ). When explaining this idea, Allison (1995, p. 62) provides an illustration based on longevity. The conventional wisdom is that one year for a dog is equivalent to seven years for a human, i.e. in terms of actual calendar years, dogs age faster than humans do. In terms of (3.47), if S(t, X) describes the survival probability of a dog, then S₀(t) describes the survival probability of a human, and ψ = 7.
How may we interpret the coefficients βₖ*? Differentiation shows that

βₖ* = ∂ ln(T) / ∂Xₖ.   (3.49)

Thus an AFT regression coefficient relates proportionate changes in survival time to a unit change in a given regressor, with all other characteristics held fixed. (Contrast this with the interpretation of the coefficients in the PH model – they relate a one unit change in a regressor to a proportionate change in the hazard rate, not survival time.) Stata allows you to choose to report either β̂ₖ* or exp(β̂ₖ*).
Recall from (3.36) that

T = exp(β*′X) exp(z).   (3.50)

If persons i and j are identical on all but the kth characteristic, i.e. Xᵢₘ = Xⱼₘ for all m ∈ {1, …, K}\{k}, and they have the same z, then

Tᵢ / Tⱼ = exp[βₖ*(Xᵢₖ − Xⱼₖ)].   (3.51)

If, in addition, Xᵢₖ − Xⱼₖ = 1, i.e. there is a one unit change in Xₖ, ceteris paribus, then

Tᵢ / Tⱼ = exp(βₖ*).   (3.52)

The exp(βₖ*) is called the time ratio.
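A minimal numerical illustration of the time-ratio property (3.52); the coefficient values and error draw are made up for illustration:

```python
import math

# (3.52): in an AFT model, a one-unit increase in X_k multiplies survival
# time by exp(beta_k*), the 'time ratio'
beta_star, z = [0.4, -0.1], 0.25

def latent_time(x):
    """T = exp(beta*'X) * exp(z), eq (3.50), for a common error draw z."""
    return math.exp(sum(b * xk for b, xk in zip(beta_star, x)) + z)

time_ratio = latent_time([3.0, 1.0]) / latent_time([2.0, 1.0])  # one-unit change in X_1
```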
Some further implications
Using the general result that θ(t) = −[∂S(t)/∂t]/S(t), and applying it to (3.47), the relationship between the hazard rate for someone with characteristics X and the hazard rate for the case when X = 0 (i.e. the ‘baseline’ hazard θ₀(·)) is given for AFT models by:

θ(t, X) = ψθ₀(ψt).   (3.53)

Similarly, the probability density function for AFT models is given by:

f(t, X) = ψθ₀(ψt) S₀(ψt)   (3.54)
        = ψf₀(ψt).
The Weibull model is the only model that satisfies both the PH and AFT assumptions
The Weibull distribution is the only distribution for which, with constant covariates, the PH and AFT models coincide. To show this requires finding the functional form which satisfies the restriction that

[S₀(t)]^λ = S₀(ψt).   (3.55)

See Cox and Oakes (1984, p. 71) for a proof.
This property implies a direct correspondence between the parameters used in the Weibull PH and Weibull AFT representations. What are the relationships? Recall that the PH version of the Weibull model is:

θ(t, X) = αt^{α−1} λ   (3.56)
        = θ₀(t) λ   (3.57)

with baseline hazard θ₀(t) = αt^{α−1}, which is sometimes written θ₀(t) = (1/σ)t^{(1/σ)−1} with σ = 1/α (see earlier for reasons). Making the appropriate substitutions, one can show that the Weibull AFT coefficients (β*) are related to the Weibull PH coefficients (β) as follows:

βₖ* = −σβₖ = −βₖ/α, for each k.   (3.58)

To see this for yourself, substitute the PH and AFT Weibull survivor functions into (3.55):

[exp(−t^α)]^λ = exp(−[t exp(−μ)]^α).

By taking logs of both sides, and rearranging expressions, you will find that this equality holds only if βₖ* = −βₖ/α, for each k.
In Stata, you can choose to report either β or β* (or the exponentiated values of each coefficient).
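The correspondence (3.58) can be verified numerically; the parameter values below are illustrative assumptions:

```python
import math

# Check of (3.58): with beta_k* = -beta_k / alpha, the Weibull PH and AFT
# survivor functions coincide at every survival time.
alpha, beta, x = 1.4, 0.6, 2.0
beta_star = -beta / alpha

def S_ph(t):
    """[S0(t)]**lam with S0(t) = exp(-t**alpha) and lam = exp(beta * x)."""
    return math.exp(-math.exp(beta * x) * t ** alpha)

def S_aft(t):
    """S0(psi * t) with psi = exp(-beta_star * x)."""
    return math.exp(-(math.exp(-beta_star * x) * t) ** alpha)

max_diff = max(abs(S_ph(t) - S_aft(t)) for t in (0.5, 1.0, 2.0, 4.0))
```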
Incorporating time-varying covariates
Historically, AFT models have been specified assuming that the vector summarizing personal characteristics is time-invariant. Clearly it is difficult to incorporate time-varying covariates directly into the basic equation that relates log survival time to characteristics: see (3.36). However, they can be incorporated via the hazard function and thence the survivor and density functions. A relatively straightforward generalisation of the AFT hazard to allow for time-varying covariates is to suppose that

θ(t, Xₜ) = ψₜ θ₀(tψₜ)   (3.59)

where there is now a ‘t’ subscript on ψₜ = exp(−β*′Xₜ). As in the PH case, we can factor the survivor function if we assume each covariate is constant within some predefined time interval. To illustrate this, again suppose that there is a single covariate that takes on two values, depending on whether survival time is before or after some date s, i.e.

X = X₁ if t < s
X = X₂ if t ≥ s.   (3.60)
We therefore have ψ₁ = exp(−β*X₁) and ψ₂ = exp(−β*X₂). The survivor function now becomes

S(t, Xₜ) = exp[ −∫₀ˢ θ₀(uψ₁) ψ₁ du − ∫ₛᵗ θ₀(uψ₂) ψ₂ du ]   (3.61)
         = S₀(sψ₁) [S₀(tψ₂) / S₀(sψ₂)]   (3.62)
where manipulations similar to those used in the corresponding derivation for the PH case have been used. The density function can be derived as the product of the hazard rate and this survivor function. As in the PH case, estimation of AFT models with time-varying covariates requires a combination of episode splitting and software that can handle conditioned survivor functions (delayed entry).

Function           Proportional hazards      Accelerated failure time
Hazard, θ(t, X)    θ₀(t) λ                   ψθ₀(ψt)
Survivor, S(t, X)  [S₀(t)]^λ                 S₀(ψt)
Density, f(t, X)   f₀(t) λ [S₀(t)]^{λ−1}     ψf₀(ψt)
Note: λ = exp(β′X); ψ = exp(−β*′X). The characteristics vector X is time-invariant.

Table 3.3: Specification summary: proportional hazard versus accelerated failure time models

Model               PH    AFT
Exponential         X     X
Weibull             X     X
Loglogistic               X
Lognormal                 X
Gompertz            X
Generalized Gamma         X

Table 3.4: Classification of models as PH or AFT: summary
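The AFT factorisation (3.62) can be checked by crude numerical integration of the hazard; the Weibull baseline and parameter values below are illustrative assumptions:

```python
import math

# Check of (3.62) with a Weibull baseline S0(u) = exp(-u**alpha): the survivor
# function with a covariate switching at s factors as
# S0(s*psi1) * S0(t*psi2) / S0(s*psi2).
alpha, s, t = 1.3, 1.0, 3.0
psi1, psi2 = math.exp(-0.5), math.exp(0.2)

def S0(u):
    return math.exp(-u ** alpha)

def hazard(u):
    psi = psi1 if u < s else psi2               # covariate switches at date s
    return psi * alpha * (u * psi) ** (alpha - 1)   # psi * theta0(u * psi)

n = 100000                                      # crude midpoint integration
H = sum(hazard((i + 0.5) * t / n) * (t / n) for i in range(n))
direct = math.exp(-H)
factored = S0(s * psi1) * S0(t * psi2) / S0(s * psi2)
```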
3.2.8 Summary: PH versus AFT assumptions for continuous time models
We can now summarise the assumptions about the functional forms of the hazard function, density function, and survivor function: see Table 3.3, which shows the case where there are no time-varying covariates.
Table 3.4 classifies parametric models according to whether they can be interpreted as PH or AFT. Note that the PH versus AFT description refers to the interpretation of parameter estimates and not to differences in how the model per se is estimated. As we see in later chapters, estimation of all the models cited so far is based on expressions for survivor functions and density functions.
3.2.9 A semiparametric specification: the piecewise-constant Exponential (PCE) model
The Piecewise-Constant Exponential (PCE) model is an example of a semi-parametric continuous time hazard specification. By contrast with all the parametric models considered so far, the specification does not completely characterize the shape of the hazard function. Whether it generally increases or decreases with survival time is left to be fitted from the data, rather than specified a priori.
*Insert chart* shape of baseline hazard in this case.
The time axis is partitioned into a number of intervals using (researcher-chosen) cutpoints. It is assumed that the hazard rate is constant within each interval but may, in principle, differ between intervals. An advantage of the model compared to the ones discussed so far is that the overall shape of the hazard function does not have to be imposed in advance. For example, with this model, one can explore whether the hazard does indeed appear to vary monotonically with survival time, and then perhaps later choose one of the parametric models in the light of this check.
It turns out that this is a special (and simple) case of the models incorporating time-varying covariates discussed earlier. The PCE model is a form of PH model for which we have, in general, θ(t, Xₜ) = θ₀(t) exp(β′Xₜ). In the PCE special case, we have:

θ(t, Xₜ) = θ̄₁ exp(β′X₁),   t ∈ (0, τ₁]
           θ̄₂ exp(β′X₂),   t ∈ (τ₁, τ₂]
           ⋮
           θ̄_K exp(β′X_K),   t ∈ (τ_{K−1}, τ_K]   (3.63)
The baseline hazard rate (θ̄) is constant within each of the K intervals but differs between intervals. Covariates may be fixed or, if time-varying, constant within each interval. This expression may be rewritten as

θ(t, Xₜ) = exp[log(θ̄₁) + β′X₁],   t ∈ (0, τ₁]
           exp[log(θ̄₂) + β′X₂],   t ∈ (τ₁, τ₂]
           ⋮
           exp[log(θ̄_K) + β′X_K],   t ∈ (τ_{K−1}, τ_K]   (3.64)
(3.64)
or
0 (t. A

) =
exp(
¯
`
1
) t ÷ (0. t
1
]
exp(
¯
`
2
) t ÷ (t
1
. t
2
]
.
.
.
.
.
.
exp(
¯
`
1
) t ÷ (t
11
. t
1
]
(3.65)
so the constant intervalspeci…c hazard rates are equivalent to having interval
speci…c intercept terms in the overall hazard. One can estimate these by de…ning
binary (dummy) variables that refer to each interval, and the required estimates
are the estimated coe¢cients on these variables. However, observe that in order
to identify the model parameters, one cannot include all the intervalspeci…c
dummies and an intercept term in the regression. Either one includes all the
dummies and excludes the intercept term, or includes all but one dummy and
includes an intercept. (To see why, note that, for the …rst interval, and similarly
for the others,
¯
`
1
= log(0
1
) +
0
+
1
A
1
+
2
A
2
+
3
A
3
+. . . +
1
A
1
.)
The expression for the PCE survivor function is a special case of the earlier expression for the survivor function where we assumed that covariates were constant within some predefined interval (cf. 3.32). If we consider the case where K = 2 (as earlier), then it can be shown (taking a unit baseline hazard, so that S₀(t) = exp(−t)) that

S(t, Xₜ) = [S₀(s)]^{exp(λ̃₁)} [S₀(t)]^{exp(λ̃₂)} / [S₀(s)]^{exp(λ̃₂)}   (3.66)
         = [exp(−s)]^{exp(λ̃₁)} [exp(−t)]^{exp(λ̃₂)} / [exp(−s)]^{exp(λ̃₂)}   (3.67)
         = exp[−s exp(λ̃₁)] exp[−(t − s) exp(λ̃₂)].   (3.68)

It was stated earlier that expressions with this structure were the foundation for estimation of models with time-varying covariates – combining episode splitting and software that can handle delayed-entry spell data. For the PCE model, however, one only needs to episode split; one does not need software that can handle spells with delayed entry (the spell from s to t in our example). With a constant hazard rate, the relevant likelihood contribution for this spell is the same as that for a spell from time 0 to (t − s): note the second term on the right-hand side of (3.68).
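The memoryless property behind this shortcut can be checked directly; h1 and h2 below stand for the interval hazard levels (i.e. exp(λ̃ⱼ)), with illustrative values:

```python
import math

# PCE with two intervals and cutpoint s: constant hazards h1 over (0, s] and
# h2 over (s, t]. The conditional survival over (s, t] equals the survival of
# a fresh spell of length t - s, cf. eq (3.68).
h1, h2, s, t = 0.4, 0.9, 2.0, 5.0

S_t = math.exp(-s * h1) * math.exp(-(t - s) * h2)   # survivor at t, eq (3.68)
conditional = S_t / math.exp(-s * h1)               # survival over (s, t] given survival to s
fresh = math.exp(-(t - s) * h2)                     # spell of length t - s starting at time 0
```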
The PCE model should be distinguished from Cox's Proportional Hazard model that is considered in Chapter 7. Both are continuous time models, can incorporate time-varying covariates, and allow for some flexibility in the shape of the hazard function. However, the Cox model is more general, in the sense that it allows estimates of the slope parameters in the vector β to be derived regardless of what the baseline hazard function looks like. The PCE model requires researcher input in its specification (the cutpoints); the Cox model estimates are derived for a totally arbitrary baseline hazard function. On the other hand, if you desire flexibility and explicit estimates of the baseline hazard function, you might use the PCE model.
3.3 Discrete time specifications
We consider two models. The first is the discrete time representation of a continuous time proportional hazards model, and leads to the so-called complementary log-log specification. This model can also be applied when survival times are intrinsically discrete. The second model, the logistic model, was primarily developed for this second case but may also be applied to the first. It may be given an interpretation in terms of the proportional odds of failure. For expositional purposes we assume fixed covariates.
3.3. DISCRETE TIME SPECIFICATIONS 41
3.3.1 A discrete time representation of a continuous time
proportional hazards model
The underlying continuous time model is summarised by the hazard rate θ(t, X), but the available survival time data are interval-censored – grouped or banded into intervals in the manner described earlier. That is, exact survival times are not known, only that they fall within some interval of time. What we do here is derive an estimate of the parameters describing the continuous time hazard, but taking into account the nature of the banded survival time data that is available to us.
The survivor function at time aⱼ, the date marking the end of the interval (a_{j−1}, aⱼ], is given by:

S(aⱼ, X) = exp[ −∫₀^{aⱼ} θ(u, X) du ].   (3.69)
Suppose also that the hazard rate satisfies the PH assumption:

θ(t, X) = θ₀(t) exp(β′X) = θ₀(t) λ   (3.70)

where, as before, β′X = β₀ + β₁X₁ + β₂X₂ + … + β_K X_K and λ = exp(β′X).
These assumptions imply that

S(aⱼ, X) = exp[ −∫₀^{aⱼ} θ₀(u) λ du ]   (3.71)
         = exp[ −λ ∫₀^{aⱼ} θ₀(u) du ]   (3.72)
         = exp(−Hⱼλ)   (3.73)

where Hⱼ = H(aⱼ) = ∫₀^{aⱼ} θ₀(u) du is the integrated baseline hazard evaluated at the end of the interval. Hence, the baseline survivor function at aⱼ is:

S₀(aⱼ) = exp(−Hⱼ).   (3.74)
The discrete time (interval) hazard function, h(aⱼ, X) = hⱼ(X), is defined by

hⱼ(X) = [S(a_{j−1}, X) − S(aⱼ, X)] / S(a_{j−1}, X)   (3.75)
      = 1 − S(aⱼ, X) / S(a_{j−1}, X)   (3.76)
      = 1 − exp[λ(H_{j−1} − Hⱼ)]   (3.77)

which implies that

log(1 − hⱼ(X)) = λ(H_{j−1} − Hⱼ)   (3.78)

and hence

log(−log[1 − hⱼ(X)]) = β′X + log(Hⱼ − H_{j−1}).
Similarly, the discrete time (interval) baseline hazard for the interval (a_{j−1}, aⱼ] is given by

1 − h_{0j} = exp(H_{j−1} − Hⱼ)   (3.79)

and hence

log[−log(1 − h_{0j})] = log(Hⱼ − H_{j−1})   (3.80)
                      = log[ ∫_{a_{j−1}}^{aⱼ} θ₀(u) du ]   (3.81)
                      = γⱼ, say   (3.82)
where γⱼ is the log of the difference between the integrated baseline hazard evaluated at the end of the interval (a_{j−1}, aⱼ] and the integrated baseline hazard evaluated at the beginning of the interval. We can substitute this expression back into that for h(aⱼ, X) and derive an expression for the interval hazard rate:

log(−log[1 − hⱼ(X)]) = β′X + γⱼ, or   (3.83)
h(aⱼ, X) = 1 − exp[−exp(β′X + γⱼ)].   (3.84)

The log(−log(·)) transformation is known as the complementary log-log transformation; hence the discrete-time PH model is often referred to as a cloglog model.
Observe that, if each interval is of unit length, then we can straightforwardly index time intervals in terms of the interval number rather than the dates marking the end of each interval. In this case, we can write the discrete time hazard equivalently as

h(j, X) = 1 − exp[−exp(β′X + γⱼ)].
The cloglog model is a form of generalized linear model with a particular link function: see (3.83). When estimated using interval-censored survival data, one derives estimates of the regression coefficients β, and of the parameters γⱼ. The β coefficients are the same ones as those characterizing the continuous time hazard rate θ(t) = θ₀(t) exp(β′X). However, the parameters characterizing the baseline hazard function θ₀(t) cannot be identified without further assumptions: the γⱼ summarize differences in values of the integrated hazard function, and are consistent with a number of different shapes of the hazard function within each interval. To put things another way, the γⱼ summarize the pattern of duration dependence in the interval hazard, but one cannot identify the precise pattern of duration dependence in the continuous time hazard without further assumptions.
Restrictions on the specification of the γⱼ can lead to discrete time models corresponding directly to the parametric PH models considered earlier in this chapter. For example, if the continuous time model is the Weibull one, and we have unit-length intervals, then evaluation of the integrated baseline hazard – see earlier in the chapter for the formula – reveals that γⱼ = log[j^α − (j − 1)^α]. In practice, however, most analysts do not impose restrictions such as these on the γⱼ when doing empirical work. Instead, either they do not place any restrictions on how the γⱼ vary from interval to interval (in which case the model is a type of semiparametric one) or, alternatively, they specify duration dependence in terms of the discrete time hazard (rather than the continuous time one). That is, the variation in γⱼ from interval to interval is specified using a parametric functional form. Examples of these specifications are given below.
Note, finally, that the cloglog model is not the only model that is consistent with a continuous time model and interval-censored survival time data (though it is the most commonly-used one). Sueyoshi (1995) has shown how, for example, a logistic hazard model (as considered in the next subsection) with interval-specific intercepts may be consistent with an underlying continuous time model in which the within-interval durations follow a loglogistic distribution.
3.3.2 A model in which time is intrinsically discrete

When survival times are intrinsically discrete, we could of course apply the
discrete time PH model that we have just discussed, though of course it would
not have the same interpretation. An alternative, also commonly used in the
literature, is a model that can be labelled the proportional odds model. (Beware
that here the odds refer to the hazard, unlike in the case of the Loglogistic model,
where they referred to the survivor function.) Let us suppose, for convenience,
that survival times are recorded in 'months'.
The proportional odds model assumes that the relative odds of making a
transition in month j, given survival up to the end of the previous month, are
summarised by an expression of the form:

    h(j, X) / [1 − h(j, X)] = { h₀(j) / [1 − h₀(j)] } exp(β'X)    (3.85)

where h(j, X) is the discrete time hazard rate for month j, and h₀(j) is the
corresponding baseline hazard arising when X = 0. The relative odds of making
a transition at any given time are given by the product of two components: (a)
a relative odds that is common to all individuals, and (b) an individual-specific
scaling factor. It follows that

    logit[h(j, X)] ≡ log{ h(j, X) / [1 − h(j, X)] } = α_j + β'X    (3.86)

where α_j = logit[h₀(j)]. We can write this expression, alternatively, as
    h(j, X) = 1 / [1 + exp(−α_j − β'X)].    (3.87)

This is the logistic hazard model and, given its derivation, has a proportional
odds interpretation. In principle the α_j may differ for each month. Often,
however, the pattern of variation in the α_j, like that of the γ_j in the cloglog
model, is characterized using some function of j.
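The mapping in (3.87) from the index α_j + β'X to a hazard in (0, 1), and the proportional odds property (3.85), can be illustrated numerically. This is a minimal sketch; the month-specific intercepts, coefficients and covariate values are invented for illustration:

```python
import math

def logistic_hazard(alpha_j, beta, x):
    """Discrete-time logistic hazard, eq (3.87): h = 1/(1 + exp(-(alpha_j + beta'x)))."""
    index = alpha_j + sum(b * xi for b, xi in zip(beta, x))
    return 1.0 / (1.0 + math.exp(-index))

# Hypothetical month-specific intercepts alpha_j and slope coefficients beta
alpha = {1: -2.0, 2: -1.8, 3: -1.9}
beta = [0.5, -0.3]
x = [1.0, 2.0]

# Hazard of exit in each month for an individual with characteristics x
hazards = {j: logistic_hazard(a, beta, x) for j, a in alpha.items()}
```

By construction, the ratio of the odds h/(1 − h) for two individuals is exp(β'(X₁ − X₂)) in every month, which is the proportional odds property of (3.85).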
3.3.3 Functional forms for characterizing duration dependence in discrete time models

Examples of duration dependence specifications include:
• z log(j), which can be thought of as a discrete-time analogue of the continuous
time Weibull model, because the hazard monotonically
increases if z > 0, decreases if z < 0, and is constant if z = 0 (z + 1 = α
is thus analogous to the Weibull shape parameter α). If this specification
were combined with a logistic hazard model, then the full model specification
would be logit[h(j, X)] = z log(j) + β'X, and z is a parameter that
would be estimated together with the intercept and slope parameters within
the vector β.
• z₁j + z₂j² + z₃j³ + ... + z_p j^p, i.e. a p-th order polynomial function of
time, where the shape parameters are z₁, z₂, z₃, ..., z_p. If p = 2 (a quadratic
function of time), the interval hazard is U-shaped or inverse-U-shaped. If
the quadratic specification were combined with a cloglog hazard model,
then the full model specification would be cloglog[h(j, X)] = z₁j + z₂j² + β'X,
and z₁ and z₂ are parameters that would be estimated together
with the intercept and slope parameters within the vector β.
• piecewise constant, i.e. groups of months are assumed to have the same
hazard rate, but the hazard differs between these groups. If this specification
were combined with a cloglog hazard model, then the full model
specification would be cloglog[h(j, X)] = δ₁D₁ + δ₂D₂ + ... + δ_J D_J + β'X,
where D_l is a binary variable equal to one if j = l and equal to zero otherwise.
That is, the researcher creates dummy variables corresponding to each
interval (or group of intervals). When estimating the full model, one would
not include an intercept term within the vector β, as it would be
collinear. (Alternatively, one could include an intercept term but drop one
of the dummy variables.)
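The three specifications above differ only in how the duration-dependence term added to β'X varies with j. A sketch of each as a function of the interval index (the symbol names z, zs, deltas follow the bullets above; the parameter values in the tests are invented):

```python
import math

def weibull_like(j, z):
    """z*log(j): monotone duration dependence, analogue of the Weibull model."""
    return z * math.log(j)

def polynomial(j, zs):
    """z1*j + z2*j^2 + ... + zp*j^p: p-th order polynomial in survival time."""
    return sum(z * j ** (p + 1) for p, z in enumerate(zs))

def piecewise_constant(j, deltas, groups):
    """deltas[g] applies to every interval j in groups[g] (dummy-variable spec)."""
    for g, members in enumerate(groups):
        if j in members:
            return deltas[g]
    raise ValueError("interval not covered by any group")
```

Each function returns the term that would be added to the index of a logit or cloglog hazard model.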
The choice of shape of hazard function in these models is up to the investigator
– just as one can choose between different parametric functional forms in
the continuous time models.

In practice, cloglog and logistic hazard models that share the same duration
dependence specification and the same X yield similar estimates – as long as
the hazard rate is relatively 'small'. To see why this is so, note that

    logit(h) ≡ log[ h / (1 − h) ] = log(h) − log(1 − h).    (3.88)

As h → 0, log(1 − h) → 0 also. That is,

    logit(h) ≈ log(h) for 'small' h.

With a sufficiently small hazard rate, the proportional odds model (a linear
function of duration dependence and characteristics) is a close approximation
to a model with the log of the hazard rate as the dependent variable. The result
follows from the fact that the estimates of β from a discrete time proportional
hazards model correspond to those of a continuous time model in which log(θ)
is a linear function of characteristics.
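The closeness of the two link functions for small hazards is easy to verify numerically: for h near zero, logit(h) = log(h) − log(1 − h) and cloglog(h) = log(−log(1 − h)) both approach log(h). A quick check:

```python
import math

def logit(h):
    """Log-odds transform of a hazard h."""
    return math.log(h / (1.0 - h))

def cloglog(h):
    """Complementary log-log transform of a hazard h."""
    return math.log(-math.log(1.0 - h))

# For a 'small' hazard the two transforms (and log h) nearly coincide
h_small = 0.01
vals = (logit(h_small), cloglog(h_small), math.log(h_small))
```

For h = 0.5, by contrast, logit(0.5) = 0 while cloglog(0.5) = log(log 2) ≈ −0.37, so the two models can diverge when hazards are large.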
3.4 Deriving information about survival time distributions

The sorts of questions that one might wish to answer, given estimates of a given
model, include:

• How long are spells (on average)?

• How do spell lengths differ for persons with different characteristics?

• What is the pattern of duration dependence (the shape of the hazard
function with survival time)?

To provide answers to these questions, we need to know the shape of the
survivor function for each model. We also need expressions for the density and
survivor functions in order to construct expressions for sample likelihoods, and
hence to derive model estimates using the maximum likelihood principle, as we
shall see in Chapters 5 and 6. The integrated hazard function also has several
uses, including in specification tests. We derived a number of general
results for PH and AFT models in an earlier section; in essence, what we are
doing here is illustrating those results, with reference to the specific functional
forms for distributions that were set out in the earlier sections of this chapter.
To illustrate how one goes about deriving the relevant expressions, we focus
on a relatively small number of models, and assume no time-varying covariates.
3.4.1 The Weibull model

Hazard rate

From Section 3.2, we know that the Weibull hazard rate is given by

    θ(t, X) = α t^(α−1) exp(β'X)    (3.89)
            = α t^(α−1) λ    (3.90)

where λ = exp(β'X). The baseline hazard function is θ₀(t) = α t^(α−1) or, using
the alternative characterization mentioned earlier, θ₀(t) = α t^(α−1) exp(β₀). The
expression for the hazard rate implies that:

    log[θ(t, X)] = log α + (α − 1) log(t) + β'X.    (3.91)
We can easily demonstrate that the Weibull model satisfies the general properties of a
PH model, discussed earlier. Remember that all these results also apply to the
Exponential model, as that is simply the Weibull model with α = 1. First,

    ∂θ(t, X) / ∂X_k = θ β_k    (3.92)

or

    ∂ log θ(t, X) / ∂X_k = β_k    (3.93)

where X_k is the kth covariate in the vector of characteristics X. Thus each
coefficient summarises the proportionate response of the hazard to a small change
in the relevant covariate.
From this expression can be derived the elasticity of the hazard rate with
respect to changes in a given characteristic:

    (X_k/θ) ∂θ/∂X_k = ∂ log θ / ∂ log X_k = β_k X_k.    (3.94)

If X_k = log(Z_k), then the elasticity of the hazard with respect to changes in Z_k
is

    ∂ log θ / ∂ log Z_k = β_k.    (3.95)
The Weibull model also has the PH property concerning the relative hazard
rates for two persons at the same t, but with different X:

    θ(t, X₁) / θ(t, X₂) = exp[β'(X₁ − X₂)].    (3.96)

The Weibull shape parameter α can also be interpreted with reference to the
elasticity of the hazard with respect to survival time:

    ∂ log θ(t, X) / ∂ log(t) = α − 1.    (3.97)

*Insert graphs*

The relative hazard rates for two persons with the same X, but at different
survival times t and u, where t > u, are given by

    θ(t, X) / θ(u, X) = (t/u)^(α−1).    (3.98)

Thus failure at time t is (t/u)^(α−1) times more likely than at time u, unless α = 1,
in which case failure is equally likely. Contrast the cases of α > 1 and α < 1.
Survivor function

Recall that θ(t, X) = α t^(α−1) λ. But we know that, for all models, S(t, X) =
1 − F(t, X) = exp[−∫₀^t θ(u, X) du]. So, substituting the Weibull expression for
the hazard rate into this expression:

    S(t, X) = exp(−∫₀^t α u^(α−1) λ du)    (3.99)
            = exp(−λα {u^α/α}₀^t)    (3.100)
            = exp(−λα {t^α/α − 0^α/α})    (3.101)

using the fact that there are no time-varying covariates. The expression in the
curly brackets {.} refers to the evaluation of the definite integral. Hence the
Weibull survivor function is:

    S(t, X) = exp(−λ t^α).    (3.102)

*Insert graphs* to show how S varies with X and with α
Density function

Recall that f(t) = θ(t)S(t), and so in the Weibull case:

    f(t, X) = α t^(α−1) λ exp(−λ t^α).    (3.103)
Integrated hazard function

Recall that H(t, X) = −ln S(t, X). Hence, in the Weibull case,

    H(t, X) = λ t^α.    (3.104)

It follows that

    log H(t, X) = log(λ) + α log(t)    (3.105)
                = β'X + α log(t).    (3.106)

This expression suggests that a simple graphical specification test of whether
observed survival times follow the Weibull distribution can be based on a plot
of the log of the integrated hazard function against the log of survival time. If
the distribution is Weibull, the graph should be a straight line (with slope equal
to the Weibull shape parameter α) or, for groups characterised by combinations
of X, the graphs should be parallel lines.
Quantiles of the survivor function, including the median

The median duration is the survival time m such that S(m) = 0.5.

*Insert chart* S(t) against t, showing m

From (3.102) we have

    log S(t, X) = −λ t^α    (3.107)

and hence

    t = { (1/λ) [−log S(t)] }^(1/α).    (3.108)

Thus, at the median survival time, we must also have that

    m = { (1/λ) [−log(0.5)] }^(1/α)    (3.109)
      = [ log(2)/λ ]^(1/α).    (3.110)

Expressions for the upper and lower quartiles (or any other quantile) may be
derived similarly: substitute S(t) = 0.75 or 0.25, rather than 0.5, in the
expression for m.
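Equations (3.108)–(3.110) translate directly into code. A sketch, with illustrative (invented) parameter values; the returned quantile should satisfy S(t) = q under (3.102):

```python
import math

def weibull_quantile(q, lam, alpha):
    """Survival time t at which S(t) = q; eq (3.108): t = [(-log q)/lam]^(1/alpha)."""
    return ((-math.log(q)) / lam) ** (1.0 / alpha)

def weibull_median(lam, alpha):
    """Eq (3.110): m = [log(2)/lam]^(1/alpha)."""
    return (math.log(2.0) / lam) ** (1.0 / alpha)

lam, alpha = 0.1, 1.5
m = weibull_median(lam, alpha)
# Upper/lower quartiles: survival times at which S(t) = 0.75 and 0.25
lower, upper = weibull_quantile(0.75, lam, alpha), weibull_quantile(0.25, lam, alpha)
```

Note that the quartile at S(t) = 0.75 is the shorter duration: three-quarters of spells survive past it.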
How does the median duration vary with di¤erences in characteristics? Con
sider the following expression for the proportionate change in the median given
a change in a covariate:
0 log :
0A

=
1
c
0[log(2) ÷log(`)]
0A

= ÷
1
c
0 log(`)
0A

= ÷

c
. (3.111)
Observe also that this elasticity is equal to

. Also, using the result that
0 log :´0A

= (1´:)0:´0A

, the elasticity of the median duration with re
spect to a change in characteristics is
A

:
0:
0A

=
0 log(:)
0 log(A

)
=
÷

A

c
=

A

. (3.112)
If A

= log(7

), then the elasticity of the median with respect to changes in
7

is ÷

´c =

.
Mean (expected) survival time

In general, the mean or expected survival time is given by

    E(T) = ∫₀^∞ t f(t) dt = ∫₀^∞ S(t) dt    (3.113)

where the expression in terms of the survivor function was derived using integration
by parts. Hence, in the Weibull case, E(T) = ∫₀^∞ exp(−λ t^α) dt. Evaluation
of this integral yields (Klein and Moeschberger 1997, p. 37):

    E(T) = (1/λ)^(1/α) Γ(1 + 1/α)    (3.114)

where Γ(.) is the Gamma function. The Gamma function has a simple formula
when its argument is an integer: Γ(n) = (n − 1)!, for integer-valued n. For
example, Γ(5) = 4! = 4 × 3 × 2 × 1 = 24, Γ(4) = 3! = 6, Γ(3) = 2! = 2, and
Γ(2) = 1. For non-integer arguments, one has to use special tables (or a built-in
function in one's software package). Observe that, for the Exponential model
(α = 1), the mean duration is simply 1/λ.

So, if α = 0.5, i.e. there is negative duration dependence (a monotonically
declining hazard), then E(T) = (1/λ)² Γ(3) = 2/λ². Compare this with the
median m ≈ 0.480/λ². More generally, the ratio of the mean to the median in
the Weibull case is

    ratio = Γ(1 + 1/α) / [log(2)]^(1/α).    (3.115)

    α      Γ(1 + 1/α)   [log(2)]^(1/α)   Ratio of mean to median
    4.0    0.906        0.912            0.99
    3.0    0.893        0.885            1.01
    2.0    0.886        0.832            1.06
    1.1    0.965        0.716            1.35
    1.0    1.000        0.693            1.44
    0.9    1.052        0.665            1.58
    0.8    1.133        0.632            1.79
    0.7    1.266        0.592            2.14
    0.6    1.505        0.543            2.77
    0.5    2.000        0.480            4.17
    Table 3.5: Ratio of mean to median survival time: Weibull model

Table 3.5 summarises the ratio of the mean to the median for various values of
α. Observe that, unless the hazard is increasing at a particularly fast rate (large
α), the mean survival time is longer than the median.
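The entries in Table 3.5 can be reproduced with the Gamma function available in any numerical package; a sketch using Python's standard library:

```python
import math

def weibull_mean(lam, alpha):
    """Eq (3.114): E(T) = (1/lam)^(1/alpha) * Gamma(1 + 1/alpha)."""
    return (1.0 / lam) ** (1.0 / alpha) * math.gamma(1.0 + 1.0 / alpha)

def mean_median_ratio(alpha):
    """Eq (3.115): Gamma(1 + 1/alpha) / [log(2)]^(1/alpha); note lam cancels."""
    return math.gamma(1.0 + 1.0 / alpha) / math.log(2.0) ** (1.0 / alpha)

# Reproduce a column of Table 3.5
ratios = {a: mean_median_ratio(a) for a in (4.0, 2.0, 1.0, 0.5)}
```

Since λ cancels in (3.115), the ratio depends only on the shape parameter α, which is why Table 3.5 needs no λ column.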
The elasticity of the mean duration with respect to a change in characteristics
is

    [X_k / E(T)] ∂E(T)/∂X_k = −β_k X_k / α    (3.116)

which is exactly the same as the corresponding elasticity for the median. Note
that ∂ log E(T)/∂X_k = −β_k/α, so that if X_k = log(Z_k), then the elasticity of
the mean with respect to changes in Z_k is −β_k/α, i.e. the same expression as
for the corresponding elasticity of the median.
3.4.2 Gompertz model

From Section 3.2, the Gompertz hazard rate is given by

    θ(t, X) = λ exp(γt).    (3.117)

By integrating this expression, one derives the integrated hazard function

    H(t, X) = (λ/γ) [exp(γt) − 1].    (3.118)

Since H(t) = −log S(t), it follows that the expression for the survivor function
is:

    S(t, X) = exp{ (λ/γ) [1 − exp(γt)] }.    (3.119)

The density function can be derived as f(t, X) = θ(t, X) S(t, X). Observe that
if γ = 0, then we have the Exponential hazard model.

To derive an expression for the median survival time, we apply the same
logic as above, and find that

    m = (1/γ) log[ 1 + γ log(2)/λ ].    (3.120)

There is no closed-form expression for the mean.
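A quick numerical check of (3.120): the value m it returns should satisfy S(m) = 0.5 under (3.119). The parameter values below are illustrative only:

```python
import math

def gompertz_survivor(t, lam, gamma):
    """Eq (3.119): S(t) = exp((lam/gamma) * (1 - exp(gamma*t)))."""
    return math.exp((lam / gamma) * (1.0 - math.exp(gamma * t)))

def gompertz_median(lam, gamma):
    """Eq (3.120): m = (1/gamma) * log(1 + gamma*log(2)/lam)."""
    return (1.0 / gamma) * math.log(1.0 + gamma * math.log(2.0) / lam)

lam, gamma = 0.05, 0.1
m = gompertz_median(lam, gamma)
```

Note that with γ < 0 the argument of the log in (3.120) can be negative, in which case the hazard declines so fast that the survivor function never reaches 0.5 and no median exists.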
3.4.3 Loglogistic model

Hazard rate

From Section 3.2, the Loglogistic hazard rate is given by

    θ(t, X) = [ (1/γ) φ^(1/γ) t^((1/γ)−1) ] / [ 1 + (φt)^(1/γ) ]    (3.121)
            = [ ψ φ^ψ t^(ψ−1) ] / [ 1 + (φt)^ψ ]    (3.122)

for ψ = 1/γ, where the shape parameter γ > 0 and φ = exp(−β'X).

Survivor function

The loglogistic survivor function is given by

    S(t, X) = 1 / [ 1 + (φt)^(1/γ) ]    (3.123)
            = 1 / [ 1 + (φt)^ψ ].    (3.124)

Beware: this expression differs slightly from that shown by e.g. Klein and
Moeschberger (1997), since they parameterise the hazard function differently.

*Insert graphs* to show how S varies with X and with γ
Density function

Recall that f(t, X) = θ(t, X) S(t, X), and so in the Loglogistic case:

    f(t, X) = [ (1/γ) φ^(1/γ) t^((1/γ)−1) ] / [ 1 + (φt)^(1/γ) ]²    (3.125)
            = [ ψ φ^ψ t^(ψ−1) ] / [ 1 + (φt)^ψ ]².    (3.126)

Integrated hazard function

From above, we have

    H(t, X) = log[ 1 + (φt)^(1/γ) ] = log[ 1 + (φt)^ψ ]    (3.127)

and hence

    ln{ exp[H(t, X)] − 1 } = −ψ β'X + ψ log(t).    (3.128)

One might therefore graph an estimate of the left-hand side against log(t) as a
specification check for the model: the graph should be a straight line. A more
common check is expressed in terms of the log odds of survival, however: see
below.
Quantiles of the survivor function, including the median

The median survival time is

    m = 1/φ.    (3.129)

Hence it can be derived that ∂m/∂X_k = β_k/φ, and so the elasticity of the
median with respect to a change in X_k is

    ∂ log(m) / ∂ log(X_k) = β_k X_k.    (3.130)
Mean (expected) survival time

There is no closed-form expression for the mean survival time unless γ < 1 or,
equivalently, ψ > 1 (i.e. the hazard rises and then falls). In this case, the mean
is (Klein and Moeschberger, 1997, p. 37):

    E(T) = (1/φ) πγ / sin(πγ),  γ < 1,    (3.131)

where π is the trigonometric constant 3.1415927.... (Klein and Moeschberger's
expression is in terms of the relatively unfamiliar cosecant function, Csc(x).
Note, however, that Csc(x) = 1/sin(x).) The elasticity of the mean is given by

    ∂ log E(T) / ∂ log(X_k) = β_k X_k = ∂ log(m) / ∂ log(X_k),    (3.132)

i.e. the same expression as for the elasticity of the median.

The ratio of the mean to the median is given by

    E(T)/m = πγ / sin(πγ),  γ < 1.
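The log-logistic mean-to-median ratio πγ/sin(πγ) can be checked directly; it exceeds one for all 0 < γ < 1 and tends to one as γ → 0. A sketch (illustrative parameter values):

```python
import math

def loglogistic_median(phi):
    """Eq (3.129): m = 1/phi."""
    return 1.0 / phi

def loglogistic_mean(phi, gamma):
    """Eq (3.131): E(T) = (1/phi) * pi*gamma/sin(pi*gamma), valid for gamma < 1."""
    if gamma >= 1.0:
        raise ValueError("mean does not exist for gamma >= 1")
    return (1.0 / phi) * math.pi * gamma / math.sin(math.pi * gamma)

phi, gamma = 0.2, 0.5
ratio = loglogistic_mean(phi, gamma) / loglogistic_median(phi)
```

With γ = 0.5 the ratio is exactly π/2, since sin(π/2) = 1.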
Log-odds of survival interpretation

The Loglogistic model can be given a log-odds of survival interpretation. From
(3.123), the (conditional) odds of survival to t are

    S(t, X) / [1 − S(t, X)] = (φt)^(−1/γ)    (3.133)

and at X = 0 (i.e. φ = exp(−β₀) ≡ φ₀), also,

    S(t, X|X = 0) / [1 − S(t, X|X = 0)] = (φ₀t)^(−1/γ)    (3.134)

and therefore

    S(t, X) / [1 − S(t, X)] = { S(t, X|X = 0) / [1 − S(t, X|X = 0)] } (φ/φ₀)^(−1/γ).    (3.135)

Hence the (conditional) log odds of survival at each t for any individual
depend on a 'baseline' conditional odds of survival (common to all persons) and
a person-specific factor depending on characteristics (and the shape parameter
γ) that scales those 'baseline' conditional odds: φ/φ₀ = exp(−β₁X₁ − β₂X₂ −
... − β_K X_K).

The log-odds property suggests a graphical specification check for the Loglogistic
model. From (3.133), we have

    log{ S(t, X) / [1 − S(t, X)] } = ψ β'X − ψ log(t).    (3.136)

The check is therefore based on a graph of log{[1 − Ŝ(t, X)]/Ŝ(t, X)} against
log(t) – it should be a straight line if the Loglogistic model is appropriate, or
give rise to parallel lines if plotted separately for different groups classified by
combinations of values of X.
3.4.4 Other continuous time models

See texts for discussion of other models. Most continuous time parametric
models have closed-form expressions for the median but not for the mean.

3.4.5 Discrete time models

The expressions for the median and mean, etc., depend on the baseline hazard
specification that the researcher chooses. For discrete time models, there are
typically no closed-form expressions for the median and mean, and they have to
be derived numerically (see the web Lessons for examples).
Recall that survival up to the end of the jth interval (or completion of the
jth cycle) is given by:

    S(j) = S_j = Π_{k=1}^{j} (1 − h_k)    (3.137)

where h_k is a logit or complementary log-log function of characteristics and
survival time. In general this function is not invertible so as to produce suitable
closed-form expressions. The median is defined implicitly by finding the value
of j such that S(t_j) = 0.5.

Things are simpler in the special case in which the discrete hazard rate is
constant over time, i.e. h_j = h for all j (the case of a Geometric distribution of
survival times). In this case,

    S_j = (1 − h)^j    (3.138)
    log S_j = j log(1 − h)    (3.139)

and so the median is

    m = log(0.5) / log(1 − h) = −log(2) / log(1 − h).    (3.140)

In the general case, in which the hazard varies with survival time, the mean
survival time E(T) satisfies

    E(T) = Σ_{k=1}^{L} k f(k)    (3.141)

or, alternatively,

    E(T) = Σ_{k=1}^{L} S(k)    (3.142)

where f(k) = Pr(T = k), and L is the maximum survival time. In the special
case in which the hazard rate is constant at all survival times,

    E(T) = [(1 − h)/h] [1 − (1 − h)^L]    (3.143)

implying that the mean survival time is approximated well by (1 − h)/h if L is
'large'.
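For the constant-hazard (Geometric) special case, (3.140) and (3.143) are easily checked against direct computation of (3.137) and (3.142). A sketch, with an invented hazard value:

```python
import math

def survivor(hazards):
    """Eq (3.137): S_j = prod_{k<=j} (1 - h_k); returns the list S_1, ..., S_J."""
    out, s = [], 1.0
    for h in hazards:
        s *= (1.0 - h)
        out.append(s)
    return out

def mean_from_survivor(hazards):
    """Eq (3.142): E(T) = sum over k of S(k), up to the maximum survival time."""
    return sum(survivor(hazards))

h, L = 0.2, 200
geometric = [h] * L                                   # constant hazard each cycle
closed_form = (1.0 - h) / h * (1.0 - (1.0 - h) ** L)  # eq (3.143)
median = -math.log(2.0) / math.log(1.0 - h)           # eq (3.140)
```

With h = 0.2 the mean is essentially (1 − h)/h = 4 cycles, while the median is about 3.1, illustrating the mean exceeding the median under a declining survivor function.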
Chapter 4

Estimation of the survivor and hazard functions

In this chapter, we consider estimators of the survivor and hazard functions –
the empirical counterparts of the concepts that we considered in the previous
chapter. One might think of the estimators considered here as non-parametric
estimators, as no prior assumptions are made about the shapes of the relevant
functions. The methods may be applied to a complete sample, or to subgroups
of subjects within the sample, where the groups are defined by some combination
of observable characteristics. The analysis of subgroups may, of course, be
constrained by cell size considerations – which is one of the reasons for taking
account of the effects of differences in characteristics on survival and hazard
functions by using multiple regression methods. (These are discussed in the
chapters following.) In any case, because the non-parametric analysis is informative
about the pattern of duration dependence, it may also assist with the
choice of parametric model.

We shall assume that we have a random sample from the population of
spells, and allow for right-censoring (but not truncation). We consider first
the Kaplan-Meier product-limit estimator and then the lifetable estimator. The
main difference between them is that the former is specified assuming continuous
time measures of spells, whereas for the latter the survival times are banded
(grouped).

4.1 Kaplan-Meier (product-limit) estimators

Let t₁ < t₂ < ... < t_j < ... < t_K < ∞ represent the survival times that are
observed in the data set. From the data, one can also determine the following
quantities:

d_j: the number of persons observed to 'fail' (make a transition out of the state)
at t_j
    Failure time   # failures   # censored   # at risk of failure
    t₁             d₁           m₁           n₁
    t₂             d₂           m₂           n₂
    t₃             d₃           m₃           n₃
    ...            ...          ...          ...
    t_j            d_j          m_j          n_j
    ...            ...          ...          ...
    t_K            d_K          m_K          n_K
    Table 4.1: Example of data structure

m_j: the number of persons whose observed duration is censored in the interval
[t_j, t_{j+1}), i.e. still in the state at t_j but no longer observed in it by t_{j+1}

n_j: the number of persons at risk of making a transition (ending their spell)
immediately prior to t_j, which is made up of those who have a censored
or completed spell of length t_j or longer:

    n_j = (m_j + d_j) + (m_{j+1} + d_{j+1}) + ... + (m_K + d_K)

Table 4.1 provides a summary of the data structure.
4.1.1 Empirical survivor function

The proportion of those entering a state who survive to the first observed survival
time t₁, Ŝ(t₁), is simply one minus the proportion who made a transition
out of the state by that time, where the latter can be estimated by
the number of exits divided by the number who were at risk of transition:
d₁/n₁. Similarly, the proportion surviving to the second observed
survival time t₂ is Ŝ(t₁) multiplied by one minus the proportion who
made a transition out of the state between t₁ and t₂. More generally, at survival
time t_j,

    Ŝ(t_j) = Π_{k | t_k ≤ t_j} (1 − d_k/n_k).    (4.1)

So, the Kaplan-Meier estimate of the survivor function is given by the product
of one minus the number of exits divided by the number of persons at risk of
exit, i.e. the product of one minus the 'exit rate', at each of the survival times.
Standard errors for the estimates can be derived using Greenwood's formulae
(see texts).

From this, one can also derive an estimate of the failure function, F̂(t_j) =
1 − Ŝ(t_j), and of the integrated hazard function, Ĥ(t_j), since

    S(t) = exp[ −∫₀^t θ(u) du ] = exp[−H(t)]    (4.2)
    ⟹ Ĥ(t_j) = −log Ŝ(t_j).    (4.3)
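The calculations behind (4.1) and the Nelson-Aalen sum discussed below are short enough to sketch directly. The input is a list of (duration, censored) pairs; failure counts and risk sets are built as in Table 4.1. This is a minimal illustration with invented data, not production code:

```python
from collections import Counter

def km_estimates(spells):
    """spells: list of (duration, censored) pairs, censored == 0 for failures.
    Returns [(t_j, S_hat, H_hat)] at each observed failure time t_j, using the
    Kaplan-Meier product (4.1) and the Nelson-Aalen cumulative-hazard sum."""
    deaths = Counter(t for t, c in spells if c == 0)
    s, h_na, out = 1.0, 0.0, []
    for t in sorted(deaths):
        n_risk = sum(1 for ti, _ in spells if ti >= t)  # at risk just before t
        d = deaths[t]
        s *= 1.0 - d / n_risk        # eq (4.1): multiply by one minus the exit rate
        h_na += d / n_risk           # Nelson-Aalen cumulative hazard
        out.append((t, s, h_na))
    return out

# Six spells; the second and fifth are right-censored
sample = [(2, 0), (3, 1), (5, 0), (5, 0), (8, 1), (9, 0)]
est = km_estimates(sample)
```

Note that the estimates change only at observed failure times, and censored spells enter only through the risk sets, exactly as in the text.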
It is important to note that one can only derive estimates of the survivor
function (and the integrated hazard function) at the dates at which there are
transitions, and the last estimate depends on the largest non-censored survival
time. To derive estimates of survival probabilities at dates beyond the maximum
observed failure time, or at dates between within-sample failure times, requires
additional assumptions. Reflecting this, the estimated survivor function is conventionally
drawn as a step function, with the first 'step', at t = 0, having a
height of one and width t₁, the next 'step' beginning at time t₁ with height
Ŝ(t₁) and width t₂ − t₁, and so on. The shape of this step function is not like
a regular staircase: the height between steps varies (depending on the survivor
function estimates), and so too does the width of the steps (depending on the
times at which failures were observed). To put things another way, one has a set
of estimates {Ŝ(t_j), t_j} at various t_j but, to draw the survivor function, one
cannot simply 'connect the dots' (as in children's drawing books). This would
entail making assumptions about the shape of the survivor function or, equivalently,
the hazard function, that would most likely be unwarranted. This is a price
that one has to pay for using a non-parametric estimator.

Indeed, the non-parametric step function nature of the survivor function and
the integrated hazard function has implications for estimation of the hazard
function. One might think that one could derive an estimate of the hazard
rate θ̂(t) directly from the estimates of the integrated hazard function, using
the property that ∂H(t)/∂t = θ(t). To do this, one would need estimates
of the slope of the integrated hazard function at a series of survival times.
But the estimated integrated hazard function ('cumulative hazard') is also a
step function (with higher steps at longer observed survival times). Trying to
estimate the slope of the integrated hazard function at each of the observed
survival times is equivalent to trying to find the slope at the corner of each of
the steps. Clearly, the slope is not well-defined: so, then, neither is a
non-parametric estimate of the hazard rate.

What is one to do if an estimate of the hazard function is required? One
possibility is to divide the survival time axis into a number of regular intervals of
time and to derive estimates of the interval hazard rate (h rather than θ): see
the discussion of lifetable estimators below. Another possibility is to return to
the estimated integrated hazard function, and to smooth this step function (for
example using a kernel smoother). In effect, this is a statistically sophisticated
method of 'connecting the dots'. See e.g. Klein and Moeschberger (1997, section
6.2) for a full discussion.

The advantage is, of course, that once one has a smooth curve, one can also
derive its slope. (The estimated slope is the 'smoothed hazard'.) The disadvantage
is that smoothing incorporates assumptions that need not be appropriate.
What if, for example, there are genuine step changes in survival probabilities?
(Think about exit from employment into retirement: there are likely to be large
– and genuine – changes in survival probabilities at the age at which people
qualify for state pensions.)
An alternative to the Kaplan-Meier estimator of the survivor and integrated
hazard functions is the Nelson-Aalen estimator. In essence, with the Kaplan-Meier
estimator, one estimates the survivor function and derives the cumulative
hazard function from that; with the Nelson-Aalen estimator, it is vice versa.
The Nelson-Aalen estimator of the cumulative hazard is

    Ĥ(t_j) = Σ_{k | t_k ≤ t_j} d_k/n_k    (4.4)

and this provides an estimate of the survivor function, exp(−Ĥ(t_j)),
sometimes called the Fleming-Harrington estimator. It turns out that the estimators
are asymptotically equivalent (they provide the same estimates as the
sample size becomes infinitely large), but have different small-sample properties.
The Nelson-Aalen estimator of the cumulative hazard function has better small-sample
properties than the Kaplan-Meier estimator, whereas for estimation of
the survivor function, the relative advantage is reversed. (In practice, with
datasets with sample sizes typical in the social sciences, the difference between
corresponding estimates is often negligible.)

The sts command in Stata provides estimates of the Kaplan-Meier (product-limit)
estimator of the survivor function and the Nelson-Aalen estimator of the
cumulative hazard function; sts graph includes an option for graphing smoothed
hazards, with flexibility about the degree of smoothing (bandwidth).
4.2 Lifetable estimators

The lifetable method uses broadly the same idea as the Kaplan-Meier one, but
it was explicitly developed to handle situations where the observed information
about the number of exits and the number at risk pertains to
intervals of time, i.e. there are grouped survival time data.

Define intervals of time I_j = [t_j, t_{j+1}), for j = 1, ..., J, where

d_j: the number of failures observed in interval I_j

m_j: the number of censored spell endings observed in interval I_j

N_j: the number at risk of failure at the start of the interval.

Because the survival time axis is grouped into bands, one wants to adjust
the observed number of persons at risk in a given interval to take account of the
fact that, during the period, some people will have left. The idea is to produce
an 'averaged' estimate centred on the midpoint of the interval. Suppose that
transitions are evenly spread over each interval, in which case half of the total
number of events for each interval will have left by halfway through the relevant
interval. We can therefore define an adjusted number at risk of exit:

n_j: the adjusted number at risk of failure, used for the midpoint of the interval:

    n_j = N_j − d_j/2    (4.5)

and hence

    Ŝ(j) = Π_{k=1}^{j} (1 − d_k/n_k).    (4.6)

From this expression can be derived an estimate of the density function
(recall S(t) = 1 − F(t)):

    f̂(j) = [F̂(j+1) − F̂(j)] / (t_{j+1} − t_j) = [Ŝ(j) − Ŝ(j+1)] / (t_{j+1} − t_j)    (4.7)

and an estimate of the hazard rate

    θ̂(j) = f̂(j) / S̄(j)    (4.8)

where

    S̄(j) = [Ŝ(j) + Ŝ(j+1)] / 2    (4.9)

and is taken as applying to the time corresponding to the midpoint of the
interval. See Stata's ltable command.

Note that the 'adjustment' is used because the underlying survival times are
continuous, but the observed survival time data are grouped.

If the time axis were intrinsically discrete, so that the intervals referred to
basic units of time or 'cycles', then the 'adjustment' is not required. The same
is true if one simply wants to derive an estimate of the interval hazard rate.
The formula for the estimator of the survivor function in this case is the same
as that for the Kaplan-Meier one in the continuous time case. Observe that, if
there are no events within some interval I_j, then the estimator of the interval
hazard is equal to zero.

** to be added: discussion of formulae for standard errors; stratification,
and tests of homogeneity **
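The grouped-data calculations (4.5)–(4.6) can be sketched as follows. The interval boundaries and counts are invented for illustration, and the adjustment applied is the one in (4.5) above:

```python
def life_table(deaths, censored, n_start):
    """deaths, censored: per-interval counts d_j and m_j; n_start: N_1.
    Returns per-interval tuples (adjusted risk set n_j, exit rate d_j/n_j,
    S_hat at end of interval), using the midpoint adjustment of eq (4.5)."""
    rows, at_risk, s = [], n_start, 1.0
    for d, m in zip(deaths, censored):
        n_adj = at_risk - d / 2.0        # eq (4.5): adjusted number at risk
        s *= 1.0 - d / n_adj             # eq (4.6): lifetable survivor estimate
        rows.append((n_adj, d / n_adj, s))
        at_risk -= d + m                 # survivors carried into the next interval
    return rows

# Three intervals, 50 persons at risk at the start
rows = life_table(deaths=[10, 8, 5], censored=[2, 3, 0], n_start=50)
```

Density and hazard estimates at interval midpoints then follow from (4.7)–(4.9) given the interval widths t_{j+1} − t_j.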
Chapter 5

Continuous time multivariate models

In this chapter and the next, we return to the problem that was raised in Chapter
1. We drew attention to problems with using OLS (and some other commonly
used econometric methods) to estimate multivariate models of survival times,
given the issues of censoring and time-varying covariates. The principal lesson
of this chapter (and the next) is that these problems may be addressed by
using estimation based on maximum likelihood (ML) methods. ML methods
per se are not considered at all here – see any standard econometrics text, e.g.
Greene (2003), for a discussion of the ML principle and the nice properties of ML
estimates. For a very useful (and free) introduction, see chapter 1 of Gould et al.
(2003), downloadable from http://www.statapress.com/books/mlch1.pdf. In
this chapter we consider parametric regression models formulated in continuous
time; in the next chapter, we consider discrete time regression models. The
continuous time Cox regression model is considered in Chapter 7 (it uses an
estimation principle that differs from ML).

A second lesson of this and succeeding chapters is that the sample likelihood
used needs to be appropriate for the type of process that generated the data,
i.e. the type of sampling scheme. The main sampling schemes in social science
applications are as follows:

1. a random sample from the inflow to the state (or of the population or
some other group – as long as sample selection is unrelated to survival),
with each spell monitored (from start) until completion;

2. a random sample as in (1) and either (a) each spell is monitored until
some common time t*, or (b) the censoring point varies;

3. a sample from the stock at a point of time who are interviewed some time
later (this corresponds to the delayed entry or left truncated spell data
case discussed in the biostatistics literature);

4. a sample from the stock with no re-interview;

5. a sample from the outflow (the case of right truncated spell data).

Cases (3)–(5) differ from the first two in that one has to take account
of 'selection effects' when characterising the sample likelihood. For continuous
time data, Stata's streg modules handle cases (1)–(3) straightforwardly,
as long as the data have been stset correctly. Cases (4) and
(5) require one to write one's own maximisation routines.
Let us consider the likelihood contributions relevant in each case. The likelihood
contribution per spell is L_i and, for the sample as a whole,

    L = Π_{i=1}^{n} L_i

or, equivalently, in terms of the log-likelihood function,

    log L = Σ_{i=1}^{n} log L_i.

Software for maximum likelihood estimation typically works by requiring
specification of expressions for log L_i for each individual i (corresponding to a
row in the analysis data set); log L is derived by summing down the rows.
5.0.1 Random sample of in‡ow and each spell monitored
until completed
Suppose we have a sample of completed spells (or persons – since we assume one spell per person), indexed by $i = 1, \ldots, n$. Then each individual contribution to the likelihood is given by the relevant density function. (There are no censored spells, by construction.) Hence the overall sample likelihood function is given by
\[
L = \prod_{i=1}^{n} f(T_i) \tag{5.1}
\]
where $T_i$ denotes the completed spell length for person $i$, and hence
\[
\log L = \sum_{i=1}^{n} \log f(T_i),
\]
with $\log L_i = \log f(T_i)$. To complete the model specification, one has to choose a specific parametric model for the survival time distribution, e.g. Weibull, and substitute the relevant expression into $\log f(T_i)$.
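To make this concrete, here is a minimal Python sketch (not from the book) for the simplest parametric choice, a constant hazard $\theta$ (the exponential model), for which $\log f(T_i) = \log\theta - \theta T_i$ and the MLE has the closed form $\hat{\theta} = n / \sum_i T_i$; the spell lengths are hypothetical:

```python
import math

def exp_loglik(theta, durations):
    """Log-likelihood for a sample of completed spells under a constant
    hazard theta: log f(T) = log(theta) - theta*T for each spell."""
    return sum(math.log(theta) - theta * T for T in durations)

# Illustrative completed spell lengths (hypothetical data)
durations = [2.0, 4.0, 5.0, 6.0, 11.0]

# Closed-form MLE for the exponential model: events / total exposure
theta_hat = len(durations) / sum(durations)

# theta_hat maximises the log-likelihood: check against nearby values
assert exp_loglik(theta_hat, durations) > exp_loglik(theta_hat * 1.1, durations)
assert exp_loglik(theta_hat, durations) > exp_loglik(theta_hat * 0.9, durations)
```

With a Weibull or other richer model there is no closed form and the log-likelihood is maximised numerically, which is what a canned routine such as streg does internally.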
5.0.2 Random sample of inflow with (right) censoring, monitored until $t^*$
Now the sample consists of:

• completed spells, indexed $j = 1, \ldots, J$, with $T_j \le t^*$, each contributing $L_j = f(T_j)$; and

• (right) censored spells, indexed $k = 1, \ldots, K$ (where $J + K = n$), with $T_k > t^*$, each contributing $L_k = S(t^*)$.

It follows that
\[
L = \prod_{j=1}^{J} f(T_j) \prod_{k=1}^{K} S(t^*). \tag{5.2}
\]
5.0.3 Random sample of population, right censoring but
censoring point varies
This is almost the same as the case just considered. It is probably the most
common likelihood used in empirical applications.
\[
L = \prod_{j=1}^{J} f(T_j) \prod_{k=1}^{K} S(T_k) \tag{5.3}
\]
The likelihood $L$ is often written differently in this case, in terms of the hazard rate. Taking logs of both sides implies:
\begin{align}
\log L &= \sum_{j=1}^{J} \log f(T_j) + \sum_{k=1}^{K} \log S(T_k) \tag{5.4}\\
&= \sum_{j=1}^{J} \log\left[\frac{f(T_j)}{S(T_j)}\, S(T_j)\right] + \sum_{k=1}^{K} \log S(T_k) \tag{5.5}\\
&= \sum_{j=1}^{J} \log\left[\theta(T_j)\, S(T_j)\right] + \sum_{k=1}^{K} \log S(T_k) \tag{5.6}\\
&= \sum_{j=1}^{J} \log \theta(T_j) + \sum_{j=1}^{J} \log S(T_j) + \sum_{k=1}^{K} \log S(T_k) \tag{5.7}\\
&= \sum_{j=1}^{J} \log \theta(T_j) + \sum_{i=1}^{n} \log S(T_i) \tag{5.8}\\
&= \sum_{i=1}^{n} \left[\delta_i \log \theta(T_i) + \log S(T_i)\right] \tag{5.9}
\end{align}
where $\delta_i$ is a censoring indicator defined such that
\[
\delta_i = \begin{cases} 1 & \text{if spell complete} \\ 0 & \text{if spell censored.} \end{cases} \tag{5.10}
\]
In this case, we have
\[
\log L_i = \delta_i \log \theta(T_i) + \log S(T_i).
\]
Given the relationship between the survivor function and the integrated hazard function, one can also write the log-likelihood contribution for each individual as
\begin{align}
\log L_i &= \delta_i \log \theta(T_i) - H(T_i) \nonumber\\
&= \delta_i \log \theta(T_i) - \int_0^{T_i} \theta(u)\, du. \tag{5.11}
\end{align}
Again, it is by the choice of different parametric specifications for $\theta(t)$ that the model is fully specified.
If all survival times are censored ($\delta_i = 0$ for all $i$), then the model cannot be fitted. If there are no exits, the sample provides no information about the nature of duration dependence in the hazard rate.
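For a concrete instance of (5.9), here is a Python sketch (not from the book) using the exponential model, where $\theta(t) = \theta$ and $\log S(t) = -\theta t$; with censoring, the MLE is the number of events divided by the total time at risk. The spell data are hypothetical:

```python
import math

def loglik_censored(theta, spells):
    """Equation-(5.9)-style log-likelihood under a constant hazard theta:
    log L_i = delta_i * log(theta) + log(S(T_i)), with log S(t) = -theta*t
    in the exponential special case."""
    return sum(d * math.log(theta) - theta * T for T, d in spells)

# (duration, censoring indicator delta): 1 = complete, 0 = right censored
spells = [(2.0, 1), (4.0, 1), (5.0, 0), (6.0, 1), (9.0, 0)]

# For the exponential model, the MLE is events / total time at risk
events = sum(d for _, d in spells)
exposure = sum(T for T, _ in spells)
theta_hat = events / exposure

assert loglik_censored(theta_hat, spells) >= loglik_censored(theta_hat * 1.05, spells)
```

Note how censored spells contribute only through the survivor term: they add exposure to the denominator but no event to the numerator, which is why a sample with no exits at all provides no information about the hazard.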
5.0.4 Left truncated spell data (delayed entry)
The most common social science example of this type of sampling scheme is when there is a sample from the stock of individuals at a point in time, who are interviewed some time later (‘stock sampling with follow-up’). Spell start dates are assumed to be known (these dates are of course before the date of the stock sample), so the total time at risk of exit can be calculated, together with the time between sampling and interview (last observation). Entry is ‘delayed’ because observation of the subjects under study begins some time after they are first at risk of the event.

A sample from the stock is a non-random sample (see below), but we can handle the ‘selection bias’ using information about the elapsed time between sampling and interview. We have to analyse outcomes that have occurred by the time of interview conditional on survival in the state up to the sampling time.
*Insert chart* time line indicating sampling time and interview times
Let us index spells that were completed by the time of the interview by $j = 1, \ldots, J$, and index spells still in progress at the time of the interview (right censored) by $k = 1, \ldots, K$. Then we may define:

• the incomplete spell length for each person at the time that the spell was sampled from the stock: $T_j$ for completed spells, $T_k$ for censored spells;

• the total observed spell length: $T_j + t_j$ for spells that were completed by the time of the interview, and $T_k + Z_k$ for censored spells, where $Z_k$ is the length of time between sampling and interview.
For both completed and censored spells, we have to condition on the fact that the person survived sufficiently long in the state to be at risk of being sampled in the stock. For example, suppose that the sample was of the stock of unemployed persons at 1 May 2001. Of all those who entered unemployment on 1 January 2001, some will have left unemployment by May; only the ‘slower’ exiters will still be unemployed in May and at risk of being sampled in the stock. If we ignored this problem, we would not take account of the length-biased sampling. How do we do this?
Recall the expression for a conditional probability:
\[
\Pr(A \mid B) = \frac{\Pr(A \cap B)}{\Pr(B)}. \tag{5.12}
\]
By analogy, we deflate the likelihood contribution for each individual by the probability of survival from entry to the state until the stock sampling date. But this probability is given for each $i$ by the survivor function, $S(T_i)$. Hence the likelihood contribution for leavers (type $j$) is:
\[
L_j = \frac{f(T_j + t_j)}{S(T_j)}. \tag{5.13}
\]
The likelihood contribution for stayers (type $k$) is:
\[
L_k = \frac{S(T_k + Z_k)}{S(T_k)}. \tag{5.14}
\]
Hence the overall sample likelihood is
\[
L = \prod_{j=1}^{J} \frac{f(T_j + t_j)}{S(T_j)} \prod_{k=1}^{K} \frac{S(T_k + Z_k)}{S(T_k)}. \tag{5.15}
\]
To write this expression in a form closer to that used in the previous subsection, let us simply define $T_i$ as the total spell length, use the censoring indicator $\delta_i$ to distinguish between censored and complete spells, and let $t_i$ be the date at which the stock sample was drawn. Then
\begin{align}
L &= \prod_{i=1}^{n} \left[\frac{f(T_i)}{S(t_i)}\right]^{\delta_i} \left[\frac{S(T_i)}{S(t_i)}\right]^{1-\delta_i} \tag{5.16}\\
&= \prod_{i=1}^{n} \left[\theta(T_i)\right]^{\delta_i} \left[\frac{S(T_i)}{S(t_i)}\right] \tag{5.17}
\end{align}
or
\[
\log L = \sum_{i=1}^{n} \left\{\delta_i \log \theta(T_i) + \log\left[\frac{S(T_i)}{S(t_i)}\right]\right\} \tag{5.18}
\]
which is directly comparable with the (log-)likelihood derived for the case without left truncation. This expression can be rewritten as
\[
\log L = \sum_{i=1}^{n} \left\{\delta_i \log \theta(T_i) + \log\left[w_i\, S(T_i)\right]\right\} \tag{5.19}
\]
where $w_i = 1/S(t_i)$. Think of the $w_i$ as being like a weighting variable: one weights the delayed entry observations by a type of inverse-probability weight to account for the left truncation. The later in time that $t_i$ is (the closer to $T_i$), the larger the weight. If there is no left truncation, then $S(t_i) = 1 = w_i$.
This model can also be estimated straightforwardly in Stata using streg, as long as the data have been properly stset first (use the enter option to indicate the ‘entry’ time, i.e. the stock sampling date).
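In the exponential special case, (5.18) reduces to $\delta_i \log\theta - \theta(T_i - t_i)$, since $\log[S(T_i)/S(t_i)] = -\theta(T_i - t_i)$, so only post-sampling exposure counts. A minimal Python sketch (not from the book; hypothetical spell data):

```python
import math

def loglik_delayed(theta, spells):
    """Equation-(5.18)-style contribution under a constant hazard:
    delta*log(theta) + log(S(T)/S(t)) with log S(t) = -theta*t, so
    survival is only 'counted' from the stock sampling date t onwards."""
    return sum(d * math.log(theta) - theta * (T - t) for T, t, d in spells)

# (total spell length T_i, stock sampling date t_i, delta_i) -- hypothetical
spells = [(10.0, 4.0, 1), (8.0, 3.0, 0), (15.0, 6.0, 1)]

# MLE: events divided by the time at risk *after* the sampling date
theta_hat = sum(d for *_, d in spells) / sum(T - t for T, t, _ in spells)

assert loglik_delayed(theta_hat, spells) >= loglik_delayed(theta_hat * 1.1, spells)
```

Discarding the pre-sampling exposure is exactly the conditioning on survival to $t_i$ that the inverse-probability weight $w_i = 1/S(t_i)$ represents.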
Lancaster (1979) also considered a variation on this sampling scheme. In his data set, he did not have the exact date of exit for those spells that ended in the interval between stock sampling and interview: he knew only that there had been an exit. To estimate a model with this structure, one simply replaces the expression $f(T_j + t_j)$ in (5.15) with $S(T_j) - S(T_j + t_j)$.
5.0.5 Sample from stock with no re-interview
This is a difficult and awkward case. It used to be relatively common because of a lack of suitable longitudinal surveys. This model could be fitted using data from standard cross-sectional household surveys: these surveys include samples of unemployed people, and typically include a single question asking how long the respondent had been unemployed. See Nickell (1979) and Atkinson et al. (1984) for applications.
In this situation we have no information that we can use to condition survival on (as in the previous subsection). So, in order to derive the sample likelihood, we have to write down the probability of observing such a spell, including taking account of the different chances of entering unemployment at different dates. Someone unemployed when surveyed at the start of October 2001 could have entered unemployment on 1 January and remained unemployed nine months, or entered unemployment on 1 February and remained unemployed eight months, or entered unemployment on 1 March and remained unemployed seven months. (Similarly for the unemployed people who were surveyed during a different month.)
We want to model the chance of having an incomplete spell of length $t$ at calendar time $s$, conditional on being unemployed at date $s$ – where the chance of the latter itself depends on the chances of having entered unemployment at some date in the past, $r$, and then remaining unemployed between $r$ and $s$. Distinguish a series of different event types:

$A$: incomplete spell of length $t$ at calendar time $s$;

$B$: unemployed at date $s$ (the date may differ by person – be $s_i$ – if from a survey with interviews throughout the year);

$C$: survival of the spell until time $s$ given entry at time $r$ (i.e. spell length $t = s - r$);

$E$: entry to unemployment at date $r$.
\begin{align}
L_i &= \Pr(A \mid B) \tag{5.20}\\
&= \Pr(A \cap B)\,/\,\Pr(B) \tag{5.21}\\
&= \Pr(A)\,/\,\Pr(B) \tag{5.22}\\
&= \frac{\Pr(C)\Pr(E)}{\Pr(B)}, \quad \text{since } \Pr(A) = \Pr(C)\Pr(E) \tag{5.23}
\end{align}
What are the expressions for the component probabilities? Look first at:
\[
\Pr(B) = \sum_{r=1}^{s} S_i(s - r \mid \text{entry at } r)\, u_i(r). \tag{5.24}
\]
This says that the probability of being unemployed at $s$ is the sum, over all dates before the interview (indexed by $r$), of the product of the probability of entering unemployment at date $r$ and the probability of remaining unemployed between $r$ and $s$.
We now need to make an assumption about unemployment inflow rates in order to evaluate $\Pr(E)$. Assume that
\[
\Pr(E) = u_i(r) = \iota_i\, u(r), \tag{5.25}
\]
i.e. the probability of entry at $r$ factors into an individual fixed effect $\iota_i$ (this varies with $i$ but not time) and a function $u(r)$ depending on time but not $i$ (i.e. common to all persons).
\begin{align}
L_i &= \frac{S(s_i \mid \text{entry at } r_i)\, \iota_i\, u(r_i)}{\sum_{r=1}^{s_i} S(s_i - r \mid \text{entry at } r)\, \iota_i\, u(r)} \tag{5.26}\\
&= \frac{S(s_i \mid \text{entry at } r_i)\, u(r_i)}{\sum_{r=1}^{s_i} S(s_i - r \mid \text{entry at } r)\, u(r)}. \tag{5.27}
\end{align}
Observe the cancelling of the individual fixed effects. When implementing the model, one can use ‘external’ data about unemployment entry rates for the $u(\cdot)$ term. This likelihood cannot be fitted using a canned routine in standard packages and has to be specially written (tricky!).
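Writing a full estimation routine is awkward mainly because of the denominator sum over entry dates, but evaluating (5.27) itself is straightforward. A Python sketch (not from the book; the exponential survivor function, the entry-rate function $u(r)$, and all parameter values are hypothetical), which also verifies numerically that the individual fixed effect cancels:

```python
import math

def S(length, theta=0.2):
    """Survivor function for a spell of the given length, here assumed
    exponential and independent of the calendar entry date."""
    return math.exp(-theta * length)

def u(r):
    """Hypothetical common entry-rate function u(r), e.g. taken from
    external data on unemployment inflows."""
    return 1.0 + 0.1 * r

def L_i(s_i, r_i, iota_i=1.0):
    """Equation (5.27): probability of an incomplete spell begun at r_i,
    conditional on being unemployed at survey date s_i. The individual
    fixed effect iota_i appears in numerator and denominator, so it
    cancels out."""
    num = S(s_i - r_i) * iota_i * u(r_i)
    den = sum(S(s_i - r) * iota_i * u(r) for r in range(1, s_i + 1))
    return num / den

# The contribution is invariant to the (unobserved) individual fixed effect:
assert abs(L_i(9, 3, iota_i=1.0) - L_i(9, 3, iota_i=5.0)) < 1e-12
```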
5.0.6 Right truncated spell data (outflow sample)
An example of this sampling scheme is where one has a sample of all the persons who left unemployment at a particular date and one wants to study the hazard of (re-)employment. Or one wants to study longevity, and has a sample based on death records. In these sorts of cases, there is an over-representation of short spells relative to long spells. Of all people beginning a spell at a particular date, those who are more likely to survive (have relatively long spells) are less likely to be found in the outflow at a particular date – a form of ‘selection bias’. To control for this, one has to condition on failure at the relevant survival time when specifying each individual’s likelihood contribution.
The sample likelihood is given by
\[
L = \prod_{i=1}^{n} \frac{f(T_i)}{F(T_i)}. \tag{5.28}
\]
There are no censored spells, of course, by definition. The numerator in (5.28) is the density function for a spell of length $T_i$. This must be conditioned: hence the expression in the denominator of (5.28), which gives the probability of failure at $t = T_i$, i.e. the probability of entering at $t = 0$, surviving up until the instant just before time $t = T_i$, and then making the transition at $t = T_i$.
5.1 Episode splitting: time-varying covariates and estimation of continuous time models
To illustrate this, let us return to the most commonly assumed sampling scheme, i.e. a random sample of spells with right censoring where the censoring point varies. In our sample likelihood derivation above, we implicitly assumed that:

• explanatory variables were all constant – there were no time-varying covariates;

• the data set was organised so that there was one row for each individual at risk of transition.

Estimation of continuous time parametric regression models incorporating time-varying covariates requires episode splitting. One has to split the survival time (episode) for each individual into sub-periods within which each time-varying covariate is constant. That is, one has to create multiple records for each individual, with one record per sub-period.¹ What is the logic behind this?
Consider a person $i$ with two different values for a covariate:
\[
X = \begin{cases} X_1 & \text{if } t < u \\ X_2 & \text{if } t \ge u \end{cases} \tag{5.29}
\]
Recall that the log-likelihood contribution for person $i$ in the data structure that we have is:
¹ Episode splitting to incorporate time-varying covariates is also required for other types of model. For the Cox model it need only be done at the survival times at which transitions occur (see Chapter 7) and, for discrete time models, episode splitting is done at each time point (see Chapter 6).
Record #   Censoring indicator    Survival time   Entry time   TVC value
Single data record for $i$:
1          $\delta_i$ = 0 or 1    $T_i$           0            -
Multiple data records for $i$ (after episode splitting):
1          $\delta_i$ = 0         $u$             0            $X_1$
2          $\delta_i$ = 0 or 1    $T_i$           $u$          $X_2$

Table 5.1: Example of episode splitting
\[
\log L_i = \delta_i \log\left[\theta(T_i)\right] + \log\left[S(T_i)\right] \tag{5.30}
\]
where $i$’s observed survival time is $T_i$ and the censoring indicator $\delta_i = 1$ if $i$’s spell is complete (transition observed) and $\delta_i = 0$ if the spell is censored. But
\begin{align}
\log\left[S(T_i)\right] &= \log\left[S(u)\, \frac{S(T_i)}{S(u)}\right] \tag{5.31}\\
&= \log\left[S(u)\right] + \log\left[\frac{S(T_i)}{S(u)}\right]. \tag{5.32}
\end{align}
(This expression, incorporating some notational liberties to facilitate exposition, follows directly from the discussion about the incorporation of time-varying covariates into PH and AFT models in Chapter 3.) Thus the log of the probability of survival until $T_i$ = (log of the probability of survival to time $u$) + (log of the probability of survival to $T_i$, conditional on entry at $u$).
So what we do is create one new record with $\delta_i = 0$, $t = u$ (a right censored episode), plus one new record summarising an episode with ‘delayed entry’ at time $u$, for which the censoring indicator $\delta_i$ has the value in the original data. In the first episode and record, the time-varying covariate takes the value $X_1$, and in the second record the time-varying covariate takes the value $X_2$. See Table 5.1 for a summary of the old and new data structures.
Thus episode splitting (when combined with software that can handle left-truncated spell data) gives the correct log-likelihood contribution. In Stata, episode splitting is easily accomplished using stsplit (applied after the data have been stset, and which then cleverly updates stset). One then creates the appropriate time-varying covariate values for each record and, finally, multivariate continuous time models can be estimated straightforwardly using the streg or stcox commands.
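The equivalence just described can be checked directly. The sketch below (not from the book; an exponential hazard with illustrative parameter values and a single hypothetical covariate that switches at time $u$) verifies that the sum of the two split records’ contributions equals the direct log-likelihood contribution:

```python
import math

def contrib(delta, t_end, t_entry, x, b0=-2.0, b1=0.5):
    """Log-likelihood contribution for one record of an exponential model
    with hazard theta = exp(b0 + b1*x), entered at t_entry and observed
    to t_end: delta*log(theta) - theta*(t_end - t_entry). The parameter
    values are illustrative only."""
    theta = math.exp(b0 + b1 * x)
    return delta * math.log(theta) - theta * (t_end - t_entry)

# One person: covariate is X1 = 0 before u = 3, X2 = 1 afterwards;
# spell completed at T = 7 (delta = 1)
u, T, X1, X2 = 3.0, 7.0, 0.0, 1.0

# Direct contribution with the piecewise-constant covariate:
# delta*log(theta at T) minus the integrated hazard over [0, T)
theta1, theta2 = math.exp(-2.0 + 0.5 * X1), math.exp(-2.0 + 0.5 * X2)
direct = math.log(theta2) - theta1 * u - theta2 * (T - u)

# Episode splitting: a censored record over [0, u) with X1, plus a
# delayed-entry record over [u, T) with X2 carrying the real delta
split = contrib(0, u, 0.0, X1) + contrib(1, T, u, X2)

assert abs(direct - split) < 1e-12
```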
Chapter 6
Discrete time multivariate
models
6.1 Inflow sample with right censoring
We now measure time in discrete intervals indexed by the positive integers, and let us suppose that each interval is a month long. We observe person $i$’s spell from month $k = 1$ through to the end of the $j$-th month, at which point $i$’s spell is either complete ($\delta_i = 1$) or right censored ($\delta_i = 0$). The discrete hazard is
\[
h_{ij} = \Pr(T_i = j \mid T_i \ge j). \tag{6.1}
\]
The likelihood contribution for a censored spell is given by the discrete time survivor function:
\begin{align}
L_i &= \Pr(T_i > j) = S_i(j) \tag{6.2}\\
&= \prod_{k=1}^{j} (1 - h_{ik}) \tag{6.3}
\end{align}
and the likelihood contribution for each completed spell is given by the discrete time density function:
\begin{align}
L_i &= \Pr(T_i = j) = f_i(j) \tag{6.4}\\
&= h_{ij}\, S_i(j-1) \tag{6.5}\\
&= \frac{h_{ij}}{1 - h_{ij}} \prod_{k=1}^{j} (1 - h_{ik}). \tag{6.6}
\end{align}
The likelihood for the whole sample is
\begin{align}
L &= \prod_{i=1}^{n} \left[\Pr(T_i = j)\right]^{\delta_i} \left[\Pr(T_i > j)\right]^{1-\delta_i} \tag{6.7}\\
&= \prod_{i=1}^{n} \left[\frac{h_{ij}}{1 - h_{ij}} \prod_{k=1}^{j} (1 - h_{ik})\right]^{\delta_i} \left[\prod_{k=1}^{j} (1 - h_{ik})\right]^{1-\delta_i} \tag{6.8}\\
&= \prod_{i=1}^{n} \left[\frac{h_{ij}}{1 - h_{ij}}\right]^{\delta_i} \prod_{k=1}^{j} (1 - h_{ik}) \tag{6.9}
\end{align}
where $\delta_i$ is a censoring indicator defined such that $\delta_i = 1$ if a spell is complete and $\delta_i = 0$ if a spell is right censored (as in the previous chapter). This implies that
\[
\log L = \sum_{i=1}^{n} \delta_i \log\left[\frac{h_{ij}}{1 - h_{ij}}\right] + \sum_{i=1}^{n} \sum_{k=1}^{j} \log(1 - h_{ik}). \tag{6.10}
\]
Now define a new binary indicator variable $y_{ik} = 1$ if person $i$ makes a transition (their spell ends) in month $k$, and $y_{ik} = 0$ otherwise. That is,
\begin{align}
\delta_i = 1 &\implies y_{ik} = 1 \text{ for } k = T_i,\ y_{ik} = 0 \text{ otherwise} \tag{6.11}\\
\delta_i = 0 &\implies y_{ik} = 0 \text{ for all } k. \tag{6.12}
\end{align}
Hence, we can write
\begin{align}
\log L &= \sum_{i=1}^{n} \sum_{k=1}^{j} y_{ik} \log\left[\frac{h_{ik}}{1 - h_{ik}}\right] + \sum_{i=1}^{n} \sum_{k=1}^{j} \log(1 - h_{ik}) \tag{6.13}\\
&= \sum_{i=1}^{n} \sum_{k=1}^{j} \left[y_{ik} \log h_{ik} + (1 - y_{ik}) \log(1 - h_{ik})\right]. \tag{6.14}
\end{align}
But this expression has exactly the same form as the standard likelihood function for a binary regression model in which $y_{ik}$ is the dependent variable and in which the data structure has been reorganised from having one record per spell to having one record for each month that a person is at risk of transition from the state (so-called person-month data or, more generally, person-period data). Table 6.1 summarises the two data structures.
This is just like the episode splitting described earlier for continuous time models, which also led to the creation of additional data records, except that here the episode splitting is done on a much more extensive basis, and regardless of whether there are any time-varying covariates among the $X$. As long as the hazard is not constant, the variable summarising the pattern of duration dependence will itself be a time-varying covariate in this reorganised data set.
This result implies that there is an easy estimation method available for discrete time hazard models using data with this sampling scheme (see inter alia Allison, 1984; Jenkins, 1995). It has four steps:
Person data                        Person-month data
person id, i  $\delta_i$  $T_i$    person id, i  $\delta_i$  $T_i$  $y_{ik}$  person-month id, k
1             0           2        1             0           2      0         1
                                   1             0           2      0         2
2             1           3        2             1           3      0         1
                                   2             1           3      0         2
                                   2             1           3      1         3
...                                ...

Table 6.1: Person and person-period data structures: example
1. Reorganise the data into person-period format;
2. Create any time-varying covariates – at the very least this includes a variable describing duration dependence in the hazard rate (see the discussion in Chapter 3 of alternative specifications for the duration dependence terms);
3. Choose the functional form for $h_{ik}$ (logistic or cloglog);
4. Estimate the model using any standard binary dependent variable regression package: logit for a logistic model; cloglog for a complementary log-log model.
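Steps 1 and 4 can be sketched in a few lines of Python (not from the book; the data match the Table 6.1 example, and an intercept-only logistic hazard, whose MLE has a closed form, stands in for a full logit or cloglog fit with covariates):

```python
import math

def to_person_period(person_data):
    """Expand (i, delta_i, T_i) spell records into person-period rows
    (i, k, y_ik), as in Table 6.1: one row per month at risk, with
    y_ik = 1 only in the final month of a completed spell."""
    rows = []
    for i, delta, T in person_data:
        for k in range(1, T + 1):
            y = 1 if (delta == 1 and k == T) else 0
            rows.append((i, k, y))
    return rows

person_data = [(1, 0, 2), (2, 1, 3)]   # the Table 6.1 example
rows = to_person_period(person_data)
assert len(rows) == 5                  # 2 + 3 person-months at risk
assert sum(y for *_, y in rows) == 1   # one transition in the data

# With no covariates, the binary-model MLE of a constant discrete hazard
# is the fraction of person-months with y = 1, and the fitted logit
# intercept is log(h / (1 - h)).
h_hat = sum(y for *_, y in rows) / len(rows)
intercept = math.log(h_hat / (1.0 - h_hat))
```

With covariates one would hand `rows` (plus the covariate columns) to any logit or cloglog routine; the point is that nothing beyond a standard binary-response fit is needed.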
The easy estimation method is not the only way of estimating the model. In principle, one could estimate the sample likelihood given at the start without reorganising the data. Indeed, before the advent of cheap computer memory and storage, one might have had to do this, because the episode splitting required for the easy estimation method can create very large data sets with infeasible storage requirements.
That the likelihood for the discrete time hazard model can be written in the same form as the likelihood for a binary dependent variable model also has some other implications. For example, the latter models can only be estimated if the data contain both ‘successes’ and ‘failures’ in the binary dependent variable. In the current context, this means that in order to estimate discrete time hazard models in this way, the data must contain both censored and complete spells. (To see this, look at the first-order condition for a maximum of the log-likelihood function.)
The easy estimation method can also be used when there is a stock sample with follow-up (‘delayed entry’), as we shall now see.
6.2 Left-truncated spell data (‘delayed entry’)
We proceed in the discrete time case in a manner analogous to that employed with continuous time data. Recall that with no delayed entry,
\[
L_i = \left[\frac{h_{ij}}{1 - h_{ij}}\right]^{\delta_i} \prod_{k=1}^{j} (1 - h_{ik}) = \left[\frac{h_{ij}}{1 - h_{ij}}\right]^{\delta_i} S_i(j). \tag{6.15}
\]
With delayed entry at time $u_i$ (say) for person $i$, we have to condition on survival up to time $u_i$ (corresponding to the end of the $u_i$-th interval or cycle), which means dividing the expression above by $S_i(u_i)$. Hence with left-truncated data, the likelihood contribution for $i$ is:
\[
L_i = \frac{\left[\dfrac{h_{ij}}{1 - h_{ij}}\right]^{\delta_i} \prod_{k=1}^{j} (1 - h_{ik})}{S_i(u_i)}. \tag{6.16}
\]
But
\[
S_i(u_i) = \prod_{k=1}^{u_i} (1 - h_{ik}) \tag{6.17}
\]
and this leads to a ‘convenient cancelling’ result (Guo, 1993; Jenkins, 1995):
\begin{align}
L_i &= \left[\frac{h_{ij}}{1 - h_{ij}}\right]^{\delta_i} \frac{\prod_{k=1}^{j} (1 - h_{ik})}{\prod_{k=1}^{u_i} (1 - h_{ik})} \tag{6.18}\\
&= \left[\frac{h_{ij}}{1 - h_{ij}}\right]^{\delta_i} \prod_{k=u_i+1}^{j} (1 - h_{ik}). \tag{6.19}
\end{align}
Taking logarithms, we have
\[
\log L_i = \sum_{k=u_i+1}^{j} \left[y_{ik} \log h_{ik} + (1 - y_{ik}) \log(1 - h_{ik})\right] \tag{6.20}
\]
which is very similar to the expression that we had in the no-delayed-entry case, except that the summation now runs over the months from the month of ‘delayed entry’ (e.g. when the stock sample was drawn) to the month when last observed.
Implementation of the easy estimation method using data with this sampling scheme now has five steps (Jenkins, 1995):
1. Reorganise the data into person-period format;
2. Throw away the records for all the periods prior to the time of ‘delayed entry’ (the months up to and including $u_i$ in this case), retaining the months during which each individual is observed at risk of experiencing the event;
3. Create any time-varying covariates (at the very least this includes a variable describing duration dependence in the hazard rate – see above);
4. Choose the functional form for $h_{ik}$ (logistic or cloglog);
5. Estimate the model using any standard binary dependent variable regression package: logit for a logistic model; cloglog for a complementary log-log model.
The only difference from before is step 2 (throwing data away). In practice, steps 1 and 2 might actually be combined: if one has a panel data set, for instance, one can create a data set with the correct structure directly.
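The ‘convenient cancelling’ behind this recipe is easy to verify numerically; a sketch (not from the book) with hypothetical month-specific hazards for one person:

```python
def S_disc(h, j):
    """Discrete-time survivor function: product of (1 - h_k) over k = 1..j."""
    out = 1.0
    for k in range(1, j + 1):
        out *= 1.0 - h[k]
    return out

# Hypothetical month-specific hazards h_k for one person
h = {1: 0.1, 2: 0.15, 3: 0.2, 4: 0.25, 5: 0.3}
j, u_i, delta = 5, 2, 1   # last observed month, delayed-entry month, complete

# Equation (6.16): divide the no-delayed-entry contribution by S(u_i)
lhs = (h[j] / (1 - h[j])) ** delta * S_disc(h, j) / S_disc(h, u_i)

# Equation (6.19): after cancelling, the product runs from u_i + 1 only
prod = 1.0
for k in range(u_i + 1, j + 1):
    prod *= 1.0 - h[k]
rhs = (h[j] / (1 - h[j])) ** delta * prod

assert abs(lhs - rhs) < 1e-12
```

Dropping the person-period rows for $k \le u_i$ in step 2 is exactly the cancelling of the $\prod_{k=1}^{u_i}(1 - h_{ik})$ terms.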
6.3 Right-truncated spell data (outflow sample)
Again the specification of the sample likelihood closely parallels that for the continuous time case. There are no censored cases; all spells are completed, by construction. But one has to account for the selection bias arising from outflow sampling by conditioning each individual’s likelihood contribution on failure at the observed survival time. Each spell’s contribution to the sample likelihood is given by the discrete time density function, $f(j)$, divided by the discrete time failure function, $F(j) = 1 - S(j)$. Hence $i$’s contribution to the sample likelihood is:
\[
L_i = \frac{\dfrac{h_{ij}}{1 - h_{ij}} \prod_{k=1}^{j} (1 - h_{ik})}{1 - \prod_{k=1}^{j} (1 - h_{ik})}. \tag{6.21}
\]
Unfortunately, the likelihood function does not conveniently simplify in this case (as it did in the left-truncated spell data case).
Chapter 7
Cox’s proportional hazard
model
Cox (1972), the paper proposing this model, is perhaps the most often cited article in survival analysis. The distinguishing feature of Cox’s proportional hazard model, sometimes simply referred to as the ‘Cox model’, is its demonstration that one can estimate the relationship between the hazard rate and explanatory variables without having to make any assumptions about the shape of the baseline hazard function (cf. the parametric models considered earlier). Hence the Cox model is sometimes referred to as a semi-parametric model. The result derives from an innovative use of the proportional hazard assumption, together with several other insights and assumptions, and a partial likelihood (PL) method of estimation rather than maximum likelihood. Here follows an intuitive demonstration of how the model works, based on the explanation given by Allison (1984).
We are working in continuous time, and suppose that we have a random sample of spells, some of which are censored and some complete. (The model can also be estimated using left-truncated data, but we shall not consider that case here.)
Recall the PH specification
\begin{align}
\theta(t, X_i) &= \theta_0(t) \exp(\beta' X_i) \tag{7.1}\\
&= \theta_0(t)\, \lambda_i, \tag{7.2}
\end{align}
where $\lambda_i \equiv \exp(\beta' X_i)$.
We shall suppose, for the moment, that the $X$ vector is constant, but note that the Cox model can in fact also handle time-varying covariates.
Cox (1972) proposed a method for estimating the slope coefficients in $\beta$ (i.e. excluding the intercept) without having to specify any functional form for the
Person #   Time    Event #
$i$        $t_i$   $k$
1          2       1
2          4       2
3          5       3
4          5*
5          6       4
6          9*
7          11      5
8          12*

*: censored observation

Table 7.1: Data structure for Cox model: example
baseline hazard function, using the method of PL. PL works in terms of the ordering of events, by contrast with the focus in ML on persons (spells). Consider the illustrative data set shown in Table 7.1. We assume that there is a maximum of one event at each possible survival time (this rules out ties in survival times, for which there are various standard ways of adapting the Cox model). In the table, persons are arranged in order of their survival times.
The sample partial likelihood is given by
\[
PL = \prod_{k=1}^{K} L_k. \tag{7.4}
\]
This is quite different from the sample likelihood expressions we had before in the maximum likelihood case. The $k$ indexes events, not persons. But what then is each $L_k$? Think of it as follows:
\[
L_k = \Pr(\text{person } i \text{ has the event at } t = t_i \mid \text{in the risk set at } t = t_i), \tag{7.5}
\]
i.e. the probability that this particular person $i$ experiences the event at $t = t_i$, given that one observation amongst the many at risk experiences the event.
Or: of the people at risk of experiencing the event, which one in fact does experience it? To work out this probability, use the rules of conditional probability together with the result that the probability density for survival times is the product of the hazard rate and the survivor function, i.e. $f(t) = \theta(t)S(t)$, and so the probability that an event occurs in the tiny interval $[t, t + \mathrm{d}t)$ is $f(t)\,\mathrm{d}t = \theta(t)S(t)\,\mathrm{d}t$.
Consider event $k = 5$, with risk set $i \in \{7, 8\}$. We can define
\begin{align*}
A &= \Pr(\text{event experienced by } i = 7 \text{ and not } i = 8) = \left[\theta_7(11)\, S_7(11)\, \mathrm{d}t\right]\left[S_8(11)\right]\\
B &= \Pr(\text{event experienced by } i = 8 \text{ and not } i = 7) = \left[\theta_8(11)\, S_8(11)\, \mathrm{d}t\right]\left[S_7(11)\right].
\end{align*}
Now consider the expression for the probability of $A$ conditional on the probability of either $A$ or $B$ (the chance that either could have experienced the event, which is a sum of probabilities). Using the standard conditional probability formula, we find that
\[
L_5 = \frac{A}{A + B} = \frac{\theta_7(11)}{\theta_7(11) + \theta_8(11)}. \tag{7.6}
\]
Note that the survivor function terms cancel. The same idea can be applied to derive all the other $L_k$. For example,
\[
L_1 = \frac{\theta_1(2)}{\theta_1(2) + \theta_2(2) + \cdots + \theta_8(2)}. \tag{7.7}
\]
Everyone is in the risk set for the first event.
Now let us apply the PH assumption $\theta(t, X_i) = \theta_0(t)\lambda_i$, starting for illustration with event $k = 5$ and its associated risk set $i \in \{7, 8\}$. We can substitute into the expression for $L_5$ above, and find that
\begin{align}
L_5 &= \frac{\theta_0(11)\, \lambda_7}{\theta_0(11)\, \lambda_7 + \theta_0(11)\, \lambda_8} \tag{7.8}\\
&= \frac{\lambda_7}{\lambda_7 + \lambda_8}. \tag{7.9}
\end{align}
The baseline hazard contributions cancel. (The intercept term, which is common to all, $\exp(\beta_0)$, also cancels from both the numerator and the denominator, since it appears in each of the $\lambda$ terms.) By similar arguments,
\[
L_1 = \frac{\lambda_1}{\lambda_1 + \lambda_2 + \cdots + \lambda_8} \tag{7.10}
\]
and so on. Given each $L_k$ expression, one can construct the complete PL expression for the whole sample of events, and then maximise it to derive estimates of the slope coefficients within $\beta$. It has been shown that these estimates have ‘nice’ properties.
Note that the baseline hazard function is completely unspecified. This can be seen as a great advantage (one avoids potential problems from specifying the wrong shape), but some may also see it as a disadvantage if one is particularly interested in the shape of the baseline hazard function for its own sake. One can derive estimates of the baseline survivor and cumulative hazard functions (and thence of the baseline hazard) using methods analogous to the Kaplan-Meier procedure discussed earlier, with the same issues arising.
If there are tied survival times, then the neat results above do not hold exactly. In this case, either one has to use various approximations (several standard ones are built into software packages by default), or modify the expressions to derive the ‘exact’ partial likelihood, though this may increase computational time substantially. (It turns out that the specification in the latter case is closely related to the conditional logit model.) On the other hand, if one finds that the incidence of tied survival times is relatively high in one’s data set, then perhaps one should ask whether a continuous time model is suitable. One could instead apply the discrete time proportional hazards model.
The Cox PL model can incorporate time-varying covariates. In empirical implementation, one uses episode splitting again. By contrast with the approaches employed in the last two chapters, observe that now the splitting need only be done at the failure times. This is because the PL estimates are derived using only the information about the risk pool at each failure time. Thus covariates are ‘evaluated’ during estimation only at the failure times, and it does not matter what happens to their values in between.
Finally, observe that the expression for each $L_k$ does not depend on the precise survival time at which the $k$-th event occurs. Only the order of events affects the PL expression. Check this yourself: multiply all the survival times in the table by two and repeat the derivations (you should find that the estimates of the slope coefficients in $\beta$ are the same). What is the intuition? Remember that the PH assumption implies that the hazard functions for two different individuals have the same shape, differing only by a constant multiplicative scaling factor that does not vary with survival time. To estimate that constant scaling factor, given the common shape, one does not need the exact survival times.
Chapter 8
Unobserved heterogeneity
(‘frailty’)
In the multivariate models considered so far, all differences between individuals were assumed to be captured using observed explanatory variables (the $X$ vector). We now consider generalisations of the earlier models to allow for unobserved individual effects. These are usually referred to as ‘frailty’ in the biomedical sciences. (If one is modelling human survival times, then frailty is an unobserved propensity to experience an adverse health event.) There are several reasons why such variables might be relevant. For example:

• omitted variables (unobserved in the available data, or intrinsically unobservable, such as ‘ability’);

• measurement errors in observed survival times or regressors (see Lancaster, 1990, chapter 4).
What if these effects are important but ‘ignored’ in modelling? The literature suggests several findings:

• The ‘no-frailty’ model will over-estimate the degree of negative duration dependence in the hazard (i.e. under-estimate the degree of positive duration dependence). *Insert figure*

• The proportionate response of the hazard rate to a change in a regressor $X_k$ is no longer constant (it was given by $\beta_k$ in the models without unobserved heterogeneity), but declines with time.

• One gets an under-estimate of the true proportionate response of the hazard to a change in a regressor $X_k$ from the no-frailty model’s $\beta_k$.
Let us now look at these results in more detail.
8.1 Continuous time case
For convenience we suppress the subscript indexing individuals, and assume for now that there are no time-varying covariates. We consider the model
\[
\theta(t, X \mid v) = v\, \theta(t, X) \tag{8.1}
\]
where $\theta(t, X)$ is the hazard rate depending on observable characteristics $X$, and $v$ is an unobservable individual effect that scales the no-frailty component. The random variable $v$ is assumed to have the following properties:

• $v > 0$;
• $E(v) = 1$ (unit mean, a normalisation required for identification);
• finite variance $\sigma^2 > 0$; and
• $v$ is distributed independently of $t$ and $X$.
This model is sometimes referred to as a ‘mixture’ model (think of the two components being ‘mixed’ together) or as a ‘mixed proportional hazard’ (MPH) model. It can be shown, using the standard relationship between the hazard rate and survivor function (see Chapter 3), that the relationship between the frailty survivor function and the no-frailty survivor function is
\[
S(t, X \mid v) = \left[S(t, X)\right]^{v}. \tag{8.2}
\]
Thus the individual effect $v$ scales the no-frailty survivor function. Individuals with above-average values of $v$ leave relatively fast (their hazard rate is higher, other things being equal, and their survival times are shorter), and the opposite occurs for individuals with below-average values of $v$.
If the no-frailty hazard component has proportional hazards form, then:
\begin{align}
\theta(t, X) &= \theta_0(t)\, e^{\beta' X} \tag{8.3}\\
\theta(t, X \mid v) &= v\, \theta_0(t)\, e^{\beta' X} \tag{8.4}\\
\log\left[\theta(t, X \mid v)\right] &= \log \theta_0(t) + \beta' X + u \tag{8.5}
\end{align}
where $u \equiv \log(v)$. In the no-frailty model, the log of the hazard rate at each survival time $t$ equals the log of the baseline hazard rate at $t$ (common to all persons) plus an additive individual-specific component ($\beta' X$). The frailty model for the log-hazard adds a further additive ‘error’ term ($u$). Alternatively, think of this as a random intercept model: the intercept is $\beta_0 + u$.
How does one estimate frailty models, given that the individual effect is unobserved? Clearly we cannot estimate the values of $v$ themselves since, by construction, they are unobserved. Put another way, there are as many individual effects as individuals in the data set, and there are not enough degrees of freedom left to fit these parameters. However if we suppose the distribution of $v$ has a shape whose functional form is summarised in terms of only a few key parameters, then we can estimate those parameters with the data available. The steps are as follows:

• Specify a distribution for the random variable $v$, where this distribution has a particular parametric functional form (e.g. summarising the variance of $v$).

• Write the likelihood function so that it refers to the distributional parameter(s) (rather than each $v$), otherwise known as 'integrating out' the random individual effect.

This means that one works with some survivor function

$$S_v(t, X) = S(t, X \mid \beta, \sigma^2), \qquad (8.6)$$

and not $S(t, X \mid \beta, v)$. Then

$$S_v(t, X) = \int_0^{\infty} \left[S(t, X)\right]^v g(v)\,dv \qquad (8.7)$$

where $g(v)$ is the probability density function (pdf) for $v$. The $g(v)$ is the 'mixing' distribution. But what shape is appropriate to use for the distribution of $v$ (among the potential candidates satisfying the assumptions given earlier)?
The most commonly used specification for the mixing distribution is the Gamma distribution, with unit mean and variance $\sigma^2$. Making the relevant substitutions into $S_v(t, X)$ and evaluating the integral implies a specific functional form for the frailty survivor function:

$$S(t, X \mid \beta, \sigma^2) = \left[1 - \sigma^2 \ln S(t, X)\right]^{-1/\sigma^2} \qquad (8.8)$$

$$= \left[1 + \sigma^2 H(t, X)\right]^{-1/\sigma^2} \qquad (8.9)$$

where $S(t, X)$ is the no-frailty survivor function, and using the property that the integrated hazard function $H(t, X) = -\ln S(t, X)$. (Note that $\lim_{\sigma^2 \to 0} S(t, X \mid \beta, \sigma^2) = S(t, X)$.)
Now consider what this implies if the no-frailty part follows a Weibull model, in which case $S(t) = \exp(-\lambda t^{\alpha})$, $H(t) = \lambda t^{\alpha}$, $\lambda = \exp(\beta'X)$, and so

$$S(t, X \mid \beta, \sigma^2) = \left[1 + \sigma^2 \lambda t^{\alpha}\right]^{-1/\sigma^2} \qquad (8.10)$$

for which one may calculate that the median survival time is given by

$$m_v = \left[\frac{2^{\sigma^2} - 1}{\sigma^2 \lambda}\right]^{1/\alpha}. \qquad (8.11)$$

This may be compared with the expression for the median in the no-frailty case for the Weibull model, $\left[\log(2)/\lambda\right]^{1/\alpha}$, which is the corresponding expression as $\sigma^2 \to 0$.
This survivor function (and median) is an unconditional one, i.e. not conditioning on a value of $v$ (that has been integrated out). An alternative approach is to derive the survivor function (and its quantiles) for specific values of $v$ (these are conditional survivor functions). Of the specific values, the most obvious choice is $v = 1$ (the mean value), but other values such as the upper or lower quartiles could also be used. (The estimates of the quartiles of the heterogeneity distribution can be derived using the estimates of $\sigma^2$.)
An alternative mixing distribution to the Gamma distribution is the Inverse
Gaussian distribution. This is less commonly used (but available in Stata along
with the Gamma mixture model). For further discussion, see e.g. Lancaster
(1990), the Stata manuals under streg, or EC968 Web Lesson 8.
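The Gamma-mixture Weibull survivor function (8.10) and its median (8.11) are easy to compute directly. A minimal sketch in Python, with illustrative parameter values of my own (not taken from the text):

```python
import numpy as np

sigma2, alpha, lam = 0.5, 1.5, 0.8          # assumed illustrative values

def S_frailty(t):
    """Gamma-frailty Weibull survivor function, eq (8.10)."""
    return (1 + sigma2 * lam * t**alpha) ** (-1 / sigma2)

# Median survival time from eq (8.11) ...
m = ((2**sigma2 - 1) / (sigma2 * lam)) ** (1 / alpha)
print(S_frailty(m))                         # ≈ 0.5, confirming (8.11)

# ... compared with the no-frailty median, the sigma2 -> 0 limit:
m0 = (np.log(2) / lam) ** (1 / alpha)
print(m, m0)
```

With these parameter values the frailty median exceeds the no-frailty one, reflecting the selective survival of low-$v$ individuals.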
8.2 Discrete time case

Recall that in the continuous time proportional hazard model, the log of the frailty hazard is:

$$\log\left[\theta(j, X \mid v)\right] = \log\left[\theta_0(j)\right] + \beta'X + u. \qquad (8.12)$$

Recall too the discrete time model for the situation where survival times are grouped (leading to the cloglog model). By the same arguments, one can have a discrete time PH model with unobserved heterogeneity:

$$\operatorname{cloglog}\left[h(j, X \mid v)\right] = D(j) + \beta'X + u \qquad (8.13)$$

where $u = \log(v)$, as above, and $D(j)$ characterises the baseline hazard function. If $v$ has a Gamma distribution, as proposed by Meyer (1990), there is a closed form expression for the frailty survivor function: see (8.9) above. If we have an inflow sample with right censoring, the contribution to the sample likelihood for a censored observation with spell length $j$ intervals is $S(j, X \mid \beta, \sigma^2)$, and the contribution of someone who makes a transition in the $j$th interval is $S(j-1, X \mid \beta, \sigma^2) - S(j, X \mid \beta, \sigma^2)$, with the appropriate substitutions made. This model can be estimated using my Stata program pgmhaz8 (or pgmhaz for users of Stata versions 5–7). The data should be organized in person-period form, as discussed in the previous chapter, and time-varying covariates may be incorporated.
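The two likelihood contributions just described follow directly from (8.9). A minimal sketch in Python, where the baseline terms, the frailty variance, and the linear index are all assumed values for illustration:

```python
import numpy as np

sigma2 = 0.4                                 # assumed frailty variance
D = np.array([-2.0, -1.8, -1.6, -1.5])       # assumed baseline terms D(k)

def H(j, xb):
    """Integrated hazard to the end of interval j for the no-frailty
    cloglog model: sum of exp(D(k) + beta'X) over k = 1..j."""
    return np.exp(D[:j] + xb).sum()

def S(j, xb):
    """Gamma-frailty survivor function, eq (8.9)."""
    return (1 + sigma2 * H(j, xb)) ** (-1 / sigma2)

xb = 0.3                                     # assumed beta'X for one person
L_censored = S(3, xb)                        # censored after 3 intervals
L_exit = S(2, xb) - S(3, xb)                 # exit during the 3rd interval
print(L_censored, L_exit)
```

This is the computation that pgmhaz8 performs for every observation when building the sample log-likelihood.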
Alternatively, one may suppose that $u$ has a Normal distribution with mean zero. In this case, there is no convenient closed form expression for the survivor function and hence likelihood contributions: the 'integrating out' must be done numerically. Estimation may be done using the built-in Stata program xtcloglog, using data organized in person-period form.
For the logit model, one might suppose now that the log odds of the hazard takes the form

$$\frac{h(j, X \mid e)}{1 - h(j, X \mid e)} = \left[\frac{h_0(j)}{1 - h_0(j)}\right] \exp\left(\beta'X + e\right) \qquad (8.14)$$

which leads to the model with

$$\operatorname{logit}\left[h(j, X \mid e)\right] = D(j) + \beta'X + e \qquad (8.15)$$

where $e$ is an 'error' term with mean zero and finite variance. If one assumes that $e$ has a Normal distribution with mean zero, one has a model that can be estimated using the Stata program xtlogit applied to data organized in person-period form.
All the approaches mentioned so far are parametric. A non-parametric approach to characterising the frailty distribution was pioneered in the econometrics literature by Heckman and Singer (1984). The idea is essentially that one fits an arbitrary distribution using a set of parameters. These parameters comprise a set of 'mass points' and the probabilities of a person being located at each mass point. We have a discrete (multinomial) rather than a continuous mixing distribution. Sociologists might recognize the specification as a form of 'latent class' model: the process describing time-to-event now differs between a number of classes (groups) within the population.
To be concrete, consider the discrete time proportional hazards model, for which the interval hazard rate is given (see Chapter 3) by

$$h(j, X) = 1 - \exp\left[-\exp\left(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_K X_K + \gamma_j\right)\right].$$

Suppose there are two types of individual in the population (where this feature is not observed). We can incorporate this idea by allowing the intercept $\beta_0$ to vary between the two classes, i.e.

$$h_1(j, X) = 1 - \exp\left[-\exp\left(\mu_1 + \beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_K X_K + \gamma_j\right)\right] \text{ for Type 1} \qquad (8.16)$$

$$h_2(j, X) = 1 - \exp\left[-\exp\left(\mu_2 + \beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_K X_K + \gamma_j\right)\right] \text{ for Type 2.} \qquad (8.17)$$

If $\mu_2 > \mu_1$, then Type 2 people are fast exiters relative to Type 1 people, other things being equal. Once again we have a random intercept model, but the randomness is characterised by a discrete distribution rather than a continuous (e.g. Gamma) distribution, as earlier.
If we have an inflow sample with right censoring, the contribution to the sample likelihood for a person with spell length $j$ intervals is the probability-weighted sum of the contributions arising were he a Type 1 or a Type 2 person, i.e.

$$\mathcal{L} = \pi\,\mathcal{L}(\mu_1) + (1 - \pi)\,\mathcal{L}(\mu_2) \qquad (8.18)$$

where

$$\mathcal{L}_1 = \left[\frac{h_1(j, X)}{1 - h_1(j, X)}\right]^{\delta} \prod_{k=1}^{j}\left[1 - h_1(k, X)\right] \qquad (8.19)$$

$$\mathcal{L}_2 = \left[\frac{h_2(j, X)}{1 - h_2(j, X)}\right]^{\delta} \prod_{k=1}^{j}\left[1 - h_2(k, X)\right] \qquad (8.20)$$

where $\pi$ is the probability of belonging to Type 1, and $\delta$ is the censoring indicator. Where there are $M$ latent classes, the likelihood contribution for a person with spell length $j$ is

$$\mathcal{L} = \sum_{m=1}^{M} \pi_m\,\mathcal{L}(\mu_m). \qquad (8.21)$$

The $\mu_m$ are the $M$ 'mass point' parameters describing the support of the discrete multinomial distribution, and the $\pi_m$ are their corresponding probabilities, with $\sum_m \pi_m = 1$. The number of mass points to choose is not obvious a priori.
In practice, researchers have often used a small number (e.g. $M = 2$ or $M = 3$) and conducted inference conditional on this choice. For a discussion of the 'optimal' number of mass points, using so-called Gâteaux derivatives, see e.g. Lancaster (1990, chapter 9). This model can be estimated in Stata using my program hshaz (get it by typing ssc install hshaz in an up-to-date version of Stata) or using Sophia Rabe-Hesketh's program gllamm (ssc install gllamm).
8.3 What if unobserved heterogeneity is 'important' but ignored?

In this section, we provide more details behind the results that were stated at the beginning of the chapter. See Lancaster (1979, 1990) for a more extensive treatment (on which I have relied heavily).
We suppose that the ‘true’ model, with no omitted regressors, takes the PH
form and there is a continuous mixing distribution:
0 (t. A [ ·) = ·0
0
(t) `. ` = c
o
0
·
(8.22)
which implies
0 log [0 (t. A [ ·)]
0A

=

. (8.23)
I.e. the proportionate response of the hazard to some included variable A

is
given by the coe¢cient

. This parameter is constant and does not depend on
t (or A). The model also implies
0 log [0 (t. A) ·]
0t
=
0 log 0
0
(t. A)
0t
. (8.24)
We suppose that the ‘observed’ model, i.e. with omitted regressors, is
0 (t. A [ ·) = ·0
0
(t) `
1
(8.25)
where `
1
= c
o
0
1
·1
, A
1
is a subset of A.
Assuming $v \sim \text{Gamma}(1, \sigma^2)$, then

$$S_1\left(t \mid \sigma^2\right) = \left[1 - \sigma^2 \ln S_1(t)\right]^{-1/\sigma^2} = \left[1 + \sigma^2 H_1(t)\right]^{-1/\sigma^2} \qquad (8.26)$$

and

$$\theta_1\left(t \mid \sigma^2\right) = \left[S_1\left(t \mid \sigma^2\right)\right]^{\sigma^2} \theta_0(t)\,\lambda_1 \qquad (8.27)$$

where $S_1(t)$ and $H_1(t) = -\ln S_1(t)$ are the survivor function and integrated hazard for the no-frailty part of the observed model.
8.3.1 The duration dependence effect

Look at the ratio of the hazards from the two models:

$$\frac{\theta_1}{\theta} = \frac{\left[S_1\left(t \mid \sigma^2\right)\right]^{\sigma^2} \theta_0(t)\,\lambda_1}{v\,\theta_0(t)\,\lambda} \qquad (8.28)$$

$$\propto \left[S_1\left(t \mid \sigma^2\right)\right]^{\sigma^2} \qquad (8.29)$$

which is monotonically decreasing with $t$, for all $\sigma^2 > 0$. In sum, the hazard rate from a model with omitted regressors increases less fast, or falls faster, than does the 'true' hazard (from the model with no omitted regressors).
The intuition is of a selection or 'weeding out' effect. Controlling for observable differences, people with unobserved characteristics associated with higher exit rates leave the state more quickly than others. Hence 'survivors' at longer $t$ increasingly comprise those with low $v$ which, in turn, implies a lower hazard, and the estimate of the hazard is an under-estimate of the 'true' one.
Bergström and Edin (1992) illustrate the argument nicely in the context of a Weibull model for the non-frailty component. Recall that the Weibull model in the AFT representation is written $\log T = \beta'X + \sigma u$, with the variance of the 'residuals' given by $\sigma^2 \operatorname{var}(u)$. If one added in (unobserved) heterogeneity to the systematic part of the model, then one would expect the variance of the residuals to fall correspondingly. This is equivalent to a decline in $\sigma$. But recall that $\sigma = 1/\alpha$ where $\alpha$ is the Weibull shape parameter we discussed earlier. A decline in $\sigma$ is equivalent to a rise in $\alpha$. Thus the 'true' model exhibits more positive duration dependence than the model without this extra heterogeneity incorporated.
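The 'weeding out' effect shows up clearly in a small simulation: give every individual the same constant hazard scaled by a Gamma frailty, and the empirical hazard among survivors declines with time even though no individual's hazard does. All parameter values here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
sigma2 = 1.0
v = rng.gamma(shape=1 / sigma2, scale=sigma2, size=n)   # E(v)=1, var(v)=sigma2
theta0 = 0.5                                            # constant individual hazard
t = rng.exponential(1 / (v * theta0))                   # duration given frailty

# Empirical hazard of survivors in successive intervals [a, a + 0.5)
for a in [0.0, 1.0, 2.0, 3.0]:
    at_risk = t >= a
    exits = at_risk & (t < a + 0.5)
    print(a, exits.sum() / at_risk.sum() / 0.5)         # declines with a
```

Analytically the population-average hazard here is $\theta_0/(1 + \sigma^2 \theta_0 t)$, the pattern described by (8.28)–(8.29).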
8.3.2 The proportionate response of the hazard to variations in a characteristic

We consider now the proportionate response of the hazard to a variation in $X_k$, where $X_k$ is an included regressor (part of $X_1$). We know that the proportionate response in the 'true' model is

$$\frac{\partial \log \theta}{\partial X_k} = \beta_k. \qquad (8.30)$$
In the observed model, we have

$$\frac{\partial \log \theta_1}{\partial X_k} = \frac{\partial\left[\sigma^2 \ln S_1\left(t \mid \sigma^2\right)\right]}{\partial X_k} + \frac{\partial\left(\beta_1'X_1\right)}{\partial X_k} \qquad (8.31)$$

$$= \beta_k + \frac{\sigma^2}{S_1\left(t \mid \sigma^2\right)}\,\frac{\partial S_1\left(t \mid \sigma^2\right)}{\partial X_k}. \qquad (8.32)$$

But

$$\frac{\partial S_1\left(t \mid \sigma^2\right)}{\partial X_k} = -\frac{1}{\sigma^2}\left[1 + \sigma^2 H_1(t)\right]^{-\frac{1}{\sigma^2} - 1} \sigma^2\,\frac{\partial\left[H_1(t)\right]}{\partial X_k} \qquad (8.33)$$

$$= \frac{-S_1\left(t \mid \sigma^2\right)}{1 + \sigma^2 H_1(t)}\,\frac{\partial\left[H_1(t)\right]}{\partial X_k} \qquad (8.34)$$

and

$$\frac{\partial\left[H_1(t)\right]}{\partial X_k} = \beta_k H_1(t) \qquad (8.35)$$

so

$$\frac{\sigma^2}{S_1}\,\frac{\partial\left[S_1\right]}{\partial X_k} = -\frac{\sigma^2 \beta_k H_1(t)}{1 + \sigma^2 H_1(t)}. \qquad (8.36)$$

Hence, substituting back into the earlier expression, we have that

$$\frac{\partial \log \theta_1}{\partial X_k} = \beta_k\left[1 - \frac{\sigma^2 H_1(t)}{1 + \sigma^2 H_1(t)}\right] \qquad (8.37)$$

$$= \frac{\beta_k}{1 + \sigma^2 H_1(t)} \qquad (8.38)$$

$$= \beta_k\left[S_1\left(t \mid \sigma^2\right)\right]^{\sigma^2}. \qquad (8.39)$$
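Result (8.38) is easy to verify numerically: differentiate $\log \theta_1$ with respect to $X_k$ by finite differences and compare with $\beta_k/(1 + \sigma^2 H_1(t))$. A sketch under assumed Weibull parameter values (these numbers are my own illustration):

```python
import numpy as np

sigma2, alpha, beta_k, t = 0.5, 1.2, 0.7, 2.0   # assumed values

def H1(x):
    """Integrated hazard of the no-frailty Weibull part: lambda_1 * t^alpha."""
    return np.exp(beta_k * x) * t**alpha

def log_theta1(x):
    """Log observed hazard, from eq (8.27): theta1 = S1^sigma2 * theta0 * lambda1,
    with Weibull baseline theta0(t) = alpha * t^(alpha - 1)."""
    S1 = (1 + sigma2 * H1(x)) ** (-1 / sigma2)
    return sigma2 * np.log(S1) + np.log(alpha * t**(alpha - 1)) + beta_k * x

x, eps = 0.4, 1e-6
numeric = (log_theta1(x + eps) - log_theta1(x - eps)) / (2 * eps)
analytic = beta_k / (1 + sigma2 * H1(x))        # eq (8.38)
print(numeric, analytic)                        # agree closely
```
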
Note that $1 + \sigma^2 H_1(t) \geq 1$ or, alternatively, that $0 \leq S_1 \leq 1$ and tends to zero as $t \to \infty$. The implications are twofold:

1. Suppose one wants to estimate $\beta_k$ (with its nice interpretation as being the proportionate response of the hazard in the 'true' model); then the omitted regressor model provides an under-estimate (in the modulus) of the proportionate response.

2. Variation in $X_k$ causes equiproportionate variation in the hazard rate at all $t$ in the 'true' model. But with omitted regressors, the proportionate effect tends to zero as $t \to \infty$.
The intuition is as follows. Suppose there are two types of person, $A$ and $B$. On average the hazard rate is higher for $A$ than for $B$ at each time $t$. In general the elasticity of the hazard rate at any $t$ varies with the ratio of the average group hazards, $\theta_A$ and $\theta_B$. Group $A$ members with high $v$ (higher hazard) leave first, implying that the average $\theta_A$ among the survivors falls and becomes more similar to the average $\theta_B$. Thus the ratio of $\theta_A$ to $\theta_B$ declines as $t$ increases and the proportionate response will fall. It is a 'weeding out' effect again.
8.4 Empirical practice

These results suggest that taking account of unobserved heterogeneity is a potentially important part of empirical work. It has often not been so in the past, partly because of a lack of available software, but that constraint is now less relevant. One can estimate frailty models and test whether unobserved heterogeneity is relevant using likelihood ratio tests based on the restricted and unrestricted models. (Alternatively there are 'score' tests based on the restricted model.)

Observe that virtually all commonly-available estimation software programs assume that one has data from an inflow or population sample. To properly account for left (or right) truncation requires special programs. For example, the 'convenient cancelling' results used to derive an easy estimation method for discrete time models no longer apply. The likelihood contribution for a person with a left truncated spell still requires conditioning on survival up to the truncation date, but the expression for the survivor function now involves factors related to the unobserved heterogeneity.

The early empirical social science literature found that conclusions about whether or not frailty was 'important' (effects on estimates of duration dependence and estimates of $\beta$) appeared to be sensitive to the choice of shape of the distribution for $v$. Some argued that the choice of distributional shape was essentially 'arbitrary', and this stimulated the development of non-parametric methods, including the discrete mixture methods briefly referred to.

Subsequent empirical work suggests, however, that the effects of unobserved heterogeneity are mitigated, and thence estimates more robust, if the analyst uses a flexible baseline hazard specification. (The earlier literature had typically used specifications, often the Weibull one, that were not flexible enough.) See e.g. Dolton and van der Klaauw (1995).

All in all, the topic underscores the importance of getting good data, including a wide range of explanatory variables that summarize well the differences between individuals.
Chapter 9

Competing risks models

Until now we have modelled transitions out of the current state (exit to any state from the current one). Now we consider the possibility of exit to one of several destination states. For illustration, we will suppose that there are two destination states $A$ and $B$, but the arguments generalise to any number of destinations. The continuous time case will be considered first, and then the discrete time one (for both the interval-censored and 'pure' discrete cases). We shall see that the assumption of independence in competing risks makes several of the models straightforward to estimate.
9.1 Continuous time data

Define:

$\theta_A(t)$: the latent hazard rate of exit to destination $A$, with survival times characterised by density function $f_A(t)$, and latent failure time $T_A$;

$\theta_B(t)$: the latent hazard rate of exit to destination $B$, with survival times characterised by density function $f_B(t)$, and latent failure time $T_B$;

$\theta(t)$: the hazard rate for exit to any destination.

Each destination-specific hazard rate can be thought of as the hazard rate that would apply were transitions to all the other destinations not possible. If this were so, we would be able to link the observed hazards with that destination-specific hazard. However, as there are competing risks, the hazard rates are 'latent' rather than observed in this way. What we observe in the data is either (i) no event at all (a censored case, with a spell length $T_C$), or (ii) an exit which is to $A$ or to $B$ (and we know which is which). The observed failure time is $T = \min\{T_A, T_B, T_C\}$. What we seek are methods to estimate the destination-specific hazard rates given data in this form.
Assume $\theta_A$ and $\theta_B$ are independent. This implies that

$$\theta(t) = \theta_A(t) + \theta_B(t). \qquad (9.1)$$

I.e. the hazard rate for exit to any destination is the sum of the destination-specific hazard rates. (Cf. probabilities: $\Pr(A \text{ or } B) = \Pr(A) + \Pr(B)$ if $A$, $B$ independent. So $\theta(t)\,dt = \theta_A(t)\,dt + \theta_B(t)\,dt$.)

Independence also means that the survivor function for exit to any destination can be factored into a product of destination-specific survivor functions:

$$S(t) = \exp\left[-\int_0^t \theta(u)\,du\right] \qquad (9.2)$$

$$= \exp\left[-\int_0^t \left[\theta_A(u) + \theta_B(u)\right] du\right] \qquad (9.3)$$

$$= \exp\left[-\int_0^t \theta_A(u)\,du\right] \exp\left[-\int_0^t \theta_B(u)\,du\right] \qquad (9.4)$$

$$= S_A(t)\,S_B(t). \qquad (9.5)$$

The derivation uses the property $e^{a+b} = e^a e^b$.
The individual sample likelihood contribution in the independent competing risk model with two destinations is of three types:

$\mathcal{L}_A$: exit to $A$, where $\mathcal{L}_A = f_A(T)\,S_B(T)$;

$\mathcal{L}_B$: exit to $B$, where $\mathcal{L}_B = f_B(T)\,S_A(T)$;

$\mathcal{L}_C$: censored spell, where $\mathcal{L}_C = S(T) = S_A(T)\,S_B(T)$.

In the $\mathcal{L}_A$ case, the likelihood contribution summarises the chances of a transition to $A$ combined with no transition to $B$, and vice versa in the $\mathcal{L}_B$ case. Now define destination-specific censoring indicators:

$$\delta_A = \begin{cases} 1 & \text{if } i \text{ exits to } A \\ 0 & \text{otherwise (exit to } B \text{ or censored)} \end{cases} \qquad (9.6)$$

$$\delta_B = \begin{cases} 1 & \text{if } i \text{ exits to } B \\ 0 & \text{otherwise (exit to } A \text{ or censored)} \end{cases} \qquad (9.7)$$
The overall contribution from the individual to the likelihood, $\mathcal{L}$, is

$$\mathcal{L} = \left(\mathcal{L}_A\right)^{\delta_A}\left(\mathcal{L}_B\right)^{\delta_B}\left(\mathcal{L}_C\right)^{1 - \delta_A - \delta_B} \qquad (9.8)$$

$$= \left[f_A(T)\,S_B(T)\right]^{\delta_A}\left[f_B(T)\,S_A(T)\right]^{\delta_B}\left[S_A(T)\,S_B(T)\right]^{1 - \delta_A - \delta_B} \qquad (9.9)$$

$$= \left\{\left[\frac{f_A(T)}{S_A(T)}\right]^{\delta_A} S_A(T)\right\}\left\{\left[\frac{f_B(T)}{S_B(T)}\right]^{\delta_B} S_B(T)\right\} \qquad (9.10)$$

$$= \left\{\left[\theta_A(T)\right]^{\delta_A} S_A(T)\right\}\left\{\left[\theta_B(T)\right]^{\delta_B} S_B(T)\right\} \qquad (9.11)$$

or

$$\ln \mathcal{L} = \left[\delta_A \ln \theta_A(T) + \ln S_A(T)\right] + \left[\delta_B \ln \theta_B(T) + \ln S_B(T)\right]. \qquad (9.12)$$

The log-likelihood for the sample as a whole is the sum of this expression over all individuals in the sample.
In other words, the (log-)likelihood for the continuous time competing risk model with two destination states factors into two parts, each of which depends only on parameters specific to that destination. Hence one can maximise the overall (log-)likelihood by maximising the two component parts separately. These results generalize straightforwardly to the situation with more than two independent competing risks. This means that the model can be estimated very easily. Simply define new destination-specific censoring variables (as above) and then estimate separate models for each destination state. The overall model log-likelihood value is the sum of the log-likelihood values for each of the destination-specific models.
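This 'estimate each risk separately, treating exits to the other risk as censoring' recipe is easy to demonstrate by simulation. The sketch below uses constant (exponential) latent hazards, for which maximising (9.12) risk-by-risk gives the closed-form occurrence/exposure estimator; all numerical values are assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000
theta_a_true, theta_b_true = 0.5, 0.25        # assumed constant latent hazards
c = 2.0                                       # fixed censoring time

t_a = rng.exponential(1 / theta_a_true, n)    # latent duration to exit A
t_b = rng.exponential(1 / theta_b_true, n)    # latent duration to exit B
t = np.minimum(np.minimum(t_a, t_b), c)       # observed duration T
d_a = (t_a < t_b) & (t_a <= c)                # delta_A
d_b = (t_b < t_a) & (t_b <= c)                # delta_B

# Maximising ln L in eq (9.12) separately for each risk, with constant
# hazards, gives events divided by total exposure:
theta_a_hat = d_a.sum() / t.sum()
theta_b_hat = d_b.sum() / t.sum()
print(theta_a_hat, theta_b_hat)               # close to 0.5 and 0.25
```

Each destination-specific estimate recovers its true hazard even though the other risk censors many of its spells, which is exactly the separability result.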
These results are extremely convenient if one is only interested in estimating the competing risks model. Often, however, one is also interested in testing hypotheses involving restrictions across the destination-specific hazards. In particular one is typically interested in testing whether the same model applies for transitions to each destination or whether the models differ (and how). But introducing such restrictions means that, in principle, one has to jointly estimate the hazards: under the null hypotheses with restrictions, the likelihood is no longer separable.

Fortunately there are some simple methods for testing a range of hypotheses that can be implemented using estimation of single-risk models, as long as one assumes that hazard rates have a proportional hazard form. See Narendranathan and Stewart (1991). The first hypothesis they consider is equality across the destination-specific hazards of all parameters except the intercepts, which is equivalent to supposing that the ratios of destination-specific hazards are independent of survival time and the same for all individuals. The second, and weaker, hypothesis is that the ratios of hazards need not be constant over $t$ but are equal across individuals. This is equivalent to supposing that conditional probabilities of exit to a particular state at a particular elapsed survival time are the same for all individuals. Implementation of the first test involves use of likelihood values from estimation of the unrestricted destination-specific models and the unrestricted single risk model (not distinguishing between exit types), and the number of exits to each destination. Implementation of the second test requires similar information, including the number of exits to each destination at each observed survival time.
9.2 Intrinsically discrete time data

Recall that in the continuous time case, exits to only one destination are feasible at any given instant. Hence, assuming independent risks, the overall (continuous time) hazard equals the sum of the destination-specific hazards. With discrete time, things get more complicated, and the neat separability result that applies in the continuous time case no longer holds. As before let us distinguish between the cases in which survival times are intrinsically discrete and in which they arise from interval censoring (survival times are intrinsically continuous, but are observed grouped into intervals), beginning with the former.

In this case, there is a parallel with the continuous time case: the discrete hazard rate for exit at time $j$ to any destination is the sum of the destination-specific discrete hazard rates. That is,

$$h(j) = h_A(j) + h_B(j). \qquad (9.13)$$
Because survival times are intrinsically discrete, if there is an exit to one of the destinations at a given survival time, then there cannot be an exit to the other destination at the same survival time. However this property does not lead to a neat separability result for the likelihood analogous to that for the continuous time case. To see why, consider the likelihood contributions for the discrete time model. There are three types: that for an individual exiting to $A$ ($\mathcal{L}_A$), that for an individual exiting to $B$ ($\mathcal{L}_B$), and that for a censored case ($\mathcal{L}_C$). Supposing that the observed survival time for an individual is $j$ cycles, then:

$$\mathcal{L}_A = h_A(j)\,S(j-1) \qquad (9.14)$$

$$= \left[\frac{h_A(j)}{1 - h(j)}\right] S(j) \qquad (9.15)$$

$$= \left[\frac{h_A(j)}{1 - h_A(j) - h_B(j)}\right] S(j). \qquad (9.16)$$

Similarly,

$$\mathcal{L}_B = h_B(j)\,S(j-1) \qquad (9.17)$$

$$= \left[\frac{h_B(j)}{1 - h(j)}\right] S(j) \qquad (9.18)$$

$$= \left[\frac{h_B(j)}{1 - h_A(j) - h_B(j)}\right] S(j) \qquad (9.19)$$

and

$$\mathcal{L}_C = S(j). \qquad (9.20)$$

There is a common term in each expression summarising the overall probability of survival for $j$ cycles, i.e. $S(j)$. In the continuous time case, the analogous expression could be rewritten as the product of the destination-specific survivor functions. But this result does not carry over to the present case:

$$S(j) = \prod_{k=1}^{j}\left[1 - h(k)\right] = \prod_{k=1}^{j}\left[1 - h_A(k) - h_B(k)\right]. \qquad (9.21)$$
The overall likelihood contribution for an individual with an observed spell length of $j$ cycles is:

$$\mathcal{L} = \left(\mathcal{L}_A\right)^{\delta_A}\left(\mathcal{L}_B\right)^{\delta_B}\left(\mathcal{L}_C\right)^{1 - \delta_A - \delta_B}$$

$$= \left[\frac{h_A(j)}{1 - h_A(j) - h_B(j)}\right]^{\delta_A}\left[\frac{h_B(j)}{1 - h_A(j) - h_B(j)}\right]^{\delta_B} \prod_{k=1}^{j}\left[1 - h_A(k) - h_B(k)\right] \qquad (9.22)$$

Another way of writing the likelihood, which we refer back to later on, is

$$\mathcal{L} = S(j)\left[\frac{h(j)}{1 - h(j)}\right]^{\delta_A + \delta_B}\left[\frac{h_A(j)}{h(j)}\right]^{\delta_A}\left[\frac{h_B(j)}{h(j)}\right]^{\delta_B}. \qquad (9.23)$$
Although there is no neat separability result in this case, it turns out that there is still a straightforward means of estimating an independent competing risk model, as Allison (1982) has demonstrated. The 'trick' is to assume a particular form for the destination-specific hazards:

$$h_A(j) = \frac{\exp(\beta_A'X)}{1 + \exp(\beta_A'X) + \exp(\beta_B'X)} \qquad (9.24)$$

$$h_B(j) = \frac{\exp(\beta_B'X)}{1 + \exp(\beta_A'X) + \exp(\beta_B'X)} \qquad (9.25)$$

and hence

$$1 - h_A(j) - h_B(j) = \frac{1}{1 + \exp(\beta_A'X) + \exp(\beta_B'X)}. \qquad (9.26)$$
With destination-specific censoring indicators $\delta_A$ and $\delta_B$ defined as before, the likelihood contribution for the individual with spell length $j$ can be written:

$$\mathcal{L} = \left[\frac{\exp(\beta_A'X)}{1 + \exp(\beta_A'X) + \exp(\beta_B'X)}\right]^{\delta_A}\left[\frac{\exp(\beta_B'X)}{1 + \exp(\beta_A'X) + \exp(\beta_B'X)}\right]^{\delta_B}$$

$$\times \left[\frac{1}{1 + \exp(\beta_A'X) + \exp(\beta_B'X)}\right]^{1 - \delta_A - \delta_B} \prod_{k=1}^{j-1}\left[\frac{1}{1 + \exp(\beta_A'X) + \exp(\beta_B'X)}\right]. \qquad (9.27)$$

However, as Allison (1982) pointed out, this likelihood has the same form as the likelihood for a standard multinomial logit model applied to reorganised data.
To estimate the model, there are four steps:
1. Expand the data into person-period form (as discussed in Chapter 6 for single-destination discrete time models).

2. Construct a dependent variable for each person-period observation. This takes the value 0 for all censored observations in the reorganised data set. (For persons with censored spells, all observations are censored; for persons with a completed spell, all observations are censored except the final one.) For persons with an exit to destination $A$ in the final period observed, set the dependent variable equal to 1, and for those with an exit to destination $B$ in the final period observed, set the dependent variable equal to 2. (If there are more destinations, create additional categories of the dependent variable.)

3. Construct any other variables required, in particular variables summarising duration-dependence in the destination-specific hazards. Other time-varying covariates may also be constructed.

4. Estimate the model using a multinomial logit program, setting the reference (base) category equal to 0.
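Steps 1 and 2 can be sketched in a few lines. The toy spell records below are my own illustration (id, spell length, event code, covariate), with the event coded 0 = censored, 1 = exit to $A$, 2 = exit to $B$:

```python
# Toy spell data: one tuple per person (id, spell length j, event, x).
spells = [
    (1, 3, 1, 0.5),
    (2, 2, 0, -1.2),
    (3, 4, 2, 0.3),
]

# Steps 1-2: expand to person-period rows; the outcome is 0 in every
# interval except the final one of a completed spell.
person_periods = []
for pid, j, event, x in spells:
    for k in range(1, j + 1):
        y = event if k == j else 0
        person_periods.append({"id": pid, "k": k, "y": y, "x": x})

print(len(person_periods))                               # 9 rows: 3 + 2 + 4
print([r["y"] for r in person_periods if r["id"] == 1])  # [0, 0, 1]
```

The resulting rows can then be passed to any multinomial logit routine (mlogit in Stata, or a multinomial logistic regression in other software) with base category 0, after adding duration-dependence variables built from $k$ (step 3).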
Observe that the particular values for the dependent variable that are chosen do not matter. What is important is that they are distinct (and also that one's software knows which value corresponds to the base category). In effect, censoring is being treated as another type of destination, call it $C$. However in the multinomial logit model, there is an identification issue. One cannot estimate a set of coefficients (call them $\beta_C$) for the third process in addition to the other two sets of parameters ($\beta_A$, $\beta_B$). There is more than one set of estimates that would lead to the same probabilities of the outcomes observed. As a consequence, one of the sets of parameters is set equal to zero. In principle, it does not matter which, but in the current context it is intuitively appealing to set $\beta_C = 0$. The other coefficients ($\beta_A$, $\beta_B$) then measure changes in probabilities relative to the censored (no event) outcome.

The ratio of the probability of exit to destination $A$ to the probability of no exit at all is $\exp(\beta_A'X)$, with an analogous interpretation for $\exp(\beta_B'X)$. This is what the Stata Reference Manuals (in the mlogit entry) refer to as the 'relative risk'. And as the manual explains, 'the exponentiated value of a coefficient is the relative risk ratio for a one unit change in the corresponding variable, it being understood that risk is measured as the risk of the category relative to the base category' (no event in this case).
A nice feature of this 'multinomial logit' hazard model is that it is straightforward to test hypotheses about whether the destination-specific discrete hazards have common determinants. Using software packages such as Stata it is straightforward to apply Wald tests of whether, for example, corresponding coefficients are equal or not. One can also estimate models in which some components of the destination-specific hazards, for example the duration-dependence specifications, are constrained to take the same form. One potential disadvantage of estimating the model using the multinomial logit programs in standard software is that these typically require that the same set of covariates appears in each equation.
9.3 Interval-censored data

We now consider the case in which survival times are generated by some continuous time process, but observed survival times are grouped into intervals of unit length (for example the 'month' or 'week'). One way to proceed would simply be to apply the 'multinomial logit' hazard model to these data, as described above, i.e. making assumptions about the discrete time (interval) hazard and eschewing any attempt to relate the model to an underlying process in continuous time. The alternative is to do as we did in Chapter 3, and to explicitly relate the model to the continuous time hazard(s). As we shall now see, the models are complicated relative to those discussed earlier in this chapter, for two related reasons. First, the likelihood is not separable as it was for the continuous time case. Second, and more fundamentally, the shape of the (continuous time) hazard rate within each interval cannot be identified from the grouped data that are available. To construct the sample likelihood, assumptions have to be made about this shape, and alternative assumptions lead to different econometric models.

In the data generation processes discussed so far in this chapter, the overall hazard was equal to the sum of the destination-specific hazards. This is not true in the interval-censored case. With grouped survival times, more than one latent event is possible in each interval (though, of course, only one is actually observed). Put another way, when constructing the likelihood and considering the probability of observing an exit to a specific destination in a given interval, we have to take account of the fact that, not only was there an exit to that destination, but also that the exit occurred before an exit to the other potential destinations.
Before considering the expressions for the likelihood contributions, let us explore the relationship between the overall discrete (interval) hazard and the destination-specific interval hazards, and the relationship between these discrete interval hazards and the underlying continuous time hazards. Using the same notation as in Chapter 2, we note that the $j$th interval (of unit length) runs between dates $a_{j-1}$ and $a_j$. The overall discrete hazard for the $j$th interval is given by

$$h(j) = 1 - \frac{S(a_j)}{S(a_{j-1})} = 1 - \frac{\exp\left[-\int_0^{a_j}\left[\theta_A(t) + \theta_B(t)\right]dt\right]}{\exp\left[-\int_0^{a_{j-1}}\left[\theta_A(t) + \theta_B(t)\right]dt\right]}$$

$$= 1 - \exp\left[-\int_{a_{j-1}}^{a_j}\left[\theta_A(t) + \theta_B(t)\right]dt\right] \qquad (9.28)$$
where we have used the property $\theta(t) = \theta_A(t) + \theta_B(t)$. The destination-specific discrete hazards for the same interval are

$$h_A(j) = 1 - \exp\left[-\int_{a_{j-1}}^{a_j} \theta_A(t)\,dt\right] \qquad (9.29)$$

and

$$h_B(j) = 1 - \exp\left[-\int_{a_{j-1}}^{a_j} \theta_B(t)\,dt\right]. \qquad (9.30)$$

It follows that

$$h(j) = 1 - \left\{\left[1 - h_A(j)\right]\left[1 - h_B(j)\right]\right\} \qquad (9.31)$$

or

$$1 - h(j) = \left[1 - h_A(j)\right]\left[1 - h_B(j)\right]. \qquad (9.32)$$

Thus the overall discrete interval hazard equals one minus the probability of not exiting during the interval to any of the possible destinations, and this latter probability is the product of one minus the destination-specific discrete hazard rates. Rearranging the expressions, we also have

$$h(j) = h_A(j) + h_B(j) - h_A(j)\,h_B(j) \qquad (9.33)$$

$$\approx h_A(j) + h_B(j) \quad \text{if } h_A(j)\,h_B(j) \approx 0. \qquad (9.34)$$

This tells us that the overall interval hazard is only approximately equal to the sum of the destination-specific interval hazards, with the accuracy of the approximation improving the smaller the destination-specific hazards are.
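Relationships (9.31)–(9.34) can be checked with assumed numbers:

```python
h_a, h_b = 0.10, 0.20               # assumed destination-specific interval hazards

h = 1 - (1 - h_a) * (1 - h_b)       # eq (9.31): 1 - 0.9 * 0.8 = 0.28
print(h)
print(h_a + h_b - h_a * h_b)        # eq (9.33): algebraically identical
print(h_a + h_b)                    # eq (9.34): the sum 0.30 only approximates h
```
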
Now consider the relationship between the survivor function for exit to any
destination and the survivor functions for exits to each destination:
\begin{align}
S(j) &= (1 - h_1)(1 - h_2)\cdots(1 - h_j) \tag{9.35} \\
     &= (1 - h_{A1})(1 - h_{B1})(1 - h_{A2})(1 - h_{B2})\cdots(1 - h_{Aj})(1 - h_{Bj}) \nonumber \\
     &= (1 - h_{A1})(1 - h_{A2})\cdots(1 - h_{Aj})\,(1 - h_{B1})(1 - h_{B2})\cdots(1 - h_{Bj}). \tag{9.36}
\end{align}
In other words,
\[
S(j) = S_A(j)\,S_B(j) \tag{9.37}
\]
so that there is a factoring of the overall grouped data survivor function, analogous
to the continuous time case. (The same factoring result can also be derived
by expressing the survivor functions in terms of the continuous time hazards,
rather than the discrete time ones.)
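The factoring result (9.37) can also be checked numerically. A minimal Python sketch, with illustrative interval hazards:

```python
# Sketch check that the overall interval-censored survivor function
# factors into destination-specific parts, S(j) = S_A(j) * S_B(j),
# given 1 - h(k) = [1 - h_A(k)][1 - h_B(k)] in every interval (9.32).
# The hazard sequences below are illustrative values, not from the text.
h_A = [0.10, 0.12, 0.08]   # destination-A interval hazards, k = 1..3
h_B = [0.05, 0.04, 0.06]   # destination-B interval hazards

S_A = 1.0
S_B = 1.0
S = 1.0
for hA_k, hB_k in zip(h_A, h_B):
    S_A *= 1 - hA_k
    S_B *= 1 - hB_k
    S *= (1 - hA_k) * (1 - hB_k)   # overall: 1 - h(k), from (9.32)

assert abs(S - S_A * S_B) < 1e-12  # the factoring result (9.37)
```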
As before, there are three types of contribution to the likelihood: that for an
individual exiting to $A$ ($L_A$), that for an individual exiting to $B$ ($L_B$), and that
for a censored case ($L_c$). The latter is straightforward (we have just derived
it). For a person with a censored spell length of $j$ intervals,
\[
L_c = S(j) = S_A(j)\,S_B(j) = \prod_{k=1}^{j} [1 - h_A(k)][1 - h_B(k)]. \tag{9.38}
\]
What is the likelihood contribution if, instead of being censored, the individual's
spell completed with an exit to destination $A$ during interval $j$? We need to
write down the joint probability that the exact spell length lay between the
lengths implied by the boundaries of the interval and that the latent exit time to
destination $B$ was after the latent exit time to destination $A$. According to the
convention established in Chapter 2, the $j$th interval is defined as $(a_{j-1}, a_j]$.
I.e. the interval, of unit length, begins just after date $a_{j-1}$, and it finishes at
(and includes) date $a_j$, the start of the next interval. The expression for the
joint probability that we want is
\begin{align}
L_A &= \Pr(a_{j-1} < T_A \le a_j,\; T_B > T_A) \tag{9.39} \\
&= \int_{a_{j-1}}^{a_j} \int_u^{\infty} f(u, v)\,dv\,du \tag{9.40} \\
&= \int_{a_{j-1}}^{a_j} \int_u^{\infty} f_A(u)\,f_B(v)\,dv\,du \tag{9.41} \\
&= \int_{a_{j-1}}^{a_j} \left[ \int_u^{a_j} f_A(u)\,f_B(v)\,dv + \int_{a_j}^{\infty} f_A(u)\,f_B(v)\,dv \right] du \tag{9.42}
\end{align}
where $f(u, v)$ is the joint probability density function for latent spell lengths
$T_A$ and $T_B$, and the lower integration point in the second integral, $u$, is the
(unobserved) time within the interval at which the exit to $A$ occurred. Because
we assumed independence of competing risks, $f(u, v) = f_A(u)\,f_B(v)$. We
cannot proceed further without making some assumptions about the shape of
the within-interval density functions or, equivalently, the within-interval hazard
rates.
Five main assumptions have been used in the literature to date. The first is
that transitions can only occur at the boundaries of the intervals. The second is
that the destination-specific density functions are constant within each interval
(though may vary between intervals), and the third is that destination-specific
hazard rates are constant within each interval (though may vary between intervals).
The fourth is that the hazard rate takes a particular proportional
hazards form, and the fifth is that the log of the integrated hazard changes
linearly over the interval.
9.3.1 Transitions can only occur at the boundaries of the
intervals.
This was the assumption made by Narendranathan and Stewart (1993). They
considered labour force transitions by British unemployed men using spell data
grouped into weekly intervals. At the time, there was a requirement for unemployed
individuals to register ('sign on'), weekly, at the unemployment office
in order to receive unemployment benefits, and so there is some plausibility in
the idea that transitions would only occur at weekly intervals. The assumption
has powerful and helpful implications. If transitions can only occur at interval
boundaries then, if a transition to $A$ occurred in interval $j = (a_{j-1}, a_j]$, it
occurred at date $a_j$, and it must be the case that $T_B > a_j$ (i.e. beyond the $j$th
interval). This, in turn, means that $f_B(v) = 0$ between dates $u$ and $a_j$. This
allows substantial simplification of (9.42):
\begin{align}
L_A &= \int_{a_{j-1}}^{a_j} \int_{a_j}^{\infty} f_A(u)\,f_B(v)\,dv\,du \tag{9.43} \\
&= \int_{a_{j-1}}^{a_j} f_A(u)\,du \int_{a_j}^{\infty} f_B(v)\,dv \tag{9.44} \\
&= [F_A(a_j) - F_A(a_{j-1})]\,[1 - F_B(a_j)] \tag{9.45} \\
&= h_A(j)\,S_A(j-1)\,S_B(j) \tag{9.46} \\
&= \left[\frac{h_A(j)}{1 - h_A(j)}\right] S_A(j)\,S_B(j). \tag{9.47}
\end{align}
By similar arguments, we may write
\[
L_B = \left[\frac{h_B(j)}{1 - h_B(j)}\right] S_A(j)\,S_B(j). \tag{9.48}
\]
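The chain of equalities (9.45)–(9.47) is pure algebra given the definitions of the interval hazard and survivor functions, and can be checked for any parametric example. The Python sketch below assumes constant latent hazards (an illustrative exponential model; the values of $\theta_A$ and $\theta_B$ are made up):

```python
import math

# A small numerical sketch, under an assumed exponential model
# (theta_A, theta_B illustrative), of the chain (9.45)-(9.47):
# [F_A(a_j) - F_A(a_{j-1})] [1 - F_B(a_j)]
#   = h_A(j) S_A(j-1) S_B(j)
#   = [h_A(j) / (1 - h_A(j))] S_A(j) S_B(j).
theta_A, theta_B = 0.2, 0.1    # constant latent hazards (assumed)
j = 4                           # unit-length intervals, so a_j = j

F_A = lambda t: 1 - math.exp(-theta_A * t)   # latent cdf for T_A
F_B = lambda t: 1 - math.exp(-theta_B * t)
S_A = lambda k: math.exp(-theta_A * k)       # S_A(k) at boundary a_k = k
S_B = lambda k: math.exp(-theta_B * k)
h_A = 1 - math.exp(-theta_A)                 # interval hazard, constant here

direct = (F_A(j) - F_A(j - 1)) * (1 - F_B(j))          # (9.45)
via_946 = h_A * S_A(j - 1) * S_B(j)                    # (9.46)
via_947 = h_A / (1 - h_A) * S_A(j) * S_B(j)            # (9.47)

assert abs(direct - via_946) < 1e-12
assert abs(direct - via_947) < 1e-12
```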
In both expressions, the Chapter 3 definitions of the discrete (interval) hazard
function ($h$) and survivor function ($S$) in the interval-censored case have been
used. Of course, empirical evaluation of the expressions also requires selection
of a functional form for the destination-specific continuous hazards. A natural
choice for these is a proportional hazards specification, in which case $h_A(j)$ and
$h_B(j)$ would each take the complementary log-log form discussed in Chapter 3:
\[
h_A(j) = 1 - \exp[-\exp(\beta_A' X + \gamma_{Aj})] \tag{9.49}
\]
and
\[
h_B(j) = 1 - \exp[-\exp(\beta_B' X + \gamma_{Bj})] \tag{9.50}
\]
where $\gamma_{Aj}$ is the log of the integrated baseline hazard for destination $A$ over the
$j$th interval, and $\gamma_{Bj}$ is interpreted similarly.
If we define destination-specific censoring indicators $\delta_A$ and $\delta_B$, as above,
then the overall likelihood contribution for the person with a spell length of $j$
intervals is given by
\begin{align}
L &= (L_A)^{\delta_A} (L_B)^{\delta_B} (L_c)^{1-\delta_A-\delta_B} \nonumber \\
&= \left[\frac{h_A(j)}{1 - h_A(j)}\right]^{\delta_A} S_A(j) \left[\frac{h_B(j)}{1 - h_B(j)}\right]^{\delta_B} S_B(j). \tag{9.51}
\end{align}
Thus in the case where transitions can only occur at the interval boundaries,
the likelihood contribution partitions into a product of terms, each of which is a
function of a single destination-specific hazard only. In other words, the result
is analogous to that for continuous time models (and also generalises to a greater
number of destination states). And, as in the continuous time case, one can
estimate the overall independent competing risk model by estimating separate
destination-specific models, having defined suitable destination-specific censoring
variables.[1]
9.3.2 Destination-specific densities are constant within intervals
This was the assumption made by, for example, Dolton and van der Klaauw
(1999). We begin again with (9.42), but now apply a different set of assumptions.
The assumption of constant within-interval densities (as used for the 'actuarial
adjustment' in the context of lifetables) implies that
\[
f_A(u)\,f_B(v) = f_{Aj}\,f_{Bj} \quad \text{when } a_{j-1} < u \le a_j \text{ and } a_{j-1} < v \le a_j. \tag{9.52}
\]
Substituting this result into (9.42), we derive
\[
L_A = \int_{a_{j-1}}^{a_j} \left[ \int_u^{a_j} f_{Aj}\,f_{Bj}\,dv
+ \frac{1}{2}\int_{a_j}^{\infty} f_A(u)\,f_B(v)\,dv
+ \frac{1}{2}\int_{a_j}^{\infty} f_A(u)\,f_B(v)\,dv \right] du. \tag{9.53}
\]
Now, evaluating the first term of the three within the square brackets,
\[
\int_u^{a_j} f_{Aj}\,f_{Bj}\,dv = f_{Aj}\,f_{Bj}\,[a_j - u] \tag{9.54}
\]
and then substituting back into $L_A$, we get
\[
L_A = \int_{a_{j-1}}^{a_j} f_{Aj}\,f_{Bj}\,[a_j - u]\,du
+ \frac{1}{2}\int_{a_{j-1}}^{a_j}\int_{a_j}^{\infty} f_A(u)\,f_B(v)\,dv\,du
+ \frac{1}{2}\int_{a_{j-1}}^{a_j}\int_{a_j}^{\infty} f_A(u)\,f_B(v)\,dv\,du. \tag{9.55}
\]
The first term in (9.55) is
\begin{align}
\int_{a_{j-1}}^{a_j} f_{Aj}\,f_{Bj}\,[a_j - u]\,du
&= f_{Aj}\,f_{Bj}\left[a_j - \tfrac{1}{2}(a_j)^2 + \tfrac{1}{2}(a_{j-1})^2\right]
= \tfrac{1}{2} f_{Aj}\,f_{Bj} \tag{9.56} \\
&= \frac{1}{2}\int_{a_{j-1}}^{a_j}\int_{a_{j-1}}^{a_j} f_{Aj}\,f_{Bj}\,dv\,du. \tag{9.57}
\end{align}
[1] In an earlier version of these Lecture Notes, I claimed (incorrectly) that this result held
universally for interval-censored spell data. I am grateful for clarificatory discussions with
Paul Allison, Chandra Shah, Mark Stewart, Peter Dolton, and especially Wilbert van der
Klaauw. My derivations reported for case (2) are based on unpublished notes by Wilbert.
Now, substituting this back into (9.55), the expression for $L_A$, we can combine
the first two terms in square brackets, to derive
\begin{align}
L_A &= \frac{1}{2}\int_{a_{j-1}}^{a_j}\int_{a_{j-1}}^{\infty} f_A(u)\,f_B(v)\,dv\,du
+ \frac{1}{2}\int_{a_{j-1}}^{a_j}\int_{a_j}^{\infty} f_A(u)\,f_B(v)\,dv\,du \tag{9.58} \\
&= \frac{1}{2}\Pr(a_{j-1} < T_A \le a_j,\; T_B > a_{j-1})
+ \frac{1}{2}\Pr(a_{j-1} < T_A \le a_j,\; T_B > a_j). \tag{9.59}
\end{align}
This is the result cited by Dolton and van der Klaauw (1999, footnote 9). Since
survival times $T_A$ and $T_B$ are independent (by assumption), the joint probabilities
can be calculated using expressions for the marginal probabilities. That
is,
\begin{align}
L_A &= \frac{1}{2}\left[\Pr(a_{j-1} < T_A \le a_j)\Pr(T_B > a_{j-1})\right]
+ \frac{1}{2}\left[\Pr(a_{j-1} < T_A \le a_j)\Pr(T_B > a_j)\right] \tag{9.60} \\
&= \Pr(a_{j-1} < T_A \le a_j)\,\frac{1}{2}\left[\Pr(T_B > a_{j-1}) + \Pr(T_B > a_j)\right] \tag{9.61} \\
&= \left[\frac{h_A(j)}{1 - h_A(j)}\right] S_A(j)\,\frac{1}{2}\left[S_B(j-1) + S_B(j)\right] \tag{9.62} \\
&= \left[\frac{h_A(j)}{1 - h_A(j)}\right] S_A(j)\,\frac{S_B(j)}{2}\left[\frac{1}{1 - h_B(j)} + 1\right] \tag{9.63} \\
&= \left[\frac{h_A(j)}{1 - h_A(j)}\right] S_A(j)\,S_B(j) \left[\frac{1 - h_B(j)/2}{1 - h_B(j)}\right]. \tag{9.64}
\end{align}
Similarly, the expression for the likelihood contribution for the case when there
is a transition to destination $B$ in interval $j$ is
\begin{align}
L_B &= \frac{1}{2}\left[\Pr(a_{j-1} < T_B \le a_j)\Pr(T_A > a_{j-1})\right]
+ \frac{1}{2}\left[\Pr(a_{j-1} < T_B \le a_j)\Pr(T_A > a_j)\right] \tag{9.65} \\
&= \left[\frac{h_B(j)}{1 - h_B(j)}\right] S_B(j)\,\frac{1}{2}\left[S_A(j-1) + S_A(j)\right] \tag{9.66} \\
&= \left[\frac{h_B(j)}{1 - h_B(j)}\right] S_B(j)\,S_A(j) \left[\frac{1 - h_A(j)/2}{1 - h_A(j)}\right]. \tag{9.67}
\end{align}
Thus the constant within-interval densities assumption leads to expressions for
the likelihood contributions that have rather nice interpretations. The two component
probabilities comprising equation (9.60), and similarly in (9.65), provide
bounds on the joint probability of interest and, with the assumption, we simply
take the average of them. This in turn involves a simple averaging of survival
functions that refer to the beginning and end of the relevant interval: see equations
(9.62) and (9.66). The averaging property also holds when there are more
than two destinations. If there were three destinations, call them $A$, $B$, and $C$,
then the expression corresponding to (9.66) is
\[
L_B = \left[\frac{h_B(j)}{1 - h_B(j)}\right] S_B(j)
\left\{ \frac{1}{3}\left[S_A(j-1)\,S_C(j-1)\right] + \frac{2}{3}\left[S_A(j)\,S_C(j)\right] \right\}. \tag{9.68}
\]
It is as if, with a greater number of competing risks, the fact that a transition to
$B$ rather than the other destinations occurred suggests that the transition times
for the other risks were more likely to have been after the end of the interval. The
likelihood contribution gives greater weight to end-of-interval survival functions.
To sum up, with two destinations $A$ and $B$, the overall likelihood contribution
for an individual with a spell of $j$ intervals is
\begin{align}
L &= (L_A)^{\delta_A} (L_B)^{\delta_B} (L_c)^{1-\delta_A-\delta_B} \nonumber \\
&= \left[\frac{h_A(j)}{1 - h_A(j)}\right]^{\delta_A}
\left[\frac{1 - h_B(j)/2}{1 - h_B(j)}\right]^{\delta_A} S_A(j) \tag{9.69} \\
&\quad \times \left[\frac{h_B(j)}{1 - h_B(j)}\right]^{\delta_B}
\left[\frac{1 - h_A(j)/2}{1 - h_A(j)}\right]^{\delta_B} S_B(j) \tag{9.70}
\end{align}
where the precise specification depends on the choice of functional form for the
discrete (interval) hazard, for example the cloglog form as in (9.49) and (9.50)
above. Whichever is used, the expression is not separable into destination-specific
components that can be maximized separately. That is, to estimate
the independent competing risk model in this case, one could not use standard
single-risk model software – special programs are required.
On the other hand, equations (9.64) and (9.67) imply that, if one did (incorrectly)
use the single risk approach as in Case 1, then computations of the
individual likelihood contributions would be underestimates, with the magnitude
of the error depending on the size of the destination-specific hazard rates.
If the hazard rate $h_A(j) = 0.1$, then $\frac{1 - h_A(j)/2}{1 - h_A(j)} \approx 1.06$, whereas if the hazard
equalled 0.01, the ratio is about 1.01. In other words, as the destination-specific
hazards become infinitesimally small, the likelihood contributions
tend to expressions that are the same as those derived for the case when transitions
could only occur at the interval boundaries. The size of the discrete
hazards will depend on the application in question, and how wide the intervals
are (for example, are they one week or one year?).
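The magnitudes quoted above are easily checked. A minimal Python sketch of the scaling factor $[1 - h/2]/[1 - h]$:

```python
# Check of the magnitudes quoted in the text for the factor
# [1 - h(j)/2] / [1 - h(j)] by which the Case-1 likelihood
# contribution is scaled under constant within-interval densities.
def factor(h):
    return (1 - h / 2) / (1 - h)

assert round(factor(0.10), 2) == 1.06   # ~ 1.06, as stated in the text
assert round(factor(0.01), 2) == 1.01   # ~ 1.01, as stated in the text
assert abs(factor(1e-6) - 1) < 1e-5     # tends to 1 as hazards shrink
```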
9.3.3 Destination-specific hazard rates are constant within intervals

With this assumption, the distribution of survival times for each destination
(and overall) takes the Exponential form within each interval. Let
\begin{align}
\theta_A(t) &= \theta_{Aj} \quad \text{if } a_{j-1} < t \le a_j, \text{ and} \tag{9.71} \\
\theta_B(t) &= \theta_{Bj} \quad \text{if } a_{j-1} < t \le a_j, \text{ implying} \tag{9.72} \\
\theta(t) &= \theta_j = \theta_{Aj} + \theta_{Bj} \quad \text{if } a_{j-1} < t \le a_j. \tag{9.73}
\end{align}
One obvious parameterisation would be $\theta_{Aj} = \exp(\beta_{0Aj} + \beta_{1A}X_1 + \beta_{2A}X_2 + \cdots + \beta_{KA}X_K)$, and $\theta_{Bj} = \exp(\beta_{0Bj} + \beta_{1B}X_1 + \beta_{2B}X_2 + \cdots + \beta_{KB}X_K)$. I.e. each
destination-specific hazard rate has a piecewise-constant exponential (PCE)
form.
Given these assumptions, the destination-specific interval hazards are, making
the appropriate substitutions into (9.29) and (9.30),
\begin{align}
h_A(j) &= 1 - \exp(-\theta_{Aj}) \tag{9.74} \\
h_B(j) &= 1 - \exp(-\theta_{Bj}) \tag{9.75} \\
h(j) &= 1 - \exp(-\theta_j). \tag{9.76}
\end{align}
Now consider the likelihood contribution for an individual who made a transition
to destination $A$ during interval $j$. We have explicit functional forms
for the density functions within the relevant interval: $f_A(u) = \theta_{Aj}\,S_A(u) =
\theta_{Aj}\,S_A(j-1)\exp[\theta_{Aj}(a_{j-1} - u)]$ when $a_{j-1} \le u < a_j$, and similarly for
$f_B(v)$. These can be substituted into (9.42), and the expression evaluated.
There are two terms in the double integral. The second is
\begin{align}
L_A(\text{second term}) &= \int_{a_{j-1}}^{a_j}\int_{a_j}^{\infty} f_A(u)\,f_B(v)\,dv\,du \nonumber \\
&= h_A(j)\,S_A(j-1)\,S_B(j) \nonumber \\
&= \frac{h_A(j)}{1 - h_A(j)}\,S_A(j)\,S_B(j). \tag{9.77}
\end{align}
$L_A$(second term) is the likelihood contribution if one knew that no transitions
to $B$ could have occurred within the interval. But we have to allow for this
possibility; hence the adjustment summarised in the first double integral term
in the expression for $L_A$. Tedious manipulations show that
\[
L_A(\text{first term}) = S(j-1)\,h(j)\left[\frac{\theta_{Aj}}{\theta_j}
- \frac{h_A(j)}{h(j)}\exp(-\theta_{Bj})\right]. \tag{9.78}
\]
The first two terms in (9.78) represent the probability of survival to the beginning
of the $j$th interval followed by exit to any destination within interval $j$. This
probability is then multiplied by a term (in the square brackets), which is the
difference between two components. The first component is the ratio of the
destination-specific instantaneous hazard to the overall instantaneous hazard.
The second component is the ratio of the destination-specific discrete hazard to
the overall discrete hazard, scaled by a factor ($\exp[-\theta_{Bj}]$) which is a destination-specific
conditional probability of survival (non-exit to $B$) over an interval of
unit length. Observe that if $\theta_{Bj} = 0$, then the term in square brackets is equal
to zero, and $L_A$ takes the same form as we derived when transitions could only
occur at the boundaries of intervals.
Combining $L_A$(first term) and $L_A$(second term), further manipulations reveal
that
\[
L_A = S(j-1)\,h(j)\,\frac{\theta_{Aj}}{\theta_{Aj} + \theta_{Bj}}
= S(j)\,\frac{h(j)}{1 - h(j)}\,\frac{\theta_{Aj}}{\theta_{Aj} + \theta_{Bj}}. \tag{9.79}
\]
The contribution in this case is the probability of survival to the beginning of
interval $j$, multiplied by the probability that an event of any type occurred
within the interval conditional on survival to the beginning of the interval (i.e.
$h(j)$), multiplied by the relative chance that the event was $A$ rather than $B$ at
each instant during the interval.
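Result (9.79) can be checked by simulation. The Python sketch below assumes hazards that are constant across all intervals as well as within them (the parameter values are illustrative), draws latent exponential durations, and compares the simulated probability of an exit to $A$ within interval $j$ with the formula:

```python
import math
import random

# Monte Carlo sketch of (9.79) under fully constant hazards
# (theta_A, theta_B are illustrative): the probability of exiting to A
# within interval j should match S(j-1) h(j) theta_A / (theta_A + theta_B).
random.seed(1)
theta_A, theta_B = 0.3, 0.2
j = 2                                   # unit intervals: interval (1, 2]
theta = theta_A + theta_B

S_jm1 = math.exp(-theta * (j - 1))      # survival to start of interval j
h_j = 1 - math.exp(-theta)              # overall interval hazard (9.76)
formula = S_jm1 * h_j * theta_A / theta # likelihood contribution (9.79)

n = 200_000
hits = 0
for _ in range(n):
    t_A = random.expovariate(theta_A)   # latent time to exit A
    t_B = random.expovariate(theta_B)   # latent time to exit B
    t = min(t_A, t_B)
    if j - 1 < t <= j and t_A < t_B:    # exit to A during interval j
        hits += 1

assert abs(hits / n - formula) < 0.005  # agree up to simulation noise
```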
The overall likelihood in this case is
\begin{align}
L_3 &= S(j)\left[\frac{h(j)}{1 - h(j)}\right]^{\delta_A + \delta_B}
\left[\frac{\theta_{Aj}}{\theta_{Aj} + \theta_{Bj}}\right]^{\delta_A}
\left[\frac{\theta_{Bj}}{\theta_{Aj} + \theta_{Bj}}\right]^{\delta_B} \nonumber \\
&= S(j-1)\left[1 - h(j)\right]^{1 - \delta_A - \delta_B} \left[h(j)\right]^{\delta_A + \delta_B}
\left[\frac{\theta_{Aj}}{\theta_j}\right]^{\delta_A}
\left[\frac{\theta_{Bj}}{\theta_j}\right]^{\delta_B} \tag{9.80}
\end{align}
which corresponds to equation 4 of Røed and Zhang (2002a, p. 14). The expression
generalises straightforwardly when there are more than two potential destinations.
Whatever the number of destinations, observe that the expression for
the likelihood contribution is not separable into destination-specific components
that can be maximized separately. Observe also that, as the destination-specific
hazards become infinitesimally small, the likelihood contributions
tend to expressions that are the same as those derived for the case when transitions
could only occur at the interval boundaries. The size of the interval
hazards will depend on the application in question, and how wide the intervals
are.
There is also a connection between the expression $L_3$ and the expression
(9.23) describing the likelihood in the 'multinomial' case. Recall that this was:
\[
L = S(j)\left[\frac{h(j)}{1 - h(j)}\right]^{\delta_A + \delta_B}
\left[\frac{h_A(j)}{h(j)}\right]^{\delta_A}
\left[\frac{h_B(j)}{h(j)}\right]^{\delta_B}. \tag{9.81}
\]
The shorter the intervals, or the smaller the interval hazards, the more
likely it is that $h_A(j) \approx \theta_{Aj}$ and $h_B(j) \approx \theta_{Bj}$. In this case, the 'multinomial
model' will provide estimates that are similar to the interval-censoring model
assuming a constant hazard within intervals (and also assuming that each uses
the same specification for duration dependence in the discrete hazard).
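The approximation underlying this comparison is $h_A(j) = 1 - \exp(-\theta_{Aj}) \approx \theta_{Aj}$ for small $\theta_{Aj}$. A short Python check, with illustrative values:

```python
import math

# Check of the small-hazard approximation behind the comparison with
# the 'multinomial' model: h_A(j) = 1 - exp(-theta_Aj) is close to
# theta_Aj when the interval hazard is small. Values are illustrative.
def interval_hazard(theta):
    return 1 - math.exp(-theta)

# The absolute gap theta - h(theta) shrinks quickly as theta falls
# (roughly in proportion to theta squared).
gap_wide = 0.1 - interval_hazard(0.1)      # wide intervals, say
gap_narrow = 0.01 - interval_hazard(0.01)  # narrow intervals, say
assert gap_narrow < gap_wide / 10
assert interval_hazard(0.01) > 0.0099      # h almost equals theta here
```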
106 CHAPTER 9. COMPETING RISKS MODELS
9.3.4 Destinationspeci…c proportional hazards with a com
mon baseline hazard function
This assumption underlies the derivation of competing risks models presented
by, for example, Hamerle and Tutz (1989). Let us consider the case with two
potential destinations ¹ and 1. The authors assume that the destination
speci…c continuous time hazards have the following proportional hazards forms:
0
.
(t) = 0
0
(t)`
.
, where `
.
= exp(
.
A) (9.82)
0
1
(t) = 0
0
(t)`
1
, where `
1
= exp(
1
A) (9.83)
where baseline hazard function 0
0
(t) is common to each destinationspeci…c
hazard. This is an example of a ‘proportional intensity’ model. These models
have the property that, conditional on exit being made at a particular time, the
probability that it is to one speci…c state rather than any other one does not
depend on survival time (Lancaster, 1990, pp. 103–4).
To aid our subsequent derivations, also define the integrated baseline hazard
function
\[
H(t) = \int_0^t \theta_0(s)\,ds \tag{9.84}
\]
which means that the continuous time hazards can be rewritten in terms of the
derivative of the integrated hazard function:
\begin{align}
\theta_A(t) &= \lambda_A\,H'(t) \tag{9.85} \\
\theta_B(t) &= \lambda_B\,H'(t). \tag{9.86}
\end{align}
The survivor function can also be written in terms of the integrated hazard:
\[
S(t) = \exp\left[-\int_0^t \theta(s)\,ds\right] = \exp[-(\lambda_A + \lambda_B)H(t)]. \tag{9.87}
\]
The discrete time (interval) hazard rate for the $j$th interval is
\begin{align}
h(j) &= 1 - \left[\frac{S(a_j)}{S(a_{j-1})}\right] \tag{9.88} \\
&= 1 - \exp[-(\lambda_A + \lambda_B)(H(a_j) - H(a_{j-1}))] \tag{9.89} \\
&= 1 - \exp[-(\tilde{\lambda}_{Aj} + \tilde{\lambda}_{Bj})] \tag{9.90}
\end{align}
where
\begin{align}
\tilde{\lambda}_{Aj} &= \exp(\gamma_j + \beta_A' X) = \exp(\gamma_j)\,\lambda_A \tag{9.91} \\
\tilde{\lambda}_{Bj} &= \exp(\gamma_j + \beta_B' X) = \exp(\gamma_j)\,\lambda_B \tag{9.92}
\end{align}
and the $\gamma_j = \log[H(a_j) - H(a_{j-1})]$ are interval-specific parameters. Similarly,
we may define destination-specific interval hazard rates
\begin{align}
h_A(j) &= 1 - \exp(-\tilde{\lambda}_{Aj}) \tag{9.93} \\
h_B(j) &= 1 - \exp(-\tilde{\lambda}_{Bj}) \tag{9.94}
\end{align}
and survivor functions:
\begin{align}
S_A(t) &= \exp[-\lambda_A H(t)] \tag{9.95} \\
S_B(t) &= \exp[-\lambda_B H(t)]. \tag{9.96}
\end{align}
Using arguments similar to those developed earlier for the other cases, the
likelihood contribution for an individual with exit to destination $A$ in interval $j$
is
\[
L_A = \int_{a_{j-1}}^{a_j}\int_u^{a_j} \theta_A(u)\,S_A(u)\,\theta_B(v)\,S_B(v)\,dv\,du
+ \int_{a_{j-1}}^{a_j}\int_{a_j}^{\infty} f_A(u)\,f_B(v)\,dv\,du. \tag{9.97}
\]
$L_A$(second term) $= h_A(j)\,S_A(j-1)\,S_B(j)$, but what is $L_A$(first term)? We can
rewrite it in terms of $\lambda_A$, $\lambda_B$, and the integrated baseline hazard function:
\[
L_A(\text{first term}) = \int_{a_{j-1}}^{a_j}\int_u^{a_j}
\lambda_A H'(u)\exp[-\lambda_A H(u)]\;\lambda_B H'(v)\exp[-\lambda_B H(v)]\,dv\,du. \tag{9.98}
\]
Using derivations similar to those for Case 3, one can show that this expression
can be written as:
\[
L_A = S(j-1)\,h(j)\,\frac{\tilde{\lambda}_{Aj}}{\tilde{\lambda}_{Aj} + \tilde{\lambda}_{Bj}}
= S(j-1)\,h(j)\,\frac{\theta_{Aj}}{\theta_{Aj} + \theta_{Bj}}. \tag{9.99}
\]
The overall likelihood contribution is given by
\begin{align}
L_4 &= S(j)\left[\frac{h(j)}{1 - h(j)}\right]^{\delta_A + \delta_B}
\left[\frac{\theta_{Aj}}{\theta_{Aj} + \theta_{Bj}}\right]^{\delta_A}
\left[\frac{\theta_{Bj}}{\theta_{Aj} + \theta_{Bj}}\right]^{\delta_B} \nonumber \\
&= S(j-1)\left[1 - h(j)\right]^{1 - \delta_A - \delta_B} \left[h(j)\right]^{\delta_A + \delta_B}
\left[\frac{\theta_{Aj}}{\theta_j}\right]^{\delta_A}
\left[\frac{\theta_{Bj}}{\theta_j}\right]^{\delta_B} \tag{9.100}
\end{align}
This corresponds to the expression derived by Hamerle and Tutz (1989,
pp. 85–7). Clearly the likelihood contribution has a very similar shape to that
derived for the case where hazards were assumed to be constant within each
interval. (Indeed Model 3 can be derived from Model 4. Suppose that $\theta_0(t)$ in
the Proportional Intensities model is time-invariant, in which case $\gamma_j = \gamma$ for
all $j$. Now let the intercept terms in $\lambda_A$ and $\lambda_B$ be interval-specific, in which
case Model 3 is what results.)
In sum, the proportional intensity model adds some generality (weaker assumptions
about the shape of the baseline hazard within intervals) at the cost
of forcing a common baseline hazard function for the different latent risks. Analysts
would typically like to allow the shapes of the baseline hazards to differ for
the various destination types, and to test for equality rather than imposing it
from the very start. The PCE model allows each of the baseline hazards to vary
in a flexible manner with survival time, and may provide information about
whether it is appropriate to make the proportionate intensities assumption.
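The role of the interval-specific parameters $\gamma_j$ can be illustrated numerically. The Python sketch below assumes a Weibull-type integrated baseline $H(t) = t^p$ (the value of $p$ and the $\lambda$ values are illustrative, not from the text), and checks that (9.89) and (9.90) give the same interval hazard:

```python
import math

# Sketch of the interval-specific parameters gamma_j in the
# proportional intensity model, under an assumed Weibull-type
# integrated baseline H(t) = t**p (p and the lambdas are illustrative).
p = 1.5
H = lambda t: t ** p
lam_A, lam_B = 0.10, 0.05     # lambda_A = exp(beta_A'X), etc.

for j in (1, 2, 3):
    gamma_j = math.log(H(j) - H(j - 1))          # gamma_j, as in the text
    # The interval hazard computed via (9.89) and via (9.90)-(9.92)
    # must coincide:
    h_direct = 1 - math.exp(-(lam_A + lam_B) * (H(j) - H(j - 1)))
    lamt_A = math.exp(gamma_j) * lam_A           # (9.91)
    lamt_B = math.exp(gamma_j) * lam_B           # (9.92)
    h_via_gamma = 1 - math.exp(-(lamt_A + lamt_B))
    assert abs(h_direct - h_via_gamma) < 1e-12
```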
9.3.5 The log of the integrated hazard changes at a constant rate over the interval

This assumption about the linearity of the within-interval log of the baseline
hazard was made by Han and Hausman (1990) and Sueyoshi (1992). We shall
not consider it, as it has been used relatively rarely and is, in any case, rather
complicated to explain. As Sueyoshi (1992, Appendix B) explains, the assumption
implies that the hazard increases within each interval. The assumption of
a constant density within each interval (case 2 above) also implies that hazards
rise within each interval, and contrasts with the assumption of a constant hazard
made in the last subsection.
9.4 Extensions

9.4.1 Left-truncated data

The derivations so far have assumed that the analyst has access to a random sample
of spells. The models can all be easily adapted to the case in which the interval-censored
survival time data are subject to left truncation, also known as 'delayed
entry'. Suppose that the data for a given subject are truncated at a date within
the $u$th interval, where $u < j$. As we have seen in earlier chapters, to derive the
correct likelihood contributions in this case, one needs to condition on survival
up to the truncation date. This means dividing the likelihood contribution
expression for the random sample of spells case (as considered earlier) by $S(u)$.
Now, each of the likelihood expressions for interval-censored data considered
earlier ($L_1, L_2, L_3, L_4$) is of the form $L = S(j)Z$, and so the likelihood expression
for the left truncation case is simply $L = S(j)Z/S(u)$. But, given the relationship
between the survivor function and the interval hazard, there is a convenient
cancelling result: $S(j)/S(u) = \prod_{k=u+1}^{j}(1 - h_k)$. This has a convenient implication
for empirical researchers. Software programs for maximizing $\log L$ are typically
applied to data sets organised so that there is one row for each interval during which
each subject is at risk of experiencing a transition. In the random sample of
spells case, a subject with a spell length of $j$ intervals contributes $j$ data rows and the log-likelihood
contribution is $\log[S(j)Z] = \sum_{k=1}^{j}\log(1 - h_k) + \log(Z)$. With left-truncated
data, a subject contributes $j - u$ data rows and the log-likelihood contribution is
$\log[S(j)Z/S(u)] = \sum_{k=u+1}^{j}\log(1 - h_k) + \log(Z)$. Thus, ICR models for interval-censored
and left-truncated survival data can be easily estimated using the same
programs as for random samples of spells, applied to data sets in which data
rows corresponding to the intervals prior to the truncation point have been
excluded: see Chapter 6.
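The cancelling result is straightforward to verify. A minimal Python sketch, with illustrative interval hazards:

```python
import math

# Sketch of the cancelling result used for left-truncated data:
# S(j)/S(u) equals the product of (1 - h_k) over the intervals
# k = u+1, ..., j only. The interval hazards are illustrative values.
h = [0.10, 0.08, 0.12, 0.05, 0.07]   # h_1, ..., h_5
u, j = 2, 5                          # truncation in interval u, spell of j

def S(k):                            # survivor function at end of interval k
    return math.prod(1 - hk for hk in h[:k])

lhs = S(j) / S(u)
rhs = math.prod(1 - hk for hk in h[u:j])   # product over k = u+1, ..., j
assert abs(lhs - rhs) < 1e-12
```

This is why dropping the data rows for intervals $1, \ldots, u$ reproduces the left-truncated likelihood exactly.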
9.4.2 Correlated risks

The models may also be extended to allow for correlated rather than independent
risks.[2] Suppose that each destination-specific hazard rate now depends on
individual-specific unobserved heterogeneity in addition to, but independent of,
observed heterogeneity ($X$). Specifically, the hazard for transitions to destination
$A$ is rewritten to be a function of $\beta_A' X + \mu_A$ (rather than of $\beta_A' X$ as before),
and $X$ no longer includes an intercept term. Similarly, the hazard of transition
to destination $B$ is rewritten to be a function of $\beta_B' X + \mu_B$. The latent durations
$T_A, T_B$ are now assumed to be independent conditional on the unobserved
heterogeneity components. But, by allowing $\mu_A$ and $\mu_B$ to be correlated, the
unconditional latent durations may be correlated.
Perhaps the most straightforward means of estimating the extended model
is to suppose that $\mu_A$ and $\mu_B$ have a discrete distribution with $M$ points of
support, and these mass points are estimated together with the probabilities
$\pi_m$, for $m = 1, \ldots, M$, of the different combinations of $\mu_A$ and $\mu_B$. That is,
letting the likelihood contribution for each combination of mass points be given
by $L_m(\theta \mid \mu_A, \mu_B)$, where the regression parameters are represented by the vector $\theta$,
the overall likelihood contribution for each person is now
\[
L^* = \sum_{m=1}^{M} \pi_m L_m(\theta \mid \mu_A, \mu_B), \quad \text{with } \sum_{m=1}^{M} \pi_m = 1. \tag{9.101}
\]
This is similar to the discussion of discrete mixing distributions to characterize
unobserved heterogeneity (see the previous chapter). The difference is that
here the distribution is multivariate – the points of support refer to a joint
distribution rather than a marginal distribution. Rather than using a discrete
mixture distribution, one could assume a continuous distribution, e.g. supposing
that the heterogeneity distribution is multivariate normal (cf. Schneider and
Uhlendorff, 2004).
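Equation (9.101) can be sketched in a few lines. In the Python illustration below, the mass points, probabilities, and the person-level contribution function are all hypothetical toy choices, intended only to show the structure of the weighted sum:

```python
import math

# A minimal sketch of the discrete-mixture likelihood (9.101): the
# overall contribution is a probability-weighted sum of contributions
# evaluated at each pair of mass points (mu_A, mu_B). All numbers
# (mass points, weights, and the toy contribution function) are
# hypothetical illustrations, not estimates from any data.
mass_points = [(-0.5, -0.3), (0.0, 0.0), (0.6, 0.4)]   # (mu_A, mu_B) pairs
pi = [0.2, 0.5, 0.3]                                    # weights, sum to one

def L_m(mu_A, mu_B, j=3):
    # Toy person-level contribution: censored at j intervals, with
    # cloglog-style hazards shifted by the heterogeneity terms.
    h_A = 1 - math.exp(-math.exp(-2.0 + mu_A))
    h_B = 1 - math.exp(-math.exp(-2.5 + mu_B))
    return ((1 - h_A) * (1 - h_B)) ** j

L_star = sum(p * L_m(mA, mB) for p, (mA, mB) in zip(pi, mass_points))
assert abs(sum(pi) - 1) < 1e-12
assert 0 < L_star < 1    # a valid probability-weighted contribution
```

In estimation, the mass points and the $\pi_m$ would be additional parameters chosen to maximize the sample log-likelihood.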
[2] See e.g. Dolton and van der Klaauw (1999), Røed and Zhang (2002a), Han and Hausman
(1990), Sueyoshi (1992), and Van den Berg and Lindeboom (1998). Conditions for identification
of proportional hazards models with regressors for dependent competing risks are given
by Abbring and van den Berg (2003), Han and Hausman (1990), and Heckman and Honoré
(1989).
9.5 Conclusions and additional issues

Our analysis of competing risks models has shown that, if we assume independence
of competing risks, then estimation can be straightforward for several
important types of data generation process. Extensions to allow for correlated
risks have been introduced in the literature, but have not been used much in
applied work yet.
If we have continuous time data, the independent competing risks model can
be estimated using standard single-risk models, as discussed elsewhere in this
book. Some complications arise with discrete-time data. If the data are genuinely
discrete, one cannot use a single-risk model approach, but the 'multinomial
logit' model is easy to estimate nonetheless.
With interval-censored data, if one is prepared to assume that transitions
can only occur at the interval boundaries, then the modelling can be undertaken
straightforwardly, in a manner analogous to that for continuous time data. But
in other situations, one cannot avoid modelling the destination-specific models
jointly, with different specifications arising depending on the assumptions made
about the shape of the hazard (or density) function within each interval. Since
alternative assumptions are possible (and each is intrinsically non-testable), this
raises the question of whether the estimates derived are sensitive to the choice
made. There can be no definitive answer to this question – it depends on the
context. Nonetheless the situations in which transitions occur only at the interval
boundaries are perhaps relatively rare. Regarding the other alternatives, we
may note that Han and Hausman (1990) and Dolton and van der Klaauw (1995)
suggested that their estimates were relatively robust to alternative choices. By
contrast, Sueyoshi (1992) emphasised that robustness can depend on the width
of the intervals, particularly when the models include time-varying covariates.
Finally, a cautionary note about the interpretation of coefficient estimates
from destination-specific hazard regressions. In the standard single-risk model, and
given the parameterizations used earlier, if the coefficient estimate associated
with a particular explanatory variable $X$ is positive, then there is a straightforward
interpretation. For example, with a proportional hazards model, larger
values of $X$ imply a larger hazard rate, and shorter survival times. In competing
risks models, interpretations of coefficients are not always so straightforward. For
example, one may be interested in more than simply how a covariate impacts on
the destination-specific latent hazard. One may also be interested in estimating
the (unconditional) probability of exiting to a particular destination $A$ (say), or
the (conditional) probability of exit to $A$ conditional on exiting at a particular
survival time, or the expected spell length in the state conditional on exit to $A$.
It turns out that all these quantities depend on all the parameters in the
competing risks model, not only those in the destination-specific model of exit
to $A$, as Thomas (1996) explains. He also shows that, if each hazard takes the
proportional hazards form, then an increase in $X$ will increase the conditional
probability of exit to $A$ if its estimated coefficient in the equation for the hazard
of exit via $A$ is larger than the corresponding coefficients in the hazards for
all other risks. The same issues arise in the intrinsically discrete 'multinomial
logit' competing risks model. For multinomial logit models, it is well known
that increases in a variable with a positive coefficient in the equation for one
outcome need not lead to an increase in the probability of that outcome, as
the probability of another outcome may increase by even more. Simulating the
predicted values of interest for different values of the explanatory variables is
one way of addressing the issues raised in this paragraph.
The estimation of unconditional probabilities of exit to a particular destination
is known in the biostatistics literature as the estimation of cumulative
incidence. See e.g. Gooley et al. (1999). A Stata program stcompet is provided
by Coviello and Boggess (2004).
Chapter 10

Additional topics

Nothing on this at present – something for the future.
Potential topics might include:
• Estimation using repeated spells and multi-state transition models, and correlated competing risks (both are essentially an extension to the chapter on unobserved heterogeneity). For a helpful introduction to the topic, see Allison (1984). For an extensive, but more advanced, discussion based extensively on the application of mixture models, see Van den Berg (2001).
• Simulation of survival time data
References

Abbring, J.H. and van den Berg, G.J. (2003), 'The identifiability of the mixed proportional hazards competing risks model', Journal of the Royal Statistical Society (B), 65, 701–710.
Abbring, J.H. and van den Berg, G.J. (2003), 'The unobserved heterogeneity distribution in duration analysis', unpublished paper, Tinbergen Institute, Amsterdam.
Allison, P. (1982), 'Discrete time methods for the analysis of event histories', pp. 61–98, in S. Leinhardt (ed.), Sociological Methodology 1982, Jossey-Bass, San Francisco.
Allison, P. (1984), Event History Analysis, Sage, Newbury Park CA.
Allison, P. (1995), Survival Analysis Using the SAS® System: A Practical Guide, SAS Institute, Cary NC.
Arulampalam, W. and Stewart, M. (1995), 'The determinants of individual unemployment durations in an era of high unemployment', Economic Journal 105, 321–332.
Atkinson, A.B., Gomulka, J., Micklewright, J., and Rau, N. (1984), 'Unemployment benefit, duration and incentives in Britain', Journal of Public Economics 23, 3–26.
Baker, M. and Melino, A. (2000), 'Duration dependence and nonparametric heterogeneity: a Monte Carlo study', Journal of Econometrics 96, 357–393.
Bane, M. and Ellwood, D. (1986), 'Slipping in and out of poverty', Journal of Human Resources 21, 1–23.
Blank, R.M. (1989), 'Analyzing the length of welfare spells', Journal of Public Economics 39, 245–273.
Bergström, R. and Edin, P.A. (1992), 'Time aggregation and the distributional shape of unemployment duration', Journal of Applied Econometrics 7, 5–30.
Blossfeld, H.P., Hamerle, A., and Mayer, K. (1989), Event History Analysis, Lawrence Erlbaum Associates, Hillsdale NJ.
Blossfeld, H.P. and Rohwer, G. (2002), Techniques of Event History Modeling: New Approaches to Causal Analysis, second edition, Lawrence Erlbaum Associates, Mahwah NJ.
Box-Steffensmeier, J.M. and Jones, B.S. (2004), Event History Modeling: A Guide for Social Scientists, Cambridge University Press, Cambridge.
115
116 REFERENCES
Cleves, M., Gould, W.W., and Gutierrez, R. (2002), An Introduction to Survival Analysis Using Stata, Stata Press, College Station TX.
Coviello, V. and Boggess, M. (2004), ‘Cumulative incidence estimation in the presence of competing risks’, The Stata Journal, 4, 103–112.
Cox, D. and Oakes, D. (1984), Analysis of Survival Data, Chapman and Hall, London.
Dolton, P. and van der Klaauw, W. (1995), ‘Leaving teaching in the UK: a duration analysis’, Economic Journal 105, 431–435.
Dolton, P. and van der Klaauw, W. (1999), ‘The turnover of teachers: a competing risks explanation’, Review of Economics and Statistics 81, 543–552.
Elandt-Johnson, R.C. and Johnson, N.L. (1980, reprinted 1999), Survival Models and Data Analysis, John Wiley, New York.
Ermisch, J.F. and Ogawa, N. (1994), ‘Age at motherhood in Japan’, Journal of Population Economics 7, 393–420.
Gooley, T.A., Leisenring, W., Crowley, J., and Storer, B.E. (1999), ‘Estimation of failure probabilities in the presence of competing risks: new representations of old estimators’, Statistics in Medicine, 18, 695–706.
Gould, W., Pitblado, J., and Sribney, W. (2003), Maximum Likelihood Estimation with Stata, second edition, Stata Press, College Station TX.
Gourieroux, C. (2000), Econometrics of Qualitative Dependent Variables, Cambridge University Press, Cambridge.
Greene, W. (2003), Econometric Analysis, 5th edition, Prentice-Hall International, Englewood Cliffs NJ.
Guo, G. (1993), ‘Event-history analysis for left-truncated data’, pp. 217–243 in P.V. Marsden (ed.), Sociological Methodology 1993, Blackwell, Cambridge MA.
Hachen, D.S. Jr. (1988), ‘The competing risks model. A method for analyzing processes with multiple types of events’, Sociological Methods and Research 17, 21–54.
Ham, J. and Rea, S.A. Jr. (1987), ‘Unemployment insurance and male unemployment duration in Canada’, Journal of Labor Economics 5, 325–353.
Hamerle, A. and Tutz, G. (1988), Diskrete Modelle zur Analyse von Verweildauern und Lebenszeiten, Campus, Frankfurt and New York.
Han, A. and Hausman, J. (1990), ‘Flexible parametric estimation of duration and competing risks models’, Journal of Applied Econometrics 5, 1–28.
Heckman, J.J. and Honoré, B.E. (1989), ‘The identifiability of the competing risks model’, Biometrika, 76, 325–330.
Heckman, J.J. and Singer, B. (1984), ‘A method for minimising the impact of distributional assumptions in econometric models for duration data’, Econometrica 52, 271–320.
Holford, T.R. (1980), ‘The analysis of rates and survivorship using log-linear models’, Biometrics 36, 299–305.
Hosmer, D.W. and Lemeshow, S. (1999), Applied Survival Analysis, Wiley, New York.
Hoynes, H. and MaCurdy, T. (1994), ‘Has the decline in welfare shortened welfare spells?’, American Economic Review Papers and Proceedings 84, 43–48.
Jenkins, S.P. (1995), ‘Easy ways to estimate discrete time duration models’, Oxford Bulletin of Economics and Statistics 57, 129–138.
Jenkins, S.P. and García-Serrano, C. (2004), ‘The relationship between unemployment benefits and reemployment probabilities: evidence from Spain’, Oxford Bulletin of Economics and Statistics 66, 239–260. Longer version available at http://www.iser.essex.ac.uk/pubs/workpaps/wp200017.php
Jenkins, S.P. and Rigg, J.A. (2001), The Dynamics of Poverty in Britain, Department for Work and Pensions Research Report No. 157, Corporate Document Services, Leeds. http://www.dwp.gov.uk/asd/asd5/rrep157.asp
Kalbfleisch, J.D. and Prentice, R.L. (1980), The Statistical Analysis of Failure Time Data, John Wiley, New York.
Katz, L.F. and Meyer, B.D. (1990), ‘The impact of the potential duration of unemployment benefits on the duration of unemployment’, Journal of Public Economics, 41, 45–72.
Kiefer, N. (1985), ‘Econometric analysis of duration data’, Journal of Econometrics 28, 1–169.
Kiefer, N. (1988), ‘Economic duration data and hazard functions’, Journal of Economic Literature 26, 646–679.
Kiefer, N. (1990), ‘Econometric methods for grouped duration data’, pp. 97–117 in J. Hartog, G. Ridder, and J. Theeuwes (eds), Panel Data and Labor Market Studies, North-Holland, Amsterdam.
Klein, J.P. and Moeschberger, M.L. (1997), Survival Analysis: Techniques for Censored and Truncated Data, Springer-Verlag, New York.
Lancaster, T. (1979), ‘Econometric methods for the duration of unemployment’, Econometrica 47, 939–956.
Lancaster, T. (1990), The Econometric Analysis of Transition Data, Cambridge University Press, Cambridge.
Meyer, B. (1990), ‘Unemployment insurance and unemployment spells’, Econometrica 58, 757–782.
Narendranathan, W. and Stewart, M. (1991), ‘Simple methods for testing the proportionality of cause-specific hazards in competing risks models’, Oxford Bulletin of Economics and Statistics 53, 331–340.
Narendranathan, W. and Stewart, M. (1993), ‘Modelling the probability of leaving unemployment: competing risk models with flexible baseline hazards’, Applied Statistics 42, 63–83.
Narendranathan, W. and Stewart, M. (1995), ‘How does the benefit effect vary as unemployment spells lengthen?’, Journal of Applied Econometrics 8, 361–381.
Nickell, S. (1979), ‘Estimating the probability of leaving unemployment’, Econometrica 47, 1249–1266.
Nickell, S. and Lancaster, T. (1980), ‘The analysis of reemployment probabilities of the unemployed’, Journal of the Royal Statistical Society Series A, 143, 141–165.
Ondrich, J. and Rhody, S. (1999), ‘Multiple spells in the Prentice-Gloeckler-Meyer likelihood with unobserved heterogeneity’, Economics Letters 63, 139–144.
O’Neill, J., Bassi, L., and Wolf, D. (1987), ‘The duration of welfare spells’, Review of Economics and Statistics 69, 241–248.
Petersen, T. (1991), ‘Time-aggregation bias in continuous-time hazard-rate models’, pp. 263–290 in P.V. Marsden (ed.), Sociological Methodology 1991, Blackwell, Cambridge MA.
Petersen, T. and Koput, K.W. (1992), ‘Time-aggregation hazard-rate models with covariates’, Sociological Methods and Research 21, 25–51.
Prentice, R.L. and Gloeckler, L.A. (1978), ‘Regression analysis of grouped survival data with application to breast cancer data’, Biometrics 34, 57–67.
Rodríguez, G. (1999?), ‘Survival analysis’, http://data.princeton.edu/wws509/notes/c7.pdf.
Schneider, H. and Uhlendorff, A. (2004), ‘The transition from welfare to work and the role of potential labor income’, Working Paper 1420, IZA, Bonn.
Shaw, A., Walker, R., Ashworth, K., Jenkins, S., and Middleton, S. (1996), Moving Off Income Support: Barriers and Bridges, DSS Research Report No. 53, HMSO, London.
Singer, J.D. and Willett, J.B. (1993), ‘It’s about time: using discrete-time survival analysis to study duration and the timing of events’, Journal of Educational Statistics 18, 155–195.
Singer, J.D. and Willett, J.B. (1995), ‘It’s déjà vu all over again: using multiple-spell discrete-time survival analysis’, Journal of Educational Statistics 20, 41–67.
Singer, J.D. and Willett, J.B. (2003), Applied Longitudinal Data Analysis: Modeling Change and Event Occurrence, Oxford University Press, Oxford.
StataCorp (2003), Stata Statistical Software: Release 8, Stata Corporation, College Station TX.
Sueyoshi, G.T. (1992), ‘Semiparametric proportional hazards estimation of competing risks models with time-varying covariates’, Journal of Econometrics 51, 25–58.
Sueyoshi, G.T. (1995), ‘A class of binary response models for grouped duration data’, Journal of Applied Econometrics 10, 411–431.
Therneau, T.M. and Grambsch, P.M. (2000), Modeling Survival Data: Extending the Cox Model, Springer, New York.
Thomas, J. (1996), ‘On the interpretation of covariate estimates in independent competing-risks models’, Oxford Bulletin of Economics and Statistics 48, 27–39.
Tuma, N.B. and Hannan, M.T. (1984), Social Dynamics: Models and Methods, Academic Press, Orlando FL.
Van den Berg, G.J. (2001), ‘Duration models: specification, identification and multiple durations’, in Handbook of Econometrics Volume 5, (eds) J.J. Heckman and E. Leamer, North-Holland, Amsterdam.
Van den Berg, G.J., Lindeboom, M., and Ridder, G. (1994), ‘Attrition in longitudinal panel data and the empirical analysis of labor market behavior’, Journal of Applied Econometrics 9, 421–435.
Van den Berg, G.J. and Lindeboom, M. (1994), ‘Attrition in panel survey data and the estimation of multistate labor market models’, Journal of Human Resources 33, 458–478.
Wooldridge, J.M. (2002), Econometric Analysis of Cross Section and Panel Data, MIT Press, Cambridge MA.
Yamaguchi, K. (1991), Event History Analysis, Sage, Newbury Park CA.
Appendix
Nothing here at present. Future versions may contain appendices about e.g.
Maximum Likelihood, and more about Partial Likelihood, or more about the
details of statistical testing in general (Wald and likelihood ratio), and residual
analysis.
Preface

These notes were written to accompany my Survival Analysis module in the masters-level University of Essex lecture course EC968, and my Essex University Summer School course on Survival Analysis.1 (The first draft was completed in January 2002, and has been revised several times since.) The course reading list, and a sequence of lessons on how to do Survival Analysis (based around the Stata software package), are downloadable from http://www.iser.essex.ac.uk/teaching/degree/stephenj/ec968/index.php. Charts and graphs from the classroom presentations are not included (you have to get something for being present in person!). Beware: the notes remain work in progress, and will evolve as and when time allows. Please send me comments and suggestions on both these notes and the do-it-yourself lessons:

Email: stephenj@essex.ac.uk
Post: Institute for Social and Economic Research, University of Essex, Wivenhoe Park, Colchester CO4 3SQ, United Kingdom.

The document was produced using Scientific Workplace version 5.0 (formatted using the ‘Standard LaTeX book’ style).

My lectures were originally based on a set of overhead transparencies given to me by John Micklewright (University of Southampton) that he had used in a graduate microeconometrics lecture course at the European University Institute. Over the years, I have also learnt much about survival analysis from Mark Stewart (University of Warwick) and John Ermisch (University of Essex). Robert Wright (University of Stirling) patiently answered questions when I first started to use survival analysis. The Stata Reference Manuals written by the StataCorp staff have also been a big influence. They are superb, and useful as a text, not only as program manuals. I have also drawn inspiration from other Stata users. In addition to the StataCorp staff, I would specifically like to cite the contributions of Jeroen Weesie (Utrecht University) and Nick Cox (Durham University). The writing of Paul Allison (University of Pennsylvania) on survival analysis has also influenced me, providing an exemplary model of how to explain complex issues in a clear non-technical manner.

I wish to thank Janice Webb for word-processing a preliminary draft of the notes. I am grateful to those who have drawn various typographic errors to my attention, and also made several other helpful comments and suggestions. I would like to especially mention Paola De Agostini, José Diaz, Annette Jäckle, Lucinda Platt, Thomas Siedler, and the course participants at Essex and elsewhere (including Frigiliana, Milan, and Wellington). The responsibility for the content of these notes (and the web-based Lessons) is mine alone.

If you wish to cite this document, please refer to: Jenkins, Stephen P. (2004). Survival Analysis. Unpublished manuscript, Institute for Social and Economic Research, University of Essex, Colchester, UK. Downloadable from http://www.iser.essex.ac.uk/teaching/degree/stephenj/ec968/pdfs/ec968lnotesv6.pd

© Stephen P. Jenkins, 2005.

1 Information about Essex Summer School courses and how to apply is available from http://www.essex.ac.uk/methods.
Chapter 1
Introduction
1.1 What survival analysis is about
This course is about the modelling of time-to-event data, otherwise known as transition data (or survival time data or duration data). We consider a particular lifecourse ‘domain’, which may be partitioned into a number of mutually exclusive states at each point in time. With the passage of time, individuals move (or do not move) between these states. For some examples of lifecourse domains and states, see Table 1.1.

For each given domain, the patterns for each individual are described by the time spent within each state, and the dates of each transition made (if any). Figure 1, from Tuma and Hannan (1984, Figure 3.1), shows a hypothetical marital history for an individual. There are three states (married, not married, dead) differentiated on the vertical axis, and the horizontal axis shows the passage of time t. The length of each horizontal line shows the time spent within each state, i.e. spell lengths, or spell durations, or survival times. More generally, we could imagine having this sort of data for a large number of individuals (or firms or other analytical units), together with information that describes the characteristics of these individuals (to be used as explanatory variables in multivariate models).

This course is about the methods used to model transition data, and the relationship between transition patterns and characteristics. Data patterns of the sort shown in Figure 1 are quite complex however; in particular, there are multi-state transitions (three states) and repeat spells within a given state (two spells in the state ‘not married’). Hence, to simplify matters, we shall focus on models to describe survival times within a single state, and assume that we have single-spell data for each individual. Thus, for the most part, we consider exits from a single state to a single destination.1
1 Nonetheless we shall, later, allow for transitions to multiple destination states under the heading of ‘independent competing risks’ models, and shall note the conditions under which repeated spell data may be modelled using single-spell methods.
Domain                    State
Marriage                  married; cohabiting; separated; divorced; single
Receipt of cash benefit   receiving benefit x; receiving benefit y; receiving x and y; receiving neither
Housing tenure            owned outright; owned with mortgage; renter – social housing; renter – private; other
Paid work                 employed; self-employed; unemployed; inactive; retired

Table 1.1: Examples of lifecourse domains and states
We also make a number of additional simplifying assumptions:

• the chances of making a transition from the current state do not depend on transition history prior to entry to the current state (there is no state dependence);

• entry into the state being modelled is exogenous – there are no ‘initial conditions’ problems. Otherwise the models of survival times in the current state would also have to take account of the differential chances of being found in the current state in the first place;

• the model parameters describing the transition process are fixed, or can be parameterized using explanatory variables – the process is stationary.

The models that have been specially developed or adapted to analyze survival times are distinctive largely because they need to take into account some special features of the data, both the ‘dependent’ variable for analysis (survival time itself), and also the explanatory variables used in our multivariate models. Let us consider these features in turn.
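To make the single-spell data structure just described concrete, here is a minimal sketch of how one record per subject might be represented: an entry time, the time last seen, and an indicator for whether the transition out of the state was observed. The record layout and field names are illustrative assumptions (the course itself uses Stata), not taken from the text.

```python
from dataclasses import dataclass

@dataclass
class Spell:
    """One subject's single spell in a state."""
    entry: float        # time of entry into the state
    last_seen: float    # time of exit, or of last observation if no exit seen
    exited: bool        # True if the transition was observed (complete spell)

    @property
    def duration(self) -> float:
        # Observed spell length; only a lower bound on the true length
        # if the spell was still in progress when last observed.
        return self.last_seen - self.entry

# Three hypothetical unemployment spells (time measured in months)
spells = [Spell(0.0, 7.5, True),    # completed after 7.5 months
          Spell(2.0, 14.0, False),  # still in progress when last observed
          Spell(5.0, 11.0, True)]   # completed after 6 months

complete = [s.duration for s in spells if s.exited]
censored = [s.duration for s in spells if not s.exited]
print(complete)  # [7.5, 6.0]
print(censored)  # [12.0]
```

The `exited` flag anticipates the censoring ideas of the next section: for the middle spell, all we know is that its true length exceeds 12 months.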
1.2 Survival time data: some notable features

Survival time data may be derived in a number of different ways, and the way the data are generated has important implications for analysis. There are four main types of sampling process providing survival time data:

1. Stock sample. Data collection is based upon a random sample of the individuals that are currently in the state of interest, who are typically (but not always) interviewed at some time later. For example, when modelling the length of spells of unemployment insurance (UI) receipt, one might sample all the individuals who were in receipt of UI at a given date, and also find out when they first received UI (and other characteristics).

2. Inflow sample. Data collection is based on a random sample of all persons entering the state of interest, and individuals are followed until some prespecified date (which might be common to all individuals), or until the spell ends. For example, one might sample all the individuals who began a UI spell, and one also determines when they entered the state (the spell start date).

3. Outflow sample. Data collection is based on a random sample of those leaving the state of interest, and one also determines when the spell began. For example, to continue our UI example, the sample would consist of individuals leaving UI receipt.

4. Population sample. Data collection is based on a general survey of the population (i.e. where sampling is not related to the process of interest), and respondents are asked about their current and/or previous spells of the type of interest (starting and ending dates).

Data may also be generated from combinations of these sample types. For example, the researcher may build a sample of spells by considering all spells that occurred between two dates, for example between 1 January and 1 June of a given year. Some spells will already be in progress at the beginning of the observation window (as in the stock sample case), whereas some will begin during the window (as in the inflow sample case).

The longitudinal data in these four types of sample may be collected from three main types of survey or database:

1. Administrative records. For example, when modelling the length of spells of receipt of unemployment insurance (UI), information about UI spells may be derived from the database used by the government to administer the benefit system. The administrative records may be the sole source of information about the individuals, or may be combined with a social survey that asks further questions of the persons of interest.

2. Cross-section sample survey, with retrospective questions. In this case, respondents to a survey are asked to provide information about their spells in the state of interest using retrospective recall methods. For example, when considering how long marriages last, analysts may use questions asking respondents whether they are currently married, or ever have been, and determining the dates of marriage and of divorce, separation, and widowhood. Similar sorts of methods are commonly used to collect information about personal histories of employment and jobs over the working life.

3. Panel and cohort surveys, with prospective data collection. In this case, the longitudinal information is built from repeated interviews (or other sorts of observation) on the sample of interest at a number of different points in time. At each interview, respondents are typically asked about their current status, and changes since the previous interview, and associated dates.

Combinations of these survey instruments may be used. For example, a panel survey may also include retrospective question modules to ask about respondents’ experiences before the survey began. Administrative records containing longitudinal data may be matched into a sample survey.

The main lesson of this brief introduction to data collection methods is that, although each method provides spell data, the nature of the information about the spells differs, and this has important implications for how one should analyze the data. The rest of this section highlights the nature of the differences in information about spells. The first aspect concerns whether survival times are complete, censored or truncated. The second and related aspect concerns whether the analyst observes the precise dates at which spells are observed (or else survival times are only observed in intervals of time, grouped or banded) or, equivalently – at least from the analytic point of view – whether survival times are intrinsically discrete.

1.2.1 Censoring and truncation of survival time data

A survival time is censored if all that is known is that it began or ended within some particular interval of time, and thus the total spell length (from entry time until transition) is not known exactly. We may distinguish the following types of censoring:

• Right censoring: at the time of observation, the relevant event (transition out of the current state) had not yet occurred (the spell end date is unknown), and so the total length of time between entry to and exit from the state is unknown. Given entry at time 0 and observation at time t, we only know that the completed spell is of length T > t.

• Left censoring: the case when the start date of the spell was not observed, so again the exact length of the spell (whether completed or incomplete) is not known. Note that this is the definition of left censoring most commonly used by social scientists. (Be aware that biostatisticians typically use a different definition: to them, left-censored data are those for which it is known that exit from the state occurred at some time before the observation date, but it is not known exactly when. See e.g. Klein and Moeschberger, 1997.)

By contrast, truncated survival time data are those for which there is a systematic exclusion of survival times from one’s sample, and the sample selection effect depends on survival time itself. We may distinguish two types of truncation:

• Left truncation: the case when only those who have survived more than some minimum amount of time are included in the observation sample (‘small’ survival times – those below the threshold – are not observed). Left truncation is also known by other names: delayed entry and stock sampling with follow-up. The latter term is the one most commonly referred to by economists, reflecting the fact that the data they use are often generated in this way. If one samples from the stock of persons in the relevant state at some time s, and interviews them some time later, then persons with short spells are systematically excluded. (Of all those who began a spell at time r < s, only those with relatively long spells survived long enough to be found in the stock at time s and thence available to be sampled.) Note that the spell start is assumed known in this case (cf. left censoring), but the subject’s survival is only observed from some later date – hence ‘delayed entry’.

• Right truncation: this is the case when only those persons who have experienced the exit event by some particular date are included in the sample, and so relatively ‘long’ survival times are systematically excluded. Right truncation occurs, for example, when a sample is drawn from the persons who exit from the state at a particular date (e.g. an outflow sample from the unemployment register).

The most commonly available survival time data sets contain a combination of survival times in which either (i) both entry and exit dates are observed (completed spell data), or (ii) entry dates are observed and exit dates are not observed exactly (right censored incomplete spell data). The ubiquity of such right censored data has meant that the term ‘censoring’ is often used as a shorthand description to refer to this case. We shall do so as well. See Figure 2 for some examples of different types of spells. *** insert and add comments ***

We assume that the process that gives rise to censoring of survival times is independent of the survival time process. There is some latent failure time for person i given by Ti* and some latent censoring time Ci*, and what we observe is Ti = min{Ti*, Ci*}. See the texts for more about the different types of censoring mechanisms that have been distinguished in the literature. If right-censoring is not independent – instead its determinants are correlated with the determinants of the transition process – then we need to model the two processes jointly. An example is where censoring arises through non-random sample dropout (‘attrition’).

1.2.2 Continuous versus discrete (or grouped) survival time data

So far we have implicitly assumed that the transition event of interest may occur at any particular instant in time; the stochastic process occurs in continuous time. Time is a continuum and, in principle, the length of an observed spell can be measured using a non-negative real number (which may be fractional). Often this is derived from observations on spell start dates and either spell exit dates (complete spells) or last observation date (censored spells). Survival time data do not always come in this form, however.

Discrete time data are relatively common in the social sciences, and arise for two reasons. The first reason is that survival times have been grouped or banded into discrete intervals of time (e.g. numbers of months or years): although the underlying transition process may occur in continuous time, the data are not observed (or not provided) in that form. Biostatisticians typically refer to this situation as one of interval censoring, a natural description given the definitions used in the previous subsection. The second reason for discrete time data is when the underlying transition process is an intrinsically discrete one. Consider, for example, a machine tool set up to carry out a specific cycle of tasks, where this cycle takes a fixed amount of time. When modelling how long it takes for the machine to break down, it would be natural to model failure times in terms of the number of discrete cycles that the machine tool was in operation. Similarly, when modelling fertility, and in particular the time from puberty to first birth, it might be more natural to measure time in terms of numbers of menstrual cycles rather than number of calendar months. In this case, spell lengths may be summarised using the set of positive integers (1, 2, 3, 4, and so on). Thus the more important distinction is between discrete time data and continuous time data.

Since the same sorts of models can be applied to discrete time data regardless of the reason they were generated (as we shall see below), we shall mostly refer simply to discrete time models, and contrast these with continuous time models. Models for the latter are the most commonly available and most commonly applied, perhaps reflecting their origins in the biomedical sciences.

Some continuous time models often (implicitly) assume that transitions can only occur at different times (at different instants along the time continuum), and so if there is a number of individuals in one’s data set with the same survival time, one might ask whether the ties are genuine, or simply because survival times have been grouped at the observation or reporting stage. The occurrence of tied survival times may be an indicator of interval censoring.

One of the themes of this lecture course is that one should use models that reflect the nature of the data available. For this reason, more attention is given to discrete time models than is typically common. For the same reason, I give

1.2.3 Types of explanatory variables

Contrast, first, explanatory variables that describe the characteristics of the observation unit itself (e.g. a person’s sex or a person’s age) versus the characteristics of the socio-economic environment of the observation unit (e.g. the unemployment rate in the area in which a person lives). The first type of variables are typically available in the survey itself, whereas the second type often have to be collected separately and then matched in.

The second contrast is between explanatory variables that are fixed over time and time-varying covariates, and, among the latter, whether time refers to calendar time or survival time within the current state. The unemployment rate in the area in which a person lives may vary with calendar time (the business cycle), and this can induce a relationship with survival time but does not depend intrinsically on survival time itself. Some books refer to time-dependent variables: variables for which changes over time can be written directly as a function of survival time t. For example, such a time-dependent variable might be X log(t). Another example is a benefit paid at one rate for the first 12 months of a spell and paid at a (higher) long-term rate for the 13th and subsequent months for spells lasting this long. (In addition, some calendar time variation in the benefit generosity in real terms was induced by inflation, and by annual uprating of benefit amounts at the beginning of each financial year (April).)

The distinction between fixed and time-varying covariates is relevant for both analytical and practical reasons. Having all explanatory variables fixed means that analytical methods and empirical estimation are more straightforward. With time-varying covariates, some model interpretations no longer hold. And from a practical point of view, time-varying covariates require the data set to be organised in a particular way prior to estimation. More about this ‘episode splitting’ later on.
1.g.3
Types of explanatory variables
There are two main types.1. SURVIVAL TIME DATA: SOME NOTABLE FEATURES
7
more explicit attention to how to estimate models using data sets containing lefttruncated spells than do most texts. this distinction makes no di¤erence. For example. one has to reorganise one’ data set in order s to incorporate them and estimate models. the unemployment rate of the area in which the person lives). or a s …rm’ size). These are either the same as the timevarying variables described above or.2. As far model speci…cation is concerned. It may make a signi…cant di¤erence in practice.
. and distinguish between those that vary with survival time and those vary with calendar time. sometimes. and s timevarying. social assistance bene…t rates in Britain used to vary with the length of time that bene…t had been received: Supplementary Bene…t was paid at the shortterm rate for spells up to 12 months long. as the …rst type of variables are often directly available in the survey itself. given some personal characteristic summarized using variable X.
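The data reorganisation involved in episode splitting can be sketched as follows — a minimal illustration, not code from this book; the function name and record layout are hypothetical:

```python
# A minimal sketch of 'episode splitting' (illustrative only; names and
# record layout are hypothetical, not from the book).
def split_episodes(spell_length, changes):
    """Split one spell into (start, end, value) sub-episodes.

    changes: list of (time, covariate_value) pairs, sorted by time,
    with the first entry at time 0.
    """
    episodes = []
    for i, (start, value) in enumerate(changes):
        end = changes[i + 1][0] if i + 1 < len(changes) else spell_length
        if start < spell_length:
            # each sub-episode records a constant covariate value
            episodes.append((start, min(end, spell_length), value))
    return episodes

# A 14-month benefit spell under the short-term/long-term rate rule,
# with the rate changing from the 13th month (elapsed time 12):
print(split_episodes(14, [(0, "short-term rate"), (12, "long-term rate")]))
# -> [(0, 12, 'short-term rate'), (12, 14, 'long-term rate')]
```

Each sub-episode then enters estimation as a separate record, with the time-varying covariate constant within it.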
1.3 Why are distinctive statistical methods used?

This section provides some motivation for the distinctive specialist methods that have been developed for survival analysis, by considering why some of the methods that are commonly used elsewhere in economics and other quantitative social science disciplines cannot be applied in this context (at least in their standard form). More specifically, what is the problem with using either (1) Ordinary Least Squares (OLS) regressions of survival times, or (2) binary dependent variable regression models (e.g. logit, probit) with transition event occurrence as the dependent variable? Let us consider these in turn. OLS cannot handle three aspects of survival time data very well: censoring (and truncation); time-varying covariates; and 'structural' modelling.

1.3.1 Problems for OLS caused by right censoring

To illustrate the (right) censoring issue, let us suppose that the 'true' model is such that there is a single explanatory variable, Xi for each individual i = 1, ..., n, who has a true survival time of Ti*. In addition, in the population, a higher X is associated with a shorter survival time. In the sample, we observe Ti, where Ti = Ti* for observations with completed spells and Ti < Ti* for right censored observations. Suppose too that the incidence of censoring is higher at longer survival times relative to shorter survival times. (This does not necessarily conflict with the assumption of independence of the censoring and survival processes – it simply reflects the passage of time. The longer the observation period, the greater the proportion of spells for which events are observed.)

**CHART TO INSERT** Data 'cloud': combinations of 'true' Xi, Ti

By OLS, we mean: regress Ti, or better still log(Ti) (noting that survival times are all non-negative and distributions of survival times are typically skewed), on Xi, fitting the linear relationship

  log(Ti) = a + bXi + ei    (1.1)

The OLS parameter estimates (â, b̂) are the solution to min_{a,b} Σ_{i=1}^{n} (ei)², where a is the vertical intercept and b is the slope of the least squares line.

Case (a): Exclude censored cases altogether.

**CHART TO INSERT** Sample data cloud less dense everywhere, but disproportionately at higher t; sample OLS line has wrong slope

Case (b): Treat censored durations as if complete.

**CHART TO INSERT** Under-recording – especially at higher t; sample OLS line has wrong slope again

1.3.2 Time-varying covariates and OLS

How can one handle time-varying covariates, given that OLS has only a single dependent variable and there are multiple values for the covariate in question? If one were to choose one value of a time-varying covariate at some particular time as 'representative', which of the various possibilities would one choose? For example: the value at the time just before transition? But completed survival times vary across people, and what would one do about censored observations (no transition observed)? Or the value of the variable when the spell started, as this is the only definition that is consistently defined? But then much information is thrown away. In sum, time-varying covariates require some special treatment in modelling.

1.3.3 'Structural' modelling and OLS

Our behavioural models of, for example, job search, marital search, etc., are framed in terms of decisions about whether to do something (and observed transitions reflect that choice). That is, models are not formulated in terms of completed spell lengths. Perhaps, then, we should model transitions directly.

1.3.4 Why not use binary dependent variable models rather than OLS?

Given the above problems, one might ask whether one could use instead a binary dependent regression model (e.g. logit, probit), i.e. simply modelling whether or not someone made a transition. (Observations with a transition would have a '1' for the dependent variable; censored observations would have a '0'.) However, this strategy is also potentially problematic: it takes no account of the differences in the time over which each person is at risk of experiencing the event. One could get around this – and thereby get round the censoring issue (and the structural modelling one) – by considering whether a transition occurred within some pre-specified interval of time (e.g. 12 months since the spell began).
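Returning to the right-censoring problem of Section 1.3.1: cases (a) and (b) can be illustrated with a small simulation — a sketch under assumed parameter values, not an example from the book:

```python
import math
import random

# Simulation sketch (parameter values assumed): true model
# log(T*) = a + b*X + e with b = -1, and right censoring at T = 3,
# so censoring bites hardest at long durations (low X here).
def ols_slope(xs, ys):
    # OLS slope: cov(x, y) / var(x)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(xs, ys))
    sxx = sum((xi - mx) ** 2 for xi in xs)
    return sxy / sxx

random.seed(1)
n = 20000
b_true = -1.0                                    # higher X -> shorter survival
x = [random.random() for _ in range(n)]
log_t_star = [0.5 + b_true * xi + random.gauss(0, 0.5) for xi in x]
log_c = math.log(3.0)                            # common censoring point
log_t_obs = [min(v, log_c) for v in log_t_star]  # T = min(T*, C)

b_full = ols_slope(x, log_t_star)                # infeasible: uses true times
b_naive = ols_slope(x, log_t_obs)                # case (b): censored as complete
kept = [(xi, v) for xi, v in zip(x, log_t_star) if v < log_c]
b_drop = ols_slope([p[0] for p in kept], [p[1] for p in kept])  # case (a)

print(round(b_full, 3), round(b_naive, 3), round(b_drop, 3))
```

Both mistreatments shift the estimated slope in the same direction — attenuated towards zero relative to the true value of −1 — because the long (censored) spells are concentrated at low X.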
With the fixed-interval approach just described, however, one still loses a large amount of information – in particular about when someone left, if she or he did so. Cross-tabulations of (banded) survival times against some categorical/categorised variable cannot be used for inference about the relationship between survival time and that variable for the same sorts of reasons, especially the censoring one. (Cross-tabulations of a dependent variable against each explanatory variable are often used with other sorts of data to explore relationships.)

In sum, the problems include: the dependent variable is mismeasured and censoring is not accounted for; and time-varying explanatory variables cannot be handled easily (current values may be misleading). The solution is to model survival times indirectly, via the so-called 'hazard rate', which is a concept related to the chances of making a transition out of the current state at each instant (or time period), conditional on survival up to that point. The rest of this book elaborates this strategy.

1.4 Outline of the book

The preceding sections have argued that, for survival analysis, we need methods that directly account for the sequential nature of the data, and that are able to handle censoring and incorporate time-varying covariates. Survival analysis is not simply about estimating model parameters, but also about interpreting them and drawing out their implications to the fullest extent – about the distribution of survival times, and about the relationship between differences in survival times and differences in characteristics (summarised by differences in values of explanatory variables). The aim is to indicate the principles behind the methods used, rather than to provide details (or proofs) about the statistical properties of estimators, or the numerical analysis methods required to actually derive them.

The hazard rate is defined more formally in Chapter 2, where I also draw attention to the intimate connections between the hazard, survivor, failure, and density functions. Chapter 3 discusses functional forms for the hazard rate, considering both continuous and discrete time models. I set out some of the most commonly-used specifications, and explain how models may be classified into two main types: proportional hazards or accelerated failure time models. I also show, for a selection of models, what the hazard function specification implies about the distribution of survival times (including the median and mean spell lengths).

In the subsequent chapters, we move from concepts to estimation. Chapter 4 discusses Kaplan-Meier (product-limit) and Lifetable estimators of the survivor and hazard functions. These are designed to fit functions for the sample as a whole, or for separate subgroups; they are not multivariate regression models. Multivariate regression models, in which differences in characteristics are incorporated via differences in covariate values, are the subject of Chapters 5–9. The problems with using OLS to model survival time data are shown to be resolved if one uses instead estimation based on the maximum likelihood principle or the partial likelihood principle. A continuing theme is that estimation has to take account of the sampling scheme that generates the observed survival time data. The chapters indicate how the likelihoods underlying estimation differ in each of the leading sampling schemes: random samples of spells (with or without right censoring), and left- and right-truncated spell data. Chapters 5 and 6 discuss continuous time and discrete time regression models respectively, estimated using maximum likelihood. We also examine how to incorporate time-varying covariates using 'episode splitting'. Chapter 7 introduces Cox's semiparametric proportional hazard model for continuous time data, estimated using partial likelihood.

Chapter 8 addresses the subject of unobserved heterogeneity (otherwise known as 'frailty'). In the models considered in the earlier chapters, it is implicitly assumed that all relevant differences between individuals can be summarised by the observed explanatory variables. But what if there are unobserved or unobservable differences? The chapter discusses the impact that unobserved heterogeneity may have on estimates of regression coefficients and duration dependence, and outlines the methods that have been proposed to take account of the problem. Chapter 9 considers estimation of competing risks models. Earlier chapters consider models for exit to a single destination state; this chapter shows how one can model transitions to a number of mutually-exclusive destination states. Chapter 10 [not yet written!] discusses repeated spell data (the rest of the book assumes that the available data contain a single spell per subject). A list of references and appendices complete the book.
Chapter 2

Basic concepts: the hazard rate and survivor function

In this chapter, we define key concepts of survival analysis, namely the hazard function and the survivor function, and show that there is a one-to-one relationship between them. For now, we shall ignore differences in hazard rates across individuals in order to focus on how they vary with survival time. How to incorporate individual heterogeneity is discussed in the next chapter.

2.1 Continuous time

The length of a spell for a subject (person, firm, etc.) is a realisation of a continuous random variable T with a cumulative distribution function (cdf), F(t), and probability density function (pdf), f(t); t is the elapsed time since entry to the state at time 0.

Failure function (cdf):

  Pr(T ≤ t) = F(t)    (2.1)

which implies, for the Survivor function:

  Pr(T > t) = 1 − F(t) ≡ S(t).    (2.2)

F(t) is also known in the survival analysis literature as the failure function, and the survivor function is S(t) ≡ 1 − F(t). Observe that some authors use F(t) to refer to the survivor function; I use S(t) throughout.

*Insert chart* of pdf, showing cdf as area under curve
*Insert chart* of pdf in terms of cdf
The pdf is the slope of the cdf (Failure) function:

  f(t) = lim_{Δt→0} Pr(t ≤ T ≤ t + Δt)/Δt = ∂F(t)/∂t = −∂S(t)/∂t    (2.3)

where Δt is a very small ('infinitesimal') interval of time. The quantity f(t)Δt is akin to the unconditional probability of having a spell of length exactly t, i.e. leaving the state in the tiny interval of time [t, t + Δt].

The survivor function S(t) and the Failure function F(t) are each probabilities, and therefore inherit the properties of probabilities. In particular, the survivor function lies between zero and one, is equal to one at the start of the spell (t = 0), is zero at infinity, and is a strictly decreasing function of t:

  0 ≤ S(t) ≤ 1    (2.4)
  S(0) = 1    (2.5)
  lim_{t→∞} S(t) = 0    (2.6)
  ∂S/∂t < 0    (2.7)
  ∂²S/∂t² ⋛ 0.    (2.8)

The density function is non-negative,

  f(t) ≥ 0    (2.9)

but may be greater than one in value (the density function does not summarize probabilities).

2.1.1 The hazard rate

The hazard rate is a difficult concept to grasp, some people find. Let us begin with its definition, and then return to interpretation. The continuous time hazard rate, λ(t), is defined as:

  λ(t) = f(t)/[1 − F(t)] = f(t)/S(t).    (2.10)

Suppose that we let Pr(A) be the probability of leaving the state in the tiny interval of time between t and t + Δt, and Pr(B) be the probability of survival up to time t. Then the probability of leaving in the interval (t, t + Δt], conditional on survival up to time t, may be derived from the rules of conditional probability:

  Pr(A|B) = Pr(A ∩ B)/Pr(B) = Pr(B|A)·Pr(A)/Pr(B) = Pr(A)/Pr(B)    (2.11)

since Pr(B|A) = 1. But Pr(A)/Pr(B) = f(t)Δt/S(t), for tiny Δt. This expression is closely related to the expression that defines the hazard rate: compare it with (2.10):

  λ(t)Δt = f(t)Δt/S(t).    (2.12)

Thus λ(t)Δt, for tiny Δt, is akin to the conditional probability of having a spell length of exactly t, conditional on survival up to time t. It should be stressed, however, that the hazard rate is not a probability, as it refers to the exact time t and not the tiny interval thereafter. (Later we shall consider discrete time survival time data; we shall see that the discrete time hazard is a conditional probability.) The only restriction on the hazard rate, implied by the properties of f(t) and S(t), is that λ(t) ≥ 0. That is, λ(t) may be greater than one, in the same way that the probability density f(t) may be greater than one.

The probability density function f(t) summarizes the concentration of spell lengths (exit times) at each instant of time along the time axis. The hazard function summarizes the same concentration at each point of time, but conditions the expression on survival in the state up to that instant, and so can be thought of as summarizing the instantaneous transition intensity. Economists may recognise the expression for the hazard as being of the same form as the 'inverse Mills ratio' that is used when accounting for sample selection biases.

To understand the conditioning underpinning the hazard rate further, consider two examples. For a longevity example, contrast the unconditional probability of dying at age 12 (for all persons of a given birth cohort) with the probability of dying at age 12 given survival up to that age. For an unemployment example, contrast (i) the conditional probability λ(12)Δt, and (ii) the unconditional probability f(12)Δt. Expression (i) denotes the probability of (re)employment in the interval (12, 12 + Δt) for a person who has already been unemployed 12 months, whereas (ii) is the probability for an entrant to unemployment of staying unemployed for 12 months and leaving in the interval (12, 12 + Δt).

2.1.2 Key relationships between hazard and survivor functions

I now show that there is a one-to-one relationship between a specification for the hazard rate and a specification for the survivor function, and also f(t) and H(t). Indeed, in principle, one can start from any one of these different characterisations of the distribution and derive the others from it.
To demonstrate this, note that

  λ(t) = f(t)/[1 − F(t)]    (2.13)
       = −{∂[1 − F(t)]/∂t}/[1 − F(t)]    (2.14)
       = −∂{ln[1 − F(t)]}/∂t    (2.15)
       = −∂{ln[S(t)]}/∂t    (2.16)

using the fact that ∂ln[g(x)]/∂x = g′(x)/g(x) and S(t) = 1 − F(t). Now integrate both sides:

  ∫₀ᵗ λ(u)du = −ln[1 − F(t)] |₀ᵗ.    (2.17)

But F(0) = 0 and ln(1) = 0, so

  ∫₀ᵗ λ(u)du = −ln[1 − F(t)] = −ln[S(t)],    (2.18)

and hence

  S(t) = exp[−∫₀ᵗ λ(u)du]    (2.19)
       = exp[−H(t)]    (2.20)

where the integrated hazard function, H(t), is

  H(t) ≡ ∫₀ᵗ λ(u)du    (2.21)
       = −ln[S(t)].    (2.22)

Observe that H(t) ≥ 0 and ∂H(t)/∂t = λ(t). We have therefore demonstrated the one-to-one relationships between the various concepts: whatever functional form is chosen for λ(t), one can derive S(t) and F(t) from it, and also f(t) and H(t). In practice, one typically starts from considerations of the shape of the hazard rate. In Chapter 3, we take a number of common specifications for the hazard rate and derive the corresponding survivor functions, integrated hazard functions, and density functions.

2.2 Discrete time

As discussed earlier, discrete survival time data may arise because either (i) the time scale is intrinsically discrete, or (ii) survival occurs in continuous time but spell lengths are observed only in intervals ('grouped' or 'banded' data). Let us consider the latter case first.
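As a numerical sketch of these one-to-one relationships (my illustration, not the book's): for an Exponential distribution with rate 2, the hazard λ(t) = f(t)/S(t) is constant at 2 — greater than one, underlining that it is a rate rather than a probability — and exponentiating the negative of the numerically integrated hazard recovers S(t):

```python
import math

# Check of lambda(t) = f(t)/S(t) and S(t) = exp(-H(t)) for an illustrative
# Exponential(rate) distribution: f(t) = r*exp(-r*t), S(t) = exp(-r*t).
rate = 2.0

def S(t):
    return math.exp(-rate * t)

def f(t):
    return rate * math.exp(-rate * t)

def hazard(t):
    return f(t) / S(t)                  # lambda(t) = f(t)/S(t), constant here

def H(t, steps=100000):
    # trapezoidal rule for the integrated hazard, H(t) = integral of lambda
    h = t / steps
    total = 0.5 * (hazard(0.0) + hazard(t))
    for i in range(1, steps):
        total += hazard(i * h)
    return total * h

t = 1.5
print(hazard(t))                        # 2.0: a rate, not a probability
print(math.exp(-H(t)), S(t))            # the two agree: S(t) = exp(-H(t))
```

The same check works for any non-negative hazard specification; only the closed-form comparison changes.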
2.2.1 Survival in continuous time but spell lengths are interval-censored

*Insert chart (time scale)*

We suppose that the time axis may be partitioned into a number of contiguous non-overlapping ('disjoint') intervals whose boundaries are the dates a₀ = 0, a₁, a₂, a₃, ..., a_k. The intervals themselves are described by:

  [0 = a₀, a₁], (a₁, a₂], (a₂, a₃], ..., (a_{k−1}, a_k].¹    (2.23)

This definition supposes that interval (a_{j−1}, a_j] begins at the instant after the date marking its beginning (a_{j−1}); the date indexing the end of the interval (a_j) is included in the interval. Observe that a₁, a₂, a₃, ..., a_k are dates (points in time), and the intervals need not be of equal length (though we will later suppose for convenience that they are).

¹ One could, alternatively, define the intervals as [a_{j−1}, a_j). The choice is largely irrelevant in the development of the theory, though it can matter in practice, because different software packages use different conventions and this can lead to different results. (The differences arise when one splits spells into a sequence of sub-episodes – 'episode splitting'.) I have used the definition that is consistent with Stata's definition of intervals; the TDA package uses the other convention.

The value of the survivor function at the time demarcating the start of the jth interval is

  Pr(T > a_{j−1}) = 1 − F(a_{j−1}) = S(a_{j−1})    (2.24)

where F(·) is the Failure function defined earlier. The value of the survivor function at the end of the jth interval is

  Pr(T > a_j) = 1 − F(a_j) = S(a_j).    (2.25)

The probability of exit within the jth interval is

  Pr(a_{j−1} < T ≤ a_j) = F(a_j) − F(a_{j−1}) = S(a_{j−1}) − S(a_j).    (2.26)

The interval hazard rate, h(a_j), also known as the discrete hazard rate, is the probability of exit in the interval (a_{j−1}, a_j], for j = 1, ..., k, and is defined as:

  h(a_j) = Pr(a_{j−1} < T ≤ a_j | T > a_{j−1})    (2.27)
         = Pr(a_{j−1} < T ≤ a_j)/Pr(T > a_{j−1})    (2.28)
         = [S(a_{j−1}) − S(a_j)]/S(a_{j−1})    (2.29)
         = 1 − S(a_j)/S(a_{j−1}).    (2.30)

Note that the interval hazard is a (conditional) probability, and so

  0 ≤ h(a_j) ≤ 1.    (2.31)
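A small numerical sketch (with an assumed survivor function, not one from the text): for S(t) = exp(−0.25t) and unit-length intervals, the interval hazard h(a_j) = 1 − S(a_j)/S(a_{j−1}) is the same in every interval, as expected for a constant-hazard process:

```python
import math

# Interval hazards from a continuous time survivor function (assumed here
# to be S(t) = exp(-0.25 t)), with unit-length intervals:
# h(a_j) = 1 - S(a_j)/S(a_{j-1}).
rate = 0.25

def S(t):
    return math.exp(-rate * t)

boundaries = [0, 1, 2, 3, 4]
hazards = [1.0 - S(boundaries[j]) / S(boundaries[j - 1])
           for j in range(1, len(boundaries))]
print([round(h, 4) for h in hazards])   # each interval: 1 - exp(-0.25) ~ 0.2212
```

Note that even though the continuous hazard here is a rate (0.25 per unit time), each interval hazard is a genuine probability lying between zero and one.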
In this respect, the discrete hazard rate differs from the continuous time hazard rate λ(u), for which λ(u) ≥ 0 and which may be greater than one.

Although the definition used so far refers to intervals that may, in principle, be of different lengths, in practice it is typically assumed that intervals are of equal unit length, for example a 'week' or a 'month'. In this case, interval (a_{j−1}, a_j] may be relabelled (j−1, j], intervals can be indexed using the positive integers, and the end of the jth interval is date a_j = j, for a_j = 1, 2, 3, 4, and so on. We may refer to this as the jth interval. We shall assume that intervals are of unit length from now on, unless otherwise stated, and refer to the discrete time survivor function as S(j) rather than S(a_j), and to the interval hazard rate as h(j) rather than h(a_j). We shall reserve the notation S(a_j) or, more generally, S(t), to refer to the continuous time survivor function, indexed by a date – an instant of time – rather than an interval of time. (Of course the two functions imply exactly the same value since, by construction, the end of the jth interval is date a_j.)

The probability of survival until the end of interval j is the product of the probabilities of not experiencing the event in each of the intervals up to and including the current one. For example, S₃ = (probability of survival through interval 1) × (probability of survival through interval 2, given survival through interval 1) × (probability of survival through interval 3, given survival through interval 2). More generally, we have:

  S(j) = S_j = (1 − h₁)(1 − h₂)···(1 − h_{j−1})(1 − h_j)    (2.32)
       = Π_{k=1}^{j} (1 − h_k).    (2.33)

In the special case in which the hazard rate is constant over time (in which case survival times follow a Geometric distribution), i.e. h_j = h, all j, then

  S(j) = (1 − h)^j    (2.34)
  log[S(j)] = j·log(1 − h).    (2.35)

The discrete time failure function, F(j), written in terms of interval hazard rates, is

  F(j) = F_j = 1 − S(j)    (2.36)
       = 1 − Π_{k=1}^{j} (1 − h_k).    (2.37)
The discrete time density function for the interval-censored case, f(j), is the probability of exit within the jth interval:

  f(j) = Pr(a_{j−1} < T ≤ a_j) = S(j−1) − S(j)    (2.38)
       = S(j)/(1 − h_j) − S(j)
       = [h_j/(1 − h_j)]·Π_{k=1}^{j} (1 − h_k).    (2.39)

Thus the discrete density is the probability of surviving up to the end of interval j−1, multiplied by the probability of exiting in the jth interval. Expression (2.39) is used extensively in later chapters when deriving expressions for sample likelihoods. Observe that

  0 ≤ f(j) ≤ 1.    (2.40)

2.2.2 The discrete time hazard when time is intrinsically discrete

In the case in which survival times are intrinsically discrete, survival time T is now a discrete random variable with probabilities

  f(j) ≡ f_j = Pr(T = j)    (2.41)

where j ∈ {1, 2, 3, ...}, the set of positive integers. Note that j now indexes 'cycles' rather than intervals of equal length, but we can apply the same notation: we index survival times using the set of positive integers in both cases. The discrete time survivor function for cycle j, showing the probability of survival for j cycles, is given by:

  S(j) = Pr(T > j) = Σ_{k=j+1}^{∞} f_k.    (2.42)

The discrete time hazard at j, h(j), is the conditional probability of the event at j (with conditioning on survival until completion of the cycle immediately before the cycle at which the event occurs):

  h(j) = Pr(T = j | T ≥ j)    (2.43)
       = f(j)/S(j−1).    (2.44)
It is more illuminating to write the discrete time survivor function analogously to the expression for the equal-length interval case discussed above (for survival to the end of interval j), i.e.:

  S_j = S(j) = (1 − h₁)(1 − h₂)···(1 − h_{j−1})(1 − h_j)    (2.45)
      = Π_{k=1}^{j} (1 − h_k).    (2.46)

The discrete time failure function is:

  F_j = F(j) = 1 − S(j)    (2.47)
      = 1 − Π_{k=1}^{j} (1 − h_k).    (2.48)

Observe that the discrete time density function can also be written as in the interval-censored case, i.e.:

  f(j) = h_j·S_{j−1} = h_j·S_j/(1 − h_j).    (2.49)

2.2.3 The link between the continuous time and discrete time cases

In the discrete time case, we have from (2.45) that:

  log S(j) = Σ_{k=1}^{j} log(1 − h_k).    (2.50)

For 'small' h_k, a first-order Taylor series approximation may be used to show that

  log(1 − h_k) ≈ −h_k    (2.51)

which implies, in turn, that

  log S(j) ≈ −Σ_{k=1}^{j} h_k.    (2.52)

Now contrast this expression with that for the continuous time case, and note the parallels between the summation over discrete hazard rates and the integration over continuous time hazard rates:

  log S(t) = −H(t) = −∫₀ᵗ λ(u)du.    (2.53)

As h_k becomes smaller and smaller, the closer the discrete time hazard h_j is to the continuous time hazard λ(t) and, correspondingly, the closer the discrete time survivor function tends to the continuous time one.
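The formulas of this section can be checked numerically (hazard values assumed for illustration): the survivor function as a product of one minus the interval hazards, the identity h(j) = f(j)/S(j−1), and the quality of the approximation log S(j) ≈ −Σ h_k for small versus large hazards:

```python
import math

# Sketch (assumed hazard values): S(j) = prod_{k<=j} (1 - h_k),
# the identity h(j) = f(j)/S(j-1), and the small-hazard approximation
# log S(j) ~ -sum h_k that links discrete to continuous time.
def survivor_path(hazards):
    s, out = 1.0, [1.0]              # out[j] = S(j), with S(0) = 1
    for h in hazards:
        s *= (1.0 - h)               # survive interval k with prob. 1 - h_k
        out.append(s)
    return out

hazards = [0.1, 0.2, 0.25, 0.4]
S = survivor_path(hazards)
print([round(v, 4) for v in S[1:]])  # S(1)..S(4): 0.9, 0.72, 0.54, 0.324

# Recover each hazard from the survivor function, h(j) = f(j)/S(j-1),
# where the discrete density is f(j) = S(j-1) - S(j):
for j in range(1, len(S)):
    f_j = S[j - 1] - S[j]
    assert abs(f_j / S[j - 1] - hazards[j - 1]) < 1e-12

# log S(j) vs -sum h_k: close for small hazards, poor for large ones
for hs in ([0.01] * 12, [0.3] * 12):
    exact = math.log(survivor_path(hs)[-1])
    approx = -sum(hs)
    print(round(exact, 4), round(approx, 4))
```

With twelve hazards of 0.01 the exact and approximate log-survivor values almost coincide, whereas with hazards of 0.3 they diverge visibly — mirroring the limit argument above.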
2.3 Choosing a specification for the hazard rate

The empirical analyst with survival time data to hand has choices to make before analyzing them. First, should the survival times be treated as observations on a continuous random variable, as observations on a continuous random variable which is grouped (interval censoring), or as observations on an intrinsically discrete random variable? Second, conditional on that choice, what is the shape of the all-important relationship between the hazard rate and survival time? Let us consider these questions in turn.

2.3.1 Continuous or discrete survival time data?

The answer to this question may often be obvious, and clear from consideration of both the underlying behavioural process generating survival times and the process by which the data were recorded. Intrinsically discrete survival times are rare in the social sciences. The vast majority of the behavioural processes that social scientists study occur in continuous time, but it is common for the data summarizing spell lengths to be recorded in grouped form. Indeed virtually all data are grouped (even with survival times recorded in units as small as days or hours).

A key issue, then, is the length of the intervals used for grouping relative to the typical spell length: the smaller the ratio of the former to the latter, the more appropriate it is to use a continuous time specification. If one has information about the day, month, and year in which a spell began, and also the day, month, and year at which subjects were last observed – so survival times are measured in days – and the typical spell length is several months or years, then it is reasonable to treat survival times as observations on a continuous random variable (not grouped). But if spell lengths are typically only a few days long, then recording them in units of days implies substantial grouping, and it would make sense to use a specification that accounted for the interval censoring. For some analysis of the effects of grouping, see Bergström and Edin (1992), Petersen (1991), and Petersen and Koput (1992).

A related issue concerns 'tied' survival times – more than one individual in the data set with the same recorded survival time. A relatively high prevalence of ties may indicate that the banding of survival times should be taken into account when choosing the specification. Historically, many of the methods developed for the analysis of survival time data assumed that the data set contained observations on a continuous random variable (and arose in applications where this assumption was reasonable). Application of these methods to social science data, often interval-censored, was not necessarily appropriate. Today, this is much less of a problem: methods for handling interval-censored data (or intrinsically discrete data) are increasingly available, and one of the aims of this book is to promulgate them.
2.3.2 The relationship between the hazard and survival time

The only restriction imposed so far on the continuous time hazard rate is λ(t) ≥ 0, and on the discrete time hazard rate is 0 ≤ h(j) ≤ 1. Nothing has been assumed about the pattern of duration dependence – how the hazard varies with survival time. The following desiderata suggest themselves:

– A shape that is empirically relevant (or suggested by theoretical models). This is likely to differ between applications: the functional form describing human mortality is likely to differ from that for transitions out of unemployment, for failure of machine tools, and so on.
– A shape that has convenient mathematical properties, e.g. closed form expressions for λ(t), f(t), and S(t), and tractable expressions for summary statistics of survival time distributions such as the mean and median survival time.

2.3.3 What guidance from economics?

As an illustration of the first point, consider the extent to which economic theory might provide suggestions for what the shape of the hazard rate is like.

A simple structural model

Consider a two-state labour market, where the two states are (1) employment and (2) unemployment, so that the only way to leave unemployment is by becoming employed. To leave unemployment requires that an unemployed person both receives a job offer, and that that offer is acceptable. Thus we may write the unemployment exit hazard rate θ(t) as the product of the job offer hazard λ(t) and the job acceptance hazard A(t):

  θ(t) = λ(t)A(t).    (2.54)

In a simple job search framework, the unemployed person searches across the distribution of wage offers, and the optimal policy is to adopt a reservation wage r, accepting a job offer with associated wage w only if w ≥ r. Hence,

  θ(t) = λ(t)[1 − W(r(t))]    (2.55)

where W(·) is the cdf of the wage offer distribution facing the worker. (The job offer probability is conventionally considered to be under the choice of firms, and the acceptance probability dependent on the choice of workers.) How the re-employment hazard varies with duration thus depends on:

1. How the reservation wage varies with duration of unemployment. In a finite horizon world, one would expect r to decline with the duration of unemployment. (In an infinite horizon world one would expect r to be constant.)
Time limits on eligibility to Unemployment Insurance (UI) may lead to a bene…t exhaustion e¤ect. t) . unemployment bene…ts may vary with duration t. for the fact that: 1. That is we allow. (It is
In sum. and 3. Reduced form approach An alternative interpretation is that the simple search model (or even more sophisticated variants) place too many restrictions on the hazard function. for example). with the reemployment hazard ( ) rising as the time limit approaches. Hence one may want to use a reduced form approach.)
23
varies with duration of unemployment.2. and maybe also calendar time s (because of policy changes. s) . The actual shape of the hazard will re‡ a mixture of these e¤ects. and 2. Observe the variety of potential in‡ uences. t. Moreover. the simple search model provides some quite strong restrictions on the hazard rate (if in‡ uences via are negligible). in a more ad hoc way. What
. How the job o¤er hazard unclear what to expect. may also vary directly with survival time.56)
where X is a vector of personal characteristics that may vary with unemployment duration (t) or with calendar time (s). leading to decline in search intensity: @ =@t < 0. CHOOSING A SPECIFICATION FOR THE HAZARD RATE 2.
Examples of this include Employers screening unemployed applicants on the basis of how long each applicant has been unemployed. Discouragement (or a ‘ welfare culture’or ‘ bene…t dependence’e¤ect) may set in as the unemployment spell lengthens.3. the restrictions may be consequence of the simplifying assumptions used to make the model tractable. local labour market conditions may vary with calendar time (s).e. and write the hazard rate more generally as (t) = (X (t. This suggests that it is ect important not to preimpose particular shape on the hazard function. (2. i. rather than intrinsic. for example rejecting the longerterm unemployed) : @ =@t < 0: The reservation wage falling with unemployment duration: @A=@t > 0 (a resource e¤ect). some of the in‡ uences mentioned would imply that the hazard rises with unemployment duration. whereas others imply that the hazard declines with duration.
) Although the examples above referred to modelling of unemployment duration.
. BASIC CONCEPTS: THE HAZARD RATE AND SURVIVOR FUNCTION one needs to do is strike a balance between what a theoretical (albeit simpli…ed) model might suggest. This will improve model …t. much the same issues for model selection are likely to arise in other contexts.24CHAPTER 2. one cannot identify which in‡ uence has which e¤ect so easily. though of course problems of interpretation may remain. (Without the structural model. and ‡ exibility in speci…cation and ease of estimation.
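The decomposition of the exit hazard into an offer hazard and an acceptance probability can be sketched numerically. All functional forms and parameter values below are illustrative assumptions, not taken from the text; the point is only that opposing duration effects in the offer rate and the reservation wage leave the net shape of θ(t) ambiguous.

```python
import math

# Hypothetical ingredients of the simple search model (all names and
# values are illustrative assumptions):
def offer_hazard(t):
    """Job offer arrival rate lambda(t); here assumed to decline with duration."""
    return 0.5 * math.exp(-0.05 * t)

def wage_offer_cdf(w):
    """W(w): cdf of the wage offer distribution; here exponential with mean 10."""
    return 1.0 - math.exp(-w / 10.0)

def reservation_wage(t):
    """r(t): reservation wage, assumed to fall with unemployment duration."""
    return 8.0 * math.exp(-0.02 * t)

def exit_hazard(t):
    """theta(t) = lambda(t) * [1 - W(r(t))]: offer rate times acceptance probability."""
    return offer_hazard(t) * (1.0 - wage_offer_cdf(reservation_wage(t)))

# The acceptance probability rises as r(t) falls, while the offer rate falls:
# the net effect on theta(t) is ambiguous, as the text emphasises.
print(exit_hazard(0.0), exit_hazard(12.0))
```

With these particular assumptions the declining offer rate dominates, but a small change in the parameters can reverse that, which is precisely why no shape should be pre-imposed.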
Chapter 3
Functional forms for the hazard rate
3.1 Introduction and overview: a taxonomy
The last chapter suggested that there is no single shape for the hazard rate that is appropriate in all contexts. In this chapter we review the functional forms for the hazard rate that have been most commonly used in the literature. What this means in terms of survival, density and integrated hazard functions is examined in the following chapter. We begin with an overview and taxonomy, and then consider continuous time and discrete time specifications separately: see Table 3.1. The entries in the table in the emphasized font are the ones that we shall focus on. One further distinction between the specifications, discussed later in the chapter, refers to their interpretation: whether the models can be described as proportional hazards models (PH) or accelerated failure time models (AFT). Think of the PH and AFT classifications as describing whole families of survival time distributions, where each member of a family shares a common set of features and properties. Note that although we specified hazard functions simply as a function of survival time in the last chapter, we now add an additional dimension to the specification, allowing the hazard rate to vary between individuals, depending on their characteristics.
Continuous time parametric: Exponential, Weibull, Log-logistic, Lognormal, Gompertz, Generalized Gamma
Continuous time semi-parametric: Piecewise-constant Exponential, 'Cox' model
Discrete time: Logistic, Complementary log-log

Table 3.1: Functional forms for the hazard rate: examples
3.2 Continuous time specifications

In the previous chapter, we referred to the continuous time hazard rate θ(t) and ignored any potential differences in hazard rates between individuals. Now we remedy that. We suppose that the characteristics of a given individual may be summarized by a vector of variables X and, correspondingly, now refer to the hazard function θ(t, X), survivor function S(t, X), integrated hazard function H(t, X), and so on. The way in which heterogeneity is incorporated is as follows. We define a linear combination of the characteristics:

β′X ≡ β0 + β1 X1 + β2 X2 + β3 X3 + ... + βK XK.  (3.1)

There are K variables observed for each person, and the βs are parameters, later to be estimated. Observe that the linear index β′X, when evaluated, is simply a single number for each individual. For the moment, suppose that the values of the Xs do not vary with survival or calendar time, i.e. there are no time-varying covariates.
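The linear index of (3.1), and the person-specific factor exp(β′X) used throughout this chapter, amount to a dot product followed by exponentiation. A minimal sketch, with made-up coefficients and covariate values:

```python
import math

def linear_index(beta, x):
    """beta'X = beta0 + beta1*X1 + ... + betaK*XK (x excludes the constant)."""
    return beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))

# Illustrative values (assumptions, not estimates from the text):
beta = [-2.0, 0.3, -0.1]   # beta0, beta1, beta2
x = [1.0, 4.0]             # X1, X2 for one individual

lam = math.exp(linear_index(beta, x))  # a single non-negative number per person
print(lam)
```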
3.2.1 Weibull model and Exponential model

The Weibull model is specified as:

θ(t, X) = λ α t^(α-1)  (3.2)
        = α t^(α-1) exp(β′X)  (3.3)

where λ ≡ exp(β′X), α > 0, and exp(·) is the exponential function. The hazard rate either rises monotonically with time (α > 1), falls monotonically with time (α < 1), or is constant. The last case, α = 1, is the special case of the Weibull model known as the Exponential model. For a given value of α, larger values of λ imply a larger hazard rate at each survival time. α is the shape parameter.

Beware: different normalisations and notations are used by different authors. For example, you may see the Weibull hazard written as θ(t, X) = (1/σ) t^((1/σ)-1) λ, where σ ≡ 1/α. (The reason for this will become clearer in the next section.) Cox and Oakes (1984) characterise the Weibull model as θ(t, X) = κ λ (λt)^(κ-1) = κ λ^κ t^(κ-1).

*Insert charts* showing how the hazard varies with (i) variations in λ, with α fixed; (ii) variations in α, with λ fixed.
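The three Weibull regimes can be verified directly from the hazard formula above. A small sketch; the values of α and λ are arbitrary:

```python
import math

def weibull_hazard(t, alpha, lam):
    """theta(t, X) = lam * alpha * t**(alpha - 1), with lam = exp(beta'X)."""
    return lam * alpha * t ** (alpha - 1.0)

lam = 0.1  # illustrative value of exp(beta'X)
rising = [weibull_hazard(t, 2.0, lam) for t in (1.0, 2.0, 3.0)]
falling = [weibull_hazard(t, 0.5, lam) for t in (1.0, 2.0, 3.0)]
constant = [weibull_hazard(t, 1.0, lam) for t in (1.0, 2.0, 3.0)]  # Exponential case

assert rising[0] < rising[1] < rising[2]        # alpha > 1: monotonically increasing
assert falling[0] > falling[1] > falling[2]     # alpha < 1: monotonically decreasing
assert constant[0] == constant[1] == constant[2]  # alpha = 1: constant at lam
```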
3.2.2 Gompertz model

The Gompertz model has a hazard function given by:

θ(t, X) = λ exp(γt)  (3.4)

and so the log of the hazard is a linear function of survival time:

log θ(t, X) = β′X + γt.  (3.5)

If the shape parameter γ > 0, then the hazard is monotonically increasing; if γ = 0, it is constant; and if γ < 0, the hazard declines monotonically.

*Insert charts* showing how the hazard varies with (i) variations in λ, with γ fixed; (ii) variations in γ, with λ fixed.
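The log-linearity in (3.5) can be checked directly: equal steps in t change the log hazard by the same amount γ. Parameter values below are illustrative assumptions:

```python
import math

def gompertz_hazard(t, gamma, lam):
    """theta(t, X) = lam * exp(gamma * t), with lam = exp(beta'X)."""
    return lam * math.exp(gamma * t)

lam, gamma = 0.2, 0.1  # illustrative values

# log theta(t, X) = beta'X + gamma*t, so successive unit steps in t
# raise the log hazard by exactly gamma:
log_hazards = [math.log(gompertz_hazard(t, gamma, lam)) for t in (0.0, 1.0, 2.0)]
assert abs((log_hazards[1] - log_hazards[0]) - gamma) < 1e-12
assert abs((log_hazards[2] - log_hazards[1]) - gamma) < 1e-12
```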
3.2.3 Log-logistic model

To specify the next model, we use the parameterisation ψ ≡ exp(-β*′X), where

β*′X ≡ β*0 + β*1 X1 + β*2 X2 + β*3 X3 + ... + β*K XK.  (3.6)

The reason for using a parameterization based on ψ and β*, rather than on λ and β as for the Weibull model, will become apparent later, when we compare proportional hazard and accelerated failure time models. The Log-logistic model hazard rate is:

θ(t, X) = [ψ^(1/γ) t^((1/γ)-1) (1/γ)] / [1 + (ψt)^(1/γ)]  (3.7)

where the shape parameter γ > 0. An alternative representation is:

θ(t, X) = [φ ψ^φ t^(φ-1)] / [1 + (ψt)^φ]  (3.8)

where φ ≡ 1/γ. Klein and Moeschberger (1997) characterise the log-logistic model hazard function as θ(t, X) = [φ λ t^(φ-1)] / [1 + λ t^φ], which is equivalent to the expression above with their λ = ψ^φ.

*Insert chart* of hazard rate against time. Observe that the hazard rate is monotonically decreasing with survival time for γ ≥ 1 (i.e. φ ≤ 1). If γ < 1 (i.e. φ > 1), then the hazard first rises with time and then falls monotonically.
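The two shape regimes of the log-logistic hazard can be checked numerically from the φ-representation above; the values of ψ and γ are arbitrary:

```python
import math

def loglogistic_hazard(t, gamma, psi):
    """theta(t, X) = phi * psi**phi * t**(phi - 1) / (1 + (psi*t)**phi), phi = 1/gamma."""
    phi = 1.0 / gamma
    return (phi * psi ** phi * t ** (phi - 1.0)) / (1.0 + (psi * t) ** phi)

psi = 1.0  # illustrative value

# gamma >= 1 (phi <= 1): the hazard declines monotonically
dec = [loglogistic_hazard(t, 2.0, psi) for t in (0.5, 1.0, 2.0, 4.0)]
assert all(a > b for a, b in zip(dec, dec[1:]))

# gamma < 1 (phi > 1): the hazard first rises, then falls
hump = [loglogistic_hazard(t, 0.5, psi) for t in (0.25, 1.0, 4.0)]
assert hump[0] < hump[1] and hump[1] > hump[2]
```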
3.2.4 Lognormal model

This model has hazard rate

θ(t, X) = { [1/(tσ√(2π))] exp( -(1/2) [(ln t - μ)/σ]^2 ) } / { 1 - Φ[(ln t - μ)/σ] }  (3.9)

where Φ(·) is the standard Normal cumulative distribution function and characteristics are incorporated with the parameterisation μ = β*′X. The hazard rate is similar to that for the Log-logistic model for the case γ < 1 (i.e. first rising and then declining).

*Insert chart* of hazard rate against time.

3.2.5 Generalised Gamma model

This model has a rather complicated specification involving two shape parameters. Let us label them κ and σ, as in the Stata manual: they are the shape and scale parameters respectively. The hazard function is quite flexible in shape, even including the possibility of a U-shaped or so-called 'bathtub'-shaped hazard (commonly cited as a plausible description for the hazard of human mortality looking at the lifetime as a whole). The Generalised Gamma incorporates several of the other models as special cases. If κ = 1, σ = 1, we have the Exponential model. If κ = 1, we have the Weibull model. With κ = 0, the Lognormal model results. And if κ = σ, then one has the standard Gamma distribution. These relationships mean that the Generalised Gamma is useful for testing model specification: by estimating this general model, one can use a Wald (or likelihood ratio) test to investigate whether one of the nested models provides a satisfactory fit to the data. Note that a number of other parametric models have been proposed; for example, there is the Generalized F distribution (see Kalbfleisch and Prentice 1980).

3.2.6 Proportional Hazards (PH) models

Let us now return to the general case of hazard rate θ(t, X), i.e. the hazard rate at survival time t for a person with fixed covariates summarised by the vector X.

The PH specification

Proportional hazards models are also known as 'multiplicative hazard' models, or 'relative hazard' models, for reasons that will become apparent shortly. The models are characterised by their satisfying a separability assumption:

θ(t, X) = θ0(t) exp(β′X) = θ0(t) λ  (3.10)

where θ0(t) is the 'baseline hazard' function, which depends on t (but not X): it summarizes the pattern of 'duration dependence' assumed to be common to all persons. λ = exp(β′X) is a person-specific non-negative function of covariates X (which does not depend on t, by construction), which scales the baseline hazard function common to all persons. In principle, any non-negative function might be used to summarise the effects of differences in personal characteristics, but since exp(·) is the function that is virtually always used, we will use it.

An alternative characterization writes the proportional hazard model as

θ(t, X) = θ̄0(t) λ̄  (3.11)

where θ̄0(t) = θ0(t) exp(β0) and

λ̄ = exp(β1 X1 + β2 X2 + β3 X3 + ... + βK XK).  (3.12)

The rationale for this alternative representation is that the intercept term (β0) is common to all individuals, and is therefore not included in the term summarizing individual heterogeneity. By contrast, the first representation treats the intercept as a regression parameter like the other elements of β. Note that I have used one particular convention for the definition of the baseline hazard function; in most cases, it does not matter which characterization is used.

Interpretation

The PH property implies that absolute differences in X imply proportionate differences in the hazard at each t. For some t = t̄, and for two persons i and j with vectors of characteristics Xi and Xj,

θ(t̄, Xi) / θ(t̄, Xj) = exp[β′(Xi - Xj)].  (3.13)

We can also write (3.13) in 'relative hazard' form:

log[θ(t̄, Xi) / θ(t̄, Xj)] = β′(Xi - Xj).  (3.14)

Observe that the right-hand side of these expressions does not depend on survival time (by assumption the covariates are not time-dependent), i.e. the proportional difference in hazards is constant. If persons i and j are identical on all but the kth characteristic, i.e. Xim = Xjm for all m ∈ {1, ..., K}\{k}, then

θ(t̄, Xi) / θ(t̄, Xj) = exp[βk (Xik - Xjk)].  (3.15)

If, in addition, Xik - Xjk = 1, i.e. there is a one-unit change in Xk, ceteris paribus, then

θ(t̄, Xi) / θ(t̄, Xj) = exp(βk).  (3.16)

The right-hand side of this expression is known as the hazard ratio. It also shows the proportionate change in the hazard given a change in a dummy variable covariate from zero to one or, more precisely, a change from Xik = 0 to Xjk = 1.

There is a nice interpretation of the regression coefficients in PH models. The coefficient on the kth covariate, βk, has the property

βk = ∂ log θ(t, X) / ∂Xk  (3.17)

which tells us that in a PH model, each regression coefficient summarises the proportional effect on the hazard of absolute changes in the corresponding covariate. This effect does not vary with survival time. Alternatively, we can relate βk to the elasticity of the hazard with respect to Xk. Recall that an elasticity summarizes the proportional response in one variable to a proportionate change in another variable, and so is given by Xk ∂ log θ(t, X)/∂Xk = βk Xk. If Xk ≡ log(Zk), so that the covariate is measured in logs, then it follows that βk is the elasticity of the hazard with respect to Zk. If Stata is used to estimate a PH model, you can choose to report estimates of either β̂k or exp(β̂k), for each k.

Some further implications of the PH assumption

If θ(t, X) = θ0(t)λ, then, for two persons with the same X = X̄ but different survival times t and u,

θ(t, X̄) / θ(u, X̄) = θ0(t) / θ0(u)  (3.18)

so the ratio of hazards does not depend on X in this case. In addition, if the PH specification applies, then

S(t, X) = exp[ -∫₀ᵗ θ(u, X) du ]  (3.19)
        = exp[ -λ ∫₀ᵗ θ0(u) du ]  (3.20)
        = [S0(t)]^λ  (3.21)

given the 'baseline' survivor function

S0(t) ≡ exp[ -∫₀ᵗ θ0(u) du ].  (3.22)

Thus log S(t, X) = λ log S0(t): in the PH model, differences in characteristics imply a scaling of the common baseline survivor function. Given the relationships between the survivor function and the probability density function, we also find that for a PH model

f(t) = f0(t) λ [S0(t)]^(λ-1)  (3.23)

where f0(t) is the baseline density function, f0(t) = f(t | X = 0).
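The two key PH properties, the constant hazard ratio and the survivor-function scaling S(t, X) = [S0(t)]^λ, can be verified numerically. The Weibull baseline and the coefficient values below are illustrative assumptions:

```python
import math

alpha = 1.5              # assumed Weibull shape for the baseline theta0(t) = alpha*t**(alpha-1)
beta = [0.4, -0.2]       # illustrative coefficients for two covariates

def hazard(t, x):
    """theta(t, X) = theta0(t) * exp(beta'X): the PH separability assumption."""
    lam = math.exp(sum(b * xi for b, xi in zip(beta, x)))
    return alpha * t ** (alpha - 1.0) * lam

def survivor(t, x):
    """S(t, X) = S0(t)**lam, with baseline S0(t) = exp(-t**alpha)."""
    lam = math.exp(sum(b * xi for b, xi in zip(beta, x)))
    return math.exp(-t ** alpha) ** lam

# Persons i and j differ only in the first covariate, by one unit:
x_i, x_j = [1.0, 2.0], [0.0, 2.0]

# Hazard ratio: a one-unit change in X1 multiplies the hazard by exp(beta1),
# at every survival time t.
for t in (0.5, 1.0, 3.0):
    assert abs(hazard(t, x_i) / hazard(t, x_j) - math.exp(beta[0])) < 1e-12
```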
The relationship between the survivor function and the baseline survivor function also implies that (3.19) may be rewritten in terms of the integrated hazard function H(t) = -ln S(t), implying

H(t) = λ H0(t)  (3.24)

where H0(t) = -ln S0(t). Hence

ln[-ln S(t)] = ln λ + ln(-ln[S0(t)])  (3.25)
             = β′X + ln(-ln[S0(t)])  (3.26)

or, rewritten in terms of the integrated hazard function,

ln[H(t)] = β′X + ln[H0(t)].  (3.27)

The left-hand side is the log of the integrated hazard function at survival time t. The result suggests that one can check informally whether a model satisfies the PH assumption by plotting the log of the integrated hazard function against survival time (or a function of time) for groups of persons with different sets of characteristics. (The estimate of the integrated hazard would be derived using a non-parametric estimator such as the Nelson-Aalen estimator referred to in Chapter 4.) If the PH assumption holds, then the plots of the estimated ln[H(t)] against t for each of the groups should have different vertical intercepts but a common shape, moving in parallel. To see this, suppose that there are two groups, labelled A and B. Then

ln[HA(t)] = β′XA + ln[H0(t)]
ln[HB(t)] = β′XB + ln[H0(t)].

The group-specific intercepts are the numbers implied by β′XA and β′XB, whereas the common shape arises because the remaining term, the function of t, is common to both groups.

If the baseline hazard does not vary with survival time, i.e. θ0(u) = θ̄ for all u, then H0(t) = ∫₀ᵗ θ0(u) du = θ̄t. Hence, in the constant hazard rate case, a plot of the integrated hazard against survival time should yield a straight line through the origin. Indeed, if the integrated hazard plot is concave to the origin (rising but at a decreasing rate), then this suggests that the hazard rate declines with survival time rather than being constant. Similarly, if the integrated hazard plot is convex to the origin (rising but at an increasing rate), then this suggests that the hazard rate increases with survival time.

Incorporating time-varying covariates

It is relatively straightforward in principle to generalise the PH specification to allow for time-varying covariates. For example, suppose that

θ(t, Xt) = θ0(t) exp(β′Xt)  (3.28)

where there is now a 't' subscript on the vector of characteristics X. Observe that, for any given survival time t = t̄, we still have that absolute differences in X correspond to proportional differences in the hazard. However, the proportionality factor now varies with survival time rather than being constant (since the variables in X on the right-hand side of the equation are time-varying). The survivor function is more complicated too. From the standard relationship between the hazard and the survivor functions, we have:

S(t, Xt) = exp[ -∫₀ᵗ θ(u, Xu) du ]  (3.29)
         = exp[ -∫₀ᵗ θ0(u) exp(β′Xu) du ].  (3.30)

In general, the survivor function cannot be simply factored as in the case when covariates are constant. Progress can be made, however, by assuming that each covariate is constant within some predefined time interval. To illustrate this, suppose that there is a single covariate that takes on two values, depending on whether survival time is before or after some date s:

X = X1 if t < s
X = X2 if t ≥ s.  (3.31)

The survivor function (3.30) now becomes

S(t, Xt) = exp[ -∫₀ˢ θ0(u) exp(βX1) du - ∫ₛᵗ θ0(u) exp(βX2) du ]  (3.32)
         = exp[ -λ1 ∫₀ˢ θ0(u) du ] exp[ -λ2 ∫ₛᵗ θ0(u) du ]  (3.33)
         = exp[ -λ1 ∫₀ˢ θ0(u) du ] exp[ -λ2 ( ∫₀ᵗ θ0(u) du - ∫₀ˢ θ0(u) du ) ]  (3.34)
         = [S0(s)]^λ1 [S0(t)]^λ2 / [S0(s)]^λ2  (3.35)

where λ1 ≡ exp(βX1) and λ2 ≡ exp(βX2). The probability of survival until time t is the probability of survival until time s, times the probability of survival until time t conditional on survival up until time s (the conditioning is accounted for by the expression in the denominator of the second term in the product). Expressions of this form underpin the process of estimation of PH models with time-varying covariates. As we shall see in Chapter 5, there are two parts to practical implementation: (a) reorganisation of one's data set to incorporate the time-varying covariates ('episode splitting'), and (b) utilisation of estimation routines that allow for conditioned survivor functions such as those in the second term in (3.35), so-called 'delayed entry'.
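The factorisation of the survivor function for a covariate that switches value at date s can be checked numerically with a constant baseline hazard, so that the integrals have closed forms. All parameter values below are illustrative assumptions:

```python
import math

theta0, beta = 0.3, 0.5        # assumed constant baseline hazard and coefficient
X1, X2, s = 1.0, 2.0, 4.0      # covariate values before/after the switch date s
lam1, lam2 = math.exp(beta * X1), math.exp(beta * X2)

def S0(t):
    """Baseline survivor function for the constant baseline hazard theta0."""
    return math.exp(-theta0 * t)

def survivor(t):
    """Direct integration of the hazard: lam1 applies on (0, s], lam2 on (s, t]."""
    if t <= s:
        return math.exp(-lam1 * theta0 * t)
    return math.exp(-(lam1 * theta0 * s + lam2 * theta0 * (t - s)))

def survivor_factored(t):
    """S0(s)**lam1 * S0(t)**lam2 / S0(s)**lam2: survival to s, then survival
    from s to t conditional on having reached s."""
    return S0(s) ** lam1 * S0(t) ** lam2 / S0(s) ** lam2

for t in (5.0, 7.5, 10.0):
    assert abs(survivor(t) - survivor_factored(t)) < 1e-12
```

The second factor in `survivor_factored` is exactly the conditioned ('delayed entry') survivor function that estimation routines must handle after episode splitting.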
3.2.7 Accelerated Failure Time (AFT) models

Let us again assume for the moment that personal characteristics do not vary with survival time.

AFT specification

The AFT model assumes a linear relationship between the log of (latent) survival time T and characteristics X:

ln(T) = β*′X + z  (3.36)

where β* is a vector of parameters (cf. (3.6)), and z is an error term. This form is sometimes referred to as a generalized linear model (GLM) or log-linear model specification. The expression may be rewritten as

Y = β*′X + σu  (3.37)

or

(Y - β*′X)/σ = u  (3.38)

where Y ≡ ln(T), σ is a scale factor (which is related to the shape parameters for the hazard function: see below), and u ≡ z/σ is an error term with density function f(u). Distributional assumptions about u determine which sort of regression model describes the random variable T: see Table 3.2.

Distribution of u: Extreme Value (1 parameter); Extreme Value (2 parameter); Logistic; Normal; log Gamma (3 parameter Gamma)
Distribution of T: Exponential; Weibull; Log-logistic; Lognormal; Generalised Gamma

Table 3.2: Different error term distributions imply different AFT models

In the case of the Exponential distribution, the scale factor σ = 1, and f(u) = exp[u - exp(u)]. In the Weibull model, σ is a free parameter, with σ = 1/α to use our earlier notation, and the density f(u) = exp[u - exp(u)] in this case too. For the Lognormal model, the error term u is normally distributed, and we have a specification of a (log)linear model that appears to be similar to the standard linear model that is typically estimated using ordinary least squares. This is indeed the case: it is the example that we considered in the Introduction. The specification underscores the claim that the reason that OLS did not provide good estimates in the example was that it could not handle censored data, rather than the specification itself. (The trick to handle censored data turns out to hinge on a different method of estimation, i.e. maximum likelihood. See Chapter 5.) The table also indicates that the normal linear model might be inappropriate for an additional reason: the hazard rate function may have a shape other than that assumed by the lognormal specification. For the Log-logistic model, f(u) = exp(u)/[1 + exp(u)]^2, with free parameter σ = γ, to use our earlier notation.

Interpretation

But why the 'Accelerated Failure Time' label for these models? From (3.36), and letting ψ ≡ exp(-β*′X), it follows that:

ln(ψT) = z.  (3.39)

The term ψ acts like a time scaling factor, which is constant by assumption. The two key cases are as follows:

- ψ > 1: it is as if the clock ticks faster (the time scale for someone with characteristics X is ψT, whereas the time scale for someone with characteristics X = 0 is T). Failure is 'accelerated' (survival time shortened).
- ψ < 1: it is as if the clock ticks slower. Failure is 'decelerated' (survival time lengthened).

We can also see this time-scaling property directly in terms of the survivor function. Recall the definition of the survivor function:

S(t, X) = Pr[T > t | X].  (3.40)

Expression (3.40) is equivalent to writing

S(t, X) = Pr[Y > ln(t) | X]  (3.41)
        = Pr[σu > ln(t) - β*′X]  (3.42)
        = Pr[exp(σu) > t exp(-β*′X)] = Pr[exp(σu) > tψ].  (3.43)

Now define a 'baseline' survivor function, S0(t), which is the corresponding function when all covariates are equal to zero, i.e. X = 0. Thus:

S0(t) = Pr[T > t | X = 0].  (3.44)

This expression can be rewritten using (3.43) as

S0(t) = Pr[exp(σu) > t]  (3.45)

or

S0(s) = Pr[exp(σu) > s]  (3.46)

for any s. In particular, let s = tψ, and substitute this into the expression for S0(s). Comparison with (3.43) shows that we can rewrite the survivor function as:

S(t, X) = S0[t exp(-β*′X)]  (3.47)
        = S0(tψ)  (3.48)

where ψ ≡ exp(-β*′X). In sum, the effect of the covariates is to change the time scale by a constant (survival time-invariant) scale factor exp(-β*′X). It follows that ψ > 1 is equivalent to having β*′X < 0, and ψ < 1 is equivalent to having β*′X > 0.

When explaining this idea, Allison (1995, p. 62) provides an illustration based on longevity. The conventional wisdom is that one year for a dog is equivalent to seven years for a human, i.e. in terms of actual calendar years, dogs age faster than humans do. In terms of (3.48), if S(t, X) describes the survival probability of a dog, then S0(t) describes the survival probability of a human, and ψ = 7.

How may we interpret the coefficients β*k? Differentiation shows that

β*k = ∂ ln(T) / ∂Xk.  (3.49)

Thus an AFT regression coefficient relates proportionate changes in survival time to a unit change in a given regressor, with all other characteristics held fixed. (Contrast this with the interpretation of the coefficients in the PH model: they relate a one-unit change in a regressor to a proportionate change in the hazard rate, not survival time.) Recall from (3.36) that

T = exp(β*′X) exp(z).  (3.50)

If persons i and j are identical on all but the kth characteristic, i.e. Xim = Xjm for all m ∈ {1, ..., K}\{k}, and they have the same z, then

Ti / Tj = exp[β*k (Xik - Xjk)].  (3.51)

If, in addition, Xik - Xjk = 1, ceteris paribus, then

Ti / Tj = exp(β*k).  (3.52)

The quantity exp(β*k) is called the time ratio. Stata allows you to choose to report either β̂*k or exp(β̂*k), for each k.

Some further implications

Using the general result that θ(t) = -[∂S(t)/∂t]/S(t), the relationship between the hazard rate for someone with characteristics X and the hazard rate for the case when X = 0 (i.e. the 'baseline' hazard θ0(·)) is given for AFT models by:

θ(t, X) = ψ θ0(ψt).  (3.53)

Similarly, the probability density function for AFT models is given by:

f(t, X) = ψ θ0(ψt) S0(ψt) = ψ f0(ψt).  (3.54)

The Weibull model is the only model that satisfies both the PH and AFT assumptions

The Weibull distribution is the only distribution for which, with constant covariates, the PH and AFT models coincide. To show this requires finding the functional form which satisfies the restriction that

[S0(t)]^λ = S0(ψt).  (3.55)

See Cox and Oakes (1984, p. 71) for a proof. This property implies a direct correspondence between the parameters used in the Weibull PH and Weibull AFT representations. What are the relationships? Recall that the PH version of the Weibull model is

θ(t, X) = α t^(α-1) λ  (3.56)

with baseline hazard θ0(t) = α t^(α-1), which is sometimes written θ0(t) = (1/σ) t^((1/σ)-1) with σ = 1/α (see earlier for the reasons). Making the appropriate substitutions, one can show that the Weibull AFT coefficients (β*) are related to the Weibull PH coefficients (β) as follows:

β*k = -σ βk = -βk / α, for each k.  (3.58)

To see this for yourself, substitute the PH and AFT Weibull survivor functions into (3.55):

[exp(-t^α)]^λ = exp(-[t exp(-β*′X)]^α).

By taking logs of both sides, and rearranging expressions, you will find that this equality holds only if βk = -β*k/σ = -β*k α, for each k. In Stata, you can choose to report either β or β* (or the exponentiated values of each coefficient).

Incorporating time-varying covariates

Historically, AFT models have been specified assuming that the vector summarizing personal characteristics is time-invariant. Clearly it is difficult to incorporate time-varying covariates directly into the basic equation that relates log survival time to characteristics: see (3.36). However, they can be incorporated via the hazard function, and thence the survivor and density functions. A relatively straightforward generalisation of the AFT hazard to allow for time-varying covariates is to suppose that

θ(t, Xt) = ψt θ0(ψt t)  (3.59)

where there is now a 't' subscript on ψ ≡ exp(-β*′Xt). As in the PH case, we can factor the survivor function if we assume that each covariate is constant within some predefined time interval. To illustrate this, again suppose that there is a single covariate that takes on two values, depending on whether survival time is before or after some date s:

X = X1 if t < s
X = X2 if t ≥ s.  (3.60)

We therefore have

ψ1 ≡ exp(-β*X1) and ψ2 ≡ exp(-β*X2).  (3.61)

The survivor function now becomes

S(t, Xt) = exp[ -∫₀ˢ ψ1 θ0(uψ1) du - ∫ₛᵗ ψ2 θ0(uψ2) du ] = S0(sψ1) [S0(tψ2) / S0(sψ2)]  (3.62)

where manipulations similar to those used in the corresponding derivation for the PH case have been used. The density function can be derived as the product of the survivor function and the hazard rate. As in the PH case, estimation of AFT models with time-varying covariates requires a combination of episode splitting and software that can handle conditioned survivor functions (delayed entry).

3.2.8 Summary: PH versus AFT assumptions for continuous time models

We can now summarise the assumptions about the functional forms of the hazard function, density function, and survivor function: see Table 3.3, which shows the case where there are no time-varying covariates.

Function: Hazard, θ(t, X) | PH: λ θ0(t) | AFT: ψ θ0(ψt)
Function: Survivor, S(t, X) | PH: [S0(t)]^λ | AFT: S0(ψt)
Function: Density, f(t, X) | PH: λ f0(t) [S0(t)]^(λ-1) | AFT: ψ f0(ψt)
Note: λ = exp(β′X), ψ = exp(-β*′X). The characteristics vector X is time-invariant.

Table 3.3: Specification summary: proportional hazard versus accelerated failure time models

Table 3.4 classifies parametric models according to whether they can be interpreted as PH or AFT.

Model: Exponential | PH: yes | AFT: yes
Model: Weibull | PH: yes | AFT: yes
Model: Log-logistic | PH: no | AFT: yes
Model: Lognormal | PH: no | AFT: yes
Model: Gompertz | PH: yes | AFT: no
Model: Generalized Gamma | PH: no | AFT: yes

Table 3.4: Classification of models as PH or AFT: summary

Note that the PH versus AFT description refers to the interpretation of parameter estimates, and not to differences in how the model per se is estimated. As we see in later chapters, estimation of all the models cited so far is based on expressions for survivor functions and density functions.
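The Weibull PH/AFT correspondence (AFT coefficients equal minus the PH coefficients divided by α) can be verified by comparing the two survivor-function columns of the summary table numerically. α, the coefficients and the covariate values below are illustrative assumptions:

```python
import math

alpha = 1.5                              # assumed Weibull shape parameter
beta = [0.6, -0.3]                       # illustrative Weibull PH coefficients
beta_star = [-b / alpha for b in beta]   # implied Weibull AFT coefficients (3.58)

def S_ph(t, x):
    """PH column of Table 3.3: S(t, X) = S0(t)**lam = exp(-lam * t**alpha)."""
    lam = math.exp(sum(b * xi for b, xi in zip(beta, x)))
    return math.exp(-lam * t ** alpha)

def S_aft(t, x):
    """AFT column of Table 3.3: S(t, X) = S0(psi*t) = exp(-(psi*t)**alpha)."""
    psi = math.exp(-sum(b * xi for b, xi in zip(beta_star, x)))
    return math.exp(-(psi * t) ** alpha)

x = [1.0, 1.0]
for t in (0.5, 1.0, 2.0):
    # With beta_star = -beta/alpha, the two representations coincide.
    assert abs(S_ph(t, x) - S_aft(t, x)) < 1e-12
```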
3.2.9 A semi-parametric specification: the piecewise-constant Exponential (PCE) model
The Piecewise-Constant Exponential (PCE) model is an example of a semi-parametric continuous time hazard specification. By contrast with all the parametric models considered so far, the specification does not completely characterize the shape of the hazard function. Whether it generally increases or decreases with survival time is left to be fitted from the data, rather than specified a priori. The time axis is partitioned into a number of intervals using (researcher-chosen) cut-points. It is assumed that the hazard rate is constant within each interval but may, in principle, differ between intervals. Covariates may be fixed or, if time-varying, constant within each interval.

The PCE model is a form of PH model for which we have, in general, θ(t, Xt) = θ0(t) exp(β′Xt). In the PCE special case, with K intervals:

θ(t, Xt) = θ̄1 exp(β′X1), t ∈ (0, τ1]
         = θ̄2 exp(β′X2), t ∈ (τ1, τ2]
           ...
         = θ̄K exp(β′XK), t ∈ (τK-1, τK]  (3.63)

where the baseline hazard rate (θ̄) is constant within each of the K intervals but differs between intervals. This expression may be rewritten as

θ(t, Xt) = exp[log(θ̄1) + β′X1], t ∈ (0, τ1]
         = exp[log(θ̄2) + β′X2], t ∈ (τ1, τ2]
           ...
         = exp[log(θ̄K) + β′XK], t ∈ (τK-1, τK]  (3.64)

or

θ(t, Xt) = exp(ẽ1), t ∈ (0, τ1]
         = exp(ẽ2), t ∈ (τ1, τ2]
           ...
         = exp(ẽK), t ∈ (τK-1, τK]  (3.65)

where, for the first interval, ẽ1 = log(θ̄1) + β0 + β1 X1 + β2 X2 + β3 X3 + ... + βK XK, and similarly for the others. So the constant interval-specific hazard rates are equivalent to having interval-specific intercept terms in the overall hazard. One can estimate these by defining binary (dummy) variables that refer to each interval, and the required estimates are the estimated coefficients on these variables. However, observe that in order to identify the model parameters, one cannot include all the interval-specific dummies and an intercept term in the regression: either one includes all the dummies and excludes the intercept term, or one includes all but one dummy and includes an intercept. (To see why, note that it turns out that the PCE model is a special, and simple, case of the models incorporating time-varying covariates discussed earlier.)

An advantage of the model compared to the ones discussed so far is that the overall shape of the hazard function does not have to be imposed in advance. For example, one can explore whether the hazard does indeed appear to vary monotonically with survival time, and then perhaps later choose one of the parametric models in the light of this check. *Insert chart* shape of baseline hazard in this case. The PCE model nonetheless requires researcher input in its specification (the cut-points).
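A minimal sketch of the piecewise-constant hazard of (3.63): the cut-points and interval rates are researcher-chosen, so the values below are purely illustrative.

```python
# Researcher-chosen cut-points and interval-specific baseline rates
# (illustrative assumptions):
cutpoints = [4.0, 8.0, 12.0]       # tau_1, tau_2, tau_3
base_rates = [0.30, 0.15, 0.25]    # theta_bar_k for each interval

def pce_hazard(t, lam=1.0):
    """theta(t, X) = theta_bar_k * lam for t in (tau_{k-1}, tau_k],
    where lam = exp(beta'X) is the person-specific PH factor."""
    for tau, rate in zip(cutpoints, base_rates):
        if t <= tau:
            return rate * lam
    raise ValueError("t lies beyond the last cut-point")

# The hazard is constant within each interval but jumps between intervals,
# so no monotone shape is imposed a priori:
assert pce_hazard(1.0) == pce_hazard(3.5) == 0.30
assert pce_hazard(5.0) == 0.15 and pce_hazard(10.0) == 0.25
```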
Discrete time speci…cations
We consider two models. The PCE model should be distinguished from Cox’ Proportional Hazard s model that is considered in Chapter 7. On the other hand. 3.
.40
CHAPTER 3. Both are continuous time models. the Cox model estimates are derived for a totally arbitrary baseline hazard function.
exp( se1 ) exp[ (t
s)e2 ]:
e2
(3. if you desire ‡ exibility and explicit estimates of the baseline hazard function. the logistic model. FUNCTIONAL FORMS FOR THE HAZARD RATE
The expression for the PCE survivor function is a special case of the earlier expression for survivor function where we assumed that covariates were constant within some prede…ned interval (cf. and allow for some ‡ exibility in the shape of the hazard function. was primarily developed for this second case but may also be applied to the …rst.66)
e2
[exp( s)]
It was stated earlier that expressions with this structure were the foundation for estimation of models with timevarying covariates –combining episode splitting and software that can handle delayed entry spell data.68). This model can also be applied when survival times are intrinsically discrete. The second model. then it can be shown that
S (t.67) (3. can incorporate timevarying covariates.32). If we consider the case where K = 2 (as earlier). one does not need software that can handle spells with delayed entry (the spell from s to t in our example). in the sense that it allows estimates of the slope parameters in the vector to be derived regardless of what the baseline hazard function looks like. For the PCE model. the relevant likelihood contribution for this spell is the same as a spell from time 0 to (t s): note the second term on the righthand side of (3. you might use the PCE model. It may be given an interpretation in terms of the proportional odds of failure. Xt )
=
[S0 (s)]
e1
[S0 (t)]
[S0 (s)]
e1
e2
= =
[exp( s)]
[exp( t)]
e2
(3. and leads to the socalled complementary loglog speci…cation. However the Cox model is more general. For expositional purposes we assume …xed covariates. The …rst is the discrete time representation of a continuous time proportional hazards model.68)
3. however.
3.3.1 A discrete time representation of a continuous time proportional hazards model

The underlying continuous time model is summarised by the hazard rate θ(t, X). What we do here is derive estimates of the parameters describing the continuous time hazard, but taking into account the nature of the banded survival time data that are available to us: the available data are interval-censored, grouped or banded into intervals in the manner described earlier. That is, exact survival times are not known, only that they fall within some interval of time. The survivor function at time aj, the date marking the end of the interval (aj−1, aj], is given by:

    S(aj, X) = exp[ −∫0^{aj} θ(u, X) du ].                     (3.69)

Suppose also that the hazard rate satisfies the PH assumption:

    θ(t, X) = θ0(t)·exp(β'X) = θ0(t)·λ                         (3.70)

where, as before, λ ≡ exp(β'X) = exp(β0 + β1X1 + β2X2 + ... + βKXK). These assumptions imply that

    S(aj, X) = exp[ −λ·∫0^{aj} θ0(t) dt ]                      (3.71)
             = exp[−λ·Hj]                                      (3.72)

where Hj ≡ H(aj) = ∫0^{aj} θ0(u) du is the integrated baseline hazard evaluated at the end of the interval. Hence, the baseline survivor function at aj is:

    S0(aj) = exp(−Hj).                                         (3.73)

The discrete time (interval) hazard function, hj(X), is defined by

    hj(X) = h(aj, X) = [S(aj−1, X) − S(aj, X)] / S(aj−1, X)    (3.74)
          = 1 − S(aj, X)/S(aj−1, X)                            (3.75)
          = 1 − exp[λ·(Hj−1 − Hj)]                             (3.76)
          = 1 − exp[−λ·(Hj − Hj−1)]                            (3.77)

which implies that log(1 − hj(X)) = −λ·(Hj − Hj−1), and hence

    log(−log[1 − hj(X)]) = β'X + log(Hj − Hj−1).               (3.78)

Similarly, the discrete time (interval) baseline hazard for the interval (aj−1, aj] is

    h0j = 1 − exp(Hj−1 − Hj)                                   (3.79)

and hence

    log[−log(1 − h0j)] = log(Hj − Hj−1)                        (3.80)
                       = log[ ∫_{aj−1}^{aj} θ0(u) du ]
                       ≡ γj, say,                              (3.81)

where γj is the log of the difference between the integrated baseline hazard θ0(t) evaluated at the end of the interval (aj−1, aj] and at the beginning of the interval. We can substitute this expression back into that for h(aj, X) and derive an expression for the interval hazard rate:

    log(−log[1 − h(aj, X)]) = β'X + γj                         (3.82)

i.e.

    h(aj, X) = 1 − exp[−exp(β'X + γj)].                        (3.83)

The log(−log(.)) transformation is known as the complementary log-log transformation; hence the discrete-time PH model is often referred to as a cloglog model. The cloglog model is a form of generalized linear model with a particular link function: see (3.83). Observe that, if each interval is of unit length, then we can straightforwardly index time intervals in terms of the interval number rather than the dates marking the end of each interval. In this case, we can write the discrete time hazard equivalently as

    h(j, X) = 1 − exp[−exp(β'X + γj)].                         (3.84)

When estimated using interval-censored survival data, one derives estimates of the regression coefficients β, and of the parameters γj. The β coefficients are the same ones as those characterizing the continuous time hazard rate θ(t) = θ0(t)·exp(β'X). However, the parameters characterizing the baseline hazard function θ0(t) cannot be identified without further assumptions: the γj summarize differences in values of the integrated hazard function, and are consistent with a number of different shapes of the hazard function within each interval. To put things another way, the γj summarize the pattern of duration dependence in the interval hazard, but one cannot identify the precise pattern of duration dependence in the continuous time hazard without further assumptions.

Restrictions on the specification of the γj can lead to discrete time models corresponding directly to the parametric PH models considered earlier. For example, if the continuous time model is the Weibull one, and intervals are of unit length, then evaluation of the integrated baseline hazard (see later in the chapter for the formula) reveals that γj = log[(j)^α − (j−1)^α]. In practice, most analysts do not impose restrictions such as these on the γj when doing empirical work. Instead, either they do not place any restrictions on how the γj vary from interval to interval (in which case the model is a type of semi-parametric one) or, alternatively, the variation in γj from interval to interval is specified using a parametric functional form: that is, they specify duration dependence in terms of the discrete time hazard (rather than the continuous time one). Examples of these specifications are given below.

Note, finally, that the cloglog model is not the only model that is consistent with a continuous time model and interval-censored survival time data (though it is the most commonly-used one). Sueyoshi (1995) has shown how, for example, a logistic hazard model (as considered in the next subsection) with interval-specific intercepts may be consistent with an underlying continuous time model in which the within-interval durations follow a Log-logistic distribution.
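The correspondence between the cloglog interval hazard and an underlying continuous time Weibull model can be verified numerically. This is an illustrative sketch with hypothetical parameter values: it evaluates the cloglog form (3.83)-(3.84) with γj set from the Weibull integrated baseline hazard over unit intervals, and compares it with the exact interval exit probability implied by the Weibull survivor function.

```python
import math

def cloglog_hazard(bx, gamma_j):
    # equation (3.83)/(3.84): h(j, X) = 1 - exp(-exp(beta'X + gamma_j))
    return 1.0 - math.exp(-math.exp(bx + gamma_j))

def weibull_interval_hazard(j, alpha, bx):
    # exact 1 - S(j)/S(j-1) for S(t) = exp(-lambda * t^alpha), lambda = exp(beta'X)
    lam = math.exp(bx)
    return 1.0 - math.exp(-lam * (j ** alpha - (j - 1) ** alpha))

alpha, bx = 1.5, -2.0   # hypothetical shape parameter and linear index beta'X
# gamma_j = log(j^alpha - (j-1)^alpha): log of the within-interval increment
# in the integrated baseline hazard, for unit-length intervals
gammas = [math.log(j ** alpha - (j - 1) ** alpha) for j in (1, 2, 3, 4)]
hazards = [cloglog_hazard(bx, g) for g in gammas]
```

The two calculations agree exactly, illustrating the claim that β is shared by the continuous time hazard and its interval-censored cloglog representation, while the γj absorb the within-interval integrated baseline hazard.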
3.3.2 A model in which time is intrinsically discrete

When survival times are intrinsically discrete, we could of course apply the discrete time PH model that we have just discussed, though of course it would not have the same interpretation. An alternative, also commonly used in the literature, is a model that can be labelled the proportional odds model. (Beware that here the odds refer to the hazard, unlike in the case of the Log-logistic model where they referred to the survivor function.) Let us suppose, for convenience, that survival times are recorded in 'months'. The proportional odds model assumes that the relative odds of making a transition in month j, given survival up to the end of the previous month, is summarised by an expression of the form:

    h(j, X) / [1 − h(j, X)] = { h0(j) / [1 − h0(j)] } · exp(β'X)   (3.85)

where h(j, X) is the discrete time hazard rate for month j, and h0(j) is the corresponding baseline hazard arising when X = 0. The relative odds of making a transition at any given time is given by the product of two components: (a) a relative odds that is common to all individuals, and (b) an individual-specific scaling factor. It follows that

    logit[h(j, X)] ≡ log{ h(j, X) / [1 − h(j, X)] } = γj + β'X     (3.86)

where γj ≡ logit[h0(j)]. We can write this expression, alternatively, as

    h(j, X) = 1 / { 1 + exp(−γj − β'X) }.                          (3.87)

This is the logistic hazard model and, given its derivation, has a proportional odds interpretation. In principle the γj may differ for each month. Often, however, the pattern of variation in the γj, and in the γj of the cloglog model, is characterized using some function of j.
3.3.3 Functional forms for characterizing duration dependence in discrete time models

Examples of duration dependence specifications include:

(1) r·log(j), which can be thought of as a discrete-time analogue to the continuous time Weibull model, because the hazard monotonically increases if r > 0, decreases if r < 0, or is constant if r = 0 (r + 1 = q is thus analogous to the Weibull shape parameter α). r is a parameter that would be estimated together with the intercept and slope parameters within the vector β. If this specification were combined with a cloglog hazard model, then the full model specification would be cloglog[h(j, X)] = r·log(j) + β'X.

(2) z1·j + z2·j² + z3·j³ + ... + zp·j^p, a pth order polynomial function of time, where the shape parameters are z1, z2, z3, ..., zp. If p = 2 (a quadratic function of time), the interval hazard is U-shaped or inverse-U-shaped. If the quadratic specification were combined with a cloglog hazard model, then the full model specification would be cloglog[h(j, X)] = z1·j + z2·j² + β'X, where z1 and z2 are parameters that would be estimated together with the intercept and slope parameters within the vector β.

(3) Piecewise constant: groups of months are assumed to have the same hazard rate, but the hazard differs between these groups. In practice, the researcher creates dummy variables corresponding to each interval (or group of intervals). If this specification were combined with a logistic hazard model, then the full model specification would be logit[h(j, X)] = γ1·D1 + γ2·D2 + ... + γJ·DJ + β'X, where Dl is a binary variable equal to one if j = l and equal to zero otherwise. When estimating the full model, one would not include an intercept term within the vector β, or else it would be collinear. (Alternatively one could include an intercept term but drop one of the dummy variables.)

The choice of shape of hazard function in these models is up to the investigator, just as one can choose between different parametric functional forms in the continuous time models.

In practice, cloglog and logistic hazard models that share the same duration dependence specification and the same X yield similar estimates, as long as the hazard rate is relatively 'small'. To see why this is so, note that

    logit(h) = log[ h/(1 − h) ] = log(h) − log(1 − h).             (3.88)

As h → 0, log(1 − h) → 0 also, so logit(h) ≈ log(h) for 'small' h. With a sufficiently small hazard rate, the proportional odds model (a linear function of duration dependence and characteristics) is a close approximation to a model with the log of the hazard rate as dependent variable. The result follows from the fact that the estimates of β from a discrete time proportional hazards model correspond to those of a continuous time model in which log(θ) is a linear function of characteristics.
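The 'small hazard' approximation can be illustrated numerically. The sketch below, using an arbitrary common linear index z = γj + β'X, evaluates the two link functions at a small and at a large value of z: the implied hazards nearly coincide when the hazard is small and diverge when it is not.

```python
import math

def h_cloglog(z):
    # complementary log-log link: h = 1 - exp(-exp(z))
    return 1.0 - math.exp(-math.exp(z))

def h_logit(z):
    # logistic link: h = 1 / (1 + exp(-z))
    return 1.0 / (1.0 + math.exp(-z))

# the same linear index z = gamma_j + beta'X under both links
small, large = -4.0, 1.0            # hypothetical index values
gap_small = abs(h_cloglog(small) - h_logit(small))
gap_large = abs(h_cloglog(large) - h_logit(large))
```

At z = −4 both links imply a hazard of about 0.018 and differ only in the fourth decimal place; at z = 1 the implied hazards differ by roughly 0.2. This is the numerical counterpart of the logit(h) ≈ log(h) ≈ log(−log(1 − h)) approximation for small h.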
3.4 Deriving information about survival time distributions

The sorts of questions to which one might wish to know the answer, given estimates of a given model, include: How long are spells (on average)? How do spell lengths differ for persons with different characteristics? What is the pattern of duration dependence (the shape of the hazard function with survival time)? To provide answers to these questions, we need to know the shape of the survivor function for each model. We also need expressions for the density and survivor functions in order to construct expressions for sample likelihoods, from which model estimates are derived using the maximum likelihood principle, as we shall see in Chapters 5 and 6. The integrated hazard function also has several uses, including being used for specification tests.

We derived a number of general results for PH and AFT models in an earlier section; in essence what we are doing here is illustrating those results, with reference to the specific functional forms for distributions that were set out in the earlier sections of this chapter. To illustrate how one goes about deriving the relevant expressions, we focus on a relatively small number of models, and assume no time-varying covariates.

3.4.1 The Weibull model

Hazard rate
From Section 3.2, we know that the Weibull model hazard rate is given by

    θ(t, X) = α·t^{α−1}·λ                                      (3.89)
            = α·t^{α−1}·exp(β'X)                               (3.90)

where λ ≡ exp(β'X). The baseline hazard function is θ0(t) = α·t^{α−1} or, using the alternative characterization discussed earlier, α·t^{α−1}·exp(β0). The expression for the hazard rate implies that:

    log[θ(t, X)] = log(α) + (α − 1)·log(t) + β'X.              (3.91)

We can easily demonstrate that the Weibull model satisfies the general properties of PH models. First,

    ∂log θ(t, X)/∂Xk = βk                                      (3.92)

where Xk is the kth covariate in the vector of characteristics X. Thus each coefficient summarises the proportionate response of the hazard to a small change in the relevant covariate. From this expression can be derived the elasticity of the hazard rate with respect to changes in a given characteristic:

    ∂log θ/∂log Xk = (∂log θ/∂Xk)·Xk = βk·Xk.                  (3.93)-(3.94)

If Xk ≡ log(Zk), then the elasticity of the hazard with respect to changes in Zk is

    ∂log θ/∂log Zk = βk.                                       (3.95)

The Weibull shape parameter can also be interpreted with reference to the elasticity of the hazard with respect to survival time:

    ∂log θ(t, X)/∂log(t) = α − 1.                              (3.96)

Contrast the cases of α > 1 and α < 1. The Weibull model also has the PH property concerning the relative hazard rates for two persons at the same survival time t, but with different X:

    θ(t, X1)/θ(t, X2) = exp[β'(X1 − X2)].                      (3.97)

*Insert graphs*

The relative hazard rates for two persons with the same X = X̄, but at different survival times t and u, where t > u, are given by

    θ(t, X̄)/θ(u, X̄) = (t/u)^{α−1}.                             (3.98)

Thus failure at time t is (t/u)^{α−1} times more likely than at time u, unless α = 1, in which case failure is equally likely. Remember that all these results also apply to the Exponential model, as that is simply the Weibull model with α = 1.
Survivor function
Recall that S(t, X) = exp[ −∫0^t θ(u, X) du ]. Substituting the Weibull expression for the hazard rate into this expression, and using the fact that there are no time-varying covariates:

    S(t, X) = exp( −∫0^t α·u^{α−1}·λ du )                      (3.99)-(3.100)
            = exp( −λ·{t^α} )                                  (3.101)
            = exp(−λ·t^α).                                     (3.102)

The expression in the curly brackets {.} refers to the evaluation of the definite integral. Hence the Weibull survivor function is S(t, X) = exp(−λt^α), and F(t, X) = 1 − exp(−λt^α).

Integrated hazard function
Recall that H(t, X) = −ln S(t, X). It follows that, in the Weibull case,

    H(t, X) = λ·t^α                                            (3.103)-(3.104)

and so

    log H(t, X) = log(λ) + α·log(t)                            (3.105)
                = β'X + α·log(t).                              (3.106)

This expression suggests that a simple graphical specification test of whether observed survival times follow the Weibull distribution can be based on a plot of the log of the integrated hazard function against the log of survival time. If the distribution is Weibull, the graph should be a straight line (with slope equal to the Weibull shape parameter α) or, for groups characterised by combinations of X, the graphs should be parallel lines.

*Insert graphs* to show how S varies with X and with α

Density function
Recall that f(t) = θ(t)·S(t), and so in the Weibull case:

    f(t, X) = α·t^{α−1}·λ·exp(−λ·t^α).
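The linearity underlying the Weibull specification check is easy to confirm numerically: with the integrated hazard H(t, X) = λt^α, the slope of log H against log t is exactly the shape parameter α. A minimal sketch with hypothetical parameter values:

```python
import math

def weibull_integrated_hazard(t, alpha, bx):
    # H(t, X) = lambda * t^alpha with lambda = exp(beta'X)
    return math.exp(bx) * t ** alpha

alpha, bx = 0.8, -1.2   # hypothetical shape parameter and linear index
t1, t2 = 2.0, 8.0
# slope of log H(t) against log t between two survival times
slope = ((math.log(weibull_integrated_hazard(t2, alpha, bx))
          - math.log(weibull_integrated_hazard(t1, alpha, bx)))
         / (math.log(t2) - math.log(t1)))
```

Changing β'X shifts the line up or down (the log λ intercept) but leaves the slope at α, which is why groups with different X should produce parallel lines under the Weibull model.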
Quantiles of the survivor function, including the median
The median duration is the survival time m such that S(m) = 0.5.

*Insert chart* S(t) against t, showing m

From (3.102) we have log S(t, X) = −λ·t^α and hence

    t = { [−log S(t, X)]/λ }^{1/α}.                            (3.107)-(3.108)

Thus, at the median survival time, we must also have that

    m = { [−log(0.5)]/λ }^{1/α}                                (3.109)
      = [ log(2)/λ ]^{1/α}.                                    (3.110)

Expressions for the upper and lower quartiles (or any other quantile) may be derived similarly: substitute S(t, X) = 0.75 or 0.25, rather than 0.5, in the expression for m.

How does the median duration vary with differences in characteristics? Consider the following expression for the proportionate change in the median given a change in a covariate:

    ∂log(m)/∂Xk = (1/α)·∂[log(log(2)) − log(λ)]/∂Xk = −βk/α.   (3.111)

Also, using the result that ∂log(m)/∂Xk = (1/m)·∂m/∂Xk, the elasticity of the median duration with respect to a change in characteristics is

    (Xk/m)·∂m/∂Xk = ∂log(m)/∂log(Xk) = −βk·Xk/α.               (3.112)

If Xk ≡ log(Zk), then the elasticity of the median with respect to changes in Zk is −βk/α.

Mean (expected) survival time
In general, the mean or expected survival time is given by

    E(T) = ∫0^∞ t·f(t) dt = ∫0^∞ S(t) dt                       (3.113)

where the expression in terms of the survivor function was derived using integration by parts. Hence in the Weibull case, E(T) = ∫0^∞ exp(−λ·t^α) dt. Evaluation of this integral yields (Klein and Moeschberger, 1997, p. 37):
    E(T) = (1/λ^{1/α}) · Γ(1 + 1/α)                            (3.114)

where Γ(.) is the Gamma function. The Gamma function has a simple formula when its argument is an integer: Γ(n) = (n − 1)! for integer-valued n. For example, Γ(5) = 4! = 4·3·2·1 = 24; Γ(4) = 3! = 6; Γ(3) = 2! = 2; Γ(2) = 1. For non-integer arguments, one has to use special tables (or a built-in function in one's software package). Observe that for the Exponential model (α = 1), the mean duration is simply 1/λ. So, if α = 0.5, i.e. there is negative duration dependence (a monotonically declining hazard), then E(T) = (1/λ²)·Γ(3) = 2/λ². Compare this with the median, m ≈ 0.480/λ². More generally, the ratio of the mean to the median in the Weibull case is

    ratio = Γ(1 + 1/α) / [log(2)]^{1/α}.                       (3.115)

    α     Γ(1 + 1/α)   [log(2)]^{1/α}   Ratio of mean to median
    4.0   0.906        0.912            0.99
    3.0   0.893        0.885            1.01
    2.0   0.886        0.832            1.06
    1.1   0.965        0.716            1.35
    1.0   1.000        0.693            1.44
    0.9   1.052        0.665            1.58
    0.8   1.133        0.632            1.79
    0.7   1.266        0.592            2.14
    0.6   1.505        0.543            2.77
    0.5   2.000        0.480            4.17

    Table 3.5: Ratio of mean to median survival time: Weibull model

Table 3.5 summarises the ratio of the mean to the median for various values of α. Observe that, unless the hazard is increasing at a particularly fast rate (α substantially greater than one), the mean survival time is longer than the median.

The elasticity of the mean duration with respect to a change in characteristics is

    [Xk/E(T)]·∂E(T)/∂Xk = −βk·Xk/α                             (3.116)

which is exactly the same as the corresponding elasticity for the median. Note that ∂log E(T)/∂Xk = −βk/α, so that if Xk ≡ log(Zk), then the elasticity of the mean with respect to changes in Zk is −βk/α, i.e. the same expression as for the corresponding elasticity for the median!
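The entries in Table 3.5 can be reproduced with any software package offering a Gamma function. A short sketch using Python's standard library:

```python
import math

def weibull_mean_median_ratio(alpha):
    # equation (3.115): Gamma(1 + 1/alpha) / [log(2)]^(1/alpha);
    # lambda appears in both mean and median and cancels from the ratio
    return math.gamma(1.0 + 1.0 / alpha) / math.log(2.0) ** (1.0 / alpha)
```

For example, `weibull_mean_median_ratio(2.0)` is about 1.064 and `weibull_mean_median_ratio(1.0)` is about 1.443, matching the tabulated values of 1.06 and 1.44 to rounding; as α falls below one the ratio grows rapidly, reflecting the long right tail produced by a declining hazard.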
3.4.2 Gompertz model

From Section 3.2, the Gompertz model hazard rate is given by

    θ(t, X) = λ·exp(γt).                                       (3.117)

By integrating this expression, one derives the integrated hazard function

    H(t, X) = (λ/γ)·[exp(γt) − 1].                             (3.118)

Since H(t) = −log S(t), it follows that the expression for the survivor function is:

    S(t, X) = exp{ (λ/γ)·[1 − exp(γt)] }.                      (3.119)

The density function can be derived as f(t, X) = θ(t, X)·S(t, X). Observe that if γ = 0, then we have the Exponential hazard model. To derive an expression for the median survival time, we apply the same logic as above, and find that

    m = (1/γ)·log[ 1 + (γ/λ)·log(2) ].                         (3.120)

There is no closed-form expression for the mean.
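The Gompertz median formula can be verified against the survivor function directly: evaluating (3.119) at the m given by (3.120) should return exactly one half. A minimal sketch with hypothetical parameter values:

```python
import math

def gompertz_survivor(t, lam, gamma):
    # equation (3.119): S(t) = exp{ (lambda/gamma) * [1 - exp(gamma*t)] }
    return math.exp((lam / gamma) * (1.0 - math.exp(gamma * t)))

def gompertz_median(lam, gamma):
    # equation (3.120): m = (1/gamma) * log[1 + (gamma/lambda) * log(2)]
    return math.log(1.0 + (gamma / lam) * math.log(2.0)) / gamma

lam, gamma = 0.1, 0.05   # hypothetical rate and shape parameters
m = gompertz_median(lam, gamma)
```

With γ > 0 the hazard grows exponentially, so the median here (about 5.95) is well below the Exponential-model median log(2)/λ ≈ 6.93 that the same λ would imply with γ = 0.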
3.4.3 Log-logistic model

Hazard rate
From Section 3.2, the Log-logistic model hazard rate is given by

    θ(t, X) = ψ^{1/γ}·t^{(1/γ)−1} / { γ·[1 + (ψt)^{1/γ}] }      (3.121)
            = φ·ψ^φ·t^{φ−1} / [1 + (ψt)^φ]                      (3.122)

for φ ≡ 1/γ, where shape parameter γ > 0 and ψ = exp(β'X).

Survivor function
The Log-logistic survivor function is given by

    S(t, X) = 1 / [1 + (ψt)^{1/γ}]                              (3.123)
            = 1 / [1 + (ψt)^φ].                                 (3.124)

Beware: this expression differs slightly from that shown by e.g. Klein and Moeschberger (1997), since they parameterise the hazard function differently.

*Insert graphs* to show how S varies with X and with γ

Density function
Recall that f(t, X) = θ(t, X)·S(t, X), and so in the Log-logistic case:

    f(t, X) = φ·ψ^φ·t^{φ−1} / [1 + (ψt)^φ]².                    (3.125)-(3.126)

Integrated hazard function
From above, we have

    H(t, X) = −log S(t, X) = log[1 + (ψt)^φ]                    (3.127)

and hence

    log{ exp[H(t, X)] − 1 } = φ·β'X + φ·log(t).                 (3.128)

One might therefore graph an estimate of the left-hand side against log(t) as a specification check for the model: the graph should be a straight line. A more common check is expressed in terms of the log odds of survival, however: see below.

Quantiles of the survivor function, including the median
The median survival time is

    m = 1/ψ.                                                    (3.129)

Hence it can be derived that ∂log(m)/∂Xk = −βk, and the elasticity of the median with respect to a change in Xk is

    ∂log(m)/∂log(Xk) = −βk·Xk.                                  (3.130)

Mean (expected) survival time
There is no closed-form expression for the mean survival time unless γ < 1 or, equivalently, φ > 1 (i.e. the hazard rises and then falls). In this case, the mean is (Klein and Moeschberger, 1997, p. 37):

    E(T) = πγ / [ψ·sin(πγ)],   γ < 1                            (3.131)

where π is the trigonometric constant 3.1415927... . (Klein and Moeschberger's expression is in terms of the relatively unfamiliar cosecant function, Csc(x). Note however that Csc(x) = 1/sin(x).) The elasticity of the mean is given by

    ∂log E(T)/∂log(Xk) = −βk·Xk = ∂log(m)/∂log(Xk)

i.e. the same expression as for the elasticity of the median. The ratio of the mean to the median is given by

    E(T)/m = πγ / sin(πγ),   γ < 1.                             (3.132)

Log-odds of survival interpretation
The Log-logistic model can be given a log-odds of survival interpretation. From (3.123), the (conditional) odds of survival to t is

    S(t, X) / [1 − S(t, X)] = (ψt)^{−1/γ}                       (3.133)

and at X = 0 (with ψ0 denoting the corresponding value of ψ),

    S(t, X|X = 0) / [1 − S(t, X|X = 0)] = (ψ0·t)^{−1/γ}         (3.134)

and therefore

    S(t, X)/[1 − S(t, X)]
      = { S(t, X|X = 0)/[1 − S(t, X|X = 0)] } · (ψ/ψ0)^{−1/γ}   (3.135)

where (ψ/ψ0)^{−1/γ} = [exp(−β1X1 − β2X2 − ... − βKXK)]^{1/γ}. Hence the (conditional) log odds of survival at each t for any individual depend on a 'baseline' conditional odds of survival (common to all persons) and a person-specific factor depending on characteristics (and the shape parameter γ) that scales those 'baseline' conditional odds.

The log-odds property suggests a graphical specification check for the Log-logistic model. From (3.133), we have

    log{ [1 − S(t, X)] / S(t, X) } = φ·β'X + φ·log(t).          (3.136)
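Both Log-logistic properties just derived can be confirmed numerically: the survivor function evaluated at m = 1/ψ equals one half, and the log odds of failure are exactly linear in log(t) with slope φ. A sketch with hypothetical parameter values:

```python
import math

def loglogistic_survivor(t, psi, phi):
    # equation (3.124): S(t) = 1 / (1 + (psi*t)^phi), with phi = 1/gamma
    return 1.0 / (1.0 + (psi * t) ** phi)

def log_odds_of_survival(t, psi, phi):
    s = loglogistic_survivor(t, psi, phi)
    return math.log(s / (1.0 - s))

psi, phi = 0.25, 1.5     # hypothetical parameters (gamma = 1/phi < 1)
median = 1.0 / psi       # equation (3.129)
# log odds of survival is linear in log(t) with slope -phi (cf. 3.133)
slope = ((log_odds_of_survival(8.0, psi, phi) - log_odds_of_survival(2.0, psi, phi))
         / (math.log(8.0) - math.log(2.0)))
```

Plotting an empirical estimate of log[(1 − S)/S] against log(t) and checking for a straight line is the sample analogue of the exact slope computed here.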
The check is therefore based on a graph of log[(1 − S(t, X))/S(t, X)] against log(t): it should be a straight line if the Log-logistic model is appropriate, or give rise to parallel lines if plotted separately for different groups classified by combinations of values of X.

3.4.4 Other continuous time models

See texts for discussion of other models. Most continuous time parametric models have closed-form expressions for the median but not for the mean.

3.4.5 Discrete time models

Recall that survival up to the end of the jth interval (or completion of the jth cycle) is given by:

    S(j) = Sj = ∏_{k=1}^{j} (1 − hk)                            (3.137)

where hk is a logit or complementary log-log function of characteristics and survival time. The median is defined implicitly by finding the value of j such that S(j) = 0.5. In general this function is not invertible so as to produce suitable closed-form expressions. The expressions for the median and mean depend on the baseline hazard specification that the researcher chooses; for discrete time models in which the hazard varies with survival time, there is typically no closed-form expression for the median and mean, and they have to be derived numerically (see the web Lessons for examples).

Things are simpler in the special case in which the discrete hazard rate is constant over time, i.e. hj = h for all j (the case of a Geometric distribution of survival times). In this case,

    Sj = (1 − h)^j,   log Sj = j·log(1 − h)                     (3.138)-(3.139)

and so the median is

    m = log(0.5)/log(1 − h) = −log(2)/log(1 − h).               (3.140)

In the general case, the mean survival time E(T) satisfies

    E(T) = ∑_{k=1}^{K} k·f(k)                                   (3.141)

or, alternatively,

    E(T) = ∑_{k=1}^{K} S(k)                                     (3.142)

where f(k) = Pr(T = k), and K is the maximum survival time. In the special case in which the hazard rate is constant at all survival times,

    E(T) = [(1 − h)/h]·[1 − (1 − h)^K]                          (3.143)

implying that the mean survival time is approximated well by (1 − h)/h if K is 'large'.
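The Geometric special case is easy to check by brute force: summing the survivor function term by term reproduces the closed form (3.143), and scanning S(j) for the first value at or below one half reproduces the median formula (3.140). A sketch with a hypothetical constant hazard:

```python
import math

h, K = 0.2, 50   # hypothetical constant discrete hazard and maximum survival time

# survivor function S(k) = (1 - h)^k; mean via the sum in (3.142)
mean_by_sum = sum((1.0 - h) ** k for k in range(1, K + 1))
mean_closed_form = (1.0 - h) / h * (1.0 - (1.0 - h) ** K)   # equation (3.143)

# median: smallest whole interval j with S(j) <= 0.5,
# versus the continuous-valued closed form -log(2)/log(1 - h)
median_numeric = next(j for j in range(1, K + 1) if (1.0 - h) ** j <= 0.5)
median_closed_form = -math.log(2.0) / math.log(1.0 - h)
```

With h = 0.2 and K = 50, the truncation term (1 − h)^K is tiny, so the mean is close to (1 − h)/h = 4; the closed-form median is about 3.1, with the first interval at which S falls to one half or below being j = 4.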
Chapter 4

Estimation of the survivor and hazard functions

In this chapter, we consider estimators of the survivor and hazard functions: the empirical counterparts of the concepts that we considered in the previous chapter. One might think of the estimators considered here as non-parametric estimators, as no prior assumptions are made about the shapes of the relevant functions. We shall assume that we have a random sample from the population of spells, and allow for right-censoring (but not truncation). The methods may be applied to a complete sample, or to subgroups of subjects within the sample, where the groups are defined by some combination of observable characteristics. The analysis of subgroups may, of course, be constrained by cell size considerations, which is one of the reasons for taking account of the effects of differences in characteristics on survival and hazard functions by using multiple regression methods. (These are discussed in the chapters following.) In any case, because the non-parametric analysis is informative about the pattern of duration dependence, it may also assist with the choice of parametric model.

We consider first the Kaplan-Meier product-limit estimator and then the lifetable estimator. The main difference between them is that the former is specified assuming continuous time measures of spells, whereas for the latter the survival times are banded (grouped).

4.1 Kaplan-Meier (product-limit) estimators

Let t1 < t2 < ... < tj < ... < tk < ∞ represent the survival times that are observed in the data set. From the data, one can also determine the following quantities:

dj: the number of persons observed to 'fail' (make a transition out of the state) at tj
mj: the number of persons whose observed duration is censored in the interval [tj, tj+1), i.e. still in the state at time tj but not in the state by tj+1
nj: the number of persons at risk of making a transition (ending their spell) immediately prior to tj, which is made up of those who have a censored or completed spell of length tj or longer:

    nj = (mj + dj) + (mj+1 + dj+1) + ... + (mk + dk)

Table 4.1 provides a summary of the data structure.

    Failure time   # failures   # censored   # at risk of failure
    t1             d1           m1           n1
    t2             d2           m2           n2
    t3             d3           m3           n3
    ...            ...          ...          ...
    tk             dk           mk           nk

    Table 4.1: Example of data structure

4.1.1 Empirical survivor function

The proportion of those entering a state who survive to the first observed survival time t1, Ŝ(t1), is simply one minus the proportion who made a transition out of the state by that time, where the latter can be estimated by the number of exits divided by the number who were at risk of transition: d1/n1. Similarly, the proportion surviving to the second observed survival time t2 is Ŝ(t1) multiplied by one minus the proportion who made a transition out of the state between t1 and t2, and so on. More generally, at survival time tj, the Kaplan-Meier estimate of the survivor function is given by the product of one minus the number of exits divided by the number of persons at risk of exit, i.e. the product of one minus the 'exit rate' at each of the survival times:

    Ŝ(t) = ∏_{j | tj < t} (1 − dj/nj).                          (4.1)

Standard errors can be derived for the estimates using Greenwood's formulae (see texts). From this, one can also derive an estimate of the failure function, F̂(tj) = 1 − Ŝ(tj), and of the integrated hazard function, Ĥ(tj).
1. with the next ‘ step’beginning at time t1 with height b S(t1 ) and width t2 t1 . one has a set of b estimates at fS(tj ). the slope is not wellde…ned: so. and the last estimate depends on the largest noncensored survival time. What is one to do if an estimate of the hazard function is required? One possibility is to divide the survival time axis into a number of regular intervals of times and to derive estimates of the interval hazard rate (h rather than ): see the discussion of lifetable estimators below. To do this one would need estimates of the slope of the integrated hazard function at a series of survival times. one cannot simply ‘ connect the dots’(as in children’ drawing books).2) for a full discussion. with the …rst ‘ step’ at t = 0.3)
0
. of course. so too does the width of the steps (depending on the times at which failures were observed). Klein and Moeschberger (1997. To put things another way.
b H (tj ) =
Z
t
(u) du = exp [ H (t)] b log S (tj ) :
(4.4. nor is a nonparametric estimate of the hazard rate. the hazard function. height of one and width t1 . to draw the survivor function.) tage is that smoothing incorporates assumptions that need not be appropriate. This would be s making assumptions about the shape of the survivor function or. section . requires additional assumptions. To derive estimates of survival probabilities at dates beyond the maximum observed failure time. using the property that @H (t) =@t = (t). Clearly. Another possibility is to return to the estimated integrated hazard function. The shape of this step function is not like a regular staircase: the height between steps varies (depending on the survivor function estimates). The advantage is. having a . that once one has a smooth curve. KAPLANMEIER (PRODUCTLIMIT) ESTIMATORS
57
S (t)
= =)
exp
It is important to note that one can only derive estimates of the survivor function (and the integrated hazard function) at the dates at which there are transitions. the estimated survivor function is conventionally drawn as a step function. equivalently.g. that would most likely be unwarranted. (The estimated slope is the ‘ smoothed hazard’ The disadvan. This is a price that one has to pay for using a nonparametric estimator. tj g at various tj but. or at dates between withinsample failure times. the nonparametric step function nature of the survivor function and the integrated hazard function has implications for estimation of the hazard function. and so on. Trying to estimate the slope of the integrated hazard function at each of the observed survival time is equivalent to trying to …nd the slope at the corner of each of the steps. and to smooth this step function (for example using a kernel smoother).2) (4. this is a statistically sophisticated method of ‘ connecting the dots’ See e. One might think that one could derive an estimate of the hazard rate b (t) directly from the estimates of the integrated hazard function. In e¤ect. Indeed. one can also derive its slope. Re‡ ecting this. But the estimated integrated hazard function (‘ cumulative hazard’ is also a ) step function (with higher steps at longer observed survival times). 6.
An alternative to the Kaplan-Meier estimator of the survivor and integrated hazard functions is the Nelson-Aalen estimator, sometimes called the Fleming-Harrington estimator. In essence, with the Kaplan-Meier estimator one estimates the survivor function and derives the cumulative hazard function from that; with the Nelson-Aalen estimator, it is vice versa. The Nelson-Aalen estimator of the cumulative hazard is

    H(t) = Σ_{j | t_j ≤ t} d_j / n_j    (4.4)

and this provides an estimate of the survivor function, exp[−H(t)]. It turns out that the two estimators are asymptotically equivalent (they provide the same estimates as the sample size becomes infinitely large), but have different small-sample properties. The Nelson-Aalen estimator of the cumulative hazard function has better small-sample properties than the Kaplan-Meier estimator, whereas for estimation of the survivor function the relative advantage is reversed. (In practice, with datasets of the sample sizes typical in the social sciences, the difference between corresponding estimates is often negligible.) The sts command in Stata provides estimates of the Kaplan-Meier (product-limit) estimator of the survivor function and of the Nelson-Aalen estimator of the cumulative hazard function, as long as the data have been stset correctly; sts graph includes an option for graphing smoothed hazards.

** to be added: discussion of formulae for standard errors, stratification, and tests of homogeneity **

4.2 Lifetable estimators

The lifetable method uses broadly the same idea as the Kaplan-Meier one, but it was explicitly developed to handle the situation in which the observed information about the number of exits and the number at risk pertains to intervals of time, i.e. survival occurs in continuous time but there is grouped survival time data. Because the survival time axis is grouped into bands, the idea is to produce an 'averaged' estimate centred on the midpoint of each interval. (What if, instead, there are genuine step changes in survival probabilities? Think about exit from employment into retirement: there are likely to be large – and genuine – changes in survival probabilities at the age at which people qualify for state pensions. In such cases averaging across the relevant dates would most likely be unwarranted.)

Define intervals of time I_j, for j = 1, ..., J, with I_j = [t_j, t_{j+1}), where

    d_j : the number of failures observed in interval I_j
    m_j : the number of censored spell endings observed in interval I_j
    N_j : the number at risk of failure at the start of the interval.

To produce the midpoint estimate, one wants to adjust the observed number of persons at risk in a given interval to take account of the fact that, during the interval, some people will have left observation. Suppose that censored spell endings are evenly spread over each interval, in which case half of the total number of such endings in each interval will have left halfway through the relevant interval. We can therefore define an adjusted number at risk of failure, used for the midpoint of the interval:

    n_j = N_j − m_j/2    (4.5)

and hence

    S(j) = Π_{k=1}^{j} (1 − d_k/n_k).    (4.6)

Note that the 'adjustment' is used because the underlying survival times are continuous but the observed survival time data are grouped. If the time axis were intrinsically discrete, so that the intervals referred to basic units of time or 'cycles', then the 'adjustment' would not be required: the formula for the estimator of the survivor function would be the same as that for the Kaplan-Meier one in the continuous time case.

From this expression an estimate of the density function can be derived (recall that F(t) = 1 − S(t)):

    f(j) = [F(j+1) − F(j)] / (t_{j+1} − t_j) = [S(j) − S(j+1)] / (t_{j+1} − t_j)    (4.7)

and an estimate of the hazard rate

    λ(j) = f(j) / S̃(j)    (4.8)

where

    S̃(j) = [S(j) + S(j+1)] / 2    (4.9)

and is taken as applying to the time corresponding to the midpoint of the interval. Observe that if there are no events within some interval I_j, then the estimator of the interval hazard is equal to zero; the same is true if one simply wanted to derive an estimate of the interval hazard rate directly. See Stata's ltable command.
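The lifetable formulae above can be sketched in a few lines of code. The following is a minimal illustration of equations (4.5)–(4.9) with invented interval counts – it is not Stata's ltable implementation, and the cumulative hazard computed alongside is simply a Nelson-Aalen-type sum over the same grouped counts:

```python
# Lifetable (actuarial) estimates from grouped survival data.
# d[j]: failures in interval j; m[j]: censored endings in interval j;
# boundaries t[j] define intervals [t[j], t[j+1]). Illustrative numbers only.
t = [0, 1, 2, 3, 4]          # interval boundaries (months)
d = [5, 8, 4, 2]             # failures per interval
m = [1, 2, 2, 1]             # censored endings per interval
N0 = 40                      # number at risk at the start of interval 1

S, H, hazard = [1.0], [0.0], []
N = N0
for j in range(len(d)):
    n = N - m[j] / 2                      # adjusted number at risk, eq. (4.5)
    S.append(S[-1] * (1 - d[j] / n))      # survivor function, eq. (4.6)
    H.append(H[-1] + d[j] / n)            # Nelson-Aalen-type cumulative hazard
    width = t[j + 1] - t[j]
    f = (S[-2] - S[-1]) / width          # density estimate, eq. (4.7)
    S_mid = (S[-2] + S[-1]) / 2          # midpoint survivor function, eq. (4.9)
    hazard.append(f / S_mid)             # hazard estimate, eq. (4.8)
    N = N - d[j] - m[j]                  # at risk at start of next interval
```

Each element of `hazard` applies to the midpoint of its interval, matching the 'averaged' interpretation in the text.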
Chapter 5

Continuous time multivariate models

In this chapter and the next, we return to the problem that was raised in Chapter 1. We drew attention there to problems with using OLS (and some other commonly used econometric methods) to estimate multivariate models of survival times, given the issues of censoring and time-varying covariates. The principal lesson of this chapter (and the next) is that these problems may be addressed by using estimation based on maximum likelihood methods (ML). A second lesson of this and succeeding chapters is that the sample likelihood used needs to be appropriate for the type of process that generated the data, i.e. the type of sampling scheme. ML methods per se are not considered at all here – see any standard econometrics text, e.g. Greene (2003), for a discussion of the ML principle and the nice properties of ML estimates. For a very useful (and free) introduction, see chapter 1 of Gould et al. (2003), downloadable from http://www.statapress.com/books/mlch1.pdf.

In this chapter we consider parametric regression models formulated in continuous time; in the next chapter we consider discrete time regression models. The continuous time Cox regression model is considered in Chapter 7 (it uses an estimation principle that differs from ML).

The main sampling schemes in social science applications are as follows:

1. a random sample from the inflow to the state (or of the population or some other group – as long as sample selection is unrelated to survival), with each spell monitored (from start) until completion;
2. a random sample as in (1), and either (a) each spell is monitored until some common time t*, or (b) the censoring point varies;
3. a sample from the stock at a point of time, with individuals interviewed some time later (this corresponds to the delayed entry or left truncated spell data case discussed in the biostatistics literature);
4. a sample from the stock with no re-interview;
5. a sample from the outflow (the case of right truncated spell data).

Let us consider the likelihood contributions relevant in each case. The likelihood contribution per spell is L_i and, for the sample as a whole,

    L = Π_{i=1}^{n} L_i

or, equivalently, in terms of the log-likelihood function,

    log L = Σ_{i=1}^{n} log L_i.

Software for maximum likelihood estimation typically works by requiring specification of an expression for log L_i for each individual i (corresponding to a row in the analysis data set); log L is derived by summing down the rows. Stata's streg modules handle cases (1)–(3) straightforwardly with no problem, as long as the data have been stset correctly. Cases (4) and (5) require one to write one's own maximisation routines. Cases (3)–(5) differ from the first two in that one has to take account of 'selection effects' when characterising the sample likelihood.

5.1 Random sample of inflow and each spell monitored until completed

Suppose we have a sample of completed spells (or persons – since we assume one spell per person), indexed by i = 1, ..., n, where T_i denotes the completed spell length for person i. (There are no censored spells.) Each individual's contribution to the likelihood is given by the relevant density function, and hence the overall sample likelihood function is

    L = Π_{i=1}^{n} f(T_i)    (5.1)

and

    log L = Σ_{i=1}^{n} log f(T_i)

with log L_i = log f(T_i). To complete the model specification, one has to choose a specific parametric model for the survival time distribution, e.g. Weibull, and substitute the relevant expression into the expression for log f(T_i).
5.2 Random sample of inflow with (right) censoring, monitored until t*

Now the sample consists of completed spells, indexed j = 1, ..., J, with T_j ≤ t* and likelihood contribution L_j = f(T_j), and (right) censored spells, indexed k = 1, ..., K, with T_k such that T_k > t* and likelihood contribution L_k = S(t*). It follows that

    L = Π_{j=1}^{J} f(T_j) Π_{k=1}^{K} S(t*)    (5.2)

and hence

    log L = Σ_{j=1}^{J} log f(T_j) + Σ_{k=1}^{K} log S(t*).    (5.3)

5.3 Random sample of population, right censoring but censoring point varies

This is almost the same as the case just considered, except that the censoring point now varies across censored spells: L_j = f(T_j) for each completed spell, and L_k = S(T_k) for each censored spell, so that

    L = Π_{j=1}^{J} f(T_j) Π_{k=1}^{K} S(T_k).    (5.4)

The likelihood L is often written differently in this case, in terms of the hazard rate. Taking logs of both sides implies

    log L = Σ_{j=1}^{J} log f(T_j) + Σ_{k=1}^{K} log S(T_k)    (5.5)
          = Σ_{j=1}^{J} log { [f(T_j)/S(T_j)] S(T_j) } + Σ_{k=1}^{K} log S(T_k)    (5.6)
          = Σ_{j=1}^{J} log λ(T_j) + Σ_{j=1}^{J} log S(T_j) + Σ_{k=1}^{K} log S(T_k)    (5.7)
          = Σ_{i=1}^{N} [ c_i log λ(T_i) + log S(T_i) ]    (5.8)

where the sum in (5.8) runs over all N = J + K spells and c_i is a censoring indicator defined such that

    c_i = 1 if the spell is complete, 0 if the spell is censored.

In this case, then, we have

    log L_i = c_i log λ(T_i) + log S(T_i).    (5.9)

Given the relationship between the survivor function and the integrated hazard function, one can also write the log-likelihood contribution for each individual as

    log L_i = c_i log λ(T_i) − H(T_i)    (5.10)
            = c_i log λ(T_i) − ∫_0^{T_i} λ(u) du.    (5.11)

This is probably the most common likelihood used in empirical applications. Again, it is by choice of different parametric specifications for λ(t) that the model is fully specified. Observe that if all survival times are censored (c_i = 0, for all i), then the model cannot be fitted: if there are no exits, the sample provides no information about the nature of duration dependence in the hazard rate.

5.4 Left truncated spell data (delayed entry)

The most common social science example of this type of sampling scheme is when there is a sample from the stock of individuals at a point in time, who are interviewed some time later ('stock sampling with follow-up'). Entry is 'delayed' because the observation of the subjects under study begins some time after they are first at risk of the event. A sample from the stock is a non-random sample: we have to analyse outcomes that have occurred by the time of the interview conditional on surviving in the state up to the stock sampling time. For example, suppose that the sample was drawn from the stock of unemployed persons at 1 May 2001. Of all those who entered unemployment on 1 January 2001, some will have left unemployment by May; only the 'slower' exiters will still be unemployed in May and at risk of being sampled in the stock. If we ignored this problem, we would not take account of this length-biased sampling. But we can handle the 'selection bias' using information about the elapsed time between entry, sampling, and interview. Spell start dates are assumed to be known (these dates are of course before the date of the stock sample), so the total time at risk of exit can be calculated, together with the time between sampling and interview (last observation).

*Insert chart: time line indicating sampling time and interview times*

Let us index spells that were completed by the time of the interview by j = 1, ..., J, and index spells still in progress at the time of the interview (right-censored) by k = 1, ..., K. Then we may define the incomplete spell length at the time the spell was sampled from the stock: T_j for spells subsequently completed, and T_k for spells subsequently censored. Let t_j be the length of time between sampling and exit for leavers, and Z_k the length of time between sampling and interview for stayers. The total observed spell length is then T_j + t_j for spells that were completed by the time of the interview, and T_k + Z_k for censored spells.

How do we condition on survival to the sampling date? Recall the expression for a conditional probability:

    Pr(A|B) = Pr(A ∩ B) / Pr(B).    (5.12)

We have to condition on the fact that each person survived sufficiently long in the state to be at risk of being sampled in the stock. The likelihood contribution for leavers (type j) is

    L_j = f(T_j + t_j) / S(T_j)    (5.13)

and the likelihood contribution for stayers (type k) is

    L_k = S(T_k + Z_k) / S(T_k).    (5.14)

Hence the overall sample likelihood is

    L = Π_{j=1}^{J} [f(T_j + t_j)/S(T_j)] Π_{k=1}^{K} [S(T_k + Z_k)/S(T_k)].    (5.15)

To write this expression in a form closer to that used in the previous subsection, let us simply define T_i as the total spell length for each person i = 1, ..., N, use the censoring indicator c_i to distinguish between censored and complete spells, and let τ_i be the elapsed duration at the date at which the stock sample was drawn. For both completed and censored spells, we deflate the likelihood contribution for each individual by the probability of survival from entry to the state until the stock sampling date; this probability is given for each i by the survivor function, S(τ_i). Then

    L = Π_{i=1}^{N} [f(T_i)/S(τ_i)]^{c_i} [S(T_i)/S(τ_i)]^{1−c_i}    (5.16)
      = Π_{i=1}^{N} [λ(T_i)]^{c_i} S(T_i)/S(τ_i)    (5.17)

or

    log L = Σ_{i=1}^{N} { c_i log λ(T_i) + log [S(T_i)/S(τ_i)] }.    (5.18)

This expression can be rewritten as

    log L = Σ_{i=1}^{N} { c_i log λ(T_i) + log [w_i S(T_i)] }    (5.19)

where w_i ≡ 1/S(τ_i), which is directly comparable with the log-likelihood derived for the case without left-truncation. Think of the w_i as being like a weighting variable: one weights the delayed-entry observations by a type of inverse-probability weight to account for the left truncation. The later in time that τ_i is (the closer to T_i), the larger the weight. If there is no left truncation, then S(τ_i) = 1 = w_i. This model can be estimated straightforwardly in Stata using streg, as long as the data have been properly stset first (use the enter option to indicate the 'entry' time, i.e. the stock sampling date).

Lancaster (1979) also considered a variation on this sampling scheme. In his data set, he did not have the exact date of exit for those spells that ended in the interval between stock sampling and interview: he only knew that there had been an exit. To estimate a model with this structure, one simply replaces the expression f(T_j + t_j) in (5.15) with S(T_j) − S(T_j + t_j). See Nickell (1979) and Atkinson et al. (1984) for applications.
5. :::. then the sample provides no information about the nature of duration dependence in the hazard rate. We have to analyse outcomes that have occurred by the time of interview conditional on surviving in the state up to the sampling time. Then we may de…ne the incomplete spell length for each person at the time that the spell was sampled from the stock: Tj for completed spells. we have log Li = ci log (Ti ) + log S(Ti ): Given the relationship between the survivor function and the integrated hazard function.
. If all survival times are censored (ci = 0. J.4
Left truncated spell data (delayed entry)
The most common social science example of this type of sampling scheme is when there is a sample from the stock of individuals at a point in time. A sample from the stock is a nonrandom sample (see below). If there are no exits. *Insert chart* time line indicating sampling time and interview times Let us index spells that were completed by the time of the interview by j = 1. the total observed spell length is Tj + tj for spells that were completed by the time of the interview. dates are assumed to be known (these dates are of course before the date of the stock sample). but we can handle the ‘ selection bias’ using information about the elapsed time between sampling and interview. Tk for censored spells. one can also write the loglikelihood contribution for each individual as log Li = ci log (Ti ) = ci log (Ti ) H(Ti ) Z Ti (u)du:
0
1 if spell complete 0 if spell censored. Entry is ‘ delayed’ because the observation of the subjects under study occurs some time after they are …rst at risk of the event.
13)
Lk =
(5. suppose that the sample was of the stock of unemployed persons at 1 May 2001. and let i be the date at which the stock sample was drawn:Then L = = or log L =
N X i=1 N Y f (Ti ) S ( i) i=1 N Y ci
S (Ti ) S ( i) S (Ti ) S ( i)
1 ci
(5. For example. How do we do this? Recall the expression for a conditional probability: Pr(AjB) = Pr(A \ B) : Pr(B) (5. Of all those who entered unemployment on 1 January 2001. Likelihood contribution for leavers (type j): Lj = Likelihood for stayers (type k): S(Tk + Zk ) : S (Tk ) f (Tj + tj ) : S (Tj ) (5. we have to condition on the fact that the person survived su¢ ciently long in the state to be at risk of being sampled in the stock. some will have left unemployment by May.14)
Hence the overall sample likelihood is L=
J K Y f (Tj + tj ) Y S (Tk + Zk ) : S (Tj ) S (Tk ) j=1 k=1
(5.16)
[ (Ti )]
ci
(5. only the ‘ slower’exiters will still be unemployed in May and at risk of being sampled in the stock. Hence.12)
By analogy. If we ignored this problem. we would not take account of the lengthbiased sampling.18)
.17)
i=1
ci log (Ti ) + log
S (Ti ) S ( i)
(5. let us simply de…ne Ti as the total spell length. we de‡ the likelihood contribution for each individual by the ate probability of survival from entry to the state until the stock sampling date. and use the censoring indicator ci to distinguish between censored and complete spells.65 For both completed and censored spells.15)
To write this expression in a form closer to that used in the previous subsection. But this probability is given for each i by the survival function: S(Ti ).
66
CHAPTER 5. So. Distinguish a series of di¤erent event types: A : incomplete spell of length t at calendar time s B : unemployed at date s (may di¤er by person –be si –if from survey with interviews through year)
.19)
where wi 1=S ( i ).0. we have to write down the probability of observing such a spell including taking account of the di¤erent chances of entering unemployment at di¤erent dates. In this situation we have no information that we can use to condition survival on (as in the previous subsection). (Similarly for the unemployed people who were surveyed during a di¤erent month. (1984) for applications. See Nickell (1979) and Atkinson et al. or entered unemployment on 1 February and remained unemployed 8 months. Think of the wi as being like an weighting variable: one weights the delayed entry observations by a type of inverseprobability weight to account for the left truncation. he did not have the exact date of exit for those spells that ended in the interval between stock sampling and interview: he only knew that there had been an exit. Lancaster (1979) also considered a variation on this sampling scheme.) We want to model the chance of having an incomplete spell of length t at calendar time s. i. stock sampling date).5
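The left-truncation adjustment in (5.17)–(5.19) amounts to subtracting log S(τ_i) from each spell's contribution. The sketch below is my own illustration with invented data, again using the exponential model for which log λ(T) = log λ and log S(T) = −λT; setting τ_i = 0 recovers the untruncated likelihood (5.8):

```python
import math

def loglik_delayed_entry(lam, spells):
    # Each spell is (T, c, tau): total duration, censoring indicator, and
    # elapsed duration at the stock-sampling date (tau = 0: no truncation).
    # Exponential model: log L_i = c*log(lam) - lam*T + lam*tau,
    # i.e. c_i*log lambda(T_i) + log S(T_i) - log S(tau_i), eqs (5.17)-(5.18).
    total = 0.0
    for T, c, tau in spells:
        total += c * math.log(lam) - lam * T + lam * tau
    return total

spells = [(9.0, 1, 4.0),   # sampled from the stock, exited before interview
          (8.0, 0, 5.0),   # still in the state at interview (censored)
          (3.0, 1, 0.0)]   # fresh inflow spell, no left truncation
ll = loglik_delayed_entry(0.2, spells)

# With tau = 0 everywhere we recover the ordinary censored log-likelihood (5.8);
# the truncation terms +lam*tau are exactly the log of the weights w_i = 1/S(tau).
ll_no_trunc = loglik_delayed_entry(0.2, [(T, c, 0.0) for T, c, _ in spells])
assert ll > ll_no_trunc
```

The difference between the two totals is Σ λτ_i = Σ log w_i, which is the inverse-probability-weight interpretation given in the text.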
5.5 Sample from stock with no re-interview

This is a difficult and awkward case. It used to be relatively common because of a lack of suitable longitudinal surveys. Such a model could be fitted using data from standard cross-sectional household surveys: these surveys include samples of unemployed people, and typically include a single question asking how long the respondent had been unemployed. In this situation we have no information about outcomes after the sampling date that we can use to condition on (as in the previous subsection). Instead, in order to derive the sample likelihood, we have to write down the probability of observing an incomplete spell of a given length at the survey date, taking account of the different chances of having entered unemployment at different dates. Someone unemployed when surveyed at the start of October 2001 could have entered unemployment on 1 January and remained unemployed 9 months, or entered unemployment on 1 February and remained unemployed 8 months, or entered unemployment on 1 March and remained unemployed 7 months, and so on. (Similarly for unemployed people surveyed during a different month.) We want to model the chance of having an incomplete spell of length t at calendar time s, conditional on being unemployed at date s – where the chance of the latter itself depends on the chances of having entered unemployment at some date in the past.

Distinguish a series of different event types:

    A : incomplete spell of length t at calendar time s
    B : unemployed at date s (the date may differ by person – s_i – if the survey interviews throughout the year)
    C : survival of the spell until time s, given entry at time r (i.e. spell length t = s − r)
    D : entry to unemployment at date r.

Using the rules of conditional probability,

    L_i = Pr(A|B)    (5.20)
        = Pr(A ∩ B)/Pr(B)    (5.21)
        = Pr(A)/Pr(B)    (5.22)
        = Pr(C) Pr(D)/Pr(B)    (5.23)

since Pr(A) = Pr(C) Pr(D). What are the expressions for the component probabilities? Look first at

    Pr(B) = Σ_{τ=1}^{s} S_i(s − τ | entry at τ) u_i(τ).    (5.24)

This says that the probability of being unemployed at s is the sum, over all dates before the interview (indexed by τ), of the product of the probability of entering unemployment at date τ and the probability of remaining unemployed between τ and s. We now need to make an assumption about unemployment inflow rates in order to evaluate Pr(D). Assume that

    Pr(D) = u_i(r) = θ_i u(r)    (5.25)

i.e. the probability of entry at r factors into an individual fixed effect θ_i (this varies with i but not time) and a function depending on time but not on i (common to all persons). Then

    L_i = S(s_i | entry at r_i) θ_i u(r_i) / Σ_{τ=1}^{s_i} S(s_i | entry at τ) θ_i u(τ)    (5.26)
        = S(s_i | entry at r_i) u(r_i) / Σ_{τ=1}^{s_i} S(s_i | entry at τ) u(τ).    (5.27)

Observe the cancelling of the individual fixed effects. In implementing the model, one can use 'external' data about unemployment entry rates for the u(·) term. This likelihood cannot be fitted using a canned routine in standard packages and has to be specially written (tricky!).
5.6 Right truncated spell data (outflow sample)

An example of this sampling scheme is where one has a sample of all the persons who left unemployment at a particular date, and one wants to study the hazard of (re)employment. Or one wants to study longevity, and has a sample based on death records. In these sorts of cases there are no censored spells: by definition, every sampled spell is complete. But of all the people beginning a spell at a particular date, those who are more likely to survive (to have relatively long spells) are less likely to be found in the outflow at a particular date – a form of 'selection bias': there is an over-representation of short spells relative to long spells. To control for this, one has to condition on failure at the relevant survival time when specifying each individual's likelihood contribution. The sample likelihood is given by

    L = Π_{i=1}^{n} f(T_i)/F(T_i).    (5.28)

The numerator in (5.28) is the density function for a spell of length T_i, i.e. the probability of entering at t = 0, surviving up until the instant just before t = T_i, and then making the transition at t = T_i. This must be conditioned on the spell appearing in the outflow sample: hence the expression in the denominator in (5.28).

5.7 Episode splitting: time-varying covariates and estimation of continuous time models

In the sample likelihood derivations above, we implicitly assumed that the explanatory variables were all constant – there were no time-varying covariates. Estimation of continuous time parametric regression models incorporating time-varying covariates requires episode splitting: one has to split the survival time (episode) for each individual into sub-periods within which each time-varying covariate is constant, i.e. create multiple records for each individual, with one record per sub-period.(1)

(1) Episode splitting to incorporate time-varying covariates is also required for other types of model. For the Cox model it need only be done at the survival times at which transitions occur (see Chapter 7) and, for discrete-time models, episode splitting is done at each time point (see Chapter 6).

What is the logic behind this? To illustrate, let us return to the most commonly assumed sampling scheme: a random sample of spells with right censoring, where the censoring point varies. Consider a person i with two different values for a covariate:

    X_1 if t < u
    X_2 if t ≥ u.    (5.29)

Recall that the log-likelihood contribution for person i in the data structure that we have is:
    log L_i = c_i log λ(T_i) + log S(T_i)    (5.30)

where i's observed survival time is T_i, and the censoring indicator c_i = 1 if i's spell is complete (transition observed) and 0 if the spell is censored. But

    log S(T_i) = log [ S(u) S(T_i)/S(u) ]    (5.31)
               = log S(u) + log [S(T_i)/S(u)].    (5.32)

That is, the log of the probability of survival until T_i = (log of the probability of survival to time u) + (log of the probability of survival to T_i, conditional on entry at u). (This expression, incorporating some notational liberties to facilitate exposition, follows directly from the discussion of the incorporation of time-varying covariates into PH and AFT models in Chapter 3.) So what we do is create one new record with c_i = 0 and t = u (a right censored episode), plus one new record summarising an episode with 'delayed entry' at time u, in which the censoring indicator c_i takes the value in the original data. In the first episode and record, the time-varying covariate takes on the value X_1, and in the second record it takes on the value X_2. Thus episode splitting (when combined with software that can handle left truncated spell data) gives the correct log-likelihood contribution. See Table 5.1 for a summary of the old and new data structures.

    Record #   Censoring indicator   Survival time   Entry time   TVC value
    Single data record for i:
    1          c_i = 0 or 1          T_i             0            –
    Multiple data records for i (after episode splitting):
    1          c_i = 0               u               0            X_1
    2          c_i = 0 or 1          T_i             u            X_2

    Table 5.1: Example of episode splitting

In Stata, episode splitting is easily accomplished using stsplit (applied after the data have been stset, and which then cleverly updates stset). One then creates the appropriate time-varying covariate values for each record and, finally, multivariate continuous time models can be straightforwardly estimated using the streg or stcox commands.
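The episode-splitting identity (5.31)–(5.32) – that the two sub-records together contribute exactly the original log S(T_i) – can be checked numerically. A sketch of my own, under an assumed Weibull survivor function S(t) = exp[−(λt)^p] with made-up parameter values and the covariate held fixed for simplicity (with a covariate change at u, the same additivity holds piecewise):

```python
import math

def log_S(t, lam=0.1, p=1.5):
    # log survivor function for a Weibull model: S(t) = exp(-(lam*t)**p)
    return -((lam * t) ** p)

T, u = 10.0, 4.0   # total survival time and the covariate-change date
# Record 1: censored episode over (0, u]   -> contributes log S(u)
# Record 2: delayed entry at u, exit at T  -> contributes log[S(T)/S(u)]
split_sum = log_S(u) + (log_S(T) - log_S(u))
assert abs(split_sum - log_S(T)) < 1e-12   # identity (5.32)
```

This is why software that handles delayed entry (here, the second record) reproduces the unsplit likelihood exactly.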
Chapter 6

Discrete time multivariate models

6.1 Inflow sample with right censoring

We now measure time in discrete intervals indexed by the positive integers, and let us suppose that each interval is a month long. We observe person i's spell from month k = 1 through to the end of the j-th month, at which point i's spell is either complete (c_i = 1) or right censored (c_i = 0). The discrete hazard is

    h_{ij} = Pr(T_i = j | T_i ≥ j).    (6.1)

The likelihood contribution for a censored spell is given by the discrete time survivor function:

    L_i = Pr(T_i > j) = S_i(j)    (6.2)
        = Π_{k=1}^{j} (1 − h_{ik})    (6.3)

and the likelihood contribution for each completed spell is given by the discrete time density function:

    L_i = Pr(T_i = j) = f_i(j)    (6.4)
        = h_{ij} S_i(j − 1)    (6.5)
        = [h_{ij}/(1 − h_{ij})] Π_{k=1}^{j} (1 − h_{ik}).    (6.6)
The likelihood for the whole sample is

    L = Π_{i=1}^{n} [Pr(T_i = j)]^{c_i} [Pr(T_i > j)]^{1−c_i}    (6.7)
      = Π_{i=1}^{n} { [h_{ij}/(1 − h_{ij})] Π_{k=1}^{j} (1 − h_{ik}) }^{c_i} { Π_{k=1}^{j} (1 − h_{ik}) }^{1−c_i}    (6.8)
      = Π_{i=1}^{n} [h_{ij}/(1 − h_{ij})]^{c_i} Π_{k=1}^{j} (1 − h_{ik})    (6.9)

where c_i is a censoring indicator defined such that c_i = 1 if a spell is complete and c_i = 0 if a spell is right-censored (as in the previous chapter). This implies that

    log L = Σ_{i=1}^{n} c_i log [h_{ij}/(1 − h_{ij})] + Σ_{i=1}^{n} Σ_{k=1}^{j} log (1 − h_{ik}).    (6.10)

Now define a new binary indicator variable y_{ik} = 1 if person i makes a transition (their spell ends) in month k, and y_{ik} = 0 otherwise. That is,

    c_i = 1 ⟹ y_{ik} = 1 for k = T_i, y_{ik} = 0 otherwise    (6.11)
    c_i = 0 ⟹ y_{ik} = 0 for all k.    (6.12)

Hence, we can write

    log L = Σ_{i=1}^{n} Σ_{k=1}^{j} y_{ik} log [h_{ik}/(1 − h_{ik})] + Σ_{i=1}^{n} Σ_{k=1}^{j} log (1 − h_{ik})    (6.13)
          = Σ_{i=1}^{n} Σ_{k=1}^{j} [ y_{ik} log h_{ik} + (1 − y_{ik}) log (1 − h_{ik}) ].    (6.14)

But this expression has exactly the same form as the standard likelihood function for a binary regression model in which y_{ik} is the dependent variable, and in which the data structure has been reorganized from having one record per spell to having one record for each month that a person is at risk of transition from the state (so-called person-month data or, more generally, person-period data). This is just like the episode splitting described earlier for continuous time models, which also led to the creation of additional data records, except that here the episode splitting is done on a much more extensive basis – at each time point, and regardless of whether there are any time-varying covariates among the X. As long as the hazard is not constant, the variable summarizing the pattern of duration dependence will itself be a time-varying covariate in this reorganised data set.

This result implies that there is an easy estimation method available for discrete time hazard models using data with this sampling scheme (see inter alia Allison, 1984; Jenkins, 1995). It has four steps:

1. Reorganize the data into person-period format.
2. Create any time-varying covariates – at the very least this includes a variable describing duration dependence in the hazard rate (see the discussion in Chapter 3 of alternative specifications for the duration-dependence terms).
3. Choose the functional form for h_{ik} (logistic or cloglog).
4. Estimate the model using any standard binary dependent variable regression package: logit for a logistic model, cloglog for a complementary log-log model.

Table 6.1 summarises the two data structures.

    Person data:
    person id, i    c_i    T_i
    1               0      2
    2               1      3
    .               .      .

    Person-month data:
    person id, i    c_i    T_i    person-month id, k    y_{ik}
    1               0      2      1                     0
    1               0      2      2                     0
    2               1      3      1                     0
    2               1      3      2                     0
    2               1      3      3                     1
    .               .      .      .                     .

    Table 6.1: Person and person-period data structures: example

The easy estimation method is not the only way of estimating the model. In principle, one could estimate the sample likelihood given at the start without reorganising the data. Indeed, before the advent of cheap computer memory and storage, one might have had to do this, because the episode splitting required for the easy estimation method could create very large data sets with infeasible storage requirements. That the likelihood for a discrete time hazard model can be written in the same form as the likelihood for a binary dependent variable model also has some other implications. For example, the latter models can only be estimated if the data contain both 'successes' and 'failures' in the binary dependent variable. (To see this, look at the first order conditions for a maximum of the log-likelihood function.) In the current context, it means that in order to estimate discrete time hazard models in this way, the data must contain both censored and complete spells. The easy estimation method can also be used when there is a stock sample with follow-up ('delayed entry'), as we shall now see.

6.2 Left-truncated spell data ('delayed entry')

We proceed in the discrete time case in an analogous manner to that employed with continuous time data: we have to condition on survival up to time u_i (say) for person i, corresponding to the end of the u_i-th interval or cycle, which means dividing the no-delayed-entry likelihood contribution by S_i(u_i). Recall that with no delayed entry

    L_i = [h_{ij}/(1 − h_{ij})]^{c_i} Π_{k=1}^{j} (1 − h_{ik}) = [h_{ij}/(1 − h_{ij})]^{c_i} S_i(j).    (6.15)

With delayed entry at time u_i, the likelihood contribution for i is

    L_i = [h_{ij}/(1 − h_{ij})]^{c_i} Π_{k=1}^{j} (1 − h_{ik}) / S_i(u_i).    (6.16)

But

    S_i(u_i) = Π_{k=1}^{u_i} (1 − h_{ik})    (6.17)

and this leads to a 'convenient cancelling' result (Guo, 1993; Jenkins, 1995):

    L_i = [h_{ij}/(1 − h_{ij})]^{c_i} [ Π_{k=1}^{j} (1 − h_{ik}) ] / [ Π_{k=1}^{u_i} (1 − h_{ik}) ]    (6.18)
        = [h_{ij}/(1 − h_{ij})]^{c_i} Π_{k=u_i+1}^{j} (1 − h_{ik}).    (6.19)

Taking logarithms, we have

    log L_i = Σ_{k=u_i+1}^{j} [ y_{ik} log h_{ik} + (1 − y_{ik}) log (1 − h_{ik}) ]    (6.20)

which is very similar to the expression that we had in the no-delayed-entry case, except that the summation now runs over the months from the month of 'delayed entry' (e.g. when the stock sample was drawn) to the month when last observed. Hence, with left-truncated data, implementation of the easy estimation method now has five steps (Jenkins, 1995):

1. Reorganize the data into person-period format, retaining the months during which each individual is observed at risk of experiencing the event.
2. Throw away the records for all the periods prior to the time of 'delayed entry' (the months up to and including u_i in this case).
3. Create any time-varying covariates (at the very least this includes a variable describing duration dependence in the hazard rate – see above).
4. Choose the functional form for h_{ik} (logistic or cloglog).
5. Estimate the model using any standard binary dependent variable regression package: logit for a logistic model, cloglog for a complementary log-log model.

The only difference from before is step 2 (throwing data away). Actually, if one has a panel data set, steps one and two might be combined: in practice, one can create a data set with the correct structure directly, retaining only the months during which each individual is observed at risk.
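The equivalence between the spell-based likelihood and the person-period binary-response likelihood can be demonstrated in a few lines. This is my own minimal sketch (a real application would fit logit or cloglog to the expanded data); the two persons match the example in Table 6.1, and a constant hazard is assumed purely for checkability:

```python
import math

# Person-level data: (id, c_i, T_i) as in Table 6.1 (invented values).
persons = [(1, 0, 2), (2, 1, 3)]

# Step 1: reorganize into person-period records with outcome y_ik, eqs (6.11)-(6.12).
person_months = [(i, k, 1 if (c == 1 and k == T) else 0)
                 for (i, c, T) in persons
                 for k in range(1, T + 1)]

# The person-period log-likelihood (6.14) equals the spell-based one (6.9)
# for any hazard sequence; check it here with a constant hazard h_ik = h.
h = 0.3
ll_binary = sum(y * math.log(h) + (1 - y) * math.log(1 - h)
                for (_, _, y) in person_months)
ll_spell = sum(c * math.log(h / (1 - h)) + T * math.log(1 - h)
               for (_, c, T) in persons)
assert abs(ll_binary - ll_spell) < 1e-12
```

With duration dependence, h would vary with k (step 2's duration variable), but the record-by-record structure is unchanged.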
6.3 Right-truncated spell data (outflow sample)
Again the specification of the sample likelihood closely parallels that for the continuous time case. There are no censored cases: by construction, all spells are completed. But one has to account for the selection bias arising from outflow sampling by conditioning each individual's likelihood contribution on failure at the observed survival time. Each spell's contribution to the sample likelihood is therefore given by the discrete density function, f(j), divided by the discrete time failure function, F(j) = 1 − S(j). Hence i's contribution to the sample likelihood is:

    L_i = [h_{ij}/(1 − h_{ij})] Π_{k=1}^{j} (1 − h_{ik}) / [ 1 − Π_{k=1}^{j} (1 − h_{ik}) ].    (6.21)

Unfortunately the likelihood function does not conveniently simplify in this case (as it did in the left-truncated spell data case).
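Even though (6.21) does not simplify, each contribution is easy to compute directly. A small sketch of my own, with an invented hazard sequence for one completed spell of length j = 3:

```python
import math

def loglik_right_truncated(hazards):
    # Contribution of one completed spell to an outflow sample:
    # L_i = f(j)/F(j) with f(j) = [h_j/(1-h_j)] * prod_k(1-h_k) and
    # F(j) = 1 - prod_k(1-h_k), as in eq. (6.21). hazards = [h_1, ..., h_j].
    surv = 1.0
    for h_k in hazards:
        surv *= (1 - h_k)
    h_j = hazards[-1]
    f = (h_j / (1 - h_j)) * surv   # discrete density f(j)
    F = 1 - surv                   # discrete failure function F(j)
    return math.log(f / F)

ll = loglik_right_truncated([0.1, 0.2, 0.25])
assert ll < 0.0   # f(j)/F(j) is a probability: f(j) is one term in the sum F(j)
```

A full estimation routine would sum such terms over spells and maximize over the parameters entering the h_k, which is why this case needs specially written code rather than a standard binary-response command.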
Chapter 7

Cox's proportional hazard model

This model was proposed by Cox (1972) in what is perhaps the most often cited article in survival analysis. The distinguishing feature of Cox's proportional hazard model, sometimes simply referred to as the 'Cox model', is its demonstration that one can estimate the relationship between the hazard rate and explanatory variables without having to make any assumptions about the shape of the baseline hazard function (cf. the parametric models considered earlier). Hence the Cox model is sometimes referred to as a semi-parametric model. The result derives from innovative use of the proportional hazard assumption, together with several other insights and assumptions, and a partial likelihood (PL) method of estimation rather than maximum likelihood.

Here follows an intuitive demonstration of how the model works, based on the explanation given by Allison (1984). We are working in continuous time, and suppose that we have a random sample of spells, some of which are censored and some of which are complete. Recall the PH specification

    λ(t, X_i) = λ_0(t) exp(β'X_i)    (7.1)

or, equivalently,

    λ(t, X_i) = λ_0(t) θ_i    (7.2)

where

    θ_i ≡ exp(β'X_i).    (7.3)

We shall suppose, for the moment, that the X vector is constant, but note that the Cox model can in fact also handle time-varying covariates. (The model can also be estimated using left-truncated data, but we shall not consider that case here.) Cox (1972) proposed a method for estimating the slope coefficients in β (i.e. excluding the intercept) without having to specify any functional form for the baseline hazard function, using the method of PL.
PL works in terms of the ordering of events, by contrast with the focus in ML on persons (spells). Consider the illustrative data set shown in Table 7.1, in which persons are arranged in order of their survival times.

    Person # i    Time t_i    Event # k
    1             2           1
    2             4           2
    3             5           3
    4             5*
    5             6           4
    6             9*
    7             11          5
    8             12*
    *: censored observation

    Table 7.1: Data structure for Cox model: example

The k indexes events, not persons. We assume that there is a maximum of one event at each possible survival time (this rules out ties in survival times – for which there are various standard ways of adapting the Cox model, noted below). The sample partial likelihood is given by

    PL = Π_{k=1}^{K} L_k.    (7.4)

This is quite different from the sample likelihood expressions we had before in the maximum likelihood case. But what then is each L_k? Think of it as the following:

    L_k = Pr(person i has the event at t = t_i, conditional on being in the risk set at t = t_i)    (7.5)

or, equivalently, the probability that this particular person i experiences the event at t = t_i, given that one observation amongst the many at risk experiences the event. To work out this probability – of all the people at risk of experiencing the event, which one in fact does experience it? – use the rules of conditional probability together with the result that the probability density for survival times is the product of the hazard rate and the survivor function, f(t) = λ(t)S(t), so that the probability that an event occurs in the tiny interval [t, t + dt) is f(t)dt = λ(t)S(t)dt.

Consider event k = 5, with risk set i ∈ {7, 8}. We can define

    A = Pr(event experienced by i = 7 and not i = 8) = [λ_7(11)S_7(11)dt] [S_8(11)]
    B = Pr(event experienced by i = 8 and not i = 7) = [λ_8(11)S_8(11)dt] [S_7(11)].
Now consider the expression for the probability of A conditional on the probability of either A or B (the chance that either one could have experienced the event, which is a sum of probabilities). Using the standard conditional probability formula, we find that

    L_5 = A/(A + B) = λ_7(11) / [λ_7(11) + λ_8(11)].    (7.6)

Note that the survivor function terms cancel. Now let us apply the PH assumption λ(t, X_i) = λ_0(t)θ_i, substituting into the expression for L_5 above:

    L_5 = λ_0(11)θ_7 / [λ_0(11)θ_7 + λ_0(11)θ_8] = θ_7 / (θ_7 + θ_8).    (7.7)

The baseline hazard contributions cancel. The intercept term, exp(β_0), which is common to all persons, also cancels from both the numerator and the denominator, since it appears in each of the θ terms; to estimate that constant scaling factor, one has to use other means, for example the baseline function estimates referred to below. This same idea can be applied to derive all the other L_k. For example, everyone is in the risk set for the first event:

    L_1 = λ_1(2) / [λ_1(2) + λ_2(2) + ... + λ_8(2)]    (7.8)
        = θ_1 / (θ_1 + θ_2 + ... + θ_8).    (7.9)

Given each L_k expression, one can construct the complete PL expression for the whole sample of events, and then maximize it to derive estimates of the slope coefficients within β. It has been shown that these estimates have 'nice' properties. If there are tied survival times, the neat results above do not hold exactly, and either one has to use various approximations (several standard ones are built into software packages by default), or modify the expressions to derive the 'exact' partial likelihood – though this may increase computational time substantially. (It turns out that the specification in the latter case is closely related to the conditional logit model.) And if one finds that the data contain many tied survival times, then perhaps one should ask whether a continuous time model is suitable in the first place.

Note that the baseline hazard function is completely unspecified. This can be seen as a great advantage (one avoids potential problems from specifying the wrong shape), but some may also see it as a disadvantage if one is particularly interested in the shape of the baseline hazard function for its own sake. On the other hand, one can derive an estimate of the baseline survivor and cumulative hazard functions (and thence of the baseline hazard) using methods analogous to the Kaplan-Meier procedure discussed earlier – with the same issues arising. The Cox PL model can also incorporate time-varying covariates; this is because the PL estimates are derived using only the information about the risk pool at each failure time.

Finally, observe that the expression for each L_k does not depend on the precise survival time at which the k-th event occurs: one does not need the exact survival times, because only the order of events affects the PL expression. Check this yourself: multiply all the survival times in the table by two and repeat the derivations (you should find that the estimates of the slope coefficients in β are the same).
incidence of tied survival times is relatively high in one’ data set. one uses episode splitting again. Thus covariates are only ‘ evaluated’ during the estimation at the failure times. di¤ering only by a constant multiplicative scaling factor that does not vary with survival time. In empirical implementation. By constrast with the approaches to this employed in the last two chapters.80
CHAPTER 7. given the common shape. and it does not matter what happens to their values in between. One could instead apply the discrete time proportional hazards model. observe that now the splitting need only be done at the failure times. What is the intuition? Remember that the PH assumption means implies that the hazard function for two di¤erent individuals has the same shape.
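The invariance of the partial likelihood to the time scale can be checked numerically. The sketch below is not from the text: it attaches a made-up binary covariate x to the spells of Table 7.1, computes the log partial likelihood at an arbitrary value of beta, and confirms that doubling every survival time leaves it unchanged, since only the ordering of events matters.

```python
import math

# Table 7.1 spells as (person, observed time, event flag); event=0 means censored.
spells = [(1, 2, 1), (2, 4, 1), (3, 5, 1), (4, 5, 0),
          (5, 6, 1), (6, 9, 0), (7, 11, 1), (8, 12, 0)]
# Hypothetical binary covariate (not in the text), added purely for illustration.
x = {1: 0.0, 2: 1.0, 3: 0.0, 4: 1.0, 5: 0.0, 6: 1.0, 7: 0.0, 8: 1.0}

def log_partial_likelihood(beta, spells, x):
    """Sum over events of log L_k = log lambda_i - log(sum of lambda_j in risk set)."""
    ll = 0.0
    for (i, t_i, event) in spells:
        if not event:
            continue
        # Risk set at t_i: everyone whose observed time is at least t_i.
        risk = [j for (j, t_j, _e) in spells if t_j >= t_i]
        lam_i = math.exp(beta * x[i])
        denom = sum(math.exp(beta * x[j]) for j in risk)
        ll += math.log(lam_i / denom)
    return ll

ll = log_partial_likelihood(0.5, spells, x)
doubled = [(i, 2 * t, e) for (i, t, e) in spells]
ll2 = log_partial_likelihood(0.5, doubled, x)
# Doubling all survival times preserves the event order, hence the PL value.
```

Because the risk sets depend only on the ordering of the observed times, `ll` and `ll2` are identical, which is the invariance property discussed above.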
Chapter 8

Unobserved heterogeneity ('frailty')
In the multivariate models considered so far, all differences between individuals were assumed to be captured using observed explanatory variables (the X vector). We now consider generalisations of the earlier models to allow for unobserved individual effects. These are usually referred to as 'frailty' in the biomedical sciences. (If one is modelling human survival times, then frailty is an unobserved propensity to experience an adverse health event.) There are several reasons why such variables might be relevant. For example: omitted variables (unobserved in the available data, or intrinsically unobservable such as 'ability'), and measurement errors in observed survival times or regressors (see Lancaster, 1990, chapter 4).

What if these effects are important but 'ignored' in modelling? The literature suggests several findings:

- The 'no-frailty' model will over-estimate the degree of negative duration dependence in the hazard (i.e. under-estimate the degree of positive duration dependence).
- The proportionate response of the hazard rate to a change in a regressor $X_k$ is no longer constant (it was given by $\beta_k$ in the models without unobserved heterogeneity), but declines with time. Other things being equal, one gets an under-estimate of the true proportionate response of the hazard to a change in a regressor $X_k$ from the no-frailty model.

* Insert figure *

Let us now look at these results in more detail.

8.1 Continuous time case

For convenience we suppress the subscript indexing individuals, and assume for now that there are no time-varying covariates. We consider the model

$$\theta(t, X \mid v) = v\,\theta(t, X) \qquad (8.1)$$

where $\theta(t, X)$ is the hazard rate depending on observable characteristics X, and v is an unobservable individual effect that scales the no-frailty component. The random variable v is assumed to have the following properties:

$$v > 0, \quad \text{finite variance } \sigma^2 > 0, \quad E(v) = 1 \text{ (unit mean, a normalisation required for identification)}, \qquad (8.2)$$

and v is distributed independently of t and X. This model is sometimes referred to as a 'mixture' model (think of the two components being 'mixed' together), or as a 'mixed proportional hazard' model (MPH). If the no-frailty hazard component has Proportional Hazards form, $\theta(t, X) = \theta_0(t)e^{\beta'X}$, then:

$$\theta(t, X \mid v) = v\,\theta_0(t)\,e^{\beta'X} \qquad (8.4)$$
$$\log[\theta(t, X \mid v)] = \log \theta_0(t) + \beta'X + u \qquad (8.5)$$

where $u \equiv \log(v)$ and $E(u) = \mu$. In the no-frailty model, the log of the hazard rate at each survival time t equals the log of the baseline hazard rate at t (common to all persons) plus an additive individual-specific component ($\beta'X$). The frailty model for the log-hazard adds an additional additive 'error' term (u). Alternatively, think of this as a random intercept model: the intercept is $\beta_0 + u$.

It can be shown, using the standard relationship between the hazard rate and survivor function (see chapter 3), that the relationship between the frailty survivor function and the no-frailty survivor function is

$$S(t, X \mid v) = [S(t, X)]^v. \qquad (8.6)$$

Thus the individual effect v scales the no-frailty component survivor function. Individuals with above-average values of v leave relatively fast (their hazard rate is higher, other things being equal, and their survival times are smaller), and the opposite occurs for individuals with below-average values of v.

How does one estimate frailty models, given that the individual effect is unobserved? Clearly we cannot estimate the values of v themselves: by construction, they are unobserved, there are as many individual effects as individuals in the data set, and there are not enough degrees of freedom left to fit these parameters. However, if we suppose the distribution of v has a shape whose functional form is summarised in terms of only a few key parameters, then we can estimate those parameters with the data available. The steps are as follows:

1. Specify a distribution for the random variable v, where this distribution has a particular parametric functional form (e.g. Gamma), with unit mean and variance $\sigma^2$.
2. Write the likelihood function so that it refers to the distributional parameter(s) rather than each v, otherwise known as 'integrating out' the random individual effect. This means that one works with a survivor function $S_v(t, X)$ that no longer conditions on v, and not $S(t, X)$:

$$S_v(t, X) = \int_0^{\infty} [S(t, X)]^v\, g(v)\, dv \qquad (8.7)$$

where g(v) is the probability density function (pdf) for v. The g(v) is the 'mixing' distribution.

But what shape is appropriate to use for the distribution of v (among the potential candidates satisfying the assumptions given earlier)? The most commonly used specification for the mixing distribution is the Gamma distribution, with unit mean and variance $\sigma^2$. Making the relevant substitutions into $S_v(t, X)$ and evaluating the integral implies a specific functional form for the frailty survivor function:

$$S(t, X \mid \beta, \sigma^2) = [1 - \sigma^2 \ln S(t, X)]^{-1/\sigma^2} \qquad (8.8)$$

or, using the property that the integrated hazard function is $H(t, X) = -\ln S(t, X)$,

$$S(t, X \mid \beta, \sigma^2) = [1 + \sigma^2 H(t, X)]^{-1/\sigma^2} \qquad (8.9)$$

where $S(t, X)$ is the no-frailty survivor function. (Note that $\lim_{\sigma^2 \to 0} S(t, X \mid \beta, \sigma^2) = S(t, X)$.)

Now consider what this implies if the no-frailty part follows a Weibull model, in which case $S(t) = \exp(-\lambda t^{\alpha})$, $H(t) = \lambda t^{\alpha}$, with $\lambda = \exp(\beta'X)$. Substituting into (8.9),

$$S(t, X \mid \beta, \sigma^2) = (1 + \sigma^2 \lambda t^{\alpha})^{-1/\sigma^2} \qquad (8.10)$$

for which one may calculate that the median survival time is given by

$$m_v = \left(\frac{2^{\sigma^2} - 1}{\sigma^2 \lambda}\right)^{1/\alpha}. \qquad (8.11)$$
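The 'integrating out' in (8.7) and the closed forms (8.10) and (8.11) can be verified numerically. A minimal sketch (the parameter values for lambda, alpha and sigma-squared are made up for illustration):

```python
import math

# Illustrative Weibull no-frailty parameters and frailty variance (made up).
lam, alpha, s2 = 0.4, 1.5, 0.8

def S0(t):
    """No-frailty Weibull survivor function, S(t) = exp(-lam * t**alpha)."""
    return math.exp(-lam * t**alpha)

def gamma_pdf(v):
    """Gamma mixing density with unit mean and variance s2 (shape 1/s2, scale s2)."""
    k, theta = 1.0 / s2, s2
    return v**(k - 1) * math.exp(-v / theta) / (math.gamma(k) * theta**k)

def Sv_numeric(t, n=20000, vmax=25.0):
    """Midpoint-rule evaluation of the mixture integral, eq. (8.7)."""
    h = vmax / n
    return sum(S0(t)**((i + 0.5) * h) * gamma_pdf((i + 0.5) * h) * h
               for i in range(n))

def Sv_closed(t):
    """Closed-form Gamma-frailty Weibull survivor function, eq. (8.10)."""
    return (1.0 + s2 * lam * t**alpha) ** (-1.0 / s2)

# Median survival time under frailty, eq. (8.11).
median = ((2**s2 - 1) / (s2 * lam)) ** (1.0 / alpha)
```

The numerical integral and the closed form agree, and the survivor function evaluated at the computed median equals one half, as (8.11) requires.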
This may be compared with the expression for the median in the no-frailty case for the Weibull model, $[(\log 2)/\lambda]^{1/\alpha}$, which is the corresponding expression as $\sigma^2 \to 0$. This survivor function (and median) is an unconditional one, i.e. not conditioning on a value of v (that has been integrated out). An alternative approach is to derive the survivor function (and its quantiles) for specific values of v (these are conditional survivor functions). Of the specific values, the most obvious choice is v = 1 (the mean value), but other values such as the upper or lower quartiles could also be used. (The estimates of the quartiles of the heterogeneity distribution can be derived using the estimates of $\sigma^2$.)

An alternative mixing distribution to the Gamma distribution is the Inverse Gaussian distribution. This is less commonly used (but it is available in Stata, along with the Gamma mixture model). For further discussion, see e.g. Lancaster (1990), the Stata manuals under streg, or EC968 Web Lesson 8.

8.2 Discrete time case

Recall that, in the continuous time proportional hazard model with frailty, the log of the hazard at interval j is

$$\log[\theta(j, X \mid v)] = \log[\theta_0(j)] + \beta'X + u. \qquad (8.12)$$

Recall too the discrete time model for the situation where survival times are grouped (leading to the cloglog model), as discussed in the previous chapter. By the same arguments as those used above, one can have a discrete time PH model with unobserved heterogeneity:

$$\operatorname{cloglog}[h(j, X \mid v)] = D(j) + \beta'X + u \qquad (8.13)$$

where $u \equiv \log(v)$, as above, and D(j) characterises the baseline hazard function. If v has a Gamma distribution, as proposed by Meyer (1990), there is a closed form expression for the frailty survivor function (see (8.9) above, with the appropriate substitutions made), and hence for the likelihood contributions. If we have an inflow sample with right censoring, the contribution to the sample likelihood for a censored observation with spell length j intervals is $S(j, X \mid \beta, \sigma^2)$, and the contribution of someone who makes a transition in the jth interval is $S(j-1, X \mid \beta, \sigma^2) - S(j, X \mid \beta, \sigma^2)$. This model can be estimated using my Stata program pgmhaz8 (or pgmhaz for users of Stata versions 5-7). The data should be organized in person-period form, as discussed in the previous chapter, and time-varying covariates may be incorporated.

Alternatively, one may suppose that u has a Normal distribution with mean zero and finite variance. In this case, there is no convenient closed form expression for the survivor function, and hence for the likelihood contributions: the 'integrating out' must be done numerically. Estimation may be done using the built-in Stata program xtcloglog, again using data organized in person-period form.
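The person-period data organization that these programs require can be sketched as follows. This is an illustrative reshaping only (not the pgmhaz8 or xtcloglog implementation): each spell of length j becomes j rows, with a transition indicator equal to one only in the final interval of a completed spell.

```python
# Spell-level data: (person id, spell length in intervals, event flag).
# event = 1 means a transition in the final interval; 0 means right censored.
spells = [("A", 3, 1),   # person A: transition in interval 3
          ("B", 2, 0)]   # person B: censored after interval 2

rows = []
for pid, j_max, event in spells:
    for j in range(1, j_max + 1):
        y = 1 if (event == 1 and j == j_max) else 0   # interval transition indicator
        rows.append({"id": pid, "interval": j, "y": y})
# Time-varying covariates would be merged onto these rows, one value per interval.
```

Each row then contributes a Bernoulli term with probability h(j, X) to the likelihood, which is what makes estimation with standard binary-response routines possible.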
For the logit model, one might suppose now that the odds of the interval hazard take the form

$$\frac{h(j, X \mid e)}{1 - h(j, X \mid e)} = \frac{h_0(j)}{1 - h_0(j)}\,\exp(\beta'X + e) \qquad (8.14)$$

which leads to the model with

$$\operatorname{logit}[h(j, X \mid e)] = D(j) + \beta'X + e \qquad (8.15)$$

where e is an 'error' term with mean zero, as earlier. If one assumes that e has a Normal distribution with mean zero, one has a model that can be estimated using the Stata program xtlogit applied to data organized in person-period form.

All the approaches mentioned so far are parametric. A non-parametric approach to characterising the frailty distribution was pioneered in the econometrics literature by Heckman and Singer (1984). Once again we have a random intercept model, but the randomness is characterised by a discrete distribution rather than a continuous (e.g. Gamma) distribution: we have a discrete (multinomial) rather than a continuous mixing distribution. The idea is essentially that one fits an arbitrary distribution using a set of parameters, comprising a set of 'mass points' and the probabilities of a person being located at each mass point. Sociologists might recognize the specification as a form of 'latent class' model: the process describing time-to-event now differs between a number of classes (groups) within the population.

To be concrete, consider the discrete time proportional hazards model, for which the interval hazard rate is given (see Chapter 3) by

$$h(j, X) = 1 - \exp[-\exp(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_K X_K + \gamma_j)].$$

Suppose there are two types of individual in the population (where this feature is not observed). We can incorporate this idea by allowing the intercept $\beta_0$ to vary between the two classes, i.e.

$$h_1(j, X) = 1 - \exp[-\exp(\mu_1 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_K X_K + \gamma_j)] \quad \text{for Type 1} \qquad (8.16)$$
$$h_2(j, X) = 1 - \exp[-\exp(\mu_2 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_K X_K + \gamma_j)] \quad \text{for Type 2.} \qquad (8.17)$$

If $\mu_2 > \mu_1$, then Type 2 people are fast exiters relative to Type 1 people, other things being equal. If we have an inflow sample with right censoring, the contribution to the sample likelihood for a person with spell length j intervals is the probability-weighted sum of the contributions arising were he a Type 1 or a Type 2 person, i.e.

$$L = \pi L^{(1)} + (1 - \pi)L^{(2)} \qquad (8.18)$$

where $\pi$ is the probability of belonging to Type 1, c is the censoring indicator (equal to 1 if the spell ends in a transition, 0 if it is censored), and

$$L^{(1)} = \left[\frac{h_1(j, X)}{1 - h_1(j, X)}\right]^{c} \prod_{k=1}^{j} [1 - h_1(k, X)] \qquad (8.19)$$
$$L^{(2)} = \left[\frac{h_2(j, X)}{1 - h_2(j, X)}\right]^{c} \prod_{k=1}^{j} [1 - h_2(k, X)]. \qquad (8.20)$$

Where there are M latent classes, the likelihood contribution for a person with spell length j is

$$L = \sum_{m=1}^{M} \pi_m L^{(m)}, \qquad (8.21)$$

with $\sum_m \pi_m = 1$. The $\mu_m$ are the M 'mass point' parameters describing the support of the discrete multinomial distribution, and the $\pi_m$ are their corresponding probabilities. The number of mass points to choose is not obvious a priori. In practice, researchers have often used a small number (e.g. M = 2 or M = 3) and conducted inference conditional on this choice. For a discussion of the 'optimal' number of mass points, using so-called Gâteaux derivatives, see e.g. Lancaster (1990, chapter 9). This model can be estimated in Stata using my program hshaz (get it by typing ssc install hshaz in an up-to-date version of Stata) or using Sophia Rabe-Hesketh's program gllamm (ssc install gllamm).

8.3 What if unobserved heterogeneity is 'important' but ignored?

In this section, we provide more details behind the results that were stated at the beginning of the chapter. See Lancaster (1979, 1990) for a more extensive treatment (on which I have relied heavily). We suppose that the 'true' model, with no omitted regressors, takes the PH form and there is a continuous mixing distribution:

$$\theta(t, X \mid v) = v\,\theta_0(t)\,\lambda, \quad \text{where } \lambda = e^{\beta'X}. \qquad (8.22)$$

The model implies

$$\frac{\partial \log[\theta(t, X \mid v)]}{\partial X_k} = \beta_k. \qquad (8.23)$$

I.e. the proportionate response of the hazard to some included variable $X_k$ is given by the coefficient $\beta_k$. This parameter is constant and does not depend on t (or X). We suppose that the 'observed' model, with omitted regressors, is

$$\theta_1(t, X_1 \mid v) = v\,\theta_0(t)\,\lambda_1, \quad \text{where } \lambda_1 = e^{\beta_1'X_1} \qquad (8.24), (8.25)$$

and $X_1$ is a subset of X.
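Returning briefly to the discrete mixture model of the previous section: the probability-weighted likelihood contribution in (8.18) to (8.20) is straightforward to compute directly. A sketch with made-up parameter values follows; a useful check is that, for a fixed censoring horizon, the likelihood contributions over all possible outcomes sum to one.

```python
import math

def h(mu, gamma_j, bx):
    """Discrete-time PH (cloglog) interval hazard: 1 - exp(-exp(mu + b'x + gamma_j))."""
    return 1.0 - math.exp(-math.exp(mu + bx + gamma_j))

def class_lik(mu, gammas, bx, j, event):
    """Likelihood for one latent class: survive intervals 1..j-1, then the jth term."""
    lik = 1.0
    for k in range(j - 1):
        lik *= 1.0 - h(mu, gammas[k], bx)
    last = h(mu, gammas[j - 1], bx)
    return lik * (last if event else 1.0 - last)

def mixture_lik(pi, mu1, mu2, gammas, bx, j, event):
    """Probability-weighted sum over the two unobserved types, eq. (8.18)."""
    return pi * class_lik(mu1, gammas, bx, j, event) \
         + (1 - pi) * class_lik(mu2, gammas, bx, j, event)

# Example evaluation (all values made up): transition in interval 2 of 3.
lik_event2 = mixture_lik(0.4, -1.0, 0.5, [0.0, 0.0, 0.0], 0.3, 2, 1)
```

Summing the mixture contribution over 'transition in interval 1, 2 or 3' plus 'censored at interval 3' telescopes to one, confirming the expressions form a proper probability model.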
Assuming $v \sim \text{Gamma}(1, \sigma^2)$, then

$$S_1(t \mid \sigma^2) = [1 + \sigma^2 H_1(t)]^{-1/\sigma^2} \qquad (8.26)$$

and

$$\theta_1(t \mid \sigma^2) = [S_1(t \mid \sigma^2)]^{\sigma^2}\,\theta_0(t)\,\lambda_1. \qquad (8.27)$$

8.3.1 The duration dependence effect

Look at the ratio of the hazards from the two models:

$$\frac{\theta_1(t \mid \sigma^2)}{\theta_0(t)\,\lambda_1} = [S_1(t \mid \sigma^2)]^{\sigma^2} = \frac{1}{1 + \sigma^2 H_1(t)} \qquad (8.28), (8.29)$$

which is monotonically decreasing with t, for all $\sigma^2 > 0$. In sum, the hazard rate from a model with omitted regressors increases less fast, or falls faster, than does the 'true' hazard (from the model with no omitted regressors). Thus the 'true' model exhibits more positive duration dependence than the model without this extra heterogeneity incorporated.

The intuition is of a selection or 'weeding out' effect. Controlling for observable differences, people with unobserved characteristics associated with higher exit rates leave the state more quickly than others. Hence 'survivors' at longer t increasingly comprise those with low v which, in turn, implies a lower hazard.

Bergström and Edin (1992) illustrate the argument nicely in the context of a Weibull model for the no-frailty component. Recall that the Weibull model in the AFT representation is written $\log T = \beta_*'X + u$, with the variance of the 'residuals' given by $\sigma_u^2 \equiv \text{var}(u)$. But recall that $\sigma_u \propto 1/\alpha$, where $\alpha$ is the Weibull shape parameter we discussed earlier. If one added the (unobserved) heterogeneity to the systematic part of the model, then one would expect the variance of the residuals to fall correspondingly, and a decline in $\sigma_u$ is equivalent to a rise in $\alpha$. Conversely, leaving the heterogeneity in the residual inflates $\sigma_u$, which is equivalent to a decline in $\alpha$, i.e. to understating the degree of positive duration dependence. (8.30)
8.3.2 The proportionate response of the hazard to variations in a characteristic

We consider now the proportionate response of the hazard to a variation in $X_k$, where $X_k$ is an included regressor (part of $X_1$). We know that the proportionate response in the 'true' model is

$$\frac{\partial \log \theta}{\partial X_k} = \beta_k. \qquad (8.31)$$

Suppose one wants to estimate $\beta_k$, with its nice interpretation as being the proportionate response of the hazard in the 'true' model. But with omitted regressors, the response differs. In the observed model, using (8.27), we have

$$\frac{\partial \log \theta_1}{\partial X_k} = \frac{\partial}{\partial X_k}\left\{\sigma^2 \log S_1(t \mid \sigma^2) + \log \theta_0(t) + \beta_1'X_1\right\} = \beta_k + \frac{\sigma^2}{S_1(t \mid \sigma^2)}\,\frac{\partial S_1(t \mid \sigma^2)}{\partial X_k}. \qquad (8.32), (8.33)$$

But

$$\frac{\partial S_1(t \mid \sigma^2)}{\partial X_k} = \frac{\partial}{\partial X_k}\,[1 + \sigma^2 H_1(t)]^{-1/\sigma^2} = -[1 + \sigma^2 H_1(t)]^{-\frac{1}{\sigma^2}-1}\,\frac{\partial H_1(t)}{\partial X_k} \qquad (8.34), (8.35)$$

and

$$\frac{\partial H_1(t)}{\partial X_k} = \beta_k H_1(t), \qquad (8.36)$$

since variation in $X_k$ causes equiproportionate variation in $H_1$ at all t in the 'true' model. Substituting back into the earlier expression, we have

$$\frac{\partial \log \theta_1}{\partial X_k} = \beta_k - \frac{\sigma^2 \beta_k H_1(t)}{1 + \sigma^2 H_1(t)} = \frac{\beta_k}{1 + \sigma^2 H_1(t)}. \qquad (8.37), (8.38)$$

Note that

$$1 + \sigma^2 H_1(t) \geq 1, \quad \text{or alternatively } 0 < \frac{1}{1 + \sigma^2 H_1(t)} \leq 1, \quad \text{and this factor tends to zero as } t \to \infty. \qquad (8.39)$$

The implications are twofold:

1. The omitted regressor model provides an under-estimate (in the modulus) of the proportionate response of the hazard to a variation in $X_k$.
2. In the observed model, the proportionate effect tends to zero as $t \to \infty$.

It is a 'weeding out' effect again. The intuition is as follows. Suppose there are two groups of person, A and B, where group A members have a value of $X_k$ such that, on average, the hazard rate is higher for A than for B at each time t. Group A members with high v (higher hazard) leave first, implying that the average hazard among the surviving A members falls and becomes more similar to the average hazard among the B members. In general, the elasticity of the hazard rate at any t varies with the ratio of the average group hazards: the ratio of A to B declines as t increases, and so the proportionate response falls.
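The attenuation result in (8.38) can be illustrated numerically. The sketch below (parameter values are made up) uses a Weibull 'true' hazard with Gamma frailty integrated out, so that the observed hazard is $\theta(t, X)/(1 + \sigma^2 H(t, X))$, and compares a numerical derivative of the log observed hazard with the analytical expression $\beta_k/(1 + \sigma^2 H_1(t))$:

```python
import math

# Illustrative Weibull 'true' hazard components and frailty variance (made up).
alpha, s2, beta_k = 1.3, 0.6, 0.9

def H(t, x):
    """Integrated hazard of the observed model: H = t**alpha * exp(beta_k * x)."""
    return t**alpha * math.exp(beta_k * x)

def log_theta_obs(t, x):
    """Log of the frailty-averaged (observed) hazard: theta(t,x) / (1 + s2*H(t,x))."""
    theta = alpha * t**(alpha - 1) * math.exp(beta_k * x)
    return math.log(theta / (1.0 + s2 * H(t, x)))

def elasticity(t, x, eps=1e-6):
    """Numerical d log(theta_obs) / dx, to compare against eq. (8.38)."""
    return (log_theta_obs(t, x + eps) - log_theta_obs(t, x)) / eps
```

The numerical elasticity matches $\beta_k/(1 + \sigma^2 H)$ and shrinks toward zero at longer durations, exactly the two implications listed above.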
8.4 Empirical practice
These results suggest that taking account of unobserved heterogeneity is a potentially important part of empirical work. It has often not been so in the past, partly because of a lack of available software, but that constraint is now less relevant. One can estimate frailty models and test whether unobserved heterogeneity is relevant using likelihood ratio tests based on the restricted and unrestricted models. (Alternatively, there are 'score' tests based on the restricted model.)

The early empirical social science literature found that conclusions about whether or not frailty was 'important' (effects on the estimate of duration dependence and the estimates of $\beta$) appeared to be sensitive to the choice of shape of the distribution for v. Some argued that the choice of distributional shape was essentially 'arbitrary', and this stimulated the development of non-parametric methods, including the discrete mixture methods briefly referred to earlier. Subsequent empirical work suggests, however, that the effects of unobserved heterogeneity are mitigated, and estimates thence more robust, if the analyst uses a flexible baseline hazard specification. (The earlier literature had typically used specifications, often the Weibull one, that were not flexible enough.) See e.g. Dolton and van der Klaauw (1995). All in all, the topic underscores the importance of getting good data, including a wide range of explanatory variables that summarize well the differences between individuals.

Observe that virtually all commonly-available estimation software programs assume that one has data from an inflow or population sample. To properly account for left (or right) truncation requires special programs. The likelihood contribution for a person with a left truncated spell still requires conditioning on survival up to the truncation date, but the expression for the survivor function now involves factors related to the unobserved heterogeneity. Similarly, the 'convenient cancelling' results used to derive an easy estimation method for discrete time models no longer apply.
Chapter 9

Competing risks models

Until now, we have modelled transitions out of the current state (exit to any state from the current one). Now we consider the possibility of exit to one of several destination states. For illustration, we will suppose that there are two destination states A and B, but the arguments generalise to any number of destinations. The continuous time case will be considered first, and then the discrete time one (for both the interval-censored and 'pure' discrete cases).

Suppose exits to A have survival times characterised by density function $f_A(t)$ and latent failure time $T_A$, and exits to B have density function $f_B(t)$ and latent failure time $T_B$; censoring occurs at a spell length $T_C$. What we observe in the data is either (i) no event at all (a censored case, with a spell length $T_C$), or (ii) an exit, which is to A or to B (and we know which is which). The observed failure time is $T = \min\{T_A, T_B, T_C\}$. What we are seeking are methods to estimate the destination-specific hazard rates given data in this form. Each destination-specific hazard rate can be thought of as the hazard rate that would apply were transitions to all the other destinations not possible. If this were so, we would be able to link the observed hazards with that destination-specific hazard; however, as there are competing risks, the hazard rates are 'latent' rather than observed in this way. We shall see that the assumption of independence of the competing risks makes several of the models straightforward to estimate.
9.1 Continuous time data

Define $\theta_A(t)$, the latent hazard rate of exit to destination A; $\theta_B(t)$, the latent hazard rate of exit to destination B; and $\theta(t)$, the hazard rate for exit to any destination. Assume that $\theta_A$ and $\theta_B$ are independent. This implies that

$$\theta(t) = \theta_A(t) + \theta_B(t). \qquad (9.1)$$

I.e. the hazard rate for exit to any destination is the sum of the destination-specific hazard rates. (Cf. probabilities: $\Pr(A \text{ or } B) = \Pr(A) + \Pr(B)$ if A, B independent. So $\theta(t)dt = \theta_A(t)dt + \theta_B(t)dt$.) Independence also means that the survivor function for exit to any destination can be factored into a product of destination-specific survivor functions:

$$S(t) = \exp\left(-\int_0^t [\theta_A(u) + \theta_B(u)]\,du\right) = \exp\left(-\int_0^t \theta_A(u)\,du\right)\exp\left(-\int_0^t \theta_B(u)\,du\right) = S_A(t)\,S_B(t). \qquad (9.2)-(9.5)$$

The derivation uses the property $e^{a+b} = e^a e^b$. Now define destination-specific censoring indicators:

$$\delta_A = \begin{cases} 1 & \text{if } i \text{ exits to } A \\ 0 & \text{otherwise (exit to B, or censored)} \end{cases} \qquad (9.6)$$

$$\delta_B = \begin{cases} 1 & \text{if } i \text{ exits to } B \\ 0 & \text{otherwise (exit to A, or censored)} \end{cases} \qquad (9.7)$$

The individual sample likelihood contribution in the independent competing risk model with two destinations is of three types:

- $L_A$: exit to A, where $L_A = f_A(T)\,S_B(T)$;
- $L_B$: exit to B, where $L_B = f_B(T)\,S_A(T)$;
- $L_C$: censored spell, where $L_C = S(T) = S_A(T)\,S_B(T)$.

In the $L_A$ case, the likelihood contribution summarises the chances of a transition to A combined with no transition to B, and vice versa in the $L_B$ case. The overall contribution from the individual to the likelihood, L, is

$$L = (L_A)^{\delta_A}(L_B)^{\delta_B}(L_C)^{1-\delta_A-\delta_B} \qquad (9.8)$$
$$= [f_A(T)S_B(T)]^{\delta_A}\,[f_B(T)S_A(T)]^{\delta_B}\,[S_A(T)S_B(T)]^{1-\delta_A-\delta_B} \qquad (9.9)$$
$$= \left\{[\theta_A(T)]^{\delta_A}S_A(T)\right\}\left\{[\theta_B(T)]^{\delta_B}S_B(T)\right\} \qquad (9.10), (9.11)$$

(using $f_A(T) = \theta_A(T)S_A(T)$ and similarly for B), or

$$\ln L = \left\{\delta_A \ln \theta_A(T) + \ln S_A(T)\right\} + \left\{\delta_B \ln \theta_B(T) + \ln S_B(T)\right\}. \qquad (9.12)$$

The log-likelihood for the sample as a whole is the sum of this expression over all individuals in the sample. Hence, assuming independent risks, the (log-)likelihood for the continuous time competing risk model with two destination states factors into two parts, each of which depends only on parameters specific to that destination. Hence one can maximise the overall (log-)likelihood by maximising the two component parts separately. The overall model likelihood value is the sum of the likelihood values for each of the destination-specific models. These results generalize straightforwardly to the situation with more than two independent competing risks. This means that the model can be estimated very easily.
Simply define new destination-specific censoring variables (as above) and then estimate separate models for each destination state. These results are extremely convenient if one is only interested in estimating the competing risks model. Often, however, one is also interested in testing hypotheses involving restrictions across the destination-specific hazards. In particular, one is typically interested in testing whether the same model applies for transitions to each destination or whether the models differ (and how). But introducing such restrictions means that, in principle, one has to jointly estimate the hazards: under the null hypotheses with restrictions, the likelihood is no longer separable.

Fortunately there are some simple methods for testing a range of hypotheses that can be implemented using estimation of single-risk models, as long as one assumes that the hazard rates have a proportional hazard form. See Narendranathan and Stewart (1991). The first hypothesis they consider is equality across the destination-specific hazards of all parameters except the intercepts, which is equivalent to supposing that the ratios of destination-specific hazards are independent of survival time and the same for all individuals. The second, and weaker, hypothesis is that the hazards need not be constant over t but are equal across individuals. This is equivalent to supposing that the conditional probabilities of exit to a particular state at a particular elapsed survival time are the same for all individuals. Implementation of the first test involves use of likelihood values from estimation of the destination-specific models and of the single risk model (not distinguishing between exit types). Implementation of the second test requires similar information, including the number of exits to each destination at each observed survival time.

9.2 Intrinsically discrete time data

As before, let us distinguish between the cases in which survival times are intrinsically discrete and in which they arise from interval censoring (survival times are intrinsically continuous, but are observed grouped into intervals), beginning with the former. Recall that in the continuous time case, the overall (continuous time) hazard equals the sum of the destination-specific hazards. Here there is a parallel: the discrete hazard rate for exit at time j to any destination is the sum of the destination-specific discrete hazard rates,

$$h(j) = h_A(j) + h_B(j). \qquad (9.13)$$

Because survival times are intrinsically discrete, if there is an exit to one of the destinations at a given survival time, then there cannot be an exit to the other destination at the same survival time: exits to only one destination are feasible at any given instant.

However, this property does not lead to a neat separability result for the likelihood analogous to that for the continuous time case. To see why, consider the likelihood contributions for the discrete time model. There are three types: that for an individual exiting to A ($L_A$), that for an individual exiting to B ($L_B$), and that for a censored case ($L_C$). Supposing that the observed survival time for an individual is j cycles, then:

$$L_A = h_A(j)\,S(j-1) = \frac{h_A(j)}{1 - h(j)}\,S(j) = \frac{h_A(j)}{1 - h_A(j) - h_B(j)}\,S(j) \qquad (9.14)-(9.16)$$

and similarly,

$$L_B = h_B(j)\,S(j-1) = \frac{h_B(j)}{1 - h_A(j) - h_B(j)}\,S(j) \qquad (9.17)-(9.19)$$

while

$$L_C = S(j). \qquad (9.20)$$

There is a common term in each expression summarising the overall probability of survival for j cycles, S(j). In the continuous time case, the analogous expression could be rewritten as the product of the destination-specific survivor functions. But this result does not carry over to here:

$$S(j) = \prod_{k=1}^{j}[1 - h(k)] = \prod_{k=1}^{j}[1 - h_A(k) - h_B(k)]. \qquad (9.21)$$
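The failure of the factoring in (9.21) can be seen with a small numerical example (the hazard values are made up):

```python
# Destination-specific discrete hazards for three periods (illustrative values).
hA = [0.10, 0.15, 0.20]
hB = [0.05, 0.05, 0.10]

S_joint = 1.0       # correct discrete-time survivor: product of (1 - hA - hB)
S_factored = 1.0    # continuous-time-style factoring: product of (1 - hA)(1 - hB)
for a, b in zip(hA, hB):
    S_joint *= (1.0 - a - b)
    S_factored *= (1.0 - a) * (1.0 - b)
# S_factored exceeds S_joint, so the continuous-time factoring over-states survival.
```

The discrepancy is the cross-product terms $h_A(k)h_B(k)$, which are absent in discrete time because both exits cannot happen in the same period; this is precisely why the likelihood is no longer separable.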
However, as Allison (1982) has demonstrated, there is still a straightforward means of estimating an independent competing risks model: the likelihood has the same form as the likelihood for a standard multinomial logit model applied to reorganised data.
The overall likelihood contribution for an individual with an observed spell length of j cycles is: L = (LA ) = 1
j Y
A
(LB )
B
LC
1
A
B
A
B
hA (j) hA (j) hB (j) [1 hA (k)
1
hB (j) hA (j) hB (j) (9. the likelihood contribution for the individual with spell length j can be written: L = exp( 0 X) A 1 + exp( 0 X) + exp( A
A
0 B X) 1 0 B X)
exp( 0 X) B 1 + exp( 0 X) + exp( A
A B
B
0 B X)
1 1 + exp( 0 X) + exp( A
j 1 Y
1 1 + exp(
0 A X)
k=1
+ exp(
0 B X)
:
(9.25)
hB (k) =
+ exp(
0 B X)
(9. which we refer back to later on. is
A
h (j) L = S(j) 1 h (j)
+
B
A
B
hA (j) h (j)
hB (j) h (j)
:
(9. it turns out that there is still a straightforward means of estimating an independent competing risk model.23)
Although there is no neat separability result in this case.9. To estimate the model. as Allison (1982) pointed out.2.24)
hB (k) = and hence 1 hA (k)
0 B X)
(9. there are four steps:
. The ‘ trick’ is to assume a particular form for the destinationspeci…c hazards: hA (k) = exp( 0 X) A 1 + exp( 0 X) + exp( A exp( 0 X) B 1 + exp( 0 X) + exp( A 1 1 + exp(
0 A X) 0 B X)
(9.22)
hB (k)]
k=1
Another way of writing the likelihood shows that it has the same form as the likelihood for a standard multinomial logit model applied to suitably reorganised data. In effect, censoring is being treated as another type of destination, call it C. However, in the multinomial logit model there is an identification issue: there is more than one set of estimates that would lead to the same probabilities of the outcomes observed. One cannot estimate a set of coefficients (call them γ_C) for the third process in addition to the other two sets of parameters (β_A, β_B). As a consequence, one of the sets of parameters is set equal to zero. In principle it does not matter which, but in the current context it is intuitively appealing to set γ_C = 0. The other coefficients (β_A, β_B) then measure changes in probabilities relative to the censored (no event) outcome. The ratio of the probability of exit to destination A to the probability of no exit at all is exp(β_A), with an analogous interpretation for exp(β_B). This is what the Stata Reference Manuals (in the mlogit entry) refer to as the 'relative risk'. And as the manual explains, the exponentiated value of a coefficient is the relative risk ratio for a one unit change in the corresponding variable, it being understood that risk is measured as the risk of the category relative to the base category (no event in this case).

To estimate the model:

1. Expand the data into person-period form (as discussed in Chapter 6 for single-destination discrete time models).

2. Construct a dependent variable for each person-period observation, setting the reference (base) category equal to 0. This takes the value 0 for all censored observations in the reorganised data set. (For persons with censored spells, all observations are censored; for persons with a completed spell, all observations are censored except the final one.) For persons with an exit to destination A in the final period observed, set the dependent variable equal to 1, and for those with an exit to destination B in the final period observed, set the dependent variable equal to 2. (If there are more destinations, create additional categories of the dependent variable.) Observe that the particular values for the dependent variable that are chosen do not matter. What is important is that they are distinct (and also that one's software knows which value corresponds to the base category).

3. Construct any other variables required, in particular variables summarising duration-dependence in the destination-specific hazards. Other time-varying covariates may also be constructed.

4. Estimate the model using a multinomial logit program.

A nice feature of this 'multinomial logit' hazard model is that it is straightforward to test hypotheses about whether the destination-specific discrete hazards have common determinants. Using software packages such as Stata, it is straightforward to apply Wald tests of whether, for example, corresponding coefficients are equal or not. One can also estimate models in which some components of the destination-specific hazards, for example the duration-dependence specifications, are constrained to take the same form.

One potential disadvantage of estimating the model using the multinomial logit programs in standard software is that these typically require that the same set of covariates appears in each equation.
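The data reorganisation in steps 1 and 2 can be sketched in Python. The spell tuples, field names, and example data below are illustrative assumptions, not from the text; the resulting person-period rows would then be passed to any multinomial logit routine (for example Stata's mlogit, as suggested above).

```python
# Sketch of steps 1-2: reorganising spell data into person-period form for the
# "multinomial logit" competing-risks hazard model. Data layout is hypothetical.

def expand_spells(spells):
    """Expand one row per spell into one row per person-period.

    Each spell is (person_id, duration_in_periods, destination), where
    destination is 'A', 'B', or None for a right-censored spell. The
    dependent variable y is 0 (base category: no event) in every period
    except the final period of a completed spell, where it is 1 for an
    exit to A and 2 for an exit to B.
    """
    code = {"A": 1, "B": 2}
    rows = []
    for pid, duration, dest in spells:
        for k in range(1, duration + 1):
            last = (k == duration)
            y = code[dest] if (last and dest is not None) else 0
            rows.append({"id": pid, "period": k, "y": y})
    return rows

spells = [(1, 3, "A"), (2, 2, None), (3, 4, "B")]
data = expand_spells(spells)
print(len(data))                               # 3 + 2 + 4 = 9 person-period rows
print([r["y"] for r in data if r["id"] == 1])  # [0, 0, 1]: exit to A in period 3
print([r["y"] for r in data if r["id"] == 2])  # [0, 0]: censored throughout
```

The period counter built here is also the natural raw material for the duration-dependence variables of step 3.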
9.3 Interval-censored data

We now consider the case in which survival times are generated by some continuous time process, but observed survival times are grouped into intervals of unit length (for example, the 'week' or the 'month'). One way to proceed would simply be to apply the 'multinomial logit' hazard model to these data, making assumptions about the discrete time (interval) hazard and eschewing any attempt to relate the model to an underlying process in continuous time. The alternative is to do as we did in Chapter 3, and to explicitly relate the model to the continuous time hazard(s). As we shall now see, the models are complicated relative to those discussed earlier in this chapter, for two related reasons. First, the likelihood is not separable as it was for the continuous time case. Second, and more fundamentally, the shape of the (continuous time) hazard rate within each interval cannot be identified from the grouped data that is available. In principle, assumptions have to be made about this shape, and alternative assumptions lead to different econometric models.

In the data generation processes discussed so far in this chapter, the overall hazard was equal to the sum of the destination-specific hazards. This is not true in the interval-censored case. Put another way, when constructing the likelihood and considering the probability of observing an exit to a specific destination in a given interval, we have to take account of the fact that, with grouped survival times, more than one latent event is possible in each interval (though, of course, only one is actually observed). I.e. not only was there an exit to that destination, but also that exit occurred before an exit to the other potential destinations.

To construct the sample likelihood, let us explore the relationship between the overall discrete (interval) hazard and the destination-specific interval hazards, and the relationship between these discrete interval hazards and the underlying continuous time hazards. Using the same notation as in Chapter 2, we note that the jth interval (of unit length) runs between dates a_{j-1} and a_j. The overall discrete hazard for the jth interval is given by

  h(j) = 1 - S(a_j)/S(a_{j-1})
       = 1 - exp(-∫_0^{a_j} [θ_A(t) + θ_B(t)] dt) / exp(-∫_0^{a_{j-1}} [θ_A(t) + θ_B(t)] dt)
       = 1 - exp(-∫_{a_{j-1}}^{a_j} [θ_A(t) + θ_B(t)] dt).        (9.28)
The destination-specific discrete hazards for the same interval are

  h_A(j) = 1 - exp(-∫_{a_{j-1}}^{a_j} θ_A(t) dt)        (9.29)
  h_B(j) = 1 - exp(-∫_{a_{j-1}}^{a_j} θ_B(t) dt).        (9.30)

It follows that

  h(j) = 1 - {[1 - h_A(j)][1 - h_B(j)]}        (9.31)

or

  1 - h(j) = [1 - h_A(j)][1 - h_B(j)].        (9.32)

Thus the overall discrete interval hazard equals one minus the probability of not exiting during the interval to any of the possible destinations, and this latter probability is the product of one minus the destination-specific discrete hazard rates. Rearranging the expressions, we also have

  h(j) = h_A(j) + h_B(j) - h_A(j)h_B(j)        (9.33)
       ≈ h_A(j) + h_B(j) if h_A(j)h_B(j) ≈ 0.        (9.34)

This tells us that the overall interval hazard is only approximately equal to the sum of the destination-specific interval hazards, with the accuracy of the approximation improving the smaller that the destination-specific hazards are.

Now consider the relationship between the survivor function for exit to any destination and the survivor functions for exits to each destination:

  S(j) = (1 - h_1)(1 - h_2)(....)(1 - h_j)
       = (1 - h_{A1})(1 - h_{B1})(1 - h_{A2})(1 - h_{B2})...(1 - h_{Aj})(1 - h_{Bj})        (9.35)
       = [(1 - h_{A1})(1 - h_{A2})(....)(1 - h_{Aj})] [(1 - h_{B1})(1 - h_{B2})(....)(1 - h_{Bj})].        (9.36)

In other words,

  S(j) = S_A(j) S_B(j)        (9.37)

so that there is a factoring of the overall grouped data survivor function, analogous to the continuous time case. (The same factoring result can also be derived by expressing the survivor functions in terms of the continuous time hazards rather than the discrete time ones, using the property θ(t) = θ_A(t) + θ_B(t).)
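A quick numerical check of these identities, with illustrative hazard values:

```python
# Check of 1 - h(j) = [1 - hA(j)][1 - hB(j)] and the factoring S(j) = SA(j)SB(j).
# All hazard values are invented for illustration.

hA, hB = 0.10, 0.05
h = 1 - (1 - hA) * (1 - hB)
print(round(h, 3))             # 0.145 = 0.10 + 0.05 - 0.005
print(round(hA + hB - h, 3))   # 0.005: error of the additive approximation

# Factoring of the grouped-data survivor function across several intervals:
hazards = [(0.10, 0.05), (0.08, 0.04), (0.12, 0.06)]   # (hA(k), hB(k)) per interval
S = SA = SB = 1.0
for hAk, hBk in hazards:
    S *= (1 - hAk) * (1 - hBk)
    SA *= (1 - hAk)
    SB *= (1 - hBk)
assert abs(S - SA * SB) < 1e-12   # S(j) = SA(j) * SB(j)
```

The smaller the two hazards, the closer the additive approximation (9.34) becomes, as the text notes.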
As before, there are three types of contribution to the likelihood: that for an individual exiting to A (L^A), that for an individual exiting to B (L^B), and that for a censored case (L^C). The latter is straightforward (we have just derived it): for a person with a censored spell length of j intervals,

  L^C = S(j) = S_A(j) S_B(j) = ∏_{k=1}^{j} [1 - h_A(k)][1 - h_B(k)].        (9.38)

What is the likelihood contribution if, instead of being censored, the individual's spell completed with an exit to destination A during interval j? We need to write down the joint probability that the exact spell length lay between the lengths implied by the boundaries of the interval, and that the latent exit time to destination B was after the latent exit time to destination A. According to the convention established in Chapter 2, the jth interval is defined as (a_{j-1}, a_j]: i.e. the interval, of unit length, begins just after date a_{j-1}, and it finishes at (and includes) date a_j. The expression for the joint probability that we want is

  L^A = Pr(a_{j-1} < T_A ≤ a_j, T_B > T_A)        (9.39)
      = ∫_{a_{j-1}}^{a_j} ∫_u^∞ f(u, v) dv du        (9.40)
      = ∫_{a_{j-1}}^{a_j} ∫_u^∞ f_A(u) f_B(v) dv du        (9.41)
      = ∫_{a_{j-1}}^{a_j} [ ∫_u^{a_j} f_A(u) f_B(v) dv + ∫_{a_j}^∞ f_A(u) f_B(v) dv ] du        (9.42)

where f(u, v) is the joint probability density function for latent spell lengths T_A and T_B, u is the (unobserved) time within the interval at which the exit to A occurred, and the lower integration point in the second integral, a_j, is the start of the next interval. Because we assumed independence of competing risks, f(u, v) = f_A(u) f_B(v).

We cannot proceed further without making some assumptions about the shape of the within-interval density functions or, equivalently, the within-interval hazard rates. Five main assumptions have been used in the literature to date. The first is that transitions can only occur at the boundaries of the intervals. The second is that the destination-specific density functions are constant within each interval (though may vary between intervals), and the third is that the destination-specific hazard rates are constant within each interval (though may vary between intervals). The fourth is that the hazard rate takes a particular proportional hazards form, and the fifth is that the log of the integrated hazard changes linearly over the interval.

9.3.1 Transitions can only occur at the boundaries of the intervals

This was the assumption made by Narendranathan and Stewart (1993). They considered labour force transitions by British unemployed men using spell data
grouped into weekly intervals. At the time, there was a requirement for unemployed individuals to register ('sign on') weekly at the unemployment office in order to receive unemployment benefits, and so there is some plausibility in the idea that transitions would only occur at weekly intervals.

The assumption has powerful and helpful implications. If transitions can only occur at interval boundaries then, if a transition to A occurred in interval j = (a_{j-1}, a_j], it occurred at date a_j, and it must be the case that T_B > a_j (i.e. beyond the jth interval). This, in turn, means that f_B(v) = 0 between dates u and a_j. This allows substantial simplification of (9.42):

  L^A = ∫_{a_{j-1}}^{a_j} ∫_{a_j}^∞ f_A(u) f_B(v) dv du        (9.43)
      = [F_A(a_j) - F_A(a_{j-1})] [1 - F_B(a_j)]        (9.44)
      = h_A(j) S_A(j-1) S_B(j)        (9.45)
      = [h_A(j)/(1 - h_A(j))] S_A(j) S_B(j).        (9.46)

By similar arguments, we may write

  L^B = h_B(j) S_B(j-1) S_A(j)        (9.47)
      = [h_B(j)/(1 - h_B(j))] S_A(j) S_B(j).        (9.48)

In both expressions, the Chapter 3 definitions of the discrete (interval) hazard function (h) and survivor function (S) in the interval-censored case have been used.

Of course, empirical evaluation of the expressions also requires selection of a functional form for the destination-specific continuous hazards. A natural choice for these is a proportional hazards specification, in which case h_A(j) and h_B(j) would each take the complementary log-log form discussed in Chapter 3:

  h_A(j) = 1 - exp[-exp(β'_A X + γ_{Aj})]        (9.49)
  h_B(j) = 1 - exp[-exp(β'_B X + γ_{Bj})]        (9.50)

where γ_{Aj} is the log of the integrated baseline hazard for destination A over the jth interval, and γ_{Bj} is interpreted similarly.

If we define destination-specific censoring indicators δ_A and δ_B, as above, then the overall likelihood contribution for the person with a spell length of j intervals is given by

  L = (L^A)^{δ_A} (L^B)^{δ_B} (L^C)^{1-δ_A-δ_B}
    = [h_A(j)/(1 - h_A(j))]^{δ_A} [h_B(j)/(1 - h_B(j))]^{δ_B} S_A(j) S_B(j).        (9.51)
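A minimal sketch of the Case 1 contribution (9.51) with cloglog hazards of the form (9.49)-(9.50); the parameter values are invented for illustration. Because the log-likelihood splits into an A-part plus a B-part, each destination can be estimated separately with single-risk software, as the text goes on to note.

```python
# Sketch of the separable likelihood contribution (9.51); parameters illustrative.
import math

def cloglog_hazard(beta_x, gamma_j):
    """h(j) = 1 - exp(-exp(beta'X + gamma_j)), the cloglog interval hazard."""
    return 1.0 - math.exp(-math.exp(beta_x + gamma_j))

def loglik_contribution(hazA, hazB, j, dA, dB):
    """log L for one spell of j intervals; dA/dB are destination-specific
    censoring indicators. Note the A-part and B-part are additively separate."""
    llA = sum(math.log(1 - h) for h in hazA[:j]) \
        + dA * math.log(hazA[j - 1] / (1 - hazA[j - 1]))
    llB = sum(math.log(1 - h) for h in hazB[:j]) \
        + dB * math.log(hazB[j - 1] / (1 - hazB[j - 1]))
    return llA + llB

hazA = [cloglog_hazard(-2.0, g) for g in (0.0, 0.1, 0.2)]  # hA(1)..hA(3)
hazB = [cloglog_hazard(-2.5, g) for g in (0.0, 0.1, 0.2)]  # hB(1)..hB(3)
print(loglik_contribution(hazA, hazB, j=3, dA=1, dB=0))    # exit to A in interval 3
```

Switching dA on changes only the A-part, which is precisely why suitably defined destination-specific censoring variables let standard single-risk programs do the work.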
Thus in the case where transitions can only occur at the interval boundaries, the likelihood contribution partitions into a product of terms, each of which is a function of a single destination-specific hazard only. In other words, as in the continuous time case, one can estimate the overall independent competing risks model by estimating separate destination-specific models, having defined suitable destination-specific censoring variables.1 And the result is analogous to that for continuous time models (and also generalises to a greater number of destination states).

1 In an earlier version of these Lecture Notes, I claimed (incorrectly) that this result held universally for interval-censored spell data. I am grateful for clarificatory discussions with Paul Allison, Peter Dolton, Chandra Shah, Mark Stewart, and especially Wilbert van der Klaauw. My derivations reported for case (2) are based on unpublished notes by Wilbert.

9.3.2 Destination-specific densities are constant within intervals

This was the assumption made by, for example, Dolton and van der Klaauw (1999). We begin again with (9.42), but now apply a different set of assumptions. The assumption of constant within-interval densities (as used for the 'actuarial adjustment' in the context of lifetables) implies that

  f_A(u) f_B(v) = f̄_{Aj} f̄_{Bj} when a_{j-1} < u ≤ a_j and a_{j-1} < v ≤ a_j.        (9.52)

Substituting this result into (9.42), we derive

  L^A = ∫_{a_{j-1}}^{a_j} [ ∫_u^{a_j} f̄_{Aj} f̄_{Bj} dv + ∫_{a_j}^∞ f_A(u) f_B(v) dv ] du.        (9.53)

Now, evaluating the first term within the square brackets,

  ∫_u^{a_j} f̄_{Aj} f̄_{Bj} dv = f̄_{Aj} f̄_{Bj} [a_j - u]        (9.54)

and then substituting back into L^A, we get

  L^A = ∫_{a_{j-1}}^{a_j} f̄_{Aj} f̄_{Bj} [a_j - u] du + ∫_{a_{j-1}}^{a_j} ∫_{a_j}^∞ f_A(u) f_B(v) dv du.        (9.55)

The first term in (9.55) is

  ∫_{a_{j-1}}^{a_j} f̄_{Aj} f̄_{Bj} [a_j - u] du = f̄_{Aj} f̄_{Bj} [a_j(a_j - a_{j-1}) - ½(a_j)² + ½(a_{j-1})²]        (9.56)
      = ½ f̄_{Aj} f̄_{Bj}
      = ½ ∫_{a_{j-1}}^{a_j} ∫_{a_{j-1}}^{a_j} f̄_{Aj} f̄_{Bj} dv du        (9.57)

using the fact that each interval is of unit length.
Now, substituting this back into (9.55), we can combine the first two terms in square brackets, to derive

  L^A = ½ ∫_{a_{j-1}}^{a_j} ∫_{a_{j-1}}^∞ f_A(u) f_B(v) dv du + ½ ∫_{a_{j-1}}^{a_j} ∫_{a_j}^∞ f_A(u) f_B(v) dv du        (9.58)
      = ½ Pr(a_{j-1} < T_A ≤ a_j, T_B > a_{j-1}) + ½ Pr(a_{j-1} < T_A ≤ a_j, T_B > a_j).        (9.59)

This is the result cited by Dolton and van der Klaauw (1999, footnote 9). The two component probabilities comprising equation (9.59) provide bounds on the joint probability of interest and, with the assumption, we simply take the average of them. Since survival times T_A and T_B are independent (by assumption), the joint probabilities can be calculated using expressions for the marginal probabilities. That is,

  L^A = ½ [Pr(a_{j-1} < T_A ≤ a_j) Pr(T_B > a_{j-1})] + ½ [Pr(a_{j-1} < T_A ≤ a_j) Pr(T_B > a_j)]        (9.60)
      = ½ Pr(a_{j-1} < T_A ≤ a_j) [Pr(T_B > a_{j-1}) + Pr(T_B > a_j)]        (9.61)
      = [h_A(j)/(1 - h_A(j))] S_A(j) · ½ [S_B(j-1) + S_B(j)]        (9.62)
      = [h_A(j)/(1 - h_A(j))] S_A(j) S_B(j) · ½ [1/(1 - h_B(j)) + 1]        (9.63)
      = [h_A(j)/(1 - h_A(j))] [(1 - h_B(j)/2)/(1 - h_B(j))] S_A(j) S_B(j).        (9.64)

Similarly, the expression for the likelihood contribution for the case when there is a transition to destination B in interval j is

  L^B = ½ [Pr(a_{j-1} < T_B ≤ a_j) Pr(T_A > a_{j-1})] + ½ [Pr(a_{j-1} < T_B ≤ a_j) Pr(T_A > a_j)]        (9.65)
      = [h_B(j)/(1 - h_B(j))] S_B(j) · ½ [S_A(j-1) + S_A(j)]        (9.66)
      = [h_B(j)/(1 - h_B(j))] [(1 - h_A(j)/2)/(1 - h_A(j))] S_B(j) S_A(j).        (9.67)

Thus the constant within-interval densities assumption leads to expressions for the likelihood contributions that have rather nice interpretations. This in turn involves a simple averaging of survival functions that refer to the beginning and end of the relevant interval: see equations (9.62) and (9.66). The averaging property also holds when there are more
than two destinations. If there were three destinations, call them A, B, and D, then the expression corresponding to (9.66) is

  L^B = [h_B(j)/(1 - h_B(j))] S_B(j) { ⅓ [S_A(j-1) S_D(j-1)] + ⅔ [S_A(j) S_D(j)] }.        (9.68)

It is as if, with a greater number of competing risks, the fact that a transition to B rather than the other destinations occurred suggests that the transition times for the other risks were more likely to have been after the end of the interval. The likelihood contribution gives greater weight to end-of-interval survival functions.

To sum up, with two destinations A and B, the overall likelihood contribution for an individual with a spell of j intervals is

  L = (L^A)^{δ_A} (L^B)^{δ_B} (L^C)^{1-δ_A-δ_B}        (9.69)
    = { [h_A(j)/(1 - h_A(j))] [(1 - h_B(j)/2)/(1 - h_B(j))] }^{δ_A} { [h_B(j)/(1 - h_B(j))] [(1 - h_A(j)/2)/(1 - h_A(j))] }^{δ_B} S_A(j) S_B(j)        (9.70)

where the precise specification depends on the choice of functional form for the discrete (interval) hazard, for example the cloglog form as in (9.49) and (9.50) above. Whichever is used, the expression is not separable into destination-specific components that can be maximized separately. In other words, to estimate the independent competing risks model in this case, one could not use standard single-risk model software - special programs are required.

Observe also that, as the destination-specific hazards become infinitesimally small, the likelihood contributions tend to expressions that are the same as those derived for the case when transitions could only occur at the interval boundaries. That is, equations (9.64) and (9.67) contain adjustment factors of the form [1 - h_A(j)/2]/[1 - h_A(j)], which tend to one as the hazards shrink. If the hazard rate h_A(j) = 0.1, then [1 - h_A(j)/2]/[1 - h_A(j)] ≈ 1.06, whereas if the hazard equalled 0.01, the ratio is about 1.01. On the other hand, if one did (incorrectly) use the single-risk approach as in Case 1, then computations of the individual likelihood contributions would be underestimates, with the magnitude of the error depending on the size of the destination-specific hazard rates. The size of the discrete hazards will depend on the application in question, and how wide the intervals are (for example, are they one week or one year?).
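The adjustment factor appearing in (9.64) and (9.67) can be computed directly; the hazard values below reproduce the ratios quoted in the text.

```python
# The case-2 contribution multiplies the case-1 contribution by the factor
# (1 - h/2)/(1 - h), where h is the competing destination's interval hazard.

def adjustment(h):
    return (1 - h / 2) / (1 - h)

print(round(adjustment(0.10), 3))   # 1.056, i.e. about 1.06
print(round(adjustment(0.01), 3))   # 1.005, i.e. about 1.01
```

Ignoring the factor (i.e. using the Case 1 formulae) therefore understates the likelihood contribution by roughly the same few per cent, which is the underestimation the text warns about.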
9.3.3 Destination-specific hazard rates are constant within intervals

With this assumption, the distribution of survival times for each destination (and overall) takes the Exponential form within each interval. Let

  θ_A(t) = θ_{Aj} if a_{j-1} < t ≤ a_j        (9.71)
  θ_B(t) = θ_{Bj} if a_{j-1} < t ≤ a_j, implying        (9.72)
  θ(j) = θ_{Aj} + θ_{Bj}.        (9.73)

One obvious parameterisation would be θ_{Aj} = exp(β_{0Aj} + β_{1A}X_1 + β_{2A}X_2 + ... + β_{KA}X_K), and θ_{Bj} = exp(β_{0Bj} + β_{1B}X_1 + β_{2B}X_2 + ... + β_{KB}X_K). I.e. each destination-specific hazard rate has a piecewise-constant exponential (PCE) form. Given these assumptions, the destination-specific interval hazards are

  h_A(j) = 1 - exp(-θ_{Aj})        (9.74)
  h_B(j) = 1 - exp(-θ_{Bj})        (9.75)
  h(j) = 1 - exp(-θ_j).        (9.76)

Now consider the likelihood contribution for an individual who made a transition to destination A during interval j. We have explicit functional forms for the density functions within the relevant interval:

  f_A(u) = θ_{Aj} S_A(u) = θ_{Aj} S_A(j-1) exp[-θ_{Aj}(u - a_{j-1})] when a_{j-1} ≤ u < a_j,

and similarly for f_B(v). These can be substituted into (9.42), and the expression evaluated. There are two terms in the double integral. The second is

  L^A(second term) = ∫_{a_{j-1}}^{a_j} ∫_{a_j}^∞ f_A(u) f_B(v) dv du
                   = h_A(j) S_A(j-1) S_B(j)
                   = [h_A(j)/(1 - h_A(j))] S_A(j) S_B(j).        (9.77)

L^A(second term) is the likelihood contribution if one knew that no transitions to B could have occurred within the interval. But we have to allow for this possibility, hence the adjustment summarised in the first double-integral term in the expression for L^A. Tedious manipulations show that

  L^A(first term) = S(j-1) h(j) [ θ_{Aj}/θ(j) - (h_A(j)/h(j)) exp(-θ_{Bj}) ].        (9.78)

The first two terms in (9.78) represent the probability of survival to the beginning of the jth interval followed by exit to any destination within interval j. This probability is then multiplied by a term (in the square brackets) which is the difference between two components. The first component is the ratio of the
destination-specific instantaneous hazard to the overall instantaneous hazard. The second component is the ratio of the destination-specific discrete hazard to the overall discrete hazard, scaled by a factor (exp[-θ_{Bj}]) which is a destination-specific conditional probability of survival (non-exit to B) over an interval of unit length. Observe that if θ_{Bj} = 0, then the term in square brackets is equal to zero, and L^A takes the same form as we derived when transitions could only occur at the boundaries of intervals.

Combining L^A(first term) and L^A(second term), further manipulations reveal that

  L^A = S(j-1) h(j) [θ_{Aj}/(θ_{Aj} + θ_{Bj})]        (9.79)
      = S(j) [h(j)/(1 - h(j))] [θ_{Aj}/(θ_{Aj} + θ_{Bj})]        (9.80)

which corresponds to equation 4 of Røed and Zhang (2002a, p. 14). The contribution in this case is the probability of survival to the beginning of interval j, multiplied by the probability that an event of any type occurred within the interval conditional on survival to the beginning of the interval (i.e. h(j)), multiplied by the relative chance that the event was A rather than B at each instant during the interval. The expression generalises straightforwardly when there are more than two potential destinations.

Whatever the number of destinations, observe that the expression for the likelihood contribution is not separable into destination-specific components that can be maximized separately. The overall likelihood contribution in this case is

  L3 = S(j) [h(j)/(1 - h(j))]^{δ_A+δ_B} [θ_{Aj}/(θ_{Aj} + θ_{Bj})]^{δ_A} [θ_{Bj}/(θ_{Aj} + θ_{Bj})]^{δ_B}
     = S(j-1) [1 - h(j)]^{1-δ_A-δ_B} [h(j)]^{δ_A+δ_B} [θ_{Aj}/θ_j]^{δ_A} [θ_{Bj}/θ_j]^{δ_B}.        (9.81)

There is also a connection between the expression L3 and the expression (9.23) describing the likelihood in the 'multinomial logit' case. Recall that this was:

  L = S(j) [h(j)/(1 - h(j))]^{δ_A+δ_B} [h_A(j)/h(j)]^{δ_A} [h_B(j)/h(j)]^{δ_B}.

If intervals are short, or interval hazards are relatively small, then h_A(j) ≈ θ_{Aj} and h_B(j) ≈ θ_{Bj}, and the more likely it is that h_A(j)/h(j) ≈ θ_{Aj}/θ_j. In this case, the 'multinomial logit' model will provide estimates that are similar to the interval-censoring model assuming a constant hazard within intervals (and also assuming that each uses the same specification for duration dependence in the discrete hazard). The size of the interval hazards will depend on the application in question, and how wide the intervals are.
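A sketch of the Case 3 contribution (9.79), with illustrative piecewise-constant rates. Summing the A and B contributions recovers S(j-1)h(j), the probability of surviving to the interval and then exiting to some destination.

```python
# Sketch of L^A = S(j-1) h(j) theta_Aj/(theta_Aj + theta_Bj); rates illustrative.
import math

def contribution_A(thetaA, thetaB, j):
    """thetaA/thetaB: lists of per-interval constant hazard rates;
    returns the likelihood contribution for an exit to A in interval j."""
    Sprev = math.exp(-sum(a + b for a, b in zip(thetaA[:j-1], thetaB[:j-1])))
    theta_j = thetaA[j - 1] + thetaB[j - 1]
    h_j = 1 - math.exp(-theta_j)          # overall interval hazard, eq. (9.76)
    return Sprev * h_j * thetaA[j - 1] / theta_j

thetaA = [0.10, 0.12, 0.15]
thetaB = [0.05, 0.05, 0.06]
print(contribution_A(thetaA, thetaB, j=3))   # exit to A in interval 3
```

Swapping the roles of the two rate lists gives the B contribution, and the two add up to S(j-1)h(j), mirroring the interpretation given after (9.80).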
9.3.4 Destination-specific proportional hazards with a common baseline hazard function

This assumption underlies the derivation of competing risks models presented by, for example, Hamerle and Tutz (1989). Let us consider the case with two potential destinations A and B. The authors assume that the destination-specific continuous time hazards have the following proportional hazards forms:

  θ_A(t) = θ_0(t) λ_A, where λ_A = exp(β'_A X)        (9.82)
  θ_B(t) = θ_0(t) λ_B, where λ_B = exp(β'_B X)        (9.83)

where the baseline hazard function θ_0(t) is common to each destination-specific hazard. This is an example of a 'proportional intensity' model. These models have the property that, conditional on exit being made at a particular time, the probability that it is to one specific state rather than any other one does not depend on survival time (Lancaster, 1990, pp. 103-4).

To aid our subsequent derivations, also define the integrated baseline hazard function

  H(t) = ∫_0^t θ_0(u) du        (9.84)

which means that the continuous time hazards can be rewritten in terms of the derivative of the integrated hazard function:

  θ_A(t) = λ_A H'(t)        (9.85)
  θ_B(t) = λ_B H'(t).        (9.86)

The survivor function can also be written in terms of the integrated hazard:

  S(t) = exp(-∫_0^t θ(u) du) = exp[-(λ_A + λ_B) H(t)].        (9.87)

The discrete time (interval) hazard rate for the jth interval is

  h(j) = 1 - S(a_j)/S(a_{j-1})        (9.88)
       = 1 - exp[-(λ_A + λ_B)(H(a_j) - H(a_{j-1}))]        (9.89)
       = 1 - exp[-(e_{Aj} + e_{Bj})]        (9.90)

where

  e_{Aj} = exp(γ_j) λ_A = exp(γ_j + β'_A X)        (9.91)
  e_{Bj} = exp(γ_j) λ_B = exp(γ_j + β'_B X)        (9.92)

and the γ_j = log[H(a_j) - H(a_{j-1})] are interval-specific parameters.
Similarly, we may define destination-specific interval hazard rates

  h_A(j) = 1 - exp(-e_{Aj})        (9.93)
  h_B(j) = 1 - exp(-e_{Bj})        (9.94)

and survivor functions:

  S_A(t) = exp[-λ_A H(t)]        (9.95)
  S_B(t) = exp[-λ_B H(t)].        (9.96)

Using arguments similar to those developed earlier for the other cases, the likelihood contribution for an individual with exit to destination A in interval j is

  L^A = ∫_{a_{j-1}}^{a_j} ∫_u^{a_j} θ_A(u) S_A(u) θ_B(v) S_B(v) dv du + ∫_{a_{j-1}}^{a_j} ∫_{a_j}^∞ f_A(u) f_B(v) dv du.        (9.97)

As before, L^A(second term) = h_A(j) S_A(j-1) S_B(j), but what is L^A(first term)? We can rewrite it in terms of λ_A, λ_B, and the integrated baseline hazard function:

  L^A(first term) = ∫_{a_{j-1}}^{a_j} ∫_u^{a_j} λ_A H'(u) exp[-λ_A H(u)] λ_B H'(v) exp[-λ_B H(v)] dv du.        (9.98)

Using derivations similar to those for Case 3, one can show that this expression can be written so that the overall contribution is

  L^A = S(j-1) h(j) [e_{Aj}/(e_{Aj} + e_{Bj})] = S(j-1) h(j) [λ_A/(λ_A + λ_B)]        (9.99)

and the overall likelihood contribution is given by

  L4 = S(j) [h(j)/(1 - h(j))]^{δ_A+δ_B} [e_{Aj}/(e_{Aj} + e_{Bj})]^{δ_A} [e_{Bj}/(e_{Aj} + e_{Bj})]^{δ_B}
     = S(j-1) [1 - h(j)]^{1-δ_A-δ_B} [h(j)]^{δ_A+δ_B} [e_{Aj}/(e_{Aj} + e_{Bj})]^{δ_A} [e_{Bj}/(e_{Aj} + e_{Bj})]^{δ_B}.        (9.100)

This corresponds to the expression derived by Hamerle and Tutz (1989, pp. 85-7). Clearly the likelihood contribution has a very similar shape to that derived for the case where hazards were assumed to be constant within each interval. (Indeed Model 3 can be derived from Model 4. Suppose that θ_0(t) in
the Proportional Intensities model is time-invariant, in which case γ_j = γ for all j. Now let the intercept terms in β_A and β_B be interval-specific, in which case Model 3 is what results.) In sum, the proportional intensity model adds some generality (weaker assumptions about the shape of the baseline hazard within intervals) at the cost of forcing a common baseline hazard function for the different latent risks. The PCE model allows each of the baseline hazards to vary in a flexible manner with survival time, and may provide information about whether it is appropriate to make the proportionate intensities assumption. Analysts would typically like to allow the shapes of the baseline hazards to differ for the various destination types, and to test for equality rather than imposing it from the very start.

9.3.5 The log of the integrated hazard changes at a constant rate over the interval

This assumption about the linearity of the within-interval log of the baseline hazard was made by Han and Hausman (1990) and Sueyoshi (1992). As Sueyoshi (1992, Appendix B) explains, the assumption implies that the hazard increases within each interval, and contrasts with the assumption of a constant hazard made in the last subsection. (The assumption of a constant density within each interval, case 2 above, also implies that hazards rise within each interval.) We shall not consider this case further, as it has been used relatively rarely and is, in any case, rather complicated to explain.

9.4 Extensions

9.4.1 Left-truncated data

The derivations so far have assumed that the analyst has access to a random sample of spells. The models can all be easily adapted to the case in which the interval-censored survival time data are subject to left truncation, also known as 'delayed entry'. Suppose that the data for a given subject are truncated at a date within the ith interval, where i < j. As we have seen in earlier chapters, to derive the correct likelihood contributions in this case, one needs to condition on survival up to the truncation date. This means dividing the likelihood contribution expression for the random sample of spells case (as considered earlier) by S(i). Each of the likelihood expressions for interval-censored data considered earlier (L1, L2, L3, L4) is of the form L = S(j)Z, and so the likelihood expression for the left truncation case is simply L = S(j)Z/S(i). Now, given the relationship between the survivor function and the interval hazard, there is a convenient cancelling result: S(j)/S(i) = ∏_{k=i+1}^{j} (1 - h_k).

This has a convenient implication for empirical researchers. Software programs for maximizing log L are typically applied to data sets organised so that there is one row for each interval that each subject is at risk of experiencing a transition. In the random sample of spells case, a subject with a spell of j intervals contributes j data rows and the log-likelihood contribution is log[S(j)Z] = ∑_{k=1}^{j} log(1 - h_k) + log(Z). With left-truncated data, a subject contributes j - i data rows and the log-likelihood contribution is log[S(j)Z/S(i)] = ∑_{k=i+1}^{j} log(1 - h_k) + log(Z). Thus, ICR models for interval-censored and left-truncated survival data can be easily estimated using the same programs as for random samples of spells, applied to data sets in which data rows corresponding to the intervals prior to the truncation point have been excluded: see Chapter 6.

9.4.2 Correlated risks

The models may also be extended to allow for correlated rather than independent risks, by allowing ε_A and ε_B to be correlated.2 Suppose that each destination-specific hazard rate now depends on individual-specific unobserved heterogeneity in addition to, but independent of, observed heterogeneity (X). Specifically, the hazard of transitions to destination A is rewritten to be a function of β'_A X + ε_A (rather than of β'_A X as before), and similarly the hazard of transitions to destination B is rewritten to be a function of β'_B X + ε_B, where X no longer includes an intercept term. The latent durations T_A, T_B are now assumed to be independent conditional on the unobserved heterogeneity components, but the unconditional latent durations may be correlated.

Perhaps the most straightforward means of estimating the extended model is to suppose that ε_A and ε_B have a discrete distribution with M points of support, and these mass points are estimated together with the probabilities π_m, for m = 1, ..., M, of the different combinations of ε_A and ε_B. Letting the regression parameters be represented by the vector ϑ, and the likelihood contribution for each combination of mass points be given by L_m(ϑ | ε_A, ε_B), the overall likelihood contribution for each person is now

  L = ∑_{m=1}^{M} π_m L_m(ϑ | ε_A, ε_B), with ∑_{m=1}^{M} π_m = 1.        (9.101)

This is similar to the discussion of discrete mixing distributions to characterize unobserved heterogeneity (see the previous chapter). The difference is that here the distribution is multivariate - the points of support refer to a joint distribution rather than a marginal distribution. Rather than using a discrete mixture distribution, one could assume a continuous distribution, for example supposing that the heterogeneity distribution is multivariate normal (cf. Schneider and Uhlendorff, 2004). Conditions for identification of proportional hazards models with regressors for dependent competing risks are given by Abbring and van den Berg (2003), and Heckman and Honoré (1989).

2 See e.g. Dolton and van der Klaauw (1999), Han and Hausman (1990), Røed and Zhang (2002a), Sueyoshi (1992), and Van den Berg and Lindeboom (1998).
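The cancelling result for left-truncated data can be verified numerically; the interval hazards below are illustrative.

```python
# Left truncation ("delayed entry"): S(j)/S(i) = prod_{k=i+1}^{j} (1 - h_k),
# so dropping the data rows for intervals 1..i yields the correct conditional
# contribution. Hazard values are invented for illustration.
import math

h = [0.05, 0.08, 0.10, 0.12, 0.15]            # interval hazards h_1..h_5

def S(j):
    """Survivor function S(j) = prod_{k=1}^{j} (1 - h_k)."""
    return math.prod(1 - hk for hk in h[:j])

i, j = 2, 5
lhs = S(j) / S(i)                              # condition on survival to interval i
rhs = math.prod(1 - hk for hk in h[i:j])       # use rows i+1..j only
assert abs(lhs - rhs) < 1e-12
print(round(lhs, 6))
```

This is exactly why excluding the pre-truncation person-period rows from the estimation data set, as described above, gives the right answer with unmodified software.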
as Thomas (1996) explains. The same issues arise in the intrinsically discrete ‘ multinomial
. With intervalcensored data. then the modelling can be undertaken straightforwardly. one may be interested in more than simply how a covariate impacts on the destinationspeci…c latent hazard. But in other situations. In competing risks. this raises the question of whether the estimates derived are sensitive to the choice made. larger values of X imply a larger hazard rate. or expected spell length in the state conditional on exit to A: It turns out that all these quantities depend on all the parameters in the competing risk model. He also shows that if each hazard takes the proportional hazard form. Some complications arise with discretetime data. Since alternative assumptions are possible (and each is intrinsically nontestable). if one is prepared to assume that transitions can only occur at the interval boundaries. If the data are genuinely discrete. a cautionary note about the interpretation of coe¢ cient estimates from destinationspeci…c hazard regressions.110
CHAPTER 9. COMPETING RISKS MODELS

9.5 Conclusions and additional issues

Our analysis of competing risks models has shown that, if we assume independence in competing risks, then estimation can be straightforward for several important types of data generation process. If we have continuous time data, the independent competing risks model can be estimated using standard single-risk models, as discussed elsewhere in this book. With interval-censored survival time data, by contrast, one cannot use a single-risk model approach: one cannot avoid modelling the destination-specific hazards jointly, with different specifications arising depending on the assumptions made about the shape of the hazard (or density) function within each interval. Which specification should one choose? There can be no definitive answer to this question – it depends on the context. Sueyoshi (1992) emphasised that robustness can depend on the width of the intervals. Regarding the other alternatives, we may note that Han and Hausman (1990) and Dolton and van der Klaauw (1995) suggested that their estimates were relatively robust to alternative choices. The situations in which transitions occur only at the interval boundaries are perhaps relatively rare; nonetheless, in that case the 'multinomial logit' model is easy to estimate, in a manner analogous to that for continuous time data.

In the standard single-risk case, and given the parameterizations used earlier, a proportional hazards model has a straightforward interpretation: if the coefficient estimate associated with a particular explanatory variable X is positive, then an increase in X implies a higher hazard rate and shorter survival times. By contrast, in the 'multinomial logit' competing risks model, interpretations of coefficients are not always so straightforward, particularly when the models include time-varying covariates. For example, an increase in X will increase the conditional probability of exit to A (say) if its estimated coefficient in the equation for the hazard of exit via A is larger than the corresponding coefficients in the hazards for all other risks. For multinomial logit models, it is well known that increases in a variable with a positive coefficient in the equation for one outcome need not lead to an increase in the probability of that outcome, as the probability of another outcome may increase by even more.

One may also be interested in estimating the (unconditional) probability of exiting to a particular destination A, or the (conditional) probability of exit to A conditional on exiting at a particular survival time. These probabilities depend on the parameters of all the destination-specific hazards, not only those in the destination-specific model of exit to A. Simulating the predicted values of interest for different values of the explanatory variables is one way of addressing the issues raised in this paragraph. The estimation of unconditional probabilities of exit to a particular destination is known in the biostatistics literature as the estimation of cumulative incidence: see e.g. Gooley et al. (1999). A Stata program stcompet is provided by Coviello and Boggess (2004).

Finally, extensions to allow for correlated risks have been introduced in the literature, but have not been used much in applied work yet.
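The claim that, under independence, each destination-specific hazard can be estimated with single-risk methods can be illustrated by simulation. The sketch below is in Python rather than Stata, and all parameter values are invented for illustration: two independent exponential latent durations generate the observed exits, and each cause-specific rate is recovered by an exponential MLE that treats exits to the other destination as censoring.

```python
import random

def estimate_cause_rate(times, events, cause):
    # single-risk exponential MLE for one destination: exits to any other
    # destination are treated as censoring, so
    # lambda_hat = (number of exits to `cause`) / (total time at risk)
    d = sum(1 for e in events if e == cause)
    return d / sum(times)

rng = random.Random(2005)
lam_a, lam_b = 0.4, 0.6   # hypothetical true cause-specific hazard rates

times, events = [], []
for _ in range(50000):
    ta = rng.expovariate(lam_a)   # latent duration until exit to A
    tb = rng.expovariate(lam_b)   # latent duration until exit to B
    times.append(min(ta, tb))     # only the earliest exit is observed
    events.append(1 if ta < tb else 2)

est_a = estimate_cause_rate(times, events, 1)
est_b = estimate_cause_rate(times, events, 2)
print(est_a, est_b)   # close to 0.4 and 0.6
```

The destination-by-destination approach is justified because, under independence, the competing risks likelihood factors into destination-specific single-risk likelihoods.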
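The caveat about coefficient interpretation in multinomial logit models can be checked numerically. In the sketch below the coefficient values are invented for illustration (they are not estimates from any data set): both exit destinations have positive coefficients on X, yet the probability of exit to A falls as X rises, because the probability of exit to B rises by even more.

```python
import math

def mnl_probs(x, betas):
    # multinomial logit: P(j) = exp(x * b_j) / sum_k exp(x * b_k),
    # with outcome 0 as the base category (coefficient normalised to zero)
    scores = [math.exp(b * x) for b in betas]
    total = sum(scores)
    return [s / total for s in scores]

# invented coefficients on X for: no exit (base), exit to A, exit to B
betas = [0.0, 0.5, 2.0]   # both exit coefficients are positive; B's is larger

p_low = mnl_probs(1.0, betas)    # outcome probabilities when X = 1
p_high = mnl_probs(2.0, betas)   # outcome probabilities when X = 2

# the coefficient for exit to A is positive, yet P(exit to A) falls as X
# rises, because P(exit to B) rises by even more
print(p_low[1], p_high[1])
print(p_low[2], p_high[2])
```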
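For intuition about what cumulative incidence estimation involves, here is a minimal sketch of the standard estimator (illustrative only; in practice one would use stcompet or similar software). It accumulates S(s-) x dA(s)/n(s) over event times s <= t, where S is the Kaplan-Meier estimate of survival from all causes combined, dA(s) is the number of exits to destination A at s, and n(s) is the number still at risk just before s. The toy data below are invented.

```python
def cumulative_incidence(times, events, cause, t):
    # estimate the cumulative incidence of exit to `cause` by time t;
    # events: exit destination for each spell, with 0 denoting censoring
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0   # Kaplan-Meier overall survivor function, S(s-)
    cif = 0.0
    i = 0
    while i < len(data):
        s = data[i][0]
        if s > t:
            break
        d_all = d_cause = c = 0
        while i < len(data) and data[i][0] == s:   # gather ties at time s
            if data[i][1] == 0:
                c += 1
            else:
                d_all += 1
                if data[i][1] == cause:
                    d_cause += 1
            i += 1
        cif += surv * d_cause / n_at_risk     # S(s-) * dA(s) / n(s)
        surv *= 1.0 - d_all / n_at_risk       # update overall survival
        n_at_risk -= d_all + c
    return cif

# toy data: spells ending in exit to destination 1 or 2 (0 = censored)
times = [2, 3, 3, 4, 5, 6, 7, 9]
events = [1, 2, 1, 0, 2, 1, 0, 2]
print(cumulative_incidence(times, events, cause=1, t=10))   # about 0.40625
```

By construction the destination-specific cumulative incidences and the overall survivor function sum to one, which provides a useful check on any implementation.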
Chapter 10

Additional topics

Nothing on this at present – something for the future. Potential topics might include: estimation using repeated spells and multi-state transition models, and correlated competing risks (both are essentially an extension to the chapter on unobserved heterogeneity); and simulation of survival time data. For a helpful introduction to the topic, see Allison (1984); for an extensive, but more advanced, discussion based extensively on the application of mixture models, see Van den Berg (2001).
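The last of these potential topics can at least be sketched here. By the probability integral transform, a survival time can be drawn by solving S(t) = u for u drawn from a Uniform(0,1) distribution. The example below (a sketch with hypothetical parameter values, not code from this book) does this for a Weibull proportional hazards specification, and checks the simulated mean against the analytical one.

```python
import math
import random

def draw_weibull_ph(alpha, x, beta, rng):
    # Weibull proportional hazards: theta(t) = lam * alpha * t**(alpha - 1)
    # with lam = exp(x * beta), so S(t) = exp(-lam * t**alpha);
    # inverting S(t) = u gives t = (-log(u) / lam)**(1 / alpha)
    lam = math.exp(x * beta)
    u = rng.random()
    return (-math.log(u) / lam) ** (1.0 / alpha)

rng = random.Random(12345)
alpha, beta = 1.5, 0.5   # hypothetical values; alpha > 1 gives a rising hazard

sample = [draw_weibull_ph(alpha, x=1.0, beta=beta, rng=rng) for _ in range(20000)]

# analytical mean for this model: lam**(-1/alpha) * Gamma(1 + 1/alpha)
lam = math.exp(beta)
analytical_mean = lam ** (-1.0 / alpha) * math.gamma(1.0 + 1.0 / alpha)
print(sum(sample) / len(sample), analytical_mean)   # the two should be close
```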
References

Abbring, J.H. and van den Berg, G.J. (2003). 'The identifiability of the mixed proportional hazards competing risks model', Journal of the Royal Statistical Society Series B 65, 701–710.
Abbring, J.H. and van den Berg, G.J. (2003). 'The unobserved heterogeneity distribution in duration analysis', unpublished paper, Tinbergen Institute, Amsterdam.
Allison, P.D. (1982). 'Discrete-time methods for the analysis of event histories', pp. 61–98 in S. Leinhardt (ed.), Sociological Methodology 1982. Jossey-Bass, San Francisco.
Allison, P.D. (1984). Event History Analysis. Sage, Newbury Park CA.
Allison, P.D. (1995). Survival Analysis Using the SAS System: A Practical Guide. SAS Institute, Cary NC.
Arulampalam, W. and Stewart, M.B. (1995). 'The determinants of individual unemployment durations in an era of high unemployment', Economic Journal 105, 321–332.
Atkinson, A.B., Gomulka, J., Micklewright, J. and Rau, N. (1984). 'Unemployment benefit, duration and incentives in Britain', Journal of Public Economics 23, 3–26.
Baker, M. and Melino, A. (2000). 'Duration dependence and nonparametric heterogeneity: a Monte Carlo study', Journal of Econometrics 96, 357–393.
Bane, M.J. and Ellwood, D.T. (1986). 'Slipping in and out of poverty', Journal of Human Resources 21, 1–23.
Bergström, R. and Edin, P.-A. (1992). 'Time aggregation and the distributional shape of unemployment duration', Journal of Applied Econometrics 7, 5–30.
Blank, R.M. (1989). 'Analyzing the length of welfare spells', Journal of Public Economics 39, 245–273.
Blossfeld, H.-P., Hamerle, A. and Mayer, K.U. (1989). Event History Analysis. Lawrence Erlbaum Associates, Hillsdale NJ.
Blossfeld, H.-P. and Rohwer, G. (2002). Techniques of Event History Modeling: New Approaches to Causal Analysis, second edition. Lawrence Erlbaum Associates, Mahwah NJ.
Box-Steffensmeier, J.M. and Jones, B.S. (2004). Event History Modeling: A Guide for Social Scientists. Cambridge University Press, Cambridge.
Cleves, M.A., Gould, W.W. and Gutierrez, R.G. (2004). An Introduction to Survival Analysis Using Stata. Stata Press, College Station TX.
Coviello, V. and Boggess, M. (2004). 'Cumulative incidence estimation in the presence of competing risks', The Stata Journal 4, 103–112.
Cox, D.R. and Oakes, D. (1984). Analysis of Survival Data. Chapman and Hall, London.
Dolton, P. and van der Klaauw, W. (1995). 'Leaving teaching in the UK: a duration analysis', Economic Journal 105, 431–444.
Dolton, P. and van der Klaauw, W. (1999). 'The turnover of teachers: a competing risks explanation', Review of Economics and Statistics 81, 543–552.
Elandt-Johnson, R.C. and Johnson, N.L. (1980, reprinted 1999). Survival Models and Data Analysis. John Wiley, New York.
Ermisch, J.F. and Ogawa, N. (1994). 'Age at motherhood in Japan', Journal of Population Economics 7, 393–420.
Gooley, T.A., Leisenring, W., Crowley, J. and Storer, B.E. (1999). 'Estimation of failure probabilities in the presence of competing risks: new representations of old estimators', Statistics in Medicine 18, 695–706.
Gould, W., Pitblado, J. and Sribney, W. (2003). Maximum Likelihood Estimation with Stata, second edition. Stata Press, College Station TX.
Gourieroux, C. (2000). Econometrics of Qualitative Dependent Variables. Cambridge University Press, Cambridge.
Greene, W.H. (2003). Econometric Analysis, fifth edition. Prentice-Hall International, Englewood Cliffs NJ.
Guo, G. (1993). 'Event-history analysis for left-truncated data', pp. 217–243 in P. Marsden (ed.), Sociological Methodology 1993. Blackwell, Cambridge MA.
Hachen, D.S. (1988). 'The competing risks model: a method for analyzing processes with multiple types of events', Sociological Methods and Research 17, 21–54.
Ham, J.C. and Rea, S.A. Jr (1987). 'Unemployment insurance and male unemployment duration in Canada', Journal of Labor Economics 5, 325–353.
Hamerle, A. and Tutz, G. (1989). Diskrete Modelle zur Analyse von Verweildauern und Lebenszeiten. Campus, Frankfurt and New York.
Han, A. and Hausman, J.A. (1990). 'Flexible parametric estimation of duration and competing risks models', Journal of Applied Econometrics 5, 1–28.
Heckman, J.J. and Honoré, B.E. (1989). 'The identifiability of the competing risks model', Biometrika 76, 325–330.
Heckman, J.J. and Singer, B. (1984). 'A method for minimising the impact of distributional assumptions in econometric models for duration data', Econometrica 52, 271–320.
Holford, T.R. (1980). 'The analysis of rates and survivorship using loglinear models', Biometrics 36, 299–305.
Hosmer, D.W. and Lemeshow, S. (1999). Applied Survival Analysis. John Wiley, New York.
Hoynes, H. and MaCurdy, T. (1994). 'Has the decline in welfare shortened welfare spells?', American Economic Review Papers and Proceedings 84, 43–48.
Jenkins, S.P. (1995). 'Easy estimation methods for discrete-time duration models', Oxford Bulletin of Economics and Statistics 57, 129–138.
Jenkins, S.P. and García-Serrano, C. (2004). 'The relationship between unemployment benefits and re-employment probabilities: evidence from Spain', Oxford Bulletin of Economics and Statistics 66, 239–260. Longer version available at http://www.iser.essex.ac.uk/pubs/workpaps/wp200017.php
Jenkins, S.P. and Rigg, J.A. (2001). The Dynamics of Poverty in Britain. Department for Work and Pensions Research Report No. 157. Corporate Document Services, Leeds. http://www.dwp.gov.uk/asd/asd5/rrep157.asp
Kalbfleisch, J.D. and Prentice, R.L. (1980). The Statistical Analysis of Failure Time Data. John Wiley, New York.
Katz, L.F. and Meyer, B.D. (1990). 'The impact of the potential duration of unemployment benefits on the duration of unemployment', Journal of Public Economics 41, 45–72.
Kiefer, N.M. (1988). 'Economic duration data and hazard functions', Journal of Economic Literature 26, 646–679.
Kiefer, N.M. (1990). 'Econometric methods for grouped duration data', pp. 97–117 in J. Hartog, G. Ridder and J. Theeuwes (eds), Panel Data and Labor Market Studies. North-Holland, Amsterdam.
Klein, J.P. and Moeschberger, M.L. (1997). Survival Analysis: Techniques for Censored and Truncated Data. Springer-Verlag, New York.
Lancaster, T. (1979). 'Econometric methods for the duration of unemployment', Econometrica 47, 939–956.
Lancaster, T. (1990). The Econometric Analysis of Transition Data. Cambridge University Press, Cambridge.
Lancaster, T. and Nickell, S. (1980). 'The analysis of re-employment probabilities for the unemployed', Journal of the Royal Statistical Society Series A 143, 141–165.
Meyer, B.D. (1990). 'Unemployment insurance and unemployment spells', Econometrica 58, 757–782.
Narendranathan, W. and Stewart, M.B. (1991). 'Simple methods for testing for the proportionality of cause-specific hazards in competing risks models', Oxford Bulletin of Economics and Statistics 53, 331–340.
Narendranathan, W. and Stewart, M.B. (1993). 'Modelling the probability of leaving unemployment: competing risks models with flexible baseline hazards', Applied Statistics 42, 63–83.
Narendranathan, W. and Stewart, M.B. (1993). 'How does the benefit effect vary as unemployment spells lengthen?', Journal of Applied Econometrics 8, 361–381.
Nickell, S. (1979). 'Estimating the probability of leaving unemployment', Econometrica 47, 1249–1266.
Ondrich, J. and Rhody, S. (1999). 'Multiple spells in the Prentice-Gloeckler-Meyer likelihood with unobserved heterogeneity', Economics Letters 63, 139–144.
O'Neill, J., Bassi, L. and Wolf, D. (1987). 'The duration of welfare spells', Review of Economics and Statistics 69, 241–248.
Petersen, T. (1991). 'Time-aggregation bias in continuous-time hazard-rate models', pp. 263–290 in P. Marsden (ed.), Sociological Methodology 1991. Blackwell, Oxford.
Petersen, T. and Koput, K. (1992). 'Time-aggregation bias in hazard-rate models with covariates', Sociological Methods and Research 21, 25–51.
Prentice, R.L. and Gloeckler, L.A. (1978). 'Regression analysis of grouped survival data with application to breast cancer data', Biometrics 34, 57–67.
Rodríguez, G. (1999?). 'Survival analysis', http://data.princeton.edu/wws509/notes/c7.pdf
Schneider, H. and Uhlendorff, A. (2004). 'The transition from welfare to work and the role of potential labor income', Working Paper 1420. IZA, Bonn.
Shaw, A., Walker, R., Ashworth, K., Jenkins, S. and Middleton, S. (1996). Moving Off Income Support: Barriers and Bridges. DSS Research Report No. 53. HMSO, London.
Singer, J.D. and Willett, J.B. (1993). 'It's about time: using discrete-time survival analysis to study duration and the timing of events', Journal of Educational Statistics 18, 155–195.
Singer, J.D. and Willett, J.B. (2003). Applied Longitudinal Data Analysis: Modeling Change and Event Occurrence. Oxford University Press, New York.
StataCorp (2003). Stata Statistical Software: Release 8. Stata Corporation, College Station TX.
Sueyoshi, G.T. (1992). 'Semiparametric proportional hazards estimation of competing risks models with time-varying covariates', Journal of Econometrics 51, 25–58.
Sueyoshi, G.T. (1995). 'A class of binary response models for grouped duration data', Journal of Applied Econometrics 10, 411–431.
Therneau, T.M. and Grambsch, P.M. (2000). Modeling Survival Data: Extending the Cox Model. Springer, New York.
Thomas, J.M. (1996). 'On the interpretation of covariate estimates in independent competing-risks models', Oxford Bulletin of Economics and Statistics 58, 27–39.
Tuma, N.B. and Hannan, M.T. (1984). Social Dynamics: Models and Methods. Academic Press, Orlando FL.
Van den Berg, G.J. (2001). 'Duration models: specification, identification and multiple durations', in J.J. Heckman and E. Leamer (eds), Handbook of Econometrics Volume 5. North-Holland, Amsterdam.
Van den Berg, G.J. and Lindeboom, M. (1998). 'Attrition in panel survey data and the estimation of multi-state labor market models', Journal of Human Resources 33, 458–478.
Van den Berg, G.J., Lindeboom, M. and Ridder, G. (1994). 'Attrition in longitudinal panel data and the empirical analysis of labor market behavior', Journal of Applied Econometrics 9, 421–435.
Willett, J.B. and Singer, J.D. (1995). 'It's déjà vu all over again: using multiple-spell discrete-time survival analysis', Journal of Educational Statistics 20, 41–67.
Wooldridge, J.M. (2002). Econometric Analysis of Cross Section and Panel Data. MIT Press, Cambridge MA.
Yamaguchi, K. (1991). Event History Analysis. Sage, Newbury Park CA.
Appendix

Nothing here at present. Future versions may contain appendices about, e.g., Maximum Likelihood, and more about Partial Likelihood, more about the details of statistical testing in general (Wald and likelihood ratio), and residual analysis.
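As a placeholder for that material, the sketch below illustrates Wald and likelihood ratio testing in the simplest duration model: an exponential model without censoring, where the MLE has closed form. The data and the null hypothesis (H0: lambda = 1) are invented for illustration; both statistics are asymptotically chi-squared with one degree of freedom under the null.

```python
import math

def exp_loglik(lam, times):
    # log-likelihood of an exponential duration model with no censoring
    n = len(times)
    return n * math.log(lam) - lam * sum(times)

def wald_and_lr(times, lam0):
    # MLE: lam_hat = n / sum(t); the observed information is n / lam**2,
    # so Var(lam_hat) is estimated by lam_hat**2 / n.
    n = len(times)
    lam_hat = n / sum(times)
    wald = (lam_hat - lam0) ** 2 / (lam_hat ** 2 / n)
    lr = 2.0 * (exp_loglik(lam_hat, times) - exp_loglik(lam0, times))
    return wald, lr

times = [0.7, 1.2, 0.4, 2.5, 1.9, 0.8, 3.1, 1.1, 0.6, 1.4]   # invented spells
w, lr = wald_and_lr(times, lam0=1.0)
# the two statistics agree asymptotically but differ in finite samples;
# here both are well below the 5% chi-squared(1) critical value of 3.84
print(w, lr)
```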