Verth's Application of Event History On Clinical Tests

Department of Biostatistics
University of Aarhus
May 11 2004
Michael Væth
STATISTICAL ANALYSIS OF SURVIVAL DATA
IN CLINICAL RESEARCH 4
The main topic in the third period is analysis of
aggregated survival time data, i.e. data in which a
record reflects the survival experience of several
individuals in a given time and/or age period.
Such data are often encountered in epidemiological
studies and the methods presented below are
essentially identical to methods used to analyze
incidence rates and mortality rates in epidemiology.
A comprehensive coverage of analysis of aggregated
survival time data is beyond the scope of this course,
but the main approaches will be presented and
exemplified.
One additional topic related to survival time data with
individual records is also presented today:
calculation of the expected survival curve based on
life tables for an external reference population.
ANALYSIS OF SURVIVAL TIME DATA: RELATION TO METHODS USED IN
EPIDEMIOLOGY
The statistical methods described in periods 1 and 2 have
focused on mortality rates and modeled how the rates
depend on prognostic factors.
A clinical study of a group of patients followed until death
or some common closing date may be viewed as an
epidemiological study of a fixed cohort.
The methods for analysis of survival time data are
closely related to methods used for analysis of incidence
and mortality rates in epidemiological cohort studies.
In epidemiology event rates are computed as
$$\text{rate} = \frac{\text{number of events}}{\text{total time at risk}}$$
The time scale and/or age scale is often split into a
number of intervals (e.g. 5-year intervals) and separate
rates are computed for each time/age interval.
The effect of an exposure on the occurrence of the
event can be expressed as a rate ratio, which can be
estimated as a crude rate ratio or stratified on age/time
categories as well as other risk factors. Poisson
regression is used for more comprehensive analysis.
In the Cox regression analysis the hazard rate is
unspecified, i.e. no restriction is imposed on the way the
hazard rate depends on time, and the shape of the
estimated baseline functions (hazard rate and survival
function) is determined completely by the data.
Alternatively, a parametric description of the hazard
rate may be postulated and the unknown parameters
are then estimated from the data, typically as
maximum-likelihood estimates.
A simple parametric model: the exponential
distribution
The simplest possible parametric model for the hazard
rate assumes an unknown, constant rate. The
distribution of lifetimes with a constant hazard rate is
called the exponential distribution.
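In this case we have that
$$\lambda(t) = \lambda, \qquad S(t) = \exp(-\lambda t) = e^{-\lambda t}$$
The maximum-likelihood estimate of the constant
hazard rate is
$$\hat{\lambda} = \frac{d}{s} = \frac{\text{number of events}}{\text{total time at risk}}$$
The standard error of $\hat{\lambda}$ is estimated by
$$SE(\hat{\lambda}) = \frac{\sqrt{d}}{s}$$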
A 95% confidence interval for the unknown rate is
usually obtained by computing a symmetric confidence
interval for ln(rate) and transforming this interval back to
the original scale.
One may show that
$$SE(\ln(\hat{\lambda})) = \frac{1}{\sqrt{d}},$$
so a 95% confidence interval for the constant hazard
rate has
$$\text{lower bound} = \hat{\lambda} / C, \qquad \text{upper bound} = \hat{\lambda} \cdot C,$$
where
$$C = \exp\left(1.96 / \sqrt{d}\right)$$
Note: The individual survival times are not needed to
compute the estimate, the standard errors and the
confidence limits. They can all be obtained directly
from the aggregated data d and s.
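The calculation from aggregated data can be sketched in a few lines of Python. This is only an illustration of the formulas above, not part of the course material; the function name `rate_with_ci` is hypothetical, and the inputs d = 57 and s = 1208.2793 are the melanoma totals reported in the example that follows.

```python
import math

def rate_with_ci(d, s, z=1.96):
    """Rate, standard error and 95% limits from d events and s time at risk."""
    rate = d / s                       # maximum-likelihood estimate
    se = math.sqrt(d) / s              # standard error of the rate
    c = math.exp(z / math.sqrt(d))     # factor from the interval for ln(rate)
    return rate, se, rate / c, rate * c

rate, se, lower, upper = rate_with_ci(57, 1208.2793)
print(f"rate = {rate:.6f}, SE = {se:.6f}, 95% CI = ({lower:.6f}, {upper:.6f})")
```

Up to rounding of the factor 1.96, the limits agree with those printed by stptime for the total cohort.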
Example: Survival with malignant melanoma
Consider the data used in Exercise 12. First a patient
identification number is generated (this is needed for
some of the commands), then the data are defined as
survival time data:
gen id=_n
stset survtime , failure(status==1) id(id) ///
    noshow scale(365.25)
stptime
The stptime command generates the following output. The
calculations are based on the formulas above.
  Cohort | person-time   failures        rate   [95% Conf. Interval]
---------+----------------------------------------------------------
   total |   1208.2793         57   .04717452   .0363884    .0611578
Separate rates for each category of a covariate are also
available
stptime , by(sex)
     sex | person-time   failures        rate   [95% Conf. Interval]
---------+----------------------------------------------------------
  female |   787.46886         28   .03555696   .0245506    .0514976
    male |    420.8104         29   .06891465   .0478903    .0991689
---------+----------------------------------------------------------
   total |   1208.2793         57   .04717452   .0363884    .0611578
Only one variable is allowed in the by option. To get
separate rates for intervals of follow-up time use the
option at(), which may be combined with the by option,
e.g.
stptime , at(2(2)8)
stptime , at(2(2)8) by(sex)
Output from the first command
  Cohort | person-time   failures        rate   [95% Conf. Interval]
---------+----------------------------------------------------------
 (0 - 2] |   387.25394         15   .03873427   .0233516    .0642502
 (2 - 4] |   338.72005         21    .0619981   .0404232     .095088
 (4 - 6] |   241.72895         14   .05791611    .034301    .0977896
 (6 - 8] |   131.79329          5    .0379382   .0157909    .0911477
     > 8 |   108.78303          2   .01838522   .0045981    .0735122
---------+----------------------------------------------------------
   total |   1208.2793         57   .04717452   .0363884    .0611578
Apparently, the mortality rate initially increases to reach
a plateau and then decreases.
The underlying lifetime distribution is no longer
exponential when the rate is computed for different
follow-up time intervals. The rate is now piecewise
constant.
Distributions with piecewise constant hazard rate
constitute a flexible class of distributions, which are just
as easy to work with as the exponential distribution.
The exponential distribution itself usually gives too crude
a picture of the distribution of lifetimes.
The distributions are characterized by the value of the
hazard rate in each of a number of disjoint intervals:
$$\lambda(t) = \begin{cases} \lambda_1 & \tau_0 < t \le \tau_1 \\ \lambda_2 & \tau_1 < t \le \tau_2 \\ \vdots & \\ \lambda_r & \tau_{r-1} < t \end{cases}$$
For interval j from $\tau_{j-1}$ to $\tau_j$ let
$d_j$ = the number of events
$s_j$ = the total time at risk
Knowledge of the statistics $d_1, d_2, \ldots, d_r, s_1, s_2, \ldots, s_r$
permits calculation of all relevant estimates and test
statistics.
For each interval the value of the hazard rate is
estimated by
$$\hat{\lambda}_j = d_j / s_j,$$
the corresponding standard error becomes
$$SE(\hat{\lambda}_j) = \sqrt{d_j} / s_j,$$
and 95% confidence intervals are also obtained as
before.
A distribution with piecewise constant hazard rate can
be viewed as a parametric version of the life table and
we may in fact estimate the survival function in a way
very similar to the one used when computing the life
table estimate of the survival function. The probability of
surviving the jth interval given alive at the beginning of
the interval is estimated by
$$\hat{p}_j = \exp\left(-\hat{\lambda}_j (\tau_j - \tau_{j-1})\right)$$
and the probability of surviving from time 0 until the end
of the jth interval is then estimated by
$$\hat{S}(\tau_j) = \hat{p}_1 \cdot \hat{p}_2 \cdots \hat{p}_j$$
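As a rough illustration of this estimate, the following Python sketch multiplies the interval-specific survival probabilities for the melanoma data. The event counts and risk times are those shown in the stptime output for the four bounded follow-up intervals; the function name `piecewise_survival` and the layout of the input list are assumptions made for this example.

```python
import math

# (interval end in years, events d_j, time at risk s_j) for the intervals
# (0-2], (2-4], (4-6], (6-8] of the melanoma data
INTERVALS = [
    (2.0, 15, 387.25394),
    (4.0, 21, 338.72005),
    (6.0, 14, 241.72895),
    (8.0, 5, 131.79329),
]

def piecewise_survival(intervals, start=0.0):
    """Return [(interval end, estimated survival)] for a piecewise constant hazard."""
    surv, previous, curve = 1.0, start, []
    for end, d, s in intervals:
        rate = d / s                                  # estimated hazard in the interval
        surv *= math.exp(-rate * (end - previous))    # multiply by p_j
        curve.append((end, surv))
        previous = end
    return curve

for end, surv in piecewise_survival(INTERVALS):
    print(f"S({end:.0f}) = {surv:.4f}")
```

Under these assumptions the estimated survival after 8 years is roughly 0.67.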
Distributions with piecewise constant hazards provide
the link between the methods used for analysis of
survival data and the methods used for analysis of
epidemiological cohort studies.
Survival analysis methodology uses individual records
whereas the epidemiological analysis usually is based
on a multi-way table of aggregated data.
Example: Survival with malignant melanoma
The STATA command
stptime , at(2(2)8) by(sex)
produces the following output
     sex | person-time   failures        rate   [95% Conf. Interval]
---------+----------------------------------------------------------
female   |
 (0 - 2] |   243.64408          5   .02052174   .0085417    .0493041
 (2 - 4] |   221.13621         11    .0497431   .0275477    .0898214
 (4 - 6] |   159.99452          8   .05000171   .0250057    .0999839
 (6 - 8] |   86.639288          2   .02308422   .0057733    .0923008
     > 8 |   76.054757          2   .02629684   .0065768    .1051463
---------+----------------------------------------------------------
male     |
 (0 - 2] |   143.60986         10    .0696331   .0374664    .1294164
 (2 - 4] |   117.58385         10    .0850457   .0457592    .1580614
 (4 - 6] |   81.734428          6   .07340848   .0329795    .1633984
 (6 - 8] |   45.154004          3   .06643929   .0214281    .2059996
     > 8 |   32.728268          0           0          .           .
---------+----------------------------------------------------------
   total |   1208.2793         57   .04717452   .0363884    .0611578
To compare the survival for males and females
controlling for follow-up time (categorized in five time
intervals) using standard epidemiological methods, only
the 2x5 table of person-time and the 2x5 table of failures
are needed.
STATA has the commands stsplit and collapse, which
can be used to form an aggregated data set. This new
data set can then be analyzed by a series of commands
for analysis of aggregated data.
First the individual records are split after 2, 4, 6, and 8
years of follow-up.
stsplit timecat , at(2(2)8)
We then define the new variables died and risktime
from the system variables _d, _t0 and _t, which for each
interval give the event count (0 or 1), the start time and
the end time of the interval:
gen died=_d
gen risktime=_t-_t0
All the individual contributions are then aggregated (i.e.
summed) in a two-way table of sex versus time interval
and the result is saved in a new file
collapse (sum) died risktime , by(timecat sex)
save e:\kurser\survival\melatimesex.dta
To see the contents of the new file write
use e:\kurser\survival\melatimesex.dta
list
. list
     +------------------------------------+
     |    sex   timecat   died   risktime |
     |------------------------------------|
  1. | female         0      5   243.6441 |
  2. |   male         0     10   143.6099 |
  3. | female         2     11   221.1362 |
  4. |   male         2     10   117.5838 |
  5. | female         4      8   159.9945 |
     |------------------------------------|
  6. |   male         4      6   81.73443 |
  7. | female         6      2   86.63929 |
  8. |   male         6      3     45.154 |
  9. | female         8      2   76.05476 |
 10. |   male         8      0   32.72827 |
     +------------------------------------+
Note that timecat takes the lower limit of the interval as
category value.
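The split-and-sum idea behind stsplit and collapse can be imitated in plain Python. The sketch below is an illustration under stated assumptions, not a description of how STATA works internally: every record is assumed to start at time 0, the three records are hypothetical, and the function names `split_record` and `aggregate` are made up for this example. As in the listing above, each piece is labelled by the lower limit of its interval, and an event at a cut point is counted in the interval ending there.

```python
from collections import defaultdict

def split_record(time, died, cuts):
    """Split one follow-up record (entry at 0, exit at `time`) at the cut points.

    Yields (lower limit, risk time, event) for each piece of follow-up.
    """
    lower = 0.0
    for upper in list(cuts) + [float("inf")]:
        if time <= lower:
            break
        end = min(time, upper)
        event = 1 if (died and time <= upper) else 0   # event only in the last piece
        yield lower, end - lower, event
        lower = upper

def aggregate(records, cuts):
    """Sum events and risk time in cells defined by (timecat, sex)."""
    table = defaultdict(lambda: [0, 0.0])
    for sex, time, died in records:
        for timecat, risktime, event in split_record(time, died, cuts):
            cell = table[(timecat, sex)]
            cell[0] += event
            cell[1] += risktime
    return dict(table)

# Hypothetical records: (sex, follow-up time in years, died)
records = [("female", 5.0, 1), ("male", 1.5, 1), ("female", 3.0, 0)]
table = aggregate(records, cuts=(2.0, 4.0, 6.0, 8.0))
for (timecat, sex), (died, risktime) in sorted(table.items()):
    print(sex, timecat, died, risktime)
```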
To compute the mortality rate ratio for males versus
females stratified on categories of follow-up time write
ir died sex risktime , by(timecat)
The command syntax is
ir event-variable exposure-variable time-at-risk-variable , by(stratum-variable)
The exposure variable must have two categories and
only one stratum variable is allowed.
Output
. ir died sex risktime , by(timecat)
     timecat |      IRR    [95% Conf. Interval]   M-H Weight
-------------+------------------------------------------------
           0 | 3.388889    1.055401    12.63597     1.85567
           2 | 1.702619    .6482636    4.416032    3.828909
           4 | 1.463415    .4185228    4.809543    2.710744
           6 |      2.9    .3322018    34.72104    .6818182
           8 |        0           0    12.26261    .6055046
-------------+------------------------------------------------
       Crude | 1.936121    1.111589    3.377279   (exact)
M-H combined | 1.936666    1.147481    3.268616
--------------------------------------------------------------
Test of homogeneity (M-H)   chi2(4) = 1.60   Pr>chi2 = 0.8096
Essentially the same result is obtained by a Cox
regression analysis of the original data set with
individual records
--------------------------------------------------------------
     _t | Haz. Ratio   Std. Err.      z   P>|z|   [95% Conf. Inter]
--------+-----------------------------------------------------
    sex |   1.939011   .5140979   2.50   0.013   1.153182  3.260339
--------------------------------------------------------------
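The Mantel-Haenszel combined rate ratio can be checked by hand from the aggregated table. The Python sketch below uses the standard Mantel-Haenszel formula for person-time data (the sum of d1·t0/T over strata divided by the sum of d0·t1/T); the function name and the tuple layout are assumptions for this illustration. With the risk times from the listing it gives about 1.94, and the printed value 1.936666 is reproduced when the risk times are first rounded to whole years, which suggests, without proving it, that the ir run above worked with rounded person-time.

```python
def mantel_haenszel_rate_ratio(strata):
    """M-H rate ratio; strata holds (d1, t1, d0, t0) per stratum, where index 1
    is the exposed group (here males) and index 0 the unexposed (females)."""
    strata = list(strata)
    num = sum(d1 * t0 / (t1 + t0) for d1, t1, d0, t0 in strata)
    den = sum(d0 * t1 / (t1 + t0) for d1, t1, d0, t0 in strata)
    return num / den

# (male deaths, male risk time, female deaths, female risk time) per timecat
MELANOMA = [
    (10, 143.6099, 5, 243.6441),
    (10, 117.5838, 11, 221.1362),
    (6, 81.73443, 8, 159.9945),
    (3, 45.154, 2, 86.63929),
    (0, 32.72827, 2, 76.05476),
]

irr_exact = mantel_haenszel_rate_ratio(MELANOMA)
irr_rounded = mantel_haenszel_rate_ratio(
    (d1, round(t1), d0, round(t0)) for d1, t1, d0, t0 in MELANOMA
)
print(f"M-H rate ratio, risk time as listed:  {irr_exact:.4f}")
print(f"M-H rate ratio, risk time rounded:    {irr_rounded:.4f}")
```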
EXAMPLE: THE LIFE SPAN STUDY
The mortality of approximately 100,000 survivors of
the atomic bombings of Hiroshima and Nagasaki has
been followed since October 1950 in a still ongoing
study called the Life Span Study (LSS).
The table below gives aggregated data on cancer
mortality from 1971 to 1990 in Hiroshima; only two dose
categories are considered here.
Age at                Dose        No. in   Cancer   Risk time
exposure    Sex       (Gy)        1950     deaths   (in 1000 y)
0-19        M         ≥ 1           392       23        6.77
                      < 0.005      3696      105       67.33
            F         ≥ 1           406       22        7.12
                      < 0.005      4221       71       79.03
20-39       M         ≥ 1           219       37        2.92
                      < 0.005      1774      218       23.93
            F         ≥ 1           506       69        7.82
                      < 0.005      4272      266       71.48
40-         M         ≥ 1           431       40        1.59
                      < 0.005      3355      233       13.50
            F         ≥ 1           376       37        2.47
                      < 0.005      4052      255       27.19
LSS: Mortality, all cancers combined, Hiroshima, 1971-90
The STATA file hiro7190.dta contains these data. The
variable names are agex, sex, dose, cases, and pyr.
The following STATA commands may be used to
estimate the effect of exposure stratified on
age-at-exposure and sex
use e:\kurser\survival\hiro7190.dta
//combine agex and sex to a single variable
egen agexsex=group(agex sex)
//avoid rounding error
replace pyr=pyr*1000
ir cases dose pyr , by(agexsex)
Output
group(agex sex) |      IRR    [95% Conf. Interval]   M-H Weight
----------------+---------------------------------------------
              1 | 2.178505    1.323522    3.445787    9.593117
              2 |  3.43935    2.029444    5.615747    5.867905
              3 | 1.390929    .9539309    1.977719    23.70801
              4 | 2.371075    1.792351    3.100524    26.23102
              5 | 1.457608     1.01505    2.045349     24.5507
              6 | 1.597253    1.099477     2.26114    21.23567
----------------+---------------------------------------------
          Crude | 1.955327    1.688804    2.255824
   M-H combined | 1.852352    1.607143    2.134972
--------------------------------------------------------------
Test of homogeneity (M-H)   chi2(5) = 15.53   Pr>chi2 = 0.0083
Note: The last stratum variable is moving fastest, i.e.
1 ~ 0-19 M, 2 ~ 0-19 F, 3 ~ 20-39 M, etc.
The rate ratio for the exposure effect is almost 2 and highly
significant. The test of homogeneity (identical
stratum-specific rate ratios) is, however, also statistically
significant, indicating that the effect of exposure
depends on sex and/or age at exposure of the survivor.
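The stratum-specific and crude rate ratios in the output can be recomputed directly from the aggregated LSS table. The following Python sketch is only a check of the arithmetic; the variable names and the tuple layout are assumptions for this illustration, and the numbers are the cancer deaths and risk times (in 1000 years) from the table above.

```python
# (stratum, exposed deaths, exposed risk time, unexposed deaths, unexposed risk time)
LSS = [
    ("0-19 M", 23, 6.77, 105, 67.33),
    ("0-19 F", 22, 7.12, 71, 79.03),
    ("20-39 M", 37, 2.92, 218, 23.93),
    ("20-39 F", 69, 7.82, 266, 71.48),
    ("40- M", 40, 1.59, 233, 13.50),
    ("40- F", 37, 2.47, 255, 27.19),
]

def rate_ratio(d1, t1, d0, t0):
    """Rate in the exposed group divided by rate in the unexposed group."""
    return (d1 / t1) / (d0 / t0)

stratum_irr = {label: rate_ratio(d1, t1, d0, t0) for label, d1, t1, d0, t0 in LSS}
crude_irr = rate_ratio(
    sum(row[1] for row in LSS), sum(row[2] for row in LSS),
    sum(row[3] for row in LSS), sum(row[4] for row in LSS),
)

for label, irr in stratum_irr.items():
    print(f"{label:8s} {irr:.4f}")
print(f"crude    {crude_irr:.4f}")
```

The spread of the stratum-specific ratios, from about 1.4 to about 3.4, is what the test of homogeneity reacts to.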
A further investigation of this effect modification requires
a more refined method of analysis, a so-called Poisson
regression analysis.
POISSON REGRESSION
In the Poisson regression analysis the number of
events in a given cell of the multi-way table is treated as
a Poisson variable with mean equal to rate × risk time.
Comment: A Poisson distribution with mean $\mu$ is the
limiting distribution of a binomial distribution (n,p) as
n goes to infinity and p tends to zero such that the mean
is fixed, $np = \mu$. Poisson distributions are often used to
model the occurrence of random events.
In a Poisson regression model the rate in a given cell is
modeled as a product of factors reflecting the effect of
the category levels of the variables defining the
multi-way table.
Example: Cancer mortality in the LSS
The LSS data above form a 3x2x2 table with agex, sex,
and dose as classifying variables.
A Poisson regression model specifies a multiplicative
structure for the mortality rate $\lambda_{ijk}$ in the cell given by
agex=i, sex=j, dose=k (i = 0,1,2; j = 0,1; k = 0,1).
If a reference category is chosen for each of the
classifying variables (e.g. i = j = k = 0), the Poisson
regression model with no interaction assumes that the
rates satisfy
$$\lambda_{ijk} = \lambda_{000} \, a_i \, b_j \, c_k$$
The parameters $a_i$, $b_j$, and $c_k$ are rate ratios (with $a_0 = b_0 = c_0 = 1$). The
parameter $a_1$ represents the rate ratio of the mortality in
the second age-at-exposure category relative to the first
category when controlling for sex and dose as
independent risk factors.
Poisson models are usually specified as additive models
for the ln(rate). Using dummy variables we have
$$\ln(\text{rate}) = \ln(\lambda_{ijk}) = \beta_0 + \beta^{(1)}_i z^{(1)}_i + \beta^{(2)}_j z^{(2)}_j + \beta^{(3)}_k z^{(3)}_k$$
The constant, the parameter $\beta_0$, is the ln(rate) in the
reference cell and the other $\beta$-parameters are
logarithms of rate ratios. Models with interaction terms
may also be used.
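To make the log-linear model concrete, the following Python sketch fits the main-effects model to the 12 LSS cells by Newton-Raphson maximization of the Poisson likelihood, using only the standard library. It is a minimal illustration, not the algorithm used by any particular package: the 0/1 coding of the dummy variables, the starting values, the fixed number of iterations and the small linear solver are all assumptions made for this example, and risk time is expressed in years.

```python
import math

# (agex, sex, dose, cases, person-years); agex 0/1/2 = 0-19/20-39/40-,
# sex 0/1 = male/female, dose 0/1 = unexposed/exposed
CELLS = [
    (0, 0, 1, 23, 6770), (0, 0, 0, 105, 67330),
    (0, 1, 1, 22, 7120), (0, 1, 0, 71, 79030),
    (1, 0, 1, 37, 2920), (1, 0, 0, 218, 23930),
    (1, 1, 1, 69, 7820), (1, 1, 0, 266, 71480),
    (2, 0, 1, 40, 1590), (2, 0, 0, 233, 13500),
    (2, 1, 1, 37, 2470), (2, 1, 0, 255, 27190),
]

def design_row(agex, sex, dose):
    # constant, female, age 20-39, age 40-, exposed
    return [1.0, float(sex == 1), float(agex == 1), float(agex == 2), float(dose == 1)]

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x

def fit_poisson(cells, iterations=50):
    """Return maximum-likelihood estimates of the log-linear parameters."""
    xs = [design_row(c[0], c[1], c[2]) for c in cells]
    ys = [c[3] for c in cells]
    ts = [c[4] for c in cells]
    p = len(xs[0])
    beta = [math.log(sum(ys) / sum(ts))] + [0.0] * (p - 1)   # start at the overall rate
    for _ in range(iterations):
        mu = [t * math.exp(sum(b * x for b, x in zip(beta, row)))
              for row, t in zip(xs, ts)]
        score = [sum(row[j] * (y - m) for row, y, m in zip(xs, ys, mu))
                 for j in range(p)]
        info = [[sum(row[j] * row[k] * m for row, m in zip(xs, mu))
                 for k in range(p)] for j in range(p)]
        beta = [b + s for b, s in zip(beta, solve(info, score))]
    return beta

beta = fit_poisson(CELLS)
for name, b in zip(["constant", "female", "age 20-39", "age 40-", "exposed"], beta):
    print(f"{name:10s} ln(rate ratio) = {b:8.4f}   rate ratio = {math.exp(b):.4f}")
```

The exponentiated constant is the rate in the reference cell rather than a rate ratio.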
Poisson regression with STATA
The following commands fit the above Poisson
regression model to the LSS data in hiro7190.dta.
Note that the output from the default version (the first
command) gives the log-linear parameter estimates. To
get rate ratios the option irr must be added (second
version)
xi:poisson cases i.sex i.agex i.dose , ///
    exposure(pyr) nolog
xi:poisson cases i.sex i.agex i.dose , ///
    exposure(pyr) nolog irr
Output
. xi:poisson cases i.sex i.agex i.dose , exposure(pyr) nolog
i.sex      _Isex_1-2    (naturally coded; _Isex_1 omitted)
i.agex     _Iagex_1-3   (naturally coded; _Iagex_1 omitted)
i.dose     _Idose_0-1   (naturally coded; _Idose_0 omitted)
Poisson regression                      Number of obs =      12
                                        LR chi2(4)    = 1150.09
                                        Prob > chi2   =  0.0000
Log likelihood = -47.475645             Pseudo R2     =  0.9237
--------------------------------------------------------------
   cases |     Coef.   Std. Err.       z   P>|z|   [95% Conf. Int.]
---------+----------------------------------------------------
 _Isex_2 | -.6606352    .054719  -12.07   0.000   -.767882  -.55339
_Iagex_2 |  1.529529   .0798769   19.15   0.000    1.37297  1.68609
_Iagex_3 |   2.29477   .0796431   28.81   0.000    2.13867  2.45087
_Idose_1 |  .6165749   .0725606    8.50   0.000    .474358   .75879
   _cons | -6.357755   .0712617  -89.22   0.000   -6.49743  -6.2181
     pyr | (exposure)
--------------------------------------------------------------
. xi:poisson cases i.sex i.agex i.dose , exposure(pyr) irr
******************* first part as above **********************
--------------------------------------------------------------
   cases |       IRR   Std. Err.       z   P>|z|   [95% Conf. Int.]
---------+----------------------------------------------------
 _Isex_2 |  .5165231   .0282636  -12.07   0.000    .463995  .574998
_Iagex_2 |  4.616002    .368712   19.15   0.000    3.94707  5.39830
_Iagex_3 |  9.922157   .7902317   28.81   0.000    8.48816  11.5984
_Idose_1 |  1.852572   .1344237    8.50   0.000    1.60698  2.13569
     pyr | (exposure)
--------------------------------------------------------------
The reference group is unexposed males, age 0-19 in
August 1945. Note that the constant term is not printed
when rate ratios are requested.
Parameter estimates are maximum-likelihood
estimates. The dose effect is highly significant and
almost identical to the M-H combined estimate of 1.852352
found in the stratified analysis. As expected the mortality
also depends on sex and age-at-exposure.
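Does the model fit the data?
The table below compares observed count with
expected count predicted by the Poisson model fitted
above. We have
$$\text{Expected}_{ijk} = \hat{\lambda}_{ijk} \cdot PYR_{ijk}, \quad \text{where } \hat{\lambda}_{ijk} = \hat{\lambda}_{000} \, \hat{a}_i \, \hat{b}_j \, \hat{c}_k.$$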

                        unexposed                 exposed
Age at exposure      0-19   20-39     40-     0-19   20-39     40-
Males    observed     105     218     233       23      37      40
         expected   116.7   191.5   232.2     21.7    43.3    50.7
Females  observed      71     266     255       22      69      37
         expected    70.8   295.4   241.5     11.8    59.9    40.6
Illustration: For exposed males aged 20-39 at exposure
we have e.g.
$$\text{Expected}_{101} = \exp(-6.358) \cdot 4.616 \cdot 1.853 \cdot 2.92 \cdot 1000 = 43.279$$
The usual $\chi^2$ goodness-of-fit test becomes 22.27 with
12 − 5 = 7 degrees of freedom giving p = 0.0023.
STATA's command poisgof computes this statistic and
the corresponding likelihood ratio test
poisgof
poisgof , pearson
Output
. poisgof
Goodness-of-fit chi2 = 20.61145
Prob > chi2(7)       = 0.0044
. poisgof , pearson
Prob > chi2(7)       = 0.0023
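Both goodness-of-fit statistics can be approximated from the table of observed and expected counts. The Python sketch below is an illustration only: it uses the expected counts as printed (rounded to one decimal), so the results differ slightly from 22.27 and 20.61145, and the variable names are assumptions. It also recomputes the expected count for exposed males aged 20-39 from the unrounded estimates in the regression output.

```python
import math

# Observed and expected counts in the order of the table: unexposed males,
# unexposed females, exposed males, exposed females (three age groups each)
observed = [105, 218, 233, 71, 266, 255, 23, 37, 40, 22, 69, 37]
expected = [116.7, 191.5, 232.2, 70.8, 295.4, 241.5,
            21.7, 43.3, 50.7, 11.8, 59.9, 40.6]

# Pearson chi-square: sum of (observed - expected)^2 / expected
pearson = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Deviance (likelihood ratio statistic against the saturated model)
deviance = 2 * sum(o * math.log(o / e) - (o - e) for o, e in zip(observed, expected))

# Expected count for exposed males aged 20-39: rate in the reference cell
# times the rate ratios for age 20-39 and dose, times person-years
expected_illustration = math.exp(-6.357755) * 4.616002 * 1.852572 * 2.92 * 1000

print(f"Pearson chi2 = {pearson:.2f}, deviance = {deviance:.2f}")
print(f"expected count = {expected_illustration:.3f}")
```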
The fit of the model can be improved by adding
interaction terms. The following output shows the result
of a series of such model fits. Only the output from the
final model is shown. Note that the first model, which
includes the agex*sex interaction, corresponds to the
stratified analysis above.
. quietly xi:poisson cases i.sex*i.agex i.dose , exposure(pyr) irr
. poisgof
Prob > chi2(5)       = 0.0105
. quietly xi:poisson cases i.sex*i.agex i.dose i.dose*i.sex , exposure(pyr) irr
. poisgof
Prob > chi2(4)       = 0.0496
. quietly xi:poisson cases i.sex*i.agex i.dose i.dose*i.sex i.dose*i.agex , exposure(pyr) irr
. poisgof
Prob > chi2(2)       = 0.3886
. poisson //with no argument the previous fit is displayed
Poisson regression                      Number of obs =      12
                                        LR chi2(9)    = 1168.81
                                        Prob > chi2   =  0.0000
                                        Pseudo R2     =  0.9388
--------------------------------------------------------------
       cases |     IRR   Std. Err.      z   P>|z|   [95% Conf. Inter]
-------------+------------------------------------------------
     _Isex_2 | .588025   .082229   -3.80   0.000   .447058   .773442
    _Iagex_2 | 5.79417   .660672   15.41   0.000   4.63376   7.24516
    _Iagex_3 | 11.3722   1.28203   21.57   0.000   9.11773    14.184
_IsexXage_~2 | .715692    .11442   -2.09   0.036    .52317    .97907
_IsexXage_~3 | .891059   .143171   -0.72   0.473    .65034   1.22088
    _Idose_1 | 2.27988   .411002    4.57   0.000   1.60127   3.24609
_IdosXsex_~2 | 1.43131   .210778    2.44   0.015   1.07246   1.91022
_IdosXage_~2 | .679969   .135972   -1.93   0.054    .45949   1.00624
_IdosXage_~3 | .557839   .115872   -2.81   0.005   .371279   .838141
         pyr | (exposure)
--------------------------------------------------------------
. testparm *sXa* //testing no dose by age interaction
( 1) [cases]_IdosXage_1_2 = 0
( 2) [cases]_IdosXage_1_3 = 0
chi2( 2) = 7.90
Prob > chi2 = 0.0193
. testparm *xXa* //testing no sex by age interaction
( 1) [cases]_IsexXage_2_2 = 0
( 2) [cases]_IsexXage_2_3 = 0
chi2( 2) = 5.74
Prob > chi2 = 0.0566
Comments
The final model is consistent with the data, but gives a
rather complex description.
The dose effect is modified by both sex (larger rate ratio
for females) and age-at-exposure (the dose effect
decreases with age-at-exposure).
Having 10 estimated parameters the final model is only
slightly simpler than the saturated model (i.e. the model
with 12 freely varying rates).
Note also
The goodness-of-fit test is not very reliable in large
tables with many small counts. In such circumstances
one may instead compare a given model with a much
richer model that e.g. includes a lot of interaction
parameters.
FROM SURVIVAL TIME DATA
TO POISSON REGRESSION ANALYSIS
In the LSS example the cancer mortality rate in each of
the 12 groups was assumed constant during the follow-up from
1971 to 1990. This is a highly unrealistic model, since it
is well-known that cancer mortality rates increase
dramatically with age.
In analyses of data from large epidemiological cohort
studies the dependence of rates on age and calendar
time is usually described by piecewise constant
hazard rate models.
This gives much more realistic models with a better
correction for confounding effects of age and/or calendar
time.
The analysis of such models is based on event counts
and risk times in a multi-way table and in this context
the method of analysis is usually denoted Poisson
regression, since the analysis is formally identical (i.e.
gives the same maximum likelihood estimates) to a
regression model for counts described by Poisson
distributions.
Individual data records are initially aggregated to form
the multi-way table of event counts and
person-years-at-risk, see the figure below.
[Figure: age (axis from 10 to 25) plotted against calendar time (axis from 1980 to 2000), divided into cells]
In each cell compute:
Number of events D
Total time at risk S
In the analysis of the LSS data multi-way tables with
3000-8000 cells are routinely used. These tables are
e.g. formed by a cross-classification on age (5-year
intervals), calendar time (5-year intervals), sex, city
and dose (8-12 categories) and separate analyses are
carried out for the most common cancer types.
For each entry in the multi-way table a crude rate can be
estimated as D/S = events/risktime. In the analysis the
dependence of these rates on the classifying factors is
studied using Poisson regression models very similar to
Cox regression models.
Main difference: the unspecified baseline hazard of the
Cox regression model is replaced by a piecewise
constant hazard.
In Poisson regression models effects of categorical
covariates used as classifying factors are described by
rate ratios. Both models with internal reference rates
and models with external reference rates are available.
The analysis requires software that can
1. Form the multi-way table of counts and person-years,
2. Perform a Poisson regression analysis of the
aggregated data.
Software:
Forming the table:
EPICURE, SAS, STATA (but not SPSS)
Poisson regression:
EPICURE, EGRET, SAS, STATA, S-Plus, Genstat,
GLIM, Statistix etc. (SPSS: not really).
FORMING EVENT-RISKTIME TABLES WITH STATA
The STATA commands stsplit and collapse are used
to transform survival time data with individual records
into a multi-way event-risktime table to be analyzed with
Poisson regression.
A few examples illustrate some of the possibilities. The
manual presents many more; stsplit is a highly
versatile command.
Example 1. Splitting on time in study
In a clinical trial the data are usually described by
stset time , failure(status==1) noshow
To split the data at 1, 2, 3, and 5 years of follow-up write
stsplit timecat , at(1,2,3,5)
and data are split in 5 time categories.
Example 2: Splitting on age
If the survival time data are defined by
stset outdate , failure(status==1) enter(time indate) ///
    origin(time bdate) scale(365.25) noshow
the time scale is age in years and we may consider
using
stsplit agecat , at(10(10)70)
Example 3: Splitting on age and time in study
The data considered in example 2 can be split on both
age and time in study with the commands
stset outdate , failure(status==1) enter(time indate) ///
    origin(time bdate) scale(365.25) noshow
from(time indate)
After the data have been split the multi-way table is
formed by the commands
gen event=_d
gen risktime=_t-_t0
collapse (sum) event risktime , by(varlist)
save newfilename
use newfilename
xi: poisson event varlist1 , exposure(risktime)
other options
etc.
where varlist defines the multi-way table and varlist1 is
a subset of these variables and interaction terms.
Note:
The data do not have to be collapsed to do Poisson
regression, but the data set may become very large if split on
several time scales in many intervals, and collapsing the
data may speed up computation. Also consider deleting
unnecessary variables first.
POISSON REGRESSION
MALIGNANT MELANOMA DATA
To compare the results from a Cox regression analysis
with those from a Poisson regression model of the same
covariates consider
use "E:\kurser\survival\melanoma.dta"
* generate a person id number
gen id=_n
* define data as survival time data
stset survtime , ///
    failure(status==1) noshow scale(365.25) id(id)
* for later comparison we fit the following
* Cox model
xi:stcox i.sex i.invasion i.ecells ///
    i.ulcerat , nolog
* the data are now split on follow-up time
stsplit timecat , at(2(2)8)
gen died=_d
gen risktime=_t-_t0
collapse (sum) risktime died , ///
    by(timecat sex invasion ecells ulcerat)
* and save the multi-way table
save e:\kurser\survival\mmtable.dta
use e:\kurser\survival\mmtable.dta
* fit the corresponding
* poisson regression model
xi:poisson died ///
    i.sex i.invasion i.ecells i.ulcerat ///
    i.timecat , exposure(risktime) irr
Apart from the baseline hazard rate the two models are
identical and both give results as rate ratios.
Selected Output
Cox regression -- no ties
No. of subjects =          205             Number of obs =     205
No. of failures =           57
Time at risk    =  1208.279261
                                           LR chi2(5)    =   44.51
                                           Prob > chi2   =  0.0000
--------------------------------------------------------------
          _t | Haz. Ratio  Std. Err.     z   P>|z|   [95% Conf. Int]
-------------+------------------------------------------------
     _Isex_1 |   1.87870   .509345   2.33   0.020   1.10429   3.19618
_Iinvasion_1 |   2.14216   .711768   2.29   0.022   1.11693   4.10845
_Iinvasion_2 |   2.78566   1.09658   2.60   0.009   1.28781   6.02569
  _Iecells_1 |   1.79241   .547121   1.91   0.056   .985399    3.2603
 _Iulcerat_1 |   2.75137    .88215   3.16   0.002    1.4677   5.15780
--------------------------------------------------------------
Poisson regression                         Number of obs =     109
                                           LR chi2(9)    =   50.08
                                           Prob > chi2   =  0.0000
                                           Pseudo R2     =  0.2222
--------------------------------------------------------------
        died |       IRR  Std. Err.     z   P>|z|   [95% Conf. Int]
-------------+------------------------------------------------
     _Isex_1 |   1.85962   .504960   2.28   0.022   1.09217   3.16635
_Iinvasion_1 |    2.1712   .719682   2.34   0.019   1.13382   4.15761
_Iinvasion_2 |   2.71461   1.06812   2.54   0.011   1.25541   5.86988
  _Iecells_1 |   1.81955   .555404   1.96   0.050   1.00033    3.3097
 _Iulcerat_1 |   2.76710   .885904   3.18   0.001   1.47743   5.18254
 _Itimecat_2 |   1.88015   .638056   1.86   0.063   .966772   3.65645
 _Itimecat_4 |   1.79353   .668697   1.57   0.117   .863671   3.72451
 _Itimecat_6 |   1.27918   .662529   0.48   0.635   .463521   3.53018
 _Itimecat_8 |   .613017   .462909  -0.65   0.517   .139541    2.6930
    risktime | (exposure)
--------------------------------------------------------------
The ratio between corresponding estimates varies
between 0.974 and 1.015, so estimated rate ratios are
indeed very similar in the two models. This is not
surprising since a piecewise constant hazard rate based
on 5 time intervals is rather flexible and it is therefore
possible to approximate the shape of a wide range of
baseline hazard rate functions.
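The quoted range can be verified from the two output tables. The short Python sketch below divides each Poisson rate ratio by the corresponding Cox hazard ratio; the dictionary names are assumptions for this illustration and the numbers are copied from the selected output.

```python
# Hazard ratios from the Cox model and rate ratios from the Poisson model
cox = {"sex": 1.87870, "invasion_1": 2.14216, "invasion_2": 2.78566,
       "ecells": 1.79241, "ulcerat": 2.75137}
poisson = {"sex": 1.85962, "invasion_1": 2.1712, "invasion_2": 2.71461,
           "ecells": 1.81955, "ulcerat": 2.76710}

# Ratio of corresponding estimates (Poisson divided by Cox)
ratios = {name: poisson[name] / cox[name] for name in cox}

for name, ratio in ratios.items():
    print(f"{name:11s} {ratio:.4f}")
print(f"range: {min(ratios.values()):.4f} to {max(ratios.values()):.4f}")
```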
USING POPULATION MORTALITY RATES IN
THE ANALYSIS
Main types of problems
Comparison of mortality (or survival) in a study group
with that of an external reference population for which
the mortality is known from e.g. published life tables.
Comparison of the excess mortality (relative to an
external reference group) found in two or several
subgroups of a study.
First carefully consider:
Why introduce an external reference population? Is it
really necessary or just a "convenient" way to correct for
age or sex?
Also consider:
Which external reference population should be
used? The whole country? The county? The
individuals in the work force? etc.
Which endpoint? All causes of death or specific
causes that are expected to be particularly
relevant?
Here mainly a discussion of "how to do it" without
taking a random sample from the background
population.
The statistical methods which include the mortality of the
background population can roughly be divided into two
groups:
RELATIVE SURVIVAL
The statistical methods in this group include:
The expected survival curve
"Crude", "corrected" and "relative" survival
Excess mortality parameters usually describe
additive effects on the mortality rate.
RELATIVE MORTALITY
The statistical methods in this group include:
The expected number of deaths
The person-year method
Standardized mortality ratios (SMR)
Poisson regression with external rates
Excess mortality parameters usually describe
multiplicative effects on the mortality rate.
FIRST:
What kind of information is available about the mortality
of the "normal" population? - and how can it be utilized?
NATIONAL LIFE TABLES AND MORTALITY
STATISTICS
Sources:
Most countries regularly (typically once a year) publish
a cross-sectional population life table. A standard layout
and terminology are used.
In Denmark:
Publications from The National Bureau of Statistics
(Danmarks Statistik), including "Statistisk Årbog" and
"Befolkningens bevægelser", contain life tables for the
Danish population based on one-year or five-year
periods for each sex and single-year age intervals from
0 to 99 years.
Life tables since 1981 can be found on the website
http://www.statistikbanken.dk/
which also gives access to other tables with mortality
statistics; select the link to Population and elections
(Befolkning og valg).
A typical life table is included on the last page.
Sundhedsstyrelsen publishes information on cause of
death (based on the death certificates) each year in
"Dødsårsagerne i Danmark". Cancer incidence rates
are available from Kræftens Bekæmpelse.
THE COLUMNS OF THE LIFE TABLE
For each sex and single-year age intervals from 0 to 99
years:
Age-specific mortality proportion (Aldersklassens
dødshyppighed):
$q_x = q(x)$: the probability of dying at the age of
x years given alive on the x-year birthday.
The table gives $100000 \cdot q_x$
Survival function (Overlevende):
$S_x = S(x)$: the probability for a new-born of surviving
until the x-year birthday.
The table gives $100000 \cdot S_x$
Expected remaining lifetime (Middellevetid):
$e_x = e(x)$: the expected remaining lifetime for an
x-year-old from the x-year birthday.
Interrelationships:
$$q_x = \frac{S_x - S_{x+1}}{S_x} = 1 - \frac{S_{x+1}}{S_x} = 1 - p_x$$
$$S_{x+1} = S_x \cdot p_x$$
$$S_x = p_0 \cdot p_1 \cdot p_2 \cdots p_{x-1}$$
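These relationships between the life table columns can be illustrated with a few lines of Python. The mortality proportions below are hypothetical values chosen for the example and are not taken from any published life table; the function name `survival_from_q` is likewise an assumption.

```python
def survival_from_q(q):
    """Build the survival column from mortality proportions.

    Uses S_0 = 1 and S_{x+1} = S_x * p_x with p_x = 1 - q_x.
    """
    surv = [1.0]
    for qx in q:
        surv.append(surv[-1] * (1.0 - qx))
    return surv

# Hypothetical mortality proportions for ages 0, 1 and 2
q = [0.005, 0.0004, 0.0003]
S = survival_from_q(q)

# Recover q_x from the survival column: q_x = (S_x - S_{x+1}) / S_x
q_back = [(S[x] - S[x + 1]) / S[x] for x in range(len(q))]

for x, sx in enumerate(S):
    print(f"S_{x} = {sx:.6f}")
```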
COMPUTING MORTALITY RATES
FROM THE LIFE TABLE
If the national mortality rate is piecewise constant on
1-year intervals, i.e. $\lambda(x) = \lambda_x$ for x in 1-year intervals, the
following relation is true
$$q_x = 1 - \frac{S_{x+1}}{S_x} = 1 - \exp(-\lambda_x)$$
The (total) mortality rate can therefore be obtained from
the first or the second column of the life table as
$$\lambda_x = -\ln\left(\frac{S_{x+1}}{S_x}\right) = -\ln(1 - q_x)$$
Notes
The age-specific mortality proportion $q_x$ is a probability
and has no dimension, whereas the mortality rate $\lambda_x$ has
dimension per time unit.
In epidemiology both are often called the
mortality rate.
The mortality rate is always numerically larger than the
corresponding age-specific mortality proportion, but
apart from extremely old ages the discrepancy is very
small.
The plots below show the ratio $\lambda_x / q_x$ plotted against the
proportion $q_x$ and against age for each sex for the
2000-01 Danish life table.
The plots indicate that $\lambda_x \approx q_x$ is essentially correct for
ages below 60 and that an improved approximation can
be obtained as
$$\lambda_x \approx q_x\,(1 + q_x/2).$$
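A small Python sketch (not part of the original Stata material) compares the exact relation $\lambda_x = -\ln(1 - q_x)$ with the approximation $q_x(1 + q_x/2)$. The values of q are arbitrary illustrative choices, and the function names are hypothetical.

```python
import math


def rate_from_q(q):
    """Exact rate under a piecewise-constant mortality rate: -ln(1 - q)."""
    return -math.log(1.0 - q)


def rate_approx(q):
    """Second-order approximation q * (1 + q/2)."""
    return q * (1.0 + q / 2.0)


for q in (0.001, 0.01, 0.1, 0.3):
    exact = rate_from_q(q)
    print(f"q={q:<6} exact={exact:.5f} ratio={exact / q:.4f} approx={rate_approx(q):.5f}")
```

The approximation is simply the first two terms of the series $-\ln(1-q) = q + q^2/2 + q^3/3 + \dots$, so it only starts to drift for the large proportions seen at very old ages.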
CAUSE-SPECIFIC MORTALITY
Simple and reasonably accurate estimates of
cause-specific mortality rates can be derived from the
relation
$$\lambda_{CS}(x) = \pi_{CS}(x)\,\lambda(x)$$
where $\lambda_{CS}(x)$ is the cause-specific mortality rate, $\pi_{CS}(x)$
the proportion of deaths from the specified cause at age
x, and $\lambda(x)$ the total mortality rate.
Estimation of the total mortality rate $\lambda(x)$ has already
been described.
For each sex and age in 5-year intervals the proportion
$\pi_{CS}(x)$ can be estimated from tables of number of
deaths by cause published each year in "Causes of
death in Denmark" (Dødsårsagerne i Danmark), or on
the website mentioned above, as
$$\hat\pi_{CS}(x) = \frac{\text{no. of cause-specific deaths in age interval}}{\text{total no. of deaths in age interval}},$$
where the "age interval" refers to the five-year age
interval containing age x.
In each 5-year interval the total mortality rates (one for
each of the 5 years) are then multiplied by this estimate
to give the corresponding cause-specific mortality rates.
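The calculation for one 5-year interval can be sketched in Python as follows. This is an illustration only, not part of the original material: the rates and death counts are invented, and `cause_specific_rates` is a hypothetical helper name.

```python
def cause_specific_rates(total_rates, cause_deaths, all_deaths):
    """Scale each single-year total rate in a 5-year interval by the
    interval's proportion of deaths due to the specified cause."""
    proportion = cause_deaths / all_deaths
    return [proportion * rate for rate in total_rates]


# Hypothetical inputs for the 60-64 interval (illustrative numbers only).
total_rates = [0.010, 0.011, 0.012, 0.013, 0.014]  # per year, ages 60..64
cs_rates = cause_specific_rates(total_rates, cause_deaths=300, all_deaths=1200)
print(cs_rates)
```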
THE EXPECTED SURVIVAL CURVE,
RELATIVE SURVIVAL AND CORRECTED SURVIVAL
The expected survival curve:
Typical area of application: A clinical follow-up study.
Here: classical version based on grouped survival times
Follow-up   Alive at start   During the year of follow-up   Modified number
year        of year          dead          censored          at risk
1. year     Y0               d1            c1                n1
2. year     Y1               d2            c2                n2
3. year     Y2               d3            c3                n3
...         ...              ...           ...               ...
The modified number at risk is obtained as
$$n_i = Y_{i-1} - c_i/2$$
For each follow-up year the mortality proportion is
estimated by
$$\tilde q_i = d_i / n_i$$
and the corresponding (conditional) survival proportion
is
$$\tilde p_i = 1 - \tilde q_i$$
The usual life table estimate of the survival function,
i.e. of the probability of surviving until the end of period i, is
$$\tilde S_i = \tilde p_1\,\tilde p_2 \cdots \tilde p_i.$$
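A minimal Python sketch of the life table estimate is given here as an illustration. It is not part of the original course material; the grouped counts are invented and `life_table_estimate` is a hypothetical name.

```python
def life_table_estimate(alive_at_start, deaths, censored):
    """Return (q, p, S) per follow-up year using n_i = Y_{i-1} - c_i / 2."""
    q, p, surv = [], [], []
    running = 1.0
    for y, d, c in zip(alive_at_start, deaths, censored):
        n = y - c / 2.0        # modified number at risk
        qi = d / n             # mortality proportion
        running *= 1.0 - qi    # cumulative product p_1 * ... * p_i
        q.append(qi)
        p.append(1.0 - qi)
        surv.append(running)
    return q, p, surv


# Hypothetical grouped data for three follow-up years.
Y = [100, 80, 62]   # alive at start of each year: Y_0, Y_1, Y_2
d = [15, 12, 8]     # deaths during the year
c = [5, 6, 4]       # censored during the year
q, p, S = life_table_estimate(Y, d, c)
print(q, p, S)
```

Subtracting half the censored individuals reflects the usual assumption that censoring occurs, on average, midway through the interval.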
Computation of the corresponding "expected"
survival curve involves the following steps:
First follow-up year:
Consider the $Y_0$ individuals alive at the start of the year.
For $j = 1, 2, \ldots, Y_0$ let
$p^*_{1j}$ = the probability, according to the published life
table, of surviving one year for an individual of
the same sex and age as individual j.
The average expected survival probability for the first
year:
$$p^*_1 = \frac{1}{Y_0} \sum_{j=1}^{Y_0} p^*_{1j}$$
Second follow-up year:
Consider the $Y_1$ individuals alive at the start of the year.
For $j = 1, 2, \ldots, Y_1$ let
$p^*_{2j}$ = the probability, according to the published life
table, of surviving one year for an individual of
the same sex and age as individual j.
The average expected survival probability for the second
year:
$$p^*_2 = \frac{1}{Y_1} \sum_{j=1}^{Y_1} p^*_{2j}$$
For each of the following years of follow-up, average
expected survival probabilities $p^*_3$, $p^*_4$, etc. are computed
similarly.
After i years of follow-up the expected survival curve
takes the value:
$$S^*_i = p^*_1\,p^*_2 \cdots p^*_i$$
The corrected survival curve is defined as the ratio of
the estimated (crude) survival to the expected survival:
$$S^C_i = \tilde S_i / S^*_i.$$
Note that the corrected survival curve is not necessarily
decreasing!
For each follow-up interval the relative survival is
defined as the ratio of the estimated conditional survival
probability to the corresponding conditional expected
survival probability:
$$r_i = \tilde p_i / p^*_i.$$
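The steps for the expected, corrected and relative survival can be sketched in Python. This is an illustration under stated assumptions, not the locally developed program mentioned in these notes: the individual life-table probabilities and the observed proportions are invented, and `expected_survival` is a hypothetical name.

```python
def expected_survival(yearly_probs):
    """yearly_probs[i] holds the life-table one-year survival probabilities
    of the individuals alive at the start of follow-up year i+1."""
    p_star, s_star, running = [], [], 1.0
    for probs in yearly_probs:
        avg = sum(probs) / len(probs)   # average expected probability p*_i
        running *= avg                  # S*_i = p*_1 * ... * p*_i
        p_star.append(avg)
        s_star.append(running)
    return p_star, s_star


# Hypothetical inputs: 4 patients at the start of year 1, 3 at year 2.
yearly_probs = [[0.99, 0.98, 0.97, 0.96], [0.98, 0.97, 0.95]]
p_obs = [0.80, 0.90]   # observed conditional survival proportions
S_obs = [0.80, 0.72]   # observed life-table survival (0.80 * 0.90)

p_star, S_star = expected_survival(yearly_probs)
relative = [po / pe for po, pe in zip(p_obs, p_star)]    # r_i
corrected = [so / se for so, se in zip(S_obs, S_star)]   # S^C_i
print(relative, corrected)
```

Because both the observed and the expected curves are products of yearly factors, the corrected survival after i years equals the product of the relative survivals $r_1 \cdots r_i$.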
Software for calculation of expected survival curves
To my knowledge none of the commercial statistical
software packages is able to compute expected
survival, corrected survival and relative survival, but
several public-domain products are available. See e.g.
http://www.cancerregistry.fi/surv2/
A locally developed PASCAL program is available from
the Department of Biostatistics.
THE STATISTICAL MODEL BEHIND EXPECTED
SURVIVAL AND CORRECTED SURVIVAL
The mortality rate for patient j at time t is the sum of two
terms: the background mortality for a person of the
same age and sex and the excess mortality "caused"
by the disease in question:
$$\lambda_j(t) = \lambda^*(\text{age}, \text{sex}) + \alpha(t),$$
where $\lambda^*(\text{age}, \text{sex})$ is the population mortality rate and
$\alpha(t)$ the excess mortality rate.
Let
$$A_i = A(t_i) = \int_0^{t_i} \alpha(t)\,dt$$
Then
$S^*_i$ is an estimate of $\exp(-\Lambda^*_i)$,
$\tilde S_i$ is an estimate of $\exp(-\Lambda^*_i - A_i)$.
The corrected survival $S^C_i = \tilde S_i / S^*_i$ can therefore be
viewed as an estimate of
$$\exp(-A_i) = \exp\left(-\int_0^{t_i} \alpha(t)\,dt\right),$$
the survival function corresponding to the excess
mortality rate $\alpha(t)$.
RELATIVE MORTALITY,
THE PERSON-YEAR METHOD
Notation:
$\lambda(t)$: mortality rate in the study group
$\lambda^*(t)$: mortality rate in the reference population
Survival function and integrated mortality rate in the
reference population are denoted $S^*(t)$ and $\Lambda^*(t)$.
A simple statistical model:
Assume that the mortality rate in the study group is
proportional to that of the reference group:
$$\lambda(t) = \theta\,\lambda^*(t)$$
Two situations:
1. Age a is chosen as the underlying time t. With t = a
the model becomes
$$\lambda(a, \text{sex}) = \theta\,\lambda^*(a, \text{sex})$$
2. Follow-up time is chosen as the underlying time
scale. If e denotes the age at entry, the model becomes
$$\lambda(t, e, \text{sex}) = \theta\,\lambda^*(t + e, \text{sex})$$
The dependence on sex is suppressed below.
The parameter $\theta$ is the mortality rate ratio or the
relative mortality. In epidemiology the estimate of $\theta$ is
usually called the standardized mortality ratio (SMR).
If $\theta > 1$ ($\theta < 1$) the mortality in the study group is higher
(lower) than the mortality in the reference population.
Generalizations: The relative mortality may depend on
e.g. sex, age-at-entry, follow-up time or risk factors,
which are known for each individual.
Estimation of the relative mortality
Data:
A record for each individual with:
Age at entry in the study, $t_{ENTRY}$
Age at exit from the study, $t_{EXIT}$
Status at exit (dead or alive).
The maximum likelihood estimate of $\theta$ becomes
$$\hat\theta = \frac{D}{\sum_{\text{persons}} \left[\Lambda^*(t_{EXIT}) - \Lambda^*(t_{ENTRY})\right]} = \frac{D}{E}$$
The numerator D:
D = the observed number of deaths during follow-up.
The denominator E:
E = the expected number of deaths during follow-up.
This is a convenient terminology, but not quite
correct. E is rather the number of deaths to be expected
with the observed follow-up times.
Confidence intervals for the relative mortality
An approximate 95% confidence interval for $\theta$ is
obtained by using
$$SE(\ln(\hat\theta)) = 1/\sqrt{D}$$
A symmetric confidence interval for $\ln(\theta)$ is transformed
back to an asymmetric confidence interval for $\theta$:
$$\hat\theta / c \le \theta \le \hat\theta \cdot c,$$
where
$$c = \exp\left(1.96/\sqrt{D}\right)$$
Null hypothesis:
The mortality in the study group is identical to the
mortality in the reference population, i.e. $\theta = 1$.
The expected value of D - E is 0 under the null hypothesis,
$$E(D - E) = 0,$$
and Var(D - E) can be estimated by E.
These results lead to the following test statistic
$$X^2 = \frac{(D - E)^2}{E}$$
which under the null hypothesis is approximately a $\chi^2$
variate on 1 degree of freedom. Large values provide
evidence against the null hypothesis.
THE EXPECTED NUMBER OF DEATHS
Above, the expected number of deaths was
computed as the sum of individual contributions of the
form
$$\Lambda^*(t_{EXIT}) - \Lambda^*(t_{ENTRY})$$
Since the mortality rate is assumed constant on a
number of age intervals (typically 1-year or 5-year
intervals) we have
$$\Lambda^*(t_{EXIT}) - \Lambda^*(t_{ENTRY}) = \int_{t_{ENTRY}}^{t_{EXIT}} \lambda^*(s)\,ds = \sum_x \lambda^*_x\,t_x,$$
where $t_x$ is the time the individual spends in the age
category x, i.e. the individual's contribution to the
time-at-risk in age category x. Often it is simpler to
calculate the expected number of deaths by first
computing the total time-at-risk in each age category,
multiplying by the age-specific mortality rate, and summing
the contributions from each age category, i.e.
$$E = \sum_{\text{persons}} \left[\Lambda^*(t_{EXIT}) - \Lambda^*(t_{ENTRY})\right] = \sum_{\text{persons}} \sum_x \lambda^*_x\,t_x = \sum_x \lambda^*_x\,PYR_x$$
where $PYR_x$ is the person-years at risk in age category
x.
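The person-years calculation of E can be sketched in Python. This is an illustrative sketch, not part of the original Stata material: the reference rates and the two follow-up records are invented, and both function names are hypothetical.

```python
def time_in_age_bands(entry_age, exit_age, band_width=1):
    """Split one person's follow-up [entry_age, exit_age) into age bands.
    Returns {band lower bound: years at risk in that band}."""
    out = {}
    age = entry_age
    while age < exit_age:
        lower = int(age // band_width) * band_width
        upper = min(lower + band_width, exit_age)
        out[lower] = out.get(lower, 0.0) + (upper - age)
        age = upper
    return out


def expected_deaths(persons, rates, band_width=1):
    """E = sum over age bands of rate_x * PYR_x; also returns PYR_x."""
    pyr = {}
    for entry_age, exit_age in persons:
        bands = time_in_age_bands(entry_age, exit_age, band_width)
        for band, t in bands.items():
            pyr[band] = pyr.get(band, 0.0) + t
    return sum(rates[band] * t for band, t in pyr.items()), pyr


# Hypothetical reference rates (per year) and two follow-up records
# given as (age at entry, age at exit).
rates = {30: 0.001, 31: 0.002, 32: 0.003}
persons = [(30.5, 32.25), (31.0, 32.0)]
E, pyr = expected_deaths(persons, rates)
print(E, pyr)
```

Aggregating the time-at-risk first means that the reference rates are applied once per age category rather than once per person, which is the second of the two equivalent ways of computing E.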
The following figure illustrates the two different ways of
calculating E.
[Figure: age-specific mortality rates $\lambda^*_{29}, \ldots, \lambda^*_{34}$ drawn along an AGE axis.]
The "expected" number of deaths E depends on the
survival times and is therefore a random variable (i.e.
subject to random variation) and not really an expected
number (i.e. a constant).
If the mortality in the study group is identical to the
mortality in the reference population, i.e. if $\theta = 1$, one
may show that
$$E(E) = E(D) = \text{expected number of deaths},$$
i.e. on average the "expected" number of deaths is
equal to the expected number of deaths!
Example:
Mortality for patients diagnosed with manic-depressive
psychosis (Weeke, Juel & Væth, J. Affective Disorders
1987; 13: 287-292).
Data:
Patients admitted to a psychiatric hospital for the first
time in the period April 1, 1970 - March 31, 1972 and
followed until March 31, 1977.
Number of patients N = 2168.
17 patients were lost to follow-up due to emigration and
were censored on date of emigration.
Results:
               Observed   "Expected"   Relative    X2
                                       mortality
All patients      309       176.55       1.75      99.4
Males             159        73.34       2.17     100.0
Females           150       103.21       1.45      21.2

95% confidence intervals for the relative mortality:
               Method above    "exact" Poisson
All patients   1.57 - 1.97     1.56 - 1.96
Males          1.86 - 2.53     1.84 - 2.53
Females        1.24 - 1.71     1.24 - 1.71
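The relative mortality, the approximate confidence limits and the test statistic can be recomputed from the observed and "expected" counts. The Python sketch below does this for the two sexes; it is not part of the original material, `smr_summary` is a hypothetical name, and recomputed limits may differ from the printed table in the last digit because of rounding.

```python
import math


def smr_summary(observed, expected, z=1.96):
    """SMR = D/E, approximate CI from SE(ln SMR) = 1/sqrt(D), and X^2."""
    smr = observed / expected
    c = math.exp(z / math.sqrt(observed))
    x2 = (observed - expected) ** 2 / expected
    return smr, smr / c, smr * c, x2


# Observed and "expected" deaths as reported in the results table.
for label, D, E in [("Males", 159, 73.34), ("Females", 150, 103.21)]:
    smr, lo, hi, x2 = smr_summary(D, E)
    print(f"{label:8s} SMR={smr:.2f}  95% CI {lo:.2f} - {hi:.2f}  X2={x2:.1f}")
```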
The results above are often derived from a Poisson
model for the observed number of deaths, assuming that
the "expected" number E is fixed. Confidence intervals
based on this Poisson model are referred to as "exact"
confidence intervals.
Here we can compare the excess mortality among men
with the excess mortality among women.
Null hypothesis: The relative mortality does not depend
on the sex of the patient: $\theta_m = \theta_f$.
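Test statistic:
Under the null hypothesis the following test statistic is
approximately distributed as a $\chi^2$ variate on 1 degree of
freedom
$$W = \frac{\left[\ln(\hat\theta_m) - \ln(\hat\theta_f)\right]^2}{D_m^{-1} + D_f^{-1}}$$
We find
$$W = \frac{\left[\ln(2.17) - \ln(1.45)\right]^2}{159^{-1} + 150^{-1}} = 12.35$$
giving p = 0.00044.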
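The comparison of the two sexes can be recomputed from the observed and "expected" counts in the results table. The Python sketch below is an illustration, not part of the original material, and `compare_smrs` is a hypothetical name. It uses the unrounded relative mortalities D/E; inserting the rounded values 2.17 and 1.45 gives a slightly larger statistic than the reported 12.35.

```python
import math


def compare_smrs(d1, e1, d2, e2):
    """Wald statistic for equal relative mortality in two groups and its
    p-value from the chi-square distribution on 1 degree of freedom."""
    diff = math.log(d1 / e1) - math.log(d2 / e2)
    w = diff ** 2 / (1.0 / d1 + 1.0 / d2)
    p = math.erfc(math.sqrt(w / 2.0))   # P(chi2 on 1 df > w)
    return w, p


w, p = compare_smrs(159, 73.34, 150, 103.21)
print(f"W = {w:.2f}, p = {p:.5f}")
```

The p-value uses the identity $P(\chi^2_1 > w) = \mathrm{erfc}(\sqrt{w/2})$, so no statistics library is needed.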
USING STATA TO COMPUTE STANDARDIZED
MORTALITY RATIOS
The STATA command stptime can compute the SMR
relative to a set of reference rates read from a separate
file.
Example: Diabetes mortality data
The STATA file diabetes.dta contains data on mortality
for patients with diabetes from Green & Hougaard,
Diabetologia 1984; 26: 190-194, see Exercise 8 for a
variable description.
Here we compare the mortality in this group with the
mortality in the general population represented by a life
table based on the calendar years 1976-1980.
For simplicity, 10-year age intervals are used in the rate
file. The file kvrater7680-10.dta contains the following
female mortality rates (per 1000 years):
agecat     rate
     0     .461
    10     .125
    20     .205
    30     .417
    40    1.204
    50    2.757
    60    6.128
    70   16.824
    80   50.955
Note that agecat gives the lower bound of the age
interval.
The rates are computed from the life table as
$$\text{rate}_x = -1000 \cdot \ln\left[S^*_f(x+10) / S^*_f(x)\right] / 10$$

A similar file, marater7680-10.dta contains the mortality
rates for males. Both files must be sorted on agecat
before saving them.
Note: stptime only allows one set of rates, so a
combined analysis is not possible unless the same set
of rates is applied to both men and women.
The following commands produce expected number of
deaths and SMR for each sex separately.
* defining the survival time data with
* age as time scale
gen exitage=entryage+futime/365.25
stset exitage , ///
   failure(status==1) entry(time entryage) ///
   id(id) noshow
* calculations for females
stptime if(sex==0) , ///
   smr(agecat rate) ///
   using(E:\kurser\survival\kvrater7680-10.dta) ///
   at(30(10)80) trim per(1000)
* calculations for males
stptime if(sex==1) , ///
   smr(agecat rate) ///
   using(E:\kurser\survival\marater7680-10.dta) ///
   at(30(10)80) trim per(1000)
The option trim specifies that follow-up time at ages less
than 30 or greater than 80 (the limits given in at()) is to be
excluded from the computations.
Output
. stptime if(sex==0) , smr(agecat rate) using(E:\kurser\survival\aarhus2003\data\kv
> rater7680-10.dta) at(30(10)80) trim per(1000)

          |              observed   expected
   Cohort | person-time  failures   failures      SMR    [95% Conf. Interval]
----------+------------------------------------------------------------------
(30 - 40] |   646.53937        11     .26951   40.815    22.6032     73.6995
(40 - 50] |   676.54623         8    .814799   9.8184    4.91015     19.6329
(50 - 60] |   787.27589        22    2.17029   10.137    6.67464     15.3951
(60 - 70] |      959.54        55    5.87991   9.3539    7.18152     12.1834
(70 - 80] |   723.50156        87    12.1722   7.1475    5.79286     8.81881
----------+------------------------------------------------------------------
    total |    3793.403       183    21.3067   8.5889    7.43041     9.92792

. stptime if(sex==1) , smr(agecat rate) using(E:\kurser\survival\aarhus2003\data\ma
> rater7680-10.dta) at(30(10)80) trim per(1000)

          |              observed   expected
   Cohort | person-time  failures   failures      SMR    [95% Conf. Interval]
----------+------------------------------------------------------------------
(30 - 40] |   954.56259        17    .652274   26.063    16.2021     41.9243
(40 - 50] |   957.03772        22     1.6278   13.515    8.89909     20.5258
(50 - 60] |   970.16295        37    4.54827    8.135    5.89412     11.2277
(60 - 70] |    800.0821        63    9.53987   6.6039    5.15889     8.45355
(70 - 80] |   462.71036        82     13.747   5.9649    4.80402     7.40634
----------+------------------------------------------------------------------
    total |   4144.5557       221    30.1153   7.3385    6.43202     8.37266
For both women and men the mortality is considerably
higher than the mortality in the general population.
The SMR is slightly larger for women, but a clear trend
with age is seen in both sexes, so the overall SMR is
less relevant.
The command strate also has options for simple
comparisons with external rates.
POISSON REGRESSION WITH EXTERNAL
REFERENCE RATES USING STATA
A computation of a standardized mortality ratio for a
group of individuals or patients is a rather crude
comparison with the mortality in a reference population.
Often further insight can be gained by studying how the
relative mortality depends on a number of covariates.
Such models, SMR regression models, are conveniently
expressed as a Poisson regression model for the
aggregated data. The relevant parameters are estimated
by choosing the expected number of deaths as
time-at-risk. The following example illustrates the approach
using STATA.
Example: Diabetes mortality data
The data in diabetes.dta from Green & Hougaard (1984)
are first split on 5-year age categories and collapsed into a
multi-way event-time table with sex, agecat, and dxacat
(age at diagnosis) as classifying factors (output omitted).
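egen dxacat=cut(dxage) , at(0,20,40,60,120)
gen exitage=entryage+futime/365.25
stset exitage , ///
   failure(status==1) entry(time entryage) id(id) noshow
* assumed step, not shown on the original slide:
* split follow-up on 5-year age categories
stsplit agecat , at(0(5)100)
gen died=_d
gen risktime=_t-_t0
collapse (sum) died risktime , by(sex agecat dxacat)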
save e:\kurser\survival\diabetes-coll-agecat2.dta
The national mortality rates (per 1000 years) on 20
five-year intervals for each sex are placed in the file
mort7680.dta. The file contains data in variables
sexage and mrate, where sexage takes the values from
1 to 40:
sex      agecat   sexage
female   0-4      1
female   5-9      2
female   10-14    3
etc.
male     90-94    39
male     95-99    40
Apart from now using 5-year intervals, mrate is
computed from the life table as before. The file
mort7680.dta must be sorted on sexage before it is
saved.
The reference rates are now appended to the multi-way
event time table using the commands
use e:\kurser\survival\diabetes-coll-agecat2.dta
egen sexage=group(sex agecat)
sort sexage
merge sexage ///
   using e:\kurser\survival\mort7680.dta
save e:\kurser\survival\diabetes-coll-agecat3.dta
We have now added a column, mrate, to the file.
The new column contains the reference rate (per 1000
years) in the appropriate sex and age category.
In a Poisson regression model the number of events in a
cell of the multi-way table is treated as a Poisson variate
with mean rate × risktime (see page 19).
The present table has sex, agecat and dxacat as
classifying factors and a total of 93 non-empty entries
with event, risktime and mrate in each cell.
A Poisson regression model with external reference
rates specifies a multiplicative structure for the mortality rate
$\lambda_{ijk}$ in the cell given by sex=i, agecat=j, dxacat=k (i = 0,1,
j = 0,5,..,95, k = 0,20,40,60)
$$\lambda_{ijk} = \theta_{ijk}\,\lambda^*_{ij},$$
where $\theta_{ijk}$ is the relative mortality in the i,j,k-cell and $\lambda^*_{ij}$
is the sex- and age-specific reference rate.
The model therefore specifies that the number of events
in the i,j,k-cell has mean
$$\text{rate} \cdot \text{risktime}_{ijk} = \theta_{ijk}\,\lambda^*_{ij}\,\text{risktime}_{ijk} = \theta_{ijk}\,E_{ijk},$$
where $E_{ijk}$ is the expected number of deaths in the cell
according to the reference rates.
If we use $E_{ijk}$ instead of risktime in the Poisson
regression we have a regression model for the
relative mortality and may fit models like e.g.
$$\theta_{ijk} = \theta_{000}\,a_i\,b_j\,c_k$$
A couple of examples illustrate the possibilities.
The irr option is not used, since the constant term is not
displayed if this option is used. Rather inconvenient.
gen expected = risktime*mrate/1000
xi: poisson died , exposure(expected) nolog
Output
Poisson regression                       Number of obs =       93
                                         LR chi2(0)    =     0.00
                                         Prob > chi2   =        .
                                         Pseudo R2     =   0.0000
------------------------------------------------------------------
    died |    Coef.   Std. Err.      z    P>|z|  [95% Conf. Interval]
---------+--------------------------------------------------------
   _cons | 1.911351   .0451294   42.35   0.000   1.82290    1.99980
expected | (exposure)
------------------------------------------------------------------
The coefficient is equal to ln(SMR), so
SMR = exp(1.911351) = 6.762
A similar calculation gives the 95% confidence interval
for the SMR:
Lower limit = exp(1.82290) = 6.190
Upper limit = exp(1.99980) = 7.388
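The back-transformation from the log scale can be checked with a few lines of Python. This is only an arithmetic illustration outside Stata, using the coefficient and standard error from the output above; `smr_from_log` is a hypothetical name.

```python
import math


def smr_from_log(coef, se, z=1.96):
    """Back-transform a log-SMR estimate and its Wald interval."""
    return (math.exp(coef),
            math.exp(coef - z * se),
            math.exp(coef + z * se))


smr, lower, upper = smr_from_log(1.911351, 0.0451294)
print(f"SMR = {smr:.3f}, 95% CI {lower:.3f} - {upper:.3f}")
```

The interval is symmetric on the log scale and therefore asymmetric around the SMR itself.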
Next see if the relative mortality depends on sex

xi: poisson died i.sex , exposure(expected) nolog
Output
Poisson regression                       Number of obs =       93
                                         LR chi2(1)    =     0.64
                                         Prob > chi2   =   0.4236
                                         Pseudo R2     =   0.0014
------------------------------------------------------------------
    died |    Coef.   Std. Err.      z    P>|z|  [95% Conf. Interval]
---------+--------------------------------------------------------
 _Isex_1 |  .072242   .0903129    0.80   0.424  -.104768    .249252
   _cons | 1.874631    .064957   28.86   0.000  1.747318   2.001945
expected | (exposure)
------------------------------------------------------------------
The constant is ln(SMR) for females and the coefficient
for sex is $\ln(SMR_m / SMR_f)$. The SMRs for males and
females are not significantly different (the ln(ratio) is
close to 0).
We find
$$SMR_f = \exp(1.874631) = 6.518$$
$$SMR_m / SMR_f = \exp(0.072242) = 1.075$$
and therefore
$$SMR_m = SMR_f \cdot (SMR_m / SMR_f) = 6.518 \cdot 1.075 = 7.007$$
Confidence limits can be computed similarly.
Finally see if the relative mortality depends on
age-at-diagnosis
xi: poisson died i.sex i.dxacat , ///
   exposure(expected) nolog
testparm *xa*
Output
Poisson regression                       Number of obs =       93
                                         LR chi2(4)    =    61.16
                                         Prob > chi2   =   0.0000
                                         Pseudo R2     =   0.1381
--------------------------------------------------------------------
       died |     Coef.   Std. Err.      z    P>|z|  [95% Conf. Interval]
------------+-------------------------------------------------------
    _Isex_1 | -.0495702   .0923718   -0.54   0.592  -.230616    .131475
_Idxacat_20 | -.7117244   .1615624   -4.41   0.000  -1.02838    -.39507
_Idxacat_40 | -.7557131   .1461376   -5.17   0.000  -1.04214   -.469289
_Idxacat_60 | -1.277969   .1599211   -7.99   0.000  -1.59141   -.964530
      _cons |  2.776317   .1412925   19.65   0.000   2.49939    3.05325
   expected | (exposure)
--------------------------------------------------------------------
. testparm *dx*
 ( 1)  [died]_Idxacat_20 = 0
           chi2(  3) =   64.81
         Prob > chi2 =  0.0000
The relative mortality clearly depends on age at
diagnosis, but this may well be a time-since-diagnosis
effect that is showing up here. Further analysis is
needed to uncover this. The model predicts the following
SMRs:
dxage    female    male
0-20      16.06    15.28
20-40      7.88     7.50
40-60      7.54     7.18
60-        4.47     4.26
Example of obtaining an SMR from the coefficients
$$SMR_{m,20-40} = \exp(2.7763 - 0.0496 - 0.7117) = \exp(2.015) = 7.50$$
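The predicted SMRs for every combination of sex and age at diagnosis follow from the same calculation. The Python sketch below is an illustration outside Stata: the coefficients are copied from the regression output above, while the dictionary layout and the name `predicted_smr` are hypothetical.

```python
import math

# Coefficients copied from the Poisson regression output; the reference
# categories (females, age at diagnosis 0-20) have coefficient 0.
CONS = 2.776317
SEX = {"female": 0.0, "male": -0.0495702}
DXAGE = {"0-20": 0.0, "20-40": -0.7117244, "40-60": -0.7557131, "60-": -1.277969}


def predicted_smr(sex, dxage):
    """SMR for a sex / age-at-diagnosis cell under the multiplicative model."""
    return math.exp(CONS + SEX[sex] + DXAGE[dxage])


for dx in DXAGE:
    female = predicted_smr("female", dx)
    male = predicted_smr("male", dx)
    print(f"{dx:6s} female {female:6.2f}  male {male:6.2f}")
```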
Analysis of censored survival data:
Cox regression or Poisson regression?
Time-to-event data can be analyzed with both
Cox regression and Poisson regression models.
To use Poisson regression the individual data records
must first be aggregated in an event-time table using
special software.
This table will often be considerably smaller than the
original data set and computations will therefore be
faster.
Poisson regression is mainly preferable in large
studies with relatively few covariates. Time-dependent
covariates can be defined and used when setting up the
event-time table. Several time scales are easily
accommodated.
Cox regression is mainly preferable in studies with
many covariates and if the analyses include more
exploratory aspects of working with time-dependent
covariate information, e.g. selecting the best way to
define a time-dependent covariate. Once a proper
representation is found it may be advantageous to
continue with Poisson regression.