University of Aarhus
May 11 2004
Michael Væth
Department of Biostatistics
$$\text{rate} = \frac{\text{number of events}}{\text{total time at risk}}$$
$$\lambda(t) = \lambda, \qquad S(t) = \exp(-\lambda t) = e^{-\lambda t}$$
The maximum-likelihood estimate of the constant hazard rate is
$$\hat\lambda = \frac{d}{s} = \frac{\text{number of events}}{\text{total time at risk}}$$
The standard error of $\hat\lambda$ is estimated by
$$\widehat{SE}(\hat\lambda) = \frac{\sqrt{d}}{s}$$
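An approximate 95% confidence interval for $\lambda$ is $\hat\lambda/c$ to $\hat\lambda \cdot c$, where
$$c = \exp\left(\frac{1.96}{\sqrt{d}}\right)$$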
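The three formulas above (estimate, standard error and the multiplicative 95% interval) can be illustrated with a small Python sketch. The input values d = 57 and s = 1208.279261 are borrowed from the melanoma output later in this document. The function name `constant_rate_ci` is hypothetical.

```python
import math


def constant_rate_ci(d, s, z=1.96):
    """Constant hazard: estimate, standard error and approximate 95% CI.

    d: number of events, s: total time at risk.
    """
    rate = d / s                      # lambda-hat = d / s
    se = math.sqrt(d) / s             # SE(lambda-hat) = sqrt(d) / s
    c = math.exp(z / math.sqrt(d))    # multiplicative factor from the log scale
    return rate, se, rate / c, rate * c


rate, se, lower, upper = constant_rate_ci(57, 1208.279261)
print(f"rate = {rate:.5f} per time unit, SE = {se:.5f}, 95% CI {lower:.5f} to {upper:.5f}")
```

Because the interval is built on the log scale, it is asymmetric around $\hat\lambda$ and never extends below zero.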
total | 1208.2793
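$$\lambda(t) = \begin{cases} \lambda_1, & 0 \le t < 1 \\ \lambda_2, & 1 \le t < 2 \\ \;\vdots & \\ \lambda_r, & r-1 \le t \end{cases}$$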
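The probability of surviving the j-th interval $[t_{j-1}, t_j)$, given survival to its start, is
$$p_j = \exp\{-\lambda_j (t_j - t_{j-1})\}$$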
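The following Python sketch shows how a piecewise constant hazard gives the survival function. S(t) is the product of the interval survival probabilities $p_j$, with the last interval only partly traversed. The cut points and rates in the example are hypothetical.

```python
import math


def piecewise_survival(t, cuts, rates):
    """S(t) under a piecewise constant hazard.

    cuts:  interior cut points t_1 < t_2 < ... (t_0 = 0 is implied).
    rates: lambda_1, ..., lambda_r; the last rate applies after the last cut.
    S(t) is the product of p_j = exp(-lambda_j * time spent in interval j before t).
    """
    if len(rates) != len(cuts) + 1:
        raise ValueError("need exactly one more rate than cut points")
    edges = [0.0] + list(cuts) + [math.inf]
    surv = 1.0
    for j, lam in enumerate(rates):
        start, end = edges[j], edges[j + 1]
        if t <= start:
            break
        surv *= math.exp(-lam * (min(t, end) - start))   # p_j (partial in the last interval)
    return surv


# Hypothetical rates on [0, 1), [1, 2) and [2, infinity)
print(piecewise_survival(2.5, [1, 2], [0.1, 0.2, 0.3]))
```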
Age at     Sex   Dose (Gy)   No. in   Cancer   Risk time
exposure                     1950     deaths   (in 1000 y)
0-19       M     ≥ 1            392       23        6.77
0-19       M     < 0.005       3696      105       67.33
0-19       F     ≥ 1            406       22        7.12
0-19       F     < 0.005       4221       71       79.03
20-39      M     ≥ 1            219       37        2.92
20-39      M     < 0.005       1774      218       23.93
20-39      F     ≥ 1            506       69        7.82
20-39      F     < 0.005       4272      266       71.48
40-        M     ≥ 1            431       40        1.59
40-        M     < 0.005       3355      233       13.50
40-        F     ≥ 1            376       37        2.47
40-        F     < 0.005       4052      255       27.19
Output
 group(agex sex) |      IRR     [95% Conf. Interval]    M-H Weight
-----------------+---------------------------------------------------
               1 |  2.178505    1.323522    3.445787      9.593117
               2 |  3.43935     2.029444    5.615747      5.867905
               3 |  1.390929    .9539309    1.977719      23.70801
               4 |  2.371075    1.792351    3.100524      26.23102
               5 |  1.457608    1.01505     2.045349      24.5507
               6 |  1.597253    1.099477    2.26114       21.23567
-----------------+---------------------------------------------------
           Crude |  1.955327    1.688804    2.255824
    M-H combined |  1.852352    1.607143    2.134972
-----------------+---------------------------------------------------
Test of homogeneity (M-H)    chi2(5) = 15.53    Pr>chi2 = 0.0083
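The stratum-specific rate ratios, the crude ratio and the Mantel-Haenszel combined estimate in this output can be reproduced from the LSS table in a few lines of Python. The sketch assumes the usual Mantel-Haenszel rate-ratio estimator, Σ a·T0/T divided by Σ b·T1/T. Its per-stratum denominator terms b·T1/T agree with the "M-H Weight" column. The confidence limits and the homogeneity test are not reproduced here.

```python
# (deaths exposed, risk time exposed, deaths unexposed, risk time unexposed)
# per agex-by-sex stratum, from the LSS table (time in 1000 y; units cancel).
STRATA = [
    (23, 6.77, 105, 67.33),   # agex 0-19, M
    (22, 7.12, 71, 79.03),    # agex 0-19, F
    (37, 2.92, 218, 23.93),   # agex 20-39, M
    (69, 7.82, 266, 71.48),   # agex 20-39, F
    (40, 1.59, 233, 13.50),   # agex 40-, M
    (37, 2.47, 255, 27.19),   # agex 40-, F
]


def stratum_irr(a, t1, b, t0):
    """Rate ratio exposed vs unexposed within one stratum."""
    return (a / t1) / (b / t0)


def mh_weight(a, t1, b, t0):
    """Denominator term b * T1 / T of the Mantel-Haenszel estimator."""
    return b * t1 / (t1 + t0)


def mantel_haenszel_irr(strata):
    num = sum(a * t0 / (t1 + t0) for a, t1, b, t0 in strata)
    den = sum(mh_weight(*s) for s in strata)
    return num / den


def crude_irr(strata):
    """Rate ratio after collapsing over all strata."""
    a, t1, b, t0 = (sum(s[i] for s in strata) for i in range(4))
    return (a / t1) / (b / t0)


for i, s in enumerate(STRATA, start=1):
    print(i, round(stratum_irr(*s), 6), round(mh_weight(*s), 6))
print("Crude", round(crude_irr(STRATA), 6), "M-H", round(mantel_haenszel_irr(STRATA), 6))
```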
POISSON REGRESSION
In the Poisson regression analysis the number of events in a given cell of the multi-way table is treated as a Poisson variable with mean equal to rate × risk time.
Comment: A Poisson distribution with mean $\mu$ is the limiting distribution of a binomial distribution $(n, p)$ as $n$ goes to infinity and $p$ tends to zero such that the mean $\mu = np$ stays fixed. Poisson distributions are often used to model the occurrence of random events.
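The limiting argument can be checked numerically with the Python sketch below. It keeps $np = \mu = 3$ fixed and compares the binomial and Poisson probabilities for k = 0, ..., 10 as n grows. The value μ = 3 is an arbitrary choice.

```python
import math


def binomial_pmf(k, n, p):
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def poisson_pmf(k, mu):
    return math.exp(-mu) * mu ** k / math.factorial(k)


mu = 3.0
max_diffs = []
for n in (10, 100, 10_000):
    p = mu / n                                   # keep the mean n * p fixed at mu
    diff = max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, mu)) for k in range(11))
    max_diffs.append(diff)
    print(f"n = {n:>6}: largest |binomial - Poisson| for k = 0..10 is {diff:.1e}")
```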
In a Poisson regression model the rate in a given cell is modeled as a product of factors reflecting the effects of the category levels of the variables defining the multi-way table.
Example: Cancer mortality in the LSS
The LSS data above form a 3×2×2 table with agex, sex, and dose as classifying variables.
A Poisson regression model specifies a multiplicative structure for the mortality rate $\lambda_{ijk}$ in the cell given by agex = i, sex = j, dose = k (i = 0, 1, 2; j = 0, 1; k = 0, 1).
If a reference category is chosen for each of the classifying variables (e.g. i = j = k = 0), the Poisson regression model can be written as
$$\lambda_{ijk} = \lambda_{000}\, a_i\, b_j\, c_k, \qquad a_0 = b_0 = c_0 = 1.$$
The parameters $a_i$, $b_j$, and $c_k$ are rate ratios. The parameter $a_1$ represents the rate ratio of the mortality in the second age-at-exposure category relative to the first category when controlling for sex and dose as independent risk factors.
Poisson models are usually specified as additive models for the ln(rate). Using dummy variables we have
$$\ln(\text{rate}) = \ln(\lambda_{ijk}) = \beta_0 + \sum_i \beta_i^{(1)} z_i^{(1)} + \sum_j \beta_j^{(2)} z_j^{(2)} + \sum_k \beta_k^{(3)} z_k^{(3)}$$
The constant, the parameter $\beta_0$, is the ln(rate) in the reference cell, and the other $\beta$-parameters are logarithms of rate ratios. Models with interaction terms may also be used.
Poisson regression with STATA
The following commands fit the above Poisson regression model to the LSS data in hiro7190.dta. Note that the output from the default version (the first command) gives the log-linear parameter estimates. To get rate ratios, the option irr must be added (second version).
xi:poisson cases i.sex i.agex i.dose , exposure(pyr) nolog
xi:poisson cases i.sex i.agex i.dose , exposure(pyr) irr
Output
. xi:poisson cases i.sex i.agex i.dose , exposure(pyr) nolog
i.sex             _Isex_1-2           (naturally coded; _Isex_1 omitted)
i.agex            _Iagex_1-3          (naturally coded; _Iagex_1 omitted)
i.dose            _Idose_0-1          (naturally coded; _Idose_0 omitted)

Poisson regression                          Number of obs   =        12
                                            LR chi2(4)      =   1150.09
                                            Prob > chi2     =    0.0000
Log likelihood = -47.475645                 Pseudo R2       =    0.9237

--------------------------------------------------------------------------
   cases |     Coef.   Std. Err.       z   P>|z|    [95% Conf. Int.]
---------+----------------------------------------------------------------
 _Isex_2 | -.6606352    .054719   -12.07   0.000   -.767882    -.55339
_Iagex_2 |  1.529529   .0798769    19.15   0.000    1.37297    1.68609
_Iagex_3 |   2.29477   .0796431    28.81   0.000    2.13867    2.45087
_Idose_1 |  .6165749   .0725606     8.50   0.000    .474358     .75879
   _cons | -6.357755   .0712617   -89.22   0.000   -6.49743    -6.2181
     pyr | (exposure)
--------------------------------------------------------------------------

. xi:poisson cases i.sex i.agex i.dose , exposure(pyr) irr
******************* first part as above **********************
--------------------------------------------------------------------------
   cases |       IRR   Std. Err.       z   P>|z|    [95% Conf. Int.]
---------+----------------------------------------------------------------
 _Isex_2 |  .5165231   .0282636   -12.07   0.000    .463995    .574998
_Iagex_2 |  4.616002    .368712    19.15   0.000    3.94707    5.39830
_Iagex_3 |  9.922157   .7902317    28.81   0.000    8.48816    11.5984
_Idose_1 |  1.852572   .1344237     8.50   0.000    1.60698    2.13569
     pyr | (exposure)
--------------------------------------------------------------------------
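Without Stata, the same main-effects model can be fitted by iteratively reweighted least squares on the 12 cells of the LSS table. For the log link this is equivalent to Newton-Raphson. The pure-Python sketch below assumes that Stata's exposure variable pyr equals the tabulated risk time times 1000. The helper names are hypothetical. The coefficients and standard errors should agree with the first output above, and exp(coef) gives the rate ratios in the second.

```python
import math

# (agex 1-3, sex 1=M 2=F, dose 0/1, cancer deaths, risk time in 1000 years)
CELLS = [
    (1, 1, 1, 23, 6.77), (1, 1, 0, 105, 67.33),
    (1, 2, 1, 22, 7.12), (1, 2, 0, 71, 79.03),
    (2, 1, 1, 37, 2.92), (2, 1, 0, 218, 23.93),
    (2, 2, 1, 69, 7.82), (2, 2, 0, 266, 71.48),
    (3, 1, 1, 40, 1.59), (3, 1, 0, 233, 13.50),
    (3, 2, 1, 37, 2.47), (3, 2, 0, 255, 27.19),
]
NAMES = ["_Isex_2", "_Iagex_2", "_Iagex_3", "_Idose_1", "_cons"]


def design_row(agex, sex, dose):
    """Dummy variables; the reference cell is agex=1, sex=1 (M), dose=0."""
    return [float(sex == 2), float(agex == 2), float(agex == 3), float(dose), 1.0]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def information(X, mu):
    """Fisher information X' diag(mu) X for a Poisson log-linear model."""
    p = len(X[0])
    return [[sum(x[j] * x[k] * w for x, w in zip(X, mu)) for k in range(p)]
            for j in range(p)]


def fit_poisson(cells, iterations=20):
    """IRLS for ln(mu) = ln(pyr) + X beta; returns (beta, se, fitted mu)."""
    X = [design_row(agex, sex, dose) for agex, sex, dose, _, _ in cells]
    y = [c[3] for c in cells]
    offset = [math.log(c[4] * 1000.0) for c in cells]   # ln(person-years)
    mu = [float(v) for v in y]                          # start at mu = y (all y > 0)
    for _ in range(iterations):
        # working response minus offset: eta - offset + (y - mu) / mu
        z = [math.log(w) - o + (v - w) / w for w, o, v in zip(mu, offset, y)]
        info = information(X, mu)
        rhs = [sum(x[j] * w * zi for x, w, zi in zip(X, mu, z)) for j in range(len(X[0]))]
        beta = solve(info, rhs)
        mu = [math.exp(o + sum(xj * bj for xj, bj in zip(x, beta)))
              for x, o in zip(X, offset)]
    info = information(X, mu)
    p = len(beta)
    se = [math.sqrt(solve(info, [float(i == j) for i in range(p)])[j]) for j in range(p)]
    return beta, se, mu


beta, se, fitted = fit_poisson(CELLS)
for name, b, s in zip(NAMES, beta, se):
    print(f"{name:9s} coef {b: .6f}  se {s:.6f}  IRR {math.exp(b):.6f}")
```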
Dose      Age at       Males              Females
          exposure     Obs.   Fitted      Obs.   Fitted
unexp.    0-19          105    116.7        71     70.8
          20-39         218    191.5       266    295.4
          40-           233    232.2       255    241.5
exposed   0-19           23     21.7        22     11.8
          20-39          37     43.3        69     59.9
          40-            40     50.7        37     40.6
1000
43.279
Output
. poisgof
         Goodness-of-fit chi2  =  20.61145
         Prob > chi2(7)        =    0.0044
. poisgof , pearson
         Goodness-of-fit chi2  =  22.27476
         Prob > chi2(7)        =    0.0023
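poisgof reports the deviance (likelihood-ratio) statistic by default and the Pearson statistic with the pearson option. Both can be approximated from the table of observed and fitted counts with the Python sketch below. The fitted values in that table are rounded to one decimal, so the results agree with the Stata output only to about one decimal place. The degrees of freedom are 12 cells minus 5 estimated parameters.

```python
import math

# Observed deaths and fitted values (rounded to 0.1) from the table above:
# unexposed cells first, then exposed; males then females within each age group.
observed = [105, 71, 218, 266, 233, 255, 23, 22, 37, 69, 40, 37]
fitted = [116.7, 70.8, 191.5, 295.4, 232.2, 241.5, 21.7, 11.8, 43.3, 59.9, 50.7, 40.6]


def deviance(obs, fit):
    """Likelihood-ratio goodness-of-fit statistic 2 * sum(O ln(O/F) - (O - F))."""
    return 2 * sum((o * math.log(o / f) if o > 0 else 0.0) - (o - f)
                   for o, f in zip(obs, fit))


def pearson(obs, fit):
    """Pearson goodness-of-fit statistic sum((O - F)^2 / F)."""
    return sum((o - f) ** 2 / f for o, f in zip(obs, fit))


df = len(observed) - 5    # 12 cells minus 5 estimated parameters
print(f"deviance = {deviance(observed, fitted):.2f}  "
      f"Pearson = {pearson(observed, fitted):.2f}  df = {df}")
```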
Comments
The final model is consistent with the data, but gives a
rather complex description.
The dose effect is modified by both sex (larger rate ratio
for females) and age-at-exposure (the dose effect
decreases with age-at-exposure).
Having 10 estimated parameters, the final model is only slightly simpler than the saturated model (i.e. the model with 12 freely varying rates).
Note also
The goodness-of-fit test is not very reliable in large
tables with many small counts. In such circumstances
one may instead compare a given model with a much
[Figure: Lexis diagram, Age (10-25) against Calendar Time (1980-2000)]
In each cell compute the number of events, D, and the total time at risk, S.
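The tabulation step can be sketched in Python. Each person's follow-up, given in decimal calendar years together with the date of birth, is cut wherever it crosses a 5-year age band or a 5-year calendar period. D and S are then accumulated per cell. The helpers `lexis_split` and `tabulate` are hypothetical, and the two persons in the example are invented.

```python
import math
from collections import defaultdict


def lexis_split(birth, entry, exit_, died, width=5.0):
    """Split one person's follow-up into pieces lying inside a single age band
    and a single calendar period. Yields (age_band, period_band, time, event);
    bands are labelled by their lower limit."""
    cuts = {entry, exit_}
    k = math.floor(entry / width) + 1                 # calendar-period cut points
    while k * width < exit_:
        cuts.add(k * width)
        k += 1
    k = math.floor((entry - birth) / width) + 1       # age cut points
    while birth + k * width < exit_:
        cuts.add(birth + k * width)
        k += 1
    points = sorted(cuts)
    for a, b in zip(points, points[1:]):
        mid = (a + b) / 2                             # classify each piece by its midpoint
        yield (math.floor((mid - birth) / width) * width,
               math.floor(mid / width) * width,
               b - a,
               died and b == exit_)


def tabulate(people, width=5.0):
    """Return D[cell] (number of events) and S[cell] (total time at risk)."""
    D, S = defaultdict(int), defaultdict(float)
    for birth, entry, exit_, died in people:
        for age_band, period_band, time, event in lexis_split(birth, entry, exit_, died, width):
            D[(age_band, period_band)] += int(event)
            S[(age_band, period_band)] += time
    return D, S


# Invented data: (birth, entry, exit, died), all in decimal calendar years
PEOPLE = [(1972.0, 1982.5, 1991.0, True), (1968.0, 1983.0, 1998.0, False)]
D, S = tabulate(PEOPLE)
for cell in sorted(S):
    print(cell, D[cell], round(S[cell], 3))
```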
etc.
where varlist is a subset of the variables defining the
multi-way table and interaction terms.
Note:
The data do not have to be collapsed to do Poisson
regression, but data may become very large if split on
xi:poisson died i.sex i.invasion i.ecells i.ulcerat ///
      i.timecat , exposure(risktime) irr
Apart from the baseline hazard rate, the two models are identical, and both give results as rate ratios.
Selected Output
Cox regression -- no ties

No. of subjects   =          205           Number of obs   =        205
No. of failures   =           57
Time at risk      =  1208.279261           LR chi2(5)      =      44.51
Log likelihood    =   -260.94353           Prob > chi2     =     0.0000

Poisson regression                         Number of obs   =        109
                                           LR chi2(9)      =      50.08
                                           Prob > chi2     =     0.0000
Log likelihood = -87.668122                Pseudo R2       =     0.2222

--------------------------------------------------------------------------
        died |       IRR   Std. Err.       z   P>|z|    [95% Conf. Int]
-------------+------------------------------------------------------------
     _Isex_1 |   1.85962    .504960     2.28   0.022    1.09217   3.16635
_Iinvasion_1 |    2.1712    .719682     2.34   0.019    1.13382   4.15761
_Iinvasion_2 |   2.71461    1.06812     2.54   0.011    1.25541   5.86988
  _Iecells_1 |   1.81955    .555404     1.96   0.050    1.00033    3.3097
 _Iulcerat_1 |   2.76710    .885904     3.18   0.001    1.47743   5.18254
 _Itimecat_2 |   1.88015    .638056     1.86   0.063    .966772   3.65645
 _Itimecat_4 |   1.79353    .668697     1.57   0.117    .863671   3.72451
 _Itimecat_6 |   1.27918    .662529     0.48   0.635    .463521   3.53018
 _Itimecat_8 |   .613017    .462909    -0.65   0.517    .139541    2.6930
    risktime | (exposure)
--------------------------------------------------------------------------
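The z values and confidence limits in an irr table can be reconstructed from the IRR and its standard error, as in the Python sketch below. It assumes that the reported SE(IRR) is the delta-method value IRR × SE(coef) and that the interval is computed on the log scale as exp(coef ± 1.96·SE(coef)). The two rows used are taken from the Poisson output above.

```python
import math


def irr_summary(irr, se_irr, z=1.96):
    """Recover z and the 95% CI from an IRR and its reported standard error.

    Assumes SE(IRR) = IRR * SE(coef) and a CI of exp(coef -/+ z * SE(coef)).
    """
    coef = math.log(irr)
    se_coef = se_irr / irr
    return coef / se_coef, irr * math.exp(-z * se_coef), irr * math.exp(z * se_coef)


for name, irr, se in [("_Isex_1", 1.85962, 0.504960), ("_Iulcerat_1", 2.76710, 0.885904)]:
    zval, lo, hi = irr_summary(irr, se)
    print(f"{name:12s} z = {zval:.2f}  95% CI {lo:.5f} to {hi:.5f}")
```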
$$\lambda_x = -\ln\left(\frac{S_{x+1}}{S_x}\right) = -\ln(1 - q_x)$$
Notes
The age-specific mortality proportion $q_x$ is a probability and has no dimension, whereas the mortality rate $\lambda_x$ has dimension per time unit.
In epidemiology both are often denoted the
mortality rate.
The mortality rate is always numerically larger than the
corresponding age-specific mortality proportion, but
apart from extremely old ages the discrepancy is very
small.
The plots below show the ratio $\lambda_x / q_x$ plotted against the proportion $q_x$ and against age for each sex for the 2000-01 Danish life table.
37
Department of Biostatistics
University of Aarhus
May 11 2004
Michael Vth
$$\lambda_x \approx q_x \left(1 + \frac{q_x}{2}\right).$$
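The exact relation and the approximation can be compared for a few proportions with the Python sketch below. The values of $q_x$ are arbitrary. The exact relation $\lambda_x = -\ln(1 - q_x)$ corresponds to a rate that is constant over the year of age, as in the formula at the top of this section.

```python
import math


def rate_from_proportion(q):
    """lambda_x = -ln(1 - q_x): the mortality rate implied by the proportion
    q_x when the rate is constant over the year of age."""
    return -math.log(1.0 - q)


for q in (0.001, 0.01, 0.05, 0.2):
    lam = rate_from_proportion(q)
    approx = q * (1 + q / 2)
    print(f"q = {q:<5}  lambda = {lam:.6f}  q(1 + q/2) = {approx:.6f}  lambda/q = {lam / q:.4f}")
```

The ratio λ/q exceeds 1 for every q and grows with q, which matches the notes above: the discrepancy only matters where mortality is high.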
CAUSE-SPECIFIC MORTALITY
Simple and reasonably accurate estimates of cause-specific mortality rates can be derived from the relation
$$\lambda_{CS}(x) = \pi_{CS}(x)\, \lambda(x),$$
where $\lambda_{CS}(x)$ is the cause-specific mortality rate, $\pi_{CS}(x)$ the proportion of deaths from the specified cause at age x, and $\lambda(x)$ the total mortality rate.
Estimation of the total mortality rate $\lambda(x)$ has already been described.
For each sex and age in 5-year intervals the proportion $\pi_{CS}(x)$ can be estimated from the tables of the number of deaths by cause, published each year in "Causes of death in Denmark" (Dødsårsagerne i Danmark) or on the website mentioned above, as
$$\hat\pi_{CS}(x) = \frac{\text{no. of cause-specific deaths in age interval}}{\text{total no. of deaths in age interval}},$$
where the "age interval" refers to the five-year age interval containing age x.
In each 5-year interval the total mortality rates (one for each of the 5 years) are then multiplied by this estimate to give the corresponding cause-specific mortality rates.
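The last step can be written as a one-line Python helper, shown below. The single-year total rates and the death counts are hypothetical, and so is the function name.

```python
def cause_specific_rates(total_rates, cs_deaths, all_deaths):
    """lambda_CS(x) = pi_CS(x) * lambda(x) for the single-year total rates
    within one 5-year age interval; pi_CS is the cause-specific share of
    all deaths in that interval."""
    proportion = cs_deaths / all_deaths
    return [proportion * rate for rate in total_rates]


# Hypothetical total rates per 1000 for five single years of age, and death counts
total = [10.2, 11.0, 11.9, 12.8, 13.9]
print(cause_specific_rates(total, cs_deaths=300, all_deaths=1200))
```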
Alive at start   Censored   Modified number
of year                     at risk
Y0               c1         n1
Y1               c2         n2
Y2               c3         n3
⋮                ⋮          ⋮
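The table does not state how the modified number at risk is formed. The Python sketch below assumes the usual actuarial convention $n_i = Y_{i-1} - c_i/2$, under which censored persons count as at risk for half the interval. It also adds hypothetical death counts $d_i$ to show how the conditional death probabilities and the cumulative survival would follow. The counts and the function name are illustrative only.

```python
def actuarial_table(alive_at_start, censored, deaths):
    """For each interval return (n_i, q_i, S_i), assuming n_i = Y_{i-1} - c_i / 2,
    q_i = d_i / n_i and S_i = product of (1 - q_k) for k <= i."""
    rows, surv = [], 1.0
    for y, c, d in zip(alive_at_start, censored, deaths):
        n = y - c / 2          # censored persons count as at risk for half the interval
        q = d / n
        surv *= 1 - q
        rows.append((n, q, surv))
    return rows


# Hypothetical counts satisfying Y_i = Y_{i-1} - c_i - d_i
for n, q, s in actuarial_table([100, 80, 62], [10, 8, 6], [10, 10, 6]):
    print(f"n = {n:5.1f}  q = {q:.4f}  S = {s:.4f}")
```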
$$\lambda_j(t) = \lambda^*(\text{age}, \text{sex}) + \nu(t),$$
where $\lambda^*(\text{age}, \text{sex})$ is the population mortality rate and $\nu(t)$ the excess mortality rate.
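Let
$$A_i = A(t_i) = \int_0^{t_i} \nu(t)\,dt .$$
Then $S^*_i$ is an estimate of $\exp(-\Lambda^*_i)$, where $\Lambda^*_i$ is the integrated population mortality rate up to $t_i$, and $\tilde S_i$ is an estimate of $\exp(-\Lambda^*_i - A_i)$.
The corrected survival $S^C_i = \tilde S_i / S^*_i$ can therefore be viewed as an estimate of
$$\exp(-A_i) = \exp\left(-\int_0^{t_i} \nu(t)\,dt\right).$$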
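The identity behind the corrected survival can be checked numerically with the Python sketch below. It uses hypothetical constant rates, a population rate of 0.02 and an excess rate of 0.05 per year. With these rates the ratio of observed to expected survival equals $\exp(-\nu t)$ at every time point.

```python
import math


def corrected_survival(observed, expected):
    """Corrected survival S^C_i = S~_i / S*_i at each follow-up time t_i."""
    return [s_obs / s_exp for s_obs, s_exp in zip(observed, expected)]


# Hypothetical constant rates per year, used only to illustrate the identity
pop_rate, excess_rate = 0.02, 0.05
times = [1.0, 2.0, 5.0]
observed = [math.exp(-(pop_rate + excess_rate) * t) for t in times]   # S~_i
expected = [math.exp(-pop_rate * t) for t in times]                   # S*_i
corrected = corrected_survival(observed, expected)
print([round(c, 4) for c in corrected])   # should equal exp(-excess_rate * t)
```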
RELATIVE MORTALITY,
THE PERSON-YEAR METHOD
Notation:
$\lambda(t)$: mortality rate in the study group
$\lambda^*(t)$: mortality rate in the reference population
Survival function and integrated mortality rate in the reference population are denoted $S^*(t)$ and $\Lambda^*(t)$.
A simple statistical model:
Assume that the mortality rate in the study group is
proportional to that of the reference group:
$$\lambda(t) = \rho\, \lambda^*(t)$$
Two situations:
1. Age a is chosen as the underlying time t. With t a
the model becomes
(t , e, sex ) * (t e, sex )
The dependence on sex is suppressed below.
$$\hat\rho = \frac{D}{\sum_{\text{persons}} \left[\Lambda^*(t_{\mathrm{EXIT}}) - \Lambda^*(t_{\mathrm{ENTRY}})\right]} = \frac{D}{E}$$
The numerator D: D = the observed number of deaths during follow-up.
The denominator E: E = the expected number of deaths during follow-up. This is a convenient terminology, but not quite correct: E is rather the number of deaths to be expected given the observed follow-up times.
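An approximate 95% confidence interval for $\rho$ is $\hat\rho/c$ to $\hat\rho \cdot c$, where
$$c = \exp\left(\frac{1.96}{\sqrt{D}}\right)$$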
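Null hypothesis: The mortality in the study group is identical to the mortality in the reference population, i.e. $\rho = 1$.
The expected value of $D - E$ is 0 under the null hypothesis,
$$\mathrm{E}(D - E) = 0,$$
and $\mathrm{Var}(D - E)$ can be estimated by E.
These results lead to the following test statistic, which is approximately $\chi^2$ distributed with 1 degree of freedom under the null hypothesis:
$$X^2 = \frac{(D - E)^2}{E}$$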
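$$\Lambda^*(t_{\mathrm{EXIT}}) - \Lambda^*(t_{\mathrm{ENTRY}}) = \int_{t_{\mathrm{ENTRY}}}^{t_{\mathrm{EXIT}}} \lambda^*(s)\,ds = \sum_x \lambda^*_x t_x ,$$
where $t_x$ is the time the person spends in age group x and $\lambda^*_x$ is the reference mortality rate in that group. Summing over persons gives
$$\sum_{\text{persons}} \left[\Lambda^*(t_{\mathrm{EXIT}}) - \Lambda^*(t_{\mathrm{ENTRY}})\right] = \sum_{\text{persons}} \sum_x \lambda^*_x t_x = \sum_x \lambda^*_x \, PYR_x ,$$
where $PYR_x$ is the total number of person-years in age group x.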
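The identity above, the expected number E as a sum over persons or as a sum over age groups of rate × person-years, can be checked with the Python sketch below. The reference rates and the two persons' entry and exit ages are hypothetical, and so is the helper name.

```python
import math
from collections import defaultdict


def time_by_age(entry_age, exit_age):
    """Time t_x spent in each one-year age group x between entry and exit."""
    t = defaultdict(float)
    age = entry_age
    while age < exit_age:
        x = math.floor(age)
        stop = min(exit_age, x + 1)
        t[x] += stop - age
        age = stop
    return t


# Hypothetical reference rates lambda*_x (per year) and follow-up (entry age, exit age)
ref_rate = {29: 0.0010, 30: 0.0011, 31: 0.0012, 32: 0.0013, 33: 0.0014}
persons = [(29.5, 33.25), (30.0, 32.5)]

# E as a sum over persons of sum_x lambda*_x t_x ...
E_persons = sum(sum(ref_rate[x] * tx for x, tx in time_by_age(a, b).items())
                for a, b in persons)

# ... equals sum_x lambda*_x PYR_x, with PYR_x the total person-years in group x
pyr = defaultdict(float)
for a, b in persons:
    for x, tx in time_by_age(a, b).items():
        pyr[x] += tx
E_pyr = sum(ref_rate[x] * pyr[x] for x in pyr)
print(E_persons, E_pyr)
```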
[Figure: follow-up time along the AGE axis (ages 29-34), split into age groups x with reference rates $\lambda^*_x$]
Example:
Mortality for patients diagnosed with manic-depressive psychosis (Weeke, Juel & Væth, J. Affective Disorders 1987; 13: 287-292).
Data:
Patients admitted to a psychiatric hospital for the first
time in the period April 1, 1970 - March 31, 1972 and
followed until March 31, 1977.
Number of patients N = 2168.
17 patients were lost to follow-up due to emigration and
were censored on date of emigration.
Results:
                 Observed   "Expected"    ρ̂       X²
All patients        309       176.55      1.75     99.4
Males               159        73.34      2.17    100.0
Females             150       103.21      1.45     21.2

                 95% confidence limits for ρ
All patients      1.57 - 1.97      1.56 - 1.96
Males             1.86 - 2.53      1.84 - 2.53
Females           1.24 - 1.71      1.24 - 1.71
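The estimates $\hat\rho = D/E$, the test statistics $X^2$ and the interval $\hat\rho/c$ to $\hat\rho \cdot c$ can be recomputed from the observed and "expected" numbers with the Python sketch below. For males and females it reproduces the first set of confidence limits in the table. The second set is not computed here.

```python
import math


def smr_analysis(D, E, z=1.96):
    """rho-hat = D / E, the interval rho/c to rho*c with c = exp(z / sqrt(D)),
    and the test statistic X^2 = (D - E)^2 / E."""
    rho = D / E
    c = math.exp(z / math.sqrt(D))
    return rho, rho / c, rho * c, (D - E) ** 2 / E


results = {}
for label, D, E in [("All patients", 309, 176.55), ("Males", 159, 73.34),
                    ("Females", 150, 103.21)]:
    results[label] = smr_analysis(D, E)
    rho, lo, hi, x2 = results[label]
    print(f"{label:13s} rho = {rho:.2f}  95% CI {lo:.2f} - {hi:.2f}  X2 = {x2:.1f}")
```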
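$$W = \frac{\ln(\hat\rho_m) - \ln(\hat\rho_f)}{\sqrt{1/D_m + 1/D_f}}$$
We find
$$W = \frac{\ln(2.17) - \ln(1.45)}{\sqrt{1/159 + 1/150}}, \qquad W^2 = 12.35 \text{ (from the unrounded estimates)},$$
giving p = 0.00044.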
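The comparison of the male and female SMRs can be reproduced with the Python sketch below. It uses the unrounded estimates 159/73.34 and 150/103.21 and a two-sided normal p-value computed with `math.erfc`.

```python
import math


def compare_smrs(d1, e1, d2, e2):
    """W = (ln rho1 - ln rho2) / sqrt(1/D1 + 1/D2) with a two-sided normal p-value."""
    w = (math.log(d1 / e1) - math.log(d2 / e2)) / math.sqrt(1 / d1 + 1 / d2)
    p = math.erfc(abs(w) / math.sqrt(2))        # equals 2 * (1 - Phi(|W|))
    return w, p


w, p = compare_smrs(159, 73.34, 150, 103.21)    # males vs females
print(f"W = {w:.3f}, W^2 = {w * w:.2f}, p = {p:.5f}")
```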
rate
.461
.125
.205
.417
1.204
2.757
6.128
16.824
50.955
save e:\kurser\survival\diabetes-coll-agecat2.dta
The national mortality rates (per 1000 years) in 20 five-year intervals for each sex are placed in the file mort7680.dta. The file contains the variables sexage and mrate, where sexage takes the values 1 to 40:
sex      agecat   sexage
female   0-4        1
female   5-9        2
female   10-14      3
etc.
male     90-94     39
male     95-99     40
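The sexage coding and the expected-deaths calculation used below can be sketched in Python. The helpers `sexage` and `expected_deaths` are hypothetical. The first reproduces the coding in the table, with females 1-20 and males 21-40. The second mirrors the Stata line `gene expected = risktime*mrate/1000` that appears later.

```python
def sexage(sex, age):
    """Code (sex, age in years) as in mort7680.dta: females 1-20 and males 21-40
    for the age groups 0-4, 5-9, ..., 95-99."""
    if sex not in ("female", "male"):
        raise ValueError("sex must be 'female' or 'male'")
    if not 0 <= age < 100:
        raise ValueError("age must be in [0, 100)")
    group = int(age // 5) + 1                  # 1 for 0-4, ..., 20 for 95-99
    return group if sex == "female" else 20 + group


def expected_deaths(risktime, mrate):
    """Expected deaths from risk time in years and a rate per 1000 years."""
    return risktime * mrate / 1000


print(sexage("female", 7), sexage("male", 97.5), expected_deaths(12.5, 4.0))
```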
$$\rho_{ijk} = \rho_{000}\, a_i\, b_j\, c_k$$
A couple of examples illustrate the possibilities.
The irr option is not used, since the constant term is not displayed when this option is used, which is rather inconvenient.
gene expected = risktime*mrate/1000
xi: poisson died , exposure(expected) nolog
Output
Poisson regression                          Number of obs   =         93
                                            LR chi2(0)      =       0.00
                                            Prob > chi2     =          .
Log likelihood = -221.45684                 Pseudo R2       =     0.0000

--------------------------------------------------------------------------
    died |     Coef.   Std. Err.       z   P>|z|    [95% Conf. Inte]
---------+----------------------------------------------------------------
   _cons |  1.911351   .0451294    42.35   0.000    1.82290    1.99980
expected | (exposure)
--------------------------------------------------------------------------
Output
Poisson regression                          Number of obs   =         93
                                            LR chi2(1)      =       0.64
                                            Prob > chi2     =     0.4236
Log likelihood = -221.13671                 Pseudo R2       =     0.0014

--------------------------------------------------------------------------
    died |     Coef.   Std. Err.       z   P>|z|    [95% Conf. Inter]
---------+----------------------------------------------------------------
 _Isex_1 |   .072242   .0903129     0.80   0.424   -.104768    .249252
   _cons |  1.874631    .064957    28.86   0.000   1.747318   2.001945
expected | (exposure)
--------------------------------------------------------------------------
Output
Poisson regression                          Number of obs   =         93
                                            LR chi2(4)      =      61.16
                                            Prob > chi2     =     0.0000
Log likelihood = -190.87923                 Pseudo R2       =     0.1381

--------------------------------------------------------------------------
       died |     Coef.   Std. Err.       z   P>|z|    [95% Conf. Inter]
------------+-------------------------------------------------------------
    _Isex_1 | -.0495702   .0923718    -0.54   0.592   -.230616    .131475
_Idxacat_20 | -.7117244   .1615624    -4.41   0.000   -1.02838    -.39507
_Idxacat_40 | -.7557131   .1461376    -5.17   0.000   -1.04214   -.469289
_Idxacat_60 | -1.277969   .1599211    -7.99   0.000   -1.59141   -.964530
      _cons |  2.776317   .1412925    19.65   0.000    2.49939    3.05325
   expected | (exposure)
--------------------------------------------------------------------------

. testparm *dx*
 ( 1)  [died]_Idxacat_20 = 0
 ( 2)  [died]_Idxacat_40 = 0
 ( 3)  [died]_Idxacat_60 = 0
           chi2(  3) =   64.81
         Prob > chi2 =   0.0000
          dxacat:  reference   20      40      60
female             16.06       7.88    7.54    4.47
male               15.28       7.50    7.18    4.26
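Each entry of this table is exp(_cons + sex effect + dxacat effect) from the last Poisson model, i.e. the estimated ratio of observed to expected mortality in that cell. The Python sketch below reproduces the table from the printed coefficients. Females and the lowest dxacat category are the reference, and _Isex_1 is the male effect.

```python
import math

cons = 2.776317                                          # _cons: ln(rho) in the reference cell
sex_effect = {"female": 0.0, "male": -0.0495702}         # _Isex_1 applies to males
dx_effect = [0.0, -0.7117244, -0.7557131, -1.277969]     # reference, 20, 40, 60

table = {sex: [math.exp(cons + s + d) for d in dx_effect] for sex, s in sex_effect.items()}
for sex, row in table.items():
    print(f"{sex:7s}", "  ".join(f"{r:5.2f}" for r in row))
```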