STAT 431: Generalized Linear Models Spring 2019

STAT 431: Generalized Linear Models
Spring 2019
Lecture 13: Log Linear Models (Continued)
Instructor: Cecilia A. Cotton
Department of Statistics & Actuarial Science

University of Waterloo
June 20, 2019

c Cecilia Cotton 2019
This material is for the personal use of students enrolled in the Spring 2019 offering of Stat 431. Distribution or reproduction of these material for
commercial or non-commercial means is strictly prohibited.
Log Linear Models
Previously we used a Poisson GLM to model count data arising from a time
homogeneous Poisson process:
• Ni (ti ) = the number of events observed over (0, ti ]
E [Ni (t)] = µi (t) = λi ti
• Explanatory variables xi = (1, xi1 , . . . , xip−1 )0
log µi (ti ) = log λi + log ti = x0i β + log ti
We will consider three other types of data we can analyze with a Poisson GLM:
1. Approximating binomial data
2. Time non homogeneous Poisson processes
3. Contingency tables/Multinomial data (next class)
2/1
1. Poisson Approximation to the Binomial
• Suppose: Y ∼ Bin(m, π) so that E [Y ] = mπ
• Set µ = mπ and examine pmf of Y in terms of µ

m y
f (y ) = π (1 − π)m−y
y
(m)(m − 1) · · · (m − y )(m − y − 1) · · · (1) µ y µ m−y
= 1−
(m − y )! y ! m m
y −y
(m)(m − 1) · · · (m − (y − 1)) µ µ m µ
= 1− 1−
(m)(m) · · · (m) y! m m
• Recall a n
lim 1+ = ea
n→∞ n
• Therefore as m → ∞ with µ = mπ fixed:
µy e −µ
f (y ) → i.e. Y → Poisson(µ)
y!
3/1
Poisson Approximation to the Binomial
• So for Y ∼ Bin(m, π) as m → ∞ and π → 0 we have
Y ∼ Poisson(µ = mπ)
with E [Y ] = µ = mπ
• Using a Poisson GLM (with log link):
log µ = log π + log m

= xT β + log m
| {z }
offset
• Use the Poisson distribution to model Binomial data

• Use with large population (m large) and low event rate (π)
• Example: Today and Problem 3.1 in course notes
4/1
Example: Poisson Approximation to the Binomial
Non Melanoma Skin Cancer
Schwarz (2015) gives the incidence of non melanoma skin cancer among women in the early
1970s in Minneapolis-St Paul and Dallas-Fort Worth
City Age Class Age Mid Count Pop Size
msp 15-25 20 1 172675
msp 25-34 30 16 123065
msp 35-44 40 30 96216
msp 45-54 50 71 92051
msp 55-64 60 102 72159
msp 65-74 70 130 54722
msp 75-84 80 133 32185
msp 85+ 90 40 8328
dfw 15-25 20 4 181343
dfw 25-34 30 38 146207
dfw 35-44 40 119 121374
dfw 45-54 50 221 111353
dfw 55-64 60 259 83004
dfw 65-74 70 310 55932
dfw 75-84 80 226 29007
dfw 85+ 90 65 7538
5/1
6/1
R Code
# C r e a t e melanoma d a t a f r a m e : C i t y and A g e C l a s s a r e c a t e g o r i c a l / f a c t o r v a r i a b l e s
City = f a c t o r ( c ( r e p ( " msp " ,8) , r e p ( " dfw " ,8) ) )
AgeClass = f a c t o r ( r e p ( c ( " 15 -25 " ," 25 -34 " ," 35 -44 " ," 45 -54 " ," 55 -64 " ," 65 -74 " ," 75 -84 " ," 85+ " ) ,2) )
AgeMid = r e p ( s e q (20 ,90 ,10) ,2)
Count = c (1 ,16 ,30 ,71 ,102 ,130 ,133 ,40 ,4 ,38 ,119 ,221 ,259 ,310 ,226 ,65)
Pop = c (172675 , 123065 , 96216 , 92051 , 72159 , 54722 , 32185 , 8328 ,181343 , 146207 , 121374 , 111353 , 83004 ,
55932 , 29007 , 7538)
melanoma = d a t a . frame ( City , AgeClass , AgeMid , Count , Pop )

melanoma $ resp = c b i n d ( melanoma $ Count , melanoma $ Pop - melanoma $ Count )
# P l o t t h e i n c i d e n c e v e r s u s t h e m i d p o i n t o f t h e age c a t e g o r i e s
p l o t ( AgeMid , Count / Pop , pch =1+9* a s . n u m e r i c ( City == " msp " ) )
l e g e n d ( x =20 , y =0.008 , pch = c (1 ,10) , l e g e n d = c ( " msp " ," dfw " ) )
# F i t a b i n o m i a l ( l o g i s t i c ) r e g r e s s i o n model
fit . b i n o m i a l = glm ( resp ~ City + l o g ( AgeMid ) , f a m i l y = b i n o m i a l ( l i n k = logit ) , d a t a = melanoma )
summary( fit . b i n o m i a l )
# F i t a P o i s s o n ( l o g−l i n e a r ) r e g r e s s i o n model
fit . p o i s s o n = glm ( Count ~ City + l o g ( AgeMid ) + o f f s e t ( l o g ( Pop ) ) , f a m i l y = p o i s s o n , d a t a = melanoma )
summary( fit . p o i s s o n )
# F i t t e d OR, RR , and p r e d i c t e d c o u n t s f o r i n t e r p r e t a t i o n
exp ( - fit . b i n o m i a l $ c o e f f [2])
exp ( - fit . p o i s s o n $ c o e f f [2])
expit (sum( fit . b i n o m i a l $ c o e f f * c (1 ,0 , l o g (30) ) ) ) *146207
exp (sum( fit . p o i s s o n $ c o e f f * c (1 ,0 , l o g (30) ) ) ) *146207
7/1
Binomial Model

πi
log = β0 + β1 xi1 + β2 xi2
1 − πi
> summary( fit . b i n o m i a l )
Call :
glm ( f o r m u l a = resp ˜ City + l o g ( AgeMid ) , f a m i l y = b i n o m i a l ( l i n k = logit ) , d a t a = melanoma )
Deviance Residuals :
Min 1Q Median 3Q Max
-3.2539 -0.9718 -0.2358 0.2556 2.0784
Coefficients :
Estimate Std . Error z value Pr ( >| z |)
( Intercept ) -19.49446 0.34284 -56.86 <2e -16 ∗∗∗
Citymsp -0.81288 0.05226 -15.55 <2e -16 ∗∗∗
l o g ( AgeMid ) 3.35654 0.08298 40.45 <2e -16 ∗∗∗
---
Signif . codes : 0 ∗∗∗ 0.001 ∗∗ 0.01 ∗ 0.05 . 0.1 1
( Dispersion parameter for b i n o m i a l f a m i l y taken to be 1)
Null deviance : 2794.779 on 15 degrees of freedom

Residual deviance : 40.335 on 13 degrees of freedom
AIC : 140.54
Number of Fisher Scoring iterations : 4

8/1
Poisson Model
log µi = α0 + α1 xi1 + α2 xi2 + log(Pop)

> summary( fit . p o i s s o n )
Call :
glm ( f o r m u l a = Count ˜ City + l o g ( AgeMid ) + o f f s e t ( l o g ( Pop ) ) , f a m i l y = p o i s s o n , d a t a = melanoma )
-3.2690 -0.9863 -0.2324 0.2620 2.0957
Coefficients :
( Intercept ) -19.46463 0.34207 -56.90 <2e -16 ∗∗∗
Citymsp -0.81014 0.05218 -15.53 <2e -16 ∗∗∗
l o g ( AgeMid ) 3.34813 0.08277 40.45 <2e -16 ∗∗∗
---
Signif . codes : 0 ∗∗∗ 0.001 ∗∗ 0.01 ∗ 0.05 . 0.1 1
( Dispersion parameter for p o i s s o n f a m i l y taken to be 1)

AIC : 141.06
9/1
Example: Non Melanoma Skin Cancer
1. What is the OR and RR for developing non melanoma skin cancer for women in
Dallas-Forth Worth versus those in Minneapolis-St Paul, controlling for age?
OR
d = exp(−β̂1 ) = exp(0.81288) = 2.254381
RR
c = exp(−α̂1 ) = exp(0.81014) = 2.248222
2. What is the predicted number of skin cancer cases in Dallas-Fort Worth among
women age 25-34?
Ŷi = π̂i mi = expit(β̂0 + β̂2 log(30))(146207) = 45.34367
µ̂i = π̂i mi = exp(α̂0 + α̂2 log(30))(146207) = 45.41334
10/1
2. Time Non Homogeneous Poisson Processes
• Now consider
E [N(t)] = µ(t) = λ(t) (not = λt)
• The rate is now a function of time
• Lots of possible ways to model the rate λ(t)
• Piecewise constant
λ(t) = b1 I [0 < t < t1 ] + b2 I [t1 ≤ t < t2 ] + ...

• Piecewise linear
λ(t) = (m1 t + b1 )I [0 < t < t1 ] + (m2 t + b2 )I [t1 ≤ t < t2 ] + ...

• Quadratic
λ(t) = at 2 + bt + c
• Splines, etc
• Consider an example data analysis to explore this further
11/1
Example: Rat Tumor Data
Rat Tumor Data
• Here we consider data from a study of the development of mammary tumors in
rats reported in Gail et al. (1980)
• This study was a carcinogenicity experiment in which 48 rats were exposed to a
carcinogen
• 23 were then assigned to a treatment group where the treatment was designed to
reduce the development of tumours
• 25 were assigned to the control group
• The rats were carefully examined over 122 days for the development of new
tumors (multiple tumors could develop)
• The day (time) of each tumor was recorded
• Our aim here is to estimate the expected number of tumors in the two groups and
make treatment comparisons
12/1
Times to tumors in days (number of tumors detected)
Treatment Group Control Group
ID Days of Tumor Detection ID Days of Tumor Detection
(2)
1 122 1 3, 42, 59, 61 , 112, 119
2 - 2 28, 31, 35, 45, 52, 59(2) , 77, 85, 107, 112
3 3, 88 3 31, 38, 48, 52, 74, 77, 101(2) , 119
4 92 4 11, 114
5 70, 74, 85, 92 5 35, 45, 74(2) , 77, 80, 85, 90(2)
6 38, 92, 122 6 8(2) , 70, 77
7 28, 35, 45, 70, 77, 107 7 17, 35, 52, 77, 101, 114
8 92 8 21, 24, 66, 74, 101(2) , 114
9 21 9 8, 17, 38, 42(3)
10 11, 24, 66, 74, 92 10 52
11 56, 70 11 28(2) , 31, 38, 52, 74(2) , 77(2) , 80(2) , 92(2)
12 31 12 17, 119
13 3, 8, 24, 35, 92 13 52
14 45, 92 14 11(2) , 14, 17, 52, 56(2) , 80(2) , 107
15 3, 42, 92 15 17, 35, 66, 90
16 3, 17, 52, 80 16 28, 66, 70(2) , 74
17 17, 59, 92, 101, 107 17 3, 14, 24(2) , 28, 31, 35, 48, 74, 77, 119
18 45, 52, 85, 101, 122 18 21, 28, 45, 56, 63, 80, 85, 92, 101(2) , 119
19 92 19 28, 35, 52, 59, 66(2) , 90, 97, 119
20 21, 35 20 8(2) , 24, 42, 45, 59, 63(2) , 77, 101, 119, 122
21 24, 31, 42, 48, 70, 74 21 80
22 - 22 92, 122(2)
23 31 23 21
24 3, 28, 74
25 24, 74, 122
Timeline plots for data from Gail et al. (1980)
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
Rat Tumor Data Structure
The entries in the data frame rats.pw for the first five rats in the treated group are
given below:
> rats.pw[1:20,]
id interval count len trt
1 1 1 0 30 1
2 1 2 0 30 1
3 1 3 0 30 1
4 1 4 1 32 1
5 2 1 0 30 1 • Consider four time intervals
6 2 2 0 30 1
7 2 3 0 30 1 • One line of data per interval
8 2 4 0 32 1
9
10 3
3 1
2
1 30
0 30
1
1
• count = number events in interval
11 3 3 1 30 1
12 3 4 0 32 1 • len = days spent in interval
13 4 1 0 30 1
14 4 2 0 30 1 • trt = treatment group
15 4 3 0 30 1
16 4 4 1 32 1
17 5 1 0 30 1
18 5 2 0 30 1
19 5 3 3 30 1
20 5 4 1 32 1
15/1
R Code I
rats = r e a d . t a b l e ( " rats . dat " , header = F )
dimnames ( rats ) [[2]] = c ( " id " ," start " ," stop " ," status " ," enum " ," trt " )
# f u n c t i o n t o c o v e r t d a t a t o s t r u c t u r e we want f o r analysis
gd . pw . f = f u n c t i o n ( indata ) {
pid = s o r t ( u n i q u e ( indata $ id ) )
d a t a = m a t r i x (0 , nrow =( l e n g t h ( pid )∗4) , n c o l =5)

f o r ( i in 1: l e n g t h ( pid ) ) {
tmp = indata [ indata $ id == pid [ i ] , ]
etime = f l o o r ( tmp $ s t o p [ tmp $ s t a t u s == 1] )
startpos = 4∗(i -1) + 1

stoppos = 4∗i
d a t a [ startpos : stoppos ,1] = r e p ( pid [ i ] ,4)
d a t a [ startpos : stoppos ,2] = c (1 ,2 ,3 ,4)
d a t a [ startpos : stoppos ,3] = c (sum(( etime > 0) & ( etime <= 30) ) ,
sum(( etime > 30) & ( etime <= 60) ) ,
sum(( etime > 60) & ( etime <= 90) ) ,
sum(( etime > 90) & ( etime <= 122) ) )
d a t a [ startpos : stoppos ,4] = c (30 ,30 ,30 ,32)
d a t a [ startpos : stoppos ,5] = r e p ( u n i q u e ( tmp $ trt ) ,4)
}
d a t a = d a t a . frame ( d a t a )
dimnames ( d a t a ) [[2]] = c ( " id " ," interval " ," count " ," len " ," trt " )
r e t u r n ( data )
}
rats . pw = gd . pw . f ( rats )
16/1
R Code II
rats . pw [1:20 ,]
# Model C o n t r o l Group Only ( p f i t C )

pfitC = glm ( c o u n t ˜ o f f s e t ( l o g ( len ) ) + f a c t o r ( interval ) , f a m i l y = p o i s s o n ( l i n k = l o g ) , d a t a = rats . pw , s u b s e t
=( trt ==0) , epsilon = 1e -004)
summary( pfitC , corr = F )
# Model C o n t r o l and T r e a t m e n t G r o u p s ( p f i t )
pfit = glm ( c o u n t ˜ o f f s e t ( l o g ( len ) ) + f a c t o r ( interval ) + trt , f a m i l y = p o i s s o n ( l i n k = l o g ) , d a t a = rats . pw ,
epsilon = 1e -004)
summary( pfit , corr = F )
# Time Homogeneous Model ( f i t )

fit = glm ( c o u n t ˜ o f f s e t ( l o g ( len ) ) + trt , f a m i l y = p o i s s o n ( l i n k = l o g ) , d a t a = rats . pw , epsilon =
0.0001)
summary( fit , corr = F )
1 - p c h i s q ( fit $ d e v i a n c e - pfit $ d e v i a n c e , fit $ df . residual - pfit $ df . residual )
# Time non homogeneous Model w i t h T r e a t m e n t I n t e r a c t i o n ( i f i t )

ifit = glm ( c o u n t ˜ o f f s e t ( l o g ( len ) ) + f a c t o r ( interval ) ∗ trt , f a m i l y = p o i s s o n ( l i n k = l o g ) , d a t a = rats .
pw , epsilon = 0.0001)
summary( ifit , corr = F )
1 - p c h i s q ( pfit $ d e v i a n c e - ifit $ d e v i a n c e , pfit $ df . residual - ifit $ df . residual )
17/1
1. Model Control Group Only (pfitC)
• To start we fit a piecewise constant model for control rats
log µi = xT
i β + offset(log ti )
• interval is a categorical variable at 4 levels:
xi1 = I [interval 2]
• Include offset(log(len)) to account for the fact that different intervals are of
different durations
18/1
> pfitC = glm ( c o u n t ˜ o f f s e t ( l o g ( len ) ) + f a c t o r ( interval ) , f a m i l y = p o i s s o n ( l i n k = l o g ) , d a t a = rats . pw ,
s u b s e t =( trt ==0) , epsilon = 1e -004)
> summary( pfitC , corr = F )
Call :
glm ( f o r m u l a = c o u n t ˜ o f f s e t ( l o g ( len ) ) + f a c t o r ( interval ) , f a m i l y = p o i s s o n ( l i n k = l o g ) ,
d a t a = rats . pw , s u b s e t = ( trt == 0) , epsilon = 1e -04)
-1.91834 -1.57481 -0.27360 0.62620 2.89586
Coefficients :
( Intercept ) -3.09371 0.17137 -18.0530 <2e -16 ∗∗∗
f a c t o r ( interval ) 2 0.16254 0.23284 0.6981 0.4851
f a c t o r ( interval ) 3 0.30228 0.22593 1.3380 0.1809
f a c t o r ( interval ) 4 -0.15690 0.24794 -0.6328 0.5268
---
Signif . codes : 0 ∗∗∗ 0.001 ∗∗ 0.01 ∗ 0.05 . 0.1 1

AIC : 345.249

Interpretation of pfitC
log µi = β0 + β1 xi1 + β2 xi2 + β3 xi3 +offset(log ti )

| {z }
interval
• Relative Rate of events in interval 2 versus interval 1:
λ(interval 2)
exp(β1 ) = = exp(0.16254) = 1.176
λ(interval 1)
• Notice none of β1 , β2 , β3 are statistically significant
• There is a trend of a slightly higher rate in intervals 2 and 3 (versus interval 1) but
the event rate does not differ significantly across follow-up time in the control rats
20/1
2. Model Control and Treatment Groups (pfit)
• Now fit a model to both the treatment and control groups

• xi4 = I [treatment group]
• Assume a piecewise constant baseline rate function
• Model is now:
log µi = β0 + β1 xi1 + β2 xi2 + β3 xi3 +β4 xi4 + offset(log ti )

| {z }
interval
• exp(β1 ) is now RR of events for interval 2 versus interval 1, for two rats of the
same treatment group
21/1
> pfit = glm ( c o u n t ˜ o f f s e t ( l o g ( len ) ) + f a c t o r ( interval ) + trt , f a m i l y = p o i s s o n ( l i n k = l o g ) , d a t a = rats . pw ,
epsilon = 1e -004)
> summary( pfit , corr = F )
Call :
glm ( f o r m u l a = c o u n t ˜ o f f s e t ( l o g ( len ) ) + f a c t o r ( interval ) + trt ,
f a m i l y = p o i s s o n ( l i n k = l o g ) , d a t a = rats . pw , epsilon = 1e -04)
-1.83356 -1.19944 -0.33024 0.47007 3.05508
Coefficients :
( Intercept ) -3.088179 0.150548 -20.5130 < 2.2 e -16 ∗∗∗
f a c t o r ( interval ) 2 0.171860 0.195454 0.8793 0.3792
f a c t o r ( interval ) 3 0.206350 0.193905 1.0642 0.2872
f a c t o r ( interval ) 4 -0.064533 0.203686 -0.3168 0.7514
trt -0.822988 0.151130 -5.4456 5.164 e -08 ∗∗∗
---
Signif . codes : 0 ∗∗∗ 0.001 ∗∗ 0.01 ∗ 0.05 . 0.1 1

AIC : 547.754
Interpretation of pfit
• Relative Rate of events for treatment versus control rats
λ(treatment)
exp(β4 ) = = exp(−0.8230) = 0.44
λ(control)
• Controlling for interval of follow-up, the rate of tumor development in treated rats
in 0.44 times that of control rats
• i.e. Treatment looks beneficial
• Note that β4 is statistically significant
• β1 , β2 , β3 still not statistically significant
• Consider do we really need to use a time non homogeneous model for this data?
23/1
3. Time Homogeneous Model (fit)
log µi = β0 + β4 xi4 + offset(log ti )
• β0 = log rate of tumor development, per day, control group

• β4 = log Relative Rate (RR) of tumor development in treated vs control rats
• This model is nested within the time non homogeneous model

• Consider pfit model with β1 = β2 = β3 = 0
• We can carry out a likelihood ratio test
24/1
> fit = glm ( c o u n t ˜ o f f s e t ( l o g ( len ) ) + trt , f a m i l y = p o i s s o n ( l i n k = l o g ) , d a t a = rats . pw , epsilon =
0.0001)
> summary( fit , corr = F )
Call :
glm ( f o r m u l a = c o u n t ˜ o f f s e t ( l o g ( len ) ) + trt , f a m i l y = p o i s s o n ( l i n k = l o g ) ,
d a t a = rats . pw , epsilon = 1e -04)
-1.78004 -1.14210 -0.42348 0.40094 3.26727
Coefficients :
( Intercept ) -3.005609 0.081219 -37.0062 < 2.2 e -16 ∗∗∗
trt -0.822995 0.151160 -5.4445 5.194 e -08 ∗∗∗
---
Signif . codes : 0 ∗∗∗ 0.001 ∗∗ 0.01 ∗ 0.05 . 0.1 1

AIC : 544.491

Interpretation of fit
• Note β̂4 = −0.8230 is almost unchanged versus model fit
• LIkelihood Ratio/Deviance test of H0 : β1 = β2 = β3 = 0
∆D = D0 − DA = 269.060 − 266.323 ∼ χ2(3) under H0
> 1 - p c h i s q ( fit $ d e v i a n c e - pfit $ d e v i a n c e , fit $ d f . r e s i d u a l - pfit $ d f . r e s i d u a l )

[1] 0.43400775
• No not reject H0
• Conclude that the time homogeneous model is probably OK in this case
• However, we retain it for generality and for the following analysis
26/1
4. Time Non Homogeneous Model with Treatment Interaction (ifit)
• Q: Is the treatment effect constant over time?

• Model with interaction:
interval treatment
z }| { z }| {
log µi = β0 + β1 xi1 + β2 xi2 + β3 xi3 + β4 xi4
+ β5 xi1 xi4 + β6 xi2 xi4 + β7 xi3 xi4 + log ti
| {z }
interval∗treatment
• Model pfit (time non homogeneous, without interaction) is nested within this
model (consider ifit with β5 = β6 = β7 = 0)
27/1
> ifit = glm ( c o u n t ˜ o f f s e t ( l o g ( len ) ) + f a c t o r ( interval ) ∗ trt , f a m i l y = p o i s s o n ( l i n k = l o g ) , d a t a =
rats . pw , epsilon = 0.0001)
> summary( ifit , corr = F )
Call :
glm ( f o r m u l a = c o u n t ˜ o f f s e t ( l o g ( len ) ) + f a c t o r ( interval ) ∗ trt , f a m i l y = p o i s s o n ( l i n k = l o g ) , d a t a =
rats . pw , epsilon = 1e -04)
-1.91834 -1.21584 -0.32409 0.51249 2.89586
Coefficients :
( Intercept ) -3.093712 0.171368 -18.0530 < 2e -16 ∗∗∗
f a c t o r ( interval ) 2 0.162536 0.232840 0.6981 0.48514
f a c t o r ( interval ) 3 0.302284 0.225926 1.3380 0.18090
f a c t o r ( interval ) 4 -0.156902 0.247935 -0.6328 0.52684
trt -0.803850 0.316133 -2.5428 0.01100 ∗
f a c t o r ( interval ) 2: trt 0.031556 0.428219 0.0737 0.94126
f a c t o r ( interval ) 3: trt -0.376186 0.443551 -0.8481 0.39637
f a c t o r ( interval ) 4: trt 0.286455 0.436610 0.6561 0.51177
---
Signif . codes : 0 ∗∗∗ 0.001 ∗∗ 0.01 ∗ 0.05 . 0.1 1

AIC : 551.349

Interpretation of ifit
• Note β̂4 = −0.8038 is very similar to pfit
• LIkelihood Ratio/Deviance test of H0 : β5 = β6 = β7 = 0
∆D = D0 − DA = 266.323 − 263.917 ∼ χ2(3) under H0
> 1 - p c h i s q ( pfit $ d e v i a n c e - ifit $ d e v i a n c e , pfit $ d f . r e s i d u a l - ifit $ d f . r e s i d u a l )

[1] 0.4926147
• No not reject H0
• We do not have evidence that the treatment effect varies across the time intervals
29/1
Summary of Rat Tumor Data Analysis
• Looks like a piecewise constant rate function is not necessary

• The best model (of the ones we examined) is fit
log µi = β0 + β4 xi4 + offset(log ti )
• Interpretation: The relative rate for tumor development in treated versus control
rats is
exp(β̂4 ) = exp(−0.822995) = 0.439
• i.e. treatment is beneficial (treated rates get fewer tumors)
• Prediction: Expected number of tumors for a control rat observed for 70 days?
30/1

STAT 431: Generalized Linear Models Spring 2019

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

STAT 431: Generalized Linear Models Spring 2019

Uploaded by

Copyright:

Available Formats

STAT 431: Generalized Linear Models

Lecture 13: Log Linear Models (Continued)

Instructor: Cecilia A. Cotton

Department of Statistics & Actuarial Science

June 20, 2019

E [Ni (t)] = µi (t) = λi ti

• Explanatory variables xi = (1, xi1 , . . . , xip−1 )0

log µi (ti ) = log λi + log ti = x0i β + log ti

log µ = log π + log m

• Use the Poisson distribution to model Binomial data

melanoma = d a t a . frame ( City , AgeClass , AgeMid , Count , Pop )

( Dispersion parameter for b i n o m i a l f a m i l y taken to be 1)

Null deviance : 2794.779 on 15 degrees of freedom

Number of Fisher Scoring iterations : 4

log µi = α0 + α1 xi1 + α2 xi2 + log(Pop)

( Dispersion parameter for p o i s s o n f a m i l y taken to be 1)

Null deviance : 2790.340 on 15 degrees of freedom

Number of Fisher Scoring iterations : 4

µ̂i = π̂i mi = exp(α̂0 + α̂2 log(30))(146207) = 45.41334

λ(t) = b1 I [0 < t < t1 ] + b2 I [t1 ≤ t < t2 ] + ...

λ(t) = (m1 t + b1 )I [0 < t < t1 ] + (m2 t + b2 )I [t1 ≤ t < t2 ] + ...

d a t a = m a t r i x (0 , nrow =( l e n g t h ( pid )∗4) , n c o l =5)

startpos = 4∗(i -1) + 1

# Model C o n t r o l Group Only ( p f i t C )

# Time Homogeneous Model ( f i t )

# Time non homogeneous Model w i t h T r e a t m e n t I n t e r a c t i o n ( i f i t )

• To start we fit a piecewise constant model for control rats

• interval is a categorical variable at 4 levels:

( Dispersion parameter for p o i s s o n f a m i l y taken to be 1)

Null deviance : 167.786 on 99 degrees of freedom

Number of Fisher Scoring iterations : 4

log µi = β0 + β1 xi1 + β2 xi2 + β3 xi3 +offset(log ti )

• Relative Rate of events in interval 2 versus interval 1:

• Now fit a model to both the treatment and control groups

log µi = β0 + β1 xi1 + β2 xi2 + β3 xi3 +β4 xi4 + offset(log ti )

( Dispersion parameter for p o i s s o n f a m i l y taken to be 1)

Null deviance : 301.371 on 191 degrees of freedom

• Relative Rate of events for treatment versus control rats

log µi = β0 + β4 xi4 + offset(log ti )

• β0 = log rate of tumor development, per day, control group

• This model is nested within the time non homogeneous model

( Dispersion parameter for p o i s s o n f a m i l y taken to be 1)

Null deviance : 301.371 on 191 degrees of freedom

Number of Fisher Scoring iterations : 4

• Note β̂4 = −0.8230 is almost unchanged versus model fit

• LIkelihood Ratio/Deviance test of H0 : β1 = β2 = β3 = 0

∆D = D0 − DA = 269.060 − 266.323 ∼ χ2(3) under H0

> 1 - p c h i s q ( fit $ d e v i a n c e - pfit $ d e v i a n c e , fit $ d f . r e s i d u a l - pfit $ d f . r e s i d u a l )

• Q: Is the treatment effect constant over time?

( Dispersion parameter for p o i s s o n f a m i l y taken to be 1)

Null deviance : 301.371 on 191 degrees of freedom

Number of Fisher Scoring iterations : 4

• Note β̂4 = −0.8038 is very similar to pfit

• LIkelihood Ratio/Deviance test of H0 : β5 = β6 = β7 = 0

∆D = D0 − DA = 266.323 − 263.917 ∼ χ2(3) under H0

> 1 - p c h i s q ( pfit $ d e v i a n c e - ifit $ d e v i a n c e , pfit $ d f . r e s i d u a l - ifit $ d f . r e s i d u a l )

• Looks like a piecewise constant rate function is not necessary

log µi = β0 + β4 xi4 + offset(log ti )

You might also like