You are on page 1of 30

STAT 431: Generalized Linear Models

Spring 2019

Lecture 13: Log Linear Models (Continued)

Instructor: Cecilia A. Cotton

Department of Statistics & Actuarial Science


University of Waterloo

June 20, 2019


c Cecilia Cotton 2019
This material is for the personal use of students enrolled in the Spring 2019 offering of Stat 431. Distribution or reproduction of these material for
commercial or non-commercial means is strictly prohibited.
Log Linear Models
Previously we used a Poisson GLM to model count data arising from a time
homogeneous Poisson process:
• Ni (ti ) = the number of events observed over (0, ti ]

E [Ni (t)] = µi (t) = λi ti

• Explanatory variables xi = (1, xi1 , . . . , xip−1 )0

log µi (ti ) = log λi + log ti = x0i β + log ti

We will consider three other types of data we can analyze with a Poisson GLM:
1. Approximating binomial data
2. Time non homogeneous Poisson processes
3. Contingency tables/Multinomial data (next class)
2/1
1. Poisson Approximation to the Binomial
• Suppose: Y ∼ Bin(m, π) so that E [Y ] = mπ
• Set µ = mπ and examine pmf of Y in terms of µ
 
m y
f (y ) = π (1 − π)m−y
y
(m)(m − 1) · · · (m − y )(m − y − 1) · · · (1)  µ y  µ m−y
= 1−
(m − y )! y ! m m
y  −y
(m)(m − 1) · · · (m − (y − 1)) µ µ  m  µ 
= 1− 1−
(m)(m) · · · (m) y! m m

• Recall  a n
lim 1+ = ea
n→∞ n
• Therefore as m → ∞ with µ = mπ fixed:
µy e −µ
f (y ) → i.e. Y → Poisson(µ)
y!
3/1
Poisson Approximation to the Binomial
• So for Y ∼ Bin(m, π) as m → ∞ and π → 0 we have

Y ∼ Poisson(µ = mπ)

with E [Y ] = µ = mπ
• Using a Poisson GLM (with log link):

log µ = log π + log m


= xT β + log m
| {z }
offset

• Use the Poisson distribution to model Binomial data


• Use with large population (m large) and low event rate (π)
• Example: Today and Problem 3.1 in course notes

4/1
Example: Poisson Approximation to the Binomial
Non Melanoma Skin Cancer
Schwarz (2015) gives the incidence of non melanoma skin cancer among women in the early
1970s in Minneapolis-St Paul and Dallas-Fort Worth
City Age Class Age Mid Count Pop Size
msp 15-25 20 1 172675
msp 25-34 30 16 123065
msp 35-44 40 30 96216
msp 45-54 50 71 92051
msp 55-64 60 102 72159
msp 65-74 70 130 54722
msp 75-84 80 133 32185
msp 85+ 90 40 8328
dfw 15-25 20 4 181343
dfw 25-34 30 38 146207
dfw 35-44 40 119 121374
dfw 45-54 50 221 111353
dfw 55-64 60 259 83004
dfw 65-74 70 310 55932
dfw 75-84 80 226 29007
dfw 85+ 90 65 7538

5/1
6/1
R Code
# C r e a t e melanoma d a t a f r a m e : C i t y and A g e C l a s s a r e c a t e g o r i c a l / f a c t o r v a r i a b l e s
City = f a c t o r ( c ( r e p ( " msp " ,8) , r e p ( " dfw " ,8) ) )
AgeClass = f a c t o r ( r e p ( c ( " 15 -25 " ," 25 -34 " ," 35 -44 " ," 45 -54 " ," 55 -64 " ," 65 -74 " ," 75 -84 " ," 85+ " ) ,2) )
AgeMid = r e p ( s e q (20 ,90 ,10) ,2)
Count = c (1 ,16 ,30 ,71 ,102 ,130 ,133 ,40 ,4 ,38 ,119 ,221 ,259 ,310 ,226 ,65)
Pop = c (172675 , 123065 , 96216 , 92051 , 72159 , 54722 , 32185 , 8328 ,181343 , 146207 , 121374 , 111353 , 83004 ,
55932 , 29007 , 7538)

melanoma = d a t a . frame ( City , AgeClass , AgeMid , Count , Pop )


melanoma $ resp = c b i n d ( melanoma $ Count , melanoma $ Pop - melanoma $ Count )

# P l o t t h e i n c i d e n c e v e r s u s t h e m i d p o i n t o f t h e age c a t e g o r i e s
p l o t ( AgeMid , Count / Pop , pch =1+9* a s . n u m e r i c ( City == " msp " ) )
l e g e n d ( x =20 , y =0.008 , pch = c (1 ,10) , l e g e n d = c ( " msp " ," dfw " ) )

# F i t a b i n o m i a l ( l o g i s t i c ) r e g r e s s i o n model
fit . b i n o m i a l = glm ( resp ~ City + l o g ( AgeMid ) , f a m i l y = b i n o m i a l ( l i n k = logit ) , d a t a = melanoma )
summary( fit . b i n o m i a l )

# F i t a P o i s s o n ( l o g−l i n e a r ) r e g r e s s i o n model
fit . p o i s s o n = glm ( Count ~ City + l o g ( AgeMid ) + o f f s e t ( l o g ( Pop ) ) , f a m i l y = p o i s s o n , d a t a = melanoma )
summary( fit . p o i s s o n )

# F i t t e d OR, RR , and p r e d i c t e d c o u n t s f o r i n t e r p r e t a t i o n
exp ( - fit . b i n o m i a l $ c o e f f [2])
exp ( - fit . p o i s s o n $ c o e f f [2])
expit (sum( fit . b i n o m i a l $ c o e f f * c (1 ,0 , l o g (30) ) ) ) *146207
exp (sum( fit . p o i s s o n $ c o e f f * c (1 ,0 , l o g (30) ) ) ) *146207

7/1
Binomial Model
 
πi
log = β0 + β1 xi1 + β2 xi2
1 − πi
> summary( fit . b i n o m i a l )

Call :
glm ( f o r m u l a = resp ˜ City + l o g ( AgeMid ) , f a m i l y = b i n o m i a l ( l i n k = logit ) , d a t a = melanoma )

Deviance Residuals :
Min 1Q Median 3Q Max
-3.2539 -0.9718 -0.2358 0.2556 2.0784

Coefficients :
Estimate Std . Error z value Pr ( >| z |)
( Intercept ) -19.49446 0.34284 -56.86 <2e -16 ∗∗∗
Citymsp -0.81288 0.05226 -15.55 <2e -16 ∗∗∗
l o g ( AgeMid ) 3.35654 0.08298 40.45 <2e -16 ∗∗∗
---
Signif . codes : 0 ∗∗∗ 0.001 ∗∗ 0.01 ∗ 0.05 . 0.1 1

( Dispersion parameter for b i n o m i a l f a m i l y taken to be 1)

Null deviance : 2794.779 on 15 degrees of freedom


Residual deviance : 40.335 on 13 degrees of freedom
AIC : 140.54

Number of Fisher Scoring iterations : 4


8/1
Poisson Model

log µi = α0 + α1 xi1 + α2 xi2 + log(Pop)


> summary( fit . p o i s s o n )

Call :
glm ( f o r m u l a = Count ˜ City + l o g ( AgeMid ) + o f f s e t ( l o g ( Pop ) ) , f a m i l y = p o i s s o n , d a t a = melanoma )

Deviance Residuals :
Min 1Q Median 3Q Max
-3.2690 -0.9863 -0.2324 0.2620 2.0957

Coefficients :
Estimate Std . Error z value Pr ( >| z |)
( Intercept ) -19.46463 0.34207 -56.90 <2e -16 ∗∗∗
Citymsp -0.81014 0.05218 -15.53 <2e -16 ∗∗∗
l o g ( AgeMid ) 3.34813 0.08277 40.45 <2e -16 ∗∗∗
---
Signif . codes : 0 ∗∗∗ 0.001 ∗∗ 0.01 ∗ 0.05 . 0.1 1

( Dispersion parameter for p o i s s o n f a m i l y taken to be 1)

Null deviance : 2790.340 on 15 degrees of freedom


Residual deviance : 40.819 on 13 degrees of freedom
AIC : 141.06

Number of Fisher Scoring iterations : 4

9/1
Example: Non Melanoma Skin Cancer
1. What is the OR and RR for developing non melanoma skin cancer for women in
Dallas-Forth Worth versus those in Minneapolis-St Paul, controlling for age?
OR
d = exp(−β̂1 ) = exp(0.81288) = 2.254381

RR
c = exp(−α̂1 ) = exp(0.81014) = 2.248222

2. What is the predicted number of skin cancer cases in Dallas-Fort Worth among
women age 25-34?
Ŷi = π̂i mi = expit(β̂0 + β̂2 log(30))(146207) = 45.34367

µ̂i = π̂i mi = exp(α̂0 + α̂2 log(30))(146207) = 45.41334

10/1
2. Time Non Homogeneous Poisson Processes
• Now consider
E [N(t)] = µ(t) = λ(t) (not = λt)
• The rate is now a function of time
• Lots of possible ways to model the rate λ(t)
• Piecewise constant

λ(t) = b1 I [0 < t < t1 ] + b2 I [t1 ≤ t < t2 ] + ...


• Piecewise linear

λ(t) = (m1 t + b1 )I [0 < t < t1 ] + (m2 t + b2 )I [t1 ≤ t < t2 ] + ...


• Quadratic
λ(t) = at 2 + bt + c
• Splines, etc
• Consider an example data analysis to explore this further
11/1
Example: Rat Tumor Data
Rat Tumor Data
• Here we consider data from a study of the development of mammary tumors in
rats reported in Gail et al. (1980)
• This study was a carcinogenicity experiment in which 48 rats were exposed to a
carcinogen
• 23 were then assigned to a treatment group where the treatment was designed to
reduce the development of tumours
• 25 were assigned to the control group
• The rats were carefully examined over 122 days for the development of new
tumors (multiple tumors could develop)
• The day (time) of each tumor was recorded
• Our aim here is to estimate the expected number of tumors in the two groups and
make treatment comparisons

12/1
Times to tumors in days (number of tumors detected)
Treatment Group Control Group
ID Days of Tumor Detection ID Days of Tumor Detection
(2)
1 122 1 3, 42, 59, 61 , 112, 119
2 - 2 28, 31, 35, 45, 52, 59(2) , 77, 85, 107, 112
3 3, 88 3 31, 38, 48, 52, 74, 77, 101(2) , 119
4 92 4 11, 114
5 70, 74, 85, 92 5 35, 45, 74(2) , 77, 80, 85, 90(2)
6 38, 92, 122 6 8(2) , 70, 77
7 28, 35, 45, 70, 77, 107 7 17, 35, 52, 77, 101, 114
8 92 8 21, 24, 66, 74, 101(2) , 114
9 21 9 8, 17, 38, 42(3)
10 11, 24, 66, 74, 92 10 52
11 56, 70 11 28(2) , 31, 38, 52, 74(2) , 77(2) , 80(2) , 92(2)
12 31 12 17, 119
13 3, 8, 24, 35, 92 13 52
14 45, 92 14 11(2) , 14, 17, 52, 56(2) , 80(2) , 107
15 3, 42, 92 15 17, 35, 66, 90
16 3, 17, 52, 80 16 28, 66, 70(2) , 74
17 17, 59, 92, 101, 107 17 3, 14, 24(2) , 28, 31, 35, 48, 74, 77, 119
18 45, 52, 85, 101, 122 18 21, 28, 45, 56, 63, 80, 85, 92, 101(2) , 119
19 92 19 28, 35, 52, 59, 66(2) , 90, 97, 119
20 21, 35 20 8(2) , 24, 42, 45, 59, 63(2) , 77, 101, 119, 122
21 24, 31, 42, 48, 70, 74 21 80
22 - 22 92, 122(2)
23 31 23 21
24 3, 28, 74
25 24, 74, 122
Timeline plots for data from Gail et al. (1980)
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
ï
Rat Tumor Data Structure
The entries in the data frame rats.pw for the first five rats in the treated group are
given below:
> rats.pw[1:20,]
id interval count len trt
1 1 1 0 30 1
2 1 2 0 30 1
3 1 3 0 30 1
4 1 4 1 32 1
5 2 1 0 30 1 • Consider four time intervals
6 2 2 0 30 1
7 2 3 0 30 1 • One line of data per interval
8 2 4 0 32 1
9
10 3
3 1
2
1 30
0 30
1
1
• count = number events in interval
11 3 3 1 30 1
12 3 4 0 32 1 • len = days spent in interval
13 4 1 0 30 1
14 4 2 0 30 1 • trt = treatment group
15 4 3 0 30 1
16 4 4 1 32 1
17 5 1 0 30 1
18 5 2 0 30 1
19 5 3 3 30 1
20 5 4 1 32 1

15/1
R Code I
rats = r e a d . t a b l e ( " rats . dat " , header = F )
dimnames ( rats ) [[2]] = c ( " id " ," start " ," stop " ," status " ," enum " ," trt " )

# f u n c t i o n t o c o v e r t d a t a t o s t r u c t u r e we want f o r analysis
gd . pw . f = f u n c t i o n ( indata ) {
pid = s o r t ( u n i q u e ( indata $ id ) )

d a t a = m a t r i x (0 , nrow =( l e n g t h ( pid )∗4) , n c o l =5)


f o r ( i in 1: l e n g t h ( pid ) ) {
tmp = indata [ indata $ id == pid [ i ] , ]
etime = f l o o r ( tmp $ s t o p [ tmp $ s t a t u s == 1] )

startpos = 4∗(i -1) + 1


stoppos = 4∗i
d a t a [ startpos : stoppos ,1] = r e p ( pid [ i ] ,4)
d a t a [ startpos : stoppos ,2] = c (1 ,2 ,3 ,4)
d a t a [ startpos : stoppos ,3] = c (sum(( etime > 0) & ( etime <= 30) ) ,
sum(( etime > 30) & ( etime <= 60) ) ,
sum(( etime > 60) & ( etime <= 90) ) ,
sum(( etime > 90) & ( etime <= 122) ) )
d a t a [ startpos : stoppos ,4] = c (30 ,30 ,30 ,32)
d a t a [ startpos : stoppos ,5] = r e p ( u n i q u e ( tmp $ trt ) ,4)
}
d a t a = d a t a . frame ( d a t a )
dimnames ( d a t a ) [[2]] = c ( " id " ," interval " ," count " ," len " ," trt " )
r e t u r n ( data )
}
rats . pw = gd . pw . f ( rats )

16/1
R Code II
rats . pw [1:20 ,]

# Model C o n t r o l Group Only ( p f i t C )


pfitC = glm ( c o u n t ˜ o f f s e t ( l o g ( len ) ) + f a c t o r ( interval ) , f a m i l y = p o i s s o n ( l i n k = l o g ) , d a t a = rats . pw , s u b s e t
=( trt ==0) , epsilon = 1e -004)
summary( pfitC , corr = F )

# Model C o n t r o l and T r e a t m e n t G r o u p s ( p f i t )
pfit = glm ( c o u n t ˜ o f f s e t ( l o g ( len ) ) + f a c t o r ( interval ) + trt , f a m i l y = p o i s s o n ( l i n k = l o g ) , d a t a = rats . pw ,
epsilon = 1e -004)
summary( pfit , corr = F )

# Time Homogeneous Model ( f i t )


fit = glm ( c o u n t ˜ o f f s e t ( l o g ( len ) ) + trt , f a m i l y = p o i s s o n ( l i n k = l o g ) , d a t a = rats . pw , epsilon =
0.0001)
summary( fit , corr = F )
1 - p c h i s q ( fit $ d e v i a n c e - pfit $ d e v i a n c e , fit $ df . residual - pfit $ df . residual )

# Time non homogeneous Model w i t h T r e a t m e n t I n t e r a c t i o n ( i f i t )


ifit = glm ( c o u n t ˜ o f f s e t ( l o g ( len ) ) + f a c t o r ( interval ) ∗ trt , f a m i l y = p o i s s o n ( l i n k = l o g ) , d a t a = rats .
pw , epsilon = 0.0001)
summary( ifit , corr = F )
1 - p c h i s q ( pfit $ d e v i a n c e - ifit $ d e v i a n c e , pfit $ df . residual - ifit $ df . residual )

17/1
1. Model Control Group Only (pfitC)

• To start we fit a piecewise constant model for control rats

log µi = xT
i β + offset(log ti )

• interval is a categorical variable at 4 levels:

xi1 = I [interval 2]
xi2 = I [interval 3]
xi3 = I [interval 4]

• Include offset(log(len)) to account for the fact that different intervals are of
different durations

18/1
> pfitC = glm ( c o u n t ˜ o f f s e t ( l o g ( len ) ) + f a c t o r ( interval ) , f a m i l y = p o i s s o n ( l i n k = l o g ) , d a t a = rats . pw ,
s u b s e t =( trt ==0) , epsilon = 1e -004)
> summary( pfitC , corr = F )

Call :
glm ( f o r m u l a = c o u n t ˜ o f f s e t ( l o g ( len ) ) + f a c t o r ( interval ) , f a m i l y = p o i s s o n ( l i n k = l o g ) ,
d a t a = rats . pw , s u b s e t = ( trt == 0) , epsilon = 1e -04)

Deviance Residuals :
Min 1Q Median 3Q Max
-1.91834 -1.57481 -0.27360 0.62620 2.89586

Coefficients :
Estimate Std . Error z value Pr ( >| z |)
( Intercept ) -3.09371 0.17137 -18.0530 <2e -16 ∗∗∗
f a c t o r ( interval ) 2 0.16254 0.23284 0.6981 0.4851
f a c t o r ( interval ) 3 0.30228 0.22593 1.3380 0.1809
f a c t o r ( interval ) 4 -0.15690 0.24794 -0.6328 0.5268
---
Signif . codes : 0 ∗∗∗ 0.001 ∗∗ 0.01 ∗ 0.05 . 0.1 1

( Dispersion parameter for p o i s s o n f a m i l y taken to be 1)

Null deviance : 167.786 on 99 degrees of freedom


Residual deviance : 163.308 on 96 degrees of freedom
AIC : 345.249

Number of Fisher Scoring iterations : 4


Interpretation of pfitC

log µi = β0 + β1 xi1 + β2 xi2 + β3 xi3 +offset(log ti )


| {z }
interval

• Relative Rate of events in interval 2 versus interval 1:

λ(interval 2)
exp(β1 ) = = exp(0.16254) = 1.176
λ(interval 1)
• Notice none of β1 , β2 , β3 are statistically significant
• There is a trend of a slightly higher rate in intervals 2 and 3 (versus interval 1) but
the event rate does not differ significantly across follow-up time in the control rats

20/1
2. Model Control and Treatment Groups (pfit)

• Now fit a model to both the treatment and control groups


• xi4 = I [treatment group]
• Assume a piecewise constant baseline rate function
• Model is now:

log µi = β0 + β1 xi1 + β2 xi2 + β3 xi3 +β4 xi4 + offset(log ti )


| {z }
interval

• exp(β1 ) is now RR of events for interval 2 versus interval 1, for two rats of the
same treatment group

21/1
> pfit = glm ( c o u n t ˜ o f f s e t ( l o g ( len ) ) + f a c t o r ( interval ) + trt , f a m i l y = p o i s s o n ( l i n k = l o g ) , d a t a = rats . pw ,
epsilon = 1e -004)
> summary( pfit , corr = F )

Call :
glm ( f o r m u l a = c o u n t ˜ o f f s e t ( l o g ( len ) ) + f a c t o r ( interval ) + trt ,
f a m i l y = p o i s s o n ( l i n k = l o g ) , d a t a = rats . pw , epsilon = 1e -04)

Deviance Residuals :
Min 1Q Median 3Q Max
-1.83356 -1.19944 -0.33024 0.47007 3.05508

Coefficients :
Estimate Std . Error z value Pr ( >| z |)
( Intercept ) -3.088179 0.150548 -20.5130 < 2.2 e -16 ∗∗∗
f a c t o r ( interval ) 2 0.171860 0.195454 0.8793 0.3792
f a c t o r ( interval ) 3 0.206350 0.193905 1.0642 0.2872
f a c t o r ( interval ) 4 -0.064533 0.203686 -0.3168 0.7514
trt -0.822988 0.151130 -5.4456 5.164 e -08 ∗∗∗
---
Signif . codes : 0 ∗∗∗ 0.001 ∗∗ 0.01 ∗ 0.05 . 0.1 1

( Dispersion parameter for p o i s s o n f a m i l y taken to be 1)

Null deviance : 301.371 on 191 degrees of freedom


Residual deviance : 266.323 on 187 degrees of freedom
AIC : 547.754
Interpretation of pfit

• Relative Rate of events for treatment versus control rats

λ(treatment)
exp(β4 ) = = exp(−0.8230) = 0.44
λ(control)
• Controlling for interval of follow-up, the rate of tumor development in treated rats
in 0.44 times that of control rats
• i.e. Treatment looks beneficial
• Note that β4 is statistically significant
• β1 , β2 , β3 still not statistically significant

• Consider do we really need to use a time non homogeneous model for this data?

23/1
3. Time Homogeneous Model (fit)

log µi = β0 + β4 xi4 + offset(log ti )

• β0 = log rate of tumor development, per day, control group


• β4 = log Relative Rate (RR) of tumor development in treated vs control rats

• This model is nested within the time non homogeneous model


• Consider pfit model with β1 = β2 = β3 = 0
• We can carry out a likelihood ratio test

24/1
> fit = glm ( c o u n t ˜ o f f s e t ( l o g ( len ) ) + trt , f a m i l y = p o i s s o n ( l i n k = l o g ) , d a t a = rats . pw , epsilon =
0.0001)
> summary( fit , corr = F )

Call :
glm ( f o r m u l a = c o u n t ˜ o f f s e t ( l o g ( len ) ) + trt , f a m i l y = p o i s s o n ( l i n k = l o g ) ,
d a t a = rats . pw , epsilon = 1e -04)

Deviance Residuals :
Min 1Q Median 3Q Max
-1.78004 -1.14210 -0.42348 0.40094 3.26727

Coefficients :
Estimate Std . Error z value Pr ( >| z |)
( Intercept ) -3.005609 0.081219 -37.0062 < 2.2 e -16 ∗∗∗
trt -0.822995 0.151160 -5.4445 5.194 e -08 ∗∗∗
---
Signif . codes : 0 ∗∗∗ 0.001 ∗∗ 0.01 ∗ 0.05 . 0.1 1

( Dispersion parameter for p o i s s o n f a m i l y taken to be 1)

Null deviance : 301.371 on 191 degrees of freedom


Residual deviance : 269.060 on 190 degrees of freedom
AIC : 544.491

Number of Fisher Scoring iterations : 4


Interpretation of fit

• Note β̂4 = −0.8230 is almost unchanged versus model fit

• LIkelihood Ratio/Deviance test of H0 : β1 = β2 = β3 = 0

∆D = D0 − DA = 269.060 − 266.323 ∼ χ2(3) under H0

> 1 - p c h i s q ( fit $ d e v i a n c e - pfit $ d e v i a n c e , fit $ d f . r e s i d u a l - pfit $ d f . r e s i d u a l )


[1] 0.43400775

• No not reject H0
• Conclude that the time homogeneous model is probably OK in this case
• However, we retain it for generality and for the following analysis

26/1
4. Time Non Homogeneous Model with Treatment Interaction (ifit)

• Q: Is the treatment effect constant over time?


• Model with interaction:
interval treatment
z }| { z }| {
log µi = β0 + β1 xi1 + β2 xi2 + β3 xi3 + β4 xi4
+ β5 xi1 xi4 + β6 xi2 xi4 + β7 xi3 xi4 + log ti
| {z }
interval∗treatment

• Model pfit (time non homogeneous, without interaction) is nested within this
model (consider ifit with β5 = β6 = β7 = 0)

27/1
> ifit = glm ( c o u n t ˜ o f f s e t ( l o g ( len ) ) + f a c t o r ( interval ) ∗ trt , f a m i l y = p o i s s o n ( l i n k = l o g ) , d a t a =
rats . pw , epsilon = 0.0001)
> summary( ifit , corr = F )

Call :
glm ( f o r m u l a = c o u n t ˜ o f f s e t ( l o g ( len ) ) + f a c t o r ( interval ) ∗ trt , f a m i l y = p o i s s o n ( l i n k = l o g ) , d a t a =
rats . pw , epsilon = 1e -04)

Deviance Residuals :
Min 1Q Median 3Q Max
-1.91834 -1.21584 -0.32409 0.51249 2.89586

Coefficients :
Estimate Std . Error z value Pr ( >| z |)
( Intercept ) -3.093712 0.171368 -18.0530 < 2e -16 ∗∗∗
f a c t o r ( interval ) 2 0.162536 0.232840 0.6981 0.48514
f a c t o r ( interval ) 3 0.302284 0.225926 1.3380 0.18090
f a c t o r ( interval ) 4 -0.156902 0.247935 -0.6328 0.52684
trt -0.803850 0.316133 -2.5428 0.01100 ∗
f a c t o r ( interval ) 2: trt 0.031556 0.428219 0.0737 0.94126
f a c t o r ( interval ) 3: trt -0.376186 0.443551 -0.8481 0.39637
f a c t o r ( interval ) 4: trt 0.286455 0.436610 0.6561 0.51177
---
Signif . codes : 0 ∗∗∗ 0.001 ∗∗ 0.01 ∗ 0.05 . 0.1 1

( Dispersion parameter for p o i s s o n f a m i l y taken to be 1)

Null deviance : 301.371 on 191 degrees of freedom


Residual deviance : 263.917 on 184 degrees of freedom
AIC : 551.349

Number of Fisher Scoring iterations : 4


Interpretation of ifit

• Note β̂4 = −0.8038 is very similar to pfit

• LIkelihood Ratio/Deviance test of H0 : β5 = β6 = β7 = 0

∆D = D0 − DA = 266.323 − 263.917 ∼ χ2(3) under H0

> 1 - p c h i s q ( pfit $ d e v i a n c e - ifit $ d e v i a n c e , pfit $ d f . r e s i d u a l - ifit $ d f . r e s i d u a l )


[1] 0.4926147

• No not reject H0
• We do not have evidence that the treatment effect varies across the time intervals

29/1
Summary of Rat Tumor Data Analysis

• Looks like a piecewise constant rate function is not necessary


• The best model (of the ones we examined) is fit

log µi = β0 + β4 xi4 + offset(log ti )

• Interpretation: The relative rate for tumor development in treated versus control
rats is
exp(β̂4 ) = exp(−0.822995) = 0.439
• i.e. treatment is beneficial (treated rates get fewer tumors)
• Prediction: Expected number of tumors for a control rat observed for 70 days?

30/1

You might also like