You are on page 1of 4

p.

3-3
•A linear model approach for binomial data:
y
x
/n
x
=X| +c, c ~ N (0, o
2
I )
 Q: when is the approach appropriate?
 The pmf shape of the joint distribution of y
x
/n
x
is similar to the pdf shape of N (X|, o
2
I )
Some problems with this approach
 Predicted probability may >1 or <0
 Normal approximation might be too
much a stretch when n
i
’s are not
large or p
x
~1/0
 Variance of Binomial is not constant
 Some of these problems could be
corrected by using transformation
and weighting
p. 3-4
•Recall: linear model
model description 1: Y =X| +c, c ~ N (0, o
2
I )
model description 2: Y ~ N (X|, o
2
I )
 Y ~ N (µ
x
, E
x
)

 µ
x
=q
x
, E
x
=o
2
I
Q: which description can be generalized to binomial data?
•3 components in a generalized linear model (binomial example)
 y
x
~B(n
x
, p
x
)

link functiong: g monotone and q
x
=g(p
x
) [for binimial,
g: (0, 1)÷(÷·, ·) ]
•Common choices of link function for binomial data
Logit: q
x
=log(p
x
/(1÷p
x
))
Probit: q
x
=u
÷1
(p
x
), where u is the cdf of Normal
Xβ =
p
i=1
β
i
· h
i
(X
1
, . . . , X
m
) ≡ η
x
Xβ =
p
i=1
β
i
· h
i
(X
1
, . . . , X
m
) ≡ η
x
, −∞< η
x
< ∞
Generalized Linear Model for Binomial Data
NTHU STAT 5230, 2011 Lecture Notes
made by Shao-Wei Cheng (NTHU)
p. 3-5
Complementary log-log: q
x
=log(÷log(1÷p
x
))
Logit is close to the complementary log-log when p
x
is small
Logit is close to probit when 0.1 <p
x
<0.9
(exercise: plot & check the difference between the 3 functions)
•Log-likelihood of the GLM (use logit link as an example):
•Estimation of |
Recall: in linear model, the LS estimator of | is also the MLE
For GLM, the concept of LS does not work any more ¬still
can adopt the method of MLE (maximize l(|) as a function of |)
Usually no explicit formula for MLE of |
An algorithm to perform the maximization will be discussed in
future lecture
p. 3-6
I nference
•RSS and deviance
Recall: in linear model, RSS play a critical role in inference
 Q: In GLM, what is the concept similar to RSS?
 consider two models
 a larger model L with l parameters and likelihood L
L
 a smaller model S with s parameters and likelihood L
S
 S is nested in L (S c L)
 To test H
0
: S v.s. H
1
:L\S, likelihood methods suggests the
likelihood ratio statistics:
2 log(L
L
/L
S
)
 suppose that L is a saturated larger model, the test statistic
becomes:
where is the fitted values from the smaller model
 D is called deviance, which plays a role similar to RSS
2 log(L
L
(
ˆ
β
L
)/L
S
(
ˆ
β
S
))
NTHU STAT 5230, 2011 Lecture Notes
made by Shao-Wei Cheng (NTHU)
p. 3-7
 Since the saturated model fits as well as any model can fit, the
deviance D measures how close the (smaller) model comes to
perfection.
 Deviance can be treated as a measure of goodness of fit
•Suppose that y
i
is truly binomial and that the n
i
are relatively large
 , if the (smaller) model is correct ¬can use the
deviance to test whether the model is an adequate fit
The chi-square distribution is only an approximation that
becomes more accurate as the n
i
increase [often suggest n
i
> 5]
•Use deviance to compare two models S and L, S nested inL
Larger model L: deviance D
L
and df
L
(=k÷l)
Smaller model S: deviance D
S
and df
S
(=k÷s)
To test H
0
: S v.s. H
1
:L\S, the test statistics is
D
S
÷ D
L
which is asymptotically distributed as
•In terms of the accuracy of dist. approx., test >goodness of fit
p. 3-8
•(Wald’s test) alternative test for H
0
: |
i
=0
Can be generalized to H
0
: |
i
=c or H
0
: | =c
Asymptotic null distribution: N(0, 1)
in contrast to normal linear model, these two statistics
(deviance-based and Wald’s tests) are not identical
Hauck-Donner effect (see Hauck and Donner, 1977): for
sparse data(i.e., many n
i
’s =1 or small), the standard errors
can be overestimatedand so the z-value is too small and the
significance of an effect could be missed
¬therefore, the deviance-based test is preferred
test statistics: z-value
NTHU STAT 5230, 2011 Lecture Notes
made by Shao-Wei Cheng (NTHU)
p. 3-9
•100(1÷o)% confidence interval
Relationship between confidence interval and test
Approach 1: (from Wald’s test)
Approach 2: (profile likelihood-based method)
other |
j
’s, j=i, set to the maximizing values
(recall: the computation of the C.I. for ì in Box-Cox method)
the profile likelihood method is generally preferable for the
same Hauck-Donner reason
Similar method can be generalized to construct confidence
region of several parameters
NTHU STAT 5230, 2011 Lecture Notes
made by Shao-Wei Cheng (NTHU)