What is Binomial data?


•A linear model approach for binomial data:

$y_x / n_x = X\beta + \epsilon, \quad \epsilon \sim N(0, \sigma^2 I)$

Q: when is the approach appropriate?

The pmf shape of the joint distribution of $y_x / n_x$ is similar to the pdf shape of $N(X\beta, \sigma^2 I)$

Some problems with this approach:

Predicted probabilities may be > 1 or < 0

The normal approximation may be too much of a stretch when the $n_i$'s are not large or $p_x$ is close to 1 or 0

The variance of a binomial is not constant

Some of these problems can be corrected by using transformation and weighting
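To see why the constant-variance assumption fails, it may help to recall the mean and variance of a binomial proportion. This is a standard result, written here in the notation of these notes:

```latex
E\!\left(\frac{y_x}{n_x}\right) = p_x,
\qquad
\mathrm{Var}\!\left(\frac{y_x}{n_x}\right) = \frac{p_x (1 - p_x)}{n_x}
```

The variance changes with both $p_x$ and $n_x$, so in general it cannot equal a single $\sigma^2$ for all observations. It also shrinks toward 0 as $p_x$ approaches 0 or 1.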


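•Recall: linear model

model description 1: $Y = X\beta + \epsilon, \quad \epsilon \sim N(0, \sigma^2 I)$

model description 2: $Y \sim N(X\beta, \sigma^2 I)$, i.e., $Y \sim N(\mu_x, \Sigma_x)$ with $\mu_x = \eta_x$ (where $\eta_x \equiv X\beta$) and $\Sigma_x = \sigma^2 I$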

Q: which description can be generalized to binomial data?

•3 components in a generalized linear model (binomial example)

distribution of the response: $y_x \sim B(n_x, p_x)$

linear predictor: $X\beta = \sum_{i=1}^{p} \beta_i \cdot h_i(X_1, \ldots, X_m) \equiv \eta_x, \quad -\infty < \eta_x < \infty$

link function $g$: $g$ monotone and $\eta_x = g(p_x)$ [for binomial, $g: (0, 1) \to (-\infty, \infty)$]

•Common choices of link function for binomial data

Logit: $\eta_x = \log(p_x / (1 - p_x))$

Probit: $\eta_x = \Phi^{-1}(p_x)$, where $\Phi$ is the cdf of the standard normal $N(0, 1)$

Generalized Linear Model for Binomial Data

NTHU STAT 5230, 2011 Lecture Notes

made by Shao-Wei Cheng (NTHU)


Complementary log-log: $\eta_x = \log(-\log(1 - p_x))$

Logit is close to the complementary log-log when $p_x$ is small

Logit is close to probit (up to a rescaling of $\eta_x$) when $0.1 < p_x < 0.9$

(exercise: plot & check the difference between the 3 functions)
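As a starting point for the exercise, here is a minimal sketch that tabulates the three links using only the Python standard library. The function names `logit`, `probit`, and `cloglog` are illustrative choices, not taken from the notes; the last column is the ratio logit/probit, to make the "up to rescaling" comparison visible.

```python
import math
from statistics import NormalDist

def logit(p):
    """Logit link: log(p / (1 - p))."""
    return math.log(p / (1.0 - p))

def probit(p):
    """Probit link: inverse cdf of the standard normal."""
    return NormalDist().inv_cdf(p)

def cloglog(p):
    """Complementary log-log link: log(-log(1 - p))."""
    return math.log(-math.log(1.0 - p))

# Tabulate the three links on a small grid of probabilities.
print(f"{'p':>6} {'logit':>9} {'probit':>9} {'cloglog':>9} {'logit/probit':>13}")
for p in (0.01, 0.05, 0.1, 0.3, 0.7, 0.9, 0.95, 0.99):
    ratio = logit(p) / probit(p)  # p = 0.5 is skipped: both links are 0 there
    print(f"{p:>6.2f} {logit(p):>9.4f} {probit(p):>9.4f} {cloglog(p):>9.4f} {ratio:>13.4f}")
```

Feeding the same grid to a plotting library would complete the exercise; the table alone already shows where the links agree and where they diverge.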

•Log-likelihood of the GLM (use logit link as an example):
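A sketch of the standard form, assuming $y_x \sim B(n_x, p_x)$ independently and $\eta_x = \log(p_x / (1 - p_x))$ as defined above. It follows from writing the binomial log-pmf in terms of $\eta_x$:

```latex
l(\beta)
  = \sum_x \left[ y_x \log p_x + (n_x - y_x) \log(1 - p_x) + \log \binom{n_x}{y_x} \right]
  = \sum_x \left[ y_x \eta_x - n_x \log\left(1 + e^{\eta_x}\right) + \log \binom{n_x}{y_x} \right]
```

The second equality uses $p_x = e^{\eta_x} / (1 + e^{\eta_x})$, so that $\log(1 - p_x) = -\log(1 + e^{\eta_x})$. The binomial coefficient does not depend on $\beta$ and can be dropped when maximizing.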

•Estimation of $\beta$

Recall: in the linear model, the LS estimator of $\beta$ is also the MLE

For GLM, the concept of LS no longer works $\Rightarrow$ can still adopt the method of MLE (maximize $l(\beta)$ as a function of $\beta$)

Usually there is no explicit formula for the MLE of $\beta$

An algorithm to perform the maximization will be discussed in a future lecture
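To make "maximize $l(\beta)$ numerically" concrete, here is a minimal sketch for a two-parameter logit model $\eta = \beta_0 + \beta_1 x$. It uses plain Newton-Raphson as a generic optimizer, which is not necessarily the algorithm the later lecture presents. The data are made up purely for illustration, and all function names are hypothetical.

```python
import math

# Made-up grouped data (illustration only): covariate x, trials n, successes y.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
n = [20, 20, 20, 20, 20]
y = [2, 5, 9, 14, 18]

def loglik(b0, b1):
    """Binomial log-likelihood with logit link, dropping the constant log C(n, y)."""
    total = 0.0
    for xi, ni, yi in zip(x, n, y):
        eta = b0 + b1 * xi
        total += yi * eta - ni * math.log1p(math.exp(eta))
    return total

def score_and_info(b0, b1):
    """Gradient of loglik and the 2x2 information matrix (minus the Hessian)."""
    g0 = g1 = 0.0
    i00 = i01 = i11 = 0.0
    for xi, ni, yi in zip(x, n, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        r = yi - ni * p            # observed minus expected successes
        w = ni * p * (1.0 - p)     # binomial variance at the current fit
        g0 += r
        g1 += r * xi
        i00 += w
        i01 += w * xi
        i11 += w * xi * xi
    return (g0, g1), (i00, i01, i11)

def fit(tol=1e-10, max_iter=50):
    """Newton-Raphson iterations beta <- beta + info^{-1} * score, starting at 0."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        (g0, g1), (i00, i01, i11) = score_and_info(b0, b1)
        det = i00 * i11 - i01 * i01
        d0 = (i11 * g0 - i01 * g1) / det   # explicit 2x2 solve
        d1 = (i00 * g1 - i01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

b0_hat, b1_hat = fit()
print("beta0_hat =", b0_hat, "beta1_hat =", b1_hat)
print("maximized log-likelihood (constant dropped) =", loglik(b0_hat, b1_hat))
```

At the returned estimate the score is numerically zero, which is the first-order condition for the MLE.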


Inference

•RSS and deviance

Recall: in the linear model, RSS plays a critical role in inference

Q: In GLM, what is the concept similar to RSS?

consider two models

a larger model L with $l$ parameters and likelihood $L_L$

a smaller model S with $s$ parameters and likelihood $L_S$

S is nested in L ($S \subset L$)

To test $H_0$: S vs. $H_1: L \setminus S$, likelihood methods suggest the likelihood ratio statistic:

$2 \log\left( L_L(\hat{\beta}_L) / L_S(\hat{\beta}_S) \right)$

suppose that L is a saturated larger model; the test statistic becomes:

$D = 2 \sum_i \left\{ y_i \log(y_i / \hat{y}_i) + (n_i - y_i) \log\left( (n_i - y_i) / (n_i - \hat{y}_i) \right) \right\}$

where $\hat{y}_i = n_i \hat{p}_i$ is the fitted value from the smaller model

D is called deviance, which plays a role similar to RSS
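A minimal sketch of computing the binomial deviance from observed counts and fitted probabilities. It assumes the standard binomial deviance formula, with the convention $0 \cdot \log 0 = 0$ for observations where $y_i = 0$ or $y_i = n_i$. The function names and the example numbers are hypothetical.

```python
import math

def _term(a, b):
    """Return a * log(a / b), using the convention 0 * log(0) = 0."""
    return a * math.log(a / b) if a > 0 else 0.0

def binomial_deviance(y, n, p_hat):
    """Deviance of a fitted binomial model against the saturated model.

    y: observed successes, n: numbers of trials,
    p_hat: fitted success probabilities from the (smaller) model.
    """
    d = 0.0
    for yi, ni, pi in zip(y, n, p_hat):
        fitted = ni * pi  # fitted number of successes
        d += _term(yi, fitted) + _term(ni - yi, ni - fitted)
    return 2.0 * d

# Made-up counts, with a fit that is deliberately off for illustration.
y_obs = [3, 6, 8]
n_obs = [10, 10, 10]
print(binomial_deviance(y_obs, n_obs, [0.4, 0.5, 0.7]))
# The saturated fit p_hat = y / n reproduces the data, so its deviance is 0.
print(binomial_deviance(y_obs, n_obs, [0.3, 0.6, 0.8]))
```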


Since the saturated model fits as well as any model can fit, the

deviance D measures how close the (smaller) model comes to

perfection.

Deviance can be treated as a measure of goodness of fit

•Suppose that $y_i$ is truly binomial and that the $n_i$ are relatively large; then $D$ is approximately $\chi^2_{k-s}$ distributed if the (smaller) model is correct, where $k$ is the number of binomial observations $\Rightarrow$ can use the deviance to test whether the model is an adequate fit

The chi-square distribution is only an approximation that becomes more accurate as the $n_i$ increase [often suggested: $n_i > 5$]

•Use deviance to compare two models S and L, S nested in L

Larger model L: deviance $D_L$ and $df_L$ ($= k - l$)

Smaller model S: deviance $D_S$ and $df_S$ ($= k - s$)

To test $H_0$: S vs. $H_1: L \setminus S$, the test statistic is $D_S - D_L$, which is asymptotically distributed as $\chi^2_{l-s}$

•In terms of the accuracy of the distributional approximation, this test is better than the goodness-of-fit test
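A minimal sketch of the nested-model comparison: the difference in deviances is referred to a chi-square distribution whose degrees of freedom equal the difference in residual df. The upper tail probability is computed with the standard library only, via the recurrence for the regularized incomplete gamma function, and is valid for positive integer df. The function names and the deviance values are hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper tail probability P(chi-square_df > x) for a positive integer df."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)              # df = 2: Q(1, z) = exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))   # df = 1: Q(1/2, z) = erfc(sqrt(z))
    # Recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    while 2 * a < df:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q

def compare_nested(dev_small, df_small, dev_large, df_large):
    """Deviance-difference test of the smaller model S against the larger model L."""
    stat = dev_small - dev_large
    df = df_small - df_large      # equals l - s
    return stat, df, chi2_sf(stat, df)

# Made-up deviances, for illustration only.
stat, df, p_value = compare_nested(dev_small=25.3, df_small=8, dev_large=12.1, df_large=6)
print("D_S - D_L =", stat, "on", df, "df; p-value =", p_value)
```

The same `chi2_sf` can serve the goodness-of-fit test by passing a single deviance with its residual df.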


•(Wald's test) alternative test for $H_0: \beta_i = 0$

test statistic: z-value, $z = \hat{\beta}_i / se(\hat{\beta}_i)$

Can be generalized to $H_0: \beta_i = c$ or $H_0: \beta = c$

Asymptotic null distribution: $N(0, 1)$

in contrast to the normal linear model, these two statistics (deviance-based and Wald's tests) are not identical

Hauck-Donner effect (see Hauck and Donner, 1977): for sparse data (i.e., many $n_i$'s $= 1$ or small), the standard errors can be overestimated, so the z-value is too small and the significance of an effect could be missed

$\Rightarrow$ therefore, the deviance-based test is preferred
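To illustrate that the two tests differ, here is a minimal sketch for the simplest case: a single binomial observation $y \sim B(n, p)$ with $\text{logit}(p) = \beta$, testing $H_0: \beta = 0$. In this one-parameter model the MLE and its standard error have closed forms, $\hat{\beta} = \text{logit}(y/n)$ and $se = 1/\sqrt{n \hat{p}(1 - \hat{p})}$. The function name and the counts are hypothetical.

```python
import math

def wald_and_lr(y, n):
    """Wald z and likelihood-ratio statistic for H0: beta = 0 in logit(p) = beta.

    Requires 0 < y < n so that the MLE is finite.
    """
    p_hat = y / n
    beta_hat = math.log(p_hat / (1.0 - p_hat))
    se = 1.0 / math.sqrt(n * p_hat * (1.0 - p_hat))  # from the Fisher information
    z = beta_hat / se

    def loglik(p):
        return y * math.log(p) + (n - y) * math.log(1.0 - p)

    lr = 2.0 * (loglik(p_hat) - loglik(0.5))         # beta = 0 means p = 0.5
    return z, lr

# Made-up counts out of n = 100 trials.
for successes in (60, 90, 99):
    z, lr = wald_and_lr(successes, 100)
    print(f"y = {successes}: Wald z = {z:.3f}, z^2 = {z * z:.3f}, LR = {lr:.3f}")
```

In this example the likelihood-ratio statistic keeps growing as the data move away from the null, while the Wald z for y = 99 is smaller than for y = 90. That is the kind of behaviour that motivates preferring the deviance-based test.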


•$100(1 - \alpha)\%$ confidence interval

Relationship between confidence interval and test

Approach 1: (from Wald's test) $\hat{\beta}_i \pm z_{\alpha/2} \, se(\hat{\beta}_i)$

Approach 2: (profile likelihood-based method) $\{ \beta_i : 2[\, l(\hat{\beta}) - \max_{\beta_j, j \neq i} l(\beta) \,] \leq \chi^2_{1, \alpha} \}$, i.e., other $\beta_j$'s, $j \neq i$, set to the maximizing values

(recall: the computation of the C.I. for $\lambda$ in Box-Cox method)

the profile likelihood method is generally preferable for the same Hauck-Donner reason

A similar method can be generalized to construct a confidence region for several parameters
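A minimal sketch contrasting the two approaches in the simplest setting: one binomial observation with $\text{logit}(p) = \beta$. There are no other $\beta_j$'s here, so the profile likelihood is just the likelihood itself and the maximization over nuisance parameters is skipped. That is a simplifying assumption of this example. The interval endpoints are found by bisection on $2[l(\hat{\beta}) - l(\beta)] = \chi^2_{1, \alpha}$, and the function names and counts are hypothetical.

```python
import math
from statistics import NormalDist

def loglik(beta, y, n):
    """Log-likelihood of logit(p) = beta for y successes in n trials (constant dropped)."""
    return y * beta - n * math.log1p(math.exp(beta))

def wald_ci(y, n, alpha=0.05):
    """Approach 1: beta_hat +/- z_{alpha/2} * se(beta_hat)."""
    p_hat = y / n
    beta_hat = math.log(p_hat / (1.0 - p_hat))
    se = 1.0 / math.sqrt(n * p_hat * (1.0 - p_hat))
    zq = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return beta_hat - zq * se, beta_hat + zq * se

def likelihood_ci(y, n, alpha=0.05):
    """Approach 2 with no nuisance parameters: invert the likelihood-ratio test."""
    crit = NormalDist().inv_cdf(1.0 - alpha / 2.0) ** 2  # chi-square(1) quantile = z^2
    p_hat = y / n
    beta_hat = math.log(p_hat / (1.0 - p_hat))
    l_max = loglik(beta_hat, y, n)

    def excess(b):
        return 2.0 * (l_max - loglik(b, y, n)) - crit

    def root(lo, hi):
        # Bisection: excess has opposite signs at lo and hi.
        for _ in range(200):
            mid = (lo + hi) / 2.0
            if (excess(lo) < 0) == (excess(mid) < 0):
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2.0

    return root(beta_hat - 20.0, beta_hat), root(beta_hat, beta_hat + 20.0)

# Made-up data: 9 successes in 10 trials.
print("Wald CI:      ", wald_ci(9, 10))
print("Likelihood CI:", likelihood_ci(9, 10))
```

The Wald interval is symmetric about $\hat{\beta}$ by construction, while the likelihood-based interval follows the shape of the log-likelihood and need not be symmetric.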


