離散資料分析 Categorical Data Analysis: 陳俞成 Email:ycchen@mail.chna.edu.tw

離散資料分析
Categorical Data Analysis
陳俞成
Email:ycchen@mail.chna.edu.tw
2005.10.24

Generalized Linear Models
I Using models as the basis of investigating effects
of explanatory variables on categorical response
variables

I Benefits of a good-fitting model

I The structural form of the model describes the
patterns of association and interaction.
I Inferences for model parameters help us evaluate
which explanatory variables affect the response, while
controlling effects of possible confounding variables.
I The size of the estimated model parameters determines
the strength and importance of the effects.
I The model’s predicted values smooth the data and
provide improved estimates of the mean of the
response distribution.

I Models can handle more complicated situations,
such as analyzing simultaneously the effects of
several explanatory variables.
I The model-building paradigm focuses on
estimating parameters that describe the effects,
which is more informative than mere significance
testing.
I The explanatory variables in the model can be
continuous or categorical or both types.

I The generalized linear model (GLM) is a broad class
of models that includes ordinary regression and
ANOVA models for continuous response variables
as well as models for categorical response
variables.

I §4.1 discusses three components that are common to all
GLMs.
I §4.2 introduces the logistic regression model for binary
response variables, appropriate for binomial data.
I §4.3 introduces the loglinear model for count-type response
variables modeled as Poisson data.
I §4.4 discusses checks of the adequacy of model fit for
GLMs, illustrating for Poisson loglinear models.
I §4.5 presents further details about the fitting and checking
of GLMs.

I The random component identifies the response
variable Y and assumes a probability distribution
for it.
I The systematic component specifies the
explanatory variables used as predictors in the
model.
I The link component describes the functional
relationship between the systematic component
and the expected value (mean) of the random
component.

Random Component
I The random component of a GLM consists of a
response variable Y with independent
observations $(y_1, \ldots, y_n)$ from a distribution in
the natural exponential family.
I The natural exponential family has probability
density function or mass function of form
$f(y_i; \theta_i) = a(\theta_i)\, b(y_i) \exp[y_i Q(\theta_i)]$.

Random Component
I The value of the parameter $\theta_i$ may vary for
$i = 1, \ldots, n$, depending on values of explanatory
variables.
I The term Q(θ) is called the natural parameter.

Binomial Logit Models for Binary Data
I Represent the success and failure outcomes by 1
and 0.
I The Bernoulli distribution for this Bernoulli trial
specifies probabilities P(Y = 1) = π and
P(Y = 0) = 1 − π, for which E(Y) = π.
I $f(y; \pi) = \pi^{y}(1-\pi)^{1-y} = (1-\pi)\left[\frac{\pi}{1-\pi}\right]^{y} = (1-\pi)\exp\left(y \log\frac{\pi}{1-\pi}\right)$ for y = 0 and 1.

Binomial Logit Models for Binary Data
I This is in the natural exponential family,
identifying θ = π, a(π) = 1 − π, b(y) = 1, and
Q(π) = log[π/(1 − π)].
I The natural parameter log[π/(1 − π)] is the log
odds of response 1, the logit of π.
I GLMs using the logit link are often called logit
models.

Poisson Loglinear Models for Count Data
I Poisson variates can take any nonnegative
integer value.
I The Poisson probability mass function for Y is
$f(y; \mu) = \frac{e^{-\mu}\mu^{y}}{y!} = \exp(-\mu)\left(\frac{1}{y!}\right)\exp(y \log \mu), \quad y = 0, 1, 2, \ldots$

Poisson Loglinear Models for Count Data
I This has natural exponential form with
θ = µ, a(µ) = exp(−µ), b(y) = 1/y!, and
Q(µ) = log µ.
I The natural parameter is log µ.
I GLMs using the log link are often called Poisson
loglinear models.
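
As a quick numerical check of the Bernoulli and Poisson factorizations, the following sketch evaluates each mass function both directly and in the form a(θ)b(y)exp[yQ(θ)] and confirms that the two agree. The function names and the parameter values 0.3 and 2.5 are illustrative assumptions, not part of the original slides.

```python
import math

def bernoulli_direct(y, pi):
    """pi^y * (1 - pi)^(1 - y) for y in {0, 1}."""
    return pi**y * (1 - pi) ** (1 - y)

def bernoulli_expfam(y, pi):
    """a(pi) * b(y) * exp[y * Q(pi)] with a = 1 - pi, b = 1, Q = log odds."""
    return (1 - pi) * 1.0 * math.exp(y * math.log(pi / (1 - pi)))

def poisson_direct(y, mu):
    """exp(-mu) * mu^y / y! for y = 0, 1, 2, ..."""
    return math.exp(-mu) * mu**y / math.factorial(y)

def poisson_expfam(y, mu):
    """a(mu) * b(y) * exp[y * Q(mu)] with a = exp(-mu), b = 1/y!, Q = log mu."""
    return math.exp(-mu) * (1 / math.factorial(y)) * math.exp(y * math.log(mu))

# Illustrative parameter values (assumptions chosen for this check).
bernoulli_ok = all(
    math.isclose(bernoulli_direct(y, 0.3), bernoulli_expfam(y, 0.3)) for y in (0, 1)
)
poisson_ok = all(
    math.isclose(poisson_direct(y, 2.5), poisson_expfam(y, 2.5)) for y in range(10)
)
print(bernoulli_ok, poisson_ok)
```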

Systematic Component
I The systematic component of a GLM relates a
vector $(\eta_1, \ldots, \eta_n)$ to the explanatory variables
through a linear model.
I Let $x_{ij}$ denote the value of predictor
$j$ ($j = 1, 2, \ldots, p$) for subject $i$.
I $\eta_i = \sum_j \beta_j x_{ij}, \quad i = 1, \ldots, n$

Systematic Component
I This linear combination of the explanatory
variables, $\sum_j \beta_j x_{ij}$, is called the linear predictor.
I One $x_{ij} = 1$ for all $i$, for the coefficient of an
intercept (often denoted by α) in the model.
I Some $\{x_{ij}\}$ may be based on others in the model;
for instance, perhaps $x_{i3} = x_{i1} x_{i2}$, or $x_{i3} = x_{i1}^{2}$.
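
The linear predictor can be sketched in a few lines of Python. The predictor values and coefficients below are hypothetical, chosen only to show an intercept column of ones and a derived column equal to the product of two other predictors.

```python
def linear_predictor(row, beta):
    """Return eta_i = sum_j beta_j * x_ij for one subject's predictor row."""
    return sum(b * x for b, x in zip(beta, row))

# Hypothetical values of two predictors for four subjects.
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [0.0, 1.0, 0.0, 1.0]

# Each row: intercept column (always 1), x1, x2, and the derived column x1 * x2.
design = [[1.0, a, b, a * b] for a, b in zip(x1, x2)]

# Hypothetical coefficients (alpha, beta_1, beta_2, beta_3).
beta = [0.5, 1.0, -2.0, 0.25]

eta = [linear_predictor(row, beta) for row in design]
print(eta)
```

The derived column changes what the model can express, but the predictor stays linear in the coefficients, which is all the systematic component requires.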

Link
I The third component of a GLM is a link function
that connects the random and systematic
components.
I Let $\mu_i = E(Y_i)$, $i = 1, \ldots, n$.
I The model links $\mu_i$ to $\eta_i$ by $\eta_i = g(\mu_i)$, where
the link function $g(\cdot)$ is a monotonic,
differentiable function.
I $g(\mu_i) = \sum_j \beta_j x_{ij}, \quad i = 1, \ldots, n$

Link
I The link function g(µ) = µ, called the identity
link, has ηi = µi.
I µi = α + β1 xi1 + · · · + βp xip
I This is the form of ordinary regression models for
normally distributed responses.

Link
I The link function g(µ) = log[µ/(1 − µ)], called
the logit link, has ηi = g(µi).
I log[µi /(1 − µi )] = α + β1 xi1 + · · · + βp xip
I It is appropriate when µ is between 0 and 1, such
as a probability.
I A GLM that uses the logit link is called a logit
model.

Link
I The link function g(µ) = log(µ), called the log
link, has ηi = g(µi).
I log(µi ) = α + β1 xi1 + · · · + βp xip
I It is appropriate when µ cannot be negative, such
as with count data.
I A GLM that uses the log link is called a loglinear
model.

Link
I The link function that transforms the mean to
the natural parameter is called the canonical link.
I For it, $g(\mu_i) = Q(\theta_i)$, and $Q(\theta_i) = \sum_j \beta_j x_{ij}$.
I For the normal distribution, it is the mean itself.
I For the Poisson, the natural parameter is the log
of the mean.
I For the binomial, the natural parameter is the
logit of the success probability.
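
The three canonical links named above, together with their inverses, can be written down directly. This is a minimal sketch; the dictionary layout and function names are illustrative assumptions rather than any particular library's API.

```python
import math

def logit(mu):
    """Log odds of a probability mu in (0, 1)."""
    return math.log(mu / (1 - mu))

def inverse_logit(eta):
    """Map any real linear predictor back to a probability."""
    return 1 / (1 + math.exp(-eta))

# Distribution -> (canonical link g, inverse link g^{-1}).
CANONICAL_LINKS = {
    "normal": (lambda mu: mu, lambda eta: eta),   # identity link
    "poisson": (math.log, math.exp),              # log link
    "binomial": (logit, inverse_logit),           # logit link
}

def mean_from_linear_predictor(family, eta):
    """Recover mu = g^{-1}(eta) for the given random component."""
    _, inverse = CANONICAL_LINKS[family]
    return inverse(eta)

print(mean_from_linear_predictor("binomial", 0.0))
print(mean_from_linear_predictor("poisson", 0.0))
```

A linear predictor of 0 corresponds to a probability of 1/2 under the logit link and to a mean of 1 under the log link.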

Normal GLM
I Ordinary regression and ANOVA models for
continuous variates are special cases of GLMs.
I A GLM generalizes ordinary regression models in
two ways:
First, it allows the random component to have a distribution
other than the normal.
Second, it allows modeling some function of the mean.

Normal GLM
I A traditional way of analyzing nonnormal data
attempts to transform the response so it is
approximately normal, with constant variance.
I e.g., the Box-Cox transformation stabilizes variance by
finding λ such that
$y^{*} = \begin{cases} (y^{\lambda} - 1)/\lambda & \text{if } \lambda \neq 0 \\ \ln y & \text{if } \lambda = 0, \end{cases}$
and then expects y* to be approximately normally distributed.
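
A minimal sketch of the Box-Cox transform for a single positive observation follows. The function name and the sample values are assumptions for illustration; choosing λ from data (for example by maximum likelihood) is not shown.

```python
import math

def box_cox(y, lam):
    """Box-Cox transform: (y^lam - 1) / lam if lam != 0, ln(y) if lam == 0."""
    if y <= 0:
        raise ValueError("Box-Cox requires a positive response value")
    if lam == 0:
        return math.log(y)
    return (y**lam - 1) / lam

print(box_cox(4.0, 0.5))   # square-root-type transform
print(box_cox(4.0, 0.0))   # log transform
```

The λ = 0 case is the limit of the general formula as λ approaches 0, which is why the log is used there.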

Normal GLM
I A transform that produces constant variance may
not produce normality, or else simple linear
models for the explanatory variables may fit
poorly on that scale.
I With the theory and methodology of GLMs, it is
unnecessary to transform data so that
normal-theory methods apply.

Normal GLM
I The GLM fitting process utilizes maximum
likelihood methods for our choice of random
component, and in GLMs the choice of link is
separate from the choice of random component.
I If a link produces additivity of effects(i.e., if a
linear model holds for that link), it is not
necessary that it also stabilize variance or
produce normality.

Normal GLM
I Regression, ANOVA, and models for categorical
data are special cases of one super model.
I The same fitting method yields ML estimates of
parameters for all GLMs.
I This method is the basis of software for fitting
GLMs, such as GLIM and
SAS(PROC GENMOD).

Generalized Linear Models for Binary Data
I Y might indicate vote in a British
election (Labour, Conservative), choice of
automobile (domestic, import), or diagnosis of
breast cancer (present, absent).
I Each observation has one of two outcomes,
denoted by 0 and 1, binomial for a single trial.

I A binary response is sometimes called a Bernoulli
variable.
I Its distribution is specified by probabilities
P(Y = 1) = π of success and
P(Y = 0) = (1 − π) of failure.
I This distribution has mean E (Y ) = π and
variance var(Y ) = π(1 − π).

I For n independent observations on a binary
response with parameter π, the number of
successes has the binomial distribution specified
by the indices n and π.
I We denote P(Y = 1) by π(x), reflecting its
dependence on values x = (x1 , . . . , xp ) of
predictors.
I The variance of Y is var(Y ) = π(x)[1 − π(x)].

Linear Probability Model
I For a binary response, the regression model
π(x) = α + βx is called a linear probability
model.
I With independent observations it is a GLM with
binomial random component and identity link
function.
I The parameter β represents the change in π(x)
for a one-unit increase in x.

I Probabilities fall between 0 and 1, but linear
functions take values over the entire real line.
I This model has π(x) < 0 and π(x) > 1 for
sufficiently large or small x values.

I For its extension with multiple predictors,
difficulties often occur fitting this model because
during the fitting process, π̂(x) falls outside the
[0, 1] range for some subject’s x values.
I The model can be valid over a restricted range of
x values.

I Least squares is ML for a normal distribution
with constant variance.
I For binary responses, the constant variance
condition that makes least squares estimators
optimal(i.e., minimum variance in the class of
linear unbiased estimators) is not satisfied.

I Since var(Y) = π(x)[1 − π(x)], the variance
depends on x through its influence on π(x).
I As π(x) moves toward 0 or 1, the distribution of
Y is more nearly concentrated at a single point,
and the variance moves toward 0.

I Because of the nonconstant variance, the
binomial ML estimator is more efficient than
least squares.
I Y , being binary, is very far from normally
distributed. The usual sampling distributions for
the least squares estimators do not apply.

I The estimates and standard errors for ML and
least squares are usually similar, however, when
π̂(x) for the sample x values falls in the range
within which the variance is relatively
stable (about 0.3 to 0.7).

Snoring and Heart Disease Example
I The data come from an epidemiological survey of 2484
subjects to investigate snoring as a risk factor for
heart disease.
I The model states that the probability of heart
disease π(x) is linearly related to the level of
snoring x.

I We treat the rows of the table as independent
binomial samples with π(x) as the
parameter.
I We use scores (0,2,4,5) for the snoring
categories, treating the last two levels as closer
than the other adjacent pairs.

Relationship between Snoring and Heart Disease

                      Heart Disease   Proportion   Linear   Logit   Probit
Snoring               Yes      No     Yes          Fit(a)   Fit     Fit
Never                  24    1355     .017         .017     .021    .020
Occasional             35     603     .055         .057     .044    .046
Nearly every night     21     192     .099         .096     .093    .095
Every night            30     224     .118         .116     .132    .131

(a) Model fits refer to proportion of yes responses.
Source: Brit. Med. J., 291: 630-632 (1985).

I Software reports the ML fit,
π̂(x) = 0.0172 + 0.0198x, with a standard error
SE = 0.0028 for β̂ = 0.0198.
I For nonsnorers (x = 0), the estimated proportion
of subjects having heart disease is 0.0172.
I We refer to the estimated values of E (Y ) for a
GLM as fitted values(擬合值).

I Figure 4.1 graphs the sample and fitted values.

I The table and graph suggest that the model fits
well.
I §5.4 discusses formal goodness-of-fit analyses for
binary-response GLMs.

I The estimated probability of heart disease is
about 0.02 for nonsnorers; it increases by
2(0.0198) = 0.04 for occasional snorers, another
0.04 for those who snore nearly every night, and
another 0.02 for those who always snore.

I Suppose we had chosen scores for snoring level
having different relative spacings from the scores
{0,2,4,5}. Examples are {0,2,4,4.5} or {0,1,2,3}.
Then the fitted values for the four snoring
categories would change somewhat.
I They would not change if the relative spacings
between scores were the same, such as
{0,4,8,10} or {1,3,5,6}.

I If we entered the data as 2484 binary
observations of 0 or 1 and fitted the model using
ordinary least squares rather than ML, we would
obtain π̂(x) = 0.0169 + 0.0200x.
I When the model fit is good, least squares and
ML estimates are usually similar.
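
The least squares fit quoted above can be reproduced from the grouped counts, because sums over the 2484 binary observations reduce to sums over the four snoring categories. The sketch below uses scores (0, 2, 4, 5) and takes the "no" count for the Never row as 1355, the value implied by the stated total of 2484 subjects; variable names are illustrative.

```python
# Snoring scores and heart-disease counts by snoring category.
x = [0, 2, 4, 5]
yes = [24, 35, 21, 30]
no = [1355, 603, 192, 224]   # Never row taken as 1355 so that the total is 2484
n = [a + b for a, b in zip(yes, no)]

total = sum(n)
x_bar = sum(ni * xi for ni, xi in zip(n, x)) / total
y_bar = sum(yes) / total     # mean of the 0/1 responses

# Corrected sums of squares and cross-products over all binary observations.
sxy = sum(yi * xi for yi, xi in zip(yes, x)) - total * x_bar * y_bar
sxx = sum(ni * xi**2 for ni, xi in zip(n, x)) - total * x_bar**2

beta_ols = sxy / sxx
alpha_ols = y_bar - beta_ols * x_bar
print(round(alpha_ols, 4), round(beta_ols, 4))
```

The result agrees with π̂(x) = 0.0169 + 0.0200x to the reported precision. The ML fit differs slightly because it accounts for the nonconstant binomial variance.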

Logistic Regression Model
I With binary data, the relationship between π(x) and x
is usually nonlinear.
I A fixed change in x often has less impact when
π(x) is near 0 or 1 than when π(x) is near 0.5.
I Nonlinear relationships between π(x) and x are
often monotonic, with π(x) increasing
continuously or π(x) decreasing continuously as
x increases.

I The S-shaped curves in Figure 4.2 are typical.
I The most important curve with this shape has
the model formula $\pi(x) = \dfrac{\exp(\alpha + \beta x)}{1 + \exp(\alpha + \beta x)}$.
I This is the logistic regression model.

I As x → ∞, π(x) ↓ 0 when β < 0 and π(x) ↑ 1
when β > 0.
I As |β| increases, the curve has a steeper rate of
change.
I When β = 0, the curve flattens to a horizontal
straight line.

I The odds are $\dfrac{\pi(x)}{1 - \pi(x)} = \exp(\alpha + \beta x)$.
I The log odds has the linear relationship
$\operatorname{logit}[\pi(x)] = \log\dfrac{\pi(x)}{1 - \pi(x)} = \alpha + \beta x$.
I This is called the logistic regression function.
I The log odds transformation is called the logit
transformation.

I Logistic regression models are GLMs with
binomial random component and logit link
function.
I Logistic regression models are also called logit
models.
I The logit is the natural parameter of the binomial
distribution, so the logit link is its canonical link.

I Whereas π(x) must fall in the (0,1) range, the
logit can be any real number.
I The real numbers are also the range for linear
predictors (such as α + βx) that form the
systematic component of a GLM.

I For the snoring and heart disease data, software
reports the logistic regression ML fit
logit[π̂(x)] = −3.87 + 0.40x.
I The positive β̂ = 0.40 reflects the increased
incidence of heart disease at higher snoring levels.
I Chapter 5 presents several ways of interpreting
such equations.
I Results are similar to those for the linear
probability model.
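
To illustrate how such an ML fit can be obtained, here is a sketch of Newton-Raphson (Fisher scoring) for the two-parameter logistic model on the grouped snoring data. It is not the algorithm of any particular package; the starting values, the fixed iteration count, and the function names are assumptions, and the "no" count for the Never row is taken as 1355 so that the total is 2484.

```python
import math

x = [0, 2, 4, 5]                 # snoring scores
yes = [24, 35, 21, 30]           # heart disease: yes
n = [1379, 638, 213, 254]        # row totals (yes + no)

def expit(eta):
    return 1 / (1 + math.exp(-eta))

def fit_logistic(x, yes, n, iterations=25):
    """Newton-Raphson for logit[pi(x)] = alpha + beta * x with binomial rows."""
    # Start at the intercept-only fit: logit of the overall sample proportion.
    alpha = math.log(sum(yes) / (sum(n) - sum(yes)))
    beta = 0.0
    for _ in range(iterations):
        p = [expit(alpha + beta * xi) for xi in x]
        # Score vector (gradient of the log likelihood).
        u0 = sum(y - m * pi for y, m, pi in zip(yes, n, p))
        u1 = sum(xi * (y - m * pi) for xi, y, m, pi in zip(x, yes, n, p))
        # Information matrix entries, with weights n_i * p_i * (1 - p_i).
        w = [m * pi * (1 - pi) for m, pi in zip(n, p)]
        i00 = sum(w)
        i01 = sum(wi * xi for wi, xi in zip(w, x))
        i11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = i00 * i11 - i01 * i01
        # Newton step: solve the 2x2 system information * delta = score.
        alpha += (i11 * u0 - i01 * u1) / det
        beta += (i00 * u1 - i01 * u0) / det
    return alpha, beta

alpha_hat, beta_hat = fit_logistic(x, yes, n)
fitted = [expit(alpha_hat + beta_hat * xi) for xi in x]
print(round(alpha_hat, 2), round(beta_hat, 2))
```

The estimates should agree with the reported fit logit[π̂(x)] = −3.87 + 0.40x up to rounding, and `fitted` with the Logit Fit column of the table.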
Alternative Binary Links
I The cumulative distribution function (cdf) F(x)
for X is defined as
F(x) = P(X ≤ x), −∞ < x < ∞.
I A monotone regression with β > 0 in Figure 4.2
has the shape of a cdf for a continuous random
variable.
I This suggests a model for a binary response
having form π(x) = F (x) for some cdf F .

I When β > 0, the logistic regression curve π(x) is the
cdf of a two-parameter logistic distribution.
I When β < 0, the formula for 1 − π(x) has the
logistic cdf appearance.
I Each choice of α and of β > 0 corresponds to a
different logistic distribution.

I The logistic cdf corresponds to a probability
distribution with a symmetric, bell shape.
I It looks similar to a normal distribution but with
slightly thicker tails.

I When a tolerance distribution applies to subjects’
responses, the model form π(x) = F(x) occurs naturally.
I For instance, in a toxicology study, suppose that
researchers spray an insecticide at various dosage levels on
batches of mosquitoes.
I If a cdf F describes the distribution of tolerances, then the
model for the probability π(x) of death at dosage level x
necessarily has form π(x) = F(x).

Probit Models
I When F is the cdf of a normal distribution,
model π(x) = F(x) is called the probit model.
I The link function for the model is then called the
probit link.
I The probit model has alternative expression
probit[π(x)] = α + βx.

Probit Models
I The probit link applied to a probability π(x)
transforms it to the standard normal z-score at
which the left-tail probability equals π(x).
I For instance, probit(.05)=−1.645, probit(.50)=0,
probit(.95)=1.645, and probit(.975)=1.96.
I The probit model is a GLM with binomial
random component and probit link.

Probit Models
I The ML fit of the probit model, using scores {0,2,4,5} for
snoring level in the snoring and heart disease data, is
probit[π̂(x)] = −2.061 + 0.188x.
I π̂(0) = Φ(−2.061 + 0.188(0)) = Φ(−2.06) = .020
I π̂(5) = Φ(−2.061 + 0.188(5)) = Φ(−1.12) = .131
I The fitted values are similar to those obtained with the
linear probability and logistic regression models.
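
The probit link and the fitted probabilities for the snoring data can be checked with Python's standard library. This sketch plugs in the reported estimates −2.061 and 0.188 rather than refitting the model; the variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()        # standard normal: mean 0, sd 1

def probit(p):
    """z-score whose left-tail probability equals p."""
    return std_normal.inv_cdf(p)

# Reported ML estimates for the snoring data with scores (0, 2, 4, 5).
alpha_hat, beta_hat = -2.061, 0.188

# Fitted probability at each snoring score: Phi(alpha + beta * x).
fitted = {x: std_normal.cdf(alpha_hat + beta_hat * x) for x in (0, 2, 4, 5)}

for p in (0.05, 0.50, 0.95, 0.975):
    print(p, round(probit(p), 3))
for x, pi_hat in fitted.items():
    print(x, round(pi_hat, 3))
```

The four fitted values correspond to the Probit Fit column of the snoring table.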

Probit Models
I It is rare, and requires enormous sample sizes, to
find data for which a logistic regression model
fits well but the probit model fits poorly, or
conversely.
I When both models fit well, slope estimates in
logistic regression models are about
1.6-2.0 times those in probit models.

Probit Models
I The probit transform maps π(x) so that the
regression curve for π(x) (or 1 − π(x), when
β < 0) has the appearance of the normal cdf
with mean µ = −α/β and standard deviation
σ = 1/|β|.
I For the snoring and heart disease data,
−α̂/β̂ = 2.061/0.188 = 11.0 and
1/|β̂| = 1/0.188 = 5.3.

Probit Models
I The predicted probability of heart disease equals
1/2 at snoring level x = 11.0.
I The fitted probit value of −2.06 at x = 0 means
that 0 is 2.06 standard deviations below the
mean of a normal distribution with mean 11.0
and standard deviation 5.3.

Probit Models
I The probit model was introduced in 1934 for
models in toxicology.
I The logistic regression model was not studied
until about a decade later, but it is now much
more popular than the probit.
I Partly this is because one can interpret the
logistic regression effects using odds ratios.

I Responses such as the number of automobile thefts in 1995 or the
number of imperfections on a wafer have counts as possible outcomes.
I Poisson variates can take any nonnegative integer value.
I Chapter 6 presents Poisson GLMs for counts in
contingency tables.
I The response data are cell counts obtained by
cross-classifying subjects on two or more categorical
response variables.
I This section introduces Poisson regression-type models
using an alternative application: modeling count or rate
data for a single response variable.

Poisson Regression
I The log mean is the natural parameter for the
Poisson distribution, and the log link is the
canonical link for a GLM with Poisson random
component.
I A Poisson loglinear model is a GLM that assumes
a Poisson distribution for Y and uses the log link.

Poisson Regression
I Let µ denote the expected value for a Poisson
variate Y, and let X denote an explanatory
variable.
I The Poisson loglinear model has form
log µ = α + βx.
I The mean satisfies the exponential relationship
$\mu = \exp(\alpha + \beta x) = e^{\alpha}(e^{\beta})^{x}$.

Poisson Regression
I A one-unit increase in X has a multiplicative impact of $e^{\beta}$
on µ.
I The mean of Y at x + 1 equals the mean of Y at x
multiplied by $e^{\beta}$.
I If β = 0, then $e^{\beta} = e^{0} = 1$ and the multiplicative factor is
1; that is, the mean of Y does not change as X changes.
I If β > 0, then $e^{\beta} > 1$, and the mean of Y increases as X
increases.
I If β < 0, then $e^{\beta} < 1$, and the mean of Y decreases as X increases.

Horseshoe Crabs and Satellites
I A study of nesting horseshoe crabs

I Each female horseshoe crab in the study had a
male crab attached to her in her nest.
I The study investigated factors that affect
whether the female crab had any other males,
called satellites, residing near her.

I Explanatory variables thought possibly to affect
this included the female crab’s color, spine
condition, weight, and carapace width.
I The response outcome for each female crab is
her number of satellites.
I For now, we use width alone as a predictor of the
response. (Other analyses of these data occur in
Chapter 5.)

I Figure 4.3 plots the response counts against
width, with numbered symbols indicating the
number of observations at each point.
I The substantial variability in counts makes it
difficult to discern a clear pattern.

I To obtain a clearer picture of overall trend, we
grouped the female crabs into a set of width
categories (≤23.25, 23.25-24.25, 24.25-25.25,
25.25-26.25, 26.25-27.25, 27.25-28.25,
28.25-29.25, >29.25)
and calculated the sample mean number of
satellites for female crabs in each category.

I Figure 4.4 plots these sample means against the
sample mean width for crabs in each category.
I The sample mean width equals 26.3 and the
standard deviation equals 2.1.
I We used 26.25 rather than 26.3 for the midpoint
of the eight classes so that no observation would
fall exactly on the boundary between two
categories.

I More sophisticated ways of portraying the trend
smooth the data without grouping the width
values or assuming a particular functional
relationship.
I Figure 4.4 shows such a smoothed curve.
I The sample means and the smoothed curve both
show a strong increasing trend.

I The means tend to fall above the curve, since
the response counts in a category tend to be
skewed to the right.
I The smoothed curve is less susceptible to
outlying observations.
I The trend seems approximately linear, and we
discuss next models for the ungrouped data for
which the mean or the log of the mean is linear
in width.

I For a female crab, let µ be the expected number
of satellites and x = width.
I From GLM software, the ML fit of the Poisson
loglinear model is
log µ̂ = α̂ + β̂x = −3.305 + 0.164x.
I The effect β̂ = 0.164 of width has an asymptotic
(large-sample) standard error of ASE = 0.020.

I The model fitted value at any width level is an
estimated mean number of satellites µ̂.
I The fitted value at the mean width of x = 26.3
is µ̂ = exp(α̂ + β̂x) =
exp[−3.305 + 0.164(26.3)] = 2.74.
I For this model, exp(β̂) = exp(0.164) = 1.18
represents the multiplicative effect on µ̂ for a
1-cm increase in x.

I The fitted value at x = 27.3 = 26.3 + 1 is
exp[−3.305 + 0.164(27.3)] = 3.23, which equals
1.18 × 2.74.
I A 1-cm increase in width yields an 18% increase
in the estimated mean.
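
The arithmetic for the log-link fit can be reproduced directly from the reported estimates. This sketch only plugs the published coefficients into the model formula; it does not refit the model, and the names are illustrative.

```python
import math

# Reported ML estimates for log(mu) = alpha + beta * width.
ALPHA_HAT, BETA_HAT = -3.305, 0.164

def fitted_mean(width_cm):
    """Estimated mean number of satellites at a given carapace width."""
    return math.exp(ALPHA_HAT + BETA_HAT * width_cm)

mu_at_mean = fitted_mean(26.3)       # fitted value at the mean width
mu_plus_one = fitted_mean(27.3)      # fitted value one centimetre wider
multiplier = math.exp(BETA_HAT)      # multiplicative effect of a 1-cm increase

print(round(mu_at_mean, 2), round(mu_plus_one, 2), round(multiplier, 2))
```

The ratio `mu_plus_one / mu_at_mean` equals `multiplier` at any starting width, which is what distinguishes the multiplicative log-link effect from an additive one.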

I Figure 4.3 shows that one crab had somewhat
greater width than the others, 33.5cm.
I An observation having explanatory variable
values much different from the rest of the sample
can have an undue influence on the model fit.
I To check the effect of this observation, we
deleted it and refitted the model for the
remaining 172 crabs.
I The ML estimates then equal α̂ = −3.461 and
β̂ = 0.170 (ASE=0.022).

I Figure 4.4 shows that E(Y) may grow
approximately linearly with width.
I This suggests the Poisson GLM with identity
link, µ = α + βx.
I It has ML fit µ̂ = α̂ + β̂x = −11.53 + 0.55x.
I The effect of X on µ in this model is additive,
rather than multiplicative.

I A 1-cm increase in x has an estimated increase
of β̂ = 0.55 in µ̂.
I The fitted value at the mean width of x = 26.3
is µ̂ = −11.53 + 0.55(26.3) = 2.93; at x = 27.3,
it is 2.93 + 0.55 = 3.48.
I The fitted values are positive at all sampled x.
I On the average, about a 2-cm increase in width
is associated with an extra satellite.

I Figure 4.5 plots µ̂ against width for the models
with log link and identity link.
I Although they diverge somewhat for relatively
small and large widths, they provide similar
predictions over the width range in which most
observations occur.
I §4.4.2 and 4.4.3 study whether either model
provides an adequate fit to these data.
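
The comparison in Figure 4.5 can be approximated numerically by evaluating both reported fits at a few widths. The widths chosen below are illustrative assumptions, not values taken from the data.

```python
import math

def mu_log_link(width_cm):
    """Reported Poisson fit with log link: log(mu) = -3.305 + 0.164 * width."""
    return math.exp(-3.305 + 0.164 * width_cm)

def mu_identity_link(width_cm):
    """Reported Poisson fit with identity link: mu = -11.53 + 0.55 * width."""
    return -11.53 + 0.55 * width_cm

# Illustrative widths in cm, spanning the bulk of the sample around 26.3.
widths = [22.0, 24.0, 26.3, 28.0, 30.0]
comparison = [(w, mu_log_link(w), mu_identity_link(w)) for w in widths]

for w, m_log, m_id in comparison:
    print(f"{w:5.1f}  log link: {m_log:5.2f}  identity link: {m_id:5.2f}")
```

Near the centre of the width range the two fits are close, and they separate more toward the extremes, where the exponential curve bends away from the straight line.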

Poisson Regression for Rate Data
I When events of a certain type occur over time, space, or
some other index of size, it is often relevant to model the
rate at which events occur.
I In modeling numbers of auto thefts in 1995 for a sample of
cities, we could form a rate for each city by dividing the
number of thefts by the city’s population size.
I The model might describe how the rate depends on
explanatory variables such as the city’s unemployment rate,
its residents’ median income, and percentage of residents
having completed high school.

I When a response count Y has index (such as
population size) equal to t, the sample rate is
Y/t.
I The expected value of the rate is µ/t.
I With an explanatory variable x, a loglinear model
for the expected rate has form
log(µ/t) = α + βx.

I This model has equivalent representation
log µ − log t = α + βx.
I The adjustment term, − log t, to the log link of
the mean is called an offset (補償項).
I The fit corresponds to using log t as a predictor
on the right-hand side and forcing its coefficient
to equal 1.0.

I The expected response count satisfies
µ = t exp(α + βx).
I The mean is proportional to the index t, with
proportionality constant depending on the value
of x.
I For a fixed value of x, doubling the population
size t also doubles the expected number of auto
thefts µ.

Examples of Rate Models
I This example uses data on motor vehicle accident
rates for elderly drivers (from an article by W. A.
Ray et al., Amer. J. Epidemiol., 132:
873-884 (1992)).
I The sample consisted of 16,262 Medicaid
enrollees aged 65-84 years, with data on each
subject for a period of somewhere between 0 and
4 years.

I The total observation time for women in the sample was
17.30 thousand years. During this period, they had 175
accidents in which an injury occurred.
I The total observation time for men was 21.40 thousand
years, during which they had 320 injurious accidents.
I The sample rates of injurious accidents are
320/21.40 = 14.95 crashes per thousand years of driving for
males, and 175/17.30 = 10.12 for females.

I Let µ denote the expected number of injurious
accidents, for an observation period of t
thousand years.
I To model the effect of gender on the accident
rate, we use model log(µ/t) = α + βx with
x = 0 for females and x = 1 for males.
I The explanatory variable x is a dummy variable
(虛擬變數) for gender.

I The log of the accident rate equals α for females, and it
equals α + β for males. The rates are identical if β = 0.
I The estimate of α is simply the sample log(rate) for
females, namely log(10.12) = 2.31; the estimate of α + β
is simply the sample log(rate) for males, namely
log(14.95) = 2.70.
I The estimated difference is β̂ = 0.39.
I The estimated accident rate for men was
exp(β̂) = exp(0.39) = 1.48 times the rate for women.
That is, 14.95/10.12=1.48, the sample rate being 48%
higher for men.

I To test whether the true rates are the same, we
test H0 : β = 0.
I GLM software reports an ASE for β̂ = 0.39 of
0.09, so there is strong evidence that the accident
rate was higher for males(i.e., that β > 0).
I The accident rates do not take into account
possibly different yearly levels of driving for the
two groups.
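
The estimates for the gender model can be reproduced from the four reported totals. In this sketch the standard error uses the usual large-sample formula for the log of a ratio of two Poisson rates, sqrt(1/y_female + 1/y_male); that formula is an assumption not stated in the slides, although it agrees with the reported ASE of 0.09. Variable names are illustrative.

```python
import math

accidents = {"female": 175, "male": 320}        # injurious accidents
exposure = {"female": 17.30, "male": 21.40}     # observation time, thousand years

# Sample rates per thousand years of observation.
rate = {g: accidents[g] / exposure[g] for g in accidents}

# Model log(mu / t) = alpha + beta * x, with x = 0 for females, 1 for males.
alpha_hat = math.log(rate["female"])
beta_hat = math.log(rate["male"]) - alpha_hat
rate_ratio = math.exp(beta_hat)

# Assumed large-sample standard error of the log rate ratio.
ase_beta = math.sqrt(1 / accidents["female"] + 1 / accidents["male"])
z_statistic = beta_hat / ase_beta

print(round(alpha_hat, 2), round(beta_hat, 2), round(rate_ratio, 2))
print(round(ase_beta, 2), round(z_statistic, 2))
```

A z statistic above 4 is consistent with the strong evidence that β > 0 reported above. The offset enters only through the exposure totals used to form the rates.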

I When all counts in a data set have the same
index value t, or when counts do not refer to an
index such as time or group size, the model does
not need an offset term.

Summary
I Components of a generalized linear model
I Generalized linear model for binary data
I Generalized linear model for count data
