
CATEGORICAL DATA ANALYSIS

Dr. Martin L. William


2. INTRODUCTION TO GENERALIZED LINEAR MODELS
Most studies have several explanatory variables, which may be categorical or interval-scaled.
Two-way table analysis does not suffice in such situations. Modeling helps us describe the
effects of the explanatory variables on the response variable. A good model evaluates these
effects, includes relevant interactions and provides smooth estimates.
2.1 GENERALIZED LINEAR MODELS
The class of GLMs was introduced by Nelder & Wedderburn (1972). GLMs extend ordinary
regression models to encompass non-normal response variables and to build models for
functions of the mean. There are three components that specify a GLM:
• A Random Component that identifies the response variable Y and its distribution.
• A Systematic Component that specifies the explanatory variables in the linear
predictor function.
• A Link Function that specifies the function of E(Y) which is equated to the
systematic component.
2.1.1 Components of GLMs
The Random Component consists of a response variable Y with independent observations
(y1, y2,..., yN) from a distribution in the exponential family. This family has p.d.f. of the form
f(yi, θi) = a(θi) b(yi) exp [ yi Q(θi) ] - - - (2.1.1)
(Binomial, Poisson, Normal belong to this class)
The value of the parameter θi may vary for i = 1, 2, .., N, depending on the values of the
explanatory variables. The term Q(θ) is called the natural parameter.
The Systematic Component of a GLM relates a vector (η1, η2, ..., ηN) to the explanatory
variables, say, X1, X2, ..., Xp. Denoting the value of the jth predictor for the ith subject as xij,
we have ηi = ∑_{j=1}^{p} βj xij. The RHS is called the linear predictor. [Usually, for one of
the j's, we have xij = 1 for all i; its coefficient is the intercept, often denoted by α.]

[Sometimes, we take j = 0 for the intercept term and retain j = 1, 2, ..., p for the other p
predictor variables.]
The Link Function connects the random & systematic components. Let µi=E(Yi), i =1,2,...,N.
The model links µi to ηi by the equation ηi = g(µi) where the link function 'g' is a monotone
differentiable function. That is, g(µi) = ∑_{j=1}^{p} βj xij. The link function that transforms
the mean to the natural parameter is called the Canonical Link.


In summary, a GLM is a linear model for a transformed mean of a response variable which
follows a distribution in the exponential family.
2.1.2 Normal Linear Regression Models
Let Yi be normally distributed with mean µi. For normal distribution, the mean itself is the
natural parameter. Thus, the link function 'g' is the identity mapping. That is, we formulate
the link µi = ∑_{j=1}^{p} βj xij, and this is the conventional Linear Regression Model.

2.1.3 Binomial Logit Models for Binary Data
Binary variables follow the Bernoulli distribution. Let P(Y = 1) = π (the probability of success),
so that E(Y) = π. The natural parameter is log[π / (1 − π)], which is called the logit of π. Thus,
the link function is g(πi) = log[πi / (1 − πi)]. We formulate the link
log[πi / (1 − πi)] = ∑_{j=1}^{p} βj xij
Therefore, for binary variables, the relevant model is the Binary Logit Model.
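As an illustration (not part of the original derivation), the sketch below fits a binary logit GLM to a small artificial data set, assuming the Python library statsmodels is available; all variable names and numbers are hypothetical.

```python
# Sketch: fitting a binary logit GLM, assuming statsmodels is available.
import numpy as np
import statsmodels.api as sm

# Hypothetical data: x is a single predictor, y is a 0/1 response.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
p = 1 / (1 + np.exp(-(0.5 + 1.2 * x)))      # true success probabilities
y = rng.binomial(1, p)

X = sm.add_constant(x)                       # adds the intercept column (xi1 = 1)
fit = sm.GLM(y, X, family=sm.families.Binomial()).fit()
print(fit.summary())                         # logit is the canonical (default) link
```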
2.1.4 Loglinear Models for Poisson Count Data
Count data, under certain situations, obey the Poisson law. If µ = E(Y) is the mean of a
Poisson variable, the natural parameter is log µ. Thus, the link function is g(µi) = log µi. We
formulate the link
log µi = ∑_{j=1}^{p} βj xij
Therefore, for Poisson variables, the relevant model is the Loglinear Model.
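A parallel sketch for the Poisson loglinear model, again assuming statsmodels and using hypothetical data; only the random component (family) changes, and the log link is the canonical default.

```python
# Sketch: fitting a Poisson loglinear GLM (log link is the canonical default).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.uniform(0, 2, size=100)
mu = np.exp(0.3 + 0.8 * x)                   # true means on the log scale
y = rng.poisson(mu)

X = sm.add_constant(x)
fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
print(fit.params)                            # estimates of (alpha, beta)
```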
2.1.5 Types of GLMs
A traditional way to analyze data is to transform the response variable Y so that it has an
approximate normal distribution with constant variance across subjects. When such a
transformation is possible, ordinary least squares regression is applicable. With GLMs, in
contrast, the choice of link function is separate from the choice of random component, and the
chosen link need not stabilize variance or produce normality. This is because the fitting
process maximizes the likelihood for the chosen distribution, which is not restricted to the
normal.
The different types of GLMs are given below:

Random Component | Link Function     | Systematic Component | Model
Normal           | Identity          | Continuous           | Linear Regression
Normal           | Identity          | Categorical          | ANOVA
Normal           | Identity          | Mixed                | ANOCOVA
Binomial         | Logit             | Mixed                | Logistic Regression
Poisson          | Log               | Mixed                | Loglinear
Multinomial      | Generalized Logit | Mixed                | Multinomial Logit

2.1.6 Deviance
For a particular GLM, for observations y = (y1, y2, ..., yN), let L(µ, y) be the log-likelihood and
let L(µ̂, y) denote the maximum of the log-likelihood for the model under consideration. The
maximum achievable log-likelihood over all possible models is L(y, y), which occurs for the
most general (saturated) model having a separate parameter for each observation, with µ̂ = y.
Such a model is useless since it provides no reduction of the data. However, it serves as a
baseline for comparison. We test H0: the model holds against H1: the saturated model holds.
The deviance is defined as
D(y; µ̂) = −2[L(µ̂, y) − L(y, y)],
which asymptotically follows χ²(N − p) under the null hypothesis. Here p is the number of
parameters specified by the model being tested. This is nothing but the LR test.

(Eg.) Consider a study where binomial counts at N fixed settings of the predictors are
observed. Let Yi ~ B(ni, πi), i = 1, 2, ..., N. Suppose we wish to test homogeneity of the πi's,
that is, H0: πi = α for all i = 1, 2, ..., N. [The number of parameters here is 1.] The saturated
model makes no assumption about the πi's, letting them be any N values between 0 and 1. [The
number of parameters here is N.] The deviance has d.o.f. = N − 1. It equals the G² (LR) statistic
for testing independence in the N × 2 table that these samples form.
We note that H0 is the same as "independence between the 'settings' of the predictors and the
outcomes (success/failure)." Under independence, the test statistic has an approximate chi-
square distribution as the values of ni increase (whatever N may be).
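As a numerical sketch of this example, the fragment below computes the deviance G² for the homogeneity model using hypothetical counts at N = 4 settings and cross-checks it against the G² statistic for independence in the N × 2 table; numpy and scipy are assumed to be available.

```python
# Sketch: deviance (G^2) for testing homogeneity of binomial proportions,
# using a small hypothetical data set of N = 4 settings.
import numpy as np
from scipy.stats import chi2, chi2_contingency

y = np.array([12, 18, 9, 15])          # successes at each setting
n = np.array([30, 40, 25, 35])         # numbers of trials

pi_hat = y.sum() / n.sum()             # fitted probability under H0 (1 parameter)
obs = np.column_stack([y, n - y]).astype(float)        # observed N x 2 table
fit = np.column_stack([n * pi_hat, n * (1 - pi_hat)])  # fitted counts under H0

G2 = 2 * np.sum(obs * np.log(obs / fit))  # deviance = 2 * sum O log(O/E)
df = len(y) - 1                            # N - p with p = 1
print(G2, df, chi2.sf(G2, df))

# Cross-check: G^2 for independence in the N x 2 table
G2_chk, p_chk, df_chk, _ = chi2_contingency(obs, correction=False,
                                            lambda_="log-likelihood")
print(G2_chk, df_chk, p_chk)
```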

2.1.7 Advantages of the GLM Formulation


GLMs provide a unified theory of modelling that encompasses the most important models for
numerical and categorical response variables. The MLEs of the parameters are computed with
an algorithm that uses a weighted version of least squares. The same algorithm applies to the
entire exponential family for any choice of link function. Most statistical software has
facilities for fitting GLMs.

2.2 MOMENTS AND LIKELIHOOD FOR GLMs


GLMs may be extended to handle distributions with a second parameter. The random
component of the GLM specifies that the N observations (y1, y2, ..., yN) are independent, with
the pmf or pdf of yi being
f(yi; θi, φ) = exp{ [yiθi − b(θi)] / a(φ) + c(yi, φ) } - - - (2.2.1)
This is called the Exponential Dispersion Family; φ is called the Dispersion Parameter and
θi is the Natural Parameter. When φ is known, (2.2.1) reduces to the form (2.1.1). For
one-parameter exponential families, the dispersion parameter is not needed. Usually, a(φ) has
the form a(φ) = φ / wi where wi is a known quantity called the 'weight'.

2.2.1 Mean and Variance Functions for the Random Component


Let Li = log f(yi; θi, φ) = [yiθi − b(θi)] / a(φ) + c(yi, φ) - - - (2.2.2)
So ∂Li/∂θi = [yi − b'(θi)] / a(φ),  ∂²Li/∂θi² = −b''(θi) / a(φ)
We know that E(∂Li/∂θi) = 0, which gives E(Yi) = b'(θi) - - - (2.2.3)
Also, −E(∂²Li/∂θi²) = E(∂Li/∂θi)², which gives
b''(θi) / a(φ) = E{ [Yi − b'(θi)] / a(φ) }² = Var(Yi) / [a(φ)]², so that
Var(Yi) = b''(θi) a(φ) - - - (2.2.4)

Poisson: Let Yi be Poisson with mean μi. Then, f(yi) = exp{ yi log μi − μi − log yi! }
= exp{ yiθi − exp(θi) − log yi! }
where θi = log μi. This has the form (2.2.1) with b(θi) = exp(θi), a(φ) = 1, c(yi, φ) = −log yi!.
The natural parameter is θi = log μi. We note that
E(Yi) = b'(θi) = exp(θi) = μi,  Var(Yi) = b''(θi) a(φ) = exp(θi) = μi

Binomial: Let Yi be the proportion of successes in ni trials with probability of success in each
trial being πi. Then, niYi ~ B(ni, πi). Let θi = log [ πi / ( 1 – πi) ] so πi = exp(θi) / [1+ exp(θi) ].
And f(yi) = C(ni, ni yi) πi^(ni yi) (1 − πi)^(ni − ni yi)
= exp{ [yiθi − log(1 + exp(θi))] / (1/ni) + log C(ni, ni yi) },
where C(ni, ni yi) denotes the binomial coefficient "ni choose ni yi".
This is in the form (2.2.1) with b(θi) = log[1 + exp(θi)], a(φ) = 1/ni, c(yi, φ) = log C(ni, ni yi).
The natural parameter is the logit, namely θi.
We note that E(Yi) = b'(θi) = exp(θi) / [1 + exp(θi)] = πi,
Var(Yi) = b''(θi) a(φ) = exp(θi) / { [1 + exp(θi)]² ni } = πi(1 − πi) / ni
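The relations E(Yi) = b'(θi) and Var(Yi) = b''(θi) a(φ) can be checked symbolically. The sketch below does so for the Poisson and binomial cases worked out above, assuming the sympy library is available.

```python
# Sketch: symbolic check of E(Y) = b'(theta) and Var(Y) = b''(theta) * a(phi)
# for the Poisson and binomial cases, using sympy.
import sympy as sp

theta, n = sp.symbols('theta n', positive=True)

# Poisson: b(theta) = exp(theta), a(phi) = 1
b_pois = sp.exp(theta)
print(sp.diff(b_pois, theta))                        # mean  = exp(theta) = mu
print(sp.diff(b_pois, theta, 2))                     # var   = exp(theta) = mu

# Binomial (proportion scale): b(theta) = log(1 + exp(theta)), a(phi) = 1/n
b_binom = sp.log(1 + sp.exp(theta))
pi = sp.exp(theta) / (1 + sp.exp(theta))
mean = sp.simplify(sp.diff(b_binom, theta))          # should equal pi
var = sp.simplify(sp.diff(b_binom, theta, 2) / n)    # should equal pi*(1-pi)/n
print(sp.simplify(mean - pi))                        # 0
print(sp.simplify(var - pi * (1 - pi) / n))          # 0
```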

2.2.2 Systematic Component and Link Function


We refer to Section 2.1.1 (components of a GLM). The link function g for which g(μi) = θi in
(2.2.1) is the Canonical Link. It gives a direct relationship between the natural parameter and
the linear predictor.
Since μi = b'(θi), we have θi = b'⁻¹(μi). For instance, in the Poisson case, b(θi) = exp(θi),
so μi = b'(θi) = exp(θi); we see that b'(·) is the exponential function and its inverse is the log
function, i.e. θi = log μi. Thus, the canonical link is the log link for the Poisson model.

2.2.3 Likelihood Equations for a GLM


We refer to (2.2.1). For N independent observations, the log-likelihood is
L(β) = Σi Li = Σi log f(yi; θi, φ) = Σi [yiθi − b(θi)] / a(φ) + Σi c(yi, φ) - - - (2.2.5)
The likelihood equations are ∂L(β)/∂βj = Σi ∂Li/∂βj = 0 for all j. Using the chain rule of
differentiation,
∂Li/∂βj = (∂Li/∂θi)(∂θi/∂μi)(∂μi/∂ηi)(∂ηi/∂βj)
We have ∂Li/∂θi = [yi − b'(θi)] / a(φ) = [yi − μi] / a(φ) since μi = b'(θi);
∂μi/∂θi = b''(θi) = Var(Yi) / a(φ) [refer eqn (2.2.4)],
so that ∂θi/∂μi = a(φ) / Var(Yi);
∂μi/∂ηi depends on the link function for the model, namely ηi = g(μi);
and ∂ηi/∂βj = xij since ηi = ∑_{j=1}^{p} βj xij.
Substituting the above, we get
∂Li/∂βj = { [yi − μi] / a(φ) } × { a(φ) / Var(Yi) } × xij × (∂μi/∂ηi)
        = [ (yi − μi) xij / Var(Yi) ] (∂μi/∂ηi)
Thus, the likelihood equations are
Σi [ (yi − μi) xij / Var(Yi) ] (∂μi/∂ηi) = 0 ,  j = 1, 2, ..., p - - - (2.2.6)

Although β does not explicitly appear in these equations, it is there implicitly through µi,
since µi = g⁻¹( ∑_{j=1}^{p} βj xij ).
Different link functions yield different equations. Interestingly, the likelihood equations
(2.2.6) depend on the distribution of Yi only through µi and Var(Yi). The variance itself is a
function of the mean, say Var(Yi) = υ(µi); for example, υ(µi) = µi for Poisson,
υ(µi) = µi(1 − µi) for Bernoulli, and υ(µi) = σ² (constant) for normal.
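As a numerical sketch of the likelihood equations (2.2.6), the fragment below evaluates the score vector for a Poisson log-link model (the canonical case, where ∂µi/∂ηi = µi and Var(Yi) = µi, so the equations reduce to X'(y − µ) = 0); the data are hypothetical and numpy is assumed.

```python
# Sketch: evaluating the likelihood equations (2.2.6) for a Poisson log-link model.
# In this canonical case (yi - mui) xij / Var(Yi) * dmu/deta reduces to (yi - mui) xij,
# so the score vector is X'(y - mu).
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 2, size=50)
X = np.column_stack([np.ones_like(x), x])        # model matrix with intercept
y = rng.poisson(np.exp(0.3 + 0.8 * x))

def score(beta):
    mu = np.exp(X @ beta)                        # mu_i = g^{-1}(eta_i) = exp(eta_i)
    return X.T @ (y - mu)                        # left-hand side of eqn (2.2.6)

print(score(np.array([0.0, 0.0])))               # far from zero at a poor guess
```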

2.2.4 Asymptotic Covariance Matrix of Estimators


Let β̂ be the MLE of the parameter β. The Var-Cov matrix of β̂ is I⁻¹, where the
'Information Matrix' I = ((ihk)) with
ihk = E[ −∂²L(β)/∂βh∂βk ] = −Σi E[∂²Li/∂βh∂βk] = Σi E[ (∂Li/∂βh)(∂Li/∂βk) ]
= Σi E{ [ (Yi − µi) xih / Var(Yi) ] (∂µi/∂ηi) × [ (Yi − µi) xik / Var(Yi) ] (∂µi/∂ηi) }
= Σi { xih xik / [Var(Yi)]² } (∂µi/∂ηi)² E(Yi − µi)²
= Σi { xih xik / [Var(Yi)]² } (∂µi/∂ηi)² Var(Yi) = Σi [ xih xik / Var(Yi) ] (∂µi/∂ηi)²
It is clearly seen that the element on the RHS above is the (h,k)th element of X'WX, where
W is a diagonal matrix with main diagonal elements [1 / Var(Yi)] (∂µi/∂ηi)².
Thus, Var-Cov(β̂) = (X'WX)⁻¹ and the estimated Var-Cov matrix of β̂ is (X'ŴX)⁻¹,
where Ŵ is W evaluated at β̂.
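A numerical check of this result for a Poisson log-link model (where wi = (∂µi/∂ηi)²/Var(Yi) = µi): the sketch below forms (X'ŴX)⁻¹ directly and compares it with the covariance matrix reported by statsmodels; the data and variable names are hypothetical.

```python
# Sketch: forming W-hat and the asymptotic covariance (X' W-hat X)^{-1} for a
# Poisson log-link model, and comparing with statsmodels' reported covariance.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = rng.uniform(0, 2, size=200)
X = np.column_stack([np.ones_like(x), x])
y = rng.poisson(np.exp(0.3 + 0.8 * x))

fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
mu_hat = np.exp(X @ fit.params)                  # mu evaluated at beta-hat
W_hat = np.diag(mu_hat)                          # w_i = mu_i for the log link
cov_manual = np.linalg.inv(X.T @ W_hat @ X)      # (X' W-hat X)^{-1}

print(cov_manual)
print(fit.cov_params())                          # should agree closely
```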

2.3 INFERENCE FOR GLMs AND FITTING OF GLMs


Let (xi1, xi2, ..., xip) denote the values of the explanatory variables for the ith observation.
The basic equations of the systematic component are ηi = g(μi) = ∑_{j=1}^{p} βj xij,
i = 1, 2, ..., N, which are written in matrix form as η = Xβ, where η = (η1, η2, ..., ηN)',
β = (β1, β2, ..., βp)' and X is the N × p matrix called the Model Matrix. For most GLMs, the
likelihood equations are nonlinear functions of β. Let β̂ denote the MLE of β. Likelihood-ratio
based inference gives the Deviance.
Let W = diag(w1, ..., wN) where wi = (∂μi/∂ηi)² / Var(Yi).
The Information Matrix is I = X'WX. The estimated asymptotic covariance matrix of the
MLE β̂ is Cov(β̂) = (X'ŴX)⁻¹ where Ŵ is W evaluated at β̂.


Residuals for GLMs: The Pearson Residual for a GLM is defined as
ei = (yi − µ̂i) / [Var(Yi)]^(1/2).
For instance, for a Poisson GLM, Var(Yi) = µi, and the Pearson Residual is
ei = (yi − µ̂i) / √µ̂i.
For two-way contingency tables, the yi's are nothing but the cell counts nij. These residuals
are expressed as
eij = (nij − n̂ij) / √n̂ij.
Then, it is easily seen that Σ eij² = X², the Pearson X² statistic.
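A brief sketch with hypothetical counts, assuming numpy and scipy: the Pearson residuals of a two-way table under independence are computed and their sum of squares is seen to reproduce the Pearson X² statistic.

```python
# Sketch: Pearson residuals for a two-way table under independence; the sum of
# their squares reproduces the Pearson X^2 statistic (hypothetical counts).
import numpy as np
from scipy.stats import chi2_contingency

counts = np.array([[25, 15, 10],
                   [20, 30, 20]], dtype=float)

row = counts.sum(axis=1, keepdims=True)
col = counts.sum(axis=0, keepdims=True)
fitted = row @ col / counts.sum()                # n-hat_ij under independence

e = (counts - fitted) / np.sqrt(fitted)          # Pearson residuals e_ij
print(np.sum(e ** 2))                            # equals Pearson X^2

X2, p, dof, expected = chi2_contingency(counts, correction=False)
print(X2)                                        # same value from scipy
```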

2.3.1 Newton-Raphson Method of fitting


The ML equations involved in a GLM are non-linear and cannot be solved algebraically.
However, numerical methods of solving these equations are available. The Newton-Raphson
Method is an iterative method for solving nonlinear equations. It begins with an initial guess
for the solution. It obtains a second guess by approximating the function to be maximized in a
neighbourhood of the initial guess with a second-degree polynomial and maximizing that
polynomial. This process is repeated until the sequence of guesses converges.
Let L(β) be the function to be maximized and let u' = (∂L(β)/∂β1, ∂L(β)/∂β2, ..., ∂L(β)/∂βp).
Let H = ((hij)) where hij = ∂²L(β)/∂βi∂βj [H is called the Hessian Matrix]. Let u(t) and H(t) be
u and H evaluated at β(t), the tth guess for β̂, for t = 0, 1, 2, .... The tth step in the iterative
process approximates L(β) near β(t) by terms up to the second order in its Taylor series
expansion as follows:
L(β) ≈ L(β(t)) + u(t)'(β − β(t)) + (1/2)(β − β(t))' H(t)(β − β(t))
It is easily seen that ∂L(β)/∂β = 0 is the same as the equation u(t) + H(t)(β − β(t)) = 0, and
solving this for β gives the next guess β(t+1) = β(t) − (H(t))⁻¹u(t) [of course, H(t) must be
non-singular].
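A minimal sketch of the Newton-Raphson iteration for a binary logit model, assuming numpy and hypothetical data; for this model u = X'(y − π) and H = −X' diag{πi(1 − πi)} X, so the update is exactly β(t+1) = β(t) − (H(t))⁻¹u(t).

```python
# Sketch: Newton-Raphson for a binary logit model (hypothetical data).
# Score:   u(beta) = X'(y - pi)
# Hessian: H(beta) = -X' diag{pi_i (1 - pi_i)} X
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(size=200)
X = np.column_stack([np.ones_like(x), x])
y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 + 1.2 * x))))

beta = np.zeros(2)                               # initial guess beta^(0)
for t in range(25):
    pi = 1 / (1 + np.exp(-X @ beta))
    u = X.T @ (y - pi)                           # score vector u^(t)
    H = -X.T @ (X * (pi * (1 - pi))[:, None])    # Hessian H^(t)
    step = np.linalg.solve(H, u)                 # (H^(t))^{-1} u^(t)
    beta = beta - step                           # beta^(t+1) = beta^(t) - H^{-1} u
    if np.max(np.abs(step)) < 1e-10:             # stop when the guesses converge
        break

print(beta)
```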

2.3.2 Fisher Scoring Method


The Fisher Scoring method is an alternative resembling the Newton-Raphson method but
differing in the Hessian matrix: Fisher Scoring uses the expected value of this matrix, called
the Expected Information, while Newton-Raphson uses the matrix itself, called the Observed
Information.
Let I(t) denote the tth approximation to the estimated expected information matrix; that is,
I(t) has elements −E(∂²L(β)/∂βi∂βj) evaluated at β(t). The formula for Fisher scoring is
β(t+1) = β(t) + (I(t))⁻¹u(t).
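For a canonical link (such as the logit or the log link) the expected and observed information coincide, so Fisher scoring reproduces the Newton-Raphson steps above. The sketch below, with hypothetical data and assuming numpy, shows the Fisher-scoring update for a Poisson log-link model, where I(t) = X'W(t)X with wi = µi.

```python
# Sketch: Fisher scoring for a Poisson log-link model, where the expected
# information is I = X'WX with W = diag(mu_i); being a canonical link, this
# coincides with the Newton-Raphson iteration.
import numpy as np

rng = np.random.default_rng(5)
x = rng.uniform(0, 2, size=150)
X = np.column_stack([np.ones_like(x), x])
y = rng.poisson(np.exp(0.3 + 0.8 * x))

beta = np.zeros(2)
for t in range(25):
    mu = np.exp(X @ beta)
    u = X.T @ (y - mu)                           # score u^(t)
    I = X.T @ (X * mu[:, None])                  # expected information I^(t)
    step = np.linalg.solve(I, u)
    beta = beta + step                           # beta^(t+1) = beta^(t) + I^{-1} u
    if np.max(np.abs(step)) < 1e-10:
        break

print(beta)
```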
