
Econometría Avanzada (Advanced Econometrics)

Generalized Linear Models

Carlos Castro

Universidad del Rosario

Motivation: Binary Choice Models

Binary variables can take the value 0 or 1, depending on the outcome of the observed event.
So far you have used this type of variable (dummy variables) on the right-hand side of the equation, as exogenous variables. Now the idea is to use these same variables as endogenous or dependent variables in the models.
Discrete variables are often used in models to overcome problems in measuring the actual event, or to control for qualitative characteristics.

Example: Biometrics Literature

The objective of a test carried out by a chemical company is to assess the effectiveness of a new pesticide through the observed tolerance of insects. Let yi∗ be the tolerance of insect i, with yi∗ ∼ N(0, σ 2 ). The tolerance is measured with respect to a specific dose of pesticide xi applied to the insect.
The observed variable is a dichotomous yi that indicates whether the insect is dead (0) or alive (1) after being exposed to the dose xi , according to the insect's tolerance:

prob(yi = 1) = prob(yi∗ > xi )

yi = 1, if yi∗ > xi
     0, otherwise

y ∗ is known as a latent variable because it is not observable.
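A minimal sketch of this latent-variable setup, with synthetic numbers (the tolerance standard deviation and the dose below are hypothetical illustration values):

```python
import numpy as np

# Sketch of the latent-variable setup with hypothetical numbers: the
# tolerance y* is never observed; only the dead/alive indicator y is.
rng = np.random.default_rng(42)
sigma = 1.5                                # hypothetical tolerance std. dev.
n = 10
y_star = rng.normal(0.0, sigma, size=n)    # latent tolerance, N(0, sigma^2)
x = np.full(n, 0.5)                        # hypothetical dose for each insect
y = (y_star > x).astype(int)               # 1 = alive (tolerance exceeds dose)
```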

Motivation: Polychotomous

These variables can take more than two values.

Unordered: the values do not have a natural order.

yi = 1, person i travels by auto
     2, person i travels by carpool
     3, person i travels by subway
     4, person i travels by bus
     5, person i travels by other means

Motivation: Polychotomous.

Ordered: the values have a specific (sequential) ranking.

yi = 1, person i has a high school education
     2, person i has incomplete college education
     3, person i has complete college education
     4, person i has graduate education

Linear Models.

Consider the example of organized labor participation.

yi = 1, if person i is a member of organized labor (a union)
     0, otherwise

The problem at hand is the effect of collective bargaining on wages. Through a standard Mincer equation this would not be a problem, since our dichotomous variable would be on the right-hand side of the equation. However, if we are interested in the determinants of organized labor participation, we need the dichotomous variable on the left-hand side of the equation.

Linear Models.

yi = xi β + vi

where xi is a vector that collects socio-economic characteristics of the employee, such as education, work experience, etc.
The linear probability model has a number of shortcomings:

prob(yi = 1) = F (xi β)
prob(yi = 0) = 1 − F (xi β)

where β measures the effect that changes in the variables in xi have on the probability of the particular event.

Linear Model.

The linear probability model can be expressed in the following form:

E [y |x] = F (x, β)
y = E [y |x] + [y − E [y |x]]
  = β ′ x + ϵ

This is the usual form used in estimation when y is a continuous variable.

Linear Models.

However, when y is a discrete variable, substantial problems emerge. The first is heteroskedasticity of the error, since its variance depends on β:

β ′ x + ϵ = 0 or 1
ϵ = −β ′ x with probability 1 − F , or
ϵ = 1 − β ′ x with probability F =⇒
var[ϵ|x] = β ′ x(1 − β ′ x)

Linear Models.

The variance of ϵ is not constant, since it depends on xi β. However grave, this problem can be overcome using Feasible Generalized Least Squares (FGLS).
A second inconvenience of the linear approximation is that there is no guarantee that the predicted values of the model are actual probabilities: β ′ x is not necessarily ∈ [0, 1], so the model can produce uninterpretable probability predictions (such as negative values).
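A small sketch of this second problem on synthetic data (the data-generating process, sample size, and coefficient values are illustrative assumptions):

```python
import numpy as np

# Synthetic illustration (assumed DGP): fit a linear probability model by
# OLS and inspect whether the fitted "probabilities" leave [0, 1].
rng = np.random.default_rng(0)
n = 200
x = np.linspace(-3, 3, n)
p = 1 / (1 + np.exp(-(0.5 + 2.0 * x)))     # true probabilities (logistic DGP)
y = (rng.random(n) < p).astype(float)      # observed binary outcomes

X = np.column_stack([np.ones(n), x])       # constant + regressor
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta_hat                      # LPM "probabilities"
```

On data like this the fitted values fall below 0 for small x and above 1 for large x, which is exactly the uninterpretability problem of the linear specification.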

Generalized Linear Models.

Generalized Linear Model

yi = φ(xi β)

Building such a model involves three decisions:
What is the distribution of the data y (for fixed values of the predictors (exogenous variables), and possibly after a transformation)?
What function of the mean will be modeled as linear in the predictors?
What will the predictors be?

GLM: Distribution y.

Typically the vector y is assumed to consist of independent measurements from a distribution in (or similar to) the exponential family:

l(y , θ) = c(θ)h(y ) exp{ ∑_{j=1}^{r} Qj (θ)Tj (y ) }

where Qj and Tj take values in R.

GLM: Link function.
We typically want to relate the parameters of the distribution to various predictors. We do so by modelling a transformation of the mean µi , which will be some function of Q(θ):

E (yi ) = µi
φ−1 (µi ) = xi β

where φ is a known function, called the link function (since it links together the mean of yi and the linear form of the predictors).

Identity: xi β
Probit link: Φ(xi β)
Logistic link: exp(xi β)/(1 + exp(xi β))
Poisson link (count): exp(xi β)
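A quick numerical sketch of the binary-response mean functions (the grid of index values is arbitrary):

```python
import numpy as np
from scipy.stats import norm

# Evaluate the mean functions on an arbitrary grid of linear-index values
# xb; only the probit and logit links map R into (0, 1).
xb = np.linspace(-4, 4, 9)

identity = xb                              # not restricted to [0, 1]
probit = norm.cdf(xb)                      # Phi(x_i beta)
logit = np.exp(xb) / (1 + np.exp(xb))      # logistic mean function
```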

Predictors.

Generally the predictors are considered to have a linear form xi β. The same considerations as in the linear model apply with respect to interactions, treatment (dummy) variables, non-linear relationships, and the incorporation of random factors (GLMM).

Model Specification.

In order to solve the unfortunate shortcomings of the linear specification (mentioned in the introduction) it is necessary to guarantee that the predicted probabilities lie in [0, 1], such that prob(yi = 1) = F (xi β). This requires a transformation through a particular function F such that F : R → [0, 1]. In the linear model there is no transformation, since F is the identity function:

prob(yi = 1) = xi β

It is possible to use any continuous cumulative distribution function defined on R.

Probit Model: Specification.

The function used for the transformation is the standard normal cdf:

prob(yi = 1) = Φ(xi β) = ∫_{−∞}^{xi β} (1/√(2π)) exp{ −z 2 /2 } dz

The use of the standard normal cdf Φ(xi β) restricts the range of values to [0, 1], since

lim_{x→+∞} Φ(x) = 1 and lim_{x→−∞} Φ(x) = 0

Probit: Specification.

prob(yi = 1) = prob(yi∗ > 0)
             = prob(xi β + ϵi > 0)
             = prob(ϵi > −xi β)
             = prob(ϵi /σ > −xi β/σ)

The last expression guarantees that we are working with the symmetric standard normal distribution:

prob(yi = 1) = prob(ϵi /σ > −xi β/σ)
             = prob(ϵi /σ < xi β/σ)
             = Φ(xi β/σ)
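The symmetry step can be checked numerically; a sketch with an arbitrary index value and scale:

```python
import numpy as np
from scipy.stats import norm

# Check the symmetry argument P(eps/sigma > -x'b/sigma) = Phi(x'b/sigma)
# for arbitrary illustrative numbers.
xb, sigma = 0.7, 1.0
p_direct = 1 - norm.cdf(-xb / sigma)       # P(eps/sigma > -x'b/sigma)
p_link = norm.cdf(xb / sigma)              # Phi(x'b/sigma)
```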

Probit: Estimation.

Given a sample of n independent observations, where each observation may be on a different individual, the likelihood function can be obtained in the following way:

prob(yi = 1) = Φ(xi β/σ)
prob(yi = 0) = 1 − prob(yi = 1) = 1 − Φ(xi β/σ)

Since the yi are independent, the likelihood function is the product of the probabilities for each observation. Order the observations so that i = 1, ..., m are those with yi = 0, and i = m + 1, ..., n are the remaining n − m observations with yi = 1.

Probit: Estimation.

L = p(y1 = 0)p(y2 = 0) · · · p(ym = 0)p(ym+1 = 1) · · · p(yn = 1)
  = ∏_{i=1}^{m} [1 − Φ(xi β/σ)] ∏_{i=m+1}^{n} Φ(xi β/σ)
  = ∏_{i=1}^{n} Φ(xi β/σ)^{yi} [1 − Φ(xi β/σ)]^{1−yi}
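The log of this likelihood can be coded directly; a minimal sketch, with σ normalized to 1 (the standard normalization, since only β/σ is identified):

```python
import numpy as np
from scipy.stats import norm

# Probit log-likelihood, with sigma normalized to 1 (only beta/sigma is
# identified in the probit model).
def probit_loglik(beta, X, y):
    """ln L = sum_i [ y_i ln Phi(x_i'b) + (1 - y_i) ln(1 - Phi(x_i'b)) ]."""
    xb = X @ beta
    # ln(1 - Phi(z)) = ln Phi(-z) by symmetry; logcdf is numerically stable
    return np.sum(y * norm.logcdf(xb) + (1 - y) * norm.logcdf(-xb))
```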

Probit: Estimation.

The log-likelihood function has the following form:

ln L = ∑_{i=1}^{n} yi ln F (xi β) + (1 − yi ) ln[1 − F (xi β)]

First order condition for the log-likelihood:

∂ ln L/∂β = ∑_{i=1}^{n} yi (f (.)/F (.)) xi − ∑_{i=1}^{n} (1 − yi ) (f (.)/(1 − F (.))) xi = 0

Probit: Estimation.

This function must be maximized with numerical methods since the first-order condition is not linear in β. The most common numerical method used is the Newton-Raphson algorithm:

β̂t+1 = β̂t − [∂² ln L/∂β∂β′ |β̂t ]⁻¹ [∂ ln L/∂β |β̂t ]

The resulting estimator is consistent, asymptotically efficient and asymptotically normally distributed.
The asymptotic covariance matrix is:

−[∂² ln L/∂β∂β′ |β̂ ]⁻¹
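A sketch of such an iteration for the probit model on synthetic data. As a simplifying assumption it uses the expected (Fisher) information in place of the observed Hessian, i.e. the Fisher-scoring variant of Newton-Raphson; the DGP and sample size are illustrative:

```python
import numpy as np
from scipy.stats import norm

# Newton-type iteration for the probit MLE on synthetic data. Simplifying
# assumption: expected (Fisher) information replaces the observed Hessian.
rng = np.random.default_rng(1)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.3, 0.8])
y = (X @ beta_true + rng.normal(size=n) > 0).astype(float)

beta = np.zeros(2)
for _ in range(25):
    xb = X @ beta
    Phi = np.clip(norm.cdf(xb), 1e-10, 1 - 1e-10)
    phi = norm.pdf(xb)
    w = phi / (Phi * (1 - Phi))
    score = X.T @ (w * (y - Phi))              # gradient of ln L
    info = (X * (phi * w)[:, None]).T @ X      # expected information matrix
    step = np.linalg.solve(info, score)
    beta = beta + step
    if np.max(np.abs(step)) < 1e-10:
        break

cov = np.linalg.inv(info)                      # asymptotic covariance matrix
```

The inverse information at the optimum is the slide's asymptotic covariance matrix; its diagonal gives the squared standard errors.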

Logit: Specification.

The function used for the transformation is the logistic cumulative distribution:

prob(yi = 1) = Λ(xi β) = exp{β ′ x} / (1 + exp{β ′ x})

These two models are the most commonly used, and the choice between them is sometimes based on the data. Both distributions are symmetric; however, the tails of the logistic density are heavier (wider) than those of the standard normal. This makes the probabilities associated with extreme values more likely to differ between the two models.
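A quick check of this tail difference (the cutoff value 3 is arbitrary):

```python
from scipy.stats import logistic, norm

# Tail probability P(Z > 3) under the standard normal vs. the standard
# logistic: the logistic's heavier tails give extreme values more mass.
z = 3.0
p_probit = norm.sf(z)      # survival function, 1 - Phi(3)
p_logit = logistic.sf(z)   # survival function of the standard logistic
```

Both distributions agree at the center (probability 0.5 at zero), so the differences concentrate in the tails.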

Logit: Estimation.

The methods used in the estimation of the logit model are also maximum likelihood methods.
First order condition for the log-likelihood:

∂ ln L/∂β = ∑_{i=1}^{n} (yi − Λi )xi = 0
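This first-order condition can be verified at a numerically maximized likelihood; a sketch on synthetic data (the DGP and the choice of optimizer are illustrative assumptions):

```python
import numpy as np
from scipy.optimize import minimize

# Fit a logit by maximum likelihood on synthetic data, then check the
# first-order condition sum_i (y_i - Lambda_i) x_i = 0 at the optimum.
rng = np.random.default_rng(2)
n = 400
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([-0.2, 1.0])
p = 1 / (1 + np.exp(-(X @ beta_true)))
y = (rng.random(n) < p).astype(float)

def neg_loglik(beta):
    xb = X @ beta
    # ln L = sum_i [ y_i * xb_i - ln(1 + exp(xb_i)) ], written stably
    return -(y @ xb - np.logaddexp(0.0, xb).sum())

res = minimize(neg_loglik, np.zeros(2), method="BFGS")
lam = 1 / (1 + np.exp(-(X @ res.x)))
foc = X.T @ (y - lam)      # approximately zero at the MLE
```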

Logit: Estimation.

This equation must be solved with numerical methods since it is not linear in β. The most common numerical method used is the Newton-Raphson algorithm:

β̂t+1 = β̂t − [∂² ln L/∂β∂β′ |β̂t ]⁻¹ [∂ ln L/∂β |β̂t ]

The resulting estimator is consistent, asymptotically efficient and asymptotically normally distributed.
The asymptotic covariance matrix is:

−[∂² ln L/∂β∂β′ |β̂ ]⁻¹

Marginal Effects.

The parameters of models with discrete dependent variables, like those of any nonlinear regression model, are not necessarily the marginal effects that one is accustomed to analyzing. In general,

∂E [y |x]/∂x = [dF (β ′ x)/d(β ′ x)] β = f (β ′ x)β

where f (.) is the density function that corresponds to the cumulative distribution F (.).

Marginal Effects.

For the normal distribution, this result is:

∂E [y |x]/∂x = ϕ(β ′ x)β

where ϕ(t) is the standard normal density.

Marginal Effects.

For the logistic distribution,

dΛ(β ′ x)/d(β ′ x) = exp{β ′ x}/(1 + exp{β ′ x})² = Λ(β ′ x)[1 − Λ(β ′ x)]

Then for the logit model the marginal effect is:

∂E [y |x]/∂x = Λ(β ′ x)[1 − Λ(β ′ x)]β

These values will clearly vary with the values of x. In interpreting the estimated model, it is useful to calculate them at, say, the means of the regressors.
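A sketch of marginal effects at the means, with hypothetical coefficients and regressor means (in practice β comes from the MLE):

```python
import numpy as np
from scipy.stats import norm

# Marginal effects at the mean of the regressors; the coefficients and
# means below are hypothetical illustration values, not estimates.
beta = np.array([0.25, 0.5, -0.3])
x_mean = np.array([1.0, 2.0, 0.4])        # constant term + regressor means

xb = x_mean @ beta
me_probit = norm.pdf(xb) * beta           # phi(b'x) * beta
lam = np.exp(xb) / (1 + np.exp(xb))
me_logit = lam * (1 - lam) * beta         # Lambda(1 - Lambda) * beta
```

Both scale factors lie below 1 (ϕ ≤ 0.3989, Λ(1 − Λ) ≤ 0.25), so each marginal effect keeps the sign of its coefficient but shrinks its magnitude.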

Goodness-of-fit.

Goodness-of-fit measures are often implicitly or explicitly based on a comparison: a model with only a constant as explanatory variable vs. a model with the other covariates. Let L1 be the maximum log-likelihood value of the model of interest and L0 the maximum log-likelihood value when all of the parameters except the intercept are set to zero. We know that L1 ≥ L0. The intuition is that the larger the difference, the more the extended model adds over the restricted model.

Pseudo-R²:
1 − 1/(1 + 2(L1 − L0)/N)

McFadden R²:
1 − L1/L0
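Both measures are simple functions of the two maximized log-likelihoods; a sketch with hypothetical values (both log-likelihoods negative, with L1 > L0):

```python
# Pseudo-R^2 measures from two maximized log-likelihoods. The values of
# L1, L0 and N below are hypothetical illustration numbers.
L1, L0, N = -120.0, -160.0, 300

pseudo_r2 = 1 - 1 / (1 + 2 * (L1 - L0) / N)
mcfadden_r2 = 1 - L1 / L0
```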

