
Chapter 2: Discrete Choice

Joan Llull

Quantitative Statistical Methods II, Part II


Barcelona School of Economics
Introduction

In this chapter we analyze some models for discrete outcomes: models in which one
of m mutually exclusive categories is selected.
This section: binary outcomes.

For notational convenience, define y = 1{A is selected}:

It allows us to write the likelihood in a very compact way.
What happens with N^{-1} Σ_{i=1}^N y_i? Why is it important?



The linear probability model
Simple approach: linear regression model.
OLS regression of y on x provides consistent estimates of sample-average marginal
effects ⇒ nice exploration tool.

Becoming popular in the treatment effects literature.


Two important drawbacks:
Predicted probabilities p̂(x) = x'β̂ are not bounded between 0 and 1.
Error term is heteroscedastic and has discrete support (given x).
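A minimal simulation sketch of both points, OLS as a marginal-effects exploration tool and the unbounded predictions (the data-generating process and all names are illustrative, not from the slides):

```python
import numpy as np

# Simulate binary data, then fit the linear probability model by OLS.
rng = np.random.default_rng(0)
N = 5_000
x = np.column_stack([np.ones(N), rng.normal(size=N)])  # constant + one regressor
y = (x @ np.array([0.2, 1.0]) + rng.logistic(size=N) > 0).astype(float)

beta_hat, *_ = np.linalg.lstsq(x, y, rcond=None)  # OLS coefficients
p_hat = x @ beta_hat                              # fitted "probabilities"
print("slope (approx. sample-average marginal effect):", beta_hat[1])
print("share of predictions outside [0, 1]:",
      np.mean((p_hat < 0) | (p_hat > 1)))
```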




The General Binary Outcome Model
The conditional probability of choosing A given x is p(x) ≡ Pr[y = 1|x] = F(x'β).

These are single-index models.


This general notation is useful to derive general results that are common across
models.
This model includes the linear model, Probit, and Logit as special cases:
Linear model: F(x'β) = x'β.
Logit: F(x'β) = Λ(x'β) = e^{x'β} / (1 + e^{x'β}).
Probit: F(x'β) = Φ(x'β) = ∫_{−∞}^{x'β} φ(z) dz.
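The three special cases written as Python callables, a small sketch to make the single-index structure F(x'β) concrete (function names are mine):

```python
import numpy as np
from scipy.stats import norm

def F_linear(z):  # linear model: F(z) = z
    return np.asarray(z, dtype=float)

def F_logit(z):   # logistic CDF: Lambda(z) = e^z / (1 + e^z)
    return 1.0 / (1.0 + np.exp(-np.asarray(z, dtype=float)))

def F_probit(z):  # standard normal CDF: Phi(z)
    return norm.cdf(z)

z = np.array([-2.0, 0.0, 2.0])  # index values x'beta
for F in (F_linear, F_logit, F_probit):
    print(F.__name__, F(z))
```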



Maximum Likelihood Estimation
Given the binary nature of the data, we know the distribution of the outcome: it is Bernoulli:

g(y|x) = p^y (1 − p)^{1−y} = { p if y = 1; 1 − p if y = 0 },

where p = F(x'β).

Therefore, the conditional log-likelihood is:


L_N(β) = Σ_{i=1}^N { y_i ln F(x_i'β) + (1 − y_i) ln[1 − F(x_i'β)] }.

And the first-order condition is:

∂L_N/∂β ≡ Σ_{i=1}^N [ (y_i − F(x_i'β̂)) / (F(x_i'β̂)(1 − F(x_i'β̂))) ] f(x_i'β̂) x_i = 0,

where f(z) ≡ ∂F(z)/∂z.

No explicit solution. Newton-Raphson converges quickly because the log-likelihood is
globally concave for the Probit and Logit (see the sketch below).
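A minimal Newton-Raphson sketch for the Logit case, where the score and Hessian have simple closed forms (this anticipates the simplification Λ'(z) = Λ(z)(1 − Λ(z)) shown on the Logit slide; names are illustrative):

```python
import numpy as np

def logit_mle(x, y, tol=1e-10, max_iter=50):
    """Newton-Raphson on the Logit log-likelihood (globally concave)."""
    beta = np.zeros(x.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(x @ beta)))      # Lambda(x_i'beta)
        score = x.T @ (y - p)                      # first-order condition
        hessian = -(x * (p * (1 - p))[:, None]).T @ x
        step = np.linalg.solve(hessian, score)     # Newton step: H^{-1} s
        beta = beta - step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Example with the simulated (x, y) from the earlier sketch:
# print(logit_mle(x, y))
```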
Consistency
We know that the distribution of y is Bernoulli ⇒ consistency additionally requires
p = F(x'β_0).

The true parameter vector is the solution of:


max_β E[ y ln F(x'β) + (1 − y) ln(1 − F(x'β)) ].

The first-order condition is:

E[ (y − F(x'β)) / (F(x'β)(1 − F(x'β))) · f(x'β) x ] = 0,

which is satisfied at β = β_0 under correct specification, p = F(x'β_0).



Asymptotic distribution

From Chapter 1: β̂ →_d N(β_0, Ω_0/N).

We may use the information matrix equality to get Ω_0:


Ω_0 = ( −E[ ∂²L/∂β∂β' ] )^{-1} = ( E[ (∂L/∂β)(∂L/∂β') ] )^{-1}
    = ( E[ f(x'β)² / (F(x'β)(1 − F(x'β))) · xx' ] )^{-1}.

Note that this is of the form E[ω xx']^{-1}.
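A plug-in sketch of this variance estimator for the Probit, where ω_i = φ(x_i'β̂)² / [Φ(x_i'β̂)(1 − Φ(x_i'β̂))] (assumes x and beta_hat are available from a previous fit):

```python
import numpy as np
from scipy.stats import norm

def probit_avar(x, beta_hat):
    """Estimated Var(beta_hat) = Omega_hat/N, Omega_hat = (mean of w_i x_i x_i')^{-1}."""
    z = x @ beta_hat
    F, f = norm.cdf(z), norm.pdf(z)
    w = f**2 / (F * (1.0 - F))               # omega_i
    A = (x * w[:, None]).T @ x / len(x)      # sample analogue of E[omega x x']
    return np.linalg.inv(A) / len(x)
```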



Marginal effects
Marginal effects are given by:

∂Pr[y = 1|x] / ∂x_k = f(x'β) β_k.

In the linear probability model, f(x'β) = 1.

In non-linear models, marginal effects depend on x (we can compute several alternatives).


Coefficients are still informative of the sign of the marginal effect.
Interesting property: ratios of marginal effects are constant:

(∂Pr[y = 1|x]/∂x_k) / (∂Pr[y = 1|x]/∂x_l) = f(x'β)β_k / (f(x'β)β_l) = β_k / β_l.

In the case of a dichotomous regressor, the marginal effect is:

F(x'_{−k} β_{−k} + β_k) − F(x'_{−k} β_{−k}).
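A sketch of the usual ways to report these quantities for a Probit: average marginal effects, effects at the mean, and the discrete change for a dummy regressor averaged over observations (x and beta_hat assumed available; the averaging choice for the dummy effect is mine):

```python
import numpy as np
from scipy.stats import norm

def probit_marginal_effects(x, beta_hat):
    ame = norm.pdf(x @ beta_hat).mean() * beta_hat         # average marginal effect
    mem = norm.pdf(x.mean(axis=0) @ beta_hat) * beta_hat   # effect at the mean of x
    return ame, mem

def probit_dummy_effect(x, beta_hat, k):
    """Effect of switching dichotomous regressor k from 0 to 1 (slide formula),
    averaged over the sample distribution of the other regressors."""
    x0, x1 = x.copy(), x.copy()
    x0[:, k], x1[:, k] = 0.0, 1.0
    return (norm.cdf(x1 @ beta_hat) - norm.cdf(x0 @ beta_hat)).mean()
```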



The Logit Model
The Logit model is given by:

F(x'β) = Λ(x'β) = e^{x'β} / (1 + e^{x'β}).
Nice property of the logistic function: ∂Λ(z)/∂z = Λ(z)(1 − Λ(z)).
Therefore, the ML estimator reduces to:
Σ_{i=1}^N ( y_i − Λ(x_i'β̂) ) x_i = 0.

And the asymptotic variance to:

Ω_0 = ( E[ Λ(x'β)(1 − Λ(x'β)) xx' ] )^{-1}.

Marginal effects are:

∂Pr[y = 1|x] / ∂x_k = Λ(x'β)(1 − Λ(x'β)) β_k.
And another interesting property: ln[ p / (1 − p) ] = x'β.
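Two quick numerical checks of these properties (a sketch reusing logit_mle and the simulated x, y from earlier sketches): with a constant in x, the zero score implies that the fitted probabilities average to ȳ, and the log-odds of p̂ reproduce the index x'β̂.

```python
import numpy as np

beta_hat = logit_mle(x, y)                     # from the Newton-Raphson sketch
p_hat = 1.0 / (1.0 + np.exp(-(x @ beta_hat)))

print(p_hat.mean(), y.mean())                  # equal: score w.r.t. the constant is zero
print(np.allclose(np.log(p_hat / (1 - p_hat)), x @ beta_hat))  # log-odds = x'beta
```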


The Probit Model
The Probit model is given by:

F(x'β) = Φ(x'β) = ∫_{−∞}^{x'β} φ(z) dz.

Therefore, the ML estimator is given by:


Σ_{i=1}^N [ (y_i − Φ(x_i'β̂)) / (Φ(x_i'β̂)(1 − Φ(x_i'β̂))) ] φ(x_i'β̂) x_i = 0.

And the asymptotic variance is:

Ω_0 = ( E[ φ(x'β)² / (Φ(x'β)(1 − Φ(x'β))) xx' ] )^{-1}.

Marginal effects are:

∂Pr[y = 1|x] / ∂x_k = φ(x'β) β_k.
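Unlike the Logit, the Probit score does not simplify, so in practice one can hand the log-likelihood to a generic optimizer. A minimal sketch (the clipping is only a numerical guard on the logs):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def probit_negll(beta, x, y):
    """Negative Probit log-likelihood."""
    F = np.clip(norm.cdf(x @ beta), 1e-12, 1 - 1e-12)
    return -np.sum(y * np.log(F) + (1 - y) * np.log(1 - F))

# res = minimize(probit_negll, np.zeros(x.shape[1]), args=(x, y), method="BFGS")
# beta_hat = res.x
```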



Latent Variable Representation
One way to give a more structural interpretation to the model is in terms of a
latent measure of utility.

A latent variable is a variable that is not completely observed.


Two alternative formulations in this context:
Index function model: a threshold of the latent variable determines the ob-
served decision.
Random utility model: the decision is based on the comparison of the utilities
obtained from each alternative.
Index Function Model
Let y* be the latent variable of interest, such that:

y* = x'β + u,    u ∼ F(·).

We only observe:

y = 1 if y* > 0,    y = 0 if y* ≤ 0.

The probability of observing y = 1 is:

Pr[y = 1|x] = Pr[x'β + u > 0] = Pr[u > −x'β] = F(x'β),

where the last equality uses the symmetry of f(·).

This model delivers the Logit if F (·) = Λ(·) and the Probit if F (·) = Φ(·).
The threshold is normalized to 0 because it is not separately identified from the constant.
Similarly, all parameters are identified only up to scale, since Pr[u > −x'β] = Pr[ua > −x'βa]
for any a > 0 ⇒ we have to impose a restriction on the variance of u (see the sketch below).
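A simulation sketch of the scale normalization (reuses probit_negll from the Probit sketch; the data-generating values are mine): a Probit that fixes Var(u) = 1 recovers β/σ_u, not β.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
N = 100_000
x = np.column_stack([np.ones(N), rng.normal(size=N)])
beta_true, sigma_u = np.array([0.5, 1.0]), 2.0
y_star = x @ beta_true + sigma_u * rng.normal(size=N)   # latent variable
y = (y_star > 0).astype(float)

res = minimize(probit_negll, np.zeros(2), args=(x, y), method="BFGS")
print(res.x, beta_true / sigma_u)   # estimate converges to beta/sigma_u = [0.25, 0.5]
```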


Random Utility Model
Consider the utility of the two alternatives:
U0 = V0 + ε0 ,
U1 = V1 + ε1 .
We only observe y = 1 if U_1 > U_0 and y = 0 otherwise.

The probability of observing y = 1 is:

Pr[y = 1|x] = Pr[U_1 > U_0 |x] = Pr[ε_0 − ε_1 < V_1 − V_0 |x] = F(V_1 − V_0).


We typically express V_1 − V_0 as a single index:

V_1 = x'β_1 and V_0 = x'β_0 ⇒ V_1 − V_0 = x'(β_1 − β_0).
V_1 = w'β_1 and V_0 = z'β_0 ⇒ V_1 − V_0 = x'(β_1 − β_0), with some β_jk = 0.
V_j = z_j'α + x'β_j for j = 0, 1 ⇒ V_1 − V_0 = (z_1 − z_0)'α + x'(β_1 − β_0).

Different distributional assumptions deliver different models:

ε_1, ε_0 ∼ i.i.d. Normal ⇒ ε_0 − ε_1 ∼ Normal; variance not identified (Probit).
f(ε_j) = e^{−ε_j} exp{−e^{−ε_j}}, j = 0, 1 (i.e. Type I EV) ⇒ ε_0 − ε_1 ∼ Λ(·) (Logit).



Endogenous Variables



Endogeneity

When the number of endogenous regressors is small enough, we proceed with a
Multivariate Probit model.

We discuss two cases:


Continuous endogenous regressor.
Discrete endogenous regressor.

When Probit is infeasible, we may use GMM.


Continuous endogenous variable
Consider the model:
y_1 = 1{x'α + β y_2 + ε ≥ 0},
y_2 = z'γ + ν,

where z = (x', z_2')' and (ε, ν)' | z ∼ N( 0, [ 1, ρσ ; ρσ, σ² ] ).

Endogeneity arises when ρ ≠ 0.
As in Exercise 1, we can factorize the conditional likelihood: f(y_1 |z, y_2 ) f(y_2 |z).

Then, given ε|z, ν ∼ N( (ρ/σ)ν, 1 − ρ² ), the log-likelihood is:

L_N(α, β, ρ, σ, γ) ∝ Σ_{i=1}^N { y_{1i} ln Φ(a_i) + (1 − y_{1i}) ln[1 − Φ(a_i)] − ln σ − (y_{2i} − z_i'γ)² / (2σ²) },

where a_i = [ x_i'α + β y_{2i} + (ρ/σ)(y_{2i} − z_i'γ) ] / √(1 − ρ²).

We can estimate it by FIML or LIML; a FIML sketch follows.
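A FIML sketch of this likelihood (the packing of theta and the transforms that keep ρ ∈ (−1, 1) and σ > 0 are my choices):

```python
import numpy as np
from scipy.stats import norm

def fiml_negll(theta, y1, y2, x, z):
    """Negative of the slide's log-likelihood; theta stacks
    (alpha, beta, gamma, atanh(rho), ln(sigma))."""
    ka, kg = x.shape[1], z.shape[1]
    alpha, beta = theta[:ka], theta[ka]
    gamma = theta[ka + 1:ka + 1 + kg]
    rho, sigma = np.tanh(theta[-2]), np.exp(theta[-1])
    v = y2 - z @ gamma                                   # reduced-form residual
    a = (x @ alpha + beta * y2 + (rho / sigma) * v) / np.sqrt(1 - rho**2)
    F = np.clip(norm.cdf(a), 1e-12, 1 - 1e-12)
    ll = (y1 * np.log(F) + (1 - y1) * np.log(1 - F)
          - np.log(sigma) - v**2 / (2 * sigma**2))
    return -ll.sum()

# Minimize with scipy.optimize.minimize over the stacked theta.
```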


Discrete endogenous variable
Consider the model:
y_1 = 1{x'α + β y_2 + ε ≥ 0},
y_2 = 1{z'γ + ν ≥ 0},

where z = (x', z_2')' and (ε, ν)' | z ∼ N( 0, [ 1, ρ ; ρ, 1 ] ).

Endogeneity arises when ρ ≠ 0. This is the bivariate binomial Probit.


There is no LIML procedure here.
The conditional log-likelihood is:

L_N(α, β, γ, ρ) = Σ_{i=1}^N { y_{1i} y_{2i} ln P_{11i} + (1 − y_{1i}) y_{2i} ln P_{01i}
                            + y_{1i}(1 − y_{2i}) ln P_{10i} + (1 − y_{1i})(1 − y_{2i}) ln P_{00i} },

where:
P_{00} ≡ Pr[y_1 = 0, y_2 = 0|z] = Φ_2(−x'α, −z'γ; ρ).

P_{10} ≡ Pr[y_1 = 1, y_2 = 0|z] = Φ(−z'γ) − P_{00}.

P_{01} ≡ Pr[y_1 = 0, y_2 = 1|z] = Φ(−x'α − β) − Φ_2(−x'α − β, −z'γ; ρ).

P_{11} ≡ Pr[y_1 = 1, y_2 = 1|z] = 1 − P_{00} − P_{10} − P_{01}.
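A sketch of this log-likelihood built from the four cell probabilities, with Φ_2 evaluated through scipy's multivariate normal CDF (the per-observation loop is slow but transparent):

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

def biprobit_negll(theta, y1, y2, x, z):
    """Negative bivariate probit log-likelihood; theta stacks
    (alpha, beta, gamma, atanh(rho))."""
    ka, kg = x.shape[1], z.shape[1]
    alpha, beta = theta[:ka], theta[ka]
    gamma, rho = theta[ka + 1:ka + 1 + kg], np.tanh(theta[-1])
    cov = np.array([[1.0, rho], [rho, 1.0]])
    xa, zg = x @ alpha, z @ gamma

    def Phi2(a, b):  # bivariate normal CDF with correlation rho
        return multivariate_normal.cdf([a, b], mean=[0.0, 0.0], cov=cov)

    p00 = np.array([Phi2(-a, -g) for a, g in zip(xa, zg)])
    p10 = norm.cdf(-zg) - p00
    p01 = norm.cdf(-(xa + beta)) - np.array(
        [Phi2(-a - beta, -g) for a, g in zip(xa, zg)])
    p11 = 1.0 - p00 - p10 - p01

    P = np.select([(y1 == 0) & (y2 == 0), (y1 == 1) & (y2 == 0),
                   (y1 == 0) & (y2 == 1), (y1 == 1) & (y2 == 1)],
                  [p00, p10, p01, p11])
    return -np.log(np.clip(P, 1e-12, 1.0)).sum()
```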


Moment Estimation

When ML is infeasible, we rely on moment-based estimation.


If the number of external instruments equals the number of endogenous variables
(the problem is just identified), the GMM estimator solves:

Σ_{i=1}^N (y_i − p_i) z_i = 0.

If the problem is overidentified, we minimize a quadratic form in this expression.
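A just-identified sketch: solve the moment equations numerically; here p_i uses a Probit index, which is one possible choice (names follow the previous slides):

```python
import numpy as np
from scipy.optimize import root
from scipy.stats import norm

def gmm_moments(theta, y1, y2, x, z):
    """Stacked moments sum_i (y_i - p_i) z_i, with p_i = Phi(x_i'alpha + beta*y2_i).
    Just identified when z has as many columns as theta has elements."""
    p = norm.cdf(x @ theta[:-1] + theta[-1] * y2)
    return z.T @ (y1 - p)

# sol = root(gmm_moments, x0=np.zeros(x.shape[1] + 1), args=(y1, y2, x, z))
```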
