
Chapter 2: Discrete Choice

Joan Llull

Quantitative Statistical Methods II, Part II


Barcelona School of Economics
Introduction

In this chapter we analyze some models for discrete outcomes: models in which one
of m mutually exclusive categories is selected.
This section: binary outcomes.

For notational convenience, define y = 1{A is selected}:

It allows us to write the likelihood in a very compact way.
What happens with N^{-1} Σ_{i=1}^N y_i? Why is it important?



The linear probability model
Simple approach: linear regression model.
OLS regression of y on x provides consistent estimates of sample-average marginal
effects ⇒ nice exploration tool.

Becoming popular in the treatment effects literature.


Two important drawbacks:
Predicted probabilities p̂(x) = x'β̂ are not bounded between 0 and 1.
Error term is heteroscedastic and has discrete support (given x).
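A minimal simulation sketch of both points, OLS as a marginal-effects exploration tool and the unbounded predictions (the data-generating process and all names are illustrative, not from the slides):

```python
import numpy as np

# Simulate binary data, then fit the linear probability model by OLS.
rng = np.random.default_rng(0)
N = 5_000
x = np.column_stack([np.ones(N), rng.normal(size=N)])  # constant + one regressor
y = (x @ np.array([0.2, 1.0]) + rng.logistic(size=N) > 0).astype(float)

beta_hat, *_ = np.linalg.lstsq(x, y, rcond=None)  # OLS coefficients
p_hat = x @ beta_hat                              # fitted "probabilities"
print("slope (approx. sample-average marginal effect):", beta_hat[1])
print("share of predictions outside [0, 1]:",
      np.mean((p_hat < 0) | (p_hat > 1)))
```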




The General Binary Outcome Model
The conditional probability of choosing A given x is p(x) ≡ Pr[y = 1|x] = F(x'β).

These are single-index models.


This general notation is useful to derive general results that are common across
models.
This model includes the linear model, Probit, and Logit as special cases:
Linear model: F(x'β) = x'β.
Logit: F(x'β) = Λ(x'β) = e^{x'β} / (1 + e^{x'β}).
Probit: F(x'β) = Φ(x'β) = ∫_{−∞}^{x'β} φ(z) dz.
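The three special cases written as Python callables, a small sketch to make the single-index structure F(x'β) concrete (function names are mine):

```python
import numpy as np
from scipy.stats import norm

def F_linear(z):  # linear model: F(z) = z
    return np.asarray(z, dtype=float)

def F_logit(z):   # logistic CDF: Lambda(z) = e^z / (1 + e^z)
    return 1.0 / (1.0 + np.exp(-np.asarray(z, dtype=float)))

def F_probit(z):  # standard normal CDF: Phi(z)
    return norm.cdf(z)

z = np.array([-2.0, 0.0, 2.0])  # index values x'beta
for F in (F_linear, F_logit, F_probit):
    print(F.__name__, F(z))
```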



Maximum Likelihood Estimation
Given the binary nature of the data, we know the distribution of the outcome: it is Bernoulli:

g(y|x) = p^y (1 − p)^{1−y} = { p if y = 1; 1 − p if y = 0 },

where p = F(x'β).

Therefore, the conditional log-likelihood is:


L_N(β) = Σ_{i=1}^N { y_i ln F(x_i'β) + (1 − y_i) ln[1 − F(x_i'β)] }.

And the first-order condition is:

∂L_N/∂β ≡ Σ_{i=1}^N [ (y_i − F(x_i'β̂)) / (F(x_i'β̂)(1 − F(x_i'β̂))) ] f(x_i'β̂) x_i = 0,

where f(z) ≡ ∂F(z)/∂z.

No explicit solution. Newton-Raphson converges quickly because the log-likelihood is
globally concave for the Probit and Logit (see the sketch below).
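A minimal Newton-Raphson sketch for the Logit case, where the score and Hessian have simple closed forms (this anticipates the simplification Λ'(z) = Λ(z)(1 − Λ(z)) shown on the Logit slide; names are illustrative):

```python
import numpy as np

def logit_mle(x, y, tol=1e-10, max_iter=50):
    """Newton-Raphson on the Logit log-likelihood (globally concave)."""
    beta = np.zeros(x.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(x @ beta)))      # Lambda(x_i'beta)
        score = x.T @ (y - p)                      # first-order condition
        hessian = -(x * (p * (1 - p))[:, None]).T @ x
        step = np.linalg.solve(hessian, score)     # Newton step: H^{-1} s
        beta = beta - step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Example with the simulated (x, y) from the earlier sketch:
# print(logit_mle(x, y))
```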
Consistency
We know that the distribution of y is Bernoulli ⇒ consistency additionally requires
p = F(x'β_0).

The true parameter vector is the solution of:


max_β E[ y ln F(x'β) + (1 − y) ln(1 − F(x'β)) ].

The first-order condition is:

E[ (y − F(x'β)) / (F(x'β)(1 − F(x'β))) · f(x'β) x ] = 0,

which is satisfied at β = β_0 under correct specification, p = F(x'β_0).



Asymptotic distribution

From Chapter 1: β̂ →_d N(β_0, Ω_0/N).

We may use the information matrix equality to get Ω_0:


Ω_0 = ( −E[ ∂²L/∂β∂β' ] )^{-1} = ( E[ (∂L/∂β)(∂L/∂β') ] )^{-1}
    = ( E[ f(x'β)² / (F(x'β)(1 − F(x'β))) · xx' ] )^{-1}.

Note that this is of the form E[ω xx']^{-1}.
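A plug-in sketch of this variance estimator for the Probit, where ω_i = φ(x_i'β̂)² / [Φ(x_i'β̂)(1 − Φ(x_i'β̂))] (assumes x and beta_hat are available from a previous fit):

```python
import numpy as np
from scipy.stats import norm

def probit_avar(x, beta_hat):
    """Estimated Var(beta_hat) = Omega_hat/N, Omega_hat = (mean of w_i x_i x_i')^{-1}."""
    z = x @ beta_hat
    F, f = norm.cdf(z), norm.pdf(z)
    w = f**2 / (F * (1.0 - F))               # omega_i
    A = (x * w[:, None]).T @ x / len(x)      # sample analogue of E[omega x x']
    return np.linalg.inv(A) / len(x)
```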



Marginal effects
Marginal effects are given by:

∂Pr[y = 1|x] / ∂x_k = f(x'β) β_k.

In the linear probability model, f(x'β) = 1.

In non-linear models, marginal effects depend on x (we can compute several alternatives).


Coefficients are still informative of the sign of the marginal effect.
Interesting property: ratios of marginal effects are constant:

(∂Pr[y = 1|x]/∂x_k) / (∂Pr[y = 1|x]/∂x_l) = f(x'β)β_k / (f(x'β)β_l) = β_k / β_l.

In the case of a dichotomous regressor, the marginal effect is:

F(x'_{−k} β_{−k} + β_k) − F(x'_{−k} β_{−k}).
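A sketch of the usual ways to report these quantities for a Probit: average marginal effects, effects at the mean, and the discrete change for a dummy regressor averaged over observations (x and beta_hat assumed available; the averaging choice for the dummy effect is mine):

```python
import numpy as np
from scipy.stats import norm

def probit_marginal_effects(x, beta_hat):
    ame = norm.pdf(x @ beta_hat).mean() * beta_hat         # average marginal effect
    mem = norm.pdf(x.mean(axis=0) @ beta_hat) * beta_hat   # effect at the mean of x
    return ame, mem

def probit_dummy_effect(x, beta_hat, k):
    """Effect of switching dichotomous regressor k from 0 to 1 (slide formula),
    averaged over the sample distribution of the other regressors."""
    x0, x1 = x.copy(), x.copy()
    x0[:, k], x1[:, k] = 0.0, 1.0
    return (norm.cdf(x1 @ beta_hat) - norm.cdf(x0 @ beta_hat)).mean()
```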



The Logit Model
The Logit model is given by:

F(x'β) = Λ(x'β) = e^{x'β} / (1 + e^{x'β}).
Nice property of the logistic function: ∂Λ(z)/∂z = Λ(z)(1 − Λ(z)).
Therefore, the ML estimator reduces to:
Σ_{i=1}^N ( y_i − Λ(x_i'β̂) ) x_i = 0.

And the asymptotic variance to:

Ω_0 = ( E[ Λ(x'β)(1 − Λ(x'β)) xx' ] )^{-1}.

Marginal effects are:

∂Pr[y = 1|x] / ∂x_k = Λ(x'β)(1 − Λ(x'β)) β_k.
And another interesting property: ln[ p / (1 − p) ] = x'β.
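Two quick numerical checks of these properties (a sketch reusing logit_mle and the simulated x, y from earlier sketches): with a constant in x, the zero score implies that the fitted probabilities average to ȳ, and the log-odds of p̂ reproduce the index x'β̂.

```python
import numpy as np

beta_hat = logit_mle(x, y)                     # from the Newton-Raphson sketch
p_hat = 1.0 / (1.0 + np.exp(-(x @ beta_hat)))

print(p_hat.mean(), y.mean())                  # equal: score w.r.t. the constant is zero
print(np.allclose(np.log(p_hat / (1 - p_hat)), x @ beta_hat))  # log-odds = x'beta
```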


The Probit Model
The Probit model is given by:

F(x'β) = Φ(x'β) = ∫_{−∞}^{x'β} φ(z) dz.

Therefore, the ML estimator is given by:


Σ_{i=1}^N [ (y_i − Φ(x_i'β̂)) / (Φ(x_i'β̂)(1 − Φ(x_i'β̂))) ] φ(x_i'β̂) x_i = 0.

And the asymptotic variance is:

Ω_0 = ( E[ φ(x'β)² / (Φ(x'β)(1 − Φ(x'β))) xx' ] )^{-1}.

Marginal effects are:

∂Pr[y = 1|x] / ∂x_k = φ(x'β) β_k.
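Unlike the Logit, the Probit score does not simplify, so in practice one can hand the log-likelihood to a generic optimizer. A minimal sketch (the clipping is only a numerical guard on the logs):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def probit_negll(beta, x, y):
    """Negative Probit log-likelihood."""
    F = np.clip(norm.cdf(x @ beta), 1e-12, 1 - 1e-12)
    return -np.sum(y * np.log(F) + (1 - y) * np.log(1 - F))

# res = minimize(probit_negll, np.zeros(x.shape[1]), args=(x, y), method="BFGS")
# beta_hat = res.x
```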



Latent Variable Representation
One way to give a more structural interpretation to the model is in terms of a
latent measure of utility.

A latent variable is a variable that is not completely observed.


Two alternative formulations in this context:
Index function model: a threshold of the latent variable determines the ob-
served decision.
Random utility model: the decision is based on the comparison of the utilities
obtained from each alternative.
Index Function Model
Let y* be the latent variable of interest, such that:

y* = x'β + u,    u ∼ F(·).

We only observe:

y = 1 if y* > 0,    y = 0 if y* ≤ 0.

The probability of observing y = 1 is:

Pr[y = 1|x] = Pr[x'β + u > 0] = Pr[u > −x'β] = F(x'β),

where the last equality uses the symmetry of f(·).

This model delivers the Logit if F (·) = Λ(·) and the Probit if F (·) = Φ(·).
The threshold is normalized to 0 because it is not separately identified from the constant.
Similarly, all parameters are identified only up to scale, since Pr[u > −x'β] = Pr[ua > −x'βa]
for any a > 0 ⇒ we have to impose a restriction on the variance of u (see the sketch below).
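A simulation sketch of the scale normalization (reuses probit_negll from the Probit sketch; the data-generating values are mine): a Probit that fixes Var(u) = 1 recovers β/σ_u, not β.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
N = 100_000
x = np.column_stack([np.ones(N), rng.normal(size=N)])
beta_true, sigma_u = np.array([0.5, 1.0]), 2.0
y_star = x @ beta_true + sigma_u * rng.normal(size=N)   # latent variable
y = (y_star > 0).astype(float)

res = minimize(probit_negll, np.zeros(2), args=(x, y), method="BFGS")
print(res.x, beta_true / sigma_u)   # estimate converges to beta/sigma_u = [0.25, 0.5]
```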


Random Utility Model
Consider the utility of the two alternatives:
U0 = V0 + ε0 ,
U1 = V1 + ε1 .
We only observe y = 1 if U_1 > U_0 and y = 0 otherwise.

The probability of observing y = 1 is:

Pr[y = 1|x] = Pr[U_1 > U_0 |x] = Pr[ε_0 − ε_1 < V_1 − V_0 |x] = F(V_1 − V_0).


We typically express V_1 − V_0 as a single index:

V_1 = x'β_1 and V_0 = x'β_0 ⇒ V_1 − V_0 = x'(β_1 − β_0).
V_1 = w'β_1 and V_0 = z'β_0 ⇒ V_1 − V_0 = x'(β_1 − β_0), with some β_jk = 0.
V_j = z_j'α + x'β_j for j = 0, 1 ⇒ V_1 − V_0 = (z_1 − z_0)'α + x'(β_1 − β_0).

Different distributional assumptions deliver different models:

ε_1, ε_0 ∼ i.i.d. Normal ⇒ ε_0 − ε_1 ∼ Normal; variance not identified (Probit).
f(ε_j) = e^{−ε_j} exp{−e^{−ε_j}}, j = 0, 1 (i.e. Type I EV) ⇒ ε_0 − ε_1 ∼ Λ(·) (Logit).



Endogenous Variables



Endogeneity

When the number of endogenous regressors is small enough, we proceed with a
Multivariate Probit model.

We discuss two cases:


Continuous endogenous regressor.
Discrete endogenous regressor.

When Probit is infeasible, we may use GMM.


Continuous endogenous variable
Consider the model:
y_1 = 1{x'α + β y_2 + ε ≥ 0},
y_2 = z'γ + ν,

where z = (x', z_2')' and (ε, ν)' | z ∼ N( 0, [ 1, ρσ ; ρσ, σ² ] ).

Endogeneity arises when ρ ≠ 0.
As in Exercise 1, we can factorize the conditional likelihood: f(y_1 |z, y_2 ) f(y_2 |z).

Then, given ε|z, ν ∼ N( (ρ/σ)ν, 1 − ρ² ), the log-likelihood is:

L_N(α, β, ρ, σ, γ) ∝ Σ_{i=1}^N { y_{1i} ln Φ(a_i) + (1 − y_{1i}) ln[1 − Φ(a_i)] − ln σ − (y_{2i} − z_i'γ)² / (2σ²) },

where a_i = [ x_i'α + β y_{2i} + (ρ/σ)(y_{2i} − z_i'γ) ] / √(1 − ρ²).

We can estimate it by FIML or LIML; a FIML sketch follows.
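A FIML sketch of this likelihood (the packing of theta and the transforms that keep ρ ∈ (−1, 1) and σ > 0 are my choices):

```python
import numpy as np
from scipy.stats import norm

def fiml_negll(theta, y1, y2, x, z):
    """Negative of the slide's log-likelihood; theta stacks
    (alpha, beta, gamma, atanh(rho), ln(sigma))."""
    ka, kg = x.shape[1], z.shape[1]
    alpha, beta = theta[:ka], theta[ka]
    gamma = theta[ka + 1:ka + 1 + kg]
    rho, sigma = np.tanh(theta[-2]), np.exp(theta[-1])
    v = y2 - z @ gamma                                   # reduced-form residual
    a = (x @ alpha + beta * y2 + (rho / sigma) * v) / np.sqrt(1 - rho**2)
    F = np.clip(norm.cdf(a), 1e-12, 1 - 1e-12)
    ll = (y1 * np.log(F) + (1 - y1) * np.log(1 - F)
          - np.log(sigma) - v**2 / (2 * sigma**2))
    return -ll.sum()

# Minimize with scipy.optimize.minimize over the stacked theta.
```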


Discrete endogenous variable
Consider the model:
y_1 = 1{x'α + β y_2 + ε ≥ 0},
y_2 = 1{z'γ + ν ≥ 0},

where z = (x', z_2')' and (ε, ν)' | z ∼ N( 0, [ 1, ρ ; ρ, 1 ] ).

Endogeneity arises when ρ ≠ 0. This is the bivariate binomial Probit.


There is no LIML procedure here.
The conditional log-likelihood is:

L_N(α, β, γ, ρ) = Σ_{i=1}^N { y_{1i} y_{2i} ln P_{11i} + (1 − y_{1i}) y_{2i} ln P_{01i}
                            + y_{1i}(1 − y_{2i}) ln P_{10i} + (1 − y_{1i})(1 − y_{2i}) ln P_{00i} },

where:
P_{00} ≡ Pr[y_1 = 0, y_2 = 0|z] = Φ_2(−x'α, −z'γ; ρ).

P_{10} ≡ Pr[y_1 = 1, y_2 = 0|z] = Φ(−z'γ) − P_{00}.

P_{01} ≡ Pr[y_1 = 0, y_2 = 1|z] = Φ(−x'α − β) − Φ_2(−x'α − β, −z'γ; ρ).

P_{11} ≡ Pr[y_1 = 1, y_2 = 1|z] = 1 − P_{00} − P_{10} − P_{01}.
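A sketch of this log-likelihood built from the four cell probabilities, with Φ_2 evaluated through scipy's multivariate normal CDF (the per-observation loop is slow but transparent):

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

def biprobit_negll(theta, y1, y2, x, z):
    """Negative bivariate probit log-likelihood; theta stacks
    (alpha, beta, gamma, atanh(rho))."""
    ka, kg = x.shape[1], z.shape[1]
    alpha, beta = theta[:ka], theta[ka]
    gamma, rho = theta[ka + 1:ka + 1 + kg], np.tanh(theta[-1])
    cov = np.array([[1.0, rho], [rho, 1.0]])
    xa, zg = x @ alpha, z @ gamma

    def Phi2(a, b):  # bivariate normal CDF with correlation rho
        return multivariate_normal.cdf([a, b], mean=[0.0, 0.0], cov=cov)

    p00 = np.array([Phi2(-a, -g) for a, g in zip(xa, zg)])
    p10 = norm.cdf(-zg) - p00
    p01 = norm.cdf(-(xa + beta)) - np.array(
        [Phi2(-a - beta, -g) for a, g in zip(xa, zg)])
    p11 = 1.0 - p00 - p10 - p01

    P = np.select([(y1 == 0) & (y2 == 0), (y1 == 1) & (y2 == 0),
                   (y1 == 0) & (y2 == 1), (y1 == 1) & (y2 == 1)],
                  [p00, p10, p01, p11])
    return -np.log(np.clip(P, 1e-12, 1.0)).sum()
```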


Moment Estimation

When ML is infeasible, we rely on moment-based estimation.


If the number of external instruments equals the number of endogenous variables
(the problem is just identified), the GMM estimator solves:

Σ_{i=1}^N (y_i − p_i) z_i = 0.

If the problem is overidentified, we minimize a quadratic form in this expression.
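A just-identified sketch: solve the moment equations numerically; here p_i uses a Probit index, which is one possible choice (names follow the previous slides):

```python
import numpy as np
from scipy.optimize import root
from scipy.stats import norm

def gmm_moments(theta, y1, y2, x, z):
    """Stacked moments sum_i (y_i - p_i) z_i, with p_i = Phi(x_i'alpha + beta*y2_i).
    Just identified when z has as many columns as theta has elements."""
    p = norm.cdf(x @ theta[:-1] + theta[-1] * y2)
    return z.T @ (y1 - p)

# sol = root(gmm_moments, x0=np.zeros(x.shape[1] + 1), args=(y1, y2, x, z))
```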
