
# Econometrics 2 Fall 2005

Likelihood Analysis of
Binary Choice Models
Heino Bohn Nielsen

1 of 18

## Introduction

In some situations, the object of interest is a binary variable:

- Do married women have a paid job or not?
- Which families own their own car?
- Which students will pass the exam?

For the first case we could define

$$y_i = \begin{cases} 1 & \text{if she has a paid job} \\ 0 & \text{if she doesn't.} \end{cases}$$

There are often applications in micro-econometrics: explaining the choices of individuals and firms.

2 of 18

## Outline of the Lecture

(1) Interpretation of a linear regression model, $y_i = x_i'\beta + \varepsilon_i$.
(2) Alternative approaches: binary choice models.
(3) Motivation: a latent variable model.
(4) Likelihood analysis.
(5) Measuring the goodness-of-fit.
(6) Empirical example: teaching economics.

3 of 18

## The Simple Linear Probability Model

Consider a linear regression

$$y_i = x_i'\beta + \varepsilon_i, \quad i = 1, 2, ..., N.$$

We assume that $E[\varepsilon_i \mid x_i] = 0$ such that $E[y_i \mid x_i] = x_i'\beta$.

Since $y_i$ is binary, the conditional expectation can be found as

$$x_i'\beta = E[y_i \mid x_i] = 1 \cdot \text{Prob}(y_i = 1 \mid x_i) + 0 \cdot \text{Prob}(y_i = 0 \mid x_i) = \text{Prob}(y_i = 1 \mid x_i).$$

The derivative, $\beta$, is interpretable as the change in the probability,

$$\frac{\partial \, \text{Prob}(y_i = 1 \mid x_i)}{\partial x_i} = \beta,$$

and not the change in $y_i$.
4 of 18
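As a quick illustration (a hypothetical simulation, not part of the lecture), the linear probability model can be estimated by OLS on a binary outcome. With an intercept, the fitted values average to the sample frequency of $y_i = 1$, but nothing keeps individual fitted values inside $[0, 1]$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.uniform(-3.0, 3.0, size=n)

# True success probability, clipped so the LPM is well defined
p_true = np.clip(0.5 + 0.25 * x, 0.0, 1.0)
y = (rng.uniform(size=n) < p_true).astype(float)

# OLS of the binary y on a constant and x
X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta_hat

# With an intercept, the fitted values average to the sample mean of y,
# i.e. the estimated Prob(y = 1) averaged over the sample
print(beta_hat, fitted.min(), fitted.max())
```

The slope and the true probability function are both made up for this sketch; the point is only the mechanics of interpreting the OLS fit as a probability.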

## Complications...

(1) Since $x_i'\beta$ is a probability, we need

$$0 \le x_i'\beta \le 1.$$

This requires that $x_i$ is bounded or that $\beta$ is restricted in some way.

(2) The error term is highly non-normal and heteroskedastic. Given $x_i$, the error takes only two values:

$$\varepsilon_i = \begin{cases} 1 - x_i'\beta & \text{with prob. } x_i'\beta & (y_i = 1) \\ -x_i'\beta & \text{with prob. } 1 - x_i'\beta & (y_i = 0). \end{cases}$$

The variance is

$$V[\varepsilon_i \mid x_i] = (1 - x_i'\beta)^2 \, x_i'\beta + (x_i'\beta)^2 (1 - x_i'\beta) = x_i'\beta \, (1 - x_i'\beta),$$

which depends on $x_i$ and $\beta$.

5 of 18
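The variance formula above can be checked numerically for a few values of $p = x_i'\beta$ (a minimal check, not from the slides):

```python
# For a given p = Prob(y=1|x), the LPM error is 1-p with prob. p and -p with prob. 1-p.
# Its variance, E[eps^2] since E[eps] = 0, collapses to p*(1-p).
for p in [0.1, 0.3, 0.5, 0.9]:
    var = (1 - p) ** 2 * p + p ** 2 * (1 - p)
    assert abs(var - p * (1 - p)) < 1e-12
print("variance identity holds")
```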

## Binary Choice Models

As an alternative, consider a class of models

$$\text{Prob}(y_i = 1 \mid x_i) = F(x_i'\beta)$$

for some link function $F(\cdot)$ that always lies between 0 and 1.

[Figure: $\text{Prob}(y_i = 1 \mid x_i) = F(x_i'\beta)$ as an S-shaped curve in $x_i$, compared with the linear prediction $y_i = x_i'\beta$.]
6 of 18

## Choice of Link Functions

All distribution functions are valid as link functions. Common choices are:

(1) The normal distribution, leading to the so-called probit model:

$$F(w) = \int_{-\infty}^{w} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{t^2}{2}\right) dt.$$

(2) The logistic distribution, leading to the logit model:

$$F(w) = \frac{e^w}{1 + e^w}.$$

[Figure: density functions (standardized) and distribution functions (standardized) for the normal and logistic distributions.]
7 of 18
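Both link functions are easy to evaluate with the standard library (the probit CDF via the error function); a small sketch:

```python
import math

def probit_link(w: float) -> float:
    """Standard normal CDF, F(w) = Phi(w), written via the error function."""
    return 0.5 * (1.0 + math.erf(w / math.sqrt(2.0)))

def logit_link(w: float) -> float:
    """Logistic CDF, F(w) = e^w / (1 + e^w)."""
    return math.exp(w) / (1.0 + math.exp(w))

# Both are proper distribution functions with F(0) = 1/2
print(probit_link(0.0), logit_link(0.0))  # → 0.5 0.5
```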

## Interpretation of Coefficients

The signs of the coefficients are directly interpretable, but not the magnitudes. To interpret the model we can look at the marginal effects, or the slopes:

$$\frac{\partial \, \text{Prob}(y_i = 1 \mid x_i)}{\partial x_{ik}} = \frac{\partial F(x_i'\beta)}{\partial x_{ik}} = F'(x_i'\beta) \, \beta_k.$$

They depend on the values of $x_i$. They are often evaluated at
(a) the sample mean, or
(b) a standard individual defined by characteristics $\bar{x}$.

The marginal effects depend on $F'(x_i'\beta)$. To have the same marginal effects in a logit and a probit model, the coefficients have to be different. Note that

$$F'_{\text{probit}}(0) = \frac{1}{\sqrt{2\pi}} \approx 0.4, \qquad F'_{\text{logit}}(0) = \frac{\exp(0)}{(1 + \exp(0))^2} = 0.25,$$

so there is a scale difference of $0.4/0.25 = 1.6$.
8 of 18
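The 1.6 scale factor can be verified directly from the two densities at zero (a quick check, not part of the slides):

```python
import math

# Density of N(0,1) at 0: 1/sqrt(2*pi) ~ 0.399
probit_slope = 1.0 / math.sqrt(2.0 * math.pi)
# Density of the standard logistic distribution at 0: e^0/(1+e^0)^2 = 0.25
logit_slope = math.exp(0.0) / (1.0 + math.exp(0.0)) ** 2

print(round(probit_slope, 3), logit_slope, round(probit_slope / logit_slope, 2))
# → 0.399 0.25 1.6
```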

The derivatives are only approximations if $x_k$ is a dummy variable. An alternative is to calculate the discrete effect of a specific change:

$$\text{Prob}(y_i = 1 \mid x) - \text{Prob}(y_i = 1 \mid x + \Delta) = F(x'\beta) - F((x + \Delta)'\beta).$$

For a specific variable $x_1$ this could be

$$F(\beta_0 + x_1\beta_1 + x_2\beta_2 + ... + x_k\beta_k) - F(\beta_0 + (x_1 + 1)\beta_1 + x_2\beta_2 + ... + x_k\beta_k),$$

e.g. a dummy taking values $x_1 = 0$ versus $x_1 = 1$.

9 of 18
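A small logit illustration of the point that the derivative only approximates the discrete effect of a dummy; the index value and the dummy coefficient below are made up for illustration:

```python
import math

def logit_cdf(w):
    return 1.0 / (1.0 + math.exp(-w))

index0 = -0.5   # hypothetical x'beta with the dummy x1 = 0
beta1 = 1.0     # hypothetical coefficient on the dummy

# Derivative-based approximation: F'(x'beta) * beta1
approx = logit_cdf(index0) * (1.0 - logit_cdf(index0)) * beta1
# Exact discrete effect: F(x'beta + beta1) - F(x'beta)
exact = logit_cdf(index0 + beta1) - logit_cdf(index0)

print(approx, exact)  # the two differ (≈ 0.235 vs ≈ 0.245 here)
```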

## Underlying Latent Model

A particular way to motivate the binary choice model is the so-called latent variable representation.

Consider a woman, $i$, and consider her decision to work or not to work. Assume that the utility gain from starting to work, $y_i^*$, is a function of observed characteristics, $x_i$, and unobserved characteristics, $\varepsilon_i$, i.e.

$$y_i^* = x_i'\beta + \varepsilon_i.$$

Note that $y_i^*$ is unobservable. It is called a latent variable.

Assume that the distribution of $\varepsilon_i \mid x_i$ is symmetric, i.e. $F(-w) = 1 - F(w)$.

10 of 18

We observe only the binary choice

$$y_i = \begin{cases} 1 & \text{if } y_i^* > 0 \\ 0 & \text{otherwise.} \end{cases}$$

We can calculate the probability:

$$\begin{aligned}
\text{Prob}(y_i = 1 \mid x_i) &= \text{Prob}(y_i^* > 0 \mid x_i) \\
&= \text{Prob}(x_i'\beta + \varepsilon_i > 0 \mid x_i) \\
&= \text{Prob}(\varepsilon_i > -x_i'\beta \mid x_i) \\
&= 1 - \text{Prob}(\varepsilon_i \le -x_i'\beta \mid x_i) \\
&= \text{Prob}(\varepsilon_i \le x_i'\beta \mid x_i) \\
&= F(x_i'\beta),
\end{aligned}$$

where the second-to-last step uses the symmetry of $F(\cdot)$.

The binary choice model can therefore be written in the latent variable representation. For the probit model that is

$$y_i^* = x_i'\beta + \varepsilon_i, \quad \varepsilon_i \sim N(0, 1), \qquad y_i = \begin{cases} 1 & \text{if } y_i^* > 0 \\ 0 & \text{if } y_i^* \le 0. \end{cases}$$

For the logit model, $N(0, 1)$ is replaced by a standard logistic distribution.

11 of 18
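The latent-variable representation can be simulated directly: drawing $\varepsilon \sim N(0,1)$ and thresholding reproduces the probit probability $\Phi(x'\beta)$. A sketch with an arbitrary index value:

```python
import math
import numpy as np

rng = np.random.default_rng(42)
index = 0.5                      # a fixed value of x'beta (arbitrary)
eps = rng.standard_normal(200_000)

# Observed choice: y = 1 if the latent y* = x'beta + eps is positive
y = (index + eps > 0).astype(float)

# Compare the simulated frequency with F(x'beta) = Phi(x'beta)
phi = 0.5 * (1.0 + math.erf(index / math.sqrt(2.0)))
print(y.mean(), phi)
```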

## Likelihood Analysis

We have a fully specified model, and a natural estimator is maximum likelihood. The density for the binary variable is the probability

$$f(y_i \mid x_i; \beta) = \text{Prob}(y_i = 1 \mid x_i; \beta)^{y_i} \, \text{Prob}(y_i = 0 \mid x_i; \beta)^{1 - y_i}.$$

Assuming that the individuals are independent, we have a likelihood function:

$$L(\beta) = f(y_1, ..., y_N \mid \beta) = \prod_{i=1}^{N} \text{Prob}(y_i = 1 \mid x_i; \beta)^{y_i} \, \text{Prob}(y_i = 0 \mid x_i; \beta)^{1 - y_i} = \prod_{i=1}^{N} F(x_i'\beta)^{y_i} \left(1 - F(x_i'\beta)\right)^{1 - y_i},$$

with log-likelihood

$$\log L(\beta) = \sum_{i=1}^{N} \left[ y_i \log\left(F(x_i'\beta)\right) + (1 - y_i) \log\left(1 - F(x_i'\beta)\right) \right].$$

The f.o.c. is $s(\beta) = 0$, but the likelihood equations have no analytical solution, so the maximum must be found numerically.

12 of 18
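Since $s(\beta) = 0$ has no closed-form solution, the maximum is found iteratively. A minimal Newton-Raphson sketch for the logit case (my own implementation, not code from the lecture):

```python
import numpy as np

def logit_mle(y, X, tol=1e-10, max_iter=100):
    """Maximize the logit log-likelihood by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))   # F(x_i'beta)
        score = X.T @ (y - p)                   # s(beta) = d log L / d beta
        hessian = -(X * (p * (1 - p))[:, None]).T @ X
        step = np.linalg.solve(hessian, score)  # Newton step H^{-1} s
        beta = beta - step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Constant-only model with N1 = 11 successes out of N = 32:
# the MLE satisfies F(beta0) = N1/N, i.e. beta0 = log(11/21)
y = np.array([1.0] * 11 + [0.0] * 21)
X = np.ones((32, 1))
beta0 = logit_mle(y, X)[0]
print(round(beta0, 3))  # → -0.647
```

The constant-only check mirrors the binomial model used for $\log L_0$ later, and the value $-0.647$ matches the constant-only logit (Model C) in the empirical example.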

## Measuring Goodness-of-fit

The idea of $R^2$ is to compare with a model with only a constant. Let

- $\log L_1$ be the maximized log-likelihood function,
- $\log L_0$ be the maximized value in a model with only a constant term.

Two possible goodness-of-fit measures are

$$\text{McFadden } R^2 = 1 - \frac{\log L_1}{\log L_0}, \qquad \text{Pseudo } R^2 = 1 - \frac{1}{1 + 2\left(\log L_1 - \log L_0\right)/N}.$$

Note that in the model with only a constant, $\text{Prob}(y_i = 1) = p$. This is a simple binomial model and $\hat{p}_{ML} = N_1/N$, where $N_1 = \sum y_i$. In this case $\log L_0$ is given by

$$\log L_0 = N_1 \log\left(\frac{N_1}{N}\right) + (N - N_1) \log\left(1 - \frac{N_1}{N}\right).$$
13 of 18
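With $N = 32$ and $N_1 = 11$ (the numbers from the empirical example below) these formulas can be evaluated directly:

```python
import math

N, N1 = 32, 11
logL0 = N1 * math.log(N1 / N) + (N - N1) * math.log(1 - N1 / N)
print(round(logL0, 3))  # → -20.592

# Goodness-of-fit for a fitted model with log L1 = -12.890 (the logit below)
logL1 = -12.890
mcfadden = 1 - logL1 / logL0
pseudo = 1 - 1 / (1 + 2 * (logL1 - logL0) / N)
print(round(mcfadden, 3), round(pseudo, 3))  # → 0.374 0.325
```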

## Predicted Values

We can construct predictions from the probabilities:

$$\hat{y}_i = \begin{cases} 1 & \text{if } F(x_i'\hat{\beta}) > 1/2 \\ 0 & \text{if } F(x_i'\hat{\beta}) \le 1/2. \end{cases}$$

To measure the goodness-of-fit we can compare $y_i$ and $\hat{y}_i$ in a simple cross table:

| | $y_i = 0$ | $y_i = 1$ | Total |
|---|---|---|---|
| $\hat{y}_i = 0$ | $n_{00}$ | $n_{10}$ | $n_0$ |
| $\hat{y}_i = 1$ | $n_{01}$ | $n_{11}$ | $n_1$ |
| Total | $N_0$ | $N_1$ | $N$ |

Let $\hat{p} = N_1/N > 1/2$. Then a simple model saying $\hat{y}_i = 1$ for all $i$ will have $N_1/N$ correct predictions. To beat that, we should have at least

$$\frac{n_{00} + n_{11}}{N} > \frac{N_1}{N}.$$

14 of 18

## Empirical Example: Teaching Economics!

We analyze whether a new way to teach economics (called PSI) is effective or not. We have data for 32 classes, of which 14 were exposed to PSI. The data set includes:

| Variable | Description | Mean |
|---|---|---|
| GRADE | Whether the students' grades improved (0/1) | 0.344 |
| GPA | Grade point average | 3.117 |
| TUCE | Pretest score (entering knowledge of material) | 21.938 |
| PSI | Whether the class was exposed to PSI (0/1) | 0.438 |

We want to estimate a binary choice model, i.e.

$$\text{Prob}(\text{GRADE}_i = 1 \mid x_i) = F(x_i'\beta).$$

15 of 18

## Results of Linear Regression, Logit, and Probit

| Variable | Linear: Coef. (t-ratio) | Linear: Deriv. | Logit: Coef. (t-ratio) | Logit: Deriv. | Probit: Coef. (t-ratio) | Probit: Deriv. |
|---|---|---|---|---|---|---|
| Constant | -1.498 (2.75) | ... | -13.021 (2.64) | ... | -7.452 (2.90) | ... |
| GPA | 0.464 (2.78) | 0.464 | 2.826 (2.24) | 0.534 | 1.626 (2.36) | 0.533 |
| TUCE | 0.010 (0.55) | 0.010 | 0.095 (0.67) | 0.018 | 0.052 (0.64) | 0.017 |
| PSI | 0.379 (2.38) | 0.379 | 2.379 (2.23) | 0.449 | 1.426 (2.43) | 0.468 |
| Log-likelihood | -12.978 | | -12.8896 | | -12.8188 | |
| N | 32 | | 32 | | 32 | |
| Pseudo R² | ... | | 0.325 | | 0.327 | |
| McFadden R² | ... | | 0.374 | | 0.377 | |

Note that the results are very close (in terms of derivatives). Note the scale difference, e.g. for PSI:

$$\frac{\hat{\beta}_{\text{Logit}}}{\hat{\beta}_{\text{Probit}}} = \frac{2.379}{1.426} = 1.668.$$
16 of 18

## Results for 3 Logit Estimations

| Variable | Model (A): Coef. | t-ratio | Model (B): Coef. | t-ratio | Model (C): Coef. | t-ratio |
|---|---|---|---|---|---|---|
| Constant | -13.021 | 2.64 | -10.656 | 2.63 | -0.647 | 1.74 |
| GPA | 2.826 | 2.24 | 2.538 | 2.15 | ... | ... |
| TUCE | 0.095 | 0.67 | 0.086 | 0.64 | ... | ... |
| PSI | 2.379 | 2.23 | ... | ... | ... | ... |
| Log-likelihood | -12.890 | | -15.991 | | -20.592 | |
| N | 32 | | 32 | | 32 | |
| Pseudo R² | 0.325 | | 0.223 | | ... | |
| McFadden R² | 0.374 | | 0.223 | | ... | |

PSI is significant. The t-ratio (Wald test) is 2.23, with a p-value of 0.025 in $N(0, 1)$. The likelihood ratio test for removing PSI is

$$LR = 2\left(\log L_A - \log L_B\right) = 2\left(-12.890 - (-15.991)\right) = 6.202.$$
17 of 18
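The LR statistic for dropping PSI (Model A versus Model B) follows from the reported log-likelihoods, and its $\chi^2(1)$ p-value can be computed with the complementary error function:

```python
import math

logL_A = -12.890   # logit with GPA, TUCE, PSI
logL_B = -15.991   # logit without PSI
lr = 2 * (logL_A - logL_B)

# For a chi-squared(1) variable X = Z^2, P(X > x) = erfc(sqrt(x/2))
p_value = math.erfc(math.sqrt(lr / 2))
print(round(lr, 3), round(p_value, 3))  # → 6.202 0.013
```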

For the estimated logit model, the cross table of actual and predicted values is:

| | $y_i = 0$ | $y_i = 1$ | Total |
|---|---|---|---|
| $\hat{y}_i = 0$ | 18 | 3 | 21 |
| $\hat{y}_i = 1$ | 3 | 8 | 11 |
| Total | 21 | 11 | 32 |

A naive model ($\hat{y}_i = 0$ for all $i$) would have $21/32 = 0.656$ correct predictions. The logit model has $26/32 = 0.813$.
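The hit rates follow directly from the table counts (reconstructed here as arrays purely for illustration):

```python
import numpy as np

# Reconstruct the 32 (y, y_hat) pairs from the cross-table counts
y    = np.array([0] * 18 + [1] * 3 + [0] * 3 + [1] * 8)
yhat = np.array([0] * 18 + [0] * 3 + [1] * 3 + [1] * 8)

hit_rate = (y == yhat).mean()    # (n00 + n11) / N = 26/32
naive = (y == 0).mean()          # naive model: predict y_hat = 0 for all i
print(hit_rate, naive)  # → 0.8125 0.65625
```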

To find the numerical effect of PSI we look at an average person, i.e. with GPA = 3.117 and TUCE = 21.938. The probabilities of an improvement in test scores are $F(\bar{x}'\hat{\beta})$:

$$\text{Prob}(\text{GRADE} = 1 \mid \text{GPA} = 3.117, \text{TUCE} = 21.938, \text{PSI} = 0) = 0.1068$$
$$\text{Prob}(\text{GRADE} = 1 \mid \text{GPA} = 3.117, \text{TUCE} = 21.938, \text{PSI} = 1) = 0.5633$$

18 of 18
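The two probabilities can be reproduced (up to coefficient rounding) from the reported logit estimates:

```python
import math

# Logit coefficients as reported: constant, GPA, TUCE, PSI
beta = [-13.021, 2.826, 0.095, 2.379]
gpa, tuce = 3.117, 21.938

def prob(psi):
    """F(x'beta) for the average person, with PSI = 0 or 1."""
    index = beta[0] + beta[1] * gpa + beta[2] * tuce + beta[3] * psi
    return 1.0 / (1.0 + math.exp(-index))

p0, p1 = prob(0), prob(1)
print(p0, p1)  # close to the reported 0.1068 and 0.5633
```

The small discrepancy from the slide's values comes from using three-decimal coefficients rather than the full-precision estimates.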