
Linear Probability, Logit, and Probit Models

Mei-Yuan Chen
Department of Finance
National Chung Hsing University

August 19, 2022

Qualitative Response Data

1 binary data: mode of commuting to work (0 for private and 1 for public transportation), default or not (0 for no default and 1 for default), buy or not buy (0 for not buying and 1 for buying), attend the class or not (0 for not attending and 1 for attending).
2 multinomial data: extent of agreement (1 for strongly disagree, 2 for disagree, 3 for no opinion, 4 for agree, and 5 for strongly agree), Woody's credit scoring (1 to 16 for the lowest to the highest credit rating).
3 truncated data: wage determination (observed wages must exceed the legal minimum wage).
4 limited data: limits on stock returns (daily price limits on stock prices).

This provides a very basic motivation for qualitative response and limited
dependent variable models in economics and finance.

Information and the Observational Rule

In many cases it is necessary to examine the relationship between the moments we wish to estimate and the observational rule which determines the information we as analysts observe. Assuming a latent variable y∗ with −∞ < y∗ < ∞,

E(y∗) = ∫_{−∞}^{∞} y∗ f(y∗) dy∗ = ∫_{−∞}^{α} y∗ f(y∗) dy∗ + ∫_{α}^{∞} y∗ f(y∗) dy∗.

Observational Rules

1 y = I(−∞ < y∗ < ∞)y∗: y = y∗.
2 y = I(y∗ > a): observe y = 0 or 1; binary response models such as the logit, probit, and linear probability models, with yi Bernoulli distributed.
3 y = I(y∗ > a)y∗, with no other information observed for y∗ ≤ a: truncated regression models.
4 y = I(y∗ > a)y∗ + I(y∗ ≤ a)a: observe y = y∗ or y = a; censored regression (Tobit) models.

Linear Probability Models

Usually, a linear regression model is considered for the conditional mean of the dependent variable Y given explanatory variables X1, X2, . . . , Xk, i.e.,

E(Y|X1, X2, . . . , Xk) = β10 X1 + β20 X2 + · · · + βk0 Xk = Xβ0.

Suppose Y is a Bernoulli random variable taking the value "1" or "0" with probability p and 1 − p, respectively. Then

E(Y) = 1 × P(Y = 1) + 0 × P(Y = 0) = P(Y = 1) = p.

Now consider the conditional distribution of Y given other random variables X; the conditional mean of Y given X is defined as

E(Y|X = x) = 1 · P(Y = 1|X = x) + 0 · P(Y = 0|X = x) = P(Y = 1|X = x),

which equals the conditional probability that Y = 1 given X = x. Therefore, a linear regression model with a dependent variable that is either zero or one is called the linear probability model.

Given a sample of random observations,
{(y1 , x′1 )′ , (y2 , x′2 )′ , . . . , (yn , x′n )′ }, a linear probability model is
specified as

yi = x′i β0 + ei , i = 1, . . . , n.

Then

if yi = 0, then ei = −x′i β0 ,
if yi = 1, then ei = 1 − x′i β0 .

E(ei) = (1 − x′i β0) · P(yi = 1|X = xi) + (0 − x′i β0) · P(yi = 0|X = xi)
      = [1 − P(yi = 1|X = xi)] · P(yi = 1|X = xi) − P(yi = 1|X = xi) · [1 − P(yi = 1|X = xi)] = 0

is constant for all i, and

var(ei) = E(e²i)
        = (1 − x′i β0)² · P(yi = 1|X = xi) + (0 − x′i β0)² · P(yi = 0|X = xi)
        = [1 − P(yi = 1|X = xi)]² · P(yi = 1|X = xi) + [−P(yi = 1|X = xi)]² · [1 − P(yi = 1|X = xi)]
        = [1 − P(yi = 1|X = xi)] · P(yi = 1|X = xi) · {[1 − P(yi = 1|X = xi)] + P(yi = 1|X = xi)}
        = [1 − P(yi = 1|X = xi)] · P(yi = 1|X = xi)
        = (1 − x′i β0)(x′i β0),

which is not constant across i: the errors are heteroskedastic.

Weighted Least Squares Estimator

To have a constant variance for the regression errors, Goldberger (1964) proposed a two-step weighted estimator for the linear probability model.

(1) Construct the weight wi by

wi = [1 / ((x′i β̂n)(1 − x′i β̂n))]^{1/2},

where β̂n is the OLS estimate.

(2) Perform OLS on the regression model with the transformed sample observations, i.e.,

β̂∗n = (Σ_{i=1}^n x∗i x∗i′)^{−1} (Σ_{i=1}^n x∗i y∗i),

where x∗i = wi xi and y∗i = wi yi.
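As a concrete illustration, here is a minimal Python sketch of the two steps above; the function name lpm_wls and the clipping of fitted values (to keep the weights well defined when x′i β̂n falls outside (0, 1)) are our own additions, not part of the original procedure.

```python
import numpy as np

def lpm_wls(y, X, eps=1e-6):
    """Two-step WLS for the linear probability model (Goldberger 1964)."""
    # Step 1: OLS to obtain fitted probabilities p_hat = X @ beta_ols
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    p_hat = np.clip(X @ beta_ols, eps, 1 - eps)  # keep weights well defined
    # Step 2: reweight by w_i = [1 / (p_hat * (1 - p_hat))]^{1/2}, then re-run OLS
    w = 1.0 / np.sqrt(p_hat * (1 - p_hat))
    beta_wls, *_ = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)
    return beta_wls
```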


Logit and Probit Transformations

As a probability value has to lie between 0 and 1, x′i β must be between 0 and 1. However, x′i β̂∗n is not always so constrained. To get rid of this constraint, a nonlinear transformation of pi = P(yi = 1|X = xi) is necessary. In the literature, two transformations are commonly used.

Logit Transformation

As 0 ≤ pi ≤ 1, pi/(1 − pi) lies in [0, ∞). There still exists a lower bound, 0. This lower bound can be eliminated by taking the natural logarithm, i.e., log[pi/(1 − pi)] ∈ (−∞, ∞). Note that log[pi/(1 − pi)] is large (small) when pi is large (small). Assume a linear regression model for the transformed dependent variable log[pi/(1 − pi)]:

log[pi/(1 − pi)] = x′i β0.

Taking exponentials on both sides of the above equation,

exp{log[pi/(1 − pi)]} = exp(x′i β0)
pi/(1 − pi) = exp(x′i β0)
pi = exp(x′i β0) − pi exp(x′i β0)
pi = exp(x′i β0) / (1 + exp(x′i β0)).

This expression is commonly referred to as the logistic function. The logistic function transforms x′i β0 into a real value between 0 and 1. This is the so-called logit model.

Given a random sample {(yi, xi), i = 1, . . . , n}, the likelihood function is

L(y|x, β) = Π_{i=1}^n p_i^{yi} (1 − pi)^{1−yi}
          = Π_{i=1}^n { [exp(x′i β)/(1 + exp(x′i β))]^{yi} [1 − exp(x′i β)/(1 + exp(x′i β))]^{1−yi} },

and the log-likelihood function is

log L(y|x, β) = Σ_{i=1}^n { yi log[exp(x′i β)/(1 + exp(x′i β))] + (1 − yi) log[1 − exp(x′i β)/(1 + exp(x′i β))] }.
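A minimal sketch of maximizing this log-likelihood numerically, assuming simulated data (the coefficient values, sample size, and seed are illustrative only):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one regressor
beta_true = np.array([0.5, -1.0])                      # illustrative values
y = rng.binomial(1, 1 / (1 + np.exp(-(X @ beta_true))))

def neg_loglik(beta):
    # log L = sum_i [ y_i * x_i'beta - log(1 + exp(x_i'beta)) ]
    xb = X @ beta
    return -np.sum(y * xb - np.log1p(np.exp(xb)))

res = minimize(neg_loglik, x0=np.zeros(2), method="BFGS")
print(res.x)  # MLE, close to beta_true in large samples
```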

Probit Transformation

pi = Φ(x′i β0) = ∫_{−∞}^{x′i β0} (1/√(2π)) exp(−u²/2) du.

Given a random sample {(yi, xi), i = 1, . . . , n}, the likelihood function is

L(y|x, β) = Π_{i=1}^n p_i^{yi} (1 − pi)^{1−yi} = Π_{i=1}^n [Φ(x′i β)]^{yi} [1 − Φ(x′i β)]^{1−yi},

and the log-likelihood function is

log L(y|x, β) = Σ_{i=1}^n { yi log[Φ(x′i β)] + (1 − yi) log[1 − Φ(x′i β)] }.
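In practice these likelihoods are rarely coded by hand; a sketch using statsmodels (simulated data, illustrative parameter values) might look like:

```python
import numpy as np
from scipy.stats import norm
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one regressor
beta_true = np.array([0.5, -1.0])                      # illustrative values
y = rng.binomial(1, norm.cdf(X @ beta_true))           # probit data-generating process

probit_res = sm.Probit(y, X).fit()
logit_res = sm.Logit(y, X).fit()
print(probit_res.params)  # close to beta_true
print(logit_res.params)   # logit coefficients are on a different scale
```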
Marginal Effect on the Probability

For the probit model,

∂P(yi = 1)/∂xij = (1/√(2π)) exp(−(x′i β0)²/2) β0j = ϕ(x′i β0) β0j,

and for the logit model,

∂P(yi = 1)/∂xij = [exp(x′i β0)/(1 + exp(x′i β0))] β0j − [exp(x′i β0)²/(1 + exp(x′i β0))²] β0j
                = [exp(x′i β0)/(1 + exp(x′i β0))] [1 − exp(x′i β0)/(1 + exp(x′i β0))] β0j
                = P(yi = 1)[1 − P(yi = 1)] β0j.
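A short sketch evaluating these marginal effects at a chosen point; the coefficient vectors and evaluation point below are illustrative (e.g., taken from fitted models such as those above):

```python
import numpy as np
from scipy.stats import norm

beta_probit = np.array([0.5, -1.0])   # illustrative probit coefficients
beta_logit = np.array([0.8, -1.7])    # illustrative logit coefficients
x_bar = np.array([1.0, 0.2])          # intercept and regressor value

# Probit: phi(x'beta) * beta_j
me_probit = norm.pdf(x_bar @ beta_probit) * beta_probit

# Logit: P(1 - P) * beta_j, with P = exp(x'beta) / (1 + exp(x'beta))
p = 1 / (1 + np.exp(-(x_bar @ beta_logit)))
me_logit = p * (1 - p) * beta_logit

print(me_probit, me_logit)
```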

Ordered Probit/Logit Models

The ordered probit/logit model considers dependent variable outcomes that have a natural (ordinal) ranking (i.e., the responses can be ordered in some meaningful fashion). Consider a random sample {(yi, xi), i = 1, . . . , n}, where yi = m for m = 1, . . . , M with a natural ordering (that is, m + 1 is in some sense better than m). The observed values are assumed to derive from an unobservable latent variable y∗i, where

y∗i = x′i β0 + ui, i = 1, . . . , n,

with

yi = m if αm−1 < y∗i ≤ αm for m = 1, . . . , M,

where α0 = −∞ and αM = ∞.
Then, the conditional probability of observing the mth category (i.e., yi = m) can be written as

P(yi = m|xi) = P(αm−1 < y∗i ≤ αm)
             = P(αm−1 < x′i β0 + ui ≤ αm)
             = P(αm−1 − x′i β0 < ui ≤ αm − x′i β0)
             = P(ui ≤ αm − x′i β0) − P(ui ≤ αm−1 − x′i β0)

for m = 1, . . . , M.

Ordered Probit/Logit Model

Assuming ui ∼ N(0, 1), the ordered probit takes the conditional probabilities as

P(yi = m|xi) = Φ(αm − x′i β0) − Φ(αm−1 − x′i β0),

where Φ(·) is the CDF of the standard normal distribution. If the conditional probability is instead specified through the logistic function (the ordered logit), then

P(yi = m|xi) = exp(αm − x′i β0)/(1 + exp(αm − x′i β0)) − exp(αm−1 − x′i β0)/(1 + exp(αm−1 − x′i β0)).

Likelihood Function of the Ordered Probit Model

Denote zim = 1(yi = m) for m = 1, . . . , M as the indicator that the ith observation has yi = m. The likelihood of the ith observation is

li = Π_{m=1}^M P(yi = m|xi)^{zim} = Π_{m=1}^M [Φ(αm − x′i β) − Φ(αm−1 − x′i β)]^{zim},

and the likelihood function of the random sample becomes

L(α, β) = Π_{i=1}^n Π_{m=1}^M [Φ(αm − x′i β) − Φ(αm−1 − x′i β)]^{zim}.

Taking logarithms, the log-likelihood function is

l(α, β) = Σ_{i=1}^n Σ_{m=1}^M zim ln[Φ(αm − x′i β) − Φ(αm−1 − x′i β)].
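This log-likelihood is implemented, for example, by OrderedModel in recent versions of statsmodels; a sketch on simulated data (the thresholds, slope, and seed below are illustrative):

```python
import numpy as np
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(2)
n = 500
x = rng.normal(size=n)
y_star = 1.0 * x + rng.normal(size=n)   # latent variable, slope beta = 1
y = np.digitize(y_star, [-0.5, 0.8])    # thresholds alpha_1, alpha_2 -> categories 0, 1, 2

# distr='probit' gives the ordered probit; the exog must not contain a constant
res = OrderedModel(y, x[:, None], distr="probit").fit(method="bfgs", disp=0)
print(res.params)  # slope estimate followed by threshold parameters
```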
MLE for the Ordered Probit Model

The MLE for (α, β0) is obtained by setting the score to zero, i.e., by solving the first-order conditions

∇α l(α, β) = 0 and ∇β l(α, β) = 0.

Measures for Goodness-of-Fit

(1) Squared correlation between y and ŷ:

R² = [Σ_{i=1}^n (yi − ȳn)(ŷi − ȳn)]² / { [Σ_{i=1}^n (yi − ȳn)²] [Σ_{i=1}^n (ŷi − ȳn)²] }.

(2) Measures based on the residual sum of squares:

R² = 1 − Σ_{i=1}^n (yi − ŷi)² / Σ_{i=1}^n (yi − ȳn)²
   = 1 − (n/(n1 n2)) Σ_{i=1}^n (yi − ŷi)²,   (Efron's measure, where n1 and n2 are the numbers of ones and zeros, so that Σ_{i=1}^n (yi − ȳn)² = n1 n2/n)
R² = 1 − Σ_{i=1}^n (yi − ŷi)² / Σ_{i=1}^n ŷi(1 − ŷi).   (Amemiya's measure)
(3) Measures based on the likelihood ratio, where Lu and Lc denote the likelihood values of the unrestricted (fitted) model and the constant-only model:

R² = 1 − (Lc/Lu)^{2/n}
R² = 1 − (log Lu)/(log Lc)   (McFadden's pseudo R²)
R² = 1 − 1/[1 + 2(log Lu − log Lc)/n].

(4) Proportion of correct predictions:

count R² = number of correct predictions / total number of observations,

in which a correct prediction means yi = ŷ∗i, where

ŷ∗i = 1 if ŷi > 0.5, and ŷ∗i = 0 if ŷi ≤ 0.5.
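A sketch computing several of these measures for a fitted logit; statsmodels exposes the fitted and constant-only log-likelihoods as llf and llnull (data simulated as in the earlier examples, values illustrative):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = rng.binomial(1, 1 / (1 + np.exp(-(X @ np.array([0.5, -1.0])))))

res = sm.Logit(y, X).fit(disp=0)
y_hat = res.predict(X)  # fitted probabilities

mcfadden = 1 - res.llf / res.llnull                              # 1 - log Lu / log Lc
efron = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
amemiya = 1 - np.sum((y - y_hat) ** 2) / np.sum(y_hat * (1 - y_hat))
count_r2 = np.mean(y == (y_hat > 0.5))                           # proportion correct
print(mcfadden, efron, amemiya, count_r2)
```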
Sample Selection Bias Models

Heckman's standard sample selection model is also called the "Tobit-2" model (Amemiya 1984, 1985). It consists of the following (unobserved) structural process:

yS∗i = xS′i βS + ϵSi   (1)
yO∗i = xO′i βO + ϵOi   (2)

where yS∗i is the realization of the latent selection "tendency" for individual i, and yO∗i is the latent outcome. xSi and xOi are explanatory variables for the selection and outcome equations, respectively. xS and xO may or may not be equal.

We observe

ySi = 0 if yS∗i < 0, and ySi = 1 otherwise;   (3)
yOi = 0 if ySi = 0, and yOi = yO∗i otherwise;   (4)

i.e., we observe the outcome only if the latent selection variable yS∗i is positive. The observed dependence between yO and xO can now be written as

E[yO|xO = xOi, xS = xSi, yS = 1] = xO′i βO + E[ϵO|ϵS ≥ −xS′i βS].   (5)

Sample Selection Bias

As yi = max{x′i β0 + ϵi, 0} and given that ϵi is normally distributed with mean 0 and variance σ0², E(yi|xi, yi > 0) and E(yi|xi) can be derived as follows.

Let y∗i = x′i β0 + ϵi; then y∗i ∼ N(x′i β0, σ0²). When y = 0, we have

P(y = 0) = P(y∗ ≤ 0) = Φ(−x′β0/σ0) = 1 − Φ(x′β0/σ0).

Given (y∗i|X = xi) ∼ N(x′i β0, σ0²) and ϵi = (y∗i − x′i β0)/σ0 ∼ N(0, 1), the density function of ϵi is

fϵ(ϵi) = (1/√(2π)) exp(−ϵ²i/2)
       = (1/√(2π)) exp(−(y∗i − x′i β0)²/(2σ0²))
       = σ0 (1/(√(2π)σ0)) exp(−(y∗i − x′i β0)²/(2σ0²))
       = σ0 fy∗i(y∗i) = ϕ((y∗i − x′i β0)/σ0).

Therefore,

fy∗i(y∗i) = (1/σ0) ϕ((y∗i − x′i β0)/σ0).

Then E(yi|xi, yi > 0) = E(y∗i|xi, y∗i > 0). As y∗i ∼ N(x′i β0, σ0²) and fy∗i(y∗i) = (1/σ0) ϕ((y∗i − x′i β0)/σ0), we have

f(y∗i|y∗i > 0) = fy∗i(y∗i) / P(y∗i ≥ 0) = [(1/σ0) ϕ((y∗i − x′i β0)/σ0)] / [1 − Φ(−x′i β0/σ0)].

Then,

E(y∗i|xi, y∗i > 0) = ∫_0^∞ y∗i [(1/σ0) ϕ((y∗i − x′i β0)/σ0)] / [1 − Φ(−x′i β0/σ0)] dy∗i
= ∫_0^∞ { [(y∗i − x′i β0)/σ0] ϕ((y∗i − x′i β0)/σ0) + (x′i β0/σ0) ϕ((y∗i − x′i β0)/σ0) } / [1 − Φ(−x′i β0/σ0)] dy∗i.


Let x = (y∗i − x′i β0)/σ0, so that σ0 dx = dy∗i. The solutions to the first and second parts of the integral are:

(1): [1/(1 − Φ(−x′i β0/σ0))] ∫_0^∞ [(y∗i − x′i β0)/σ0] ϕ((y∗i − x′i β0)/σ0) dy∗i
= [σ0/(1 − Φ(−x′i β0/σ0))] ∫_{−x′i β0/σ0}^∞ x ϕ(x) dx   (using d exp(g(x))/dx = exp(g(x))[dg(x)/dx])
= [σ0/(1 − Φ(−x′i β0/σ0))] ∫_{−x′i β0/σ0}^∞ [−dϕ(x)/dx] dx   (using dϕ(x)/dx = −xϕ(x))
= [σ0/(1 − Φ(−x′i β0/σ0))] [−ϕ(∞) + ϕ(−x′i β0/σ0)]
= σ0 ϕ(−x′i β0/σ0) / [1 − Φ(−x′i β0/σ0)] = σ0 ϕ(−x′i β0/σ0) / Φ(x′i β0/σ0),

(2): [x′i β0 / (1 − Φ(−x′i β0/σ0))] ∫_0^∞ (1/σ0) ϕ((y∗i − x′i β0)/σ0) dy∗i
= [x′i β0 / (1 − Φ(−x′i β0/σ0))] ∫_0^∞ [dΦ((y∗i − x′i β0)/σ0)/dy∗i] dy∗i
= [x′i β0 / (1 − Φ(−x′i β0/σ0))] [1 − Φ(−x′i β0/σ0)]
= x′i β0.

Using these results we have

E(yi|xi, yi > 0) = x′i β0 + σ0 ϕ(−x′i β0/σ0) / Φ(x′i β0/σ0).
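A quick Monte Carlo check of this truncated-mean formula, with illustrative values standing in for x′i β0 and σ0:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
mu, sigma = 0.5, 2.0                   # stand-ins for x_i'beta_0 and sigma_0
y_star = rng.normal(mu, sigma, size=1_000_000)

empirical = y_star[y_star > 0].mean()  # E(y* | y* > 0) by simulation
formula = mu + sigma * norm.pdf(-mu / sigma) / norm.cdf(mu / sigma)
print(empirical, formula)              # the two should agree closely
```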

The unconditional mean of yi is

E(yi|xi) = E(yi|xi, yi > 0) P(yi > 0) + E(yi|xi, yi = 0) P(y∗i ≤ 0)
= E(yi|xi, yi > 0) P(x′i β0 + ϵi > 0) + 0 × P(y∗i ≤ 0)
= [x′i β0 + σ0 ϕ(−x′i β0/σ0)/Φ(x′i β0/σ0)] Φ(x′i β0/σ0)
= x′i β0 Φ(x′i β0/σ0) + σ0 ϕ(−x′i β0/σ0).

It is clear that, since Φ(x′i β0/σ0) = 1 − Φ(−x′i β0/σ0), the non-censoring case is the one replacing 0 with −∞: 1 − Φ((−∞ − x′i β0)/σ0) = 1 and ϕ((−∞ − x′i β0)/σ0) = 0. That is, E(yi|xi, y∗i > −∞) = x′i β0.

Estimating the model above by OLS gives in general biased results, as E[ϵO|ϵS ≥ −xS′i βS] ≠ 0. Assume the error terms follow a bivariate normal distribution:

(ϵS, ϵO)′ ∼ N((0, 0)′, V), where V has unit variance for ϵS, variance σ² for ϵO, and covariance ρ.   (6)

Sample Selection Bias Models

Generally, a sample selection model can be expressed as

y1i = x′i β + ui,   (7)
y2i = w′i α + vi,   (8)
di = I(y2i > 0), i = 1, . . . , N,
(ui, vi)′ ∼ N(0, V), where V has diagonal elements σu² and 1 and off-diagonal element σuv,   (9)

where I(·) is an indicator function: I(·) = 1 if the argument (·) is true, and I(·) = 0 otherwise. y2i is unobservable; only di is observable. Meanwhile, y1i is observable only when di = 1; when di = 0, y1i is recorded as zero. In the literature, two estimation methods have been developed for the sample selection model: the Heckman two-stage method and the maximum likelihood method.

Heckman Two-Stage Estimator

Under di = 1, the expected value of y1i is

E(y1i|di = 1) = x′i β + σuv λ(w′i α),   (10)
λ(z) = ϕ(z)/Φ(z),   (11)

in which ϕ and Φ are the standard normal probability density and distribution functions. λ is called the hazard ratio or inverse Mills ratio.

Therefore, for the observations with di = 1, (7) can be represented as

y1i = x′i β + σuv λ(w′i α) + ϵi.   (12)

Since E(ϵi|di = 1) = 0, (12) can be used to estimate β.

Heckman's Two-Stage Procedure

1 Use MLE on the probit model (8) to estimate the parameter α. Denote the estimates as α̂ and substitute them into (11) to obtain λ̂i = λ(w′i α̂), i = 1, . . . , n.
2 For the observations with di = 1, estimate a regression of y1i on xi and λ̂i by least squares. Denote β̂ and σ̂uv as the estimates of β and σuv. The parameter σu² can be estimated by the following method suggested by Heckman (1979):

σ̂u² = (1/n1) Σ_{i=1}^{n1} [ê²i + σ̂uv λ̂i],   (13)

in which n1 is the number of observations with di = 1 and êi denotes the residuals from step 2. A sketch of the two-step procedure is given below.
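This sketch runs the two steps on simulated data; all names and parameter values are illustrative, and the selection and outcome equations each use one exogenous regressor plus an intercept:

```python
import numpy as np
from scipy.stats import norm
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 2000
w = np.column_stack([np.ones(n), rng.normal(size=n)])  # selection regressors
x = np.column_stack([np.ones(n), rng.normal(size=n)])  # outcome regressors
# Correlated errors (sigma_uv != 0) are what induce the selection bias
u, v = rng.multivariate_normal([0, 0], [[1.0, 0.6], [0.6, 1.0]], size=n).T
d = (w @ np.array([0.3, 1.0]) + v > 0).astype(int)     # selection indicator
y1 = x @ np.array([1.0, 2.0]) + u                      # observed only when d = 1

# Step 1: probit of d on w, then the inverse Mills ratio lambda_hat
alpha_hat = sm.Probit(d, w).fit(disp=0).params
lam = norm.pdf(w @ alpha_hat) / norm.cdf(w @ alpha_hat)

# Step 2: OLS of y1 on x and lambda_hat over the selected subsample
sel = d == 1
step2 = sm.OLS(y1[sel], np.column_stack([x[sel], lam[sel]])).fit()
print(step2.params)  # beta estimates, then the sigma_uv estimate
```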

It is obvious that the performance of β̂ and σ̂uv depends on the performance of λ̂i. If wi and xi are highly correlated, a severe multicollinearity problem exists between λ̂i and xi, so the Heckman two-stage estimator will not perform well. This issue is discussed in Olsen (1980), Nawata (1993), and Leung and Yu (1996, 2000).

Maximum Likelihood Estimator

Under the normality assumption in (9), the log-likelihood function of (7) is

log L = Σ_{i=1}^n { (1 − di) log[1 − Φ(w′i α)]
+ di [ log Φ([w′i α + σuv σu^{−2}(y1i − x′i β)][1 − σuv²/σu²]^{−1/2}) − log σu + log ϕ(σu^{−1}(y1i − x′i β)) ] }.

Let ρ = σuv/σu; the log-likelihood function becomes

log L = Σ_{i=1}^n { (1 − di) log[1 − Φ(w′i α)]
+ di [ log Φ([w′i α + ρ σu^{−1}(y1i − x′i β)](1 − ρ²)^{−1/2}) − log σu + log ϕ(σu^{−1}(y1i − x′i β)) ] }.   (14)