
Linear Probability, Logit, and Probit Models

Mei-Yuan Chen
Department of Finance
National Chung Hsing University

August 19, 2022

Qualitative Response Data

1 binary data: mode of commuting to work (0 for private and 1 for public transportation), default or not (0 for no default and 1 for default), buy or not buy (0 for not buying and 1 for buying), attend the class or not (0 for not attending and 1 for attending).
2 multinomial data: extent of agreement (1 for strongly disagree, 2 for disagree, 3 for no opinion, 4 for agree, and 5 for strongly agree), Woody's credit scoring (1 to 16 for the lowest to the highest credit rating).
3 truncated data: wage determination (observed wages must exceed the legal minimum wage).
4 limited data: limits on stock returns (daily price limits on stock prices).

This provides a very basic motivation for qualitative response and limited
dependent variable models in economics and finance.

Information and the Observational Rule

In many cases it is necessary to examine the relationship between the moments we wish to estimate and the observational rule which determines the information we as analysts observe. Assuming a latent variable y∗ with −∞ < y∗ < ∞,

E(y∗) = ∫_{−∞}^{∞} y∗ f(y∗) dy∗ = ∫_{−∞}^{α} y∗ f(y∗) dy∗ + ∫_{α}^{∞} y∗ f(y∗) dy∗.

Observational Rules

1 y = I(−∞ < y∗ < ∞)y∗: y = y∗.
2 y = I(y∗ > a): observe y = 0 or 1; binary response models such as the logit, probit, and linear probability models, with yi Bernoulli distributed.
3 y = I(y∗ > a)y∗, with no other information observed for y∗ ≤ a: truncated regression models.
4 y = I(y∗ > a)y∗ + I(y∗ ≤ a)a: observe y = y∗ or y = a; censored regression (Tobit) models.

Linear Probability Models

Usually, a linear regression model is considered for the conditional mean of the dependent variable Y given explanatory variables X1, X2, . . . , Xk, i.e.,

E(Y|X1, X2, . . . , Xk) = β10 X1 + β20 X2 + · · · + βk0 Xk = Xβ0.

Suppose Y is a Bernoulli random variable taking the value "1" or "0" with probability p and 1 − p, respectively. Then

E(Y) = 1 × P(Y = 1) + 0 × P(Y = 0) = P(Y = 1) = p.

Now consider the conditional distribution of Y given other random variables X; the conditional mean of Y given X is defined as

E(Y|X = x) = 1 · P(Y = 1|X = x) + 0 · P(Y = 0|X = x) = P(Y = 1|X = x),

which equals the conditional probability that Y = 1 given X = x. Therefore, a linear regression model with a dependent variable that is either zero or one is called the linear probability model.

Given a sample of random observations,
{(y1 , x′1 )′ , (y2 , x′2 )′ , . . . , (yn , x′n )′ }, a linear probability model is
specified as

yi = x′i β0 + ei , i = 1, . . . , n.

Then

if yi = 0, then ei = −x′i β0 ,
if yi = 1, then ei = 1 − x′i β0 .

E(ei) = (1 − x′i β0) · P(yi = 1|X = xi) + (0 − x′i β0) · P(yi = 0|X = xi)
      = [1 − P(yi = 1|X = xi)] · P(yi = 1|X = xi) − P(yi = 1|X = xi) · [1 − P(yi = 1|X = xi)] = 0

is constant for all i, and

var(ei) = E(e²i)
        = (1 − x′i β0)² · P(yi = 1|X = xi) + (0 − x′i β0)² · P(yi = 0|X = xi)
        = [1 − P(yi = 1|X = xi)]² · P(yi = 1|X = xi) + [−P(yi = 1|X = xi)]² · [1 − P(yi = 1|X = xi)]
        = [1 − P(yi = 1|X = xi)] · P(yi = 1|X = xi) · {[1 − P(yi = 1|X = xi)] + P(yi = 1|X = xi)}
        = [1 − P(yi = 1|X = xi)] · P(yi = 1|X = xi)
        = (1 − x′i β0)(x′i β0),

which is not constant across i: the errors are heteroskedastic.

Weighted Least Squares Estimator

To have a constant variance for the regression errors, Goldberger (1964) proposed a two-step weighted estimator for the linear probability model.

(1) Construct the weight wi by

wi = [1 / ((x′i β̂n)(1 − x′i β̂n))]^{1/2},

where β̂n is the OLS estimate.

(2) Perform OLS on the regression model with the transformed sample observations, i.e.,

β̂∗n = (Σ_{i=1}^n x∗i x∗i′)^{−1} (Σ_{i=1}^n x∗i y∗i),

where x∗i = wi xi and y∗i = wi yi.
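As a concrete illustration, here is a minimal Python sketch of the two steps above; the function name lpm_wls and the clipping of fitted values (to keep the weights well defined when x′i β̂n falls outside (0, 1)) are our own additions, not part of the original procedure.

```python
import numpy as np

def lpm_wls(y, X, eps=1e-6):
    """Two-step WLS for the linear probability model (Goldberger 1964)."""
    # Step 1: OLS to obtain fitted probabilities p_hat = X @ beta_ols
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    p_hat = np.clip(X @ beta_ols, eps, 1 - eps)  # keep weights well defined
    # Step 2: reweight by w_i = [1 / (p_hat * (1 - p_hat))]^{1/2}, then re-run OLS
    w = 1.0 / np.sqrt(p_hat * (1 - p_hat))
    beta_wls, *_ = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)
    return beta_wls
```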


Logit and Probit Transformations

As a probability value has to lie between 0 and 1, x′i β must be between 0 and 1. However, x′i β̂∗n is not always so constrained. To get rid of this constraint, a nonlinear transformation of pi = P(yi = 1|X = xi) is necessary. In the literature, two transformations are commonly used.

Logit Transformation

As 0 ≤ pi ≤ 1, pi/(1 − pi) lies in [0, ∞). There still exists a lower bound, 0. This lower bound can be eliminated by taking the natural logarithm, i.e., log[pi/(1 − pi)] ∈ (−∞, ∞). Note that log[pi/(1 − pi)] is large (small) when pi is large (small). Assume a linear regression model for the transformed dependent variable log[pi/(1 − pi)]:

log[pi/(1 − pi)] = x′i β0.

Taking exponentials on both sides of the above equation,

exp{log[pi/(1 − pi)]} = exp(x′i β0)
pi/(1 − pi) = exp(x′i β0)
pi = exp(x′i β0) − pi exp(x′i β0)
pi = exp(x′i β0) / (1 + exp(x′i β0)).

This expression is commonly referred to as the logistic function. The logistic function transforms x′i β0 into a real value between 0 and 1. This is the so-called logit model.

Given a random sample {(yi, xi), i = 1, . . . , n}, the likelihood function is

L(y|x, β) = Π_{i=1}^n p_i^{yi} (1 − pi)^{1−yi}
          = Π_{i=1}^n { [exp(x′i β)/(1 + exp(x′i β))]^{yi} [1 − exp(x′i β)/(1 + exp(x′i β))]^{1−yi} },

and the log-likelihood function is

log L(y|x, β) = Σ_{i=1}^n { yi log[exp(x′i β)/(1 + exp(x′i β))] + (1 − yi) log[1 − exp(x′i β)/(1 + exp(x′i β))] }.
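A minimal sketch of maximizing this log-likelihood numerically, assuming simulated data (the coefficient values, sample size, and seed are illustrative only):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one regressor
beta_true = np.array([0.5, -1.0])                      # illustrative values
y = rng.binomial(1, 1 / (1 + np.exp(-(X @ beta_true))))

def neg_loglik(beta):
    # log L = sum_i [ y_i * x_i'beta - log(1 + exp(x_i'beta)) ]
    xb = X @ beta
    return -np.sum(y * xb - np.log1p(np.exp(xb)))

res = minimize(neg_loglik, x0=np.zeros(2), method="BFGS")
print(res.x)  # MLE, close to beta_true in large samples
```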

Probit Transformation

pi = Φ(x′i β0) = ∫_{−∞}^{x′i β0} (1/√(2π)) exp(−u²/2) du.

Given a random sample {(yi, xi), i = 1, . . . , n}, the likelihood function is

L(y|x, β) = Π_{i=1}^n p_i^{yi} (1 − pi)^{1−yi} = Π_{i=1}^n [Φ(x′i β)]^{yi} [1 − Φ(x′i β)]^{1−yi},

and the log-likelihood function is

log L(y|x, β) = Σ_{i=1}^n { yi log[Φ(x′i β)] + (1 − yi) log[1 − Φ(x′i β)] }.
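In practice these likelihoods are rarely coded by hand; a sketch using statsmodels (simulated data, illustrative parameter values) might look like:

```python
import numpy as np
from scipy.stats import norm
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one regressor
beta_true = np.array([0.5, -1.0])                      # illustrative values
y = rng.binomial(1, norm.cdf(X @ beta_true))           # probit data-generating process

probit_res = sm.Probit(y, X).fit()
logit_res = sm.Logit(y, X).fit()
print(probit_res.params)  # close to beta_true
print(logit_res.params)   # logit coefficients are on a different scale
```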
Marginal Effect on the Probability

For the probit model,

∂P(yi = 1)/∂xij = (1/√(2π)) exp(−(x′i β0)²/2) β0j = ϕ(x′i β0) β0j,

and for the logit model,

∂P(yi = 1)/∂xij = [exp(x′i β0)/(1 + exp(x′i β0))] β0j − [exp(x′i β0)²/(1 + exp(x′i β0))²] β0j
                = [exp(x′i β0)/(1 + exp(x′i β0))] [1 − exp(x′i β0)/(1 + exp(x′i β0))] β0j
                = P(yi = 1)[1 − P(yi = 1)] β0j.
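A short sketch evaluating these marginal effects at a chosen point; the coefficient vectors and evaluation point below are illustrative (e.g., taken from fitted models such as those above):

```python
import numpy as np
from scipy.stats import norm

beta_probit = np.array([0.5, -1.0])   # illustrative probit coefficients
beta_logit = np.array([0.8, -1.7])    # illustrative logit coefficients
x_bar = np.array([1.0, 0.2])          # intercept and regressor value

# Probit: phi(x'beta) * beta_j
me_probit = norm.pdf(x_bar @ beta_probit) * beta_probit

# Logit: P(1 - P) * beta_j, with P = exp(x'beta) / (1 + exp(x'beta))
p = 1 / (1 + np.exp(-(x_bar @ beta_logit)))
me_logit = p * (1 - p) * beta_logit

print(me_probit, me_logit)
```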

Ordered Probit/Logit Models

The ordered probit/logit model considers dependent variable outcomes that have a natural (ordinal) ranking (i.e., the responses can be ordered in some meaningful fashion). Consider a random sample {(yi, xi), i = 1, . . . , n}, where yi = m for m = 1, . . . , M with a natural ordering (that is, m + 1 is in some sense better than m). The observed values are assumed to derive from an unobservable latent variable y∗i, where

y∗i = x′i β0 + ui, i = 1, . . . , n,

with

yi = m if αm−1 < y∗i ≤ αm for m = 1, . . . , M,

where α0 = −∞ and αM = ∞.
Then, the conditional probability of observing the mth category (i.e., yi = m) can be written as

P(yi = m|xi) = P(αm−1 < y∗i ≤ αm)
             = P(αm−1 < x′i β0 + ui ≤ αm)
             = P(αm−1 − x′i β0 < ui ≤ αm − x′i β0)
             = P(ui ≤ αm − x′i β0) − P(ui ≤ αm−1 − x′i β0)

for m = 1, . . . , M.

Ordered Probit/Logit Model

Assuming ui ∼ N(0, 1), the ordered probit takes the conditional probabilities as

P(yi = m|xi) = Φ(αm − x′i β0) − Φ(αm−1 − x′i β0),

where Φ(·) is the CDF of the standard normal distribution. If the conditional probability is instead specified through the logistic function (the ordered logit), then

P(yi = m|xi) = exp(αm − x′i β0)/(1 + exp(αm − x′i β0)) − exp(αm−1 − x′i β0)/(1 + exp(αm−1 − x′i β0)).

Likelihood Function of the Ordered Probit Model

Denote zim = 1(yi = m) for m = 1, . . . , M as the indicator that the ith observation has yi = m. The likelihood of the ith observation is

li = Π_{m=1}^M P(yi = m|xi)^{zim} = Π_{m=1}^M [Φ(αm − x′i β) − Φ(αm−1 − x′i β)]^{zim},

and the likelihood function of the random sample becomes

L(α, β) = Π_{i=1}^n Π_{m=1}^M [Φ(αm − x′i β) − Φ(αm−1 − x′i β)]^{zim}.

Taking logarithms, the log-likelihood function is

l(α, β) = Σ_{i=1}^n Σ_{m=1}^M zim ln[Φ(αm − x′i β) − Φ(αm−1 − x′i β)].
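This log-likelihood is implemented, for example, by OrderedModel in recent versions of statsmodels; a sketch on simulated data (the thresholds, slope, and seed below are illustrative):

```python
import numpy as np
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(2)
n = 500
x = rng.normal(size=n)
y_star = 1.0 * x + rng.normal(size=n)   # latent variable, slope beta = 1
y = np.digitize(y_star, [-0.5, 0.8])    # thresholds alpha_1, alpha_2 -> categories 0, 1, 2

# distr='probit' gives the ordered probit; the exog must not contain a constant
res = OrderedModel(y, x[:, None], distr="probit").fit(method="bfgs", disp=0)
print(res.params)  # slope estimate followed by threshold parameters
```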
MLE for the Ordered Probit Model

The MLE for (α, β0) is obtained by setting the score to zero, i.e., by solving the first-order conditions

∇α l(α, β) = 0 and ∇β l(α, β) = 0.

Measures for Goodness-of-Fit

(1) Squared correlation between y and ŷ:

R² = [Σ_{i=1}^n (yi − ȳn)(ŷi − ȳn)]² / { [Σ_{i=1}^n (yi − ȳn)²] [Σ_{i=1}^n (ŷi − ȳn)²] }.

(2) Measures based on the residual sum of squares:

R² = 1 − Σ_{i=1}^n (yi − ŷi)² / Σ_{i=1}^n (yi − ȳn)²
   = 1 − (n/(n1 n2)) Σ_{i=1}^n (yi − ŷi)²,   (Efron's measure, where n1 and n2 are the numbers of ones and zeros, so that Σ_{i=1}^n (yi − ȳn)² = n1 n2/n)
R² = 1 − Σ_{i=1}^n (yi − ŷi)² / Σ_{i=1}^n ŷi(1 − ŷi).   (Amemiya's measure)
(3) Measures based on the likelihood ratio, where Lu and Lc denote the likelihood values of the unrestricted (fitted) model and the constant-only model:

R² = 1 − (Lc/Lu)^{2/n}
R² = 1 − (log Lu)/(log Lc)   (McFadden's pseudo R²)
R² = 1 − 1/[1 + 2(log Lu − log Lc)/n].

(4) Proportion of correct predictions:

count R² = number of correct predictions / total number of observations,

in which a correct prediction means yi = ŷ∗i, where

ŷ∗i = 1 if ŷi > 0.5, and ŷ∗i = 0 if ŷi ≤ 0.5.
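A sketch computing several of these measures for a fitted logit; statsmodels exposes the fitted and constant-only log-likelihoods as llf and llnull (data simulated as in the earlier examples, values illustrative):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = rng.binomial(1, 1 / (1 + np.exp(-(X @ np.array([0.5, -1.0])))))

res = sm.Logit(y, X).fit(disp=0)
y_hat = res.predict(X)  # fitted probabilities

mcfadden = 1 - res.llf / res.llnull                              # 1 - log Lu / log Lc
efron = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
amemiya = 1 - np.sum((y - y_hat) ** 2) / np.sum(y_hat * (1 - y_hat))
count_r2 = np.mean(y == (y_hat > 0.5))                           # proportion correct
print(mcfadden, efron, amemiya, count_r2)
```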
Sample Selection Bias Models

Heckman's standard sample selection model is also called the "Tobit-2" model (Amemiya 1984, 1985). It consists of the following (unobserved) structural process:

yS∗i = xS′i βS + ϵSi   (1)
yO∗i = xO′i βO + ϵOi   (2)

where yS∗i is the realization of the latent selection "tendency" for individual i, and yO∗i is the latent outcome. xSi and xOi are explanatory variables for the selection and outcome equations, respectively. xS and xO may or may not be equal.

We observe

ySi = 0 if yS∗i < 0, and ySi = 1 otherwise;   (3)
yOi = 0 if ySi = 0, and yOi = yO∗i otherwise;   (4)

i.e., we observe the outcome only if the latent selection variable yS∗i is positive. The observed dependence between yO and xO can now be written as

E[yO|xO = xOi, xS = xSi, yS = 1] = xO′i βO + E[ϵO|ϵS ≥ −xS′i βS].   (5)

Sample Selection Bias

As yi = max{x′i β0 + ϵi, 0} and given that ϵi is normally distributed with mean 0 and variance σ0², E(yi|xi, yi > 0) and E(yi|xi) can be derived as follows.

Let y∗i = x′i β0 + ϵi; then y∗i ∼ N(x′i β0, σ0²). When y = 0, we have

P(y = 0) = P(y∗ ≤ 0) = Φ(−x′β0/σ0) = 1 − Φ(x′β0/σ0).

Given (y∗i|X = xi) ∼ N(x′i β0, σ0²) and ϵi = (y∗i − x′i β0)/σ0 ∼ N(0, 1), the density function of ϵi is

fϵ(ϵi) = (1/√(2π)) exp(−ϵ²i/2)
       = (1/√(2π)) exp(−(y∗i − x′i β0)²/(2σ0²))
       = σ0 (1/(√(2π)σ0)) exp(−(y∗i − x′i β0)²/(2σ0²))
       = σ0 fy∗i(y∗i) = ϕ((y∗i − x′i β0)/σ0).

Therefore,

fy∗i(y∗i) = (1/σ0) ϕ((y∗i − x′i β0)/σ0).

Then E(yi|xi, yi > 0) = E(y∗i|xi, y∗i > 0). As y∗i ∼ N(x′i β0, σ0²) and fy∗i(y∗i) = (1/σ0) ϕ((y∗i − x′i β0)/σ0), we have

f(y∗i|y∗i > 0) = fy∗i(y∗i) / P(y∗i ≥ 0) = [(1/σ0) ϕ((y∗i − x′i β0)/σ0)] / [1 − Φ(−x′i β0/σ0)].

Then,

E(y∗i|xi, y∗i > 0) = ∫_0^∞ y∗i [(1/σ0) ϕ((y∗i − x′i β0)/σ0)] / [1 − Φ(−x′i β0/σ0)] dy∗i
= ∫_0^∞ { [(y∗i − x′i β0)/σ0] ϕ((y∗i − x′i β0)/σ0) + (x′i β0/σ0) ϕ((y∗i − x′i β0)/σ0) } / [1 − Φ(−x′i β0/σ0)] dy∗i.


Let x = (y∗i − x′i β0)/σ0, so that σ0 dx = dy∗i. The solutions to the first and second parts of the integral are:

(1): [1/(1 − Φ(−x′i β0/σ0))] ∫_0^∞ [(y∗i − x′i β0)/σ0] ϕ((y∗i − x′i β0)/σ0) dy∗i
= [σ0/(1 − Φ(−x′i β0/σ0))] ∫_{−x′i β0/σ0}^∞ x ϕ(x) dx   (using d exp(g(x))/dx = exp(g(x))[dg(x)/dx])
= [σ0/(1 − Φ(−x′i β0/σ0))] ∫_{−x′i β0/σ0}^∞ [−dϕ(x)/dx] dx   (using dϕ(x)/dx = −xϕ(x))
= [σ0/(1 − Φ(−x′i β0/σ0))] [−ϕ(∞) + ϕ(−x′i β0/σ0)]
= σ0 ϕ(−x′i β0/σ0) / [1 − Φ(−x′i β0/σ0)] = σ0 ϕ(−x′i β0/σ0) / Φ(x′i β0/σ0),

(2): [x′i β0 / (1 − Φ(−x′i β0/σ0))] ∫_0^∞ (1/σ0) ϕ((y∗i − x′i β0)/σ0) dy∗i
= [x′i β0 / (1 − Φ(−x′i β0/σ0))] ∫_0^∞ [dΦ((y∗i − x′i β0)/σ0)/dy∗i] dy∗i
= [x′i β0 / (1 − Φ(−x′i β0/σ0))] [1 − Φ(−x′i β0/σ0)]
= x′i β0.

Using these results we have

E(yi|xi, yi > 0) = x′i β0 + σ0 ϕ(−x′i β0/σ0) / Φ(x′i β0/σ0).
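A quick Monte Carlo check of this truncated-mean formula, with illustrative values standing in for x′i β0 and σ0:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
mu, sigma = 0.5, 2.0                   # stand-ins for x_i'beta_0 and sigma_0
y_star = rng.normal(mu, sigma, size=1_000_000)

empirical = y_star[y_star > 0].mean()  # E(y* | y* > 0) by simulation
formula = mu + sigma * norm.pdf(-mu / sigma) / norm.cdf(mu / sigma)
print(empirical, formula)              # the two should agree closely
```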

The unconditional mean of yi is

E(yi|xi) = E(yi|xi, yi > 0) P(yi > 0) + E(yi|xi, yi = 0) P(y∗i ≤ 0)
= E(yi|xi, yi > 0) P(x′i β0 + ϵi > 0) + 0 × P(y∗i ≤ 0)
= [x′i β0 + σ0 ϕ(−x′i β0/σ0)/Φ(x′i β0/σ0)] Φ(x′i β0/σ0)
= x′i β0 Φ(x′i β0/σ0) + σ0 ϕ(−x′i β0/σ0).

It is clear that, since Φ(x′i β0/σ0) = 1 − Φ(−x′i β0/σ0), the non-censoring case is the one replacing 0 with −∞: 1 − Φ((−∞ − x′i β0)/σ0) = 1 and ϕ((−∞ − x′i β0)/σ0) = 0. That is, E(yi|xi, y∗i > −∞) = x′i β0.

Estimating the model above by OLS gives in general biased results, as E[ϵO|ϵS ≥ −xS′i βS] ≠ 0. Assume the error terms follow a bivariate normal distribution:

(ϵS, ϵO)′ ∼ N((0, 0)′, V), where V has unit variance for ϵS, variance σ² for ϵO, and covariance ρ.   (6)

Sample Selection Bias Models

Generally, a sample selection model can be expressed as

y1i = x′i β + ui,   (7)
y2i = w′i α + vi,   (8)
di = I(y2i > 0), i = 1, . . . , N,
(ui, vi)′ ∼ N(0, V), where V has diagonal elements σu² and 1 and off-diagonal element σuv,   (9)

where I(·) is an indicator function: I(·) = 1 if the argument (·) is true, and I(·) = 0 otherwise. y2i is unobservable; only di is observable. Meanwhile, y1i is observable only when di = 1; when di = 0, y1i is recorded as zero. In the literature, two estimation methods have been developed for the sample selection model: the Heckman two-stage method and the maximum likelihood method.

Heckman Two-Stage Estimator

Under di = 1, the expected value of y1i is

E(y1i|di = 1) = x′i β + σuv λ(w′i α),   (10)
λ(z) = ϕ(z)/Φ(z),   (11)

in which ϕ and Φ are the standard normal probability density and distribution functions. λ is called the hazard ratio or inverse Mills ratio.

Therefore, for the observations with di = 1, (7) can be represented as

y1i = x′i β + σuv λ(w′i α) + ϵi.   (12)

Since E(ϵi|di = 1) = 0, (12) can be used to estimate β.

Heckman's Two-Stage Procedure

1 Use MLE on the probit model (8) to estimate the parameter α. Denote the estimates as α̂ and substitute them into (11) to obtain λ̂i = λ(w′i α̂), i = 1, . . . , n.
2 For the observations with di = 1, estimate a regression of y1i on xi and λ̂i by least squares. Denote β̂ and σ̂uv as the estimates of β and σuv. The parameter σu² can be estimated by the following method suggested by Heckman (1979):

σ̂u² = (1/n1) Σ_{i=1}^{n1} [ê²i + σ̂uv λ̂i],   (13)

in which n1 is the number of observations with di = 1 and êi denotes the residuals from step 2. A sketch of the two-step procedure is given below.
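This sketch runs the two steps on simulated data; all names and parameter values are illustrative, and the selection and outcome equations each use one exogenous regressor plus an intercept:

```python
import numpy as np
from scipy.stats import norm
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 2000
w = np.column_stack([np.ones(n), rng.normal(size=n)])  # selection regressors
x = np.column_stack([np.ones(n), rng.normal(size=n)])  # outcome regressors
# Correlated errors (sigma_uv != 0) are what induce the selection bias
u, v = rng.multivariate_normal([0, 0], [[1.0, 0.6], [0.6, 1.0]], size=n).T
d = (w @ np.array([0.3, 1.0]) + v > 0).astype(int)     # selection indicator
y1 = x @ np.array([1.0, 2.0]) + u                      # observed only when d = 1

# Step 1: probit of d on w, then the inverse Mills ratio lambda_hat
alpha_hat = sm.Probit(d, w).fit(disp=0).params
lam = norm.pdf(w @ alpha_hat) / norm.cdf(w @ alpha_hat)

# Step 2: OLS of y1 on x and lambda_hat over the selected subsample
sel = d == 1
step2 = sm.OLS(y1[sel], np.column_stack([x[sel], lam[sel]])).fit()
print(step2.params)  # beta estimates, then the sigma_uv estimate
```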

It is obvious that the performance of β̂ and σ̂uv depends on the performance of λ̂i. If wi and xi are highly correlated, a severe multicollinearity problem exists between λ̂i and xi, so the Heckman two-stage estimator will not perform well. This issue is discussed in Olsen (1980), Nawata (1993), and Leung and Yu (1996, 2000).

Maximum Likelihood Estimator

Under the normality assumption in (9), the log-likelihood function of (7) is

log L = Σ_{i=1}^n { (1 − di) log[1 − Φ(w′i α)]
+ di [ log Φ([w′i α + σuv σu^{−2}(y1i − x′i β)][1 − σuv²/σu²]^{−1/2}) − log σu + log ϕ(σu^{−1}(y1i − x′i β)) ] }.

Let ρ = σuv/σu; the log-likelihood function becomes

log L = Σ_{i=1}^n { (1 − di) log[1 − Φ(w′i α)]
+ di [ log Φ([w′i α + ρ σu^{−1}(y1i − x′i β)](1 − ρ²)^{−1/2}) − log σu + log ϕ(σu^{−1}(y1i − x′i β)) ] }.   (14)