
Logit & Probit Models

Theory and Estimation

Linear Probability Model

• Linear probability model is the OLS model applied to a dichotomous dependent variable
• Recall OLS model:
Yi = α + β X i + ε i
• Recall also that:
E (Yi ) = α + β X i
• Since Y is binary, its conditional expectation is a probability [πi = E(Yi) = P(Yi = 1)], so:
πi = α + β Xi
• Interpretation of the coefficients is straightforward:
– A one-unit increase in X is associated with a β increase in the probability of an event occurring
– The relationship is linear, so the impact of X on Y is constant

Example: Swedish EURO Referenda


(sweden_class.dta)
• . reg yesno age   // regress euro vote on age

      Source |       SS       df       MS              Number of obs =    9936
-------------+------------------------------           F(  1,  9934) =   15.72
       Model |   3.9140087     1   3.9140087           Prob > F      =  0.0001
    Residual |  2473.23001  9934  .248966178           R-squared     =  0.0016
-------------+------------------------------           Adj R-squared =  0.0015
       Total |  2477.14402  9935   .24933508           Root MSE      =  .49897

------------------------------------------------------------------------------
       yesno |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   -.001222   .0003082    -3.96   0.000    -.0018262   -.0006179
       _cons |   2.868081   .6038954     4.75   0.000     1.684324    4.051839
------------------------------------------------------------------------------

• Interpretation: a one-year increase in age is associated with a decrease of about .0012 in the predicted probability of voting Yes in the referendum.
• . predict yhat, xb   // generate predicted values
• (484 missing values generated)

• twoway (scatter yesno age) (connected yhat age, msymbol(none))

[Figure: scatter of yesno against birth year ("What year were you born", 1900–1980), with the linear prediction (yhat) overlaid as a line; the fitted values stay between 0 and 1 over this range.]

Problems with the Linear Probability Model

1. Non-normal errors
• Since Y takes only two possible values, the residual (ε) can also take on only two values:
  If Y = 1 then ε = 1 − π(x), with probability π(x)
  If Y = 0 then ε = −π(x), with probability 1 − π(x)
• The distribution of ε will have mean 0 and variance equal to π(x)[1 − π(x)]
• Note: normality is not required for estimates to be unbiased but it is necessary for efficiency
[Figure: histogram of the OLS residuals (fraction on the vertical axis); the residuals cluster at two values, roughly −.53 and .56.]

2. Non-Constant Error Variance


• Since the variance of ε is π(x)[1 − π(x)], we have non-constant variance: variance that is a function of the value of X
• This means that the OLS estimator for the linear probability model is inefficient and the standard errors are biased, resulting in incorrect hypothesis tests
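The mean and variance of the two-point error distribution can be checked directly. A minimal Python sketch (the function name err_mean_var is mine, for illustration):

```python
# LPM error: epsilon = 1 - pi with probability pi, and -pi with
# probability 1 - pi. Compute its mean and variance directly.
def err_mean_var(pi):
    mean = pi * (1 - pi) + (1 - pi) * (-pi)
    var = pi * (1 - pi) ** 2 + (1 - pi) * pi ** 2
    return mean, var

for pi in (0.1, 0.5, 0.9):
    m, v = err_mean_var(pi)
    print(pi, m, round(v, 4))  # mean is 0; variance equals pi*(1 - pi)
```

Because the variance depends on π(x), and hence on X, this is exactly the heteroskedasticity described above.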
3. Non-Linearity
• Since the OLS estimator is a linear model probabilities increase by the
same amount as X goes up one unit, regardless of the value of X
• This assumption is often met over a limited range of values of X
• We often expect the impact of an X variable on the probability of Y
to diminish as X increases (or decreases)
• For example: the likelihood of owning a house increases as income
increases but at a decreasing rate
4. Nonsensical Predictions
• The linear model can create predicted values that are not bounded by
zero and one. This clearly does not make sense.
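A tiny sketch of problem 4, using made-up coefficients (not estimates from any model in these notes):

```python
# Fitted value from a linear probability model; OLS does nothing to
# keep it inside [0, 1].
def lpm_predict(a, b, x):
    return a + b * x

a, b = 0.10, 0.05   # hypothetical coefficients
print(lpm_predict(a, b, 5))    # 0.35  -- a sensible probability
print(lpm_predict(a, b, 25))   # 1.35  -- greater than one
print(lpm_predict(a, b, -5))   # -0.15 -- negative
```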

Logit and Probit Models:
A Latent Variable Approach
• A latent variable approach treats the use of a dichotomous variable essentially
as a measurement problem:
– There is a continuous underlying, latent, variable (denoted Y*) but we
cannot observe—and therefore cannot measure—it.
– Rather, we observe a dichotomous indicator of that latent variable
• For example: there is an underlying propensity for an individual to vote, for a
nation to go to war, for a student to cheat. However, we only observe the
outcome—the action, not the underlying propensity
• The underlying model is:

Yi * = α + β X i + ε i

• But we only observe the following realizations of Y*:


Yi = 0 if Yi * ≤ 0
Yi = 1 if Yi* > 0

• We can write
P(Yi = 1) = P(Yi* > 0)
          = P(Xiβ + εi > 0)
          = P(εi > −Xiβ)
          = P(εi ≤ Xiβ)

The last equality holds because the εi are distributed symmetrically around zero.
In words: Y = 1 if the random part is less than or equal to the systematic part.
The problem is computing this probability, which requires assuming a distribution for ε.
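The derivation can be verified by simulation. The Python sketch below (α, β, and x values are made up) draws the latent Y* with standard logistic errors and compares the observed frequency of Y = 1 with Λ(α + βx):

```python
import math, random

random.seed(1)
alpha, beta, x = -1.0, 2.0, 0.8   # hypothetical parameter values

# Simulate Y* = alpha + beta*x + eps with standard logistic errors,
# observing Y = 1 whenever Y* > 0.
n = 200_000
hits = 0
for _ in range(n):
    u = random.random()
    eps = math.log(u / (1 - u))   # inverse-CDF draw from the logistic
    hits += (alpha + beta * x + eps) > 0

# By the derivation above, P(Y = 1) = Lambda(alpha + beta*x).
lam = 1 / (1 + math.exp(-(alpha + beta * x)))
print(round(hits / n, 3), round(lam, 3))   # the two should agree closely
```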

Logit

• If we assume that ε follows a standard logistic distribution then we get the logit model
• Standard logistic distribution—the pdf:

  P(ε) = λ(ε) = exp(ε) / [1 + exp(ε)]²

• Cumulative distribution function for the standard logistic distribution:

  Λ(ε) = ∫ λ(ε) dε = exp(ε) / [1 + exp(ε)]

• Standard logistic distribution is symmetrical around zero.
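Two properties worth verifying numerically: the logistic pdf satisfies λ(ε) = Λ(ε)[1 − Λ(ε)], and symmetry gives Λ(−ε) = 1 − Λ(ε). A short Python check:

```python
import math

def logistic_pdf(e):
    return math.exp(e) / (1 + math.exp(e)) ** 2

def logistic_cdf(e):
    return math.exp(e) / (1 + math.exp(e))

for e in (-2.0, -0.5, 0.0, 1.5):
    # pdf = CDF * (1 - CDF): differentiate Lambda to see this
    assert abs(logistic_pdf(e) - logistic_cdf(e) * (1 - logistic_cdf(e))) < 1e-12
    # symmetry around zero
    assert abs(logistic_cdf(-e) - (1 - logistic_cdf(e))) < 1e-12

print(logistic_cdf(0.0))   # 0.5
```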

• Recall that
P (Y = 1) = P(Y * > 0)
= P(ε i ≤ X i β )

• Assuming a standard logistic distribution for ε we can write this as:

  P(Y = 1) ≡ Λ(Xiβ) = exp(Xiβ) / [1 + exp(Xiβ)]
• We can write this out for every observation in our sample in terms of the
conditional expectation of Y given the value(s) of X. The likelihood for a
given observation is:
  Li = [exp(Xiβ) / (1 + exp(Xiβ))]^Yi · [1 − exp(Xiβ) / (1 + exp(Xiβ))]^(1−Yi)
• Observations with Y=1 contribute P(Y=1|X) to the likelihood while those with Y=0 contribute P(Y=0|X).

• Assuming independent observations we can take the product over all N observations to get the overall likelihood:

  L = ∏(i=1 to N) [exp(Xiβ) / (1 + exp(Xiβ))]^Yi · [1 − exp(Xiβ) / (1 + exp(Xiβ))]^(1−Yi)

• Taking the natural logarithm results in:


  ln L = ∑(i=1 to N) { Yi ln[exp(Xiβ) / (1 + exp(Xiβ))] + (1 − Yi) ln[1 − exp(Xiβ) / (1 + exp(Xiβ))] }

• Now maximize this log-likelihood with respect to the βs
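As a sketch of what "maximize with respect to the βs" means in practice, the following Python program fits a two-parameter logit by Newton-Raphson on simulated data. All names and parameter values here are my own; Stata's logit command does roughly the equivalent internally.

```python
import math, random

def logit_fit(X, y, iters=25):
    """Maximize the logit log-likelihood by Newton-Raphson.
    Hard-coded for two coefficients: X holds (1, x) rows."""
    b = [0.0, 0.0]
    for _ in range(iters):
        grad = [0.0, 0.0]
        hess = [[0.0, 0.0], [0.0, 0.0]]  # Fisher information, -d2lnL/db2
        for xi, yi in zip(X, y):
            eta = b[0] * xi[0] + b[1] * xi[1]
            p = 1 / (1 + math.exp(-eta))
            for j in range(2):
                grad[j] += (yi - p) * xi[j]
                for l in range(2):
                    hess[j][l] += p * (1 - p) * xi[j] * xi[l]
        # Newton step: solve hess * step = grad (2x2 Cramer's rule)
        det = hess[0][0] * hess[1][1] - hess[0][1] * hess[1][0]
        b[0] += (hess[1][1] * grad[0] - hess[0][1] * grad[1]) / det
        b[1] += (hess[0][0] * grad[1] - hess[1][0] * grad[0]) / det
    return b

# Simulate data from a logit with known alpha = -0.5, beta = 1.0
random.seed(2)
X, y = [], []
for _ in range(20_000):
    xv = random.gauss(0, 1)
    p = 1 / (1 + math.exp(-(-0.5 + 1.0 * xv)))
    X.append((1.0, xv))
    y.append(1 if random.random() < p else 0)

a_hat, b_hat = logit_fit(X, y)
print(round(a_hat, 2), round(b_hat, 2))  # should be close to -0.5 and 1.0
```

Because the logit log-likelihood is globally concave, Newton-Raphson converges reliably from a zero start.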

Probit Models

• Standard normal distribution has mean zero and unit variance. Its density
looks as follows
  φ(ε) = (1/√(2π)) exp(−ε²/2)

• The cumulative distribution function is


  Φ(ε) = ∫(−∞ to ε) (1/√(2π)) exp(−t²/2) dt

• The probability for a probit looks like this:

  P(Yi = 1) = Φ(Xiβ) = ∫(−∞ to Xiβ) (1/√(2π)) exp(−t²/2) dt

• With a log likelihood of:

  ln L = ∑(i=1 to N) { Yi ln Φ(Xiβ) + (1 − Yi) ln[1 − Φ(Xiβ)] }
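A direct Python transcription of this log-likelihood, using the error function for Φ (the data below are made up for illustration):

```python
import math

def norm_cdf(z):
    # Standard normal CDF via the error function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def probit_loglik(b0, b1, xs, ys):
    """ln L = sum of y*ln(Phi(Xb)) + (1 - y)*ln(1 - Phi(Xb))."""
    ll = 0.0
    for x, y in zip(xs, ys):
        p = norm_cdf(b0 + b1 * x)
        ll += y * math.log(p) + (1 - y) * math.log(1 - p)
    return ll

xs = [-1.0, 0.0, 1.0, 2.0]   # hypothetical data
ys = [0, 0, 1, 1]
print(round(probit_loglik(0.0, 1.0, xs, ys), 3))   # about -1.062
```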

What do these distributions look like?

. set obs 600
obs was 0, now 600
. egen x=fill(-300 -299)
. replace x=x/100
(599 real changes made)
. gen probit=1/sqrt(2*_pi)*exp(-(x^2)/2)
. gen logit=exp(x)/((1+exp(x))^2)
. twoway (connected probit x) (connected logit x)

[Figure: the probit (standard normal) and logit densities over x from −4 to 4; the normal peaks near .4, the logistic near .25.]

Logit has fatter tails; that is the major difference between the two.

Cumulative distribution function

. gen cumul_logit=sum(logit)
. gen cumul_probit=sum(probit)
. twoway (connected cumul_probit x) (connected cumul_logit x)

[Figure: the running sums of the two densities over x from −4 to 4, each rising from 0 to about 100. With a grid spacing of .01, the running sum approximates the CDF scaled by 100.]

Which is Better? Logit or Probit?

• From an empirical standpoint logits and probits typically yield similar estimates of the relevant derivatives
– Because the cumulative distribution functions for the two models differ appreciably only in the tails of their respective distributions
• The derivatives differ only if there are enough observations in the tails of the distribution
• While the derivatives are usually similar, the parameter estimates associated with the two models are not
– Multiplying the logit estimates by 0.625 makes the logit estimates comparable to the probit estimates
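The 0.625 rule of thumb can be eyeballed numerically: if probit coefficients are roughly 0.625 times logit coefficients, fitted probabilities agree wherever Φ(0.625z) tracks Λ(z). A quick Python check:

```python
import math

def logistic_cdf(z):
    return 1 / (1 + math.exp(-z))

def norm_cdf(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Scan z in [-4, 4] for the largest gap between Lambda(z) and
# Phi(0.625 * z); a small gap is what justifies the rescaling rule.
worst = max(abs(logistic_cdf(z / 100) - norm_cdf(0.625 * z / 100))
            for z in range(-400, 401))
print(round(worst, 3))   # the gap stays around two percentage points
```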

Hypothesis Testing

• Logit and probit models are part of the "binomial" family in the generalized linear model (GLM) framework.
• All GLMs are fit by maximum likelihood and provide a framework that we will use later when we add panel and time-series considerations.
• The key component in hypothesis testing of GLM models is the likelihood: both the initial likelihood and the final likelihood.
• The likelihood also provides information regarding goodness of fit. In GLM models we can construct a measure called the deviance (G²), computed as G² = −2 ln L.
• The deviance is similar to the residual sum of squares from OLS.

• Hypothesis tests and confidence intervals are standard across all MLE
models.
• Tests for individual slopes are based on the Wald statistic:

  Z0 = (β̂j − βj0) / SE(β̂j)

  where βj0 is the hypothesized value (usually zero).

• Tests that several slopes are jointly equal to zero are based on the generalized
likelihood-ratio test (based on the deviance) and have a χ2 distribution. This
is similar to the F-test from OLS where the difference in ESS from a nested
model is compared to the ESS from the comparison model with degrees of
freedom dependent on the number of parameters being tested:

  χ² = G²(model 1) − G²(model 2)
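A worked example of the deviance-based test (the log-likelihood values are hypothetical):

```python
# Suppose a restricted model (q = 3 slopes set to zero) has
# log-likelihood -4100.0 and the full model -4048.4.
ll_restricted, ll_full, q = -4100.0, -4048.4, 3

G2_restricted = -2 * ll_restricted   # deviance of the restricted model
G2_full = -2 * ll_full               # deviance of the full model
chi2 = G2_restricted - G2_full       # = 2 * (ll_full - ll_restricted)
print(round(chi2, 1))                # 103.2

# The 5% critical value for chi-squared with q = 3 degrees of freedom
# is about 7.81, so here we would reject the joint null.
```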

Example: EURO referendum, Sweden, September 2003
VALU 2003/exit polls from 80 polling places.
• Dataset is a subset of the exit poll; 44 questions in total.
• N = 10,732.
• sweden_class.dta
• Question of interest: "How did you vote in the referendum today?" Yes means that Sweden should adopt the Euro; No means that Sweden maintains the status quo.
• Outcome: the referendum was defeated.
• Substantively interesting for lots of reasons; useful for this class because there are lots of questions that are coded on nominal, ordinal, and ratio scales.

Contains data from C:\Documents and Settings\Administrator\Desktop\class_sweden.dta


obs: 10,732 Extract from Swedish Exit Poll
Data
vars: 14 8 Sep 2004 11:39
size: 203,908 (99.7% of memory free)
-------------------------------------------------------------------------------
storage display value
variable name type format label variable label
-------------------------------------------------------------------------------
eu byte %40.0g eu Do you think Sweden should
resign from the EU or stay in
the Union
party byte %39.0g party What political party would you
vote for in a parliamentary
election today
gender byte %14.0g gender Gender
birthyear int %14.0g birth_year
What year were you born
citizen byte %14.0g citizen Are you a Swedish citizen
union byte %37.0g union Are you a member of a labor
union
leftright byte %22.0g pol_scale
On the left-right political
scale, where would you place
yourself

trust byte %14.0g trust Generally speeking, how much


trust do you have for
politicians
employed byte %67.0g employment
What is your employment
situation
immigration byte %33.0g imm_vote How important was the issue of
immigration for how you decided
to vote
democracy byte %33.0g dem_vote How important was democracy for
how you decided to vote
interestrate byte %33.0g intrate_vote
How important was the
possibility for Sweden to
decided its interest rate for
ho
ownecon byte %33.0g ownecon_vote
How important was the question
of your own economy for how you
decided to vote
yesno byte %9.0g yes=voted for referenda
-------------------------------------------------------------------------------

7
Variable and Value Labels

• Variable labels allow you to add a label that contains a description of the variable in the dataset:
label var yesno "Yes=vote for Euro"

• Value labels allow you to label the values that an ordinal or nominal variable takes:
label define eu 1 "Sweden should resign from the EU" 2 "Sweden should remain a member of the EU" 3 "No opinion on the matter" 9 "No information"
label values v12 eu
. tab eu

Do you think Sweden should resign from |


the EU or stay in the Union | Freq. Percent Cum.
----------------------------------------+-----------------------------------
Sweden should resign from the EU | 2,499 28.16 28.16
Sweden should remain a member of the E | 6,375 71.84 100.00
----------------------------------------+-----------------------------------
Total | 8,874 100.00

Simple Logit Model


. logit yesno eu gender birthyear citizen

Iteration 0:   log likelihood = -5672.4066
Iteration 1:   log likelihood = -4144.9001
Iteration 2:   log likelihood = -4055.7904
Iteration 3:   log likelihood = -4048.5039
Iteration 4:   log likelihood = -4048.3928
Iteration 5:   log likelihood = -4048.3928

Logit estimates                                   Number of obs   =       8196
                                                  LR chi2(4)      =    3248.03
                                                  Prob > chi2     =     0.0000
Log likelihood = -4048.3928                       Pseudo R2       =     0.2863

------------------------------------------------------------------------------
       yesno |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          eu |   3.661967   .0945649    38.72   0.000     3.476623    3.847311
      gender |   .3624785   .0547282     6.62   0.000     .2552131    .4697439
   birthyear |  -.0011038   .0017202    -0.64   0.521    -.0044753    .0022677
     citizen |  -.5512848   .1560279    -3.53   0.000    -.8570939   -.2454757
       _cons |  -4.311818   3.387804    -1.27   0.203    -10.95179    2.328157
------------------------------------------------------------------------------

. predict phat_logit
(option p assumed; Pr(yesno))
(2332 missing values generated)

Simple Probit Model


. probit yesno eu gender birthyear citizen

Iteration 0: log likelihood = -5672.4066


Iteration 1: log likelihood = -4131.6314
Iteration 2: log likelihood = -4049.9967
Iteration 3: log likelihood = -4047.6447
Iteration 4: log likelihood = -4047.6407

Probit estimates Number of obs = 8196


LR chi2(4) = 3249.53
Prob > chi2 = 0.0000
Log likelihood = -4047.6407 Pseudo R2 = 0.2864

------------------------------------------------------------------------------
yesno | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
eu | 2.110938 .0458988 45.99 0.000 2.020978 2.200898
gender | .2146035 .0320565 6.69 0.000 .1517739 .2774331
birthyear | -.0006074 .0010026 -0.61 0.545 -.0025725 .0013577
citizen | -.3223676 .0868817 -3.71 0.000 -.4926526 -.1520825
_cons | -2.511047 1.974073 -1.27 0.203 -6.380159 1.358064
------------------------------------------------------------------------------

. predict phat_probit
(option p assumed; Pr(yesno))
(2332 missing values generated)

[Figure: predicted probabilities from the logit model (Pr(yesno)) plotted against those from the probit model (Pr(yesno)), both over roughly 0 to .8; the two sets of predictions track each other closely.]
