------------------------------------------------------------------------------
       yesno |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   -.001222   .0003082    -3.96   0.000    -.0018262   -.0006179
       _cons |   2.868081   .6038954     4.75   0.000     1.684324    4.051839
------------------------------------------------------------------------------
• Interpretation: for each one-year increase in age, the expected probability
of voting for the referendum decreases by about .001.
. predict yhat, xb /* generate predicted values */
(484 missing values generated)
1. Non-normal errors
• If Y=1 then ε = 1 - π(x) with
probability π(x)
• If Y=0 then ε = -π(x) with
probability 1 - π(x)
• The distribution of ε will have mean 0
and variance equal to π(x)[1- π(x)]
• Note: normality is not required for
estimates to be unbiased but it is
necessary for efficiency
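The mean and variance stated above can be checked in two lines. This is a sketch that assumes the linear probability model is written as Y = π(x) + ε, so that ε equals 1 - π(x) with probability π(x) and -π(x) with probability 1 - π(x); the shorthand π for π(x) is introduced here only for brevity.

```latex
\begin{aligned}
E[\varepsilon] &= \pi\,(1-\pi) + (1-\pi)\,(-\pi) = 0 \\
\operatorname{Var}(\varepsilon) &= E[\varepsilon^2]
  = \pi\,(1-\pi)^2 + (1-\pi)\,\pi^2
  = \pi\,(1-\pi)\,\bigl[(1-\pi) + \pi\bigr]
  = \pi\,(1-\pi)
\end{aligned}
```

Because the variance depends on π(x), it changes with x, so the errors cannot have constant variance.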
Logit and Probit Models:
A Latent Variable Approach
• A latent variable approach treats the use of a dichotomous variable essentially
as a measurement problem:
– There is a continuous underlying, latent, variable (denoted Y*) but we
cannot observe—and therefore cannot measure—it.
– Rather, we observe a dichotomous indicator of that latent variable
• For example: there is an underlying propensity for an individual to vote, for a
nation to go to war, for a student to cheat. However, we only observe the
outcome—the action, not the underlying propensity
• The underlying model is:
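$$Y_i^* = \alpha + \beta X_i + \varepsilon_i$$
• We observe Y_i = 1 when Y_i* > 0 and Y_i = 0 otherwise. Writing the systematic part
compactly as X_iβ (with the constant absorbed into X_i), we can write
$$
\begin{aligned}
P(Y_i = 1) &= P(Y_i^* > 0) \\
           &= P(X_i\beta + \varepsilon_i > 0) \\
           &= P(\varepsilon_i > -X_i\beta) \\
           &= P(\varepsilon_i \le X_i\beta)
\end{aligned}
$$
The last equality holds because the ε_i are distributed symmetrically around zero.
In words, the probability that Y=1 is the probability that the random part is less than
or equal to the systematic part.
The problem is figuring out this probability. This requires the use of some distribution.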
Logit
• Recall that
$$P(Y = 1) = P(Y^* > 0) = P(\varepsilon_i \le X_i\beta)$$
$$P(Y = 1) \equiv \Lambda(X_i\beta) = \frac{\exp(X_i\beta)}{1 + \exp(X_i\beta)}$$
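As a rough illustration, the logistic function Λ can be coded directly, and the symmetry step P(ε > -Xβ) = P(ε ≤ Xβ) can be checked numerically. This is a minimal Python sketch, not part of the original Stata session; the function name `logistic_cdf` and the three index values are assumptions chosen for illustration.

```python
import math

def logistic_cdf(xb):
    """Lambda(xb) = exp(xb) / (1 + exp(xb)), written to avoid overflow."""
    if xb >= 0:
        return 1.0 / (1.0 + math.exp(-xb))
    e = math.exp(xb)
    return e / (1.0 + e)

# Hypothetical values of the systematic part X*beta
for xb in (-2.0, 0.0, 2.0):
    p_greater = 1.0 - logistic_cdf(-xb)  # P(eps > -xb)
    p_less_eq = logistic_cdf(xb)         # P(eps <= xb)
    print(xb, round(p_greater, 6), round(p_less_eq, 6))
```

The two printed probabilities agree for every index value, which is the symmetry property the derivation relies on.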
• We can write this out for every observation in our sample in terms of the
conditional expectation of Y given the value(s) of X. The likelihood for a
given observation is:
$$L_i = \left(\frac{\exp(X_i\beta)}{1 + \exp(X_i\beta)}\right)^{Y_i} \left[1 - \left(\frac{\exp(X_i\beta)}{1 + \exp(X_i\beta)}\right)\right]^{1 - Y_i}$$
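• Observations with Y=1 contribute P(Y=1|X) to the likelihood while those
with Y=0 contribute P(Y=0|X). The likelihood for the whole sample is the product
over the N observations:
$$L = \prod_{i=1}^{N} \left(\frac{\exp(X_i\beta)}{1 + \exp(X_i\beta)}\right)^{Y_i} \left[1 - \left(\frac{\exp(X_i\beta)}{1 + \exp(X_i\beta)}\right)\right]^{1 - Y_i}$$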
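In practice the log of this product is what gets maximized. The sketch below evaluates the logit log-likelihood for a single regressor; it is an illustration only, and the function names and the six data points are made up for the example rather than taken from the exit poll.

```python
import math

def logistic_cdf(xb):
    """Lambda(xb) = exp(xb) / (1 + exp(xb))."""
    if xb >= 0:
        return 1.0 / (1.0 + math.exp(-xb))
    e = math.exp(xb)
    return e / (1.0 + e)

def logit_log_likelihood(alpha, beta, xs, ys):
    """Sum over observations of Y*ln(p) + (1-Y)*ln(1-p), with p = Lambda(alpha + beta*x)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = logistic_cdf(alpha + beta * x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

# Hypothetical toy data (not from sweden_class.dta)
xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [0, 0, 1, 0, 1, 1]

ll_null = logit_log_likelihood(0.0, 0.0, xs, ys)   # every p equals 0.5
ll_slope = logit_log_likelihood(0.0, 1.0, xs, ys)  # positive slope fits these data better
print(ll_null, ll_slope)
```

An estimation routine would search over alpha and beta for the pair that makes this sum as large as possible.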
Probit Models
• Standard normal distribution has mean zero and unit variance. Its density
looks as follows
$$\phi(\varepsilon) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{\varepsilon^2}{2}\right)$$
• The probability for a probit looks like this:
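$$P(Y_i = 1) = \Phi(X_i\beta) = \int_{-\infty}^{X_i\beta} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{\varepsilon^2}{2}\right) d\varepsilon$$
• The log-likelihood is:
$$\ln L = \sum_{i=1}^{N} \left[ Y_i \ln \Phi(X_i\beta) + (1 - Y_i) \ln\left(1 - \Phi(X_i\beta)\right) \right]$$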
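The probit log-likelihood has the same structure as the logit one, with Φ in place of Λ. Here is a small Python sketch under the same assumptions as before: one regressor, hypothetical toy data, and function names invented for the example. Φ is computed from the standard library's complementary error function.

```python
import math

def normal_cdf(z):
    """Standard normal CDF: Phi(z) = 0.5 * erfc(-z / sqrt(2))."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def probit_log_likelihood(alpha, beta, xs, ys):
    """Sum of Y*ln(Phi(xb)) + (1-Y)*ln(1 - Phi(xb)), with xb = alpha + beta*x."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = normal_cdf(alpha + beta * x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

# Hypothetical toy data (not from sweden_class.dta)
xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [0, 0, 1, 0, 1, 1]

ll_null = probit_log_likelihood(0.0, 0.0, xs, ys)
ll_slope = probit_log_likelihood(0.0, 0.6, xs, ys)
print(ll_null, ll_slope)
```

Note that the Y=0 term uses 1 - Φ(Xβ), the probability of observing a zero.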
. replace x=x/100
(599 real changes made)
. gen probit=1/sqrt(2*_pi)*exp(-((x^2)/2))
. gen logit=(exp(x))/((1+exp(x))^2)
[Figure: probit and logit plotted against x (x axis from -4 to 4)]
. gen cumul_logit=sum(logit)
. gen cumul_probit=sum(probit)
. twoway (connected cumul_probit x) (connected cumul_logit x)
[Figure: cumul_probit and cumul_logit plotted against x (x axis from -4 to 4, vertical axis from 0 to 100)]
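The running sums above can be mimicked outside Stata to see why the vertical axis runs to about 100 rather than 1. The sketch below is a Python illustration, not the original code; it assumes a grid from -4 to 4 in steps of .01 (the actual grid used in class is not shown). Because the densities are summed without multiplying by the step width, each total is roughly 100 times the area under the curve.

```python
import math

step = 0.01
xs = [i / 100 for i in range(-400, 401)]  # assumed grid: -4.00, -3.99, ..., 4.00

def normal_pdf(x):
    """Standard normal density, matching the 'probit' variable."""
    return math.exp(-(x * x) / 2.0) / math.sqrt(2.0 * math.pi)

def logistic_pdf(x):
    """Standard logistic density, matching the 'logit' variable."""
    e = math.exp(x)
    return e / (1.0 + e) ** 2

cumul_probit, cumul_logit = [], []
run_p = run_l = 0.0
for x in xs:
    run_p += normal_pdf(x)     # running sum, like Stata's sum()
    run_l += logistic_pdf(x)
    cumul_probit.append(run_p)
    cumul_logit.append(run_l)

print(cumul_probit[-1], cumul_logit[-1])                # unscaled totals
print(cumul_probit[-1] * step, cumul_logit[-1] * step)  # rescaled to approximate probabilities
```

On this grid the logistic total stays a little below the normal total because the logistic distribution has heavier tails, so more of its mass lies outside the range from -4 to 4.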
Which is Better? Logit or Probit?
Hypothesis Testing
• Logit and probit models are part of the “binomial” family in the generalized
linear model (GLM) framework.
• All GLMs are fit by maximum likelihood estimation (MLE) and provide a framework that we will use later
when we add panel and time-series considerations.
• A key component in hypothesis testing of GLMs is the likelihood: both
the initial likelihood and the final likelihood.
• The likelihood also provides information regarding goodness of fit. In GLMs
we can construct a measure called the deviance ($G^2$), which is
computed as $G^2 = -2\ln L$.
• The deviance is similar to the residual sum of squares from OLS.
• Hypothesis tests and confidence intervals are standard across all MLE
models.
• Tests for individual slopes are based on the Wald statistic:
$$Z_0 = \frac{\hat{\beta}_j - \beta_j^{(0)}}{SE(\hat{\beta}_j)}$$
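The Wald statistic can be reproduced by hand from any coefficient and its standard error. The Python sketch below does this for the coefficient and standard-error pairs reported in the first yesno output table of the referendum example later in these slides, with a null value of zero; the function names are assumptions, and the two-sided p-value uses the standard normal distribution.

```python
import math

def wald_z(estimate, std_err, null_value=0.0):
    """Z0 = (estimate - null value) / standard error."""
    return (estimate - null_value) / std_err

def two_sided_p(z):
    """Two-sided p-value under the standard normal: P(|Z| > |z|)."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# (coefficient, standard error) pairs copied from the output table with eu = 3.661967
rows = {
    "eu": (3.661967, 0.0945649),
    "gender": (0.3624785, 0.0547282),
    "birthyear": (-0.0011038, 0.0017202),
    "citizen": (-0.5512848, 0.1560279),
}

for name, (coef, se) in rows.items():
    z = wald_z(coef, se)
    print(name, round(z, 2), round(two_sided_p(z), 3))
```

The rounded z values should match the z column of that table.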
• Tests that several slopes are jointly equal to zero are based on the generalized
likelihood-ratio test (based on the deviance) and have a $\chi^2$ distribution. This
is similar to the F-test from OLS where the difference in ESS from a nested
model is compared to the ESS from the comparison model with degrees of
freedom dependent on the number of parameters being tested:
$$\chi^2 = G^2_{\text{model 1}} - G^2_{\text{model 2}}$$
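A likelihood-ratio test can be sketched in a few lines once the two log-likelihoods are known. In the Python illustration below, model 1 is taken to be the nested (restricted) model and model 2 the comparison model, which is an assumption about the labeling. The log-likelihood values are hypothetical, and the helper `chi2_sf` is written here from the standard closed forms for the chi-square upper tail rather than taken from any library.

```python
import math

def deviance(log_likelihood):
    """G^2 = -2 ln L."""
    return -2.0 * log_likelihood

def chi2_sf(x, df):
    """Upper-tail probability P(chi-square_df > x) for integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(half)), 1   # closed form for df = 1
    else:
        q, k = math.exp(-half), 2              # closed form for df = 2
    while k < df:
        # recurrence: Q(k + 2) = Q(k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
        q += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q

def lr_test(ll_restricted, ll_full, n_restrictions):
    """Chi-square statistic = G^2(model 1) - G^2(model 2), with its p-value."""
    stat = deviance(ll_restricted) - deviance(ll_full)
    return stat, chi2_sf(stat, n_restrictions)

# Hypothetical log-likelihoods: the restricted model drops two slopes
stat, p_value = lr_test(-520.0, -512.5, 2)
print(stat, p_value)
```

The degrees of freedom equal the number of slopes set to zero in the restricted model.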
Example: EURO referendum Sweden September 2003
VALU 2003/Exitpolls from 80 polling places.
• Dataset is a subset of the exit poll; 44 questions in total.
• N=10,732.
• sweden_class.dta
• Question of interest: “How did you vote in the referendum today?” Yes
means that Sweden should adopt the Euro; No means that
Sweden will maintain the status quo.
• Outcome: the referendum was defeated.
• Substantively interesting for lots of reasons; useful for this class because there
are lots of questions that are coded on a nominal, ordinal and ratio scale.
Variable and Value Labels
• Variable labels allow you to add a label that contains a description of the variable in the dataset
label var yesno "Yes=vote for Euro"
• Value labels allow you to label the values that an ordinal or nominal variable takes
label define eu 1 "Sweden should resign from the EU" 2 "Sweden should remain a member of the EU" 3 "No opinion on the matter" 9 "No information"
label values v12 eu
. tab eu
------------------------------------------------------------------------------
       yesno |      Coef.  Std. Err.        z   P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          eu |   3.661967   .0945649    38.72   0.000     3.476623    3.847311
      gender |   .3624785   .0547282     6.62   0.000     .2552131    .4697439
   birthyear |  -.0011038   .0017202    -0.64   0.521    -.0044753    .0022677
     citizen |  -.5512848   .1560279    -3.53   0.000    -.8570939   -.2454757
       _cons |  -4.311818   3.387804    -1.27   0.203    -10.95179    2.328157
------------------------------------------------------------------------------
. predict phat_logit
(option p assumed; Pr(yesno))
(2332 missing values generated)
------------------------------------------------------------------------------
       yesno |      Coef.  Std. Err.        z   P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          eu |   2.110938   .0458988    45.99   0.000     2.020978    2.200898
      gender |   .2146035   .0320565     6.69   0.000     .1517739    .2774331
   birthyear |  -.0006074   .0010026    -0.61   0.545    -.0025725    .0013577
     citizen |  -.3223676   .0868817    -3.71   0.000    -.4926526   -.1520825
       _cons |  -2.511047   1.974073    -1.27   0.203    -6.380159    1.358064
------------------------------------------------------------------------------
. predict phat_probit
(option p assumed; Pr(yesno))
(2332 missing values generated)
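To see what predict is doing with option p, the predicted probability for one respondent can be computed by hand: form the linear index from the coefficients, then pass it through Λ for the first table (the one followed by phat_logit) or Φ for the second (followed by phat_probit). The Python sketch below uses the coefficients from the two tables above. The respondent's values are hypothetical, since the coding of eu, gender, birthyear and citizen is not documented here, and the function names are assumptions.

```python
import math

def logistic_cdf(xb):
    """Lambda(xb), the logit link."""
    return 1.0 / (1.0 + math.exp(-xb)) if xb >= 0 else math.exp(xb) / (1.0 + math.exp(xb))

def normal_cdf(z):
    """Phi(z), the probit link."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

# Coefficients copied from the two output tables
logit_b = {"eu": 3.661967, "gender": 0.3624785, "birthyear": -0.0011038,
           "citizen": -0.5512848, "_cons": -4.311818}
probit_b = {"eu": 2.110938, "gender": 0.2146035, "birthyear": -0.0006074,
            "citizen": -0.3223676, "_cons": -2.511047}

# Hypothetical respondent: these values are assumptions, not taken from the data
person = {"eu": 1, "gender": 1, "birthyear": 1960, "citizen": 1}

def linear_index(coefs, values):
    """X*beta: the constant plus each coefficient times the respondent's value."""
    return coefs["_cons"] + sum(coefs[k] * v for k, v in values.items())

p_logit = logistic_cdf(linear_index(logit_b, person))
p_probit = normal_cdf(linear_index(probit_b, person))
print(p_logit, p_probit)
```

Although the two sets of coefficients are on different scales, the two predicted probabilities for this hypothetical respondent come out close to each other.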
[Figure: scatter plot of the two sets of predicted probabilities, with Pr(yesno) on both axes (each from 0 to .8)]