Lecture 9
Dichotomous Dependent Vars.

The data look like this. The only values Y can have are 0 and 1.

[Figure: scatter of Black Representative Elected (0 or 1) against Black Voting Age Population (0 to 1)]
And here's a linear fit of the data. Note that the line goes below 0 and above 1, so it doesn't fit the data very well.

[Figure: the same scatter with a linear fit that dips below 0 and rises above 1]
And if we take values of Y between 0 and 1 to be probabilities, this doesn't make sense.

[Figure: Black Representative Elected and fitted values from the linear model, against Black Voting Age Population]
Redefining the Dependent Var.

How to solve this problem?

We need to transform the dichotomous Y into a continuous variable Y′ ∈ (−∞, ∞).

So we need a link function F(Y) that takes a dichotomous Y and gives us a continuous, real-valued Y′.

Then we can run

  F(Y) = Y′ = Xβ + ε
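As a concrete illustration, here is a minimal Python sketch of one such link: the probit link, i.e. the inverse of the standard normal CDF. The function names are illustrative, not from the lecture.

```python
# A minimal sketch of the link-function idea using the probit link:
# the inverse of the standard normal CDF maps a probability in (0, 1)
# onto the whole real line, and the CDF maps it back.
from statistics import NormalDist

_std_normal = NormalDist()  # standard normal: mean 0, sd 1

def probit_link(p: float) -> float:
    """Map a probability p in (0, 1) to a real number in (-inf, inf)."""
    return _std_normal.inv_cdf(p)

def inverse_probit(z: float) -> float:
    """Map any real z back to a probability via the standard normal CDF."""
    return _std_normal.cdf(z)
```

Note that probit_link(0.5) is 0, and probabilities near 0 or 1 are sent far out into the tails.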
Redefining the Dependent Var.

  Original Y:           {0, 1}
  Y as a Probability:   [0, 1]
  Y′:                   (−∞, ∞)
After estimation, you can back out probabilities using the standard normal distribution.
Probit Estimation

[Figure: standard normal density over (−4, 4); the shaded area up to the index value gives Prob(Y=1)]

This fits the data much better than the linear estimation, and the predicted probability always lies between 0 and 1.
[Figure: fitted probit curve, Probability of Electing a Black Rep. (0 to 1) against Black Voting Age Population (0 to 1)]
  Y as a Probability:   [0, 1]
  Odds of Y:            [0, ∞)
  Log-Odds of Y:        (−∞, ∞)
Logit Function

This is called the logit function:

  logit(Y) = log[O(Y)] = log[y/(1−y)]

Why would we want to do this?

At first, this was computationally easier than working with normal distributions.

Now, it still has some nice properties that we'll investigate next time with multinomial dep. vars.

The density function associated with it is very close to a standard normal distribution.
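A quick sketch of the odds and logit calculations (Python, standard library only; the function names are illustrative):

```python
import math

def odds(y: float) -> float:
    """Odds of Y: y / (1 - y), mapping [0, 1) onto [0, inf)."""
    return y / (1.0 - y)

def logit(y: float) -> float:
    """Log-odds of Y: log[y / (1 - y)], mapping (0, 1) onto (-inf, inf)."""
    return math.log(odds(y))
```

logit(.5) is 0; values of y near 1 give large positive log-odds, and values near 0 give large negative log-odds.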
Logit vs. Probit

[Figure: logistic and standard normal densities over (−4, 4)]

The logit function is similar, but the logistic distribution has fatter tails than the normal distribution.
Logit Function

This translates back to the original Y as:

  log[Y/(1−Y)] = Xβ
  Y/(1−Y) = e^(Xβ)
  Y = (1−Y) e^(Xβ)
  Y = e^(Xβ) − e^(Xβ) Y
  Y + e^(Xβ) Y = e^(Xβ)
  (1 + e^(Xβ)) Y = e^(Xβ)
  Y = e^(Xβ) / (1 + e^(Xβ))
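The last line of the derivation is just the logistic (inverse-logit) function. A small numerical check, with illustrative names:

```python
import math

def inv_logit(x: float) -> float:
    """Y = e^x / (1 + e^x), the solution of log[Y/(1-Y)] = x for Y."""
    return math.exp(x) / (1.0 + math.exp(x))

def logit(y: float) -> float:
    """Log-odds, used here only to verify the algebra round-trips."""
    return math.log(y / (1.0 - y))
```

Applying logit to inv_logit(x) recovers x, confirming the two are inverses.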
Latent Variables

For the rest of the lecture we'll talk in terms of probits, but everything holds for logits too.

One way to state what's going on is to assume that there is a latent variable Y* such that

  Y* = Xβ + ε,  ε ~ N(0, σ²)    (Normal ε ⇒ Probit)

and that we observe

  yᵢ = 0 if yᵢ* ≤ 0
  yᵢ = 1 if yᵢ* > 0
Latent Variables

This translates to possible values for the error term:

  yᵢ* > 0  ⇔  xᵢβ + εᵢ > 0  ⇔  εᵢ > −xᵢβ

  Pr(yᵢ* > 0 | xᵢ) = Pr(yᵢ = 1 | xᵢ) = Pr(εᵢ > −xᵢβ)
                   = Pr(εᵢ/σ > −xᵢβ/σ)
                   = Φ(xᵢβ/σ)    (by symmetry of the normal density)

Similarly,

  Pr(yᵢ = 0 | xᵢ) = 1 − Φ(xᵢβ/σ)
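In code, the two probabilities above look like this (Python; Φ is the standard normal CDF, and the scalar x and beta are illustrative stand-ins for xᵢβ):

```python
from statistics import NormalDist

Phi = NormalDist().cdf  # standard normal CDF

def prob_y1(x: float, beta: float, sigma: float = 1.0) -> float:
    """Pr(y_i = 1 | x_i) = Phi(x_i * beta / sigma)."""
    return Phi(x * beta / sigma)

def prob_y0(x: float, beta: float, sigma: float = 1.0) -> float:
    """Pr(y_i = 0 | x_i) = 1 - Phi(x_i * beta / sigma)."""
    return 1.0 - Phi(x * beta / sigma)
```

The two probabilities sum to 1 by construction.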
Latent Variables

Look again at the expression for Pr(yᵢ=1):

  Pr(yᵢ = 1 | xᵢ) = Φ(xᵢβ/σ)

We can't estimate both β and σ, since they enter the equation only as a ratio. So we set σ = 1, making the distribution on ε a standard normal density.

One (big) question left: how do we actually estimate the values of the β coefficients here? (Other than just issuing the probit command in Stata!)
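The identification problem is easy to see numerically: doubling both β and σ leaves every predicted probability unchanged. A sketch with made-up values:

```python
from statistics import NormalDist

Phi = NormalDist().cdf  # standard normal CDF

def p1(x: float, beta: float, sigma: float) -> float:
    """Pr(y = 1 | x) = Phi(x * beta / sigma): only beta/sigma is identified."""
    return Phi(x * beta / sigma)

# (beta, sigma) = (1.5, 1.0) and (3.0, 2.0) imply identical probabilities
# at every x, so the data cannot tell the two pairs apart.
identical = all(abs(p1(x, 1.5, 1.0) - p1(x, 3.0, 2.0)) < 1e-12
                for x in [-2.0, -0.5, 0.0, 0.7, 2.0])
```

This is exactly why we normalize σ = 1.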
Maximum Likelihood Estimation

Say we're estimating Y = Xβ + ε as a probit, and say we're given some trial coefficients β̂. Then for each observation yᵢ, we can plug in xᵢ and β̂ to get Pr(yᵢ=1) = Φ(xᵢβ̂).

For example, let's say Pr(yᵢ=1) = 0.8. Then if the actual observation was yᵢ = 1, we can say its likelihood (given β̂) is 0.8. But if yᵢ = 0, then its likelihood was only 0.2. And conversely for Pr(yᵢ=0).
Maximum Likelihood Estimation

Let L(yᵢ | β̂) be the likelihood of yᵢ given β̂. For any given trial set of coefficients, we can calculate the likelihood of each yᵢ. Then the likelihood of the entire sample is:

  L(y₁)·L(y₂)·L(y₃)···L(yₙ) = ∏ᵢ₌₁ⁿ L(yᵢ)
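A sketch of the per-observation likelihood and the sample log-likelihood for a trial β (scalar x's; the names are illustrative):

```python
import math
from statistics import NormalDist

Phi = NormalDist().cdf  # standard normal CDF

def obs_likelihood(y: int, p1: float) -> float:
    """Likelihood of one observation: Pr(y=1) if y is 1, else Pr(y=0)."""
    return p1 if y == 1 else 1.0 - p1

def sample_log_lik(ys, xs, beta: float) -> float:
    """Log of the product of per-observation likelihoods (sum of logs)."""
    return sum(math.log(obs_likelihood(y, Phi(x * beta)))
               for y, x in zip(ys, xs))
```

In practice we maximize the log of the product, since multiplying many numbers below 1 underflows quickly.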
Maximum Likelihood Estimation

  y* = xβ + ε,  Probit: ℝ → [0, 1],  P(y=1) = Φ(y*)

[Figure: fitted curve P(y=1) over the data; for observations such as y₃ and y₉, the vertical distance from the point to the curve is 1 − L(yᵢ | β̂)]

Given estimates β̂, the distance from y₃ to the line P(y=1) is 1 − L(y₃ | β̂).

[Figure: the same plot for different trial coefficients, showing the impact of changing β on each observation's likelihood]
[Figure: binary outcomes over time t = 1, 2, 3, …, T for a single time series, and for Countries 1–4 in time-series cross-section data]
Maximum Likelihood Estimation

Recall that the likelihood function for the sample is:

  L(y₁)·L(y₂)·L(y₃)···L(yₙ) = ∏ᵢ₌₁ⁿ L(yᵢ) = L

As a simple example, estimate π = Pr(yᵢ = 1) for a Bernoulli variable by maximizing the log-likelihood:

  max over π of Σᵢ₌₁ⁿ log L(yᵢ | π) = log ∏ᵢ₌₁ⁿ π^yᵢ (1 − π)^(1−yᵢ)
                                    = Σᵢ₌₁ⁿ [yᵢ log(π) + (1 − yᵢ) log(1 − π)]

Finally, set the derivative with respect to π to 0 and solve for π:

  Σᵢ₌₁ⁿ [yᵢ(1 − π) − (1 − yᵢ)π] / [π(1 − π)] = 0
  Σᵢ₌₁ⁿ [yᵢ − π] = 0
  nπ = Σᵢ₌₁ⁿ yᵢ
  π̂ = (Σᵢ₌₁ⁿ yᵢ) / n

Magically, the value of π that maximizes the likelihood function is the sample mean, just as we thought.
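A quick numerical check of this result on a made-up sample: a grid search over π finds the maximum of the log-likelihood at the sample mean.

```python
import math

def bernoulli_log_lik(pi: float, ys) -> float:
    """Sum over i of y_i*log(pi) + (1 - y_i)*log(1 - pi)."""
    return sum(y * math.log(pi) + (1 - y) * math.log(1.0 - pi) for y in ys)

ys = [1, 0, 1, 1, 0, 1, 0, 1]  # toy sample; mean = 5/8 = .625
grid = [i / 1000.0 for i in range(1, 1000)]  # candidate values of pi
pi_hat = max(grid, key=lambda p: bernoulli_log_lik(p, ys))  # grid-search MLE
```

The maximizer pi_hat lands exactly on the sample mean, .625.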
Maximum Likelihood Estimation

Can do the same exercise for OLS regression. The set of coefficients that maximize the likelihood would then minimize the sum of squared residuals, as before.

This works for logit/probit as well. In fact, it works for any estimation equation: just look at the likelihood function L you're trying to maximize and the parameters β you can change, then search for the values of β that maximize L. (We'll skip the details of how this is done.)

Maximizing L can be computationally intense, but with today's computers it's usually not a big problem.
Maximum Likelihood Estimation

This is what Stata does when you run a probit:

. probit black bvap

------------------------------------------------------------------------------
       black |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        bvap |   0.092316   .0054468    16.95   0.000     0.081641    0.102992
       _cons |  -0.047147   .0027917   -16.89   0.000    -0.052619   -0.041676
------------------------------------------------------------------------------

Both coefficients are significant.
Marginal Effects in Probit

In linear regression, if the coefficient on x is β, then a 1-unit increase in x increases Y by β. But what exactly does it mean in probit that the coefficient on BVAP is 0.0923 and significant?

It means that a 1% increase in BVAP will raise the z-score of Pr(Y=1) by 0.0923. And this coefficient is different from 0 at the 5% level.

Where the fitted curve is flat, near BVAP of 0 or 1, such a change barely moves the probability itself:

[Figure: fitted probit curve against Black Voting Age Population, nearly flat at both ends]
Marginal Effects in Probit

But increasing BVAP from .5 to .6 does have a big impact on the probability:

[Figure: fitted probit curve, Probability of Electing a Black Rep. against Black Voting Age Population, steep between .5 and .6]
Marginal Effects in Probit

So lesson 1 is that the marginal impact of changing a variable is not constant. Another way of saying the same thing is that in the linear model

  Y = β₀ + β₁x₁ + β₂x₂ + … + βₙxₙ, so
  ∂Y/∂xᵢ = βᵢ

but in probit the derivative ∂Pr(Y=1)/∂xᵢ = φ(Xβ)·βᵢ depends on where X is evaluated.
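The non-constant marginal effect can be computed directly: for probit, the derivative of Pr(Y=1) with respect to x is φ(xβ)·β, which depends on x. A sketch with an illustrative β (not an estimate from the lecture):

```python
from statistics import NormalDist

norm = NormalDist()  # standard normal, for the density phi

def probit_marginal_effect(x: float, beta: float) -> float:
    """d Pr(Y=1) / dx = phi(x * beta) * beta, evaluated at x."""
    return norm.pdf(x * beta) * beta

beta = 2.0                                     # illustrative value
me_middle = probit_marginal_effect(0.0, beta)  # steepest part of the curve
me_tail = probit_marginal_effect(2.0, beta)    # out where the curve is flat
```

The effect is large where xβ is near 0 and tiny out in the tails, which is exactly the lesson above.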
Example: Vote Choice

Or, calculate the impact of facing a quality challenger by hand, keeping all other variables at their median.

With no quality challenger, the linear index is 1.52, so from the standard normal table the predicted probability is Φ(1.52). Facing a quality challenger lowers the index by .339, giving Φ(1.52 − .339).
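The by-hand calculation can be reproduced with the standard normal CDF in place of the table (the 1.52 index and the .339 shift are taken from the slide; the rest is arithmetic):

```python
from statistics import NormalDist

Phi = NormalDist().cdf  # stands in for the standard normal table

z_median = 1.52   # linear index with no quality challenger (from slide)
shift = 0.339     # drop in the index from facing a quality challenger

p_no_challenger = Phi(z_median)          # Phi(1.52), about .936
p_challenger = Phi(z_median - shift)     # Phi(1.181), about .881
effect = p_no_challenger - p_challenger  # about a 5.5-point drop
```

So at the median of the other variables, a quality challenger lowers the predicted probability of victory by roughly five and a half percentage points.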