• Two common choices for the distribution function π are:
– π(z) = e^z / (1 + e^z) = 1 / (1 + e^(−z)), the logit case.
– π(z) = Φ(z), the cumulative standard normal distribution function,
the probit case.
• These two functions are similar. I focus on the logit case because it
permits closed-form expressions, unlike the cumulative normal
distribution function.
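As a quick check of the two cases above, here is a minimal sketch; the grid of z values is an arbitrary choice for illustration:

```python
import math

def logit_cdf(z):
    # logit case: pi(z) = e^z / (1 + e^z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + math.exp(-z))

def probit_cdf(z):
    # probit case: standard normal distribution function via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Both are S-shaped, symmetric about z = 0, and equal 1/2 there.
for z in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"z={z:+.1f}  logit={logit_cdf(z):.4f}  probit={probit_cdf(z):.4f}")
```

The printed values show the similarity of the two curves; the logit has slightly heavier tails.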
Threshold interpretation
• Suppose that there exists an underlying linear model,
y_it* = x_it'β + ε_it*.
– The response is interpreted to be the “propensity” to possess a
characteristic.
– We do not observe the propensity but we do observe when the
propensity crosses a threshold, say 0.
– We observe
y_it = 0 if y_it* ≤ 0,
y_it = 1 if y_it* > 0.
• Using the logit distribution function,
Prob(ε_it* ≤ a) = 1 / (1 + exp(−a)).
• Note that Prob(−ε_it* ≤ x_it'β) = Prob(ε_it* ≤ x_it'β), by the symmetry
of the logistic distribution. Thus,
Prob(y_it = 1) = Prob(y_it* > 0) = Prob(ε_it* > −x_it'β)
= Prob(ε_it* ≤ x_it'β) = π(x_it'β) = exp(x_it'β) / (1 + exp(x_it'β)).
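The threshold interpretation can be illustrated by simulation: draw the latent propensity, observe only the threshold crossing, and compare the empirical frequency with π(x'β). The coefficient, covariate value, and seed below are arbitrary assumptions:

```python
import math
import random

random.seed(0)

def pi(z):
    # logit distribution function
    return 1.0 / (1.0 + math.exp(-z))

beta, x = 0.8, 1.5          # hypothetical slope and covariate value
n = 100_000
hits = 0
for _ in range(n):
    u = random.random()
    eps = math.log(u / (1.0 - u))   # logistic(0,1) draw via inverse CDF
    y_star = x * beta + eps          # latent propensity y_it*
    hits += 1 if y_star > 0 else 0   # we observe only the crossing

print(hits / n, pi(x * beta))        # empirical vs. theoretical probability
```

The two printed numbers agree up to simulation noise, matching Prob(y_it = 1) = π(x_it'β).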
Random utility interpretation
• In economics applications, we think of an individual choosing
among c categories.
– Preferences among categories are indexed by an
unobserved utility function.
– We model utility as a function of an underlying value plus
random noise, that is, U_itj = u_it(V_itj + ε_itj), j = 0, 1.
– If Uit1 > Uit0 , then denote this choice as yit = 1.
– Assuming that uit is a strictly increasing function, we have
Prob(y_it = 1) = Prob(U_it0 < U_it1).
Parameter interpretation
• For the logit case, e^(β_j) equals the odds ratio:
e^(β_j) = [ Prob(y_it = 1 | x_itj = 1) / (1 − Prob(y_it = 1 | x_itj = 1)) ]
/ [ Prob(y_it = 1 | x_itj = 0) / (1 − Prob(y_it = 1 | x_itj = 0)) ].
• To illustrate, if β_j = 0.693, then exp(β_j) = exp(0.693) = 2.
– The odds (for y = 1) are twice as great for x_j = 1 as for x_j = 0.
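The odds-ratio reading of the coefficient can be checked directly; the intercept value below is a hypothetical choice, and the ratio is exactly exp(β_j) regardless of it:

```python
import math

def prob(x, beta0=-1.0, beta_j=0.693):
    # hypothetical logit model with one binary covariate x
    z = beta0 + beta_j * x
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    return p / (1.0 - p)

ratio = odds(prob(1.0)) / odds(prob(0.0))
print(ratio, math.exp(0.693))   # odds ratio equals exp(beta_j), about 2
```

Algebraically, the intercept cancels in the ratio, which is why exp(β_j) alone summarizes the effect.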
More parameter interpretation
• Similarly, assuming that the jth explanatory variable is
continuous, we have
β_j = d/dx_itj (x_it'β)
= d/dx_itj ln[ Prob(y_it = 1 | x_itj) / (1 − Prob(y_it = 1 | x_itj)) ]
= [ d/dx_itj { Prob(y_it = 1 | x_itj) / (1 − Prob(y_it = 1 | x_itj)) } ]
/ [ Prob(y_it = 1 | x_itj) / (1 − Prob(y_it = 1 | x_itj)) ].
• Thus, we may interpret β_j as the proportional change in the
odds ratio, known as an elasticity in economics.
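A numerical derivative confirms that β_j is the slope of the log odds; the value β_j = 0.5 and the zero intercept are arbitrary choices for illustration:

```python
import math

beta_j = 0.5   # hypothetical coefficient

def log_odds(x):
    # for the logit with zero intercept, ln[p/(1-p)] = beta_j * x
    p = 1.0 / (1.0 + math.exp(-beta_j * x))
    return math.log(p / (1.0 - p))

h = 1e-6
deriv = (log_odds(1.0 + h) - log_odds(1.0 - h)) / (2 * h)
print(deriv)   # central difference recovers beta_j
```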
Parameter estimation
• The customary estimation method is maximum likelihood.
• The log likelihood of a single observation is
y_it ln π(x_it'β) + (1 − y_it) ln(1 − π(x_it'β)),
which equals ln π(x_it'β) if y_it = 1 and ln(1 − π(x_it'β)) if y_it = 0.
• The log likelihood of the data set is
Σ_it [ y_it ln π(x_it'β) + (1 − y_it) ln(1 − π(x_it'β)) ].
• Taking partial derivatives with respect to β yields the score equations
Σ_it x_it π′(x_it'β) ( y_it − π(x_it'β) ) / [ π(x_it'β)(1 − π(x_it'β)) ] = 0,
• where
E y_it = π(x_it'β) and Var y_it = π(x_it'β)(1 − π(x_it'β)).
β
For the logit function
• The normal equations are:
Σ_it x_it ( y_it − π(x_it'β) ) = 0.
– The solution depends on the responses y_it only through the vector of
statistics Σ_it x_it y_it.
• The solution of these equations yields the maximum likelihood
estimate, b_MLE.
• This method can be extended to provide standard errors for
the estimates.
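The estimation route above can be sketched with a small Newton-Raphson routine on the normal equations; the simulated data, seed, and true coefficient are illustrative assumptions:

```python
import math
import random

random.seed(1)
pi = lambda z: 1.0 / (1.0 + math.exp(-z))

# simulate a one-covariate logit model (hypothetical beta = 1.0)
beta_true = 1.0
data = []
for _ in range(5000):
    x = random.uniform(-2.0, 2.0)
    y = 1 if random.random() < pi(beta_true * x) else 0
    data.append((x, y))

# Newton-Raphson on the score  sum_it x_it (y_it - pi(x_it b)) = 0
b = 0.0
for _ in range(25):
    score = sum(x * (y - pi(b * x)) for x, y in data)
    info = sum(x * x * pi(b * x) * (1 - pi(b * x)) for x, _ in data)
    b += score / info

residual = sum(x * (y - pi(b * x)) for x, y in data)
print(b, residual)   # b near beta_true; normal equations hold at b_MLE
```

The observed-information update shown here is one standard way to solve the score equations; the residual printout verifies the normal equations are satisfied at the solution.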
9.2 Random effects models
• We accommodate heterogeneity by incorporating subject-specific
variables of the form:
p_it = π(α_i + x_it'β).
– We assume that the intercepts are realizations of random variables
from a common distribution.
• We estimate the parameters of the {α_i} distribution and the K slope
parameters β.
• By using the random effects specification, we dramatically reduced the
number of parameters to be estimated compared to the Section 9.3 fixed
effects set-up.
– This is similar to the linear model case.
• This model is computationally difficult to evaluate.
Commonly used distributions
• We assume that subject-specific effects are independent and come from a
common distribution.
– It is customary to assume that the subject-specific effects are normally
distributed.
• We assume, conditional on subject-specific effects, that the responses are
independent. Thus, there is no serial correlation.
• There are two commonly used specifications of the conditional
distributions in the random effects panel data model.
– 1. A logistic model for the conditional distribution of a response. That is,
Prob(y_it = 1 | α_i) = π(α_i + x_it'β).
– 2. A normal (probit) model for the conditional distribution, Prob(y_it = 1 | α_i)
= Φ(α_i + x_it'β). With α_i normally distributed with mean 0 and variance σ_α²,
the marginal probability is
Prob(y_it = 1) = E Φ(α_i + x_it'β) = Φ( x_it'β / √(1 + σ_α²) ).
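The closed-form marginal probability for the probit specification can be checked by Monte Carlo; σ_α and the linear predictor value below are illustrative assumptions:

```python
import math
import random

random.seed(2)
Phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

sigma_a, xb = 1.0, 0.7   # hypothetical sigma_alpha and x_it' beta
n = 200_000

# average Phi(alpha_i + x'b) over normal draws of alpha_i
mc = sum(Phi(random.gauss(0.0, sigma_a) + xb) for _ in range(n)) / n
closed = Phi(xb / math.sqrt(1.0 + sigma_a ** 2))
print(mc, closed)   # the two agree up to simulation noise
```

This also illustrates the attenuation: the marginal slope is scaled down by 1/√(1 + σ_α²) relative to the conditional slope.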
9.3 Fixed effects models
• As with homogeneous models, we express the probability of the
response being 1 as a nonlinear function of linear combinations of
explanatory variables.
• To accommodate heterogeneity, we incorporate subject-specific
variables of the form:
p_it = π(α_i + x_it'β).
– Here, the subject-specific effects account only for the intercepts and
do not include other variables.
– We assume that {α_i} are fixed effects in this section.
• In this chapter, we assume that responses are serially uncorrelated.
• Important point: Panel data models with subject dummy variables provide
inconsistent parameter estimates….
Maximum likelihood estimation
• Unlike random effects models, maximum likelihood estimators are inconsistent in
fixed effects models.
– The log likelihood of the data set is
Σ_it [ y_it ln π(α_i + x_it'β) + (1 − y_it) ln(1 − π(α_i + x_it'β)) ].
– This log likelihood can still be maximized to yield maximum likelihood
estimators.
– However, as the subject size n tends to infinity, the number of parameters also tends to
infinity.
• Intuitively, our ability to estimate β is corrupted by our inability to estimate
consistently the subject-specific effects {α_i}.
– In the linear case, the maximum likelihood estimates are equivalent to the least
squares estimates.
• The least squares estimates of β were consistent.
• The least squares procedure “swept out” intercept estimators when producing
estimates of β.
Maximum likelihood estimation is
inconsistent
• Example 9.2 (Chamberlain, 1978; Hsiao, 1986).
– Let Ti = 2, K = 1, x_i1 = 0 and x_i2 = 1.
– Take derivatives of the likelihood function to get the
score functions – these are in display (9.8).
– From (9.8), the score functions are
∂L/∂α_i = y_i1 + y_i2 − e^(α_i) / (1 + e^(α_i)) − e^(α_i + β) / (1 + e^(α_i + β)) = 0
– and
∂L/∂β = Σ_i [ y_i2 − e^(α_i + β) / (1 + e^(α_i + β)) ] = 0.
– Appendix 9A.1:
• Maximize this to get b_MLE.
• Show that the probability limit of b_MLE is 2β, and hence it is an
inconsistent estimator of β.
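A short calculation with the score equations above shows that, for this design, subjects with y_i1 + y_i2 equal to 0 or 2 drop out, and the MLE reduces to b_MLE = 2 ln(n01/n10), where n01 and n10 count the (0, 1) and (1, 0) response patterns. A simulation (with an assumed normal distribution for the α_i) illustrates the limit of 2β:

```python
import math
import random

random.seed(3)
pi = lambda z: 1.0 / (1.0 + math.exp(-z))

beta = 1.0            # true slope
n = 200_000
n01 = n10 = 0
for _ in range(n):
    a = random.gauss(0.0, 1.0)                        # hypothetical alpha_i
    y1 = 1 if random.random() < pi(a) else 0          # x_i1 = 0
    y2 = 1 if random.random() < pi(a + beta) else 0   # x_i2 = 1
    if (y1, y2) == (0, 1):
        n01 += 1
    elif (y1, y2) == (1, 0):
        n10 += 1

b_cml = math.log(n01 / n10)   # conditional MLE, consistent for beta
b_mle = 2.0 * b_cml           # fixed effects MLE for this design
print(b_cml, b_mle)           # near beta and 2*beta, respectively
```

The same counts also preview the next topic: ln(n01/n10) is the conditional maximum likelihood estimator, which is consistent for β.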
Conditional maximum likelihood
estimation
• This estimation technique provides consistent estimates of the
beta coefficients.
– It is due to Chamberlain (1980) in the context of fixed
effects panel data models.
• Let’s consider the logit specification of π, so that
p_it = π(α_i + x_it'β) = 1 / (1 + exp(−(α_i + x_it'β))).
• Big idea: With this specification, it turns out that Σ_t y_it is a
sufficient statistic for α_i.
– Thus, if we condition on Σ_t y_it, then the distribution of the
responses will not depend on α_i.
Example of the sufficiency
• To illustrate how to separate the intercept from the slope
effects, consider the case Ti = 2.
– Suppose that the sum, Σ_t y_it = y_i1 + y_i2, equals either 0 or 2.
• If the sum equals 0, then Prob(y_i1 = 0, y_i2 = 0 | y_i1 + y_i2 = 0) = 1.
• If the sum equals 2, then Prob(y_i1 = 1, y_i2 = 1 | y_i1 + y_i2 = 2) = 1.
• Neither conditional probability depends on α_i.
• Both conditional events are certain and contribute nothing
to a conditional likelihood.
– If the sum equals 1, then
Prob(y_i1 + y_i2 = 1) = Prob(y_i1 = 0) Prob(y_i2 = 1) + Prob(y_i1 = 1) Prob(y_i2 = 0),
and the conditional probability Prob(y_i2 = 1 | y_i1 + y_i2 = 1) does not
depend on α_i.
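That the conditional probability is free of α_i can be verified numerically; the parameter and covariate values below are arbitrary:

```python
import math

pi = lambda z: 1.0 / (1.0 + math.exp(-z))

def cond_prob_y2(alpha, x1, x2, beta):
    # Prob(y_i2 = 1 | y_i1 + y_i2 = 1) for Ti = 2
    p01 = (1 - pi(alpha + x1 * beta)) * pi(alpha + x2 * beta)
    p10 = pi(alpha + x1 * beta) * (1 - pi(alpha + x2 * beta))
    return p01 / (p01 + p10)

beta, x1, x2 = 0.7, 0.0, 1.0   # hypothetical values
vals = [cond_prob_y2(a, x1, x2, beta) for a in (-3.0, 0.0, 3.0)]
target = pi((x2 - x1) * beta)  # depends only on the covariate difference
print(vals, target)            # identical for every alpha
```

Algebraically the α_i cancels in the ratio, leaving exp((x_i2 − x_i1)'β) / (1 + exp((x_i2 − x_i1)'β)), which is the basis of conditional maximum likelihood estimation.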
• For the variance function, consider Var y_it = π_it (1 − π_it).
• Let Corr(yir, yis) denote the correlation between yir and yis.
– This is known as a working correlation.
• Use the exchangeable correlation structure, specified as
Corr(y_ir, y_is) = 1 for r = s, and
Corr(y_ir, y_is) = ρ for r ≠ s.
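The exchangeable working correlation matrix is straightforward to construct; the dimension T = 4 and ρ = 0.3 below are arbitrary illustrative choices:

```python
def exchangeable_corr(T, rho):
    # T x T working correlation: 1 on the diagonal, rho off the diagonal
    return [[1.0 if r == s else rho for s in range(T)] for r in range(T)]

R = exchangeable_corr(4, 0.3)
for row in R:
    print(row)
```

A single parameter ρ governs all off-diagonal entries, which is what makes this working correlation parsimonious.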