Economics 4233 - Fall 2011

Carlos Lamarche, Economics, OU

Limited Dependent Variable Models
Lecture 12
This lecture introduces limited dependent variable models. The main goal of the lecture is to present the models briefly, as an introduction for empirical work.

Probit and Logit Models: Suppose that we would like to explain a binary outcome, e.g. $y_i = 1$ if the governor was elected and $y_i = 0$ otherwise. The binary outcome may be explained by some observable variables, e.g. taxes, the state's unemployment rate, etc. Formally, the model is

$$y_i = x_i'\beta + u_i$$

where $x_i = (1, x_{i1}, \ldots, x_{ip})'$ is a vector of independent variables. Since the dependent variable is binary, $\beta$ cannot be interpreted as the usual marginal effect on $y$. But since

$$E(y|x) = P(y = 1|x) \times 1 + P(y = 0|x) \times 0 = P(y = 1|x)$$

we can write the conditional mean of $y$ as

$$P(y = 1|x) = E(y|x) = x'\beta$$

which means that $\beta_j$ measures the change in the probability of success (for example, being elected) when $x_j$ changes (for example, when taxes are decreased by 1 percent). We can estimate this linear probability model, but the predicted value $x'\hat{\beta}$ can lie outside the [0,1] interval, in which case the interpretation of $x'\hat{\beta}$ as an estimated probability no longer makes sense.

Instead of considering a linear probability model, we can consider models of the form

$$P(y = 1|x) = F(x'\beta)$$

The idea is to transform $x'\beta$ into a probability, so we must choose $F$. There are two classical options: Probit or Logit. Let me give you some details about these choices. The Probit strategy is to choose $F$ to be the standard normal distribution function,
$$P(y = 1|x) = \Phi(x_i'\beta) = \int_{-\infty}^{x_i'\beta} \frac{1}{\sqrt{2\pi}} e^{-z^2/2} \, dz$$
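As a numerical check on the probit formula, the sketch below approximates the integral of the standard normal density up to the index $x'\beta$ and compares it with the library CDF. The index values are hypothetical and chosen only for illustration, and the finite lower limit of -8 is an assumption standing in for minus infinity. Note that an index of 1.5 would not be a valid probability in a linear probability model, while $\Phi(1.5)$ lies inside (0,1).

```python
import math
from statistics import NormalDist

def normal_density(z):
    """Standard normal density, the integrand in the probit formula."""
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)

def probit_probability(index, lower=-8.0, steps=20000):
    """Approximate Phi(index) by a midpoint-rule integral of the density.

    The lower limit -8 stands in for minus infinity (an approximation).
    """
    width = (index - lower) / steps
    return width * sum(normal_density(lower + (k + 0.5) * width) for k in range(steps))

# Hypothetical values of the linear index x'b, chosen only for illustration.
for index in (-1.5, 0.0, 0.5, 1.5):
    by_integral = probit_probability(index)
    by_library = NormalDist().cdf(index)
    print(f"x'b = {index:5.2f}  integral = {by_integral:.6f}  library CDF = {by_library:.6f}")
```

The two columns should agree to several decimal places, and every value stays strictly between 0 and 1 whatever the index.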


Note that $\Phi(\cdot)$ is the standard normal cumulative distribution function. The partial effect of the $j$th covariate is nonlinear,

$$\frac{\partial P_i}{\partial x_{ij}} = \phi(x_i'\beta) \times \beta_j$$

where $\phi(\cdot)$ is the standard normal probability density function. On the other hand, the Logit strategy is to choose $F$ to be the logistic distribution function,

$$P(y = 1|x) = \frac{e^{x_i'\beta}}{1 + e^{x_i'\beta}}$$

In this model, the partial effect is

$$\frac{\partial P_i}{\partial x_{ij}} = \frac{e^{x_i'\beta}}{(1 + e^{x_i'\beta})^2} \times \beta_j$$

Estimation of Probit and Logit Models: OLS and WLS are not suited to estimating the nonlinear function $E(y|x)$, so we use the maximum likelihood estimator (MLE). Let me introduce the new method of estimation, and then we consider the case of interest. Consider a random sample of iid variables $\{y_1, y_2, \ldots, y_n\}$, with $y_i \sim f(y; \theta)$. For example, $f$ is the normal density, and $\theta$ contains both the mean $\mu$ and the variance $\sigma^2$. Because the observations are independent, the joint density is the product of the individual densities,
$$L(\theta; y) = \prod_{i=1}^{n} f(y_i; \theta)$$

Taking logs, we obtain the log-likelihood function
$$\mathcal{L} = \log L(\theta; y) = \sum_{i=1}^{n} \log f(y_i; \theta)$$
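To make the log-likelihood concrete, here is a minimal sketch for the normal example, where $\theta = (\mu, \sigma^2)$. The five observations are hypothetical. The sketch evaluates the log-likelihood at the sample mean and the (uncorrected) sample variance, and at two other parameter values for comparison.

```python
import math

def normal_log_likelihood(sample, mu, sigma2):
    """Sum over i of log f(y_i; theta), with f the N(mu, sigma2) density."""
    n = len(sample)
    squared_deviations = sum((y - mu) ** 2 for y in sample)
    return -0.5 * n * math.log(2.0 * math.pi * sigma2) - squared_deviations / (2.0 * sigma2)

sample = [1.2, 0.7, 2.1, 1.6, 0.9]  # hypothetical observations
mu_hat = sum(sample) / len(sample)
sigma2_hat = sum((y - mu_hat) ** 2 for y in sample) / len(sample)

candidates = [(mu_hat, sigma2_hat), (mu_hat + 0.3, sigma2_hat), (mu_hat, 2.0 * sigma2_hat)]
for mu, sigma2 in candidates:
    value = normal_log_likelihood(sample, mu, sigma2)
    print(f"mu = {mu:.3f}, sigma2 = {sigma2:.3f}, log-likelihood = {value:.4f}")
```

The first candidate gives the largest value, since for the normal density the sample mean and the average squared deviation are the parameter values at which the log-likelihood peaks.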

The value of $\theta$ that maximizes the log-likelihood function is called the MLE, $\hat{\theta}$. The density function of the binary variable $y_i$ given $x_i$ is

$$f(y_i|x_i; \beta) = [F(x_i'\beta)]^{y_i} [1 - F(x_i'\beta)]^{1 - y_i}$$

Therefore, the log-likelihood function is
$$\mathcal{L} = \sum_{i=1}^{n} \left\{ y_i \log F(x_i'\beta) + (1 - y_i) \log(1 - F(x_i'\beta)) \right\}$$
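The binary-outcome log-likelihood above can be maximized numerically. The sketch below is one way to do it, not the method used in the lecture: it applies Fisher scoring (an assumption of this example) to a small hypothetical data set with an intercept and one regressor, for both the probit and the logit choice of $F$. The names `MODELS`, `score` and `fit` are illustrative.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

# (F, f) pairs: distribution function and its density for each model.
MODELS = {
    "probit": (STD_NORMAL.cdf, STD_NORMAL.pdf),
    "logit": (lambda t: 1.0 / (1.0 + math.exp(-t)),
              lambda t: math.exp(-t) / (1.0 + math.exp(-t)) ** 2),
}

def log_likelihood(beta, xs, ys, F):
    """Sum over i of y_i log F(x_i'b) + (1 - y_i) log(1 - F(x_i'b))."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = F(beta[0] + beta[1] * x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

def score(beta, xs, ys, F, f):
    """Gradient of the log-likelihood with respect to (b0, b1)."""
    g0 = g1 = 0.0
    for x, y in zip(xs, ys):
        t = beta[0] + beta[1] * x
        w = (y - F(t)) * f(t) / (F(t) * (1.0 - F(t)))
        g0 += w
        g1 += w * x
    return g0, g1

def fit(xs, ys, model, iterations=50):
    """Fisher scoring: b <- b + I(b)^{-1} score(b), with a 2x2 solve by hand."""
    F, f = MODELS[model]
    beta = [0.0, 0.0]
    for _ in range(iterations):
        g0, g1 = score(beta, xs, ys, F, f)
        a = b = c = 0.0  # expected information matrix [[a, b], [b, c]]
        for x in xs:
            t = beta[0] + beta[1] * x
            w = f(t) ** 2 / (F(t) * (1.0 - F(t)))
            a += w
            b += w * x
            c += w * x * x
        det = a * c - b * b
        beta[0] += (c * g0 - b * g1) / det
        beta[1] += (a * g1 - b * g0) / det
    return beta

# Hypothetical data: one regressor and a binary outcome with some overlap.
xs = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
ys = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]
for model in ("probit", "logit"):
    beta_hat = fit(xs, ys, model)
    print(model, [round(b, 3) for b in beta_hat])
```

At the reported values the gradient of the log-likelihood is numerically zero, which is the first-order condition for a maximum. In practice one would rely on a statistical package rather than a hand-written routine.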


The MLE is $\hat{\beta}$, the value that maximizes the log-likelihood function. If $F$ is the logistic distribution function, then $\hat{\beta}$ is the logit estimator, and if $F$ is the standard normal distribution function, then $\hat{\beta}$ is the probit estimator.

Example: We estimate a labor force participation model using Mroz.dta. The variable $inlf$, which means “in the labor force”, is a binary variable indicating labor force participation: 1 if the woman reports working for a wage, and 0 otherwise. The probit model (with standard errors in parentheses) is

$$\text{Probit}(inlf = 1) = \underset{(0.29)}{-1.88} - \underset{(0.04)}{0.015}\, nwifeinc_{it} + \underset{(0.024)}{0.12}\, educ_{it} + \underset{(0.018)}{0.12}\, exp_{it} - \underset{(0.001)}{0.002}\, expsq_{it}$$

where $nwifeinc$ denotes other sources of income, including husband's earnings. The signs and standard errors can be interpreted as before, but the marginal effects cannot be read directly off the coefficients. For example,

$$\frac{\partial \hat{P}(inlf = 1)}{\partial nwifeinc} = -0.006, \qquad \frac{\partial \hat{P}(inlf = 1)}{\partial educ} = 0.05$$

which says, for example, that if a woman spends one more year in school, the probability of participating in the labor market increases by 0.05.

Tobit Model and Censored Models: Consider a case where the dependent variable is zero for a high fraction of the population, but continuous over positive values. Examples are the number of cigarettes smoked per week, spending on alcohol per week, etc. The model is

$$y^* = \beta_0 + \beta_1 x + u, \qquad y = \max\{0, y^*\}, \qquad u|x \sim N(0, \sigma^2)$$
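Because $y$ is censored at zero, $E(y|x)$ is not the linear index $\beta_0 + \beta_1 x$. A standard result for this model, stated here without derivation and not taken from the lecture, is $E(y|x) = \Phi(x\beta/\sigma)\, x\beta + \sigma \phi(x\beta/\sigma)$. The sketch below evaluates it for hypothetical parameter values.

```python
from statistics import NormalDist

N01 = NormalDist()

def tobit_conditional_mean(index, sigma):
    """E(y|x) for y = max{0, y*}, where y* = index + u and u ~ N(0, sigma^2)."""
    ratio = index / sigma
    return index * N01.cdf(ratio) + sigma * N01.pdf(ratio)

# Hypothetical parameter values, not estimates from the lecture.
b0, b1, sigma = -1.0, 1.0, 1.0
for x in (-1.0, 0.0, 1.0, 2.0, 3.0):
    index = b0 + b1 * x
    mean_y = tobit_conditional_mean(index, sigma)
    print(f"x = {x:4.1f}  latent mean = {index:5.2f}  E(y|x) = {mean_y:.4f}")
```

The conditional mean is always positive and bends away from the latent mean where censoring is frequent, which is why a linear regression of $y$ on $x$ does not recover $\beta_1$.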

You may recognize that this conditional mean model is different from the conditional mean models presented in previous lectures. The model can be estimated by maximizing the log-likelihood function, which is equal to
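$$\mathcal{L} = \sum_{i=1}^{n} \left\{ 1\{y_i = 0\} \log\left(1 - \Phi(x_i\beta/\sigma)\right) + 1\{y_i > 0\} \log\left((1/\sigma)\, \phi[(y_i - x_i\beta)/\sigma]\right) \right\}$$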

The indicator function $1\{\cdot\}$ takes the value 1 if the condition inside the braces is true, and the value 0 otherwise. The values $\hat{\beta}$ and $\hat{\sigma}$ that maximize this function are the MLEs.
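The Tobit log-likelihood can be coded directly from its two pieces. The sketch below evaluates it on simulated data and, to keep things short, searches a grid over $\beta_1$ only, holding $\beta_0$ and $\sigma$ at the values used in the simulation. That shortcut is an assumption of the example: a full MLE would maximize over all three parameters jointly. The simulated values 0.5, 1.0 and 1.0 are hypothetical.

```python
import math
import random
from statistics import NormalDist

N01 = NormalDist()

def tobit_log_likelihood(b0, b1, sigma, xs, ys):
    """Tobit log-likelihood for a scalar regressor, with x_i*b = b0 + b1*x_i."""
    total = 0.0
    for x, y in zip(xs, ys):
        index = b0 + b1 * x
        if y == 0:
            # Censored observation: log(1 - Phi(index/sigma)), using 1 - Phi(t) = Phi(-t).
            total += math.log(N01.cdf(-index / sigma))
        else:
            # Uncensored observation: log of (1/sigma) * phi((y - index)/sigma).
            total += math.log(N01.pdf((y - index) / sigma) / sigma)
    return total

# Simulated data; the values 0.5, 1.0 and 1.0 are assumptions for the demo.
rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(2000)]
ys = [max(0.0, 0.5 + 1.0 * x + rng.gauss(0.0, 1.0)) for x in xs]

grid = [k * 0.05 for k in range(41)]  # candidate slopes 0.00, 0.05, ..., 2.00
best_b1 = max(grid, key=lambda b1: tobit_log_likelihood(0.5, b1, 1.0, xs, ys))
share_censored = sum(1 for y in ys if y == 0) / len(ys)
print(f"share censored = {share_censored:.3f}, grid maximizer of b1 = {best_b1:.2f}")
```

With a large simulated sample the grid maximizer should land close to the slope used to generate the data, even though a sizeable share of the observations is censored at zero.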


Let's now consider the censored data case. Suppose, as an example, that we have a labor supply model of the form

$$h_i^* = \beta_0 + \beta_1 wage_i + \beta_2 kids_i + v_i$$

The variable $h_i^*$ is a latent variable, and $h_i^* < 0$ denotes hours of leisure. Why does the model contain a latent variable? From microeconomic theory, we know that if the reservation wage is higher than the market wage rate, individual $i$ will consume leisure. So we will have $h_i = 0$ if $h_i^* \le 0$, and $h_i = h_i^*$ if $h_i^* > 0$.

[INSERT FIGURE]

The graph indicates a problem: since OLS is sensitive to the observations that we are omitting, the estimate of the parameter of interest $\beta$ is inconsistent.

Heckman Sample Selection Problem and Correction: The sample selection problem is similar to the previous case. A bias arises from using a nonrandomly selected sample to estimate behavioural relationships. Sample selection therefore produces a biased estimator, which can be explained by the fact that we are omitting a variable (called the ‘inverse Mills ratio’). Consider

$$\log w_i = x_i'\beta + u_i$$

where $w_i$ is the wage of woman $i$, and $x_i$ are covariates that possibly explain wages, such as education, experience, etc. Consider an additional equation,
$$P_i = 1\{z_i'\gamma + \varepsilon_i > 0\}$$

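where $1\{\cdot\}$ is an indicator variable that takes the value 1 if the event is true, and $z_i$ are covariates that explain a woman's decision to work, e.g. number of children. The key point is that $u$ and $\varepsilon$ are not independent, e.g. $E(u\varepsilon) = \sigma_{u\varepsilon}$. Why? There are factors not observed by the econometrician that affect both the decision to participate in the labor market and the determination of wages. Assuming $u_i$ and $\varepsilon_i$ are jointly normal, we can write

$$u_i = \frac{\sigma_{u\varepsilon}}{\sigma_\varepsilon^2} \varepsilon_i + v_i$$

where $v_i$ is independent of $\varepsilon_i$. Note that,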
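$$
\begin{aligned}
E(\log w_i | x_i, P_i = 1) &= x_i'\beta + E(u_i | \varepsilon_i > -z_i'\gamma) \\
&= x_i'\beta + \frac{\sigma_{u\varepsilon}}{\sigma_\varepsilon} E\left( \frac{\varepsilon_i}{\sigma_\varepsilon} \,\Big|\, \frac{\varepsilon_i}{\sigma_\varepsilon} > -\frac{z_i'\gamma}{\sigma_\varepsilon} \right) \\
&= x_i'\beta + \frac{\sigma_{u\varepsilon}}{\sigma_\varepsilon} \frac{\phi(z_i'\gamma/\sigma_\varepsilon)}{\Phi(z_i'\gamma/\sigma_\varepsilon)}
\end{aligned}
$$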


which says that OLS on the selected sample is biased, because it omits the last term. But if we estimate the model by adding the omitted variable, the estimator is consistent.
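$$\log w_i = x_i'\beta + \delta \lambda(z_i'\gamma/\sigma_\varepsilon) + u_i$$

where $\lambda(\cdot) = \phi(\cdot)/\Phi(\cdot)$ is the inverse Mills ratio. Note that $\hat{\delta}$ is an estimate of $\sigma_{u\varepsilon}/\sigma_\varepsilon$. Therefore, the Heckman procedure can be summarized as follows (for this example):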

1. Estimate a probit of participation in the labor force on the vector of determinants of the decision to work, and obtain estimates of $\gamma/\sigma_\varepsilon$.
2. Construct the inverse Mills ratio (the additional covariate).
3. Estimate the wage equation including the inverse Mills ratio, and test for sample selectivity bias, $H_0: \delta = 0$, using a t-test. If $|t| > t_c$ ($\alpha = 0.05$), we may conclude that we have a non-random sample of women.

Example: We use data on married women (Mroz.dta). In order to correct for sample selection, we first need to estimate a probit model for labor force participation (standard errors in parentheses):

$$\text{probit}(inlf = 1) = \underset{(0.509)}{0.270} - \underset{(0.005)}{0.012}\, nwifeinc_i + \underset{(0.025)}{0.131}\, educ_i + \underset{(0.019)}{0.123}\, exp_i - \underset{(0.0006)}{0.002}\, expsq_i$$

Then, in the second stage, after we estimate the inverse Mills ratio $\hat{\lambda}$, we regress

$$\log(wage) = \underset{(0.307)}{-0.578} + \underset{(0.016)}{0.109}\, educ_i + \underset{(0.016)}{0.044}\, exp_i - \underset{(0.0004)}{0.0009}\, expsq_i + \underset{(0.134)}{0.032}\, \hat{\lambda}$$

The t-statistic on $\hat{\lambda}$ is $0.032/0.134 \approx 0.24$, so we fail to reject $H_0: \delta = 0$. Therefore, there is no evidence of sample selection in estimating the wage offer equation for married women.
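The three steps of the Heckman procedure can be sketched on simulated data. Everything in the block below is an assumption made for illustration: the data are simulated rather than taken from Mroz.dta, the parameter values (wage slope 0.5, $\sigma_{u\varepsilon} = 0.6$, $\sigma_\varepsilon = 1$) are hypothetical, the probit is fitted with a hand-written Fisher scoring routine, and standard errors (needed for the t-test in step 3) are omitted.

```python
import random
from statistics import NormalDist

N01 = NormalDist()

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [mr - factor * mc for mr, mc in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def cross_products(rows, weights, targets):
    """Return X'WX and X'Wt for regressor rows, weights w_i and targets t_i."""
    k = len(rows[0])
    xwx = [[0.0] * k for _ in range(k)]
    xwt = [0.0] * k
    for row, w, t in zip(rows, weights, targets):
        for a in range(k):
            xwt[a] += w * row[a] * t
            for c in range(k):
                xwx[a][c] += w * row[a] * row[c]
    return xwx, xwt

def ols(rows, ys):
    """Ordinary least squares coefficients via the normal equations."""
    return solve(*cross_products(rows, [1.0] * len(ys), ys))

def probit(rows, ds, iterations=15):
    """Probit MLE by Fisher scoring, starting from zero."""
    gamma = [0.0] * len(rows[0])
    for _ in range(iterations):
        weights, targets = [], []
        for row, d in zip(rows, ds):
            t = sum(g * r for g, r in zip(gamma, row))
            F, f = N01.cdf(t), N01.pdf(t)
            weights.append(f * f / (F * (1.0 - F)))
            targets.append((d - F) / f)
        step = solve(*cross_products(rows, weights, targets))
        gamma = [g + s for g, s in zip(gamma, step)]
    return gamma

def heckman_two_step(xs, zs, ds, logw):
    """Return (intercept, slope on x, delta) from the two-step procedure."""
    # Step 1: probit of participation on (1, x, z) estimates gamma / sigma_eps.
    sel_rows = [[1.0, x, z] for x, z in zip(xs, zs)]
    gamma = probit(sel_rows, ds)
    # Step 2: inverse Mills ratio for each participant.
    rows, ys = [], []
    for row, d, w in zip(sel_rows, ds, logw):
        if d == 1:
            t = sum(g * r for g, r in zip(gamma, row))
            rows.append([1.0, row[1], N01.pdf(t) / N01.cdf(t)])
            ys.append(w)
    # Step 3: OLS of the log wage on (1, x, lambda) for participants only.
    return ols(rows, ys)

def simulate(n, seed=0):
    """Simulate wages observed only for participants (hypothetical parameters)."""
    rng = random.Random(seed)
    xs, zs, ds, logw = [], [], [], []
    for _ in range(n):
        x, z, eps = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        u = 0.6 * eps + rng.gauss(0.0, 0.5)  # sigma_ue = 0.6 with sigma_eps = 1
        d = 1 if 0.5 * x + 1.0 * z + eps > 0 else 0
        xs.append(x); zs.append(z); ds.append(d)
        logw.append(1.0 + 0.5 * x + u if d == 1 else None)
    return xs, zs, ds, logw

xs, zs, ds, logw = simulate(4000)
corrected = heckman_two_step(xs, zs, ds, logw)
naive = ols([[1.0, x] for x, d in zip(xs, ds) if d == 1],
            [w for w, d in zip(logw, ds) if d == 1])
print("naive OLS (intercept, slope):", [round(v, 3) for v in naive])
print("two-step (intercept, slope, delta):", [round(v, 3) for v in corrected])
```

In this simulated design the naive OLS slope on the selected sample tends to fall below the value used to generate the data, while the two-step slope is close to it and the coefficient on the inverse Mills ratio is close to $\sigma_{u\varepsilon}/\sigma_\varepsilon$. The variable $z$ enters only the participation equation, which helps separate the inverse Mills ratio from the wage covariates.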