
October 2007

Chapter 10

Binary choice and limited dependent variable models, and maximum likelihood estimation
Overview
The first part of this chapter describes the linear probability model, logit analysis, and probit analysis, three techniques for fitting regression models where the dependent variable is a qualitative characteristic. Next it discusses tobit analysis, a censored regression model fitted using a combination of linear regression analysis and probit analysis. This leads to sample selection models and Heckman analysis. The second part of the chapter introduces maximum likelihood estimation, the method used to fit all of these models except the linear probability model.

Further material
Limiting distributions and the properties of maximum likelihood estimators

Provided that weak regularity conditions involving the differentiability of the likelihood function are satisfied, maximum likelihood (ML) estimators have the following attractive properties in large samples: (1) They are consistent. (2) They are asymptotically normally distributed. (3) They are asymptotically efficient. The meaning of the first property is familiar. It implies that the probability density function of the estimator collapses to a spike at the true value. This being the case, what can the other assertions mean? If the distribution becomes degenerate as the sample size becomes very large, how can it be described as having a normal distribution? And how can it be described as being efficient, when its variance, and the variance of any other consistent estimator, tend to zero? To discuss the last two properties, we consider what is known as the limiting distribution of an estimator. This is the distribution of the estimator when the divergence between it and its population mean is multiplied by $\sqrt{n}$. If we do this, the distribution of a typical estimator remains nondegenerate as n becomes large, and this enables us to say meaningful things about its shape and to make comparisons with the distributions of other estimators (also multiplied by $\sqrt{n}$).

To put this mathematically, suppose that there is one parameter of interest, $\theta$, and that $\hat{\theta}$ is its ML estimator. Then (2) says that

$$\sqrt{n}\left(\hat{\theta} - \theta\right) \sim N\left(0, \sigma^2\right)$$

for some variance $\sigma^2$. (3) says that, given any other consistent estimator $\tilde{\theta}$, $\sqrt{n}\left(\tilde{\theta} - \theta\right)$ cannot have smaller variance.
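The distinction between a collapsing distribution and a nondegenerate limiting distribution is easy to see numerically. The simulation sketch below (our addition, not part of the guide; it assumes numpy is available) uses the ML estimator $\hat{\lambda} = 1/\bar{X}$ of an exponential parameter, derived in Exercise A10.6: the standard deviation of $\hat{\lambda}$ shrinks toward zero as n grows, while that of $\sqrt{n}(\hat{\lambda} - \lambda)$ settles down near its asymptotic value.

```python
# Illustrative simulation (a sketch, not from the guide): the ML estimator of
# the exponential parameter lambda is 1/xbar.  Its distribution collapses to a
# spike as n grows, but sqrt(n)*(lambda_hat - lambda) remains nondegenerate,
# with standard deviation approaching lambda.
import numpy as np

rng = np.random.default_rng(0)
lam = 2.0                                   # true parameter (arbitrary choice)
for n in (25, 100, 400, 1600):
    # 10,000 replications of samples of size n; lambda_hat = 1/xbar in each
    lam_hat = 1.0 / rng.exponential(1.0 / lam, size=(10000, n)).mean(axis=1)
    z = np.sqrt(n) * (lam_hat - lam)
    print(f"n={n:5d}  sd(lam_hat)={lam_hat.std():.4f}  sd(rescaled)={z.std():.4f}")
# sd(lam_hat) shrinks toward zero, while sd(rescaled) approaches lambda = 2
```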


Test procedures for maximum likelihood estimation

This section on ML tests contains material that is a little advanced for an introductory econometrics course. It is provided because likelihood ratio tests are encountered in the sections on binary choice models and because a brief introduction may be of help to those who proceed to a more advanced course. There are three main approaches to testing hypotheses in maximum likelihood estimation: likelihood ratio (LR) tests, Wald tests, and Lagrange multiplier (LM) tests. Since the theory behind Lagrange multiplier tests is relatively complex, the present discussion will be confined to the first two types. We will start by assuming that the probability density function of a random variable X is a known function of a single unknown parameter $\theta$ and that the likelihood function for $\theta$, given a sample of n observations on X, $L(\theta \mid X_1, ..., X_n)$, satisfies weak regularity conditions involving its differentiability. In particular, we assume that $\hat{\theta}$ is determined by the first-order condition $dL/d\theta = 0$. (This rules out estimators such as that in Exercise A10.7.) The null hypothesis is $H_0: \theta = \theta_0$, the alternative hypothesis is $H_1: \theta \neq \theta_0$, and the maximum likelihood estimate of $\theta$ is $\hat{\theta}$.
Likelihood ratio tests

A likelihood ratio test compares the value of the likelihood function at $\theta = \theta_0$, $L(\theta_0)$, with its value at $\theta = \hat{\theta}$. In view of the definition of $\hat{\theta}$, $L(\hat{\theta}) \geq L(\theta_0)$ for all $\theta_0$. However, if the null hypothesis is true, the ratio $L(\hat{\theta})/L(\theta_0)$ should not be significantly greater than 1. As a consequence, the logarithm of the ratio,

$$\log\left(\frac{L(\hat{\theta})}{L(\theta_0)}\right) = \log L(\hat{\theta}) - \log L(\theta_0)$$

should not be significantly different from zero. In that it involves a comparison of the measures of goodness of fit for unrestricted and restricted versions of the model, the LR test is similar to an F test. Under the null hypothesis, it can be shown that in large samples the test statistic

$$LR = 2\left(\log L(\hat{\theta}) - \log L(\theta_0)\right)$$

has a chi-squared distribution with one degree of freedom. If there are multiple parameters of interest, and multiple restrictions, the number of degrees of freedom is equal to the number of restrictions.

Examples

We will return to the example in Section 10.6 in the text, where we have a normally-distributed random variable X with unknown population mean $\mu$ and known standard deviation equal to 1. Given a sample of n observations, the likelihood function is
$$L(\mu \mid X_1, ..., X_n) = \left(\frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(X_1 - \mu)^2}\right) \times \cdots \times \left(\frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(X_n - \mu)^2}\right)$$

The log-likelihood is

$$\log L(\mu \mid X_1, ..., X_n) = -n \log \sqrt{2\pi} - \frac{1}{2}\sum_{i=1}^{n} (X_i - \mu)^2$$

and the unrestricted ML estimate is $\hat{\mu} = \bar{X}$. The LR statistic for the null hypothesis $H_0: \mu = \mu_0$ is therefore
$$LR = 2\left(\left(-n\log\sqrt{2\pi} - \frac{1}{2}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2\right) - \left(-n\log\sqrt{2\pi} - \frac{1}{2}\sum_{i=1}^{n}(X_i - \mu_0)^2\right)\right)$$

$$= \sum_{i=1}^{n}(X_i - \mu_0)^2 - \sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2 = n\left(\bar{X} - \mu_0\right)^2$$

(The last step uses the decomposition $\sum(X_i - \mu_0)^2 = \sum(X_i - \bar{X})^2 + n(\bar{X} - \mu_0)^2$.)
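The algebra can be checked numerically. The sketch below (our addition; the sample and seed are arbitrary) confirms that twice the difference between the unrestricted and restricted log-likelihoods equals $n(\bar{X} - \mu_0)^2$.

```python
# Quick numerical check (a sketch, not from the guide): with sigma = 1,
# twice the gap between the unrestricted and restricted log-likelihoods
# equals n*(xbar - mu0)^2.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(0.3, 1.0, size=50)          # sample with true mu = 0.3
mu0 = 0.0                                  # hypothesized value

def loglik(mu):
    return -len(x) * np.log(np.sqrt(2 * np.pi)) - 0.5 * np.sum((x - mu) ** 2)

lr_direct = 2 * (loglik(x.mean()) - loglik(mu0))
lr_formula = len(x) * (x.mean() - mu0) ** 2
print(lr_direct, lr_formula)               # the two numbers coincide
```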

If we relax the assumption $\sigma = 1$, the unrestricted likelihood function is


$$L(\mu, \sigma \mid X_1, ..., X_n) = \left(\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{X_1 - \mu}{\sigma}\right)^2}\right) \times \cdots \times \left(\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{X_n - \mu}{\sigma}\right)^2}\right)$$

and the log-likelihood is

$$\log L(\mu, \sigma \mid X_1, ..., X_n) = -n\log\sigma - n\log\sqrt{2\pi} - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(X_i - \mu)^2$$

The first-order condition obtained by differentiating by $\sigma$ is


$$\frac{\partial \log L}{\partial \sigma} = -\frac{n}{\sigma} + \frac{1}{\sigma^3}\sum_{i=1}^{n}(X_i - \mu)^2 = 0$$

from which we obtain


$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \mu)^2$$

Substituting back into the log-likelihood function, the latter now becomes a function of only (and is known as the concentrated log-likelihood function or, sometimes, the profile log-likelihood function):
$$\log L(\mu \mid X_1, ..., X_n) = -n\log\left(\left(\frac{1}{n}\sum_{i=1}^{n}(X_i - \mu)^2\right)^{1/2}\right) - n\log\sqrt{2\pi} - \frac{n}{2}$$

As before, the ML estimator of $\mu$ is $\bar{X}$. Hence the LR statistic is


$$LR = 2\left(\left(-n\log\left(\left(\frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2\right)^{1/2}\right) - n\log\sqrt{2\pi} - \frac{n}{2}\right) - \left(-n\log\left(\left(\frac{1}{n}\sum_{i=1}^{n}(X_i - \mu_0)^2\right)^{1/2}\right) - n\log\sqrt{2\pi} - \frac{n}{2}\right)\right)$$

$$= n\log\left(\frac{\sum_{i=1}^{n}(X_i - \mu_0)^2}{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2}\right)$$

It is worth noting that this is closely related to the F statistic obtained when one fits the least squares model $X_i = \mu + u_i$.
The least squares estimator of $\mu$ is $\bar{X}$ and $RSS = \sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2$. If one imposes the restriction $\mu = \mu_0$, we have $RSS_R = \sum_{i=1}^{n}(X_i - \mu_0)^2$ and the F statistic

$$F(1, n-1) = \frac{\sum_{i=1}^{n}(X_i - \mu_0)^2 - \sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2}{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2 \Big/ (n-1)}$$

Returning to the LR statistic, we have

$$LR = n\log\left(\frac{\sum_{i=1}^{n}(X_i - \mu_0)^2}{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2}\right) = n\log\left(1 + \frac{\sum_{i=1}^{n}(X_i - \mu_0)^2 - \sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2}{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2}\right)$$

$$\approx n\,\frac{\sum_{i=1}^{n}(X_i - \mu_0)^2 - \sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2}{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2} = \frac{n}{n-1}\,F \approx F$$

Note that we have used the approximation $\log(1 + a) \approx a$, which is valid when a is small enough for higher powers to be neglected.
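A quick numerical illustration of the LR and F relationship (our addition, not from the guide; any sample will do):

```python
# Sketch: with sigma unknown, the LR statistic n*log(RSS_R/RSS_U) is
# approximately (n/(n-1)) * F when the discrepancy between the restricted
# and unrestricted fits is small.
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(0.2, 1.0, size=100)
mu0 = 0.0
n = len(x)
rss_u = np.sum((x - x.mean()) ** 2)        # unrestricted: mu estimated by xbar
rss_r = np.sum((x - mu0) ** 2)             # restricted: mu = mu0
lr = n * np.log(rss_r / rss_u)
f = (rss_r - rss_u) / (rss_u / (n - 1))
print(lr, n / (n - 1) * f)                 # close when the F statistic is small
```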
Wald tests

Wald tests are based on the same principle as t tests in that they evaluate whether the discrepancy between the maximum likelihood estimate $\hat{\theta}$ and the hypothetical value $\theta_0$ is significant, taking account of the variance in the estimate. The test statistic for the null hypothesis $H_0: \theta = \theta_0$ is

$$\frac{\left(\hat{\theta} - \theta_0\right)^2}{\widehat{\sigma_{\hat{\theta}}^2}}$$

where $\widehat{\sigma_{\hat{\theta}}^2}$ is the estimate of the variance of $\hat{\theta}$ evaluated at the maximum likelihood value. $\sigma_{\hat{\theta}}^2$ can be estimated in various ways that are asymptotically equivalent if the likelihood function has been specified correctly. A common estimator is that obtained as minus the inverse of the second differential of the log-likelihood function evaluated at the maximum likelihood estimate. Under the null hypothesis that the restriction is valid, the test statistic has a chi-squared distribution with one degree of freedom. When there are multiple restrictions, the test statistic becomes more complex and the number of degrees of freedom is equal to the number of restrictions.

Examples

We will use the same examples as for the LR test, first assuming that $\sigma = 1$ and then assuming that it has to be estimated along with $\mu$. In the first case the log-likelihood function is

$$\log L(\mu \mid X_1, ..., X_n) = -n\log\sqrt{2\pi} - \frac{1}{2}\sum_{i=1}^{n}(X_i - \mu)^2$$

The first differential is $\sum_{i=1}^{n}(X_i - \mu)$ and the second is $-n$, so the estimate of the variance is $1/n$. The Wald test statistic is therefore $n\left(\bar{X} - \mu_0\right)^2$.

In the second example, where $\sigma$ was unknown, the concentrated log-likelihood function is

$$\log L(\mu \mid X_1, ..., X_n) = -n\log\left(\left(\frac{1}{n}\sum_{i=1}^{n}(X_i - \mu)^2\right)^{1/2}\right) - \frac{n}{2}\log 2\pi - \frac{n}{2} = -\frac{n}{2}\log\left(\frac{1}{n}\sum_{i=1}^{n}(X_i - \mu)^2\right) - \frac{n}{2}\log 2\pi - \frac{n}{2}$$

The first derivative with respect to $\mu$ is

$$\frac{d \log L}{d\mu} = n\,\frac{\sum_{i=1}^{n}(X_i - \mu)}{\sum_{i=1}^{n}(X_i - \mu)^2}$$

The second derivative is

$$\frac{d^2 \log L}{d\mu^2} = n\,\frac{-n\sum_{i=1}^{n}(X_i - \mu)^2 + 2\left(\sum_{i=1}^{n}(X_i - \mu)\right)^2}{\left(\sum_{i=1}^{n}(X_i - \mu)^2\right)^2}$$

Evaluated at the ML estimator $\hat{\mu} = \bar{X}$, $\sum_{i=1}^{n}\left(X_i - \bar{X}\right) = 0$ and hence

$$\frac{d^2 \log L}{d\mu^2} = -\frac{n^2}{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2}$$

giving an estimated variance $\hat{\sigma}^2 / n$, given

$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2$$

Hence the Wald test statistic is

$$\frac{\left(\bar{X} - \mu_0\right)^2}{\hat{\sigma}^2 / n}$$

Under the null hypothesis, this is distributed as a chi-squared

statistic with one degree of freedom. When there is just one restriction, as in the present case, the Wald statistic is the square of the corresponding asymptotic t statistic (asymptotic because the variance has been estimated asymptotically). The chi-squared test and the t test are equivalent, given that, when there is one degree of freedom, the critical value of the chi-squared statistic for any significance level is the square of the critical value of the normal distribution.
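As a numerical sketch (our addition, not from the guide), the Wald statistic for this second example can be computed directly from a sample:

```python
# Wald statistic for H0: mu = mu0 when sigma is estimated (a sketch):
# (xbar - mu0)^2 divided by sigma_hat^2/n, with sigma_hat^2 = RSS/n.
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(0.2, 1.0, size=100)
mu0 = 0.0
n = len(x)
sigma2_hat = np.sum((x - x.mean()) ** 2) / n   # ML variance estimate
wald = (x.mean() - mu0) ** 2 / (sigma2_hat / n)
print(wald)   # compare with the chi-squared(1) critical value, 3.84 at 5 percent
```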
LR test of restrictions in a regression model

Given the regression model

$$Y_i = \beta_1 + \sum_{j=2}^{k}\beta_j X_{ij} + u_i$$

with u assumed to be i.i.d. $N(0, \sigma^2)$, the log-likelihood function for the parameters is

$$\log L(\beta_1, ..., \beta_k, \sigma \mid Y_i, X_i, i = 1, ..., n) = -n\log\left(\sigma\sqrt{2\pi}\right) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(Y_i - \beta_1 - \sum_{j=2}^{k}\beta_j X_{ij}\right)^2$$

This is a straightforward generalization of the expression for a simple regression derived in Section 10.6 in the text. Hence

$$\log L(\beta_1, ..., \beta_k, \sigma \mid Y_i, X_i, i = 1, ..., n) = -n\log\sigma - \frac{n}{2}\log 2\pi - \frac{1}{2\sigma^2}Z$$

where

$$Z = \sum_{i=1}^{n}\left(Y_i - \beta_1 - \sum_{j=2}^{k}\beta_j X_{ij}\right)^2$$

The estimates of the parameters affect only Z. To maximize the log-likelihood, they should be chosen so as to minimize Z, and of course this is exactly what one is doing when one fits a least squares regression. Hence Z = RSS. It remains to determine the ML estimate of $\sigma$. Taking the partial differential with respect to $\sigma$, we obtain one of the first-order conditions for a maximum:

$$\frac{\partial \log L(\beta_1, ..., \beta_k, \sigma)}{\partial \sigma} = -\frac{n}{\sigma} + \frac{1}{\sigma^3}RSS = 0$$

From this we obtain

$$\hat{\sigma}^2 = \frac{RSS}{n}$$

Hence the ML estimator is the sum of the squares of the residuals divided by n. This is different from the least squares estimator, which is the sum of the squares of the residuals divided by n − k, but the difference disappears as the sample size becomes large. Substituting for $\hat{\sigma}^2$ in the log-likelihood function, we obtain the concentrated log-likelihood function

$$\log L(\beta_1, ..., \beta_k \mid Y_i, X_i, i = 1, ..., n) = -n\log\left(\left(\frac{RSS}{n}\right)^{1/2}\right) - \frac{n}{2}\log 2\pi - \frac{n}{2\,RSS/n}RSS/n \cdot n = -\frac{n}{2}\left(\log RSS + 1 + \log 2\pi - \log n\right)$$

We will re-write this as

$$\log L_U = -\frac{n}{2}\left(\log RSS_U + 1 + \log 2\pi - \log n\right)$$

the subscript U emphasizing that this is the unrestricted log-likelihood. If we now impose a restriction on the parameters and maximize the log-likelihood function subject to the restriction, it will be

$$\log L_R = -\frac{n}{2}\left(\log RSS_R + 1 + \log 2\pi - \log n\right)$$

where $RSS_R \geq RSS_U$ and hence $\log L_R \leq \log L_U$. The LR statistic for a test of the restriction is therefore

$$2\left(\log L_U - \log L_R\right) = n\left(\log RSS_R - \log RSS_U\right) = n\log\left(\frac{RSS_R}{RSS_U}\right)$$

It is distributed as a chi-squared statistic with one degree of freedom under the null hypothesis that the restriction is valid. If there is more than one restriction, the test statistic is the same but the number of degrees of freedom under the null hypothesis that all the restrictions are valid is equal to the number of restrictions. An example of its use is the common factor test in Section 12.6 in the text. As with all maximum likelihood tests, it is valid only for large samples. Thus for testing linear restrictions we should prefer the F test approach because it is valid for finite samples.
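The statistic is easy to compute from two least squares fits. The sketch below (our addition; the data-generating process is arbitrary) tests the single restriction $\beta_2 = 0$:

```python
# Sketch: LR = n*log(RSS_R/RSS_U) from an unrestricted and a restricted
# least squares fit, for the single restriction beta2 = 0.
import numpy as np

rng = np.random.default_rng(4)
n = 200
x2 = rng.normal(size=n)
y = 1.0 + 0.3 * x2 + rng.normal(size=n)

X_u = np.column_stack([np.ones(n), x2])    # unrestricted regressors
X_r = np.ones((n, 1))                      # restricted: beta2 = 0
rss = lambda X: np.sum((y - X @ np.linalg.lstsq(X, y, rcond=None)[0]) ** 2)
lr = n * np.log(rss(X_r) / rss(X_u))
print(lr)   # compare with chi-squared(1): 3.84 at the 5 percent level
```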

Learning outcomes
After working through the corresponding chapter in the text, studying the corresponding slideshows, and doing the starred exercises in the text and the additional exercises in this guide, you should be able to:

- describe the linear probability model and explain its defects
- describe logit analysis, giving the mathematical specification
- describe probit analysis, including the mathematical specification
- calculate marginal effects in logit and probit analysis
- explain why OLS yields biased estimates when applied to a sample with censored observations, even when the censored observations are deleted
- explain the problem of sample selection bias and describe how the Heckman procedure may provide a solution to it (in general terms, without mathematical detail)
- explain the principle underlying maximum likelihood estimation
- apply maximum likelihood estimation from first principles in simple models

Additional exercises
A10.1 What factors affect the decision to make a purchase of your category of expenditure in the CES data set? Define a new variable CATBUY that is equal to 1 if the household makes any purchase of your category and 0 if it makes no purchase at all. Regress CATBUY on EXPPC, SIZE, REFAGE, and COLLEGE (as defined in Exercise A5.6) using: (1) the linear probability model, (2) the logit model, and (3) the probit model. Calculate the marginal effects at the mean of EXPPC, SIZE, REFAGE, and COLLEGE for the logit and probit models and compare them with the coefficients of the linear probability model.

A10.2 Logit analysis was used to relate the event of a respondent working (WORKING, defined to be 1 if the respondent was working, and 0 otherwise) to the respondent's educational attainment (S, defined as the highest grade completed) using 1994 data from the US National Longitudinal Survey of Youth. In this year the respondents were aged 29 to 36 and a substantial number of females had given up work to raise a family. The analysis was undertaken for females and males separately, with the output shown below (first females, then males, with iteration messages deleted):
. logit WORKING S if MALE==0

Logit Estimates                                   Number of obs =       2726
                                                  chi2(1)       =      70.42
                                                  Prob > chi2   =     0.0000
Log Likelihood = -1586.5519                       Pseudo R2     =     0.0217

------------------------------------------------------------------------------
 WORKING |      Coef.   Std. Err.       z     P>|z|      [95% Conf. Interval]
---------+--------------------------------------------------------------------
       S |   .1511872   .0186177      8.121   0.000      .1146971    .1876773
   _cons |  -1.049543   .2448064     -4.287   0.000     -1.529355   -.5697314
------------------------------------------------------------------------------

. logit WORKING S if MALE==1

Logit Estimates                                   Number of obs =       2573
                                                  chi2(1)       =      75.03
                                                  Prob > chi2   =     0.0000
Log Likelihood = -802.65424                       Pseudo R2     =     0.0446

------------------------------------------------------------------------------
 WORKING |      Coef.   Std. Err.       z     P>|z|      [95% Conf. Interval]
---------+--------------------------------------------------------------------
       S |   .2499295   .0306482      8.155   0.000      .1898601    .3099989
   _cons |  -.9670268   .3775658     -2.561   0.010     -1.707042   -.2270113
------------------------------------------------------------------------------

95 percent of the respondents had S in the range 9 to 18 years and the mean value of S was 13.3 and 13.2 years for females and males, respectively. From the logit analysis, the marginal effect of S on the probability of working at the mean was estimated to be 0.030 and 0.020 for females and males, respectively. Ordinary least squares regressions of WORKING on S yielded slope coefficients of 0.029 and 0.020 for females and males, respectively. As can be seen from the second figure below, the marginal effect of educational attainment was lower for males than for females over most of the range S ≥ 9. Discuss the plausibility of this finding.

As can also be seen from the second figure, the marginal effect of educational attainment decreases with educational attainment for both males and females over the range S ≥ 9. Discuss the plausibility of this finding. Compare the estimates of the marginal effect of educational attainment using logit analysis with those obtained using ordinary least squares.

[Figure: Probability of working, as a function of S, for males and females]

[Figure: Marginal effect of S on the probability of working, for males and females]
A10.3 A researcher interested in the relationship between parenting, age and schooling has data for the year 2000 for a sample of 1,167 married males and 870 married females aged 35 to 42 in the National Longitudinal Survey of Youth. In particular, she is interested in how the presence of young children in the household is related to the age and education of the respondent. She defines CHILDL6 to be 1 if there is a child less than 6 years old in the household and 0 otherwise and regresses it on AGE (age) and S (years of schooling), for males and females separately, using probit analysis. Defining the probability of having a child less than 6 in the household to be p = F(Z) where

$$Z = \beta_1 + \beta_2 AGE + \beta_3 S$$

she obtains the results shown in the table (asymptotic standard errors in parentheses).

                      males               females
AGE                  –0.137 (0.018)      –0.154 (0.023)
S                     0.132 (0.015)       0.094 (0.020)
constant              0.194 (0.358)       0.547 (0.492)
Z̄                     0.399               0.874
f(Z̄)                  0.368               0.272

For males and females separately, she calculates

$$\bar{Z} = b_1 + b_2\,\overline{AGE} + b_3\,\bar{S}$$

where $\overline{AGE}$ and $\bar{S}$ are the mean values of AGE and S and $b_1$, $b_2$, and $b_3$ are the probit coefficients in the corresponding regression, and she further calculates

$$f(\bar{Z}) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}\bar{Z}^2}$$

where $f(Z) = \dfrac{dF}{dZ}$. The values of $\bar{Z}$ and $f(\bar{Z})$ are shown in the table.

(a) Explain how one may derive the marginal effects of the explanatory variables on the probability of having a child less than 6 in the household, and calculate for both males and females the marginal effects at the means of AGE and S.

(b) Explain whether the signs of the marginal effects are plausible. Explain whether you would expect the marginal effect of schooling to be higher for males or for females.

(c) At a seminar someone asks the researcher whether the marginal effect of S is significantly different for males and females. The researcher does not know how to test whether the difference is significant and asks you for advice. What would you say?

A10.4 A researcher has 1994 data on years of schooling, S, and the marital status of 2,762 female respondents aged 29 to 36 in the US National Longitudinal Survey of Youth. She notices that there is a relationship between education and the probability of ever having married, but it is nonlinear, the proportion being relatively low for those who did not complete high school (S < 12) and for those with a college education. She defines a variable M to be 1 if the respondent had ever been married and 0 if always single, and she regresses M on S and SSQ, a variable defined as the square of S, using ordinary least squares, with the following result (standard errors in parentheses):
$$\hat{M} = 0.371 + 0.075\,S - 0.003\,SSQ$$
          (0.121)  (0.018)   (0.001)

She also performs a logit regression with the same specification (standard errors in parentheses):

$$\hat{Z} = -0.914 + 0.420\,S - 0.017\,SSQ$$
           (0.748)  (0.111)   (0.004)

The graph shows the proportion of ever-married women for each value of S and the relationships implicit in the OLS and logit regressions.

[Figure: Probability of being married, by years of schooling: actual proportion, OLS model, and logit model]

This model is a little different from the standard logit model in that Z is a quadratic function of S. Explain the implications of this for the properties of the model and the calculation of the marginal effect of S. The marginal effects for S = 8 (completion of grade school), S = 12 (completion of high school), and S = 16 (completion of four-year college) are shown in the table, together with that at S = 13.22 (sample mean).
S         Z       Marginal effect
8         1.37        0.0246
12        1.71        0.0024
13.22     1.71       –0.0029
16        1.52       –0.0169

A10.5 A random variable X can take the integer values 0, 1, 2, 3, … The probability of it being equal to a specific value x, p(x), is given by

$$p(x) = \frac{e^{-\lambda}\lambda^x}{x!}$$

where $\lambda$ is an unknown parameter and x! is the product of the terms x(x − 1)(x − 2)…(3)(2)(1). In a sample of three observations, the values of the random variable are 2, 5, and 2. Write down the likelihood function for $\lambda$ and derive the maximum likelihood estimate of $\lambda$.

A10.6 The probability density function of a random variable X is the exponential distribution

$$f(X) = \lambda e^{-\lambda X} \quad \text{for } X > 0$$

with unknown parameter $\lambda$. Derive the maximum likelihood estimator of $\lambda$, given a sample of n observations on X.

A10.7 A random variable X can be equal to any of the positive integers in the range 1 to n with equal probability. n is unknown. In a sample of 2 observations, the values of X were 2 and 3. Derive the maximum likelihood estimate of n.


A10.8 In Exercise 10.9 in the text, an event could occur with probability p. Given that the event occurred m times in a sample of n observations, the exercise required demonstrating that m/n was the ML estimator of p. Derive the LR statistic for the null hypothesis $p = p_0$. If m = 40 and n = 100, test the null hypothesis p = 0.5.

A10.9 For the variable in Exercise A10.8, derive the Wald statistic and test the null hypothesis p = 0.5.

Answers to the starred exercises in the text


10.2 A researcher, using a sample of 2,868 individuals from the NLSY, is investigating how the probability of a respondent obtaining a bachelor's degree from a four-year college is related to the respondent's score on ASVABC. 26.7 percent of the respondents earned bachelor's degrees. ASVABC ranged from 22 to 65, with mean value 50.2, and most scores were in the range 40 to 60. Defining a variable BACH to be equal to 1 if the respondent has a bachelor's degree (or higher degree) and 0 otherwise, the researcher fitted the OLS regression (standard errors in parentheses):

$$\widehat{BACH} = -0.864 + 0.023\,ASVABC \qquad R^2 = 0.21$$
               (0.042)  (0.001)

The researcher also fitted the following logit regression:

$$\hat{Z} = -11.103 + 0.189\,ASVABC$$
           (0.487)   (0.009)

where Z is the variable in the logit function. Using this regression, the probability and marginal effect functions are shown in the diagram.

(a) Give an interpretation of the OLS regression and explain why OLS is not a satisfactory estimation method for this kind of model.

(b) With reference to the diagram, discuss the variation of the marginal effect of the ASVABC score implicit in the logit regression and compare it with that in the OLS regression.

(c) Sketch the probability and marginal effect diagrams for the OLS regression and compare them with those for the logit regression. In your discussion, make use of the information in the first paragraph of this question.
[Figure: logit-fitted probability of earning a bachelor's degree (cumulative effect, left axis) and marginal effect of ASVABC (right axis)]

Answer:
(a) The slope coefficient of the linear probability model indicates that the probability of earning a bachelor's degree rises by 2.3 percent for every additional point on the ASVABC score. While this may be realistic for a range of values of ASVABC, it is not for very low ones. Very few of those with scores in the low end of the spectrum earned bachelor's degrees and variations in the ASVABC score would be unlikely to have an effect on the probability. The intercept literally indicates that an individual with a zero score would have a minus 86.4 percent probability of earning a bachelor's degree. Given the way that ASVABC was constructed, a score of zero was in fact impossible. However, the linear probability model predicts nonsense negative probabilities for all those with scores of 37 or less, of whom there were many in the sample. The linear probability model also suffers from the problem that the standard errors and t and F tests are invalid because the disturbance term does not have a normal distribution. Its distribution is not even continuous, consisting of only two possible values for each value of ASVABC.

(b) The question states that the mean value of ASVABC in the sample was 50.2. From the curve for the cumulative probability in the figure it can be seen that the probability of graduating from college for respondents with that score is only about 20 percent. The question states that most respondents had scores in the range 40 to 60. It can be seen that at the top of that range the probability has increased substantially, being about 60 percent. Looking at the curve for the marginal probability, it can be seen that the marginal effect is greatest in the range 50 to 65, and of course this is the range with the steepest slope of the cumulative probability. The question states that the highest score was 65, where the probability would be about 75 percent.

(c) The counterpart to the cumulative probability curve in the figure is a straight line using the regression result. At the ASVABC mean it predicts that there is a 29 percent chance of the respondent graduating from college, considerably more than the logit figure, but for a score of 65 it predicts a probability of only 63 percent. It is particularly unsatisfactory for low ASVABC scores. It predicts a minus 36 percent chance of graduating for respondents with the lowest score of 22, and negative probabilities for all scores lower than 38. The OLS counterpart to the marginal probability curve is a horizontal straight line at 0.023, showing that the marginal effect is underestimated for ASVABC scores above 50 and overestimated below that figure.
[Figure: OLS counterparts: fitted probability line (cumulative effect, left axis) and constant marginal effect (right axis), plotted against ASVABC]

10.4

The following probit regression, with iteration messages deleted, was fitted using 2,726 observations on females in the National Longitudinal Survey of Youth using the LFP data set described in Appendix B. The data are for 1994, when the respondents were aged 29 to 36 and many of them were raising young families.
. probit WORKING S AGE CHILDL06 CHILDL16 MARRIED ETHBLACK ETHHISP if MALE==0

Iteration 0:  log likelihood = -1485.6248
Iteration 1:  log likelihood = -1403.9344
Iteration 2:  log likelihood = -1403.0839
Iteration 3:  log likelihood = -1403.0835

Probit estimates                                  Number of obs =       2726
                                                  LR chi2(7)    =     165.08
                                                  Prob > chi2   =     0.0000
Log likelihood = -1403.0835                       Pseudo R2     =     0.0556

-----------------------------------------------------------------------------WORKING | Coef. Std. Err. z P>|z| [95% Conf. Interval] ---------+-------------------------------------------------------------------S | .0892571 .0120629 7.399 0.000 .0656143 .1129 AGE | -.0438511 .012478 -3.514 0.000 -.0683076 -.0193946 CHILDL06 | -.5841503 .0744923 -7.842 0.000 -.7301525 -.4381482 CHILDL16 | -.1359097 .0792359 -1.715 0.086 -.2912092 .0193897 MARRIED | -.0076543 .0631618 -0.121 0.904 -.1314492 .1161407 ETHBLACK | -.2780887 .081101 -3.429 0.001 -.4370436 -.1191337 ETHHISP | -.0191608 .1055466 -0.182 0.856 -.2260284 .1877068 _cons | .673472 .2712267 2.483 0.013 .1418775 1.205066 ------------------------------------------------------------------------------

WORKING is a binary variable equal to 1 if the respondent was working in 1994, 0 otherwise. CHILDL06 is a dummy variable equal to 1 if there was a child aged less than 6 in the household, 0 otherwise. CHILDL16 is a dummy variable equal to 1 if there was a child aged less than 16, but no child less than 6, in the household, 0 otherwise. MARRIED is equal to 1 if the respondent was married with spouse present, 0 otherwise. The remaining variables are as described in EAEF Regression Exercises. The mean values of the variables are given in the output below:
. sum WORKING S AGE CHILDL06 CHILDL16 MARRIED ETHBLACK ETHHISP if MALE==0 Variable | Obs Mean Std. Dev. Min Max ---------+----------------------------------------------------WORKING | 2726 .7652238 .4239366 0 1 S | 2726 13.30998 2.444771 0 20 AGE | 2726 17.64637 2.24083 14 22 CHILDL06 | 2726 .3991196 .4898073 0 1 CHILDL16 | 2726 .3180484 .4658038 0 1 MARRIED | 2726 .6228907 .4847516 0 1 ETHBLACK | 2726 .1305943 .3370179 0 1 ETHHISP | 2726 .0722671 .2589771 0 1

Calculate the marginal effects and discuss whether they are plausible. [The data set and a description are posted on the website.]

Answer: The marginal effects are calculated in the table below. As might be expected, having a child aged less than 6 has a large adverse effect, very highly significant. Schooling also has a very significant effect, more educated mothers making use of their investment by tending to stay in the labor force. Age has a significant negative effect, the reason for which is not obvious (the respondents were aged 29 to 36 in 1994). Being black also has an adverse effect, the reason for which is likewise not obvious. (The WORKING variable is defined to be 1 if the individual has recorded hourly earnings of at least $3. If the definition is tightened to include also the requirement that the employment status is employed, the latter effect is smaller, but still significant at the 5 percent level.)


Variable      Mean         b        Mean × b     f(Z̄)       b f(Z̄)
S           13.3100     0.0893      1.1886      0.2969      0.0265
AGE         17.6464    –0.0439     –0.7747      0.2969     –0.0130
CHILDL06     0.3991    –0.5842     –0.2332      0.2969     –0.1735
CHILDL16     0.3180    –0.1359     –0.0432      0.2969     –0.0404
MARRIED      0.6229    –0.0077     –0.0048      0.2969     –0.0023
ETHBLACK     0.1306    –0.2781     –0.0363      0.2969     –0.0826
ETHHISP      0.0723    –0.0192     –0.0014      0.2969     –0.0057
constant     1.0000     0.6735      0.6735
Total                               0.7685
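The arithmetic of the table can be reproduced directly from the probit coefficients and variable means (a sketch, our addition):

```python
# Reproducing the marginal-effects table: Zbar is the sum of coefficient
# times mean (plus the constant), f(Zbar) is the standard normal density,
# and each marginal effect is b * f(Zbar).
import numpy as np

means = {"S": 13.3100, "AGE": 17.6464, "CHILDL06": 0.3991, "CHILDL16": 0.3180,
         "MARRIED": 0.6229, "ETHBLACK": 0.1306, "ETHHISP": 0.0723}
b = {"S": 0.0893, "AGE": -0.0439, "CHILDL06": -0.5842, "CHILDL16": -0.1359,
     "MARRIED": -0.0077, "ETHBLACK": -0.2781, "ETHHISP": -0.0192}
const = 0.6735

zbar = const + sum(b[k] * means[k] for k in b)         # 0.7685
f = np.exp(-0.5 * zbar ** 2) / np.sqrt(2 * np.pi)      # 0.2969
for k in b:
    print(f"{k:9s} {b[k] * f:8.4f}")                   # matches the b f(Zbar) column
```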

10.5

Using the CES data set, perform a tobit regression of expenditure on your commodity on total household expenditure per capita and household size, and compare the slope coefficients with those obtained in OLS regressions including and excluding observations with 0 expenditure on your commodity.

Answer: The table gives the number of unconstrained observations for each category of expenditure and the slope coefficients and standard errors from an OLS regression using the unconstrained observations only, the OLS regression using all the observations, and the tobit regression. As may be expected, the discrepancies between the tobit estimates and the OLS estimates are greatest for those categories with the largest numbers of constrained observations. In the case of categories such as FDHO, SHEL, TELE, and CLOT, there is very little difference between the tobit and the OLS estimates.


Comparison of OLS and tobit regressions
(standard errors in parentheses)

              OLS, all cases          OLS, no 0 cases         tobit
        n     EXPPC      SIZE         EXPPC      SIZE         EXPPC      SIZE
FDHO   868   0.0317    132.221       0.0317    133.775       0.0317    132.054
            (0.0027)  (15.230)      (0.0027)  (15.181)      (0.0027)  (15.220)
FDAW   827   0.0488      5.342       0.0476      2.842       0.0515     16.887
            (0.0025)  (14.279)      (0.0027)  (15.084)      (0.0026)  (14.862)
HOUS   867   0.2020    113.011       0.2017    113.677       0.2024    112.636
            (0.0075)  (42.240)      (0.0075)  (42.383)      (0.0075)  (42.256)
TELE   858   0.0147     40.958       0.0145     43.073       0.0149     40.215
            (0.0014)   (7.795)      (0.0014)   (7.833)      (0.0014)   (7.862)
DOM    454   0.0178     16.917       0.0243      1.325       0.0344     71.555
            (0.0034)  (19.250)      (0.0060)  (35.584)      (0.0055)  (31.739)
TEXT   482   0.0078      7.896       0.0115      5.007       0.0121     28.819
            (0.0006)   (3.630)      (0.0011)   (6.431)      (0.0010)   (5.744)
FURN   329   0.0135      5.030       0.0198     43.117       0.0294     62.492
            (0.0015)   (8.535)      (0.0033)  (21.227)      (0.0033)  (19.530)
MAPP   244   0.0066      3.668       0.0124     25.961       0.0165     46.113
            (0.0008)   (4.649)      (0.0022)  (13.976)      (0.0024)  (14.248)
SAPP   467   0.0015      1.195       0.0017      7.757       0.0028      4.247
            (0.0002)   (1.197)      (0.0004)   (2.009)      (0.0004)   (2.039)
CLOT   847   0.0421     25.575       0.0414     21.831       0.0433     30.945
            (0.0021)  (11.708)      (0.0021)  (12.061)      (0.0021)  (11.946)
FOOT   686   0.0035      0.612       0.0034      3.875       0.0041      3.768
            (0.0003)   (1.601)      (0.0003)   (1.893)      (0.0003)   (1.977)
GASO   797   0.0212     16.361       0.0183     42.594       0.0229      7.883
            (0.0015)   (8.368)      (0.0015)   (8.732)      (0.0016)   (9.096)
TRIP   309   0.0210     15.862       0.0263     13.063       0.0516     92.173
            (0.0018)  (10.239)      (0.0044)  (27.150)      (0.0042)  (24.352)
LOCT   172   0.0007      6.073       0.0005     23.839       0.0039      9.797
            (0.0004)   (2.484)      (0.0018)   (9.161)      (0.0019)   (9.904)
HEAL   821   0.0205    162.500       0.0181    178.197       0.0220    160.342
            (0.0036)  (20.355)      (0.0036)  (20.804)      (0.0037)  (21.123)
ENT    824   0.0754     58.943       0.0743     48.403       0.0806     87.513
            (0.0044)  (24.522)      (0.0046)  (26.222)      (0.0045)  (25.611)
FEES   676   0.0329     33.642       0.0337     23.969       0.0452     91.199
            (0.0025)  (14.295)      (0.0032)  (19.334)      (0.0031)  (17.532)
TOYS   592   0.0081      9.680       0.0095      5.894       0.0117     35.529
            (0.0008)   (4.599)      (0.0011)   (6.205)      (0.0011)   (6.381)
READ   764   0.0054      8.202       0.0050     12.491       0.0061      6.743
            (0.0004)   (1.998)      (0.0004)   (2.212)      (0.0004)   (2.233)
EDUC   288   0.0114     17.678       0.0235    108.177       0.0396    329.243
            (0.0029)  (16.152)      (0.0088)  (47.449)      (0.0072)  (42.401)
TOB    368   0.0013     17.895       0.0057     48.865       0.0007     13.939
            (0.0009)   (4.903)      (0.0016)   (8.011)      (0.0019)  (10.736)

10.6

Does sample selection bias affect the OLS estimate of the return to college education? Using your EAEF data set, investigate whether there is evidence that selection bias affects the least squares estimate of the returns to college education. Define COLLYEAR = S − 12 if S > 12, 0 otherwise, and LGEARNCL = LGEARN if COLLYEAR > 0, missing otherwise. Use the heckman procedure to regress LGEARNCL on COLLYEAR, EXP, ASVABC, MALE, ETHBLACK, and ETHHISP, with ASVABC, SM, SF, and SIBLINGS being used to determine whether the respondent attended college. Run the equivalent regression using least squares. Comment on your findings.

Answer: There appears to be a highly significant negative association between the random component of the selection relationship and that of the wage equation. The ethnicity coefficients are particularly affected. As is often the case with sample selection bias, there is no obvious simple reason for these effects.
. g COLLYEAR = 0
. replace COLLYEAR = S-12 if S>12
(286 real changes made)
. g LGEARNCL = LGEARN if COLLYEAR>0
(254 missing values generated)
. heckman LGEARNCL COLLYEAR EXP ASVABC MALE ETHBLACK ETHHISP, select(ASVABC MAL
> E ETHBLACK ETHHISP SM SF SIBLINGS)

Iteration 0:  log likelihood = -510.46251
Iteration 1:  log likelihood = -509.65904
Iteration 2:  log likelihood = -509.19041
Iteration 3:  log likelihood = -509.18587
Iteration 4:  log likelihood = -509.18587

Heckman selection model                           Number of obs   =       540
(regression model with sample selection)          Censored obs    =       254
                                                  Uncensored obs  =       286
                                                  Wald chi2(6)    =     95.83
Log likelihood = -509.1859                        Prob > chi2     =    0.0000

-----------------------------------------------------------------------------| Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------LGEARNCL | COLLYEAR | .126778 .0196862 6.44 0.000 .0881937 .1653623 EXP | .0390787 .008101 4.82 0.000 .023201 .0549565 ASVABC | -.0136364 .0069683 -1.96 0.050 -.027294 .0000211 MALE | .4363839 .0738408 5.91 0.000 .2916586 .5811092 ETHBLACK | -.1948981 .1436681 -1.36 0.175 -.4764825 .0866862 ETHHISP | -.2089203 .159384 -1.31 0.190 -.5213072 .1034667 _cons | 2.7604 .4290092 6.43 0.000 1.919557 3.601242 -------------+---------------------------------------------------------------select | ASVABC | .070927 .008141 8.71 0.000 .054971 .086883 MALE | -.3814199 .1228135 -3.11 0.002 -.6221298 -.1407099 ETHBLACK | .433228 .2184279 1.98 0.047 .0051172 .8613388 ETHHISP | 1.198633 .299503 4.00 0.000 .6116179 1.785648 SM | .0342841 .0302181 1.13 0.257 -.0249424 .0935106 SF | .0816985 .021064 3.88 0.000 .0404138 .1229832 SIBLINGS | -.0376608 .0296495 -1.27 0.204 -.0957729 .0204512 _cons | -4.716724 .5139176 -9.18 0.000 -5.723984 -3.709464 -------------+---------------------------------------------------------------/athrho | -.9519231 .2430548 -3.92 0.000 -1.428302 -.4755444 /lnsigma | -.4828234 .0727331 -6.64 0.000 -.6253776 -.3402692 -------------+---------------------------------------------------------------rho | -.7406524 .1097232 -.8913181 -.4426682 sigma | .6170388 .0448791 .5350593 .7115788 lambda | -.4570113 .0967091 -.6465576 -.267465 -----------------------------------------------------------------------------LR test of indep. eqns. (rho = 0): chi2(1) = 7.63 Prob > chi2 = 0.0058 ------------------------------------------------------------------------------

17

. reg LGEARNCL COLLYEAR EXP ASVABC MALE ETHBLACK ETHHISP

      Source |       SS       df       MS              Number of obs =     286
-------------+------------------------------           F(  6,   279) =   20.97
       Model |   36.62667     6    6.104445            Prob > F      =  0.0000
    Residual | 81.2192941   279  .291108581            R-squared     =  0.3108
-------------+------------------------------           Adj R-squared =  0.2960
       Total | 117.845964   285  .413494611            Root MSE      =  .53954

-----------------------------------------------------------------------------LGEARNCL | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------COLLYEAR | .1380715 .0201347 6.86 0.000 .0984362 .1777068 EXP | .039627 .0085445 4.64 0.000 .022807 .0564469 ASVABC | .0063027 .0052975 1.19 0.235 -.0041254 .0167309 MALE | .3497084 .0673316 5.19 0.000 .217166 .4822509 ETHBLACK | -.0683754 .1354179 -0.50 0.614 -.334946 .1981952 ETHHISP | -.0410075 .1441328 -0.28 0.776 -.3247333 .2427183 _cons | 1.369946 .2884302 4.75 0.000 .8021698 1.937721 ------------------------------------------------------------------------------

10.7

Show that the tobit model may be regarded as a special case of a selection bias model.

Answer: The selection bias model may be written

$$B_i^* = \delta_1 + \sum_{j=2}^{m}\delta_j Q_{ji} + \varepsilon_i$$

$$Y_i^* = \beta_1 + \sum_{j=2}^{k}\beta_j X_{ji} + u_i$$

$$Y_i = Y_i^* \quad \text{for } B_i^* > 0$$

$$Y_i \text{ is not observed} \quad \text{for } B_i^* \leq 0$$

where the Q variables determine selection. The tobit model is the special case where the Q variables are identical to the X variables and B* is the same as Y*.

10.9 An event is hypothesized to occur with probability p. In a sample of n observations, it occurred m times. Demonstrate that the maximum likelihood estimator of p is m/n.

Answer: In each observation where the event did occur, the probability was p. In each observation where it did not occur, the probability was (1 − p). Since there were m of the former and n − m of the latter, the joint probability was $p^m(1 - p)^{n-m}$. Reinterpreting this as a function of p, given m and n, the log-likelihood function for p is

$$\log L(p) = m\log p + (n - m)\log(1 - p)$$

Differentiating with respect to p, we obtain the first-order condition for a maximum:

$$\frac{d\log L(p)}{dp} = \frac{m}{p} - \frac{n - m}{1 - p} = 0$$

This yields p = m/n. We should check that the second differential is negative and that we have therefore found a maximum. The second differential is

$$\frac{d^2\log L(p)}{dp^2} = -\frac{m}{p^2} - \frac{n - m}{(1 - p)^2}$$


Evaluated at p = m/n,

$$\frac{d^2\log L(p)}{dp^2} = -\frac{n^2}{m} - \frac{n^2}{n - m} = -n^2\left(\frac{1}{m} + \frac{1}{n - m}\right)$$

This is negative, so we have indeed chosen the value of p that maximizes the probability of the outcome.

10.10 In Exercise 10.4, log L0 is −1485.62. Compute the pseudo-R2 and confirm that it is equal to that reported in the output.

Answer: As defined in equation (10.43),

$$\text{pseudo-}R^2 = 1 - \frac{\log L}{\log L_0} = 1 - \frac{-1403.0835}{-1485.6248} = 0.0556$$

as appears in the output.

10.11 In Exercise 10.4, compute the likelihood ratio statistic 2(log L − log L0), confirm that it is equal to that reported in the output, and perform the likelihood ratio test.

Answer: The likelihood ratio statistic is 2(−1403.0835 + 1485.6248) = 165.08, as printed in the output. Under the null hypothesis that the coefficients of the explanatory variables are all jointly equal to 0, this is distributed as a chi-squared statistic with degrees of freedom equal to the number of explanatory variables, in this case 7. The critical value of chi-squared at the 0.1 percent significance level with 7 degrees of freedom is 24.32, and so we reject the null hypothesis at that level.

Answers to the additional exercises


A10.1 In the case of FDHO and HOUS there were too few non-purchasing households to undertake the analysis sensibly (one and two, respectively). The results for the logit analysis and the probit analysis were very similar. The linear probability model also yielded similar results for most of the commodities, the coefficients being similar to the logit and probit marginal effects and the t statistics being of the same order of magnitude as the z statistics for the logit and probit. However, for those categories of expenditure where most households made purchases, and the sample was therefore greatly imbalanced, the linear probability model gave very different results, as might be expected. The total expenditure of the household and the size of the household were both highly significant factors in the decision to make a purchase for all the categories of expenditure except TELE, LOCT, and TOB. In the case of TELE, only 11 households did not make a purchase, the reasons apparently being non-economic. LOCT is on the verge of being an inferior good and for that reason is not sensitive to total expenditure. TOB is well known not to be sensitive to total expenditure. Age was a positive influence in the case of TRIP, HEAL, and READ and a negative one for FDAW, FURN, FOOT, TOYS, EDUC, and TOB. A college education was a positive influence for TRIP, HEAL, READ, and EDUC and a negative one for TOB. Most of these effects seem plausible with simple explanations.


Linear probability model, dependent variable CATBUY

              EXPPC × 10⁴      SIZE            REFAGE          COLLEGE         cases with
        n     b2       t      b3       t      b4       t      b5       t      p̂<0   p̂>1
FDHO   868   0.0002   0.16   0.0005   0.52   0.0001   0.90   0.0017   0.68      0   288
FDAW   827   0.0518   5.63   0.0181   3.37   0.0018   3.98   0.0101   0.68      0   173
HOUS   867   0.0025   1.19   0.0000   0.02   0.0000   0.43   0.0029   0.83      0   181
TELE   858   0.0092   1.85   0.0060   2.06   0.0004   1.66   0.0123   1.53      0   136
DOM    454   0.0926   4.22   0.0433   3.39   0.0019   1.84   0.0850   2.40      0     0
TEXT   482   0.1179   5.51   0.0690   5.52   0.0019   1.80   0.0227   0.66      0     5
FURN   329   0.1202   5.75   0.0419   3.43   0.0036   3.61   0.0050   0.15      0     0
MAPP   244   0.0930   4.71   0.0540   4.69   0.0012   1.25   0.0049   0.15      0     0
SAPP   467   0.1206   5.59   0.0655   5.20   0.0012   1.18   0.0174   0.50      0     4
CLOT   847   0.0316   4.60   0.0121   3.02   0.0008   2.30   0.0028   0.25      0   176
FOOT   686   0.0838   4.75   0.0444   4.31   0.0028   3.29   0.0283   0.99      0    12
GASO   797   0.0658   5.56   0.0374   5.42   0.0013   2.25   0.0222   1.16      0   119
TRIP   309   0.2073  10.65   0.0599   5.27   0.0027   2.89   0.1608   5.11      0     5
LOCT   172   0.0411   2.32   0.0040   0.39   0.0011   1.29   0.0109   0.38      1     0
HEAL   821   0.0375   3.79   0.0162   2.81   0.0030   6.39   0.0466   2.91      0   137
ENT    824   0.0495   5.26   0.0255   4.64   0.0017   3.75   0.0350   2.30      0   207
FEES   676   0.1348   8.20   0.0615   6.41   0.0029   3.61   0.1901   7.15      0   121
TOYS   592   0.0908   4.78   0.0854   7.70   0.0055   5.96   0.0549   1.79      0    32
READ   764   0.0922   6.64   0.0347   4.28   0.0018   2.67   0.1006   4.48      0   105
EDUC   288   0.0523   2.82   0.1137  10.51   0.0041   4.61   0.1310   4.37     57     2
TOB    368   0.0036   0.17   0.0153   1.21   0.0033   3.12   0.1721   4.92      0     0


Logit model, dependent variable CATBUY (FDHO and HOUS omitted: too few non-purchasers)

              EXPPC × 10⁴      SIZE            REFAGE          COLLEGE
        n     b2       z      b3       z      b4       z      b5       z
FDAW   827   3.6456   5.63   0.6211   3.43   0.0338   2.90   0.0141   0.03
TELE   858   1.2314   1.83   0.6355   2.10   0.0330   1.83   1.1932   1.46
DOM    454   0.3983   4.09   0.1817   3.37   0.0081   1.85   0.3447   2.34
TEXT   482   0.5406   5.22   0.3071   5.32   0.0075   1.68   0.0823   0.55
FURN   329   0.5428   5.44   0.1904   3.46   0.0173   3.68   0.0227   0.15
MAPP   244   0.4491   4.54   0.2648   4.57   0.0059   1.17   0.0300   0.18
SAPP   467   0.5439   5.30   0.2855   5.05   0.0049   1.11   0.0597   0.40
CLOT   847   4.7446   4.68   0.8642   3.16   0.0213   1.38   0.1084   0.19
FOOT   686   0.6281   4.49   0.3162   4.18   0.0152   2.86   0.2277   1.22
GASO   797   1.5214   5.18   0.7604   5.20   0.0084   1.07   0.2414   0.79
TRIP   309   1.0768   9.02   0.3137   5.22   0.0143   2.80   0.7728   4.74
LOCT   172   0.2953   2.31   0.0294   0.46   0.0069   1.28   0.0788   0.43
HEAL   821   1.1577   3.49   0.3510   2.83   0.0620   5.65   0.9372   2.64
ENT    824   2.6092   4.96   0.9863   4.45   0.0209   1.89   1.0246   2.01
FEES   676   1.5529   7.55   0.5275   6.10   0.0140   2.43   1.4393   6.24
TOYS   592   0.5087   4.38   0.5351   7.02   0.0240   4.85   0.2645   1.54
READ   764   1.8601   6.59   0.4632   4.78   0.0202   2.99   1.1033   3.97
EDUC   288   0.3311   3.21   0.6053   9.17   0.0283   5.05   0.7442   4.34
TOB    368   0.0163   0.18   0.0637   1.19   0.0139   3.09   0.7260   4.82


Probit model, dependent variable CATBUY (FDHO and HOUS omitted: too few non-purchasers)

              EXPPC × 10⁴      SIZE            REFAGE          COLLEGE
        n     b2       z      b3       z      b4       z      b5       z
FDAW   827   1.6988   5.72   0.2951   3.55   0.0172   3.03   0.0182   0.09
TELE   858   0.5129   1.93   0.2630   2.14   0.0135   1.79   0.5130   1.59
DOM    454   0.2467   4.16   0.1135   3.42   0.0051   1.86   0.2160   2.36
TEXT   482   0.3257   5.32   0.1841   5.44   0.0046   1.69   0.0606   0.65
FURN   329   0.3348   5.54   0.1168   3.46   0.0103   3.64   0.0145   0.15
MAPP   244   0.2770   4.62   0.1628   4.65   0.0035   1.19   0.0174   0.18
SAPP   467   0.3252   5.43   0.1733   5.12   0.0031   1.11   0.0423   0.46
CLOT   847   2.0167   4.63   0.4036   3.31   0.0086   1.21   0.0428   0.16
FOOT   686   0.3296   4.48   0.1635   4.17   0.0088   2.89   0.1105   1.04
GASO   797   0.6842   5.25   0.2998   5.08   0.0065   1.62   0.1452   0.96
TRIP   309   0.6121   9.63   0.1791   5.05   0.0082   2.73   0.4806   4.98
LOCT   172   0.1556   2.31   0.0141   0.39   0.0039   1.28   0.0448   0.43
HEAL   821   0.4869   3.65   0.1506   2.69   0.0301   5.67   0.4195   2.54
ENT    824   1.3386   5.10   0.4519   4.53   0.0116   2.10   0.4932   2.09
FEES   676   0.8299   7.82   0.2806   6.36   0.0088   2.66   0.8151   6.59
TOYS   592   0.2849   4.48   0.3091   7.35   0.0149   5.08   0.1694   1.67
READ   764   0.7905   6.67   0.2188   4.58   0.0107   2.92   0.5887   4.20
EDUC   288   0.1917   3.11   0.3535   9.51   0.0168   5.12   0.4417   4.37
TOB    368   0.0106   0.18   0.0391   1.18   0.0086   3.10   0.4477   4.84


Comparison of marginal effects

         n            EXPPC × 10⁴   SIZE        REFAGE      COLLEGE
FDAW    827  LPM      0.0518**     0.0181**    0.0018**    0.0101
             Logit    0.0240**     0.0041**    0.0002**    0.0001
             Probit   0.0260**     0.0045**    0.0003**    0.0003
TELE    858  LPM      0.0092       0.0060*     0.0004      0.0123
             Logit    0.0078       0.0040*     0.0002      0.0076
             Probit   0.0087       0.0044*     0.0002      0.0087
DOM     454  LPM      0.0926**     0.0433**    0.0019      0.0850*
             Logit    0.0993**     0.0453**    0.0020      0.0860*
             Probit   0.0982**     0.0452**    0.0020      0.0860*
TEXT    482  LPM      0.1179**     0.0690**    0.0019      0.0227
             Logit    0.1332**     0.0757**    0.0018      0.0203
             Probit   0.1286**     0.0727**    0.0018      0.0239
FURN    329  LPM      0.1202**     0.0419**    0.0036**    0.0050
             Logit    0.1265**     0.0444**    0.0040**    0.0053
             Probit   0.1266**     0.0441**    0.0039**    0.0055
MAPP    244  LPM      0.0930**     0.0540**    0.0012      0.0049
             Logit    0.0893**     0.0526**    0.0012      0.0060
             Probit   0.0923**     0.0543**    0.0012      0.0058
SAPP    467  LPM      0.1206**     0.0655**    0.0012      0.0174
             Logit    0.1350**     0.0709**    0.0012      0.0148
             Probit   0.1291**     0.0688**    0.0012      0.0168
CLOT    847  LPM      0.0316**     0.0121**    0.0008*     0.0028
             Logit    0.0071**     0.0013**    0.0000      0.0002
             Probit   0.0063**     0.0013**    0.0000      0.0001
FOOT    686  LPM      0.0838**     0.0444**    0.0028**    0.0283
             Logit    0.0969**     0.0488**    0.0023**    0.0351
             Probit   0.0913**     0.0453**    0.0024**    0.0306

* significant at 5 percent level, ** at 1 percent level, two-tailed tests


Comparison of marginal effects (continued)

         n            EXPPC × 10⁴   SIZE        REFAGE      COLLEGE
GASO    797  LPM      0.0658**     0.0374**    0.0013*     0.0222
             Logit    0.0622**     0.0311**    0.0003      0.0099
             Probit   0.0707**     0.0310**    0.0007      0.0150
TRIP    309  LPM      0.2073**     0.0599**    0.0027**    0.1608**
             Logit    0.2408**     0.0702**    0.0032**    0.1728**
             Probit   0.2243**     0.0656**    0.0030**    0.1761**
LOCT    172  LPM      0.0411*      0.0040      0.0011      0.0109
             Logit    0.0463*      0.0046      0.0011      0.0124
             Probit   0.0430*      0.0039      0.0011      0.0124
HEAL    821  LPM      0.0375**     0.0162**    0.0030**    0.0466**
             Logit    0.0318**     0.0096**    0.0017**    0.0257**
             Probit   0.0339**     0.0105**    0.0021**    0.0292*
ENT     824  LPM      0.0495**     0.0255**    0.0017**    0.0350*
             Logit    0.0229**     0.0086**    0.0002      0.0090*
             Probit   0.0251**     0.0085**    0.0002*     0.0092*
FEES    676  LPM      0.1348**     0.0615**    0.0029**    0.1901**
             Logit    0.1765**     0.0600**    0.0016*     0.1636**
             Probit   0.1878**     0.0635**    0.0020**    0.1845**
TOYS    592  LPM      0.0908**     0.0854**    0.0055**    0.0549
             Logit    0.1029**     0.1083**    0.0049**    0.0535
             Probit   0.0974**     0.1057**    0.0051**    0.0579
READ    764  LPM      0.0922**     0.0347**    0.0018**    0.1006**
             Logit    0.1084**     0.0270**    0.0012**    0.0643**
             Probit   0.1124**     0.0311**    0.0015**    0.0837**
EDUC    288  LPM      0.0523**     0.1137**    0.0041**    0.1310**
             Logit    0.0673**     0.1230**    0.0058**    0.1512**
             Probit   0.0654**     0.1206**    0.0057**    0.1508**
TOB     368  LPM      0.0036       0.0153      0.0033**    0.1721**
             Logit    0.0040       0.0155      0.0034**    0.1769**
             Probit   0.0042       0.0153      0.0034**    0.1751**

* significant at 5 percent level, ** at 1 percent level, two-tailed tests

A10.2 The finding that the marginal effect of educational attainment was lower for males than for females over most of the range S ≥ 9 is plausible because the probability of working is much closer to 1 for males than for females for S ≥ 9, and hence the possible sensitivity of the participation rate to S is smaller. The explanation of the finding that the marginal effect of educational attainment decreases with educational attainment for both males and females over the range S ≥ 9 is similar. For both sexes, the greater is S, the greater is the participation rate, and hence the smaller is the scope for it being increased by further education. The OLS estimates of the marginal effect of educational attainment are given by the slope coefficients and they are very similar to the logit estimates at the mean, the reason being that most of the observations on S are confined to the middle part of the sigmoid curve where it is relatively linear.

A10.3 (a) Since p is a function of Z, and Z is a linear function of the X variables, the marginal effect of $X_j$ is

$$\frac{\partial p}{\partial X_j} = \frac{dp}{dZ}\,\frac{\partial Z}{\partial X_j} = \beta_j\,\frac{dp}{dZ}$$


where $\beta_j$ is the coefficient of $X_j$ in the expression for Z. In the case of probit analysis, p = F(Z) is the cumulative standardized normal distribution, so dp/dZ is just the standardized normal density. For males, this is 0.368 when evaluated at the means. Hence the marginal effect of AGE is 0.368 × (−0.137) = −0.050 and that of S is 0.368 × 0.132 = 0.049. For females the corresponding figures are 0.272 × (−0.154) = −0.042 and 0.272 × 0.094 = 0.026, respectively. So for every extra year of age, the probability is reduced by 5.0 percent for males and 4.2 percent for females. For every extra year of schooling, the probability increases by 4.9 percent for males and 2.6 percent for females.

(b) Yes. Given that the cohort is aged 35 to 42, the respondents have passed the age at which most adults start families, and the older they are, the less likely they are to have small children in the household. At the same time, the more educated the respondent, the more likely he or she is to have started having a family relatively late, so the positive effect of schooling is also plausible. However, given the age of the cohort, it is likely to be weaker for females than for males, given that most females intending to have families will have started them by this time, irrespective of their education.

(c) Fit a probit regression for the combined sample, adding a male intercept dummy and male slope dummies for AGE and S. Test the coefficient of the slope dummy for S.
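The part (a) arithmetic can be reproduced in a few lines (our sketch, not from the guide; note that the density f is symmetric, so the sign of $\bar{Z}$ does not affect it):

```python
# Probit marginal effects at the means: coefficient times the standard
# normal density evaluated at Zbar.
import numpy as np

f = lambda z: np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)   # symmetric in z
for sex, (b_age, b_s, zbar) in {"males": (-0.137, 0.132, 0.399),
                                "females": (-0.154, 0.094, 0.874)}.items():
    print(sex, round(b_age * f(zbar), 3), round(b_s * f(zbar), 3))
# males: -0.050 and 0.049; females: -0.042 and 0.026, as in the answer
```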

A10.4 For low values of S, Z is increasing, so one is travelling up the sigmoid function F(Z) and the probability is increasing, but, because Z is a quadratic function of S, it reaches a maximum and further increases in S cause Z to fall and one starts travelling back down the sigmoid function. Note that the marginal effect of S has two components that must be added: the component due to S and the component due to SSQ.

A10.5 The joint density of the three observations is

$$\frac{e^{-\lambda}\lambda^2}{2!} \cdot \frac{e^{-\lambda}\lambda^5}{5!} \cdot \frac{e^{-\lambda}\lambda^2}{2!}$$

Reinterpreting this as a function of $\lambda$, given the data, this is the likelihood function for $\lambda$. The log-likelihood is

$$\log L(\lambda) = \log\left(\frac{e^{-3\lambda}\lambda^9}{480}\right) = -3\lambda + 9\log\lambda - \log 480$$

The first-order condition for a turning point is

$$\frac{d\log L}{d\lambda} = -3 + \frac{9}{\lambda} = 0$$

and so $\hat{\lambda} = 3$. This is a maximum because the second differential, $-9/\lambda^2$, is negative.

Note: In this example it is not difficult to maximize L directly. The first-order condition is

$$\frac{dL}{d\lambda} = -3e^{-3\lambda}\frac{\lambda^9}{480} + e^{-3\lambda}\frac{9\lambda^8}{480} = 0$$

again implying $\hat{\lambda} = 3$.
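As a numerical check of A10.5 (our addition, assuming scipy is available), one can minimize the negative log-likelihood directly; the constant log 480 is omitted since it does not affect the maximum:

```python
# Maximize the Poisson log-likelihood for the sample {2, 5, 2} numerically
# and confirm the maximum is at lambda = 3, the sample mean.
import numpy as np
from scipy.optimize import minimize_scalar

x = np.array([2, 5, 2])

def neg_loglik(lam):
    # -log L = 3*lam - 9*log(lam) + log(480); the constant is dropped
    return x.size * lam - x.sum() * np.log(lam)

res = minimize_scalar(neg_loglik, bounds=(0.01, 10), method="bounded")
print(res.x)   # approximately 3.0
```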
A10.6 The joint density function is $\lambda^n e^{-\lambda\sum X_i}$. Interpreting this as a likelihood function, the log-likelihood is

$$\log L(\lambda) = n\log\lambda - \lambda\sum_{i=1}^{n} X_i$$

The first-order condition is

$$\frac{n}{\lambda} - \sum_{i=1}^{n} X_i = 0$$

Hence $\hat{\lambda} = 1/\bar{X}$, where $\bar{X}$ is the sample mean of X. The second derivative is $-n/\lambda^2$, which, being negative, confirms that we have maximized the log-likelihood.

A10.7 The probability of X taking any particular value in any observation is 1/n. Hence the joint density of the two observations in the sample is $1/n^2$. To maximize this, we require n to be as small as possible. n must be at least equal to 3, since one observation is equal to 3. Hence 3 is the maximum likelihood estimate.

A10.8 From the solution to Exercise 10.9, the log-likelihood function for p is

$$\log L(p) = m\log p + (n - m)\log(1 - p)$$

Thus the LR statistic is

$$LR = 2\left(\left(m\log\frac{m}{n} + (n - m)\log\left(1 - \frac{m}{n}\right)\right) - \left(m\log p_0 + (n - m)\log(1 - p_0)\right)\right) = 2\left(m\log\frac{m/n}{p_0} + (n - m)\log\frac{1 - m/n}{1 - p_0}\right)$$

If m = 40 and n = 100, the LR statistic for $H_0: p = 0.5$ is

$$LR = 2\left(40\log\frac{0.4}{0.5} + 60\log\frac{0.6}{0.5}\right) = 4.03$$


We would reject the null hypothesis at the 5 percent level (critical value of chi-squared with one degree of freedom 3.84) but not at the 1 percent level (critical value 6.64).

A10.9 The first derivative of the log-likelihood function is

$$\frac{d\log L(p)}{dp} = \frac{m}{p} - \frac{n - m}{1 - p}$$

and the second differential is

$$\frac{d^2\log L(p)}{dp^2} = -\frac{m}{p^2} - \frac{n - m}{(1 - p)^2}$$

Evaluated at p = m/n,

$$\frac{d^2\log L(p)}{dp^2} = -\frac{n^2}{m} - \frac{n^2}{n - m} = -\frac{n^3}{m(n - m)}$$

The variance of the ML estimate is given by

$$\left(-\frac{d^2\log L(p)}{dp^2}\right)^{-1} = \left(\frac{n^3}{m(n - m)}\right)^{-1} = \frac{m(n - m)}{n^3}$$

The Wald statistic is therefore

$$\frac{\left(\dfrac{m}{n} - p_0\right)^2}{\dfrac{m(n - m)}{n^3}} = \frac{\left(\dfrac{m}{n} - p_0\right)^2}{\dfrac{1}{n}\cdot\dfrac{m}{n}\left(1 - \dfrac{m}{n}\right)}$$

Given the data, this is equal to

$$\frac{(0.4 - 0.5)^2}{\dfrac{1}{100}\times 0.4 \times 0.6} = 4.17$$

Under the null hypothesis this has a chi-squared distribution with one degree of freedom, and so the conclusion is the same as in Exercise A10.8.
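Both A10.8 and A10.9 can be verified in a couple of lines (our sketch, not from the guide):

```python
# Reproducing the numbers in A10.8 and A10.9: m = 40, n = 100, p0 = 0.5.
import numpy as np

m, n, p0 = 40, 100, 0.5
p_hat = m / n
lr = 2 * (m * np.log(p_hat / p0) + (n - m) * np.log((1 - p_hat) / (1 - p0)))
wald = (p_hat - p0) ** 2 / (p_hat * (1 - p_hat) / n)
print(round(lr, 2), round(wald, 2))   # 4.03 and 4.17; chi-squared(1) critical value 3.84
```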
