This action might not be possible to undo. Are you sure you want to continue?

# October 2007

Chapter 10

**Binary choice and limited dependent variable models, and maximum likelihood estimation
**

Overview

The first part of this chapter describes the linear probability model, logit analysis, and probit analysis, three techniques for fitting regression models where the dependent variable is a qualitative characteristic. Next it discusses tobit analysis, a censored regression model fitted using a combination of linear regression analysis and probit analysis. This leads to sample selection models and heckman analysis. The second part of the chapter introduces maximum likelihood estimation, the method used to fit all of these models except the linear probability model.

Further material

Limiting distributions and the properties of maximum likelihood estimators Provided that weak regularity conditions involving the differentiability of the likelihood function are satisfied, maximum likelihood (ML) estimators have the following attractive properties in large samples: (1) They are consistent. (2) They are asymptotically normally distributed. (3) They are asymptotically efficient. The meaning of the first property is familiar. It implies that the probability density function of the estimator collapses to a spike at the true value. This being the case, what can the other assertions mean? If the distribution becomes degenerate as the sample size becomes very large, how can it be described as having a normal distribution? And how can it be described as being efficient, when its variance, and the variance of any other consistent estimator, tend to zero? To discuss the last two properties, we consider what is known as the limiting distribution of an estimator. This is the distribution of the estimator when the divergence between it and its population mean is multiplied by

n . If we do this, the distribution of a typical estimator remains nondegenerate as n becomes large, and this

enables us to say meaningful things about its shape and to make comparisons with the distributions of other estimators (also multiplied by n ).

ˆ is its ML To put this mathematically, suppose that there is one parameter of interest, θ, and that θ estimator. Then (2) says that

ˆ − ϑ ~ N 0, σ 2 nθ

(

) (

)

~ n θ − θ cannot have smaller

~ for some variance σ 2 . (3) says that , given any other consistent estimator θ , variance.

(

)

October 2007

2

Test procedures for maximum likelihood estimation

This section on ML tests contains material that is a little advanced for an introductory econometrics course. It is provided because likelihood ratio tests are encountered in the sections on binary choice models and because a brief introduction may be of help to those who proceed to a more advanced course. There are three main approaches to testing hypotheses in maximum likelihood estimation: likelihood ratio (LR) tests, Wald tests, and Lagrange multiplier (LM) tests. Since the theory behind Lagrange multiplier tests is relatively complex, the present discussion will be confined to the first two types. We will start by assuming that the probability density function of a random variable X is a known function of a single unknown parameter θ and that the likelihood function for θ given a sample of n observations on X, L(θ | X 1 ,..., X n ) , satisfies weak regularity conditions involving its differentiability. In particular, we assume that θ is determined by the firstorder condition dL/dθ = 0. (This rules out estimators such as that in Exercise A10.7) The null hypothesis is H0: ˆ. θ = θ0, the alternative hypothesis is H1: θ ≠ θ0, and the maximum likelihood estimate of θ is θ

Likelihood ratio tests

ˆ with its value at θ = θ . In view A likelihood ratio test compares the value of the likelihood function at θ = θ 0 ˆ, Lθ ˆ ≥ L(θ ) for all θ0. However, if the null hypothesis is true, the ratio L θ ˆ L(θ ) of the definition of θ 0 0

should not be significantly greater than 1. As a consequence, the logarithm of the ratio,

ˆ ⎞ ⎛ Lθ ˆ − log L(θ ) ⎟ = log L θ log⎜ 0 ⎜ L(θ ) ⎟ 0 ⎝ ⎠

()

()

()

()

should not be significantly different from zero. In that it involves a comparison of the measures of goodness of fit for unrestricted and restricted versions of the model, the LR test is similar to an F test. Under the null hypothesis, it can be shown that in large samples the test statistic ˆ − log L(θ ) LR = 2 log L θ 0

(

()

)

has a chi-squared distribution with one degree of freedom. If there are multiple parameters of interest, and multiple restrictions, the number of degrees of freedom is equal to the number of restrictions. Examples We will return to the example in Section 10.6 in the text, where we have a normally-distributed random variable X with unknown population mean μ and known standard deviation equal to 1. Given a sample of n observations, the likelihood function is

⎛ 1 − 1 (X ˆ X 1 ,..., X n ) = ⎜ L( μ e 2 ⎜ 2π ⎝

1 −μ

)2

1 ⎛ ⎞ − (X ⎟ × ... × ⎜ 1 e 2 ⎜ 2π ⎟ ⎝ ⎠

n −μ

)2

⎞ ⎟. ⎟ ⎠

The log–likelihood is

⎛ 1 ⎞ 1 n ⎟ − ∑ (X i − μ ˆ | X 1 ,..., X n ) = n log⎜ ˆ )2 log L(μ ⎜ ⎟ 2 i =1 ⎝ 2π ⎠

**ˆ = X . The LR statistic for the null hypothesis H0: μ = μ0 is therefore and the unrestricted ML estimate is μ
**

⎛⎛ ⎛ 1 ⎞ 1 n ⎟ − ∑ Xi − X LR = 2⎜ ⎜ n log⎜ ⎜ ⎟ 2 ⎜⎜ 2 π i =1 ⎝ ⎠ ⎝⎝

(

)

2

n ⎞ ⎛ ⎞⎞ ⎛ ⎞ ⎟ − ⎜ n log⎜ 1 ⎟ − 1 ∑ ( X i − μ 0 )2 ⎟ ⎟ ⎜ ⎟ ⎟ ⎜ ⎟⎟ ⎝ 2π ⎠ 2 i =1 ⎠ ⎝ ⎠⎠

= ∑ (X i − μ 0 ) − ∑ X i − X

2 i =1 i =1

n

n

(

)

2

= n X − μ0

(

)

2

X n ) = ⎜ L( μ e ⎜ ˆ 2π ⎜σ ⎝ 1 2 ˆ⎞ 1 ⎛ X −μ ⎛ ⎞ − ⎜ ⎟ ⎜ 1 ⎟ ˆ ⎟ 2⎜ ⎝ σ ⎠ × × .. ⎟ ⎟ ⎠ and the log-likelihood is ⎛ 1 ⎞ 1 ⎟ ˆ ... X n ) = n log⎜ ⎜ n ∑ (X i − μ ) ⎟ ⎟ −2 ⎜ ⎟ ⎝ i =1 ⎠ ⎝ 2π ⎠ 1 As before. we have . sometimes. n − 1) = ∑ (X i =1 n i − μ0 ) − ∑ X i − X 2 i =1 n ( ) 2 ⎛ n ⎜ ⎜∑ X i − X ⎝ i =1 ( ⎟ )⎞ ⎟ (n − 1) 2 ⎠ Returning to the LR statistic. X n ) = n log⎜ ˆ log L(μ ⎜ π ⎟ − n log σ − 2σ ˆ2 ⎝ 2 ⎠ The first-order condition obtained by differentiating by σ is ∂ log L n 1 =− + 3 ∂σ σ σ ∑ (X i =1 n i ˆ) −μ 2 ∑ (X i =1 n i − μ) = 0 2 from which we obtain ˆ2 = σ 1 n ˆ )2 ∑ (X i − μ n i =1 Substituting back into the log-likelihood function... the unrestricted likelihood function is ˆ⎞ 1 ⎛ X −μ ⎛ − ⎜ ⎟ 1 ˆ ⎟ 2⎜ ⎝ σ ⎠ ˆ . the profile log-likelihood function): 2 ⎛ 1 ⎞ ⎛1 n n 2⎞ ⎟ − n log⎜ log L(μ | X 1 . the ML estimator of μ is X .σ ˆ X 1 .. e ⎜ ⎟ ˆ 2π ⎟ ⎜σ ⎠ ⎝ n 2 ⎞ ⎟ ..3 If we relaxed the assumption σ = 1...σ ˆ | X 1 . the latter now becomes a function of μ only (and is known as the concentrated log-likelihood function or.... Hence the LR statistic is ⎛⎛ ⎜⎜ ⎛ 1 ⎞ ⎛1 n ⎟ − n log⎜ LR = 2⎜ ⎜ n log⎜ ⎜ n ∑ Xi − X ⎟ ⎜ ⎝ i =1 ⎝ 2π ⎠ ⎜⎜ ⎝⎝ ( ) 2 2 1 ⎞ 1 ⎞ ⎛ ⎞ ⎛ 1 ⎞ ⎞2 n ⎟ ⎜ ⎛1 n n ⎞ 2 ⎟⎟ 2 ⎟ ⎜ ⎟ ⎜ ⎟ − − − − − n log n log X ( ) μ ⎟ ⎜ ⎟⎟ 0 ⎟ ⎜n∑ i ⎟ ⎜ 2⎟ ⎜ 2⎟ ⎠ ⎝ i =1 ⎠ ⎟⎟ ⎝ 2π ⎠ ⎠ ⎝ ⎠⎠ n n ⎛ 2 ( ) = n⎜ − − log X μ log Xi − X ∑ ∑ i 0 ⎜ i =1 i =1 ⎝ ( ) ⎞ ⎟ ⎟ ⎠ It is worth noting that this is closely related to the F statistic obtained when one fits the least squares model Xi = μ + ui n G G The least squares estimator of μ is X and RSS = ∑ X i − X i =1 ( ) 2 . we have RSS R = ∑ ( X i − μ 0 ) and the F statistic 2 i =1 n F (1. If one imposes the restriction μ = μ0 ...

. the concentrated log-likelihood function is 2 ⎛ 1 ⎞ ⎛1 n n 2⎞ ⎟ − n log⎜ log L(μ | X 1 . Wald tests Wald tests are based on the same principle as t tests in that they evaluate whether the discrepancy between the maximum likelihood estimate θ and the hypothetical value θ0 is significant. so the estimate of the variance is ∑ i =1 n 1 .. Examples We will use the same examples as for the LR test. where σ was unknown. X n ) = n log⎜ ⎜ ⎟ 2 2 π i =1 ⎝ ⎠ The first differential is statistic is therefore n X − μ 0 . The Wald test n ) In the second example. Under the null hypothesis that the restriction is valid. the test statistic becomes more complex and the number of degrees of freedom is equal to the number of restrictions. The test statistic for the null hypothesis H : θ 0 0 (θˆ − θ ) 0 2 ˆ θ2 σ ˆ ˆ θ2 ˆ θ2 where σ ˆ is the estimate of the variance of θ evaluated at the maximum likelihood value.. A common estimator is that obtained as minus the inverse of the second differential of the log-likelihood function evaluated at the maximum likelihood estimate. σ ˆ can be estimated in various ways that are asymptotically equivalent if the likelihood function has been specified correctly. X n ) = n log⎜ ⎜ n ∑ (X i − μ ) ⎟ ⎟ −2 ⎜ ⎟ ⎝ i =1 ⎠ ⎝ 2π ⎠ 1 ⎛ 1 ⎞ n ⎛ n n 1 n ⎟ − log − log⎜ ⎟ ( X i − μ )2 ⎞ = n log⎜ − ∑ ⎜ ⎟ ⎜ ⎟ 2 n 2 ⎝ i =1 ⎠ 2 ⎝ 2π ⎠ The first derivative with respect to μ is .4 LR = n log ∑ (X i =1 n n i − μ0 ) −X 2 2 ∑ (X i =1 i i ) 2 ⎛ ⎜ = n log⎜ ⎜1 + ⎜ ⎜ ⎝ n ∑ (X i =1 n i − μ0 ) − ∑ X i − X 2 i =1 n ( ) 2 ∑ (X n i =1 i −X ) 2 ⎞ ⎟ ⎟ ⎟ ⎟ ⎟ ⎠ ≅n ∑ (X i =1 n − μ0 ) − ∑ X i − X i =1 ( ) 2 ∑ (X n i =1 i −X ) = 2 n F≅F n −1 Note that we have used the approximation log(1 + a) = a which is valid when a is small enough for higher powers to be neglected.. assuming that σ = 1 and then assuming that it has to be estimated along with μ.. taking account of the variance in the ˆ − θ = 0 is estimate. first. In the first case the log-likelihood function is ⎛ 1 ⎞ 1 n ⎟ − ∑ ( X i − μ )2 log L(μ | X 1 .. When there are multiple restrictions.. the test statistic has a chi-squared distribution with one degree of freedom... 2 ( ( X i − μ ) and the second is – n.

the log-likelihood function for the parameters is 2 ⎛ 1 ⎞ 1 ⎟− log L(β 1 .i. i = 1.n ) = n log⎜ ⎜ ⎟ 2σ 2 2 σ π ⎝ ⎠ k ⎛ ⎞ ⎜ Yi − β 1 − ∑ β j X ij ⎟ ∑ ⎜ ⎟ i =1 ⎝ j =2 ⎠ n 2 This is a straightforward generalization of the expression for a simple regression derived in Section 10. when there is one degree of freedom. X i .. Hence n 1 Z log L(β 1 . The second derivative is d log L =n dμ 2 2 ⎛ ⎞⎛ ⎞ 2⎞ ⎜ (− n )⎛ ⎜ ∑ (X i − μ ) ⎟ ⎟−⎜ ⎜ ∑ ( X i − μ )⎟ ⎟⎜ ⎜ − 2 ∑ ( X i − μ )⎟ ⎟ n n n ⎝ i =1 ⎠ ⎝ i =1 ⎠⎝ i =1 ⎠ ⎡n 2⎤ ⎢∑ ( X i − μ ) ⎥ ⎣ i =1 ⎦ 2 ˆ=X.. Under the null hypothesis.n ) = − n log σ − log 2π − 2 2σ 2 where k ⎛ ⎞ Z = ∑⎜ Yi − β 1 − ∑ β j X ij ⎟ ⎜ ⎟ i =1 ⎝ j =2 ⎠ n 2 . the Wald statistic is the square of the corresponding asymptotic t statistic (asymptotic because the variance has been estimated asymptotically).6 in the text..5 dlogL =n dμ (X i − μ ) ∑ i =1 2 ∑ (X i − μ ) i =1 n n .. β k . β k . σ | Yi . Evaluated at the ML estimator μ ( X i − μ ) = 0 and hence ∑ i =1 d 2 log L =− dμ 2 n2 n ∑ (X i =1 n i − μ) 2 giving an estimated variance ˆ2 σ n .σ )....d. i = 1. as in the present case. this is distributed as a chi-squared statistic with one degree of freedom.. When there is just one restriction. The chi-squared test and the t test are equivalent..... LR test of restrictions in a regression model Given the regression model Yi = β1 + ∑ β j X ij + u i j =2 k with u assumed to be i. X i . given ˆ2 = σ 1 n ∑ Xi − X n i =1 ( ) 2 Hence the Wald test statistic is (X − μ ) 0 2 ˆ2 n σ .. σ | Yi . given that.. N(0. the critical value of the chi-squared statistic for any significance level is the square of the critical value of the normal distribution.

and of course this is exactly what one is doing when one is fitting a last squares regression. As with all maximum likelihood tests. Thus for testing linear restrictions we should prefer the F test approach because it is valid for finite samples. we obtain the concentrated as the sample size becomes large. it is valid only for large samples. β k | Yi . If there is more than one restriction. you should be able to: • • • • • describe the linear probability model and explain its defects describe logit analysis. the test statistic is the same but the number of degrees of freedom under the null hypothesis that all the restrictions are valid is equal to the number of restrictions. which is the sum of the squares of the residuals divided by n – k..... including the mathematical specification calculate marginal effects in logit and probit analysis explain why OLS yields biased estimates when applied to a sample with censored observations. even when the censored observations are deleted ... we obtain one of the first-order conditions for a maximum: ∂ log L(β 1 . σ ) n 1 = − + 3 RSS = 0 σ σ ∂σ From this we obtain RSS ˆ2 = σ n Hence the ML estimator is the sum of the squares of the residuals divided by n. This is different from the least squares estimator. but the difference disappears ˆ 2 in the log-likelihood function. studying the corresponding slideshows. If we now impose a restriction on the parameters and maximize the log-likelihood function subject to the restriction. Hence Z = RSS. Substituting for σ likelihood function 1 ⎛ RSS ⎞ 2 n log L(β 1 . and doing the starred exercises in the text and the additional exercises in this guide..... it will be n log LR = − (log RSS R + 1 + log 2π − log n ) 2 where RSS R ≥ RSS U and hence log LR ≤ log LU . Taking the partial differential with respect to σ. X i .. they should be chosen so as to minimize Z. It remains to determine the ML estimate of σ.6 in the text. β k . To maximize the log-likelihood.n ) = − n log⎜ RSS ⎟ − log 2π − 2 2 /n n Z ⎝ ⎠ n RSS n n = − log − log 2π − 2 2 2 n =− we will re-write this n (log RSS + 1 + log 2π − log n ) 2 1 n (log RSS U + 1 + log 2π − log n) 2 the subscript U emphasizing that this is the unrestricted log-likelihood. An example of its use is the common factor test in Section 12. Learning outcomes After working through the corresponding chapter in the text. giving the mathematical specification describe probit analysis. The LR statistic for a test of the restriction is therefore log LU = − 2(log LU − LR ) = n(log RSS R − log RSS U ) = n log RSS R RSS U It is distributed as a chi-squared statistic with one degree of freedom under the null hypothesis that the restriction is valid. i = 1.6 The estimates of the β parameters affect only Z.

Err. Std. with the output shown below (first females. Discuss the plausibility of this finding. .2 Logit analysis was used to relate the event of a respondent working (WORKING. From the logit analysis.000 .5697314 -----------------------------------------------------------------------------. SIZE.049543 . z P>|z| [95% Conf.1146971 . defined as the highest grade completed) using 1994 data from the US National Longitudinal Survey of Youth. and COLLEGE (as defined in Exercise A5.1 What factors affect the decision to make a purchase of your category of expenditure in the CES data set? Define a new variable CATBUY that is equal to 1 if the household makes any purchase of your category and 0 if it makes no purchase at all. REFAGE. SIZE.020 for females and males.9670268 . Std. without mathematical detail) • explain the principle underlying maximum likelihood estimation • apply maximum likelihood estimation from first principles in simple models Additional exercises A10.029 and 0. Err. defined to be 1 if the respondent was working. and 0 otherwise) to the respondent’s educational attainment (S. respectively. logit WORKING S if MALE==1 Logit Estimates Number of obs chi2(1) Prob > chi2 Pseudo R2 = 2573 = 75.529355 -.000 .5519 -----------------------------------------------------------------------------WORKING | Coef.3 and 13.3775658 -2. and COLLEGE for the logit and probit models and compare them with the coefficients of the linear probability model. A10.0217 Log Likelihood = -1586.0446 Log Likelihood = -802.2448064 -4.287 0. Interval] ---------+-------------------------------------------------------------------S | . The analysis was undertaken for females and males separately. and (3) the probit model.0186177 8.2270113 ------------------------------------------------------------------------------ 95 percent of the respondents had S in the range 9–18 years and the mean value of S was 13.010 -1.3099989 _cons | -.707042 -. REFAGE.0000 = 0.561 0.1511872 .0306482 8.65424 -----------------------------------------------------------------------------WORKING | Coef. Calculate the marginal effects at the mean of EXPPC.1876773 _cons | -1. Interval] ---------+-------------------------------------------------------------------S | . logit WORKING S if MALE==0 Logit Estimates Number of obs chi2(1) Prob > chi2 Pseudo R2 = 2726 = 70. Ordinary least squares regressions of WORKING on S yielded slope coefficients of 0.7 • explain the problem of sample selection bias and describe how the heckman procedure may provide a solution to it (in general terms.2 years for females and males. the marginal effect of S on the probability of working at the mean was estimated to be 0.000 -1.2499295 .155 0. respectively.42 = 0. Regress CATBUY on EXPPC.6) using: (1) the linear probability model. then males.03 = 0.121 0.1898601 .0000 = 0. (2) the logit model.020 for females and males. with iteration messages deleted): . In this year the respondents were aged 29–36 and a substantial number of females had given up work to raise a family. As can be seen from the second figure below. respectively.030 and 0. the marginal effect of educational attainment was lower for males than for females over most of the range S ≥ 9. z P>|z| [95% Conf.

07 Males 0. age.8 Probability 0.00 0 2 4 6 8 10 S 12 14 16 18 20 Marginal effect of S on the probability of working A10. years of schooling. Compare the estimates of the marginal effect of educational attainment using logit analysis with those obtained using ordinary least squares.01 0.02 0.0 Males 0. as a function of S 0.3 A researcher interested in the relationship between parenting.4 0. and S.05 Marginal effect 0.2 0.0 0 2 4 6 8 10 S 12 14 16 18 20 Probability of working. Defining the probability of having a child less than 6 in the household to be p = F(Z) where .8 As can also be seen from the second figure. In particular.03 0.167 married males and 870 married females aged 35 to 42 in the National Longitudinal Survey of Youth. Discuss the plausibility of this finding. she is interested in how the presence of young children in the household is related to the age and education of the respondent.06 0. for males and females separately using probit analysis. She defines CHILDL6 to be 1 if there is a child less than 6 years old in the household and 0 otherwise and regresses it on AGE.04 Females 0.6 Females 0. 1. age and schooling has data for the year 2000 for a sample of 1. the marginal effect of educational attainment decreases with educational attainment for both males and females over the range S ≥ 9.

4 A researcher has 1994 data on years of schooling. The researcher does not know how to test whether the difference is significant and asks you for advice.368 females –0.492) –0. Explain whether you would expect the marginal effect of schooling to be higher for males or for females.762 female respondents aged 29–36 in the US National Longitudinal Survey of Youth.371 + 0. and calculate for both males and females the marginal effects at the means of AGE and S.154 (0.914 + 0. she calculates Z = b1 + b2 AGE + b3 S where AGE and S are the mean values of AGE and S and b1.132 (0. the proportion being relatively low for those who did not complete high school (S < 12) and for those with a college education.020) 0. and she further calculates f (Z ) = 1 2π e 1 − Z2 2 where f ( Z ) = (a) dF . but it is nonlinear.121) (0.272 Z f (Z ) For males and females separately. and the marital status of 2. using ordinary least squares.004) The graph shows the proportion of ever-married women for each value of S and the relationships implicit in the OLS and logit regressions. and she regresses M on S and SSQ.137 (0. dZ Explain how one may derive the marginal effects of the explanatory variables on the probability of having a child less than 6 in the household. She notices that there is a relationship between education and the probability of ever having married. males AGE S constant –0. (c) At a seminar someone asks the researcher whether the marginal effect of S is significantly different for males and females. and b3 are the probit coefficients in the corresponding regression.547 (0.001) She also performs a logit regression with the same specification (standard errors in parentheses): ˆ Z = –0.9 Z = β1 + β2AGE + β3S she obtains the results shown in the table (asymptotic standard errors in parentheses). The values of Z and f ( Z ) are shown in the table.111) (0.874 0.748) (0.018) (0. a variable defined as the square of S. What would you say? A10.399 0. (b) Explain whether the signs of the marginal effects are plausible. with the following result (standard errors in parentheses): ˆ M = 0. .358) –0. S.075 S – 0.023) 0. She defines a variable M to be 1 if the respondent had ever been married and 0 if always single.094 (0.420 S – 0.018) 0.015) 0.194 (0.017 SSQ (0.003 SSQ (0. b2.

80 0. . is given by p ( x) = e −λ λ x x! where λ is an unknown parameter and x! is the product of the terms x(x–1)(x–2)…(3)(2)(1).55 0.5 A random variable X can take the positive integer values 0. Derive the maximum likelihood estimate of n. Write down the likelihood function for λ and derive the maximum likelihood estimate of λ.6 The probability density function of a random variable X is the exponential distribution f(X) = λe–λX for X > 0 with unknown parameter λ.71 1. 1. 2. n is unknown. p(x).0246 0.85 OLS model Logit model Probability of being married 0.50 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 Years of schooling This model is a little different from the standard logit model in that Z is a quadratic function of S. The marginal effects for S = 8 (completion of grade school). S 8 12 13.22 (sample mean). and 2. In a sample of three observations.90 0.71 1. 5.7 A random variable X can be equal to any of the positive integers in the range 1 to n with equal probability.0029 –0.75 0. Explain the implications of this for the properties of the model and the calculation of the marginal effect of S. and S = 16 (completion of four-year college) are shown in the table.0169 A10.10 Actual proportion 0. A10. A10. given a sample of n observations on X.37 1. together with that at S = 13.70 0.52 Marginal effect 0.22 16 Z 1. In a sample of 2 observations.60 0. … The probability of it being equal to a specific value x. 3. S = 12 (completion of high school). the values of the random variable are 2. the values of X were 2 and 3.65 0. Derive the maximum likelihood estimator of λ.0024 –0.

868 individuals from the NLSY. discuss the variation of the marginal effect of the ASVABC score implicit in the logit regression and compare it with that in the OLS regression.023ASVABC (0.21 The researcher also fitted the following logit regression: ˆ = –11. the exercise required demonstrating that m/n was the ML estimator of p.2 0.2. Defining a variable BACH to be equal to 1 if the respondent has a bachelor’s degree (or higher degree) and 0 otherwise. test the null hypothesis p = 0.6 0. Sketch the probability and marginal effect diagrams for the OLS regression and compare them with those for the logit regression. A10. 1.0 0.8 0.189 ASVABC Z (0.0 0 20 40 60 80 0. with mean value 50.103 + 0. is investigating how the probability of a respondent obtaining a bachelor’s degree from a four-year college is related to the respondent’s score on ASVABC.7 percent of the respondents earned bachelor’s degrees.00 100 ASVABC Marginal effect .4 0.042) (0. using a sample of 2. Using this regression. (a) (b) (c) Give an interpretation of the OLS regression and explain why OLS is not a satisfactory estimation method for this kind of model. Answers to the starred exercises in the text 10. derive the Wald statistic and test the null hypothesis p = 0. With reference to the diagram.2 A researcher.8. Given that the event occurred m times in a sample of n observations. ASVABC ranged from 22 to 65.864 + 0. 26.04 Cumulative effect 0.03 0. and most scores were in the range 40 to 60.487) (0. If m = 40 and n = 100.9 in the text. the probability and marginal effect functions are shown in the diagram.05 0. make use of the information in the first paragraph of this question.01 0. In your discussion.11 A10.9 For the variable in Exercise A10. an event could occur with probability p.009) where Z is the variable in the logit function.8 In Exercise 10.001) R2 = 0. Derive the LR statistic for the null hypothesis p = p0.02 0.5. the researcher fitted the OLS regression (standard errors in parentheses): ˆ CH BA = –0.5.

The OLS counterpart to the marginal probability curve is a horizontal straight line at 0.0 0.02 100 0. The questions states that the highest score was 65. and negative probabilities for all scores lower than 38. where the probability would be about 75 percent. From the curve for the cumulative probability in the figure it can be seen that the probability of graduating from college for respondents with that score is only about 20 percent. and of course this is the range with the steepest slope of the cumulative probability.03 0. The linear probability model also suffers from the problem that the standard errors and t and F tests are invalid because the disturbance term does not have a normal distribution.12 Answer: (a) The slope coefficient of the linear probability model indicates that the probability of earning a bachelor’s degree rises by 2. (c) The counterpart to the cumulative probability curve in the figure is a straight line using the regression result. 1. it can be seen that the marginal effect is greatest in the range 50–65. However the linear probability model predicts nonsense negative probabilities for all those with scores less of 37 or less. At the ASVABC mean it predicts that there is a 29 percent chance of the respondent graduating from college.5 20 40 60 80 0. a score of zero was in fact impossible. being about 60 percent. showing that the marginal effect is under estimated for ASVABC scores above 50 and overestimated below that figure.05 1. It predicts a –36 percent chance of graduating for respondents with the lowest score of 22. Given the way that ASVABC was constructed.01 -1. Its distribution is not even continuous. Looking at the curve for the marginal probability. (b) The question states that the mean value of ASVABC in the sample was 50.04 Cumulative effect 0. It is particularly unsatisfactory for low ASVABC scores. consisting of only two possible values for each value of ASVABC. While this may be realistic for a range of values of ASVABC. The question states that most respondents had scores in the range 40–60.0 0. but for a score of 65 it predicts a probability of only 63 percent.2. of whom there were many in the sample.0 0 -0. The intercept literally indicates that an individual with a zero score would have a minus 86.00 ASVABC Marginal effect . Very few of those with scores in the low end of the spectrum earned bachelor’s degrees and variations in the ASVABC score would be unlikely to have an effect on the probability.3 percent for every additional point on the ASVABC score. It can be seen that at the top of that range the probability has increased substantially.5 0. it is not for very low ones. considerably more than the logit figure.023.4 percent probability of earning a bachelor’s degree.5 0.

514 0.08 0. MARRIED is equal to 1 if the respondent was married with spouse present.726 observations on females in the National Longitudinal Survey of Youth using the LFP data set described in Appendix B.0835 Number of obs LR chi2(7) Prob > chi2 Pseudo R2 = = = = 2726 165.0556 Probit estimates Log likelihood = -1403.715 0. The mean values of the variables are given in the output below: .182 0.3991196 . having a child aged less than 6 has a large adverse effect.1055466 -0. Being black also has an adverse effect.086 -.429 0.1191337 ETHHISP | -. with iteration messages deleted. but no child less than 6. the reason for which is not obvious (the respondents were aged 29 – 36 in 1994).1359097 .1418775 1. [The data set and a description are posted on the website.1129 AGE | -. Dev.0076543 .2780887 . the latter effect is smaller.1314492 .0631618 -0.842 0. If the definition is tightened to include also the requirement that the employment status is employed.7301525 -. Age has a significant negative effect.2260284 .4847516 0 1 ETHBLACK | 2726 . Std.000 -.399 0.30998 2.444771 0 20 AGE | 2726 17.24083 14 22 CHILDL06 | 2726 . sum WORKING S AGE CHILDL06 CHILDL16 MARRIED ETHBLACK ETHHISP if MALE==0 Variable | Obs Mean Std. Min Max ---------+----------------------------------------------------WORKING | 2726 .0683076 -.1877068 _cons | .6228907 . 0 otherwise. very highly significant.000 .3180484 .0120629 7. Interval] ---------+-------------------------------------------------------------------S | .4658038 0 1 MARRIED | 2726 . Err.0656143 .0892571 .205066 ------------------------------------------------------------------------------ WORKING is a binary variable equal to 1 if the respondent was working in 1994.121 0.0835 -----------------------------------------------------------------------------WORKING | Coef.2912092 .2712267 2.000 -. . was fitted using 2. more educated mothers making use of their investment by tending to stay in the labor force.0000 0.0744923 -7. 0 otherwise.0193897 MARRIED | -.4381482 CHILDL16 | -.0722671 .13 10. CHILDL16 is a dummy variable equal to 1 if there was a child aged less than 16. probit WORKING S AGE CHILDL06 CHILDL16 MARRIED ETHBLACK ETHHISP if MALE==0 Iteration Iteration Iteration Iteration 0: 1: 2: 3: log log log log likelihood likelihood likelihood likelihood = = = = -1485.9344 -1403.7652238 .4 The following probit regression. in the household.6248 -1403.3370179 0 1 ETHHISP | 2726 . The data are for 1994.) .0839 -1403. z P>|z| [95% Conf.0792359 -1.0438511 . (The WORKING variable is defined to be 1 if the individual has recorded hourly earnings of at least $3.013 .0191608 .081101 -3.2589771 0 1 Calculate the marginal effects and discuss whether they are plausible. Schooling also has a very significant effect.1161407 ETHBLACK | -. but still significant at the 5 percent level.0193946 CHILDL06 | -.4370436 -.5841503 .001 -.] Answer: The marginal effects are calculated in the table below. The remaining variables are as described in EAEF Regression Exercises.483 0. CHILDL06 is a dummy variable equal to 1 if there was a child aged less than 6 in the household.673472 .1305943 . As might be expected.64637 2.856 -.4239366 0 1 S | 2726 13. 0 otherwise.904 -. when the respondents were aged 29 to 36 and many of them were raising young families. 0 otherwise. the reason for which is likewise not obvious.012478 -3.4898073 0 1 CHILDL16 | 2726 .

0363 –0.0432 –0. perform a tobit regression of expenditure on your commodity on total household expenditure per capita and household size.1886 –0.0192 0.2969 0.3100 17. SHEL. TELE.0023 –0.0130 –0.0404 –0.1359 –0.2969 0. .0265 –0. the OLS regression using all the observations.2969 0.6735 Mean × b 1. and the tobit regression.14 Variable S AGE CHILD06 CHILDL16 MARRIED ETHBLACK ETHHISP constant Total Mean 13.7747 –0.0077 –0. and CLOT.0723 1. Answer: The table gives the number of unconstrained observations for each category of expenditure and the slope coefficients and standard errors from an OLS regression using the unconstrained observations only. and compare the slope coefficients with those obtained in OLS regressions including and excluding observations with 0 expenditure on your commodity.3180 0.6464 0.1306 0.0048 –0.5 Using the CES data set.1735 –0. As may be expected.2969 0.0893 –0.6735 0.7685 f(Z) 0.2332 –0. there is very little difference between the tobit and the OLS estimates.2781 –0.0057 10.2969 0. the discrepancies between the tobit estimates and the OLS estimates are greatest for those categories with the largest numbers of constrained observations.0439 –0.2969 0.0826 –0.0014 0.5842 –0.6229 0.2969 bf(Z) 0.0000 b 0. In the case of categories such as FDHO.3991 0.

0135 (0.893) –42.839 (9.222) 23.063 (27.0027) 0.883 (9.736) 10.0488 (0.325 (35.678 (16.0317 (0.0046) 0.215 (7.355) 58.0002) 0.903) OLS.532) 35.212) –108.0165 (0.197 (20.584) 5.0029) 0.611) 91.939 (10.230) 5.0004) 0.917 (19.0018) –0.381) –6.0806 (0.0005 (0.601) –16.113 (14.0025) 0.0078 (0.522) 33.202 (1.529 (6.743 (2.0027) 0.943 (24.0016) SIZE –133.636 (42.0149 (0.0220 (0.0050 (0.352) –9.061) –3.0004) 0.0011) 0.833) –1.0055) 0.030 (8.0181 (0.227) –25.0329 (0.677 (42.0054 (0.2017 (0.240) –40.500 (20.0010) 0.220) 16.0004) 0.0006) 0.295) 9.431) –43.0088) 0.0026) 0.0229 (0.0317 (0.530) 46.123) 87.0045) 0.0060) 0.0011) 0.247 (2.680 (4.887 (14.0015) 0.819 (5.0015 (0.0024) 0.0147 (0.0516 (0.0021) 0.0754 (0.946) 3.0205 (0.0452 (0.0036) 0.403 (26.896 (3.862 (10.0075) 0.368) 15.0022) 0.795) 16.009) 21.904) –160.0263 (0.612 (1.181) –2.0027) 0.0396 (0.0081 (0.977) –7.0007 (0.0072) 0.233) 329.173 (24.630) 5.831 (12.150) –23.0008) 0.0476 (0.797 (9. 0 otherwise.0075) 0.958 (7.0337 (0.0004) 0.555 (31.011 (42.15 Comparison of OLS and tobit regressions OLS.0075) 0. investigate whether there is evidence that selection bias affects the least squares estimate of the returns to college education.0037) 0.535) 3.775 (15.0515 (0.096) 92.449) –48.073 (7.492 (19.0294 (0.0015) 0.0178 (0.073 (2.334) –5.895 (4.0243 (0.0018) 0.894 (6.875 (1.0021) 0.152) –17.0033) 0.0003) 0.361 (8.0007 (0.0004) 0.0034) 0.401) –13.0028 (0.0015) 0.862) –112.279) –113. no 0 cases EXPPC 0.0014) 0.084) –113.961 (13.0014) 0.0057 (0.0032) 0.342 (14.0115 (0.484) –162.0066 (0.007 (6.0017 (0.2024 (0.0414 (0.239) –6.668 (4.6 Does sample selection bias affect the OLS estimate of the return to college education? Using your EAEF data set.0235 (0.0114 (0.0019) 0.804) 48.0124 (0.0121 (0. all cases n FDHO FDAW HOUS TELE DOM TEXT FURN MAPP SAPP CLOT FOOT GASO TRIP LOCT HEAL ENT FEES TOYS READ EDUC TOB 868 827 867 858 454 482 329 244 467 847 686 797 309 172 821 824 676 592 764 288 368 EXPPC 0.969 (19.998) 17.2020 (0.575 (11.205) –12.599) –8.0013 (0.0145 (0.842 (15.195 (1.862) 71.248) 4.0009) SIZE –132.0019) tobit SIZE –132.491 (2.0035 (0.011) EXPPC 0.0044) 0.0034 (0.0044) –0.250) 7.221 (15.0117 (0.642 (14.0033) 0.0011) 0.0031) 0.0003) 0.197) 25.054 (15.768 (1.0198 (0.0061 (0.0433 (0.649) –1.0025) 0.744) 62.199 (17.243 (42.0743 (0.0021) 0.0039 (0.0016) 0. and LGEARNCL = .0042) –0.732) –13.0041 (0.0421 (0.0212 (0.0183 (0.0027) 0.757 (2.739) 28.0344 (0.256) –40.708) 0.513 (25.0210 (0.177 (47.0003) 0.945 (11.117 (21.976) –7.0008) 0.594 (8.0036) 0.342 (21.865 (8.0095 (0. Define COLLYEAR = S – 12 if S > 12.0014) 0.0004) 0.161) –178.383) –43.0317 (0.039) 30.

0738408 5. there is no obvious simple reason for these effects.0342841 .0000211 MALE | .267465 -----------------------------------------------------------------------------LR test of indep.18587 Number of obs Censored obs Uncensored obs Wald chi2(6) Prob > chi2 = = = = = 540 254 286 95.709464 -------------+---------------------------------------------------------------/athrho | -.91 0.6253776 -.716724 .9519231 .1436681 -1.1034667 _cons | 2.0727331 -6.5213072 .204 -.086883 MALE | -.0404138 . SM. Std.98 0.198633 .4363839 .4764825 . Run the equivalent regression using least squares.021064 3.000 .2184279 1.050 -. EXP.919557 3. Comment on your findings.0967091 -.000 .601242 -------------+---------------------------------------------------------------select | ASVABC | .13 0.299503 4.71 0.0816985 .5350593 .8913181 -. SF.00 0. select(ASVABC MAL > E ETHBLACK ETHHISP SM SF SIBLINGS) Iteration Iteration Iteration Iteration Iteration 0: 1: 2: 3: 4: log log log log log likelihood likelihood likelihood likelihood likelihood = = = = = -510.2916586 .1407099 ETHBLACK | .8613388 ETHHISP | 1. ETHBLACK.000 -1. and SIBLINGS being used to determine whether the respondent attended college. replace COLLYEAR = S-12 if S>12 (286 real changes made) .0448791 . g LGEARNCL = LGEARN if COLLYEAR>0 (254 missing values generated) .023201 .5811092 ETHBLACK | -.0136364 .36 0.054971 .96 0.008141 8. ASVABC.4828234 .008101 4.0376608 .82 0.7406524 .0296495 -1.6116179 1.0957729 .0390787 .6221298 -.2089203 .46251 -509.4426682 sigma | .27 0.175 -.11 0. and ETHHISP. with ASVABC.5139176 -9.0935106 SF | .88 0.433228 . heckman LGEARNCL COLLYEAR EXP ASVABC MALE ETHBLACK ETHHISP.7604 .31 0. g COLLYEAR = 0 . As is often the case with sample selection bias.92 0.0249424 .83 0. MALE. Use the heckman procedure to regress LGEARNCL on COLLYEAR.047 . .6465576 -.0881937 .2430548 -3.159384 -1.0866862 ETHHISP | -.18587 -509.428302 -.0058 ------------------------------------------------------------------------------ .4570113 .0196862 6.000 .000 .19041 -509.4290092 6. Err.000 1.027294 .0204512 _cons | -4.1097232 -.0000 Heckman selection model (regression model with sample selection) Log likelihood = -509. The ethnicity coefficients are particularly affected.002 -.7115788 lambda | -.723984 -3.4755444 /lnsigma | -.18 0. (rho = 0): chi2(1) = 7.257 -.000 -.64 0. eqns.0051172 .65904 -509.1229832 SIBLINGS | -.785648 SM | .070927 .44 0.000 .63 Prob > chi2 = 0.000 -5. Interval] -------------+---------------------------------------------------------------LGEARNCL | COLLYEAR | . z P>|z| [95% Conf. missing otherwise.1859 -----------------------------------------------------------------------------| Coef.1228135 -3.3402692 -------------+---------------------------------------------------------------rho | -.16 LGEARN if COLLYEAR > 0. Answer: There appears to be a highly significant negative association between the random component of the selection relationship and that of the wage equation.126778 .190 -.000 .0069683 -1.0549565 ASVABC | -.6170388 .3814199 .1653623 EXP | .43 0.0302181 1.1948981 .

The second differential is d 2 log L( p ) m n−m =− 2 − 2 dp p (1 − p )2 . Answer: In each observation where the event did occur.0167309 MALE | .86 0. we obtain the first-order condition for a minimum: d log L( p ) m n − m =0 = − dp p 1− p This yields p = m/n.9 An event is hypothesized to occur with probability p.28 0.845964 285 .000 .53954 -----------------------------------------------------------------------------LGEARNCL | Coef. the probability was p.022807 .1981952 ETHHISP | -. for Bi* ≤ 0 where the Q variables determine selection.0000 0.413494611 Number of obs F( 6.64 0.8021698 1.0673316 5. Err.1441328 -0. 279) Prob > F R-squared Adj R-squared Root MSE = = = = = = 286 20.0683754 . t P>|t| [95% Conf. given m and n.0984362 . Since there were m of the former and n – m of the latter. Std. is log L( p ) = m log p + (n − m ) log(1 − p ) Differentiating with respect to p .19 0.0063027 .3108 0. it occurred m times. Demonstrate that the maximum likelihood estimator of p is m/n.1354179 -0. the probability was (1 – p).104445 Residual | 81.235 -.1777068 EXP | .50 0.291108581 -------------+-----------------------------Total | 117.000 .2884302 4.2192941 279 . In each observation where it did not occur.217166 .17 .3247333 . Answer: The selection bias model may be written Bi* = δ 1 + ∑ δ j Q ji + ε i j =2 m Yi* = β 1 ∑ β j X ji + u1 j =2 k Yi = Yi* Yi is not observed for Bi* > 0 . 10.0410075 .039627 .000 . Reinterpreting this as a function of p.3497084 . reg LGEARNCL COLLYEAR EXP ASVABC MALE ETHBLACK ETHHISP Source | SS df MS -------------+-----------------------------Model | 36.4822509 ETHBLACK | -.62667 6 6. Interval] -------------+---------------------------------------------------------------COLLYEAR | .369946 .2960 .19 0.937721 ------------------------------------------------------------------------------ 10.0564469 ASVABC | .0052975 1.776 -.0201347 6. We should check that the second differential is negative and that we have therefore found a maximum. the loglikelihood function for p.2427183 _cons | 1.75 0.97 0.334946 .0041254 .000 .1380715 .7 Show that the tobit model may be regarded as a special case of a selection bias model.0085445 4.614 -. The tobit model is the special case where the Q variables are identical to the X variables and B* is the same as Y*. the joint probability was p m (1 − p) n − m . In a sample of n observations.

−1403. log L0 is –1485. and TOB.11 In Exercise 10. this is distributed as a chi-squared statistic with degrees of freedom equal to the number of explanatory variables. the coefficients being similar to the logit and probit marginal effects and the t statistics being of the same order of magnitude as the z statistics for the logit and probit. TOYS.43). and perform the likelihood ratio test. EDUC. READ and EDUC and a negative one for TOB.18 Evaluated at p = m/n. and so we reject the null hypothesis at that level. FOOT. Answer: As defined in equation (10. Under the null hypothesis that the coefficients of the explanatory variables are all jointly equal to 0.4. as printed in the output.08.0835 + 1485. The linear probability model also yielded similar results for most of the commodities.4. The critical value of chi-squared at the 0. confirm that it is equal to that reported in the output. Age was a positive influence in the case of TRIP. − 1485.10 In Exercise 10. the linear probability model gave very different results. HEAL. HEAL. . pseudo-R2 = 1 – as appears in the output. LOCT is on the verge of being an inferior good and for that reason is not sensitive to total expenditure.0556. However for those categories of expenditure where most households made purchases. Most of these effects seem plausible with simple explanations. only 11 households did not make a purchase. In the case of TELE. 10. LOC and TOB.0835 log L =1– = 0. Answers to the additional exercises A10. TOB is well known not to be sensitive to total expenditure.1 percent significance level with 7 degrees of freedom is 24. A college education was a positive influence for TRIP. FURN. and the sample was therefore greatly imbalanced. and READ and a negative one for FDAW. so we have indeed chosen the value of p that maximizes the probability of the outcome. The results for the logit analysis and the probit analysis were very similar. as might be expected.6248) = 165.6248 log L0 Answer: The likelihood ratio statistic is 2(–1403. in this case 7.1 In the case of FDHO and HOUS there were too few non-purchasing households to undertake the analysis sensibly (one and two. 10.32. d 2 log L( p ) n2 n−m 1 ⎞ ⎛1 = − − = −n 2 ⎜ + ⎟ 2 2 m ⎛ m⎞ dp ⎝m n−m⎠ ⎜1 − ⎟ n⎠ ⎝ This is negative. Compute the pseudo-R2 and confirm that it is equal to that reported in the output.62. respectively). compute the likelihood ratio statistic 2(log L – log L0). the reasons apparently being non-economic. The total expenditure of the household and the size of the household were both highly significant factors in the decision to make a purchase for all the categories of expenditure except TELE.

0222 0.52 3.65 –2.16 5.0028 –0.82 –0.0036 0.92 cases with probability <0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 57 0 >1 288 173 181 136 0 5 0 0 4 176 12 119 5 0 137 207 121 32 105 2 0 .0518 0.1202 0.64 6.0092 0.30 7.0419 0.1179 0.61 1.0000 0.0908 0.37 0.79 5.15 1.0850 0.19 Linear probability model.0017 0.0019 –0.0690 0.0036 t –0.30 –3.20 4.0926 0.0055 0.91 2.39 2.75 –3.11 0.39 –3.0347 0.31 5.40 0.0466 0.0040 0.0004 0.0350 0.1006 0.60 4.85 4.69 5.67 –4.2073 –0.0028 –0.43 1.0002 0.1348 0.0283 0.0495 0.0008 –0.0101 0.0000 0. dependent variable CATBUY EXPPC x 10–4 n FDHO FDAW HOUS TELE DOM TEXT FURN MAPP SAPP CLOT FOOT GASO TRIP LOCT HEAL ENT FEES TOYS READ EDUC TOB 868 827 867 858 454 482 329 244 467 847 686 797 309 172 821 824 676 592 764 288 368 b2 –0.15 0.27 –0.38 2.75 5.37 –4.0523 –0.0658 0.98 –0.0615 0.06 3.48 4.0255 0.0838 0.0121 0.63 1.0018 –0.66 1.0013 0.0599 –0.0227 –0.1137 0.0018 –0.15 0.56 10.26 8.1608 0.02 4.0012 –0.0162 0.22 5.0433 0.0027 –0.84 –1.81 4.68 0.0019 –0.0444 0.96 2.75 4.0316 0.61 –3.1901 0.0655 0.78 6.1721 t 0.0109 0.0540 0.21 REFAGE b3 –0.89 –1.29 6.66 –0.68 0.0029 –0.25 –1.83 1.41 7.79 4.25 –0.0060 0.90 –3.0153 SIZE t 0.0922 0.80 –3.29 –2.64 2.0025 0.0549 0.59 4.53 2.0854 0.12 COLLEGE b4 0.43 4.0041 –0.0029 0.0033 t –0.28 10.0012 –0.0375 0.20 3.0411 0.50 0.51 5.0011 0.25 2.0017 –0.42 5.1310 –0.02 2.0049 0.0123 0.32 3.16 5.0001 –0.0181 0.51 1.70 4.17 b3 0.52 3.19 1.0174 0.0005 0.71 5.99 1.0374 0.0050 0.0030 –0.0930 0.18 –2.1206 0.39 5.61 –5.

54 3.59 3.02 –2.38 –2.46 2.68 1.2414 0.7442 –0.7728 0.43 –4.0597 –0.6092 1.0209 –0.85 2.30 4.4491 0.83 4.2314 0.20 5.18 0.17 –1.01 6.86 –1.4632 0.6355 0.3137 –0.19 SIZE z – –0.2277 0.28 5.97 4.0049 –0.6281 1.63 – 1.0227 0.0143 –0.0338 – 0.02 4.90 – 1.20 Logit model.0202 –0.6456 – 1.03 – 1.0140 –0.0240 0.5529 0.22 5.0213 –0.34 0.55 –0.7604 0.0069 0.38 6.5087 1.7260 COLLEGE b4 – 0.22 –0.0084 0.5406 0.18 9.24 1.3510 0.3311 –0.83 4.2645 1.5275 0.2855 0.5439 4.3983 0.37 5.32 3.2953 1.85 –1.7446 0.17 1.0768 –0.8642 0.68 4.18 5.0141 – 1.10 7.74 0.9372 1.57 5.11 –1.18 z – 0.64 2.0637 b3 – 3.43 – 2.09 5.0300 0.1932 0.46 4.46 2.1904 0.0152 –0.0620 –0.65 –1.09 z – 0.54 5.15 0.0075 –0.0283 –0.3447 0.3071 0.10 3.0059 –0.34 –4.16 4.22 0.0823 –0.45 6.0788 0.1033 0.80 –1.4393 0.1817 0.49 5.3162 0.99 –5.5214 1. dependent variable CATBUY EXPPC x 10–4 n FDHO FDAW HOUS TELE DOM TEXT FURN MAPP SAPP CLOT FOOT GASO TRIP LOCT HEAL ENT FEES TOYS READ EDUC TOB – 827 – 858 454 482 329 244 467 847 686 797 309 172 821 824 676 592 764 288 368 – 3.0330 0.0294 0.19 –1.2648 0.05 3.96 7.78 9.1577 2.8601 0.0246 1.89 –2.31 3.5428 0.82 z .55 4.79 4.0081 –0.07 2.49 4.0163 b2 – 5.9863 0.0173 0.83 1.68 –3.6053 0.6211 – 0.44 4.0139 REFAGE b3 – –2.40 –0.5351 0.43 2.21 –0.1084 –0.05 –3.

5887 0.79 1.65 5.2467 0. dependent variable CATBUY EXPPC x 10–4 n FDHO FDAW HOUS TELE DOM TEXT FURN MAPP SAPP CLOT FOOT GASO TRIP LOCT HEAL ENT FEES TOYS READ EDUC TOB – 827 – 858 454 482 329 244 467 847 686 797 309 172 821 824 676 592 764 288 368 – 1.2951 – 0.19 –1.08 2.65 5.14 3.0031 –0.46 4.0141 0.09 6.0606 –0.4477 COLLEGE b4 – 0.89 –1.39 2.0088 –0.0086 REFAGE b3 – –3.36 0.0423 0.2630 0.17 5.6121 –0.21 Probit model.31 4.4806 0.43 2.8151 0.0448 0.0107 –0.1694 0.0051 –0.1135 0.32 5.53 6.11 –0.0174 0.0103 0.63 4.2770 0.4036 0.3535 0.0039 0.4417 –0.8299 0.0182 – 0.21 –2.1733 0.54 4.1791 –0.0145 0.0301 –0.6988 – 0.86 –1.2998 0.1105 0.3348 0.0168 –0.2188 0.10 z – 0.1628 0.92 –5.58 9.10 –2.3252 2.25 9.66 –5.0167 0.0135 0.0149 0.1841 0.0086 –0.4195 0.1168 0.7905 0.4869 1.18 z – 0.10 7.4932 0.1635 0.59 1.67 3.93 4.5129 0.31 3.72 – 1.69 –3.67 4.44 3.2160 0.09 – 1.18 0.64 1.1506 0.3296 0.1917 –0.98 0.43 4.0046 –0.96 4.48 5.48 6.4519 0.42 5.1452 0.16 –1.12 3.63 –2.28 5.35 4.04 0.36 7.55 – 2.0065 0.37 –4.3386 0.03 – 1.51 1.6842 0.62 5.0035 –0.11 –1.69 4.73 –1.18 SIZE z – –0.67 –2.84 z .20 4.65 –0.0082 –0.46 0.0172 – 0.08 5.0391 b3 – 3.59 2.2849 0.0428 –0.15 0.82 4.1556 0.2806 0.16 5.05 –0.54 2.0088 –0.3257 0.12 –3.0116 –0.0106 b2 – 5.3091 0.5130 0.62 2.

0930** 0.0020 0.0923** 0.0002 0. ** at 1 percent level.0036** –0.0993** 0.0174 0. two-tailed tests .0306 TELE 858 DOM 454 TEXT 482 FURN 329 MAPP 244 SAPP 467 CLOT 847 FOOT 686 * significant at 5 percent level.0028 –0.1179** 0.0316** 0.0087 0.0655** 0.0913** SIZE 0.0040* 0.0850* 0.0452** 0.0001 0.0101 0.1291** 0.0058 0.0203 0.0540** 0.0018 –0.0003** 0.0757** 0.0024** COLLEGE 0.0003 0.0060 0.0969** 0.0518** 0.0002** –0.0063** 0.1206** 0.1350** 0.0441** 0.0709** 0.0012 –0.0055 0.0688** 0.0181** 0.0168 0.0488** 0.0060* 0.0049 0.0001 –0.0092 0.0727** 0.0227 0.0240** 0.0018 –0.0838** 0.0071** 0.0039** 0.0893** 0.0982** 0.0002 0.0044* 0.0123 0.0860* 0.0004 0.0012 –0.0020 –0.0012 0.0351 –0.1286** 0.0041** 0.0690** 0.0019 –0.0926** 0.1202** 0.0453** 0.0008* 0.0433** 0.0076 0.0013** 0.0078 0.0045** 0.0000 0.1265** 0.0002 0.0019 0.0260** 0.0012 –0.0023** –0.0526** 0.0239 –0.0444** 0.22 Comparison of marginal effects n FDAW 827 LPM Logit Probit LPM Logit Probit LPM Logit Probit LPM Logit Probit LPM Logit Probit LPM Logit Probit LPM Logit Probit LPM Logit Probit LPM Logit Probit EXPPC4 0.0013** 0.0053 –0.0028** –0.0860* 0.0283 –0.0050 –0.0012 0.0444** 0.0121** 0.0040** –0.1332** 0.0453** REFAGE –0.0419** 0.1266** 0.0148 0.0018** –0.0012 –0.0087 0.0000 –0.0543** 0.

0002* –0. For both sexes.0535 0.0011 –0.0042 SIZE 0.1206** 0.0058** –0.1137** 0.0222 0.0002 –0.1769** –0.0020** –0.0032** 0.3 (a) Since p is a function of Z.1721** –0.0027** 0.0643** 0.0622** 0.0041** –0.0046 –0.0153 REFAGE –0.0375** 0.0600** 0.0124 0.1636** 0.0702** 0.0055** –0.0854** 0.0251** 0.1348** 0.0017** –0.1751** TRIP 309 LOCT 172 HEAL 821 ENT 824 FEES 676 TOYS 592 READ 764 EDUC 288 TOB 368 * significant at 5 percent level.1006** 0.0057** –0.0922** 0.0109 0.0339** 0.0040 –0.0707** 0.0599** 0.0837** 0.0034** COLLEGE 0.0039 0.0150 0.0635** 0.0270** 0.0012** 0.0311** 0.0255** 0.0292* 0.2408** 0. and hence the possible sensitivity of the participation rate to S is smaller.1765** 0.0658** 0.0466** 0.1508** –0.1124** 0.0350* 0.0034** –0. the greater is the participation rate.1878** 0.0085** 0.0021** –0.23 Comparison of marginal effects (continued) n GASO 797 LPM Logit Probit LPM Logit Probit LPM Logit Probit LPM Logit Probit LPM Logit Probit LPM Logit Probit LPM Logit Probit LPM Logit Probit LPM Logit Probit LPM Logit Probit EXPPC4 0. two-tailed tests A10.0974** 0.0092* 0.0656** –0.0036 –0.2 The finding that the marginal effect of educational attainment was lower for males than for females over most of the range S ≥ 9 is plausible because the probability of working is much closer to 1 for males than for females for S ≥ 9.0162** 0. and Z is a linear function of the X variables.0030** 0.0463* –0.1512** 0.1084** 0.0051** 0.0040 –0.1845** 0.0124 0.0016* –0.0654** –0.0673** 0.0030** –0.0155 0.0013* –0.0310** 0. and hence the smaller is the scope for it being increased by further education.0011 –0.0615** 0.0099 0.0003 –0.0096** 0. The OLS estimates of the marginal effect of educational attainment are given by the slope coefficients and they are very similar to the logit estimates at the mean.0029** –0.1230** 0. the reason being that most of the observations on S are confined to the middle part of the sigmoid curve where it is relatively linear.0549 0.0318** 0.0430* 0.1901** 0.0033** –0.1728** 0.0049** –0.1761** 0.1310** 0.0579 0.0153 0.0523** 0. The explanation of the finding that the marginal effect of educational attainment decreases with educational attainment for both males and females over the range S ≥ 9 is similar.0374** 0.0007 0.0015** –0. ** at 1 percent level. the marginal effect of Xj is dp ∂Z dp ∂p = = βj ∂X j dZ ∂X j dZ .0311** 0.0257** 0.0347** 0.0908** 0.1057** 0.2073** 0.0495** 0.0011 0.0086** 0.1029** 0.0017** 0.1608** 0. the greater is S.0229** 0.0090* 0.0411* –0.1083** 0.0018** 0. A10.0105** 0.2243** –0.

respectively.6 The joint density function is λn e ∑ . The loglikelihood is ⎛ e −3λ λ9 log L = log⎜ ⎜ 480 ⎝ The first-order condition for a turning point is ⎞ ⎟ ⎟ = −3λ + 9 log λ − log 480 ⎠ d log L 9 = −3 + = 0 dλ λ and so λ = 3. given the data. this is the likelihood function for λ. For every extra year of schooling. it is likely to be weaker for females than for males. the probability is reduced by 5.368 when evaluated at the means.0 percent for males and 4. This is a maximum because the second differential –9/λ2 is negative. (b) Yes. Test the coefficient of the slope dummy for S. The first-order condition is dl = −3e −3λ λ9 + e −3λ 9λ8 = 0 dλ again implying λ = 3. the respondents have passed the age at which most adults start families. . So for every extra year of age. and the older they are.2 percent for females. Note that the marginal effect of S has two components that must be added – the component due to S and the component due to SSQ.9 percent for males and 2.6 percent for females. the more educated the respondent.272*–0. Interpreting this as a likelihood function. A10. adding a male intercept dummy and male slope dummies for AGE and S.026. given the age of the cohort. Given that the cohort is aged 35–42. −λ X A10.368*0.154= –0. (c) Fit a probit regression for the combined sample. given that most females intending to have families will have started them by this time.050 and that of S is 0.5 The joint density of the three observations is e − λ λ 2 e − λ λ5 e − λ λ 2 . p = F(Z) is the cumulative standardized normal distribution. irrespective of their education. Hence the marginal effect of AGE is 0. 2! 5! 2! Reinterpreting this as a function of λ. In the case of probit analysis.049. Hence dp/dZ is just the standardized normal distribution. The first-order condition is n λ − ∑ Xi = 0 . the probability increases by 4.4 For low values of S. so one is travelling up the sigmoid function F(Z) and the probability is increasing.368*–0. the loglikelihood is i n log λ − λ ∑ X i . so the positive effect of schooling is also plausible. but.137 = –0. A10. this is 0. At the same time. For males.24 where βj is the coefficient of Xj in the expression for Z.094 = 0. For females the corresponding figures are 0.042 and 0. However. the more likely he or she is to have started having a family relatively late. Note: In this example it is not difficult to maximize L directly. the less likely they are to have small children in the household. because Z is a quadratic function of S. it reaches a maximum and further increases in S causes Z to fall and one starts travelling back down the sigmoid function. Z is increasing.132 = 0.272*0.

25 ˆ = 1 . the log-likelihood function for p is log L( p ) = m log p + (n − m ) log(1 − p ) Thus the LR statistic is ⎛⎛ ⎞ m ⎛ m ⎞⎞ ⎜ − (m log p 0 + (n − m ) log(1 − p 0 ))⎟ LR = 2⎜ 1− ⎟⎟ ⎟ ⎜ ⎜ m log n + (n − m ) log⎜ ⎟ n ⎠⎠ ⎝ ⎝⎝ ⎠ ⎛ ⎛1 − m / n ⎞⎞ ⎛ m/n ⎞ ⎜ ⎟⎟ = 2⎜ m log⎜ ⎜ p ⎟ ⎟ + (n − m ) log⎜ 1 − p ⎟ ⎟ ⎜ 0 0 ⎝ ⎠⎠ ⎝ ⎠ ⎝ If m = 40 and n = 100.64).9 The first derivative of the log-likelihood function is d log L( p ) m n − m =0 = − dp p 1− p and the second differential is d 2 log L( p ) m n−m =− 2 − 2 dp p (1 − p )2 Evaluated at p = m/n. confirms that we have maximized the loglikelihood.6 ⎞ ⎞ ⎛ 0. being Hence λ X λ2 negative. 1 m n−m m(n − m ) n n n n3 2 2 . A10.5 ⎟ ⎟ ⎟ = 4.9.7 The probability of X taking any particular value in any observation is 1/n. n must be at least equal to 3 since one observation is equal to 3. The second derivative is − n .03 ⎠⎠ ⎝ ⎠ ⎝ ⎝ We would reject the null hypothesis at the 5 percent level (critical value of chi-squared with one degree of freedom 3.8 From the solution to Exercise 10. which. we require n to be as small as possible. A10. To maximize this.5 ⎟ + 60 log⎜ 0.84) but not at the 1 percent level (critical value 6. A10. the LR statistic for H0: p = 0.5 is ⎛ ⎛ 0.4 ⎞ LR = 2⎜ ⎜ 40 log⎜ 0. Hence 3 is the maximum likelihood estimator. n2 n−m n3 d 2 log L( p ) 1 ⎞ 2⎛ 1 = − − = − + n = − ⎜ ⎟ m ⎛ m ⎞2 m(n − m ) dp 2 ⎝m n−m⎠ ⎜1 − ⎟ n⎠ ⎝ The variance of the ML estimate is given by ⎛ d 2 log L( p ) ⎞ ⎟ ⎜− ⎟ ⎜ dp 2 ⎠ ⎝ −1 ⎞ ⎛ n3 ⎟ =⎜ ⎜ m(n − m ) ⎟ ⎠ ⎝ −1 = m(n − m ) n3 The Wald statistic is therefore ⎞ ⎛m ⎞ ⎛m ⎜ − p0 ⎟ ⎜ − p0 ⎟ ⎝n ⎠ =⎝n ⎠ . Hence the joint density of the two observations in the sample is 1/n2. where X is the sample mean of X.

5)2 1 × 0.8. this is equal to (0.6 100 = 4.4 × 0. .17 .26 Given the data.4 − 0. and so the conclusion is the same as in Exercise A. Under the null hypothesis this has a chi-squared distribution with one degree of freedom.