
School of Economics

Introductory Econometrics
ECON2206/ECON3290 Tutorial Program
Sample Answers

Session 1, 2011

Table of Contents
Week 2 Tutorial Exercises: Problem Set; STATA Hints
Week 3 Tutorial Exercises: Review Questions; Problem Set
Week 4 Tutorial Exercises: Review Questions; Problem Set
Week 5 Tutorial Exercises: Review Questions; Problem Set
Week 6 Tutorial Exercises: Review Questions; Problem Set
Week 7 Tutorial Exercises: Review Questions; Problem Set
Week 8 Tutorial Exercises: Review Questions; Problem Set
Week 9 Tutorial Exercises: Review Questions; Problem Set
Week 10 Tutorial Exercises: Readings; Review Questions; Problem Set
Week 11 Tutorial Exercises: Review Questions; Problem Set
Week 12 Tutorial Exercises: Review Questions; Problem Set
Week 13 Tutorial Exercises: Review Questions; Problem Set

(Review Questions may or may not be discussed in tutorial classes; Problem Sets will be discussed in tutorial classes.)

Week 2 Tutorial Exercises


Problem Set
Q1. Wooldridge 1.1
(i) Ideally, we could randomly assign students to classes of different sizes. That is, each student is assigned a different class size without regard to any student characteristics such as ability and family background. For reasons we will see in Chapter 2, we would like substantial variation in class sizes (subject, of course, to ethical considerations and resource constraints).
(ii) A negative correlation means that larger class size is associated with lower performance. We might find a negative correlation because larger class size actually hurts performance. However, with observational data, there are other reasons we might find a negative relationship. For example, children from more affluent families might be more likely to attend schools with smaller class sizes, and affluent children generally score better on standardized tests. Another possibility is that, within a school, a principal might assign the better students to smaller classes. Or, some parents might insist their children are in the smaller classes, and these same parents tend to be more involved in their children's education.
(iii) Given the potential for confounding factors (some of which are listed in (ii)), finding a negative correlation would not be strong evidence that smaller class sizes actually lead to better performance. Some way of controlling for the confounding factors is needed, and this is the subject of multiple regression analysis.

Q2. Wooldridge 1.2
(i) Here is one way to pose the question: If two firms, say A and B, are identical in all respects except that firm A supplies job training one hour per worker more than firm B, by how much would firm A's output differ from firm B's?
(ii) Firms are likely to choose job training depending on the characteristics of workers. Some observed characteristics are years of schooling, years in the workforce, and experience in a particular job. Firms might even discriminate based on age, gender, or race. Perhaps firms choose to offer training to more or less able workers, where ability might be difficult to quantify but where a manager has some idea about the relative abilities of different employees. Moreover, different kinds of workers might be attracted to firms that offer more job training on average, and this might not be evident to employers.
(iii) The amount of capital and technology available to workers would also affect output. So, two firms with exactly the same kinds of employees would generally have different outputs if they use different amounts of capital or technology. The quality of managers would also have an effect.
(iv) No, unless the amount of training is randomly assigned. The many factors listed in parts (ii) and (iii) can contribute to finding a positive correlation between output and training even if job training does not improve worker productivity.

Q3. Wooldridge C1.3 (meap01_ch01.do)
(i) The largest is 100; the smallest is 0.
(ii) 38 out of 1,823, or about 2.1 percent of the sample.
(iii) 17.
(iv) The average of math4 is about 71.9 and the average of read4 is about 60.1. So, at least in 2001, the reading test was harder to pass.
(v) The sample correlation between math4 and read4 is about .843, which is a very high degree of (linear) association. Not surprisingly, schools that have high pass rates on one test have a strong tendency to have high pass rates on the other test.
(vi) The average of exppp is about $5,194.87. The standard deviation is $1,091.89, which shows rather wide variation in spending per pupil. [The minimum is $1,206.88 and the maximum is $11,957.64.]
(vii) About 8.7%. Note that, by a Taylor expansion, log(x + h) - log(x) = log(1 + h/x) ≈ h/x, for positive x and small h/x (in absolute value).
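For those who want to reproduce the numbers in Q3, a minimal Stata sketch follows; it assumes the textbook data file meap01.dta is in the working directory (the posted do-file meap01_ch01.do covers the same ground):

    use meap01, clear
    summarize math4 read4      // means: about 71.9 (math4) and 60.1 (read4)
    count if math4 == 100      // 38 schools with a perfect math4 pass rate
    count if math4 == 50       // 17 schools with a pass rate of exactly 50
    corr math4 read4           // sample correlation, about .843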

Q4. Wooldridge C1.4 (jtrain2_ch01.do)
(i) 185/445 ≈ .416 is the fraction of men receiving job training, or about 41.6%.
(ii) For men receiving job training, the average of re78 is about 6.35, or $6,350. For men not receiving job training, the average of re78 is about 4.55, or $4,550. The difference is $1,800, which is very large. On average, the men receiving the job training had earnings about 40% higher than those not receiving training.
(iii) About 24.3% of the men who received training were unemployed in 1978; the figure is 35.4% for men not receiving training. This, too, is a big difference.
(iv) The differences in earnings and unemployment rates suggest the training program had strong, positive effects. Our conclusions about economic significance would be stronger if we could also establish statistical significance (which is done in Computer Exercise C9.10 in Chapter 9).

STATA Hints

Some of the STATA do-files for solving the Problem Set are posted on the course website. It is important to understand the commands in the do-files so that you will be able to write your own do-files for your assignments and the course project.
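The hints above are easier to follow with a concrete template. Below is a minimal do-file skeleton of the kind the posted files follow; the log-file name is an illustrative assumption, and the commands mirror Q4 (jtrain2.dta):

    clear all
    capture log close
    log using week2_tutorial.log, replace
    use jtrain2, clear
    summarize re78 if train == 1    // average 1978 earnings, trainees
    summarize re78 if train == 0    // average 1978 earnings, non-trainees
    tabulate unem78 train           // 1978 unemployment status by training status
    log close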

Week 3 Tutorial Exercises


Review Questions (these may or may not be discussed in tutorial classes)
The minimum requirement for OLS to be carried out for the data set {(xi, yi), i = 1, ..., n} with sample size n > 2 is that the sample variance of x is positive. In what circumstances is the sample variance of x zero?
When all observations on x have the same value, there is no variation in x.
The OLS estimation of the simple regression model has the following properties: a) the sum of the residuals is zero; b) the sample covariance of the residuals and x is zero. Why? How would you relate them to the least squares principle?
These are derived from the first-order conditions for minimising the sum of squared residuals (SSR).
Convince yourself that the point (x̄, ȳ), the sample means of x and y, is on the sample regression function (SRF), which is a straight line.
This can be derived from the first of the first-order conditions for minimising the SSR.
How do you know that SST = SSE + SSR is true?
The dependent variable values can be expressed as fitted value plus residual: yi = ŷi + ûi. After taking the average away from both sides of the equation, we find (yi - ȳ) = (ŷi - ȳ) + ûi. The required SST = SSE + SSR is obtained by squaring and summing both sides of this equation, noting that ŷi - ȳ = β̂1(xi - x̄) and using the second of the first-order conditions for minimising the SSR. (A compact version of this derivation is displayed after these questions.)
Which of the following models is (are) nonlinear model(s)?
a) sales = β0/[1 + exp(-β1 ad_expenditure)] + u;
b) sales = β0 + β1 log(ad_expenditure) + u;
c) sales = β0 + β1 exp(ad_expenditure) + u;
d) sales = exp(β0 + β1 ad_expenditure + u).
The model in a) is nonlinear (in parameters). Note that d) becomes linear after taking logs.
Can you follow the proofs of Theorems 2.1-2.3?
You should be able to follow the proofs. If not at this point, try again.
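For reference, the SST = SSE + SSR argument sketched above can be written out compactly (in LaTeX notation):

    \begin{aligned}
    y_i - \bar{y} &= (\hat{y}_i - \bar{y}) + \hat{u}_i \\
    \underbrace{\sum_{i=1}^{n}(y_i-\bar{y})^2}_{\text{SST}}
      &= \underbrace{\sum_{i=1}^{n}(\hat{y}_i-\bar{y})^2}_{\text{SSE}}
       + 2\sum_{i=1}^{n}(\hat{y}_i-\bar{y})\hat{u}_i
       + \underbrace{\sum_{i=1}^{n}\hat{u}_i^2}_{\text{SSR}},
    \end{aligned}

where the cross term vanishes because the first-order conditions \sum_i \hat{u}_i = 0 and \sum_i x_i \hat{u}_i = 0 imply \sum_i (\hat{y}_i - \bar{y})\hat{u}_i = 0.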

Problem Set


Q1. Wooldridge 2.4
(i) When cigs = 0, predicted birth weight is 119.77 ounces. When cigs = 20, bwght-hat = 109.49. This is about an 8.6% drop.
(ii) Not necessarily. There are many other factors that can affect birth weight, particularly overall health of the mother and quality of prenatal care. These could be correlated with cigarette smoking during pregnancy. Also, something such as caffeine consumption can affect birth weight, and might also be correlated with cigarette smoking.
(iii) If we want a predicted bwght of 125, then cigs = (125 - 119.77)/(-.514) ≈ -10.18, or about -10 cigarettes! This is nonsense, of course, and it shows what happens when we are trying to predict something as complicated as birth weight with only a single explanatory variable. The largest predicted birth weight is necessarily 119.77. Yet almost 700 of the births in the sample had a birth weight higher than 119.77.
(iv) 1,176 out of 1,388 women did not smoke while pregnant, or about 84.7%. Because we are using only cigs to explain birth weight, we have only one predicted birth weight at cigs = 0. The predicted birth weight is necessarily roughly in the middle of the observed birth weights at cigs = 0, and so we will under-predict high birth weights.

Q2. Wooldridge 2.7
(i) When we condition on inc in computing an expectation, inc becomes a constant. So E(u|inc) = E(√inc·e|inc) = √inc·E(e|inc) = √inc·0 = 0, because E(e|inc) = E(e) = 0.
(ii) Again, when we condition on inc in computing a variance, inc becomes a constant. So Var(u|inc) = Var(√inc·e|inc) = (√inc)²Var(e|inc) = σe²·inc, because Var(e|inc) = σe².
(iii) Families with low incomes do not have much discretion about spending; typically, a low-income family must spend on food, clothing, housing, and other necessities. Higher income people have more discretion, and some might choose more consumption while others more saving. This discretion suggests wider variability in saving among higher income families.

Q3. Wooldridge C2.6 (meap93_ch02.do)
(i) It seems plausible that another dollar of spending has a larger effect for low-spending schools than for high-spending schools. At low-spending schools, more money can go toward purchasing more books, computers, and for hiring better qualified teachers. At high levels of spending, we would expect little, if any, effect because the high-spending schools already have high-quality teachers, nice facilities, plenty of books, and so on.
(ii) If we take changes, as usual, we obtain Δmath10 = β1Δlog(expend) ≈ (β1/100)(%Δexpend), just as in the second row of Table 2.3. So, if %Δexpend = 10, Δmath10 = β1/10.
(iii) The regression results are
math10-hat = -69.34 + 11.16 log(expend), n = 408, R² = .0297.
(iv) If expend increases by 10 percent, math10-hat increases by about 1.1 percentage points. This is not a huge effect, but it is not trivial for low-spending schools, where a 10 percent increase in spending might be a fairly small dollar amount.
(v) In this data set, the largest value of math10 is 66.7, which is not especially close to 100. In fact, the largest fitted value is only about 30.2.
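A short Stata sketch for Q3 follows (assuming meap93.dta from the textbook files; skip the gen line if lexpend is already in the data):

    use meap93, clear
    gen lexpend = log(expend)
    regress math10 lexpend      // slope about 11.16, R-squared about .0297
    predict math10hat           // fitted values
    summarize math10 math10hat  // max of math10 is 66.7; max fitted value about 30.2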


Q4. Wooldridge C2.7 (charity_ch02.do)
(i) The average gift is about 7.44 Dutch guilders. Out of 4,268 respondents, 2,561 did not give a gift, or about 60 percent.
(ii) The average number of mailings per year is about 2.05. The minimum value is .25 (which presumably means one mailing every four years) and the maximum value is 3.5.
(iii) The estimated equation is
gift-hat = 2.01 + 2.65 mailsyear, n = 4,268, R² = .0138.
(iv) The slope coefficient from part (iii) means that each mailing per year is associated with (perhaps even causes) an estimated 2.65 additional guilders, on average. Therefore, if each mailing costs one guilder, the expected profit from each mailing is estimated to be 1.65 guilders. This is only the average, however. Some mailings generate no contributions, or a contribution less than the mailing cost; other mailings generate much more than the mailing cost.
(v) Because the smallest mailsyear in the sample is .25, the smallest predicted value of gift is 2.01 + 2.65(.25) ≈ 2.67. Even if we look at the overall population, where some people have received no mailings, the smallest predicted value is about two. So, with this estimated equation, we never predict zero charitable gifts.

Week 4 Tutorial Exercises


Review Questions (these may or may not be discussed in tutorial classes)
What do we mean when we say "regress wage on educ and expr"?
Use OLS to estimate the multiple linear regression model wage = β0 + β1educ + β2expr + u.
Why and under what circumstance do we need to control for expr in the regression model in order to quantify the effect of educ on wage?
We need to control for expr in the model when the partial (ceteris paribus) effect of educ on wage is the parameter of interest and expr and educ are correlated.
What is the bias of an estimator?
Bias = E(β̂j) - βj, the difference between the expected value of the estimator and the true parameter value.
What is the omitted variable bias?
It is the bias resulting from omitting relevant variables that are correlated with the explanatory variables in a regression model.
What is the consequence of adding an irrelevant variable to a regression model?
Including an irrelevant variable will generally increase the variance of the OLS estimators.
What is the requirement of the ZCM assumption, in your own words?
You must do this by yourself.
Why is assuming E(u) = 0 not restrictive when an intercept is included in the regression model?
When the mean value of the disturbance is not zero, you can always define the new disturbance as the original disturbance minus the mean value, and define the new intercept as the original intercept plus the mean value.
In terms of notation, why do we need two subscripts for independent variables?
In our notation, the first subscript indexes observations (1st obs, 2nd obs, etc.) and the second subscript indexes variables (1st variable, 2nd variable, etc.).
Using OLS to estimate a multiple regression model with k independent variables and an intercept, how many first-order conditions are there?
k + 1.
How do you know that the OLS estimators are linear combinations of the observations on the dependent variable?
By the first-order conditions, the OLS estimators are the solution to k + 1 linear equations with k + 1 unknowns. The right-hand sides of the equations are linear functions of the observations on the dependent variable (say y). Hence the solution is a set of linear combinations of the y-observations.
What is the Gauss-Markov theorem?
Read the textbook.
Try to explain why R² never decreases (and is likely to increase) when additional explanatory variables are added to the regression model.
First, the smaller the SSR, the greater the R-squared. OLS chooses parameter estimates to minimise the SSR. With extra explanatory variables, the R-squared does not decrease because the SSR does not increase, which is true because OLS has a larger parameter space in which to choose estimates (more possibilities for making the SSR small).
What is an endogenous explanatory variable? An exogenous explanatory variable?
See the top of page 88.
What is multicollinearity and its likely effect on the OLS estimators?
Read pages 95-99.

Problem Set (these will be discussed in tutorial classes)


Q1. Wooldridge 3.2
(i) Yes. Because of budget constraints, it makes sense that, the more siblings there are in a family, the less education any one child in the family has. To find the increase in the number of siblings that reduces predicted education by one year, we solve 1 = .094(Δsibs), so Δsibs = 1/.094 ≈ 10.6.
(ii) Holding sibs and feduc fixed, one more year of mother's education implies .131 years more of predicted education. So if a mother has four more years of education, her son is predicted to have about half a year (.524) more years of education.
(iii) Since the number of siblings is the same, but meduc and feduc are both different, the coefficients on meduc and feduc both need to be accounted for. The predicted difference in education between B and A is .131(4) + .210(4) = 1.364.

Q2. Wooldridge 3.10
(i) Because x1 is highly correlated with x2 and x3, and these latter variables have large partial effects on y, the simple and multiple regression coefficients on x1 can differ by large amounts. We have not done this case explicitly, but given equation (3.46) and the discussion with a single omitted variable, the intuition is pretty straightforward.
(ii) Here we would expect β̃1 and β̂1 to be similar (subject, of course, to what we mean by "almost uncorrelated"). The amount of correlation between x2 and x3 does not directly affect the multiple regression estimate on x1 if x1 is essentially uncorrelated with x2 and x3.
(iii) In this case we are (unnecessarily) introducing multicollinearity into the regression: x2 and x3 have small partial effects on y, and yet x2 and x3 are highly correlated with x1. Adding x2 and x3 likely increases the standard error of the coefficient on x1 substantially, so se(β̂1) is likely to be much larger than se(β̃1).
(iv) In this case, adding x2 and x3 will decrease the residual variance without causing much collinearity (because x1 is almost uncorrelated with x2 and x3), so we should see se(β̂1) smaller than se(β̃1). The amount of correlation between x2 and x3 does not directly affect se(β̂1).

Q3. Wooldridge C3.2 (hprice1_ch03.do)
(i) The estimated equation is
price-hat = -19.32 + .128 sqrft + 15.20 bdrms, n = 88, R² = .632.
(ii) Holding square footage constant, Δprice-hat = 15.20 Δbdrms. Hence the price increases by 15.20, which means $15,200, for one more bedroom.
(iii) Now Δprice-hat = .128 Δsqrft + 15.20 Δbdrms = .128(140) + 15.20(1) = 33.12, that is, $33,120. Because the size of the house is also increasing, the effect is larger than that in (ii).
(iv) About 63.2%.
(v) The predicted price is -19.32 + .128(2438) + 15.20(4) = 353.544, or $353,544.
(vi) The residual is 300,000 - 353,544 = -53,544. The buyer under-paid, judged by the predicted price. But, of course, there are many other features of a house (some that we cannot even measure) that affect price, and we have not controlled for these.
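The computations in parts (v) and (vi) of Q3 can be done from the stored coefficients; a sketch assuming hprice1.dta (price measured in thousands of dollars):

    use hprice1, clear
    regress price sqrft bdrms
    * part (v): predicted price for sqrft = 2438, bdrms = 4
    display _b[_cons] + _b[sqrft]*2438 + _b[bdrms]*4
    * part (vi): residual for a house that sold for $300,000
    display 300 - (_b[_cons] + _b[sqrft]*2438 + _b[bdrms]*4)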

Q4. Wooldridge C3.6 (wage2_ch03_c3_6.do)
(i) The slope coefficient from regressing IQ on educ is δ̃1 = 3.53383.
(ii) The slope coefficient from regressing log(wage) on educ is β̃1 = .05984.
(iii) The slope coefficients from regressing log(wage) on educ and IQ are β̂1 = .03912 and β̂2 = .00586, respectively.
(iv) As the theory predicts, β̃1 = β̂1 + β̂2δ̃1 = .03912 + .00586(3.53383) ≈ .05984.
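The omitted-variable relationship in (iv) can be verified directly; a sketch assuming wage2.dta with lwage already defined:

    use wage2, clear
    regress IQ educ           // delta1-tilde, about 3.53383
    regress lwage educ        // beta1-tilde, about .05984
    regress lwage educ IQ     // beta1-hat and beta2-hat: .03912 and .00586
    display .03912 + .00586*3.53383   // reproduces beta1-tilde, about .05984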

Week 5 Tutorial Exercises


Review Questions (these may or may not be discussed in tutorial classes)
What are the CLM assumptions?
These are MLR.1, MLR.2, MLR.3 and MLR.6. MLR.6 is a very strong assumption that implies both MLR.4 and MLR.5.
What is the sampling distribution of the OLS estimators under the CLM assumptions?
Under the CLM assumptions, the OLS estimators follow the normal distribution.
What are the standard errors of the OLS estimators?
The standard error is the square root of the estimated variance of the estimator.
What is the null hypothesis about a parameter?
In this context, the null hypothesis is a statement that assigns a known (hypothesised) value to the parameter of interest. Usually, the null is a maintained proposition (from economic theories or experience) that will only be rejected on very strong evidence.
What is a one-tailed (two-tailed) alternative hypothesis?
This should be clear once you finish your reading.
In testing hypotheses, what is a Type 1 (Type 2) error? What is the level of significance?
A Type 1 error is the rejection of a true null. A Type 2 error is the non-rejection of a false null. The level of significance is the probability of a Type 1 error.
The decision rule we use can be stated as "reject the null if the t-statistic exceeds the critical value". How is the critical value determined?
The critical value is determined by the level of significance (or the level of confidence in the case of confidence intervals) and the distribution of the test statistic. For example, for the t-stat, we use the t-distribution when the sample size is small, and the normal distribution when the sample size is large (df > 120). (A sketch of these computations in Stata follows these questions.)
Justify the statement "Given the observed test statistic, the p-value is the smallest significance level at which the null hypothesis would be rejected."
If you used the observed t-stat as your critical value, you would just reject the null (because the t-stat is equal to this value). This critical value (the t-stat value) implies a level of significance, say p, which is exactly the p-value. For any significance level smaller than p, the corresponding critical value is more extreme than the t-stat value and does not lead to the rejection of the null. Hence, p is the smallest significance level at which the null would be rejected.
What is the 90% confidence interval for a parameter?
It is an interval [L, U] defined by two sample statistics: the lower bound L and the upper bound U. The probability that [L, U] covers the parameter is 0.9.
In constructing a confidence interval for a parameter, what is the level of confidence?
It is the probability that the CI covers the parameter.
When the level of confidence increases, how would the width of the confidence interval change (holding other things fixed)?
The width will increase.
Try to convince yourself that the event "the 90% confidence interval covers a hypothesised value of the parameter" is the same as the event "the null of the parameter being the hypothesised value cannot be rejected in favour of the two-tailed alternative at the 10% level of significance".
It becomes apparent if you start with P(|t-stat| ≤ c) = 0.9 and disentangle the expression for the t-stat.
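Critical values and confidence intervals are easy to compute directly in Stata; a small illustrative sketch (the commented regression lines assume a data set with lwage and educ loaded):

    display invttail(86, .025)     // two-tailed 5% critical value, df = 86
    display invnormal(.95)         // one-tailed 5% critical value, large df
    * after a regression, a 95% CI for the educ coefficient by hand:
    * regress lwage educ
    * display _b[educ] - invttail(e(df_r), .025)*_se[educ]
    * display _b[educ] + invttail(e(df_r), .025)*_se[educ]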

Problem Set (these will be discussed in tutorial classes)


Q1. Wooldridge 4.1 (i) and (iii) generally cause the t statistics not to have a t distribution under H0. Homoskedasticity is one of the CLM assumptions. An important omitted variable violates Assumption MLR.4 (hence MLR.6) in general. The CLM assumptions contain no mention of the sample correlations among independent variables, except to rule out the case where the correlation is one (MLR.3).

Q2. Wooldridge 4.2
(i) H0: β3 = 0. H1: β3 > 0.
(ii) The proportionate effect on salary-hat is .00024(50) = .012. To obtain the percentage effect, we multiply this by 100: 1.2%. Therefore, a 50 point ceteris paribus increase in ros is predicted to increase salary by only 1.2%. Practically speaking, this is a very small effect for such a large change in ros.
(iii) The 10% critical value for a one-tailed test, using df = ∞, is obtained from Table G.2 as 1.282. The t statistic on ros is .00024/.00054 ≈ .44, which is well below the critical value. Therefore, we fail to reject H0 at the 10% significance level.
(iv) Based on this sample, the estimated ros coefficient appears to be different from zero only because of sampling variation. On the other hand, including ros may not be causing any harm; it depends on how correlated it is with the other independent variables (although these are very significant even with ros in the equation).

Q3. Wooldridge 4.5
(i) .412 ± 1.96(.094), or about [.228, .596].
(ii) No, because the value .4 is well inside the 95% CI.
(iii) Yes, because 1 is well outside the 95% CI.

Q4. Wooldridge C4.8 (401ksubs_ch04.do)
(i) There are 2,017 single people in the sample of 9,275.
(ii) The estimated equation is
nettfa-hat = -43.04 + .799 inc + .843 age
                 (4.08)    (.060)       (.092)
n = 2,017, R² = .119.
The coefficient on inc indicates that one more dollar in income (holding age fixed) is reflected in about 80 more cents in predicted nettfa; no surprise there. The coefficient on age means that, holding income fixed, if a person gets another year older, his/her nettfa is predicted to increase by about $843. (Remember, nettfa is in thousands of dollars.) Again, this is not surprising.
(iii) The intercept is not very interesting, as it gives the predicted nettfa for inc = 0 and age = 0. Clearly, there is no one with even close to these values in the relevant population.
(iv) The t statistic is (.843 - 1)/.092 ≈ -1.71. Against the one-sided alternative H1: β2 < 1, the p-value is about .044. Therefore, we can reject H0: β2 = 1 at the 5% significance level (against the one-sided alternative).
(v) The slope coefficient on inc in the simple regression is about .821, which is not very different from the .799 obtained in part (ii). As it turns out, the correlation between inc and age in the sample of single people is only about .039, which helps explain why the simple and multiple regression estimates are not very different; see the text also.
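A sketch of the calculations for parts (i) and (iv), assuming 401ksubs.dta, where single-person households have fsize == 1:

    use 401ksubs, clear
    count if fsize == 1                     // 2,017 single people
    regress nettfa inc age if fsize == 1
    * one-sided test of H0: beta_age = 1 against H1: beta_age < 1
    display (_b[age] - 1)/_se[age]          // t statistic, about -1.71
    display ttail(e(df_r), abs((_b[age] - 1)/_se[age]))   // one-sided p-value, about .044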


Week 6 Tutorial Exercises


Review Questions (these may or may not be discussed in tutorial classes)
How would you test hypotheses about a single linear combination of parameters?
We may re-parameterise the regression model to isolate the single linear combination. The OLS estimation of the re-parameterised model then provides the estimate of the single linear combination and the related standard error. See the example in the textbook and lecture slides.
What are exclusion restrictions for a regression model?
Exclusion restrictions are the null hypothesis that a group of x-variables have zero coefficients in the regression model.
What are restricted and unrestricted models?
When the null hypothesis is a set of restrictions on the parameters, the regression under the null is called the restricted model, while the regression under the alternative (which simply states that the null is false) is called the unrestricted model.
How do you compute the F-statistic, given that you have the SSRs?
The general F-stat is based on the relative difference between the SSRs under the restricted and unrestricted models: F = [(SSRr - SSRur)/q]/[SSRur/(n - k - 1)], where q is the number of restrictions and n - k - 1 is the df of the unrestricted model. (You should be able to explain the meanings of the various symbols; a Stata sketch follows these questions.)
What are general linear restrictions on parameters?
These are linear equations on the parameters. For example, β1 - 2β2 + 3β3 = 0.
What is the test for the overall significance of a regression?
This is the F-test for the null hypothesis that all the coefficients on the x-variables are zero.
How would you report your regression results?
See the guidelines in Section 4.6 of the textbook.
Why would you care about the asymptotic properties of the OLS estimators?
When MLR.6 does not hold, the finite-sample distribution of the OLS estimators is not available. For reasonably large samples (large n), we use the asymptotic distribution of the OLS estimators, which is known from studying the asymptotic properties, to approximate the finite-sample distribution of the OLS estimators, which is unknown.
Comparing the inference procedures in Chapter 5 with those in Chapter 4, can you list the similarities and differences?
It is beneficial to make the list by yourself.
Under MLR.1-MLR.5, the OLS estimators are consistent, asymptotically normal, and asymptotically efficient. Try to explain these properties in your own words.
This is really a test of your understanding of these notions in econometrics.
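The SSR form of the F statistic is easy to compute from two regressions; an illustrative sketch using wage1.dta (testing the joint exclusion of exper and tenure is just an example):

    use wage1, clear
    regress lwage educ exper tenure       // unrestricted model
    scalar ssr_ur = e(rss)
    scalar df_ur  = e(df_r)
    regress lwage educ                    // restricted model (q = 2)
    scalar ssr_r = e(rss)
    scalar Fstat = ((ssr_r - ssr_ur)/2)/(ssr_ur/df_ur)
    display "F = " Fstat
    display "p-value = " Ftail(2, df_ur, Fstat)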

Problem Set (these will be discussed in tutorial classes)

Q1. Wooldridge 4.6
(i) With df = n - 2 = 86, we obtain the 5% critical value from Table G.2 with df = 90. Because each test is two-tailed, the critical value is 1.987. The t statistic for H0: β0 = 0 is about -.89, which is much less than 1.987 in absolute value. Therefore, we fail to reject H0: β0 = 0. The t statistic for H0: β1 = 1 is (.976 - 1)/.049 ≈ -.49, which is even less significant. (Remember, we reject H0 in favor of H1 in this case only if |t| > 1.987.)
(ii) We use the SSR form of the F statistic. We are testing q = 2 restrictions and the df in the unrestricted model is 86. We are given SSRr = 209,448.99 and SSRur = 165,644.51. Therefore,
F = [(209,448.99 - 165,644.51)/2]/[165,644.51/86] ≈ 11.37,
which is a strong rejection of H0: from Table G.3c, the 1% critical value with 2 and 90 df is 4.85.
(iii) We use the R-squared form of the F statistic. We are testing q = 3 restrictions and there are 88 - 5 = 83 df in the unrestricted model. The F statistic is F = [(.829 - .820)/3]/[(1 - .829)/83] ≈ 1.46. The 10% critical value (again using 90 denominator df in Table G.3a) is 2.15, so we fail to reject H0 at even the 10% level. In fact, the p-value is about .23.
(iv) If heteroskedasticity were present, Assumption MLR.5 would be violated, and the F statistic would not have an F distribution under the null hypothesis. Therefore, comparing the F statistic against the usual critical values, or obtaining the p-value from the F distribution, would not be especially meaningful.


Q2. Wooldridge 4.8
(i) We use Property VAR.3 from Appendix B: Var(β̂1 - 3β̂2) = Var(β̂1) + 9 Var(β̂2) - 6 Cov(β̂1, β̂2).
(ii) t = (β̂1 - 3β̂2 - 1)/se(β̂1 - 3β̂2), so we need the standard error of β̂1 - 3β̂2.
(iii) Because θ1 = β1 - 3β2, we can write β1 = θ1 + 3β2. Plugging this into the population model gives
y = β0 + (θ1 + 3β2)x1 + β2x2 + β3x3 + u = β0 + θ1x1 + β2(3x1 + x2) + β3x3 + u.
This last equation is what we would estimate by regressing y on x1, 3x1 + x2, and x3. The coefficient and standard error on x1 are what we want.

Q3. Wooldridge 4.10
(i) We need to compute the F statistic for the overall significance of the regression with n = 142 and k = 4: F = [.0395/(1 - .0395)](137/4) ≈ 1.41. The 5% critical value with 4 and 120 df is 2.45, which is well above the value of F. Therefore, we fail to reject H0: β1 = β2 = β3 = β4 = 0 at the 10% level. No explanatory variable is individually significant at the 5% level. The largest absolute t statistic is on dkr, t_dkr ≈ 1.60, which is not significant at the 5% level against a two-sided alternative.
(ii) The F statistic (with the same df) is now [.0330/(1 - .0330)](137/4) ≈ 1.17, which is even lower than in part (i). None of the t statistics is significant at a reasonable level.
(iii) We probably should not use the logs, as the logarithm is not defined for firms that have zero for dkr or eps. Therefore, we would lose some firms in the regression.
(iv) It seems very weak. There are no significant t statistics at the 5% level (against a two-sided alternative), and the F statistics are insignificant in both cases. Plus, less than 4% of the variation in return is explained by the independent variables.

Q4. Wooldridge C4.9 (discrim_c4_9.do)
(i) The results from the OLS regression, with standard errors in parentheses, are
log(psoda)-hat = -1.46 + .073 prpblck + .137 log(income) + .380 prppov
                      (0.29)   (.031)              (.027)                   (.133)
n = 401, R² = .087.
The p-value for testing H0: β1 = 0 against the two-sided alternative is about .018, so we reject H0 at the 5% level but not at the 1% level.
(ii) The correlation between log(income) and prppov is about -.84, indicating a strong degree of multicollinearity. Yet each coefficient is very statistically significant: the t statistic for log(income) is about 5.1 and that for prppov is about 2.86 (two-sided p-value = .004).
(iii) The OLS regression results when log(hseval) is added are
log(psoda)-hat = -.84 + .098 prpblck - .053 log(income) + .052 prppov + .121 log(hseval)
                      (.29)   (.029)             (.038)                  (.134)             (.018)
n = 401, R² = .184.
The coefficient on log(hseval) is an elasticity: a one percent increase in housing value, holding the other variables fixed, increases the predicted price by about .12 percent. The two-sided p-value is zero to three decimal places.
(iv) Adding log(hseval) makes log(income) and prppov individually insignificant (at even the 15% significance level against a two-sided alternative for log(income), and prppov does not have a t statistic even close to one in absolute value). Nevertheless, they are jointly significant at the 5% level because the outcome of the F(2, 396) statistic is about 3.52 with p-value = .030. All of the control variables, log(income), prppov, and log(hseval), are highly correlated, so it is not surprising that some are individually insignificant.
(v) Because the regression in (iii) contains the most controls, log(hseval) is individually significant, and log(income) and prppov are jointly significant, (iii) seems the most reliable. It holds fixed three measures of income and affluence. Therefore, a reasonable estimate is that if prpblck increases by .10, psoda is estimated to increase by about 1%, other factors held fixed.




Q5. Wooldridge 5.2
This is about the inconsistency of the simple regression of pctstck on funds. A higher tolerance of risk means more willingness to invest in the stock market, so β2 > 0. By assumption, funds and risktol are positively correlated, so δ1 > 0. Now we use equation (5.5): plim(β̃1) = β1 + β2δ1 > β1, so β̃1 has a positive inconsistency (asymptotic bias). This makes sense: if we omit risktol from the regression and it is positively correlated with funds, some of the estimated effect of funds is actually due to the effect of risktol.


Q6. Wooldridge C5.1 (wage1_c5_1.do)
(i) The estimated equation is
wage-hat = -2.87 + .599 educ + .022 exper + .169 tenure
                (0.73)    (.051)            (.012)            (.022)
n = 526, R² = .306, σ̂ = 3.085.
Below is a histogram of the 526 residuals ûi, i = 1, 2, ..., 526. The histogram uses 27 bins, which is suggested by the formula in the Stata manual for 526 observations. For comparison, the normal distribution that provides the best fit to the histogram is also plotted.
[Figure: histogram of the wage-equation residuals with the best-fitting normal density overlaid.]


(ii) With log(wage) as the dependent variable, the estimated equation is
log(wage)-hat = .284 + .092 educ + .0041 exper + .022 tenure
                      (.104)   (.007)           (.0017)            (.003)
n = 526, R² = .316, σ̂ = .441.
The histogram for the residuals from this equation, with the best-fitting normal distribution overlaid, is given below:
[Figure: histogram of the log(wage)-equation residuals with the best-fitting normal density overlaid.]

(iii) The residuals from the log(wage) regression appear to be more normally distributed. Certainly the histogram in part (ii) fits under its comparable normal density better than in part (i), and the histogram for the wage residuals is notably skewed to the left. In the wage regression there are some very large residuals (roughly equal to 15) that lie almost five estimated standard deviations (σ̂ = 3.085) from the mean of the residuals, which is identically zero, of course. Residuals far from zero do not appear to be nearly as much of a problem in the log(wage) regression.
The skewness and kurtosis of the residuals from the OLS regression may be used to test the null that u is normally distributed (skewness = 0 and kurtosis = 3 under the null). The null distribution of the Jarque-Bera test statistic (which takes into account both skewness and kurtosis) is approximately the chi-squared with 2 degrees of freedom. Hence, at the 5% level, we reject the normality of u whenever the JB statistic is greater than 5.99. In STATA, this test is carried out by the command sktest. With the data in WAGE1.RAW, the JB statistic is extremely large for the wage model and 10.59 for the log(wage) model. While normality is rejected for both models, we do see that the JB statistic is much smaller for the log(wage) model [also compare the p-values of the skewness and excess-kurtosis tests with those from the wage model]. Hence normality appears to be a reasonable approximation to the distribution of the error term in the log(wage) model.
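The residual analysis above can be reproduced as follows (a sketch assuming wage1.dta with lwage defined):

    use wage1, clear
    regress wage educ exper tenure
    predict uhat_level, residuals
    regress lwage educ exper tenure
    predict uhat_log, residuals
    histogram uhat_log, normal     // histogram with best-fitting normal overlaid
    sktest uhat_level uhat_log     // skewness/kurtosis (normality) tests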


Week 7 Tutorial Exercises


Review Questions (these may or may not be discussed in tutorial classes)
What are the advantages of using the log of a variable in regression? Find the rules of thumb for taking logs.
See page 191 of Wooldridge (pages 198-199 for the 3rd edition).
Be careful when you interpret the coefficients of explanatory variables in a model where some variables are in logarithm. Do you remember Table 2.3?
Consult Table 2.3.
How do you compute the change in y caused by Δx when the model is built for log(y)?
A precise computation is given by Equation (6.8) of Wooldridge.
Why do we need interaction terms in regression models?
An interaction term is needed if the partial effect of an explanatory variable is linearly related to another explanatory variable. See (6.17) for an example.
What is the adjusted R-squared? What is the difference between it and the R-squared?
The primary attractiveness of the adjusted R-squared is that it imposes a penalty for adding additional explanatory variables to a model. The R-squared can never fall when a new explanatory variable is added. However, the adjusted R-squared will fall if the t-ratio on the new variable is less than one in absolute value.
How do you construct an interval prediction for given x-values?
This is nicely summarised in Equations (6.27)-(6.31) and the surrounding text.
How do you predict y for given x-values when the model is built for log(y)?
Check the list on page 212 (page 220 for the 3rd edition).
What is involved in residual analysis?
See pages 209-210 (pages 217-218 for the 3rd edition).

Problem Set (these will be discussed in tutorial classes)

Q1. Wooldridge 6.4
(i) Holding all other factors fixed, we have
Δlog(wage) = β1Δeduc + β2Δ(educ·pareduc) = (β1 + β2 pareduc)Δeduc.
Dividing both sides by Δeduc gives the result. The sign of β2 is not obvious, although β2 > 0 if we think a child gets more out of another year of education the more highly educated are the child's parents.
(ii) We use the values pareduc = 32 and pareduc = 24 to interpret the coefficient on educ·pareduc. The difference in the estimated return to education is .00078(32 - 24) = .0062, or about .62 percentage points. (Percentage points are changes in percentages.)
(iii) When we add pareduc by itself, the coefficient on the interaction term is negative. The t statistic on educ·pareduc is about -1.33, which is not significant at the 10% level against a two-sided alternative. Note that the coefficient on pareduc is significant at the 5% level against a two-sided alternative. This provides a good example of how omitting a level effect (pareduc in this case) can lead to biased estimation of the interaction effect.

Q2. Wooldridge 6.5
This would make little sense. Performances on math and science exams are measures of outputs of the educational process, and we would like to know how various educational inputs and school characteristics affect math and science scores. For example, if the staff-to-pupil ratio has an effect on both exam scores, why would we want to hold performance on the science test fixed while studying the effects of staff on the math pass rate? This would be an example of controlling for too many factors in a regression equation. The variable sci11 could be a dependent variable in an identical regression equation.


Q3. Wooldridge C6.3 (C6.2 for the 3rd Edition) (wage2_c6_3.do)
(i) Holding exper (and the elements in u) fixed, we have
Δlog(wage) = β1Δeduc + β3 exper Δeduc = (β1 + β3 exper)Δeduc,
or Δlog(wage)/Δeduc = β1 + β3 exper. This is the approximate proportionate change in wage given one more year of education.
(ii) H0: β3 = 0. If we think that education and experience interact positively, so that people with more experience are more productive when given another year of education, then β3 > 0 is the appropriate alternative.
(iii) The estimated equation is
log(wage)-hat = 5.95 + .0440 educ - .0215 exper + .00320 educ·exper
                      (0.24)   (.0174)           (.0200)           (.00153)
n = 935, R² = .135, adjusted R² = .132.
The t statistic on the interaction term is about 2.13, which gives a p-value below .02 against H1: β3 > 0. Therefore, we reject H0: β3 = 0 in favor of H1: β3 > 0 at the 2% level.
(iv) We rewrite the equation as
log(wage) = β0 + θ1educ + β2exper + β3educ·(exper - 10) + u,
and run the regression of log(wage) on educ, exper, and educ·(exper - 10). We want the coefficient on educ. We obtain θ̂1 ≈ .0761 and se(θ̂1) ≈ .0066. The 95% CI for θ1 is about .063 to .089.
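The reparameterisation in part (iv) amounts to two gen lines; a sketch assuming wage2.dta:

    use wage2, clear
    gen educXexper   = educ*exper
    gen educXexper10 = educ*(exper - 10)
    regress lwage educ exper educXexper     // part (iii)
    regress lwage educ exper educXexper10   // coefficient on educ is theta1-hat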

Q4. Wooldridge C6.8 (hprice1_c6_8.do)
(i) The estimated equation (where price is in dollars) is
price-hat = -21,770.3 + 2.068 lotsize + 122.78 sqrft + 13,852.5 bdrms
                 (29,475.0)    (0.642)            (13.24)          (9,010.1)
n = 88, R² = .672, adjusted R² = .661, σ̂ = 59,833.
The predicted price at lotsize = 10,000, sqrft = 2,300, and bdrms = 4 is about $336,714.
(ii) The regression is of pricei on (lotsizei - 10,000), (sqrfti - 2,300), and (bdrmsi - 4). We want the intercept estimate and the associated 95% CI from this regression. The CI is approximately 336,706.7 ± 14,665, or about $322,042 to $351,372 when rounded to the nearest dollar.
(iii) We must use equation (6.36) to obtain the standard error of ê0 and then use equation (6.37) (assuming that price is normally distributed). From the regression in part (ii), se(ŷ0) ≈ 7,374.5 and σ̂ ≈ 59,833. Therefore, se(ê0) ≈ [(7,374.5)² + (59,833)²]^(1/2) ≈ 60,285.8. Using 1.99 as the approximate 97.5th percentile in the t(84) distribution gives the 95% CI for price0, at the given values of the explanatory variables, as 336,706.7 ± 1.99(60,285.8), or, rounded to the nearest dollar, $216,738 to $456,675. This is a fairly wide prediction interval. But we have not used many factors to explain housing price. If we had more, we could, presumably, reduce the error standard deviation, and therefore σ̂, to obtain a tighter prediction interval.



Week 8 Tutorial Exercises


Review Questions (these may or may not be discussed in tutorial classes)
What is a qualitative factor? Give some examples.
This should be easy.
How do you use dummy variables to represent qualitative factors?
A factor with two levels (or values) is easily represented by one dummy variable. For a factor with more than two levels (or values), we need more than one dummy: for example, a factor with 5 values requires 4 dummy variables. (A Stata sketch follows the Review Questions.)
How do you use dummy variables to represent an ordinal variable?
An ordinal variable may be represented by dummies in the same way as a multi-levelled factor.
How do you test for differences in regression functions across different groups?
We generally use the F test on properly-defined restricted and unrestricted models. The Chow test is a special case of the F test, where the null hypothesis is no difference across groups.
What can you achieve by interacting group dummy variables with other regressors?
The interaction allows the slope coefficients to differ across groups.
What is program evaluation?
It assesses the effectiveness of a program by checking the differences in regression coefficients for the control (or base) group and the treatment group.
What are the interpretations of the PRF and SRF when the dependent variable is binary?
The PRF is the conditional probability of success for given explanatory variables. The SRF is the predicted conditional probability of success for given explanatory variables.
What are the shortcomings of the LPM?
The major drawback of the LPM is that the predicted probability of success may be outside the interval [0,1].
Does MLR.5 hold for the LPM?
No. As indicated in the textbook and lectures, the conditional variance of the disturbance is not constant and depends on the explanatory variables: Var(u|x) = E(y|x)[1 - E(y|x)].
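As mentioned above, here is a short Stata sketch of a group dummy, its interaction with a regressor, and the associated joint F test, using wage1.dta (the variable names are from that file):

    use wage1, clear
    gen femXeduc = female*educ
    regress lwage educ female femXeduc   // return to educ differs by gender
    test female femXeduc                 // joint (Chow-style) test of any gender difference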

Problem Set (these will be discussed in tutorial classes)

Q1. Wooldridge 7.4
(i) The approximate difference is just the coefficient on utility times 100, or -28.3%. The t statistic is -.283/.099 ≈ -2.86, which is statistically significant.
(ii) 100·[exp(-.283) - 1] ≈ -24.7%, and so the estimate is somewhat smaller in magnitude.
(iii) The proportionate difference is .181 - .158 = .023, or about 2.3%. One equation that can be estimated to obtain the standard error of this difference is
log(salary) = β0 + β1log(sales) + β2roe + δ1consprod + δ2utility + δ3trans + u,
where trans is a dummy variable for the transportation industry. Now, the base group is finance, and so the coefficient δ1 directly measures the difference between the consumer products and finance industries, and we can use the t statistic on consprod.

Q2. Wooldridge 7.6
In Section 3.3 (in particular, in the discussion surrounding Table 3.2) we discussed how to determine the direction of bias in the OLS estimators when an important variable (ability, in this case) has been omitted from the regression. As we discussed there, Table 3.2 only strictly holds with a single explanatory variable included in the regression, but we often ignore the presence of other independent variables and use this table as a rough guide. (Or, we can use the results of Problem 3.10 for a more precise analysis.) If less able workers are more likely to receive training, then train and u are negatively correlated. If we ignore the presence of educ and exper, or at least assume that train and u are negatively correlated after netting out educ and exper, then we can use Table 3.2: the OLS estimator of β1 (with ability in the error term) has a downward bias. Because we think β1 ≥ 0, we are less likely to conclude that the training program was effective. Intuitively, this makes sense: if those chosen for training had not received training, they would have lower wages, on average, than the control group.


Q3. Wooldridge 7.9
(i) Plugging in u = 0 and d = 1 gives f1(z) = (β0 + δ0) + (β1 + δ1)z.
(ii) Setting f0(z*) = f1(z*) gives β0 + β1z* = (β0 + δ0) + (β1 + δ1)z*. Therefore, provided δ1 ≠ 0, we have z* = -δ0/δ1. Clearly, z* is positive if and only if δ0/δ1 is negative, which means δ0 and δ1 must have opposite signs.
(iii) Using part (ii), we have totcoll* = .357/.030 = 11.9 years, where totcoll is total years of college.
(iv) The estimated number of years of college at which women catch up to men is much too high to be practically relevant. While the estimated coefficient on female·totcoll shows that the gap is reduced at higher levels of college, it is never closed, not even close. In fact, at four years of college, the difference in predicted log wage is still -.357 + .030(4) = -.237, or about 21.1% less for women.


Q4. Wooldridge C7.13 (apple_c7_13.do)
(i) 412/660 ≈ .624.
(ii) The OLS estimates of the LPM are
ecobuy-hat = .424 - .803 ecoprc + .719 regprc + .00055 faminc + .024 hhsize
                  (.165)  (.109)             (.132)            (.00053)            (.013)
             + .025 educ - .00050 age
                  (.008)          (.00125)
n = 660, R² = .110.
If ecoprc increases by, say, 10 cents (.10), then the probability of buying eco-labeled apples falls by about .080. If regprc increases by 10 cents, the probability of buying eco-labeled apples increases by about .072. (Of course, we are assuming that the probabilities are not close to the boundaries of zero and one, respectively.)
(iii) The F test, with 4 and 653 df, is 4.43, with p-value = .0015. Thus, based on the usual F test, the four non-price variables are jointly very significant. Of the four variables, educ appears to have the most important effect. For example, a difference of four years of education implies an increase of .025(4) = .10 in the estimated probability of buying eco-labeled apples. This suggests that more highly educated people are more open to buying produce that is environmentally friendly, which is perhaps expected. Household size (hhsize) also has an effect. Comparing a couple with two children to one that has no children (other factors equal), the couple with two children has a .048 higher probability of buying eco-labeled apples.
(iv) The model with log(faminc) fits the data slightly better: the R-squared increases to about .112. (We would not expect a large increase in R-squared from a simple change in the functional form.) The coefficient on log(faminc) is about .045 (t = 1.55). If log(faminc) increases by .10, which means roughly a 10% increase in faminc, then P(ecobuy = 1) is estimated to increase by about .0045, a pretty small effect.


(v) The fitted probabilities range from about .185 to 1.051, so none are negative. There are two fitted probabilities above 1, which is not a source of concern with 660 observations.
(vi) Using the standard prediction rule (predict ecobuy = 1 when ŷ ≥ .5 and zero otherwise), the fraction correctly predicted for ecobuy = 0 is 102/248 ≈ .411, so about 41.1%. For ecobuy = 1, the fraction correctly predicted is 340/412 ≈ .825, or 82.5%. With the usual prediction rule, the model does a much better job predicting the decision to buy eco-labeled apples. (The overall percent correctly predicted is about 67%.)
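Parts (v) and (vi) use fitted values and the standard prediction rule; a sketch assuming apple.dta:

    use apple, clear
    regress ecobuy ecoprc regprc faminc hhsize educ age
    predict phat                   // fitted probabilities
    summarize phat                 // range roughly .185 to 1.051
    gen byte ecohat = phat >= .5   // predict ecobuy = 1 when phat >= .5
    tabulate ecobuy ecohat         // fractions correctly predicted by outcome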


Week 9 Tutorial Exercises


Review Questions (these may or may not be discussed in tutorial classes)
What is heteroskedasticity in a regression model?
When homoskedasticity (MLR.5) fails and the variance of the disturbance (u) changes across observations (index i), we say that heteroskedasticity is present.
In the presence of heteroskedasticity, are the t-stat and F-stat from the usual OLS still valid? Why? Are there any other problems with OLS under heteroskedasticity?
The usual t-stat and F-stat are not valid because the usual OLS standard errors are incorrect under heteroskedasticity. Further, the OLS estimator is no longer (asymptotically) efficient, and there are better estimators.
What are the heteroskedasticity-robust standard errors? How do you use them in STATA?
These are corrected standard errors that take into account the possible presence of heteroskedasticity. The t-stat and F-stat computed using the heteroskedasticity-robust standard errors are valid test statistics. In STATA, these are easily obtained by using the option "robust" with the regress command, e.g., regress lwage educ, robust. (A sketch follows these questions.)
How do you detect whether heteroskedasticity is present?
The Breusch-Pagan test or the White test can be used to detect the presence of heteroskedasticity. See Section 8.3 for details.
If heteroskedasticity is present in a known form, how would you estimate the model?
In this case, the WLS estimator should be used, which is more efficient than the OLS estimator. See Section 8.4 for details.
If heteroskedasticity is present in an unknown form, how would you estimate the model?
If there is strong evidence of heteroskedasticity, the FGLS estimator should be used, which is based on an exponential functional form for the variance. See Section 8.4 for details.
What are the steps in the FGLS estimation?
You should summarise them from Section 8.4.
How would you handle the heteroskedasticity of the LPM?
The heteroskedasticity functional form is known for the LPM, hence WLS can be used in principle. However, because the LPM can produce predicted probabilities outside the interval (0,1), the known functional form p(x)[1 - p(x)] may not be usable for the WLS. In that case, it may be necessary to use FGLS for the LPM.
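A sketch of the commands mentioned above, using wage1.dta:

    use wage1, clear
    regress lwage educ exper tenure, robust   // heteroskedasticity-robust SEs
    regress lwage educ exper tenure
    estat hettest                             // Breusch-Pagan test
    estat imtest, white                       // White test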

Problem Set (these will be discussed in tutorial classes)

Q1. Wooldridge 8.1
Parts (ii) and (iii). The homoskedasticity assumption played no role in Chapter 5 in showing that OLS is consistent. But we know that heteroskedasticity causes statistical inference based on the usual t and F statistics to be invalid, even in large samples. As heteroskedasticity is a violation of the Gauss-Markov assumptions, OLS is no longer BLUE.

Q2. Wooldridge 8.2
With Var(u|inc, price, educ, female) = σ²inc², h(x) = inc², where h(x) is the heteroskedasticity function defined in equation (8.21). Therefore, √h(x) = inc, and so the transformed equation is obtained by dividing the original equation by inc:
beer/inc = β0(1/inc) + β1 + β2(price/inc) + β3(educ/inc) + β4(female/inc) + u/inc.
Notice that β1, which is the slope on inc in the original model, is now a constant in the transformed equation. This is simply a consequence of the form of the heteroskedasticity and the functional forms of the explanatory variables in the original equation.

20

Q3. Wooldridge 8.3
False. The unbiasedness of WLS and OLS hinges crucially on Assumption MLR.4, and, as we know from Chapter 4, this assumption is often violated when an important variable is omitted. When MLR.4 does not hold, both WLS and OLS are biased. Without specific information on how the omitted variable is correlated with the included explanatory variables, it is not possible to determine which estimator has a smaller bias. It is possible that WLS would have more bias than OLS, or less bias. Because we cannot know, we should not claim to use WLS in order to solve biases associated with OLS.

Q4. Wooldridge 8.5
(i) No. For each coefficient, the usual standard errors and the heteroskedasticity-robust ones are practically very similar.
(ii) The effect is -.029(4) = -.116, so the probability of smoking falls by about .116.
(iii) As usual, we compute the turning point in the quadratic: .020/[2(.00026)] ≈ 38.46, so about 38 and one-half years.
(iv) Holding other factors in the equation fixed, a person in a state with restaurant smoking restrictions has a .101 lower chance of smoking. This is similar to the effect of having four more years of education.
(v) We just plug the values of the independent variables into the OLS regression line:
smokes-hat = .656 - .069 log(67.44) + .012 log(6,500) - .029(16) + .020(77) - .00026(77²) ≈ .0052.
Thus, the estimated probability of smoking for this person is close to zero. (In fact, this person is not a smoker, so the equation predicts well for this particular observation.)


Q5. Wooldridge C8.10 (401ksubs_c8_10.do)
(i) In the following equation, estimated by OLS, the usual standard errors are in () and the heteroskedasticity-robust standard errors are in []:

e401k = .506 + .0124 inc − .000062 inc² + .0265 age − .00031 age² − .0035 male
       (.081)  (.0006)     (.000005)      (.0039)     (.00005)      (.0121)
       [.079]  [.0006]     [.000005]      [.0038]     [.00004]      [.0121]
n = 9,275, R² = .094.

There are no important differences; if anything, the robust standard errors are smaller.

(ii) This is a general claim. Since Var(y|x) = p(x)[1 − p(x)], we can write E(u²|x) = p(x) − p(x)². Written in error form, u² = p(x) − p(x)² + v. In other words, we can write this as a regression model u² = δ0 + δ1 p(x) + δ2 p(x)² + v, with the restrictions δ0 = 0, δ1 = 1, and δ2 = −1. Remember that, for the LPM, the fitted values, ŷ, are estimates of p(x). So, when we run the regression of û² on ŷ and ŷ² (including an intercept), the intercept estimate should be close to zero, the coefficient on ŷ should be close to one, and the coefficient on ŷ² should be close to −1.

(iii) The White LM statistic and F statistic are about 581.9 and 310.32 respectively, both of which are very significant. The coefficient on ŷ is about 1.010, the coefficient on ŷ² is about −.970, and the intercept is about −.009. These estimates are quite close to what we expect to find from the theory in part (ii).


(iv) The smallest fitted value is about .030 and the largest is about .697, so all fitted values lie inside (0,1) and WLS is feasible. The WLS estimates of the LPM are

e401k = .488 + .0126 inc − .000062 inc² + .0255 age − .00030 age² − .0055 male
       (.076)  (.0005)     (.000004)      (.0037)     (.00004)      (.0117)
n = 9,275, R² = .108.

There are no important differences from the OLS estimates. The largest relative change is in the coefficient on male, but this variable is very insignificant under either estimation method.
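
A hedged Stata sketch of parts (ii) to (iv); the course's 401ksubs_c8_10.do is the authoritative version, and the variable names here (e401k, inc, incsq, age, agesq, male) follow the 401KSUBS file but should be verified:

    * LPM by OLS, with both sets of standard errors
    regress e401k inc incsq age agesq male
    regress e401k inc incsq age agesq male, robust

    * parts (ii)-(iii): regress squared residuals on yhat and yhat^2
    quietly regress e401k inc incsq age agesq male
    predict yhat, xb
    predict uhat, residuals
    gen usq = uhat^2
    gen yhatsq = yhat^2
    regress usq yhat yhatsq        // expect intercept ~ 0, slopes ~ 1 and -1

    * part (iv): WLS using the known LPM variance h = p(1-p)
    summarize yhat                 // check that fitted values lie inside (0,1)
    gen h = yhat*(1 - yhat)
    regress e401k inc incsq age agesq male [aw = 1/h]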


Week 10 Tutorial Exercises


Readings
Read Chapter 9 thoroughly. Make sure that you know the meanings of the Key Terms at the chapter end.

Review Questions (these may or may not be discussed in tutorial classes)


What is functional form misspecification? What is its major consequence in regression analysis?
As the word misspecification suggests, the model is mis-specified with a wrong functional form for the explanatory variables. For example, a log wage model that does not include age² is mis-specified if the partial effect of age on log wage has an inverted-U shape. In linear regression models, functional form misspecification can be viewed as the omission of a nonlinear function of the explanatory variables. Hence the consequences of omitting important variables apply, and the major one is estimation bias.

How would you test for functional form misspecification?
The RESET may be used to test for functional form misspecification, more specifically, for neglected nonlinearities (nonlinear functions of the regressors). The RESET is based on the F-test for the joint significance of the squared and cubed predicted values (y-hat) in an expanded model that includes all the original regressors as well as the squared and cubed y-hat, where the latter represent possible nonlinear functions of the regressors. (A Stata sketch of the RESET and the Davidson-MacKinnon test appears at the end of this review list.)

What are nested models? And nonnested models?
Two models are nested if one is a restricted version of the other (obtained by restricting the values of parameters). Two models are nonnested if neither nests the other.

What is the purpose of testing one model against another?
The purpose of testing one model against another is to select the best, or preferred, model for statistical inference and prediction.

How would you test two nonnested models?
There are two strategies. The first is to test exclusion restrictions in an expanded model that nests both candidate models and decide which model can be excluded. The second is to test the significance of the predicted value of one model in the other model (known as the Davidson-MacKinnon test).

What is a proxy variable? What are the conditions for a proxy variable to be valid in regression analysis?
A proxy variable is one that is used to represent the influence of an unobserved (and important) explanatory variable. There are two conditions for the validity of a proxy. The first is that the zero conditional mean assumption holds for all explanatory variables (including the unobserved variable and the proxy). The second is that the conditional mean of the unobserved variable, given the other explanatory variables and the proxy, depends only on the proxy.

Can you analyse the consequences of measurement errors?
With a careful reading of Section 9.4, you should be able to.

In what circumstances will missing observations cause major concerns in regression analysis?
Concerns arise when observations are systematically missing and the sample can no longer be regarded as a random sample. Nonrandom samples may cause bias in the OLS estimation.


What is exogenous sample selection? What is endogenous sample selection?
Exogenous sample selection is a nonrandom sampling scheme in which the sample is chosen on the basis of the explanatory variables. It is easily seen that such a sampling scheme does not violate MLR1, MLR3 or MLR4, and hence does not introduce bias into the OLS estimation. On the other hand, an endogenous sample selection scheme selects the sample on the basis of the dependent variable and causes bias in the OLS estimation.

What are outliers and influential observations?
Outliers are unusual observations that are far away from the centre of the other observations. A small number of outliers can have a large impact on the OLS estimates of the parameters. They are called influential observations if the estimation results with them included are significantly different from the results with them excluded.

Consider the simple regression model yi = α + (β + bi)xi + ui for i = 1, 2, ..., n, where the slope parameter contains a random component bi, and α and β are constant parameters. Assume that the usual MLR1-5 hold for yi = α + βxi + ui, that E(bi|xi) = 0 and that Var(bi|xi) = σb². If we regress y on x with an intercept, will the estimator of the slope parameter be biased for β? Will the usual OLS standard errors be valid for statistical inference?
The model can be written as yi = α + βxi + (bixi + ui), where the overall disturbance is (bixi + ui). Given the stated assumptions, MLR1-4 hold for this model and the OLS estimator of the slope parameter is unbiased (for β). However, MLR5 does not hold, since the conditional variance of the overall disturbance, Var(bixi + ui | xi), depends on xi (for instance, it equals xi²σb² + σu² if bi and ui are uncorrelated given xi), and hence the usual OLS standard errors are not valid.
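
A minimal Stata sketch of the RESET and the Davidson-MacKinnon test mentioned above; the variable names (y, x1, x2, z2) are generic illustrations, not from a course dataset:

    * RESET: add squared and cubed fitted values and test them jointly
    quietly regress y x1 x2
    predict yhat, xb
    gen yhat2 = yhat^2
    gen yhat3 = yhat^3
    regress y x1 x2 yhat2 yhat3
    test yhat2 yhat3               // F-test for neglected nonlinearity
    * (estat ovtest, run after the original regression, performs a
    * built-in version of the RESET)

    * Davidson-MacKinnon: fitted value of one model added to the other
    quietly regress y x1 z2        // nonnested rival model
    predict ghat, xb
    regress y x1 x2 ghat           // original model plus rival's fitted value
    * a significant t-stat on ghat is evidence against the original model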

Problem Set (these will be discussed in tutorial classes)


Q1. Wooldridge 9.1 There is functional form misspecification if β6 ≠ 0 or β7 ≠ 0, where these are the population parameters on ceoten² and comten², respectively. Therefore, we test the joint significance of these variables using the R-squared form of the F test: F = [(.375 − .353)/2]/[(1 − .375)/(177 − 8)] ≈ 2.97. With 2 and 169 df, the 10% critical value is about 2.30, while the 5% critical value is about 3.00. Thus, the p-value is slightly above .05, which is reasonable evidence of functional form misspecification. (Of course, whether this has a practical impact on the estimated partial effects for various levels of the explanatory variables is a different matter.)
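
The F statistic and its p-value can be checked directly in Stata (pure arithmetic, no data required):

    display ((.375 - .353)/2) / ((1 - .375)/(177 - 8))   // F about 2.97
    display Ftail(2, 177 - 8, 2.97)                      // p-value, slightly above .05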

Q2. Wooldridge 9.3
(i) Eligibility for the federally funded school lunch program is very tightly linked to being economically disadvantaged. Therefore, the percentage of students eligible for the lunch program is very similar to the percentage of students living in poverty.
(ii) We can use our usual reasoning on omitting important variables from a regression equation. The variables log(expend) and lnchprg are negatively correlated: school districts with poorer children spend, on average, less on schools. Further, β3 < 0. From Table 3.2, omitting lnchprg (the proxy for poverty) from the regression produces an upward-biased estimator of β1 [ignoring the presence of log(enroll) in the model]. So when we control for the poverty rate, the estimated effect of spending falls.
(iii) Once we control for lnchprg, the coefficient on log(enroll) becomes negative, with a t statistic of about −2.17, which is significant at the 5% level against a two-sided alternative. The coefficient implies that Δmath10 ≈ −(1.26/100)(%Δenroll) = −.0126(%Δenroll). Therefore, a 10% increase in enrollment leads to a drop in math10 of about .126 percentage points.
(iv) Both math10 and lnchprg are percentages. Therefore, a ten percentage point increase in lnchprg leads to about a 3.23 percentage point fall in math10, a sizeable effect.
(v) In column (1) we are explaining very little of the variation in pass rates on the MEAP math test: less than 3%. In column (2), we are explaining almost 19% (which still leaves much variation unexplained). Clearly most of the explained variation in math10 is due to variation in lnchprg. This is a common finding in studies of school performance: family income (or related factors, such as living in poverty) is much more important in explaining student performance than is spending per student or other school characteristics.


Q3. Wooldridge 9.5 The sample selection in this case is arguably endogenous. Because prospective students may look at campus crime as one factor in deciding where to attend college, colleges with high crime rates have an incentive not to report their crime statistics. If this is the case, then the chance of appearing in the sample is negatively related to u in the crime equation. (For a given school size, higher u means more crime, and therefore a smaller probability that the school reports its crime figures.)

Q4. Wooldridge C9.3 (jtrain_c9_3.do)
(i) If the grants were awarded to firms based on firm or worker characteristics, grant could easily be correlated with such factors that affect productivity. In the simple regression model, these are contained in u.
(ii) The simple regression estimates using the 1988 data are

log(scrap) = .409 + .057 grant
            (.241)  (.406)
n = 54, R² = .0004.

The coefficient on grant is actually positive, but not statistically different from zero.
(iii) When we add log(scrap87) to the equation, we obtain

log(scrap88) = .021 − .254 grant88 + .831 log(scrap87)
              (.089)  (.147)         (.044)
n = 54, R² = .873,

where the years are indicated as suffixes to the variables for clarity. The t statistic for H0: βgrant88 = 0 is −.254/.147 ≈ −1.73. We use the 5% critical value for 40 df in Table G.2: −1.68. Because −1.73 < −1.68, we reject H0 in favor of H1: βgrant88 < 0 at the 5% level.
(iv) The t statistic for H0: βlog(scrap87) = 1 is (.831 − 1)/.044 ≈ −3.84, which is a strong rejection of H0.
(v) With the heteroskedasticity-robust standard error, the t statistic for grant88 is −.254/.142 ≈ −1.79, so the coefficient is even more significantly less than zero when we use the heteroskedasticity-robust standard error. The t statistic for H0: βlog(scrap87) = 1 is (.831 − 1)/.071 ≈ −2.38, which is notably smaller in magnitude than before, but it is still pretty significant.
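
A hedged sketch of the corresponding Stata commands; jtrain_c9_3.do is the authoritative version, and this sketch assumes the panel has been arranged in wide form, one row per firm, with variables lscrap88, lscrap87 and grant88 (names illustrative):

    * (ii) simple regression on the 1988 cross-section
    regress lscrap88 grant88

    * (iii) add the lagged dependent variable
    regress lscrap88 grant88 lscrap87

    * (iv) test H0: coefficient on lscrap87 equals 1
    lincom lscrap87 - 1            // t statistic about -3.84

    * (v) repeat with heteroskedasticity-robust standard errors
    regress lscrap88 grant88 lscrap87, robust
    lincom lscrap87 - 1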



Week 11 Tutorial Exercises


Review Questions (these may or may not be discussed in tutorial classes)
What are the main features of time series data? How do time series data differ from cross-sectional data?
The main features include the following. First, time series data have a temporal ordering, which matters in regression analysis because of the second feature. Second, many economic time series are serially correlated (or autocorrelated), meaning that future observations depend on present and past observations. Third, many economic time series contain trends and seasonality. In comparison to cross-sectional data, the major difference is that time series data are not a random sample: recall that, in a random sample, observations are required to be independent of one another.

What is a stochastic process and its realisation?
A stochastic process (SP) is a collection of random variables indexed by time. At any fixed point in time, the SP is a random variable (and hence has a distribution). The SP can be viewed as a random curve with the horizontal axis being the time index, where the outcome of the underlying experiment is a curve (a time series plot). The time series we observe is viewed as a realisation (or outcome) of the SP.

What is serial correlation or autocorrelation?
The serial (or auto) correlation of a time series variable is the correlation between the variable at one point in time and the same variable at another point in time. The serial correlation of y_t is usually denoted Corr(y_t, y_t-h) = Cov(y_t, y_t-h)/[Var(y_t)Var(y_t-h)]^(1/2).

What is a finite distributed lag model? What is the long-run propensity (LRP)? How would you estimate the LRP and the associated standard error (say in STATA)?
A finite distributed lag model includes the current value and a finite number of lags of an explanatory variable as regressors; the LRP is the sum of the coefficients on the current and lagged values, and measures the long-run effect on y of a permanent one-unit increase in the explanatory variable. Read ie_Slides10 pages 5-7 and Section 10.2, and try Wooldridge 10.3. (A Stata sketch of estimating the LRP and its standard error follows this review list.)

What are TS1-6 (assumptions about time series regression)? How do they differ from the assumptions in MLR1-6?
TS1-6 comprise: 1) linearity in parameters; 2) no perfect collinearity; 3) strict zero conditional mean; 4) homoskedasticity; 5) no serial correlation; 6) normality. These differ from MLR1-6 mainly in that MLR2 (random sampling) does not hold for time series data. To ensure the unbiasedness of the OLS estimators, TS3 is needed. To ensure the validity of the OLS standard errors, t-stats and F-stats for statistical inference, TS3-TS6 are needed. These are very strong assumptions and can be relaxed when sample sizes are large.

What are strictly exogenous regressors and contemporaneously exogenous regressors?
Strictly exogenous regressors satisfy assumption TS3, whereas contemporaneously exogenous regressors only satisfy condition (10.10) (or (z10) of the Slides), which is weaker than TS3.

What is a trending time series? What is a time trend?
A time series is trending if it has a tendency to grow (or shrink) over time. A time trend is a function of the time index (e.g., linear, quadratic, etc.). A time trend can be used to mimic (or model) the trending component of a time series.

Why may a regression with trending time series produce spurious results?
First, two time trends are always correlated because they grow in the same (or opposite) direction over time. Now consider two unrelated time series, each of which contains a time trend. Since the time trends in the two series are correlated, regressing one series on the other will produce a significant slope coefficient. Such statistical significance is spurious because it is purely induced by the time trends and has little to say about the true relationship between the two time series.


Why would you include a time trend in regressions with trending variables?
Including a time trend in the regression allows one to study the relationship among the time series variables that is not induced by the time trends in the series.

What is seasonality in a time series? Give an example of a time series variable with seasonality.
Seasonality in a time series is a fluctuation that repeats itself every year (or every week, or every day). An example would be the daily revenue of a pub, which is likely to have day-of-week seasonality.

For quarterly data, how would you define seasonal dummy variables for a regression model?
We need to define three dummies. For instance, if we take the first quarter as the base, we define dummy variables Q2, Q3 and Q4 for the second, third and fourth quarters and include them in the regression model.
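
A minimal Stata sketch of estimating a finite distributed lag model and its LRP with a standard error; a hedged illustration in which the variable names (gfr, pe, ww2, pill, year) follow Wooldridge's FERTIL3 file and should be checked:

    * finite distributed lag model with two lags (annual data)
    tsset year
    regress gfr pe L.pe L2.pe ww2 pill

    * LRP = sum of the coefficients on pe and its lags;
    * lincom reports the estimate and its standard error
    lincom pe + L.pe + L2.pe

For quarterly seasonal dummies, Stata's factor-variable syntax avoids generating Q2-Q4 by hand: with a (hypothetical) variable qtr coded 1 to 4, regress y x i.qtr uses the first quarter as the base.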

Problem Set (these will be discussed in tutorial classes)
Q1. Wooldridge 10.1
(i) Disagree. Most time series processes are correlated over time, and many of them are strongly correlated. This means they cannot be independent across observations, which simply represent different time periods. Even series that do appear to be roughly uncorrelated (such as stock returns) do not appear to be independently distributed, as they have dynamic forms of heteroskedasticity.
(ii) Agree. This follows immediately from Theorem 10.1. In particular, we do not need the homoskedasticity and no serial correlation assumptions.
(iii) Disagree. Trending variables are used all the time as dependent variables in regression models. We do need to be careful in interpreting the results, because we may simply find a spurious association between y_t and trending explanatory variables. Including a trend in the regression is a good idea with trending dependent or independent variables. As discussed in Section 10.5, the usual R-squared can be misleading when the dependent variable is trending.
(iv) Agree. With annual data, each time period represents a year and is not associated with any season.

Q2. Wooldridge 10.2 We follow the hint and write

gGDP_t-1 = α0 + δ0 int_t-1 + δ1 int_t-2 + u_t-1,

and plug this into the right-hand side of the int_t equation:

int_t = γ0 + γ1(α0 + δ0 int_t-1 + δ1 int_t-2 + u_t-1 − 3) + v_t
      = (γ0 + γ1α0 − 3γ1) + γ1δ0 int_t-1 + γ1δ1 int_t-2 + γ1 u_t-1 + v_t.

Now, by assumption, u_t-1 has zero mean and is uncorrelated with all the right-hand-side variables in the previous equation, except itself of course. So

Cov(int_t, u_t-1) = E(int_t u_t-1) = γ1 E(u_t-1²) > 0, because γ1 > 0.

If σ² = E(u_t²) for all t, then Cov(int_t, u_t-1) = γ1σ². This violates the strict exogeneity assumption, TS3. While u_t is uncorrelated with int_t, int_t-1, and so on, u_t is correlated with int_t+1.


Q3. Wooldridge 10.7
(i) pe_t-1 and pe_t-2 must be increasing by the same amount as pe_t.
(ii) The long-run effect, by definition, should be the change in gfr when pe increases permanently. But a permanent increase means that the level of pe increases and stays at the new level, and this is achieved by increasing pe_t, pe_t-1 and pe_t-2 by the same amount.

Q4. Wooldridge C10.10 (intdef_c10_10.do)
(i) The sample correlation between inf and def is only about .098, which is pretty small. Perhaps surprisingly, inflation and the deficit rate are practically uncorrelated over this period. Of course, this is a good thing for estimating the effects of each variable on i3, as it implies almost no multicollinearity.
(ii) The equation with the lags is

i3_t = 1.61 + .343 inf_t + .382 inf_t-1 − .190 def_t + .569 def_t-1
      (0.40)  (.125)       (.134)         (.221)       (.197)
n = 55, R² = .685, adj. R² = .660.

(iii) The estimated LRP of i3 with respect to inf is .343 + .382 = .725, which is somewhat larger than the .606 we obtain from the static model in (10.15). But the estimates are fairly close, considering the size and significance of the coefficient on inf_t-1.
(iv) The F statistic for the joint significance of inf_t-1 and def_t-1 is about 5.22, with p-value ≈ .009. So they are jointly significant at the 1% level. It seems that both lags belong in the model.
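
A hedged Stata sketch for C10.10; intdef_c10_10.do is the authoritative version, and the variable and file names (intdef, i3, inf, def, year) follow the INTDEF file but should be verified:

    use intdef, clear
    tsset year

    * (i) sample correlation between inflation and the deficit
    correlate inf def

    * (ii) model with one lag of each regressor
    regress i3 inf L.inf def L.def

    * (iii) LRP of i3 with respect to inf, with its standard error
    lincom inf + L.inf

    * (iv) joint significance of the two lags
    test L.inf L.def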

