Department of Economics Economics W3412

Columbia University Fall 2013

Final Exam
Section 2 (Tue/Thur section)
(Seyhan Erden Arkonac)

Instructions
1. Do not turn this page until so instructed.

2. This exam ends promptly at 4pm.

3. This exam has six questions for a total of 100 points and a bonus question for 2 points.

4. Write down your Columbia ID number on the cover of this exam.

5. You are permitted to use a simple calculator. No computers, wireless, or other electronic
devices without prior permission. You may not share resources with anyone else.

6. Some questions ask you to draw a real-world judgment in a problem of practical importance.
The quality of that judgment counts. For example, consider the question: “It is 10°F outside.
In your judgment, why are so many people wearing heavy coats?” The answer, “To stay
warm” would receive more points than the answer, “Because they are fashion-conscious.”

NAME:_________________________________________________________

UNI:__________________________________________________________

Question 1 [17 points]:
A study analyzed the probability that Major League Baseball (MLB) players "survive" for
another season, or, in other words, play one more season. They studied a model of the following
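form:

$\Pr(Y = 1 \mid Seasons, Perf, Avgperf) = \Phi(\beta_0 + \beta_1 Seasons + \beta_2 Perf + \beta_3 Avgperf)$

The dependent variable $Y$ is a binary variable that takes on a value of one if the player played one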
more season (a minimum of 50 at bats or 25 innings pitched), and zero otherwise. Seasons is the
number of total seasons played, measured in years, Perf is the performance of the player this
year, and Avgperf is the average performance of the player over their career.

The researchers had a sample of 4,728 hitters and 3,803 pitchers for the years 1901-1999. All
explanatory variables are standardized (sample mean of 0, variance of 1). Probit estimation
yielded the results as shown in the table:

Regression                  (1) Hitters    (2) Pitchers
Regression model            probit         probit
constant                     2.010          1.625
                            (0.030)        (0.031)
number of seasons played    -0.058         -0.031
                            (0.004)        (0.005)
performance                  0.794          0.677
                            (0.025)        (0.026)
average performance          0.022          0.100
                            (0.033)        (0.036)

(a) (6p) Interpret the two probit equations and calculate survival probabilities for hitters and
pitchers at the sample mean. Provide an explanation for why these are so high.

(b) (6p) Calculate the change in the survival probability for a player who has a very bad year by
performing two standard deviations below the average (assume also that this player has been
in the majors for many years so that his average performance is negligibly affected). How
does this change the survival probability when compared to the answer in (a)?
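For parts (a) and (b), a minimal Python sketch of how a probit index maps into a probability may be useful. It assumes the index is the constant plus each coefficient times its standardized regressor, so that at the sample mean every regressor equals 0, and it holds seasons and average performance at their means in the bad-year case. The helper names `phi` and `survival_prob` are illustrative and do not come from the study.

```python
import math

def phi(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Coefficients copied from the table: constant, seasons, performance, average performance.
COEFS = {
    "hitters": (2.010, -0.058, 0.794, 0.022),
    "pitchers": (1.625, -0.031, 0.677, 0.100),
}

def survival_prob(group, seasons=0.0, perf=0.0, avgperf=0.0):
    """Predicted probability of playing one more season (regressors are standardized)."""
    b0, b_seasons, b_perf, b_avg = COEFS[group]
    return phi(b0 + b_seasons * seasons + b_perf * perf + b_avg * avgperf)

for group in COEFS:
    at_mean = survival_prob(group)            # all regressors at their mean of 0
    bad_year = survival_prob(group, perf=-2)  # performance two SDs below the mean
    print(f"{group}: at mean {at_mean:.3f}, bad year {bad_year:.3f}, "
          f"change {bad_year - at_mean:.3f}")
```

Because the probit is nonlinear, the size of the change depends on where the other regressors are held; holding them at 0 is one convenient choice, not the only one.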

(c) (5p) Since the results for hitters and pitchers seem similar, the researcher could consider
combining the two samples. With a combined sample, how could you test the hypothesis
that the coefficients for the explanatory variables are the same for hitters and pitchers?
Explain in some detail.

Question 2 [21 points]: (ch 10)

Consider the following panel data regression with a single explanatory variable

$Y_{it} = \beta_0 + \beta_1 X_{it} + u_{it}$.

In each of the examples below, you will be including entity and time fixed effects.
(a) (3 p) Consider the effect of beer taxes on the fatality rate using annual data from 1982-1988,
and nine U.S. regions (New England, Pacific, Mid-Atlantic, South, etc.). How many total
coefficients do you need to estimate?

(b) (4 p) Certain regions (e.g. New England) that tend to have higher beer taxes also tend to
have consistently higher quality hospitals. Does this pose a threat to your analysis?

(c) (3 p) Consider the effect of the minimum wage on teenage employment using annual data
from 1963-2000 for five Canadian Regions (Atlantic Provinces, Quebec, Ontario, Prairies,
British Columbia). How many total coefficients do you need to estimate?

(d) (4 p) Nationwide recessions impact both teenage employment and the minimum wage across
the country. Does this pose a threat to your analysis?

(e) (3 p) Consider the effect of savings rates on per capita income using data for three decades
(1960-1969, 1970-1979, 1980-1989; one observation per decade) and 104 countries. How
many total coefficients do you need to estimate?

(f) (4 p) A number of countries industrialized at different times between 1960-1989, a process
which can impact both the savings rate and per capita income. Does this pose a threat to
your analysis?
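The coefficient counts asked for in parts (a), (c) and (e) follow one pattern, sketched below in Python. The sketch assumes the binary-variable form of the regression: one intercept, one slope on X, n - 1 entity dummies and T - 1 time dummies (one of each is omitted to avoid perfect multicollinearity). The function name `count_coefficients` is hypothetical.

```python
def count_coefficients(n_entities, n_periods, n_regressors=1):
    """Coefficients in a regression with entity and time fixed effects,
    written with an intercept plus n-1 entity and T-1 time dummies."""
    intercept = 1
    entity_dummies = n_entities - 1
    time_dummies = n_periods - 1
    return intercept + n_regressors + entity_dummies + time_dummies

# (number of entities, number of time periods) taken from parts (a), (c) and (e)
cases = {
    "(a) beer tax, 9 regions, 1982-1988": (9, 1988 - 1982 + 1),
    "(c) minimum wage, 5 regions, 1963-2000": (5, 2000 - 1963 + 1),
    "(e) savings, 104 countries, 3 decades": (104, 3),
}
for label, (n, t) in cases.items():
    print(label, "->", count_coefficients(n, t))
```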

Question 3 [15 points]:
Consider a supply model for edible chicken, which the U.S. Department of Agriculture calls
“broilers.” Data for this question are adapted from the data provided by Epple and McCallum
(2006; see footnote 1). The data are annual, 1950-2001. The supply equation is:

$\ln(QPROD_t) = \beta_0 + \beta_1 \ln(P_t) + \beta_2 \ln(PF_t) + \beta_3 TIME_t + \beta_4 \ln(QPROD_{t-1}) + u_t$

where $QPROD$ is aggregate production of young chickens, $P$ is the real price index of fresh
chicken, $PF$ is the real price index of broiler feed, and $TIME$ is a time trend, which is included to
capture any technical progress in production. Some potential external instrumental variables
are $\ln(Y_t)$, where $Y$ is the real per capita income; $\ln(PB_t)$, where $PB$ is the real price of
beef; $POPGRO_t$, the percent population growth from year t-1 to year t; $\ln(P_{t-1})$, the lagged
log of the real price of chickens; and $\ln(EXPTS_t)$, the log of exports of chicken.

Estimated supply equation for chicken can be written from the following output:

Regression 1:
. reg lnQPROD lnP lnPF TIME lnQPROD_1

Source | SS df MS Number of obs = 40
-------------+------------------------------ F( 4, 35) = 3102.49
Model | 11.9815945 4 2.99539863 Prob > F = 0.0000
Residual | .03379186 35 .000965482 R-squared = 0.9972
-------------+------------------------------ Adj R-squared = 0.9969
Total | 12.0153864 39 .308086831 Root MSE = .03107

------------------------------------------------------------------------------
lnQPROD | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lnP | .0091099 .0679409 0.13 0.894 -.1288175 .1470373
lnPF | -.0901945 .0426459 -2.11 0.042 -.1767703 -.0036186
TIME | .0111706 .0051486 2.17 0.037 .0007183 .0216229
lnQPROD_1 | .7326902 .1066347 6.87 0.000 .5162103 .94917
_cons | 2.109681 .7991519 2.64 0.012 .487316 3.732045
------------------------------------------------------------------------------

1 “Simultaneous Equation Econometrics: The Missing Example”, Economic Inquiry, 44(2), 374-384

Regression 2:
. ivreg lnQPROD (lnP=lnPB lnY POPGRO lnEXPTS) lnPF TIME lnQPROD_1

Instrumental variables (2SLS) regression

Source | SS df MS Number of obs = 40
-------------+------------------------------ F( 4, 35) = 1619.82
Model | 11.9506133 4 2.98765333 Prob > F = 0.0000
Residual | .064773079 35 .001850659 R-squared = 0.9946
-------------+------------------------------ Adj R-squared = 0.9940
Total | 12.0153864 39 .308086831 Root MSE = .04302

------------------------------------------------------------------------------
lnQPROD | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lnP | .393975 .1749342 2.25 0.031 .0388398 .7491103
lnPF | -.1909911 .0705566 -2.71 0.010 -.3342286 -.0477535
TIME | .0242389 .0087117 2.78 0.009 .0065532 .0419247
lnQPROD_1 | .5489031 .1635754 3.36 0.002 .2168274 .8809789
_cons | 3.298617 1.196567 2.76 0.009 .8694559 5.727778
------------------------------------------------------------------------------
Instrumented: lnP
Instruments: lnPF TIME lnQPROD_1 lnPB lnY POPGRO lnEXPTS
------------------------------------------------------------------------------

Regression 3:
. reg lnP lnPB lnY POPGRO lnEXPTS lnPF TIME lnQPROD_1

Source | SS df MS Number of obs = 40
-------------+------------------------------ F( 7, 32) = 49.65
Model | 1.61496433 7 .230709191 Prob > F = 0.0000
Residual | .14868612 32 .004646441 R-squared = 0.9157
-------------+------------------------------ Adj R-squared = 0.8973
Total | 1.76365045 39 .045221807 Root MSE = .06816

------------------------------------------------------------------------------
lnP | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lnPB | .1159974 .2186138 0.53 0.599 -.3293044 .5612991
lnY | 1.471961 .6529929 2.25 0.031 .1418577 2.802064
POPGRO | .0697965 .0908676 0.77 0.448 -.1152949 .2548878
lnEXPTS | 2.438689 .6971098 3.50 0.001 1.018723 3.858655
lnPF | .154805 .1068706 1.45 0.157 -.0628833 .3724932
TIME | -.0735312 .0230427 -3.19 0.003 -.1204676 -.0265948
lnQPROD_1 | -.0086269 .2911554 -0.03 0.977 -.601691 .5844372
_cons | -11.95739 6.311461 -1.89 0.067 -24.81341 .8986362
------------------------------------------------------------------------------

Regression 4:
. reg lnQPROD lnP lnPF TIME lnQPROD_1

Source | SS df MS Number of obs = 40
-------------+------------------------------ F( 4, 35) = 3102.49
Model | 11.9815945 4 2.99539863 Prob > F = 0.0000
Residual | .03379186 35 .000965482 R-squared = 0.9972
-------------+------------------------------ Adj R-squared = 0.9969
Total | 12.0153864 39 .308086831 Root MSE = .03107

------------------------------------------------------------------------------
lnQPROD | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lnP | .0091099 .0679409 0.13 0.894 -.1288175 .1470373
lnPF | -.0901945 .0426459 -2.11 0.042 -.1767703 -.0036186
TIME | .0111706 .0051486 2.17 0.037 .0007183 .0216229
lnQPROD_1 | .7326902 .1066347 6.87 0.000 .5162103 .94917
_cons | 2.109681 .7991519 2.64 0.012 .487316 3.732045
------------------------------------------------------------------------------

. predict e, residuals
(1 missing values generated)

Regression 5:
. reg e lnPB lnY POPGRO lnEXPTS lnPF TIME lnQPROD_1

Source | SS df MS Number of obs = 40
-------------+------------------------------ F( 7, 32) = 2.19
Model | .010946966 7 .001563852 Prob > F = 0.0618
Residual | .022844894 32 .000713903 R-squared = 0.3240
-------------+------------------------------ Adj R-squared = 0.1761
Total | .03379186 39 .000866458 Root MSE = .02672

------------------------------------------------------------------------------
e | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lnPB | .1180813 .0856913 1.38 0.178 -.0564662 .2926289
lnY | .2378684 .2559575 0.93 0.360 -.2835 .7592367
POPGRO | -.0123288 .0356179 -0.35 0.732 -.0848802 .0602225
lnEXPTS | .9702997 .2732502 3.55 0.001 .4137072 1.526892
lnPF | -.0522353 .0418907 -1.25 0.221 -.1375639 .0330932
TIME | -.0045154 .0090322 -0.50 0.621 -.0229133 .0138826
lnQPROD_1 | -.2648651 .1141259 -2.32 0.027 -.497332 -.0323983
_cons | .1666471 2.473941 0.07 0.947 -4.872605 5.205899
------------------------------------------------------------------------------

. test lnPB lnY POPGRO lnEXPTS

( 1) lnPB = 0
( 2) lnY = 0
( 3) POPGRO = 0
( 4) lnEXPTS = 0

F( 4, 32) = 3.83
Prob > F = 0.0118

(a) (4p) Compare the results in regressions 1 and 2. Explain the reasons for using instrumental
variables in regression 2.

(b) (5p) What are the requirements for valid instruments? Explain with mathematical
conditions.

(c) (6p) Do these instruments satisfy the requirements? You must use the necessary
regression results for your answer. Please specify the regression number you use while
answering each part of this question.
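For the exogeneity part of (c), one common check is the overidentifying-restrictions (J) statistic, which Stock and Watson compute as J = mF, where F is the homoskedasticity-only F-statistic testing the m instruments in the residual regression, and compare with a chi-squared distribution with m - k degrees of freedom (k is the number of endogenous regressors). The Python sketch below plugs in the F from the test that follows Regression 5. Two points are assumptions worth flagging: the 5% critical value of 7.81 for 3 degrees of freedom is typed in from a standard chi-squared table, and the textbook procedure uses TSLS residuals, whereas the residuals e in the output were saved after the OLS regression in Regression 4.

```python
def j_statistic(f_stat, n_instruments):
    """Overidentifying-restrictions statistic J = m * F."""
    return n_instruments * f_stat

m = 4            # excluded instruments: lnPB, lnY, POPGRO, lnEXPTS
k = 1            # endogenous regressors: lnP
f_stat = 3.83    # F(4, 32) from the test following Regression 5

j = j_statistic(f_stat, m)
df = m - k
CHI2_3_CRIT_5PCT = 7.81  # assumed: 5% critical value of chi-squared(3), from a table

print(f"J = {j:.2f} with {df} degrees of freedom")
print("reject exogeneity at 5%:", j > CHI2_3_CRIT_5PCT)
```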

Question 4 [15 points]:
There is some economic research that suggests that oil prices play a central role in causing
recessions in developed countries. In particular, this research suggests that it is specifically
increases in oil prices that matter. As a result, economists often look only at the percentage point
difference between oil prices at date t and the maximum value over the previous year. However,
you notice that energy prices can fluctuate quite dramatically in both directions and believe that
geographic areas also benefit substantially from oil price decreases. As a result, you decide to
consider the effect of real oil prices (Poil/CPI) on GDP growth (Yt). You estimate the following
distributed lag model using annual data (numbers in parentheses are HAC standard errors):

$\hat{Y}_t = 3.39 - 0.009 (Poil/CPI)_t - 0.028 (Poil/CPI)_{t-1}$
        (0.27)  (0.010)              (0.011)

t = 1960-2008, $R^2$ = 0.15, SER = 1.88

(a) (5p) What is the impact effect of a 25 percent increase in real oil prices?

(b) (5p) What is the predicted cumulative change in GDP Growth over two years of this effect?
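In a distributed lag model the impact effect is the coefficient on the current-period regressor times the change, and the cumulative effect over two years is the sum of the two coefficients times the change. The Python sketch below applies that arithmetic to the estimated equation. It assumes the 25 percent increase enters the regressor as a change of 25 units; that reading of the units is an assumption, and the function name `dynamic_effects` is hypothetical.

```python
# Estimated distributed lag coefficients from the equation above
BETA_IMPACT = -0.009   # coefficient on (Poil/CPI)_t
BETA_LAG1 = -0.028     # coefficient on (Poil/CPI)_{t-1}

def dynamic_effects(delta_x):
    """Return (impact effect, two-period cumulative effect) of a change delta_x."""
    impact = BETA_IMPACT * delta_x
    cumulative = (BETA_IMPACT + BETA_LAG1) * delta_x
    return impact, cumulative

# Assumption: the 25 percent increase enters the regressor as a change of 25 units.
impact, cumulative = dynamic_effects(25)
print(f"impact effect: {impact:.3f}")
print(f"cumulative effect over two years: {cumulative:.3f}")
```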

(c) (5p) The HAC F-statistic is 4.07. Can you reject the null hypothesis that oil price changes
have no effect on real GDP growth? What is the critical value you considered? Is there any
reason why you should be cautious using an F-test in this case, given the sample period?

Question 5 [20 points]:
Given the following STATA output, you can find a VAR(2) (vector autoregression) model of the
change in inflation (cinf) and the unemployment rate (unem).
. var unem cinf

Vector autoregression

Sample: 1951 - 2012 No. of obs = 62
Log likelihood = -201.564 AIC = 6.824644
FPE = 3.156906 HQIC = 6.959349
Det(Sigma_ml) = 2.284871 SBIC = 7.167731

Equation Parms RMSE R-sq chi2 P>chi2
----------------------------------------------------------------
unem 5 1.00228 0.6589 119.7914 0.0000
cinf 5 1.72495 0.3091 27.73971 0.0000
----------------------------------------------------------------

------------------------------------------------------------------------------
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
unem |
unem |
L1. | 1.061241 .1303681 8.14 0.000 .8057245 1.316758
L2. | -.2874012 .133048 -2.16 0.031 -.5481705 -.026632
|
cinf |
L1. | .0976014 .0668152 1.46 0.144 -.0333539 .2285567
L2. | .0623594 .0572543 1.09 0.276 -.049857 .1745758
|
_cons | 1.345204 .513183 2.62 0.009 .3393835 2.351024
-------------+----------------------------------------------------------------
cinf |
unem |
L1. | -.4678597 .2243671 -2.09 0.037 -.907611 -.0281084
L2. | .2932862 .2289793 1.28 0.200 -.155505 .7420773
|
cinf |
L1. | -.0527481 .1149907 -0.46 0.646 -.2781258 .1726296
L2. | -.430232 .0985363 -4.37 0.000 -.6233595 -.2371044
|
_cons | 1.00306 .883202 1.14 0.256 -.7279845 2.734104
------------------------------------------------------------------------------

Table 1
Year Unem Inflation
2008 5.8 3.8
2009 9.3 -0.3
2010 9.6 1.6
2011 8.9 3.1
2012 8.1 2.1

(a) (4p) Given the actual realizations of unemployment and inflation in Table 1, forecast
unemployment for 2013. Show your work.

(b) (4p) Given the actual realizations of unemployment and inflation in Table 1, forecast
inflation for 2013. Show your work.
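The one-step-ahead forecasts in parts (a) and (b) amount to plugging the two most recent years into each VAR(2) equation. The Python sketch below shows that arithmetic under the assumption that `cinf` is the first difference of the inflation series in Table 1 (cinf_t = inflation_t - inflation_{t-1}), so the inflation forecast is the 2012 level plus the forecast change. The names `one_step`, `UNEM_EQ` and `CINF_EQ` are illustrative.

```python
# Coefficients copied from the VAR(2) output: (const, unem L1, unem L2, cinf L1, cinf L2)
UNEM_EQ = (1.345204, 1.061241, -0.2874012, 0.0976014, 0.0623594)
CINF_EQ = (1.00306, -0.4678597, 0.2932862, -0.0527481, -0.430232)

# Table 1 data needed for the lags
unem = {2011: 8.9, 2012: 8.1}
infl = {2010: 1.6, 2011: 3.1, 2012: 2.1}

# Assumption: cinf_t = inflation_t - inflation_{t-1}
cinf = {year: infl[year] - infl[year - 1] for year in (2011, 2012)}

def one_step(eq, year):
    """One-step-ahead forecast for `year` from the two preceding years."""
    const, u1, u2, c1, c2 = eq
    return (const + u1 * unem[year - 1] + u2 * unem[year - 2]
            + c1 * cinf[year - 1] + c2 * cinf[year - 2])

unem_2013 = one_step(UNEM_EQ, 2013)
cinf_2013 = one_step(CINF_EQ, 2013)
infl_2013 = infl[2012] + cinf_2013   # convert the forecast change back to a level

print(f"unemployment 2013: {unem_2013:.2f}")
print(f"change in inflation 2013: {cinf_2013:.2f}")
print(f"inflation 2013: {infl_2013:.2f}")
```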

(c) (4p) The following is the joint test result for the second lags of the unemployment rate and the
change in inflation. According to this test, would a VAR(1) model be a better
forecasting model than a VAR(2) model? Explain why.

. test L2.cinf L2.unem

( 1) [unem]L2.cinf = 0
( 2) [cinf]L2.cinf = 0
( 3) [unem]L2.unem = 0
( 4) [cinf]L2.unem = 0

chi2( 4) = 30.26
Prob > chi2 = 0.0000

(d) (4p) Why might a researcher use change in inflation as opposed to inflation in this
model? Explain.

(e) (4p) Should one use change in unemployment instead of unemployment? Explain.

Question 6 [12 points]:
Consider the panel data model:

$Y_{it} = \beta_1 X_{it} + \alpha_i + u_{it}$

where $u_{it}$ are i.i.d. and independent of the Xs with mean zero and variance $\sigma_u^2$.

(a) (3 p) Define $\tilde{X}_{it}$ and $\tilde{Y}_{it}$, the entity-demeaned values of X and Y.

(b) (3 p) Rewrite the model in terms of these demeaned variables.

(c) (3 p) Derive algebraically $\hat{\beta}_1$, the fixed-effects estimator of $\beta_1$. The fixed-effects
estimator minimizes the sum of squared residuals of the model you wrote in part (b).
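The closed form that part (c) asks for can be checked numerically. The Python sketch below assumes a model of the form Y_it = β1 X_it + α_i + u_it and computes the within estimator as the no-intercept OLS slope of entity-demeaned Y on entity-demeaned X. The data are made up and contain no error term, so the estimate should recover the slope of 2 used to generate them; the function name `within_estimator` is hypothetical.

```python
def within_estimator(panel):
    """Fixed-effects (within) slope from {entity: [(x, y), ...]}.

    Demean X and Y entity by entity, then regress demeaned Y on demeaned X
    without an intercept: sum(x~ * y~) / sum(x~ ** 2).
    """
    num = 0.0
    den = 0.0
    for obs in panel.values():
        x_bar = sum(x for x, _ in obs) / len(obs)
        y_bar = sum(y for _, y in obs) / len(obs)
        for x, y in obs:
            num += (x - x_bar) * (y - y_bar)
            den += (x - x_bar) ** 2
    return num / den

# Made-up noiseless data: Y = alpha_i + 2 * X with a different alpha_i per entity.
alphas = {"A": 10.0, "B": -3.0, "C": 0.5}
xs = {"A": [1.0, 2.0, 4.0], "B": [0.0, 5.0, 6.0], "C": [2.0, 3.0, 9.0]}
panel = {e: [(x, alphas[e] + 2.0 * x) for x in xs[e]] for e in alphas}

beta_hat = within_estimator(panel)
print(f"within estimate of the slope: {beta_hat:.4f}")
```

Demeaning removes each entity's α_i, which is why the very different intercepts in the made-up data do not affect the estimated slope.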

(d) (3 p) Show that, if $\alpha_i$ is a random variable that is independent of X and u, the
estimator

$\tilde{\beta}_1 = \ldots$

is unbiased for $\beta_1$. Explain your answer.

Bonus Question [2 points]:
The two conditions for instrument validity are corr(Zi, Xi) ≠ 0 and corr(Zi, ui) = 0. The reason for the
inconsistency of OLS is that corr(Xi, ui) ≠ 0. If X and Z are correlated, and X and u are also correlated, how
is it possible that Z and u are not correlated? Explain.

Selected Tables from Stock and Watson, Introduction to Econometrics
