You are on page 1of 8

ECONOMETRICS / QUANTITATIVE AND STATISTICAL METHODS I

Examples of questions from past final exams


1. Explain in plain words (no formulas) and with as much intuition as possible, the following
concepts.
a) The Hausman test in instrumental variables estimation
b) The random effects estimator in panel data
c) Generalized Least Squares estimation
d) The two conditions necessary for validity of an instrument in IV estimation
e) Homoskedasticity in a regression model
f) A linear probability model
g) The difference between a consistent and an unbiased estimator
h) A Chow test

2. You work for a policy institution and your boss has asked you to analyze market structure in the
US. Given your vast knowledge of economics and the data that you have available, you have come
up with the following system of three simultaneous equations that describe the market structure of
US industries:

Ai = α 0 + α 1 M i + α 2 Cd i + α 3Ci + α 4 Gri + e1,i


Ci = β 0 + β1 Ai + β 2 MES i + β 3 K i + e2,i
M i = γ 0 + γ 1 K i + γ 2 Gri + γ 3Ci + γ 4 Gd i + γ 5 Ai + γ 6 Cd i + γ 7 MES i + e3,i

Where the endogenous variables are:


Ai: Advertising intensity of industry i, measured as: 100*increase in Advertising with respect to the
previous year / Sales.
Ci: A measure of industry concentration (the higher the number, the more concentrated the
industry)
Mi: Average price-cost margin

And the exogenous (explanatory) variables are:


Cdi: An index of consumer demand in industry i (a rescaled version of total demand / sales).
Gri: growth rate of sales in the past year in industry i
Ki: An index of capital intensity in the industry (relative to sales)
MESi: Minimum efficient scale in the industry (relative to sales)
Gdi: Geographic dispersion of the industry

Analyze whether you can identify the system as it is or not. If you cannot identify the whole system,
propose a solution that will solve the issue.
ECONOMETRICS / QUANTITATIVE AND STATISTICAL METHODS I

3. You work for the Minister of the Economy, and he has asked you to do some study on the
exporting activity of your country’s firms. He would like to know whether export subsidies that were
given to firms last year were effective. You have data on 200 firms (assume it’s a representative
sample of the population). The data consist of the following variables:
Exporta: A 1 if the firm exported during the last year, a 0 if not.
Crecim: Average growth rate (of assets) of the firm during the last three years.
Tecnologia: A 1 if the firm is in a technology intensive sector, a 0 if not.
Subvencion: A 1 if the firm received an export subsidy last year, a 0 if not.

Also, you have computed the interaction variable: inter=tecnologia*subvencion.


You first estimate the following linear model

Exporta = β0+ β1*Crecim+ β2*Tecnologia+ β3*Subvencion+ β4*inter+u


Linear regression Number of obs = 200
F( 4, 195) = 83.96
Prob > F = 0.0000
R-squared = 0.5259
Root MSE = .32789

Robust
exporta Coef. Std. Err. t P>|t| [95% Conf. Interval]

crecim 7.793423 .6980351 11.16 0.000 6.416756 9.170091


tecnologia .4324247 .0567957 7.61 0.000 .3204119 .5444375
subvencion .2629767 .0674078 3.90 0.000 .1300347 .3959186
inter -.065692 .1054383 -0.62 0.534 -.273638 .1422539
_cons -.3279511 .0410276 -7.99 0.000 -.408866 -.2470363

1) Interpret the value of the β1, β2 and β4 parameters and comment on their statistical significance.

2) Use the model to estimate the probability that the following firms will export:
- Firm A: A firm from a non-technology sector that received a subsidy and grew 3% on average
during the last three years.
- Firm B: A firm from a non-technology sector that did not receive a subsidy and grew 14% on
average during the last three years.
- Firm C: A firm from a technology sector that received a subsidy and grew -6% during the last three
years.

You now estimate the probit model:

P(Exporta=1|X’s )=Φ(β0+ β1*Crecim+ β2*Tecnologia+ β3*Subvencion+ β4*inter)

There are two different commands for probit estimation, so we show the results of both:
ECONOMETRICS / QUANTITATIVE AND STATISTICAL METHODS I

. probit exporta crecim tecnologia subvencion inter ,r

Iteration 0: log pseudolikelihood = -126.83573


Iteration 1: log pseudolikelihood = -66.469425
Iteration 2: log pseudolikelihood = -52.268415
Iteration 3: log pseudolikelihood = -47.705499
Iteration 4: log pseudolikelihood = -46.949773
Iteration 5: log pseudolikelihood = -46.922123
Iteration 6: log pseudolikelihood = -46.922078

Probit regression Number of obs = 200


Wald chi2(4) = 71.27
Prob > chi2 = 0.0000
Log pseudolikelihood = -46.922078 Pseudo R2 = 0.6301

Robust
exporta Coef. Std. Err. z P>|z| [95% Conf. Interval]

crecim 63.33297 8.611385 7.35 0.000 46.45496 80.21097


tecnologia 3.216319 .4555349 7.06 0.000 2.323487 4.109151
subvencion 2.165041 .4429881 4.89 0.000 1.2968 3.033282
inter -.8701871 .6062746 -1.44 0.151 -2.058464 .3180893
_cons -6.390114 .7575184 -8.44 0.000 -7.874823 -4.905406

. dprobit exporta crecim tecnologia subvencion inter ,r

Iteration 0: log pseudolikelihood = -126.83573


Iteration 1: log pseudolikelihood = -66.469425
Iteration 2: log pseudolikelihood = -52.268415
Iteration 3: log pseudolikelihood = -47.705499
Iteration 4: log pseudolikelihood = -46.949773
Iteration 5: log pseudolikelihood = -46.922123
Iteration 6: log pseudolikelihood = -46.922078

Probit regression, reporting marginal effects Number of obs = 200


Wald chi2(4) = 71.27
Prob > chi2 = 0.0000
Log pseudolikelihood = -46.922078 Pseudo R2 = 0.6301

Robust
exporta dF/dx Std. Err. z P>|z| x-bar [ 95% C.I. ]

crecim 11.9497 2.311679 7.35 0.000 .049746 7.4189 16.4805


tecnol~a* .6926187 .0764537 7.06 0.000 .46 .542772 .842465
subven~n* .5774227 .1165358 4.89 0.000 .31 .349017 .805829
inter* -.1130552 .0547728 -1.44 0.151 .155 -.220408 -.005702

obs. P .33
pred. P .1105265 (at x-bar)

(*) dF/dx is for discrete change of dummy variable from 0 to 1


z and P>|z| correspond to the test of the underlying coefficient being 0

3) Use the above results (however you think appropriate) to interpret the effect of crecim, tecnologia
and inter.

You ask Stata for the classification table (cutoff 0.5) and get:
Probit model for exporta

True
Classified D ~D Total

+ 52 14 66
- 14 120 134

Total 66 134 200

4) Find the ratio of correctly classified observations. Are you improving over the “naive” (no-model)
predictions? Which are you identifying better, the exporting companies or the non-exporting
companies?
ECONOMETRICS / QUANTITATIVE AND STATISTICAL METHODS I

4. You have to estimate the effects of education on wages, and you are interested in estimating an
equation that looks like:

Ln(wagei)= β0+ β1*Educi+ β2*Experi+ β3*Tenurei+ β4*Techi+ β5*Femalei + β6*MSci+εi

The variables are:


- wagei is a measure of wages earned by person i in the past year.
- Educi measures the years of education of person i.
- Experi measures the total number of years that person i has been working after completing his/her
education.
- Tenurei is the number of years that person i has been working for the current company.
- Techi is 1 if the company in which person i is working is in a technology intensive sector, a 0
otherwise.
- Female i is a 1 if person i is a woman, a 0 otherwise.
- MSci is a 1 if person i has a masters degree, a 0 otherwise.

a) What is the interpretation of your parameter of interest, β1?

b) You now have collected data for a sample of people and run the above model by using simple
OLS regression (with traditional standard errors). Then your boss tells you that your data have six
possible “characteristics” which may affect the validity of your analysis. For each of the
characteristics 1-6 below (taken independently):
- Give a technical name to the problem they may cause.
- Describe how the problem will affect the results of the estimation (i.e. whether it affects the
properties of the parameter estimates, the standard errors, etc.).
- Identify and briefly describe what test you could use (if any exists!) to detect whether indeed that
problem is present.
- Describe how you can solve the problem.

CHARACTERISTICS (expressed in the words of your boss)

1) “It is well known that people with more education are more intelligent to begin with, and
intelligence affects wages, but you do not have a measure of ‘intelligence’ in your model”.

2) “The dispersion of wages is much bigger for people in executive positions than for people in non-
executive positions”.

3) “The return of education (impact of education on wages) is likely to be different for males and for
females”.

4) “The people in your sample come from the customer database of a company that sells software
products”.

5) “Total working experience and tenure in a company are variables that are very much related”

6) “Education and wages are two ‘characteristics’ of a person that are the joint result of common
factors such as the education of parents, etc…”.
ECONOMETRICS / QUANTITATIVE AND STATISTICAL METHODS I

5. You work for an investment bank and your boss has asked you to analyze investment decisions by
investors. Given your knowledge of economics and finance you have come up with a system of four
simultaneous equations that describe the way investors decide on their investments (assume that the
equations are "believable", that is, do not try to make sense out of the equation):

Ai = α 0 + α1 Bi + α 2Ci + α 3 Malei + α 4 Ed i + α 5 Risk i + α 6 Incomei + e1,i


Bi = β 0 + β1 Ai + β 2Ci + β 3 Malei + β 4 Ed i + β 5 Incomei + e2,i
Ci = γ 0 + γ 1 Ai + γ 2 Bi + γ 3 Rurali + γ 4 Incomei + e3,i
Di = φ0 + φ1 Ai + φ2 Bi + φ3Ci + φ4 Breaki + e4,i

The endogenous variables are:


Ai: Share of the investor's wealth which is invested in fixed income assets
Bi: Share of the investor's wealth which is invested in stocks
Ci: Share of the investor's wealth which is invested in cash
Di: Share of the investor's wealth which is invested in real assets (art, etc...)
(these shares do not have to add up to one).

And the exogenous (explanatory) variables are:


Malei: a dummy equal to 1 if investor i is a male, and a 0 otherwise.
Edi: years of education of investor i.
Rurali: a dummy equal to 1 if investor i lives in a rural area, and a 0 otherwise.
Riski: a measure of risk aversion of the investor (higher number, more risk averse).
Incomei: a measure of monthly income of the investor.
Breaki: a dummy equal to 1 if investor i lives in a country which gives tax breaks to investing in real
assets (art, etc.) and a 0 otherwise.

Analyze whether you can identify the system as it is or not. If you cannot identify the whole system,
propose a solution that will solve the issue.

6. You have to estimate the determinants of firm profitability, and you are interested in estimating an
equation that looks like:

ROAi = β0+ β1*Investi+ β2*Volati+ β3*pastROAi+ β4*Techi+ β5*Assetsi + β6*Growthi+εi

The variables are:


- ROAi is a measure of return on assets of company i in the current year.
- Investi measures the amount of capital investment carried out by the company during the year.
- Volati is a measure of volatility of the output of the sector where the company operates.
- pastROAi is the ROA of the company during the previous year.
- Techi is 1 if the company is in a technology intensive sector, a 0 otherwise.
- Assetsi measures the level of assets of the company, so it is a measure of company size.
- Growthi denotes the growth in the company’s assets in the year.

You have data for a sample of companies and run the above model by using simple OLS regression
(with traditional standard errors). Then your boss tells you that your data have five possible
ECONOMETRICS / QUANTITATIVE AND STATISTICAL METHODS I

“characteristics” which may affect the validity of your analysis. For each of the characteristics 1-5
below (taken independently):
a) Give a technical name to the problem they may cause.
b) Describe how the problem will affect the results of the estimation (i.e. whether it affects the
properties of the parameter estimates, the standard errors, etc.).
c) Identify and briefly describe what test you could use (if any exists!) to detect whether indeed that
problem is present.
d) Describe how you can solve the problem.

CHARACTERISTICS (expressed in the words of your boss)

1) “It is well known that for companies that are larger in size the dispersion of ROA is bigger”.

2) “Volatility of sector output and growth in the company’s assets tend to be very much related”.

3) “The bigger that investment is, the less it seems to affect ROA”.

4) “You have used a database that has information only about companies that are currently
operational”.

5) “The determinants of asset growth and of return on assets (ROA) are quite similar”.

7. The government is planning to subsidize language courses for immigrants since not knowing the
language limits the access to good jobs. In order to understand the potential impact of this policy,
you are required to estimate the effect of speaking Spanish on the wages that immigrants (from non-
Spanish-speaking countries) earn in Spain. The National Survey on Immigrants contains data
collected in 2007on wages and language proficiency (and of other characteristics, such as age,
education, gender, etc) for a random sample of immigrants.

a) You first estimate a linear regression where the regressor (Span) is a binary indicator of language
proficiency (1 if the person speaks fluent Spanish, 0 otherwise) and the dependent variable (wage)
is the net monthly wage (see the results below). What is the estimated effect of speaking fluent
Spanish on wages? Comment on the coefficient and interpret the test on significance of the effect.
. reg wage Span

Source | SS df MS Number of obs = 4528


-------------+------------------------------ F( 1, 4526) = 50.95
Model | 30571239.8 1 30571239.8 Prob > F = 0.0000
Residual | 2.7156e+09 4526 599996.284 R-squared = 0.0111
-------------+------------------------------ Adj R-squared = 0.0109
Total | 2.7462e+09 4527 606616.837 Root MSE = 774.59

------------------------------------------------------------------------------
wage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
Span | 177.757 24.90259 7.14 0.000 128.9358 226.5782
_cons | 988.6795 20.69451 47.77 0.000 948.1082 1029.251
------------------------------------------------------------------------------

b) You then extend the model to include additional regressors: age (in years), woman (a dummy
which takes value 1 for female immigrants), and a binary variable (univ) which is 1 if the
ECONOMETRICS / QUANTITATIVE AND STATISTICAL METHODS I

immigrant has a university degree. Use the results (below) to predict the salary of a 20-year old
female immigrant who speaks fluent Spanish but has no university degree.

. reg wage Span woman univ

Source | SS df MS Number of obs = 4515


-------------+------------------------------ F( 4, 4510) = 220.75
Model | 449090432 4 112272608 Prob > F = 0.0000
Residual | 2.2938e+09 4510 508597.258 R-squared = 0.1637
-------------+------------------------------ Adj R-squared = 0.1630
Total | 2.7429e+09 4514 607634.928 Root MSE = 713.16

------------------------------------------------------------------------------
wage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
Span | 144.7333 23.26845 6.22 0.000 99.11577 190.3509
age | 10.92333 1.081844 10.10 0.000 8.802389 13.04428
woman | -445.0152 21.61999 -20.58 0.000 -487.401 -402.6295
univ | 468.8543 26.5885 17.63 0.000 416.7278 520.9808
_cons | 704.486 43.78279 16.09 0.000 618.6503 790.3217
------------------------------------------------------------------------------

c) Someone tells you that Span is probably endogenous. Explain why it is reasonable to expect that
Span (speaking fluent Spanish) is endogenous to wages.

d) You believe that the age at which the immigrant arrived in Spain could be a valid instrument for
Span (speaking fluent Spanish). Explain why the age of arrival may be a valid instrument.

e) The following table shows the results of estimating the above regression by 2SLS (the instrument
arrive is a dummy which takes value 1 if the person was less than 14 years old when he/she arrived
in Spain and a 0 otherwise). Is the instrument weak? Explain.

. ivregress 2sls wage age woman univ (Span = arrive), first

First-stage regressions
-----------------------

Number of obs = 4515


F( 4, 4510) = 51.57
Prob > F = 0.0000
R-squared = 0.0437
Adj R-squared = 0.0429
Root MSE = 0.4521

------------------------------------------------------------------------------
Span | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
age | -.0016024 .0006855 -2.34 0.019 -.0029464 -.0002585
woman | .0405692 .0136925 2.96 0.003 .0137252 .0674132
univ | .1723813 .0166719 10.34 0.000 .1396963 .2050663
arrive | .2672933 .0288442 9.27 0.000 .2107444 .3238421
_cons | .6790657 .025861 26.26 0.000 .6283654 .729766
------------------------------------------------------------------------------

Instrumental variables (2SLS) regression Number of obs = 4515


Wald chi2(4) = 814.74
Prob > chi2 = 0.0000
R-squared = 0.1241
Root MSE = 729.47

------------------------------------------------------------------------------
wage | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
ECONOMETRICS / QUANTITATIVE AND STATISTICAL METHODS I

Span | 484.98 174.1167 2.79 0.005 143.7174 826.2425


age | 11.51488 1.146491 10.04 0.000 9.267795 13.76196
woman | -459.1829 23.25133 -19.75 0.000 -504.7547 -413.6111
univ | 411.1739 39.93282 10.30 0.000 332.907 489.4408
_cons | 466.4478 128.7118 3.62 0.000 214.1773 718.7183
------------------------------------------------------------------------------
Instrumented: Span
Instruments: age woman univ arrive

g) Compare the estimated effect of Span on wages in the OLS and 2SLS regressions. Why are they
so different?

You might also like