
Model Specification: Motivation

Introduction
•  Question 1: Do we include all explanatory variables or only a few?

•  Question 2: Should we transform the variables?

•  Question 3: How do we evaluate a model?


Introduction
•  Suppose that all explanatory variables in a dataset are relevant for the dependent variable. Should we include them all?

•  Answer: Not necessarily.

Example: Stock market index

Explanatory variables
•  Stock characteristics: dividends, earnings, volatility, book value, issuing activity

•  Interest-rate related: Treasury bill rates, long-term yields, corporate bond returns

•  Macroeconomic: inflation, investment, consumption

Stock index and book-to-market ratio

Log stock index and book-to-market ratio

Change in log stock index and book-to-market ratio

Regression output

•  What is the interpretation of the negative sign of book-to-market?
–  Answer: A high book-to-market ratio usually coincides with periods in which market value decreased.

Bias-efficiency trade-off


Decision metrics

•  Possible decision metrics:

–  Information criteria

–  Out-of-sample prediction

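Information criteria
•  Commonly used information criteria:

–  Akaike information criterion: AIC = log(s²) + 2k/n
–  Bayesian information criterion: BIC = log(s²) + k·log(n)/n
–  with s the standard error of the regression, k the number of variables, and n the number of observations.
•  Which information criterion imposes the strongest penalty on the number of variables?

–  Answer: The penalty per variable is 2/n for AIC and log(n)/n for BIC; BIC imposes the stronger penalty if log(n) > 2, that is, for n ≥ 8.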
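As a rough illustration of the two criteria, the sketch below computes AIC = log(s²) + 2k/n and BIC = log(s²) + k·log(n)/n for two nested OLS fits. The helper name `info_criteria`, the simulated data, and the choice s² = RSS/(n − k) are assumptions made for this example only.

```python
import numpy as np

def info_criteria(y, X):
    """Return (AIC, BIC) for an OLS fit of y on X (X includes the constant)."""
    n, k = X.shape
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ b
    s2 = resid @ resid / (n - k)            # squared standard error of regression
    aic = np.log(s2) + 2 * k / n
    bic = np.log(s2) + k * np.log(n) / n
    return aic, bic

# Hypothetical data: x2 is irrelevant for y by construction
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + rng.normal(size=n)

small = np.column_stack([np.ones(n), x1])
large = np.column_stack([np.ones(n), x1, x2])
aic_small, bic_small = info_criteria(y, small)
aic_large, bic_large = info_criteria(y, large)
print("small model:", aic_small, bic_small)
print("large model:", aic_large, bic_large)
```

The model with the lower criterion value is preferred. Because n = 200 ≥ 8 here, BIC adds a larger penalty per variable than AIC.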

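Out-of-sample prediction
•  Commonly used out-of-sample prediction metrics:

–  Root mean squared error: RMSE = sqrt( (1/nf) Σ (yi − ŷi)² )
–  Mean absolute error: MAE = (1/nf) Σ |yi − ŷi|
–  with nf the number of observations "saved" for out-of-sample evaluation, yi the i-th observed value, and ŷi the i-th predicted value of the dependent variable.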
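A minimal sketch of two such metrics, root mean squared error and mean absolute error, evaluated on a hold-out sample. The function names and the four hold-out values are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import numpy as np

def rmse(y, y_hat):
    """Root mean squared prediction error over the hold-out sample."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def mae(y, y_hat):
    """Mean absolute prediction error over the hold-out sample."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.mean(np.abs(y - y_hat)))

# Hypothetical hold-out sample with nf = 4 observations
y_out = [3.0, 5.0, 2.0, 7.0]    # observed values
y_pred = [2.0, 5.0, 4.0, 7.0]   # model predictions

print(rmse(y_out, y_pred))  # sqrt((1 + 0 + 4 + 0) / 4) = sqrt(1.25)
print(mae(y_out, y_pred))   # (1 + 0 + 2 + 0) / 4 = 0.75
```

RMSE weights large errors more heavily than MAE, so the two metrics can rank competing models differently.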

Iterative selection methods
•  Commonly used methods to select explanatory variables:
–  t-test and F-test
–  Information criteria
–  Out-of-sample predictions

•  Iterative methods (based on tests) are also commonly used:
–  General-to-specific / backward elimination
–  Specific-to-general / forward selection
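Backward elimination can be sketched as a loop that repeatedly drops the least significant regressor. The version below is one possible implementation, not a prescribed procedure: the function name, the cut-off |t| ≥ 1.96 (a normal approximation to the 5% level), and the rule that the constant is never dropped are all assumptions for illustration.

```python
import numpy as np

def backward_elimination(y, X, names, t_crit=1.96):
    """Drop the least significant regressor until all |t| >= t_crit.

    Column 0 of X is assumed to be the constant and is never dropped.
    """
    keep = list(range(X.shape[1]))
    while len(keep) > 1:
        Xk = X[:, keep]
        n, k = Xk.shape
        b = np.linalg.lstsq(Xk, y, rcond=None)[0]
        resid = y - Xk @ b
        s2 = resid @ resid / (n - k)
        se = np.sqrt(s2 * np.diag(np.linalg.inv(Xk.T @ Xk)))
        t_abs = np.abs(b / se)
        j = 1 + int(np.argmin(t_abs[1:]))   # weakest non-constant regressor
        if t_abs[j] >= t_crit:
            break                           # everything left is significant
        keep.pop(j)
    return [names[i] for i in keep]

# Hypothetical data: only x1 matters for y by construction
rng = np.random.default_rng(0)
n = 500
x1, x2, x3 = rng.normal(size=(3, n))
y = 1.0 + 2.0 * x1 + rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2, x3])

selected = backward_elimination(y, X, ["const", "x1", "x2", "x3"])
print(selected)
```

Forward selection runs the same idea in the opposite direction: start from the constant only and add the most significant candidate at each step.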

Data transformation
•  Setting:

•  What is the most appropriate form of the data (y and X)?

–  All variables should be incorporated in a compatible manner.

–  If this is not the case, data can be transformed.

Taking logarithms

•  Use for:

–  Exponential growth.

Taking differences
•  Use for:
•  Trending patterns
–  Statistical assumptions may not hold.

First difference: Δyt = yt − yt−1
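The effect of both transformations can be seen on a series with exponential growth. The series below (a level growing 5% per period from 100) is a made-up example: the differences in levels keep growing, while the differences in logs are constant.

```python
import numpy as np

# Hypothetical index level growing 5% per period
t = np.arange(10)
level = 100.0 * 1.05 ** t

log_level = np.log(level)        # exponential growth becomes a linear trend
diff_level = np.diff(level)      # y_t - y_{t-1}: still increasing over time
diff_log = np.diff(log_level)    # log(y_t) - log(y_{t-1}): constant at log(1.05)

print(diff_level[:3])
print(diff_log[:3])
```

The change in logs is approximately the growth rate, which is why the change in the log stock index is a natural variable to model.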

Non-linear effects

•  Advantages:
–  Get non-linear functional form.
–  May provide economically meaningful specification.


Endogeneity
•  OLS requires some assumptions:
–  Explanatory variables should be exogenous.
–  Violation of this: endogeneity.

•  We want to explain

–  Number of flights at an airport per month (y) using

–  Number of travel insurances sold in the previous month (x)

•  Suppose OLS yields

–  y = 10000 + 0.25x + e

Endogeneity
•  Given the estimates (y: flights, x: insurances)
y = 10000 + 0.25x + e
•  Correct: 4,000 insurances sold → expected number of flights = 10000 + 0.25*4000 = 11000
–  High x tends to go together with high y.
–  The identified correlation yields adequate predictions.

•  Incorrect: Selling 4,000 additional insurances causes 0.25*4000 = 1000 additional flights
•  The regression does not identify a causal impact!
•  A third variable (travel demand) affects both y (flights) and x (insurances).

Stochastic vs. non-stochastic regressors
•  Standard assumptions for the linear model (y = Xβ + e) include
A2 Explanatory variables are non-stochastic

•  Implications:
–  Obtain new data: X stays constant (and y changes)
–  Need "controlled experiment"
–  OLS estimator b converges to the true coefficient β for n → ∞ (OLS is consistent)

Economic models
•  In economics:
–  Controlled (or natural) experiments are rare
–  New data with the same X cannot be obtained
–  Explanatory variables are stochastic!
•  If X is stochastic:
–  New data set → new X values
–  X can be correlated with other variables
–  If X is correlated with e
•  X is endogenous
•  There is another variable that affects y and X
–  OLS does not properly estimate β (inconsistent)
–  If X is uncorrelated with e
•  X is exogenous
•  OLS is consistent

Omitted variable - Example
•  Model a student's grade using attendance at lectures.

–  Which omitted factor would lead to endogeneity of attendance?

•  Three possible omitted factors:

1 Difficulty of exam
•  NO: not correlated with attendance.
2 Motivation of the students
•  YES: correlates with attendance and affects grade.
3 Compulsory attendance yes/no
•  NO: does not directly impact the grade.
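The motivation case can be illustrated with a small simulation. All numbers below (a true attendance effect of 0.5, the strength of the motivation effect, the sample size) are invented for the example; the point is only that omitting a variable that drives both attendance and grade distorts the OLS coefficient.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000

# Hypothetical data-generating process: motivation drives attendance AND grade
motivation = rng.normal(size=n)
attendance = 0.8 * motivation + rng.normal(size=n)
grade = 6.0 + 0.5 * attendance + 1.0 * motivation + rng.normal(size=n)

def ols(y, *cols):
    """OLS coefficients of y on a constant plus the given columns."""
    X = np.column_stack([np.ones(len(y)), *cols])
    return np.linalg.lstsq(X, y, rcond=None)[0]

b_short = ols(grade, attendance)               # motivation omitted
b_long = ols(grade, attendance, motivation)    # motivation included

print("attendance effect, motivation omitted: ", b_short[1])
print("attendance effect, motivation included:", b_long[1])
```

With motivation omitted, the attendance coefficient also picks up part of the motivation effect and overstates the true value of 0.5 used in this simulation.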

Other examples - Strategic behavior
•  Consider a model explaining demand using price.

•  Strategic price setting:

1 A high price is set when high demand is expected

2 Price and sales are positively correlated

3 Price will be endogenous in a regression of demand on price.

Other examples - Measurement errors

•  y (e.g. salary) depends on x* (e.g. intelligence)

•  x* (intelligence) is difficult to observe

•  x = x* + measurement error: a noisy measurement (e.g. IQ score)

•  Because of the measurement error, x is endogenous in y = α + βx + ε

Endogeneity
•  Common problem in economics

1 Omitted variables
2 Strategic behavior
3 Measurement errors

→ X is correlated with ε

•  Endogeneity violates the basic assumptions

Simulated example, y = 1 + 2x* + u

Measurement error example

•  Under measurement error (and endogeneity in general):

–  we obtain the wrong coefficients!

Direction of bias in the measurement error case
•  OLS is "biased towards zero"

–  → OLS underestimates the true effect

•  Intuitively:
–  x-values on the left likely have negative measurement errors
–  x-values on the right likely have positive measurement errors

•  Measurement errors "stretch" the scatter in the horizontal direction
–  → a flatter regression line
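The flattening can be reproduced with a simulation of the model y = 1 + 2x* + u. The distributions, the sample size, and the measurement error variance (set equal to the variance of x*) are assumptions for this sketch, not values taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000

x_star = rng.normal(size=n)              # true regressor (unobserved in practice)
u = rng.normal(size=n)
y = 1.0 + 2.0 * x_star + u               # true slope is 2

x = x_star + rng.normal(size=n)          # observed regressor with measurement error

# np.polyfit returns [slope, intercept] for a degree-1 fit
slope_true_x = np.polyfit(x_star, y, 1)[0]
slope_noisy_x = np.polyfit(x, y, 1)[0]

print("slope using x*:", slope_true_x)
print("slope using x: ", slope_noisy_x)
```

With equal variances for x* and the measurement error, the slope on the noisy x shrinks to about half the true slope, which is the bias towards zero described above.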

OLS in presence of endogeneity

•  If X is endogenous

–  X is correlated with ε

–  OLS estimator for β is not consistent

–  Even with an infinite amount of data, OLS does not give useful estimates

"Solving endogeneity": Graphical representation

Instrumental variable estimation
•  Z variables are instruments if
–  Z and X are correlated
–  Z does not correlate with ε

•  Correlation between instruments and y is only due to X

•  Use instruments to estimate β

"Solving endogeneity": Graphical representation
1 Use Z to decompose X into an explained and an unexplained part
2 Effect size of explained part on y equals β
3 Unexplained part is added to error term

Endogeneity is solved as
•  X unexplained is not correlated with X explained
•  X explained is exogenous
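The three steps correspond to two-stage least squares (2SLS). A minimal sketch on simulated data follows; the data-generating process (true β = 2, an unobserved factor `w` that makes x endogenous, one instrument `z`) is hypothetical.

```python
import numpy as np

def ols(y, X):
    """OLS coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(3)
n = 50_000

z = rng.normal(size=n)                    # instrument: affects x, not in error term
w = rng.normal(size=n)                    # unobserved factor affecting x and y
x = 1.0 * z + w + rng.normal(size=n)
y = 1.0 + 2.0 * x - 1.5 * w + rng.normal(size=n)   # true beta = 2

const = np.ones(n)
X = np.column_stack([const, x])
Z = np.column_stack([const, z])

b_ols = ols(y, X)                         # inconsistent: x correlates with the error

# Step 1: explained part of x from the first-stage regression on Z
x_hat = Z @ ols(x, Z)
# Steps 2-3: regress y on the explained part; the unexplained part of x
# is absorbed by the error term
b_2sls = ols(y, np.column_stack([const, x_hat]))

print("OLS slope: ", b_ols[1])
print("2SLS slope:", b_2sls[1])
```

The second-stage coefficients are the 2SLS estimates, but the standard errors printed by a plain second-stage OLS would not be correct; a dedicated IV routine is needed for inference.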

Finding instruments
What are good instruments?

•  All exogenous variables in X (incl. constant)

•  Other instruments are always needed:

–  At least one for every endogenous variable

–  Want: strong correlation between Z and X

–  Need: no correlation between Z and ε

Examples of instruments

•  Explain obtained grade using attendance:
–  Potential instruments:
•  Travel time from home to university
•  Policy change to obligatory attendance

•  What variable would be an instrument for price when modeling consumer sales of ice cream using sales = α + β·price + ε?
–  Potential instruments?
1 Prices of raw materials
2 Competitor prices
3 Outside temperature

Summary

If X is in fact exogenous
–  OLS and 2SLS are both consistent
–  Variance of OLS is smaller than variance of 2SLS!
→ Use OLS

If X is endogenous
–  2SLS is consistent
–  OLS is inconsistent
→ Use 2SLS

Examples of instruments

•  Explain obtained grade using attendance:
–  Potential instruments:
•  Travel time from home to university
•  Policy change to obligatory attendance

•  What variable would be an instrument for price when modeling consumer sales of ice cream using sales = α + β·price + ε?
–  Potential instruments?
1 Prices of raw materials (valid)
2 Competitor prices (direct influence on sales, so part of ε)
3 Outside temperature (direct influence on sales, so part of ε)

Testing the validity of instruments

•  Valid instruments satisfy three conditions

1 There are enough instruments
–  Easy! Just count.

2 Instruments are correlated (enough) with X
–  Check significance of instruments in first-stage regression

3 Instruments are not correlated with ε
–  Perform Sargan test

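Test correlation Z vs. X
•  X1 potentially endogenous variables
•  X2 exogenous variables
•  Z = (Z*, X2) instruments

First-stage regression: apply OLS to the regression of each variable in X1 on the instruments Z = (Z*, X2), and check the joint significance of the coefficients of Z*.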
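One way to check the strength of the extra instruments Z* is an F-test in the first-stage regression, comparing the fit with and without Z*. The sketch below assumes a single endogenous variable; the helper names and the simulated instruments are hypothetical.

```python
import numpy as np

def rss(y, X):
    """Residual sum of squares of an OLS fit of y on X."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    r = y - X @ b
    return float(r @ r)

def first_stage_F(x1, Z_star, X2):
    """F statistic for H0: coefficients on Z_star are zero in x1 ~ X2 + Z_star."""
    n = len(x1)
    Z = np.column_stack([X2, Z_star])
    q = Z.shape[1] - X2.shape[1]          # number of extra instruments tested
    rss_restricted = rss(x1, X2)          # first stage without Z_star
    rss_unrestricted = rss(x1, Z)         # first stage with Z_star
    return ((rss_restricted - rss_unrestricted) / q) / (
        rss_unrestricted / (n - Z.shape[1]))

rng = np.random.default_rng(4)
n = 1000
X2 = np.ones((n, 1))                      # exogenous variables: constant only
z_strong = rng.normal(size=n)
z_unrelated = rng.normal(size=n)
x1 = 0.5 * z_strong + rng.normal(size=n)  # x1 depends on z_strong only

F_strong = first_stage_F(x1, z_strong, X2)
F_unrelated = first_stage_F(x1, z_unrelated, X2)
print(F_strong, F_unrelated)
```

A large F statistic indicates that Z* is clearly correlated with X1; a frequently quoted rule of thumb treats a first-stage F below about 10 as a sign of weak instruments.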


Sargan test

Notes on the Sargan test
•  Test only works when there are "too many" instruments (m > k, with m the number of instruments and k the number of explanatory variables)

•  At least k of the instruments should be valid

•  Test cannot indicate which instruments are invalid!
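The test procedure itself is not reproduced in the text above, so the sketch below uses one common textbook form as an assumption: estimate by 2SLS, regress the 2SLS residuals on all instruments Z, and compare n·R² with a χ²(m − k) distribution. The simulated data and the function names are hypothetical.

```python
import numpy as np

def two_sls(y, X, Z):
    """2SLS coefficients: regress y on the part of X explained by Z."""
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]

def sargan(y, X, Z):
    """Return (n * R^2, m - k) from regressing 2SLS residuals on Z."""
    n = len(y)
    e = y - X @ two_sls(y, X, Z)          # residuals use the original X
    e_fit = Z @ np.linalg.lstsq(Z, e, rcond=None)[0]
    r2 = 1.0 - ((e - e_fit) ** 2).sum() / ((e - e.mean()) ** 2).sum()
    return n * r2, Z.shape[1] - X.shape[1]

rng = np.random.default_rng(5)
n = 5000
z1, z2, w = rng.normal(size=(3, n))
x = z1 + z2 + w + rng.normal(size=n)                  # w makes x endogenous
y_ok = 1.0 + 2.0 * x - 1.5 * w + rng.normal(size=n)   # z1, z2 both valid
y_bad = y_ok + 1.0 * z2                               # z2 affects y directly: invalid

const = np.ones(n)
X = np.column_stack([const, x])           # k = 2
Z = np.column_stack([const, z1, z2])      # m = 3

stat_ok, df = sargan(y_ok, X, Z)
stat_bad, _ = sargan(y_bad, X, Z)
print(stat_ok, stat_bad, df)              # compare with 3.84 (chi2(1), 5% level)
```

With m − k = 1, values above 3.84 reject the validity of the instruments at the 5% level. As the notes say, a rejection does not tell which instrument is the invalid one.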

Testing for exogeneity of variables: Hausman test
•  Intuition:
–  Use the instruments to split potentially endogenous variables into
•  1 a guaranteed exogenous part
•  2 a potentially endogenous part

•  Check whether the endogenous and exogenous parts affect y differently

Hausman test - procedure
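The steps of the procedure are not reproduced in the text above. The sketch below follows one common regression-based variant, which is an assumption here: (1) regress y on X by OLS and keep the residuals e, (2) regress the potentially endogenous variable on Z and keep the residuals v, (3) regress e on X and v and compare n·R² with a χ² distribution whose degrees of freedom equal the number of potentially endogenous variables. Data and names are hypothetical.

```python
import numpy as np

def ols_resid(y, X):
    """Residuals of an OLS fit of y on X."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    return y - X @ b

def hausman_lm(y, X, x_endog, Z):
    """n * R^2 from regressing OLS residuals on X and first-stage residuals."""
    n = len(y)
    e = ols_resid(y, X)                   # step 1: OLS residuals of y on X
    v = ols_resid(x_endog, Z)             # step 2: potentially endogenous part of x
    u = ols_resid(e, np.column_stack([X, v]))   # step 3: auxiliary regression
    r2 = 1.0 - (u @ u) / (e @ e)          # e has mean zero (X contains a constant)
    return n * r2

rng = np.random.default_rng(6)
n = 5000
z = rng.normal(size=n)
w = rng.normal(size=n)                    # unobserved factor: x is endogenous
x = z + w + rng.normal(size=n)
y = 1.0 + 2.0 * x - 1.5 * w + rng.normal(size=n)

const = np.ones(n)
X = np.column_stack([const, x])
Z = np.column_stack([const, z])

stat = hausman_lm(y, X, x, Z)
print(stat)                               # compare with 3.84 (chi2(1), 5% level)
```

A value above the critical value rejects exogeneity of x, in which case 2SLS is preferred over OLS.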
