Professional Documents
Culture Documents
Empirical Methods in CF
Lecture 9 – Common Limits & Errors
2
Background readings for today
n Ali, Klasa, and Yeung (RFS 2009)
n Gormley and Matsa (RFS 2014)
3
Outline for Today
n Quick review of last lecture on RDD
n Discuss common limitations and errors
q Our data isn’t perfect
q Hypothesis testing mistakes
q How not to control for unobserved heterogeneity
4
Quick Review[Part 1]
5
Quick Review[Part 2]
n Estimation of sharp RDD
yi = α + β di + f ( xi − x ') + di × g ( xi − x ') + ui
6
Quick Review[Part 3]
n Estimation of fuzzy RDD is similar…
yi = α + β di + f ( xi − x ') + ui
7
Quick Review [Part 4]
n What are some standard internal validity tests
you might want to run with RDD?
q Answers:
n Check robustness to different polynomial orders
n Check robustness to bandwidth
n Graphical analysis to show discontinuity in y
n Compare other characteristics of firms around cutoff
threshold to make sure no other discontinuities
n And more…
8
Quick Review [Part 5]
n If effect of treatment is heterogeneous,
how does this affect interpretation of RDD
estimates?
q Answer = They take on local average
treatment effect interpretation, and fuzzy RDD
captures only effect of compliers. Neither is
problem for internal validity, but can
sometimes limit external validity of finding
9
Common Limitations & Errors – Outline
n Data limitations
n Hypothesis testing mistakes
n How to control for unobserved heterogeneity
10
Data limitations
n The data we use is almost never perfect
q Variables are often reported with error
q Exit and entry into dataset typically not random
q Datasets only cover certain types of firms
11
Measurement error – Examples
n Variables are often reported with error
q Sometimes it is just innocent noise
n E.g. Survey respondents self report past income
with error [because memory isn’t perfect]
12
Measurement error – Why it matters
n Answer = Depends; but in general, hard
to know exactly how this will matter
q If y is mismeasured…
n If only random noise, just makes SEs larger
n But if systematic in some way [as in second example],
can cause bias if error is correlated with x’s
q If x is mismeasured…
n Even simple CEV causes attenuation bias on
mismeasured x and biases on all other variables
13
Measurement error – Solution
n Standard measurement error solutions
apply [see “Causality” lecture]
q Though admittedly, measurement error is
difficult to deal with unless know exactly
source and nature of the error
14
Survivorship Issues – Examples
n In other cases, observations are included
or missing for systematic reasons; e.g.
q Ex. #1 – Firms that do an IPO and are
added to datasets that cover public firms may
be different than firms that do not do an IPO
q Ex. #2 – Firms adversely affected by some
event might subsequently drop out of data
because of distress or outright bankruptcy
n How can these issues be problematic?
15
Survivorship Issues – Why it matters
n Answer = There is a selection bias,
which can lead to incorrect inferences
q Ex. #1 – E.g. going public may not cause
high growth; it’s just that the firms going
public were going to grow faster anyway
q Ex. #2 – Might not find adverse effect of
event (or might understate it’s effect) if some
affected firms go bankrupt and are dropped
16
Survivorship Issues – Solution
n Again, no easy solutions; but, if worried
about survivorship bias…
q Check whether treatment (in diff-in-diff) is
associated with observations being more or
less likely to drop from data
q In other analysis, check whether covariates
of observations that drop are systematically
different in a way that might be important
17
Sample is limited – Examples
18
Sample is limited – Why it matters
19
Sample is limited – Solution
20
Interesting Example of Data Problem
21
Ali, et al. – Example data problem [P1]
22
Ali, et al. – Example data problem [P2]
n Ali, et al. (RFS 2009) found it mattered;
using Census measure overturns four
previously published results
q E.g. Concentration is positively related to R&D,
not negatively related as previously argued
q See paper for more details...
23
Common Limitations & Errors – Outline
n Data limitations
n Hypothesis testing mistakes
n How to control for unobserved heterogeneity
24
Hypothesis testing mistakes
n As noted in lecture on natural experiments,
triple-difference can be done by running
double-diff in two separate subsamples
q E.g. estimate effect of treatment on small firms;
then estimate effect of treatment on large firms
25
Example inference from such analysis
Small
Large
Low
D/E
High
D/E
Sample
=
Firms Firms Firms Firms
Treatment
*
Post 0.031 0.104** 0.056 0.081***
(0.121) (0.051) (0.045) (0.032)
26
Be careful making such claims!
27
Example triple interaction result
All
Sample
=
Firms
Treatment
*
Post 0.031
(0.121)
Difference is not actually
Treatment
*
Post
*
Large 0.073
(0.065)
statistically significant
N 5,432
R-‐squared 0.12 Remember to interact year
Firm
dummies X dummies & triple difference;
Year
dummies X
otherwise, estimates won’t
Year
*
Large
dummies X
match earlier subsamples
28
Practical Advice
n Don’t make claims you haven’t tested;
they could easily be wrong!
q Best to show relevant p-values in text or tables
for any statistical significance claim you make
q If difference isn’t statistically significant
[e.g. p-value = 0.15], can just say so; triple-diffs
are noisy, so this isn’t uncommon
q Or, be more careful in your wording…
n I.e. you could instead say, “we found an effect for large
firms, but didn’t find much evidence for small firms”
29
Common Limitations & Errors – Outline
n Data limitations
n Hypothesis testing mistakes
n How to control for unobserved heterogeneity
q How not to control for it
q General implications
q Estimating high-dimensional FE models
30
Unobserved Heterogeneity – Motivation
n Controlling for unobserved heterogeneity is a
fundamental challenge in empirical finance
q Unobservable factors affect corporate policies and prices
q These factors may be correlated with variables of interest
31
Many different strategies are used
n As we saw earlier, FE can control for unobserved
heterogeneities and provide consistent estimates
n But, there are other strategies also used to control
for unobserved group-level heterogeneity…
q “Adjusted-Y” (AdjY) – dependent variable is
demeaned within groups [e.g. ‘industry-adjust’]
q “Average effects” (AvgE) – uses group mean of
dependent variable as control [e.g. ‘state-year’ control]
32
AdjY and AvgE are widely used
n In JF, JFE, and RFS…
q Used since at least the late 1980s
q Still used, 60+ papers published in 2008-2010
q Variety of subfields; asset pricing, banking,
capital structure, governance, M&A, etc.
33
But, AdjY and AvgE are inconsistent
34
The underlying model [Part 1]
n Recall model with unobserved heterogeneity
yi , j = β X i , j + fi + ε i , j
35
The underlying model [Part 2]
36
The underlying model [Part 3]
37
We already know that OLS is biased
True model is: yi , j = β X i , j + fi + ε i , j
38
Adjusted-Y (AdjY)
n Tries to remove unobserved group heterogeneity by
demeaning the dependent variable within groups
1
where yi =
J
∑ (β X
k ∈group i
i,k + fi + ε i ,k )
39
Example AdjY estimation
40
Here is why…
n Rewriting the group mean, we have:
yi = fi + β X i + ε i ,
41
AdjY can have omitted variable bias
n β̂ adjY can be inconsistent when β ≠ 0
True model: yi , j − yi = β X i , j − β X i + ε i , j − ε i
y
But, AdjY estimates: i , j − yi = β AdjY
X i, j + u AdjY
i, j
42
Now, add a second variable, Z
43
AdjY estimates with 2 variables
n With a bit of algebra, it is shown that:
⎡ β (σ XZ σ ZX − σ Z2σ XX ) + γ (σ XZ σ ZZ − σ Z2σ XZ ) ⎤
⎢β + ⎥
⎡ βˆ ⎤ ⎢
AdjY
σ Z σ X − σ XZ
2 2 2
⎥
⎢ AdjY ⎥ = ⎢ ⎥
⎢⎣γˆ ⎥⎦ ⎢ β (σ σ
XZ XX − σ X ZX )
2
σ + γ (σ σ
XZ XZ − σ X ZZ ) ⎥
2
σ
⎢γ + σ σ
2 2
− σ 2 ⎥
⎣ Z X XZ ⎦
44
Average Effects (AvgE)
45
Average Effects (AvgE)
46
AvgE has measurement error bias
47
AvgE has measurement error bias
n Recall that group mean is given by yi = fi + β X i + ε i ,
q Therefore, yi measures fi with error − β X i − ε i
q As is well known, even classical measurement error
causes all estimated coefficients to be inconsistent
48
AvgE estimate of β with one variable
βˆ AvgE = β +
( ) (
σ Xf βσ fX + β 2σ X2 + σ ε2 − σ εε − βσ XX σ 2f + βσ fX + σ εε )
( )
σ X2 σ 2f + 2βσ fX + β 2σ X2 + σ ε2 − (σ Xf + βσ XX )
2
Determining
magnitude and Covariance between X and X
again problematic, but not Even non-i.i.d.
direction of bias
needed for AvgE estimate to nature of errors
is difficult
be inconsistent can affect bias!
49
Comparing OLS, AdjY, and AvgE
n Can use analytical solutions to compare
relative performance of OLS, AdjY, and AvgE
n To do this, we re-express solutions…
q We use correlations (e.g. solve bias in terms of
correlation between X and f, ρ Xf , instead of σ Xf )
q We also assume i.i.d. errors [just makes bias of
AvgE less complicated]
q And, we exclude the observation-at-hand when
calculating the group mean, X i , …
50
Why excluding Xi doesn’t help
n Quite common for researchers to exclude
observation at hand when calculating group mean
q It does remove mechanical correlation between X and
omitted variable, X i , but it does not eliminate the bias
q In general, correlation between X and omitted variable, X i ,
is non-zero whenever X i is not the same for every group i
q This variation in means across group is almost
assuredly true in practice; see paper for details
51
ρXf has large effect on performance
Estimate, β̂ AdjY more biased
than OLS, except for
True β = 1
2
AdjY
0
AvgE gives
wrong sign for Other parameters held constant
AvgE low values of ρXf
-1
σ f / σ X = σ ε / σ X = 1, J = 10, ρ X , X = 0.5
i −i
-2
AvgE
0.75
AdjY
0.5
0 5 10 15 20 25
σ f / σ X = σ ε / σ X = 1, ρ X , X = 0.5, ρ Xf = 0.25 J
i −i
53
Summary of OLS, AdjY, and AvgE
54
Fixed effects (FE) estimation
n Recall: FE adds dummies for each group to OLS
estimation and is consistent because it directly
controls for unobserved group-level heterogeneity
n Can also do FE by demeaning all variables with respect
to group [i.e. do ‘within transformation’] and use OLS
y
FE estimates: i , j − yi = β FE
( i, j i ) i, j
X − X + u FE
True model: yi , j − yi = β ( X i , j − X i ) + (ε i , j − ε i )
55
Comparing FE to AdjY and AvgE
n To estimate effect of X on Y controlling for Z
q One could regress Y onto both X and Z… Add group FE
q Or, regress residuals from regression of Y on Z
onto residuals from regression of X on Z Within-group
transformation!
§ AdjY and AvgE aren’t the same as finding the
effect of X on Y controlling for Z because...
§ AdjY only partials Z out from Y
§ AvgE uses fitted values of Y on Z as control
56
The differences will matter! Example #1
n Consider the following capital structure regression:
( D / A)i ,t = α + βXi,t + fi + ε i ,t
q (D/A)it = book leverage for firm i, year t
q Xit = vector of variables thought to affect leverage
q fi = firm fixed effect
57
Estimates vary considerably
De p e n d e n t variab le = b o o k le ve rag e
58
The differences will matter! Example #2
59
Estimates vary considerably
Dependent
Variable
=
Tobin's
Q
OLS Adj Y Avg E FE
60
Common Limitations & Errors – Outline
n Data limitations
n Hypothesis testing mistakes
n How to control for unobserved heterogeneity
q How not to control for it
q General implications
q Estimating high-dimensional FE models
61
General implications
62
Other AdjY estimators are problematic
n Same problem arises with other AdjY estimators
q Subtracting off median or value-weighted mean
q Subtracting off mean of matched control sample
[as is customary in studies if diversification “discount”]
q Comparing “adjusted” outcomes for treated firms pre-
versus post-event [as often done in M&A studies]
q Characteristically adjusted returns [as used in asset pricing]
63
AdjY-type estimators in asset pricing
n Common to sort and compare stock returns across
portfolios based on a variable thought to affect returns
n But, returns are often first “characteristically adjusted”
q I.e. researcher subtracts the average return of a benchmark
portfolio containing stocks of similar characteristics
q This is equivalent to AdjY, where “adjusted returns” are
regressed onto indicators for each portfolio
64
Asset Pricing AdjY – Example
n Asset pricing example; sorting returns based
on R&D expenses / market value of equity
Characteristically
adjusted
returns
by
R&D
Quintile
(i.e.,
Adj Y)
Missing Q1 Q2 Q3 Q4 Q5
-‐0.012*** -‐0.033*** -‐0.023*** -‐0.002 0.008 0.020***
(0.003) (0.009) (0.008) (0.007) (0.013) (0.006)
Difference between
We use industry-size benchmark portfolios Q5 and Q1 is 5.3
and sorted using R&D/market value percentage points
65
Estimates vary considerably
Dependent
Variable
=
Yearly
Stock
Return
Adj Y FE
R&D
Missing 0.021** 0.030***
(0.009) (0.010) Same AdjY result,
R&D
Quintile
2 0.01 0.019 but in regression
(0.013) (0.014) format; quintile 1
R&D
Quintile
3 0.032*** 0.051*** is excluded
(0.012) (0.018)
R&D
Quintile
4 0.041*** 0.068***
(0.015) (0.020) Use benchmark-period
R&D
Quintile
5 0.053*** 0.094*** FE to transform both
(0.011) (0.019) returns and R&D; this is
Observations 144,592 144,592 equivalent to double sort
2
R 0.00 0.47
66
What if AdjY or AvgE is true model?
§ If data exhibited structure of AvgE estimator,
this would be a peer effects model
[i.e. group mean affects outcome of other members]
§ In this case, none of the estimators (OLS, AdjY,
AvgE, or FE) reveal the true β [Manski 1993;
Leary and Roberts 2010]
67
Common Limitations & Errors – Outline
n Data limitations
n Hypothesis testing mistakes
n How to control for unobserved heterogeneity
q How not to control for it
q General implications
q Estimating high-dimensional FE models
68
Multiple high-dimensional FE
69
LSDV is usually needed with two FE
70
Why such models can be problematic
71
This is growing problem
72
But, there are solutions!
73
#1 – Interacted fixed effects
n Combine multiple fixed effects into one-
dimensional set of fixed effect, and remove
using within transformation
q E.g. firm and industry-year FE could be
replaced with firm-industry-year FE
74
#2 – Memory-saving procedures
n Use properties of sparse matrices to reduce
required memory, e.g. Cornelissen (2008)
n Or, instead iterate to a solution, which
eliminates memory issue entirely, e.g.
Guimaraes and Portugal (2010)
q See paper for details of how each works
q Both can be done in Stata using user-written
commands FELSDVREG and REGHDFE
75
These methods work…
76
Summary of Today [Part 1]
77
Summary of Today [Part 2]
78
In First Half of Next Class
n Matching
q What it does…
q And, what it doesn’t do
79
Assign papers for next week…
n Gormley and Matsa (working paper, 2015)
q Corporate governance & playing it safe preferences
80
Break Time
n Let’s take our 10 minute break
n We’ll do presentations when we get back
81