
UNIVARIATE REGRESSION:

SCALING VARIABLES: If you scale the dependent variable, the original confidence intervals, interpretations, etc. remain the same; only the beta and the alpha change, and the SEs scale by the same factor (so t-stats are unchanged). Suppose y was in $000s and you want it in raw dollars: the beta and the alpha both get multiplied by 1,000.
Scaling the independent variable changes the beta and its SE ONLY. Suppose your x was ROE in % and you want it in decimals: if you multiply ROE by k, divide the beta by k. The alpha remains the same.
If you want to scale both y and x, combine the two rules: multiply the alpha and the beta by the dependent-variable scaling factor, then divide the beta by the independent-variable scaling factor. Interpretations never change.

SHIFTING X & Y VARIABLES BY ADDING SOMETHING TO X OR Y: You can add or subtract a constant from x or y and nothing of substance changes. The intercept changes, but the slope and the conclusions remain the same; there is no change in the causal inference either.
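A minimal numpy sketch of the scaling rules above, on simulated data (variable names and parameter values are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)                     # illustrative simulated data
educ = rng.uniform(8, 20, 500)
wage_k = 3 + 0.5 * educ + rng.normal(0, 2, 500)    # y measured in $000s

X = np.column_stack([np.ones_like(educ), educ])
b_k, *_ = np.linalg.lstsq(X, wage_k, rcond=None)          # y in $000s
b_raw, *_ = np.linalg.lstsq(X, 1000 * wage_k, rcond=None) # y in raw dollars
print(np.allclose(1000 * b_k, b_raw))  # True: alpha and beta both scale by 1000

# Scaling x: dividing the regressor by 100 multiplies its beta by 100,
# while the intercept is unchanged.
b_x, *_ = np.linalg.lstsq(np.column_stack([np.ones_like(educ), educ / 100]),
                          wage_k, rcond=None)
print(np.allclose(b_x, [b_k[0], 100 * b_k[1]]))  # True
```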
LOG-ING Y Variable:
ln(wage) = α + β·education + u
Δln(wage) = β·Δeducation
100 × Δln(wage) = (100β)·Δeducation
100 × Δwage/wage ≈ (100β)·Δeducation
%Δwage ≈ (100β)·Δeducation
The proportionate change in y for a given change in x is assumed constant. The change in y itself is not assumed constant: it gets larger as x increases. The right interpretation is that a 1-unit change in x leads to a (beta × 100)% change in the y variable.
Example: ln(wage) = 0.584 + 0.083·education. What does an additional year of education get you? Answer: an average increase of about 8.3% in wages. The problem with this specification is that it ignores non-linearities, i.e., the bigger effects from graduating high school or finishing college with a diploma.

LOG-ING X Variable: Suppose you estimated the CEO salary model using logs and got the following, where salary is expressed in $000s:
salary = 4.822 + 1,812.5·ln(sales). What is the interpretation of 1,812.5? Answer: for each 1% increase in sales, salary increases by about $18,125. This is because beta/100 is the change in y for a 1% change in x.

LOG-ING X & Y Variable: ln(salary) = 4.822 + 0.257·ln(sales). What is the interpretation of 0.257? Answer: for each 1% increase in sales, salary increases by about 0.257% (an elasticity).
RESCALING LOGS DOES NOT MATTER: What happens if you rescale a logged dependent or logged independent variable? Multiplying inside a log is just adding a constant, ln(c·y) = ln(c) + ln(y), so the beta does not change; only the intercept does.

WHEN THE TRUE % CHANGE IN y IS LARGE, the approximation 100·Δln(y) ≈ %Δy no longer holds. The exact change is:
%Δy = 100 × [exp(β(x′ − x)) − 1]
So if β = −1.386, the true change in y is not 100 × (−1.386) = −138.6%; it is 100 × (e^(−1.386) − 1) = 100 × (0.25 − 1) = −75%.

USE OF LOGS: Logs of y or x can mitigate the influence of outliers. It is helpful to take logs for variables with positive currency amounts (assets, liabilities, etc.) and for large integer values (e.g., population). BUT do not take logs for variables measured in years or as proportions (ROA, ROI, leverage, any ratio).

What is the % change in unemployment if it goes from 10% to 9%? A 10% drop, OR a 1 percentage point drop.
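A quick check of the exact formula's arithmetic (β = −1.386 is the value from the example above):

```python
import numpy as np

beta = -1.386                        # coefficient on a dummy in a log-y model
approx = 100 * beta                  # -138.6%: the approximation breaks down
exact = 100 * (np.exp(beta) - 1)     # 100 * (0.25 - 1) = -75%
print(approx, round(exact, 1))
```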
DUMMY VARIABLES: Examples of categorical dummies: male/female, employed/unemployed. Examples of cross-sectional comparison dummies: firm size above/below the median, or CEO pay above/below the median.
Interpretation of single-dummy models: wage = β0 + δ0·female + β1·educ. δ0 measures the difference in wage between males and females with the same level of education; female is the dummy variable. Wage for males = β0 + β1·educ; wage for females = (β0 + δ0) + β1·educ. The intercepts are β0 and (β0 + δ0): a single dummy just shifts the intercept, it does not affect the slope.
Interpretations of single dummies: 1) wage = −1.57 − 1.8·female + 0.57·educ + 0.03·exper + 0.14·tenure. Female is the dummy, and the −1.8 coefficient means that, on average, women earn $1.80/hour less than men, other things the same. 2) ln(price) = −1.35 + 0.17·ln(lotsize) + 0.71·ln(sqrft) + 0.03·bdrms + 0.054·colonial means that colonial homes cost about 5.4% more than otherwise similar homes, on average.
Multiple indicator variables: Suppose you want to know how much lower wages are for married and single females. There are four possible categories: single male, single female, married male, married female. Create only 3 dummies: if you include 3 categories you automatically know the 4th, and including all 4 would make the data perfectly collinear. Which 3 categories you pick does not matter; it only affects the interpretation of the equation. If we exclude single males, each coefficient estimates the wage difference relative to single males. Suppose this is the equation:
ln(wage) = 0.3 + 0.21·marriedMale − 0.20·marriedFemale − 0.11·singleFemale + 0.08·education (single males omitted).
Interpretations: married males earn about 21% more than single males, married females earn about 20% less than single males, and single females earn about 11% less than single males, all else equal.
Interactions between variables: Now suppose this becomes:
ln(wage) = 0.3 − 0.11·female + 0.21·married − 0.30·(female × married) + 0.08·education. Before, married females had wages 0.20 lower than single males. How about now? Note: the omitted category is single male, so start there. • Single male: 0.30. • Single female: 0.30 − 0.11 = 0.19. • Married male: 0.30 + 0.21 = 0.51. • Married female: 0.30 − 0.11 + 0.21 − 0.30 = 0.10. • Difference: 0.10 − 0.30 = −0.20. Same as before! Each running total is just what that group makes (0.19 is what single females make), which is why at the end we subtract 0.10 − 0.30 to find how much less married women earn than single males.
Note: From a specification where the female dummy also interacts with education, we cannot immediately conclude that women earn more than men. First calculate the crossover point and see if it makes sense: the crossover is where the male and female equations are equal, which gives educ = −γ0/γ1 (the intercept shift over the slope shift). It might occur outside the data, at education levels that do not exist.
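A small sketch rebuilding the four group intercepts from the interaction equation above (the coefficients are the example's values; nothing here is estimated):

```python
b0, b_f, b_m, b_fm = 0.30, -0.11, 0.21, -0.30   # values from the example above

groups = {}
for female in (0, 1):
    for married in (0, 1):
        groups[(female, married)] = (b0 + b_f * female + b_m * married
                                     + b_fm * female * married)

print(groups[(0, 0)])                    # single male:    0.30
print(groups[(1, 0)])                    # single female:  0.19
print(groups[(0, 1)])                    # married male:   0.51
print(groups[(1, 1)])                    # married female: 0.10
print(groups[(1, 1)] - groups[(0, 0)])   # -0.20, as in the text
```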
MULTIVARIATE REGRESSION:

Assumptions: No perfect collinearity among the covariates AND E(u|x1, x2, …) = 0. The latter implies the x's are uncorrelated with u, which means we have the correct model and a true causal relationship.
Interpretations: Generally we look at changes in one x (say x1) holding all the others constant; x1 is then the variable of interest. The intercept sometimes has meaning and sometimes does not.
Another way of getting a multivariate coefficient (the partialing-out way): #1, regress y on x2 and save the residuals (call them ỹ). #2, regress x1 on x2 and save the residuals (call them x̃). #3, regress ỹ on x̃: the estimated coefficient will be the same as if you had just run the original multivariate regression. A multivariate regression is basically finding the effect of each independent variable after partialing out the effect of the other variables.
This partialing-out result holds more generally when x1 is a set of independent variables and x2 is another set. First regress y on all the variables in x2 (residuals ỹ); then regress each variable in x1 on all the variables in x2 (residuals x̃); then regress ỹ on the corresponding residuals x̃, one by one.
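A minimal sketch of this partialing-out (Frisch-Waugh-Lovell) result on simulated data; the data-generating values are arbitrary assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
x2 = rng.normal(size=n)
x1 = 0.6 * x2 + rng.normal(size=n)               # x1 correlated with x2
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(size=n)

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)
b_full = ols(np.column_stack([ones, x1, x2]), y)  # full multivariate regression

Z = np.column_stack([ones, x2])                   # residualize y and x1 on x2
y_res = y - Z @ ols(Z, y)
x1_res = x1 - Z @ ols(Z, x1)
b_fwl = ols(x1_res.reshape(-1, 1), y_res)

print(b_full[1], b_fwl[0])                        # identical coefficients on x1
```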
R^2: Start from SST, SSE and SSR. SST is the total variation in y; SSE is the variation in predicted y (the mean of predicted y equals the mean of y); SSR is the variation in the residuals (the residuals have mean 0). Total variation can be broken into SSE (explained) and SSR (unexplained). R^2 is the explained share: SSE/SST = 1 − (SSR/SST). It also equals the squared correlation between y and predicted y. BUT it never goes down as variables are added. Adjusted R^2 = 1 − (1 − R^2)·(N − 1)/(N − 1 − K). Interpretation of R^2: if R^2 is 0.014, the model explains only 1.4% of the variation in y with the included regressors. A low R^2 is never wrong: even with a low R^2, we can get a consistent beta. Difference between beta and R^2: beta only tells the effect of x on y; a question like "am I explaining all the variation in y?" has to be answered with R^2.
Including an irrelevant regressor still gives consistent betas for all variables (the irrelevant beta converges to 0). The problem is the standard errors, which may increase, putting pressure on significance. It is therefore useful to know what increases the variance of a beta: Var(β̂j) = σ²/[SSTj·(1 − Rj²)], where Rj² is from the regression of xj on the other x variables.
Things to note: 1) More variation in x lowers the SE; if everyone has the same height, there is no way to predict weight from height. 2) Higher error variance leads to higher SEs, reducing the precision of the model: large residuals mean y is not predicted well. To improve the estimate of a particular beta, add x variables that predict y better. 3) If you add an x variable that is highly correlated with the other x variables, Rj² rises and the SEs rise with it. So do not add irrelevant variables.
QUESTION: Should we include x variables that explain y and are highly correlated with our x of interest? Answer: Highly collinear variables can inflate SEs but do not result in bias or inconsistency. With a large sample we get more variation in x and a more precise beta. If there are 3 regressors and x2 and x3 are highly correlated, Var(β̂2) and Var(β̂3) may be high, but the correlation between them has no direct effect on Var(β̂1); if x1 is uncorrelated with x2 and x3, then R1² = 0 and Var(β̂1) is unaffected.

CORRELATION VS CAUSATION: If x causes y, it does not follow that they are highly correlated. Only in a world with just two variables (x, y) does causation imply correlation; otherwise a third variable can be strong enough that the correlation is low. Also note that in a true model x is only one of many possible causes of y, so it is not true that only x causes y. And causation does not mean x always leads to y; rather, the occurrence of y becomes more probable. In a true model there are many variables, and y is affected by all of them plus error.

CMI AND VIOLATIONS OF CMI: Assumptions for causal interpretation: 1st, E(u) = 0. Not a strong assumption: you can always adjust the intercept to make it hold. 2nd, E(u|x1, x2, …, xk) = E(u) for all values of the x's. This is the CMI (conditional mean independence) assumption. It means the average value of u does not depend on x; or, knowing the value of x tells you nothing about the value of u, so x affects y only through β1·x; or, the estimation error is uncorrelated with all the x's. Suppose wage = β0 + β1·educ + u, where u is ability. CMI says E(u|x) is the same for any value of x, so a person with 1 year of education should have the same average ability as someone with 5 years. The problem is that we generally do not know what is in u; by default, CMI is always violated.

OVB: The most common concern is that the error contains another variable (say z) that affects y AND is correlated with x. NOTE: OVB is only problematic if the omitted variable is correlated with x. Estimated β1 = true β1 + [Cov(x, z)/Var(x)]·β2, where β2 is the effect of z on y and Cov/Var is just the slope from regressing z on x; the entire second term is the bias. The bias is 0 when β2 = 0 or Cov(x, z) = 0. Take the education-and-efficiency example: if we exclude efficiency, the covariance between education and efficiency is likely positive, so the bias is positive and the estimated β1 is bigger than the true effect. BUT WE CAN ONLY SIGN THE BIAS WHEN EXACTLY ONE VARIABLE IS OMITTED; the moment more than one variable is omitted, it is in general impossible to know the sign of the bias.

ELIMINATING OVB: If the omitted variable is observable, just add it as a control. If not, use a proxy. Ex: IQ as a proxy for efficiency in the equation above. True model: y = β0 + β1x1 + β2x2 + β3x3* + u. Estimated model: y = β0 + β1x1 + β2x2 + β3x3 + u, where x3* = δ0 + δ3x3 + v. Assumptions: 1) E(u|x1, x2, x3*) = 0, i.e., the true model is correct; 2) E(v|x1, x2, x3) = 0, i.e., x3 is a good proxy for x3*, so that after controlling for x3, x3* does not depend on x1 or x2.

PRACTICAL ASSUMPTION FOR A GOOD PROXY: In the same equation we need E(eff|IQ) = E(eff|IQ, educ): average efficiency does not change with education after controlling for IQ; in other words, only the proxy explains efficiency, and education does not. When choosing a proxy, ensure that 1) the unobserved variable is related to y, 2) the proxy is correlated with the unobserved variable, and 3) the proxy is uncorrelated with the other x variables in the model.
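A quick simulation of the OVB formula, assuming an illustrative data-generating process in which the omitted z has effect β2 = 0.8 on y:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
z = rng.normal(size=n)                  # omitted variable ("efficiency")
x = 0.5 * z + rng.normal(size=n)        # education, correlated with z
y = 1.0 + 1.0 * x + 0.8 * z + rng.normal(size=n)  # true beta1 = 1, beta2 = 0.8

# Short regression of y on x only (Cov/Var is one way of getting a slope)
b1_short = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
# Predicted bias: [Cov(x, z) / Var(x)] * beta2
bias = np.cov(x, z, ddof=1)[0, 1] / np.var(x, ddof=1) * 0.8
print(b1_short, 1.0 + bias)             # both ≈ 1.32: estimate = truth + bias
```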
MEASUREMENT ERROR (ME): ME represents imprecise measurement of the variable of interest. Proxy and ME are different: a proxy stands in for something entirely unobservable, while ME concerns well-defined variables measured imprecisely.
ME in the y variable only affects SEs, not betas. With ME in y, the error term becomes u + e. We assume E(e) = 0 (this does not matter much, as a violation only affects the intercept) and corr(u, e) = 0. As long as the measurement error in y is uncorrelated with the x's, β̂ is unbiased; but because Var(u + e) > Var(u), SEs are higher. Is assuming e and x are uncorrelated plausible? It depends on the variables. Suppose we are estimating firm value controlling for profitability and we use the book value of debt: there is ME because the market value of debt should be used, and here the ME will be correlated with x because debt is correlated with profitability. The assumption is violated and β̂ is inconsistent.
ME in an x variable: e = x − x*. Two situations: e is uncorrelated with the observed measure x (no bias, SEs higher), OR e is uncorrelated with the unobserved measure x* (bias, since e being uncorrelated with x* guarantees e is correlated with x). Under this classical measurement error, the estimated beta is always too small (attenuation bias): you underestimate the true effect. In reality, some attenuation is likely because e tends to be correlated with both x and x*.

SIMULTANEITY BIAS: If y affects any x (even a control), all the betas will be biased.
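A short simulation of attenuation bias under classical measurement error in x (all parameter values assumed for illustration); it also checks the textbook attenuation factor Var(x*)/(Var(x*) + Var(e)):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
x_star = rng.normal(size=n)               # true, unobserved regressor
e = rng.normal(size=n)                    # classical ME: uncorrelated with x*
x = x_star + e                            # observed, mismeasured regressor
y = 2.0 * x_star + rng.normal(size=n)     # true beta = 2

b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
atten = np.var(x_star, ddof=1) / (np.var(x_star, ddof=1) + np.var(e, ddof=1))
print(b, 2.0 * atten)                     # both ≈ 1.0: beta biased toward zero
```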
PANEL DATA AND FE REGRESSION:

Fixed effects help with unobserved variables that do not vary within groups of observations. An FE essentially captures all the unobserved variables within a group, which is why it is called unobserved heterogeneity.
One way to do FE estimation is the within transformation (demean every variable by group). Another way is to add indicator (dummy) variables: since δ and fi only appear together in the model, we can normalize the coefficient on fi to 1 and treat each fi as a parameter to estimate. The equation is the same, just without δ. This means creating a dummy variable for each group i and adding it to the regression; the transformation is called least squares dummy variable (LSDV), and the results are identical to the within transformation. Because the full set of dummies is collinear with the intercept, one of them will be dropped, so do not try to interpret the intercept: it is just the average y when all x = 0 for the group corresponding to the dropped dummy (any one of the firms in the previous example). If you have 500 companies, include 499 dummies; a company's dummy is 1 on that company's rows and 0 elsewhere.
Additional things to note: i) A company fixed effect means all of a company's observations form one group (e.g., all of Apple's firm-year observations become one group). ii) A year FE means all observations in a given year become a group (all observations in 1995, say); every year then has its own intercept. iii) It is important to know at what level your panel data are defined. iv) Any column that can divide your observations into groups can be used as an FE; industry and CEO FE can also be included. In conclusion, the coefficient is identified using only variation within the cross-sectional groups.
CONS OF FE: FE CANNOT IDENTIFY VARIABLES THAT DO NOT VARY WITHIN THE GROUP. All group-level characteristics are removed (swept out) by the within transformation.
Example: ln(totalpay)ijt = α + β1·ln(firmsize)ijt + β2·ROEijt + β3·femalei + δt (time FE) + fi (CEO FE) + λj (firm FE) + uijt.
In the above, the female beta cannot be estimated because the CEO fixed effect takes care of it: there is no variation in the gender of a given CEO, so the female dummy is collinear with the CEO FE.
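A minimal sketch, on a simulated panel, that the within (demeaning) transformation and LSDV give the same slope (firm counts and parameters are made up):

```python
import numpy as np

rng = np.random.default_rng(4)
n_firms, t = 50, 5
firm = np.repeat(np.arange(n_firms), t)          # illustrative firm panel
alpha = rng.normal(size=n_firms)[firm]           # firm fixed effects
x = alpha + rng.normal(size=n_firms * t)         # x correlated with the FE
y = 1.5 * x + alpha + rng.normal(size=n_firms * t)

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

def demean(v):                                   # subtract group means
    means = np.bincount(firm, weights=v) / np.bincount(firm)
    return v - means[firm]

b_within = ols(demean(x).reshape(-1, 1), demean(y))[0]

# LSDV: one dummy per firm (no separate intercept), same slope on x
D = (firm[:, None] == np.arange(n_firms)).astype(float)
b_lsdv = ols(np.column_stack([x, D]), y)[0]

print(b_within, b_lsdv)                          # identical slopes ≈ 1.5
```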

INSTRUMENTAL VARIABLES:

Why is IV important? If one of the x's is correlated with the error term, it is a breach of CMI. The only ways to get a correct result would be if that x did not belong in the model, or if it were uncorrelated with the other x variables: very unlikely. One way past it is FE regression, but what if the problematic x is not fixed within groups? Then we use IV. Suppose that in a regression equation cov(xk, u) ≠ 0. Think of xk as having good variation and bad variation: the good variation is uncorrelated with u, the bad variation is correlated with it. An IV is a variable that affects xk but does not affect y directly (i.e., it explains only the good variation in xk). We can then use the IV to replace xk with just its good variation.
An IV must satisfy 2 conditions: 1) Relevance: the IV (z) must explain xk after controlling for all the other x's; to check this, regress xk on z and the other x's. 2) Exclusion: z must be uncorrelated with the error term in the original model (the one with xk). This basically means z has no explanatory power with respect to y after conditioning on the other x's; it can only explain y through xk. This is based on economic logic.
The 2 stages of IV (2SLS): i) regress xk on the other x's and z; ii) put the predicted value of xk from stage one in place of xk in the original regression. IV is like a scaled version of a proxy variable: we replace the problematic x variable with the value predicted by the instrument.
A WEAK IV is one that does not explain much of the problematic xk, leaving large SEs; the F-stat in the first-stage regression will be low. An F-stat of 10 or larger is desirable.
WHEN MORE THAN 1 X IS PROBLEMATIC: Suppose x(k−1) and xk are both problematic and their IVs are z and z1. Regress x(k−1) on all the other x's except xk plus both instruments z and z1; then do the same with xk on all the other x's except x(k−1). Take the predicted values into the second-stage regression.
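A minimal 2SLS sketch on simulated data (one endogenous regressor, one instrument; all parameters assumed). Note that the naive second-stage SEs would be wrong; this only checks the point estimate:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000
u = rng.normal(size=n)                       # structural error
z = rng.normal(size=n)                       # instrument: relevant, excluded
xk = 0.7 * z + u + rng.normal(size=n)        # endogenous (correlated with u)
y = 1.0 + 2.0 * xk + u                       # true beta = 2

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)
b_ols = ols(np.column_stack([ones, xk]), y)[1]       # biased upward

g = ols(np.column_stack([ones, z]), xk)              # stage 1: xk on z
xk_hat = np.column_stack([ones, z]) @ g              # keep only good variation
b_2sls = ols(np.column_stack([ones, xk_hat]), y)[1]  # stage 2: replace xk

print(b_ols, b_2sls)                         # ≈ 2.4 (biased) vs ≈ 2.0
```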
NATURAL EXPERIMENTS & DID:

NATURAL EXPERIMENTS: If cov(x, u) ≠ 0, CMI is violated. We can also think of this as x not being random (the distribution of x is not random). E.g., firms with high x might have a higher y only because high x is more likely for firms with some omitted variable contained in u. THE PROBLEM HERE IS ENDOGENEITY: companies decide both x and y, so bias (OVB or otherwise) is present and we cannot draw causal inferences. We therefore need a natural experiment: an event that causes random assignment of the variable of interest, x. Examples: i) Suppose the Texas government stopped the use of stock options. For this to be exogenous, you have to argue that the ban had nothing to do with the firms themselves; then we can compare companies inside and outside Texas. ii) If an SEC trading halt has anything to do with the risk preferences of investors, it is not random; but if something else triggers it, it is. THE FACT REMAINS ONE: THE "EVENT" SHOULD NOT BE UNDER THE CONTROL OF THE OBSERVATION UNIT, in other words it should be exogenous. This does NOT mean x and y are unrelated; it just means the "event" is what causes x to change.
NEs can be used to construct an IV or a threshold for an RDD, but generally they are used for difference-in-differences: this method compares y for a treated group to y for an untreated group when treatment is randomly assigned by the NE.
Notation: d = treatment/assignment indicator (a switch); d = 0 untreated, d = 1 treated. The switching equation: y = y0 + d(y1 − y0) = d·y1 + (1 − d)·y0. If d = 1, y = y1; if d = 0, y = y0.
ATT (average treatment effect on the treated) = E[y1 − y0 | d = 1]: the effect of treatment on those who receive treatment. Unobservable.
ATU (average treatment effect on the untreated) = E[y1 − y0 | d = 0]: the effect of treatment on those in the control group.
Example of selection bias: the fact that some companies are rated and some are not is in itself not random. Rated firms are likely more profitable, so the selection bias will be positive. Even adding controls might not help if firms differ in unobservable ways, such as investment opportunities.
DID in general: a quasi-experimental research design to estimate effects using archival data. (Anecdotal evidence, using a story to back up your claim, is not causation.)
Parallel trends: absent treatment, the change in y for the treated would not have been different from the change in y for the untreated observations. Violation of parallel trends: there is no true effect, but the DID coefficient β3 > 0 because parallel trends was violated, i.e., the outcome in the treated group would have increased relative to the control even absent treatment. The parallel-trends assumption in DID is equivalent to the local continuity assumption in RDD below.
Why is DID preferred? The argument required to claim a DID estimate is not causal is: "The change in y for treated observations after treatment would have been different from the change in y for untreated observations for reasons a, b, and c, and these reasons are correlated with both whether a group is treated and when the treatment occurs." This is usually a much more convoluted and harder story to justify.
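A minimal DID sketch on simulated data (assumed group and time effects), showing the interaction coefficient equals the difference of the two group differences:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 4000
treated = rng.integers(0, 2, n)          # group indicator
post = rng.integers(0, 2, n)             # before/after indicator
y = (1.0 + 0.3 * treated + 0.2 * post    # group and time differences
     + 0.5 * treated * post              # true treatment effect = 0.5
     + rng.normal(size=n))

X = np.column_stack([np.ones(n), treated, post, treated * post])
b = np.linalg.lstsq(X, y, rcond=None)[0]
print(b[3])                              # DID interaction coefficient ≈ 0.5

# Equivalent "difference in differences" of the four group means:
m = lambda t, p: y[(treated == t) & (post == p)].mean()
print((m(1, 1) - m(1, 0)) - (m(0, 1) - m(0, 0)))
```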
REGRESSION DISCONTINUITY:

The basic idea of RDD is that observations are treated based on a known cutoff rule. E.g., for an observable x, an observation is treated if x ≥ x′. This cutoff creates a discontinuity, and we are interested in evaluating the effect on the outcome variable y of the treatment (of being above the cutoff). In RDD, we compare the outcome variable y just above and just below the cutoff.
Examples of settings: 1) Suppose we want to test the effect of a vaccine on COVID, and a scheme says only people above 50 are eligible for vaccination. People just below and just above 50 have the same characteristics, but only those above 50 get the vaccine; so comparing the samples around the cutoff reveals the effect. If 50 was chosen as a cutoff by policy-setters knowing that that is the age when people catch COVID more, there is no causality: the rule-setting should be independent of x. 2) A borrower FICO score > 620 makes securitization of a loan more likely. This is an exogenous change, because borrowers just above and just below the cutoff are the same and it is only the cutoff that makes securitization easier. Authors will argue that a FICO score of 620 is just an industry practice and was not chosen because people just below 620 are more likely to default than others, AND that people cannot choose their own scores (otherwise everyone would force themselves above 620). The first argument just says the cutoff was set independent of the x variables.
The variable that decides assignment is called the forcing or running variable; x′ is the threshold; y0 is the outcome absent treatment and y1 the outcome with treatment.
2 types of RDD: Sharp RDD: assignment of treatment depends only on x; if x > x′, the unit is treated. Fuzzy RDD: x > x′ only increases the probability of treatment; other factors still influence whether the unit is actually treated.
HOW IS THE TREATMENT RANDOM WHEN IT IS BASED ON A CUTOFF? ASSUMPTION 1: the observation unit cannot perfectly manipulate the value of x. Therefore, whether a unit falls just above or just below the cutoff is effectively random.
ASSUMPTION 2, LOCAL CONTINUITY: we do not expect any jump in y around the threshold x′ in the absence of treatment; y is a smooth function of x around the threshold if there is no treatment.
Will the regression yi = β0 + β1·di + ui (d takes value 1 above the cutoff) reveal the causal effect of treatment d on y? Unlikely! d is correlated with x, and if x affects y, there will be an omitted variable. E.g., borrower FICO score, used in Keys et al. (2010), affects the likelihood of default; therefore the above regression can NOT be used to determine the effect of securitization on default risk.
How can we modify the previous regression to account for this omitted variable? Control for x, hence use: yi = β0 + β1·di + β2·xi + ui. Why might this still be problematic? (1) It assumes the effect of x is linear, and (2) it doesn't really make use of the random assignment, which is really occurring near the threshold.
TRADEOFF OF USING NARROW BINS: Ideally, we would compare average y right below and right above x′ to get the causal effect, but there is a tradeoff: we won't have many observations and the estimate will be very noisy. A wider range of x on each side reduces this noise but increases the risk of bias, because observations further from the threshold might vary for other reasons (including the direct effect of x on y). The approach with the smaller window is subject to greater noise, but its advantages are: it doesn't assume a constant effect of treatment for all values of x in the sample (in essence you are estimating a local average treatment effect), and it is less subject to the risk of bias, because correctly controlling for the relationship between x and y is less important in the smaller window.
Diagnostics: a non-parametric plot shouldn't suggest a jump in y at points other than x′. Why? Such a jump would call into question the internal validity of the RDD: it is then possible that the jump at x′ is driven by something else that is unrelated to the treatment. The researcher should ask: • Is there any reason to believe the threshold x′ was chosen because of some pre-existing discontinuity in y, or a lack of comparability above and below x′? If so, it is a clear violation of the local continuity assumption. • Is there any way or reason why subjects might manipulate their x around the threshold? Subjects' ability to manipulate x can cause a violation of local continuity: with manipulation, y might exhibit a jump around x′ absent treatment. • Why isn't subjects' ability to manipulate x always a problem? If they can't perfectly manipulate it, there will still be randomness in treatment: in a small enough bandwidth around x′, idiosyncratic shocks will push some above and some below the threshold, even if they are trying to manipulate the x. To detect this, look for bunching of observations immediately above or below the threshold; any bunching would suggest manipulation.
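A sharp-RDD sketch on simulated data, illustrating the bandwidth tradeoff above: a local linear fit with separate slopes on each side recovers the jump, but estimates get noisier as the window shrinks (all numbers assumed):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 20_000
x = rng.uniform(-1, 1, n)                   # forcing variable, threshold x' = 0
d = (x >= 0).astype(float)                  # sharp RDD: treated iff x >= x'
y = 0.8 * x + 2.0 * d + rng.normal(size=n)  # true jump of 2.0 at the threshold

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

for h in (0.5, 0.1, 0.02):                  # shrinking bandwidths around x'
    w = np.abs(x) < h
    # local linear fit with separate slopes above/below the cutoff
    X = np.column_stack([np.ones(w.sum()), d[w], x[w], d[w] * x[w]])
    b = ols(X, y[w])
    print(h, round(b[1], 3), w.sum())       # jump ≈ 2.0; fewer obs as h shrinks
```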
FUZZY RDD: Treatment probability increases at the cutoff, but some units above the cutoff are untreated and some below are treated. Therefore we conclude things like: a borrower with FICO score > 620 is more likely to have the loan securitized, but whether it actually is also depends on documentation and other factors. In a fuzzy RDD, the raw average change in y at the threshold understates the causal effect, because that comparison assumes all observations were treated. Therefore, rescale the average change in y around the threshold by the change in the probability of treatment.
Since a fuzzy RDD cutoff just increases the probability of treatment, use x > x′ as an IV for treatment. Notation: di = 1 if treated and 0 if not; we use the threshold indicator Ti as the IV for di. Relevance condition: the indicator affects the probability that di = 1. Exclusion condition: Ti is unrelated to y conditional on di and the controls f(·). Both will be satisfied under the earlier assumptions. Example: di = 1 if the loan is securitized, Ti = 1 if the FICO score is greater than 620, which increases the probability the loan is securitized. Again, f(·) is typically a polynomial in x. Unlike a sharp RDD, it isn't as easy to allow the functional form to vary above and below the cutoff. So, if worried about different functional forms, what can you do to mitigate this concern? Answer: use a tighter window around the threshold; this is less sensitive to the functional form f(x).
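A fuzzy-RDD sketch on simulated data (assumed jump in treatment probability from 0.2 to 0.7 at the cutoff), showing that the raw jump in y understates the effect and that rescaling by the jump in treatment probability, the Wald/IV idea above, approximately recovers it:

```python
import numpy as np

rng = np.random.default_rng(8)
n = 50_000
x = rng.uniform(-1, 1, n)                       # e.g., FICO minus 620, scaled
T = (x >= 0).astype(float)                      # threshold indicator
p = 0.2 + 0.5 * T                               # crossing raises Pr(treated)
d = (rng.uniform(size=n) < p).astype(float)     # fuzzy: treatment not certain
y = 0.5 * x + 1.0 * d + rng.normal(size=n)      # true treatment effect = 1

w = np.abs(x) < 0.05                            # small window around cutoff
jump_y = y[w & (T == 1)].mean() - y[w & (T == 0)].mean()
jump_d = d[w & (T == 1)].mean() - d[w & (T == 0)].mean()
print(jump_y)            # ≈ 0.53: raw jump understates the true effect of 1
print(jump_y / jump_d)   # ≈ 1.05: rescaled (Wald/IV) estimate
```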
FIELD EXPERIMENTS:

What is an experiment?
• Units of analysis are randomly assigned with known probability to treatment and control conditions. The assignment is random, not haphazard.
• Ideally, experimenters play an unobtrusive role, reducing the risk of a violation of balance between treatment and control.
In a typical experiment:
• The researcher manipulates something in the real world, exposing randomly assigned groups of people to different treatments. (A design is also randomized if you go to each person's house, toss a coin, and give treatment based on the toss: e.g., dirty water if tails and clean water if heads.)
• Possibly the most fruitful area of research in the social sciences, but also the one with the greatest ethical challenges, since experiments can be so powerful and have real-world effects.
• Some subjects selected for treatment may refuse; others may switch from control to treatment or from treatment to control. Hence two types of estimates are possible:
• As-treated (AT) analysis compares subjects given the treatment they actually received; it does not consider the group a subject was assigned to, only which treatment they received.
• Intent-to-treat (ITT) analysis compares subjects in the groups to which they were randomized, regardless of whether they received or adhered to the allocated treatment.

CONCERNS REGARDING THE PAPER:
First concern: the coefficient estimate might be biased because tax changes may be predictable. Resolved through: addressing the predictability of tax changes by focusing on unpredictable ones, using i) a model-based approach, in which we performed a textual-search analysis of major US newspapers for keywords indicative of corporate tax changes, counting articles containing predictive discussion of tax-increase-related words; and ii) a "narrative approach", reading through archival records on the political and economic environment around the tax changes to identify those that are less likely to be anticipated, as well as unrelated to local economic conditions. This addresses the concern that, while the tax changes might have been unpredictable, they could have been driven by the local economic conditions that also simultaneously affect innovation.
Second concern: both tax changes and firm innovation could be driven by underlying local economic conditions, leading to incorrect estimation of the tax effect. Resolved through: concentrating on firms located in contiguous counties on either side of a state border, so these firms are exposed to similar economic conditions but only one of the states changes taxes. The narrower geography allows us to control more accurately for potentially unobserved, time-varying economic heterogeneity across treated and control firms. These firms should be subject to similar economic conditions due to their close geographic proximity, but they are subject to different tax changes. However, the sample size is significantly reduced, to the thousands.
