Econometrics Endterm Summary 2
F-test

Step 1: Define the hypotheses, H0 and H1.
Step 2: Choose a significance level α.
Step 3: Estimate the restricted and unrestricted models, where the restricted model is the one obtained if H0 is true.
Step 4: Compare the residual sum of squares (RSS) across the two models by calculating the F-statistic:

F = [(RSS_R − RSS_U)/M] / [RSS_U/(n − k − 1)]

where M is the number of restrictions.
Step 5: Find the critical value for the F-statistic, F_c = F_{M, n−k−1}.
Step 6: Compare the observed statistic to the critical value: reject H0 if F > F_c.

Omitted Variable Bias

Omitted variable bias is a serious problem for empirical economic research: it violates one of the OLS assumptions for unbiasedness.

The true model:

Y_i = β0 + β1·X_1i + β2·X_2i + ε_i   (1)

The misspecified model, where X_2 is the omitted variable:

Y_i = β0 + β1·X_1i + e_i,  where e_i = β2·X_2i + ε_i   (2)

The relationship between the omitted and the included variable is given by:

X_2i = π0 + π1·X_1i + u_i

Estimates of equation 2 are Ŷ_i = β̂0 + β̂1·X_1i + e_i. We can show that E(β̂1) ≠ β1; instead, the OVB formula is:

E(β̂1) = β1 + π1·β2

where β̂1 is the estimated coefficient and β1 the true coefficient. The bias depends on:
• π1: the bivariate relationship between the included and the omitted variable;
• β2: the partial effect of the omitted variable on the dependent variable Y.

           π1 > 0          π1 < 0
β2 > 0   positive bias   negative bias
β2 < 0   negative bias   positive bias

Functional Form

1. Level-level specification: cigs_i = β0 + β1·income_i + ε_i
2. Log-log specification (double log): ln cigs_i = β0 + β1·ln income_i + ε_i
3. Log-level specification (semi-log): ln cigs_i = β0 + β1·income_i + ε_i
4. Level-log specification (semi-log): cigs_i = β0 + β1·ln income_i + ε_i

Interpretations:
1. Level-level: smokers who earn $1000 more per year smoke 0.22 more cigarettes per day.
2. Log-log: the coefficient gives an elasticity. When income increases by 1%, smokers smoke 0.16% more cigarettes per day; i.e., the income elasticity of cigarette consumption is 0.16 for smokers.
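The OVB formula above can be illustrated with a small simulation: generate data from the true model, regress Y on X_1 alone, and check that the estimated slope lands near β1 + π1·β2. All parameter values below are made up for this sketch.

```python
import numpy as np

# Illustrative simulation of the OVB formula E(beta1_hat) = beta1 + pi1*beta2.
# All parameter values are made up for the sketch.
rng = np.random.default_rng(0)
n = 100_000
beta0, beta1, beta2 = 1.0, 0.5, 2.0   # true model coefficients
pi0, pi1 = 0.0, 0.8                   # X2 = pi0 + pi1*X1 + u

x1 = rng.normal(size=n)
x2 = pi0 + pi1 * x1 + rng.normal(size=n)
y = beta0 + beta1 * x1 + beta2 * x2 + rng.normal(size=n)

# Misspecified regression of y on x1 only (X2 omitted)
X = np.column_stack([np.ones(n), x1])
b_short, *_ = np.linalg.lstsq(X, y, rcond=None)

print(b_short[1])  # close to beta1 + pi1*beta2 = 0.5 + 0.8*2.0 = 2.1
```

With a large sample the short-regression slope sits very close to 2.1 rather than the true β1 = 0.5, exactly as the formula predicts.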
3. Log-level: the coefficient × 100 gives the percentage change in the dependent variable for a one-unit increase in the level of the independent variable. Interpretation: when income increases by $1000, smokers smoke 1.05% more cigarettes per day.
4. Level-log: the coefficient / 100 gives the impact on the level of the dependent variable of a 1% increase in the independent variable. Interpretation: when income increases by 1%, smokers smoke 0.03 (= 3.32/100) more cigarettes per day.

You cannot compare R² or adjusted R² across models with different dependent variables!

Dummy Variables

Dummy for gender, female_i:
• female_i = 1 if the respondent is female; female_i = 0 if the respondent is not female.
• Respondents with female_i = 0 form the reference group.

Perfect collinearity: female_i and male_i are perfect linear functions of each other; in particular, for each individual observation female_i + male_i = 1. Hence, one of the OLS assumptions for unbiasedness is violated. This is known as the dummy variable trap: you cannot include a full set of dummies; we need an omitted category, which serves as the reference category.

Dummy variables for more than two groups:
• Dummies for categorical variables: categories cannot be ranked.
• Dummies for ordinal variables: categories can be ranked.
Create a separate dummy variable for each of the k categories (e.g. of race), and include k − 1 of those dummies in the equation (not k, due to the dummy variable trap!).

Interaction Terms

Model: ln wage_i = β0 + β1·age_i + β2·educ_i + β3·fem_i + β4·fem_i·educ_i + ε_i

Average log wages for men (fem_i = 0): β0 + β1·age_i + β2·educ_i
Average log wages for women (fem_i = 1): β0 + β1·age_i + β2·educ_i + β3 + β4·educ_i = (β0 + β3) + β1·age_i + (β2 + β4)·educ_i

Chow Test

The Chow test tests whether groups have different regression functions.
1. Write the restricted and unrestricted models; define the corresponding null and alternative hypotheses.
2. Choose a significance level α.
3. Estimate the restricted and unrestricted models.
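The dummy variable trap described above can be seen directly in the design matrix: with an intercept, female_i + male_i reproduces the constant column, so the matrix loses full column rank. A minimal sketch with made-up data:

```python
import numpy as np

# Dummy variable trap: intercept + dummies for BOTH groups gives perfect
# collinearity, because female_i + male_i = 1 equals the constant column.
# The data below are made up.
female = np.array([1, 0, 1, 0, 0, 1])
male = 1 - female
const = np.ones_like(female)

X_trap = np.column_stack([const, female, male])  # 3 columns, but dependent
X_ok = np.column_stack([const, female])          # drop one dummy: full rank

print(np.linalg.matrix_rank(X_trap))  # 2 -> X'X is singular, OLS undefined
print(np.linalg.matrix_rank(X_ok))    # 2 -> full column rank, OLS works
```

Dropping one dummy (the reference category) restores full column rank, which is exactly why k categories get only k − 1 dummies.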
4. Calculate the Chow test statistic: an F-statistic comparing the RSS of the restricted and unrestricted models.
5. Find the critical F-statistic, F_c.
6. Reject H0 if F > F_c.

Example:
Restricted model: ln wage_i = β0 + β1·age_i + β2·educ_i + ε_i
Unrestricted models:
ln wage_i = β0^m + β1^m·age_i + β2^m·educ_i + ε_i   for males (1)
ln wage_i = β0^f + β1^f·age_i + β2^f·educ_i + ε_i   for females (2)

Hypotheses: H0: β0^m = β0^f, β1^m = β1^f, β2^m = β2^f; H1: H0 not true; α = 0.05.

F = [(RSS_R − RSS_1 − RSS_2)/(k + 1)] / [(RSS_1 + RSS_2)/(n1 + n2 − 2(k + 1))]

where
RSS_R: RSS from the restricted model
RSS_1: RSS from unrestricted model 1
RSS_2: RSS from unrestricted model 2
n1: number of observations in unrestricted model 1
n2: number of observations in unrestricted model 2
k: number of slope parameters (all models have the same k, so each model has k + 1 coefficients)

Multicollinearity

Perfect (multi)collinearity: a perfect linear relationship between two or more independent variables.

Imperfect (multi)collinearity does not violate any assumptions, but may still be a concern. The variances (and hence standard errors) of the estimates will increase. This makes it more difficult to reject the null hypothesis that a particular independent variable has no impact on the dependent variable. Estimates can also become very sensitive to changes in specification (e.g. adding a variable, or changes in the number of observations). The estimation of the coefficients of non-multicollinear variables will be largely unaffected.
• Multicollinearity increases the R² of the auxiliary regression of an affected regressor on the other regressors, and thereby the standard error of its coefficient.

Disease: imperfect multicollinearity. Consequence: the estimates of β_k and of Var(β̂_k) remain unbiased (since no OLS assumption has been violated), but Var(β̂_k) itself is larger (i.e. larger standard errors, higher p-values).
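The variance-inflation consequence just described can be checked numerically: as the correlation between two regressors rises, the standard error of each slope grows, while the estimates themselves stay centred on the truth. A sketch with simulated data (all numbers made up):

```python
import numpy as np

# Sketch: OLS slope standard errors grow as two regressors become more
# correlated. True coefficients and sample design are made up.
rng = np.random.default_rng(6)
n = 5_000

def slope_se(rho):
    """Estimated standard error of beta1_hat when corr(x1, x2) ~= rho."""
    x1 = rng.normal(size=n)
    x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.normal(size=n)
    y = 1.0 + 0.5 * x1 + 0.5 * x2 + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x1, x2])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    s2 = resid @ resid / (n - 3)
    return np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])

se_low, se_high = slope_se(0.0), slope_se(0.99)
print(se_low, se_high)  # the second is several times larger
```

The estimates remain unbiased in both cases; only their precision deteriorates, matching the "disease/consequence" description above.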
Diagnosis: examine correlations among the regressors; estimate auxiliary regressions. Remedy: do nothing (only drop one of the highly correlated variables if economic theory justifies it).

Heteroskedasticity

Consequences: the t-statistic and F-statistic depend on the error variance, so if the variance is not constant these statistics are unreliable and it is impossible to draw valid inferences from hypothesis tests.

To test for heteroskedasticity: the Breusch-Pagan test.
1. Estimate the model Y_i = β0 + β1·X_1i + β2·X_2i + ε_i.
2. Predict the residuals e_i from the estimated model Ŷ_i = β̂0 + β̂1·X_1i + β̂2·X_2i and square them (e_i²).
3. Regress the squared residuals e_i² on the independent variables from the original model: e_i² = δ0 + δ1·X_1i + δ2·X_2i + v_i.
4. Test whether the independent variables have a jointly significant impact on e_i². If they do (i.e. H0 is rejected), we have heteroskedasticity.
H0: δ1 = δ2 = 0 (homoskedasticity)
H1: H0 not true (heteroskedasticity)

Robust standard errors are typically higher than the regular ones, although they may also be lower. Higher standard errors mean the t-statistics become smaller (in absolute value), and estimates become less significant.

Time Series

Static time-series model: Y_t = β0 + β1·X_1t + ε_t
Distributed lag model: Y_t = β0 + β1·X_1t + β2·X_1,t−1 + ε_t
Autoregressive distributed lag model: Y_t = β0 + β1·Y_t−1 + β2·X_1t + β3·X_1,t−1 + ε_t

Spurious regression: a strong statistical relationship between two or more variables that is not driven by an underlying causal relationship. Even if the trending factors behind it are unobserved, we can control for them by directly controlling for the trend.

Serial correlation violates OLS assumption 5.
• Consequences: biased standard errors and, additionally, biased coefficient estimates in autoregressive models.
• Diagnosis: the Breusch-Godfrey test.
• Solution: GLS estimation or Newey-West standard errors.

A distributed lag model with many lagged independent variables is often problematic:
• The lagged independent variables are often strongly correlated, leading to multicollinearity (i.e. less significant estimates).
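The Breusch-Pagan steps above can be sketched with simulated data, here using the LM form of the joint-significance test (LM = n·R² from the auxiliary regression, compared to a χ² critical value); an F-test on the auxiliary regression would work equally well. All numbers are made up:

```python
import numpy as np

# Breusch-Pagan sketch: error variance grows with x by construction, so the
# test should detect heteroskedasticity. Data are simulated / made up.
rng = np.random.default_rng(2)
n = 500
x = rng.uniform(1, 5, n)
y = 2.0 + 1.0 * x + rng.normal(size=n) * x   # error sd grows with x

# Step 1-2: estimate the model and square the residuals
X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
e2 = (y - X @ b) ** 2

# Step 3: auxiliary regression e^2 = delta0 + delta1*x + v
d, *_ = np.linalg.lstsq(X, e2, rcond=None)
fitted = X @ d
r2 = 1 - np.sum((e2 - fitted) ** 2) / np.sum((e2 - e2.mean()) ** 2)

# Step 4: LM statistic, compared to the chi-squared critical value (1 df)
LM = n * r2
print(LM)  # well above 3.84 -> reject homoskedasticity
```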
• The lagged independent variables take up degrees of freedom, decreasing the precision of our estimates (i.e. less significant estimates).

We can rewrite the autoregressive (AR) model as a distributed lag (DL) model:
• The DL model has stronger multicollinearity;
• The DL model requires the estimation of more parameters and also has fewer observations (due to the lags);
• The DL model typically has more serial correlation.

A time series Y_t is strictly stationary if the distribution of Y_1, Y_2, …, Y_n is the same as the distribution of the variable shifted by some time lag k: Y_1+k, Y_2+k, …, Y_n+k. In short: the distribution of the variable does not depend on time t. However, to avoid spurious regression we only need a weaker version: covariance stationarity.

A time series Y_t is covariance stationary if the following three statistical properties are unaffected by a change of time:
1. E(Y_t) is constant over time;
2. Var(Y_t) is constant over time;
3. Cov(Y_t, Y_t+k) does not depend on time t (it only depends on the lag length k).

When one or more of these conditions is violated, the time series is non-stationary. An important example of a non-stationary variable is a random walk:

Y_t = Y_t−1 + ε_t   (ε_t is i.i.d.)

If Y_t follows a random walk, then the value of Y tomorrow is the value of Y today, plus an unpredictable (i.i.d.) disturbance ε_t. A variable that follows a random walk is also said to have a unit root, or to be I(1), which stands for "integrated of order 1". This is very important for economic applications, since many macroeconomic time series are random walks!

Writing Y_t as the accumulated sum of the disturbances shows that the variance of Y_t becomes larger and larger over time: period 1: σ²; period 2: 2σ²; period 3: 3σ²; etc. We conclude that the variance of a random walk is not constant over time: a random walk is non-stationary.

Dickey-Fuller Test

The Dickey-Fuller test for a unit root is performed separately for all variables of the regression equation.
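Both the growing variance of a random walk (Var(Y_t) = t·σ²) and the regression of ΔY_t on Y_t−1 that underlies the test can be sketched with a short simulation; the 5% critical value of about −2.86 quoted in the comment is the standard Dickey-Fuller value for the with-constant case:

```python
import numpy as np

# (a) Variance of a random walk Y_t = Y_{t-1} + e_t grows roughly like t*sigma^2.
rng = np.random.default_rng(0)
walks = rng.normal(size=(20_000, 50)).cumsum(axis=1)  # 20,000 simulated walks
var_t = walks.var(axis=0)
print(var_t[0], var_t[9], var_t[49])  # roughly 1, 10, 50

# (b) Dickey-Fuller regression  Delta Y_t = alpha + theta*Y_{t-1} + e_t
# on one simulated random walk; theta_hat should be near zero.
y = rng.normal(size=500).cumsum()
dy = np.diff(y)
X = np.column_stack([np.ones(len(dy)), y[:-1]])
b, *_ = np.linalg.lstsq(X, dy, rcond=None)
resid = dy - X @ b
s2 = resid @ resid / (len(dy) - 2)
se_theta = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
print(b[1] / se_theta)  # compare with the 5% DF critical value (about -2.86)
```

Note the t-statistic must be compared to Dickey-Fuller critical values, not to the usual t-distribution, because under H0 the regressor Y_t−1 is itself non-stationary.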
1. Determine whether Y_t follows a time trend by estimating Y_t = β0 + β1·t + ε_t.
2. Estimate
ΔY_t = α + θ·Y_t−1 + ε_t   if no time trend is found, or
ΔY_t = α + θ·Y_t−1 + δ·t + ε_t   if a time trend is found,
and test H0: θ = 0 against H1: θ < 0.
3. Compare the t-statistic on θ to the appropriate critical value from the Dickey-Fuller table, DF_c.
4. Reject H0 if t < DF_c, in which case Y_t does not have a unit root. Repeat this for the independent variable(s) in the regression equation!

If both the dependent and the independent variable are non-stationary, the error term may still be stationary. If this is the case, the dependent and independent variables are said to be cointegrated. This is good news, since when X and Y are cointegrated the original regression results are not spurious, and we do not need to first-difference.

Dickey-Fuller Cointegration Test

This test examines whether the error term is stationary, which should be the case if the two series are cointegrated, since it indicates that the series do not wander apart. The test uses the residual e_t (as the errors are unobserved).
1. Estimate the relationship between Y_t and X_t (include a time trend if β2 is significant): Y_t = β0 + β1·X_t (+ β2·t) + ε_t.
2. Generate the residual, e_t.
3. Regress the differenced residual on the lagged residual: Δe_t = γ0 + θ·e_t−1 + u_t.
4. Compare the t-statistic to the critical value from the Dickey-Fuller cointegration (DFC) table (with or without a time trend, depending on the model from step 1), to test H0: θ = 0 (no cointegration) against H1: θ < 0 (cointegration).
5. Reject H0 if t < DFC_c; if H0 is rejected, we conclude that Y_t and X_t are cointegrated.

A standard sequence of steps for avoiding spurious regression:
1. Test all variables for unit roots (i.e. non-stationarity) using the appropriate version of the Dickey-Fuller test.
2. If the variables do not have unit roots, estimate the equation in its original units (Y and X).
3. If the variables both have unit roots, test the residuals of the equation for cointegration using the Dickey-Fuller cointegration test.
4. If the variables both have unit roots but are not cointegrated, change the functional form of the model to first differences (ΔX and ΔY) and estimate the equation.
5. If the variables both have unit roots and are also cointegrated, estimate the equation in its original units.

Linear Probability Model

Sometimes we want to estimate a model that has a dummy variable as the dependent variable. How can we estimate such a model, with a 0-1 variable as the dependent variable? One possibility is to use the OLS estimator: this is called the Linear Probability Model (LPM). Non-linear estimators, such as logit and probit, are also an option but are not part of this course.

The marginal (or partial) effect of each variable is given by its coefficient.

Linear Probability Model:
• Advantages: easy to compute and easy to interpret.
• Most important disadvantages:
  - The errors are heteroskedastic by construction.
  - Predicted probabilities may lie outside the 0-1 interval (and of course it makes no sense to interpret these as probabilities).

Since OLS is a linear estimator, predicted probabilities can lie outside the 0-1 interval. Therefore, whenever estimating an LPM, you should always check your predicted probabilities. If many are < 0 or > 1, you should use a different estimator (logit/probit) instead of OLS (not part of this course).

When estimating an LPM, do not forget to:
• Correct for heteroskedasticity (e.g. with the `, robust` option): otherwise the standard errors are biased.
• Check the fitted values: if more than a few percent fall outside the 0-1 range, the OLS estimator is not appropriate.

Experiments are not always possible, however: another approach increasingly used by economists is so-called natural experiments.
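The LPM fitted-value check described above can be sketched as follows; the data-generating process (a logit-style probability) and all numbers are made up for illustration:

```python
import numpy as np

# LPM sketch: fit OLS to a 0/1 outcome and count fitted values outside [0, 1].
# The data-generating process below is made up for illustration.
rng = np.random.default_rng(5)
n = 1_000
x = rng.normal(size=n)
p = 1 / (1 + np.exp(-2 * x))                 # true success probabilities
d = (rng.uniform(size=n) < p).astype(float)  # 0/1 dependent variable

X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, d, rcond=None)    # LPM = OLS on the dummy outcome
fitted = X @ b

share_outside = np.mean((fitted < 0) | (fitted > 1))
print(share_outside)  # if more than a few percent, prefer logit/probit
```

Because the linear fit keeps rising past 1 and falling past 0, a non-trivial share of fitted values escapes the unit interval here, which is exactly the warning sign the check is meant to catch.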
Natural experiment = random variation contained within observational data (as opposed to experiments designed by the researchers themselves).

One example is the political representation of women in West Bengal: since the reservation of seats for women was random across towns, this is a natural experiment.