Formula Card for the Exams

OLS estimator:
β̂1 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)²    β̂0 = Ȳ − β̂1X̄

Variance and standard error of a slope estimate:
Var(β̂j) = σ̂² / [SSTj(1 − Rj²)],    where σ̂² = RSS/(N − K − 1) and SSTj = Σ(Xij − X̄j)²
se(β̂j) = √Var(β̂j)

Summary statistics. Total sample variation of Y (Total Sum of Squares):
TSS = Σ(Yi − Ȳ)²

Test statistics. F-statistic for multiple linear restrictions:
F = [(RSS_R − RSS)/M] / [RSS/(N − K − 1)]

Time series. AR(1) process:
Yt = ρYt−1 + ut,    t = 1, 2, ..., T

Week 0 Chapter 1: An Overview of Regression Analysis

Econometrics is the quantitative measurement and analysis of actual economic and business phenomena (it allows us to examine data and to quantify the actions of firms, consumers and governments). Econometrics has three major uses:
1. Describing economic reality
2. Testing hypotheses about economic theory and policy
3. Forecasting future economic activity

Q = β0 + β1P + β2Ps + β3Yd
β1P: P is the price of the product
β2Ps: Ps is the price of a substitute
β3Yd: Yd is disposable income

The product in equation 1.1 is what economists call a normal good (one for which the quantity demanded increases when disposable income increases).

The steps used in nonexperimental quantitative research:
1. Specifying the models or relationships to be studied
2. Collecting the data needed to quantify the models
3. Quantifying the models with the data

Econometricians use regression analysis to make quantitative estimates of economic relationships that previously have been completely theoretical in nature. Regression analysis is a statistical technique that attempts to "explain" movements in one variable, the dependent variable, as a function of movements in a set of other variables, called the independent (or explanatory) variables, through the quantification of one or more equations.

Q = β0 + β1P + β2Ps + β3Yd
Q is the dependent variable, and P, Ps and Yd are the independent variables. If the quantity of capital employed increases by one unit, then output increases by a certain amount, called the marginal productivity of capital.
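The demand equation above can be evaluated numerically once coefficient estimates are in hand. A minimal Python sketch, with made-up coefficient values chosen only for illustration (the positive income coefficient is what makes the product a normal good):

```python
# Hypothetical estimated demand equation of the form Q = b0 + b1*P + b2*Ps + b3*Yd.
# All coefficient values below are invented for illustration only.
def quantity_demanded(P, Ps, Yd, b0=100.0, b1=-2.0, b2=1.5, b3=0.05):
    """Evaluate a linear demand function at a given own price, substitute price,
    and disposable income."""
    return b0 + b1 * P + b2 * Ps + b3 * Yd

# For a normal good (b3 > 0), quantity demanded rises with disposable income.
q_low = quantity_demanded(P=10, Ps=8, Yd=1000)
q_high = quantity_demanded(P=10, Ps=8, Yd=2000)
```

Holding both prices fixed, raising Yd raises the predicted quantity demanded, exactly as the normal-good definition requires.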
Such a conclusion was unjustified because regression analysis cannot confirm causality; it can only test the strength and direction of the quantitative relationships involved.

Y = β0 + β1X
The βs are the coefficients that determine the coordinates of the straight line at any point. β0 is the constant or intercept term; it indicates the value of Y when X equals zero. β1 is the slope coefficient, and it indicates the amount that Y will change when X increases by one unit. β1 shows the response of Y to a one-unit increase in X.

Slope coefficient:
β1 = ΔY/ΔX

If linear regression techniques are going to be applied to an equation, that equation must be linear. By random we mean something that has its value determined entirely by chance.

AthenaSummary

The stochastic error term is a term that is added to a regression equation to introduce all of the variation in Y that cannot be explained by the included Xs. It is usually denoted epsilon (ε), although other symbols (like u or v) sometimes are used.

Y = β0 + β1X + ε
Deterministic is the part of the equation without the error term: β0 + β1X (also known as the expected value). Stochastic is the random component epsilon. The deterministic part of the equation may be written:
E(Y|X) = β0 + β1X
In the real world, Y is unlikely to be exactly equal to the deterministic expected value E(Y|X). As a result, the stochastic element ε must be added to the equation:
Y = E(Y|X) + ε = β0 + β1X + ε

The stochastic error term must be present in a regression equation because there are at least four sources of variation in Y other than the variation in the included Xs:
1. Many minor influences on Y are omitted from the equation (for example, because data are unavailable).
2. It is virtually impossible to avoid some sort of measurement error in the dependent variable.
3. The underlying theoretical equation might have a different functional form (or shape) than the one chosen for the regression. For example, the underlying equation might be nonlinear.
4. All attempts to generalize human behaviour must contain at least some amount of unpredictable or purely random variation.

Yi = β0 + β1Xi + εi    (i = 1, 2, ..., N)
• Yi = the ith observation of the dependent variable
• Xi = the ith observation of the independent variable
• εi = the ith observation of the stochastic error term
• β0, β1 = the regression coefficients
• N = the number of observations

The coefficients do not change from observation to observation, but the values of Y, X, and ε do.

A linear regression model with more than one independent variable is called a multivariate model:
Yi = β0 + β1X1i + β2X2i + β3X3i + εi

An estimated regression equation has an actual number in it for the values of beta. Ŷ is the estimated or fitted value of Y. The estimated regression coefficients, denoted by β̂0 and β̂1, are empirical best guesses of the true regression coefficients and are obtained from data from a sample of Ys and Xs. More generally:
Ŷi = β̂0 + β̂1Xi
To predict Y, use the prediction of E(Yi|Xi) from the regression equation.

The residual is the difference between the estimated and actual value of the dependent variable:
ei = Yi − Ŷi
Note the distinction between the residual and the error term:
εi = Yi − E(Yi|Xi)
The residual is the difference between the observed Y and the estimated regression line, while the error term is the difference between the observed Y and the true regression equation (the expected value of Y).

True regression equation: coefficients β0, β1 and error term εi
Estimated regression equation: coefficients β̂0, β̂1 and residuals ei

Although the true equation, like observations of the stochastic error term, can never be known, we were able to come up with an estimated equation that had the sign we expected for β1 and that helped us in our job.

A data set is called cross-sectional because all of the observations are from the same point in time and represent different individual economic entities (countries, housing, etc.) from that same point in time.
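The residual-versus-error-term distinction can be made concrete by simulating data from a known "true" equation, something that is never possible with real data. A Python sketch, where the true model Y = 2 + 3X + ε is an assumption made only so both quantities can be computed:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed true (normally unknowable) relationship: Y = 2 + 3X + epsilon.
beta0_true, beta1_true = 2.0, 3.0
X = rng.uniform(0, 10, size=200)
eps = rng.normal(0, 1, size=200)              # stochastic error term
Y = beta0_true + beta1_true * X + eps

# OLS estimates from the usual single-variable formulas.
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()

Y_hat = b0 + b1 * X
residuals = Y - Y_hat                          # e_i: Y minus the *estimated* line
errors = Y - (beta0_true + beta1_true * X)     # eps_i: Y minus the *true* line
```

The residuals and the errors are close but not identical; only the residuals are observable in practice, and (with an intercept in the equation) they sum to zero by construction.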
Chapter 17.1: Probability Distributions

If an event has a probability P of occurring, then the fraction of the times that it occurs in the long run will be very close to P. A random variable X is a variable whose numerical value is determined by chance, the outcome of a random phenomenon. A discrete random variable has a countable number of possible values, such as 0, 1, and 2. Continuous random variables, such as time and distance, can take on any value in an interval. A probability distribution P(xi) for a discrete random variable X assigns probabilities to the possible values x1, x2, and so on.

The expected value (mean) of a discrete random variable X is a weighted average of all possible values of X, using the probability of each X value as weights:
μ = E(X) = Σ xi P(xi)

However, an expected-value criterion is often inappropriate. To measure the extent to which the outcomes may differ from the expected value, we can use the variance of a discrete random variable X, which is a weighted average, for all possible values of X, of the squared difference between X and its expected value, using the probability of each X value as weights:
σ² = Σ (xi − μ)² P(xi)

The standard deviation σ is the square root of the variance. The variance is the expected value of (X − μ)², that is, the anticipated long-run average value of the squared deviations of the possible values of X from its expected value μ. The standard deviation is usually easier to interpret than the variance because it has the same units (for example, dollars) as X and μ, while the units for the variance are squared (for example, dollars squared).

We can display interval probabilities by using a continuous probability density curve. More generally, however, the formulas for the mean and standard deviation of a continuous random variable involve integrals and can be difficult to calculate.
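The expected value and variance formulas above can be checked with a small Python sketch, using an assumed distribution (the number of heads in two fair coin flips) chosen only for illustration:

```python
# Discrete random variable: X = number of heads in two fair coin flips.
values = [0, 1, 2]
probs = [0.25, 0.50, 0.25]

# mu = E(X) = sum of x_i * P(x_i)
mean = sum(x * p for x, p in zip(values, probs))

# sigma^2 = sum of (x_i - mu)^2 * P(x_i)
variance = sum((x - mean) ** 2 * p for x, p in zip(values, probs))

# The standard deviation is the square root of the variance,
# and it is in the same units as X.
std_dev = variance ** 0.5
```

Here E(X) = 1 head and the variance is 0.5 squared heads, so the standard deviation is about 0.71 heads.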
To standardize a random variable X, we subtract its mean μ and then divide by its standard deviation σ:
Z = (X − μ)/σ
No matter what the initial units of X, the standardized random variable Z has a mean of 0 and a standard deviation of 1. The standardized variable Z measures how many standard deviations X is above or below its mean. If X is equal to its mean, Z is equal to 0. If X is one standard deviation above its mean, Z is equal to 1. If X is two standard deviations below its mean, Z is equal to −2.

The central limit theorem states that if Z is a standardized sum of N independent, identically distributed (discrete or continuous) random variables with a finite, nonzero standard deviation, then the probability distribution of Z approaches the normal distribution as N increases. However, don't be lulled into thinking that probabilities always follow the normal curve.

Chapter 17.2: Sampling

We distinguish between a population, which is the entire group of items that interest us, and a sample, which is the part of this population that we actually observe. Statistical inference involves using the sample to draw conclusions about the characteristics of the population from which the sample came.

Any sample that is unrepresentative of the population it is intended to represent is called a biased sample. Because a biased sample gives a distorted picture of the population, it may lead to unwarranted conclusions. One of the most common causes of biased samples is selection bias, which occurs when the selection of the sample systematically excludes or underrepresents certain groups. Selection bias often happens when we use a convenience sample consisting of data that are readily available. Self-selection bias can occur when we examine data for a group of people who have chosen to be in that group; such a group may differ systematically from the population.
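The standardization formula and the central limit theorem described above can be sketched in Python with simulated draws; the exponential distribution is chosen only as an assumption, precisely because it is skewed and clearly non-normal:

```python
import numpy as np

rng = np.random.default_rng(42)

# Standardizing: Z = (X - mu) / sigma has mean 0 and standard deviation 1,
# whatever the original units of X.
X = rng.exponential(scale=2.0, size=100_000)   # skewed, decidedly non-normal
Z = (X - X.mean()) / X.std()

# Central limit theorem: a standardized sum of N iid draws looks increasingly
# normal as N grows, even though each individual draw is exponential.
N = 50
sums = rng.exponential(scale=2.0, size=(10_000, N)).sum(axis=1)
Z_sums = (sums - sums.mean()) / sums.std()

# For a normal distribution, about 68% of values lie within one sd of the mean.
share_within_1sd = float(np.mean(np.abs(Z_sums) < 1))
```

The share of standardized sums within one standard deviation lands near the normal benchmark of 0.68, even though the underlying draws are far from normal.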
Retrospective studies are notoriously unreliable, and not just because of faulty memories and lost data. We necessarily exclude members of the past population who are no longer around, an exclusion that causes survivor bias. The systematic refusal of some groups to participate in an experiment or to respond to a poll is called nonresponse bias.

Exercises Week 0
1. What are the three major uses of econometrics?
2. True or false?
   a. The sample is a smaller part of the population
   b. Selection bias occurs when the selection of the sample systematically excludes certain groups
   c. Survivor bias occurs when we exclude members of a past population
3. What is the stochastic error term?
4. Give the equation for the expected value of a discrete random variable X.
5. A data set is called cross-sectional because?

Week 1 Chapter 2: Ordinary Least Squares

The purpose of regression analysis is to take a purely theoretical equation like:
Y = β0 + β1X + ε
and use a set of data to create an estimated equation like:
Ŷi = β̂0 + β̂1Xi
The most widely used method of obtaining these estimates is Ordinary Least Squares (OLS). This is a regression estimation technique that calculates the β̂s so as to minimize the sum of the squared residuals, Σei².

Three important reasons for using OLS:
1. OLS is relatively easy to use
2. The goal of minimizing Σei² is quite appropriate from a theoretical point of view
3. OLS estimates have a number of useful characteristics

The final reason for using OLS is that its estimates have at least two useful properties:
1. The sum of the residuals is exactly zero
2. OLS can be shown to be the "best" estimator possible under a set of specific assumptions. We'll define "best" in Chapter 4.

An estimator is a mathematical technique that is applied to a sample of data to produce a real-world numerical estimate of the true population regression coefficient (or other parameters). Thus, OLS is an estimator, and a β̂ produced by OLS is an estimate.
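The OLS calculation and the zero-sum-of-residuals property can be sketched in Python on simulated data; the true coefficients below are assumptions used only to generate a sample:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate a sample from an assumed true model: Y = 4 + 1.5*X1 - 2*X2 + eps.
n = 500
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)
Y = 4 + 1.5 * X1 - 2 * X2 + rng.normal(size=n)

# OLS chooses the beta-hats that minimize the sum of squared residuals.
# With a constant column in X, the minimizer solves the normal equations
# (X'X) b = X'Y.
X = np.column_stack([np.ones(n), X1, X2])
b = np.linalg.solve(X.T @ X, X.T @ Y)

residuals = Y - X @ b
ssr = float(residuals @ residuals)     # the quantity OLS minimizes
sum_resid = float(residuals.sum())     # exactly zero, up to rounding
```

The estimated coefficients land close to the assumed true values (4, 1.5, −2), and the residuals sum to zero because the equation includes a constant term.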
The general multivariate regression model with K independent variables:
Yi = β0 + β1X1i + β2X2i + ... + βKXKi + εi

Specifically, a multivariate regression coefficient indicates the change in the dependent variable associated with a one-unit increase in the independent variable in question, holding constant the other independent variables in the equation.

The goal of OLS is to choose those β̂s that minimize the summed squared residuals. Econometricians use the squared variations of Y around its mean as a measure of the amount of variation to be explained by the regression. This computed quantity is usually called the total sum of squares, or TSS, and is written as:
TSS = Σ(Yi − Ȳ)²

The simplest commonly used measure of fit is R², or the coefficient of determination. R² is the ratio of the explained sum of squares to the total sum of squares:
R² = ESS/TSS = 1 − RSS/TSS = 1 − Σei² / Σ(Yi − Ȳ)²

The higher R² is, the closer the estimated regression equation fits the sample data. R² measures the percentage of the variation of Y around Ȳ that is explained by the regression equation. R² must lie in the interval 0 ≤ R² ≤ 1. A value of R² close to one shows an excellent overall fit. A major problem with R² is that adding another independent variable to a particular equation can never decrease R². In sum, R² is of little help if we are trying to decide whether adding a variable to an equation improves our ability to meaningfully explain the dependent variable. The adjusted R̄², which accounts for degrees of freedom, is:
R̄² = 1 − [Σei²/(N − K − 1)] / [Σ(Yi − Ȳ)²/(N − 1)]

Chapter 4: The Classical Model

The term classical refers to a set of fairly basic assumptions required to hold in order for OLS to be considered the "best" estimator available for regression models. When one or more assumptions do not hold, other estimation techniques sometimes may be better.

Classical assumptions:
1. The regression model is linear, is correctly specified, and has an additive error term.
2. The error term has a zero population mean.
3. All explanatory variables are uncorrelated with the error term.
4. Observations of the error term are uncorrelated with each other (no serial correlation).
5. The error term has a constant variance (no heteroskedasticity).
6. No explanatory variable is a perfect linear function of any other explanatory variable(s) (no perfect multicollinearity).
7. The error term is normally distributed (this assumption is optional but usually is invoked).

An error term satisfying Assumptions 1–5: classical error term. If Assumption 7 is added: classical normal error term.

2: To compensate for the chance that the mean of ε might not equal zero, the mean of ε for any regression is forced to be zero by the existence of the constant term in the equation. The constant term equals the fixed portion of Y that cannot be explained by the independent variables, whereas the error term equals the stochastic portion of the unexplained value of Y. The second classical assumption is assured as long as a constant term is included in the equation and all other classical assumptions are met.

3: If the error term and X were positively correlated, for example, then the estimated coefficient would probably be higher than it would otherwise have been (biased upward), because the OLS program would mistakenly attribute the variation in Y caused by ε to X instead. Classical Assumption III is violated most frequently when a researcher omits an important independent variable from the equation.

5: Assumption V is likely to be violated in cross-sectional data sets (heteroskedasticity).

6: Although it is quite unlikely to encounter perfect multicollinearity in practice, even imperfect multicollinearity can cause problems for estimation.
7: Normality is not required for OLS estimation, but its major application is hypothesis testing. It is usually advisable to add the assumption of normality to the other six assumptions for two reasons:
1. The error term εi can be thought of as the sum of a number of minor influences or errors. As the number of these minor influences gets larger, the distribution of the error term tends to approach the normal distribution.
2. The t-statistic and the F-statistic are not truly applicable unless the error term is normally distributed (or the sample is quite large).

An estimator is a formula, such as the OLS formula, while an estimate is the value of β̂ computed by the formula for a given sample. We would like the distribution of the β̂s to be centred around the true population β, and we would also like that distribution to be as narrow (or precise) as possible.

Gauss-Markov Theorem: Given Classical Assumptions I through VI, the Ordinary Least Squares estimator of βk is the minimum variance estimator from among the set of all linear unbiased estimators of βk, for k = 0, 1, 2, ..., K.

An unbiased estimator with the smallest variance is called efficient, and that estimator is said to have the property of efficiency. Given all seven classical assumptions, the OLS coefficient estimators can be shown to have the following properties:
1. They are unbiased.
2. They are minimum variance.
3. They are consistent.
4. They are normally distributed.

Exercises Week 1
1. What is the most widely used method of obtaining estimates?
2. True or false? Three important reasons for using OLS:
   a. OLS works on every problem
   b. The goal of minimizing Σei² is quite appropriate from a theoretical point of view
   c. It has a number of useful characteristics
3. What are the first 4 Classical Assumptions?
4. Give the equation for the general multivariate regression model with K independent variables.
5. What is the Gauss-Markov Theorem?
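The unbiasedness property can be illustrated with a Monte Carlo experiment in Python: draw many samples from the same assumed true model and check that the OLS slope estimates centre on the true β1. The model and all parameter values below are invented purely for the sketch:

```python
import numpy as np

rng = np.random.default_rng(7)

# Assumed true model for the experiment: Y = 1 + 0.5*X + eps, eps ~ N(0, 1).
beta0, beta1 = 1.0, 0.5
n, reps = 100, 2000

estimates = np.empty(reps)
for r in range(reps):
    X = rng.uniform(0, 5, size=n)
    Y = beta0 + beta1 * X + rng.normal(0, 1, size=n)
    # OLS slope for this sample
    estimates[r] = (np.sum((X - X.mean()) * (Y - Y.mean()))
                    / np.sum((X - X.mean()) ** 2))

# Unbiasedness: the estimates vary from sample to sample,
# but their average is very close to the true beta1.
mean_estimate = float(estimates.mean())
```

Each individual estimate misses β1 = 0.5 by a little, yet the average across 2,000 samples is almost exactly 0.5, which is what "centred around the true population β" means in practice.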
Week 2 Chapter 5: Hypothesis Testing and Statistical Inference

The null hypothesis typically is a statement of the values that the researcher does not expect.
Null hypothesis H0: β ≤ 0 (the values you do not expect)
The alternative hypothesis typically is a statement of the values that the researcher expects.
Alternative hypothesis HA: β > 0 (the values you expect)
A two-sided test is one in which the alternative hypothesis has values on both sides of the null hypothesis.

Type I and Type II Errors
Type I error = sending an innocent defendant to jail
Type II error = freeing a guilty defendant

A decision rule is a method of deciding whether to reject a null hypothesis. A critical value is a value that divides the "acceptance" region from the rejection region when testing a null hypothesis.

The t-Test
The t-test is used to test hypotheses about individual regression slope coefficients. The larger in absolute value this t-value is, the greater the likelihood that the estimated regression coefficient is different from zero. A critical t-value is the value that distinguishes the "acceptance" region from the rejection region. We reject the null hypothesis if the calculated t-value is greater in absolute value than the critical t-value and if the calculated t-value has the sign implied by HA:
Reject H0 if |tk| > tc and if tk also has the sign implied by HA; do not reject H0 otherwise.

The level of significance indicates the probability of observing an estimated t-value greater than the critical t-value if the null hypothesis were correct.

The most common use of the one-sided t-test is to determine whether a regression coefficient is significantly different from zero in the direction predicted by theory. The four steps to use when working with the t-test are:
1. Set up the null and alternative hypotheses.
2. Choose a level of significance and therefore a critical t-value.
3. Run the regression and obtain an estimated t-value (or t-score).
4. Apply the decision rule by comparing the calculated t-value with the critical t-value in order to reject or not reject the null hypothesis.

The p-Values
A p-value, or marginal significance level, for a t-score is the probability of observing a t-score that size or larger (in absolute value) if the null hypothesis were true. A p-value is a probability, so it runs from 0 to 1. A small p-value casts doubt on the null hypothesis, so to reject a null hypothesis, we need a low p-value.
Reject H0 if the p-value < the level of significance and if β̂k has the sign implied by HA. Do not reject H0 otherwise.

Limitations of the t-Test
One problem with the t-test is that it is easy to misuse. The usefulness of the t-test diminishes rapidly as more and more specifications are estimated and tested. Never conclude that statistical significance, as shown by the t-test, is the same as theoretical validity. Statistical significance says little if anything about which variables determine the major portion of the variation in the dependent variable. Thus, the mere existence of a large t-score for a huge sample has no real substantive significance.

A confidence interval is a range of values that will contain the true value of β a certain percentage of the time, say 90 or 95 percent. The formula for a confidence interval is:
Confidence interval = β̂ ± tc · SE(β̂)
where tc is the two-sided critical value of the t-statistic for whatever significance level we choose.

The F-Test
The F-test is a formal hypothesis test that is designed to deal with a null hypothesis that contains multiple hypotheses or a single hypothesis about a group of coefficients. As a result, in the F-test the null hypothesis always leads to a constrained equation, even if this violates our standard practice that the alternative hypothesis contains what we expect is true.
The second step in an F-test is to estimate this constrained equation with OLS and compare the fit of the constrained equation with the fit of the unconstrained equation:
F = [(RSS_R − RSS)/M] / [RSS/(N − K − 1)]
where:
• RSS = residual sum of squares from the unconstrained equation
• RSS_R = residual sum of squares from the constrained equation
• M = number of constraints placed on the equation (usually equal to the number of β̂s eliminated from the unconstrained equation)
• N − K − 1 = degrees of freedom in the unconstrained equation

RSS_R is always greater than or equal to RSS. When F gets larger than the critical F-value, the hypothesized restrictions specified in the null hypothesis are rejected by the test.
Reject H0 if F > Fc; do not reject H0 if F ≤ Fc.

Dropping a variable that belongs in the equation leads to omitted variable bias. Remedies for multicollinearity:
• Drop a redundant variable: a viable strategy when two variables measure almost the same thing.
• Increase the sample size: this increases TSS.
If the multicollinearity has not decreased t-scores to the point of insignificance, then no remedy should even be considered.

Exercises Week 4
1. What allows a slope dummy variable?
2. True or false?
   a. If two variables are perfectly positively correlated, then r = +1
   b. If two variables are perfectly negatively correlated, then r = −1
   c. If two variables are totally uncorrelated, then r = 0
3. Perfect multicollinearity violates which classical assumption?
4. Give 3 consequences of multicollinearity.
5. What are the remedies for multicollinearity?

Week 6 Chapter 9: Serial Correlation

Time-series studies have some characteristics that make them more difficult to deal with than cross-sections: the order of observations in a time series is fixed, and serial correlation causes the OLS estimates of the SE(β̂)s to be biased, leading to unreliable hypothesis testing.

How Can We Detect Serial Correlation?
The Durbin-Watson test is used to determine if there is first-order serial correlation in the error term of an equation by examining the residuals of a particular estimation of that equation. It assumes that:
1. The regression model includes an intercept term.
2. The serial correlation is first-order in nature: εt = ρεt−1 + ut, where ρ is the autocorrelation coefficient and u is a classical (normally distributed) error term.
3. The regression model does not include a lagged dependent variable as an independent variable.

The equation for the Durbin-Watson statistic for T observations is:
d = Σ (et − et−1)² / Σ et²
where the numerator sum runs from t = 2 to T and the denominator sum from t = 1 to T.

Unfortunately, the Durbin-Watson test has a number of limitations. A popular alternative to the Durbin-Watson test is the Lagrange Multiplier (LM) test, which checks for serial correlation by analysing how well the lagged residuals explain the residual of the original equation in an equation that also includes all the explanatory variables of the original model.

Generalized Least Squares (GLS) is a method of ridding an equation of pure first-order serial correlation and in the process restoring the minimum variance property to its estimation. To estimate GLS equations, the Prais-Winsten method is used. It is a two-step iterative technique that rids an equation of serial correlation by first producing an estimate of ρ and then estimating the GLS equation using that ρ. Not all corrections for pure serial correlation involve GLS. Newey-West standard errors are SE(β̂)s that take account of serial correlation without changing the β̂s themselves in any way.

Exercises Week 6
1. Give 2 characteristics of time-series studies that make them more difficult than cross-sections.
2. Which of these assumptions need to be met before using the Durbin-Watson test?
   a. The regression model includes an intercept term
   b. The serial correlation is first-order in nature
   c. The regression model does include a lagged dependent variable as an independent variable
3. When does pure serial correlation occur?
4. What is the equation for the Durbin-Watson statistic for T observations?
5. What is an alternative to the Durbin-Watson test?

Week 7 Chapter 12: Time-Series Models

Yt = β0 + β1Xt + β2Xt−1 + β3Xt−2 + εt
Because X appears three times, each with a different lag, the equation distributes the impact of X over a number of time periods. This is a distributed lag model; it explains the current value of Y as a function of current and past values of X. A more general distributed lag equation would be:
Yt = α0 + β0Xt + β1Xt−1 + β2Xt−2 + ... + βpXt−p + εt
where p is the maximum number of periods by which X is lagged.

Unfortunately, the estimation of the previous equation with OLS causes a number of problems:
1. The various lagged values of X are likely to be severely multicollinear, making coefficient estimates imprecise.
2. In large part because of this multicollinearity, there is no guarantee that the estimated βs will follow the smoothly declining pattern that economic theory would suggest. Instead, it's quite typical for the estimated coefficients of the previous equation to follow a fairly irregular pattern.
3. The degrees of freedom tend to decrease, sometimes substantially, for two reasons. First, we have to estimate a coefficient for each lagged X, thus increasing K and lowering the degrees of freedom (N − K − 1). Second, each additional lag costs an observation at the start of the sample, reducing N.

The most commonly used simplification is to replace all the lagged independent variables with a lagged value of the dependent variable; we'll call that kind of equation a dynamic model. The simplest dynamic model is an equation in which the current value of the dependent variable Y is a function of the current value of X and a lagged value of Y itself.
Such a model with a lagged dependent variable is often called an autoregressive equation.

Serial correlation causes: (1) OLS to no longer be the minimum variance unbiased estimator, (2) the SE(β̂)s to be biased, and (3) no bias in the OLS β̂s themselves. If an equation with a lagged dependent variable as an independent variable has a serially correlated error term, however, then OLS estimates of the coefficients will be biased, even in large samples. If serial correlation causes bias in a dynamic model, then tests for serial correlation are obviously important. Using the Lagrange Multiplier to test for serial correlation in a typical dynamic model involves three steps that should seem familiar:
1. Obtain the residuals from the estimated equation:
   et = Yt − Ŷt = Yt − β̂0 − β̂1Xt − β̂2Yt−1
2. Use these residuals as the dependent variable in an auxiliary equation that includes as independent variables all those on the right-hand side of the original equation as well as the lagged residuals:
   et = a0 + a1Xt + a2Yt−1 + a3et−1 + ut
3. Estimate the auxiliary equation using OLS and then test the null hypothesis that a3 = 0 with the following test statistic:
   LM = N·R²

Granger causality, or precedence, is a circumstance in which one time-series variable consistently and predictably changes before another variable. One problem with time-series data is that independent variables can appear to be more significant than they actually are if they have the same underlying trend as the dependent variable. Such a problem is an example of spurious correlation, a strong relationship between two or more variables that is not caused by a real underlying causal relationship. The focus of this section, however, will be on time-series data and in particular on spurious correlation caused by nonstationary time series.

More formally, a time-series variable Xt is stationary if:
1. The mean of Xt is constant over time,
2. The variance of Xt is constant over time, and
3. The simple correlation coefficient between Xt and Xt−k depends on the length of the lag (k) but on no other variable (for all k).
If one or more of these properties is not met, then Xt is nonstationary; that problem is referred to as nonstationarity.

A random walk is a time-series variable in which the next period's value equals this period's value plus a stochastic error term. To ensure that the equations we estimate are not spurious, it's important to test for nonstationarity. After the time trend has been removed, the standard method of testing for nonstationarity is the Dickey-Fuller test. The Dickey-Fuller test actually comes in three versions:
1. ΔYt = β1Yt−1 + vt
2. ΔYt = β0 + β1Yt−1 + vt
3. ΔYt = β0 + β1Yt−1 + β2t + vt

Cointegration consists of matching the degree of nonstationarity of the variables in an equation in a way that makes the error term (and residuals) of the equation stationary and rids the equation of any spurious regression results. To deal with the possibility that nonstationary time series might be causing regression results to be spurious, most empirical work in time series follows a standard sequence of steps:
1. Specify the model. This model might be a time-series equation with no lagged variables, a distributed lag model, or a dynamic model.
2. Test all variables for unit roots using the appropriate version of the Dickey-Fuller test.
3. If the variables don't have unit roots, estimate the equation in its original units (Y and X).
4. If the variables have unit roots, test the residuals of the equation for cointegration using the Dickey-Fuller test.
5. If the variables have unit roots and are not cointegrated, then change the functional form of the model to first differences (ΔY and ΔX).
6. If the variables have unit roots and also are cointegrated, then estimate the equation in its original units.

Exercises Week 7
1. What is the distributed lag model?
.n with OLS causes a number of problems. True or ‘The estimation of the distributed lag equ false? ‘a. The various lagged values of X are likely to be b. There is guarantee that the estimated Bs will ‘economic theory would suggest. c. The degrees of freedom tend to decrease severely multicollinear follow the smoothly declining pattern that What is the most common way to remove al the lagged independent variables in an equation? iable X, stationary if: When is a time-series ve What does cointegration consist of? 28 Week 8 Chapter 13: Dummy Dependent Variable Techniques A linear probability model is just that, a linear-in-the-coefficients equation used to explain a dummy dependent variable: D.= Bo + Baku + Bakar +61 Unfortunately, using OLS to estimate the coefficients of an equation with a dummy dependent variable faces at least three problems. 1. R isnot an accurate measure of overall fit 2. Dy isnot bounded by 0 and 1 3. The error term is neither homoscedastic nor normally distributed The binomial logit is an estimation technique for equations with dummy dependent variables that avoids the unboundedness problem of te linear probability model by using a variant of the cumulative logistic function: 1 ae rs Logits cannot be estimated using OLS. Instead, we use maximum likelihood (ML), an iterative estimation technique that is especially useful for equations that are nonlinear in the coefficients. The standard documentation format for estimated logit equations. L:Pr(Dj = 1) = Bo + BiXe + BoXai How can we interpret estimated logit coefficients? How can we use them to measure the impact of an independent variable on the probability that D, = 1? Itturns out that there are three reasonable ways of answering this question: 1. Change an average observation 2. Use a partial derivative 3. 
Use a rough estimate of 0.25
The binomial probit model is an estimation technique for equations with dummy dependent variables that avoids the unboundedness problem of the linear probability model by using a variant of the cumulative normal distribution:
Pi = Φ(Zi), where Zi = β0 + β1X1i + β2X2i is a standardized normal variable.
As different as this probit looks from the logit that we examined previously, it can be rewritten to look quite familiar:
Zi = Φ⁻¹(Pi) = β0 + β1X1i + β2X2i

Exercises Week 8
1. Give the linear probability model.
2. Using OLS to estimate the coefficients of an equation with a dummy dependent variable faces three problems. True or false?
   a. R² is not an accurate measure of overall fit.
   b. D̂i is not bounded by 0 and 1.
   c. The error term is homoscedastic but not normally distributed.
3. Why is maximum likelihood used?
4. Give the standard documentation format for estimated logit equations.
5. What is the binomial probit model?

Solutions

Week 0
1. (1) Describing economic reality, (2) testing hypotheses about economic theory and policy, (3) forecasting future economic activity.
2. (a) true, (b) true, (c) true
   a. Set up the null and alternative hypotheses.
   b. Choose a level of significance and therefore a critical t-value.
   c. Run the regression and obtain an estimated t-value (or t-score).
   d. Apply the decision rule by comparing the calculated t-value with the critical t-value in order to reject or not reject the null hypothesis.
3. It is a term that is added to a regression equation to introduce all of the variation in Y that cannot be explained by the included Xs.
4. μ = E(X) = Σi xi P(X = xi)
5. Because all of the observations are from the same point in time and represent different individual economic entities (countries, households, etc.) from that same point in time.
6. Ordinary Least Squares (OLS)

Week 1
1. (a) false, (b) true, (c) true
   a. Set up the null and alternative hypotheses.
   b.
Choose a level of significance and therefore a critical t-value.
   c. Run the regression and obtain an estimated t-value (or t-score).
   d. Apply the decision rule by comparing the calculated t-value with the critical t-value in order to reject or not reject the null hypothesis.
2. (1) The regression model is linear, is correctly specified, and has an additive error term. (2) The error term has a zero population mean. (3) All explanatory variables are uncorrelated with the error term. (4) Observations of the error term are uncorrelated with each other (no serial correlation).
3. Yi = β0 + β1X1i + β2X2i + ... + βKXKi + εi
4. The Gauss-Markov Theorem: given Classical Assumptions I through VI, the Ordinary Least Squares estimator of βk is the minimum-variance estimator from among the set of all linear unbiased estimators of βk, for k = 0, 1, 2, ..., K.

Week 2
1. The null hypothesis is a statement of the values that the researcher does not expect, whilst the alternative hypothesis is a statement of the values that the researcher does expect.
2. (a) true, (b) true
3. a. Set up the null and alternative hypotheses.
   b. Choose a level of significance and therefore a critical t-value.
   c. Run the regression and obtain an estimated t-value (or t-score).
   d. Apply the decision rule by comparing the calculated t-value with the critical t-value in order to reject or not reject the null hypothesis.
4. F = ((RSSM − RSS)/M) / (RSS/(N − K − 1))
5. Seasonal dummies are dummy variables that are used to account for seasonal variation in the data in time-series models.

Week 3
1. Choosing the correct independent variables, the correct functional form, and the correct form of the stochastic error term. A specification error results when any one of these choices is made incorrectly.
2. (a) false, (b) false, (c) true
3. We violate Classical Assumption III (that the explanatory variables are uncorrelated with the error term).
4. ln Yi = β0 + β1 ln Xi + εi
5. Lag

Week 4
1.
A slope dummy variable allows the slope of the relationship between the dependent variable and an independent variable to be different depending on whether the condition specified by a dummy variable is met.
2. (a) false, (b) false, (c) true
3. Perfect multicollinearity violates Classical Assumption VI.
4. Any of the below 5 options:
   a. Estimates will remain unbiased.
   b. The variances and standard errors of the estimates will increase.
   c. The computed t-scores will fall.
   d. Estimates will become very sensitive to changes in specification.
   e. The overall fit of the equation and the estimation of the coefficients of nonmulticollinear variables will be largely unaffected.
5. (1) Do nothing, (2) drop a redundant variable, (3) increase the sample size.

Week 6
1. Any of the below mentioned 4:
   a. The order of observations in a time series is fixed.
   b. Time-series samples tend to be much smaller than cross-sectional ones.
   c. The theory underlying time-series analysis can be quite complex.
   d. The stochastic error term in a time-series equation is often affected by events that took place in a previous time period.
2. (a) true, (b) false, (c) false
3. When Classical Assumption IV, which assumes uncorrelated observations of the error term, is violated in a correctly specified equation.
4. εt = ρεt−1 + ut
5. Lagrange Multiplier

Week 7
1. Yt = β0 + β1X1t + β2X1t−1 + εt
2. (a) true, (b) false, (c) false
3. Replacing all the lagged independent variables with a lagged value of the dependent variable.
4. (1) The mean of Xt is constant over time, (2) the variance of Xt is constant over time, (3) the simple correlation coefficient between Xt and Xt−k depends on the length of the lag (k) but on no other variable (for all k).
5. Cointegration consists of matching the degree of nonstationarity of the variables in an equation in a way that makes the error term (and residuals) of the equation stationary and rids the equation of any spurious regression results.
Week 8
1. Di = β0 + β1X1i + β2X2i + εi
2. (a) true, (b) true, (c) false
3. Because logits cannot be estimated by using OLS.
4. L: Pr(Di = 1) = β0 + β1X1i + β2X2i
5. An estimation technique for equations with dummy dependent variables that avoids the unboundedness problem of the linear probability model by using a variant of the cumulative normal distribution.

Summary Lectures

Statistics and Overview of Regression Analysis
The expected value of X is its average value (i.e. the mean starting salary) in the population, calculated by weighting each value with the probability that it comes up.
1. When X is a constant with value c, e.g. the expected value of starting salaries when all econ graduates earn exactly the same starting salary: E(c) = c
2. When a constant c is added to X, e.g. the expected value of starting salaries when economics graduates all receive the same fixed bonus c on top of a random component X: E(X + c) = E(X) + c = μX + c
3. When X is multiplied by a constant c, e.g. economics graduates earn X euros or cX dollars, where c is a constant euro-dollar exchange rate: E(cX) = cE(X) = cμX
Important: this rule tells us what happens to the expected value of X when the units of measurement of X are changed, e.g. measuring in euros or thousands of euros.
4. When random variables X1 and X2 are summed, e.g. the expected value of starting salaries when these are made up of two different income sources (say, baseline wages and overtime pay): E(X1 + X2) = E(X1) + E(X2) = μX1 + μX2
Putting the rules together: E(3X1 + 2X2 + 5) = 3E(X1) + 2E(X2) + 5 = 3μX1 + 2μX2 + 5

Population variance and standard deviation
The variance of the random variable X:
Var(X) = E[(X − E(X))²] = Σi (xi − μX)² P(X = xi)
The standard deviation of a random variable X:
sd(X) = √Var(X)
Notation: Var(X) = σ²X, sd(X) = σX
1. When X is a constant with value c, e.g. the variance of starting salaries when all econ graduates earn exactly the same starting salary: Var(c) = 0
2. When a constant c is added to X, e.g. the variance of starting salaries when econ graduates all receive the same fixed bonus c on top of a random component X: Var(X + c) = Var(X)
3. When X is multiplied by a constant c, e.g. the variance of starting salaries when units of measurement are changed (economics graduates earn X euros or cX dollars, where c is a constant euro-dollar exchange rate): Var(cX) = c² Var(X)
4. When (pairwise) independent random variables are summed: Var(X1 + X2) = Var(X1) + Var(X2)
5. When dependent random variables are summed: Var(X1 + X2) = Var(X1) + Var(X2) + 2Cov(X1, X2)
Rules 3 & 4: the variance of the sum of aX1 and bX2 (where a and b are constants), if X1 and X2 are independent: Var(aX1 + bX2) = a² Var(X1) + b² Var(X2)
Rules 3 & 5: the variance of the sum of aX1 and bX2 (where a and b are constants), if X1 and X2 are dependent: Var(aX1 + bX2) = a² Var(X1) + b² Var(X2) + 2ab Cov(X1, X2)

Covariance and correlation
The covariance between X and G is a measure of linear association between X and G:
Cov(X, G) = σXG = E[(X − E(X))(G − E(G))]
The correlation between X and G is a scale-invariant measure of linear association between X and G:
Corr(X, G) = ρXG = Cov(X, G) / (sd(X) sd(G))
1. Covariance between X and a constant c: Cov(X, c) = 0
2. Covariance between aX and bG, where a and b are constants: Cov(aX, bG) = ab Cov(X, G)
3. The covariance between X and itself: Cov(X, X) = E[(X − E(X))²] = Var(X)

Terminology: Multiple Regression Model
Population regression model, in general form:
Yi = β0 + β1X1i + β2X2i + ... + βKXKi + εi
Where:
Yi: dependent variable (or explained variable, endogenous variable, regressand)
X1i, ..., XKi: independent variables (or explanatory variables, exogenous variables, regressors)
εi: error term (or disturbance), which captures all influences on Yi other than the included Xs.
The OLS estimator chooses β̂0, β̂1 such that the sum of squared deviations (i.e. vertical distances) from the regression line is minimized (hence "least squares"); the residual ei > 0 if the data point lies above the regression line, ei < 0 if below.

F-test
Step 1: Define hypotheses, H0 and HA.
Step 2: Choose a significance level α.
Step 3: Estimate the restricted and unrestricted models, where the restricted model is the one obtained if H0 is true.
Step 4: Compare the residual sum of squares (RSS) across the two models by calculating the F-statistic:
F = ((RSSM − RSS)/M) / (RSS/(n − k − 1))
Step 5: Find the critical value for the F-statistic, Fc = F(α; M, n − k − 1).
Step 6: Compare the observed statistic to the critical value; reject H0 if F > Fc.

Omitted Variable Bias
Omitted variable bias is a serious problem for empirical economic research: it violates one of the OLS assumptions for unbiasedness.
The misspecified model, where X2 is the omitted variable:
Yi = β0 + β1X1i + εi*   (2), where εi* = β2X2i + εi
The relationship between the omitted and the included variable is given by:
X2i = α0 + α1X1i + ui
Estimates of equation 2 are Yi = β̂0 + β̂1X1i + ei.
We can show that E(β̂1) ≠ β1, but instead:
E(β̂1) = β1 + α1β2   (OVB formula)
where β̂1 is the estimated coefficient and β1 the true coefficient. The bias depends on:
- α1: the bivariate relationship between the included and the omitted variable
- β2: the partial effect of the omitted variable on the dependent variable Y

           α1 > 0          α1 < 0
β2 > 0     positive bias   negative bias
β2 < 0     negative bias   positive bias

Functional Form
1. Level-level specification: cigsi = β0 + β1 incomei + εi
2. Log-log specification (double log): ln cigsi = β0 + β1 ln incomei + εi
3. Log-level specification (semi-log): ln cigsi = β0 + β1 incomei + εi
4. Level-log specification (semi-log): cigsi = β0 + β1 ln incomei + εi
1. Level-level specification. Interpretation: smokers who earn $1000 more per year smoke 0.22 cigarettes more per day.
2. Log-log specification: the coefficient gives an elasticity. Interpretation: when income increases by 1%, smokers smoke 0.16% more cigarettes per day, i.e. the income elasticity of cigarette consumption is 0.16 for smokers.
3. Log-level specification: the coefficient × 100% gives the percentage change in the dependent variable for a one-unit increase in the level of the independent variable. Interpretation: when income increases by $1000, smokers smoke 1.05% more cigarettes per day.
4. Level-log specification: the coefficient/100 gives the impact on the level of the dependent variable of a 1% increase in the independent variable. Interpretation: when income increases by 1%, smokers smoke 0.03 (= 3.32/100) more cigarettes per day.
You cannot compare R² or adjusted R² across models with different dependent variables!

Dummy Variable
Dummy for gender, femalei:
- femalei = 1 if the respondent is female; femalei = 0 if the respondent is not female.
- femalei = 0 is the reference group.
Perfect collinearity: femalei and malei are perfect linear functions of each other. In particular, for each individual observation femalei + malei = 1. Hence, one of the OLS assumptions for unbiasedness is violated. This is known as the dummy variable trap: you cannot include a full set of dummies; we need an omitted category, which serves as the reference category.
Dummy variables for more than 2 groups:
- Dummies for categorical variables: cannot be ranked.
- Dummies for ordinal variables: can be ranked.
Create a separate dummy variable for each of the k categories of race. Include k − 1 of those dummies in the equation (not k, due to the dummy variable trap!).

Interaction Term
ln wagei = β0 + β1 agei + β2 educi + β3 femi + β4 femi × educi + εi
Average log wages for men (femi = 0): β0 + β1 agei + β2 educi
Average log wages for women (femi = 1): β0 + β1 agei + β2 educi + β3 + β4 educi = (β0 + β3) + β1 agei + (β2 + β4) educi
Chow test
The Chow test tests whether groups have different regression functions.
1. Write restricted and unrestricted models; define corresponding null and alternative hypotheses.
2. Choose a significance level α.
3. Estimate the restricted and unrestricted models.
4. Calculate the Chow test statistic, which is an F-statistic comparing the RSS between the restricted and unrestricted models.
5. Find the critical F-statistic, Fc.
6. Reject H0 if F > Fc.
Restricted model: ln wagei = β0 + β1 agei + β2 educi + εi
Unrestricted models:
ln wagei = β0^M + β1^M agei + β2^M educi + εi for males (1)
ln wagei = β0^F + β1^F agei + β2^F educi + εi for females (2)
Hypotheses: H0: β0^M = β0^F, β1^M = β1^F, β2^M = β2^F; HA: H0 not true; α = 0.05
F = ((RSSM − RSS1 − RSS2)/(k + 1)) / ((RSS1 + RSS2)/(n1 + n2 − 2(k + 1))) ~ F(k + 1, n1 + n2 − 2(k + 1))
where:
RSSM: RSS from the restricted model
RSS1: RSS from unrestricted model 1
RSS2: RSS from unrestricted model 2
n1: number of observations in unrestricted model 1
n2: number of observations in unrestricted model 2
k: number of parameters (all models have the same k)

Multicollinearity
Perfect (multi)collinearity: a perfect linear relationship between 2 or more independent variables. Imperfect (multi)collinearity does not violate any assumptions, but still may be a concern. The variances (and hence standard errors) of the estimates will increase. This makes it more difficult to reject the null hypothesis that a particular independent variable has no impact on the dependent variable. Estimates can also become very sensitive to changes in specification (e.g. adding a variable; changes in the number of observations). The estimation of the coefficients of nonmulticollinear variables will be largely unaffected.
- Multicollinearity increases the R² of auxiliary regressions of one regressor on the others, and therefore the standard errors.
Disease: imperfect multicollinearity. Consequence: estimates of βk and of Var(εi) remain unbiased (since no OLS assumption has been violated), but the estimated Var(β̂k) is larger (i.e. larger standard errors, higher p-values).
Diagnosis: examine correlations among regressors; estimate auxiliary regressions. Solution: do nothing (only drop one of the highly correlated variables if economic theory justifies it).

Heteroskedasticity
Consequences: the t-statistic and F-statistic depend on the variance, so if the variance is not constant the estimated standard errors are biased and hypothesis tests are unreliable.
To test for heteroskedasticity: the Breusch-Pagan test.
1. Estimate the model Yi = β0 + β1X1i + β2X2i + εi.
2. Predict the residuals ei from the estimated model Yi = β̂0 + β̂1X1i + β̂2X2i + ei and square them (ei²).
3. Regress the squared residuals ei² on the independent variables from the original model: ei² = δ0 + δ1X1i + δ2X2i + vi.
4. Test whether the independent variables have a jointly significant impact on ei². If they do (i.e. H0 is rejected), we have heteroskedasticity.
H0: δ1 = δ2 = 0 (homoskedasticity); HA: H0 not true (heteroskedasticity)
Robust standard errors are typically higher than the regular ones, although they may also be lower. Higher standard errors mean the t-statistics become smaller (in absolute value), and estimates become less significant.

Time series
Static time-series model: Yt = β0 + β1X1t + εt
Distributed lag model: Yt = β0 + β1X1t + β2X1t−1 + εt
Autoregressive distributed lag model: Yt = β0 + β1Yt−1 + β2X1t + β3X1t−1 + εt
Spurious regression: a strong statistical relationship between two or more variables that is not driven by an underlying causal relationship, e.g. when both follow a time trend. Even if those factors are unobserved, we can control for them by directly controlling for the trend.
Serial correlation: violates OLS assumption 5.
- Consequences: biased standard errors and, additionally, biased coefficient estimates in autoregressive models.
- Diagnosis: Breusch-Godfrey test.
- Solution: GLS estimation or Newey-West standard errors.
A distributed lag model with many lagged independent variables is often problematic. The lagged independent variables are often strongly correlated, leading to multicollinearity (i.e. less significant estimates).
The lagged independent variables take up degrees of freedom, decreasing the precision of our estimates (i.e. less significant estimates).
We can rewrite the autoregressive (AR) model as a distributed lag (DL) model:
- The DL model has stronger multicollinearity;
- The DL model requires the estimation of more parameters and also has a lower number of observations (due to the lags);
- The DL model typically has more serial correlation.
A time series Yt is strictly stationary if the distribution of Y1, Y2, ..., Yn is the same as the distribution of the variable shifted by some time lag k, Y1+k, Y2+k, ..., Yn+k: the distribution of the variable does not depend on time t. However, to avoid spurious regression, we only need a weaker version: covariance stationarity.
A time series Yt is covariance stationary if the following 3 statistical properties are unaffected by a change of time:
1. E(Yt) is constant over time
2. Var(Yt) is constant over time
3. Cov(Yt, Yt−k) does not depend on time t (it only depends on the lag length k)
When one or more of these conditions is violated, a time series is non-stationary, e.g. when it follows a random walk. Yt is a non-stationary variable if:
Yt = Yt−1 + εt   (εt i.i.d.)
If Yt follows a random walk, then the value of Y tomorrow is the value of Y today, plus an unpredictable (i.i.d.) disturbance εt.
A variable that follows a random walk is also said to have a unit root, or to be I(1), which stands for integrated of order 1. This is very important for economic applications, since many macroeconomic time series are random walks!
Repeated substitution shows that the variance of Yt becomes larger and larger over time: period 1: σ²; period 2: 2σ²; period 3: 3σ²; etc. We conclude that the variance of a random walk is not constant over time: a random walk is non-stationary.

Dickey-Fuller test
The Dickey-Fuller test for a unit root is performed separately for all variables of the regression equation.
1. Determine whether Yt follows a time trend by estimating Yt = β0 + β1t + εt.
2. Estimate:
ΔYt = α + θYt−1 + εt if no time trend is found
ΔYt = α + θYt−1 + β1t + εt if a time trend is found
H0: θ = 0; HA: θ < 0
3. Compare the t-statistic on θ to the appropriate critical value from the DF table, DFc.
4. Reject H0 if t < DFc, in which case Yt does not have a unit root. Repeat this for the independent variable(s) in the regression equation!
If both the dependent and the independent variable are non-stationary, the error term may still be stationary. If this is the case, the dependent and independent variables are said to be cointegrated. This is good news, since when X and Y are cointegrated, the original regression results are not spurious, and we do not need to first-difference.

Dickey-Fuller Cointegration Test
This test examines whether the error term is stationary. This should be the case if the two series are cointegrated, since it indicates the series do not wander apart. The test uses the residual et (as errors are unobserved).
1. Estimate the relationship between Yt and Xt (include a time trend if β2 is significant):
Yt = β0 + β1Xt (+ β2t) + εt
2. Generate the residual, et.
3. Regress the differenced residual on the lagged residual:
Δet = γ0 + θet−1 + ut
4. Compare the t-statistic to the critical value from the DF cointegration (DFC) table (with or without a time trend, depending on the model from step 1), to test:
H0: θ = 0 (no cointegration); HA: θ < 0 (cointegration)
5. Reject H0 if t < DFCc; if H0 is rejected, we conclude Yt and Xt are cointegrated.
A standard sequence of steps for avoiding spurious regression:
1. Test all variables for unit roots (i.e. nonstationarity) using the appropriate version of the Dickey-Fuller test.
2. If the variables do not have unit roots, estimate the equation in its original units (Y and X).
3. If the variables both have unit roots, test the residuals of the equation for cointegration using the Dickey-Fuller cointegration test.
4. If the variables both have unit roots but are not cointegrated, then change the functional form of the model to first differences (ΔX and ΔY) and estimate the equation.
5. If the variables both have unit roots and also are cointegrated, then estimate the equation in its original units.

Linear Probability Model
Sometimes we want to estimate a model that has a dummy variable as the dependent variable. But how can we estimate such a model, which has a 0-1 variable as the dependent variable? It turns out one possibility is to use the OLS estimator: this is called the Linear Probability Model (LPM). Non-linear estimators, such as logit and probit, are also an option but not part of this course.
The marginal (or partial) effect of each variable is given by its coefficient.
Linear Probability Model:
- Advantages: easy to compute and easy to interpret.
- Most important disadvantages:
  - The errors are heteroskedastic by construction.
  - Predicted probabilities may lie outside the 0-1 interval (and of course it makes no sense to interpret these as probabilities).
Since OLS is a linear estimator, predicted probabilities can lie outside the 0-1 interval. Therefore, whenever estimating an LPM, you should always check your predicted probabilities. If many are < 0 or > 1, you should use a different estimator (logit/probit) instead of OLS (not part of this course).
When estimating an LPM, do not forget to:
- Correct for heteroskedasticity (robust standard errors): else the standard errors are biased.
- Check the fitted values: if more than a few percent fall outside of the 0-1 range, the OLS estimator is not appropriate.
Experiments are not always possible, however; another approach increasingly used by economists is so-called natural experiments.
Natural experiment: random variation contained within observational data (as opposed to experiments designed by the researchers themselves).
One example is the political representation of women in West Bengal: since the reservation of seats for women was random across towns, this is a natural experiment.
