ECONOMETRIC METHODS
JACK JOHNSTON · JOHN DiNARDO
McGraw-Hill, A Division of The McGraw-Hill Companies
http://www.mhcollege.com
P/N 032720-3, part of ISBN 0-07-913121-2

Contents

Appendix
1.3 To derive cov(a, b)
1.4 Gauss-Markov theorem
1.5 To derive var(e0)
Problems

2 Further Aspects of Two-Variable Relationships
2.1 Time as a Regressor
  2.1.1 Constant Growth Curves
  2.1.2 Numerical Example
2.2 Transformations of Variables
  2.2.1 Log-Log Transformations
  2.2.2 Semilog Transformations
  2.2.3 Reciprocal Transformations
2.3 An Empirical Example of a Nonlinear Relation: U.S. Inflation and Unemployment
2.4 Lagged Dependent Variable as Regressor
  2.4.1 An Introduction to Asymptotics
  2.4.2 Convergence in Probability
  2.4.3 Convergence in Distribution
  2.4.4 The Autoregressive Equation
2.5 Stationary and Nonstationary Series
  2.5.1 Unit Root
  2.5.2 Numerical Illustration
2.6 Maximum Likelihood Estimation of the Autoregressive Equation
  2.6.1 Maximum Likelihood Estimators
  2.6.2 Properties of Maximum Likelihood Estimators
Appendix
2.1 Change of variables in density functions
2.2 Maximum likelihood estimators for the AR(1) model
Problems

3 The k-Variable Linear Equation
3.1 Matrix Formulation of the k-Variable Model
  3.1.1 The Algebra of Least Squares
  3.1.2 Decomposition of the Sum of Squares
  3.1.3 Equation in Deviation Form
3.2 Partial Correlation Coefficients
  3.2.1 Sequential Buildup of the Explained Sum of Squares
  3.2.2 Partial Correlation Coefficients and Multiple Regression Coefficients
  3.2.3 General Treatment of Partial Correlation and Multiple Regression Coefficients
3.3 The Geometry of Least Squares
3.4 Inference in the k-Variable Equation
  3.4.1 Assumptions
  3.4.2 Mean and Variance of b
  3.4.3 Estimation of σ²
  3.4.4 Gauss-Markov Theorem
  3.4.5 Testing Linear Hypotheses about β
  3.4.6 Restricted and Unrestricted Regressions
  3.4.7 Fitting the Restricted Regression
3.5 Prediction
Appendix
3.1 To prove r_{13.2} = (r_13 − r_12 r_23)/[√(1 − r_12²) √(1 − r_23²)]
3.2 Solving for a single regression coefficient in a multiple regression
3.3 To show that minimizing a′a subject to X′a = c gives a = X(X′X)⁻¹c
3.4 Derivation of the restricted estimator b*
Problems

4 Some Tests of the k-Variable Linear Equation for Specification Error
4.1 Specification Error
  4.1.1 Possible Problems with u
  4.1.2 Possible Problems with X
  4.1.3 Possible Problems with β
4.2 Model Evaluation and Diagnostic Tests
4.3 Tests of Parameter Constancy
  4.3.1 The Chow Forecast Test
  4.3.2 The Hansen Test
  4.3.3 Tests Based on Recursive Estimation
  4.3.4 One-Step Ahead Prediction Errors
  4.3.5 CUSUM and CUSUMSQ Tests
  4.3.6 A More General Test of Specification Error: The Ramsey RESET Test
4.4 A Numerical Illustration
4.5 Tests of Structural Change
  4.5.1 Test of One Structural Change
  4.5.2 Tests of Slope Coefficients
  4.5.3 Tests of Intercepts
  4.5.4 Summary
  4.5.5 A Numerical Example
  4.5.6 Extensions
4.6 Dummy Variables
  4.6.1 Introduction
  4.6.2 Seasonal Dummies
  4.6.3 Qualitative Variables
  4.6.4 Two or More Sets of Dummy Variables
  4.6.5 A Numerical Example
Appendix
4.1 To show var(d) = σ²[I_n2 + X₂(X₁′X₁)⁻¹X₂′]
Problems

5 Maximum Likelihood (ML), Generalized Least Squares (GLS), and Instrumental Variable (IV) Estimators
5.1 Maximum Likelihood Estimators
  5.1.1 Properties of Maximum Likelihood Estimators
5.2 ML Estimation of the Linear Model
5.3 Likelihood Ratio, Wald, and Lagrange Multiplier Tests
  5.3.1 Likelihood Ratio (LR) Tests
  5.3.2 The Wald (W) Test
  5.3.3 Lagrange Multiplier (LM) Test
5.4 ML Estimation of the Linear Model with Nonspherical Disturbances
  5.4.1 Generalized Least Squares
5.5 Instrumental Variable (IV) Estimators
  5.5.1 Special Case
  5.5.2 Two-Stage Least Squares (2SLS)
  5.5.3 Choice of Instruments
  5.5.4 Tests of Linear Restrictions
Appendix
5.1 Change of variables in density functions
5.2 Centered and uncentered R²
5.3 To show that e*′X(X′X)⁻¹X′e* = e*′e* − e′e
Problems

6 Heteroscedasticity and Autocorrelation
6.1 Properties of OLS Estimators
6.2 Tests for Heteroscedasticity
  6.2.1 The White Test
  6.2.2 The Breusch-Pagan/Godfrey Test
  6.2.3 The Goldfeld-Quandt Test
  6.2.4 Extensions of the Goldfeld-Quandt Test
6.3 Estimation Under Heteroscedasticity
  6.3.1 Estimation with Grouped Data
  6.3.2 Estimation of the Heteroscedasticity Relation
6.4 Autocorrelated Disturbances
  6.4.1 Forms of Autocorrelation: Autoregressive and Moving Average Schemes
  6.4.2 Reasons for Autocorrelated Disturbances
6.5 OLS and Autocorrelated Disturbances
6.6 Testing for Autocorrelated Disturbances
  6.6.1 Durbin-Watson Test
  6.6.2 The Wallis Test for Fourth-Order Autocorrelation
  6.6.3 Durbin Tests for a Regression Containing Lagged Values of the Dependent Variable
  6.6.4 Breusch-Godfrey Test
  6.6.5 Box-Pierce-Ljung Statistic
6.7 Estimation of Relationships with Autocorrelated Disturbances
6.8 Forecasting with Autocorrelated Disturbances
6.9 Autoregressive Conditional Heteroscedasticity (ARCH)
Appendix
6.1 LM test for multiplicative heteroscedasticity
6.2 LR test for groupwise homoscedasticity
6.3 Properties of the ARCH(1) process
Problems

7 Univariate Time Series Modeling
7.1 A Rationale for Univariate Analysis
  7.1.1 The Lag Operator
  7.1.2 ARMA Modeling
7.2 Properties of AR, MA, and ARMA Processes
  7.2.1 AR(1) Process
  7.2.2 AR(2) Process
  7.2.3 MA Processes
  7.2.4 ARMA Processes
7.3 Testing for Stationarity
  7.3.1 Graphical Inspection
  7.3.2 Integrated Series
  7.3.3 Trend Stationary (TS) and Difference Stationary (DS) Series
  7.3.4 Unit Root Tests
  7.3.5 Numerical Example
7.4 Identification, Estimation, and Testing of ARIMA Models
  7.4.1 Identification
  7.4.2 Estimation
  7.4.3 Diagnostic Testing
7.5 Forecasting
  7.5.1 MA(1) Process
  7.5.2 ARMA(1,1) Process
  7.5.3 ARIMA(1,1,0) Process
7.6 Seasonality
7.7 A Numerical Example: Monthly Housing Starts
Problems

8 Autoregressive Distributed Lag Relationships
8.1 Autoregressive Distributed Lag Relations
  8.1.1 A Constant Elasticity Relation
  8.1.2 Reparameterization
  8.1.3 Dynamic Equilibrium
  8.1.4 Unit Elasticity
  8.1.5 Generalizations
8.2 Specification and Testing
  8.2.1 General to Simple and Vice Versa
  8.2.2 Estimation and Testing
  8.2.3 Exogeneity
  8.2.4 Exogeneity Tests
  8.2.5 The Wu-Hausman Test
8.3 Nonstationary Regressors
8.4 A Numerical Example
  8.4.1 Stationarity
  8.4.2 Cointegration
  8.4.3 A Respecified Relationship
  8.4.4 A General ADL Relation
  8.4.5 A Reparameterization
8.5 Nonnested Models
Appendix
8.1 Nonsingular linear transformations of the variables in an equation
8.2 To establish the equality of the test statistics in Eqs. (8.37) and (8.41)
Problems

9 Multiple Equation Models
9.1 Vector Autoregressions (VARs)
  9.1.1 A Simple VAR
  9.1.2 A Three-Variable VAR
  9.1.3 Higher-Order Systems
9.2 Estimation of VARs
  9.2.1 Testing the Order of the VAR
  9.2.2 Testing for Granger Causality
  9.2.3 Forecasting, Impulse Response Functions, and Variance Decomposition
  9.2.4 Impulse Response Functions
  9.2.5 Orthogonal Innovations
  9.2.6 Variance Decomposition
9.3 Vector Error Correction Models
  9.3.1 Testing for Cointegration Rank
  9.3.2 Estimation of Cointegrating Vectors
  9.3.3 Estimation of a Vector Error Correction Model
9.4 Simultaneous Structural Equation Models
9.5 Identification Conditions
9.6 Estimation of Structural Equations
  9.6.1 Nonstationary Variables
  9.6.2 System Methods of Estimation
Appendix
9.1 Seemingly Unrelated Regressions (SUR)
9.2 Higher-order VARs
  9.2.1 A VAR(1) Process
  9.2.2 A VAR(2) Process
Problems

10 Generalized Method of Moments
10.1 The Method of Moments
10.2 OLS as a Moment Problem
10.3 Instrumental Variables as a Moment Problem
10.4 GMM and the Orthogonality Condition
10.5 Distribution of the GMM Estimator
10.6 Applications
  10.6.1 Two-Stage Least Squares, and Tests of Overidentifying Restrictions
  10.6.2 Wu-Hausman Tests Revisited
  10.6.3 Maximum Likelihood
  10.6.4 Euler Equations
10.7 Readings
Problems

11 A Smorgasbord of Computationally Intensive Methods
11.1 An Introduction to Monte Carlo Methods
  11.1.1 Some Guidelines for Monte Carlo Experiments
  11.1.2 An Example
  11.1.3 Generating Pseudorandom Numbers
  11.1.4 Presenting the Results
11.2 Monte Carlo Methods and Permutation Tests
11.3 The Bootstrap
  11.3.1 The Standard Error of the Median
  11.3.2 An Example
  11.3.3 The Parametric Bootstrap
  11.3.4 Residual Resampling: Time Series and Forecasting
  11.3.5 Data Resampling: Cross-Section Data
  11.3.6 Some Remarks on Econometric Applications of the Bootstrap
11.4 Nonparametric Density Estimation
  11.4.1 Some General Remarks on Nonparametric Density Estimation
  11.4.2 An Application: The Wage Effects of Unions
11.5 Nonparametric Regression
  11.5.1 Extension: The Partially Linear Regression Model
11.6 References
Problems

12 Panel Data
12.1 Sources and Types of Panel Data
12.2 The Simplest Case—The Pooled Estimator
12.3 Two Extensions to the Simple Model
12.4 The Random Effects Model
12.5 Random Effects as a Combination of Within and Between Estimators
12.6 The Fixed Effects Model in the Two-Period Case
12.7 The Fixed Effects Model with More Than Two Time Periods
12.8 The Perils of Fixed Effects Estimation
  12.8.1 Example 1: Measurement Error in X
  12.8.2 Example 2: Endogenous X
12.9 Fixed Effects or Random Effects?
12.10 A Wu-Hausman Test
12.11 Other Specification Tests and an Introduction to Chamberlain's Approach
  12.11.1 Formalizing the Restrictions
  12.11.2 Fixed Effects in the General Model
  12.11.3 Testing the Restrictions
12.12 Readings
Problems

13 Discrete and Limited Dependent Variable Models
13.1 Types of Discrete Choice Models
13.2 The Linear Probability Model
13.3 Example: A Simple Descriptive Model of Union Participation
13.4 Formulating a Probability Model
13.5 The Probit
13.6 The Logit
13.7 Misspecification in Binary Dependent Models
  13.7.1 Heteroscedasticity
  13.7.2 Misspecification in the Probit and Logit
  13.7.3 Functional Form: What Is the Right Model to Use?
13.8 Extensions to the Basic Model: Grouped Data
  13.8.1 Maximum Likelihood Methods
  13.8.2 Minimum χ² Methods
13.9 Ordered Probit
13.10 Tobit Models
  13.10.1 The Tobit as an Extension of the Probit
  13.10.2 Why Not Ignore "The Problem"?
  13.10.3 Heteroscedasticity and the Tobit
13.11 Two Possible Solutions
  13.11.1 Symmetrically Trimmed Least Squares
  13.11.2 Censored Least Absolute Deviations (CLAD) Estimator
13.12 Treatment Effects and Two-Step Methods
  13.12.1 The Simple Heckman Correction
  13.12.2 Some Cautionary Remarks about Selectivity Bias
  13.12.3 The Tobit as a Special Case
13.13 Readings
Problems

Appendix A
A.1 Vectors
  A.1.1 Multiplication by a Scalar
  A.1.2 Addition and Subtraction
  A.1.3 Linear Combinations
  A.1.4 Some Geometry
  A.1.5 Vector Multiplication
  A.1.6 Equality of Vectors
A.2 Matrices
  A.2.1 Matrix Multiplication
  A.2.2 The Transpose of a Product
  A.2.3 Some Important Square Matrices
  A.2.4 Partitioned Matrices
  A.2.5 Matrix Differentiation
  A.2.6 Solution of Equations
  A.2.7 The Inverse Matrix
  A.2.8 The Rank of a Matrix
  A.2.9 Some Properties of Determinants
  A.2.10 Properties of Inverse Matrices
  A.2.11 More on Rank and the Solution of Equations
  A.2.12 Eigenvalues and Eigenvectors
  A.2.13 Properties of Eigenvalues and Eigenvectors
  A.2.14 Quadratic Forms and Positive Definite Matrices

Appendix B
B.1 Random Variables and Probability Distributions
B.2 The Univariate Normal Probability Distribution
B.3 Bivariate Distributions
B.4 Relations between the Normal, χ², t, and F Distributions
B.5 Expectations in Bivariate Distributions
B.6 Multivariate Densities
B.7 Multivariate Normal pdf
B.8 Distributions of Quadratic Forms
B.9 Independence of Quadratic Forms
B.10 Independence of a Quadratic Form and a Linear Function

Appendix C
Appendix D
Index

CHAPTER 1
Relationships between Two Variables

The economics literature contains innumerable discussions of relationships between variables in pairs:
quantity and price; consumption and income; demand for money and the interest rate; trade balance and the exchange rate; education and income; unemployment and the inflation rate; and many more. This is not to say that economists believe that the world can be analyzed adequately in terms of a collection of bivariate relations. When they leave the two-dimensional diagrams of the textbooks behind and take on the analysis of real problems, multivariate relationships abound. Nonetheless, some bivariate relationships are significant in themselves; more importantly for our purposes, the mathematical and statistical tools developed for two-variable relationships are fundamental building blocks for the analysis of more complicated situations.

1.1 EXAMPLES OF BIVARIATE RELATIONSHIPS

Figure 1.1 displays two aspects of the relationship between real personal saving (SAV) and real personal disposable income (INC) in the United States. In Fig. 1.1a the value of each series is shown quarterly for the period from 1959.1 to 1992.1. These two series and many of the others in the examples throughout the book come from the DRI Basic Economics Database (formerly Citibase); where relevant, we indicate the correspondence between our labels and the Citibase labels for the variables.¹ Figure 1.1a is a typical example of a time series plot, in which time is displayed on the horizontal axis and the values of the series are displayed on the vertical axis.

¹ A definition of all series is given in the data disk, which accompanies this volume. Instructions for accessing the disk are given in Appendix C.

Income shows an upward trend throughout the period, and in the early years, saving does likewise. This pattern, however, is not replicated in the middle and later
[FIGURE 1.1 Saving and income: (a) time series plot of SAV and INC; (b) scatter plot of SAV against INC.]

years. One might be tempted to conclude from Fig. 1.1a that saving is much more volatile than income, but that does not necessarily follow, since the series have separate scales.²

An alternative display of the same information is in terms of a scatter plot, shown in Fig. 1.1b. Here one series is plotted against the other. The time dimension is no longer shown explicitly, but most software programs allow the option of joining successive points on the scatter so that the evolution of the series over time may still be traced. Both parts of Fig. 1.1 indicate a positive association between the variables: increases in one tend to be associated with increases in the other. It is clear that although the association is approximately linear in the early part of the period, it is not so in the second half.

Figures 1.2 and 1.3 illustrate various associations between the natural log of real personal expenditure on gasoline (GAS), the natural log of the real price of gasoline (PRICE), and the natural log of real disposable personal income (INCOME). The derivations of the series are described in the data disk. The rationale for the logarithmic transformations is discussed in Chapter 2. Figure 1.2 gives various time plots of gasoline expenditure, price, and income. The real price series, with 1987 as the base year, shows the two dramatic price hikes of the early and late 1970s, which were subsequently eroded by reductions in the nominal price of oil and by U.S. inflation, so the real price at the end of the period was less than that obtaining at the start. The income and expenditure series are both shown in per capita form, because U.S.
population increased by about 44 percent over the period, from 176 million to 254 million. The population series used to deflate the expenditure and income series is the civilian noninstitutional population aged 16 and over, which has increased even faster than the general population. Per capita real expenditure on gasoline increased steadily in the 1960s and early 1970s, as real income grew and real price declined. This steady rise ended with the price shocks of the 1970s, and per capita gas consumption has never regained the peak levels of the early seventies.

The scatter plots in Fig. 1.3 further illustrate the upheaval in this market. The plot for the whole period in Fig. 1.3a shows very different associations between expenditure and price in the earlier and later periods. The scatter for 1959.1 to 1973.3 in Fig. 1.3b looks like a conventional negative association between price and quantity. This is shattered in the middle period (1973.4 to 1981.4) and reestablished, though with a very different slope, in the last period (1982.1 to 1992.1). This data set will be analyzed econometrically in this and later chapters.

These illustrative scatter diagrams have three main characteristics. One is the sign of the association or covariation—that is, do the variables move together in a positive or negative fashion? Another is the strength of the association. A third characteristic is the linearity (or otherwise) of the association—is the general shape of the scatter linear or curvilinear? In Section 1.2 we discuss the extent to which the correlation coefficient measures the first two characteristics for a linear association, and in later chapters we will show how to deal with the linearity question, but first we give an example of a bivariate frequency distribution.

² See Problem 1.1.

[FIGURE 1.2 Time series plots of the natural log of gasoline consumption in 1987 dollars per capita.]
[(a) Gasoline consumption vs. natural log of price in 1987 cents per gallon. (b) Gasoline consumption vs. natural log of income in 1987 dollars per capita.]

[FIGURE 1.3 Scatter plots of price and gasoline consumption: (a) 1959.1 to 1992.1; (b) 1959.1 to 1973.3; (c) 1973.4 to 1981.4; (d) 1982.1 to 1992.1.]

1.1.1 Bivariate Frequency Distributions

The data underlying Figs. 1.1 to 1.3 come in the form of n pairs of observations of the form (X_i, Y_i), i = 1, 2, ..., n. When the sample size n is very large, the data are usually printed as a bivariate frequency distribution; the ranges of X and Y are split into subintervals and each cell of the table shows the number of observations in the corresponding pair of subintervals. Table 1.1 provides an example.³

TABLE 1.1
Distribution of heights and chest circumferences of 5732 Scottish militiamen

                            Chest circumference (inches)
                    33-35   36-38   39-41   42-44   45 and over   Row totals
Height   64-65        39     331     326      26         0            722
(inches) 66-67        40     591    1010     170         4           1815
         68-69        19     312    1144     488        18           1981
         70-71         5     100     479     290        23            897
         72-73         0      17     120     153        27            317
Column totals        103    1351    3079    1127        72           5732

Source: Edinburgh Medical and Surgical Journal (1817, pp. 260-264).

TABLE 1.2
Conditional means for the data in Table 1.1

Mean of height given chest (inches):   66.31   66.84   67.89   69.16   70.53
Mean of chest given height (inches):   38.41   39.19   40.26   40.76   41.80

It is not possible to give a simple, two-dimensional representation of these data. However, inspection of the cell frequencies suggests a positive association between the two measurements. This is confirmed by calculating the conditional means. First of all, each of the five central columns of the table gives a distribution of heights for a given chest measurement.
These are conditional frequency distributions, and traditional statistics such as means and variances may be calculated. Similarly, the rows of the table give distributions of chest measurements, conditional on height. The two sets of conditional means are shown in Table 1.2; each mean series increases monotonically with increases in the conditioning variable, indicating a positive association between the variables.

³ Condensed from Stephen M. Stigler, The History of Statistics, Harvard University Press, 1986, p. 208.
⁴ See Stigler, op. cit., for a fascinating and definitive history of the evolution of the correlation coefficient.

1.2 THE CORRELATION COEFFICIENT

The direction and closeness of the linear association between two variables are measured by the correlation coefficient.⁴ Let the observations be denoted by (X_i, Y_i) with i = 1, 2, ..., n. Once the sample means have been calculated, the data may be expressed in deviation form as

    x_i = X_i − X̄        y_i = Y_i − Ȳ

where X̄ and Ȳ denote the sample means of X and Y. Figure 1.4 shows an illustrative point on a scatter diagram with the sample means as new axes, giving four quadrants, which are numbered counterclockwise. The product x_i y_i is positive for all points in quadrants I and III and negative for all points in quadrants II and IV. Since a positive relationship will have points lying for the most part in quadrants I and III, and a negative relationship will have points lying mostly in the other two quadrants, the sign of Σ_{i=1}^n x_i y_i will indicate whether the scatter slopes upward or downward. This sum, however, will tend to increase in absolute terms as more data are added to the sample. Thus, it is better to express the sum in average terms, giving the sample covariance,

    cov(X, Y) = Σ_{i=1}^n (X_i − X̄)(Y_i − Ȳ)/n        (1.1)

The value of the covariance depends on the units in which the variables are measured. Changing one variable from dollars to cents will give a new covariance 100 times the old.
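This scaling with units can be checked directly. Below is a minimal sketch of Eq. (1.1) in Python; the five (income, saving) pairs are invented for illustration and are not from the text.

```python
def cov(x, y):
    """Sample covariance with divisor n, as in Eq. (1.1)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    return sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / n

income = [100, 200, 300, 400, 500]   # hypothetical values, in dollars
saving = [5, 12, 22, 30, 41]         # hypothetical values, in dollars

c_dollars = cov(income, saving)                    # 1800.0
c_cents = cov(income, [100 * s for s in saving])   # saving restated in cents

# Restating one variable in cents multiplies the covariance by 100.
assert abs(c_cents - 100 * c_dollars) < 1e-9
```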
To obtain a measure of association that is invariant with respect to units of measurement, the deviations are expressed in standard deviation units. The covariance of the standardized deviations is the correlation coefficient, r, namely,

    r = Σ_{i=1}^n (x_i/s_x)(y_i/s_y)/n = Σ_{i=1}^n x_i y_i /(n s_x s_y)        (1.2)

[FIGURE 1.4 Coordinates for scatter diagram for paired variables; the product (xy) is positive in quadrants I and III and negative in quadrants II and IV.]

Omitting subscripts and the limits of summation (since there is no ambiguity) and performing some algebraic manipulations give three equivalent expressions for the correlation coefficient—two in terms of deviations and one in terms of the raw data:

    r = Σxy/(n s_x s_y)
      = Σxy/[√(Σx²) √(Σy²)]                                        (1.3)
      = [nΣXY − (ΣX)(ΣY)] / [√(nΣX² − (ΣX)²) √(nΣY² − (ΣY)²)]

1.2.1 The Correlation Coefficient for a Bivariate Frequency Distribution

In general, a bivariate distribution such as that shown in Table 1.1 may be represented by the paired values X_i, Y_j with frequency n_ij for i = 1, ..., m and j = 1, ..., p. X_i is the midpoint of the ith subinterval on the X axis, and Y_j the midpoint of the jth subinterval on the Y axis. If we use a period for a subscript over which summation has taken place, the marginal frequencies for X are given by n_i. = Σ_{j=1}^p n_ij for i = 1, ..., m. In conjunction with the X_i values these marginal frequencies will yield the standard deviation of X, that is, s_x. The marginal frequencies for Y are n_.j = Σ_{i=1}^m n_ij for j = 1, ..., p. Thus, the standard deviation of Y, or s_y, may be obtained. Finally the covariance is obtained from

    cov(X, Y) = Σ_{i=1}^m Σ_{j=1}^p n_ij (X_i − X̄)(Y_j − Ȳ)/n        (1.4)

where n is the total number of observations. Putting the three elements together, one may express the correlation coefficient for the bivariate frequency distribution in terms of the raw data as

    r = [n Σ_i Σ_j n_ij X_i Y_j − (Σ_i n_i. X_i)(Σ_j n_.j Y_j)] /
        [√(n Σ_i n_i. X_i² − (Σ_i n_i. X_i)²) √(n Σ_j n_.j Y_j² − (Σ_j n_.j Y_j)²)]        (1.5)

1.2.2 The Limits of r

The correlation coefficient must lie in the range from −1 to +1. To see this, let c be any arbitrary constant. Then Σ(y − cx)² ≥ 0. Now let c = Σxy/Σx². Substitution in the inequality gives (Σxy)² ≤ (Σx²)(Σy²), that is, r² ≤ 1. This expression is one form of the Cauchy-Schwarz inequality. The equality will only hold if each and every y deviation is a constant multiple of the corresponding x deviation. In such a case the observations all lie on a single straight line, with a positive slope (r = 1) or a negative slope (r = −1). Figure 1.5 shows two cases in which r is approximately zero. In one case the observations are scattered over all four quadrants; in the other they lie exactly on a quadratic curve, where positive and negative products offset one another. Thus, the correlation coefficient measures the degree of linear association. A low value for r does not rule out the possibility of a strong nonlinear association, and such an association might give positive or negative values for r if the sample observations happen to be located in particular segments of the nonlinear relation.

1.2.3 Nonsense Correlations and Other Matters

Correlation coefficients must be interpreted with care. Many coefficients that are both numerically large and also adjudged statistically significant by tests to be described later may contain no real information. That statistical significance has been achieved does not necessarily imply that a meaningful and useful relationship has been found. The crucial question is, What has caused the observed covariation? If there is a theory about the joint variation of X and Y, the sign and size of the correlation coefficient may lend support to that theory, but if no such theory exists or can be devised, the correlation may be classed as a nonsense correlation.
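The equivalence of the expressions in Eq. (1.3), and of the frequency-table form in Eq. (1.5), is easy to verify numerically. The sketch below uses invented data, and it also reproduces the Fig. 1.5 point that an exact quadratic relation can give r = 0.

```python
from math import sqrt

def r_deviation(x, y):
    """Eq. (1.3), deviation form: r = Σxy / sqrt(Σx² Σy²)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    xd = [v - xbar for v in x]
    yd = [v - ybar for v in y]
    return sum(a * b for a, b in zip(xd, yd)) / sqrt(
        sum(a * a for a in xd) * sum(b * b for b in yd))

def r_raw(x, y):
    """Eq. (1.3), raw-data form."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    num = n * sum(a * b for a, b in zip(x, y)) - sx * sy
    return num / sqrt((n * sum(a * a for a in x) - sx ** 2) *
                      (n * sum(b * b for b in y) - sy ** 2))

def r_freq(xs, ys, nij):
    """Eq. (1.5): r from a bivariate frequency table nij with class marks xs, ys."""
    n = sum(map(sum, nij))
    ni = [sum(row) for row in nij]            # marginal frequencies n_i.
    nj = [sum(col) for col in zip(*nij)]      # marginal frequencies n_.j
    sx = sum(f * v for f, v in zip(ni, xs))
    sy = sum(f * v for f, v in zip(nj, ys))
    sxy = sum(nij[i][j] * xs[i] * ys[j]
              for i in range(len(xs)) for j in range(len(ys)))
    return (n * sxy - sx * sy) / sqrt(
        (n * sum(f * v * v for f, v in zip(ni, xs)) - sx ** 2) *
        (n * sum(f * v * v for f, v in zip(nj, ys)) - sy ** 2))

# The two Eq. (1.3) forms agree, and tabulating the same four observations
# as a frequency distribution reproduces the same r via Eq. (1.5).
x = [1, 1, 2, 3]
y = [2, 3, 3, 5]
table = [[1, 1, 0],   # X = 1: one observation with Y = 2, one with Y = 3
         [0, 1, 0],   # X = 2: one observation with Y = 3
         [0, 0, 1]]   # X = 3: one observation with Y = 5
assert abs(r_deviation(x, y) - r_raw(x, y)) < 1e-12
assert abs(r_freq([1, 2, 3], [2, 3, 5], table) - r_raw(x, y)) < 1e-12

# An exact quadratic relation (as in Fig. 1.5b): r is zero despite
# a perfect nonlinear association.
assert abs(r_raw([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])) < 1e-12
```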
[FIGURE 1.5 Paired variables for which r ≈ 0: (a) observations scattered over all four quadrants; (b) observations lying exactly on a quadratic curve.]

Our favorite spurious, or nonsense, correlation was given in a beautiful 1926 paper by the statistician G. Udny Yule.⁵ Yule took annual data from 1866 to 1911 for the death rate in England and Wales and for the proportion of all marriages solemnized in the Church of England and found the correlation coefficient to be +0.95. However, no British politician proposed closing down the Church of England to confer immortality on the electorate. More recently, using annual data from 1897 to 1958, Plosser and Schwert have found a correlation coefficient of +0.91 between the log of nominal income in the United States and the log of accumulated sunspots.⁶ Hendry has noted a very strong, though somewhat nonlinear, positive relationship between the inflation rate and the accumulation of annual rainfall in the United Kingdom.⁷ It would be nice if the British could reduce their inflation rate and, as a bonus, enjoy the inestimable side effect of improved weather, but such happy conjunctions are not to be.

In these three examples all of the variables are subject to trend-like movements over time.⁸ Presumably some complex set of medical, economic, and social factors contributed to the reduction in the death rate in England and Wales, even as a different set of factors produced a decline in the proportion of marriages in the Church of England. Cumulative sunspots and cumulative rainfall necessarily trend upward, as do the U.S. nominal income and the British inflation rate. Series responding to essentially unrelated generating mechanisms may thus display contemporaneous upward and/or downward movements and thus yield strong correlation coefficients. Trends may be fitted to such series, as will be shown in the next chapter, and the residuals from such trends calculated. Correlations between pairs of residuals for such series will be negligible.
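The point can be illustrated with simulated data. The sketch below is not from the text: it generates two independent upward-trending series, so the levels correlation is spuriously high while the correlation of the residuals from fitted linear trends is negligible.

```python
import random
from math import sqrt

random.seed(0)

def corr(x, y):
    """Sample correlation coefficient, deviation form of Eq. (1.3)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    xd = [v - xbar for v in x]
    yd = [v - ybar for v in y]
    return sum(a * b for a, b in zip(xd, yd)) / sqrt(
        sum(a * a for a in xd) * sum(b * b for b in yd))

def detrend(y):
    """Residuals from an OLS linear time trend y_t = a + b*t."""
    n = len(y)
    t = list(range(n))
    tbar, ybar = sum(t) / n, sum(y) / n
    b = (sum((ti - tbar) * (yi - ybar) for ti, yi in zip(t, y))
         / sum((ti - tbar) ** 2 for ti in t))
    a = ybar - b * tbar
    return [yi - (a + b * ti) for ti, yi in zip(t, y)]

# Two series driven by completely unrelated mechanisms: separate trends
# plus independent noise.
n = 200
x = [0.5 * t + random.gauss(0, 3) for t in range(n)]
y = [0.3 * t + random.gauss(0, 3) for t in range(n)]

assert corr(x, y) > 0.9                            # strong "nonsense" correlation in levels
assert abs(corr(detrend(x), detrend(y))) < 0.3     # negligible after detrending
```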
An alternative approach to correlating detrended residuals is to correlate the first differences of the series. The first differences are simply the changes in the series between adjacent observations. They are usually denoted by the prefix Δ. Thus,

    ΔX_t = X_t − X_{t−1}        ΔY_t = Y_t − Y_{t−1}

Many series that show very high correlations between X and Y (the levels) will show very low correlations between ΔX and ΔY (the first differences). This result usually indicates a spurious relationship. On the other hand, if there is a causal relationship between the variables, we expect to find correlations between levels and also between first differences. This point has recently been emphasized in an important paper by Stigler and Sherwin.⁹ The main thesis of the paper is that if two goods or services are in the same market their prices should be closely related.

⁵ G. Udny Yule, "Why Do We Sometimes Get Nonsense Correlations between Time Series?", Journal of the Royal Statistical Society, Series A, General, 89, 1926, 1-68.
⁶ Charles I. Plosser and G. William Schwert, "Money, Income, and Sunspots: Measuring Economic Relationships and the Effects of Differencing," Journal of Monetary Economics, 4, 1978, 637-660.
⁷ David F. Hendry, "Econometrics—Alchemy or Science?", Economica, 47, 1980, 387-406.
⁸ Trends, like most economic phenomena, are often fragile and transitory. The point has been made in lyrical style by Sir Alec Cairncross, one of Britain's most distinguished economists and a former chief economic adviser to the British government: "A trend is a trend, is a trend, but the question is, will it bend? Will it alter its course, through some unforeseen force and come to a premature end?"
⁹ George J. Stigler and Robert A. Sherwin, "The Extent of the Market," Journal of Law and Economics, 28, 1985, 555-585.
However, since most prices, like many economic series, show trend-like movements over time, Stigler and Sherwin wish to guard against being misled by spurious correlation. Thus, in addition to correlating price levels they correlate price changes. As one example, the prices of December 1982 silver futures on the New York Commodity Exchange and the Chicago Board of Trade over a 30-day trading period gave r = 0.997, and the price changes gave r = 0.956. In Minneapolis, Minnesota, and Kansas City, Missouri, two centers of the flour-milling industry, the monthly wholesale prices of flour over 1971-1981 gave correlations of 0.97 for levels and 0.92 for first differences. In these two cases the first difference correlations strongly reinforce the levels correlations and support the thesis of a single market for these goods.

1.2.4 A Case Study

Gasoline is retailed on the West Coast of the United States by the "majors" (Arco, Shell, Texaco, etc.) and by "minors," or "independents." Traditionally the majors have offered a greater variety of products, differentiated in terms of grade of gasoline, method of payment, degree of service, and so forth; whereas the minors have sold for cash and offered a smaller range of products. In the spring of 1983 Arco abolished its credit cards and sold for cash only. By the fall of 1983 the other majors had responded by continuing their credit cards but introducing two prices, a credit price and a lower cash price. Subsequently one of the independents sued Arco under the antitrust laws. The essence of the plaintiff's case was that there were really two separate markets for gasoline, one in which the majors competed with each other, and a second in which the minors competed. They further alleged, though not in this precise language, that Arco was like a shark that had jumped out of the big pool into their little pool with the intention of gobbling them all up.
No one questioned that there was competition within the majors and competition within the minors: the crucial question was whether there was competition between majors and minors. The problem was a perfect candidate for the Stigler/Sherwin type of analysis. The Lundberg Survey reports detailed information twice a month on the prices of all types and grades of gasoline at a very large sample of stations. These data are also averaged for majors and minors. Twelve differentiated products were defined for the majors and four for the minors. This step allowed the calculation of 66 correlation coefficients for all pairs of products within the majors and 6 correlation coefficients within the minors. Each set of coefficients would be expected to consist of very high numbers, reflecting the intensity of competition inside each group. However, it was also possible to calculate 48 correlation coefficients for all cross-pairs of a major price and a minor price. If the plaintiff's argument were correct, these 48 coefficients would be of negligible size. On the other hand, if there were just a single large market for gasoline, the cross correlations should not be markedly less than correlations within each group. A nice feature of the problem was that the within-group correlations provided a standard of reference for the assessment of the cross correlations. In the cases discussed in the Stigler/Sherwin paper only subjective judgments could be made about the size of correlation coefficient required to establish that two goods were in the same market.

The preceding approach yielded a matrix of 120 correlation coefficients. In order to guard against possible spurious correlation, such a matrix was computed for levels, for first differences, for logs of levels, and for first differences of logs (which measure percent changes in price).
In addition, regression analysis was used to adjust for possible common influences from the price of crude oil or from general inflation, and matrices were produced for correlations between the residuals from these regressions. In all cases the matrices showed "forests" of tall trees (that is, high correlation coefficients), and the trees were just as tall in the rectangle of cross correlations as in the triangles of within correlations. The simple correlation coefficients thus provided conclusive evidence for the existence of a single market for retail gasoline.

1.3 PROBABILITY MODELS FOR TWO VARIABLES

Classical statistical inference is based on the presumption that there exists some population distribution of all possible observations on the variables of interest. That distribution is characterized by certain crucial parameter values. From a sample of n observations sample statistics are computed and these serve as a basis for inference about the population parameters. Ever since the work of Haavelmo in the 1940s the probability approach has been extensively used in econometrics.10 Indeed the development of econometrics in the past half century has been driven mainly by the effort to adapt and extend classical inference procedures to deal with the special problems raised by the nature of the data generation process in economics and the general unavailability of controlled economic experiments.

1.3.1 Discrete Bivariate Probability Distribution

To introduce some of the main ideas, consider a discrete bivariate probability distribution as shown in Table 1.3. The cell entries indicate the probability of the joint occurrence of the associated X, Y values. Thus, p_ij = probability that X = X_i and Y = Y_j. The column and row totals, where a period indicates the subscript over which summation has taken place, give the marginal probabilities for X and Y, respectively. There are six important population parameters for the bivariate distribution.
The means are defined by

    μ_x = E(X) = Σ_i p_i. X_i    and    μ_y = E(Y) = Σ_j p_.j Y_j        (1.6)

The variances are defined as

    σ²_x = var(X) = E[(X − μ_x)²] = Σ_i p_i. (X_i − μ_x)²
                                                                          (1.7)
    σ²_y = var(Y) = E[(Y − μ_y)²] = Σ_j p_.j (Y_j − μ_y)²

10. Trygve Haavelmo, The Probability Approach in Econometrics, supplement to Econometrica, 12, July, 1944.

TABLE 1.3
A bivariate probability distribution

                                                Marginal
            X_1   ...   X_i   ...   X_m         probability
    Y_1     p_11  ...   p_i1  ...   p_m1        p_.1
    ...
    Y_j     p_1j  ...   p_ij  ...   p_mj        p_.j
    ...
    Y_p     p_1p  ...   p_ip  ...   p_mp        p_.p
    Marginal
    probability  p_1. ... p_i. ...  p_m.        1

The covariance is

    σ_xy = cov(X, Y) = E[(X − μ_x)(Y − μ_y)]
         = Σ_i Σ_j p_ij (X_i − μ_x)(Y_j − μ_y)                            (1.8)

Finally, the population correlation coefficient is defined as

    corr(X, Y) = ρ = σ_xy / (σ_x σ_y)                                     (1.9)

In these formulae Σ_i and Σ_j indicate summation over the relevant subscripts.

Conditional probabilities

Consider the X_i column in Table 1.3. Each cell probability may be divided by the column total, p_i., to give a conditional probability for Y given X_i. Thus,

    p_ij / p_i. = probability that Y = Y_j given that X = X_i
                = prob(Y_j | X_i)

The mean of this distribution is the conditional expectation of Y, given X_i, that is,

    μ_y|i = E(Y | X_i) = Σ_j (p_ij / p_i.) Y_j                            (1.11)

Similarly, the variance of this distribution is a conditional variance, or

    σ²_y|i = var(Y | X_i) = Σ_j (p_ij / p_i.)(Y_j − μ_y|i)²              (1.12)

The conditional means and variances are both functions of X, so there is a set of m conditional means and variances. In a similar fashion one may use the row probabilities to study the conditional distributions of X given Y.

TABLE 1.4
Bivariate distribution of income (X) and vacation expenditure (Y)

                              X ($'000)
                        20      30      40
              1        .28     .03      0
              2        .08     .15     .03
    Y         3        .04     .06     .06
    ($'000)   4         0      .06     .15
              5         0       0      .03
              6         0       0      .03
    Marginal
    probability        .40     .30     .30
    Mean (Y | X)       1.4     2.5     3.9
    Var (Y | X)        .44     .85    1.09

TABLE 1.5
Conditional probabilities from Table 1.4

                             Y
               1    2    3    4    5    6
        20    .7   .2   .1    0    0    0
    X   30    .1   .5   .2   .2    0    0
        40     0   .1   .2   .5   .1   .1

A numerical example

Table 1.4 presents hypothetical data on income and vacation expenditure for an imaginary population.
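The conditional means and variances reported in the last two rows of Table 1.4 can be reproduced with a short computation. The joint probabilities below are a reconstruction consistent with the marginal probabilities (.40, .30, .30) and the conditional moments quoted in the text; treat the individual cells as illustrative.

```python
# Sketch of the Table 1.4 calculations: marginal probabilities and the
# conditional mean and variance of Y for each income level X.
incomes = [20, 30, 40]             # X, $'000
expenditures = [1, 2, 3, 4, 5, 6]  # Y, $'000
# p[i][j] = prob(X = incomes[i] and Y = expenditures[j]); cell values are an
# illustrative reconstruction consistent with the moments quoted in the text.
p = [
    [0.28, 0.08, 0.04, 0.00, 0.00, 0.00],  # X = 20
    [0.03, 0.15, 0.06, 0.06, 0.00, 0.00],  # X = 30
    [0.00, 0.03, 0.06, 0.15, 0.03, 0.03],  # X = 40
]

def conditional_moments(row):
    """Return (marginal prob of X, E(Y|X), var(Y|X)) from one column of the table."""
    p_i = sum(row)                       # marginal probability p_i.
    cond = [pij / p_i for pij in row]    # conditional probs p_ij / p_i.
    mean = sum(pr * y for pr, y in zip(cond, expenditures))       # Eq. (1.11)
    var = sum(pr * (y - mean) ** 2
              for pr, y in zip(cond, expenditures))               # Eq. (1.12)
    return p_i, mean, var

for x, row in zip(incomes, p):
    p_i, m, v = conditional_moments(row)
    print(f"X = {x}: p = {p_i:.2f}, E(Y|X) = {m:.2f}, var(Y|X) = {v:.2f}")
```

The output matches the last three rows of Table 1.4: conditional means 1.4, 2.5, 3.9 and conditional variances .44, .85, 1.09, both rising with income.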
There are just three levels of income and six possible levels of vacation expenditure. Everyone, no matter how humble, gets to spend at least $1,000 on vacation. The marginal probabilities show that 40 percent of this population have incomes of $20,000, 30 percent have incomes of $30,000, and 30 percent have incomes of $40,000. The conditional probabilities derived from these data are shown in Table 1.5. These conditional probabilities are used to calculate the conditional means and variances shown in the last two rows of Table 1.4. Mean vacation expenditure rises with income but the increase is not linear, being greater for the increase from $30,000 to $40,000 than for the increase from $20,000 to $30,000. The conditional variance also increases with income. One could carry out the parallel analysis for X given Y. This might be of interest to a travel agent concerned with the distribution of income for people with a given vacation expenditure.

1.3.2 The Bivariate Normal Distribution

The previous examples have been in terms of discrete variables. For continuous variables the most famous distribution is the bivariate normal. When X and Y follow a bivariate normal distribution, the probability density function (pdf) is given by

    f(x, y) = 1 / (2π σ_x σ_y √(1 − ρ²))
              × exp{ −1 / (2(1 − ρ²)) [ ((x − μ_x)/σ_x)²
                − 2ρ ((x − μ_x)/σ_x)((y − μ_y)/σ_y) + ((y − μ_y)/σ_y)² ] }    (1.13)

In this equation we have used x and y to indicate the values taken by the variables X and Y. The lower-case letters here do not measure deviations from sample means, as they do in the discussion of the correlation coefficient in Section 1.2. The range of variation for both variables is from minus to plus infinity. Integrating over y in Eq. (1.13) gives the marginal distribution for X, which is

    f(x) = 1 / (σ_x √(2π)) exp{ −(1/2) ((x − μ_x)/σ_x)² }                     (1.14)

Thus, the marginal distribution of X is seen to be normal with mean μ_x and standard deviation σ_x.
Likewise, the marginal distribution of Y is normal with mean μ_y and standard deviation σ_y. The remaining parameter in Eq. (1.13) is ρ, which can be shown to be the correlation coefficient between X and Y. Finally, from the joint distribution [Eq. (1.13)] and the marginal distribution [Eq. (1.14)], the conditional distribution of Y given X may be obtained11 as

    f(y | x) = f(x, y) / f(x)
             = 1 / (σ_y|x √(2π)) exp{ −(1/2) ((y − μ_y|x)/σ_y|x)² }           (1.15)

The conditional distribution is also seen to be normal. The conditional mean is

    μ_y|x = α + βx                                                            (1.16)

where

    α = μ_y − βμ_x    and    β = ρ σ_y / σ_x                                  (1.17)

The conditional mean is thus a linear function of the X variable. The conditional variance is invariant with X and is given by

    σ²_y|x = σ²_y (1 − ρ²)                                                    (1.18)

This condition of constant variance is referred to as homoscedasticity. Finally, the conditional mean and variance for X given Y may be obtained by interchanging x and y in the last three formulae.

1.4 THE TWO-VARIABLE LINEAR REGRESSION MODEL

In many bivariate situations the variables are treated in a symmetrical fashion. For the Scottish soldiers of Table 1.1 the conditional distribution of height, given chest size, is just as meaningful and interesting as the conditional distribution of chest size, given height. These are two aspects of the joint variation. However, in the vacation expenditure/income example we have already tended to show more interest in the conditional distribution of expenditure, given income, than in the distribution of income, given expenditure. This example is typical of many economic situations. Economists often have explicit notions, derived from theoretical models, of causality running from X, say, to Y. Thus, the theory of consumer behavior leads one to expect that household income will be a major determinant of household vacation expenditure, but labor economics does not give equal strength to the proposition that household vacation expenditure is a major determinant of household income.

11. See Problem 1.4.
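The conditional-moment results in Eqs. (1.16) through (1.18) of Section 1.3.2 can be checked by simulation. The sketch below (not from the text; the parameter values are arbitrary) draws (X, Y) pairs from a bivariate normal via its conditional factorization and verifies that a least-squares fit approximately recovers α, β, and the conditional variance.

```python
# Simulation check of Eqs. (1.16)-(1.18): for a bivariate normal, the
# conditional mean of Y given X is alpha + beta*x with beta = rho*sigma_y/sigma_x,
# and the conditional variance is sigma_y^2 (1 - rho^2), independent of X.
import random

random.seed(7)
mu_x, sigma_x = 5.0, 2.0     # arbitrary illustrative parameters
mu_y, sigma_y = 10.0, 3.0
rho = 0.6
beta = rho * sigma_y / sigma_x               # Eq. (1.17)
alpha = mu_y - beta * mu_x                   # Eq. (1.17)
sigma2_cond = sigma_y ** 2 * (1 - rho ** 2)  # Eq. (1.18)

n = 20000
xs, ys = [], []
for _ in range(n):
    x = random.gauss(mu_x, sigma_x)
    # draw Y from its conditional distribution, Eq. (1.15)
    y = random.gauss(alpha + beta * x, sigma2_cond ** 0.5)
    xs.append(x)
    ys.append(y)

# sample analogues of the population intercept and slope in Eq. (1.17)
mx, my = sum(xs) / n, sum(ys) / n
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
sxx = sum((x - mx) ** 2 for x in xs)
b = sxy / sxx
a = my - b * mx
resid_var = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys)) / (n - 2)
print(f"beta = {beta:.3f}, b = {b:.3f}; alpha = {alpha:.3f}, a = {a:.3f}")
print(f"sigma2_cond = {sigma2_cond:.3f}, s2 = {resid_var:.3f}")
```

With 20,000 draws the fitted intercept and slope sit close to α = 5.5 and β = 0.9, and the residual variance close to σ²_y(1 − ρ²) = 5.76, regardless of the X values drawn, illustrating homoscedasticity.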
Although it is formally true that a joint distribution can always be factored in two different ways into the product of a marginal and a conditional distribution, one factorization will often be of more interest to an economist than the other. Thus, in the expenditure/income case the factorization f(X, Y) = f(X) · f(Y | X) will be of greater interest than the factorization f(X, Y) = f(Y) · f(X | Y). Moreover, in the first factorization the conditional distribution of expenditure, given income, will usually receive much more attention and analysis than the marginal distribution for income.

1.4.1 A Conditional Model

To formulate a model for vacation expenditure that is conditional on income, let us consider how data on such variables might be obtained. One possibility is that a sample of households from the N households in the population was taken and the values of Y and X recorded for the year in question. This is an example of cross-section data. There will be some—presumably complex and certainly unknown—bivariate distribution for all N households. This bivariate distribution itself will be some marginalization of a multivariate distribution covering income and all categories of expenditure. Concentrating on the conditional distribution, economic theory would suggest

    E(Y | X) = g(X)

where g(X) is expected to be an increasing function of X.
If the conditional expectation is linear in X, as in the case of a bivariate normal distribution, then

    E(Y | X) = α + βX                                                         (1.19)

For the ith household this expectation gives

    E(Y | X_i) = α + βX_i

The actual vacation expenditure of the ith household is denoted by Y_i, so we define a discrepancy or disturbance u_i as

    u_i = Y_i − E(Y | X_i) = Y_i − α − βX_i                                   (1.20)

12. We now return to the earlier convention of using X and Y to indicate both the label for a variable and the values that it may assume.

The disturbance u_i must therefore represent the net influence of everything other than the income of the ith household. These other factors might include such things as the number and ages of household members, accumulated savings, and so forth. Such factors might be measured and included in Eq. (1.19), but with any finite number of explanatory factors we still cannot expect perfect agreement between individual observations and expected values. Thus, the need to specify a disturbance term remains. Taking conditional expectations of both sides of Eq. (1.20) gives E(u_i | X_i) = 0. The variance of u_i is also seen to be the variance of the conditional distribution, σ²_y|x_i. If we look at the jth household, the disturbance u_j will have zero expectation and variance σ²_y|x_j. These conditional variances may well vary with income. In the hypothetical data of Table 1.4 they are positively associated with income. For the present, however, we will make the homoscedasticity assumption that the disturbance variances are constant and independent of income. Finally, we make the assumption that the disturbances are distributed independently of one another. This rules out such things as "vacation mania," where everyone rushes off to Europe and large positive disturbances become apparent. This assumption implies that the disturbances are pairwise uncorrelated.13 Collecting these assumptions together gives

    E(u_i) = 0                        for all i
    var(u_i) = E(u_i²) = σ²
for all i                                                                     (1.21)
    cov(u_i, u_j) = E(u_i u_j) = 0    for i ≠ j

These assumptions are embodied in the simple statement

    The u_i are iid(0, σ²)                                                    (1.22)

which reads "the u_i are independently and identically distributed with zero mean and variance σ²." Now suppose the available data come in time series form and that

    X_t = aggregate real disposable personal income in year t
    Y_t = aggregate real vacation expenditure in year t

t = 1, 2, ..., n. The series {X_t} is no longer a set of sample values from the population of all N incomes in any year: it is the actual sum of all incomes in each year. It might be regarded as a sample of n observations from the "population" of all possible aggregate income numbers, but this interpretation seems to be putting some strain on the meaning of both sample and population. Moreover, the usual time series "sample" consists of data for n adjacent years. We would be rather suspicious of cross-section samples that always consisted only of n adjacent households. They could be from Millionaires' Row or from Skid Row. Thus, it is difficult to give an unambiguous and useful interpretation of f(X), the marginal distribution of X over time. However, the conditional distribution f(Y | X) is still important and must be given a probabilistic formulation.

13. Two variables are said to be independently distributed, or stochastically independent, if the conditional distributions are equal to the corresponding marginal distributions. This statement is equivalent to the joint probabilities being the product of the marginal probabilities. For the discrete case, the covariance between X and Y is then

    cov(X, Y) = Σ_i Σ_j p_i. p_.j (X_i − μ_x)(Y_j − μ_y)
              = Σ_i p_i. (X_i − μ_x) Σ_j p_.j (Y_j − μ_y)    using Eq. (1.6)
              = 0

The converse is not necessarily true since the covariance measures linear association; but substituting ρ = 0 in Eq. (1.13) shows that it is true for the bivariate normal distribution, since the bivariate density then collapses into the product of the two marginal densities.
To see this reasoning, return to the cross-section formulation and introduce the time subscript. Thus,

    Y_it = α + βX_it + u_it                                                   (1.23)

where    Y_it = real vacation expenditure by the ith household in year t
         X_it = real disposable income of the ith household in year t

Making the (implausible) assumption that the α and β parameters are the same for all households and aggregating Eq. (1.23) over all N households in the economy, we find

    Σ_i Y_it = Nα + β(Σ_i X_it) + (Σ_i u_it)

which may be rewritten as

    Y_t = Nα + βX_t + U_t                                                     (1.24)

where Y and X denote aggregate expenditure and aggregate income and U is an aggregate disturbance. The assumptions made about the household u's imply that U_t is a stochastic variable with zero mean and variance Nσ². In the context of time series, one needs to make a further assumption about the independence, or lack thereof, of the U's. If the independence assumption is chosen, then the statement is that the U_t are iid(0, Nσ²).

1.4.2 Estimates and Estimators

Whether the sample data are of cross section or time series form, the simplest version of the two-variable model is Y_i = α + βX_i + u_i, with the u_i being iid(0, σ²). There are thus three parameters to be estimated in the model, namely, α, β, and σ². The parameters α and β are taken as a pair, since numerical values of both are required to fit a specific line. Once such a line has been fitted, the residuals from that line may be used to form an estimate of σ². An estimator is a formula, method, or recipe for estimating an unknown population parameter; and an estimate is the numerical value obtained when sample data are substituted in the formula. The first step in fitting a straight line to sample data is to plot the scatter diagram and make sure from visual inspection that the scatter is approximately linear. The treatment of nonlinear scatters is discussed in the next chapter. Let the straight line fitted to the data be denoted by Ŷ_i = a + bX_i, where Ŷ_i indicates the height of the line at X_i.
The actual Y_i value will in general deviate from Ŷ_i. Many estimators of the pair a, b may be devised.

1. Fit a line by eye and read off the implied values for the intercept a and slope b. Different "artists" may, of course, draw different lines, so it is preferable to have an estimator that will yield the same result for a given data set, irrespective of the investigator.

2. Pass a line through the leftmost point and the rightmost point of the scatter. If X_* denotes the smallest value of X in the sample and X_** the largest, and Y_*, Y_** the associated Y values, this estimator is

       b = (Y_** − Y_*) / (X_** − X_*)
       a = Y_* − bX_* = Y_** − bX_**

   This estimator can hardly be expected to perform very well since it uses only two of the sample points and ignores the rest.

3. The last criticism may be met by averaging the X and Y coordinates of the m leftmost and the m rightmost points, where m is some integer between 1 and n/2, and passing a line through the resultant average points. Such an estimator with m set at n/3 or n/2 has been proposed in the literature on errors in variables, as will be discussed later. This type of estimator does not easily lend itself to mathematical manipulation, and some of its properties in repeated applications are difficult to determine.

1.4.3 Least-Squares Estimators

The dominant and powerful estimating principle, which emerged in the early years of the nineteenth century for this and other problems, is that of least squares.14 Let the residuals from any fitted straight line be denoted by

    e_i = Y_i − Ŷ_i = Y_i − a − bX_i        i = 1, 2, ..., n                 (1.25)

From the definition of Ŷ_i and from Fig. 1.6 these residuals are seen to be measured in the vertical (Y) direction. Each pair of a, b values defines a different line and hence a different set of residuals. The residual sum of squares is thus a function of a and b. The least-squares principle is

    Select a, b to minimize the residual sum of squares,
    RSS = Σe_i²
        = f(a, b)

FIGURE 1.6
Residuals from a fitted straight line.

The necessary conditions for a stationary value of RSS are15

    ∂(RSS)/∂a = −2Σ(Y − a − bX) = −2Σe = 0                                   (1.26)

    ∂(RSS)/∂b = −2ΣX(Y − a − bX) = −2ΣXe = 0                                 (1.27)

Simplifying gives the normal equations for the linear regression of Y on X. That is,

    ΣY = na + bΣX
                                                                              (1.28)
    ΣXY = aΣX + bΣX²

The reason for the adjective normal will become clear when we discuss the geometry of least squares later. The first normal equation may be rewritten as

    a = Ȳ − bX̄                                                               (1.29)

Substituting for a in the second normal equation gives

    b = Σxy / Σx²                                                             (1.30)

where x_i = X_i − X̄ and y_i = Y_i − Ȳ denote deviations from the sample means. Thus, the least-squares slope may first of all be estimated by Eq. (1.30) from the sample deviations, and the intercept then obtained from substituting for b in Eq. (1.29). Notice that these two expressions have exactly the same form as those given in Eq. (1.17) for the intercept and slope of the conditional mean in the bivariate normal distribution. The only difference is that Eqs. (1.29) and (1.30) are in terms of sample statistics, whereas Eq. (1.17) is in terms of population statistics.

14. See again the unfolding story in Stephen M. Stigler, The History of Statistics, Harvard University Press, 1986.
15. In obtaining the derivatives we leave the summation sign in place and differentiate the typical term with respect to a and b in turn, and simply observe the rule that any constant can be moved in front of the summation sign but anything that varies from one sample point to another must be kept to the right of the summation sign. Finally, we have dropped the subscripts and range of summation since there is no ambiguity. Strictly speaking, one should also distinguish between the a and b values that appear in the expression to be minimized and the specific values that actually do minimize the residual sum of squares, but again there is little risk of ambiguity and we have kept the expressions uncluttered.
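The normal equations (1.28) and the deviation-form solution in Eqs. (1.29) and (1.30) can be implemented directly. The sketch below (with made-up data) checks that the two routes give the same coefficients and that the resulting residuals satisfy the first-order conditions (1.26) and (1.27).

```python
# Two equivalent routes to the least-squares coefficients.
def ols_normal_equations(X, Y):
    """Solve  sum(Y) = n a + b sum(X),  sum(XY) = a sum(X) + b sum(X^2)."""
    n = len(X)
    sx, sy = sum(X), sum(Y)
    sxx = sum(x * x for x in X)
    sxy = sum(x * y for x, y in zip(X, Y))
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)   # eliminate a
    a = (sy - b * sx) / n                           # first normal equation
    return a, b

def ols_deviation_form(X, Y):
    """b = sum(xy)/sum(x^2) with x, y in deviation form; a = Ybar - b*Xbar."""
    n = len(X)
    xbar, ybar = sum(X) / n, sum(Y) / n
    b = (sum((x - xbar) * (y - ybar) for x, y in zip(X, Y))
         / sum((x - xbar) ** 2 for x in X))
    return ybar - b * xbar, b

X = [1.0, 2.0, 4.0, 7.0]   # arbitrary illustrative data
Y = [2.0, 3.0, 5.0, 10.0]
print(ols_normal_equations(X, Y))
print(ols_deviation_form(X, Y))
```

Both functions return the same (a, b) pair, and the fitted residuals have zero sum and zero sample covariance with X, as Eqs. (1.26) and (1.27) require.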
To summarize, the least-squares line has three important properties. It minimizes the sum of the squared residuals. It passes through the mean point (X̄, Ȳ), as shown by Eq. (1.29). Finally, the least-squares residuals have zero correlation in the sample with the values of X.16

The disturbance variance σ² cannot be estimated from a sample of u values, since these depend on the unknown α and β values and are thus unobservable. An estimate can be based on the calculated residuals (the e_i). Two possibilities are Σe²/n or Σe²/(n − 2). For reasons to be explained in Chapter 3 the usual choice is

    s² = Σe² / (n − 2)                                                        (1.31)

1.4.4 Decomposition of the Sum of Squares

Using Eqs. (1.25) and (1.29), one may express the residuals in terms of the x, y deviations, namely

    e_i = y_i − bx_i        i = 1, 2, ..., n                                  (1.32)

Squaring both sides, followed by summing over the sample observations, gives

    Σe² = Σy² − 2bΣxy + b²Σx²

The residual sum of squares is thus seen to be a quadratic function of b. Since Σx² ≥ 0, and the equality would only hold in the pathological case of zero variation in the X variable, the single stationary point is necessarily a minimum. Substitution from Eq. (1.30) gives

    Σy² = b²Σx² + Σe²
        = bΣxy + Σe²                                                          (1.33)

This famous decomposition of the sum of squares is usually written as

    TSS = ESS + RSS

where17

    TSS = total sum of squared deviations in the Y variable
    RSS = residual, or unexplained, sum of squares from the regression of Y on X
    ESS = explained sum of squares from the regression of Y on X

The last line of Eq. (1.33) may be rearranged to give

    r² = 1 − RSS/TSS = ESS/TSS                                               (1.34)

Thus, r² may be interpreted as the proportion of the Y variation attributable to the linear regression on X. Equation (1.34) provides an alternative demonstration that the limits of r are ±1 and that in the limiting case the sample points all lie on a single straight line.

16. Σx_i e_i = ΣX_i e_i − X̄Σe_i = ΣX_i e_i    using Eq. (1.26). Hence, cov(X, e) = 0    using Eq. (1.27).
1.4.5 A Numerical Example

Table 1.6 gives some simple data to illustrate the application of these formulae. Substitution in Eq. (1.28) then gives the normal equations

    40 = 5a + 20b
    230 = 20a + 120b

with solution

    Ŷ = 1 + 1.75X

The same data in deviation form are shown in Table 1.7. The regression coefficients may be obtained from

    b = Σxy / Σx² = 70/40 = 1.75

and

    a = Ȳ − bX̄ = 8 − 1.75(4) = 1

The explained sum of squares may be calculated as

    ESS = bΣxy = 1.75(70) = 122.5

and the residual sum of squares is given by subtraction as

    RSS = TSS − ESS = 124 − 122.5 = 1.5

Finally, the proportion of the Y variation explained by the linear regression is

    r² = ESS/TSS = 122.5/124 = 0.9879

17. Unfortunately there is no uniform notation for sums of squares. Some authors use SSR to indicate the sum of squares due to the regression (our ESS), and SSE to indicate the sum of squares due to error (our RSS).

TABLE 1.6

    X     Y     XY     X²     Ŷ        e        Xe
    2     4      8      4     4.50    −0.50    −1.00
    3     7     21      9     6.25     0.75     2.25
    1     3      3      1     2.75     0.25     0.25
    5     9     45     25     9.75    −0.75    −3.75
    9    17    153     81    16.75     0.25     2.25
Sums 20   40    230    120    40       0        0

TABLE 1.7

    x     y     xy     x²     y²     ŷ        e        xe
   −2    −4      8      4     16    −3.50    −0.50     1.00
   −1    −1      1      1      1    −1.75     0.75    −0.75
   −3    −5     15      9     25    −5.25     0.25    −0.75
    1     1      1      1      1     1.75    −0.75    −0.75
    5     9     45     25     81     8.75     0.25     1.25
Sums 0    0     70     40    124     0        0        0

1.5 INFERENCE IN THE TWO-VARIABLE, LEAST-SQUARES MODEL

The least-squares (LS) estimators of α and β have been defined in Eqs. (1.28) to (1.30). There are now two important questions:

1. What are the properties of these estimators?
2. How may these estimators be used to make inferences about α and β?

1.5.1 Properties of LS Estimators

The answers to both questions depend on the sampling distribution of the LS estimators. A sampling distribution describes the behavior of the estimator(s) in repeated applications of the estimating formulae. A given sample yields a specific numerical estimate. Another sample from the same population will yield another numerical estimate.
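The calculations of Section 1.4.5 above can be reproduced in code, using the observations of Table 1.6:

```python
# The numerical example: data of Table 1.6 give b = 1.75, a = 1,
# ESS = 122.5, RSS = 1.5, and r^2 ≈ 0.9879.
X = [2, 3, 1, 5, 9]
Y = [4, 7, 3, 9, 17]
n = len(X)
xbar, ybar = sum(X) / n, sum(Y) / n            # 4 and 8
x = [xi - xbar for xi in X]                    # deviations, Table 1.7
y = [yi - ybar for yi in Y]
Sxy = sum(xi * yi for xi, yi in zip(x, y))     # 70
Sxx = sum(xi * xi for xi in x)                 # 40
Syy = sum(yi * yi for yi in y)                 # 124, the TSS
b = Sxy / Sxx                                  # Eq. (1.30): 1.75
a = ybar - b * xbar                            # Eq. (1.29): 1
ESS = b * Sxy                                  # Eq. (1.33): 122.5
RSS = Syy - ESS                                # 1.5
r2 = ESS / Syy                                 # Eq. (1.34): 0.9879...
print(a, b, ESS, RSS, round(r2, 4))
```

These values agree with the sums shown in Tables 1.6 and 1.7 (ΣX = 20, ΣY = 40, ΣXY = 230, ΣX² = 120, Σxy = 70, Σx² = 40, Σy² = 124).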
A sampling distribution describes the results that will be obtained for the estimator(s) over the potentially infinite set of samples that may be drawn from the population. The parameters of interest are α, β, and σ² of the conditional distribution, f(Y | X). In that conditional distribution the only source of variation from one hypothetical sample to another is variation in the stochastic disturbance (u), which in conjunction with the given X values will determine the Y values and hence the sample values of a, b, and s². Analyzing Y conditional on X thus treats the X₁, X₂, ..., X_n values as fixed in repeated sampling. This treatment rests on the implicit assumption that the marginal distribution for X, that is, f(X), does not involve the parameters of interest or, in other words, that f(X) contains no information on α, β, and σ². This is called the fixed regressor case, or the case of nonstochastic X. From Eq. (1.30) the LS slope may be written

    b = Σw_i y_i

where the weights w_i are given by

    w_i = x_i / Σx_i²                                                         (1.35)

These weights are fixed in repeated sampling and have the following properties:

    Σw_i = 0        Σw_i² = 1/Σx_i²        and        Σw_i x_i = Σw_i X_i = 1    (1.36)

It then follows that

    b = Σw_i Y_i                                                              (1.37)

so that the LS slope is a linear combination of the Y values. The sampling distribution of b is derived from Eq. (1.37) by substituting Y_i = α + βX_i + u_i and using the stochastic properties of u to determine the stochastic properties of b. Thus,

    b = α(Σw_i) + β(Σw_i X_i) + Σw_i u_i
      = β + Σw_i u_i                                                          (1.38)

and so

    E(b) = β                                                                  (1.39)

that is, the LS slope is an unbiased estimator of β. From Eq. (1.38) the variance of b is seen to be

    var(b) = E[(b − β)²] = E[(Σw_i u_i)²]

From the properties of the w's it may be shown18 that

    var(b) = σ² / Σx²                                                         (1.40)

By similar methods it may be shown19 that

    E(a) = α                                                                  (1.41)

and

    var(a) = σ² [ 1/n + X̄²/Σx² ]                                             (1.42)

These four formulae give the means and variances of the marginal distributions of a and b.
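Equations (1.35) through (1.40) can be illustrated by a small Monte Carlo experiment (not from the text): holding X fixed and redrawing the disturbances in Y = α + βX + u, the LS slope averages to β and its sampling variance approaches σ²/Σx². The parameter values below are arbitrary.

```python
# Fixed-regressor sampling experiment for the LS slope.
import random

random.seed(42)
alpha, beta, sigma = 1.0, 1.75, 0.7     # illustrative parameter values
X = [2, 3, 1, 5, 9]                     # fixed in repeated sampling
n = len(X)
xbar = sum(X) / n
x = [xi - xbar for xi in X]             # deviations
Sxx = sum(xi * xi for xi in x)
w = [xi / Sxx for xi in x]              # LS weights, Eq. (1.35)

reps = 20000
bs = []
for _ in range(reps):
    Y = [alpha + beta * xi + random.gauss(0, sigma) for xi in X]
    bs.append(sum(wi * yi for wi, yi in zip(w, Y)))  # b = sum(w_i Y_i), Eq. (1.37)

mean_b = sum(bs) / reps
var_b = sum((b - mean_b) ** 2 for b in bs) / reps
print(f"mean(b) = {mean_b:.4f}   (beta = {beta})")
print(f"var(b)  = {var_b:.5f}  (sigma^2/Sxx = {sigma**2 / Sxx:.5f})")
```

The weight properties in Eq. (1.36) hold exactly (Σw_i = 0 and Σw_i X_i = 1), and across the replications the empirical mean and variance of b come close to the theoretical values in Eqs. (1.39) and (1.40).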
The two estimators, however, are in general not stochastically independent, for the covariance is20

    cov(a, b) = −X̄ σ² / Σx²                                                  (1.43)

This covariance only vanishes if X̄ = 0. One can always rearrange an LS regression to have a zero mean for the right-hand-side variable. By using Eq. (1.29), Y = a + bX + e can be rewritten as Y = Ȳ + bx + e; the intercept of this rearranged regression is Ȳ, and cov(Ȳ, b) = 0.

1.5.2 Gauss-Markov Theorem

The LS estimators are seen to be linear combinations of the Y variable and hence linear combinations of the stochastic u variable. Because they are also unbiased, they belong to the class of linear unbiased estimators. Their great importance in the theory and practice of statistics is that their sampling variances are the smallest that can be achieved by any linear unbiased estimator. Looking at estimators of β, for example, let

    b* = Σc_i Y_i

denote any arbitrary linear unbiased estimator of β. The unbiasedness criterion imposes two linear constraints on the weights (c_i), leaving (n − 2) weights "free." It can be shown21 that

    var(b*) = var(b) + σ² Σ(c_i − w_i)²

Since Σ(c_i − w_i)² ≥ 0, var(b*) ≥ var(b). Equality only holds when c_i = w_i for all i, that is, when b* = b. The least-squares estimator thus has minimum variance in the class of linear unbiased estimators and is said to be a best linear unbiased estimator, or BLUE.

18. See Appendix 1.1.
19. See Appendix 1.2.
20. See Appendix 1.3.
21. See Appendix 1.4.

1.5.3 Inference Procedures

The results established so far have required the assumption that the u_i are iid(0, σ²). The derivation of inference procedures requires a further assumption about the form of the probability distribution of the u's. The standard assumption is that of normality, which may be justified by appeal to the Central Limit Theorem, since the u's represent the net effect of many separate but unmeasured influences. Linear combinations of normal variables are themselves normally distributed. Thus,
the sampling distribution of a, b is bivariate normal, as in the formula in Eq. (1.13). The marginal distributions are therefore also normal and are determined by the means and variances already obtained. Thus,

    b ~ N(β, σ²/Σx²)                                                          (1.44)

to be read, "b is normally distributed with mean β and variance σ²/Σx²." The square root of this variance, that is, the standard deviation of the sampling distribution, is often referred to as the standard error of b and denoted by s.e.(b). The sampling distribution of the intercept term is

    a ~ N(α, σ²[1/n + X̄²/Σx²])                                               (1.45)

If σ² were known, these results could be put to practical use. For example, a 95 percent confidence interval for β would be provided by

    b ± 1.96 σ/√Σx²

It also follows from Eq. (1.44) that

    (b − β) / (σ/√Σx²) ~ N(0, 1)                                              (1.46)

where N(0, 1) denotes the standard normal distribution (a normally distributed variable with zero mean and unit variance). Thus, a test of the hypothesis H₀: β = β₀ is carried out by computing

    (b − β₀) / (σ/√Σx²)

and contrasting this statistic with a preselected critical value from the standard normal distribution. If, for example, the absolute value of this statistic exceeded 1.96, H₀ would be rejected at the 5 percent level of significance. When σ² is unknown these procedures are not feasible. To derive an operational procedure we need two further results. They will be stated here and proved for the general case of multiple regression in Chapter 3. The relevant results are

    Σe²/σ² ~ χ²(n − 2)                                                        (1.47)

to be read "Σe²/σ² is distributed as χ² with (n − 2) degrees of freedom," and

    Σe² is distributed independently of (a, b)                                (1.48)

As shown in Appendix B the t distribution is defined as a combination of a standard normal variable and an independent χ² variable. Thus, Eqs. (1.46) through (1.48) give

    t = (b − β) / (s/√Σx²) ~ t(n − 2)                                         (1.49)

where s² = Σe²/(n − 2), the estimator of σ² defined in Eq. (1.31). Notice that Eq. (1.49) has the same structure as Eq.
(1.46), the only difference being that the unknown σ is replaced by the estimate s. This causes a shift from the normal distribution to the t distribution. For degrees of freedom in excess of about 30, the differences between the critical values of the t distribution and the standard normal distribution are negligible. A 95 percent confidence interval for β is

    b ± t_0.025 s/√Σx²                                                        (1.50)

and H₀: β = β₀ would be rejected if

    |(b − β₀) / (s/√Σx²)| > t_0.025(n − 2)                                    (1.51)

where t_0.025(n − 2) indicates the 2.5 percent point of the t distribution with (n − 2) degrees of freedom. The conditions in Eqs. (1.50) and (1.51) are opposite sides of the same coin. If Eq. (1.51) leads to a rejection of H₀, then β₀ lies outside the confidence interval given in Eq. (1.50). Likewise, if β₀ lies inside the 95 percent confidence interval, Eq. (1.51) will not lead to the rejection of H₀ at the 5 percent level. The most commonly used test of significance is that of H₀: β = 0. The test statistic is then

    t = b / (s/√Σx²) = b / s.e.(b)                                            (1.52)

and H₀ would be rejected at the 5 percent level of significance if the absolute value of b exceeded t_0.025 times its standard error. The abbreviation s.e.(b) is used to denote both the true and the estimated standard error of b. The test statistic in Eq. (1.52) is a routine output of most regression packages and usually has some label such as T-STAT. Many programs also report a P-value, which is the probability of obtaining a coefficient as far or farther from zero as the sample value if, in fact, the true value of the coefficient is zero. This number may be labeled P-VALUE or 2-TAIL SIG.
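Applying these formulae to the numerical example of Section 1.4.5 gives concrete values. The sketch below uses the quantities b = 1.75, RSS = 1.5, Σx² = 40, and n = 5 from the text; the critical value 3.182 is the tabulated 2.5 percent point of t with 3 degrees of freedom.

```python
# Inference for the slope in the numerical example, Eqs. (1.31), (1.50), (1.52).
b, RSS, Sxx, n = 1.75, 1.5, 40.0, 5
s2 = RSS / (n - 2)                # Eq. (1.31): s^2 = 0.5
se_b = (s2 / Sxx) ** 0.5          # estimated standard error of b
t_stat = b / se_b                 # Eq. (1.52), test of H0: beta = 0
t_crit = 3.182                    # t_0.025(3), from standard tables
ci = (b - t_crit * se_b, b + t_crit * se_b)   # Eq. (1.50)
print(f"s^2 = {s2}, s.e.(b) = {se_b:.4f}, t = {t_stat:.2f}")
print(f"95% CI for beta: ({ci[0]:.3f}, {ci[1]:.3f})")
```

The t statistic of about 15.7 far exceeds 3.182, so H₀: β = 0 is decisively rejected, and the 95 percent confidence interval for β, roughly (1.39, 2.11), lies well away from zero.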
By a similar development, tests on the intercept are based on the t distribution:

    t = (a − α) / (s √(1/n + X̄²/Σx²)) ~ t(n − 2)                             (1.53)

Thus, a 100(1 − ε) percent confidence interval for α is given by

    a ± t_{ε/2} s √(1/n + X̄²/Σx²)                                            (1.54)

and the hypothesis H₀: α = α₀ would be rejected at the 100ε percent level of significance if

    |(a − α₀) / (s √(1/n + X̄²/Σx²))| > t_{ε/2}(n − 2)

Tests on σ² may be derived from the result stated in Eq. (1.47). Using that result one may, for example, write

    prob{ χ²_0.025 < (n − 2)s²/σ² < χ²_0.975 } = 0.95                         (1.55)

which states that 95 percent of the values of a χ² variable will lie between the values that cut off 2.5 percent in each tail of the distribution. The critical values are read off from the χ² distribution with (n − 2) degrees of freedom, or accessed through any appropriate software package. The only unknown in Eq. (1.55) is σ², and the
