8.essentials of Econometrics Chapter1-8 PDF

ESSENTIALS OF ECONOMETRICS. THIRD EDITION Damodar N. Gujarati United States Military Academy, West Point EM McGraw ye 3 3k ia Tirwin + Boston Burr Ridge, IL Dubuque, 1A Madison, Wi New York ‘San Francisco St.Louis Bangkok Bogoté Caracas Kuala Lumpur Lisbon London Madrid MexicoCity Milan Montreal New Delhi Santiago Seoul Singapore Sydney Taipei TorontoCHAPTER THE NATURE AND SCOPE OF ECONOMETRICS Research in economics, finance, management, marketing, and related disciplines is becoming increasingly quantitative. Beginning students in these fields are encouraged, if not required, to take a course or two in econometrics—a field of study that has become quite popular. This chapter gives the beginner an overview of what econometrics is all about. 1.1 WHAT IS ECONOMETRICS? Simply stated, econometrics means economic measurement. Although quantitative measurement of economic concepts such as the gross domestic product (GDP), unemployment, inflation, imports, and exports is very important, the scope of econometrics is much broader, as can be seen from the following definitions: Econometrics may be defined as the social science in which the tools of economic the- ‘ory, mathematics, and statistical inference ase applied to the analysis of economic phenomena.? Econometrics, the result of a certain outlook on the role of economics, consists of the application of mathematical statistics to economic data to lend empirical support to the models constructed by mathematical economics and to obtain auumerical results Yarthur 8. Goldberger, Econometric Theory, Wiley, New York, 1964, °P. A. Samuelson, TC. Koopmans, and J, R.N. Stone, “Report of the Evaluetive Committee for Etonometrica," Econometrica, vol. 22, 0.2, Apri 1954, pp. 141-146,CHAPTER ONE: THE NATURE AND SCOPE OF ECONOMETRICS. 2 WHY STUDY ECONOMETRICS? maa 2 [As the preceding definitions suggest, econometrics makes use of economic theory, ‘mathematical economics, economic statistics (i., economic data), and mathesnatical statistics. Yet, it is a subject that deserves to be studied in its own rright for the following reasons. Economic theory makes statements or hypotheses that are mostly qualitative in nature. For example, microeconomic theory states thi, other things remaining the same (the famous ceteris paribus clause of economics), an increase in the price of a commodity is expected to decrease the quantity demanded of that commodity. Thus, economic theory postulates a negative or inverse relationship between the price and quantity demanded of a commodity—this is the widely known law of downward-stoping demand or simply the law of demand. But the theory itself does not provide any numerical measure of the strength of the relationship between the two; that is, it does not tell by how much the quantity demanded will go up or down as a result of a certain change in the price of the commodity. It is the econometrician’s job to provide such numerical estimates. Econometrics gives empirical (ie., based on observation or experiment) content to most economic theory. If we find in a study or experiment that when the price of a unit increases by a dollar and the quantity demanded goes down by, say, 100 units, we have not only confirmed the law of demand, but in the process we have also provided a numerical estimate of the relationship between “tio two variables—price and quantity. "he main concern of mathematical economics is to express economic theory in mathematical form or equations (or models) without regard to measurability ov evapitical verification of the theory. Econometrics, as noted earlier is primarily interested in the empirical verification of economic theory. As we will show shortly, the econometrician often uses mathematical models proposed by the mathematical economist but puts these models in forms that lend uiemselves to empirical testing. Economic statistics is mainly concerned with collecting, processing, and pre- senting economic data in the form of charts, diagrams, and tables. This is the economic statistician’s job. He or she collects data on the GDP, employment, unemployment, prices, etc. This data constitutes the raw data for econometric work. But the economic statistician does not go any further because he or she is. not primarily concerned with using the collected data to test economic theories. Although mathematical statistics provides many of the'tools employed in the trade, the econometrician often needs special methods because of the unique nature of most economic data, namely, that the data are not usuelly generated as the result of a controlled experiment. The econometrician, like the meteorol- ogist, generally depends on data that cannot be controlled directly. Thus, data on consumption, income, investments, savings, prices, etc,, which are colllected by public and private agencies, are nonexperimentai in nature. The econometrician takes these data as given. This creates special problems not normally dealt with in mathematical statistics. Moreover, such data are likely to contain errors of measurement, of either omission or commission, and the econometricianCHAPTER ONE: ‘THE NATURE AND-SCOPE OF ECONOMETRICS 3 may be called upon to develop spetial methods of analysis to.deal with such errors of measurement, For students majoring in economics and business there is a pragmatic reason for studying econometrics. After graduation, in their employment, they may be called upon to forecast sales, interest rates, and money supply or to estimate demand and supply.functions or price elasticities for products. Quite often, economists appear as expert witnesses before federal and state regulatory agencies on behalf of their clients or the public at lange. Thus, an economist appearing before asstate regulatory commission that controls prices of gas and electricity may bere- quired to assess the impact of a proposed price increase on the quantity de- marided of electricity before the comunission will approve the price increase, In situations like this the economist may need to develop a demand function for electricity for this purpose. Such a demand function may enable the economist to estimate the price elasticity of demand; thats, the percentage change in the quantity demanded for a percentage change in the price. Knowledge of econometrics is very helpful in estimating such demand functions, It is fair to say that econometrics has become an integral part of training in economics arid business. 1.3. THE METHODOLOGY OF ECONOMETRICS How does one actually do-an econometric study? Broadly speaking, economet- sic analysis proceeds along the following fines. 1. Creating a statement of theory or hypothesis, 2, Collecting data. 3. Specifying the mathematical model of theory 4, Specifying the statistical, or econometric, model of theory. 5, Estimating the parameters ofthe chosen econometric model. 6. Checking for model adequacy: Model specification testing. 7. Testing the hypothesis derived from the model. 8, Using the model for prediction or forecasting, Tp illustrate the methodology, consider this question: Do economic conditions affect people's decisions to enter the labor force, that is, their willingness to work? As a measure of economic conditions, suppose We use the unemploy- intent rate (UNR), and asa measure of labor force participation we use the labor force patticipation rate (LPR). Data on UNR and LEPR are tegularly published by the government. So to answer the question we proceed as follows. Creating a Statement of Theory or Hypothesis ‘The starting point isto find out what economic theory has to say on the subject you want to study In labor economics, there are two rival hypotheses about the effect of economic conditions on people's willingness to work, The discouraged worker hypothesis (effect) states that when economic conditions worsen, asCHAPTER ONE: THE NATURE AND SCOPE OF ECONOMETRICS oliecting Data reflected in a higher unemployment rate, many unemployed workers give up hope of finding a job ard drop out of the labor force. On the other hand, the added-worker hypothesis (effect) maintains that when economic conditions worsen, inany secondary workers who are not currently in the labor market (eg, mothers with children) may decide to join the labor force if the main breadwinner in the family loses his or her job. Even if the jobs these secondary workers get are low paying, the earnings will make up some of the loss in income suffered by the primary breadwinner. Whether, on balance, the labor force participation rate will increase or decrease will depend on the relative strengths of the added-worker and discouraged- worker effects, If the added-worker effect dominates, LFPR will increase even when thé unemployment rate shigh. Contratily ifthe discouraged-worker effect dominates, LFPR will decrease. How do we find this out? This now becomes our empirical question, For empirical purposes, therefore, we need quantitative information on the {wo variables. There are three types of data that are generally available for empirical analysis, 4. Time series, 2. Cross-sectional. 3. Pooled (a combinatic:: of time series and cross-sectional). ‘Times series data are colle-ted over a period of time, such as the data on GDP, employment, unemployment, money supply, or government deficits. Such data may be collected at regular intervals—daily (e.g,, stock prices), weekly (e.g, money supply); monthly (e.g,, the unemployment rate), quarterly (eg, GDP), or annually (e.g,, government budget). These data may be quantitative in nature (e.g, prices, income, money supply) or qualitative (e.g., male or female, employed ot unemployed, married or unmarried, white or black). As wwe will show, qualitative variabies, also called dummy or categorical variables, can be every bit as important as quantitative variables, Cross-sectional data are data on one or more variables collected at one point in time, such as the census of population conducted by the U.S. Census Bureau every 10 years (the most recent was.on April 1, 2000); the surveys of consumer expenditures conducted by the University of Michigan; and the opinion polls suich.as those conducted by Gallup, Haris, and other polling organizations. In pooled data we have elements of both time series and cross-sectional data. For example, if we were to collect data on the unemployment rate for 10 countries for a period of 20 years, the data will constitute an example of pooled data— data on the unemployment rate for each country for the 20-year period will form. “time series data, whereas data on the unemployment rate for the 10 countries for any single year will be cross-sectional data, In pooled data we will have 200 observations—20 annual observations for each of the 10 countries.CHAPTER ONE: THE NATURE AND SCOPE OF ECONOWETRICS 5 ‘There is a special type of pooled data, panel data, also called longitudinal or micropanel data, in which the same cross-sectional unit, say, a family or firm, is surveyed over time, For example, the US. Department of Commerce conducts a census of housing at periodic intervals. At each periodic survey the same household (or the people living at the seme address) is interviewed to find out if there has been any change in the housing and financial conditions of that household since the last survey. The panel data that results from repeatedly in- terviewing the same household at periodic intervals provide very useful information on the dynamics of household behavior. Sources of Data A word is in order regarding data sources, The success of “any econometric study hinges on the’ quality, as well as the quantity, of data. Fortunately, the Internet has opened up a veritable wealth of data. In Appendix A1 we give addresses of several Web sites that have all kinds of microeconomicand macroeconomic data. Students should be familiar with such sources of data, as well as how to access or download them. Of course, these data aré continually updated so the reader may find the latest available data. For our analysis, we obtained the time series data shown in Table 1-1. This table gives data on the civilian labor force participation rate (CLEPR) and the civilian unemployment rate (CUNR), defined as the number of civilians unemployed as a percentage of the civilian labor force, for the United States for the period 1980-20025 Unlike physical sciences, most data collected in economics (e.g., GDP, money supply, DowJones index, car sales) are nonexperimental in that the data- collecting agency (e.g,, government) may not have any direct control over the data, Thus, the data on labor force participation and unentployment are based on the information provided to the government by participants in the labor market. Ina sense, the government is a passive collector of these data and may not be aware of the added- or discouraged-worker hypotheses, or any other hypothesis, for that matter. Therefore, the collected data may be the result of several factors affecting the labor force participation decision made by the individual person. Thatis, the same data may be compatible with more than one theory. Specttying the Mathematical Model of Labor Force Participation ‘To see how CLEPR behaves in relation to CUNR, the first thing we should do is plot the data for these variables in a scatter diagram, or scattergram, as shown. in Figure 1-1. ‘The scattergram shows that CLEPR and CUNR are inversely related, perhaps suggesting that, on balance, the discouraged-worker effect is stronger than the added-worker effect! As a first approximation, we can draw. a straight line 2We consider here only the aggregate CLFPR and CUNR, but data ane available by age, sex and ethnic composition. Qn this, see Shelly Lundberg, “The Added Worker Effect,” Journal of Labor Economics, vol. 3, January 1985, pp. 1-37CHAPTER ONE: THE NATURE AND SCOPE OF ECONOMETRICS TABLE 1-1 U.S. CIVILIAN LABOR FORCE PARTICIPATION RATE (CLFPR), CIVILIAN UNEMPLOYMENT RATE (CUNR), ‘AND REAL AVERAGE HOURLY EARNINGS (AHEB2)* FOR THE YEARS 1980-2002. . Year CLFPR (%) CUNR(%) —-AHEB2 ($) 1980 638 7A 778 198 63.9 76 7.69 1982 64.0 a7 7.68 1983 64.0 96 779 1984 e44 75 7.80 1985 648. 72 77 1986 65.3 70 781 1987 658 62 773 1988, 659 55 7.69 1989 665 53 764 1990 665 56 752 1991 662 68 745 1992 664 75 7a 1998 663 69 7.39 1994 666 64 7.40 1995 668 56 7.49 1996 66.8 54 7.43, 1997 era 49 7.58 1998 era 45 795 1999 674 42 7.88 +2000 672 49 7.29 2001 669 48 7.99 2002 668 58 cary “AHEB2 represents averdge houty eamings in 1982 dolars. ‘Source: Economie Repor ofthe Proident, 2002, CLEPR tom Table 6-40, CUNA fom Table B49, and AHES? fom Table B47, Data for 2002 are from more recent govemment publications. Filed Line Plot os 670 665 Roo ge: dao os 640 65 Cr rT) 6 7 CUNR (%) FIGURE 1-1 Regression plot for chill labor force participationCHAPTER ONE: THE NATURE AND SCOPE OF ECONOMETRICS 7 through the scatter points and write.the relationship between CLFPR and CUNR by the following simple mathematical model: CLFPR = B, + B,CUNR (1.1) Equation (1.1) states that CLFPR is linearly related to CUNR. By and By are known as the parametets of the linear function B is also known as the intercept it gives the value of CLEPR when CUINR is zero And B; is known as the slope. The slope measures the rate of change in CLFPR fora unit change in CUNR, or ‘more generally the rate of change in the value of the variable on the left-hand side of the equation for a unit change in the value of the variable on the right- hand side, The slope coefficient By can be positive (if the added-worker effect dominates the discouraged-worker effect) or negative (if the discouraged worker effect dominates the added-worker effect), Figure 1-1 suggests that in the present case it is negative. Specifying the Statistical, or Econometric, Model of Labor Force Participation The purely mathematical model of the relationship between CLFPR and CUNR given in Eq, (1.1), although of prime interest to the mathematical economist, is of limited appeal to the econometrician, for such a model assumes an exact, or deterministic, relationship between the two variables; that is, fox a given CUNR, there is a unique value of CLFPR. In reality, one tarely finds such neat relationships between economic variables. Most often, the relationships are inexact, or statistical, in nature. ‘This is seen clearly from the scattergram given in Figure 1-1. Although the two variables are inversely related, the relationship between them is not perfectly or exactly linear, for if we draw a straight line through the 23 data points, not all the data points will lie exactly on that straight line. Recall that to draw astraight line ‘weneed only two points.’ Why don’t the 23 data points lie exactly on the straight line specified by the mathematical model, Eq, (1.1)? Remember that our data on labor force and unemployment are nonexperimentally collected. Therefore, as noted earlier, besides the added- and discouraged-worker hypotheses, there may be other forces affecting labor force participation decisions, As a result, the observed relationship betweeri CLFPR and CUNR is likely to be imprecise. Let us allow for the influence of all other variables affecting CLFPR in a catchall variable u and write Eq, (1.1) as follows: CLEPR = By + ByCUNR+u (1.2) SBroadly speeking, a paremeter is an unknown quantity that may vary over a certain set of values. In statistics a probability distribution function (PDE) of a random variable is often character- ied by its parameters, such as its mean and variance. This topic is discussed in greater detail in ters and 3, in. Chapter 6 we give a more precise interpretation of the intercept in the context of regression a "he even ted to ft parabola othe ster points given in Fig 1, but the ress were notCHAPTER ONE: THE NATURE AND SCOPE OF ECONOMETRICS where u represents the random error term, or simply the error term. We let # represent all those forces (besides CLINR) that affect CLEPR but are not explicitly introduced in the model, as well as purely random forces. As we will see in Paxt Ul, the ertor term distinguishes econometrics from purely mathematical economics. Equation (1.2) is an example of a statistical, or empirical or econometric, mode More precisely, itis an example of what is known asa linear regression model, | which isa prime subject of thisbook. In such a model, the variable appearing on the left-hand side of the equation is called the dependent variable, and the variable on the right-hand side is called the indépendent, or explanatory, variable. In linear regression analysis oisr primary objective is to explain the behavior of one variable (the dependent variable) in relation to the behavior of one or more other variables (the explanatory variables), allowing for the fact that the relationship between them is inexact. Notice that the econometric model, Eq. (1.2), is derived from the mathematical model, Eq. (1.1), which shows that mathemtatical economics and econometrics are mutually complementary disciplines. This is clearly reflected in the definition of econometrics given at the outset. Before proceeding further, a warning regarding causation is in order. In the regression model, Eq, (1.2), we have stated that CLFPR is the dependent variable and CUNRis theindependent, or explanatory, variable, Does that mean that the two variables are causally related; that is, is CUINR the cause and CLEPR the effect? In other words, does regression imply causatioa? Not xecessarily. As Kendall and Stuart note, “A statistical relationship, however strong and however suggestive, can never establish causal connection: out ideas of c:::sation must come from outside statistics, ultimately from some theory or other.”8 In our example, itis up to economic theory (e.g,, the discouraged-worker hypothesis) to establish the cause-and-effect relationship, if any, between the dependent and explanatory variables. If causality cannot be established, itis better to call the relationship, Eq. (1.2), predictive relationship: Given CUINR, can we predict CLFPR? stimating the Parameters of the Chosen Econometric Model Given the data on CLEPR and CUNR, such as that in Table 1-1, how do we estimate the parameters of the model, Eq, (1.2), namely, By and By? That is, how do we find the numerical values (ie,, estimates) of these parameters? This will be the focus of our attention in Part I, where we develop the appropriate methods of computation, especially the method of ordinary least spares (OLS). Using OLS and the data given in Table 1.1, we obtairied the following results: CLEDR = 69,9963 ~ 0.6513CUNR 3) ‘In statistical lingo, the random error term is known as the stochastic error term. °M. G. Kendall and A, Stuart, The Advanced Theory of Sttistcs, Charles Grifin Publishers, New York, 1961, vol.2, Chap. 26,p. 279CHAPTER ONE: THE NATURE AND SCOPE OF ECONOMETRICS 9 Note that we have put the symbol «on CLFPR (read as “CLFPR hat”) to remind us that Eq. (1.3) is an estimate of Bq, (1.2). The estimated regression line is shown in Figure 1-1, along with the actual data points, ‘ASEq, (1.3) shows, the estimated value of By is ~ 70 and that of By is * - 0.65, where the symbol © means approximately. Thus, if the unemployment rate goes up by one unit (ie., one percentage point, ceteris paribus, CLEPR is expected to decrease on the average by about 0.65 percentage points; that is, as economic conditions worsen, on average, there is a net decrease in the labor force participation rate of about 0.65 percentage points, perhaps suggesting that the discouraged-warker effect dominates. We say “on the average” because the presence of the error term w, as noted eatlier, is likely to make the relationship somewhat imprecise. This is vividly seen in Figure 1-1 where the points not on the estimated regression line are the actual participation rates and the (vertical) distance between them and the points on the regression line are the estimated 1's. As we will see in Chapter 6, the estimated n’s are called residuals. In short, the estimated regression line, Eq: (1.3), gives the relationship between average CLEPR and CUNK; thatis, on average how CLFPR responds to a unit change in CUNR. The value of about 69.9963 suggests that the average value of CLFPR will be about 69.99 percent ifthe CUNR were zer0; that is, about 69,99 percent of the civilian working-age population would participate in the labor force if there were full employment (i.e,, zeto unemployment). Checking for Model Adequacy: Model Specification Testing How adequate is out model, Eq, (1.3)? Itis true thata person will take into account labor market conditions as measured by, say, the unemployment rate before entering the labor market, For example, in 1982 (a recession year) the civilian unemployment rate was about 9.7 percent. Compared to that, in 2001 it was only 4.8 percent. A persons more likely to be discouraged from entering the labor mat ket when the unemployment rate is9 percent than when itis 5 percent. But there are other factors that also enter into labor force participation decisions. For example, hourly wages, or earnings, prevailing in the labor market also will be an important decision variable. In the short-run at least, a higher wage may attractmore workers to the labor market, other things remaining the same (ceteris paribus). to see its importance, in Table 1-1 we have also given data on real average hourly ‘earnings (AHE82), where real earnings are measured in 1982 dollars. To take into o account the influence of AHE82, we now consider the following model: CLFPR = B; + B)CUNR + BsAHE82 + u (1.4) Equation (1.4) is an example of a multiple linear regression model, in contrast to Eq. (1.2), which is an example of asinipe (txvo-variable or bivariate) linear regression This is, however, a mechanical interpretation of the intercept. We will see in Chapter 6 how to interpret the intercept term meaningfully ina given context.0° CHAPTER ONE: THE NATURE AND SCOPE OF ECONOMETRICS ‘model: In the two-variable model there is a single explanatory variable, whereas in a multiple regression there are several, or multiple, explanatory variables. Notice that in the multiple regression, Eq. (1.4), we also have included the error term, u, for no matter how many explanatory variables one introduces in the model, one cannot fully explain the behavior of the dependent variable, How many variables one introduces in the multiple regression is a decision that the researcher will have to make in a given situation. Of course, the underlying economic theory will often tell what these variables might be. However, keep in mind the warning given earlier that regression does not mean causation, the relevant theory must determine whether one or more explanatory variables are causally related to the dependent variable. How do we estimate the parameters of the multiple regression, Eq, (1.4)? We ‘over this topic in Chapter 8, after we discuss the two-variable model in Chapters 6 and 7. We consider the two-variable case first because it is the building block of the multiple regression model. As we shall see in Chapter 8, the multiple regression model is in many ways a stcaightforward extension of the two-variable model. For our illustrative example, the empirical counterpart of Eq, (1.4) is as fol lows (these resulls are based on OLS): CLEPR 80.9013 — 0.6713CUNR — 1.4042AHE82, (15) ‘These results are interesting because both the slope coefficients are negative ‘The negative coefficient of CLINR suggests that, ceteris paribils (ie, holding the influence of AHES82 constant), a one-percentage-point increase in the unemployment rate leads, on average, to about a 0.67-percentage-point decrease in CLFPR, pethaps once again supporting the discouraged-worker hypothesis. On the other hand, holding the influence of CLINR constant, an increase in real average hourly earnings of one dollar, on average, leads to about a 1.40 petcentage- point decline in CLEPR." Does the negative coefficient for AHE82 make economic sense? Would one not expecta positive coefficient—the higher the hourly earnings, the higher the attraction of the labor market? However, one could justify the negative coefficient by recalling the twin concepts of microeconomics, namely, the income effect and the substitution effect." Which model do we choose, Bq. (1.3) or Eq, (1.5)? Since Eq, (1.5) encompasses Eq. (1.3) and since it adds an additional dimension (easnings) to the analysis, we aay choose Eq, (1.5). After all, Eq. (1.2) was based implicitly on the assumption that variables other than the unemployment rate were held constant. But where “TAs we will discuss in Chapter 8, the coefficients of CUNR and AHES2 given in Eq. (1.5) are known as partial regression cogficents. in that chapter we will discuss the precise meaning of parti regression coelficents. Consult any standard textbook on microeconomics, One intuitive justification of this result is 2s follows. Suppose both spouses are in the labor force and the earings of one spouse rises substantially. This may prompt the other spouse to withdraw from the labor force without substantially alfecting the family income.CHAPTER ONE: THE NATURE AND SOOPE OF ECONOMETRICS 11. do.we stop? For example, labor force patticipation may also depend on family wealth, number of children under age 6 (this is especially critical for married ‘women thinking of oiting the labor market), availability of day-care centers for young children, religious beliefs, availability of welfare benefits, unemployment insurance, and so on. Even if data on these variables are available, we may not want to introduce them all in the model because the purpose of developing an econometric model is not to capture total reality, but just its salient features, If we decide to include every conceivable variable in the regression model, the model will be so unwieldy that it will be of little practical use. The model ultimately chosen should be a reasonably good replica of the underlying reality. In Chapter 11 we will discuss this question further and find out how one can go about developing a model. Testing the Hypothesis Derived from the Model Having finally settled on a model, we may want to perform hypothesis testing. ‘That is, we may want to find out whether the estimated model makes economic sense and whether the results obtained conform with the underlying economic theory. For example, the discouraged-worker hypothesis postulates a negative relationship between labor force participation and the unemployment rate. Is this hypothesis borne out by our results? Our statistical results seem to be in confor- mity with this hypothesis because the estimated coefficient of CUNR is negative. However, hypothesis testing can be complicated. In our illustrative example, suppose someone told us that in a prior study the coefficient of CUNR was found to be about -1. Are our results in agreement? If we'rely on the model, Bq. (1.3}, we might get one answer; but if we rely on Eq, (1.5), wemightget another answer. How do we resolve this question? Although we will develop the neoes- sary tools to answer such questions, we should keep in mind that the answer toa particulas hypothesis may depend on the model we finally choose. ‘The point worth remembering is that in repression analysis we may be inter- ‘ested not only in estimating the parameters of the regression model but also in testing certain hypotheses suggested by economic theory and/or prior empirical experience. Using the Model for Prediction or Forecasting Having gone through this multistage procedure, you can legitimately ask the question: What do we do with the estimated model, such as Eq, (1.5)? Quitenat- urally, we would like to use it for prediction, or forecasting. For instance, sup- Bose we have 2004 data on the CLINR and AHE82, Assume these values are 6.0 and 10, respectively. If we put these values in Eq, (1.5), we obtain 62.8315 per-~ cent as the predicted value of CLFPR for 2004. That is, ifthe unemployment rate in 2004 were 6.0 percent and the real hourly earnings were $10, the civilian labor force participation rate for 2004 would be about 63 percent. Of course, when data on CLFPR for 2004 actually become available, we can compare the12 CHAPTER ONE: THE NATURE AND SCOPE OF ECONOMETRICS predicted value with the actual value, The discrepancy between the two will represent the prediction errr. Naturally, we would like to keep the prediction error as small as possible. Whether this is always possible is a question that we will answer in Chapters 6 and 7. Let us now summarize the steps involved in econometric analysis. ‘Step Example 4. Statement of theory ‘The Added- /Dsoouraged-Worker Hypothests 2. Collection of Data Table 1-1 ‘3, Mathematical Model of Theory: CLFPR = B; + &:CUNA 4 Egonometlc Model of Theory: CLEPR = B, + B,CUNA + u 5. Parameter Estimation: CLFPR = 69,9963 — 0.6513CUNR 8. Model Adequacy Check: CLFPR = 80.9 ~ 0.671CUNR — 1.4AHE@2 7. Hypothesis Test: &<00rB,>0 8. Prediction: ‘What is CLFPR, given values of CUNA and AHEB2? Although we examined econometric methodology using an example from labor economics, we should point out that a similar procedure can be employed to analyze quantitative relationships between variables in any field of knowledge. As a matter of fact, regression analysis has been used in politics, international relations, psychology, sociology, meteorology, and many other disciplines. 1.4 THE ROAD AHEAD Now that we have provided a glimpse at the nature and scope of econometrics, let us see what lies aliead. The book is divided into four paris. Partl, consisting of Chapters 2,3,4,and5, reviews the basics of probability and statistics for the benefit of those readers whose knowledge of statistics has become rusty. Reader should have some previous background in introductory stati Part II introduces the reader to the bread-and-butter tool of econometrics, namely, the classical linear regression model (CLRM). A thorough understanding of CLRM is a must in order to follow research in the general areas of economics and business. Part Ill considers the practical aspects of regression analysis and discusses a variety of problems that the practitioner will have to tackle when one ot more assumptions of the CLRM do not hold. Part IV discusses two comparatively advanced topic—simultaneous equation regression models and time series econometrics. This book keeps the needs of the beginner in mind, The discussion of most topics is straightforward and unencumbered with mathematical proofs, derivations, etc’ firmly believe that the apparently forbidding subject of econometrics can be taught to beginners in such a way that they can see the value of the subject without getting bogged down in mathematical and statistical minutiae. The "Some ofthe proofs and derivations are presented in my Basic Econometrics, Ath ed, MeGraw- Hill, New York, 2003.CHAPTER ONE: THE NATURE AND SCOPE OF ECONOMETRICS 13 student should keep in mind that an introductory econometrics courses just like the introductory statistics course he or she has already taken, As in statistics, econometrics is primarily about estimation and hypothesis testing. What is different, and generally much more interesting and useful, is that the parameters being estimated or tested are not just means and variances, but relationships between variables, which is what much of economics and other social sciences are all about. A final word. The availability of comparatively inexpensive computer software packages has now made econometrics readily accessible to beginners. In this book we will largely use three software packages: Euiews, Excel, and Minitab, These packages are readily available and widely used. Once students get used to using such packages, they will soon realize that learning economet= tics is really great fun, and they will have a better appreciation of the much maligned “dismal” science of economics. KEY TERMS AND CONCEPTS QUESTIONS ‘The key terms and concepts introduced in this chapter are Econometrics Random error term (error term) Mathematical economics Linear regression model: Discouraged-worker hypothesis dependent variable (effect) independent (or explanatory) Added-worker hypothesis (effect) variable Time series data Deterministic vs. statistical a) quantitative relationship b) qualitative Causation Cross-sectional data Parameter estimates Pooled data Hypothesis testing Panel (ot longitudinal or micropanel _ Prediction (forecasting) data) Scatter diagram (scattergram) a) parameters b) intercept 6) slope 141, Suppose a local government decides to increase the tax sate on sesidential properties under its jurisdiction. What will be the effect of this on the prices of resi- dential houses? Follow the eight-step procedure discussed in the text toanswer this question. 1.2. How do you perceive the role of econometrics in decision making in business and economics? 13. Suppose you are an economic adviser to the Chairman of the Federal Reserve Board (the Fed), and he asks you whether it is advisable to increase the money 134 CHAPTER ONE: THE NATURE AND SCOPE OF ECONOMETRICS PROBLEMS TABLE 1-2 ie supply to bolster the-economy. What factors would you take into account in your advice? How would you use econometrics in your advice? 14. ‘To reduce the dependence on foreign oil supplies, the government is thinking of increasing the federal taxes on gasoline. Suppose the Ford Motor Company has hired you to assess the impact of the tax increase on the demand for its cars. How would you go about advising the company? 1.5, Suppose the president of the United States is thinking of imposing tariffs on imported steel to protect the interests of the domestic steel industry. As an economic advisor to the president, what would be your recommendations? How would you set up an econometric study to assess the consequences of imposing the tariff? 16, Table 1-2 gives data on the Consumer Price Index (CPI), S&P 500 stock index, and three-month Treasury bill rate for the United States for the years 1980-2001. a. Plot these data with time on the horizontal axis and the three variables on the vertical axis. If you prefer, you may use a separate figure for each variable . What relationships do you expect between the CPI and the S&P index and between the CPT and the three-month Treasury bill rate? Why? «. Foreach variable, “eyeball” a regression line from the scattergram. CONSUMER PRICE INDEX (CPI, 1982-1984 = 100), ‘STANDARD AND POOR'S COMPOSITE INDEX (S8P 500, {941-1943 = 100), AND THREE MONTH TREASURY BILL RATE (3-mT BILL, %). Year orl ‘S&P 500 3-mT bill 1980 824 118.78 1151 1981 90.9. 128.05 14.03 1982 96.5 119.71 10.68 1983 99.6 160.41 8.63 1984 103.9 160.48 9.58 1985 1076 106.84 ow 1986 108.6 23634 8.98 1987 1136 286.83, 5.82 1988 1183 265.79 6.69 1989 124.0 322.84 812 1990 1907 334,59 751 4991 136.2 376.18 5a2 1992 1403 415.74 345 1998 1445 451A 302 1994 1482 490.42 429 1095 1824 54172 551 1996. 156.9 670.50 5.02 1997 1605 873.43 507 1998 163.0 1085.50 481 1999 168.6 1827.33 468 2000 1722 1427.22 585 2001 aA 1194.18 3.45 ‘Source: Economic Report ofthe President, 2002, Tabos B-60,CHAPTER ONE: THE NATURE AND SCOPE OF ECONOMETRICS 15, TABLE 4-3 GMS EXCHANGE RATE BETWEEN GERMAN MARK AND U.S, DOLLAR AND THE OP| IN THE UNITED STATES AND GERMANY, 1980-1998, Year ews oPius, CPI Germany 1980 1.8175 824 86.7 1981 2.2692 90.9 922 1982 2.4281 965 974 1983 2.5599 99.6 1003 1984 2.8455 1080 4027, 1985 2.9420 107.6 104.8 1986, 24705, 1096 104.6 1987 4.7981 1136 1049 1988 4.7570 118.3 1063 1989 1,8808 1240 1092 1990 1.6166 130.7 1122 1991 1.8610 1362. 1163 1992 4.5618 1403, 122.2 1998 1.8545 1445 1278 1994 1.6216 1482 1314 1995 1.4321 1824 1935 1996 1.8049 156.9 1855 1997 1.7948 160.5 1978 1998, 4.1879 183.0 1984 ‘Source: Ecencmlc Report of te President, 2002. GM trom ‘Table 8-110; OPI (1862-1884 = 100) from Yabo 8-408, 17, Table 1-8 gives you data on the exchange tate between the German mark and the US. dollar (number of German marks per US. dollar) ag well as the consumer price indexes in the two countries for the period 1980-1998. 4. Plotthe exchange rate (ER) and the two consumer price indexes against time, measured in years. : . b. Divide the US, CPIby the German CPi and calli the relative price ratio (RPR). ¢. Plot ER against RPR. 4. Visually sketch a regression line through the scatterpoints, APPENDIX 1A: Economic Data on the World Wide Web’ Economic Statistics Briefing Room: An excellent source of data on output, income, employment, unemployment, earnings, production and business activity, prices and money, credits and security markets, and intemational statistics, http//wwwwhitehouse.govifsbr/esbrhtm 4 should be noted that this ist is by no means exhaustive. The sources listed here are updated continually. The best way to get information om the Internet isto search using a key word (ec. unemployment rate), Don't he surnrised vant e=t a mlthiors af infnematinn on fhe16 CHAPTER ONE: THE NATURE AND SCOPE OF ECONOMETRICS Federal Reserve System Beige Book: Gives a suntmary of current economic conditions by the Federal Reserve District. There are 12 Federal Reserve Districts. hitp//wwwhbog.fth fed.us/fome/bb/current Government Information Sharing Project: Provides regional economic information; 1990 population and housing census; 1992 economic census; agriculture census for 1982, 1987, 1992; data on US. imports and exports 1991-1995; 1990 Equal Employment Opportunity information, hitp’//govinfo.kerrorst.edu National Bureau of Economic Research (NBER) Home Page: This highly regarded private economic research institute has extensive data on asset prices, labor, ‘productivity, money supply, business cycle indicators, etc,-NBER has many links to other Web sites, . httpeiwwwanberorg Panel Study: Provides data on longitudinal survey of representative sample of US. individuals and families. These data have been collected annually since 1968. http://www.umich.edu/~psid Resources for Economists ont the Internet: Very comprehensive source of information and data on many economic activities with links to many Web sites. A very valuable source for academic and nonacademic economists. Aitp://econwpa.wwstl.edu/EconFAQ/EconFaq-html ‘The Federal Web Locator: Provides information on almost every sector of the federal government; has international links, http:/erwwdaw.villedu/Fed-Agency/fedwebloc.html WebEC:WWW Resources in Economics: A most comprehensive library of economic facts and figures. http://wuecon.wustl.edu/~adneteciWebHo/WebEc.himt ‘Ameriéan Stock Exchange: Information on ome 700 ‘companies listed on the second largest stock market. httpy/www.amex.com/ Bureau of Economic Analysis (BEA) Home Page: This agency of the US. Department of Commerce, which publishes the Survey of Current Business, is an excellent source of data on all kinds of economic activities. hitp:lwww.bea.dac.gov! Business Cycle Indicators: You will find data on about 256 economic time series. Attpswwwglobalexposure.convbei.html CIA Publication: You will find the World Fact Book (anwial) and Handbook of International Statistics, hitp//www.odic.gow/cia/publications/pubs.html Energy Information Administration (Department of Energy [DOE]): Economic information and data on each fuel category. hitpulwwweiadoe.gov/ FRED Database: Federal Reserve Bank of St. Louis publishes historical economic and social data, which include interest rates, monetary and business indicators, exchange rates, etc. hitpu/wwwestls.frb.org/fred/fred.html(OVO CHAPTER ONE: THE NATURE AND SCOPE OF ECONOMETRICS 17 wag74718 International Trade Administration: Offers many Web links to trade statistics, cross-country programs, etc. httpulwwwiita.docgov/ STAT-USA Databases: ‘The National Trade Data Bank provides the most comprehensive source of international trade data and export promotion information. It also contains extensive data on demographic, political, and sociveco- nomic conditions for several countries. http:/www.stat-usa.gov/BEN/databases.html ‘Statistical Resources on the Web/Economics: An excellent source of statistics col- lated from various federal bureaus, economic indicators, the Federal Reserve Board, data on consumer prices, and Web links to other sources. httpy/wwwlib.umich.edu/libhome/Documents.centers/stecon.html Buren of Labor Statistics: The home page contains data related to various aspects of employment, unemployment, and earnings and provides links to other statistical Web sites. hitp://stats.bls.gov:80/ U.S. Census Bureau Home Page: Prime source of social, demographic, and economic data on income, employment, income distribution, and poverty. hittp:l/www.census.govi General Social Surcey: Annual personal interview survey data on U.S. house- holds that began in 1972. More than 35,000 have responded to some 2500 different questions covering a variety of data. http//www.iepstumich.edw/GSS/ Institute for Research on Poverty: Data collected by nonpartisan and nonprofit university-based research center on a variety of questions relating to poverty and social inequality, hitpywwwesscawisc.edwirp/ Social Security Administration: The official Web site of the Social Security Administration with a variety of data, hitpuhrww.sa.gov! Federal Deposit Insurance Corporation, Bank Data and Statistics: http:fwww {dic gov/bank/statisticallindex html Federal Reserve Board, Economic Research and Data: http:}www.federalreserve.govima.him US. Census Bureau, Home Page: httpywww.census.gov U.S. Department of Conimerce, Bureau of Economic Analysis: httpuwww.bea.gov : US. Department of Energy, Energy Information Administration: httpy/wwweia.doe.govineichistorichistorichtm U.S. Department of Health and Human Services, National Center for Health - Statistics: Ritpdfeewwede.gowinchs U.S, Department of Housing and Urban Development, Data Sets: pulwrwwhudusenorg/datasets/pdrdatas.html 1]18 CHAPTER ONE: THENATURE AND SCOPE OF ECONOMETRICS 18 ULS. Department of Labor, Bureau of Labor Statistics: hitpd/wwwobls.gov USS. Department of Transportation, TranStats: hitp:/wwwéranstats.bts.gov. U.S. Department of the Treasury, Internal Revenue Service, Tax Statistics: hitp//wwwirs.gov/taxstats Rockefeller Institute of Government, State and Local Fisoal Data: hitp:/stateandlocalgateway.rockinst.org/fiscal_trends Arierican Economic Association, Resources for Economists: hitpshwwwafe.org, Anterican Statistical Association, Business and Economic Statistics: hitp//www.econ-datalinks.org American Statistical Association, Statistics in Sports: http//www.amstat.org/sections/sis/sports.htm! European Central Bank, Statistics: hittp:/wwwecb.int/stats World Bank, Data and Statistics: hitpdlwww.worldbankorgidata International Monetary Fund, Statistical Topics: httpu/www.mgorg/external/np/sta/indexhtm International Monetary Fund, World Economic Outlook: hittp//www.imf.org/external/pubs/ft/weo/2003/02/datalindex.htm. Penn World Tables: hitp?//pwrtecon.upenn.edu Current Population Survey: httpulwwwbls.census.gov/cps/epsmain.htm Consumer Expenditure Survey: hittpv/www.bls.gov/cexfhome.htm Survey of Consumer Finances: httpviwww.federalreserve.gov/pubs/oss/oss2/scfindex.html City and Countty Data Book: itp/fwww.census.gov/prod/wwwicedb.html Panel Study of Income Dynamics: http:/psidonline.isr.umich.edu ‘National Longituidinal Surveys: hittpyiwwwbls.gov/nlshhome.htm National Association of Home Builders, Economic and Housing Data: http:/www.nahb.org/category.aspx?sectionID=113 National Science Foundation, Division of Science Resources Statistics: hittpywww.nsf govisbelsrsistats htmPART { BASICS OF PROBABILITY AND STATISTICS This part consists of four chapters that review the essentials of statistical theory that are needed to understand econometric theory and practice discussed in the remainder of the book. ‘Chapter 2 discusses the fundamental concepts of probability, probability distributions, and random variables. Chapter 3 discusses the characteristics of probability distributions such as the expected value, variance, covariance, correlation, conditional expectation, conditional variance, skewness, and kurtosis. This chapter shows how these characteristics are measured in practice. Chapter 4 discusses four important probability distributions that are used extensively in practice: (1) the normal distribution, (2) the t distribution, (3) the chi-square distribution, and (4) the F disttibution. In this chapter the main features of these distributions are outlined. With several examples, this chapter shows how these four probability distributions form the foundation of most statistical theory and practice. Chapter 5is devoted to a discussion of the two branches of classical statistics— estimation and hypothesis testing. A firm understanding of these two topics will considerably make our study of econometrics in subsequent chapters easier. ‘These four chapters are written in a very inforntal yet informative style so readers can brush up on their knowledge of elementary statistics. Since students coming to econometrics may have different statistics backgrounds, these four chapters provide a fairly self-contained introduction to the subject. All the concepts introduced in these chapters are well illustrated with several practical examples, 19CHAPTER 2 REVIEW OF STATISTICS |: PROBABILITY AND PROBABILITY DISTRIBUTIONS ‘The purpose ofthis and the following three chaptersis to review somefundamen- tal statistical concepts that are needed to understand Essentials of Econometric. ‘These three chapters will serve as a refresher course for those students who have had a basic course in statistics and will provide a unified framework for following discussions of the material in the remaining patts of this book for those whose knowledge of statistics has become somewhat rusty. Students who have had very little statistics should supplement these three chapters with a good satistics book. (Gome references are given at the end of this chapter) Note that the discussion in Chapters 2 through 5 is nonrigorous and is by no micans a substitute for a basic course in statistics, It is simply an overview that is intended as a bridge to econometrics. 2.1 SOME NOTATION Inthis chapter we come across several mathematical expressions that often can be expressed more conveniently in shorthand forms. The Summation Notation ‘The Greck capital letter E(sigma) is used to indicate summation or addition, Thus,” Sama Ke tbh ist22 PART ONE: BASICS OF PROBABILITY AND STATISTICS where fis the index of summation and the expréssion on the left-hand sides the shorthand for “take the sum of the variable X from the first value ({= 1) to the nth value (i =)"; X stands for the ifh value ofthe X variable. re oy X) is often abbreviated as ‘ Ls ' where the upper and lower limits of the sum are known or can be easily deter- ined or also expressed as yx x which simply means take the sum ofall the relevant values of X. We will use all these notations interchangeably. Properties of the Summation Operator Some important properties of © are as follows: 1, Where kis a constant Dk=nk ist That is, a constant summed n times is m times that constant. Thus, 4 Y3=4x3=12 a In this example n = 4 and k= 3. 2, Where k is a constant DRX SED X That is, a constant can be pulled out of the summation sign and put in front of it, 3 Lith) =CX+ LY Thats, the summation of the sum of two variables is the sum of their individual summations. 4, La + bX) =a +b YX; wheres and b are constants and where use is made of properties 1,2,and3. ‘We will make extensive use of the summation notation in the remainder of this chapter and in the rest of the book. We now discuss several important concepts from probability theory.CHAPTER TWO: REVIEW OF STATISTICS |: PROBABILITY AND PROBABILITY DISTRIBUTIONS "23 2.2 EXPERIMENT, SAMPLE SPACE, SAMPLE POINT, AND EVENTS Experiment The first important concept is that of a statistical or random experiment. In statistics this term generally refers to any process of observation or measurement that has more than one possible outcome and for which there is uncertainty about which outcome will actualy materialize, Example 2.1, Tossing a coin, throwing a pair of dice, and drawing a card from a deck of cards are all experiments. It is implicitly assumed that in performing these experiments certain conditions are fulfilled, for example, that the coin or the dice are fair (not loaded). The outcomes of such experiments could be ahead or a tail if a coin is tossed or any one of the numbers 1, 2,3, 4,5, or 6 if a die is thrown, Note that the outcomes are unknown before the experiment is performed. The objectives of such experiments may be to establish a law (eg,, How many heads are you likely to obtain in a toss of, say, 1000 coins?) or to test the proposition that the coin is loaded (e.g., Would you regard coin as being loaded if you obtain 70 heads in 10 tosses of acoin?). Sample Space or Population ‘The set of all possible outcomes of an experiment is called the population or sample space, The concept of sample space wss frst introduced by von Mises, an Austrian mathematician and engineer, in 1931. Example 2.2, Consider the experiment of tossing two fair coins. Let H denote a head and Ta tail. Then we have these outcomes: HH, HT, TH, TT, where HH means a head on the first toss and a head on the second toss, HT means a head on the first toss and a tail on the second toss, etc. In this example the totality of the outcomes, or sample space or population, is 4—no other outcomes are logialy possible. (Don’t worry about the coin landing on its edge.) Example 23. ‘The New York Mets are scheduled to play a doubleheader, Let O; indicate the outcome that they win both games, Q, that they win the first game but lose the second, Os that they lose the first game but win the second, and O, that they lose both games, Here the sample space consists of fouir owtcomes: Os, 0s. Ox, and Ox24 PARTONE: BASICS OF PROBABILITY AND STATISTICS Sample Point Events Venn Diagrams 24 Each member, or outcome, of the sample space or population is called a sample point. In Example 22 each outcome, HH, HT, TH, and TT, ise sample point. In Example 23 each outcome, 01, Ox, Os, and Oy is a sample point. An event isa particular collection of outcomes and is thus a subset of the sample space. Example 24, Letevent be the occurrence of one head and one tail in the coin tossing experiment, From Example2.2 we see that only outcontes HT and TH belong to event A, (Note: HT and TH are a subset of the sample space HH, HT, TH, and TT). Let B be the event that two heads occur in a toss of two coins. ‘Then, ob- ‘viously, only the outcome HH belongs to event B. (Again, note that HH is a subset of the sample space HH, HT, TH, and TT). Events are said to be mutually exclusive if the occurrence of one event pre- vents the occusrence of another event at the same time. In Example 2.3, if 0; occurs, thatis, the Mets win both the games, itrules out the occurrence of any of the other three outcomes. Two events are said to be equally likely if we are confident that one event is as likely to occur as the other event, In 4 single toss of a coin a head is as likely to appeas as a tail. Events are said to be collectively exhaustive if they exhaust all possible outcomes of an experiment. In our two coin-tossing example, since HH, HT, TH, and TT are the only possible outcomes, they are(collectively) exhaustive events. Likewise, in the Mets example, 01, Oz, O3, and Ox are the only possible outcornes, barring, of course, rain or natural calamities such as the earthquake that occurred during the 1989 World Series in San Francisco. ‘Asimple graphic device, called the Venn diagram, originally introduced by ‘Venn in his book, Symbolic Logic, published in 1861, can be used to depict sample point, sample space, events, and related concepts, as shown in Figure 2-1. In this figure each rectangle represents the sample space § and the two circles represent two events A and B. If there are more events, we can draw more circles to represent all those events. The various subfigures in this diagram depict various situations, Figure 2-1(@) shows outcomes that belong to A and the outcomes that doniatbe- long to, which are denoted by thesymbol A, whichis called the complement of A. Figure 2-1(b) shows the union (ie., sum) of A and B, that is the event whose outcomes belong to set A or set B. Using set theory notation, it is often denoted as AUB (read as A union B), which is the equivalent of A + B.CHAPTER TWO: REVIEW OF STATISTICS I: PROBABILITY AND PROBABILITY DISTRIBUTIONS 25 oe ® @ @ FIGURE 21 Vern diagram, ‘The shaded area in Figure 2-1(c) denotes events whose outcomes belong to both set A and set B, which is represented as AN B(read.as A intersects B), and is the equivalent of the product AB, Finally, Figure 2-1(d) shows that the two events are mutually exclusive because they have no outcomes in common. In set notation, this means ANB = 0 (or that AB = 0). 2.3 RANDOM VARIABLES Although the outcome(s) of an experiment cant be described verbally, stich as a head or a tail, or the ace of spades, it would be much simpler if the results of all experiments could be described numerically, that is, in terms of numbers. As we will see late, for statistical putposes such representation is very useful, Example 25. Reconsider Example 2.2. Instead of describing the outcomes of the experi ment by HH, HT, TH, and TT, consider the “variable” number of heads in a toss of two coins. We have the following situation: Fitstccin —_-Sgcond.¢oin Number of heads T T 0 a H 1 H i 1 H H 2 os26 PART ONE: BASICS OF PROBABILITY AND STATISTICS ‘We call the variable “number of heads” a stochastic or random variable (r.v., for short). More generally, a variable whose (numerical) value is determined by the outcome of an experiment is called a random variable. In the preceding example the xv, number of heads, takes three different values, 0, 1, or 2, depending on whether no heads, one head, or two heads were obtained in a toss of two coins. In the Mets example the rv, the number of wins, likewise takes three different values; 0, 1, or 2. By convention, random variables are denoted by capital letters, X, Y, Z, etc., and the values taken by these variables are often denoted by small letters. Thus, if X is an rv, x denotes a particular value taken by X. _An ry. may be either discrete or continuous. A discrete random variable takes on only a finite number of values or countably infinite number of values (ce. as many values as there are whole numbers). Thus, the number of heads in a toss of two coins can take on only three values, 0, 1; or 2. Hence, it is a’discrete rv. Similarly, the number of wins in a doubleheader is also a discrete rv. since it can take only three values, 0, 1, or 2 wins. A continuous random variable, on the other hand, is an tv. that can take on any value in some interval of values. ‘Thus, the height of an individual is a continuous variable—in the range of, say, 60 to 72 inches it can take any value, depending on the precision of measurement. Similar factors such as weight, rainfall, or temperature also can be regarded as continuous random variables. 24 PROBABILITY Having defined experiment, sample space, sample points, events, and random variables, we now consider the important concept of probability. First, we define the concept of probability of an event and then extend it to random variables. Probability of an Event: The Classical or A Priori Definition If an experiment can result inn mutually exclusive and equally likely outcomes, and if mz of these outcomes are favorable to event A, then P(A), the probability that A occurs, is the ratio m/n, That is, umber of outcomes favorable toA en total number of outcomes Note the two features of this definition: The outcomes must be mutually exclusive (that is, they cannot occur at the same time), and each outcome must have an equal chance of occurring (for example, ina throw of adie, any one of the six numbers has an equal chance of appearing). P(A) = Example 26. Ina throw of adie numbered 1 through 6, there ae six possible outcomes: 1, 2,3,4,5,0r 6. These outcomes are mutually exclusive since, ina single throw(CHAPTER TWO: REVIEW OF STATISTICS |: PROBABILITY AND PROBABILITY DISTRIBUTIONS 27 of the die, two or more numbers cannot tun up simultaneously. These six outcomes are also equally likely. Hence, by the classical definition, the probability that any of these six numbers will show up is 1/6—there are six total outcomes and eaich outcome has an equal chance of occurring, Here n = 6 and m=1, Similarly, the probability of obtaining a head in a sirigle toss of a coin is 1/2 since there are two possible outcomes, Hand T, and each has an equal chance of coming up. Likewise, in a deck of 52 cards, the probability of drawing any single card is 1/52. (Why?) The probability of drawing a spade, however, is 13/52. (Why?) The preceding examples show why the classical definition is called an a priori definition since the probabilities are derived from purely deductive reasoning. One doesn’t have to throw a coin to state that the probability of obtaining a head ora tail is 1/2, since logically, these are the only possible outconies. But the classical definition has some deficiencies, What happens if the outcomes of an experiment are not finite or are not equally likely? What, for exam- le, is the probability that the gross domestic product (GDP) next year will be a certain amount or what is the probability that there will be a recession next year? The classical definition is not equipped to answer these questions. A more widely used definition that can handle such cases is the relative frequency definition of probability, which we will now discuss. Relative Frequency or Empirical Definition ot Probabitity To introduce this concept of probability, consider the following example. Example 2.7. : Table 2-1 gives the distribution of maéks received by 200 students on a microeconomics examination. Table 2-1 is an example of a frequency distribution showing how the rv, marks in the present example are distributed. The numbers in column 3 of the table are called absolute frequencies, that is, the number of occurrences of a given event. The numbers in column 4 are called relative frequencies, that is, the absolute frequencies divided by the total ‘number of occurrences (200 in the present case). Thus, the absolute frequency of marks between 70 and 79 is 45 but the relative frequency is 0.225, which is 45 divided by 200, Can we treat the relative frequencies’as probabilities? Intuitively, it seems reasonable to consider the relative frequencies as probabilities provided the number of observations on which the relative frequencies are based is reasonably Jarge. This is the esséince ofthe enipirial, or relative frequency, definition of probability28 PART ONE: BASICS OF PROBABILITY AND STATISTICS 28 TABLE 2-1 THE DISTRIBUTION OF MARKS RECEIVED BY 200 STUDENTS ON A MICROECONOMICS EXAMINATION. Midpoint of Absolute Relative “Marks infeval = frequency frequency. «) @ 6) (4) = (3200 08 5 0 o 10-19 15 0 o 20-29 25 0 0 30-39 5 10 0.050 40-49, 6 20 0.400 50-69 5 8 0175 60-69 65 50 0.250 70-79 6 45 0225 00-89 85 30 0.150 90-99 5 10 0,050 Total: 200 1.000 More formally, if in trials (or observations), m of them are. favorable to event A, then P(A), the probability of event A, is simply the ratio m/n (te, relative frequency) provided n, the number of trials, is sufficiently large (technically, infinite) Notice that, unlike the classical definition, we do not have to insist that the outcome be mutually exclisive and equally likely. . Inshort, if the number of trials is sufficiently large, we can treat the relative frequencies as fairly good measures of true probabilities. In Table 2-1 we can,, therefore, treat the relative frequencies given in column 4 as probabilities” Properties of Probabilities ‘The probability of an event as defined earlier has the following important properties: 1. The probability of an event always lies between O-and 1. Thus, the probability of event A, P(A), satisfies this relationship: 00 27 That is, the conditional probability of A, given B, is equal to the ratio of their joint probability to the marginal ma of B. In like manner, P@|A)= 5 P(A) > 0 (2.8) om Tp avoid the shaded area in Fig. 2-1(c) being counted twice, we have to subtract P(AB) on the right-hand side of this equation.‘CHAPTER TWO. REVIEW OF STATISTICS t PROBABILITY AND PROBABILITY DISTRIBUTIONS 31 ‘Toviswalize (27), we can resort to a Venn diagram, as shown in Figure 2-1(c) ‘As you can see ftom this figure, regions 2 and 3 represent event B and regions 1 and 2 represent event A. Because region 2 is common to both events, and since Bhas occurred, if we divide the area of region 2 by the sum of the areas of regions 2 and 3, we will get the (conditional) probability that the event A has oc- ” curred, knowing that B has occurred. Simply put, the conditional probability is the fraction ofthe time that event A occurs when event B has occurred. Example2.11, Inan introductory accounting class there are 500 students, of which 300 are ales and 200 are females. Of these, 100 males and 60 females plan to major in actouriting. A student is selected at random from this class, and itis found that this student plans to be an accounting majo What i the probability that the student isa male? Let A denote the event that the student is a male and B that the student is an accounting major. Therefore, we want fo find out P(4| B). From the formula of conditional probability just given, this probability can be obtained as ec) ?@) _ 100/500 ~ 60/500 = 0.625 P(A\B) From the data given previously, it can be readily seen that P(A) = 200/500 0.6; that is, the unconditional probability of selecting a male student is 0.6, which is different from the preceding probability 0.625. This example brings out an important point, namely, conditional and unconditional probabilities in general are diferent. However, if the two events are independent, then we can see from Bq. (27) that P(AB) _ P(APB) _ P(A|B) = 7H 7 Pe) P(A) (2.9) Note that: P(AB) = P(A) P(B) when the two events are independent, as noted earlier, in this case, the conditional probability of A given B is the same as the unconditional probability of A. In this case it does not matter if B occurs or not. ‘An interesting application of conditional probability is contained in the fa- ‘mous Bayes’ Theorem, which was originally propounded by Thornas Bayes, « nonconformist minister in Tumbridge Wells, England (17011761). This theorem, published after Baves! death. led tn the en-rallod Ravacian chant af a32 PARTONE: BASICS OF PROBABILITY AND STATISTICS Statistics, a.rival to the school of classical statistics, which still predominates statistics teaching in most universities in the world. The knowledge that anevent B has occurred can be used to revise or update the probability that an event A has occurred! is the essence of Bayes’ Theorem. Toexplain this theorem, let A and B be two events, each with a positive probability. Bayes’ Theorem then states that: PCBLAPLA) PRIAPA+ PCPA) oe where A’, called the complement of A, means the event A does not occur, In words, Bayes’ Theorem shows how conditional probabilities of the form P(BIA) may be combined with initial probability of A (ie., P(A)) to obtain the final probability. P(A|B). Notice how the roles of conditioning event (6) and ‘outcome event (A) have been interchanged. The following example will show how Bayes’ Theorem works, P(AIB) = Example 2.12, Bayes’ Theorem Suppose a woman has two coins in her handbag. One is a fair coin and one is two-headed. She takes a coin at random from her handbag and tosses it. Suppose a head shows up. What is the probability that the coin she tossed was two-headed? Let A be the event that the coin is two-headed and A’ the event that the coin is fair. The probability of selecting either of these coins is P(A) = P(A’) = 1/2. Let B be the event that a Head tums up. If the coin has two heads, B certain to occur, Hence, (BJA) = 1. But if the coin is fair, P(B|A) = 0.5. ‘Therefore by Bayes’ theorem we obtain P(BIA)P(A) _ (1)(05) _ 2 x PCAIB) =~" pipy = 075 5 ~ 056 Notice the particular feature of this theorem. Before the coin was tossed, the probability of selecting a regular or two-headed coin was the same, namely, 0.5. But knowing that a head was actually observed, the probability that the coin selected was two-headed, is revised upwards to about 0.66. In Bayesian language, P(A) is called the prior probability (i., before the fact or evidence) and P(A|B) is called the revised or posterior probability (after the fact or evidence). The knowledge that B has occurred leads us to reassess or revise the (prior) probability assigned to A. 5 the sample space is partitioned into A (event A occurs) and Al (event A does not occu), then forany event its true that PB) = PBA) + PGA); that isthe probability that B occurs isthe sum of the common outcomes between B and each partition of A. This result can be generalized if A is partitioned ito several segments. This can be seen easily from & Vern diagram.CHAPTER TWO: REVIEW OF STATISTICS |; PROBABILITY AND PROBABILITY DISTRIBUTIONS 33, ‘You might see this finding as somewhat puzzling, Intuitively you will think that this probability should be 1/2. But look at the problem this way. There are three ways in which heads can come up, and in two of these cases, the hidden face will also be heads. * Notice another interesting feature of the theorem. In classical statistics we assume that the coin is fair when we bed it, In Bayesian statistics we question that premise or hypothesis. Probabliity of Random Varlables _ Justas weassigned probabilities to sample outcomes or events ofa sample space, ‘we can assign probabilities to random variables, foras we saw, random variables are simply numerical representations of the outcomes of the sample space, as shown in Example 25. In this textbook we are largely concerned with random variables such as GDP, money supply, prices, and wages, and we should know how to assign probabilities to random variables. Technically, we need to study the probability distributions of random variables, a topic we will now discuss. 2.5 RANDOM. VARIABLES AND THEIR PROBABILITY DISTRIBUTIONS By probability distribution of a random variable we mean the possible values taken by that variable and the probabilities of occurrence of those values. To understand this clearly, we first consider the probability distribution of a discrete tv, and then we consider the probability distribution of a continuous tv, for there are differences between the two. Probability Distribution of a Discrete Random Variable As noted before, a discrete rv. takes only.a finite (or countably infinite) number of values. Let X bean ry. with distinct values of x, .... The function f defined by f0 =x) = PO=2) Dover en =0ifr ex is called the probability mass function (PME) or simply the probability function (PF), where P(X = 3)) means that the probability that the discrete rv. X takes the numerical value of 2,7 Note these properties of the PME given in (2.11): Os fa) s1 (2.12) Df) =1 (2.13) ‘The values taken by a discrete rv. are often calluess points and f(X= x) denotes the mass associated with the nas point ee‘34 PART ONE: ” BASICS OF PROBABILITY AND STATISTICS FIGURE 2.2 where the summation extends over al the values of X. Notice the similarities of these properties with the properties of probabilities discussed earlier To see what this means, consider the following example. Example 2.13. Let the xv. X represent the number of heads obtained in two tosses of a coin. Now consider the following table: Number ofhesds PF Xx Ax) 0 14 1 ve 2 14 sum 41.00 In this example the rv. X (the number of heads) takes three different values— X=0,1, or 2. The probability that X takes a value of zero (i.e, no heads are obtained in a toss of two coins) is 1/4, for of the four possible outcomes of throwing two coins (ie,, the sample space), only 1 is favorable to the outcome TT: Likewise, of the four possible outcomes, only one is favorable to the out- ‘come of two heads; hence, its probability is also 1/4. On the other hand, two outcomes, HT and TH, are favorable to the outcome of one head; hence, its probability is 2/4 =1/2. Notice that in assigning these probabilities we have used the classical definition of probability, Geometrically, the PMF of this example is as shown in Figure 2-2. fy Probability Density 14! | | x 0 1 2 Number of Heads ‘The probability mass function (PMF) of the number of head into tosses ofa coin4Evample 2.19).(CHAPTER TWO: REVIEW OF STATISTICS PROBABILITY AND PROBABILITY DISTRIBUTIONS 35 Probability Distsibutlon of a Continuous Random Variable The probability distribution of a continuous random variable is conceptually similar to the probability distribution of a discrete rv. except that we now measure the probability of such an ry. over a certain range or interval. Instead of calling ita PME, we call ita probability density function (PDF). An example will make the distinction between the two clear. Let X represent the continuous .v. height, measured in inches, and suppose ‘we want to find out the probability that the height of an individual lies in the interval, say, 60 to 68 inches. Further, suppose that the rv. height has the PDF as shown in Figure 2-3. The probability that the height of an individual lies in the interval 60 to 68 inches is given by the shaded area lying between’60 and 68 marked on the curve in Figure 2-3. (How this probability is actually measured is shown in Chapter 4,)'n passing, note that since a continuous rv, can take an uncountably infinite number of values, the probability that such an tv. takes a particular numerical value (e.g , 63 inches) is always Zero; the probability for a continuous 1.0. is aloays measured over an interonl, say, between 62.5 and 63.5 inches, ‘More formally, for a continuous rv. X the probability density function (PDF), {(%), is such that 2 P(y ) used for taking the sun of the values ofa discrete random variable and where dr stands for a small interval of x values. APDF has the following properties: 1. The total area under the curve f(t) given in (2.14) is 1, 2. Pry 21, oo dates P Coase g a z 2 i oO 6 8 a Height in Inches FIGURE 2-3 The PDF of a continuous random variable,86 PART ONE: BASICS OF PROBABILITY AND STATISTICS 3, since the probability that a continuous rv, takes a particular value is zero because probabilities for such a variable are measured over an area ot interval, the left hand side of (2.14) can be expressed in any of these forms: Ply $ Xm) = P(y 0 forall values inthe range 0103. Cumulative Distribution Function (CDF) Associated with the PMF or PDF of an rv, X is its cumulative distribution function (CDP), F(X), which is defined as follows: F(X)= P(X <2) (2.15) where P(X < x) means the probability that the ny, X takes a value of less than or equal to x, where x is given. (Of course, for a continuous rv, the probability that such an rv.-takes the exact value of x is zero.) Thus, P(X < 2) means the probability that the rv. X takes-a value less than or equal to 2. The following. properties of CDF should be noted: 1, F(~oc) =0 and F(oo) = 1, where F(—co) and F(oo) are the limits of F(x) as x tends to -00 and oo, respectively. 2. F(x) is a nondecreasing function such that if x» > x; then F(a) > F(x).CHAPTER TWO: REVIEW OF STATISTICS I: PROBABILITY AND PROBABILITY DISTRIBUTIONS 57 3, P(x k) = 1— F(Q); thatis, the probability that X assumes a value equal toor greater than kis 1 minus the probability that X takes a value below k. 4. Ply < X fla) (2.16) where 3° f(X) means the sum of the PMF for values of X less than or equal to the specified x, as shown in the preceding table. Thus, in this example the probability that X takes the value of less than 2 (heads) is 5/16, but the probability that it takes a value of les than 3is 11/16, Of course, the probability that it takes a value of 4 or less than four heads is 1. (Why?) Geometrically, the CDF of Example 2.15 looks like Figure 24. Since we are dealing with a discrete rv. in Example 2.15, its CDFis a discontinuous function, known asa step function. If we were dealing with the CDF of a continuous 1¥,, its CDE would be a continuous curve, as shown in Figure 2:58 Example 2.16. Referring to Example 2.15, whatis the probability that X lies between 2 and 3? Here we want tofind F(X =3) — F(X= 2), From the table givenin Example2.35, "Tf. xis continuous with a PDF of f(s), then F(x) = i f(x) dx and f(x) = F'(x), where F’(3) is the derivative of F(x) = £F(X). 3788 PART ONE: BASICS OF PROBABILITY AND STATISTIOS Number of Heads FIGURE 2-4 The cumulative distribution function (CDF) of & discrete random variable (Example 2.15). FIGURE 2-5 The CDF of ¢ continuous random variable. we find that F(X-< 3) = 15/16 and that F(X.< 2) = 11/16. Therefore, the probability that X lies between 2 and 3is 4/16 = 1/4. 2.6 MULTIV::i/ATE PROBABILITY DENSITY FUNCTIONS So far wehave been concerned with single variable, or univariate, probability distribution functions. Thus, the PMPs of Examples 2.5 and 2.13 are univariate PDBs, for we considered single random variables, such as the number of heads in a toss of two coins or the number of heads in a toss of four coins. However, we do not need to be so restrictive, for the outcomes of an experiment could be‘CHAPTER TWO: REVIEW OF STATISTICS PROBABILITY AND PROBABILITY DISTAIBHONS. 39 TABLE 22 THE FREQUENCY DISTRIBUTION OF TWO RANDOM VARIABLES: NUMBER OF PCS SOLD (X) AND NUMBER OF PRINTERS SOLD (Y). Number of POs Sold (X) 0 1 2 8 4 Toll ‘Number of Printers Sold (Y) 0 6 6 4 4 2. 1 4° 0 12 4 2 cra -2 2 4 0 0 1 4 3 2 2 10 2 2 5 4 2 2 2-08 46 Total 16 24 «48 48 64 200 described by more than one ry, in which case we would like to find their probability distributions. Such probability distributions are called multivariate probability distributions. The simplest of these is the bivariate, or a two-variable PMF or PDF. Let us illustrate with an example? Example 2.17, A retail computer store sells personal-computers (PCs) as well as printers. The number of computers and printers sold varies from day to day, but the store manager obtained the sales history over the past 200 days in the form of Table 2-2. In this example we have two random variables, X (number of PCs sold) and. Y (number of printers sold). The table shows that of the 200 days, there were 30 days when the store sold 4PCs and 4 printers, but on 2 days, althoughit sold 4 PCs,.it sold no printers, Other entries in the table are to be interpreted similarly. Table 2-2 provides an example of what is known as a joint frequency distribution, it gives the number of times a combination of values is taken by the two variables. Thus, in our example the number of times 4 PCs and 4 printers, were sold together is 30. Such a number is called an absolute frequency. All the numbers shown in Table 2-2 are thus absolute frequencies, Dividing the absolute frequencies given in.the precéding table by 200, we obtain the relative frequencies, which are shown in Table 2-3. Since our sample is reasonably large, we can treat these (joint) relative fre- ‘quencies as measures of joint probabilities, as per the frequency interpretation of probabilities. Thus, the probability that the store sells three PCs and three printers is 0.10, ot about 10 percent. Other entries in the table are to be interpreted sisnilarly ‘This example s adapted from Rol. Mitethammer, Mathai! Suiits fr Eooomics and Business, Springer, New York, 1995, p.107.40. PARTONE: . BASICS OF PROBABILITY AND STATISTICS: TABLE2-9 . THE BIVARIATE PROBABILITY DISTRIBUTION OF NUMBER OF PCS SOLD (X) AND NUMBER OF PRINTERS SOLD (Y). ‘Number of PCs Sold (x) 3 Total A(Y) Number of Printers Sold (y) 0, 0.03 0.02 0.01 oat 1 0.02. 0.05 0.02 0.01 0.16 2 01 002 0.05 = 0.05 023 3 oi 0.01 0.10 - - 0.10 027 4 001 ~ 001 005 0.15 023 Total £(X) 0. 024 024 00 Because the two variables are discrete, Table 2-3 provides an example of what is known as bivariate or joint probability mass function (PME). More formally let Xand Y be two discrete random variables, Then the function f(¥Y) = P(X=aandY=y) = Owhen Xx and ¢ y a1 is known as the joint PME. This gives the joint probability that X takes the value of x and Y takes the value of y simultaneously, where x and y are some specific values of the two variables. Notice the following properties of the joint PMF: 1. (X,Y) 2 0 for all pairs of X and Y. This is so because all probabilities ate nonnegative. 2 DEEN vihich follows fom theft thatthe sum of the probabilities associated with all joint outcomies must equal 1. Note that we have used the double summation sign because wé are riow dealing with two variables. If we were to deal with a three variable joint PME, we would be using the triple summation sign, andso on. The joint probability of two continuous random variables (i, joint PDF) can be defined analogously, although the mathematical expressions are somewhat involved and ate given by way of exercises for the benefit of the more mathe- ‘matically inclined reader. The discussion of joint PMF or joint PDF leads to a discussion of some related concepts, which we will now discuss. Marginal Probabllity Functions We have studied the univariate PBs, such as {00 or f(V), and the bivariate, or joint, PF/(X, Y) Is there any relationship between the two? Yes, there is, In relation to f(X, ¥), f() and f(X) are called univariate, unconditional, individual, or marginal PMFs or PDFs. More technically, the probability that XCHAPTER TWO: REVIEW OF STATISTICS |: PROBABILITY AND PROBABILITY DISTRIBUTIONS 41 ‘TABLE 24 MARGINAL PROBABILITY DISTRIBUTIONS OF X (NUMBER OF PCS SOLD) AND ¥ (NUMBER OF PRINTERS SOLD). Value of X 1%) Value of Y a) o 0.08 0 On 1 012 1 0.16 2 0.24 2 023 3 0.24 3 027 4 03g 4 023 sum 1,00 1.00 assumes a given value (eg, 2) regandless of the wllues taken by Y is called the marginal probability of X, and the distribution of these: probabilities is called the ‘marginal PMF of X. How do we compitte these marginal PMEs or PDFs? That is easy In Table 2-3 we see from the column totals that the probability that X takes the value of 1 regardless of the values taken by Y is 0.12; the probability that it takes the value of 2 regardless of Y’s value is 0.24, and so on. Therefore, the marginal PME of X is as that shown in Table 2-4. Also Table 2-4 shows the marginal PMF of Y, which cait be derived similarly. Note that the sum of each of the PFs, f&X) and fis 1. (Why?) You will easily note that to obtain the marginal probabilities of X, wesum the joint probabilities corresponding to the given value of X regardless of the values taken by Y. That is, we sum down the columns, Likewise, to obtain the marginal probabilities of Y, we eum the joint probabilities corresponding to the given value of Y regardless ofthe Values taken by X. Thats, we sum across the rows. Once such marginal probabilities are computed, finding the marginal PMFs isstraight- forward, as we just showed. More formally, iff(X, Y) is the joint PMF of random. variables X and Y, then the marginal PFs of X and Y are obtained as follows: "FOL fore” Qa) y and, AML IG forall 2.19) 5 If the two variables are continuous, we will replace the summation symbol ‘with the integral symbol. For example, if F(X, Y) represents a joint PDF; to find the marginal PDF of X, we will integrate the joint PDF with respect to Y values, aid to find to marginal PDF of Y, we will integrate it with respect to the X values (see Exercise 2.19). Conditional Probability Functions Continuing with Example 2.17, et usnow suppose we want fo find the probability that 4 printers were sold, knowing that 4 PCs were sold. In other words, what is the probability that Y = 4, conditional upon the fact that X = 4? This is42. PARTONE: BASICS OF PROBABILITY AND STATISTICS, ey known as éonditional probability (recall our earlier discussion of conditional probability of an event). This probability can be obtained from the conditional probability mass function defined as f(Y[X) = PY = y|X=2) (2.20) where f(Y |X) stands for the conditional PMF of Y; it gives the probability that Y takes orl the value of y (number of printers sold) conditional on the knowledge that X has assumed the value of x (number of PCs sold). Similarly, SKIN) = P(X=21¥=y) 21) gives the conditional PMF of X. Note thatthe preceding two conditional PFs are for two discrete random variables, Y and X. Hence, they may be called discrete. conditional PMFs, Conditional PDFs for continuous random variables can be defined analogously, although the mathematical formulas are slightly involved (see Exercise 2-19). One simple method of computing the conditional PF is as follows: {%N ‘Y | X)= foin= A _ joint probability of Xand Y = atginal probability of X soci =fA fM) (228) _ joint probability of Xand ¥ “marginal probability of Y In words, the conditional PMF of one variable, given the oalue ofthe other varia, is simply he ratio ofthe joint probability ofthe fco variables divided by the marginal or unconditional PF of the ther (.e, the conditioning) variable, (Compare this withthe conditional probability of an event A, given that event B has happened, ie., P(A B)) Returning to our example, we want to find out f(Y = 41X = 4), which is f= 4and X=4) fX=4 15 aigp (tom Table 23) (2.22) f's4/X=4)= (2.24) =8047 From Table 2-3 we observe thet the marginal, or unconditional, probability that Y takes a value of 4is 0.23, but knowing that 4 PCs were sold, the probability that 4 printers will be sold increases to * 0.47. Notice how the knowledge‘CHAPTER TWO: REVIEW OF STATISTICS I: PROBABILITY AND PROBABILITY DISTRIBUTIONS 43, about the other event, the conditioning event, changes our assessment of the probabilities, This isin the spirit of Bayesian statistics, In regression analysis, as we will show in Chapter 6, we are interested in studying the behavior of one variable, say, stock prices, conditional upon the knowledge of another variable, say, interest rates. Or, we may be interested in studying the female fertility rate, knowing a woman's level of education, Therefore, the knowledge of conditional PMFs ot PDFs is very important for the development of regression analysis. Statistical independence. : TABLES Another concept that is vital for the study of regression analysis is the concept of independent random variables, which is related to the concept of independence of events discussed earliet. We explain this with sn example. Example 2.18. Abag contains three balls numbered 1, 2, and 3, respectively. Two balls are drawn at random, with replacement, from the bag (ie,, every time a ball is drawn itis put back before another is drawn). Let the variable X denote the nurnber on the first ball drawn and Y the number on the secortd ball. Table 2-5, gives the joint as. well as the marginal PMEs of the two variables. ‘Now consider the probabilities f(X = 1, Y= 1), f(X= 1), and f(Y = 1). As Table 2-5, shows, these probabilities are 1/9, 1/3, and 1/3, respectively. Now the first of these is a joint probability, whereas the last two are marginal probabilities. However, the joint probability in this case is equal to the product of the two marginal probabilities. When this happens, we say.that the two variables are statistically independent, that is, the value taken by one variable has no effect on the value taken by the other variable. More formally fwo variables X and Y are statistically independent if and only if their joint PME or PDF can be expresséd as the ‘product oftheir individual, or marginal, PMFs or PDEs forall combinations of X and Y onlues. Symbolically, FRY) = FOO FH (2.25) ‘STATISTICAL INDEPENDENCE OF TWO RANDOM VARIABLES. 1 1 18 1 39 Yo Bs 4 9 oe 88 3 oy 78 e88 wm. 3999/8 144 PARTONE: BASICS OF PROBABILITY AND STATISTICS Youcan easily verify that for any other combination of X and Y values given in, Table 25 the joint PF is the product of the respective marginal PFs; that is, the two vatiables are statistically independent. Bearin mind ehat Equation (2.25) must be true forall combinations of X and Y onlues. Example 2.19. j? 2.5. What is the difference between a PDF and a PMF? 2.6. What is the difference between the CDFs of continuous and discrete random variables? 2.7. By the conditional probability formula, we have P(AB) 1 PAB) = 76), and = PAB) 2 2, PBIA= PA) ~» P(AB) = PB|A)P(A) ‘where ~» means “implies.” If you substitute for F(AB) from the right-hand side of Q) into the numerator of (1), what do you get? How do you interpret this result? PROBLEMS 28, What do the following stand for? ‘ 4 aye eo di+9 a a ‘ 3 bi Say,aisaconstant £3! im & 2 fi . D(2a +34) & 2 a & 32 3 apea h Stet—3 29, Express the following in the "notation: at tay tot Ku t% Dy yt 2p +335 + 44 +535 (t+ H +t att +R) ta oyTABLE 26 TABLET CHAPTER TWO: REVIEW OF STATISTICS PROBABILITY AND PROBABILITY DISTRIBUTIONS 47 2.10, Itcan be shown that the sum ofthe frst n positive numbers is » ke a +1) Uae the pig formula fo evaluate a 5 rob Qa koe 2 3k 2A. It canbe proved thatthe samet ‘squares of the first n positive numbers is . Ba n(n+1)(2n+1) ecm Using this formu, obtain » 0 aye’ Ee eYR aye fi et fe aa. Ann Xhas the following PDE: x 1), 0 b 1 2b 2 3b 3 4b 4 8b a. What is the value of bt Why? b, Find the P(X < 2); prob(X <3); prob@2 < X <3), 2.18, The following table gives the joint probablty distribution, f(X, 1), of two random variables X and ¥. x Yy 1 2 a 1 0.03 0.08 0.08, 2 0.02 0.04 0.04 3 0.09 0.18 0.18 4 006 O42 0.12 a. Find the marginal (Le, unconditional) distributions of X and Y, samely, ‘f(X) and (1). b, Find the conditional PDR f(XIY) and f(Y IX). 2.14. OF100 people, 50 are Derocrats,40/are Republicans, and 10 are Independents. ‘The percentages of the people in these three categories who read the Wall Street Journal are known to be 30, 60, and 40 percent, respectively. If one of these peopleis observed reading the Journal, what is the probability that he or she is a Republican? 2.45. Let A denote the event that a person lives in New York City. Let P(A) = 0.5 Let B denote the event that the person does not live in New York City but 748 PARTONE: BASICS OF PROBABILITY AND STATISTICS TABLE 2-8 TABLE 2-9 works in the city. Let P(B) = 0.4. What s the probability that the person either lives in the city orhe does not live in the city but works there? 2.16. Based on a random sample of 500 married women, the following table gives the joint PMF of their work status in relation to the presence or absence of chil- ! dren in the household.” CCHILDREN AND WORK STATUS OF WOMEN IN THE UNITED STATES. Works outside Das not work home outside home Toa Hes children 02 03 0s Does rot have children 04 o4 08 Total 06 04 10 a, Are children and working outside of the home mutually exclusive? b. Are working outside of the home and presence of children independent events? 217. The following table gives the joint probability of X and Y, where X represents a person's poverty status (below ot above the poverty line as defined by the US. government), and Y represents the person’s race (White, blacks only, and + all Hispanic). POVERTY IN THE UNITED STATES, 2002. x y Below poverty ing Above poverty fine White 0.08 067 Black 0.93 0.09 : Hispanic 0.08, 0.410 Source: These data ae dated from the U.S. Census Bureau, Curent Population Reports, Poverty inthe United States: 2002 Sopiomber 2003, Tele 1. Afough the poverty ne varies by several socloeconomic characteristic, fra family of four in 2002, tho viding Hine was about $18,392. Fares below this income lovel can be-classtiod as poor, a. Compute f(X|Y = white); f(X1Y = black) and f(X|Y = Hispanic), where | X represents below the poverty line, What general conélusions can you draw from these computations? b. Are tace and poverty status independent variables? How do you know? "Adapted from Banry R. Chiswick and Stephen J. Chiswick Staite and Economics: A | Problem Solving Approach, University Park Press, Baltimore, 1975.‘CHAPTER TWO: REVIEW OF STATISTICS I PROBABILITY AND PROBABILITY DISTRIBUTIONS 49 *2.18, The PDF of a continuous random variable X is as follows: f(X) =c(tx-22) 05x52 = Ootherwise a. For this to be a proper density friction, what must be the value of c? b, Find P(l 2) *2,19. Consider the following joint PDF: feon=Zxe—x—y} O 05) and P(Y <05) '. What is the conditional density of X given that Y = y, where 0 1.58 PART ONE: BASICS OF PROBABILITY AND STANSTIOS In words this inequality states that at least the fraction (I~ 4) of the total probability of X lies within ¢ standard deviations ofits mean or expected value. Put differently, the probability that a random variable deviates from its mean value by more than c standard deviations is less than or at the most equal to 1/c”, What is remarkable about this inequality is that we do not need to know the actual PDF or PMF of a random variable. Of course, if we know the actual PDF or PMP, probabilities such as Eq. (9.20) can be computed easily, a8 we will show when we consider some specific probability distributions in Chapter 4. Example 3.5. Illustration of Chebyshiev’s Inequality - The average number of donuts sold in a donut shop between 8 am. and 9am. is 100 with a variance of 25. What is the probability that on a given day the number of donuts sold between § a.m. and 9 a.m. is between 90 and 110? By Chebyshev’s inequality, we have: 1 PIX ~ wal Sco] =1> 5 PU —100| <5q= 1-4 G21) Since (110 — 100} = (100 — 90) = 10, ‘we see that 5c = 10. Therefore, c = 2. It therefore follows that (1-3) 0.75. That is, the probability that between’90.and 110 donuts are oh between 8 am. and 9 aim. is at least 73 percent. By the same token, the probability that the number of donuts sold between 8 am. and 9 a.m, exceeds 110 oris less than 90 is 25'percent. Coefficient of Variation Before moving on, note that since the standard deviation (or variance) depends on the units of measurement, it may’ be difficult to compare two or more stan dard deviations if they ate expressed in different units of measurement. To get around this difficulty, use the coefficient of variation (V), a treasure of relative variation, which is defined as follows: v= 2%. 199 6.22) Px Verbally, the V is the ratio of the standard deviation of a random variable X to its mean value multiplied by 100. Since the standard deviation and the mean value of a random variable are measured in the same units of measurement, V is unitless; thatis, it is a pure number. We can therefore compare the V values of two or more random variables directly. Example 3.6, ‘An instructor teaches two sections of an introductory econometrics class with 15 students in each class. On the midterm examination, class A scoredCHAPTER THREE: CHARACTERISTICS OF PROBABILITY DleTRIBUTIONS 59 an average of 83 points with a standard deviation of 10, and class B scored an. average 88 points with a standard deviation of 16, Which class performed relatively better? If we use V as defined in Eq, (3.22), we get: 10 8 Since the relative variability of class A is lower, we can say that class A did relatively better than class B. Vas = 100 = 12.048 and Va= = +100 = 18.181 3.3 COVARIANCE The expected value and valance aré the two most frequently used summary ‘measures of a univariate PMF (or PDF). The former gives us the center of gravity, and the latter tells us how the individual values are distributed around the center of gravity. But once we go beyond the univariate probability distributions (eg,, the PMF of Example 3.2), we need to consider, in addition to the mean and variance of each variable, some additional characteristics of multivariate PFs, such as the covariance and correlation, which we will now discuss. Let X and Y be two random variables with means E(X) = wy and E(Y) = Hy. Then the covariance (cov) between the two variables is defined as cov(K 1) = ELK —ms)(Y - Hy)] = EQY) — pitty (3.23) ‘As Equation (3.23) shows, a covariance is a special kind of expected value and is a measure of how two variables vary or move together (ie,, co-vary), a8 ._ shown in Example 3.7, which follows. In words, Eq. (3.23) states that to find the covariance between two- variables, we must expréss the value of each variable as a deviation from its mean value and take the expected value of the product, How this is done in practice follows, ‘The covariance between two random vaniables can be positive, negative, or Zero, Iftwo random variables move in the same direction (Le, if they both increase) as in Example 3.7 below, then the covariance will be positive, whereas if they move in the opposite direction (Le, if one increases and the other decreases), the covariance will be negative. If, however, the covariance between the two variables is zero, it means that there is no (linear) relationship between the two variables. ‘To compute the covariance as defined in Eq, (3.23), we use the following formula, assuming X and Y ate disorete random variables: cov KY) = PY Kwa} wy) WN) ry = LYNG -wny (3.24) my = E(XY) — pxny where E(XY) is computed from Eq, (3.10),60 PART ONE: BASICS OF PROBABILITY AND STATISTICS Note the double summation sign in this expression because the covariance requires the summation of both variables over the range of their values, Using the integral notation of calculus, a similar formula can be devised to compute the covariance of two continuous random variables, Example 3.7. Once again, retum to our PC/ptinter sales example. To find out the covariance betvreen computer sales (X) and printer sales (1), we use formula (3.24). We have already computed the first term on the right-hand side of this equation in Example 33, whichis 7.06, We have already found that p = 2.60 and ty = 235, Therefore, the covariance in this example is cov (%, ¥) = 7.06 = (2.60)(2.35) = 0.95 which shows thiat PC sales and printer sales are positively related. Properties of Covariance ‘The covatiance as defined earlier has the following properties, which we will find quite useful in regression analysis inlater chapters. 41, IfXand Y are independent random variables, their covariance is zero. This is easy to verify, Recall that if two random variables are independent, EQ) = EQQEO) = pty Substituting this expression into Eq, (3.23), we see at once that the covariance of two independ random variables zero. 2 cov (a +X,¢+dY) = bd cov (% Y) (3.25) where a,b ¢,and dare constants, ; 3. cov (X, X) = var(X) (3.26) That is; the covatiance’of a variable with itself is simply its variance, which can be verified from the definitions of variance and covariance given previously: Obviously, then, cov (Y, Y) = var(Y). 4. IfXand Y are two random Variables but are not necessarily independent, then the variance formulas given in Eg. (313) need to be mouiified as » follows: var (K+) = var (X) +var(¥) +2cov(, Y) 627 var (X~ 1) = var(X) +-var(V) -2cov(X%Y) (6.28) Of course, if the two variables are independent, formulas (3.27) and (3.28) will, coincide with Bq. @.13).CHAPTER THREE: CHARACTERISTICS OF PROBABILITY DISTRIBUTIONS 61 3.4 CORRELATION COEFFICIENT In the PC/printer sales example just considered we found that the covariance between PC sales and computer sales was 0.95, which suggests that the two vatiables are positively related, But the computed number of 0.95 does not give any idea of how strongly the two variables are positively related. We can find out how strongly any two variables are related in terms of what is known as the (population) coefficient of correlation, which is defined as follows: cov 1) 629) Sy where p (tho) denotes the coefficient of correlation. As is clear from Equation (3.29), the correlation between two random variables X and Y is siniply the ratio of the covariance between the two variables divided by their respective standard deviations. The correlation coefficient thus defined is a measure of linear association between two variables, that is, how strongly the two variables are linearly related. p= Properties of Correlation Coefficient ‘The correlation coefficient just defined has the following properties: 1, Like the covariance, the correlation coefficient can be positive ot negative. It is positive if the covariance is positive and negative if the covariance is negative. Jn short, it has the same sign as the covariance. 2, The correlation coefficient is a measure of linear relationship between two. variables. 3. The correlation coefficient always lies between —1 and +1. Symbolically, -ispsi (3.30) If the correlation coefficient is +1, it means that the two variables are ‘perfectly positively linearly related (as in Y = By + ByX), whereas if the correlation coefficient is —1, it means they are perfectly negatively lin- carly related, Typically, p lies between these limits. 4, The correlation coefficient is a pure number; that is, itis devoid of units of ‘measurement. On the other hand, other characteristics of probability distributions, such as the expected value, variance, and covariance, depend ‘on the units in which the original variables are measured. 5. If two variables are GGtatistically) independent, their covariance is zero. ‘Therefore, the cotrelation coefficient will be zero. The converse, however, is ‘not true, Thatis,if the correlation coefficient between two variablesis zero, itdoesnotmear that the two variables are independent. This is because the cottelation coefficient is a measure of lirtear association or linear relationship between two variables, as noted previously. Rot example, if Y = X%, the correlation betiveen the two variables may be zero, butbynomeansare the ‘two variables independent. Here Y is a nonlinear function of X. a62 PART ONE: BASICS OF PROBABILITY AND STATISTICS y y peels “ x x x @ ® o Y Y ‘ y - positive but pose fo Fosgraas x x x @ © 9 FIGURE 3-3. Some typical pattems of the correlation coefficient, p. 6. Correlation does not necessatily imply causality. If one finds a positive correlation between lung cancet and smoking, it does not necessarily mean that smoking causes lung cancer. Figure 3-3 gives some typical patterns of correlation coefficients. Example 38, Let us continue with the PC/printer sales example. We have already seen that the covariance between the two variables i 0.95. From the data given in(CHAPTER THREE: " CHARACTERISTICS OF PROBABILITY DISTRIBUTIONS 63 ‘Table 2-4, we can easily verify that = 1,2649 and oy formula (3.29), we obtain A124, Then, using 0.95 °= Tapaaaiag = 03!” Thus, the two variables are positively correlated, although the value of the correlation coefficient i rather moderate. Tis probably isnot surprising, for not everyone purchasing a PC buys a printer. ‘The use of the correlation coefficient in the regression context is discussed in Chapter 7, Incidentally, Eq. (3.29) can also be written as: Cov%N) = poxey 31) That is, the covariance between two variables is equal to the coefficient of correlation between the two times the product of the standard deviations of the two. Variances of Correlated Variables In Bq, (3.27) and Bq, (328) we gave formulas for the variance of variables that are not necessarily independent. Knowing the relationship between covariance and correlation, we can express these formulas alternatively as follows: var(X+Y) = ax (X) + var(Y)+2pozoy _ (3.32) var (X — Y) = var(X) + var(¥) ~ 2poxay (3.33) Of course, if the correlation between two random variables is zero, then var (X + ¥) and var(X— ¥) are both equal to var (X) + var(Y), as we saw before. Asanexercise, you can find the variance of (X + Y) of our PC/ printer example. A 3.5 CONDITIONAL EXPECTATION Another statistical concept that is especially important in regression analysis is the concept of conditional expectation, which is different from the expectation of an rv. considered previously, which may be called the unconditional expectation, The difference between the two concepts of expectations canbe explained. as follows. Return to our PC/printer sales example. In this example X is the number of _ PCs gold per day (ranging from 0 to 4) and Y’is the number or printers sold per day (ranging from 0 to4), We have seen that E(X) = 2.6 and E(Y”) = 2.35. These are wiconditional expected values, for in computing these values we have not put any restrictions on them,64 PART ONE: BASIOS OF PROBABILITY AND STATISTICS 84 But now consider this question: What is the average number of printers'sold (1) fits known that on a particular day 3 PCs were sold? Put differently, what is the conditional expectation of Y given that X=3? Technically, what is E(Y |X =3)? This is known as the conditional expectation of ¥. Similarly, we could ask; What is E(Y X= 1)? From the preceding discussion it should be clear that in computing the unconditional expectation of anv, we donot take into account information about any other rv,, whereas in computing the conditional expectation we do. ‘To compute such conditional expectations, we use the following definition of conditional expectation EKIY=9) =) )X/KlY=9) (3.34) ¥ Which gives the conditional expectation of X, where X is a discrete xv, £(X1Y = y) is the conditional PDF of X given in Bq, (2.20), and Jy means the sum over all values of X. In relation to Equation (3.34), EX), considered earlier, is called the unconditional expectation. Computationally, E(X| Y = y) is similar to E(X) except that instead of using the unconditional PDF of X, we use its conditional PDE. ai seen clearly in comparing Bq, (3.34) with Eq, (31). Similarly, BY(X=x)= SOYf(VIX=%) 6.35) T gives the conditional expectation of ¥. Let us lustrate with an example. Example 3.9. Let us compute E(Y |X = 2) for our PC/printer sales example. That is, we want to find out the conditional expected value of printers sold, knowing that 2 PCs have been sold per day. Using formula (3.34), we have 4 =D Yf(Y|252) 7 = f(VS1/X=)42f(¥=21X=2) +3f(Y=3|X=2)+4f(Y=4|X=2) = 1875 Note: f(Y = 1| X=2) = MrAt=A), and so on (see Table 2-3). As these calculations show, the conditional expectation of Y given that X=2,is about 1.88, whereas, as shown previously, the unconditional expectation of Y was 2.35. Just as we saw previously that the conditional PDFs and. marginal PDFs are generally different, the conditional and unconditional expectations in general are different too. Of course, if the two variables are E(Y[X=CHAPTER THREE: CHARACTERISTICS OF PROBABILITY DISTRIBUTIONS 65 independent, the conditional and unconditional expectations will be the same. (Why?) Conditional Variance Judt as we can compute the conditional expectation of a random variable, we can also compute its conditional variance, var(Y |X). For Example 39, for instance, we may be interested in finding the variance of Y, given that X= 2, var (Y| X = 2), We can use formula (3.11) for the variance of X, except that we now have to use the conditional expectation of Y and the conditional PDR. To see this is actually done, see Exercise 3.23, Incidentally, the variance formula given in (3.11) may be called the unconditional vatiance of X. Just as conditional and unconditional expectations of an rv., in general, are different, the conditional and unconditional variances, in general, are different also. They will be the same, however, if the two variables are independent. As we will see in Chapter 6 and in subsequent chapters, the concepts of conditional expectation and conditional variance will play an important role in econometrics. Retuming to the civilian labor force participation rate (CLFPR) and the civilian unemployment rate (CUNR) example discussed in Chapter 1, will the unconditional expectation of CLFPR be the same as the conditional expectation of CLFPR, conditioned on the knowledge of CUNR? Ifthey are the same, then, the knowledge of CUNR is not particularly helpful in predicting CLEPR. In such a situation, regression analysis is not very useful. On the other hand, if the knowledge of CUNR enables us to forecast CLFPR better than without that knowledge, regression analysis will be a very valuable research tool, as wwe will show in subsequent chapters. 38 SKEWNESS AND KURTOSIS To conclude our discussion of the chasacteristics of probability distributions, we discuss the concepts of skewness and kurtosis of a probability distribution, which tells us sotnething about the shape of the probability distribution. Skewness (S) is a measure of asymmetry, and kurtosis (K) is a measure of tallness or flatness of a PDE, as can be seen in Figure 3-4. To obtain measures of skewness and kurtosis, we need to know the third moment and the fourth moment of a PME (PDF). We have already seen that the first moment of the PMF (PDF) of a random variable X is measured by E(X) = px, the mean of X, and the second moment around the mean (ie, the variance) is measured by E(X ~ p1,)*. In like fashion, the third and fourth mo- ‘ments around the mean value can be expressed as: Third moment: E(X - tx)? (3.36) Fourth moment: E(X — px)! (3.37) 8566 PART ONE: BASICS OF PROBABILITY AND STANSTICS &§ FIGURE 3-4 (@) Skewness; (b) kurtosis. ‘And, in general, the rth moment aroimd the mean value can be expressed as rth moment: EQC— ps)" 6.38) Given these definitions, the commonly used measures of skewness and kurtosis, are as follows: EQ ~ tte)? e __. third moment about mean ~ cube of standard deviation $ (3,39) Since for a symmetrical PDF the third (and all odd order) moments are zero, for such a PDF the $ value is zero, The prime example is the normal distribution, which we will discuss more fully in the next chapter. Ifthe $ value is positive,CHAPTER THREE: CHARACTERISTICS OF PROBABILITY DISTRIBUTIONS. 67 the PDF is right-, or positively, skewed, and if it is negative, itis left, or negatively, skewed. (Gee Fig. 34a). = Feat TER = px)? _ ___ fourth moment ~~ square of second moment PDFs with values of K less than 3 are called platykirtc (fat or short-failed), and those with values of greater than3 are callled leptokurti¢ (slim or long-tailed), as shown in Fig. 3-4()) For a normal distribution the K value is 3, and such a PDF is called mesokurtic. Since we will be making extensive use of the normal distribution in later chapters, thieknowledge that for such a distéibution the values of S and Kare zeroand 3 respectively, Will help us fo compare other PDFs with the normal distribution. The computational:formulas to obtain the third and fourth moments of a PDE are straightforward extensions of the formula given in Eq, (3.11), namely, Third moment: )(X~ p)*f(%) (G41) K (3.40) Fourth moment: 7OK~ pa)*@) (42) _ Where X is a discrete iv. For a continuous rv. we will replace the summation sign by the integral sign (/). Example 3.10. Consider the PDF given in Table 3-1. For this PDF we have already seen that E(X)=35 and var(X) = 2.9167. The calculations of the third and fourth moments about the mean value are as follows: eer Ee eee eemeoeaues } X AX) amd) as) FX) A . 178 (1-95)(1/6) < * (1~35)%1/8) ‘ 1 21/8 (2~35)%1/6).-.- (@=98)4(1/8) 3° 1/6 (8-3.5)9(1/6)" (8-3.8)4(1/6) 4 1/6 (4-35)°(1/6) (4-35)4(1/6) 5 6 1/8. (5-350/8} (5 -3.8)4(1/6) We (6 =3.5)°(1/8) (6=S5)'(1/6) ‘Sum = 0 14,732, From the definitions of skewness and kurtosis given before, verify that for the present example the skewness coefficient is zero (Is that surprising?) and that the kurtosis value is 1.7317, Therefore, although the PDE given above is symmetrical around its mean value, itis platykurtic, or much flatter than the normal distribution, which should be apparent from its shape in Fig. 3-4(6). + 8768 PARTONE: BASICS OF PROBABILITY AND STATISTICS 3.7 FROM THE POPULATION TO THE SAMPLE ‘Sample Mean To compute the characteristics of probebility distibutors, suchas the expected value, variance, covariance, correlation coefficient, and conditional expected value, we obviously need the PMF (PDF), that is, the whole sample space or population. Thus, to find out the average income of all the people living in New York City.at a given time, obviously we need information on the population of the whole city, Although conceptually there is some finite population of New York City at any given time, itis simply not practical to collect information about each member of the population (ie, outcome, in the language of probability). What is done in practice is to draw a “representative” of a “random” “sanuple from this population and to compute the average income of the people ‘sampled.> But would the average income obtained from the sample be equal to the true average income (.e,, expected value of iricome).in the-popullation as a whole? Most likely it will not. Sishilarly, if we were to compute the variance of the income ini the sampled population, would that equal the true variance that we would have obtained had westudied the whole population? Again, mos likely itwill not. How then could we lea about population characteristics like the expected value, variance, etc. if we only have one or two samples from a given popula tion? And, as we will see throughout this book, in practice, invariably we have, to depend on one or more samples from a given population. The answer to this very important question will be the focus of our attention in Chapter 5. But meanwhile, we must find the sample counterparts, the sample moments, of the various population characteristics that we discussed in the preceding sections, Let X denote the number of cars sold per day by a car dealer, Assume that the ry. X follows some PDE. Further, suppose we want to find out the average number [ie,, E(X)] of cars sold by the dealer in the first 10 days of each month. Assume that the car dealer has been in business for 10 years but has no time to Jook up the sales figures for the first 10 days of éach month for the past 10 years. Suppose that he decides to pick at random the past data for one month and notes the sales figures for the first 10 days of that month, Which are as follows: 9, 11, 11, 14, 13, 9, 8, 9, 14, and 12. This is a.sample of 10 values. Notice that he has data for 120 months, and if he had decided to choose another month, he probably would have obtained 10 other values. If the dealer adds up the 10 sales values and divides the sum by 10 (ie., the sample size), the number he would obtain is known as the sample mean, wrories meaning of a random sample will be explained in Chapter 4,CHAPTER THREE: CHARACTERISTICS OF PROBABILITY DISTRIBUTIONS 69 The sample mean of an rv, X is generally denoted by the symbol X (read as X bat) and is defined as * z X x — . . Ds; 43) where 5%, Xi,a8 ustial, means thesum of the X values from 1 tom, where is the sample size, . The sample mean thus defined is known as an estimator of E(X), which we [EQ)} Hint: Recall the definition of variance. B cove Y) = EI — px) — wy} = E(XY) ~ expr where ix = E(X) and wy = E(Y). How would you express these formulas verbally? 3.19. Establish Eq. (3.15). Hint: Var (aX) = EfaX — E(aX)]° and simplify. 3.20. Establish Eq. (8.17): Hint: Var(aX +bY) = El(aX + bY) -E(aX +bY)P and simplify. : 3.21. According to Chebyshev’s inequality, what percentage of any set of data must lie within c standard deviations on either side of the mean value if (a) ¢ = 2.5 and (b) c= 8? . Show that E(X—K)? = var(X) + [EQ KF. For. what’ value of k will E(X-)? be minimum? And whats that value of 7 . For the PC/printer sales example discussed in the chapter compute the con: ditional variance of ¥ (printers sold) given that X (PCs sold) is 2. Hint: Use the conditional expectation given in Example 3.9 and use the formula: var(Y|X=2) = Le ~E(Y|X=2Pf(Y|X=2) 3.24. Compute the expected value and variance for the PDE given in Problem 2.18, 3.2 3.23,CHAPTER SOME IMPORTANT PROBABILITY DISTRIBUTIONS In the previous chapter we noted that a random variable (cv) can be described by a few characteristics, or moments, of its probability function (PDF or PME), such as the expected value and variance. This, however, presumes that we know the PDF of that rv, which isa tall order sirice there are all kinds of random variables. In practice, however, some random variables occur so frequently that statisticians have determined their PDFs and documented their properties. For our purpose, we will consider only those PDFs that are of direct interest to us. But keep in mind that there aré several other PDFs that statisticians have studied which can be found in arty standard statistics textbook. In this chapter we will discuss the following four probability distributions: 1. Thenormal distribution 2..The f distribution ‘ 3, The chi-square (x*) distribution 4. The F distribution These piobability ‘distributions are important in their own right, but for our puitposes they are especially important because they help ws te find out the probability distributions of estimators (or sfatistis), such as the sample mean and sample variance, Recall that estimators are random variables. Equipped with that knowledge, we will be able to draw inferences about their true population values. For example, if we know the probability distribution of the sample mean, X, we will be able to draw inferences about the true, or population, mean x Similarly, if we know the probability distribution ofthe sample var- ance $2, we will be able to say something about the true population variance, 0. This is the essence of statistical inference, of drawing conclusions about some_ a78 PART ONE: -BASICS OF PROBABILITY AND STATISTICS characteristics (ie, nioments) of the population on the basis of the sample at hand, We will discuss in depth how this accomplished in Chapter 5, Fornow we discuss the salient features ofthe four probability distributions. 44 THE NORMAL DISTRIBUTION Pethaps the single most importent probability distribution involving a contin ous rv. is the normal distribution, Its bell-shaped picture, as shown in ‘Figure 2-3, should be familiar to anyone with a modicum of statistical know!- edge. Experience has shown that the normal distribution is a reasonably good ‘model for a continuous rv. whose value depends on a number of factors, each factor exerting a comparatively small positive or negative influence. Thus, consider the tv, body weight. Itis likely to be normally distributed becausé factors such as herédity, bone structure, diet, exercise, and metabolism are each expected to have some influence on weight, yet no single factor dominates the others. Likewise, variables such as height and grade-point average are also found to be normally distributed. For notational convenience, we express a normally distributed ny. Xas X~ Nur of) at where ~ means distributed a3, N stands for the normal distribution, and the quantities inside the parentheses are the parameters of the distribution, namely, its (population) mean or expected value zx and its variance o3. Note that Xis a continuous 1.v. and may take arty value in the range ~00 #0 00, Properties of the Norma Distribution 1, The normal distribution curve, as Figure 2-3 shows, is symmetrical around its mean vate 2x. 2. The PDF of a normally distributed rv. is highest at its mean value but tails off at its extremities (ie, in the tails of the distribution). That is, the probability of obtaining a value of a normally distributed rv, far away from its mean value becontes progressively smaller. For example, the probability of someone exceeding the height of 7.5 feet is very small. 3. As a matter of fact, approximately 68 percent of the arta under the normal ‘curve lies between the values of (ux 40x), approximately 95 percent of ‘Ror the thathematically inclined'student, here is the mathematical equation for the PDF of a noctally distributed nv: 100)» son A(R) here exp{ | means¢ raised to tepore eepetin ine | 6% 27128 (he bse of aaa logarithm), and x = 3.1459. x and of, own asthe distribution, are respec: eee un orp ae and he vaae af edition ar rc ete trof their marginal PDFs, thats, F(X, ‘CHAPTER FOUR: SOME |MPORTANT PROBABILITY DISTRIBUTIONS, 79 “3-9 Res br appre | = 95 (approx) 98.7% (approx): FIGURE 4-1 -- Areas under the normal curve. the area lies between (jx + 20x), and approximately 99.7 percent of the area lies between (11x + 30x),a8 shownin Figure 4-1, Asnoted in Chapter 2, and discussed further subsequently, these areas can be used as measures of probabilities."The total area under the curve is 1 of 100 percent, 4, Anormal distribution is fully desorited by its two paramelers, pix and-o2. * That is, once the values of these two parameters are known, we can find out the probability of X'lying within a ‘certain interval from the mathe- ‘matical formula given in footnote 1. Fortunately, we do not have to compute the probabilities from this formula because these probabilities can be oblained easily from the-specially “prepared table in Appendix A (Table A-1), We will explain how to use this table shortly. A linear combination (funiction) of two (or more) normally distributed random tariables is itself normally distributed—an especialy important property of the normal distribution in econometrics. To illustrate, let X~ N(x 09) ¥~ Nix. ¢¥) and assume that X and Y are indepenident.” ‘Now consider the linear combination of these two variables: W=aX + BY; where a and b are constant (e.g., W=2X +4); then W~ Nipw, on] (42) a where bw = (@ux+bpy) of, = (203 +003) a3) 2 Recall that two variables are independently distributed if their joint PDF (PME) is the product VAC. fral values of X and ¥. a 79‘80 PARTONE: BASICS OF PROBABILITY AND STATISTICS Note that in Bq, (43) we have used some of the properties ofthe expectation operator E and the variances of independent random variables discussed in Chapter 3, (See Section 32) Incidentally, expression (4.2) can be extended straightforwardly to a linear combination of more than two normal random variables. 6, Fora normal distribution, skewness (S) is zero and kurtosis (K) is 3. Example 4:1. Let X denote the number of roses sold daily by a flocst in uptown Manhattan and Y the number of roses sold daily by a florist in downtown Manhattan. Assume thatboth Xand Y are independently normally distributed as X ~ N(200, 64) and Y ~ N(150, 81). What is the average value of the roses sold in two days by the two florists and the cortespondisig variance-of sale?-Here W.= 2X-+.2Y. Therefore, following expression (43), we have E(W) = EQX + 2¥) = 500 and var(W) = 4 var(X) + 4 var(Y) = 580. Therefore, W is distributed normally with a mean value of 500and a variance of 580: (W~ N(G00, 580). The Standard Normal Distribation *9 Although a normal distribution is fully specified by its two parameters, (population) mean o expected value and variance, one normal distribution can differ from another in either its mean or variance, or both, as shown in Figute 4-2. How do we compare the various normal distributions shown in Figure 4-2? Since these normal distributions differ in either their mean values or Variances, orboth, lets define a new varjable, Z, as follows: ee an) ox If the variable X has a méan pix aid a variance of, it can be shown that the Z variable defined previously has a mean value of zero and a variance of one (or unity), (For proof, see Problem 4.26). In statistics such a variable is known as a unit or standardized variable. : If X~ N(ux, 03), then Z as defined in Bq, (44) is known a unit or standard normal variable, that is, a normal variable with zero mean and unit (or 1) vari- , ance. We write such a normal variable as: * Z~N(,1) (45)* Note that if Xand Y ae normally distributed bul are nt independent, W is stillnormallydis- tributed with the mean given in (4.3) but with the following variance (cf. Eq. 327}: 2 = ata} + Bod + ibcov (RV). ‘This canbe proved easily by noting the property of thenormal distribution that a linear function ofa normally dstiputed variables itself normally distributed Note that giver x and of,CHAPTER FOUR: “SOMEIMPORTANT PROBABILITY DISTRIBUTIONS. 81 FIGURE 42 (9) Diferent means, came valanot (b same méan, iret valance; (diferent means, diferent variances, Thus, aty normaly distributed ro: witha given mean and variance can be converted to «standard normal variable, which gently simplifies our task of computing probabilities, 4 we will show shortly. ‘The PDF and CDF (cumullative distribution function) of the standard normal distribution are shown in Figures 4-0(a) and 4-3(t), respectively. (See Section 2.5 on the definitions of PDF and CDR. See also Tables A-I(a) arid A-1(b) in Appendix A.) The CDE, like any other CDE, gives the probability that the standard normal variable takes a value equal to or less than z, that is, P(Z <2), where ris a specific numerical value of Z. "Toillustrate how we use the standard normal distribution to compute vari- 1 ous probabilities, we consider several concrete examples. Example 4.2, Itis given that X, the daily sale of read ina bakery, follows the normal distribution witha mean of 70 loaves and a variance of; thats, X~ N(70,9). What is the probability that on any given day the sale ofbread is greater than 75 loaves? Since X follows the normal distribution with the stated mean and variance, it follows that 5-70 ayo aa follows the standard normal distribution, Therefore, we want to find? P(Z > 1.67) Syote: Whether we write P(Z, > 1.67) or P(Z > 1,67} isimmateral because, as noted in Ch.2, the probability that «continuous 2 takes a particular vale (eg, 1.67)'sabvays 2er0. o482. PARTONE: BASICS OF PROBABILITY AND STATISTICS. FIGURE 4.3 PELRTSZSLAT = 0905 P(ZS167)= 09595 er ee ee ee) (2) POF and (6) COF ofthe standard normal variable. Now Table A-1(Q) in Appendix A gives the CDF of the standard normal distribution between values of Z = -3,0 to Z = 3.0 For example, this table shows that the probability that Z lies between -3.0 to Z = 1.67 = 0.9525. Therefore, P(Z > 1,67) = 1 - 0.9525 = 0.0475 Thats, the probability of the daily sale of bread exceeding 75 loaves of bread 480.0475 or about 4.75 percent. [See Figure 4-3(a).] Example 43, Continize with Example 4.2, bt suppose we now want to find out the probability of a daily sale of bread of 75 or fewer loaves, The answer is obvious from the previous example, namely, that this probability is 0.9525 which is shown in Figure 4-3(0).CHAPTER FOUR: SOME IMPORTANT PROBABILITY DISTRIBUTIONS. @3 Example 44, Continue with Example 4.2, but now suppose we want to find out the probability that the daily sale of bread is between 65 and 75 loaves. To compute this probability, we first compute Now from Table A-1 we see that P(-3.0 < Z-< 167) = 0.0475 and P(-3.0 < Z < 1.67) = 0.9525 Therefore, P(-2.67 < Z < 1.67) = 09505 ~ 010475 = 0.9050 That is, the probability is 905 percent that the sales volume will lie between 65 and 75 loaves of bread per day, as shown in Figure 4-3(2). Example 45. Continue with the preceding example but now assume that we want to find the probability that the sale of bread either exceeds 75 loaves or is less than 65 loaves per day. If you have niastesed the previous examples, you can see easily that this probability is 0.0950, as shown in Figure 4-3(a). As the preceding examples show, once we know that a particular rv. follows the normal distribution with a given mean and variance, all we have to do is convert that variable into the standard normal variable and compute the relevant probabilities from the standatd normal table (Table A-l). It is indeed remarkable that just one standard normal distribution table suffices to deal with any normadly distributed variable regardless of its specific mean and variance values. ‘As we have temarked earlier, the normal distribution is probably the single most important theoretical probability distribution because several (continu- us) randoai variables are found to be normally distributed or at least approximately so, We will show this in Section 4.2. But before that, we consider some practical problems in dealing with the normal distribution, aS 2{4 PARTONE: BASICS OF PROBABILITY AND STATISTICS ‘TABLE 4-1 25 RANDOM NUMBERS FROM N(0, 1) AND N(2, NOt)” NA) NOt)” NGA) 0.48524 4.25181 022068 = 021487 0.46282 0.01998 0.00719 -0.47728 009037 = 0.71217 «1.82007 196009 «0.59128 = -1.25408 1.62008 1.02684 3.09222 117658 -1.29695 41,0575 278722 061502 O.58124 241198 -1.80759 1.55853 2.58235 0.20687 1.71088 0.40786 = ~0.19653. 0.90193 0.24596 249463 0.14726 0.45979 osdeo2 3.69288 3.29003 Random Sampling from a Normal Population Since the normal distribution is used so extertsively in theoretical and practical statistics, it is important to know how we can obtain a random sample from such a population. Suppose we wish to draw a random sample of 25 observations from a normal probability distribution with a mean of zero and variance: of 1 [ie, the standard normal distribution, N(0, 1)]. How do we obtain such & sample? ‘Most statistical packages have routines, called random number generators, to obtain random samples from the most frequently used probability distributions. For example, using the MINITAB statistical package, we obtained 25 random numbers from an N(0, 1) normal population. These are shown in the first column of Table 4-1. Also shown in column 2 of the table is another random. sample of 25 observations obtained from a normal population with mean 2 and variance 4 (ie, N(2, 4)).° Of course, you car generate as many samples as wanted by the procedure just described. ‘the Sampling or Probabllity Distribution of the Sample X In Chapter 3 we introduced the sample mean [see Eq, (343)]as an estimator of the population mean. But since the sample mean is based on a given sample, its value will vary from sample to sample; that i, the sample mean can be treated SMMINTTAB will generate a random sample from a normal population with a given mean variance. Actually, once we obtain a random sample from the standard normal distribution [.e, N(, 1), ‘we can easly convert this sample to a normal population witha different mean and variance. Let Y=a+bZ, where Z isN( 1),and wherea and bare constants. Since Yisa linear combination of normally distributed variable, Y is itself riormally distributed with E(1) = E(a +-bZ) =a, since E(Z) =0 and var(a +bZ) =P? var(Z) =P, since var(Z) = 1. Hence, Y ~ N(a, 62), Therefore, if yurmultiply the values ofZ by band adda toit,you will ave a sample from a normal population Yethmeat and variance Thus a= Dandb-=2,wehave = NO,CHAPTER FOUR: SOME IMPORTANT PROBABILITY DISTRIBUTIONS 85 as anv, which will have its own PDR. Can we find out the PDF of the sample mean? The answer is yes, provided the sample is drawn randomly. InChapter 3 we described the notion of random sampling in an intuitive way by letting each member of the population have an equal chance of being included inthesample. Instatistics, however, the termi random samplingis usedina rather special sense. Wesay that X, Xo,..., Xx constitutes a random sample of size n fal these Xs are drawn independently from the same probability distribution (ie, each X; has the same PDF). The Xs thus drawn are knownas iid. (nidependently and identically distributed) random variables. In the future, therefore, the term random sample will denote a sample of i.i.d. random variables, For brevity, sometimes we will use the term an ii. sample to mean a random sample in the sense just described. Thus, if each X; ~ N(wx, 03) and if each X; value is drawn independently, then we say that Xj, Xo. +, Xp até iiid.-random variables, the normal PDF being their common probability distribution. Note two things about this definition: First, each X included in the sample must have the same PDF and, second, ‘each X included in the sample is drawn independently of the others. Given the very important concept of random sampling, we now develop another very important concept in statistics, namely, the concept of the sampling, or probability, distribution of an estimator, such 2s, say, the sample mean, X. A firm comprehension of this concept is absolutely essential to understand the topic of statistical inferencein Chapter 5 and for our discussion of econometrics in succeeding chapters, Since many students find the concept of sampling dis- ‘eibution somewhat bewildering, we will explain it with an example, Example 46. Consider normal distribution with amean valueof 10 anda variance of, that is, N(10, 4). From this population we obtain 20 random samples of 20 observations each. For each sample thus drawn, we obtain the sample mean value, X. ‘Thus we have a total of 20 sample means. These are collected in Table 4-2. Let us group these 20 means in a frequency distsibution, as shown in Tobie 43. ‘The frequency distribution of the sample means given in Table 4-3 may be called the empirical sampling, or probability, distribution of the sample means.” Plotting this empirical distribution, we obtain the bar diagram shown in Figure 4-4. If we connect the heights of the various bars shown in the figure, we obtain the frequency polygon, which resembles the shape of the normal distribution. If we had drawn many more such samples, would the frequency polygon take the familiar bell-shaped curve of the normal distribution; that is, would the ‘The sampling distribution of an estimator i like the probability dstibution of any randoxn ‘arisble exe that he andor variable in tis cse happens to be an estintor or ast, Put differently, a sampling distribution isa probability distribution where the random variable is ar estimator, such as the sample mean or sample variance. oo86 PARTONE: BASICSOF PROBABILITY AND STATISTICS TABLE 4:2 29 SAMPLE MEANS FROM ‘TABLE 43 . FREQUENCY DISTRIBUTION OF 20 (10,4). ‘SAMPLE MEANS. ‘Sample means (Xj) Range of Absolute Relative sample mean frequency frequency 9.641 10.194 10.040 10.249 85-89 1 0.05 9.174 10.32 90-04 1 0.05 10.840 10.390 95-09 5 0.25 10.480 9.404 0-104 8 040 11.386 8621 105-109 4 020 9.740 9.739 HOt 1 0.05 9.937 10.184 = = + 49250 Total 2 1.00 10.394 ‘Sum of 20 semple means = 201.05, 201.05 Ra = = 10052 Da VvariX) = 2 a 22 =0399 Note: R= =O f peels uo] § os Eon 025 £ 020 2 os & 030 05 a 8 BS IS Mean Value FIGURE 4-4 Distribution of 20 sample means from N{10, 4) population, sampling distribution of the sample mean in fact follow the normal distribution? Indeed, this is the case. Here we rely on statistical theory: If X;, Xo, ... Xy is a random sample froma riormal population with mean ix and variance o2, then the sample mean, X, also follows the normal distribution with the same mean j.x but with variance &, thatis, 2 a “ 2 Re (1, 3) 4.6)(CHAPTER FOUR: SOME IMPORTANT PROBABILITY DISTRIBUTIONS 67 Inother words, the sampling (or probability) distribution of the sample mean, X, the estimator of j.x, also follows the normal distribution with the same mean as that of each X;but with variance equal to the variance of Xj(= 03) divided by the sample size m (for proof, see Problem 4.25). As you can see for n > 1, the variance of the sample mean will be much smaller than the variance of any X;- Ifwe take the (positive) square root ofthe variance of X, we obtain %, which is called the standard error (se) of X, which is akin to the concept of standard deviation. Historically, the square root of the variance of a random variable is called the standard deviation and the square root of the variance of an estimator is called the standard error, Since an estimator is also a random variable, there is no need to distinguish the two terms, But we will keep the distinction because it is so well entrenched in statistics, Returning to our example, then, the expected value of X;, E(X;), should be 10, and its variance should be4/20 = 0.20. Ifwe take the mean value of the 20 sam- plemeans given in Table 43, call it the grand mean X, it should be about equal to F(%), and if we compute the sample variance of these 20 sample means, it should be about equal to 0.20. As Table 4-3 shows, X = 10.052, about equal to the expected value of 10 and var(X;) = 0.339, which is not quite close to 0.20. Why the difference? Notice that the data given in Table 4-2 is based only on 20 samples. Asnoted, if we had many mote samples (each based on 20 observations), we would come close to the theoretical result of mean 10 and variance of 0.20. Itis comforting to ‘know that we have such a useful theoretical result. As a consequence, we do not have to conduct the type of sampling experiment shown in Table 4-2, which can ‘be time consuming, Just based on one random sample from the normal distrib- tution, we can say that the expected value of the sample mean is equal to the true ‘mean value of jix. As we will show in Chapter 5, knowledge that a particular estimator follows a particular probability distribution will immensely help us in relating a Sample quantity to its population counterpart. In passing, note sic asa result of Bq (46), it follows at once that bx) =~ NO,1) (7) a that is, a standard normal variable. Therefore, you can easily compute from the standard normal distribution table the probabilities that a given sample mean is greater than ot ess than a given population méan. An example follows. Z= Example 47. Let X denote the number of miles per gallon achieved by cars of a particular model, You are told that X ~ N(20, 4). What is the probability that, for a random sample of 25 cars, the average gallons per mile will be a. greater than 21 miles b. less than 18 miles «. between 19 and 21 miles? | \ o188 PARTONE: BASICS OF PROBABILITY AND STATISTICS Since X follows the normal distbution with mean = 20 and variance = 4, we know that ¥ also follows the normal distribution with mean = 20 and variance = 4/25. As a resull,we know that 2 ana 2 Xm, 2 ~ M0, 1) VB That is, Z follows the standard normal distribution. Therefore, we want to find =) =P(Z>28) = 0.0062 (From Table A-1(t)) i 18~ =) 2 PE> 2) = ?(Z> P(X<18)= (<8 =P(Z<-5) 80 P(19 < X<2l) = P(-25< Z<25) | = 0.9876 Before moving on, note that the sampling experiment we conducted in Table 4-2 is an illustration of the so-called Monte Carlo experiments or Monte Carlo simulations. They are a very inexpensive method of studying properties of various statistical models, especially when conducting real experiments would'be time-consuming and expensive (see Problems 4.21, 4.22, and 4.23), The Central Limit Theorem (CLT) We have just shown that the sample mean of a sample drawn from a normal population also follows the normal distribution, But what about samples drawn from other populations? There isa remarkable theorem in statistics—the central limit theorem (CLT)—originally proposed by the French mathematician Laplace, which states that if Xy, X,..., Xq is a random sample from any population (Le, probability distribution) with mean y.x and o2, the sample meat X tends to be normally distributed with mean wx and variance & as the sample size increases indefinitely (technically, infinitely).® Of course, if the X;happen to be from the normal population, the sample mean follows the normal distribution regardless of the sample size. This is shown in Figure 4-5, “in practice, no matter what the underlying probability distribution is, the sample mean of a ‘sample size of at least 30 observations will be approximately normal. 3CHAPTER FOUR: SOME IMPORTANT PROBABILITY OISTRIBUTIONS 89 ~ ‘i sping dinttinal a, Noma population EN Sampling distribution of “ (approx. normal ifn is large) \ \ \ . AN " ‘Non-normal population i ASS # ® FIGURE'4-6 The central init theorer: (a) Sartples dravn from a normal population; {B) samples dean from a.noe-eortal population, 42 THE tDISTRIBUTION The probability distribution that we use most intensively in this book is the # distribution, also known as Student’ ¢ distribution. It is closely related to the normal distribution. ‘To introduce this distribution, recall that if X~ N(wx, o2/n), the variable (X- px) NOM) that, the standard normal distribution. This isso provided that both px and 2 are known. But suppose we only know px and estimate o3 by its (sample) estimator $3 = 225% given in Bq, (4), Replacing ox by S, thats, replacing the population standard deviation (a..) by the sample sd, in Equation (4.7), we obtain a new variable A: an _ _— Ss (4.8) Siva ae ‘Student was the pseudonym of W.S. Gsset, who used to work a statistician for Guinness's Brewery in Dublin, He discovered this probability distribution in 1908,90 PART ONE: BASICS OF PROBABILITY AND STATISTICS Statistical theory shows that the f variable thus defined follows Student's distri bution with (1) d, Justas the mean and variance are the parameters of the normal distribution, the ¢ distribution has a single parameter, namely, the df, which. inthe present case are (n—1). Note: Before we compute $? (and hence $,), we must first compute X. But since we use the same sample to compute X, we have (t~1), not t, independent observations to compute $*; so to speak, we lose 1d. Insam, if we draw random samples from a normal population with mean pix and variance o} but replace of by its estimator SZ, the sample mean X follows the t distribution. A t-distributed rv. is often designated as ty, where k denotes the df. (To avoid confusion with the sample size 1, we use the subscript k to denote the d.f. in general) Table A-2 in Appendix A tabulates the t distribution for various d.f. We will demonstrate the use of this table shortly. Properties of the t Distribution ao FIGURE 4.6 41. Thet distribution, like the normal distribution, is symmetric, as shown in Figure 4-6. 2, The mean of the f distribution, like the standard normal distribution, is 2er0, but its variance is k/(— 2). Therefore, the variance ofthe £ distribution is defined for df. greater than 2. We have already seen that for the standard normal distribution the variance is always 1, which means that the Variance of the ¢ distribution is larger than the variance of the standard normal distribution, as shown in Figure 4-6. In other words, the ¢ distribution is flatter than the normal distribution, But as k increases, the variance of the ¢ distribution approaches the variance of the standard normal distribution, namely, 1. Thus, if the d.f. are k = 10, the variance of the f distribution is 10/8 =-1.25; if k = 30, the variance becomes 30/28 = 1.07; and, when k = 100, the variance becomes 100/98 = 1.02, which is not much greater than 1, Asa zesuilt, thet distribution approaches the standard normal distribution as the df. increase. But notice that even for k as small as 30, there is not a 3 t ‘The J ¢stubution for golected dearees of freedom (A).‘CHAPTER FOUR: SOME IMPORTANT PROBABILITY DISTRIBUTIONS 91 great difference in the variances of the f and the standard normal variable, ‘Therefore, the sample size does not have to be enormously large for the distr- bution to approximate the normal distribution, ‘Tillustrate the f table (Table A-2) given in Appendix A, we now consider a few examples. ‘Example 48, Let us revisit Example 4.2. In a period of 15 days the sale of bread averaged 7A loaves with a (sample) sd. of loaves, What s the probability of obtaining such a sale given that the true average sale is 70 loaves a day? + If we had known the true a, we could have used the standard normal Z variable to answer this question. But since we know its estimator, §, we can ‘use Eq. (48) to compute the f value and tise Table A-2 in Appendix A to answer this question as follows: 74-70 ~ 4apJi5 = 3.873 Notice that in this example the df. are 14 = (15-1), (Why?) As Table A-2 shows, for 14 df. the probability of obtaining at value of 2.145 ot greater is 0.025 (2.5 percent), of 2.624 or greater is 0.01 (1 percent), and of 3.987 or greater is 0.001 (0.1 percent). Therefore, the probability of obtaining a t value of as much as 3.873 or greater mustbe much smaller than 0.001. Example 49. Let us keep the setup of Example 4.8 intact except to assume that the sale of bread averages 72 loaves in the said 15-day period. Now what is the probability of obtaining such a sales figure? Following exactly the same line of reasoning, the reader can verify that the computed f value is ~1.936. Now from Table A-2 we observe that for 14 d.f. the probability of obtaining a f value of 1.761 of greater is ~0.05 (or 5 percent) and that of 2.145 or greater is 0.025 (or 2.5 percent). Therefore, the probability of obtaining a # value’ of 1.936 or greater lies somewhere between 2.5 and 5 percent. Example 4.10. Now assume that in a 15-day period the average sale of bread was 68 loaves with an s.d. of {loaves aday. Ifthe true mean sales are 70 loaves a day, what is the probability of obtaining such a sales figure? Plugging in the relevant numbers in Equation (4.8), we find that the value in this case is -1.936, But since the ¢ distribution is symmetric, the 292 PARTONE: BASICS OF PROBABILITY AND STATISTICS probability of obtaining a ¢ value of ~1.996 or smaller is the same as that of obtaining a f value of +1.936 or greater, which, as we saw earlier, is somewhere between 2.5 and 5 percent, Example 411. Again, continue with the previous example, What is the probability that the average sale of bread in the said 15-day period was either gieates than 72 loaves or less than 68 loaves? From Examples 4.9 and 4.10 we know that the probability of the average sale exceeding 72 or being less than 68 is the same as the probability that a ¢ value either exceeds 1.936 or is smaller than ~1.996.!° These probabilities, as we saw previously, are each between 0,025 and 0.05. Therefore the total probability will be between 0.05 or 0.10 (or between 5 and 10 percent). In cases like this we would, therefore, compute the probability that If > 1.936, where || means the absolute value off, that is, the # value disregarding the sign. (For example, the absolute value of 2 is 2 and the absolute value of ~2 is also 2.) From the preceding examples we see that once we compute thet value from Eq, (48), and once we know the dé, computing the probabilities of obtaining a given # value involves simply consulting the t table. We will consider further uses of the table in the regression context at appropriate places later in the text: Example 4.12: For the years 1967 to 1990 the Scholastic Aptitude Test (S.A.T, scores were as follows: a M Female Verbal (average) 440.42, 434.60 (142.08) (803.38) Math (average) . 5000. 45387 (46.61) (68.88) ‘Note:The figures in parentheses are the vadanoes. Arandom sample of 10 male S.A. scores on the verbal test gave the (sample) mean value of 440.60 and the (sample) variance of 137.60. What is the probability of obtaining such a score knowing that for the entire 1967 to 1990 period the (true) average scove was 440.42? With the knowledge of the distribution, we can now answer this question easily: Substituting the relevant values in Bq, (48), we obtain 60 — 440.42 *Becarefulhere, Thertumber ~2.0issmallerthan —196,andthenumber~23issmallerthan—20,(CHAPTER FOUR: SOME litPORTANT-PROBABILITY DISTRIBUTIONS 93 This £ value has the ¢ distribution with 9 d.f, (Why?) From Table A-2 we observe that the probability of obtaining such at value is greater than 0.25 or 25 percent. ‘A note on the use ofthe t table Table A-2). With the advent of user-friendly statistical softwase packages and electronic statistical tables, Table A-2.s now of limited value because it gives probabilities for a few selected df. This is also true of the other statistical fables given in Appendix A. Therefore, if you have access to one or more statistical software packages, you can compute probabilities for any given degrees of freedom much more accurately than using those given in Appendix A tables, 4,3. THE CHI-SQUARE (x") PROBABILITY DISTRIBUTION Now that we have derived the sampling distribution of the sample mean X, (normal if the true standard deviation is known or the # distribution if we use the sample standard deviation) can we derive the sampling distribution of the , = Lee sample variance, S*= *257" since we use the sample mean and sample variance very frequently in practice? The answer is yes, and the probability distribution that we need for this purpose is the chi-square (x) probability distribution, which is very closely related to the normal distribution. Note that just as the sample mean will vary from sample to sample, so will the sample variance. Thats, like the sample mean, the sample variance is also a random variable. Ofcourse, when we havea specific sample, we have a specific sample mean and a specific sample variance value. ‘We already know that if a random variable (rv.) X follows the normal distribution with mean j.x and variance of, that-is, X~ N(x, 03), then the xv, Z=(K~wx)/ox is a standard normal variable, that is, Z ~ N(0, 1). Statistical theory shows that the square of a standard normal variable is distributed as a chi-square (x2) probability distribution with one degree of freedom (df). Symbolically, Barby as) where the subscript (1) of x? shows the degrees of freedom (d.f)—I in the present case. As in the case of the f distribution, the dif. is the parameter of the chi-square distribution, In Equation (4.9) there is only 1 dif. since we are considerinig only the square of one standard normal variable. Annote on degrees of freedom: In general, the number of d.f. means the number ofindependent observations available to.compute a statistic, such as thesample mean or sample variance. For example, the sample variance of an tv. X is defined as $* = (Xi; — X)?/(n ~1). In this case we say that the number of df. is (1 = 1) because if we use the same sample to compute the sample mean X, around which we measure the sample variance, s0 to speak, we Jose one d.f,; that is, we have only (1 — 1) independent observations. An example will clarify this further. Consider three X values: 1, 2, and 3. The sample mean is 2. Now 9394 PARTONE: BASICS OF PROBABILITY AND STATISTICS Ke Probability Density a FIGURE 4-7 Density function ofthe x2 variable. since 32% — X) = 0 always, of the three deviations (1 -2}, @2~ 2), and (8-2), only two can be chosen arbitrarily; the third must be fixed in such a way that the condition J°(%; — X) = is satisfied." Therefore, in this case, although there are three observations, the df are only 2. Now let Zy, Zp... Zebek-independent unit normal variables (i.e, each Z is a normal rv. with zero mean and unit variance). If we square each of these Zs, we can show that the sum of the squared Zs also follows a chi-square distribution with kd. That is, LF= G+ Bt Bmry _ 410) Note that the df. are now k since there are I-independent observations in the sum of squares shown in Equation (4.10). Geometrically the x? distribution appears as in Figure 4-7. Properties ot the Chi-square Distribution 1. As Figure 47 shows, unlike the normal distribution, the chi-square distribution takes only positive values (after all, it is the distribution of a squared quantity) and ranges from 0 to infinity. 2. As Figure 4-7 also shows, unlike the normal distribution, the chi-square distribution is a skewed distribution, the degree of the skewness depending on the df For comparatively few dé. the distribution is highly skewed to UWNote that DO - X) =EX- LX=nX-nX=0, because X= LP Xi/n and LX= nk, because X isa constant, given a particular sample, 94CHAPTER FOUR: SOME IMPORTANT PROBABILITY DISTRIBUTIONS 95 the right, but as the df increase, the distribution becomes increasingly symmetrical and approaches the normal distribution, 3, The expected, or mean, value ofa chi-square tv. is kand its variance is 2k, where kis the d.f. This i a noteworthy property ofthe chi-square distribution in that its variance is tice its mean value, 4, IE Zy and Zp are two independent chi-square variables with ky and ky d.f, then their sur (Zy + Za)is also a chi-square variable with df. = (ky +h). ‘Table A-4in Appendix A tabulates the probabilities thata particular x? value exceeds a given number, assuming the cf, underlying the chi-square value are known or given. Although specific applications of the chi-square distribution in regression analysis will be considered in later chapters, for now we will ]ook at how to use the table. Example 4.13, For 30 df, what is the probability that an observed chi-square value is greater than 13.782 Or greater than 18.49? Or greater than 50.89? From Table A-4 in Appendix A we observe that these probabilities are 0.995, 0.95, and 0.01, respectively. Thus, for 30 df. the probability of obtaining a chi-square value of approximately 51 is very small, only about 1 percent, but for the same d.f. the probability of obtaining a chi-square value of approximately 14 is very high, about 99.5 percent. Example 4.14, If Sis the sample variance obtained from a random sample of » observations from a normal population with the variance of 02, statistical theory shows that the quantity 2 {n- 45) ~Xhey (4.1) ‘That is, the ratio of the sample variance to population variance multiplied by the df, (1-1) follows the chi-square distribution with (1-1) df. Suppose a random sample of 20 observations from.a normal population with 6? = 8 gave a’sample variance of S* = 16, What is the probability of obtaining such a sample variance? Putting the appropriate numbers in the preceding expression, we find that 19(16/8) = 38 is a chi-square variable with 19 dif. And from Table A-4 in Appendix A we find that for 19 df. if the true 0? were 8, the probability of finding a chi-square value of ©38 is “0,005, a very small probability. There is doubt whether the particular random sample came from a population with a variance value of 8. But we will discuss more on this in Chapter 5, In Chapter 5 we will show how Eq, (.1) enables us to test hypotheses about 0? if we have knowledge only about the sample variance $?.96 PART ONE: BASICS OF PRORABILITY AND STATISTICS 4.4 THE FDISTRIBUTION <2 a Another probability distribution that we will find extremely useful in econometrics is the F distribution. The motivation behind this distribution is as follows. Let X1, Xs, ..., Xm be a random sample of size m from a normal population with mean px and variance 03, and let Yz, Y2,..., Yn be a random sample of sizen from a normal population with mean jy and variance o. Assume that these two samples are independent and are drawn from populations that are normally distributed. Suppose we want to find out if the Variances of the two normal populations are the same, that is, whether o% = 0. Since we cannot directly observe the two population variances, let us suppose we obtain their estimators as follows: /-FP g-y% =H (422) (433) 14? If the two population variances are in fact equal, the F ratio given in Equation (4.14) should be-about 1, whereas if they are different, the F ratio should be different from 1; the greater the difference between the two variances, the greater the F value will be. Statistical theory shows that if 0 = 0? (ie,, the two population variances are equal), the-F ratio given in Eq, (4:14) follows the F distribution with (m - 1) (numerator) d.f. and (1-1) (denominator) d.f. And since the F distribution is often used to compare the variances of two (approximately normal) populations, itisalso knownas the variance ratio distribution. The F ratio is often designated s Fy, Where the double subscript indicates the parameters of the distribution, By convention, in computing the F value the variance with the larger numerical values putin the numerator. That is why the F values always 1 or greater than 1 Als, note that if a variable, say, W, follows the F distribution with mand nd in the numeretor and denominator, respectively then. the variable (1/W) also follows the F distabution but with rt and m df in the numerator and denomninstor, respectively. More specially, Fawn ‘here a denotes th Jevgfsgicace, which we wll dissin Chapter ‘tobe precise, Gi follows the F distribution. But if o} = cf, we have the F ratio given in iq, (14), Note that in computing the two Sample variances we loose 1 if. foreach, because in each. case, we use the same sample to compute the sample mean, which consumes dfCHAPTER FOUR: SOME IMPORTANTPROBABILITY DISTRIBUTIONS 97 ff) Probability Density . FIGURE 4-8 The Fistibution for various df. namely, the numerator and the denominator d.f, [in the preceding example y= (m~1) andy =(n-1))* Properties of the F Distribution 1. Like the chi-square distribution, the F distribution is also skewed to the right and also ranges between 0 and infinity (see Figure 46). 2, Also, like the # and chi-square distributions, the F distribution approaches the normal distribution ask, and ky, the df, become large (technically, infinite), 3. The square of a f-distributed 1.v. with k df. has an F distribution with 1 and kd. in the numerator and denominator, respectively. Thats, gap (4.15) ‘We will ee the usefulness ofthis property in Chapter 8, 4, Just as there is a relationship between the F and f distributions, there is a relationship between the F and chi-square distributions, which is a Fran) = e sn 0 4169, oe That is, a chi-square variable divided by is d,,m, approaches the F variable with m df. in the numerator and very large (technically, infinite) df. in the denorninatot. Therefore, in very large samples, we can use the x? = \The F distribution has two sets of A because statiatical theory shows that the F distribution is the distribution of the ratio of two independent chi-square random variables divided by their respective dé 9798 PARITONE: BASICS OF PROBABILITY AND STATISTICS distribution instead of the F distribution, and vice versa, We can write (4.16) alternatively as m:Flagy =X2 asm 00 (437) That is, numerator df. times Fn,n) equals a chi-square value with rumerator GE, provided the denominator degrees of freedom are sufficiently large (technically, infinite). ‘The F distribution is tabulated in Table A-3 in Appendix A. We will consider its specific uses in the context of regression analysis later, but in the meantime Jet us see how this table is used. Example 4.15, Let us return to the S.A.T, example (Example 4.12), Assume that the verbal scores for males and females are each normally distributed. Further assume that average scores and their variances given in the preceding table represent sample values from a much larger population. Based on the two sample variances, can we assume that the two population variances are the same? Since the verbal scores of the male and female populations are assumed to be normally distributed random variables, we can compute the F ratio given in Eq. (4.14) as 303.39 Fe = 2.1353 142.08 which has the F distribution with 23 d.f. in the numerator and 23 df in the denominator, (Note: in computing the F value, we are putting the larger of the two variances in the numerator) Although Table 4-3 in Appendix A does not give the F value corresponding to df. of 23, if we use 24 df. for both the numerator and the denominator, the probability of obtaining an F value of about 2.14 lies somewhere between 1 and 5 percent, Jf we regard this probability as too low (more about this in Chapter 5), we could say that the two population variances do not seem to be equal; that is, there is a difference in the variances of male and female scores on the verbal part of the S.A.T. fest. Remember that if the two population variances are the same, the F value will be 1, but f they are different, the F value will be increasingly greater than 1. Example 4.16. An instructor gave the same econometrics examination to two classes, one consisting of 100 students and the other consisting of 150 students. He draws a random sample of 25 students from the frst class and a random sample of 31 students from the other class and observes that the sample variances of the grade-point average in the two classes were 100 and 132, respectively: ItCHAPTER FOUR: SOME IMPORTANT PROBABILITY DISTRIBUTIONS 93 is assumed that the rv, grade-point average, in the two classes is normally distributed. Can we assume that the variances of grade-point average in the two classes are the'same? Since we are dealing with two independent random samples drawn from twonormal populations, applying the Fsatio given in Eq, (4.14), we find that 132 Fe =132 100 follows the F distribution with 90 and 24 df, respectively. From the F values given in Table A-3 we observe that for 30 numerator d.f. and 24 denominator di. the probability of obtaining an F value of as much a5 1.31 ot greater is 25 percent. If we regard this probability as reasonably high, we can conclude that the (population) variances in the two econometrics classes are (statistically) the same. 45 SUMMARY In Chapter 2 we discussed probability distributions in general terms. In this chapter, we considered four specific probability distributions—the normal, the é, the chi-square, and the F—and the special features of each distribution, in particular, the situations in which these distributions can be useful. As we will see in the remainder of tis book, these four PDFs play a very pivotal role in econo- metrictheory and practice. Therefore, a solid grasp of the fundamentals of these distributions is essential to follow the ensuing material. You may want to return to this chapter from time to time to consult specific points ofthese distributions when they are referred to in later chapters. KEY TERMS AND CONCEPTS . ‘The key terms and concepts introduced in this chapter are The normal distribution Monte Carlo experiments ot a) unit or standardized simulations variable Central limit theorem (CLT) ») unit or standard normal { distribution (Student's variable distribution) Random number generators Chi-square (2) probability Random sampling; iid. random distribution variables Degrees of freedom (d.£) Sampling, or probability, distribution F distribution of an estimator (eg, the sample a) variance ratio distribution mean) 'b) numerator and denominator Standard error degrees of freedom (df) AE 99100 PARTONE: BASICS OF PROBABILITY AND STATISTICS QUESTIONS PROBLEMS 109 4, Explain the meaning of a. Degrees of freedom. , Sampling distribution of an estimator. ¢ Standard error. 42, Consider a random variable (tv) X~N(G, 16). tate whether the following statements are true or false: a. The probability of obtaining an X value of greater than 12 is about 0.16. b, The probability of obtaining an X value between 12.and 14 is about 0.09. ¢ The probability that an X value is more than 2.5 standard deviations from the mean value is 0,0062. . 43, Continue with Question 4.2. a. What is the probability distribution of the sample mean X obtained from a random sample from this population? b. Does your answer to (a) depend on the sample size? Why or why not? , Assuming a sample size of 25, what is the probability of obtaining X of 6? 44, What is the difference between the f distribution and the normal distribution? When should you use the ¢ distribution? 4.5: Consider an r¥, that follows the distribution, a. For 20 degrees of freedom (d.f.), what is the probability that the f value will be greater than 1.325? by And that the ¢ value in 4.5(a) will be less than ~1325? . What is the probability that a value will be greater than or less than 1.325? 4. Istherea difference between the statement in 4.5(c) and the statement, “What is the probability that the absolute value of, ||, willbe greater than 1.3252” 46, True or false, For a sufficiently latged.f, thet, the chi-square, and the F distributions all approach the unit normal distribution. 4.7, Fora sufficiently large d.f, the chi-square distribution can be approximated by the standard normal distribution as: Z = 2x? — /2k—1 ~ N(0, 1) Letk=50. a. Use the chi-square table to find out the probability that a chi-square value will exceed 80. b. Determine this probability by using the preceding normal approtimation. ¢. Asstume that the dif. are now 100, Compute the probability from the chi- sqiare table as well as from the given normal approximation. What conclusions canyou draw from using thenommal approximation tothe chi-square distribution? 4.8. What is the importance of the central limit theorem in statistics? 4.9. Give examples where the chi-squiare and F probability distributions can be used. 4.10, Profits (X) in an industry consisting of 100 firms are normally distributed with a mean value of 1.5 million anda standard deviation (sd) of 120,00. Calculate a. P(X < $1 million) ‘b. P($800,000 < X = $1,300,000) 4.11, In problem 4.10, if10 percent of the firms are to exceed a certain profit, whatis that profit?CHAPTER FOUR: SOMEIMPORTANT PROBABILITY DISTRIBUTIONS "101 4.12, The grade-point average in an econometrics examination was normally distributed with a mean of 75.In a sample of 10 percent of students it was found that the grade-point average was greater than 80. Can you tell what the sd. of the grade-point average was? 4.13, The amount of toothpaste in a tube is normally distributed with a mean of 65 ounces and ans.d. of 0.8 ounces, The cost of producing each tube is 50 cents, Ifina quality control examination a tube is found to weigh less than 6 ounces, itis to be refilled to the mean value at a cost of 20 cents per tube, On the other hand, if the tube weighs more than. 7 ounces, the company loses a profit of 5 cents per tube, 1 1000 tubes were examined, a, How many tubes will be found to contain less than 6 ounces? b, In that case, what wil be the total cost of the refill? ¢. How many tubes will be found to contain more than 7 ounces? In that case, ‘what will be the amount of profits lost? 1X ~ N(10, 3) and Y ~ N(5, 8),and if X and Y are independent, what is the probability distribution of aX+Y bX-Y e3X d4X+5y 4.15. Continue with problem 4.14 but now assume that X and Y are positively correlated with a correlation coefficient of 0.6, 4.16. Let X and Y represent the rates of retum (in percent) on two stocks. You are told that X ~ N(15, 25) and ¥-~ N(8,4), and that the correlation coefficient between the two rates of return is ~0.4. Suppose you want to hold the two stocks in yout portfolio in equal proportion. What is the probability distribution of the retum on the portfolio? Is it better to hold this portfolio or to invest in only one of the two stocks? Why? 4.17. Return to Example 4.12. A random sample of 10 female $.A.T. scores on the math test gave a sample variance of 85.21. Knowing that the true variance is 88.88, what is the probability of obtaining such a sample value? Which probability distribution do you use to answer this question? What are the assumptions underlying that distribution? 4.18, The 10 economic forecasters of a random sample were asked to forecast the rate of growth of the real gross national product (GNP) for the coming year. 4 = Suppose the probability distribution of the r.v.—forecast—is normal, 2. The probability is 0.10 that the sample variance of forecast is more than X | ‘percent of the population variance. What is the value of X? . b, If the probability is 0.95 so that the sample variance is between X and Y per- v cent of the population variance, what willbe the values of X and Y? 4.19, When a sample of 10 cereal boxes of a well-known brand was reweighed, it gave the following weights (in ounces): 7613 1602 1590 1569 1600 4879 1601 1604 15961620 a What is the sample mean? And the sample variance? b. If the true mean weight per box was 16 ounces, what is the probability of obtaining such a (sample) mean? Which probability distribution do you use and why? 101102 PART ONE: BASICS OF PROBABILITY AND STATISTICS. 4.20, The same microeconomics examination was given to students at two different universities, The results were as follows: x =90,. 1m =50 R=, $=72 . m=40 where the Xs denote the grade averages in the two samples, the S, the two saniple variances; and the 7s, the sample sizes. How would you test the hypothesis that the population variances of the test scores in the two universities are the same? Which probability distribution would you use? What are the assumptions underlying that distribution? 4.21. Monte Carlo Simulation. Draw 25 random samples of 25 observations from the # distribution with k = 10 df. For each sample compute the sample mean. What is the sampling distribution of these sample means? Why? You may use graphs to illustrate your answer. 4.22, Repeat problem 4.21, but this time use the x? distribution with 8 df. 4,23. Repeat problem 4.21, but use the F distribution with 10 and 15 df. in the numerator and denominator, respectively. 424. Using Eq, (4.16), compare the values of yf) with Fio,o,Finae, and Fingy. What general conclusions do you draw? _ 425, Given X~ N(ux,03), prove that X~N(ux,03/n). Hint: var(X) = var(2+24--44), Expand this, recalling some of the properties of the variance discussed’in Chapter 3 and the fact that the Xjare iid. 5 Prove that Z = (*2#*), has zero mean anid unit whether Z is normal or not. Hint: E(2) = E 4.2CHAPTER STATISTICAL INFERENCE: ESTIMATION AND HYPOTHESIS TESTING Equipped with the knowledge of probability; random. variables; probability distributions; and characteristics of probability distributions, such as expected value, variance, covariance, correlation, and conditional expectation, we are now ready to undertake the impostant task of statistical inference. Broadly speaking, statistical inference is concerned with drawing conclusions about the nature of some population (e.g, the normal) on the basis of a random sample that has supposedly been drawn from that population. Thus, if we believe that a particular sample has come from a normal population and we compute the sample mean and sample variance from that sample, we may want to know what the true (population) mean is and what the variance of that population may be. Aa 8.1 THE MEANING OF STATISTICAL INFERENCE* As noted previously, the concepts of population and sample are extremely important in statistics, Population, as defined in Chapter 2, is the totality ofall possible outcomes of a phenomenon of interest (eg, the population of New York City). sampleisa subset of a population (eg,, the people living in Manhattan, which is cone of the five boroughs of the city). Statistical inference, loosely speaking, isthe study of the relationship between a population and a sample drawn from that population. To understand what this means, let us consider a concrete example, "Broadly speaking, there are two approachés to statistical inference, Bayesian and classical. (Classical approach, as propounded by statisticians Neyman and Pearson, is generally the approach shat a beginning student in statistics frst encounters. Although there are basic philosophical difer- ences in the two approaches, there may not be gross dilferencs in the inferences that result. er, 103104 PARTONE: BASICS OF PROBABILITY AND STATISTICS 194 TABLE 5-1 PRICE TO EARNINGS (P/E) RATIOS OF 28 COMPANIES. (ON THE NEW YORK STOCK EXCHANGE (NYSE), Company PIE Company PIE wwe. 96.02 IBM 22.94 JPM 12.10 INS 22.43 MCD 22.13 MRK 16.48 MSFT 33.75 MMM 26.05 MO 12.21 PG 24.49 ‘SBC 14.87 UTX 14.87 WMT 27.84 DIS 37.10 13, standard deviation = 9.49 ‘Source: www.stokselectorcom. Table 5-1 gives data on the price fo earnings ratio—the famous P/E ratio—for 28 companies listed on the New York Stock Exchange (NYSE) for February 2, 2004 (at about 3 p.m.)? Assume that this is a random sample from the universe (population) of stocks listed on the NYSE, some 3000 or so. The F/E ratio of 27.96 for Alcoa (AA) listed in this table, for example, means that on that day the stock was selling at about 28 times its annual earnings. The FYE ratio is one of the key indicators for investors in the stock market. Suppose our primary interest is not in any single P/E ratio, but in the sver- age P/E ratio in the entire population of the NYSE listed stocks. Since we can obtain data on the P/E ratios of all the stocks listed on the NYSE, in principle, ‘we can easily compute the average P/E ratio.-In practice, that would be time consuming and expensive. Could we use the data given in Table 5-1 to compute the average P/E ratio of the 28 companies listed in this table and use this (sample) average as an estimate of the average P/E ratio in the entire population of the stocks listed on the NYSE? Specifically if we let X = P/E ratio of a stock and X= the average P/E ratio of the 28 stocks given in Table 5-1, can we tell what the expected P/E ratio, E(X), isin the NYSE population asa whole? This process of generalizing from the sample value (e.g.,X) to the population value (eg,, E(X)] is the essence of statistion inference. We will now discuss this topic in some detail. 2ince the price of the stock varies from day to day, the F/E ratio will vary from day to day, even though the eamnings do not change. The stocks given inthis table are members ofthe so-called Dow 30. In reality stock prices change very frequently when the stock market is open, but most newspa- ‘Pets quote the P/E ratio as ofthe end ofthe business day.CHAPTER FIVE: STATISTICAL INFERENCE: ESTIMATION AND HYPOTHESIS TESTING 105 5.2 ESTIMATION AND HYPOTHESIS TESTING: TWIN BRANCHES. OF STATISTICAL INFERENCE From the preceding discussion it can be seen that statistical inference proceeds along the following lines, There is some population of interest, say, the stocks listed on the NYSB, and we are interested in stadying some aspect of this population, say, the P/E ratio. Of course, we may not want to study each and ‘every P/E ratio, but only the average P/E ratio, Since collecting information on ali the NYSE P/E ratios needed to compute the average P/E ratio is expensive and time-consuming, we may obtain a random sample of only a few stocks to get the P/E ratio of each’of these sampled stocks and compute the sample average P/E ratio, say, X. X is an estimator, also known as a (sample) statistic, of the population average P/E ratio, E(X), which is called the (population) para- tnetet; (Recall the discussion in Chapter 3). For example, the mean and variance’ are the parameters of the normal distribution. A particular numerical value of the estimator is called an estimate (e.g., an X value of 23). Thus, esti- tation is the first step in statistical inference, Having obtained an estimate of a Parameter, we next need to find out how good that estimate is, for an estimate is not likely to equal the true parameter value. If we obtain two or more random samples of 28 stocks éach and compute X for each of these samples, the two estimates will probably not be the same, This variation in estimates from sample to sample is known as sampling variation or sampling error’ Are there aniy criteria by which we can judge the “goodness” of an estimator? In Section 5.4 we discuss some of the commonly. used criteria to judge the goodness of an estimator. Whereas estimation is one side of statistical inference, hypothesis testing is the other. In hypothesis testing we may have prior judgment or expectation about what value a particular parameter may assume. For example,. prior knowledge or an expert opinion tells us'that the true average P/E ratio in the Population of NYSE stocks i, say, 20. Suppose a particular random sample of 28 stocks gives this estimate as 23. is this value of 23 close to the alue of 20? Obviously, the number 23 is different from the number 20. But the important question here is this: Is 23 statistically different from 20? We know that / because of sampling variation there is likely 40 be a difference between a (sam- c ple) estimate and its population value: Itis possible that statistically the number 23 may not be very different from the number 20, in which case we may not reject the hypothesis that the true average P/F ratio is 20. But how do we decide that? This is the essence of the topic of hypothesis testing, which we will discuss in Section 5.5, With these preliminaries, let us examine the twin topics of estimation and hypothesis testing in some detail. ‘Novice ta this spling ero ot debt, but tous because we ave a random n= ple and the elements included in the sample wil vary from sample to sample. This is inevitable in any analysis based on a sample.106 PART ONE: . BASICS OF PROBABILITY AND STATISTICS 5.3 ESTIMATION OF PARAMETERS In Chapter 4 we considered several theoretical probability distributions. Often we know or are willing to assume that a random variable X follows a particular distribution, but we do not know the value(s) of the parameter(s) of the distribution. For example, if X follows the normal distribution, we may want to noir the values ofits two parameters, namely, the mean E(X) = px and the variance 03, To estimate these unknowns, the usual procedure is to assume that we have a random sample of size n from the known probability distribution and . to use the sample {0 estimate the unknown parameters. Thus, we can use the sample mean as an estimate of the population mean (or expected value) and the sample variance ag an estimate of the population variance. This procedure is known as the problem of estination. The problem of estimation can be broken down into two categories: point estimation and interval estimation. To fix the ideas, assume that the random vasiable (.v), X (P/E ratio), is normally distributed with a certain mean and a certain variance, but for naw we do not know the values of these parameters. Suppose, however, we havea random sample of 28 P/E ratios (28 Xs) from this normal population, as shown in Table 5-1. : iow can we we this sample data fo compute the population mean value tix = E(X) and the population variance of? More specifically, suppose our immediate interest is in finding out j.x.* How do we go about it? An obvious choice is the sample mean X of the 28 P/E ratios shown in Table 5-1, which is 23.25, We call this single numerical oalue the point estimate of tx, and the for mula X= 2 X//n that we used to compute this point estimate is called the point estimator, or statistic. Notice that a point estimator, ora statistic, isan rv, as its value ‘will vary from sample to sample. (Recall our sampling experiment in Example 4-6.) Therefore, how teliable is a specific estimate such as 23.25 of the true j.x? In other words, how can we rely on just one estimate of the true population mean? Would it not be better to state that although X is the single best guess of the true population mean, the interval, say, from.19 to 24, most likely includes the true 1x? This is essentially the idea behind interval estimation. We will now consider the actual mechanics of obtaining interval estimates, The key idea underlying interval estimation is the notion of sampling, or probability, distribution of an estimator such as the sample mean X, which we have already discussed in Chapter 4. In Chapter 4 we saw that if an rv. X~ N(x, 03), then > B X~ (v= 2) en or (X- px) Z= fe TNO) (52) “This discussion can be easily extended to estimate 0%, 195‘CHAPTER FIVE: STATISTICAL INFERENCE: ESTIMATION AND HYPOTHESIS TESTING, 107. 2082 0 2052 FIGURE 6-1 The tdistibution for 27 df. Thats, the sampling distribution of the sample mean X also follows the normal distribution with the stated parameters > As pointed out in Chapter 4 of is not generally known, but if we use its estimator S? = -(X; ~ ¥?/(#— 1), then we know that 1 Fo) St follows the t distribution with (1 ~ 1) degrees of freedom (d.f). Tose how Equation (6.3) helps us to obtain an interval estimation of the px of ous P/E example, note that we have a total of 28 observations and, therefore, 27 4. Now if we consult the table (Table A-2) given in Appendix A, we notice that for 27 df, 63) P(-2.052 < ¢ < 2.052)=0.95 (5.4) as shown in Figure 5-1. That is, for 27 df, the probability is 0.95 (or 95 percent) that the interval (~2.052, 2052) will include the ¢ value computed from Eq, (63)! These t values, as we will see shortly, are known as eritical t values; they show what percentage of the area under the f distribution cusve (see Figure 5-1) lies ‘between those values (note that the total area under the curve is 1); # = ~2.052 iscalled the lower critical t valve and t = 2.052 is called the upper critical t value. i! Now substituting the f value from Eq. (6:3) into Eq, (5.4), we obtain ‘ (esp « Ea wx) 2 2052s Te S 21s2) 65) Simple algebraic manipulation will show that Equation (6.5) can be expressed equivalently as Se x. Se) (k- 2050 18.5, which is called a one-sided or one-tailed alternative hypothe- , OF Hx < 185, also a one-sided or one-tailed alternative hypothesis, or Hyjox £185, whichis called a composite, two-sided, or two-tailed alternative hypothesis. That is, the true mean value is either preater than or less than 18.5." ‘Totest the null hypothesis (against the altemative hypothesis), we use the sample data eg, the sample average P/E ratio of 23.95 obtained from the sample in ‘Table 5-1) and statistical theory to develop decision rules that will tell us whether the sample evidence supports the null hypothesis. If the sample evidence supports thenull hypothesis, we donot reject Ho, butifit does not, we reject Hp. In the latter case we may accept the alternative hypothesis, Hs, How do we develop these decision rufes? There are two complementary approaches: (1) confidence interval and (2) test of significance. We illustrate each ‘with the aid of our P/E example. Assume that Hopx = 185 Hyspex # 18.5 (@ two-sided hypothesis) ‘The Confidence Interval Approach to Hypothesis Testing Totest the null hypothesis, suppose we have the sample data given in Table 5-1. Front these data we computed the sample mean of 22.27. We kno from our discussion in Section 5.3 that the sample mean is distributed normally with mean p.x and variance 03/1. But since the true variance is unknown, we replace it with the sample variance, in which case we know that the sample mean follows the f distribution, as shown in Eq, (53). Based on thet distribution, we obtained the following 95% confidence interval for: 19.57 < px $26.93 (5.16) = (6.7) We know that confidence intervals provide a range of values that may include the true p.x with a certain degree of confidence, such as 95 percent. Theréfore, if 7A hypothesis is “something considered to be true for the purpose of investigation or argument” (Weiser) ora “hops aa basi foretcning ot as tating pn for father inves: tigation from known facts" (Oxford English Dictionary). ‘There are various ways of stating the null and alternative hypotheses. For example, we could have Hyyux 2 13 and Hhypy < 13, ae rae mo a116 PART ONE: BASICS OF PROBABILITY AND STATISTICS this interval does not include a particular null hypothesized value such as jx = 185, could wenot reject this null hypothesis? Yes, we can, with 95% confidence. From the preceding discussion it should be clear that the topics of confidence interval and hypothesis testing are intimately related. In the language of hypothesis testing, the 95% confidence interval shown in inequality (67) (see Fig. 5-2) is called the acceptance region and the area outside the acceptance region is called the critical region, or the region of rejection, of thenull hypoth- exis. The lower and upper limits of the acceptance region are called critical values. In this language, if the acceptance région includes the value of the parameter under Hg, we do not reect the null hypothesis. But if it falls outside the acceptance region (.e, it lies within the rejection region), we reject the null hypothesis. In our example we reject the null hypothesis that py = 18.5 since the acceptance region. given in Eq, (6.7) does not includes the rull-hypothesized value. It should be clear now why the boundaries of the acceptance region are called critical values, for they are the dividing line between accepting and rejecting a null hypothesis. ‘Type | and Type Il Errors: A Digression In our 2/ example we rejected Hox = 18.5 because our sample evidence of X = 23.25 does not seem to be compatible with this hypothesis. Does this mean that the sample shown in Table 5-1 did not come from a normal population whose mean value was 18.5? We cannot be absolutely sure, for the confidence interval given in inequality (6.7) is 95 and not 100 percent. If that is the case, we would be making an error in rejecting Hoywx = 185. In this case we are said to commit a type I error, that is, he error of rejecting a hypothesis when itis frue. By the same token, suppose Hux = 21, in which case, as inequality (6.7) shows, we would not reject this null hypothesis. But quite possibly the sample in Table 5-1 did not come from a normal distribution with a mean value of 21. Thus, we are said to commit a type If error, that is, the erro of accepting a false hypothesis, Schematically, Reject Hy Do not reject Hy Hhistue Type Vertor ~ Correct decision Hi isfalse Correct decision Type lero Ideally, we would like to minimize both these errors. But, unfortunately, for anty given sample size it isnot possible to minimize both errors simultaneously. The classical approach to this problem, embodied in the work of statisticians Neyman and Pearson, is to assume that atype I erroris likely to be more serious in practice than a type Il error. Therefore, we should try to Keep the probability Sioa dh ladepere ll soles wets a il to increase the sample size, which may notalways be easy.CHAPTER FIVE: STATISTICAL INFERENCE: ESTIMATION ANO HYPOTHESIS TESTING 117 of committing a type I error at a fairly low level, such as 0.01 or 0.05, and then. try to minimize a type IL ersor as much as possible. In the literature the probability of committing a type I error is designated as «and is called the level of significance," and the probability of committing a type Il error is designated as 8. Symbolically, ‘Type letror = a = ptob. (rejecting Ha | Hp is true) ‘ype error = p = prob. (accepting Hs | Hy is false) The probability of not committing a type II error, that is, rejecting Hy when it is false, is (1 — B), which is called the power of the test. The standard, or classical, approach to hypothesis testing is to fix a at levels such as 0.01 or 0.05 and then try to maximize the power of the test; that is, to minimize p. How this is actually accomplished is involved, and so we leave the subject for the references. Suffice it to nate thet, in-practice, the classical ap- ptoach simply specifies the value of « without worrying too much about B, But ‘keep in mind that, in practice, in making a decision there is a trade-off between the significance level and the power of the test. Thatis, for a given sample sie, if we try to reduce the probability of a type I error, we ipso facto increase the probability ofa type ff error and therefore reduce the power of the test. Thus, instead of using a =5 percent, if we were to use a = 1 percent, we may be very confident when we reject Hp, but we may not be so confident when we do not reject it. Sine the precedent point is important, let us illustrate. For our I7E ratio example, in Eq, (5.7) we established a 95% confidence, Let us still assume that Hy:px = 18.5 but now fix a = 1 percent and obtain the 99% confidence interval, which is (noting that for 99% CI, the critical t values are (-2.771, 2.771) for 27 df): 18.28 18.5? Although the confidence interval approach can be easily adopted to construct one-sided confidence intervals, in practice it is much easier to use the test of significance approach to hypothesis testing, which we will now discuss. The Test of Significance Approach to Hypothesis Testing The test of significance is an alternative, but complementary and perhaps shortes, approach to hypothesis testing. To see the essential points involved, retum to the P/E example and Eq. (5.3): We know that X=x t= Sida 6.19) = 6.3) follows the ¢ distribution with (1-1) dif. In any concrete application we will know the values of X, S,,and n. The only unknown value is wx. But if we specify a value for pix, as we do under Hy, then the right-hand side of Eq. (5.3) is ‘known, and therefore we will have a unique f value. And since we know that the ¢ of Eq. (5.3) follows the £ distribution with (1 — 1), we simply look up the ¢ table to find out the probability of obtaining such a f value. Observe that if the difference between X and px is small (in absolute terms), then, as Eq. (6.3) shows, the |! value will also be small, where |F| means the absolute f value. In the event that X = px, t will be zero, in which case we do not reject the null hypothesis, Therefore, asthe |t| value increasingly deviates from zero, toe will fen to reject the mull hypothesis. As thet table shows,for any given df, the probability of obtaining an increasingly higher |t| value becomes progressively smallec Thus, as ltl gets larger, we will be more and more inclined to reject the null hypothesis. But how large must |t| be before we can reject the mull hypothesis? The answer, as you would suspect, depends on a, the probability of committing a type error, as Well as on the d.f,, as we will demonstrate shortly. This is the general idea behind the test of significance app?vach to hypothesis festing. The key idea here is the test statistic—the t statistic—and its probability distribution under the hypothesized value of wx. Appropriately, in the present instance the test is known as the f test since we use the f distribution. (For details of the f distribution, see Section 4.2), In our P/E example X= 23.25, $, = 949 and n = 28. Let Hoipx = 18.5 and Hux # 18.5, as before. Therefore, _ 23.25 ~ 185 9.49//28 Js the computed f value such that we can reject the null hypothesis? We cannot answer this question without first specifying what chance we are willing to take if we reject the mull hypothesis when it js true, In other words, fo answer this, question, we must specify a, the probability of committing a type I error. Suppose we fix a at 5 percent. Since the alternative hypothesis is two-sided, ‘we want to divide the risk of a type J error equally between the two tails of the Ea > AA. = 2.6486 (6.20) 119120 PART ONE: BASICS OF PROBABILITY AND STATISTICS t distribiution—the two critical regions—so that if the computed ¢ value lies in either of the rejection regions, we can reject the null hypothesis, Now for 27 df, as we saw earlier, the 5% critical t values are ~2.052 and +2.052, as shown in Fig. 5-1. The probability of obtaining a t value equal to-or smaller than —2.0096 is 2.5 percent and that of obtaining a ¢ value equal to or greater than +2,0096 is also 2.5 percent, giving the total probability of committing a type I error of 5 percent. -As Fig. 5-1-also shows, the computed f value for our example is about 2.6, which obviously lies in the right tail critical region of the ¢ distribution. We therefore reject the null hypothesis that the true average P/E ratio is 18.5. If that ‘hypothesis were true, we would not have obtained a t value as large as 2.6 (in. absolute terms); the probability of our obtaining such a t value is much smaller than 5 percent—our prechosen ‘probability of committing’ a type I error. Actually, the probability is much smaller than 2.5 percent. (Why?) In the language of the test of significance we frequently come across the following two terms. 1. A test (statistic) is statistically significant, 2. A test (statistic) is statistically insignificant. ‘When we say thata testis statistically significant, we generally mean that we can reject the null hypothesis. That is, the probability that the observed difference between the sample value and the hypothesized value is. due to mere chance is small, less than o (the probability of a type J error). By the same token, when we say that a test is statistically insignificant, we do not reject the null hypothesis. in this case, the observed difference between the sample value and the hypothesized value could very well be die to sampling variation, or, due to mere chance (ie, the probability of the difference is much greater than a), When we-reject the null hypothesis, we say that our finding is statistically significant. Oh the other hand, when we do not reject the null hypothesis, we say thaf our finding is not statistically significant. o One or Two-Tailed Test? In all the examples considered so far the alternative hypothesis was two-sided, or two-tailed. Thus, if the average P/E ratio were equal to 18.5 urider Hy, it was either greater than or less than 18.5 under Hy. In this case if the test statistic fell in either tail of the distribution (ie., the rejection region), we rejected the null hypothesis, as is clear from Figure 5-7(2). However, there are occasions when the -null and altemnative hypotheses are one-sided, or one-tailed. For example, if for the P/E example we had Hopx $18.5 and Hyy.x > 18.5, the alternative hypothesis is one-sided. How do we test this hypothesis? ‘The testing proceduréis exactly the same as that used in previous cases except instead of finding out two critical values, we determine only asingle critical value of the test statistic, as shown in Fig, 5-7. As this figure illustrates; the probability of committing a type I error is now concentrated only in one tail of the probability distribution, fin the present case. For 27 df. and a = 5 percent, the # table willCHAPTER FIVE: STATISTICAL INFERENCE: ESTIMATION AND HYPOTHESIS TESTING. 121 95% te.35 025%) an25% #07 as) ~2052 0 ean @ 95% an5t tarde) 0 178 ae 95% te35 tQ7dt) o FIGURES-7 The ftest of significance: (2) Twotaed (glo (feta show that the one-tailed critical value is 1.703 (right tal) or ~1.708 (left tal), as shown in Fig. 5-7. For our P/E example, as noted before, the computed t value is about 2.43, Since the t value lies in the critical region of Fig, 5-7(0), this { value is statistically significant. That is, we reject the null hypothesis that the true average P/B ratio is equal to (or less than) 1865; the chances of that happening are much smaller than our prechosen probability of committing a fype I exror of 5 percent. ‘Table 5-2 summarizes the (f) test of significance approach to testing the two- tailed and one-tailed null hypothesis. In practice, whether we use the confidence interval approach or the test of significance approach to hypothesis testing is a matter of personal choice and convenience. In the confidence interval approach we specify a plausible range of values {ie confidence interval) for the true parameter and find out if the confidence interval includes the hypothesized value of that parameter. IFit does, we do not | - 124122 PART ONE: BASICS OF PROBABILITY AND STATISTICS ‘TABLES-2 ASUMMARY OF THE t TEST. Null hypothesis Altemative hypothesis Critical region Hy Reject Hy ft Hx= Ho Bx> Bo > leas, r= Po Pe bean ‘Note: we denotes the particular value of yx assumed under the null ‘hypothesis.. ‘The fist subscript on the ¢stalisic shown in tho lest cour tho fvel of igtcance, andthe second subscripts the dt. These ar the ricalt vals. reject that null hypothesis, but if it lies outside the confidence interval. we can reject the hypothesis. In the test of significance approach, instead of specifying a range of plausible values for the’ unknown parameter, we pick a specific value of the parameter suggested by the null hypothesis; compute a test statistic, such as the f statistic; and find its sampling distribution and the probability of obtaining a specific value of such a test statistic. If this probability is very low, say, less than a = 5 or 1 percent, we reject the particular null hypothesis. If this probability is greater than the preselected a, we do not reject the null hypothesis. A Word about Accepting or Rejecting a Null Hypothesis In this book we have used the terminology “reject” or “do not reject” a null hypothesis rather than “reject” or “accept” a hypothesis. This is in the same spirit as a jury verdict ina court trial that says whether a defendant is guilty or not guilty rather than guilty or innocent. The fact that a person is not found guilty does not necessarily mean that he or she is innocent. Similarly the fact that we do not reject a null hypothesis does not necessarily mean that the hypothesis is true, because another null hypothesis may be equally compatible with the data, For our P/E example, for instance, from Eq, (6.7) it is obvious any value of jx between 19.57 and 26.93 would be an “acceptable” hypothesis. f AWord on Choosing the Level of Significance, «, and the p Value ro ‘The Achilles heel of the classical approach fo hypothesis testing is its arbitrariness in selecting a. Although 1, 5, and 10 percent are the commonly used values of a, there is nothing sacrosanct about these values, As noted earlier, unless ‘we examine the consequences of committing both type I and type Il errors, we cannot make the appropriate choice of «. In practice, it is preferable to find the p value (ie, the probability value), also known as the exact significance level, of the test statistic. This may be defined as the lowest significance level at which a null Ieumathesis can be rejected.(CHAPTER FIVE: STATISTICAL INFERENCE: ESTIMATION AND HYPOTHESIS TESTING 123 To illustrate, in an application involving 20 df. a f value of 3.552 was obtained, The table given in Appendix A (Table A-2) shows that the p value for this ¢ is 0.001 (one-tailed) or 0.002 (two-tailed). That is, this ¢ value is statistically significant at the 0.001 (one-tailed) or 0.002 level (two-tailed). For our P/E example under the null hypothesis that the true P/E ratiois 18.5, we found that f = 2.43. If the alternative hypothesis is that the true P/E ratio is greater than 18.5, we find from the Table A-1 in Appendix A that P(f > 2.43) is about .01 This is the p value of the t statistic. We say that this t value is statistically significant at the. 0.01 or 1 percent level. Put differently, if we were to fix a = 0.01, at that level we can reject the null hypothesis that the true nx = 18.5. . Of course, this is a much smaller probability, smaller than the conventional a valite, such as 5 percentt, Therefore, we can reject the nll hypothesis much more emphatically than if we were to choose, say, « = 0.05. As a rule, the smaller the p value, the stronger the evidence against the mull hypothesis, ‘One virtue of quoting the p value is that it avoids the arbitrariness involved in fixing aat artificial levels, such as 1,5, 00 10 percent. If for example, in an application the p value of a test statistic (such as #) is, say, 0.135, and if you are willing to 3.5 percent, this p value'is statistically significant (.¢, youreject the null hypothesis at this level of significance). Nothing is Wrong if you want totakea chance ofbeing wrong 13.5 percent ofthe time if youreject the true null hypothesis, Nowadays several statistical packages routinely compute the p values of various test statistics, and it is recommended that you report these p values. The x? and F Tests of Significance: Besides the f test of significance discussed previously, in subsequent chapters we will need tests of significance based on the x? and the F probability distrib- tutions considered in Chapter 4. Since the philosophy of testing is the same, we will simply present here the actual mecharism with a couple of illustrative ‘examples; we will present further examples in subsequent chapters. The x? test of significance In Chapter 4 (see Example 4.14) we showed that if Sig the sample vatiance obtained from a random sample of n observations from a normal population with variance o2, then the quantity 3 (n-1) (3) ~ Xe 6.21) That is, the ratio ofthe sample variance to population variance multiplied by the df. (n.— 1) follows the y? distribution with (1-1) df. Ifthe df. and S* are known but c” is not known, we can establish a (1 ~ a) % confidence interval for the true but unknown o? using the x? distribution. The mechanism is similar to that for establishing confidence intervals on the basis of thet test. But if we are given a specific value of o? under Ho, we can directly compute the y? value from expression (6.21) and test its significance against the critical x? values given in the Table A-4 in Appendix A. An example follows. AREER 3124 PARTONE: BASICS OF PROBABILITY AND STATISTICS TABLE 5-3 124 Example 54, Suppose a random sample of 31 observations from a normal population gives a (sample) variance of S* = 12. Test the hypothesis that the true variance is 9 against the hypothesis that it is different from 9. Use a = 5%. Here Hoo? =9 and Kyo? £9, Answer: Putting the appropriate numbers in expression (6.21), we obtain: x? = 30(12/9) = 40, which has 30 d.f. From the Table A-4 in Appendix A, we observe that the probability of obtaining a x? value of about 40 ot higher (for 30 d.£) is 0.10 or 10 percent. Since this probability is greater than our level of significance of 5 percent, we do not reject the null hypothesis that the true variance is 9, ‘Table 5-3 summarizes the x? test for the various types of null and alternative hypotheses. ‘The F Test of Significance In Chapter 4 we showed that if we have two randomly selected samples from two normal populations, X and Y, with m and observations, respectively, then the variable & Fook s EK = Rim) = (5,22) Da -Hya-1) follows the F distribution with (m— 1) and (n - 1) d.f,, provided the variances of the two normal populations are equal, In other words, the Ho is 03 = of. To test this hypothesis, we use the F test given in Eq. (6.22). An example follows. Example 55, Refer to the S.A.) math scores for male and female students given in Example 4.15. The variances of these scores were (46.61) for male and (83.88) ASUNMARY OF THE x? TEST, Nullhypothesis Alsmative hypothesis _—_Cicl region A Hh Reject Hb i =o o> 05 PE tay hag 5G 98 haven Os ak Rte OFS ae) OF Xa Note:o8 Ws the valuo of ound the rl hypo eubectp!TABLE 5-4 ‘CHAPTER FIVE: STATISTICAL INFERENCE: ESTIMATION AND HYPOTHESIS TESTING 125 for female. The number of observations were 24 or 23 d.f. each. Assuming that these variances represent a sample from a much larger population of S.A.T. scores, test the hypothesis that the male and female population vari- ancés on the math past of theS.A.T. scores are the same. Use a = 1%. Answer: Here the F value is 83.88/46.62 = 1.80 (approx. This F value has the F distribution with 23 d.f, each. Now from the Table A-3 in Appendix A we see that for 24 df. (23 c.f. is not given in the table), the critical F vaiue at the 1% level of significance is 2.66, Since the observed F value of 1.80 is less than 2.74, it is not statistically significant. That is, at « ='1%, we do not reject the riull hypothesis that the two population variances are the same. - Example 5.6. Inthe preceding example, what is the p omlue of obtaining an F value of 1.80? ‘Using Minitab, we can find that for 23 d.f in the numerator and denominator, the probability of obtaining an F value of 1.8 or greater is about 0.0832 or about 8%, This is the p value of obtaining an F value of as much as 1.8 of greater. In other words, this is the lowest level of probability at which we can reject the null hypothesis that the two variances are the same. Therefore, in this case if we reject the null hypothesis that the two variances are the same, ‘we are taking the chance of being wrong 8 out of 100 times. Examples 55 and 5.6 suggest a practical strategy. We may fix a at some level (eg-,1,5,or 10 percent) and also find out the p value of the test statistic. Ifthe estimated p value is smaller than the chosen level of significance, we can reject the null hypothesis under consideration. On the other hand, if the estimated p value is greater than the preselected level of significance, we may not reject the null hypothesis, ‘Table 5-4 summarizes the F test. ASUMMARY OF THE F STATISTIC, Null hypothes's Alternative hypothesis: Ciitical region wh my Reject Hy if faop ob >of p> Feta s teed eee A> Ramat F< Ficaitat oles 1.0Fando? ore te tno popaionvadarces, 2, Stand Sf arthe two sample vances, 3. and at denote, respectively the numerator and denominator a. ‘Incompetent a "veo nfl ume 5 The ctial vas are gen inthe a coun, The rst bck tel pens ard een ete mead Snominaor dt ro we126. PART ONE: BASICS OF PROBABILITY AND STATISTICS To conclude this chapter, we summarize the steps involved in testing a statistical hypothesis: Step 1: State the null hypothesis Ho and the alternative hypothesis Hi (eg, Hy:jux = 18.5 and Hispx # 1855 for our P/E example). Step 2: Select the test statistic (e.g,, X) Step 3: Determine the probability distribition of the test statistic (eg., ~ Nuxof/n), Step 4: Choose the level of significance a,, that is, the probability of committing a type I error, (But keep in mind our discussion about the value.) Step 5: Choose the confidence interval or the test of significance approach: The Confidence Interval Approach Using the probability distribution of the test statistic, establish a 100(1 — a) % confidence interval. If this interval (ie., the acceptance region) includes the null-hypothestzed value, do not reject the null hypothesis, But if this interval does not include it, reject the mull hypothesis. The Test of Significance Approach Alternatively, you can follow this approach by obtaining the relevant test statistic (eg,, the t statistic) under the null hypothesis and find out the p value of obtaining a specified value of the test statistic from the appropriate probability distribution (eg, the t, For the x? distribution). If this probability is less than the prechosen value of a, you can reject thenull hypothesis, But if itis greater than a, do not eject it. Ifyou do not want to preselect o,, just present the p value of the statistic. Whether you choose the confidence interval or the test of significance approach, always keep in mind that in rejecting or not rejecting a null hypothesis you are taking a chance of being wrong a (or p value) percent ofthe time. Further uses of the various tests of significance discussed in this chapter will be illustrated throughout the rest of this book. 56 SUMMARY Estimating population parameters on the basis of sample information and testing hypotheses about them in light of the sample information are the two main branches of (classical) statistical inference. In this chapter we examined the essential features of these branches. KEY TERMS AND CONCEPTS The key terms and concepts introduced in this chapter are Statistical inference Critical ¢ values Parameter estimation Confidence interval a) point estimation a) confidence coefficient b) interval estimation +b) random interval (lower limit, Comnline (nenhahility distribsstion ‘upper limit) aQUESTIONS CHAPTER FIVE: STATISTICAL INFERENCE: ESTIMATION AND HYPOTHESIS TESTING 127 Level of significance Confidence interval (approach to Probability of committing atypelerror hypothesis testing) Properties of estimators a) acceptance region a) linearity b) critical region; region of b) unbiasedness rejection 6) minimum variance critical values ) efficiency (efficient estimator) Type Fervor (a) ) best lineas unbiased estimator confidence coefficient (1 - a) @LUE) ‘Type error (8) £) consistency (consistent power of the test (1 ~ 8) estimator) ‘Tests of significance (approach to Hypothesis testing hypothesis testing) a) null hypothesis a) Test statistic; f statistic; f test ») altemative hypothesis b)x? test ©) one-sided; one-tailed oF test ) two-sided; two-tailed; thep value composite 5.1. What is the distinction between each ofthe following pairs of terms: a. Point estimator and interval estimator: b. Null and alternative hypotheses. . Typel and type Il errors. 4. Confidence coefficient and evel of significance, e. Typelll error and power. 5.2. Whatis the meaning of =~. a. Statistical inference. Critical value of atest b. Sampling distribution. —§, Level of significance, " ¢ Acceptance region. _g, Thep value. } G, ‘Test statistic. . 53. Explain carefully the meaning of \ ‘a. An unbiased estimator. d. A linear estimator. N b, Aminimum variance estimator. e: Abest linear unbiased estimator (BLUE). N ¢, Abest, ot efficient, estimator. 5.4. State whether the following statements are true, false, or uncertain. Justify your answers, a. An estimator of a parameter is a random variable, but the parameter is non- _ random, or fixed. ‘b. An unbiased estimator of a parameter, say, x, means that it will always be ‘equal to wx. ¢, Anestimator can be a minimum variance estimator without being unbiased. — di, An efficient estimator means an estimator with minimum variance. , Anestimator can be BLUE only if its sampling distribution is normal. £, An’ acceptance region and a confidence interval for any given problem ‘means the same thing. . AtypeT error occitrs when we reject the null hypothesis even thanch itis false. ~ - ve 127128 PART ONE: BASICS OF PROBABILITY AND STATISTICS PROBLEMS 123 Ah, Atype If error occurs when we reject thenull hypothesis even though it may be true, J.As the degrees of feedom (df) incase indefinitely, the ¢ distribution approaches the normal distribution. j. The central limit theorem states that the sample mean is always distributed normally. k. The terms evel of significance and p value mean the same thing, 53. Explain carefully the difference between the confidence interval and test of ___ significance approaches to hypothesis testing, 58. Suppose in an example with 40 d.f that you obtained a t value of 1,35. Since its p value is somewhere between a 5 and 10 percent level of significance (one- tailed), itis not statistically very significant. Do you agree with this statement? Why or why not? 3. Bind the critical Z values in the following cases! a. «= 0.05 (two-tailed test) b. «= 005 (one-tailed test) 58. Find the critical # values in the following cases: a. 1=4,a=0.05 (twotailedtest) di, b. n=4,a = 005 (one-tailed test) e c n=14,a=0,01 (twortailed test) . f. n= 200, 59. Assume that the per capita income of residents in a country is normally distributed with mean 2 = $1000 and variance 0? = 10,000 ($ squared). a; What is the probability that the per capita income lies between $800 and $1200? b. What is the probability that it exceeds §12002 c. What is the probability that it is less than $800. d. Is it true that the probability of per capita income exceeding $5000 is practically ze10? ; 5.10. Continuing with problem 539, based on a random sample of 1000 members, suppose that you find the sample mean income, X, to be $900. 2. Given that 1 = $1000, what is the probability of obtaining such a sample mean value? 1, Based on the sample mean, establish a 95% confidence interval for p. and ind out if this confidence interval includes . = $1000. If it does not, what conclusions would you draw? . Using the test of significance approach, decide whether you want to accept or réject the hypothesis that y. = $1000, Which test do you tse and why? 5.11, ‘The number of peanuts contained in ajar follows thenormal distribution with mean y, and variance o?. Quality contzol inspections over several periods show that 5 percent of the jars contain less than 65 ounces of peanuts and. 10 percent contain more than 6.8 ounces. a. Bind pando?, . What percentage of bottles contain more than 7 ounces? 3.12, The following random sample was obtained from.a normal population with mean ut and variance = 2.5.13, 514, 5.15. Bale 6.1; 5.18, 5.9, 5.20, CHAPTER FIVE: STATISTICAL INFERENCE: ESTIMATION AND HYPOTHESIS TESTING 129 Note: use a = 5%, . What is the p value in part (2) ofthis problem? Based on a random sample of 10 values from a normal population with mean stand stadard deviation o, ou calculated that X= 8 and the sample stan- dad deviation = 4 Catia a 95% conidence interval for the population mean, Which probability distribution do you use? Why? You are told that X~ N(ux = 8, 3 = 36). Based on a sample of 25 observa- 5. tions, you found that a. What is the sampling distribution of X?_ ', What is the probability of obtaining an X= 7.5 or less? ¢. From your answer in part (t) of this problem, could such a sample value have com from the preceding population? Compute the p values in the following cases: . 12 1.72,d£=24 b. Z>29 ©. F > 2.59,df = 3 and 20, respectively 19,4. =30 ‘you cannot get an exact answer from the various probability distribu- tn tables, try to obtain them from a program such as Minitab. 5. In an application involving 30 d.f. you obtained a f statistic of 0.68. Since this ¢ value is not statistically sipificant even at the 10% level of significance, you can safely accept the relevant hypothesis, Do you agree with this statement? What is the p value of obtaining such a statistic? Let X~ N(ux, 03). A random sample of three observations wes obtained from this population. Consider the following estimators of pix: + a tate and n= a. Is fix an umbiased estimator of uy? What about fi? b. If both estimators are unbiased, which.one would you choose? (Hint: ‘Compare the variances of the two estimators.) Refer to Problem 4.10in Chapter 4. Suppose atandom sample of 10 firass gave a ‘mean profit of $900,000 and a (sample) standard deviation of $100,000. a. Establisha 95% confidence interval for the true mean profit in the industry. b, Which probability distribution do you use? Why? Refer to Example 4.14 in Chapter 4, a, Establish a 95% confidence interval for the true 0”, b, Test the hypothesis that the true variance is 82, Sixteen cars were first driven with a standard fuel and then with Petrocoal, a gasoline with a methanol additive. The results of the nitrous oxide emissions (NO,) test are as follows: Type of Fuel Average NO, Standard Deviation of NO, ‘Standard 4.075 Q5786 Petrocoal 1.459 0.6134 ‘Source: Wiha ©, Finkelelein and Bouse Levin, Statistics for Lawyers, 12)190 PART ONE: BASICS OF PROBABILITY AND STATISTICS a. How would you test the hypothesis that the two population standard deviations are the same? . Which test do you use? What are the assumptions underlying that test? 5.21. Show that the estimator given in Eq: (6.14) is biased. (Hint: Expand Eq, (6.14), and take the expectation of each term, keeping in mind that the expected value of each X;is wx). 5.22. One-sided confidence intereal, Retum to the P/E example in this chapter and look at the two-sided 95% confidence interval given in Eq, (6.7). Suppose you ‘want to establish a one-sided confidence interval only, either an upper bound or a lower bound, How would you go about establishing such an interval? (Hint: find the one-tail critical t value).PART { { THE LINEAR REGRESSION MODEL ‘The objective of Part ii, which consists of five chapters, is to introduce you to the “bread-and-butter” tool of econometrics, namely, the linear regression model. Chapter 6 discusses the basic ideas of linear regression in. terms of the simplest possible linear regression model, in particular, the two-variable model. We make an important distinction between the population regression model and the sample regression model and estimate the former from the latter. This estimation is done using the method of least squares, one of the popular methods of estimation. Chapter 7 considers hypothesis testing. As in any hypothesis testing in statistics; we try to find out whether the estimated values of the parameters of the regression model are compatible with the hypothesized values of the parameters. We do thisthypothesis testing in the context of the classical linear regres sion model (CLRM). We discuss why the CLRM is used and point out that the CLRM is a useful stasting point, In Part Il we will reexamine the assumptions of the CLRM to see what happens to the CLRM if one o more of its assumptions are not fulfilled. Chapter 8 extends the idea of the two-variable linear regression model developed in the previous two chapters to multiple regression models, that is, models having more than one explanatory variable. Although in many ways the multiple regression model is an extension of the two-variable model, there are differences when it comes to interpreting the coefficients of the model and in the hypothesis-testing procedure. The linear regression model, whether two-variable or multivariable, only requires that the parameters of the model be linear; the variables entering the model need not themselves be linear. Chapter 9 considers a variety of models Ba AML182 PART Wo: THE LINEAR REGRESSION MODEL, 122 that are linear in the parameters (or can be made so) but are not necessarily linear in the variables. With several illustrative examples, we point out how and where such models can be used. ‘Ofteri the explanatory variables entering into a regression model are qualitative in nature, such as sex, race, and religion. Chapter 10 shows how such variables can be measured and how they entich the linear regression model by taking into account the influence of variables that otherwise cannot be quantified. Part II makes an effort to “wed” practice to theory, The availability of user- friendly regression packages allows you to estimate a regression model without knowing much theory, but temember the adage that “a little knowledge is a dangerous thing.” So even though theory may be boring, itis absolutely essential in understanding and interpreting regression results. Besides, by omitting al] mathematical derivations, we have made the theory “less boring.”CHAPTER BASIC IDEAS OF LINEAR REGRESSION: THE TWO-VARIABLE MODEL Jn Chapter 1 we noted that in developing a model of an economic phenomenon (eg, the law of demand) econometricians make heavy use of a statistical technique known ais regression analysis. The purpose ofthis chapter and Chapter 7 is to introduce the basis of regression analysis in terms ofthe simplest possible linear regression model, namely, the two-vasiable model, Subsequent chapters ‘will consider various modifications and extensions of the two-variable model. 6.1 THE MEANING OF REGRESSION ‘Asnoted in Chapter 1, regression analysis is concerned with the study of the relationship between one variable called the explained, or dependent, variableand ‘one or more other variables called independent, or explanatory, variables. ‘Thus, we may be interested in studying the relationship between the quantity demanded of a commodity in terms of the price of that commodity, income of the consumer, and prices of other commodities competing with this commodity. Or, we may be interested in finding out how sales ofa product (e.g., automobiles) are related to advertising expenditure incurred on that product. Or, we may be interested in finding out how defense expenditures vary in relation to the gross domestic product (GDP). In all these examples there may be some underlying theory that specifies why we would expect one variable to be dependent or related to one or more other variables. In the first example, the law of demaind provides the rationale for the dependence of the quantity demanded of a product on its own price and several other variables previously mentioned. For notational uniformity, from here on we will let ¥ represent the dependent variable and X the independent, ot explanatory, variable. If here is more than oa AE184 PARTTWO: THE LINEAR REGRESSION MODEL one explanatory variable, we will show the various X’s by the appropriate subscripts (X1, Xz, Xe ete.). Itis very important to bear in mind the warning given in Chapter 1 that, although regression analysis deals with relationship between a dependent variable and one or more independent variables, it does not necessarily imply causa tion; that is, it doesnot necessarily mean that the independent variables are the cause and the dependent variable is the effect. If causality between the two exists, ittmust be justified on the basis of some (economic) theory. As noted earlier, the law of demand suggests that if all other Variables are held constant, the quan~ tity demanded of a commodity is (inversely) dependent on its own price. Here microeconomic theory suggests that the price may be the causal force and the quantity demanded the effect. Aloays Keep in mind that regression does not necessarily imply causation. Causality must be justified, or inferred, from the theory that underlies the phenomenon that is tested empirically. Regression analysis may have one of the following objectives: 4. To estimate the mean, oF average, value of the dependent variable, given the values of the independent variables. To test hypotheses about the nature of the dependence—hypotheses suggested by the underlying economic theory. For example, in the demand function mentioned previously, we may want to test the hypothesis that the price elasticity of demand is, say, -1.0; that is, the demand curve has unitary price elasticity. Ifthe price of the commodity goes up by 1 percent, the quantity demanded on the average goes down by 1 percent, assuming all other factors affecting demand are held constant. To predict, or forecast, the mean value of the dependent variable, given the values of the independent variable(s) beyond the sample range. ‘Thus, in the S.AT. example discussed in Chapter 4, we may wish to predict the average score on the verbal part of the S.A.T. for a group of students who know their scores on the math part of the test (see Table 6-15). 4, One or more of the preceding objectives combined. vy 62 THE POPULATION REGRESSION FUNCTION (PRF): HYPOTHETICAL EXAMPLE . To illustrate what all this means, we will consider a concrete example. Several states in the United States run lotteries several times a week to generate money for social expenditures, such as education. In New York State the most popular lottery game is Lotto. Suppose we are interésted in finding outhow much money people ata certain income level spend on Lotto in New York State each week. Let ¥ represent the weekly expenditure on Lotto and X represent the weekly personal disposable (ie., after-tax) income (PDI), both measured in dollars. For simplicity of exposition, assume that a hypothetical population of 100 Lotto players is divided into 16 PDI classes in increments of $25, starting with $150 and ending with €275 The data nn their weekly Lotto expenditures given in Table 6.1,CHAPTER SIX: BASIC JDEAS OF LINEAR REGRESSION: THE TWO-VARIABLE MODEL 135, ‘TABLE 6-1 WEEKLY LOTTO EXPENDITURE IN RELATION TO WEEKLY PERSONAL DISPOSABLE INCOME. Personal Disposable Income (§) ‘Weeldy Expenciture on Lotto (6) Customer 150175 200-225 250 275 800 AB 350 «B75 + 28 33H Haka 5B 2 7 3 st 8 B 7 8 6%. Be 3 2B 2 9 St 3 3 3 3 88 oF 4 8 9 B@ B@ M 9 3 9 3 5 23 8 Bw 6 8 2 2 2 2% 7 2 3 9 9 7 8 418 «420 38 HG BCD 8 2 68 7 2 2 B 4 9 w Bt 9 8 4 «6 8 0 BW 2B 8 B Ce rr “uo ees Moan 20.90 22.40 2440 2640 27.90 2920 90.20 3190 320 33.60 Table 6.1 is to be interpreted as follows: Ata weekly disposal income of $150, there are 10 people who spent anywhere from $12 to $28 per week on lottery: tickets. The average amount these 10 people spent is $20.90 per week. Likewise, at the income level of $375 there are 10 people who spent anywhere from $30 to $46 on lottery tickets, the average for this group being $33.60 per week. Other figures in the table are to be interpreted similarly. Let us plot the data of Table 6-1 in a scatfetgram or scatter diagram (see Figure 6-1) with weekly expenditure on Lotto (Y) on the vertical axis and weekly petsonal disposable income, or PDI, (X) on the horizontal axis. As this scattergram shows, corresponding to any given X value there are sev- eal Y values, 10 in the present instance." In other words, each X has associated ; with ita Y population—the totality of ¥ corresponding to that X (recall the definition of population ftom Chapter 2) In the present case we have 10 subpopu, f lations of ¥. ¢ What does the scattergram reveal? The impression that you getis thatgenerally N there isa tendency for Y to increase as X increases—people with higher incomes ‘\ are generally likely to spend more money on the Lotto. This tendency is all the ‘more pronounced if we concentrate on the circled points which give the mean or average value of Y corresponding to the vatious Xs, Mate formally they are called conditional mean, or conditional expected, values. (See Chapter 3 for detals,) TE we connect the various expected values of Y, the resulting line is called the population regression line (PRI). The PRL gives the average, or mean, value of the dependent variable (Lotto expertture in ont example corresponding to each value of the _ independent variable (personal disposal income in our example) in the population as a or simplicity of exposition only we are assdming that there are 10 Y values corresponding to eachX value. In reality, the mumiber of Yves against each X may be large and each income group need not have the same mumber of observations. ae196 PART TWO: THE LINEAR REGRESSION MODEL FIGURE 6-1 a Weekly Expenditure on Lotto ($) L 18D 175 200 225 250 275 300 325 50 375 “Weekly Personal Disposable Income ($) ‘Westiy expenditure on Lotto (6) and weekly personal disposable income (8). whole. Thus, corresponding to a weekly income of $200, the average weekly expenditure on Lotto is $24.40, corresponding to a weekly income of $350, the average weekly expenditure on Lottois $33.00,ando on. In short, the PRLisa line that ellsushow themean, oraverage, value of ¥ (orany dependent variable) is e- lated to each value of X (or any independent variable) in the population as whole, Since the PRL sketched in Figure 6-1 is approximately linear, we can express, it mathematically in the following functional form: E(Y|X) = Bi + BX 6a) which is the mathematical equation of straight line. In Equation (6.1), E(Y X;) means the mean, or expected value, of ¥ corresponding to, or conditional upon, a given value of X. The subscript i refers to the ith subpopulation. Thus, in ‘Table 6-1 E(Y | X; = 150) is 20.90, which is the mean, or expected, value of Yin the first subpopulation (ie,, comesponding to X = 150). ‘The last row in Table 6-1 gives the conditional mean values of Y. It is very important to note that E(Y |X} is a function of X; (linear in the present example). This means that the dependence of Y on X, technically called the regression of Y on X, can be defined simply as the mean of the distribution of Y values (as in Table 6-1), which has the given X. In other words, the population regression line (PRL) is a line that passes through the conditional means of Y. The mathematical form in which the PRL is expressed, such as Eq, (6.1), is called the populationCHAPTER SIX: BASIC IDEAS OF LINEAR REGRESSION: THE TWO-VARIABLE MODEL 137 regression function (PRE), as it represents the regression line in the population a8 Whole. In the present instance the PRF is linear. (The more technical meaning of linearity is discussed in Section 6.6.) In Bg, (6.1) By and By are called the parameters, also known as the regression coefficients, B, is also known as the intercept (coefficient) and B, as the slope (coefficient), The slope coefficient measures the rateofchange in the (conditional) mean value of ¥ per unit change in X. If, for example, the slope coettcient were 0.06, it would suggest that if PDI increases, say, by a dollar, the (conditional) mean value of Y increases by six cents; that is, on the average, expenditure on Lotto will go up by six cents, Bs the (conditional) mean value of Yif Xs zero;it gives theaverage value ofthe expenditure on Latto if per capita disposable income werezero. Butwe will have more to say about this interpretation of the intercept later in the chapter. . How do we go about finding out the estimates, or numerical values, of the intercept and slope coefficients? We explore this question in Section 68. Before moving on, a word about terminology is in order. Since in regression analysis, as noted in Chapter 1, We are concerned with examining the behavior of the dependent variable conditional upon the given values of the independent variable(s), our approach to regression analysis can be termed conditional regression analysis.” As a result, there is no need to use the adjective “conditional” all the time, Therefore in the future expressions like E(Y |X) will be simply oritten as E(Y), ‘with the explicit understanding that the latter in fact stands for the former. Of course, where there is cause for confusion, we wil use the more extended notation. 6.3 STATISTICAL OR STOCHASTIC SPECIFICATION OF THE POPULATION REGRESSION FUNCTION As we just discused, the PRF gives the average value of the dependent variable corresponding to each value of the independent variable. Let us take another look at Table 6-1. We know, for example, that corresponding to X= $300, the * average Vis $90.30. But if we pick one customer at random from the 10 customers corresponding to this income, We know that expenditure on Lotto by that cus tomer will not necessarily be equal to the mean value of $90.30. Tobe concrete, take the last customerin this group. His or her expenditure on Lottois$23,which is below the mean value. By the same token, ifyou take the first customer in that group, his orhet expenditure is $42, which is above the average value, How do you explain the expenditure of an individual customer in relation to income? ‘The best we can do is to say that any individual's expenditure on ‘The fact that our analysis is conditional on X does not mean that X causes Y. It is just that we ‘want ose the behavior of Yin relation to an X variable thats of intrest to the analyst. For example, when the Federal Reserve Bank (the Fed) changes the Federal Punds rate, itis interested in finding out how the economy respostds. Following the 2000 recession in the United States, the Fed reduced the Federal Funds rate several times to resuscitate the ailing econonty. One of the key de- terminants of the demand for housing is the mortgage interest rate. It is therefore of great interest to prospective homeowners to track the mortgage interest rates. When the Fed reduces the Federal Funds rat, all other interest rates follow suit. 137198 PARTTWO: THE LINEAR REGRESSION MODEL FIGURE 6-2 Weekly Expenditure on Lotto ($) ee (eee ee cae rene Weekly Personal Disposable Income ($) g 8 Lotto is equal to the average for that group plus or minus some quantity. Let us express this mathematically as = Bt BK tu (6.2) where w is known as the stochastic, or random, esror-term, or simply the error term? We have already encquntered this term in Chapter 1, The error term is a random variable (c.v.), for its value cannot be controlled or known a priori. As we know from Chapter2,arv. is usually characterized by its probability distribution (eg,-the normal or the t distribution), How do we interpret Equation (6.2)? We-can say that an individuals expenditure on Lotto, say, the ith individual, corresponding to a specific PDI can be ‘expressed as the sum of two components. The first component is (Bi + B2X;), which is simply the mean, or average, expenditure on Lotto in the ith subpopulation; thatis, the point on the PRL corresponding to the specific PDI This component may be called the systematic, ot deterministic, component. The second component is 1, which may be called the nonsystemintic, or random component (ie,, determined by factors other than income). The error term u; is also known as the noise component. To see this clearly, consider Figure 6-2, which is based on the data of Table 6-1. "he word stochastic comes froma the Greek word stokias meaning a "bull's eye.” The outcome of throwing darts onto a dart board isa stochastic process, thats process fraught with misses. In st tistics, e word implies the presence of a random variable—a variable whose outcome is deter-CHAPTER SIX: BASIC IDEAS OF LINEAR REGRESSION: THE TWO-VARABLE MODEL 139 As this figure shows, at PDI = $150, one customer spends $25 on the Lotto, ‘whereas the average expenditure on Lotto at this income level is $20.90. Thus, this person's expenditure on Lotto exceeds the systematic component (ie., the amiean for the group) by $4.10. So his or her component is +41 units. On the other hand, at PDI = §300, a randomly chosen second customer spends $24 on the Lotto, whereas the average expenditure for this group is 0.30, This person’s Lotto expenditure is less than the systematic component by $6.30 units; his or her « component is thus -6.3, Eq, (6.2) is called the stochastic (or statistical) PRE, whereas Eq, (6.1)is called the deterministic, or nonstochastic, PRE The latter represents the means of the various Y values corresponding to the specified income levels, whereas the for- ser tells us how individual lotto expenditures vary around their mean values due to the presence of the stochastic error term, 1 ‘What is the natuse of the « term? 6A THE NATURE OF THE STOCHASTIC ERROR,TERM 1. The esror term may represent the influence of those vasiables thatare not explicitly included in the model. For example in our Lotto scenario it may very well represent influences, such as a person's wealth, the area where he or she lives, peer pressure, or previous Lotto winnings. ‘2, Even if we included all the relevant variables determining the expenditure on the Lotto, some intrinsic randomness in Lotto expenditure is ‘bound to occur that cannot be explained no matter how hard we try. Human behavior, after all, is not totally predictable or rational. Thus, 1 ‘may reflect this inherent randomness in human behavior, may also represent errors of measurement. For example, the data on PDI may be rounded or the data on Lotto expenditure may be suspect because in some communities gambling may be frowned upon. . The privciple of Ockham’s razor—that descriptions be kept as simple as possible until proved inadequate—would suggest that we keep our regression model as simple as possible. Therefore, even if we know what other variables might affect Y, their combined influence on Y may be so small and nonsystematic that you can incorporate it in the random term, u, Remember that a model isa simplification of reality. If we truly want to build reality into a model it may be too unwieldy to be of any practical use. In model building, therefore, some abstraction from reality is inevitable. By the way, William Ockham (1285-1349) wasan English philoso- pher who maintained that a complicated explanation should not be accepted without good reason and wrote “Frustra fit per plura, quod fieri potest per paucfora—I is vain to do with more what canbe done with fess.” Ikis for one or more of these reasons that an individual consumer's expenditure on Lotto will deviate from his or her group average (ie, the systematic component), And as we will soon discower, thie errr tera wave om rte140 PARTTWO: THE LINEAR REGRESSION MODEL 6.5 THE SAMPLE REGRESSION FUNCTION (SRF) TABLE 6-2 ing How do we estimate the PRF of Eq, (61), that is, obtain the values of By and By? If we have the data from Table 6-1, the whole population, this would be a relatively straightforward task, All we have to dois to find the conditional means of Y corresponding to each X and then join these means. Unfortunately, in prac- fice, we rarely have the entire population at our disposal. Often we only have a simple from this population, (Recall from Chapters 1 and 2 our discussion regarding the population and the sample.) Our task heres to estimate the PRF on the basis of the sample information. How do we accomplish this? Pretend that you have never-seen Table 6-1 but only had the data given in. . Table 62, which presumably represent a randomly selected sample of Y values corresponding to the X values shown in Table 6-1. i Unlike Table 6-1, we now have only one Y value corresponding ‘to each X. The important question that we now face is: From the sample data of Table 6-2, can we estimate the average expenditure on Lotto in the population as a whole corresponding:to each X? In other words, can we estimate the PRF from the sample data? As you can well surmise, we may not be able to,estimate the PRF accurately because of santplrig fluctuations, or sampling error, a topic we dis- ‘cussed in Chapter 4, To see this clearly, suppose another random sample, which is shown in Table 6-3, is drawn from the population of Table 6-1. If we plot the data of Tables 6-2 and 6-3, we obtain the scattergram shown in Figure 6-3. ‘Through the scatter points we have drawn visually two straight lines that fit the scatfer points reasonably well, We will call these lines the sample regression. liries (SRLs). Which of the two SRLs represents the true PRL? If we avoid the temptation of looking at Figure 6-1, which represents the PRL, there is noway we can be sure that either of the SRLs shown in Figure 6-3 represents the true PRL. Forifwehad yet another sample, we would obtaina third SRL. Supposedly, each SRL represents the PRL, but because of sampling variation, each is at best an approximation of the true PRL. In ‘general, we would get K different SRLs for K different samples, and all these SRLs are not likely to be the same. ‘ARANDOM SAMPLE TABLE 6-9" ANOTHER RANDOM FROM TABLE 6-1: SAMPLE FROM TABLE 6-1. y x y x 18 150 23 150 24 175 8 175 6 200 24 200 % 225 25 225 30 20 28 250 2 25 27 215 4 200 a 300 35 225 29 cd 3 350 33 350 40 375 34 375FIGURE 6-3 CHAPTER SiX: BASIC IDEAS OF LINEAR REGRESSION: THE TWO-VARIABLE MODEL 141 (© Sample ‘8 Sample? SRL s & 8 & Weekly Expenditure on Lotto ($) 6 — 1200 160-2000 02836 ‘Weekly Personal Disposable Income §) Saniple regression lines based on two independent samples, Now analogous to the PRF that underlies the PRL, we can develop the concept of the sample regression function (SRE) to represent the SRL. The sample counterpart of Eq. (6.1) may be written as Yad + bX; 63) where ” js read as “hat” or “cap,” and where Yj = estimator of E(Y IX), the estimator ofthe population conditional mean by = estimnator of By by = estimator of By Asnoted in the preceding chapter, an estimator, or a sample statistic, is a rule or a formmla that suggests how we can estimate the population parameter at hand. A particular numerical value obtained by the estimator in an application, as we know, is an estimate. (Recall from Chapter 5 the discussion on point and interval estimators.) If we look at the scattergram in Figure 63, we observe that not all the sample data lie exactly on the respective sample regression lines. Therefore, just as we developed the stochastic PREF of Eq, (6.2), we need to develop the stochastic version of Eq, (6.3), which we write as YahthXt+e (64) where ¢; = the estimator of 1. 22 ON142 PARTTWO: THE LINEAR REGRESSION MODEL FIGURE 64 We call ¢; the residual term, or simply the residual. Conceptually, itis analogous to u; and can be regarded as the estimator of the latter. It is introduced'in - the SRF for the same reasons as u; was introduced in the PRE. Simply stated ¢; represents the difference between the actual Y values and their estimated values from the sample regression. That is, a=%-% (6.5) To summarize, our primary objective in regression analysis is to estimate the (stochastic) PRF x 1 + By Xi +m ‘on the basis of the SRF Yah thXte because more often than not our analysis is based on a single sample from some population. But because of sampling variation, our estimate of the PRF based on the SRF is only approximate. This approximation is shown in Figure 6-4. Keep in mind that we actually do not observe By, By and u. What we observe are their proxies, by, by, and e, once we have a specific sample. For a given Xi, shown in this figure, we have one (sample) observation, Y;.In terms of the SRE, the observed Y; can be expressed as N= Vite (6.6) sa Y= 28%, PRE E18) = 8, +B, X; Weeldy Expenditure on Lotto ($) Weekly Disposable Income (§) ‘The population and sample regression ines.‘CHAPTERSIX: BASIC IDEAS OF LINEAR REGRESSION: THE TWO-VARIABLE MODEL 143 and in terms of the PRF it can be expressed as. Y= EX) tu (67) Obviously, in Figure 6-4, % underestimates the true mean value E(Y|X,) for the X; shown therein, By the same token, for any Y to the right of point A in Figure 6-4 (e.g, Ya), the SRF will overestimate the true PRE. But you can readily see that such over- and underestimation is inevitable due to sampling fluctuations. The important question now is: Granted that the SRF is only an approximation of the PRE, can we find a method or a procedure that will make this approximation as close as possible? In other words, how should we construct the SRF 50 that by is as close as possible to B, and by is as close as possible to By, because generally we do not have the entite population at our disposal? As we will show in Section 68, we can indeed find a “best fitting” SRF that will mirror the PRF as faithfully as possible. It is fascinating to consider that this can be done ven though we never actually determine the PRE itself. 6.6 THE SPECIAL MEANING OF THE TERM “LINEAR” REGRESSION Since in this text we are concemed primarily with “linear” models like Eq, (6.1), itis essential to know what the term linear really means, for it can be interpreted in two different ways. Linearity in the Variables The first and perhaps the more “natural meaning of linearity is that the conditional mean value of the dependent variable is.a linear function of the indepen- | dent variable(s) as in Bq, (6.1) or Eq. (62) orin the sample counterparts, Eqs. (6:3) \ artd (64) Tn this interpretation, the following functions are not linear: oe E(Y) = By + BX} (6.8) y ) : 1 | EQ) = Bi+ Baye (69) because in Equation (68) X appears with a power of, and ing (69) itappears in the inverse form For regression models linear in the explanatory variable(s), therate of changein the dependent variable remains constantfora unit change in the explanatory variable; that is, the slope remairis constant, But for a regression {A funn Y = (8 nd ob ern X10) Xagpen h poer of nly at terms such as and Var xc and (Xo ot meld erie by another arable (eg, X- Zand X/Z, where Zis another variable). 143144 PARTTWO: THELINEAR REGRESSION MODEL Y Y ‘Slope B, is sameat each point of the curv. = 8, +B,(0/%) 98, +,%, : Slope B, varies from point topoinfon the curve FIGURE 65° (a) Linear demand curve; (b) nonlinear demand curve, model nonlinear in the explanatory variables the slope does not remain constant. This can be seen more clearly in Figure 6-5, ‘As Figure 6-5 shows, for the regression (6.1), the slope—the rate of change in E(Y)—the mean of Y, remains the same, namely, By no matter at what value of X we measure the change. But for regression, say, Bq. (6.8), the rate of change in the mean value of Y varies from point to point on the regression line; itis actually a curve here5 Linearity in the Parameters The second interpretation of linearity is that the conditional mean of the dependent variable is a linear function of the parameters, the Bs; it may or may not be linear in the variables. Analogous to a linear-in-variable function, a function is said to be linear in the parameter, say, By, if By appears with a power of 1 only. On this definition, models (6.8) and (6.9) are both linear models because By and By enter the models linearly, It does not matter that the variable X enters non- linearly in both models, However, a model of the type E(Y) = By + BBX (6.10) is nonlinear in the parameter model since By enters with a power of 2. In this book we are primarily concerned with models that are linear in the parameters. Therefore, from now on the term linear regression will mean a regression that is linear in the parameters, the Bs (i.e., the parameters are raised to the power of 1 only); if may ar may not be linear in the explanatory variables © Those who kiow calcwlus will recognize that inthe lineer model the slope, that is, the derivative of ¥ with respect to Xs constant, equal to By, but in the ncalinesr model Bq. (6.8) ts equal to ~B:(1/%2), which obviously will depend onthe value of X at which te slope is measured and i. therefore not constant. is not to suggest that nonlinear (in-the-parameters) models like Eq, (6.10) cannot be esti- rated or that they are not used in prectice. Asa matter of fact, in advanced courses in econometrics such models ae studied in depth atCHAPTER SIX: BASIC IDEAS OF LINEAR REGRESSION: THE TWO-VARIABLE MODEL 145 6.7 TWO-VARIABLE VERSUS MULTIPLE LINEAR REGRESSION 68 ESTIMATION OF ORDINARY LE: So far in this chapter we have considered only the two-variable, or simple, regression models in which the dependent variable is a function of just one explanatory variable. This was done just to introduce the fundamental ideas of regression analysis, But the concept of regression can be extended easily to the case where the dependent variable is a function of more than one explanatory variable. For instance, if the expenditure on Lotto is a function of income (Xz), wealth of the consumer (Xs), age of the Lotto player (X), we cari write the extended Lotto expenditure function as E(Y) = By + BoXxj + ByXxi + BaXr 61) (Note: E(Y) = E(Y | Xei, Xsi, Xi).] Equation (6.11) is an example of a multiple linear regression, a regression in which more than one independent, or explanatory, variable is used to explain the behavior of the dependent variable, Model (6.11) states that the (conditional) mean value of the expenditure on Lotto is a linear function of income, wealth, and age of the Lotto player. The expenditure function of an individual Lotto player (ie, the stochastic PRF) can be expressed as Y;= By + BoXaj + ByXoi + BaXus + oh = EM) +H (6.12) which shows that the individual demand will differ from the group mean by the factor u, which is the stochastic error term. As noted earlier, even in a multiple regression we introduce the error term because we cannot take into account all the forces that might affect the dependent variable. Notice that both Eqs. (6.11) and (6.12) ae linear in the parameters and are therefore linear regression models. The explanatory variables themselves do not need to enter the model linearly, although in the present example they do. OF PARAMETERS: THE METHOD AST SQUARES As noted in Section 6.5, we have to estimate the population regression function (PRE) on the basis of the sample regression function (SRF), since in practice we only have a sample (or two) from a given population. How then do we estimate the PRF? And how do we find out whether the estimated PRF (je, the SRF) is a ("estimate of the true PRE. We will answer the first question in thie chap- terand take up the second question—of the “goodness” of the estimated PRE— in Chapter 7. To introduce the fundamental ideas of estimation of the PRE, we consider the simplest possible linear regression model, namely, the two-variable linear regression in which we study the relationship of the dependent variable Y toa single explanatory variable X. In Chapter 8 wie extend the analueic ta the miltinla te a146 PARTTWO: THE LINEAR REGRESSION MODEL regression, where we will study the relationship of the dependent variable Y to more than one explanatory variable, ‘The Method of Ordinary Least Squares Although there are several methods of obtaining the SRF as an estimator of the true PRE in regression analysis the method that is used most frequently is that of least squares (LS), more popularly known as the method of ordinary least squares (OLS).” We will use the terms LS and OLS methods interchangeably. To explain this method, we first explain the least squares principle. The Least Squares Principle Recall our two-variable PRE, Eq, (6.2): Yi = By + BX + uj Since the PRF is not directly observable (Why?), we estimate it from the SRE Yeh thX te which we can write as @ = actual Yj — predicted ¥; =%-% = ¥j—by — bX; fusing Eq. (6.3)] which shows that the residuals are simply the differences between the actual and estimated Y values, the latter obtained from the SRF, Eq, (6.3) This can be seen more vividly in Figure 6-4. Now the best way to estimate the PRF is to choose b; and by, the estimators of B, and By, in such a way thatthe residuals e areas small as possible. The method of ordinary least squares (OLS) states that by and by should be chosen in such a t way that the residual sum of squares (RSS), J>¢2, is as small as possible i Algebraically, the east squares principle states Minimize Ye = S°0%- YP = -h- bx? (6.13) As you can observe from Eq, (6.13), once the sample values of Y and X are given, RSS isa function of the estimators b; and by. Choosing different values of *Despitethename, there is nothing ordinary about thismethod. As we will shows thismethod has ‘several desirable statistical properties, Itis called OLS because there is another method, called the generalized east squares (GLS) method, of which OLS is a special case ‘Note thatthe smaller the is, the smaller their sum of squares willbe. The reason for considering the squares of e and not of themselves is that this procedure avoids the problem of the sign of the residuals, Note that e can be postive as wel as negative.CHAPTER SIX: BASIC IDEAS OF LINEAR REGRESSION: THE TWO-VARIABLE MODEL 147 by and by will yield different es and hence different values of RSS, To see this, just rotate the SRF shown in Figure 6-4 any way you like. For each rotation, you will get a different intercept (ie. by) and a different slope (i.e., bz). We want to choose those values of these estimators that will give the smallest possible RSS. How do we actually determine these values? This is now simply a matter of arithmetic and involves the téchnique of differential calculus, Without going info detail, it can be shown that the values of by and by that actually minimize the RSS given in Eq, (6.13) are obtained by solving the following two simultaneous equations. (The details are given in Appendix 6A at the end of this chapter) Dan +h x a4) DV sh wth ¥ (615) where 1 is the sample size, These simultaneous equations are known as the (least squares) normal equations. In Equations (6.14) and (6.15) the unknowns are the bs and the knowns are the quantities involving sums, squared sums, and the sum of the cross-products of the variables Y and X, which can be easily obtained from the sample at hand, ‘Now solving these two equations simultaneously (using any high school algebra trick you know), we obtain the following solutions for by and by. by =¥-bX 6.16) which is the estimator of the population intercept, B;. The sample intercept is thus the sample mean value of Y minus the estimated slope times the sample mean value of X. Las aa _De-HH-7) LK ~ XP (6.17) which is the estimator of the population slope coefficient B, Note that 4=Q%-% and y=(%-1) ‘hat is the smell leters denote deviations from the sample mean values, a convention that we will adopt in this book. As you can see from the formula for by, itis simpler to write the estimator using the deviation form, Expressing the values ofa variable {from its mean value does not change the ranking ofthe values, since we are subtracting the same constat from each nalue. Note that by and hs are solely exnressed in ferms 1aas 14 PARTTWO: THELINEAR REGRESSION MODEL of quantities that can be readily computed from the sample at hand. Of coixse, these days the computer will do all the calculations for you. The estimators given in Equations (6.16) and (6.17) ate known as OLS esti- ‘ators, since they are obtained by the method of OLS. Before proceeding further, we should note a few interesting features of the OLS estimators given in Eqs. (6:16) and {6.17}: 1. The SRF obtained by the method of OLS passes through the sample ‘mean values of X and Y, which is evident from Eq, (6.16), for it can be written as Y=h+hX (6.18) 2, The mean value of the residuals, é(=yei/n) is always zero, which provides a check on the arithmetical accuracy of the calculations (see Table 6-4). 3, The sum of the product of the residuals e and the values of the explanatory variable X is zero; that is, these two variables are uncorrelated (on the definition of correlation, see Chapter 3). Symbolically, Lax =0 (6.19) This provides yet another check on the least squares calculations. 4, The sum of the product of the residuals ¢; and the estimated ¥;(=%) is zero; that is, Yj is zero (see Question 6.24). 69 PUTTING IT ALL TOGETHER Let us use the sample data given in Table 6.2 to compute the values of b and bp. ‘The necessary computations involved in implementing formulas (6.16) and (6.17) are laid out in Table 6-4. Keep in mind that the data given in Table 6.2 is a random sample from the population given in Table 6-1. From the computations shown in Table 6-4, we obtain the following sample Lotto expenditure regression: %=7.6182400814%, 6.20) where Y represents weekly expenditure on Lotto and X represents PDI. Note that we have put a cap ont Y to remind that it isan estimator of the true population mean corresponding tothe given level of X recall Eg 6.3). The estimated regression line is shown in Figure 6 6. Interpretation of the Estimated Lotto Expenditure Function ‘The interpretation of the estimated Lotto expenditure function is as follows: The slope coefficient of 0.0814 means that, other things remaining the same, if PDIGHAPTER SIX: BASIC IDEAS OF LINEAR REGRESSION: THE TWO-VARIABLE MODEL 149 TABLE 64 RAW DATA(FROM TABLE 62) FOR LOTTO. a % % Dm F 0 wy oF Tym o F Lem 18 150 2700 22500 112.5 —11 121, 1265825 1297.6 19.89643 1.896432 3.572482 -276.4648 24 175 4200 90625 87.5 -5 25 765625 497.5 21.87281 2.127199 452495 972.2588 % 200 6200 40000 -625 -3 9 390825 1875 2390918 2.000618 4.97152 418.1636, 23. 225 S175 50625 -975 -6 98 140625 225 2S.94658 -2.046557 8676308 ~662.7503 30 250 7500 62500 -125 1 1 15625 -128 27.96193- 2.018068 4.072598 504.517 27 275 7425 75805 125 -2 4 15625 -25 90.01831 8.018907 9.110177 800.0044 34 900 10200 90000 37S §. 2 140825 167.5 92.05468 1.945318 3.784262 583.5054 35 925 11975 105625 625 6 36 300625 975 94.09106 0.908043 OBLBITT 205.4085 33° 350 11550 122500 875 4 16 765625 350 96.12743 .-9.127482 9.780881 —1094.601 40 875 18000 140625 - 1125 11 121 1285625 1297.8 98.16381 1.896193 9.971605 688.5724 290 2625 80925 740625 0 0 904 51562’ 4200° 290 © ~0,00119 51.8909 0.0000 Hote: =X) —Ryi yoo (YP R= 2625; F=28.0. _, Fitted Line Plot Y=7.618 +0085X FIGURE 6-6 Regression fine based on data from Table 64. per week goes up by a dollar, the mean or average weekly expenditure on Lotto goes up by about 8 cents per week. The intercept value of 7.6182 means that if PDI were zero, the mean expenditure on Lotto will be about $7.62 per week. Very often such an interpretation has no economic meaning, How can a person spend $7.62 per week on Lotto if his or her PDI were zero? Of course, one can borrow to wager on the Lotto, bit that may be appropriate only for the diehard gambler. As we will see throughout the book, often the intercept has no particular economic meaning, In general you have to use common sense in interpreting the intercept term, for very often the sample range of the X values (PDI in-out example) may not include zero as one of the observed values, Perfups itis best fo FL 1 4 4oo! 150 PARTTWO: THE LINEAR REGRESSION MODEL TABLE 6-5 AVERAGE HOURLY WAGE BY EDUCATION, Years of schooling Average hourly wage ($)__ Number of pple 6 44867 3 7 5.7700 5 8 5.9787 16 9 7.3317 2 10 “733182 7 1 6.5044 27 2 7.8182 a8 13 7.8961 37 14 41,0228 56 16 1086738 13 16 10.8961 70 7 13.6150 4 18 13.5310 31 ‘Sours: Arthur S, Goldberger, ritodctory Econometrics, Harvard Unversty ross, Cambridge, Mass., 1988, Table 1.1, p 5 adap), interpret the intercept term as the mean or average effect on Y of all the variables omit- ted from the regression model 6.10 SOME ILLUSTRATIVE EXAMPLES “159 Now that we have discussed the OLS method and learned how to estimate a PRE let us provide some concrete applications of regression analysis, Example 6.1. Years of schooling and average hourly earnings Based on a sample of 528 observations, Table 6-5:gives data on average hourly wage Y($) and years of schooling (X). Suppose we want to find out how Y behaves in relation to X. From human, capital theories of labor economics, we would expect average wage to increase with years of schooling. That is, we expect a positive relationship ‘between the two variables; it would be bad news if such were not the case. ‘The regression results based on the data in Table 6-5 are as follows: Y= 0.0144 + 0.72418) (621) As these results show, there is a positive association between education and earnings, which accords with prior expectations. For every additional year of schooling, the mean wage rate goes up by about 72 cents per hous’ The negative intercept in the present instance has no particular economic meaning. ‘sine the datin Table refer to he mean wage forthe vaso cates he slope cece ‘here should strictly be interpreted as the average increas in the mean hourly earnings.(CHAPTER SX: BASIC IDEAS OF LINEAR REGRESSION: THE TWO-VARIABLE MODEL 151 Example 6.2. Okun’s Law Based on the US, data for 1947 to 1960, the late Arthur Okun of the Brookings Institution and a former chairman of the President's Council of Economic Advisers obtained the following regression, known as Okui's law: = -0.4(X; - 2.5) (622) where Y; = change in the unemployment rate, percentage points X= percent growth rate in real output, as measured by real GDP 2.5 = the long-term, or trend, rate of growth of output historically observed in the United States In this regression the intercept is zero and the slope coefficient is -0.4. Okun's Law says that for every percentage point of growth in real GDP above 2.5 percent, the unemployment rate declines by 0.4 percentage points. Olkan’s Law has been sed to predict the required growth in real GDP to reduce the unemployment rate by a given percentage point. Thus, a growth rate of 5 percent in real GDP will reduce the unemployment rate by 1 percentage point, ora growth rate of 7.5 percent is required to reduce the unemployment rate by 2 percentage points. In Problem 6.17, which gives comparatively more recent data, you are asked to find out if Okun’s Law still holds, This example shows how sometimes a simple (ie., two-variable) regression model can be used for policy purposes. Example 6.3. Stock Prices and Interest Rates Stock prices and interest rates are key economic indicators, Investors in stock markets, individual or institutional, watch very carefully the movements in the interest rates, Since interest rates represent the cost of borrowing money, they have a vast effect on investment and hence on the profitability of a company. Macroeconomic theory would suggest an inverse relationship between stock prices and interest fates. As a measure of stock prices, let us use the S&P 500 composite index (1941-1943 = 10) and as a measure of interest rate, let us use the three-month ‘Treasury bill rate’), Table 6.19 in Exercise 6.18 gives data on these variables for the period 1980-1999, Plotting these data, we obtain the scattergram as shown in Figure 6-7. The scattergram clearly shows that there is an inverse relationship between the two variables, as per theory. But the relationship between the two is not linear (ie, straight line); it more closely resembles Fig. 6-5(b). Therefore, let us maintain that the true relationship is: Y= By + Bo(1/Xi) +4 (6.23) Note that Bq, (6.23) is a linear regression model, as the parameters in the model are linear, Itis however nonlinear in the variable X, Ifyou let Z = 1/X, then the model is linear in the parameters as well as the variables Y and Zo f oon AE,ao 152 PART TWO: THELINEAA REGRESSION MODEL 24 6 § © 2 UM 16 ‘Treasury Bill Rate (%) FIGURE 6-7 S&P 500 coinposite index end three-month Treasury bil rate. Using the Eviews statistical package, we estimated Eq. (6.23)by OLS, giving the following results: Y= 15.6035 + 2606.5170(1/%,) (6.24). How do we interpret these results? The negative value ofthe intercept has no practical economic meaning. The interpretation of the coefficient of (1/X) is rather tricky. Literally interpreted, it suggests that if the reciprocal of the three-month Treasury bill rate goes up by one unit, thé average vali of the S&P 500 index goes up by about 2606 units. This is, however, not a very en- lightening interpretation. If you want to measure the rate of change of (awean) ¥ with respect to X (ie, the derivative of Y with respect to X), then as footnote 5 shows, this rate of change is given by —B,(1/X?), which depends on the value taken by X. Suppose X = 2. Knowing that the estimated By is 2606.517, we find the rate of change at this X value as 651.63 (approx). That is, starting with a Treasury bill rate of about 2 percent, if that rate goes up by one percentage point, on average, the S&P 500 index will decline by about 651 units. Of course, an increase in the Treastiry bill rate from 2 percent to 3 percent is a substantial increase. Interestingly, if you had disregarded Figure 6-5 and had simply fitted the straight line regression to the data in Table 6-13, you would obtain the following regression: = 902.2907 — 69.3682, (6.25)TABLE GE. CHAPTER SIX: BASIC IDEAS OF LUNEAR REGRESSION: THE TWO-VARIABLE MODEL 153 ‘Here the interpretation of the intercept term is that if the Treasury bill rate were zero, the average value of the S&P index would be about 902. Again, this may not have any concrete economic meaning. The slope coefficient here suggests, that if the Treasury bill rate were to increase by one unit, say, one percentage point, the average value of the S&P index would go down by about 69 units. Regressions (6.24) and (6.25) bring out the practical problems in choosing an appropriate model for empirical analysis. Which isa better model? How do we know? What tests do we use to choose between the two models? We will provide answers to these questions as we progress through the book (see Chapter 9). ‘A question to ponder: In Eq. (6.24) the sign of the slope coefficient is positive, whereas in Eq, (6.25) itis negative, Are these findings conflicting? Example 6.4. Median home price and mortgage interest rate in the United States, 1994-2008, ‘Over the past several years there has been a surge in home prices across the United States. Itis believed that this surge is due to sharply falling mortgage interest rates. To see the impact of mortgage interest rates on home prices, ‘Table 66 gives data on median home prices (1000 $) ard 30-year fixed rate mortgage (%) in the metropolitan New York area for the period 1994-2008. MEDIAN HOME PRICE (MHP) AND MORTGAGE INTEREST RATE (INT) IN METROPOLITAN NEW YORK AREA, 1994-2003; Observations. MHP(10008) INT (%) 1988 1038, 103 1989 1892 103, 1990. 1749 10.1 1991 17385 93 1982 729 84 1993 1732 73 1994 173.2 a4 1905 1697 79 1998 1745 76 1997 4719 76 1998 108.1 69 1999 208.2 14 2000 2302 at 2m 258.2 70 2002 9098 85 2003 3234 58 ‘Source: National Assocation of Réators, Freddie ‘Mac, es eporid bythe New York Timos, July 18, 2008, See. 11,p. 1. Nota: MHP = Median home pie. INT = Mortgago interest rte, 1" “4x 154. PARTTWO: THE LINEAR REGRESSION MODEL g ‘Median Home Price (1000 $) 8 on ° ° Or nie o 08 a mp 6 7 § 9 0 Interest Rate (%6) FIGURE 6-8 Median home prices and interest rates. These data are plotted in Figure 6-8, As a first approximation, if you fit a straight line regression model, you will obtain the following results, where Y = median home price (1000 $) and X = 30-yeat fixed rate mortgage (%): Y; = 393.4164 — 23.4256X; (6.26) These results show that if the mortgage interest rate goes up by 1 percentage point,/° on average, the median home price goes down by about 23.4 units or about $23,400 (Note: Y is measured in thousands of dollars.) Literally interpreted, the intercept coefficient of about 393 would suggest that if the mortgage interest rate were zero, the median home price ‘on average would be about $393,000, an interpretation that may stretch our credulity, It seers that falling interest rates do have substantial impact on home prices. A question: If we had taken median family income into account, would this conclusion still stand? Example 6.5, Antique clocks and their prices The Triberg Clock Company of Schonachbach Germany holds an annual antique clock auction. Data on about 32 clocks (the age of the clock, the number of bidders, and the price of the winning bid in Marks) are given in Table 6-14 inProblem number 6.19. Note that this auction took place about 20 years ago. If we believe that the price of the winning bid depends on the age of the clock—the older the clock, the higher the price, ceteris paribus—we would Note that there is a difference betiveen a 1 percentage point increase anda 1 percent increase For example, if the current interest rates 6 percent but then goes to 7 percent, this represents a 4 percentage point increase; the percentage increase is, however, (74) %100= 166%,6.11 SUMMARY CHAPTER SIX: BASIC IDEAS OF LINEAR REGRESSION: THE TWO-VARIABLE MODEL 155 expect a positive relationship between the two. Similarly, the higher the umber of bidders, the highef the auction price because a large number of bidders for a particular clock would suggest that that clock is more value able, and hence wé would expect a positive relationship between the two ‘variables. Using the data given in Table 6-14, we obtained the following OLS regressions: Price = —191.6662 + 10.4856 Age (6.27 Price = 807.9501 + 54.5724 Bidders (6.28) As these results showy the auction price is positively related to the age of the clock, as well as to the number of bidders present at the auction, In Chapter 8 on multiple regression we will see what happens when we regress price on age and number of bidders together, rather than individually, as in the preceding two regressions. ‘The regression results presented in the preceding examples can be obtained easily by applying the OLS formulas Eq. (6.16) and Eq, (6.17) to the data presented in the various tables, Of course, this would be very. tedious ard very time- consuming to do manually. Fortunately, there are several statistical software packages that can estimate regressions in practically no time. In this book we ‘will use the Eviews and Minitab software packages to estimate several regression models because these packages are comprehensive, easy to use and are readily available. (Excel can also do simple and multiple regressions.) Throughout this book, we will reproduce the computer output obtained from these packages. But keep in mind that there are other software packages that can estimate all kinds of regression models. Some of these packages are LIMDEP, MICROFIT, PC-GIVE, RATS, SAS, SHAZAM, SPSS, and STATA. im this chapter we introduced some fundamental ideas of regression analysis. Starting with the key concept of the population regression function (PRF), we ‘developed the concept of linear PRE This book is primarily concerned with linear PRFs, that is, regressions that are linear in the partmeters regardless of whether of not they are linear in the variables, We then introduced the idea of the stochastic PRE and discussed in detail the nature and role of the stochastic error term i PRF is, of course, a theoretical or idealized construct because, in practice, all we haves a sample(s) from some population, This necessitated the discussion of the sample regression function (SKE). We then considered the question of how we actually goes about obtaining the SRE-Here we discussed the popular method of ordinary least squares (OLS) and presented the appropriate formulas to estimate the parameters of the PRR. si 255y 156 PART TWO: THE LINEAR REGRESSION MODEL We illustrated the OLS method with a fully worked-out numerical example as well as with several practical examples. Our next task isto find out how good the SRF obtained by OLS is as an esti- ator of the true PRR, We undertake this important task in Chapter 7. KEY TERMS AND CONCEPTS ‘The key terms and concepts introduced in this chapter are Regression analysis deterministic, or nonstochastic, a) explained, or dependent, PRE : variable Sample regression line (SRL) b) independent, or explanatory, Sample regression function (SRF) vatiable Estimator; sample statistic Scattergram; scatter diagram Estimate Population regression line (PRL) Residual term ¢; residual ) conditional mean, or conditional Linearity in variables expected values Linearity in parameters Population regression a) linear regression function (PRP) ‘Two-variable, or simple, regression Regression coefficients; parameters vs, multiple linear regression a) intercept Estimation of parameters b) slope a) the metliod of ordinary least Conditional regression analysis squares (OLS) Stochastic, or random, error term; b) the least squares principle error term residual sum of squares (RSS) a) noise component 4) normal equations b) stochastic, or statistical, PRE ©) OLS estimators QUESTIONS - 6.1. Explain carefully the meaning of each of the following terms: a. Population regression function (PRF). , Sample regression function (SRF). ¢. Stochastic PRE. 4d. Linear regression model. ¢, Stochastic error term (ui). £, Residual term (). g: Conditional expectation. h. Unconditional expectation. i, Regression coefficients or parameters. j. Estimators of regression coefficients. 6.2. What is the difference between a stochastic population regression function (PRE) and a stochastic sample regression function (SRF)? 6.3, Since we do not observe the PRE, why bother studying it? Comment on this statement, dPROBLEMS CHAPTER SIX: BASIC IDEAS OF LINEAR REGRESSION: THE TWO-VARIABLE MODEL 157 64, State whether the following statements are true, false, or uncertain, Give your reasons. Be precise, a. The stochastic error term 1; and the residual term ¢; mean the same thing, 'b. The PRE gives the value of the dependent variable corresponding.to each value of the independent variable, ¢. Alinear regression model means a model linear in the variables. . In the linear regression model the explanatory variable is the cause and the dependent variable is the effect. . The conditional and unconditional mean of a random variable are the same thing. £. In Eq, (62) the regression coefficients, the Bs, are random variables, whereas the bs in Bq, (64) are the parameters, ena (6.1) the slope coefficient By measures the slope of Y per unit change in h i practice, the two-variable regression model is useless because the bebav- ior of a dependent variable can never be explained by a single explanatory variable, i. Thesum of the deviation ofa random variable from its mean value is aluoays equal to zero. 6.8. Whats the relationship between a, B and by; b. By and byand c. uj and ef? Which of these entities can we observed and how? 6.6. Can you rewrite Eq, (6.22) to express X as a function of Y? How would you interpret the converted equation? 67. The following table gives pairs of dependent and independent variables. In each case state whether you would expect the relationship between the two variables to be positive, negative, or uncertain, In other words, tell whether the slope coefficient will be positive, negative, or neither. Give a brief justification. ineach case. Dependent variable Independent variate a] GOP * Rate of interest (6) Personal savings ‘ate of interest, (0) Yield of crop Rainfall (d) US. defense expenditure ‘Soviet Union's defense expenditure {@) Number of nome suns Fit by Annual salary ‘star baseball player (f) Apresident’s popularity Length of stayin office (g) Astuderts Srst-year grede- SAT. score point average (ti Astudents grade in econometrics Grade in statistics () ports of Japanese cars U.S. per capita income 68, State whether the following models are linear regression models: a. Yi = By + Ba(1/Xi) 1 + Bln X; + 1 In) = Bi + BX +m) FOEaN 158 PART TWO: THE LINEAR REGRESSION MODEL TABLE 6-7 158 260 180, 152, 175, 1 @. in = By + Ben X +m @. Yj = By + BoBsXi +4; £Y = Bi + BX tu; ‘Note; In stands for the natural log, that is, log to the base e. (More on this in Chapter 9.) 69. The following table gives data on weekly family consumption expenditure (1) (in dollars) and weekly family income (X) (in dollars). HYPOTHETICAL DATA ON WEEKLY CONSUMPTION EXPENDITURE AND WEEKLY INCOME. Weekly Income Weekly consumption expenditure wy (91%) 80 55, 60, 65, 70, 75 100 65, 70, 74, 80, 85, 88 120 79, 84, 90, 94, 98 140 80, 93, 95, 103, 108, 113, 115 160 102, 107, 110, 116, 118, 125. 180 110, 115, 120, 130, 135, 140 200 120, 136, 140, 144, 145 220 195, 137, 140, 152, 157, 160, 162 240 187, 145, 155, 1 15, 189 180, 185, 191 a. For each income level, compute the mean consumption expenditure, E(Y | X); that is, the conditional expected value. b, Plot these data in a scattergram with-income on the horizontal axis and consumption expenditure on the vertical axis. «Plot the conditional means derived in part (a) in the same scattergram. 4. What can you say about the relationship between Y and X and between mean Y and X? , Write down the PRF and the SRE for this example. £, Is the PRF linear or nonlinear? 6.10. From the data given in the preceding problem, a random sample of Y was drawn against each X. The result was as follows: Y m- 6 9 - 9 110 15 120 140 165 160 XxX @ 100 120 140 160 180 200 220 240260 a. Draw the scattergram with Y on the vertical axis and X on the horizontal ais, . What can you say about the relatiorship between Y and X? . What is the SRF for this example? Show all your calculations in the manner of Table 64. 4. On the same diagram, show the SRF and PR. e. Are the PRF and SRF identical? Why or why not? 6.11. Suppose someone has presented the following regression results for your consideration: ¥ = 2.6911 - 0.4795X,TABLE 6-8 CHAPTER SIX: BASIC IDEAS OF LINEAR REGRESSION: THE TWO-VARIABLE MODEL 159 where Y = coffee consumption in the United States (cups per person per day) X = retail price of coffee ($ per pound) time period : a.Is this a time series regression or a cross-sectional regression? ‘b. Sketch the regression line, c. What is the interpretation of the intercept in this example? Does it make _ economic sense? 4. How would you interpret the slope coefficient? e.Isit possible to tell what the true PRF is in this example? f. The price elasticity of demand is defined as the percentage change in the quantity demanded for a percentage change in the price, Mathematically, it is expressed as Hasty =sope(F) ‘That is, elasticity is equal to the product of the slope and the ratio of X te Y, where X= the price and Y = the quantity From the regression results presented earlier, can you tell what the price elasticity of demand for coffee is? fnot, what additional information would you need to compute the price elasticity? 6.12, The following table gives data on the Consumer Price Index (CPI) forall tems (1982-1984 = 100) and Standard & Poot (S&P) index of $00 comion stock prices (Base of index: 1941-1943 = 10), ‘CONSUMER PRICE INDEX (CF) AND ‘SAP 500 INDEX (S8P), UNITED STATES, 1978-1989. Year cP SAP 1978 2 96.02 728 408.01 82.4 118.78 90.9 128.05 98.5 t19.71 998 160.41 103.9 16046, 1078 186.84 109.8 236.34 1136 286.89 118.3 265.79 1240 322.84 ce: Econan Report of Presiden, 1290, Tab C58, fr CPI end Tabla (6985 forthe SEP Indo. a. Plot the data on a scattergram with the S&P index on the vertical axis and CPI on the horizontal axis, . What can you say about the relationship between the two indexes? What + does economic theory have to say about this relationship? «. Consider the following regression model: (S&P), = Pa + BYCPh + us 159Ea 160 PART TWO: THE UNEAR REGRESSION MODEL TABLE 6-9 Use the method of least squares to estimate this equation from the preced- - ing data and interptet your results. 4. Do the results obtained in part (c) make economic sense? e, Do you know why the S&P index dropped in 19882 6:13, The following table gives data on the nominal interest rate (Y) and the infla tion rate (X) for the year 1988 for nine industrial countries. NOMINAL INTEREST RATE (Y)AND INFLATION (X) IN NINE INDUSTRIAL ‘COUNTRIES FOR THE YEAR 1988, Country ¥(%) XC) Australia 19 WW ‘Canada 84 4.0 France 75 31 Germany 40 1.6 ltaly 113 48 Mexico 663 517 ‘Switzerland 22 20 ‘United Kingdom 103 68 United States 18 44 ‘Source: Rudigor Dombusch and Stanley Fischer, Maoroaconomcs, th ed, MoBren i New Yor, 1990, p. 652. Th original deta ae from vaious notes of the Intemational ‘Fhhanclal Statistics published by tie Intemational Monetary Fund (MF). a. Plot these data with the interest rate on the vertical axis and the inflation rate on the horizontal axis. What does the scattergram reveal? b. Doan OLS regression of Y on X. Present all your calculations, ¢. If the real interest rate is to remain constant, what must be the relationship between the nominal interest rate and the inflation rate? That is, what must be the value of the slope coefficient in the regression of Y on’ X and that of the intercept? Do your results siiggest that this is the case? For a theoretical discussion of the relationship among the nominal interest rate, the inflation rate, and the real interest rate, see any textbook on macroeconomics and ook up the topic ofthe Fisher equation, named after the famous American economist, Irving Fisher, 6.14. The real exchange rate (RE) is defined as the nominal exchange rate (NE) times the ratio of the domestic price to foreign price, Thus, RE for the United States against Germany is REys = NEys(UScer/Germancry) a. From the data given in Table 1.3 of problem 1.7, compute REys, b. Using a regression package you are familiar with, estimate the following regression: NEus = By + By REys +4 aCHAPTER SIX: BASIC IDEAS OF LINEAA REGRESSION: THE TWO-VARIABLE MODEL 161 ¢. Apriori, what do you expect the relationship between the nominal and real exchange rates to be? You may want to read up on the purchasing power parity (PPP) theory from any text on international trade or macroeconomics, . Are the a priori expectations supported by your regression results? If not, ‘what might be the reason? *e, Run regression (1) in the following alternative form: InNEys = A+ AplnREys +4 7) where In stands for the natural logarithm, that islog to the base. Interpret the results of this regression. Are the results from regressions (1) arid (2) qualitatively the same? 6.15. Refer to problem 6.12. In the following table we have data on CPi and the S&P 500 index for the years 1990 to 2001. TABLE 6-10 | CONSUMER PRICE INDEX (CPI) AND S&P 500 INDEX (SP), UNITED STATES, 1990-2001. Observations oP sep 1980 190.7000. 334.5900 1991 196.2000 376.1800 1992 440.3000. 415.7400 1993 144.5000 451.4100 1994 148.2000 460,420 1995 162.4000 841.7200 1996 156.9000 BIO S00 1997 460.5000 873.4300 1988 163.0000 1086.50 4989 166.6000 1327990, 2000 172.2000 1827220 2001 177.1000 1194-180 Sourve: Eeanamle Report ofthe President, 2002, 3 a, Repeat questions (a) to (e) from problem 6.12. ’ +, Do you see any difference in the estimated regressions? ¢. Now combine the two sets of data and estimate the regression of the S&P index on the CPI. a. Are there noticeable differences in the three regressions? 6.16. The following table gives data on average starting pay (ASP), grade point average (GPA) scores (ona scale of 1 to), GMAT scores, annual tuition, pércent of graduates employment at graduation, recruiter assessment score (5.0 highest) and percent of applicants accepted in the graduate business school for 47 well-regarded business schools in the United States for the year 2004, a. Using a bivariate regression model find out if GPA has any effect on ASP, ». Using a suitable regression model, find out if GMAT scores have any tela~ tionship to ASP. c. Does annual tuition have any relationship to ASP? How do you know? If there is a positive relationship between the two, does that mean it “Onions F 31462 PART TWO; THE LINEAR REGRESSION MODEL TABLE 6-11 SELECTED DATA ON TOP BUSINESS SCHOOLS IN THE UNITED STATES, School GPA GMAT _'ASP(S)_%Employed Tullon (] Recrulte-Rating - % Aoceptance Harvard 360 708 105896 77.6 ©=—89650 45 16 Stanford 357 713 107820754 96252 44 92 Wharton 348 (713101404783 37823 48 159 MIT 350 710 99899. 7.084780 45 182 Kelog 345 «703 «98368727 4314 46. 157 Coumbia 340 709 98611705 (34788 42 125 Unichicago 340" 690 © 97872 «6724783. 43 229 Caberkely 350 700 ©9194 «68228020 4a 128 Darimouthtuck S 840 69696714 75.1450 43 199 Michiganannarbor 340 692 «97039, «61634688 42 27 Dukefuqua ~ 340 70392459. 56284354 42 253 Losengelesanderson ©» «3.50701 «90062 S21 20218 4A 206 Univrgintadarden 340 678928556. = (33270 44 242 Comelfohnson 335 «672 BIBS G26 (8889 44 304 Newyorkunistem 340 | 700 9589662434128 39 185 Yale 350 703 80526 «47.4 = 3560 4a 186 Camegiemelion 338 68085737 SB.1 35220 38 273 Unisoutheatitoria 330 688 76670 «68.4 «= 83675 37 280 Emoryuni Yn r 218 Ohiostatefisher 399 68586153709 5224 25 Uniminnesotacarlson «3.28654 B41 72.5 (00 x ay Unincarofinachapel 320 668 «67819. 55.1. «(31004 38 360 Indianaunikeliey a1 65082270» 57.0561 40 318 Unitinoisurbana 340 64879610 66.7 (24864 34 393 Unitexasaustinmocombs 3.34 67882167 «50.1 26088 40 400 Purduounikrannert 331 658 «BRST G40. 5488. 35 322 Uniwashington 340 669. «7060585819857 33 404 ‘Asizonastateunicarey «3.44 6517284958280 34 353 Wichiganstatounibroad © 3.36646 RUB 221420 34 253 Unicaforradavis 340 67973828, «886 8912 35 310 Uninotredamemendoza 34066478047 47.6 28205 36 BAA Gsorgetownuni 930 6818372663. 80888. 33 212 Unimaryiandcollegepk = «335. 655-0085 50026382 32 265 Uniarizonaeller 346 6436764562224 33 568 Unirochestersimon 320, 647 77844558). RATS 34 365 Uniwisconsinmadison -«-3.30°662.—«=«75002«« 622” «24802 35 243 Wakeforestunibaboock 3.20643. 74581 67.8 26625 38 459 Brighamyounguni 358 640 «60983 85.3. 9626 34 478 Vanderbiltuniowen 328 6387770548) 90005 36 538 Washingtonunistiouis «3256176845 51.9 «31810 35 407 Georgiainstteshdupree 382 66B_ «OMS. «BBO 21790 32 380 Unigeorgia 326 6597341314 «17820 33 319 Pennstateunipark 390 65375962. «42.0412 35 369 Rincaunitexones 320 624 «2357.55.68 © 87AB. az uo Bostonunl 317 86671655623) 33 a5 Unicaiforiarvine 3376707162615 20188 30 B49 Uniiowatipple 330 8? = 664061518998 33 53.0 ‘Source: U.S. News and World Repor, Api 12, 2004, p. 64 (adapter, ‘Notes: GPA = 2008 average uidergraduato GPA. GMAT = 2003 average GMAT score, ASP = average stating slay (8). ‘% Employed = greduates employed al graduation, ‘Tution = 2008 out-of-state tlio and foes. Recruiter rating = recruitor assessment soars (5.0 maximum). ‘acceptance = 2008 acceptance rae to graduate seh.‘CHAPTER SUK: “BASIC IDEAS OF LINEAR REGRESSION: THE TWO-VARIABLE MODEL 163 ‘pays to ga to the most expensive business school? Can you argue that a hhigh-tuition business school means a high-quality MBA program? Why or why not? 4. Does the recruiter perception have any bearing on ASP? 6.17, Table 6-12 gives data on real GDP (Y) and civilian unemployment rate (X) for the United States for period 1970 to 1999. TABLE 6-12° REAL GROSS DOMESTIC PRODUCT AND CIVILIAN UNEMPLOYMENT RATE, UNITED STATES, 1970-1999. Real Gross batan ‘Observations Domestic Product (¥) unemployment rate (X) 4970 3878.0 4a i971 3697.7 59 wre 38084 56 1973 4123.4 49 1974 4099.0 56 1975 4084.4 85 1078 431.7 7 197 4511.8 TA 1978 47606 61 1979 4912.1 5B 1980 4900.9 7A 1981 0210 78 1982 4919.3 97 1983 51923 96 1984 5505.2 75 1985 ral 72 1986 59124 70 1987 61133 62 1988 6368.4 55 4988 6591.8 83 1990 67079 8B 4991 6676.4 68 1992 6880.0 18 1994. 7062.6 69 1994 7347.7 6A 1995 75438 68 1996 78132 54 1997 81595 49 1998 SBI87 45 1999 8875.8 42 ‘Soures: Econo Report of he President, 2001 Note: Real Gress Doreen Product (f billons); Cian unenloymont ao, (4). a. Estimate Okun’s Law in the form of Eq, (6.22). Are the regression results similar to the ones showin in (6.22)? Does this suggest that Okun’s Law is universally valid? . Now regress percentage change in real GDP on change in the civilian unemployment rate and interpret your regression results. «. If the uriemployment rate remains unchanged, what is the expected (percent) rate of growth in real GDP? (Use the regression in (b)). How would veut internret this erowth rate? =) a Dé 163oN 164. PART TWO: THE LINEAR REGRESSION MODEL TABLE 6-13 S&P 500 INDEX (S&P) AND TABLE 6-14 AUCTION DATAON PRICE, AGE OF ‘THREE-MONTH TREASURY BILL. CLOCK AND NUMBER OF BIDDERS. RATE (3-M-T BILL) 1980-1999, —__- ‘Number of Observations S&P Sin Tbil (16) ‘Observations Price Age bidders 1980 118.78 11.51 4 1235, 127 13, 1961 12805 | 1403, 2 1080 115 12 192 a7 10.89 3 us eT 7 1983 160,41 8.63. 4 1852" 150 9 1984 160.46 ° 9.58 S 1047 168g 1985 186.84 7.48 6 1979 182 ais 1986 236.34 5.98 bi 1822 156 12 1987 286.83 5.82 8 51253 132 10 1988 265.79 6.69 8 1297 1878 1989 922.84 8.12 10 946 113 Q 1990 30458 7.51 W 1713 187 15 1991 376.18 = 5.42 12 1024 117 n 1992 416.74 3.45 43, 2131 170 14 1993 451.41 3.02 14 1550 182 8 1994 880.42 4.29 15 1884 162 a 1996 541.72 5.51 16 2041 184 10 1996 6705.02 vw 854 43 1997 87343 5.07 18 1483 159 9 1998 | 1085.5 481 19 1055 108 14 1999 1827.33 4.66 20 1845 175 8 eer at 729 108 6 ‘Sourve: Economic Report of the 22 1792 479 9 Pressel, 200. 2B ts M18 m4 1593 1878 25 1147 187 8 2%. 102 153 a7 1152 117 13 2B 1896 12810 29 785111 7 30 Tah MS 7 n 1356 194 5 32 1262 168 7 648, Refer to Example 63, for which the data are as shown in Table 6-13 a. Using a statistical package of your choice, confirm the regression results given in Eq. (6.24) and Eq. (6.25). b. For both regressions, get the estimated values of Y (ie., Y%) and compare them with the actual ¥ values in the sample. Also obtain the residual values, ¢, From this can you tell which is a better model, (6,24) or (625)? 6.19, Refer to Example 6.5 on antique clock prices. Table 6-14 gives the underlying data. a, Plot clock prices against the age of the clock and against the number of bidders. Does this plot suggests that linear regression models shown in ¥q. (6.27) and Eq, (6.28) may be appropriate? , Would it make any sense to plot the number of bidders against the age of the clock? What would such a plot reveal? 6.20, Refer to the Lotto expenditure example discussed in the text. Table 6-4 gives the necessary raw calculations to obtain the OLS estimators. Look at the columns ¥ (actual Y) and f (estimated Y) values, Plot the two in a scattergram.TABLE 6-15 CHAPTER SIX: BASIC IDEAS OF LINEAR REGRESSION: THE TWO-VARIABLE MODEL 165, ‘What does the scattergram reveal? If you believe that the fitted model (Eq. 6.20)lisa “good” model, what should be the shape of the scattergram? In thenext chapter we will see what we mean by a “good” model. 6.21, Table 6-15 gives data on verbal and math S.A.T. scores for both males and females for the period 1967-1990. MEAN SCHOLASTIC APTITUDE TEST (S.A.T}) VERBALAND MATH SCORES FOR COLLEGE:BOUND SENIORS, 1967-1990, Verbal Math Year Males Females Total Males Females Total 1967 463 4BBCB SN T 0 1968 464485 468512 47049. 1969 459= 468 463513. 470493 y70 488 46t_ 460509058 1971454457455 SOT, 468288 to72 454452, 453 50S ABT 1973 446443, 4455024808 1074 447,442 a SOY Sa 1975 437431434 | 4954497 1976 433430431497, 4G 472 iT 8h a2 97445470 1978 433425. 429 4944s 1979431 «428 42749344348 1980 428° 420 424 at 43488 1981 430418 Sah 8 joe2 431421 4265493487 1983 430 = 420 4254935468 Se a ne ae a) 1985 437° 425431 499452475, 1986 437 | 42643} BOL ABTS 1987 435425 = 430500453476 198 495422 a2 BHC ATG 1989 434. 421407 50045476 t990 420419 42449945576 ‘Source: Tha College Bos" The New York Tines, Aug 28, 1980, p. 8-5. 1967-1971 data are estimates, a. You want to predict the male math score (Y) on the basis of the male verbal score (X). Develop a suitable linear regression model and estimate its parameters. b, Interpret your regression results. c. Reverse the roles of ¥ and X and regress the verbal score on the math score. Interpret this regression 4d. Let az be the slope coefficient in the regression of math score on the verbal ‘score and let bz be the slope coefficient of the verbal score on the math score. Multiply these two values, Compare the resulting value with the r? obtained from the regression of math score on verbal score or the r* value obtained from the regression of verbal score on math score. What conclusion cart vet draur frm thie avarcica? AEP, a166 PARTTWO: THE LINEAR REGRESSION MODEL OPTIONAL QUESTIONS 6.22. Prove that J” ¢; = 0, and hence show thaté = 0. 6.23, Prove that )* ey 6.24, Prove that )é;¥; = 0; that is, that the sum of the product of residuals ; and the estimated Ysis always zero. 6.25, Prove that Y = Y; that is that the means of the actual Y values and the estimated Y valies are the same. 626, Prove that ny = Da¥= D Xiy, where x; = (X;— X) and y = 6.27. Prove that 52) = Y'y =0, where andy are as defined in Problem 626. 6.28, For the Lotto example data given in Table 6-4, verify that statements made in (Questions 6.22 hold true (save the rounding errors). APPENDIX 6A: Derivation of Least-squares Estimates We start with Bq, (6.13): Y= LV -- xy? A.) Using the technique of partial differentiation from calculus, we obtain: ay /0b =2°% = -X(-1) (A2) 3 6P/ 0b, = 2) (4 - by - X)(-X) Aa) By the first order condition of optimization, we set these tivo derivations to zero, and simplify, which will give Lami the OX (6A.4) Dh Ht+h Pw (6A5) which are Eqs. (6.14) and (6.15), respectively, given in the text, Solving these two equations simultaneously, we get the formulas given in Bags. (6.16) and (6.17). manCHAPTER 7 THE TWO-VARIABLE MODEL: HYPOTHESIS TESTING In Chapter 6 we showed how the method of least squares works. By applying, that method to our Lotto sample data given in Table 6-2, we obtained the folowing Lotto expenditure function: ¥% = 7.6182 + 0.0814%; (6.20) where ¥ represents weekly expenditure on Lotto and X represents weekly personal disposable income (PDI), both measured in dollars. This is the estimation stage of statistical inference. We now turn our attention. to its other stage, namely, hypothesis testing. The important question that we raise is: How “good” is the estimated regression line given in Equation (6.20)? That is, how do we tell that it really is a good estimator of the true population regression function (PRF)? How can we be sure just on the basis of a single sample given in ‘Table 6-2 that the estimated regression function (ie,, the sample regression function [SRE]) is in fact a good approximation of the true PRE? ‘We cannot answer this question definitely unless we are a little more specific about our PRE, Eq, (6.2). As Bq, (6.2) shows, ¥;depends on both X; and 4. Now we have assumed that the X; values are known or given—recall from Chapter 6 that our analysis is conditional regression analysis, conditional upon the given Xs, in short, we treat the X values as nonstochastic. The (nonobservable) error term 1 is of course random, or stochastic. (Why?) Since a stochastic term (i) is added toa nonstochastic term (X) to generate Y, ¥ becomes stochastic, too. This means that unless we are willing to assume how the stochastic u terms are generated, we will-not be able to tell how good an SRF is as an estimate of the true PRE In deriving the ordinary least squares (OLS) estimators so far, we did not - 1 LT 7~~ me? 168 PARTTWO: THE LINEAR REGRESSION MODEL say how the u; were generated, for the derivation of OLS estimators did not depend on any (probabilistic) assumption about the error term. But in testing statistical hypotheses based on the SRE we cannot make further progress, as we will show shortly, unless we make some specific assumptions about how uj are generated. This is precisely what the so-called classical linear regression model (CLRM) does, which we will now discuss. Again, to explain the fundamental ideas, we consider the two-variable regression model introduced in Chapter 6. In Chapter 8 we extend the ideas developed here ta the multiple regression models. 7.1 THE CLASSICAL LINEAR REGRESSION MODEL ‘The CLRM makes the following assumptions: AT, The regression model is linear in the parameters; it may ormmay not be linear in the vasiables.‘That is, the regression model is of the following type. Y= Bit BX ty (6.2) As will be discussed in Chapter 8, this model can be extended to include more explanatory variables. AT2, The explanatory variable(s) X is uncorrelated with the disturbance term 1. However, ifthe X variable(s) is ronstochastc (Le, its value isa fixed number), this assumption is automatically flied. This aasurnption is nota new assumption because in Chapter 6 we stated that our regression analysis is a conditional regression analysis, conditional upon the given X values, In essence, we are assuming that the Xs are nonstochastic. Assumption 7.1 is made to deal with simultaneous equation regression models, Which we will discuss in Chaptet 15. AT3. Given the value of X;, the expected, or mean, value of the disturbance term « is zeto. That is, E(u] Xj) =0 (74) Recall our discussion in Chapter 6 about the nature of the random term 1, It represents all those factors that are not specifically’ introduced in the model. ‘What Assumption (7-1) states is that these other factors o forces are not relatedFIGURE 7-1 CHAPTER SEVEN: THE TWO-VARIABLE MODEL: HYPOTHESIS TESTING 169 ‘PRR EY |X) = By + BK a Conditional distribution of disturbances uy to X; (the variable explicitly introduced in the model) and therefore, given the value of X;, their mean value is zero. This is shown in Figure 7-1. ATA, The variance of each us constant, or homoscedastic (Homo means equal and soedastic means variance), That is var (u;) = 0? (7.2) Geometrically, this assumption is as shown in Figure 7-2(), This assumption simply means that the conditional distribution of each Y population corresponding to the given value of X has the same variance; that is, the individual Y values are spread around their mean values with the same variance. If this is riot the case, then we-have heteroscedasticity, or unequal variance, which is depicted in Figure 7-2(0)2 As this figure shows, the variance of each Y population is different, which is in contrast to Figure 7-2(a), where each Y population has the same variance. The CLRM assumes that the variance of is as shown in Figure 7-2). Note that Assumption A7.2 only states that X and u are uncorrelated, Assumption A7.3 adds thatnot only ae X and w uncorrelated, but also that given the value of X the mean of (which represents umpteen factors) 8 220, °Since the X values are assumed tobe given or nonstochesti the ony source of variation in Y {s from u. Therefore, given X; the variance of Yis the same as that of u; In short, the conditional variance of wand Yj are the same, namely, o®. Note, howreves thatthe unconditional variance of Yi ‘as shown in Chap. 3, is E[Y — E(Y)]F. As we will se, if the variable X has any impact on Y, the conditional vatiance of Y willbe smaller than the unconditional variance of Y. Incidentally, the sample counterpart of the unconditional variance of Yis 0} ~ YF fn ~1). “There is a debate in the literature whether it is homoscedasticity or homoskedasticity and het- 4 = LEE ro 4170 PARTTWO: THE LINEAR REGRESSION MODEL os, PREY,= B, +8, PREY, =B,+%%, | 7 (a) Homoscedasticty (equal variance); (b) Heteroscedasticty (unequal variance). FIGURE'7-3 Patterns of aulocorrelation: () No autocorrelation; (b) postive autocorrelation; (c} negative autocorelation, A75. ‘There is no correlation between two error terms, Thisis the assumption of no autocorrelation. ‘Algebraically, this assumption car be written as cov(uj,uj)=0 if] 73) Here cov stands for covariance (see Chapter 3) and i and j are any two error terms. (Note: =}, Equation (73) will give the variance of u, which by Eq, (7.2) is aconstant), Geometrically, Eq (73) can be shown in Figure 7-3. Assumption A7.5 means that there is no systematic relationship between two error terms. Itdoes not mean that ifone wis above the mean value, another error term 1 will also be above the mean value (for positive correlation), or that if one error term is below the mean value, another error ferm has to be above the mean value, or vice versa (negative correlation). In short, the assumption of no autocorrelation means the error terms u: are random.CHAPTER SEVEN: ‘THE TWO-VARIABLE MODEL: HYPOTHESIS TESTING 171 ‘Since any two ertor terms are assumed to be uncorrelated, it means that any two Y values will also be uncorrelated; that is, cov (Yj, Yj) = 0. This is because Y= By, + B,X:+M and given that the B’s ase fixed numbers and that X is assumed to be fixed, Y will vary as u varies. So, if the u’s are uncorrelated, the Y's will be uncorrelated also. ATS. ‘The regression model is correctly specified, Alternatively, there is no specifi cation bias or specification error in the model used in empirical analysis: What this assumption implies is that we have included all the variables that affect a particular phenomenon, Thus, if we are studying the demand for avto- mobiles, if we only include prices of automobiles and consumer income and do not take into account variables such as advertising, financing costs, and gasoline prices, we will be committing model specification errors. Of course, itis not easy to determine the “correct” model in any given case, but we will provide some guidelines in Chapter 11. ‘You might wonder about all these assumptions. Why are they needed? How realistic are they? What happens if they.are not true? How do we know that a particular regression model ir fact salisfes’all these assumptions? Although these questions are certainly pertinertat this stage of the development of our subject matter, we cannot provide totally satisfactory answers to all of them, However, as we progress through the book, we will see the utility of these assumptions. As a matter of fact, all of Part Ill is devoted to finding out what ‘happens if one or mare ofthe assumptions of CLRM are not fulfilled. But keep in mind that in any scientific inquiry we make certain assumptions because they facilitate the development of the subject matter in gradual steps, not because they are necessarily realistic, An analogy might help here. Students of economics afe generally introduced to the model of perfect competition before they are introduced to the models of ixiperfect competition, This is done because the implications derived from this model enable us to appreciate better the models of imperfect competition, rot because the model of perfect competition is nécessarily realistic, although there are markets that may be reasonably perfectly competitive, sich as the stock market or the foreign exchange market. 7.2 VARIANCES AND STANDARD ERRORS OF ORDINARY LEAST SQUARES ESTIMATORS One immediate consequence of the assumptions just introduced is, that they enable us to estimate the variances and standard errors of the ordinary least squaces (OLS) estimators given in Eqs. (6.16) and (6.17). In Chapter.5 we discussed the basics of estimation theory, including the notions of (point) estimators, their sampling distributions, and the concepts of the variance and standard exror of the estimators. Recalling that knowledge, we know that the OLS 234 — Pata pefho~ 172 PART TWO: THELINEAR REGRESSION MODEL estimators given in Eqs. (6.16) and (6.17) are random variables for their values will change from sample to sample, Naturally, we would like to know something about the sampling variability of these estimators, that is, how they vary from sample to sample, These sampling variabilities, as we know now, are measured by the variances of these estimators, or by theit standard errors (se), which are the square roots of the variances. The variances and standard errors of the OLS estimators given in Eqs, (6.16) and (6.17) areas follows vat (by) = 03 = ie so? (74) (Note: This formula involves both small x and capital X.) se (bi) = var(bs) (73) var(bs) = of = Se 76) 8e(ba) = y/var(ba) (77) whete var = the variance and se = the standard error, and where ois the variance ofthe disturbance term u which by the assumption of homoscedaticity is assumed to be the same for each 1. Once o? is known, then all the terms on the tightthand sides of the preceding equations can be easily computed, which will give us the numerical values of the variances and standard errors of the OLS estimators. The homoscedastic 0? is estimated from the following formula: (78) "where 6? is an estimator of o? (recall we use” toindicatean estimator) and Ye? ig the residual sum of squares (RSS), that is, 5°(¥ —¥j)?, the sum. of the squared: difference between the actual.Y ard tlie estimated Y. (See the next to the last column of Table 6-4.) Theexpression (nt ~2) is known as the degrees of freedom (d,f.), which, as noted in Chapter 4, is simply the number of independent observations: Once ¢;is computed, as shown in Table 6-4, Ye? can be compuled easily. Incidentally, in: Passing rot ‘hat. 6 = Ve 9) ‘The proofs can be found in Damodar N. Gujarati, Basic Econometrics, ath ed., McGraw-Hill, ‘Nevr York, 2603, pp. 101-102, ‘Notice that we can compute ¢; only when ¥; is computed. But fo compute the latter, we must fist obtain by and lin estimating these two unknowns, we lose 2d. Therefore alfough we have ni observations the df are oly (t~2). 235TABLE 7-1 ‘OHAPTER SEVEN: THE TWO-VARIABLE MODEL: HYPOTHESIS TESTING 173 COMPUTATIONS FOR"THE LOTTO EXAMPLE. Estimator Formula ‘Answer "Equation nurbet e oa 1 6654 7.40) é VE = ERB 5 apae ayy x] var (by) (2S) LOR ooh 93168 @.19 0{b1). Vear(by) = VO8168 3.0523 (7.13) ot, $4884. ‘Vat (be) oe 51660: 7 0.000126, (7.14) so(be): Nar (be) = 0.000126 Note:The ra dats under Yng the calc sce penn Table GA. n coming the variances ofthe etinalos, has been feplaced by hs etna, é, Note tha re answers aro ‘ured to the nearest ig 2 7.48) which is known as the standard error of the regression (SER), which is simply ‘he standard deviation ofthe Y¥ values about the estimated regression line * This standard error of regression is often used as a sunnmmary measure of the goodness of fit of the estimated regression line, a topic discussed in Section 76. As you would suspect, the smaller the value of the closer the actual Y value isto its estimated value from the regression mode). Varlances and Standard Errore of the Lotto Example! Using the preceding formulas, let us compute the Variances and standard errors of our Lotto example, These calculations are presented in Table 7-1.(See Eqs.7.30 to7.15 therein) Summary ofthe Lotte Expenditure Function Let us express the estimated Lotta expenditire faction inthe fl lowing form: $= 7.6182 + 0.0814X; s¢ = (8.0502)(0.0112) ‘whert the figures in. the parentheses are the estimated standard etrors. Regression results are sometimes presented in this format (but more on this in Section 7.8). Such a presentation indicates immediately the estimated parameters anc theit standard errors, For example, it tells ts that the estimated slope coefficient of thie Lotto expenditure function (é,, the coefficient of the income (7.16) ‘Note the dtfeenre between he standard eto of egression and the standard deviation of ‘The lates i measured, a woul, from its mean value, as $y = J DEES, whereas the former is iiedoued Grom th estimated vlc (Le, fo the sample regression, Se lo footnote? 236. JOR Iree 174 PARTTWO: THE LINEAR REGRESSION MODEL variable) is 0.0814 and its standard deviation, or standard error, is 0.0112—this is a measure of variability of by from sample to sample. What use can we make of this finding? Can we say, for example, that our computed by lies within a certain number of standard deviation units from the true B,? If we can do that, ‘we can state with some confidence (ie, probability) how good the computed SRE Equation (7.16), is as an estimate of the true PRE. This is, of course, the topic of hypothesis testing; But before discussing hypothesis testing, we need a bit more theory. In particular, since by and by are random variables, we must find thetr sampling, or probability, distributions, Recall from Chapters 4 and 5. that a random variable (r-v.) has a probability distribution associated with it. Once we determine the sampling distributions of our two estimators, as we will show in Section 7.4, the task of hypothesis testing becomes straightforward. But even before: that-we answer an important question: Why do we use the OLS “> method? 7.3. WHY OLS? THE PROPERTIES OF OLS ESTIMATORS im ‘The method of OLS is used popularly not only because it is easy to use but also because it has some strong theoretical properties, which are summarized in the well-known Gauss-Markov theorem. Gauss-Markoy theorem Given the assumptions of the classical linear regression model, the OLS estimators have minimum variance in the class of linear estimators; that is they . are BLUE (best linear unbiased estimators. We have already discussed the BLUE property in Chapter 5.In short, the OLS estimators have the following properties.” 1, by and by are linear estimators; that is, they are linear functions of the random variable Y, which is evident from Equations (6.16) and (6.17). 2. They are unbiased; that is, E(b) = By and E(ba) = Bp. Therefore, in repeated applications, on'average, by and b will coincide with their true values'By and Bs, respectively. 3, E(G2) = 0%; that is, the OLS estimator ofthe ertor variance is unbiased. In repeated applications, on average, the estimated value of the error variance will converge to its true value. 4, by and by are effcient estimators; that is, var (bi) is less than the variance _of any other linear unbiased estimator of By, and var (b,) is less than the "Ror proof, see Damodar N. Gujarati, Basic Econometrics, 4th ed., McGraw-Hill, New Youk, 2005, p.100-106 237(CHAPTER SEVEN: THE TWO-VARIABLE MODEL: HYPOTHESIS TesTING. 175 variance of any other linear unbiased estimator of B,. Therefore, we will be able to estimate the true B, and By more precisely if we use OLS rather than any other method that also gives linear unbiased estimators of the true parameters, The upshot of the preceding discussion is that the OLS estimators possess many desirable statistical properties that we discussed in Chapter 5. Itis for this reason that the OLS mettiod has been used populatly in regression analysis, a8 ‘well as for its intuitive appeal and ease of use. Monte Carlo Experiment In theory the OLS estimators are unbiased, but How do we know that in practice this is the case? To find out, let us conduct the following Monte Carlo experiinest. ‘Assume that we are given the following information: Y= By + BX +m 5 +20 +4 where uj ~ N(0, 4). Thats, we are told that the true values of the intercept and slope coefficiénts are L5 and 2.0, respectively, and that the error term follows the normal distribution with a mean of zero and a Variance of 4. Now suppose you are given 10 values of X: 1,2,3,4,5,6,7,8,9)10. Given this information, you can proceed as follows. Using any statistical package, you generate 10 values of ui from a normal distribution, with mean zero and variance 4, Given By, By, the 10 values of X and the ten values of gen erated from the normal distribution, you will then obtaif 10 values of Y from ‘I the preceding equation. Call this experiment or sample number 1. Go tothe not- A mal distribution table, collect another 10 values of v, generate another 10 values, ay of Y, and call it sample numtber 2. In this manier obtain 21 samples. f For cach sample of 10 valies, regress Y; generated above on the X values and , ‘obtaitt by, by, arid 32. Repeat this exercise for all 21 Samples, Therefore, you will have 21 values each of bi, by, and 6%, We conducted this experiment and ‘ obtained the results shown in Table 7-2. ‘ From the data given i this table, we have computed the méan, or avetage, ‘ values of by, by, and G2, which are, respectively, 1.4526, 1.9665, and 44743, ‘whereas the true values of the corresponding coefficients, as we know, are 1.5, 20, and 4.0. ; ‘What conclusion can we draw from this experiment? It seems that if we apply the method of least squares tirne and again, on average, the values of the estimated paraineters will be equal te their true (population parameter) values. ‘That is, OLS estimators are unbiased. in the present example, had we conducted = more than 21 sampling experiments, we would have come much closer to the true values. 238 =176 PART TWO: THE LINEAR REGRESSION MODEL TABLE7-2 MONTE CARLO EXPERIMENT: ¥ = 1.5-428/+ ue u~ N(O,4)- 7A THE SAMPLING, OR PROBABILITY, DISTRIBUTIONS OF OLS ESTIMATORS Now that we have seen how to compute the OLS estimators and their standard errors and have-examined some of the properties of these estimators, we need to find the sampling distributions of thse estinators: Without that knowledge we will not be able to engage in hypothesis testing. The general notion of sampling distribution of an estimator has already been discussed in Chapter 4 (see Section 42). ‘Todetive the sampling distributions ofthe OLS eatimators by and bp, we need to add one more assiimption to the list of assumptions of the CLRM. This assumption is fia ATI. Tn the PRF Yj = By + BoX; +-us the error term’ follows the normal distribution with mean zero and variance o2, That is 4; ~ NO, 0") (717)CHAPTER SEVEN: THE TWO-VARIABLE MODEL: HYPOTHESIS TESTING 177 What is the ratioriale for this assumption? There is a celebrated theorem in statistics, known as the central limit theorem (CLT), which we discussed earlier in Chapter 4 (see Section 4.1), which states that: Central Limit Theorem If there isa large number of independent and identically distributed random variables, then, with a few exceptions the distribution of their sum. fends to be a normal distribution as the number of such variables increases indefinitely. Recall from Chapter 6 our discussion about the nature of the error term, 4j. AS shown in Section 6.4, the error term represents the influence of all those forces that affect Y but are not specifically included in the regression model because. there are so many of them and the individual effect of any one such force (i. variable) on Y may’be too minor, If all these foices are random, and if we let 4 represent the sum of all these forces, then by invoking the CLT we can assume that the error term w follows the normal distribution. We liave already assumed. that the mean value of uis zero and that its variance, following the homoscedasticity assumption, is the constant 0, Hence, we have Equation (7.17). But iow does the assumption that u follows the normal distribution helps us to find out the probability distributions of by and 6? Here we make use of another property of the normal distribution discussed in Chapter 4, namely, any litear function of a normally distributed variable is itself normally distributed, Does ‘his mean that ifwe prove that by and b, are linear fictions of the normally dis- ‘tributed variable u, they themselves are nonmally distributed? That's right! You can indeed prove that these two OLS estimators are in fact linear functions of the normally distributed uj. (For proof, see Bxercige 7.24 )° Now we know from Chapter 4 that a normally distributed rv. has two parameters, the mean and the variance. What are the parameters of the normally: distefbuted by and by? They are as follows: by ~ N(B, of) a8) by ~ N(Ba, 8) (7.49) where the variances of by and by are as given in (7.4) and (7.6). An short, by and th each follows the normal distribution with their means equal to true B; and By and their variances given by Eqs. (7.4) and (7.6) developed previously. Geometrically, the distributions of these estimators are “as shown in Figure 7-4, S0p¢ exception i the Cauchy probability distibution, which has no meen or variance. Mymay algo be noted that since Yj = By + Kj + wy ify ~ No, then Yy ite AX, 07) ‘eeaist Yi linear combination oft. (Note that By, By ate constants and X; fixed 240 LEE. lviio 1? 178 PARTTWO; THE LINEAR REGRESSION NODEL . % , 1) FIGURE 7-4 (Normal) sampling distributions of by and be. 7.5 HYPOTHESIS TESTING 3 Recall that estimation and hypothesis testing are.the two main branches of sta~ tistical inference. In Chapter 6 we showed how OLS helps us to estimate the parameters of linear regression models. In this chapter the classical framework enabled us to examine some of the properties of OLS estimators. With the added assumption that the error term 1 i8 normally. distributed, we were able to find the sampling (or probability) distributions of the OLS. estimators, namely, the normal distribution. With this knowledge we are now equipped to deal with the topic of hypothesis testing in the context of regression analysis. ‘Let us return to our Lotto example. The estimated Lotto expenditure function is given in Eq, (6.20). Suppose someone suggests that income (ie, PDI) has no effect on the amount of money spent on Lotto each week. Hhy:B, = 0. In applied regression, analysis such a “zér0” null hypothesis, the so-called straw man hypothesis, is deliberately chosen to find out whether Y is related to X atall. If there isno relationship between Y and X to begin with, then testing a hypothesis that By=~2 or any other value is meaningless. Of course, if the zero null hypothesis is sustainable, there is no point at all in including X in the model. Therefore, if X really belongs in the model, you would fully expect to reject the zero null hypothesis He in favor of the allernative hypothesis Hy, which says, for example, that By + 0; thats the slope coefficients diffetent from zero. It could be positive orit could be negative. atCHAPTER SEVEXE THE THOVARIABLE MODEL: HYPOTHESIS TESTING: 179 Our numerical results show that by = 0:0814. You would therefore expect that the zero null hypothesis is not tenable in this case. But we cannot lookat the numerical tesults alone, for we know that because of sampling fluctuations, the numerical value will change from sample to sample, Obviously, we need some formal testing procedure to reject or not reject the null hypothesis, How do we proceed? This should not be a problem now, for in Bquation (7:19) we have shown that >, follows the normal distribution with mean = By and var (bs) = 02/° 3. Then, following our discussion about hypothesis testing in Chapter 5, Section 55, we can use either: : 1. The confidence interval approach or 2. The test of significance approach to test any hypotheses about By as well as By. Since by follows the normal distribution, with the mean and the variance stated in expression (7.19), we know that Ze bab selb:) ~b=hk of \a# follows the standard normal distribution. From Chapter 4 we know the properties of the standard normal distribution, particularly, the property that *95 pes- cent of the area of the normal distribution lies within two standard deviation units of the mean value, where © means approximately. Therefore, if our null hypothesis is By = 0 and the computed by = 0.0814, we can find out the probability of obtaining stich a value from the Z, or standard normal, distribution (Appendix A, Table A-1) If this probability is very small, we can reject the null hypothesis, but if itis large, say, greater than 10 percent, we may not reject the null hypothesis, All this is familiar material from Chapters 4 and 5. Bia, there is a hitch! To use Equation (7.20) we must know the true o?. This is not known, but we can estimate it by using 6? given in Eq. (7.8). However, ifwe replace’s in Bg; (7.20) by its estimator 6, then, as shown in Chapter 4, Eq. (4.8), the right-hand side of Eq. (7.20) follows the t distribution with (n— 2) df, not the standard normal distribution; that is, 6, effet ~NO,1) 720) hee 72a) Ot more generally 1-2 (7.22) se(b2) Note that we lose 2 df, in computing ¢? for teasdins stated earlier, 242 Le aE. a 179180 PART TWO: THELINEAR REGRESSION MODEL Therefore, to test the mill hypothesis in the present-case, we have to use the distribution in lieu of the (standard) normal distribution, But the procedure of hypothesis testing remains the same, as explained in Chapter 5. Testing Ho:B: = 0 versus H,: By # 0: The Confidence Interval Approach For our Lotto example we have 10 observations, hence thie d.f are (10-2) = 8. Let us assume that a, the level of significance or the probability of committing a type Terror, is fixed at 5 percent. Since the alternative hypottiesis is two-sided, _ from the table given in Appendix A, Table A-2, we find thatfor 8 df, P(-2306 < ts 2.306) = 0.95 (7.23) That is, the probability that a f value (for 8 af) lies between the limits (~2.306, 2.306) is 095 or 95 percent; these, as we know, are the critical t values. Now by substituting fort from expression (7.21) into the preceding equation, we obtain P| -2306 < 22 <7396] = 095 (7.28) o/ oa Rearranging inequality (7.24), we obtain P (: = 2306S e# Beh re] =095° (7.25) yL4 . Or, more generally, Pl(bz — 2.306 se(ba) < Br < be + 2.306 se(bz)] = 0.95 (7.26) vihich provides a 95% confidence interval for Bz. In repeated applications 95 out of 100 such intervals will include the true By, As noted previously, in the language of hypothesis testing such a confidence interval is known as the region af acceptance (of Hp) and the area outside the confidence interval is known.as the rejection region (of Ha). Geometrically, the 95% confidence interval is shown in Figite 7-5(a). Now following our discussion in Chapter 5, if this interval (Lez, the acceptance region) includes the null-hypothesized value of By, we do not reject the hypothesis. But if it lies outside the confidence interval (ie, it lies in the rejection region), we reject the null hypothesis, bearing in mind that in making either of these decisions we ate taking a chance of being wrong a certain percent, say, 5 percent, of the time. All that remains to be done for our Lotto example is to obtain the numerical valixe of this interval, But that is now easy, for we have already. obtained ALS, 193(CHAPTER SEVEN: THE TWO-VARIABLE MODEL HYPOTHESIS TESTING 181 Bye 23t6 4 ‘s2eney) ow @ FIGURE 7-5. a} 05% confidence interval fr 8, (8 df); (b) 5% cooidence inferval for the slope coofcient ofthe Loto example se(hs) = 0.0112, as shown in Bq, (7.16) Substituting this value in Equation (7.26), we now obtain the 95% confidence inferval as shown in Figure 7-5(0), 0.0814 = 2.306(0.0112) < By < 0.0814 + 3.306(0,0112) That is, 0.0566 < By < 0.1072 @2n Since this interval does not include the null-hypothesized value of 0, we can reject the null hypothesis that income has no effect on Lotto, expenditure. Pist positively, income does determine thé amount of money spent on Lotto. A cautionary note: As noted in Chapter 5, although the statement given in Eg, (7.26) is true, we cannot say that the probability is 95 percent that the partic- vular interval (7.27) includes the trve B, for unlike Bq. (7.26), expression (727) is, not d random interval; it is fixed. Therefore, the’probability is either 1 or 0 that the interval (7.27) includes Bz. We canonly say that if we construct 100 intervals ike interval (7.27), 95 out of 100 such intervals will include the true By we cane not guarantee that this particular interval will necessarily include By, Following a similar procedure exactly, the reader should verify that the 95% confidence interval for the intercept term By is 0.5796. By < 14.6568 (7.28) Uy for example, Ho:By = 0 vs. HyBs # 0, obviously this null hypothesis will be rejected too, for the preceding 95% confidence interval does not include 0. On 2a8FL 182 (PART TWO; THELINEAR REGRESSION MODEL the other hand, ifthe null hypothesis were that the true intercept term is 11, we would not reject this null hypothesis because the 95% confidence interval includes this value, The Test of Significance Approach to Hypothesis Testing 192 The key idea underlying this approach to hypothesis testing is that of a test statistic (eee Chapter 5) and the sampling distribution of the test statistic under the null hypothesis, Ho, The decision to accept of reject Hy is made on the basis of the value of the test statistic obtained from the sample data. To illustrate this approach, recall that. bp — By se(br) follows the f distribution with (n — 2) d.f. Now if we let *(7.22) HyBy = BS where BS is a specific numerical value of By (e-,, Bf = 0), then “b-Bf mr) __ estimator ~ hypothesized value standard error ofthe estimator ft (7.29) can be readily computed from the sample data. Since all the quantities in Equation (7.29) are now known, we can use the f value computed from Eq. (7.29) as the test statistic, which follows the # distribution with (n ~ 2) d.f, Appropriately, the testing procedure is called the ¢ test.1° Now to use the ¢ test in any concrete application,.we need to know three things: 1, The d.f,, which are always (nt ~ 2) for the two-Variable model 2. The level of significance, a, which is. a matter of personal choice, although 1, 5, or 10 percent levels are usually used in empirical analysis. Instead of arbitrarily choosing the a value, you can‘ find the p value (the exact level of significance as described in Chapter 5) and reject the null hypothesis if the computed p talueis sufficiently low. 3. Whether. we use a, one-tailed or two-tailed test (see Table 5-2 and Figure 5-7). eas "The difference betwreen the confidence interval and the test of significance approaches lies in the fact that in the formes we do-not know what the true Bis and therefore try to guess itby establishing a (1a) confidence interval, In the test of significance approach, on the other hand, we hypothesize what the teue B (=83) sand ty to find out if the sample valu bis gufcently close to (the hypothesized) Bf.‘CHAPTER SEVEN: -THE TWO-VARIABLE MODEL: HYPOTHESIS TESTING. 183 “$555 Di) O_o 2506 8585 FIGURE 7-6 The tdtibuion for Lotte Example Continued 1. Atwo-tailed test. Asstume that Ho:By = and He Bz # 0. Using Bq, (7.29), we find that 7.30) Now from the # table given in Appendix A, Table A-2, we find that for 8 df. we have the following critical values (two-tailed) (see Figure 7-6): Level of signticance” Citcal oot 9355. 0.05 2.308 2:0 4.800, In Table 5-2 we stated that in the case of the two-tailed ? test if the com- ‘puted |¢], the absolute value of f, exceeds the critical ¢ value at the chosen level of significance, we can reject the null hypothesis, Therefore, in the present case we can reject the null hypothesis that the true B, (Le, the income coefficient) is z8r0, because the computed |!) of 7.2624 far exceeds the critical t value even at the 1% level of significanice. We reached the’same conclusion on the basis of the confidence interval shown in Eq, (7.27), which should not be surprising because the confidence interoal and the test of significance approaches to hypothesis testing are merely two sides of the same coin, Incidentally, in the present example the p value (i., probability value) of the ¢ statistics of 7.2624 is about 0.0001, Thus, if we were tozeject the null hypothesis that the true slope coefficient is zero at this p value, we ‘would be wrong in one out of ten thousand océasions. . A one-tailed test, Since the income coefficient in the Lotto expenditure function is expected to be positive, a realistic set of hypotheses would be Ho:By <0 and Hy:By > 0;-here the alternative hypothesis is one- sided. . So. LEME 246 ao184. PART TWO: THE LINEAR REGRESSION MODEL The t-testing procedure remains.exactly the same as before, except, as noted in Table 5-2, the probability of committing a type I error is not divided equally between the two tails of the t distribution but is concentrated in only one tail, either left or right. In the present case itwill be the right tail. (Why?) Fos 8 df, we observe from the ¢ table (Appendix A, ‘Table‘A-2) that the critical # value (right-tailed) is Level of Significance Crical ¢ ; oot 2.698 005° = 1.860 0.10 1.397 For the Lotto example, we first compute the t value as if the null hypothesis were that By = 0. We have already seen that this ¢ value is t = 7.2624 (7.30) Sirice this t value exceeds any of thie critical values shown in the preceding table, following the rules laid dovin in Table 5-2, we can reject the hypothesis that income has no effect on Lotto expenditure; actually it has a positive effect (.e,, By > 0) (see Figure 7-7). Ham = ds) Bia 0 o, FIGURE 7-7 One-ialled test: (e) Righttaleg (0) lettalled,(CHAPTER SEVEN: THE TWO-VARIABLE MODEL: HYPOTHESIS TESTING 185, 7.6 HOW GOOD IS THE FITTED REGRESSION LINE: ‘THE COEFFICIENT OF DETERMINATION, r* ‘Our finding in the preceding séction that on the basis of the ¢ test both the estimated intercept and slope coefficients are individually statistically significant (Le, significantly different from zero) suggests that the SIE, Eq, (7.16), shown it. Figure 6-6 seems to “fit” the data “reasonably” well. Of course, not each actual 'Y value lies on the estimated PRE, That is, not all. ‘% — Yi) are zero; as Table 6-4 shows, some ¢ are positive and some are negative. Can we develop an overall measure of “goodness of fit” that will tell.us how well the estimated regression line, Eq, (7.16), fits the actual Y values? Indeed, suck a measure has been developed and is kriown as the coefficient of determination, denoted by the symbol r? (read as r squared), To see how r2is computed, we proceed a3 follows. Recall that Keh+e (65) Lat us expiess this equation in a slightly different but equivalent forsn as (see Figure 7-8). M-YNo = (hie Dat (%- Tere) 730 Vacation in iain ‘Unipaed ot fomismanvise yr aaijated |. sola wrain Tsnentvake (Note: Foo? ‘Now, letting small letters indicate deviations from mean values, we can write the preceding equation as ywehitea (Note: y; = (%; — Y), etc. Also, note thaté'= 0, a8 a sesult of which Y: the mean values of the actual Y and the estimated Y are the same, Or weber te (7,33) since f= bax. Now squaring Equation (7.33) on both sides and summing over the sample, wwe obtain, after simple algebsaic manipulation, Yee Y#R+ Le (7.34) Or, equivalently, YivsB owt Te (7.35) 248 A FLAG 185FLEE 186 PARTTWO: THE LINEAR REGRESSION MODEL ' YF 4 f) = Vatiton in Y, no explained by regres xX x FIGURE 7-8 Breakdown of fta! variation in Yj This is an important relationship, as we will see, For proof of Equation (7.35), see Problem 7.25. ‘The various sums of squares appearing in Eq, (7.35) can be defined as follows: Sv = the total variation" of the actual Y values about their sample mean Y, which may be called the total sum of squares (TSS). Lf = the total variation of the estimated Y values about their mean value ), which may be called appropriately the sum of squares due to regresiot (Le, due to the explanatory variable [s]), or simply the explained sum of squares (ESS). Le} =asbefre, the residual sum of squates (RSS) or residual or unexplained variation of the Y values about the regression line. Put simply, then, Bg, (7.35) is TSS.= BSS +RSS (7.36) and shows that the total variation in the observed Y values about their mean value casi be partitioned into two parts, one attributable to the regression line and the other to random forces, because not all actual Y observations lie on the fitted line. All this can be seen clearly from Figure 7-8 (see also Fig. 6-6). Now if the chosen SRF fits the data quite well, ESS should be much larger ‘than RSS. Ifall actual ¥ li on the fitted SRF, ESS will be equal to TSS, and RSS will be zero. On thie other hand, ifthe SRF fits the data poorly, RSS will be much larger than BSS. In the extreme, if X explains no variation at alli Y, ESS will be zero and RSS will equal TSS. These are, however, polar cases. Typically, neither ‘Whe torms variation and oariance are diferent. Variation means the sum of squares of deviations of a variable from its mean value, Variance is this sum divided by the appropriate df. In short, variance = vaviation/ p 349CHAPTER SEVEN; THE TWO-VARIABLE MODEL: HYPOTHESIS TESTING 187 ESS nor RSS will.be zero. If ESS is relatively larger than RSS, the SF will explain a substantial proportion ofthe variation in Y. IF RSS is relatively larger than ESS, the SRF will explain only some part of the variation of Y. All these ‘qualitative statements are intuitively easy to understand and can be readily quantified. IF we divide Equation (7.36) by TSS.on both sides, we obtain BSS RSS l= at hs (737) ‘Now let us define” (738) The quantity? thus defined is known as the (samiple) coeficient of determination and is the most commonly used measure of the goodness of fit of a regression line. Verbally, ° measures the proportion or percentage ofthe total variation in ¥ explained by the regression model, ‘Two properties of f may be rioted: 1. Stis anon-negative quantity. Why?) 2, Its limits are 0 if we reject it (Le.,, when the test sfatistic is significant), it means that the true population value is different from zero, ‘One advantage of reporting the regression results in the preceding format is that.we can see at once whether each estimated coefficient is individually statistically significant, that is significantly different from zero, By quoting the p valives'we can determine the exact level of significance of the estimated t value, Thus the Value of the estimated slope coefficient is 7.2624, whose p value is ‘practically zeto, As noted in Chapter 5, the lower the Poalue, the greater the evidence ageing the null hypothesis “A warning is in order here. When deciding whether to reject or not reject @ “null hypothesis, determine beforehand what level of the p value (call it the critical p value) you are willing to accept and then compare the computed p value with the ctitical p value, If the computed p valve is'smaller than the critical poalue, the null hypothesis can be rejected, But if it is greater that the critical (76) Ihe ‘able in the appertdix of this book (Appendix A, abl A-2} can now be replaced by elec- ‘rome tales that will compute the p values to several digits, This is also true of the nonnal, chi- siuane and the F tables (Appendix A. Tables Ad and 3, espectivel 252 i+ aan are f2 88hha 490 PARTTWO: THELINEAR REGRESSION MODEL p value the null hypothesis may not be rejected. If you feel comfortable with the tradition of fixing the critical p value at the conventional 1, 5, or 10 percent level, “that is fine. In Eq. (7.46), the actual p value (Le, the exact level of significance) of the t coefficient of 7.2624 is 0.0001. If we had chosen the critical p oalue at 5 per- -cent, obviously we would reject the null hypothesis, for the computed p value oF 0.0001 is muich smaller than 5 percent. Ofcourse, any null hypothesis (besides the zero null hypothesis) can be tested. easily by making use of the ¢ test discussed earlier. Thus, if the null hypothesis is that the true intercept term is 10 and if Hi:By # 10, the ¢ value will be p= 7818-10 “30625 ‘Thep unlue of obtaining sucha t value is about 0.22, which is obtained fromelec- tronic tables. IF you had fixed the critical p value at the 10 percent level, you would not reject the null hypothesis, for the computed p value is much greater than the critical p onlue, ‘The zero null hypothesis, as mentioned before; is essentially a kind of straw man, It is usually adopted for strategic reasons—to “dramatize” the statistical significance (i.e., importance) of an estimated coefficient. =—0.7803 7.8 COMPUTER OUTPUT OF THE LOTTO EXAMPLE. Since these days we rately.run regressions manually, itmay be usefiil to produce the actual oiitput of regression analysis obtained froma statistical software package. Below we give the selected output of ourLottoexample obtained from Eviews, 253 199(CHAPTER SEVEN: THE TWO-VARIABLE MODEL: HYPOTHESIS TESTING 194 7.9 NORMALITY TESTS Before we leave our Lotto example, we need to look at the regression results given in Bq, (7.46). Remember that our statistical testing procedure is based on the assumption that the ecror term u; is normally distributed. How do we find out if this is the case in our example, since we do not ditectly observe the true ‘errors u;? We have the residuals, ¢ which are proxies for ;. Therefore, we will have to use the ¢;to learn something about the normality of w;. There are several tests of normality, but here we will eonsider only three comparatively simple tests.’ Histograms of Residuals A histogram of residuals is a simple graphital device that is used to leam something about the shape of the probability density function (PDF) of a tandom vatiable. On the horizontal axis, we divide the values of the variable of interest (eg., OLS residuals) into suitable intervals, and in each class interval, we erect rectangles equal in height to the number of observations (ie, frequency) in that class interval. If you mentally superimpose the bell-shaped normal distribution curve on this histogram, you might get some idea about the nature of the probability distribution of the variable of interest, It is aways a good practice to plot the histogram of residuals from any regression to get come rough idea about the likely shape of the underlying probability distribution: for 9 deteited discussion’ of varios inoritlity rests: see G. Bane Wethwrbill, Régressta Analysis with Applicafions, Chapinan aid Hall, London, 1986, Chap. 8 28a PA EA ony oDSE So 192 PARTTWO: THE LINEAR REGRESSION MODEL Normal Probability Plot ~ ‘Another comparatively simple graphical device to study the PDF ofa random variable is the norinal probability plot (NEP) which makes use of normal probability paper, a specially ruled graph paper. On the horizontal axis, (X-axis) we plot values of the variable of interest (say, OLS residuals ¢), and on the vertical, axis (Y-axis), we show the expected Values of this vatiable if its distribution ‘were normal. Therefore; if the variable is in fat from the normal population, the NPP will approximate a straight line, MINITAB has the capability to plot the ‘NPP of any random variable, MINITAB also produces the Anderson-Darling normality test known as the A? statisti. The underlying nill hypothesis is that a variable is normally distributed. This hypothesis can be sustained if the com 7 Pate As notte spit. Jarque-Bera Test * statistical packages is the Jarque-Bera (JB) test A test of normality that has now become: very popular and is included in several ‘This is an aeymptotio, or large sample, fest and is based on OLS residuals. This test first computes the coefficients of skewness, S (a measuie of asyminetry of a PDF), and kurtosis, K (a measure of how tall or flat a PDF is in relation to the normal distribution), of a random variable (¢.g., OLS residuals) (see Chapter 3). For a normally distributed variable, skewness is zero and kurtosis is 3 (see Figure 3-4). Jarque and Bera have developed the following test statistic: w= 2{4 oP an where 1 is the sample size, $ represents skewness, andl K represents kurtosis. They have shown that under the normality assumption the JB statistic given in Equation (ZAM) follows the chi-square distribution with 2 df. asymptotically (ve, in large samples) Symbolic Bing ~: ee (748) whiese asy means asymptotically. As you can see from Eq, (7.47), if a variable is normally distributed, S is zero. and (K ~ 3) is also zero, and therefore the value of the JB statistic is zero ipso facto. But ifa variable is not normally distributed, the JB statistic will assume increasingly larger values. What constitutes a large or small value of the JB statistic canbe learned easily from the chi-square table (Appendix A, Table A). Ifthe computed chi-square value from Bq, (7.47) exceeds the critical chi-square value for 2 df. at the chosen level of significance, we reject the null hypothesis of sormal distribution; butif it does not exceed the ctitical chi-square value, we do “See C. M. Janque snd A. K. Bera, “A Test for Normality of Observations and Regression Residuals,” international Statistical Review, vol. 55, 1987, pp. 163-172.CHAPTER SEVEN: THE TWO-VARIABLE MODEL: HYPOTHES not eject the null hypothesis, Ofcourse, if we have the p value ofthe comiputed chi-square value, we will know the exact probability of obtaining that value, We will illustrate these normality tests with the following example, 7.10 ACONCLUDING EXAMPLE: RELATIONSHIP BETWEEN WAGES AND PRODUCTIVITY IN THE U.S, BUSINESS SECTOR, 1 FIGURE 7-10 According to the maiginal productivity theory of iiicroeconomics, we would expecta positive relationship betweeh wages and worker productivity, To see if this 60, in Table 7-3 we provide data on labor productivity, as measured by the index of output pes hour of all persons, and wages, as measured by the index of real compensation per hour, for the business sector of the U.S. economy for the period 1959 to 2000. The base year of theindex is1992 fi hourly real compert- sation is hourly compensation divided By the: ice index (CPD)... Let Ruoages (Y/) == index of seal compensation ahd Protctivity (X) index of ‘output per hour of all persons. Plotting these .we Gptain the scatter dia: gram shown in Figure 7-10, ‘This figure shows a very close finear relationship between labor prodiic- tivity and real wages. Therefore, we can use a (bivariate) linear regression Index of Compensation 2 eb ht ah vie 7 aN | : Index of Productivity ia Relationship between Compensation and produc ine U.S, business ‘séctor, 195972000, 256 a7. ‘yh PARTTWO: THE LINEAR REGRESSION MODEL ‘TABLE7-3__ INDEXES OF COMPENSATION AND. PRODUCTIVITY IN THE U.S, BUSINESS. ‘SECTOR, 1959-2000. Observations Productvty Compensation 1959 + 47.900. Ba 1960 48.8000, 59.8000, 1961“ 50.6000 61,6000 1962 "62,9000 63,7000 1963 5.0000 65.2000 1964 87,5000, 87.7000 1985 . 69,6000 69.1000: 1968 62.0000 74,7000 1967" 63.4000. 8.5000 1968 65.9000 78.0000: 1969 85.7000 773000 1970, 67.0090 78,8000, - 1971 62,9000. 80,1000 1972 72.2000 82,4000 1973 745000°° 84.2000 1974 "73.2000 83,2000. 1975 75.8000 14,1000 1976 + 72,6000 866000 1977 79.9000 87.6000 1978. 80.7000 89.2000 1978 80.7000 189.4000 1980 804000 892000 4981 82,0000 59000 1982 81.6000 90,7000 1983 84,6000 90.7000 1984 87.0000 91.1000 1985 88.7000 925000 1908 91.4000 1955000 \ 1987 91.9000 96.0000 1988 193.0000 97.1000 1989 99.9000 96.7000 7 1990. 95.2000, 96.3000. 4 1991 983000 97.2000 1992 ‘100.0000 490.0000 g 1903 +100.5000 022000 1994 tot.9000 99,9000 1995 102.6000 98.6000 1996 108.4000 100.1000 1997 107.8000. 101.0000, 1998 110.6000. 105.0008 1999 113.5000 107.6000 2000 116.9000 411.2000 ‘Source: Ecunomis Repert of the President, 2001. At: indx uber, 1982 = 00, vey(CHAPTER SEVEN: ‘THE TWO-VARIABLE MODEL: HYPOTHESIS TESTING: 195, model to the data given in Table 7-3, Using Eviews, we obtained the following results: Let us interpret the results. The slope coefficient of abiout 0.72 suggests that if the index of productivity goes up by a iinit, the index of real wages goes up, on average by 0.72 units, This coeffident is highly significant, for the ¢ value of about 369 (obtained under the aisumption that the true population coefficient is zero) is highly significant for the’ value is almost zero. The intercept coefficient, C, is also highly significant, for the p ualue of obtaining, a f value for this coefficient of as much as about 17 is practically zero. ‘The R? value of about 0.97 means that the index of productivity explains about 97 percent of the variation in the index of real compensation. This is a very high value, since an R? can at most be one. For now neglect some of the information given in the preseding table (eg, the Durbin-Watson statistic) for we will explain it at appropriate places, Figure 7.11 gives the actual and estimated values of the index of real compensation, the dependent variable in our model, as well the differences between the tivo, which are nothing but the residuals ¢. These residuals are also plotted in this figure. , Figure 7-12 plots the listogram of the residuals sHiown in Figure 7-11 and also shows the JB statistics. The histogram and the JB statistic show that there is no reason to reject the hypothesis that ‘the true error terms in the wages-productivity regression are normally distributed, Figure7-13 shows the normal probability plot of the residuals obtained from the compensation-productivity regression; this figure was obtained from MINITAB. As is clear from this figure, the estimated residuals lie approximately on a straight line, suggesting that the error terms (ie, 1) in this regression may be norially distributed. The computed AD statistic of 0.519 has a p value of about 0.17 or 17 pesvent. IE we fix the evitical p ual, say, atthe 5 percent level, the obserivid AD statistic ig not significant, probably confirming that the error terms are normally distributed. 258 , x 4196 PART TWO: THELINEAR REGRESSION MODEL. Yer (Residuals Residual graph, 9 sas) ess 4D 196598000 _—«8DRSY« ~ 34805 we ated ase wa sm 6620 198200. OTE, Wek. 6rd” esa 6. @1u0 - T9Tm 1866 TemmD’ *: 78100. We Ro 738m 1968 Mono,’ T5518 19695. 7rs000 = 75.805, 197 MsiO: Toa ign > ‘soatwo* rasi8 197 Baloo.” sas 1973. seaieo. ‘shast8 19m signed 508555 . “195° ehatoo —sa700 1976 865000 sare? 197 srfteo. as, 1978 2m. > S665 1979” syago0 86.3065 1960 so.am0.- 860500 ‘wet * ‘eeam00 97.248 82. 997000. 868561 1983. 907000. e219 oe 9.100. es 1985 925000" 920805 1985 985000... stom 1957 96am 943500 1983 too. 951689 1999 7000 9555 1990 s6at00 s67716 wo. wae 9887 1921000 490236 193°" 102200, 100597 194° 995000 10.607 195. 9000” m3 1996 < 10n400 104134 997. 0.goo. 105866" 90 Sage era 1999. 17ado. 109560 ood 100. Ra 12860 LES FIGURE 7-11 Actual ¥, estimated and residuals (Regression of ‘compensation on productivity), ‘Not: ¥=Aotual index of Cetrpanaton. Ye Ebimated lodex of Compensation, 741 AWORD ABOUT FORECASTING ‘We noted in Chapter 6, that one of the purposes of regression analysis is to predict the mean value of the dependent variable, given the values of the explanatory Variable(s). ‘To. be more. specific, let us: return to our Lotto example. Regression (7.46) presented the results of expenditure on Lotto based on the data of Table 6-2, Suppose we want to frid out the average expenditure on Lotto DR a)(CHAPTER SEVEN: THE TWO-VARIABLE MODEL: HYPOTHESIS TESTING 197 es QE eA FIGURE 7-12 Histogram of residuals from the compensation productivity regression, Probability Plot of RESIL Nommal-95% CI 8 Percent o 5S RBSBSIS BB et Lk 400-3000 ~2000 ~1000 0 1000. 2900. 3000 4000 F000 RESIL FIQURET18 Namal baby lta reste band fm te cxmpenstin-paduolety regression. ‘by a person with a given level of income. Whatis the expected level of Lotto expenditure at this level of income? ‘To fix the ideas, assuuiie that X (PDI) takes the value Xo, where Xq is some specified numerical value of X, say, Xo = $340. Now suppose we want to estimate E(Y | Xp = 340), that is the true inean Lotto expenditure corresporiding to the PDI = 340. Let % = the estimator of E(Y | Xo) (7.49) 260. a A AEtE E> 198 PART TWO: THE LINEAR REGRESSION MODEL 199 How do we obtain this estimate? Under the assumptions of the classical linear regression model (CLRM), it can be shown that Equation (7.49) can be obtained by simply putting the given Xo value in Eq, (7.46), which gives: Yasuo = 7.6182 + 0.0814(340) = 35,2942 (750) That is, the forecasted mean expenditure on Lotto for a person with weekly income of $340 is about $35 per week. Although econometric theory shows that under CLRM Yx-si0 ot, more generally, % is an unbiased estimator of the true mean value (ie., a point on the population regression line), it is not likely to be equal to the latter in any given sample, (Why?) The difference between them is called the forecasting, of prediction, error: To assess this error, we need to find out the sampling distribution of ¥.!6 Given the assumptions of the CLRM, it can be shown that fp is normally distributed with the following mean and variance: Mean = E(¥ | Xo) = By + By Xo _ aft, o- st) vonoi[} Set where X = the sample mean of X values in the historical regression (7.46) 22 = their sum of squared deviations from X = the variance of u; n= sample size ‘The positive square root of Equation (7.51) gives the standard error of Yo, se(%o).. _, Since in practice 0” is not known, if we replace it by its unbiased estimator 6”, 4 follows the ¢ distribution with (n — 2) d.f. (Why?) Therefore, we can use the f distribution to establish a 100 (1 — «)% confidence interval for the true (Le, pop- . ulation) mean value of Y cotresponding to Xp in the usual manner as follows: Plba + baXo— fa se Yo) < Br + Bo Xo
(CHAPTER SEVEN: THE TWO-VARIABLE MODEL: HYPOTHESIS TESTING 205 TABLE7-4 CAPACITY UTILIZATION AND INFLATION INTHE UNITED STATES, 1970-2001. ‘Observations “Capacity Iniiafon 1970 ez0000-. 6.900000 1971 77,6000 6.000000. 1972 83,2000 4.900000 1973 8750000 5.600000 1974 84.1000 9.000000 1975 73.40000 9.300000 1978 7.90000 5.700000 197 ‘82.90000: 6.400000 - 1978 '84,50000° 7.100000 1979. 84,20000 8.900000 1960 78.8000. 9.200000 4981 ‘7720000. 9.900000 1982 71,1000 ° 6.200000" 1983. 73,5000 - 4.000000 1984 Taso: 8.700900 1985: 72,40000° 9.200000 1966 78,80000 . 2.200000. 1967 ‘81,10000 3.000000. 1968 ‘84.1000 3.400000. 1969 83.2000 3.600000 1990 81.6000 3.800000 1991 7830000. 3.600000 tage 79.90000. © 2.400000 1993 ‘80.0000 - 2.400000 1994 2.40000 2.100000 1905, ‘e2.a0000:.- 2200000 1996 1.20000. 1.900000 1907 ‘8270000: 1.900000 1988 e900 1.200000 1999 1.40000 “1.400000. 2000 ‘81.4000. 2.100000 2001 TRO 2.400000 ‘Soiree: Economie Report o tie Posen, TABLE 7-5 CASH DIVIDEND (Y) AND APTER-TAX PROFITS (X) IN| U.S, MANUFACTURING INDUSTRIES, 1974-1986. Your ¥ Ko Year 8 x ($; In milions) (8, in milione) 1974 19,487 | SB76T YOR 40at7™ 101,902 1976: 19,968. 49,195. 1982: 41,259: 71,028, 1976 22,7063 64518. 1983 41,604. 85,694 4977 26.585 70.968. ibs. "45,02 YOTS48 4978 28,982 81,148. 1985. 45517» 87.648 197° 32,401 95,698 1908 48,044. -83,t2t 1880. 36,495 . ‘Sour Businoss Stes, 1906, U.S. Dapartmont of Conner, Buroau of Economic Anais, Decombar 1887, p72: 268 m P 2 EB a\ 206 PARTTWO: THE LINEAR REGRESSION MODEL b. Plot the scattergram between Y and X. ¢. Does the scattergtam support your expectations in part (a)? . Ifo, do an OLS regression of Y on X and obtain the usual statistics. , Establish a 99% confidence interval forthe tre slope and test the hypothe- -sis that the true slope coefficient is zero; that is, there is no relationship between dividend and the after-tax profit, 7.45. Refer to the S.A. data given in Table 6-15. Suppose you want to predict the male math scores on the basis of the female math scores by running the following regression: : Y= By + BX +h where ¥ and X denote the malé and female math scores, respectively. a, Bstimate the preceding regression, obtaining the usual summary statistics. b. Testthe hypothesis that there is no elationship between Y and X whatsoever. . Suppose the female math score in 1991 is expected to be 460. What is the predicted (average) male math score? 4, Establish a 95% confidence interval for the predicted value in part ()- 7.6. Repeat the exercise in Problem 7,15 but let Y and X denote the male and the female verbal scores respectively. Assume a female verbal gcore for 1991 of 425. 77. Consider the following regression results? R? =0.10, Durbin-Watson = 201 where Y = the real efurn on the stock prie index from January of the current year to Januasy of the following year X = the total dividends in the preceding year divided by the stock price index for July of the preceding year t=times © ‘Note: On Durbin-Watson statistic, see Chapter 14. ‘The time period covered. by the study was 1926 to-1982, Note: R® stands for the adjusted coefficient of determination, The Durbin- ‘Watson valie isa measure of autocorrelation. Both measures are explained in. subsequent chapters, . a. How would you interpret the preceding regression? b. If the previous results are acceptable to you, does that mean the best in- ‘vestment strategy is to invest in the stock market when the dividend /price ratio is high? ¢. If you want to know the answer to patt (bead Shiller's analysis. 78, Refer to Example 6.1 on years of schooling and average hourly eamigs. The data for this example is given in Table 65 and the regression results are presented in Bq, (6.21). For this regression a Obtain the standard errors of the intercept and slope coefficients and 1”. ’b, Test the liypothesis that schooling has no effect ont average hourly earning, Which test did you use and why? ee gee Robert}, Shiller, Market Volailty, MIT Press, Cambridge, Mass, 1989, pp. 32-86. 269TABLETS (CHAPTER SEVEN: THE TWO-VARIABLE MODEL: HYPOTHESIS TESTING 207 ¢. If youreject the null hypothesis in (), would you also reject she hypothesis thatthe slope coefficient in Bq, (6.21) is not different from 1? Show the nec- calculations. essay 7.19, Example 6.2 discussed Okun’s law, as shown in Bq, (6.22), This equation can" also be written as X; = By + ByY,, where X = percent growth in ral output, as measured by GDP and Y = change in the unemploysnent rate; measured in percentage points, Using the data given in Table 6-12, a Estimate the preceding regression, obtaining the usual results as per 46). by Is the change in the unemployment rate a significant determinant of percent growth in real GDP? How do you know? ¢. How would yott interpret the intercept coefficient in this regression? Does ithave any economic meaning? 720, For Example 63, relating stock prices to interest rate, are the regiession results givert in Eq, (6.24) statistically significant? Show the necessary calculations, 17.21. Refer to Example 6.5 about antique clocks and their prices. Tased on ible 6d, -we obtained the regression results shown in (6.27) and (6.28); For each regression obtain the standard errors, the ¢ ratios and the r? values. Test for the statistical significance of the estimated coefficients in the two regressions, 7.22. Refer to Problem 6.16. Using OLS regressions, answer questions (, (6), and (c). 7.23. Table 7-6 gives data on US. expenditure on imporied goods (Y) and personal, disposable income (X) for the 1968 to 1987. Based on the data given in this table, estimate an import expenditure furic- tion, obtaining the. usual-regression statistics and test the hypothesis that expenditure on imports is unrelated to personal disposable income, 7.24 Show that the OLS estimators, by and be, are linear estimators. Also show that these estimators are linear functions of the error term u; (Hitt: Note that b= Lays DF = LDwiy, where w; = %/ 33? and note that the X’s are minepchany 7.25. Prove Eq, (735). (Hint: Square Eg, (7.33) and use sorne of the properties of OLS). U.S. EXPENDITURE ON IMPORTED GOODS (7) AND PERSONAL DISPOSABLE INCOME (X}, 1968-1087". Yeats Xe Year x 1968. 135.7 969 1446 1970 1509 4971. 1882 we 107 1973 2182 $814 3818: 1975 187.9 976-2293 jor? | 2594 ‘Sourge: Economie Repar othe President, ori ett p Sh enleda se Yom IbbaB2? p23 “Inbiifns of 1962 doles, 270 rae PERS:CHAPTER MULTIPLE REGRESSION: ESTIMATION AND HYPOTHESIS TESTING 7253 riable linear regression model that we have considered so far there ingle independent, or explanatory, variable, In this chapter we extend that model by considering the possibility that miore than one explanatory variable may influence the dependent variable. A regression model with moze than ‘one explanatory variable is known as a multiple regression model, multiple because multiple influences (ie., variables) may affect the dependent variable, For example, consider the 1980s savings and loan (S&L) crisis resulting from the bankruptcies of some S&L institutions in several states. Suppose we want to develop a regression model to explain bankruptcy, the dependent variable. Now a phenomenon such as bankruptcy is too complex to be explained by a single explanatory variable; the explanation may entail several variables, such as the ratio of primary capital to total assets, the ratio of loans that are more than. 90 days past due to total assets, thie ratio of nonaccruing loans to total assets, the ratio of renegotiated loans to total assets, or the ratio of net income to total assets, etc.! To include all these variables in a regression model to allow for the multiplicity of influences affecting bankruptcies, we have to considera multiple. regression model. Needless to say, we could cite hundreds of examples of multiple regression models. Infact, most regression models are multiple regression models because very few economic phenomena can be explained by only a single explanatory: variable, as in the case of the two-variable model, "As 9 matter of fact, these were some ofthe variables that were considered by the Board of Governors of the Federal Reserve System in their intemal studies of bankrupt banks. 271(CHAPTER EIGHT: MULTIPLE REGRESSION: ESTIMATIONAND HYPOTHESIS TESTING 209 Inthis chapter we discuss the multiple regression model seeking answers to the following questions: 1. How do we estimate the multiple regression models? Is the estiinating procedure any different from that forthe two-variable model? 2. Is the hypothesis-testing procedure any different from the two-variable model? 3, Ate there any unique features of multiple tegressions that we did not encounter in the two-variable case? 4, Since a multiple regiession cart have any’ number of explanatory variables, how do we decide how many variables to include in any given situation? To answer these and other related questions, we first consider the simplest of the multiple regression models, namely, the three-variable model in which the behavior of the dependent variable Y is examined in relation to two explanatory variables, X and Xs. Once the three-variable model is clearly understood, the extension to the four-, five-, or more variable case is quite straightforward, although the arithmetic gets a bit tedious. (Bit in this age of high-speed computers, that should not be a problem.) It is interesting that the three-variable ‘model itself is in many ways a clear-cut extension of the two-variable model, as the following discussion reveals, “81. THE THREE-VARIABLE LINEAR REGRESSION MODEL Genetalizing the two-variable population regression function (PRE), we can write the three-variable PRF in its nonstochastic form as EY) = Bi + ByXer + BaXx ar and in the stochastic form as, Y= Bit BrXa + BaXy +04 62) SEM) +H 3) where ¥ = the dependent variable Xand X= the explanatory variables u = the stochastic disturbance term: ¢ = the ith observation: ion (6.1) can be wate att E(Y) = BiXy + BX + BpXe, withthe understanding that {for each observation. The ptesentetion in (8.1) for notational convenience in thatthe sub- scrips on the parameters or fheirestitaiors match the subscripts onthe variables to which they are ‘tached, 272Af Mee 240 PART TWO: THE LINEAR REGRESSION MODEL In case the data are cross-sectional, the subscript i will denote the ith observa- Hon. Note that we introduce w in the three-variable, or, more generally, in the multivariable model for the same reason that it was introduced in the two- variable case, By is the intercept term. It represents the average value of Y when Xpand X5 are set equal'to zero. The coefficients By and By are called partial regression coefficients; their meaning will be explained stiorly. Following the discussion in Chapter 6, Equation (8.1) gives the conditional’ - mean value of Y, conditional upon the given or fixed values of the variables Xp and Xs, Therefore, as in the:two-variable case, multiple regression analysis is conditional regression analysis, conditional upon the given or fixed values of the explanatory variables, and we obtain the average, or mean, vale of Y forthe fixed values of the X variables, Recall that the PRF gives the (conditional) means of the ¥ populations corresponding to the given levels of the explanatory variables, Xp and Xs; ‘The stochastic version, Equation (8.2), states that any individual Y value can be expressed'as the sum of two components: 1, A systematic, or deterministic, component (By + BxXn + BsXay), which is simply its mean value E(Y;) (ie., the point on the population regression line, PRL) and 2. m, which is the nonsystematic, or random, component, detertnined by factors other than X2 and Xs. All thisis familiar territory fromi the two-variable case; the only point tonote is that we now have two explanatory variables instead of one explanatory variable, Notice that Eg, (8.1), or its stochastic countespart Eq. (8.2), is a linear regression rmodel—a model that is linear inthe parameters, the Bs. As noted in Chapter 6, our concern in this book is with regression models that are linear in the parameters; such models may of may not be linear in.the varlables (but more on this in Chapter 9). The Meaning of Partial Regression Coefficient ' ‘As mentioned earlier, the regression coefficients B, and By are known as partial regression or partial slope coefficients, The meaning of the partial segression coefficient is as follows: Bp measures the change in the mean value of Y. E(Y), per unit change in Xp, holding the value of Xs constant, Likewise, By measures the Unlike the two-variable case, we cannot show this diagrammatically because to represent the thie variables Y, Xo, and %5, we have to use a three-dimensional diagram, which is difficult (o visualize in two-dimensional fori But by stretching the imagination, we can visualize a diagram. similar to igure 6-6. ‘Geometrically, the PRL inthis case repress what is known as plane. 27%CHAPTER EIGHE MULTIPLE REGRESSION: ESTIMATION AND HYPOTHESIS TESTING 217 change in the mean value of Y per unit change in Xs, holding the value of Xp constant. This is the unique feature of a multiple regression; in the two-variable case, since there was only a single explanatory variable, we didnothave to worry about the preserice of other explanatory variables in the model, In the multiple segression model we want to find out what part of the change in the average value of Y can be directly attributable to Xz and what part to X3. Since this point is so crucial to understanding the logic of multiple regression, let us explain it by a simple example. Suppose we have the following PRF: B(Y) = 15 1.2% + 08%, 6a Let Xs be held constant atthe value 10, Putting this value in Equation (8.4), we obtain, (85) Here the slopé coefficient By = —1.2 indicates that the mean value of Y decreases by 1.2 per unit increase in Xp when Xs is held constant—in this example it is field constant at 10 although any other value will do. This slope coefficient is called the partial regression ca evn if we hola Xp constant, say, at the value 5, we obtain. E(Y;) = 15 — 1.25) + 0.8 =9+08%y 6a ‘Here the slope coefficient Bs = 0.8 means that the mean value of Y increases by 0.8 per unit increase in X3 when X; is held constant—here it is held constant at 5, bat any other value will do just as well. This slope coefficient too is a partial regression coefficient. In short, ther, a partial regression coefficient reflects the (partial) effect of ane explanatory variable on the mean value ofthe dependent variable when the values of other explanatory variables included in the model are held constant. This unique feature of multiple regression enables us not only. to include more than one-explanatory vatiable in the model but also to “isolate” or “disentangle” the effect of each X. variable on Y from the other X variables inclisded in the model. We will consider a concrete example in Section 8.5. (5) shows, it does not matter at what yahie X, fs held constant, for that Sas the algebra of i ri by its coefficient wil be a constant number, which will simply be added constant value raul! othe inter ‘The mathematically inclined read will notice at ance that By isthe partial derivative of HY) with respect to X end that Bs is the partial detivative of E(Y) with respect to Xi, 274 OOP bo m2212 PART TWO: THELINEAR REGRESSION MODEL 8.2 ASSUMPTIONS OF MULTIPLE LINEAR REGRESSION MODEL Pa £32 As in the two-variable case, our first order of business is to estimate the regression coefficients of the multiple regression model. Toward that end, we continue to operate within the framework of the classical linear regression model. (CLRM) first introduced in Chapter 7 and to use the method of ordinary least squares (OLS) to estimate the coefficients, Specifically, for model (8.2), we assume (cf. Section’7.1): A8.1. ‘The regression model is linear in the parameters as in Eq, (8.1) and that it is correctly specified. : AB2. Xpand Xs are uncorrelated with the disturbance term u, However if Xpand Xsare nonstochastic (Le, fixed number in repeated sampling), this assump-. tion is automatically fulfilled, Since our regression analysis is conditional regression analysis, conditional on the given X values, assumption 8.2is not necessary. But it is made to deal with simultaneous equation regression models discussed in Chapter 15, where we will see that some of the X variables may be correlated with the error ferm. A83, The error term u has a zero mean value; that is, E(u) +0 @n A84. Homoscedasticity, that is, the variance of u, is constant: var(uso2. ” 8) ABS. No autocorrelation exists between the error terms ij and up cov (uj, uj) ix j (8.9) ‘A8.6. No exact collinearity exists between X, and Xs; thatis, there is no exact linear relationship between the two explanatory watiables, This is a new assumption’ and is explained later 275,CHAPTER EIGHT: MULTIPLE REGRESSION: ESTIMATION AND HYPOTHESIS TESTING 213 AB7, For hypothesis testing, the error term u follows the normal distribution with ‘meat zero and (homoscedastic) variance 02. That i, a ~ NiQ;0) B10) Except for assumption 8.6, the rationale for the other assumptions the same as that discussed for the two-variable linear regression. As noted in Chapter 7, we make these assumptions to facilitate the development of the subject, It Part IIT we will revisit these assumptions arid see what happens if one or more of them ate not fulfilled in actual applications. Assiimption 86 that there is no exact linear relationship between the explanatory variables Xpand X,, technically knowmas the assumption of nacollinearity, or ‘nomulticollinearity, if more than one exact linear relationship is involved, isnew and needs some explanation. Inforinally, no perfect collinearity means that a variable, say, X, cannot be expressed as.an exact linear function of ariother variable, say, Xs. Thus, ifwe can express Xy = 34+ 2Ky Xo = 4Xy then the two variables are collinear, for theré is an exact linear felationship between Xp and Xs. Assiimption £.6 states that this should not be the case. The logic here is quite simple. If for example, Xp = 4X5, then substituting this in Bg. (8.1), we see that (6.41) A=4Bj + Bs (8.22) Equation (8.11) is two-variable model, not a three-variable model. Now even. if we can estimate Eq, (8.11) and obtain an estimate of A, there isno way that we can get individtal estimates of B, or Bs from the estimated A, Note that since Equation (8.12) is one equation with two unknowns we need two (indeperident) equations to obtain unique estimates of By and By ‘The tipshot of this discussion i that in cases of perfect collinearity we cannot estimate the individual partial regression coefficients By and Bs, in other words, 276 FERIAMf Ban 214 PARTTWO: THELNEAR REGRESSION MODEL we cannot assess the individual effect of X and Xs on Y. But this is hardly ‘surprising, for we really do not have two independent variables in the model, ‘Alttiough, in practice, the case of perfect cbllinearty is rare, the cases of high or near perfect collinearity abound. In later chapter (see Chapter 12) we will examine this case ‘more fully. For now we merely require that two or more explanatory variables do not have exact linear relationships among them. 8.3 ESTIMATION OF PARAMETERS OF MULTIPLE REGRESSION ‘To estimate the parameters of Hq. (8.2), we use the ordinary least squares (OLS) method whose main features have already been discussed in Chapters 6 and 7. Ordinary Least Squarés Estimators To find the OLS estimators, let us first write the sample regression function (GRE) corresponding to the PRF Eq, (62), as follows: Ya br + ba¥n tbsXu er 6.13) where, following the convention introduced in Chapter 6, ¢ is the residual term, or simply the residival—the sample counterpart of and where the bs are the estimators of the population coefficients, the Bs. More specifically, = the estimator of Bi }, = the estimator of By bg = the estimator of By ‘The sample counterpart of Eq, (8:1) is Y= by + bake tbaXe (6.14) ‘which is the éstimated population regression lirie (PRL) (actually a plane). As explained in, Chapter 6, the OLS principle chooses the values of the unknown parameters in sucki a way that the residual sum of squares (RSS) Je? is as small as possible. To do this, we first write Equation (8.18) as. = Yih = ka, body (6.15) Squaring this equation on both sides and summing over the sample observa tions, we obtain. a RSS: yee = Te. he bike = bX)? (8.16) And in OLS we minknize this RSS (which is simply the sum of the squared differen betwee actual Yj and estimated Y), 277CHAPTER EIGHT: MULTIPLE REGRESSION: ESTIMATION AND HYPOTHESIS TESTING. 215 The minimization of Equation (8:16) involves the calculus technique of differentiation. Without going into detail, this process of differentiation gives us the following equations, known as (least squares) normal equations, to help estimate the unknowns’ [compare them with the corresponding equations given for the two-variable case in Equations (6.14) and (6.15)]: Y= by +b2% + bas (6.17) DIY Xa = th YD Xa th FG + bs XX (8.18) Dive = Y Re they tote tbe PH (8:19) Where the summation is over the sample range 1 to n. Here we have three eqita- ‘tions in three unknowns; the knowns are the variables Y and the Xs and the unknowns are the bs. Ordinarily, we should be able to solve three equations with three unknowns, By simple algebraic manipulations of the preceding equations, we obtain the three OLS estimators as follows: he Pbk -b% (620) (Dyta)(Dx8) ~ (Lyra) (Daa) ner (Pa)C4)- (Lael oe _ (Ls) (E9) = (Ce) (Tray) (L4)(L8)- (Daw)? where, a3 usual, lowercase letters denote deviations from sample mean values. egyw=%—¥). (8.22) ~ You will notice the similarity between these equations and the corresponding ones for the two-variable case given in Bags. (6:16) and (6.17). Also, notice. th the following features of the preceding equations: (1) Equations (8.21) and (8.22) ate symumetrical in that one can be obtained from the other by interchanging, the roles of Xo and Xs, and (2) the denominators of these two equations are ‘identical. ee Variance and Standard Errors of OLS Estimators: ‘Having obtained the OLS estimators of the intercept and partial regression coefficients, we can derive the variances and standard errors of these estimators in the manner of the two-variable model, Thesé variances or standard errors. give us some idea about the variability of the estimiators from sample to sample. As in the two-variable case, we need the standard érrors for two main The mathematical details can be found in Appendix 84.1 278LB aD a 216. PARTTWO: THE LINEAR REGRESSION MODEL purposes: (1) to establish confidence intervals for the true parameter values and (2) to test statistical hypotheses. The relevant formulas, stated without proof, are as follows: var(by) = [: + bea pas) y2) ESS = the explained sum of squares (te, explained by all the X variables) RSS = the residual sum of squares Also, as in the two-variable éase, R? is defined as = 15 R (8.33) is, itis the ratio of the explained sum of squares to the total sum of squares; the only change is that the ESS is now due to more than one explanatory variable. 280 R22 wt nD218 PARTTWO: THE LINEARREGRESSION MODEL Now it can be shown that? BSS= bo) wan tbs ) yaw (8.34) and, as shown before, RSS = ) ye) Yet — Bs vit (6.35) Therefore, R2 can'be computed as : Ren BEM tbo ye zane Ee 636? In passing, note that te positive square root of R?,R, is known as the coefficient of multiple corrélation, the two-variable analogue ofr, Just as r measures the degree of linear association between Y and X, R can be interpreted as the de~ gree of linear association between Y and all the X variables jointly, Although r can be positive or negative, Ris always taken to be positive. In practice, however, Ris of little importance. 8.5 ANTIQUE CLOCK AUCTION PRICES REVISITED Let us take time out to illustrate all the preceding theory with the antique clock auction prices example we considered in Chapter 6 (See Table 6-14). Let Y = auction price, Xp = age of clock, and Xj = number of bidders. A priori, one would expect a positive relationship betweein Y and the two explanatory variables. The results of regressing Y on the two explanatory variables are as follows (The Eviews output of this regression is given in Appendix 8A.4.) : ¥; = -1336.049 + 12.7413 Xp; + 85.7640, se = (175.2725) (0.9123) - _ (8.8019) t= (7.6226) (13.9653) (9.7437) (6.37) p=(0.0000)* —(0.0000)*_(0.0000)* RP 08906; F = 118.0585 Mia Interpretation of the Regression Results ‘As expected, the auction price is positively related to both the age of the clock and the number of bidders. The interpretation of the slope coefficient of about 12.74 means that holding other variables constant, if the age of the clock goes up "See Appendix 8A2- i 5R an algo be coniputed as 1 ~ RY = 1— pa “Denotés ari extremely small value. ' oadGIAPTER EIGHT: MULTIPLE REGRESSION: ESTIMATION AND HYPOTHESIS TESTING 219 by a year, the average price of the clock goes about by about 12.74 marks, Likewise, holding other vatiables constant, if the number of bidders increases by one, the average price of the clock goes up by about 89.76 marks. The negative value. of the intercept has no viable economic meaning. The R* value of about 0.89 means that the two explanatory variables account for about 89 percent of the variation in the auction bid price, a fairly high value. The F value given in Bg, (637) will be explained shortly. . 8.6 HYPOTHESIS TESTING INA MULTIPLE REGRESSION: GENERAL COMMENTS - Although R? gives us an overall measure of goodness of fit ofthe estimated regression line, by itself R* does not tell us whether the estimated partial regression coefficients are statistically significant, that is, statistically different from zero. Some of them may be and some may not be. How do we find out? ‘Tobe specific, let us suppose we want to entertain the hypothesis that age of the antique clock has no effect on its price. In other words, we Want to test the null hypothesis: Hy:By =D, How do we go about it? From our discussion of hypothesis testing for the two-variable model given in Chapter 7, in order to answer this question we need to find out the sampling distribution of ty, the estimator of By, What is the sampling distribution of by? And the sampling distribution of by and bs? Th the two-variable case we saw that the OLS estimators, by and by, ae ior mally distributed if we are willing to assume that the error term u follows the ‘notmal distribution. Now irtassumption A8.7 we have stated that even for mvul- tiple regression we will continue to assume that u is normally distributed with zero mean and constant variance o”, Given this and the other assumptions listed in Section 8.2, we can be prove that by by, and b; each follow the normal distribution with means equal to Bi, By, and Bs, respectively, and the variances f given by Eqs. (6.23), (8.25), and (8.27), tespectively. a However, as in the two-variable case, if we replace the true butt unobservable, } o? by its unbiased estimator 6? given in Eq, (8.29), the OLS estimators follow: | 4 the t distribution with (n — 3) df, not the normal distribution, That is, (8.38) (8.39) (8.40) Notice that the d.f, are now (1r— 3) because in computing thé RSS, > ¢?, and hence 6”, we first need to estimate the intercept and the two partial slope coefficients; so we Jose 3 d.€. 282220 PART TWO: THE LINEAR REGRESSION MODEL ‘We know that by replacing o? with 6? the OLS estimators follow the t distribution. Now we can use this information, to establish confidence intervals as. well as to test statistical hypothesis about the true partial regression coefficients. The actual mechanics in many ways resemble the two-variable case, which we P now illustrate with an example. 8.7 TESTING HYPOTHESES ABOUT INDIVIDUAL. PARTIAL REGRESSION COEFFICIENTS Suppose in our illustrative example we hypothesize that HyBy=0 and HBr #0 That is, under the null hypothesis, the age of the antique clock has no effect whatsoever on its bid price, whereas under the altemative hypothesis, itis con- tended that age has some effect, positive or negative, on price. The alternative ypothesis is thus two-sided. Given the preceding null hypothesis, we know that ata b ~se(l2) = by 1 By = = xp (Note: By = 0) i841) follows the ¢ distribution with (n — 3) = 29-d.f,, since n = 32 in our example. From the regression results given in (8.37), we obtain’ t= © 13.9653 (8.42) a Which has the f distribution with 29d. . On the basis of the computed ¢ value, do we reject the null hypothesis that the age of the antique clock has no effect on its bid price? To answer this question, we can either use the les of significance approach or the confidence interval ‘approach, as we did for the two-variable regression. 4h 33 ‘The Test of Significance Approach Recall that in the test of significance approach to hypothesis testing we develop a test statistic, find out its sampling distribution, choose a level-of significance a, and determine the critical value(s) of the test statistic at the chosen level of significance. Then we compare the value of the test statistic obtained from the sample at hand with the critical veluels) and reject the null hypothesis ifthe computed value of the test statistic exceeds the critical value(s).!° Alternatively, YH the test statistic has a negative vale, ve consider its absolute valve and say that ifthe albsolite value ofthe test statistic exceeds the critical vatue, we reject the null hypothesis 2R2 no nD BeCHAPTER EIGHT: MULTIPLE REGRESSION: ESTIMATION ANO HYPOTHESIS TESTING 221 we can find the p value of the test statistic and reject the null hypothesis ifthe p ‘value is smaller than the chosen a value. The approach that we followed for the two-variable case also carties over to the multiple regression. Returning to our illustrative example, we know that the test statistic in question is the £ statistic, which follows the ¢ distribution with (7 — 3) d.f. Therefore, we use the £ test of significance. The actual mechariics are now straightforward. Suppose we choose a = 0105 or 5%. Since the alternative hypothesis is two- sided, we have to find the critical't value at a/2 = 2.5% (Why?) for (n ~3) d.t, which in the present example is 29. Then from the ? table we observe that for was, (-2.045 < t < 2.045) = 0.95 (8.43) ‘Thatis, the probability that a f value lies between the limits —2.045 and +2.045: (e,, the critical values) is 95 percent. From Eq; (842), we see that the computed ¢ value under Ho:B) = Q is approximately 14, which obviously exceeds the critical f value of 2,045. We therefore reject the null hypothesis and conclude that age of an antique clock definitely has an influence on its bid price: This conclusion is also reinforced by the p value given in Eq, (8.37), which is practically zero, That is, ifthe null hypothesis that By = 0 were true, our chances of obtaining a f value of about 14 o greater is practically nil. Therefore, we can reject the null hypothesis more re- soundingly on the basis of the p value than the conventionally chosen. value of 1% of 5%. One-Tail or Two-Tail f Test? Since, a prioti, we expect the coefficient of the age variable to be positive, we should in fact nse the one-tai test here. The 5% critical t valite forthe one-tail test for 29 d.& now becomes 1,699, Since the computed t value of about 14 is still so much greater than 1.699, we reject the null, hypothesis and now conclude that the age of the antique clock positively im- pacts its bid price; the two-tail test, on the other hand, simply told us that age of the antique clock could have a positive or negative impact on its bid price. Therefore, be careful about how you formulate your null and altemative hypotheses. Let theory be the guide in choosing these hypotheses. The Confidence interval Approach to Hypothesis Testing ‘The basics of the confidence interval approach to hypothesis testing have already. been discussed in Chapter 7. Here we mercly illustrate it with our numerical example. We showed previously that P(-2.045 < # < 2.088) = 0.95 ‘We also know that 639) SRE Ta ds. a]Li ne ne 222 PART TWO: THE LINEAR REGRESSION MODEL If we substitute this value into Equation (8.43), we obtain b-B se(ba) 95 P (2 045 < < 206) Which, after rearranging becomes Plby ~ 2.045 se(b2) < By < bz + 2.045 se(b2)] = 0.95 (8.44j which is a 95% confidence interval for By [cf. Eq. (7.26)]. Recall that under the confidence interval approach, ifthe confidence interval, which we call the ac- ceptence region, includes the null-hypothesized value, we do not reject the null hypothesis. On the other hand, if the null-hypothesized value lies outside the confidence interval, that is in the region of rejection, we can reject the null hypothesis, But always bear in mind that in making either decision we ate taking a chance of being wrong a% (éay, 5%) ofthe time. For our illustrative example, Eq, (8:44) becomes ; 12.7413 = 2.045(0.9123) < By < 12.7413 + 2.045(0.9123) thatis, 10.8757 < By < 14.6069 (8.45) which is a 95% confidence interval for true Bz. Since this interval does not:include the null-hypothesized value, we can reject the null hypothesis: If we construct confidence intervals like expression (8.45), then 95 out of 100 such intervals will include the true By, but, as noted in Chapter 7, we cannot say that the probability i 95% that the particular interoal (8.45) does or does not include the true Bp. Needless to say, we can use the two approaches to hypothesis testing to test hypotheses about any other coefficient given in te regression results for our illustrative example. As you can see from the regression results, the variable, number of bidders, is also statistically significant (ie., significantly. different from zero) because the estimated f value of about 8 has a p value of almost zero. Remember that the lower the p value, the greater the evidence against the null hypothesis, : 8.8 TESTING THE JOINT HYPOTHESIS THAT B, = By DOR R? = 0 For out illustrative example we saiv that individually the partial slope coetfi- cients by and bs are statistically significant; that is, indfvidually each partial slope coefficient is significantly different from zero. But now consider the following null hypothesis: Hp: By = By =0 (8.46),‘CHAPTER EIGHT: MULTIPLE REGRESSION: ESTIMATION AND HYPOTHESIS TESTING 223 This null hypothesis isa joint hypothesis that By and Bg ase jointly or simultane ously (and not individually or singly) equal to zero. This hypothesis states that the two explanatory variables together have no influence on Y, This is the same as saying that Hy =0 (ean That is, the two explanatory variables explain zero percent of the variation in the dependent variable (recall the definition of R’). Therefore, the two sets of hypotheses (8.46) and (8.47) are’ equivalent; one implies the other: A fest. of either hypothesis is called a test of the overall significance of fhe estimated. multiple regression; that is, whether Y is linearly related to both Xaand Xs. How do we test, say, the hypothesis given in Equation (8.46)? The temptation here is to state that since individually b; and b3 are statistically different from zero in the present example, then jointly or collectively they also must be statistically different from zero; that is, we reject Ho given. in Eq, (8.46). In other words, since age of the antique clock and the number of bidders at the auction each has a significant effect on the auction price, together they also must have a significant effect on the auiction price. But we should be careful here for, as we show more fully in Chapter 12 on imulticollinearity, in practice, in a multiple regression one or more variables individually have no effect on the dependent variable but collectively they have a significant impact on it. This means that the Hesting procedure discussed previously, although valid for testing’ the statistical significance of an individual regression cocficient, is not valid for testing the joint hypothesis. ‘How then do we test a hypothesis like hypothesis (8.46)? This can be done by using a technique known as analysis of variance (ANOVA). To see how this technique is employed, recall the following identity: TSS = ESS + RSS (8.32) Thatis, Do = be ows tbs Doyen t Det ag} Bquation (8.48) decomposes the TSS into two components, one explained by the (chosen) regression model (BSS) and the other not explained by the model (RSS). A study of these components of TSS is known as the analysis of uariance (ANOVA) front the regression viewpoint, Asnoted in Chapter 4 every Sum of squares has associated with itits degrees of freedom (df); that is, the number of independent observations on the basis This is Equation (235) wetton illest 286 lane PELE ae.KB om 224 PART TWO: THE LINEAR REGRESSION MODEL ‘TABLE G4 ‘ANOVA TABLE FOR THE THREE-VARIABLE REGRESSION, ‘Source of vafation ‘Sum of squares (88) oe ‘Due to regression (ESS) be Dyer ba E yee 2 Due o residual (RSS) re 1-3 Total (TSS) = a1 Note: MSS = maa or average, sum of square, of which the sum of squatesis compote, ‘Now each of the preceding sums sof squares as these df: ‘Sum of squares Tss: ‘A 1 (always, Why?) Rss ‘n— 8 (three-varlable model) ESS 2 (throe-varable model)* “An easy way to find the dt. for ESS sta strat the df for RSS fom the dor TSS. Let us arrange all these suns of squares and their associated df in atabular form, known 2s the ANOVA table, as shown in Table 8-1. Now given the assumptions of the CLRM (and assumption A8.7) and the null hypothesis: Ho:Bp = Bs = 0, it can be shown that the variable _ Yatiance| explained by X and X% ‘unexplained variance = (ba Xyety +b Yo wrs)/2 aD 70) follows the F distribution with 2 arid (n ~ 3) df. in the numeratot and denominator, respectively. (See Chapter 4 for a general discussion of the F distribution and Chapter 5 for some applications)... general, if the regression rnodel las k explanatory variables inchuting the intercept term, the F ratio has (k — 1) df in the numerator and (nt ~ 1) df. the denomnriator32 How can we use the F ratio of Equation (849) to test the joint hypothesis that both X) and Xs have no impact on Y? The answer is evident in Bq, (8.49). 1 the nuinerator of Eq (8.49) is larger than its, denominator—if the variance of Y (8.49) 14 simple way to remember this is that the numerator @1-of the E ratio ig equal to the number of prlal slope coeficint in the mode, and the denominator dis equal to n mins the total num ‘bof parameters estimated (Le, partial slopes plus te iter) 287(CHAPTER EIGHT: MULTIPLE REGRESSION: ESTIMATION AND HYPOTHESIS TESTING 225 TABLE 8-2 ANOVA ABLE FOR THE CLOCK AUCTION PRICE EXAMPLE. ‘Source of variation, ‘Sum of squares (SS) df, Mss = ‘ue to regression (ESS) 42780953 2. d27e296.92 Due to restdual (RSS) 5054622 29. . 625460.2/08 ‘otal (TSS) S0BTETS t F = 2199147.6/18119,386 = 118.0585" “*Figures have been rounded. explained by the regression (i, by Xp aid Xs) is larger than the variance not explained by the regression—the F value will be greater than 1, Therefore, as the variance explained by the X variables becomes increasingly langer relative to the unexplained variance, the F ratio will be increasingly larger, too. Thus, arr increasingly large F value will be evidence against the null hypothesis that the two (or aiore) explanatory variables have no effect on Y, Of course, this intuitive reasoning can be formalized in the usual framework of hypothesis testing. As shown in Chapter 4, Section 4.4, we compute F as given in Eq, (8.49) and compare it with the critical F value for 2 and (n= 3) df at the chosen level of a, the probability of committing a type 1 error As usual, if the computed F value exceeds the critical F value, we reject the null hypothesis that the impact ofall explanatory variables is siraultaneously equal to zero. If it does not exceed the critical F value, we do not reject the mull hypothesis that the explanatory variables ‘have no itupact whatsoever on the dependent ariable. ‘Th illustrate the actual mechanics, let us return to our illustrative example, ‘The numerical counterpart of Table 8-1 is given in Table 8-2. ‘The entries in this table are obtained from the Eviews computer output given in Appendix 84.4. From this table and the computer dutput, we see that the estimated F value is 118.0585, or about 119, Under the null hypothesis that By = Bs.= 0, and given the‘assumptions of the classical linear regression model {CLRM), we know that the computed F’value follows the F distribution with 2 and 29 df. in the numerator and denominator, respectively Ifthe null hypothesis were true, what isthe probability of our obtaining an F value of as much as 118 or greater for 2 and 13 d.f.? The p Value of obtaining an F value of 118 or greater is 0.000000, which is practically zero, Hence, we can reject the null hypothesis that age and numberof bidders fgetier has no effet onthe bid price of antique docks."* In our illustrative example it so happens that not only do we reject the null hypothesis that By and B3 are individually statistically insignificant, but we also "8Unlike other software packages, Bviews doesnot produce the ANOVA table, although it gives the F value, Butt is very easy to construct this tbl, for Bsiows gives TSS and RSS from which ESS. canbe easily obfained, lgyau had chasen «= 1% the critical F value for2 and 30 (which is close (629) A. 539. The F yalue of 118s obviously much greater than this eral value. 288 po no em tnantiae sn2S PREG AD, 226 PARTTWO: THE LINEAR REGRESSION MODEL reject the hypothesis that collectively they are insignificant. However, this need not happen all the time. We will come across cases where not all explanatory variables individually have much impact on the dependent variable (i.e, some of thet values may be statistically insignificant) yet all of thes collectively indlu- ence the dependent variable (e,, the F test will reject the null hypothesis that all partial slope coefficlents are simultaneously equal to zero.) As we will se, this happens if we have the problem of multicollinearity, which we will discuss mote in Chapter 12. An Important Relationship between Fand R? nm so TABLE G9 ie) ‘There is an important relationship betiween the coefficient of determination R? * ‘and the F ratio used in ANOVA. This relationship is as follows: 2 Rke=1) Pac (6.50) ‘where a = the number of observations and k = the number of explanatory variables including the intercept. Equation (8.50) shows how F and R? are related. These two statistics vary directly. When R? = 0 (ie,, no relationship between Y and the X variables), F is 270 ipso facto, The larger R*is, the greater the F value willbe, In the limit when R's 1, the F values infinite. ‘Thus the F test discussed earlier, which is a measure of the overall significance of the estititated regression line, is also a test of significance of R?; that is, whether R? is different from zero. In other words, testing the null hypothesis Eq. (8.46) is equivalent to testing the null hypothesis that (the population) R? is zero, as noted in Eq, (8:47). One advantage of the F test expressed in terms of R? is the ease of computation, All we need to know is the R? value, which is routinely coniputed by most regiession programs, Therefore, the overall F test of significance given in Eq, (8.49) can be recast in terms of R? as shown ift Eq, (8.50), and the ANOVA Table 6-1 can be equivalently expressed as Table 8-3. _ ANOVA TABLE IN TERMS OF A®. ‘Source of variation: ‘Sum of squaires (SS) ole MSS = # Dye to regression (ESS) ROSH 2 Fon Duo to residuals (RSS) (1 FAQL YF) Total (TSS) " Ey Hoe a computing ths F eli, wo. doit ned malty F and (1 ~ Fy by snow i rope out, as canbe seen rom Ea. (248). {nthe kvarable modal ha a lb (kf and (n=, especialy DRoCHAPTER EIGHT: ‘MULTIPLE REGRESSION: ESTIMATION AND HYPOTHESS TESTING 227 For ous illustrative example, R* = 0.8906, Therefore, the F ratio of Equation (8.50) becomes = 8806/2 F0:3506)/25 Which is about the same F as shown in Table 82, except for rounding errors. tis left for you to set up the ANOVA table for our illustrative example: in the manner of Table 8-3. F 118.12 (85D 8.9 TWO-VARIABLE REGRESSION IN THE CONTEXT OF MULTIPLE REGRESSION: INTRODUCTION TO SPECIFICATION BIAS Lets retum to out example. in Example 65, we regressed auction price on the age of the antique clock and fie number of bidders separately, as shown in Equations (6.27) and (6.28). These equations are reproduced here with the tstuat regression output, Y= 191.6662 + 10.4856 Age, se = (264.4393) + (1.7937) 652 = (0.7248) (6.8457) 7? = 05325; F = 34.1723 ‘Y= 807.9501 + 54.5724 Bidders se = (291.9501) (23.5724) 8.53) | $3962), (2.3855)? 0.1549; F = 5.5017 If we compare these regressions with the results of the multiple regression given in (8.37), we see several differences: 1. The slope values in Bqiations (8.52) and (8.53) are different from those given in the miultiple regression (8.37), espetially that of the number of Bidders variable. ° : 2, The intercept values inthe three regressions are also differents 3. The R? value in the multiple regression is quite different than the’? values giver in the two bivariate regressions. ~ As we will show; some of these differences are statistically significant and some others may not be; i . Why the differences in the results of the two regressions? Remember that in Eq, (6.37), while deriving the impact of age of the antique clock on the auction price, we held the nimber of bidders constant, whetsas in Bq, (8.52) we simply neglected the ritinber of bidders, Put differently, in Bq. (8,37) the effect of a clock’s age On auctivn price is net of the effect, ot influence, of the number of bidders, whereas in Bg. (8.52) the effect af the nuamber of bidders has fot been. netted out. Thus, the coefficient of the age variable in Bq, (8.52) reflects the gross effecl—the direct effect of age as well as the indirect effect of the ruunaber of bidders. 290 wm tes oN ,AB > as 228 PARTTWO: THE LINEAR REGRESSION MODEL ‘This difference between the result of regressions (8.37) and (8.52) shows very nicely the meaning of the “partial” regression coefficient. ‘We saw in our discussion of regression (8.37) that both the age of the clock and the number of bidders variables were individually as well as collectively important influences‘on the auction price. Therefore, by omitting thé number of bidders variable from regression (8.52) we have committed what is known asa (model) specification bias or specification error, more specifically, the specification etror of omitting a relevant variable from the model, Similarly, by omitting the age of the clock from segtession (8.53), we also have committed a specification error, Although we will exainine the topi¢ of specification errors in Chapter 11, ‘what is important to note here is that you should be very careful in‘developing a regression model for empirical purposes. Take whatever help you can from the underlying theory, and/or prior empirical work in developing the model, And once you choose a model, do not drop variables from the model arbitrarily. 8.10 COMPARING TWO At VALUES: THE ADJUSTED R® By examining the R? values of our two-variable (Eq, (8.52) or Eq. (853)] and three-variable (Eq, (8.37)] regressions for our illustrative example, you will notice that the R? value of the former (0.5325 for Eq, (8.52) or 0.1549 for Bq. (8.53)] is smaller than that of the latter (0.8906). Is this always the case? Yes! An important property of R’ is that the larger the number of explanatory variables in a model, the higher the R? willbe. It would then seem that if we want to explain a substantial amount of the variation in a dependent variable, we merely have to goon adding more explanatory variables! However, do not take this “advice” too seriously because the definition of R? = BS9/TSS does not take into account the d.f. Note that in a k-variable model including the intercept tern the d.f. for ESS is (k ~ 1). Thus, if you havea model with 5 explanatory vatiables including the intercept, the d.f. associated with ESS will be 4, whereas if you had a model with 10 explanatory variables including the intercept, the df. for the ESS will be 9. But the conventional R? formula does not take into accorant the differing df. in the various models. Note that the df, for TSS is always (n- 1). (Why?) Therefore, comparing the R? unhues of two inodels with the some dependent onriable but with differing numbers of explanatory variables is essentially like comparing apples and oranges, ‘Thus, what we need is a measure of goodriess of fit that is adjusted for (Le., takes into account explicitly) the number of explanatory variables in the model. Stich & meastare has been devised and is known as the adjusted R, denoted by the symbol, R®, This R? can be derived from the conventional R? (see Appendix 8A.3) a8 follows: Me (8.54)CHAPTER EIGHT: MULTIPLE REGRESSION: ESTIMATION AND HYPOTHESIS TESTING 229 Note that the R* we have considered previously is also knowin as the unadjusted R? for obvious reasoris. The features of the adjusted R? are: 1. If k>1, R°s R; that is, as the number of explanatory variables increases in a model, the adjusted R* becomes increasingly smaller than the unadjusted R2, There seems to be a “penalty” involved in adding ‘more explanatory variables to a regression model, 2 Although the unadjusted R? is always positive, the adjusted R? can on ‘occasion turn out to be negative. For example, in a regression model involving k= 3 and 1'= 30, if an Ris found to be 0.06, R* can be negative (8.0096). At present, most computer regression packagés compute both the adjusted and unadjitsted R? values. This is a good practice, for the adjusted R? will enable us to compate two regressions that have the same dependent variable but a different riumber of explanatory variables."° Even when we are not comparing two regression models, it js a good practice to find the adjusted R? value because it explicitly takes lito account the number of variables included in a model. For ourillustrative example, your should verify that the adjusted R* value is 0.8830, which, as expected, is émaller than the unadjusted R? value of 0.8906, ‘The adjusted R? values for regressions (8.52) and (8.53) are 0.5169 and 0.1268, respectively, which are slighily lower than the corresponding unadjusted R? values. 8.11 WHEN TO ADD AN ADDITIONAL EXPLANATORY VARIABLE TO A MODEL Jn practice, inorder to explain a particular phenomenon, we are often faced with the problem of deciding’ among several competing explanatory variables. ‘The common practice is to add variables as long as the adjusted R* increases (even though its numerical value may be smaller than the unadjusted 2) But when does adjusted R? increase? It cart be shown that R? will increase if the |t| (absolute t) value-of the. coéfficient of the added variable is larger than 1, where the t alive is computed under the null hypothesis thiat the population value of the said coefficient is zero." : ‘To see this ail clearly let us fist repress auction price on a constant only, then ona constant and the age of the clock, and then on a constant, age of the clock ‘and the nvimber of bidders, The tesults are given.in Table 8.4 As we will in Chair 9 if to regressions have diferent dependent variables, we cannot conga thle ales dct aust or uae, thet oro a particular t Value is significant, the adjusted R? will increase o long asthe ‘ofthe coeliient ofthe added variable i greter tan 1, 292MB 230 PARTTWO: THE LINEAR REGRESSION MODEL 203 ‘TABLE 8-4. ACOMPARISON OF FOUR MODELS OF ANTIQUE CLOCK AUCTION PRICES. Dependent variable . Intercept Age, Fof Bidders. «AY =P F Auction proce 1928084 — — 000 000 9 (1) (19.0850) ‘Auction price; ~181,6062 104856 = +8805 08169 94.1723 (2) (0.7248) (68457) ‘Aucton price 807.9501 — «548724 “0.1549 0.1268 55017 (3) (3.4962) 6) ‘Auction ptice 1995.04. . 12.7413 85.7640 0.8908 0,8890- 118.0585 (4) (7.6226) (13.9659). (9.7497) ‘Noté:Figues inthe pareniheses are th estimated f values under the nul hypothesis hat ia corresponding ‘optlaton values are orp: Some interesting facts stand out ir this exercise: 1, When we regress auction price on the intercept only, the R2, R?, and F val- uués are all zero, as we would expect. But what does the intercept value represent here? It is nothing but the (sample) mean value of auction ptice. One way to check on this is to look at Bq. (6.16). If there is no X variable in this equation, the intercept is equal to the mean value of the dependent variable. 2. When we regress auction price on a constant and age of the antique clock, we se thatthe t value of the age variable isnot only greater than. 1, but it is also statistically significant. Unsurprisingly, both R? and R? values increase (although the latter is somewhat smaller than’ the former), But notice an interesting fact. If you square the ¢ value of 5.8457, we get (5.8457)? = 34.1722, which is about the same as the F value of 534.1723 shown in Table 8-4. Is this surprising? No, because in Equation * (4.15) we stated that =A (8.55) = (4.15) ‘That is, the square of the £ statistic with k df. is equal to the F statistic with 1 df. in the numerator and k-d.£ in the denominator. In our example, k = 30 [32 observations ~2, the two coefficients estimated in Model (], The numerator df. is 1, because we have only one explanatory vari- vable in this model. .. When we regress auction price on a constant and the nuttiber of bidders, ‘we see that the # value of the latter is 2.3455. If you squate this value, you will get (2.3455 = 5.5013, which is about the same as shown in the preceding table, which again verifies (6.55), Since the ¢ value is greater than 1, both R’ asid R? values have increased. The computed ¢ value is also statistically significant, suggesting that the number of bidders variable should be added to model (1). A similar conclusion holds for model (2). 903CHAPTER EIGHT: MULTIPLE REGRESSION: ESTIMATION AND HYPOTHESIS TESTING 231 ‘4, How do we decide if it is worth adding both age and number of bidders together to model (1)? We have already answered this question with the help of the ANOVA technique and the attendant F test. In ‘Table 8.2 we showed: that one could reject the hypothesis that By =. , that is, the two explanatory variables together have no impact on the attction bid price,” 8.12, RESTRICTED LEAST SQUARES. Letus take another look at the regressions given in Table 6-4. There we saw thie consequences of omitting relevant variables from a regression model. Thus, in regression (1) shown n this table we regressed antique clock auction price on the intetcept only, which gave an R? value of 0, which is not suxprising, Then in » regression (4) we regressed atiction price on the age of the antique clock as well as on the riumber of bidders present at the auction, which gave an R? value of 0.8906. On the basis of thie F test we concluded that there was a specification error and that Dotti the explanatory variables should be added to the model. Let us call regression (1) the restricted model because it implicitly assumes that the coefficients of the age of the clock and the number of bidders are zero; that is, these variables do not belong in the model (ie., By = Bs = 0), Let us call segression (4) the unrestricted model because it includes all the relevant variables. Since (1) is a restricted model, when we estimate it by OLS, we call it restricted feast squares (RLS). Since (4) is an unrestricted model, when we estimate it by OLS, we call it unrestricted Jeast squares (URLS). All the models we have estimated thus far were essentially URLS, for we assumed that the model being estimated is cortectly specified and that we have included all the relevant variables in the model, In Chapter 11 we will see the consequences of violating this assumption. ‘The question now is. How do we decide between RLS and URLS? That is, iow do we find outif the restrictions imposed by a model, such as (1) in the pre~ sent instanice, are Valid? This question can be atiswesed by the F test, For this piupose, let R? denote the R? value obtained from the restricted model and Ri, the R? value obtained from the untestricted model. Now assuming that the ertof term 4; is normally distributed, it can be shown that : FFs (8.56) follows the F distribution with m and (1 - k) d.f in the numerator and denom- inatot, respectively, whete K? = R* obtained from the restricted regression, Suppose you have » model with four explanatory variables. italy yout only include tivo of these variable but then you want to ind ou! if itis worth adding twvo more explanatory variables. This can be handled by an extension of the F test. For details, see Damodar N, Gujarat, Bas Econometrics, Ath ed, McGraw-Hill, New York, 2003, pp. 260-264. 294 - weet ne oN282 PART TWO: “THELINEAR REGRESSION MODEL Ri, = R° obtained from the unrestricted regression, m = number of restrictions imposed by the restricted regression (two in our example), 1 = number of observations in the sample, and k = number of parameters estimated in the un- - restricted regression (including the intercept). The null hypothesis tested here is ‘that the restrictions imposed by the restricted model-are valid. If the F value estimated from Equation (8.56) exceeds the critical F value at the chosen level of significance, we reject the restricted regression. That is, in this situation, the restrictions imposed by the (restricted) model are riot valid. Returning to our antique clock atiction price example, putting the appropriate values in Bq, (8.56) from Table 8-4, we obtain: (0:890"0)/2 0.445 Te teoye2= 3) = ToL = 17.414 (8.57) ‘The probability of such ‘an F value is extremely small. Therefore, we reject the restricted regression, More positively, age of theantique clockaswellasthenumber ofbiddersat auction both have astatistically significant impact on the auctionprice. ‘The formula (8.56) is of general application. The only precaution to be taken in its application is that in comparing the restricted and unrestricted regres~ sions, the dependent variable must be in the same form. If they are not, we have to make them comparable using the method discussed in Chapter 9 (see Problem 9.16) or use an alternative that is discussed in Exercise 8.20. F », 819 ILLUSTRATIVE EXAMPLES a HBR To conclude this chapter, we corisider several examples involving multiple regressions. Our objective here isto show you how multiple regression models ate used in a varlety of applications: Example 8.1. Does tax policy affect corporate capital stractare? To find out the extent to which tax policy has been responsible for the recent. trend in U.S, manufacturing toward increasing ise of debt capital in lieu of equity capital—that is, tovard an increasing debt/equity ratio (called leverage in the financial literature}—Pozdena estimated the following regression model:!® = By + ByXor + ByXy + ByXei + BSB Xsy + BoXoe + ty (8.58) the personal tax rate e capital gains tax rate jondebt-related tax shields ‘Randall Johaston Pozdena, "Tax Policy and Corporate Capital Structure,” Economic Review, Federal Reserve Bank of San Francisco, Pall 1987, pp:.37-51. 295CHAPTER EIGHT: MULTIPLE REGRESSION: ESTIMATION AND HYPOTHESIS TESTING 233 TABLE@-5 LEVERAGE IN MANUFACTURING CORPORATIONS, 1935-1962. Coaticent Explanatory vafabio (tvaiue in parentheses) Corporate tax rato 24 4 (10.5) Pereonal tax rate ~12 (4a) Capital gains tax rata 0s Ga Non dob shield 24 (-48) (elton rate 14 80), 'n=48 (aumbsr of observations) = 087 ‘Sorte: andal Johnston Pozen, “Tex Poly and Corporal Capital ‘Stucu,* Gaon Rew, FedaalRoserve Bank of Sin Fandlco, Fall 1967, Taba 1,0. 45 adapted, ‘ots: 1. The author does ot present he extiatd intercept 2. Tho adjusted Fis caleulted using Ea, 8.54). 8. The standard errs ofthe various oofidons canbe obanad ty tng he coaticint vals bys Cea (0g, 24/105 = (01.266 sho se of the corporate lx rate coeticen, Economic theory suggests that coefficients Bz, By, and By will be positive and coefficlents Bs and B will be negative? Based on the data for US. manufactur ing conporations for the years 1935 to 1982, Pozdena obtained the OLS results, that are presented in tabular form (Table8-5) rather than in the usual format |e.g., Eq, (8.7)].(Resuls are sometimes presented in this form for ease of reading.) Discussion of Regression Results ‘The first fact to noté about the preceding regression results is that all the coefficients have signs according to prior expectations: For instance, the corporate tax rate has a positive effect on leverage. Holding other things the same, as the cor- orate tax rate goes up by one percentage point, on the average, the leverage ratio (ie, the debt/equity ratio) goes up by 2.4 percentage points. Likewise, if the inflation rate goes up by one percentage point, on the average, leverage goes up by 14 percentage points, other things remaining the same. (Question: Why would you expect a positive relation between leverage and inflation?) Other partial regression coefficients should be interpreted similarly. Since the? values are presented underneath each partial regression coefficient ‘under the null hypothesis that each poptlation partial regression coefficient is Gee Fozdena’s tice {foctnote 1a) forthe heoretica discussion of expected signs of he varius coefficients. Inthe United States the interest paid on debt capital stax deductible, whereas the i+ ‘comé paidas dividends isnot. Tiss one reason that corporations may prefer debt to equity capital, 296 D2MEE 234 PARTTWO: “THELINEAR REGRESSION MODEL ne <2 individually equal to zero, we can easily test whether such a null hypothesis stands up against the (two-sided) alternative hypothesis that each true population coefficient is different from zero: Hence, we use the two-tail test. The d.f. in this example are 42, which are obtained by subtracting from 1-(= 48) the number of parameters estimated, which are 6 in the present instance (Note: The intercept value is not presented in Table 8-5, although it was estimated.) If we choose « = 0.05 or 5%, the two-tail critical ¢ value is about 2.021 for 40 df.» (Note: This is good enough for present purposes since the t table does not give the precise t value for 42 d.£,) If cis fixed 21 0.1 or a 1% level, the critical f value for 40 dif, is 2.704 (two-tail). Looking at the ¢ values presented in Table 8-5, we see that each partial regression coefficient, except that of the capital gains tax variable, is statistically significantly different from zero at the 1% level of significance. The coefficient of the capital gains tax variable is not significant at either the 1% or 5% level. Therefore, except for this variable, we can reject the individual null hypothesis that each partial regression coefficient is zero. In other words, all but one of the explanatory variables individually has an impact on the debt/equity ratio. In passing, note that ifan estimated coeficient is statistically significant at the 1% level, itis also significant at the 5% level, but the converse is not true, 5 What about the overall significance of the estimated regression line? That is, do we reject the null hypothesis that all partial slopes are simultaneously equal to zero or, equivalently, is R? = 0? This hypothesis can be easily tested by using, Eq, (8.50), which in the present case gives 356.22. (8.59) This F value has an F distribution with 5 and 42 df. If« is set at 0.05, the F table (Appendix A, Table A-3) shows that for 5 and'40 d.f. (the table has no, exact value of 42.4.4. in the denominator), the critical F value is 2.45, The corresponding value at a = 0.01 is 3.51. The computed F of ~ 56 far exceeds either of these critical F values. Therefore, we reject the null hypothesis that all partial slopes are simultaneously equal to zero or, alternatively, R? = 0. Collectively, all five explanatory variables influence the dependent variable. Individually, however, as we have seen, only four variables have an impact ont the dependent variable, the debt/equity ratio. Example 8-1 again underscores the point made earlier that the (individual) t test and the joint) F fest ate quite different”? rq the two-Variable linear regression model, as noted before, if = Fx; that is, the square of a f value with Fd, is equal fo an F value with 1d in the natnerator atd kf. in the denominator. 907CHAPTEREIGHT: MULTIPLE REGRESSION: ESTIMATION AND HYPOTHESIS TESTING 235 Example 8.2. The demand for imports in Jamaica ‘Toexplain the demand for imports in Jamaica, J. Gafar” obtained the following regression based on answal data for 19 years: Y= -58.9-+ 0.20%y — 0.10% sex (0.0092) (0.084) R*=0.96 (8.60) t= QL (-1.1908) FR? = 0.955 where Y = quantity of imports X= personal consumption expenditure Xs import price /domestic price Economic tieory would suggest positive relationship between Y and X2and a negative relationship between Y and Xy which tums out to be the case, Individually the coefficient of Xs statistically significant but that of Xsis not af; say, the 5% level. But since the absolute ¢ value of Xs is greater than 1, R® for this example will drop ifX5 were dropped from the mnodel. (Why?) Together, X; and X explain about 96 percent of the variation in the quantity of imports into Jarhaica, Example 8,3. The demand for alcoholic beverages in the United Kingdom ‘To explain the demand for aleoholic beverages in the United Kingdom, T. McGuinness” estimated the following regression based on annual data for years: Y; = 0.014 ~ 0.354Xoy + 0.0018 Xqp + 0.657 Xqy + 0.0059 Xse 0.012) (0.2688) (0.0005) (0.266) (0.0034) t=(-1.16) (132) 8.39) @47) (173). R= 0.689 (8.61) where Y= the annual change in pure alcohol consumption per adult X= the antal change in the real price index of alcoholic drinks X= the annual change in the teal disposable income per person X= theantual change in the number of licensed premises i the adult population X= the annual’ change in real advertising expenditure on alcoholic drinks per adult ‘Theory would suggest that all but the variable Xp will be positively: re~ lated to Y, This is borne out by the results, although each coefficient is not 19, Gafar, “Devaluation and the Balatce of Baystents Adjustment in Developing Beonomy: An Analysis Relating fo larzica,” Applied Economics, vol. 13, 1981, pp. 151-168. Notations ‘were ‘adapted, Adjusted R? computed , McGuinness, “AA Eeotomelric Analysis of Total Deniand for Alcoholie Beverages in the United Kingdom,” urna of industrial Ecos, vl. 29, 1980, pp. 85-109. Notations were adapted. 298 Sk ae. fensie FRE. ‘236 PART TWO: THE LINEAR REGRESSION MODEL 8.14 SUMMARY 345 & statistically significant. For 15 df. (Why?), the 5% critical t value is 1.753 (one-tail) and 2:131 (two-tail), Consider the coefficient of Xs, the change in advertising expenditure. Since the advertising expenditure and the demand for alcoholic beverages are expected to be positive (otherwise, it is bad news for the advertising industry), we can entertain the hypothesis that Hy:Bs = 0 vs, Hi:Bs > 0, and therefore can use the one-tailf test. The computed f value of 173is very close to being significant at the 5% level, Itis left as an exercise for you to compute the F value for this example to tte hypotil that all partial slope coefficients are simultaneously equal to zeto. Example 8.4. Civilian labor force participation rate, anenployment at and average hourly earnitige revisited Tn Chapter 1'we presented regression (1.5) without discussing the statistical ignificance of the resillts. Now we have the necessaty tools to do that. The ‘complete regréssion tesults are as follows: CEPR, = 80.9018 ~ 0.6713 CUNR; ~ 1.4042 AHES2 = (47561) (0.0827) (0.6086) (17.0096)(-8.1159) _ (2.3072) (8.62) value = (0.0000)* (0.0000)* (0.0319) R= 07727; R?=0.7500; F =34073 As these sesults show, each of the estimated regression coefficients is individually statistically highly significant, because the p values are so small. ‘Thatis, each coefficient is significantly different from zero. Collectively, Both CUNR and AHES? are also highly statistically significant, because thep value of the computed F value (for ? and 20 d.f) of 40s extremely low. ‘As expected, the civilian unemployment rate has a negative effect on the civilian labor force participation rate, suggesting that perhaps the discouraged-worker effect domtinates the added-worker hypothesis. The theoretical reasoning behind this has already been explained in Chapter 1. ‘The negative value of AHES2 suggests that perhaps the income effect dominates the substitution effect. In this chapter we considered the simplest of the multiple regression models, namely, the three-variabte linear regression model—one dependent variableand two explanatory variables. Although in many ways a'straightforward extension of the two-variable linear regression model, the thrée-Variablemodel introduced. “Denotes excel smal value. 299CHAPTER EIGHT: MULTIPLE REGRESSION: ESTIMATION AND HYPOTHESISTESTING 237 several new concepts, such as partial regression coefficients, adjusted and unadjusted multiple coefficient of determination, and multicollinearity. Insofar as estimation of the parameters of the multiple regression coéfii- cients is concerned, we still worked within the framework of the classical linear regression model and used the method of ordinary least squares (OLS), The OLS estimators of multiple regression, like the two-variable model, possess several desirable statistical properties summed up in the Gauss-Markov property of best linear unbiasednéss (BLUE). - With the assumption that the disturbance ae the normal distribution with zero méan and constant variance 6”, we saw that, a5 in the two- ‘vasiable case, each estimated coefficient inthe multiple regression follows the normal distribution with a mean equal to.the true population value and the vatiances given by the formulas developed in the text. Unfortunately, in practice, o7 is not known and: has to be estimated. The OLS estimator of this: unknown variance is 62. But if we replace o by 62, then, a8 in the two-vatiable case, each estimated coefficient of the multiple regression follows the distribu. tion, not the normal distribution. ‘The knowledge that each multiple regression coefficient follows the t distribution with d.f, equal to (n'—K), where k is the number of parameters estimated (including the intercept), means we can use the t distribution to test statistical hypotheses about each multiple regression coefficient individually. This can be done on the basis of either the ¢ test of significance or the confidence interval: based on the f distribution. In this respect, the multiple regression model does not differ mach from the two-variable model, except that proper allowance must be made for the dé, which now depend on the number of parameters estimated. However, when testing the hypothesis that all partial slope coefficients are simultaneousty equal to zero, the individual testing referted to earlier is of no- help. Here we should use the analysis of variance (ANOVA) technigie and the attendant F test. Incidentally, testing that all partial slope coefficients are simvul- taneously equal to zero’ is the same as testing that the multiple coefficient of determination, R? is equal to zero. Therefore, the F test can also be used to test this later but equivalent hypothesis, We also discussed the question of when to add a variable or a group of variables to d model, using either the t test or the F test. In this context we also discussed the method of restticted least squares, All the concepts introduced in this chapter have been illustrated by iiumerical exainples atid by concrete economic applications. KEY TERMS AND CONCEPTS . The key terms and coricepts introduced in this chapter are Multiple regression modet Multcollincarity Partial ogtession coefficients; Collinearity; exact Linear relationship partial slope coefficients a) high or near perfect collinearity. 300 : ~298 -PARTTWO: THE LINEAR REGRESSION MODEL ‘Multiple coefficient of Model specification bas (specification determination, R? error) Coefficient of multiple correlation, R Adjusted R?(R2) Individual hypothesis testing Restricted least squares (RLS) Joint hypothesis testing or test of Unrestricted least squares (URLS) ~ overall significance of estimated Relationship between # and F tests. multiple reg a) analysis of variance (ANOVA) b) F test QUESTIONS. 8.1, Explain caffully the meaning of a. Partial regression coefficient “b. Coefficient of multiple determination, R? «¢, Perfect collinearity A, Perfect multicollinearity e. Individual hypothesis testing £, Joint hypothesis testing : Adjusted R® 8.2. Explain step by step the procedure involved in a. Testing the statistical significance of a single multiple regression coefficient. b, Testing the statistical significance of all partial slope coefficients. 8. State with re reasons whether he folowing statements are true (0) fase or uncestain (U). a. The adjusted and unadjusted Rsueidentcl orily when the unadjusted R? is equal to b. The way to 5s determiné whether a group of explanatory vatiables exerts sig- . nificant influence onthe dependent variable isto see if any of the explana- fory variables has a significant t statistic; if not, they axe statistically in- si tas a a 6 When =, Fe 0vand when R? F = infinite, 4, When the df. exceed 120, the 5% critical # value (two-tail) and the 5% criti- calZ (standard normal) value are identical, namely, 196. - e. In the model ¥; = By + By Xai + 83Xui + 14, if Xp and Xq are negatively correlated in the sample and B; > 0, omitting Xs from the model will bias bjz downward [ie, E(by2) < Ba] where by is the slope coefficient in the regression of ¥-on Xpalone. f, When we say that an estimated regression coefficient is statistically’ cant, we mean that itis statistically different from 1. g, To compute a critical t value, we need to know only the dif h. By the overall significance of a multiple regression we mean the statistical significance of any single variable included in the model, i. Insofar as estimation and hypothesis testing are concerned, there is no difference between simple regression and iultiple regression. j. The df of the tofal sum of squares (18S) are always (n ~ 1) voxels of the roumber of explanatory variables included in the model, dE FREE * Optional.‘CHAPTER EIGHT, MULTIPLE REGRESSION: ESTIMATION AND HYPOTHESIS TESTING 239 8.4. What is the value of 6? in each of the following cases? a. Yeef = 880, = 25, k= 4 including intercept) b, Yoef = 1220, n= 14, k=3 (excluding intercept) 85. Find the critical t value(s) in the following situations: Degrees of freedom Level of significance (at) ) hy 2 5 Two-sided 20 1 Right 30 5 Left 200 5 ‘Twotalt 8.6, Find the critical F values for the following combinations: Numerator. Denominator 61. Covel of significance 2) 5 8 4 4 9 1 a 200 6 PROBLEMS 87. Youare given the following data: yo & oh 1 1 2 a8 4 8 3. ~3 Based on these data, estimate the following regressions: a + Aa Kast uy . 1+ Ca Xi + ui orl h + Bp Xo + BXy +; sf ‘Note: Do not worry about estimating the standard errors. Sy 1, Is Ap=B Why of why not? x 2, 18 Cy=By? Why ot why not? ‘ep What genetal conclusion can you draw from this exercise? t 88. You are given the following data based on 15 observations: Y= 367,693; X= 402.760; Xy=8.0; )>y? = 66,042,269 as YF $ 84,855.096; x3 = 280.0; Y yr = 74,778.46 Livre = 4.2509; Yayry = 4,7960 where lowercase letters, as usual, denote deviations from sample mean values, a. Estimate the three multiple regression coefficients . Estimate their standard errors. c, Obtain R* and d. Estintate 95% confidence intervals for By and Bs. e. Test the statistical significance of each estimated regression coeflicient using \ a 35% (tworlail). 302 239a PB 240° PARTTWO: THE LINEAR REGRESSION MODEL f. Test atc = ee ANOVA table, 8.9. A thtee-variable regression gave the following results; ‘Sum of squares Mean sum of ‘Source of vation (ss) df, squares (MSS) Due to regression (ESS) 65,965, = - Due to residual (RSS) s = = Total (TSS) i: 66,042 14 ‘a: What is the sample size? b, What is the value of the RSS? ¢. What are the df. of the ESS anid RSS? 4, What is RY And R 3 e, Test the hypothesis that X2 and Xs have zero influence on Y. Which test do you use and why? f. From the preceding : information, can You tell what is the individual contri- bution of X, and Xs toward Y? 8.10.’ Recast the ANOVA table given in problem 8: Jin terms of R2: 8.1L. To explain what determines the price of air conditioners, B:T, Ratchford? obtained the following regression results based on a sample of 19 air conditioners: 68.236 + 0.023 Xqj + 19.729Xqy + 7.653Xq R* = 0.84 (@.005}. (6.992). (4.082) where Y = the price, in dollars Xp= the BIU rating of ait conditioner Xq= the energy efficiency ratio Xq= the number of settings se = standard errors a, Interpret the regression results. b, Do the results make economic sense? ©. Ata = 5%, test the hypothesis that the BTU rating has no effect on the price ‘of an air conditioner versus thatit has a positive effect: 44: Would you accept the null hypothesis that the three explanatory Variables explain a sibstantial variation ‘in the. prices: of air conditioners? Show clearly all your calculations. 8.12, Based on the US. data for 1965-IQ to 1983-IVQ (n= 76), James Doti and Esmael Adibi% obtained the following regression to explain personal consumption expenditure (PCE) in the United States. Y= = 10.96 + 0.93Xy — 2.09Xy P= (-3.33)049.06). 3.09) R= 0.9996 F= 83,7537 °g, T. Ratchford, “The Value of information for Selected Appliatces,” Journal of Marketing Reet vol. 17,1980, pp- 16-25, Noone were adapted, ‘Wyatnes Doti and. Barnael Adibi, Economic Analysis: An Applications Approach, Prentice Hall, Englewood Cliffs, NJ, 1988, p. 188. Notations wete adapted. anz po(CHAPTER EIGHT: MULTIPLE REGRESSION: ESTIMATION AND HYPOTHESIS TESTING 244 where Y = the PCE (§, in billions) X= the disposable (ie,, after-tax) income (§, in billions) X3 = the prime rate (%) charged by banks a. What is the marginal propensity to consume (MPC)--the amount of additional. consumption expenditure from an additional dollar’s persona disposable income? be Is the MPC statistically different from 1? Show the appropriate testing procedure. . What is the rationale for the inclusion of the prime rate variable in the model? A priéti, would you expecta negative sign for this variable?, Is by significantly different from zero?“ : e. Test the hypothesis that R= 0, £, Compiate the standard étror of each coefficient: 8.13. Inthe illustrative Example 8,2 given in the text, test the hypothesis that Xpand Xs together have no influence on Y,, Which test do, you use? What are the assumptions underlying that test? ‘Table 8-6 gives data oin child mortality (CM), female literacy rate (FLR), per capita GNP (PGNP), and ‘total fertility rate (TFR) for a group of 64 countries. a. A priori what is the expected relationship between CM and each of the other variables? b. Regress CM on FLR and obtain the ugual regression results, c. Regress CM on FLR and PGNP and obtain the usual results. d. Regress CM on FLR, PGNP, and TER and obtain the usual results, Also show the ANOVA table, , Given the various regression results, which model would you choose and why? {the regression modéin (dis the correct model, but you estimate (a) or (b) or (¢), what are the consequences? ‘g Suppose you have regressed CM on FLR as in (2). How would decide ifit is ‘worth adding the variables PGNP and TER to the model? Which test would . you use? Show the necessisy calculations. 8.15.. Use formula (8.54) to answer the following question: BA6. Value of AE a ke ‘What conclusion do you draw about the relationship between R? and R?? 8.16, For the illustrative Exatnple 83, compute the F value. If that F value s signil- icant, what does that mean? 8:17. For Exainple 82, set up the ANOVA fable and test that R? = 0, Use'a = 1%. 8:18. Refer to the data givers in Table 6.11 to answer the following questions: a. Develop a multiple regression model to explain the average starting pay of MBA gradviates, obtaining the usual regression output, b. If you include both GPA and GMAT scores in the model, a priori, what problem(s) may you encounter and why? 304 A 1Ke BPN me ‘242 PARTTWO: THE LINEAR REGRESSION MODEL TABLE 8.5 CHILD MORTALITY, FERTILITY, AND RELATED DATA FOR 64 COUNTRIES. cM. FUR PNP TR CM = FLRPGNP_ Ss TFR 128 7 1870 6.668 442, 50 8640 TAT 2042 130815, toe 2 350 «880 202 16 310. 7.00 287 31 230 7.00 1785 570625 4 88 1620-391 96 78 (2050 3.81 312, n 190 6.70 209 28 200 6.44 7 88 2090 4.20 7 48 670 * 619 2 oe 90843 240 29. ‘0 5.89: 262 22 230 850 att 120: 580 216 12 40625 55 55 290. 238 246 9 390 710 75 a7 1180. 3.98. t9t 3t 1010 7410 a) 00° 599. 19 300 7.00 4. 93 1790: 350 TB 1730. 848 165 3t 1150 TAL 103 c J 780: 5.88 4 7 1160 424 67. 85 1300 462 96 80. 1270 5.00 143 78 930 5.00: 148 30 580. ‘827 83 5 690. 474 8 8 860 ‘5.21 223 33 200 8.49 161 43 420 6.50 240 19 450 6.50 118 47 1080 6.12 312 “at 280, 650 269 7 200 6.19 12 72 4430, 1,69 189 5 270 5.05 52 83 270 9.25 126 58. 560, 6.16 79 43 1340 TAT 12 et 4240 1,80 é 88 670 3.52 167 20 240 ATS 168 28 410 6.09 185 65 430 410 28 5 4370, 2.86 107 er 3020 6.66 121 at 1310 4.88. Te 63. 1420 7:28 15 62 1470 3.89 128 4g 420. 8.12 186 45 0 6.90 oF CI] 19830 523 47 65 3630 4.10 12 40 88 7% 48 20 608 ma 530.650 2 se 720 FAR = Fema ercy rat “= PGNP = Porcapa GNP In 1680, 5 ‘THR = Tol ely al 1960-1885, tho average number of cre bor oe wornan using ago potas or gen yor. Source: Chandan Majo, Howard Wie, and Mare Why. Eoonometcs and Data Anas er Developing Counties, Rowadge, London 196, p, 488. «IF the coefficient of the tuition variable is positive and statistically significant, does that mean it pays to go to the most expensive business school? ‘What might the tuition variable be a proxy for? d. Suppose you regress GMAT score on GPA and find a statistically significant: positive relationship between the two, what can you say about the problem, of multicolinearity? e, Set up the ANOVA table for the multiple fegression in part (a) and test the hypothesis that all partial slope coefficients are zero. £, Do the ANOVA exercise in part (¢), using the R? value. 8.19. Figure §-1 gives you the normal probability plot for illustrative Example 8.4 a. From this figure, can you tell if the error term in Eq. (8.62) follows the nor- al distribution? Why or why not? . Isthe observed Anderson-Darling A* value of 0.481 statistically significant? Titis, what does that gan? Iitis not, what conclusion do you draw?CHAPTER EIGHT: MULTIPLE REGRESSION: ESTIMATION AND HYPOTHESIS TESTING 243, ‘Nonmal Probability Plot ofthe Residua)s (Cesponse is CLFPR) s Percent zessege 2s FIGURE &A-t Normal probablity plot for Example 8.4. AD = Anderson-Darting Statistic. From the given data, can you identify the mean and vaciarice ofthe error tem? 820, Restrcedlnst squares (RLS). Ifthe dependent vatiabe inthe restricted and unrestricted regressions are not the same, you can use the following Vari- ant of the F test given in Eq (256) (RSS, RSS,)/m RSSq/(n—k) where RSS, = residual sum of squares from the restricted regression, RSS = residual sum of squares fo the ined eres, ‘mt = nuinber of restriations, and (rt ~ k)-d£ in the unrestricted Tuto fea youths fons evork head giveninTable8-4. Fe » Franc * EP ie, APPENDIX 84.1: Derivations of OLS Estimators Given in Equations (8,20) to (8.22) Start with Bg, (@.16). Differentiate this equation partially with respect to by, ba, and by, and set the resulting equations to zero to obtain: Bes x SEE 00 -h-n¥y—bH-t)=0 2 aD ihr kal Ha) 0 BRE Uy ty Xe —buHON-Xa) <0 Bb 386244. PARTTWO: THE LINEAR REGRESSION MODEL Simplifying these equations, gives Eq, (8.17) (8.18), and (8.19). Using small letters to denote deviations from the mean valves (¢.g.,2%1 = Xo ~ Xa), we can solve the preceding équations to obtain the formulas given in Eqs. (8.20), (8.21),and (8.22), APPENDIX 84.2: Derivation of Equation (8.31) Note that the three-variable sample regression model Y= by + bX + by Xe + ey (8A.2.1) can be expressed in the deviation form (ie., each variable expresséd as a deviation from the mean value and noting thiat’= 0) as Y= boty + baty +e (8A.2.2), Therefore, (6A2.3) Which we can write as LFt= Leen = Veil ~baay — baa) Dei be Devaar —bs Yea = Drei since the last two terms are zero (Why?) = Db — baw) = Deb Dye yyw Lit (Yop +bs wa) = TSS— ESS aN APPENDIX 8A.3: Derivation of Equation (8.50) Recall that (see footnote 9) RSS 27 - 5 . Real es @A3.1) FEED Now R° is defined 2s RSS/(t= Kh) ~ T8S/(n=1) RS) TSS (n-h) Note how the degrees of freedom are taken into account: Now substituting Bquation (8A.3.1) into Equation (8A.3.2), and after algebraic manipulations, we obtain 5 Re =1 (8A.3.2) 1 Ra1-(+R) 307CHAPTER EIGHT: MULTIPLE REGRESSION: ESTIMATION AND HYPOTHESIS TESTING 245 Notice that if we do not fake into account the df associated with RSS (= 11 ~ k) “ and TSS (= 1 — 1), then, obviously R?-= R’. APPENDIX 8A. views Output of the Clock Auctlon Price Example 285“O ow

8.essentials of Econometrics Chapter1-8 PDF

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

8.essentials of Econometrics Chapter1-8 PDF

Uploaded by

Copyright:

Available Formats

You might also like