That is, the conditional probability of A, given B, is equal to the ratio of their joint probability to the marginal probability of B. In like manner,

P(B|A) = P(AB)/P(A),  P(A) > 0    (2.8)
To avoid the shaded area in Fig. 2-1(c) being counted twice, we have to subtract P(AB) on the right-hand side of this equation.
To visualize Eq. (2.7), we can resort to a Venn diagram, as shown in Figure 2-1(c). As you can see from this figure, regions 2 and 3 represent event B and regions 1 and 2 represent event A. Because region 2 is common to both events, and since B has occurred, if we divide the area of region 2 by the sum of the areas of regions 2 and 3, we will get the (conditional) probability that the event A has occurred, knowing that B has occurred. Simply put, the conditional probability is the fraction of the time that event A occurs when event B has occurred.
Example 2.11.
In an introductory accounting class there are 500 students, of which 300 are males and 200 are females. Of these, 100 males and 60 females plan to major in accounting. A student is selected at random from this class, and it is found that this student plans to be an accounting major. What is the probability that the student is a male?

Let A denote the event that the student is a male and B that the student is an accounting major. Therefore, we want to find out P(A|B). From the formula of conditional probability just given, this probability can be obtained as

P(A|B) = P(AB)/P(B)
       = (100/500)/(160/500)
       = 0.625
From the data given previously, it can be readily seen that P(A) = 300/500 = 0.6; that is, the unconditional probability of selecting a male student is 0.6, which is different from the preceding probability of 0.625.
This example brings out an important point, namely, conditional and unconditional probabilities in general are different. However, if the two events are independent, then we can see from Eq. (2.7) that

P(A|B) = P(AB)/P(B) = P(A)P(B)/P(B) = P(A)    (2.9)

Note that P(AB) = P(A)P(B) when the two events are independent, as noted earlier. In this case the conditional probability of A given B is the same as the unconditional probability of A; it does not matter whether B occurs or not.
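To make the arithmetic of Eqs. (2.7) through (2.9) concrete, here is a minimal Python sketch (not from the text; the variable names are our own) that recomputes the probabilities of Example 2.11 by simple counting:

```python
# Conditional vs. unconditional probability, using the counts of Example 2.11.
total = 500          # students in the class
males = 300          # event A: student is male
acc_males = 100      # males planning to major in accounting
acc_females = 60     # females planning to major in accounting

p_A = males / total                       # P(A) = 0.6, unconditional
p_B = (acc_males + acc_females) / total   # P(B) = 160/500 = 0.32
p_AB = acc_males / total                  # P(AB), the joint probability

p_A_given_B = p_AB / p_B                  # Eq. (2.7): P(A|B) = P(AB)/P(B)
print(p_A, p_A_given_B)                   # 0.6 vs. 0.625: the two differ,
                                          # so A and B are not independent
```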
An interesting application of conditional probability is contained in the famous Bayes' Theorem, which was originally propounded by Thomas Bayes, a nonconformist minister in Tunbridge Wells, England (1701–1761). This theorem, published after Bayes' death, led to the so-called Bayesian school of statistics, a rival to the school of classical statistics, which still predominates statistics teaching in most universities in the world. The knowledge that an event B has occurred can be used to revise or update the probability that an event A has occurred; this is the essence of Bayes' Theorem.
To explain this theorem, let A and B be two events, each with a positive probability. Bayes' Theorem then states that:

P(A|B) = P(B|A)P(A) / [P(B|A)P(A) + P(B|A′)P(A′)]    (2.10)

where A′, called the complement of A, means the event A does not occur. In words, Bayes' Theorem shows how conditional probabilities of the form P(B|A) may be combined with the initial probability of A (i.e., P(A)) to obtain the final probability P(A|B). Notice how the roles of the conditioning event (B) and the outcome event (A) have been interchanged. The following example will show how Bayes' Theorem works.
Example 2.12. Bayes' Theorem
Suppose a woman has two coins in her handbag. One is a fair coin and one is two-headed. She takes a coin at random from her handbag and tosses it. Suppose a head shows up. What is the probability that the coin she tossed was two-headed?

Let A be the event that the coin is two-headed and A′ the event that the coin is fair. The probability of selecting either of these coins is P(A) = P(A′) = 1/2. Let B be the event that a head turns up. If the coin has two heads, B is certain to occur. Hence, P(B|A) = 1. But if the coin is fair, P(B|A′) = 0.5. Therefore, by Bayes' Theorem we obtain

P(A|B) = P(B|A)P(A)/P(B) = (1)(0.5)/0.75 = 2/3 ≈ 0.67
Notice the particular feature of this theorem. Before the coin was tossed, the probability of selecting a regular or a two-headed coin was the same, namely, 0.5. But knowing that a head was actually observed, the probability that the coin selected was two-headed is revised upward to about 0.67.
In Bayesian language, P(A) is called the prior probability (i.e., before the fact or evidence) and P(A|B) is called the revised or posterior probability (after the fact or evidence). The knowledge that B has occurred leads us to reassess or revise the (prior) probability assigned to A.
⁵If the sample space is partitioned into A (event A occurs) and A′ (event A does not occur), then for any event B it is true that P(B) = P(BA) + P(BA′); that is, the probability that B occurs is the sum of the common outcomes between B and each partition of A. This result can be generalized if A is partitioned into several segments. This can be seen easily from a Venn diagram.
You might see this finding as somewhat puzzling. Intuitively you might think that this probability should be 1/2. But look at the problem this way. There are three ways in which heads can come up, and in two of these cases the hidden face will also be heads.

Notice another interesting feature of the theorem. In classical statistics we assume that the coin is fair when we test it. In Bayesian statistics we question that premise or hypothesis.
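The coin example can be checked in a few lines of Python. This is a sketch of the arithmetic of Bayes' Theorem only; the variable names are ours, not the book's:

```python
# Bayes' Theorem for the two-coin example (Example 2.12).
prior_two_headed = 0.5      # P(A): coin drawn is two-headed
prior_fair = 0.5            # P(A'): coin drawn is fair
lik_two_headed = 1.0        # P(B|A): a two-headed coin always shows heads
lik_fair = 0.5              # P(B|A'): a fair coin shows heads half the time

# Denominator: P(B) = P(B|A)P(A) + P(B|A')P(A') = 0.75
p_heads = lik_two_headed * prior_two_headed + lik_fair * prior_fair

# Posterior: P(A|B) = P(B|A)P(A)/P(B) = 2/3
posterior = lik_two_headed * prior_two_headed / p_heads
print(p_heads, posterior)   # 0.75 0.666...
```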
Probability of Random Variables
Just as we assigned probabilities to sample outcomes or events of a sample space, we can assign probabilities to random variables, for as we saw, random variables are simply numerical representations of the outcomes of the sample space, as shown in Example 2.5. In this textbook we are largely concerned with random variables such as GDP, money supply, prices, and wages, and we should know how to assign probabilities to random variables. Technically, we need to study the probability distributions of random variables, a topic we will now discuss.
2.5 RANDOM VARIABLES AND THEIR PROBABILITY DISTRIBUTIONS
By probability distribution of a random variable we mean the possible values taken by that variable and the probabilities of occurrence of those values. To understand this clearly, we first consider the probability distribution of a discrete r.v., and then we consider the probability distribution of a continuous r.v., for there are differences between the two.
Probability Distribution of a Discrete Random Variable

As noted before, a discrete r.v. takes only a finite (or countably infinite) number of values. Let X be an r.v. with distinct values x₁, x₂, . . . . The function f defined by

f(x) = P(X = xᵢ)  if x = xᵢ
     = 0          if x ≠ xᵢ    (2.11)

is called the probability mass function (PMF) or simply the probability function (PF), where P(X = xᵢ) means the probability that the discrete r.v. X takes the numerical value xᵢ.⁷ Note these properties of the PMF given in (2.11):

0 ≤ f(xᵢ) ≤ 1    (2.12)

Σᵢ f(xᵢ) = 1    (2.13)

⁷The values taken by a discrete r.v. are often called mass points and f(X = xᵢ) denotes the mass associated with the mass point xᵢ.
where the summation extends over all the values of X. Notice the similarities of these properties with the properties of probabilities discussed earlier. To see what this means, consider the following example.
Example 2.13.
Let the r.v. X represent the number of heads obtained in two tosses of a coin. Now consider the following table:

Number of heads (X)    PF, f(x)
0                      1/4
1                      1/2
2                      1/4
Sum                    1.00
In this example the r.v. X (the number of heads) takes three different values: X = 0, 1, or 2. The probability that X takes the value zero (i.e., no heads are obtained in two tosses of a coin) is 1/4, for of the four possible outcomes of tossing two coins (i.e., the sample space), only one is favorable to the outcome TT. Likewise, of the four possible outcomes, only one is favorable to the outcome of two heads; hence, its probability is also 1/4. On the other hand, two outcomes, HT and TH, are favorable to the outcome of one head; hence, its probability is 2/4 = 1/2. Notice that in assigning these probabilities we have used the classical definition of probability.
Geometrically, the PMF of this example is as shown in Figure 2-2.
FIGURE 2-2 The probability mass function (PMF) of the number of heads in two tosses of a coin (Example 2.13). [The figure plots the probability f(x) against the number of heads, 0, 1, and 2.]
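The PMF tabulated in Example 2.13 can also be generated mechanically by listing the sample space of two tosses and counting heads, as in this short Python sketch (ours, for illustration):

```python
from itertools import product
from collections import Counter

# Sample space of two tosses: HH, HT, TH, TT. By the classical definition
# of probability, each of the 4 outcomes is equally likely (1/4 each).
outcomes = list(product("HT", repeat=2))
heads_counts = Counter(o.count("H") for o in outcomes)

pmf = {x: n / len(outcomes) for x, n in sorted(heads_counts.items())}
print(pmf)                  # {0: 0.25, 1: 0.5, 2: 0.25}
print(sum(pmf.values()))    # 1.0, property (2.13)
```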
Probability Distribution of a Continuous Random Variable
The probability distribution of a continuous random variable is conceptually similar to the probability distribution of a discrete r.v. except that we now measure the probability of such an r.v. over a certain range or interval. Instead of calling it a PMF, we call it a probability density function (PDF). An example will make the distinction between the two clear.

Let X represent the continuous r.v. height, measured in inches, and suppose we want to find out the probability that the height of an individual lies in the interval, say, 60 to 68 inches. Further, suppose that the r.v. height has the PDF shown in Figure 2-3.

The probability that the height of an individual lies in the interval 60 to 68 inches is given by the shaded area lying between 60 and 68 marked on the curve in Figure 2-3. (How this probability is actually measured is shown in Chapter 4.) In passing, note that since a continuous r.v. can take an uncountably infinite number of values, the probability that such an r.v. takes a particular numerical value (e.g., 63 inches) is always zero; the probability for a continuous r.v. is always measured over an interval, say, between 62.5 and 63.5 inches.
More formally, for a continuous r.v. X the probability density function (PDF), f(x), is such that

P(x₁ < X < x₂) = ∫ from x₁ to x₂ of f(x) dx
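For instance, if we were willing to assume a specific PDF for height, say a normal distribution with mean 65 inches and standard deviation 3 inches (numbers made up purely for illustration, not taken from Figure 2-3), the interval probability is the area under the curve between the two limits:

```python
from scipy.stats import norm

# Hypothetical PDF for height: N(mean=65, sd=3). These parameters are
# illustrative assumptions, not values from the text.
height = norm(loc=65, scale=3)

# P(60 < X < 68) = area under the PDF between 60 and 68
p_60_68 = height.cdf(68) - height.cdf(60)

# P(X = 63) for a continuous r.v. is zero; we must use an interval,
# e.g., P(62.5 < X < 63.5).
p_63ish = height.cdf(63.5) - height.cdf(62.5)
print(round(p_60_68, 4), round(p_63ish, 4))
```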
TABLE 7-4 CAPACITY UTILIZATION AND INFLATION IN THE UNITED STATES, 1970–2001

Year    Capacity utilization (%)    Inflation (%)
1970    ez0000                      6.900000
1971    77.60000                    6.000000
1972    83.20000                    4.900000
1973    87.50000                    5.600000
1974    84.10000                    9.000000
1975    73.40000                    9.300000
1976    77.90000                    5.700000
1977    82.90000                    6.400000
1978    84.50000                    7.100000
1979    84.20000                    8.900000
1980    78.80000                    9.200000
1981    77.20000                    9.900000
1982    71.10000                    6.200000
1983    73.50000                    4.000000
1984    Taso                        8.700900
1985    72.40000                    9.200000
1986    78.80000                    2.200000
1987    81.10000                    3.000000
1988    84.10000                    3.400000
1989    83.20000                    3.600000
1990    81.60000                    3.800000
1991    78.30000                    3.600000
1992    79.90000                    2.400000
1993    80.00000                    2.400000
1994    82.40000                    2.100000
1995    e2.a0000                    2.200000
1996    81.20000                    1.900000
1997    82.70000                    1.900000
1998    e900                        1.200000
1999    81.40000                    1.400000
2000    81.40000                    2.100000
2001    TRO                         2.400000

Source: Economic Report of the President.
TABLE 7-5 CASH DIVIDENDS (Y) AND AFTER-TAX PROFITS (X) IN U.S. MANUFACTURING INDUSTRIES, 1974–1986

Year    Y ($, in millions)    X ($, in millions)
1974    19,487                SB76T
1975    19,968                49,195
1976    22,763                64,518
1977    26,585                70,968
1978    28,982                81,148
1979    32,401                95,698
1980    36,495
1981    40at7                 101,902
1982    41,259                71,028
1983    41,604                85,694
1984    45,02                 YOTS48
1985    45,517                87,648
1986    48,044                83,t2t

Source: Business Statistics, 1986, U.S. Department of Commerce, Bureau of Economic Analysis, December 1987, p. 72.
b. Plot the scattergram between Y and X.
c. Does the scattergram support your expectations in part (a)?
d. If so, do an OLS regression of Y on X and obtain the usual statistics.
e. Establish a 99% confidence interval for the true slope and test the hypothesis that the true slope coefficient is zero; that is, there is no relationship between dividends and after-tax profits.
7.15. Refer to the S.A.T. data given in Table 6-15. Suppose you want to predict the male math scores on the basis of the female math scores by running the following regression:

Yₜ = B₁ + B₂Xₜ + uₜ

where Y and X denote the male and female math scores, respectively.
a. Estimate the preceding regression, obtaining the usual summary statistics.
b. Test the hypothesis that there is no relationship between Y and X whatsoever.
c. Suppose the female math score in 1991 is expected to be 460. What is the predicted (average) male math score?
d. Establish a 95% confidence interval for the predicted value in part (c).
7.16. Repeat the exercise in Problem 7.15 but let Y and X denote the male and the female verbal scores, respectively. Assume a female verbal score for 1991 of 425.
7.17. Consider the following regression results:⁸

R̄² = 0.10, Durbin–Watson = 2.01

where Y = the real return on the stock price index from January of the current year to January of the following year
X = the total dividends in the preceding year divided by the stock price index for July of the preceding year
t = time

(Note: On the Durbin–Watson statistic, see Chapter 14.)

The time period covered by the study was 1926 to 1982.

Note: R̄² stands for the adjusted coefficient of determination. The Durbin–Watson value is a measure of autocorrelation. Both measures are explained in subsequent chapters.
a. How would you interpret the preceding regression?
b. If the previous results are acceptable to you, does that mean the best investment strategy is to invest in the stock market when the dividend/price ratio is high?
c. If you want to know the answer to part (b), read Shiller's analysis.
7.18. Refer to Example 6.1 on years of schooling and average hourly earnings. The data for this example are given in Table 6-5 and the regression results are presented in Eq. (6.21). For this regression:
a. Obtain the standard errors of the intercept and slope coefficients and r².
b. Test the hypothesis that schooling has no effect on average hourly earnings. Which test did you use and why?

⁸See Robert J. Shiller, Market Volatility, MIT Press, Cambridge, Mass., 1989, pp. 32–86.
c. If you reject the null hypothesis in (b), would you also reject the hypothesis that the slope coefficient in Eq. (6.21) is not different from 1? Show the necessary calculations.
7.19. Example 6.2 discussed Okun's law, as shown in Eq. (6.22). This equation can also be written as Xₜ = B₁ + B₂Yₜ, where X = percent growth in real output, as measured by GDP, and Y = change in the unemployment rate, measured in percentage points. Using the data given in Table 6-12:
a. Estimate the preceding regression, obtaining the usual results.
b. Is the change in the unemployment rate a significant determinant of percent growth in real GDP? How do you know?
c. How would you interpret the intercept coefficient in this regression? Does it have any economic meaning?
7.20. For Example 6.3, relating stock prices to interest rate, are the regression results given in Eq. (6.24) statistically significant? Show the necessary calculations.
7.21. Refer to Example 6.5 about antique clocks and their prices. Based on Table 6-14, we obtained the regression results shown in (6.27) and (6.28). For each regression obtain the standard errors, the t ratios, and the r² values. Test for the statistical significance of the estimated coefficients in the two regressions.
7.22. Refer to Problem 6.16. Using OLS regressions, answer questions (a), (b), and (c).
7.23. Table 7-6 gives data on U.S. expenditure on imported goods (Y) and personal disposable income (X) for the period 1968 to 1987. Based on the data given in this table, estimate an import expenditure function, obtaining the usual regression statistics, and test the hypothesis that expenditure on imports is unrelated to personal disposable income.
7.24. Show that the OLS estimators, b₁ and b₂, are linear estimators. Also show that these estimators are linear functions of the error term uₜ. (Hint: Note that b₂ = Σxₜyₜ/Σxₜ² = Σwₜyₜ, where wₜ = xₜ/Σxₜ², and note that the X's are nonstochastic.)
7.25. Prove Eq. (7.35). (Hint: Square Eq. (7.33) and use some of the properties of OLS.)
TABLE 7-6 U.S. EXPENDITURE ON IMPORTED GOODS (Y) AND PERSONAL DISPOSABLE INCOME (X), 1968–1987*

Year    Y
1968    135.7
1969    144.6
1970    150.9
1971    188.2
1972    107
1973    218.2
1974    381.8
1975    187.9
1976    229.3
1977    259.4

Source: Economic Report of the President.
*In billions of 1982 dollars.
CHAPTER 8
MULTIPLE REGRESSION: ESTIMATION AND HYPOTHESIS TESTING
In the two-variable linear regression model that we have considered so far there was a single independent, or explanatory, variable. In this chapter we extend that model by considering the possibility that more than one explanatory variable may influence the dependent variable. A regression model with more than one explanatory variable is known as a multiple regression model, multiple because multiple influences (i.e., variables) may affect the dependent variable.
For example, consider the 1980s savings and loan (S&L) crisis resulting from the bankruptcies of some S&L institutions in several states. Suppose we want to develop a regression model to explain bankruptcy, the dependent variable. Now a phenomenon such as bankruptcy is too complex to be explained by a single explanatory variable; the explanation may entail several variables, such as the ratio of primary capital to total assets, the ratio of loans that are more than 90 days past due to total assets, the ratio of nonaccruing loans to total assets, the ratio of renegotiated loans to total assets, or the ratio of net income to total assets, etc.¹ To include all these variables in a regression model to allow for the multiplicity of influences affecting bankruptcies, we have to consider a multiple regression model.
Needless to say, we could cite hundreds of examples of multiple regression models. In fact, most regression models are multiple regression models because very few economic phenomena can be explained by only a single explanatory variable, as in the case of the two-variable model.

¹As a matter of fact, these were some of the variables that were considered by the Board of Governors of the Federal Reserve System in their internal studies of bankrupt banks.
In this chapter we discuss the multiple regression model, seeking answers to the following questions:

1. How do we estimate the multiple regression model? Is the estimating procedure any different from that for the two-variable model?
2. Is the hypothesis-testing procedure any different from the two-variable model?
3. Are there any unique features of multiple regressions that we did not encounter in the two-variable case?
4. Since a multiple regression can have any number of explanatory variables, how do we decide how many variables to include in any given situation?
To answer these and other related questions, we first consider the simplest of the multiple regression models, namely, the three-variable model in which the behavior of the dependent variable Y is examined in relation to two explanatory variables, X₂ and X₃. Once the three-variable model is clearly understood, the extension to the four-, five-, or more variable case is quite straightforward, although the arithmetic gets a bit tedious. (But in this age of high-speed computers, that should not be a problem.) It is interesting that the three-variable model itself is in many ways a clear-cut extension of the two-variable model, as the following discussion reveals.
8.1 THE THREE-VARIABLE LINEAR REGRESSION MODEL

Generalizing the two-variable population regression function (PRF), we can write the three-variable PRF in its nonstochastic form as

E(Yₜ) = B₁ + B₂X₂ₜ + B₃X₃ₜ    (8.1)²

and in the stochastic form as

Yₜ = B₁ + B₂X₂ₜ + B₃X₃ₜ + uₜ    (8.2)
   = E(Yₜ) + uₜ    (8.3)

where Y = the dependent variable
X₂ and X₃ = the explanatory variables
u = the stochastic disturbance term
t = the tth observation
²Equation (8.1) can be written as E(Yₜ) = B₁X₁ₜ + B₂X₂ₜ + B₃X₃ₜ, with the understanding that X₁ₜ = 1 for each observation. The presentation in (8.1) is for notational convenience in that the subscripts on the parameters or their estimators match the subscripts on the variables to which they are attached.
In case the data are cross-sectional, the subscript i will denote the ith observation. Note that we introduce u in the three-variable, or, more generally, in the multivariable model for the same reason that it was introduced in the two-variable case.

B₁ is the intercept term. It represents the average value of Y when X₂ and X₃ are set equal to zero. The coefficients B₂ and B₃ are called partial regression coefficients; their meaning will be explained shortly.
Following the discussion in Chapter 6, Equation (8.1) gives the conditional mean value of Y, conditional upon the given or fixed values of the variables X₂ and X₃. Therefore, as in the two-variable case, multiple regression analysis is conditional regression analysis, conditional upon the given or fixed values of the explanatory variables, and we obtain the average, or mean, value of Y for the fixed values of the X variables. Recall that the PRF gives the (conditional) means of the Y populations corresponding to the given levels of the explanatory variables, X₂ and X₃.³
The stochastic version, Equation (8.2), states that any individual Y value can be expressed as the sum of two components:

1. A systematic, or deterministic, component (B₁ + B₂X₂ₜ + B₃X₃ₜ), which is simply its mean value E(Yₜ) (i.e., the point on the population regression line, PRL⁴), and
2. uₜ, which is the nonsystematic, or random, component, determined by factors other than X₂ and X₃.

All this is familiar territory from the two-variable case; the only point to note is that we now have two explanatory variables instead of one explanatory variable.
Notice that Eq. (8.1), or its stochastic counterpart Eq. (8.2), is a linear regression model—a model that is linear in the parameters, the Bs. As noted in Chapter 6, our concern in this book is with regression models that are linear in the parameters; such models may or may not be linear in the variables (but more on this in Chapter 9).
The Meaning of Partial Regression Coefficient

As mentioned earlier, the regression coefficients B₂ and B₃ are known as partial regression or partial slope coefficients. The meaning of the partial regression coefficient is as follows: B₂ measures the change in the mean value of Y, E(Y), per unit change in X₂, holding the value of X₃ constant. Likewise, B₃ measures the
³Unlike the two-variable case, we cannot show this diagrammatically, because to represent the three variables Y, X₂, and X₃ we would have to use a three-dimensional diagram, which is difficult to visualize in two-dimensional form. But by stretching the imagination, we can visualize a diagram similar to Figure 6-6.
⁴Geometrically, the PRL in this case represents what is known as a plane.
change in the mean value of Y per unit change in X₃, holding the value of X₂ constant. This is a unique feature of multiple regression; in the two-variable case, since there was only a single explanatory variable, we did not have to worry about the presence of other explanatory variables in the model. In the multiple regression model we want to find out what part of the change in the average value of Y can be directly attributed to X₂ and what part to X₃. Since this point is so crucial to understanding the logic of multiple regression, let us explain it by a simple example. Suppose we have the following PRF:
E(Yₜ) = 15 − 1.2X₂ₜ + 0.8X₃ₜ    (8.4)

Let X₃ be held constant at the value 10. Putting this value in Equation (8.4), we obtain

E(Yₜ) = 15 − 1.2X₂ₜ + 0.8(10) = 23 − 1.2X₂ₜ    (8.5)

Here the slope coefficient B₂ = −1.2 indicates that the mean value of Y decreases by 1.2 per unit increase in X₂ when X₃ is held constant—in this example it is held constant at 10, although any other value will do.⁵ This slope coefficient is called the partial regression coefficient. Similarly, if we hold X₂ constant, say, at the value 5, we obtain

E(Yₜ) = 15 − 1.2(5) + 0.8X₃ₜ = 9 + 0.8X₃ₜ    (8.6)
Here the slope coefficient B₃ = 0.8 means that the mean value of Y increases by 0.8 per unit increase in X₃ when X₂ is held constant—here it is held constant at 5, but any other value will do just as well. This slope coefficient too is a partial regression coefficient.⁶

In short, then, a partial regression coefficient reflects the (partial) effect of one explanatory variable on the mean value of the dependent variable when the values of other explanatory variables included in the model are held constant. This unique feature of multiple regression enables us not only to include more than one explanatory variable in the model but also to "isolate" or "disentangle" the effect of each X variable on Y from the other X variables included in the model. We will consider a concrete example in Section 8.5.
⁵As the algebra of Eq. (8.5) shows, it does not matter at what value X₃ is held constant, for that constant value multiplied by its coefficient will be a constant number, which will simply be added to the intercept.
⁶The mathematically inclined reader will notice at once that B₂ is the partial derivative of E(Y) with respect to X₂ and that B₃ is the partial derivative of E(Y) with respect to X₃.
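The "holding constant" logic of Eqs. (8.4) through (8.6) is easy to verify numerically. The sketch below (ours, not from the text) evaluates the hypothetical PRF of Eq. (8.4) and shows that a one-unit change in one regressor moves E(Y) by exactly its partial coefficient, whatever value the other regressor is frozen at:

```python
# Partial effects in the hypothetical PRF of Eq. (8.4):
# E(Y) = 15 - 1.2*X2 + 0.8*X3
def mean_y(x2, x3):
    return 15 - 1.2 * x2 + 0.8 * x3

# Hold X3 constant (at 10, then at 100): the effect of a unit change
# in X2 is B2 = -1.2 either way.
print(round(mean_y(6, 10) - mean_y(5, 10), 4))    # -1.2
print(round(mean_y(6, 100) - mean_y(5, 100), 4))  # -1.2

# Hold X2 constant at 5: a unit change in X3 moves E(Y) by B3 = 0.8.
print(round(mean_y(5, 8) - mean_y(5, 7), 4))      # 0.8
```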
8.2 ASSUMPTIONS OF MULTIPLE LINEAR REGRESSION MODEL
As in the two-variable case, our first order of business is to estimate the regression coefficients of the multiple regression model. Toward that end, we continue to operate within the framework of the classical linear regression model (CLRM) first introduced in Chapter 7 and to use the method of ordinary least squares (OLS) to estimate the coefficients. Specifically, for model (8.2), we assume (cf. Section 7.1):
A8.1. The regression model is linear in the parameters as in Eq. (8.1), and it is correctly specified.

A8.2. X₂ and X₃ are uncorrelated with the disturbance term u. However, if X₂ and X₃ are nonstochastic (i.e., fixed numbers in repeated sampling), this assumption is automatically fulfilled.
Since our regression analysis is conditional regression analysis, conditional on the given X values, assumption A8.2 is not necessary. But it is made to deal with the simultaneous equation regression models discussed in Chapter 15, where we will see that some of the X variables may be correlated with the error term.
A8.3. The error term u has a zero mean value; that is,

E(uₜ) = 0    (8.7)

A8.4. Homoscedasticity, that is, the variance of u, is constant:

var(uₜ) = σ²    (8.8)

A8.5. No autocorrelation exists between the error terms uᵢ and uⱼ:

cov(uᵢ, uⱼ) = 0,  i ≠ j    (8.9)
A8.6. No exact collinearity exists between X₂ and X₃; that is, there is no exact linear relationship between the two explanatory variables. This is a new assumption and is explained later.
A8.7. For hypothesis testing, the error term u follows the normal distribution with mean zero and (homoscedastic) variance σ². That is,

uₜ ~ N(0, σ²)    (8.10)
Except for assumption A8.6, the rationale for the other assumptions is the same as that discussed for the two-variable linear regression. As noted in Chapter 7, we make these assumptions to facilitate the development of the subject. In Part III we will revisit these assumptions and see what happens if one or more of them are not fulfilled in actual applications.

Assumption A8.6, that there is no exact linear relationship between the explanatory variables X₂ and X₃ (technically known as the assumption of no collinearity, or no multicollinearity if more than one exact linear relationship is involved), is new and needs some explanation.
Informally, no perfect collinearity means that a variable, say, X₂, cannot be expressed as an exact linear function of another variable, say, X₃. Thus, if we can express

X₂ = 3 + 2X₃  or  X₂ = 4X₃

then the two variables are collinear, for there is an exact linear relationship between X₂ and X₃. Assumption A8.6 states that this should not be the case. The logic here is quite simple. If, for example, X₂ = 4X₃, then substituting this in Eq. (8.1), we see that

E(Yₜ) = B₁ + B₂(4X₃ₜ) + B₃X₃ₜ = B₁ + (4B₂ + B₃)X₃ₜ = B₁ + AX₃ₜ    (8.11)

where

A = 4B₂ + B₃    (8.12)

Equation (8.11) is a two-variable model, not a three-variable model. Now even if we can estimate Eq. (8.11) and obtain an estimate of A, there is no way that we can get individual estimates of B₂ or B₃ from the estimated A. Note that since Equation (8.12) is one equation with two unknowns we need two (independent) equations to obtain unique estimates of B₂ and B₃.
The upshot of this discussion is that in cases of perfect collinearity we cannot estimate the individual partial regression coefficients B₂ and B₃; in other words,
we cannot assess the individual effect of X₂ and X₃ on Y. But this is hardly surprising, for we really do not have two independent variables in the model. Although, in practice, the case of perfect collinearity is rare, cases of high or near perfect collinearity abound. In a later chapter (see Chapter 12) we will examine this case more fully. For now we merely require that two or more explanatory variables do not have exact linear relationships among them.
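Perfect collinearity is not just a theoretical nicety; software fails (or silently drops a variable) when it occurs. Here is a small numpy sketch, with made-up numbers, showing that when X₂ = 4X₃ the cross-product matrix of the normal equations is singular, so the individual coefficients cannot be solved for:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
x3 = rng.normal(size=n)
x2 = 4 * x3                      # exact linear dependence: X2 = 4*X3
y = 1 + 2 * x2 + 3 * x3 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x2, x3])
# The matrix X'X is singular, so the normal equations (8.17)-(8.19)
# have no unique solution:
print(np.linalg.matrix_rank(X.T @ X))   # 2, not 3
try:
    np.linalg.solve(X.T @ X, X.T @ y)
except np.linalg.LinAlgError as err:
    print("normal equations not solvable:", err)
```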
8.3 ESTIMATION OF PARAMETERS OF MULTIPLE REGRESSION

To estimate the parameters of Eq. (8.2), we use the ordinary least squares (OLS) method, whose main features have already been discussed in Chapters 6 and 7.
Ordinary Least Squares Estimators

To find the OLS estimators, let us first write the sample regression function (SRF) corresponding to the PRF Eq. (8.2), as follows:

Yₜ = b₁ + b₂X₂ₜ + b₃X₃ₜ + eₜ    (8.13)

where, following the convention introduced in Chapter 6, e is the residual term, or simply the residual—the sample counterpart of u—and where the bs are the estimators of the population coefficients, the Bs. More specifically,

b₁ = the estimator of B₁
b₂ = the estimator of B₂
b₃ = the estimator of B₃
The sample counterpart of Eq. (8.1) is

Ŷₜ = b₁ + b₂X₂ₜ + b₃X₃ₜ    (8.14)

which is the estimated population regression line (PRL) (actually a plane).
As explained in Chapter 6, the OLS principle chooses the values of the unknown parameters in such a way that the residual sum of squares (RSS), Σeₜ², is as small as possible. To do this, we first write Equation (8.13) as

eₜ = Yₜ − Ŷₜ = Yₜ − b₁ − b₂X₂ₜ − b₃X₃ₜ    (8.15)

Squaring this equation on both sides and summing over the sample observations, we obtain

RSS: Σeₜ² = Σ(Yₜ − b₁ − b₂X₂ₜ − b₃X₃ₜ)²    (8.16)

And in OLS we minimize this RSS (which is simply the sum of the squared differences between the actual Yₜ and the estimated Ŷₜ).
The minimization of Equation (8.16) involves the calculus technique of differentiation. Without going into detail, this process of differentiation gives us the following equations, known as (least squares) normal equations, to help estimate the unknowns⁷ [compare them with the corresponding equations given for the two-variable case in Equations (6.14) and (6.15)]:

Ȳ = b₁ + b₂X̄₂ + b₃X̄₃    (8.17)
ΣYₜX₂ₜ = b₁ΣX₂ₜ + b₂ΣX₂ₜ² + b₃ΣX₂ₜX₃ₜ    (8.18)
ΣYₜX₃ₜ = b₁ΣX₃ₜ + b₂ΣX₂ₜX₃ₜ + b₃ΣX₃ₜ²    (8.19)

where the summation is over the sample range 1 to n. Here we have three equations in three unknowns; the knowns are the variables Y and the Xs and the unknowns are the bs. Ordinarily, we should be able to solve three equations with three unknowns. By simple algebraic manipulations of the preceding equations, we obtain the three OLS estimators as follows:

b₁ = Ȳ − b₂X̄₂ − b₃X̄₃    (8.20)

b₂ = [(Σyₜx₂ₜ)(Σx₃ₜ²) − (Σyₜx₃ₜ)(Σx₂ₜx₃ₜ)] / [(Σx₂ₜ²)(Σx₃ₜ²) − (Σx₂ₜx₃ₜ)²]    (8.21)

b₃ = [(Σyₜx₃ₜ)(Σx₂ₜ²) − (Σyₜx₂ₜ)(Σx₂ₜx₃ₜ)] / [(Σx₂ₜ²)(Σx₃ₜ²) − (Σx₂ₜx₃ₜ)²]    (8.22)

where, as usual, lowercase letters denote deviations from sample mean values (e.g., yₜ = Yₜ − Ȳ).
You will notice the similarity between these equations and the corresponding ones for the two-variable case given in Eqs. (6.16) and (6.17). Also notice the following features of the preceding equations: (1) Equations (8.21) and (8.22) are symmetrical in that one can be obtained from the other by interchanging the roles of X₂ and X₃, and (2) the denominators of these two equations are identical.
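Equations (8.20) through (8.22) translate directly into code. The following sketch (our own, with simulated data) computes the three OLS estimators from the deviation-form sums and checks them against numpy's least-squares routine:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 50
X2 = rng.uniform(0, 10, n)
X3 = rng.uniform(0, 10, n)
Y = 4 + 1.5 * X2 - 0.7 * X3 + rng.normal(0, 1, n)

# Deviations from sample means (the lowercase letters in the text)
y, x2, x3 = Y - Y.mean(), X2 - X2.mean(), X3 - X3.mean()

den = (x2 @ x2) * (x3 @ x3) - (x2 @ x3) ** 2               # common denominator
b2 = ((y @ x2) * (x3 @ x3) - (y @ x3) * (x2 @ x3)) / den   # Eq. (8.21)
b3 = ((y @ x3) * (x2 @ x2) - (y @ x2) * (x2 @ x3)) / den   # Eq. (8.22)
b1 = Y.mean() - b2 * X2.mean() - b3 * X3.mean()            # Eq. (8.20)

# Cross-check with the matrix least-squares solution
X = np.column_stack([np.ones(n), X2, X3])
print(np.round([b1, b2, b3], 4))
print(np.round(np.linalg.lstsq(X, Y, rcond=None)[0], 4))   # same numbers
```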
Variance and Standard Errors of OLS Estimators

Having obtained the OLS estimators of the intercept and partial regression coefficients, we can derive the variances and standard errors of these estimators in the manner of the two-variable model. These variances or standard errors give us some idea about the variability of the estimators from sample to sample. As in the two-variable case, we need the standard errors for two main

⁷The mathematical details can be found in Appendix 8A.1.
purposes: (1) to establish confidence intervals for the true parameter values and (2) to test statistical hypotheses. The relevant formulas, stated without proof, are as follows:

var(b₁) = [1/n + (X̄₂²Σx₃ₜ² + X̄₃²Σx₂ₜ² − 2X̄₂X̄₃Σx₂ₜx₃ₜ) / (Σx₂ₜ²Σx₃ₜ² − (Σx₂ₜx₃ₜ)²)] · σ²    (8.23)
ESS = the explained sum of squares (i.e., explained by all the X variables)
RSS = the residual sum of squares

Also, as in the two-variable case, R² is defined as

R² = ESS/TSS    (8.33)

That is, it is the ratio of the explained sum of squares to the total sum of squares; the only change is that the ESS is now due to more than one explanatory variable.
Now it can be shown that⁸

ESS = b₂Σyₜx₂ₜ + b₃Σyₜx₃ₜ    (8.34)

and, as shown before,

RSS = Σyₜ² − b₂Σyₜx₂ₜ − b₃Σyₜx₃ₜ    (8.35)

Therefore, R² can be computed as

R² = (b₂Σyₜx₂ₜ + b₃Σyₜx₃ₜ) / Σyₜ²    (8.36)⁹
In passing, note that the positive square root of R², R, is known as the coefficient of multiple correlation, the multiple counterpart of the two-variable correlation coefficient r. Just as r measures the degree of linear association between Y and X, R can be interpreted as the degree of linear association between Y and all the X variables jointly. Although r can be positive or negative, R is always taken to be positive. In practice, however, R is of little importance.
8.5 ANTIQUE CLOCK AUCTION PRICES REVISITED

Let us take time out to illustrate all the preceding theory with the antique clock auction prices example we considered in Chapter 6 (see Table 6-14). Let Y = auction price, X₂ = age of clock, and X₃ = number of bidders. A priori, one would expect a positive relationship between Y and the two explanatory variables. The results of regressing Y on the two explanatory variables are as follows (the Eviews output of this regression is given in Appendix 8A.4):

Ŷₜ = −1336.049 + 12.7413X₂ₜ + 85.7640X₃ₜ
se =  (175.2725)   (0.9123)      (8.8019)
t  =  (−7.6226)    (13.9653)     (9.7437)    (8.37)
p  =  (0.0000)*    (0.0000)*     (0.0000)*
R² = 0.8906;  F = 118.0585

*Denotes an extremely small value.
⁸See Appendix 8A.2.
⁹R² can also be computed as 1 − RSS/TSS.
Interpretation of the Regression Results

As expected, the auction price is positively related to both the age of the clock and the number of bidders. The interpretation of the slope coefficient of about 12.74 is that, holding other variables constant, if the age of the clock goes up
by a year, the average price of the clock goes up by about 12.74 marks. Likewise, holding other variables constant, if the number of bidders increases by one, the average price of the clock goes up by about 85.76 marks. The negative value of the intercept has no viable economic meaning. The R² value of about 0.89 means that the two explanatory variables account for about 89 percent of the variation in the auction bid price, a fairly high value. The F value given in Eq. (8.37) will be explained shortly.
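Readers without Eviews can reproduce this kind of output with any regression package. The sketch below uses Python's statsmodels on simulated stand-in data (Table 6-14 itself is not reproduced here, so the numbers will not match Eq. (8.37)); the steps are the generic ones: assemble Y and the X's, add a constant, fit, and read off the coefficients, standard errors, t values, p values, R², and F:

```python
import numpy as np
import statsmodels.api as sm

# Stand-in data with the flavor of the auction example: 32 clocks,
# price rising in both age and number of bidders. Purely illustrative.
rng = np.random.default_rng(1)
age = rng.uniform(100, 200, 32)       # X2: age of clock, in years
bidders = rng.integers(5, 16, 32)     # X3: number of bidders
price = -1300 + 12.7 * age + 85.8 * bidders + rng.normal(0, 130, 32)

X = sm.add_constant(np.column_stack([age, bidders]))
fit = sm.OLS(price, X).fit()

print(fit.params)                 # b1, b2, b3
print(fit.bse)                    # standard errors
print(fit.tvalues)                # t ratios, cf. Eqs. (8.38)-(8.40)
print(fit.pvalues)                # p values
print(fit.rsquared, fit.fvalue)   # R^2 and the F statistic of Eq. (8.49)
```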
8.6 HYPOTHESIS TESTING IN A MULTIPLE REGRESSION: GENERAL COMMENTS

Although R² gives us an overall measure of goodness of fit of the estimated regression line, by itself R² does not tell us whether the estimated partial regression coefficients are statistically significant, that is, statistically different from zero. Some of them may be and some may not be. How do we find out?

To be specific, let us suppose we want to entertain the hypothesis that the age of the antique clock has no effect on its price. In other words, we want to test the null hypothesis H₀: B₂ = 0. How do we go about it? From our discussion of hypothesis testing for the two-variable model given in Chapter 7, in order to answer this question we need to find out the sampling distribution of b₂, the estimator of B₂. What is the sampling distribution of b₂? And the sampling distributions of b₁ and b₃?
In the two-variable case we saw that the OLS estimators, b₁ and b₂, are normally distributed if we are willing to assume that the error term u follows the normal distribution. Now in assumption A8.7 we have stated that even for multiple regression we will continue to assume that u is normally distributed with zero mean and constant variance σ². Given this and the other assumptions listed in Section 8.2, it can be proved that b₁, b₂, and b₃ each follow the normal distribution with means equal to B₁, B₂, and B₃, respectively, and the variances given by Eqs. (8.23), (8.25), and (8.27), respectively.

However, as in the two-variable case, if we replace the true but unobservable σ² by its unbiased estimator σ̂² given in Eq. (8.29), the OLS estimators follow the t distribution with (n − 3) d.f., not the normal distribution. That is,

t = (b₁ − B₁)/se(b₁), with (n − 3) d.f.    (8.38)
t = (b₂ − B₂)/se(b₂), with (n − 3) d.f.    (8.39)
t = (b₃ − B₃)/se(b₃), with (n − 3) d.f.    (8.40)

Notice that the d.f. are now (n − 3) because in computing the RSS, Σeₜ², and hence σ̂², we first need to estimate the intercept and the two partial slope coefficients; so we lose 3 d.f.
We know that by replacing σ² with σ̂² the OLS estimators follow the t distribution. Now we can use this information to establish confidence intervals as well as to test statistical hypotheses about the true partial regression coefficients. The actual mechanics in many ways resemble the two-variable case, which we now illustrate with an example.
8.7 TESTING HYPOTHESES ABOUT INDIVIDUAL PARTIAL REGRESSION COEFFICIENTS

Suppose in our illustrative example we hypothesize that

H₀: B₂ = 0  and  H₁: B₂ ≠ 0

That is, under the null hypothesis, the age of the antique clock has no effect whatsoever on its bid price, whereas under the alternative hypothesis, it is contended that age has some effect, positive or negative, on price. The alternative hypothesis is thus two-sided.
Given the preceding null hypothesis, we know that

t = (b₂ − B₂)/se(b₂) = b₂/se(b₂)  (Note: B₂ = 0)    (8.41)

follows the t distribution with (n − 3) = 29 d.f., since n = 32 in our example. From the regression results given in (8.37), we obtain

t = 12.7413/0.9123 = 13.9653    (8.42)

which has the t distribution with 29 d.f.
On the basis of the computed t value, do we reject the null hypothesis that the age of the antique clock has no effect on its bid price? To answer this question, we can either use the test of significance approach or the confidence interval approach, as we did for the two-variable regression.
The Test of Significance Approach

Recall that in the test of significance approach to hypothesis testing we develop a test statistic, find out its sampling distribution, choose a level of significance α, and determine the critical value(s) of the test statistic at the chosen level of significance. Then we compare the value of the test statistic obtained from the sample at hand with the critical value(s) and reject the null hypothesis if the computed value of the test statistic exceeds the critical value(s).¹⁰ Alternatively,

¹⁰If the test statistic has a negative value, we consider its absolute value and say that if the absolute value of the test statistic exceeds the critical value, we reject the null hypothesis.
we can find the p value of the test statistic and reject the null hypothesis if the p value is smaller than the chosen α value. The approach that we followed for the two-variable case also carries over to the multiple regression.

Returning to our illustrative example, we know that the test statistic in question is the t statistic, which follows the t distribution with (n − 3) d.f. Therefore, we use the t test of significance. The actual mechanics are now straightforward. Suppose we choose α = 0.05 or 5%. Since the alternative hypothesis is two-sided, we have to find the critical t value at α/2 = 2.5% (why?) for (n − 3) d.f., which in the present example is 29. Then from the t table we observe that for 29 d.f.,

P(−2.045 ≤ t ≤ 2.045) = 0.95    (8.43)

That is, the probability that a t value lies between the limits −2.045 and +2.045 (i.e., the critical values) is 95 percent.

From Eq. (8.42), we see that the computed t value under H₀: B₂ = 0 is approximately 14, which obviously exceeds the critical t value of 2.045. We therefore reject the null hypothesis and conclude that the age of an antique clock definitely has an influence on its bid price. This conclusion is also reinforced by the p value given in Eq. (8.37), which is practically zero. That is, if the null hypothesis that B₂ = 0 were true, our chances of obtaining a t value of about 14 or greater would be practically nil. Therefore, we can reject the null hypothesis more resoundingly on the basis of the p value than on a conventionally chosen α value of 1% or 5%.
One-Tail or Two-Tail t Test? Since, a priori, we expect the coefficient of the age variable to be positive, we should in fact use the one-tail t test here. The 5% critical t value for the one-tail test for 29 d.f. is 1.699. Since the computed t value of about 14 is still much greater than 1.699, we reject the null hypothesis and now conclude that the age of the antique clock positively impacts its bid price; the two-tail test, on the other hand, simply told us that the age of the antique clock could have a positive or negative impact on its bid price. Therefore, be careful about how you formulate your null and alternative hypotheses. Let theory be the guide in choosing these hypotheses.
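The critical values 2.045 (two-tail, α = 5%) and 1.699 (one-tail, 5%) for 29 d.f. can be read from a t table or computed, as in this short sketch:

```python
from scipy.stats import t

df = 29                                  # n - 3 with n = 32
two_tail = t.ppf(1 - 0.05 / 2, df)       # 2.045: cuts off 2.5% in each tail
one_tail = t.ppf(1 - 0.05, df)           # 1.699: cuts off 5% in one tail
print(round(two_tail, 3), round(one_tail, 3))

# p value of the observed t = 13.9653 against a two-sided alternative
p_value = 2 * (1 - t.cdf(13.9653, df))   # practically zero
print(p_value)
```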
The Confidence Interval Approach to Hypothesis Testing

The basics of the confidence interval approach to hypothesis testing have already been discussed in Chapter 7. Here we merely illustrate it with our numerical example. We showed previously that

P(−2.045 ≤ t ≤ 2.045) = 0.95

We also know that

t = (b₂ − B₂)/se(b₂)    (8.39)
If we substitute this value of t into Equation (8.43), we obtain

P[−2.045 ≤ (b₂ − B₂)/se(b₂) ≤ 2.045] = 0.95

which, after rearranging, becomes

P[b₂ − 2.045 se(b₂) ≤ B₂ ≤ b₂ + 2.045 se(b₂)] = 0.95    (8.44)

which is a 95% confidence interval for B₂ [cf. Eq. (7.26)]. Recall that under the confidence interval approach, if the confidence interval, which we call the acceptance region, includes the null-hypothesized value, we do not reject the null hypothesis. On the other hand, if the null-hypothesized value lies outside the confidence interval, that is, in the region of rejection, we can reject the null hypothesis. But always bear in mind that in making either decision we are taking a chance of being wrong α% (say, 5%) of the time.
For our illustrative example, Eq. (8.44) becomes

12.7413 − 2.045(0.9123) ≤ B₂ ≤ 12.7413 + 2.045(0.9123)

that is,

10.8757 ≤ B₂ ≤ 14.6069    (8.45)

which is a 95% confidence interval for the true B₂. Since this interval does not include the null-hypothesized value of zero, we can reject the null hypothesis. If we construct confidence intervals like expression (8.45), then 95 out of 100 such intervals will include the true B₂; but, as noted in Chapter 7, we cannot say that the probability is 95% that the particular interval (8.45) does or does not include the true B₂.
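The interval (8.45) is just arithmetic on the estimate, its standard error, and the critical t value; a sketch:

```python
from scipy.stats import t

b2, se_b2 = 12.7413, 0.9123     # from the regression results (8.37)
t_crit = t.ppf(0.975, 29)       # about 2.045 for 29 d.f.

lower = b2 - t_crit * se_b2     # Eq. (8.44), lower limit
upper = b2 + t_crit * se_b2     # upper limit
print(round(lower, 4), round(upper, 4))   # about (10.876, 14.607)

# Zero lies outside the interval, so H0: B2 = 0 is rejected at 5%.
print(lower <= 0 <= upper)      # False
```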
Needless to say, we can use the two approaches to hypothesis testing to test hypotheses about any other coefficient given in the regression results for our illustrative example. As you can see from the regression results, the variable number of bidders is also statistically significant (i.e., significantly different from zero) because the estimated t value of about 9.74 has a p value of almost zero. Remember that the lower the p value, the greater the evidence against the null hypothesis.
8.8 TESTING THE JOINT HYPOTHESIS THAT B₂ = B₃ = 0 OR R² = 0

For our illustrative example we saw that individually the partial slope coefficients b₂ and b₃ are statistically significant; that is, individually each partial slope coefficient is significantly different from zero. But now consider the following null hypothesis:

H₀: B₂ = B₃ = 0    (8.46)

This null hypothesis is a joint hypothesis that B₂ and B₃ are jointly or simultaneously (and not individually or singly) equal to zero. This hypothesis states that the two explanatory variables together have no influence on Y. This is the same as saying that

H₀: R² = 0    (8.47)

That is, the two explanatory variables explain zero percent of the variation in the dependent variable (recall the definition of R²). Therefore, the two sets of hypotheses (8.46) and (8.47) are equivalent; one implies the other. A test of either hypothesis is called a test of the overall significance of the estimated multiple regression; that is, whether Y is linearly related to both X₂ and X₃.
How do we test, say, the hypothesis given in Equation (8.46)? The temptation here is to state that since individually b₂ and b₃ are statistically different from zero in the present example, then jointly or collectively they also must be statistically different from zero; that is, we reject H₀ given in Eq. (8.46). In other words, since the age of the antique clock and the number of bidders at the auction each has a significant effect on the auction price, together they also must have a significant effect on the auction price. But we should be careful here, for, as we show more fully in Chapter 12 on multicollinearity, in practice in a multiple regression one or more variables may individually have no effect on the dependent variable but collectively they may have a significant impact on it. This means that the t-testing procedure discussed previously, although valid for testing the statistical significance of an individual regression coefficient, is not valid for testing the joint hypothesis.

How then do we test a hypothesis like hypothesis (8.46)? This can be done by using a technique known as analysis of variance (ANOVA). To see how this technique is employed, recall the following identity:

TSS = ESS + RSS    (8.32)
That is,

Σyₜ² = b₂Σyₜx₂ₜ + b₃Σyₜx₃ₜ + Σeₜ²    (8.48)¹¹

Equation (8.48) decomposes the TSS into two components, one explained by the (chosen) regression model (ESS) and the other not explained by the model (RSS). A study of these components of TSS is known as the analysis of variance (ANOVA) from the regression viewpoint.

As noted in Chapter 4, every sum of squares has associated with it its degrees of freedom (d.f.); that is, the number of independent observations on the basis of which the sum of squares is computed. Now each of the preceding sums of squares has these d.f.:

Sum of squares    d.f.
TSS               n − 1 (always. Why?)
RSS               n − 3 (three-variable model)
ESS               2 (three-variable model)*

*An easy way to find the d.f. for ESS is to subtract the d.f. for RSS from the d.f. for TSS.

Let us arrange all these sums of squares and their associated d.f. in a tabular form, known as the ANOVA table, as shown in Table 8-1.

¹¹This is Equation (8.35) written differently.
TABLE 8-1 ANOVA TABLE FOR THE THREE-VARIABLE REGRESSION

Source of variation       Sum of squares (SS)      d.f.     MSS = SS/d.f.
Due to regression (ESS)   b₂Σyₜx₂ₜ + b₃Σyₜx₃ₜ      2        ESS/2
Due to residual (RSS)     Σeₜ²                     n − 3    RSS/(n − 3)
Total (TSS)               Σyₜ²                     n − 1

Note: MSS = mean, or average, sum of squares.
Now given the assumptions of the CLRM (and assumption A8.7) and the null hypothesis H₀: B₂ = B₃ = 0, it can be shown that the variable

F = (variance explained by X₂ and X₃) / (unexplained variance)
  = [(b₂Σyₜx₂ₜ + b₃Σyₜx₃ₜ)/2] / [Σeₜ²/(n − 3)]    (8.49)

follows the F distribution with 2 and (n − 3) d.f. in the numerator and denominator, respectively. (See Chapter 4 for a general discussion of the F distribution and Chapter 5 for some applications.) In general, if the regression model has k explanatory variables including the intercept term, the F ratio has (k − 1) d.f. in the numerator and (n − k) d.f. in the denominator.¹²

How can we use the F ratio of Equation (8.49) to test the joint hypothesis that both X₂ and X₃ have no impact on Y? The answer is evident in Eq. (8.49). If the numerator of Eq. (8.49) is larger than its denominator—if the variance of Y

¹²A simple way to remember this is that the numerator d.f. of the F ratio is equal to the number of partial slope coefficients in the model, and the denominator d.f. is equal to n minus the total number of parameters estimated (i.e., partial slopes plus the intercept).
TABLE 8-2 ANOVA TABLE FOR THE CLOCK AUCTION PRICE EXAMPLE

Source of variation       Sum of squares (SS)    d.f.    MSS
Due to regression (ESS)   4,278,295.3            2       2,139,147.6
Due to residual (RSS)     525,462.2              29      18,119.386
Total (TSS)               4,803,757.5            31

F = 2,139,147.6/18,119.386 = 118.0585*

*Figures have been rounded.
explained by the regression (i.e., by X₂ and X₃) is larger than the variance not explained by the regression—the F value will be greater than 1. Therefore, as the variance explained by the X variables becomes increasingly larger relative to the unexplained variance, the F ratio will be increasingly larger, too. Thus, an increasingly large F value will be evidence against the null hypothesis that the two (or more) explanatory variables have no effect on Y.

Of course, this intuitive reasoning can be formalized in the usual framework of hypothesis testing. As shown in Chapter 4, Section 4.4, we compute F as given in Eq. (8.49) and compare it with the critical F value for 2 and (n − 3) d.f. at the chosen level of α, the probability of committing a type I error. As usual, if the computed F value exceeds the critical F value, we reject the null hypothesis that the impact of all explanatory variables is simultaneously equal to zero. If it does not exceed the critical F value, we do not reject the null hypothesis that the explanatory variables have no impact whatsoever on the dependent variable.
To illustrate the actual mechanics, let us return to our illustrative example. The numerical counterpart of Table 8-1 is given in Table 8-2.

The entries in this table are obtained from the Eviews computer output given in Appendix 8A.4.¹³ From this table and the computer output, we see that the estimated F value is 118.0585, or about 118. Under the null hypothesis that B₂ = B₃ = 0, and given the assumptions of the classical linear regression model (CLRM), we know that the computed F value follows the F distribution with 2 and 29 d.f. in the numerator and denominator, respectively. If the null hypothesis were true, what is the probability of our obtaining an F value of as much as 118 or greater for 2 and 29 d.f.? The p value of obtaining an F value of 118 or greater is 0.000000, which is practically zero. Hence, we can reject the null hypothesis that age and number of bidders together have no effect on the bid price of antique clocks.¹⁴

In our illustrative example it so happens that not only do we reject the null hypothesis that B₂ and B₃ are individually statistically insignificant, but we also

¹³Unlike other software packages, Eviews does not produce the ANOVA table, although it gives the F value. But it is very easy to construct this table, for Eviews gives TSS and RSS, from which ESS can be easily obtained.
¹⁴If you had chosen α = 1%, the critical F value for 2 and 30 (which is close to 29) d.f. is 5.39. The F value of 118 is obviously much greater than this critical value.
reject the hypothesis that collectively they are insignificant. However, this need not happen all the time. We will come across cases where not all explanatory variables individually have much impact on the dependent variable (i.e., some of the t values may be statistically insignificant) yet all of them collectively influence the dependent variable (i.e., the F test will reject the null hypothesis that all partial slope coefficients are simultaneously equal to zero). As we will see, this happens if we have the problem of multicollinearity, which we will discuss more in Chapter 12.
An Important Relationship between F and R²

There is an important relationship between the coefficient of determination R² and the F ratio used in ANOVA. This relationship is as follows:

F = [R²/(k − 1)] / [(1 − R²)/(n − k)]    (8.50)

where n = the number of observations and k = the number of explanatory variables including the intercept.

Equation (8.50) shows how F and R² are related. These two statistics vary directly. When R² = 0 (i.e., there is no relationship between Y and the X variables), F is zero ipso facto. The larger R² is, the greater the F value will be. In the limit, when R² = 1, the F value is infinite.
Thus the F test discussed earlier, which is a measure of the overall significance of the estimated regression line, is also a test of the significance of R²; that is, of whether R² is different from zero. In other words, testing the null hypothesis Eq. (8.46) is equivalent to testing the null hypothesis that (the population) R² is zero, as noted in Eq. (8.47).

One advantage of the F test expressed in terms of R² is the ease of computation. All we need to know is the R² value, which is routinely computed by most regression programs. Therefore, the overall F test of significance given in Eq. (8.49) can be recast in terms of R² as shown in Eq. (8.50), and the ANOVA Table 8-1 can be equivalently expressed as Table 8-3.
TABLE 8-3 ANOVA TABLE IN TERMS OF R²

Source of variation        Sum of squares (SS)    d.f.*    MSS
Due to regression (ESS)    R²(Σyₜ²)               2        R²(Σyₜ²)/2
Due to residuals (RSS)     (1 − R²)(Σyₜ²)         n − 3    (1 − R²)(Σyₜ²)/(n − 3)
Total (TSS)                Σyₜ²                   n − 1

Note: In computing the F value, we do not need to multiply R² and (1 − R²) by Σyₜ², since this term drops out, as can be seen from Eq. (8.50).
*In the k-variable model these d.f. will be (k − 1) and (n − k), respectively.
For our illustrative example, R² = 0.8906. Therefore, the F ratio of Equation (8.50) becomes

F = (0.8906/2) / [(1 − 0.8906)/29] = 118.12    (8.51)

which is about the same F as shown in Table 8-2, except for rounding errors. It is left for you to set up the ANOVA table for our illustrative example in the manner of Table 8-3.
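Both routes to the F statistic are a one-liner each; this sketch recomputes Eq. (8.51) from R² and cross-checks it against the ANOVA sums of squares of Table 8-2, together with its p value:

```python
from scipy.stats import f

n, k = 32, 3                 # observations; parameters incl. intercept
r2 = 0.8906

# Route 1: Eq. (8.50), F in terms of R^2
F_from_r2 = (r2 / (k - 1)) / ((1 - r2) / (n - k))

# Route 2: Eq. (8.49), F from the ANOVA sums of squares (Table 8-2)
F_from_anova = (4278295.3 / 2) / (525462.2 / 29)

p_value = f.sf(F_from_anova, k - 1, n - k)   # survival function = 1 - cdf
print(round(F_from_r2, 2), round(F_from_anova, 4), p_value)
```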
8.9 TWO-VARIABLE REGRESSION IN THE CONTEXT OF MULTIPLE REGRESSION: INTRODUCTION TO SPECIFICATION BIAS

Let us return to our example. In Example 6.5, we regressed auction price on the age of the antique clock and on the number of bidders separately, as shown in Equations (6.27) and (6.28). These equations are reproduced here with the usual regression output:

Ŷₜ = −191.6662 + 10.4856 Ageₜ
se = (264.4393)   (1.7937)    (8.52)
t  = (−0.7248)    (5.8457)    r² = 0.5325;  F = 34.1723

Ŷₜ = 807.9501 + 54.5724 Biddersₜ
se = (231.9501)   (23.2665)    (8.53)
t  = (3.4832)     (2.3455)     r² = 0.1549;  F = 5.5017
If we compare these regressions with the results of the multiple regression given in (8.37), we see several differences:

1. The slope values in Equations (8.52) and (8.53) are different from those given in the multiple regression (8.37), especially that of the number of bidders variable.
2. The intercept values in the three regressions are also different.
3. The R² value in the multiple regression is quite different from the r² values given in the two bivariate regressions.

As we will show, some of these differences are statistically significant and some others may not be.
Why the differences in the results of the two regressions? Remember that in Eq. (8.37), while deriving the impact of the age of the antique clock on the auction price, we held the number of bidders constant, whereas in Eq. (8.52) we simply neglected the number of bidders. Put differently, in Eq. (8.37) the effect of a clock's age on the auction price is net of the effect, or influence, of the number of bidders, whereas in Eq. (8.52) the effect of the number of bidders has not been netted out. Thus, the coefficient of the age variable in Eq. (8.52) reflects the gross effect—the direct effect of age as well as the indirect effect of the number of bidders.
This difference between the results of regressions (8.37) and (8.52) shows very
nicely the meaning of the "partial" regression coefficient.
We saw in our discussion of regression (8.37) that both the age of the clock
and the number of bidders variables were individually as well as collectively important influences on the auction price. Therefore, by omitting the number of
bidders variable from regression (8.52) we have committed what is known as a
(model) specification bias or specification error, more specifically, the specification error of omitting a relevant variable from the model. Similarly, by omitting the age of the clock from regression (8.53), we also have committed a specification error.
Although we will examine the topic of specification errors in Chapter 11,
what is important to note here is that you should be very careful in developing a regression model for empirical purposes. Take whatever help you can
from the underlying theory and/or prior empirical work in developing the
model. And once you choose a model, do not drop variables from the model
arbitrarily.
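The gross-versus-net distinction can be seen numerically in a small simulation. The sketch below is ours, not from the text, and every number in it is invented; it shows how the short regression's slope absorbs the influence of an omitted, correlated regressor:

```python
# Sketch: omitted-variable (specification) bias. Leaving x3 out of the model
# makes the slope on x2 pick up the gross effect: the direct effect of x2 plus
# the indirect effect that works through the correlated x3.
import numpy as np

rng = np.random.default_rng(42)
n = 5000
x2 = rng.normal(size=n)                 # think: age of the clock
x3 = 0.5 * x2 + rng.normal(size=n)      # think: bidders, correlated with age
y = 10 + 2.0 * x2 + 3.0 * x3 + rng.normal(size=n)

X_long = np.column_stack([np.ones(n), x2, x3])
X_short = np.column_stack([np.ones(n), x2])
b_long, *_ = np.linalg.lstsq(X_long, y, rcond=None)
b_short, *_ = np.linalg.lstsq(X_short, y, rcond=None)

print(b_long[1])    # close to 2.0: the net (partial) effect of x2
print(b_short[1])   # close to 2.0 + 3.0 * 0.5 = 3.5: the gross effect
```

The bias in the short regression equals the omitted coefficient times the slope from regressing x3 on x2, which is why the gross effect settles near 3.5 here.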
8.10 COMPARING TWO R² VALUES: THE ADJUSTED R²
By examining the r² values of our two-variable [Eq. (8.52) or Eq. (8.53)] and
three-variable [Eq. (8.37)] regressions for our illustrative example, you will notice that the r² value of the former [0.5325 for Eq. (8.52) or 0.1549 for Eq. (8.53)]
is smaller than that of the latter (0.8906). Is this always the case? Yes! An important property of R² is that the larger the number of explanatory variables in a
model, the higher the R² will be. It would then seem that if we want to explain
a substantial amount of the variation in a dependent variable, we merely have
to go on adding more explanatory variables!
However, do not take this "advice" too seriously, because the definition of
R² = ESS/TSS does not take into account the d.f. Note that in a k-variable model
including the intercept term, the d.f. for ESS is (k − 1). Thus, if you have a model
with 5 explanatory variables including the intercept, the d.f. associated with
ESS will be 4, whereas if you had a model with 10 explanatory variables including the intercept, the d.f. for the ESS will be 9. But the conventional R² formula
does not take into account the differing d.f. in the various models. Note that the
d.f. for TSS is always (n − 1). (Why?) Therefore, comparing the R² values of two
models with the same dependent variable but with differing numbers of explanatory
variables is essentially like comparing apples and oranges.
Thus, what we need is a measure of goodness of fit that is adjusted for (i.e.,
takes into account explicitly) the number of explanatory variables in the model.
Such a measure has been devised and is known as the adjusted R², denoted by
the symbol R̄². This R̄² can be derived from the conventional R² (see Appendix
8A.3) as follows:

R̄² = 1 − (1 − R²) (n − 1)/(n − k)        (8.54)
Note that the R² we have considered previously is also known as the unadjusted
R², for obvious reasons.
The features of the adjusted R² are:

1. If k > 1, R̄² ≤ R²; that is, as the number of explanatory variables
increases in a model, the adjusted R² becomes increasingly smaller than
the unadjusted R². There seems to be a "penalty" involved in adding
more explanatory variables to a regression model.
2. Although the unadjusted R² is always positive, the adjusted R² can on
occasion turn out to be negative. For example, in a regression model
involving k = 3 and n = 30, if an R² is found to be 0.06, R̄² can be negative
(−0.0096).
At present, most computer regression packages compute both the adjusted
and unadjusted R² values. This is a good practice, for the adjusted R² will enable us to compare two regressions that have the same dependent variable but a
different number of explanatory variables. Even when we are not comparing
two regression models, it is a good practice to find the adjusted R² value
because it explicitly takes into account the number of variables included in a
model.
For our illustrative example, you should verify that the adjusted R² value is
0.8830, which, as expected, is smaller than the unadjusted R² value of 0.8906.
The adjusted R² values for regressions (8.52) and (8.53) are 0.5169 and 0.1268,
respectively, which are slightly lower than the corresponding unadjusted r²
values.
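As a quick numerical check, a one-line function implementing Eq. (8.54) reproduces these figures. This sketch is ours, not from the text:

```python
# Sketch: adjusted R² from Eq. (8.54), R-bar² = 1 - (1 - R²)(n - 1)/(n - k).
def adjusted_r2(r2, n, k):
    return 1 - (1 - r2) * (n - 1) / (n - k)

print(adjusted_r2(0.8906, 32, 3))   # about 0.8830, as quoted for Eq. (8.37)
print(adjusted_r2(0.5325, 32, 2))   # about 0.517, Eq. (8.52)
print(adjusted_r2(0.1549, 32, 2))   # about 0.127, Eq. (8.53)
print(adjusted_r2(0.06, 30, 3))     # about -0.0096, the negative case above
```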
8.11 WHEN TO ADD AN ADDITIONAL EXPLANATORY
VARIABLE TO A MODEL
In practice, in order to explain a particular phenomenon, we are often faced
with the problem of deciding among several competing explanatory variables.
The common practice is to add variables as long as the adjusted R² increases
(even though its numerical value may be smaller than the unadjusted R²). But
when does adjusted R² increase? It can be shown that R̄² will increase if the |t|
(absolute t) value of the coefficient of the added variable is larger than 1, where the t
value is computed under the null hypothesis that the population value of the said coefficient is zero.
To see this all clearly, let us first regress auction price on a constant only, then
on a constant and the age of the clock, and then on a constant, the age of the clock,
and the number of bidders. The results are given in Table 8-4.
As we will see in Chapter 9, if two regressions have different dependent variables, we cannot
compare their R² values directly, adjusted or unadjusted.
Whether or not a particular t value is significant, the adjusted R² will increase so long as the |t|
of the coefficient of the added variable is greater than 1.
TABLE 8-4  A COMPARISON OF FOUR MODELS OF ANTIQUE CLOCK AUCTION PRICES

Dependent variable   Intercept      Age         # of Bidders   R²       R̄²       F
Auction price        1328.094       —           —              0.000    0.000    —          (1)
                     (19.0850)
Auction price        191.6662       10.4856     —              0.5325   0.5169   34.1723    (2)
                     (0.7248)       (5.8457)
Auction price        807.9501       —           54.5724        0.1549   0.1268   5.5017     (3)
                     (3.4962)                   (2.3455)
Auction price        −1336.049      12.7413     85.7640        0.8906   0.8830   118.0585   (4)
                     (−7.6226)      (13.9659)   (9.7497)

Note: Figures in the parentheses are the estimated t values under the null hypothesis that the corresponding
population values are zero.
Some interesting facts stand out in this exercise:

1. When we regress auction price on the intercept only, the R², R̄², and F values are all zero, as we would expect. But what does the intercept value
represent here? It is nothing but the (sample) mean value of auction
price. One way to check on this is to look at Eq. (6.16). If there is no X
variable in this equation, the intercept is equal to the mean value of the
dependent variable.
2. When we regress auction price on a constant and the age of the antique
clock, we see that the t value of the age variable is not only greater than
1, but it is also statistically significant. Unsurprisingly, both R² and R̄²
values increase (although the latter is somewhat smaller than the
former). But notice an interesting fact. If you square the t value of 5.8457,
we get (5.8457)² = 34.1722, which is about the same as the F value of
34.1723 shown in Table 8-4. Is this surprising? No, because in Equation
(4.15) we stated that

t²ₖ = F₁,ₖ        (8.55) = (4.15)

That is, the square of the t statistic with k d.f. is equal to the F statistic
with 1 d.f. in the numerator and k d.f. in the denominator. In our example, k = 30 [32 observations − 2, the two coefficients estimated in Model
(2)]. The numerator d.f. is 1, because we have only one explanatory variable in this model.
3. When we regress auction price on a constant and the number of bidders,
we see that the t value of the latter is 2.3455. If you square this value, you
will get (2.3455)² = 5.5013, which is about the same as the F value shown in the
preceding table, which again verifies Eq. (8.55). Since the t value is greater
than 1, both R² and R̄² values have increased. The computed t value is also
statistically significant, suggesting that the number of bidders variable
should be added to model (1). A similar conclusion holds for model (2).
4. How do we decide if it is worth adding both age and number of bidders together to model (1)? We have already answered this question
with the help of the ANOVA technique and the attendant F test. In
Table 8-2 we showed that one could reject the hypothesis that B₂ = B₃ = 0,
that is, that the two explanatory variables together have no impact on the
auction bid price.
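The t² = F identity of Eq. (8.55) is easy to confirm from the table's own numbers; here is a two-line sketch of ours:

```python
# Sketch: verifying t² = F, Eq. (8.55), for the bivariate models in Table 8-4.
for t, f in [(5.8457, 34.1723), (2.3455, 5.5017)]:
    print(f"t = {t}: t squared = {t**2:.4f}, reported F = {f}")
```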
8.12 RESTRICTED LEAST SQUARES
Let us take another look at the regressions given in Table 8-4. There we saw the
consequences of omitting relevant variables from a regression model. Thus, in
regression (1) shown in this table we regressed antique clock auction price on the
intercept only, which gave an R² value of 0, which is not surprising. Then in
regression (4) we regressed auction price on the age of the antique clock as well
as on the number of bidders present at the auction, which gave an R² value of
0.8906. On the basis of the F test we concluded that there was a specification error
and that both the explanatory variables should be added to the model.
Let us call regression (1) the restricted model because it implicitly assumes that
the coefficients of the age of the clock and the number of bidders are zero; that
is, these variables do not belong in the model (i.e., B₂ = B₃ = 0). Let us call
regression (4) the unrestricted model because it includes all the relevant variables.
Since (1) is a restricted model, when we estimate it by OLS, we call it restricted
least squares (RLS). Since (4) is an unrestricted model, when we estimate it by
OLS, we call it unrestricted least squares (URLS). All the models we have estimated thus far were essentially URLS, for we assumed that the model being
estimated is correctly specified and that we have included all the relevant variables in the model. In Chapter 11 we will see the consequences of violating this
assumption.
The question now is: How do we decide between RLS and URLS? That is,
how do we find out if the restrictions imposed by a model, such as (1) in the present instance, are valid? This question can be answered by the F test. For this
purpose, let R²ᵣ denote the R² value obtained from the restricted model and R²ᵤᵣ
the R² value obtained from the unrestricted model. Now assuming that the
error term uᵢ is normally distributed, it can be shown that

F = [(R²ᵤᵣ − R²ᵣ)/m] / [(1 − R²ᵤᵣ)/(n − k)]        (8.56)

follows the F distribution with m and (n − k) d.f. in the numerator and denominator, respectively, where R²ᵣ = R² obtained from the restricted regression,
Suppose you have a model with four explanatory variables. Initially you include only two of
these variables, but then you want to find out if it is worth adding two more explanatory variables.
This can be handled by an extension of the F test. For details, see Damodar N. Gujarati, Basic
Econometrics, 4th ed., McGraw-Hill, New York, 2003, pp. 260–264.
R²ᵤᵣ = R² obtained from the unrestricted regression, m = number of restrictions
imposed by the restricted regression (two in our example), n = number of
observations in the sample, and k = number of parameters estimated in the unrestricted regression (including the intercept). The null hypothesis tested here is
that the restrictions imposed by the restricted model are valid. If the F value
estimated from Equation (8.56) exceeds the critical F value at the chosen level of
significance, we reject the restricted regression. That is, in this situation, the
restrictions imposed by the (restricted) model are not valid.
Returning to our antique clock auction price example, putting the appropriate values in Eq. (8.56) from Table 8-4, we obtain:

F = [(0.8906 − 0)/2] / [(1 − 0.8906)/(32 − 3)] = 0.4453/0.00377 ≈ 118.12        (8.57)

The probability of such an F value is extremely small. Therefore, we reject the
restricted regression. More positively, the age of the antique clock as well as the number
of bidders at the auction both have a statistically significant impact on the auction price.
The formula (8.56) is of general application. The only precaution to be taken
in its application is that in comparing the restricted and unrestricted regressions, the dependent variable must be in the same form. If it is not, we have
to make them comparable using the method discussed in Chapter 9 (see
Problem 9.16) or use the alternative that is discussed in Exercise 8.20.
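Eq. (8.56) translates directly into code. The sketch below is ours, with an invented function name, and reproduces the test of Eq. (8.57):

```python
# Sketch: F test for the validity of restrictions, Eq. (8.56).
from scipy import stats

def restriction_f_test(r2_r, r2_ur, m, n, k):
    f = ((r2_ur - r2_r) / m) / ((1 - r2_ur) / (n - k))
    p = stats.f.sf(f, m, n - k)       # H0: the restrictions are valid
    return f, p

# Restricted model (1): R² = 0; unrestricted model (4): R² = 0.8906;
# m = 2 restrictions, n = 32 observations, k = 3 parameters.
f, p = restriction_f_test(0.0, 0.8906, m=2, n=32, k=3)
print(f"F = {f:.2f}, p = {p:.2e}")    # F of about 118: reject the restrictions
```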
8.13 ILLUSTRATIVE EXAMPLES
To conclude this chapter, we consider several examples involving multiple
regressions. Our objective here is to show you how multiple regression models
are used in a variety of applications.
Example 8.1. Does tax policy affect corporate capital structure?

To find out the extent to which tax policy has been responsible for the recent
trend in U.S. manufacturing toward increasing use of debt capital in lieu of equity capital, that is, toward an increasing debt/equity ratio (called leverage in
the financial literature), Pozdena estimated the following regression model:

Yₜ = B₁ + B₂X₂ₜ + B₃X₃ₜ + B₄X₄ₜ + B₅X₅ₜ + B₆X₆ₜ + uₜ        (8.58)

where Y = leverage (the debt/equity ratio)
X₂ = the corporate tax rate
X₃ = the personal tax rate
X₄ = the capital gains tax rate
X₅ = nondebt-related tax shields
X₆ = the inflation rate
Randall Johnston Pozdena, "Tax Policy and Corporate Capital Structure," Economic Review,
Federal Reserve Bank of San Francisco, Fall 1987, pp. 37–51.
TABLE 8-5  LEVERAGE IN MANUFACTURING CORPORATIONS, 1935–1982

Explanatory variable         Coefficient (t value in parentheses)
Corporate tax rate           2.4
                             (10.5)
Personal tax rate            −1.2
                             (−4.4)
Capital gains tax rate       0.3
                             (1.3)
Nondebt tax shield           −2.4
                             (−4.8)
Inflation rate               1.4
                             (3.0)

n = 48 (number of observations)
R̄² = 0.87

Source: Randall Johnston Pozdena, "Tax Policy and Corporate Capital
Structure," Economic Review, Federal Reserve Bank of San Francisco, Fall
1987, Table 1, p. 45 (adapted).
Notes: 1. The author does not present the estimated intercept.
2. The adjusted R² is calculated using Eq. (8.54).
3. The standard errors of the various coefficients can be obtained
by dividing the coefficient values by the corresponding t values (e.g., 2.4/10.5 =
0.2286 is the se of the corporate tax rate coefficient).
Economic theory suggests that coefficients B₂, B₄, and B₆ will be positive and
coefficients B₃ and B₅ will be negative. Based on the data for U.S. manufacturing corporations for the years 1935 to 1982, Pozdena obtained the OLS results
that are presented in tabular form (Table 8-5) rather than in the usual format [e.g.,
Eq. (8.7)]. (Results are sometimes presented in this form for ease of reading.)
Discussion of Regression Results

The first fact to note about the preceding regression results is that all the coefficients have signs according to prior expectations. For instance, the corporate tax
rate has a positive effect on leverage. Holding other things the same, as the corporate tax rate goes up by one percentage point, on the average, the leverage
ratio (i.e., the debt/equity ratio) goes up by 2.4 percentage points. Likewise, if
the inflation rate goes up by one percentage point, on the average, leverage goes
up by 1.4 percentage points, other things remaining the same. (Question: Why
would you expect a positive relation between leverage and inflation?) Other
partial regression coefficients should be interpreted similarly.
Since the t values are presented underneath each partial regression coefficient
under the null hypothesis that each population partial regression coefficient is

See Pozdena's article, cited earlier, for the theoretical discussion of the expected signs of the various
coefficients. In the United States the interest paid on debt capital is tax deductible, whereas the income paid as dividends is not. This is one reason that corporations may prefer debt to equity capital.
individually equal to zero, we can easily test whether such a null hypothesis
stands up against the (two-sided) alternative hypothesis that each true population coefficient is different from zero. Hence, we use the two-tail test. The d.f.
in this example are 42, which are obtained by subtracting from n (= 48) the
number of parameters estimated, which are 6 in the present instance. (Note: The
intercept value is not presented in Table 8-5, although it was estimated.) If
we choose α = 0.05 or 5%, the two-tail critical t value is about 2.021 for 40 d.f.
(Note: This is good enough for present purposes since the t table does not give
the precise t value for 42 d.f.) If α is fixed at 0.01 or the 1% level, the critical t value
for 40 d.f. is 2.704 (two-tail). Looking at the t values presented in Table 8-5,
we see that each partial regression coefficient, except that of the capital gains
tax variable, is statistically significantly different from zero at the 1% level of
significance. The coefficient of the capital gains tax variable is not significant at
either the 1% or the 5% level. Therefore, except for this variable, we can reject
the individual null hypothesis that each partial regression coefficient is zero. In
other words, all but one of the explanatory variables individually have an impact
on the debt/equity ratio. In passing, note that if an estimated coefficient is statistically significant at the 1% level, it is also significant at the 5% level, but the converse
is not true.
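The critical values quoted above can be generated rather than read from a table. The following sketch is ours; the dictionary uses the Table 8-5 t values as we read them, so treat the entries as assumptions:

```python
# Sketch: two-tail t tests for the Table 8-5 coefficients with 42 d.f.
from scipy import stats

t_values = {"corporate tax": 10.5, "personal tax": -4.4,
            "capital gains tax": 1.3, "nondebt shield": -4.8,
            "inflation": 3.0}
df = 42                                    # 48 observations - 6 parameters
for alpha in (0.05, 0.01):
    crit = stats.t.ppf(1 - alpha / 2, df)  # exact two-tail critical value
    sig = [k for k, t in t_values.items() if abs(t) > crit]
    print(f"alpha = {alpha}: critical t = {crit:.3f}; significant: {sig}")
```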
What about the overall significance of the estimated regression line? That is,
do we reject the null hypothesis that all partial slopes are simultaneously equal to
zero or, equivalently, is R² = 0? This hypothesis can be easily tested by using
Eq. (8.50), which in the present case (using the R̄² value of 0.87 from Table 8-5) gives

F = (0.87/5) / [(1 − 0.87)/42] ≈ 56.22        (8.59)

This F value has an F distribution with 5 and 42 d.f. If α is set at 0.05, the F table
(Appendix A, Table A-3) shows that for 5 and 40 d.f. (the table has no exact
value of 42 d.f. in the denominator), the critical F value is 2.45. The corresponding value at α = 0.01 is 3.51. The computed F of about 56 far exceeds either of these
critical F values. Therefore, we reject the null hypothesis that all partial slopes
are simultaneously equal to zero or, alternatively, that R² = 0. Collectively, all five
explanatory variables influence the dependent variable. Individually, however,
as we have seen, only four variables have an impact on the dependent variable,
the debt/equity ratio. Example 8.1 again underscores the point made earlier
that the (individual) t test and the (joint) F test are quite different.
In the two-variable linear regression model, as noted before, t²ₖ = F₁,ₖ; that is, the square of a t
value with k d.f. is equal to an F value with 1 d.f. in the numerator and k d.f. in the denominator.
Example 8.2. The demand for imports in Jamaica

To explain the demand for imports in Jamaica, J. Gafar obtained the following regression based on annual data for 19 years:

Ŷₜ = −58.9 + 0.20X₂ₜ − 0.10X₃ₜ
se =         (0.0092)   (0.084)        R² = 0.96        (8.60)
t  =         (21.74)    (−1.19)        R̄² = 0.955

where Y = quantity of imports
X₂ = personal consumption expenditure
X₃ = import price/domestic price

Economic theory would suggest a positive relationship between Y and X₂ and
a negative relationship between Y and X₃, which turns out to be the case.
Individually the coefficient of X₂ is statistically significant but that of X₃ is not at,
say, the 5% level. But since the absolute t value of X₃ is greater than 1, R̄² for this
example will drop if X₃ is dropped from the model. (Why?) Together, X₂ and X₃
explain about 96 percent of the variation in the quantity of imports into Jamaica.
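The reported R̄² follows from Eq. (8.54). A one-line check, ours, assuming n = 19 observations and k = 3 estimated parameters:

```python
# Sketch: Eq. (8.54) applied to the import-demand regression (8.60).
print(1 - (1 - 0.96) * (19 - 1) / (19 - 3))   # 0.955, the reported adjusted R²
```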
Example 8.3. The demand for alcoholic beverages in the United Kingdom

To explain the demand for alcoholic beverages in the United Kingdom, T.
McGuinness estimated the following regression based on annual data for
20 years:

Ŷₜ = −0.014 − 0.354X₂ₜ + 0.0018X₃ₜ + 0.657X₄ₜ + 0.0059X₅ₜ
se =  (0.012)  (0.2688)    (0.0005)     (0.266)     (0.0034)        (8.61)
t  =  (−1.16)  (−1.32)     (3.39)       (2.47)      (1.73)
R² = 0.689

where Y = the annual change in pure alcohol consumption per adult
X₂ = the annual change in the real price index of alcoholic drinks
X₃ = the annual change in the real disposable income per person
X₄ = the annual change in the number of licensed premises divided by
the adult population
X₅ = the annual change in real advertising expenditure on alcoholic
drinks per adult

Theory would suggest that all but the variable X₂ will be positively related to Y. This is borne out by the results, although each coefficient is not individually
J. Gafar, "Devaluation and the Balance of Payments Adjustment in a Developing Economy: An
Analysis Relating to Jamaica," Applied Economics, vol. 13, 1981, pp. 151–168. Notations were
adapted. Adjusted R² computed.
T. McGuinness, "An Econometric Analysis of Total Demand for Alcoholic Beverages in the
United Kingdom," Journal of Industrial Economics, vol. 29, 1980, pp. 85–109. Notations were adapted.
statistically significant. For 15 d.f. (Why?), the 5% critical t value is 1.753
(one-tail) and 2.131 (two-tail). Consider the coefficient of X₅, the change in
advertising expenditure. Since advertising expenditure and the demand
for alcoholic beverages are expected to be positively related (otherwise, it is bad news
for the advertising industry), we can entertain the hypothesis that H₀: B₅ = 0
vs. H₁: B₅ > 0, and therefore can use the one-tail t test. The computed t value
of 1.73 is very close to being significant at the 5% level.
It is left as an exercise for you to compute the F value for this example to
test the hypothesis that all partial slope coefficients are simultaneously equal
to zero.
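Incidentally, the critical t values quoted above for 15 d.f. are easy to confirm directly; this sketch is ours and assumes scipy is available:

```python
# Sketch: critical t values for 15 d.f. used in Example 8.3.
from scipy import stats
print(stats.t.ppf(0.95, 15))    # about 1.753, 5% one-tail
print(stats.t.ppf(0.975, 15))   # about 2.131, 5% two-tail
```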
Example 8.4. Civilian labor force participation rate, unemployment rate,
and average hourly earnings revisited

In Chapter 1 we presented regression (1.5) without discussing the statistical
significance of the results. Now we have the necessary tools to do that. The
complete regression results are as follows:

CLFPRₜ = 80.9018 − 0.6713 CUNRₜ − 1.4042 AHE82ₜ
se      = (4.7561)   (0.0827)        (0.6086)
t       = (17.0096)  (−8.1159)       (−2.3072)        (8.62)
p value = (0.0000)*  (0.0000)*       (0.0319)
R² = 0.7727; R̄² = 0.7500; F = 34.0734

As these results show, each of the estimated regression coefficients is individually statistically highly significant, because the p values are so small.
That is, each coefficient is significantly different from zero. Collectively, both
CUNR and AHE82 are also highly statistically significant, because the p value
of the computed F value (for 2 and 20 d.f.) of about 34 is extremely low.
As expected, the civilian unemployment rate has a negative effect on
the civilian labor force participation rate, suggesting that perhaps the
discouraged-worker effect dominates the added-worker hypothesis. The
theoretical reasoning behind this has already been explained in Chapter 1.
The negative value of AHE82 suggests that perhaps the income effect dominates the substitution effect.

*Denotes an extremely small value.
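The quoted p values can be recovered from the t statistics and their 20 d.f. The following sketch is ours:

```python
# Sketch: two-tail p values for the t statistics in Eq. (8.62), 20 d.f.
from scipy import stats

for t in (17.0096, -8.1159, -2.3072):
    p = 2 * stats.t.sf(abs(t), df=20)
    print(f"t = {t:>8}: p = {p:.4f}")   # the last is about 0.032, per the text
```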
8.14 SUMMARY

In this chapter we considered the simplest of the multiple regression models,
namely, the three-variable linear regression model: one dependent variable and
two explanatory variables. Although in many ways a straightforward extension
of the two-variable linear regression model, the three-variable model introduced
several new concepts, such as partial regression coefficients, adjusted and unadjusted multiple coefficient of determination, and multicollinearity.
Insofar as estimation of the parameters of the multiple regression coefficients is concerned, we still worked within the framework of the classical linear
regression model and used the method of ordinary least squares (OLS). The
OLS estimators of multiple regression, like those of the two-variable model, possess
several desirable statistical properties summed up in the Gauss-Markov property of best linear unbiasedness (BLUE).
With the assumption that the disturbance term follows the normal distribution with zero mean and constant variance σ², we saw that, as in the two-variable case, each estimated coefficient in the multiple regression follows the
normal distribution with a mean equal to the true population value and the
variances given by the formulas developed in the text. Unfortunately, in practice, σ² is not known and has to be estimated. The OLS estimator of this
unknown variance is σ̂². But if we replace σ² by σ̂², then, as in the two-variable
case, each estimated coefficient of the multiple regression follows the t distribution, not the normal distribution.
The knowledge that each multiple regression coefficient follows the t distribution with d.f. equal to (n − k), where k is the number of parameters estimated (including the intercept), means we can use the t distribution to test statistical hypotheses about each multiple regression coefficient individually. This can
be done on the basis of either the t test of significance or the confidence interval
based on the t distribution. In this respect, the multiple regression model does not
differ much from the two-variable model, except that proper allowance must be
made for the d.f., which now depend on the number of parameters estimated.
However, when testing the hypothesis that all partial slope coefficients are
simultaneously equal to zero, the individual testing referred to earlier is of no
help. Here we should use the analysis of variance (ANOVA) technique and the
attendant F test. Incidentally, testing that all partial slope coefficients are simultaneously equal to zero is the same as testing that the multiple coefficient of
determination, R², is equal to zero. Therefore, the F test can also be used to test
this latter but equivalent hypothesis.
We also discussed the question of when to add a variable or a group of
variables to a model, using either the t test or the F test. In this context we also
discussed the method of restricted least squares.
All the concepts introduced in this chapter have been illustrated by
numerical examples and by concrete economic applications.
KEY TERMS AND CONCEPTS

The key terms and concepts introduced in this chapter are:

Multiple regression model
Partial regression coefficients; partial slope coefficients
Multicollinearity
Collinearity; exact linear relationship; high or near-perfect collinearity
Multiple coefficient of determination, R²
Coefficient of multiple correlation, R
Individual hypothesis testing
Joint hypothesis testing or test of overall significance of estimated multiple regression:
  a) analysis of variance (ANOVA)
  b) F test
Model specification bias (specification error)
Adjusted R² (R̄²)
Restricted least squares (RLS)
Unrestricted least squares (URLS)
Relationship between t and F tests
QUESTIONS

8.1. Explain carefully the meaning of
a. Partial regression coefficient
b. Coefficient of multiple determination, R²
c. Perfect collinearity
d. Perfect multicollinearity
e. Individual hypothesis testing
f. Joint hypothesis testing
g. Adjusted R²
8.2. Explain step by step the procedure involved in
a. Testing the statistical significance of a single multiple regression coefficient.
b. Testing the statistical significance of all partial slope coefficients.
8.3. State with brief reasons whether the following statements are true (T), false (F),
or uncertain (U).
a. The adjusted and unadjusted R²s are identical only when the unadjusted R²
is equal to 1.
b. The way to determine whether a group of explanatory variables exerts significant influence on the dependent variable is to see if any of the explanatory variables has a significant t statistic; if not, they are statistically insignificant as a group.
c. When R² = 0, F = 0, and when R² = 1, F is infinite.
d. When the d.f. exceed 120, the 5% critical t value (two-tail) and the 5% critical Z (standard normal) value are identical, namely, 1.96.
e. In the model Yᵢ = B₁ + B₂X₂ᵢ + B₃X₃ᵢ + uᵢ, if X₂ and X₃ are negatively correlated in the sample and B₃ > 0, omitting X₃ from the model will bias b₁₂
downward [i.e., E(b₁₂) < B₂], where b₁₂ is the slope coefficient in the regression of Y on X₂ alone.
f. When we say that an estimated regression coefficient is statistically significant, we mean that it is statistically different from 1.
g. To compute a critical t value, we need to know only the d.f.
h. By the overall significance of a multiple regression we mean the statistical
significance of any single variable included in the model.
i. Insofar as estimation and hypothesis testing are concerned, there is no difference between simple regression and multiple regression.
j. The d.f. of the total sum of squares (TSS) are always (n − 1) regardless of the
number of explanatory variables included in the model.
*Optional.
8.4. What is the value of σ̂² in each of the following cases?
a. Σe²ᵢ = 880, n = 25, k = 4 (including intercept)
b. Σe²ᵢ = 1220, n = 14, k = 3 (excluding intercept)
8.5. Find the critical t value(s) in the following situations:

Degrees of freedom (d.f.)    Level of significance (α), %    H₁
 2                            5                               Two-sided
20                            1                               Right
30                            5                               Left
200                           5                               Two-tail
8.6. Find the critical F values for the following combinations:

Numerator d.f.    Denominator d.f.    Level of significance (α), %
5                  8                   5
4                  9                   1
4                  200                 5
PROBLEMS

8.7. You are given the following data:

Y     X₂     X₃
1      1      2
—      8      4
8      3     −3

Based on these data, estimate the following regressions:
a. Yᵢ = A₁ + A₂X₂ᵢ + u₁ᵢ
b. Yᵢ = C₁ + C₃X₃ᵢ + u₂ᵢ
c. Yᵢ = B₁ + B₂X₂ᵢ + B₃X₃ᵢ + uᵢ
Note: Do not worry about estimating the standard errors.
d. Is A₂ = B₂? Why or why not?
e. Is C₃ = B₃? Why or why not?
f. What general conclusion can you draw from this exercise?
8.8. You are given the following data based on 15 observations:

Ȳ = 367.693;  X̄₂ = 402.760;  X̄₃ = 8.0;  Σy²ᵢ = 66,042.269
Σx²₂ᵢ = 84,855.096;  Σx²₃ᵢ = 280.0;  Σyᵢx₂ᵢ = 74,778.346
Σyᵢx₃ᵢ = 4,250.9;  Σx₂ᵢx₃ᵢ = 4,796.0

where lowercase letters, as usual, denote deviations from sample mean values.
a. Estimate the three multiple regression coefficients.
b. Estimate their standard errors.
c. Obtain R² and R̄².
d. Estimate 95% confidence intervals for B₂ and B₃.
e. Test the statistical significance of each estimated regression coefficient using
α = 5% (two-tail).
f. Test at α = 5% the hypothesis that all partial slope coefficients are simultaneously equal to zero. Show the
ANOVA table.
8.9. A three-variable regression gave the following results:

Source of variation          Sum of squares (SS)    d.f.    Mean sum of squares (MSS)
Due to regression (ESS)      65,965                 —       —
Due to residuals (RSS)       —                      —       —
Total (TSS)                  66,042                 14

a. What is the sample size?
b. What is the value of the RSS?
c. What are the d.f. of the ESS and RSS?
d. What are R² and R̄²?
e. Test the hypothesis that X₂ and X₃ have zero influence on Y. Which test do
you use and why?
f. From the preceding information, can you tell the individual contribution of X₂ and X₃ toward Y?
8.10. Recast the ANOVA table given in Problem 8.9 in terms of R².
8.11. To explain what determines the price of air conditioners, B. T. Ratchford obtained the following regression results based on a sample of 19 air conditioners:

Ŷᵢ = −68.236 + 0.023X₂ᵢ + 19.729X₃ᵢ + 7.653X₄ᵢ        R² = 0.84
se =            (0.005)     (6.992)      (4.082)

where Y = the price, in dollars
X₂ = the BTU rating of the air conditioner
X₃ = the energy efficiency ratio
X₄ = the number of settings
se = standard errors

a. Interpret the regression results.
b. Do the results make economic sense?
c. At α = 5%, test the hypothesis that the BTU rating has no effect on the price
of an air conditioner versus the alternative that it has a positive effect.
d. Would you accept the null hypothesis that the three explanatory variables
explain a substantial variation in the prices of air conditioners? Show
clearly all your calculations.
8.12. Based on the U.S. data for 1965-IQ to 1983-IVQ (n = 76), James Doti and
Esmael Adibi obtained the following regression to explain personal consumption expenditure (PCE) in the United States:

Ŷₜ = −10.96 + 0.93X₂ₜ − 2.09X₃ₜ
t  = (−3.33)  (249.06)   (−3.09)        R² = 0.9996
                                         F = 83,753.7
B. T. Ratchford, "The Value of Information for Selected Appliances," Journal of Marketing
Research, vol. 17, 1980, pp. 16–25. Notations were adapted.
James Doti and Esmael Adibi, Econometric Analysis: An Applications Approach, Prentice-Hall,
Englewood Cliffs, NJ, 1988, p. 188. Notations were adapted.
where Y = the PCE ($, in billions)
X₂ = the disposable (i.e., after-tax) income ($, in billions)
X₃ = the prime rate (%) charged by banks

a. What is the marginal propensity to consume (MPC), the amount of additional consumption expenditure from an additional dollar's personal
disposable income?
b. Is the MPC statistically different from 1? Show the appropriate testing
procedure.
c. What is the rationale for the inclusion of the prime rate variable in the
model? A priori, would you expect a negative sign for this variable?
d. Is b₃ significantly different from zero?
e. Test the hypothesis that R² = 0.
f. Compute the standard error of each coefficient.
8.13. In the illustrative Example 8.2 given in the text, test the hypothesis that X₂ and
X₃ together have no influence on Y. Which test do you use? What are the
assumptions underlying that test?
8.14. Table 8-6 gives data on child mortality (CM), female literacy rate (FLR), per
capita GNP (PGNP), and total fertility rate (TFR) for a group of 64 countries.
a. A priori, what is the expected relationship between CM and each of the
other variables?
b. Regress CM on FLR and obtain the usual regression results.
c. Regress CM on FLR and PGNP and obtain the usual results.
d. Regress CM on FLR, PGNP, and TFR and obtain the usual results. Also
show the ANOVA table.
e. Given the various regression results, which model would you choose and
why?
f. If the regression model in (d) is the correct model, but you estimate (a) or (b)
or (c), what are the consequences?
g. Suppose you have regressed CM on FLR as in (b). How would you decide if it is
worth adding the variables PGNP and TFR to the model? Which test would
you use? Show the necessary calculations.
8.15. Use formula (8.54) to answer the following question:

Value of R²        n        k

What conclusion do you draw about the relationship between R² and R̄²?
8.16. For the illustrative Example 8.3, compute the F value. If that F value is significant, what does that mean?
8.17. For Example 8.2, set up the ANOVA table and test that R² = 0. Use α = 1%.
8.18. Refer to the data given in Table 6-11 to answer the following questions:
a. Develop a multiple regression model to explain the average starting pay of
MBA graduates, obtaining the usual regression output.
b. If you include both GPA and GMAT scores in the model, a priori, what
problem(s) may you encounter and why?
TABLE 8-6  CHILD MORTALITY, FERTILITY, AND RELATED DATA FOR 64 COUNTRIES

CM    FLR   PGNP    TFR      CM    FLR   PGNP    TFR
128    37   1870    6.66     142    50   8640    7.17
204    22    130    6.15      —     22    350    8.80
202    16    310    7.00     287    31    230    7.00
 —     85    570    6.25      64    88   1620    3.91
 96    76   2050    3.81     312     —    190    6.70
209    26    200    6.44      77    88   2090    4.20
170    45    670    6.19      —      —     90    8.43
240    29     90    5.89     262    22    230    8.50
241    11    120    5.80     216    12    400    6.25
 55    55    290    2.38     246     9    390    7.10
 75    87   1180    3.98     191    31   1010    7.10
 —      —    900    5.99      —     19    300    7.00
 24    93   1790    3.50      —     78   1730    8.48
165    31   1150    7.41     103     —    780    5.88
 94    77   1160    4.24      67    85   1300    4.62
 96    80   1270    5.00     143    78    930    5.00
148    30    580    5.27      83    85    690    4.74
 98     —    860    5.21     223    33    200    8.49
161    43    420    6.50     240    19    450    6.50
118    47   1080    6.12     312    21    280    6.50
269    17    290    6.19      12    72   4430    1.69
189    35    270    5.05      52    83    270    2.25
126    58    560    6.16      79    43   1340    7.17
 12    81   4240    1.80      —     88    670    3.52
167    29    240    4.75     168    28    410    6.09
185    65    430    4.10      28     —   4370    2.86
107    87   3020    6.66     121    41   1310    4.88
 72    63   1420    7.28      15    62   1470    3.89
128    49    420    8.12     186    45     —     6.90
 27    63  19830    5.23      47    65   3630    4.10
 12    40     88     —        —     48     20    6.08
 —      —    530    6.50      —      —     —     7.20

FLR = Female literacy rate, %.
PGNP = Per capita GNP in 1980, $.
TFR = Total fertility rate, 1980–1985, the average number of children born to a woman, using age-specific fertility rates for a given year.
Source: Chandan Mukherjee, Howard White, and Marc Wuyts, Econometrics and Data Analysis for
Developing Countries, Routledge, London, 1998, p. 488.
c. If the coefficient of the tuition variable is positive and statistically significant, does that mean it pays to go to the most expensive business school?
What might the tuition variable be a proxy for?
d. Suppose you regress GMAT score on GPA and find a statistically significant
positive relationship between the two. What can you say about the problem
of multicollinearity?
e. Set up the ANOVA table for the multiple regression in part (a) and test the
hypothesis that all partial slope coefficients are zero.
f. Do the ANOVA exercise in part (e), using the R² value.
8.19. Figure 8-1 gives you the normal probability plot for illustrative Example 8.4.
a. From this figure, can you tell if the error term in Eq. (8.62) follows the normal distribution? Why or why not?
b. Is the observed Anderson-Darling A² value of 0.481 statistically significant?
If it is, what does that mean? If it is not, what conclusion do you draw?
FIGURE 8-1  Normal probability plot of the residuals for Example 8.4 (response is CLFPR).
AD = Anderson-Darling statistic.
c. From the given data, can you identify the mean and variance of the error
term?
8.20. Restricted least squares (RLS). If the dependent variables in the restricted and
unrestricted regressions are not the same, you can use the following variant of the F test given in Eq. (8.56):

F = [(RSSᵣ − RSSᵤᵣ)/m] / [RSSᵤᵣ/(n − k)]

where RSSᵣ = residual sum of squares from the restricted regression, RSSᵤᵣ =
residual sum of squares from the unrestricted regression, m = number of
restrictions, and (n − k) = d.f. in the unrestricted regression.
Apply this test to the regressions given in Table 8-4.
APPENDIX 8A.1: Derivations of OLS
Estimators Given in Equations (8.20) to (8.22)

Start with Eq. (8.16). Differentiate this equation partially with respect to b₁, b₂,
and b₃, and set the resulting equations to zero to obtain:

∂Σe²ᵢ/∂b₁ = 2Σ(Yᵢ − b₁ − b₂X₂ᵢ − b₃X₃ᵢ)(−1) = 0
∂Σe²ᵢ/∂b₂ = 2Σ(Yᵢ − b₁ − b₂X₂ᵢ − b₃X₃ᵢ)(−X₂ᵢ) = 0
∂Σe²ᵢ/∂b₃ = 2Σ(Yᵢ − b₁ − b₂X₂ᵢ − b₃X₃ᵢ)(−X₃ᵢ) = 0
Simplifying these equations gives Eqs. (8.17), (8.18), and (8.19). Using small letters
to denote deviations from the mean values (e.g., x₂ᵢ = X₂ᵢ − X̄₂), we can solve the
preceding equations to obtain the formulas given in Eqs. (8.20), (8.21), and (8.22).
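Those closed-form solutions are easy to state in code. The sketch below is ours (the function name is invented); it solves the two deviation-form normal equations and then recovers the intercept:

```python
# Sketch: OLS for the three-variable model via the deviation-form normal
# equations, i.e., the closed forms that Eqs. (8.20)-(8.22) express.
import numpy as np

def ols_three_variable(y, x2, x3):
    y, x2, x3 = (np.asarray(v, dtype=float) for v in (y, x2, x3))
    yd, x2d, x3d = y - y.mean(), x2 - x2.mean(), x3 - x3.mean()
    s22, s33 = (x2d ** 2).sum(), (x3d ** 2).sum()
    s23 = (x2d * x3d).sum()
    sy2, sy3 = (yd * x2d).sum(), (yd * x3d).sum()
    den = s22 * s33 - s23 ** 2          # zero under perfect collinearity
    b2 = (sy2 * s33 - sy3 * s23) / den
    b3 = (sy3 * s22 - sy2 * s23) / den
    b1 = y.mean() - b2 * x2.mean() - b3 * x3.mean()
    return b1, b2, b3
```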
APPENDIX 8A.2: Derivation of Equation (8.31)

Note that the three-variable sample regression model

Yᵢ = b₁ + b₂X₂ᵢ + b₃X₃ᵢ + eᵢ        (8A.2.1)

can be expressed in the deviation form (i.e., each variable expressed as a deviation from its mean value, and noting that ē = 0) as

yᵢ = b₂x₂ᵢ + b₃x₃ᵢ + eᵢ        (8A.2.2)

Therefore,

eᵢ = yᵢ − b₂x₂ᵢ − b₃x₃ᵢ        (8A.2.3)

which we can write as

Σe²ᵢ = Σeᵢ(yᵢ − b₂x₂ᵢ − b₃x₃ᵢ)
     = Σeᵢyᵢ − b₂Σeᵢx₂ᵢ − b₃Σeᵢx₃ᵢ
     = Σeᵢyᵢ,    since the last two terms are zero (Why?)
     = Σyᵢ(yᵢ − b₂x₂ᵢ − b₃x₃ᵢ)
     = Σy²ᵢ − b₂Σyᵢx₂ᵢ − b₃Σyᵢx₃ᵢ
     = Σy²ᵢ − (b₂Σyᵢx₂ᵢ + b₃Σyᵢx₃ᵢ)
     = TSS − ESS
APPENDIX 8A.3: Derivation of Equation (8.54)

Recall that (see footnote 9)

R² = 1 − RSS/TSS        (8A.3.1)

Now R̄² is defined as

R̄² = 1 − [RSS/(n − k)] / [TSS/(n − 1)]
    = 1 − (RSS/TSS) · (n − 1)/(n − k)        (8A.3.2)

Note how the degrees of freedom are taken into account.
Now substituting Equation (8A.3.1) into Equation (8A.3.2), and after algebraic manipulations, we obtain

R̄² = 1 − (1 − R²) (n − 1)/(n − k)        (8.54)

Notice that if we do not take into account the d.f. associated with RSS (= n − k)
and TSS (= n − 1), then obviously R̄² = R².
APPENDIX 8A.4: EViews Output of the Clock
Auction Price Example