

CHAPTER THREE
ESTIMATION OF MULTIPLE LINEAR REGRESSION MODEL
3.1 The Multiple Regression Model
The simple regression analysis studied previously is often inadequate in practice because it assumes that only one independent variable explains the variation in the dependent variable. In most economic theory, however, the variation in the dependent variable is explained by a number of explanatory variables. Therefore, we need to extend the simple regression analysis to the case where two or more explanatory variables are involved in explaining the variation in the dependent variable.

Multiple regression analysis allows us to explicitly include many factors that simultaneously affect the dependent variable. In this setting, more of the variation in the dependent variable can be explained by the potential explanatory variables. This is important both for testing economic theories and for evaluating policy effects when we must rely on non-experimental data. As a result, multiple regression analysis is still the most widely used vehicle for empirical analysis in economics and other social sciences.

Since multiple regression analysis allows many observed factors (X's) to affect the dependent variable (Y), the general multiple linear regression model with K independent variables can be written in the population as:

Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + ... + \beta_K X_{Ki} + u_i ...............(3.1)

where \beta_0 is the intercept, \beta_1 is the parameter associated with X_1, \beta_2 is the parameter associated with X_2, and so on. We refer to the parameters other than the intercept as slope parameters. The term u_i is the error term or disturbance; it contains factors other than X_1, X_2, ..., X_K that affect Y, because no matter how many explanatory variables are included in the model, there will always be factors we cannot include.

To illustrate the multiple regression model with several independent variables, it is useful to recall the wage example. That is, besides the single independent variable (education) considered in the simple regression model, there are other variables such as work experience, job training, and ability that affect the wage level.

The simplest possible multiple regression model is a model with two independent variables, and this model can be written in the population as:

Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i ..........................(3.2)

where Y is the dependent variable, X_1 and X_2 are explanatory variables, and the \beta's and u_i are as defined in the earlier equation.

Note that the intercept term \beta_0 gives the mean or average value of Y when all explanatory variables are excluded from the model (i.e., X_1 = X_2 = 0). The slope coefficients \beta_1 and \beta_2 give the partial effect of the explanatory variable under consideration on the dependent variable, given that the value of the other explanatory variable is held constant. This is the concept of a partial effect, in which the effect of a particular explanatory variable is analyzed while holding constant the values of the other explanatory variables.

As a result, the two-variable case of the multiple regression model is interpreted as the conditional mean or expected value of Y, conditional on the given or fixed values of X_1 and X_2:

E(Y_i | X_{1i}, X_{2i}) = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} ....................(3.3)

3.2 Estimating and Interpreting OLS Estimates


3.2.1 Obtaining OLS Estimates
In this section, we summarize some computational and algebraic features of ordinary least
squares (OLS) so as to obtain the parameter estimates using the simplest possible multiple
regression model, that is, a model with only two independent variables.
Thus, the estimated OLS equation is written as:

\hat{Y} = \hat\beta_0 + \hat\beta_1 X_1 + \hat\beta_2 X_2 ..........................(3.4)

where \hat\beta_0, \hat\beta_1, and \hat\beta_2 are the estimates of \beta_0, \beta_1, and \beta_2 respectively.


Given a sample of n observations on Y, X_1, and X_2, we can obtain the estimates \hat\beta_0, \hat\beta_1, and \hat\beta_2 using the method of ordinary least squares (OLS) by minimizing the sum of squared residuals, i.e., the quantity

\sum (Y_i - \hat\beta_0 - \hat\beta_1 X_{1i} - \hat\beta_2 X_{2i})^2 ....................(3.5)

should be as small as possible. That is, we minimize \sum \hat{u}_i^2, where \hat{u}_i = Y_i - \hat{Y}_i = Y_i - \hat\beta_0 - \hat\beta_1 X_{1i} - \hat\beta_2 X_{2i}. Thus, we minimize

\sum \hat{u}_i^2 = \sum (Y_i - \hat{Y}_i)^2 = \sum (Y_i - \hat\beta_0 - \hat\beta_1 X_{1i} - \hat\beta_2 X_{2i})^2 ....................(3.6)


To minimize this expression, take the partial derivatives of equation (3.6) with respect to \hat\beta_0, \hat\beta_1, and \hat\beta_2 and set them equal to zero so as to solve for \hat\beta_0, \hat\beta_1, and \hat\beta_2:

\partial \sum \hat{u}_i^2 / \partial \hat\beta_0 = 0;  \partial \sum \hat{u}_i^2 / \partial \hat\beta_1 = 0;  and  \partial \sum \hat{u}_i^2 / \partial \hat\beta_2 = 0

The partial derivative of \sum \hat{u}_i^2 with respect to \hat\beta_0 gives the following result:

\partial \sum \hat{u}_i^2 / \partial \hat\beta_0 = \partial \sum (Y_i - \hat\beta_0 - \hat\beta_1 X_{1i} - \hat\beta_2 X_{2i})^2 / \partial \hat\beta_0 = 0

\Rightarrow -2 \sum (Y_i - \hat\beta_0 - \hat\beta_1 X_{1i} - \hat\beta_2 X_{2i}) = 0

\Rightarrow \sum Y_i = n\hat\beta_0 + \hat\beta_1 \sum X_{1i} + \hat\beta_2 \sum X_{2i} .............................(3.7)

Dividing equation (3.7) by n, we obtain:

\bar{Y} = \hat\beta_0 + \hat\beta_1 \bar{X}_1 + \hat\beta_2 \bar{X}_2 ..............................(3.8)

Equation (3.8) can be rearranged to obtain:

\hat\beta_0 = \bar{Y} - \hat\beta_1 \bar{X}_1 - \hat\beta_2 \bar{X}_2 ..............................(3.9)

In similar fashion, let's compute the partial derivatives of \sum \hat{u}_i^2 with respect to \hat\beta_1 and \hat\beta_2.


The partial derivative of \sum \hat{u}_i^2 with respect to \hat\beta_1 gives us:

\partial \sum \hat{u}_i^2 / \partial \hat\beta_1 = \partial \sum (Y_i - \hat\beta_0 - \hat\beta_1 X_{1i} - \hat\beta_2 X_{2i})^2 / \partial \hat\beta_1 = 0

\Rightarrow -2 \sum (Y_i - \hat\beta_0 - \hat\beta_1 X_{1i} - \hat\beta_2 X_{2i}) X_{1i} = 0

\Rightarrow \sum Y_i X_{1i} = \hat\beta_0 \sum X_{1i} + \hat\beta_1 \sum X_{1i}^2 + \hat\beta_2 \sum X_{1i} X_{2i} .......(3.10)


The partial derivative of \sum \hat{u}_i^2 with respect to \hat\beta_2 gives us:

\partial \sum \hat{u}_i^2 / \partial \hat\beta_2 = \partial \sum (Y_i - \hat\beta_0 - \hat\beta_1 X_{1i} - \hat\beta_2 X_{2i})^2 / \partial \hat\beta_2 = 0

\Rightarrow -2 \sum (Y_i - \hat\beta_0 - \hat\beta_1 X_{1i} - \hat\beta_2 X_{2i}) X_{2i} = 0

\Rightarrow \sum Y_i X_{2i} = \hat\beta_0 \sum X_{2i} + \hat\beta_1 \sum X_{1i} X_{2i} + \hat\beta_2 \sum X_{2i}^2 .......(3.11)

By simultaneously solving the three normal equations (3.8), (3.10), and (3.11), which are obtained from the partial derivatives of \sum \hat{u}_i^2 with respect to \hat\beta_0, \hat\beta_1, and \hat\beta_2 respectively, we can obtain \hat\beta_1 and \hat\beta_2. Alternatively, we can solve for \hat\beta_1 and \hat\beta_2 by substituting the value of \hat\beta_0 from equation (3.9) into the two normal equations (3.10) and (3.11) and undertaking some algebraic manipulation.

Thus, from the simultaneous system of normal equations, we obtain the values of \hat\beta_1 and \hat\beta_2 as:

\hat\beta_1 = [\sum(X_{1i}-\bar{X}_1)(Y_i-\bar{Y}) \sum(X_{2i}-\bar{X}_2)^2 - \sum(X_{2i}-\bar{X}_2)(Y_i-\bar{Y}) \sum(X_{1i}-\bar{X}_1)(X_{2i}-\bar{X}_2)] / [\sum(X_{1i}-\bar{X}_1)^2 \sum(X_{2i}-\bar{X}_2)^2 - (\sum(X_{1i}-\bar{X}_1)(X_{2i}-\bar{X}_2))^2] ...(3.12)

\hat\beta_2 = [\sum(X_{2i}-\bar{X}_2)(Y_i-\bar{Y}) \sum(X_{1i}-\bar{X}_1)^2 - \sum(X_{1i}-\bar{X}_1)(Y_i-\bar{Y}) \sum(X_{1i}-\bar{X}_1)(X_{2i}-\bar{X}_2)] / [\sum(X_{1i}-\bar{X}_1)^2 \sum(X_{2i}-\bar{X}_2)^2 - (\sum(X_{1i}-\bar{X}_1)(X_{2i}-\bar{X}_2))^2] ...(3.13)

If we write the above equations (3.12) and (3.13) in lower-case letters for deviations from the means (i.e., x_{1i} = X_{1i} - \bar{X}_1, x_{2i} = X_{2i} - \bar{X}_2, and y_i = Y_i - \bar{Y}), the equations reduce to:

\hat\beta_1 = (\sum y_i x_{1i} \sum x_{2i}^2 - \sum y_i x_{2i} \sum x_{1i} x_{2i}) / (\sum x_{1i}^2 \sum x_{2i}^2 - (\sum x_{1i} x_{2i})^2) .......................(3.14)


\hat\beta_2 = (\sum y_i x_{2i} \sum x_{1i}^2 - \sum y_i x_{1i} \sum x_{1i} x_{2i}) / (\sum x_{1i}^2 \sum x_{2i}^2 - (\sum x_{1i} x_{2i})^2) .......................(3.15)

3.2.2 Interpreting OLS Estimates


After obtaining the parameter estimates, the next important lesson is the interpretation of the estimated equation. For instance, in the case of two independent variables, the coefficients of the estimated equation

\hat{Y} = \hat\beta_0 + \hat\beta_1 X_1 + \hat\beta_2 X_2 ..........................(3.16)

can be interpreted as follows.

The intercept \hat\beta_0 is the predicted value of Y when X_1 = 0 and X_2 = 0. The slope estimates \hat\beta_1 and \hat\beta_2 are the partial effects of each independent variable on Y, holding the other explanatory variable constant. This is also termed the 'ceteris paribus' interpretation of the slope estimates.

For example, when X_2 is held fixed, so that \Delta X_2 = 0, the predicted change in Y results only from the change in X_1 (note that the intercept plays no role in changes in Y):

\Delta\hat{Y} = \hat\beta_1 \Delta X_1

As a result, the estimated equation becomes:

\hat{Y} = \hat\beta_0 + \hat\beta_1 X_1 ..........................(3.17)

Similarly, holding X_1 fixed:

\Delta\hat{Y} = \hat\beta_2 \Delta X_2

and the associated estimated equation becomes:

\hat{Y} = \hat\beta_0 + \hat\beta_2 X_2 ..........................(3.18)

Note that sometimes we may want to know the effect of changing more than one independent variable simultaneously. This is easily done using equation (3.16). For example, if we want the effect on Y of a one-unit increase in both explanatory variables X_1 and X_2 simultaneously, we simply add the coefficients of X_1 and X_2, since \Delta\hat{Y} = \hat\beta_1 \Delta X_1 + \hat\beta_2 \Delta X_2. (When the dependent variable is measured in logarithms, as in the wage example below, multiplying this sum by 100 turns the effect into an approximate percentage change.)
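For instance, with hypothetical slope estimates (illustrative numbers only, not taken from the notes), the combined effect is just the sum of the slopes:

# Hypothetical slope estimates b1 and b2; the predicted change in Y when
# X1 and X2 each rise by one unit is b1*dX1 + b2*dX2.
b1, b2 = 0.3, 0.5
delta_Y_hat = b1 * 1 + b2 * 1   # = 0.8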


3.3 Goodness of Fit


The goodness of fit of the estimated regression equation is measured by the multiple coefficient of determination, denoted R^2. In the multiple regression case with two independent variables, R^2 measures the proportion of the variation in Y explained by the variables X_1 and X_2 jointly. Conceptually it is akin to the coefficient of determination r^2 in the simple regression model.

To derive R^2, we simply follow the procedure of the simple regression model, first defining the total sum of squares (TSS), the explained sum of squares (ESS), and the residual sum of squares (RSS) as:

TSS = \sum (Y_i - \bar{Y})^2 = \sum y_i^2 ......................................(3.19)

ESS = \sum (\hat{Y}_i - \bar{Y})^2 = \sum \hat{y}_i^2 ......................................(3.20)

RSS = \sum (Y_i - \hat{Y}_i)^2 = \sum \hat{u}_i^2 ......................................(3.21)

Using the same argument as in the simple regression case, the total variation in Y_i is the sum of the variation in \hat{Y}_i and \hat{u}_i:

TSS = ESS + RSS

\sum y_i^2 = \sum \hat{y}_i^2 + \sum \hat{u}_i^2 ..........................(3.22)

Assuming that the total variation in Y is nonzero, as is the case unless Yi is constant in the
sample, we can divide (3.22) by TSS to get:
ESS/TSS + RSS/TSS = 1
Just as in the simple regression case, R^2 is defined as:
R^2 = ESS/TSS = 1 - RSS/TSS ..................(3.23)

Since ESS = \hat\beta_1 \sum y_i x_{1i} + \hat\beta_2 \sum y_i x_{2i} (after some algebraic manipulation), the equation for R^2 can also be written as:

R^2 = ESS/TSS = (\hat\beta_1 \sum y_i x_{1i} + \hat\beta_2 \sum y_i x_{2i}) / \sum y_i^2 .................(3.24)

where y_i, x_{1i}, and x_{2i} are lower-case letters indicating deviations from the mean values.

3.4 OLS Assumptions in Multiple Regression Model


3.4.1 Assumptions
The assumptions of the multiple linear regression model are:
Assumption 1: Linearity in parameters
The model is linear in the parameters. Thus the model in the population can be written as:

Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + ... + \beta_K X_K + u ....................(3.25)

where \beta_0, \beta_1, ..., \beta_K are the unknown constant parameters and u is the unobserved random term or disturbance term.

Assumption 2: Random sampling

We have a random sample of n observations, {(X_{1i}, X_{2i}, ..., X_{Ki}, Y_i): i = 1, ..., n}, from the population model (3.25). As a result, the model for a randomly drawn observation i from the population can be written as:

Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + ... + \beta_K X_{Ki} + u_i ....................(3.26)

Assumption 3: Zero conditional mean

The error term u has an expected value of zero given any values of the independent variables:

E(u | X_1, X_2, ..., X_K) = 0 ....................(3.27)

Assumption 4: No perfect collinearity/multicollinearity

There are no exact linear relationships among the independent variables; that is, the independent variables are not perfectly linearly correlated.

Assumption 5: Homoskedasticity

The variance of the error term u, conditional on the explanatory variables, is the same for all combinations of outcomes of the explanatory variables. For instance, given three explanatory variables, the variance of u can be written as:

Var(u | X_1, X_2, X_3) = \sigma^2 ....................(3.28)

The homoskedasticity assumption requires that the variance of the unobserved error term u does not depend on the particular combination of explanatory variables; rather, it is constant regardless of the explanatory variables. If this assumption fails, the model exhibits heteroskedasticity, meaning that the error variance varies across observations.

Assumption 6: Normality of u

The population error u is independent of the explanatory variables and is normally distributed with zero mean and variance \sigma^2:

u ~ N(0, \sigma^2) ....................(3.29)

Assumption 7: No autocorrelation

The value of u_i corresponding to X_i is independent of the value of any other u_j corresponding to X_j (for i \neq j); i.e., there is no serial correlation of the u's:

Cov(u_i, u_j) = 0 for i \neq j ....................(3.30)

Assumption 8: Independence of u_i and X_i

Every disturbance term u_i is independent of the explanatory variables:

Cov(u_i, X_i) = 0 ....................(3.31)

Assumption 9: No errors of measurement in the X's

The explanatory variables are measured without error.

Assumption 10: Correct specification of the model

The model has no specification error: the mathematical form is correctly specified and all important explanatory variables are explicitly included in the model.

3.4.2 Gauss Markov Theorem as a Measure of Efficiency of OLS

The Gauss-Markov theorem establishes the efficiency of the OLS method, justifying the use of OLS rather than a variety of competing estimators. The theorem states that under assumptions 1 through 5, the OLS estimator \hat\beta_j of \beta_j is the best linear unbiased estimator (BLUE). Each component of the acronym "BLUE" refers to:

i. Linearity: the estimators are linear functions of the dependent variable.


ii. Unbiasedness: the OLS estimators \hat\beta_j are unbiased estimators of the population parameters \beta_j, so the bias of \hat\beta_j relative to \beta_j is zero:

E(\hat\beta_j) = \beta_j,  j = 0, 1, ..., k

E(\hat\beta_j) - \beta_j = 0 ....................(3.32)

iii. Best: the OLS estimators have the smallest variance among all linear unbiased estimators. In this sense OLS is the best compared with other competing methods such as maximum likelihood, two-stage least squares, etc.

Because of these BLUE properties of the OLS estimators, assumptions 1 through 5 are called the Gauss-Markov assumptions. In summary, the Gauss-Markov theorem states that under assumptions 1 through 5, \hat\beta_0, \hat\beta_1, ..., \hat\beta_K are the best linear unbiased estimators (BLUEs) of \beta_0, \beta_1, ..., \beta_K, respectively.

3.5 Hypothesis Testing


3.5.1 Hypothesis Testing and Confidence Interval for a Single Regression Coefficient
Before dealing with hypothesis testing and interval estimation, we need to compute the mean and variance of the estimates. Under the assumptions above, the OLS estimators are normally distributed and unbiased, so their expected values equal the corresponding population parameters. For the multiple regression model with two explanatory variables:

E(\hat\beta_0) = \beta_0,  E(\hat\beta_1) = \beta_1,  and  E(\hat\beta_2) = \beta_2 ....................(3.33)

Therefore, the means of the estimates \hat\beta_0, \hat\beta_1, and \hat\beta_2 are the corresponding population parameters \beta_0, \beta_1, and \beta_2 respectively.

Similarly, we can derive the variances and standard errors of these estimates in a manner as follows (in deviation notation):

Var(\hat\beta_0) = \hat\sigma_u^2 [1/n + (\bar{X}_1^2 \sum x_{2i}^2 + \bar{X}_2^2 \sum x_{1i}^2 - 2\bar{X}_1 \bar{X}_2 \sum x_{1i} x_{2i}) / (\sum x_{1i}^2 \sum x_{2i}^2 - (\sum x_{1i} x_{2i})^2)] .........(3.34)

Var(\hat\beta_1) = \hat\sigma_u^2 \sum x_{2i}^2 / (\sum x_{1i}^2 \sum x_{2i}^2 - (\sum x_{1i} x_{2i})^2) .....................(3.35)


Var(\hat\beta_2) = \hat\sigma_u^2 \sum x_{1i}^2 / (\sum x_{1i}^2 \sum x_{2i}^2 - (\sum x_{1i} x_{2i})^2) .....................(3.36)

where \hat\sigma_u^2 = \sum \hat{u}_i^2 / (n - k), n is the number of observations, and k is the number of parameters. The corresponding standard errors are:

Se(\hat\beta_0) = \sqrt{Var(\hat\beta_0)} ....................(3.37)

Se(\hat\beta_1) = \sqrt{Var(\hat\beta_1)} ....................(3.38)

Se(\hat\beta_2) = \sqrt{Var(\hat\beta_2)} ....................(3.39)
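A sketch of equations (3.35), (3.36), (3.38), and (3.39), assuming NumPy data arrays and the estimates b0, b1, b2 from the earlier sketch, with k = 3 parameters in the two-regressor model:

import numpy as np

def ols_standard_errors(Y, X1, X2, b0, b1, b2, k=3):
    n = len(Y)
    u = Y - (b0 + b1*X1 + b2*X2)                  # residuals
    sigma2 = (u**2).sum() / (n - k)               # sigma-hat^2 = sum(u^2)/(n-k)
    x1, x2 = X1 - X1.mean(), X2 - X2.mean()
    d = (x1**2).sum()*(x2**2).sum() - ((x1*x2).sum())**2
    se_b1 = np.sqrt(sigma2 * (x2**2).sum() / d)   # eq. (3.35) and (3.38)
    se_b2 = np.sqrt(sigma2 * (x1**2).sum() / d)   # eq. (3.36) and (3.39)
    return se_b1, se_b2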

3.5.2 Testing Hypotheses about a Single Population Parameter: The t-Test


In hypothesis testing involving a population parameter \beta_j, the primary interest lies in testing the null hypothesis:

H_0: \beta_j = 0

where j corresponds to any of the k independent variables.


This kind of test is suited to simple hypotheses and hence is called classical hypothesis testing. As an example, consider the wage equation in which the wage is affected by the variables education (educ), experience (exp), and tenure (tenu):

log(wage) = \beta_0 + \beta_1 educ + \beta_2 exp + \beta_3 tenu + u ....................(3.40)

In this wage equation, for example, the null hypothesis H_0: \beta_2 = 0 means that once education and tenure have been accounted for, the number of years in the workforce (exp) has no effect on the hourly wage. The alternative hypothesis states the opposite.

The statistic used to test the null hypothesis (against any alternative hypothesis) is called the t-statistic or t-ratio and is defined as:

t_s = t_{\hat\beta_j} = (\hat\beta_j - \beta_j) / Se(\hat\beta_j)

    = (\hat\beta_j - 0) / Se(\hat\beta_j)

    = \hat\beta_j / Se(\hat\beta_j) ....................(3.41)


In addition to the t-statistic, the theoretical or critical value of t, t_c, at some significance level \alpha and degrees of freedom n-k-1, is required so as to determine the decision rule for rejecting or accepting the null hypothesis.

The critical values of t are obtained directly from the t-distribution table using the significance level and the degrees of freedom. As the degrees of freedom get large, the t-distribution approaches the normal distribution. For example, when n-k-1 = 120, the 5% critical value for a one-tailed test is 1.658, compared with the standard normal value of 1.645. These are close enough for practical purposes; for degrees of freedom greater than 120, one can use the standard normal critical values.
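This convergence is easy to verify numerically; a small check using SciPy (an illustrative tool choice, since the notes themselves rely on tables):

from scipy import stats

# One-tailed 5% critical values: t with 120 df vs. the standard normal.
print(stats.t.ppf(0.95, df=120))   # about 1.658
print(stats.norm.ppf(0.95))        # about 1.645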

Some treatments of regression analysis define the t-statistic as an absolute value, so that it is always positive. However, this practice has the drawback of making tests against one-sided alternatives clumsy. Therefore, throughout this text, for the one-sided case the t-statistic always has the same sign as the corresponding OLS coefficient estimate.

In order to determine the decision rule for rejecting or accepting the null hypothesis H_0: \beta_j = 0, we need to state the relevant alternative hypothesis. There are two forms in which the alternative hypothesis can be formulated.

i. One-Sided Alternatives
Positive alternative: this is the right-tailed test in which the hypothesized population parameter \beta_j is positive, so the alternative hypothesis is formulated as H_1: \beta_j > 0. Since \beta_j is hypothesized to be greater than zero, the t-statistic is expected to be positive.

Decision rule:
If t_s > t_c, then reject H_0
If t_s < t_c, then accept H_0

Negative alternative: this is the left-tailed test in which the hypothesized population parameter is negative, so the alternative hypothesis is formulated as H_1: \beta_j < 0. Since \beta_j is hypothesized to be less than zero, the t-statistic is expected to be negative. Thus, to compare this negative t-statistic with the critical value, we must use the negative of the tabulated critical value, since the table always reports positive values.

Decision rule:
If t_s < -t_c, then reject H_0
If t_s > -t_c, then accept H_0

Illustration
Suppose that the estimated wage equation, with standard errors in parentheses, is:

log(\hat{wage}) = 2.274 + 0.00046 educ + 0.48 exp + 0.0002 tenu
                  (6.113)  (0.00010)     (0.40)     (0.00022) ....................(3.42)
n = 408, R^2 = 0.0541

Test the significance of education (right-tailed test) and the significance of tenure (left-tailed test) at the 5% significance level.

 Significance of education:

t_s = \hat\beta_1 / Se(\hat\beta_1) = 0.00046 / 0.0001 = 4.6

t_c = t_{\alpha, n-k-1} = t_{0.05, 404} = 1.645

(k here denotes the number of slope parameters, so df = 408 - 3 - 1 = 404; with degrees of freedom this large, the standard normal critical value is used.)

Since t_s > t_c, we reject H_0, so education is statistically significant in determining wage.

 Significance of tenure:

t_s = \hat\beta_3 / Se(\hat\beta_3) = 0.0002 / 0.00022 = 0.91

t_c = t_{\alpha, n-k-1} = t_{0.05, 404} = 1.645

Since t_s > -t_c, we accept H_0, so tenure is not statistically significant in determining wage.
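The two tests above can be reproduced numerically; a sketch using SciPy with df = 408 - 3 - 1 = 404:

from scipy import stats

t_educ = 0.00046 / 0.00010            # = 4.6, right-tailed test
t_tenu = 0.0002 / 0.00022             # about 0.91, left-tailed test
t_crit = stats.t.ppf(0.95, df=404)    # about 1.649, close to the normal 1.645

print(t_educ > t_crit)    # True  -> reject H0: education is significant
print(t_tenu < -t_crit)   # False -> fail to reject H0: tenure is not significant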

ii. Two-Sided Alternatives

This refers to the two-tailed test in which the hypothesized population parameter may be either positive or negative but different from zero. Therefore, the alternative hypothesis is formulated as:

H_1: \beta_j \neq 0


When the alternative is two-sided, we are interested in the absolute value of the t-statistic. Thus the decision rule for the null hypothesis H_0: \beta_j = 0 against the two-sided alternative is:

If |t_s| > t_c, then reject H_0
If |t_s| < t_c, then accept H_0

Note that to find the t-critical value for a two-tailed test, the significance level \alpha has to be divided by 2 so as to split the significance level equally between the two tails of the t-distribution. For the sake of illustration, we can test the significance of experience in the earlier estimated wage equation (3.42):

t_s = \hat\beta_2 / Se(\hat\beta_2) = 0.48 / 0.4 = 1.2

t_c = t_{\alpha/2, n-k-1} = t_{0.025, 404} = 1.96

Since |t_s| < t_c, we accept H_0, and hence experience is not statistically significant in determining wage at the 5% level of significance.

Testing Other Hypotheses about \beta_j

Although H_0: \beta_j = 0 is the most common hypothesis, we sometimes want to test whether \beta_j is equal to some other given constant. Two common examples are \beta_j = 1 and \beta_j = -1. Generally, if the null hypothesis is stated as:

H_0: \beta_j = a_j

where a_j is our hypothesized value of \beta_j, then the appropriate t-statistic is:

t_s = (estimate - hypothesized value) / (standard error of the estimate)

    = (\hat\beta_j - a_j) / Se(\hat\beta_j) ....................(3.43)

Note that the computation of the t-critical value and the decision rules are exactly the same as before for the respective types of tailed tests. The only difference is the computation of the t-statistic.
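For example, a sketch of (3.43) with hypothetical numbers (testing H_0: \beta_j = 1 when the estimate is 0.82 with standard error 0.10; these values are illustrative only):

# Hypothetical estimate and standard error, testing H0: beta_j = 1.
b_hat, a_j, se = 0.82, 1.0, 0.10
t_s = (b_hat - a_j) / se   # = -1.8; compare with the usual t-critical values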

Computing p-values for t-test


So far, we have seen hypothesis testing of a single parameter using the t-test, comparing the critical and computed values of t at a given significance level and degrees of freedom. Rather than testing at different significance levels, we can use the p-value for testing the parameter. The p-value is the probability of observing a t-statistic at least as extreme as the one computed if the null hypothesis is true; equivalently, it is the smallest significance level at which the null hypothesis would be rejected.

We obtain the p-value by computing the probability that a t-distributed random variable T with n-k-1 degrees of freedom exceeds the computed t-statistic in absolute value:

p-value = P(|T| > |t|) ....................(3.44)

Therefore, the p-value is the significance level of the test when we use the value of the t-statistic as the critical value for the test.

In practice, the p-value is computed by econometric packages such as Stata rather than read from a table. Once the p-value has been computed, a classical test can be carried out at any desired level: if \alpha denotes the significance level of the test (in decimal form), then H_0 is rejected if p-value < \alpha.

Computing p-values for one-sided alternatives is also quite simple. Although some regression packages compute p-values only for two-sided alternatives, the one-sided p-value can be obtained by simply dividing the two-sided p-value by 2.
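A sketch of the p-value computation in (3.44) using SciPy, here for the experience test above (t = 1.2 with 404 degrees of freedom):

from scipy import stats

t, df = 1.2, 404
p_two_sided = 2 * stats.t.sf(abs(t), df)   # P(|T| > |t|), about 0.23
p_one_sided = p_two_sided / 2              # dividing by 2, as described above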

Illustration
Suppose you estimate a regression model and obtain a (two-sided) p-value = 0.086 for a particular parameter estimate \hat\beta_1. Given a 5% significance level, test the significance of the parameter \beta_1 against one-tailed and two-tailed alternatives.

For the two-tailed (classical) test we fail to reject H_0, since p > \alpha, while for the one-tailed test (i.e., H_0: \beta_1 = 0; H_1: \beta_1 > 0, with the estimate of the expected sign) we reject H_0, since p/2 = 0.043 < \alpha.


Note: (1) When H_0 is not rejected, we prefer to use the language "we fail to reject H_0 at the x% level" rather than "H_0 is accepted at the x% level." (2) Statistical significance is distinct from economic or practical significance. That is, the statistical significance of a variable X_j is determined entirely by the size of its t-statistic, whereas the economic/practical significance of a variable is related to the size (and sign) of \hat\beta_j.

Confidence Interval
Confidence intervals, also called interval estimates, provide a range of likely values for the population parameter \beta_j. For example, a 95% confidence interval (CI) for the unknown \beta_j is given by:

\hat\beta_j - t_{\alpha/2} \cdot Se(\hat\beta_j) \le \beta_j \le \hat\beta_j + t_{\alpha/2} \cdot Se(\hat\beta_j) ............(3.45)

From this CI, the lower and upper bounds of the confidence interval are \hat\beta_j - t_{\alpha/2} \cdot Se(\hat\beta_j) and \hat\beta_j + t_{\alpha/2} \cdot Se(\hat\beta_j) respectively.

As an example, for df = n-k-1 = 25, a 95% CI for \beta_j whose estimated value is -0.056 with Se(\hat\beta_j) = 0.02 is given by:

-0.056 - t_{0.025,25} \cdot Se(\hat\beta_j) \le \beta_j \le -0.056 + t_{0.025,25} \cdot Se(\hat\beta_j)

-0.056 - (2.06)(0.02) \le \beta_j \le -0.056 + (2.06)(0.02) ....................(3.46)

-0.0972 \le \beta_j \le -0.0148

The lower and upper bounds of the CI are -0.0972 and -0.0148 respectively, so the unknown population parameter \beta_j, whose estimated value is -0.056, would lie in the interval (-0.0972, -0.0148) for 95% of samples. That is, if 100 samples of size 29 are selected and 100 confidence intervals like (3.46) are constructed, we expect 95 of them to contain the true population parameter \beta_j.

3.5.3 Overall Significance Hypothesis Test: The F-Test


Throughout the previous section we were concerned with testing the significance of a single parameter, that is, the partial effect of an individual variable on the dependent variable. Now we consider the joint effect of two or more coefficients in the multiple regression model using the F-test. For the sake of simplicity, take the simplified form of multiple regression with three variables (i.e., with two independent variables) and consider the following null hypothesis:

H_0: \beta_1 = \beta_2 = 0

This null hypothesis is a joint hypothesis that \beta_1 and \beta_2 are jointly or simultaneously equal to zero. A test of such a hypothesis is called an overall significance test.

Unlike testing an individual hypothesis using the t-test, two or more parameters are tested jointly using the F-test, whose value is calculated through the analysis of variance (ANOVA) technique. That is, we first compute the TSS, ESS, and RSS with their respective degrees of freedom n-1, k-1, and n-k, where n is the total number of observations and k is the total number of parameters to be estimated, including the intercept term. Thus, F is computed using the formula:

F_s = (ESS / (k-1)) / (RSS / (n-k))
Once the F-statistic (F_s) is computed using the above formula, we can test the overall significance by comparing this calculated value with the critical value (F_c) obtained from the F-table at the numerator degrees of freedom (k-1), the denominator degrees of freedom (n-k), and the \alpha level of significance. Therefore, the decision rule is:

If F_s > F_c, then reject H_0
If F_s < F_c, then fail to reject H_0

As an example of testing the overall significance, consider the following model, which explains students' scores in terms of factors such as class size (classize) and teachers' compensation (tchcomp):

score = \beta_0 + \beta_1 classize + \beta_2 tchcomp

Given 20 observations with TSS = 363 and ESS = 257 (so RSS = 363 - 257 = 106), test the statistical significance of the slope parameters jointly at the 1% level of significance.

F_s = (ESS/(k-1)) / (RSS/(n-k)) = (257/(3-1)) / (106/(20-3)) = (257/2) / (106/17) = 20.60

F_c = F_{\alpha, ndf, ddf} = F_{0.01, 2, 17} = 6.11
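A sketch reproducing this example with SciPy:

from scipy import stats

tss, ess, n, k = 363, 257, 20, 3
rss = tss - ess                               # = 106
F_s = (ess/(k-1)) / (rss/(n-k))               # about 20.60
F_c = stats.f.ppf(0.99, dfn=k-1, dfd=n-k)     # about 6.11
print(F_s > F_c)                              # True -> reject H0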


Since F_s > F_c, we reject H_0, so the factors class size and teachers' compensation are jointly statistically significant in affecting students' performance.

Relationship between R^2 and F


There is an intimate relationship between the multiple coefficient of determination R^2 and the overall F-test. In this regard, the R^2 form of the F formula is derived as follows:

F = (ESS/(k-1)) / (RSS/(n-k))
  = (ESS/RSS) \cdot ((n-k)/(k-1))
  = ((ESS/TSS) / (RSS/TSS)) \cdot ((n-k)/(k-1))
  = (R^2 / (1 - R^2)) \cdot ((n-k)/(k-1)),  since R^2 = ESS/TSS and 1 - R^2 = RSS/TSS ....................(3.47)

Equation (3.47) shows that R^2 and F are related and vary directly: the larger the R^2, the greater the F-value.
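Applied to the score example above (R^2 = 257/363, n = 20, k = 3), equation (3.47) reproduces the same F-value:

# F from R^2, eq. (3.47); matches the direct ANOVA computation (about 20.6).
r2, n, k = 257/363, 20, 3
F = (r2 / (1 - r2)) * ((n - k) / (k - 1))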

Thus, the F-test, which is a measure of the overall significance of the estimated regression, is equivalent to a test of the significance of R^2. That is, testing the null hypothesis H_0: \beta_1 = \beta_2 = 0 using the F-test is equivalent to testing the null hypothesis H_0: R^2 = 0.

Note that there are two types of models in this lesson: unrestricted and restricted models. The unrestricted model is a model with the full set of k independent variables (and k+1 parameters, adding one for the intercept), whereas the restricted model is a model with fewer than k independent variables because some independent variables are dropped. Accordingly, the unrestricted and restricted models are associated with the unadjusted R-squared, R^2, and the adjusted R-squared, \bar{R}^2, respectively.


The adjusted R-squared is computed as:

\bar{R}^2 = 1 - (RSS/(n-k)) / (TSS/(n-1))
         = 1 - (RSS/TSS) \cdot ((n-1)/(n-k))
         = 1 - ((TSS - ESS)/TSS) \cdot ((n-1)/(n-k))
         = 1 - (TSS/TSS - ESS/TSS) \cdot ((n-1)/(n-k))
         = 1 - (1 - R^2) \cdot ((n-1)/(n-k)),  since ESS/TSS = R^2 ....................(3.48)

It is \bar{R}^2, not R^2, that helps to test the significance of excluding some variables from the unrestricted model. In other words, the overall fit of the unrestricted model (with more explanatory variables) is compared with that of the restricted model (with fewer explanatory variables) using the adjusted R-squared.
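For the score example (R^2 = 257/363, n = 20, k = 3), a one-line sketch of (3.48):

# Adjusted R-squared, eq. (3.48); about 0.674 versus an R^2 of about 0.708.
r2, n, k = 257/363, 20, 3
r2_adj = 1 - (1 - r2) * (n - 1) / (n - k)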

Computing P-values for F-test


Since the F-distribution depends on the numerator and denominator degrees of freedom, it is difficult to come up with a reliable decision simply by looking at the computed and critical values of F. Therefore the p-value is useful in reporting the outcome of F-tests. In the F-testing context, the p-value is defined as:

p-value = P(f > F) ....................(3.49)

where f denotes an F random variable with (k-1, n-k) degrees of freedom and F is the actual value of the test statistic. The p-value has the same interpretation as it did for the t-statistic: it is the probability of observing a value of f at least as large as F when H_0 is true, so a small p-value is evidence against H_0 when compared with the level of significance. As with t-testing, once the p-value has been computed, the F-test can be carried out at any significance level:

 If p < \alpha, then reject H_0
 If p > \alpha, then fail to reject H_0
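A sketch of (3.49) for the score example, again assuming SciPy:

from scipy import stats

# P(f > F) with (k-1, n-k) = (2, 17) degrees of freedom.
p = stats.f.sf(20.60, dfn=2, dfd=17)   # far below 0.01 -> reject H0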

