
Chapter III

Multiple Regression Analysis

Pr. Dr. Hédi ESSID


Topics to be Covered
• The need for additional regressors.
• Classical assumptions and least squares estimation extended to allow for several regressors.
• Interpreting the results.
• t-tests of individual parameter values.
• R squared and R bar squared.
• Practical illustrations.

Econometrics 2
Multiple Regression Models: Examples

• Food expenditure-income equations may need to be extended to include variables like price.

• Sales-advertising equations may need to be extended to include variables such as consumers’ income, price and the price of advertising of competitors’ products.

• Earnings equations may need to be extended to include experience or age and other variables in addition to years of schooling (including dummy variables to examine the importance of gender, ethnic group, etc.).

Assumptions of the Multiple Linear Regression Model

• A stable linear stochastic relationship

$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \dots + \beta_k X_{ki} + u_i, \qquad i = 1, 2, \dots, n$$

• The parameters are the same for each observation – no structural differences.

Assumptions of the Multiple Linear Regression Model
u is a random variable distributed with

(1) zero mean

$$E(u_i) = 0 \quad \text{for all } i$$

(2) constant variance

$$\mathrm{Var}(u_i) = \sigma_u^2 \quad \text{for all } i$$

Assumption (2) implies that the disturbances are “homoscedastic”.

Assumptions of the Multiple Linear Regression Model

(3) Disturbances are independent of one another

$$E(u_i u_j) = 0 \quad \text{for } i \neq j$$

(4) Disturbances are independent of each of the X variables

$$E(u_i \mid X_j) = E(u_i) = 0 \quad \text{for all } i, j$$

Assumptions of the Multiple Linear Regression Model

(5) u has a normal distribution; thus, with (2), we can write $u \sim N(0, \sigma_u^2)$

(6) There is no exact linear relationship among the X variables (they are linearly independent).

Multiple Regression Model in Matrix Form

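Arrange the Y, X and u values as follows

$$
Y = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}, \qquad
X = \begin{pmatrix}
1 & X_{21} & X_{31} & \cdots & X_{k1} \\
1 & X_{22} & X_{32} & \cdots & X_{k2} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & X_{2n} & X_{3n} & \cdots & X_{kn}
\end{pmatrix}, \qquad
U = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}
$$

Note: The first subscript on X identifies the variable, the second identifies the observation number.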
Multiple Regression Model in Matrix Form

• Also write the parameters as a column vector

$$\beta = \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{pmatrix}$$

• We can then write the model in matrix form as

$$Y = X\beta + U$$

Least Squares Estimators

Least squares estimation chooses estimators $\hat\beta_1, \hat\beta_2, \dots, \hat\beta_k$ so as to minimise the sum of the squares of the differences between the actual and fitted values of Y, i.e.

$$RSS = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$$

where $\hat{Y}_i = \hat\beta_1 + \hat\beta_2 X_{2i} + \hat\beta_3 X_{3i} + \dots + \hat\beta_k X_{ki}$

Least Squares Estimators – Matrix Notation

• In matrix notation the estimators may be written as

$$\hat\beta = (X^T X)^{-1} X^T Y$$

• where Y is a column vector of the Y values $(Y_1, Y_2, \dots, Y_n)$, and

• X is an $n \times k$ matrix containing a column of ones (to pick up the intercept term) followed by a column for each of the X variables containing the observations on them, and

• $\hat\beta$ is a $k \times 1$ vector containing the estimators of the different regression parameters.
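
The estimator formula can be tried out numerically. The following is a minimal sketch in Python with NumPy, using a small made-up data set (the numbers and variable names are assumptions for illustration, not course data). Y is generated without any disturbance so that the estimates can be checked by eye, and the normal equations are solved directly rather than forming the inverse explicitly.

```python
import numpy as np

# Hypothetical data: n = 6 observations on two regressors, X2 and X3.
x2 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x3 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])

# Y built exactly from Y = 1 + 2*X2 + 0.5*X3 (no disturbance term),
# so least squares should recover (1, 2, 0.5).
y = 1.0 + 2.0 * x2 + 0.5 * x3

# Design matrix: a column of ones for the intercept, then one column per regressor.
X = np.column_stack([np.ones_like(x2), x2, x3])

# Solve the normal equations (X'X) b = X'Y, which gives b = (X'X)^{-1} X'Y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check with NumPy's built-in least squares routine.
beta_check, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_hat)
```

With real data the disturbance is not zero, so the estimates will differ from the true parameters; the two computation routes should still agree with each other.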

Statistical Properties of Least Squares Estimators – Matrix Notation

$$\hat\beta = (X^T X)^{-1} X^T Y = \beta + (X^T X)^{-1} X^T U$$

$$\Rightarrow \quad E(\hat\beta) = \beta$$

• We say that the least squares estimators are unbiased.
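
The intermediate steps can be filled in as follows. This is a sketch that conditions on X (or treats it as fixed), which is consistent with assumption (4) but is not spelled out on the slide.

$$
\begin{aligned}
\hat\beta &= (X^T X)^{-1} X^T (X\beta + U) = \beta + (X^T X)^{-1} X^T U \\
E(\hat\beta \mid X) &= \beta + (X^T X)^{-1} X^T E(U \mid X) = \beta
\end{aligned}
$$

The last equality uses assumptions (1) and (4); assumption (6) ensures that $(X^T X)^{-1}$ exists.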

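Statistical Properties of Least Squares Estimators – Matrix Notation

$$\mathrm{Var}(\hat\beta) = \sigma_u^2 (X^T X)^{-1}$$

$$\text{Assumption 5} \;\Rightarrow\; \hat\beta \sim N\left(\beta,\ \sigma_u^2 (X^T X)^{-1}\right)$$

However, $\sigma_u^2$ is unknown!

$$\Rightarrow \quad \widehat{\mathrm{Var}}(\hat\beta) = \hat\sigma_u^2 (X^T X)^{-1}$$

$$\text{and} \quad \mathrm{s.e.}(\hat\beta_j) = \hat\sigma_u \sqrt{a_{jj}}$$

where $a_{jj}$ is the j-th diagonal element of $(X^T X)^{-1}$.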
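
As a rough numerical illustration of these formulas, the sketch below computes the estimated variance-covariance matrix and the standard errors in Python with NumPy. The data, including the disturbance values, are made up for illustration, and the disturbance variance is estimated as RSS/(n-k), with k counting the intercept.

```python
import numpy as np

# Hypothetical data: n = 6 observations, two regressors plus an intercept (k = 3).
x2 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x3 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
u = np.array([0.3, -0.2, -0.4, 0.5, 0.1, -0.3])  # made-up disturbances
y = 1.0 + 2.0 * x2 + 0.5 * x3 + u

X = np.column_stack([np.ones_like(x2), x2, x3])
n, k = X.shape

# (X'X)^{-1} is formed explicitly here because its diagonal is needed.
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y

# Residuals and the estimate of the disturbance variance, RSS / (n - k).
residuals = y - X @ beta_hat
rss = residuals @ residuals
sigma2_hat = rss / (n - k)

# Estimated Var(beta_hat) and the standard errors s.e.(beta_hat_j) = sigma_hat * sqrt(a_jj).
var_beta_hat = sigma2_hat * XtX_inv
se = np.sqrt(np.diag(var_beta_hat))

print(beta_hat)
print(se)
```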

Goodness of Fit
The coefficient of multiple determination is defined as the ratio of the Explained or Regression Sum of Squares to the Total Sum of Squares:

$$R^2 = \frac{\sum (\hat{Y}_i - \bar{Y})^2}{\sum (Y_i - \bar{Y})^2} = \frac{ESS}{TSS}$$

Or we could use

$$R^2 = 1 - \frac{RSS}{TSS}$$

Note: this is no longer the square of a simple correlation coefficient.
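
The two expressions for R squared agree when the model is fitted by least squares and includes an intercept, because TSS = ESS + RSS in that case. The sketch below checks this in Python with NumPy on made-up data (all numbers are assumptions for illustration).

```python
import numpy as np

# Hypothetical data with made-up disturbances.
x2 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x3 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
u = np.array([0.3, -0.2, -0.4, 0.5, 0.1, -0.3])
y = 1.0 + 2.0 * x2 + 0.5 * x3 + u

# Fit by least squares with an intercept column.
X = np.column_stack([np.ones_like(x2), x2, x3])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat
y_bar = y.mean()

# The three sums of squares.
tss = np.sum((y - y_bar) ** 2)      # total
ess = np.sum((y_hat - y_bar) ** 2)  # explained (regression)
rss = np.sum((y - y_hat) ** 2)      # residual

r2_from_ess = ess / tss
r2_from_rss = 1.0 - rss / tss

print(r2_from_ess, r2_from_rss)
```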

Adjusted R squared (R bar squared)

Whereas

$$R^2 = 1 - \frac{RSS}{TSS}$$

adjusted R squared, or R bar squared, is defined as

$$\bar{R}^2 = 1 - \frac{RSS / (n - k)}{TSS / (n - 1)}$$

where k denotes the number of regressors, including the constant. This penalizes models where the reduction in the sum of squared residuals isn’t enough to compensate for the loss of degrees of freedom.
More on R bar squared
• The use of an adjusted $R^2$ is an attempt to take account of the phenomenon of the $R^2$ automatically and spuriously increasing when extra explanatory variables are added to the model.

• It is a modification of $R^2$, due to Theil, that adjusts for the number of explanatory terms in a model relative to the number of data points.

• The adjusted $R^2$ can be negative, and its value will always be less than or equal to that of $R^2$.

• Unlike $R^2$, the adjusted $R^2$ increases when a new explanator is included only if the new explanator improves the $R^2$ more than would be expected in the absence of any explanatory value being added by the new explanator.

More on R bar squared
A bit of simple algebra shows the relationship between $R^2$ and $\bar{R}^2$:

$$\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - k}$$

Can you show this?
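
One possible route, offered as a hint rather than the official solution, starts from the definition of $\bar{R}^2$ and substitutes $RSS/TSS = 1 - R^2$:

$$
\bar{R}^2 = 1 - \frac{RSS/(n-k)}{TSS/(n-1)}
= 1 - \frac{RSS}{TSS}\cdot\frac{n-1}{n-k}
= 1 - (1 - R^2)\,\frac{n-1}{n-k}
$$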

Notice that $R^2$ and $\bar{R}^2$ will be very close for large samples, but if the sample size is small the ratio $(n-1)/(n-k)$ can be some way from 1 and thus $\bar{R}^2$ will be rather less than $R^2$.
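
The effect of sample size can be seen with a few lines of Python. This is a minimal sketch: the function name `adjusted_r2` and the values of R squared, n and k are assumptions chosen for illustration, with k counting the intercept.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R squared from R squared, sample size n and k parameters
    (k includes the intercept)."""
    if n <= k:
        raise ValueError("need n > k for positive degrees of freedom")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k)


# Same hypothetical R squared and k, increasing sample size.
for n in (10, 50, 1000):
    print(n, round(adjusted_r2(0.60, n, 4), 4))

# A weak fit in a small sample can give a negative adjusted R squared.
print(adjusted_r2(0.05, 10, 4))
```

For the assumed $R^2 = 0.60$ and k = 4, the adjusted value is about 0.40 at n = 10 but about 0.60 at n = 1000.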

Tests of Significance of the Individual Parameter Estimates
Under the classical assumptions, for any particular parameter $\beta_j$ the statistic

$$\frac{\hat\beta_j - \beta_j}{\mathrm{s.e.}(\hat\beta_j)}$$

has a $t_{n-k}$ distribution.

So we can use the ratio in t-tests of $H_0: \beta_j = 0$.

Note: the formula for $\mathrm{s.e.}(\hat\beta_j)$ is more complicated than in the bivariate case and is best expressed in matrix terms.
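
Once an estimate and its standard error are available, the test itself is simple arithmetic. The sketch below uses hypothetical numbers (an estimate of 1.5 with standard error 0.5, and n - k = 17 degrees of freedom); the function names are illustrative, and the critical value 2.110 is the tabulated two-sided 5% point of the t distribution with 17 degrees of freedom.

```python
def t_ratio(beta_hat_j: float, se_j: float, beta_null: float = 0.0) -> float:
    """t statistic for H0: beta_j = beta_null."""
    if se_j <= 0:
        raise ValueError("standard error must be positive")
    return (beta_hat_j - beta_null) / se_j


def reject_two_sided(t_value: float, critical_value: float) -> bool:
    """Reject H0 when |t| exceeds the critical value taken from t tables."""
    return abs(t_value) > critical_value


# Hypothetical estimate and standard error.
t_value = t_ratio(beta_hat_j=1.5, se_j=0.5)

# Tabulated two-sided 5% critical value for 17 degrees of freedom.
critical_5pct_df17 = 2.110
decision = reject_two_sided(t_value, critical_5pct_df17)

print(t_value, decision)
```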

Tests of Significance of the Individual Parameter Estimates
As before, an estimate of $\sigma_u^2$ is required.

We use the formula

$$\hat\sigma_u^2 = \frac{\sum \hat{u}_i^2}{n - k} = \frac{RSS}{n - k}$$

Here k stands for the number of independent variables including the constant (intercept), so there are (n-k) degrees of freedom available to us.

Testing the Overall Significance of the Regression.
Analysis of Variance (F-test)
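$H_0: \beta_2 = \dots = \beta_k = 0$
$H_1$: not all the parameters are 0

Under $H_0$ the ratio

$$\frac{\sum (\hat{Y}_i - \bar{Y})^2 / (k - 1)}{\sum \hat{u}_i^2 / (n - k)} = \frac{ESS / (k - 1)}{RSS / (n - k)} \sim F_{k-1,\,n-k}$$

So if $F_{cal} > F_{table}$, reject $H_0$.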

R squared and the F-value

Note: we can also calculate an $F_{cal}$ value from the value of R squared:

$$F_{cal} = \frac{R^2 / (k - 1)}{(1 - R^2) / (n - k)} \sim F_{k-1,\,n-k}$$
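
The equivalence of the two F formulas can be checked with a small numerical sketch. The sums of squares, n and k below are hypothetical values chosen so the arithmetic is easy to follow, and the function names are illustrative only.

```python
import math


def f_from_sums(ess: float, rss: float, n: int, k: int) -> float:
    """F statistic from explained and residual sums of squares (k includes the intercept)."""
    return (ess / (k - 1)) / (rss / (n - k))


def f_from_r2(r2: float, n: int, k: int) -> float:
    """The same F statistic computed from R squared."""
    return (r2 / (k - 1)) / ((1.0 - r2) / (n - k))


# Hypothetical values: ESS = 80, RSS = 20, so TSS = 100 and R squared = 0.8.
ess, rss, n, k = 80.0, 20.0, 23, 3
r2 = ess / (ess + rss)

f_sums = f_from_sums(ess, rss, n, k)  # (80 / 2) / (20 / 20)
f_r2 = f_from_r2(r2, n, k)

print(f_sums, f_r2)
```

Both routes give the same statistic because dividing the numerator and the denominator of the first ratio by TSS turns ESS into $R^2$ and RSS into $1 - R^2$.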
2

Practical illustrations

Example: Big Andy’s Hamburger Sales

• The andy dataset includes the variables

- sales, which is monthly revenue to the company in $1000s,

- price, which is a price index of all products sold by Big Andy’s, and

- advert, the advertising expenditure in a given month, in $1000s.

Practical illustrations

Example: Big Andy’s Hamburger Sales

• Summary statistics for the andy dataset are shown in Table 3.1.

• The basic andy model is presented in the equation below:

$$\mathit{sales} = \beta_1 + \beta_2\, \mathit{price} + \beta_3\, \mathit{advert} + u$$
