
University of Agricultural Sciences and Veterinary Medicine Cluj-Napoca

- Agribusiness MSc -

Lecture 5. Two-variable regression analysis:


The problem of estimation – part 2

Course: Econometrics
Instructor: Diana Dumitras

Fall 2013
I. Assumptions under OLS
(The Classical Linear Regression Model (CLRM) – Gauss)

A1) The regression model is linear in parameters

$Y_i = \beta_1 + \beta_2 X_i + u_i$

A2) X values are fixed in repeated sampling


= X is assumed to be nonstochastic

A3) Zero mean value of disturbance ui

$E[u_i \mid X_i] = 0$

$E[Y_i \mid X_i] = \beta_1 + \beta_2 X_i$
Assumptions under OLS

A4) Homoscedasticity or equal variance of ui


= variation around the regression line is the same
across the X values
$\mathrm{Var}[u_i \mid X_i] = \sigma^2$

$\mathrm{Var}[Y_i \mid X_i] = \sigma^2$

Proof:

$\mathrm{Var}[u_i \mid X_i] = E\{u_i - E[u_i \mid X_i]\}^2 = E[u_i^2 \mid X_i] = \sigma^2$

$\mathrm{Var}[Y_i \mid X_i] = E\{Y_i - E[Y_i \mid X_i]\}^2 = E[(\beta_1 + \beta_2 X_i + u_i) - (\beta_1 + \beta_2 X_i)]^2 = E(u_i^2) = \sigma^2$
Homoscedasticity vs. Heteroscedasticity

Homoscedasticity: $\mathrm{Var}[u_i \mid X_i] = \sigma^2$ $\qquad$ Heteroscedasticity: $\mathrm{Var}[u_i \mid X_i] = \sigma_i^2$

Homoscedasticity: $\mathrm{Var}[Y_i \mid X_i] = \sigma^2$ $\qquad$ Heteroscedasticity: $\mathrm{Var}[Y_i \mid X_i] = \sigma_i^2$

[Figure: conditional distribution of the disturbances f(u) around the PRF — constant variance σ² across the X values under homoscedasticity (left panel) vs. variances σ₁², σ₂², σ₃² that change with X under heteroscedasticity (right panel)]
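The difference is easy to see in a small simulation. The sketch below (illustrative only, not part of the original slides) draws disturbances with a constant variance and with a variance that grows with X, around an assumed PRF $Y_i = 2 + 0.5 X_i$; all numerical values are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.linspace(1, 50, 100)

# Homoscedasticity: every disturbance has the same variance sigma^2 (assumed sigma = 2)
sigma = 2.0
u_homo = rng.normal(0.0, sigma, size=X.size)

# Heteroscedasticity: the variance sigma_i^2 changes with X (here sd = 0.1 * X_i, an assumed pattern)
u_hetero = rng.normal(0.0, 0.1 * X)

# Disturbances around an assumed PRF: Y_i = 2 + 0.5 X_i + u_i
Y_homo = 2.0 + 0.5 * X + u_homo
Y_hetero = 2.0 + 0.5 * X + u_hetero

# The spread of the simulated u_i is roughly constant in the first case
# and clearly larger for large X in the second.
print(np.std(u_homo[:50]), np.std(u_homo[50:]))      # similar values
print(np.std(u_hetero[:50]), np.std(u_hetero[50:]))  # second value much larger
```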
Assumptions under OLS

A5) No autocorrelation btw. the disturbances (no serial correlation)


= given any two X values, Xi and Xj ,
the correlation btw. any two ui and uj is zero

$\mathrm{Cov}[u_i, u_j \mid X_i, X_j] = 0, \quad i \neq j$

Proof:

$\mathrm{Cov}[u_i, u_j \mid X_i, X_j] = E\{[u_i - E(u_i)] \mid X_i\}\{[u_j - E(u_j)] \mid X_j\} = E[u_i \mid X_i]\, E[u_j \mid X_j] = 0$
Patterns of correlation among the disturbances:

[Figure: scatter plots of $u_i$ against $u_j$ — positive serial correlation, negative serial correlation, and zero correlation]
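These patterns can be reproduced with a short simulation. The sketch below (illustrative, not from the slides) compares i.i.d. disturbances with AR(1) disturbances using an assumed coefficient ρ = 0.8; a negative ρ would give the negative serial correlation pattern.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500

# Serially uncorrelated disturbances (what assumption A5 requires)
u_iid = rng.normal(0.0, 1.0, size=n)

# Positively autocorrelated disturbances: u_i = rho * u_{i-1} + e_i (AR(1), assumed rho = 0.8)
rho, u_ar = 0.8, np.zeros(n)
e = rng.normal(0.0, 1.0, size=n)
for i in range(1, n):
    u_ar[i] = rho * u_ar[i - 1] + e[i]

def lag1_corr(u):
    """Sample correlation between u_i and u_{i-1}."""
    return np.corrcoef(u[1:], u[:-1])[0, 1]

print(lag1_corr(u_iid))  # close to 0  -> zero correlation pattern
print(lag1_corr(u_ar))   # close to 0.8 -> positive serial correlation pattern
```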
Assumptions under OLS

A6) Zero covariance btw. ui and Xi


= the disturbance term and the explanatory variables
are uncorrelated

$\mathrm{Cov}[u_i, X_i] = 0$ or $E(u_i X_i) = 0$

Proof:

$\mathrm{Cov}[u_i, X_i] = E\{[u_i - E(u_i)][X_i - E(X_i)]\} = E\{u_i [X_i - E(X_i)]\} = E(u_i X_i) - E(X_i) E(u_i) = E(u_i X_i) = 0$
Assumptions under OLS

A7) The number of observations n must be greater than


the number of parameters to be estimated

A8) Variability in X values


- the sample variance of X must be a finite positive number

$\mathrm{Var}(X) = \dfrac{\sum (X_i - \bar{X})^2}{n - 1}$

A9) The regression model is correctly specified

A10) There is no perfect multicollinearity


= no perfect linear relationships among
the explanatory variables
- relevant in the case of multiple regression models
II. OLS Standard Errors

The estimates change from sample to sample

- need: a measure of the precision of the estimators

Standard error (s.e.)


= standard deviation of the sampling distribution
of the estimator

$\mathrm{Var}(\hat{\beta}_1) = \dfrac{\sum X_i^2}{n \sum x_i^2}\,\sigma^2, \qquad se(\hat{\beta}_1) = \sqrt{\mathrm{Var}(\hat{\beta}_1)}$

$\mathrm{Var}(\hat{\beta}_2) = \dfrac{\sigma^2}{\sum x_i^2}, \qquad se(\hat{\beta}_2) = \sqrt{\mathrm{Var}(\hat{\beta}_2)}$
OLS Standard Errors

How to obtain the variance of ui ?

$\hat{\sigma}^2 = \dfrac{\sum \hat{u}_i^2}{n - 2}$, where $\hat{\sigma}^2$ = OLS estimator of $\sigma^2$

$\sum \hat{u}_i^2$ = the sum of the squared residuals
(residual sum of squares)

n 2 = no. of degrees of freedom (df)


df = no. of observations in the sample -
no. of parameters estimated

$\sum \hat{u}_i^2 = \sum (Y_i - \hat{Y}_i)^2 = \sum (Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i)^2$ — eq. (1)

$\sum \hat{u}_i^2 = \sum y_i^2 - \hat{\beta}_2^2 \sum x_i^2$
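As a worked illustration, the sketch below applies these formulas to a small made-up data set (the numbers are hypothetical, chosen only for the example) using NumPy.

```python
import numpy as np

# Illustrative data (made up for this sketch)
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
Y = np.array([2.1, 2.9, 3.8, 4.2, 5.1, 5.8, 6.9, 7.5])
n = X.size

# Deviations from the sample means (the lower-case x_i, y_i of the slides)
x = X - X.mean()
y = Y - Y.mean()

# OLS estimates of the two parameters
beta2_hat = np.sum(x * y) / np.sum(x**2)     # slope
beta1_hat = Y.mean() - beta2_hat * X.mean()  # intercept

# Residuals and the OLS estimator of sigma^2 (n - 2 degrees of freedom)
u_hat = Y - (beta1_hat + beta2_hat * X)
sigma2_hat = np.sum(u_hat**2) / (n - 2)

# Variances and standard errors of the estimators
var_beta2 = sigma2_hat / np.sum(x**2)
var_beta1 = sigma2_hat * np.sum(X**2) / (n * np.sum(x**2))
se_beta2, se_beta1 = np.sqrt(var_beta2), np.sqrt(var_beta1)

print(beta1_hat, se_beta1)
print(beta2_hat, se_beta2)
```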
OLS Standard Errors

Note:

$\mathrm{Var}(\hat{\beta}_2) = \dfrac{\sigma^2}{\sum x_i^2} \qquad \mathrm{Var}(\hat{\beta}_1) = \dfrac{\sum X_i^2}{n \sum x_i^2}\,\sigma^2$

- the larger the variation in the X values,

the smaller the variance of the parameter,
and the greater the precision with which the parameter is estimated

- the larger $\sigma^2$ (the variance of the disturbance $u_i$),

the larger the variance of the parameter

- as n increases, the number of terms in the sum $\sum x_i^2$ increases,

so the precision with which the parameter is estimated increases
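A minimal simulation can illustrate the first point: holding σ² and n fixed, a wider spread in the X values gives a smaller standard error of the slope. The data-generating values below (β₁ = 1, β₂ = 0.5, σ = 1) are assumptions for the sketch only.

```python
import numpy as np

rng = np.random.default_rng(1)

def se_slope(X, sigma=1.0, beta1=1.0, beta2=0.5):
    """Simulate Y = beta1 + beta2*X + u and return the estimated se of the slope."""
    Y = beta1 + beta2 * X + rng.normal(0.0, sigma, size=X.size)
    x, y = X - X.mean(), Y - Y.mean()
    b2 = np.sum(x * y) / np.sum(x**2)
    b1 = Y.mean() - b2 * X.mean()
    u_hat = Y - (b1 + b2 * X)
    sigma2_hat = np.sum(u_hat**2) / (X.size - 2)
    return np.sqrt(sigma2_hat / np.sum(x**2))

X_narrow = np.linspace(10, 12, 50)   # little variation in X
X_wide = np.linspace(0, 40, 50)      # much more variation in X

print(se_slope(X_narrow))  # larger standard error
print(se_slope(X_wide))    # smaller standard error
```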
III. The Gauss-Markov Theorem

Best linear unbiased estimator (BLUE):

- an estimator is said to be BLUE if:

1. it is linear
= is a linear function of a random variable

2. it is unbiased
= its average is equal to the true value

$E(\hat{\beta}_2) = \beta_2$

3. it has minimum variance in the class of all such linear


unbiased estimators
= it is an efficient estimator
The Gauss-Markov Theorem

The Gauss-Markov Theorem:


Given the assumptions of the CLRM, the LS estimators,
in the class of unbiased linear estimators,
have minimum variance, that is, they are BLUE

OLS estimators are BLUE

BLUE = the smallest variance among all linear unbiased estimators
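A small Monte Carlo sketch (not from the slides) can make the theorem concrete: with X fixed in repeated sampling, both the OLS slope and an alternative linear unbiased estimator — here the slope through the first and last observations, chosen only for illustration — centre on the true β₂, but OLS has the smaller sampling variance. The true values β₁ = 1, β₂ = 0.5, σ = 2 are assumptions for the simulation.

```python
import numpy as np

rng = np.random.default_rng(2)
X = np.linspace(1, 20, 20)           # fixed in repeated sampling (assumption A2)
beta1, beta2, sigma = 1.0, 0.5, 2.0  # assumed true values

ols, endpoints = [], []
for _ in range(5000):                # repeated samples with the same X values
    Y = beta1 + beta2 * X + rng.normal(0.0, sigma, size=X.size)
    x, y = X - X.mean(), Y - Y.mean()
    ols.append(np.sum(x * y) / np.sum(x**2))            # OLS slope estimator
    endpoints.append((Y[-1] - Y[0]) / (X[-1] - X[0]))    # alternative linear unbiased estimator

# Both estimators are (approximately) unbiased ...
print(np.mean(ols), np.mean(endpoints))   # both close to 0.5
# ... but the OLS estimator has the smaller sampling variance (it is BLUE)
print(np.var(ols), np.var(endpoints))
```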


IV. The coefficient of determination r2

The coefficient of determination r2:


= measure that tells how well the sample regression line fits the data
= measures the “goodness of fit”

Venn diagram (The Ballentine view)

[Figure: three Venn diagrams of circles Y and X — no overlap (r² = 0), partial overlap (0 ≤ r² ≤ 1), complete overlap Y = X (r² = 1)]
Circle Y = variation in the dependent variable Y
Circle X = variation in the explanatory variable X
Shaded area = the extent to which the variation in Y
is explained by the variation in X
Computation of r2 :

- recall: $Y_i = \hat{Y}_i + \hat{u}_i$

or, in deviation form:

$y_i = \hat{y}_i + \hat{u}_i$

- squaring and summing over the sample:

$\sum y_i^2 = \sum \hat{y}_i^2 + \sum \hat{u}_i^2 + 2\sum \hat{y}_i \hat{u}_i$

and since $\sum \hat{y}_i \hat{u}_i = 0$:

$\sum y_i^2 = \sum \hat{y}_i^2 + \sum \hat{u}_i^2$

$\mathrm{TSS} = \mathrm{ESS} + \mathrm{RSS}$


Breakdown of the variation of Yi into two components

$\mathrm{TSS} = \mathrm{ESS} + \mathrm{RSS}$


[Figure: at a given $X_i$, the total deviation $(Y_i - \bar{Y})$ is split into the part due to regression, $(\hat{Y}_i - \bar{Y})$, measured along the SRF, and the residual $\hat{u}_i = Y_i - \hat{Y}_i$]
Breakdown of the variation of Yi into two components

Total sum of squares (TSS)


= total variation of actual Y values about their sample mean

$\mathrm{TSS} = \sum y_i^2 = \sum (Y_i - \bar{Y})^2$

Explained sum of squares (ESS)


= variation of the estimated Y values about their sample mean

$\mathrm{ESS} = \sum \hat{y}_i^2 = \sum (\hat{Y}_i - \bar{Y})^2$

Residual sum of squares (RSS)


= variation of Y values about their regression line

$\mathrm{RSS} = \sum \hat{u}_i^2 = \sum (Y_i - \hat{Y}_i)^2$


The coefficient of determination

- recall: $\mathrm{TSS} = \mathrm{ESS} + \mathrm{RSS}$


- dividing by TSS on both sides:

$1 = \dfrac{\mathrm{ESS}}{\mathrm{TSS}} + \dfrac{\mathrm{RSS}}{\mathrm{TSS}}$

- the coefficient of determination is:

$r^2 = \dfrac{\mathrm{ESS}}{\mathrm{TSS}} = \dfrac{\sum (\hat{Y}_i - \bar{Y})^2}{\sum (Y_i - \bar{Y})^2} \quad\text{or}\quad r^2 = 1 - \dfrac{\mathrm{RSS}}{\mathrm{TSS}}$
- or, a much easier way to compute it (see Gujarati, p. 85):

$r^2 = \dfrac{\left(\sum x_i y_i\right)^2}{\sum x_i^2 \sum y_i^2}, \qquad 0 \leq r^2 \leq 1$
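The sketch below (reusing the same kind of made-up data as in the earlier sketch) computes TSS, ESS and RSS for the fitted line and checks that ESS/TSS, 1 − RSS/TSS and the "easier" formula all give the same r².

```python
import numpy as np

# Illustrative data (made up for this sketch)
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
Y = np.array([2.1, 2.9, 3.8, 4.2, 5.1, 5.8, 6.9, 7.5])

x, y = X - X.mean(), Y - Y.mean()
beta2_hat = np.sum(x * y) / np.sum(x**2)
beta1_hat = Y.mean() - beta2_hat * X.mean()
Y_hat = beta1_hat + beta2_hat * X

TSS = np.sum((Y - Y.mean())**2)      # total variation of Y about its mean
ESS = np.sum((Y_hat - Y.mean())**2)  # variation explained by the regression
RSS = np.sum((Y - Y_hat)**2)         # unexplained (residual) variation

r2_a = ESS / TSS
r2_b = 1 - RSS / TSS
r2_c = np.sum(x * y)**2 / (np.sum(x**2) * np.sum(y**2))  # the "easier" formula
print(r2_a, r2_b, r2_c)              # all three agree
```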
The coefficient of correlation
= measure of the degree of association btw. two variables

$r = \pm\sqrt{r^2} \qquad\text{or}\qquad r = \dfrac{\sum x_i y_i}{\sqrt{\sum x_i^2 \sum y_i^2}}, \qquad -1 \leq r \leq 1$
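A short check of this formula, on the same hypothetical data used above, also shows the value agrees with NumPy's built-in correlation and is the same whichever variable is listed first (the symmetry property below).

```python
import numpy as np

# Same made-up data as in the earlier sketches
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
Y = np.array([2.1, 2.9, 3.8, 4.2, 5.1, 5.8, 6.9, 7.5])

x, y = X - X.mean(), Y - Y.mean()
r = np.sum(x * y) / np.sqrt(np.sum(x**2) * np.sum(y**2))

# NumPy's correlation matrix gives the same value, and r_XY = r_YX
print(r, np.corrcoef(X, Y)[0, 1], np.corrcoef(Y, X)[0, 1])
```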
Properties:
• it is symmetrical: rXY= rYX

• it is independent of the origin and scale

• if X, Y are statistically independent: rXY= 0

• it is a measure of linear association only

• it does not necessarily imply any cause-and-effect relationship
