
University of Agricultural Sciences and Veterinary Medicine Cluj-Napoca

- Agribusiness MSc -

Lecture 5. Two-variable regression analysis:


The problem of estimation – part 2

Course: Econometrics
Instructor: Diana Dumitras

Fall 2013
I. Assumptions under OLS
(The Classical Linear Regression Model (CLRM) – Gauss)

A1) The regression model is linear in parameters

$Y_i = \beta_1 + \beta_2 X_i + u_i$

A2) X values are fixed in repeated sampling


= X is assumed to be nonstochastic

A3) Zero mean value of disturbance ui

$E[u_i \mid X_i] = 0$

$E[Y_i \mid X_i] = \beta_1 + \beta_2 X_i$
Assumptions under OLS

A4) Homoscedasticity or equal variance of ui


= variation around the regression line is the same
across the X values
$\mathrm{Var}[u_i \mid X_i] = \sigma^2$

$\mathrm{Var}[Y_i \mid X_i] = \sigma^2$

Proof:

$\mathrm{Var}[u_i \mid X_i] = E\{u_i - E[u_i \mid X_i]\}^2 = E[u_i^2 \mid X_i] = \sigma^2$

$\mathrm{Var}[Y_i \mid X_i] = E\{Y_i - E[Y_i \mid X_i]\}^2 = E[(\beta_1 + \beta_2 X_i + u_i) - (\beta_1 + \beta_2 X_i)]^2 = E(u_i^2) = \sigma^2$
Homoscedasticity vs. Heteroscedasticity

Homoscedasticity: $\mathrm{Var}[u_i \mid X_i] = \sigma^2$ $\qquad$ Heteroscedasticity: $\mathrm{Var}[u_i \mid X_i] = \sigma_i^2$

Homoscedasticity: $\mathrm{Var}[Y_i \mid X_i] = \sigma^2$ $\qquad$ Heteroscedasticity: $\mathrm{Var}[Y_i \mid X_i] = \sigma_i^2$

[Figure: conditional distribution of the disturbances f(u) around the PRF — constant variance σ² across the X values under homoscedasticity (left panel) vs. variances σ₁², σ₂², σ₃² that change with X under heteroscedasticity (right panel)]
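The difference is easy to see in a small simulation. The sketch below (illustrative only, not part of the original slides) draws disturbances with a constant variance and with a variance that grows with X, around an assumed PRF $Y_i = 2 + 0.5 X_i$; all numerical values are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.linspace(1, 50, 100)

# Homoscedasticity: every disturbance has the same variance sigma^2 (assumed sigma = 2)
sigma = 2.0
u_homo = rng.normal(0.0, sigma, size=X.size)

# Heteroscedasticity: the variance sigma_i^2 changes with X (here sd = 0.1 * X_i, an assumed pattern)
u_hetero = rng.normal(0.0, 0.1 * X)

# Disturbances around an assumed PRF: Y_i = 2 + 0.5 X_i + u_i
Y_homo = 2.0 + 0.5 * X + u_homo
Y_hetero = 2.0 + 0.5 * X + u_hetero

# The spread of the simulated u_i is roughly constant in the first case
# and clearly larger for large X in the second.
print(np.std(u_homo[:50]), np.std(u_homo[50:]))      # similar values
print(np.std(u_hetero[:50]), np.std(u_hetero[50:]))  # second value much larger
```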
Assumptions under OLS

A5) No autocorrelation btw. the disturbances (no serial correlation)


= given any two X values, Xi and Xj ,
the correlation btw. any two ui and uj is zero

$\mathrm{Cov}[u_i, u_j \mid X_i, X_j] = 0, \quad i \neq j$

Proof:

$\mathrm{Cov}[u_i, u_j \mid X_i, X_j] = E\{[u_i - E(u_i)] \mid X_i\}\{[u_j - E(u_j)] \mid X_j\} = E[u_i \mid X_i]\, E[u_j \mid X_j] = 0$
Patterns of correlation among the disturbances:

[Figure: scatter plots of $u_i$ against $u_j$ — positive serial correlation, negative serial correlation, and zero correlation]
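These patterns can be reproduced with a short simulation. The sketch below (illustrative, not from the slides) compares i.i.d. disturbances with AR(1) disturbances using an assumed coefficient ρ = 0.8; a negative ρ would give the negative serial correlation pattern.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500

# Serially uncorrelated disturbances (what assumption A5 requires)
u_iid = rng.normal(0.0, 1.0, size=n)

# Positively autocorrelated disturbances: u_i = rho * u_{i-1} + e_i (AR(1), assumed rho = 0.8)
rho, u_ar = 0.8, np.zeros(n)
e = rng.normal(0.0, 1.0, size=n)
for i in range(1, n):
    u_ar[i] = rho * u_ar[i - 1] + e[i]

def lag1_corr(u):
    """Sample correlation between u_i and u_{i-1}."""
    return np.corrcoef(u[1:], u[:-1])[0, 1]

print(lag1_corr(u_iid))  # close to 0  -> zero correlation pattern
print(lag1_corr(u_ar))   # close to 0.8 -> positive serial correlation pattern
```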
Assumptions under OLS

A6) Zero covariance btw. ui and Xi


= the disturbance term and the explanatory variables
are uncorrelated

$\mathrm{Cov}[u_i, X_i] = 0$ or $E(u_i X_i) = 0$

Proof:

$\mathrm{Cov}[u_i, X_i] = E\{[u_i - E(u_i)][X_i - E(X_i)]\} = E\{u_i [X_i - E(X_i)]\} = E(u_i X_i) - E(X_i) E(u_i) = E(u_i X_i) = 0$
Assumptions under OLS

A7) The number of observations n must be greater than


the number of parameters to be estimated

A8) Variability in X values


- the sample variance of X must be a finite positive number

$\mathrm{Var}(X) = \dfrac{\sum (X_i - \bar{X})^2}{n - 1}$

A9) The regression model is correctly specified

A10) There is no perfect multicollinearity


= no perfect linear relationships among
the explanatory variables
- relevant in the case of multiple regression models
II. OLS Standard Errors

The estimates change from sample to sample

- need: a measure of the precision of the estimators

Standard error (s.e.)


= standard deviation of the sampling distribution
of the estimator

$\mathrm{Var}(\hat{\beta}_1) = \dfrac{\sum X_i^2}{n \sum x_i^2}\,\sigma^2, \qquad se(\hat{\beta}_1) = \sqrt{\mathrm{Var}(\hat{\beta}_1)}$

$\mathrm{Var}(\hat{\beta}_2) = \dfrac{\sigma^2}{\sum x_i^2}, \qquad se(\hat{\beta}_2) = \sqrt{\mathrm{Var}(\hat{\beta}_2)}$
OLS Standard Errors

How to obtain the variance of ui ?

$\hat{\sigma}^2 = \dfrac{\sum \hat{u}_i^2}{n - 2}$, where $\hat{\sigma}^2$ = OLS estimator of $\sigma^2$

$\sum \hat{u}_i^2$ = the sum of the squared residuals
(residual sum of squares)

n 2 = no. of degrees of freedom (df)


df = no. of observations in the sample -
no. of parameters estimated

$\sum \hat{u}_i^2 = \sum (Y_i - \hat{Y}_i)^2 = \sum (Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i)^2$ — eq. (1)

$\sum \hat{u}_i^2 = \sum y_i^2 - \hat{\beta}_2^2 \sum x_i^2$
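As a worked illustration, the sketch below applies these formulas to a small made-up data set (the numbers are hypothetical, chosen only for the example) using NumPy.

```python
import numpy as np

# Illustrative data (made up for this sketch)
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
Y = np.array([2.1, 2.9, 3.8, 4.2, 5.1, 5.8, 6.9, 7.5])
n = X.size

# Deviations from the sample means (the lower-case x_i, y_i of the slides)
x = X - X.mean()
y = Y - Y.mean()

# OLS estimates of the two parameters
beta2_hat = np.sum(x * y) / np.sum(x**2)     # slope
beta1_hat = Y.mean() - beta2_hat * X.mean()  # intercept

# Residuals and the OLS estimator of sigma^2 (n - 2 degrees of freedom)
u_hat = Y - (beta1_hat + beta2_hat * X)
sigma2_hat = np.sum(u_hat**2) / (n - 2)

# Variances and standard errors of the estimators
var_beta2 = sigma2_hat / np.sum(x**2)
var_beta1 = sigma2_hat * np.sum(X**2) / (n * np.sum(x**2))
se_beta2, se_beta1 = np.sqrt(var_beta2), np.sqrt(var_beta1)

print(beta1_hat, se_beta1)
print(beta2_hat, se_beta2)
```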
OLS Standard Errors

Note:

$\mathrm{Var}(\hat{\beta}_2) = \dfrac{\sigma^2}{\sum x_i^2} \qquad \mathrm{Var}(\hat{\beta}_1) = \dfrac{\sum X_i^2}{n \sum x_i^2}\,\sigma^2$

- the larger the variation in the X values,

the smaller the variance of the parameter,
and the greater the precision with which the parameter is estimated

- the larger $\sigma^2$ (the variance of the disturbance $u_i$),

the larger the variance of the parameter

- as n increases, the number of terms in the sum $\sum x_i^2$ increases,

so the precision with which the parameter is estimated increases
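A minimal simulation can illustrate the first point: holding σ² and n fixed, a wider spread in the X values gives a smaller standard error of the slope. The data-generating values below (β₁ = 1, β₂ = 0.5, σ = 1) are assumptions for the sketch only.

```python
import numpy as np

rng = np.random.default_rng(1)

def se_slope(X, sigma=1.0, beta1=1.0, beta2=0.5):
    """Simulate Y = beta1 + beta2*X + u and return the estimated se of the slope."""
    Y = beta1 + beta2 * X + rng.normal(0.0, sigma, size=X.size)
    x, y = X - X.mean(), Y - Y.mean()
    b2 = np.sum(x * y) / np.sum(x**2)
    b1 = Y.mean() - b2 * X.mean()
    u_hat = Y - (b1 + b2 * X)
    sigma2_hat = np.sum(u_hat**2) / (X.size - 2)
    return np.sqrt(sigma2_hat / np.sum(x**2))

X_narrow = np.linspace(10, 12, 50)   # little variation in X
X_wide = np.linspace(0, 40, 50)      # much more variation in X

print(se_slope(X_narrow))  # larger standard error
print(se_slope(X_wide))    # smaller standard error
```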
III. The Gauss-Markov Theorem

Best linear unbiased estimator (BLUE):

- an estimator is said to be BLUE if:

1. it is linear
= is a linear function of a random variable

2. it is unbiased
= its average is equal to the true value

$E(\hat{\beta}_2) = \beta_2$

3. it has minimum variance in the class of all such linear


unbiased estimators
= it is an efficient estimator
The Gauss-Markov Theorem

The Gauss-Markov Theorem:


Given the assumptions of the CLRM, the LS estimators,
in the class of unbiased linear estimators,
have minimum variance, that is, they are BLUE

OLS estimators are BLUE

BLUE = the smallest variance among all linear unbiased estimators
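A small Monte Carlo sketch (not from the slides) can make the theorem concrete: with X fixed in repeated sampling, both the OLS slope and an alternative linear unbiased estimator — here the slope through the first and last observations, chosen only for illustration — centre on the true β₂, but OLS has the smaller sampling variance. The true values β₁ = 1, β₂ = 0.5, σ = 2 are assumptions for the simulation.

```python
import numpy as np

rng = np.random.default_rng(2)
X = np.linspace(1, 20, 20)           # fixed in repeated sampling (assumption A2)
beta1, beta2, sigma = 1.0, 0.5, 2.0  # assumed true values

ols, endpoints = [], []
for _ in range(5000):                # repeated samples with the same X values
    Y = beta1 + beta2 * X + rng.normal(0.0, sigma, size=X.size)
    x, y = X - X.mean(), Y - Y.mean()
    ols.append(np.sum(x * y) / np.sum(x**2))            # OLS slope estimator
    endpoints.append((Y[-1] - Y[0]) / (X[-1] - X[0]))    # alternative linear unbiased estimator

# Both estimators are (approximately) unbiased ...
print(np.mean(ols), np.mean(endpoints))   # both close to 0.5
# ... but the OLS estimator has the smaller sampling variance (it is BLUE)
print(np.var(ols), np.var(endpoints))
```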


IV. The coefficient of determination r2

The coefficient of determination r2:


= measure that tells how well the sample regression line fits the data
= measures the “goodness of fit”

Venn diagram (The Ballentine view)

[Figure: three Venn diagrams of circles Y and X — no overlap (r² = 0), partial overlap (0 ≤ r² ≤ 1), complete overlap Y = X (r² = 1)]
Circle Y = variation in the dependent variable Y
Circle X = variation in the explanatory variable X
Shaded area = the extent to which the variation in Y
is explained by the variation in X
Computation of r2 :

- recall: $Y_i = \hat{Y}_i + \hat{u}_i$

or, in deviation form:

$y_i = \hat{y}_i + \hat{u}_i$

- squaring and summing over the sample:

$\sum y_i^2 = \sum \hat{y}_i^2 + \sum \hat{u}_i^2 + 2\sum \hat{y}_i \hat{u}_i$

and since $\sum \hat{y}_i \hat{u}_i = 0$:

$\sum y_i^2 = \sum \hat{y}_i^2 + \sum \hat{u}_i^2$

$\mathrm{TSS} = \mathrm{ESS} + \mathrm{RSS}$


Breakdown of the variation of Yi into two components

$\mathrm{TSS} = \mathrm{ESS} + \mathrm{RSS}$


[Figure: at a given $X_i$, the total deviation $(Y_i - \bar{Y})$ is split into the part due to regression, $(\hat{Y}_i - \bar{Y})$, measured along the SRF, and the residual $\hat{u}_i = Y_i - \hat{Y}_i$]
Breakdown of the variation of Yi into two components

Total sum of squares (TSS)


= total variation of actual Y values about their sample mean

$\mathrm{TSS} = \sum y_i^2 = \sum (Y_i - \bar{Y})^2$

Explained sum of squares (ESS)


= variation of the estimated Y values about their sample mean

$\mathrm{ESS} = \sum \hat{y}_i^2 = \sum (\hat{Y}_i - \bar{Y})^2$

Residual sum of squares (RSS)


= variation of Y values about their regression line

$\mathrm{RSS} = \sum \hat{u}_i^2 = \sum (Y_i - \hat{Y}_i)^2$


The coefficient of determination

- recall: $\mathrm{TSS} = \mathrm{ESS} + \mathrm{RSS}$


- dividing by TSS on both sides:

$1 = \dfrac{\mathrm{ESS}}{\mathrm{TSS}} + \dfrac{\mathrm{RSS}}{\mathrm{TSS}}$

- the coefficient of determination is:

$r^2 = \dfrac{\mathrm{ESS}}{\mathrm{TSS}} = \dfrac{\sum (\hat{Y}_i - \bar{Y})^2}{\sum (Y_i - \bar{Y})^2} \quad\text{or}\quad r^2 = 1 - \dfrac{\mathrm{RSS}}{\mathrm{TSS}}$
- or, a much easier way to compute it (see Gujarati, p. 85):

$r^2 = \dfrac{\left(\sum x_i y_i\right)^2}{\sum x_i^2 \sum y_i^2}, \qquad 0 \leq r^2 \leq 1$
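The sketch below (reusing the same kind of made-up data as in the earlier sketch) computes TSS, ESS and RSS for the fitted line and checks that ESS/TSS, 1 − RSS/TSS and the "easier" formula all give the same r².

```python
import numpy as np

# Illustrative data (made up for this sketch)
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
Y = np.array([2.1, 2.9, 3.8, 4.2, 5.1, 5.8, 6.9, 7.5])

x, y = X - X.mean(), Y - Y.mean()
beta2_hat = np.sum(x * y) / np.sum(x**2)
beta1_hat = Y.mean() - beta2_hat * X.mean()
Y_hat = beta1_hat + beta2_hat * X

TSS = np.sum((Y - Y.mean())**2)      # total variation of Y about its mean
ESS = np.sum((Y_hat - Y.mean())**2)  # variation explained by the regression
RSS = np.sum((Y - Y_hat)**2)         # unexplained (residual) variation

r2_a = ESS / TSS
r2_b = 1 - RSS / TSS
r2_c = np.sum(x * y)**2 / (np.sum(x**2) * np.sum(y**2))  # the "easier" formula
print(r2_a, r2_b, r2_c)              # all three agree
```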
The coefficient of correlation
= measure of the degree of association btw. two variables

$r = \pm\sqrt{r^2} \qquad\text{or}\qquad r = \dfrac{\sum x_i y_i}{\sqrt{\sum x_i^2 \sum y_i^2}}, \qquad -1 \leq r \leq 1$
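A short check of this formula, on the same hypothetical data used above, also shows the value agrees with NumPy's built-in correlation and is the same whichever variable is listed first (the symmetry property below).

```python
import numpy as np

# Same made-up data as in the earlier sketches
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
Y = np.array([2.1, 2.9, 3.8, 4.2, 5.1, 5.8, 6.9, 7.5])

x, y = X - X.mean(), Y - Y.mean()
r = np.sum(x * y) / np.sqrt(np.sum(x**2) * np.sum(y**2))

# NumPy's correlation matrix gives the same value, and r_XY = r_YX
print(r, np.corrcoef(X, Y)[0, 1], np.corrcoef(Y, X)[0, 1])
```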
Properties:
• it is symmetrical: rXY= rYX

• it is independent of the origin and scale

• if X, Y are statistically independent: rXY= 0

• it is a measure of linear association only

• it does not necessarily imply any cause-and-effect relationship
