Introduction To Econometrics, 5 Edition: Chapter 2: Properties of The Regression Coefficients and Hypothesis Testing

Dougherty
Introduction to Econometrics, 5th edition
Chapter 2: Properties of the Regression Coefficients and Hypothesis Testing
© Christopher Dougherty, 2016. All rights reserved.

RANDOM COMPONENTS, UNBIASEDNESS OF THE REGRESSION COEFFICIENTS
True model: $Y = \beta_1 + \beta_2 X + u$
Fitted model: $\hat{Y} = \hat{\beta}_1 + \hat{\beta}_2 X$
The regression coefficients are special types of random variable. We will demonstrate this
using the simple regression model in which Y depends on X. The two equations show the
true model and the fitted regression.
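
$$
\begin{aligned}
\hat{\beta}_2 &= \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} \\
&= \frac{\sum (X_i - \bar{X})\left([\beta_1 + \beta_2 X_i + u_i] - [\beta_1 + \beta_2 \bar{X} + \bar{u}]\right)}{\sum (X_i - \bar{X})^2} \\
&= \frac{\sum (X_i - \bar{X})\left(\beta_2 [X_i - \bar{X}] + [u_i - \bar{u}]\right)}{\sum (X_i - \bar{X})^2}
\end{aligned}
$$
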
We will investigate the properties of the ordinary least squares (OLS) estimator of the slope
coefficient, shown above.
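
As a minimal sketch, the slope estimator can be computed directly from its definition: the sum of the cross-deviations of X and Y divided by the sum of the squared deviations of X. The function name `ols_slope` and the data are hypothetical; the points are constructed to lie exactly on a line with slope 0.5, so the estimator should recover that value.

```python
def ols_slope(x, y):
    """Sum of cross-deviations divided by the sum of squared deviations of X."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    numerator = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    denominator = sum((xi - x_bar) ** 2 for xi in x)
    return numerator / denominator


# Hypothetical data lying exactly on Y = 2 + 0.5 X (no disturbance term).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0 + 0.5 * xi for xi in x]

slope = ols_slope(x, y)
print(slope)
```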
$Y$ has two components: a nonrandom component that depends on $X$ and the parameters, and the random component $u$. Since $\hat{\beta}_2$ depends on $Y$, it depends indirectly on $u$.
If the values of $u$ in the sample had been different, we would have had different values of $Y$, and hence a different value for $\hat{\beta}_2$. We can in theory decompose $\hat{\beta}_2$ into its nonrandom and random components.
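
The dependence on $u$ can be illustrated with a small sketch. The parameter values, the X values, and the two sets of disturbances below are all hypothetical; only the disturbances differ between the two samples, yet the slope estimates differ.

```python
def ols_slope(x, y):
    """Sum of cross-deviations divided by the sum of squared deviations of X."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    numerator = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    denominator = sum((xi - x_bar) ** 2 for xi in x)
    return numerator / denominator


beta_1, beta_2 = 2.0, 0.5          # hypothetical true parameters
x = [1.0, 2.0, 3.0, 4.0, 5.0]      # the same X values in both samples

# Two hypothetical sets of disturbance values for the same X values.
u_first = [0.3, -0.1, 0.2, -0.4, 0.0]
u_second = [-0.2, 0.5, -0.3, 0.1, 0.4]

y_first = [beta_1 + beta_2 * xi + ui for xi, ui in zip(x, u_first)]
y_second = [beta_1 + beta_2 * xi + ui for xi, ui in zip(x, u_second)]

slope_first = ols_slope(x, y_first)
slope_second = ols_slope(x, y_second)
print(slope_first, slope_second)
```

Neither estimate equals the true slope of 0.5 used to build the data, and the gap in each case is driven entirely by the disturbances.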
The first step is to substitute for Y and its sample mean from the true model.
The $\beta_1$ terms in the second factor cancel. We rearrange the remaining terms.
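
$$
\begin{aligned}
\hat{\beta}_2 &= \frac{\sum (X_i - \bar{X})\left(\beta_2 [X_i - \bar{X}] + [u_i - \bar{u}]\right)}{\sum (X_i - \bar{X})^2} \\
&= \frac{\beta_2 \sum (X_i - \bar{X})^2 + \sum (X_i - \bar{X})(u_i - \bar{u})}{\sum (X_i - \bar{X})^2} \\
&= \beta_2 + \frac{\sum (X_i - \bar{X})(u_i - \bar{u})}{\sum (X_i - \bar{X})^2}
\end{aligned}
$$
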
We expand the numerator.
Hence we decompose $\hat{\beta}_2$ into the true value $\beta_2$ and an error term that depends on the values of $X$ and $u$.
The error term depends on the value of the disturbance term in every observation in the
sample, and thus it is a special type of random variable.
The error term is responsible for the variations of $\hat{\beta}_2$ around its fixed component $\beta_2$. If we wish, we can express the decomposition more tidily.
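
The decomposition can be checked numerically. In this sketch the data are constructed from hypothetical parameter and disturbance values, which is the only reason the disturbances are known; with real data they are unobservable. The slope computed from X and Y should equal the true slope plus the error term computed from X and u.

```python
beta_1, beta_2 = 2.0, 0.5               # hypothetical true parameters
x = [1.0, 2.0, 3.0, 4.0, 5.0]
u = [0.3, -0.1, 0.2, -0.4, 0.0]         # hypothetical disturbance values
y = [beta_1 + beta_2 * xi + ui for xi, ui in zip(x, u)]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
u_bar = sum(u) / n
sum_sq_dev = sum((xi - x_bar) ** 2 for xi in x)

# Slope estimate computed from the observable data (X and Y).
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum_sq_dev

# Error term computed from X and the (normally unobservable) disturbances.
error_term = sum((xi - x_bar) * (ui - u_bar) for xi, ui in zip(x, u)) / sum_sq_dev

print(slope, beta_2 + error_term)
```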

$$\hat{\beta}_2 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} = \beta_2 + \frac{\sum (X_i - \bar{X})(u_i - \bar{u})}{\sum (X_i - \bar{X})^2}$$

This is the decomposition so far.

$$
\begin{aligned}
\sum (X_i - \bar{X})(u_i - \bar{u}) &= \sum (X_i - \bar{X}) u_i - \sum (X_i - \bar{X}) \bar{u} \\
&= \sum (X_i - \bar{X}) u_i - \bar{u} \sum (X_i - \bar{X}) \\
&= \sum (X_i - \bar{X}) u_i
\end{aligned}
$$

The next step is to make a small simplification of the numerator of the error term. First, we
expand it as shown.
The mean value of u is a common factor of the second summation, so it can be taken
outside.

$$\sum (X_i - \bar{X}) = \sum X_i - n\bar{X} = n\bar{X} - n\bar{X} = 0, \qquad \text{since } \bar{X} = \frac{\sum X_i}{n}$$

The second term then vanishes because the sum of the deviations of X around its sample
mean is automatically zero.
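
A quick numerical check of this simplification, using hypothetical values of X and u: the deviations of X from its sample mean sum to zero, so subtracting the mean of u inside the summation makes no difference to the numerator.

```python
x = [1.0, 2.0, 4.0, 7.0, 11.0]        # hypothetical X values
u = [0.3, -0.1, 0.2, -0.4, 0.5]       # hypothetical disturbance values

n = len(x)
x_bar = sum(x) / n
u_bar = sum(u) / n

# Sum of the deviations of X around its sample mean.
sum_dev = sum(xi - x_bar for xi in x)

# Numerator of the error term, with and without subtracting the mean of u.
with_mean = sum((xi - x_bar) * (ui - u_bar) for xi, ui in zip(x, u))
without_mean = sum((xi - x_bar) * ui for xi, ui in zip(x, u))

print(sum_dev, with_mean, without_mean)
```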

$$\sum (X_i - \bar{X})(u_i - \bar{u}) = \sum (X_i - \bar{X}) u_i, \qquad \Delta = \sum (X_j - \bar{X})^2$$

$$
\begin{aligned}
\hat{\beta}_2 &= \beta_2 + \frac{\sum (X_i - \bar{X}) u_i}{\Delta} \\
&= \beta_2 + \frac{1}{\Delta} \sum (X_i - \bar{X}) u_i \\
&= \beta_2 + \sum \frac{1}{\Delta} (X_i - \bar{X}) u_i \\
&= \beta_2 + \sum \left( \frac{X_i - \bar{X}}{\Delta} \right) u_i \\
&= \beta_2 + \sum a_i u_i
\end{aligned}
$$

Thus we can rewrite the decomposition as shown. For convenience, the denominator of the error term has been denoted $\Delta$.
A further small rearrangement of the expression for the error term.
Another re-arrangement.
One more re-arrangement.

$$a_i = \frac{X_i - \bar{X}}{\Delta} = \frac{X_i - \bar{X}}{\sum (X_j - \bar{X})^2}$$

Thus we have shown that $\hat{\beta}_2$ is equal to the true value plus a weighted linear combination of the values of the disturbance term in the sample, where the weights are functions of the values of $X$ in the observations in the sample.
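
The weighted-sum form can be sketched as follows, again with hypothetical parameters and disturbances. Each weight is the deviation of X from its sample mean divided by the sum of squared deviations, so the weights depend only on X; the slope rebuilt from the weights should match the slope computed from the usual formula.

```python
beta_1, beta_2 = 2.0, 0.5               # hypothetical true parameters
x = [1.0, 2.0, 3.0, 4.0, 5.0]
u = [0.3, -0.1, 0.2, -0.4, 0.0]         # hypothetical disturbance values
y = [beta_1 + beta_2 * xi + ui for xi, ui in zip(x, u)]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Denominator: sum of squared deviations, indexed by j in the text.
delta = sum((xj - x_bar) ** 2 for xj in x)

# One weight per observation; none of them involves u or Y.
a = [(xi - x_bar) / delta for xi in x]

slope_from_weights = beta_2 + sum(ai * ui for ai, ui in zip(a, u))
slope_from_formula = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / delta

print(a)
print(slope_from_weights, slope_from_formula)
```

The weights sum to zero, which is another way of stating that the deviations of X around its mean sum to zero.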
As you can see, every value of the disturbance term in the sample affects the sample value of $\hat{\beta}_2$.
Before moving on, it may be helpful to clarify a mathematical technicality. In the summation in the denominator of the expression for $a_i$, the subscript has been changed to $j$. Why?

$$a_i = \frac{X_i - \bar{X}}{\Delta} = \frac{X_i - \bar{X}}{(X_1 - \bar{X})^2 + \dots + (X_n - \bar{X})^2} = \frac{X_i - \bar{X}}{\sum_{j=1}^{n} (X_j - \bar{X})^2}$$

The denominator is the sum, from 1 to n, of the squared deviations of X from its sample
mean. This is made explicit in the version of the expression in the box at the top of the
slide.
Written this way, the meaning of the denominator is clear, but the form is clumsy. Obviously, we should use $\Sigma$-notation to compress it.
For the $\Sigma$-notation, we need to choose an index symbol that changes as we go from the first squared deviation to the last. We can use anything we like, EXCEPT $i$, because we are already using $i$ for a completely different purpose in the numerator.
We have used j here, but this was quite arbitrary. We could have used anything for the
summation index (except i), as long as the meaning is clear. We could have used a smiley ☻
instead (please don’t).
The error term depends on the value of the disturbance term in every observation in the
sample, and thus it is a special type of random variable.
We will show that the error term has expected value zero, and hence that the ordinary least
squares (OLS) estimator of the slope coefficient in a simple regression model is unbiased.

$$\hat{\beta}_2 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} = \beta_2 + \sum a_i u_i, \qquad a_i = \frac{X_i - \bar{X}}{\sum_{j=1}^{n} (X_j - \bar{X})^2}$$

$$E(\hat{\beta}_2) = E(\beta_2) + E\left(\sum a_i u_i\right) = \beta_2 + \sum E(a_i u_i) = \beta_2 + \sum a_i E(u_i) = \beta_2$$

The expected value of $\hat{\beta}_2$ is equal to the expected value of $\beta_2$ plus the expected value of the weighted sum of the values of the disturbance term.

$$E\left(\sum a_i u_i\right) = E(a_1 u_1 + \dots + a_n u_n) = E(a_1 u_1) + \dots + E(a_n u_n) = \sum E(a_i u_i)$$

$\beta_2$ is fixed, so it is unaffected by taking expectations. The first expectation rule (Review chapter) states that the expectation of a sum of several quantities is equal to the sum of their expectations.
Now for each $i$, $E(a_i u_i) = a_i E(u_i)$. This is a really important step and we can make it only with Model A.
Under Model A, we are assuming that the values of $X$ in the observations are nonstochastic. It follows that each $a_i$ is nonstochastic, since it is just a combination of the values of $X$.
Thus it can be treated as a constant, allowing us to take it out of the expectation using the
second expected value rule (Review chapter).
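
The two expectation rules can be illustrated exactly, without simulation, by enumerating a small discrete distribution. This is only a sketch under a hypothetical assumption: each disturbance takes the value -1 or +1, and all sign patterns across the sample are equally likely, so each disturbance has expected value zero. The X values are fixed, as under Model A.

```python
from itertools import product

x = [1.0, 2.0, 4.0, 7.0]                 # fixed (nonstochastic) X values
n = len(x)
x_bar = sum(x) / n
delta = sum((xj - x_bar) ** 2 for xj in x)
a = [(xi - x_bar) / delta for xi in x]   # constants, because X is fixed

# Hypothetical distribution: every pattern of -1/+1 disturbances is equally likely.
outcomes = list(product([-1.0, 1.0], repeat=n))

# Expected value of the weighted sum, averaging over all possible samples.
expected_error = sum(
    sum(ai * ui for ai, ui in zip(a, u)) for u in outcomes
) / len(outcomes)

# Expected value of each disturbance, then the weighted sum of those expectations.
expected_u = [sum(u[i] for u in outcomes) / len(outcomes) for i in range(n)]
sum_a_expected_u = sum(ai * eu for ai, eu in zip(a, expected_u))

print(expected_error, sum_a_expected_u)
```

Both routes give zero (up to rounding), which is the content of moving the expectation inside the sum and then taking each weight outside it.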
Under Assumption A.3, $E(u_i) = 0$ for all $i$, and so the estimator is unbiased. The proof of the unbiasedness of the estimator of the intercept will be left as an exercise.
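
Unbiasedness can also be illustrated by simulation. The sketch below is not a proof; it assumes hypothetical parameter values, fixed X values, and normally distributed disturbances with mean zero (the normality is a convenience, not something the result requires). The X values are held fixed across replications and only the disturbances are redrawn.

```python
import random


def ols_slope(x, y):
    """Sum of cross-deviations divided by the sum of squared deviations of X."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    numerator = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    denominator = sum((xi - x_bar) ** 2 for xi in x)
    return numerator / denominator


rng = random.Random(12345)                 # fixed seed for reproducibility
beta_1, beta_2, sigma = 2.0, 0.5, 1.0      # hypothetical true values
x = [float(i) for i in range(1, 21)]       # fixed X values, reused in every sample
replications = 5000

estimates = []
for _ in range(replications):
    # New disturbances each time; X stays the same.
    y = [beta_1 + beta_2 * xi + rng.gauss(0.0, sigma) for xi in x]
    estimates.append(ols_slope(x, y))

mean_estimate = sum(estimates) / replications
print(mean_estimate)
```

Individual estimates scatter around 0.5, but their average across replications should lie close to the true slope.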
OLS estimators of the parameters are not the only unbiased estimators. We will give an
example of another.
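
[Figure: the first and last observations plotted against $X$, with $Y_1 = \beta_1 + \beta_2 X_1 + u_1$ and $Y_n = \beta_1 + \beta_2 X_n + u_n$.]

$$
\begin{aligned}
\hat{\beta}_2 &= \frac{Y_n - Y_1}{X_n - X_1} = \frac{(\beta_1 + \beta_2 X_n + u_n) - (\beta_1 + \beta_2 X_1 + u_1)}{X_n - X_1} \\
&= \frac{\beta_2 (X_n - X_1) + (u_n - u_1)}{X_n - X_1} = \beta_2 + \frac{u_n - u_1}{X_n - X_1}
\end{aligned}
$$
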
Someone who had never heard of regression analysis, seeing a scatter diagram of a sample
of observations, might estimate the slope by joining the first and the last observations, and
dividing the increase in the height by the horizontal distance between them.
The estimator is thus $(Y_n - Y_1)$ divided by $(X_n - X_1)$. We will investigate whether it is biased or unbiased.
To do this, we start by substituting for the Y components in the expression.
The $\beta_1$ terms cancel out and the rest of the expression simplifies as shown. Thus we have decomposed this naïve estimator into two components, the true value and an error term. This decomposition is parallel to that for the OLS estimator, but the error term is different.
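
A short numerical sketch of the naïve estimator and its decomposition, using hypothetical parameters and disturbances. Only the first and last observations enter the calculation.

```python
beta_1, beta_2 = 2.0, 0.5               # hypothetical true parameters
x = [1.0, 2.0, 3.0, 4.0, 5.0]
u = [0.3, -0.1, 0.2, -0.4, 0.0]         # hypothetical disturbance values
y = [beta_1 + beta_2 * xi + ui for xi, ui in zip(x, u)]

# Naive estimator: rise over run between the first and last observations.
naive_slope = (y[-1] - y[0]) / (x[-1] - x[0])

# Its error term involves only the first and last disturbances.
naive_error = (u[-1] - u[0]) / (x[-1] - x[0])

print(naive_slope, beta_2 + naive_error)
```

The three middle observations have no influence on the result, whatever their disturbances happen to be.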

$$E(\hat{\beta}_2) = E(\beta_2) + E\left(\frac{u_n - u_1}{X_n - X_1}\right) = \beta_2 + \frac{1}{X_n - X_1} E(u_n - u_1) = \beta_2$$

We now take expectations to investigate unbiasedness.
The denominator of the error term can be taken outside because the values of X are
nonstochastic.
Given Assumption A.3, the expectations of $u_n$ and $u_1$ are zero. Therefore, despite being naïve, this estimator is unbiased.
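
The same kind of simulation sketch can be applied to the naïve estimator. As before, the parameter values, the fixed X values, and the normal distribution of the disturbances are hypothetical choices made for illustration.

```python
import random

rng = random.Random(2016)                  # fixed seed for reproducibility
beta_1, beta_2, sigma = 2.0, 0.5, 1.0      # hypothetical true values
x = [float(i) for i in range(1, 21)]       # fixed X values, reused in every sample
replications = 5000

naive_estimates = []
for _ in range(replications):
    # Only the first and last observations matter, so only those are generated.
    y_first = beta_1 + beta_2 * x[0] + rng.gauss(0.0, sigma)
    y_last = beta_1 + beta_2 * x[-1] + rng.gauss(0.0, sigma)
    naive_estimates.append((y_last - y_first) / (x[-1] - x[0]))

mean_naive = sum(naive_estimates) / replications
print(mean_naive)
```

The average should again be close to the true slope, consistent with the estimator being unbiased even though it ignores most of the sample.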
It is intuitively easy to see that we would not prefer the naïve estimator to OLS. Unlike OLS,
which takes account of every observation, it employs only the first and the last and is
wasting most of the information in the sample.
The naïve estimator will be sensitive to the value of the disturbance term u in those two
observations, whereas the OLS estimator combines all the disturbance term values and
takes greater advantage of the possibility that to some extent they cancel each other out.
More rigorously, it can be shown that the population variance of the naïve estimator is
greater than that of the OLS estimator, and that the naïve estimator is therefore less
efficient.
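
The comparison can be sketched by writing both estimators as the true slope plus a weighted sum of the disturbances. The calculation below relies on assumptions that are not stated in this section: the disturbances are uncorrelated and share a common variance, here set to a hypothetical value of 1. Under those assumptions the variance of a weighted sum of disturbances is the common variance times the sum of the squared weights.

```python
x = [float(i) for i in range(1, 21)]       # hypothetical fixed X values
n = len(x)
x_bar = sum(x) / n
delta = sum((xj - x_bar) ** 2 for xj in x)

# OLS weights: every observation contributes.
ols_weights = [(xi - x_bar) / delta for xi in x]

# Naive weights: only the first and last observations contribute.
naive_weights = [0.0] * n
naive_weights[0] = -1.0 / (x[-1] - x[0])
naive_weights[-1] = 1.0 / (x[-1] - x[0])

sigma_sq = 1.0   # assumed common variance of the disturbances (hypothetical)

# Variance of each estimator under the stated assumptions.
var_ols = sigma_sq * sum(w ** 2 for w in ols_weights)
var_naive = sigma_sq * sum(w ** 2 for w in naive_weights)

print(var_ols, var_naive)
```

For these X values the naïve estimator's variance is several times that of OLS, because it concentrates all the weight on two disturbances instead of spreading it across the sample.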
Copyright Christopher Dougherty 2016.
These slideshows may be downloaded by anyone, anywhere for personal use.

Subject to respect for copyright and, where appropriate, attribution, they may be
used as a resource for teaching an econometrics course. There is no need to
refer to the author.
The content of this slideshow comes from Section 2.3 of C. Dougherty, Introduction to Econometrics, fifth edition 2016, Oxford University Press.
Additional (free) resources for both students and instructors may be
downloaded from the OUP Online Resource Centre
http://www.oxfordtextbooks.co.uk/orc/dougherty5e/
Individuals studying econometrics on their own who feel that they might benefit
from participation in a formal course should consider the London School of
Economics summer school course
EC212 Introduction to Econometrics
http://www2.lse.ac.uk/study/summerSchools/summerSchool/Home.aspx
or the University of London International Programmes distance learning course
EC2020 Elements of Econometrics
www.londoninternational.ac.uk/lse.
2016.04.18
