# Properties of the OLS Estimator

Quantitative Methods 2, Lecture 5
# Solutions for β0 and β1

- OLS chooses the values of β0 and β1 that minimize the unexplained (residual) sum of squares:

$$SSR = \sum_{i=1}^{n} \hat{u}_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2$$

- To find the minimum, take partial derivatives with respect to β0 and β1.
# Solutions for β0 and β1

- The derivatives were transformed into the normal equations:

$$\sum y_i = n\hat{\beta}_0 + \hat{\beta}_1 \sum x_i$$
$$\sum x_i y_i = \hat{\beta}_0 \sum x_i + \hat{\beta}_1 \sum x_i^2$$

- Solving the normal equations for β0 and β1 gives us our OLS estimators.
# Solutions for β0 and β1

- Our estimate of the slope of the line is:

$$\hat{\beta}_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$$

- And our estimate of the intercept is:

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$
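These two formulas are easy to check numerically. A minimal sketch in Python/NumPy (not part of the lecture; the data points are invented for illustration), compared against `np.polyfit`, which fits the same least-squares line:

```python
import numpy as np

def ols_simple(x, y):
    """Return (b0_hat, b1_hat) from the closed-form OLS formulas above."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xbar, ybar = x.mean(), y.mean()
    b1 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
    b0 = ybar - b1 * xbar  # intercept: mean of y minus slope times mean of x
    return b0, b1

# Invented data for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
b0, b1 = ols_simple(x, y)

# np.polyfit(x, y, 1) fits the same least-squares line, so the two should agree
b1_ref, b0_ref = np.polyfit(x, y, 1)
```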
# Estimators and the "True" Coefficients

- β̂0 and β̂1 would be the "true" coefficients if we only wanted to describe the data we have observed.
- But we are almost ALWAYS using data to draw conclusions that reach beyond our data.
- Thus β̂0 and β̂1 are estimates of some "true" set of coefficients (β0 and β1) that exist beyond our observed data.
# Some Terminology for Labeling Estimators

- Various conventions are used to distinguish the "true" coefficients from the estimates that we observe. We will use the beta versus beta-hat distinction from Wooldridge:

$$\text{Wooldridge: } \beta \leftrightarrow \hat{\beta}$$

- But other authors, textbooks, or websites may use different pairings, e.g. $b \leftrightarrow \hat{b}$, $B \leftrightarrow b$, $\beta \leftrightarrow b$, or $A \leftrightarrow a$.
- Think of this as the same distinction as that between population values and sample-based estimates.
# Gauss-Markov Theorem

Under the 5 Gauss-Markov assumptions, the OLS estimator is the best linear unbiased estimator of the true parameters (the β's), conditional on the sample values of the explanatory variables. In other words, the OLS estimator is BLUE.
# 5 Gauss-Markov Assumptions for the Simple Linear Model (Wooldridge, p. 65)

- Linear in parameters: $y = \beta_0 + \beta_1 x_1 + u$
- Random sampling of n observations: $(x_i, y_i),\ i = 1, 2, \ldots, n$
- Sample variation in the explanatory variable: the $x_i$'s are not all the same value, $\neg(x_1 = x_2 = \cdots = x_n)$
- Zero conditional mean: the error u has an expected value of 0, given any value of the explanatory variable, $E(u \mid x) = 0$
- Homoskedasticity: the error has the same variance given any value of the explanatory variable, $Var(u \mid x) = \sigma^2$
# The Linearity Assumption

- Key to understanding OLS models.
- The restriction is that our model of the population must be linear in the parameters:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + u$$

- A model cannot be non-linear in the parameters; OLS cannot estimate models such as:

$$y = \beta_0 + \beta_1^2 x_1 + u \qquad y = \beta_0 + \ln(\beta_1) x_1 + u$$

- Non-linearity in the variables (the x's), however, is fine and quite useful; OLS can estimate:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + u \qquad y = \beta_0 + \beta_1 \ln(x_1) + u$$
# Demonstration of the Homoskedasticity Assumption

[Figure: the predicted line $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$ drawn under homoskedasticity; the conditional distribution $F(y \mid x)$ has the same spread at $x_1$, $x_2$, $x_3$, and $x_4$. The variance across values of x is constant.]
# Demonstration of the Homoskedasticity Assumption (continued)

[Figure: the predicted line $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$ drawn under heteroskedasticity; the conditional distribution $F(y \mid x)$ spreads out as x moves from $x_1$ to $x_4$. The variance differs across values of x.]
# How Good are the Estimates? Properties of Estimators

- Small sample properties
  - True regardless of how much data we have
  - The most desirable characteristics:
    - Unbiased
    - Efficient
    - BLUE (Best Linear Unbiased Estimator)
# "Second Best" Properties of Estimators

- Asymptotic (or large sample) properties
  - True in the hypothetical instance of infinite data
  - In practice, applicable if N > 50 or so
  - Asymptotically unbiased
  - Consistency
  - Asymptotic efficiency
# Bias

- An estimator is unbiased if:

$$E(\hat{\beta}_j) = \beta_j, \quad j = 0, 1, \ldots, k$$

- In other words, the average value of the estimator in repeated sampling equals the true parameter.
- Note that whether an estimator is biased or not is a property of repeated sampling; it says nothing about how far off any single estimate may be.
# Efficiency

- An estimator is efficient if its variance is less than that of any other estimator of the parameter.
- This criterion is only useful in combination with others (e.g., the constant estimator $\hat{\beta} = 2$ has zero variance, but is biased).
- $\hat{\beta}_j$ is the "best" unbiased estimator if $Var(\hat{\beta}_j) \le Var(\tilde{\beta}_j)$, where $\tilde{\beta}_j$ is any other unbiased estimator of β.
- We might still want to choose a biased estimator, if it has a smaller variance.
[Figure: sampling distributions $F(\hat{\beta})$ plotted against the true β. An unbiased and efficient estimator is tightly centered on β; a biased estimator is centered on β + bias; high sampling variance means an inefficient estimator of β.]
# BLUE (Best Linear Unbiased Estimator)

- An estimator $\hat{\beta}_j$ is BLUE if:
  - $\hat{\beta}_j$ is a linear function
  - $\hat{\beta}_j$ is unbiased: $E(\hat{\beta}_j) = \beta_j, \quad j = 0, 1, \ldots, k$
  - $\hat{\beta}_j$ is the most efficient: $Var(\hat{\beta}_j) \le Var(\tilde{\beta}_j)$
# Large Sample Properties

- Asymptotically unbiased
  - As n becomes larger, $E(\hat{\beta}_j)$ trends toward $\beta_j$
- Consistency
  - If the bias and the variance both decrease as n gets larger, the estimator is consistent.
- Asymptotic efficiency
  - The asymptotic distribution has finite mean and variance
  - The estimator is consistent
  - No estimator has a smaller asymptotic variance
# Demonstration of Consistency

[Figure: sampling distributions $F(\hat{\beta})$ for n = 4, n = 16, and n = 50, each centered near the true β; as n grows, the distribution tightens around β.]
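The pattern in the figure can be reproduced by simulation. A sketch in Python/NumPy (not from the lecture; true values β0 = 1, β1 = 2 and standard normal errors are assumptions of the example), holding x fixed and redrawing the errors for each sample size:

```python
import numpy as np

rng = np.random.default_rng(2024)

def b1_draws(n, reps=4000):
    """Sampling distribution of b1_hat with sample size n (x held fixed)."""
    x = np.linspace(0.0, 10.0, n)
    xd = x - x.mean()
    sxx = np.sum(xd ** 2)
    out = np.empty(reps)
    for r in range(reps):
        u = rng.normal(0.0, 1.0, size=n)  # fresh errors each "sample"
        y = 1.0 + 2.0 * x + u             # assumed truth: b0 = 1, b1 = 2
        out[r] = np.sum(xd * y) / sxx
    return out

# The standard deviation of b1_hat shrinks as n grows, as in the figure
spread = {n: b1_draws(n).std() for n in (4, 16, 50)}
```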
# Let's Show that OLS is Unbiased

- Begin with our equation: $y_i = \beta_0 + \beta_1 x_i + u$
- $u \sim N(0, \sigma^2)$ and $y_i \sim N(\beta_0 + \beta_1 x_i,\ \sigma^2)$
- A linear function of a normal random variable is also a normal random variable.
- Thus β̂0 and β̂1, which are linear functions of the $y_i$, are normal random variables.
# The Robust Assumption of "Normality"

- Even if we do not know the distribution of y, β̂0 and β̂1 will behave like normal random variables.
- The Central Limit Theorem says estimates of the mean of any random variable will approach normality as n increases.
  - This assumes cases are independent (errors not correlated) and identically distributed (i.i.d.).
- This is critical for hypothesis testing: the β̂'s are normal regardless of y.
# Showing β̂1 is Unbiased

- Recall the formula for β̂1:

$$\hat{\beta}_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$$

- From the rules of summation, this reduces to:

$$\hat{\beta}_1 = \frac{\sum (x_i - \bar{x})\, y_i}{\sum (x_i - \bar{x})^2}$$
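This reduction is easy to verify numerically. A quick sketch (Python/NumPy assumed; arbitrary random data): the $\bar{y}$ term drops out because the deviations of x from its mean sum to zero.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = rng.normal(size=100)

lhs = np.sum((x - x.mean()) * (y - y.mean()))
rhs = np.sum((x - x.mean()) * y)  # ybar term drops out: sum(x - xbar) = 0
```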
# Showing β̂1 is Unbiased

- Now we substitute for $y_i$ to yield:

$$\hat{\beta}_1 = \frac{\sum (x_i - \bar{x})(\beta_0 + \beta_1 x_i + u_i)}{\sum (x_i - \bar{x})^2}$$

- This expands to:

$$\hat{\beta}_1 = \frac{\beta_0 \sum (x_i - \bar{x}) + \beta_1 \sum (x_i - \bar{x})\, x_i + \sum (x_i - \bar{x})\, u_i}{\sum (x_i - \bar{x})^2}$$
# Showing β̂1 is Unbiased

- Now we can separate terms to yield:

$$\hat{\beta}_1 = \frac{\beta_0 \sum (x_i - \bar{x})}{\sum (x_i - \bar{x})^2} + \frac{\beta_1 \sum (x_i - \bar{x})\, x_i}{\sum (x_i - \bar{x})^2} + \frac{\sum (x_i - \bar{x})\, u_i}{\sum (x_i - \bar{x})^2}$$

- Now we need to rely on two more rules of summation:

$$6.\ \sum_{i=1}^{n} (x_i - \bar{x}) = 0 \qquad 7.\ \sum_{i=1}^{n} (x_i - \bar{x})\, x_i = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2$$
# Showing β̂1 is Unbiased

- By the first summation rule, the first term = 0.
- By the second summation rule, the second term = β1.
- This leaves:

$$\hat{\beta}_1 = \beta_1 + \frac{\sum (x_i - \bar{x})\, u_i}{\sum (x_i - \bar{x})^2}$$
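This decomposition is an exact algebraic identity, so it can be checked on simulated data where the "true" coefficients and errors are known. A sketch (values β0 = 1, β1 = 2 are assumptions of the example):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
beta0, beta1 = 1.0, 2.0             # assumed "true" coefficients
x = rng.uniform(0.0, 10.0, size=n)
u = rng.normal(0.0, 1.0, size=n)
y = beta0 + beta1 * x + u

xd = x - x.mean()
b1_hat = np.sum(xd * (y - y.mean())) / np.sum(xd ** 2)

# The decomposition derived above: estimate = truth + weighted sum of errors
b1_decomposed = beta1 + np.sum(xd * u) / np.sum(xd ** 2)
```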
# Showing β̂1 is Unbiased

- Expanding the summation yields:

$$\hat{\beta}_1 = \beta_1 + \frac{(x_1 - \bar{x})\, u_1}{\sum (x_i - \bar{x})^2} + \frac{(x_2 - \bar{x})\, u_2}{\sum (x_i - \bar{x})^2} + \cdots + \frac{(x_n - \bar{x})\, u_n}{\sum (x_i - \bar{x})^2}$$

- To show that β̂1 is unbiased, we must show that the expectation of β̂1 equals β1:

$$E(\hat{\beta}_1) = \beta_1 + E\!\left[\frac{(x_1 - \bar{x})\, u_1}{\sum (x_i - \bar{x})^2}\right] + E\!\left[\frac{(x_2 - \bar{x})\, u_2}{\sum (x_i - \bar{x})^2}\right] + \cdots + E\!\left[\frac{(x_n - \bar{x})\, u_n}{\sum (x_i - \bar{x})^2}\right]$$
# Showing β̂1 is Unbiased

- Now we need Gauss-Markov assumption 4, that the expected value of the error term is 0: $E(u \mid x) = 0$
- Then all terms after β1 are equal to 0:

$$E(\hat{\beta}_1) = \beta_1 + 0 + 0 + \cdots + 0$$

- This reduces to:

$$E(\hat{\beta}_1) = \beta_1$$

- Two assumptions were needed to get this result:
  1. The x's are fixed (measured without error)
  2. The expected value of the error is zero
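The result can also be seen by Monte Carlo: hold x fixed, redraw the errors many times, and the average of β̂1 across samples should sit on top of the true β1. A sketch (not from the lecture; β0 = 1, β1 = 2 are assumed values):

```python
import numpy as np

rng = np.random.default_rng(42)
beta0, beta1 = 1.0, 2.0            # assumed "true" coefficients
x = np.linspace(0.0, 10.0, 50)     # x fixed across samples, per the assumption
xd = x - x.mean()
sxx = np.sum(xd ** 2)

draws = np.empty(5000)
for r in range(draws.size):
    u = rng.normal(0.0, 1.0, size=x.size)  # errors with mean zero
    y = beta0 + beta1 * x + u
    draws[r] = np.sum(xd * y) / sxx

mean_b1_hat = draws.mean()         # should be very close to beta1 = 2
```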
# Showing β̂0 is Unbiased

- Begin with the equation for β̂0:

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$

- Since $y_i = \beta_0 + \beta_1 x_i + u_i$, the mean of y is $\bar{y} = \beta_0 + \beta_1 \bar{x} + \bar{u}$.
- Substituting for the mean of y ($\bar{y}$):

$$\hat{\beta}_0 = \beta_0 + \beta_1 \bar{x} + \bar{u} - \hat{\beta}_1 \bar{x}$$
# Showing β̂0 is Unbiased

- Take the expected value of both sides:

$$E(\hat{\beta}_0) = \beta_0 + \beta_1 \bar{x} + E(\bar{u}) - E(\hat{\beta}_1)\, \bar{x}$$

- We just showed that $E(\hat{\beta}_1) = \beta_1$, so the β1 terms cancel each other out.
- This leaves: $E(\hat{\beta}_0) = \beta_0 + E(\bar{u})$
- Again, since $E(u) = 0$, we have $E(\hat{\beta}_0) = \beta_0$.
# Notice the Assumptions

- Two key assumptions were needed to show that β̂0 and β̂1 are unbiased:
  - x is fixed (meaning it is measured without error)
  - E(u) = 0
- Unbiasedness tells us that OLS will give us a best guess at the slope and intercept that is correct on average.
# OK, but is it BLUE?

- Now we have an estimator (β̂1).
- We know that β̂1 is unbiased.
- We can calculate the variance of β̂1 across samples.
- But is β̂1 the Best Linear Unbiased Estimator?
# The Variance of the Estimator and Hypothesis Testing
# The Variance of the Estimator and Hypothesis Testing

- We have derived an estimator for the slope of a line through data: β̂1.
- We have shown that β̂1 is an unbiased estimator of the "true" relationship β1.
  - We must assume x is measured without error.
  - We must assume the expected value of the error term is zero.
# Variance of β̂0 and β̂1

- Even if β̂0 and β̂1 are right "on average," we still want to know how far off they might be in a given sample.
- Our hypotheses are about β1, not β̂1.
- Thus we need to know the variance of β̂0 and β̂1.
- We use probability theory to draw conclusions about β1, given our estimate β̂1.
# Variances of β̂0 and β̂1

- Conceptually, the variances of β̂0 and β̂1 are the expected distances from their individual values to their mean values:

$$Var(\hat{\beta}_0) = E[(\hat{\beta}_0 - E(\hat{\beta}_0))^2] \qquad Var(\hat{\beta}_1) = E[(\hat{\beta}_1 - E(\hat{\beta}_1))^2]$$

- We can solve these based on our proof of unbiasedness. Recall from above:

$$\hat{\beta}_1 = \beta_1 + \frac{(x_1 - \bar{x})\, u_1}{\sum (x_i - \bar{x})^2} + \frac{(x_2 - \bar{x})\, u_2}{\sum (x_i - \bar{x})^2} + \cdots + \frac{(x_n - \bar{x})\, u_n}{\sum (x_i - \bar{x})^2}$$
# The Variance of β̂1

- If a random variable (β̂1) is a linear combination of independently distributed random variables (the u's),
- then the variance of β̂1 is the sum of the variances of those terms.
  - Note the assumption of independent observations.
- Applying this principle to the previous equation yields:
# The Variance of β̂1

$$Var(\hat{\beta}_1) = \frac{(x_1 - \bar{x})^2 \sigma_{u1}^2}{[\sum (x_i - \bar{x})^2]^2} + \frac{(x_2 - \bar{x})^2 \sigma_{u2}^2}{[\sum (x_i - \bar{x})^2]^2} + \cdots + \frac{(x_n - \bar{x})^2 \sigma_{un}^2}{[\sum (x_i - \bar{x})^2]^2}$$

- Now we need the final Gauss-Markov assumption, assumption 5: $Var(u \mid x) = \sigma^2$
- That is, we must assume that the variance of the errors is constant: $\sigma_{u1}^2 = \sigma_{u2}^2 = \cdots = \sigma_{un}^2 = \sigma_u^2$. This yields:

$$Var(\hat{\beta}_1) = \sigma_{\hat{\beta}_1}^2 = \frac{\sigma_u^2 \sum (x_i - \bar{x})^2}{[\sum (x_i - \bar{x})^2]^2}$$
# The Variance of β̂1

- Or, simplifying:

$$Var(\hat{\beta}_1) = \sigma_{\hat{\beta}_1}^2 = \frac{\sigma_u^2}{\sum (x_i - \bar{x})^2}$$

- That is, the variance of β̂1 is a function of the variance of the errors ($\sigma_u^2$) and the variation of x.
- But... what is the true variance of the errors?
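This variance formula can be checked against simulation: generate many samples with a known error variance and compare the empirical variance of β̂1 to the formula. A sketch (not from the lecture; $\sigma_u = 1.5$ and the true line are assumed values):

```python
import numpy as np

rng = np.random.default_rng(7)
sigma_u = 1.5                        # assumed true error standard deviation
x = np.linspace(0.0, 5.0, 40)        # x held fixed across samples
xd = x - x.mean()
sxx = np.sum(xd ** 2)
theory = sigma_u ** 2 / sxx          # Var(b1_hat) from the formula above

draws = np.empty(20000)
for r in range(draws.size):
    u = rng.normal(0.0, sigma_u, size=x.size)
    y = 1.0 + 2.0 * x + u
    draws[r] = np.sum(xd * y) / sxx

empirical = draws.var()              # should be close to `theory`
```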
# The Estimated Variance of β̂1

- We do not observe $\sigma_u^2$, because we don't observe the true errors (we don't know β0 and β1).
- β̂0 and β̂1 are unbiased, so we use the variance of the observed residuals as an estimator of the variance of the "true" errors.
- We lose 2 degrees of freedom by substituting in the estimators β̂0 and β̂1.
# The Estimated Variance of β̂1

- Thus:

$$\hat{\sigma}_u^2 = \frac{\sum \hat{u}_i^2}{n - 2}$$

- This is an unbiased estimator of $\sigma_u^2$.
- Thus the final equation for the estimated variance of β̂1 is:

$$\widehat{Var}(\hat{\beta}_1) = \hat{\sigma}_{\hat{\beta}_1}^2 = \frac{\hat{\sigma}_u^2}{\sum (x_i - \bar{x})^2}$$

- New assumptions: independent observations and constant error variance.
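Putting the two formulas together, a sketch of the full standard-error computation on simulated data (not from the lecture; the true line and error variance are assumed values):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
x = rng.uniform(0.0, 10.0, size=n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, size=n)  # assumed true error sd = 1

xd = x - x.mean()
b1 = np.sum(xd * y) / np.sum(xd ** 2)
b0 = y.mean() - b1 * x.mean()

resid = y - (b0 + b1 * x)                 # observed residuals u_hat
sigma2_hat = np.sum(resid ** 2) / (n - 2) # lose 2 df for b0_hat and b1_hat
se_b1 = np.sqrt(sigma2_hat / np.sum(xd ** 2))
```

With the true error variance set to 1, `sigma2_hat` should land near 1 and `se_b1` near $1/\sqrt{\sum (x_i - \bar{x})^2}$.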
# The Estimated Variance of β̂1

- $\hat{\sigma}_{\hat{\beta}_1}^2$ has nice intuitive qualities:
- As the size of the errors decreases, $\hat{\sigma}_{\hat{\beta}_1}^2$ decreases.
  - The line fits tightly through the data. Few other lines could fit as well.
- As the variation in x increases, $\hat{\sigma}_{\hat{\beta}_1}^2$ decreases.
  - Few lines will fit without large errors for extreme values of x.
# The Estimated Variance of β̂1

- Because the estimated error variance $\hat{\sigma}_u^2 = \sum \hat{u}_i^2 / (n - 2)$ has n in the denominator, as n increases the variance of β̂1 decreases:

$$\hat{\sigma}_{\hat{\beta}_1}^2 = \frac{\hat{\sigma}_u^2}{\sum (x_i - \bar{x})^2}$$

- The more data points we must fit to the line, the smaller the number of lines that fit with few errors: the data increasingly pin down where the line must go.
# Variance of β̂1 is Important for Hypothesis Testing

- F-test: tests the hypothesis that the null model does better.
- Log-likelihood test: tests the joint significance of variables in an MLE model.
- t-test: tests that individual coefficients are not zero.
  - This is the central task for testing most policy theories.
# T-Tests

- In general, our theories give us hypotheses such as β0 > 0 or β1 < 0, etc.
- We can estimate β̂1, but we need a way to assess the validity of statements that β1 is positive or negative, etc.
- We can rely on our estimate β̂1 and its variance, using probability theory to test such statements.
# Z-Scores & Hypothesis Tests

- We know that $\hat{\beta}_1 \sim N(\beta_1, \sigma_{\beta})$.
- Subtracting β1 from both sides, we can see that $(\hat{\beta}_1 - \beta_1) \sim N(0, \sigma_{\beta})$.
- Then, if we divide by the standard deviation, we can see that $(\hat{\beta}_1 - \beta_1) / \sigma_{\beta} \sim N(0, 1)$.
- To test the null hypothesis that β1 = 0, we can see that $\hat{\beta}_1 / \sigma_{\beta} \sim N(0, 1)$.
# Z-Scores & Hypothesis Tests

- This variable is a "z-score" based on the standard normal distribution: 95% of cases are within 1.96 standard deviations of the mean.
- If $\hat{\beta}_1 / \sigma_{\beta} > 1.96$, then in a series of random draws there is a 95% chance that β1 > 0.
- The problem is that we don't know $\sigma_{\beta}$.
# Z-Scores and t-Scores

- The obvious solution is to substitute $\hat{\sigma}_{\hat{\beta}_1}$ in place of $\sigma_{\beta}$.
- Problem: $\hat{\beta}_1 / \hat{\sigma}_{\hat{\beta}_1}$ is the ratio of two random variables, and this will not be normally distributed.
- Fortunately, an employee of the Guinness Brewery figured out this distribution (published under the pen name "Student" in 1908).
# The t-Statistic

- The statistic is called "Student's t," and the t-distribution looks similar to a normal distribution.
- Thus $\hat{\beta}_1 / \hat{\sigma}_{\hat{\beta}_1} \sim t_{(n-2)}$ for bivariate regression.
- More generally, $\hat{\beta}_1 / \hat{\sigma}_{\hat{\beta}_1} \sim t_{(n-k)}$, where k is the number of parameters estimated.
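A sketch of the full test in Python, using `scipy.stats` for the t-distribution (not part of the lecture; the data are simulated and the true slope of 2 is an assumption of the example):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
n = 30
x = rng.uniform(0.0, 10.0, size=n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 2.0, size=n)  # assumed truth: slope = 2

xd = x - x.mean()
b1 = np.sum(xd * y) / np.sum(xd ** 2)
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)
se_b1 = np.sqrt(np.sum(resid ** 2) / (n - 2) / np.sum(xd ** 2))

t_stat = b1 / se_b1                               # test of H0: beta1 = 0
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)   # two-sided p-value, n-2 df
```

Because the simulated slope is far from zero relative to its standard error, the p-value here should be well below conventional thresholds.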
# The t-Statistic

- Note the addition of a "degrees of freedom" constraint.
- Thus, the more data points we have relative to the number of parameters we are trying to estimate, the more the t distribution looks like the z distribution.
- When n > 100, the difference is negligible.
# Limited Information in Statistical Significance Tests

- Results are often illustrative rather than precise.
- The test only rejects the "not zero" hypothesis; it does not measure the importance of the variable (look at the confidence interval).
- Generally, it reflects confidence that results are robust across multiple samples.
# For Example: Presidential Approval and the CPI

. reg approval cpi

  Source |       SS       df       MS         Number of obs =    148
---------+------------------------------      F(  1,   146) =   9.76
   Model | 1719.69082     1  1719.69082       Prob > F      = 0.0022
Residual | 25731.4061   146  176.242507      R-squared     = 0.0626
   Total | 27451.0969   147  186.742156      Root MSE      = 13.276
------------------------------------------------------------------------------
approval |      Coef.   Std. Err.       t     P>|t|    [95% Conf. Interval]
---------+--------------------------------------------------------------------
     cpi |  -.1348399   .0431667    -3.124    0.002    -.2201522   -.0495277
   _cons |   60.95396   2.283144    26.697    0.000     56.44168    65.46624
------------------------------------------------------------------------------

. sum cpi

Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
     cpi |     148    46.45878    25.36577      23.5        109
# So the Distribution of β̂1 is:

[Figure: histogram (fraction of simulated draws) of the simulated cpi parameter, centered near the estimate -.135 and ranging roughly from -.3 to .1, with essentially no mass at or above 0.]
# Now Let's Look at Approval and the Unemployment Rate

. reg approval unemrate

  Source |       SS       df       MS         Number of obs =    148
---------+------------------------------      F(  1,   146) =   0.85
   Model | 159.716707     1  159.716707       Prob > F      = 0.3568
Residual | 27291.3802   146  186.927262      R-squared     = 0.0058
   Total | 27451.0969   147  186.742156      Root MSE      = 13.672
------------------------------------------------------------------------------
approval |      Coef.   Std. Err.       t     P>|t|    [95% Conf. Interval]
---------+--------------------------------------------------------------------
unemrate |  -.5973806   .6462674    -0.924    0.357    -1.874628    .6798672
   _cons |   58.05901   3.814606    15.220    0.000     50.52003    65.59799
------------------------------------------------------------------------------

. sum unemrate

Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
unemrate |     148    5.640541    1.744879       2.6       10.7
# Now the Distribution of β̂1 is:

[Figure: histogram (fraction of simulated draws) of the simulated unemrate parameter, centered near the estimate -.597 but spread roughly from -3 to 3, with substantial mass on both sides of 0.]