
Conditional Expectations and Linear Regressions

Walter Sosa-Escudero
Econ 507. Econometric Analysis. Spring 2009

March 31, 2009


All models are wrong, but some are useful


(George E. P. Box)

Box, G. E. P. and Draper, N., 1987, Empirical Model-Building and Response Surfaces, Wiley, New York, p. 424.


Motivation
Our last attempt with the linear model
So far we have assumed we know the model and its structure. The OLS (or GMM) estimator consistently estimates the unknown parameters.

What is OLS estimating if the underlying model is completely unknown (possibly non-linear, endogenous, heteroskedastic, etc.)?

We will argue that the OLS estimator provides a good linear approximation of the (possibly non-linear) conditional expectation.
Note: this lecture is highly inspired by Angrist and Pischke (2009).


Conditional Expectations Revisited

E(y|x) gives the expected value of y for given values of x.
It provides a reasonable representation of how x alters y.
If x is random, E(y|x) is a random function.
LIE (law of iterated expectations): E(y) = E[E(y|x)], as the sketch below illustrates.
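A minimal simulation sketch of the LIE (in Python with numpy; the binary regressor and the conditional distribution of y are assumptions chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Assumed setup: x is binary, y | x ~ N(2x, 1)
x = rng.integers(0, 2, size=n)
y = 2 * x + rng.standard_normal(n)

# E[E(y|x)]: average the conditional means weighted by P(x = 0), P(x = 1)
cond_means = np.array([y[x == 0].mean(), y[x == 1].mean()])
probs = np.array([(x == 0).mean(), (x == 1).mean()])

print(y.mean())            # E(y)
print(cond_means @ probs)  # E[E(y|x)]: matches E(y) up to simulation noise
```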

We need two more properties.


Decomposition property: any random variable y can be expressed as

y = E(y|x) + ε

where ε is a random variable satisfying i) E(ε|x) = 0, ii) E(ε h(x)) = 0, where h(·) is any function of x.

Intuition: any variable can be decomposed into two parts: the conditional expectation and an orthogonal error term. We are not claiming E(y|x) is linear.
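A rough numerical check of the decomposition property (a Python/numpy sketch; the quadratic conditional expectation is an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

x = rng.uniform(-2, 2, size=n)
# Assumed (non-linear) conditional expectation: E(y|x) = x**2
y = x**2 + rng.standard_normal(n)

eps = y - x**2  # epsilon = y - E(y|x)

print(eps.mean())                # ~0: E(eps) = 0
print((eps * x).mean())          # ~0: orthogonal to h(x) = x
print((eps * np.sin(x)).mean())  # ~0: orthogonal to an arbitrary h(x) = sin(x)
```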


Proof:

i) E(ε|x) = E[y − E(y|x) | x] = E(y|x) − E(y|x) = 0

ii) E(ε h(x)) = E[h(x) E(ε|x)] = 0, using the LIE.


Prediction property: Let m(x) be any function of x. Then

E(y|x) = argmin_{m(x)} E[(y − m(x))²]

Intuition: the conditional expectation is the best prediction, where best means minimum mean squared error.
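A quick sanity check in a simulated setting (a sketch; the quadratic conditional expectation and the competing predictors below are arbitrary illustrative choices): any other m(x) yields a larger mean squared error than E(y|x).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000

x = rng.uniform(-2, 2, size=n)
y = x**2 + rng.standard_normal(n)  # by construction, E(y|x) = x**2

def mse(pred):
    return np.mean((y - pred) ** 2)

print(mse(x**2))        # MSE of the conditional expectation (~ Var(eps) = 1)
print(mse(x**2 + 0.5))  # any competing predictor does worse
print(mse(np.abs(x)))
print(mse(np.full(n, y.mean())))
```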


Proof:

(y − m(x))² = [(y − E(y|x)) + (E(y|x) − m(x))]²

            = (y − E(y|x))² + (E(y|x) − m(x))²
              + 2 (y − E(y|x)) (E(y|x) − m(x))

Now:
The first term is not affected by the choice of m(x).
The second term is non-negative and equals zero when m(x) = E(y|x).
The third term is 2 (y − E(y|x)) (E(y|x) − m(x)) = 2 ε h(x), with h(x) ≡ E(y|x) − m(x), and its expectation is zero by the decomposition property.
Hence, taking expectations, the whole expression is minimized when m(x) = E(y|x).


The population linear regression

For any random variable y and a vector of K random variables x, the population linear regression function is the linear function r(x) = x′β, where β is a K × 1 vector that satisfies

β = argmin_b E[(y − x′b)²]

r(x) solves a minimum mean squared error linear prediction problem.
Technically, r(x) is the orthogonal projection of the random variable y on the space spanned by the elements of the random vector x.
This is like the population version of what we did before with the data.


Orthogonality and population coefficients: if E(xx′) is invertible, the condition

E[x(y − x′β)] = 0

is necessary and sufficient to guarantee that x′β exists and is the unique solution to the best linear prediction problem.

Corollary:

β = E(xx′)⁻¹ E(xy)

Note: this is the population version of the OLS estimator, as the sketch below suggests.
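A sketch of the sample analogue of this formula (Python/numpy; the simulated design and the true β are assumptions for illustration), showing it is numerically identical to OLS:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

# Assumed design: a constant and one regressor, with a linear CEF
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + rng.standard_normal(n)

# Sample analogue of beta = E(xx')^{-1} E(xy)
beta_mom = np.linalg.solve(X.T @ X / n, X.T @ y / n)

# OLS via least squares: the same numbers
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_mom)  # ~ [1, 2]
print(beta_ols)  # identical up to floating-point error
```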


Sketchy proof: Let x′b be any linear predictor.

E[(y − x′b)²] = E[((y − x′β) + x′(β − b))²]
             = E[(y − x′β)²] + 2(β − b)′E[x(y − x′β)] + E[(x′(β − b))²]
             = E[(y − x′β)²] + E[(x′(β − b))²]
             ≥ E[(y − x′β)²]

(the cross term vanishes by the orthogonality condition, and the remaining extra term is non-negative).


Linear regression and conditional expectation

1) If E(y|x) is linear, then r(x) = E(y|x).

Proof: if E(y|x) is linear, then E(y|x) = x′α for some K × 1 vector α. By the decomposition property,

E[x(y − E(y|x))] = E[x(y − x′α)] = 0.

Solve for α and get α = β, hence E(y|x) = r(x).


2) r(x) is the best linear predictor of y given x in the MMSE sense.

Proof: β solves the population minimum MSE linear prediction problem by construction.


3) r(x) is the best linear predictor of E(y|x) in the MMSE sense.

Proof: see homework.


Go back to the starting point of this course:

y = x′β + u

with E(u|x) = 0, among other assumptions. Then, trivially,

E(y|x) = r(x) = x′β

Note that we got r(x) = E(y|x) by imposing structure on u (predeterminedness). In the population regression approach we first imposed structure on r(x) (we forced it to solve a minimum prediction problem) and we got error orthogonality as a consequence.


How to justify linear regression models?

1) If we are interested in E(y|x) and this is a linear function of x, then it coincides with the linear regression function.

2) r(x) provides the best linear representation of y given x.

3) r(x) provides the best linear representation of E(y|x) given x.

This says that if the conditional expectation m(x) = E(y|x) is a non-linear function, the OLS method will consistently estimate the parameters of r(x), which provides the best approximation of m(x) in the MMSE sense, as the sketch below illustrates.
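To make this concrete, here is a simulation sketch (Python/numpy; the exponential conditional expectation is an assumption for illustration). OLS applied to a non-linear CEF recovers the line that best approximates E(y|x) in the MMSE sense:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1_000_000

x1 = rng.uniform(0, 2, size=n)
y = np.exp(x1) + rng.standard_normal(n)  # non-linear CEF: E(y|x) = exp(x)

X = np.column_stack([np.ones(n), x1])

# Population regression coefficients via the moment formula (same as OLS)
beta = np.linalg.solve(X.T @ X / n, X.T @ y / n)

# Property 3: the OLS line is also the best linear approximation to the
# CEF itself; perturbing the coefficients raises the error against exp(x)
mmse_ols = np.mean((np.exp(x1) - X @ beta) ** 2)
mmse_alt = np.mean((np.exp(x1) - X @ (beta + np.array([0.1, -0.1]))) ** 2)

print(beta)                # the best linear approximation to exp(x) on [0, 2]
print(mmse_ols, mmse_alt)  # the OLS line has the smaller approximation error
```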


Empirical Illustration

