
Introduction to Econometrics [ET2013]

Teresa Randazzo

Ca’ Foscari University of Venice


teresa.randazzo@unive.it

Today

1. Short recall of algebra/statistics/probability

2. Basic ingredients of regression analysis
   - The classical linear regression model (OLS)
   - Finite sample properties
   - Goodness-of-fit
   - Asymptotic properties
   - Hypothesis testing
   - Data problems

3. Interpreting and comparing regression models

4. Heteroskedasticity and autocorrelation

5. Univariate time series models

Basic ingredients of regression analysis
The Simple Linear Regression Model

- We begin with cross-sectional analysis and assume we can collect a random sample of observations from the population of interest
- There are two variables, x and y, and we want to study how y varies with changes in x

Issues:
1. How do we allow factors other than x to affect y? There is never an exact relationship between two variables!
2. What is the functional form of the relationship between y and x?
3. How can we be sure we are capturing a ceteris paribus relationship between y and x?

Basic ingredients of regression analysis
Suppose you want to know the mean earnings of women who recently graduated from college (µY).

- An estimator is a function of a sample of data drawn randomly from a population.
- An estimate is the numerical value of the estimator when it is actually computed using data from a specific sample.
- The sample average Ȳ is a natural way to estimate µY.
- There are many possible estimators, so what makes one estimator better than another?

We look for an estimator that gets as close as possible to the unknown true value, at least in some average sense; in other words, we would like the sampling distribution of an estimator to be as tightly centered on the unknown value as possible. A small sketch of the estimator/estimate distinction follows below.
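A minimal sketch of that distinction, using made-up simulated earnings (the population here is hypothetical; in practice µY is unknown):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population: log-normal earnings with unknown (to the analyst) mean mu_Y
population = rng.lognormal(mean=10.0, sigma=0.5, size=1_000_000)

# The estimator is the rule "take the sample average";
# the estimate is its realized value on one specific random sample.
sample = rng.choice(population, size=200, replace=False)
estimate = sample.mean()

print(f"estimate of mu_Y from this sample: {estimate:,.0f}")
print(f"true population mean mu_Y:         {population.mean():,.0f}")
```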
Basic ingredients of regression analysis

The three desirable characteristics of an estimator:

- Unbiasedness: E(µ̂Y) = µY
- Consistency: Ȳ →ᵖ µY
  - Law of large numbers: an estimator is consistent if the probability that it falls within an interval of the true population value tends to one as the sample size increases.
- Efficiency: Ȳ has a smaller variance than all other linear unbiased estimators

A small simulation below illustrates the first two properties.
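A hedged illustration of unbiasedness and consistency for the sample mean, using simulated draws from a population whose mean is known only because we chose it:

```python
import numpy as np

rng = np.random.default_rng(1)
mu_Y = 5.0  # true population mean (known here only because we simulate)

for n in (10, 100, 1_000):
    # 5,000 replications: draw a sample of size n, compute Y-bar each time
    ybars = rng.normal(loc=mu_Y, scale=2.0, size=(5_000, n)).mean(axis=1)
    # Unbiasedness: the average of Y-bar across replications is close to mu_Y.
    # Consistency: the spread of Y-bar shrinks as n grows.
    print(f"n={n:>5}: mean of Y-bar = {ybars.mean():.3f}, "
          f"std of Y-bar = {ybars.std():.3f}")
```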

Basic ingredients of regression analysis
Linear regression
What relationship exists between wage and a number of background characteristics (e.g. gender, age, education)?

y = f(x1, x2, …, xk)

The following linear regression model is used to study such a relationship:

yi = β0 + β1 x1,i + β2 x2,i + … + βk xk,i + εi

or, compactly,

yi = xi′β + εi

where
- yi is the endogenous variable observed on unit i or at time i
- xi is a k × 1 vector of explanatory variables observed for unit i or at time i (gender, years of education, age, …); when xi includes a constant, the compact form absorbs the intercept
- β is a k × 1 vector of associated (slope) parameters
- εi is an unobservable disturbance term relative to unit i or time i
Ordinary Least Squares (OLS)
- Given a sample of N observations, we are interested in finding which linear combination of x1, …, xk and a constant gives a good approximation of y
- Clearly, we would like to choose values for β1, …, βk such that the differences yi − xi′β are small
- The most common approach is to choose β such that the sum of squared differences is as small as possible
- We determine β̂ to minimize the following objective function:

S(β) = Σ_{i=1}^{N} (yi − xi′β)² = Σ_{i=1}^{N} ei²

- We minimize the sum of squared approximation errors
- This approach is known as Ordinary Least Squares (OLS); a small numerical sketch follows below
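A minimal numerical sketch of this idea on simulated data; numpy's least-squares routine stands in for the closed-form estimator derived on the following slides:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 500

# Simulated data: y = 1 + 2*x + noise
x = rng.uniform(0, 10, size=N)
X = np.column_stack([np.ones(N), x])      # constant plus one regressor
y = 1.0 + 2.0 * x + rng.normal(0, 1, N)

def S(beta):
    """Sum of squared approximation errors for a candidate beta."""
    e = y - X @ beta
    return e @ e

# OLS picks the beta that minimizes S(beta)
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print("beta_hat:", beta_hat)                            # close to (1, 2)
print("S at beta_hat:", S(beta_hat))
print("S at a worse guess:", S(np.array([0.0, 1.0])))   # strictly larger
```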

Ordinary Least Squares (OLS)
Simple Linear Regression

[Figure: fitted regression line and observation points]

Ordinary Least Squares (OLS)
Simple Linear Regression
- In the simplest case we have just one regressor and a constant
- Given the following simple linear regression model

y = β0 + β1 x1 + ε

- We want to know how y changes when x changes, holding the other factors in ε fixed
- Holding ε fixed means ∆ε = 0, so that:

∆y = β1 ∆x + ∆ε = β1 ∆x when ∆ε = 0.

- We therefore have β1 = ∆y/∆x ⇒ β1 measures by how much y changes if x is increased by one unit, holding ε fixed.
- Linearity implies that a one-unit change in x has the same effect on y, regardless of the initial value of x.
Ordinary Least Squares (OLS)
Simple Linear Regression
Examples
- #1: yield and fertilizer:

yield = β0 + β1 fertilizer + ε,

where ε contains land quality, rainfall, …
β1 measures by how much yield changes when the amount of fertilizer changes by one unit, holding all else fixed.

- #2: wage and education:

wage = β0 + β1 educ + ε

where ε contains factors such as ability, past workforce experience, tenure on the current job, …
Recall that ∆wage = β1 ∆educ when ∆ε = 0: each year of education is assumed to be worth the same dollar amount no matter how much education one starts with.
Ordinary Least Squares (OLS)
Simple Linear Regression
- In order to estimate the population parameters β0 and β1, we need a random sample from the population
- Let {(xi, yi): i = 1, 2, …, N} be a sample of size N (the number of observations) from the population.

Figure: Savings and income for 15 families, and the PRF E(savings|income)

Ordinary Least Squares (OLS)
Simple Linear Regression: Minimizing the Sum of Squared Residuals

- Suppose we aim to fit a regression line through the data points as well as possible

yi = β0 + β1 x1,i + εi

- A proper strategy is to choose β̂0 and β̂1 to make the following objective function (sum of squared residuals) as small as possible:

S = Σ_{i=1}^{N} (yi − β0 − β1 xi)² → min

- To derive the minimum we simply need to take the derivatives of the function with respect to β0 and β1
- The solution of the resulting system will be the minimum

Ordinary Least Squares (OLS)
Simple Linear Regression: Minimizing the Sum of Squared Residuals

- The OLS estimator minimizes the sum of squared differences between y and the linear combination
- Therefore:

∂S/∂β0 = ?? = 0
∂S/∂β1 = ?? = 0

- Derive the first order conditions in the simple linear regression model with a constant and one regressor (a sketch of the solution follows below)
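For reference, a sketch of the derivation that fills in the ?? placeholders (standard calculus, applying the chain rule to each squared term):

∂S/∂β0 = −2 Σ_{i=1}^{N} (yi − β0 − β1 xi) = 0

∂S/∂β1 = −2 Σ_{i=1}^{N} xi (yi − β0 − β1 xi) = 0

Dividing by −2 and rearranging gives the two normal equations whose solution appears on the next slide.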

Ordinary Least Squares (OLS)
Simple Linear Regression: Method of Moments

The OLS estimators of the intercept β0 and the slope β1 are

β̂0 = b0 = ȳ − β̂1 x̄

β̂1 = b1 = Σ_{i=1}^{N} (yi − ȳ)(xi − x̄) / Σ_{i=1}^{N} (xi − x̄)² = Cov(x, y) / Var(x)
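A quick numerical check of these formulas on simulated data (the true parameter values are chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
N = 1_000
x = rng.uniform(0, 10, N)
y = 1.0 + 2.0 * x + rng.normal(0, 1, N)   # true beta0 = 1, beta1 = 2

# OLS / method-of-moments formulas from the slide
beta1_hat = np.sum((y - y.mean()) * (x - x.mean())) / np.sum((x - x.mean()) ** 2)
beta0_hat = y.mean() - beta1_hat * x.mean()

# Equivalent ratio of sample covariance to sample variance
assert np.isclose(beta1_hat, np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1))

print(beta0_hat, beta1_hat)   # close to 1 and 2
```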

Ordinary Least Squares (OLS)
Simple Linear Regression

The OLS predicted values ŷi and residuals ε̂i are:

ŷi = β̂0 + β̂1 xi

ε̂i = yi − ŷi

The estimated intercept (β̂0), slope (β̂1), and residuals (ε̂i) are computed from a sample of N observations of xi and yi, i = 1, …, N. These are estimates of the unknown true population intercept (β0), slope (β1), and error term (εi).
We have N fitted values and N residuals.

OLS in Matrix form
Bivariate case

Compact the linear regression model in matrix notation:

yi = β1 + β2 xi2 + εi

$$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} 1 & x_{12} \\ 1 & x_{22} \\ \vdots & \vdots \\ 1 & x_{n2} \end{pmatrix} \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix} + \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}$$

$$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} x_1' \\ x_2' \\ \vdots \\ x_n' \end{pmatrix} \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix} + \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}$$

Multivariate Regression Model

- Consider an extension of the wage equation we used for simple regression:

wage = β0 + β1 educ + β2 exper + ε

where exper is years of labor market experience.
- Our main interest is in β1, but β2 is of some interest too, as it measures the ceteris paribus effect of experience
- By explicitly including exper in the equation, we have taken it out of the error term ⇒ we will be able to measure the effect of educ on wage, holding exper fixed.

Multivariate Regression Model

- Multiple regression includes more explanatory factors in the model
- It allows us to explicitly hold fixed additional factors that would otherwise be left in the error term ε
- It also allows for more flexible functional forms

Multivariate Regression Model
- Generally, we can write a model with two explanatory variables as:

y = β0 + β1 x1 + β2 x2 + ε,

where β0 is the intercept, β1 measures the change in y with respect to x1, holding other factors fixed, and β2 measures the change in y with respect to x2, holding other factors fixed.
- In the model with two explanatory variables, the key assumption about how ε is related to x1 and x2 is

E(ε|x1, x2) = 0.

- For any values of x1 and x2 in the population, the average unobservable is equal to zero
- In the wage equation, the assumption E(ε|educ, exper) = 0 implies that other factors affecting wage are not related on average to educ and exper
Multivariate Regression Model

The model with k explanatory variables

- The multiple linear regression model (MLRM) can be written in the population as

y = β0 + β1 x1 + β2 x2 + … + βk xk + ε

where β0 is the intercept, β1 is the parameter associated with x1, β2 is the parameter associated with x2, and so on.
- The MLRM contains k + 1 (unknown) population parameters. We call β1, …, βk the slope parameters.
- The error term ε contains factors other than x1, x2, …, xk that affect y

Multivariate Regression Model

The zero conditional mean assumption for the MLRM

- The key assumption for the general multiple regression model is easy to state in terms of a conditional expectation:

E(ε|x1, …, xk) = 0

- At a minimum, this assumption requires that all factors in ε are uncorrelated with the explanatory variables
- We can make this condition closer to being true by controlling for more variables; the simulation below illustrates what goes wrong when it fails
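A hedged sketch of why the assumption matters: in the simulated data below the omitted factor in ε is correlated with the regressor, and the OLS slope is pushed away from the true value (all numbers are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
N = 100_000

ability = rng.normal(0, 1, N)                    # unobserved, ends up in epsilon
educ = 12 + 2 * ability + rng.normal(0, 1, N)    # correlated with the omitted factor
wage = 5 + 1.0 * educ + 3 * ability + rng.normal(0, 1, N)  # true effect of educ: 1.0

X = np.column_stack([np.ones(N), educ])
beta_hat = np.linalg.lstsq(X, wage, rcond=None)[0]
print("slope when E(eps|educ) != 0:", beta_hat[1])      # noticeably above 1.0

# Controlling for the omitted factor restores the assumption (and the estimate)
X2 = np.column_stack([np.ones(N), educ, ability])
beta_hat2 = np.linalg.lstsq(X2, wage, rcond=None)[0]
print("slope controlling for ability:", beta_hat2[1])   # close to 1.0
```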

Multivariate Regression Model
- Suppose we have x1, x2, …, xk (k regressors) along with y. We want to fit an equation of the form

ŷ = β̂0 + β̂1 x1 + β̂2 x2 + … + β̂k xk

given data {(xi1, xi2, …, xik, yi): i = 1, …, n}. Notice that now the explanatory variables have two subscripts: i is the observation number and the second subscript (1, 2, …, k) identifies the specific variable.
- As in the simple regression case, we have different ways to motivate OLS. We choose β̂0, β̂1, β̂2, …, β̂k (so k + 1 unknowns) to minimize the sum of squared residuals

Σ_{i=1}^{n} (yi − β̂0 − β̂1 xi1 − β̂2 xi2 − … − β̂k xik)²

Multivariate Regression Model
- We can use multivariate calculus. The OLS first order conditions are the k + 1 linear equations in the k + 1 unknowns β̂0, β̂1, …, β̂k:

Σ_{i=1}^{n} (yi − β̂0 − β̂1 xi1 − … − β̂k xik) = 0
Σ_{i=1}^{n} xi1 (yi − β̂0 − β̂1 xi1 − … − β̂k xik) = 0
Σ_{i=1}^{n} xi2 (yi − β̂0 − β̂1 xi1 − … − β̂k xik) = 0
⋮
Σ_{i=1}^{n} xik (yi − β̂0 − β̂1 xi1 − … − β̂k xik) = 0

- The OLS regression line is written as

ŷ = β̂0 + β̂1 x1 + β̂2 x2 + … + β̂k xk

A numerical check of the first order conditions follows below.
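A small sketch verifying that the OLS solution satisfies these first order conditions (simulated data, two regressors):

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 400, 2
x = rng.normal(size=(n, k))
y = 1.0 + x @ np.array([2.0, -1.5]) + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), x])          # constant + k regressors
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]

residuals = y - X @ beta_hat
# The k+1 first order conditions say the residuals are orthogonal to each
# column of X (the column of ones gives: sum of residuals = 0).
print(X.T @ residuals)    # all entries numerically zero
```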
OLS in Matrix form
Multivariate case

Compact the linear regression model in matrix notation:

yi = β1 + β2 xi2 + β3 xi3 + … + βk xik + εi

$$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} 1 & x_{12} & \dots & x_{1k} \\ 1 & x_{22} & \dots & x_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n2} & \dots & x_{nk} \end{pmatrix} \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{pmatrix} + \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}$$

Y = Xβ + ε, with dimensions Y (n×1), X (n×k), β (k×1), ε (n×1)

OLS in Matrix form

- The OLS estimator is based on the idea of finding the value of β that minimizes the quadratic distance between Y and Xβ
- The objective function

S(β) = Σ_{i=1}^{N} (yi − xi′β)²

- can be rewritten in matrix notation as

S(β) = (y − Xβ)′(y − Xβ) = y′y − 2y′Xβ + β′X′Xβ

- To solve the problem min_β S(β), set

∂S(β)/∂β = 0

OLS in Matrix Form

- Notice that ∂S(β)/∂β is a k × 1 vector, i.e.

∂S(β)/∂β = (∂S(β)/∂β1, ∂S(β)/∂β2, …, ∂S(β)/∂βk)′

- Solving the FOC under the condition rank(X′X) = k we get:

β̂ = (X′X)⁻¹ X′y

- which in the bivariate case (a single regressor x, no constant) corresponds to

β̂ = Σ xi yi / Σ xi²

A numerical sketch of the matrix formula follows below.
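A minimal sketch computing β̂ = (X′X)⁻¹X′y directly on simulated data (in practice one solves the normal equations rather than forming an explicit inverse):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(0, 1, n)

# Closed-form OLS estimator (requires X'X to be non-singular)
beta_hat = np.linalg.inv(X.T @ X) @ (X.T @ y)

# Numerically safer equivalent: solve the normal equations directly
beta_solve = np.linalg.solve(X.T @ X, X.T @ y)

print(beta_hat)                           # close to beta_true
assert np.allclose(beta_hat, beta_solve)
```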

OLS in Matrix Form

- Therefore we obtain

β̂ = (X′X)⁻¹ X′y = (Σ_{i=1}^{N} xi xi′)⁻¹ (Σ_{i=1}^{N} xi yi)

- The condition rank(X) = k (so that X′X is non-singular) is crucial to obtain a valid (unique) OLS estimator.
- In other words, the rank condition rank(X) = k implies that there is no exact (or perfect) multicollinearity (it is not possible to obtain one regressor as a function of the others); see the sketch below
- N.B.: The rank of a matrix is defined as (a) the maximum number of linearly independent column vectors in the matrix or (b) the maximum number of linearly independent row vectors in the matrix.
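A short sketch of what perfect multicollinearity does to X′X (one column is an exact linear function of another, so the rank condition fails):

```python
import numpy as np

rng = np.random.default_rng(7)
n, k = 100, 3
x1 = rng.normal(size=n)
x2 = 3.0 * x1                 # exact linear function of x1: perfect multicollinearity

X = np.column_stack([np.ones(n), x1, x2])
XtX = X.T @ X

print("rank(X):", np.linalg.matrix_rank(X))       # 2, not k = 3
print("rank(X'X):", np.linalg.matrix_rank(XtX))   # also 2: X'X is singular
# With a singular X'X there is no unique beta-hat: the normal equations
# (X'X) b = X'y have infinitely many solutions.
```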

Formal link between the two representations

- To understand the connection between the two representations, using basic matrix algebra it is easy to verify that

X′X = Σ_{i=1}^{N} xi xi′

X′y = Σ_{i=1}^{N} xi yi

Σ_{i=1}^{N} εi² = Σ_{i=1}^{N} (yi − xi′β)² = (y − Xβ)′(y − Xβ) = ε′ε

- These equations link the two (different but perfectly equivalent) representations; a numerical check follows below.
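A quick numerical check of these identities (any conformable X and y will do; the data here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(8)
N, k = 50, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, k - 1))])
y = rng.normal(size=N)
beta = rng.normal(size=k)
eps = y - X @ beta

# X'X as a sum of k x k outer products x_i x_i'
sum_outer = sum(np.outer(X[i], X[i]) for i in range(N))
assert np.allclose(X.T @ X, sum_outer)

# X'y as a sum of vectors x_i y_i
assert np.allclose(X.T @ y, sum(X[i] * y[i] for i in range(N)))

# Sum of squared errors equals (y - Xb)'(y - Xb) = eps'eps
assert np.allclose(np.sum(eps ** 2), eps @ eps)
print("all three identities verified")
```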

Variance-Covariance Matrix
- The crucial fact in the matrix representation

y = Xβ + ε

is the compact covariance matrix of the disturbances:

$$\Sigma_{(n \times n)} = E(\varepsilon\varepsilon') = E\left[\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix} \begin{pmatrix} \varepsilon_1 & \varepsilon_2 & \dots & \varepsilon_n \end{pmatrix}\right] = E\begin{pmatrix} \varepsilon_1^2 & \varepsilon_1\varepsilon_2 & \dots & \varepsilon_1\varepsilon_n \\ \varepsilon_2\varepsilon_1 & \varepsilon_2^2 & \dots & \varepsilon_2\varepsilon_n \\ \vdots & \vdots & \ddots & \vdots \\ \varepsilon_n\varepsilon_1 & \varepsilon_n\varepsilon_2 & \dots & \varepsilon_n^2 \end{pmatrix}$$

$$= \begin{pmatrix} E(\varepsilon_1^2) & E(\varepsilon_1\varepsilon_2) & \dots & E(\varepsilon_1\varepsilon_n) \\ E(\varepsilon_2\varepsilon_1) & E(\varepsilon_2^2) & \dots & E(\varepsilon_2\varepsilon_n) \\ \vdots & \vdots & \ddots & \vdots \\ E(\varepsilon_n\varepsilon_1) & E(\varepsilon_n\varepsilon_2) & \dots & E(\varepsilon_n^2) \end{pmatrix} = \begin{pmatrix} \sigma_1^2 & \sigma_{12} & \dots & \sigma_{1n} \\ \sigma_{21} & \sigma_2^2 & \dots & \sigma_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{n1} & \sigma_{n2} & \dots & \sigma_n^2 \end{pmatrix}$$
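A brief sketch estimating this matrix by simulation: averaging εε′ over many replications of the error vector approaches E(εε′). The errors below are drawn i.i.d. (an assumption made here for illustration), so the estimate approaches σ²I:

```python
import numpy as np

rng = np.random.default_rng(9)
n, reps, sigma = 4, 200_000, 1.5

# reps independent draws of an n x 1 error vector with i.i.d. components
eps = rng.normal(0, sigma, size=(reps, n))

# Monte Carlo estimate of E(eps eps'): average the n x n outer products
Sigma_hat = (eps[:, :, None] * eps[:, None, :]).mean(axis=0)

print(np.round(Sigma_hat, 2))
# Diagonal close to sigma^2 = 2.25, off-diagonal close to 0,
# i.e. Sigma = sigma^2 * I under homoskedastic, uncorrelated errors.
```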
