
Introduction to Econometrics [ET2013]

Teresa Randazzo

Ca’ Foscari University of Venice


teresa.randazzo@unive.it

Today

1. Short recall of algebra/statistics/probability

2. Basic ingredients of regression analysis
   - The classical linear regression model (OLS)
   - Finite sample properties
   - Goodness-of-fit
   - Asymptotic properties
   - Hypothesis testing
   - Data problems

3. Interpreting and comparing regression models

4. Heteroskedasticity and autocorrelation

5. Univariate time series models

Basic ingredients of regression analysis
The Simple Linear Regression Model

- We begin with cross-sectional analysis and assume we can collect a random sample of observations from the population of interest
- There are two variables, x and y, and we want to study how y varies with changes in x

Issues:
1. How do we allow factors other than x to affect y? There is never an exact relationship between two variables!
2. What is the functional form of the relationship between y and x?
3. How can we be sure we are capturing a ceteris paribus relationship between y and x?

Basic ingredients of regression analysis
Suppose you want to know the mean earnings of women who recently graduated from college (µY).

- An estimator is a function of a sample of data drawn randomly from a population.
- An estimate is the numerical value of the estimator when it is actually computed using data from a specific sample.
- The sample average Ȳ is a natural way to estimate µY.
- There are many possible estimators, so what makes one estimator better than another?

We look for an estimator that gets as close as possible to the unknown true value, at least in some average sense; in other words, we would like the sampling distribution of an estimator to be as tightly centered on the unknown value as possible. A small sketch of the estimator/estimate distinction follows below.
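A minimal sketch of that distinction, using made-up simulated earnings (the population here is hypothetical; in practice µY is unknown):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population: log-normal earnings with unknown (to the analyst) mean mu_Y
population = rng.lognormal(mean=10.0, sigma=0.5, size=1_000_000)

# The estimator is the rule "take the sample average";
# the estimate is its realized value on one specific random sample.
sample = rng.choice(population, size=200, replace=False)
estimate = sample.mean()

print(f"estimate of mu_Y from this sample: {estimate:,.0f}")
print(f"true population mean mu_Y:         {population.mean():,.0f}")
```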
Basic ingredients of regression analysis

The three desirable characteristics of an estimator:

- Unbiasedness: E(µ̂Y) = µY
- Consistency: Ȳ →ᵖ µY
  - Law of large numbers: an estimator is consistent if the probability that it falls within an interval of the true population value tends to one as the sample size increases.
- Efficiency: Ȳ has a smaller variance than all other linear unbiased estimators

A small simulation below illustrates the first two properties.
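A hedged illustration of unbiasedness and consistency for the sample mean, using simulated draws from a population whose mean is known only because we chose it:

```python
import numpy as np

rng = np.random.default_rng(1)
mu_Y = 5.0  # true population mean (known here only because we simulate)

for n in (10, 100, 1_000):
    # 5,000 replications: draw a sample of size n, compute Y-bar each time
    ybars = rng.normal(loc=mu_Y, scale=2.0, size=(5_000, n)).mean(axis=1)
    # Unbiasedness: the average of Y-bar across replications is close to mu_Y.
    # Consistency: the spread of Y-bar shrinks as n grows.
    print(f"n={n:>5}: mean of Y-bar = {ybars.mean():.3f}, "
          f"std of Y-bar = {ybars.std():.3f}")
```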

Basic ingredients of regression analysis
Linear regression
What relationship exists between wage and a number of background characteristics (e.g. gender, age, education)?

y = f(x1, x2, …, xk)

The following linear regression model is used to study such a relationship:

yi = β0 + β1 x1,i + β2 x2,i + … + βk xk,i + εi

or, compactly,

yi = xi′β + εi

where
- yi is the endogenous variable observed on unit i or at time i
- xi is a k × 1 vector of explanatory variables observed for unit i or at time i (gender, years of education, age, …); when xi includes a constant, the compact form absorbs the intercept
- β is a k × 1 vector of associated (slope) parameters
- εi is an unobservable disturbance term relative to unit i or time i
Ordinary Least Squares (OLS)
- Given a sample of N observations, we are interested in finding which linear combination of x1, …, xk and a constant gives a good approximation of y
- Clearly, we would like to choose values for β1, …, βk such that the differences yi − xi′β are small
- The most common approach is to choose β such that the sum of squared differences is as small as possible
- We determine β̂ to minimize the following objective function:

S(β) = Σ_{i=1}^{N} (yi − xi′β)² = Σ_{i=1}^{N} ei²

- We minimize the sum of squared approximation errors
- This approach is known as Ordinary Least Squares (OLS); a small numerical sketch follows below
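A minimal numerical sketch of this idea on simulated data; numpy's least-squares routine stands in for the closed-form estimator derived on the following slides:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 500

# Simulated data: y = 1 + 2*x + noise
x = rng.uniform(0, 10, size=N)
X = np.column_stack([np.ones(N), x])      # constant plus one regressor
y = 1.0 + 2.0 * x + rng.normal(0, 1, N)

def S(beta):
    """Sum of squared approximation errors for a candidate beta."""
    e = y - X @ beta
    return e @ e

# OLS picks the beta that minimizes S(beta)
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print("beta_hat:", beta_hat)                            # close to (1, 2)
print("S at beta_hat:", S(beta_hat))
print("S at a worse guess:", S(np.array([0.0, 1.0])))   # strictly larger
```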

Ordinary Least Squares (OLS)
Simple Linear Regression

[Figure: fitted regression line and observation points]

Ordinary Least Squares (OLS)
Simple Linear Regression
- In the simplest case we have just one regressor and a constant
- Given the following simple linear regression model

y = β0 + β1 x1 + ε

- We want to know how y changes when x changes, holding the other factors in ε fixed
- Holding ε fixed means ∆ε = 0, so that:

∆y = β1 ∆x + ∆ε = β1 ∆x when ∆ε = 0.

- We therefore have β1 = ∆y/∆x ⇒ β1 measures by how much y changes if x is increased by one unit, holding ε fixed.
- Linearity implies that a one-unit change in x has the same effect on y, regardless of the initial value of x.
Ordinary Least Squares (OLS)
Simple Linear Regression
Examples
- #1: yield and fertilizer:

yield = β0 + β1 fertilizer + ε,

where ε contains land quality, rainfall, …
β1 measures by how much yield changes when the amount of fertilizer changes by one unit, holding all else fixed.

- #2: wage and education:

wage = β0 + β1 educ + ε

where ε contains factors such as ability, past workforce experience, tenure on the current job, …
Recall that ∆wage = β1 ∆educ when ∆ε = 0: each year of education is assumed to be worth the same dollar amount no matter how much education one starts with.
Ordinary Least Squares (OLS)
Simple Linear Regression
- In order to estimate the population parameters β0 and β1, we need a random sample from the population
- Let {(xi, yi): i = 1, 2, …, N} be a sample of size N (the number of observations) from the population.

Figure: Savings and income for 15 families, and the PRF E(savings|income)

Ordinary Least Squares (OLS)
Simple Linear Regression: Minimizing the Sum of Squared Residuals

- Suppose we aim to fit a regression line through the data points as well as possible

yi = β0 + β1 x1,i + εi

- A proper strategy is to choose β̂0 and β̂1 to make the following objective function (sum of squared residuals) as small as possible:

S = Σ_{i=1}^{N} (yi − β0 − β1 xi)² → min

- To derive the minimum we simply need to take the derivatives of the function with respect to β0 and β1
- The solution of the resulting system will be the minimum

Ordinary Least Squares (OLS)
Simple Linear Regression: Minimizing the Sum of Squared Residuals

- The OLS estimator minimizes the sum of squared differences between y and the linear combination
- Therefore:

∂S/∂β0 = ?? = 0
∂S/∂β1 = ?? = 0

- Derive the first order conditions in the simple linear regression model with a constant and one regressor (a sketch of the solution follows below)
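For reference, a sketch of the derivation that fills in the ?? placeholders (standard calculus, applying the chain rule to each squared term):

∂S/∂β0 = −2 Σ_{i=1}^{N} (yi − β0 − β1 xi) = 0

∂S/∂β1 = −2 Σ_{i=1}^{N} xi (yi − β0 − β1 xi) = 0

Dividing by −2 and rearranging gives the two normal equations whose solution appears on the next slide.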

Ordinary Least Squares (OLS)
Simple Linear Regression: Method of Moments

The OLS estimators of the intercept β0 and the slope β1 are

β̂0 = b0 = ȳ − β̂1 x̄

β̂1 = b1 = Σ_{i=1}^{N} (yi − ȳ)(xi − x̄) / Σ_{i=1}^{N} (xi − x̄)² = Cov(x, y) / Var(x)
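A quick numerical check of these formulas on simulated data (the true parameter values are chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
N = 1_000
x = rng.uniform(0, 10, N)
y = 1.0 + 2.0 * x + rng.normal(0, 1, N)   # true beta0 = 1, beta1 = 2

# OLS / method-of-moments formulas from the slide
beta1_hat = np.sum((y - y.mean()) * (x - x.mean())) / np.sum((x - x.mean()) ** 2)
beta0_hat = y.mean() - beta1_hat * x.mean()

# Equivalent ratio of sample covariance to sample variance
assert np.isclose(beta1_hat, np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1))

print(beta0_hat, beta1_hat)   # close to 1 and 2
```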

Ordinary Least Squares (OLS)
Simple Linear Regression

The OLS predicted values ŷi and residuals ε̂i are:

ŷi = β̂0 + β̂1 xi

ε̂i = yi − ŷi

The estimated intercept (β̂0), slope (β̂1), and residuals (ε̂i) are computed from a sample of N observations of xi and yi, i = 1, …, N. These are estimates of the unknown true population intercept (β0), slope (β1), and error term (εi).
We have N fitted values and N residuals.

OLS in Matrix form
Bivariate case

Compact the linear regression model in matrix notation:

yi = β1 + β2 xi2 + εi

$$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} 1 & x_{12} \\ 1 & x_{22} \\ \vdots & \vdots \\ 1 & x_{n2} \end{pmatrix} \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix} + \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}$$

$$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} x_1' \\ x_2' \\ \vdots \\ x_n' \end{pmatrix} \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix} + \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}$$

Multivariate Regression Model

- Consider an extension of the wage equation we used for simple regression:

wage = β0 + β1 educ + β2 exper + ε

where exper is years of labor market experience.
- Our main interest is in β1, but β2 is of some interest too, as it measures the ceteris paribus effect of experience
- By explicitly including exper in the equation, we have taken it out of the error term ⇒ we will be able to measure the effect of educ on wage, holding exper fixed.

Multivariate Regression Model

- Multiple regression includes more explanatory factors in the model
- It allows us to explicitly hold fixed additional factors that would otherwise be left in the error term ε
- It also allows for more flexible functional forms

Multivariate Regression Model
- Generally, we can write a model with two explanatory variables as:

y = β0 + β1 x1 + β2 x2 + ε,

where β0 is the intercept, β1 measures the change in y with respect to x1, holding other factors fixed, and β2 measures the change in y with respect to x2, holding other factors fixed.
- In the model with two explanatory variables, the key assumption about how ε is related to x1 and x2 is

E(ε|x1, x2) = 0.

- For any values of x1 and x2 in the population, the average unobservable is equal to zero
- In the wage equation, the assumption E(ε|educ, exper) = 0 implies that other factors affecting wage are not related on average to educ and exper
Multivariate Regression Model

The model with k explanatory variables

- The multiple linear regression model (MLRM) can be written in the population as

y = β0 + β1 x1 + β2 x2 + … + βk xk + ε

where β0 is the intercept, β1 is the parameter associated with x1, β2 is the parameter associated with x2, and so on.
- The MLRM contains k + 1 (unknown) population parameters. We call β1, …, βk the slope parameters.
- The error term ε contains factors other than x1, x2, …, xk that affect y

Multivariate Regression Model

The zero conditional mean assumption for the MLRM

- The key assumption for the general multiple regression model is easy to state in terms of a conditional expectation:

E(ε|x1, …, xk) = 0

- At a minimum, this assumption requires that all factors in ε are uncorrelated with the explanatory variables
- We can make this condition closer to being true by controlling for more variables; the simulation below illustrates what goes wrong when it fails
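A hedged sketch of why the assumption matters: in the simulated data below the omitted factor in ε is correlated with the regressor, and the OLS slope is pushed away from the true value (all numbers are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
N = 100_000

ability = rng.normal(0, 1, N)                    # unobserved, ends up in epsilon
educ = 12 + 2 * ability + rng.normal(0, 1, N)    # correlated with the omitted factor
wage = 5 + 1.0 * educ + 3 * ability + rng.normal(0, 1, N)  # true effect of educ: 1.0

X = np.column_stack([np.ones(N), educ])
beta_hat = np.linalg.lstsq(X, wage, rcond=None)[0]
print("slope when E(eps|educ) != 0:", beta_hat[1])      # noticeably above 1.0

# Controlling for the omitted factor restores the assumption (and the estimate)
X2 = np.column_stack([np.ones(N), educ, ability])
beta_hat2 = np.linalg.lstsq(X2, wage, rcond=None)[0]
print("slope controlling for ability:", beta_hat2[1])   # close to 1.0
```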

Multivariate Regression Model
- Suppose we have x1, x2, …, xk (k regressors) along with y. We want to fit an equation of the form

ŷ = β̂0 + β̂1 x1 + β̂2 x2 + … + β̂k xk

given data {(xi1, xi2, …, xik, yi): i = 1, …, n}. Notice that now the explanatory variables have two subscripts: i is the observation number and the second subscript (1, 2, …, k) identifies the specific variable.
- As in the simple regression case, we have different ways to motivate OLS. We choose β̂0, β̂1, β̂2, …, β̂k (so k + 1 unknowns) to minimize the sum of squared residuals

Σ_{i=1}^{n} (yi − β̂0 − β̂1 xi1 − β̂2 xi2 − … − β̂k xik)²

Multivariate Regression Model
- We can use multivariate calculus. The OLS first order conditions are the k + 1 linear equations in the k + 1 unknowns β̂0, β̂1, …, β̂k:

Σ_{i=1}^{n} (yi − β̂0 − β̂1 xi1 − … − β̂k xik) = 0
Σ_{i=1}^{n} xi1 (yi − β̂0 − β̂1 xi1 − … − β̂k xik) = 0
Σ_{i=1}^{n} xi2 (yi − β̂0 − β̂1 xi1 − … − β̂k xik) = 0
⋮
Σ_{i=1}^{n} xik (yi − β̂0 − β̂1 xi1 − … − β̂k xik) = 0

- The OLS regression line is written as

ŷ = β̂0 + β̂1 x1 + β̂2 x2 + … + β̂k xk

A numerical check of the first order conditions follows below.
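A small sketch verifying that the OLS solution satisfies these first order conditions (simulated data, two regressors):

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 400, 2
x = rng.normal(size=(n, k))
y = 1.0 + x @ np.array([2.0, -1.5]) + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), x])          # constant + k regressors
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]

residuals = y - X @ beta_hat
# The k+1 first order conditions say the residuals are orthogonal to each
# column of X (the column of ones gives: sum of residuals = 0).
print(X.T @ residuals)    # all entries numerically zero
```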
OLS in Matrix form
Multivariate case

Compact the linear regression model in matrix notation:

yi = β1 + β2 xi2 + β3 xi3 + … + βk xik + εi

$$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} 1 & x_{12} & \dots & x_{1k} \\ 1 & x_{22} & \dots & x_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n2} & \dots & x_{nk} \end{pmatrix} \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{pmatrix} + \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}$$

Y = Xβ + ε, with dimensions Y (n×1), X (n×k), β (k×1), ε (n×1)

OLS in Matrix form

- The OLS estimator is based on the idea of finding the value of β that minimizes the quadratic distance between Y and Xβ
- The objective function

S(β) = Σ_{i=1}^{N} (yi − xi′β)²

- can be rewritten in matrix notation as

S(β) = (y − Xβ)′(y − Xβ) = y′y − 2y′Xβ + β′X′Xβ

- To solve the problem min_β S(β), set

∂S(β)/∂β = 0

OLS in Matrix Form

- Notice that ∂S(β)/∂β is a k × 1 vector, i.e.

∂S(β)/∂β = (∂S(β)/∂β1, ∂S(β)/∂β2, …, ∂S(β)/∂βk)′

- Solving the FOC under the condition rank(X′X) = k we get:

β̂ = (X′X)⁻¹ X′y

- which in the bivariate case (a single regressor x, no constant) corresponds to

β̂ = Σ xi yi / Σ xi²

A numerical sketch of the matrix formula follows below.
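A minimal sketch computing β̂ = (X′X)⁻¹X′y directly on simulated data (in practice one solves the normal equations rather than forming an explicit inverse):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(0, 1, n)

# Closed-form OLS estimator (requires X'X to be non-singular)
beta_hat = np.linalg.inv(X.T @ X) @ (X.T @ y)

# Numerically safer equivalent: solve the normal equations directly
beta_solve = np.linalg.solve(X.T @ X, X.T @ y)

print(beta_hat)                           # close to beta_true
assert np.allclose(beta_hat, beta_solve)
```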

OLS in Matrix Form

- Therefore we obtain

β̂ = (X′X)⁻¹ X′y = (Σ_{i=1}^{N} xi xi′)⁻¹ (Σ_{i=1}^{N} xi yi)

- The condition rank(X) = k (so that X′X is non-singular) is crucial to obtain a valid (unique) OLS estimator.
- In other words, the rank condition rank(X) = k implies that there is no exact (or perfect) multicollinearity (it is not possible to obtain one regressor as a function of the others); see the sketch below
- N.B.: The rank of a matrix is defined as (a) the maximum number of linearly independent column vectors in the matrix or (b) the maximum number of linearly independent row vectors in the matrix.
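A short sketch of what perfect multicollinearity does to X′X (one column is an exact linear function of another, so the rank condition fails):

```python
import numpy as np

rng = np.random.default_rng(7)
n, k = 100, 3
x1 = rng.normal(size=n)
x2 = 3.0 * x1                 # exact linear function of x1: perfect multicollinearity

X = np.column_stack([np.ones(n), x1, x2])
XtX = X.T @ X

print("rank(X):", np.linalg.matrix_rank(X))       # 2, not k = 3
print("rank(X'X):", np.linalg.matrix_rank(XtX))   # also 2: X'X is singular
# With a singular X'X there is no unique beta-hat: the normal equations
# (X'X) b = X'y have infinitely many solutions.
```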

Formal link between the two representations

- To understand the connection between the two representations, using basic matrix algebra it is easy to verify that

X′X = Σ_{i=1}^{N} xi xi′

X′y = Σ_{i=1}^{N} xi yi

Σ_{i=1}^{N} εi² = Σ_{i=1}^{N} (yi − xi′β)² = (y − Xβ)′(y − Xβ) = ε′ε

- These equations link the two (different but perfectly equivalent) representations; a numerical check follows below.
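A quick numerical check of these identities (any conformable X and y will do; the data here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(8)
N, k = 50, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, k - 1))])
y = rng.normal(size=N)
beta = rng.normal(size=k)
eps = y - X @ beta

# X'X as a sum of k x k outer products x_i x_i'
sum_outer = sum(np.outer(X[i], X[i]) for i in range(N))
assert np.allclose(X.T @ X, sum_outer)

# X'y as a sum of vectors x_i y_i
assert np.allclose(X.T @ y, sum(X[i] * y[i] for i in range(N)))

# Sum of squared errors equals (y - Xb)'(y - Xb) = eps'eps
assert np.allclose(np.sum(eps ** 2), eps @ eps)
print("all three identities verified")
```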

Variance-Covariance Matrix
- The crucial fact in the matrix representation

y = Xβ + ε

is the compact covariance matrix of the disturbances:

$$\Sigma_{(n \times n)} = E(\varepsilon\varepsilon') = E\left[\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix} \begin{pmatrix} \varepsilon_1 & \varepsilon_2 & \dots & \varepsilon_n \end{pmatrix}\right] = E\begin{pmatrix} \varepsilon_1^2 & \varepsilon_1\varepsilon_2 & \dots & \varepsilon_1\varepsilon_n \\ \varepsilon_2\varepsilon_1 & \varepsilon_2^2 & \dots & \varepsilon_2\varepsilon_n \\ \vdots & \vdots & \ddots & \vdots \\ \varepsilon_n\varepsilon_1 & \varepsilon_n\varepsilon_2 & \dots & \varepsilon_n^2 \end{pmatrix}$$

$$= \begin{pmatrix} E(\varepsilon_1^2) & E(\varepsilon_1\varepsilon_2) & \dots & E(\varepsilon_1\varepsilon_n) \\ E(\varepsilon_2\varepsilon_1) & E(\varepsilon_2^2) & \dots & E(\varepsilon_2\varepsilon_n) \\ \vdots & \vdots & \ddots & \vdots \\ E(\varepsilon_n\varepsilon_1) & E(\varepsilon_n\varepsilon_2) & \dots & E(\varepsilon_n^2) \end{pmatrix} = \begin{pmatrix} \sigma_1^2 & \sigma_{12} & \dots & \sigma_{1n} \\ \sigma_{21} & \sigma_2^2 & \dots & \sigma_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{n1} & \sigma_{n2} & \dots & \sigma_n^2 \end{pmatrix}$$
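A brief sketch estimating this matrix by simulation: averaging εε′ over many replications of the error vector approaches E(εε′). The errors below are drawn i.i.d. (an assumption made here for illustration), so the estimate approaches σ²I:

```python
import numpy as np

rng = np.random.default_rng(9)
n, reps, sigma = 4, 200_000, 1.5

# reps independent draws of an n x 1 error vector with i.i.d. components
eps = rng.normal(0, sigma, size=(reps, n))

# Monte Carlo estimate of E(eps eps'): average the n x n outer products
Sigma_hat = (eps[:, :, None] * eps[:, None, :]).mean(axis=0)

print(np.round(Sigma_hat, 2))
# Diagonal close to sigma^2 = 2.25, off-diagonal close to 0,
# i.e. Sigma = sigma^2 * I under homoskedastic, uncorrelated errors.
```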
