
Basic Econometrics 1

Dr Anna Staszewska-Bystrova
General Information

- Materials:
  1. Lecture notes
  2. Adkins L. C., Using gretl for Principles of Econometrics, 4th Edition,
     free e-book, 2014 (available at: www.learneconometrics.com/gretl.html)
- Software: Gretl
- Department of Econometric Models and Forecasts (Katedra Modeli i
  Prognoz Ekonometrycznych), www.econometrics.uni.lodz.pl
- Office hours: Thursday, 8:00–9:35, C216
- Final mark: test during the last meeting
Basic concepts

In economics we express our ideas about relationships between economic
variables using the mathematical concept of a function. For example, to
express a relationship between income i and consumption c, we may write

c = f(i)

The supply of an agricultural commodity such as beef might be written as

q^s = f(p, p^c, p^f)

where q^s is the quantity supplied, p is the price of beef, p^c is the price
of competitive products in production (for example, the price of hogs), and
p^f is the price of factors or inputs (for example, the price of corn) used
in the production process.
Econometrics is about how we can use economic, business or social
science theory and data, along with tools from statistics, to answer "how
much" type questions.
- A question facing a central bank is "How much should we increase
  the discount rate to slow inflation, and yet maintain a stable and
  growing economy?"
- A U.S. presidential candidate asks how many additional California
  voters will support him if he spends an additional million dollars on
  advertising in that state.
The answers to these questions depend on unknown parameter values
measuring, for example, the responsiveness of firms and individuals to an
increase in the discount rate, or the effectiveness of the advertising
campaign. These values have to be estimated.
- Econometrics is about how best to estimate economic parameters
  given the data we have.
The Econometric Model

- When studying the supply of beef we recognize that the actual supply
  is the sum of a systematic part and a random and unpredictable
  component, ε, that we will call a random error.
- An econometric model representing the supply is

  q^s = f(p, p^c, p^f) + ε

- The random error ε accounts for the many factors that affect supply
  that we have omitted from this simple model, and it also reflects the
  intrinsic uncertainty in economic activity.
- To complete the specification of the econometric model, we must
  also say something about the form of the algebraic relationship
  among our economic variables.
We can assume that the systematic part of the supply relation is linear:

f(p, p^c, p^f) = α_0 + α_1 p + α_2 p^c + α_3 p^f

The corresponding econometric model is

q^s = α_0 + α_1 p + α_2 p^c + α_3 p^f + ε
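As an illustration, such a linear supply equation could be estimated in
gretl. The following is a minimal sketch in gretl's scripting language
(hansl); the data file beef.gdt and the series names q, p, pc and pf are
hypothetical, not part of the course materials.

# hypothetical sketch: estimating the linear supply equation by least
# squares in gretl; the dataset and series names are assumed
open beef.gdt        # assumed dataset containing series q, p, pc, pf
ols q const p pc pf  # fits q = a0 + a1*p + a2*pc + a3*pf + e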
The Data

To estimate parameters of interest we use samples of observed data. The
data may be collected in:
- time series form: data collected over discrete intervals of time, e.g.
  total expenditure by hospitals in Poland in consecutive years
  1990-2014
- cross-section form: data collected over sample units in a particular
  time period, e.g. expenditure of each hospital in a sample of Polish
  hospitals in 2013
- panel data form: data that follow individual micro-units over time,
  e.g. expenditure of a number of hospitals analysed separately in
  consecutive years.
These data may be collected at various levels of aggregation:
- micro: data collected on individual economic decision-making units
  such as individuals, households or firms, e.g. expenditure data on
  individual hospitals
- macro: data resulting from pooling or aggregating over individuals,
  households or firms at the local, state or national level, e.g. total
  expenditure of all hospitals
The data collected may represent a flow or a stock:
- a flow measures a rate per unit of time, such as a hospital's
  expenditure on medicines during the last quarter of 2013 or its
  expenditure during the year 2013. A flow has a time dimension: the
  "size of the flow" is meaningless unless the period over which it is
  measured is stated.
- a stock is a value at a point in time, such as the value of medicines
  held by the hospital on December 1st, 2013.
The data collected may be quantitative or qualitative:
- quantitative: outcomes such as prices or income that may be
  expressed as numbers or some transformation of them, such as real
  prices or per capita income
- qualitative: outcomes of an "either-or" nature; for example, a
  consumer either did or did not purchase a particular good, or a
  person either is or is not married.
Statistical Inference

We'll use the phrase statistical inference a lot. By this we mean that we
want to "infer" or learn something about the real world by analyzing a
sample of data. The ways in which statistical inference is carried out
include:
- estimating economic parameters, such as elasticities, using
  econometric methods
- predicting economic outcomes, such as the enrollment at the
  University of Łódź over the next 10 years
- testing economic hypotheses, such as the question of whether
  newspaper advertising is better than store displays for increasing
  sales.
Types of Econometric Models

- Single-equation
- Multi-equation

- Linear
- Nonlinear

A linear, single-equation model with one explanatory variable is called a
simple regression model; with more than one explanatory variable it is a
multiple regression model.
The Multiple Regression Model

y_t = β_0 + β_1 x_{1t} + ... + β_k x_{kt} + ε_t

- y: the explained, dependent or endogenous variable, or regressand
- x_1, ..., x_k: the explanatory, independent or exogenous variables, or
  regressors
- The coefficients β_0, β_1, ..., β_k are unknown parameters.
- The parameter β_0 is the intercept term or the constant.
- ε: the error, disturbance or shock
The Assumptions of the Model

y_t = β_0 + β_1 x_{1t} + β_2 x_{2t} + ε_t

To make this statistical model complete, assumptions about the
probability distribution of the random errors, ε_t, need to be made.
1. E(ε_t) = 0. Each random error has a probability distribution with zero
mean.
2. var(ε_t) = σ². The variance σ² is an unknown parameter and it
measures the uncertainty in the statistical model. It is the same for each
observation, so the errors are said to be homoskedastic.
3. cov(ε_t, ε_s) = 0 for t ≠ s. The covariance between the random errors
corresponding to any two different observations is zero, so any pair of
errors is uncorrelated.
4. We will sometimes further assume that the random errors ε_t have
normal probability distributions, that is, ε_t ∼ N(0, σ²).
In addition to the above assumptions about the error term, we make two
assumptions about the explanatory variables.
- The first is that the explanatory variables are not random variables.
- The second is that no explanatory variable is an exact linear function
  of any of the other explanatory variables. This assumption is
  equivalent to assuming that no variable is redundant. As we will see,
  if this assumption is violated, a condition called "exact
  multicollinearity", the least squares procedure fails.
The statistical properties of y_t follow from those of ε_t:
1. E(y_t) = β_0 + β_1 x_{1t} + β_2 x_{2t}. The average value of y_t changes
across observations and is given by the regression function.
2. var(y_t) = var(ε_t) = σ²
3. cov(y_t, y_s) = cov(ε_t, ε_s) = 0
4. y_t ∼ N(β_0 + β_1 x_{1t} + β_2 x_{2t}, σ²), which is equivalent to
assuming that ε_t ∼ N(0, σ²).
Multiple Regression in Matrix Notation

The multiple regression model can be presented in the matrix form

y = Xβ + ε,

where y and ε are n × 1 (column) vectors, β is a (k + 1) × 1 vector and
X is an n × (k + 1) matrix:

y = (y_1, ..., y_n)′,  ε = (ε_1, ..., ε_n)′,  β = (β_0, β_1, ..., β_k)′

    ( 1  x_{11}  ...  x_{k1} )
X = ( .    .     ...    .    )
    ( 1  x_{1n}  ...  x_{kn} )
The previously discussed assumption of non-collinearity of the explanatory
variables can be expressed in terms of the X matrix. The structure of the
X matrix critically affects the quality of the inferences about β.
- We shall suppose that the X matrix has rank k + 1, the number of its
  columns and the number of β's. Because rank ≤ min(number of rows,
  number of columns), we require that n ≥ k + 1 (which is also the
  familiar condition that inference about k + 1 unknown parameters
  requires at least as many observations).
- The assumption rank(X) = k + 1 is often called the assumption of
  full rank.
- If the matrix does not have full rank we have (perfect or exact)
  multicollinearity.
If the matrix does not have full rank, it will not be possible to answer
some questions. To illustrate, suppose

y_t = β_0 + β_1 x_t + ε_t

    ( 1  1 )
X = ( .  . )
    ( 1  1 )

so that x_t = 1. The matrix X has rank 1, for its 2 columns are identical.
It will be impossible to say anything about β_0 and β_1 separately, for the
model says

y_t = β_0 + β_1 + ε_t

and so distinct values such as β_0 = 0, β_1 = 2 and β_0 = 2, β_1 = 0 make
identical claims about the distribution of y_t, viz. (namely)

y_t = 2 + ε_t

Because distinct values have identical implications for the behaviour of y,
there is no possibility of learning from the value of y which of the β
combinations is the correct one.
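This rank deficiency is easy to check numerically. Here is a minimal
sketch in hansl, using an arbitrary 5-observation version of the X above:

# two identical columns make X rank-deficient, so X'X is singular
matrix X = ones(5, 1) ~ ones(5, 1)   # 5x2 matrix of ones, identical columns
eval rank(X)      # prints 1, not 2: X is not of full column rank
eval det(X'X)     # 0, so (X'X)^(-1) does not exist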
Generally, if X is of less than full rank then, corresponding to the true
value β, there will be a β* (actually infinitely many) such that

Xβ = Xβ*

Thus β* and β are observationally equivalent: they make identical
predictions about the behaviour of y, and knowing the behaviour of y is
no help in telling which is the true value.
Least Squares

Least squares involves choosing β̂_0, ..., β̂_k to minimise

S(β̂) = Σ_{t=1}^{n} (y_t − (β̂_0 + β̂_1 x_{1t} + ... + β̂_k x_{kt}))²

The algebra is best done in matrix notation. Write the sum of squares
function S(β̂) as

S(β̂) = (y − Xβ̂)′(y − Xβ̂)

based on the fact that the t-th component of the vector y − Xβ̂ is

y_t − (β̂_0 + β̂_1 x_{1t} + ... + β̂_k x_{kt})

The operation of transposing a matrix is denoted by a prime, i.e. the
symbol ′: so X′ is the transpose of X.
The gradient of S, i.e. the vector of partial derivatives, is

∂S(β̂)/∂β̂ = ∂/∂β̂ (y′y − β̂′X′y − y′Xβ̂ + β̂′X′Xβ̂)
          = ∂/∂β̂ (y′y − 2β̂′X′y + β̂′X′Xβ̂)
          = −2X′y + 2X′Xβ̂

where the quadratic S(β̂) has been expanded and the terms β̂′X′y and
y′Xβ̂ combined because they are equal: these scalar quantities are
transposes of each other.
Setting the gradient equal to the zero vector,

∂S(β̂)/∂β̂ = 0,

gives a set of linear equations

−X′y + X′Xβ̂ = 0.

These equations are called the normal equations. When X has full rank,
the matrix X′X is nonsingular and can be inverted. We can write

β̂ = (X′X)^(−1) X′y

When X does not have full rank, X′X is singular and there are infinitely
many solutions to the equation −X′y + X′Xβ̂ = 0.
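The closed-form solution can be checked numerically. The hansl sketch
below simulates a small dataset (the true parameter values 1, 0.5 and
−0.3 are arbitrary illustrations) and compares the matrix formula with
gretl's built-in ols command:

# compute the least squares estimator two ways on simulated data
nulldata 50
set seed 1234
series x1 = normal()
series x2 = normal()
series y = 1 + 0.5*x1 - 0.3*x2 + normal()   # illustrative true parameters

ols y const x1 x2                 # built-in least squares

matrix X = {const, x1, x2}        # the n x (k+1) regressor matrix
matrix yv = {y}                   # y as an n x 1 vector
matrix b = invpd(X'X) * X'yv      # normal-equations solution (X'X)^(-1) X'y
print b                           # matches the ols coefficients above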
Properties of the least squares estimators

The estimator β̂ is a random vector and inherits its properties from y and
ultimately from ε. First we have

β̂ = (X′X)^(−1)X′y = (X′X)^(−1)X′(Xβ + ε)
  = (X′X)^(−1)X′Xβ + (X′X)^(−1)X′ε
  = Iβ + (X′X)^(−1)X′ε = β + (X′X)^(−1)X′ε

This is just algebra plus the specification that y = Xβ + ε.
The final line says that the estimator β̂ differs from the vector of
parameters that we want to estimate by an estimation error (X′X)^(−1)X′ε
that depends on ε, the vector of disturbances, and on the X matrix.
Different realisations of ε would give different realisations of β̂.
To obtain the properties of β̂ from those of ε we need the following
results about the expectation and the variance matrix of random vectors
(i.e. vectors of random variables) when the vectors are subjected to a
linear transformation.
The expectation of a random vector z is defined elementwise:

Ez = (Ez_1, ..., Ez_n)′,  so  Eε = (Eε_1, ..., Eε_n)′ = 0
The variance matrix (also called the covariance matrix or the
variance-covariance matrix) is defined as

var(z) = E((z − Ez)(z − Ez)′)

         ( var(z_1)       cov(z_1, z_2)  ...  cov(z_1, z_n) )
       = ( cov(z_1, z_2)  var(z_2)       ...       .        )
         (      .              .         ...       .        )
         ( cov(z_1, z_n)       .         ...   var(z_n)     )

So in the case of ε we have

         ( σ²  0   ...  0  )
var(ε) = ( 0   σ²  ...  .  ) = σ² I
         ( .   .   ...  0  )
         ( 0   0   ...  σ² )

This corresponds to the assumptions of no autocorrelation and
homoscedasticity.
Now, after these definitions, we have a theorem.
If the random vector w is defined as

w = c + Az

where z is a random vector, A is a matrix of constants and c is a vector
of constants, then
- Ew = c + AEz
- var(w) = A var(z) A′
- These generalise the results for scalar random variables: if
  w = c + az, then Ew = c + aEz and var(w) = a² var(z).
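A quick numeric illustration of the theorem in hansl; the matrices A and
c are arbitrary choices for the sketch, and z is drawn as N(0, I):

# check Ew = c + AEz and var(w) = A var(z) A' by simulation
set seed 7
matrix A = {1, 2; 0, 1}
matrix c = {1; -1}
matrix Z = mnormal(100000, 2)            # each row is a draw of z′
matrix W = ones(100000, 1) * c' + Z * A' # each row is a draw of w′
eval meanc(W)'    # approx. c, since Ez = 0
eval mcov(W)      # approx. A var(z) A' = AA', since var(z) = I
eval A*A'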
Applying the rule for the expectation gives, for the expectation of β̂,

Eβ̂ = β + (X′X)^(−1)X′Eε = β

because Eε = 0. Thus β̂ is an unbiased estimator of β.
Applying the rule for the variance matrix gives

var(β̂) = E((β̂ − β)(β̂ − β)′)
       = E((X′X)^(−1)X′εε′((X′X)^(−1)X′)′)
       = (X′X)^(−1)X′ var(ε) ((X′X)^(−1)X′)′
       = (X′X)^(−1)X′ σ²I ((X′X)^(−1)X′)′
       = σ²(X′X)^(−1)X′X(X′X)^(−1) = σ²(X′X)^(−1)

where we have used the fact that X is nonrandom (so the expectation
passes through) and E(εε′) = var(ε) because Eε = 0, together with the
assumptions concerning var(ε), the rule for the transpose of a product,
and the fact that (X′X)^(−1) is symmetric.
- Summing up, we have established two points:

  Eβ̂ = β,  var(β̂) = σ²(X′X)^(−1)

- Given a normal distribution of the error terms, ε ∼ N(0, σ²I), we can
  moreover say that

  β̂ ∼ N(β, σ²(X′X)^(−1))
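These properties can be illustrated by simulation. The hansl sketch below
(all parameter values are arbitrary illustrations) holds X fixed, draws
repeated samples of ε, re-estimates β̂ each time, and compares the average
estimate with β and the sampling variance with σ²(X′X)^(−1):

# Monte Carlo check of E(beta-hat) = beta and var(beta-hat) = s2*(X'X)^(-1)
nulldata 40
set seed 42
series x1 = normal()          # fixed regressors, as assumed in the model
series x2 = normal()
scalar R = 5000
matrix B = zeros(R, 3)

loop i = 1..R
    series y = 1 + 0.5*x1 - 0.3*x2 + normal()   # sigma^2 = 1
    ols y const x1 x2 --quiet
    B[i,] = $coeff'           # store this replication's estimates
endloop

matrix bmean = meanc(B)       # approx. (1, 0.5, -0.3): unbiasedness
matrix X = {const, x1, x2}
matrix V = invpd(X'X)         # with sigma^2 = 1, var(beta-hat) = (X'X)^(-1)
print bmean
eval mcov(B)                  # approximately equal to V
print V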
What you should be able to do

1. Name, define and give examples of various types of data
2. Explain what is meant by the terms: statistical inference, simple
regression, multiple regression, multicollinearity, unbiased estimator
3. Use appropriate terms to refer to the "components" of the multiple
regression model
4. Give the assumptions of the multiple regression model (also using
matrix notation)
5. Derive the least squares estimator, its expectation and its variance
matrix
6. Answer true/false questions on the material from the slides
