
Copyright 1996 Lawrence C. Marsh

PowerPoint Slides
for
Undergraduate Econometrics
by

Lawrence C. Marsh

To accompany: Undergraduate Econometrics


by R. Carter Hill, William E. Griffiths and George G. Judge
Publisher: John Wiley & Sons, 1997
Copyright 1996 Lawrence C. Marsh
1.1
Chapter 1

The Role of
Econometrics
in Economic Analysis
Copyright © 1997 John Wiley & Sons, Inc. All rights reserved. Reproduction or translation of this work beyond
that permitted in Section 117 of the 1976 United States Copyright Act without the express written permission of the
copyright owner is unlawful. Request for further information should be addressed to the Permissions Department,
John Wiley & Sons, Inc. The purchaser may make back-up copies for his/her own use only and not for distribution
or resale. The Publisher assumes no responsibility for errors, omissions, or damages, caused by the use of these
programs or from the use of the information contained herein.
Copyright 1996 Lawrence C. Marsh
1.2
The Role of Econometrics

Using Information:

1. Information from economic theory.

2. Information from economic data.


Copyright 1996 Lawrence C. Marsh
1.3
Understanding Economic Relationships:
Dow-Jones Stock Index, money supply, federal budget,
short-term treasury bills, inflation, trade deficit,
Federal Reserve Discount Rate, unemployment,
power of labor unions, capital gains tax,
rent control laws, crime rate
Copyright 1996 Lawrence C. Marsh
1.4
Economic Decisions

To use information effectively:

economic theory
economic data } economic
decisions

*Econometrics* helps us combine


economic theory and economic data .
Copyright 1996 Lawrence C. Marsh
1.5
The Consumption Function

Consumption, c, is some function of income, i :

c = f(i)

For applied econometric analysis


this consumption function must be
specified more precisely.
Copyright 1996 Lawrence C. Marsh
1.6
demand, qd, for an individual commodity:

qd = f( p, pc, ps, i )     demand

p = own price; pc = price of complements;
ps = price of substitutes; i = income

supply, qs, of an individual commodity:

qs = f( p, pc, pf )     supply

p = own price; pc = price of competitive products;
pf = price of factor inputs
Copyright 1996 Lawrence C. Marsh
1.7
How much ?

Listing the variables in an economic relationship is not enough.

For effective policy we must know the amount of change


needed for a policy instrument to bring about the desired
effect:

• By how much should the Federal Reserve


raise interest rates to prevent inflation?

• By how much can the price of football tickets


be increased and still fill the stadium?
Copyright 1996 Lawrence C. Marsh
1.8

Answering the How Much? question

Need to estimate parameters


that are both:

1. unknown
and
2. unobservable
Copyright 1996 Lawrence C. Marsh
1.9
The Statistical Model

Average or systematic behavior


over many individuals or many firms.

Not a single individual or single firm.


Economists are concerned with the
unemployment rate and not whether
a particular individual gets a job.
Copyright 1996 Lawrence C. Marsh
1.10

The Statistical Model

Actual vs. Predicted Consumption:


Actual = systematic part + random error

Consumption, c, is function, f, of income, i, with error, e:

c = f(i) + e
Systematic part provides prediction, f(i),
but actual will miss by random error, e.
Copyright 1996 Lawrence C. Marsh
The Consumption Function 1.11

c = f(i) + e
Need to define f(i) in some way.

To make consumption, c,
a linear function of income, i :

f(i) = β1 + β2 i

The statistical model then becomes:

c = β1 + β2 i + e
Copyright 1996 Lawrence C. Marsh
1.12
The Econometric Model

y = β1 + β2 X2 + β3 X3 + e

• Dependent variable, y, is focus of study


(predict or explain changes in dependent variable).

• Explanatory variables, X2 and X3, help us explain


observed changes in the dependent variable.
Copyright 1996 Lawrence C. Marsh
1.13
Statistical Models

Controlled (experimental)
vs.
Uncontrolled (observational)

Controlled experiment (“pure” science) explaining mass, y :


pressure, X2, held constant when varying temperature, X3,
and vice versa.

Uncontrolled experiment (econometrics) explaining consump-


tion, y : price, X2, and income, X3, vary at the same time.
Copyright 1996 Lawrence C. Marsh
1.14
Econometric model

• economic model
economic variables and parameters.

• statistical model
sampling process with its parameters.

• data
observed values of the variables.
Copyright 1996 Lawrence C. Marsh
1.15
The Practice of Econometrics

• Uncertainty regarding an outcome.


• Relationships suggested by economic theory.
• Assumptions and hypotheses to be specified.
• Sampling process including functional form.
• Obtaining data for the analysis.
• Estimation rule with good statistical properties.
• Fit and test model using software package.
• Analyze and evaluate implications of the results.
• Problems suggest approaches for further research.
Copyright 1996 Lawrence C. Marsh
1.16

Note: the textbook uses the following symbol


to mark sections with advanced material:

“Skippy”
Copyright 1996 Lawrence C. Marsh
2.1
Chapter 2

Some Basic
Probability
Concepts
Copyright © 1997 John Wiley & Sons, Inc. All rights reserved. Reproduction or translation of this work beyond
that permitted in Section 117 of the 1976 United States Copyright Act without the express written permission of the
copyright owner is unlawful. Request for further information should be addressed to the Permissions Department,
John Wiley & Sons, Inc. The purchaser may make back-up copies for his/her own use only and not for distribution
or resale. The Publisher assumes no responsibility for errors, omissions, or damages, caused by the use of these
programs or from the use of the information contained herein.
Copyright 1996 Lawrence C. Marsh
2.2
Random Variable

random variable:
A variable whose value is unknown until it is observed.
The value of a random variable results from an experiment.

The term random variable implies the existence of some


known or unknown probability distribution defined over
the set of all possible values of that variable.

In contrast, an arbitrary variable does not have a


probability distribution associated with its values.
Copyright 1996 Lawrence C. Marsh
2.3
Controlled experiment values
of explanatory variables are chosen
with great care in accordance with
an appropriate experimental design.

Uncontrolled experiment values


of explanatory variables consist of
nonexperimental observations over
which the analyst has no control.
Copyright 1996 Lawrence C. Marsh
2.4
Discrete Random Variable
discrete random variable:
A discrete random variable can take only a finite
number of values, that can be counted by using
the positive integers.

Example: Prize money from the following


lottery is a discrete random variable:
first prize: $1,000
second prize: $50
third prize: $5.75
since it has only four (a finite number)
(count: 1,2,3,4) of possible outcomes:
$0.00; $5.75; $50.00; $1,000.00
Copyright 1996 Lawrence C. Marsh
2.5
Continuous Random Variable
continuous random variable:
A continuous random variable can take
any real value (not just whole numbers)
in at least one interval on the real line.

Examples:
Gross national product (GNP)
money supply
interest rates
price of eggs
household income
expenditure on clothing
Copyright 1996 Lawrence C. Marsh
2.6
Dummy Variable

A discrete random variable that is restricted


to two possible values (usually 0 and 1) is
called a dummy variable (also, binary or
indicator variable).

Dummy variables account for qualitative differences:


gender (0=male, 1=female),
race (0=white, 1=nonwhite),
citizenship (0=U.S., 1=not U.S.),
income class (0=poor, 1=rich).
Copyright 1996 Lawrence C. Marsh
2.7
A list of all of the possible values taken
by a discrete random variable along with
their chances of occurring is called a probability
function or probability density function (pdf).

die x f(x)
one dot 1 1/6
two dots 2 1/6
three dots 3 1/6
four dots 4 1/6
five dots 5 1/6
six dots 6 1/6
Copyright 1996 Lawrence C. Marsh
2.8
A discrete random variable X
has pdf, f(x), which is the probability
that X takes on the value x.

f(x) = P(X=x)

Therefore, 0 < f(x) < 1

If X takes on the n values: x1, x2, . . . , xn,


then f(x1) + f(x2)+. . .+f(xn) = 1.
Copyright 1996 Lawrence C. Marsh
2.9
Probability, f(x), for a discrete random
variable, X, can be represented by height:

[Bar chart: probability f(x), from 0.1 to 0.4, plotted against X = 0, 1, 2, 3, where X is the number of three roommates on the Dean's List.]
Copyright 1996 Lawrence C. Marsh
2.10
A continuous random variable uses
area under a curve rather than the
height, f(x), to represent probability:

[Figure: density f(x) of per capita income, X, in the United States; the probability is split into two shaded areas, 0.1324 and 0.8676, marked at $34,000 and $55,000.]


Copyright 1996 Lawrence C. Marsh
2.11
Since a continuous random variable has an
uncountably infinite number of values,
the probability of one occurring is zero.

P[X=a] = P[a<X<a]=0

Probability is represented by area.


Height alone has no area.
An interval for X is needed to get
an area under the curve.
Copyright 1996 Lawrence C. Marsh
2.12
The area under a curve is the integral of
the equation that generates the curve:

P[a<X<b] = ∫ f(x) dx , integrated from x = a to x = b

For continuous random variables it is the


integral of f(x), and not f(x) itself, which
defines the area and, therefore, the probability.
Copyright 1996 Lawrence C. Marsh
2.13
Rules of Summation

(all sums run over i = 1, . . . , n)

Rule 1:  Σ xi = x1 + x2 + . . . + xn

Rule 2:  Σ a xi = a Σ xi

Rule 3:  Σ (xi + yi) = Σ xi + Σ yi

Note that summation is a linear operator


which means it operates term by term.
Copyright 1996 Lawrence C. Marsh
2.14
Rules of Summation (continued)
Rule 4:  Σ (a xi + b yi) = a Σ xi + b Σ yi

Rule 5:  x = (1/n) Σ xi = (x1 + x2 + . . . + xn) / n     (the sample mean)

The definition of x as given in Rule 5 implies
the following important fact:

Σ (xi − x) = 0
Copyright 1996 Lawrence C. Marsh
2.15
Rules of Summation (continued)
Rule 6:  Σ f(xi) = f(x1) + f(x2) + . . . + f(xn)

Notation:  Σx f(xi) = Σi f(xi) = Σi=1..n f(xi)

Rule 7:  Σi=1..n Σj=1..m f(xi,yj) = Σi=1..n [ f(xi,y1) + f(xi,y2) + . . . + f(xi,ym) ]

The order of summation does not matter:

Σi=1..n Σj=1..m f(xi,yj) = Σj=1..m Σi=1..n f(xi,yj)
Copyright 1996 Lawrence C. Marsh
2.16

The Mean of a Random Variable

The mean or arithmetic average of a


random variable is its mathematical
expectation or expected value, EX.
Copyright 1996 Lawrence C. Marsh
2.17
Expected Value
There are two entirely different, but mathematically
equivalent, ways of determining the expected value:

1. Empirically:
The expected value of a random variable, X,
is the average value of the random variable in an
infinite number of repetitions of the experiment.

In other words, draw an infinite number of samples,


and average the values of X that you get.
Copyright 1996 Lawrence C. Marsh
2.18
Expected Value
2. Analytically:
The expected value of a discrete random
variable, X, is determined by weighting all
the possible values of X by the corresponding
probability density function values, f(x), and
summing them up.

In other words:

E[X] = x1f(x1) + x2f(x2) + . . . + xnf(xn)


Copyright 1996 Lawrence C. Marsh
Empirical vs. Analytical 2.19

As sample size goes to infinity, the


empirical and analytical methods
will produce the same value.

In the empirical case when the


sample goes to infinity the values
of X occur with a frequency
equal to the corresponding f(x)
in the analytical expression.
Copyright 1996 Lawrence C. Marsh
2.20
Empirical (sample) mean:

x = (1/n) Σi=1..n xi

where n is the number of sample observations.

Analytical mean:

E[X] = Σi=1..n xi f(xi)

where n is the number of possible values of xi.

Notice how the meaning of n changes.


Copyright 1996 Lawrence C. Marsh
2.21
The expected value of X:

EX = Σi=1..n xi f(xi)

The expected value of X-squared:

EX² = Σi=1..n xi² f(xi)

It is important to notice that f(xi) does not change!

The expected value of X-cubed:

EX³ = Σi=1..n xi³ f(xi)
Copyright 1996 Lawrence C. Marsh
2.22
EX = 0(.1) + 1(.3) + 2(.3) + 3(.2) + 4(.1)
   = 1.9

EX² = 0²(.1) + 1²(.3) + 2²(.3) + 3²(.2) + 4²(.1)
    = 0 + .3 + 1.2 + 1.8 + 1.6
    = 4.9

EX³ = 0³(.1) + 1³(.3) + 2³(.3) + 3³(.2) + 4³(.1)
    = 0 + .3 + 2.4 + 5.4 + 6.4
    = 14.5
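
A minimal sketch of this calculation in Python, using the same pmf as the slide (values 0 to 4 with probabilities .1, .3, .3, .2, .1); everything here is just the weighted-sum definition of expected value.

```python
# Minimal sketch: expected values of a discrete random variable.
# pmf from the slide: x = 0,1,2,3,4 with f(x) = .1,.3,.3,.2,.1
pmf = {0: 0.1, 1: 0.3, 2: 0.3, 3: 0.2, 4: 0.1}

def expect(g, pmf):
    """E[g(X)] = sum of g(x) * f(x) over all possible values x."""
    return sum(g(x) * p for x, p in pmf.items())

EX  = expect(lambda x: x,     pmf)   # 1.9
EX2 = expect(lambda x: x**2,  pmf)   # 4.9
EX3 = expect(lambda x: x**3,  pmf)   # 14.5
print(EX, EX2, EX3)
```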
Copyright 1996 Lawrence C. Marsh
2.23
E[g(X)] = Σi=1..n g(xi) f(xi)

If g(X) = g1(X) + g2(X), then

E[g(X)] = Σi=1..n [ g1(xi) + g2(xi) ] f(xi)

E[g(X)] = Σi=1..n g1(xi) f(xi) + Σi=1..n g2(xi) f(xi)

E[g(X)] = E[g1(X)] + E[g2(X)]


Copyright 1996 Lawrence C. Marsh
2.24
Adding and Subtracting
Random Variables

E(X+Y) = E(X) + E(Y)

E(X-Y) = E(X) - E(Y)


Copyright 1996 Lawrence C. Marsh
2.25

Adding a constant to a variable will


add a constant to its expected value:

E(X+a) = E(X) + a
Multiplying by constant will multiply
its expected value by that constant:

E(bX) = b E(X)
Copyright 1996 Lawrence C. Marsh
2.26
Variance

var(X) = average squared deviations


around the mean of X.

var(X) = expected value of the squared deviations


around the expected value of X.

var(X) = E[(X − EX)²]
Copyright 1996 Lawrence C. Marsh
2.27
var(X) = E[(X − EX)²]
       = E[X² − 2X EX + (EX)²]
       = E(X²) − 2 EX EX + (EX)²
       = E(X²) − 2 (EX)² + (EX)²
       = E(X²) − (EX)²

var(X) = E(X²) − (EX)²
Copyright 1996 Lawrence C. Marsh
2.28

variance of a discrete
random variable, X:

var(X) = Σi=1..n (xi − EX)² f(xi)

standard deviation is square root of variance


Copyright 1996 Lawrence C. Marsh
2.29
calculate the variance for a
discrete random variable, X:
xi    f(xi)    xi − EX           (xi − EX)² f(xi)
 2     .1      2 − 4.3 = −2.3    5.29 (.1) = .529
 3     .3      3 − 4.3 = −1.3    1.69 (.3) = .507
 4     .1      4 − 4.3 =  −.3     .09 (.1) = .009
 5     .2      5 − 4.3 =   .7     .49 (.2) = .098
 6     .3      6 − 4.3 =  1.7    2.89 (.3) = .867

EX = Σ xi f(xi) = .2 + .9 + .4 + 1.0 + 1.8 = 4.3

var(X) = Σ (xi − EX)² f(xi) = .529 + .507 + .009 + .098 + .867 = 2.01
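
A minimal sketch of the same variance calculation in Python, using the pmf in the table above; the only inputs are the values and their probabilities.

```python
# Minimal sketch: variance of a discrete random variable.
# pmf from the slide: x = 2,3,4,5,6 with f(x) = .1,.3,.1,.2,.3
pmf = {2: 0.1, 3: 0.3, 4: 0.1, 5: 0.2, 6: 0.3}

EX   = sum(x * p for x, p in pmf.items())              # 4.3
varX = sum((x - EX) ** 2 * p for x, p in pmf.items())  # 2.01
sdX  = varX ** 0.5                                     # standard deviation
print(EX, varX, sdX)
```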
Copyright 1996 Lawrence C. Marsh
2.30

Z = a + cX

var(Z) = var(a + cX)
       = E[ ( (a + cX) − E(a + cX) )² ]
       = c² var(X)

var(a + cX) = c² var(X)
Copyright 1996 Lawrence C. Marsh
2.31
Joint pdf

A joint probability density function,


f(x,y), provides the probabilities
associated with the joint occurrence
of all of the possible pairs of X and Y.
Copyright 1996 Lawrence C. Marsh
2.32
Survey of College City, NY

joint pdf f(x,y):                      college grads in household
                                        Y = 1             Y = 2

vacation homes owned     X = 0      f(0,1) = .45      f(0,2) = .15
                         X = 1      f(1,1) = .05      f(1,2) = .35
Copyright 1996 Lawrence C. Marsh
2.33

Calculating the expected value of


functions of two random variables.

E[g(X,Y)] = Σi Σj g(xi,yj) f(xi,yj)

E(XY) = Σi Σj xi yj f(xi,yj)

E(XY) = (0)(1)(.45) + (0)(2)(.15) + (1)(1)(.05) + (1)(2)(.35) = .75
Copyright 1996 Lawrence C. Marsh
2.34
Marginal pdf

The marginal probability density functions,
f(x) and f(y), for discrete random variables,
are obtained by summing the joint pdf f(x,y)
over the values of Y to obtain f(x), and
over the values of X to obtain f(y).

f(xi) = Σj f(xi,yj)            f(yj) = Σi f(xi,yj)
Copyright 1996 Lawrence C. Marsh
2.35
                  Y = 1       Y = 2      marginal pdf for X:

X = 0              .45         .15        .60   f(X = 0)
X = 1              .05         .35        .40   f(X = 1)

marginal
pdf for Y:         .50         .50
                 f(Y = 1)    f(Y = 2)
Copyright 1996 Lawrence C. Marsh
2.36
Conditional pdf

The conditional probability density


functions of X given Y=y , f(x|y),
and of Y given X=x , f(y|x),
are obtained by dividing f(x,y) by f(y)
to get f(x|y) and by f(x) to get f(y|x).

f(x|y) = f(x,y) / f(y)              f(y|x) = f(x,y) / f(x)
Copyright 1996 Lawrence C. Marsh
2.37
conditional pdfs (computed from the joint and marginal pdfs above):

f(Y=1|X=0) = .45/.60 = .75        f(Y=2|X=0) = .15/.60 = .25
f(Y=1|X=1) = .05/.40 = .125       f(Y=2|X=1) = .35/.40 = .875

f(X=0|Y=1) = .45/.50 = .90        f(X=0|Y=2) = .15/.50 = .30
f(X=1|Y=1) = .05/.50 = .10        f(X=1|Y=2) = .35/.50 = .70
Copyright 1996 Lawrence C. Marsh
2.38
Independence

X and Y are independent random


variables if their joint pdf, f(x,y),
is the product of their respective
marginal pdfs, f(x) and f(y) .

f(xi,yj) = f(xi) f(yj)


for independence this must hold for all pairs of i and j
Copyright 1996 Lawrence C. Marsh
2.39
not independent

                  Y = 1       Y = 2      marginal pdf for X:

X = 0              .45         .15        .60   f(X = 0)
X = 1              .05         .35        .40   f(X = 1)

marginal
pdf for Y:         .50         .50
                 f(Y = 1)    f(Y = 2)

For independence each joint probability would have to equal the product
of its marginals: .50 x .60 = .30 for the X = 0 cells and .50 x .40 = .20
for the X = 1 cells. The actual joint probabilities (.45, .15, .05, .35)
do not match these required numbers, so X and Y are not independent.
Copyright 1996 Lawrence C. Marsh
2.40
Covariance

The covariance between two random


variables, X and Y, measures the
linear association between them.

cov(X,Y) = E[(X - EX)(Y-EY)]

Note that variance is a special case of covariance.


cov(X,X) = var(X) = E[(X − EX)²]
Copyright 1996 Lawrence C. Marsh
2.41
cov(X,Y) = E [(X - EX)(Y-EY)]

cov(X,Y) = E [(X - EX)(Y-EY)]


= E [XY - X EY - Y EX + EX EY]
= E(XY) - EX EY - EY EX + EX EY
= E(XY) - 2 EX EY + EX EY
= E(XY) - EX EY

cov(X,Y) = E(XY) - EX EY
Copyright 1996 Lawrence C. Marsh
2.42
                 Y = 1     Y = 2

X = 0             .45       .15       .60
X = 1             .05       .35       .40
                  .50       .50

EX = 0(.60) + 1(.40) = .40
EY = 1(.50) + 2(.50) = 1.50
EX EY = (.40)(1.50) = .60

E(XY) = (0)(1)(.45) + (0)(2)(.15) + (1)(1)(.05) + (1)(2)(.35) = .75

covariance:
cov(X,Y) = E(XY) − EX EY
         = .75 − (.40)(1.50)
         = .75 − .60
         = .15
Copyright 1996 Lawrence C. Marsh
2.43
Correlation

The correlation between two random


variables X and Y is their covariance
divided by the square roots of their
respective variances.

ρ(X,Y) = cov(X,Y) / [ √var(X) √var(Y) ]
Correlation is a pure number falling between -1 and 1.
Copyright 1996 Lawrence C. Marsh
2.44
                 Y = 1     Y = 2

X = 0             .45       .15       .60
X = 1             .05       .35       .40
                  .50       .50

EX = .40
E(X²) = 0²(.60) + 1²(.40) = .40
var(X) = E(X²) − (EX)² = .40 − (.40)² = .24

EY = 1.50
E(Y²) = 1²(.50) + 2²(.50) = .50 + 2.0 = 2.50
var(Y) = E(Y²) − (EY)² = 2.50 − (1.50)² = .25

cov(X,Y) = .15

correlation:
ρ(X,Y) = cov(X,Y) / [ √var(X) √var(Y) ] = .15 / √(.24)(.25) ≈ .61
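
A minimal sketch in Python of the covariance and correlation computed from this joint pdf; the dictionary below is exactly the table of joint probabilities used on the slide.

```python
# Minimal sketch: covariance and correlation from the joint pdf above.
# f(x,y) for X in {0,1} (vacation homes) and Y in {1,2} (college grads).
joint = {(0, 1): 0.45, (0, 2): 0.15, (1, 1): 0.05, (1, 2): 0.35}

EX  = sum(x * p for (x, y), p in joint.items())          # 0.40
EY  = sum(y * p for (x, y), p in joint.items())          # 1.50
EXY = sum(x * y * p for (x, y), p in joint.items())      # 0.75
EX2 = sum(x * x * p for (x, y), p in joint.items())
EY2 = sum(y * y * p for (x, y), p in joint.items())

cov = EXY - EX * EY                                      # 0.15
varX, varY = EX2 - EX ** 2, EY2 - EY ** 2                # 0.24, 0.25
rho = cov / (varX * varY) ** 0.5                         # about 0.61
print(cov, rho)
```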
Copyright 1996 Lawrence C. Marsh
2.45
Zero Covariance & Correlation

Independent random variables


have zero covariance and,
therefore, zero correlation.

The converse is not true.


Copyright 1996 Lawrence C. Marsh
Since expectation is a linear operator, 2.46
it can be applied term by term.

The expected value of the weighted sum


of random variables is the sum of the
expectations of the individual terms.

E[c1X + c2Y] = c1EX + c2EY

In general, for random variables X1, . . . , Xn :

E[c1X1+...+ cnXn] = c1EX1+...+ cnEXn


Copyright 1996 Lawrence C. Marsh
2.47
The variance of a weighted sum of random
variables is the sum of the variances, each times
the square of the weight, plus twice the covariances
of all the random variables times the products of
their weights.

Weighted sum of random variables:

var(c1X + c2Y) = c1² var(X) + c2² var(Y) + 2 c1c2 cov(X,Y)

Weighted difference of random variables:

var(c1X − c2Y) = c1² var(X) + c2² var(Y) − 2 c1c2 cov(X,Y)


Copyright 1996 Lawrence C. Marsh
2.48
The Normal Distribution

Y ~ N(β, σ²)

f(y) = [ 1 / √(2πσ²) ] exp[ −(y − β)² / (2σ²) ]

[Figure: bell-shaped density f(y) centered at β.]
Copyright 1996 Lawrence C. Marsh
2.49
The Standardized Normal

Z = (y − β)/σ

Z ~ N(0,1)

f(z) = [ 1 / √(2π) ] exp[ −z² / 2 ]
Copyright 1996 Lawrence C. Marsh
2.50
Y ~ N(β, σ²)

[Figure: density f(y) with the area to the right of a shaded.]

P[Y > a] = P[ (Y − β)/σ > (a − β)/σ ] = P[ Z > (a − β)/σ ]
Copyright 1996 Lawrence C. Marsh
2.51
Y ~ N(β, σ²)

[Figure: density f(y) with the area between a and b shaded.]

P[a < Y < b] = P[ (a − β)/σ < (Y − β)/σ < (b − β)/σ ]
             = P[ (a − β)/σ < Z < (b − β)/σ ]
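
A minimal sketch of this standardization in Python, using the standard library's normal CDF; the numbers β = 10, σ = 2, a = 9, b = 13 are made up purely for illustration.

```python
# Minimal sketch: P[a < Y < b] for Y ~ N(beta, sigma^2), by standardizing to Z ~ N(0,1).
from statistics import NormalDist

beta, sigma = 10.0, 2.0    # illustrative values, not from the slides
a, b = 9.0, 13.0

Z = NormalDist(0, 1)
prob = Z.cdf((b - beta) / sigma) - Z.cdf((a - beta) / sigma)
print(prob)   # P[(a-beta)/sigma < Z < (b-beta)/sigma]
```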
Copyright 1996 Lawrence C. Marsh
2.52
Linear combinations of jointly
normally distributed random variables
are themselves normally distributed.

Y1 ~ N(β1,σ12), Y2 ~ N(β2,σ22), . . . , Yn ~ N(βn,σn2)

W = c1Y1 + c2Y2 + . . . + cnYn

W ~ N[ E(W), var(W) ]
Copyright 1996 Lawrence C. Marsh
2.53
Chi-Square

If Z1, Z2, . . . , Zm denote m independent
N(0,1) random variables, and
V = Z1² + Z2² + . . . + Zm², then V ~ χ²(m)

V is chi-square with m degrees of freedom.

mean:        E[V] = E[ χ²(m) ] = m

variance:    var[V] = var[ χ²(m) ] = 2m
Copyright 1996 Lawrence C. Marsh
2.54
Student - t

If Z ~ N(0,1) and V ~ χ²(m), and if Z and V
are independent, then

t = Z / √(V/m)  ~  t(m)

t is Student-t with m degrees of freedom.

mean:        E[t] = E[ t(m) ] = 0     (symmetric about zero)

variance:    var[t] = var[ t(m) ] = m / (m − 2)
Copyright 1996 Lawrence C. Marsh
2.55
F Statistic

If V1 ~ χ²(m1) and V2 ~ χ²(m2), and if V1 and V2
are independent, then

F = (V1/m1) / (V2/m2)  ~  F(m1,m2)

F is an F statistic with m1 numerator
degrees of freedom and m2 denominator
degrees of freedom.
Copyright 1996 Lawrence C. Marsh
3.1
Chapter 3

The Simple Linear


Regression
Model
Copyright © 1997 John Wiley & Sons, Inc. All rights reserved. Reproduction or translation of this work beyond
that permitted in Section 117 of the 1976 United States Copyright Act without the express written permission of the
copyright owner is unlawful. Request for further information should be addressed to the Permissions Department,
John Wiley & Sons, Inc. The purchaser may make back-up copies for his/her own use only and not for distribution
or resale. The Publisher assumes no responsibility for errors, omissions, or damages, caused by the use of these
programs or from the use of the information contained herein.
Copyright 1996 Lawrence C. Marsh
3.2

Purpose of Regression Analysis

1. Estimate a relationship among economic


variables, such as y = f(x).

2. Forecast or predict the value of one


variable, y, based on the value of
another variable, x.
Copyright 1996 Lawrence C. Marsh
3.3
Weekly Food Expenditures

y = dollars spent each week on food items.


x = consumer’s weekly income.

The relationship between x and the expected


value of y , given x, might be linear:

E(y|x) = β1 + β2 x
Copyright 1996 Lawrence C. Marsh
3.4
[Figure 3.1a: Probability distribution f(y|x=480) of food expenditures given income x = $480, centered at µy|x=480.]
Copyright 1996 Lawrence C. Marsh
3.5
[Figure 3.1b: Probability distributions of food expenditures f(y|x=480) and f(y|x=800), centered at µy|x=480 and µy|x=800.]
Copyright 1996 Lawrence C. Marsh
3.6
[Figure 3.2: The economic model, a linear relationship between average expenditure on food, E(y|x) = β1 + β2x, and income, x; β1 is the intercept and β2 = ∆E(y|x)/∆x is the slope.]
Copyright 1996 Lawrence C. Marsh
Homoskedastic Case                                3.7
[Figure 3.3: The probability density function for expenditure yt at two levels of household income, x1 = 480 and x2 = 800; the spread of yt is the same at both income levels.]
Copyright 1996 Lawrence C. Marsh
Heteroskedastic Case                              3.8
[Figure: The variance of yt increases as household income, xt, increases (densities shown at incomes x1, x2, x3).]
Copyright 1996 Lawrence C. Marsh
3.9
Assumptions of the Simple Linear
Regression Model - I
1. The average value of y, given x, is given by
the linear regression:
E(y) = β1 + β2x
2. For each value of x, the values of y are
distributed around their mean with variance:
var(y) = σ2
3. The values of y are uncorrelated, having zero
covariance and thus no linear relationship:
cov(yi ,yj) = 0
4. The variable x must take at least two different
values, so that x ≠ c, where c is a constant.
Copyright 1996 Lawrence C. Marsh
3.10
One more assumption that is often used in
practice but is not required for least squares:

5. (optional) The values of y are normally


distributed about their mean for each
value of x:

y ~ N [(β1+β2x), σ2 ]
Copyright 1996 Lawrence C. Marsh
3.11
The Error Term
y is a random variable composed of two parts:

I. Systematic component: E(y) = β1 + β2x


This is the mean of y.

II. Random component: e = y - E(y)


= y - β1 - β2x
This is called the random error.

Together E(y) and e form the model:


y = β1 + β2x + e
Copyright 1996 Lawrence C. Marsh
3.12
[Figure 3.5: The relationship among y, e and the true regression line E(y) = β1 + β2x; each observation yt deviates from the line by its error et.]
Copyright 1996 Lawrence C. Marsh
3.13
[Figure 3.7a: The relationship among y, the residuals ê and the fitted regression line ŷ = b1 + b2x.]
Copyright 1996 Lawrence C. Marsh
3.14
[Figure 3.7b: The sum of squared residuals from any line other than the fitted least squares line, ŷ* = b1* + b2*x, will be larger.]
Copyright 1996 Lawrence C. Marsh
3.15
[Figure 3.4: Probability density functions for e (centered at 0) and for y (centered at β1 + β2x).]


Copyright 1996 Lawrence C. Marsh
The Error Term Assumptions 3.16

1. The value of y, for each value of x, is


y = β1 + β2x + e
2. The average value of the random error e is:
E(e) = 0
3. The variance of the random error e is:
var(e) = σ2 = var(y)
4. The covariance between any pair of e’s is:
cov(ei ,ej) = cov(yi ,yj) = 0
5. x must take at least two different values so that
x ≠ c, where c is a constant.
6. e is normally distributed with mean 0, var(e)=σ2
(optional) e ~ N(0,σ2)
Copyright 1996 Lawrence C. Marsh
Unobservable Nature 3.17
of the Error Term
1. Unspecified factors / explanatory variables,
not in the model, may be in the error term.

2. Approximation error is in the error term if


relationship between y and x is not exactly
a perfectly linear relationship.

3. Strictly unpredictable random behavior that


may be unique to that observation is in error.
Copyright 1996 Lawrence C. Marsh
3.18
Population regression values:
y t = β1 + β2x t + e t
Population regression line:
E(y t|x t) = β1 + β2x t

Sample regression values:

y t = b1 + b2x t + ê t

Sample regression line:

ŷ t = b1 + b2x t
Copyright 1996 Lawrence C. Marsh
3.19
y t = β1 + β2x t + e t

e t = y t - β1 - β2x t

Minimize error sum of squared deviations:


T
S(β1,β2) = Σ(y t - β1 - β2x t ) 2 (3.3.4)
t=1
Copyright 1996 Lawrence C. Marsh
Minimize w. r. t. β1 and β2 : 3.20

S(β1,β2) = Σt=1..T (y t − β1 − β2x t )²     (3.3.4)

∂S(.)/∂β1 = −2 Σ (y t − β1 − β2x t )

∂S(.)/∂β2 = −2 Σ x t (y t − β1 − β2x t )
Set each of these two derivatives equal to zero and
solve these two equations for the two unknowns: β1 β2
Copyright 1996 Lawrence C. Marsh
Minimize w. r. t. β1 and β2 : 3.21
S(.) = Σt=1..T (y t − β1 − β2x t )²

[Figure: S(.) plotted against βi; the derivative ∂S(.)/∂βi is negative to the left of the minimum, zero at the minimum (at βi = bi), and positive to the right.]
Copyright 1996 Lawrence C. Marsh
To minimize S(.), you set the two 3.22
derivatives equal to zero to get:
∂S(.)
= - 2 Σ (y t - b1 - b2x t ) = 0
∂β1
∂S(.)
= -2Σ x t (y t - b1 - b2x t ) = 0
∂β2
When these two terms are set to zero,
β1 and β2 become b1 and b2 because they no longer
represent just any value of β1 and β2 but the special
values that correspond to the minimum of S(.) .
Copyright 1996 Lawrence C. Marsh
3.23
−2 Σ (y t − b1 − b2x t ) = 0

−2 Σ x t (y t − b1 − b2x t ) = 0

Σ y t − T b1 − b2 Σ x t = 0

Σ x t y t − b1 Σ x t − b2 Σ x t² = 0

T b1 + b2 Σ x t = Σ y t

b1 Σ x t + b2 Σ x t² = Σ x t y t
Copyright 1996 Lawrence C. Marsh
3.24
T b1 + b2 Σ x t = Σ y t

b1 Σ x t + b2 Σ x t² = Σ x t y t

Solve for b1 and b2, using the definitions of x and y (the sample means):

b2 = [ T Σ x t y t − Σ x t Σ y t ] / [ T Σ x t² − (Σ x t)² ]

b1 = y − b2 x
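
A minimal sketch of these two formulas in Python; the x and y data are made-up numbers, used only to show the mechanics of the computation.

```python
# Minimal sketch: least squares slope and intercept for simple regression.
x = [1.0, 2.0, 3.0, 4.0, 5.0]    # illustrative data, not from the slides
y = [3.2, 4.1, 6.0, 7.8, 9.1]
T = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xt * yt for xt, yt in zip(x, y))
sum_x2 = sum(xt * xt for xt in x)

b2 = (T * sum_xy - sum_x * sum_y) / (T * sum_x2 - sum_x ** 2)
b1 = sum_y / T - b2 * sum_x / T
print(b1, b2)
```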
Copyright 1996 Lawrence C. Marsh
3.25
elasticities

η = (percentage change in y) / (percentage change in x)
  = (∆y/y) / (∆x/x) = (∆y/∆x)(x/y)

Using calculus, we can get the elasticity at a point:

η = lim∆x→0 (∆y/∆x)(x/y) = (∂y/∂x)(x/y)
Copyright 1996 Lawrence C. Marsh
3.26
applying elasticities

E(y) = β1 + β2 x

∂E(y)/∂x = β2

η = [∂E(y)/∂x] [x/E(y)] = β2 x/E(y)
Copyright 1996 Lawrence C. Marsh
3.27
estimating elasticities

η̂ = (∂y/∂x)(x/y) = b2 (x/y)

ŷt = b1 + b2 x t = 4 + 1.5 x t

x = 8 = average number of years of experience
y = $10 = average wage rate

η̂ = b2 (x/y) = 1.5 (8/10) = 1.2
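
A minimal sketch of the elasticity-at-the-means arithmetic, using the fitted slope and the sample means given on this slide.

```python
# Minimal sketch: elasticity at the means for the fitted line y_hat = 4 + 1.5 x.
b2, xbar, ybar = 1.5, 8.0, 10.0     # slope and sample means from the slide
eta_hat = b2 * xbar / ybar
print(eta_hat)                      # 1.2
```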
Copyright 1996 Lawrence C. Marsh
3.28
Prediction
Estimated regression equation:

y^t = 4 + 1.5 x t
xt = years of experience
y^t = predicted wage rate

^
If xt = 2 years, then yt = $7.00 per hour.
^
If xt = 3 years, then yt = $8.50 per hour.
Copyright 1996 Lawrence C. Marsh
3.29
log-log models

ln(y) = β1 + β2 ln(x)

∂ln(y)/∂x = β2 ∂ln(x)/∂x

(1/y) ∂y/∂x = β2 (1/x)
Copyright 1996 Lawrence C. Marsh
3.30
(1/y) ∂y/∂x = β2 (1/x)

(x/y) ∂y/∂x = β2

elasticity of y with respect to x:

η = (x/y) ∂y/∂x = β2
Copyright 1996 Lawrence C. Marsh
4.1
Chapter 4

Properties of
Least Squares
Estimators
Copyright © 1997 John Wiley & Sons, Inc. All rights reserved. Reproduction or translation of this work beyond
that permitted in Section 117 of the 1976 United States Copyright Act without the express written permission of the
copyright owner is unlawful. Request for further information should be addressed to the Permissions Department,
John Wiley & Sons, Inc. The purchaser may make back-up copies for his/her own use only and not for distribution
or resale. The Publisher assumes no responsibility for errors, omissions, or damages, caused by the use of these
programs or from the use of the information contained herein.
Copyright 1996 Lawrence C. Marsh
4.2
Simple Linear Regression Model

yt = β1 + β2 x t + ε t

yt = household weekly food expenditures


x t = household weekly income

For a given level of x t, the expected


level of food expenditures will be:
E(yt|x t) = β1 + β2 x t
Copyright 1996 Lawrence C. Marsh
Assumptions of the Simple 4.3
Linear Regression Model

1. yt = β1 + β2x t + ε t
2. E(ε t) = 0 <=> E(yt) = β1 + β2x t
3. var(ε t) = σ 2 = var(yt)
4. cov(ε i,ε j) = cov(yi,yj) = 0
5. xt ≠ c for every observation
6. ε t~N(0,σ 2) <=> yt~N(β1+ β2x t,σ 2)
Copyright 1996 Lawrence C. Marsh
4.4
The population parameters β1 and β2
are unknown population constants.

The formulas that produce the


sample estimates b1 and b2 are
called the estimators of β1 and β2.

When b1 and b2 are used to represent


the formulas rather than specific values,
they are called estimators of β1 and β2
which are random variables because
they are different from sample to sample.
Copyright 1996 Lawrence C. Marsh
4.5
Estimators are Random Variables
( estimates are not )

• If the least squares estimators b1 and b2

are random variables, then what are their
means, variances, covariances and
probability distributions?

• Compare the properties of alternative


estimators to the properties of the
least squares estimators.
Copyright 1996 Lawrence C. Marsh
4.6
The Expected Values of b1 and b2
The least squares formulas (estimators)
in the simple regression case:

b2 = [ TΣxtyt − Σxt Σyt ] / [ TΣxt² − (Σxt)² ]     (3.3.8a)

b1 = y − b2 x     (3.3.8b)

where y = Σyt / T and x = Σxt / T


yt = β1 + β2x t + ε t
Copyright 1996 Lawrence C. Marsh
4.7
Substitute yt = β1 + β2xt + εt into (3.3.8a) to get:

b2 = β2 + [ TΣxtεt − Σxt Σεt ] / [ TΣxt² − (Σxt)² ]

The mean of b2 is:

Eb2 = β2 + [ TΣxtEεt − Σxt ΣEεt ] / [ TΣxt² − (Σxt)² ]

Since Eεt = 0, then Eb2 = β2 .


Copyright 1996 Lawrence C. Marsh
4.8
An Unbiased Estimator

The result Eb2 = β2 means that


the distribution of b2 is centered at β2.

Since the distribution of b2


is centered at β2 ,we say that
b2 is an unbiased estimator of β2.
Copyright 1996 Lawrence C. Marsh
4.9
Wrong Model Specification

The unbiasedness result on the


previous slide assumes that we
are using the correct model.

If the model is of the wrong form


or is missing important variables,
then Eεt ≠ 0, then Eb2 ≠ β2 .
Copyright 1996 Lawrence C. Marsh
4.10
Unbiased Estimator of the Intercept

In a similar manner, the estimator b1


of the intercept or constant term can be
shown to be an unbiased estimator of β1
when the model is correctly specified.

Eb1 = β1
Copyright 1996 Lawrence C. Marsh
4.11
Equivalent expressions for b2:

b2 = Σ(xt − x)(yt − y) / Σ(xt − x)²     (4.2.6)

Expand and multiply top and bottom by T:

b2 = [ TΣxtyt − Σxt Σyt ] / [ TΣxt² − (Σxt)² ]     (3.3.8a)
Copyright 1996 Lawrence C. Marsh
4.12
Variance of b2

Given that both yt and εt have variance σ 2,


the variance of the estimator b2 is:

var(b2) = σ² / Σ(x t − x)²

b2 is a function of the yt values but


var(b2) does not involve yt directly.
Copyright 1996 Lawrence C. Marsh
4.13
Variance of b1

Given b1 = y − b2x
the variance of the estimator b1 is:

var(b1) = σ² Σx t² / [ T Σ(x t − x)² ]
Copyright 1996 Lawrence C. Marsh
4.14
Covariance of b1 and b2

cov(b1,b2) = − x σ² / Σ(x t − x)²

If x = 0, then cov(b1,b2) = 0: the slope estimate can change
without affecting the intercept estimate.
Copyright 1996 Lawrence C. Marsh
4.15
What factors determine
variance and covariance ?
1. σ 2: uncertainty about yt values uncertainty about
b1, b2 and their relationship.
2. The more spread out the xt values are then the more
confidence we have in b1, b2, etc.
3. The larger the sample size, T, the smaller the
variances and covariances.
4. The variance b1 is large when the (squared) xt values
are far from zero (in either direction).
5. Changing the slope, b2, has no effect on the intercept,
b1, when the sample mean is zero. But if sample
mean is positive, the covariance between b1 and
b2 will be negative, and vice versa.
Copyright 1996 Lawrence C. Marsh
4.16
Gauss-Markov Theorem

Under the first five assumptions of the


simple, linear regression model, the
ordinary least squares estimators b1
and b2 have the smallest variance of
all linear and unbiased estimators of
β1 and β2. This means that b1and b2
are the Best Linear Unbiased Estimators
(BLUE) of β1 and β2.
Copyright 1996 Lawrence C. Marsh
4.17
implications of Gauss-Markov
1. b1 and b2 are “best” within the class
of linear and unbiased estimators.
2. “Best” means smallest variance
within the class of linear/unbiased.
3. All of the first five assumptions must
hold to satisfy Gauss-Markov.
4. Gauss-Markov does not require
assumption six: normality.
5. G-Markov is not based on the least
squares principle but on b1 and b2.
Copyright 1996 Lawrence C. Marsh
4.18
G-Markov implications (continued)
6. If we are not satisfied with restricting
our estimation to the class of linear and
unbiased estimators, we should ignore
the Gauss-Markov Theorem and use
some nonlinear and/or biased estimator
instead. (Note: a biased or nonlinear
estimator could have smaller variance
than those satisfying Gauss-Markov.)
7. Gauss-Markov applies to the b1 and b2
estimators and not to particular sample
values (estimates) of b1 and b2.
Copyright 1996 Lawrence C. Marsh
4.19
Probability Distribution
of Least Squares Estimators

b1 ~ N( β1 ,  σ² Σxt² / [ T Σ(xt − x)² ] )

b2 ~ N( β2 ,  σ² / Σ(xt − x)² )
Copyright 1996 Lawrence C. Marsh
4.20
yt and ε t normally distributed
The least squares estimator of β2 can be
expressed as a linear combination of yt’s:

b2 = Σ wt yt     where wt = (x t − x) / Σ(x t − x)²

b1 = y − b2 x

This means that b1and b2 are normal since


linear combinations of normals are normal.
Copyright 1996 Lawrence C. Marsh
4.21
normally distributed under
The Central Limit Theorem

If the first five Gauss-Markov assumptions


hold, and sample size, T, is sufficiently large,
then the least squares estimators, b1 and b2,
have a distribution that approximates the
normal distribution with greater accuracy
the larger the value of sample size, T.
Copyright 1996 Lawrence C. Marsh
4.22
Consistency

We would like our estimators, b1 and b2, to collapse


onto the true population values, β1 and β2, as
sample size, T, goes to infinity.

One way to achieve this consistency property is


for the variances of b1 and b2 to go to zero as T
goes to infinity.

Since the formulas for the variances of the least


squares estimators b1 and b2 show that their
variances do, in fact, go to zero, then b1 and b2,
are consistent estimators of β1 and β2.
Copyright 1996 Lawrence C. Marsh
Estimating the variance                              4.23
of the error term, σ²

ê t = yt − b1 − b2 x t

σ̂² = Σt=1..T ê t² / (T − 2)

σ̂² is an unbiased estimator of σ²
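
A minimal sketch combining σ̂² = Σê²/(T−2) with var(b2) = σ²/Σ(xt − x)² to get a standard error for b2; the data and the fitted coefficients below are made up for illustration.

```python
# Minimal sketch: estimate sigma^2 from the residuals, then se(b2).
x = [1.0, 2.0, 3.0, 4.0, 5.0]    # illustrative data, not from the slides
y = [3.2, 4.1, 6.0, 7.8, 9.1]
T = len(x)
xbar, ybar = sum(x) / T, sum(y) / T

Sxx = sum((xt - xbar) ** 2 for xt in x)
b2  = sum((xt - xbar) * (yt - ybar) for xt, yt in zip(x, y)) / Sxx
b1  = ybar - b2 * xbar

resid     = [yt - b1 - b2 * xt for xt, yt in zip(x, y)]   # e_hat_t
sigma2hat = sum(e ** 2 for e in resid) / (T - 2)          # unbiased estimate of sigma^2
se_b2     = (sigma2hat / Sxx) ** 0.5                      # estimated standard error of b2
print(sigma2hat, se_b2)
```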
Copyright 1996 Lawrence C. Marsh
4.24
The Least Squares
Predictor, ^yo

Given a value of the explanatory


variable, Xo , we would like to predict
a value of the dependent variable, yo .

The least squares predictor is:

ŷo = b1 + b2 xo     (4.7.2)
Copyright 1996 Lawrence C. Marsh
5.1
Chapter 5

Inference
in the Simple
Regression Model
Copyright © 1997 John Wiley & Sons, Inc. All rights reserved. Reproduction or translation of this work beyond
that permitted in Section 117 of the 1976 United States Copyright Act without the express written permission of the
copyright owner is unlawful. Request for further information should be addressed to the Permissions Department,
John Wiley & Sons, Inc. The purchaser may make back-up copies for his/her own use only and not for distribution
or resale. The Publisher assumes no responsibility for errors, omissions, or damages, caused by the use of these
programs or from the use of the information contained herein.
Copyright 1996 Lawrence C. Marsh
5.2
Assumptions of the Simple
Linear Regression Model
1. yt = β1 + β2x t + ε t
2. E(ε t) = 0 <=> E(yt) = β1 + β2x t
3. var(ε t) = σ 2 = var(yt)
4. cov(ε i,ε j) = cov(yi,yj) = 0
5. xt ≠ c for every observation
6. ε t~N(0,σ 2) <=> yt~N(β1+ β2x t,σ 2)
Copyright 1996 Lawrence C. Marsh
5.3
Probability Distribution
of Least Squares Estimators

b1 ~ N( β1 ,  σ² Σxt² / [ T Σ(xt − x)² ] )

b2 ~ N( β2 ,  σ² / Σ(xt − x)² )
Copyright 1996 Lawrence C. Marsh
5.4
Error Variance Estimation

Unbiased estimator of the error variance:

σ̂² = Σ ê t² / (T − 2)

Transform to a chi-square distribution:

(T − 2) σ̂² / σ²  ~  χ²(T−2)
Copyright 1996 Lawrence C. Marsh
5.5
We make a correct decision if:

• The null hypothesis is false and we decide to reject it.

• The null hypothesis is true and we decide not to reject it.

Our decision is incorrect if:

• The null hypothesis is true and we decide to reject it.


This is a type I error.

• The null hypothesis is false and we decide not to reject it.


This is a type II error.
Copyright 1996 Lawrence C. Marsh
5.6
b2 ~ N( β2 ,  σ² / Σ(x t − x)² )

Create a standardized normal random variable, Z,
by subtracting the mean of b2 and dividing by its
standard deviation:

Z = (b2 − β2) / √var(b2)  ~  N(0,1)
Copyright 1996 Lawrence C. Marsh
5.7
Simple Linear Regression
yt = β1 + β2x t + ε t where E ε t = 0

yt ~ N(β1+ β2x t , σ 2)
since Eyt = β1 + β2x t

ε t = yt − β1 − β2x t

Therefore, ε t ~ N(0,σ 2) .
Copyright 1996 Lawrence C. Marsh
5.8

Create a Chi-Square

ε t ~ N(0,σ 2) but want N(0,1) .

(ε t /σ) ~ N(0,1) Standard Normal .

(ε t /σ) 2 ~ χ2(1) Chi-Square .


Copyright 1996 Lawrence C. Marsh
5.9
Sum of Chi-Squares

Σt=1..T (ε t /σ)² = (ε1 /σ)² + (ε2 /σ)² + . . . + (εT /σ)²

χ²(1) + χ²(1) + . . . + χ²(1) = χ²(T)

Therefore, Σt=1..T (ε t /σ)²  ~  χ²(T)


Copyright 1996 Lawrence C. Marsh
5.10
Chi-Square degrees of freedom
Since the errors ε t = yt − β1 − β2x t
are not observable, we estimate them with
the sample residuals e t = yt − b1 − b2x t.

Unlike the errors, the sample residuals are


not independent since they use up two degrees
of freedom by using b1 and b2 to estimate β1 and β2.

We get only T−2 degrees of freedom instead of T.


Copyright 1996 Lawrence C. Marsh
5.11
Student-t Distribution

t = Z / √(V/m)  ~  t(m)

where Z ~ N(0,1) and V ~ χ²(m)
Copyright 1996 Lawrence C. Marsh
5.12
t = Z / √( V/(T−2) )  ~  t(T−2)

where Z = (b2 − β2) / √var(b2)

and var(b2) = σ² / Σ(xi − x)²
Copyright 1996 Lawrence C. Marsh
5.13

V = (T−2) σ̂² / σ²

t = Z / √( V/(T−2) )

  = [ (b2 − β2) / √var(b2) ] / √[ ((T−2) σ̂²/σ²) / (T−2) ]

Copyright 1996 Lawrence C. Marsh
5.14
var(b2) = σ² / Σ(xi − x)²

Substituting and cancelling (notice the cancellations):

t = [ (b2 − β2) / √( σ²/Σ(xi − x)² ) ] / √( σ̂²/σ² )

  = (b2 − β2) / √( σ̂²/Σ(xi − x)² )
Copyright 1996 Lawrence C. Marsh
5.15

t = (b2 − β2) / √( σ̂² / Σ(xi − x)² ) = (b2 − β2) / √(estimated var(b2))

t = (b2 − β2) / se(b2)
Copyright 1996 Lawrence C. Marsh
5.16

Student’s t - statistic

(b2 − β2)
t = ~ t (T−2)
se(b2)

t has a Student-t Distribution


with T− 2 degrees of freedom.
Copyright 1996 Lawrence C. Marsh
5.17

Figure 5.1 Student-t Distribution


f(t)

(1−α)
α/2 α/2
-tc 0 tc t
red area = rejection region for 2-sided test
Copyright 1996 Lawrence C. Marsh
5.18

probability statements

P( t < -tc ) = P( t > tc ) = α/2

P(-tc ≤ t ≤ tc) = 1 − α

P( −tc ≤ (b2 − β2)/se(b2) ≤ tc ) = 1 − α
Copyright 1996 Lawrence C. Marsh
5.19

Confidence Intervals
Two-sided (1−α)x100% C.I. for β1:
b1 − tα/2[se(b1)], b1 + tα/2[se(b1)]

Two-sided (1−α)x100% C.I. for β2:


b2 − tα/2[se(b2)], b2 + tα/2[se(b2)]
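
A minimal sketch of the two-sided confidence interval for β2, using b2 = 0.1283 and se(b2) = 0.0305 from the Chapter 6 output; the critical value 2.024 is the approximate 5% two-tailed t value for 38 degrees of freedom, taken from a t table.

```python
# Minimal sketch: two-sided 95% confidence interval for beta2.
b2, se_b2 = 0.1283, 0.0305    # estimates from the Chapter 6 example
t_c = 2.024                   # approx. t critical value, 38 df, 5% two-tailed

lower = b2 - t_c * se_b2
upper = b2 + t_c * se_b2
print(lower, upper)
```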
Copyright 1996 Lawrence C. Marsh
5.20

Student-t vs. Normal Distribution


1. Both are symmetric bell-shaped distributions.

2. Student-t distribution has fatter tails than the normal.

3. Student-t converges to the normal for infinite sample.

4. Student-t conditional on degrees of freedom (df).

5. Normal is a good approximation of Student-t for the first few


decimal places when df > 30 or so.
Copyright 1996 Lawrence C. Marsh
5.21

Hypothesis Tests
1. A null hypothesis, H0.
2. An alternative hypothesis, H1.
3. A test statistic.
4. A rejection region.
Copyright 1996 Lawrence C. Marsh
5.22
Rejection Rules
1. Two-Sided Test:
If the value of the test statistic falls in the critical region in either
tail of the t-distribution, then we reject the null hypothesis in favor
of the alternative.

2. Left-Tail Test:
If the value of the test statistic falls in the critical region which lies
in the left tail of the t-distribution, then we reject the null
hypothesis in favor of the alternative.

2. Right-Tail Test:
If the value of the test statistic falls in the critical region which lies
in the right tail of the t-distribution, then we reject the null
hypothesis in favor of the alternative.
Copyright 1996 Lawrence C. Marsh
5.23

Format for Hypothesis Testing

1. Determine null and alternative hypotheses.


2. Specify the test statistic and its distribution
as if the null hypothesis were true.

3. Select α and determine the rejection region.

4. Calculate the sample value of test statistic.


5. State your conclusion.
Copyright 1996 Lawrence C. Marsh
5.24
practical vs. statistical
significance in economics
Practically but not statistically significant:
When sample size is very small, a large average gap between
the salaries of men and women might not be statistically
significant.

Statistically but not practically significant:


When sample size is very large, a small correlation (say, ρ =
0.00000001) between the winning numbers in the PowerBall
Lottery and the Dow-Jones Stock Market Index might be
statistically significant.
Copyright 1996 Lawrence C. Marsh
5.25
Type I and Type II errors
Type I error:
We make the mistake of rejecting the null
hypothesis when it is true.
α = P(rejecting H0 when it is true).

Type II error:
We make the mistake of failing to reject the null
hypothesis when it is false.
β = P(failing to reject H0 when it is false).
Copyright 1996 Lawrence C. Marsh
5.26
Prediction Intervals

A (1−α)x100% prediction interval for yo is:

ŷo ± tc se(f)

f = ŷo − yo          se(f) = √(estimated var(f))

estimated var(f) = σ̂² [ 1 + 1/T + (xo − x)² / Σ(x t − x)² ]
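
A minimal sketch of this prediction interval in Python; all the numbers below (data, σ̂², b1, b2, xo and the critical value) are made up for illustration, with tc = 3.182 being the approximate 5% two-tailed t value for T−2 = 3 degrees of freedom.

```python
# Minimal sketch: prediction interval y_hat_o +/- t_c * se(f).
x = [1.0, 2.0, 3.0, 4.0, 5.0]        # illustrative data
T = len(x)
xbar = sum(x) / T
Sxx  = sum((xt - xbar) ** 2 for xt in x)

sigma2hat = 0.09          # assumed estimate of the error variance
b1, b2    = 1.8, 1.5      # assumed least squares estimates
x_o       = 4.0
t_c       = 3.182         # approx. t critical value, 3 df, 5% two-tailed

y_hat_o = b1 + b2 * x_o
se_f    = (sigma2hat * (1 + 1 / T + (x_o - xbar) ** 2 / Sxx)) ** 0.5
print(y_hat_o - t_c * se_f, y_hat_o + t_c * se_f)
```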
Copyright 1996 Lawrence C. Marsh
6.1
Chapter 6

The Simple Linear


Regression Model

Copyright © 1997 John Wiley & Sons, Inc. All rights reserved. Reproduction or translation of this work beyond
that permitted in Section 117 of the 1976 United States Copyright Act without the express written permission of the
copyright owner is unlawful. Request for further information should be addressed to the Permissions Department,
John Wiley & Sons, Inc. The purchaser may make back-up copies for his/her own use only and not for distribution
or resale. The Publisher assumes no responsibility for errors, omissions, or damages, caused by the use of these
programs or from the use of the information contained herein.
Copyright 1996 Lawrence C. Marsh
6.2
Explaining Variation in yt

Predicting yt without any explanatory variables:

yt = β1 + et

Choose b1 to minimize   Σt=1..T et² = Σt=1..T (yt − β1)²

∂(Σ et²)/∂β1 = −2 Σt=1..T (yt − b1) = 0

Σ yt − T b1 = 0    so    b1 = y

Why not y? Minimizing the sum of squared errors shows that the best
constant predictor of yt is exactly the sample mean, y.
Copyright 1996 Lawrence C. Marsh
6.3
Explaining Variation in yt

yt = b1 + b2xt + ê t

Explained variation:     ŷt = b1 + b2xt

Unexplained variation:   ê t = yt − ŷt = yt − b1 − b2xt
Copyright 1996 Lawrence C. Marsh
6.4
Explaining Variation in yt

yt = ŷt + ê t     (using y, the sample mean, as the baseline)

yt − y = (ŷt − y) + ê t

Σt=1..T (yt − y)² = Σt=1..T (ŷt − y)² + Σt=1..T ê t²
(the cross-product term drops out)

SST = SSR + SSE
Copyright 1996 Lawrence C. Marsh
6.5
Total Variation in yt

SST = total sum of squares

SST measures variation of yt around y

SST = Σt=1..T (yt − y)²
Copyright 1996 Lawrence C. Marsh
6.6
Explained Variation in yt

SSR = regression sum of squares

Fitted values:   ŷt = b1 + b2xt

SSR measures variation of ŷt around y

SSR = Σt=1..T (ŷt − y)²
Copyright 1996 Lawrence C. Marsh
6.7
Unexplained Variation in yt

SSE = error sum of squares

ê t = yt − ŷt = yt − b1 − b2xt

SSE measures variation of yt around ŷt

SSE = Σt=1..T (yt − ŷt)² = Σt=1..T ê t²
Copyright 1996 Lawrence C. Marsh
6.8
Analysis of Variance Table

Table 6.1 Analysis of Variance Table

Source of Variation    DF      Sum of Squares    Mean Square
Explained               1      SSR               SSR/1
Unexplained           T−2      SSE               SSE/(T−2)  [= σ̂²]
Total                 T−1      SST
Copyright 1996 Lawrence C. Marsh
6.9
Coefficient of Determination
What proportion of the variation
in yt is explained?

0 ≤ R² ≤ 1

R² = SSR / SST
Copyright 1996 Lawrence C. Marsh
6.10
Coefficient of Determination
SST = SSR + SSE

Dividing by SST:     SST/SST = SSR/SST + SSE/SST

1 = SSR/SST + SSE/SST

R² = SSR/SST = 1 − SSE/SST
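
A minimal sketch of the decomposition and of R² in Python; the data are made-up numbers, fitted by least squares and then split into SST, SSR and SSE.

```python
# Minimal sketch: SST, SSR, SSE and R-squared for a fitted simple regression.
x = [1.0, 2.0, 3.0, 4.0, 5.0]    # illustrative data, not from the slides
y = [3.2, 4.1, 6.0, 7.8, 9.1]
T = len(x)
xbar, ybar = sum(x) / T, sum(y) / T

b2 = sum((xt - xbar) * (yt - ybar) for xt, yt in zip(x, y)) / sum((xt - xbar) ** 2 for xt in x)
b1 = ybar - b2 * xbar
yhat = [b1 + b2 * xt for xt in x]

SST = sum((yt - ybar) ** 2 for yt in y)
SSR = sum((yh - ybar) ** 2 for yh in yhat)
SSE = sum((yt - yh) ** 2 for yt, yh in zip(y, yhat))
print(SSR / SST, 1 - SSE / SST)   # the two expressions for R^2 agree
```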
Copyright 1996 Lawrence C. Marsh
6.11
Coefficient of Determination
R² is only a descriptive measure.

R² does not measure the quality
of the regression model.

Focusing solely on maximizing
R² is not a good idea.
Copyright 1996 Lawrence C. Marsh
6.12
Correlation Analysis
Population:

ρ = cov(X,Y) / [ √var(X) √var(Y) ]

Sample (replace each population moment with its sample estimate):

r = sample cov(X,Y) / [ √(sample var(X)) √(sample var(Y)) ]
Copyright 1996 Lawrence C. Marsh
6.13
Correlation Analysis

sample var(X) = Σt=1..T (xt − x)² / (T−1)

sample var(Y) = Σt=1..T (yt − y)² / (T−1)

sample cov(X,Y) = Σt=1..T (xt − x)(yt − y) / (T−1)
6.14
Correlation Analysis

Sample Correlation Coefficient

r = Σt=1..T (xt − x)(yt − y) / √[ Σt=1..T (xt − x)² · Σt=1..T (yt − y)² ]
Copyright 1996 Lawrence C. Marsh
Correlation Analysis and R²                            6.15

For simple linear regression analysis:

r² = R²

R² is also the squared correlation
between yt and ŷt,
measuring “goodness of fit”.
Copyright 1996 Lawrence C. Marsh
6.16
Regression Computer Output

Typical computer output of regression estimates:

Table 6.2 Computer Generated Least Squares Results


(1) (2) (3) (4) (5)
Parameter Standard T for H0:
Variable Estimate Error Parameter=0 Prob>|T|
INTERCEPT 40.7676 22.1387 1.841 0.0734
X 0.1283 0.0305 4.201 0.0002
Copyright 1996 Lawrence C. Marsh
6.17
Regression Computer Output
b1 = 40.7676        b2 = 0.1283

se(b1) = √(estimated var(b1)) = √490.12 = 22.1387

se(b2) = √(estimated var(b2)) = √0.0009326 = 0.0305

t = b1 / se(b1) = 40.7676 / 22.1387 = 1.84

t = b2 / se(b2) = 0.1283 / 0.0305 = 4.20
Copyright 1996 Lawrence C. Marsh
6.18
Regression Computer Output

Sources of variation in the dependent variable:

Table 6.3 Analysis of Variance Table


Sum of Mean
Source DF Squares Square
Explained 1 25221.2229 25221.2229
Unexplained 38 54311.3314 1429.2455
Total 39 79532.5544
R-square: 0.3171
Copyright 1996 Lawrence C. Marsh
6.19
Regression Computer Output

SST = Σ(yt − y)² = 79532

SSR = Σ(ŷt − y)² = 25221

SSE = Σ ê t² = 54311

σ̂² = SSE/(T−2) = 1429.2455

R² = SSR/SST = 1 − SSE/SST = 0.317
Copyright 1996 Lawrence C. Marsh
6.20
Reporting Regression Results

yt = 40.7676 + 0.1283xt
(s.e.) (22.1387) (0.0305)

yt = 40.7676 + 0.1283xt
(t) (1.84) (4.20)
Copyright 1996 Lawrence C. Marsh
6.21
Reporting Regression Results
2
R = 0.317
This R2 value may seem low but it is
typical in studies involving cross-sectional
data analyzed at the individual or micro level.

A considerably higher R2 value would be


expected in studies involving time-series data
analyzed at an aggregate or macro level.
Copyright 1996 Lawrence C. Marsh
6.22
Effects of Scaling the Data
Changing the scale of x:

yt = β1 + β2xt + et

yt = β1 + (cβ2)(xt/c) + et

yt = β1 + β2* xt* + et        where β2* = cβ2 and xt* = xt/c

The estimated coefficient and standard error change,
but the other statistics are unchanged.
Copyright 1996 Lawrence C. Marsh
Effects of Scaling the Data                            6.23

Changing the scale of y:

yt = β1 + β2xt + et

yt/c = (β1/c) + (β2/c)xt + et/c

yt* = β1* + β2* xt + et*        where yt* = yt/c, et* = et/c,
                                β1* = β1/c and β2* = β2/c

All statistics are changed except for
the t-statistics and the R² value.
Copyright 1996 Lawrence C. Marsh
6.24
Effects of Scaling the Data

Changing the scale of x and y:

yt = β1 + β2xt + et

yt/c = (β1/c) + β2 (xt/c) + et/c

yt* = β1* + β2 xt* + et*        where yt* = yt/c, et* = et/c,
                                β1* = β1/c and xt* = xt/c

No change in the R², the t-statistics, or the regression
results for β2, but all other statistics change.
Copyright 1996 Lawrence C. Marsh
6.25
Functional Forms

The term linear in a simple


regression model does not mean
a linear relationship between
variables, but a model in which
the parameters enter the model
in a linear way.
Copyright 1996 Lawrence C. Marsh
Linear vs. Nonlinear 6.27

Linear Statistical Models (linear in the parameters):

yt = β1 + β2xt + et              yt = β1 + β2 ln(xt) + et

ln(yt) = β1 + β2xt + et          yt = β1 + β2xt² + et

Nonlinear Statistical Models (nonlinear in the parameters), for example:

yt = β1 + β2 xt^β3 + et

yt = β1 + β2xt + exp(β3xt) + et
Copyright 1996 Lawrence C. Marsh
Linear vs. Nonlinear 6.27

[Figure: a nonlinear relationship between food expenditure (y) and income (x).]
Copyright 1996 Lawrence C. Marsh
6.28
Useful Functional Forms

Look at each form and its slope and elasticity:

1. Linear
2. Reciprocal
3. Log-Log
4. Log-Linear
5. Linear-Log
6. Log-Inverse
Copyright 1996 Lawrence C. Marsh
6.29
Useful Functional Forms

Linear

yt = β1 + β2xt + et

slope: β2          elasticity: β2 xt/yt
Copyright 1996 Lawrence C. Marsh
6.30
Useful Functional Forms

Reciprocal

yt = β1 + β2 (1/xt) + et

slope: −β2 (1/xt²)          elasticity: −β2 1/(xt yt)
Copyright 1996 Lawrence C. Marsh
6.31
Useful Functional Forms

Log-Log

ln(yt) = β1 + β2 ln(xt) + et

slope: β2 yt/xt          elasticity: β2
Copyright 1996 Lawrence C. Marsh
6.32
Useful Functional Forms

Log-Linear

ln(yt) = β1 + β2xt + et

slope: β2 yt          elasticity: β2 xt


Copyright 1996 Lawrence C. Marsh
6.33
Useful Functional Forms

Linear-Log

yt = β1 + β2 ln(xt) + et

slope: β2 (1/xt)          elasticity: β2 (1/yt)
Copyright 1996 Lawrence C. Marsh
6.34
Useful Functional Forms

Log-Inverse

ln(yt) = β1 − β2 (1/xt) + et

slope: β2 yt/xt²          elasticity: β2 (1/xt)
Copyright 1996 Lawrence C. Marsh
6.35
Error Term Properties

1. E(et) = 0
2. var(et) = σ²
3. cov(ei, ej) = 0
4. et ~ N(0, σ²)
Copyright 1996 Lawrence C. Marsh
6.36
Economic Models

1. Demand Models
2. Supply Models
3. Production Functions
4. Cost Functions
5. Phillips Curve
Copyright 1996 Lawrence C. Marsh
6.37
Economic Models

1. Demand Models
* quantity demanded (yd) and price (x)
* constant elasticity

ln(ytd) = β1 + β2 ln(xt) + et
Copyright 1996 Lawrence C. Marsh
6.38
Economic Models

2. Supply Models
* quantity supplied (ys) and price (x)
* constant elasticity

ln(yts) = β1 + β2 ln(xt) + et
Copyright 1996 Lawrence C. Marsh
6.39
Economic Models

3. Production Functions
* output (y) and input (x)
* constant elasticity
Cobb-Douglas Production Function:
ln(yt)= β1 + β2ln(xt) + et
Copyright 1996 Lawrence C. Marsh
6.40
Economic Models

4a. Cost Functions
* total cost (y) and output (x)

yt = β1 + β2 xt² + et
Copyright 1996 Lawrence C. Marsh
6.41
Economic Models

4b. Cost Functions
* average cost (y/x) and output (x)

(yt/xt) = β1 (1/xt) + β2 xt + et/xt


Copyright 1996 Lawrence C. Marsh
6.42
Economic Models

5. Phillips Curve
* nonlinear in both variables and parameters
* wage rate (wt), time (t) and the unemployment rate (ut)

%∆wt = (wt − wt−1)/wt−1 = γα + γη (1/ut)
Copyright 1996 Lawrence C. Marsh
7.1
Chapter 7

The Multiple
Regression Model
Copyright © 1997 John Wiley & Sons, Inc. All rights reserved. Reproduction or translation of this work beyond
that permitted in Section 117 of the 1976 United States Copyright Act without the express written permission of the
copyright owner is unlawful. Request for further information should be addressed to the Permissions Department,
John Wiley & Sons, Inc. The purchaser may make back-up copies for his/her own use only and not for distribution
or resale. The Publisher assumes no responsibility for errors, omissions, or damages, caused by the use of these
programs or from the use of the information contained herein.
Copyright 1996 Lawrence C. Marsh
Two Explanatory Variables 7.2

yt = β1 + β2xt2 + β3xt3 + et

The xt's affect yt separately:

∂yt/∂xt2 = β2          ∂yt/∂xt3 = β3

But least squares estimation of β2


now depends upon both xt2 and xt3 .
Copyright 1996 Lawrence C. Marsh
Correlated Variables 7.3

yt = β1 + β2xt2 + β3xt3 + et
yt = output xt2 = capital xt3 = labor

Always 5 workers per machine.

If number of workers per machine


is never varied, it becomes impossible
to tell if the machines or the workers
are responsible for changes in output.
Copyright 1996 Lawrence C. Marsh
The General Model 7.4

yt = β1 + β2xt2 + β3xt3 +. . .+ βKxtK + et

The parameter β1 is the intercept (constant) term.


The “variable” attached to β1 is xt1= 1.

Usually, the number of explanatory variables


is said to be K−1 (ignoring xt1= 1), while the
number of parameters is K. (Namely: β1 . . . βK).
Copyright 1996 Lawrence C. Marsh
7.5
Statistical Properties of et

1. E(et) = 0
2. var(et) = σ2
3. cov(et , es) = 0 for t ≠ s
4. et ~ N(0, σ2)
Copyright 1996 Lawrence C. Marsh
7.6
Statistical Properties of yt

1. E (yt) = β1 + β2xt2 +. . .+ βKxtK


2. var(yt) = var(et) = σ 2

3. cov(yt ,ys) = cov(et , es) = 0 t≠s


4. yt ~ N(β1+β2xt2 +. . .+βKxtK, σ2)
Copyright 1996 Lawrence C. Marsh
Assumptions 7.7

1. yt = β1 + β2xt2 +. . .+ βKxtK + et
2. E (yt) = β1 + β2xt2 +. . .+ βKxtK
3. var(yt) = var(et) = σ2
4. cov(yt ,ys) = cov(et ,es) = 0 t≠s
5. The values of xtk are not random
6. yt ~ N(β1+β2xt2 +. . .+βKxtK, σ2)
Copyright 1996 Lawrence C. Marsh
7.8
Least Squares Estimation
yt = β1 + β2xt2 + β3xt3 + et
T
2
S ≡ S(β1, β2, β3) = Σ(yt − β1 − β2xt2 − β3xt3)
t=1

Define: y*t = yt − y

x*t2 = xt2 − x2

x*t3 = xt3 − x3
Copyright 1996 Lawrence C. Marsh
7.9
Least Squares Estimators

b1 = y − b2 x2 − b3 x3

b2 = [ (Σy*t x*t2)(Σx*t3²) − (Σy*t x*t3)(Σx*t2 x*t3) ]
     / [ (Σx*t2²)(Σx*t3²) − (Σx*t2 x*t3)² ]

b3 = [ (Σy*t x*t3)(Σx*t2²) − (Σy*t x*t2)(Σx*t3 x*t2) ]
     / [ (Σx*t2²)(Σx*t3²) − (Σx*t2 x*t3)² ]
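
A minimal sketch of these two-regressor formulas in deviation-from-mean form; the y, x2 and x3 data are made-up numbers used only to show the mechanics.

```python
# Minimal sketch: least squares with two explanatory variables (deviation form).
y  = [5.0, 7.0, 8.0, 11.0, 12.0]    # illustrative data, not from the slides
x2 = [1.0, 2.0, 3.0, 4.0, 5.0]
x3 = [2.0, 1.0, 4.0, 3.0, 6.0]
T  = len(y)

ybar, x2bar, x3bar = sum(y) / T, sum(x2) / T, sum(x3) / T
ys  = [v - ybar  for v in y]        # y*  = y  - ybar
x2s = [v - x2bar for v in x2]       # x2* = x2 - x2bar
x3s = [v - x3bar for v in x3]       # x3* = x3 - x3bar

S22 = sum(a * a for a in x2s)
S33 = sum(a * a for a in x3s)
S23 = sum(a * b for a, b in zip(x2s, x3s))
Sy2 = sum(a * b for a, b in zip(ys, x2s))
Sy3 = sum(a * b for a, b in zip(ys, x3s))

den = S22 * S33 - S23 ** 2
b2  = (Sy2 * S33 - Sy3 * S23) / den
b3  = (Sy3 * S22 - Sy2 * S23) / den
b1  = ybar - b2 * x2bar - b3 * x3bar
print(b1, b2, b3)
```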
Copyright 1996 Lawrence C. Marsh
7.10
Dangers of Extrapolation
Statistical models generally are good only
“within the relevant range”. This means
that extending them to extreme data values
outside the range of the original data often
leads to poor and sometimes ridiculous results.
If height is normally distributed and the
normal ranges from minus infinity to plus
infinity, pity the man minus three feet tall.
Copyright 1996 Lawrence C. Marsh
Error Variance Estimation 7.11

Unbiased estimator of the error variance:

σ̂² = Σ ê t² / (T − K)

Transform to a chi-square distribution:

(T − K) σ̂² / σ²  ~  χ²(T−K)
Copyright 1996 Lawrence C. Marsh
Gauss-Markov Theorem 7.12

Under the assumptions of the


multiple regression model, the
ordinary least squares estimators
have the smallest variance of
all linear and unbiased estimators.
This means that the least squares
estimators are the Best Linear
U nbiased Estimators (BLUE).
Copyright 1996 Lawrence C. Marsh
7.13
Variances

yt = β1 + β2xt2 + β3xt3 + et

var(b2) = σ² / [ (1 − r23²) Σ(xt2 − x2)² ]

var(b3) = σ² / [ (1 − r23²) Σ(xt3 − x3)² ]

When r23 = 0 these reduce to the simple regression formulas.

where r23 = Σ(xt2 − x2)(xt3 − x3) / √[ Σ(xt2 − x2)² Σ(xt3 − x3)² ]
Copyright 1996 Lawrence C. Marsh
7.14
Variance Decomposition
The variance of an estimator is smaller when:

1. The error variance, σ², is smaller.
2. The sample size, T, is larger, so that Σt=1..T (xt2 − x2)² is larger.
3. The variable's values are more spread out: larger (xt2 − x2)².
4. The correlation between the explanatory variables is close to zero: r23 → 0.
Copyright 1996 Lawrence C. Marsh
7.15
Covariances

yt = β1 + β2xt2 + β3xt3 + et

cov(b2,b3) = − r23 σ² / [ (1 − r23²) √( Σ(xt2 − x2)² Σ(xt3 − x3)² ) ]

where r23 = Σ(xt2 − x2)(xt3 − x3) / √[ Σ(xt2 − x2)² Σ(xt3 − x3)² ]
Copyright 1996 Lawrence C. Marsh
7.16
Covariance Decomposition
The covariance between any two estimators
is larger in absolute value when:

1. The error variance, σ 2, is larger.


2. The sample size, T, is smaller.
3. The values of the variables are less spread out.

4. The correlation, r23, is high.


Copyright 1996 Lawrence C. Marsh
Var-Cov Matrix 7.17

yt = β1 + β2xt2 + β3xt3 + et

The least squares estimators b1, b2, and b3


have covariance matrix:

var(b1) cov(b1,b2) cov(b1,b3)


cov(b1,b2,b3) = cov(b1,b2) var(b2) cov(b2,b3)
cov(b1,b3) cov(b2,b3) var(b3)
Copyright 1996 Lawrence C. Marsh
Normal 7.18

yt = β1 + β2x2t + β3x3t + . . . + βKxKt + et

yt ~ N( (β1 + β2x2t + β3x3t + . . . + βKxKt), σ² )

This implies and is implied by:  et ~ N(0, σ²)

Since bk is a linear function of the yt's:

bk ~ N( βk , var(bk) )

z = (bk − βk) / √var(bk)  ~  N(0,1)    for k = 1, 2, ..., K
Copyright 1996 Lawrence C. Marsh
Student-t 7.19

Since generally the population variance
of bk, var(bk), is unknown, we estimate
it with the sample estimate, which uses σ̂² instead of σ².

t = (bk − βk) / √(estimated var(bk)) = (bk − βk) / se(bk)

t has a Student-t distribution with df = (T−K).
Copyright 1996 Lawrence C. Marsh
7.20
Interval Estimation
P( −tc ≤ (bk − βk)/se(bk) ≤ tc ) = 1 − α

tc is the critical value for (T−K) degrees of freedom such that P(t ≥ tc) = α/2.

P( bk − tc se(bk)  ≤  βk  ≤  bk + tc se(bk) ) = 1 − α

Interval endpoints:  [ bk − tc se(bk) ,  bk + tc se(bk) ]
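A quick sketch of building such an interval, assuming an estimate bk, its standard error, and T − K degrees of freedom are already in hand (the numbers are illustrative); scipy is used only to look up the critical value tc:

from scipy import stats

bk, se_bk, df, alpha = 2.984, 0.167, 49, 0.05   # illustrative values
tc = stats.t.ppf(1 - alpha/2, df)               # critical value with P(t >= tc) = alpha/2
lower, upper = bk - tc*se_bk, bk + tc*se_bk
print(f"{1-alpha:.0%} interval: ({lower:.3f}, {upper:.3f})")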


Copyright 1996 Lawrence C. Marsh
8.1
Chapter 8

Hypothesis Testing
and
Nonsample Information
Copyright © 1997 John Wiley & Sons, Inc. All rights reserved. Reproduction or translation of this work beyond
that permitted in Section 117 of the 1976 United States Copyright Act without the express written permission of the
copyright owner is unlawful. Request for further information should be addressed to the Permissions Department,
John Wiley & Sons, Inc. The purchaser may make back-up copies for his/her own use only and not for distribution
or resale. The Publisher assumes no responsibility for errors, omissions, or damages, caused by the use of these
programs or from the use of the information contained herein.
Copyright 1996 Lawrence C. Marsh
Chapter 8: Overview 8.2

1. Student-t Tests
2. Goodness-of-Fit
3. F-Tests
4. ANOVA Table
5. Nonsample Information
6. Collinearity
7. Prediction
Copyright 1996 Lawrence C. Marsh
Student - t Test 8.3

yt = β1 + β2Xt2 + β3Xt3 + β4Xt4 + et


Student-t tests can be used to test any linear
combination of the regression coefficients:
H0: β1 = 0 H0: β2 + β3 + β4 = 1
H0: 3β2 − 7β3 = 21 H0: β2 − β3 ≤ 5
Every such t-test has exactly T−K degrees of freedom,
where K = the number of coefficients estimated (including the intercept).
Copyright 1996 Lawrence C. Marsh
One Tail Test 8.4

yt = β1 + β2Xt2 + β3Xt3 + β4Xt4 + et

H0: β3 ≤ 0      H1: β3 > 0

t = b3 / se(b3)   ~   t(T−K),     df = T − K = T − 4

[Figure: right-tailed t density; reject H0 when t exceeds the critical value tc that cuts off tail area α.]
Copyright 1996 Lawrence C. Marsh
Two Tail Test 8.5

yt = β1 + β2Xt2 + β3Xt3 + β4Xt4 + et

H0: β2 = 0      H1: β2 ≠ 0

t = b2 / se(b2)   ~   t(T−K),     df = T − K = T − 4

[Figure: two-tailed t density; reject H0 when |t| exceeds the critical value tc that cuts off area α/2 in each tail.]
Copyright 1996 Lawrence C. Marsh
Goodness - of - Fit 8.6

Coefficient of Determination

R² = SSR / SST = Σt=1..T (ŷt − ȳ)² / Σt=1..T (yt − ȳ)²

0 ≤ R² ≤ 1
Copyright 1996 Lawrence C. Marsh
Adjusted R-Squared 8.7

Adjusted Coefficient of Determination

Original:    R² = SSR / SST = 1 − SSE / SST

Adjusted:    R̄² = 1 − [ SSE/(T−K) ] / [ SST/(T−1) ]
Copyright 1996 Lawrence C. Marsh
Computer Output 8.8

Table 8.2 Summary of Least Squares Results

Variable       Coefficient   Std Error   t-value    p-value
constant         104.79        6.48       16.17      0.000
price            −6.642        3.191      −2.081     0.042
advertising       2.984        0.167      17.868     0.000

t = b2 / se(b2) = −6.642 / 3.191 = −2.081
Copyright 1996 Lawrence C. Marsh
Reporting Your Results 8.9

Reporting standard errors:

ŷt = 104.79 − 6.642 Xt2 + 2.984 Xt3
      (6.48)   (3.191)    (0.167)      (s.e.)

Reporting t-statistics:

ŷt = 104.79 − 6.642 Xt2 + 2.984 Xt3
      (16.17)  (−2.081)   (17.868)     (t)
Copyright 1996 Lawrence C. Marsh
Single Restriction F-Test 8.10

yt = β1 + β2Xt2 + β3Xt3 + β4Xt4 + et

H0: β2 = 0      H1: β2 ≠ 0

F = [ (SSER − SSEU) / J ] / [ SSEU / (T−K) ]
  = [ (1964.758 − 1805.168) / 1 ] / [ 1805.168 / (52 − 3) ]
  = 4.33

dfn = J = 1      dfd = T − K = 49

By definition this is the t-statistic squared:   t = −2.081,   F = t² = 4.33
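A small sketch of the same calculation, assuming the restricted and unrestricted sums of squared errors are already available (the values are the ones quoted above):

sse_r, sse_u = 1964.758, 1805.168   # restricted and unrestricted SSE
J, T, K = 1, 52, 3                  # number of restrictions, sample size, coefficients
F = ((sse_r - sse_u) / J) / (sse_u / (T - K))
print(F)                            # about 4.33, the square of the t-statistic -2.081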
Copyright 1996 Lawrence C. Marsh
Multiple Restriction F-Test 8.11

yt = β1 + β2Xt2 + β3Xt3 + β4Xt4 + et

H0: β2 = 0, β4 = 0           H1: H0 not true

F = [ (SSER − SSEU) / J ] / [ SSEU / (T−K) ]

dfn = J = 2      dfd = T − K = 49

First run the restricted regression by dropping Xt2 and Xt4 to get SSER.
Next run the unrestricted regression to get SSEU.
Copyright 1996 Lawrence C. Marsh
8.12
F-Tests
F-Tests of this type are always right-tailed, even for left-sided or two-sided
hypotheses, because any deviation from the null will make the F value bigger
(move it rightward).

F = [ (SSER − SSEU) / J ] / [ SSEU / (T−K) ]

[Figure: f(F) density; reject H0 when F exceeds the critical value Fc that cuts off right-tail area α.]
Copyright 1996 Lawrence C. Marsh
F-Test of Entire Equation 8.13

yt = β1 + β2Xt2 + β3Xt3 + et

We ignore β1. Why?           H0: β2 = β3 = 0
                             H1: H0 not true

F = [ (SSER − SSEU) / J ] / [ SSEU / (T−K) ]
  = [ (13581.35 − 1805.168) / 2 ] / [ 1805.168 / (52 − 3) ]
  = 159.828

dfn = J = 2      dfd = T − K = 49      α = 0.05      Fc = 3.187      Reject H0!
Copyright 1996 Lawrence C. Marsh
ANOVA Table 8.14

Table 8.3 Analysis of Variance Table

Source        DF   Sum of Squares   Mean Square   F-Value
Explained      2      11776.18        5888.09     159.828
Unexplained   49       1805.168         36.84
Total         51      13581.35                 p-value: 0.0001

R² = SSR / SST = 11776.18 / 13581.35 = 0.867
Copyright 1996 Lawrence C. Marsh
Nonsample Information 8.15

A certain production process is known to be


Cobb-Douglas with constant returns to scale.
ln(yt) = β1 + β2 ln(Xt2) + β3 ln(Xt3) + β4 ln(Xt4) + et

where β2 + β3 + β4 = 1, so that β4 = (1 − β2 − β3).

Substituting the restriction gives:

ln(yt /Xt4) = β1 + β2 ln(Xt2 /Xt4) + β3 ln(Xt3 /Xt4) + et

that is,   y*t = β1 + β2 X*t2 + β3 X*t3 + et

Run least squares on the transformed model.
Interpret the coefficients the same as in the original model.
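A sketch of imposing the constant-returns restriction before estimation, assuming arrays y, X2, X3, X4 of raw (untransformed) observations are available (the function and variable names are illustrative):

import numpy as np

def restricted_cobb_douglas(y, X2, X3, X4):
    """Estimate ln(y/X4) = b1 + b2*ln(X2/X4) + b3*ln(X3/X4); b4 = 1 - b2 - b3."""
    ys = np.log(y / X4)
    Z = np.column_stack([np.ones_like(ys), np.log(X2 / X4), np.log(X3 / X4)])
    b1, b2, b3 = np.linalg.lstsq(Z, ys, rcond=None)[0]
    return b1, b2, b3, 1.0 - b2 - b3      # b4 recovered from the restriction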
Copyright 1996 Lawrence C. Marsh
Collinear Variables 8.16

The term “independent variable” means


an explanatory variable is independent of
the error term, but not necessarily
independent of other explanatory variables.
Since economists typically have no control
over the implicit “experimental design”,
explanatory variables tend to move
together which often makes sorting out
their separate influences rather problematic.
Copyright 1996 Lawrence C. Marsh
Effects of Collinearity 8.17

A high degree of collinearity will produce:


1. no least squares output when collinearity is exact.
2. large standard errors and wide confidence intervals.
3. insignificant t-values even with high R2 and a
significant F-value.
4. estimates sensitive to deletion or addition of a few
observations or “insignificant” variables.
5. good “within-sample”(same proportions) but poor
“out-of-sample”(different proportions) prediction.
Copyright 1996 Lawrence C. Marsh
Identifying Collinearity 8.18

Evidence of high collinearity includes:


1. a high pairwise correlation between two
explanatory variables.
2. a high R-squared when regressing one
explanatory variable at a time on each of the
remaining explanatory variables.
3. a statistically significant F-value when the
t-values are statistically insignificant.
4. an R-squared that doesn’t fall by much when
dropping any of the explanatory variables.
Copyright 1996 Lawrence C. Marsh
Mitigating Collinearity 8.19

Since high collinearity is not a violation of


any least squares assumption, but rather a
lack of adequate information in the sample:
1. collect more data with better information.
2. impose economic restrictions as appropriate.
3. impose statistical restrictions when justified.
4. if all else fails at least point out that the poor
model performance might be due to the
collinearity problem (or it might not).
Copyright 1996 Lawrence C. Marsh
Prediction 8.20

yt = β1 + β2Xt2 + β3Xt3 + et
Given a set of values for the explanatory
variables, (1 X02 X03), the best linear
unbiased predictor of y is given by:
ŷ0 = b1 + b2 X02 + b3 X03

This predictor is unbiased in the sense


that the average value of the forecast
error is zero.
Copyright 1996 Lawrence C. Marsh
9.1
Chapter 9

Extensions
of the Multiple
Regression Model
Copyright © 1997 John Wiley & Sons, Inc. All rights reserved. Reproduction or translation of this work beyond
that permitted in Section 117 of the 1976 United States Copyright Act without the express written permission of the
copyright owner is unlawful. Request for further information should be addressed to the Permissions Department,
John Wiley & Sons, Inc. The purchaser may make back-up copies for his/her own use only and not for distribution
or resale. The Publisher assumes no responsibility for errors, omissions, or damages, caused by the use of these
programs or from the use of the information contained herein.
Copyright 1996 Lawrence C. Marsh
9.2
Topics for This Chapter
1. Intercept Dummy Variables
2. Slope Dummy Variables
3. Different Intercepts & Slopes
4. Testing Qualitative Effects
5. Are Two Regressions Equal?
6. Interaction Effects
7. Dummy Dependent Variables
Copyright 1996 Lawrence C. Marsh
9.3
Intercept Dummy Variables
Dummy variables are binary (0,1)
yt = β1 + β2Xt + β3Dt + et
yt = speed of car in miles per hour
Xt = age of car in years
Dt = 1 if red car, Dt = 0 otherwise.

H0: β3 = 0
Police: red cars travel faster.
H1: β3 > 0
Copyright 1996 Lawrence C. Marsh
9.4
yt = β1 + β2Xt + β3Dt + et
red cars: yt = (β1 + β3) + β2xt + et
other cars: yt = β1 + β2Xt + et
[Figure: speed (miles per hour) against age in years; two parallel lines with common slope β2, the red-car line starting at the higher intercept β1 + β3 and other cars at β1.]
Copyright 1996 Lawrence C. Marsh
Slope Dummy Variables 9.5

yt = β1 + β2Xt + β3DtXt + et
Stock portfolio: Dt = 1 Bond portfolio: Dt = 0

stocks:  yt = β1 + (β2 + β3)Xt + et
bonds:   yt = β1 + β2Xt + et

[Figure: portfolio value against years; both lines start at the initial investment β1, the stock line rising with slope β2 + β3 and the bond line with slope β2.]
Copyright 1996 Lawrence C. Marsh
9.6
Different Intercepts & Slopes
yt = β1 + β2Xt + β3Dt + β4DtXt + et
“miracle” seed: Dt = 1 regular seed: Dt = 0
“miracle” seed:  yt = (β1 + β3) + (β2 + β4)Xt + et
regular seed:    yt = β1 + β2Xt + et

[Figure: harvest weight of corn against rainfall; the “miracle” line has both a higher intercept (β1 + β3) and a steeper slope (β2 + β4) than the regular line.]
Copyright 1996 Lawrence C. Marsh
yt = β1 + β2 Xt + β3 Dt + et 9.7

For men: Dt = 1.
For women: Dt = 0.

Men:    yt = (β1 + β3) + β2 Xt + et
Women:  yt = β1 + β2 Xt + et

Testing for discrimination in starting wage:    H0: β3 = 0     H1: β3 > 0

[Figure: wage rate against years of experience; two parallel lines with common slope β2, the men’s line shifted up by β3.]
yt = β1 + β5 Xt + β6 Dt Xt + et
Copyright 1996 Lawrence C. Marsh
9.8
For men Dt = 1.
For women Dt = 0.
Men:    yt = β1 + (β5 + β6)Xt + et
Women:  yt = β1 + β5 Xt + et

Men and women have the same starting wage, β1, but their wage rates increase
at different rates (difference = β6). β6 > 0 means that men’s wage rates are
increasing faster than women’s wage rates.

[Figure: wage rate against years of experience; both lines start at β1, the men’s line rising with the steeper slope β5 + β6.]
Copyright 1996 Lawrence C. Marsh
An Ineffective Affirmative Action Plan 9.9
yt = β1 + β2 Xt + β3 Dt + β4 Dt Xt + et
Women:  yt = β1 + β2 Xt + et
Men:    yt = (β1 + β3) + (β2 + β4) Xt + et

Women are given a higher starting wage, β1, while men get the lower starting
wage, β1 + β3 (note: β3 < 0). But men get a faster rate of increase in their
wages, β2 + β4, which is higher than the rate of increase for women, β2
(since β4 > 0).

[Figure: wage rate against years of experience; the women’s line starts higher, but the men’s steeper line eventually overtakes it.]
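A minimal sketch of estimating this intercept-and-slope dummy model by least squares, assuming arrays y, X (experience) and D (the 0/1 dummy) are available (the names are illustrative):

import numpy as np

def dummy_intercept_slope(y, X, D):
    """OLS for y = b1 + b2*X + b3*D + b4*D*X + e."""
    Z = np.column_stack([np.ones_like(X), X, D, D * X])
    b, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return b      # [b1, b2, b3, b4]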
Copyright 1996 Lawrence C. Marsh
9.10
Testing Qualitative Effects

1. Test for differences in intercept.

2. Test for differences in slope.

3. Test for differences in both


intercept and slope.
Copyright 1996 Lawrence C. Marsh
men: Dt = 1 ; women: Dt = 0 9.11

yt = β1 + β2 Xt + β3 Dt + β4 Dt Xt + et

Testing for discrimination in starting wage (intercept):
H0: β3 ≤ 0  vs.  H1: β3 > 0        t = (b3 − 0) / √Est.Var(b3)   ~   t(T−4)

Testing for discrimination in wage increases (slope):
H0: β4 ≤ 0  vs.  H1: β4 > 0        t = (b4 − 0) / √Est.Var(b4)   ~   t(T−4)
Copyright 1996 Lawrence C. Marsh
Testing: Ho: β3 = β4 = 0 9.12

H1: otherwise

F = [ (SSER − SSEU) / 2 ] / [ SSEU / (T − 4) ]   ~   F(2, T−4)

SSEU = Σt=1..T (yt − b1 − b2Xt − b3Dt − b4DtXt)²      (intercept and slope unrestricted)

SSER = Σt=1..T (yt − b1 − b2Xt)²
Copyright 1996 Lawrence C. Marsh
9.13
Are Two Regressions Equal?
variations of “The Chow Test”

I. Assuming equal variances (pooling):


men: Dt = 1 ; women: Dt = 0
yt = β1 + β2 Xt + β3 Dt + β4 Dt Xt + et

Ho: β3 = β4 = 0 vs. H1: otherwise


yt = wage rate Xt = years of experience
This model assumes equal wage rate variance.
Copyright 1996 Lawrence C. Marsh
II. Allowing for unequal variances: 9.14
(running three regressions)
Forcing men and women to have same β1, β2.
Everyone: yt = β1 + β2 Xt + et SSER
Allowing men and women to be different.
Men only: ytm = δ1 + δ2 Xtm + etm SSEm
Women only: ytw = γ1 + γ2 Xtw + etw SSEw
(SSER − SSEU)/J J = # restrictions
F=
SSEU /(T−K) K=unrestricted coefs.
J=2 K = 4 where SSEU = SSEm + SSEw
Copyright 1996 Lawrence C. Marsh
9.15
Interaction Variables

1. Interaction Dummies

2. Polynomial Terms
(special case of continuous interaction)

3. Interaction Among Continuous Variables


Copyright 1996 Lawrence C. Marsh
9.16
1. Interaction Dummies
Wage Gap between Men and Women
yt = wage rate; Xt = experience
For men: Mt = 1. For women: Mt = 0.
For black: Bt = 1. For nonblack: Bt = 0.
No Interaction: wage gap assumed the same:
yt = β1 + β2 Xt + β3 Mt + β4 Bt + et
Interaction: wage gap depends on race:
yt = β1 + β2 Xt + β3 Mt + β4 Bt + β5 Mt Bt + et
Copyright 1996 Lawrence C. Marsh
9.17
2. Polynomial Terms
Polynomial Regression yt = income; Xt = age
Linear in parameters but nonlinear in variables:
yt = β1 + β2 Xt + β3 Xt² + β4 Xt³ + et

[Figure: income yt against age Xt from 20 to 90; the cubic rises, levels off, and turns down at older ages. People retire at different ages or not at all.]
Copyright 1996 Lawrence C. Marsh
9.18
Polynomial Regression
yt = income;  Xt = age

yt = β1 + β2 Xt + β3 Xt² + β4 Xt³ + et

Rate income is changing as we age:

∂yt/∂Xt = β2 + 2β3 Xt + 3β4 Xt²

The slope changes as Xt changes.
Copyright 1996 Lawrence C. Marsh
9.19
3. Continuous Interaction
Exam grade = f(sleep:Zt , study time:Bt)

yt = β1 + β2 Zt + β3 Bt + β4 Zt Bt + et

Sleep and study time do not act independently.

More study time will be more effective


when combined with more sleep and less
effective when combined with less sleep.
Copyright 1996 Lawrence C. Marsh
9.20
continuous interaction
Exam grade = f(sleep:Zt , study time:Bt)
yt = β1 + β2 Zt + β3 Bt + β4 Zt Bt + et

Your studying is more effective with more sleep:

∂yt/∂Bt = β3 + β4 Zt

Your mind sorts things out while you sleep (when you have things to sort out):

∂yt/∂Zt = β2 + β4 Bt
Copyright 1996 Lawrence C. Marsh
Exam grade = f(sleep:Zt , study time:Bt) 9.21
If Zt + Bt = 24 hours, then Bt = (24 − Zt).

yt = β1 + β2 Zt + β3 Bt + β4 Zt Bt + et
yt = β1 + β2 Zt + β3(24 − Zt) + β4 Zt(24 − Zt) + et
yt = (β1 + 24β3) + (β2 − β3 + 24β4) Zt − β4 Zt² + et
yt = δ1 + δ2 Zt + δ3 Zt² + et

Sleep needed to maximize your exam grade:

∂yt/∂Zt = δ2 + 2δ3 Zt = 0    ⇒    Zt = −δ2 / (2δ3),    where δ2 > 0 and δ3 < 0.
Copyright 1996 Lawrence C. Marsh
9.22
Dummy Dependent Variables

1. Linear Probability Model

2. Probit Model

3. Logit Model
Copyright 1996 Lawrence C. Marsh
9.23
Linear Probability Model

yi = 1 if the worker quits the job;  yi = 0 if the worker does not quit.

yi = β1 + β2 Xi2 + β3 Xi3 + β4 Xi4 + ei

Xi2 = total hours of work each week
Xi3 = weekly paycheck
Xi4 = hourly pay (Xi3 divided by Xi2)
Copyright 1996 Lawrence C. Marsh
Linear Probability Model 9.24

yi = β1 + β2 Xi2 + β3 Xi3 + β4 Xi4 + ei


Read predicted values of yi off the regression line:

ŷi = b1 + b2 Xi2 + b3 Xi3 + b4 Xi4

[Figure: fitted line ŷi plotted against total hours of work each week, Xi2, with the observed points lying at yi = 0 and yi = 1.]
Copyright 1996 Lawrence C. Marsh
Linear Probability Model 9.25

Problems with Linear Probability Model:

1. Probability estimates are sometimes


less than zero or greater than one.

2. Heteroskedasticity is present in that


the model generates a nonconstant
error variance.
Copyright 1996 Lawrence C. Marsh
9.26
Probit Model
latent variable, zi :   zi = β1 + β2 Xi2 + . . .

Standard normal probability density function:

f(zi) = (1/√(2π)) e^(−0.5 zi²)

Standard normal cumulative probability function:

F(zi) = P[ Z ≤ zi ] = ∫ from −∞ to zi of (1/√(2π)) e^(−0.5 u²) du
Copyright 1996 Lawrence C. Marsh
9.27
Probit Model
Since zi = β1 + β2 Xi2 + . . . , we can substitute in to get:

pi = P[ Z ≤ β1 + β2Xi2 ] = F(β1 + β2Xi2)

[Figure: S-shaped probit curve of pi against total hours of work each week, Xi2, rising from near yi = 0 toward yi = 1.]
Copyright 1996 Lawrence C. Marsh
9.28
Logit Model
pi is the probability of quitting the job.

Define pi :    pi = 1 / ( 1 + e^−(β1 + β2 Xi2 + . . .) )

For β2 > 0, pi approaches 1 as Xi2 → +∞
For β2 > 0, pi approaches 0 as Xi2 → −∞

Copyright 1996 Lawrence C. Marsh
9.29
Logit Model
pi is the probability of quitting the job.

pi = 1 / ( 1 + e^−(β1 + β2 Xi2 + . . .) )

[Figure: S-shaped logit curve of pi against total hours of work each week, Xi2, rising from near yi = 0 toward yi = 1.]
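A tiny sketch of evaluating this logit probability, assuming illustrative values for β1 and β2 (they are not estimates from the text):

import numpy as np

def logit_prob(x, b1=-4.0, b2=0.1):        # b1, b2 are illustrative, not estimated
    """pi = 1 / (1 + exp(-(b1 + b2*x)))"""
    return 1.0 / (1.0 + np.exp(-(b1 + b2 * x)))

print(logit_prob(np.array([20., 40., 60.])))   # probabilities rise with hours worked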
Copyright 1996 Lawrence C. Marsh
9.30
Maximum Likelihood

Maximum likelihood estimation (MLE)


is used to estimate Probit and Logit functions.

The small sample properties of MLE


are not known, but in large samples
MLE is normally distributed, and it is
consistent and asymptotically efficient.
Copyright 1996 Lawrence C. Marsh
10.1
Chapter 10

Heteroskedasticity

Copyright © 1997 John Wiley & Sons, Inc. All rights reserved. Reproduction or translation of this work beyond
that permitted in Section 117 of the 1976 United States Copyright Act without the express written permission of the
copyright owner is unlawful. Request for further information should be addressed to the Permissions Department,
John Wiley & Sons, Inc. The purchaser may make back-up copies for his/her own use only and not for distribution
or resale. The Publisher assumes no responsibility for errors, omissions, or damages, caused by the use of these
programs or from the use of the information contained herein.
Copyright 1996 Lawrence C. Marsh
10.2
The Nature of Heteroskedasticity
Heteroskedasticity is a systematic pattern in
the errors where the variances of the errors
are not constant.
Ordinary least squares assumes that all
observations are equally reliable.
For efficiency (accurate estimation/prediction)
reweight observations to ensure equal error
variance.
Copyright 1996 Lawrence C. Marsh
10.3
Regression Model
yt = β1 + β2xt + et

zero mean: E(et) = 0


homoskedasticity: var(et) = σ 2
nonautocorrelation: cov(et, es) = 0 t≠s

heteroskedasticity: var(et) = σt 2
Copyright 1996 Lawrence C. Marsh
10.4
Homoskedastic pattern of errors
[Figure: scatter of consumption yt against income xt with a roughly constant vertical spread around the regression line — a homoskedastic error pattern.]
Copyright 1996 Lawrence C. Marsh
10.5
The Homoskedastic Case

[Figure: conditional densities f(yt) of consumption at income levels x1, x2, x3, x4 along the regression line; each density has the same spread — the homoskedastic case.]
Copyright 1996 Lawrence C. Marsh
10.6
Heteroskedastic pattern of errors
[Figure: scatter of consumption yt against income xt whose vertical spread widens as income rises — a heteroskedastic error pattern.]
Copyright 1996 Lawrence C. Marsh
10.7
The Heteroskedastic Case

[Figure: conditional densities f(yt) of consumption at increasing income levels x1, x2, x3; the densities spread out as income rises — consumption varies more among rich people than among poor people — the heteroskedastic case.]
Copyright 1996 Lawrence C. Marsh
10.8
Properties of Least Squares

1. Least squares still linear and unbiased.


2. Least squares not efficient.
3. Usual formulas give incorrect standard
errors for least squares.
4. Confidence intervals and hypothesis tests
based on usual standard errors are wrong.
Copyright 1996 Lawrence C. Marsh
10.9
yt = β1 + β2xt + et
heteroskedasticity:  var(et) = σt²

incorrect formula for the least squares variance:

var(b2) = σ² / Σ(xt − x̄)²

correct formula for the least squares variance:

var(b2) = Σ σt²(xt − x̄)² / [ Σ(xt − x̄)² ]²
Copyright 1996 Lawrence C. Marsh
10.10
Hal White’s Standard Errors

White’s estimator of the least squares variance:

est.var(b2) = Σ êt²(xt − x̄)² / [ Σ(xt − x̄)² ]²

In large samples, White’s standard error (the square root of the estimated
variance) is a correct / accurate / consistent measure.
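A compact sketch of this estimator for the simple regression case, assuming numpy arrays y and x of observations (the names are illustrative):

import numpy as np

def white_se_b2(y, x):
    """Heteroskedasticity-consistent standard error for the slope b2."""
    xd = x - x.mean()
    b2 = (xd * y).sum() / (xd**2).sum()
    b1 = y.mean() - b2 * x.mean()
    e = y - b1 - b2 * x                                # least squares residuals
    var_b2 = (e**2 * xd**2).sum() / (xd**2).sum()**2   # White's variance estimate
    return np.sqrt(var_b2)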
Copyright 1996 Lawrence C. Marsh
10.11

Two Types of Heteroskedasticity

1. Proportional Heteroskedasticity.
(continuous function(of xt, for example))

2. Partitioned Heteroskedasticity.
(discrete categories/groups)
Copyright 1996 Lawrence C. Marsh
10.12

Proportional Heteroskedasticity

yt = β1 + β2xt + et

E(et) = 0      var(et) = σt²      cov(et, es) = 0,  t ≠ s

where σt² = σ² xt : the variance is assumed to be proportional to the value of xt.
Copyright 1996 Lawrence C. Marsh
10.13
std. dev. proportional to √xt

yt = β1 + β2xt + et
variance:            var(et) = σt² = σ² xt
standard deviation:  σt = σ √xt

To correct for heteroskedasticity, divide the model by √xt :

yt/√xt = β1 (1/√xt) + β2 (xt/√xt) + et/√xt
Copyright 1996 Lawrence C. Marsh
10.14
yt/√xt = β1 (1/√xt) + β2 (xt/√xt) + et/√xt

y*t = β1 x*t1 + β2 x*t2 + e*t

var(e*t) = var(et/√xt) = (1/xt) var(et) = (1/xt) σ² xt

var(e*t) = σ²

et is heteroskedastic, but e*t is homoskedastic.
Copyright 1996 Lawrence C. Marsh
10.15
Generalized Least Squares
These steps describe weighted least squares:
1. Decide which variable is proportional to the
heteroskedasticity (xt in the previous example).
2. Divide all terms in the original model by the
square root of that variable (divide by √xt ).
3. Run least squares on the transformed model
which has new y*t, x*t1 and x*t2 variables
but no intercept.
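A short sketch of these three steps under the assumption var(et) = σ²xt, for numpy arrays y and x (illustrative names):

import numpy as np

def wls_proportional(y, x):
    """Weighted LS for y = b1 + b2*x + e with var(e) proportional to x."""
    w = np.sqrt(x)                           # square root of the variance-driving variable
    ys = y / w                               # transformed dependent variable
    X = np.column_stack([1.0 / w, x / w])    # transformed regressors, no intercept
    b, *_ = np.linalg.lstsq(X, ys, rcond=None)
    return b                                 # [b1, b2]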
Copyright 1996 Lawrence C. Marsh
10.16
Partitioned Heteroskedasticity

yt = β1 + β2xt + et
yt = bushels per acre of corn t = 1, ,100 ...

xt = gallons of water per acre (rain or other)

error variance of “field” corn: var(et) = σ1 2


t = 1, . . . ,80

error variance of “sweet” corn: var(et) = σ2 2


t = 81, . . . ,100
Copyright 1996 Lawrence C. Marsh
10.17
Reweighting Each Group’s Observations

“field” corn:   yt = β1 + β2xt + et,    var(et) = σ1²

yt/σ1 = β1 (1/σ1) + β2 (xt/σ1) + et/σ1,        t = 1, . . . , 80

“sweet” corn:   yt = β1 + β2xt + et,    var(et) = σ2²

yt/σ2 = β1 (1/σ2) + β2 (xt/σ2) + et/σ2,        t = 81, . . . , 100
Copyright 1996 Lawrence C. Marsh
10.18

Apply Generalized Least Squares

Run least squares separately on the data for each group.

σ̂1² provides an estimator of σ1² using the 80 observations on “field” corn.

σ̂2² provides an estimator of σ2² using the 20 observations on “sweet” corn.
Copyright 1996 Lawrence C. Marsh
10.19

Detecting Heteroskedasticity
Determine existence and nature of heteroskedasticity:

1. Residual Plots provide information on the


exact nature of heteroskedasticity (partitioned
or proportional) to aid in correcting for it.

2. Goldfeld-Quandt Test checks for presence


of heteroskedasticity.
Copyright 1996 Lawrence C. Marsh
10.20
Residual Plots
Plot residuals against one variable at a time
after sorting the data by that variable to try
to find a heteroskedastic pattern in the data.
[Figure: residuals êt plotted against xt after sorting by xt; the scatter fans out as xt increases, revealing a heteroskedastic pattern.]
Copyright 1996 Lawrence C. Marsh
10.21
Goldfeld-Quandt Test

The Goldfeld-Quandt test can be used to detect


heteroskedasticity in either the proportional case
or for comparing two groups in the discrete case.

For proportional heteroskedasticity, it is first necessary


to determine which variable, such as xt, is proportional
to the error variance. Then sort the data from the
largest to smallest values of that variable.
Copyright 1996 Lawrence C. Marsh
In the proportional case, drop the middle 10.22
r observations where r ≈ T/6, then run
separate least squares regressions on the first
T1 observations and the last T2 observations.

Ho: σ1² = σ2²
H1: σ1² > σ2²

Goldfeld-Quandt test statistic (use the F table):

GQ = σ̂1² / σ̂2²    ~    F[T1−K1, T2−K2]

Small values of GQ support Ho, while large values support H1.
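A sketch of the statistic for two fitted sub-samples, assuming numpy arrays of residuals and the parameter count for each group are already available (names illustrative):

def goldfeld_quandt(e1, e2, k1, k2):
    """GQ = s1^2 / s2^2 with (T1-k1, T2-k2) degrees of freedom; e1, e2 are numpy residual arrays."""
    s1 = (e1**2).sum() / (len(e1) - k1)     # error-variance estimate, group 1
    s2 = (e2**2).sum() / (len(e2) - k2)     # error-variance estimate, group 2
    return s1 / s2, len(e1) - k1, len(e2) - k2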


Copyright 1996 Lawrence C. Marsh
10.23
More General Model

Structure of heteroskedasticity could be more complicated:

σt 2 = σ 2 exp{α1 zt1 + α2 zt2}

zt1 and zt2 are any observable variables upon


which we believe the variance could depend.

Note: The function exp{.} ensures that σt2 is positive.


Copyright 1996 Lawrence C. Marsh
10.24
More General Model
σt² = σ² exp{α1 zt1 + α2 zt2}
ln(σt²) = ln(σ²) + α1 zt1 + α2 zt2
ln(σt²) = α0 + α1 zt1 + α2 zt2,      where α0 = ln(σ²)

Using the least squares residuals, êt , estimate:

ln(êt²) = α0 + α1 zt1 + α2 zt2 + νt

Ho: α1 = 0, α2 = 0    vs.    H1: α1 ≠ 0 and/or α2 ≠ 0      (the usual F test)
Copyright 1996 Lawrence C. Marsh
11.1
Chapter 11

Autocorrelation

Copyright © 1997 John Wiley & Sons, Inc. All rights reserved. Reproduction or translation of this work beyond
that permitted in Section 117 of the 1976 United States Copyright Act without the express written permission of the
copyright owner is unlawful. Request for further information should be addressed to the Permissions Department,
John Wiley & Sons, Inc. The purchaser may make back-up copies for his/her own use only and not for distribution
or resale. The Publisher assumes no responsibility for errors, omissions, or damages, caused by the use of these
programs or from the use of the information contained herein.
Copyright 1996 Lawrence C. Marsh
11.2

The Nature of Autocorrelation


For efficiency (accurate estimation/prediction)
all systematic information needs to be incor-
porated into the regression model.

Autocorrelation is a systematic pattern in the


errors that can be either attracting (positive)
or repelling (negative) autocorrelation.
Copyright 1996 Lawrence C. Marsh
11.3
Positive autocorrelation: the residuals et cross the zero line too seldom
(attracting) — runs of same-signed errors.

No autocorrelation: the residuals cross the zero line randomly.

Negative autocorrelation: the residuals cross the zero line too often
(repelling) — the errors tend to alternate in sign.

[Figure: three plots of residuals et against time t illustrating positive, zero, and negative autocorrelation.]
Copyright 1996 Lawrence C. Marsh
11.4
Regression Model
yt = β1 + β2xt + et

zero mean: E(et) = 0


homoskedasticity: var(et) = σ 2
nonautocorrelation: cov(et, es) = 0 t≠s

autocorrelation: cov(et, es) ≠ 0 t≠s


Copyright 1996 Lawrence C. Marsh
11.5
Order of Autocorrelation
yt = β1 + β2xt + et

1st Order: et = ρ et−1 + νt

2nd Order: et = ρ1 et−1 + ρ2 et−2 + νt

3rd Order: et = ρ1 et−1 + ρ2 et−2 + ρ3 et−3 + νt


We will assume First Order Autocorrelation:
AR(1) : et = ρ et−1 + νt
Copyright 1996 Lawrence C. Marsh
11.6
First Order Autocorrelation
yt = β1 + β2xt + et

et = ρ et−1 + νt      where −1 < ρ < 1

E(νt) = 0      var(νt) = σν²      cov(νt, νs) = 0,  t ≠ s

These assumptions about νt imply the following about et :

E(et) = 0                          cov(et, et−k) = σe² ρ^k    for k > 0
var(et) = σe² = σν² / (1 − ρ²)     corr(et, et−k) = ρ^k       for k > 0
Copyright 1996 Lawrence C. Marsh
11.7
Autocorrelation creates some
Problems for Least Squares:

1. The least squares estimator is still linear


and unbiased but it is not efficient.

2. The formulas normally used to compute


the least squares standard errors are no
longer correct and confidence intervals and
hypothesis tests using them will be wrong.
Copyright 1996 Lawrence C. Marsh
11.8
Generalized Least Squares
AR(1) : et = ρ et−1 + νt substitute
in for et
yt = β1 + β2xt + et

yt = β1 + β2xt + ρ et−1 + νt

Now we need to get rid of et−1

(continued)
Copyright 1996 Lawrence C. Marsh
11.9
yt = β1 + β2xt + ρ et−1 + νt

yt = β1 + β2xt + et

et = yt − β1 − β2xt lag the


errors
et−1 = yt−1 − β1 − β2xt−1 once

yt = β1 + β2xt + ρ(yt−1 − β1 − β2xt−1) + νt

(continued)
Copyright 1996 Lawrence C. Marsh
11.10
yt = β1 + β2xt + ρ(yt−1 − β1 − β2xt−1) + νt

yt = β1 + β2xt + ρyt−1 − ρβ1 − ρβ2xt−1 + νt

yt − ρyt−1 = β1(1−ρ) + β2(xt − ρxt−1) + νt

y*t = β*1 + β2x*t2 + νt

where   y*t = yt − ρyt−1,      x*t2 = (xt − ρxt−1),      β*1 = β1(1−ρ)
Copyright 1996 Lawrence C. Marsh
11.11
y*t = yt − ρyt−1      x*t2 = xt − ρxt−1      β*1 = β1(1−ρ)

y*t = β*1 + β2x*t2 + νt

Problems estimating this model with least squares:


1. One observation is used up in creating the
transformed (lagged) variables leaving only
(T−1) observations for estimating the model.

2. The value of ρ is not known. We must find


some way to estimate it.
Copyright 1996 Lawrence C. Marsh
11.12
Recovering the 1st Observation
Dropping the 1st observation and applying least squares
is not the best linear unbiased estimation method.

Efficiency is lost because the variance


of the error associated with the 1st observation
is not equal to that of the other errors.

This is a special case of the heteroskedasticity


problem except that here all errors are assumed
to have equal variance except the 1st error.
Copyright 1996 Lawrence C. Marsh
11.13
Recovering the 1st Observation
The 1st observation should fit the original model as:

y1 = β1 + β2x1 + e1
with error variance: var(e1) = σe2 = σν2 /(1-ρ2).

We could include this as the 1st observation for our


estimation procedure but we must first transform it so
that it has the same error variance as the other observations.

Note: The other observations all have error variance σν2.


Copyright 1996 Lawrence C. Marsh
11.14
y1 = β1 + β2x1 + e1
with error variance:   var(e1) = σe² = σν² /(1−ρ²).
The other observations all have error variance σν².

Given any constant c :   var(c e1) = c² var(e1).

If c = √(1−ρ²), then  var(√(1−ρ²) e1) = (1−ρ²) var(e1)
                                       = (1−ρ²) σe²
                                       = (1−ρ²) σν² /(1−ρ²)
                                       = σν²

The transformation ν1 = √(1−ρ²) e1 has variance σν².


Copyright 1996 Lawrence C. Marsh
11.15
y1 = β1 + β2x1 + e1

Multiply through by √(1−ρ²) to get:

√(1−ρ²) y1 = √(1−ρ²) β1 + √(1−ρ²) β2x1 + √(1−ρ²) e1

The transformed error ν1 = √(1−ρ²) e1 has variance σν².

This transformed first observation may now be added to the other (T−1)
observations to obtain the fully restored set of T observations.
Copyright 1996 Lawrence C. Marsh
11.16
Estimating Unknown ρ Value
If we had values for the et’s, we could estimate:
et = ρ et−1 + νt
First, use least squares to estimate the model:
yt = β1 + β2xt + et

The residuals from this estimation are:


êt = yt − b1 − b2xt
Copyright 1996 Lawrence C. Marsh
11.17
êt = yt − b1 − b2xt

Next, estimate the following by least squares:

êt = ρ êt−1 + ν̂t

The least squares solution is:

ρ̂ = Σt=2..T êt êt−1  /  Σt=2..T êt−1²
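A one-function sketch of this estimator, assuming a numpy array of least squares residuals e_hat (illustrative name):

import numpy as np

def estimate_rho(e_hat):
    """rho_hat = sum_{t=2..T} e_t*e_{t-1} / sum_{t=2..T} e_{t-1}^2"""
    return (e_hat[1:] * e_hat[:-1]).sum() / (e_hat[:-1] ** 2).sum()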
Copyright 1996 Lawrence C. Marsh
11.18
Durbin-Watson Test
Ho: ρ = 0 vs. H1: ρ ≠ 0 , ρ > 0, or ρ < 0

The Durbin-Watson test statistic, d, is:

d = Σt=2..T (êt − êt−1)²  /  Σt=1..T êt²
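A corresponding sketch of the statistic for the same residual array (illustrative name):

import numpy as np

def durbin_watson(e_hat):
    """d = sum_{t=2..T} (e_t - e_{t-1})^2 / sum_{t=1..T} e_t^2  (d near 2 suggests rho near 0)."""
    return ((e_hat[1:] - e_hat[:-1]) ** 2).sum() / (e_hat ** 2).sum()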
Copyright 1996 Lawrence C. Marsh
11.19
Testing for Autocorrelation
The test statistic, d, is approximately related to ρ̂ as:    d ≈ 2(1 − ρ̂)

When ρ̂ = 0 , the Durbin-Watson statistic is d ≈ 2.

When ρ̂ = 1 , the Durbin-Watson statistic is d ≈ 0.
Tables for critical values for d are not always
readily available so it is easier to use the p-value
that most computer programs provide for d.

Reject Ho if p-value < α, the significance level.


Copyright 1996 Lawrence C. Marsh
11.20
Prediction with AR(1) Errors
When errors are autocorrelated, the previous period’s
error may help us predict next period’s error.
The best predictor, ŷT+1 , for next period is:

ŷT+1 = β̂1 + β̂2 xT+1 + ρ̂ ẽT

where β̂1 and β̂2 are generalized least squares estimates and ẽT is given by:

ẽT = yT − β̂1 − β̂2 xT
Copyright 1996 Lawrence C. Marsh
11.21
For h periods ahead, the best predictor is:

ŷT+h = β̂1 + β̂2 xT+h + ρ̂^h ẽT

Assuming |ρ̂| < 1, the influence of ρ̂^h ẽT diminishes the further we go into
the future (the larger h becomes).
Copyright 1996 Lawrence C. Marsh
12.1
Chapter 12

Pooling
Time-Series and
Cross-Sectional Data
Copyright © 1997 John Wiley & Sons, Inc. All rights reserved. Reproduction or translation of this work beyond
that permitted in Section 117 of the 1976 United States Copyright Act without the express written permission of the
copyright owner is unlawful. Request for further information should be addressed to the Permissions Department,
John Wiley & Sons, Inc. The purchaser may make back-up copies for his/her own use only and not for distribution
or resale. The Publisher assumes no responsibility for errors, omissions, or damages, caused by the use of these
programs or from the use of the information contained herein.
Copyright 1996 Lawrence C. Marsh
12.2

Pooling Time and Cross Sections

yit = β1it + β2itx2it + β3itx3it + eit

for the ith firm in the tth time period

If left unrestricted,
this model requires different equations
for each firm in each time period.
Copyright 1996 Lawrence C. Marsh
12.3

Seemingly Unrelated Regressions


SUR models impose the restrictions:
β1it = β1i β2it = β2i β3it = β3i

yit = β1i + β2ix2it + β3ix3it + eit

Each firm gets its own coefficients: β1i , β2i and β3i
but those coefficients are constant over time.
Copyright 1996 Lawrence C. Marsh
12.4
Two-Equation SUR Model
The investment expenditures (INV) of General Electric (G)
and Westinghouse(W) may be related to their stock market
value (V) and actual capital stock (K) as follows:

INVGt = β1G + β2GVGt + β3GKGt + eGt

INVWt = β1W + β2WVWt + β3WKWt + eWt

i = G, W t = 1, . . . , 20
Copyright 1996 Lawrence C. Marsh
12.5
Estimating Separate Equations
We make the usual error term assumptions:
E(eGt) = 0 E(eWt) = 0
var(eGt) = σG2 var(eWt) = σ2W
cov(eGt, eGs) = 0 cov(eWt, eWs) = 0

For now make the assumption of no correlation


between the error terms across equations:
cov(eGt, eWt) = 0 cov(eGt, eWs) = 0
Copyright 1996 Lawrence C. Marsh
homoskedasticity assumption:                                    12.6

σG² = σW²

The dummy variable model assumes that σG² = σW² :

INVt = β1G + δ1Dt + β2GVt + δ2DtVt + β3GKt + δ3DtKt + et

For Westinghouse observations Dt = 1; otherwise Dt = 0.

β1W = β1G + δ1        β2W = β2G + δ2        β3W = β3G + δ3
Copyright 1996 Lawrence C. Marsh
12.7

Problem with OLS on Each Equation

The first assumption of the Gauss-Markov


Theorem concerns the model specification.

If the model is not fully and correctly specified


the Gauss-Markov properties might not hold.

Any correlation of error terms across equations


must be part of model specification.
Copyright 1996 Lawrence C. Marsh
12.8
Correlated Error Terms

Any correlation between the


dependent variables of two or
more equations that is not due
to their explanatory variables
is by default due to correlated
error terms.
Copyright 1996 Lawrence C. Marsh
Which of the following models would                             12.9
be likely to produce positively correlated
errors and which would produce
negatively correlated errors?

1. Sales of Pepsi vs. sales of Coke.


(uncontrolled factor: outdoor temperature)
2. Investments in bonds vs. investments in stocks.
(uncontrolled factor: computer/appliance sales)
3. Movie admissions vs. Golf Course admissions.
(uncontrolled factor: weather conditions)
4. Sales of butter vs. sales of bread.
(uncontrolled factor: bagels and cream cheese)
Copyright 1996 Lawrence C. Marsh
12.10

Joint Estimation of the Equations

INVGt = β1G + β2GVGt + β3GKGt + eGt

INVWt = β1W + β2WVWt + β3WKWt + eWt

cov(eGt, eWt) = σGW


Copyright 1996 Lawrence C. Marsh
12.11

Seemingly Unrelated Regressions

When the error terms of two or more equations


are correlated, efficient estimation requires the use
of a Seemingly Unrelated Regressions (SUR)
type estimator to take the correlation into account.

Be sure to use the Seemingly Unrelated Regressions (SUR)


procedure in your regression software program to estimate
any equations that you believe might have correlated errors.
Copyright 1996 Lawrence C. Marsh
12.12
Separate vs. Joint Estimation

SUR will give exactly the same results as estimating


each equation separately with OLS if either or both
of the following two conditions are true:

1. Every equation has exactly the same set of


explanatory variables with exactly the same
values.

2. There is no correlation between the error


terms of any of the equations.
Copyright 1996 Lawrence C. Marsh
12.13
Test for Correlation

Test the null hypothesis of zero correlation:    Ho: σGW = 0

r²GW = σ̂²GW / ( σ̂G² σ̂W² )

λ = T r²GW            λ  ~ (asy.)  χ²(1)
Copyright 1996 Lawrence C. Marsh
12.14
Start with the residuals êGt and êWt from each equation estimated separately.

σ̂GW = (1/T) Σ êGt êWt

σ̂G² = (1/T) Σ êGt²

σ̂W² = (1/T) Σ êWt²

r²GW = σ̂²GW / ( σ̂G² σ̂W² )

λ = T r²GW            λ  ~ (asy.)  χ²(1)
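A brief sketch of this test, assuming numpy arrays eG and eW of residuals from the two separately estimated equations (illustrative names):

import numpy as np

def sur_correlation_test(eG, eW):
    """lambda = T * r^2_GW, asymptotically chi-square with 1 degree of freedom."""
    T = len(eG)
    s_gw = (eG * eW).sum() / T
    s_g2 = (eG ** 2).sum() / T
    s_w2 = (eW ** 2).sum() / T
    r2 = s_gw ** 2 / (s_g2 * s_w2)
    return T * r2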
Copyright 1996 Lawrence C. Marsh
12.15
Fixed Effects Model
yit = β1it + β2itx2it + β3itx3it + eit

Fixed effects models impose the restrictions:


β1it = β1i β2it = β2 β3it = β3

For each ith cross section in the tth time period:


yit = β1i + β2x2it + β3x3it + eit

Each ith cross-section has its own constant β1i intercept.


Copyright 1996 Lawrence C. Marsh
12.16
The Fixed Effects Model is conveniently
represented using dummy variables:
D1i=1 if North D2i=1 if East D3i=1 if South D4i=1 if West
D1i=0 if not N D2i=0 if not E D3i=0 if not S D4i=0 if not W

yit = β11D1i + β12D2i + β13D3i + β14D4 i+ β2x2it + β3x3it + eit

yit = millions of bushels of corn produced


x2it = price of corn in dollars per bushel
x3it = price of soybeans in dollars per bushel

Each cross-sectional unit gets its own intercept,


but each cross-sectional intercept is constant over time.
Copyright 1996 Lawrence C. Marsh
12.17
Test for Equality of Fixed Effects
Ho : β11 = β12 = β13 = β14
H1 : Ho not true

The Ho joint null hypothesis may be tested with the F-statistic:

F = [ (SSER − SSEU) / J ]  /  [ SSEU / (NT − K) ]     ~     F(J, NT−K)

SSER is the restricted error sum of squares (one intercept)
SSEU is the unrestricted error sum of squares (four intercepts)
N is the number of cross-sectional units (N = 4)
K is the number of parameters in the model (K = 6)
J is the number of restrictions being tested (J = N−1 = 3)
T is the number of time periods
Copyright 1996 Lawrence C. Marsh
12.18
Random Effects Model

yit = β1i + β2x2it + β3x3it + eit

β1i = β1 + µi

β1 is the population mean intercept.

µi is an unobservable random error that


accounts for the cross-sectional differences.
Copyright 1996 Lawrence C. Marsh
12.19
Random Intercept Term

β1i = β1 + µi where i = 1, ... ,N

µi are independent of one another and of eit

E(µi) = 0 var(µi) = σµ
2

Consequently, E(β1i) = β1 var(β1i) = σµ2


Copyright 1996 Lawrence C. Marsh
12.20
Random Effects Model

yit = β1i + β2x2it + β3x3it + eit

yit = (β1+µi) + β2x2it + β3x3it + eit

yit = β1 + β2x2it + β3x3it + (µi +eit)

yit = β1 + β2x2it + β3x3it + νit


Copyright 1996 Lawrence C. Marsh
12.21
yit = β1 + β2x2it + β3x3it + νit
νit = (µi +eit)
νit has zero mean: E(νit) = 0
νit is homoskedastic: var(νit) = σµ2 + σe2
The errors from the same firm in different time periods
are correlated:

cov(νit, νis) = σµ²,     t ≠ s

The errors from different firms are always uncorrelated:


cov(νit,νjs) = 0 i≠j
Copyright 1996 Lawrence C. Marsh
13.1
Chapter 13

Simultaneous
Equations
Models
Copyright © 1997 John Wiley & Sons, Inc. All rights reserved. Reproduction or translation of this work beyond
that permitted in Section 117 of the 1976 United States Copyright Act without the express written permission of the
copyright owner is unlawful. Request for further information should be addressed to the Permissions Department,
John Wiley & Sons, Inc. The purchaser may make back-up copies for his/her own use only and not for distribution
or resale. The Publisher assumes no responsibility for errors, omissions, or damages, caused by the use of these
programs or from the use of the information contained herein.
Copyright 1996 Lawrence C. Marsh
13.2
Keynesian Macro Model

Assumptions of Simple Keynesian Model

1. Consumption, c, is function of income, y.

2. Total expenditures = consumption + investment.

3. Investment assumed independent of income.


Copyright 1996 Lawrence C. Marsh
13.3
The Structural Equations
consumption is a function of income:

c = β1 + β2 y

income is either consumed or invested:

y=c+i
Copyright 1996 Lawrence C. Marsh
13.4
The Statistical Model
The consumption equation:

ct = β1 + β2 yt + et

The income identity:

yt = ct + it
The Simultaneous Nature
Copyright 1996 Lawrence C. Marsh
13.5
of Simultaneous Equations
ct = β1 + β2 yt + et
yt = ct + it

[Diagram: the error et enters the consumption equation, ct feeds into yt through the income identity, and yt feeds back into the consumption equation. Since yt contains et, they are correlated.]
Copyright 1996 Lawrence C. Marsh
13.6
The Failure of Least Squares

The least squares estimators of
parameters in a structural simul-
taneous equation are biased and
inconsistent because of the cor-
relation between the random error
and the endogenous variables on
the right-hand side of the equation.
Copyright 1996 Lawrence C. Marsh
13.7
Single vs. Simultaneous Equations

Single equation: the error et affects only its own dependent variable ct.
Simultaneous equations: et affects ct, and through the identity yt = ct + it it also affects yt, which appears on the right-hand side of the consumption equation.

[Diagram: arrows contrasting the one-way flow (yt, et → ct) in a single equation with the feedback loop among ct, yt, it and et in the simultaneous system.]
Copyright 1996 Lawrence C. Marsh
13.8
Deriving the Reduced Form
ct = β1 + β2 yt + et
yt = ct + it

ct = β1 + β2(ct + it) + et

(1 − β2)ct = β1 + β2 it + et
Copyright 1996 Lawrence C. Marsh
13.9
Deriving the Reduced Form
(1 − β2)ct = β1 + β2 it + et

ct = β1/(1−β2) + [β2/(1−β2)] it + [1/(1−β2)] et

ct = π11 + π21 it + νt
The Reduced Form Equation
Copyright 1996 Lawrence C. Marsh
13.10
Reduced Form Equation

ct = π11 + π21 it + νt
π11 = β1 / (1−β2)        π21 = β2 / (1−β2)

and    νt = et / (1−β2)
Copyright 1996 Lawrence C. Marsh
13.11
yt = ct + it
where ct = π11 + π21 it + νt

yt = π11 + (1+π21) it + νt
It is sometimes useful to give this equation
its own reduced form parameters as follows:

yt = π12 + π22 it + νt
ct = π11 + π21 it + νt
Copyright 1996 Lawrence C. Marsh
13.12

yt = π12 + π22 it + νt

Since ct and yt are related through the identity yt = ct + it , the error
term, νt , of these two equations is the same, and it is easy to show that:

π12 = π11 = β1 / (1−β2)

π22 = (1 + π21) = 1 / (1−β2)
Copyright 1996 Lawrence C. Marsh
13.13
Identification
The structural parameters are β1 and β2.

The reduced form parameters are π11 and π21.


Once the reduced form parameters are estimated,
the identification problem is to determine if the
original structural parameters can be expressed
uniquely in terms of the reduced form parameters.

β̂1 = π̂11 / (1 + π̂21)            β̂2 = π̂21 / (1 + π̂21)
Copyright 1996 Lawrence C. Marsh
13.14
Identification
An equation is under-identified if its structural
(behavioral) parameters cannot be expressed
in terms of the reduced form parameters.

An equation is exactly identified if its structural
(behavioral) parameters can be uniquely expres-
sed in terms of the reduced form parameters.

An equation is over-identified if there is more
than one solution for expressing its structural
(behavioral) parameters in terms of the reduced
form parameters.
Copyright 1996 Lawrence C. Marsh
13.15
The Identification Problem
A system of M equations
containing M endogenous
variables must exclude at least
M−1 variables from a given
equation in order for the
parameters of that equation to
be identified and to be able to
be consistently estimated.
Copyright 1996 Lawrence C. Marsh
13.16
Two Stage Least Squares

yt1 = β1 + β2 yt2 + β3 xt1 + et1

yt2 = α1 + α2 yt1 + α3 xt2 + et2

Problem: right-hand endogenous variables


yt2 and yt1 are correlated with the error terms.
Copyright 1996 Lawrence C. Marsh
13.17
Problem: right-hand endogenous variables yt2 and yt1 are correlated with the error terms.

Solution: First, derive the reduced form equations.

yt1 = β1 + β2 yt2 + β3 xt1 + et1

yt2 = α1 + α2 yt1 + α3 xt2 + et2

Solve two equations for two unknowns, yt1, yt2 :

yt1 = π11 + π21 xt1 + π31 xt2 + νt1

yt2 = π12 + π22 xt1 + π32 xt2 + νt2


Copyright 1996 Lawrence C. Marsh
13.18
2SLS: Stage I

yt1 = π11 + π21 xt1 + π31 xt2 + νt1


yt2 = π12 + π22 xt1 + π32 xt2 + νt2

Use least squares to get fitted values:

ŷt1 = π̂11 + π̂21 xt1 + π̂31 xt2           yt1 = ŷt1 + ν̂t1

ŷt2 = π̂12 + π̂22 xt1 + π̂32 xt2           yt2 = ŷt2 + ν̂t2
Copyright 1996 Lawrence C. Marsh
2SLS: Stage II 13.19

yt1 = ŷt1 + ν̂t1    and    yt2 = ŷt2 + ν̂t2

Substitute these in for yt1 , yt2 :

yt1 = β1 + β2 (ŷt2 + ν̂t2) + β3 xt1 + et1

yt2 = α1 + α2 (ŷt1 + ν̂t1) + α3 xt2 + et2
Copyright 1996 Lawrence C. Marsh
2SLS: Stage II (continued) 13.20

yt1 = β1 + β2 ŷt2 + β3 xt1 + ut1

yt2 = α1 + α2 ŷt1 + α3 xt2 + ut2

where  ut1 = β2 ν̂t2 + et1    and    ut2 = α2 ν̂t1 + et2

Run least squares on each of the above equations to get the 2SLS estimates:

β̃1 , β̃2 , β̃3 , α̃1 , α̃2 and α̃3
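A compact sketch of the two stages for the first equation, assuming numpy arrays y1, y2, x1, x2 of observations (illustrative names; a real application would also correct the second-stage standard errors):

import numpy as np

def two_sls_first_equation(y1, y2, x1, x2):
    """2SLS for y1 = b1 + b2*y2 + b3*x1 + e1 with instruments (1, x1, x2)."""
    Z = np.column_stack([np.ones_like(x1), x1, x2])
    # Stage I: regress the right-hand endogenous variable on all exogenous variables
    pi, *_ = np.linalg.lstsq(Z, y2, rcond=None)
    y2_hat = Z @ pi
    # Stage II: replace y2 with its fitted values and run least squares
    X = np.column_stack([np.ones_like(x1), y2_hat, x1])
    b, *_ = np.linalg.lstsq(X, y1, rcond=None)
    return b      # [b1, b2, b3]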
Copyright 1996 Lawrence C. Marsh
14.1
Chapter 14

Nonlinear
Least
Squares
Copyright © 1997 John Wiley & Sons, Inc. All rights reserved. Reproduction or translation of this work beyond
that permitted in Section 117 of the 1976 United States Copyright Act without the express written permission of the
copyright owner is unlawful. Request for further information should be addressed to the Permissions Department,
John Wiley & Sons, Inc. The purchaser may make back-up copies for his/her own use only and not for distribution
or resale. The Publisher assumes no responsibility for errors, omissions, or damages, caused by the use of these
programs or from the use of the information contained herein.
Copyright 1996 Lawrence C. Marsh
14.2
Review of Least Squares Principle
(minimize the sum of squared errors)
(A.) “Regression” model with only an intercept term:

yt = α + et        et = yt − α        SSE = Σ (yt − α)²

∂SSE/∂α = −2 Σ (yt − α̂) = 0
Σ yt − T α̂ = 0

Yields an exact analytical solution:    α̂ = (1/T) Σ yt = ȳ
Copyright 1996 Lawrence C. Marsh
14.3
Review of Least Squares
(B.) Regression model without an intercept term:

yt = βxt + et        et = yt − βxt        SSE = Σ (yt − βxt)²

∂SSE/∂β = −2 Σ xt (yt − β̂xt) = 0
Σ xtyt − β̂ Σ xt² = 0

This yields an exact analytical solution:    β̂ = Σ xtyt / Σ xt²
Copyright 1996 Lawrence C. Marsh
Review of Least Squares 14.4
(C.) Regression model with both an intercept and a slope:

yt = α + βxt + et        SSE = Σ (yt − α − βxt)²

∂SSE/∂α = −2 Σ (yt − α̂ − β̂xt) = 0
∂SSE/∂β = −2 Σ xt (yt − α̂ − β̂xt) = 0

This yields an exact analytical solution:

α̂ = ȳ − β̂x̄          β̂ = Σ (xt − x̄)(yt − ȳ) / Σ (xt − x̄)²
Copyright 1996 Lawrence C. Marsh
Nonlinear Least Squares 14.5
(D.) Nonlinear Regression model:

yt = xt^β + et        SSE = Σ (yt − xt^β)²

PROBLEM: an exact analytical solution to this does not exist.

∂SSE/∂β = −2 Σ xt^β ln(xt)(yt − xt^β) = 0

Σ [xt^β ln(xt) yt] − Σ [xt^(2β) ln(xt)] = 0

Must use a numerical search algorithm to find the value of β̂ that satisfies this.
Copyright 1996 Lawrence C. Marsh
14.6
Find Minimum of Nonlinear SSE

SSE = Σ (yt − xt^β)²

[Figure: SSE plotted against β as a U-shaped curve; the numerical search looks for the value β̂ at its minimum.]
Copyright 1996 Lawrence C. Marsh
14.7
Conclusion
The least squares principle
is still appropriate when the
model is nonlinear, but it is
harder to find the solution.
Copyright 1996 Lawrence C. Marsh
14.8
Optional Appendix
Nonlinear least squares
optimization methods:

The Gauss-Newton Method


Copyright 1996 Lawrence C. Marsh
14.9

The Gauss-Newton Algorithm


1. Apply the Taylor Series Expansion to the
nonlinear model around some initial b(o).
2. Run Ordinary Least Squares (OLS) on the
linear part of the Taylor Series to get b(m).
3. Perform a Taylor Series around the new b(m)
to get b(m+1) .
4. Relabel b(m+1) as b(m) and rerun steps 2.-4.
5. Stop when (b(m+1) − b(m) ) becomes very small.
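A minimal sketch of these steps for the single-parameter model yt = xt^β + et discussed above (the starting value and tolerances are illustrative; x is assumed positive so ln(x) is defined):

import numpy as np

def gauss_newton_power(y, x, b0=1.0, tol=1e-8, max_iter=100):
    """Gauss-Newton for y = x**b + e: iterate b until the update is very small."""
    b = b0
    for _ in range(max_iter):
        f = x ** b                        # f(x, b)
        grad = f * np.log(x)              # df/db, the linearized "design" column
        resid = y - f
        step = (grad * resid).sum() / (grad ** 2).sum()   # OLS on the linearized model
        b += step
        if abs(step) < tol:
            break
    return b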
Copyright 1996 Lawrence C. Marsh
14.10
The Gauss-Newton Method

yt = f(Xt,b) + εt for t = 1, . . . , n.

Do a Taylor Series Expansion around the vector b = b(o) as follows:

f(Xt,b) = f(Xt,b(ο)) + f’(Xt,b(ο))(b - b(ο))


+ (b - b(ο))Tf’’(Xt,b(ο))(b - b(ο)) + Rt

yt = f(Xt,b(ο)) + f’(Xt,b(ο))(b - b(ο)) + εt∗

where εt∗ ≡ (b - b(o))Tf’’(Xt,b(ο))(b - b(ο)) + Rt + εt


Copyright 1996 Lawrence C. Marsh
yt = f(Xt,b(ο)) + f’(Xt,b(ο))(b - b(ο)) + εt∗
14.11

yt - f(Xt,b(ο)) = f’(Xt,b(ο))b - f’(Xt,b(ο)) b(ο) + εt∗

yt - f(Xt,b(ο)) + f’(Xt,b(ο)) b(ο) = f’(Xt,b(ο))b + εt∗

yt∗(ο) = f’(Xt,b(ο))b + εt∗ This is linear in b .

where yt∗(ο) ≡ yt - f(Xt,b(ο)) + f’(Xt,b(ο)) b(ο)

Gauss-Newton just runs OLS on this


transformed truncated Taylor series.
Copyright 1996 Lawrence C. Marsh
Gauss-Newton just runs OLS on this 14.12
transformed truncated Taylor series.

yt∗(ο) = f’(Xt,b(ο))b + εt∗ or y∗(ο) = f’(X,b(ο))b + ∈∗

for t = 1, . . . , n in matrix terms

^b = [ f’(X,b(ο))T f’(X,b(ο))]-1 f’(X,b(ο))T y∗(ο)

This is analogous to linear OLS where


y = Xb + ∈ led to the solution: ^b = (XTX)−1XTy
except that X is replaced with the matrix of first
partial derivatives: f’(Xt,b(ο)) and y is replaced by y∗(ο)

(i.e. “y” = y*(ο) and “X” = f’(X,b(ο)) )


Copyright 1996 Lawrence C. Marsh
14.13

Recall that: y*(o) ≡ y − f(X,b(o)) + f’(X,b(ο)) b(ο)

Now define: y∗∗(ο) ≡ y − f(X,b(o))

Therefore: y∗(ο) = y∗∗(ο) + f’(X,b(ο)) b(ο)

Now substitute in for y∗ in Gauss-Newton solution:

^b = [ f’(X,b )T f’(X,b )]-1 f’(X,b )T y∗


(ο) (ο) (ο) (ο)
to get:

^b = b(o) + [ f’(X,b(ο))T f’(X,b(ο))]-1 f’(X,b(ο))T y∗∗(ο)


Copyright 1996 Lawrence C. Marsh
14.14
^b = b(o) + [ f’(X,b(ο))T f’(X,b(ο))]-1 f’(X,b(ο))T y∗∗(ο)

^
Now call this b value b(1) as follows:

b(1) = b(ο) + [ f’(X,b(ο))T f’(X,b(ο))]-1 f’(X,b(ο))T y∗∗(ο)

More generally, in going from iteration m to
iteration (m+1) we obtain the general expression:

b(m+1) = b(m) + [ f’(X,b(m))T f’(X,b(m))]-1 f’(X,b(m))T y∗∗(m)


Copyright 1996 Lawrence C. Marsh
14.15
Thus, the Gauss-Newton (nonlinear OLS) solution
can be expressed in two alternative, but equivalent,
forms:

1. replacement form:

b(m+1) = [ f’(X,b(m))T f’(X,b(m))]-1 f’(X,b(m))T y*(m)

2. updating form:

b(m+1) = b(m) + [ f’(X,b(m))T f’(X,b(m))]-1 f’(X,b(m))T y∗∗(m)


Copyright 1996 Lawrence C. Marsh
14.16
For example, consider Durbin’s Method of estimating
the autocorrelation coefficient under a first-order
autoregression regime:

y t = b1 + b2 Xt 2 + . . . + bK Xt K + εt for t = 1, . . . , n.

εt = ρ ε t - 1 + ut where u t satisfies the conditions

E(ut) = 0 ,  E(ut²) = σu² ,  E(ut us) = 0 for s ≠ t.
Therefore, ut is nonautocorrelated and homoskedastic.

Durbin’s Method is to set aside a copy of the equation,


lag it once, multiply by ρ and subtract the new equation
from the original equation, then move the ρyt-1 term to
the right side and estimate ρ along with the bs by OLS.
Copyright 1996 Lawrence C. Marsh
14.17
Durbin’s Method is to set aside a copy of the equation,
lag it once, multiply by ρ and subtract the new equation
from the original equation, then move the ρyt-1 term to
the right side and estimate ρ along with the b’s by OLS.

y t = b1 + b2 X t 2 + b3 X t 3 + εt for t = 1, . . . , n.

Lag once and multiply by ρ: where εt = ρ εt - 1 + ut

ρ y t-1 = ρ b1 + ρ b2 Xt -1, 2 + ρ b3 Xt -1, 3 + ρ εt -1


Subtract from the original and move ρ y t-1 to right side:

yt = b1(1-ρ) + b2(Xt 2 - ρXt-1, 2) + b3(Xt 3 − ρXt-1, 3)+ ρy t-1+ ut


Copyright 1996 Lawrence C. Marsh
14.18
The structural (restricted,behavorial) equation is:

yt = b1(1-ρ) + b2(Xt 2 - ρXt-1, 2) + b3(Xt 3 - ρXt-1, 3) + ρy t-1+ ut

Now Durbin separates out the terms as follows:

yt = b1(1-ρ) + b2Xt 2 - b2ρXt-1 2 + b3Xt 3 - b3ρXt-1 3+ ρy t-1+ ut

The corresponding reduced form (unrestricted) equation is:

yt = α1 + α2Xt, 2 + α3Xt-1, 2 + α4Xt, 3 + α5Xt-1, 3 + α6yt-1+ u t

α1 = b1(1-ρ) α2 = b2 α3= - b2ρ α4 = b3 α5= - b3ρ α6= ρ


Copyright 1996 Lawrence C. Marsh
14.19
α1 = b1(1-ρ) α2 = b2 α3= - b2ρ α4 = b3 α5= - b3ρ α6= ρ

Given OLS estimates  α̂1, α̂2, α̂3, α̂4, α̂5, α̂6
we can get three separate and distinct estimates for ρ :

ρ̂ = −α̂3 / α̂2        ρ̂ = −α̂5 / α̂4        ρ̂ = α̂6

These three separate estimates of ρ are in conflict!

It is difficult to know which one to use as “the”
legitimate estimate of ρ. Durbin used the last one.
Copyright 1996 Lawrence C. Marsh
14.20
The problem with Durbin’s Method is that it ignores
the inherent nonlinear restrictions implied by this
structural model. To get a single (i.e. unique) estimate
for ρ the implied nonlinear restrictions must be
incorporated directly into the estimation process.

Consequently, the above structural equation should be


estimated using a nonlinear method such as the
Gauss-Newton algorithm for nonlinear least squares.

yt = b1(1-ρ) + b2Xt 2 - b2ρXt -1, 2 + b3Xt 3 - b3ρXt -1, 3+ ρyt-1+ ut


Copyright 1996 Lawrence C. Marsh
14.21
yt = b1(1-ρ) + b2Xt 2 - b2ρXt-1, 2 + b3Xt 3 - b3ρXt-1, 3+ ρyt-1+ ut

f’(Xt,b) = [ ∂yt/∂b1    ∂yt/∂b2    ∂yt/∂b3    ∂yt/∂ρ ]

∂yt/∂b1 = (1 − ρ)
∂yt/∂b2 = (Xt,2 − ρ Xt−1,2)
∂yt/∂b3 = (Xt,3 − ρ Xt−1,3)
∂yt/∂ρ  = ( −b1 − b2Xt−1,2 − b3Xt−1,3 + yt−1 )
Copyright 1996 Lawrence C. Marsh
14.22
b(m+1) = [ f’(X,b(m))T f’(X,b(m)) ]⁻¹ f’(X,b(m))T y∗(m)

where   yt∗(m) ≡ yt − f(Xt,b(m)) + f’(Xt,b(m)) b(m)

b(m) = [ b1(m), b2(m), b3(m), ρ(m) ]        Iterate until convergence.

f’(Xt,b(m)) = [ ∂yt/∂b1(m)    ∂yt/∂b2(m)    ∂yt/∂b3(m)    ∂yt/∂ρ(m) ]

f(Xt,b) = b1(1−ρ) + b2Xt2 − b2ρXt−1,2 + b3Xt3 − b3ρXt−1,3 + ρyt−1
Copyright 1996 Lawrence C. Marsh15.1
Chapter 15

Distributed
Lag Models
Copyright © 1997 John Wiley & Sons, Inc. All rights reserved. Reproduction or translation of this work beyond
that permitted in Section 117 of the 1976 United States Copyright Act without the express written permission of the
copyright owner is unlawful. Request for further information should be addressed to the Permissions Department,
John Wiley & Sons, Inc. The purchaser may make back-up copies for his/her own use only and not for distribution
or resale. The Publisher assumes no responsibility for errors, omissions, or damages, caused by the use of these
programs or from the use of the information contained herein.
Copyright 1996 Lawrence C. Marsh15.2

The Distributed Lag Effect


Effect Effect Effect
at time t at time t+1 at time t+2

Economic action
at time t
Copyright 1996 Lawrence C. Marsh15.3

Unstructured Lags
yt = α + β0 xt + β1 xt-1 + β2 xt-2 + . . . + βn xt-n + et

“n” unstructured lags

no systematic structure imposed on the β’s

the β’s are unrestricted


Copyright 1996 Lawrence C. Marsh15.4

Problems with Unstructured Lags

1. n observations are lost with n-lag setup.

2. high degree of multicollinearity among xt-j’s.

3. many degrees of freedom used for large n.

4. could get greater precision using structure.


Copyright 1996 Lawrence C. Marsh15.5

The Arithmetic Lag Structure


proposed by Irving Fisher (1937)

the lag weights decline linearly

Imposing the relationship:   βi = (n − i + 1) γ

β0 = (n+1)γ,  β1 = nγ,  β2 = (n−1)γ,  β3 = (n−2)γ, . . . , βn−2 = 3γ,  βn−1 = 2γ,  βn = γ

only need to estimate one coefficient, γ ,
instead of n+1 coefficients, β0 , ... , βn .
Copyright 1996 Lawrence C. Marsh15.6
Arithmetic Lag Structure
yt = α + β0 xt + β1 xt-1 + β2 xt-2 + . . . + βn xt-n + et

Step 1: impose the restriction:  βi = (n − i + 1) γ

yt = α + (n+1) γxt + n γxt-1 + (n-1) γxt-2 + . . . + γxt-n + et

Step 2: factor out the unknown coefficient, γ .

yt = α + γ [(n+1)xt + nxt-1 + (n-1)xt-2 + . . . + xt-n] + et


Copyright 1996 Lawrence C. Marsh15.7
Arithmetic Lag Structure
yt = α + γ [(n+1)xt + nxt-1 + (n-1)xt-2 + . . . + xt-n] + et

Step 3: Define zt .

zt = [(n+1)xt + nxt-1 + (n-1)xt-2 + . . . + xt-n]

Step 4: Decide number of lags, n.


For n = 4: zt = [ 5xt + 4xt-1 + 3xt-2 + 2xt-3 + xt-4]

Step 5: Run least squares regression on:

yt = α + γ zt + et
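A sketch of these five steps for n = 4 lags, assuming y and x are numpy arrays over time (illustrative names):

import numpy as np

def arithmetic_lag_fit(y, x, n=4):
    """Build z_t = (n+1)x_t + n x_{t-1} + ... + x_{t-n} and regress y on (1, z)."""
    T = len(y)
    z = np.zeros(T - n)
    for j in range(n + 1):                       # weight (n - j + 1) on lag j
        z += (n - j + 1) * x[n - j : T - j]
    Z = np.column_stack([np.ones_like(z), z])
    (alpha, gamma), *_ = np.linalg.lstsq(Z, y[n:], rcond=None)
    betas = gamma * np.arange(n + 1, 0, -1)      # beta_i = (n - i + 1) * gamma
    return alpha, gamma, betas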
Copyright 1996 Lawrence C. Marsh15.8
Arithmetic Lag Structure

β0 = (n+1)γ,  β1 = nγ,  β2 = (n−1)γ, . . . , βn = γ

[Figure: lag weights βi plotted against the lag i = 0, 1, ..., n, falling along a straight line — the linear lag structure.]
Copyright 1996 Lawrence C. Marsh15.9
Polynomial Lag Structure
proposed by Shirley Almon (1965)

n = the length of the lag; the lag weights fit a polynomial of degree p:

βi = γ0 + γ1 i + γ2 i² + . . . + γp i^p        where i = 0, 1, . . . , n

For example, a quadratic polynomial (p = 2 and n = 4):

βi = γ0 + γ1 i + γ2 i²

β0 = γ0
β1 = γ0 + γ1 + γ2
β2 = γ0 + 2γ1 + 4γ2
β3 = γ0 + 3γ1 + 9γ2
β4 = γ0 + 4γ1 + 16γ2
Copyright 1996 Lawrence C. Marsh
15.10
Polynomial Lag Structure
yt = α + β0 xt + β1 xt-1 + β2 xt-2 + β3 xt-3 + β4 xt-4 + et

Step 1: impose the restriction: βi = γ0 + γ1i + γ2i2

yt = α + γ0 xt + (γ0 + γ1 + γ2)xt-1 + (γ0 + 2γ1 + 4γ2)xt-2


+ (γ0 + 3γ1 + 9γ2)xt-3+ (γ0 + 4γ1 + 16γ2)xt-4 + et

Step 2: factor out the unknown coefficients: γ0, γ1, γ2.

yt = α + γ0 [xt + xt-1 + xt-2 + xt-3 + xt-4]


+ γ1 [xt + xt-1 + 2xt-2 + 3xt-3 + 4xt-4]
+ γ2 [xt + xt-1 + 4xt-2 + 9xt-3 + 16xt-4] + et
Copyright 1996 Lawrence C. Marsh
15.11
Polynomial Lag Structure
yt = α + γ0 [xt + xt-1 + xt-2 + xt-3 + xt-4]
+ γ1 [xt + xt-1 + 2xt-2 + 3xt-3 + 4xt-4]
+ γ2 [xt + xt-1 + 4xt-2 + 9xt-3 + 16xt-4] + et

Step 3: Define zt0 , zt1 and zt2 for γ0 , γ1 , and γ2.

z t0 = [xt + xt-1 + xt-2 + xt-3 + xt-4]


z t1 = [xt + xt-1 + 2xt-2 + 3xt-3 + 4xt- 4 ]
z t2 = [xt + xt-1 + 4xt-2 + 9xt-3 + 16xt- 4]
Copyright 1996 Lawrence C. Marsh
15.12
Polynomial Lag Structure
Step 4: Regress yt on zt0 , zt1 and zt2 .

yt = α + γ0 z t0 + γ1 z t1 + γ2 z t2 + et

Step 5: Express the β̂i ’s in terms of γ̂0 , γ̂1 , and γ̂2 :

β̂0 = γ̂0
β̂1 = γ̂0 + γ̂1 + γ̂2
β̂2 = γ̂0 + 2γ̂1 + 4γ̂2
β̂3 = γ̂0 + 3γ̂1 + 9γ̂2
β̂4 = γ̂0 + 4γ̂1 + 16γ̂2
Copyright 1996 Lawrence C. Marsh
15.13
Polynomial Lag Structure

βi β2

β0
. . .β
β1
3

. β4
.

0 1 2 3 4 i

Figure 15.3
Copyright 1996 Lawrence C. Marsh
15.14
Geometric Lag Structure
infinite distributed lag model:

yt = α + β0 xt + β1 xt-1 + β2 xt-2 + . . . + et

yt = α + Σi=0..∞ βi xt-i + et          (15.3.1)

geometric lag structure:

βi = β φ^i     where |φ| < 1 and β φ^i > 0 .
Copyright 1996 Lawrence C. Marsh
15.15
Geometric Lag Structure
infinite unstructured lag:
yt = α + β0 xt + β1 xt-1 + β2 xt-2 + β3 xt-3 + . . . + et

β0 = β
β1 = βφ
Substitute βi = β φi β2 = β φ2
β3 = β φ3
..
.
infinite geometric lag:
yt = α + β(xt + φ xt-1 + φ2 xt-2 + φ3 xt-3 + . . .) + et
Copyright 1996 Lawrence C. Marsh
15.16
Geometric Lag Structure
yt = α + β(xt + φ xt-1 + φ2 xt-2 + φ3 xt-3 + . . .) + et

impact multiplier :                  β

interim multiplier (3-period) :      β + βφ + βφ²

long-run multiplier :                β(1 + φ + φ² + φ³ + . . . ) = β / (1 − φ)
Copyright 1996 Lawrence C. Marsh
15.17
Geometric Lag Structure

β0 = β,   β1 = βφ,   β2 = βφ²,   β3 = βφ³,   β4 = βφ⁴, . . .

[Figure 15.5: geometrically declining lag weights βi plotted against the lag i.]
Copyright 1996 Lawrence C. Marsh
15.18
Geometric Lag Structure

yt = α + β(xt + φ xt-1 + φ2 xt-2 + φ3 xt-3 + . . .) + et

Problem:
How to estimate the infinite number
of geometric lag coefficients ???

Answer:
Use the Koyck transformation.
Copyright 1996 Lawrence C. Marsh
15.19
The Koyck Transformation

Lag everything once, multiply by φ and subtract from original:

yt = α + β(xt + φ xt-1 + φ2 xt-2 + φ3 xt-3 + . . .) + et


φ yt-1 = φ α + β(φ xt-1 + φ2 xt-2 + φ3 xt-3 + . . .) + φ et-1

yt − φ yt-1 = α(1− φ) + βxt + (et − φ et-1)


Copyright 1996 Lawrence C. Marsh
15.20
The Koyck Transformation

yt − φ yt-1 = α(1− φ) + βxt + (et − φ et-1)

Solve for yt by adding φ yt-1 to both sides:

yt = α(1− φ) + φ yt-1 + βxt + (et − φ et-1)

yt = δ1 + δ2 yt-1 + δ3xt + νt
Copyright 1996 Lawrence C. Marsh
15.21
The Koyck Transformation
yt = α(1− φ) + φ yt-1 + βxt + (et − φ et-1)

Defining δ1 = α(1− φ) , δ2 = φ , and δ3 = β ,


use ordinary least squares:

yt = δ1 + δ2 yt-1 + δ3xt + νt
The original structural parameters can now be estimated
in terms of these reduced form parameter estimates:

^β = ^δ3        ^φ = ^δ2        ^α = ^δ1 / (1 − ^δ2)
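A minimal sketch of the Koyck estimation on simulated data (all parameter values hypothetical; the error is generated i.i.d. in the transformed equation for simplicity, so least squares is consistent here): regress yt on yt-1 and xt, then recover ^β, ^φ, and ^α as above.

    import numpy as np

    rng = np.random.default_rng(1)
    T = 300
    alpha, beta, phi = 2.0, 0.8, 0.6                 # hypothetical structural values
    x = rng.normal(size=T)
    y = np.zeros(T)
    for t in range(1, T):
        # data generated from the transformed (Koyck) equation with an i.i.d. error
        y[t] = alpha*(1 - phi) + phi*y[t-1] + beta*x[t] + rng.normal(scale=0.3)

    # regress y_t on an intercept, y_{t-1} and x_t
    X = np.column_stack([np.ones(T - 1), y[:-1], x[1:]])
    d1, d2, d3 = np.linalg.lstsq(X, y[1:], rcond=None)[0]

    beta_hat  = d3
    phi_hat   = d2
    alpha_hat = d1 / (1 - d2)
    print(round(alpha_hat, 3), round(beta_hat, 3), round(phi_hat, 3))

    # implied geometric lag weights: beta_i = beta * phi**i
    weights = beta_hat * phi_hat ** np.arange(8)
    print(np.round(weights, 3))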
Copyright 1996 Lawrence C. Marsh
15.22
Geometric Lag Structure
yt = ^α + ^β(xt + ^φ xt-1 + ^φ² xt-2 + ^φ³ xt-3 + . . .) + ^et

^β0 = ^β
^β1 = ^β ^φ
^β2 = ^β ^φ²
^β3 = ^β ^φ³
  .
  .
  .

yt = ^α + ^β0 xt + ^β1 xt-1 + ^β2 xt-2 + ^β3 xt-3 + . . . + ^et
Copyright 1996 Lawrence C. Marsh
15.23
Durbin’s h-test
for autocorrelation
Least squares estimates of the geometric lag model are inconsistent if the errors are autocorrelated,
but with a lagged dependent variable the Durbin-Watson test is biased in favor of no autocorrelation,
so Durbin's h-test is used instead.

h = (1 − d/2) √[ (T − 1) / (1 − (T − 1)[se(b2)]²) ]

h = Durbin's h-test statistic
d = Durbin-Watson test statistic
T = sample size
se(b2) = standard error of the estimated coefficient b2 on yt-1
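A small helper computing Durbin's h from d, T, and se(b2); the values in the example call are made up for illustration.

    import math

    def durbin_h(d, T, se_b2):
        """Durbin's h = (1 - d/2) * sqrt((T-1) / (1 - (T-1)*se_b2**2)).
        Compare h with the standard normal; undefined if (T-1)*se_b2**2 >= 1."""
        denom = 1.0 - (T - 1) * se_b2**2
        if denom <= 0:
            raise ValueError("h is undefined when (T-1)*se(b2)^2 >= 1")
        return (1.0 - d/2.0) * math.sqrt((T - 1) / denom)

    print(round(durbin_h(d=1.89, T=120, se_b2=0.05), 3))   # illustrative numbers only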
Copyright 1996 Lawrence C. Marsh
15.24
Adaptive Expectations

yt = α + β x*t + et

yt = credit card debt

x*t = expected (anticipated) income


(x*t is not observable)
Copyright 1996 Lawrence C. Marsh
15.25
Adaptive Expectations

adjust expectations
based on past realization:

x*t - x*t-1 = λ (xt-1 - x*t-1)


Copyright 1996 Lawrence C. Marsh
15.26
Adaptive Expectations

x*t - x*t-1 = λ (xt-1 - x*t-1)

rearrange to get:

x*t = λ xt-1 + (1- λ) x*t-1

or

λ xt-1 = [x*t - (1- λ) x*t-1]


Copyright 1996 Lawrence C. Marsh
15.27
Adaptive Expectations
yt = α + β x*t + et

Lag this model once and multiply by (1− λ):

(1− λ)yt-1 = (1− λ)α + (1− λ)β x*t-1 + (1− λ)et-1

subtract this from the original and solve for yt to get:

yt = αλ + (1 − λ)yt-1 + β [x*t − (1 − λ)x*t-1]
        + et − (1 − λ)et-1
Copyright 1996 Lawrence C. Marsh
15.28
Adaptive Expectations
yt = αλ + (1 − λ)yt-1 + β [x*t − (1 − λ)x*t-1]
        + et − (1 − λ)et-1

Since λ xt-1 = [x*t - (1- λ) x*t-1]


we get:

yt = αλ + (1 − λ)yt-1 + βλ xt-1 + ut

where ut = et - (1− λ)et-1


Copyright 1996 Lawrence C. Marsh
15.29
Adaptive Expectations

yt = αλ + (1 − λ)yt-1 + βλ xt-1 + ut

Use ordinary least squares regression on:

yt = β1 + β2yt-1+ β3xt-1 + ut

and we get:

^λ = 1 − ^β2        ^α = ^β1 / (1 − ^β2)        ^β = ^β3 / (1 − ^β2)
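For example, with hypothetical reduced-form estimates (the numbers below are made up), the recovery is just arithmetic:

    # hypothetical OLS estimates from y_t = b1 + b2*y_{t-1} + b3*x_{t-1} + u_t
    b1, b2, b3 = 1.20, 0.35, 0.48

    lam_hat   = 1 - b2            # lambda-hat
    alpha_hat = b1 / (1 - b2)     # alpha-hat
    beta_hat  = b3 / (1 - b2)     # beta-hat
    print(lam_hat, alpha_hat, beta_hat)   # 0.65, about 1.85, about 0.74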
Copyright 1996 Lawrence C. Marsh
15.30
Partial Adjustment

y*t = α + β xt + et

inventories partially adjust , 0 < γ < 1,


towards optimal or desired level, y*t :

yt - yt-1 = γ (y*t - yt-1)


Copyright 1996 Lawrence C. Marsh
15.31
Partial Adjustment

yt - yt-1 = γ (y*t - yt-1)


= γ (α + βxt + et - yt-1)
= γα + γβxt - γyt-1+ γet

Solving for yt :

yt = γα + (1 - γ)yt-1 + γβxt + γet


Copyright 1996 Lawrence C. Marsh
15.32
Partial Adjustment

yt = γα + (1 - γ)yt-1 + γβxt + γet

yt = β1 + β2yt-1+ β3xt + νt

Use ordinary least squares regression to get:

^γ = 1 − ^β2        ^α = ^β1 / (1 − ^β2)        ^β = ^β3 / (1 − ^β2)
Copyright 1996 Lawrence C. Marsh
16.1
Chapter 16

Time
Series
Analysis
Copyright © 1997 John Wiley & Sons, Inc. All rights reserved. Reproduction or translation of this work beyond
that permitted in Section 117 of the 1976 United States Copyright Act without the express written permission of the
copyright owner is unlawful. Request for further information should be addressed to the Permissions Department,
John Wiley & Sons, Inc. The purchaser may make back-up copies for his/her own use only and not for distribution
or resale. The Publisher assumes no responsibility for errors, omissions, or damages, caused by the use of these
programs or from the use of the information contained herein.
Copyright 1996 Lawrence C. Marsh
16.2

Previous Chapters used Economic Models

1. economic model for dependent variable of interest.

2. statistical model consistent with the data.

3. estimation procedure for parameters using the data.

4. forecast variable of interest using estimated model.

Time Series Analysis does not use this approach.


Copyright 1996 Lawrence C. Marsh
16.3

Time Series Analysis does not generally


incorporate all of the economic relationships
found in economic models.

Time Series Analysis uses


more statistics and less economics.

Time Series Analysis is useful for short term forecasting only.

Long term forecasting requires incorporating more involved


behavioral economic relationships into the analysis.
Copyright 1996 Lawrence C. Marsh
16.4

Univariate Time Series Analysis can be used


to relate the current values of a single economic
variable to:
1. its past values
2. the values of current and past random errors

Other variables are not used


in univariate time series analysis.
Copyright 1996 Lawrence C. Marsh
16.5

Three types of Univariate Time Series Analysis


processes will be discussed in this chapter:

1. autoregressive (AR)

2. moving average (MA)

3. autoregressive moving average (ARMA)


Copyright 1996 Lawrence C. Marsh
16.6
Multivariate Time Series Analysis can be
used to relate the current value of each of
several economic variables to:

1. its past values.

2. the past values of the other forecasted variables.

3. the values of current and past random errors.

Vector autoregressive models discussed later in


this chapter are multivariate time series models.
Copyright 1996 Lawrence C. Marsh
16.7
First-Order Autoregressive Processes, AR(1):

yt = δ + θ1yt-1+ et, t = 1, 2,...,T. (16.1.1)

δ is the intercept.
θ1 is a parameter, generally between −1 and +1.
et is an uncorrelated random error with
mean zero and variance σe².
Copyright 1996 Lawrence C. Marsh
16.8

Autoregressive Process of order p, AR(p) :

yt = δ + θ1yt-1 + θ2yt-2 +...+ θpyt-p + et (16.1.2)

δ is the intercept.
The θi’s are parameters, generally between −1 and +1.
et is an uncorrelated random error with
mean zero and variance σe².
Copyright 1996 Lawrence C. Marsh
16.9

Properties of least squares estimator:

AR models always have one or more lagged


dependent variables on the right hand side.

Consequently, least squares is no longer a


best linear unbiased estimator (BLUE),
but it does have some good asymptotic
properties including consistency.
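A sketch of this point on simulated data (hypothetical AR(2) parameter values): least squares applied to yt on an intercept, yt-1 and yt-2 recovers the parameters approximately in a large sample, even though it is not BLUE here.

    import numpy as np

    rng = np.random.default_rng(2)
    T = 500
    delta, th1, th2 = 0.5, 1.2, -0.4        # hypothetical (stationary) AR(2) parameters
    y = np.zeros(T)
    for t in range(2, T):
        y[t] = delta + th1*y[t-1] + th2*y[t-2] + rng.normal()

    # regress y_t on an intercept, y_{t-1} and y_{t-2}
    X = np.column_stack([np.ones(T - 2), y[1:-1], y[:-2]])
    coef = np.linalg.lstsq(X, y[2:], rcond=None)[0]
    print(np.round(coef, 3))   # roughly (0.5, 1.2, -0.4); unbiasedness is lost, consistency is not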
Copyright 1996 Lawrence C. Marsh
16.10
AR(2) model of U.S. unemployment rates

yt = 0.5051 + 1.5537 yt-1 − 0.6515 yt-2
    (0.1267)  (0.0707)    (0.0708)        (standard errors in parentheses)

The coefficient on yt-1 is positive; the coefficient on yt-2 is negative.

Note: quarterly data, Q1-1948 through Q1-1978, from J. D. Cryer (1986); see unempl.dat


Copyright 1996 Lawrence C. Marsh
16.11
Choosing the lag length, p, for AR(p):

The Partial Autocorrelation Function (PAF)

The PAF is the sequence of correlations between


(yt and yt-1), (yt and yt-2), (yt and yt-3), and so on,
given that the effects of earlier lags on yt are
held constant.
Copyright 1996 Lawrence C. Marsh
16.12
Partial Autocorrelation Function
Data simulated from this model:  yt = 0.5 yt-1 + 0.3 yt-2 + et

^θkk is the last (kth) coefficient in a kth order AR process.

[Figure: the sample PAF ^θkk plotted against k, with significance bounds at ±2/√T.
The first two partial autocorrelations are large; the rest fall inside the bounds.]

This sample PAF suggests a second order process, AR(2), which is correct.
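A sketch of the sample PAF computed exactly as described: fit AR(k) by least squares for k = 1, 2, ... and keep the last coefficient each time (simulated data; ±2/√T used as a rough significance bound).

    import numpy as np

    def pacf_by_regression(y, max_lag):
        """theta_kk = last coefficient from an AR(k) least squares fit, for k = 1..max_lag."""
        out = []
        for k in range(1, max_lag + 1):
            X = np.column_stack([np.ones(len(y) - k)] +
                                [y[k - j:len(y) - j] for j in range(1, k + 1)])
            coef = np.linalg.lstsq(X, y[k:], rcond=None)[0]
            out.append(coef[-1])            # coefficient on y_{t-k}
        return np.array(out)

    rng = np.random.default_rng(3)
    T = 400
    y = np.zeros(T)
    for t in range(2, T):
        y[t] = 0.5*y[t-1] + 0.3*y[t-2] + rng.normal()

    print(np.round(pacf_by_regression(y, max_lag=8), 3))  # first two spikes large, the rest small
    print(round(2/np.sqrt(T), 3))                         # rough significance bound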
Copyright 1996 Lawrence C. Marsh
16.13
Using AR Model for Forecasting:
unemployment rate: yT-1 = 6.63 and yT = 6.20

^yT+1 = ^δ + ^θ1 yT + ^θ2 yT-1
      = 0.5051 + (1.5537)(6.2) − (0.6515)(6.63)
      = 5.8186

^yT+2 = ^δ + ^θ1 ^yT+1 + ^θ2 yT
      = 0.5051 + (1.5537)(5.8186) − (0.6515)(6.2)
      = 5.5062

^yT+3 = ^δ + ^θ1 ^yT+2 + ^θ2 ^yT+1
      = 0.5051 + (1.5537)(5.5062) − (0.6515)(5.8186)
      = 5.2693
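These three forecasts can be reproduced in a few lines (coefficients and the last two unemployment rates taken from the slides; each forecast is rounded to four decimals before being fed into the next, as above):

    delta, th1, th2 = 0.5051, 1.5537, -0.6515
    y_prev, y_last = 6.63, 6.20          # y_{T-1} and y_T

    forecasts = []
    for _ in range(3):
        y_next = round(delta + th1*y_last + th2*y_prev, 4)   # round each step, as on the slide
        forecasts.append(y_next)
        y_prev, y_last = y_last, y_next

    print(forecasts)    # [5.8186, 5.5062, 5.2693]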
Copyright 1996 Lawrence C. Marsh
16.14
Moving Average Process of order q, MA(q):

yt = µ + et + α1et-1 + α2et-2 + ... + αqet-q        (16.2.1)

µ is the intercept.
The αi’s are unknown parameters.
et is an uncorrelated random error with
mean zero and variance σe².
Copyright 1996 Lawrence C. Marsh
16.15
An MA(1) process:

yt = µ + et + α1et-1 (16.2.2)

Minimize the sum of squared deviations:

S(µ, α1) = Σ (t = 1 to T) et² = Σ (t = 1 to T) (yt − µ − α1 et-1)²        (16.2.3)
Copyright 1996 Lawrence C. Marsh
16.16
Stationary vs. Nonstationary

stationary:
A stationary time series is one whose mean, variance,
and autocorrelation function do not change over time.

nonstationary:
A nonstationary time series is one whose mean,
variance, or autocorrelation function changes over time.
Copyright 1996 Lawrence C. Marsh
16.17

First Differencing is often used to transform


a nonstationary series into a stationary series:

yt = z t - z t-1

where z t is the original nonstationary series


and yt is the new stationary series.
Copyright 1996 Lawrence C. Marsh
16.18
Choosing the lag length, q, for MA(q):

The Autocorrelation Function (AF)

The AF is the sequence of correlations between


(yt and yt-1), (yt and yt-2), (yt and yt-3), and so on,
without holding the effects of earlier lags
on yt constant.

The PAF controlled for the effects of previous lags


but the AF does not control for such effects.
Copyright 1996 Lawrence C. Marsh
16.19
Autocorrelation Function
Data simulated from this model:  yt = et − 0.9 et-1

rkk is the last (kth) coefficient in a kth order MA process.

[Figure: the sample AF plotted against k, with significance bounds at ±2/√T.
Only the first autocorrelation is large; the rest fall inside the bounds.]

This sample AF suggests a first order process, MA(1), which is correct.
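A sketch of the sample AF: plain lag-k correlations of the series with itself, computed on data simulated from the MA(1) model above.

    import numpy as np

    def sample_acf(y, max_lag):
        """Correlation between (y_t, y_{t-k}) for k = 1..max_lag, no control for earlier lags."""
        y = y - y.mean()
        denom = np.sum(y**2)
        return np.array([np.sum(y[k:]*y[:-k]) / denom for k in range(1, max_lag + 1)])

    rng = np.random.default_rng(4)
    T = 400
    e = rng.normal(size=T + 1)
    y = e[1:] - 0.9*e[:-1]                 # y_t = e_t - 0.9 e_{t-1}

    print(np.round(sample_acf(y, max_lag=8), 3))   # only the lag-1 value is large (near -0.5)
    print(round(2/np.sqrt(T), 3))                  # rough significance bound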
Copyright 1996 Lawrence C. Marsh
16.20
Autoregressive Moving Average
ARMA(p,q)

An ARMA(1,2) has one autoregressive lag


and two moving average lags:

yt = δ + θ1yt-1 + et + α1et-1 + α2 et-2


Copyright 1996 Lawrence C. Marsh
16.21
Integrated Processes

A time series with an upward or downward


trend over time is nonstationary.

Many nonstationary time series can be made


stationary by differencing them one or more times.

Such time series are called integrated processes.


Copyright 1996 Lawrence C. Marsh
16.22
The number of times a series must be
differenced to make it stationary is the
order of the integrated process, d.

An autocorrelation function, AF,


with large, significant autocorrelations
for many lags may require more than
one differencing to become stationary.

Check the new AF after each differencing


to determine if further differencing is needed.
Copyright 1996 Lawrence C. Marsh
16.23
Unit Root
zt = θ1zt -1 + µ + et + α1et -1 (16.3.2)

-1 < θ1 < 1 stationary ARMA(1,1)

θ1 = 1 nonstationary process

θ1 = 1 is called a unit root


Copyright 1996 Lawrence C. Marsh
16.24
Unit Root Tests

zt − zt-1 = (θ1 − 1)zt-1 + µ + et + α1et-1

∆zt = θ1* zt-1 + µ + et + α1et-1        (16.3.3)

where ∆zt = zt − zt-1 and θ1* = θ1 − 1

Testing θ1* = 0 is equivalent to testing θ1 = 1
Copyright 1996 Lawrence C. Marsh
16.25
Unit Root Tests

H0: θ1* = 0   vs.   H1: θ1* < 0        (16.3.4)

Computer programs typically use one of


the following tests for unit roots:

Dickey-Fuller Test

Phillips-Perron Test
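A sketch of the Dickey-Fuller idea in its simplest form (no moving-average term): regress ∆zt on a constant and zt-1 and examine the t-ratio on zt-1. The series below is a simulated random walk, and the t-ratio must be compared with Dickey-Fuller critical values (about −2.86 for this case), not the usual t table; packaged unit-root routines supply these critical values.

    import numpy as np

    rng = np.random.default_rng(5)
    T = 500
    z = np.cumsum(rng.normal(size=T))      # random walk: a unit-root (nonstationary) series

    dz = np.diff(z)                        # delta z_t = z_t - z_{t-1}
    X = np.column_stack([np.ones(T - 1), z[:-1]])
    coef = np.linalg.lstsq(X, dz, rcond=None)[0]

    # t-ratio on theta1* (the coefficient on z_{t-1})
    resid = dz - X @ coef
    s2 = resid @ resid / (len(dz) - 2)
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    print(round(coef[1] / se, 3))   # usually above -2.86, so theta1* = 0 (a unit root) is not rejected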
Copyright 1996 Lawrence C. Marsh
16.26

Autoregressive Integrated Moving Average


ARIMA(p,d,q)

An ARIMA(p,d,q) model represents an


AR(p) - MA(q) process that has been
differenced (integrated, I(d)) d times.

yt = δ + θ1yt-1 +...+ θpyt-p + et + α1et-1 +... + αq et-q


Copyright 1996 Lawrence C. Marsh
16.27
The Box-Jenkins approach:
1. Identification
determining the values of p, d, and q.
2. Estimation
linear or nonlinear least squares.
3. Diagnostic Checking
model fits well with no autocorrelation?
4. Forecasting
short-term forecasts of future yt values.
Copyright 1996 Lawrence C. Marsh
16.28
Vector Autoregressive (VAR) Models

Use VAR for two or more interrelated time series:

yt = θ0 + θ1yt-1 +...+ θpyt-p + φ1xt-1 +... + φp xt-p + et

xt = δ0 + δ1yt-1 +...+ δpyt-p + α1xt-1 +... + αp xt-p + ut
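A sketch of estimating such a VAR with p = 1 by applying least squares to each equation separately (simulated data, hypothetical coefficients); both equations use the same regressors, which is why equation-by-equation least squares suffices, as noted below.

    import numpy as np

    rng = np.random.default_rng(6)
    T = 400
    y = np.zeros(T)
    x = np.zeros(T)
    for t in range(1, T):
        # hypothetical VAR(1) coefficients
        y[t] = 0.3 + 0.5*y[t-1] + 0.2*x[t-1] + rng.normal(scale=0.5)
        x[t] = 0.1 + 0.1*y[t-1] + 0.6*x[t-1] + rng.normal(scale=0.5)

    Z = np.column_stack([np.ones(T - 1), y[:-1], x[:-1]])   # same regressors in both equations
    coef_y = np.linalg.lstsq(Z, y[1:], rcond=None)[0]       # estimates of theta0, theta1, phi1
    coef_x = np.linalg.lstsq(Z, x[1:], rcond=None)[0]       # estimates of delta0, delta1, alpha1
    print(np.round(coef_y, 3), np.round(coef_x, 3))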


Copyright 1996 Lawrence C. Marsh
16.29
Vector Autoregressive (VAR) Models

1. extension of AR model.
2. all variables endogenous.
3. no structural (behavioral) economic model.
4. all variables jointly determined (over time).
5. no simultaneous equations (same time).
Copyright 1996 Lawrence C. Marsh
16.30
The random error terms in a VAR model
may be correlated if they are affected by
relevant factors that are not in the model
such as government actions or
national/international events, etc.

Since VAR equations all have exactly the


same set of explanatory variables, the usual
seemingly unrelated regression estimation
produces exactly the same estimates as
least squares on each equation separately.
Copyright 1996 Lawrence C. Marsh
16.31

Least Squares is Consistent

Consequently, regardless of whether


the VAR random error terms are
correlated or not, least squares estimation
of each equation separately will provide
consistent regression coefficient estimates.
Copyright 1996 Lawrence C. Marsh
16.32
VAR Model Specification

To determine length of the lag, p, use:

1. Akaike’s AIC criterion

2. Schwarz’s SIC criterion

These methods were discussed in Chapter 15.


Copyright 1996 Lawrence C. Marsh
16.33
Spurious Regressions
yt = β1 + β2 xt + εt
where εt = θ1 εt-1 + νt

-1 < θ1 < 1 I(0) (i.e. d=0)


θ1 = 1 I(1) (i.e. d=1)

If θ1 =1 least squares estimates of β2 may


appear highly significant even when true β2 = 0 .
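A quick simulation of the problem: two independent random walks (so the true β2 = 0) regressed on one another routinely produce a large t-ratio.

    import numpy as np

    rng = np.random.default_rng(7)
    T = 200
    y = np.cumsum(rng.normal(size=T))   # two independent random walks, so the true beta2 is 0
    x = np.cumsum(rng.normal(size=T))

    X = np.column_stack([np.ones(T), x])
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ b
    s2 = resid @ resid / (T - 2)
    se_b2 = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    print(round(b[1] / se_b2, 2))   # |t| is very often far above 2 even though x has no effect on y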
Copyright 1996 Lawrence C. Marsh
16.34
Cointegration
yt = β1 + β2 xt + εt

If xt and yt are nonstationary I(1)


we might expect that εt is also I(1).

However, if xt and yt are nonstationary I(1)


but εt is stationary I(0), then xt and yt are
said to be cointegrated.
Copyright 1996 Lawrence C. Marsh
16.35
Cointegrated VAR(1) Model
VAR(1) model:

yt = θ0 + θ1yt-1 + φ1xt-1 + et
xt = δ0 + δ1yt-1 + α1xt-1 + ut

If xt and yt are both I(1) and are cointegrated,


use an Error Correction Model, instead of VAR(1).
Copyright 1996 Lawrence C. Marsh
16.36
Error Correction Model

∆yt = yt - yt-1 and ∆xt = xt - xt-1

∆yt = θ0 + (θ1-1)yt-1 + φ1xt-1 + et

∆xt = δ0 + δ1yt-1 + (α1-1)xt-1 + ut

(continued)
Copyright 1996 Lawrence C. Marsh
16.37
Error Correction Model

∆yt = θ0* + γ1(yt-1 − β1 − β2 xt-1) + et

∆xt = δ0* + γ2(yt-1 − β1 − β2 xt-1) + ut

where:

θ0* = θ0 + γ1β1          δ0* = δ0 + γ2β1

γ1 = φ1δ1 / (α1 − 1)      γ2 = δ1          β2 = (1 − α1) / δ1
Copyright 1996 Lawrence C. Marsh
16.38

Estimating an Error Correction Model

Step 1:
Estimate by least squares:
yt-1 = β1 + β2 xt-1 + εt-1
to get the residuals:
^εt-1 = yt-1 − ^β1 − ^β2 xt-1
Copyright 1996 Lawrence C. Marsh
16.39

Estimating an Error Correction Model

Step 2:

Estimate by least squares:

∆yt = θ0* + γ1 ^εt-1 + et

∆xt = δ0* + γ2 ^εt-1 + ut
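A sketch of the two-step procedure on simulated cointegrated data (parameter values hypothetical): Step 1 estimates the cointegrating regression and saves the residuals, Step 2 regresses the first differences on the lagged residual.

    import numpy as np

    rng = np.random.default_rng(8)
    T = 400
    x = np.cumsum(rng.normal(size=T))                      # an I(1) series
    y = 1.0 + 0.5*x + rng.normal(scale=0.3, size=T)        # cointegrated with x: the error is I(0)

    # Step 1: cointegrating regression y = beta1 + beta2*x, keep the residuals
    X1 = np.column_stack([np.ones(T), x])
    b1, b2 = np.linalg.lstsq(X1, y, rcond=None)[0]
    ehat = y - b1 - b2*x

    # Step 2: regress the first differences on the lagged residual
    dy, dx, e_lag = np.diff(y), np.diff(x), ehat[:-1]
    Z = np.column_stack([np.ones(T - 1), e_lag])
    theta0_star, gamma1 = np.linalg.lstsq(Z, dy, rcond=None)[0]
    delta0_star, gamma2 = np.linalg.lstsq(Z, dx, rcond=None)[0]
    print(round(gamma1, 3), round(gamma2, 3))   # gamma1 clearly negative (error-correcting); gamma2 near zero here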
Copyright 1996 Lawrence C. Marsh
16.40

Using cointegrated I(1) variables in a


VAR model expressed solely in terms
of first differences and lags of first
differences is a misspecification.

The correct specification is to use an


Error Correction Model
Copyright 1996 Lawrence C. Marsh
17.1
Chapter 17

Guidelines for
Research Project

Copyright © 1997 John Wiley & Sons, Inc. All rights reserved. Reproduction or translation of this work beyond
that permitted in Section 117 of the 1976 United States Copyright Act without the express written permission of the
copyright owner is unlawful. Request for further information should be addressed to the Permissions Department,
John Wiley & Sons, Inc. The purchaser may make back-up copies for his/her own use only and not for distribution
or resale. The Publisher assumes no responsibility for errors, omissions, or damages, caused by the use of these
programs or from the use of the information contained herein.
Copyright 1996 Lawrence C. Marsh
17.2
What the Book Has Covered
• Formulation
     economic ====> econometric.
• Estimation
     selecting an appropriate method.
• Interpretation
     how the xt’s affect yt .
• Inference
     testing, intervals, prediction.
Copyright 1996 Lawrence C. Marsh
17.3
Topics for This Chapter
1. Types of Data by Source
2. Nonexperimental Data
3. Text Data vs. Electronic Data
4. Selecting a Topic
5. Writing an Abstract
6. Research Report Format
Copyright 1996 Lawrence C. Marsh
17.4
Types of Data by Source

i) Experimental Data
from controlled experiments.
ii) Observational Data
passively generated by society.
iii) Survey Data
data collected through interviews.
Copyright 1996 Lawrence C. Marsh
17.5
Time vs. Cross-Section

Time Series Data


data collected at distinct points in time
(e.g. weekly sales, daily stock price, annual
budget deficit, monthly unemployment.)
Cross Section Data
data collected over samples of units, individuals,
households, firms at a particular point in time.
(e.g. salary, race, gender, unemployment by state.)
Copyright 1996 Lawrence C. Marsh
17.6

Micro vs. Macro

Micro Data:
data collected on individual economic
decision making units such as individuals,
households or firms.
Macro Data:
data resulting from a pooling or aggregating
over individuals, households or firms at the
local, state or national levels.
Copyright 1996 Lawrence C. Marsh
17.7

Flow vs. Stock

Flow Data:
outcome measured over a period of time,
such as the consumption of gasoline during
the last quarter of 1997.
Stock Data:
outcome measured at a particular point in
time, such as crude oil held by Chevron in
US storage tanks on April 1, 1997.
Copyright 1996 Lawrence C. Marsh
17.8

Quantitative vs. Qualitative

Quantitative Data:
outcomes such as prices or income that may
be expressed as numbers or some transfor-
mation of them (e.g. wages, trade deficit).
Qualitative Data:
outcomes that are of an “either-or” nature
(e.g. male, home owner, Methodist, bought
car last year, voted in last election).
Copyright 1996 Lawrence C. Marsh
17.9
International Data

International Financial Statistics (IMF monthly).


Basic Statistics of the Community (OECD annual).
Consumer Price Indices in the European
Community (OECD annual).
World Statistics (UN annual).
Yearbook of National Accounts Statistics (UN).
FAO Trade Yearbook (annual).
Copyright 1996 Lawrence C. Marsh
17.10
United States Data
Survey of Current Business (BEA monthly).
Handbook of Basic Economic Statistics (BES).
Monthly Labor Review (BLS monthly).
Federal Reserve Bulletin (FRB monthly).
Statistical Abstract of the US (BC annual).
Economic Report of the President (CEA annual).
Economic Indicators (CEA monthly).
Agricultural Statistics (USDA annual).
Agricultural Situation Reports (USDA monthly).
Copyright 1996 Lawrence C. Marsh
17.11
State and Local Data
State and Metropolitan Area Data Book
(Commerce and BC, annual).
CPI Detailed Report (BLS, annual).
Census of Population and Housing
(Commerce, BC, annual).
County and City Data Book
(Commerce, BC, annual).
Copyright 1996 Lawrence C. Marsh
17.12
Citibase on CD-ROM

• Financial series: interest rates, stock market, etc.


• Business formation, investment and consumers.
• Construction of housing.
• Manufacturing, business cycles, foreign trade.
• Prices: producer and consumer price indexes.
• Industrial production.
• Capacity and productivity.
• Population.
Copyright 1996 Lawrence C. Marsh
17.13
Citibase on CD-ROM
(continued)
• Labor statistics: unemployment, households.
• National income and product accounts in detail.
• Forecasts and projections.
• Business cycle indicators.
• Energy consumption, petroleum production, etc.
• International data series including trade
statistics.
Copyright 1996 Lawrence C. Marsh
17.14
Resources for Economists

Resources for Economists by Bill Goffe

http://econwpa.wustl.edu/EconFAQ/EconFAQ.html

Bill Goffe provides a vast database of information


about the economics profession including economic
organizations, working papers and reports,
and economic data series.
Copyright 1996 Lawrence C. Marsh
17.15
Internet Data Sources
A few of the items on Bill Goffe’s Table of Contents:
• Shortcut to All Resources.
• Macro and Regional Data.
• Other U.S. Data.
• World and Non-U.S. Data.
• Finance and Financial Markets.
• Data Archives.
• Journal Data and Program Archives.
Copyright 1996 Lawrence C. Marsh
17.16
Useful Internet Addresses
http://seamonkey.ed.asu.edu/~behrens/teach/WWW_data.html
http://www.sims.berkeley.edu/~hal/pages/interesting.html
http://www.stls.frb.org FED RESERVE BK - ST. LOUIS
http://www.bls.gov BUREAU OF LABOR STATISTICS
http://nber.harvard.edu NAT’L BUR. ECON. RESEARCH
http://www.inform.umd.edu:8080/EdRes/Topic/EconData/
.www/econdata.html UNIVERSITY OF MARYLAND
http://www.bog.frb.fed.us FED BOARD OF GOVERNORS
http://www.webcom.com/~yardeni/economic.html
Copyright 1996 Lawrence C. Marsh
17.17
Data from Surveys
The survey process has four distinct aspects:

i) identifying the population of interest.

ii) designing and selecting the sample.

iii) collecting the information.

iv) data reduction, estimation and inference.


Copyright 1996 Lawrence C. Marsh
17.18
Controlled Experiments
Controlled experiments were done on these topics:
1. Labor force participation: negative income tax:
guaranteed minimum income experiment.
2. National cash housing allowance experiment:
impact on demand and supply of housing.
3. Health insurance: medical cost reduction:
sensitivity of income groups to price change.
4. Peak-load pricing and electricity use:
daily use pattern of residential customers.
Copyright 1996 Lawrence C. Marsh
17.19
Economic Data Problems
I. poor implicit experimental design
(i) collinear explanatory variables.
(ii) measurement errors.
II. inconsistent with theory specification
(i) wrong level of aggregation.
(ii) missing observations or variables.
(iii) unobserved heterogeneity.
Copyright 1996 Lawrence C. Marsh
17.20
Selecting a Topic
General tips for selecting a research topic:
• “What am I interested in?”
• Well-defined, relatively simple topic.
• Ask prof for ideas and references.
• Journal of Economic Literature (ECONLIT)
• Make sure appropriate data are available.
• Avoid extremely difficult econometrics.
• Plan your work and work your plan.
Copyright 1996 Lawrence C. Marsh
17.21
Writing an Abstract
Abstract of less than 500 words should include:
(i) concise statement of the problem.
(ii) key references to available information.
(iii) description of research design including:
(a) economic model
(b) statistical model
(c) data sources
(d) estimation, testing and prediction
(iv) contribution of the work
Copyright 1996 Lawrence C. Marsh
17.22
Research Report Format
1. Statement of the Problem.
2. Review of the Literature.
3. The Economic Model.
4. The Statistical Model.
5. The Data.
6. Estimation and Inferences Procedures.
7. Empirical Results and Conclusions.
8. Possible Extensions and Limitations.
9. Acknowledgments.
10. References.
