Econ 399 Chapter2a

Part 1
Cross Sectional Data

-Simple Linear Regression Model: Chapter 2
-Multiple Regression Analysis: Chapters 3 and 4
-Advanced Regression Topics: Chapter 6
-Dummy Variables: Chapter 7
Note: Appendices A, B, and C are additional review if needed.
1. The Simple Regression Model

2.1 Definition of the Simple Regression Model
2.2 Deriving the Ordinary Least Squares Estimates
2.3 Properties of OLS on Any Sample of Data
2.4 Units of Measurement and Functional Form
2.5 Expected Values and Variances of the OLS Estimators
2.6 Regression through the Origin
2.1 The Simple Regression Model

Economics is built upon assumptions
-assume people are utility maximizers
-assume perfect information
-assume we have a can opener
The Simple Regression Model is based on assumptions
-more assumptions are required for more analysis
-disproving assumptions leads to more complicated models

Recall the SIMPLE LINEAR REGRESSION MODEL:
$y = \beta_0 + \beta_1 x + u$    (2.1)
-relates two variables (x and y)
-also called the two-variable linear regression model or bivariate linear regression model
y is the DEPENDENT or EXPLAINED variable
x is the INDEPENDENT or EXPLANATORY variable
y is a function of x

Recall the SIMPLE LINEAR REGRESSION MODEL:
$y = \beta_0 + \beta_1 x + u$    (2.1)
u is the ERROR TERM or DISTURBANCE variable
-u takes into account all factors other than x that affect y
-u accounts for all unobserved impacts on y

Example of the SIMPLE LINEAR REGRESSION MODEL:
$\text{taste} = \beta_0 + \beta_1 \text{cookingtime} + u$    (ie)
-taste depends on cooking time
-taste is explained by cooking time
-taste is a function of cooking time
-u accounts for other factors affecting taste (cooking skill, ingredients available, random luck, differing taste buds, etc.)

The SRM shows how y changes:
$\Delta y = \beta_1 \Delta x$ if $\Delta u = 0$    (2.2)
-for example, if B1=3, a 2 unit increase in x would cause a 6 unit change in y (2 x 3 = 6)
-B1 is the SLOPE PARAMETER
-B0 is the INTERCEPT PARAMETER or CONSTANT TERM
-the intercept is not always useful in analysis
$y = \beta_0 + \beta_1 x + u$    (2.1)
-note that this equation implies CONSTANT returns
-the first x has the same impact on y as the 100th x
-to avoid this we can include powers or change functional forms

-in order to achieve a ceteris paribus analysis of x's effect on y, we need assumptions about u's relationship with x
-in order to simplify our assumptions, we first assume that the average of u in the population is zero:
$E(u) = 0$    (2.5)
-if B0 is included in the equation, it can always be modified to make (2.5) true
-ie: if E(u)>0, simply increase B0
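The normalization behind (2.5) can be written out as one worked step. The symbol \alpha_0 is notation introduced here for illustration only (a hypothetical nonzero mean of u); it does not appear in the original slides.

```latex
% Suppose, hypothetically, that E(u) = \alpha_0 \neq 0.
y = \beta_0 + \beta_1 x + u
  = (\beta_0 + \alpha_0) + \beta_1 x + (u - \alpha_0),
\qquad E(u - \alpha_0) = 0 .
```

The redefined intercept absorbs the mean of u, and the slope B1 is left unchanged.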
2.1 x, u and Dependence

-we now need to assume that x and u are unrelated
-if x and u are uncorrelated, u may still be correlated with functions of x such as x^2
-we therefore need a stronger assumption:
$E(u \mid x) = E(u) = 0$    (2.6)
-the average value of u does not depend on x
-the second equality comes from (2.5)
-called the ZERO CONDITIONAL MEAN assumption
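The gap between "uncorrelated" and "zero conditional mean" can be seen in a small numerical sketch. The five-point data set below is hypothetical and chosen only for illustration: u is built directly from x^2, so its average clearly depends on x, yet its covariance with x is zero.

```python
# Hypothetical five-point "population": each x value is treated as equally likely.
x = [-2, -1, 0, 1, 2]


def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    """Population-style covariance: average product of deviations from the means."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / len(a)


x_squared = [xi ** 2 for xi in x]
# Build u as a function of x**2, centred so that E(u) = 0 holds.
u = [x2 - mean(x_squared) for x2 in x_squared]

cov_x_u = cov(x, u)           # x and u are uncorrelated
cov_x2_u = cov(x_squared, u)  # but u is correlated with x**2

print(mean(u), cov_x_u, cov_x2_u)
```

Here Cov(x, u) = 0 holds while E(u | x) varies with x, so the weaker assumption is satisfied but (2.6) is not.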
2.1 Example
Take the regression:
$\text{Papermark} = \beta_0 + \beta_1 \text{Paperquality} + u$    (ie)
-where u takes into account other factors of the applied paper, in particular length exceeding 10 pages
-assumption (2.6) requires that a paper's length does not depend on how good it is:
$E(\text{length} \mid \text{good paper}) = E(\text{length} \mid \text{bad paper}) = 0$

Taking conditional expectations of (2.1) and using (2.6) gives us:
$E(y \mid x) = \beta_0 + \beta_1 x$    (2.8)
-(2.8) is called the POPULATION REGRESSION FUNCTION (PRF)
-a one unit increase in x increases the expected value of y by B1
-B0+B1x is the systematic (explained) part of y
-u is the unsystematic (unexplained) part of y
2.2 Deriving the OLS Estimates

In order to estimate B0 and B1, we need sample data
-let {(xi, yi): i=1,...,n} be a sample of n observations from the population
$y_i = \beta_0 + \beta_1 x_i + u_i$    (2.9)
-here yi is explained by xi with error term ui
-y5 indicates the observation of y from the 5th data point
-this regression plots a best fit line through our data points
2.2 Deriving the OLS Estimates

These OLS estimates create a straight line going through the middle of the data points.
2.2 Deriving OLS Estimates

In order to derive OLS, we first need assumptions.
We must first assume that u has zero expected value:
$E(u) = 0$    (2.10)
-Secondly, we must assume that the covariance between x and u is zero:
$\text{Cov}(x, u) = E(xu) = 0$    (2.11)
-(2.10) and (2.11) can also be rewritten in terms of x and y as:
$E(y - \beta_0 - \beta_1 x) = 0$    (2.12)
$E[x(y - \beta_0 - \beta_1 x)] = 0$    (2.13)
2.2 Deriving OLS Estimates

-(2.12) and (2.13) imply restrictions on the joint probability distribution of the POPULATION
-given SAMPLE data, these equations become:
$\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0$    (2.14)
$\frac{1}{n} \sum_{i=1}^{n} x_i (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0$    (2.15)
-notice that the hats above B0 and B1 indicate we are now dealing with estimates
-this is an example of method of moments estimation (see Section C for a discussion)
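As a minimal sketch of the method of moments idea, the two sample conditions (2.14) and (2.15) can be treated as a pair of linear equations in B0hat and B1hat and solved directly. The function name and the five data points below are hypothetical, chosen only for illustration.

```python
def solve_moment_conditions(x, y):
    """Solve the sample moment conditions (2.14) and (2.15) for (b0_hat, b1_hat)."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xx = sum(xi * xi for xi in x)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    # Multiplying (2.14) and (2.15) by n gives the linear system
    #   n * b0     + sum_x * b1  = sum_y
    #   sum_x * b0 + sum_xx * b1 = sum_xy
    # which is solved here with Cramer's rule.
    det = n * sum_xx - sum_x ** 2
    if det == 0:
        raise ValueError("all x values are identical; no unique solution")
    b0_hat = (sum_y * sum_xx - sum_x * sum_xy) / det
    b1_hat = (n * sum_xy - sum_x * sum_y) / det
    return b0_hat, b1_hat


# Hypothetical sample of n = 5 observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0_hat, b1_hat = solve_moment_conditions(x, y)
print(b0_hat, b1_hat)
```

The determinant check plays the same role as requiring that not all x values are the same.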
2.2 Deriving OLS Estimates

Using summation properties, (2.14) simplifies to:
$\bar{y} = \hat{\beta}_0 + \hat{\beta}_1 \bar{x}$    (2.16)
Which can be rewritten as:
$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$    (2.17)
Which is our OLS estimate for the intercept
-therefore given data and an estimate of the slope, the estimated intercept can be determined
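2.2 Deriving OLS Estimates

By cancelling out 1/n and combining (2.17) and (2.15) we get:
$\sum_{i=1}^{n} x_i [y_i - (\bar{y} - \hat{\beta}_1 \bar{x}) - \hat{\beta}_1 x_i] = 0$
Which can be rewritten as:
$\sum_{i=1}^{n} x_i (y_i - \bar{y}) = \hat{\beta}_1 \sum_{i=1}^{n} x_i (x_i - \bar{x})$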
2.2 Deriving OLS Estimates

Recall the algebraic properties:
$\sum_{i=1}^{n} x_i (x_i - \bar{x}) = \sum_{i=1}^{n} (x_i - \bar{x})^2$
And
$\sum_{i=1}^{n} x_i (y_i - \bar{y}) = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$
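For readers who want the missing step, both properties follow from the fact that deviations from a sample mean sum to zero. The derivation below is a sketch added for review and uses only the symbols already defined.

```latex
\sum_{i=1}^{n} (x_i - \bar{x})^2
  = \sum_{i=1}^{n} x_i (x_i - \bar{x}) - \bar{x} \sum_{i=1}^{n} (x_i - \bar{x})
  = \sum_{i=1}^{n} x_i (x_i - \bar{x}),
\qquad \text{since } \sum_{i=1}^{n} (x_i - \bar{x}) = 0 .

\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
  = \sum_{i=1}^{n} x_i (y_i - \bar{y}) - \bar{x} \sum_{i=1}^{n} (y_i - \bar{y})
  = \sum_{i=1}^{n} x_i (y_i - \bar{y}),
\qquad \text{since } \sum_{i=1}^{n} (y_i - \bar{y}) = 0 .
```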
2.2 Deriving OLS Estimates

We can make the simple assumption that:
$\sum_{i=1}^{n} (x_i - \bar{x})^2 > 0$    (2.18)
Which essentially states that not all x's are the same
-ie: you didn't do a survey where one question is "are you alive?"
-This is essentially the key assumption needed to estimate B1hat
2.2 Deriving OLS Estimates

All this gives us the OLS estimate for B1:
$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$    (2.19)
Note that assumption (2.18) ensures the denominator is not zero.
-also note that if x and y are positively (negatively) correlated, B1hat will be positive (negative)
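The closed-form estimates (2.17) and (2.19) translate directly into a few lines of code. This is a minimal sketch: the function name `ols_simple` and the sample data are hypothetical, and real work would normally rely on econometrics software.

```python
def ols_simple(x, y):
    """Return (b0_hat, b1_hat) from the formulas (2.17) and (2.19)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        # Assumption (2.18) fails: every x is the same, so the slope is undefined.
        raise ValueError("all x values are the same")
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1_hat = s_xy / s_xx              # slope, equation (2.19)
    b0_hat = y_bar - b1_hat * x_bar   # intercept, equation (2.17)
    return b0_hat, b1_hat


# Hypothetical sample of n = 5 observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0_hat, b1_hat = ols_simple(x, y)
print(b0_hat, b1_hat)
```

The slope is computed first because the intercept formula depends on it.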
2.2 Fitted Values
OLS estimates of B0 and B1 give us a FITTED value for y when x=xi:
$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$    (2.20)
-there is one fitted or predicted value of y for each observation of x
-the predicted y's can be greater than, less than or (rarely) equal to the actual y's
2.2 Residuals
The difference between the actual y values and the fitted values is the ESTIMATED error, or residual:
$\hat{u}_i = y_i - \hat{y}_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i$    (2.21)
-again, there is one residual for each observation
-these residuals ARE NOT the same as the actual error term
2.2 Residuals
The SUM OF SQUARED RESIDUALS can be expressed as:
$\sum_{i=1}^{n} \hat{u}_i^2 = \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2$    (2.22)
-if B0hat and B1hat are chosen to minimize (2.22), (2.14) and (2.15) are our FIRST ORDER CONDITIONS (FOCs) and we are able to derive the same OLS estimates as above, (2.17) and (2.19)
-the term OLS (ordinary least squares) comes from the fact that the sum of the squared residuals is minimized
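The minimization claim can be checked numerically: the sum of squared residuals at the OLS estimates should be smaller than at nearby parameter values. The sketch below uses a hypothetical data set and a small grid of perturbations; it is an illustration, not a proof.

```python
def ssr(b0, b1, x, y):
    """Sum of squared residuals (2.22) for candidate intercept b0 and slope b1."""
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))


# Hypothetical sample of n = 5 observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Closed-form OLS estimates from (2.17) and (2.19).
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
b1_hat = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
b0_hat = y_bar - b1_hat * x_bar

ssr_at_ols = ssr(b0_hat, b1_hat, x, y)

# Evaluate the SSR at every nearby point on a 3 x 3 grid, excluding the OLS point.
steps = [-0.1, 0.0, 0.1]
nearby = [
    ssr(b0_hat + d0, b1_hat + d1, x, y)
    for d0 in steps
    for d1 in steps
    if (d0, d1) != (0.0, 0.0)
]

print(ssr_at_ols, min(nearby))
```

Every perturbed pair gives a larger sum of squared residuals than the OLS pair on this data set.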
2.2 Why OLS?
Why minimize the sum of the squared residuals?
-Why not minimize the residuals themselves?
-Why not minimize the cube of the residuals?
-not all minimization techniques can be expressed as formulas
-OLS has the advantage that unbiasedness, consistency, and other important statistical properties can be derived for it.
2.2 Regression Line
Our OLS regression supplies us with an OLS REGRESSION LINE:
$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$    (2.23)
-note that as this is an equation of a line, there are no subscripts
-B0hat is the predicted value of y when x=0
-not always a valid value
-(2.23) is also called the SAMPLE REGRESSION FUNCTION (SRF)
-different data sets will estimate different B's
2.2 Deriving OLS Estimates

The slope estimate:
$\hat{\beta}_1 = \Delta \hat{y} / \Delta x$    (2.24)
Shows the change in yhat when x changes, or alternatively,
$\Delta \hat{y} = \hat{\beta}_1 \Delta x$    (2.25)
The change in x can be multiplied by B1hat to estimate the change in y.
2.2 Deriving OLS Estimates

Notes:
1) As the mathematics required to estimate OLS is difficult with more than a few data points, econometrics software (like Shazam) must be used.
2) A successful regression cannot establish causality; it can only comment on positive or negative relations between x and y
3) We often use the terminology "regress y on x" to estimate y=f(x)

2.3 Properties of OLS on Any Sample of Data
Review
-Once again, simple algebraic properties are needed in order to build OLS's foundation
-OLS estimates (B0hat and B1hat) can be used to calculate fitted values (yhat)
-the residual (uhat) is the difference between the actual y values and the estimated y values (yhat)
2.3 Properties of OLS
[Figure: residual uhat = y - yhat for one observation; here yhat underpredicts y]
1) From the FOC of OLS (2.14), the sum of all residuals is zero:
$\sum_{i=1}^{n} \hat{u}_i = 0$    (2.30)
2) Also from the FOC of OLS (2.15), the sample covariance between the regressors and the OLS residuals is zero:
$\sum_{i=1}^{n} x_i \hat{u}_i = 0$    (2.31)
From (2.30), the left side of (2.31) is proportional to the required sample covariance
3) The point (xbar, ybar) is always on the OLS regression line (from 2.16):
$\bar{y} = \hat{\beta}_0 + \hat{\beta}_1 \bar{x}$    (2.16)
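The three properties above hold on any sample, which makes them easy to confirm numerically. The following sketch uses a hypothetical five-observation data set and checks each property up to floating-point rounding.

```python
# Hypothetical sample of n = 5 observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# OLS estimates from (2.17) and (2.19).
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
b1_hat = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
b0_hat = y_bar - b1_hat * x_bar

# Fitted values (2.20) and residuals (2.21).
y_hat = [b0_hat + b1_hat * xi for xi in x]
u_hat = [yi - fi for yi, fi in zip(y, y_hat)]

sum_resid = sum(u_hat)                                    # property 1, (2.30)
sum_x_resid = sum(xi * ui for xi, ui in zip(x, u_hat))    # property 2, (2.31)
line_at_x_bar = b0_hat + b1_hat * x_bar                   # property 3, (2.16)

print(sum_resid, sum_x_resid, line_at_x_bar, y_bar)
```

The first two sums come out as zero up to rounding error, and the regression line evaluated at xbar returns ybar.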
Further Algebraic Gymnastics:

1) From (2.30) we know that the sample average of the fitted y values equals the sample average of the actual y values:
$\bar{\hat{y}} = \bar{y}$

Further Algebraic Gymnastics:
2) (2.30) and (2.31) combine to prove that the covariance between yhat and uhat is zero
Therefore OLS breaks down yi into two uncorrelated parts, a fitted value and a residual:
$y_i = \hat{y}_i + \hat{u}_i$    (2.32)
2.3 Sum of Squares

From the idea of fitted and residual components, we can calculate the TOTAL SUM OF SQUARES (SST), the EXPLAINED SUM OF SQUARES (SSE) and the RESIDUAL SUM OF SQUARES (SSR)
$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$    (2.33)
$SSE = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$    (2.34)
$SSR = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} \hat{u}_i^2$    (2.35)
2.3 Sum of Squares

SST measures the sample variation in y.
SSE measures the sample variation in yhat (the fitted component).
SSR measures the sample variation in uhat (the residual component).
These relate to each other as follows:
$SST = SSE + SSR$    (2.36)
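The decomposition (2.36) can be confirmed on a small example by computing the three sums from their definitions (2.33) to (2.35). The data set is hypothetical, and the variable names are chosen here for illustration.

```python
# Hypothetical sample of n = 5 observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# OLS estimates from (2.17) and (2.19), then fitted values (2.20).
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
b1_hat = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
b0_hat = y_bar - b1_hat * x_bar
y_hat = [b0_hat + b1_hat * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)                    # total, (2.33)
sse = sum((fi - y_bar) ** 2 for fi in y_hat)                # explained, (2.34)
ssr_value = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat)) # residual, (2.35)

print(sst, sse, ssr_value, sse + ssr_value)
```

Because some software swaps the SSE and SSR labels, computing the sums from the base formulas like this is a useful cross-check.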
2.3 Proof of Squares

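The proof of (2.36) is as follows:
$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} [(y_i - \hat{y}_i) + (\hat{y}_i - \bar{y})]^2$
$= \sum_{i=1}^{n} [\hat{u}_i + (\hat{y}_i - \bar{y})]^2$
$= \sum_{i=1}^{n} \hat{u}_i^2 + 2 \sum_{i=1}^{n} \hat{u}_i (\hat{y}_i - \bar{y}) + \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$
$= SSR + 2 \sum_{i=1}^{n} \hat{u}_i (\hat{y}_i - \bar{y}) + SSE$
Since we showed that the sample covariance between residuals and fitted values is zero,
$2 \sum_{i=1}^{n} \hat{u}_i (\hat{y}_i - \bar{y}) = 0$    (2.37)
and therefore SST = SSE + SSR.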

2.3 Properties of OLS on Any Sample of Data
Notes
-An in-depth analysis of sample and inter-variable covariance is available in section C for individual study
-SST, SSE and SSR have differing interpretations and labels in different econometric software. As such, it is always important to look up the base formula
2.3 Goodness of Fit

-Once we've run a regression, the question arises, "How well does x explain y?"
-We can't answer that yet, but we can ask, "How well does the OLS regression line fit the data?"
-To measure this, we use R2, the COEFFICIENT OF DETERMINATION:
$R^2 = \frac{SSE}{SST} = 1 - \frac{SSR}{SST}$    (2.38)
2.3 Goodness of Fit

-R2 is the ratio of the explained variation compared to the total variation
-the fraction of the sample variation in y that is explained by x
-R2 always lies between zero and 1
-if R2=1, all actual points lie on the regression line (usually an error)
-if R2 is close to 0, the regression explains very little; OLS is a poor fit
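The two extremes of R2 can be illustrated with a short sketch based on the second form of (2.38), 1 - SSR/SST. The function name `r_squared` and all numbers are hypothetical; the fitted values in the last call are assumed to come from an OLS regression of y on x = 1, ..., 5.

```python
def r_squared(y, y_hat):
    """Coefficient of determination computed as 1 - SSR/SST, as in (2.38)."""
    y_bar = sum(y) / len(y)
    sst = sum((yi - y_bar) ** 2 for yi in y)
    if sst == 0:
        raise ValueError("y has no sample variation, so R2 is undefined")
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return 1 - ssr / sst


# Hypothetical sample of y values with sample mean 4.
y = [2, 4, 5, 4, 5]

perfect_fit = r_squared(y, y)                       # every point lies on the line
mean_only_fit = r_squared(y, [4, 4, 4, 4, 4])       # fitted values all equal ybar
ols_fit = r_squared(y, [2.8, 3.4, 4.0, 4.6, 5.2])   # assumed OLS fitted values

print(perfect_fit, mean_only_fit, ols_fit)
```

The perfect fit gives the upper bound, the mean-only fit gives the lower bound, and an OLS fit with an intercept falls in between.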

2.3 Properties of OLS on Any Sample of Data
Notes
-A low R2 is not uncommon in the social sciences, especially in cross-sectional analysis
-econometric regressions should not be judged too harshly because of a low R2
-for example, if R2=0.12, that means 12% of the variation is explained, which is better than the 0% before the regression
