
# Part 1

## Cross Sectional Data

- Simple Linear Regression Model: Chapter 2
- Multiple Regression Analysis: Chapters 3, 4, and 6
- Dummy Variables: Chapter 7

Note: Appendices A, B, and C are

## 1. The Simple Regression Model

2.1 Definition of the Simple Regression Model
2.2 Deriving the Ordinary Least Squares Estimates
2.3 Properties of OLS on Any Sample of Data
2.4 Units of Measurement and Functional Form
2.5 Expected Values and Variances of the OLS Estimators
2.6 Regression through the Origin

## 2.1 The Simple Regression Model

Economics is built upon assumptions:

- assume people are utility maximizers
- assume perfect information
- assume we have a can opener

The Simple Regression Model is also based on assumptions:

- more assumptions are required for more complicated models and more detailed analysis

## 2.1 The Simple Regression Model

Recall the SIMPLE LINEAR REGRESSION MODEL:

$$y = \beta_0 + \beta_1 x + u \qquad (2.1)$$

- relates two variables (x and y)
- also called the two-variable linear regression model or bivariate linear regression model
- y is the DEPENDENT or EXPLAINED variable
- x is the INDEPENDENT or EXPLANATORY variable
- y is a function of x

## 2.1 The Simple Regression Model

Recall the SIMPLE LINEAR REGRESSION MODEL:

$$y = \beta_0 + \beta_1 x + u \qquad (2.1)$$

u is the ERROR TERM or DISTURBANCE variable

- u takes into account all factors other than x that affect y
- u accounts for all unobserved impacts on y

## 2.1 The Simple Regression Model

Example of the SIMPLE LINEAR REGRESSION MODEL:

$$\text{taste} = \beta_0 + \beta_1 \text{cookingtime} + u$$

- taste depends on cooking time
- taste is explained by cooking time
- taste is a function of cooking time
- u accounts for other factors affecting taste (cooking skill, ingredients available, random luck, differing taste buds, etc.)

## 2.1 The Simple Regression Model

The SRM shows how y changes:

$$\Delta y = \beta_1 \Delta x \quad \text{if } \Delta u = 0 \qquad (2.2)$$

- for example, if $\beta_1 = 3$, a 2 unit increase in x would cause a 6 unit change in y ($2 \times 3 = 6$)
- $\beta_1$ is the SLOPE PARAMETER
- $\beta_0$ is the INTERCEPT PARAMETER or CONSTANT TERM
  - not always useful in analysis

$$y = \beta_0 + \beta_1 x + u \qquad (2.1)$$

- note that this equation implies CONSTANT returns: the first x has the same impact on y as the 100th x
- to avoid this we can include powers of x or change functional forms
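
To make (2.1) and (2.2) concrete, here is a minimal simulation sketch in Python (not the course's Shazam software); the parameter values and sample size are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameter values for the population model (2.1): y = b0 + b1*x + u
b0, b1 = 1.0, 3.0

x = rng.uniform(0, 10, size=100)  # explanatory variable
u = rng.normal(0, 1, size=100)    # error term: mean zero, unrelated to x
y = b0 + b1 * x + u               # dependent variable generated by (2.1)

# Holding u fixed (delta u = 0), a 2 unit increase in x changes y by
# b1 * 2 = 6, exactly as equation (2.2) says: delta y = b1 * delta x
print(b1 * 2)  # 6.0
```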

## 2.1 The Simple Regression Model

- in order to achieve a ceteris paribus analysis of x's effect on y, we need assumptions about u's relationship with x
- in order to simplify our assumptions, we first assume that the average of u in the population is zero:

$$E(u) = 0 \qquad (2.5)$$

- if $\beta_0$ is included in the equation, it can always be modified to make (2.5) true
  - i.e., if $E(u) > 0$, simply increase $\beta_0$
## 2.1 x, u and Dependence

- we now need to assume that x and u are unrelated
- if x and u are merely uncorrelated, u may still be correlated with functions of x such as $x^2$
- we therefore need a stronger assumption:

$$E(u \mid x) = E(u) = 0 \qquad (2.6)$$

- the average value of u does not depend on x
- the second equality comes from (2.5)
- (2.6) is called the ZERO CONDITIONAL MEAN assumption

## 2.1 Example

Take the regression:

$$\text{Papermark} = \beta_0 + \beta_1 \text{Paperquality} + u$$

- where u takes into account other factors of the applied paper, in particular length exceeding 10 pages
- assumption (2.6) requires that a paper's length does not depend on how good it is

## 2.1 The Simple Regression Model

Conditional expectations of (2.1) and (2.6) give us:

$$E(y \mid x) = \beta_0 + \beta_1 x \qquad (2.8)$$

- (2.8) is called the POPULATION REGRESSION FUNCTION (PRF)
- a one unit increase in x increases the expected value of y by $\beta_1$
- $\beta_0 + \beta_1 x$ is the systematic (explained) part of y
- u is the unsystematic (unexplained) part of y
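
Spelling out the conditional expectation step from (2.1) and (2.6) to (2.8), added here for completeness:

$$E(y \mid x) = E(\beta_0 + \beta_1 x + u \mid x) = \beta_0 + \beta_1 x + E(u \mid x) = \beta_0 + \beta_1 x$$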

## 2.2 Deriving the OLS Estimates

In order to estimate $\beta_0$ and $\beta_1$, we need sample data

- let $\{(x_i, y_i) : i = 1, \ldots, n\}$ be a sample of n observations from the population

$$y_i = \beta_0 + \beta_1 x_i + u_i \qquad (2.9)$$

- here $y_i$ is explained by $x_i$ with error term $u_i$
- $y_5$ indicates the observation of y from the 5th data point
- this regression plots a best fit line through our data points

## 2.2 Deriving the OLS Estimates

These OLS estimates create a straight line going through the middle of the data points.

## 2.2 Deriving OLS Estimates

In order to derive OLS, we first need assumptions. We must first assume that u has zero expected value:

$$E(u) = 0 \qquad (2.10)$$

Secondly, we must assume that the covariance between x and u is zero:

$$\text{Cov}(x, u) = E(xu) = 0 \qquad (2.11)$$

(2.10) and (2.11) can also be rewritten in terms of x and y as:

$$E(y - \beta_0 - \beta_1 x) = 0 \qquad (2.12)$$

$$E[x(y - \beta_0 - \beta_1 x)] = 0 \qquad (2.13)$$

## 2.2 Deriving OLS Estimates

- (2.12) and (2.13) imply restrictions on the joint probability distribution of the POPULATION
- given SAMPLE data, these equations become:

$$\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0 \qquad (2.14)$$

$$\frac{1}{n} \sum_{i=1}^{n} x_i (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0 \qquad (2.15)$$

- notice that the hats above $\beta_0$ and $\beta_1$ indicate we are now dealing with estimates
- this is an example of method of moments estimation (see Appendix C for a discussion)

## 2.2 Deriving OLS Estimates

Using summation properties, (2.14) simplifies to:

$$\bar{y} = \hat{\beta}_0 + \hat{\beta}_1 \bar{x} \qquad (2.16)$$

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \qquad (2.17)$$

which is our OLS estimate for the intercept

- therefore, given data and an estimate of the slope, the estimated intercept can be determined

## 2.2 Deriving OLS Estimates

By cancelling out 1/n and combining (2.17) and (2.15) we get:

$$\sum_{i=1}^{n} x_i [y_i - (\bar{y} - \hat{\beta}_1 \bar{x}) - \hat{\beta}_1 x_i] = 0$$

$$\sum_{i=1}^{n} x_i (y_i - \bar{y}) = \hat{\beta}_1 \sum_{i=1}^{n} x_i (x_i - \bar{x})$$

## 2.2 Deriving OLS Estimates

Recall the algebraic properties:

$$\sum_{i=1}^{n} x_i (x_i - \bar{x}) = \sum_{i=1}^{n} (x_i - \bar{x})^2$$

and

$$\sum_{i=1}^{n} x_i (y_i - \bar{y}) = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$

## 2.2 Deriving OLS Estimates

We can make the simple assumption that:

$$\sum_{i=1}^{n} (x_i - \bar{x})^2 > 0 \qquad (2.18)$$

which essentially states that not all x's are the same

- i.e., you didn't do a survey where one question is "are you alive?"
- this is essentially the key assumption needed to estimate $\hat{\beta}_1$

## 2.2 Deriving OLS Estimates

All this gives us the OLS estimate for $\beta_1$:

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \qquad (2.19)$$

Note that assumption (2.18) ensures the denominator is not zero.

- also note that if x and y are positively (negatively) correlated, $\hat{\beta}_1$ will be positive (negative)
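
A minimal sketch of (2.17) and (2.19) in Python (the slides use Shazam; numpy here is just for illustration), with made-up sample data:

```python
import numpy as np

# Made-up sample data for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

xbar, ybar = x.mean(), y.mean()

# Slope estimate (2.19): sample covariance of x and y over sample variance of x
b1_hat = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)

# Intercept estimate (2.17)
b0_hat = ybar - b1_hat * xbar

print(b0_hat, b1_hat)  # 0.14 and 1.96 for this made-up data
```

Since x and y move together in this made-up sample, $\hat{\beta}_1$ comes out positive, as the bullet above predicts.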

OLS estimates of $\beta_0$ and $\beta_1$ give us a FITTED value for y when $x = x_i$:

$$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i \qquad (2.20)$$

- there is one fitted or predicted value of y for each observation of x
- the predicted y's can be greater than, less than, or (rarely) equal to the actual y's

## 2.2 Residuals

The difference between the actual y values and the estimates is the ESTIMATED error, or residuals:

$$\hat{u}_i = y_i - \hat{y}_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i \qquad (2.21)$$

- again, there is one residual for each observation
- these residuals ARE NOT the same as the actual error terms

## 2.2 Residuals

The SUM OF SQUARED RESIDUALS can be expressed as:

$$\sum_{i=1}^{n} \hat{u}_i^2 = \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2 \qquad (2.22)$$

- if $\hat{\beta}_0$ and $\hat{\beta}_1$ are chosen to minimize (2.22), then (2.14) and (2.15) are our FIRST ORDER CONDITIONS (FOCs), and we derive the same OLS estimates as above, (2.17) and (2.19)
- the term OLS comes from the fact that the sum of the squared residuals is minimized
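
Writing out the first order conditions (a standard step, added here for completeness): differentiating (2.22) with respect to each estimate and setting the result to zero gives

$$\frac{\partial}{\partial \hat{\beta}_0} \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2 = -2 \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0$$

$$\frac{\partial}{\partial \hat{\beta}_1} \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2 = -2 \sum_{i=1}^{n} x_i (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0$$

which, after dividing through by $-2n$, are exactly (2.14) and (2.15).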

## Why minimize the sum of the squared residuals?

- why not minimize the residuals themselves?
- why not minimize the cube of the residuals?
- not all minimization techniques can be expressed as formulas
- OLS has the advantage of delivering unbiasedness, consistency, and other important statistical properties

REGRESSION LINE:

$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x \qquad (2.23)$$

- note that as this is an equation of a line, there are no subscripts
- $\hat{\beta}_0$ is the predicted value of y when x = 0
  - not always a valid value
- (2.23) is also called the SAMPLE REGRESSION FUNCTION (SRF)
- different data sets will estimate different $\hat{\beta}$'s

## 2.2 Deriving OLS Estimates

The slope estimate:

$$\hat{\beta}_1 = \Delta \hat{y} / \Delta x \qquad (2.24)$$

alternatively,

$$\Delta \hat{y} = \hat{\beta}_1 \Delta x \qquad (2.25)$$

The change in x can be multiplied by $\hat{\beta}_1$ to estimate the change in y.

Notes:

1) As the mathematics required to estimate OLS is difficult with more than a few data points, econometrics software (like Shazam) must be used.
2) A successful regression cannot establish causality; it can only comment on positive or negative relations between x and y.
3) We often use the terminology "regress y on x" to estimate y = f(x).

## Data Review

- once again, simple algebraic properties are needed in order to build OLS's foundation
- the OLS estimates ($\hat{\beta}_0$ and $\hat{\beta}_1$) can be used to calculate fitted values ($\hat{y}$)
- the residual ($\hat{u}$) is the difference between the actual y values and the estimated y values ($\hat{y}$)

## 2.3 Properties of OLS

[Figure: scatter of data with the fitted line; for a point above the line, $\hat{u} = y - \hat{y} > 0$, so here $\hat{y}$ underpredicts y]

1) From the FOC of OLS (2.14), the sum of all residuals is zero:

$$\sum_{i=1}^{n} \hat{u}_i = 0 \qquad (2.30)$$

2) Also from the FOC of OLS (2.15), the sample covariance between the regressors and the OLS residuals is zero:

$$\sum_{i=1}^{n} x_i \hat{u}_i = 0 \qquad (2.31)$$

Given (2.30), the left side of (2.31) is proportional to the sample covariance between $x_i$ and $\hat{u}_i$.

3) The point $(\bar{x}, \bar{y})$ is always on the OLS regression line (from (2.16)):

$$\bar{y} = \hat{\beta}_0 + \hat{\beta}_1 \bar{x} \qquad (2.16)$$

## 2.3 Properties of OLS

Further Algebraic Gymnastics:

1) From (2.30) we know that the sample average of the fitted y values equals the sample average of the actual y values:

$$\bar{\hat{y}} = \bar{y}$$
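
These three algebraic properties can be checked numerically; a minimal sketch, reusing the made-up data and the formulas (2.17) and (2.19) from above:

```python
import numpy as np

# Made-up data; estimates via (2.17) and (2.19)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

y_hat = b0 + b1 * x  # fitted values (2.20)
u_hat = y - y_hat    # residuals (2.21)

print(np.isclose(u_hat.sum(), 0))          # (2.30): residuals sum to zero
print(np.isclose((x * u_hat).sum(), 0))    # (2.31): regressor-residual covariance is zero
print(np.isclose(y_hat.mean(), y.mean()))  # fitted values have the same sample mean as y
```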

## 2.3 Properties of OLS

Further Algebraic Gymnastics:

2) (2.30) and (2.31) combine to prove that the covariance between $\hat{y}$ and $\hat{u}$ is zero.

Therefore OLS breaks down $y_i$ into two uncorrelated parts, a fitted value and a residual:

$$y_i = \hat{y}_i + \hat{u}_i \qquad (2.32)$$

## 2.3 Sum of Squares

From the idea of fitted and residual components, we can calculate the TOTAL SUM OF SQUARES (SST), the EXPLAINED SUM OF SQUARES (SSE), and the RESIDUAL SUM OF SQUARES (SSR):

$$SST \equiv \sum_{i=1}^{n} (y_i - \bar{y})^2 \qquad (2.33)$$

$$SSE \equiv \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 \qquad (2.34)$$

$$SSR \equiv \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} \hat{u}_i^2 \qquad (2.35)$$

## 2.3 Sum of Squares

SST measures the total sample variation in y.
SSE measures the sample variation in $\hat{y}$ (the fitted component).
SSR measures the sample variation in $\hat{u}$ (the residual component).
These relate to each other as follows:

$$SST = SSE + SSR \qquad (2.36)$$
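
The identity (2.36) can be verified numerically; a minimal sketch, continuing with the same made-up data and estimates as before:

```python
import numpy as np

# Same made-up data; estimates via (2.17) and (2.19)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

sst = np.sum((y - y.mean()) ** 2)      # total sum of squares (2.33)
sse = np.sum((y_hat - y.mean()) ** 2)  # explained sum of squares (2.34)
ssr = np.sum((y - y_hat) ** 2)         # residual sum of squares (2.35)

print(np.isclose(sst, sse + ssr))  # the identity (2.36) holds
```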

## 2.3 Proof of Squares

The proof of (2.36) is as follows:

$$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} [(y_i - \hat{y}_i) + (\hat{y}_i - \bar{y})]^2$$

$$= \sum_{i=1}^{n} [\hat{u}_i + (\hat{y}_i - \bar{y})]^2$$

$$= \sum_{i=1}^{n} \hat{u}_i^2 + 2 \sum_{i=1}^{n} \hat{u}_i (\hat{y}_i - \bar{y}) + \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$$

$$= SSR + 2 \sum_{i=1}^{n} \hat{u}_i (\hat{y}_i - \bar{y}) + SSE$$

Since we have shown that the covariance between residuals and fitted values is zero,

$$2 \sum_{i=1}^{n} \hat{u}_i (\hat{y}_i - \bar{y}) = 0 \qquad (2.37)$$

## Data Notes

- an in-depth analysis of sample and inter-variable covariance is available in Appendix C for individual study
- SST, SSE, and SSR have differing interpretations and labels across econometric software; as such, it is always important to look up the base formula

## 2.3 Goodness of Fit

- once we've run a regression, the question is begged: How well does x explain y? How well does the OLS regression line fit the data?
- to measure this, we use $R^2$, the COEFFICIENT OF DETERMINATION:

$$R^2 \equiv \frac{SSE}{SST} = 1 - \frac{SSR}{SST} \qquad (2.38)$$

## 2.3 Goodness of Fit

- $R^2$ is the ratio of the explained variation to the total variation
  - the fraction of the sample variation in y that is explained by x
- $R^2$ always lies between zero and 1
- if $R^2 = 1$, all actual points lie on the regression line (usually an error)
- if $R^2 \approx 0$, the regression explains very little; OLS is a poor fit
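
A minimal sketch of (2.38), reusing the made-up data and sums of squares from the earlier examples:

```python
import numpy as np

# R-squared via (2.38), with the same made-up data and estimates as above
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

sst = np.sum((y - y.mean()) ** 2)  # total sum of squares (2.33)
ssr = np.sum((y - y_hat) ** 2)     # residual sum of squares (2.35)

r2 = 1 - ssr / sst  # equivalently SSE/SST
print(r2)           # near 1 here, since the made-up data is almost exactly linear
```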

## Data Notes

- a low $R^2$ is not uncommon in the social sciences, especially in cross-sectional analysis
- econometric regressions should not be judged too harshly for a low $R^2$
  - for example, if $R^2 = 0.12$, that means 12% of the variation in y is explained, which is better than the 0% explained before the regression