
BFI Program

ECONOMETRICS I
National Economics University


2020
MA Hai Duong
INTRODUCTORY
ECONOMETRICS
SIMPLE AND MULTIPLE
REGRESSION ANALYSIS

RECAP.
In the second lecture we reviewed the concepts of:
1. a random variable and its probability density function (pdf);
2. the mean and the variance of a random variable;
3. the joint pdf of two random variables;
4. the covariance and correlation between two random variables;
5. the mean and the variance of a linear combination of random variables;
6. how and why diversification reduces risk, and how good estimators in econometrics reduce
risk in a similar fashion;
7. conditional distribution of one random variable given another;
8. the conditional expectation function as the fundamental object for modelling.
LECTURE OUTLINE
Wooldridge: Chapter 2 - 2.3, 3.1-3.2, Appendix D-1, D-2, D-4
 Explaining y using x with a model
 Simple linear regression (textbook reference Chapter 2, 2-1)
 The OLS estimator (textbook reference, Ch 2, 2-2,2-3)
 Simple linear regression in matrix form
 Geometric interpretation of least squares (not in the textbook)
 Stretching our imagination to multiple regression - Appendix E of the textbook, section E-1

WHAT IS A MODEL?
To study the relationship between random variables, we use a model.
A complete model for a random variable is a model of its probability distribution.
 For example, a possible model for the heights of adult men is that they are normally distributed, and since a normal distribution is fully described by its mean and variance, we only need to estimate the mean and variance from a sample of observations on adult men to have an estimated model.
But when studying two or more random variables, modelling the joint distribution is
difficult and requires a lot of data.
So, we model the conditional distribution of y given x.
In particular, we provide a model for E(y|x).
TERMINOLOGY AND
NOTATION
Terminology: There are many different ways people refer to the target variable y that we
want to explain using variable x. These include:

Dependent Variable Independent Variable


Explained Variable Explanatory Variable
Response Variable Control Variable
Predicted Variable Predictor Variable
Regressand Regressor

“Regress y on x” means “Estimate the model y = β0 + β1x + u using the OLS method”


MODELLING THE MEAN

THE SIMPLE LINEAR
REGRESSION MODEL

THE SIMPLE LINEAR
REGRESSION MODEL
The following equation specifies the conditional mean of y given x:
E(y|x) = β0 + β1x, equivalently y = β0 + β1x + u with E(u|x) = 0
It is an incomplete model because it does not specify the probability distribution of y
conditional on x.
If we add the assumptions that Var(u|x) = σ² and that the conditional distribution of u given x is
normal, then we have a complete model, which is called the “classical linear model”
and was shown in the picture on the previous slide.
In summary: we make the assumption that, in the big scheme of things, data are
generated by this model, and we want to use observed data to learn the unknowns β0, β1
and σ² in order to predict y using x.

THE OLS ESTIMATOR
Let’s leave the theory universe and go back to the data world.
We have a random sample of n observations on two variables x and y in two columns of
a spreadsheet.
Let’s denote them by (x1, y1), (x2, y2), …, (xn, yn).
We want to see if these two columns of numbers are related to each other.
In the simple case that there is only one x, we look at the scatter plot, which is a very
informative visual tool and gives us a good idea of the correlation between y and x
(body fat and weight).
Unfortunately, in some business and economic applications the signal is too weak to
be detected by data visualization alone (asset return and size).
EXERCISE AT HOME
Proof:

And

THE OLS ESTIMATOR
How can we determine a straight line that fits our data best?
We find β̂0 and β̂1 that minimize the sum of squared residuals:
Σᵢ (yᵢ − β̂0 − β̂1xᵢ)²
The first order conditions are:
Σᵢ (yᵢ − β̂0 − β̂1xᵢ) = 0 and Σᵢ xᵢ(yᵢ − β̂0 − β̂1xᵢ) = 0
Two equations and two unknowns; we can solve to get β̂0 and β̂1.

THE OLS ESTIMATOR
The set of equations above defines the OLS estimators.
After some algebra (see page 26 of the textbook, 6th ed.):
β̂1 = Σᵢ (xᵢ − x̄)(yᵢ − ȳ) / Σᵢ (xᵢ − x̄)² and β̂0 = ȳ − β̂1 x̄

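These closed-form formulas can be checked numerically. A minimal sketch in plain Python, with made-up data (not from the textbook):

```python
# OLS slope and intercept for simple regression, computed directly from the
# closed-form formulas:
#   b1 = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2),  b0 = ybar - b1 * xbar
# The data below are illustrative only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n

b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
     sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar

print(b0, b1)
```

For this toy sample the slope comes out close to 2 and the intercept close to 0, matching the pattern built into the fabricated data.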
THE OLS ESTIMATOR
Several important things to note:
1.
2. From the formula for β̂0 we see that ȳ = β̂0 + β̂1x̄, which shows that the point (x̄, ȳ) lies on the regression
line.
3. Both β̂0 and β̂1 are functions of the sample observations only. They are estimators, i.e.,
formulae that can be computed from the sample. If we collect a different sample
from the same population, we get different estimates. So, these estimators are
random variables.

ESTIMATORS WE HAVE SEEN
SO FAR
Population parameter Sample statistic (or estimator)
Pop. mean μ Sample average x̄
Pop. variance σ² Sample variance s²
Pop. st. dev. σ Sample st. dev. s
Pop. covariance σxy Sample covariance sxy
Pop. corr. coef. ρxy Sample corr. coef. rxy
Slope and intercept in PRF: β1 and β0 OLS estimators: β̂1 and β̂0


SIMPLE LINEAR REGRESSION
IN MATRIX FORM
For each of the n observations our model implies
yᵢ = β0 + β1xᵢ + uᵢ, where E(uᵢ|xᵢ) = 0
We stack all of these on top of each other for i = 1 to n:
y = β0·1 + β1x + u
where y = (y1, …, yn)′, x = (x1, …, xn)′ and u = (u1, …, un)′ are n × 1 vectors, and 1 is the n × 1 vector of ones.
SIMPLE LINEAR REGRESSION
IN MATRIX FORM
We define the n × 2 matrix of regressors X = [1 x]
and the parameter vector β = (β0, β1)′
These notations allow us to write the model simply as
y = Xβ + u
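As a sketch of this notation, the regressor matrix X for the model y = Xβ + u stacks a column of ones next to the column of x values (numpy, with made-up x values):

```python
import numpy as np

# Illustrative x values for n = 4 observations (made-up numbers)
x = np.array([4.0, 1.0, 2.0, 3.0])

# X = [1  x]: a column of ones for the intercept next to the regressor column
X = np.column_stack([np.ones_like(x), x])

print(X.shape)  # (4, 2): n rows, one column per parameter (beta0, beta1)
```

The first column multiplies β0 and the second multiplies β1, so Xβ reproduces β0 + β1xᵢ row by row.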

MULTIPLE REGRESSION
(MATRIX FORM)
In multiple regression we have k explanatory variables plus the intercept:
y = Xβ + u
Where:
X is the n × (k+1) matrix whose first column is the column of ones, and β = (β0, β1, …, βk)′
Remember that y and X are observable; β and u are unknown and unobservable.

THE POWER OF ABSTRACTION
In the real world we have goals such as:
 We want to determine the added value of education to wage after controlling for
experience and IQ, based on a random sample of 1000 observations
 We want to predict the value of a house in New York City given characteristics such as
land area, number of bedrooms, number of bathrooms, proximity to public
transport, etc., based on data on the last 100 houses sold in the city
 We are interested in predicting the final exam score given students’ attention and
unobserved factors that affect exam performance (student abilities: stenography,
fast mental calculation, ...), based on a random sample of 300 students
In all cases we want to get a good estimate of β in the model y = Xβ + u based on our sample of
observations.
We have now turned these very diverse questions into a single mathematical problem.
THE OLS SOLUTION
OLS finds the vector in the column space of X which is closest to y. Let’s see what
this means.
To visualize, we are limited to 3 dimensions at most.
Assume we want to explain house prices with the number of bedrooms.
We have data on 3 houses, with 4, 1 and 1 bedrooms, which sold for 10, 4 and 6
(× $100,000).
We start looking for a good β in
(10, 4, 6)′ = β0 (1, 1, 1)′ + β1 (4, 1, 1)′ + u

VECTORS AND VECTOR
SPACES
An n-dimensional vector is an
arrow from the origin to the
point in ℝⁿ with coordinates given
by its elements
Example:

VECTORS AND VECTOR
SPACES
The length of a vector a is the square
root of the sum of squares of its
coordinates: ‖a‖ = √(a′a)
Example: the vector on the previous
slide
For a and b of the same dimension, if
a′b = 0, then a and b are perpendicular
(orthogonal) to each other
Example:
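Both definitions are easy to check numerically; a small sketch with example vectors of my own choosing (not the ones pictured on the slides):

```python
import numpy as np

# Example vectors (made up for illustration)
a = np.array([3.0, 4.0])
b = np.array([-4.0, 3.0])

# Length: square root of the sum of squared coordinates, sqrt(a'a)
length_a = float(np.sqrt(a @ a))  # 5.0 for the (3, 4) vector

# Orthogonality: a'b = 0 means a and b are perpendicular
inner = float(a @ b)              # 3*(-4) + 4*3 = 0
print(length_a, inner)
```

Here a′b = 0 exactly, so the two arrows meet at a right angle.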

VECTORS AND VECTOR
SPACES
For any constant c, the vector ca is on
the line that passes through a and the origin
This line is called the “space
spanned by a”
Example:

VECTORS AND VECTOR
SPACES
Consider a matrix X that has two columns that are not a multiple of each other
Geometrically, these vectors form a two-dimensional plane that contains all linear
combinations of these two vectors, i.e., the set of all Xb for all b. This plane is called the
column space of X (denoted by col(X))
If the number of rows of X is also two, then the column space of X is the entire ℝ², that is,
any two-dimensional vector can be written as a linear combination of the columns of X
Consider the house price example, but only with the first two houses:
(10, 4)′ = β0 (1, 1)′ + β1 (4, 1)′ + u
The price vector can be perfectly explained with a linear combination of the columns
of X, i.e., with u equal to zero
GRAPHICAL DESCRIPTION
[Six figure slides; images not preserved in this text version: step-by-step illustration of linear combinations of the columns of X and the price vector y]
OLS ESTIMATOR VECTOR
This shows that with only two observations, we would suggest: β̂0 = 2 and β̂1 = 2, i.e. price = 2 + 2 × bedrooms.
While this fits the first two houses perfectly, when we add the third house (1 bedroom, sold for
6 hundred thousand) we make an error of 2 hundred thousand dollars.
With 3 observations, the 3-dimensional price vector y no longer lies in the space of linear
combinations of the columns of X.
The closest point in the column space of X to y is the point that makes a right angle between the
column space of X and the residual vector y − Xβ̂ (shown on the next slide).

GRAPHICAL DESCRIPTION
[Figure; image not preserved: the fitted vector ŷ = Xβ̂ in the column space of X, the residual vector û = y − ŷ, and the right angle between them]
GRAPHICAL DESCRIPTION
For more explanation, watch video on
https://youtu.be/ZKJtB5WUgok

SOLUTION
Hence, the shortest û is the one that is perpendicular to the columns of X, i.e.

X′(y − Xβ̂) = 0

Since û = y − Xβ̂, we get:

X′X β̂ = X′y

→ β̂ = (X′X)⁻¹ X′y

For X′X to be invertible, the columns of X must be linearly independent.

The top box is more important than the formula for the OLS estimator in the second
box. The geometry of OLS is summarized in the first box: the OLS residuals are
orthogonal to the columns of X.
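A minimal numpy check of this formula with the three-house data from the slides (prices 10, 4, 6; bedrooms 4, 1, 1):

```python
import numpy as np

y = np.array([10.0, 4.0, 6.0])                     # prices, in $100,000s
X = np.column_stack([np.ones(3),                   # intercept column of ones
                     np.array([4.0, 1.0, 1.0])])   # bedrooms

# OLS estimator: solve the normal equations X'X beta = X'y,
# i.e. beta_hat = (X'X)^(-1) X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Geometry: the residual vector is orthogonal to every column of X
u_hat = y - X @ beta_hat
print(beta_hat, X.T @ u_hat)  # X'u_hat is (numerically) the zero vector
```

With all three houses, β̂ = (10/3, 5/3)′ rather than the (2, 2)′ that fitted the first two houses exactly, and the residuals (0, −1, 1) are orthogonal to both columns of X.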
SOLUTION
The vector of OLS predicted values ŷ = Xβ̂ = X(X′X)⁻¹X′y is the orthogonal projection of y onto the column
space of X.

By definition, y = ŷ + û.

Since ŷ and û form a right-angled triangle with y, we know (remember the length of a vector):
‖y‖² = ‖ŷ‖² + ‖û‖², or y′y = ŷ′ŷ + û′û
(*)

SOLUTION
Since the residuals sum to zero (when X contains the column of ones), we also have that the average of the ŷᵢ equals ȳ.

Subtracting nȳ² from both sides of (*) and rearranging, we get

Σᵢ (yᵢ − ȳ)² = Σᵢ (ŷᵢ − ȳ)² + Σᵢ ûᵢ²

or SST = SSE + SSR

This leads to the definition of the coefficient of determination
R² = SSE/SST = 1 − SSR/SST
which is a measure of goodness of fit
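A sketch of the SST = SSE + SSR decomposition and R², reusing the three-house data from the earlier example:

```python
import numpy as np

y = np.array([10.0, 4.0, 6.0])                     # prices, in $100,000s
X = np.column_stack([np.ones(3),                   # intercept column
                     np.array([4.0, 1.0, 1.0])])   # bedrooms

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta_hat
u_hat = y - y_hat

SST = np.sum((y - y.mean()) ** 2)       # total sum of squares
SSE = np.sum((y_hat - y.mean()) ** 2)   # explained sum of squares
SSR = np.sum(u_hat ** 2)                # residual sum of squares

R2 = SSE / SST                          # equivalently 1 - SSR/SST
print(SST, SSE + SSR, R2)
```

For these data SST = SSE + SSR holds exactly (up to floating point), and R² = 150/168 ≈ 0.89, so about 89% of the variation in prices is explained by bedrooms.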

INTERPRETATION OF OLS
ESTIMATES
Consider an extension of the wage equation from lecture 1:
wage = β0 + β1 educ + β2 IQ + u
where IQ is the IQ score (in the population, it has a mean of 100 and a standard deviation of 15).
We are primarily interested in β1, because we want to know the value that education adds to a
person’s wage.
Without IQ in the equation, the coefficient of educ will show how strongly wage and
educ are correlated, but both could be caused by a person’s ability.
By explicitly including IQ in the equation, we obtain a more persuasive estimate of
the causal effect of education, provided that IQ is a good proxy for intelligence.

OLS IN ACTION
[Regression output 1; image not preserved: wage regressed on educ]

OLS IN ACTION
[Regression output 2; image not preserved: wage regressed on educ and IQ]
OLS IN ACTION
If we only estimate a regression of wage on education, we cannot be sure if we are
measuring the effect of education, or if education is acting as a proxy for smartness.
This is important, because if the education system does not add any value other than
separating smart people from not-so-smart ones, society can achieve that much more
cheaply with national IQ tests!

In the second regression, the coefficient of educ now shows that for two people with
the same IQ score, the one with 1 more year of education is expected to earn $42
more.

INTERPRETATION OF OLS
ESTIMATES
Consider for simplicity the model y = β0 + β1x1 + β2x2 + u.
The conditional expectation of y given x1 and x2 (also known as the population regression
function) is
E(y|x1, x2) = β0 + β1x1 + β2x2
The estimated regression (also known as the sample regression function) is
ŷ = β̂0 + β̂1x1 + β̂2x2
The formula:
Δŷ = β̂1Δx1 + β̂2Δx2
(**)
allows us to compute how the predicted y changes when x1 and x2 change by any amount.

INTERPRETATION OF OLS
ESTIMATES
Now, in (**) what happens if we hold x2 fixed and increase x1 by 1 unit, that is, Δx2 = 0 and Δx1 = 1? We
will have Δŷ = β̂1.
In other words, β̂1 is the slope of ŷ with respect to x1 when x2 is held fixed.
We also refer to β̂1 as the estimate of the partial effect of x1 on y holding x2 constant.
Yet another legitimate interpretation is that β̂1 estimates the effect of x1 on y after the
influence of x2 has been removed (or has been controlled for).
Similarly, if Δx1 = 0 and Δx2 = 1, then Δŷ = β̂2.

INTERPRETATION OF OLS
ESTIMATES
Let’s go back to the regression output and interpret the parameters.

The estimated coefficient on educ shows that for two people with the same IQ, the one with one more year of
education is predicted to earn $42.06 more in monthly wages, on average.
Or: every extra year of education increases the predicted wage by $42.06, keeping
IQ constant (or “after controlling for IQ”, or “after removing the effect of IQ”, or
“all else constant”, or “all else equal”, or “ceteris paribus”).
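The “holding IQ constant” reading can be illustrated with a short sketch. The slope on educ ($42.06) is taken from the slide; the intercept and IQ coefficient below are made-up placeholder values:

```python
# Hypothetical sample regression function: only the educ slope (42.06) comes
# from the slide; b0 and b_iq are made-up placeholders for illustration.
b0, b_educ, b_iq = -128.89, 42.06, 5.14

def wage_hat(educ, iq):
    """Predicted monthly wage from the (hypothetical) fitted equation."""
    return b0 + b_educ * educ + b_iq * iq

# Two people with the same IQ; one has one more year of education.
diff = wage_hat(13, 100) - wage_hat(12, 100)
print(round(diff, 2))  # the difference is exactly the educ coefficient
```

Because IQ is held fixed, its term cancels in the difference, leaving only the educ slope, whatever the other coefficients happen to be.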

SUMMARY
Given a sample of n observations, OLS finds the orthogonal projection of the dependent
variable vector y onto the column space of the explanatory variables X
The residual vector is orthogonal to each of the explanatory variable vectors, including the
column of ones for the intercept
This leads to the formula for the OLS estimator in multiple regression
β̂ = (X′X)⁻¹ X′y
It also leads to:
SST = SSE + SSR and R² = SSE/SST = 1 − SSR/SST

Note the difference between the population parameter β and its OLS estimator β̂: β is a
constant and doesn’t change, but β̂ is a function of the sample and its value changes from
sample to sample. Why are these good estimators?
TEXTBOOK EXERCISES –
CHAP 2
Problems 2, 3 (page 53)
Problems 4, 5, 7 (page 54)
Problem 11 (page 55)

EXERCISE 1
Write the meaning of each of the following terms without referring to
the book (or your notes), and compare your definition with the version in the text for each:
a. constant or intercept
b. cross-sectional
c. dependent variable
d. estimated regression equation
e. expected value
f. independent (or explanatory) variable
g. linear
h. multivariate regression model
i. regression analysis
j. residual
k. slope coefficient
l. stochastic error term/random error term
EXERCISE 2
Suppose we built a model of the wage of the ith worker in a particular field as a function of
the work experience, education, and gender of that worker:

Where:
- the wage of the ith worker
- the number of years of experience of the ith worker
- the number of years of schooling of the ith worker
a. What is the real-world meaning of ?
b. What is the real-world meaning of ?
c. Suppose that you had the opportunity to add another variable to the equation. Which of the
following possibilities would seem best? Explain your answer.
(the age of the ith worker, the number of jobs in this field, the average wage in this field, the
number of “employee of the month” awards won by the ith worker)
COMPUTER EXERCISES
C1, C3 (page 56)
C7 (page 57)
C9, C10 (page 58)

THANK YOU FOR YOUR
ATTENDANCE - Q & A
