Introduction to Econometrics

Ekki Syamsulhakim
Undergraduate Program
Department of Economics
Universitas Padjadjaran
Last Week
• Distribution of Random Variables
• Expected Value of Random Variable – part 2
• Joint, Marginal and Conditional Probabilities
• Measures of Association
Today
• Chapter 2 – Wooldridge
– Simple regression model
Simple Regression Model
• The simple regression model can be used to study the relationship between two variables.
• The simple regression model has limitations as a general tool for empirical analysis.
  – Nevertheless, it is sometimes appropriate as an empirical tool.
• Learning how to interpret the simple regression model is good practice for studying multiple regression, which we will do in subsequent chapters.
Simple Regression Model
• Much of applied econometric analysis begins with the following premise: y and x are two variables, representing some population, and we are interested in "explaining y in terms of x," or in "studying how y varies with changes in x."
Simple Regression Model
• In writing down a model that will "explain y in terms of x," we must confront three issues.
  – First, since there is never an exact relationship between two variables, how do we allow for other factors to affect y?
  – Second, what is the functional relationship between y and x?
  – And third, how can we be sure we are capturing a ceteris paribus relationship between y and x (if that is a desired goal)?
Definition of the Simple Regression Model
• We can resolve these ambiguities by writing down an equation relating y to x. A simple equation is

  y = β₀ + β₁x + u        (2.1)

• Equation (2.1), which is assumed to hold in the population of interest, defines the simple linear regression model.
  – It is also called the two-variable linear regression model or bivariate linear regression model because it relates the two variables y and x.
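As a concrete illustration (not from the slides; the parameter values and data-generating choices are hypothetical), the population model (2.1) can be written directly in code: each observation of y is built from β₀, β₁x, and an unobserved draw u.

```python
import numpy as np

rng = np.random.default_rng(42)

beta0, beta1 = 2.0, 0.8          # hypothetical population parameters
x = rng.uniform(0, 5, size=8)    # explanatory variable
u = rng.normal(0, 1, size=8)     # error term ("unobserved" factors)
y = beta0 + beta1 * x + u        # simple linear regression model (2.1)

print(np.column_stack([x, y]))
```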
Definition of the Simple Regression Model
• When related by (2.1), the variables y and x have several different names used interchangeably, as follows:
  – y: the dependent variable, explained variable, response variable, or regressand
  – x: the independent variable, explanatory variable, control variable, or regressor
Definition of the Simple Regression Model
• The variable u, called the error term or disturbance in the relationship, represents factors other than x that affect y.
  – A simple regression analysis effectively treats all factors affecting y other than x as being unobserved.
• You can usefully think of u as standing for "unobserved."
Definition of the Simple Regression Model
• Equation (2.1) also addresses the issue of the functional relationship between y and x.
  – If the other factors in u are held fixed, so that the change in u is zero, Δu = 0, then x has a linear effect on y:

    Δy = β₁Δx   if   Δu = 0        (2.2)

  – Thus, the change in y is simply β₁ multiplied by the change in x. This means that β₁ is the slope parameter in the relationship between y and x, holding the other factors in u fixed.
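As a quick worked example of (2.2), using a hypothetical slope of β₁ = 0.54 in a wage-education equation (the number is illustrative, not from these slides): holding u fixed and increasing education by one year,

```latex
\Delta \mathit{wage} = \beta_1 \, \Delta \mathit{educ} = 0.54 \times 1 = 0.54
```

so each additional year of education changes predicted wage by 0.54 (in whatever units wage is measured), regardless of the starting level of education.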
Definition of the Simple Regression Model
• The intercept parameter β₀, sometimes called the constant term, also has its uses, although it is rarely central to an analysis.
Example of SRM
• The linearity of (2.1) implies that a one-unit change in x has the same effect on y, regardless of the initial value of x.
• This is unrealistic for many economic applications.
  – For example, in the wage-education example, we might want to allow for increasing returns: the next year of education has a larger effect on wages than did the previous year.
• The most difficult issue to address is whether model (2.1) really allows us to draw ceteris paribus conclusions about how x affects y.
• We just saw in equation (2.2) that β₁ does measure the effect of x on y, holding all other factors (in u) fixed.
  – Is this the end of the causality issue? Unfortunately, no.
• How can we hope to learn in general about the ceteris paribus effect of x on y, holding other factors fixed, when we are ignoring all those other factors?
• We are only able to get reliable estimators of β₀ and β₁ from a random sample of data when we make an assumption restricting how the unobservable u is related to the explanatory variable x.
  – Without such a restriction, we will not be able to estimate the ceteris paribus effect, β₁.
• Because u and x are random variables, we need a concept grounded in probability.
• Before we state the key assumption about how u and x are related, we can always make one assumption about u.
• As long as the intercept β₀ is included in the equation, nothing is lost by assuming that the average value of u in the population is zero. Mathematically,

  E(u) = 0        (2.5)

  – If the average of u were some α₀ ≠ 0, we could simply absorb it into the intercept (replace β₀ by β₀ + α₀ and u by u − α₀), so this normalization costs nothing.
• Assumption (2.5) says nothing about the relationship between u and x; it simply makes a statement about the distribution of the unobserved factors in the population.
• Without loss of generality, we can therefore take things such as average ability to be zero in the population of all working people.
• We now turn to the crucial assumption regarding how u and x are related.
  – A natural measure of the association between two random variables is the correlation coefficient.
• If u and x are uncorrelated, then, as random variables, they are not linearly related.
  – Correlation measures only linear dependence between u and x.
  – Correlation has a somewhat counterintuitive feature: it is possible for u to be uncorrelated with x while being correlated with functions of x, such as x².
• See Section B.4 for further discussion.
• Example in Excel (a Python sketch of the same idea follows below).
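Here is a minimal sketch of that counterintuitive feature in Python (the construction of u is made up purely for illustration): u is essentially uncorrelated with x, yet strongly correlated with x².

```python
import numpy as np

rng = np.random.default_rng(1)

x = rng.normal(0, 1, 100_000)               # symmetric around zero
u = x**2 - 1 + rng.normal(0, 0.1, x.size)   # E(u) = 0 by construction

print("corr(u, x)   =", round(np.corrcoef(u, x)[0, 1], 3))     # close to 0
print("corr(u, x^2) =", round(np.corrcoef(u, x**2)[0, 1], 3))  # far from 0
```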
• This possibility is not acceptable for most regression purposes, as it causes problems for interpreting the model and for deriving statistical properties.
• A better assumption involves the expected value of u given x.
• Because u and x are random variables, we can define the conditional distribution of u given any value of x.
  – In particular, for any value of x, we can obtain the expected (or average) value of u for that slice of the population described by the value of x.
• The crucial assumption is that the average value of u does not depend on the value of x. We can write this assumption as

  E(u|x) = E(u)        (2.6)

• Equation (2.6) says that the average value of the unobservables is the same across all slices of the population determined by the value of x
  – and that the common average is necessarily equal to the average of u over the entire population.
  – When assumption (2.6) holds, we say that u is mean independent of x.
• Let us see what (2.6) entails in the wage example. To simplify the discussion, assume that u is the same as innate ability.
• Then (2.6) requires that the average level of ability is the same regardless of years of education. For example, if E(abil|8) denotes the average ability for the group of all people with eight years of education, and E(abil|16) denotes the average ability among people in the population with sixteen years of education, then (2.6) implies that these must be the same.
• In fact, the average ability level must be the same for all education levels. If, for example, we think that average ability increases with years of education, then (2.6) is false.
  – This would happen if, on average, people with more ability choose to become more educated.
• As we cannot observe innate ability, we have
no way of knowing whether or not average
ability is the same for all education levels.
– But this is an issue that we must address before
relying on simple regression analysis.
• What if we assume a zero conditional mean of u?
  – often useful
  – Taking the expected value of (2.1) conditional on x and using E(u|x) = 0 gives

    E(y|x) = β₀ + β₁x        (2.8)
Zero Conditional Mean Assumption
• Suppose the econometric model is

  y = β₀ + β₁x + u

• Taking the expected value (conditional on x) and using E(u|x) = 0, we have

  E(y|x) = β₀ + β₁x

• We call this the Population Regression Function (PRF), and it is shown as a linear function (of x)
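To make the PRF concrete, here is a minimal simulation sketch (not from the slides; the parameter values β₀ = 1.0 and β₁ = 0.5 are made up): when E(u|x) = 0 holds, the average of y within narrow bins of x tracks the line β₀ + β₁x.

```python
import numpy as np

rng = np.random.default_rng(0)

beta0, beta1 = 1.0, 0.5           # hypothetical population parameters
x = rng.uniform(0, 10, 100_000)   # explanatory variable
u = rng.normal(0, 1, x.size)      # error term with E(u|x) = 0
y = beta0 + beta1 * x + u         # population model (2.1)

# Average y within narrow bins of x should be close to the PRF beta0 + beta1*x
bins = np.linspace(0, 10, 11)
for lo, hi in zip(bins[:-1], bins[1:]):
    mask = (x >= lo) & (x < hi)
    mid = (lo + hi) / 2
    print(f"x in [{lo:.0f},{hi:.0f}): mean y = {y[mask].mean():.3f}, "
          f"PRF value = {beta0 + beta1 * mid:.3f}")
```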
Population Regression Function

[Figure: the PRF E(y|x) = β₀ + β₁x drawn as a straight line through the conditional distributions of y at each value of x. The PRF exists in the population, but its parameters are UNKNOWN; the vertical deviations of individual observations from the PRF are the errors.]
PRF vs SRF
• We generally don't have population data; we use sample data to estimate the parameters
  – Statistical inference
• Note:
  – We denote population parameters WITHOUT a hat (without ^)
  – We denote sample estimates WITH a hat (^)
  – A missing hat may lead to confusion (and will also cost you marks)
Important Symbols: Econometric Model & Regression Functions
• Econometric model:  y = β₀ + β₁x + u
• PRF (E(y|x)):       E(y|x) = β₀ + β₁x
• SRF (by "OLS"):     ŷ = β̂₀ + β̂₁x
Denoting Type of Data
• Cross-section:  y_i = β₀ + β₁x_i + u_i ,  i = 1, …, n
• Time-series:    y_t = β₀ + β₁x_t + u_t ,  t = 1, …, T
• Panel:          y_it = β₀ + β₁x_it + u_it ,  i = 1, …, n ;  t = 1, …, T
Ordinary Least Squares

HOW TO GET THE PARAMETERS?

Suppose we have
• Econometric model:  y = β₀ + β₁x + u
• PRF:  E(y|x) = β₀ + β₁x
• Since we assume that E(u|x) = 0 and we only have sample data (x_i, y_i), i = 1, …, n, we estimate the
• SRF:  ŷ_i = β̂₀ + β̂₁x_i
Scatter Plot and Regression Line
SRF: Fitted Values and Residuals
Scatter Plot, Regression Line, and Errors

[Figures: scatter plots of the sample data with the fitted regression line (SRF); the vertical distances between the observed points and the fitted line are the residuals.]
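A minimal sketch in Python of computing the SRF, the fitted values, and the residuals (the data here are simulated with made-up parameters, and np.polyfit is just one convenient way to obtain the OLS coefficients):

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulated sample from a hypothetical population with beta0 = 1.0, beta1 = 0.5
x = rng.uniform(0, 10, 50)
y = 1.0 + 0.5 * x + rng.normal(0, 1, x.size)

b1_hat, b0_hat = np.polyfit(x, y, deg=1)   # SRF: y_hat = b0_hat + b1_hat * x
y_hat = b0_hat + b1_hat * x                # fitted values
u_hat = y - y_hat                          # residuals

print("b0_hat =", round(b0_hat, 3), " b1_hat =", round(b1_hat, 3))
print("first residuals:", np.round(u_hat[:5], 3))
```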
Mechanics of OLS
• We want to minimize the residuals (the distance between the actual data and the estimated value of our dependent variable)
• Minimize û_i  ⇒  minimize (y_i − ŷ_i) = (y_i − β̂₀ − β̂₁x_i)
• Because we have n observations, we will have as many as n of the û_i, i.e. û₁, û₂, …, û_n
Mechanics of OLS
• … Because we have n observations, we will have n residuals û_i, so we need a single representative number (again!)
• As some of the residuals lie above, and some others lie below, the regression line (the SRF in this case), we could just add them up, so we get Σ û_i as the representative number
Mechanics of OLS
• … But this is not the best idea, since the value of Σ û_i may end up equal to zero (due to cancellation between positive and negative û_i)
Mechanics of OLS
• A better idea is to take the absolute value of û_i, i.e. |û_i|, so the representative number is Σ |û_i|
• … but this is also a problem if we have more than one SRF that has the same Σ |û_i|
• To overcome the problem we square û_i, so that the representative number will be Σ û_i², or simply SSR, named the "sum of squared residuals" or "residual sum of squares"
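As a quick numerical check (a sketch on simulated data, not from the slides): for the OLS line the plain sum of residuals is essentially zero, which is exactly why Σ û_i² is used as the representative number instead.

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.uniform(0, 10, 50)
y = 1.0 + 0.5 * x + rng.normal(0, 1, x.size)

b1_hat, b0_hat = np.polyfit(x, y, deg=1)
u_hat = y - (b0_hat + b1_hat * x)

print("sum of residuals      :", round(u_hat.sum(), 10))       # ~ 0 (cancellation)
print("sum of |residuals|    :", round(np.abs(u_hat).sum(), 3))
print("sum of squared resid. :", round((u_hat**2).sum(), 3))   # SSR
```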
Mechanics of OLS
• We assume the SRF  ŷ_i = β̂₀ + β̂₁x_i , so each residual is  û_i = y_i − β̂₀ − β̂₁x_i , and we choose β̂₀ and β̂₁ to minimize  SSR = Σ û_i²
Mechanics of OLS
First Order Condition (1):  ∂SSR/∂β̂₀ = −2 Σ (y_i − β̂₀ − β̂₁x_i) = 0
First Order Condition (2):  ∂SSR/∂β̂₁ = −2 Σ x_i (y_i − β̂₀ − β̂₁x_i) = 0
From (1) & (2) we obtain the normal equations:

  n β̂₀ + β̂₁ Σ x_i = Σ y_i
  β̂₀ Σ x_i + β̂₁ Σ x_i² = Σ x_i y_i

(1) and (2) in matrix form:

  [ n       Σ x_i  ] [ β̂₀ ]   [ Σ y_i     ]
  [ Σ x_i   Σ x_i² ] [ β̂₁ ] = [ Σ x_i y_i ]

Using Cramer's rule, we solve this system for β̂₀ and β̂₁.
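A minimal sketch (simulated data, made-up parameters) of setting up and solving this 2×2 system numerically; np.linalg.solve stands in here for Cramer's rule:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, 200)
y = 1.0 + 0.5 * x + rng.normal(0, 1, x.size)
n = x.size

# Normal equations:  [ n       sum(x)   ] [b0]   [ sum(y)  ]
#                    [ sum(x)  sum(x^2) ] [b1] = [ sum(xy) ]
A = np.array([[n, x.sum()], [x.sum(), (x**2).sum()]])
b = np.array([y.sum(), (x * y).sum()])

b0_hat, b1_hat = np.linalg.solve(A, b)
print(b0_hat, b1_hat)   # close to 1.0 and 0.5 in this simulation
```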
Cramer's Rule
• Replacing the first column of the coefficient matrix with the right-hand-side vector and dividing by the determinant of the coefficient matrix gives

  β̂₀ = ( Σ y_i · Σ x_i² − Σ x_i · Σ x_i y_i ) / ( n Σ x_i² − (Σ x_i)² )
Cramer's Rule
• Similarly, replacing the second column gives

  β̂₁ = ( n Σ x_i y_i − Σ x_i · Σ y_i ) / ( n Σ x_i² − (Σ x_i)² )
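As a sanity check on these closed-form expressions (a sketch on simulated data; the "true" parameters 1.0 and 0.5 are made up), the Cramer's-rule formulas can be computed directly and compared with a library fit:

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.uniform(0, 10, 200)
y = 1.0 + 0.5 * x + rng.normal(0, 1, x.size)
n = x.size

den = n * (x**2).sum() - x.sum() ** 2
b0_hat = (y.sum() * (x**2).sum() - x.sum() * (x * y).sum()) / den
b1_hat = (n * (x * y).sum() - x.sum() * y.sum()) / den

slope, intercept = np.polyfit(x, y, deg=1)
print(np.allclose([b0_hat, b1_hat], [intercept, slope]))   # True
```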