
Introduction to Econometrics

Ekki Syamsulhakim
Undergraduate Program
Department of Economics
Universitas Padjadjaran
Last Week
• Cool classroom!
• Some basic mathematical statistics
– Fun with summation
– Covariance – and direction of 2 variables
– Random Variables
Registering your email
• Putri
• Hanifa
• Fauziah
• Sofi
• Evangela
• Annisa
Today
• Continuous random variables
• Conditional Probability
• Conditional Expectation
• Ch 2W
– PRF & SRF
– Simple regression model
– Interpretation of simple regression parameters
– Fitted Values and Residuals
– How to derive OLS estimator
Continuous RV
• A variable X is a continuous random variable if it
takes on any real value with zero probability.
– This definition is somewhat counterintuitive, since in
any application, we eventually observe some outcome
for a random variable.
• The idea is that a continuous random variable X
can take on so many possible values that we
cannot count them or match them up with the
positive integers, so logical consistency dictates
that X can take on each value with probability zero.
Continuous RV
• Household income could be considered
continuous: when the data are measured to as
many decimal places as you can imagine,
– there are an infinite number of possible outcomes.
• Therefore, the probability of any one
particular value occurring would be virtually
zero.
Continuous RV
• Because it makes no sense to discuss the
probability that a continuous random variable
takes on a particular value, we use the
probability density function (pdf) of a
continuous RV only to compute the probabilities
of events involving a range of values.
Probability Density Function (pdf)
• The PROBABILITY DENSITY FUNCTION (PDF) is
a graphical representation of the probability
distribution.
• Probabilities of being in a particular range are
given by the area under the curve over that
range. For constants a < b, the area under the pdf is

  P(a ≤ X ≤ b) = ∫ from a to b of f(x) dx
Probability Density Function
[Figure B.2: P(a ≤ X ≤ b) shown as the shaded area under the pdf between the points a and b]
• For example, if a and b are constants where
a<b, the probability that X lies between the
numbers a and b, P(a≤X≤b), is the area under
the pdf between points a and b, as shown in
Figure B.2.
• If you are familiar with calculus, you recognize
this as the integral of the function f between
the points a and b. The entire area under the
pdf must always equal one.
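The area interpretation can be checked numerically. Below is a minimal Python sketch (not part of the original slides) that approximates P(a ≤ X ≤ b) for a standard normal by summing thin rectangles under the pdf; the density formula is the standard normal pdf.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the Normal(mu, sigma^2) distribution at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def prob_between(a, b, pdf=normal_pdf, steps=100_000):
    """P(a <= X <= b) as the area under the pdf, via the midpoint rule."""
    width = (b - a) / steps
    return sum(pdf(a + (i + 0.5) * width) for i in range(steps)) * width

# For a standard normal, P(0 <= X <= 2) is about 0.4772
p = prob_between(0.0, 2.0)
print(round(p, 4))
```

With enough rectangles the sum converges to the exact integral, which is why the same probability can also be read off standard normal tables.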
Normal Probability Distribution
• The most important probability distribution
for describing a continuous random variable is
the normal probability distribution.
• The normal distribution has been used in a
wide variety of practical applications in which
the random variables are heights and weights
of people, test scores, scientific
measurements, amounts of rainfall, and other
similar values
Normal Distribution
• When the shape of the pdf of a Random
Variable follows a bell shaped curve, we say
that the RV follows a Normal distribution
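As a quick illustration of the bell curve, here is a short Python sketch (an addition, not from the original slides) that uses the closed-form normal cdf, available in the standard library via the error function, to reproduce the familiar 68/95/99.7 rule:

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for X ~ Normal(mu, sigma^2), via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

# Empirical rule for the bell curve: roughly 68%, 95%, and 99.7% of the
# probability mass lies within 1, 2, and 3 standard deviations of the mean.
for k in (1, 2, 3):
    mass = normal_cdf(k) - normal_cdf(-k)
    print(k, round(mass, 4))
```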
Normal Distribution
[Figure: the bell-shaped normal pdf]
Conditional Distributions
• In econometrics, we are usually interested
in how one random variable, call it Y, is
related to one or more other variables.
• For now, suppose that there is only one
variable whose effects we are interested in, call
it X.
• The most we can know about how X affects Y is
contained in the conditional distribution of Y
given X .
Conditional Distributions
• This information is summarized by the
conditional probability density function,
defined by

  fY|X(y|x) = fX,Y(x, y) / fX(x)

  for all x such that fX(x) > 0.
Conditional Expectation
• You know what covariance and the correlation
coefficient are
• You have already reviewed what the expected value is
• We have just reviewed what a conditional probability
distribution is
Conditional Expectation
• Covariance and correlation measure the linear
relationship between two random variables and
treat them symmetrically.
• More often in the social sciences, we would like to
explain one variable, called Y, in terms of another
variable, say X .
• Further, if Y is related to X in a nonlinear fashion,
we would like to know this.
• For example, Y might be hourly wage, and X might
be years of formal education.
Conditional Expectation
• We have already introduced the notion of the
conditional probability density function of Y
given X .
• Thus, we might want to see how the
distribution of wages changes with education
level.
– A single number will no longer suffice, since the
distribution of Y, given X=x, generally depends on
the value of x .
Conditional Expectation
• Nevertheless, we can summarize the
relationship between Y and X by looking at
the conditional expectation of Y given X,
sometimes called the conditional mean.
• The idea is this. Suppose we know that X has
taken on a particular value, say x. Then, we
can compute the expected value of Y given
that we know this outcome of X.
Conditional Expectation
• We denote this expected value by E(Y|X=x), or
sometimes E(Y|x) for shorthand.
• Generally, as x changes, so does E(Y|x).
• When Y is a discrete random variable taking
on values {y1, …, ym}, then

  E(Y|x) = y1·fY|X(y1|x) + … + ym·fY|X(ym|x) = Σ yj·fY|X(yj|x)
Conditional Expectation
• When Y is continuous, E(Y|x) is defined by
integrating y·fY|X(y|x) over all possible values of y.
• As with unconditional expectations, the
conditional expectation is a weighted average of
possible values of Y, but now the weights reflect
the fact that X has taken on a specific value.
• Thus, E(Y|x) is just some function of x, which tells
us how the expected value of Y varies with x .
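As a sketch of this weighted-average idea, the toy Python example below (the joint probabilities are made up for illustration) computes E(Y|X=x) from a discrete joint pmf:

```python
# Made-up joint pmf of (X, Y), stored as {(x, y): probability}
joint = {
    (0, 10): 0.20, (0, 20): 0.20,
    (1, 10): 0.10, (1, 20): 0.50,
}

def cond_expectation(joint, x):
    """E(Y | X = x) = sum over y of y * f_{Y|X}(y|x), where
    f_{Y|X}(y|x) = f_{X,Y}(x, y) / f_X(x)."""
    fx = sum(p for (xi, _), p in joint.items() if xi == x)  # marginal P(X = x)
    return sum(y * p / fx for (xi, y), p in joint.items() if xi == x)

print(cond_expectation(joint, 0))  # (10*0.2 + 20*0.2) / 0.4 = 15.0
print(cond_expectation(joint, 1))  # (10*0.1 + 20*0.5) / 0.6 ≈ 18.33
```

Note how E(Y|x) changes with x: within each slice of the population the weights come from the conditional, not the marginal, distribution of Y.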
Conditional Expectation
• As an example, let (X ,Y) represent the population of all
working individuals, where X is years of education, and
Y is hourly wage.
• Then, E(Y|X=12) is the average hourly wage for all
people in the population with 12 years of education
(roughly a high school education). E(Y|X=16) is the
average hourly wage for all people with 16 years of
education.
• Tracing out the expected value for various levels of
education provides important information on how
wages and education are related.
Conditional Expectation
[Figure: the expected value of hourly wage traced out at each level of education]
Conditional Expectation
• In principle, the expected value of hourly
wage can be found at each level of education,
and these expectations can be summarized in
a table.
• Since education can vary widely—and can
even be measured in fractions of a year—this
is a cumbersome way to show the
relationship between average wage and
amount of education.
Conditional Expectation
• In econometrics, we typically specify simple
functions that capture this relationship. As an
example, suppose that the expected value of
WAGE given EDUC is the linear function

  E(WAGE|EDUC) = 1.05 + .45·EDUC
Conditional Expectation
• If this relationship holds in the population of
working people, the average wage for people
with eight years of education is 1.05 + .45(8) =
4.65, or $4.65.
• The average wage for people with 16 years of
education is 8.25, or $8.25.
• The coefficient on EDUC implies that each
year of education increases the expected
hourly wage by .45, or 45 cents.
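The arithmetic on this slide can be reproduced with a one-line function (the 1.05 and .45 values are the slide's illustrative parameters, not estimates from real data):

```python
def expected_wage(educ):
    """E(WAGE | EDUC) = 1.05 + 0.45 * EDUC, the linear conditional mean
    used as the illustrative example on the slide."""
    return 1.05 + 0.45 * educ

print(round(expected_wage(8), 2))   # 4.65
print(round(expected_wage(16), 2))  # 8.25
# each extra year of education raises the expected hourly wage by $0.45
print(round(expected_wage(13) - expected_wage(12), 2))  # 0.45
```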
Zero Conditional Mean Assumption
• Suppose the econometric model is

  y = β0 + β1·x + u

• Because u and x are random variables, we can define
the conditional distribution of u given any value of x.
• In particular, for any x, we can obtain the expected (or
average) value of u for that slice of the
population described by the value of x.
• The crucial assumption is that the average value of u
does not depend on the value of x. We can write this as
Zero Conditional Mean Assumption
  E(u|x) = E(u) = 0    (2.6)

• It says that, for any given value of x, the average of the
unobservables is the same and therefore must equal the average
value of u in the entire population.
• To simplify the discussion, assume that u is the same as innate
ability. Then (2.6) requires that the average level of ability is the
same regardless of years of education.
• For example, if E(abil|8) denotes the average ability for the group
of all people with eight years of education, and E(abil|16) denotes
the average ability among people in the population with 16 years of
education, then (2.6) implies that these must be the same. In fact,
the average ability level must be the same for all education levels.
Important: Symbols
• Econometric Model

  y = β0 + β1·x + u

• PRF (Population Regression Function)

  E(y|x) = β0 + β1·x

• SRF (by OLS)

  ŷ = β̂0 + β̂1·x
Denoting type of data
• Cross-section: subscript i (e.g., yi = β0 + β1·xi + ui, i = 1, …, n)
• Time-series: subscript t (e.g., yt = β0 + β1·xt + ut, t = 1, …, T)
• Panel: subscripts i and t (e.g., yit)
Simple Regression Model
• The simple regression model can be used to study
the relationship between two variables.
• The simple regression model has limitations as a
general tool for empirical analysis.
– Nevertheless, it is sometimes appropriate as an
empirical tool.
• Learning how to interpret the simple regression
model is good practice for studying multiple
regression, which we will do in subsequent
chapters.
Simple Regression Model
• Much of applied econometric analysis begins
with the following premise: y and x are two
variables, representing some population, and
we are interested in “explaining y in terms
of x,” or in “studying how y varies with changes
in x.”
Simple Regression Model
• In writing down a model that will “explain y in terms
of x,” we must confront three issues.
– First, since there is never an exact relationship between
two variables, how do we allow for other factors to
affect y?
– Second, what is the functional relationship between y
and x?
– And third, how can we be sure we are capturing a
ceteris paribus relationship between y and x (if that is a
desired goal)?
Definition of the
Simple Regression Model
• We can resolve these ambiguities by writing down an
equation relating y to x. A simple equation is

  y = β0 + β1·x + u    (2.1)

• Equation (2.1), which is assumed to hold in the
population of interest, defines the simple linear
regression model.
– It is also called the two-variable linear regression model or
bivariate linear regression model because it relates the two
variables y and x.
Definition of the
Simple Regression Model
• When related by (2.1), the variables y and x have
several different names used interchangeably, as
follows:
– y: the dependent variable, explained variable, response
variable, predicted variable, or regressand
– x: the independent variable, explanatory variable, control
variable, predictor variable, or regressor
Definition of the
Simple Regression Model
• The variable u, called the error term or
disturbance in the relationship, represents
factors other than x that affect y.
– A simple regression analysis effectively treats all
factors affecting y other than x as being unobserved.
• You can usefully think of u as standing for
“unobserved.”
Definition of the
Simple Regression Model
• Equation (2.1) also addresses the issue of the
functional relationship between y and x.
– If the other factors in u are held fixed, so that the
change in u is zero, Δu = 0, then x has a linear effect on y:

  Δy = β1·Δx if Δu = 0    (2.2)

– Thus, the change in y is simply β1 multiplied by the
change in x. This means that β1 is the slope parameter
in the relationship between y and x, holding the other
factors in u fixed.
Definition of the
Simple Regression Model

– The intercept parameter β0, sometimes called the
constant term, also has its uses, although it is
rarely central to an analysis.
Example of SRM
• The linearity of (2.1) implies that a one-unit
change in x has the same effect on y,
regardless of the initial value of x.
• This is unrealistic for many economic
applications.
– For example, in the wage-education example, we
might want to allow for increasing returns:
the next year of education has a larger effect on
wages than did the previous year.
• The most difficult issue to address is whether
model (2.1) really allows us to draw ceteris
paribus conclusions about how x affects y.
• We just saw in equation (2.2) that β1 does
measure the effect of x on y, holding all other
factors (in u) fixed.
– Is this the end of the causality issue?
Unfortunately, no.
• How can we hope to learn in general about
the ceteris paribus effect of x on y, holding other
factors fixed, when we are ignoring all those
other factors?
• We are only able to get reliable estimators of β0
and β1 from a random sample of data when we
make an assumption restricting how the
unobservable u is related to the explanatory
variable x.
– Without such a restriction, we will not be able
to estimate the ceteris paribus effect, β1.
• Because u and x are random variables, we need
a concept grounded in probability.
• Before we state the key assumption about
how u and x are related, we can always make
one assumption about u.
• As long as the intercept β0 is included in the
equation, nothing is lost by assuming that the
average value of u in the population is zero.
Mathematically,

  E(u) = 0    (2.5)
• Assumption (2.5) says nothing about the
relationship between u and x, but simply makes
a statement about the distribution of the
unobserved factors in the population.
• Without loss of generality, we can assume that
things such as average ability are zero in the
population of all working people.
• We now turn to the crucial assumption regarding
how u and x are related.
– A natural measure of the association between two
random variables is the correlation coefficient.
• If u and x are uncorrelated, then, as random
variables, they are not linearly related.
– Correlation measures only linear dependence between u
and x.
– Correlation has a somewhat counterintuitive feature:
it is possible for u to be uncorrelated with x while being
correlated with functions of x, such as x²
• See Section B.4 for further discussion
• Example in Excel
• This possibility is not acceptable for most regression
purposes, as it causes problems for interpreting the
model and for deriving statistical properties.
• A better assumption involves the expected value of u
given x.
• Because u and x are random variables, we can define
the conditional distribution of u given any value of x.
– In particular, for any x, we can obtain the expected (or
average) value of u for that slice of the population
described by the value of x.
• The crucial assumption is that the average value of u
does not depend on the value of x.
• We can write this assumption as

  E(u|x) = E(u) = 0    (2.6)

• Equation (2.6) says that the average value of
the unobservables is the same across all slices
of the population determined by the value of x
– and that the common average is necessarily equal
to the average of u over the entire population.
– When assumption (2.6) holds, we say that u is
mean independent of x.
• Let us see what (2.6) entails in the wage example. To
simplify the discussion, assume that u is the same as innate
ability.
• Then (2.6) requires that the average level of ability is the
same regardless of years of education. For example, if E(abil|8)
denotes the average ability for the group of all people with
eight years of education, and E(abil|16) denotes the average ability
among people in the population with sixteen years of
education, then (2.6) implies that these must be the same.
• In fact, the average ability level must be the same for all
education levels. If, for example, we think that average
ability increases with years of education, then (2.6) is false.
– This would happen if, on average, people with more ability
choose to become more educated.
• As we cannot observe innate ability, we have
no way of knowing whether or not average
ability is the same for all education levels.
– But this is an issue that we must address before
relying on simple regression analysis.
• What if we assume the zero conditional mean of u?
– often useful
– Taking the expected value of (2.1) conditional on x
and using E(u|x) = 0 gives

  E(y|x) = β0 + β1·x    (2.8)
Zero Conditional Mean Assumption
• Suppose the econometric model is

  y = β0 + β1·x + u

• Taking the expected value (conditional on x) and
using E(u|x) = 0, we have

  E(y|x) = β0 + β1·x

• We call this the Population Regression Function, and it
is shown as a linear function (of x)
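A small simulation can illustrate the PRF: if we generate data with E(u|x) = 0, the sample average of y within each slice of x should approach β0 + β1·x. Below is a Python sketch; the parameter values are borrowed from the wage example, and the normal error is an illustrative assumption, not part of the slides.

```python
import random

random.seed(42)
beta0, beta1 = 1.05, 0.45          # illustrative population parameters

def draw(x):
    """One draw of y = beta0 + beta1*x + u, with E(u|x) = 0
    (the error is mean-zero and has the same distribution for every x)."""
    u = random.gauss(0, 1)
    return beta0 + beta1 * x + u

# The average y within each "slice" x approaches E(y|x) = beta0 + beta1*x
for x in (8, 12, 16):
    ys = [draw(x) for _ in range(200_000)]
    print(x, round(sum(ys) / len(ys), 2), round(beta0 + beta1 * x, 2))
```

The population parameters exist and generate the data, but in practice they are unknown; only sample averages like these are observable.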
Population Regression Function

  E(y|x) = β0 + β1·x

[Figure: the PRF drawn as a line through the conditional means; individual observations deviate from the line by the error u]

• Exists, but the parameters are UNKNOWN
PRF vs SRF
• We generally don’t have population data; we
use sample data to estimate the parameters
– Statistical inference
• Note:
– We denote a population parameter WITHOUT a HAT
(without ^)
– We denote a sample estimate with a HAT (^)
– A missing hat may lead to confusion (and also
reduce marks)
Important: Symbols
• Econometric Model

  y = β0 + β1·x + u

• PRF (Population Regression Function)

  E(y|x) = β0 + β1·x

• SRF (by “OLS”)

  ŷ = β̂0 + β̂1·x
Ordinary Least Squares

HOW TO GET THE PARAMETERS?

Suppose we have
Econometric model:

  y = β0 + β1·x + u

PRF:

  E(y|x) = β0 + β1·x

Since we assume that E(u|x) = 0 and we only have sample
data,
SRF:

  ŷi = β̂0 + β̂1·xi
Scatter Plot and Regression Line
SRF: Fitted Values and Residuals

  fitted value: ŷi = β̂0 + β̂1·xi
  residual: ûi = yi − ŷi

[Figure: scatter plot with the regression line (SRF); the vertical distances between the data points and the line are the residuals]
Mechanics of OLS
• We want to minimize the residuals (the
distance between the actual data and the estimated
value of our dependent variable)

• Minimize ûi = yi − ŷi ⇒ minimize yi − (β̂0 + β̂1·xi)

• Because we have n observations, we will have as
many as n of ûi, i.e. û1, û2, …, ûn
Mechanics of OLS
• …. Because we have n observations, we will have
as many as n of ûi, i.e. û1, …, ûn ⇒ we need a
representative number (again!)

• As some of the residuals are placed above –
and some others are placed below – the
regression line (the SRF in this case), we could just
add them up, so we get Σûi as the representative
Mechanics of OLS
• … But this is not the best idea, since the value
of Σûi may end up equal to zero (due to
cancellation between positive and negative ûi)
Mechanics of OLS
• A better idea is to take the absolute value of ûi, i.e. |ûi|, so
the representative number is Σ|ûi|

• …but this is also a problem if we have more than one
SRF that has the same Σ|ûi|

• To overcome the problem we need to square ûi, so that
the representative number will be Σûi², or simply SSR, named
the “sum of squared residuals” or “residual sum of
squares”
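The cancellation and squaring arguments can be checked on a toy dataset (the numbers and the candidate line below are made up for illustration):

```python
# Tiny made-up dataset and a candidate regression line yhat = b0 + b1*x
xs = [1, 2, 3, 4]
ys = [2, 1, 4, 3]

def residuals(b0, b1):
    """Residuals u_i = y_i - (b0 + b1*x_i) for the candidate line."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

r = residuals(1.0, 0.6)  # this happens to be the OLS line for these points
# The plain sum cancels (here to essentially zero), so it cannot
# distinguish good lines from bad ones.
print(sum(r))
# The sum of squared residuals does not cancel, and is what OLS minimizes.
print(sum(e * e for e in r))
```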
Mechanics of OLS
• We choose β̂0 and β̂1 to minimize

  SSR = Σ ûi² = Σ (yi − β̂0 − β̂1·xi)²
Mechanics of OLS
First Order Condition (1): ∂SSR/∂β̂0 = −2 Σ (yi − β̂0 − β̂1·xi) = 0
First Order Condition (2): ∂SSR/∂β̂1 = −2 Σ xi(yi − β̂0 − β̂1·xi) = 0

From (1) & (2):

  n·β̂0 + β̂1·Σxi = Σyi
  β̂0·Σxi + β̂1·Σxi² = Σxi·yi

(1) and (2) in matrix form:

  [ n    Σx  ] [ β̂0 ]   [ Σy  ]
  [ Σx   Σx² ] [ β̂1 ] = [ Σxy ]

Using Cramer’s rule:
Cramer’s Rule

  β̂0 = det[ Σy  Σx ; Σxy  Σx² ] / det[ n  Σx ; Σx  Σx² ]
      = (Σy·Σx² − Σx·Σxy) / (n·Σx² − (Σx)²)
Cramer’s Rule

  β̂1 = det[ n  Σy ; Σx  Σxy ] / det[ n  Σx ; Σx  Σx² ]
      = (n·Σxy − Σx·Σy) / (n·Σx² − (Σx)²)
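The closed-form OLS solutions from Cramer's rule can be coded directly. Here is a Python sketch; the test data are made up, generated exactly on a known line so OLS should recover it:

```python
def ols(xs, ys):
    """OLS intercept and slope from the normal equations via Cramer's rule:
    b1 = (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2)
    b0 = (Sy*Sxx - Sx*Sxy) / (n*Sxx - Sx^2)"""
    n = len(xs)
    Sx, Sy = sum(xs), sum(ys)
    Sxx = sum(x * x for x in xs)
    Sxy = sum(x * y for x, y in zip(xs, ys))
    den = n * Sxx - Sx * Sx
    b1 = (n * Sxy - Sx * Sy) / den
    b0 = (Sy * Sxx - Sx * Sxy) / den
    return b0, b1

# Made-up data generated exactly on the line y = 1.05 + 0.45x:
# OLS recovers the intercept and slope.
xs = [8, 10, 12, 14, 16]
ys = [1.05 + 0.45 * x for x in xs]
b0, b1 = ols(xs, ys)
print(round(b0, 2), round(b1, 2))  # 1.05 0.45
```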