
Appendix A

• The Summation Operator and Descriptive Statistics


• Properties of Linear Functions
• Proportions and Percentages
• Special Functions & Their Properties
• Differential Calculus

Appendix A Page 1
The Summation
Operator and
Descriptive Statistics
The summation operator is a useful shorthand for manipulating expressions involving the sums of many numbers, and it plays a key role in statistics and
econometric analysis.

Definition
If {x_i : i = 1, ..., n} denotes a sequence of n numbers, then we write the sum of these numbers as
Σ_{i=1}^{n} x_i = x_1 + x_2 + ... + x_n.

Property Sum.1: For any constant c, Σ_{i=1}^{n} c = n·c.
Property Sum.2: For any constant c, Σ_{i=1}^{n} c·x_i = c·Σ_{i=1}^{n} x_i.
Property Sum.3: If {(x_i, y_i) : i = 1, ..., n} is a set of n pairs of numbers, and a and b are constants, then
Σ_{i=1}^{n} (a·x_i + b·y_i) = a·Σ_{i=1}^{n} x_i + b·Σ_{i=1}^{n} y_i.

These are not true:
Σ_{i=1}^{n} (x_i / y_i) = (Σ_{i=1}^{n} x_i) / (Σ_{i=1}^{n} y_i),
Σ_{i=1}^{n} x_i² = (Σ_{i=1}^{n} x_i)².

Sample Mean: x̄ = (1/n)·Σ x_i.
Sample Standard Deviation: s_x = sqrt[ (1/(n−1))·Σ (x_i − x̄)² ].
Sample Covariance: s_xy = (1/(n−1))·Σ (x_i − x̄)(y_i − ȳ).

Notice that Σ (x_i − x̄) = 0.
Notice that Σ (x_i − x̄)(y_i − ȳ) = Σ x_i·(y_i − ȳ) = Σ (x_i − x̄)·y_i.
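As a quick numerical check of these definitions, here is a minimal Python sketch (NumPy assumed available; the data values are made up for illustration) that computes the sample mean, sample standard deviation, and sample covariance with the n − 1 divisor used above.

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])   # hypothetical sample for x
y = np.array([1.0, 3.0, 2.0, 5.0])   # hypothetical sample for y
n = len(x)

x_bar = x.mean()                                        # sample mean
s_x = np.sqrt(((x - x_bar) ** 2).sum() / (n - 1))       # sample standard deviation
s_xy = ((x - x_bar) * (y - y.mean())).sum() / (n - 1)   # sample covariance

print(x_bar, s_x, s_xy)
# Cross-check against NumPy's built-in versions of the same formulas
print(np.isclose(s_x, x.std(ddof=1)), np.isclose(s_xy, np.cov(x, y, ddof=1)[0, 1]))
```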

Appendix A Page 2
Properties of Linear
Functions
Univariate Linear Function

Definition
If y and x are two variables related by
y = β0 + β1·x,
then we say that y is a linear function of x, and β0 and β1 are two parameters (numbers) describing this relationship.

The intercept is β0.
The slope is β1.

The marginal effect of x on y is constant and equal to β1, i.e.,
Δy = β1·Δx.

✔ "Δ" denotes "change."

Example:
Suppose that the relationship between monthly housing expenditure and monthly income is
housing = 164 + 0.27·income.

Then, for each additional dollar of income, 27 cents is spent on housing. If family income increases by $200, then housing expenditure increases by 0.27(200) = $54.
The marginal propensity to consume (MPC) housing out of income is 0.27.
The average propensity to consume (APC) is housing/income = 164/income + 0.27.

Multivariate Linear Function

Linear functions are easily defined for more than two variables. Suppose that y is related to two variables, x1 and x2, in the general form
y = β0 + β1·x1 + β2·x2.
The change in y for given changes in x1 and x2 is
Δy = β1·Δx1 + β2·Δx2.

Partial Effect of x1 on y: Δy = β1·Δx1 if Δx2 = 0.
Partial Effect of x2 on y: Δy = β2·Δx2 if Δx1 = 0.

Appendix A Page 3

Example:
Suppose that the monthly quantity demanded of compact discs is related to the price of compact discs and monthly discretionary income by
quantity = 120 − 9.8·price + 0.03·income.
The slope of the demand curve, −9.8, is the partial effect of price on quantity: holding income fixed, if the price of compact discs increases by one dollar, then the quantity demanded falls by 9.8.

Appendix A Page 4
Proportions and
Percentages

Definition:
The proportionate change in x in moving from x0 to x1, sometimes called the relative change, is simply
(x1 − x0)/x0 = Δx/x0.
Example:
If an individual's income goes from $30,000 per year to $36,000 per year, then the proportionate change is 6,000/30,000 = 0.20.

Definition:
The percentage change in x in going from x0 to x1 is simply 100 times the proportionate change, i.e.,
%Δx = 100·(Δx/x0).
Example:
When income goes from $30,000 to $36,000, income has increased by 20%; to get this, we simply multiply the proportionate change, 0.20, by 100.

⚠ When x is a percentage itself, and x is moving from x0 to x1, it's advisable to report the percentage point change, i.e., Δx = x1 − x0, instead.

Example:
Let x denote the percentage of adults in a particular city having a college education. Suppose the initial value is x0 = 24 (24% have a college education), and the new value is x1 = 30. The change in x is Δx = 30 − 24 = 6. The percentage of people with a college education has increased by six percentage points.

Appendix A Page 5
Special Functions &
Their Properties
Definition:
A nonlinear function between y and x is characterized by the fact that the change in y for a given change in x depends on the starting value of x.

1. Quadratic Functions
2. Natural Logarithm
3. Exponential Function

Appendix A Page 6
Quadratic Functions
y = β0 + β1·x + β2·x²,
where β0, β1, and β2 are parameters.

⚠ The function is nonlinear in x, but it is linear in β0, β1, and β2.

The marginal effect now depends on x:
Δy ≈ (β1 + 2·β2·x)·Δx.

Therefore setting β1 + 2·β2·x = 0 will determine
maxima: β1 > 0, β2 < 0 [diminishing marginal effects],
minima: β1 < 0, β2 > 0 [increasing marginal effects].

Optima: ➡ x* = |β1 / (2·β2)|.

Example:
Suppose the relationship between hourly wages and years in the workforce (exper) is given by
wage = 5.25 + 0.48·exper − 0.008·exper².
Notice that exper has a positive effect on wage up to the turning point,
x* = 0.48/[2(0.008)] = 30.
The first year of experience is worth approximately .48 (48 cents). Each additional year of experience increases wage by less than the previous year (diminishing marginal return to experience).
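A small sketch of the quadratic formulas above, using the wage-experience numbers quoted in the example (everything else is illustrative):

```python
# wage = 5.25 + 0.48*exper - 0.008*exper**2, as in the example above
b0, b1, b2 = 5.25, 0.48, -0.008

def marginal_effect(exper):
    # approximate effect on wage of one more year of experience at this level
    return b1 + 2 * b2 * exper

turning_point = -b1 / (2 * b2)   # where the marginal effect reaches zero
print(marginal_effect(0), marginal_effect(10), turning_point)   # 0.48, 0.32, 30.0
```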

Appendix A Page 7
Natural Logarithm
The logarithm of a number, x, is the exponent to which another fixed value, the base b, must be raised to produce that number, i.e.,
• y = log_b(x) means b^y = x.
• For example, log_10(1000) = 3 because 10³ = 1000.

⚠ In Economics, whenever we write log(x) or ln(x) we are always referring to the natural logarithm, i.e. the base is e ≈ 2.7183.

The log function y = log(x) is defined only for x > 0; it is increasing and concave, with log(1) = 0.

Properties:
log(x1·x2) = log(x1) + log(x2).
log(x1/x2) = log(x1) − log(x2).
log(x^c) = c·log(x).
log(1) = 0.

An Important Approximation:
Let x0 and x1 be positive values. Then, it can be shown that
log(x1) − log(x0) ≈ (x1 − x0)/x0 (proportionate change)
for small changes in x. Notice that
100·Δlog(x) ≈ %Δx (percentage change)
for small changes in x.

Definition (Elasticity)
The elasticity of y with respect to x is defined as
(Δy/Δx)·(x/y) = %Δy/%Δx.
The elasticity of y with respect to x is the percentage change in y when x increases by 1%.

Consider the relationship
log(y) = β0 + β1·log(x).
Then Δlog(y) = β1·Δlog(x), so
%Δy ≈ β1·%Δx.
The slope parameter β1 is the elasticity of y with respect to x [assuming that x, y > 0].

Constant Elasticity Demand Function:
If q is quantity demanded and p is price and these variables are related by
log(q) = 4.7 − 1.25·log(p),
then the price elasticity of demand is −1.25. Roughly, a 1% increase in price leads to a 1.25% fall in the quantity demanded.

Appendix A Page 8
Now suppose that log(y) = β0 + β1·x (a log-level model).
Then Δlog(y) = β1·Δx, so
%Δy ≈ (100·β1)·Δx.
Semi-elasticity of y with respect to x: the percentage change of y when x increases by one unit.
The slope parameter 100·β1 is the semi-elasticity of y with respect to x [assuming that y > 0].

Logarithmic Wage Equation:
Suppose that hourly wage and years of education are related by
log(wage) = 2.78 + 0.094·educ.
Then Δlog(wage) = 0.094·Δeduc, so
%Δwage ≈ 100(0.094)·Δeduc = 9.4·Δeduc.
It follows that one more year of education increases hourly wage by about 9.4%.

Now suppose instead that y = β0 + β1·log(x) (a level-log model).
Then Δy = β1·Δlog(x), so
Δy ≈ (β1/100)·%Δx.
β1/100 is the unit change in y when x increases by 1% [assuming that x > 0].

Labor Supply Equation:
Assume that the labor supply of a worker can be described by
hours = 33 + 45.1·log(wage),
where wage is hourly wage and hours is hours worked per week. Then,
Δhours = (45.1/100)·%Δwage = 0.451·%Δwage.
In other words, a 1% increase in wage increases the weekly hours worked by about 0.45, or slightly less than one-half hour.
If the wage increases by 10%, then Δhours = 0.451(10) = 4.51, or
about four and one-half hours.
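The approximation 100·Δlog(x) ≈ %Δx can be checked numerically. A minimal sketch (plain Python/NumPy, with made-up numbers) comparing the exact percentage change with the log difference:

```python
import numpy as np

def pct_change(x0, x1):
    return 100 * (x1 - x0) / x0               # exact percentage change

def log_diff(x0, x1):
    return 100 * (np.log(x1) - np.log(x0))    # 100 times the change in logs

print(pct_change(100, 103), log_diff(100, 103))   # 3.0 vs about 2.96: close
print(pct_change(100, 150), log_diff(100, 150))   # 50.0 vs about 40.5: approximation breaks down
```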

Appendix A Page 9
Exponential Function
We write the exponential function as
y = exp(x) = e^x.

Properties:
exp(0) = 1, exp(1) ≈ 2.7183, and exp(x) > 0 for all x.
exp[log(x)] = x and log[exp(x)] = x (the exponential is the inverse of the natural log).
exp(x1 + x2) = exp(x1)·exp(x2).
exp[c·log(x)] = x^c.
The exponential function is increasing and convex.

Appendix A Page 10
Differential Calculus

Function → Derivative:
y = β0 + β1·x → dy/dx = β1.
y = β0 + β1·x + β2·x² → dy/dx = β1 + 2·β2·x.
y = β0 + β1·log(x) → dy/dx = β1/x.
log(y) = β0 + β1·x → dy/dx = β1·y.
For small changes, Δy ≈ (dy/dx)·Δx.

Function → Partial Derivatives:
y = β0 + β1·x1 + β2·x2 → ∂y/∂x1 = β1, ∂y/∂x2 = β2.
y = β0 + β1·x1 + β2·x2 + β3·x1·x2 → ∂y/∂x1 = β1 + β3·x2, ∂y/∂x2 = β2 + β3·x1.
The partial derivative ∂y/∂x1 gives the effect of x1 on y, holding x2 fixed.

Appendix A Page 11
Chapter 1: The Nature of
Econometrics and
Economic Data
1. What Is Econometrics?
2. Steps in Empirical Economic Analysis
3. The Structure of Economic Data
4. Causality & Ceteris Paribus

Ch01 Page 1
What Is Econometrics?
The term “econometrics” is believed to have been crafted by Ragnar Frisch (1895-1973) of Norway, one of the three
principal founders of the Econometric Society, first editor of the journal Econometrica, and co-winner of the first
Nobel Memorial Prize in Economic Sciences in 1969. It is therefore fitting that we turn to Frisch’s own words in the
introduction to the first issue of Econometrica to describe the discipline.

A word of explanation regarding the term econometrics may be in order. Its definition is implied in the statement of
the scope of the [Econometric] Society, in Section I of the Constitution, which reads: “The Econometric Society is an
international society for the advancement of economic theory in its relation to statistics and mathematics.... Its main
object shall be to promote studies that aim at a unification of the theoretical-quantitative and the empirical-
quantitative approach to economic problems....”

What can you do with Econometrics?

✔ Estimate relationships between economic variables.
✔ Test economic theories and hypotheses.
✔ Forecast economic variables.
✔ Evaluate and implement government and business policy.

Econometrics can be understood as the use of statistical methods to analyze economic data.

✏ Econometricians typically analyze non-experimental data. Experimental data are often collected in laboratory environments in the natural sciences. Non-experimental data are sometimes called observational data, or retrospective data, on individuals, firms, or segments of the economy.

✏ Machine learning (ML) is the study of algorithms and statistical models that computer systems use to progressively improve their performance on a specific task.
From <https://en.wikipedia.org/wiki/Machine_learning>
Specific Task: Make Predictions (most common). Making predictions in Economics is called Forecasting. Sometimes the term 'Statistical Learning' is used instead.

Ch01 Page 2
Steps in Empirical
Economic Analysis
An empirical analysis uses data to test a theory or to estimate a relationship.

⚠ When testing a theory, a formal economic model is often required:

Example: Economic Model of Crime
Becker, G. S. (1968) "Crime and Punishment: An Economic Approach," Journal of Political Economy 76, 169-217.
An equation for criminal activity based on utility maximization is derived:
y = f(x1, x2, x3, x4, x5, x6, x7),
where
y = hours spent in criminal activities,
x1 = "wage" for an hour spent in criminal activity,
x2 = hourly wage in legal employment,
x3 = income other than from crime or employment,
x4 = probability of getting caught,
x5 = probability of being convicted if caught,
x6 = expected sentence if convicted, and
x7 = age.

Example: Job Training & Worker Productivity
What is the effect of additional training on worker productivity? Simple reasoning leads to a model such as
wage = f(educ, exper, training),
where
wage = hourly wage,
educ = years of formal education,
exper = years of workforce experience, and
training = weeks spent in job training.

✏ A model in ML is called the Machine or the Hypothesis.
✏ These are deterministic relationships! Once we know the values of educ, exper and training we automatically know the value of wage if the functional form f is known (and similarly for the crime equation).

⚠ One needs to transform the economic model into an econometric model. This often requires that
• The functional form, f(·), has to be specified.
• Variables may have to be approximated by other quantities.

Example: Econometric Model of Crime
An econometric model of crime can be:
crime = β0 + β1·wage_m + β2·othinc + β3·freqarr + β4·freqconv + β5·avgsen + β6·age + u,
where
crime = some measure of the frequency of criminal activity,
wage_m = the wage that can be earned in legal employment,
othinc = the income from other sources (assets, inheritance, and so on),
freqarr = the frequency of arrests for prior infractions (to approximate the probability of arrest),
freqconv = the frequency of conviction,
avgsen = the average sentence length after conviction,
age = age, and
u = unobservable characteristics such as moral character, "wages" in criminal activities, family socio-economic background, etc.

Example: Job Training & Worker Productivity
wage = β0 + β1·educ + β2·exper + β3·training + u,
where
wage = hourly wage,
educ = years of formal education,
exper = years of workforce experience,
training = weeks spent in job training, and
u = unobservable characteristics such as "innate ability," quality of education, family background, etc.

✏ The econometric models now introduce parameters, i.e., β0, β1, β2, ... Parameters are unknown scalars.
✏ The econometric models now also introduce a variable, u, to account for things the econometrician does not observe.

Ch01 Page 3
⭐ Econometrics will allow us to estimate (guess) the values of the parameters β0, β1, β2, β3 with economic data on wage, educ, exper, and training (let's say) while accounting for the presence of u.

Ch01 Page 4
The Structure of
Economic Data
Cross-Sectional Data

✏ In Economics, we call the column containing the variable to be explained the dependent variable (also: response, outcome). In ML, this column is called the 'output.'

✏ In Economics, we call the remaining columns the independent variables (also: regressors, controls, covariates, confounders). In ML, these columns are called 'features' or inputs.

Time Series Data

Ch01 Page 5
Pooled Cross Sections

Ch01 Page 6
Panel (longitudinal) Data

Ch01 Page 7
Ch01 Page 8
Causality & Ceteris
Paribus
Definition
The causal effect of x on y can be defined as: "How does the variable y change if the variable x changes but all other relevant factors are held constant?"

✏ The notion of 'ceteris paribus' (which means "other (relevant) factors being equal") plays an important role in causal analysis. Correlation does not mean causation.

✏ A model used for prediction can solely use correlations, but it is understood that causal models should in principle predict better! The notion of causation is implicit when talking about 'model interpretability' in ML.

Effects of Fertilizer on Crop Yield
"By how much will the production of soybeans increase if one increases the amount of fertilizer applied to the ground?"
Implicit assumption: all other factors that influence crop yield such as quality of land, rainfall, presence of parasites, and so on are held fixed.
Feasible Experiment: Choose several one-acre plots of land; randomly assign different amounts of fertilizer to the different plots; compare yields.
Key point: "randomly assign" makes the amount of fertilizer independent (and therefore uncorrelated) of other plot features that affect yield.
The experiment works because the amount of fertilizer applied is unrelated to other factors influencing crop yields.

Measuring the Return to Education
"If a person is chosen from the population and given another year of education, by how much will his or her wage increase?"
Implicit assumption: all other factors that influence wages such as experience, family background, intelligence, and so on are held fixed.
Infeasible Experiment: Choose a group of people; randomly assign different amounts of education to them (infeasible!); compare wage outcomes.
Key point: "randomly assign" makes education independent (and therefore uncorrelated) of other factors that influence wages.
Problem: without random assignment, the amount of education is related to other factors that influence wages (e.g. intelligence).

Effect of Law Enforcement on City Crime Levels
"If two cities are the same in all respects, except that city A has ten more police officers than city B, by how much would the two cities' crime rates differ?"
Implicit assumption: all other factors that influence crime such as population characteristics, geographic location, and so on are held fixed.
Infeasible Experiment: Randomly assign the number of police officers to a large number of cities.
Key point: "randomly assign" makes the number of police officers independent (and therefore uncorrelated) of other factors that determine the crime rate.
In reality, the number of police officers will be determined by the crime rate, i.e., simultaneous determination of police numbers and crime rate.

Effect of Minimum Wage on Unemployment
"By how much will unemployment increase if the minimum wage is increased by a certain amount (holding other things fixed)?"
Implicit assumption: all other factors that influence unemployment such as the demand for labor, prices, and so on are held fixed.
Infeasible Experiment: Government randomly chooses the minimum wage each year and observes unemployment outcomes.
Key point: "randomly chooses" makes the minimum wage independent (and therefore uncorrelated) of other factors that determine unemployment.
In reality, the level of the minimum wage will depend on political and economic factors that also influence unemployment.

Ch01 Page 9
Chapter 2: The Simple
Regression Model
1. Definition
2. OLS: Derivation
3. OLS: Algebraic Properties
4. Nonlinearities
5. OLS: Expected Values & Variance

Ch02 Page 1
Definition
An example of a model that will "explain y in terms of x" is the simple linear regression model:

y = β0 + β1·x + u.

Δy = β1·Δx if Δu = 0: the change in y is simply β1 multiplied by the change in x, holding the other factors in u fixed.

✏ Facing two facts: 1) What is the functional relationship between y and x? (Here: linear.) 2) How do we allow for other factors to affect y? (Through the error term u.)

✏ In ML, this is an example of a model in Supervised Learning.

Parameters:
β0: Intercept.
β1: Slope.

Variables:
y: dependent variable, explained variable, response variable, predicted variable, regressand.
x: independent variable, explanatory variable, control variable, predictor variable, regressor.
u: error term (unobservables), disturbance, unobservable variable.

Example: Soybean Yield & Fertilizer
yield = β0 + β1·fertilizer + u.
• The error term u contains factors such as land quality, rainfall, and so on.
• The coefficient (parameter) β1 measures the effect of fertilizer on yield, holding other factors fixed, i.e., Δyield = β1·Δfertilizer.

Example: A Simple Wage Regression
wage = β0 + β1·educ + u.
• The error term u contains factors such as labor force experience, innate ability, tenure with current employer, work ethic, and so on.
• The coefficient (parameter) β1 measures the change in hourly wage given another year of education, holding other factors fixed, i.e., Δwage = β1·Δeduc.

When is there a Causal Interpretation?

The key assumption is going to be on the joint distribution of the observed control, x, and the unobserved error u:

E(u|x) = E(u) ⬅ Mean Independence Assumption.

⚠ This is a very strong assumption and unlikely to hold in practice for non-experimental data.

Example: Soybean Yield & Fertilizer
yield = β0 + β1·fertilizer + u.
Feasible Experiment: Choose several one-acre plots of land; randomly assign different amounts of fertilizer to the different plots; compare yields.
Because of random assignment, the random variables fertilizer and u are statistically independent and therefore mean-independent.
⚠ With observational data, if more fertilizer is put on the higher-quality (in u) plots of land, then mean independence will fail.

Example: A Simple Wage Regression
wage = β0 + β1·educ + u.
Infeasible Experiment: Choose a group of people; randomly assign different amounts of education to them (infeasible!); compare wage outcomes.
Because of random assignment, the random variables educ and u would be statistically independent and therefore mean-independent.
⚠ With observational data, you should expect people with higher levels of ability (in u) to get higher levels of education, so mean independence will fail.

Without loss of generality, the mean independence assumption, E(u|x) = E(u), is further strengthened to:

E(u|x) = 0 ⬅ Zero Conditional Mean Assumption.

Why?
Firstly, notice that putting the mean independence assumption together with the zero conditional mean assumption, one has E(u) = E[E(u|x)] = 0.

Now assume that you're certain that the unobservable, u, cannot have mean zero but instead E(u) = α0 ≠ 0. In this case

Ch02 Page 2

y = β0 + β1·x + u = (β0 + α0) + β1·x + (u − α0) = β0* + β1·x + u*,
where β0* = β0 + α0 is a new intercept that relates to the original one by β0* = β0 + α0, and u* = u − α0 is a new error term with E(u*) = 0.

Therefore, even if you believe that the original unobservable, u, cannot have mean zero, we have just shown that the original model can be re-written in terms of an alternative model with the same slope parameter, β1 (the one we usually care for), a different intercept and a new mean-zero error term, u*.

Population Regression Function (PRF)

Notice that the zero conditional mean assumption implies

E(y|x) = β0 + β1·x.

✏ This means that the expected value of the dependent variable can be expressed as a linear function of the explanatory variable.
✏ The linearity means that a one-unit increase in x changes the expected value of y by the amount β1.

❗ Also notice that the zero conditional mean assumption has two other implications:
E(u) = 0,
Cov(x, u) = E(x·u) = 0.

✏ Supervised learning is where you have input variables (x) and an output variable (y) and you use an algorithm to learn the mapping function from the input to the output. The goal is to approximate the mapping function so well that when you have new input data (x) you can predict the output variable (y) for that data.
From <https://machinelearningmastery.com/supervised-and-unsupervised-machine-learning-algorithms/>
In this language, the "Simple Linear Regression Model" is the mapping to be learned and "Ordinary Least Squares" estimation is the learning algorithm.

Ch02 Page 3
OLS: Derivation
Recall that the zero conditional mean assumption implied
E(u) = 0 and E(x·u) = 0, i.e.,
E(y − β0 − β1·x) = 0,
E[x·(y − β0 − β1·x)] = 0.

If we were to know the actual values of E(y), E(x), E(x·y), and E(x²), then we would have a system of 2 linear equations with two unknowns, i.e., β0 and β1. In this simple case we can write the explicit solutions:
β1 = Cov(x, y)/Var(x), β0 = E(y) − β1·E(x).

⚠ Since we do not know the actual numerical values of E(y), E(x), E(x·y), and E(x²), we cannot calculate the actual numerical values of the parameters, β0 and β1. Therefore this is infeasible!

Ordinary Least Squares Estimator

(random sampling assumption)
We have a random sample of size n, {(x_i, y_i) : i = 1, ..., n}, following the population model in the equation y = β0 + β1·x + u.
"Random sample" means that any two pairs (x_i, y_i) and (x_j, y_j) for i ≠ j are statistically independent and came from the same joint distribution.
"Following the population model" means that each pair, say (x_i, y_i) for i = 1, ..., n, in the sample is such that y_i = β0 + β1·x_i + u_i and E(u_i|x_i) = 0.

How can we 'guess' (estimate) the numerical values of β0 and β1 based on a random sample fulfilling the random sampling assumption? Replace each population moment with its sample analogue:

Population Moment → Estimator:
E(y − β0 − β1·x) = 0 → (1/n)·Σ (y_i − β̂0 − β̂1·x_i) = 0,
E[x·(y − β0 − β1·x)] = 0 → (1/n)·Σ x_i·(y_i − β̂0 − β̂1·x_i) = 0.

Further simplification using the results in Appendix A yields
β̂1 = Σ (x_i − x̄)(y_i − ȳ) / Σ (x_i − x̄)²,
β̂0 = ȳ − β̂1·x̄.

⚠ These are not true: β̂0 = β0, β̂1 = β1 (the estimates generally differ from the unknown population parameters).

Since the estimators β̂0 and β̂1 are found by solving an ordinary system of linear equations, they are called the Ordinary Least Squares (OLS) estimators.
The "Least Squares" part of the name comes from the fact that this ordinary system of linear equations gives the first-order conditions (FOC) from minimizing (least) the sum of the squared errors (squares), i.e.,
min Σ (y_i − b0 − b1·x_i)²
with respect to b0 and b1.
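A minimal NumPy sketch of the OLS formulas just derived, applied to simulated data; the 'true' values β0 = 1 and β1 = 2 and all other settings are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.normal(5, 2, n)
u = rng.normal(0, 1, n)
y = 1 + 2 * x + u          # population model with beta0 = 1, beta1 = 2

# Sample-analogue (OLS) estimators from the derivation above
b1_hat = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
b0_hat = y.mean() - b1_hat * x.mean()
print(b0_hat, b1_hat)      # close to, but not exactly equal to, 1 and 2
```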

Further Estimators

Name / Object → Estimator / Name:
Intercept: β0 → β̂0 (the constant).
Slope: β1 → β̂1 (the coefficient).
Population Regression Function (PRF): E(y|x) = β0 + β1·x → Sample Regression Function (SRF): ŷ = β̂0 + β̂1·x.
Error for observation i: u_i = y_i − β0 − β1·x_i → Residual for observation i: û_i = y_i − β̂0 − β̂1·x_i.
Value of y when x = x_i: y_i → Fitted value for x = x_i: ŷ_i = β̂0 + β̂1·x_i.

⚠ These are not true:
β̂0 = β0,
β̂1 = β1.

Ch02 Page 4
Example: CEO Salary & Return on Equity

Regression Model: salary = β0 + β1·roe + u.

Fitted Regression: salary-hat = 963.191 + 18.501·roe, n = 209.

• If the return on equity is zero, roe = 0, then the predicted salary is the intercept, 963.191, which equals $963,191 since salary is measured in thousands.
• If the return on equity increases by one percentage point, Δroe = 1, then salary is predicted to change by about 18.5, or $18,500.

⚠ These are not true: β0 = 963.191, β1 = 18.501 (the estimates are not the population parameters).

Example: Wage & Education

Regression Model: wage = β0 + β1·educ + u.

Fitted Regression: wage-hat = −0.90 + 0.54·educ.

• The intercept of −0.90 literally means that a person with no education has a predicted hourly wage of −90¢ an hour.
• One more year of education increases predicted hourly wage by 54¢ an hour. Therefore, four more years of education increase the predicted wage by 4(0.54) = 2.16, or $2.16 per hour.

⚠ These are not true: β0 = −0.90, β1 = 0.54 (the estimates are not the population parameters).

Example: Voting Outcomes & Campaign Expenditures

Regression Model: voteA = β0 + β1·shareA + u.

Fitted Regression:
voteA-hat = 26.81 + 0.464·shareA.

• This means that if Candidate A's share of spending increases by one percentage point, Candidate A receives almost one-half a percentage point (0.464) more of the total vote.
• If shareA = 50, voteA is predicted to be about 50, or half the vote: 26.81 + 0.464*50 = 50.01.

⚠ These are not true: β0 = 26.81, β1 = 0.464 (the estimates are not the population parameters).

Fitted Values & Residuals

Ch02 Page 5
Fitted Values: ŷ_i = β̂0 + β̂1·x_i, i = 1, ..., n.
Residuals: û_i = y_i − ŷ_i = y_i − β̂0 − β̂1·x_i, i = 1, ..., n.

• For example, the 12th CEO's fitted salary is $526,023 higher than their actual salary (a negative residual).
• By contrast, the 5th CEO's fitted salary is $149,493 lower than their actual salary (a positive residual).

Ch02 Page 6
OLS: Algebraic
Properties

(1) The sum (therefore the sample average) of the OLS residuals is zero:
Σ û_i = 0.

(2) The sample covariance between the regressor, x_i, and the OLS residuals, û_i, is zero:
Σ x_i·û_i = 0.

(3) The point (x̄, ȳ) is always on the OLS regression line.

Notice that y_i = ŷ_i + û_i and therefore by property (1) above one has ȳ = (1/n)·Σ ŷ_i.

Analysis of Variance (Variance Decomposition)

SST = SSE + SSR

Total Sum of Squares: SST = Σ (y_i − ȳ)².
Explained Sum of Squares: SSE = Σ (ŷ_i − ȳ)².
Residual Sum of Squares: SSR = Σ û_i².

• SST is the total sample variability.
• SSE is the part of the sample variability that is explained by the model.
• SSR is the part of the sample variability that is not explained by the model.

Proof:
SST = Σ (y_i − ȳ)² = Σ [(y_i − ŷ_i) + (ŷ_i − ȳ)]² = Σ û_i² + 2·Σ û_i·(ŷ_i − ȳ) + Σ (ŷ_i − ȳ)² = SSR + SSE.
The cross term Σ û_i·(ŷ_i − ȳ) = 0 because of properties (1) and (2).

Goodness-of-Fit

Definition
R² = SSE/SST = 1 − SSR/SST.

• R² is the fraction of the sample variance in y that is explained by x.
• 100·R² is the percentage of the sample variation in y that is explained by x.
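Continuing with simulated data, this sketch verifies the decomposition SST = SSE + SSR and computes R² both ways (all names and numbers below are illustrative, not from the text's data sets):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)
y = 3 + 0.5 * x + rng.normal(size=n)

b1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x            # fitted values
u_hat = y - y_hat              # residuals

SST = ((y - y.mean()) ** 2).sum()
SSE = ((y_hat - y.mean()) ** 2).sum()
SSR = (u_hat ** 2).sum()
print(np.isclose(SST, SSE + SSR))    # variance decomposition holds
print(SSE / SST, 1 - SSR / SST)      # the two equivalent R-squared formulas agree
```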

Example: CEO Salary & Return on Equity

Regression Model: salary = β0 + β1·roe + u.

Fitted Regression:
salary-hat = 963.191 + 18.501·roe, n = 209, R² = 0.0132.

• The firm's return on equity explains only about 1.3% of the variation in salaries for this sample of 209 CEOs.

Example: Voting Outcomes & Campaign Expenditures

Regression Model: voteA = β0 + β1·shareA + u.

Fitted Regression:
voteA-hat = 26.81 + 0.464·shareA, R² = 0.856.

Ch02 Page 7
The share of campaign expenditures explains over 85% of the
variation in the election outcomes for this sample.

Ch02 Page 8
Nonlinearities

Example: A Log Wage Equation

Regression Model: log(wage) = β0 + β1·educ + u.

Fitted Regression:
log(wage)-hat = 0.584 + 0.083·educ.

• The intercept of 0.584 literally gives the predicted log(wage) when educ = 0.
• wage increases by 8.3% (100 × 0.083) for every additional year of education.

⚠ These are not true:
• Another year of education increases log(wage) by 8.3%.
• Another year of education increases wage by 0.083 units.

Example: CEO Salary & Firm Sales

Regression Model: log(salary) = β0 + β1·log(sales) + u.

Fitted Regression:
log(salary)-hat = 4.822 + 0.257·log(sales).

• It implies that a 1% increase in firm sales increases CEO salary by about 0.257%, the usual interpretation of an elasticity.

⚠ These are not true:
• If sales increases by 1 unit, then salary increases by 0.257 units.

Model | Dependent Variable | Independent Variable | Interpretation of β1
Level-level | y | x | Δy = β1·Δx
Level-log | y | log(x) | Δy = (β1/100)·%Δx
Log-level | log(y) | x | %Δy = (100·β1)·Δx
Log-log | log(y) | log(x) | %Δy = β1·%Δx

Ch02 Page 9
OLS: Expected Values &
Variance
Recall that
β̂1 = Σ (x_i − x̄)(y_i − ȳ) / Σ (x_i − x̄)², β̂0 = ȳ − β̂1·x̄.

• The data {(x_i, y_i) : i = 1, ..., n} is a random sample, i.e., realizations from the joint distribution of (x, y).

❗ Therefore, the OLS estimators β̂0 and β̂1 are functions of random variables, so they are random variables themselves, i.e., we can calculate their expected values and variances.

Unbiasedness of OLS

Assumption SLR.1 (Linear in Parameters)
In the population model, the dependent variable, y, is related to the independent variable, x, and the error (or disturbance), u, as
y = β0 + β1·x + u,
where β0 and β1 are the population intercept and slope parameters, respectively.

Assumption SLR.2 (Random Sampling)
The data {(x_i, y_i) : i = 1, ..., n} is a random sample from the joint distribution of (x, y) that follows the population model
y = β0 + β1·x + u.

Assumption SLR.3 (Sample Variation in the Explanatory Variable)
The sample outcomes on x, namely {x_i : i = 1, ..., n}, are not all the same value.

Assumption SLR.4 (Zero Conditional Mean)
The error has an expected value of zero given any value of the explanatory variable, i.e.,
E(u|x) = 0.

Theorem (Unbiasedness of OLS)
Let Assumptions SLR.1-SLR.4 hold, then
E(β̂0) = β0,
E(β̂1) = β1.

• The estimated coefficients may be smaller or larger, depending on the sample that is the result of a random draw.
• However, on average, they will be equal to the values that characterize the true relationship between y and x in the population.
• "On average" means if sampling was repeated, i.e. if drawing the random sample and doing the estimation was repeated many times.
• In a given sample, estimates may differ considerably from true values.

Variance of the OLS Estimator

Depending on the sample, the estimates will be nearer or farther away from the true population values.
How far can we expect our estimates to be away from the true population values on average (= sampling variability)?
Sampling variability is measured by the estimators' variances,
Var(β̂0),
Var(β̂1).

Notice that for the population model y = β0 + β1·x + u, we have
E(y|x) = β0 + β1·x,
Var(y|x) = Var(u|x).

Possibility 1 for Var(u|x): Homoskedasticity (the variance of u does not depend on x). Possibility 2 for Var(u|x): Heteroskedasticity (the variance of u changes with x).

Ch02 Page 10
Assumption SLR.5 (Homoskedasticity)
The error has the same variance given any value of the explanatory variable, i.e.,
Var(u|x) = σ².

Theorem (Sampling Variance of OLS)
Let Assumptions SLR.1-SLR.5 hold, then
Var(β̂1) = σ² / Σ (x_i − x̄)² = σ² / SST_x,
Var(β̂0) = σ² · [(1/n)·Σ x_i²] / SST_x.

• The sampling variability of the estimated regression coefficients will be higher, the larger the variability of the unobserved factors, and lower, the higher the variation in the explanatory variable.
• Although we can calculate SST_x and (1/n)·Σ x_i² from the sample, we do not observe σ².
⚠ Therefore, in order to estimate Var(β̂0) and Var(β̂1), we need to estimate the generally unknown error variance, σ².

Estimating the Error Variance


Firstly, recall that by the Law of Total Variance: Var(u) = E[Var(u|x)] + Var[E(u|x)]. Therefore, by Assumptions SLR.4 and SLR.5 we have that
Var(u) = σ² + 0 = σ² = E(u²).

Infeasible Estimator of σ²: (1/n)·Σ u_i² (the errors u_i are not observed).
Feasible Estimators of σ²:
Biased: (1/n)·Σ û_i².
Unbiased: σ̂² = [1/(n − 2)]·Σ û_i² = SSR/(n − 2).

Standard Error of the Regression: σ̂ = sqrt(σ̂²).
Standard Error of β̂1: se(β̂1) = σ̂ / sqrt(SST_x).
Standard Error of β̂0: se(β̂0) = σ̂ · sqrt[(1/n)·Σ x_i² / SST_x].
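A sketch of these error-variance and standard-error formulas on simulated data (the true error standard deviation of 2 and the rest of the setup are arbitrary illustration values):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300
x = rng.normal(10, 3, n)
y = 2 - 0.7 * x + rng.normal(0, 2, n)     # true error s.d. = 2

b1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
b0 = y.mean() - b1 * x.mean()
u_hat = y - b0 - b1 * x

SST_x = ((x - x.mean()) ** 2).sum()
sigma2_hat = (u_hat ** 2).sum() / (n - 2)            # unbiased estimator of sigma^2
se_b1 = np.sqrt(sigma2_hat / SST_x)
se_b0 = np.sqrt(sigma2_hat * (x ** 2).mean() / SST_x)
print(np.sqrt(sigma2_hat), se_b1, se_b0)             # sigma-hat should be near 2
```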

Ch02 Page 11
Chapter 3: Multiple Linear
Regression (Estimation)

1. Definition
2. OLS: Derivation
3. OLS: Interpretation & Partialling Out
4. OLS: Algebraic Properties
5. OLS: Expected Values
6. OLS: Variance
7. OLS: Efficiency (Gauss-Markov Theorem)

Ch03 Page 1
Definition
The general multiple linear regression model (also called the multiple regression model) can be written in the population as

y = β0 + β1·x1 + β2·x2 + ... + βk·xk + u.

Parameters:
β0: Intercept.
β1: Slope parameter associated with x1.
β2: Slope parameter associated with x2.
...
βk: Slope parameter associated with xk.
There are k + 1 parameters in the model.

Variables:
y: dependent variable, explained variable, response variable, predicted variable, regressand.
x1, x2, ..., xk: independent variables, explanatory variables, control variables, predictor variables, regressors.
u: error term, disturbance, unobservable variable.

Example: Wage Equation
wage = β0 + β1·educ + β2·exper + u.
• The equation effectively takes exper out of the error term and puts it explicitly in the equation. Because exper appears in the equation, its coefficient, β2, measures the ceteris paribus effect of exper on wage, which is also of some interest.
• Because the equation contains experience explicitly, we will be able to measure the effect of education on wage, holding experience fixed.

Example: A Simple Test Score Regression
avgscore = β0 + β1·expend + β2·avginc + u.
• The coefficient of interest for policy purposes is β1, the ceteris paribus effect of per-student spending, expend, on avgscore.
• Per-student spending, expend, is likely to be correlated with average family income, avginc, at a given high school because of school financing. Omitting average family income, avginc, from the regression would lead to a violation of the zero conditional mean assumption.

Example: Family Income & Consumption
cons = β0 + β1·inc + β2·inc² + u.
• This is a nonlinear function in income, inc, i.e., consumption is a quadratic function of inc.
• It makes no sense to measure the effect of inc on cons while holding inc² fixed, because if inc changes, then so must inc²!

✏ Basically you should think of including many regressors on the right-hand side of a regression model as equivalent to taking them out of the error term and therefore accounting for their effects explicitly, mimicking the ceteris-paribus idea.

⚠ The model must be linear in the parameters, not in the explanatory variables.

When is there a Causal Interpretation?

The key assumption is going to be on the joint distribution of the observed controls, x1, x2, ..., xk, and the unobserved error u:

E(u|x1, x2, ..., xk) = 0 ⬅ Zero Conditional Mean Assumption.

❗ Also notice that the zero conditional mean assumption has the following implications:
E(u) = 0 and Cov(xj, u) = 0 for j = 1, ..., k.

Ch03 Page 2
Ch03 Page 3
OLS: Derivation

Recall that the zero conditional mean assumption implied
E(u) = 0 and E(xj·u) = 0 for j = 1, ..., k, i.e.,
E(y − β0 − β1·x1 − ... − βk·xk) = 0,
E[xj·(y − β0 − β1·x1 − ... − βk·xk)] = 0, j = 1, ..., k.

• If we were to know the actual values of the population moments E(y), E(xj), E(xj·y), and E(xj·xh) for all j and h, then we would have a system of (k + 1) linear equations with (k + 1) unknowns, i.e., β0, β1, ..., βk.

Ordinary Least Squares Estimator

(random sampling assumption)
We have a random sample of size n, {(x_i1, x_i2, ..., x_ik, y_i) : i = 1, ..., n}, following the population model in the equation above.

Population Moment → Estimator:
E(y − β0 − β1·x1 − ... − βk·xk) = 0 → (1/n)·Σ (y_i − β̂0 − β̂1·x_i1 − ... − β̂k·x_ik) = 0,
E[xj·(y − β0 − ... − βk·xk)] = 0 → (1/n)·Σ x_ij·(y_i − β̂0 − β̂1·x_i1 − ... − β̂k·x_ik) = 0, j = 1, ..., k.

The β̂0, β̂1, ..., β̂k are therefore the solution to an ordinary system of (k + 1) linear equations and are called the Ordinary Least Squares (OLS) estimators.
The "Least Squares" part of the name comes from the fact that this ordinary system of (k + 1) linear equations gives the first-order conditions (FOC) from minimizing (least) the sum of the squared errors (squares), i.e.,
min Σ (y_i − b0 − b1·x_i1 − ... − bk·x_ik)²
with respect to b0, b1, ..., bk.

Name / Object → Estimator / Name:
Intercept: β0 → β̂0 (the constant).
Slopes: β1, ..., βk → β̂1, ..., β̂k (the coefficients).
Population Regression Function (PRF): E(y|x1, ..., xk) = β0 + β1·x1 + ... + βk·xk → Sample Regression Function (SRF): ŷ = β̂0 + β̂1·x1 + ... + β̂k·xk.
Error for observation i: u_i = y_i − β0 − β1·x_i1 − ... − βk·x_ik → Residual for observation i: û_i = y_i − β̂0 − β̂1·x_i1 − ... − β̂k·x_ik.
Value of y at observation i: y_i → Fitted value for observation i: ŷ_i = β̂0 + β̂1·x_i1 + ... + β̂k·x_ik.
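A minimal NumPy sketch of the multiple-regression OLS estimator, obtained by solving the (k + 1) first-order conditions (the normal equations X'Xb = X'y); the data-generating values are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)            # correlated regressors
y = 1 + 2 * x1 - 3 * x2 + rng.normal(size=n)  # beta0 = 1, beta1 = 2, beta2 = -3

X = np.column_stack([np.ones(n), x1, x2])     # design matrix with an intercept column
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # solves the k+1 normal equations
print(beta_hat)                               # estimates close to [1, 2, -3]
```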

Ch03 Page 4
OLS: Interpretation &
Partialling Out
The SRF (OLS regression line) is
ŷ = β̂0 + β̂1·x1 + β̂2·x2 + ... + β̂k·xk,
or, in terms of changes,
Δŷ = β̂1·Δx1 + β̂2·Δx2 + ... + β̂k·Δxk.

Definition
The coefficient β̂1 on x1 measures the change in ŷ due to a one-unit increase in x1, holding all other independent variables fixed. That is,
Δŷ = β̂1·Δx1, holding x2, x3, ..., xk fixed.
Thus, we have controlled for the variables x2, x3, ..., xk when estimating the effect of x1 on y. The other coefficients have a similar interpretation.

Example: Determinant of College GPA

colGPA-hat = 1.29 + 0.453·hsGPA + 0.0094·ACT.

• The intercept 1.29 is the predicted college GPA, colGPA, if hsGPA and ACT are both set at zero.
• Holding ACT fixed, another point on hsGPA is associated with .453 of a point on the college GPA, or almost half a point.
• Alternatively, if we choose two students, A and B, and these students have the same ACT score, but the high school GPA of Student A is one point higher than the high school GPA of Student B, then we predict Student A to have a college GPA .453 higher than that of Student B.
• The sign on ACT implies that, while holding hsGPA fixed, a change in the ACT score of 10 points (a very large change, since the maximum score is 36 and the average score in the sample is about 24 with a standard deviation less than three) affects colGPA by less than one-tenth of a point: 0.0094*10 = 0.094.

Example: Hourly Wage Equation

log(wage)-hat = 0.284 + 0.092·educ + 0.0041·exper + 0.022·tenure.

• The coefficient .092 means that, holding exper and tenure fixed, another year of education is predicted to increase wage by approximately 9.2% [100(.092)].
• Sometimes, we want to change more than one independent variable at the same time to find the resulting effect on the dependent variable. We can obtain the estimated effect on wage when an individual stays at the same firm for another year: exper (general workforce experience) and tenure both increase by one year. The total effect (holding educ fixed) is
Δlog(wage)-hat = 0.0041·Δexper + 0.022·Δtenure = 0.0041 + 0.022 = 0.0261,
or 2.61%.

A “Partialling Out” Interpretation of Multiple Regression

Ch03 Page 5

One can show that the estimated coefficient, β̂1, of an explanatory variable, x1, in a multiple regression
ŷ = β̂0 + β̂1·x1 + β̂2·x2 + ... + β̂k·xk
can be obtained in two steps:

Step 1: Regress the explanatory variable, x1, on all other explanatory variables, x2, x3, ..., xk, and keep the residuals, r̂1.

Step 2: Regress the dependent variable, y, on the residuals r̂1 from the regression in Step 1.

Why does this procedure work?
The residuals from the first regression are the part of the explanatory variable that is uncorrelated with the other explanatory variables. The slope coefficient of the second regression therefore represents the isolated effect of the explanatory variable on the dependent variable.
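The two-step recipe can be checked numerically. In this sketch (simulated data; names such as r1_hat are just illustrative) the Step 2 slope matches the coefficient on x1 from the full multiple regression:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 2000
x2 = rng.normal(size=n)
x1 = 0.8 * x2 + rng.normal(size=n)                 # x1 correlated with x2
y = 1 + 2 * x1 + 5 * x2 + rng.normal(size=n)

# Full multiple regression of y on (1, x1, x2)
X = np.column_stack([np.ones(n), x1, x2])
b_full = np.linalg.solve(X.T @ X, X.T @ y)

# Step 1: regress x1 on the other regressors and keep the residuals
Z = np.column_stack([np.ones(n), x2])
g = np.linalg.solve(Z.T @ Z, Z.T @ x1)
r1_hat = x1 - Z @ g

# Step 2: regress y on the Step-1 residuals
b1_two_step = (r1_hat * y).sum() / (r1_hat ** 2).sum()
print(b_full[1], b1_two_step)                      # the two numbers coincide
```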

Ch03 Page 6
OLS: Algebraic Properties

(1) The sum (therefore the sample average) of the OLS residuals is zero:
Σ û_i = 0.

(2) The sample covariance between each regressor, x_i1, x_i2, ..., x_ik, and the OLS residuals, û_i, is zero:
Σ x_ij·û_i = 0 for j = 1, ..., k.

(3) The point (x̄1, x̄2, ..., x̄k, ȳ) is always on the OLS regression line.

Notice that y_i = ŷ_i + û_i and therefore by property (1) above one has ȳ = (1/n)·Σ ŷ_i.

Analysis of Variance (Variance Decomposition)

SST = SSE + SSR

Total Sum of Squares: SST = Σ (y_i − ȳ)².
Explained Sum of Squares: SSE = Σ (ŷ_i − ȳ)².
Residual Sum of Squares: SSR = Σ û_i².

Goodness of Fit

Definition
R² = SSE/SST = 1 − SSR/SST.

• R² is the fraction of the sample variance in y that is explained by x1, x2, ..., xk.
⚠ The R² can only increase when adding regressors.
• R² can also be shown to equal the squared correlation coefficient between the actual y_i and the fitted values ŷ_i. That is,
R² = [Corr(y_i, ŷ_i)]².

Example: Explaining Arrest Records

Fitted Regression 1:
narr86-hat = 0.712 − 0.150·pcnv − 0.034·ptime86 − 0.104·qemp86, n = 2,725, R² = 0.0413.

• If the proportion of prior arrests that led to a conviction (pcnv) increases by 0.5, the predicted fall in arrests is 7.5 arrests per 100 men: 0.5*(−0.150) = −0.075, times 100 = 7.5.
• If the months in prison (ptime86) increase from 0 to 12, the predicted fall in arrests is 0.408 arrests for a particular man: 12*(−0.034) = −0.408.
• If the quarters employed (qemp86) increase by 1, the predicted fall in arrests is 10.4 arrests per 100 men: 1*(−0.104)*100 = −10.4.

Fitted Regression 2:
narr86-hat = 0.707 − 0.151·pcnv + 0.0074·avgsen − 0.037·ptime86 − 0.103·qemp86, R² = 0.0422.

• Adding the average sentence variable (avgsen) increases R² from .0413 to .0422, a practically small effect.
• The sign of the coefficient on avgsen is also unexpected: it says that a longer average sentence length increases criminal activity.

Ch03 Page 7
Ch03 Page 8
OLS: Expected Values
Unbiasedness of OLS

Assumption MLR.1 (Linear in Parameters)
In the population model, the dependent variable, y, is related to the independent variables, x1, x2, ..., xk, and the error (or disturbance), u, as
y = β0 + β1·x1 + β2·x2 + ... + βk·xk + u,
where β0, β1, ..., βk are the unknown parameters (constants) of interest and u is the unobserved error.
✏ Recall that the model must be linear in the parameters of interest (not necessarily in the explanatory variables).
Assumption MLR.2 (Random Sampling)
The data {(x_i1, x_i2, ..., x_ik, y_i) : i = 1, ..., n} is a random sample from the joint distribution of (x1, x2, ..., xk, y) that follows the population model.
✏ An implication of this assumption is that for a randomly chosen observation i of the data set it must be true that
y_i = β0 + β1·x_i1 + β2·x_i2 + ... + βk·x_ik + u_i.
Assumption MLR.3 (No Perfect Collinearity)
In the sample (and therefore in the population), none of the independent variables is constant, and there are no exact linear relationships among the independent variables.
✏ The assumption only rules out perfect collinearity/correlation between explanatory variables; imperfect correlation is allowed.
✏ Perfect Collinearity (Example): one regressor is an exact linear function of the others, e.g., x2 = 3·x1.
✏ Constant variables are also ruled out (collinear with the intercept).

Assumption MLR.4 (Zero Conditional Mean)
The error has an expected value of zero given any value of the explanatory variables, i.e.,
E(u|x1, x2, ..., xk) = 0.
✏ In a multiple regression model, the zero conditional mean assumption is much more likely to hold because fewer things end up in the error.
✏ If Assumption MLR.4 holds, we say that the explanatory variables are exogenous; if xj is correlated with the error term, u, we say that xj is endogenous.

Theorem (Unbiasedness of OLS)
Let Assumptions MLR.1-MLR.4 hold, then
E(β̂0) = β0,
E(β̂1) = β1,
...
E(β̂k) = βk,
where β̂0, β̂1, ..., β̂k are the OLS estimators.

Ch03 Page 9
Ch03 Page 10
Including Irrelevant
Regressors
Suppose we specify the model (an example of a misspecified, here overspecified, model)

y = β0 + β1·x1 + β2·x2 + β3·x3 + u,

where Assumptions MLR.1-MLR.4 hold.

However, x3 has no effect on y after x1 and x2 have been controlled for, which means that β3 = 0.

In this case the PRF is E(y|x1, x2, x3) = β0 + β1·x1 + β2·x2.

If we were to regress y on x1, x2, and x3 and obtain the OLS estimates β̂0, β̂1, β̂2, and β̂3, then by virtue of Assumptions MLR.1-MLR.4 we have
E(β̂0) = β0, E(β̂1) = β1, E(β̂2) = β2, E(β̂3) = 0.

Including one or more irrelevant variables in a multiple regression model, or overspecifying the model, does not affect the unbiasedness of the OLS estimators.

Ch03 Page 11
Excluding Relevant
Regressors
Suppose the true model is

y = β0 + β1·x1 + β2·x2 + u

➕ (relationship between the regressors): x2 = δ0 + δ1·x1 + v. Then:

y = (β0 + β2·δ0) + (β1 + β2·δ1)·x1 + (u + β2·v).

Therefore, if one runs a regression of y on x1 only (an example of a misspecified model), then if Assumptions SLR.1-SLR.4 hold for this regression,

E(β̃1) = β1 + β2·δ1, ⚠ which differs from β1 unless β2 = 0 or δ1 = 0.

bias = E(β̃1) − β1 = β2·δ1.

⚠ Therefore, if one omits a relevant (β2 ≠ 0) regressor, the resulting OLS estimators are biased, and the term β2·δ1 is called the omitted variable bias.

Direction of the bias:
Corr(x1, x2) > 0, β2 > 0: Positive Bias. Corr(x1, x2) < 0, β2 > 0: Negative Bias.
Corr(x1, x2) > 0, β2 < 0: Negative Bias. Corr(x1, x2) < 0, β2 < 0: Positive Bias.

There are no similarly simple results for the general case with more than two regressors.
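A small Monte Carlo sketch of the omitted variable bias formula: with β2 = 2 and δ1 = 0.5 (arbitrary illustration values), the short regression's slope centers on β1 + β2·δ1 rather than on β1:

```python
import numpy as np

rng = np.random.default_rng(5)
beta1, beta2, delta1 = 1.0, 2.0, 0.5       # true slope, omitted coefficient, x2-on-x1 slope
reps, n = 2000, 200
short_slopes = []

for _ in range(reps):
    x1 = rng.normal(size=n)
    x2 = delta1 * x1 + rng.normal(size=n)   # x2 is correlated with x1
    y = beta1 * x1 + beta2 * x2 + rng.normal(size=n)
    # "Short" regression of y on x1 only (x2 omitted)
    b1_tilde = ((x1 - x1.mean()) * (y - y.mean())).sum() / ((x1 - x1.mean()) ** 2).sum()
    short_slopes.append(b1_tilde)

print(np.mean(short_slopes))                # about beta1 + beta2*delta1 = 2.0, not 1.0
```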

Ch03 Page 12
Ch03 Page 13
OLS: Variance
As in the simple linear regression case, we have that for the population model
y = β0 + β1·x1 + ... + βk·xk + u,

E(y|x1, ..., xk) = β0 + β1·x1 + ... + βk·xk,
Var(y|x1, ..., xk) = Var(u|x1, ..., xk).

Assumption MLR.5 (Homoskedasticity)
The error has the same variance given any value of the explanatory variables, i.e.,
Var(u|x1, ..., xk) = σ².

Theorem (Sampling Variance of Slope OLS Estimators)
Let Assumptions MLR.1-MLR.5 hold, then

Var(β̂j) = σ² / [SST_j·(1 − R_j²)]

for j = 1, ..., k, where SST_j = Σ (x_ij − x̄_j)² is the total sample variation in xj, and R_j² is the R-squared from regressing xj on all other independent variables (including an intercept).

Error Variance (σ²):
• A high error variance increases the sampling variance because there is more "noise" in the equation.
• A large error variance doesn't necessarily make estimates imprecise.
• The error variance does not decrease with the sample size.
⚠ This term is generally unknown and will be estimated (below).

Total Sample Variation in the Explanatory Variable (SST_j):
• More sample variation leads to more precise estimates.
• Total sample variation automatically increases with the sample size.
• Increasing the sample size is thus a way to get more precise estimates.

Linear Relationships Among Regressors (R_j²):
• The R-squared of this regression will be higher when xj can be better explained by the other independent variables.
• The sampling variance of the slope estimator for xj will be higher when xj can be better explained by the other independent variables.
• Under perfect multicollinearity, the variance of the slope estimator is not defined (infinite).

❓ Is Multicollinearity a Problem?

Define the variance inflation factor (VIF) as

VIF_j = 1 / (1 − R_j²),

so that Var(β̂j) = (σ² / SST_j)·VIF_j.

• Therefore, the higher the linear correlation between the regressor xj and the other independent variables, the higher the VIF_j, and therefore the higher the variance of the slope coefficient β̂j will be.

• Rule of thumb: If VIF_j is above 10 (equivalently, R_j² is above .9), then we conclude that multicollinearity is a 'problem' for estimating βj.
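A sketch that computes R_j² and the corresponding VIF for one regressor by running the auxiliary regression with NumPy (all variable names and coefficients are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 500
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
x1 = 0.9 * x2 + 0.3 * x3 + rng.normal(scale=0.5, size=n)   # x1 well explained by x2, x3

# Auxiliary regression: x1 on an intercept, x2 and x3
Z = np.column_stack([np.ones(n), x2, x3])
coef = np.linalg.solve(Z.T @ Z, Z.T @ x1)
resid = x1 - Z @ coef
R2_1 = 1 - (resid ** 2).sum() / ((x1 - x1.mean()) ** 2).sum()

VIF_1 = 1 / (1 - R2_1)
print(R2_1, VIF_1)   # a VIF above 10 would flag a multicollinearity 'problem' for x1
```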

Ch03 Page 14
Ch03 Page 15
Misspecified
Models
Suppose the true model is

y = β0 + β1·x1 + β2·x2 + u

and it satisfies Assumptions MLR.1-MLR.5:

Regress y on x1 and x2: ŷ = β̂0 + β̂1·x1 + β̂2·x2,
Regress y on x1 only: ỹ = β̃0 + β̃1·x1.

Truth β2 ≠ 0: β̂0, β̂1, & β̂2 are unbiased; β̃0 & β̃1 are biased. Var(β̃1) = σ²/SST_1 ≤ σ²/[SST_1·(1 − R_1²)] = Var(β̂1).
Truth β2 = 0: β̂0, β̂1, & β̂2 are unbiased; β̃0 & β̃1 are also unbiased because the omitted variable bias is β2·δ1 = 0. Var(β̃1) ≤ Var(β̂1) still holds, so the smaller model gives the more precise (and still unbiased) estimator of β1.

Ch03 Page 16
Estimating : Standard
Errors
Firstly, recall that by the Law of Total Variance: Var(u) = E[Var(u|x1, ..., xk)] + Var[E(u|x1, ..., xk)]. Therefore, by Assumptions MLR.4 and MLR.5 we have that
Var(u) = σ² = E(u²).

Infeasible Estimator of σ²: (1/n)·Σ u_i² (the errors are not observed).
Feasible Estimators of σ²:
Biased: (1/n)·Σ û_i².
Unbiased: σ̂² = Σ û_i² / (n − k − 1) = SSR / (n − k − 1).

Standard Error of the Regression (Root Mean Squared Error): σ̂ = sqrt(σ̂²).
Standard Error of β̂j: se(β̂j) = σ̂ / sqrt[SST_j·(1 − R_j²)].

⁉ What are "Degrees of Freedom" (df)?

We can figure out why the degrees of freedom adjustment is necessary by returning to the first order conditions for the OLS estimators. These can be written Σ û_i = 0 and Σ x_ij·û_i = 0 for j = 1, ..., k, where û_i = y_i − β̂0 − β̂1·x_i1 − ... − β̂k·x_ik. Thus, in obtaining the OLS estimates, k + 1 restrictions are imposed on the OLS residuals. This means that, given n − (k + 1) of the residuals, the remaining k + 1 residuals are known: there are only n − (k + 1) degrees of freedom in the residuals. (This can be contrasted with the errors u_i, which have n degrees of freedom in the sample.)

⁉ Why do "Degrees of Freedom" (df) matter?

Because it can be shown that under Assumptions MLR.1-MLR.5:
E(SSR | x's) = (n − k − 1)·σ².
This means that E(σ̂²) = σ², i.e., σ̂² = SSR/(n − k − 1) is an unbiased estimator of σ².

Ch03 Page 17
Example: Explaining Arrest Records
Fitted Regression (the second arrest-records regression from above, now reported with its standard errors):
narr86-hat = 0.707 − 0.151·pcnv + 0.0074·avgsen − 0.037·ptime86 − 0.103·qemp86, n = 2,725, R² = 0.0422.

✏ Standard errors are usually reported underneath the OLS estimates in parentheses.

Ch03 Page 18
OLS: Efficiency (Gauss-
Markov Theorem)
Simple Linear Regression:

Consider the simple linear regression model (y = β0 + β1·x + u) again but without an intercept (β0 = 0):

y = β1·x + u.

Under Assumptions SLR.1-SLR.5 we have that the OLS estimator of β1 is

β̂1 = Σ x_i·y_i / Σ x_i².

⁉ Is the OLS estimator the only unbiased estimator under Assumptions SLR.1-SLR.4?

NO

Consider the alternative estimator of β1 (provided x̄ ≠ 0):

β̃1 = ȳ / x̄.

It is also unbiased: E(β̃1) = β1.

It turns out that, under Assumptions SLR.1-SLR.5, the OLS estimator has the smallest possible variance among any other estimator that can be written as a linear function of the response variable y, i.e., β̃1 = Σ w_i·y_i with weights w_i that may depend on the x's, and is also unbiased. In this sense the OLS estimator is the best.
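A quick simulation sketch comparing the two unbiased estimators in the no-intercept model: both center on β1, but the OLS estimator has the smaller variance (all simulation settings are arbitrary illustration choices):

```python
import numpy as np

rng = np.random.default_rng(7)
beta1, reps, n = 2.0, 5000, 50
ols, alt = [], []

for _ in range(reps):
    x = rng.uniform(1, 5, n)            # keeps x-bar safely away from zero
    y = beta1 * x + rng.normal(size=n)
    ols.append((x * y).sum() / (x ** 2).sum())   # OLS through the origin
    alt.append(y.mean() / x.mean())              # alternative unbiased estimator

print(np.mean(ols), np.mean(alt))       # both close to 2.0 (unbiased)
print(np.var(ols), np.var(alt))         # OLS variance is the smaller of the two
```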

Multiple Linear Regression:

Theorem (Gauss-Markov)
Under Assumptions MLR.1 through MLR.5, β̂0, β̂1, ..., β̂k are the best linear unbiased estimators (BLUEs) of β0, β1, ..., βk, respectively.

✏ Assumptions MLR.1-MLR.5 are therefore called the Gauss-Markov Assumptions.
⚠ OLS is the best under Assumptions MLR.1-MLR.5, which means that if one of these assumptions were to be violated, then there may be better estimators than the OLS.

Ch03 Page 19
Alternative derivation of OLS
Saturday, June 26, 2021 8:54 PM

Ch03 Page 20
Ch03 Page 21
Ch03 Page 22
Ch03 Page 23
