
Econometrics I

Chapter 3
Linear Regression with One Regressor

Prof. Miguel Ángel Borrella Mas

School of Economics and Business Administration


Universidad de Navarra

Academic year 2022-23


Outline

1 Introduction

2 The (simple) Linear Regression Model

3 The Ordinary Least Squares estimator

4 Measures of fit

5 The Least Squares Assumptions

6 Sampling distribution of the OLS Estimators



Econometrics - Ch. 3 Miguel Ángel Borrella (UNAV)


Learning objectives

• Ask a question [simple, with one independent variable]: we want to study the causal effect of “A” on “B”
• Set up a simple linear model to answer this question
• Answer the question using data and a statistical package (Stata)


Example
Question from previous chapter: Does class size affect student performance?

• What are our priors? −→ Smaller classes are better for learning outcomes (?)
• We are interested in

$$\beta_1 = \frac{\text{Change in Test Score}}{\text{Change in Class Size}} = \frac{\Delta \text{Test Score}}{\Delta \text{Class Size}}$$

• In words: β1 measures the change in Test Score due to a unit change in Class Size
• Mathematically:

Test Score = β0 + β1 ∗ Class Size

  • β1 = slope of a straight line relating test scores and class size
  • β0 = intercept of the straight line

Example (2)

• But: The average test score in district i does not only depend
on the average class size
• It also depends on other factors such as:
• Quality of the teachers
• Student background
• Quality of text books
• ...

• The equation describing the linear relation between Test Score and Class Size is better written as:

Test Score_i = β0 + β1 Class Size_i + u_i

where u_i lumps together all other district characteristics that affect average test scores


Statistical inference for linear regression

Statistical (or econometric) inference about the slope entails:


1 Estimation:
• How should we draw a line through the data to estimate the
population slope? −→ Ordinary Least Squares (OLS!)
• What are the advantages and disadvantages of OLS?
2 Hypothesis testing:
• How to test whether the slope is zero?
3 Confidence intervals:
• How to construct a confidence interval for the slope?



Population regression line

The population regression line is the expected value of Y given X:

E(Y | X)

• The slope (or marginal effect) is the difference in the expected values of Y, for two values of X that differ by one unit
• The estimated regression can be used either for:
  • causal inference (learning about the causal effect on Y of a change in X)
  • prediction (predicting the value of Y given X, for an observation not in the data set)
• Causal inference and prediction place different requirements on the data, but both use the same regression toolkit


General form

General form of the population regression line:

Y_i = β0 + β1 X_i + u_i,  i = 1, ..., n

where:

• Subscript i = observational level [n paired obs. (X_i, Y_i)]
• Y_i = Dependent variable
• X_i = Independent variable or regressor
• β0 = Population intercept (unknown!)
• β1 = Population slope (unknown!)
• u_i = Regression error term −→ Omitted factors that influence Y, other than the variable X. Also includes error in the measurement of Y


Interpretation

$$\frac{\Delta Y}{\Delta X} = \beta_1 \quad \text{as long as} \quad \frac{\Delta u}{\Delta X} = 0$$

• By how much does Y change if X is increased by 1 unit?
• It is only correct if all other things remain equal when X is increased by 1 unit
• Conditional mean independence: E(u | X) = 0
  • (Explanatory variable must not contain information about the mean of the unobserved factors)
  • Can we test this?
• Condition unlikely to hold
  • Simple linear regression model is rarely applicable in practice
  • But its discussion is useful for pedagogical reasons



Example
A simple linear wage equation:

Wage_i = β0 + β1 Educ_i + u_i

• β1 = Measures the change in hourly wage associated with an additional year of education
• u_i = Includes factors such as:
  • Labor force experience
  • Tenure with current employer
  • Work ethic
  • Ability
• What about the conditional mean independence? −→ Again unlikely to hold:
  • Individuals with more education will also be more intelligent (more able) on average!

Intuition
In general we do not know β0 and β1 −→ We have to estimate them using a random sample of data

Question: How to find the line that fits the data best?

OLS estimators: Choose the regression coefficients such that the estimated regression line is as close as possible to the observed data


Regression model
Method to estimate β0 and β1:

LEAST SQUARES PRINCIPLE

Mathematical procedure that uses the data to position a line with the objective of minimizing the sum of the squares of the vertical distances between the actual Y values and the predicted values of Y

$$\min_{\beta_0, \beta_1} S(\beta_0, \beta_1) = \sum_{i=1}^{n} \hat{u}_i^2 = \sum_{i=1}^{n} \left(Y_i - \hat{Y}_i\right)^2$$

where û_i are called the residuals:

• Difference between the observed Y value and the predicted Y value for a given X value on the line
• û_i = Y_i − β̂0 − β̂1 X_i

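The least squares principle can be sketched numerically. The Python snippet below is only an illustration (the course itself works in Stata): the five (class size, test score) pairs are made-up numbers, and the helper names `sum_squared_residuals` and `ols_fit` are hypothetical. It computes the OLS line in closed form and checks that moving either coefficient away from the OLS values increases S(β0, β1).

```python
# Hypothetical data: class sizes (X) and test scores (Y); not from the course dataset.
x = [15, 17, 19, 21, 23]
y = [680, 665, 660, 650, 640]


def sum_squared_residuals(b0, b1, x, y):
    """S(b0, b1): sum of squared vertical distances between Y and the line."""
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))


def ols_fit(x, y):
    """Closed-form OLS intercept and slope for a single regressor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0_hat, b1_hat = ols_fit(x, y)
s_min = sum_squared_residuals(b0_hat, b1_hat, x, y)

# Any other line should give a larger sum of squared residuals.
s_shift_intercept = sum_squared_residuals(b0_hat + 1.0, b1_hat, x, y)
s_shift_slope = sum_squared_residuals(b0_hat, b1_hat + 0.1, x, y)
print(b0_hat, b1_hat, s_min, s_shift_intercept, s_shift_slope)
```

The perturbation check is not a proof of optimality, just a quick sanity check that the closed-form coefficients sit at the bottom of S.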
Regression equation

An equation that expresses the linear relationship between two variables

Ŷ_i = β̂0 + β̂1 X_i

where:
• Ŷ_i = Estimated value of Y for a selected X value
• β̂0 = Y-intercept. Estimated value of Y when X = 0
• β̂1 = Slope. Average change in the dependent variable Y for each change of one unit (increase or decrease) in the independent variable X
• X_i = Value of the independent variable that is selected


OLS estimators of β1 and β0

OLS estimator of β1:

$$\hat{\beta}_1 = r_{xy}\,\frac{s_y}{s_x} = \frac{s_{xy}}{s_x^2} = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{X}\bar{Y}}{\sum_{i=1}^{n} x_i^2 - n\bar{X}^2}$$

where:
• r_xy = Correlation coefficient
• s_y = Standard deviation of Y
• s_x = Standard deviation of X
• s_xy = Covariance between X and Y

OLS estimator of β0:

$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$$

where:
• Ȳ = Sample mean of Y
• X̄ = Sample mean of X
• β̂1 = Estimated slope of the regression line
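As a check on the three equivalent expressions for the slope, the following Python sketch evaluates each of them on a small made-up dataset (the numbers and variable names are illustrative assumptions, not course data). Sample variances and the covariance use the divisor n − 1; the divisor cancels in every ratio, so the three results should coincide.

```python
import math

# Hypothetical data: class sizes (X) and test scores (Y).
x = [15, 17, 19, 21, 23]
y = [680, 665, 660, 650, 640]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sample covariance and standard deviations (divisor n - 1).
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))
r_xy = s_xy / (s_x * s_y)

# Three equivalent ways to write the OLS slope.
slope_from_corr = r_xy * s_y / s_x
slope_from_cov = s_xy / s_x ** 2
slope_from_sums = (sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar) / (
    sum(xi ** 2 for xi in x) - n * x_bar ** 2
)

# The intercept follows from the sample means and the slope.
intercept = y_bar - slope_from_cov * x_bar
print(slope_from_corr, slope_from_cov, slope_from_sums, intercept)
```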

Why use OLS estimators?

• OLS is, as in the case of the sample average, the estimator searching for the line that best “fits” the scatterplot:
  • Notice: if the “line” is just an intercept (Y does not depend on X), then the OLS estimator is just the sample average of Y_1, ..., Y_n −→ (Ȳ)
• Like Ȳ, the OLS estimator has some desirable properties:
  • Under certain assumptions, it is unbiased −→ E(β̂1) = β1
  • Under certain assumptions, its sampling distribution has lower variance than other linear unbiased candidate estimators of β1
  • Under certain assumptions, it is consistent −→ $\hat{\beta}_1 \xrightarrow{p} \beta_1$


Example
Application to the California Test Score – Class Size data

• The sample mean of district average test scores is Ȳ = 654.16
• It can also be obtained by OLS, as the estimate from a regression of test scores on an intercept only


Example

• Estimated slope: $\hat{\beta}_1 = -2.28 = -0.2264 \times \frac{19.053}{1.8918}$
• Estimated intercept: $\hat{\beta}_0 = 698.9 = 654.1565 + 2.28 \times 19.64$
• Estimated regression line: $\widehat{\text{Test Score}} = 698.9 - 2.28 \times STR$

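The arithmetic behind these estimates can be reproduced in a few lines. The sketch below is in Python rather than Stata, and the mapping of the quoted numbers to r_xy, s_y, s_x, Ȳ and X̄ is inferred from the slope and intercept formulas, so treat the variable labels as an assumption.

```python
# Summary statistics as quoted on the slide (labels inferred from the formulas).
r_xy = -0.2264      # correlation between STR and test scores
s_y = 19.053        # standard deviation of test scores
s_x = 1.8918        # standard deviation of STR
y_bar = 654.1565    # sample mean of test scores
x_bar = 19.64       # sample mean of STR

# Slope: correlation times the ratio of standard deviations.
slope = r_xy * s_y / s_x
# Intercept: mean of Y minus slope times mean of X.
intercept = y_bar - slope * x_bar
print(round(slope, 2), round(intercept, 1))
```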
Example

• Interpretation of the estimated slope and intercept:
  • The slope: Districts with one more student per teacher have, on average, test scores that are 2.28 points lower
  • That is, we estimate $\frac{\Delta \text{Test Score}}{\Delta STR} = -2.28$
• The intercept: Taken literally, it means that, according to this estimated line, districts with zero students per teacher would have a (predicted) test score of 698.9
  • BUT: This interpretation of the intercept makes no sense here!
    1 It extrapolates the line outside the range of the data (the intercept is not itself economically meaningful)
    2 What does it mean for class size to be zero?



Example (2)

In Stata: [regression output not reproduced]

One of the districts in the dataset is Antelope, CA, for which:

STR = 19.33 and Test Score = 657.8

• Predicted value: Ŷ_Antelope = 698.9 − 2.28 ∗ 19.33 = 654.8
• Residual: û_Antelope = 657.8 − 654.8 = 3.0
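The predicted value and residual for Antelope follow directly from the estimated line. A minimal Python sketch, where the function name `predict_test_score` is hypothetical and the coefficients are the rounded estimates quoted above:

```python
def predict_test_score(str_value, intercept=698.9, slope=-2.28):
    """Predicted test score from the estimated regression line."""
    return intercept + slope * str_value


antelope_str = 19.33     # student-teacher ratio in Antelope, CA
antelope_score = 657.8   # observed test score

y_hat = predict_test_score(antelope_str)
residual = antelope_score - y_hat   # observed minus predicted
print(round(y_hat, 1), round(residual, 1))
```

A positive residual means the district scored above what the line predicts for its class size.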



Introduction

How well does the estimated regression line “fit” or explain the data?

1 Does the regressor X account for much or for little variation in Y? −→ The R² measures the fraction of the variance of Y that is explained by X
  • It is unitless
  • Ranges between 0 (no fit) and 1 (perfect fit)
2 Are the observations in the scatter plot clustered closely around the regression line? −→ The standard error of the regression (SER) measures how far Y_i typically is from its predicted value


The R²

The R² is the fraction of the sample variance of Y_i “explained” by the regression

Y_i = Ŷ_i + û_i = OLS prediction + OLS residual

• Sample Var(Y) = Sample Var(Ŷ) + Sample Var(û)
• Total sum of squares (TSS) = “Explained” sum of squares (ESS) + Sum of squared residuals (SSR)

$$R^2 = \frac{ESS}{TSS} = \frac{\sum_{i=1}^{n} \left(\hat{Y}_i - \bar{\hat{Y}}\right)^2}{\sum_{i=1}^{n} \left(Y_i - \bar{Y}\right)^2} = 1 - \frac{SSR}{TSS}$$

• If R² = 0, X_i explains none of the variation in Y_i
• If R² = 1, X_i explains all of the variation in Y_i (Y_i = Ŷ_i)
• In practice, 0 < R² < 1
• With one regressor: R² = the square of the correlation coefficient between X and Y
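The decomposition TSS = ESS + SSR and the three ways of writing R² can be verified on a toy example. The Python sketch below uses made-up data and illustrative variable names; it is a numerical check, not the Stata output for the California data.

```python
# Hypothetical data: class sizes (X) and test scores (Y).
x = [15, 17, 19, 21, 23]
y = [680, 665, 660, 650, 640]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# OLS fit, predictions and residuals.
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]
u_hat = [yi - yh for yi, yh in zip(y, y_hat)]

y_hat_bar = sum(y_hat) / n
ess = sum((yh - y_hat_bar) ** 2 for yh in y_hat)  # explained sum of squares
ssr = sum(u ** 2 for u in u_hat)                  # sum of squared residuals
tss = syy                                         # total sum of squares

r2_from_ess = ess / tss
r2_from_ssr = 1 - ssr / tss
r2_from_corr = sxy ** 2 / (sxx * syy)             # squared correlation of X and Y
print(r2_from_ess, r2_from_ssr, r2_from_corr)
```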

SE of the regression

The SER is an estimator of the standard deviation of the regression error u_i

$$SER = s_{\hat{u}} = \sqrt{\frac{1}{n-2}\sum_{i=1}^{n}\left(\hat{u}_i - \bar{\hat{u}}\right)^2} = \sqrt{\frac{1}{n-2}\sum_{i=1}^{n}\hat{u}_i^2}$$

• The second equality holds because $\bar{\hat{u}} = \frac{1}{n}\sum_{i=1}^{n}\hat{u}_i = 0$
• The divisor n − 2 is used because 2 degrees of freedom were lost in estimating the two regression coefficients β0 and β1
• It measures the spread of the observations around the regression line in the units of the dependent variable
• In other words: It measures the average “size” of the OLS residual (the average “mistake” made by the OLS regression)
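A short Python sketch of the SER on made-up data (the numbers and names are illustrative assumptions). It also computes the version with divisor n, usually called the RMSE, to show how the two differ; with only five observations the gap is visible, whereas with hundreds of observations it would be small.

```python
import math

# Hypothetical data: class sizes (X) and test scores (Y).
x = [15, 17, 19, 21, 23]
y = [680, 665, 660, 650, 640]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# OLS fit and residuals.
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b0 = y_bar - b1 * x_bar
u_hat = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]

# OLS residuals average to zero when the regression includes an intercept.
mean_residual = sum(u_hat) / n

ssr = sum(u ** 2 for u in u_hat)
ser = math.sqrt(ssr / (n - 2))   # divisor n - 2: two estimated coefficients
rmse = math.sqrt(ssr / n)        # divisor n
print(mean_residual, ser, rmse)
```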



Example

The slope is statistically significant & large in a policy sense, but:

• STR explains only a small fraction of the variation in test scores −→ R² = 5.12%
• Large spread −→ SER = 18.6 (in test score points)
• In Stata, the SER is reported as “Root MSE”. The closely related RMSE is $\sqrt{\frac{1}{n}\sum_{i=1}^{n}\hat{u}_i^2}$
  • The distinction (dividing by n instead of n − 2) is negligible if n is large enough
• Does this make sense? Does this mean the STR is unimportant in a policy sense?


Introduction

So far: OLS is a way to draw a straight line through the data on Y and X. But:

1 Under what conditions does the slope of this line have a causal interpretation? That is, when will the OLS estimator be unbiased for the causal effect on Y of X?
2 What is the variance of the OLS estimator over repeated samples?

To answer these questions −→ We need to make some assumptions about how Y and X are related to each other, and about how they are collected (the sampling scheme)

• These assumptions are known as the Least Squares Assumptions for Causal Inference



Reminder!

The causal effect on Y of a unit change in X is the expected difference in Y as measured in a randomized controlled experiment

• With a binary treatment:
  • The causal effect is the expected difference in means between the treatment and control groups (remember chapter 2b!)
  • It requires random assignment or as-if random assignment
  • Random assignment ensures that the treatment (X) is uncorrelated with all other determinants of Y, so that there are no confounding variables
• The least squares assumptions for causal inference generalize the binary treatment case to regression


General assumptions

General assumptions for the linear regression model:

1 Assumption SLR.1 (Linear in parameters) −→ In the population, the relationship between Y and X is linear

Y = β0 + β1 X + U

2 Assumption SLR.2 (Sample variation in the regressor) −→ The values of the regressor are not all the same (otherwise it would be impossible to study how different values of X lead to different values of Y)


Specific assumptions
Specific assumptions for the linear regression model:

3 Assumption SLR.3 (Zero conditional mean) −→ The value of the regressor must contain no information about the mean of the unobserved factors

E(u_i | X_i) = 0

4 Assumption SLR.4 (Simple random sampling) −→ (X_i, Y_i), i = 1, ..., n are i.i.d., and each data point follows the population equation
5 Assumption SLR.5 (Outliers unlikely) −→ X and Y have finite fourth moments
6 Assumption SLR.6 (Homoskedasticity) −→ The value of the regressor must contain no information about the variability of the unobserved factors

Var(u_i | X_i) = σ²

SLR.3
For any given value of X, the mean of u is zero: E(u_i | X_i = x_i) = 0

• This implies that β̂1 is unbiased for the causal effect β1

$$E(Y_i \mid X_i) = E(\beta_0 + \beta_1 X_i + u_i \mid X_i) = \beta_0 + \beta_1 E(X_i \mid X_i) + E(u_i \mid X_i) = \beta_0 + \beta_1 X_i$$

• Our example: Test Score_i = β0 + β1 Class Size_i + u_i −→ What can be those other factors included in u_i?
  • Parental involvement
  • Outside learning opportunities (extra math classes...)
  • Home environment conducive to reading
  • Family income is a useful proxy for many such factors
• For family income, SLR.3 means E(family income | STR) = constant, which implies that family income and STR are uncorrelated...
  • Is E(u_i | X_i = x_i) = 0 plausible for these other factors?

SLR.3 (2)

The benchmark for understanding this assumption is to consider an ideal RCT:

• X is randomly assigned to people
  • Students randomly assigned to different size classes
  • Randomization is done by computer, using no information about the individual
• Because X is assigned randomly, all other individual characteristics (things included in u) are distributed independently of X −→ u and X are independent
• Then, in an ideal RCT: E(u_i | X_i = x_i) = 0 (SLR.3 holds)


SLR.3 (3)
With observational data, we need to think hard about whether E(u_i | X_i = x_i) = 0 holds. Example:

• Suppose that:
  • Districts with wealthy inhabitants have small classes and good teachers
    • These districts have a lot of money which they can use to hire more and better teachers
  • Districts with poor inhabitants have large classes and bad teachers
    • These districts have little money and can hire only few and not very good teachers
• In this case, class size is related to teacher quality
• Teacher quality likely affects test scores −→ Within u_i
• This implies a violation of SLR.3, because the two conditional means differ and so cannot both be zero:

E(u_i | Class size_i = small) ≠ E(u_i | Class size_i = large)
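The wealthy-district story can be illustrated with a small simulation. The Python sketch below is built entirely on assumed numbers: a true causal slope of −1, a hypothetical `wealth` variable that drives both class size and the error term, and 5,000 simulated districts. It contrasts class sizes that depend on wealth (SLR.3 violated) with class sizes assigned at random (SLR.3 holds).

```python
import random


def ols_slope(x, y):
    """Closed-form OLS slope for a single regressor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


def simulate_slope(confounded, n=5000, seed=0):
    """Simulate districts; the true causal slope of class size is -1."""
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(n):
        wealth = rng.gauss(0, 1)
        if confounded:
            # Wealthy districts end up with smaller classes.
            class_size = 20 - 2 * wealth + rng.gauss(0, 1)
        else:
            # Class size assigned at random, unrelated to wealth.
            class_size = 20 + rng.gauss(0, 5 ** 0.5)
        # u bundles teacher quality (driven by wealth) and other noise.
        u = 5 * wealth + rng.gauss(0, 1)
        x.append(class_size)
        y.append(700 - 1.0 * class_size + u)
    return ols_slope(x, y)


slope_randomized = simulate_slope(confounded=False)
slope_confounded = simulate_slope(confounded=True)
print(slope_randomized, slope_confounded)
```

Under these assumed parameters the randomized slope should land near the true value of −1, while the confounded slope should land near −3, because the estimate also picks up the effect of wealth on test scores.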

SLR.4

(X_i, Y_i), i = 1, ..., n being i.i.d. arises automatically if the entity (individual, district) is sampled by simple random sampling:

• The entities are selected from the same population, so (X_i, Y_i) are identically distributed for all i = 1, ..., n
• The entities are selected at random, so the values of (X, Y) for different entities are independently distributed

Examples of a violation of simple random sampling:

• Panel data and time series data (data recorded over time)
• Observations on children from the same mother (not independent)


SLR.5

Large outliers are rare −→ E(X⁴) < ∞ and E(Y⁴) < ∞

• Outliers are observations that have values far outside the usual range of the data
• Another way to state the assumption is that X and Y have finite kurtosis
• Large outliers can make OLS regression results misleading
  • Look at your data! If you have a large outlier, is it a typo?
  • Does it belong in your data set? Why is it an outlier?
• The assumption is necessary to justify the large-sample approximation to the sampling distribution of the OLS estimators
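How much a single outlier can matter is easy to see on a toy example. In the Python sketch below, the five base observations and the added point (STR = 30, score = 900, imagined as a data-entry error) are all made up for illustration.

```python
def ols_slope(x, y):
    """Closed-form OLS slope for a single regressor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical clean data: larger classes go with lower scores.
x_clean = [15, 17, 19, 21, 23]
y_clean = [680, 665, 660, 650, 640]
slope_clean = ols_slope(x_clean, y_clean)

# Add one hypothetical mistyped observation far outside the usual range.
x_outlier = x_clean + [30]
y_outlier = y_clean + [900]
slope_outlier = ols_slope(x_outlier, y_outlier)
print(slope_clean, slope_outlier)
```

In this example one extra point is enough to flip the sign of the estimated slope, which is why inspecting the data before trusting the regression is worthwhile.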

SLR.5 (2)

A large outlier can strongly influence the results. [Figure omitted]

SLR.6

Homoskedasticity graphically: [Figure omitted]



Introduction

The OLS estimator is computed from a sample of data −→ A different sample yields a different value of β̂1 (the source of the “sampling uncertainty” of β̂1). Then we want to:

• Quantify the sampling uncertainty associated with β̂1
• Use β̂1 to test hypotheses such as H0: β1 = 0
• Construct a confidence interval for β1
• Goal: To study the sampling distribution of β̂1
  1 Probability framework for linear regression
  2 Distribution of OLS estimator



Probability Framework for Linear Regression

The probability framework is summarized by the OLS assumptions:

1 Population → The group of interest (ex: all possible school districts)
2 Random variables → X, Y (ex: STR, Test Score) (SLR.2)
3 Joint distribution of X and Y → We assume:
  • Population regression function is linear (SLR.1)
  • E(u_i | X_i = x_i) = 0 (SLR.3)
  • X, Y have nonzero finite fourth moments (SLR.5)
4 Simple random sampling → Data collection by this method implies (X_i, Y_i), i = 1, ..., n are i.i.d. (SLR.4)


Reminder

Recall the summary of the sampling distribution of Ȳ:

• For (Y_1, ..., Y_n) i.i.d. with 0 < σ_Y² < ∞, Ȳ is the:
  • Best: $Var(\bar{Y}) = \frac{\sigma_Y^2}{n} \le Var(\hat{\mu}_Y) \;\; \forall\, \hat{\mu}_Y$
  • Linear: $\hat{\mu}_Y = \frac{1}{n}\sum_{i=1}^{n} Y_i$
  • Unbiased: $E(\bar{Y}) = \mu_Y$
  • Estimator of μ_Y
• Moreover, by the CLT:

$$\frac{\bar{Y} - E(\bar{Y})}{\sqrt{Var(\bar{Y})}} \simeq N(0, 1)$$


Sampling distribution of β̂1

Like Ȳ, β̂1 (remember: it is a function of sample averages!) has a sampling distribution:

1 What is E(β̂1)?
  −→ If E(β̂1) = β1, then OLS is unbiased (good thing!)
2 What is Var(β̂1)? (measure of sampling uncertainty)
  −→ We need to derive a formula in order to compute the SE of β̂1
3 What is the distribution of β̂1 in small samples?
  −→ It is very complicated in general
4 What is the distribution of β̂1 in large samples?
  −→ By the CLT, β̂1 is (approx) normally distributed


Preliminary algebra
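Some (needed!) preliminary algebra:

$$Y_i = \beta_0 + \beta_1 X_i + u_i$$

$$\bar{Y} = \beta_0 + \beta_1 \bar{X} + \bar{u}$$

Hence:

$$Y_i - \bar{Y} = \beta_1 (X_i - \bar{X}) + (u_i - \bar{u})$$

Thus:

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$$

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})\left[\beta_1 (X_i - \bar{X}) + (u_i - \bar{u})\right]}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$$

$$\hat{\beta}_1 = \beta_1 \frac{\sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})}{\sum_{i=1}^{n} (X_i - \bar{X})^2} + \frac{\sum_{i=1}^{n} (X_i - \bar{X})(u_i - \bar{u})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$$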


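Preliminary algebra (2)

$$\hat{\beta}_1 = \beta_1 + \frac{\sum_{i=1}^{n} (X_i - \bar{X})(u_i - \bar{u})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$$

It can be shown that (because $\sum_{i=1}^{n} (X_i - \bar{X}) = 0$, the term involving $\bar{u}$ drops out):

$$\sum_{i=1}^{n} (X_i - \bar{X})(u_i - \bar{u}) = \sum_{i=1}^{n} (X_i - \bar{X})\, u_i$$

Finally:

$$\hat{\beta}_1 - \beta_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})\, u_i}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$$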
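The final identity can be checked numerically in a setting where the errors are known. In the Python sketch below, the coefficients, regressor values and error draws are all made up; in real data u is unobserved, so this is purely a check of the algebra.

```python
# Assumed "true" coefficients and made-up errors (unobservable in practice).
beta0, beta1 = 2.0, 3.0
x = [1.0, 2.0, 3.0, 4.0, 5.0]
u = [0.5, -1.0, 0.25, 1.5, 0.75]
y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
u_bar = sum(u) / n

sxx = sum((xi - x_bar) ** 2 for xi in x)
beta1_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx

# Subtracting u_bar changes nothing because the deviations of X sum to zero.
lhs_sum = sum((xi - x_bar) * (ui - u_bar) for xi, ui in zip(x, u))
rhs_sum = sum((xi - x_bar) * ui for xi, ui in zip(x, u))

# Estimation error of the slope versus the formula on the slide.
estimation_error = beta1_hat - beta1
print(lhs_sum, rhs_sum, estimation_error, rhs_sum / sxx)
```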

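What is E(β̂1)?

$$E\left(\hat{\beta}_1 - \beta_1\right) = E\left(\frac{\sum_{i=1}^{n} (X_i - \bar{X})\, u_i}{\sum_{i=1}^{n} (X_i - \bar{X})^2}\right)$$

Using the law of iterated expectations (LIE):

$$E\left(\hat{\beta}_1 - \beta_1\right) = E\left(\frac{\sum_{i=1}^{n} (X_i - \bar{X})\, E(u_i \mid X_i)}{\sum_{i=1}^{n} (X_i - \bar{X})^2}\right)$$

$$E\left(\hat{\beta}_1 - \beta_1\right) = 0, \text{ because of SLR.3: } E(u_i \mid X_i = x_i) = 0$$

• Thus SLR.3 implies that E(β̂1) = β1, just like Ȳ!
• That is, β̂1 is an unbiased estimator of β1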
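Unbiasedness is a statement about the average of β̂1 over repeated samples, which a Monte Carlo experiment can mimic. The Python sketch below rests on assumed parameters (β0 = 700, β1 = −2, normal X and u, samples of size 50, 2,000 replications) and draws u independently of X so that SLR.3 holds by construction.

```python
import random


def ols_slope(x, y):
    """Closed-form OLS slope for a single regressor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


def draw_slope(rng, n, beta0=700.0, beta1=-2.0):
    """Draw one sample of size n and return its OLS slope."""
    x = [rng.gauss(20, 2) for _ in range(n)]
    # u is drawn independently of X, so E(u | X) = 0 holds by construction.
    y = [beta0 + beta1 * xi + rng.gauss(0, 10) for xi in x]
    return ols_slope(x, y)


rng = random.Random(42)
slopes = [draw_slope(rng, n=50) for _ in range(2000)]
mean_slope = sum(slopes) / len(slopes)
print(mean_slope)
```

Individual estimates vary from sample to sample, but their average should sit close to the assumed true slope of −2.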

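What is Var(β̂1)?

Rewrite:

$$\hat{\beta}_1 - \beta_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})\, u_i}{\sum_{i=1}^{n} (X_i - \bar{X})^2} = \frac{\frac{1}{n}\sum_{i=1}^{n} v_i}{\left(\frac{n-1}{n}\right) S_X^2}$$

where $v_i = (X_i - \bar{X})\, u_i$

If n is large enough: $S_X^2 \approx \sigma_X^2$ and $\frac{n-1}{n} \approx 1$. Then:

$$\hat{\beta}_1 - \beta_1 \approx \frac{\frac{1}{n}\sum_{i=1}^{n} v_i}{\sigma_X^2}$$

$$Var\left(\hat{\beta}_1 - \beta_1\right) = Var\left(\frac{\frac{1}{n}\sum_{i=1}^{n} v_i}{\sigma_X^2}\right)$$

$$Var\left(\hat{\beta}_1\right) = \frac{1}{n}\,\frac{Var\left[(X_i - \bar{X})\, u_i\right]}{\left(\sigma_X^2\right)^2}$$

• Var(β̂1) is inversely proportional to n, just like Var(Ȳ)!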
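The 1/n scaling can be illustrated by simulation. The Python sketch below uses assumed parameters (σ_X = 2, σ_u = 10, u independent of X, 2,000 replications per sample size), so Var[(X − μ_X)u] = σ_X² σ_u² and the large-sample formula reduces to σ_u² / (n σ_X²). Quadrupling n should cut the variance of β̂1 to roughly a quarter.

```python
import random


def ols_slope(x, y):
    """Closed-form OLS slope for a single regressor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


def slope_variance(n, reps=2000, seed=1):
    """Monte Carlo variance of the OLS slope across repeated samples."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(reps):
        x = [rng.gauss(20, 2) for _ in range(n)]
        y = [700 - 2 * xi + rng.gauss(0, 10) for xi in x]
        slopes.append(ols_slope(x, y))
    mean = sum(slopes) / reps
    return sum((b - mean) ** 2 for b in slopes) / (reps - 1)


var_n50 = slope_variance(50)
var_n200 = slope_variance(200)

# Large-sample formula with u independent of X: sigma_u^2 / (n * sigma_X^2).
approx_n200 = 100 / (4 * 200)
print(var_n50, var_n200, var_n50 / var_n200, approx_n200)
```

The simulated ratio will not be exactly 4, both because of Monte Carlo noise and because the formula is a large-sample approximation.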

What is the distribution of β̂1?

The exact sampling distribution is complicated: it depends on the population distribution of (Y, X). But when n is large, we get some simple (and good) approximations

• Since Var(β̂1) < ∞ and $\hat{\beta}_1 \xrightarrow{p} \beta_1$
• We can use the CLT to obtain the (approx) distribution
• Remember previous slide:

$$\hat{\beta}_1 - \beta_1 = \frac{\frac{1}{n}\sum_{i=1}^{n} v_i}{\left(\frac{n-1}{n}\right) S_X^2} \approx \frac{\frac{1}{n}\sum_{i=1}^{n} v_i}{\sigma_X^2}$$

where $v_i = (X_i - \bar{X})\, u_i$


What is the distribution of β̂1 ? (2)

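• When n is large: v_i = (X_i − X̄) u_i ≈ (X_i − μ_X) u_i
  • v_i is i.i.d. (why?)
  • Var(v_i) < ∞ (why?)
• By the CLT: $\frac{1}{n}\sum_{i=1}^{n} v_i \simeq N\left(0, \frac{\sigma_v^2}{n}\right)$

Then, β̂1 is approximately distributed:

$$\hat{\beta}_1 \simeq N\left(\beta_1, \frac{\sigma_v^2}{n\left(\sigma_X^2\right)^2}\right)$$

where v_i ≈ (X_i − μ_X) u_i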
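One way to see the normal approximation at work is to standardize simulated slopes with the large-sample standard deviation and count how often they fall within ±1.96. The Python sketch below assumes n = 100, σ_X = 2, σ_u = 10, and deliberately non-normal (uniform) errors drawn independently of X, so that σ_v² = σ_X² σ_u²; all of these choices are illustrative.

```python
import math
import random


def ols_slope(x, y):
    """Closed-form OLS slope for a single regressor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


rng = random.Random(7)
n, reps = 100, 2000
beta1 = -2.0
sigma_x, sigma_u = 2.0, 10.0

# Large-sample sd of the slope: sqrt(sigma_v^2 / (n * sigma_X^4)),
# with sigma_v^2 = sigma_X^2 * sigma_u^2 because u is independent of X here.
sd_slope = math.sqrt((sigma_x ** 2 * sigma_u ** 2) / (n * sigma_x ** 4))

# Uniform errors on [-a, a] have variance a^2 / 3, so a = sigma_u * sqrt(3).
half_width = sigma_u * math.sqrt(3)

inside = 0
for _ in range(reps):
    x = [rng.gauss(20, sigma_x) for _ in range(n)]
    y = [700 + beta1 * xi + rng.uniform(-half_width, half_width) for xi in x]
    z = (ols_slope(x, y) - beta1) / sd_slope
    if abs(z) <= 1.96:
        inside += 1

share_inside = inside / reps
print(share_inside)
```

If the approximation is good, the share should be close to 0.95 even though the errors themselves are not normal.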

Extra (1): Proof of consistency

Consistency means $\hat{\beta}_1 \xrightarrow{p} \beta_1$, or plim β̂1 = β1

$$\text{plim}\, \hat{\beta}_1 = \text{plim} \left(\frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}\right)$$

$$\text{plim}\, \hat{\beta}_1 = \beta_1 + \text{plim} \left(\frac{\frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})\, u_i}{\frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2}\right)$$

where $\frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})\, u_i \xrightarrow{p} 0$ and $\frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2 \xrightarrow{p} Var(X)$

• Then, plim β̂1 = β1 if E(u_i | X_i = x_i) = 0
• Unbiasedness & consistency both rely on SLR.3
• But consistency implies that the sampling distribution becomes more and more tightly distributed around β1 as the sample size n becomes larger and larger


Extra (2): Variance of X vs Variance of β̂1

$$Var\left(\hat{\beta}_1\right) = \frac{1}{n}\,\frac{Var\left[(X_i - \mu_X)\, u_i\right]}{\left(\sigma_X^2\right)^2}$$

where $\sigma_X^2 = Var(X_i)$

• The variance of X appears (squared) in the denominator −→ Increasing the spread of X decreases the variance of β̂1
• Intuition: More variation in X implies more info in the data that can be used to fit the regression line. [Figure omitted]


Extra (3): (What about SLR.6?)

Under SLR.6:

$$Var\left(\hat{\beta}_1\right) = \frac{\sigma_u^2}{n\,\sigma_X^2} \approx \frac{\sigma_u^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$$

• Same notion: Larger sampling variability of β̂1 if the variability of unobserved factors is higher; lower if variation in the regressor is larger
• As usual, σ_u² is unknown −→ Use $\hat{\sigma}^2 = s_u^2 = SER^2 = \frac{SSR}{n-2}$
• Homoskedastic SE of β̂1 is then:

$$SE(\hat{\beta}_1) = \frac{s_u}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}}$$
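The homoskedasticity-only standard error can be computed by hand from the residuals. The Python sketch below uses made-up data and illustrative variable names; it takes s_u to be the SER, that is, the square root of SSR / (n − 2).

```python
import math

# Hypothetical data: class sizes (X) and test scores (Y).
x = [15, 17, 19, 21, 23]
y = [680, 665, 660, 650, 640]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar

# Residuals and the estimate of the error standard deviation (the SER).
ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
s_u = math.sqrt(ssr / (n - 2))

# Homoskedasticity-only standard error of the slope.
se_b1 = s_u / math.sqrt(sxx)
print(b1, s_u, se_b1)
```

This standard error is only appropriate if SLR.6 holds; otherwise heteroskedasticity-robust standard errors are the usual alternative.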

Difference between homoskedastic and heteroskedastic (robust) SE: [Figure omitted]


Summary

Parallel conclusions hold for the OLS estimator β̂1 (and also for β̂0):

• Under SLR.1-SLR.5 (plus homoskedasticity, SLR.6, for the “Best” property), β̂1 is the:
  • Best: Var(β̂1) ≤ Var(β̃1) ∀ β̃1 −→ Efficient!
  • Linear: a weighted sum of Y_1, ..., Y_n, with weights that depend on X_1, ..., X_n
  • Unbiased: E(β̂1) = β1
  • Estimator of β1
• Moreover, by the CLT:

$$\frac{\hat{\beta}_1 - E(\hat{\beta}_1)}{\sqrt{Var(\hat{\beta}_1)}} \simeq N(0, 1)$$


Summary (2)

If SLR.1-SLR.5 hold, then in large samples β̂1 and β̂0 have a jointly normal sampling distribution:

1 The large-sample normal distribution of β̂1 is $N\left(\beta_1, \sigma^2_{\hat{\beta}_1}\right)$, where the variance of this distribution is:

$$Var\left(\hat{\beta}_1\right) = \frac{1}{n}\,\frac{Var\left[(X_i - \mu_X)\, u_i\right]}{\left(\sigma_X^2\right)^2}$$

2 The large-sample normal distribution of β̂0 is $N\left(\beta_0, \sigma^2_{\hat{\beta}_0}\right)$, where the variance of this distribution is:

$$Var\left(\hat{\beta}_0\right) = \frac{1}{n}\,\frac{Var(H_i u_i)}{\left[E(H_i^2)\right]^2}, \quad \text{where } H_i = 1 - \left[\frac{\mu_X}{E(X_i^2)}\right] X_i$$

Ready to turn to hypothesis tests & confidence intervals!

