Chapter 3: Econometrics
Linear Regression with One Regressor
1 Introduction
4 Measures of fit
• What are our priors? −→ Smaller class sizes are better for learning outcomes (?)
• We are interested in

    β1 = (Change in TestScore) / (Change in ClassSize) = ΔTestScore / ΔClassSize
• But: the average test score in district i depends not only on the average class size
• It also depends on other factors such as:
• Quality of the teachers
• Student background
• Quality of text books
• ...
1 Introduction
4 Measures of fit
E(Y | X)

    ΔY / ΔX = β1   as long as   Δu / ΔX = 0
• By how much does Y change if X is increased by 1 unit?
• It is only correct if all other things remain equal when X is
increased by 1 unit
• Conditional mean independence: E(u | X) = 0
• (Explanatory variable must not contain information about the
mean of the unobserved factors)
• Can we test this?
    Wage_i = β0 + β1 Educ_i + u_i
    min over (β0, β1):  S(β0, β1) = Σ_{i=1}^n û_i^2 = Σ_{i=1}^n (Y_i − Ŷ_i)^2
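As a sketch (toy data assumed, numpy), the closed-form OLS coefficients indeed minimize S(β0, β1): any perturbation of them increases the sum of squared residuals.

```python
import numpy as np

# Minimal sketch with simulated toy data: the closed-form OLS solution
# minimizes S(b0, b1) = sum((Y - b0 - b1*X)^2).
rng = np.random.default_rng(0)
X = rng.normal(10, 2, size=200)
Y = 3.0 - 0.5 * X + rng.normal(0, 1, size=200)

def ssr(b0, b1):
    """Sum of squared residuals for candidate coefficients (b0, b1)."""
    return np.sum((Y - b0 - b1 * X) ** 2)

# Closed-form OLS estimates: slope = s_xy / s_x^2, intercept from the means
b1_hat = np.cov(X, Y, ddof=1)[0, 1] / np.var(X, ddof=1)
b0_hat = Y.mean() - b1_hat * X.mean()

# Any perturbation of the OLS coefficients increases the SSR
for db0, db1 in [(0.1, 0.0), (0.0, 0.1), (-0.2, 0.05)]:
    assert ssr(b0_hat, b1_hat) < ssr(b0_hat + db0, b1_hat + db1)
```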
Regression equation
An equation that expresses the linear relationship between two
variables
OLS estimators of β1 and β0

OLS estimator of β1:

    β̂1 = r_xy · (s_y / s_x) = s_xy / s_x^2 = (Σ_{i=1}^n x_i y_i − n X̄ Ȳ) / (Σ_{i=1}^n x_i^2 − n X̄^2)

where:
• r_xy = correlation coefficient between X and Y
• s_y = standard deviation of Y
• s_x = standard deviation of X
• s_xy = covariance between X and Y

OLS estimator of β0:

    β̂0 = Ȳ − β̂1 X̄

where:
• Ȳ = sample mean of Y
• X̄ = sample mean of X
• β̂1 = estimated slope of the regression line
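A quick numerical check (toy data assumed, numpy) that the three expressions for the OLS slope coincide:

```python
import numpy as np

# Sketch with simulated data: r_xy * (s_y/s_x), s_xy / s_x^2, and the
# summation formula all give the same slope estimate.
rng = np.random.default_rng(1)
x = rng.normal(5, 1.5, 100)
y = 2 + 0.7 * x + rng.normal(0, 0.5, 100)

sx, sy = x.std(ddof=1), y.std(ddof=1)
sxy = np.cov(x, y, ddof=1)[0, 1]
rxy = np.corrcoef(x, y)[0, 1]
n, xbar, ybar = len(x), x.mean(), y.mean()

b1_corr = rxy * sy / sx                                            # r_xy * (s_y / s_x)
b1_cov = sxy / sx**2                                               # s_xy / s_x^2
b1_sum = (np.sum(x * y) - n * xbar * ybar) / (np.sum(x**2) - n * xbar**2)

assert np.allclose([b1_corr, b1_cov], b1_sum)
b0 = ybar - b1_sum * xbar   # intercept: beta0_hat = Ybar - beta1_hat * Xbar
```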
Econometrics - Ch. 3 Miguel Ángel Borrella (UNAV)
Why use OLS estimators?
• Estimated slope: β̂1 = r_xy · (s_y / s_x) = −0.2264 × (19.053 / 1.8918) = −2.28
In Stata:
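The same calculation can be sketched in Python, using the sample statistics quoted above (assumed to be r_xy, s_y, and s_x for the test-score data):

```python
# Sketch of the slide's slope calculation; the numbers are the correlation
# and standard deviations given above (assumed from the test-score data).
r_xy, s_y, s_x = -0.2264, 19.053, 1.8918
beta1_hat = r_xy * s_y / s_x      # beta1_hat = r_xy * (s_y / s_x)
assert round(beta1_hat, 2) == -2.28
```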
How well does the estimated regression line “fit” or explain the data?
1 Does the regressor X account for much or for little variation
in Y ? −→ The R2 measures the fraction of the variance of Y
that is explained by X
• It is unitless
• Ranges between 0 (no fit) and 1 (perfect fit)
Standard error of the regression (SER):

    SER = s_û,  where  s_û^2 = (1/(n − 2)) Σ_{i=1}^n (û_i − û̄)^2 = (1/(n − 2)) Σ_{i=1}^n û_i^2

• The second equality holds because û̄ = (1/n) Σ_{i=1}^n û_i = 0
• The divisor n − 2 is used because 2 degrees of freedom were lost in estimating the two regression coefficients β0 and β1
• It measures the spread of the observations around the regression line, in the units of the dependent variable
• In other words: it measures the average "size" of the OLS residual (the average "mistake" made by the OLS regression)
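A small numpy sketch (toy data assumed) of these two facts: the OLS residuals average to zero, and the regression's standard error uses the n − 2 divisor.

```python
import numpy as np

# Sketch with simulated data: fit OLS, then compute the SER with divisor n - 2.
rng = np.random.default_rng(2)
x = rng.normal(0, 1, 50)
y = 1 + 2 * x + rng.normal(0, 3, 50)

b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
u_hat = y - (b0 + b1 * x)                 # OLS residuals

assert abs(u_hat.mean()) < 1e-10          # residuals average to (numerically) zero
ser = np.sqrt(np.sum(u_hat**2) / (len(x) - 2))   # divisor n - 2, not n
```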
Y = β0 + β1 X + u

    E(Y_i | X_i) = E(β0 + β1 X_i + u_i | X_i) = β0 + β1 E(X_i | X_i) + E(u_i | X_i) = β0 + β1 X_i

(the last step uses E(u_i | X_i) = 0)
Homoskedasticity graphically: (figure not reproduced)
Ȳ is the Best Linear Unbiased Estimator (BLUE) of μ_Y:

• Best: Var(Ȳ) = σ_Y^2 / n ≤ Var(μ̂_Y) for every linear unbiased estimator μ̂_Y
• Linear: μ̂_Y = (1/n) Σ_{i=1}^n Y_i
• Unbiased: E(Ȳ) = μ_Y

• Moreover, by the CLT:

    (Ȳ − E(Ȳ)) / sqrt(Var(Ȳ)) ≈ N(0, 1)
1 What is E(β̂1 )?
−→ If E(β̂1 ) = β1 , then OLS is unbiased (good thing!)
2 What is V ar(β̂1 )? (measure of sampling uncertainty)
−→ We need to derive a formula in order to compute the SE
of β1
3 What is the distribution of β̂1 in small samples?
−→ It is very complicated in general
4 What is the distribution of β̂1 in large samples?
−→ By the CLT, β̂1 is (approx) normally distributed
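The four questions above can be explored by simulation; a minimal numpy sketch (toy model assumed) draws many samples and checks that β̂1 is centered on the true β1:

```python
import numpy as np

# Sketch: simulate the sampling distribution of beta1_hat (true beta1 = 2).
rng = np.random.default_rng(3)
beta0, beta1, n = 1.0, 2.0, 200
estimates = []
for _ in range(2000):
    x = rng.normal(0, 1, n)
    u = rng.normal(0, 1, n)          # E(u | X) = 0 by construction
    y = beta0 + beta1 * x + u
    estimates.append(np.sum((x - x.mean()) * (y - y.mean()))
                     / np.sum((x - x.mean())**2))
estimates = np.asarray(estimates)

# Centered on beta1 (unbiasedness); the histogram of `estimates` would
# also look approximately normal, as the CLT predicts.
assert abs(estimates.mean() - beta1) < 0.02
```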
    Y_i = β0 + β1 X_i + u_i
    Ȳ = β0 + β1 X̄ + ū

Hence: Y_i − Ȳ = β1 (X_i − X̄) + (u_i − ū)

Thus:

    β̂1 = Σ_{i=1}^n (X_i − X̄)(Y_i − Ȳ) / Σ_{i=1}^n (X_i − X̄)^2

    β̂1 = Σ_{i=1}^n (X_i − X̄)[β1 (X_i − X̄) + (u_i − ū)] / Σ_{i=1}^n (X_i − X̄)^2

    β̂1 = β1 · [Σ_{i=1}^n (X_i − X̄)(X_i − X̄) / Σ_{i=1}^n (X_i − X̄)^2] + Σ_{i=1}^n (X_i − X̄)(u_i − ū) / Σ_{i=1}^n (X_i − X̄)^2

    β̂1 = β1 + Σ_{i=1}^n (X_i − X̄)(u_i − ū) / Σ_{i=1}^n (X_i − X̄)^2

Finally, since Σ_{i=1}^n (X_i − X̄) ū = ū Σ_{i=1}^n (X_i − X̄) = 0:

    β̂1 − β1 = Σ_{i=1}^n (X_i − X̄) u_i / Σ_{i=1}^n (X_i − X̄)^2
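The last algebraic step, replacing (u_i − ū) by u_i in the numerator, can be checked numerically; a minimal sketch with arbitrary draws:

```python
import numpy as np

# Sketch: sum((x - xbar) * ubar) = ubar * sum(x - xbar) = 0, so the
# (u - ubar) term in the numerator can be replaced by u alone.
rng = np.random.default_rng(4)
x = rng.normal(0, 1, 30)
u = rng.normal(0, 1, 30)
lhs = np.sum((x - x.mean()) * (u - u.mean()))
rhs = np.sum((x - x.mean()) * u)
assert np.isclose(lhs, rhs)
```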
    E(β̂1 − β1) = E[ Σ_{i=1}^n (X_i − X̄) u_i / Σ_{i=1}^n (X_i − X̄)^2 ]

Using the LIE:

    E(β̂1 − β1) = E[ Σ_{i=1}^n (X_i − X̄) E(u_i | X_i) / Σ_{i=1}^n (X_i − X̄)^2 ]

    E(β̂1 − β1) = 0, because SLR.3: E(u_i | X_i = x_i) = 0
where v_i = (X_i − μ_X) u_i
    plim β̂1 = plim [ Σ_{i=1}^n (X_i − X̄)(Y_i − Ȳ) / Σ_{i=1}^n (X_i − X̄)^2 ]

    plim β̂1 = β1 + plim [ ((1/n) Σ_{i=1}^n (X_i − X̄) u_i) / ((1/n) Σ_{i=1}^n (X_i − X̄)^2) ]

where (1/n) Σ_{i=1}^n (X_i − X̄) u_i →p 0 and (1/n) Σ_{i=1}^n (X_i − X̄)^2 →p Var(X), so plim β̂1 = β1.
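Consistency can also be illustrated by simulation; a sketch (toy model assumed) showing that the sampling spread of β̂1 shrinks as n grows:

```python
import numpy as np

# Sketch: the spread of beta1_hat around beta1 = 2 shrinks as n grows,
# illustrating plim beta1_hat = beta1.
rng = np.random.default_rng(5)

def beta1_hat(n):
    x = rng.normal(0, 1, n)
    y = 1.0 + 2.0 * x + rng.normal(0, 1, n)
    return np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)

# Standard deviation of beta1_hat across 300 simulated samples, for each n
spread = {n: np.std([beta1_hat(n) for _ in range(300)]) for n in (50, 500, 5000)}
assert spread[5000] < spread[500] < spread[50]
```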
    Var(β̂1) = (1/n) · Var[(X_i − μ_X) u_i] / (σ_X^2)^2

where σ_X^2 = Var(X_i).

Under SLR.6 (homoskedasticity):

    Var(β̂1) = σ_u^2 / (n · σ_X^2) = σ_u^2 / Σ_{i=1}^n (X_i − X̄)^2
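A Monte Carlo sketch (toy fixed design assumed) of the homoskedastic variance formula, holding X fixed and redrawing only the errors:

```python
import numpy as np

# Sketch: under homoskedasticity, the variance of beta1_hat given X equals
# sigma_u^2 / sum((x_i - xbar)^2); check by Monte Carlo with X held fixed.
rng = np.random.default_rng(6)
n, sigma_u = 100, 2.0
x = rng.normal(0, 1, n)                               # design held fixed
theory = sigma_u**2 / np.sum((x - x.mean())**2)

draws = []
for _ in range(5000):
    y = 1.0 + 2.0 * x + rng.normal(0, sigma_u, n)     # redraw only the errors
    draws.append(np.sum((x - x.mean()) * (y - y.mean()))
                 / np.sum((x - x.mean())**2))
mc_var = np.var(draws)
assert abs(mc_var - theory) / theory < 0.15           # close to the formula
```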
Parallel conclusions hold for the OLS estimator β̂1 (and also for β̂0):

• Under SLR.1-SLR.5, β̂1 is the Best Linear Unbiased Estimator (BLUE) of β1:
  • Best: Var(β̂1) ≤ Var(β̃1) for every linear unbiased estimator β̃1 −→ Efficient!
  • Linear: β̂1 is a linear combination of Y_1, . . . , Y_n, with weights determined by X_1, . . . , X_n
  • Unbiased: E(β̂1) = β1
• Moreover, by the CLT:

    (β̂1 − E(β̂1)) / sqrt(Var(β̂1)) ≈ N(0, 1)
If SLR.1-SLR.5 hold, then in large samples β̂1 and β̂0 have a jointly
normal sampling distribution:
1 The large-sample normal distribution of β̂1 is N(β1, σ²_β̂1), where the variance of this distribution is:

    σ²_β̂1 = (1/n) · Var[(X_i − μ_X) u_i] / (σ_X^2)^2

2 The large-sample normal distribution of β̂0 is N(β0, σ²_β̂0), where the variance of this distribution is:

    σ²_β̂0 = (1/n) · Var(H_i u_i) / [E(H_i^2)]^2,  where  H_i = 1 − (μ_X / E(X_i^2)) · X_i