4 - Simple Linear Regression I 2022-23

Introductory
Econometrics
BUSI2053
The Simple Linear
Regression Model I:
Specification and
Estimation
Lecture Outline
• An Economic Model
• An Econometric Model
• The Nature of Regression Analysis
• Estimating the Regression Parameters
• Assumptions of OLS and G-M condition
• Properties of the OLS estimator
• Suggested Reading:
• Chapter 2, 3 & 4.1, 4.2: Hill, R.C., Griffiths W.E. and Lim, G.C. Principles of
Econometrics, fourth edition, Wiley, 2012 (pp. 39-110)
• Chapter 5-6: Westhoff (2013) An introduction to econometrics: a self-contained
approach, MIT Press, 2013
• Chapter 1-3: Gujarati, D.N. and Porter D.C. Basic econometrics, 5th ed., McGraw-Hill,
2009 (pp. 34-117)
• Chapter 1 & 2: Dougherty, Christopher. Introduction to econometrics. 4th ed. Oxford
University Press, 2011 (pp.83-147)
03/28/2023 Myint Moe Chit, BUSI2053 2
Before we move on,
• Random variable
• Conditional probability
• Population parameters (Fixed values)
• Sample statistics (Random variables)
• Expected value rules
• Variance and covariance rules
• Estimator and estimate
• Properties of an estimator

Simple linear regression
The statistical method that uses a

straight-line relationship to predict a 100
numerical dependent variable Y from 90 Y=1.23+ 0. 6X + e
a single numerical independent 80
variable X. 70
Exam Scores
60
50
40
30
20
10
0
0 10 20 30 40 50 60 70 80 90 100
Attendance Rate

An Economic Model
• As business decision makers, we are usually more
interested in studying relationships between variables
– Economic theory tells us that expenditure on economic
goods depends on income
𝐸𝑥𝑝𝑒𝑛𝑑𝑖𝑑𝑢𝑟𝑒= 𝑓 (𝐼𝑛𝑐𝑜𝑚𝑒)
– Consequently, Expenditure is defined as the ‘‘dependent
variable’’ and Income as the “independent’’ or
‘‘explanatory’’ variable
– The dependent variable is denoted as y and the
independent variable is denoted as x.
y=Expenditure; x=Income

An Economic Model
• Suppose that we are interested only in food expenditure
of households with an income of $1,000 per week
Probability distribution of
food expenditure y given
income x = $1000
An Economic Model to Econometric Model
• In econometrics, we recognise that real-world expenditures
are random variables, and we want to use data to learn
about the relationship between income and expenditure.
Regression analysis is largely concerned

with estimating and/or predicting the
(population) mean value of the dependent
variable based on the known or fixed values
of the explanatory variable(s).
Probability distributions of
food expenditures y given
incomes x = $1000 and x = $2000
Notes on Regression Analysis 1
Regression versus Correlation
• The correlation analysis measures the strength
or degree of linear association between two
variables that are assumed to be random
• Regression analysis tries to estimate or predict

the average value of a random variable based
on the ﬁxed values of other variables
• In regression analysis the dependent variable

is assumed to be statistical, random, or
stochastic. The explanatory variables, on the
other hand, are assumed to have ﬁxed values
(in repeated sampling)
An Econometric Model
• To investigate the relationship between
expenditure and income we must build
𝐸¿
Average expenditure
an economic model and then a
corresponding econometric model
• This econometric model is also called a 𝐸¿
regression model
– The simple regression model can be Δ 𝐸(𝑦∨𝑥)
written as Δ𝑥 Δ 𝐸(𝑦 ∨𝑥) 𝑑𝐸 (𝑦∨𝑥)
𝛽 2= =
𝐸 ( 𝑦 ∨𝑥 ) =𝜇 𝑦 ∨𝑥 =𝛽 1+ 𝛽2 𝑥 𝛽1 Δ𝑥 𝑑𝑥
Income y
where β1 is the intercept and β2 is the
slope and they are population
dE(y|x)/dx denotes the derivative of the
parameters Quiz 1
expected value of y given an x value
An Econometric Model
• The model describes economic 𝑓 (𝑦)
behaviour in the population of our
interest re
it u
• If we were to sample household d
pen
expenditures at other levels of income ex
ood
the regression function describes the F
𝛽 1+ 𝛽2 𝑥=𝐸 ¿
mean value of household expenditure for
each level of income
• Note: represents the model for expected
value of y. For the actual value of y, the
model can be written as: where is Ho
us e ho
random error term. ld i
n co
m𝑥e

Introducing the Error Term
Any observation on the dependent variable y can be decomposed
into two parts: a systematic component and a random component
called a “random error term”.
𝑦
𝑦4
𝑒4
𝐸 ¿
𝑦3
𝑒3
For example, the equation for :
𝑒2
𝑦2
𝑦1 𝑒1
The relationship among y, e
and the true regression line
𝑥1
03/28/2023 𝑥2 𝑥3
Myint Moe Chit, BUSI2053 𝑥4 𝑥 11
• The random error term is defined as 𝑦
𝑦4
𝑒= 𝑦 − 𝐸 ( 𝑦 ∨𝑥 ) =𝑦 − β 1 − β 2 𝑥 𝑒4
𝐸¿
𝑦3
• Rearranging gives a population 𝑒3
regression model.
𝑒2
𝑦 =β 1+ β 2 𝑥+𝑒 𝑦2
𝑒1
𝑦1
• The expected value of the error term,
given x, is 𝑥1 𝑥2 𝑥3 𝑥4 𝑥
𝐸(𝑒∨𝑥)=𝐸 (𝑦∨𝑥)− β1 −β 2 𝑥=0 Why?

• The random error term is defined as
𝑒= 𝑦− 𝐸 ( 𝑦∨𝑥 ) =𝑦 − β 1 − β 2 𝑥
• Any other economic factors that affect expenditures on food
are “collected” in the error term.
• The error term e is a “storage bin” for unobservable and/or

unimportant factors affecting the dependent variable
• The error term e captures any approximation error that arises

because the linear functional form we have assumed
• The error term captures any elements of random behaviour

that may be present in each individual observation.
Probability density functions for e and y Note that

• Regression analysis concerned with the statistical, not
functional or deterministic, dependence among variables
• The dependence of family food expenditure on income,

family size, location, and taste, for example, is statistical in
nature
• The explanatory variables, although certainly important,

will not enable the economist to predict food expenditure
exactly because of measurement errors as well as many
other unobservable factors (variables) that collectively
affect the expenditure
• Thus, there is bound to be some “intrinsic” or random variability in the dependent-
variable food expenditure that cannot be fully explained no matter how many explanatory
variables we consider.
Estimating the Regression Parameters
Estimates for the Food Expenditure Function
y =weekly food expenditure in $

x =weekly income in $100
Food Expenditure and Income Data [food (POE 4th ed.) from gretl] Quiz 2
Least Squares Principles
• Least squares estimates for the ^𝑦 𝑖 =𝑏 1+𝑏 2 𝑥𝑖
unknown parameters β1 and β2
are obtained my minimizing the
sum of squares function:
See Appendix 2A (p83-84) in Hill et el. (2012) or Appendix

𝑏1= ¯
𝑦 −𝑏 2 ¯
𝑥 3A.1 (p.92) Gujarati and Porter (2009) for derivations.

Least Squares Principles
• We want to estimate population
𝐸( 𝑦)=𝛽 1+ 𝛽2 𝑥𝑖
parameters β1 and β2 from
𝑦 =β 1+ β 2 𝑥+𝑒
• The fitted regression line is:
^𝑦 𝑖 =𝑏 1+𝑏 2 𝑥𝑖
^𝑦 𝑖 =𝑏 1+𝑏 2 𝑥𝑖
• The least squares residual is:
𝑒^ 𝑖= 𝑦 𝑖 − ^𝑦 𝑖= 𝑦 𝑖 −𝑏1 −𝑏 2 𝑥 𝑖
Be mindful of differences in notations in
different textbooks.
Estimating the Regression Parameters (exercise)
x = Number of children in a
1 2 3 4 3*4 5 household
xi y i ( 𝑥𝑖 − 𝑥
¯()𝑦 𝑖 − 𝑦
¯) ( 𝑥𝑖 − 𝑥¯ )2 ^
𝑦 =𝑏1 +𝑏2 𝑥𝑖 𝑒^
y = Number of visits to KFC
2 4 -2 -1 2 4 in a month
3 3 -1 -2 2 1
5 4 1 -1 -1 1
6 7 2 2 4 4
5 4 1 -1 -1 1
4 7 0 2 0 0
4 6 0 1 0 0
3 5 -1 0 0 1
¯𝑥 =4 𝑦¯ =5 ∑=6 ∑=12
𝑏1= ¯𝑦 −𝑏 2 ¯𝑥 =¿
Estimates for the Food Expenditure Function
The population regression model
y =weekly food expenditure in $
𝑦 =β 1+ β 2 𝑥+𝑒
^𝑦 =83.42+10.21 𝑥 The fitted regression line
^𝑦 𝑖 =83.42+10.21 𝑥𝑖
𝑦¯ =283.57 The sample regression model
𝑦 𝑖 =8 3.42+1 0.21 𝑥𝑖 + 𝑒^ 𝑖
¯𝑥 =19.60
Quiz 3
x =weekly income in $100
Regression versus Causation
• Although regression analysis deals with the
dependence of one variable on other variables, it
does not necessarily imply causation.
“A statistical relationship, however strong and

however suggestive, can never establish causal
connection: our ideas of causation must come
from outside statistics, ultimately from some
theory or other.”
• In the expenditure-income example, there is no statistical reason to assume that income does
not depends on expenditure. The fact that we treat expenditure as dependent on income
(among other things) is due to non-statistical considerations: Common sense suggests that
the relationship cannot be reversed, or we cannot control income by varying food
expenditure.
Assumptions of the Simple Linear Regression
SR1:The value of y, for each value of x, is given by the linear
regression: 𝑦 =β 1+ β 2 𝑥+𝑒
SR2: The expected value of the random error e is: 𝐸(𝑒 𝑖 )=0
2
SR3: The variance of the random error e is: var (¿𝑒𝑖 )=𝜎 =var(¿ 𝑦 𝑖 )¿¿
SR4: The covariance between any pair of random errors, ei and ej is:
cov (¿𝑒𝑖 , 𝑒 𝑗 )=cov(¿ 𝑦 𝑖 , 𝑦 𝑗 )=0 ¿¿
SR5: The variable x is not random, and must take at least two
different values
2
SR6: Optional 𝑒 𝑁 (0,σ )
Properties of least square estimator
Gauss-Markov condition
Under the assumptions SR1-SR5 of the linear regression model, the OLS estimators b1
and b2 have the smallest variance of all linear and unbiased estimators of β1 and β2.
They are the Best Linear Unbiased Estimators (BLUE) of β1 and β2.
1. The estimators b1 and b2 are “best” compared to similar estimators because they
have the minimum variance.
2. The estimators b1 and b2 (not specific b1 and b2) are linear and unbiased.
3. If any of these assumptions are not true, then b1 and b2 are not BLUE
See optional slides on Moodle for the proof of the properties

Summary
After this lecture, you should be able to:
• differentiate between an economic model and an econometric model
• explain the meaning of random error term
• understand the underlying concept of the least squares method in
regression analysis
• identify and explain the keys assumption of simple linear regression
• properties of least square estimators

Thank you.
Any question?

4 - Simple Linear Regression I 2022-23

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

4 - Simple Linear Regression I 2022-23

Uploaded by

Copyright:

Available Formats

Introductory

03/28/2023 Myint Moe Chit, BUSI2053 3

The statistical method that uses a

03/28/2023 Myint Moe Chit, BUSI2053 4

03/28/2023 Myint Moe Chit, BUSI2053 5

Regression analysis is largely concerned

• Regression analysis tries to estimate or predict

• In regression analysis the dependent variable

03/28/2023 Myint Moe Chit, BUSI2053 10

𝐸(𝑒∨𝑥)=𝐸 (𝑦∨𝑥)− β1 −β 2 𝑥=0 Why?

03/28/2023 Myint Moe Chit, BUSI2053 12

• The error term e is a “storage bin” for unobservable and/or

• The error term e captures any approximation error that arises

• The error term captures any elements of random behaviour

03/28/2023 Myint Moe Chit, BUSI2053 14

• The dependence of family food expenditure on income,

• The explanatory variables, although certainly important,

y =weekly food expenditure in $

See Appendix 2A (p83-84) in Hill et el. (2012) or Appendix

03/28/2023 Myint Moe Chit, BUSI2053 17

“A statistical relationship, however strong and

See optional slides on Moodle for the proof of the properties

03/28/2023 Myint Moe Chit, BUSI2053 23

03/28/2023 Myint Moe Chit, BUSI2053 24

You might also like