ETC2410 Lecture 4 Slides, Semester 1 2019, Monash University Clayton Campus


OLS Estimates and Effects of Rescaling

2019


Recap

The linear regression model in matrix form is

$$\underset{n \times 1}{y} \;=\; \underset{n \times (k+1)}{X}\,\underset{(k+1) \times 1}{\beta} \;+\; \underset{n \times 1}{u}$$

- OLS chooses β̂ so that the fitted vector ŷ = Xβ̂ is the point in the column space of X that is closest to the vector y, i.e. the length of its error vector (the OLS residuals) is the shortest.
- This implies that the OLS residual vector is perpendicular to all columns of X, i.e.

$$X'\hat{u} = 0$$

which leads to the famous OLS formula

$$\hat{\beta} = (X'X)^{-1}X'y$$
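A minimal numerical sketch of this formula (simulated data in Python/numpy; the setup and variable names are illustrative, not from the slides) that also checks the orthogonality condition X'û = 0:

    import numpy as np

    rng = np.random.default_rng(0)
    n, k = 200, 2                        # n observations, k regressors plus an intercept
    X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])   # design matrix
    beta = np.array([1.0, 2.0, -0.5])    # "true" coefficients used to simulate the data
    y = X @ beta + rng.normal(size=n)    # y = X beta + u

    # OLS formula: beta_hat = (X'X)^{-1} X'y, computed by solving the normal equations
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    u_hat = y - X @ beta_hat             # OLS residual vector

    print(beta_hat)                      # close to (1, 2, -0.5)
    print(X.T @ u_hat)                   # essentially a zero vector: X' u_hat = 0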

- A consequence of the orthogonality of the residuals and the columns of X is that

$$\sum_{i=1}^{n}(y_i - \bar{y})^2 \;=\; \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2 \;+\; \sum_{i=1}^{n}\hat{u}_i^2$$

or

SST = SSE + SSR

- This leads to the definition of the coefficient of determination R², which is a measure of goodness of fit:

$$R^2 = \text{SSE}/\text{SST} = 1 - \text{SSR}/\text{SST}$$
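A short companion sketch (again simulated numpy data with illustrative names) verifying the decomposition and the two equivalent expressions for R²:

    import numpy as np

    rng = np.random.default_rng(1)
    n = 200
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
    y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    y_hat = X @ beta_hat
    u_hat = y - y_hat

    SST = np.sum((y - y.mean()) ** 2)        # total sum of squares
    SSE = np.sum((y_hat - y.mean()) ** 2)    # explained sum of squares
    SSR = np.sum(u_hat ** 2)                 # residual sum of squares

    print(np.isclose(SST, SSE + SSR))        # True: the decomposition holds
    print(SSE / SST, 1 - SSR / SST)          # two equal ways of computing R^2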

- The OLS formula gives us fitted values that are closest to the actual values, in the sense that the length of the residual vector is the smallest possible.
- But why should this be a good estimate of the unknown parameters of the conditional expectation function?
- We ask whether this formula produces the best information we can get from our data about the unknown parameters β.

Lecture Outline

- Interpretation of the OLS estimates in multiple regression (textbook reference 3-2a to 3-2e)
- Properties of good estimators
- Properties of the OLS estimator β̂
  1. OLS is unbiased: the expected value of β̂ (textbook reference 3-3, and Appendix E-2)
  2. The variance of β̂ (textbook reference 3-4, and Appendix E-2)
  3. OLS is BLUE (Gauss-Markov theorem): the efficiency of β̂ (textbook reference 3-5, and Appendix E-2)
- Units of measurement: do the results qualitatively change if we change the units of measurement? (textbook reference 2-4a, 6-1, excluding 6-1a)

Interpretation of OLS estimates

Example: The causal effect of education on wage

$$wage = \beta_0 + \beta_1\, educ + \beta_2\, IQ + u$$

where wage is monthly earnings, educ is years of education, and IQ is an IQ test score (scaled so that the population average is 100 with a standard deviation of 15).

- We are primarily interested in β1, because we want to know the value that education adds to a person's wage.
- Without IQ in the equation, the coefficient of educ will show how strongly wage and educ are correlated, but both could be caused by a person's ability.
- By explicitly including IQ in the equation, we obtain a more persuasive estimate of the causal effect of education, provided that IQ is a good proxy for intelligence.

- If we leave IQ out of the equation, we cannot be sure whether we are measuring the effect of education, or whether education is acting as a proxy for smartness. This is important, because if the education system does not add any value other than separating smart people from the not so smart, society could achieve that much more cheaply with national IQ tests!

- The coefficient of educ now shows that for two people with the same IQ score, the one with 1 more year of education is expected to earn $42 more.

- The conditional expectation of y given x1 and x2 (also known as the population regression function) is

$$E(y \mid x_1, x_2) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$$

- The OLS fitted equation (also known as the sample regression function) is

$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2$$

- The formula

$$\Delta\hat{y} = \hat{\beta}_1 \Delta x_1 + \hat{\beta}_2 \Delta x_2$$

allows us to compute how the predicted y changes when x1 and x2 change by any amount.

- What if we increase x1 by 1 unit and hold x2 fixed, that is, Δx1 = 1 and Δx2 = 0? Then Δŷ = β̂1, so β̂1 is the predicted change in y when x1 increases by 1 unit while x2 is held fixed.
- We also refer to β̂1 as the estimate of the partial effect of x1 on y, holding x2 constant.
- Yet another legitimate interpretation is that β̂1 estimates the effect of x1 on y after the influence of x2 has been removed (or has been controlled for).

- Similarly,

$$\Delta\hat{y} = \hat{\beta}_2 \quad \text{if } \Delta x_1 = 0 \text{ and } \Delta x_2 = 1$$

- Let's go back to the regression output and interpret the parameters:

$$\widehat{wage} = -128.89 + 42.06\, educ + 5.14\, IQ, \qquad n = 935,\ R^2 = 0.134$$

- 42.06 shows that for two people with the same IQ, the one with one more year of education is predicted to earn $42.06 more in monthly wages.
- Or: every extra year of education increases the predicted wage by $42.06, keeping IQ constant (or "after controlling for IQ", or "after removing the effect of IQ", or "all else constant", or "all else equal", or "ceteris paribus").
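As a small worked use of the formula Δŷ = β̂1Δx1 + β̂2Δx2 with the coefficients reported above (a Python sketch; the variable names are only illustrative):

    # Predicted change in monthly wage from the fitted equation
    # wage_hat = -128.89 + 42.06*educ + 5.14*IQ
    b_educ, b_IQ = 42.06, 5.14

    print(b_educ * 1 + b_IQ * 0)     # 42.06: one more year of education, IQ held fixed
    print(b_educ * 2 + b_IQ * 5)     # about 109.82: two more years of education and 5 more IQ points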

Effects of rescaling

The upshot: changing units of measurement does not create any extra information, so it only changes OLS results in predictable and non-substantive ways.

- Scaling x → cx: Any change of unit of measurement that involves multiplying x by a constant (such as changing dollars to cents or pounds to kilograms) does not change the column space of X, so it will not change ŷ. Therefore, it must be that the coefficient of x changes so that xβ̂ stays the same. If we replace x1 with x1* = c·x1 and re-estimate, the fitted equation

$$\hat{y} = \hat{\beta}_0^{*} + \hat{\beta}_1^{*} x_1^{*} + \hat{\beta}_2^{*} x_2$$

has coefficients

$$\hat{\beta}_0^{*} = \hat{\beta}_0, \qquad \hat{\beta}_1^{*} = \hat{\beta}_1 / c, \qquad \hat{\beta}_2^{*} = \hat{\beta}_2$$

- Change of units of the form x → a + cx: Any change of unit of measurement that involves multiplying x by a constant and adding or subtracting a constant (such as changing from °C to °F) does not change the column space of X either, so it will not change ŷ. Therefore, it must be that β̂ changes in such a way that Xβ̂ stays the same. If we replace x1 with x1* = a + c·x1, the fitted equation

$$\hat{y} = \hat{\beta}_0^{*} + \hat{\beta}_1^{*} x_1^{*} + \hat{\beta}_2^{*} x_2$$

has coefficients

$$\hat{\beta}_0^{*} = \hat{\beta}_0 - \frac{a}{c}\,\hat{\beta}_1, \qquad \hat{\beta}_1^{*} = \hat{\beta}_1 / c, \qquad \hat{\beta}_2^{*} = \hat{\beta}_2$$

- In both cases, since ŷ does not change, the residuals, SST, SSE and SSR all stay the same, so R² will not change (see the sketch below).
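A sketch of both x-rescaling cases in Python/numpy (simulated data with illustrative names and constants): the fitted values and R² are unchanged, and the slope on the rescaled regressor is divided by c.

    import numpy as np

    rng = np.random.default_rng(2)
    n = 300
    x1 = rng.normal(10, 2, n)
    x2 = rng.normal(size=n)
    y = 1 + 0.5 * x1 - x2 + rng.normal(size=n)

    def ols(y, *cols):
        X = np.column_stack([np.ones(len(y))] + list(cols))
        b = np.linalg.solve(X.T @ X, X.T @ y)
        yhat = X @ b
        r2 = 1 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)
        return b, yhat, r2

    c, a = 100.0, 32.0                               # illustrative rescaling constants
    b0, yhat0, r20 = ols(y, x1, x2)                  # original units
    b1, yhat1, r21 = ols(y, c * x1, x2)              # x1 -> c*x1
    b2, yhat2, r22 = ols(y, a + c * x1, x2)          # x1 -> a + c*x1

    print(np.allclose(yhat0, yhat1), np.allclose(yhat0, yhat2))   # fitted values unchanged
    print(b0[1], b1[1] * c, b2[1] * c)                            # slope on x1 scaled by 1/c
    print(r20, r21, r22)                                          # R^2 identical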

- Changing the units of the dependent variable, y → cy: This changes the length of y, but the column space of X is still the same. So ŷ will be multiplied by c, and since both y and ŷ are multiplied by c, the residuals will be multiplied by c as well. Writing the re-estimated equation as

$$y^{*} = \hat{\beta}_0^{*} + \hat{\beta}_1^{*} x_1 + \hat{\beta}_2^{*} x_2 + \hat{u}^{*}$$

we have

$$y^{*} = cy \;\Rightarrow\; \hat{\beta}_0^{*} = c\hat{\beta}_0, \quad \hat{\beta}_1^{*} = c\hat{\beta}_1, \quad \hat{\beta}_2^{*} = c\hat{\beta}_2, \quad \hat{u}^{*} = c\hat{u}$$

- In this case, SST, SSE and SSR all change, but all will be multiplied by ..., so again R² will not change (see the sketch below).
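A companion sketch for rescaling the dependent variable (again simulated numpy data): every coefficient and every residual is multiplied by c, while R² is unchanged.

    import numpy as np

    rng = np.random.default_rng(3)
    n = 300
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
    y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

    def fit(y, X):
        b = np.linalg.solve(X.T @ X, X.T @ y)
        u = y - X @ b
        r2 = 1 - np.sum(u ** 2) / np.sum((y - y.mean()) ** 2)
        return b, u, r2

    c = 1000.0                                   # e.g. thousands of dollars to dollars
    b, u, r2 = fit(y, X)
    b_s, u_s, r2_s = fit(c * y, X)

    print(np.allclose(b_s, c * b))               # every coefficient multiplied by c
    print(np.allclose(u_s, c * u))               # residuals multiplied by c
    print(r2, r2_s)                              # R^2 unchanged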

Estimators and their Unbiasedness

- An estimator is a formula (a function of the sample observations) that produces an estimate for parameters of interest.
- Example: the sample average is an estimator for the population mean.
- Example: β̂ = (X'X)⁻¹X'y is an estimator for the parameter vector β in the multiple regression model.
- Since estimators are functions of sample observations, they are ...
- While we do not know the values of population parameters, we can use the power of mathematics to investigate whether the expected value of the estimator is indeed the parameter of interest.
- Definition: An estimator is an unbiased estimator of a parameter of interest if its expected value is the parameter of interest.

The Expected Value of the OLS Estimator

We want to show that E(β̂) = β, under the following assumptions (each stated first in the textbook's scalar MLR form, then in the matrix E form):

MLR.1 Linear in parameters: y = β0 + β1x1 + ... + βkxk + u
  E.1 Linear in parameters: y = Xβ + u, with y (n×1), X (n×(k+1)), β ((k+1)×1) and u (n×1)

MLR.2 Random sampling: we have a random sample of n observations
MLR.4 Zero conditional mean: E(u | x1, x2, ..., xk) = 0
  E.3 Zero conditional mean: E(u | X) = 0 (an n×1 vector of zeros); MLR.2 and MLR.4 together correspond to E.3

MLR.3 No perfect collinearity: none of x1, x2, ..., xk is a constant and there are no exact linear relationships among them
  E.2 No perfect collinearity: X has rank k + 1

- We use an important property of conditional expectations to prove that the OLS estimator is unbiased: if E(z | w) = 0, then E(g(w)z) = 0 for any function g. For example, E(wz) = 0, E(w²z) = 0, etc.
- Now let's show that E(β̂) = β.
- The assumption of no perfect collinearity (E.2) is immediately required because ...
- Step 1: Using the population model (Assumption E.1), substitute for y in the estimator's formula and simplify:

$$\hat{\beta} = (X'X)^{-1}X'y = (X'X)^{-1}X'(X\beta + u) = \beta + (X'X)^{-1}X'u$$

- Step 2: Take expectations:

$$E(\hat{\beta}) = E\big(\beta + (X'X)^{-1}X'u\big) = \beta + E\big((X'X)^{-1}X'u\big) = \beta$$

- Note that all of the assumptions were needed and were used in this proof.
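A quick Monte Carlo sketch of unbiasedness (an illustrative numpy simulation, not from the slides): with X held fixed and errors satisfying E(u | X) = 0, the average of β̂ across many samples is very close to β.

    import numpy as np

    rng = np.random.default_rng(4)
    n = 100
    beta = np.array([1.0, 2.0, -0.5])
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])   # keep X fixed across replications

    estimates = []
    for _ in range(5000):
        u = rng.normal(size=n)                   # errors with E(u | X) = 0
        y = X @ beta + u
        estimates.append(np.linalg.solve(X.T @ X, X.T @ y))

    print(np.mean(estimates, axis=0))            # close to (1, 2, -0.5): E(beta_hat) = beta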

- Are these assumptions too strong?
- Linearity in parameters is not too strong, and does not exclude non-linear relationships between y and x (more on this later).
- Random sampling is not too strong for cross-sectional data if participation in the sample is not voluntary. Randomness is obviously not correct for time series data.
- Perfect multicollinearity is quite unlikely unless we have done something silly, like using income in dollars and income in cents in the list of independent variables, or we have fallen into the "dummy variable trap" (more on this later).
- Zero conditional mean is not a problem for predictive analytics, because for the best prediction we always want our best estimate of E(y | x1, x2, ..., xk) for a set of observed predictors.

- Zero conditional mean can be a problem for prescriptive analytics (causal analysis) when we want to establish the causal effect of one of the x variables, say x1, on y controlling for an attribute that we cannot measure. E.g. the causal effect of education on wage keeping ability constant, where the population model is wage = β0 + β1 educ + β2 ability + u, but ability is unobservable.
- We run the regression wage = α̂0 + α̂1 educ + v̂, and

$$E(\hat{\alpha}_1) = \beta_1 + \beta_2\,\frac{\partial\, ability}{\partial\, educ} \neq \beta_1$$

- α̂1 is a biased estimator of β1.
- This is referred to as "omitted variable bias" (see the simulation sketch below).
- One solution is to add a measurable proxy for ability, such as IQ.
- Other solutions are covered in ETC3410.
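A simulation sketch of omitted variable bias (all numbers and names are illustrative, not from the slides): the true model includes ability, education is positively related to ability, and the short regression of wage on educ alone systematically over-estimates β1.

    import numpy as np

    rng = np.random.default_rng(5)
    n, reps = 500, 2000
    beta1, beta2 = 40.0, 8.0                             # illustrative "true" effects of educ and ability

    a1_hat = []
    for _ in range(reps):
        ability = rng.normal(size=n)
        educ = 12 + 2 * ability + rng.normal(size=n)     # education positively related to ability
        wage = 100 + beta1 * educ + beta2 * ability + rng.normal(scale=50, size=n)

        # Short regression of wage on educ only (ability omitted)
        X = np.column_stack([np.ones(n), educ])
        a1_hat.append(np.linalg.solve(X.T @ X, X.T @ wage)[1])

    print(np.mean(a1_hat))    # noticeably above 40: alpha1_hat is biased upwards because
                              # beta2 > 0 and educ and ability are positively related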

- Zero conditional mean can be quite restrictive in time series analysis as well.
- It implies that the error term in any time period t is uncorrelated with each of the regressors in all time periods: past, present and future.
- Assumption MLR.4 is violated when the regression model contains a lag of the dependent variable as a regressor, e.g. when we want to predict this quarter's GDP using GDP outcomes for the past 4 quarters.
- In this case, the OLS estimators of the regression parameters are biased.

The Variance of the OLS Estimator

- Coming up with unbiased estimators is not too hard. But how the estimates they produce are dispersed around the mean determines how precise they are.
- To study the variance of β̂, we need to learn about the variance of a vector of random variables: the variance-covariance (var-cov) matrix.

- We introduce an extra assumption, in addition to MLR.1-MLR.4 (E.1-E.3) listed above:

MLR.5 Homoskedasticity: E(u² | x1, x2, ..., xk) = σ²
  E.4 Homoskedasticity + randomness: Var(u | X) = σ² Iₙ

- Under these assumptions, the variance-covariance matrix of the OLS estimator conditional on X is

$$Var(\hat{\beta} \mid X) = \sigma^2 (X'X)^{-1}$$

because β̂ = β + (X'X)⁻¹X'u, so

$$Var(\hat{\beta} \mid X) = (X'X)^{-1}X'\,Var(u \mid X)\,X(X'X)^{-1} = \sigma^2 (X'X)^{-1}X'X(X'X)^{-1} = \sigma^2 (X'X)^{-1}$$
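A Monte Carlo sketch of this formula (an illustrative numpy simulation, conditioning on a fixed X): the empirical variance-covariance matrix of β̂ across replications is close to σ²(X'X)⁻¹.

    import numpy as np

    rng = np.random.default_rng(6)
    n, sigma = 100, 2.0
    beta = np.array([1.0, 2.0, -0.5])
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])   # fixed design matrix

    B = np.array([np.linalg.solve(X.T @ X, X.T @ (X @ beta + rng.normal(scale=sigma, size=n)))
                  for _ in range(20000)])

    print(np.cov(B, rowvar=False))                 # Monte Carlo var-cov matrix of beta_hat
    print(sigma ** 2 * np.linalg.inv(X.T @ X))     # theoretical sigma^2 (X'X)^{-1}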

- We can immediately see that, given X, the OLS estimator will be more precise (i.e. its variance will be smaller) the smaller σ² is.
- It can also be seen (although it is not as obvious) that as we add observations to the sample, the variance of the estimator decreases, which also makes sense.
- Gauss-Markov Theorem: Under Assumptions E.1 to E.4 (or MLR.1 to MLR.5), β̂ is the best linear unbiased estimator (B.L.U.E.) of β.
- This means that there is no other estimator that can be written as a linear combination of the elements of y that is unbiased and has a lower variance than β̂.
- This is the reason that everybody loves the OLS estimator.

Estimating the Error Variance

- We can calculate β̂, and we showed that

$$Var(\hat{\beta}) = \sigma^2 (X'X)^{-1}$$

but σ² is unknown.

- An unbiased estimator of σ² is

$$\hat{\sigma}^2 = \frac{\sum_{i=1}^{n}\hat{u}_i^2}{n-k-1} = \frac{\hat{u}'\hat{u}}{n-k-1}$$

- Its square root, σ̂, is reported in regression output under the name standard error of the regression, and it is a measure of how good the fit is (the smaller, the better).

- Why do we divide SSR by n − k − 1 instead of n?
- In order to get an unbiased estimator of σ²:

$$E\left(\frac{\hat{u}'\hat{u}}{n-k-1}\right) = \sigma^2$$

If we divide by n, the expected value of the estimator will be slightly different from the true parameter (the proof of unbiasedness of σ̂² is not required).

- Of course, if the sample size n is large, this bias is very small.
- Think about dimensions: ŷ is in the column space of X, so it is in a subspace with dimension k + 1.
- û is orthogonal to the column space of X, so it is in a subspace with dimension n − (k + 1) = n − k − 1. So even though there are n coordinates in û, only n − k − 1 of those are free (û has n − k − 1 degrees of freedom).

Standard error of each parameter estimate

- The standard errors of the individual parameter estimates are the square roots of the diagonal elements of

$$\widehat{Var}(\hat{\beta} \mid X) = \hat{\sigma}^2 (X'X)^{-1}$$

- When reporting regression results, we usually report these standard errors under their corresponding parameter estimates (see the sketch below).
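A sketch putting the last two slides together (simulated numpy data, illustrative names): estimate σ² with the degrees-of-freedom correction and read the standard errors off the diagonal of σ̂²(X'X)⁻¹.

    import numpy as np

    rng = np.random.default_rng(7)
    n, k = 200, 2
    X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
    y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=2.0, size=n)

    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    u_hat = y - X @ beta_hat

    sigma2_hat = (u_hat @ u_hat) / (n - k - 1)             # unbiased estimator of sigma^2
    var_beta_hat = sigma2_hat * np.linalg.inv(X.T @ X)     # estimated var-cov matrix of beta_hat
    std_errors = np.sqrt(np.diag(var_beta_hat))            # standard error of each estimate

    print(np.sqrt(sigma2_hat))    # standard error of the regression
    print(std_errors)             # reported under the corresponding coefficient estimates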

- Example: Predicting wage with education and experience, with standard errors reported in parentheses under the corresponding coefficient estimates: (0.77), (0.05), (0.01). [Fitted equation not recoverable from the slide.]

Summary

- Under the assumptions that
  1. the population model is linear in parameters;
  2. the conditional expectation of the true errors given all explanatory variables is zero;
  3. the sample is randomly selected; and
  4. the explanatory variables are not perfectly collinear,
  the OLS estimator β̂ is an unbiased estimator of β.
- If we add homoskedasticity (no conditional heteroskedasticity) to the above assumptions, then OLS is the B.L.U.E. for β.

- Interpretation: it is important to interpret the coefficients using the context, and very important to remember that each β̂ estimates the partial effect of its corresponding x with all other x's staying constant.
- Linear scaling: linear transformations of the dependent or independent variables change the outcomes of linear regression in exactly predictable ways, and all of these outcomes are substantively the same. Hence, we should use the units of measurement that make the presentation of results most understandable, without worrying that changing the units of the data may affect our results.

Digression: The Variance-Covariance Matrix

- Consider n random variables v1, v2, ..., vn with expected values µ1, µ2, ..., µn. We can write

$$\underset{n\times 1}{v} = \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix}, \qquad E(v) = \underset{n\times 1}{\mu} = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_n \end{pmatrix}$$

- Denote the variance of vi by σi² and the covariance between vi and vj by σij, for i = 1, ..., n, j = 1, ..., n and i ≠ j.

- We define the variance-covariance matrix of v as

$$Var(v) = \begin{pmatrix} \sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1n} \\ \sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{n1} & \sigma_{n2} & \cdots & \sigma_n^2 \end{pmatrix}$$

- Note that since σij = Cov(vi, vj) = Cov(vj, vi) = σji, this matrix is symmetric.
- An important property: if A is an n × k matrix of constants, then the variance-covariance matrix of A'v is

$$Var(A'v) = A'\,Var(v)\,A$$
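A quick numerical check of this property (an illustrative numpy simulation, assuming the property referred to above, with an arbitrary positive-definite Var(v) and an arbitrary constant matrix A):

    import numpy as np

    rng = np.random.default_rng(8)
    n, k = 4, 2
    A = rng.normal(size=(n, k))                    # an n x k matrix of constants
    Sigma = np.array([[2.0, 0.5, 0.0, 0.0],
                      [0.5, 1.0, 0.3, 0.0],
                      [0.0, 0.3, 1.5, 0.2],
                      [0.0, 0.0, 0.2, 1.0]])       # Var(v): symmetric, positive definite

    v = rng.multivariate_normal(np.zeros(n), Sigma, size=200000)   # draws of the random vector v
    w = v @ A                                      # each row is (A'v)' for one draw

    print(np.cov(w, rowvar=False))                 # empirical var-cov matrix of A'v
    print(A.T @ Sigma @ A)                         # theoretical A' Var(v) A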
