
ECON1203

Business and Economic Statistics

Week 10

Week 10 topics
Simple linear regression

Method of least squares
Basic assumptions of the regression model
Inference and explanatory power

We have flirted with regression before!

The material in this lecture and the next draws on Sharpe, Chapters 4, 15, 16 and 17
Make sure you are across the material in Ch. 15 about prediction and confidence intervals!

Recall (end of) week 2 lecture

We talked about how to fit a line to a bivariate scatterplot
The example plotted hours of internet use against education.

Simple regression

Suppose we have $(Y_i, X_i)$ pairs for $i = 1, \dots, n$

We fit a line to the data by minimizing the residual sum of squares. This is what ordinary least squares (OLS) regression does.

This line is defined by estimates of the intercept and slope in the linear relationship $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$

The slope coefficient has the same sign as the sample covariance (and correlation) between Y and X

Simple regression
Assume:
$$\hat{Y}_i = b_0 + b_1 X_i$$
This is the value of Y we would predict for observation i based on this linear model, if we knew its X.

where $b_0$ and $b_1$ are chosen to minimize
$$\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$$
(which is a function of $b_0$, $b_1$, and the data on X and Y). Here $Y_i$ is the actual Y value for observation i.

The solution to this minimization problem is:
$$b_1 = \frac{s_{xy}}{s_x^2} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}; \qquad b_0 = \bar{Y} - b_1 \bar{X}$$
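The closed-form solution can be checked by hand on a tiny data set. The following is a minimal sketch in plain Python; the function name `ols_fit` and the five data points are made up for illustration and are not from the lecture.

```python
def ols_fit(x, y):
    """Return (b0, b1) for the least-squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of cross-products and squares about the means. The (n - 1)
    # divisors of s_xy and s_x^2 cancel in the ratio, so they are omitted.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical data, made up for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
b0, b1 = ols_fit(x, y)

# Residuals from the fitted line; with an intercept they sum to zero.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(b0, b1, sum(residuals))
```

For these made-up numbers the slope is approximately 1.97 and the intercept approximately 0.09, and the residuals sum to zero up to rounding.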

Numerical versus statistical properties

OLS can be viewed as curve fitting

The curve we fit describes the relationship among variables

But we also want to make inferences about the parameters of the population regression function

How can we use $b_1$ to make inferences about $\beta_1$?

What are the properties of $b_1$ as an estimator of $\beta_1$?

We can also use regression models to make predictions or forecasts, e.g.:

If a company increases its advertising expenditure, what is the predicted impact on sales?
What is the confidence interval for that prediction?

Some basics

Terminology

$Y_i$ is the dependent variable

$X_i$ is the independent or explanatory variable

$\varepsilon_i$ is the disturbance or error term

$\beta_0$ and $\beta_1$ are the parameters to be estimated

OLS produces:
Estimated parameter values $b_0$, $b_1$
Predictions: $\hat{Y}_i = b_0 + b_1 X_i$
Residuals: $e_i = Y_i - \hat{Y}_i$

Some basics

The population regression relationship is
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$

This equation links the (X, Y) pairs via the unknown parameters and the unobserved errors

OLS produces
$Y_i = b_0 + b_1 X_i + e_i$

This equation links the (X, Y) pairs via estimated parameters and calculated residuals

Some basics

The disturbance term $\varepsilon_i$ plays a crucial role in regression

It distinguishes regression models from deterministic functions
It represents factors other than $X_i$ that affect $Y_i$

Regression treats these other factors as unobserved

$\beta_1$ is the marginal effect of $X_i$ on $Y_i$, holding these other factors constant

Reliable estimates of $\beta_1$ will require assumptions restricting the relationship between $X_i$ and $\varepsilon_i$
Our desire to make ceteris paribus interpretations of $\beta_1$ motivates including more explanatory variables in the regression
This leads to an extension to multiple regression

Example: An Engel curve for food

Sample of 40 households, each with 3 family members
Data on:

Y = weekly expenditure on food ($)
X = weekly income ($)
Cross-sectional data

Population Engel curve (demand function holding prices constant):
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, $i = 1, \dots, n$

Engel curve for food

For these data,

$b_0 = 7.38$ and $b_1 = 0.23$

How do you interpret $b_1$?

It is the estimated partial effect of X on Y. In this case, a $1 increase in household income is estimated to increase household expenditure on food by 23 cents.

What is the predicted food expenditure for a household with a weekly income of $100?
$\hat{Y} = 7.38 + 0.23 \times 100 = 30.38$ (dollars per week)

Why restrict to households with 3 people?
Because food expenditure also depends on household size, we must control for household size. In this example, we do this by estimating the equation only on a set of same-size households.
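The prediction arithmetic can be written as a short sketch. The two coefficients are the estimates quoted on the slide; the function name `predict_food_spend` is made up for illustration.

```python
B0 = 7.38  # estimated intercept from the slide (dollars per week)
B1 = 0.23  # estimated slope: extra food spending per extra dollar of income


def predict_food_spend(income):
    """Predicted weekly food expenditure for a given weekly income."""
    return B0 + B1 * income


# Prediction for a household with a weekly income of 100 dollars.
pred_100 = predict_food_spend(100)

# The slope is the change in the prediction for a one-dollar rise in income.
extra = predict_food_spend(101) - predict_food_spend(100)
print(round(pred_100, 2), round(extra, 2))
```

Keep in mind that the estimates come from three-person households only, so the prediction applies to households of that size.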

Assumptions of the Classical Linear Regression Model

A1: Linearity (the model below is right):
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, $i = 1, \dots, n$

A2: Random sampling:
We have a random sample $(Y_i, X_i)$ from the population governed by the model in A1

A3: Sample variation in X:
The values of $X_1, X_2, \dots, X_n$ are not all equal

A4: Zero conditional mean:
$E(\varepsilon_i \mid X_i) = 0$
This is a key assumption that we use in interpreting our regression results. It implies that $\varepsilon_i$ and $X_i$ are uncorrelated.

A2 and A4 mean we can think of X as fixed rather than random

Assumptions of the Classical Linear Regression Model

A5: Homoskedasticity:
$\mathrm{Var}(\varepsilon_i) = \sigma^2$

A6: The disturbances are uncorrelated:
$\mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0$ for $i \neq j$

A7: The disturbances are normally distributed, with identical variance (used for inference):
$\varepsilon_i \sim N(0, \sigma^2)$

In some circumstances, some or all of these assumptions will not be realistic! What do we do?

Study more econometrics (beyond this course):
What happens to the estimates we get when some of these assumptions are violated
What to do about it

Classical Linear Regression Model (CLRM)

Given A1-A7, it is true that:

$Y_i \sim N(\beta_0 + \beta_1 X_i, \sigma^2)$
The (conditional) mean of Y depends on X

If $\beta_1 = 0$, what is $\beta_0$?

$\beta_1 = 0 \Rightarrow \beta_0 = \mu_Y$

Given A1-A4, we can show that the OLS estimators are unbiased and consistent

Given A1-A6, OLS is the best method to use, in a particular sense
Exactly what "best" means in this context: lowest variance (of the sampling distribution) among linear unbiased estimators

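Explaining variation in the dependent variable

[Figure: a data point $(X_i, Y_i)$ and the fitted line $\hat{Y} = b_0 + b_1 X$, with Y on the vertical axis and X on the horizontal axis. A horizontal line at $\bar{Y}$ splits the point's deviation from the mean into the residual $Y_i - \hat{Y}_i$ and the explained part $\hat{Y}_i - \bar{Y}$.]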

Decomposition of variance

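$$Y_i - \bar{Y} = (\hat{Y}_i - \bar{Y}) + (Y_i - \hat{Y}_i) = (\hat{Y}_i - \bar{Y}) + e_i$$
Total deviation = part explained by X + unexplained part

It can be shown that
$$\sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^{n} e_i^2$$
Total sum of squares = regression sum of squares + error sum of squares
SST = SSR + SSE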
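The identity SST = SSR + SSE can be verified numerically. Below is a minimal sketch on made-up data; the function name `sums_of_squares` and the data values are illustrative assumptions. Note that the identity relies on the line being the OLS fit with an intercept; it does not hold for an arbitrary line.

```python
def sums_of_squares(x, y):
    """Return (SST, SSR, SSE) for the OLS line fitted to x and y."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)                # total
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)            # explained by X
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # unexplained
    return sst, ssr, sse


# Hypothetical data, made up for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
sst, ssr, sse = sums_of_squares(x, y)
gap = sst - (ssr + sse)  # zero up to floating-point rounding
print(sst, ssr, sse, gap)
```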

Standard error of the estimate

The population variance, $\sigma^2$, measures the spread of the data around the population regression line

The standard error of the estimate (SEE) is an estimator of $\sigma$
It measures the fit of the regression model
Low SEE $\Rightarrow$ good fit
$$\mathrm{SEE} = s \quad \text{where} \quad s^2 = \frac{\sum_{i=1}^{n} e_i^2}{n - 2} = \frac{\mathrm{SSE}}{n - 2}$$

The divisor of $n - 2$ is a degrees-of-freedom adjustment (for the 2 parameters that need to be estimated) to ensure that $s^2$ is an unbiased estimator of $\sigma^2$
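As a rough illustration of the SEE formula, the sketch below fits the line, forms the residuals and divides their sum of squares by n - 2. The function name and the data are made up for illustration.

```python
import math


def standard_error_of_estimate(x, y):
    """Return s = sqrt(SSE / (n - 2)) for the OLS line fitted to x and y."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    # Two parameters (b0 and b1) were estimated, hence n - 2.
    return math.sqrt(sse / (n - 2))


# Hypothetical data, made up for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
see = standard_error_of_estimate(x, y)
print(see)
```

SEE is in the same units as Y, so it is usually judged against the typical size of Y rather than on its own.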

Coefficient of determination
Define:
$$R^2 = \frac{\mathrm{SSR}}{\mathrm{SST}} = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}}$$
$R^2$ measures the proportion of variation in the dependent variable that is explained by the regression model
Note:
$$0 \le R^2 \le 1$$
The closer $R^2$ is to unity, the better the fit of "the model" (right-hand side) to "the data" (left-hand side)
$$R^2 = r_{Y\hat{Y}}^2 \quad (= r_{XY}^2 \text{ in simple linear regression})$$
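The claim that $R^2$ equals the squared sample correlation between X and Y in simple regression can be checked numerically. This is a sketch on made-up data; both function names are illustrative.

```python
import math


def r_squared(x, y):
    """R^2 = 1 - SSE / SST for the OLS line fitted to x and y."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / sst


def correlation(x, y):
    """Sample correlation coefficient between x and y."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical data, made up for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
r2 = r_squared(x, y)
r_xy = correlation(x, y)
print(r2, r_xy ** 2)  # the two numbers agree up to rounding
```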

OLS inference

What can we say about the properties of the OLS point estimators $b_0$ and $b_1$, if our assumptions hold?

They are unbiased: $E(b_j) = \beta_j$
They are normally distributed, as they are linear functions of the $Y_i$, which are assumed to be drawn from a normal distribution
Even without normality of $Y_i$, we can invoke the CLT and assume that $b_j$ will be asymptotically normal

We need to know $\mathrm{Var}(b_0)$ and $\mathrm{Var}(b_1)$ in order to conduct inference

OLS inference...

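Just as we did when estimating means, we can define the true and estimated variances
The panel below gives:

True variances on the left, and
Estimated standard errors on the right, where the unknown $\sigma$ is replaced by the estimate $s$.

$$\mathrm{var}(b_0) = \frac{\sigma^2 \sum X_i^2}{n \sum (X_i - \bar{X})^2} \qquad s_{b_0} = s \sqrt{\frac{\sum X_i^2}{n \sum (X_i - \bar{X})^2}}$$

$$\mathrm{var}(b_1) = \frac{\sigma^2}{\sum (X_i - \bar{X})^2} \qquad s_{b_1} = s \sqrt{\frac{1}{\sum (X_i - \bar{X})^2}}$$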
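The two estimated standard errors can be computed directly from the data. The following is a minimal sketch on made-up data; the function name `ols_standard_errors` is an illustrative assumption.

```python
import math


def ols_standard_errors(x, y):
    """Return (se_b0, se_b1), the estimated standard errors of b0 and b1."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))  # standard error of the estimate
    se_b0 = s * math.sqrt(sum(xi ** 2 for xi in x) / (n * s_xx))
    se_b1 = s * math.sqrt(1 / s_xx)
    return se_b0, se_b1


# Hypothetical data, made up for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
se_b0, se_b1 = ols_standard_errors(x, y)
print(se_b0, se_b1)
```

Both standard errors fall when the X values are more spread out, because the sum of squared deviations of X sits in the denominator.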

OLS inference
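Now we have a basis for testing hypotheses about the population $\beta$'s!
$$b_j \sim N\big(\beta_j, \mathrm{var}(b_j)\big), \quad j = 0, 1$$
$$\frac{b_j - \beta_j^0}{\sqrt{\mathrm{var}(b_j)}} \sim N(0, 1) \quad \text{under } H_0: \beta_j = \beta_j^0$$
With unknown $\sigma^2$, we need to estimate $\mathrm{var}(b_j)$ by replacing $\sigma^2$ with $s^2 = \sum e_i^2 / (n - 2)$:
$$\frac{b_j - \beta_j^0}{\sqrt{\widehat{\mathrm{var}}(b_j)}} = \frac{b_j - \beta_j^0}{s_{b_j}} \sim t_{(n-2)}$$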

Executive salaries

Consider the relationship between salaries of chief executive officers and firm performance

Data for 209 US CEOs for 1990
Assume the regression model:
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
Y = salary in thousands of US dollars (salary)
X = average (over 3 years) return on equity (roe)
The range of salary is from 223 to 14,822 ($223,000 to $14,822,000) with a mean of 1,281 ($1,281,000)
roe ranges from 0.5% to 56.3% with a mean of 17.2%

Executive salaries

Only a weak linear relationship is estimated between salary and roe

$R^2 = 0.013$: the model explains only 1.3% of the variation in CEO salaries
Also, SEE = 1,367 compared with a mean salary of 1,281

The estimated effect of roe on salary, $b_1$, is 18.5. This means:

A unit increase (one percentage point) in roe would on average result in an increase in salary of $18,500
This point estimate has a standard error of 11.1

p-value = 0.098: we would not reject $H_0: \beta_1 = 0$ versus $H_1: \beta_1 \neq 0$ for any $\alpha < 9.8\%$
Similarly, the 95% CI is -3.4 to 40.4 and hence covers $\beta_1 = 0$
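The test statistic and interval can be roughly reproduced from the two reported numbers. The sketch below uses only the Python standard library and, as an assumption, swaps in the standard normal distribution for the t distribution with 207 degrees of freedom. The resulting p-value and interval are therefore close to, but not identical to, the 0.098 and (-3.4, 40.4) reported above.

```python
from statistics import NormalDist

b1 = 18.5      # estimated slope from the slide
se_b1 = 11.1   # its standard error from the slide
beta1_null = 0.0

# Test statistic for H0: beta1 = 0.
t_stat = (b1 - beta1_null) / se_b1

# Two-sided p-value under the normal approximation (an assumption here;
# the lecture uses the t distribution with n - 2 = 207 degrees of freedom).
std_normal = NormalDist()
p_value = 2 * (1 - std_normal.cdf(abs(t_stat)))

# Approximate 95% confidence interval using the normal critical value.
z_crit = std_normal.inv_cdf(0.975)
ci = (b1 - z_crit * se_b1, b1 + z_crit * se_b1)
print(t_stat, p_value, ci)
```

The interval contains zero exactly when the two-sided test fails to reject at the 5% level, which is why the two conclusions on the slide agree.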

Executive salaries

Is $b_1 = 18.5$ a big effect in an economic sense?

What about the intercept estimate ($b_0 = 963.2$)?

This means the predicted salary for the CEO of a firm with roe = 0 is $963,200

Beware!

Sometimes intercepts don't make sense
What if Y were CEO salary and X were age of CEO?
We'd still have to have an intercept!

Regression models are approximations
We are often only interested in, or confident in, the approximation for X values within the range observed in the data

Example: Male versus Female Hourly wages
Q: Do young males on average earn higher wages than young females?
We have US National Longitudinal Survey (NLS) data for 1987, including:

Hourly wages for each respondent, measured in $.
A dummy (binary) variable taking the value 0 if the observation is female, and 1 if the observation is male.

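Male versus Female Hourly wages

Let $W_i$ be the observed hourly wage, $i = 1, \dots, 3294$

Define the dummy variable, $D_i$, as follows:

$D_i = 1$ if male

$D_i = 0$ otherwise (female)

Specify our regression model as

$W_i = \beta_0 + \beta_1 D_i + \varepsilon_i$

What are the interpretations of $\beta_0$ and $\beta_1$?

$E(W_i \mid D_i = 0) = \beta_0$ and $E(W_i \mid D_i = 1) = \beta_0 + \beta_1$

$\beta_1$ is therefore interpretable as the difference between the means of male and female wages (male minus female)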
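The interpretation of the dummy-variable coefficients can be checked on a toy example: regressing the wage on a 0/1 dummy returns the mean of the D = 0 group as the intercept and the difference in group means as the slope. The six wage values below are made up for illustration and are not NLS data.

```python
def ols_fit(x, y):
    """Return (b0, b1) for the least-squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    return y_bar - b1 * x_bar, b1


# Hypothetical hourly wages; male = 1 for males, 0 for females.
wages = [5.0, 6.0, 4.5, 7.0, 6.5, 8.0]
male = [0, 0, 0, 1, 1, 1]

b0, b1 = ols_fit(male, wages)

# Group means computed directly, for comparison with the OLS estimates.
female_wages = [w for w, d in zip(wages, male) if d == 0]
male_wages = [w for w, d in zip(wages, male) if d == 1]
female_mean = sum(female_wages) / len(female_wages)
male_mean = sum(male_wages) / len(male_wages)
print(b0, female_mean)             # intercept equals the female mean
print(b1, male_mean - female_mean) # slope equals the difference in means
```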

Male versus Female Hourly wages

The regression model does not fit the data well.

$R^2 = 0.032$: the model (gender) explains 3.2% of the variation in wages

Implication: there are many other variables which explain hourly wage (multiple regression will be introduced next week!)
The mean female wage is estimated to be $5.15
Results indicate that males on average earn $1.17 per hour more than females
This gender effect is large in a practical or economic sense
The associated t-statistic indicates that the effect is also statistically significant: the effect of gender on wage is very precisely estimated

Practical inference

This was large-sample inference
In both regression examples, our sample size was large (209 and 3294, respectively)
What happens to our t-stat as $n \to \infty$?
Statistical versus economic significance
The gender effect was large in both an economic and a statistical sense
What if the estimated effect of roe on salary was 18.5 but the associated s.e. was 2.1? Now the t-stat = $18.5 / 2.1 \approx 8.81$ with $p \approx 0.000$
Use of hypothesis testing in decision making:
Treat as circumstantial evidence rather than proof
Reporting styles
Report standard errors, not t-statistics!!!
Choice of significance levels?
Compare p-values against whatever alpha is appropriate to the research question
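One way to think about the question of what happens to the t-stat as n grows: the standard error of the slope is $s / \sqrt{\sum (X_i - \bar{X})^2}$, and the sum in the denominator grows with the sample size. The sketch below holds $s$ fixed at a made-up value and replicates a made-up set of X values, so that only n changes; all numbers are illustrative assumptions.

```python
import math


def se_slope(s, x):
    """Standard error of b1 given the SEE s and the observed X values."""
    x_bar = sum(x) / len(x)
    return s / math.sqrt(sum((xi - x_bar) ** 2 for xi in x))


# Hypothetical values, made up for illustration only.
s = 5.0
x_small = [0.0, 10.0, 20.0, 30.0]   # n = 4
x_large = x_small * 100             # same spread of X, n = 400

se_small = se_slope(s, x_small)
se_large = se_slope(s, x_large)
ratio = se_small / se_large         # sqrt(100) = 10
print(se_small, se_large, ratio)
```

With 100 times as many observations the standard error is 10 times smaller, so for a given nonzero estimate the t-stat is 10 times larger. In very large samples even economically small effects can be statistically significant.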

Progress report
We have introduced linear regression, including concepts, jargon, useful statistical results, and simple examples
From next week, we will talk more about inference and start moving to multiple regression
Good luck on the Course Project! Remember to submit BOTH hard copy (at the START of your tutorial workshop) AND e-copy (on Moodle)!!!
