Week 10
Week 10 topics
Simple linear regression
Method of least squares
Basic assumptions of the regression model
Inference and explanatory power
We have flirted with regression before!
The material in this lecture and the next draws on Sharpe Chapters 4, 15, 16 and 17
Make sure you are across the material in Ch 15 about prediction and confidence intervals!
We talked about how to fit a line to a bivariate scatter plot
The example plotted hours of internet use against education:
[Figure: scatter plot of hours of internet use against education]
Simple regression
Suppose we have $(Y_i, X_i)$ pairs for $i = 1, \ldots, n$
This line is defined by estimates of the intercept and slope in the linear relationship $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
The slope coefficient has the same sign as the sample covariance (and correlation) between Y and X
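Simple regression
Assume: $\hat{Y}_i = b_0 + b_1 X_i$
$Y_i$: this is the actual Y value for observation i.
$\hat{Y}_i$: this is the value of Y we would predict for observation i based on this linear model, if we knew its X.
$(Y_i - \hat{Y}_i)$: the gap between the actual and predicted values for observation i.
The least-squares estimates are
$$b_1 = \frac{s_{xy}}{s_x^2} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}; \qquad b_0 = \bar{Y} - b_1 \bar{X}$$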
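A minimal sketch of the least-squares formulas for $b_1$ and $b_0$, using only the Python standard library. The function name `ols_fit` and the toy data are hypothetical. Because $s_{xy}$ and $s_x^2$ share the same $(n-1)$ divisor, the code works with raw sums of deviations and the divisors cancel.

```python
from statistics import fmean

def ols_fit(x, y):
    """Return (b0, b1) for the least-squares regression of y on x."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    x_bar, y_bar = fmean(x), fmean(y)
    # Numerator: sum of cross-products of deviations ((n - 1) * s_xy)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Denominator: sum of squared X deviations ((n - 1) * s_x^2)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all X values are equal; the slope is not defined")
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Toy data (made up for illustration) lying exactly on Y = 2 + 3X
x = [1.0, 2.0, 3.0, 4.0]
y = [5.0, 8.0, 11.0, 14.0]
b0, b1 = ols_fit(x, y)
print(b0, b1)  # 2.0 3.0
```

The guard on `sxx == 0` reflects that a slope cannot be estimated when every X value is identical.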
OLS can be viewed as curve fitting
The curve we fit describes the relationship amongst variables
But we also want to make inferences about the parameters of the population regression function
How can we use $b_1$ to make inferences about $\beta_1$?
What are the properties of $b_1$ as an estimator of $\beta_1$?
We can also use regression models to make predictions or forecasts, e.g.:
If a company increases its advertising expenditure, what is the predicted impact on sales?
What is the confidence interval for that prediction?
Some basics
Terminology
$Y_i$ is the dependent variable
$X_i$ is the independent or explanatory variable
$\varepsilon_i$ is the disturbance or error term
$\beta_0$ and $\beta_1$ are the parameters to be estimated
OLS produces:
Estimated parameter values $b_0$, $b_1$
Predictions: $\hat{Y}_i = b_0 + b_1 X_i$
Residuals: $e_i = Y_i - \hat{Y}_i$
Some basics
The population regression relationship is
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
This equation links the (X, Y) pairs via the unknown parameters and the unobserved errors
OLS produces
$Y_i = b_0 + b_1 X_i + e_i$
This equation links the (X, Y) pairs via estimated parameters and calculated residuals
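To make the difference between the unobserved errors $\varepsilon_i$ and the calculated residuals $e_i$ concrete, here is a small simulation sketch. The "true" values beta0 = 1, beta1 = 0.5 and sigma = 2 are assumptions chosen for illustration; with real data they are unknown. The residuals differ from the errors, yet by construction the OLS residuals sum to zero and are orthogonal to X in the sample.

```python
import random
from statistics import fmean

random.seed(1)
beta0, beta1, sigma = 1.0, 0.5, 2.0   # assumed "true" values, for illustration only
n = 200
x = [random.uniform(0, 10) for _ in range(n)]
eps = [random.gauss(0, sigma) for _ in range(n)]        # unobserved errors
y = [beta0 + beta1 * xi + ei for xi, ei in zip(x, eps)]

# OLS fit using the least-squares formulas
x_bar, y_bar = fmean(x), fmean(y)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b0 = y_bar - b1 * x_bar
e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]       # calculated residuals

# Residuals differ from the errors because b0, b1 differ from beta0, beta1
max_gap = max(abs(ei - epsi) for ei, epsi in zip(e, eps))
print(b0, b1, max_gap)

# Mechanical OLS properties: residuals sum to zero and are orthogonal to X
print(sum(e), sum(xi * ei for xi, ei in zip(x, e)))
```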
Some basics
The disturbance term $\varepsilon_i$ plays a crucial role in regression
It distinguishes regression models from deterministic functions
It represents factors other than $X_i$ that affect $Y_i$
Regression treats these other factors as unobserved
Reliable estimates of $\beta_1$ will require assumptions restricting the relationship between $X_i$ and $\varepsilon_i$
Our desire to make ceteris paribus interpretations of $\beta_1$ means we may need to include more explanatory variables in the regression
This leads to an extension to multiple regression
Example: An Engel curve for food
Sample of 40 households, each with 3 family members
Data on:
Y = weekly expenditure on food ($)
X = weekly income ($)
Cross-sectional data
Population Engel curve (demand function holding prices constant):
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \quad i = 1, \ldots, n$
For these data, $b_0 = 7.38$ and $b_1 = 0.23$
How do you interpret $b_1$?
What is the predicted food expenditure for a household with a weekly income of $100?
$\hat{Y} = 7.38 + 0.23 \times 100 = 30.38$ (dollars per week)
Why restrict to households with 3 people?
Because food expenditure also depends on household size, we must control for household size. In this example, we do this by estimating the equation only on a set of same-size households.
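The prediction on this slide is the fitted line evaluated at $X = 100$. A minimal sketch (the helper `predict` is hypothetical). It also illustrates one reading of $b_1$: since both variables are in dollars, each extra dollar of weekly income is associated with about 23 cents more weekly food expenditure.

```python
def predict(b0: float, b1: float, x: float) -> float:
    """Fitted value Y-hat = b0 + b1 * x from an estimated simple regression."""
    return b0 + b1 * x

# Estimates reported for the Engel curve (3-person households)
B0, B1 = 7.38, 0.23

food_hat = predict(B0, B1, 100)
print(round(food_hat, 2))  # 30.38

# Change in predicted food spending from one extra dollar of weekly income
marginal = predict(B0, B1, 101) - predict(B0, B1, 100)
print(round(marginal, 2))  # 0.23
```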
A1: Linearity (the model below is right):
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \quad i = 1, \ldots, n$
A2: Random sampling:
We have a random sample $(Y_i, X_i)$ from the population governed by the model in A1
A3: Sample variation in X:
The values of $X_1, X_2, \ldots, X_n$ are not all equal
A4: Zero conditional mean:
$E(\varepsilon_i \mid X_i) = 0$
This is a key assumption that we use in interpreting our regression results. It implies that $\varepsilon_i$ and $X_i$ are uncorrelated.
A2 and A4 mean we can think of X as fixed rather than random
Assumptions of the Classical Linear Regression Model
A5: Homoskedasticity:
$\mathrm{Var}(\varepsilon_i) = \sigma^2$
A6: The disturbances are uncorrelated:
$\mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0, \quad i \neq j$
A7: The disturbances are distributed normally, with identical variance (used for inference):
$\varepsilon_i \sim N(0, \sigma^2)$
In some circumstances, some or all of these assumptions will not be realistic! What do we do??
Study more econometrics (beyond this course):
What happens to the estimates we get when some of these assumptions are violated
What to do about it
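Given A1-A7, it is true that:
$Y_i \sim N(\beta_0 + \beta_1 X_i, \sigma^2)$
The (conditional) mean of Y depends on X
If $\beta_1 = 0$, what is $\beta_0$?
$\beta_1 = 0 \Rightarrow \beta_0 = \mu_Y$
Given A1-A4, we can show that the OLS estimators are unbiased and consistent
Given A1-A6, OLS is the best method to use, in a particular sense
Exactly what "best" means in this context: lowest variance (of the sampling distribution) among all linear unbiased estimators (the Gauss-Markov theorem)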
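Unbiasedness is a statement about the sampling distribution of $b_1$: averaged over repeated samples, $b_1$ equals $\beta_1$. A Monte Carlo sketch under assumed values (beta0 = 2, beta1 = 0.8, sigma = 1.5, X fixed at 1, ..., 30, all hypothetical), with independent normal errors so that A4-A7 hold. It also compares the simulated variance of $b_1$ with the textbook formula $\sigma^2 / \sum (X_i - \bar{X})^2$.

```python
import random
from statistics import fmean, pvariance

def slope(x, y):
    """OLS slope b1 = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    return sxy / sxx

random.seed(42)
beta0, beta1, sigma = 2.0, 0.8, 1.5        # hypothetical "true" parameters
x = [float(i) for i in range(1, 31)]       # X held fixed across replications
x_bar = fmean(x)
sxx = sum((a - x_bar) ** 2 for a in x)

estimates = []
for _ in range(5000):
    # Independent N(0, sigma^2) errors: zero mean, constant variance, uncorrelated
    y = [beta0 + beta1 * xi + random.gauss(0, sigma) for xi in x]
    estimates.append(slope(x, y))

print(fmean(estimates))                        # close to beta1
print(pvariance(estimates), sigma ** 2 / sxx)  # simulated vs theoretical Var(b1)
```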
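[Figure: fitted line $\hat{Y} = b_0 + b_1 X$, marking for one observation the deviations $Y_i - \hat{Y}_i$ and $\hat{Y}_i - \bar{Y}$]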
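Decomposition of variance
$Y_i - \bar{Y} = (\hat{Y}_i - \bar{Y}) + (Y_i - \hat{Y}_i) = (\hat{Y}_i - \bar{Y}) + e_i$
Total deviation = part explained by X + unexplained part
It can be shown that
$$\sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^{n} e_i^2$$
Total sum of squares = regression sum of squares + error sum of squares
SST = SSR + SSE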
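A quick numerical check of the decomposition on made-up data (the six points below are illustrative only). The identity holds for any OLS fit that includes an intercept, up to floating-point rounding.

```python
from statistics import fmean

# Toy data, made up for illustration
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 9.9, 12.3]

x_bar, y_bar = fmean(x), fmean(y)
b1 = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sum(
    (a - x_bar) ** 2 for a in x
)
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * a for a in x]

sst = sum((b - y_bar) ** 2 for b in y)               # total sum of squares
ssr = sum((h - y_bar) ** 2 for h in y_hat)           # regression (explained) sum of squares
sse = sum((b - h) ** 2 for b, h in zip(y, y_hat))    # error (unexplained) sum of squares
print(sst, ssr + sse)
```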
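The population variance, $\sigma^2$, measures the spread of the data around the population regression line
The standard error of the estimate (SEE) is an estimator of $\sigma$
It measures the fit of the regression model
A low SEE (relative to the scale of Y) indicates a good fit
$$SEE = s, \quad \text{where} \quad s^2 = \frac{\sum_{i=1}^{n} e_i^2}{n-2} = \frac{SSE}{n-2}$$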
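A sketch of the SEE calculation. The divisor is $n - 2$ rather than $n$ because two parameters, $b_0$ and $b_1$, are estimated from the same data. The helper name is hypothetical; the example uses the OLS fit to the toy points $X = 1, 2, 3, 4$ and $Y = 3, 5, 4, 6$, for which $b_0 = 2.5$ and $b_1 = 0.8$.

```python
import math

def standard_error_of_estimate(y, y_hat):
    """SEE = s = sqrt(SSE / (n - 2)) for a simple regression with intercept."""
    n = len(y)
    if n <= 2:
        raise ValueError("need at least three observations")
    sse = sum((obs - fit) ** 2 for obs, fit in zip(y, y_hat))
    return math.sqrt(sse / (n - 2))

# Toy data: OLS of y on x = [1, 2, 3, 4] gives b0 = 2.5, b1 = 0.8
y = [3.0, 5.0, 4.0, 6.0]
y_hat = [2.5 + 0.8 * xi for xi in [1, 2, 3, 4]]
see = standard_error_of_estimate(y, y_hat)
print(round(see, 4))  # 0.9487  (SSE = 1.8, so s^2 = 1.8 / 2 = 0.9)
```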
Coefficient of determination
Define:
$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$
$R^2$ measures the proportion of variation in the dependent variable that is explained by the regression model
Note: $0 \le R^2 \le 1$
The closer $R^2$ is to unity, the better the fit of "the model" (right-hand side) to "the data" (left-hand side)
$R^2 = r_{Y\hat{Y}}^2$ ($= r_{XY}^2$ in simple linear regression)
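Both expressions for $R^2$, and its link to the correlation coefficient in simple regression, can be checked numerically. A sketch on toy data (made up for illustration):

```python
import math
from statistics import fmean

# Toy data, made up for illustration
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 4.0, 6.0]

x_bar, y_bar = fmean(x), fmean(y)
sxx = sum((a - x_bar) ** 2 for a in x)
sst = sum((b - y_bar) ** 2 for b in y)
sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * a for a in x]

ssr = sum((h - y_bar) ** 2 for h in y_hat)
sse = sum((b - h) ** 2 for b, h in zip(y, y_hat))
r2_explained = ssr / sst           # R^2 = SSR / SST
r2_residual = 1 - sse / sst        # R^2 = 1 - SSE / SST
r_xy = sxy / math.sqrt(sxx * sst)  # sample correlation between X and Y

print(round(r2_explained, 4), round(r2_residual, 4), round(r_xy ** 2, 4))  # 0.64 0.64 0.64
```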
OLS inference
What can we say about the properties of the OLS point estimators $b_0$ and $b_1$, if our assumptions hold?
They are unbiased: $E(b_j) = \beta_j$
They are normally distributed, as they are linear functions of the $Y_i$, which are assumed to be drawn from a normal distribution
Even without normality of the $Y_i$, we can invoke the CLT and assume that $b_j$ will be asymptotically normal
We need to know $\mathrm{Var}(b_0)$ and $\mathrm{Var}(b_1)$ in order to conduct inference
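OLS inference...
Just as we did when estimating means, we can define the true and estimated variances
The panel below gives:
True variances on the left, and
Estimated standard errors on the right, where the unknown $\sigma$ is replaced by the estimate $s$.
$$\mathrm{var}(b_0) = \frac{\sigma^2 \sum X_i^2}{n \sum (X_i - \bar{X})^2} \qquad s_{b_0} = s \sqrt{\frac{\sum X_i^2}{n \sum (X_i - \bar{X})^2}}$$
$$\mathrm{var}(b_1) = \frac{\sigma^2}{\sum (X_i - \bar{X})^2} \qquad s_{b_1} = s \sqrt{\frac{1}{\sum (X_i - \bar{X})^2}}$$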
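A sketch of the estimated standard errors from the panel, with the unknown $\sigma$ replaced by $s$. The function name and toy data are hypothetical. For these toy data, $s^2 = 0.9$, $\sum (X_i - \bar{X})^2 = 5$ and $\sum X_i^2 = 30$, so the formulas give $s_{b_1} = \sqrt{0.9/5}$ and $s_{b_0} = \sqrt{0.9 \times 30 / (4 \times 5)}$.

```python
import math
from statistics import fmean

def ols_with_se(x, y):
    """Return (b0, b1, s_b0, s_b1) for the OLS fit of y on x."""
    n = len(x)
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((a - x_bar) ** 2 for a in x)             # sum (X_i - X_bar)^2
    b1 = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((b - b0 - b1 * a) ** 2 for a, b in zip(x, y))
    s = math.sqrt(sse / (n - 2))                       # SEE, estimates sigma
    s_b1 = s * math.sqrt(1 / sxx)
    s_b0 = s * math.sqrt(sum(a * a for a in x) / (n * sxx))
    return b0, b1, s_b0, s_b1

# Toy data, made up for illustration
b0, b1, s_b0, s_b1 = ols_with_se([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 4.0, 6.0])
print(b0, b1, s_b0, s_b1)
```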
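OLS inference
Now we have a basis for testing hypotheses about the population $\beta$'s!
$b_j \sim N(\beta_j, \mathrm{var}(b_j)), \quad j = 0, 1$
Under $H_0: \beta_j = \beta_j^0$,
$$\frac{b_j - \beta_j^0}{\sqrt{\mathrm{var}(b_j)}} \sim N(0, 1)$$
With unknown $\sigma^2$, we need to estimate $\mathrm{var}(b_j)$ by replacing $\sigma^2$ with $s^2 = \frac{\sum e_i^2}{n-2}$:
$$\frac{b_j - \beta_j^0}{\sqrt{\widehat{\mathrm{var}}(b_j)}} = \frac{b_j - \beta_j^0}{s_{b_j}} \sim t(n-2)$$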
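Why $t(n-2)$ rather than $N(0,1)$? Replacing $\sigma^2$ by $s^2$ adds sampling noise, which fattens the tails of the test statistic. A Monte Carlo sketch with a small sample ($n = 10$, so 8 degrees of freedom) in which $H_0: \beta_1 = 0$ is true by construction; the design, seed and number of replications are arbitrary choices. Using the $t(8)$ critical value 2.306 should reject about 5% of the time, while the normal critical value 1.96 over-rejects.

```python
import math
import random
from statistics import fmean

def slope_t_stat(x, y, beta1_null=0.0):
    """t statistic (b1 - beta1_null) / s_b1 for a simple regression of y on x."""
    n = len(x)
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((a - x_bar) ** 2 for a in x)
    b1 = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    s2 = sum((b - b0 - b1 * a) ** 2 for a, b in zip(x, y)) / (n - 2)
    return (b1 - beta1_null) / math.sqrt(s2 / sxx)

random.seed(7)
x = [float(i) for i in range(10)]   # n = 10, so df = n - 2 = 8
T_CRIT_DF8 = 2.306                  # 97.5th percentile of t(8), from standard tables
Z_CRIT = 1.96                       # 97.5th percentile of N(0, 1)
reps = 10_000
reject_t = reject_z = 0
for _ in range(reps):
    y = [1.0 + random.gauss(0, 1) for _ in x]   # H0 true: beta1 = 0
    t = slope_t_stat(x, y)
    reject_t += abs(t) > T_CRIT_DF8
    reject_z += abs(t) > Z_CRIT

rate_t, rate_z = reject_t / reps, reject_z / reps
print(rate_t, rate_z)   # rate_t near 0.05; rate_z noticeably higher
```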
Executive salaries
Consider the relationship between salaries of chief executive officers and firm performance
Data for 209 US CEOs for 1990
Assume the regression model:
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
Y = salary in thousands of US dollars (salary)
X = average (over 3 years) return on equity (roe)
Range of salary is from 223 to 14,822 ($223,000 to $14,822,000) with a mean of 1,281 ($1,281,000)
roe ranges from 0.5% to 56.3% with a mean of 17.2%
Executive salaries
Only a weak linear relationship is estimated between salary and roe
$R^2 = 0.013$: the model explains only 1.3% of the variation in CEO salaries
Also, SEE = 1,367 compared with a mean salary of 1,281
The estimated effect of roe on salary, $b_1$, is 18.5. This means:
A unit increase (one percentage point) in roe would, on average, result in an increase in salary of $18,500
This point estimate has a standard error of 11.1
The p-value of 0.098 means we would not reject $H_0: \beta_1 = 0$ versus $H_1: \beta_1 \neq 0$ for any $\alpha < 9.8\%$
Similarly, the 95% CI is -3.4 to 40.4 and hence covers $\beta_1 = 0$
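The reported test and interval can be reproduced approximately from $b_1 = 18.5$ and its standard error 11.1. This sketch uses the standard normal from Python's `statistics.NormalDist` as a large-sample stand-in for $t(207)$, so its p-value and interval differ slightly from the slide's t-based figures (the inputs are also rounded).

```python
from statistics import NormalDist

b1, se_b1 = 18.5, 11.1            # slope estimate and standard error from the slide
t = (b1 - 0.0) / se_b1            # t-stat for H0: beta1 = 0
z = NormalDist()                  # N(0, 1): large-sample stand-in for t(207)
p_value = 2 * (1 - z.cdf(abs(t)))
crit = z.inv_cdf(0.975)           # roughly 1.96
ci_low, ci_high = b1 - crit * se_b1, b1 + crit * se_b1
print(f"t = {t:.2f}, p = {p_value:.3f}, 95% CI = ({ci_low:.1f}, {ci_high:.1f})")
```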
Executive salaries
Is $b_1 = 18.5$ a big effect in an economic sense?
What about the intercept estimate ($b_0 = 963.2$)?
This means the predicted salary for the CEO of a firm with roe = 0 is $963,200
Beware!
Sometimes intercepts don't make sense
What if Y were CEO salary and X were the age of the CEO?
We'd still have to have an intercept!
Regression models are approximations
We are often only interested in, or confident in, the approximation for values within the range of the data
Hourly wages for each respondent, measured in $.
A dummy (binary variable) taking the value 0 if the observation is female, and 1 if the observation is male.
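Let $W_i$ be the observed hourly wage, $i = 1, \ldots, 3294$
Define the dummy variable, $D_i$, as follows:
$D_i = 1$ if male
$D_i = 0$ otherwise (female)
Specify our regression model as
$W_i = \beta_0 + \beta_1 D_i + \varepsilon_i$
What are the interpretations of $\beta_0$ and $\beta_1$?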
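With a single 0/1 regressor, OLS simply compares group means: $b_0$ is the mean of the $D_i = 0$ group (here, females) and $b_0 + b_1$ is the mean of the $D_i = 1$ group, so $b_1$ is the difference in means. A sketch with made-up wages (not the lecture's data):

```python
from statistics import fmean

# Made-up hourly wages for illustration only (not the lecture's data)
wages = [4.0, 5.0, 6.0, 6.5, 7.5]
male = [0, 0, 0, 1, 1]             # D_i = 1 if male, 0 if female

d_bar, w_bar = fmean(male), fmean(wages)
b1 = sum((d - d_bar) * (w - w_bar) for d, w in zip(male, wages)) / sum(
    (d - d_bar) ** 2 for d in male
)
b0 = w_bar - b1 * d_bar

mean_female = fmean([w for w, d in zip(wages, male) if d == 0])
mean_male = fmean([w for w, d in zip(wages, male) if d == 1])
print(b0, mean_female)               # intercept equals the female (D = 0) mean
print(b1, mean_male - mean_female)   # slope equals the male-minus-female gap
```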
The regression model does not fit the data well.
$R^2 = 0.032$: the model (gender) explains 3.2% of the variation in wages
Implication: there are many other variables which explain hourly wage (multiple regression will be introduced next week!)
The mean female wage is estimated to be $5.15
Results indicate that males on average earn $1.17 per hour more than females
This gender effect is large in a practical or economic sense (about 23% of the mean female wage)
The associated t-statistic indicates that the effect is also statistically significant: the effect of gender on wage is very precisely estimated
Practical inference
This was large-sample inference
In both regression examples, our sample size was large (209 and 3294, respectively)
What happens to our t-stat as $n \to \infty$?
Statistical versus economic significance
The gender effect was large in both an economic and a statistical sense
What if the estimated effect of roe on salary was 18.5 but the associated s.e. was 2.1? Now the t-stat $= 18.5/2.1 \approx 8.81$ with $p \approx 0.000$
Use of hypothesis testing in decision making:
Treat as circumstantial evidence rather than proof
Reporting styles
Report standard errors, not t-statistics!!!
Choice of significance levels?
Compare p-values against whatever $\alpha$ is appropriate to the research question
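A back-of-the-envelope sketch of what happens to the t-stat as $n$ grows. Since $\sum (X_i - \bar{X})^2 = (n-1)s_x^2$, we have $s_{b_1} = s / (\sqrt{n-1}\, s_x)$. Holding $b_1$, $s$ and $s_x$ fixed (the values below are hypothetical), the t-stat grows roughly like $\sqrt{n}$, so in large samples statistical significance on its own says little about economic significance.

```python
import math

def slope_t_stat(b1, s, s_x, n):
    """t = b1 / s_b1, using s_b1 = s / (sqrt(n - 1) * s_x)."""
    s_b1 = s / (math.sqrt(n - 1) * s_x)
    return b1 / s_b1

# Hypothetical, fixed values: a small slope relative to the residual spread
b1, s, s_x = 0.5, 10.0, 2.0
sizes = [50, 200, 800, 3200]
t_stats = [slope_t_stat(b1, s, s_x, n) for n in sizes]
for n, t in zip(sizes, t_stats):
    print(n, round(t, 2))
```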
Progress report
We have introduced linear regression, including concepts, jargon, useful statistical results, and simple examples
From next week, we will talk more about inference and start moving to multiple regression
Good luck on the Course Project! Remember to submit BOTH hard copy (at the START of your tutorial workshop) AND e-copy (on Moodle)!!!