ECON1203
Business and Economic Statistics
Week 10
Week 10 topics
Simple linear regression
Method of least squares
Basic assumptions of regression model
Inference and explanatory power
We have flirted with regression before!
The material in this lecture and the next draws on Sharpe Chapters 4, 15, 16 and 17
Make sure you are across the material in Ch 15 about prediction and confidence intervals!
Recall (end of) week 2 lecture
We talked about how to fit a line to a bivariate scatter plot
The example plotted hours of internet use against education.
Simple regression
Suppose we have \( (Y_i, X_i) \) pairs for \( i = 1, \dots, n \)
We fit a line to the data by minimizing the residual sum of squares. This is what ordinary least squares (OLS) regression does.
This line is defined by estimates of the intercept and slope in the linear relationship \( Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \)
The slope coefficient has the same sign as the sample covariance (and correlation) between Y and X
Simple regression
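Assume: \( \hat{Y}_i = b_0 + b_1 X_i \)
where \( b_0 \) and \( b_1 \) are chosen to minimize
\[ \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 \]
(which is a function of \( b_0 \), \( b_1 \), and the data on X and Y).
\( Y_i \): this is the actual Y value for observation i.
\( \hat{Y}_i \): this is the value of Y we would predict for observation i based on this linear model, if we knew its X.

The solution to this minimization problem is:
\[ b_1 = \frac{s_{xy}}{s_x^2} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}; \qquad b_0 = \bar{Y} - b_1 \bar{X} \]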
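The closed-form solution above can be checked numerically. The following is a minimal sketch in plain Python; the data set and the helper name `ols_fit` are invented for illustration and do not come from the lecture.

```python
# Hypothetical data, invented purely for illustration (not from the lecture).
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 5.0, 4.0, 5.0]


def ols_fit(x, y):
    """Return (b0, b1), the intercept and slope that minimize the residual sum of squares."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # The (n - 1) divisors in s_xy and s_x^2 cancel, so raw sums are enough.
    sum_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sum_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum_xy / sum_xx        # slope
    b0 = y_bar - b1 * x_bar     # intercept
    return b0, b1


b0, b1 = ols_fit(X, Y)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
```

For these made-up numbers the sums are 6 and 10, so the slope is 0.6 and the intercept is 4 - 0.6 × 3 = 2.2.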
Numerical versus statistical properties
OLS can be viewed as curve fitting
The curve we fit describes the relationship amongst variables
But we also want to make inferences about the parameters of the population regression function
How can we use \( b_1 \) to make inferences about \( \beta_1 \)?
What are the properties of \( b_1 \) as an estimator of \( \beta_1 \)?
We can also use regression models to make predictions or forecasts, e.g.:
If a company increases its advertising expenditure, what is the predicted impact on sales?
What is the confidence interval for that prediction?
Some basics
Terminology
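\( Y_i \) is the dependent variable
\( X_i \) is the independent or explanatory variable
\( \varepsilon_i \) is the disturbance or error term
\( \beta_0 \) and \( \beta_1 \) are the parameters to be estimated
OLS produces:
Estimated parameter values \( b_0 \), \( b_1 \)
Predictions: \( \hat{Y}_i = b_0 + b_1 X_i \)
Residuals: \( e_i = Y_i - \hat{Y}_i \)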
Some basics
The population regression relationship is
\( Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \)
This equation links the (X, Y) pairs via the unknown parameters and the unobserved errors
OLS produces
\( Y_i = b_0 + b_1 X_i + e_i \)
This equation links the (X, Y) pairs via estimated parameters and calculated residuals
Some basics
The disturbance term \( \varepsilon_i \) plays a crucial role in regression
Distinguishes regression models from deterministic functions
Represents factors other than \( X_i \) that affect \( Y_i \)
Regression treats these other factors as unobserved
\( \beta_1 \) is the marginal effect of \( X_i \) on \( Y_i \), holding these other factors constant
Reliable estimates of \( \beta_1 \) will require assumptions restricting the relationship between \( X_i \) and \( \varepsilon_i \)
Our desire to make ceteris paribus interpretations of \( \beta_1 \) leads us to include more explanatory variables in the regression
This leads to an extension to multiple regression
Example:
An Engel curve for food
Sample of 40 households, each with 3 family members
Data on:
Y = weekly expenditure on food ($)
X = weekly income ($)
Cross-sectional data
Population Engel curve (demand function holding prices constant):
\( Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \quad i = 1, \dots, n \)
Engel curve for food
For these data, \( b_0 = 7.38 \) and \( b_1 = 0.23 \)
How do you interpret \( b_1 \)?
It is the estimated partial effect of X on Y. In this case, a $1 increase in household income is estimated to increase household expenditure on food by 23 cents.
What is the predicted food expenditure for a household with a weekly income of $100?
\( \hat{Y} = 7.38 + 0.23 \times 100 = 30.38 \), i.e. $30.38
Why restrict to households with 3 people?
Because expenditure will be determined by household size, we must control for household size. In this example, we do this via estimating the equation only on a set of same-size households.
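The prediction and the 23-cent interpretation can be reproduced in a few lines. This is a small sketch using the two estimates reported above; the function name `predict_food_expenditure` is made up for illustration.

```python
# Estimates reported in the lecture for the Engel curve example.
b0 = 7.38   # intercept, in dollars per week
b1 = 0.23   # slope: extra food spending per extra dollar of weekly income


def predict_food_expenditure(weekly_income):
    """Predicted weekly food expenditure ($) from the fitted line b0 + b1 * X."""
    return b0 + b1 * weekly_income


prediction = predict_food_expenditure(100)
# Marginal effect: the change in the prediction when income rises by $1.
marginal_effect = predict_food_expenditure(101) - predict_food_expenditure(100)

print(f"Predicted food expenditure at $100 income: ${prediction:.2f}")
print(f"Effect of an extra $1 of income: ${marginal_effect:.2f}")
```

Because the model is linear, the marginal effect is the same at every income level.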
Assumptions of the Classical Linear Regression Model
A1: Linearity (the model below is right):
\( Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \quad i = 1, \dots, n \)
A2: Random sampling:
We have a random sample \( (Y_i, X_i) \) from the population governed by the model in A1
A3: Sample variation in X:
The values of \( X_1, X_2, \dots, X_n \) are not all equal
A4: Zero conditional mean:
\( E(\varepsilon_i \mid X_i) = 0 \)
This is a key assumption that we use in interpreting our regression results. It implies that \( \varepsilon_i \) and \( X_i \) are uncorrelated.
A2 and A4 mean we can think of X as fixed rather than random
Assumptions of Classical Linear Regression Model
A5: Homoskedasticity:
\( \mathrm{Var}(\varepsilon_i) = \sigma^2 \)
A6: The disturbances are uncorrelated:
\( \mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0 \) (i not equal to j)
A7: The disturbances are distributed normally, with identical variance (used for inference):
\( \varepsilon_i \sim N(0, \sigma^2) \)
In some circumstances, some or all of these assumptions will not be realistic! What do we do??
Study more econometrics (beyond this course):
What happens to the estimates we get when some of these assumptions are violated
What to do about it
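Classical Linear Regression Model (CLRM)
Given A1-A7, it is true that:
\( Y_i \sim N(\beta_0 + \beta_1 X_i, \sigma^2) \)
The (conditional) mean of Y depends on X
If \( \beta_1 = 0 \), what is \( \beta_0 \)?
\( \beta_1 = 0 \Rightarrow \beta_0 = \mu_Y \)
Given A1-A4, we can show that the OLS estimators are unbiased and consistent
Given A1-A6, OLS is the best method to use, in a particular sense
Exactly what "best" means in this context: lowest variance (of the sampling distribution) among linear unbiased estimators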
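Explaining variation in the dependent variable
[Figure: scatter plot of Y against X showing the fitted line \( \hat{Y} = b_0 + b_1 X \), the sample mean \( \bar{Y} \), and one observation's deviations from each.]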
Decomposition of variance
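\[ Y_i - \bar{Y} = (\hat{Y}_i - \bar{Y}) + (Y_i - \hat{Y}_i) = (\hat{Y}_i - \bar{Y}) + e_i \]
Total deviation = part explained by X + unexplained part
It can be shown that
\[ \sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^{n} e_i^2 \]
Total sum of squares = regression sum of squares + error sum of squares
SST = SSR + SSE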
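The identity SST = SSR + SSE can be verified numerically for any OLS fit. The sketch below uses a small made-up data set (not from the lecture) and plain Python.

```python
# Hypothetical data, invented for illustration only.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(X)
x_bar = sum(X) / n
y_bar = sum(Y) / n

# OLS estimates from the closed-form solution for simple regression.
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
      / sum((x - x_bar) ** 2 for x in X))
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * x for x in X]
residuals = [y - f for y, f in zip(Y, fitted)]

sst = sum((y - y_bar) ** 2 for y in Y)        # total sum of squares
ssr = sum((f - y_bar) ** 2 for f in fitted)   # regression (explained) sum of squares
sse = sum(e ** 2 for e in residuals)          # error (unexplained) sum of squares

print(f"SST = {sst:.3f}, SSR = {ssr:.3f}, SSE = {sse:.3f}")
print("SSR + SSE matches SST:", abs(sst - (ssr + sse)) < 1e-9)
```

The decomposition relies on the line being the least-squares line with an intercept; for an arbitrary line the cross-product term would not vanish.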
Standard error of the estimate
The population variance, \( \sigma^2 \), measures the spread of the data around the population regression line
The standard error of the estimate (SEE) is an estimator of \( \sigma \)
It measures the fit of the regression model
Low SEE implies good fit
\[ \mathrm{SEE} = s, \quad \text{where } s^2 = \frac{\sum_{i=1}^{n} e_i^2}{n-2} = \frac{\mathrm{SSE}}{n-2} \]
The divisor of \( n - 2 \) is a degrees of freedom adjustment (for the 2 parameters that need to be estimated) to ensure that \( s^2 \) is an unbiased estimator of \( \sigma^2 \)
Coefficient of determination
Define:
\[ R^2 = \frac{\mathrm{SSR}}{\mathrm{SST}} = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}} \]
\( R^2 \) measures the proportion of variation in the dependent variable that is explained by the regression model
Note:
\( 0 \le R^2 \le 1 \)
The closer \( R^2 \) is to unity, the better the fit of "the model" (right-hand side) to "the data" (left-hand side)
\( R^2 = r_{Y\hat{Y}}^2 \) (\( = r_{XY}^2 \) in simple linear regression)
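As a rough illustration of the two fit measures, the sketch below computes the SEE and \( R^2 \) for a small made-up data set and checks that \( R^2 \) equals the squared sample correlation between X and Y. The data and variable names are assumptions made for the example.

```python
import math

# Hypothetical data, invented for illustration only.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(X)
x_bar = sum(X) / n
y_bar = sum(Y) / n

sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
sum_xx = sum((x - x_bar) ** 2 for x in X)
sum_yy = sum((y - y_bar) ** 2 for y in Y)   # this is SST

b1 = sum_xy / sum_xx
b0 = y_bar - b1 * x_bar

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(X, Y))

see = math.sqrt(sse / (n - 2))               # standard error of the estimate, s
r_squared = 1 - sse / sum_yy                 # coefficient of determination
r_xy = sum_xy / math.sqrt(sum_xx * sum_yy)   # sample correlation of X and Y

print(f"SEE = {see:.4f}")
print(f"R^2 = {r_squared:.4f}, r_xy^2 = {r_xy ** 2:.4f}")
```

The SEE is in the units of Y, which is why the lecture compares it with the mean of Y, whereas \( R^2 \) is unit-free.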
OLS inference
What can we say about the properties of the OLS point estimators \( b_0 \) and \( b_1 \), if our assumptions hold?
They are unbiased: \( E(b_j) = \beta_j \)
They are normally distributed, as they are linear functions of the \( Y_i \), which are assumed to be drawn from a normal distribution
Even without normality of \( Y_i \), we can invoke the CLT and assume that \( b_j \) will be asymptotically normal
We need to know \( \mathrm{Var}(b_0) \) and \( \mathrm{Var}(b_1) \) in order to conduct inference
OLS inference...
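Just as we did when estimating means, we can define the true and estimated variances
The panel below gives:
True variances on the left, and
Estimated standard errors on the right, where the unknown \( \sigma \) is replaced by the estimate s.
\[ \mathrm{var}(b_0) = \frac{\sigma^2 \sum X_i^2}{n \sum (X_i - \bar{X})^2}, \qquad s_{b_0} = s \sqrt{\frac{\sum X_i^2}{n \sum (X_i - \bar{X})^2}} \]
\[ \mathrm{var}(b_1) = \frac{\sigma^2}{\sum (X_i - \bar{X})^2}, \qquad s_{b_1} = s \sqrt{\frac{1}{\sum (X_i - \bar{X})^2}} \]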
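The estimated standard errors can be computed directly from these formulas. The following sketch does so for a small made-up data set; the data and variable names are illustrative assumptions, not lecture material.

```python
import math

# Hypothetical data, invented for illustration only.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(X)
x_bar = sum(X) / n
y_bar = sum(Y) / n
sum_xx = sum((x - x_bar) ** 2 for x in X)    # sum of squared deviations of X
sum_x_sq = sum(x ** 2 for x in X)            # raw sum of squares of X

b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / sum_xx
b0 = y_bar - b1 * x_bar

# s estimates sigma, using the n - 2 degrees-of-freedom divisor.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(X, Y))
s = math.sqrt(sse / (n - 2))

se_b0 = s * math.sqrt(sum_x_sq / (n * sum_xx))   # standard error of the intercept
se_b1 = s * math.sqrt(1 / sum_xx)                # standard error of the slope

print(f"s = {s:.4f}, se(b0) = {se_b0:.4f}, se(b1) = {se_b1:.4f}")
```

Note that more spread in X (a larger sum of squared deviations) lowers the standard error of the slope, holding s fixed.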
OLS inference
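Now we have a basis for testing hypotheses about the population \( \beta \)'s!
\[ b_j \sim N(\beta_j, \mathrm{var}(b_j)), \quad j = 0, 1 \]
\[ \frac{b_j - \beta_j^0}{\sqrt{\mathrm{var}(b_j)}} \sim N(0, 1) \quad \text{under } H_0: \beta_j = \beta_j^0. \]
With unknown \( \sigma^2 \), we need to estimate \( \mathrm{var}(b_j) \) by replacing \( \sigma^2 \) with \( s^2 = \frac{\sum e_i^2}{n-2} \):
\[ \frac{b_j - \beta_j^0}{\sqrt{\widehat{\mathrm{var}}(b_j)}} = \frac{b_j - \beta_j^0}{s_{b_j}} \sim t_{(n-2)} \]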
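A t-test on the slope might be sketched as follows. The data set and the helper name `slope_t_test` are invented for illustration, and the critical value 3.182 is the standard t-table entry for a two-sided 5% test with 3 degrees of freedom, typed in by hand rather than computed.

```python
import math

# Hypothetical data, invented for illustration only.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 5.0, 4.0, 5.0]


def slope_t_test(x, y, beta_null=0.0):
    """Return (t statistic, degrees of freedom) for H0: beta_1 = beta_null."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sum_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))        # estimate of sigma
    se_b1 = s / math.sqrt(sum_xx)       # estimated standard error of b1
    return (b1 - beta_null) / se_b1, n - 2


t_stat, df = slope_t_test(X, Y)

# Two-sided 5% critical value of t with 3 degrees of freedom, from a t table.
T_CRIT_3DF = 3.182
reject = abs(t_stat) > T_CRIT_3DF

print(f"t = {t_stat:.3f} on {df} degrees of freedom; reject H0 at 5%: {reject}")
```

With only five observations the critical value is far above the large-sample 1.96, so a slope that looks sizeable need not be statistically significant.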
Executive salaries
Consider the relationship between salaries of chief executive officers and firm performance
Data for 209 US CEOs for 1990
Assume the regression model:
\( Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \)
Y = salary in thousands of US dollars (salary)
X = average (over 3 years) return on equity (roe)
Range of salary is from 223 to 14,822 ($223,000 to $14,822,000) with a mean of 1,281 ($1,281,000)
roe ranges from 0.5% to 56.3% with a mean of 17.2%
Executive salaries
Only a weak linear relationship is estimated between salary and roe
\( R^2 = 0.013 \): the model explains only 1.3% of the variation in CEO salaries
Also, SEE = 1,367 compared with a mean salary of 1,281
The estimated effect of roe on salary, \( b_1 \), is 18.5. This means:
A unit increase (one percentage point) in roe would on average result in an increase in salary of $18,500
This point estimate has a standard error of 11.1
p-value = 0.098, so we would not reject \( H_0: \beta_1 = 0 \) versus \( H_1: \beta_1 \neq 0 \) for any \( \alpha < 9.8\% \)
Similarly, the 95% CI is -3.4 to 40.4 and hence covers \( \beta_1 = 0 \)
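These figures can be roughly reproduced from the reported estimate and standard error alone. The sketch below is an approximation: it assumes the standard normal distribution in place of the t distribution with 207 degrees of freedom, so its p-value and interval come out slightly different from the lecture's 0.098 and (-3.4, 40.4).

```python
from statistics import NormalDist

# Values reported in the lecture for the CEO salary regression.
b1 = 18.5      # estimated slope (thousands of dollars per percentage point of roe)
se_b1 = 11.1   # its standard error

# Test statistic for H0: beta_1 = 0.
t_stat = (b1 - 0.0) / se_b1

# ASSUMPTION: normal approximation to t(207), reasonable because n = 209 is large.
z = NormalDist()
p_value = 2 * (1 - z.cdf(abs(t_stat)))      # two-sided p-value
crit = z.inv_cdf(0.975)                     # about 1.96
ci_low = b1 - crit * se_b1
ci_high = b1 + crit * se_b1

print(f"t = {t_stat:.3f}, approximate p-value = {p_value:.3f}")
print(f"approximate 95% CI: ({ci_low:.1f}, {ci_high:.1f})")
```

Either way the interval contains zero, which is the same conclusion as failing to reject the null at the 5% level.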
Executive salaries
Is \( b_1 = 18.5 \) a big effect in an economic sense?
What about the intercept estimate (\( b_0 = 963.2 \))?
This means the predicted salary for the CEO of a firm with roe = 0 is $963,200
Beware!
Sometimes intercepts don't make sense
What if Y were CEO salary and X were age of CEO?
We'd still have to have an intercept!
Regression models are approximations
We are often only interested/confident in the approximation for values in the data
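To see the fitted line at work, the sketch below evaluates it at three roe values quoted in the lecture. Because \( b_0 = \bar{Y} - b_1 \bar{X} \), the OLS line passes through the sample means, so the prediction at the mean roe of 17.2 should land close to the mean salary of 1,281, up to rounding in the reported coefficients. The function name is made up for illustration.

```python
# Estimates reported in the lecture (salary in thousands of US dollars).
b0 = 963.2
b1 = 18.5


def predicted_salary(roe):
    """Predicted CEO salary (in $000s) for a given return on equity (in %)."""
    return b0 + b1 * roe


at_zero = predicted_salary(0.0)    # the intercept: outside the observed roe range
at_mean = predicted_salary(17.2)   # mean roe in the sample
at_max = predicted_salary(56.3)    # largest roe in the sample

print(f"roe = 0.0:  {at_zero:.1f}")
print(f"roe = 17.2: {at_mean:.1f}")
print(f"roe = 56.3: {at_max:.1f}")
```

The prediction at roe = 0 lies just below the smallest observed roe of 0.5%, which is the kind of extrapolation the slide warns about.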
Example: Male versus Female Hourly wages
Q: Do young males on average earn higher wages than young females?
We have US National Longitudinal Survey (NLS) data for 1987, including:
Hourly wages for each respondent, measured in $.
A dummy (binary variable) taking the value 0 if the observation is female, and 1 if the observation is male.
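Male versus Female Hourly wages
Let \( W_i \) be the observed hourly wage, \( i = 1, \dots, 3294 \)
Define the dummy variable, \( D_i \), as follows:
\( D_i = 1 \) if male
\( D_i = 0 \) otherwise (female)
Specify our regression model as
\( W_i = \beta_0 + \beta_1 D_i + \varepsilon_i \)
What are the interpretations of \( \beta_0 \) and \( \beta_1 \)?
\( E(W_i \mid D_i = 0) = \beta_0 \) and \( E(W_i \mid D_i = 1) = \beta_0 + \beta_1 \)
\( \beta_1 \) is therefore interpretable as the difference between the mean male wage and the mean female wage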
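The same interpretation holds for the estimates: with a single dummy regressor, OLS gives an intercept equal to the sample mean of the group coded 0 and a slope equal to the difference in sample means. The sketch below checks this on a tiny made-up wage sample (not the NLS data).

```python
# Hypothetical hourly wages ($), invented for illustration; not the NLS data.
wages = [5.0, 6.0, 4.0, 7.0, 6.0, 8.0, 7.0]
male = [0, 0, 0, 1, 1, 1, 1]   # dummy D_i: 1 if male, 0 if female

n = len(wages)
d_bar = sum(male) / n
w_bar = sum(wages) / n

# OLS slope and intercept for the regression of W on D.
b1 = (sum((d - d_bar) * (w - w_bar) for d, w in zip(male, wages))
      / sum((d - d_bar) ** 2 for d in male))
b0 = w_bar - b1 * d_bar

# Group means computed directly for comparison.
female_mean = sum(w for d, w in zip(male, wages) if d == 0) / male.count(0)
male_mean = sum(w for d, w in zip(male, wages) if d == 1) / male.count(1)

print(f"b0 = {b0:.2f}, mean female wage = {female_mean:.2f}")
print(f"b1 = {b1:.2f}, male minus female mean = {male_mean - female_mean:.2f}")
```

Testing \( H_0: \beta_1 = 0 \) in this regression is therefore a test of equal mean wages across the two groups.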

Male versus Female Hourly wages
The regression model does not fit the data well.
\( R^2 = 0.032 \): the model (gender) explains 3.2% of the variation in wages
Implication: there are many other variables which explain hourly wage (multiple regression will be introduced next week!)
The mean female wage is estimated to be $5.15
Results indicate that males on average earn $1.17 per hour more than females
This gender effect is large in a practical or economic sense
The associated t-statistic indicates that the effect is also statistically significant: the effect of gender on wage is very precisely estimated
Practical inference
This was large sample inference
In both regression examples, our sample size was large (209 and 3294, respectively)
What happens to our t-stat as \( n \to \infty \)?
Statistical versus economic significance
The gender effect was large in both an economic and statistical sense
What if the estimated effect of roe on salary was 18.5 but the associated s.e. was 2.1? Now the t-stat \( = 18.5 / 2.1 \approx 8.81 \) with \( p \approx 0.000 \)
Use of hypothesis testing in decision making:
Treat as circumstantial evidence rather than proof
Reporting styles
Report standard errors, not t-statistics!!!
Choice of significance levels?
Compare p-values against whatever alpha is appropriate to the research question
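The contrast between the two standard errors can be made concrete. The sketch below holds the estimate at 18.5 and compares the reported standard error (11.1) with the hypothetical one (2.1). It assumes a normal approximation to the t distribution, which suits the large samples discussed here; the helper name `large_sample_test` is invented for the example.

```python
from statistics import NormalDist


def large_sample_test(estimate, std_error, null_value=0.0):
    """Return (t statistic, two-sided p-value) using a normal approximation."""
    t = (estimate - null_value) / std_error
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p


# Same point estimate, two different levels of precision.
t_reported, p_reported = large_sample_test(18.5, 11.1)   # s.e. reported in the lecture
t_precise, p_precise = large_sample_test(18.5, 2.1)      # hypothetical smaller s.e.

print(f"s.e. = 11.1: t = {t_reported:.2f}, p = {p_reported:.3f}")
print(f"s.e. = 2.1:  t = {t_precise:.2f}, p = {p_precise:.3f}")
```

The economic size of the effect (18.5) is identical in both cases; only the precision, and hence the statistical significance, changes. This is one reason to report standard errors alongside estimates.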
Progress report
We have introduced linear regression, including concepts, jargon, useful statistical results, and simple examples
From next week, we will talk more about inference and start moving to multiple regression
Good luck on the Course Project! Remember to submit BOTH hard copy (at the START of your tutorial workshop) AND e-copy (on Moodle)!!!
