Forecasting in Stata
Introduction
This manual is intended to be a reference guide for time series forecasting in STATA. It will be updated
periodically during the semester, and will be available on the course website.
Working with variables in STATA
In the Data Editor, you can see that variables are recorded by STATA in spreadsheet format. Each row is
an observation, each column is a different variable. An easy way to get data into STATA is by cutting
and pasting into the Data Editor.
When variables are pasted into STATA, they are given the default names var1, var2, etc. You should
rename them so you can keep track of what they are. The command to rename var1 as gdp is:
. rename var1 gdp
New variables can be created by using the generate command. For example, to take the log of the
variable gdp:
. generate y=ln(gdp)
Dates and Time
For time series analysis, dates and times are critical. You need to have one variable which records the
time index. We describe how to create this series.
Annual Data
For annual data it is convenient if the time index is the year number (e.g. 2010). Suppose your first
observation is the year 1947. You can generate the time index by the commands:
. generate t=1947+_n-1
. tsset t, yearly
The variable _n is the natural index of the observation, starting at 1 and running to the number of
observations n. The generate command creates a variable t which adds 1947 to _n, and then
subtracts 1, so it is a series with entries 1947, 1948, 1949, etc. The tsset command declares the
variable t to be the time index. The option yearly is not necessary, but tells STATA that the time
index is measured at the annual frequency.
Quarterly Data
STATA stores the time index as an integer series. It uses the convention that the first quarter of 1960 is
0. The second quarter of 1960 is 1, the first quarter of 1961 is 4, etc. Dates before 1960 are negative
integers, so that the fourth quarter of 1959 is -1, the third is -2, etc.
When formatted as a date, STATA displays quarterly time periods as 1957q2, meaning the second
quarter of 1957. (Even though STATA stores the number -11, the eleventh quarter before 1960q1.)
STATA uses the formula tq(1957q2) to translate the formatted date 1957q2 to the numerical index
-11.
Suppose that your first observation is the third quarter of 1947. You can generate a time index for the
data set by the commands
. generate t=tq(1947q3)+_n-1
. format t %tq
. tsset t
The generate command creates a variable t with integer entries, normalized so that 0 occurs in
1960q1. The format command formats the variable t using the time series quarterly format. The tq
refers to time series quarterly. The tsset command declares that the variable t is the time index.
You could have alternatively typed
. tsset t, quarterly
to tell STATA that it is a quarterly series, but it is not necessary as t has already been formatted as
quarterly. Now, when you look at the variable t you will see it displayed in year-quarter format.
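As a quick check of the quarterly convention, the following sketch builds the index on a toy dataset and displays a few values of tq(). The dataset size (8 observations) and the extra variable t_raw are assumptions made only for illustration.

```stata
* Toy dataset: 8 empty observations (the size is arbitrary)
clear
set obs 8
* Quarterly index starting in 1947q3; tq() returns quarters since 1960q1
generate t = tq(1947q3) + _n - 1
format t %tq
tsset t
* 1960q1 is stored as 0 and 1957q2 as -11
display tq(1960q1)
display tq(1957q2)
* Keep an unformatted copy to see the stored integer next to the date
generate t_raw = t
list t t_raw in 1/4
```

The list output shows the same value twice: once formatted as a year-quarter date and once as the underlying integer.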
Monthly Data
Monthly data is similar, but with m replacing q. STATA stores the time index with the convention
that 1960m1 is 0. To generate a monthly index starting in the second month of 1962, use the commands
. generate t=tm(1962m2)+_n-1
. format t %tm
. tsset t
Weekly Data
Weekly data is similar, with w instead of q and m, and the base period is 1960w1. For a series
starting in the 7th week of 1973, use the commands
. generate t=tw(1973w7)+_n-1
. format t %tw
. tsset t
Daily Data
Daily data is stored by dates. For example, 01jan1960 is Jan 1, 1960, which is the base period. To
generate a daily time index starting on April 18, 1962, use the commands
. generate t=td(18apr1962)+_n-1
. format t %td
. tsset t
Pasting a Data Table into STATA
Some quarterly and monthly data are available as tables where each row is a year and the columns are
different quarters or months. If you paste this table into STATA, it will treat each column (each month)
as a separate variable. You can use STATA to rearrange the data into a single column, but you have to do
this for one variable at a time.
I will describe this for monthly data, but the steps are the same for quarterly.
After you have pasted the data into STATA, suppose that there are 13 columns, where one is the year
number (e.g. 1958) and the other 12 are the values for the variable itself. Rename the year number as
year, and leave the other 12 variables listed as var2 etc. Then use the reshape command
. reshape long var, i(year) j(month)
Now, the data editor should show three variables: year, month and var. STATA has re-sorted the
observations into a single column. You can drop the year and month variables, create a monthly time
index, and rename var to be more descriptive.
In the reshape command listed above, STATA takes the variables which start with var and strips off the
trailing numbers and puts them in the new variable month. It uses the existing variable year to
group observations.
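The reshape steps can be tried on a small made-up table. The sketch below uses a quarterly table rather than a monthly one to keep it short; the numbers, the start date 1958q1, and the final name gdp are all assumptions for illustration.

```stata
* Toy quarterly table: one row per year, var2-var5 hold the four quarters
* (the numbers are made up for illustration)
clear
input year var2 var3 var4 var5
1958 10 11 12 13
1959 20 21 22 23
end
* Stack var2-var5 into one column; the numeric suffix goes into "quarter"
reshape long var, i(year) j(quarter)
* The data are now sorted by year and quarter, so _n runs in calendar order
generate t = tq(1958q1) + _n - 1
format t %tq
tsset t
* The helper variables are no longer needed once t exists
drop year quarter
rename var gdp
list
```

Because the pasted columns are named var2 through var5, the suffix variable takes the values 2 to 5 rather than 1 to 4, which is one reason to build the time index from _n and then drop it.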
Data Organized in Rows
Some data sets are posted in rows. Each row is a different variable, and each column is a different time
period. If you cut and paste a row of data into STATA, it will interpret the data as a single observation
with many variables.
One method to solve this problem is with Excel. Copy the row of data, open a clean Excel Worksheet,
and use the Paste Special Command. (Right click, then Paste Special.) Check the Transpose option,
and OK. This will paste the data into a column. You can then copy and paste the column of data into
the STATA Data Editor.
Cleaning Data Pasted into STATA
Many data sets posted on the web are not immediately useful for numerical analysis, as they are not in
calendar order, or have extra characters, columns, or rows. Before attempting analysis, be sure to
visually inspect the data to be sure that you do not have nonsense.
Examples
Data at the end of the sample might be preliminary estimates, and be footnoted or marked to
indicate that they are preliminary. You can use these observations, but you need to delete all
characters and non-numerical components. Typically, you will need to do this by hand, entry
by entry.
Seasonal data may be reported using an extra entry for annual values. So monthly data might be
reported as 13 numbers, one for each month plus 1 for the annual. You need to delete the
annual entries. To do this, you can typically use the drop command. For example, if these
entries are marked Annual, and you have pasted this label into var2, then
. drop if var2=="Annual"
This deletes all observations for which the variable var2 equals Annual. Notice that this
command uses a double equality ==. This is common in programming. The single equality =
is used for assignment (definition), and the double equality == is used for testing. Because var2
holds text, the value being tested is enclosed in double quotes.
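A minimal sketch of the drop step on made-up data is given here. The labels and values are hypothetical; the point is only that a string value is written in double quotes when it is compared with ==.

```stata
* Toy pasted data: var2 holds a text label, var3 the reported value
clear
input str8 var2 var3
"Jan"    1.5
"Feb"    1.7
"Annual" 3.2
"Jan"    1.9
end
* Remove the rows that hold annual totals
drop if var2 == "Annual"
list
```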
Time Series Plots
The tsline command generates time series plots. To make plots of the variable gdp, or the variables
men and women
. tsline gdp
. tsline men women
Time series operators
For a time series y
L.    lag y(t-1)    Example: L.y
L2.   2-period lag y(t-2)    Example: L2.y
F.    lead y(t+1)    Example: F.y
F2.   2-period lead y(t+2)    Example: F2.y
D.    difference y(t)-y(t-1)    Example: D.y
D2.   double difference (y(t)-y(t-1))-(y(t-1)-y(t-2))    Example: D2.y
Ss.   seasonal difference y(t)-y(t-s), where s is the seasonal frequency written as a number (e.g., S4. for quarterly, S12. for monthly)    Example: S4.y
S2.   lag-2 difference y(t)-y(t-2)    Example: S2.y
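The operators can be checked on a toy series whose differences are easy to compute by hand. In the sketch below the series y = _n^2, the start date, and the variable names are assumptions for illustration. It uses S4.y for the four-quarter difference y(t)-y(t-4).

```stata
* Toy quarterly series: y = 1, 4, 9, 16, ...
clear
set obs 8
generate t = tq(2000q1) + _n - 1
format t %tq
tsset t
generate y = _n^2
* Operators require that tsset has been run first
generate lag1  = L.y
generate lead1 = F.y
generate d1    = D.y
generate d2    = D2.y
generate s4    = S4.y
list t y lag1 lead1 d1 d2 s4
```

Entries that would need data from before the first observation or after the last one are set to missing.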
Regression Estimation
To estimate a linear regression of the variable y on the variables x and z, use the regress command
. regress y x z
The regress command reports many statistics. In particular,
The number of observations is at the top of the small table on the right.
The sum of squared residuals is in the first column of the table on the left (under SS), in the row
marked Residual.
The least squares estimate of the error variance is in the same table, under MS and in the row
Residual. The estimate of the error standard deviation is its square root, and is in the right
table, reported as Root MSE.
The coefficient estimates are reported in the bottom table, under Coef.
Standard errors for the coefficients are to the right of the estimates, under Std. Err.
In some time series cases (most importantly, trend estimation and h-step-ahead forecasts), the least
squares standard errors are inappropriate. To get appropriate standard errors, use the newey command
instead of regress.
. newey y x z, lag(k)
Here, k is an integer, meaning number of periods, which you select. It is the number of adjacent
periods to smooth over to adjust the standard errors. STATA does not select k automatically, and it is
beyond the scope of this course to estimate k from the sample, so you will have to specify its value. I
suggest the following. In h-step-ahead forecasting, set k=h. In trend estimation, set k=4 for quarterly and
k=12 for monthly data.
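To see how regress and newey differ in practice, the sketch below estimates the same trend regression both ways on simulated monthly data. The simulated series, the seed, and the sample size are assumptions for illustration only.

```stata
* Simulated monthly series with a linear trend (illustrative only)
clear
set seed 12345
set obs 120
generate t = tm(2000m1) + _n - 1
format t %tm
tsset t
generate y = 2 + 0.05*_n + rnormal()
* Least squares standard errors
regress y t
* Same coefficient estimates, standard errors adjusted with k=12
newey y t, lag(12)
```

The two commands report identical coefficients; only the standard errors and the statistics built from them change.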
Intercept-Only Model
The simplest regression model is intercept-only, y=b0+e. This can be estimated by the regress or newey
command
. regress y
. newey y, lag(k)
The estimated intercept is the sample mean of y. While this could have been calculated using other
methods, such as the summarize command, using the regress/newey command is useful as then
afterwards you can use post-estimation commands, including predict.
Regression Fit and Residuals
To calculate predicted values, use the predict command after the regress or newey command
. predict p
This creates a variable p of the fitted values x'beta.
To calculate least squares residuals, after the regress or newey command
. predict e, residuals
This creates a variable e of the in-sample residuals y-x'beta.
You can then plot the fit versus actual values, and a residual time series
. tsline y p
. tsline e
The first plot is a graph of the variables y and p, assuming that y is the dependent variable, and p are the
fitted values. The second plot is a graph of the residuals against time.
Dummy Variables
Indicator variables, known as dummy variables, can be created using generate. One purpose is to create
sub-periods and regimes.
For example, to create a dummy variable equaling 0 for observations before 1984, and equaling 1
for monthly observations starting in 1984
. generate d=(t>=tm(1984m1))
In this example, the time index is t. The command tm(1984m1) converts the date format 1984m1
into an integer value. The new variable is d, and equals 0 for observations up to 1983m12, and
equals 1 for observations starting in 1984m1.
To create a dummy variable equaling 1 for quarterly observations between 1990q1 and 1998q4, and
0 otherwise (and the time index is t), use
. generate d=(t>=tq(1990q1))*(t<=tq(1998q4))
This command essentially generated two dummy variables and then multiplied them to create the
variable d.
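The window dummy can be verified on a toy quarterly index. In the sketch below the sample span (1988q1 to 1999q4) and the name d_alt are assumptions for illustration. The inrange() function is shown as an equivalent way to write the same condition.

```stata
* Toy quarterly index covering 1988q1 through 1999q4
clear
set obs 48
generate t = tq(1988q1) + _n - 1
format t %tq
tsset t
* Product of two indicators: 1 only when both conditions hold
generate d = (t >= tq(1990q1)) * (t <= tq(1998q4))
* Equivalent form using inrange()
generate d_alt = inrange(t, tq(1990q1), tq(1998q4))
* Count how many quarters fall inside the window
tabulate d
```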
Changing Intercept Model
To allow the intercept of a model to change at a known time period, simply add a dummy
variable to the regression. For example, if t is the time index, the data are monthly and we want a
change in mean starting in the 7th month of 1987,
. generate d=(t>=tm(1987m7))
. regress y d
The generate command created a dummy variable for the second time period. The regress command
estimated an intercept-only model allowing a switch in the intercept in July 1987.
The estimated constant is the intercept before July 1987. The coefficient on d is the change in the
intercept.
Time Trend Model
To estimate a regression on a time trend only, use regress or newey with the time index as a regressor.
If the time index is t
. regress y t
Trends with Changing Slope
Here is how to create a trend which changes slope at a specific date (for concreteness 1984m1). Use the
generate command to create a dummy for the period starting at 1984m1, and then interact it with a
trend normalized to be zero at 1984m1:
. generate d=(t>=tm(1984m1))
. generate ts=d*(t-tm(1984m1))
The new variable ts is zero before 1984, and then is a linear trend after that.
Then regress the variable of interest y on t and ts:
. regress y t ts
The coefficient on t is the trend before 1984. The coefficient on ts is the change in the trend.
If you want there to be a jump as well as a change in slope at 1984m1, then include the dummy d
. regress y t d ts
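The broken-trend construction can be tried on simulated data where the true slopes are known. Everything about the simulated series below (seed, sample span, slopes 0.1 and 0.3, noise level) is an assumption for illustration.

```stata
* Simulated monthly data, 1980m1 through 1989m12
clear
set seed 2024
set obs 120
generate t = tm(1980m1) + _n - 1
format t %tm
tsset t
* Dummy for the second regime and a trend that starts at zero in 1984m1
generate d  = (t >= tm(1984m1))
generate ts = d * (t - tm(1984m1))
* Slope 0.1 before 1984m1 and 0.1 + 0.2 = 0.3 afterwards
generate y = 0.1*(t - tm(1980m1)) + 0.2*ts + rnormal(0, 0.5)
* Change in slope only
regress y t ts
* Change in slope plus a jump in level at 1984m1
regress y t d ts
```

In the first regression the coefficient on ts should be close to 0.2, the change in slope built into the simulation.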
Expanding the Dataset Before Forecasting
When you have a set of time series observations, STATA typically records the dates as running from the
first until the last observation. You can check this by looking at the data in the Data Editor. But to
forecast a date out-of-sample, these dates need to be in the data set. This requires expanding the
data set to include these dates. This is done by the tsappend command. There are two formats
. tsappend, add(12)
This command adds 12 dates to the end of the sample. If the current final observation is 2009m12, the
command adds 2010m01 through 2010m12. If you look at the data using the Data Editor, you will see
that the time index has new entries, through 2010m12, but the other variables are missing. Missing
values are indicated by a period (.).
The other format which accomplishes the same task is
. tsappend, last(2010m12) tsfmt(tm)
This command adds observations so that the last observation is 2010m12, and that the formatting is
monthly. For quarterly data, to add observations up to 2010q4 the command is
. tsappend, last(2010q4) tsfmt(tq)
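The effect of tsappend is easy to see on a toy dataset. In the sketch below the two-year sample (2008m1 to 2009m12) and the variable y are assumptions for illustration.

```stata
* Toy monthly sample ending in 2009m12
clear
set obs 24
generate t = tm(2008m1) + _n - 1
format t %tm
tsset t
generate y = _n
* Add 12 more dates: 2010m1 through 2010m12
tsappend, add(12)
* The new rows have a date but y is missing (shown as .)
list t y in -3/l
```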
Point Forecasting Out-of-Sample
The predict command can be used for point forecasting, so long as the regressors are available. The
data set first needs to be expanded as previously described, and the regression coefficients estimated
using either the regress or newey commands.
The command
. predict p
will create a series p of predicted values, both in-sample and out-of-sample.
To restrict the predicted values to in-sample observations (for monthly data with time index t and the
last in-sample observation 2009m12)
. predict p if t<=tm(2009m12)
To restrict the predicted values to out-of-sample (for monthly data with the last in-sample 2009m12)
. predict yp if t>tm(2009m12)
If the observations, in-sample predictions, and out-of-sample predictions are y, p, and yp, they can be
plotted together, but as three distinct elements, as
. tsline y p yp
. tsline y p yp if t>tm(2000m12)
The second command restricts the plot to observations after 2000, which is useful if you wish to focus in
on the forecast period (the example is for monthly data).
Normal Forecast Intervals
To make an interval forecast based on the normal approximation, you need what is called the
standard deviation of the forecast, which is an estimate of the standard deviation of the forecast
error. It is computed using the predict command. You first need to estimate the forecast and save
the forecast. Suppose you are forecasting the monthly variable y given the regressors x and z, the
in-sample period ends in 2009m12, and we make the following commands
. regress y x z
. predict p if t<=tm(2009m12)
. predict yp if t>tm(2009m12)
Then you add
. predict s if t>tm(2009m12), stdf
This creates a variable s for the forecast period whose entries are the standard deviation of the
forecast. Now you multiply this by a standard normal quantile and add to the point forecast
. generate yp1=yp-1.645*s
. generate yp2=yp+1.645*s
These commands create two series for the forecast period, which equal the endpoints of a forecast
interval with 90% coverage. (-1.645 and 1.645 are the 5% and 95% quantiles of the normal distribution.)
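The whole sequence, from expanding the data set to the interval endpoints, can be sketched on simulated data. The sketch uses a simple trend regression so that the regressor is available out-of-sample; the simulated series, the seed, and the use of invnormal(0.95) in place of the typed value 1.645 are assumptions for illustration.

```stata
* Simulated monthly series, 2000m1 through 2009m12
clear
set seed 99
set obs 120
generate t = tm(2000m1) + _n - 1
format t %tm
tsset t
generate y = 1 + 0.02*(t - tm(2000m1)) + rnormal()
* Add the 12 forecast dates, then estimate on the in-sample data
tsappend, add(12)
regress y t
* Point forecast and standard deviation of the forecast
predict yp if t > tm(2009m12)
predict s  if t > tm(2009m12), stdf
* 90% interval: invnormal(0.95) is approximately 1.645
generate yp1 = yp - invnormal(0.95)*s
generate yp2 = yp + invnormal(0.95)*s
tsline y yp yp1 yp2 if t > tm(2005m12)
```

Other coverage levels follow by changing the quantile, for example invnormal(0.975) for a 95% interval.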
Empirical Forecast Intervals
To make an interval forecast, you need to estimate the quantiles of the residuals of the forecast
equation. To do so, you first need to estimate the forecast and save the forecast. Suppose you are
forecasting the monthly variable y given the regressors x and z, the in-sample period ends in 2009m12
and we make the following commands
. regress y x z
. predict p if t<=tm(2009m12)
. predict yp if t>tm(2009m12)
. predict e, residuals
Now we want to calculate the 25% and 75% quantiles of the residuals e. This can be accomplished
using what is called quantile regression with just an intercept. The STATA command is qreg. The format
is similar to regress, but you have to tell STATA the quantile you want to estimate.
. qreg e, quantile(.25)
This command computes the 25% quantile regression of e on an intercept (as no regressors are
specified). The Coef. reported in the table is the .25 quantile of e. Now you can compute the
out-of-sample values, and add them to the point forecast yp to create the lower part of the forecast interval
. predict q1 if t>tm(2009m12)
. generate yp1=yp+q1
The predict command uses the last estimation command (in this case qreg) to compute the forecast.
In this case it is computing the out-of-sample .25 quantile of e.
You can repeat this for the upper forecast interval endpoint.
. qreg e, quantile(.75)
. predict q2 if t>tm(2009m12)
. generate yp2=yp+q2
The variables yp1 and yp2 are the out-of-sample forecast interval endpoints for y. You can plot the
data together with the out-of-sample point and interval forecasts, e.g.
. tsline y yp yp1 yp2 if t>tm(2000m12)
For a fan chart, you repeat this for multiple quantiles.
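One way to repeat the qreg steps for several quantiles is a foreach loop. The sketch below is an illustration on simulated data, not a prescribed method: the series, the seed, the choice of quantiles (10, 25, 75, 90), and the variable names q10, yp10, etc. are all assumptions.

```stata
* Simulated monthly series, 2000m1 through 2009m12, plus 12 forecast dates
clear
set seed 7
set obs 120
generate t = tm(2000m1) + _n - 1
format t %tm
tsset t
generate y = 1 + 0.02*(t - tm(2000m1)) + rnormal()
tsappend, add(12)
* Point forecast and in-sample residuals
regress y t
predict yp if t > tm(2009m12)
predict e, residuals
* One intercept-only quantile regression per band of the fan
foreach q in 10 25 75 90 {
    quietly qreg e, quantile(.`q')
    predict q`q' if t > tm(2009m12)
    generate yp`q' = yp + q`q'
}
tsline y yp yp10 yp25 yp75 yp90 if t > tm(2005m12)
```

Each pass through the loop adds one band, so a finer fan only requires a longer list of quantiles.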
Conditional Forecast Intervals
The qreg command makes it easy to compute the forecast interval endpoints conditional on regressors.
This is a quite advanced technique, so I do not recommend it without care. But this is how it can be
done. As in the previous section, suppose you are forecasting y given x and z, have forecast
residuals e, and out-of-sample point forecast yp. Now you want out-of-sample conditional quantiles
of e given some regressors. Suppose that you think that x has predictive power for the quantiles of
e. You can use the commands for the .25 quantile
. qreg e x, quantile(.25)
. predict q1 if t>tm(2009m12)
. generate yp1=yp+q1
and similarly for the .75 quantile.
This method models the quantiles of e as functions of x. This can be useful when the spread
(variance) of the distribution changes over time.