
Forecasting in STATA: Tools and Tricks

Introduction
This manual is intended to be a reference guide for time series forecasting in STATA. It will be updated
periodically during the semester, and will be available on the course website.

Working with variables in STATA
In the Data Editor, you can see that variables are recorded by STATA in spreadsheet format. Each row is
an observation, each column is a different variable. An easy way to get data into STATA is by cutting
and pasting into the Data Editor.
When variables are pasted into STATA, they are given the default names var1, var2, etc. You should
rename them so you can keep track of what they are. The command to rename var1 as gdp is:
. rename var1 gdp
New variables can be created by using the generate command. For example, to take the log of the
variable gdp:
. generate y=ln(gdp)

Dates and Time
For time series analysis, dates and times are critical. You need to have one variable which records the
time index. We describe how to create this series.
Annual Data
For annual data it is convenient if the time index is the year number (e.g. 2010). Suppose your first
observation is the year 1947. You can generate the time index by the commands:
. generate t=1947+_n-1
. tsset t, annual
The variable _n is the natural index of the observation, starting at 1 and running to the number of
observations n. The generate command creates a variable t which adds 1947 to _n, and then
subtracts 1, so it is a series with entries 1947, 1948, 1949, etc. The tsset command declares the
variable t to be the time index. The option annual is not necessary, but tells STATA that the time
index is measured at the annual frequency.

Quarterly Data
STATA stores the time index as an integer series. It uses the convention that the first quarter of 1960 is
0. The second quarter of 1960 is 1, the first quarter of 1961 is 4, etc. Dates before 1960 are negative
integers, so that the fourth quarter of 1959 is -1, the third is -2, etc.
When formatted as a date, STATA displays quarterly time periods as 1957q2, meaning the second
quarter of 1957. (Even though STATA stores the number -11, the eleventh quarter before 1960q1.)
STATA uses the function tq(1957q2) to translate the formatted date 1957q2 to the numerical index
-11.
Suppose that your first observation is the third quarter of 1947. You can generate a time index for the
data set by the commands
. generate t=tq(1947q3)+_n-1
. format t %tq
. tsset t
The generate command creates a variable t with integer entries, normalized so that 0 occurs in
1960q1. The format command formats the variable t using the time series quarterly format. The tq
refers to time series quarterly. The tsset command declares that the variable t is the time index.
You could have alternatively typed
. tsset t, quarterly
to tell STATA that it is a quarterly series, but it is not necessary as t has already been formatted as
quarterly. Now, when you look at the variable t you will see it displayed in year-quarter format.
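
As a quick check of the quarterly conventions described above, the following sketch builds a small artificial data set and prints a few date codes. The six-observation data set is an assumption made only for illustration.

```stata
* Toy data set: six quarterly observations starting in 1947q3 (illustrative only)
clear
set obs 6
generate t = tq(1947q3) + _n - 1
format t %tq
tsset t
list t

* Integer codes behind formatted quarterly dates
* 1960q1 is the base period and is stored as 0
display tq(1960q1)
* 1959q4 is stored as -1
display tq(1959q4)
* 1957q2 is stored as -11
display tq(1957q2)
* Going the other way: show the integer -11 in quarterly format (1957q2)
display %tq -11
```

The same pattern should carry over to tm(), tw() and td() with the %tm, %tw and %td formats.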

Monthly Data
Monthly data is similar, but with m replacing q. STATA stores the time index with the convention
that 1960m1 is 0. To generate a monthly index starting in the second month of 1962, use the commands
. generate t=tm(1962m2)+_n-1
. format t %tm
. tsset t

Weekly Data
Weekly data is similar, with w instead of q and m, and the base period is 1960w1. For a series
starting in the 7th week of 1973, use the commands
. generate t=tw(1973w7)+_n-1
. format t %tw
. tsset t

Daily Data
Daily data is stored by dates. For example, 01jan1960 is Jan 1, 1960, which is the base period. To
generate a daily time index starting on April 18, 1962, use the commands
. generate t=td(18apr1962)+_n-1
. format t %td
. tsset t

Pasting a Data Table into STATA
Some quarterly and monthly data are available as tables where each row is a year and the columns are
different quarters or months. If you paste this table into STATA, it will treat each column (each month)
as a separate variable. You can use STATA to rearrange the data into a single column, but you have to do
this for one variable at a time.
I will describe this for monthly data, but the steps are the same for quarterly.
After you have pasted the data into STATA, suppose that there are 13 columns, where one is the year
number (e.g. 1958) and the other 12 are the values for the variable itself. Rename the year number as
year, and leave the other 12 variables listed as var2 etc. Then use the reshape command
. reshape long var, i(year) j(month)
Now, the data editor should show three variables: year, month and var. STATA has resorted the
observations into a single column. You can drop the year and month variables, create a monthly time
index, and rename var to be more descriptive.
In the reshape command listed above, STATA takes the variables which start with var and strips off the
trailing numbers (here 2 through 13) and puts them in the new variable month. It uses the existing
variable year to group observations.
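
The reshape step can be tried on a small typed-in table. The sketch below uses a made-up quarterly table (two years, four columns named var2 to var5, mimicking pasted data) to keep it short. Building the time index with yq() from the year and quarter columns is an alternative to dropping them and regenerating the index, not a step taken from the text above.

```stata
* Hypothetical pasted table: one row per year, one column per quarter
clear
input year var2 var3 var4 var5
1958 1.1 1.2 1.3 1.4
1959 2.1 2.2 2.3 2.4
end

* Stack the four columns; the numeric suffixes 2..5 go into the new variable quarter
reshape long var, i(year) j(quarter)

* The suffixes started at 2, so shift them to run 1..4
replace quarter = quarter - 1

* Build a quarterly time index from year and quarter, then declare it
generate t = yq(year, quarter)
format t %tq
tsset t

* Tidy up: give the series a descriptive name (y is an illustrative choice)
rename var y
drop year quarter
list
```

For monthly data the analogous function is ym(year, month) with the %tm format.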

Data Organized in Rows
Some data sets are posted in rows. Each row is a different variable, and each column is a different time
period. If you cut and paste a row of data into STATA, it will interpret the data as a single observation
with many variables.
One method to solve this problem is with Excel. Copy the row of data, open a clean Excel Worksheet,
and use the Paste Special Command. (Right-click, then Paste Special.) Check the Transpose option,
and OK. This will paste the data into a column. You can then copy and paste the column of data into
the STATA Data Editor.

Cleaning Data Pasted into STATA
Many data sets posted on the web are not immediately useful for numerical analysis, as they are not in
calendar order, or have extra characters, columns, or rows. Before attempting analysis, be sure to
visually inspect the data to be sure that you do not have nonsense.
Examples

Data at the end of the sample might be preliminary estimates, and be footnoted or marked to
indicate that they are preliminary. You can use these observations, but you need to delete all
characters and non-numerical components. Typically, you will need to do this by hand, entry
by entry.

Seasonal data may be reported using an extra entry for annual values. So monthly data might be
reported as 13 numbers, one for each month plus 1 for the annual. You need to delete the
annual variable. To do this, you can typically use the drop command. For example, if these
entries are marked Annual, and you have pasted this label into var2, then
. drop if var2=="Annual"
This deletes all observations for which the variable var2 equals Annual. Because var2 holds text,
the value must be enclosed in double quotes. Notice that this command uses a double equality ==.
This is common in programming. The single equality = is used for assignment (definition), and the
double equality == is used for testing.
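
The two cleaning steps above can be sketched on a few made-up rows. The variable names period and raw and the footnote marker "p" are assumptions for illustration. The destring command with its ignore() option is offered as a possible shortcut when the stray characters are known in advance; it is not part of the procedure described above, and unexpected characters would still need to be fixed by hand.

```stata
* Hypothetical pasted data: a text label column and a value column read as text
clear
input str8 period str8 raw
"Jan"    "101.5"
"Feb"    "102.0"
"Annual" "101.8"
"Mar"    "103.2p"
end

* Remove the annual summary row (string comparison needs double quotes)
drop if period == "Annual"

* Convert the text column to numbers, skipping the footnote character p
destring raw, generate(x) ignore("p")
list
```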

Time Series Plots
The tsline command generates time series plots. To make plots of the variable gdp, or the variables
men and women
. tsline gdp
. tsline men women

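Time series operators
For a time series y

Operator   Meaning                                                       Example
L.         lag, y(t-1)                                                   L.y
L2.        2-period lag, y(t-2)                                          L2.y
F.         lead, y(t+1)                                                  F.y
F2.        2-period lead, y(t+2)                                         F2.y
D.         difference, y(t)-y(t-1)                                       D.y
D2.        double difference, (y(t)-y(t-1))-(y(t-1)-y(t-2))              D2.y
S.         difference at lag 1, y(t)-y(t-1) (the same as D.)             S.y
S2.        difference at lag 2, y(t)-y(t-2)                              S2.y
Ss.        seasonal difference, y(t)-y(t-s), where s is the seasonal     S4.y
           frequency (e.g., s=4 for quarterly, s=12 for monthly)

Note that the number after S is the lag of the difference, so the seasonal difference of quarterly data
is S4.y and that of monthly data is S12.y.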
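
To see the operators at work, the sketch below applies them to an artificial quarterly series. The series y = _n^2 and the new variable names are assumptions chosen so the results are easy to check by hand.

```stata
* Artificial quarterly series y = 1, 4, 9, ... (illustrative only)
clear
set obs 8
generate t = tq(2000q1) + _n - 1
format t %tq
tsset t
generate y = _n^2

* Operators can be used directly inside generate (and inside regress)
generate ly  = L.y
generate fy  = F.y
generate dy  = D.y
generate d2y = D2.y
generate s4y = S4.y

* Entries are missing wherever the required lag or lead falls outside the sample
list t y ly fy dy d2y s4y
```

For this series the first differences are 3, 5, 7, and so on, and the double difference is 2 wherever it is defined.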

Regression Estimation
To estimate a linear regression of the variable y on the variables x and z, use the regress command
. regress y x z
The regress command reports many statistics. In particular,

The number of observations is at the top of the small table on the right.

The sum of squared residuals is in the first column of the table on the left (under SS), in the row
marked Residual.

The least squares estimate of the error variance is in the same table, under MS and in the row
Residual. The estimate of the error standard deviation is its square root, and is in the right
table, reported as Root MSE.

The coefficient estimates are reported in the bottom table, under Coef.

Standard errors for the coefficients are to the right of the estimates, under Std. Err.

In some time series cases (most importantly, trend estimation and h-step-ahead forecasts), the least
squares standard errors are inappropriate. To get appropriate standard errors, use the newey command
instead of regress.
. newey y x z, lag(k)
Here, k is an integer, meaning number of periods, which you select. It is the number of adjacent
periods to smooth over to adjust the standard errors. STATA does not select k automatically, and it is
beyond the scope of this course to estimate k from the sample, so you will have to specify its value. I
suggest the following. In h-step-ahead forecasting, set k=h. In trend estimation, set k=4 for quarterly and
k=12 for monthly data.
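
The sketch below runs both commands on simulated quarterly data, so the two outputs can be compared side by side. The simulated coefficients, seed and sample size are arbitrary assumptions. The e() results displayed after regress are the stored versions of the numbers described in the list above.

```stata
* Simulated quarterly data (illustrative only)
clear
set seed 12345
set obs 100
generate t = tq(1985q1) + _n - 1
format t %tq
tsset t
generate x = rnormal()
generate z = rnormal()
generate y = 1 + 0.5*x - 0.3*z + rnormal()

* Least squares with conventional standard errors
regress y x z
* Number of observations
display e(N)
* Sum of squared residuals (SS, row Residual)
display e(rss)
* Error variance estimate (MS, row Residual) and its square root (Root MSE)
display e(rmse)^2
display e(rmse)

* Same coefficient estimates, Newey-West standard errors with k=4
newey y x z, lag(4)
```

The coefficient columns of the two outputs should match; only the standard errors and the statistics derived from them change.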

Intercept-Only Model
The simplest regression model is intercept-only, y=b0+e. This can be estimated by the regress or newey
command
. regress y
. newey y, lag(k)
The estimated intercept is the sample mean of y. While this could have been calculated using other
methods, such as the summarize command, using the regress/newey command is useful as then
afterwards you can use post-estimation commands, including predict.
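
A small simulated check of the claim that the estimated intercept equals the sample mean is sketched below. The data, seed and lag choice are assumptions for illustration.

```stata
* Simulated quarterly series with mean near 5 (illustrative only)
clear
set seed 2024
set obs 40
generate t = tq(2000q1) + _n - 1
format t %tq
tsset t
generate y = 5 + rnormal()

* Intercept-only regression: the constant is the sample mean
regress y
display _b[_cons]

* Same number from summarize
summarize y
display r(mean)

* Same point estimate, Newey-West standard error
newey y, lag(4)
```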

Regression Fit and Residuals
To calculate predicted values, use the predict command after the regress or newey command
. predict p
This creates a variable p of the fitted values x'beta.
To calculate least squares residuals, after the regress or newey command
. predict e, residuals
This creates a variable e of the in-sample residuals y-x'beta.
You can then plot the fit versus actual values, and a residual time series
. tsline y p
. tsline e
The first plot is a graph of the variables y and p, assuming that y is the dependent variable, and p are the
fitted values. The second plot is a graph of the residuals against time.

Dummy Variables
Indicator variables, known as dummy variables, can be created using generate. One purpose is to create
sub-periods and regimes.
For example, to create a dummy variable equaling 0 for observations before 1984, and equaling 1
for monthly observations starting in 1984
. generate d=(t>=tm(1984m1))
In this example, the time index is t. The function tm(1984m1) converts the date format 1984m1
into an integer value. The new variable is d, and equals 0 for observations up to 1983m12, and
equals 1 for observations starting in 1984m1.
To create a dummy variable equaling 1 for quarterly observations between 1990q1 and 1998q4, and
0 otherwise (and the time index is t), use
. generate d=(t>=tq(1990q1))*(t<=tq(1998q4))
This command essentially generates two dummy variables and then multiplies them to create the
variable d.
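
The sub-period dummy can be checked on an artificial quarterly index, as sketched below. The sample range is an assumption for illustration, and the inrange() function is shown only as an equivalent way to write the same condition; it is not the form used in the text above.

```stata
* Artificial quarterly index running 1988q1 to 2002q4 (illustrative only)
clear
set obs 60
generate t = tq(1988q1) + _n - 1
format t %tq
tsset t

* Product of two indicator expressions, as in the text
generate d = (t>=tq(1990q1))*(t<=tq(1998q4))

* Equivalent formulation with inrange()
generate d_alt = inrange(t, tq(1990q1), tq(1998q4))

* Stops with an error if the two versions ever disagree
assert d == d_alt

* 36 quarters (1990q1 through 1998q4) should have d equal to 1
tabulate d
```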

Changing Intercept Model
To allow the intercept of a model to change at a known time period, simply add a dummy
variable to the regression. For example, if t is the time index, the data are monthly and we want a
change in mean starting in the 7th month of 1987,
. generate d=(t>=tm(1987m7))
. regress y d
The generate command created a dummy variable for the second time period. The regress command
estimated an intercept-only model allowing a switch in the intercept in July 1987.
The estimated constant is the intercept before July 1987. The coefficient on d is the change in the
intercept.
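
A simulated illustration follows. The true intercepts (2 before the break, 5 after), the seed and the sample range are assumptions. The lincom command is used here as one way to obtain the post-break intercept, with a standard error, as the sum of the constant and the coefficient on d.

```stata
* Simulated monthly data with a mean shift in 1987m7 (illustrative only)
clear
set seed 101
set obs 120
generate t = tm(1983m1) + _n - 1
format t %tm
tsset t
generate d = (t>=tm(1987m7))
generate y = 2 + 3*d + rnormal()

* Constant estimates the pre-break intercept; coefficient on d is the change
regress y d

* Post-break intercept = constant + change
lincom _cons + d
```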

Time Trend Model
To estimate a regression on a time trend only, use regress or newey with the time index as a regressor.
If the time index is t
. regress y t

Trends with Changing Slope
Here is how to create a trend which changes slope at a specific date (for concreteness 1984m1). Use the
generate command to create a dummy for the period starting at 1984m1, and then interact it with a
trend normalized to be zero at 1984m1:
. generate d=(t>=tm(1984m1))
. generate ts=d*(t-tm(1984m1))
The new variable ts is zero before 1984, and then is a linear trend after that.
Then regress the variable of interest (here y) on t and ts:
. regress y t ts
The coefficient on t is the trend before 1984. The coefficient on ts is the change in the trend.
If you want there to be a jump as well as a change in slope at 1984m1, then include the dummy d
. regress y t d ts
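
The construction can be tried on simulated monthly data, as in the sketch below. The trend slopes, seed and sample range are assumptions for illustration. The lincom line reports the post-break slope, which is the sum of the two coefficients.

```stata
* Simulated monthly data with a slope change at 1984m1 (illustrative only)
clear
set seed 202
set obs 120
generate t = tm(1979m1) + _n - 1
format t %tm
tsset t
generate d = (t>=tm(1984m1))
generate ts = d*(t-tm(1984m1))

* ts should be exactly zero before the break date
assert ts == 0 if t < tm(1984m1)

* Slope 0.2 before the break, 0.7 after
generate y = 10 + 0.2*t + 0.5*ts + rnormal()

* Change in slope only
regress y t ts
* Slope after the break = coefficient on t + coefficient on ts
lincom t + ts

* Change in slope and a jump in level
regress y t d ts
```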

Expanding the Dataset Before Forecasting
When you have a set of time series observations, STATA typically records the dates as running from the
first until the last observation. You can check this by looking at the data in the Data Editor. But to
forecast a date out-of-sample, these dates need to be in the data set. This requires expanding the
data set to include these dates. This is done by the tsappend command. There are two formats
. tsappend, add(12)
This command adds 12 dates to the end of the sample. If the current final observation is 2009m12, the
command adds 2010m01 through 2010m12. If you look at the data using the Data Editor, you will see
that the time index has new entries, through 2010m12, but the other variables are missing. Missing
values are indicated by a period (.).
The other format which accomplishes the same task is
. tsappend, last(2010m12) tsfmt(tm)
This command adds observations so that the last observation is 2010m12, and that the formatting is
monthly. For quarterly data, to add observations up to 2010q4 the command is
. tsappend, last(2010q4) tsfmt(tq)
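
The effect of tsappend is easy to see on a toy data set. The 24-month sample below, ending in 2009m12, is an assumption for illustration.

```stata
* Toy monthly data set running 2008m1 to 2009m12 (illustrative only)
clear
set obs 24
generate t = tm(2008m1) + _n - 1
format t %tm
tsset t
generate y = _n

* Add 12 future dates; y is missing (.) in the new rows
tsappend, add(12)

* Last three rows: 2010m10 to 2010m12 with y missing
list t y in -3/l

* Number of added rows with missing y
count if missing(y)
```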

Point Forecasting Out-of-Sample
The predict command can be used for point forecasting, so long as the regressors are available. The
data set first needs to be expanded as previously described, and the regression coefficients estimated
using either the regress or newey commands.
The command
. predict p
will create a series p of predicted values, both in-sample and out-of-sample.
To restrict the predicted values to in-sample observations (for monthly data with time index t and the
last in-sample observation 2009m12)
. predict p if t<=tm(2009m12)
To restrict the predicted values to out-of-sample (for monthly data with the last in-sample 2009m12)
. predict yp if t>tm(2009m12)

If the observations, in-sample predictions, and out-of-sample predictions are y, p, and yp, they can be
plotted together, but as three distinct elements, as
. tsline y p yp
. tsline y p yp if t>tm(2000m12)
The second command restricts the plot to observations after 2000, which is useful if you wish to focus in
on the forecast period (the example is for monthly data).
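
Putting the pieces together, the sketch below forecasts a simulated monthly series from a linear trend model. The trend model, the simulated data and the plotting window are assumptions for illustration. A time trend is a convenient regressor here because its future values are known once the data set has been expanded.

```stata
* Simulated monthly data, 2000m1 to 2009m12 (illustrative only)
clear
set seed 303
set obs 120
generate t = tm(2000m1) + _n - 1
format t %tm
tsset t
generate y = 50 + 0.3*t + rnormal()

* Expand the data set by 12 months, then estimate on the observed sample
tsappend, add(12)
regress y t

* In-sample fit and out-of-sample point forecasts as separate series
predict p if t<=tm(2009m12)
predict yp if t>tm(2009m12)

* There should be 12 forecast values
count if !missing(yp)

* Plot the last few years together with the forecasts
tsline y p yp if t>tm(2005m12)
```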

Normal Forecast Intervals
To make an interval forecast based on the normal approximation, you need what is called the
standard deviation of the forecast, which is an estimate of the standard deviation of the forecast
error. This is computed using the predict command. You first need to estimate the forecast and save
the forecast. Suppose you are forecasting the monthly variable y given the regressors x and z, the
in-sample ends in 2009m12 and we make the following commands
. regress y x z
. predict p if t<=tm(2009m12)
. predict yp if t>tm(2009m12)
Then you add
. predict s if t>tm(2009m12), stdf
This creates a variable s for the forecast period whose entries are the standard deviation of the
forecast. Now you multiply this by a standard normal quantile and add to the point forecast
. generate yp1=yp-1.645*s
. generate yp2=yp+1.645*s
These commands create two series for the forecast period, which equal the endpoints of a forecast
interval with 90% coverage. (-1.645 and 1.645 are the 5% and 95% quantiles of the normal distribution.)
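
A self-contained version of these steps is sketched below on simulated data with a trend regressor. The data and the scalar name zq are assumptions for illustration. The invnormal() function is used in place of the typed-in constant so that the coverage level can be changed in one place.

```stata
* Simulated monthly data, 2000m1 to 2009m12, expanded by 12 months (illustrative only)
clear
set seed 404
set obs 120
generate t = tm(2000m1) + _n - 1
format t %tm
tsset t
generate y = 50 + 0.3*t + rnormal()
tsappend, add(12)

* Point forecast and standard deviation of the forecast
regress y t
predict yp if t>tm(2009m12)
predict s if t>tm(2009m12), stdf

* 95% normal quantile, approximately 1.645, for a 90% interval
scalar zq = invnormal(0.95)
display zq

* Interval endpoints
generate yp1 = yp - zq*s
generate yp2 = yp + zq*s
list t yp1 yp yp2 if t>tm(2009m12)
```

For an 80% interval, for example, the quantile would be invnormal(0.90) instead.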

Empirical Forecast Intervals
To make an interval forecast, you need to estimate the quantiles of the residuals of the forecast
equation. To do so, you first need to estimate the forecast and save the forecast. Suppose you are
forecasting the monthly variable y given the regressors x and z, the in-sample ends in 2009m12
and we make the following commands

. regress y x z
. predict p if t<=tm(2009m12)
. predict yp if t>tm(2009m12)
. predict e, residuals
Now we want to calculate the 25% and 75% quantiles of the residuals e. This can be accomplished
using what is called quantile regression with just an intercept. The STATA command is qreg. The format
is similar to regress, but you have to tell STATA the quantile you want to estimate.
. qreg e, quantile(.25)
This command computes the 25% quantile regression of e on an intercept (as no regressors are
specified). The Coef. reported in the table is the .25 quantile of e. Now you can compute the out-of-
sample values, and add them to the point forecast yp to create the lower part of the forecast interval
. predict q1 if t>tm(2009m12)
. generate yp1=yp+q1
The predict command uses the last estimation command (in this case qreg) to compute the forecast.
In this case it is computing the out-of-sample .25 quantile of e.
You can repeat this for the upper forecast interval endpoint.
. qreg e, quantile(.75)
. predict q2 if t>tm(2009m12)
. generate yp2=yp+q2
The variables yp1 and yp2 are the out-of-sample forecast interval endpoints for y (a 50% interval,
since they use the 25% and 75% quantiles). You can plot the data together with the out-of-sample
point and interval forecasts, e.g.
. tsline y yp yp1 yp2 if t>tm(2000m12)
For a fan chart, you repeat this for multiple quantiles.
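
As a cross-check on the intercept-only quantile regression, the residual quantiles can also be computed directly. The sketch below does this with the _pctile command on simulated data. This is a swapped-in alternative, not the qreg-based method described above, and the data and scalar names are assumptions. The two approaches use slightly different quantile definitions in finite samples, so the numbers should be close but need not agree exactly.

```stata
* Simulated monthly data, 2000m1 to 2009m12, expanded by 12 months (illustrative only)
clear
set seed 505
set obs 120
generate t = tm(2000m1) + _n - 1
format t %tm
tsset t
generate y = 50 + 0.3*t + rnormal()
tsappend, add(12)

* Point forecast and in-sample residuals (e is missing out-of-sample)
regress y t
predict yp if t>tm(2009m12)
predict e, residuals

* The .25 quantile from intercept-only quantile regression
qreg e, quantile(.25)
display _b[_cons]

* Direct sample quantiles of e, saved before another command overwrites r()
_pctile e, percentiles(25 75)
scalar q25 = r(r1)
scalar q75 = r(r2)
display q25
display q75

* Interval endpoints built from the direct quantiles
generate yp1 = yp + q25
generate yp2 = yp + q75
list t yp1 yp yp2 if t>tm(2009m12)
```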

Conditional Forecast Intervals
The qreg command makes it easy to compute the forecast interval endpoints conditional on regressors.
This is a quite advanced technique, so I do not recommend it without care. But this is how it can be
done. As in the previous section, suppose you are forecasting y given x and z, have forecast
residuals e, and out-of-sample point forecast yp. Now you want out-of-sample conditional quantiles
of e given some regressors. Suppose that you think that x has predictive power for the quantiles of
e. You can use the commands for the .25 quantile
. qreg e x, quantile(.25)
. predict q1 if t>tm(2009m12)
. generate yp1=yp+q1
and similarly for the .75 quantile.
This method models the quantiles of e as functions of x. This can be useful when the spread
(variance) of the distribution changes over time.
