
# ST 203 : Statistical Models and Data Analysis Lecture 8

Clifford Lam
Department of Statistics London School of Economics and Political Science

Princetonshield

1 / 58

Lecture 8 rundown

Recap from last lecture
What is a time series?
Why are they so important?
How do we model them?
We’re back to Excel


Time series
Time series (TS) data is any sequence of measurements taken on a response that varies over time.
Examples:
Weather (pressure, temperature, rainfall; daily, weekly, annual)
Health (HIV: white cell count; cancer: tumor growth)
Finance (shares, interest rates, exchange rates)


TS - why do we care
In the business world TS are the main object of stats analysis: shares, interest rates, real estate prices, price of gold, petrol prices, inflation etc.
In your future jobs you might need to know something about TS.
Also TS are important for weather forecasting and, in particular lately, for identifying global warming.


TS - why do we care
The aim of TS modelling is to understand seasonal (cyclical) and directional trends, in order to be able to FORECAST (i.e. predict) the values of the variable of interest on a future date.
This allows people in the financial world to make profits by buying or selling shares, options etc.


TS - forecasting
Forecasting is extremely uncertain.
Remember when we talked about out-of-sample predictions in MLR? I.e. predicting the outcome for ranges of the explanatory variables that we have not seen.
We are always less sure about out-of-sample predictions than we are of in-sample predictions.

TS - forecasting
In TS forecasting we only care about out-of-sample prediction.
This becomes difficult because TS are very variable and often unpredictable:
Markets have crashes and recessions
Weather is highly variable

TS and regression
Consider the data on petrol sales for cars per quarter for 4 years.
From the scatterplot of quarter (time) versus sales we can see that a linear downward trend would be suitable.

TS and regression
However, we shouldn’t really fit a linear regression, as we have dependent error terms.
This is because there is a seasonal trend, which shows clearly in the residual plot.

TS and regression
It makes sense that there should be a seasonal trend to petrol sales: people travel more in the summer and therefore people buy more petrol then.
Of the NICEL assumptions, I is being violated: the errors are not Independent.
This is quite typical for time series models.
But there is a linear trend too, so we need to consider both elements.

TS Components
TS have 4 elements:
1. Trend: long term direction of the data - this can be linear or exponential etc., and can be described by a regression
2. Seasonal effects: cycles related to seasons, months, weeks, days of the week
3. Cycles: long term cycles that are not necessarily related to the season - no need to worry in this course
4. Irregular fluctuations: random error + blips and market crashes

TS Components
We assume a multiplicative model for how the TS components mix:
$$Y_t = \text{Trend} \times \text{Seasonality} \times \text{Cyclicality} \times \text{Irregularities} = T \times S \times C \times I$$
Typically we focus on T and S as these are easier to predict than long term cycles and irregular fluctuations.

TS Components
We use a multiplicative model because it makes more sense than an additive model.
Think about the residual plot from before.
It has a seasonal pattern but was also heteroscedastic (i.e. funnel shaped).
So if we just add seasonality to linearity we don’t take into account that for larger time there is also often extra variability.
A multiplicative model does!
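
To make the additive versus multiplicative point concrete, here is a minimal Python sketch. The trend, the seasonal indices and the seasonal shifts are made-up numbers chosen only for illustration; they are not the petrol data from the slides.

```python
# Hypothetical quarterly numbers, made up purely for illustration.
trend = [100 + 10 * t for t in range(8)]   # rising linear trend over 2 years
seasonal_index = [1.2, 0.9, 0.8, 1.1]      # multiplicative effects, average 1
seasonal_shift = [20, -10, -20, 10]        # additive effects, average 0

# Multiplicative model: Y = T x S
multiplicative = [T * seasonal_index[t % 4] for t, T in enumerate(trend)]
# Additive model: Y = T + S
additive = [T + seasonal_shift[t % 4] for t, T in enumerate(trend)]


def largest_deviation(series, trend, year):
    """Largest absolute gap between series and trend within one year."""
    quarters = range(4 * year, 4 * year + 4)
    return max(abs(series[t] - trend[t]) for t in quarters)


for year in (0, 1):
    print(year,
          round(largest_deviation(multiplicative, trend, year), 2),
          round(largest_deviation(additive, trend, year), 2))
```

With these numbers the seasonal swing of the multiplicative series grows as the trend grows, while the additive swing stays the same size in both years. That growing swing is the funnel shape described above.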

Trend example
Below are monthly data on petrol sales. As you can see from this longer time series, there is an upward, probably linear, trend.
As the data are monthly there is also a monthly (seasonal) trend but it is harder to see.

Irregular fluctuations example
Below are monthly data on chemical sales. As you can see from this longer time series, there are a number of irregular jumps with linear trends.

Season example
Below are seasonal data on ice-cream sales. As you can see from this time series, there is a definite seasonal trend to ice-cream sales.

Other examples
Employment will have:
Cycles: recessions have a cyclical nature
Irregularities: market crashes
Seasonality: more work in the summer
Trend: as more people are born more are employed

TS components
In this course we focus on retrieving the main components of a time series: Trend and Seasonality.
This can get pretty intense before you understand it, so please pay attention.
The way this works is to find the underlying trend of the time series, and then divide the time series by this trend in order to get the seasonal component.

Stationarity
In this course, time series without cycles or seasonality are called stationary.
I.e. if a time series has only trend and can be explained by a linear regression then it is stationary.
(The standard definition is stricter: a stationary series also has no trend.)

TS components
Let Y represent a time series.
$Y = T \times S \times C \times I$ is the most general form.
I is unpredictable.
C is hard to do unless we know the data cycle.
T and S can be found so we assume $Y = T \times S$.

TS how to
1. S: First we preliminarily find the trend by Smoothing the data. Think carefully about the type of moving average you might choose.
2. M: We preliminarily find the seasonal component by dividing the time series Y by the trend in the moving average: $S = Y / T$
3. S: We then have to get the true seasonal effects by estimating first seasonal averages and then seasonal indices.

TS how to
4. T: Now that we have a good estimate of seasonality, divide the data by the seasonal index to get the real trend: $T = Y / S$
5. R: Use a linear regression on the Trend to get the estimate of the trend parameters.
6. F: Multiply the Trend forecast by the seasonal estimates and then do the usual residual analyses.

Smoothing
Smoothing is the idea of getting rid of seasonal, cyclical and irregular components of the time series, thus extracting the trend.
The idea is to summarise what a time series is doing by averaging the data points over a number of time points.
Some people don’t like this and prefer autocorrelation models. We’ll see these later.

Smoothing
There are three main ways of smoothing:
1. Moving averages
2. Weighted moving averages
3. Exponential smoothing - later
We’ll do Moving Averages first.

TS example: Moving average
If we are looking at quarters we should use a 4 point centered moving average.
If we were looking at months we would use a 12 point centered moving average.
If we were looking at years we would choose a 3 or 5 point moving average.
If the moving average has an even number of points it needs to be centered.
Can also use weighted moving averages.

TS example: Moving average
The moving average is a way of smoothing out the fluctuations and seasonality.
Let’s say we want to calculate 5 and 3 point moving averages:
1. 5 point MA:
$$\bar{x}^{5}_{t} = \frac{x_{t-2} + x_{t-1} + x_t + x_{t+1} + x_{t+2}}{5}$$
2. 3 point MA:
$$\bar{x}^{3}_{t} = \frac{x_{t-1} + x_t + x_{t+1}}{3}$$
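
The two formulas above can be sketched in a few lines of Python. The function name `moving_average` and the annual values in `x` are hypothetical, chosen only to illustrate the calculation; the lecture itself does this in Excel.

```python
def moving_average(x, k):
    """k-point moving average for odd k, centered on each time point.

    Entries where the window does not fit (start and end) are None.
    """
    if k % 2 == 0:
        raise ValueError("this sketch only handles an odd number of points")
    half = k // 2
    out = [None] * len(x)
    for t in range(half, len(x) - half):
        out[t] = sum(x[t - half:t + half + 1]) / k
    return out


# Hypothetical annual values, made up for illustration.
x = [10, 12, 11, 15, 14, 18, 17]
ma3 = moving_average(x, 3)   # first value sits at the 2nd time point
ma5 = moving_average(x, 5)   # first value sits at the 3rd time point
print(ma3)
print(ma5)
```

The `None` entries at both ends show why a 3 point MA starts at point 2 and a 5 point MA starts at point 3, and why both finish early.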

TS example: Moving average
[Figure]

TS example: Moving average
The data are annual so we look at 3 and 5 pt moving averages.
These do not have to be centered.
As you can see there is an upward trend in the data (exponential or quadratic?).
The 3PtMA starts at point 2 and the 5PtMA starts at point 3, and they both finish early as they need all the points above and below for the estimates.

TS example: Moving average
[Figure]

Weighted moving average
Instead of just using simple averages we can try using weighted moving averages.
These give more weight to values close to the time we are estimating it for.
E.g. in our minks example we use a 5 point MA where the furthermost points are worth less:
$$\bar{x}^{5}_{t} = \frac{x_{t-2} + 2x_{t-1} + 4x_t + 2x_{t+1} + x_{t+2}}{10}$$
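
As a rough Python counterpart to the weighted formula above, the sketch below applies the weights 1, 2, 4, 2, 1 (which sum to 10). The function name and the values in `x` are assumptions made for illustration; they are not the minks data.

```python
def weighted_moving_average(x, weights):
    """Weighted moving average with an odd number of weights.

    The middle weight applies to x[t]; ends where the window
    does not fit are left as None.
    """
    half = len(weights) // 2
    total = sum(weights)
    out = [None] * len(x)
    for t in range(half, len(x) - half):
        window = x[t - half:t + half + 1]
        out[t] = sum(w * v for w, v in zip(weights, window)) / total
    return out


# Hypothetical annual values, made up for illustration.
x = [10, 12, 11, 15, 14, 18, 17]
wma5 = weighted_moving_average(x, [1, 2, 4, 2, 1])
print(wma5)
```

Setting all weights to 1 gives back the simple moving average, so the simple MA is a special case of the weighted one.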

TS example: Moving average
The weighted moving average formula looks like this in Excel:
[Figure]


TS example: Moving average
Consider the example of car petrol price per gallon in dollars per quarter over 4 years.
It has a downward trend.
It has a seasonal (quarterly) component.

TS example: Moving average
For quarters we need a 4 point centered moving average. If the MA is even then we also have to center it. In this example we calculate a 4 point centered moving average:
1. 4 point MA:
$$\bar{x}^{4}_{t} = \frac{x_{t-2} + x_{t-1} + x_t + x_{t+1}}{4}$$
2. Say we get $\bar{x}^{4}_{t}$ and $\bar{x}^{4}_{t+1}$; the centered 4 point MA is
$$\bar{x}^{4c}_{t} = \frac{\bar{x}^{4}_{t} + \bar{x}^{4}_{t+1}}{2}$$
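
The two-step calculation above (a 4 point MA, then centering) might look like this in Python. The function names and the quarterly values in `x` are hypothetical and only there to show the mechanics.

```python
def ma4(x):
    """4 point MA using x[t-2], x[t-1], x[t], x[t+1]; None where it does not fit."""
    out = [None] * len(x)
    for t in range(2, len(x) - 1):
        out[t] = (x[t - 2] + x[t - 1] + x[t] + x[t + 1]) / 4
    return out


def centered_ma4(x):
    """Centered 4 point MA: the average of two neighbouring 4 point MAs."""
    m = ma4(x)
    out = [None] * len(x)
    for t in range(2, len(x) - 2):
        out[t] = (m[t] + m[t + 1]) / 2
    return out


# Hypothetical quarterly values over 2 years, made up for illustration.
x = [8, 4, 6, 10, 7, 3, 5, 9]
print(ma4(x))
print(centered_ma4(x))
```

With 16 quarters this leaves the first two and the last two positions empty, so the centered MA runs from quarter 3 to quarter 14.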

TS example: Moving average
In our example we have quarters so we calculate a 4 point centered MA.
First a 4 point MA:
[Figure]

TS example: Moving average
Then we center it:
[Figure]

TS example: Moving average
As you can see, the MAs smooth the seasonal trend; it no longer goes up and down as much as the original time series.
It now looks much more linear.
This can also be shown in the plot.
The MAs only start after the first 2-3 values of the time series as you need at least that many to smooth.
Also they end early for the same reason.

TS example: Moving average
[Figure]

Seasonal components
Once we’ve gotten the hang of the trend and smoothed out the season, if there is one, we can try and isolate the seasonal element by dividing the time series by the trend.
Remember that $Y = T \times S$ if there are no cycles or irregularities, so $Y / T = S$.
What we estimate is the Ratio-to-moving-average (R2MA).
In our petrol example there was a seasonal component so let’s find the R2MA.

TS example: R2MA
$$\text{R2MA}_t = \frac{Y_t}{\text{4ptCMA}_t}$$
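
A minimal sketch of the ratio-to-moving-average step, assuming the centered moving average has already been computed and uses `None` where it is not available. Both lists below are made-up numbers, not the petrol data.

```python
def ratio_to_moving_average(y, cma):
    """R2MA_t = Y_t / CMA_t, left as None wherever the CMA is missing."""
    return [None if c is None else v / c for v, c in zip(y, cma)]


# Hypothetical quarterly series and its centered 4 point MA (made up).
y = [120, 100, 80, 100, 120, 100, 80, 100]
cma = [None, None, 100.0, 100.0, 100.0, 100.0, None, None]

r2ma = ratio_to_moving_average(y, cma)
print(r2ma)
```

A ratio above 1 means the quarter sits above the smoothed trend, and a ratio below 1 means it sits below.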

Seasonal components
We now have an idea of the seasonal trends.
However, if you think about it, we want one estimate for the seasonal component for each quarter.
Currently we have 3 for each quarter (we’re looking over 4 years).
Remember that because we have to leave out the first couple and the last couple of values to get the 4 point centered MA, we only have the R2MA from the 3rd Quarter till the 14th.
The best thing is to list them.

TS example: R2MA
[Figure]

Seasonal components
Once we’ve listed them we get average values for the seasonal component for each season.
For each season we have 3 values so we average them for each season.
If some seasons have a different number of values we take the average of them anyway.
E.g. if for Autumn we have 2 values only then we take the average of these 2 values for Autumn.
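
The averaging step can be sketched as follows, assuming the R2MA list starts at quarter 1 and has `None` where no ratio exists. The ratios are hypothetical values picked for illustration.

```python
def seasonal_averages(r2ma, period=4):
    """Average the available ratios for each season (position in the year).

    Missing ratios (None) are skipped, so seasons may have
    different numbers of values.
    """
    groups = [[] for _ in range(period)]
    for i, ratio in enumerate(r2ma):
        if ratio is not None:
            groups[i % period].append(ratio)
    return [sum(g) / len(g) if g else None for g in groups]


# Hypothetical R2MA values for 16 quarters; quarters 3 to 14 are available.
r2ma = [None, None, 0.8, 1.0,
        1.2, 1.0, 0.9, 1.1,
        1.3, 0.9, 0.7, 0.9,
        1.1, 1.1, None, None]

print(seasonal_averages(r2ma))
```

Each season here has 3 values, but the function would still work if one season had only 2, matching the Autumn example above.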

Seasonal Average
[Figure]

Seasonal components - Seasonal averages
Next we look at the sum of the seasonal averages. In this case it is 4.0013, very close to 4.
The idea of seasonality is to imagine it as going up and down around the trendline.
We want it to be on average 1 each year, so we want the sum of the four seasonal averages to be exactly 4.
Remember that we are using a multiplicative model - so 1 is similar to 0 in additive models.

Seasonal components
We have to estimate the seasonal indices.
The way we do this is by normalising the seasonal averages so they sum to 4.
E.g. for summer we do
$$\text{Summer.Index} = \frac{\text{Summer.Average} \times 4}{\text{Sum.of.Avgs}}$$
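
The normalisation can be written as a short Python sketch. The seasonal averages below are invented for illustration (they sum to 4.04, not the 4.0013 from the lecture's data), and the season labels are assumptions.

```python
def seasonal_indices(averages):
    """Rescale seasonal averages so that they sum to the number of seasons."""
    period = len(averages)
    total = sum(averages.values())
    return {season: avg * period / total for season, avg in averages.items()}


# Hypothetical seasonal averages, made up for illustration.
averages = {"Q1": 1.21, "Q2": 1.01, "Q3": 0.79, "Q4": 1.03}

indices = seasonal_indices(averages)
print(indices)
print(sum(indices.values()))
```

After rescaling, the indices keep their relative sizes but average exactly 1 over the year (up to floating point rounding).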

Seasonal Indices
[Figure]

De-seasonalising
Now we have a grip on the seasonal component of the data.
We want to get a better grip on the trend.
So we use the same trick we used before: $Y = T \times S$ so $Y / S = T$.
To get a de-seasonalised trend for the petrol data we divide Petrol by the Seasonal Index.
From the plot we can see that the deseasonalised data now just looks regularly linear.
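
A minimal sketch of the de-seasonalising division, assuming the series starts in quarter 1 and the indices are listed in quarter order. The series and indices are hypothetical numbers, not the petrol data.

```python
def deseasonalise(y, indices):
    """Divide each observation by the seasonal index of its quarter (T = Y / S)."""
    period = len(indices)
    return [value / indices[i % period] for i, value in enumerate(y)]


# Hypothetical quarterly series and seasonal indices (made up).
y = [120, 100, 80, 100, 108, 90, 72, 90]
indices = [1.2, 1.0, 0.8, 1.0]

trend_estimate = deseasonalise(y, indices)
print([round(v, 2) for v in trend_estimate])
```

In this made-up example the seasonal ups and downs disappear and only the level of each year remains.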

De-seasonalising
[Figure]

De-seasonalising
[Figure]

Trend
What we do now with the trend is to fit a linear regression to it.
You do this in the usual way and I won’t go over it.
I just keep the intercept and the coefficient of Quarter.
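
For reference, the simple linear regression of the de-seasonalised values on Quarter can be sketched with the usual least squares formulas. The function name `fit_line` and the data are assumptions for illustration; the data are constructed to lie exactly on a line.

```python
def fit_line(t, y):
    """Least squares fit of y = intercept + slope * t."""
    n = len(t)
    t_bar = sum(t) / n
    y_bar = sum(y) / n
    sxy = sum((a - t_bar) * (b - y_bar) for a, b in zip(t, y))
    sxx = sum((a - t_bar) ** 2 for a in t)
    slope = sxy / sxx
    intercept = y_bar - slope * t_bar
    return intercept, slope


# Hypothetical de-seasonalised values for quarters 1 to 8 (made up).
quarters = list(range(1, 9))
deseasonalised = [100 - 2 * q for q in quarters]

intercept, slope = fit_line(quarters, deseasonalised)
print(intercept, slope)
```

Only the intercept and the coefficient of Quarter are needed for the forecasting step.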

Trend
[Figure]

Forecasting
We have the Seasonal indices (seasonal component).
We have the trend line (linear regression).
The aim is now to forecast the next few points:
1. Use the linear regression to forecast (predict) the trend values for quarters 17-20
2. Multiply the regression predictions by the appropriate seasonal index
These are the forecasts.
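
The two forecasting steps can be sketched in Python as below. The intercept, slope and seasonal indices are hypothetical values, and the sketch assumes quarter 1 corresponds to the first index in the list.

```python
def forecast(intercept, slope, indices, quarters):
    """Trend prediction from the regression line times the matching seasonal index."""
    period = len(indices)
    result = {}
    for q in quarters:
        trend = intercept + slope * q          # step 1: regression prediction
        season = indices[(q - 1) % period]     # quarter 1 uses indices[0]
        result[q] = trend * season             # step 2: multiply by the index
    return result


# Hypothetical fitted values and seasonal indices (made up).
intercept, slope = 100.0, -2.0
indices = [1.2, 1.0, 0.8, 1.0]

forecasts = forecast(intercept, slope, indices, range(17, 21))
for q, value in forecasts.items():
    print(q, round(value, 2))
```

Quarters 17-20 are one full year beyond the 16 observed quarters, so each seasonal index is used exactly once.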

Forecast
[Figure]

Forecast
We now look at the plot of the predicted time series.
As you see, the downward trend is continuing and we see the seasonal pattern repeating itself.

Real data
We actually have the data for those next 4 quarters.
As you see, there was a crash in petrol prices in 1985 (where quarter 16 ends).
We could not have predicted that without further information.

Real data
At the end of 1985 there was a pretty big crash in the price of crude oil that lasted all the way through to 1986.
This was due to a maneuver by OPEC countries to secure their future in the market in the face of competition from other countries, e.g. the USA.
Crude oil production has gone down pretty much since then.

Main points to take away
Time series analysis violates assumption I from NICEL.
Time series have 4 components: Trend, Seasonality, Cyclicality and Irregularity.
We can use Moving averages to smooth the time series and identify a trend.
We can use SMSTRF to find moving average, seasonal index, better estimated trend by linear regression, and forecast.