# Business Statistics Project

Time series analysis, MSc Business Administration
Angelo Delle Piane, Ruggero Cardinaletti

Time series: Austria, residents

Our project is based on analyzing the time series of Austrian tourism, focusing our attention on residents' arrivals. We would like to start by plotting some graphs which will help us understand how the series behaves.

[Figure: monthly arrivals of Austrian residents, 1990M01 to 2010M09]

From the plot we can identify a seasonal pattern that recurs periodically throughout the years, and we can also perceive a slight trend that increases over time, particularly from 1995 onwards. The first issue we are facing is whether transforming the series into its logarithm would help us stabilize the time series, and therefore improve our model.
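The stabilizing effect of the logarithm can be illustrated with a small sketch. The data below are synthetic stand-ins for the Austrian series (the true values are not reproduced here); the point is only that a multiplicative seasonal swing grows with the level on the raw scale but not on the log scale:

```python
import numpy as np

# Hypothetical stand-in for the arrivals series: monthly data with an
# increasing trend and seasonal swings whose amplitude grows with the level
# (multiplicative seasonality), as the plot of the original series suggests.
rng = np.random.default_rng(0)
t = np.arange(240)                       # 20 years of monthly observations
level = 300_000 + 1_500 * t              # increasing trend
season = 1 + 0.4 * np.sin(2 * np.pi * t / 12)
arrivals = level * season * rng.lognormal(0, 0.02, size=t.size)

log_arrivals = np.log(arrivals)

# Compare the size of the seasonal swings in the first and last five years:
# on the raw scale the swings grow with the trend, on the log scale they don't.
def swing(x):
    return x.max() - x.min()

raw_ratio = swing(arrivals[-60:]) / swing(arrivals[:60])
log_ratio = swing(log_arrivals[-60:]) / swing(log_arrivals[:60])
print(raw_ratio, log_ratio)  # raw ratio well above 1; log ratio near 1
```

On the raw scale the swing of the last five years is clearly larger than that of the first five; after the log transform the two are comparable, which is exactly the stabilization the report is after.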

[Figure: the time series transformed into LOG, 1990M01 onwards]

Once we convert the time series, the amplitude of the seasonal fluctuations appears more stable, and the trend effect appears reduced. Plotting the annual growth (from 1991 onwards) gives us confirmation that a positive trend exists.

[Figure: annual growth by year, 1991 to 2010]

We don't know how to explain the drastic decrease in 1998, but we can hypothesize that the lack of growth over 2007/08, pointed out with the red line, was probably due to the financial crisis. Taking the transformed time series into account, we can plot the ACF to see how autocorrelated the series is.

[Figure: autocorrelogram of the log series, lags 0 to 24]

By looking at the ACF we can identify a clear lag-1 autocorrelation, as well as a seasonality, as long as we do not take into account the series' trend, which increases over time. This provides evidence that the time series can be forecasted effectively.

## Modelling the time series

We need to find the best model for forecasting the monthly arrivals of 2011. To do this, we start by exploiting our data set: we predict the 2010 values with different models and compare the results against the original 2010 data at hand. Once we have found the best solution, we shall use it to forecast the following year's arrivals.

The first model we decide to fit is a time series regression with a polynomial trend and deterministic seasonality, including a constant equal to 1 and a seasonal dummy variable for each month, along with t (the number of months) and its polynomial powers. To avoid estimating the model by constrained least squares, we reparametrize it in two ways: by dropping the intercept or, alternatively, by dropping one of the seasonal dummies. We use the first reparameterization to create our models.

### Regression with no intercept

The regression tells us that the fit of the model is quite good, since the adjusted R-squared is around 0.99558, very close to one, and the standard error is very small (0.02479). With the estimates we retrieve from the regression we calculate the seasonal effect and plot the graph of the seasonal pattern:
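A minimal sketch of this no-intercept regression, again on synthetic stand-in data. The toy monthly effects are chosen with an August peak and a December trough, echoing the pattern found in the report; the dummy coefficients absorb the constant, and the seasonal effect is read off as the deviations of those coefficients from their mean:

```python
import numpy as np

# Synthetic stand-in for the log series: trend + monthly effects + noise.
rng = np.random.default_rng(1)
n = 240
t = np.arange(1, n + 1)
month = (t - 1) % 12
true_effects = np.array([.00, .02, .05, .08, .10, .14, .20, .25, .12, .06, .02, -.04])
y = 5.5 + 0.0008 * t + true_effects[month] + rng.normal(0, 0.02, n)

# No-intercept reparameterization: one dummy per month absorbs the constant,
# plus a linear trend column.
X = np.zeros((n, 13))
X[np.arange(n), month] = 1.0
X[:, 12] = t

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
r2 = 1 - resid.var() / y.var()

# Seasonal effect: monthly dummy coefficients centered around their mean.
seasonal_effect = beta[:12] - beta[:12].mean()
print(round(r2, 3), seasonal_effect.round(3))
```

On this toy series the fit recovers the simulated summer peak and winter trough, and the R-squared is high for the same reason as in the report: the dummies and the trend capture almost all of the systematic variation.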

[Figure: estimated seasonal pattern (monthly effects)]

Watching this pattern we can assume that we have a high peak in August and a low in December, probably due to the harsh winter weather. Since our time series refers to the tourist arrivals of Austrian residents, we deduce that they prefer to travel around their own country in the summertime rather than during winter.

[Figure: prediction with no intercept vs. the log series]

We see that the model has some problems predicting the series, but it still behaves quite well. To improve the model, let's see what happens if we include polynomial trends.

We would like to forecast the series with other models in order to reduce these errors and achieve a better prediction.

### Quadratic regression

To be brief, we will compare the three polynomial regressions we have created by identifying their goodness of fit, analyzing the AIC and SIC values, the adjusted R-squared and the standard error.

| Model | AIC | SIC | Adjusted R-squared | Standard error |
| --- | --- | --- | --- | --- |
| Quadratic (t²) | 0.058274 | 0.023476 | 0.995558 | 0.000722 |
| Cubic (t³) | 0.065716 | 0.023380 | 0.995519 | 0.000714 |
| Fourth-degree (t⁴) | 0.061996 | 0.023243 | 0.995539 | 0.000727 |

At first sight we'd choose the model with the lowest AIC, that is, the quadratic regression model (which is also the one with the highest adjusted R-squared value). However, to be certain, we also calculate a more conservative criterion, the SIC, which suggests a different solution: in this case we should pick the fourth-degree regression. Let's plot the quadratic model to see how well it forecasts the series for the year 2010.

[Figure: prediction with t² vs. the log series, Jan 1990 to Sep 2010]

We can observe that the forecast has improved with respect to the previous model, but we can still find discrepancies, mainly during the first and the last years of the time series.
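The AIC/SIC comparison can be sketched as follows. The criteria here are the textbook n·ln(RSS/n) + penalty forms, which need not coincide numerically with the report's software output, and the data are again synthetic stand-ins:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 240
t = np.arange(1, n + 1, dtype=float)
month = np.arange(n) % 12
# Synthetic log-like series with a mildly quadratic trend and seasonality.
y = (5.5 + 8e-4 * t - 1.5e-6 * t ** 2
     + 0.1 * np.sin(2 * np.pi * np.arange(n) / 12)
     + rng.normal(0, 0.02, n))

def criteria(degree):
    """Fit seasonal dummies + polynomial trend of the given degree; return (AIC, SIC)."""
    X = np.zeros((n, 12 + degree))
    X[np.arange(n), month] = 1.0          # monthly dummies (no intercept)
    for d in range(1, degree + 1):
        X[:, 11 + d] = (t / n) ** d       # scaled powers for numerical stability
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(((y - X @ beta) ** 2).sum())
    k = X.shape[1]
    aic = n * np.log(rss / n) + 2 * k
    sic = n * np.log(rss / n) + k * np.log(n)
    return aic, sic

results = {deg: criteria(deg) for deg in (2, 3, 4)}
for deg, (aic, sic) in results.items():
    print(deg, round(aic, 1), round(sic, 1))
```

Because the SIC penalty k·ln(n) exceeds the AIC penalty 2k for n = 240, the SIC is always the more conservative of the two criteria, which is why the report treats it as the decisive one when the rankings disagree.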

## ARMA model

Before applying any ARMA or ARIMA models, we first decided to analyze the ACF and PACF and to calculate the Ljung-Box and Box-Pierce tests, to check whether the data could be assumed to be white noise.

| Statistic | Lag | Value | p-value |
| --- | --- | --- | --- |
| Box-Pierce | 6 | 542.747 | < 0.0001 |
| Ljung-Box | 6 | 553.443 | < 0.0001 |
| Box-Pierce | 12 | 1158.694 | < 0.0001 |
| Ljung-Box | 12 | 1200.109 | < 0.0001 |

Both tests agree that the data cannot be assumed to be generated by a white noise process, since both the Ljung-Box and the Box-Pierce statistics have very small p-values.

In order to obtain a more accurate model, we decided to try different ARMA models, letting the p and q parameters range from (1,1) to (3,3). We will choose the one with the lowest AIC or AICC (the AIC corrected for finite sample sizes).

| p | q | P | Q | AICC |
| --- | --- | --- | --- | --- |
| 1 | 1 | 0 | 0 | -608.676 |
| 1 | 2 | 0 | 0 | -624.290 |
| 1 | 3 | 0 | 0 | -626.899 |
| 2 | 1 | 0 | 0 | -607.943 |
| 2 | 2 | 0 | 0 | -623.749 |
| 2 | 3 | 0 | 0 | -603.448 |
| 3 | 1 | 0 | 0 | -599.412 |
| 3 | 2 | 0 | 0 | -603.462 |
| 3 | 3 | 0 | 0 | -603.156 |

From the given results, the model we choose is the ARMA(1,3), since it has the lowest AICC. Let's see how this model works with our time series and how effectively it can forecast the 2010 values.
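The two portmanteau statistics used above are simple to compute from first principles; a sketch (on synthetic series, not the Austrian data) shows why a seasonal series fails the white-noise test so decisively:

```python
import numpy as np

# Box-Pierce and Ljung-Box statistics computed by hand, as used in the report
# to check whether the series can be treated as white noise.
def acf(x, nlags):
    x = x - x.mean()
    denom = float((x * x).sum())
    return np.array([float((x[k:] * x[:-k]).sum()) / denom
                     for k in range(1, nlags + 1)])

def box_pierce(x, h):
    n = len(x)
    return n * float((acf(x, h) ** 2).sum())

def ljung_box(x, h):
    n = len(x)
    r = acf(x, h)
    k = np.arange(1, h + 1)
    return n * (n + 2) * float(((r ** 2) / (n - k)).sum())

rng = np.random.default_rng(3)
t = np.arange(240)
seasonal_series = 0.1 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.02, 240)
white_noise = rng.normal(0, 1, 240)

# A seasonal series yields a huge statistic (compare the chi-square critical
# value of about 12.6 at 6 degrees of freedom), so white noise is firmly
# rejected; genuine white noise stays near the expected value of h = 6.
print(ljung_box(seasonal_series, 6), ljung_box(white_noise, 6))
```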

[Figure: ARMA(1,3) forecast vs. the log series, with validation and 95% bounds]

It's clear that the ARMA model, which doesn't take the seasonal effect into account, tends to make a prediction that heads back to the mean (the value 5.676). However, since our time series has seasonality, we should use a model that considers it, like ARIMA.

## ARIMA model and AIRLINE

In the ARIMA models we should consider the parameters p, d and q, together with their seasonal counterparts P, D and Q. To begin, we'd like to find the best parameters in a range from (p=0, d=1, q=1, P=0, D=1, Q=1) to (p=2, d=1, q=2, P=1, D=1, Q=1):

| p | q | P | Q | AICC |
| --- | --- | --- | --- | --- |
| 0 | 1 | 0 | 1 | -1067.410 |
| 0 | 1 | 1 | 1 | -1063.537 |
| 0 | 2 | 0 | 1 | -1058.811 |
| 0 | 2 | 1 | 1 | -1057.526 |
| 1 | 1 | 0 | 1 | -1058.313 |
| 1 | 1 | 1 | 1 | -1053.208 |
| 1 | 2 | 0 | 1 | -1054.640 |
| 1 | 2 | 1 | 1 | -1062.948 |
| 2 | 1 | 0 | 1 | -1062.955 |
| 2 | 1 | 1 | 1 | -1052.804 |
| 2 | 2 | 0 | 1 | -1049.627 |
| 2 | 2 | 1 | 1 | -1058.574 |

The results tell us that the best parameters are those of the ARIMA(0,1,1)(0,1,1): both p and P equal to zero, d = D = 1 and q = Q = 1. This model is known as the AIRLINE model, and it seems to be the most precise one. Why?

Goodness of fit statistics for the ARIMA(0,1,1)(0,1,1) model:

| Statistic | Value |
| --- | --- |
| Observations | 228 |
| SSE | 0.019 |
| MAPE(Diff) | 108.112 |
| MAPE | 0.693 |
| WN Variance | 0.000 |
| WN Variance (estimate) | 0.000 |
| -2Log(Like.) | -1083.522 |
| FPE | 0.000 |
| AIC | -1077.522 |
| AICC | -1077.410 |
| SBC | -1067.408 |
| MSE (mean square error) | 0.0000814507 |

The AIRLINE model's statistics give us the lowest AIC observed and, most importantly, the lowest MSE value (retrieved by dividing the SSE by the degrees of freedom), which speaks for the goodness of fit of the model itself. In fact, comparing the prediction for the year 2010 (yellow line) with the original data for the same year (crossed green line), we realize that this is the best forecast for 2010 we have obtained so far; the graph underlines the same conclusion.

[Figure: ARIMA(0,1,1)(0,1,1) = AIRLINE forecast vs. the log series, with the 2010 original data and 95% bounds]

Nevertheless, we want to let this forecast method compete with another model: the Holt-Winters one.

## Seasonal Holt-Winters model

The Holt-Winters model can be elaborated considering either the multiplicative or the additive seasonality of a time series; in our case we have chosen the former.

[Figure: seasonal multiplicative Holt-Winters forecast vs. the log series, with 95% bounds]

At first glance the plot of this model doesn't differ much from the previous one; to be certain of the results, we must take a look at the summary statistics:

| Statistic | Value |
| --- | --- |
| DF | 212 |
| SSE | 0.085 |
| MSE | 0.000400707 |
| RMSE | 0.020 |
| MAPE | 0.254 |
| MPE | -0.014 |
| MAE | 0.019 |
| R² | 0.969 |

The R-squared, quite close to one, confirms that the results are acceptable. However, to compare the two models we have to use the same coefficient, so we must look at the MSE for the Holt-Winters model as well:

Seasonal Holt-Winters MSE 0.000400707 > AIRLINE MSE 0.0000814507

According to the MSE evaluation, the AIRLINE model was able to forecast the time series for the year 2010 (given all the previous data) in the most reliable way. This is why we chose to apply it also for forecasting what will happen in 2011.

## Forecasting the 2011 arrivals of residents with AIRLINE

The prediction for the year 2011 will be performed with the best model. Since our data end in November 2010, we will produce a 12-step-ahead forecast up to November 2011; as these are future values, we won't be able to compare the forecast with original data.

[Figure: AIRLINE forecast for the year 2011 vs. the log series, with 95% bounds]

| Month | Log(Austria) predicted |
| --- | --- |
| Dec-10 | 5.795 |
| Jan-11 | 5.886 |
| Feb-11 | 5.852 |
| Mar-11 | 5.807 |
| Apr-11 | 5.819 |
| May-11 | 5.790 |
| Jun-11 | 5.928 |
| Jul-11 | 5.984 |
| Aug-11 | 5.901 |
| Sep-11 | 5.925 |
| Oct-11 | 5.883 |
| Nov-11 | 5.806 |

Highlighted in green is the data set produced by the AIRLINE simulation, which reflects both the seasonal pattern and the increasing trend observed throughout the whole time series.

## Conclusion

After analyzing a series of different models, we have come to the conclusion that the best model for our time series is the ARIMA(0,1,1)(0,1,1), known as the AIRLINE model, because its summary statistics present a lower mean square error, together with a lower AIC value, compared with the other models. Furthermore, it produced the most precise graph at eyesight. In fact:

- The polynomial regression model elaborates the forecast in a good way, but has some flaws, especially in correcting the model step by step and adapting it to changes.
- The ARMA combinations take into account neither the seasonal pattern nor the trend of the time series, since the model itself tends to get back to the mean in the forecast.
- The seasonal Holt-Winters model produces a more accurate forecast than the previous ones, yet it is overtaken by the AIRLINE model, even though, in our opinion, by observing the statistical data we could have obtained a fairly good prediction with this model too.
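As a closing remark, the 2011 predictions in the table above are on the log scale. Judging from the plots (log values of about 5.3 to 6.1 against arrivals of roughly 200,000 to 1,200,000), the logarithms appear to be base 10 (an inference from the axis scales, not something stated explicitly in the report), so reading the forecasts as arrivals would look like:

```python
import numpy as np

# Predicted log values from the table, Dec 2010 through Nov 2011.
log_pred = np.array([5.795, 5.886, 5.852, 5.807, 5.819, 5.790,
                     5.928, 5.984, 5.901, 5.925, 5.883, 5.806])
# Back-transform assuming base-10 logs (an assumption; with natural logs
# the back-transform would be np.exp instead).
arrivals_pred = 10 ** log_pred
print(arrivals_pred.round(0))  # summer peak, consistent with the seasonal pattern
```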
