
Time Series Project

1.

The series was imported in two ways. Each row has an associated date, which is not an ordinary column but the time index for the value; as an index, it gives a single wine-sales value per point in time. For df_1 we passed the arguments parse_dates=True and squeeze=True, indicating that the first column holds dates to be parsed and that the single remaining column should be returned as a Series. df_2 was loaded without any arguments, so we added the timestamps to the series manually and dropped the YearMonth column from the data set. This approach is used when the dates are not parsed at upload time.
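A minimal sketch of the two import routes in pandas (assuming the file is a CSV such as Rose.csv with a YearMonth column and a January 1980 start; the file name and dates are placeholders, the report's actual code is in the screenshots):

import pandas as pd

# Way 1: index_col=0 with parse_dates=True parses the first column into a
# DatetimeIndex; squeezing returns the single remaining column as a Series.
# (pandas 2.0 removed read_csv's squeeze argument, so we squeeze afterwards.)
df_1 = pd.read_csv('Rose.csv', parse_dates=True, index_col=0).squeeze('columns')

# Way 2: load as-is, build the monthly timestamp index by hand and drop
# the original YearMonth column.
df_2 = pd.read_csv('Rose.csv')
df_2.index = pd.date_range(start='1980-01-01', periods=len(df_2), freq='MS')
df_2 = df_2.drop(columns=['YearMonth'])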

Plotting the time series - Sparkling wine sales are growing, while Rose wine sales show a downward trend. Some element of seasonality is also visible in the graph; a detailed view of trend and seasonality is given when the series are decomposed.
2.

The Sparkling time series has no missing values; the Rose series has two. I used the interpolate function with the linear method and forward direction to impute the missing values.
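A one-line sketch of the imputation, assuming the Rose series sits in a data frame named df_rose (a placeholder name):

# linear interpolation, filling in the forward direction
df_rose['Rose'] = df_rose['Rose'].interpolate(method='linear', limit_direction='forward')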

The data appears skewed for both time series. The min and max are two extremes, hence the high standard deviation, and the mean sits well away from the 50th percentile. Each data set has 187 records in total.
The yearly box plots above reflect year-on-year sales.
For Sparkling, sales vary from year to year, with some consistency in 1985 and 1986. The lowest sales occurred in 1982 and the highest in 1994. The 1995 data covers only 7 months, so no call can be made on that year's performance. Sales started with a dip in 1980 and began trending upward from 1983; the within-year variation also grows from 1983 to 1987, with the highest variation visible in 1994. The yearly sales are skewed in every year except 1981. There are outliers in the yearly sales data, but for time-series purposes we can ignore them.
For Rose, sales trend downward, with the highest sales in 1981 and the lowest in 1994. A steep decline is observed from 1990 to 1994; although the 1995 data runs only through July, sales appear to have picked up in that year. Monthly sales variation looks highest in 1981 and lowest in 1994.
Looking at the monthly box plots, seasonality is visible in both data sets, more strongly for Sparkling wine. Sales pick up in Q4 for both wines, with the larger increase for Sparkling. Sparkling sales are comparatively low in Q1 and Q2 and start trending upward from Q3, whereas monthly Rose sales pick up from January, follow a fairly flat path until Q3, and rise again from Q4. Monthly sales for both wines are skewed, with few exceptions.
The plots above clearly show December as the highest-selling month, followed by November, for Sparkling; December is also the highest-selling month for Rose. Other months show mixed trends, with July and August posting high sales in a few years. Both graphs hint at seasonality.
Resampling the data to an annual plot shows an overall year-on-year downward trend in annual sales for Rose. Annual Sparkling sales dipped initially, picked up from 1982 to a steady peak through 1988, then dipped again. The steep decline after 1994 is an artifact of the 1995 sales data being available only through July.
We also resampled the records to the mean of the yearly observations, which tells a very similar story to the annualized sums for both wines. Average Sparkling sales are much higher than Rose.
Another view was created at quarterly granularity for both data sets: quarterly Sparkling sales show an increasing trend, while Rose trends downward. There is a hint of seasonality in both time series.
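A sketch of these resampled views, assuming the monthly data is in df_sparkling (a placeholder name; the same applies to Rose):

import matplotlib.pyplot as plt

# annual sum, annual mean and quarterly sum of the monthly series
yearly_sum = df_sparkling.resample('A').sum()
yearly_mean = df_sparkling.resample('A').mean()
quarterly_sum = df_sparkling.resample('Q').sum()

yearly_sum.plot(title='Annual Sparkling sales (sum)')
plt.show()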
The images above show the decomposition of both time series. I decomposed both additively and multiplicatively and inspected the residuals to determine which form fits; on that basis the series are clearly multiplicative and carry a seasonal component. Sparkling shows a clear increasing trend, while Rose shows a declining one, and the plots make the instability of the sales, along with the obvious seasonality, apparent. The seasonal variation is higher for Sparkling sales, while the residual variation is higher for Rose; this can be seen in the residual plots as well.
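A sketch of the comparison, assuming the series is in df_sparkling (a placeholder name); the decomposition whose residuals look most like noise is the better description:

from statsmodels.tsa.seasonal import seasonal_decompose

# decompose both ways and inspect the residual panels
decomp_add = seasonal_decompose(df_sparkling['Sparkling'], model='additive')
decomp_mul = seasonal_decompose(df_sparkling['Sparkling'], model='multiplicative')
decomp_add.plot()
decomp_mul.plot()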

3.
The data frame has been split into train and test. Per the question, the test data should start at 1991, so a 71% split was used for train and the rest for test. The head and tail of the train and test sets confirm the test set beginning in January 1991 and the train set ending in December 1990. The train set has 132 records and the test set 55, for both data frames, Sparkling and Rose.
The graph above is a visual representation of the train and test sets: orange reflects the train set and blue the test set.
Train set – 01/1980 to 12/1990. Test set – 01/1991 to 07/1995.
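A sketch of the date-based split (df_sparkling is a placeholder name; the same split applies to Rose):

# everything up to December 1990 is train, January 1991 onwards is test
train = df_sparkling.loc[:'1990']   # 132 rows: 01/1980 to 12/1990
test = df_sparkling.loc['1991':]    # 55 rows: 01/1991 to 07/1995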

4.

Linear Regression
Several models were built on both data sets, starting with Linear Regression. The images above show the code and the predictions on the test data against the actuals for Sparkling and Rose.
For Sparkling an upward prediction is observed on the test set, while a downward trend is observed for Rose. The red line is the regression fit on train and the green the fit on test.
The RMSE for the Sparkling test set is 1389.35 and the MAPE is 40.05.
The RMSE for the Rose test set is 15.26 and the MAPE is 22.82.
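A sketch of regression on a numeric time index, which is the usual way to fit a trend line for forecasting (train/test as in the split above; mean_absolute_percentage_error needs scikit-learn 0.24 or newer):

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_percentage_error

# regress sales on a running time index: 1..132 for train, 133..187 for test
train_time = np.arange(1, len(train) + 1).reshape(-1, 1)
test_time = np.arange(len(train) + 1, len(train) + len(test) + 1).reshape(-1, 1)

lr = LinearRegression().fit(train_time, train['Sparkling'])
pred = lr.predict(test_time)

rmse = np.sqrt(mean_squared_error(test['Sparkling'], pred))
mape = mean_absolute_percentage_error(test['Sparkling'], pred) * 100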
Naïve

In the Naïve model, the forecast for every horizon is simply the last observed value. Looking at the RMSE for the Naïve model, it is clearly not suitable for either data set. On the test set, Sparkling RMSE is 3864.27 with MAPE 152.87; Rose RMSE is 79.71 with MAPE 145.10.
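A sketch of the naïve forecast under the same setup:

import pandas as pd

# every horizon gets the last observed training value
naive_pred = pd.Series(train['Sparkling'].iloc[-1], index=test.index)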

Simple Average
The simple average model forecasts the expected value as the average of all previous observations: take the mean of all previously known values and use it as the next value.
For the Sparkling test set, RMSE is 1275.08 and MAPE 38.90, so far the best among the three models run on that data set. For the Rose test set, RMSE is 53.46 and MAPE 96.93, which is poor against the linear regression model used so far.
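A sketch of the simple average forecast:

import pandas as pd

# every horizon gets the mean of all training observations
avg_pred = pd.Series(train['Sparkling'].mean(), index=test.index)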

Moving Average
A moving average (rolling average or running average) is a calculation to
analyze data points by creating a series of averages of different subsets of
the full data set. Given a series of numbers and a fixed subset size, the
first element of the moving average is obtained by taking the average of
the initial fixed subset of the number series. Then the subset is modified
by "shifting forward"; that is, excluding the first number of the series and
including the next value in the subset.
In our case, we took 2-, 4-, 6- and 9-point trailing averages on both data sets. Looking at the plot of all four trailing averages, all of them predict below the actual train and test data; the 9-point trailing average predicts the lowest of all, and the 2-point trailing moving average is the closest to the actuals.
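A sketch of the trailing averages, assuming the full monthly series is in df_sparkling (a placeholder name):

# trailing (right-aligned) moving averages of 2, 4, 6 and 9 points
for k in (2, 4, 6, 9):
    df_sparkling[f'ma_{k}'] = df_sparkling['Sparkling'].rolling(window=k).mean()

# each column is then compared against the test window to compute RMSE/MAPE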
This is also evident from the RMSE scores for each moving average.
Sparkling
2-point MA – RMSE 831.40, MAPE 19.70
4-point MA – RMSE 1156.59, MAPE 35.96
6-point MA – RMSE 1283.92, MAPE 43.86
9-point MA – RMSE 1346.27, MAPE 46.86
Rose
2-point MA – RMSE 11.52, MAPE 13.54
4-point MA – RMSE 14.45, MAPE 19.49
6-point MA – RMSE 14.56, MAPE 20.82
9-point MA – RMSE 14.72, MAPE 21.01

So far, among all models run, the 2-point moving average has been the best for both the Sparkling and Rose data sets.

Simple Exponential Smoothing


Although the simple exponential smoothing model is meant for data with no clear trend or seasonality, we applied it to both data sets to check its performance.
We used a smoothing level (alpha) of 0.098, derived from the auto fit. The model did not perform well in comparison to the other ones.
Sparkling RMSE – 1275.08 and MAPE – 38.90
Rose RMSE – 36.79 and MAPE – 63.28
We also performed a grid search for simple exponential smoothing and forecast using alpha values of 0.3 and 0.4:
Alpha – 0.3, Sparkling RMSE – 1935.50 and MAPE – 75.66
Alpha – 0.3, Rose RMSE – 47.50 and MAPE – 83.71
Alpha – 0.4, Sparkling RMSE – 2311.91 and MAPE – 91.55
Alpha – 0.4, Rose RMSE – 53.76 and MAPE – 95.50
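A sketch of both fits with statsmodels (train as above):

from statsmodels.tsa.holtwinters import SimpleExpSmoothing

# auto-optimized smoothing level
ses_auto = SimpleExpSmoothing(train['Sparkling']).fit(optimized=True)
pred_auto = ses_auto.forecast(len(test))

# fixed alpha from the grid search, e.g. 0.3
ses_03 = SimpleExpSmoothing(train['Sparkling']).fit(smoothing_level=0.3, optimized=False)
pred_03 = ses_03.forecast(len(test))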

Double Exponential
Double exponential smoothing is used for data sets with level and trend but no seasonality. We began with a grid search and found alpha = 0.3 and beta = 0.3 gave the lowest RMSE and MAPE on the train and test sets for Sparkling and Rose.
Sparkling RMSE – 18259 and MAPE – 675.28
Rose RMSE – 265.56 and MAPE – 442.50
Double exponential smoothing has been the worst-performing model on both data sets so far.
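A sketch of the double exponential (Holt) fit; smoothing_trend is the beta parameter in recent statsmodels (older releases call it smoothing_slope):

from statsmodels.tsa.holtwinters import Holt

# level + trend, no seasonality, with the grid-search alpha and beta
des = Holt(train['Sparkling']).fit(smoothing_level=0.3, smoothing_trend=0.3,
                                   optimized=False)
des_pred = des.forecast(len(test))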
Triple Exponential
We started the triple exponential smoothing model with auto parameters, followed by a grid search for the alpha, beta and gamma values with the least RMSE and MAPE.
Sparkling – The auto fit gave Alpha = 0.154, Beta = 0.371, Gamma = 7.413, while the grid search gave Alpha = 0.3, Beta = 0.3, Gamma = 0.3. The test RMSE for the auto-parameter model was 17.36 with MAPE 28.8, whereas the grid-search model gave RMSE 462.28 and MAPE 499.52. The auto-parameter triple exponential model gave the least RMSE for the Sparkling data.
Rose – The auto fit gave Alpha = 0.106, Beta = 0.0, Gamma = 0.048, while the grid search gave Alpha = 0.1, Beta = 0.2, Gamma = 0.2. The test RMSE for the auto-parameter model was 17.36 with MAPE 28.8, whereas the grid-search model gave RMSE 462.28 and MAPE 499.52. The auto-parameter triple exponential model gave the highest RMSE for the Rose data.
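A sketch of the triple exponential fits, assuming an additive trend and multiplicative 12-month seasonality as used later for the final Sparkling model:

from statsmodels.tsa.holtwinters import ExponentialSmoothing

# auto fit: statsmodels optimizes alpha, beta and gamma
tes_auto = ExponentialSmoothing(train['Sparkling'], trend='add',
                                seasonal='mul', seasonal_periods=12).fit()

# grid-search style fit with fixed parameters
tes_fixed = ExponentialSmoothing(train['Sparkling'], trend='add',
                                 seasonal='mul', seasonal_periods=12).fit(
    smoothing_level=0.3, smoothing_trend=0.3, smoothing_seasonal=0.3,
    optimized=False)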

5.

In the code above we performed a stationarity test on the data frames, running the Augmented Dickey-Fuller (ADF) test on both data sets. The null hypothesis of the ADF test is that the series is non-stationary.
Alpha – 0.05
Sparkling – We fail to reject the null hypothesis since the p-value is greater than 0.05, so the data must be made stationary: its properties depend on the time at which the series is observed, which hints at trend and/or seasonality in the data set. We took a difference of 1 between consecutive observations to stationarize the data, after which the p-value came out below 0.05.
Rose – Likewise, we fail to reject the null hypothesis since the p-value is greater than 0.05, so this series is also non-stationary, hinting at trend and/or seasonality. Taking a difference of 1 between consecutive observations brought the p-value below 0.05.
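A sketch of the test on the level and on the first difference (df_sparkling is a placeholder name):

from statsmodels.tsa.stattools import adfuller

# p-value above 0.05: fail to reject the unit-root null, series non-stationary
print('p-value (level):', adfuller(df_sparkling['Sparkling'].dropna())[1])

# after one difference the p-value should fall below 0.05
print('p-value (diff 1):', adfuller(df_sparkling['Sparkling'].diff().dropna())[1])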

6
The screenshot above is for the automated ARIMA model. An ARIMA model is characterized by 3 terms: p, d, q, where p is the order of the AR term, q is the order of the MA term, and d is the number of differences required to make the time series stationary. A non-seasonal time series that exhibits patterns and is not random white noise can be modeled with ARIMA models. We used the Akaike Information Criterion (AIC) as our measure of model performance. The model summary reveals a lot of information; the table in the middle is the coefficients table, where the values under 'coef' are the weights of the respective terms.

Sparkling – We started with an iteration over p, d and q to find suitable parameters, fitting the ARIMA model with each order as the param and recording the AIC for each combination. After sorting by lowest AIC, we arrived at p = 2, d = 1, q = 2, with the lowest AIC recorded at 2210.618. The same order was fitted to check the final outcome of the model on the train and test sets. The summary shows ar.L1.D.Sparkling, ar.L2.D.Sparkling, ma.L1.D.Sparkling and ma.L2.D.Sparkling; the p-values of the AR1, AR2, MA1 and MA2 terms are 0, less than 0.05, which makes the coefficients highly significant. The AIC for this model is 2210.61. The same parameters were then used to forecast on the test set; the test RMSE is 1374.48.
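A sketch of the AIC iteration with the current statsmodels ARIMA API (the report's coefficient labels such as ar.L1.D.Sparkling come from the older arima_model interface, but the logic is the same):

import itertools
from statsmodels.tsa.arima.model import ARIMA

# try every (p, d, q) in a small grid and keep the order with the lowest AIC
results = []
for p, d, q in itertools.product(range(3), range(2), range(3)):
    try:
        fit = ARIMA(train['Sparkling'], order=(p, d, q)).fit()
        results.append(((p, d, q), fit.aic))
    except Exception:
        continue

best_order, best_aic = min(results, key=lambda item: item[1])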

Rose – We started with the same iteration over p, d and q, fitting the ARIMA model with each order as the param and recording the AIC for each combination. After sorting by lowest AIC, we arrived at p = 0, d = 1, q = 2, with the lowest AIC recorded at 1276.85. The same order was fitted to check the final outcome of the model on the train and test sets. The summary shows ma.L1.D.Rose and ma.L2.D.Rose; the p-value of the MA1 term is 0 and of the MA2 term 0.013, both less than 0.05, which makes the coefficients highly significant. The AIC for this model is 1276.83. The same parameters were then used to forecast on the test set; the test RMSE is 15.61.
The screenshots above are for the auto SARIMA models. The big difference between an ARIMA model and a SARIMA model is the addition of seasonal components to the model. The purpose of an ARIMA model is to make the time series you are working with behave like a stationary series; this matters because a non-stationary series can give biased estimates of the coefficients.
There is no difference with a SARIMA model: we are still trying to get the series to behave in a stationary way so that the model is estimated correctly. Seasonality comes in two basic varieties, multiplicative and additive; by default, statsmodels works with a multiplicative seasonal component, and for our model it really won't matter.
A SARIMA model has 7 parameters. The first 3 are the same as in an ARIMA model; the last 4 define the seasonal process: the seasonal autoregressive component (P), the seasonal difference (D), the seasonal moving average component (Q), and the length of the season (m).
Sparkling – We started with an iteration over the order and seasonal order, applying params for both to find the combination with the lowest AIC. After sorting by lowest AIC, we arrived at p = 1, d = 1, q = 2 with a seasonal order of (1, 0, 2, 12); the lowest AIC recorded was 1555.61. The same order was fitted to check the final outcome of the model on the train and test sets. The summary shows ar.L1, ar.S.L12, ma.L1, ma.L2, ma.S.L12 and ma.S.L24. The p-values of the AR lag-1, seasonal AR lag-12, MA lag-2 and seasonal MA lag-12 terms are 0 or close to 0, less than 0.05, which makes those coefficients highly significant. The MA lag-1 and seasonal MA lag-24 terms are not significant since their p-values are above 0.05. The AIC for this model is 1555.58. The same parameters were then used to forecast on the test set; the test RMSE is 528.59.
Rose – We started with the same iteration over the order and seasonal order, applying params for both to find the combination with the lowest AIC. After sorting by lowest AIC, we arrived at p = 0, d = 1, q = 2 with a seasonal order of (2, 0, 2, 12); the lowest AIC recorded was 887.93. The same order was fitted to check the final outcome of the model on the train and test sets. The summary shows ar.S.L12, ar.S.L24, ma.L1, ma.L2, ma.S.L12 and ma.S.L24. The p-values of the seasonal AR lag-12 and lag-24 terms are 0, less than 0.05, which makes those coefficients highly significant. The MA lag-1 and lag-2 terms and the seasonal MA lag-12 and lag-24 terms are not significant since their p-values are above 0.05. The AIC for this model is 887.93. The same parameters were then used to forecast on the test set; the test RMSE is 26.92.
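A sketch of the seasonal iteration with SARIMAX, on the same pattern as the ARIMA grid above:

import itertools
import statsmodels.api as sm

# grid over (p, d, q) and 12-month seasonal (P, D, Q), keeping the lowest AIC
best = (None, None, float('inf'))
for order in itertools.product(range(2), range(2), range(3)):
    for P, D, Q in itertools.product(range(3), range(2), range(3)):
        try:
            fit = sm.tsa.statespace.SARIMAX(train['Sparkling'], order=order,
                                            seasonal_order=(P, D, Q, 12),
                                            enforce_stationarity=False,
                                            enforce_invertibility=False).fit(disp=False)
            if fit.aic < best[2]:
                best = (order, (P, D, Q, 12), fit.aic)
        except Exception:
            continue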
7

Sparkling – Looking at the plots, you see a significant lag at 12 in the ACF and geometric decay at each multiple of lag 12 (24, 36, 48) in the PACF. Right away you know this is the seasonal component of the ARIMA (because of the 12-lag intervals).
The PACF plot shows lag 1 just within the confidence interval, touching the edge of the CI, with further lags inside the CI, while the ACF shows lags 1 and 2 above the CI before dropping over the next few lags. p should be taken as 1 and q as 2.
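A sketch of the plots, drawn on the once-differenced (stationary) training series, which is an assumption about how the report's plots were produced:

from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# significant spikes at lags 12, 24, 36... point to the seasonal component
plot_acf(train['Sparkling'].diff().dropna(), lags=48)
plot_pacf(train['Sparkling'].diff().dropna(), lags=48)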
I believe grid search is a good option here; p = 1 and q = 2 were specified by the grid search. While a model was also created with the AR and MA orders read off above, I tweaked it a little to p = 0 and q = 2.
Rose – In both the ACF and PACF, lags 1 and 2 are significantly above the CI; further lags sit within the CI for the ACF and slightly above it for the PACF. In this case p = 0 and q = 2. The ACF does show significant spikes at lag 12, which hints at seasonality. As mentioned for the Sparkling data set, grid search is the best option here; the search specified p = 0 and q = 2, and since the model had already been run with those parameters, I tweaked p to 2 to see the performance.
Sparkling – We ran models through both grid search and manual interpretation; the results are above. As mentioned, grid search did the better job of determining the AR and MA orders, and that is evident in the model outcomes as well: ARIMA(2,1,2) gave a lower RMSE than the manual ARIMA(0,1,2). In the case of SARIMA, there is very little difference between the two. Here I would go with (0,1,2)(2,0,2,12) (selected between ARIMA and SARIMA only; in totality, the Triple Exponential model has the best RMSE outcome).
Rose – Comparing the grid-search outcome with manual tuning, the manually tuned ARIMA model gave slightly better results, while the grid-search parameters gave better results for SARIMA. Looking at the RMSE, I would select the manually tuned ARIMA(2,1,2) (selected between ARIMA and SARIMA only; in totality, the 2-point moving average has the best RMSE outcome for Rose).
8

Root mean square error (RMSE) is essentially the standard deviation of the residuals (predicted vs. actual). It helps measure the performance of a model by showing how close the predictions are to the actuals.
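A small helper for reference:

import numpy as np

def rmse(actual, predicted):
    # root mean square error: the standard deviation of the residuals
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return np.sqrt(np.mean((predicted - actual) ** 2))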
The image above compares the performance of all the models. For Sparkling, the lowest RMSE is achieved by the Triple Exponential Smoothing model; for Rose, it is the 2-point moving average model that should be used to forecast future sales.

9
Final Model – Sparkling

For the final Triple Exponential model, I made three different models to check which stands out for this data set.
Model 1
The first model was created using the auto fit to define alpha, beta and gamma, with an additive trend, multiplicative seasonality and monthly frequency. The parameters are alpha – 0.665, beta – 0.665 and gamma – 0.256. Prediction on the data set is done using the fitted values generated by the auto fit.
To forecast 12 months into the future, the auto fit is used and plotted; the prediction looks close to the original data set. The RMSE for this model stands at 347.12 and the MAPE at 10.27.
I had to define the confidence interval myself while showing it on the plot, coding the upper and lower limits. The grayed area represents the confidence interval.
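A sketch of the hand-coded band, assuming the fitted model object is named final_tes and the full series is in df_sparkling (placeholder names); the width comes from the in-sample residual spread, not from a statsmodels-provided interval:

import matplotlib.pyplot as plt

# approximate 95% band: forecast +/- 1.96 residual standard deviations
resid_std = (df_sparkling['Sparkling'] - final_tes.fittedvalues).std()
forecast = final_tes.forecast(12)
upper = forecast + 1.96 * resid_std
lower = forecast - 1.96 * resid_std

plt.plot(df_sparkling['Sparkling'], label='actual')
plt.plot(forecast, label='12-month forecast')
plt.fill_between(forecast.index, lower, upper, color='gray', alpha=0.3)
plt.legend()
plt.show()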
Model 2

In the second iteration of the model creation for Triple Exponential Smoothing (TES), we used grid search to find the most optimized alpha, beta and gamma values. Based on RMSE and MAPE, we selected alpha – 0.1, beta – 0.1 and gamma – 0.3.
We used the forecast function to forecast 12 months into the future with the selected parameters. Plot one shows the forecast on the entire data set versus the actuals, plot two shows the 12-month future forecast (red line), and plot three depicts the confidence interval for the future forecast.
Model 3
In the third iteration of the final model, I used the same alpha, beta and gamma obtained in the train/test iteration: alpha = 0.154431, beta = 6.1089e-28 (effectively zero) and gamma = 0.371166. The RMSE for this model turned out to be the same as the grid-search model. Of the three TES models, the one with auto parameters gave the lowest RMSE and MAPE.
Final Model – Rose
The 2-point moving average is applied to the complete data set instead of the train/test split. A quick check of the head and tail confirms the 2-point moving average.

The plot above shows the prediction against the actual data: the blue line represents the dataframe and the orange the 2-point moving average. The average sits slightly below the actual peaks in the dataframe, which is expected.

The RMSE of the 2-point moving average is 11.52 and the MAPE 13.54, the same as on the test set.

To predict 12 months ahead using the 2-point moving average, we created a loop with the following approach (a code sketch follows the list):
a) Take the average of the last two values in the data set, e.g. the average of rows 186 and 187.
b) Append that average as a new input in the subsequent row, so the average of rows 186 and 187 is appended as row 188.
c) Repeat this iteration 12 times.
d) Populate the dates using the range 1995-08-31 to 1996-07-31.
e) Put the new list into a new dataframe.
The result is what you see in the plot above: the orange line is the forecast for the next 12 months based on the 2-point moving average.
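A sketch of the loop, assuming the full Rose series is in df_rose (a placeholder name):

import pandas as pd

# roll the 2-point moving average forward 12 steps, feeding each
# forecast back in as the newest observation
history = list(df_rose['Rose'].values)          # 187 monthly observations
forecasts = []
for _ in range(12):
    nxt = (history[-1] + history[-2]) / 2       # average of the last two values
    forecasts.append(nxt)
    history.append(nxt)

future_index = pd.date_range('1995-08-31', periods=12, freq='M')
forecast_df = pd.DataFrame({'Rose': forecasts}, index=future_index)

Note that because each new value is the average of the two before it, the recursion settles toward a constant after a few steps; this is why only the first couple of forecasts carry real information, as discussed in the comments in section 10.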

The plot above shows the forecast for the next 12 months, from 1995-08-31 to 1996-07-31, along with a 95% confidence interval. We manually created the upper limit as 2.5% above the forecast and the lower limit as 2.5% below it. The plot reflects the forecast variance with the 95% confidence interval.

10
Comments - Sparkling
In the first iteration we created 12 models to see which had the lowest RMSE and MAPE, and in the second iteration we created another 4 models through the ARIMA and SARIMA approach. Looking at the performance, it is clear that the integrated autoregressive moving-average approach was not able to forecast with the lowest RMSE; between ARIMA and SARIMA, SARIMA performed better. Out of all the models, the Triple Exponential model gave the lowest RMSE and stands out as one of the best, which draws our attention to the seasonal aspect of Sparkling wine sales; this can be seen in the ACF and PACF interpretation as well. We cannot ignore that sales go up toward the end of the year, and that has to be accounted for in our forecasting. While we have forecast using the TES model, the error is still on the higher side, which could be due to the variation in the data and the seasonal components. While we considered the whole period from 1980 to 1995, the 1995 data consists of only 7 months. In another iteration we could omit data from the beginning and/or the end to see whether the error can be reduced for better forecasting.

Comments - Rose
As with the 12 iterations for Sparkling, we created similar models for the Rose sales data set. Although there is a hint of a seasonal aspect, it is not as strong as in the Sparkling wine sales, and the variation in the Rose data set is high. Sales go up in Q4 of each year, which does hint at seasonality in the data. The RMSEs are on the lower side for all the models except the Triple Exponential model; the moving average performed the best of all. Even though I used the 2-point moving average to forecast 12 months ahead, my recommendation is to consider only 2 months into the future rather than 12. This is because the 2-point moving average considers the last two values, so only two months can be forecast from actual values.
