Time Series Forecasting Umesh K Hasija 1

Final Project Report

Project - Time Series Forecasting – Wine Sales Analysis

Umesh K Hasija

Table of Contents

Table of Figures
1. Executive Summary
2. Introduction
3. Data Details
4. Read the data as an appropriate Time Series data and plot the data
4.1 Reading the Data
4.2 Plotting the Data
5. Perform appropriate Exploratory Data Analysis to understand the data and also perform decomposition
5.1 EDA
Null Value Check
Duplicate Value Check
Data Description
Yearly Box Plots
Monthly Box Plots
Monthly Sales Across Years
Yearly Sum of Observations
5.2 Decomposition
6. Split the data into training and test. The test data should start in 1991.
7. Build various exponential smoothing models on the training data and evaluate the model using RMSE on the test data. Other models such as Regression, Naïve forecast models and simple average models should also be built on the training data and check the performance on the test data using RMSE
7.1 Linear Regression
7.2 Naïve Model
7.3 Simple Average Model
7.4 Moving Average Model
Moving Average Model on the Rose Wine dataset
Moving Average Model on the Sparkling Wine dataset
7.5 Simple Exponential Smoothing (SES)
7.6 Double Exponential Smoothing (DES)
7.7 Triple Exponential Smoothing (TES)
7.8 Summary of all Models
8. Check for the stationarity of the data on which the model is being built on using appropriate statistical tests and also mention the hypothesis for the statistical test. If the data is found to be non-stationary, take appropriate steps to make it stationary. Check the new data for stationarity and comment. Note: Stationarity should be checked at alpha = 0.05
9. Build an automated version of the ARIMA/SARIMA model in which the parameters are selected using the lowest Akaike Information Criteria (AIC) on the training data and evaluate this model on the test data using RMSE.
9.1 ARIMA Model
9.2 SARIMA Model
10. Manual ARIMA and SARIMA model
11. Model Comparison
12. Final Recommendations

Table of Figures

Figure 1: Reading Wine Datasets
Figure 2: Rose Wine Time Series Plot
Figure 3: Sparkling Wine Time Series Plot
Figure 4: Null Value Check
Figure 5: Null Value Check
Figure 6: Wine Sales Time Series Data Description
Figure 7: Yearly Box Plots
Figure 8: Monthly Box Plots
Figure 9: Monthly Sales Across Years - Rose Wine
Figure 10: Monthly Sales Across Years - Sparkling Wine
Figure 11: Sum of Yearly Observations for Rose and Sparkling Wine
Figure 12: Additive Decomposition - Rose and Sparkling Wine
Figure 13: Multiplicative Decomposition - Rose and Sparkling Wine
Figure 14: Training and Test Datasets for Rose and Sparkling Wine Time Series
Figure 15: Plots for Training and Test data frames
Figure 16: Training and Test data for Linear Regression
Figure 17: Linear Regression Outcome on the Rose and Sparkling Wine Time Series respectively
Figure 18: Performance of the Linear Regression Model
Figure 19: Training and Test data for Naive Model
Figure 20: Naive Model Outcome on the Rose and Sparkling Wine Time Series respectively
Figure 21: Performance of the two Models
Figure 22: Training and Test data for Simple Average Model
Figure 23: Simple Average Model Outcome on the Rose and Sparkling Wine Time Series respectively
Figure 24: Moving Average Model Data for Rose and Sparkling wine respectively
Figure 25: Moving Average Model Outcome on the Rose wine Time Series
Figure 26: Moving Average Model Outcome on the Sparkling Wine Time Series
Figure 30: Summarized Performance of the Models
Figure 31: SES Parameters for the Rose and Sparkling wine datasets respectively
Figure 32: SES Train and Test data for Rose and Sparkling wine respectively
Figure 33: Simple Exponential Smoothing Outcome on the Rose wine Time Series
Figure 34: Simple Exponential Smoothing Outcome on the Sparkling Wine Time Series
Figure 35: Summarized Performance of the Models
Figure 36: Alpha Beta values for Rose and Sparkling wine respectively
Figure 37: Double Exponential Smoothing Outcome on the Rose wine Time Series
Figure 38: Double Exponential Smoothing Outcome on the Sparkling Wine Time Series
Figure 39: Summarized Performance of the Models
Figure 40: TES Parameters for the Rose and Sparkling wine datasets respectively
Figure 41: TES Model Train and Test data for Rose and Sparkling wine respectively
Figure 42: Triple Exponential Smoothing Outcome on the Rose wine Time Series
Figure 43: Triple Exponential Smoothing Outcome on the Sparkling Wine Time Series
Figure 44: Summarized Performance of the Models
Figure 45: Sorted Model Performance Summary for Rose Wine Time Series
Figure 46: Sorted Model Performance Summary for Sparkling Wine Time Series
Figure 47: Stationarity - Rose
Figure 48: Stationarity Check - Sparkling
Figure 49: Running Automated ARIMA Model on Rose Wine Dataset
Figure 50: Results of Automated ARIMA Model on Rose Wine Dataset
Figure 51: Running Automated ARIMA Model on Sparkling Wine Dataset
Figure 52: Results of Automated ARIMA Model on Sparkling Wine Dataset
Figure 53: SARIMA Model on Rose wine data
Figure 54: SARIMA Model on Sparkling Data

Table of Tables

Table 1: Dataset Sample

1. Executive Summary

For this Time Series Forecasting project, we have been provided with sales data for different types of wine in the 20th century. Both datasets come from the same company, ABC Estate Wines, but cover different wines. As an analyst at ABC Estate Wines, my task is to analyse and forecast these wine sales.

2. Introduction

The intent of this project is to perform forecasting analysis on the Rose and Sparkling wine datasets. I analyse these datasets using Linear Regression, the Naïve model, Simple and Moving Average models, and Simple, Double and Triple Exponential Smoothing. Each dataset contains 187 entries. I then try to build the best-performing model(s) on the complete data and predict 12 months into the future with appropriate confidence intervals/bands.

3. Data Details

Each dataset contains two columns: the first column gives the month and year, and the second column gives the sales quantity recorded for that month.

4. Read the data as an appropriate Time Series data and plot the data

4.1 Reading the Data

As we can observe, each entry carries a YearMonth value, which is not really a data point but an index for the sales entry. In effect, each dataset has a single column containing the quantity of wine sold in that particular month.



Figure 1: Reading Wine Datasets

It can be observed that both datasets start in January 1980 and run through July 1995, so each dataset contains 187 entries in total.
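The entry count follows directly from the date range and can be checked with a small sketch. It uses only the Python standard library; the function name is an assumption for illustration and is not taken from the code used for this report.

```python
from datetime import date


def months_inclusive(start: date, end: date) -> int:
    """Count calendar months from start's month through end's month, inclusive."""
    return (end.year - start.year) * 12 + (end.month - start.month) + 1


# January 1980 through July 1995, as stated in the text.
n_obs = months_inclusive(date(1980, 1, 1), date(1995, 7, 1))
print(n_obs)  # 187
```

That is 15 full years (180 months) plus the 7 months of 1995.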

4.2 Plotting the Data

Both time series are plotted below.

Figure 2: Rose Wine Time Series Plot



Figure 3: Sparkling Wine Time Series Plot

As we can observe from the above plots, Rose wine sales show a declining trend, while Sparkling wine sales show a slight upward trend. A seasonal pattern is also visible in both graphs. We will explore trend and seasonality further during decomposition, which gives a much more detailed view of these two components.

5. Perform appropriate Exploratory Data Analysis to understand the data

and also perform decomposition

5.1 EDA

Null Value Check


Performing a Null value check on both the time series, I got:

Figure 4: Null Value Check

The Rose dataset contains 2 Null values, and the Sparkling dataset contains none. I filled the missing Rose values using linear interpolation, which replaces each gap with a value on the straight line between its neighbouring observations.

After the imputation, I confirmed that no Null values remain in the Rose dataset.
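Linear interpolation can be sketched in a few lines. This is a dependency-free illustration with made-up numbers, not the Rose data and not the code used for the report; the function name is hypothetical.

```python
def interpolate_linear(values):
    """Fill interior None gaps on a straight line between the nearest known neighbours.

    Leading or trailing None values have no neighbour on one side and are left as-is.
    """
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled


# Made-up monthly values with two consecutive gaps.
sales = [40.0, None, None, 52.0]
print(interpolate_linear(sales))  # [40.0, 44.0, 48.0, 52.0]
```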

Figure 5: Null Value Check

Duplicate Value Check

There are no duplicate entries in the datasets: each value corresponds to a different time index, so every record is the sales figure for a distinct month.

Data Description

Figure 6: Wine Sales Time Series Data Description


As we can see from the above, both wine sales time series appear to be skewed. Both have a high standard deviation, reflecting the wide gap between their minimum and maximum values. The mean also differs from the median in both series, which is consistent with this skewness. As mentioned earlier, each dataset has 187 records in total.

Yearly Box Plots

Following are the yearly boxplots for the two wine sales time-series:

Figure 7: Yearly Box Plots


As we can observe from the above plots, Rose wine has a mostly downward sales trend. The highest sales for Rose wine are observed in 1981 and the lowest in 1994 (1995 appears to be doing well: its data runs only through July, yet sales have already reached the 1994 level within those 7 months). The variation in monthly sales for Rose wine appears highest in 1981 and lowest in 1994.

Sparkling wine sales vary every year. The years 1985 and 1986 show the least variation, so sales in those 2 years were relatively consistent. The highest sales for Sparkling wine appear to occur in 1994 and the lowest in 1982. With only 7 months of data for 1995 (through July), it is difficult to comment on that year's sales performance. Sparkling wine sales appear to decline from 1980 and start increasing from 1983. The variation in Sparkling wine sales seems to increase over the period 1983-1986, while the highest variation is in 1994. Clear skewness can be observed in Sparkling wine sales for all years, except perhaps 1981. There are outliers in the yearly sales data; however, since this is a time series, we leave the outliers untreated.

Monthly Box Plots

Following are the monthly boxplots for the two wine sales time-series:

Rose Wine Sales Monthly Variation

Sparkling Wine Sales Monthly Variation

Figure 8: Monthly Box Plots


The Monthly Box Plots clearly show a seasonal pattern in both the Rose and Sparkling wine time series. Sparkling wine appears to have a stronger seasonal component than Rose wine. For both wines, sales rise in the last quarter of the year, with Sparkling wine seeing the steeper rise. Rose wine sales pick up from January and stay more or less consistent until June, stagnate until September, and then pick up again from October (i.e. the last quarter). Sparkling wine sales are relatively low in the first two quarters, slowly pick up pace during the third quarter, and keep rising until the end of the year. The monthly sales data for both types of wine shows skewness with few exceptions.

Monthly Sales Across Years

The monthly sales for both types of wine across years can be seen in the following Pivot Tables and the associated graphs:

Figure 9: Monthly Sales Across Years - Rose Wine



Figure 10: Monthly Sales Across Years - Sparkling Wine


As can be observed from the above two sets of tables and graphs, December is the month with the highest sales figures for both Rose and Sparkling wines. For Sparkling wine, the second-highest sales occur in November, while Rose wine shows a mixed pattern, with the highest sales in August or July in certain years.

We can observe a seasonality element in the graphs for both Rose and Sparkling wines.

Yearly Sum of Observations

The yearly sum of sales numbers can be observed in the following tables and graphs:

Figure 11: Sum of Yearly Observations for Rose and Sparkling Wine

As can be observed from the above summation tables and plotted graphs, annual Rose wine sales show a year-on-year downward trend. Sparkling wine sales dip initially, pick up from 1982 right up to 1988, and then dip again. The steep drop after 1994 for both Rose and Sparkling wine arises because only partial data (through July) is available for 1995.

5.2 Decomposition

The decomposed components of both time series are shown below:

Rose Additive Decompose Plot

Sparkling Additive Decompose Plot

Figure 12: Additive Decomposition - Rose and Sparkling Wine



Rose Multiplicative Decompose Plot

Sparkling Multiplicative Decompose Plot

Figure 13: Multiplicative Decomposition - Rose and Sparkling Wine



The decompositions of the two time series are shown above. I tried both additive and multiplicative decomposition on each series to determine whether the wine datasets behave as additive or multiplicative series.

From the plots above, we can say that both wine time series are multiplicative in nature and that both have a seasonal component.

We can again observe that Rose wine sales show a downward trend and Sparkling wine sales an upward trend. The plots also indicate that wine sales are not uniform over time and have an apparent seasonal pattern. Moreover, the seasonal variation appears larger for Sparkling wine than for Rose wine, while the overall sales variation appears larger for Rose wine than for Sparkling wine.
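For reference, the two decomposition forms compared above can be written as follows, where the symbols are the usual textbook notation rather than anything defined in this report: y_t is the observed sales value, T_t the trend, S_t the seasonal component and R_t the residual.

```latex
\begin{aligned}
\text{Additive:} \quad & y_t = T_t + S_t + R_t \\
\text{Multiplicative:} \quad & y_t = T_t \times S_t \times R_t
\end{aligned}
```

In the multiplicative form the size of the seasonal swings scales with the level of the series, which is the usual reason for preferring it when the seasonal peaks grow or shrink together with the trend.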

6. Split the data into training and test. The test data should start in 1991.

I have split the time series datasets into Train and Test datasets below. As required by the question, the Test data starts in 1991.

Rose Data Split Sparkling Data Split



Figure 14: Training and Test Datasets for Rose and Sparkling Wine Time Series

Using the head and tail functions on the Training and Test datasets, I also confirmed that the Train dataset ends in 1990 and the Test dataset starts in 1991. The Train data frame contains 132 observations and the Test data frame contains 55 observations.
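The split sizes can be reproduced with a minimal sketch. It rebuilds only the monthly index implied by the text (January 1980 through July 1995) using the standard library; the variable names are assumptions, and no sales values are involved.

```python
from datetime import date

# Monthly index implied by the text: 187 months starting in January 1980.
months = [date(1980 + i // 12, 1 + i % 12, 1) for i in range(187)]

# Everything before 1991 is training data; 1991 onwards is test data.
train = [m for m in months if m.year < 1991]
test = [m for m in months if m.year >= 1991]

print(len(train), train[-1])  # 132 1990-12-01
print(len(test), test[0])     # 55 1991-01-01
```

The 132 training months are the 11 full years from 1980 to 1990, and the 55 test months are the 4 full years from 1991 to 1994 plus the 7 months of 1995.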

I have also plotted the Train and test data frames for both time series datasets below:

Figure 15: Plots for Training and Test data frames


The plots above show the training and test data: the blue part of each plot depicts the Train dataset (January ’80 – December ’90), and the orange part depicts the Test dataset (January ’91 – July ’95).

7. Build various exponential smoothing models on the training data and

evaluate the model using RMSE on the test data. Other models such as

Regression, Naïve forecast models and simple average models should

also be built on the training data and check the performance on the test

data using RMSE

In this section, I run each of these models on both the Rose and Sparkling wine time series and compare them by their RMSE on the test data, starting with the Linear Regression model.
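Every model in this section is scored with the root mean squared error (RMSE), the square root of the average squared difference between actual and forecast values. A minimal standard-library sketch, with a hypothetical function name and made-up numbers:

```python
import math


def rmse(actual, forecast):
    """Root mean squared error between two equally long sequences."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    squared_errors = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Errors of 1 and 7 give a mean squared error of 25, hence an RMSE of 5.
print(rmse([10.0, 20.0], [11.0, 13.0]))  # 5.0
```

Because RMSE is expressed in the units of the series, the Rose and Sparkling values are not comparable with each other, only across models for the same wine.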

7.1 Linear Regression

The extracts of Training and Test data for the Linear Regression can be seen below:

Figure 16: Training and Test data for Linear Regression


Following are the results from a Linear Regression model run on both the Rose and Sparkling Wine datasets:

Sparkling Linear Regression

Figure 17: Linear Regression Outcome on the Rose and Sparkling Wine Time Series respectively

The Regression plots above depict the regression on training set as the Red line and that

on the test set as the blue line. As we can observe from the above plots and metrics, Rose

wine sales show a downward trend, and the Sparkling wine sales show an upward trend.
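Regression on time can be sketched as an ordinary least squares fit of sales against a time index 1, 2, ..., n, with the fitted line then extended over the test period. The helper names and the toy series below are assumptions for illustration, not the report's code or data.

```python
def fit_line(y):
    """Ordinary least squares fit of y against t = 1..n; returns (intercept, slope)."""
    n = len(y)
    t = range(1, n + 1)
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    covariance = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    variance = sum((ti - t_mean) ** 2 for ti in t)
    slope = covariance / variance
    return y_mean - slope * t_mean, slope


def forecast_line(intercept, slope, n_train, horizon):
    """Extend the fitted line over the `horizon` time steps that follow training."""
    return [intercept + slope * t for t in range(n_train + 1, n_train + horizon + 1)]


# Made-up, perfectly linear training series with a downward trend.
train = [10.0, 8.0, 6.0, 4.0]
b0, b1 = fit_line(train)
print(b0, b1)                                # 12.0 -2.0
print(forecast_line(b0, b1, len(train), 2))  # [2.0, 0.0]
```

A negative slope corresponds to the downward trend seen for Rose wine and a positive slope to the upward trend seen for Sparkling wine. The straight line carries no seasonal information.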

For RegressionOnTime forecast on the Test Data for Rose wine,



For RegressionOnTime forecast on the Test Data for Sparkling wine,

Figure 18: Performance of the Linear Regression Model


7.2 Naïve Model

The extracts of Test data for the Naïve Model can be seen below:

Rose Test Set Sparkling Test Set

Figure 19: Test data for Naive Model


Following are the results from running a Naïve Model on both the Rose and Sparkling Wine datasets:

Figure 20: Naive Model Outcome on the Rose and Sparkling Wine Time Series respectively

For Rose Wine,

RMSE = 79.719

For Sparkling Wine,

RMSE = 3864.27

Figure 21: Performance of the two Models


As the Naïve model performance above shows, the Naïve model is not suitable for either wine dataset, because every forecast simply repeats the last observation of the training data and so ignores both trend and seasonality.
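The Naïve forecast is simple enough to state in one line of code. A minimal sketch with made-up values and a hypothetical function name:

```python
def naive_forecast(train, horizon):
    """Repeat the last observed training value for every future step."""
    return [train[-1]] * horizon


train = [120.0, 95.0, 132.0]  # made-up values, not the wine data
print(naive_forecast(train, 4))  # [132.0, 132.0, 132.0, 132.0]
```

The forecast is a flat line at the final training value, so its accuracy depends entirely on how representative that single month is of the test period.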

7.3 Simple Average Model

The extracts of Training and Test data for the Simple Average Model can be seen below:

Figure 22: Test data for Simple Average Model


Following are the results from running a Simple Average Model on both the Rose and

Sparkling Wine datasets:

Rose

Sparkling

Figure 23: Simple Average Model Outcome on the Rose and Sparkling Wine Time Series respectively

For Simple Average Model on the Rose Wine dataset,

RMSE = 53.46

For Simple Average Model on the Sparkling Wine dataset,

RMSE = 1275.08

The summarized performance of the models run on the two wine datasets can be seen

below:

Figure 187: Performance of the three Models


As the summary above shows, of the three models run so far, the Linear Regression model performs best on the Rose wine dataset, while the Simple Average model performs best on the Sparkling wine dataset.
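The Simple Average model forecasts every future month with the mean of the training data. A minimal sketch with made-up values and a hypothetical function name:

```python
def simple_average_forecast(train, horizon):
    """Forecast every future step with the mean of all training observations."""
    mean = sum(train) / len(train)
    return [mean] * horizon


train = [100.0, 110.0, 120.0]  # made-up values, not the wine data
print(simple_average_forecast(train, 2))  # [110.0, 110.0]
```

Like the Naïve model it produces a flat line, but one anchored to the overall training level rather than to the last month, which suits a series without a strong trend better than one with a steady decline.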

7.4 Moving Average Model

The Moving Average data for the Rose and Sparkling wine datasets can be seen below:

Figure 24: Moving Average Model Data for Rose and Sparkling wine respectively

Following are the results from running a Moving Average Model on both the Rose and Sparkling Wine datasets:

Figure 25: Moving Average Model Outcome on the Rose wine Time Series

Figure 26: Moving Average Model Outcome on the Sparkling Wine Time Series

Moving Average Model on the Rose Wine dataset:

For 2 point Moving Average Model forecast on the Training Data, RMSE = 11.529

For 4 point Moving Average Model forecast on the Training Data, RMSE = 14.451

For 6 point Moving Average Model forecast on the Training Data, RMSE = 14.566

For 9 point Moving Average Model forecast on the Training Data, RMSE = 14.728

Moving Average Model on the Sparkling Wine dataset:

For 2 point Moving Average Model forecast on the Training Data, RMSE = 813.401

For 4 point Moving Average Model forecast on the Training Data, RMSE = 1156.590

For 6 point Moving Average Model forecast on the Training Data, RMSE = 1283.927

For 9 point Moving Average Model forecast on the Training Data, RMSE = 1346.278

The summarized performance of the models run on the wine datasets can be seen below:

Figure 30: Summarized Performance of the Models


I applied 2-, 4-, 6- and 9-point trailing averages to both the Rose and Sparkling wine datasets.

As we can observe from the above plots, all of the trailing average plots show prediction values below the actual train and test data, and the 9-point trailing average shows the lowest predictions of all. The 2-point trailing moving average comes closest to the actual data. This observation is corroborated by the RMSE scores for each of these moving average models.
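A trailing moving average can be sketched as below. The report does not state exactly how its averages were computed, so this version, in which each window ends at (and includes) the current observation, is an assumption; the function name and the numbers are made up.

```python
def trailing_average(series, window):
    """Mean of the current point and the previous window-1 points.

    Positions without a full window of history are returned as None.
    """
    averages = []
    for i in range(len(series)):
        if i + 1 < window:
            averages.append(None)
        else:
            averages.append(sum(series[i + 1 - window : i + 1]) / window)
    return averages


series = [10.0, 20.0, 30.0, 40.0]  # made-up values
print(trailing_average(series, 2))  # [None, 15.0, 25.0, 35.0]
```

Under this assumption, a short window stays close to the actual values because the value being estimated is itself part of the average, which would explain why the 2-point average has the lowest RMSE and why longer windows smooth away more of the seasonal peaks.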

As the summarized performance shows, the 2-point moving average has the best performance of all the models run so far on the Rose and Sparkling wine datasets.

Rose Wine Performance Summary

Sparkling wine Performance Summary



7.5 Simple Exponential Smoothing (SES)

The SES Parameters for the Rose and Sparkling wine datasets can be seen below:

Rose Wine

Sparkling Wine

Figure 31: SES Parameters for the Rose and Sparkling wine datasets respectively

The SES test data for the Rose and Sparkling wine datasets can be seen below:

Figure 32: SES Train and Test data for Rose and Sparkling wine respectively

Following are the results from running an SES Model on both the Rose and Sparkling Wine datasets:

Figure 33: Simple Exponential Smoothing Outcome on the Rose wine Time Series

Figure 34: Simple Exponential Smoothing Outcome on the Sparkling Wine Time Series

For Rose Wine dataset:

For Alpha = 0.09874, Simple Exponential Smoothing Model forecast on the Test data, RMSE = 36.796

For Sparkling Wine dataset:

For Alpha = 0.0496, Simple Exponential Smoothing Model forecast on the Test data, RMSE = 1316.05

The summarized performance of the models run on the wine datasets can be seen below:

Figure 35: Summarized Performance of the Models


The SES model is meant for data that has no trend or seasonality. I nevertheless applied it to both the Rose and Sparkling wine datasets to see how it performs in this case.
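The SES recursion keeps a single smoothed level and projects it forward unchanged. The sketch below uses the simplest possible initialisation (the first observation); the fitted models in this report may have been initialised differently, and the function name and numbers are assumptions for illustration.

```python
def ses_forecast(train, alpha, horizon):
    """Run level = alpha * y + (1 - alpha) * level over the training data,
    then repeat the final level for every forecast step."""
    level = train[0]  # simple initialisation: start from the first observation
    for y in train[1:]:
        level = alpha * y + (1 - alpha) * level
    return [level] * horizon


train = [10.0, 20.0, 30.0]  # made-up values
print(ses_forecast(train, 0.5, 3))  # [22.5, 22.5, 22.5]
```

A small Alpha, such as the values reported above, gives most of the weight to the accumulated history, so the flat forecast sits near a long-run average of the training data rather than near the most recent months.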



7.6 Double Exponential Smoothing (DES)

The Alpha and Beta values for the Rose and Sparkling wine datasets can be seen below (sorted):

Figure 36: Alpha Beta values for Rose and Sparkling wine respectively

Following are the results from running a DES Model on both the Rose and Sparkling Wine datasets:

Figure 37: Double Exponential Smoothing Outcome on the Rose wine Time Series

Figure 38: Double Exponential Smoothing Outcome on the Sparkling Wine Time Series

For Rose Wine dataset:

For Alpha = 0.3, Beta = 0.3, Double Exponential Smoothing Model forecast on the Test data, RMSE = 265.567594

For Sparkling Wine dataset:

For Alpha = 0.3, Beta = 0.3, Double Exponential Smoothing Model forecast on the Test data, RMSE = 18259.110704

The summarized performance of the models run on the wine datasets can be seen below:

Figure 39: Summarized Performance of the Models


The DES model is meant for data that has level and trend but no seasonality. I used a grid search over the smoothing parameters and found that Alpha = 0.3 and Beta = 0.3 give the lowest RMSE for both the Rose and Sparkling wine datasets. Even so, the DES model has the worst performance so far for both the Rose and Sparkling wine datasets.
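The grid search described above can be sketched with a plain implementation of Holt's linear trend method. The initialisation, the parameter grid and the toy series are all assumptions for illustration; the report's own grid and fitting routine are not shown in the text.

```python
import math
from itertools import product


def holt_forecast(train, alpha, beta, horizon):
    """Holt's linear trend method: smooth a level and a trend, then extrapolate.

    Needs at least two training points; the trend starts as their difference.
    """
    level, trend = train[0], train[1] - train[0]
    for y in train[1:]:
        previous_level = level
        level = alpha * y + (1 - alpha) * (previous_level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
    return [level + h * trend for h in range(1, horizon + 1)]


def rmse(actual, forecast):
    """Root mean squared error between two equally long sequences."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def grid_search(train, test, grid):
    """Return (rmse, alpha, beta) for the pair with the lowest RMSE on `test`."""
    return min(
        (rmse(test, holt_forecast(train, a, b, len(test))), a, b)
        for a, b in product(grid, repeat=2)
    )


# Made-up linear series: every parameter pair should forecast it almost exactly.
train = [10.0, 12.0, 14.0, 16.0]
test = [18.0, 20.0]
grid = [0.3, 0.5, 0.7]
print(holt_forecast(train, 0.3, 0.3, len(test)))
print(grid_search(train, test, grid))
```

Because the forecast extends the last estimated trend in a straight line over the whole test period, a trend estimate that is slightly off compounds with the horizon, which is one plausible reason for the large DES errors reported here.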

7.7 Triple Exponential Smoothing (TES)

The TES Parameters for the Rose and Sparkling wine datasets can be seen below:

Rose TES

Sparkling TES

Figure 40: TES Parameters for the Rose and Sparkling wine datasets respectively

The TES train and test data for the Rose and Sparkling wine datasets can be seen below:

Figure 41: TES Model Train and Test data for Rose and Sparkling wine respectively

Following are the results from running a TES Model on both the Rose and Sparkling Wine datasets:

Figure 42: Triple Exponential Smoothing Outcome on the Rose wine Time Series

Figure 43: Triple Exponential Smoothing Outcome on the Sparkling Wine Time Series

For Rose Wine dataset:

For Alpha = 0.06569, Beta = 0.05192, Gamma = 3.87e-06, Triple Exponential Smoothing Model forecast on the Test data, RMSE = 21.020

For Sparkling Wine dataset:

For Alpha = 0.1111, Beta = 0.0617, Gamma = 0.3954, Triple Exponential Smoothing Model forecast on the Test data, RMSE = 469.768
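For reference, one common textbook form of Triple Exponential Smoothing (Holt-Winters with additive trend and multiplicative seasonality, seasonal period m = 12 for monthly data) is given below. The report does not state which seasonal form produced the parameters above, so the multiplicative choice is an assumption, and software packages may parameterise Beta and Gamma slightly differently.

```latex
\begin{aligned}
\ell_t &= \alpha \, \frac{y_t}{s_{t-m}} + (1-\alpha)\,(\ell_{t-1} + b_{t-1}) \\
b_t &= \beta \,(\ell_t - \ell_{t-1}) + (1-\beta)\, b_{t-1} \\
s_t &= \gamma \, \frac{y_t}{\ell_{t-1} + b_{t-1}} + (1-\gamma)\, s_{t-m} \\
\hat{y}_{t+h} &= (\ell_t + h\, b_t)\; s_{t+h-m(k+1)}, \qquad k = \lfloor (h-1)/m \rfloor
\end{aligned}
```

Here the level, trend and seasonal components are smoothed by Alpha, Beta and Gamma respectively. Under this form, a Gamma close to zero, as reported for Rose wine, would mean the seasonal factors are barely updated after initialisation.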

The summarized performance of the models run on the wine datasets can be seen below:

7.8 Summary of all Models

Now that we have run all the models planned, let’s view the summary of the performance

in decreasing order for both the datasets:

Figure 45: Sorted Model Performance Summary for Rose Wine Time Series

Figure 46: Sorted Model Performance Summary for Sparkling Wine Time Series

For the Rose wine dataset, the 2-point trailing moving average gives the best RMSE among all the models.

For the Sparkling wine dataset, the TES model offers the best RMSE among all the models.

8. Check for the stationarity of the data on which the model is being built

using appropriate statistical tests and also mention the hypothesis

for the statistical test. If the data is found to be non-stationary, take

appropriate steps to make it stationary. Check the new data for

stationarity and comment. Note: Stationarity should be checked at alpha

= 0.05

I have performed the stationarity test on both the Rose and Sparkling wine data frames, using

an augmented Dickey-Fuller (ADF) test at Alpha = 0.05. The hypotheses are:

Null hypothesis (H0): the series has a unit root, i.e. it is non-stationary.

Alternative hypothesis (H1): the series is stationary.

Figure 37: Stationarity - Rose


Figure 38: Stationarity Check - Sparkling


As we can observe from the above, the p-value is greater than alpha, so we fail to reject

the null hypothesis of non-stationarity and have to make the data stationary. A stationary

series is one whose properties do not depend on the time at which the series is observed;

non-stationarity is basically a hint of a seasonality/trend element in the dataset. After

taking the first difference (the difference between consecutive observations), the p-value

is less than 0.05, so we reject the null hypothesis and treat the differenced series as

stationary.
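
The differencing step and the decision rule at Alpha = 0.05 can be sketched as below. This is only an illustration in plain Python: the helper names `first_difference` and `adf_decision` are assumptions, and the p-values in the usage lines are placeholders, not the values from Figures 37 and 38. In practice the p-value would come from an ADF routine such as `adfuller` in `statsmodels.tsa.stattools`.

```python
def first_difference(series):
    """Difference between each observation and the one before it (d = 1)."""
    return [current - previous for previous, current in zip(series, series[1:])]


def adf_decision(p_value, alpha=0.05):
    """Interpret an ADF p-value.

    H0: the series has a unit root (non-stationary).
    H1: the series is stationary.
    """
    if p_value < alpha:
        return "reject H0: the series can be treated as stationary"
    return "fail to reject H0: the series is treated as non-stationary"


# Placeholder numbers, only to show the calls.
print(first_difference([10, 12, 15, 15]))
print(adf_decision(0.30))   # p-value above alpha
print(adf_decision(0.01))   # p-value below alpha
```

Note that differencing shortens the series by one observation, which is why the first value of a differenced series is usually dropped before re-running the test.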



Post making the series Stationary by taking the differenc

9. Build an automated version of the ARIMA/SARIMA model in which the

parameters are selected using the lowest Akaike Information Criteria

(AIC) on the training data and evaluate this model on the test data

using RMSE.

9.1 ARIMA Model

ARIMA on Rose Wine Series



Figure 39: Running Automated ARIMA Model on Rose Wine Dataset


Following are the results of the ARIMA model on the Rose wine dataset:

ARIMA on Sparkling Wine Series



Figure 40: Running Automated ARIMA Model on Sparkling Wine Dataset


Following are the results of the ARIMA model on the Sparkling wine dataset:

Figure 41: Results of Automated ARIMA Model on Sparkling Wine Dataset

As we can see from the above, the lowest AIC recorded for the Rose wine data is 1276.83,

for p,d,q values of 0,1,2 respectively.

RMSE: 15.618093

The lowest AIC for the Sparkling data is 2210.6, for p,d,q values of 2,1,2 respectively.

RMSE: 1374.696495
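
The automated selection described here amounts to a grid search over (p,d,q) that keeps the order with the lowest AIC. The sketch below shows that loop in plain Python. The function `select_order_by_aic` and its `fit_aic` argument are hypothetical names introduced for illustration; in a real run `fit_aic` would fit an ARIMA model on the training data and return its AIC, whereas here a toy function stands in for the fit.

```python
from itertools import product


def select_order_by_aic(fit_aic, p_values, d_values, q_values):
    """Return the (p, d, q) order with the lowest AIC and that AIC.

    `fit_aic` is a hypothetical callable taking an order tuple and returning
    the AIC of a model fitted with that order.
    """
    best_order, best_aic = None, float("inf")
    for order in product(p_values, d_values, q_values):
        try:
            aic = fit_aic(order)
        except Exception:
            continue  # skip orders for which the fit fails
        if aic < best_aic:
            best_order, best_aic = order, aic
    return best_order, best_aic


def toy_aic(order):
    """Stand-in for a real model fit: an arbitrary surface with one minimum."""
    p, d, q = order
    return 1000 + p ** 2 + (q - 2) ** 2


best_order, best_aic = select_order_by_aic(toy_aic, range(3), [1], range(3))
print(best_order, best_aic)
```

Keeping d fixed at 1 in the usage lines mirrors the single differencing applied earlier; the candidate ranges for p and q are an assumption.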

9.2 SARIMA Model

Following is the outcome of SARIMA Model run on Rose wine data:



SARIMA on Rose series with seasonality of 12



RMSE Seasonality 6 – Rose Wine Series: 26.13


RMSE Seasonality 12 – Rose Wine Series: 26.92

Following is the outcome of the SARIMA model run on the Sparkling wine data:

Seasonality: 6

SARIMA for Sparkling with seasonality of 12:

Figure 42: SARIMA Model on Sparkling Data



As can be observed, for the Rose dataset, the model with p,d,q values of 0,1,2 respectively

has the lowest AIC.

RMSE Seasonality 6: 26.13

RMSE Seasonality 12: 26.92

For the Sparkling dataset, the model with p,d,q values of 1,1,2 respectively has the

lowest AIC.

RMSE Seasonality 6: 601.12

RMSE Seasonality 12: 528.61

10. Build ARIMA/SARIMA models based on the cut-off points of ACF and

PACF on the training data and evaluate this model on the test data

using RMSE.

10.1 Manual ARIMA

Rose Wine Series



With p,d,q values of (2,1,4)


RMSE: 22.31
Lowest AIC: 1282

10.2 Manual SARIMA



RMSE: 527.27

For both the Rose and Sparkling wine series, the manual SARIMA model performs better than

the automated SARIMA model. However, for ARIMA, the automated model performs better.



11. Build a table (create a data frame) with all the models built along

with their corresponding parameters and the respective RMSE values on

the test data.

Rose:
Based on the table, Triple Exponential Smoothing with Alpha, Beta and Gamma set to 0.3
performs best for the Rose wine series, achieving the lowest RMSE of 10.94. The worst
performance is shown by Simple Exponential Smoothing with Alpha set to 0.3, which has an
RMSE of 265.56.

Sparkling:
Based on the table, Triple Exponential Smoothing with Alpha, Beta and Gamma set to 0.3
performs best for the Sparkling wine series, achieving the lowest RMSE of 392.78. The worst
performance is shown by Simple Exponential Smoothing with Alpha set to 0.3, which has an
RMSE of 18259.11.
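
A comparison table of this kind can be assembled and sorted as sketched below. The sketch uses plain Python rather than a data frame, the names `rose_results` and `sort_by_rmse` are illustrative assumptions, and the rows contain only the Rose figures quoted in the text of this report, not the full table.

```python
# Rose wine results quoted in the text of this report (not the complete table).
rose_results = [
    {"model": "TES (optimised)",
     "params": "alpha=0.06569, beta=0.05192, gamma=3.87e-06", "rmse": 21.020},
    {"model": "Automated ARIMA", "params": "(p,d,q)=(0,1,2)", "rmse": 15.618093},
    {"model": "Automated SARIMA", "params": "seasonality=6", "rmse": 26.13},
    {"model": "Automated SARIMA", "params": "seasonality=12", "rmse": 26.92},
    {"model": "Manual ARIMA", "params": "(p,d,q)=(2,1,4)", "rmse": 22.31},
    {"model": "TES", "params": "alpha=beta=gamma=0.3", "rmse": 10.94},
    {"model": "SES", "params": "alpha=0.3", "rmse": 265.56},
]


def sort_by_rmse(results):
    """Return the rows ordered from lowest (best) to highest (worst) test RMSE."""
    return sorted(results, key=lambda row: row["rmse"])


for row in sort_by_rmse(rose_results):
    print(f'{row["model"]:<18} {row["params"]:<46} {row["rmse"]:>10.2f}')
```

The same list of dictionaries can be passed to `pandas.DataFrame` if a data frame is required.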

12. Based on the model-building exercise, build the most optimum

model(s) on the complete data and predict 12 months into the future

with appropriate confidence intervals/band

12.1 For the Rose wine series, based on RMSE, a Triple Exponential Smoothing model was built
on the full data set and used to predict the next 12 months with confidence intervals. Alpha,
Beta and Gamma have been set to 0.3.

12.2 For the Sparkling wine series, based on RMSE, a Triple Exponential Smoothing model was
built on the full data set and used to predict the next 12 months with confidence intervals.
Alpha, Beta and Gamma have been set to 0.3.
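
One simple way to put a band around point forecasts is to add and subtract a multiple of the standard deviation of the in-sample residuals. The sketch below illustrates that idea in plain Python. It is an assumption about how such a band can be built, not necessarily the method used for the plots in this report: it treats the residuals as roughly normal with constant spread, uses z = 1.96 for an approximate 95% band, and the numbers in the usage lines are made up.

```python
import statistics


def confidence_band(forecast, residuals, z=1.96):
    """Return (lower, point, upper) for each forecast value.

    The half-width is z times the sample standard deviation of the residuals
    (actual minus fitted values on the data the model was built on).
    """
    half_width = z * statistics.stdev(residuals)
    return [(point - half_width, point, point + half_width) for point in forecast]


# Made-up forecast and residuals, only to show the call.
band = confidence_band(forecast=[100.0, 110.0], residuals=[-2.0, 0.0, 2.0])
for lower, point, upper in band:
    print(f"{lower:.2f}  {point:.2f}  {upper:.2f}")
```

A band built this way has the same width at every horizon, so it understates the uncertainty of forecasts further into the future.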

13. Comment on the model thus built and report your findings and

suggest the measures that the company should be taking for future

sales. Please explain and summarise the various steps performed in this

project. There should be proper business interpretation and actionable

insights present.

Sparkling Wine Data:

• Triple Exponential Smoothing has worked best for the forecast, with the lowest RMSE on the
test data.
• The forecast for the next 12 months is slightly above the sales of the previous 12 months;
however, there isn’t a considerable increase.
• As observed from the month-wise bar graph shown earlier, sales pick up in the last 2
months, probably due to the holiday season.
• To mitigate the drop in sales in the early and middle months of the year, ABC can target
promotional and discount offers.
• Tie-ups with event organizers, or a focus on clients in businesses such as weddings,
could be adopted to increase sales.
• Procurement of material to ramp up production is suggested by Q2 to meet the surge
in demand starting from Q3.

Rose Wine Data:

• Triple Exponential Smoothing has worked best for the forecast, with the lowest RMSE on the
test data.
• The forecast for the next 12 months is almost the same as the sales in 1995.
• As with the Sparkling wine sales, the increase in sales is towards the tail end of the year,
and similar tactics for increasing sales are suggested.
• It is suggested that the company undertake market research to find the reason
for the steady decrease in demand for Rose wine over the years.
