
TIME SERIES FORECASTING REPORT

DSBA

Graded Project

By

Subham Nandy - 26-03-2023



Contents

Problem 1(i) - Time Series Forecasting of Rose Wine

Problem 1(ii) - Time Series Forecasting of Sparkling Wine

List of Figures

1. Line graph for the dataset
2. Box Plots - Yearly and Monthly
3. Decomposing Model
4. Train-Test Graph
5. SES DES Model Graphs
6. TES Model Graphs
7. Linear Regression Graph
8. Naive Model Graph
9. Simple Average Graph
10. Auto ARIMA Model Graph
11. ACF PACF Plots
12. Manual ARIMA Model Graph
13. Final Predicted Graph

List of Tables

1. Monthly Sales across Years
2. SARIMAX for Auto ARIMA
3. SARIMAX for Manual ARIMA
4. Comparison for All Models

Problem:

For this assignment, data on sales of different types of wine in the 20th century are to be analysed. Both datasets come from the same company but cover different wines. As an analyst at ABC Estate Wines, you are tasked with analysing and forecasting wine sales in the 20th century.

First, we analyse the data for the wine “ROSE”.

1.1 Read the dataset as an appropriate Time Series Data and plot the data.
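The loading code is not shown in the report; the following is a minimal sketch, assuming a CSV file 'Rose.csv' with a date column 'YearMonth' and a sales column 'Rose' (these names are assumptions, not confirmed by the report):

import pandas as pd
import matplotlib.pyplot as plt

# Parse the date column and use it as the index so pandas treats the data
# as a monthly time series (month-start timestamps assumed).
df = pd.read_csv('Rose.csv', parse_dates=['YearMonth'], index_col='YearMonth')
df = df.asfreq('MS')

print(df.head())       # first few rows
print(df.tail())       # last few rows
print(df.describe())   # data description

df['Rose'].plot(figsize=(12, 5), title='Rose wine sales')
plt.ylabel('Sales')
plt.show()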

Checking the first few rows of the data set :

Checking the last few rows of the data set :

Checking the data description :



Checking the null values:

Since there are two null values present in the dataset, we have to impute them before proceeding further.
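The report does not state which imputation method was used; one common choice is linear interpolation between the neighbouring months, sketched below (column name 'Rose' assumed) before the before/after plots that follow:

# Count the missing values, fill them by interpolating between neighbours,
# then confirm that none remain.
print(df['Rose'].isnull().sum())                   # 2 missing values
df['Rose'] = df['Rose'].interpolate(method='linear')
print(df['Rose'].isnull().sum())                   # 0 after imputation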
Plotting the graph before imputing :

After imputing the null/missing values:

1.2 Perform appropriate Exploratory Data Analysis to understand the data and also
perform decomposition.

Yearly Box-plots:

Monthly Box-Plots:

Monthly Graph :

Monthly Sales Across Years :



Decomposing the dataset into trend, Seasonality and Residual :


Additive Model:

Multiplicative Model:

As we can see from both graphs, the residual pattern is almost the same, so we will consider only the Additive model.
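A sketch of how the two decompositions above could be produced with statsmodels (a 12-month seasonal period is assumed for this monthly data):

from statsmodels.tsa.seasonal import seasonal_decompose

# Additive decomposition: observed = trend + seasonality + residual
add_decomp = seasonal_decompose(df['Rose'], model='additive', period=12)
add_decomp.plot()

# Multiplicative decomposition: observed = trend * seasonality * residual
mul_decomp = seasonal_decompose(df['Rose'], model='multiplicative', period=12)
mul_decomp.plot()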
1.3 Split the data into training and test. The test data should start in 1991.
The data is now split so that observations from January 1991 onward form the test set (a sketch of the split is shown below). Viewing the train and test sets after splitting:
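A minimal sketch of the split, reusing the dataframe 'df' loaded above:

# Training set: all observations before January 1991.
# Test set: January 1991 onwards.
train = df[df.index < '1991-01-01']
test = df[df.index >= '1991-01-01']
print(train.shape, test.shape)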

Last rows of Train set :



First few rows of Test Set:

Plotting the graph for Train - Test Set:

1.4 Build all the exponential smoothing models on the training data and evaluate the models using RMSE on the test data. Other additional models such as regression, naïve forecast, simple average and moving average models should also be built on the training data, and their performance checked on the test data using RMSE.

1. Single Exponential Smoothing model :


Params:

First few rows of predicted data after applying SES model :

Checking the MAPE and RMSE for the model :
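A sketch of how the SES model and its test-set metrics could be produced; the helper 'mape' is defined here for illustration and is reused in the later sketches:

import numpy as np
from sklearn.metrics import mean_squared_error
from statsmodels.tsa.holtwinters import SimpleExpSmoothing

def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent."""
    actual, forecast = np.asarray(actual), np.asarray(forecast)
    return np.mean(np.abs((actual - forecast) / actual)) * 100

# Fit SES on the training data; the smoothing level is optimised automatically.
ses_model = SimpleExpSmoothing(train['Rose']).fit(optimized=True)
print(ses_model.params)

# Forecast over the test horizon and compute the error metrics.
ses_pred = ses_model.forecast(steps=len(test))
print('RMSE:', np.sqrt(mean_squared_error(test['Rose'], ses_pred)))
print('MAPE:', mape(test['Rose'], ses_pred))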

2. Double Exponential Smoothing model :


Checking the Parameters :

After plotting the graph for SES and DES model :



Checking the MAPE and RMSE for the DES model :

Comparing both the SES and DES Model :

3. Triple Exponential Smoothing model (Additive) :


Checking the Parameters :

Checking first few predicted rows:



Checking the MAPE and RMSE for the TES(Additive) model :
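A sketch of the TES (Holt-Winters) fit with additive trend and additive seasonality, reusing 'train', 'test' and the 'mape' helper from the SES sketch; the multiplicative variant only changes the 'seasonal' argument:

import numpy as np
from sklearn.metrics import mean_squared_error
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Additive trend and additive seasonality with a 12-month seasonal period.
tes_add = ExponentialSmoothing(train['Rose'], trend='additive',
                               seasonal='additive',
                               seasonal_periods=12).fit()
tes_add_pred = tes_add.forecast(steps=len(test))
print('RMSE:', np.sqrt(mean_squared_error(test['Rose'], tes_add_pred)))
print('MAPE:', mape(test['Rose'], tes_add_pred))

# Multiplicative seasonality, used for the next model in the report.
tes_mul = ExponentialSmoothing(train['Rose'], trend='additive',
                               seasonal='multiplicative',
                               seasonal_periods=12).fit()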

4. Triple Exponential Smoothing model (Multiplicative) :


Checking the Parameters :

Checking first few predicted rows:



Checking the MAPE and RMSE for the TES(Multiplicative) model :

5. Linear Regression Model :


For this particular linear regression, we are going to regress the ‘Rose’ variable against the order of occurrence. For this we need to modify our training data before fitting the linear regression.

Now that our training and test data have been modified, let us go ahead and use Linear Regression to build the model on the training data and evaluate it on the test data.
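A sketch of the modification (a numeric time index that starts at 1 over the training rows and continues through the test rows) and the regression fit; this feature construction is an assumption about how the report prepared the data:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Order of occurrence: 1, 2, 3, ... over the training rows, continuing
# through the test rows.
train_time = np.arange(1, len(train) + 1).reshape(-1, 1)
test_time = np.arange(len(train) + 1, len(train) + len(test) + 1).reshape(-1, 1)

lr = LinearRegression().fit(train_time, train['Rose'])
lr_pred = lr.predict(test_time)
print('RMSE:', np.sqrt(mean_squared_error(test['Rose'], lr_pred)))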

On Applying Linear Regression :

Checking the MAPE and RMSE for the Linear Regression model :

6. Naive Model :

Checking first few predicted rows:

After plotting the graph:

Checking the MAPE and RMSE for the Naive model :

7. Simple Average Model :

Checking first few predicted rows:



After plotting the graph:

Checking the MAPE and RMSE for the Simple Average model :
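For reference, the naive forecast repeats the last training observation over the whole test horizon, while the simple average forecast repeats the training mean; a sketch of both:

import numpy as np
import pandas as pd
from sklearn.metrics import mean_squared_error

# Naive model: every test-period forecast equals the last observed train value.
naive_pred = pd.Series(train['Rose'].iloc[-1], index=test.index)

# Simple average model: every test-period forecast equals the training mean.
avg_pred = pd.Series(train['Rose'].mean(), index=test.index)

print('Naive RMSE:', np.sqrt(mean_squared_error(test['Rose'], naive_pred)))
print('Average RMSE:', np.sqrt(mean_squared_error(test['Rose'], avg_pred)))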

Now comparing all the models' performance so far:

It can be clearly seen that the Triple Exponential Smoothing (Additive) model has performed the best so far, with the minimum Root Mean Square Error and the lowest Mean Absolute Percentage Error. However, the Linear Regression model, the Double Exponential Smoothing model and the Triple Exponential Smoothing (Multiplicative) model have also performed well on the test data.

1.5 Check for the stationarity of the data on which the model is being built on using
appropriate statistical tests and also mention the hypothesis for the statistical test. If
the data is found to be non-stationary, take appropriate steps to make it stationary.
Check the new data for stationarity and comment.
Note: Stationarity should be checked at alpha = 0.05.

Plotting the graph for train data :

Checking at the 95% confidence level whether the data is stationary, using the Augmented Dickey-Fuller test (alpha = 0.05). The null hypothesis of the test is that the series has a unit root, i.e. it is non-stationary; the alternative hypothesis is that the series is stationary:

Since the p-value is more than 0.05, the training data is non-stationary at the 95% confidence level. Let us take a first order of differencing to stationarize the Time Series.

The training data is now stationary after first-order differencing and can be used for the ARIMA models. We do not need to worry about stationarity of the test data, because we are not building any models on it; we only evaluate our models there.
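A sketch of the ADF test on the training series and on its first difference:

from statsmodels.tsa.stattools import adfuller

# ADF test on the raw training series.
# Null hypothesis: the series has a unit root (non-stationary).
stat, pvalue, *rest = adfuller(train['Rose'])
print('p-value (level):', pvalue)            # > 0.05 here, so non-stationary

# First-order differencing, then re-test.
train_diff = train['Rose'].diff().dropna()
stat, pvalue, *rest = adfuller(train_diff)
print('p-value (1st difference):', pvalue)   # < 0.05, so stationary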

1.6 Build an automated version of the ARIMA/SARIMA model in which the parameters are
selected using the lowest Akaike Information Criteria (AIC) on the training data and
evaluate this model on the test data using RMSE.

Auto ARIMA:

ARIMA models sorted by AIC value (ascending):

Hence, as the lowest Akaike Information Criterion is obtained for the order (2,1,3), we build an ARIMA(2,1,3) model.
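A sketch of how such a search could be run, fitting candidate orders and ranking them by AIC (the search ranges are assumptions; d = 1 comes from the differencing step above):

import itertools
from statsmodels.tsa.arima.model import ARIMA

# Fit every (p, 1, q) candidate in a small grid and record its AIC.
candidates = []
for p, q in itertools.product(range(4), range(4)):
    try:
        fit = ARIMA(train['Rose'], order=(p, 1, q)).fit()
        candidates.append(((p, 1, q), fit.aic))
    except Exception:
        continue   # skip orders that fail to converge

# Lowest AIC first; the report selects ARIMA(2,1,3) this way.
for order, aic in sorted(candidates, key=lambda item: item[1]):
    print(order, aic)

auto_arima = ARIMA(train['Rose'], order=(2, 1, 3)).fit()
print(auto_arima.summary())
auto_pred = auto_arima.forecast(steps=len(test))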

Other Diagnostics results for the model ARIMA(2,1,3) :



First few rows of the predicted data by the model ARIMA(2,1,3) :

Checking the RMSE and MAPE of the model :

Plotting the graph with the predicted values :

1.7 Build ARIMA/SARIMA models based on the cut-off points of ACF and PACF on
the training data and evaluate this model on the test data using RMSE.

ACF Plot for the train data :



From the ACF plot we get the value q = 2.


PACF Plot for the train set :

From the PACF plot we get the value p = 2.


Hence we build an ARIMA(2,1,2) model:
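A sketch of the ACF/PACF plots on the differenced training series and the resulting ARIMA(2,1,2) fit and evaluation:

import numpy as np
from sklearn.metrics import mean_squared_error
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.arima.model import ARIMA

# The significant lags of the ACF suggest q, those of the PACF suggest p.
diff = train['Rose'].diff().dropna()
plot_acf(diff, lags=20)
plot_pacf(diff, lags=20)

manual_arima = ARIMA(train['Rose'], order=(2, 1, 2)).fit()
manual_pred = manual_arima.forecast(steps=len(test))
print('RMSE:', np.sqrt(mean_squared_error(test['Rose'], manual_pred)))
manual_arima.plot_diagnostics(figsize=(12, 8))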

Checking the RMSE and MAPE of the model:

Checking the diagnostics for the Manual Model :



1.8 Build a table with all the models built along with their corresponding parameters and
the respective RMSE values on the test data.
1.9 Based on the model-building exercise, build the most optimum model(s) on the
complete data and predict 12 months into the future with appropriate confidence
intervals/bands.

Comparing all the Models :



As we can see, the optimum model is the one with the minimum MAPE and the lowest RMSE. Hence we choose the Triple Exponential Smoothing (Additive) model for forecasting the future.

For predicting the next 12 months, we need to train our TES model on all the data we have been provided with. So we take the whole dataset (1981-1995) as the training data and then fit the model.
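A sketch of refitting the TES (Additive) model on the complete Rose series and forecasting 12 months ahead. The report does not show how its confidence bands were computed; approximate bands of plus/minus 1.96 in-sample residual standard deviations are used here purely as an illustration:

import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Refit on the full series before forecasting beyond the observed data.
full_model = ExponentialSmoothing(df['Rose'], trend='additive',
                                  seasonal='additive',
                                  seasonal_periods=12).fit()
print(full_model.params)

# Twelve months ahead: August 1995 to July 1996.
forecast = full_model.forecast(steps=12)

# Approximate band from the spread of the in-sample residuals
# (an illustrative choice, not necessarily the report's method).
resid_std = (df['Rose'] - full_model.fittedvalues).std()
bands = pd.DataFrame({'forecast': forecast,
                      'lower': forecast - 1.96 * resid_std,
                      'upper': forecast + 1.96 * resid_std})
print(bands)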
Checking the parameters for the model :

Checking the values predicted for the next 12 months :



Plotting the graph for the forecasting done by the model:



Next, we analyse the data for the wine “SPARKLING”.

1.1 Read the dataset as an appropriate Time Series Data and plot the data.

Checking the first few rows of the data set :

Checking the last few rows of the data set :

Checking the data description :

Checking the null values:

Plotting the graph for the dataset :



1.2 Perform appropriate Exploratory Data Analysis to understand the data and also
perform decomposition.

Yearly Box-plots:

Monthly Box-Plots:

Monthly Graph :

Monthly Sales Across Years :



Decomposing the dataset into trend, Seasonality and Residual :


Additive Model:

Multiplicative Model:

1.3 Split the data into training and test. The test data should start in 1991.
The data is now split so that observations from January 1991 onward form the test set. Viewing the train and test sets after splitting:

Plotting the graph for Train - Test Set:

1.4 Build all the exponential smoothing models on the training data and evaluate the models using RMSE on the test data. Other additional models such as regression, naïve forecast, simple average and moving average models should also be built on the training data, and their performance checked on the test data using RMSE.

8. Single Exponential Smoothing model :


Params:

First few rows of predicted data after applying SES model :

Checking the MAPE and RMSE for the model :

9. Double Exponential Smoothing model :


Checking the Parameters :

First few rows of predicted data after applying DES model :

After plotting the graph for SES and DES model :

Checking the MAPE and RMSE for the DES model :

Comparing both the SES and DES Model :

10. Triple Exponential Smoothing model (Additive) :


Checking the Parameters :

Checking first few predicted rows:

Checking the MAPE and RMSE for the TES(Additive) model :

11. Triple Exponential Smoothing model (Multiplicative) :


Checking the Parameters :

Checking first few predicted rows:

Checking the MAPE and RMSE for the TES(Multiplicative) model :

12. Linear Regression Model :


For this particular linear regression, we are going to regress the ‘Sparkling’ variable against the order of occurrence. For this we need to modify our training data before fitting the linear regression.

Now that our training and test data have been modified, let us go ahead and use Linear Regression to build the model on the training data and evaluate it on the test data.
On Applying Linear Regression :

Checking the MAPE and RMSE for the Linear Regression model :

13. Naive Model :

Checking first few predicted rows:

After plotting the graph:

Checking the MAPE and RMSE for the Naive model :

14. Simple Average Model :

Checking first few predicted rows:



After plotting the graph:

Checking the MAPE and RMSE for the Simple Average model :

Now comparing all the models' performance so far:

It can be clearly seen that the Triple Exponential Smoothing (Additive) model has performed the best so far, with the minimum Root Mean Square Error and the lowest Mean Absolute Percentage Error. However, the Linear Regression model, the Double Exponential Smoothing model and the Triple Exponential Smoothing (Multiplicative) model have also performed well on the test data.

1.5 Check for the stationarity of the data on which the model is being built on using
appropriate statistical tests and also mention the hypothesis for the statistical test. If
the data is found to be non-stationary, take appropriate steps to make it stationary.
Check the new data for stationarity and comment.
Note: Stationarity should be checked at alpha = 0.05.

Plotting the graph for train data :

Checking at the 95% confidence level whether the data is stationary, using the Augmented Dickey-Fuller test (alpha = 0.05). The null hypothesis of the test is that the series has a unit root, i.e. it is non-stationary; the alternative hypothesis is that the series is stationary:

Since the p-value is more than 0.05, the training data is non-stationary at the 95% confidence level. Let us take a first order of differencing to stationarize the Time Series.

The training data is now stationary after first-order differencing and can be used for the ARIMA models. We do not need to worry about stationarity of the test data, because we are not building any models on it; we only evaluate our models there.

1.6 Build an automated version of the ARIMA/SARIMA model in which the parameters are
selected using the lowest Akaike Information Criteria (AIC) on the training data and
evaluate this model on the test data using RMSE.

Auto ARIMA:

ARIMA models sorted by AIC value (ascending):



Hence, as the lowest Akaike Information Criterion is obtained for the order (2,1,2), we build an ARIMA(2,1,2) model.

Other Diagnostics results for the model ARIMA(2,1,2) :

First few rows of the predicted data by the model ARIMA(2,1,2) :

Checking the RMSE and MAPE of the model :

Plotting the graph with the predicted values :



1.7 Build ARIMA/SARIMA models based on the cut-off points of ACF and PACF on
the training data and evaluate this model on the test data using RMSE.

ACF Plot for the train data :

From the ACF plot we get the value q = 0.


PACF Plot for the train set :

From the PACF plot we get the value p = 0.


Hence we build an ARIMA(0,1,0) model:
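With p = 0 and q = 0 at one order of differencing, ARIMA(0,1,0) is effectively a random-walk model, so its out-of-sample forecast stays flat at the last observed training value. A sketch of the fit and evaluation, reusing the Sparkling train/test split (column name 'Sparkling' assumed):

import numpy as np
from sklearn.metrics import mean_squared_error
from statsmodels.tsa.arima.model import ARIMA

# ARIMA(0,1,0): no AR or MA terms, only first differencing (a random walk).
rw_model = ARIMA(train['Sparkling'], order=(0, 1, 0)).fit()
rw_pred = rw_model.forecast(steps=len(test))
print('RMSE:', np.sqrt(mean_squared_error(test['Sparkling'], rw_pred)))
rw_model.plot_diagnostics(figsize=(12, 8))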

Checking the RMSE and MAPE of the model:

Checking the diagnostics for the Manual Model :



Other Diagnostics results for the model ARIMA(0,1,0) :

1.8 Build a table with all the models built along with their corresponding parameters and
the respective RMSE values on the test data.
1.9 Based on the model-building exercise, build the most optimum model(s) on the
complete data and predict 12 months into the future with appropriate confidence
intervals/bands.

Comparing all the Models :

As we can see, the optimum model is the one with the minimum MAPE and the lowest RMSE. Hence we choose the Triple Exponential Smoothing (Additive) model for forecasting the future.
For predicting the next 12 months, we need to train our TES model on all the data we have been provided with. So we take the whole dataset (1981-1995) as the training data and then fit the model.
Checking the parameters for the model :

Checking the values predicted for the next 12 months :



Plotting the graph for the forecasting done by the model:

1.10 Comment on the model thus built and report your findings and suggest the
measures that the company should be taking for future sales.

ABC Estate Wines produces two types of wines, namely ROSE and SPARKLING.
The data is provided from January 1980 to July 1995.
First, we analyse the ROSE wine:
1. It is found that the sales show a declining trend over the years.
2. Seasonality is present in the data, with the highest-selling month being December.
3. Different time series forecasting models are tried on the dataset for prediction purposes and are successfully executed.
4. We predict the next 12 months' sales (August 1995 to July 1996) using the best-fit model, i.e. Triple Exponential Smoothing.

5. For future sales, the company should conduct a market survey to gather customer feedback and dig deeper into the reasons behind the declining trend.
Second, we analyse the Sparkling wine:
6. It is found that sales show an increasing trend during the late 1980s, followed by slow or stable growth during the 1990s.
7. Seasonality is present in the data, with the highest-selling month being December.
8. Different time series forecasting models are tried on the dataset for prediction purposes and are successfully executed.
9. We predict the next 12 months' sales (August 1995 to July 1996) using the best-fit model, i.e. Triple Exponential Smoothing.
10. Sales of this wine are doing well but could do better if other aspects are enhanced after proper research.

THANK YOU
