Time Series Forecasting Report: Graded Project (DSBA)
Problem:
For this assignment, sales data for two different types of wine in the 20th century is to be analysed. Both series come from the same company but cover different wines. As an analyst at ABC Estate Wines, you are tasked with analysing and forecasting wine sales in the 20th century.
Part A: Rose Wine

1.1 Read the dataset as an appropriate Time Series Data and plot the data.
Since there are 2 null values present in the dataset, we have to impute them before proceeding further.
Plotting the graph before imputing:
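The read-and-impute step above can be sketched as follows. This is a minimal sketch: the file name "Rose.csv" and the column names are assumptions, so a small in-memory frame stands in for the actual `pd.read_csv("Rose.csv", parse_dates=[...])` call.

```python
import numpy as np
import pandas as pd

# Stand-in for the real dataset; in practice this would come from
# pd.read_csv("Rose.csv", parse_dates=["YearMonth"], index_col="YearMonth").
raw = pd.DataFrame(
    {"Rose": [112.0, np.nan, 118.0, np.nan, 121.0]},
    index=pd.date_range("1980-01-01", periods=5, freq="MS"),
)
series = raw["Rose"].asfreq("MS")              # enforce month-start frequency
print(series.isna().sum())                     # count the missing values
imputed = series.interpolate(method="linear")  # impute gaps by linear interpolation
```

Linear interpolation is a reasonable default here because monthly sales change gradually, but any imputation choice should be stated in the report.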
1.2 Perform appropriate Exploratory Data Analysis to understand the data and also
perform decomposition.
Yearly Box-plots:
Monthly Box-plots:
Monthly Graph:
Multiplicative Model:
Since the residual pattern is almost the same in both decomposition graphs, we will proceed with the Additive model only.
1.3 Split the data into training and test. The test data should start in 1991.
The data is split at 1991: observations from 1991 onwards form the test set. Viewing the train and test datasets after splitting:
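The chronological split above can be sketched as follows (the index range January 1980 to July 1995 matches the dataset described in the report; the values are placeholders):

```python
import numpy as np
import pandas as pd

idx = pd.date_range("1980-01-01", "1995-07-01", freq="MS")
sales = pd.Series(np.arange(len(idx), dtype=float), index=idx)

# Test set starts in January 1991; everything before is training data.
cutoff = pd.Timestamp("1991-01-01")
train = sales[sales.index < cutoff]
test = sales[sales.index >= cutoff]
print(len(train), len(test))
```

Note that time series must be split chronologically, never randomly, so that the model is always evaluated on observations that come after its training window.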
1.4 Build all the exponential smoothing models on the training data and evaluate the
model using RMSE on the test data. Other additional models such as regression, naïve
forecast models, simple average models, moving average models should also be built on
the training data and check the performance on the test data using RMSE .
Now that the training and test data are ready, let us use Linear Regression to build a model on the training data and evaluate it on the test data.
Checking the MAPE and RMSE for the Linear Regression model:
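A minimal sketch of the regression-on-time approach and the two error metrics, using plain numpy least squares so no extra dependencies are assumed (the sales values below are illustrative placeholders):

```python
import numpy as np

# Placeholder train/test sales values; the regressor is the time index itself.
train_y = np.array([112.0, 118, 132, 129, 121, 135, 148, 148, 136, 119])
test_y = np.array([104.0, 118, 115, 126, 141])
t_train = np.arange(len(train_y))
t_test = np.arange(len(train_y), len(train_y) + len(test_y))

# Ordinary least squares on the time index (degree-1 polynomial fit).
slope, intercept = np.polyfit(t_train, train_y, 1)
pred = slope * t_test + intercept

rmse = np.sqrt(np.mean((test_y - pred) ** 2))                 # root mean square error
mape = np.mean(np.abs((test_y - pred) / test_y)) * 100        # mean abs. pct. error
print(round(rmse, 2), round(mape, 2))
```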
Naive Model:
Checking the MAPE and RMSE for the Simple Average model :
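The two benchmark models above can be sketched side by side (placeholder values; a Naive forecast repeats the last observed training value, a Simple Average forecast repeats the training mean):

```python
import numpy as np
import pandas as pd

train = pd.Series([112.0, 118, 132, 129, 121, 135, 148, 148])
test = pd.Series([136.0, 119, 104, 118], index=range(8, 12))

naive_pred = pd.Series(train.iloc[-1], index=test.index)     # repeat last observation
simple_avg_pred = pd.Series(train.mean(), index=test.index)  # repeat training mean

def rmse(actual, pred):
    return np.sqrt(np.mean((actual - pred) ** 2))

print(round(rmse(test, naive_pred), 2), round(rmse(test, simple_avg_pred), 2))
```

These baselines matter mainly as a sanity check: any fitted model should beat them, otherwise the extra complexity is not earning its keep.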
It can be clearly seen that the Triple Exponential Smoothing (Additive) model has performed the best so far, with the minimum Root Mean Square Error and the lowest Mean Absolute Percentage Error. However, the Linear Regression, Double Exponential Smoothing, and Triple Exponential Smoothing (Multiplicative) models have also performed well on the given test set.
1.5 Check for the stationarity of the data on which the model is being built on using
appropriate statistical tests and also mention the hypothesis for the statistical test. If
the data is found to be non-stationary, take appropriate steps to make it stationary.
Check the new data for stationarity and comment.
Note: Stationarity should be checked at alpha = 0.05.
Checking whether the data is stationary at the 95% confidence level using the Augmented Dickey-Fuller (ADF) test (alpha = 0.05). The null hypothesis is that the series has a unit root (i.e. is non-stationary); the alternative hypothesis is that the series is stationary.
Since the p-value is greater than 0.05, the training data is non-stationary at the 95% confidence level. Let us take a first-order difference to make the Time Series stationary.
The training data is now stationary after first-order differencing and can be used for the ARIMA models. We do not need to worry about stationarity of the test data, because we are not building any models on it; we only evaluate our models there.
1.6 Build an automated version of the ARIMA/SARIMA model in which the parameters are
selected using the lowest Akaike Information Criteria (AIC) on the training data and
evaluate this model on the test data using RMSE.
Auto ARIMA:
Hence, since the lowest Akaike Information Criterion is obtained for the order (2,1,3), we build an ARIMA(2,1,3) model.
1.7 Build ARIMA/SARIMA models based on the cut-off points of ACF and PACF on
the training data and evaluate this model on the test data using RMSE.
1.8 Build a table with all the models built along with their corresponding parameters and
the respective RMSE values on the test data.
1.9 Based on the model-building exercise, build the most optimum model(s) on the
complete data and predict 12 months into the future with appropriate confidence
intervals/bands.
As we can see, the optimum model is the one with the minimum MAPE and the lowest RMSE. Hence we choose the Triple Exponential Smoothing (Additive) model for producing the future forecast.
To predict the next 12 months, we need to train our TES model on all the data we have been provided with. For this we take the whole dataset (1981-1995) as the training data and then fit the model.
Checking the parameters for the model:
Part B: Sparkling Wine

1.1 Read the dataset as an appropriate Time Series Data and plot the data.
1.2 Perform appropriate Exploratory Data Analysis to understand the data and also
perform decomposition.
Yearly Box-plots:
Monthly Box-plots:
Monthly Graph:
Multiplicative Model:
1.3 Split the data into training and test. The test data should start in 1991.
The data is split at 1991: observations from 1991 onwards form the test set. Viewing the train and test datasets after splitting:
1.4 Build all the exponential smoothing models on the training data and evaluate the
model using RMSE on the test data. Other additional models such as regression, naïve
forecast models, simple average models, moving average models should also be built on
the training data and check the performance on the test data using RMSE .
Now that the training and test data are ready, let us use Linear Regression to build a model on the training data and evaluate it on the test data.
On applying Linear Regression:
Checking the MAPE and RMSE for the Linear Regression model:
Checking the MAPE and RMSE for the Simple Average model :
It can be clearly seen that the Triple Exponential Smoothing (Additive) model has performed the best so far, with the minimum Root Mean Square Error and the lowest Mean Absolute Percentage Error. However, the Linear Regression, Double Exponential Smoothing, and Triple Exponential Smoothing (Multiplicative) models have also performed well on the given test set.
1.5 Check for the stationarity of the data on which the model is being built on using
appropriate statistical tests and also mention the hypothesis for the statistical test. If
the data is found to be non-stationary, take appropriate steps to make it stationary.
Check the new data for stationarity and comment.
Note: Stationarity should be checked at alpha = 0.05.
Checking whether the data is stationary at the 95% confidence level using the Augmented Dickey-Fuller (ADF) test (alpha = 0.05). The null hypothesis is that the series has a unit root (i.e. is non-stationary); the alternative hypothesis is that the series is stationary.
Since the p-value is greater than 0.05, the training data is non-stationary at the 95% confidence level. Let us take a first-order difference to make the Time Series stationary.
The training data is now stationary after first-order differencing and can be used for the ARIMA models. We do not need to worry about stationarity of the test data, because we are not building any models on it; we only evaluate our models there.
1.6 Build an automated version of the ARIMA/SARIMA model in which the parameters are
selected using the lowest Akaike Information Criteria (AIC) on the training data and
evaluate this model on the test data using RMSE.
Auto ARIMA:
Hence, since the lowest Akaike Information Criterion is obtained for the order (2,1,2), we build an ARIMA(2,1,2) model.
1.7 Build ARIMA/SARIMA models based on the cut-off points of ACF and PACF on
the training data and evaluate this model on the test data using RMSE.
1.8 Build a table with all the models built along with their corresponding parameters and
the respective RMSE values on the test data.
1.9 Based on the model-building exercise, build the most optimum model(s) on the
complete data and predict 12 months into the future with appropriate confidence
intervals/bands.
As we can see, the optimum model is the one with the minimum MAPE and the lowest RMSE. Hence we choose the Triple Exponential Smoothing (Additive) model for producing the future forecast.
To predict the next 12 months, we need to train our TES model on all the data we have been provided with. For this we take the whole dataset (1981-1995) as the training data and then fit the model.
Checking the parameters for the model:
1.10 Comment on the model thus built and report your findings and suggest the
measures that the company should be taking for future sales.
ABC Estate Wines produces two types of wine, namely Rose and Sparkling. The data is provided from January 1980 to July 1995.
First, we analyze the Rose wine:
1. It is found that sales show a declining trend over the years.
2. Seasonality is present in the data, with December being the highest-selling month.
3. Different Time Series Forecasting models were tried on the dataset for prediction purposes and were successfully executed.
4. We predict the next 12 months of sales (August 1995 - July 1996) using the best-fit model, i.e. Triple Exponential Smoothing.
5. For future sales, the company should conduct a market survey to gather customer feedback and investigate the reason behind the declining trend.
Second, we analyze the Sparkling wine:
6. It is found that sales show an increasing trend during the late 1980s, followed by slow or stable growth during the 1990s.
7. Seasonality is present in the data, with December being the highest-selling month.
8. Different Time Series Forecasting models were tried on the dataset for prediction purposes and were successfully executed.
9. We predict the next 12 months of sales (August 1995 - July 1996) using the best-fit model, i.e. Triple Exponential Smoothing.
10. Sales of this wine are doing well, but could do better if other aspects are enhanced after proper research.
THANK YOU