Time Series Forecasting-Sparkling
Content:
Problem Statement
1. Read the data as an appropriate Time Series data and plot the data.
2. Perform appropriate Exploratory Data Analysis to understand the data and also perform
decomposition.
3. Split the data into training and test. The test data should start in 1991.
4. Build all the exponential smoothing models on the training data and evaluate the model using
RMSE on the test data. Other additional models such as regression, naïve forecast models,
simple average models, moving average models should also be built on the training data and
check the performance on the test data using RMSE.
5. Check for the stationarity of the data on which the model is being built using appropriate
statistical tests and also mention the hypothesis for the statistical test. If the data is found to be
non-stationary, take appropriate steps to make it stationary. Check the new data for stationarity
and comment.
Note: Stationarity should be checked at alpha = 0.05.
6. Build an automated version of the ARIMA/SARIMA model in which the parameters are selected
using the lowest Akaike Information Criteria (AIC) on the training data and evaluate this model
on the test data using RMSE.
7. Build ARIMA/SARIMA models based on the cut-off points of ACF and PACF on the training data
and evaluate this model on the test data using RMSE.
8. Build a table with all the models built along with their corresponding parameters and the
respective RMSE values on the test data.
9. Based on the model-building exercise, build the most optimum model(s) on the complete data
and predict 12 months into the future with appropriate confidence intervals/bands.
10. Comment on the model thus built and report your findings and suggest the measures that the
company should be taking for future sales.
Figure Table:
1. Sparkling Year wise data plot
2. Month wise data plot
3. Monthly Sales across Year
4. Time Series Plot
5. Empirical Cumulative Distribution
6. Average Sparkling sales and percent change
7. Multiplicative decomposition
8. Additive decomposition
9. Linear Regression Model
10. Linear Regression Model
11. Naïve Forecast
12. Simple average forecast
13. Moving Average
14. Plotting on the whole data
15. Plotting on both the Training and Test data
16. Simple Exponential Smoothing
17. Triple Exponential Smoothing (Holt-Winters' Model)
18. Sparkling TES forecast
19. Rolling mean and Standard deviation
20. Automated ARIMA
21. Automated SARIMA model
22. Log Data Autocorrelation (acf)
23. Log Data Difference Autocorrelation (acf)
24. Log Data Autocorrelation (pacf)
25. Log Data Difference Autocorrelation (pacf)
26. Manual ARIMA
27. Manual SARIMA
28. The forecast along with the confidence band
Problem Statement:
For this particular assignment, data on the sales of different types of wine in the 20th century is to be
analysed. Both data sets are from the same company but cover different wines. As an analyst at ABC
Estate Wines, you are tasked to analyse and forecast wine sales in the 20th century.
1. Read the data as an appropriate Time Series data and plot the data.
Solution:
HEAD:
TAIL:
Info:
We have seen how to load the data from a ‘.csv’ file as a Time Series object. Let us go ahead and analyse
the Time Series plot that we got.
Solution:
The average sales of the Sparkling wine per month are around 2402.
We have converted the data into the Date format and named the resulting column Timestamp.
We can also drop the Year Month column, as the month, year and date are now held in the single column
Timestamp.
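The loading and conversion steps described above could look roughly like the sketch below. The column names `YearMonth` and `Sparkling`, the `%Y-%m` date format and the four inline rows are assumptions made for illustration; they are not taken from the actual file.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real '.csv' file (the values are made up).
csv_text = """YearMonth,Sparkling
1980-01,1000
1980-02,1100
1980-03,1200
1980-04,1300
"""

df = pd.read_csv(io.StringIO(csv_text))

# Convert the month labels to dates in a new Timestamp column.
df["Timestamp"] = pd.to_datetime(df["YearMonth"], format="%Y-%m")

# The original label column is now redundant, so drop it and index by Timestamp.
df = df.drop(columns=["YearMonth"]).set_index("Timestamp")

print(df.head())
print("Average monthly sales:", df["Sparkling"].mean())
```

Indexing by the Timestamp column gives a DatetimeIndex, which is what lets later steps slice the series by year.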
Yearly Boxplot-
Now, let us plot a box and whisker (1.5*IQR) plot to understand the spread of the data and check for
outliers in each year-
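As a rough illustration of the 1.5*IQR rule behind the whiskers, the sketch below computes the two fences for one group of values and lists the points that fall outside them. The helper name `iqr_fences` and the sample numbers are hypothetical, and the quartile method used here may differ slightly from the one used by a plotting library.

```python
import statistics


def iqr_fences(values, k=1.5):
    """Return the (lower, upper) fences; points outside them count as outliers."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Made-up monthly sales for a single year, with one unusually high month.
sales = [1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400, 2500, 6000]

low, high = iqr_fences(sales)
outliers = [v for v in sales if v < low or v > high]
print(low, high, outliers)
```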
As we saw in the time series plot, the box plots here also indicate that some trend is
present. We also see that the sales of Sparkling wine have some outliers in certain years.
Monthly Boxplot-
Since this is monthly data, let us plot a box and whisker (1.5*IQR) plot to understand the spread of the
data and check for outliers for every month across all the years, if any.
The highest sales are recorded in the month of December across the various years.
Multiplicative-
We see from the plot of the residuals in the multiplicative decomposition that the residuals are located around 1.
Additive-
For the additive model we see that the residual values are around 0, and for the multiplicative model we
see that the residuals are around 1.
3. Split the data into training and test. The test data should start in 1991.
Solution:
The training data runs until the end of 1990. The test data runs from the beginning of 1991 to the last
time stamp provided.
It is difficult to predict future observations if such an instance has not happened in the
past. With our train-test split, we are predicting behaviour similar to that of the past years.
The train data of Sparkling wine sales covers the period up to the end of 1990 and has 132 data
points.
The test data of Sparkling wine sales covers the period from 1991 onwards and has 55 data
points.
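A minimal sketch of such a split by calendar year follows. The values are placeholders, and the span from January 1980 to July 1995 is an assumption inferred from the 132 and 55 monthly data points mentioned above.

```python
import numpy as np
import pandas as pd

# Placeholder monthly series; only the index matters for the split.
idx = pd.date_range("1980-01-01", "1995-07-01", freq="MS")
series = pd.Series(np.arange(len(idx), dtype=float), index=idx, name="Sparkling")

# Everything before 1991 is training data, everything from 1991 on is test data.
train = series[series.index.year < 1991]
test = series[series.index.year >= 1991]

print(len(train), len(test))
print(train.index.max(), test.index.min())
```

Splitting on the index keeps the time order intact, which matters because a random split would leak future observations into the training set.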
4. Build all the exponential smoothing models on the training data and evaluate
the model using RMSE on the test data. Other models such as regression, naïve
forecast models and simple average models should also be built on the training
data and checked for performance on the test data using RMSE.
Solution:
Now that our training and test data have been modified, let us go ahead and use Linear Regression to build the
model on the training data and test the model on the test data.
Evaluate this model on the test data using Root Mean Squared Error (RMSE):
For Regression on Time forecast on the Test Data, RMSE is 1389.135
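Regression on time can be sketched as fitting a straight line to the time index 1, 2, 3, ... of the training data and extending that line over the test period. The series below is synthetic and `np.polyfit` is used here only as a convenient least-squares fit; the report's own implementation may differ.

```python
import numpy as np


def rmse(actual, predicted):
    """Root Mean Squared Error between two equally long sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


# Synthetic series: linear trend plus a 12-step seasonal wiggle.
t = np.arange(1, 49)
y = 100 + 2.0 * t + 5.0 * np.sin(2 * np.pi * t / 12)

t_train, t_test = t[:36], t[36:]
y_train, y_test = y[:36], y[36:]

# Fit y = intercept + slope * t on the training part only.
slope, intercept = np.polyfit(t_train, y_train, deg=1)
y_pred = intercept + slope * t_test

print("slope:", slope, "intercept:", intercept)
print("test RMSE:", rmse(y_test, y_pred))
```

A straight line cannot follow the seasonal pattern, which is one reason this kind of model tends to show a sizeable RMSE on seasonal data.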
Model Evaluation:
For Naive forecast on the Test Data, RMSE is 3864.279
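For reference, a naive forecast simply carries the last training observation forward over the whole test period. The sketch below uses a hypothetical helper `naive_forecast` and made-up numbers.

```python
import math


def naive_forecast(train, horizon):
    """Repeat the last training observation for every future step."""
    return [train[-1]] * horizon


def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


train = [1200.0, 1500.0, 1300.0, 4000.0]  # made-up values ending on a peak
test = [1400.0, 1600.0, 1500.0]

pred = naive_forecast(train, len(test))
print(pred, rmse(test, pred))
```

When the training data ends on a seasonal peak, as in this toy example, every forecast inherits that peak, which can produce a large test error.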
Model Evaluation:
For Simple Average forecast on the Test Data, RMSE is 1275.082
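The simple average forecast uses the mean of all training observations as the prediction for every test point. A minimal sketch with a hypothetical helper and made-up numbers:

```python
def simple_average_forecast(train, horizon):
    """Forecast every future step with the mean of the training data."""
    mean = sum(train) / len(train)
    return [mean] * horizon


train = [1000.0, 2000.0, 3000.0]  # made-up values
pred = simple_average_forecast(train, 4)
print(pred)
```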
The best interval (window) can be determined by the maximum accuracy or, equivalently, the minimum error.
For the moving average, we are going to compute the rolling averages over the entire data.
Let us split the data into train and test and plot this Time Series. The window of the moving average
needs to be carefully selected, as too big a window will leave no test set because the whole
series might get averaged over.
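The effect of the window size can be seen on a tiny made-up series. The sketch assumes a trailing moving average computed with pandas `rolling(window).mean()`; each window leaves `window - 1` points at the start without a value.

```python
import pandas as pd

series = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0, 60.0])  # made-up values

for window in (2, 4):
    ma = series.rolling(window).mean()
    # The first (window - 1) positions have no complete window and are NaN.
    print(window, int(ma.isna().sum()), ma.dropna().tolist())

ma_2 = series.rolling(2).mean()
ma_4 = series.rolling(4).mean()
```

With a window of 4 only three averaged points remain out of six, which is the loss of usable data that the text warns about.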
Model Evaluation
Before we go on to build the various Exponential Smoothing models, let us plot all the models and
compare the Time Series plots.
# Plotting on both Training and Test data-
For Alpha =0.995 Simple Exponential Smoothing Model forecast on the Test Data, RMSE is 1316.035
First, we will define an empty data frame to store our values from the loop
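A loop of this kind could be sketched as follows. This is a hand-written Simple Exponential Smoothing recursion, not the library routine used in the report; starting the level at the first observation and the grid of alpha values from 0.1 to 0.9 are assumptions made for illustration.

```python
import math


def ses_forecast(train, alpha, horizon):
    """Simple exponential smoothing: flat forecast at the last smoothed level."""
    level = train[0]  # assumed initial level
    for y in train[1:]:
        level = alpha * y + (1 - alpha) * level
    return [level] * horizon


def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


train = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0]  # made-up values
test = [13.0, 15.0, 14.0]

results = []  # plays the role of the empty data frame filled by the loop
for step in range(1, 10):
    alpha = step / 10
    pred = ses_forecast(train, alpha, len(test))
    results.append({"alpha": alpha, "rmse": rmse(test, pred)})

best = min(results, key=lambda row: row["rmse"])
print(best)
```

Each row stores one alpha with its test RMSE, so the best value can be read off by sorting or by taking the minimum.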
Model Evaluation-
Alpha= 0.6885714285714285
Beta= 9.999999999999999e-05
Model Evaluation for Alpha = 0.68 and Beta = 0.0 : DES-Autofit Model:
For Alpha =0.68 Double Exponential Smoothing Model forecast on the Test Data, RMSE is 2007.239
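For intuition, Double Exponential Smoothing (Holt's linear method) keeps a level and a trend and extends the trend into the future. The recursion below is a hand-written sketch with an assumed initialisation (level from the first point, trend from the first difference); a library fit may initialise and optimise differently.

```python
def holt_forecast(train, alpha, beta, horizon):
    """Holt's linear method: smoothed level plus smoothed trend."""
    level = train[0]
    trend = train[1] - train[0]  # assumed initial trend
    for y in train[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    # The h-step-ahead forecast extends the last level along the last trend.
    return [level + h * trend for h in range(1, horizon + 1)]


# On a perfectly linear series the forecast continues the line.
pred = holt_forecast([1.0, 2.0, 3.0, 4.0], alpha=0.68, beta=0.0001, horizon=2)
print(pred)
```

With a beta close to zero the trend is almost never updated, so the forecast is a straight line that cannot follow a seasonal pattern.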
First we will define an empty data frame to store our values from the loop
We have built several models and gone through a model-building exercise. This exercise has given us an
idea as to which model gives us the least error on our test set for this data. But in time series
forecasting, we need to be careful about the fact that after we have done this exercise, we need to build
the model on the whole data. Remember, the training data that we have used to build the model stops
much before the data ends. In order to forecast using any of the models built, we need to build the
models again (this time on the complete data) with the same parameters.
The two models to be built on the whole data are the following:
The model is fitted with the parameters that Python finds to be best for the model. It uses a brute-force
method to choose the parameters.
Alpha= 0.11133818361298699
Beta=0.049505131019509915
Gamma=0.3620795793580111
Model Evaluation for alpha = 0.11, beta = 0.7 and gamma = 0.395: TES-Autofit Model:
For Auto-fit Triple Exponential Smoothing Model forecast on the Test Data, RMSE is 404.287
First, we will define an empty data frame to store our values from the loop
5. Check for the stationarity of the data on which the model is being built
using appropriate statistical tests and also mention the hypothesis for the
statistical test. If the data is found to be non-stationary, take appropriate steps
to make it stationary. Check the new data for stationarity and comment. Note:
Stationarity should be checked at alpha = 0.05.
Solution:
Test for stationarity of the series: Dickey-Fuller test
Null hypothesis H0: the series is not stationary
Alternative hypothesis H1: the series is stationary
At alpha = 0.05, we reject H0 when the p-value of the test is below 0.05.
From the above plots we can say that there seems to be seasonality in the data.
We have kept the value of d as 1, as we need to take a first difference of the series to make it stationary.
The following loop gives us the combinations of the parameters p and q in the range 0 to
2.
We sort the AIC values below in ascending order to get the parameters for the minimum AIC
value-
Sort the above AIC values in the ascending order to get the parameters for the minimum AIC value
Predict on the Test Set using this model and evaluate the model.
RMSE: 1299.9796397916396
MAPE: 47.09998646565863
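For reference, the two metrics can be computed as in the sketch below. MAPE is expressed in percent here, which is an assumption about how the value above is scaled; the numbers in the example are made up.

```python
import math


def rmse(actual, predicted):
    """Root Mean Squared Error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent (actual values must be non-zero)."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]
print(rmse(actual, predicted), mape(actual, predicted))
```

RMSE is in the units of the series and penalises large misses more heavily, while MAPE is scale-free, so the two can rank models differently.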
Build an Automated version of a SARIMA model for which the best parameters are selected in
accordance with the lowest Akaike Information Criteria (AIC).
Predict on the Test Set using this model and evaluate the model.
Predict on the Test Set using this model and evaluate the model.
spark_forecasted_log-
7. Build ARIMA/SARIMA models based on the cut-off points of ACF and PACF on
the training data and evaluate this model on the test data using RMSE.
Solution:
Manual ARIMA :
We built a manual ARIMA model for Sparkling sales based on the ACF and PACF plots.
We chose the AR parameter p = 1, the moving average parameter q = 2 and the differencing order d = 1 based on
the plots below.
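Reading cut-off points from an ACF plot amounts to checking which lags lie outside the approximate 95% band of plus or minus 1.96 divided by the square root of n. The sketch below computes only the sample ACF, on synthetic white noise, with a hypothetical helper `sample_acf`; the PACF is read in the same way but is not computed here.

```python
import numpy as np


def sample_acf(x, nlags):
    """Sample autocorrelation for lags 0..nlags."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.sum(x ** 2)
    lags = [np.sum(x[k:] * x[:-k]) / denom for k in range(1, nlags + 1)]
    return np.array([1.0] + lags)


rng = np.random.default_rng(0)
noise = rng.normal(size=200)  # synthetic stand-in for a differenced series

acf = sample_acf(noise, 10)
bound = 1.96 / np.sqrt(len(noise))  # approximate 95% significance band

# Lags whose autocorrelation falls outside the band are candidates for q.
significant = [k for k in range(1, 11) if abs(acf[k]) > bound]
print("band:", bound, "significant lags:", significant)
```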
The data has some seasonality so we should build a SARIMA model to get better accuracy.
We built the ACF and the PACF plots once more for sparkling data. We choose the AR parameters
Now we see that there is almost no trend present in the data. Only seasonality is present in the data.
Check the stationarity of the above series before fitting the SARIMA model.
Checking the ACF and the PACF plots for the new modified Time Series.
Differenced Data Autocorrelation(acf)
Predict on the Test Set using this model and evaluate the model.
8. Build a table (create a data frame) with all the models built along with their
corresponding parameters and the respective RMSE values on the test data.
Solution:
We have consolidated the test results from the various models built in the process of forecasting
future Sparkling wine sales and obtained the following test RMSE scores, sorted from lowest to highest
values.
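A table of this kind can be assembled as a data frame and sorted by RMSE. The sketch below includes only the models whose test RMSE is stated in the text above, with shortened row labels chosen here for illustration; it is not the complete table of the report.

```python
import pandas as pd

# Test RMSE values as reported in the earlier sections (ARIMA value rounded).
results = pd.DataFrame(
    {"Test RMSE": [1389.135, 3864.279, 1275.082, 1316.035, 2007.239, 404.287, 1299.980]},
    index=[
        "Regression on Time",
        "Naive forecast",
        "Simple Average",
        "SES (Alpha = 0.995)",
        "DES (Alpha = 0.68)",
        "TES (auto-fit)",
        "Automated ARIMA",
    ],
)

results_sorted = results.sort_values("Test RMSE")
print(results_sorted)
```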
SARIMA Model-
We can see that we have annual seasonality rather than half-yearly seasonality.
Normally Distributed-
The upper and lower confidence bands were calculated at the 95% confidence level.
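Under the assumption of normally distributed forecast errors, a 95% band is the forecast plus or minus about 1.96 standard errors. The sketch below shows that arithmetic with a hypothetical helper and made-up numbers; a fitted model would normally supply the standard error for each forecast step.

```python
from statistics import NormalDist


def confidence_band(mean, std_error, level=0.95):
    """Return (lower, upper) bounds of a normal-theory confidence band."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% level
    return mean - z * std_error, mean + z * std_error


# Made-up forecast of 2000 units with a standard error of 100 units.
lower, upper = confidence_band(2000.0, 100.0)
print(round(lower, 1), round(upper, 1))
```

The band widens as the standard error grows, which is why forecasts further into the future usually carry wider bands.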
10. Comment on the model thus built and report your findings and suggest the
measures that the company should be taking for future sales.
Solution:
Sparkling sales show stabilized values and not much trend compared to previous years.
December shows the highest sales across the years from 1980-1994.
The models are built taking the trend and seasonality into account, and we see from the
output plot that the future prediction is in line with the trend and seasonality of the previous
years.
The sales of Sparkling wine are seasonal, so the company cannot hold the same stock throughout the year.
The predictions would help here to plan the stock needed on the basis of the forecast sales.
The company should use the prediction results to plan for the low-demand season and stock as per
the demand.
Thank You…!