
Time Series

Forecasting
Business Report

Name: S.Krishna Veni


Date: 20/02/2022

Contents

1. Read the data as an appropriate Time Series data and plot the data.
2. Perform appropriate Exploratory Data Analysis to understand the data and also perform decomposition.
3. Split the data into training and test. The test data should start in 1991.
4. Build all the exponential smoothing models on the training data and evaluate the model using RMSE on the test data. Other additional models such as regression, naïve forecast models, simple average models, moving average models should also be built on the training data and check the performance on the test data using RMSE.
5. Check for the stationarity of the data on which the model is being built on using appropriate statistical tests and also mention the hypothesis for the statistical test. If the data is found to be non-stationary, take appropriate steps to make it stationary. Check the new data for stationarity and comment. Note: Stationarity should be checked at alpha = 0.05.
6. Build an automated version of the ARIMA/SARIMA model in which the parameters are selected using the lowest Akaike Information Criteria (AIC) on the training data and evaluate this model on the test data using RMSE.
7. Build ARIMA/SARIMA models based on the cut-off points of ACF and PACF on the training data and evaluate this model on the test data using RMSE.
8. Build a table with all the models built along with their corresponding parameters and the respective RMSE values on the test data.
9. Based on the model-building exercise, build the most optimum model(s) on the complete data and predict 12 months into the future with appropriate confidence intervals/bands.
10. Comment on the model thus built and report your findings and suggest the measures that the company should be taking for future sales.
List of Figures

Fig.1 - Rose Year wise data plot
Fig.2 - Rose Month wise data plot
Fig.3 - Sparkling Year wise data plot
Fig.4 - Sparkling Month wise data plot
Fig.5 - Additive decomposition - Rose
Fig.6 - Multiplicative decomposition - Rose
Fig.7 - Additive decomposition - Sparkling
Fig.8 - Multiplicative decomposition - Sparkling
Fig.9 - Simple exponential smoothing model - Rose
Fig.10 - Simple exponential smoothing - Sparkling
Fig.11 - Double exponential smoothing model - Rose
Fig.12 - Double exponential smoothing - Sparkling
Fig.13 - Triple exponential smoothing - Rose
Fig.14 - Triple exponential smoothing - Sparkling
Fig.15 - Linear Regression - Rose
Fig.16 - Linear Regression - Sparkling
Fig.17 - Simple Average model - Rose
Fig.18 - Simple Average model - Sparkling
Fig.19 - Moving Average model - Rose
Fig.20 - Moving Average model - Sparkling
Fig.21 - Naïve model - Rose
Fig.22 - Naïve model - Sparkling
Fig.23 - ARIMA - Rose
Fig.24 - ARIMA - Sparkling
Fig.25 - SARIMA - Rose
Fig.26 - SARIMA - Sparkling
Fig.27 - Comparison Plot

Problem:
For this assignment, sales data for two types of wine in the 20th century is to be
analysed. Both data sets come from the same company but cover different wines. As an
analyst at ABC Estate Wines, you are tasked with analysing and forecasting wine sales
in the 20th century.
Data sets for the problem: Sparkling.csv and Rose.csv
Please answer the following questions for each of these two data sets separately.

1. Read the data as an appropriate Time Series data and plot the data.

Reading the dataset-

The two datasets, with sales data for two types of wine, “Rose” and “Sparkling”, are imported into a Jupyter notebook
using the pd.read_csv command. We also parse the year and month information to a datetime datatype using the
parse_dates argument.

The Rose dataset had null values, which we filled using interpolation.
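
The loading and interpolation steps can be sketched as follows. This is a minimal illustration, not the notebook code: the column names "YearMonth" and "Rose" and the sample values are assumptions, and the CSV is read from an in-memory string so the snippet runs on its own.

```python
import io
import pandas as pd

# Hypothetical stand-in for Rose.csv; column names and values are assumptions.
csv_text = """YearMonth,Rose
1994-05-01,44
1994-06-01,45
1994-07-01,
1994-08-01,
1994-09-01,46
"""

# parse_dates converts the date column to datetime; index_col makes it the index,
# so pandas treats the data as a time series.
rose = pd.read_csv(io.StringIO(csv_text), parse_dates=["YearMonth"], index_col="YearMonth")

missing_before = int(rose["Rose"].isna().sum())
print("missing values before:", missing_before)

# Linear interpolation fills each gap from its neighbouring observations.
rose["Rose"] = rose["Rose"].interpolate(method="linear")
print(rose)
```

With the real file, the in-memory string would be replaced by the path to the CSV.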

The sales data of the two data sets are plotted as follows-

Fig 1. Rose dataset Graph

Observations on Rose Year wise sales data –

• There is a decreasing trend in the initial years, which stabilizes after a few years and then decreases again.

• We also observe seasonality in the data: the pattern seems to repeat on a yearly basis.

Fig 2. Sparkling dataset Graph

Observations on Sparkling Year wise sales data –

• There is not much trend in the plot.

• The seasonality follows a yearly pattern.

2. Perform appropriate Exploratory Data Analysis to understand the data and also perform decomposition.

The Rose wine data has 187 data points, two of which are missing, and the Sparkling wine data has 187 data points.
The basic descriptive statistics tell us how sales have varied across the years. However, these statistics are
computed over the whole data without taking the time component into account, so we also look at the box plots
year wise and month wise.

Rose wine data – Descriptive statistics

Fig.3 Rose Year wise data plot

Fig 4. Rose Month Wise Data plot

• As observed in the time series plot, the year-wise boxplots also indicate a downward trend.

• The sales of Rose wine have outliers in certain years.

• December seems to have the highest sales of Rose wine, and there are outliers in June, July, August and
September.

Sparkling wine data – Descriptive statistics

Fig.5 Sparkling Year wise data plot

Fig 6. Sparkling Month wise dataplot

• As observed in the time series plot, the boxplots also do not indicate a trend.

• The sales of Sparkling wine have outliers in almost all years except 1995.

• December has the highest sales value for Sparkling wine.

Fig 7. Rose Year – Month wise line plot

Fig 8. Sparkling Year - month wise line plot

• The year/month-wise line plots of Rose and Sparkling sales show that December has the highest sales, while
January, February and March show lower sales values.
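
The month-wise and year-wise views behind these plots can be sketched with pandas grouping. The series below is synthetic (hypothetical numbers with a built-in December peak and a mild downward drift), so it only illustrates the mechanics, not the actual sales figures.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series standing in for a sales column (hypothetical values).
idx = pd.date_range("1980-01-01", periods=48, freq="MS")
pattern = np.tile([60, 62, 65, 70, 72, 75, 80, 82, 85, 90, 110, 150], 4)
sales = pd.Series(pattern - 0.5 * np.arange(48), index=idx, name="Sales")

# Month-wise view: one summary row per calendar month, pooled across years.
by_month = sales.groupby(sales.index.month).describe()

# Year-wise view: the yearly mean exposes any trend.
by_year = sales.groupby(sales.index.year).mean()

# Year x month table, the data behind a year/month line plot.
table = sales.groupby([sales.index.year, sales.index.month]).mean().unstack()

print(by_month[["mean", "min", "max"]])
print(by_year)
```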

Decomposition

Rose Sales-

Fig 9. Rose Data – Additive Decomposition

Fig 10. Rose Data – Multiplicative Decomposition

• There were two missing values, which were interpolated using the linear method.

• For the additive model the residuals are centred around 0, and for the multiplicative model they are centred
around 1.

Sparkling Sales

Fig 12. Sparkling Data Additive Decomposition

Fig 13. Sparkling Data Multiplicative Decomposition

• For the additive model the residuals are centred around 0, and for the multiplicative model they are centred
around 1.
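
Why the residuals sit around 0 in one case and around 1 in the other follows from how the components are removed: by subtraction in the additive model and by division in the multiplicative one. The sketch below is a hand-written classical decomposition on synthetic data, assuming a period of 12; it is not the library routine used for the figures.

```python
import numpy as np

def classical_decompose(y, period=12, model="additive"):
    """Classical decomposition for an even period; returns trend, seasonal, residual."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    # A centred 2 x period moving average estimates the trend.
    weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    half = period // 2
    trend = np.full(n, np.nan)
    trend[half:n - half] = np.convolve(y, weights, mode="valid")
    # Remove the trend by subtraction (additive) or division (multiplicative).
    detrended = y - trend if model == "additive" else y / trend
    # Average the detrended values at each position in the cycle.
    means = np.array([np.nanmean(detrended[i::period]) for i in range(period)])
    # Normalise so seasonal effects average to 0 (additive) or 1 (multiplicative).
    means = means - means.mean() if model == "additive" else means / means.mean()
    seasonal = np.tile(means, n // period + 1)[:n]
    resid = detrended - seasonal if model == "additive" else detrended / seasonal
    return trend, seasonal, resid

# Synthetic series (hypothetical): level + trend + yearly cycle + noise.
rng = np.random.default_rng(0)
t = np.arange(120)
y = 100 + 0.5 * t + 20 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 2, 120)

_, _, resid_add = classical_decompose(y, 12, "additive")
_, _, resid_mul = classical_decompose(y, 12, "multiplicative")
print("additive residual mean:", np.nanmean(resid_add))
print("multiplicative residual mean:", np.nanmean(resid_mul))
```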

3. Split the data into training and test. The test data should start in 1991.

Rose Data set

• The train data of Rose wine sales covers data up to 1990 and has 132 data points.
• The test data of Rose wine sales covers data from 1991 onwards and has 55 data points.
• With this train-test split, we forecast the later (test) period from the earlier (train) years and compare the
forecasts with the actual sales.

Sparkling Data set

• The train data of Sparkling wine sales covers data up to 1990 and has 132 data points.
• The test data of Sparkling wine sales covers data from 1991 onwards and has 55 data points.
• With this train-test split, we forecast the later (test) period from the earlier (train) years and compare the
forecasts with the actual sales.
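
A date-based split of this kind can be sketched as below. The values are placeholders, and the January 1980 start is inferred from the counts above (132 monthly points ending in December 1990), not read from the files.

```python
import numpy as np
import pandas as pd

# 187 monthly observations; placeholder values, start date inferred from the counts.
idx = pd.date_range("1980-01-01", periods=187, freq="MS")
sales = pd.Series(np.arange(187, dtype=float), index=idx)

# Everything before 1991 is training data; 1991 onwards is test data.
train = sales[sales.index.year < 1991]
test = sales[sales.index.year >= 1991]

print(len(train), len(test))  # 132 55
```

Splitting on the calendar year rather than at random keeps the time order intact, so the test period always lies after the training period.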

4. Build various exponential smoothing models on the training data and evaluate the
model using RMSE on the test data. Other models such as regression, naïve forecast
models, simple average models etc. should also be built on the training data and
check the performance on the test data using RMSE. Please do try to build as many
models as possible and as many iterations of models as possible with different
parameters.

Simple Exponential Smoothing

Rose Dataset

With Simple Exponential Smoothing at α = 0.995, we get an RMSE of 36.79 on the test
data. We shall also find the scores at different alpha levels.

Sparkling Dataset

With Simple Exponential Smoothing at α = 0.995, we get an RMSE of 1316 on the test data. We shall also find the
scores at different alpha levels.
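
To make the role of alpha concrete, here is a small hand-written sketch of simple exponential smoothing and the RMSE metric. The data values are hypothetical, and starting the level at the first observation is one common convention, assumed here; a library fit may initialise differently.

```python
import math

def ses_forecast(train, alpha, horizon):
    """Simple exponential smoothing: a flat forecast at the final smoothed level."""
    level = train[0]  # assumption: initial level is the first observation
    for y in train[1:]:
        level = alpha * y + (1 - alpha) * level
    return [level] * horizon

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

train = [112.0, 118.0, 129.0, 99.0, 116.0, 168.0]  # hypothetical values
test = [80.0, 75.0, 110.0]

for alpha in (0.1, 0.5, 0.995):
    score = rmse(test, ses_forecast(train, alpha, len(test)))
    print(alpha, round(score, 2))
```

At α close to 1 the smoothed level is almost exactly the last training value, so the forecast behaves like a naïve forecast.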

Double Exponential Smoothing

Rose Dataset

We ran Double Exponential Smoothing for smoothing level (alpha) and smoothing slope/trend (beta) values ranging
from 0.3 to 1.0. The five lowest RMSE scores range from 265.57 to 439.30 (listed in the table below) and occur at
alpha values 0.30-0.60 and beta values 0.30-0.40.

Sparkling Data

We ran Double Exponential Smoothing for smoothing level (alpha) and smoothing slope/trend (beta) values ranging
from 0.3 to 1.0. The five lowest RMSE scores range from 1500 to 1530 (listed in the table below) and occur at
alpha values 0.60-0.80 and beta values 0.90-1.00.
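
The grid search described above can be sketched as follows. This is a hand-written version of Holt's linear (double exponential) method with a simple initialisation, run on hypothetical data; the function name and the data are assumptions, and the notebook may have relied on a library fit instead.

```python
import itertools
import math

def holt_forecast(train, alpha, beta, horizon):
    """Holt's linear method: smooth a level and a trend, then extrapolate."""
    # Assumed initial state: first value as level, first difference as trend.
    level, trend = train[0], train[1] - train[0]
    for y in train[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + h * trend for h in range(1, horizon + 1)]

def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

train = [120.0, 115.0, 118.0, 110.0, 104.0, 108.0, 99.0, 101.0]  # hypothetical values
test = [97.0, 95.0, 92.0]

# Alpha and beta each take the values 0.3, 0.4, ..., 1.0.
grid = [round(0.3 + 0.1 * i, 1) for i in range(8)]
results = []
for alpha, beta in itertools.product(grid, grid):
    score = rmse(test, holt_forecast(train, alpha, beta, len(test)))
    results.append((score, alpha, beta))

# The five lowest test RMSE scores with their parameters.
best_five = sorted(results)[:5]
for score, alpha, beta in best_five:
    print(f"alpha={alpha} beta={beta} rmse={score:.2f}")
```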

Triple Exponential Smoothing

Rose Data Set

We ran Triple Exponential Smoothing for Rose sales with auto-fit parameters, which resulted in smoothing
level (alpha) 0.106, smoothing slope/trend (beta) 0.048 and smoothing seasonality (gamma) 0.0, with a test
RMSE of 17.37. The auto-fit model parameters are listed as follows:

We shall also check the test scores of the Triple Exponential Smoothing model at different alpha,
beta and gamma levels.

We ran Triple Exponential Smoothing for smoothing level (alpha), smoothing slope/trend (beta) and smoothing
seasonality (gamma) values ranging from 0.3 to 1.0. The five lowest RMSE scores range from 10.95 to 16.72 (listed
in the table below) and occur at alpha values 0.30-0.50, beta values 0.30-0.50 and gamma values 0.30-0.80.

Sparkling dataset

We ran Triple Exponential Smoothing for Sparkling sales with auto-fit parameters, which resulted in smoothing
level (alpha) 0.154, smoothing slope/trend (beta) 2.6836273360597693e-21 (effectively 0) and smoothing
seasonality (gamma) 0.371, with a test RMSE of 383.19. The auto-fit model parameters are listed as follows:

• Standardized residual – does not display any obvious seasonality
• Histogram plus estimated density – the KDE plot of the residuals is similar to the normal distribution, hence
the model residuals are approximately normally distributed
• Normal Q-Q plot – the ordered residuals (blue dots) follow the linear trend of samples taken from a standard
normal distribution N(0, 1)
• Correlogram – the time series residuals have low correlation with lagged versions of themselves

Automated ARIMA- SPARKLING

We checked the stationarity of the Sparkling train data at alpha = 0.05 and observed from the result table that
the data was not stationary. We therefore took a difference of order 1, after which the train data was stationary.
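
The differencing step and the decision rule at alpha = 0.05 can be sketched as below. The series is a synthetic random walk with drift, and the helper name is hypothetical. The unit-root test itself is not run here; a common choice is the Augmented Dickey-Fuller test (available as adfuller in statsmodels), whose null hypothesis is that the series is non-stationary, but the exact test used for the result table is an assumption.

```python
import numpy as np

def reject_unit_root(p_value, alpha=0.05):
    """Decision rule: reject H0 (series is non-stationary) when p-value < alpha."""
    return p_value < alpha

# Random walk with drift: a textbook non-stationary series (synthetic).
rng = np.random.default_rng(42)
steps = 0.5 + rng.normal(0.0, 1.0, 200)
series = np.cumsum(steps)

# First-order differencing (d = 1): y[t] - y[t-1]; one observation is lost.
diff1 = np.diff(series, n=1)

print("level, first vs second half mean:", series[:100].mean(), series[100:].mean())
print("diff,  first vs second half mean:", diff1[:100].mean(), diff1[100:].mean())
```

Differencing recovers the stationary increments, which is why a series that fails the test in levels can pass it after one difference.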

We see there is some seasonality in the data.

We ran the automated ARIMA model for Sparkling sales and sorted the resulting AIC values
from lowest to highest. We then built the ARIMA model with the lowest
Akaike Information Criterion (AIC).
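
The idea of ranking candidate orders by AIC can be illustrated with a deliberately simplified stand-in: pure AR(p) models fitted by least squares on simulated data, with the Gaussian AIC computed by hand. This is not the ARIMA fit used in the report; it only shows the select-the-lowest-AIC loop, and the function name is hypothetical.

```python
import numpy as np

def ar_aic(y, p, max_p):
    """Fit AR(p) by least squares on a common sample and return its AIC."""
    y = np.asarray(y, dtype=float)
    target = y[max_p:]  # same target for every p so the AIC values are comparable
    cols = [np.ones(len(target))]
    cols += [y[max_p - k:len(y) - k] for k in range(1, p + 1)]  # lag-k regressors
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    rss = float(np.sum((target - X @ coef) ** 2))
    n, k = len(target), p + 2  # intercept, p lag coefficients, noise variance
    return n * np.log(rss / n) + 2 * k  # Gaussian AIC up to an additive constant

# Simulated stationary AR(2) series (synthetic data).
rng = np.random.default_rng(0)
y = np.zeros(300)
for t in range(2, 300):
    y[t] = 0.6 * y[t - 1] - 0.3 * y[t - 2] + rng.normal()

max_p = 4
ranked = sorted((ar_aic(y, p, max_p), p) for p in range(max_p + 1))
best_aic, best_p = ranked[0]
for aic, p in ranked:
    print(f"p={p} AIC={aic:.2f}")
```

The penalty term 2k is what stops the search from always picking the largest model, since the residual sum of squares alone can only fall as lags are added.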

Automated SARIMA- Sparkling

The ACF plot for Sparkling sales shows seasonality at intervals of 12, hence
we ran the automated SARIMA models with a seasonal period of 12 and sorted the resulting
AIC values from lowest to highest. We then built the SARIMA model with the lowest
Akaike Information Criterion (AIC).

The model diagnostics indicate that the model residuals are approximately normally
distributed:
• Standardized residual – does not display any obvious seasonality
• Histogram plus estimated density – the KDE plot of the residuals is similar to the
normal distribution, hence the model residuals are approximately normally distributed
• Normal Q-Q plot – the ordered residuals (blue dots) follow the linear trend of samples
taken from a standard normal distribution N(0, 1)
• Correlogram – the time series residuals have low correlation with lagged versions of
themselves

5. Build ARIMA/SARIMA models based on the cut-off points of ACF and PACF on the training data and
evaluate this model on the test data using RMSE

Manual ARIMA – Rose


We then built a manual ARIMA model for Rose sales based on the ACF and PACF plots.
Based on the plots below, we chose AR parameter p = 5, moving average parameter q = 2
and differencing order d = 1. The test RMSE for this ARIMA(5,1,2) model
is 15.43.

Manual SARIMA – Rose
We then built a manual SARIMA model for Rose sales based on the ACF and PACF plots.
Based on the plots below, we chose AR parameter p = 5, moving average parameter q = 2
and differencing order d = 1. We then derived the seasonal parameters from the
seasonal cut-offs on the 6-month series plot and chose seasonal AR parameter P = 2,
seasonal moving average parameter Q = 9, seasonal differencing D = 1 and seasonal period 6.
The test RMSE for this SARIMA(5,1,2)(2,1,9,6) model is 25.23.

The model diagnostics indicate that the model residuals are approximately normally distributed.
The standardized residuals do not display any obvious seasonality. In the histogram plus estimated
density plot, the KDE curve is close to a normal distribution. In the normal Q-Q plot, the ordered
residuals (blue dots) follow the linear trend. In the correlogram, the time series residuals have low
correlation with lagged versions of themselves.

Manual ARIMA – Sparkling

We then built a manual ARIMA model for Sparkling sales based on the ACF and PACF plots.
Based on the plots below, we chose AR parameter p = 1, moving average parameter q = 2
and differencing order d = 1. The test RMSE for this ARIMA(1,1,2) model
is 1436.73.

Manual SARIMA- Sparkling
We then built a manual SARIMA model for Sparkling sales based on the ACF and PACF plots.
Based on the plots below, we chose AR parameter p = 1, moving average parameter q = 2
and differencing order d = 1. We then derived the seasonal parameters from the
seasonal cut-offs on the 6-month series plot and chose seasonal AR parameter P = 1,
seasonal moving average parameter Q = 2, seasonal differencing D = 1 and seasonal period 6.
The test RMSE for this SARIMA(1,1,2)(1,1,2,6) model is 722.86.

The model diagnostics indicate that the model residuals are approximately normally distributed.
The standardized residuals do not display any obvious seasonality. In the histogram plus estimated
density plot, the KDE curve is close to a normal distribution. In the normal Q-Q plot, the ordered
residuals (blue dots) follow the linear trend. In the correlogram, the time series residuals have low
correlation with lagged versions of themselves.

6. Build a table (create a data frame) with all the models built along with their corresponding parameters
and the respective RMSE values on the test data.
We have consolidated the test results from the various models built for forecasting Rose wine
sales, with the test RMSE scores sorted from lowest to highest. The lowest score is 10.95. It was
obtained from the Triple Exponential Smoothing model run over smoothing level (alpha), smoothing
slope/trend (beta) and smoothing seasonality (gamma) values ranging from 0.3 to 1.0; the parameters
that gave the lowest score are alpha = 0.3, beta = 0.4 and gamma = 0.3.

We have consolidated the test results from the various models built for forecasting Sparkling wine
sales, with the test RMSE scores sorted from lowest to highest.
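
A consolidated table of this kind can be assembled as a pandas data frame. The sketch below includes only the test RMSE scores quoted in the text of this report (the full tables contain more models); the column names and model labels are assumptions.

```python
import pandas as pd

# (series, model, parameters, test RMSE), using only scores quoted in the text.
rows = [
    ("Rose", "Simple exponential smoothing", "alpha=0.995", 36.79),
    ("Rose", "Double exponential smoothing (best of grid)", "alpha 0.30-0.60, beta 0.30-0.40", 265.57),
    ("Rose", "Triple exponential smoothing (auto fit)", "alpha=0.106, beta=0.048, gamma=0.0", 17.37),
    ("Rose", "Triple exponential smoothing (grid)", "alpha=0.3, beta=0.4, gamma=0.3", 10.95),
    ("Rose", "Manual ARIMA", "p=5, d=1, q=2", 15.43),
    ("Rose", "Manual SARIMA", "p=5, d=1, q=2, P=2, D=1, Q=9, s=6", 25.23),
    ("Sparkling", "Simple exponential smoothing", "alpha=0.995", 1316.0),
    ("Sparkling", "Double exponential smoothing (best of grid)", "alpha 0.60-0.80, beta 0.90-1.00", 1500.0),
    ("Sparkling", "Triple exponential smoothing (auto fit)", "alpha=0.154, beta~0, gamma=0.371", 383.19),
    ("Sparkling", "Manual ARIMA", "p=1, d=1, q=2", 1436.73),
    ("Sparkling", "Manual SARIMA", "p=1, d=1, q=2, P=1, D=1, Q=2, s=6", 722.86),
]
results = pd.DataFrame(rows, columns=["Series", "Model", "Parameters", "Test RMSE"])

# Sort each series from lowest to highest test RMSE.
results = results.sort_values(["Series", "Test RMSE"]).reset_index(drop=True)

# After sorting, the first row of each group is the best model for that series.
best = results.groupby("Series").first()
print(results.to_string(index=False))
print(best)
```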

7. Based on the model-building exercise, build the most optimum model(s) on the complete data and
predict 12 months into the future with appropriate confidence intervals/bands.

Final Full Model- Rose data


We observed from the RMSE scores that Triple Exponential Smoothing works best for the
Rose sales data, which has both seasonality and trend.
The best model is Triple Exponential Smoothing (Holt-Winters method)
with parameters α = 0.3, β = 0.4 and γ = 0.3.
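
For readers who want to see what the three parameters control, here is a hand-written additive Holt-Winters sketch producing a 12-month forecast. The additive form, the initialisation from the first two seasons, the function name and the synthetic data are all assumptions for illustration; the report does not state which trend and seasonal forms the final model used.

```python
def holt_winters_additive(y, period, alpha, beta, gamma, horizon):
    """Additive Holt-Winters: smooth level, trend and seasonal terms, then extrapolate."""
    # Assumed initial state, taken from the first two seasons.
    first = sum(y[:period]) / period
    second = sum(y[period:2 * period]) / period
    level = first
    trend = (second - first) / period
    season = [y[i] - first for i in range(period)]
    for t in range(period, len(y)):
        i = t % period
        prev_level = level
        level = alpha * (y[t] - season[i]) + (1 - alpha) * (level + trend)  # alpha: level
        trend = beta * (level - prev_level) + (1 - beta) * trend            # beta: trend
        season[i] = gamma * (y[t] - level) + (1 - gamma) * season[i]        # gamma: season
    n = len(y)
    return [level + h * trend + season[(n + h - 1) % period] for h in range(1, horizon + 1)]

# Synthetic monthly data (hypothetical): yearly pattern with a slow decline.
pattern = [60, 62, 65, 70, 72, 75, 80, 82, 85, 90, 110, 150]
history = [pattern[t % 12] - 0.5 * t for t in range(48)]

forecast = holt_winters_additive(history, 12, alpha=0.3, beta=0.4, gamma=0.3, horizon=12)
print([round(f, 1) for f in forecast])
```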

Prediction Plot – Rose


The upper and lower confidence bands were calculated at the 95% confidence level.
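
One simple way to obtain such bands, sketched here as an assumption because the report does not state its method, is to take the forecast plus or minus 1.96 times the standard deviation of the in-sample residuals. This treats the residuals as roughly normal and gives a band of constant width. The function name and the numbers are hypothetical.

```python
import numpy as np

def prediction_band(forecast, residuals, z=1.96):
    """Constant-width band: forecast +/- z * residual standard deviation."""
    forecast = np.asarray(forecast, dtype=float)
    sigma = np.std(residuals, ddof=1)  # sample standard deviation of the residuals
    return forecast - z * sigma, forecast + z * sigma

forecast = [52.0, 48.0, 55.0, 61.0]         # hypothetical point forecasts
residuals = [-3.0, 2.5, -1.0, 4.0, -2.5]    # hypothetical in-sample errors

lower, upper = prediction_band(forecast, residuals)
for f, lo, hi in zip(forecast, lower, upper):
    print(f"{f:.1f}  [{lo:.1f}, {hi:.1f}]")
```

A model-based interval that widens with the forecast horizon would be more faithful for long horizons; the constant band is only the simplest option.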

Final Full Model- Sparkling data

We observed from the RMSE scores that Triple Exponential Smoothing works best for the
Sparkling sales data, which has strong seasonality and little trend.
The best model is Triple Exponential Smoothing (Holt-Winters method)
with parameters α = 0.154, β = 2.6836273360597693e-21 (effectively 0) and γ = 0.371.

Prediction Plot – Sparkling

The upper and lower confidence bands were calculated at the 95% confidence level.

8. Comment on the model thus built and report your findings and suggest the measures that the company
should be taking for future sales.

Inference and Recommendations – Rose Sales Model

• Rose sales show a decreasing trend compared to previous years.
• December shows the highest sales in each year, although the December value
has come down over the years from 1980 to 1994.
• The models are built taking trend and seasonality into account,
and the output plot shows that the future prediction is in line with
the trend and seasonality of previous years.
• The sales of Rose wine are seasonal and also have a trend, hence the company
cannot hold the same stock throughout the year. The predictions help
to plan stock on the basis of the forecasted sales.
• The company should use the prediction results to capitalize on the high-demand
seasons and ensure it can source and supply enough to meet that demand.
• The company should use the prediction results to plan for the low-demand
seasons and stock according to the demand.

Inference and Recommendations – Sparkling Sales Model

• Sparkling sales show stable values and not much trend compared to
previous years.
• December shows the highest sales in each year from 1980 to 1994.
• The models are built taking trend and seasonality into account,
and the output plot shows that the future prediction is in line with
the trend and seasonality of previous years.
• The sales of Sparkling wine are seasonal, hence the company cannot hold the
same stock throughout the year. The predictions help to plan
stock on the basis of the forecasted sales.
• The company should use the prediction results to capitalize on the high-demand
seasons and ensure it can source and supply enough to meet that demand.
• The company should use the prediction results to plan for the low-demand
seasons and stock according to the demand.
