
Project Time Series Forecasting

Name: Vignesha. M

PG-DSBA Online

March 2023

Date: 05-November-2023

Table of Contents

Executive Summary
Introduction
Data Description:
Sparkling Wine Sales – Forecast Data:
1. Read the data as an appropriate Time Series data and plot the data.
Top 5 Records:
Data Information:
Time-Series Plot:
2. Perform appropriate Exploratory Data Analysis to understand the data and also perform
decomposition.
Data Summary:
Time-Series Decomposition:
3. Split the data into training and test. The test data should start in 1991.
4. Build all the exponential smoothing models on the training data and evaluate the model
using RMSE on the test data. Other models such as regression, naïve forecast models and
simple average models should also be built on the training data and check the performance
on the test data using RMSE.
Model Plots:
5. Check for the stationarity of the data on which the model is being built on using
appropriate statistical tests and also mention the hypothesis for the statistical test. If the
data is found to be non-stationary, take appropriate steps to make it stationary. Check the
new data for stationarity and comment. Note: Stationarity should be checked at alpha =
0.05.
6. Build an automated version of the ARIMA/SARIMA model in which the parameters are
selected using the lowest Akaike Information Criteria (AIC) on the training data and evaluate
this model on the test data using RMSE.
ARIMA:
SARIMA:
7. Build a table (create a data frame) with all the models built along with their
corresponding parameters and the respective RMSE values on the test data.

8. Based on the model-building exercise, build the most optimum model(s) on the
complete data and predict 12 months into the future with appropriate confidence
intervals/bands.
9. Comment on the model thus built and report your findings and suggest the measures
that the company should be taking for future sales. Please explain and summarise the
various steps performed in this project. There should be proper business interpretation and
actionable insights present.
Rose Wine Sales – Forecast Data:
1. Read the data as an appropriate Time Series data and plot the data.
Top 5 Records:
Data Information:
Time-Series Plot:
Imputing the missing values
2. Perform appropriate Exploratory Data Analysis to understand the data and also perform
decomposition.
Data Summary:
Time-Series Decomposition:
3. Split the data into training and test. The test data should start in 1991.
4. Build all the exponential smoothing models on the training data and evaluate the model
using RMSE on the test data. Other models such as regression, naïve forecast models and
simple average models should also be built on the training data and check the performance
on the test data using RMSE.
Model Plots:
5. Check for the stationarity of the data on which the model is being built on using
appropriate statistical tests and also mention the hypothesis for the statistical test. If the
data is found to be non-stationary, take appropriate steps to make it stationary. Check the
new data for stationarity and comment. Note: Stationarity should be checked at alpha =
0.05.
6. Build an automated version of the ARIMA/SARIMA model in which the parameters are
selected using the lowest Akaike Information Criteria (AIC) on the training data and evaluate
this model on the test data using RMSE.
ARIMA:
SARIMA:

7. Build a table (create a data frame) with all the models built along with their
corresponding parameters and the respective RMSE values on the test data.
8. Based on the model-building exercise, build the most optimum model(s) on the
complete data and predict 12 months into the future with appropriate confidence
intervals/bands.
9. Comment on the model thus built and report your findings and suggest the measures
that the company should be taking for future sales. Please explain and summarise the
various steps performed in this project. There should be proper business interpretation and
actionable insights present.

List of Figures:
Figure 1: Data Information
Figure 2: Time-Series Plot
Figure 3: Time Series Decomposition- Additive method
Figure 4: Time-Series Decomposition- Multiplicative method
Figure 5: Plotting on test data using various models
Figure 6: ARIMA Model forecast on Test data
Figure 7: SARIMA Model forecast on Test data
Figure 8: Overall results of various models on test data
Figure 9: Forecast with confidence intervals
Figure 10: Data information
Figure 11: Rose Wine Sales Over Time
Figure 12: Time-Series plot after imputing the missing values
Figure 13: Time Series Decomposition- Additive method
Figure 14: Time-Series Decomposition- Multiplicative method
Figure 15: Plotting on test data using various models
Figure 16: ARIMA Model forecast on Test data
Figure 17: SARIMA Model forecast on Test data
Figure 18: Overall results of various models on test data
Figure 19: Forecast with confidence intervals
Figure 20: Prediction intervals

List of Tables:
Table 1: Data Description
Table 2: Top 5 Records
Table 3: Data Summary
Table 4: RMSE values of models used
Table 5: ARIMA Model- Results
Table 6: ARIMA Summary
Table 7: SARIMA Model - Results
Table 8: SARIMA Summary
Table 9: Results of RMSE from various models
Table 10: Top 5 Records
Table 11: Data Summary
Table 12: RMSE values of models used
Table 13: ARIMA Model - Results
Table 14: ARIMA Summary
Table 15: SARIMA Model - Results
Table 16: SARIMA Summary
Table 17: Results of RMSE from various models

Executive Summary
ABC Estate Wines seeks to analyse and forecast wine sales for two distinct types of wine,
Sparkling and Rose, throughout the 20th century. The analysis aims to provide insights into
historical sales trends and enable data-driven decision-making for inventory management,
marketing strategies, and financial planning for future sales. The primary objective is to
accurately forecast wine sales and understand the underlying patterns for both wine
categories.

Introduction
In pursuit of data-driven excellence, ABC Estate Wines undertakes an analysis and
forecasting project encompassing Sparkling and Rose wine sales throughout the 20th century.
This initiative aims to uncover sales patterns and seasonal trends, providing actionable
insights for inventory management, marketing campaigns, and financial planning. The findings
will empower ABC Estate Wines to make informed decisions and enhance its offerings,
ensuring the company’s continued success in the ever-evolving wine industry.

Data Description:

Wine Type   Dataset Name      Time period    Data Availability   Data Characteristics
Sparkling   “Sparkling.csv”   20th Century   Complete Data       Time Series with Seasonality and Trends
Rose        “Rose.csv”        20th Century   Missing values      Time Series with Missing Data, Seasonality, and Trends
Table 1: Data Description

Note: First, we will analyse the forecast data for Sparkling wine sales, followed by the analysis
of the forecast data for Rose wine sales.

Sparkling Wine Sales – Forecast Data:

1. Read the data as an appropriate Time Series data and plot the data.

The Sparkling wine sales data was transformed into a time series data format, with the top 5
records displayed, and the complete dataset was visualized through plotting.

Top 5 Records:

YearMonth     Sparkling
1980-01-01    1686
1980-02-01    1591
1980-03-01    2304
1980-04-01    1712
1980-05-01    1471

Table 2: Top 5 Records
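
As an illustration of this step, the sketch below loads monthly records into a pandas time series. It is a minimal example, not the report's actual code: the CSV text is a stand-in built only from the five records in Table 2, and the column layout of "Sparkling.csv" is assumed to match that table.

```python
import io

import pandas as pd

# Stand-in for "Sparkling.csv" (assumed layout): only the five records of Table 2.
csv_text = """YearMonth,Sparkling
1980-01-01,1686
1980-02-01,1591
1980-03-01,2304
1980-04-01,1712
1980-05-01,1471
"""

# Parse the date column and use it as the index so pandas treats the data as a time series.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["YearMonth"], index_col="YearMonth")

# Declare the monthly (month-start) frequency explicitly.
df = df.asfreq("MS")

print(df.head())
df.plot(title="Sparkling wine sales") if False else None  # plotting is optional in this sketch
```

With the real file, `io.StringIO(csv_text)` would be replaced by the file path, and the optional plotting line would be enabled when matplotlib is available.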

Data Information:

Figure 1: Data Information

Time-Series Plot:

Figure 2: Time-Series Plot

Inference:

➢ Trend: There is a noticeable upward trend in wine sales over the years. It indicates
that, on average, wine sales have been increasing during the period. This long-term
trend can be of strategic importance for business planning and growth.

➢ Seasonality: The plot reveals a clear seasonality in wine sales, with recurring patterns
of increases and decreases over time. This seasonality suggests that wine sales are
influenced by regular, calendar-based factors, such as holidays, seasons, or special
events.

➢ Cyclic Patterns: In addition to seasonality, the plot shows cyclic patterns with periodic
peaks and troughs. These cycles might be influenced by more extended economic or
industry-specific trends that affect wine sales.

➢ Outliers: The presence of occasional outliers suggests that there are exceptional
periods where wine sales experience sudden spikes. Identifying the reasons behind
these outliers can help understand the impact of specific events or promotions on
sales.

2. Perform appropriate Exploratory Data Analysis to understand the data
and also perform decomposition.

Data Summary:

         Sparkling
count    187.000000
mean     2402.417112
std      1295.111540
min      1070.000000
25%      1605.000000
50%      1874.000000
75%      2549.000000
max      7242.000000

Table 3: Data Summary

Inference:

➢ Central Tendency: The mean wine sales for the data is approximately 2,402.42 units.
This represents the average level of sales during the specified period.
➢ Variability: The standard deviation of approximately 1,295.11 units indicates a
considerable amount of variability in wine sales. The sales data exhibit fluctuations over time.
➢ Range: The minimum and maximum sales are 1,070 and 7,242 units, respectively.
This wide range suggests variation in wine sales, with some periods having
substantially higher sales than others.
➢ Distributions: The data is right-skewed, as the mean (2,402.42) is greater than
the median (1,874). This suggests that there are periods with relatively high sales that
pull the mean above the median.
➢ Spread: The IQR is approximately 944 units. This indicates the spread of the middle
50% of the data.

The wide range and variability suggest the presence of seasonality, trends, and potentially
influential factors affecting sales over time. Further time series analysis and forecasting
can help uncover and utilize these patterns for business insights and decision-making.
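
The arithmetic behind these inferences can be checked directly from Table 3. The short sketch below uses only the values reported in that table; the coefficient of variation is an extra illustrative measure that the report itself does not quote.

```python
# Values copied from Table 3 (Sparkling data summary).
summary = {
    "mean": 2402.417112,
    "std": 1295.111540,
    "25%": 1605.0,
    "50%": 1874.0,
    "75%": 2549.0,
}

# Interquartile range: spread of the middle 50% of the data.
iqr = summary["75%"] - summary["25%"]

# A mean above the median hints at a right-skewed distribution.
mean_minus_median = summary["mean"] - summary["50%"]

# Coefficient of variation: standard deviation relative to the mean (illustrative extra).
cv = summary["std"] / summary["mean"]

print(f"IQR = {iqr:.0f} units")
print(f"mean - median = {mean_minus_median:.2f} units")
print(f"coefficient of variation = {cv:.2f}")
```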

Time-Series Decomposition:

Method 1: Additive Method

Figure 3: Time Series Decomposition- Additive method

Inference:

i. The Plot of the time series components reveals a positive trend, clear seasonality, and
a small, mostly random residual component in the wine sales data.
ii. These insights are valuable for forecasting and decision-making in the wine industry,
allowing for informed business strategies and inventory management.

Method 2: Multiplicative method

Figure 4: Time-Series Decomposition- Multiplicative method

Inference:

To determine which model (additive or multiplicative) is more appropriate, let's observe the
decomposition results:

➢ Trend - There is an evident upward trend in wine sales over the years. This suggests
that the trend component is a significant factor in the data.
➢ Seasonality - The data clearly exhibits seasonality, with recurring patterns of
increases and decreases over time. This is common in time series data, especially
when it represents consumer products like wine.

Because the seasonal swings stay relatively stable in size rather than growing with the level
of the series, the additive model is a reasonable choice. In this model, the seasonal component
is added to the trend component, which simplifies the decomposition and makes it easier to interpret.

3. Split the data into training and test. The test data should start in 1991.

After splitting the data into training and test sets, the training dataset comprises 132 data
points, while the test dataset consists of 55 data points, ensuring a robust evaluation of the
forecasting models.

Training Data Shape: (132, 1)


Test Data Shape: (55, 1)
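
A minimal sketch of this split is shown below. The values are placeholders; only the monthly index (187 months starting in January 1980, as in the report) matters for reproducing the shapes.

```python
import numpy as np
import pandas as pd

# Placeholder values on the report's monthly index: 187 months starting 1980-01.
idx = pd.date_range("1980-01-01", periods=187, freq="MS")
df = pd.DataFrame({"Sparkling": np.arange(187)}, index=idx)

# Everything before 1991 is training data; 1991 onwards is test data.
train = df[df.index.year < 1991]
test = df[df.index.year >= 1991]

print("Training Data Shape:", train.shape)
print("Test Data Shape:", test.shape)
```

Splitting by date rather than at random keeps the temporal order intact, so the models are always evaluated on observations that come after the ones they were trained on.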

4. Build all the exponential smoothing models on the training data and
evaluate the model using RMSE on the test data. Other models such as
regression, naïve forecast models and simple average models should
also be built on the training data and check the performance on the test
data using RMSE.

Various forecasting models were utilized to analyse and predict wine sales. These models
include Simple Exponential Smoothing (SES), Double Exponential Smoothing (DES), the
Holt-Winters model (Triple Exponential Smoothing), Linear Regression, Naïve Forecast, and
Simple Average models. The corresponding RMSE values on the test data are detailed in the
table below.

Model                            RMSE
Holt-Winter Model                358.883527
Simple Exponential Smoothing     1304.927405
Double Exponential Smoothing     23136.382921
Regression Model                 1389.135175
Naive Forecast Model             3864.279352
Simple Average Model             1275.081804

Table 4: RMSE values of models used
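
The three simplest baselines and the RMSE metric can be sketched with NumPy alone. The series below is synthetic, so the printed numbers will not match Table 4; the helper name `rmse` and the series parameters are illustrative assumptions.

```python
import numpy as np


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


# Synthetic monthly series (NOT the wine data): trend + yearly cycle + noise.
rng = np.random.default_rng(1)
t = np.arange(187)
y = 2000 + 3 * t + 400 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 100, t.size)
train, test = y[:132], y[132:]

# Naive forecast: repeat the last observed training value.
naive = np.full(test.size, train[-1])

# Simple average: repeat the mean of the training data.
simple_avg = np.full(test.size, train.mean())

# Linear regression on the time index, extrapolated over the test period.
slope, intercept = np.polyfit(t[:132], train, deg=1)
regression = intercept + slope * t[132:]

scores = {
    "Naive Forecast": rmse(test, naive),
    "Simple Average": rmse(test, simple_avg),
    "Linear Regression": rmse(test, regression),
}
for name, value in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name:20s} {value:10.2f}")
```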

Model Plots:

Figure 5: Plotting on test data using various models

Inference:

➢ Holt-Winter Model: The model has the lowest RMSE, indicating that it provides the
most accurate forecasts among the models tested. It takes into account seasonality,
trend, and level components, making it suitable for capturing complex patterns in the
data.
➢ Simple Exponential Smoothing: The model has a lower RMSE compared to some
other models, but it is less accurate than the Holt-Winter model. This model assumes
a constant level and does not consider seasonality and trend, which may limit its ability
to capture patterns in the data.
➢ Double Exponential Smoothing: This model has by far the highest RMSE (23,136.38)
of all the models, indicating poor performance. It captures trend but not seasonality,
and its high RMSE suggests that it is not well suited to this dataset.
➢ Regression Model: The RMSE for the regression model is relatively high, similar to
Simple Exponential Smoothing. The linear regression model considers time as a
predictor, but it may not capture more complex patterns in the data as effectively as
other time series models.
➢ Naive Forecast Model: The model has the second-highest RMSE (3,864.28) among the
models, indicating inaccurate forecasts. It simply uses the last observed value as the
forecast, which is often not sufficient for capturing seasonality or trend.
➢ Simple Average Model: The model has a lower RMSE, which is close to that of Simple
Exponential Smoothing. It assumes a constant level and does not capture seasonality
or trend.

So, the Holt-Winters model (Triple Exponential Smoothing) appears to be the best-performing
model based on the RMSE metric, indicating its ability to capture and forecast the sales data
more accurately, considering both seasonality and trend.

5. Check for the stationarity of the data on which the model is being built
on using appropriate statistical tests and also mention the hypothesis
for the statistical test. If the data is found to be non-stationary, take
appropriate steps to make it stationary. Check the new data for
stationarity and comment. Note: Stationarity should be checked at
alpha = 0.05.

To assess the stationarity of the time series data, we employed the Augmented Dickey-Fuller
(ADF) test at a significance level of alpha = 0.05. The test operates on the basis of the
following hypotheses:

• Null Hypothesis (H0): The time series is not stationary.

• Alternative Hypothesis (H1): The time series is stationary.

This analysis provides valuable insights into the data's stationarity, a critical factor in time
series forecasting.

The ADF test on the original series gives the following results:

ADF Statistic: -1.3604974548123367
p-value: 0.6010608871634855

Since the p-value (0.601) is greater than 0.05, the null hypothesis is not rejected: the data
is not stationary.

After first-order differencing, applied to make the data stationary:

ADF Statistic (Differenced Data): -45.050300936195256
p-value (Differenced Data): 0.0

Since the p-value is below 0.05, the null hypothesis is rejected: the differenced data is
stationary.

Inference:

• After applying one-time differencing to the data, the time series becomes stationary
according to the ADF test. First differencing removes the trend, so the mean of the
series no longer drifts over time. Any seasonal pattern that remains is handled by the
seasonal terms of the SARIMA model.

• In the context of time series analysis and modelling, having stationary data is important
because it is a fundamental assumption of the ARMA component of ARIMA and SARIMA
models (exponential smoothing models such as Holt-Winters do not require it). Stationary
data ensures that the statistical properties of the time series, such as mean and variance,
remain constant over time. Non-stationary data, on the other hand, can exhibit changing
statistical properties, making it more challenging to model and forecast accurately.

6. Build an automated version of the ARIMA/SARIMA model in which the
parameters are selected using the lowest Akaike Information Criteria
(AIC) on the training data and evaluate this model on the test data using
RMSE.

ARIMA:
ARIMA Results:

Metric                                 Value
RMSE (Root Mean Squared Error)         1011.01
Best ARIMA order (p, d, q)             (4, 0, 4)
AIC (Akaike Information Criterion)     2192.44

Table 5: ARIMA Model- Results


ARIMA Summary:

Table 6: ARIMA Summary

ARIMA Model on Test Data- Plot:

Figure 6: ARIMA Model forecast on Test data

SARIMA:

SARIMA Results:

Metric                                        Value
Best SARIMA Order (p, d, q)                   (2, 1, 4)
Best SARIMA Seasonal Order (P, D, Q, s)       (0, 1, 1, 12)
RMSE for SARIMA Model                         440.00
AIC for the Best SARIMA Model                 1779.10

Table 7: SARIMA Model - Results

These metrics provide key information about the SARIMA model's parameters and
performance in forecasting wine sales.

SARIMA Summary:

Table 8: SARIMA Summary

SARIMA Model on Test Data- Plot:

Figure 7: SARIMA Model forecast on Test data

Inference:

i. Model Fit: The SARIMA model has a significantly lower RMSE (440.00) compared to
the ARIMA model (1011.01). This indicates that the SARIMA model provides more
accurate predictions and better captures the underlying patterns in the data.
ii. Model Orders:
a. For the ARIMA model, the best order (p, d, q) is (4, 0, 4), which implies that it
includes autoregressive (AR) and moving average (MA) components but does
not require differencing (d=0). The ARIMA model does not consider any
seasonal components.
b. For the SARIMA model, the best order (p, d, q) is (2, 1, 4), and the seasonal
order (P, D, Q, s) is (0, 1, 1, 12). This indicates that the SARIMA model includes
a seasonal component with a yearly seasonality (s=12) and also requires first-
order differencing (d=1).
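iii. Information Criterion (AIC): The SARIMA model has a lower AIC (1779.10) compared
to the ARIMA model (2192.44). This suggests that the SARIMA model is a better fit for
the data, although AIC values of models with different orders of differencing are not
strictly comparable, so the RMSE comparison carries more weight here.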

So, it is evident that the SARIMA model outperforms the ARIMA model in terms of
accuracy (lower RMSE) and model fit (lower AIC). The SARIMA model takes into
account both non-seasonal and seasonal components in the time series data, making
it a more appropriate choice for forecasting.

7. Build a table (create a data frame) with all the models built along with
their corresponding parameters and the respective RMSE values on the
test data.

Here's a table summarizing all the models built, their parameters, and the RMSE values on
the test data:

Model                             Parameters                                    RMSE
SES (Auto)                        Auto-selected                                 1304.9274045289721
Holt (Auto)                       Auto-selected                                 23136.38292067584
Holt-Winters (Auto)               Auto-selected                                 358.8835269185226
Linear Regression                 Auto-selected                                 1389.135174897992
Naïve Forecast (Last Observed)    Last Observed                                 3864.2793518443914
Simple Average (Train)            Train Data                                    1275.0818036965309
ARIMA                             p=4, d=0, q=4                                 1011.0130508904288
SARIMA                            p=2, d=1, q=4, P=0, D=1, Q=1, s=12            440.00343302390326

Table 9: Results of RMSE from various models
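
A data frame of this kind can be assembled and ranked as in the sketch below. The rows are copied from Table 9; the column names and the sorting step are illustrative choices rather than the report's exact code.

```python
import pandas as pd

# Rows copied from Table 9.
results = pd.DataFrame(
    [
        ("SES (Auto)", "Auto-selected", 1304.9274045289721),
        ("Holt (Auto)", "Auto-selected", 23136.38292067584),
        ("Holt-Winters (Auto)", "Auto-selected", 358.8835269185226),
        ("Linear Regression", "Auto-selected", 1389.135174897992),
        ("Naïve Forecast (Last Observed)", "Last Observed", 3864.2793518443914),
        ("Simple Average (Train)", "Train Data", 1275.0818036965309),
        ("ARIMA", "p=4, d=0, q=4", 1011.0130508904288),
        ("SARIMA", "p=2, d=1, q=4, P=0, D=1, Q=1, s=12", 440.00343302390326),
    ],
    columns=["Model", "Parameters", "RMSE"],
)

# Rank the models from most to least accurate on the test data.
ranked = results.sort_values("RMSE").reset_index(drop=True)
print(ranked.round({"RMSE": 2}).to_string(index=False))
```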

Inference:

The table provides a summary of different forecasting models along with their corresponding
parameters and RMSE values.

i. Holt-Winters model has the lowest RMSE value of 358.88, indicating that it performs
best in forecasting the wine sales data.
ii. SARIMA model also shows a low RMSE value of 440.00, suggesting good forecasting
accuracy. It is the second-best model.
iii. ARIMA model with parameters p = 4, d = 0, q = 4 has an RMSE value of 1011.01,
which is higher than the top two models but still lower than all the remaining models.
iv. Simple Average Model and Simple Exponential Smoothing have similar RMSE
values, around 1300, indicating a moderate level of forecasting accuracy.
v. Linear Regression performs slightly worse with an RMSE of 1389.14.
vi. Naive Forecast model has a high RMSE of 3864.28, suggesting poor forecasting
accuracy.
vii. Holt (Double Exponential Smoothing) model has by far the highest RMSE of 23136.38,
making it the least accurate model on the test data.

Plots:

Figure 8: Overall results of various models on test data

8. Based on the model-building exercise, build the most optimum
model(s) on the complete data and predict 12 months into the future
with appropriate confidence intervals/bands.

NOTE: The best model, based on the lowest test RMSE, is the Holt-Winters model (Triple
Exponential Smoothing).

Plotting the historical data and forecast with confidence intervals:

Figure 9: Forecast with confidence intervals

9. Comment on the model thus built and report your findings and suggest
the measures that the company should be taking for future sales.
Please explain and summarise the various steps performed in this
project. There should be proper business interpretation and actionable
insights present.

The steps performed, findings, and actionable insights from the model-building exercise for
wine sales forecasting are summarised below:

Summary of Steps performed:

• Data Collection: The project began with the collection of historical wine sales data for
analysis and forecasting.
• Exploratory Data Analysis: An initial analysis of the data was conducted to understand
trends, seasonality, and other patterns. This revealed the presence of seasonality in the
wine sales data.
• Model Selection: Several time series forecasting models were considered, including
SARIMA and Holt-Winters. After a comprehensive analysis, the Holt-Winters model was
selected as the best-performing model based on RMSE (Root Mean Square Error).
• Model Training: The Holt-Winters model was trained on the complete dataset, including
both training and test data.
• Forecasting: The trained Holt-Winters model was used to make forecasts for the next 12
months into the future.
• Prediction Intervals: Prediction intervals (bands) were calculated for the forecasts based
on the model's residuals.
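
One common way to build such residual-based bands is sketched below. The report does not give its exact formula, so this is an assumption: the band is the forecast plus or minus z times the standard deviation of the in-sample residuals, with z = 1.96 for an approximate 95% interval. The function name and the numbers are hypothetical.

```python
import numpy as np


def residual_bands(forecast, residuals, z=1.96):
    """Approximate prediction bands: forecast +/- z * std of in-sample residuals."""
    forecast = np.asarray(forecast, dtype=float)
    sigma = np.std(np.asarray(residuals, dtype=float), ddof=1)
    return forecast - z * sigma, forecast + z * sigma


# Hypothetical numbers for illustration only (not taken from the report).
forecast = np.array([1800.0, 1650.0, 2100.0])
residuals = np.array([-120.0, 80.0, 40.0, -60.0, 60.0])

lower, upper = residual_bands(forecast, residuals)
for f, lo, hi in zip(forecast, lower, upper):
    print(f"forecast {f:8.1f}   band [{lo:8.1f}, {hi:8.1f}]")
```

This construction gives bands of constant width. It treats the residuals as roughly normal with constant variance and does not widen with the forecast horizon, so it tends to understate uncertainty for months further ahead.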

Findings and Insights:

• Model Selection: The Holt-Winters model demonstrated the best forecasting accuracy
among the models considered. This model considers both seasonality and trend in the
data, making it suitable for capturing the underlying patterns in wine sales.
• Confidence Intervals: While Holt-Winters does not provide direct confidence intervals,
prediction intervals (bands) were calculated to estimate the uncertainty in the forecasts.
• Future Sales Insights: Based on the model's forecasts and prediction intervals, the
company can gain valuable insights:
o The forecasts can serve as a basis for production planning, inventory management,
and supply chain optimization.
o The prediction intervals indicate the range of uncertainty in the forecasts, helping
the company understand the potential variability in sales.

Measures for the Company:

• Demand Planning: The company should use the forecasts to plan production and
inventory levels. By aligning production with the forecasts, they can minimize overstock
and understock situations.
• Marketing and Promotions: Insights from the model can guide the company in identifying
high-demand periods and planning marketing and promotion strategies accordingly.
• Inventory Management: The prediction intervals highlight the range of potential sales.
The company can use this information for optimizing inventory levels and reducing carrying
costs.
• Resource Allocations: With accurate forecasts, the company can allocate resources
efficiently, ensuring that they have the right number of staff, materials, and distribution
channels available when needed.
• Continuous Monitoring: The company should continuously monitor actual sales against
forecasts and update the model as new data becomes available. This helps in improving
the forecasting accuracy over time.
• Scenario Planning: The company can use the model to conduct scenario planning,
considering various sales scenarios and their implications on operations and profitability.

In addition to this, continuous monitoring and flexibility in response to changing market
conditions are essential for successful implementation.

Rose Wine Sales – Forecast Data:

1. Read the data as an appropriate Time Series data and plot the data.

The Rose wine sales data was transformed into a time series data format, with the top 5
records displayed, and the complete dataset was visualized through plotting.

Top 5 Records:

YearMonth     Rose
1980-01-01    112.0
1980-02-01    118.0
1980-03-01    129.0
1980-04-01    99.0
1980-05-01    116.0

Table 10: Top 5 Records

Data Information:

Figure 10: Data information

Time-Series Plot:

Figure 11: Rose Wine Sales Over Time

Inference:

➢ Seasonal Patterns: The "Rose" values show seasonal patterns, with peaks in different
months across the years. In particular, there is a peak around December each year,
indicating a potential annual cycle.
➢ Yearly Trends: Over the years, there are variations in the "Rose" wine sales. Some
years show increasing trends, while others show decreasing trends.
➢ Outliers: Some months have unusually high or low "Rose" wine sales, which could
be outliers or related to specific events or factors.
➢ Missing Values: The dataset contains missing values, particularly in the year 1994. It
is important to address the missing values before forecasting to ensure the continuity
of the time series.

Imputing the missing values


To fill missing values in the time series data, there are various methods like forward-fill,
backward-fill, interpolation, or seasonal decomposition. Here, the missing values are imputed
with Forward-fill method. This method fills missing values with the most recent preceding non-
missing value. It is suitable when the data's most recent values are a good estimate of the
missing values.
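
As a small illustration of the forward-fill method, the sketch below compares it with linear interpolation on a short series. The values and dates are hypothetical and are not taken from the Rose dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly values with two consecutive gaps (not the report's data).
idx = pd.date_range("1994-05-01", periods=6, freq="MS")
sales = pd.Series([44.0, 45.0, np.nan, np.nan, 46.0, 51.0], index=idx, name="Rose")

# Forward fill: carry the last observed value into each gap.
filled_ffill = sales.ffill()

# Alternative for comparison: a straight line between the neighbouring observations.
filled_linear = sales.interpolate(method="linear")

print(pd.DataFrame({"original": sales, "ffill": filled_ffill, "linear": filled_linear}))
```

Forward fill repeats 45.0 in both gaps, whereas interpolation moves gradually towards 46.0. Which one is preferable depends on how quickly sales change from month to month.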

Plotting the time-series by imputing using Forward fill method:

Figure 12: Time-Series plot after imputing the missing values

2. Perform appropriate Exploratory Data Analysis to understand the data
and also perform decomposition.

Data Summary:

Rose

count 187.000000

mean 89.909091

std 39.244440

min 28.000000

25% 62.500000

50% 85.000000

75% 111.000000

max 267.000000

Table 11: Data Summary

Inference:

➢ The data covers a range of sales values, from a minimum of 28 units to a maximum of
267 units, indicating variability in monthly sales.
➢ The average monthly sales are around 89.91 units, suggesting a moderate level of
sales on average.
➢ The data is only mildly right-skewed: the median (85 units) is close to, but below,
the mean (89.91 units).
➢ There are substantial variations in monthly sales, as indicated by the standard
deviation of approximately 39.24 units.
➢ Sales values at different percentiles provide a sense of the distribution of data across
the values. For instance, 25% of the data falls below 62.5 units, and 75% falls below
111 units.

These insights help in understanding the central tendencies, variability, and distribution of
the wine sales in the data, which can be valuable for forecasting and making informed
business decisions.

Time-Series Decomposition:
Method 1: Additive Method

Figure 13: Time Series Decomposition- Additive method

Method 2: Multiplicative method

Figure 14: Time-Series Decomposition- Multiplicative method

Inference:

Given the relatively stable and consistent seasonality observed in the data, the additive model
is a reasonable choice. The seasonality in wine sales is added to the trend component. This
model simplifies the decomposition and is easier to interpret.
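
To make the additive model concrete, the sketch below implements a classical additive decomposition by hand for monthly data (observed = trend + seasonal + residual). It is an illustration on synthetic data, not the routine used to produce Figures 13 and 14; the function name additive_decompose_monthly and the synthetic series are assumptions introduced here.

```python
import numpy as np
import pandas as pd

def additive_decompose_monthly(y: pd.Series) -> pd.DataFrame:
    """Classical additive decomposition for a monthly series with a DatetimeIndex."""
    # Centred 2x12 moving average as the trend estimate (6 months lost at each end).
    trend = y.rolling(12).mean().rolling(2).mean().shift(-6)
    detrended = y - trend
    # Seasonal effect: average detrended value per calendar month, centred on zero.
    month_means = detrended.groupby(detrended.index.month).mean()
    month_means = month_means - month_means.mean()
    seasonal = pd.Series(month_means.reindex(y.index.month).to_numpy(), index=y.index)
    # Whatever is left after removing trend and seasonality.
    resid = y - trend - seasonal
    return pd.DataFrame({"observed": y, "trend": trend, "seasonal": seasonal, "resid": resid})

# Synthetic example: declining linear trend plus a fixed yearly pattern.
idx = pd.date_range("1980-01-01", periods=60, freq="MS")
t = np.arange(60)
true_seasonal = 10 * np.sin(2 * np.pi * t / 12)
y = pd.Series(100 - 0.5 * t + true_seasonal, index=idx)

parts = additive_decompose_monthly(y)
print(parts.dropna().head())
```

A multiplicative decomposition follows the same steps with division in place of subtraction (observed = trend x seasonal x residual), which suits data whose seasonal swings grow or shrink with the level of the series.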

3. Split the data into training and test. The test data should start in 1991.

After splitting the data into training and test sets, the training dataset comprises 132 data
points, while the test dataset consists of 55 data points, ensuring a robust evaluation of the
forecasting models.

Training Data Shape: (132, 1)

Test Data Shape: (55, 1)
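
The split can be reproduced with a date-based filter. The sketch below assumes a monthly index of 187 observations starting in January 1980, which matches the record count and first date reported above; the values themselves are placeholders.

```python
import numpy as np
import pandas as pd

# 187 monthly observations starting January 1980 (values are placeholders).
idx = pd.date_range("1980-01-01", periods=187, freq="MS")
df = pd.DataFrame({"Rose": np.arange(187, dtype=float)}, index=idx)

# Everything before 1991 is used for training; 1991 onwards is held out for testing.
train = df[df.index.year < 1991]
test = df[df.index.year >= 1991]

print("Training Data Shape:", train.shape)
print("Test Data Shape:", test.shape)
```

Splitting by date rather than at random keeps the chronological order intact, so the test period always lies strictly after the training period.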

4. Build all the exponential smoothing models on the training data and
evaluate the model using RMSE on the test data. Other models such as
regression, naïve forecast models and simple average models should
also be built on the training data and check the performance on the test
data using RMSE.

Various forecasting models were utilized to analyse and predict wine sales. These models
include Simple Exponential Smoothing (SES), Double Exponential Smoothing (DES), Holt-Winter
(Triple Exponential Smoothing), Linear Regression, Naïve Forecast, and Simple Average models.
The corresponding RMSE values on the test data are given in the table below.

Model RMSE

Holt-Winter Model 26.724742

Simple Exponential Smoothing 37.612861

Double Exponential Smoothing 366.304022

Regression Model 15.275732

Naive Forecast Model 79.738550

Simple Average Model 53.480857

Table 12: RMSE values of models used
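
For reference, the RMSE metric and the three simplest benchmark models can be sketched as below. The train and test values are illustrative numbers, not the Rose data, and the regression is assumed to be a straight line fitted against a time index, since the report does not state the regressors used.

```python
import numpy as np

def rmse(actual, forecast):
    """Root mean squared error between two equally long sequences."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.sqrt(np.mean((actual - forecast) ** 2)))

# Illustrative values only.
train = np.array([112.0, 118.0, 129.0, 99.0, 116.0, 110.0, 105.0, 98.0])
test = np.array([95.0, 92.0, 90.0, 88.0])

# Naive forecast: repeat the last training observation.
naive = np.full(len(test), train[-1])

# Simple average: repeat the mean of the training data.
simple_avg = np.full(len(test), train.mean())

# Linear regression on a time index 1, 2, 3, ... extended into the test period.
t_train = np.arange(1, len(train) + 1)
t_test = np.arange(len(train) + 1, len(train) + len(test) + 1)
slope, intercept = np.polyfit(t_train, train, deg=1)
regression = intercept + slope * t_test

for name, forecast in [("Naive", naive), ("Simple Average", simple_avg), ("Regression", regression)]:
    print(f"{name}: RMSE = {rmse(test, forecast):.3f}")
```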

Model Plots:

Figure 15: Plotting on test data using various models

Inference:

➢ Holt-Winter Model (RMSE: 26.724742): The Holt-Winter model has a moderate
RMSE, indicating a reasonable level of forecasting accuracy. It outperforms several
other models in the list.
➢ Simple Exponential Smoothing (RMSE: 37.612861): Simple Exponential Smoothing
(SES) has a relatively high RMSE, suggesting a notable forecasting error when applied
to the test data.
➢ Double exponential Smoothing (RMSE: 366.304022): Double Exponential
Smoothing (DES) has an extremely high RMSE, signifying substantial forecasting
inaccuracy. The result may indicate an issue with model configuration or data
suitability.
➢ Regression Model (RMSE: 15.275732): The regression model has the lowest RMSE in
Table 12, indicating relatively good performance in capturing the underlying trend in
the data.
➢ Naive Forecast Model (RMSE: 79.738550): The Naïve Forecast Model has the
second-highest RMSE (after Double Exponential Smoothing), indicating a large forecasting
error. It simply uses the last observed value as the forecast, which may not capture
underlying patterns or trends.
➢ Simple Average Model (RMSE: 53.480857): The Simple Average Model performs
better than the Naïve Forecast Model but still has a relatively high RMSE, suggesting
that it does not capture the dynamics of the time series data effectively.

So, among the exponential smoothing models, the Holt-Winter model (Triple Exponential
Smoothing) performs best (RMSE 26.72), since it accounts for both seasonality and trend.
Across all models in Table 12, however, the regression model has the lowest RMSE (15.28).
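
The recursions behind SES and DES can be written out in a few lines. This is a hand-rolled sketch for illustration only, not the library routine used for Table 12; the function names, the initialisation, and the smoothing parameters alpha and beta are assumptions chosen for the example.

```python
import numpy as np

def ses_forecast(y, alpha, horizon):
    """Simple exponential smoothing: a flat forecast at the final smoothed level."""
    level = y[0]
    for obs in y[1:]:
        level = alpha * obs + (1 - alpha) * level
    return np.full(horizon, level)

def holt_forecast(y, alpha, beta, horizon):
    """Holt's double exponential smoothing: level plus a linearly extrapolated trend."""
    level, trend = y[0], y[1] - y[0]
    for obs in y[1:]:
        prev_level = level
        level = alpha * obs + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    steps = np.arange(1, horizon + 1)
    return level + steps * trend

y = np.array([10.0, 12.0, 14.0, 16.0, 18.0])
print(ses_forecast(y, alpha=0.5, horizon=3))
print(holt_forecast(y, alpha=0.5, beta=0.3, horizon=3))
```

Because the Holt forecast is a straight line, any error in the final trend estimate is multiplied by the number of steps ahead. Over a 55-month test horizon this can produce very large errors, which is one possible explanation for the DES result, although the report does not confirm the cause.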

5. Check for the stationarity of the data on which the model is being built
on using appropriate statistical tests and also mention the hypothesis
for the statistical test. If the data is found to be non-stationary, take
appropriate steps to make it stationary. Check the new data for
stationarity and comment. Note: Stationarity should be checked at
alpha = 0.05.

To assess the stationarity of the time series data, we employed the Augmented Dickey-Fuller
(ADF) test at a significance level of alpha = 0.05. The test operates on the following
hypotheses:

• Null Hypothesis (H0): The time series is not stationary.

• Alternative Hypothesis (H1): The time series is stationary.

This analysis provides valuable insights into the data's stationarity, a critical factor in time
series forecasting.

After checking for ADF test, the following results are obtained.

ADF Statistic: -1.8748555417199908


p-value: 0.3439807193343034

Since the p-value (0.344) is greater than 0.05, the null hypothesis is not rejected: the data
is not stationary.

The data was therefore differenced once to make it stationary, and the test was repeated:

ADF Statistic (Differenced Data): -8.04413902007531
p-value (Differenced Data): 1.8135795068093227e-12

So,

Null Hypothesis Rejected: The differenced data is stationary.

Inference:

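• After applying one-time differencing to the data, the ADF test indicates that the time
series is stationary. The differenced series no longer exhibits a significant trend, and
its mean and variance are approximately constant over time. Any remaining seasonality is
handled separately through the seasonal terms of the SARIMA model.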

• In the context of time series analysis and modelling, having stationary data is important
because it is a fundamental assumption of ARIMA-type models, which reach stationarity
through their differencing orders. Exponential smoothing models such as Holt-Winter do
not require it. Stationary data ensures that the statistical properties of the time
series, such as mean and variance, remain constant over time. Non-stationary data, on
the other hand, can exhibit changing statistical properties, making it more challenging
to model and forecast accurately.

6. Build an automated version of the ARIMA/SARIMA model in which the
parameters are selected using the lowest Akaike Information Criteria
(AIC) on the training data and evaluate this model on the test data using
RMSE.

ARIMA:
ARIMA Results:

Metric Value

RMSE (Root Mean Squared Error) 36.82998834185089

Best ARIMA order (p, d, q) (2, 1, 3)

AIC (Akaike Information Criterion) 1274.695121131141

Table 13: ARIMA Model - Results

ARIMA Summary:

Table 14: ARIMA Summary

ARIMA Model on Test Data- Plot:

Figure 16: ARIMA Model forecast on Test data

SARIMA:

SARIMA Results:

Metric Value

Best SARIMA Order (p, d, q) (0, 1, 4)

Best SARIMA Seasonal Order (P, D, Q, s) (0, 1, 1, 12)

RMSE for SARIMA Model 14.390639768196014

AIC for the Best SARIMA Model 1088.918434291231

Table 15: SARIMA Model - Results

These metrics provide key information about the SARIMA model's parameters and
performance in forecasting wine sales.

SARIMA Summary:

Table 16: SARIMA Summary

SARIMA Model on Test Data- Plot:

Figure 17: SARIMA Model forecast on Test data

Inference:

i. Model Fit: The SARIMA model has a significantly lower RMSE (14.39) compared to
the ARIMA model (36.83). This suggests that the SARIMA model is better at capturing
the underlying patterns in the data and making more accurate predictions.
ii. Model Orders:
a. For the ARIMA model, the best order (p, d, q) is (2, 1, 3), which means it
includes autoregressive (AR) and moving average (MA) components. The
differencing parameter (d) is set to 1, indicating that the data needed one
differencing to make it stationary.
b. For the SARIMA model, the best order (p, d, q) is (0, 1, 4), and the seasonal
order (P, D, Q, s) is (0, 1, 1, 12). This indicates that the SARIMA model includes
a seasonal component with a yearly seasonality (s=12).
iii. Information Criterion (AIC): The SARIMA model has a lower AIC (1088.92)
compared to the ARIMA model (1274.70). This further suggests that the SARIMA
model is a better fit for the data.

So, it is evident that the SARIMA model outperforms the ARIMA model in terms of accuracy
(lower RMSE) and model fit (lower AIC). The SARIMA model takes into account both non-
seasonal and seasonal components in the time series data, making it a more appropriate
choice for forecasting.

7. Build a table (create a data frame) with all the models built along with
their corresponding parameters and the respective RMSE values on the
test data.

Here's a table summarizing all the models built, their parameters, and the RMSE values on
the test data:

Model Parameters RMSE

SES (Auto) Auto-selected 37.61286146429779

Holt (Auto) Auto-selected 366.3040222851931

Holt-Winters (Auto) Auto-selected 26.7247417908197

Linear Regression Auto-selected 15.2757315973659

Naïve Forecast (Last Observed) Last Observed 79.73855004724103

Simple Average (Train) Train Data 53.48085657692872

ARIMA p=2, d=1, q=3 36.82998834185089

SARIMA p=0, d=1, q=4, P=0, D=1, Q=1, s=12 14.390639768196014

Table 17: Results of RMSE from various models
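
A data frame of this kind can be assembled and ranked as sketched below. The RMSE values are copied from Table 17; the Parameters column is left out for brevity, and the variable names are illustrative.

```python
import pandas as pd

# Model names and test RMSE values as reported in Table 17.
results = pd.DataFrame(
    [
        ("SES (Auto)", 37.61286146429779),
        ("Holt (Auto)", 366.3040222851931),
        ("Holt-Winters (Auto)", 26.7247417908197),
        ("Linear Regression", 15.2757315973659),
        ("Naive Forecast (Last Observed)", 79.73855004724103),
        ("Simple Average (Train)", 53.48085657692872),
        ("ARIMA", 36.82998834185089),
        ("SARIMA", 14.390639768196014),
    ],
    columns=["Model", "RMSE"],
)

# Rank from best (lowest RMSE) to worst.
ranked = results.sort_values("RMSE").reset_index(drop=True)
print(ranked.round(2))
```

Sorting makes the ordering explicit: SARIMA comes first, followed closely by Linear Regression, with the Holt model last.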

Inference:

The table provides a summary of different forecasting models along with their corresponding
parameters and RMSE values.

➢ Simple Exponential Smoothing (SES): This model, with auto-selected parameters, has
a relatively high RMSE, indicating moderate forecasting accuracy.
➢ Holt (Auto): The Holt (Double Exponential Smoothing) model with auto-selected parameters
has an extremely high RMSE, suggesting significant forecasting inaccuracy. This may
indicate an issue with model configuration or data suitability.
➢ Holt-Winters (Auto): The auto-selected Holt-Winters model performs better than the
other exponential smoothing models, with a relatively lower RMSE, indicating better
forecasting accuracy.
➢ Linear Regression: The regression model, with auto-selected parameters, exhibits good
forecasting accuracy and performs well in capturing the underlying trend in the data.

➢ Naïve Forecast (Last Observed): The Naïve Forecast model, which relies on the last
observed value, has the second-highest RMSE (after the Holt model), suggesting a large
forecasting error. This model may not capture the underlying patterns or trends effectively.
➢ Simple Average: The Simple Average model, trained on the historical data, performs
better than the Naïve Forecast model but still has a relatively high RMSE, indicating a
need for improved forecasting accuracy.
➢ ARIMA (Auto): The ARIMA model with auto-selected parameters exhibits a moderate
RMSE, suggesting a reasonable level of forecasting accuracy.
➢ SARIMA (Auto): The SARIMA model with auto-selected parameters shows the lowest
RMSE, indicating the best forecasting accuracy among the models. This model appears
to capture the time series data's underlying patterns effectively.

Plots:

Figure 18: Overall results of various models on test data

8. Based on the model-building exercise, build the most optimum
model(s) on the complete data and predict 12 months into the future
with appropriate confidence intervals/bands.

NOTE: The best model, based on the lowest test RMSE, is the SARIMA model.

Plotting the historical data and forecast with confidence intervals:

Figure 19: Forecast with confidence intervals

Plotting the prediction intervals:

Figure 20: Prediction intervals

9. Comment on the model thus built and report your findings and suggest
the measures that the company should be taking for future sales.
Please explain and summarise the various steps performed in this
project. There should be proper business interpretation and actionable
insights present.

The model built in this project is a Seasonal Autoregressive Integrated Moving Average
(SARIMA) model. It was selected as the most optimum based on its performance in
forecasting wine sales. Here is a summary of the various steps performed in this project,
along with findings and actionable insights for the company:

Summary of Steps performed:

➢ Data Preparation: The project began with the importation and cleaning of the time series
data, which represents monthly wine sales over several years. Missing values were
handled, and the data was transformed into an appropriate format for analysis.
➢ Exploratory Data Analysis: Exploratory data analysis was performed to understand the
data's characteristics. The data exhibited seasonality and trends, making time series
modelling an appropriate choice.
➢ Model Building: Various time series models were built and evaluated, including Simple
Exponential Smoothing, Holt-Winters, Linear Regression, ARIMA, and SARIMA models.
Model parameters were auto-selected or manually configured based on their performance
in forecasting the data.
➢ Model Evaluation: The models were evaluated using Root Mean Square Error (RMSE),
and the SARIMA model was found to have the lowest RMSE, indicating the best
forecasting accuracy.
➢ Stationarity Check: Stationarity tests (i.e., ADF) were conducted on the data. The
SARIMA model assumes stationary data, and the data was adjusted as necessary to meet
this requirement.
➢ Optimum Model Building: The SARIMA model with the best parameters was built using
the entire dataset.
➢ Forecasting: The SARIMA model was used to predict wine sales for the next 12 months
into the future.
➢ Prediction Intervals: Prediction intervals (confidence intervals) were calculated to provide
a range of possible future sales values, accounting for uncertainty.

Findings:

• The SARIMA model demonstrated the best forecasting accuracy, making it a suitable
choice for predicting future wine sales.

• The forecasted sales for the next 12 months indicate a seasonal pattern, with
fluctuations throughout the year.

• The prediction intervals provide a range of expected sales values, helping the company
understand the uncertainty in the forecasts.

Actionable Insights:

• Inventory Management: The company can use the SARIMA model's forecasts to plan
inventory effectively. By considering seasonal patterns and prediction intervals, they can
ensure they have the right amount of wine in stock to meet customer demand.
• Promotions and Marketing: Understanding the seasonal variations in sales can help the
company plan promotions and marketing campaigns at the right times to boost sales
during peak seasons.
• Financial Planning: Accurate sales forecasts can assist in financial planning and
budgeting. The company can allocate resources and investments more efficiently based
on expected sales.
• Data Collection and Quality: Ensuring data quality is crucial for accurate forecasting. The
company should continue collecting and maintaining high-quality sales data to improve
forecasting models further.
• Model Refinement: Periodically reviewing and refining the SARIMA model's parameters
can lead to more accurate forecasts. Continuous model improvement is essential for
maintaining forecasting excellence.
• Market Expansion: Identifying trends and patterns in sales data can help the company
explore new markets and expand its product line to maximize profitability.

So, the SARIMA model provides a powerful tool for forecasting wine sales, allowing the
company to make informed decisions about inventory, marketing, and financial planning. By
leveraging accurate forecasts and understanding the underlying patterns, the company can
take strategic measures to enhance future sales and overall business performance.

THE END!