Business Report Sparkling Dataset - TSF

Business Analytics Report
Submitted to: Concerned faculty at

Great Learning
The University of Texas at Austin
Submitted By:
Mr. Charit Sharma

PGPDSBA Online July 2021
Subject: - Time Series Forecasting – SPARKLING DATA SET
Sparkling Dataset 2
Contents
Problem Statement
For this assignment, data on the sales of different types of wine in the 20th century is to be analyzed. Both datasets
come from the same company but cover different wines. As an analyst at ABC Estate Wines, you are tasked with analyzing
and forecasting wine sales in the 20th century.
1. Read the data as an appropriate Time Series data and plot the data.
2. Perform appropriate Exploratory Data Analysis to understand the data and also perform decomposition.
3. Split the data into training and test. The test data should start in 1991.
4. Build all the exponential smoothing models on the training data and evaluate the model using RMSE on the test data.
Other additional models such as regression, naïve forecast models, simple average models, moving average models
should also be built on the training data and check the performance on the test data using RMSE.
5. Check for the stationarity of the data on which the model is being built on using appropriate statistical tests and also
mention the hypothesis for the statistical test. If the data is found to be non-stationary, take appropriate steps to make
it stationary. Check the new data for stationarity and comment. Note: Stationarity should be checked at alpha = 0.05.
6. Build an automated version of the ARIMA/SARIMA model in which the parameters are selected using the lowest
Akaike Information Criteria (AIC) on the training data and evaluate this model on the test data using RMSE.
7. Build ARIMA/SARIMA models based on the cut-off points of ACF and PACF on the training data and evaluate this
model on the test data using RMSE.
8. Build a table with all the models built along with their corresponding parameters and the respective RMSE values on
the test data.
9. Based on the model-building exercise, build the most optimum model(s) on the complete data and predict 12 months
into the future with appropriate confidence intervals/bands.
10. Comment on the model thus built and report your findings and suggest the measures that the company should be
taking for future sales.
1. Read the data as an appropriate Time Series data and plot the data.
Display top 5 records
Fig:1. Sparkling first 5 records
Display bottom 5 records
Fig:2. Sparkling last 5 records
Check for Duplicate values
Sparkling data doesn’t have duplicate records.
Check for Missing values
Sparkling data doesn’t have null values.
Shape of the Sparkling data set
The Data set has 187 rows and 2 columns
Creating a time stamp and converting the data into an actual time series
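The time-stamp step can be sketched as follows. This is a minimal illustration, assuming pandas was used and that each of the 187 rows is one month starting in January 1980; the name `date_index` and the placeholder sales values are hypothetical and are not taken from the data file.

```python
import pandas as pd

# Placeholder values only; the real figures come from the Sparkling data file.
sales = list(range(187))

# Month-start time stamps: 187 monthly periods beginning January 1980.
date_index = pd.date_range(start="1980-01-01", periods=187, freq="MS")

# Attach the time stamps as the index so the data behaves as a time series.
df = pd.DataFrame({"Sparkling": sales}, index=date_index)

print(df.index[0].date(), df.index[-1].date())
```

With 187 monthly periods the index ends in July 1995, which matches the period described in the observations.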
Fig:6. Sparkling time series plot without time-stamp
Note
The x-axis does not show time-stamp values, so the date range has to be passed manually.
Fig:7. Sparkling time series plot with time-stamp
2. Perform appropriate Exploratory Data Analysis to understand the data and also perform decomposition.
Descriptive Statistics of data
Fig:8. Sparkling Descriptive Stats
Yearly Boxplot
Fig:9. Sparkling Yearly box plot
Monthly Boxplot
Fig:10. Sparkling Monthly box plot
Month plot
Fig:11. Sparkling Month plot
Pivot table
Table:1. Sparkling pivot
Monthly Sales across years
Fig:12. Sparkling sales across years
Analysis Observations from Sparkling wine data:
• Monthly sales of Sparkling wine are provided from January 1980 to July 1995.
• The given data file has been read as is, and a date range (valid time-stamp) has been applied to the data as an
index.
• Sales of Sparkling wine rise and fall over the period, with no consistent trend.
• Sparkling wine has been consistently popular with customers over the years.
• According to the data, on average 2402 units of Sparkling wine were sold each month over the period. In
approximately 50% of months, sales ranged from 1605 units to 2549 units. The maximum sale reported in a
month is 7242 units and the minimum is 1070.
• Based on the yearly boxplot, the typical sale of Sparkling wine has remained relatively steady over the
period, around or a little below 2000 units.
Decade Plot
Fig:13. Sparkling Decade plot
Yearly Plot
Fig:14. Sparkling Yearly plot
Quarterly plot
Fig:15. Sparkling Quarterly plot
Daily Plot
Fig:16. Sparkling Daily plot
ECDF
Fig:17. Sparkling ECDF
Inference:
• From the empirical CDF plot, it seems that at most about 3000 units of Sparkling wine were sold in roughly 80% of months.
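How such a statement is read off an empirical CDF can be sketched as below. The sales figures are made up for illustration, and the helper names `ecdf` and `share_at_or_below` are hypothetical.

```python
import numpy as np

def ecdf(values):
    """Return sorted values and the cumulative share of observations at or below each."""
    x = np.sort(np.asarray(values, dtype=float))
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y

def share_at_or_below(values, threshold):
    """Height of the ECDF at the given threshold."""
    values = np.asarray(values, dtype=float)
    return float(np.mean(values <= threshold))

# Made-up monthly sales, not the Sparkling data.
monthly_sales = [1200, 1500, 1800, 2100, 2500, 2900, 3100, 4000, 5500, 7000]
x, y = ecdf(monthly_sales)
print(share_at_or_below(monthly_sales, 3000))  # 0.6
```

The ECDF height at a threshold is the share of months with sales at or below that threshold, not above it.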
Avg Sparkling wine sales per month
Fig:18. Average Sparkling sales per month
Month on month Percent change
Fig:19. Sparkling month on month percent change
Decomposition
The decomposition of time series is a statistical task that deconstructs a time series into several components.
There are various forces that may affect the observations in a time series. The three important components are:
I. Trend (Long term movement)
II. Seasonal component: Intra-year stable fluctuations repeatable over the entire length of series
III. Irregular component (Random movements)
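The three components can be illustrated with a classical additive decomposition. The sketch below is a simplified NumPy version for an even seasonal period, run on a synthetic series. It is not the routine used to produce the figures in this report, and the function name `classical_additive_decompose` is hypothetical.

```python
import numpy as np

def classical_additive_decompose(y, period=12):
    """Classical additive decomposition y = trend + seasonal + residual (even period)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    half = period // 2
    # Centered 2 x period moving average: half weights on the two end points.
    weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    trend = np.full(n, np.nan)
    trend[half:n - half] = np.convolve(y, weights, mode="valid")
    # Seasonal effect: mean of the detrended values for each position in the cycle.
    detrended = y - trend
    seasonal_means = np.array(
        [np.nanmean(detrended[k::period]) for k in range(period)]
    )
    seasonal_means -= seasonal_means.mean()  # force the effects to sum to zero
    seasonal = np.tile(seasonal_means, n // period + 1)[:n]
    residual = y - trend - seasonal
    return trend, seasonal, residual

# Synthetic series: linear trend plus a repeating zero-sum monthly pattern.
t = np.arange(48)
pattern = np.array([-3, -2, -1, 0, 1, 2, 3, 2, 1, 0, -1, -2], dtype=float)
y = 100 + 2 * t + np.tile(pattern, 4)
trend, seasonal, residual = classical_additive_decompose(y, period=12)
print(np.round(seasonal[:12], 6))
```

A multiplicative decomposition follows the same steps with division in place of subtraction, so that the series is the product of the three components.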
Additive
Fig:20. Sparkling - Additive
Multiplicative
Fig:21. Sparkling - Multiplicative
Inference:
• There is no consistent trend shown in the plot of the trend component.
Fig:22. Additive - deseasonalized
Fig:23. Multiplicative - deseasonalized
3. Split the data into training and test. The test data should start in 1991.
Fig:24. Sparkling train and test split
Fig:25. Sparkling train and test plot
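A minimal sketch of the split, assuming a pandas DataFrame with a monthly DatetimeIndex; the values below are placeholders, not the Sparkling sales.

```python
import pandas as pd

# Placeholder frame with the same monthly index as the Sparkling data.
date_index = pd.date_range(start="1980-01-01", periods=187, freq="MS")
df = pd.DataFrame({"Sparkling": range(187)}, index=date_index)

# Training data ends in December 1990; test data starts in January 1991.
train = df[df.index.year < 1991]
test = df[df.index.year >= 1991]

print(len(train), len(test))  # 132 55
```

Splitting on the calendar year keeps the time order intact, which a random split would not.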
4. Build all the exponential smoothing models on the training data and evaluate the model using RMSE on the test
data. Other additional models such as regression, naïve forecast models, simple average models, moving average
models should also be built on the training data and check the performance on the test data using RMSE.
Model 1 Linear Regression

We are going to regress the 'Sparkling' variable against the order of occurrence, that is, a running time index. To do
this, the training data has to be modified by adding that index before fitting the linear regression.
Fig:26. Sparkling -Linear Regression
▪ The RMSE value for Linear Regression model is 1389.135
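The regression on the order of occurrence, and the RMSE used throughout this section, can be sketched as follows. The numbers are a toy series, and `linear_trend_forecast` and `rmse` are hypothetical helper names; the report's own fitting code is not shown in the source.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

def linear_trend_forecast(train_values, n_test):
    """Fit value = slope * time + intercept on the training data and extrapolate."""
    train_values = np.asarray(train_values, dtype=float)
    n_train = len(train_values)
    train_time = np.arange(1, n_train + 1)                   # 1, 2, ..., n_train
    test_time = np.arange(n_train + 1, n_train + n_test + 1)  # continues the index
    slope, intercept = np.polyfit(train_time, train_values, deg=1)
    return slope * test_time + intercept

# Toy data lying exactly on a line, so the forecast error is essentially zero.
train_values = [10.0, 12.0, 14.0, 16.0]
test_values = [18.0, 20.0]
forecast = linear_trend_forecast(train_values, len(test_values))
print(forecast, rmse(test_values, forecast))
```

The time index for the test period continues from where the training index stops, so the fitted line is extrapolated rather than refitted.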
Model 2 Naïve Approach

In the naive model, the prediction for tomorrow is the same as today's value. The prediction for the day after tomorrow
is tomorrow's prediction, which is again today's value. Every future prediction therefore equals the last observed
value in the training data.
Fig:27. Sparkling -Naïve Approach
▪ The RMSE value for Naïve approach model is 3864.279
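A minimal sketch of the naive forecast on toy numbers; `naive_forecast` is a hypothetical helper name.

```python
import numpy as np

def naive_forecast(train_values, n_test):
    """Repeat the last training observation for every test period."""
    return np.full(n_test, float(train_values[-1]))

def rmse(actual, predicted):
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

train_values = [5.0, 7.0, 9.0]
test_values = [8.0, 10.0]
forecast = naive_forecast(train_values, len(test_values))
print(forecast, rmse(test_values, forecast))  # [9. 9.] 1.0
```

Because the forecast is a flat line at the last training value, it is sensitive to whether that final month was unusually high or low.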
Model 3 Simple Average

In simple average method, we will forecast by using the average of the training values.
Fig:28. Sparkling -Simple Average
▪ The RMSE value for Simple Average model is 1275.082
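The simple average forecast can be sketched in the same way, again on toy numbers with a hypothetical helper name.

```python
import numpy as np

def simple_average_forecast(train_values, n_test):
    """Forecast every test period with the mean of the training values."""
    return np.full(n_test, float(np.mean(train_values)))

def rmse(actual, predicted):
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

train_values = [2.0, 4.0, 6.0]
test_values = [3.0, 5.0]
forecast = simple_average_forecast(train_values, len(test_values))
print(forecast, rmse(test_values, forecast))  # [4. 4.] 1.0
```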
Model 4 Moving Average

In the moving average model, we calculate rolling means (or moving averages) for different intervals. The
best interval is the one with the maximum accuracy (or the minimum error). Here, the rolling means are computed
over the entire data.
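Comparing intervals by error can be sketched as below. This assumes a trailing rolling mean computed over the whole series and scored on the test portion; the window sizes, the toy series, and the helper names are illustrative and are not the report's actual choices.

```python
import numpy as np

def trailing_mean(values, window):
    """Trailing moving average; the first window - 1 entries are NaN."""
    values = np.asarray(values, dtype=float)
    out = np.full(values.shape, np.nan)
    csum = np.cumsum(np.insert(values, 0, 0.0))
    out[window - 1:] = (csum[window:] - csum[:-window]) / window
    return out

def rmse(actual, predicted):
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

series = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])  # toy series
n_test = 2                                          # last two points act as the test set

# Score each candidate window on the test portion and keep the smallest error.
scores = {}
for window in (2, 3):
    smoothed = trailing_mean(series, window)
    scores[window] = rmse(series[-n_test:], smoothed[-n_test:])

best_window = min(scores, key=scores.get)
print(scores, best_window)
```

Note that a trailing mean that includes the current month uses the test observations themselves, so its RMSE is not directly comparable with a true out-of-sample forecast.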
Fig:29. Sparkling -Moving Average
The RMSE value for Moving average model is,
Fig:30. Model Comparison plot
Model 5 Simple Exponential Smoothing

• The autofit model picked 0.049 as the smoothing parameter (alpha).
• On the second iteration, the model was run without passing a value for alpha, using the parameters
‘optimized=True, use_brute=True’.
Fig:31. Sparkling-SES
The RMSE value for the SES model is 1316.035, with Alpha=0.049
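The role of alpha can be seen in a hand-written version of the smoothing recursion. This is a sketch only: it assumes the level is initialized at the first observation, which may differ from the initialization of the library routine used for the figures above, and `ses_forecast` is a hypothetical name.

```python
import numpy as np

def ses_forecast(train_values, alpha, n_test):
    """Simple exponential smoothing: level = alpha * y + (1 - alpha) * previous level."""
    level = float(train_values[0])  # assumed initialization
    for value in train_values[1:]:
        level = alpha * float(value) + (1.0 - alpha) * level
    # SES has no trend or seasonality, so the forecast is flat at the last level.
    return np.full(n_test, level)

print(ses_forecast([2.0, 4.0], alpha=0.5, n_test=3))  # [3. 3. 3.]
```

A small alpha such as 0.049 means the level reacts slowly to new observations, so the forecast stays close to a long-run average of the training data.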

Model 6 Double Exponential Smoothing (Holt’s Model)
Fig:32. Sparkling-DES
The RMSE value for the DES model is 1778.564, with Alpha=0.1, Beta=0.1
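Holt's method adds a trend term to the level. The recursion can be sketched as follows, assuming the level starts at the first observation and the trend at the first difference; both are assumptions for illustration, and `holt_forecast` is a hypothetical name.

```python
import numpy as np

def holt_forecast(train_values, alpha, beta, n_test):
    """Double exponential smoothing (Holt): smoothed level plus smoothed trend."""
    y = np.asarray(train_values, dtype=float)
    level = y[0]          # assumed initial level
    trend = y[1] - y[0]   # assumed initial trend
    for value in y[1:]:
        previous_level = level
        level = alpha * value + (1.0 - alpha) * (previous_level + trend)
        trend = beta * (level - previous_level) + (1.0 - beta) * trend
    steps = np.arange(1, n_test + 1)
    # The h-step-ahead forecast extends the last level along the last trend.
    return level + steps * trend

print(holt_forecast([1.0, 3.0, 5.0, 7.0], alpha=0.1, beta=0.1, n_test=2))
```

On a series that is exactly linear the forecast continues the line for any alpha and beta, which makes a convenient sanity check.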
Model 7 Triple Exponential Smoothing (Holt-Winter’s Model)
• The Triple Exponential Smoothing model (Holt-Winters' model) is applicable when the data has both trend and
seasonality. The Sparkling data contains a slight trend and significant seasonality.
• On the second iteration the model was allowed to choose the optimized values using the parameters
‘optimized=True, use_brute=True’.
• The model chosen as the final one, in comparison with all models, is the one with alpha 0.1, beta 0.9 and
gamma 0.6, which has the lowest RMSE value of 338.458.
Fig:33. Sparkling-TES
The RMSE value for the TES model is 473.152, with Alpha=0.112, Beta=0.037, Gamma=0.493
The RMSE value for the TES model is 338.458, with Alpha=0.1, Beta=0.9, Gamma=0.6
5. Check for the stationarity of the data on which the model is being built on using appropriate statistical tests and
also mention the hypothesis for the statistical test. If the data is found to be non-stationary, take appropriate steps to
make it stationary. Check the new data for stationarity and comment. Note: Stationarity should be checked at alpha =
0.05.
The Augmented Dickey-Fuller test (ADF test) is one of the most commonly used statistical tests for checking whether
a given time series is stationary.
H0: Time series is non-stationary
H1: Time series is stationary
Stationarity should be checked at alpha = 0.05
Fig:34. Sparkling- Stationarity check on the whole data
We see that at the 5% significance level, the above time series is non-stationary. Let us take a difference of order 1
and check whether the series becomes stationary or not.
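The decision rule and the differencing step can be sketched as follows. The series is a simulated random walk, not the Sparkling data, and the p-value is assumed to come from an ADF routine such as `adfuller` in statsmodels; `is_stationary` is a hypothetical helper name.

```python
import numpy as np

def is_stationary(p_value, alpha=0.05):
    """Reject H0 (non-stationary) when the ADF p-value is below alpha."""
    return p_value < alpha

# Simulated random walk: the cumulative sum of independent steps is non-stationary.
rng = np.random.default_rng(0)
steps = rng.normal(size=200)
random_walk = np.cumsum(steps)

# A difference of order 1 recovers the steps, which are stationary by construction.
differenced = np.diff(random_walk, n=1)

print(len(random_walk), len(differenced))  # 200 199
```

Differencing shortens the series by one observation per order of differencing, which is why the differenced plot starts one month later.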
After Difference of order 1,
Fig:35. Sparkling- Difference of order whole data

Fig:36. Sparkling- Autocorrelation
Fig:37. Sparkling- Differenced Autocorrelation

Inference:
• From the first plot, the ACF cuts off at lag 1, therefore the q value is 1.
• From the second plot, the differenced ACF, we get a q value of 2.
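How significant lags are read from an ACF plot can be sketched with a hand-computed sample autocorrelation and the usual approximate 95% band of ±1.96/√n. The series is a toy example, and `sample_acf` is a hypothetical helper name.

```python
import numpy as np

def sample_acf(values, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    values = np.asarray(values, dtype=float)
    n = len(values)
    centered = values - values.mean()
    denom = np.sum(centered ** 2)
    return np.array(
        [np.sum(centered[k:] * centered[:n - k]) / denom for k in range(max_lag + 1)]
    )

# Toy alternating series, chosen so the autocorrelations are easy to verify by hand.
series = np.array([1.0, -1.0] * 4)
acf = sample_acf(series, max_lag=3)

# Lags whose autocorrelation falls outside the approximate 95% band.
band = 1.96 / np.sqrt(len(series))
significant_lags = [k for k in range(1, len(acf)) if abs(acf[k]) > band]
print(np.round(acf, 3), significant_lags)
```

The last lag that stays outside the band before the plot cuts off is the candidate for q on an ACF plot, and for p on a PACF plot.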
Fig:38. Sparkling- Partial Autocorrelation
Fig:39. Sparkling- Differenced Partial Autocorrelation
Inference:
• From the first plot, the PACF cuts off at lag 1, therefore the p value is 1.
• From the second plot, the differenced PACF, we get a p value of 3.
6. Build an automated version of the ARIMA/SARIMA model in which the parameters are selected using the lowest
Akaike Information Criteria (AIC) on the training data and evaluate this model on the test data using RMSE.
Model 8 Automated ARIMA
▪ The RMSE value for ARIMA(3,1,3) is 1374.037
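The automated selection amounts to a grid search over candidate orders, keeping the one with the lowest AIC. The sketch below shows only the search loop. The scoring function `toy_aic` is a made-up stand-in so that the example runs on its own, and its minimum is arbitrary; it does not reproduce the AIC values of this report.

```python
from itertools import product

def select_order_by_aic(p_values, d_values, q_values, aic_of):
    """Score every (p, d, q) combination and return them sorted by AIC, lowest first."""
    results = []
    for order in product(p_values, d_values, q_values):
        results.append((aic_of(order), order))
    results.sort(key=lambda pair: pair[0])
    return results

def toy_aic(order):
    """Hypothetical stand-in for fitting a model and reading its AIC."""
    p, d, q = order
    return 100.0 + (p - 2) ** 2 + (q - 1) ** 2

# p and q from 0 to 3, with d fixed at 1 (one order of differencing).
ranked = select_order_by_aic(range(4), [1], range(4), toy_aic)
best_aic, best_order = ranked[0]
print(best_order, best_aic)  # (2, 1, 1) 100.0
```

In practice `aic_of` would fit a model on the training data for the given order and return its AIC; assuming statsmodels is used, that could be a function wrapping `ARIMA(train, order=order).fit().aic`.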
Model 9 Automated SARIMA
Least AIC:
Fig:40. Sparkling- Auto SARIMA Least AIC
Fig:42. Sparkling- Auto SARIMA (1,1,2)(2,0,2,6)
RMSE value is 627.381
Least AIC:
Fig:43. Sparkling- Auto SARIMA Least AIC
Fig:44. Sparkling- Auto SARIMA (1,1,2)(1,0,2,12)
RMSE value is 528.612
Inference:
• The diagnostic plot for the model shows an approximately normal distribution of residuals, with most values
close to zero.
• The correlogram shows the autocorrelation of the residuals, and no lag lies significantly outside the
confidence band.
7. Build ARIMA/SARIMA models based on the cut-off points of ACF and PACF on the training data and evaluate
this model on the test data using RMSE.
Model 10 Manual ARIMA
RMSE value of Man_ARIMA(3,1,2) is, 1379.049
Model 11 Manual SARIMA 6
• The auto-regressive parameter in a SARIMA model is 'p', which comes from the significant lag after which
the PACF plot cuts off; here it is 2.
• The moving-average parameter in a SARIMA model is 'q', which comes from the significant lag after
which the ACF plot cuts off; here it is 2.
Fig:45. Sparkling- Man_SARIMA_6(2,1,2)(2,0,2,6)
Model 12 Manual SARIMA 12
Fig:46. Sparkling- Man_SARIMA_12(3,1,1)(3,1,1,12)
Inference:
• The diagnostic plot for the model is as below, which clearly shows a normal distribution of residuals, where
more values are around zero.
8. Build a table (create a data frame) with all the models built along with their corresponding parameters and the
respective RMSE values on the test data.
Fig:47. Sparkling-Test RMSE values of all models
9. Based on the model-building exercise, build the most optimum model(s) on the complete data and predict 12
months into the future with appropriate confidence intervals/bands.
Two candidate models are considered for fitting on the whole data:
A. Man_SARIMA_12(3,1,1)(3,1,1,12), with non-seasonal order (p, d, q) = (3, 1, 1) and seasonal order
(P, D, Q, s) = (3, 1, 1, 12).
RMSE value is 547.310
B. Triple Exponential Smoothing with additive seasonality, with the parameters
α = 0.1, β = 0.9 and γ = 0.6.
RMSE value is 465.568
RMSE of TES < RMSE of SARIMA
Therefore the optimum model for the whole data is TES with parameters α = 0.1, β = 0.9 and γ = 0.6.
We have forecast Sparkling wine sales for the next 12 months using TES with parameters α = 0.1, β = 0.9
and γ = 0.6.
Fig:48. Sparkling-Confidence intervals on full data
Fig:49. Sparkling- Full data forecasting plot
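One simple way to attach a band to an exponential smoothing forecast is to widen the point forecast by a multiple of the standard deviation of the in-sample residuals. The source does not state how the bands in the figures above were computed, so the sketch below is an assumption for illustration, using made-up numbers and a hypothetical helper name.

```python
import numpy as np

def residual_band(forecast, residuals, z=1.96):
    """Approximate 95% band: forecast +/- z * standard deviation of the residuals."""
    forecast = np.asarray(forecast, dtype=float)
    spread = z * np.std(np.asarray(residuals, dtype=float), ddof=1)
    return forecast - spread, forecast + spread

# Made-up point forecasts and in-sample residuals (actual minus fitted).
forecast = [2000.0, 2100.0, 2500.0]
residuals = [-1.0, 1.0, -1.0, 1.0]
lower, upper = residual_band(forecast, residuals)
print(np.round(lower, 2), np.round(upper, 2))
```

This gives a band of constant width; it treats the residuals as roughly normal with constant variance and ignores the extra uncertainty of longer horizons.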
10. Comment on the model thus built and report your findings and suggest the measures that the company should be
taking for future sales.
• Based on a comparison of the test RMSE values of all models, Triple Exponential Smoothing (Holt-Winters)
is selected for the final prediction 12 months into the future.
• The RMSE of the TES forecast on the full Sparkling data is 465.568.
• The ABC wine company is recommended to ramp up their procurement and production line in accordance
with the above forecasts.
*************************End of Report**************************
