You are on page 1of 32

Project on: Time Series Forecasting

by: Shivani Pandey


Batch: DSBA’21
Table of Contents

1. Approach Pg-3
2. Question 1 Pg-4
3. Question 2 Pg-5-9
4. Question 3 Pg-10
5. Question 4 Pg-11-19
6. Question 5 Pg-20
7. Question 6 Pg-21-24
8. Question 7 Pg-25-27
9. Question 8 Pg-28
10. Question 9 Pg-29-30
11. Question 10 Pg-31
Problem:

For this particular assignment, the data of different types of wine sales in the 20th century is to be analyzed. Both of
these data are from the same company but of different wines. As an analyst at ABC Estate Wines, you are tasked to
analyze and forecast Wine Sales in the 20th century.
Approach:

• We have the month-wise sales data of two types of wines (Rose and Sparkling) for 15 years.

• We are going to read the data from a time-series perspective and apply different exponential smoothing models, ARIMA and SARIMA
models (auto/manual) to forecast future wine sales.

• We are also going to analyze results from the different models and build a final model to predict future sales using the model that
provides the best results.
1. Read the data as an appropriate Time Series data and plot the data.
➢ The two datasets with Sales data of two types of wine “Rose” and “Sparkling” are imported into the Jupyter notebook using pd.read_csv command. We also
parse the Year and month information to Datetime datatype using parse_dates function

➢ The sales data of the two data sets are plotted as follows

Rose Year wise Sales - Graph Sparkling Year wise Sales - Graph

Observations on Rose Year-wise sales data – Observations on Sparkling Year-wise sales data –

✓ We notice that there is a decreasing trend in the initial years which ✓ We notice that there is not much trend in the plot
stabilizes after few years and again shows a decreasing trend
✓ The seasonality seems to have a pattern on yearly basis
✓ We also observe seasonality in the data trend and pattern seem to
repeat on yearly basis
2. Perform appropriate Exploratory Data Analysis to understand the data and also perform decomposition.

We have 187 data points in Rose wine data with two missing values, hence, 185 values, and 187 data points in Sparkling wine data. The basic measures of
descriptive statistics tell us how Sales have varied across the years. However, for this measure of descriptive statistics, we have averaged over the whole data
without taking the time component into account hence we should look at the box plots year-wise and month-wise.

Rose wine data – Descriptive statistics


Measure count mean std min 25% 50% 75% max
Value 185 90 39 28 63 86 112 267

Rose Year wise Sales – Box Plot Rose Month wise Sales – Box Plot

✓ As observed in the Time Series plot, the year-wise boxplots over here also indicate a measure of a downward trend.

✓ Also, we see that the sales of Rose wine have some outliers for certain years.

✓ December seems to have the highest sales of Rose wine and there are also outliers in June, July, August, and September months
Sparkling wine data – Descriptive statistics
Measure count mean std min 25% 50% 75% max
Value 187 2,402 1,295 1,070 1,605 1,874 2,549 7,242

Sparkling Year wise Sales – Box Plot Sparkling Month wise Sales – Box Plot

✓ As observed in the Time Series plot, the boxplots over here also do not indicate trend

✓ Also, we see that the sales of Sparkling wine have some outliers for almost all years except 1995

✓ We also observe December month has the highest sales value for Sparkling wine
✓ We observe from the above Line plots of Year/month wise sales data of Rose and Sparkling wine that December month has the highest Sales and
January, February and March months show lower sales values.
Rose Sales – Month, Cumulative % and Month on Month % Sales plots Sparkling Sales – Month, Cumulative % and Month on Month % Sales plots

✓ The median values keep increasing from January to December months. The ✓ The median values is stable from January to June and has an increasing trend from
July to December. January to December months. The Average Sales value does not
Average Sales value also shows a decreasing trend
show a trend
Rose Sales – Additive decomposition Sparkling Sales – Additive decomposition

Rose Sales – Multiplicative decomposition Sparkling Sales – Multiplicative decomposition

✓ There were two missing values which were interpolated using Linear
✓ For additive we see the residual values are around 0 and for
method
Multiplicative model we see the residual are around 1

✓ For additive we see the residual values are around 0 and for
Multiplicative model we see the residual are around 1
3. Split the data into training and test. The test data should start in 1991.
Rose Sales Train and Test data Sparkling Sales Train and Test data
✓ The Train data of Rose wine sales has been split for data up to 1990 ✓ The Train data of Sparkling wine sales has been split for data up to 1990
and has 132 data points and has 132 data points

✓ The Test data of Rose wine sales has been split for data from 1991 and ✓ The Test data of Sparkling wine sales has been split for data from 1991
has 55 data points and has 55 data points

✓ From our train-test split we are predicting the future sales as ✓ From our train-test split we are predicting the future sales as compared
compared to the past years. to the past years.

YearMonth Rose YearMonth Rose YearMonth Sparkling YearMonth Sparkling


First rows of Train Data - Rose First rows of Test Data - Rose First rows of Train Data - Sparkling First rows of Test Data - Sparkling
01-01-1980 112 01-01-1991 54 01-01-1980 1686 01-01-1991 1902
01-02-1980 118 01-02-1991 55 01-02-1980 1591 01-02-1991 2049
01-03-1980 129 01-03-1991 66 01-03-1980 2304 01-03-1991 1874
01-04-1980 99 01-04-1991 65 01-04-1980 1712 01-04-1991 1279
01-05-1980 116 01-05-1991 60 01-05-1980 1471 01-05-1991 1432
Last rows of Train Data - Rose Last rows of Test Data - Rose Last rows of Train Data - Sparkling Last rows of Test Data - Sparkling
01-08-1990 70 01-03-1995 45 01-08-1990 1605 01-03-1995 1897
01-09-1990 83 01-04-1995 52 01-09-1990 2424 01-04-1995 1862
01-10-1990 65 01-05-1995 28 01-10-1990 3116 01-05-1995 1670
01-11-1990 110 01-06-1995 40 01-11-1990 4286 01-06-1995 1688
01-12-1990 132 01-07-1995 62 01-12-1990 6047 01-07-1995 2031

Train Data Shape - Rose Train Data Shape - Sparkling


(132, 1) (132, 1)
Test Data Shape - Rose Test Data Shape - Sparkling
(55, 1) (55, 1)
4. Build various exponential smoothing models on the training data and evaluate the model using RMSE on the test data. Other
models such as regression, naïve forecast models, simple average models etc. should also be built on the training data and
check the performance on the test data using RMSE. Please do try to build as many models as possible and as many iterations
of models as possible with different parameters.
Model 1 - Rose Model 1 - Sparkling
Simple Exponential Smoothing - Rose Simple Exponential Smoothing - Sparkling

While trying to forecast Model Evaluation for @ 𝛼 = 0.995 : Simple Exponential


While trying to forecast Model Evaluation for @ 𝛼 = 0.995: Simple Exponential
Smoothing, We get an RMSE score of 1275.08 for the Test data. We shall also proceed to
Smoothing, we get an RMSE score of 36.80 for the Test data. We shall also proceed to
find the scores with different alpha levels.
find the scores with different alpha levels.

Model spec Test RMSE Rose Model spec - Sparkling Test RMSE - Sparkling
Alpha=0.995,SimpleExponentialSmoothing 36.80 Alpha=0.995,SimpleExponentialSmoothing 1275.08
Model 2 - Rose Model 2 - Sparkling
Simple Exponential Smoothing Tuning- Rose Simple Exponential Smoothing Tuning - Sparkling
While trying to forecast Model Evaluation at 𝛼 values ranging from 0.3 to 0.9 we get While trying to forecast Model Evaluation at 𝛼 values ranging from 0.3 to 0.9 we get
the Test RMSE scores as in the below table ranging from 47.50 to 77.14, Lowest RMSE the Test RMSE scores as in the below table ranging from 1935.51 to 3686.79, Lowest
being 47.50 at 𝛼 0.30. RMSE being 1935.51 at 𝛼 0.30.
Sorted RMSE values Sorted RMSE values
Alpha Values Train RMSE Rose Test RMSE Rose Model spec - Rose Test RMSE - Rose Alpha Values Train RMSE Sparkling Test RMSE Sparkling Model spec - Sparkling Test RMSE - Sparkling
0.30 32.47 47.50 0.30 1359.51 1935.51 Alpha=0.3,SimpleExponentialSmoothing 1935.51
Alpha=0.3,SimpleExponentialSmoothing 47.50 0.40 1352.59 2311.92
0.40 33.04 53.77
0.50 33.68 59.64 0.50 1344.00 2666.35
0.60 34.44 64.97 0.60 1338.81 2979.20
0.70 35.32 69.70 0.70 1338.84 3249.94
0.80 36.33 73.77 0.80 1344.46 3483.80
0.90 37.48 77.14 0.90 1355.72 3686.79

We shall proceed to check the Test scores of Double Exponential Smoothing model We shall proceed to check the Test scores of Double Exponential Smoothing model
Model 3 - Rose Model 3 - Sparkling
Double Exponential Smoothing – Rose (Level and Trend) Double Exponential Smoothing – Sparkling (Level and Trend)
We ran the Double Exponential Smoothing for different Smoothing level (Alpha) and We ran the Double Exponential Smoothing for different Smoothing level (Alpha) and
Smoothing slope/trend(Beta) values ranging from 0.3 to 1.0 and we got the least five RMSE Smoothing slope/trend(Beta) values ranging from 0.3 to 1.0 and we got the least five
scores ranging from 265.57 to 439.30 (listed in the table below) at Alpha values 0.30- RMSE scores ranging from 1919.21.57 to 1955.18 (listed in the table below) at Alpha
0.60/Beta values 0.30-0.40 values 0.60-0.80/Beta values 0.90-1.00
Sorted RMSE values Least 5 Sorted RMSE values Least 5
Alpha Values Rose Beta Values Rose Train RMSE Rose Test RMSE Rose Alpha Values Sparkling Beta Values Sparkling Train RMSE Sparkling Test RMSE Sparkling
0.30 0.30 35.94 265.57 0.70 1.00 2638.42 1919.21
0.40 0.30 36.75 339.31 0.60 1.00 2638.76 1931.82
0.30 0.40 37.39 358.75 0.80 1.00 2637.87 1933.30
0.50 0.30 37.43 394.27 0.70 0.90 2638.43 1951.17
0.60 0.30 38.35 439.30 0.80 0.90 2637.99 1955.18
Model spec - Rose Test RMSE - Rose Model spec - Sparkling Test RMSE - Sparkling
Alpha=0.3,Beta=0.3,DoubleExponentialSmoothing 265.57 Alpha=0.3,Beta=0.3,DoubleExponentialSmoothing 1919.21

We shall proceed to check the Test scores of Triple Exponential Smoothing model We shall proceed to check the Test scores of Triple Exponential Smoothing model
Model 4 - Rose Model 4 - Sparkling
Triple Exponential Smoothing Rose (Level, Trend and Seasonality) Triple Exponential Smoothing Sparkling (Level, Trend and Seasonality)
We ran the Triple Exponential Smoothing for Rose sales with auto fit parameters We ran the Triple Exponential Smoothing for Sparkling sales with auto fit parameters
which resulted in Smoothing level(Alpha) 0.106, Smoothing slope/trend(Beta) 0.048 which resulted in Smoothing level(Alpha) 0.154, Smoothing slope/trend(Beta)
and Smoothing seasonality(Gamma) 0.0 and we got Test RMSE score 17.37. The auto 2.6836273360597693e-21 and Smoothing seasonality(Gamma) 0.371 and we got Test
fit model parameters are listed as follows: RMSE score 383.19. The auto fit model parameters are listed as follows:
Rose Triple Exponential Smoothing Autofit Parameters: Sparkling Triple Exponential Smoothing Autofit Parameters:
smoothing_level: 0.106 smoothing_level: 0.15422215345902165,
smoothing_slope: 0.048 smoothing_slope: 2.6836273360597693e-21,
smoothing_seasonal: 0.0, smoothing_seasonal: 0.37132238239976756,
damping_slope: nan, damping_slope: nan,
initial_level: 76.65565108239777, initial_level: 1639.9993318334934,
initial_slope: 0.0,
initial_slope: 4.848983218641197,
initial_seasons: array([1.47550257, 1.65927135,
initial_seasons: array([1.00842014, 0.96898448, 1.24179403,
1.80572621, 1.58888812, 1.77822689, 1.92604353,
1.1320575 , 0.93981009, 0.93811201, 1.2245818 , 1.54428852,
2.11649443, 2.25135182, 2.11690561, 2.08112817,
2.4092726 , 3.30448096]), 1.27336069, 1.6319816 ,2.48292921, 3.11861884]),
use_boxcox: False, use_boxcox: False,
lamda: None, lamda: None,
remove_bias: False remove_bias: False
Model spec - Rose Test RMSE - Rose Model spec - Sparkling RMSE - Sparkling

Alpha=0.106,Beta=0.048,Gamma=0.0,TripleExponentialSmoothing 17.37 Alpha=0.154,Beta=2.6836273360597693e-21,Gamma=0.371,TripleExponentialSmoothing 353.91

We shall check the Test scores of Triple Exponential Smoothing model at different Alpha, We shall check the Test scores of Triple Exponential Smoothing model at different Alpha,
Beta, Gamma levels Beta, Gamma levels
Model 5 - Rose Model 5 - Sparkling
Triple Exponential Smoothing Tuning – Rose (Level, Trend and Seasonality) Triple Exponential Smoothing Tuning – Sparkling (Level, Trend and Seasonality)
We ran the Triple Exponential Smoothing for different Smoothing level (Alpha), Smoothing We ran the Triple Exponential Smoothing for different Smoothing level (Alpha), Smoothing
slope/trend(Beta) and Smoothing seasonality(Gamma) values ranging from 0.3 to 1.0 and slope/trend(Beta) and Smoothing seasonality(Gamma) values ranging from 0.3 to 1.0 and we
we got the least five RMSE scores ranging from 10.95 to 16.72 (listed in the table below) at got the least five RMSE scores ranging from 392.79 to 542.18 (listed in the table below) at
Alpha values 0.30-0.50/Beta values 0.30-0.50/Gamma values 0.30-0.80 Alpha values 0.30-0.70/Beta values 0.30-0.80/Gamma values 0.30-0.60

Alpha Values Beta Values Gamma Values Train RMSE Rose Test RMSE Rose Alpha Values Beta Values Gamma Values Train RMSE Sparkling Test RMSE Sparkling
0.30 0.40 0.30 28.11 10.95
0.30 0.30 0.30 404.51 392.79
0.30 0.30 0.40 27.40 11.20
0.30 0.40 0.30 424.83 410.85
0.40 0.30 0.80 32.60 12.62
0.40 0.30 0.40 435.55 421.41
0.30 0.50 0.30 29.09 14.41
0.70 0.80 0.30 700.32 518.19
0.50 0.30 0.60 32.14 16.72
0.50 0.30 0.50 498.24 542.18

Model spec - Rose Test RMSE - Rose


Model spec - Sparkling Test RMSE - Sparkling
Alpha=0.3,Beta=0.4,Gamma=0.3,TripleExponentialSmoothing 10.95
Alpha=0.3,Beta=0.3,Gamma=0.3,TripleExponentialSmoothing 392.79

We shall now check the Test scores of Linear Regression model We shall now check the Test scores of Linear Regression model
Model 6 - Rose Model 6 - Sparkling
Linear Regression - Rose Linear Regression - Sparkling
In Linear regression, we regress the ‘Rose' sales against the order of the occurrence. Hence In Linear regression, we regress the ‘Sparkling’ sales against the order of the occurrence.
we modified our training data before fitting it into a linear regression by generating Hence we modified our training data before fitting it into a linear regression by generating
numerical time instance order for train and test data. We then ran the Linear regression numerical time instance order for train and test data. We then ran the Linear regression
model and got Test RMSE score 2372.45 model and got Test RMSE score 1275.87

Model spec - Rose Test RMSE - Rose Model spec - Sparkling Test RMSE - Sparkling

RegressionOnTime 2372.45 RegressionOnTime 1275.87

We shall now check the Test scores of Naive model


We shall now check the Test scores of Naive model
Model 7 - Rose Model 7 - Sparkling
Naive Model- Rose Naive Model- Sparkling
We ran the Naïve model for Rose sales and got the Test RMSE score – 79.72. We observe We ran the Naïve model for Sparkling sales and got the Test RMSE score – 3863.28. We
that the green line in the Chart which shows the Naïve forecast plotting is a straight line observe that the green line in the Chart which shows the Naïve forecast plotting is a
given the Naïve models approach where Sales for tomorrow is the same as today and it straight line given the Naïve models approach where Sales for tomorrow is the same as
applies to all the future periods today and it applies to all the future periods

Model spec - Rose Test RMSE - Rose Model spec - Sparkling Test RMSE - Sparkling
Naïve Model 79.72 Naïve Model 3864.28

We shall now check the Test scores of Simple Average Model We shall now check the Test scores of Simple Average Model
Model 8 - Rose Model 8 - Sparkling
Simple Average Model - Rose Simple Average Model - Sparkling
We ran the Simple Average model for Rose sales and got the Test RMSE score – 53.46. We We ran the Simple Average model for Sparkling sales and got the Test RMSE score –
observe that the green line in the Chart which shows the Simple Average forecast plotting 1275.08. We observe that the green line in the Chart which shows the Simple Average
is a straight line given the Simple Average models approach where we use the average forecast plotting is a straight line given the Simple Average models approach where we
Sales value to forecast future Sales use the average Sales value to forecast future Sales

Model spec - Rose Test RMSE - Rose Model spec - Sparkling Test RMSE - Sparkling
SimpleAverage Model 53.46 SimpleAverage Model 1275.08

We shall now check the Test scores of Moving Average Model We shall now check the Test scores of Moving Average Model
Model 9 - Rose Model 9 - Sparkling
Moving Average Model - Rose Moving Average Model - Sparkling
We ran the Moving Average model for Rose Sale. we computed the moving averages for 2, We ran the Moving Average model for Sparlking Sale. we computed the moving averages
4, 6 and 9 point intervals. The Test RMSE scores for each of the moving average is listed in for 2, 4, 6 and 9 point intervals. The Test RMSE scores for each of the moving average is
the table below. Lowest score is in 2 period moving average at 11.80 listed in the table below. Lowest score is in 2 period moving average at 811.18

Model spec - Rose Test RMSE - Rose Model spec - Sparkling Test RMSE - Sparkling
2pointTrailingMovingAverage 11.80 2pointTrailingMovingAverage 811.18
4pointTrailingMovingAverage 15.37 4pointTrailingMovingAverage 1184.21
6pointTrailingMovingAverage 15.86 6pointTrailingMovingAverage 1337.20
9pointTrailingMovingAverage 16.34 9pointTrailingMovingAverage 1422.65
5. Check for the stationarity of the data on which the model is being built on using appropriate statistical tests and also
mention the hypothesis for the statistical test. If the data is found to be non-stationary, take appropriate steps to make it
stationary. Check the new data for stationarity and comment. Note: Stationarity should be checked at alpha = 0.05.
Stationarity Test - Rose Stationarity Test - Sparkling

We check the Stationarity of the Rose sales data at alpha 0.05 and observe from the We check the Stationarity of the Sparkling sales data at alpha 0.05 and observe from the
following result table that P value (0.34) is greater than alpha value. Hence we fail to following result table that P value (0.60) is greater than alpha value. Hence we fail to
reject the null hypothesis that the data is not stationary reject the null hypothesis that the data is not stationary

We then applied a difference of order 1 and checked for Stationarity and got the following
result where p-value(0.00) is lesser than the alpha value. Null hypsis is hence rejectedand We then applied a difference of order 1 and checked for Stationarity and got the following
Data is stationary result where p-value(0.00) is lesser than the alpha value. Null hypothesis is hence rejected
and Data is stationary
6. Build an automated version of the ARIMA/SARIMA model in which the parameters are selected using the lowest Akaike
Information Criteria (AIC) on the training data and evaluate this model on the test data using RMSE.
Automated ARIMA - Rose Automated ARIMA - Sparkling
We checked the Stationarity of the train data of Rose sales data at alpha 0.05 and observe We checked the Stationarity of the train data of Sparkling sales data at alpha 0.05 and
from the result table that data was not stationary hence we took a difference of order 1 observe from the result table that data was not stationary hence we took a difference of
and found the train data was stationary then. order 1 and found the train data was stationary then.

Difference of order 1 Difference of order 1

We see there is some seasonality in data We see there is some seasonality in data
Automated ARIMA - Rose Automated ARIMA - Sparkling

We ran the automated ARIMA model for Rose Sales and sorted the AIC values output from We ran the automated ARIMA model for Sparkling Sales and sorted the AIC values output
lowest to highest. We then proceeded to build the ARIMA model with the lowest Akaike from lowest to highest. We then proceeded to build the ARIMA model with the lowest
Information Criteria and got the Test RMSE score 15.62 Akaike Information Criteria and got the Test RMSE score 1374.61

Sorted AIC values Least 5


param AIC Model spec - Rose Test RMSE - Rose Sorted AIC values Least 5
(0, 1, 2) 1276.84 param AIC
ARIMA(0,1,2) 15.62 Model spec - Sparkling Test RMSE - Sparkling
(1, 1, 2) 1277.36 (2, 1, 2) 2210.62
(1, 1, 1) 1277.78 ARIMA(2,1,2) 1374.61
(2, 1, 1) 2232.36
(2, 1, 1) 1279.05 (0, 1, 2) 2232.78
(2, 1, 2) 1279.30 (1, 1, 2) 2233.60
(1, 1, 1) 2235.01

We then build an auto SARIMA model as it would be appropriate given data seems We then build an auto SARIMA model as it would be appropriate given data seems
seasonal seasonal
Automated SARIMA - Rose Automated SARIMA - Sparkling
We observe the ACF plot for Rose Sales and observe seasonality at intervals 12, hence we We observe the ACF plot for Sparkling Sales and observe seasonality at intervals 12, hence
run the Automated SARIMA models at seasonality 12 and sorted the AIC values output we run the Automated SARIMA models at seasonality 12 and sorted the AIC values output
from lowest to highest. We then proceeded to build the SARIMA model with the lowest from lowest to highest. We then proceeded to build the SARIMA model with the lowest
Akaike Information Criteria and got the Test RMSE score 26.93 Akaike Information Criteria and got the Test RMSE score 528.60

Sorted AIC values Least 5 Sorted AIC values Least 5


param seasonal AIC Model spec - Rose Test RMSE - Rose param seasonal AIC Model spec - Sparkling Test RMSE - Sparkling
(0, 1, 2) (2, 0, 2, 12) 887.94 SARIMA(0,1,2)(2,0,2,12) 26.93 (1, 1, 2) (1, 0, 2, 12) 1555.58 SARIMA(1,1,2)(1,0,2,12) 528.60
(1, 1, 2) (2, 0, 2, 12) 889.87 (1, 1, 2) (2, 0, 2, 12) 1556.08
(2, 1, 2) (2, 0, 2, 12) 890.67 (0, 1, 2) (2, 0, 2, 12) 1557.12
(2, 1, 1) (2, 0, 0, 12) 896.52 (0, 1, 2) (1, 0, 2, 12) 1557.16
(2, 1, 2) (2, 0, 0, 12) 897.35 (2, 1, 2) (2, 0, 2, 12) 1557.69
Automated SARIMA - Rose Automated SARIMA - Sparkling

Inference from Model diagnostics confirms that the model residuals are normally Inference from Model diagnostics confirms that the model residuals are normally
distributed distributed

✓ Standardized residual – Do not display any obvious seasonality ✓ Standardized residual – Do not display any obvious seasonality

✓ Histogram plus estimated density - The KDE plot of the residuals is similar with the ✓ Histogram plus estimated density - The KDE plot of the residuals is similar with the
normal distribution, hence the model residuals are normally distributed based normal distribution, hence the model residuals are normally distributed based

✓ Normal Q-Q plot – There is an ordered distribution of residuals (blue dots) following the ✓ Normal Q-Q plot – There is an ordered distribution of residuals (blue dots) following the
linear trend of the samples taken from a standard normal distribution with N(0, 1) linear trend of the samples taken from a standard normal distribution with N(0, 1)

✓ Correlogram – The time series residuals have low correlation with lagged versions of ✓ Correlogram – The time series residuals have low correlation with lagged versions of
itself itself
7. Build ARIMA/SARIMA models based on the cut-off points of ACF and PACF on the training data and evaluate this model on
the test data using RMSE.
Manual ARIMA - Rose Manual ARIMA - Sparkling

We then built manual ARIMA model for Rose Sales based on the ACF and PACF plots. We then built manual ARIMA model for Sparkling Sales based on the ACF and PACF plots.
Hence we chose the AR parameter p value 5, Moving average parameter q value 2 and d Hence we chose the AR parameter p value 1, Moving average parameter q value 2 and d
value 1 based on the below plots. The Test RMSE score at this p,d,q value of ARIMA model value 1 based on the below plots. The Test RMSE score at this p,d,q value of ARIMA model
is 15.43 is 1436.73

Model spec - Rose Test RMSE - Rose Model spec - Sparkling Test RMSE - Sparkling
ARIMA(5,1,2) 15.43 ARIMA(1,1,2) 1436.73
Manual SARIMA - Rose Manual SARIMA - Sparkling
We then built manual SARIMA model for Rose Sales based on the ACF and PACF plots. We then built manual SARIMA model for Sparking Sales based on the ACF and PACF plots.
Hence we chose the AR parameter p value 5, Moving average parameter q value 2 and d Hence we chose the AR parameter p value 1, Moving average parameter q value 2 and d
value 1 based on the below plots. We then derive the Seasonal parameters based on the value 1 based on the below plots. We then derive the Seasonal parameters based on the
seasonal cut-offs on the 6 months series plot and chose Seasonal AR parameter P value 2, seasonal cut-offs on the 6 months series plot and chose Seasonal AR parameter P value 1,
Seasonal Moving average parameter Q value 9, differencing 1 and Seasonal period 6 Seasonal Moving average parameter Q value 2, differencing 1 and Seasonal period 6

The Test RMSE score at this (5,1,2)(2,1,9,6) value of SARIMA model is 25.23 The Test RMSE score at this (5,1,2)(2,1,9,6) value of SARIMA model is 722.86

Model spec - Rose Test RMSE - Rose Model spec - Sparkling Test RMSE - Sparkling
SARIMA(5, 1, 2)(2, 1, 9, 6) 25.23 SARIMA(1, 1, 2)(1, 1, 3, 6) 722.86
Manual SARIMA - Rose Manual SARIMA - Sparkling

Model diagnostics confirms that the model residuals are normally distributed. Standardized Model diagnostics confirms that the model residuals are normally distributed. Standardized
residual do not display any obvious seasonality, Histogram plus estimated density - The KDE plot residual do not display any obvious seasonality, Histogram plus estimated density - The KDE plot
has normal distribution , Normal Q-Q plot – There is an ordered distribution of residuals (blue has normal distribution , Normal Q-Q plot – There is an ordered distribution of residuals (blue
dots) following the linear trend Correlogram – The time series residuals have low correlation with dots) following the linear trend Correlogram – The time series residuals have low correlation with
lagged versions of itself lagged versions of itself
8. Build a table (create a data frame) with all the models built along with their corresponding parameters and the respective
RMSE values on the test data.
Model Results - Rose Model Results - Sparkling

We have consolidated the test results from the various models built in the forecasting process of We have consolidated the test results from the various models built in the forecasting process of
the future Rose wine sales and get the following Test RMSE scores sorted in order of lowest to the future Sparkling wine sales and get the following Test RMSE scores sorted in order of lowest
highest values. to highest values.

The lowest score 10.95. It was obtained from the Triple exponential smoothing model which was The lowest score 383.19 is from the Triple exponential smoothing model which was run based on
run based on different Smoothing level (Alpha), Smoothing slope/trend(Beta) and Smoothing auto fit parameters which were Smoothing level(Alpha) 0.154, Smoothing slope/trend(Beta)
seasonality(Gamma) values ranging from 0.3 to 1.0 and the parameters that gave the lowest score 2.6836273360597693e-21 and Smoothing seasonality(Gamma) 0.371
are Alpha – 0.3, Beta – 0.4 and Gamma – 0.3
9. Based on the model-building exercise, build the most optimum model(s) on the complete data and predict 12 months into
the future with appropriate confidence intervals/bands.
Final Full Model - Rose Final Full Model - Sparkling
We observed from the RMSE scores that Triple Exponential would work better for the We observed from the RMSE scores that Triple Exponential would work better for the
Rose Sales data where we had seasonality and Trend. Sparkling Sales data where we had seasonality and Trend.

We see that the best model is the Triple Exponential Smoothing (Holt-winter method) We see that the best model is the Triple Exponential Smoothing (Holt-winter method)
with parameters α = 0.3, β = 0.4 and γ = 0.3 with parameters α = 0.154, β = 0. 2.6836273360597693e-21 and γ = 0.371

Prediction Plot –Rose Sales Prediction Plot –Sparkling Sales


Final Full Model - Rose Final Full Model - Sparkling

The upper and lower confidence bands were calculated at 95% confidence interval The upper and lower confidence bands were calculated at 95% confidence interval

Prediction Plot –Rose Sales at 95% Confidence Interval Prediction Plot –Sparkling Sales at 95% Confidence Interval

Model spec - Rose RMSE - Rose Model spec - Sparkling RMSE - Sparkling
Alpha=0.3,Beta=0.4,Gamma=0.3,TripleExponentialSmoothing 24.27 Alpha=0.154,Beta=2.6836273360597693e-21,Gamma=0.371,TripleExponentialSmoothing 353.91

The full model RMSE score of Rose sales forecast using Triple Exponential Smoothing The full model RMSE score of Sparkling sales forecast using Triple Exponential Smoothing
with parameters α = 0.3, β = 0.4 and γ = 0.3 is at 24.27 with parameters α = 0.154, β = 0. 2.6836273360597693e-21 and γ = 0.371 is at 353.91
10. Comment on the model thus built and report your findings and suggest the measures that the company should be taking
for future sales.
Inference and Recommendations – Rose Sales Model Inference and Recommendations – Sparkling Sales Model

➢ Rose sales shows a decrease in trend compared to the previous years ➢ Sparkling sales shows stabilized values and not much trend compared to
previous years
➢ December month shows the highest Sales across the year while the value
has come down through the years from 1980-1994 ➢ December month shows the highest Sales across the years from 1980-1994

➢ The models are built considering the Trend and Seasonality in to account ➢ The models are built considering the Trend and Seasonality in to account
and we see from the output plot that the future prediction is in line with and we see from the output plot that the future prediction is in line with
the trend and seasonality in the previous years the trend and seasonality in the previous years

➢ The Sales of Rose wine is seasonal and also has trend, hence the company ➢ The Sales of Sparling wine is seasonal; hence the company cannot have the
cannot have the same stock through the year. The predictions would help same stock through the year. The predictions would help here to plan the
here to plan the Stock need basis the forecasted sales. Stock need basis the forecasted sales.

➢ The company should use the prediction results and capitalize on the high ➢ The company should use the prediction results and capitalize on the high
demand seasons and ensure to source and supply the high demand demand seasons and ensure to source and supply the high demand

➢ The company should use the prediction results to plan the low demand ➢ The company should use the prediction results to plan the low demand
seasons to stock as per the demand seasons to stock as per the demand
---------------------------------------------------------------------------------------- The End – Thankyou -----------------------------------------------------------

You might also like