
Time Series – Rose

Sakchi Saraf
PGP-DSBA Online
May’22
Date: 11/12/22

Proprietary content. ©Great Learning. All Rights Reserved. Unauthorized use or distribution prohibited
Table of Contents

Problem 1
Executive summary
Introduction
Data description
Sample of the dataset
Exploratory data analysis
Data type
Describe
Boxplot
Line graph
Model analysis
Train and test data
Linear Regression Model
Naïve Approach Model
Simple Average Model
Moving Average Model
Simple Exponential Smoothing
Double Exponential Smoothing
Triple Exponential Smoothing
Stationarity check
ACF & PACF plot
ARIMA Model
SARIMA Model
Comparison of the models
Forecast
Model findings
THE END!

Sl No. Table name
1 Sample of the dataset
2 Describe
3 Decomposition
4 Comparison table
5 ARIMA with AIC score
6 ARIMA determined with graph
7 SARIMA table with AIC score
8 SARIMA determined with graph
9 All model comparison table
10 Forecast table

Sl No. Graph name
1 Boxplot – Yearly & Monthly
2 Line graph – Dataset
3 Line graph – Monthly sale
4 Line graph – Accumulated sale
5 Line graph – Change in sale
6 Line graph – Decomposition
7 Line graph – Train & test
8 Line graph – Linear Regression
9 Line graph – Naïve Approach
10 Line graph – Simple Average
11 Line graph – Moving Average for different points
12 Line graph – 2-point Moving Average
13 Line graph – Simple Exponential Smoothing
14 Line graph – Double Exponential Smoothing
15 Line graph – Triple Exponential Smoothing
16 Line graph – Comparison of models
17 Line graph – Stationarity check on dataset
18 Line graph – Stationarity check on train data
19 ACF & PACF graph
20 SARIMA diagnostics
21 Forecast graph

Problem 1
For this assignment, sales data for different types of wine in the 20th century are to be analysed.
Both datasets come from the same company but cover different wines. As an analyst at ABC Estate Wines,
you are tasked with analysing and forecasting wine sales in the 20th century.

Introduction
The purpose of this exercise is to analyse and forecast the sale of both wines for the 20th century. We
will apply different forecasting techniques to find the best model and then use it to forecast future sales.
The data consists of 15 years and 7 months of monthly sales.

Data Description
1. YearMonth – states the month and year for the particular sale
2. Rose – states the sale of Rose wine for the particular month

Sample of the dataset

Table 1: Head of the dataset and tail of the dataset

The first table shows the first 5 data entries of the dataset and the second table shows the last 5 data entries
of the dataset. The column “YearMonth” states the timeline and the column “Rose” states the sale of Rose
wine for the timeline.

Exploratory Data Analysis


Data types in the data frame:

RangeIndex: 187 entries, 0 to 186
Data columns (total 2 columns):
#  Column     Non-Null Count  Dtype
0  YearMonth  187 non-null    object
1  Rose       185 non-null    int64

From the above information we can infer the following:

• There are 187 rows and 2 columns in the dataset, i.e. 15 years and 7 months of monthly data (January 1980 to July 1995).
• The dataset has 1 numerical variable (Rose) and 1 categorical variable (YearMonth).
• Forecasting needs a time-indexed numerical series, so the YearMonth column (object type) was converted
to a date-time index.
• There are 2 missing values in the Rose column, which will be replaced with a value.
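A minimal standard-library sketch of these preparation steps follows. It assumes YearMonth strings of the form '1980-01' and fills the two gaps by linear interpolation between neighbouring months. The report only says that the missing values are replaced with a value, so the interpolation choice and all helper names here are illustrative assumptions.

```python
from datetime import date

def parse_year_month(s: str) -> date:
    """Convert an assumed 'YYYY-MM' string into the first day of that month."""
    year, month = s.split("-")
    return date(int(year), int(month), 1)

def month_index(start: date, n: int) -> list:
    """Build n consecutive month-start dates beginning at `start`."""
    out, y, m = [], start.year, start.month
    for _ in range(n):
        out.append(date(y, m, 1))
        m += 1
        if m > 12:
            y, m = y + 1, 1
    return out

def interpolate_missing(values: list) -> list:
    """Fill None entries linearly between the nearest known neighbours.
    Assumes the first and last values are present (hypothetical imputation choice)."""
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        gap = right - left
        for k in range(1, gap):
            filled[left + k] = filled[left] + (filled[right] - filled[left]) * k / gap
    return filled

# 187 monthly rows starting in January 1980 end in July 1995: 15 years and 7 months.
index = month_index(date(1980, 1, 1), 187)
years, months = divmod(len(index), 12)
print(index[-1], years, months)  # 1995-07-01 15 7
```

For example, `interpolate_missing([10, None, 30])` returns `[10, 20.0, 30]`.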

Describe
From the above table we can see that:
• There are no missing values (the two gaps noted above having been replaced) and no invalid entries.
• There is a large difference between the minimum and the maximum value.
• The mean and the median differ noticeably, reflecting the spread of the data.

Boxplot

The first boxplot shows the yearly sale of Rose wine from 1980 to 1995 and the second boxplot shows the
monthly sale of Rose wine from January to December. Observations:
1. Outliers – Almost every year has outliers except 1995. Since 1995 contains only seven months of data,
the outliers in the other years likely come from the later months of the year. In the monthly boxplot,
5 months have outliers: June, July, August, September and December.
2. Inconsistency – Among the years, 1980 shows the greatest spread, followed by 1987 and 1982. Among
the months, July is the most inconsistent.
3. Maximum sale – Yearly sales have decreased over time. Within a year, sales rise slightly in the second
half, with quarter 4 recording the highest sales and December being the highest-selling month.

Line graph

The above graph shows the monthly sale for the whole period from 1980 to 1995. It shows a decreasing
trend along with some seasonality.

The above graph shows the sales for each month over the period from 1980 to 1995. We can infer the
following about the monthly sale of Rose wine:
1. December has the highest sales of all months, although they decline over time.
2. Sales in every month show a decreasing trend.

The above line graph shows the cumulative distribution of monthly sales. The top 20% of monthly sales
values lie approximately between 112 and 260.
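The "top 20%" reading corresponds to the values above the 80th percentile of the empirical cumulative distribution. Below is a short standard-library sketch of how that range could be read off. The function names are hypothetical, and the sample values in the usage line are made up, not the report's data.

```python
def ecdf(values):
    """Return the sorted values and their empirical cumulative proportions (i+1)/n."""
    xs = sorted(values)
    n = len(xs)
    return xs, [(i + 1) / n for i in range(n)]

def top_share_range(values, share=0.2):
    """Smallest and largest value in the top `share` fraction of the sorted data."""
    xs = sorted(values)
    k = max(1, round(len(xs) * share))  # number of observations in the top share
    top = xs[-k:]
    return top[0], top[-1]

print(top_share_range(list(range(1, 11))))  # (9, 10)
```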

The first graph shows the average month-on-month change in sales and the second graph shows the
month-on-month percentage change. The average change shows a seasonal pattern: its magnitude differs from
point to point, but the pattern repeats. The percentage change varies widely in magnitude, with some points
showing a large fall or rise.
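The two series behind these graphs can be sketched as first differences and percentage changes. The report does not say how the changes were averaged, so this sketch (an assumption) only computes the raw month-on-month values.

```python
def diff(values):
    """Month-on-month absolute change: x[t] - x[t-1]."""
    return [b - a for a, b in zip(values, values[1:])]

def pct_change(values):
    """Month-on-month percentage change: 100 * (x[t] - x[t-1]) / x[t-1]."""
    return [100.0 * (b - a) / a for a, b in zip(values, values[1:])]

print(diff([100, 110, 99]))  # [10, -11]
```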

The above graphs show the decomposition of the time series: the first uses an additive model and the
second a multiplicative model. The parts of each decomposition are explained below:
1. Observed – The first line graph shows the monthly sale over the period from 1980 to 1995.
2. Trend – The second line graph shows the trend in the dataset. There is a strong decreasing trend, and
it is the same in both the additive and the multiplicative model.
3. Seasonality – The third line graph shows the seasonal component. The seasonal pattern is almost the
same in both models, but the scale differs: the additive component is in units of sales, while the
multiplicative component is a ratio around 1.

4. Error – The last graph shows the residuals. The additive residuals still show some trend, whereas the
multiplicative residuals show none.
5. Result – From these observations, the multiplicative model captures the trend and seasonality better
than the additive model.

The table below shows how the dataset is decomposed for 07/1980:

Decomposition for 07/1980   Trend    Seasonality   Error    Value
Additive model              147.08   7.16          -36.24   118
Multiplicative model        147.08   1.10          0.73     118

From the above table, both models reconstruct the observed value well, but the decomposition graphs
indicate that the multiplicative model is the better fit.
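As a quick arithmetic check on the table, the components recombine into the observed value. The additive model sums them and the multiplicative model multiplies them. The numbers below are copied from the table. The multiplicative result comes out about 0.1 above 118 only because the components are rounded to two decimals.

```python
# Components for 07/1980 copied from the decomposition table.
trend = 147.08
additive = {"seasonal": 7.16, "resid": -36.24}
multiplicative = {"seasonal": 1.10, "resid": 0.73}

# Additive model: observed = trend + seasonal + residual
additive_value = trend + additive["seasonal"] + additive["resid"]
# Multiplicative model: observed = trend * seasonal * residual
multiplicative_value = trend * multiplicative["seasonal"] * multiplicative["resid"]

print(additive_value, multiplicative_value)
```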

Model Analysis
Train and Test data

The dataset has been divided into two parts: train and test data. Data up to the end of 1990 forms the train
set and data from January 1991 onward forms the test set. Before the split, the YearMonth column was
converted to a date-time index.
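The split described above can be sketched in plain Python. Each observation is tagged with its month-start date, and the list is cut at the year boundary. The helper name is hypothetical and the values are placeholders. With 187 months starting in January 1980, the cut leaves 132 train months (1980 to 1990) and 55 test months (January 1991 to July 1995).

```python
from datetime import date

def split_by_year(dates, values, last_train_year=1990):
    """Split a monthly series into train (year <= last_train_year) and test (later) parts."""
    train = [(d, v) for d, v in zip(dates, values) if d.year <= last_train_year]
    test = [(d, v) for d, v in zip(dates, values) if d.year > last_train_year]
    return train, test

# Month-start dates from 1980-01 to 1995-07 (187 months).
dates = [date(1980 + i // 12, i % 12 + 1, 1) for i in range(187)]
values = list(range(187))  # placeholder values, not the real sales
train, test = split_by_year(dates, values)
print(len(train), len(test))  # 132 55
```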

Below is the line graph of the dataset after the train and test split:

In the above graph, the blue line represents the train data and the orange line represents the test data.

Below are the models fitted on the train data and evaluated on the test data:

1. Linear Regression Model

The above graph shows the train and test data along with the linear regression forecast on the test data
(green line). The linear regression produces a straight-line forecast that follows the declining trend but
cannot capture the seasonal swings in the test data.

The RMSE of the linear regression model is 16.63.
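A minimal sketch of a linear-regression forecast follows. It assumes (as is common in this kind of exercise, though the report does not say) that the only regressor is a time index t = 1, 2, ... The sketch uses ordinary least squares, extends the fitted line over the test horizon, and scores it with RMSE. The function names are illustrative.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def fit_linear_trend(y):
    """Ordinary least squares fit of y = a + b*t with t = 1..n; returns (a, b)."""
    n = len(y)
    t = range(1, n + 1)
    t_mean = (n + 1) / 2
    y_mean = sum(y) / n
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    b = sxy / sxx
    return y_mean - b * t_mean, b

def forecast_linear_trend(train, horizon):
    """Extend the fitted line over the next `horizon` time steps (t = n+1 .. n+horizon)."""
    a, b = fit_linear_trend(train)
    n = len(train)
    return [a + b * t for t in range(n + 1, n + horizon + 1)]

# Usage: forecast = forecast_linear_trend(train_values, len(test_values))
#        score = rmse(test_values, forecast)
```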

2. Naïve Approach Model

The above graph shows the train and test data along with the Naïve Approach forecast on the test data
(green line). The Naïve Approach repeats the last observed train value as a flat line, so it does not track
the test data.

The RMSE of the Naïve Approach model is 78.49.
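The Naïve Approach is simple enough to state in a few lines. Every future step is forecast as the last training observation, which is why the forecast is a flat line. The following is a small standard-library sketch with illustrative names.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def naive_forecast(train, horizon):
    """Repeat the last observed training value for every future step."""
    return [train[-1]] * horizon

print(naive_forecast([5, 7, 9], 2))  # [9, 9]
```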

3. Simple Average Model

The above graph shows the train and test data along with the Simple Average forecast on the test data
(green line). The Simple Average forecast is a flat line at the mean of the train data, so it does not track
the test data.

The RMSE of the Simple Average model is 52.37.
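The Simple Average forecast replaces the last value with the mean of the entire training period. The following is a minimal sketch with illustrative names.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def simple_average_forecast(train, horizon):
    """Forecast every future step as the mean of the whole training set."""
    mean = sum(train) / len(train)
    return [mean] * horizon

print(simple_average_forecast([1, 2, 3, 6], 2))  # [3.0, 3.0]
```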

4. Moving Average Model

The above graph shows the train and test data along with Moving Average forecasts for different window
sizes. Forecasts were made with 2-, 4-, 6- and 9-point Moving Averages, and the 2-point Moving Average
tracks the test data better than the others.

The RMSE of the 2-point Moving Average is 12.16.
The RMSE of the 4-point Moving Average is 15.57.
The RMSE of the 6-point Moving Average is 15.69.
The RMSE of the 9-point Moving Average is 16.16.
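The window comparison can be sketched as follows. The sketch assumes, as is typical in this kind of exercise, that a trailing rolling mean is computed over the full series and only the test-period part is scored. The report does not state this. The function names are illustrative.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def trailing_moving_average(series, window):
    """Mean of the `window` observations ending at each position (a rolling mean).
    Positions with fewer than `window` observations so far get None."""
    out = []
    for i in range(len(series)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(series[i + 1 - window:i + 1]) / window)
    return out

def score_windows(series, n_train, windows=(2, 4, 6, 9)):
    """RMSE of each trailing moving average over the test part (index >= n_train).
    Assumes n_train >= max(windows) - 1, so no test position is None."""
    scores = {}
    for w in windows:
        ma = trailing_moving_average(series, w)
        scores[w] = rmse(series[n_train:], ma[n_train:])
    return scores
```

Each rolling mean here includes the current month's actual value, so it behaves like a smoother rather than a strict out-of-sample forecast. Shifting the window back by one month would give a one-step-ahead forecast instead. Shorter windows react faster to seasonal swings, which is consistent with the 2-point window scoring best.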

The above graph shows the train and test data along with the 2-point Moving Average forecast on the test
data (red line), which follows the test data more closely than the other window sizes.

5. Simple Exponential Smoothing

The above graph shows the train and test data along with the Simple Exponential Smoothing forecasts on the
test data. The green line uses the automatically fitted parameters and the red line uses the alpha value with
the lowest RMSE from a search over different alpha values. The two forecasts are almost identical flat lines
and do not track the test data.

The RMSE of Simple Exponential Smoothing with alpha = 0.099 is 35.93.
The RMSE of Simple Exponential Smoothing with alpha = 0.1 is 35.96.
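Simple Exponential Smoothing keeps a single level that moves toward each new observation by a fraction alpha. The forecast is that final level repeated, which explains the flat lines. The following standard-library sketch includes an alpha grid search by test RMSE, as the red line describes. It initialises the level at the first observation, which is an assumption. Libraries such as statsmodels' `SimpleExpSmoothing` estimate the initial level differently, so the numbers would not match exactly.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def ses_forecast(train, alpha, horizon):
    """Simple exponential smoothing: level = alpha*y + (1 - alpha)*level.
    The level starts at the first observation; the forecast is flat at the final level."""
    level = train[0]
    for y in train[1:]:
        level = alpha * y + (1 - alpha) * level
    return [level] * horizon

def best_alpha(train, test, alphas):
    """Pick the alpha whose flat forecast has the lowest RMSE on the test data."""
    return min(alphas, key=lambda a: rmse(test, ses_forecast(train, a, len(test))))

# Usage: best_alpha(train_values, test_values, [i / 100 for i in range(1, 100)])
```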

6. Double Exponential Smoothing

The above graph shows the train and test data along with the Double Exponential Smoothing forecast on the
test data (green line). It captures the trend to some extent but does not forecast the test data accurately.

The RMSE of the Double Exponential Smoothing model is 36.00.
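Double (Holt's linear) exponential smoothing adds a smoothed trend to the level. Its forecast is therefore a straight line with a slope rather than a flat one. The following is a minimal sketch. The initialisation (level = first value, trend = first difference) is an assumption, and statsmodels' `Holt` class uses its own. The smoothing parameters actually used in the report are not stated.

```python
def holt_forecast(train, alpha, beta, horizon):
    """Holt's linear (double) exponential smoothing.
    level_t = alpha*y_t + (1 - alpha)*(level_{t-1} + trend_{t-1})
    trend_t = beta*(level_t - level_{t-1}) + (1 - beta)*trend_{t-1}
    h-step forecast = level + h*trend.
    Needs at least two training values (assumed initialisation: y0 and y1 - y0)."""
    level, trend = train[0], train[1] - train[0]
    for y in train[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + h * trend for h in range(1, horizon + 1)]

print(holt_forecast([1, 3, 5, 7], 0.5, 0.5, 2))  # [9.0, 11.0]
```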

7. Triple Exponential Smoothing

The above graph shows the train and test data along with the Triple Exponential Smoothing forecast on the
test data (green line). Triple Exponential Smoothing captures both the trend and the seasonality better than
the other models.

The RMSE of the Triple Exponential Smoothing model is 11.91.
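Triple (Holt-Winters) exponential smoothing adds a seasonal factor per calendar month on top of the level and trend. The report does not state the configuration or the smoothing parameters. This sketch assumes an additive trend with multiplicative seasonality (period 12 for monthly data), in line with the decomposition finding. It uses a simple initialisation from the first two seasons. In statsmodels, a comparable model would be `ExponentialSmoothing(..., trend="add", seasonal="mul", seasonal_periods=12)`, which estimates its own initial values.

```python
def holt_winters_mul(train, m, alpha, beta, gamma, horizon):
    """Holt-Winters smoothing with additive trend and multiplicative seasonality.
    m is the seasonal period; at least two full seasons are needed to initialise."""
    if len(train) < 2 * m:
        raise ValueError("need at least two full seasons of training data")
    first, second = train[:m], train[m:2 * m]
    level = sum(first) / m                       # initial level: mean of season 1
    trend = (sum(second) / m - level) / m        # average per-step change between seasons
    season = [y / level for y in first]          # initial seasonal ratios
    for t, y in enumerate(train):
        s = season[t % m]
        prev_level = level
        level = alpha * (y / s) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        season[t % m] = gamma * (y / level) + (1 - gamma) * s
    n = len(train)
    # Step h ahead uses the seasonal factor of position n + h - 1 in the cycle.
    return [(level + h * trend) * season[(n + h - 1) % m]
            for h in range(1, horizon + 1)]
```

On a series that repeats the same seasonal pattern with no trend, the forecast simply reproduces that pattern.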

Comparison of all the above models

The above graph shows the forecasts of all the above models. The table below lists the RMSE of each model:

Model name                      RMSE
Linear Regression               16.62
Naïve Approach                  78.49
Simple Average                  52.37
2-point Moving Average          12.15
Simple Exponential Smoothing    35.93
Double Exponential Smoothing    36.00
Triple Exponential Smoothing    11.90

Checking stationarity of the dataset

The first graph is the stationarity check on the original dataset and the second graph is the stationarity
check after first-order differencing. In the first check the p-value is above 0.05, so the series is not
stationary and was differenced once. After differencing, the p-value is below 0.05, so the differenced
series is stationary.

Checking stationarity of the train dataset

The first graph is the stationarity check on the train dataset and the second graph is the stationarity
check after first-order differencing. In the first check the p-value is above 0.05, so the series was
differenced once. After differencing, the p-value is below 0.05. Hence the differencing order d for the
ARIMA and SARIMA models is 1.
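The differencing step behind d = 1 is a one-liner. The following sketch also shows how to undo it so that forecasts made on the differenced scale can be turned back into sales. The stationarity test itself is not reproduced here. The report does not name it, although the Augmented Dickey-Fuller test (for example statsmodels' `adfuller`) is the usual choice. The function names are illustrative.

```python
def difference(series, lag=1):
    """Return y[t] - y[t-lag]; lag=1 is first-order differencing (d = 1),
    lag=12 would be a seasonal difference for monthly data."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

def undifference(last_value, diffs):
    """Invert a first difference by cumulatively adding diffs onto last_value."""
    out, level = [], last_value
    for d in diffs:
        level += d
        out.append(level)
    return out

print(difference([5, 7, 4, 10]))       # [2, -3, 6]
print(undifference(5, [2, -3, 6]))     # [7, 4, 10]
```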

Checking the ACF & PACF of the dataset and the differenced dataset

The first graph is the ACF plot of the original dataset, the second is the PACF plot of the original dataset,
and the third and fourth are the ACF and PACF plots of the differenced dataset.
From these graphs we read off the following values:
• q = 2, from the ACF plot: after the second lag, the third lag falls inside the confidence interval.
• p = 3, from the PACF plot: the last lag outside the confidence interval before the spikes fall inside it.
• Q = 14, read from the ACF plot in the same way.
• P = 2, read from the PACF plot in the same way.
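The "inside the confidence interval" rule can be made concrete with a standard-library sketch. It computes the sample ACF, computes the PACF through the Durbin-Levinson recursion, uses the usual approximate 95% band of ±1.96/√n, and finds the cut-off lag. The band formula and the cut-off helper are illustrative assumptions. Plotting libraries such as statsmodels' `plot_acf` and `plot_pacf` draw somewhat different bands.

```python
import math

def acf(x, nlags):
    """Sample autocorrelation for lags 0..nlags (biased estimator, divides by n)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev) / n
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / n / c0
            for k in range(nlags + 1)]

def pacf(x, nlags):
    """Partial autocorrelation via the Durbin-Levinson recursion on the sample ACF."""
    r = acf(x, nlags)
    result, phi_prev = [1.0], []
    for k in range(1, nlags + 1):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        result.append(phi_kk)
    return result

def significance_bound(n):
    """Approximate 95% band: spikes with |value| > 1.96/sqrt(n) are treated as significant."""
    return 1.96 / math.sqrt(n)

def cutoff_lag(coeffs, bound):
    """Count leading lags (from lag 1) outside +/- bound, i.e. the lag just before
    the first spike that falls inside the band."""
    k = 0
    while k + 1 < len(coeffs) and abs(coeffs[k + 1]) > bound:
        k += 1
    return k

print(cutoff_lag([1.0, 0.5, 0.4, 0.1, 0.6], 0.2))  # 2
```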

8. ARIMA Model

The above is the output of the ARIMA model with p = 0, d = 1 and q = 2, selected as the order with the lowest
AIC from the list of AIC scores. It has an RMSE of 16.72 and an AIC of 1276.83.

The above is the output of the ARIMA model with p = 3, d = 1 and q = 2, chosen manually from the ACF &
PACF plots (rationale stated above). It has an RMSE of 16.72 and an AIC of 1280.97.
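Selecting an order "from the list of AIC scores" amounts to trying every (p, d, q) combination and keeping the one with the lowest AIC. By that criterion the grid-selected model (1276.83) is preferred over the manual one (1280.97), even though both have the same RMSE. The following is a minimal sketch. `fit_aic` is a hypothetical caller-supplied function. In practice it might wrap something like statsmodels' `ARIMA(train, order=order).fit().aic`, or `SARIMAX(..., seasonal_order=(P, D, Q, 12))` for the seasonal case.

```python
import itertools

def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

def grid_search_aic(fit_aic, p_values, d_values, q_values):
    """Evaluate every (p, d, q) order with `fit_aic` and return (best_order, best_aic).
    `fit_aic` is a hypothetical function that fits a model of the given order
    on the train data and returns its AIC."""
    scores = {order: fit_aic(order)
              for order in itertools.product(p_values, d_values, q_values)}
    best = min(scores, key=scores.get)
    return best, scores[best]
```

A seasonal search works the same way, with P, D and Q added to the product.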

9. SARIMA Model

The above is the output of the SARIMA model with p = 1, d = 1, q = 2 and seasonal P = 1, D = 0, Q = 2 with a
seasonal period of 12, selected as the order with the lowest AIC from the list of AIC scores. It has an RMSE
of 26.45 and an AIC of 896.69.

The above graphs are the diagnostics of the SARIMA model. We can infer that:
1. The histogram of the residuals is approximately normally distributed.
2. In the Q-Q plot, the blue dots lie close to the red line, with only a few points away from it.
3. In the correlogram (last graph), all the bars lie within the confidence interval, so the residuals show no
significant autocorrelation.

The above is the output of the SARIMA model with p = 2, d = 1, q = 2 and seasonal P = 2, D = 0, Q = 14 with a
seasonal period of 12, chosen manually from the ACF & PACF plots (rationale stated above). It has an RMSE
of 16.47.

Comparison of all models, including ARIMA & SARIMA

The table below lists the RMSE of every model:

Model name                      RMSE
Linear Regression               16.62
Naïve Approach                  78.49
Simple Average                  52.37
2-point Moving Average          12.15
Simple Exponential Smoothing    35.93
Double Exponential Smoothing    36.00
Triple Exponential Smoothing    11.90
ARIMA model                     16.72
ARIMA model (manual)            16.72
SARIMA model                    26.45
SARIMA model (manual)           16.47

From the above table, Triple Exponential Smoothing gives the lowest RMSE (11.90), followed by the 2-point
Moving Average (12.15).

Forecast for the next 12 months using the top model
Using the Triple Exponential Smoothing

The above graph shows the forecast of Rose wine sales using the Triple Exponential Smoothing model. The blue
line is the actual sale of Rose wine and the yellow line is the forecast for the next 12 months, from 08/1995
to 07/1996. The grey area shows the 95% confidence interval (significance level 0.05).
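The forecast horizon and band can be sketched as follows. The 12 labels follow July 1995, and an approximate 95% band is built as forecast ± 1.96 × the standard deviation of the in-sample residuals. The report does not say how its grey band was computed, so the residual-based band, its normal and constant-variance assumptions, and the helper names are all illustrative.

```python
import math
from datetime import date

def next_months(last: date, n: int) -> list:
    """Labels ('MM/YYYY') for the n months following `last`."""
    labels, y, m = [], last.year, last.month
    for _ in range(n):
        m += 1
        if m > 12:
            y, m = y + 1, 1
        labels.append(f"{m:02d}/{y}")
    return labels

def interval(forecast, residuals, z=1.96):
    """Approximate band: forecast +/- z * sample std of residuals.
    Assumes roughly normal residuals with constant variance."""
    n = len(residuals)
    mean = sum(residuals) / n
    sd = math.sqrt(sum((r - mean) ** 2 for r in residuals) / (n - 1))
    return [(f - z * sd, f + z * sd) for f in forecast]

labels = next_months(date(1995, 7, 1), 12)
print(labels[0], labels[-1])  # 08/1995 07/1996
```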

Model findings
1. The best models are Triple Exponential Smoothing followed by the 2-point Moving Average, which have the
lowest RMSE.
2. The 6- and 9-point Moving Averages average over too many months and smooth out much of the seasonality.

Conclusion
The sale of Rose wine has declined steadily over the years, with December consistently the highest-selling
month. There could be multiple reasons for this decline, such as competitive pricing, product quality or a
change in customer preference. The company could consider revisiting its target audience and pricing in line
with the market, along with innovative promotions.

THE END!
