
1. Read the data as an appropriate time series and plot the data

Time series is a sequence of observations recorded at regular time intervals.

Rose Wine sales Dataset Sparkling wine Dataset


Import libraries

Import data
Plot the data
2. EDA and Decomposition

The above plot looks like a time series, but the x-axis is not a time axis.
We therefore need to parse the date column with pandas and use it as the index.
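A minimal sketch of that step follows. The column names `YearMonth` and `Sales` and the three sample rows are made up for illustration; the real file will have its own names and values.

```python
import io
import pandas as pd

# Made-up rows standing in for the real sales file; the column names are assumptions.
csv_text = """YearMonth,Sales
1980-01-01,100
1980-02-01,120
1980-03-01,90
"""

# parse_dates converts the column to timestamps; index_col makes it the index.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["YearMonth"], index_col="YearMonth")

print(type(df.index).__name__)
print(df.loc["1980-02"])
```

With a datetime index in place, plotting the column puts time on the x-axis, and rows can be selected by year or month.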
Yearly
Monthly

This plot shows us the behaviour of the Time Series ('Wine sales' in this
case) across various months. The red line is the median value.
Yearly sales across Months
Decomposition
Handling Missing values
There are no missing values in this dataset.
Some of our key observations from this analysis:
1) Seasonality: The seasonal plot displays a fairly consistent month-on-month
pattern. The monthly seasonal components are average values for a month
after removal of trend. Trend is removed from the time series using the
following formula:

Seasonality_t × Remainder_t = Y_t / Trend_t

2) Irregular remainder (random): the residual left in the series after
removal of trend and seasonal components. The remainder is calculated using
the following formula:

Remainder_t = Y_t / (Trend_t × Seasonality_t)
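The two formulas can be traced by hand with a small classical multiplicative decomposition. This is a sketch in plain Python on a made-up quarterly series, assuming a centered moving average as the trend estimate; it is not necessarily the routine that produced the decomposition plots in this report.

```python
def centered_moving_average(y, m):
    """2 x m centered moving average for an even period m; None where undefined."""
    half = m // 2
    trend = [None] * len(y)
    for t in range(half, len(y) - half):
        window = y[t - half : t + half + 1]
        # End points get half weight so the window covers exactly one full period.
        total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        trend[t] = total / m
    return trend


def multiplicative_decompose(y, m):
    trend = centered_moving_average(y, m)
    # Seasonality_t x Remainder_t = Y_t / Trend_t
    detrended = [yt / tr if tr is not None else None for yt, tr in zip(y, trend)]
    # Seasonal index: average detrended value for each position in the cycle.
    seasonal_means = []
    for pos in range(m):
        vals = [d for i, d in enumerate(detrended) if d is not None and i % m == pos]
        seasonal_means.append(sum(vals) / len(vals))
    scale = sum(seasonal_means) / m  # normalise so the indices average to 1
    seasonal = [seasonal_means[i % m] / scale for i in range(len(y))]
    # Remainder_t = Y_t / (Trend_t x Seasonality_t)
    remainder = [
        yt / (tr * s) if tr is not None else None
        for yt, tr, s in zip(y, trend, seasonal)
    ]
    return trend, seasonal, remainder


# Made-up quarterly series: constant level 100 times a repeating seasonal pattern.
pattern = [0.8, 1.0, 1.3, 0.9]
y = [100 * pattern[i % 4] for i in range(12)]
trend, seasonal, remainder = multiplicative_decompose(y, 4)
print(seasonal[:4])
print(remainder)
```

Because the sample series contains no noise, the recovered seasonal indices match the pattern and every defined remainder is 1, which is what "nothing left after removing trend and seasonality" looks like in the multiplicative form.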


3. SPLIT DATA INTO TRAIN AND TEST

Or
Both the test data in Sparkling and Rose wine sales start from 1991
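A split at 1991 can be sketched as follows, assuming a monthly DataFrame with a datetime index. The `Sales` column and its values are placeholders; only the date range mirrors the period covered by the report.

```python
import pandas as pd

# Made-up monthly series spanning Jan 1980 to Jul 1995.
idx = pd.date_range("1980-01-01", "1995-07-01", freq="MS")
df = pd.DataFrame({"Sales": range(len(idx))}, index=idx)

# Everything before 1991 is training data; 1991 onwards is test data.
train = df[df.index.year < 1991]
test = df[df.index.year >= 1991]

print(len(train), len(test))
print(test.index.min())
```

Splitting by date rather than at random keeps the test period strictly after the training period, which is what a forecast evaluation needs.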
4. Build various exponential smoothing models on the training data and evaluate them using RMSE on the test data. Other models such as
regression, naïve forecast and simple average should also be built on the training data and checked on the test data using RMSE. Build as
many models as possible, with as many iterations of different parameters as possible.

SES, Holt & Holt-Winter Model


Exponential Smoothing methods

Exponential smoothing methods smooth time series data.

Exponential smoothing averages, or exponentially weighted moving averages, forecast from previous periods' data, with exponentially declining
influence from older observations.

Exponential smoothing methods are described with the ETS (Error, Trend, Seasonality) notation, where each component can be none (N),
additive (A), additive damped (Ad), multiplicative (M) or multiplicative damped (Md).

One or more parameters control how fast the weights decay.

These parameters have values between 0 and 1.

SES - ETS(A, N, N) - Simple smoothing with additive errors

The simplest of the exponential smoothing methods is naturally called simple exponential smoothing (SES).

This method is suitable for forecasting data with no clear trend or seasonal pattern.
In Single ES, the forecast at time (t + 1) is given by (Winters, 1960):

F_{t+1} = αY_t + (1 − α)F_t

Parameter α is called the smoothing constant and its value lies between 0 and 1. Since the model uses only one smoothing constant, it is called
Single Exponential Smoothing.
The Rose and Sparkling datasets give monthly wine sales in the 20th century, from 1980 to 1995.

1. Building different models and comparing accuracy metrics


2. Forecast using SES model.
3. Calculate the values of RMSE and MAPE.
4. Plot the forecasted values along with original values.

Sparkling Dataset

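train_time = [i+1 for i in range(len(train))]
# Note: the offset 43 is hard-coded, so the test indices continue the training
# indices only when len(train) == 42; len(train)+1 is the general offset.
test_time = [i+43 for i in range(len(test))]
print('Training Time instance','\n',train_time)
print('Test Time instance','\n',test_time)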
Training Time instance
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29,
30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56,
57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83,
84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108,
109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130,
131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152,
153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174,
175, 176, 177, 178, 179, 180]
Test Time instance

[43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68,
69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95,
96, 97]
Rose Wine dataset

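train_time = [i+1 for i in range(len(train))]
# Note: the offset 43 is hard-coded, so the test indices continue the training
# indices only when len(train) == 42; len(train)+1 is the general offset.
test_time = [i+43 for i in range(len(test))]
print('Training Time instance','\n',train_time)
print('Test Time instance','\n',test_time)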


Training Time instance
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29,
30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56,
57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83,
84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108,
109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130]
Test Time instance
[43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69,
70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96,
97, 98, 99]

Evaluation Metrics
Two popular metrics for measuring the performance of regression (continuous variable) models are MAE and RMSE.
Mean Absolute Error (MAE): the average of the absolute differences between the predicted values and the observed values.
Root Mean Square Error (RMSE): the square root of the average of the squared differences between the predicted values and the observed values.

MAE is easier to understand and interpret, but RMSE works well in situations where large errors are undesirable. This is because the errors are squared before
they are averaged, which penalizes large errors. In our case RMSE suits well because we want to predict sales with minimum error (i.e. penalize high errors)
so that inventory can be managed properly.

So, we'll choose RMSE as the metric to measure the performance of our models.
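The difference between the two metrics shows up on a tiny made-up example. In this sketch both forecasts have the same total absolute error, but one concentrates it in a single large miss.

```python
import math


def mae(observed, predicted):
    """Mean Absolute Error."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root Mean Square Error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


observed = [100, 100, 100, 100]
small_errors = [110, 90, 110, 90]     # four errors of 10
one_big_error = [100, 100, 100, 140]  # one error of 40

print(mae(observed, small_errors), rmse(observed, small_errors))
print(mae(observed, one_big_error), rmse(observed, one_big_error))
```

MAE is the same for both forecasts, while RMSE is twice as large for the forecast with the single big miss, which is the penalty on large errors described above.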
MODEL 1 LINEAR REGRESSION

RMSE ON THE TEST DATA
We can calculate the RMSE using the scikit-learn helper mean_squared_error(), which computes the mean squared
error between a list of expected values (the test set) and the list of predictions. Taking the square root of
this value gives the RMSE score; the code below does this by passing squared=False.

## Mean Absolute Percentage - Function Definition
import numpy as np
from sklearn import metrics

def MAPE(y, yhat):
    # Sum of absolute errors expressed as a percentage of the sum of observed values
    y, yhat = np.array(y), np.array(yhat)
    try:
        mape = round(np.sum(np.abs(yhat - y)) / np.sum(y) * 100, 2)
    except Exception:
        print("Observed values are empty")
        mape = np.nan
    return mape

SPARKLING

## Training Data - RMSE and MAPE
# squared=False returns the RMSE (newer scikit-learn releases provide root_mean_squared_error for this)
rmse_model1_train = metrics.mean_squared_error(train['Sparkling'], train_predictions_model1, squared=False)
mape_model1_train = MAPE(train['Sparkling'], train_predictions_model1)
print("For RegressionOnTime forecast on the Training Data, RMSE is %3.3f MAPE is %3.2f" % (rmse_model1_train, mape_model1_train))

For RegressionOnTime forecast on the Training Data, RMSE is 1295.158 MAPE is 40.01

## Test Data - RMSE and MAPE
rmse_model1_test = metrics.mean_squared_error(test['Sparkling'], test_predictions_model1, squared=False)
mape_model1_test = MAPE(test['Sparkling'], test_predictions_model1)
print("For RegressionOnTime forecast on the Test Data, RMSE is %3.3f MAPE is %3.2f" % (rmse_model1_test, mape_model1_test))

For RegressionOnTime forecast on the Test Data, RMSE is 1275.073 MAPE is 38.12

ROSE

## Test Data - RMSE and MAPE
rmse_model1_test = metrics.mean_squared_error(test['Rose-wine-Sales'], test_predictions_model1, squared=False)
mape_model1_test = MAPE(test['Rose-wine-Sales'], test_predictions_model1)
print("For RegressionOnTime forecast on the Test Data, RMSE is %3.3f MAPE is %3.2f" % (rmse_model1_test, mape_model1_test))
Model 2: Naive Approach: ŷ_{t+1} = y_t

Model Evaluation

SPARKLING

## Training Data - RMSE and MAPE
rmse_model2_train = metrics.mean_squared_error(train['Sparkling'], NaiveModel_train['naive'], squared=False)
mape_model2_train = MAPE(train['Sparkling'], NaiveModel_train['naive'])
print("For Naive Model forecast on the Training Data, RMSE is %3.3f MAPE is %3.2f" % (rmse_model2_train, mape_model2_train))

For Naive Model forecast on the Training Data, RMSE is 3799.461 MAPE is 148.31

## Test Data - RMSE and MAPE
rmse_model2_test = metrics.mean_squared_error(test['Sparkling'], NaiveModel_test['naive'], squared=False)
mape_model2_test = MAPE(test['Sparkling'], NaiveModel_test['naive'])
# Note: the message still says "RegressionOnTime"; the scores printed are for the naive model
print("For RegressionOnTime forecast on the Test Data, RMSE is %3.3f MAPE is %3.2f" % (rmse_model2_test, mape_model2_test))

For RegressionOnTime forecast on the Test Data, RMSE is 1327.156 MAPE is 32.90

resultsDf_2 = pd.DataFrame({'Test RMSE': [rmse_model2_test], 'Test MAPE': [mape_model2_test]}, index=['NaiveModel'])

resultsDf = pd.concat([resultsDf, resultsDf_2])
resultsDf

ROSE

For RegressionOnTime forecast on the Test Data, RMSE is 21.783 MAPE is 31.85
For Naive Model forecast on the Training Data, RMSE is 53.740 MAPE is 38.62

SIMPLE AVERAGE METHOD

Consider the graph given below, where the y-axis depicts wine sales and the x-axis depicts time (years).
For Rose, sales rise and fall randomly by a larger margin, such that the average remains constant. For
Sparkling, sales rise and fall randomly by a smaller margin, and the average does not remain constant.

Many datasets vary by a small margin throughout the time period while the average at each point remains
roughly constant. In such a case we can forecast the next value as something close to the average of all
past values. The forecasting technique that sets the expected value equal to the average of all previously
observed points is called the Simple Average technique.

We take all the previously known values, calculate their average and use it as the next value. It will not
be exact, but it can be reasonably close, and there are situations where this technique works best.
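A short sketch of the two baselines discussed so far, the naive forecast and the simple average forecast, on made-up numbers. The function names are illustrative and are not the ones used to build `NaiveModel_test` or `SimpleAverage_test` in this report.

```python
def naive_forecast(train, horizon):
    """Repeat the last observed value for every future period."""
    return [train[-1]] * horizon


def simple_average_forecast(train, horizon):
    """Repeat the mean of all observed values for every future period."""
    mean = sum(train) / len(train)
    return [mean] * horizon


train = [20, 22, 19, 21, 18]  # made-up values
print(naive_forecast(train, 3))
print(simple_average_forecast(train, 3))
```

Both baselines produce a flat line over the test period; they differ only in whether the line sits at the last observation or at the historical mean.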
SPARKLING

## Training Data - RMSE and MAPE
rmse_model3_train = metrics.mean_squared_error(train['Sparkling'], SimpleAverage_train['mean_forecast'], squared=False)
mape_model3_train = MAPE(train['Sparkling'], SimpleAverage_train['mean_forecast'])
print("For Simple Average Model forecast on the Training Data, RMSE is %3.3f MAPE is %3.2f" % (rmse_model3_train, mape_model3_train))

For Simple Average Model forecast on the Training Data, RMSE is 1306.654 MAPE is 40.25

## Test Data - RMSE and MAPE
rmse_model3_test = metrics.mean_squared_error(test['Sparkling'], SimpleAverage_test['mean_forecast'], squared=False)
mape_model3_test = MAPE(test['Sparkling'], SimpleAverage_test['mean_forecast'])
print("For Simple Average forecast on the Test Data, RMSE is %3.3f MAPE is %3.2f" % (rmse_model3_test, mape_model3_test))

For Simple Average forecast on the Test Data, RMSE is 1275.478 MAPE is 39.46

resultsDf_3 = pd.DataFrame({'Test RMSE': [rmse_model3_test], 'Test MAPE': [mape_model3_test]}, index=['SimpleAverageModel'])

resultsDf = pd.concat([resultsDf, resultsDf_3])
resultsDf

ROSE

For Simple Average Model forecast on the Training Data, RMSE is 36.229 MAPE is 25.58
For Simple Average forecast on the Test Data, RMSE is 52.432 MAPE is 88.43
MOVING AVERAGE
MOVING AVERAGE FOR TRAIN AND TEST DATA FOR DIFFERENT ITERATIONS
We can see that the Moving Average method outperforms both the Simple Average method and
the Naive method for this dataset. Now we will look at the Simple Exponential Smoothing
method and see how it performs.
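For reference, a trailing moving average over different window sizes can be sketched as below. The data and the window sizes are made up; the report's own iterations are not shown in the text, so this only illustrates the mechanics.

```python
def trailing_moving_average(series, window):
    """Average of the current and previous (window - 1) points; None until enough data exists."""
    out = []
    for t in range(len(series)):
        if t + 1 < window:
            out.append(None)
        else:
            chunk = series[t - window + 1 : t + 1]
            out.append(sum(chunk) / window)
    return out


sales = [10, 12, 14, 16, 18, 20]  # made-up values
for window in (2, 3):
    print(window, trailing_moving_average(sales, window))
```

A longer window gives a smoother line that reacts more slowly to recent changes, which is the trade-off explored when trying several window sizes.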
MODEL COMPARISON PLOT
Simple Exponential Smoothing

We have the Sparkling wine sales data from Jan 1980 to Jul 1995.
Split the data into train and test in the ratio 70:30. Use the Single Exponential Smoothing method to forecast sales for the test period. Calculate the values of
RMSE and MAPE. Plot the forecasted values along with the original values.
Forecasts are calculated using weighted averages where the weights decrease exponentially as observations come from further in the past; the smallest
weights are associated with the oldest observations:

ŷ_{T+1|T} = α·y_T + α(1−α)·y_{T−1} + α(1−α)^2·y_{T−2} + ...

where 0 ≤ α ≤ 1 is the smoothing parameter.

The one-step-ahead forecast for time T+1 is a weighted average of all the observations in the series y_1, …, y_T. The rate at which the weights decrease is
controlled by the parameter α.
Looking closely, the forecast is the sum of two products: α·y_t and (1−α)·ŷ_{t|t−1}.
Hence, it can also be written as:

ŷ_{t+1|t} = α·y_t + (1−α)·ŷ_{t|t−1}

So essentially we have a weighted moving average with two weights: α and 1−α.
The weight 1−α multiplies the previous forecast ŷ_{t|t−1}, which makes the expression recursive. Unrolling the recursion gives the weights α(1−α)^j shown
above, which is why the method is called exponential. The forecast at time t+1 is a weighted average of the most recent observation y_t and the most recent forecast ŷ_{t|t−1}.
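The equivalence between the recursive form and the exponentially weighted sum can be checked numerically. This sketch assumes the first forecast is initialised with the first observation, which is one common convention and not necessarily the one statsmodels applies; the data are made up.

```python
def ses_fitted(y, alpha):
    """Recursive SES: entry t is the forecast for y[t]; the last entry is the forecast for T+1."""
    forecasts = [y[0]]  # assumption: initialise with the first observation
    for obs in y:
        forecasts.append(alpha * obs + (1 - alpha) * forecasts[-1])
    return forecasts


def ses_weighted_sum(y, alpha):
    """Same T+1 forecast written as an exponentially weighted sum of past observations."""
    n = len(y)
    weighted = sum(alpha * (1 - alpha) ** j * y[n - 1 - j] for j in range(n))
    # The leftover weight (1 - alpha)^n belongs to the initial forecast.
    return weighted + (1 - alpha) ** n * y[0]


y = [30, 34, 29, 33, 31]  # made-up values
for alpha in (0.1, 0.5, 0.99):
    recursive = ses_fitted(y, alpha)[-1]
    expanded = ses_weighted_sum(y, alpha)
    print(alpha, round(recursive, 4), round(expanded, 4))
```

With α close to 1 the forecast tracks the latest observation, and with α close to 0 it stays near the starting value, so the choice of α sets how quickly old data is forgotten.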
Model Evaluation for α = 1 : Simple Exponential Smoothing

The code below is shown for Sparkling; the Rose run is identical with 'Rose' in place of 'Sparkling'.

## Training Data
rmse_SESmodel_train = metrics.mean_squared_error(SES_train['Sparkling'], SES_train['predict'], squared=False)
mape_SESmodel_train = MAPE(SES_train['Sparkling'], SES_train['predict'])
print("For Alpha =1 Simple Exponential Smoothing Model forecast on the Training Data, RMSE is %3.3f MAPE is %3.2f" % (rmse_SESmodel_train, mape_SESmodel_train))

Rose: For Alpha =1 Simple Exponential Smoothing Model forecast on the Training Data, RMSE is 28.436 MAPE is 22.87

## Test Data
rmse_SESmodel_test = metrics.mean_squared_error(SES_test['Sparkling'], SES_test['predict'], squared=False)
mape_SESmodel_test = MAPE(SES_test['Sparkling'], SES_test['predict'])
# Note: the message still says "Training Data"; the scores printed are for the test data
print("For Alpha =1 Simple Exponential Smoothing Model forecast on the Training Data, RMSE is %3.3f MAPE is %3.2f" % (rmse_SESmodel_test, mape_SESmodel_test))

Sparkling: For Alpha =1 Simple Exponential Smoothing Model forecast on the Training Data, RMSE is 1275.478 MAPE is 39.46
Rose: For Alpha =1 Simple Exponential Smoothing Model forecast on the Training Data, RMSE is 40.594 MAPE is 68.94

from sklearn.metrics import mean_squared_error
from statsmodels.tsa.api import SimpleExpSmoothing

# create class
model = SimpleExpSmoothing(np.asarray(train['Sparkling']))

# fit model
# values of alpha for which we want to run the model
alpha_list = [0.1, 0.5, 0.99]

pred_SES = test.copy()  # keep a copy of the test dataset

for alpha_value in alpha_list:
    alpha_str = "SES" + str(alpha_value)
    # fit the model with a fixed smoothing level
    mode_fit_i = model.fit(smoothing_level=alpha_value, optimized=False)
    # forecasts for the test set time period
    pred_SES[alpha_str] = mode_fit_i.forecast(len(test['Sparkling']))
    # RMSE and MAPE for the test set
    rmse = np.sqrt(mean_squared_error(test['Sparkling'], pred_SES[alpha_str]))
    mape = MAPE(test['Sparkling'], pred_SES[alpha_str])

    print("For alpha = %1.2f, RMSE is %3.4f MAPE is %3.2f" % (alpha_value, rmse, mape))

Sparkling:
For alpha = 0.10, RMSE is 1370.4942 MAPE is 49.24
For alpha = 0.50, RMSE is 2583.5285 MAPE is 102.90
For alpha = 0.99, RMSE is 3797.5779 MAPE is 150.28

Rose:
For alpha = 0.10, RMSE is 15.7835 MAPE is 21.25
For alpha = 0.50, RMSE is 22.4445 MAPE is 37.12
For alpha = 0.99, RMSE is 33.8501 MAPE is 58.46

DOUBLE EXPONENTIAL OR HOLT METHOD
Holt extended simple exponential smoothing to allow forecasting of data
with a trend. It is exponential smoothing applied to both the level (the
average value in the series) and the trend. To express this in
mathematical notation we need three equations: one for the level, one for
the trend and one that combines level and trend to give the expected
forecast ŷ:

Forecast: ŷ_{t+h|t} = ℓ_t + h·b_t
Level:    ℓ_t = α·y_t + (1−α)(ℓ_{t−1} + b_{t−1})
Trend:    b_t = β(ℓ_t − ℓ_{t−1}) + (1−β)·b_{t−1}

The values we predicted in the earlier algorithms are called the level. In the
three equations above, level and trend are added to generate the forecast.

As with simple exponential smoothing, the level equation shows that the level
is a weighted average of the observation and the within-sample
one-step-ahead forecast. The trend equation shows that the trend is a weighted
average of the estimated trend at time t, based on ℓ(t)−ℓ(t−1), and b(t−1), the
previous estimate of the trend.

ROSE

SIMPLE EXPONENTIAL SMOOTHING
For alpha = 0.10, RMSE is 15.7835 MAPE is 21.25
For alpha = 0.50, RMSE is 22.4445 MAPE is 37.12
For alpha = 0.99, RMSE is 33.8501 MAPE is 58.46
HOLT
For alpha = 0.16, RMSE is 45.4301 MAPE is 71.33
NOTE: there is an increase in RMSE and MAPE with the Holt method.

SPARKLING

SIMPLE EXPONENTIAL SMOOTHING
For alpha = 0.10, RMSE is 1370.4942 MAPE is 49.24
For alpha = 0.50, RMSE is 2583.5285 MAPE is 102.90
For alpha = 0.99, RMSE is 3797.5779 MAPE is 150.28
HOLT
For alpha = 0.00, RMSE is 1322.4504 MAPE is 45.89
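The level and trend recursions of Holt's method can be sketched in a few lines. The initial level and trend used here (first observation and first difference) are assumptions for illustration; statsmodels chooses or estimates its own initial values, so the numbers will not match the report's output.

```python
def holt_forecast(y, alpha, beta, horizon):
    """Holt's linear trend method: returns forecasts for 1..horizon steps ahead."""
    level = y[0]         # assumption: initial level = first observation
    trend = y[1] - y[0]  # assumption: initial trend = first difference
    for obs in y[1:]:
        prev_level = level
        # Level: weighted average of the observation and the one-step-ahead forecast.
        level = alpha * obs + (1 - alpha) * (prev_level + trend)
        # Trend: weighted average of the latest level change and the previous trend.
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + h * trend for h in range(1, horizon + 1)]


y = [10, 12, 14, 16, 18]  # made-up, perfectly linear series
print(holt_forecast(y, alpha=0.5, beta=0.3, horizon=3))
```

On a perfectly linear series the forecasts simply continue the line. On data without a persistent trend, the extrapolated slope can hurt accuracy, which is one possible reading of the higher Holt errors reported for Rose.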
HOLT WINTERS
The Holt-Winters method should be the best option among the models so far
because it accounts for seasonality. The Holt-Winters seasonal method
comprises the forecast equation and three smoothing equations: one for
the level ℓ_t, one for the trend b_t and one for the seasonal component
s_t, with smoothing parameters α, β and γ.

The method can be implemented in both additive and multiplicative form.
The additive method is preferred when the seasonal variations
are roughly constant through the series, while the multiplicative method is
preferred when the seasonal variations change in proportion to the
level of the series.
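The additive form can be sketched as below, using the textbook recursions for level, trend and seasonal component. The initialisation from the first two seasons is an assumption made for brevity, and the data are made up; a library fit would estimate α, β, γ and the initial states instead.

```python
def holt_winters_additive(y, m, alpha, beta, gamma, horizon):
    """Additive Holt-Winters with period m; needs at least two full seasons of data."""
    # Assumed simple initialisation from the first two seasons.
    level = sum(y[:m]) / m
    trend = (sum(y[m:2 * m]) - sum(y[:m])) / (m * m)
    seasonal = [y[i] - level for i in range(m)]
    for t in range(m, len(y)):
        s_prev = seasonal[t - m]
        prev_level, prev_trend = level, trend
        # Level uses the seasonally adjusted observation.
        level = alpha * (y[t] - s_prev) + (1 - alpha) * (prev_level + prev_trend)
        trend = beta * (level - prev_level) + (1 - beta) * prev_trend
        # Seasonal component: what is left after removing last period's level and trend.
        seasonal.append(gamma * (y[t] - prev_level - prev_trend) + (1 - gamma) * s_prev)
    n = len(y)
    return [level + h * trend + seasonal[n - m + (h - 1) % m] for h in range(1, horizon + 1)]


# Made-up series: a repeating pattern of period 4 with no trend.
y = [10, 20, 30, 20] * 3
print(holt_winters_additive(y, m=4, alpha=0.2, beta=0.1, gamma=0.3, horizon=4))
```

The multiplicative form follows the same structure, with the seasonal component applied by multiplication and removed by division rather than by addition and subtraction.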

ADDITIVE MODEL

Rose: For alpha = 0.18, RMSE is 17.9824 MAPE is 27.13

MULTIPLICATIVE MODEL

The code below is shown for Sparkling; the Rose run is identical with test['Rose'] in place of test['Sparkling'].

df_pred_opt = pd.DataFrame({'Y_hat': pred_HoltW['HoltWM'], 'Y': test['Sparkling'].values})

rmse_opt = np.sqrt(mean_squared_error(df_pred_opt.Y, df_pred_opt.Y_hat))
mape_opt = MAPE(df_pred_opt.Y, df_pred_opt.Y_hat)

print("For alpha = %1.2f, RMSE is %3.4f MAPE is %3.2f" % (alpha_value, rmse_opt, mape_opt))

Sparkling: For alpha = 0.09, RMSE is 254.2838 MAPE is 7.50
Rose: For alpha = 0.18, RMSE is 12.1116 MAPE is 16.18

SUMMARY

Sparkling
ADDITIVE: AAA MODEL        For alpha = 0.04, RMSE is 253.7152 MAPE is 7.58
MULTIPLICATIVE: AAM MODEL  For alpha = 0.09, RMSE is 254.2838 MAPE is 7.50

Rose
ADDITIVE: AAA MODEL        For alpha = 0.18, RMSE is 17.9824 MAPE is 27.13
MULTIPLICATIVE: AAM MODEL  For alpha = 0.18, RMSE is 12.1116 MAPE is 16.18
5. Check for the stationarity of the data on which the model is being built using appropriate statistical tests, and state the hypotheses for the
statistical test. If the data is found to be non-stationary, take appropriate steps to make it stationary. Check the new data for stationarity and
comment. Note: Stationarity should be checked at alpha = 0.05.

Augmented Dickey-Fuller test

Statistical tests make strong assumptions about the data. They can only be used to inform the degree to which a null hypothesis can be rejected or fail to be rejected. The
result must be interpreted for a given problem to be meaningful. They can provide a quick check and confirmatory evidence that your time series is stationary or
non-stationary.

The Augmented Dickey-Fuller test is a type of statistical test called a unit root test.
The intuition behind a unit root test is that it determines how strongly a time series is defined by a trend. There are a number of unit root tests, and the Augmented
Dickey-Fuller test may be one of the more widely used. It uses an autoregressive model and optimizes an information criterion across multiple different lag values.

The null hypothesis of the test is that the time series can be represented by a unit root, that is, it is not stationary (it has some time-dependent structure). The alternate
hypothesis (rejecting the null hypothesis) is that the time series is stationary.

• Null Hypothesis (H0): If it fails to be rejected, it suggests the time series has a unit root, meaning it is non-stationary. It has some time-dependent structure.
• Alternate Hypothesis (H1): The null hypothesis is rejected; it suggests the time series does not have a unit root, meaning it is stationary. It does not have
time-dependent structure.

We interpret this result using the p-value from the test. A p-value below a threshold (such as 5% or 1%) suggests we reject the null hypothesis (stationary); otherwise a
p-value above the threshold suggests we fail to reject the null hypothesis (non-stationary).

• p-value > 0.05: Fail to reject the null hypothesis (H0); the data has a unit root and is non-stationary.
• p-value <= 0.05: Reject the null hypothesis (H0); the data does not have a unit root and is stationary.
## Test for stationarity of the series - Dickey-Fuller test
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller

def test_stationarity(timeseries):

    # Determine rolling statistics
    rolmean = timeseries.rolling(window=7).mean()
    rolstd = timeseries.rolling(window=7).std()

    # Plot rolling statistics
    orig = plt.plot(timeseries, color='blue', label='Original')
    mean = plt.plot(rolmean, color='red', label='Rolling Mean')
    std = plt.plot(rolstd, color='black', label='Rolling Std')
    plt.legend(loc='best')
    plt.title('Rolling Mean & Standard Deviation')
    plt.show(block=False)

    # Perform Dickey-Fuller test
    print('Results of Dickey-Fuller Test:')
    dftest = adfuller(timeseries, autolag='AIC')
    dfoutput = pd.Series(dftest[0:4], index=['Test Statistic', 'p-value', '#Lags Used', 'Number of Observations Used'])
    for key, value in dftest[4].items():
        dfoutput['Critical Value (%s)' % key] = value
    print(dfoutput, '\n')
Result at alpha = 0.05:

Rose dataset: p-value > 0.05, so we fail to reject the null hypothesis (H0); the data has a unit root and is non-stationary.
Sparkling dataset: p-value <= 0.05, so we reject the null hypothesis (H0); the data does not have a unit root and is stationary.
6. Build an automated version of the ARIMA/SARIMA model in which the parameters are selected using the lowest Akaike Information Criteria (AIC) on
the training data and evaluate this model on the test data using RMSE.
ARIMA is a very popular statistical method for time series forecasting. ARIMA stands for Auto-Regressive Integrated Moving Average. ARIMA models work on the following
assumptions:

• The data series is stationary, which means that the mean and variance should not vary with time. A series can be made stationary by using a log transformation or
by differencing the series.
• The data provided as input must be a univariate series, since ARIMA uses the past values to predict the future values.

ARIMA has three components: AR (autoregressive term), I (differencing term) and MA (moving average term). Let us understand each of these components:

• The AR term refers to the past values used for forecasting the next value. The AR term is defined by the parameter 'p' in ARIMA. The value of 'p' is determined using the
PACF plot.
• The MA term defines the number of past forecast errors used to predict the future values. The parameter 'q' in ARIMA represents the MA term. The ACF plot is used
to identify the correct 'q' value.
• The order of differencing specifies the number of times the differencing operation is performed on the series to make it stationary. Tests like ADF and KPSS can be used to
determine whether the series is stationary and help in identifying the 'd' value.
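Differencing, the operation counted by 'd', can be sketched on a made-up series. The series here has a linear trend plus a repeating pattern of period 4, chosen so the effect of each kind of difference is easy to see.

```python
def difference(series, lag=1):
    """Subtract the value `lag` steps earlier from each value."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Made-up series: linear trend (slope 2) plus a repeating pattern of period 4.
pattern = [5, -2, 3, -6]
series = [2 * t + pattern[t % 4] for t in range(12)]

first_diff = difference(series)            # removes the linear trend
seasonal_diff = difference(series, lag=4)  # removes the repeating pattern

print(first_diff)
print(seasonal_diff)
```

After a first difference the values no longer drift upward, and after a lag-4 difference only a constant remains. In practice the differenced series would be passed back through a stationarity test such as ADF to confirm the chosen 'd'.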

Auto ARIMA takes into account the AIC and BIC values generated to determine the best combination of parameters.

Akaike’s information criterion (AIC) compares the quality of a set of statistical models to each other.

Hyperparameter tuning for ARIMA

In order to choose the best combination of the above parameters, we’ll use a grid search. The best combination of parameters will give the lowest Akaike
information criterion (AIC) score. AIC tells us the quality of statistical models for a given set of data.
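The idea of picking the order with the lowest AIC can be sketched without a forecasting library. Note the substitution: instead of a full ARIMA(p, d, q) grid fitted by maximum likelihood, this example fits plain AR(p) models by least squares with NumPy and scores them with the Gaussian AIC, n·ln(RSS/n) + 2k. The simulated data and the function name `ar_aic` are illustrative only.

```python
import numpy as np


def ar_aic(y, p, max_p):
    """Least-squares AR(p) fit with intercept; returns a Gaussian AIC (up to a constant)."""
    y = np.asarray(y, dtype=float)
    # Drop the first max_p points for every candidate so all fits use the same sample.
    target = y[max_p:]
    cols = [np.ones(len(target))] + [y[max_p - k : len(y) - k] for k in range(1, p + 1)]
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    rss = float(np.sum((target - X @ coef) ** 2))
    n, k = len(target), p + 1
    return n * np.log(rss / n) + 2 * k


# Simulated AR(1) series with a fixed seed, standing in for real training data.
rng = np.random.default_rng(0)
y = [0.0]
for _ in range(299):
    y.append(0.7 * y[-1] + rng.normal())

max_p = 4
scores = {p: ar_aic(y, p, max_p) for p in range(0, max_p + 1)}
best_p = min(scores, key=scores.get)
print(scores)
print("lowest AIC at p =", best_p)
```

A grid search over ARIMA orders works the same way: fit every candidate on the training data, record its AIC, and keep the candidate with the smallest value. AIC rewards fit but charges 2 per extra parameter, so a larger model wins only if it reduces the error enough.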
Let's check diagnostic plots to visualize the performance of our model.
The Normal Q-Q plot shows that the ordered distribution of residuals
follows a distribution similar to the normal distribution. Thus, our model
seems to be pretty good.

The above plot shows that our predicted values catch up to the observed
values in the dataset. Our forecasts seem to align with the ground truth very
well and show results as expected; for Sparkling, the forecast spikes up
through 95 to 96. RMSE is also reasonably low in both cases.

So, the final ARIMA model can be represented as SARIMAX(1, 0, 1)x(0, 0, 2)_5.
This is the best we can do with ARIMA, so let's try another model to see
whether we can decrease the RMSE.
Prophet
Prophet is an open-source tool by Facebook. This procedure is used for
forecasting time series data based on an additive model where non-linear
trends are fit with yearly, weekly, and daily seasonality, plus holiday effects.
The first plot shows that the total sales on a weekly basis are increasing. The second
plot shows the holiday gaps in the dataset and the third plot shows that the store
sees very high sales in the last week of December (because of the Christmas
holidays).
For Rose, SARIMA performed the best in this case. For Sparkling, ARIMA performed the best in this case.
7. Build ARIMA/SARIMA models based on the cut-off points of ACF and PACF on the training data and evaluate this model on the test data using RMSE.

The seasonal part of an AR or MA model will be seen in the seasonal lags of the PACF and ACF. For example, an ARIMA(0,0,0)(0,0,1)_12 model will
show:

• a spike at lag 12 in the ACF but no other significant spikes;
• exponential decay in the seasonal lags of the PACF (i.e., at lags 12, 24, 36, …).

Similarly, an ARIMA(0,0,0)(1,0,0)_12 model will show:

• exponential decay in the seasonal lags of the ACF;
• a single significant spike at lag 12 in the PACF.

In considering the appropriate seasonal orders for a seasonal ARIMA model, restrict attention to the seasonal lags. The modelling procedure is almost
the same as for non-seasonal data, except that we need to select seasonal AR and MA terms as well as the non-seasonal components of the model.
To determine a proper model for a given time series, it is necessary to carry out the ACF and PACF analysis. These statistical measures reflect how
the observations in a time series are related to each other. For modelling and forecasting purposes it is often useful to plot the ACF and PACF against
consecutive time lags. These plots help in determining the order of the AR and MA terms. Below we give their mathematical definitions. For a time series
{x(t), t = 0, 1, 2, ...} the autocovariance [21, 23] at lag k is defined as:

γ_k = Cov(x_t, x_{t+k}) = E[(x_t − μ)(x_{t+k} − μ)]
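The sample version of this definition is short enough to write out. The sketch below estimates the autocovariance with the usual 1/n normalisation and divides by the lag-0 value to obtain the ACF; the input series is made up and has a period of 4.

```python
def autocovariance(x, k):
    """Sample autocovariance at lag k, using the overall mean and a 1/n normalisation."""
    n = len(x)
    mean = sum(x) / n
    return sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k)) / n


def acf(x, max_lag):
    """Autocorrelation: autocovariance at each lag divided by the lag-0 autocovariance."""
    g0 = autocovariance(x, 0)
    return [autocovariance(x, k) / g0 for k in range(max_lag + 1)]


# Made-up series with a period of 4, so the ACF peaks again at lags 4 and 8.
x = [1, 3, 5, 3] * 6
print([round(r, 3) for r in acf(x, 8)])
```

The positive peaks at multiples of the period are the same signature one looks for at lags 12, 24 and 36 in monthly data when choosing seasonal orders.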
8. Build a table (create a data frame) with all the models built along with their corresponding parameters and the respective RMSE values on the test
data.

Rose sales Sparkling sales
