Project Time Series Forecasting Final
1. Read the data as an appropriate time series and plot the data
Import data
Plot the data
2. EDA and Decomposition
The plot above looks like a time series, but the X axis is a plain row index rather than time,
so we need to parse the dates with pandas and use them as the index.
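A minimal sketch of that date-parsing step, assuming the raw data holds month stamps as text. The column names `YearMonth` and `Sparkling` and the three sample rows are made up for illustration and are not taken from the project's files.

```python
import pandas as pd

# Hypothetical raw frame: the month stamps arrive as plain strings.
raw = pd.DataFrame({
    "YearMonth": ["1980-01", "1980-02", "1980-03"],  # illustrative values
    "Sparkling": [100, 110, 120],                     # illustrative values
})

# Convert the strings to real timestamps and use them as the index,
# so plotting is driven by time rather than by row number.
raw["YearMonth"] = pd.to_datetime(raw["YearMonth"], format="%Y-%m")
series = raw.set_index("YearMonth")["Sparkling"]

print(type(series.index).__name__)
```

When the data comes from a CSV file, the `parse_dates` and `index_col` arguments of `pd.read_csv` can do the same conversion at load time.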
Yearly
Monthly
This plot shows us the behaviour of the Time Series ('Wine sales' in this
case) across various months. The red line is the median value.
Yearly sales across Months
Decomposition
There are no missing values in this dataset
Handling Missing values
Some of our key observations from this analysis:
1) Seasonality: Seasonal plot displays a fairly consistent month-on-month
pattern. The monthly seasonal components are average values for a month
after removal of trend. Trend is removed from the time series using the
following formula:
Or
The test data for both the Sparkling and the Rose wine sales start from 1991.
4. Build various exponential smoothing models on the training data and evaluate them using RMSE on the test data. Other models such as
regression, naïve forecast and simple average should also be built on the training data and checked on the test data using RMSE. Build as
many models as possible, with as many iterations and different parameters as possible.
Exponential smoothing averages, or exponentially weighted moving averages, produce forecasts from the data of previous periods, with exponentially declining
influence of the older observations.
Exponential smoothing methods are special cases of exponential moving averages, described with the ETS notation (Error, Trend, Seasonality), where each component can be none (N),
additive (A), additive damped (Ad), multiplicative (M) or multiplicative damped (Md).
The simplest of the exponential smoothing methods is naturally called simple exponential smoothing (SES).
This method is suitable for forecasting data with no clear trend or seasonal pattern.
In Single ES, the forecast at time (t + 1) is given by (Winters, 1960):
$F_{t+1} = \alpha Y_t + (1 - \alpha) F_t$
Parameter $\alpha$ is called the smoothing constant and its value lies between 0 and 1. Since the model uses only one smoothing constant, it is called
Single Exponential Smoothing.
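The recursion can be sketched in a few lines of plain Python. This is only an illustration of the formula, not the project's code: the function name, the sample numbers and the choice to initialise the first forecast with the first observation are all assumptions.

```python
def single_exponential_smoothing(y, alpha, f0=None):
    """Apply F_{t+1} = alpha * Y_t + (1 - alpha) * F_t over the series y."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    # Assumption: start the recursion from the first observation unless f0 is given.
    forecast = y[0] if f0 is None else f0
    forecasts = []
    for observed in y:
        forecast = alpha * observed + (1 - alpha) * forecast  # next-period forecast
        forecasts.append(forecast)
    return forecasts


sales = [10.0, 20.0, 30.0]  # illustrative numbers
print(single_exponential_smoothing(sales, alpha=0.5))  # [10.0, 15.0, 22.5]
```

With alpha = 1 every forecast equals the latest observation, and with alpha = 0 the forecast never moves away from its starting value.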
The two datasets, Rose and Sparkling, give the wine sales in the 20th century, from 1980 to 1995.
Sparkling Dataset
[43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68,
69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95,
96, 97]
Rose Wine dataset
Evaluation Metrics
There are two popular metrics used in measuring the performance of regression (continuous variable) models, i.e. MAE and RMSE.
Mean Absolute Error (MAE): It is the average of the absolute differences between the predicted values and the observed values.
Root Mean Square Error (RMSE): It is the square root of the average of the squared differences between the predicted values and the observed values.
MAE is easier to understand and interpret, but RMSE works well in situations where large errors are undesirable. This is because the errors are squared before
they are averaged, thus penalizing large errors. In our case, RMSE suits well because we want to predict the sales with minimum error (i.e. penalize high errors)
so that inventory can be managed properly.
So, we’ll choose RMSE as a metric to measure the performance of our models.
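A small worked example may make the difference concrete. The numbers below are invented so that two forecasts share the same MAE while one of them contains a single large miss; the function names are illustrative.

```python
import math


def mae(actual, predicted):
    """Mean of the absolute errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Square root of the mean of the squared errors."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


actual = [100, 100, 100, 100]
even = [90, 110, 90, 110]     # four errors of size 10
spiky = [100, 100, 100, 60]   # one error of size 40

print(mae(actual, even), rmse(actual, even))    # 10.0 10.0
print(mae(actual, spiky), rmse(actual, spiky))  # 10.0 20.0
```

Both forecasts are off by 10 units on average, but RMSE doubles for the forecast with the one large miss, which is the behaviour wanted when large errors are costly.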
MODEL 1 LINEAR REGRESSION
Rose SPARKLING
## Mean Absolute Percentage Error - Function Definition
import numpy as np

def MAPE(y, yhat):
    # Total absolute error expressed as a percentage of the total observed value
    y, yhat = np.array(y), np.array(yhat)
    try:
        mape = round(np.sum(np.abs(yhat - y)) / np.sum(y) * 100, 2)
    except Exception:
        print("Observed values are empty")
        mape = np.nan
    return mape
## Training Data - RMSE and MAPE
rmse_model1_train = metrics.mean_squared_error(train['Sparkling'],train_predictions_model1,squared=False)
mape_model1_train = MAPE(train['Sparkling'],train_predictions_model1)
print("For RegressionOnTime forecast on the Training Data, RMSE is %3.3f MAPE is %3.2f" %(rmse_model1_train, mape_model1_train))
For RegressionOnTime forecast on the Training Data, RMSE is 1295.158 MAPE is 40.01
RMSE ON THE TEST DATA
We can calculate the RMSE using the helper function mean_squared_error() from the scikit-learn library, which calculates the mean squared
error between a list of expected values (the test set) and the list of predictions. We can then take the square root of this value to give us an
RMSE score.
Rose
## Test Data - RMSE and MAPE
rmse_model1_test = metrics.mean_squared_error(test['Rose-wine-Sales'],test_predictions_model1,squared=False)
mape_model1_test = MAPE(test['Rose-wine-Sales'],test_predictions_model1)
print("For RegressionOnTime forecast on the Test Data, RMSE is %3.3f MAPE is %3.2f" %(rmse_model1_test, mape_model1_test))
Sparkling
## Test Data - RMSE and MAPE
rmse_model1_test = metrics.mean_squared_error(test['Sparkling'],test_predictions_model1,squared=False)
mape_model1_test = MAPE(test['Sparkling'],test_predictions_model1)
print("For RegressionOnTime forecast on the Test Data, RMSE is %3.3f MAPE is %3.2f" %(rmse_model1_test, mape_model1_test))
For RegressionOnTime forecast on the Test Data, RMSE is 1275.073 MAPE is 38.12
Model 2: Naive Approach: $\hat{y}_{t+1} = y_t$
Model Evaluation
## Training Data - RMSE and MAPE
rmse_model2_train = metrics.mean_squared_error(train['Sparkling'],NaiveModel_train['naive'],squared=False)
mape_model2_train = MAPE(train['Sparkling'],NaiveModel_train['naive'])
print("For Naive Model forecast on the Training Data, RMSE is %3.3f MAPE is %3.2f" %(rmse_model2_train, mape_model2_train))
For Naive Model forecast on the Training Data,
RMSE is 3799.461 MAPE is 148.31
## Test Data - RMSE and MAPE
rmse_model2_test = metrics.mean_squared_error(test['Sparkling'],NaiveModel_test['naive'],squared=False)
mape_model2_test = MAPE(test['Sparkling'],NaiveModel_test['naive'])
# The label below names the naive model; the original print statement reused the RegressionOnTime label
print("For Naive Model forecast on the Test Data, RMSE is %3.3f MAPE is %3.2f" %(rmse_model2_test, mape_model2_test))
For Naive Model forecast on the Test Data,
RMSE is 1327.156 MAPE is 32.90
resultsDf_2 = pd.DataFrame({'Test RMSE': [rmse_model2_test],'Test MAPE': [mape_model2_test]},index=['NaiveModel'])
resultsDf = pd.concat([resultsDf, resultsDf_2])
resultsDf
Rose
For RegressionOnTime forecast on the Test Data, RMSE is 21.783 MAPE is 31.85
For Naive Model forecast on the Training Data, RMSE is 53.740 MAPE is 38.62
rmse_model3_train = metrics.mean_squared_error(train['Sparkling'],SimpleAverage_train['mean_forecast'],squared=False)
mape_model3_train = MAPE(train['Sparkling'],SimpleAverage_train['mean_forecast'])
print("For Simple Average Model forecast on the Training Data, RMSE is %3.3f MAPE is %3.2f" %(rmse_model3_train, mape_model3_train))
For Simple Average Model forecast on the Training
Data, RMSE is 1306.654 MAPE is 40.25
## Test Data - RMSE and MAPE
rmse_model3_test = metrics.mean_squared_error(test['Sparkling'],SimpleAverage_test['mean_forecast'],squared=False)
mape_model3_test = MAPE(test['Sparkling'],SimpleAverage_test['mean_forecast'])
print("For Simple Average forecast on the Test Data, RMSE is %3.3f MAPE is %3.2f" %(rmse_model3_test, mape_model3_test))
For Simple Average forecast on the Test Data,
RMSE is 1275.478 MAPE is 39.46
resultsDf_3 = pd.DataFrame({'Test RMSE': [rmse_model3_test],'Test MAPE': [mape_model3_test]},index=['SimpleAverageModel'])
Rose
For Simple Average Model forecast on the Training Data, RMSE is 36.229 MAPE is 25.58
For Simple Average forecast on the Test Data, RMSE is 52.432 MAPE is 88.43
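The naive and simple average baselines evaluated above can be sketched in plain Python as follows. The helper names and the four training values are illustrative assumptions; the project itself stores these forecasts in the `naive` and `mean_forecast` columns.

```python
def naive_forecast(train, horizon):
    """Predict every future period as the last observed training value."""
    return [train[-1]] * horizon


def simple_average_forecast(train, horizon):
    """Predict every future period as the mean of the training values."""
    mean = sum(train) / len(train)
    return [mean] * horizon


train = [10.0, 12.0, 14.0, 20.0]  # illustrative numbers
print(naive_forecast(train, 3))           # [20.0, 20.0, 20.0]
print(simple_average_forecast(train, 3))  # [14.0, 14.0, 14.0]
```

Both baselines produce a flat line over the test period, so neither can follow trend or seasonality; they mainly serve as reference points for the smoothing models.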
We have the Sparkling wine sales data from Jan 1980 to Jul 1995.
Split the data into train and test in the ratio 70:30. Use the Single Exponential Smoothing method to forecast sales over the test period. Calculate the values of
RMSE and MAPE. Plot the forecasted values along with the original values.
Forecasts are calculated using weighted averages where the weights decrease exponentially as observations come from further in the past, the smallest
weights are associated with the oldest observations:
So essentially we’ve got a weighted moving average with two weights: $\alpha$ and $1-\alpha$.
As we can see, $1-\alpha$ is multiplied by the previous expected value $\hat{y}_{x-1}$, which makes the expression recursive. This is why the method is
called Exponential. The forecast at time $t+1$ is equal to a weighted average between the most recent observation $y_t$ and the most recent forecast $\hat{y}_{t|t-1}$.
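To see where the exponentially decreasing weights come from, the recursion can be unrolled: the observation k steps back receives weight alpha * (1 - alpha)^k. The sketch below checks that the unrolled weighted sum matches the recursive form; the function names, the sample values and the starting level are assumptions made for illustration.

```python
import math


def ses_weights(alpha, n):
    """Weights on the latest n observations, newest first: alpha * (1 - alpha)**k."""
    return [alpha * (1 - alpha) ** k for k in range(n)]


def ses_recursive(y, alpha, level0):
    """Run the recursion level = alpha * y_t + (1 - alpha) * level over the series."""
    level = level0
    for observed in y:
        level = alpha * observed + (1 - alpha) * level
    return level


def ses_unrolled(y, alpha, level0):
    """Same forecast written as an explicit weighted sum of past observations."""
    n = len(y)
    weighted = sum(w * obs for w, obs in zip(ses_weights(alpha, n), reversed(y)))
    return weighted + (1 - alpha) ** n * level0  # leftover weight on the starting level


y = [10.0, 20.0, 30.0]  # illustrative numbers
print(ses_weights(0.5, 4))  # [0.5, 0.25, 0.125, 0.0625]
print(ses_recursive(y, 0.5, 10.0), ses_unrolled(y, 0.5, 10.0))  # 22.5 22.5
```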
Model Evaluation for α = 1 : Simple Exponential Smoothing
Rose
## Training Data
rmse_SESmodel_train = metrics.mean_squared_error(SES_train['Rose'],SES_train['predict'],squared=False)
mape_SESmodel_train = MAPE(SES_train['Rose'],SES_train['predict'])
print("For Alpha =1 Simple Exponential Smoothing Model forecast on the Training Data, RMSE is %3.3f MAPE is %3.2f" %(rmse_SESmodel_train, mape_SESmodel_train))
For Alpha =1 Simple Exponential Smoothing Model forecast on the Training Data, RMSE is 28.436 MAPE is 22.87
## Test Data
rmse_SESmodel_test = metrics.mean_squared_error(SES_test['Rose'],SES_test['predict'],squared=False)
mape_SESmodel_test = MAPE(SES_test['Rose'],SES_test['predict'])
# Label corrected: these are test-set metrics (the original print statement said "Training Data")
print("For Alpha =1 Simple Exponential Smoothing Model forecast on the Test Data, RMSE is %3.3f MAPE is %3.2f" %(rmse_SESmodel_test, mape_SESmodel_test))
For Alpha =1 Simple Exponential Smoothing Model forecast on the Test Data, RMSE is 40.594 MAPE is 68.94
from math import sqrt
from sklearn.metrics import mean_squared_error
from statsmodels.tsa.holtwinters import SimpleExpSmoothing
# create class
model = SimpleExpSmoothing(np.asarray(train['Rose']))
# fit model
# defining the various values of alpha for which we want to run the model
alpha_list = [0.1, 0.5, 0.99]
pred_SES = test.copy() # Have a copy of the test dataset
# starting a loop
for alpha_value in alpha_list:
    alpha_str = "SES" + str(alpha_value)
    mode_fit_i = model.fit(smoothing_level = alpha_value, optimized=False) # fitting the model
    pred_SES[alpha_str] = mode_fit_i.forecast(len(test['Rose'])) # calculating the forecasts for the test set time period
    rmse = np.sqrt(mean_squared_error(test['Rose'], pred_SES[alpha_str])) # calculate the RMSE for the test set
    mape = MAPE(test['Rose'],pred_SES[alpha_str]) # calculate the MAPE for the test set
    print("For alpha = %1.2f, RMSE is %3.4f MAPE is %3.2f" %(alpha_value, rmse, mape))
For alpha = 0.10, RMSE is 15.7835 MAPE is 21.25
For alpha = 0.50, RMSE is 22.4445 MAPE is 37.12
For alpha = 0.99, RMSE is 33.8501 MAPE is 58.46
Sparkling
## Training Data
rmse_SESmodel_train = metrics.mean_squared_error(SES_train['Sparkling'],SES_train['predict'],squared=False)
mape_SESmodel_train = MAPE(SES_train['Sparkling'],SES_train['predict'])
print("For Alpha =1 Simple Exponential Smoothing Model forecast on the Training Data, RMSE is %3.3f MAPE is %3.2f" %(rmse_SESmodel_train, mape_SESmodel_train))
## Test Data
rmse_SESmodel_test = metrics.mean_squared_error(SES_test['Sparkling'],SES_test['predict'],squared=False)
mape_SESmodel_test = MAPE(SES_test['Sparkling'],SES_test['predict'])
# Label corrected: these are test-set metrics (the original print statement said "Training Data")
print("For Alpha =1 Simple Exponential Smoothing Model forecast on the Test Data, RMSE is %3.3f MAPE is %3.2f" %(rmse_SESmodel_test, mape_SESmodel_test))
For Alpha =1 Simple Exponential Smoothing Model forecast on the Test Data, RMSE is 1275.478 MAPE is 39.46
# fit model
# defining the various values of alpha for which we want to run the model
alpha_list = [0.1, 0.5, 0.99]
pred_SES = test.copy() # Have a copy of the test dataset
# starting a loop
for alpha_value in alpha_list:
    alpha_str = "SES" + str(alpha_value)
    mode_fit_i = model.fit(smoothing_level = alpha_value, optimized=False) # fitting the model
    pred_SES[alpha_str] = mode_fit_i.forecast(len(test['Sparkling'])) # calculating the forecasts for the test set time period
    rmse = np.sqrt(mean_squared_error(test['Sparkling'], pred_SES[alpha_str])) # calculate the RMSE for the test set
    mape = MAPE(test['Sparkling'],pred_SES[alpha_str]) # calculate the MAPE for the test set
    print("For alpha = %1.2f, RMSE is %3.4f MAPE is %3.2f" %(alpha_value, rmse, mape))
For alpha = 0.10, RMSE is 1370.4942 MAPE is 49.24
For alpha = 0.50, RMSE is 2583.5285 MAPE is 102.90
For alpha = 0.99, RMSE is 3797.5779 MAPE is 150.28
DOUBLE EXPONENTIAL OR HOLT METHOD
Holt extended simple exponential smoothing to allow forecasting of data
with a trend. It is nothing more than exponential smoothing applied to both
the level (the average value in the series) and the trend. To express this in
mathematical notation we now need three equations: one for the level, one for
the trend and one to combine the level and trend to get the expected
forecast $\hat{y}$.
The values we predicted in the above algorithms are called Level. In the
above three equations, you can notice that we have added level and trend
to generate the forecast equation.
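As a rough illustration of how level and trend combine, here is a sketch of Holt's linear method in its common textbook form, which may differ in notation from the equations the report displayed. The function name, the initialisation (first value for the level, first difference for the trend) and the sample series are assumptions.

```python
def holt_linear(y, alpha, beta, horizon):
    """Holt's linear trend method: additive trend, no seasonality."""
    # Assumption: initial level is the first value, initial trend the first difference.
    level, trend = y[0], y[1] - y[0]
    for observed in y[1:]:
        previous_level = level
        # Level equation: blend the new observation with the previous one-step forecast.
        level = alpha * observed + (1 - alpha) * (previous_level + trend)
        # Trend equation: blend the latest change in level with the previous trend.
        trend = beta * (level - previous_level) + (1 - beta) * trend
    # Forecast equation: last level plus h steps of the last trend.
    return [level + h * trend for h in range(1, horizon + 1)]


print(holt_linear([10.0, 12.0, 14.0, 16.0], alpha=0.5, beta=0.5, horizon=3))
# [18.0, 20.0, 22.0]
```

On a perfectly linear series the method simply continues the line, which is a handy sanity check before applying it to real sales data.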
For alpha = 0.18, RMSE is 17.9824 MAPE is 27.13
MULTIPLICATIVE MODEL
Rose
df_pred_opt = pd.DataFrame({'Y_hat':pred_HoltW['HoltWM'] ,'Y':test['Rose'].values})
rmse_opt = np.sqrt(mean_squared_error(df_pred_opt.Y, df_pred_opt.Y_hat))
mape_opt = MAPE(df_pred_opt.Y, df_pred_opt.Y_hat)
Sparkling
df_pred_opt = pd.DataFrame({'Y_hat':pred_HoltW['HoltWM'] ,'Y':test['Sparkling'].values})
rmse_opt = np.sqrt(mean_squared_error(df_pred_opt.Y, df_pred_opt.Y_hat))
mape_opt = MAPE(df_pred_opt.Y, df_pred_opt.Y_hat)
Statistical tests make strong assumptions about the data. They can only be used to inform the degree to which a null hypothesis can be rejected or fails to be rejected. The
result must be interpreted for a given problem to be meaningful. They can provide a quick check and confirmatory evidence that your time series is stationary or non-stationary.
The Augmented Dickey-Fuller test is a type of statistical test called a unit root test.
The intuition behind a unit root test is that it determines how strongly a time series is defined by a trend. There are a number of unit root tests, and the Augmented
Dickey-Fuller test is one of the more widely used. It uses an autoregressive model and optimizes an information criterion across multiple different lag values.
The null hypothesis of the test is that the time series can be represented by a unit root, that is, it is not stationary (it has some time-dependent structure). The alternate
hypothesis (rejecting the null hypothesis) is that the time series is stationary.
Null Hypothesis (H0): If it fails to be rejected, it suggests the time series has a unit root, meaning it is non-stationary. It has some time-dependent structure.
Alternate Hypothesis (H1): The null hypothesis is rejected; it suggests the time series does not have a unit root, meaning it is stationary. It does not have
time-dependent structure.
We interpret this result using the p-value from the test. A p-value below a threshold (such as 5% or 1%) suggests we reject the null hypothesis (stationary); otherwise a
p-value above the threshold suggests we fail to reject the null hypothesis (non-stationary).
p-value > 0.05: Fail to reject the null hypothesis (H0), the data has a unit root and is non-stationary.
p-value <= 0.05: Reject the null hypothesis (H0), the data does not have a unit root and is stationary.
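The decision rule above is simple enough to write down as a helper. The function name and the returned wording are illustrative assumptions; the p-value itself would come from an ADF test run on the series.

```python
def adf_decision(p_value, threshold=0.05):
    """Map an ADF p-value to the decision rule described above."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie between 0 and 1")
    if p_value <= threshold:
        return "reject H0: no unit root, the series looks stationary"
    return "fail to reject H0: unit root, the series looks non-stationary"


print(adf_decision(0.01))
print(adf_decision(0.30))
```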
## Test for stationarity of the series - Dickey-Fuller test
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller

def test_stationarity(timeseries):
    # Determine rolling statistics
    rolmean = timeseries.rolling(window=7).mean()
    rolstd = timeseries.rolling(window=7).std()

    # Plot rolling statistics
    orig = plt.plot(timeseries, color='blue', label='Original')
    mean = plt.plot(rolmean, color='red', label='Rolling Mean')
    std = plt.plot(rolstd, color='black', label='Rolling Std')
    plt.legend(loc='best')
    plt.title('Rolling Mean & Standard Deviation')
    plt.show(block=False)

    # Perform Dickey-Fuller test
    print('Results of Dickey-Fuller Test:')
    dftest = adfuller(timeseries, autolag='AIC')
    dfoutput = pd.Series(dftest[0:4], index=['Test Statistic', 'p-value', '#Lags Used', 'Number of Observations Used'])
    for key, value in dftest[4].items():
        dfoutput['Critical Value (%s)' % key] = value
    print(dfoutput, '\n')
Applying this interpretation to the two datasets:
p-value > 0.05: Fail to reject the null hypothesis (H0), the data has a unit root and is non-stationary. This is the case for the Rose dataset.
p-value <= 0.05: Reject the null hypothesis (H0), the data does not have a unit root and is stationary. This is the case for the Sparkling dataset.
6. Build an automated version of the ARIMA/SARIMA model in which the parameters are selected using the lowest Akaike Information Criteria (AIC) on
the training data and evaluate this model on the test data using RMSE.
ARIMA is a very popular statistical method for time series forecasting. ARIMA stands for Auto-Regressive Integrated Moving Average. ARIMA models work on the following
assumptions:
The data series is stationary, which means that the mean and variance should not vary with time. A series can be made stationary by using a log transformation or by
differencing the series.
The data provided as input must be a univariate series, since ARIMA uses the past values to predict the future values.
ARIMA has three components: AR (autoregressive term), I (differencing term) and MA (moving average term). Let us understand each of these components:
The AR term refers to the past values used for forecasting the next value. The AR term is defined by the parameter ‘p’ in ARIMA. The value of ‘p’ is determined using the
PACF plot.
The MA term defines the number of past forecast errors used to predict the future values. The parameter ‘q’ in ARIMA represents the MA term. The ACF plot is used
to identify the correct ‘q’ value.
The order of differencing specifies the number of times the differencing operation is performed on the series to make it stationary. Tests like ADF and KPSS can be used to
determine whether the series is stationary and help in identifying the ‘d’ value.
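The differencing operation behind the ‘d’ parameter can be illustrated with a short sketch. The helper name and the two toy series are assumptions chosen so the effect is easy to verify by hand.

```python
def difference(series, d=1):
    """Apply first-order differencing d times: x_t - x_{t-1}."""
    for _ in range(d):
        series = [current - previous for previous, current in zip(series, series[1:])]
    return series


linear = [3, 5, 7, 9, 11]          # steady upward trend
quadratic = [0, 1, 4, 9, 16, 25]   # trend whose slope keeps growing

print(difference(linear))          # [2, 2, 2, 2]
print(difference(quadratic, d=2))  # [2, 2, 2, 2]
```

One round of differencing flattens a linear trend, while the quadratic series needs two rounds, which corresponds to d = 1 and d = 2 respectively. Each round shortens the series by one observation.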
Auto ARIMA takes into account the AIC and BIC values generated by each candidate model.
Akaike’s information criterion (AIC) compares the quality of a set of statistical models to each other.
In order to choose the best combination of the above parameters, we’ll use a grid search. The best combination of parameters will give the lowest Akaike
information criterion (AIC) score. AIC estimates the relative quality of statistical models for a given set of data, so a lower value indicates a better trade-off between fit and complexity.
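The grid search itself can be sketched independently of any modelling library. In the sketch below, `fit_aic` is a hypothetical callable that fits one model and returns its AIC, and the AIC values in `toy_aic` are invented so the example runs without fitting anything; they are not results from this project.

```python
import itertools


def grid_search_lowest_aic(p_values, d_values, q_values, fit_aic):
    """Try every (p, d, q) combination and keep the one with the lowest AIC."""
    best_order, best_aic = None, float("inf")
    for order in itertools.product(p_values, d_values, q_values):
        try:
            aic = fit_aic(order)
        except Exception:
            continue  # some orders may fail to fit; skip them
        if aic < best_aic:
            best_order, best_aic = order, aic
    return best_order, best_aic


# Stand-in scorer with made-up AIC values (illustration only).
toy_aic = {(0, 1, 0): 310.0, (0, 1, 1): 290.2, (1, 1, 0): 295.5, (1, 1, 1): 292.8}


def toy_fit_aic(order):
    return toy_aic[order]


print(grid_search_lowest_aic(range(2), [1], range(2), toy_fit_aic))
# ((0, 1, 1), 290.2)
```

With statsmodels, `fit_aic` could be something like `lambda order: ARIMA(train, order=order).fit().aic`, using `ARIMA` from `statsmodels.tsa.arima.model`; `train` here stands for the training series.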
Let’s check diagnostic plots to visualize the performance of our model.
The Normal Q-Q plot shows that the ordered distribution of residuals
follows the distribution similar to normal distribution. Thus, our model
seems to be pretty good.
The above plot shows that our predicted values catch up to the observed
values in the dataset. Our forecasts seem to align with the ground truth
very well (for the second series they spike up through 95 to 96), as expected. RMSE is also reasonably low in our
case.
So, the final ARIMA model can be represented as SARIMAX(1, 0, 1)x(0, 0, 2) with seasonal period 5.
This is the best we can do with ARIMA, so let’s try another model to
see whether we can decrease the RMSE.
Prophet
Prophet is an open-source tool by Facebook. This procedure is used for
forecasting time series data based on an additive model where non-linear
trends are fit with yearly, weekly, and daily seasonality, plus holiday effects.
The first plot shows that the total sales on a weekly basis are increasing. The second
plot shows the holiday gaps in the dataset and the third plot shows that the store
sees very high sales in the last week of December (because of the Christmas
holidays).
Rose: In this case SARIMA performed the best.
Sparkling: In this case ARIMA performed the best.
7. Build ARIMA/SARIMA models based on the cut-off points of ACF and PACF on the training data and evaluate this model on the test data using RMSE.
The seasonal part of an AR or MA model will be seen in the seasonal lags of the PACF and ACF. For example, an ARIMA(0,0,0)(0,0,1)$_{12}$ model will
show:
In considering the appropriate seasonal orders for a seasonal ARIMA model, restrict attention to the seasonal lags. The modelling procedure is almost
the same as for non-seasonal data, except that we need to select seasonal AR and MA terms as well as the non-seasonal components of the model.
To determine a proper model for a given time series, it is necessary to carry out the ACF and PACF analysis. These statistical measures reflect how
the observations in a time series are related to each other. For modelling and forecasting purposes it is often useful to plot the ACF and PACF against
consecutive time lags. These plots help in determining the order of the AR and MA terms. Below we give their mathematical definitions: For a time series
$\{x(t),\ t = 0, 1, 2, \ldots\}$ the autocovariance [21, 23] at lag $k$ is defined as: $\gamma_k = \mathrm{Cov}(x_t, x_{t+k}) = E[(x_t - \mu)(x_{t+k} - \mu)]$
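A sample version of this definition can be computed directly. The sketch below divides by the full series length n, which is one common convention for the sample estimate and is an assumption here; the function names and the toy series are illustrative.

```python
import math


def autocovariance(x, k):
    """Sample autocovariance at lag k, dividing by the full length n."""
    n = len(x)
    mu = sum(x) / n
    return sum((x[t] - mu) * (x[t + k] - mu) for t in range(n - k)) / n


def autocorrelation(x, k):
    """Sample autocorrelation: autocovariance at lag k scaled by the lag-0 value."""
    return autocovariance(x, k) / autocovariance(x, 0)


x = [1.0, 2.0, 3.0, 4.0, 5.0]  # illustrative numbers
print(autocovariance(x, 0))    # 2.0
print(autocovariance(x, 1))    # 0.8
print(autocorrelation(x, 1))   # 0.4
```

Plotting `autocorrelation(x, k)` against k gives the ACF plot that is read to choose the MA order.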
8. Build a table (create a data frame) with all the models built along with their corresponding parameters and the respective RMSE values on the test
data.
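One possible shape for such a table is sketched below. The Test RMSE values are the Sparkling figures printed earlier in this document; the `Parameters` column, the row labels and the sorting step are illustrative assumptions about the layout.

```python
import pandas as pd

# Test RMSE values for the Sparkling series, as printed earlier in this document.
results = pd.DataFrame(
    {
        "Parameters": ["-", "-", "-", "alpha=0.10"],
        "Test RMSE": [1275.073, 1327.156, 1275.478, 1370.4942],
    },
    index=["RegressionOnTime", "NaiveModel", "SimpleAverageModel", "SES"],
)

# Sort so the best-performing model (lowest RMSE) appears first.
results_sorted = results.sort_values("Test RMSE")
print(results_sorted)
```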