
Business Analytics Report

Submitted to:
Concerned faculty at
Great Learning
The University of Texas at Austin

Submitted By:

Mr. Charit Sharma


PGPDSBA Online July 2021
Subject: - Time Series Forecasting – ROSE DATA SET

1 Time Series Forecasting Project Report by Charit Sharma


Contents
Problem Statement
For this assignment, sales data for different types of wine in the 20th century are to be
analyzed. The datasets come from the same company but cover different wines. As an analyst at
ABC Estate Wines, you are tasked with analyzing and forecasting wine sales in the 20th century.

1. Read the data as an appropriate Time Series data and plot the data.
2. Perform appropriate Exploratory Data Analysis to understand the data and also perform
decomposition.
3. Split the data into training and test. The test data should start in 1991.
4. Build all the exponential smoothing models on the training data and evaluate the model using
RMSE on the test data. Other additional models such as regression, naïve forecast models,
simple average models, moving average models should also be built on the training data and
check the performance on the test data using RMSE.
5. Check for the stationarity of the data on which the model is being built on using appropriate
statistical tests and also mention the hypothesis for the statistical test. If the data is found to
be non-stationary, take appropriate steps to make it stationary. Check the new data for
stationarity and comment. Note: Stationarity should be checked at alpha = 0.05.
6. Build an automated version of the ARIMA/SARIMA model in which the parameters are
selected using the lowest Akaike Information Criteria (AIC) on the training data and evaluate
this model on the test data using RMSE.
7. Build ARIMA/SARIMA models based on the cut-off points of ACF and PACF on the training data
and evaluate this model on the test data using RMSE.
8. Build a table with all the models built along with their corresponding parameters and the
respective RMSE values on the test data.
9. Based on the model-building exercise, build the most optimum model(s) on the complete data
and predict 12 months into the future with appropriate confidence intervals/bands.
10. Comment on the model thus built and report your findings and suggest the measures that the
company should be taking for future sales.
1. Read the data as an appropriate Time Series data and plot the data.

Import the necessary libraries

import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt
from matplotlib.pylab import rcParams
rcParams['figure.figsize'] = 13, 6

Afterwards, read the dataset using the read command in pandas

Then apply the head command to view the first 5 rows

   YearMonth   Rose
0    1980-01  112.0
1    1980-02  118.0
2    1980-03  129.0
3    1980-04   99.0
4    1980-05  116.0
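
A minimal sketch of reading the data as a time series, assuming a CSV with YearMonth and Rose columns as in the head output above. The inline text stands in for the actual file, whose name is not given in this report, and df_sample is an illustrative name.

```python
import io
import pandas as pd

# Inline stand-in for the CSV file (the five rows shown in the head output).
csv_text = """YearMonth,Rose
1980-01,112.0
1980-02,118.0
1980-03,129.0
1980-04,99.0
1980-05,116.0
"""

df_sample = pd.read_csv(io.StringIO(csv_text))
# Convert the 'YYYY-MM' strings to timestamps and use them as the index.
df_sample['YearMonth'] = pd.to_datetime(df_sample['YearMonth'], format='%Y-%m')
df_sample = df_sample.set_index('YearMonth')
print(df_sample.head())
```

With a DatetimeIndex in place, year and month attributes become available for the grouping and splitting steps that follow.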

Afterwards apply the tail command to check the last 5 rows

Then apply the describe command to check the basic statistical details like
percentile, mean, std etc. of a data frame or a series of numeric values.



Now we will plot the series.

The graph shows a downward trend: in 1981 the value was well above 250,
while in 1995 it was only marginally above 50, close to 60.

2. Perform appropriate Exploratory Data Analysis to understand the data and
also perform decomposition.
We will apply the transpose command to view the summary statistics in one table

Now we apply the isnull function to check for missing or null values
in the dataset
Now we will apply the shape command to check the size and dimensions of the
dataframe

(187, 1)

Then we will apply df.info() to get further information about the dataframe

Now we will construct a boxplot; the x label will be the Rose wine sales and the
y label the years

Afterwards we will construct a boxplot to check the monthly trend

plt.xlabel('Monthly Rose Wine Sale');


plt.ylabel('Months');



Now we will construct a monthly plot of Rose wine production using statsmodels.
month_plot(df,ylabel='Rose Wine Production',ax=ax)
plt.grid();

Now we will construct a pivot table of monthly Rose wine sales across years.
Then we will construct a plot to show the monthly sales across the years.

Now we will check the sum of the observations for each year
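
The pivot step can be sketched as follows. The values are placeholders (0 to 23), not the real sales figures, and the names sales and monthly_by_year are illustrative.

```python
import numpy as np
import pandas as pd

# Placeholder monthly values for 1980-1981; not the real sales figures.
idx = pd.date_range('1980-01-01', periods=24, freq='MS')
sales = pd.DataFrame({'Rose': np.arange(24, dtype=float)}, index=idx)

# Add explicit year and month columns, then pivot:
# one row per year, one column per calendar month.
sales = sales.assign(year=sales.index.year, month=sales.index.month)
monthly_by_year = pd.pivot_table(sales, values='Rose', index='year', columns='month')
print(monthly_by_year)
```

Each row of the resulting table can then be plotted as one line to compare the same month across years.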



Now we will compute the yearly mean (df_yearly_mean) and apply the head
command to view the first 5 rows

Then we will check the 'Mean of the Observations of each year'


Now we will check the Quarterly mean

Now we will construct the df_quarterly_sum.plot();
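
A small sketch of the yearly and quarterly aggregations on placeholder data. It groups by attributes of the index rather than using resample, an alternative chosen here because the spelling of frequency aliases differs between pandas versions (for example 'Y' and 'Q' versus 'YE' and 'QE' in newer releases).

```python
import numpy as np
import pandas as pd

# Placeholder monthly values 1..24 for 1980-1981; not the real sales figures.
idx = pd.date_range('1980-01-01', periods=24, freq='MS')
series = pd.Series(np.arange(1, 25, dtype=float), index=idx)

# Sum of the observations of each year.
yearly_sum = series.groupby(series.index.year).sum()

# Mean of the observations of each (year, quarter) pair.
quarterly_mean = series.groupby([series.index.year, series.index.quarter]).mean()

print(yearly_sum)
print(quarterly_mean.head())
```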

df_daily_sum = df.resample('D').sum()

Now we checked the monthly info



df_daily_sum.plot()

Now we checked the Daily yearly trends

df_decade_sum = df.resample('10Y').sum()

Now we checked the decade info


Now we have constructed the plot to show the trend

df_decade_sum.plot();

Now we will import ECDF from statsmodels

# statistics
from statsmodels.distributions.empirical_distribution import ECDF

cdf = ECDF(df['Rose'])
plt.plot(cdf.x, cdf.y, label = "statmodels");

The x label of the plot shows the sales of Rose wine



# group by date and get average RetailSales, and percent change

df.loc['1994']
# the keyword is 'method' (not 'methods'); spline interpolation relies on SciPy
df.interpolate(method='spline',order=3,inplace=True)

df.loc['1994']

Now we will check the missing values in the dataframe
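
A minimal sketch of filling gaps and re-checking for missing values. It uses linear interpolation on placeholder values rather than the order-3 spline used in this report, so that it runs without SciPy; gappy and filled are illustrative names.

```python
import numpy as np
import pandas as pd

# Placeholder series with two consecutive missing months.
idx = pd.date_range('1994-06-01', periods=4, freq='MS')
gappy = pd.Series([10.0, np.nan, np.nan, 40.0], index=idx)

print(gappy.isnull().sum())      # number of missing values before filling

# Linear interpolation treats the observations as equally spaced.
filled = gappy.interpolate(method='linear')

print(filled.isnull().sum())     # number of missing values after filling
```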



Now we will import seasonal_decompose from statsmodels

from statsmodels.tsa.seasonal import seasonal_decompose
decomposition = seasonal_decompose(df['Rose'],model='additive')

This shows the trend, seasonal and residual graphs.

trend = decomposition.trend
seasonality = decomposition.seasonal
residual = decomposition.resid
deaseasonalized_ts = trend + residual

decomposition = seasonal_decompose(df['Rose'],model='multiplicative')



trend = decomposition.trend
seasonality = decomposition.seasonal
residual = decomposition.resid

# in a multiplicative model the components multiply rather than add
deaseasonalized_ts = trend * residual
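
How the components combine in the two decomposition models can be sketched with synthetic components. All values below are made up for illustration, and the arrays are built by hand rather than taken from seasonal_decompose.

```python
import numpy as np

months = np.arange(24)
trend = 100.0 - 1.5 * months                       # made-up downward trend

# Additive model: observed = trend + seasonal + residual
seasonal_add = 10.0 * np.sin(2 * np.pi * months / 12)
resid_add = np.where(months % 2 == 0, 1.0, -1.0)
observed_add = trend + seasonal_add + resid_add
deseason_add = observed_add - seasonal_add         # equals trend + residual

# Multiplicative model: observed = trend * seasonal * residual
seasonal_mul = 1.0 + 0.2 * np.sin(2 * np.pi * months / 12)
resid_mul = np.where(months % 2 == 0, 1.01, 0.99)
observed_mul = trend * seasonal_mul * resid_mul
deseason_mul = observed_mul / seasonal_mul         # equals trend * residual
```

Removing seasonality means subtracting the seasonal component in the additive case and dividing by it in the multiplicative case.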


"Original Time Series", "Time Series without Seasonality Component"

3. Split the data into training and test. The test data should start in 1991.

train=df[df.index.year< 1991]
test=df[df.index.year>=1991]

Then we will apply the shape command

print(train.shape)
(132, 1)

print(test.shape)
(55, 1)
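
The split can be reproduced on a placeholder frame with the same shape as the data, 187 monthly rows starting in January 1980. The values are dummy numbers and the names df_demo, train_demo and test_demo are illustrative.

```python
import numpy as np
import pandas as pd

# Placeholder frame: 187 monthly rows from 1980-01, dummy values.
idx = pd.date_range('1980-01-01', periods=187, freq='MS')
df_demo = pd.DataFrame({'Rose': np.arange(187, dtype=float)}, index=idx)

# Everything before 1991 is training data; 1991 onwards is test data.
train_demo = df_demo[df_demo.index.year < 1991]
test_demo = df_demo[df_demo.index.year >= 1991]

print(train_demo.shape)
print(test_demo.shape)
```

Eleven full years (1980 to 1990) give 132 training rows, and the remaining 55 months form the test set.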

We will then print first few rows of training data and last few rows of training
data.



Now we will do the same for the test data: print its first few and last few rows

Now we will plot the test and train data


train['Rose'].plot(figsize=(13,5), fontsize=14)
test['Rose'].plot(figsize=(13,5), fontsize=14)

4. Build various exponential smoothing models on the training data and evaluate the model
using RMSE on the test data. Other models such as regression, naïve forecast models and
simple average models should also be built on the training data, and their
performance checked on the test data using RMSE.

train_time = [i+1 for i in range(len(train))]
# the test time index continues where the training index ends
test_time = [i+len(train)+1 for i in range(len(test))]

Model 1: Linear Regression

LinearRegression_train = train.copy()
LinearRegression_test = test.copy()

Now we will import the sklearn linear model

from sklearn.linear_model import LinearRegression
lr = LinearRegression()
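
Regression on time fits a straight line to the training values against a running time index and extends that line over the test period. The sketch below uses numpy.polyfit instead of sklearn's LinearRegression and a placeholder series that is exactly linear, so it only illustrates the mechanics.

```python
import numpy as np

n_train, n_test = 132, 55

# Running time index: 1..132 for training, 133..187 for test.
train_time = np.arange(1, n_train + 1)
test_time = np.arange(n_train + 1, n_train + n_test + 1)

# Placeholder series that is exactly linear in time (not the real sales).
y_train = 200.0 - 1.0 * train_time
y_test = 200.0 - 1.0 * test_time

# Fit y = intercept + slope * time on the training part only.
slope, intercept = np.polyfit(train_time, y_train, deg=1)
pred_test = intercept + slope * test_time
```

Because the placeholder data lie on a line, the fitted line reproduces the test values; on real data the gap between the two is what the RMSE measures.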

'Regression On Time_Test Data'



from sklearn import metrics

## Test Data – RMSE

For RegressionOnTime forecast on the Test Data, RMSE is 51.433
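
For reference, RMSE is the square root of the mean of the squared forecast errors. A minimal NumPy version, with rmse as an illustrative helper name, could look like this:

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Errors of +3 and -3 give squared errors of 9 and 9, so the RMSE is 3.
print(rmse([0.0, 0.0], [3.0, -3.0]))
```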

resultsDf = pd.DataFrame({'Test RMSE': [rmse_model1_test]},index=['RegressionOnTime'])
resultsDf

                  Test RMSE
RegressionOnTime  51.433312

Model 2 : Naive Approach

NaiveModel_train = train.copy()
NaiveModel_test = test.copy()

NaiveModel_test['naive'] = np.asarray(train['Rose'])[len(np.asarray(train['Rose']))-1]
NaiveModel_test['naive'].head()

'Naive Forecast on Test Data'


## Test Data – RMSE

For Naive forecast on the Test Data, RMSE is 79.719
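
The naive approach repeats the last training observation for every test period. A small sketch: the training values are the first five rows shown earlier, while the test values are placeholders.

```python
import pandas as pd

train_sample = pd.Series([112.0, 118.0, 129.0, 99.0, 116.0])
test_sample = pd.DataFrame({'Rose': [100.0, 120.0, 110.0]})  # placeholder test values

# Every forecast equals the last observed training value.
test_sample['naive'] = train_sample.iloc[-1]
print(test_sample)
```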

resultsDf_2 = pd.DataFrame({'Test RMSE': [rmse_model2_test]},index=['NaiveModel'])

                  Test RMSE
RegressionOnTime  51.433312
NaiveModel        79.718773

Model 3: Simple Average

SimpleAverage_train = train.copy()
SimpleAverage_test = test.copy()

'Simple Average on Test Data'



## Test Data – RMSE

For Simple Average forecast on the Test Data, RMSE is 53.461
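
The simple average model forecasts every test period with the mean of the training data. A sketch using the first five values shown earlier as a stand-in for the training set and placeholder test values:

```python
import pandas as pd

train_values = pd.Series([112.0, 118.0, 129.0, 99.0, 116.0])
test_frame = pd.DataFrame({'Rose': [100.0, 120.0, 110.0]})  # placeholder test values

# Every forecast equals the training mean: (112+118+129+99+116)/5.
test_frame['mean_forecast'] = train_values.mean()
print(test_frame)
```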

resultsDf_3 = pd.DataFrame({'Test RMSE': [rmse_model3_test]},index=['SimpleAverageModel'])

Model 4: Moving Average

Average means
# Plotting on the whole data

#Creating train and test set

# Test Data - RMSE --> 2 point Trailing MA

For 2 point Moving Average Model forecast on the Training Data, RMSE is 11.529
For 4 point Moving Average Model forecast on the Training Data, RMSE is 14.451
For 6 point Moving Average Model forecast on the Training Data, RMSE is 14.566
For 9 point Moving Average Model forecast on the Training Data, RMSE is 14.728
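
A trailing moving average replaces each value with the mean of the current and the preceding observations in the window. A minimal sketch with pandas rolling on placeholder values:

```python
import pandas as pd

values = pd.Series([10.0, 20.0, 30.0, 40.0])  # placeholder data

# 2 point trailing moving average: mean of the current and previous value.
trailing_2 = values.rolling(2).mean()
print(trailing_2)
```

The first entry is missing because a full window of two observations is not yet available at that point.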



Model 5: Simple Exponential Smoothing

Exponential smoothing is generally used to make short term forecasts, but longer-term forecasts
using this technique can be quite unreliable.

from statsmodels.tsa.api import ExponentialSmoothing, SimpleExpSmoothing, Holt
import warnings
warnings.filterwarnings("ignore")

SES_train = train.copy()
SES_test = test.copy()

## Plotting on both the Training and Test data

'Alpha =0.098 Predictions for Rose Wine'


## Test Data
rmse_model5_test_1 = metrics.mean_squared_error(SES_test['Rose'],SES_test['predict'],squared=False)

For Alpha =0.098 Simple Exponential Smoothing Model forecast on the Test
Data, RMSE is 36.796
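
The recursion behind simple exponential smoothing can be written by hand as a sketch. It assumes the level is initialised with the first observation, which may differ from how statsmodels initialises it, and simple_exp_smoothing is an illustrative helper, not a library function.

```python
def simple_exp_smoothing(values, alpha):
    """Return the final smoothed level; assumes level_0 = values[0]."""
    level = values[0]
    for y in values[1:]:
        # New level = weighted mix of the new observation and the old level.
        level = alpha * y + (1 - alpha) * level
    return level

final_level = simple_exp_smoothing([10.0, 20.0, 30.0], alpha=0.5)
# The forecast is flat: every future step equals the final level.
forecast = [final_level] * 3
print(forecast)
```

A small alpha, such as the 0.098 used here, gives a slowly changing level that leans heavily on older observations.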

resultsDf_5 = pd.DataFrame({'Test RMSE': [rmse_model5_test_1]},index=['Alpha=0.098,SimpleExponentialSmoothing'])

## First we will define an empty dataframe to store our values from the loop



## Plotting on both the Training and Test data

"Alpha= 0.3 Simple Exponential Smoothing Rose Wine

'Alpha=0.3,SimpleExponentialSmoothing','Alpha=0.4,SimpleExponentialSmoothing'

Model 6: Double Exponential Smoothing - Holt's Model

Holt's Double Parameter Exponential Smoothing. This method is an extension of Brown's
method. In the Holt model a growth factor is added to the smoothing equation.
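
The growth factor can be seen in a hand-written version of Holt's recursion. This is a sketch under stated assumptions: the level starts at the first observation and the trend at the first difference, which may differ from the statsmodels defaults, and holt_forecast is an illustrative name.

```python
def holt_forecast(values, alpha, beta, steps):
    """Holt's linear method with level_0 = y_0 and trend_0 = y_1 - y_0."""
    level = values[0]
    trend = values[1] - values[0]
    for y in values[1:]:
        prev_level = level
        # The level update includes the previous trend (the growth factor).
        level = alpha * y + (1 - alpha) * (prev_level + trend)
        # The trend is a smoothed version of the change in level.
        trend = beta * (level - prev_level) + (1 - beta) * trend
    # The h-step-ahead forecast extends the last level along the last trend.
    return [level + h * trend for h in range(1, steps + 1)]

# On an exactly linear placeholder series the forecast continues the line.
print(holt_forecast([10.0, 12.0, 14.0, 16.0], alpha=0.3, beta=0.1, steps=2))
```

Unlike simple exponential smoothing, the forecast is not flat: it keeps rising or falling at the last estimated trend.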

DES_train = train.copy()
DES_test = test.copy()
## First we will define an empty dataframe to store our values from the loop

Alpha Values  Beta Values  Train RMSE  Test RMSE

for i in np.arange(0.3,1.1,0.1):
    for j in np.arange(0.1,1.1,0.1):
        model_DES_alpha_i_j = model_DES.fit(smoothing_level=i,smoothing_slope=j,optimized=False,use_brute=True)
        DES_train['predict',i,j] = model_DES_alpha_i_j.fittedvalues
        DES_test['predict',i,j] = model_DES_alpha_i_j.forecast(steps=len(test))

        rmse_model6_train = metrics.mean_squared_error(DES_train['Rose'],DES_train['predict',i,j],squared=False)
        rmse_model6_test = metrics.mean_squared_error(DES_test['Rose'],DES_test['predict',i,j],squared=False)
        resultsDf_7 = resultsDf_7.append({'Alpha Values':i,'Beta Values':j,'Train RMSE':rmse_model6_train,'Test RMSE':rmse_model6_test}, ignore_index=True)

resultsDf_7.sort_values(by=['Test RMSE']).head()
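
The grid search pattern can also be written by collecting one dictionary per parameter pair and building the frame once at the end, which avoids DataFrame.append (removed in pandas 2.0). In this sketch, score is a hypothetical stand-in for fitting a model and computing its test RMSE.

```python
import pandas as pd

def score(alpha, beta):
    """Hypothetical stand-in for 'fit the model and return the test RMSE'."""
    return (alpha - 0.3) ** 2 + (beta - 0.1) ** 2

alphas = [round(0.1 * k, 1) for k in range(3, 11)]   # 0.3 .. 1.0
betas = [round(0.1 * k, 1) for k in range(1, 11)]    # 0.1 .. 1.0

rows = []
for alpha in alphas:
    for beta in betas:
        rows.append({'Alpha Values': alpha,
                     'Beta Values': beta,
                     'Test RMSE': score(alpha, beta)})

# Build the table once, then sort so the best pair comes first.
grid_results = pd.DataFrame(rows).sort_values(by='Test RMSE')
print(grid_results.head())
```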



## Plotting on both the Training and Test data
'Alpha=0.3,Beta=0.3,DoubleExponentialSmoothing predictions on Test Set'

resultsDf_7_1 = pd.DataFrame({'Test RMSE': [resultsDf_7['Test RMSE'][0]]},
                             index=['Alpha=0.3,Beta=0.1,DoubleExponentialSmoothing'])

Model 7: Triple Exponential Smoothing - Holt Winter's Model


TripleExponentialSmoothing predictions on Test Set



5. Check for the stationarity of the data on which the model is being built on using
appropriate statistical tests and also mention the hypothesis for the statistical test. If the
data is found to be non-stationary, take appropriate steps to make it stationary. Check the
new data for stationarity and comment. Note: Stationarity should be checked at alpha =
0.05.

## Test for stationarity of the series - Dickey-Fuller test
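
For the Dickey-Fuller test, the null hypothesis is that the series has a unit root (it is non-stationary) and the alternative is that it is stationary; the null is rejected when the p-value is below 0.05. When it is not rejected, first-order differencing is the usual remedy. The sketch below shows only the differencing step on a placeholder trending series and does not run the test itself.

```python
import numpy as np
import pandas as pd

# Placeholder series with a steady downward trend (not the real sales).
trending = pd.Series(100.0 - 2.0 * np.arange(12))

# First-order differencing: y_t - y_(t-1); the first value has no predecessor.
differenced = trending.diff().dropna()
print(differenced.tolist())
```

The linear trend disappears after differencing, leaving a constant series. The stationarity test is then applied again to the differenced data.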


6. Build an automated version of the ARIMA/SARIMA model in which the parameters are
selected using the lowest Akaike Information Criteria (AIC) on the training data and
evaluate this model on the test data using RMSE.



'Rose Wine Differenced Data Partial Autocorrelation')
## Sort the above AIC values in the ascending order to get the parameters for the minimum AIC value

resultsDf_9 = pd.DataFrame({'Test RMSE': [rmse]},
                           index=['ARIMA(0,1,2)'])
resultsDf = pd.concat([resultsDf, resultsDf_9])
resultsDf



Now we will create a loop for pdq values

import statsmodels.api as sm
for param in pdq:
    for param_seasonal in model_pdq:
        SARIMA_model = sm.tsa.statespace.SARIMAX(train['Rose'].values,
                                                 order=param,
                                                 seasonal_order=param_seasonal,
                                                 enforce_stationarity=False,
                                                 enforce_invertibility=False)
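
The parameter lists that such a loop iterates over can be generated with itertools.product. The ranges below are assumptions for illustration; the seasonal period of 6 is taken from the SARIMA(1,1,2)(2,0,2,6) label used in this report, and seasonal_pdq is an illustrative name.

```python
import itertools

# Assumed search ranges: p and q in {0, 1, 2}, a single differencing order d = 1.
p = q = range(0, 3)
d = range(1, 2)

# Non-seasonal (p, d, q) combinations.
pdq = list(itertools.product(p, d, q))

# Seasonal (P, D, Q, s) combinations with no seasonal differencing and period 6.
seasonal_pdq = [(P, D, Q, 6) for P, D, Q in itertools.product(p, range(0, 1), q)]

print(pdq)
print(seasonal_pdq)
```

Each pair of entries from the two lists defines one candidate model, and the candidate with the lowest AIC is kept.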

7. Build ARIMA/SARIMA models based on the cut-off points of ACF and PACF on the training data
and evaluate this model on the test data using RMSE.
Manual ARIMA



predicted_manual_ARIMA = results_manual_ARIMA.forecast(steps=len(test))

15.732718754522914

Manual SARIMA
A df.plot is constructed to check the trend

8. Build a table (create a data frame) with all the models built along with their corresponding
parameters and the respective RMSE values on the test data.

temp_resultsDf = pd.DataFrame({'Test RMSE': [rmse]},
                              index=['SARIMA(1,1,2)(2,0,2,6)'])
('Sorted by RMSE values on the Test Data for Rose Wine sale:','\n',)
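
Combining the per-model frames and sorting them can be sketched as follows. Only four of the test RMSE values reported earlier are used, two of them in their rounded form, so this is an excerpt rather than the full table; summary is an illustrative name.

```python
import pandas as pd

# One small frame per model, as built throughout the report.
frames = [
    pd.DataFrame({'Test RMSE': [51.433312]}, index=['RegressionOnTime']),
    pd.DataFrame({'Test RMSE': [79.718773]}, index=['NaiveModel']),
    pd.DataFrame({'Test RMSE': [53.461]}, index=['SimpleAverageModel']),
    pd.DataFrame({'Test RMSE': [36.796]},
                 index=['Alpha=0.098,SimpleExponentialSmoothing']),
]

# Stack the frames and sort so the best-performing model is listed first.
summary = pd.concat(frames).sort_values(by='Test RMSE')
print(summary)
```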



9. Based on the model-building exercise, build the most optimum model(s) on the complete
data and predict 12 months into the future with appropriate confidence intervals/bands.

Building the most optimum model on the Full Data.

'Plot of Exponential Smoothing Actual Values for Rose Wine Productions'


fullmodel = ExponentialSmoothing(df,trend='additive',
seasonal='multiplicative').fit(smoothing_level=0.3, smoothing_slope=0.4,
smoothing_seasonal=0.3)

RMSE: 20.672560612957582

# Getting the predictions for the same number of time stamps that are present in the test data
prediction = fullmodel.forecast(steps=len(test))

#In the below code, we have calculated the upper and lower confidence bands at 95% confidence level
#The percentile function under numpy lets us calculate these and adding and subtracting from the predictions



#gives us the necessary confidence bands for the predictions

# plot the forecast along with the confidence band
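
One common way to build such a band is sketched here. It is an assumption, not necessarily the calculation used in this report: a margin of 1.96 times the standard deviation of the in-sample residuals is added to and subtracted from each point forecast. The residuals and forecasts are placeholders.

```python
import numpy as np
import pandas as pd

residuals = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])  # placeholder in-sample residuals
prediction = pd.Series([60.0, 58.0, 61.0])         # placeholder point forecasts

# 1.96 is the two-sided 95% quantile of the standard normal distribution.
margin = 1.96 * np.std(residuals, ddof=1)

pred_df = pd.DataFrame({'lower_CI': prediction - margin,
                        'prediction': prediction,
                        'upper_CI': prediction + margin})
print(pred_df)
```

This gives a band of constant width around the forecast, which treats the residuals as roughly normal and does not widen with the forecast horizon.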
