Revathy Prabhakaran
EXECUTIVE SUMMARY
• Problem/data: Predict future sales of Rossmann stores using the available past sales data.
• Methods and models: Time series analysis using the following models:
• 1. Holt-Winters
• 2. ARIMA
• 3. SARIMA
• 4. FB Prophet
• Results/errors of these models: The performance of each forecasting method was
measured using the root mean squared error (RMSE).
• Findings: Based on the above models, we found SARIMA to provide the best forecast for
our weekly resampled dataset of the Rossmann stores.
Exploratory Data Analysis
Data Preparation & Pre-processing
Correlation Analysis
We can see a strong positive correlation between Sales and the number of
Customers visiting the store. We can also observe a positive correlation
between a running promotion (Promo = 1) and the number of customers.
Sales are highly correlated with the Customers and Open features and
moderately correlated with Promo.
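As a rough illustration of what the correlation figures above measure, the Pearson coefficient can be sketched in a few lines of plain Python. The function name and the sample values are made up for illustration and are not taken from the Rossmann data; the original analysis most likely used a library routine instead.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical daily values for one store (assumption, not real data).
customers = [500, 620, 580, 710, 0, 450]
sales = [4800, 6100, 5500, 7200, 0, 4300]
r_sales_customers = pearson(sales, customers)
```

A value close to 1 corresponds to the "strong positive correlation" described above; a value near 0 would indicate no linear relationship.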
Sales trend over the months
* For 2013, the trends show that sales tend to spike
in November and December, so there is a seasonality
factor present in the data.
* For 2014, we see the same seasonal trend for
November and December, which again points to a
seasonality factor in the data.
* Sales due to promotions peak during March and June.
For 2015 we see the same trend as in 2013 and 2014.
Sales trend over days
We resample the entire dataset to the daily level;
the observations are below.
There are 172,871 observations where the stores were closed or had zero sales.
We can drop these rows for data analysis but keep them for predictive
modelling, because our models will be able to learn the pattern behind them.
There are 1,115 stores in total, with the Sales feature having a volatility of 3849.93.
[Figure: three panels broken down by store type (a, b, c, d)]
From the above table, we can see that stores of type 'a' and 'd' have
the highest total sales, and stores of type 'a' and 'd' also have the
highest sales per customer.
Model Development & Validations
Forecasting a Time Series
We tried the following four modelling approaches:
Holt-Winters
ARIMA
SARIMA
FB Prophet
Evaluation Metrics
• There are two popular metrics for measuring the performance of regression (continuous variable)
models: MAE and RMSE.
• MAE (Mean Absolute Error): the average of the absolute differences between the predicted values and
the observed values.
• RMSE (Root Mean Square Error): the square root of the average of the squared differences between the
predicted values and the observed values.
• MAE is easier to understand and interpret, but RMSE penalises large errors more heavily, so it works
well in situations where large errors are undesirable.
• So, we choose RMSE as the metric to measure the performance of our models.
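The difference between the two metrics can be sketched with a small example. The helper names and the numbers below are illustrative assumptions, not results from the Rossmann models: both forecasts have the same MAE, but the one with a single large miss has a higher RMSE.

```python
import math

def mae(actual, predicted):
    """Mean absolute error: average of |actual - predicted|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error: sqrt of the average squared difference."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))

# Made-up values for illustration only.
actual = [100, 100, 100, 100]
even_errors = [110, 90, 110, 90]       # every forecast is off by 10
one_big_error = [100, 100, 100, 140]   # a single forecast is off by 40

mae_even, mae_big = mae(actual, even_errors), mae(actual, one_big_error)
rmse_even, rmse_big = rmse(actual, even_errors), rmse(actual, one_big_error)
```

Here both MAE values are 10, while RMSE is 10 for the evenly spread errors and 20 for the single large error, which is why RMSE is the stricter choice when big misses are costly.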
Model 1: Holt-Winters
Autocorrelation plots
From the daily data, we see a strong correlation between the sales
on a specific day and the 7-day lagged version of the series.
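To make the lag-7 observation concrete, here is a minimal sketch of the sample autocorrelation at a given lag, applied to a synthetic series with a repeating weekly pattern. The function and the data are assumptions for illustration; the plots in the original analysis were presumably produced with a plotting library.

```python
def autocorr(series, lag):
    """Sample autocorrelation of a series at the given positive lag."""
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(series) - 1")
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    num = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return num / denom

# Synthetic daily sales: one hypothetical weekly pattern repeated for 8 weeks
# (the first day of each week is a closed day with zero sales).
week = [0, 5, 5, 5, 6, 7, 9]
daily = week * 8

acf_lag7 = autocorr(daily, 7)
acf_lag3 = autocorr(daily, 3)
```

On this synthetic series the lag-7 value is much higher than the lag-3 value, which is the same kind of weekly signal the autocorrelation plots show for the daily data.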
• This algorithm simply conducts an exhaustive search (a grid search) over all combinations of parameters.
The best combination is chosen according to a loss function of our choice. In our
case, we use the popular Akaike Information Criterion (AIC), as is standard in the ARMA
modelling process.
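The exhaustive search described above might look roughly like the sketch below. The names `grid_search` and `sarima_aic` are hypothetical helpers, and the use of statsmodels' `SARIMAX` class as the scoring model is an assumption; the source does not say which library or parameter ranges were used.

```python
import itertools
import math

def grid_search(series, param_grid, score):
    """Try every combination in param_grid and keep the lowest score.

    param_grid maps a parameter name to an iterable of candidate values;
    score(series, **params) returns the loss to minimise (e.g. AIC).
    """
    names = list(param_grid)
    best_params, best_score = None, math.inf
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        try:
            value = score(series, **params)
        except ValueError:
            continue  # skip combinations the model rejects
        if value < best_score:
            best_params, best_score = params, value
    return best_params, best_score

def sarima_aic(series, p, d, q, P=0, D=0, Q=0, s=0):
    """AIC of a fitted (S)ARIMA model; assumes statsmodels is installed."""
    from statsmodels.tsa.statespace.sarimax import SARIMAX
    model = SARIMAX(series, order=(p, d, q), seasonal_order=(P, D, Q, s))
    return model.fit(disp=False).aic
```

A call such as `grid_search(weekly_sales, {"p": range(3), "d": range(2), "q": range(3)}, sarima_aic)` would then return the order with the lowest AIC, where `weekly_sales` stands in for the weekly resampled series. Note that the search cost grows multiplicatively with each added parameter range.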
Model 2: ARIMA
Checking diagnostic plots: SARIMA
Model 4: FB Prophet
Model Evaluation
• Holt-Winters
• ARIMA
• FB Prophet
Business recommendations and next steps
Recommendation:
The Rossmann data set is better suited to weekly forecasting, so instead of forecasting on a daily
basis the business should look at weekly forecasts, which is what we have done in all our
models. The business should also favor oversupplying inventory over undersupplying, as
drug store products tend to have a long use life and can be sold at a later date. In the case of
an undersupply of products, a missed sales opportunity is certain, as drug products
tend to be urgent and necessary for customers.
Next steps:
• We should use a SARIMAX model, which is expected to do better than SARIMA. In SARIMAX, we
pass exogenous variables that have an impact on sales, such as those we discovered during EDA.
• Fit the Prophet model better. We could customise the Prophet model further to improve
its predictions, e.g. by passing school holiday and state holiday data, as Prophet is
known to make good use of holiday data. We can also use automatic changepoint
detection to improve it.
• We did our analysis on the weekly mean of the sales data. We can also use the sales data for each
store type and run different models to make better predictions for each type of store. Other
hierarchical forecasting models can also be used.
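As a starting point for the per-store-type idea in the last bullet, the weekly mean could be computed separately for each store type. The sketch below uses only the standard library and made-up rows; the function name and the row layout are assumptions, and in pandas the usual route would be a group-by on store type combined with something like `resample("W").mean()`.

```python
from collections import defaultdict
from datetime import date

def weekly_mean_by_type(rows):
    """Average sales per (store_type, ISO year, ISO week).

    rows is an iterable of (day, store_type, sales) tuples.
    """
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum of sales, count]
    for day, store_type, sales in rows:
        iso = day.isocalendar()
        key = (store_type, iso[0], iso[1])
        totals[key][0] += sales
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}

# Hypothetical rows for illustration only.
rows = [
    (date(2015, 1, 5), "a", 100),   # Monday, ISO week 2 of 2015
    (date(2015, 1, 6), "a", 200),
    (date(2015, 1, 12), "a", 400),  # the following Monday, ISO week 3
    (date(2015, 1, 5), "d", 50),
]
weekly = weekly_mean_by_type(rows)
```

Each store type then has its own weekly series, which can be passed to a separate model.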
THANK YOU