
2.11.2020

Demand Forecasting by Artificial Neural Networks (ANNs)

PREPARED BY DR. SULE ITIR SATOGLU

Why Forecast with ANNs?


Sales forecasting with high accuracy has become a basic requirement in many industries.
However, market conditions and the factors that affect demand vary among industries.
In some industries with a relatively stable demand pattern, traditional short-term forecasting
methods such as moving averages, exponential smoothing (e.g., the Holt and Holt-Winters methods),
regression analysis, or other time-series analysis techniques can be employed.
But in industries/sectors with a huge variety of products, where many external factors
affect demand, these traditional methods are not enough to represent demand
behavior.
ANNs may be a good method for forecasting product demand in these sectors.


What is ANN?
An ANN, inspired by the human brain, is a numerical model developed in a way similar to the
biological nervous system.
Although the various ANN models differ in several aspects, the main characteristics of
ANNs are non-linearity, learning, and flexibility.
There are single-layer as well as multilayer ANN models (MLP: multi-layer perceptron).
Relations between input and output variables are often non-linear; an ANN model can represent
this non-linearity, which allows complex problems to be solved.
Learning is the ability to make inferences from data, inspired by the human brain. This is what
makes ANNs stand out from classical algorithms.
ANNs can adapt to changes that may occur in the modeled system; therefore, they are flexible.


Neural Network Types


1. Artificial Neural Networks (ANN) for regression and classification
2. Convolutional Neural Networks (CNN) for computer vision
3. Recurrent Neural Networks (RNN) for time-series analysis


Neuron
Biological neurons are the fundamental units of the brain and nervous system.
These cells receive sensory input from the external world via dendrites, process it, and give the
output through axons.

https://towardsdatascience.com/introduction-to-artificial-neural-networks-ann-1aea15775ef9


Perceptron
A single-layer neural network that gives a single output is called a
perceptron.
Each input is multiplied by a connection weight (synapse).
A weight shows the strength of a specific connection.
The weighted sum is computed:
$$\sum_k x_k w_k$$
Then, the activation function is applied:
$$f\left(\sum_k x_k w_k\right)$$
The activation function decides whether a neuron should be activated or not,
based on the weighted sum with a bias further added to it.
Its purpose is to introduce non-linearity into the output of a neuron.

(Wikipedia)

Example
The price of a house may be affected by its area (square footage), number of bedrooms,
distance to the city center, and age.
Figure: input layer X1 = Area, X2 = # of bedrooms, X3 = Distance to city center, X4 = Age,
connected through weights w1, w2, w3, w4 to the output layer y = Price.
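The single-layer computation for the house example can be sketched in Python as below. It is a minimal sketch: the input values and the weights w1 to w4 are hypothetical numbers chosen for illustration (not fitted to any data), and the helper name `perceptron` is introduced here.

```python
import numpy as np

def perceptron(x, w, b=0.0, f=lambda s: s):
    """Single-layer perceptron: weighted sum of the inputs (plus bias), then activation f."""
    s = np.dot(x, w) + b  # sum_k x_k * w_k (+ bias)
    return f(s)

# Hypothetical inputs for one house, already scaled to [0, 1]:
# x1 = area, x2 = number of bedrooms, x3 = distance to city center, x4 = age
x = np.array([0.8, 0.5, 0.3, 0.2])
# Hypothetical weights w1..w4: distance and age are assumed to lower the price (negative weights)
w = np.array([0.6, 0.3, -0.4, -0.2])

price_score = perceptron(x, w)                                  # linear activation
activated = perceptron(x, w, f=lambda s: 1 / (1 + np.exp(-s)))  # sigmoid activation
print(price_score, activated)
```

With a linear activation the output is just the weighted sum (here 0.47); the sigmoid squashes the same sum into the interval (0, 1).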


Multi-Layer Perceptron
ANNs may have one or more hidden layers, and thus hidden neurons.
Such a network is called a multi-layer perceptron.

(Kubat, 2017)


Configuration of multilayer perceptron


$$y_j = \sum_i w_{ij} x_i + b_j \qquad (1)$$
$x_i$ = input value of the neuron
$w_{ij}$ = weight of the connection between neurons i and j
$b_j$ = bias value for the jth neuron
$y_j$ = net output value of the jth neuron
The functions that convert input values are called activation functions.
Activation functions used in neural networks include the sigmoid, hyperbolic tangent,
and linear functions, shown in Equations (2), (3), and (4), respectively.
$$f(x) = \frac{1}{1 + e^{-x}} \qquad (2)$$
$$f(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \qquad (3)$$
$$f(x) = x \qquad (4)$$
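Equation (1) and the activation functions (2) to (4) can be written directly with NumPy. This is an illustrative sketch: the inputs, weights, and biases are hypothetical, and `layer_output` is a helper name introduced here, not part of any library.

```python
import numpy as np

def sigmoid(x):   # Equation (2)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):      # Equation (3), written out as in the formula
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

def linear(x):    # Equation (4)
    return x

def layer_output(x, W, b, f):
    """Net value per Equation (1), y_j = sum_i w_ij * x_i + b_j, then activation f.
    W[i, j] holds w_ij (input i -> neuron j)."""
    net = x @ W + b
    return f(net)

x = np.array([0.5, -1.0, 2.0])   # hypothetical inputs x_i
W = np.array([[0.2, -0.4],
              [0.1, 0.3],
              [-0.5, 0.6]])      # hypothetical weights w_ij (3 inputs -> 2 neurons)
b = np.array([0.1, -0.2])        # hypothetical biases b_j

print(layer_output(x, W, b, linear))   # net values of the two neurons
print(layer_output(x, W, b, sigmoid))  # same neurons with sigmoid activation
```

In practice `np.tanh` is numerically safer than Equation (3) written out, because `np.exp(x)` overflows for large inputs.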


Example
Figure: the house-price network with a hidden layer between the input layer (X1 = Area,
X2 = # of bedrooms, X3 = Distance to city center, X4 = Age) and the output layer (y = Price),
with weights w1 to w4.
Not all synapses are weighted; synapses with zero weight are not shown here.
Positive weight: shows the importance of the neuron.
Zero weight: discards the connection/synapse.
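The effect of a zero weight can be checked numerically with a small forward pass through one hidden layer. This is a sketch under stated assumptions: the weights are hypothetical, biases are omitted for brevity, the hidden neurons use a sigmoid, and the output neuron is taken to be linear.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights: rows = inputs (area, bedrooms, distance, age), columns = 2 hidden neurons.
# A zero entry means that input is not connected to that hidden neuron.
W_hidden = np.array([[0.8, 0.0],
                     [0.5, 0.0],
                     [0.0, -0.7],
                     [0.0, -0.3]])
w_out = np.array([1.2, 0.9])  # hypothetical hidden -> output (price) weights

def predict(x):
    h = sigmoid(x @ W_hidden)  # hidden-layer activations
    return h @ w_out, h        # linear output neuron

x = np.array([0.6, 0.4, 0.2, 0.9])  # hypothetical normalized inputs
price, h = predict(x)

x_newer = x.copy()
x_newer[3] = 0.1                    # change only the age input
_, h2 = predict(x_newer)
print(h[0] == h2[0])  # True: hidden neuron 1 has zero weight on age, so it ignores it
```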


How Do ANNs Work?


An ANN is trained by comparing the computed (forecasted) output with the real output.
The difference between them, measured by a cost function, is computed and fed back to the system.
For each layer of the network, the cost function is analyzed and used to adjust
the thresholds and weights for the next input.
The aim is to minimize the cost function (a non-linear function).
This process is called back-propagation.
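The compare, compute cost, and feed back loop can be illustrated on the simplest possible case: a single linear neuron trained by gradient descent on the mean squared error. This is a sketch, not the training method used in the case study later: the data are synthetic (generated from hypothetical weights 2 and -1 and bias 0.5), and the learning rate and number of epochs are assumed values.

```python
import numpy as np

# Synthetic data from y = 2*x1 - 1*x2 + 0.5 (hypothetical, noise-free)
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(50, 2))
y = X @ np.array([2.0, -1.0]) + 0.5

w = np.zeros(2)  # weights
b = 0.0          # bias (threshold)
lr = 0.2         # learning rate (assumed)

for epoch in range(5000):
    y_hat = X @ w + b                   # forward pass: forecast
    error = y_hat - y                   # compare with the real output
    cost = np.mean(error ** 2)          # MSE cost function
    grad_w = 2 * X.T @ error / len(y)   # d(cost)/d(weights)
    grad_b = 2 * error.mean()           # d(cost)/d(bias)
    w -= lr * grad_w                    # feed the error back: adjust weights
    b -= lr * grad_b                    # ... and the threshold

print(w, b, cost)
```

After training, `w` and `b` are close to the generating values 2, -1, and 0.5. In a multilayer network the same idea is applied layer by layer through back-propagation.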


Back-Propagation
Back-propagation is a widely used algorithm in training feedforward neural
networks for supervised learning.
While fitting an ANN, backpropagation computes the gradient of the loss/cost
function with respect to the weights of the network for a single input–output
example, and does so efficiently.
This efficiency makes it feasible to use gradient methods for training multilayer
networks, updating weights to minimize loss/cost.
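Below is a minimal back-propagation sketch for a single input-output example in a network with one sigmoid hidden layer and a linear output neuron, using the cost C = 1/2 (y_hat - y)^2. The layer sizes, random weights, target value, and the learning rate 0.01 are assumptions for illustration; the gradients follow from the chain rule, working from the output layer back to the input layer.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(params, x):
    W1, b1, W2, b2 = params
    h = sigmoid(W1 @ x + b1)  # hidden layer
    y_hat = W2 @ h + b2       # linear output neuron
    return h, y_hat

def loss(params, x, y):
    _, y_hat = forward(params, x)
    return 0.5 * np.sum((y_hat - y) ** 2)  # cost for one example

def backprop(params, x, y):
    """Gradients of the cost w.r.t. all weights and biases for one input-output example."""
    W1, b1, W2, b2 = params
    h, y_hat = forward(params, x)
    delta_out = y_hat - y                             # dC/d(y_hat)
    dW2 = np.outer(delta_out, h)
    db2 = delta_out
    delta_hidden = (W2.T @ delta_out) * h * (1 - h)   # error sent back through the sigmoid
    dW1 = np.outer(delta_hidden, x)
    db1 = delta_hidden
    return dW1, db1, dW2, db2

rng = np.random.default_rng(1)
params = [rng.normal(size=(3, 4)), rng.normal(size=3),  # 4 inputs -> 3 hidden neurons
          rng.normal(size=(1, 3)), rng.normal(size=1)]  # 3 hidden neurons -> 1 output
x = rng.uniform(size=4)
y = np.array([0.7])

grads = backprop(params, x, y)
updated = [p - 0.01 * g for p, g in zip(params, grads)]  # one gradient-descent step
print(loss(params, x, y), loss(updated, x, y))           # cost before and after the step
```

A common sanity check is to compare each back-propagated gradient with a finite-difference estimate of the cost.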


Stages of Building an ANN for Forecasting


Stages

1. Data Cleanup
2. Feature Selection
3. Choose Features & Target
4. Decide Hyper-parameters
5. Split Train/Test Data
6. Run ANN
7. Obtain Results


Data Cleanup: Eyeballing the Data


It is useful to take a quick look at the data to get a feel for it.
In Python, a pandas DataFrame is helpful.
Most of the time spent on data analysis goes into data cleanup.
Remove extremely unlikely values.
Remove outliers from the data so that they do not bias the results.
Then transform the data using one-hot encoding.
◦ One-hot encoding, available as functions in Python libraries, codes each value of a
categorical variable as a separate binary variable.
◦ Example: a store variable with the values Istinyepark and Kanyon becomes two binary features, one for each store.
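A quick look and a basic cleanup can be sketched with pandas. The data frame below is hypothetical, and the 1.5 x IQR rule for flagging outliers is one common convention assumed here, not a rule stated on this slide.

```python
import pandas as pd

# Hypothetical weekly sales data (illustrative values only)
df = pd.DataFrame({
    "week": [1, 2, 3, 4, 5, 6, 7, 8],
    "units_sold": [120, 135, 128, -5, 140, 131, 900, 126],
    "discount_pct": [0, 10, 0, 0, 20, 10, 0, 5],
})

print(df.head())      # first rows: get a feel for the data
print(df.describe())  # count, mean, std, min, quartiles, max per column

# 1) Remove extremely unlikely values (negative sales are impossible)
df = df[df["units_sold"] >= 0]

# 2) Remove outliers with the 1.5 * IQR rule (assumed convention)
q1, q3 = df["units_sold"].quantile([0.25, 0.75])
iqr = q3 - q1
clean = df[df["units_sold"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]
print(clean)
```

Here week 4 (negative sales) and week 7 (900 units) are dropped.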


Feature Selection
Scale the data by means of normalization.
Normalization is needed because the scales of the input variables often differ from
each other.
Alternative normalization methods can be applied.
Min-max normalization is applied as follows:
$$x_{new} = \frac{x - x_{min}}{x_{max} - x_{min}}$$
Hence, the values of the independent variables range between 0 and 1.

Raw data may contain some features with little/no predictive value.
Identify and exclude irrelevant ones.
Sometimes creating synthetic features out of two or more raw features can be helpful.
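The min-max formula above can be applied column by column with NumPy. A minimal sketch: the input values are hypothetical, and returning 0 for a constant column (where x_max = x_min) is a choice made here to avoid division by zero.

```python
import numpy as np

def min_max_normalize(X):
    """Scale each column of X to [0, 1]: (x - x_min) / (x_max - x_min)."""
    X = np.asarray(X, dtype=float)
    x_min = X.min(axis=0)
    x_max = X.max(axis=0)
    span = np.where(x_max > x_min, x_max - x_min, 1.0)  # constant columns map to 0
    return (X - x_min) / span

# Hypothetical inputs on different scales: temperature (°C) and discount (%)
X = np.array([[5.0, 0.0],
              [15.0, 10.0],
              [25.0, 40.0]])
print(min_max_normalize(X))
```

scikit-learn's `MinMaxScaler` applies the same formula. When the data are later split into training and test sets, the minimum and maximum are usually taken from the training data only, so that no information from the test set leaks into training.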


Feature Selection
Categorical variables: These represent the specific, categorical
aspects of the entities (stores, products, etc.).
Ordinal variables: Values are given on a scale from low to high (such as 1 to 5).
Continuous variables: These can take any value within a range and are
quantifiable.


One-Hot Encoding
A categorical variable can take N possible values,
so the corresponding feature has N possible values.
It is useful to encode such a feature with N values as N features, each taking a binary
value.
In Python, some scikit-learn classes transform raw datasets into this format.
Dimensionality is increased.
Ex.: We have stores named Istinyepark, Kanyon, etc., so we define a new feature for each of them.
https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html
sklearn.preprocessing.LabelBinarizer: converts each value of a categorical variable
into a feature.
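The store example can be encoded with the two scikit-learn classes mentioned above. A minimal sketch: "StoreC" is a hypothetical third store name added so that the example has more than two categories.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer, OneHotEncoder

# Store names from the slide plus a hypothetical third store "StoreC"
stores = ["Kanyon", "Istinyepark", "StoreC", "Kanyon"]

# OneHotEncoder expects a 2-D input: one row per sample, one column per categorical feature
enc = OneHotEncoder(handle_unknown="ignore")
onehot = enc.fit_transform(np.array(stores).reshape(-1, 1)).toarray()  # sparse -> dense
print(enc.categories_)  # categories are sorted: Istinyepark, Kanyon, StoreC
print(onehot)

# LabelBinarizer works directly on a 1-D list of labels
lb = LabelBinarizer()
binarized = lb.fit_transform(stores)
print(lb.classes_)
print(binarized)
```

Each store becomes its own 0/1 column. Note that with exactly two categories `LabelBinarizer` returns a single 0/1 column, while `OneHotEncoder` still returns one column per category.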


Choose Features & Target


X: Features (input variables)
y: Target (the output to be forecasted)
Define which variable(s) are independent (features) and which is dependent
(target).
Remember that X and y must have the same number of rows!
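With pandas, features and target are typically separated by column name. The column names and values below are hypothetical.

```python
import pandas as pd

# Hypothetical weekly data: three inputs and one target column
df = pd.DataFrame({
    "temperature": [12.0, 18.5, 25.1],
    "discount_pct": [0, 10, 20],
    "customers": [850, 920, 1010],
    "units_sold": [120, 135, 160],
})

target = "units_sold"
X = df.drop(columns=[target])  # features (independent variables)
y = df[target]                 # target (dependent variable)

assert len(X) == len(y), "X and y must have the same number of rows"
print(X.shape, y.shape)  # (3, 3) (3,)
```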


Decide Hyperparameters
Hyperparameters control the flow of the training or tune it,
e.g., the number of hidden neurons.
In scikit-learn, a model's hyperparameters are set before it is trained.
They come with sensible defaults,
which you can change if you want.
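For example, scikit-learn's `MLPRegressor` can be created with its defaults or with chosen hyperparameters. The overridden values below (one hidden layer of 20 neurons, logistic activation, `max_iter=2000`, `random_state=0`) are illustrative assumptions.

```python
from sklearn.neural_network import MLPRegressor

# Defaults in recent scikit-learn versions: hidden_layer_sizes=(100,), activation="relu", solver="adam"
default_model = MLPRegressor()
params = default_model.get_params()
print(params["hidden_layer_sizes"], params["activation"], params["solver"])

# Overriding defaults: one hidden layer of 20 neurons with a logistic (sigmoid) activation
model = MLPRegressor(hidden_layer_sizes=(20,), activation="logistic",
                     max_iter=2000, random_state=0)
print(model.get_params()["hidden_layer_sizes"])  # (20,)
```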


Train/Test Split
The data must be properly split into a training dataset and a test dataset.
In Python, scikit-learn provides the function
train_test_split(X, y, ...).
Its purpose is to guard against the over-fitting problem: the model is evaluated on data it has not seen during training.
Overfitting problem: Overfitting happens when a model learns the detail and noise in the
training data to the extent that it negatively impacts the performance of the model on new data.
In other words, the algorithm memorizes the data and cannot predict well when different or
unseen data appear.
So, the data must be split into representative training and test parts, so that overfitting can be detected.
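A short `train_test_split` example follows. The 80/20 ratio and the toy arrays are assumptions for illustration. For time-ordered data such as weekly sales, `shuffle=False` keeps the most recent weeks in the test set instead of drawing them at random.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 10 weekly observations with 3 features each
X = np.arange(30, dtype=float).reshape(10, 3)
y = np.arange(10, dtype=float)

# Random 80/20 split (ratio assumed)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
print(X_train.shape, X_test.shape)  # (8, 3) (2, 3)

# Chronological split: the last 20% of the weeks become the test set
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, shuffle=False)
print(y_te)  # [8. 9.]
```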


Run ANN & Obtain Results


The ANN model is built and run, and its results are obtained.
The results show the R² and MSE of the training dataset and of the test dataset.
In MATLAB, graphs are plotted for each dataset, showing the R² of each.
MSE (Mean Squared Error) is also an important error measure.
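Both measures can be computed from actual and forecasted values in a few lines; `sklearn.metrics.mean_squared_error` and `sklearn.metrics.r2_score` compute the same quantities. The values below are hypothetical.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: average of the squared forecast errors."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean((y_true - y_pred) ** 2)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

y_true = [100, 120, 140, 160]  # hypothetical actual sales
y_pred = [110, 115, 145, 150]  # hypothetical forecasts
print(mse(y_true, y_pred))  # 62.5
print(r2(y_true, y_pred))   # 0.875
```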


Coefficient of Correlation/Determination
R²: square of the coefficient of correlation (R).
This is not an error measure; it shows how much of the variance of the
dependent variable is explained by the independent variables.
ANN results frequently report R² for assessing forecasting performance.
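For a least-squares straight-line fit, the square of the correlation coefficient equals the coefficient of determination 1 - SS_res / SS_tot. The sketch below checks this numerically on hypothetical data.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])  # hypothetical observations

slope, intercept = np.polyfit(x, y, 1)  # least-squares line
y_hat = slope * x + intercept

r = np.corrcoef(x, y)[0, 1]  # Pearson correlation coefficient R
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

print(r ** 2, r_squared)  # equal up to floating-point rounding
```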


Sales Forecasting by Artificial Neural Networks for the Apparel Retail Chain Stores

Caglayan, N., Satoglu, S. I., & Kapukaya, E. N. (2019, July). Sales forecasting by artificial neural
networks for the apparel retail chain stores. In International Conference on Intelligent and Fuzzy
Systems (pp. 451-456). Springer, Cham.


Motivation
Especially in the retail and apparel industries, products are not tailor-made; they must be
produced and made available to customers in chain stores in advance.
A large variety of products, diverse customer expectations, and changing trends make
forecasting very difficult for retail products (Ren et al., 2019; Beheshti-Kashi et al., 2015).
Demand is not stable, especially in the big data era (Ren et al., 2019), and the fashion supply chain
relies primarily on quick and competent forecasting (Ren et al., 2017).
Poor forecasting causes stockouts and inefficient use of resources.
So, an ANN model needs to be developed for the apparel retail chain stores.


Purpose
The purpose of this study is to develop an ANN to forecast sales of a product family sold in
the stores of an apparel retail chain.
Past sales, sales prices, and promotion data of selected stores, as well as store type,
location, and air temperature data, are included in the ANN models.
The city with the highest number of stores was selected, and 37 of the stores in this city
were considered.
Regression analysis was also used for forecasting.
The results obtained by the ANN were much better than those of the regression analysis.


Methodology


Application
The retail chain has shopping mall stores and street stores in Istanbul.
First, the sales data of a sports shoe model in the stores at shopping centers
in Istanbul were used.
Street stores were not considered in this first model because they may differ in product models and prices.
Second, stores and concept stores were considered for both shopping malls and street
stores.
These two prediction models are named modelstores and modelgeneral,
respectively.
An artificial neural network (ANN) model was proposed to estimate the sales of the shoe model,
which can be used in all seasons.
Weekly sales data for 2014-2017 were used in the ANN model.


Application: Input Data & Pre-Processing


Data pre-processing was performed.
Weekly records with missing values were excluded.
Outliers that could cause biased or misleading results were also excluded.
Variable | Type of Variable | Data Reference
Air temperature | Numeric | Turkish State Meteorological Service
Special days | Binary | [36]
Income information | Numeric | Product Manager of Store
Percentage of discount | Numeric | Product Manager of Store
Number of customers | Numeric | Product Manager of Store
Store information | Nominal | Product Manager of Store


Application: Input Data Normalization


The min-max normalization method was preferred for all
independent numerical variables.
$$x_{normalized} = \frac{x_i - x_{min}}{x_{max} - x_{min}}$$


Application: Model Development
The ANN structure of modelstores is presented in the Figure.
At this stage, many network designs were established, trained, and evaluated;
many trials were made to find the best ANN configuration.
At the end of the experiments, the model with the least error was accepted.
The Bayesian regularization learning function and the feed-forward
back-propagation algorithm were applied.
The network structure includes two layers, twenty hidden neurons, a sigmoid
activation function, and tansig (hyperbolic tangent sigmoid) transfer functions.


Results based on MATLAB


Results


Results


Model Development for All Stores


The second ANN model (modelgeneral) was developed for all stores.


Results of the Second Model


MAPE of the ANN Model for All Stores


Comparison of ANN & Regression Results


Linear regression is not suitable for
forecasting the demand of the stores, as
the MAPE of the regression-based
forecasts is high.
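MAPE (mean absolute percentage error) averages the absolute forecast errors relative to the actual values. The sketch below compares two sets of forecasts; all numbers are hypothetical and are not the study's results. It assumes that no actual value is zero, since MAPE is undefined there.

```python
import numpy as np

def mape(actual, forecast):
    """Mean absolute percentage error in percent. Assumes no actual value is zero."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return 100.0 * np.mean(np.abs((actual - forecast) / actual))

actual = [100, 200, 50]         # hypothetical weekly sales
ann_forecast = [110, 190, 50]   # hypothetical ANN forecasts
reg_forecast = [70, 260, 80]    # hypothetical regression forecasts
print(mape(actual, ann_forecast))
print(mape(actual, reg_forecast))
```

A lower MAPE means more accurate forecasts; in this toy example the first set of forecasts has a MAPE of about 5% and the second about 40%.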
