
Forecasting

with Artificial Neural Networks

EVIC 2005 Tutorial


Santiago de Chile, 15 December 2005

Æ slides on www.neural-forecasting.com

Sven F. Crone
Centre for Forecasting
Department of Management Science
Lancaster University Management School
email: s.crone@neural-forecasting.com

EVIC’05 © Sven F. Crone - www.bis-lab.com

Lancaster University Management School?


What you can expect from this session …

ƒ Simple back propagation algorithm [Rumelhart et al. 1986]

$$E_p = C(t_{pj}, o_{pj}), \qquad o_{pj} = f_j(net_{pj}), \qquad \Delta_p w_{ji} \propto -\frac{\partial C(t_{pj}, o_{pj})}{\partial w_{ji}}$$

$$\frac{\partial C(t_{pj}, o_{pj})}{\partial w_{ji}} = \frac{\partial C(t_{pj}, o_{pj})}{\partial net_{pj}} \, \frac{\partial net_{pj}}{\partial w_{ji}}$$

$$\delta_{pj} = -\frac{\partial C(t_{pj}, o_{pj})}{\partial net_{pj}} = -\frac{\partial C(t_{pj}, o_{pj})}{\partial o_{pj}} \, \frac{\partial o_{pj}}{\partial net_{pj}}, \qquad \frac{\partial o_{pj}}{\partial net_{pj}} = f_j'(net_{pj})$$

For a unit in the output layer:

$$\delta_{pj} = -\frac{\partial C(t_{pj}, o_{pj})}{\partial o_{pj}} \, f_j'(net_{pj})$$

For a hidden unit, applying the chain rule over all successor units k:

$$\frac{\partial C(t_{pj}, o_{pj})}{\partial o_{pj}} = \sum_k \frac{\partial C(t_{pj}, o_{pj})}{\partial net_{pk}} \, \frac{\partial net_{pk}}{\partial o_{pj}} = \sum_k \frac{\partial C(t_{pj}, o_{pj})}{\partial net_{pk}} \, \frac{\partial \sum_i w_{ki} o_{pi}}{\partial o_{pj}} = \sum_k \frac{\partial C(t_{pj}, o_{pj})}{\partial net_{pk}} \, w_{kj} = -\sum_k \delta_{pk} w_{kj}$$

$$\delta_{pj} = f_j'(net_{pj}) \sum_k \delta_{pk} w_{kj}$$

Together:

$$\delta_{pj} = \begin{cases} -\dfrac{\partial C(t_{pj}, o_{pj})}{\partial o_{pj}} \, f_j'(net_{pj}) & \text{if unit } j \text{ is in the output layer} \\[2ex] f_j'(net_{pj}) \displaystyle\sum_k \delta_{pk} w_{kj} & \text{if unit } j \text{ is in a hidden layer} \end{cases}$$

Æ „How to …“ on Neural Network Forecasting with limited maths!
Æ CD-Start-Up Kit for Neural Net Forecasting
Æ 20+ software simulators
Æ datasets
Æ literature & faq
Æ slides, data & additional info on www.neural-forecasting.com
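The delta rules above can be sketched directly in NumPy. This is a minimal illustration, not the tutorial's own code: it assumes logistic activations, the squared-error cost C = ½(t − o)², and an invented toy network trained on XOR.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(net):                 # logistic activation f_j(net_pj)
    return 1.0 / (1.0 + np.exp(-net))

def f_prime(o):             # f'(net) expressed via the output o = f(net)
    return o * (1.0 - o)

# tiny illustrative network: 2 inputs -> 3 hidden -> 1 output
W1 = rng.normal(0, 0.5, (3, 2))
W2 = rng.normal(0, 0.5, (1, 3))
eta = 0.5                   # learning rate (arbitrary choice)

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
T = np.array([[0.], [1.], [1.], [0.]])       # XOR targets

def forward(x):
    h = f(W1 @ x)           # hidden activations
    o = f(W2 @ h)           # output activation
    return h, o

def sse():                  # total squared error over the patterns
    return float(sum(((t - forward(x)[1]) ** 2).sum() for x, t in zip(X, T)))

loss_before = sse()
for _ in range(5000):
    for x, t in zip(X, T):
        h, o = forward(x)
        delta_out = (t - o) * f_prime(o)           # output-layer delta
        delta_hid = f_prime(h) * (W2.T @ delta_out)  # f'(net) * sum_k delta_pk w_kj
        W2 += eta * np.outer(delta_out, h)         # Delta w ∝ delta * input
        W1 += eta * np.outer(delta_hid, x)
loss_after = sse()
print(loss_before > loss_after)   # training reduces the squared error
```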


Agenda

Forecasting with Artificial Neural Networks


1. Forecasting?
2. Neural Networks?
3. Forecasting with Neural Networks …
4. How to write a good Neural Network forecasting paper!


Agenda

Forecasting with Artificial Neural Networks

1. Forecasting?
1. Forecasting as predictive Regression
2. Time series prediction vs. causal prediction
3. Why NN for Forecasting?
2. Neural Networks?
3. Forecasting with Neural Networks …
4. How to write a good Neural Network forecasting paper!


Forecasting or Prediction?

ƒ Data Mining: „Application of data analysis algorithms & discovery algorithms that extract patterns out of the data” Æ algorithms?

Data Mining TASKS (distinguished by the timing of the dependent variable):

Descriptive Data Mining
… Categorisation: Clustering (K-means clustering, neural networks, class entropy)
… Summarisation & Visualisation (feature selection, principal component analysis)
… Association Analysis (association rules, link analysis)
… Sequence Discovery (temporal association rules)

Predictive Data Mining
… Categorisation: Classification (decision trees, logistic regression, discriminant analysis, neural networks: MLP, RBFN, GRNN)
… Regression (linear regression, nonlinear regression, neural networks)
… Time Series Analysis (exponential smoothing, (S)ARIMA(x), neural networks)


Forecasting or Classification

Scale of the dependent vs. independent variables:

dependent \ independent:  Metric scale | Ordinal scale | Nominal scale
Metric scale:   ƒ Regression, ƒ Time Series Analysis | DOWNSCALE | ƒ Analysis of Variance
Ordinal scale:  DOWNSCALE | DOWNSCALE | DOWNSCALE
Nominal scale:  ƒ Classification | DOWNSCALE | ƒ Contingency Analysis
NONE:           ƒ Principal Component Analysis | – | ƒ Clustering

A dependent (target) variable Æ supervised learning (cases × inputs with target); no dependent variable Æ unsupervised learning (cases × inputs only)

ƒ Simplification from Regression (“How much”) Æ Classification (“Will event occur”)

Æ FORECASTING = PREDICTIVE modelling (dependent variable is in the future)
Æ FORECASTING = REGRESSION modelling (dependent variable is of metric scale)


Forecasting or Classification?

ƒ What the experts say …


International Conference on Data Mining

ƒ You are welcome to contribute … www.dmin-2006.com !!!


Agenda

Forecasting with Artificial Neural Networks


1. Forecasting?
1. Forecasting as predictive Regression
2. Time series prediction vs. causal prediction
3. SARIMA-Modelling
4. Why NN for Forecasting?
2. Neural Networks?
3. Forecasting with Neural Networks …
4. How to write a good Neural Network forecasting paper!


Forecasting Models

ƒ Time series analysis vs. causal modelling

… Time series prediction (Univariate)


„ Assumes that data generating process
that creates patterns can be explained
only from previous observations of
dependent variable
… Causal prediction (Multivariate)
„ Data generating process can be explained
by interaction of causal (cause-and-effect)
independent variables


Classification of Forecasting Methods


Forecasting Methods

Objective Forecasting Methods
… Time Series Methods: Averages, Moving Averages, Naive Methods, Exponential Smoothing (Simple ES, Linear ES, Seasonal ES, Dampened Trend ES), Simple Regression, Autoregression, ARIMA, Neural Networks
… Causal Methods: Linear Regression, Multiple Regression, Dynamic Regression, Vector Autoregression, Intervention models, Neural Networks

Subjective Forecasting Methods („prophecy“, educated guessing …)
… Sales Force Composite, Analogies, Delphi, PERT, Survey techniques

Neural Networks ARE
ƒ time series methods
ƒ causal methods
& CAN be used as
ƒ Averages & ES
ƒ Regression …

Demand Planning Practice: objective methods + subjective correction


Time Series Definition

ƒ Definition
… A time series is a series of temporally ordered, comparable observations
yt recorded in equidistant time intervals
ƒ Notation
… Yt represents the observation of the t-th period, t = 1, 2, …, n


Concept of Time Series

ƒ An observed measurement is made up of a


… systematic part and a
… random part

ƒ Approach
… Unfortunately we cannot observe either of these !!!
… Forecasting methods try to isolate the systematic part
… Forecasts are based on the systematic part
… The random part determines the distribution shape

ƒ Assumption
… Data observed over time is comparable
„ The time periods are of identical lengths (check!)
„ The units they are measured in do not change (check!)
„ The definitions of what is being measured remain unchanged (check!)
„ They are correctly measured (check!)
… data errors arise from sampling, from bias in the instruments or the
responses, from transcription.


Objective Forecasting Methods – Time Series

Methods of Time Series Analysis / Forecasting


… Class of objective Methods
… based on analysis of past observations of dependent variable alone

ƒ Assumption
… there exists a cause-effect relationship, that keeps repeating itself
with the yearly calendar
… Cause-effect relationship may be treated as a BLACK BOX
… TIME-STABILITY-HYPOTHESIS ASSUMES NO CHANGE:
Æ Causal relationship remains intact indefinitely into the future!
… the time series can be explained & predicted solely from previous
observations of the series

Æ Time Series Methods consider only past patterns of same variable


Æ Future events (no occurrence in the past) are explicitly NOT considered!
Æ external EVENTS relevant to the forecast must be corrected MANUALLY


Simple Time Series Patterns


Diagram 1.1: Trend – long-term growth or decline within a series
(long-term movement in the series)

Diagram 1.2: Seasonal – more or less regular movements within a year
(regular fluctuation within a year or shorter period, superimposed on trend and cycle)

Diagram 1.3: Cycle – alternating upswings of varied length and intensity
(regular fluctuation superimposed on trend; period may be random)

Diagram 1.4: Irregular – random movements and those which reflect unusual events

Regular Components of Time Series

A Time Series consists of superimposed components / patterns:

¾ Signal
ƒ level ‘L‘
ƒ trend ‘T‘
ƒ seasonality ‘S‘
¾ Noise
ƒ irregular / error 'e'

Sales = LEVEL + SEASONALITY + TREND + RANDOM ERROR


Irregular Components of Time Series


Structural changes in the systematic data:

ƒ PULSE
… one-time occurrence
… on top of systematic stationary / trended / seasonal development
[Figure: STATIONARY time series with PULSE – original vs. corrected data values]

ƒ LEVEL SHIFT
… one-time / multiple shifts on top of systematic stationary / trended / seasonal development
[Figure: STATIONARY time series with LEVEL SHIFT – original vs. corrected data values]

ƒ STRUCTURAL BREAKS
… Trend changes (slope, direction)
… Seasonal pattern changes & shifts


Components of Time Series

ƒ Time Series

Level

Trend

Random

• Time Series Æ
decomposed into Components


Time Series Patterns

Time Series Patterns

REGULAR Time Series Patterns:
… STATIONARY Time Series: Yt = f(Et) – time series is influenced by level & random fluctuations
… SEASONAL Time Series: Yt = f(St, Et) – time series is influenced by level, season and random fluctuations
… TRENDED Time Series: Yt = f(Tt, Et) – time series is influenced by trend from level and random fluctuations

IRREGULAR Time Series Patterns:
… FLUCTUATING Time Series: time series fluctuates very strongly around level (mean deviation > ca. 50% around mean)
… INTERMITTENT Time Series: number of periods with zero sales is high (ca. 30%-40%)
… additionally: + PULSES! + LEVEL SHIFTS! + STRUCTURAL BREAKS!

Combination of individual components: Yt = f(St, Tt, Et)

Components of complex Time Series


Sales or observation Yt of the time series at point t consists of a combination f( ) of:

… Base Level L
… Seasonal Component St
… Trend Component Tt
… Irregular or random Error Component Et

Different possibilities to combine the components:

Additive Model: Yt = L + St + Tt + Et
Multiplicative Model: Yt = L * St * Tt * Et
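As a sketch, the two combination schemes can be generated from invented components (all numbers below are illustrative, not from the slides); in the additive model the seasonal swing stays constant, while in the multiplicative model it grows with level and trend.

```python
import numpy as np

n = 24
t = np.arange(n)
L = 500.0                                   # base level
T = 2.0 * t                                 # linear trend component
# monthly seasonal component (invented, sums to zero over a year)
S = np.tile([10, -5, 0, 20, -10, 5, 0, -15, 25, -5, 0, -25], 2)
rng = np.random.default_rng(1)
E = rng.normal(0, 3, n)                     # random error component

y_add = L + S + T + E                       # additive: Y = L + S + T + E
# multiplicative: Y = L * S * T * E, with components expressed as factors
y_mul = L * (1 + S / 100) * (1 + T / 100) * (1 + E / 100)

print(round(float(y_add.mean())), round(float(y_mul.mean())))
```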


Classification of Time Series Patterns


Seasonal effect: none / additive / multiplicative
Trend effect: none / additive / multiplicative
Æ the combinations form a 3 × 3 matrix of time series patterns

[Pegels 1969 / Gardner]


Agenda
Forecasting with Artificial Neural Networks
1. Forecasting?
1. Forecasting as predictive Regression
2. Time series prediction vs. causal prediction
3. SARIMA-Modelling
1. SARIMA – Differencing
2. SARIMA – Autoregressive Terms
3. SARIMA – Moving Average Terms
4. SARIMA – Seasonal Terms
4. Why NN for Forecasting?
2. Neural Networks?
3. Forecasting with Neural Networks …
4. How to write a good Neural Network forecasting paper!


Introduction to ARIMA Modelling

ƒ Seasonal Autoregressive Integrated Moving Average Processes: SARIMA


… popularised by George Box & Gwilym Jenkins in 1970s (names often used
synonymously)
… models are widely studied
… Put together theoretical underpinning required to understand & use ARIMA
… Defined general notation for dealing with ARIMA models
Æ claim that most time series can be parsimoniously represented
by the ARIMA class of models

ƒ ARIMA (p, d, q)-Models attempt to describe the systematic pattern of


a time series by 3 parameters
… p: Number of autoregressive terms (AR-terms) in a time series

… d: Number of differences to achieve stationarity of a time series

… q: Number of moving average terms (MA-terms) in a time series

$$\phi_p(B)\,(1-B)^d\,Z_t = \delta + \theta_q(B)\,e_t$$


The Box-Jenkins Methodology for ARIMA models

Model Identification Data Preparation


• Transform time series for stationarity
• Difference time series for stationarity

Model selection
• Examine ACF & PACF
• Identify potential models (p,d,q)(P,D,Q)

Model Estimation & Testing


Model Estimation
• Estimate parameters in potential models
• Select best model using suitable criterion

Model Diagnostics / Testing


• Check ACF / PACF of residuals Æ white noise
• Run portmanteau test of residuals
Re-identify

Model Application
Model Application
• Use selected model to forecast


ARIMA-Modelling

ƒ ARIMA(p,d,q)-Models
ƒ ARIMA – Autoregressive terms AR(p), with p = order of the autoregressive part
ƒ ARIMA – Order of integration, d = degree of first differencing/integration involved
ƒ ARIMA – Moving average terms MA(q), with q = order of the moving average of the error
ƒ SARIMA(p,d,q)(P,D,Q)s, with (P,D,Q) the process for the seasonal lags of period s

ƒObjective
… Identify the appropriate ARIMA model for the time series
… Identify AR-term
… Identify I-term
… Identify MA-term

ƒIdentification through
… Autocorrelation Function
… Partial Autocorrelation Function


ARIMA-Models: Identification of d-term

ƒ Parameter d determines order of integration

ƒ ARIMA models assume stationarity of the time series


… Stationarity in the mean
… Stationarity of the variance (homoscedasticity)

ƒ Recap:
… Let the mean of the time series at t be μt = E(Yt)
… and the autocovariances λt,t−τ = cov(Yt, Yt−τ), with λt,t = var(Yt)

ƒ Definition
… A time series is stationary if its mean level μt is constant for all t
and its variances and covariances λt,t−τ are constant for all t
… In other words:
„ all properties of the distribution (mean, variance, skewness, kurtosis etc.) of a
random sample of the time series are independent of the absolute time t of
drawing the sample Æ identity of mean & variance across time


ARIMA-Models: Stationarity and parameter d

ƒ Is the time series stationary?

Stationarity requires that any two samples A and B drawn from the series satisfy
μ(A) = μ(B), var(A) = var(B), etc.

For this time series: μ(B) > μ(A) Æ trend
Æ non-stationary time series


ARIMA-Models: Differencing for Stationarity

ƒ Differencing time series
ƒ E.g.: time series Yt = {2, 4, 6, 8, 10}
ƒ the time series exhibits a linear trend
ƒ 1st order differencing between observation Yt and its predecessor Yt−1
derives a transformed time series:
4−2=2, 6−4=2, 8−6=2, 10−8=2

Æ The new time series ΔYt = {2, 2, 2, 2} is stationary through 1st differencing
Æ d=1 Æ ARIMA(0,1,0) model
Æ if the 1st differences are still not stationary: 2nd order differences, d=2
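The worked example above as a few lines of code (the helper `diff` is hypothetical, written just for this illustration); applying it twice gives 2nd order differences, d=2.

```python
def diff(series, lag=1):
    """Differencing: z_t = y_t - y_{t-lag} (lag=1 gives first differences)."""
    return [y - series[i] for i, y in enumerate(series[lag:])]

Y = [2, 4, 6, 8, 10]        # linearly trended series from the slide
dY = diff(Y)                # first differences
print(dY)                   # -> [2, 2, 2, 2], constant => stationary, d=1

# a quadratically trended series needs differencing twice (d=2):
print(diff(diff([1, 2, 4, 7, 11])))   # -> [1, 1, 1]
```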


ARIMA-Models: Differencing for Stationarity

ƒ Integration
… Differencing: Zt = Yt − Yt−1
… Transforms: logarithms etc.
… where Zt is a transform of the variable of interest Yt,
differenced as often as necessary (Zt − Zt−1, then (Zt − Zt−1) − (Zt−1 − Zt−2), …)
to become stationary

ƒ Tests for stationarity:


… Dickey-Fuller Test
… Serial Correlation Test
… Runs Test


Agenda
Forecasting with Artificial Neural Networks
1. Forecasting?
1. Forecasting as predictive Regression
2. Time series prediction vs. causal prediction
3. SARIMA-Modelling
1. SARIMA – Differencing
2. SARIMA – Autoregressive Terms
3. SARIMA – Moving Average Terms
4. SARIMA – Seasonal Terms
4. Why NN for Forecasting?
2. Neural Networks?
3. Forecasting with Neural Networks …
4. How to write a good Neural Network forecasting paper!


ARIMA-Models – Autoregressive Terms


ƒ Description of the autocorrelation structure Æ autoregressive (AR) terms
… If a dependency exists between lagged observations Yt and Yt−1, we can
describe the realisation of Yt from Yt−1 (and further lags):

Yt = c + φ1Yt−1 + φ2Yt−2 + … + φpYt−p + et

(observation in time t = constant + weighted observations of the AR relationship
at t−1, …, t−p + independent random component et, „white noise“)

… Equations include only lagged realisations of the forecast variable


… ARIMA(p,0,0) model = AR(p)-model

ƒ Problems
… Independence of the residuals is often violated
… Determining the number of past values is problematic

ƒ Tests for Autoregression: Portmanteau-tests


… Box-Pierce test
… Ljung-Box test


ARIMA-Models: Parameter p of Autocorrelation

ƒ a stationary time series can be analysed for its autocorrelation structure
ƒ The autocorrelation coefficient for lag k

$$\rho_k = \frac{\sum_{t=k+1}^{n} (Y_t - \bar{Y})(Y_{t-k} - \bar{Y})}{\sum_{t=1}^{n} (Y_t - \bar{Y})^2}$$

denotes the correlation between lagged observations of distance k

ƒ Graphical interpretation (scatter plot of xt against xt−1):
… Uncorrelated data has low autocorrelations
… Uncorrelated data shows no correlation pattern


ARIMA-Models: Parameter p
ƒ E.g. time series Yt: 7, 8, 7, 6, 5, 4, 5, 6, 4

lag 1 pairs: (7,8) (8,7) (7,6) (6,5) (5,4) (4,5) (5,6) (6,4)  Æ r1 = .62
lag 2 pairs: (7,7) (8,6) (7,5) (6,4) (5,5) (4,6) (5,4)        Æ r2 = .32
lag 3 pairs: (7,6) (8,5) (7,4) (6,5) (5,6) (4,4)              Æ r3 = .15

Æ Autocorrelations rk gathered at lags 1, 2, … make up the
autocorrelation function (ACF)
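The lag-pair correlations tabulated above can be reproduced by correlating the overlapping pairs directly (plain Pearson correlation per lag, which matches the slide's r1–r3):

```python
import numpy as np

Y = np.array([7, 8, 7, 6, 5, 4, 5, 6, 4], dtype=float)

def lag_corr(y, k):
    """Pearson correlation of the lag-k pairs (y_t, y_{t+k})."""
    x, z = y[:-k], y[k:]
    return float(np.corrcoef(x, z)[0, 1])

acf = [round(lag_corr(Y, k), 2) for k in (1, 2, 3)]
print(acf)   # -> [0.62, 0.32, 0.15]
```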


ARIMA-Models – Autoregressive Terms


ƒ Identification of AR-terms …?

[Two ACF plots over lags 1–16 with confidence limits:]
• NORM: random independent observations – no significant autocorrelations
• AUTO: an AR(1) process? – significant autocorrelations


ARIMA-Models: Partial Autocorrelations

ƒ Partial autocorrelations are used to measure the degree of association
between Yt and Yt−k when the effects of the other time lags 1, 2, 3, …, k−1
are removed
… A significant AC between Yt and Yt−1 together with a significant AC between
Yt−1 and Yt−2 induces correlation between Yt and Yt−2! (1st AC = PAC!)

ƒ When fitting an AR(p) model to the time series, the last coefficient of Yt−p
measures the excess correlation at lag p which is not accounted for by an
AR(p−1) model. πp is called the pth order partial autocorrelation, i.e.

π_p = corr(Yt, Yt−p | Yt−1, Yt−2, …, Yt−p+1)

ƒ The partial autocorrelation coefficient measures the true correlation at lag p:

Yt = φ0 + φp1Yt−1 + φp2Yt−2 + … + πpYt−p + νt


ARIMA Modelling – AR-Model patterns

[ACF: pattern; PACF: 1st lag significant; both with confidence intervals over lags 1–16]

ƒ AR(1) model: Yt = c + φ1Yt−1 + et = ARIMA(1,0,0)



ARIMA Modelling – AR-Model patterns


[ACF: pattern; PACF: 1st & 2nd lag significant; with confidence intervals over lags 1–16]

ƒ AR(2) model: Yt = c + φ1Yt−1 + φ2Yt−2 + et = ARIMA(2,0,0)


ARIMA Modelling – AR-Model patterns


[ACF: pattern – dampened sine; PACF: 1st & 2nd lag significant; with confidence intervals over lags 1–16]

ƒ AR(2) model: Yt = c + φ1Yt−1 + φ2Yt−2 + et = ARIMA(2,0,0)


ARIMA Modelling – AR-Models


ƒ Autoregressive model of order one, ARIMA(1,0,0) = AR(1):

Yt = c + φ1Yt−1 + et = 1.1 + 0.8Yt−1 + et

[Figure: simulated AR(1) series with φ1 = 0.8, 50 periods]

ƒ Higher order AR models:

Yt = c + φ1Yt−1 + φ2Yt−2 + … + φpYt−p + et

with (stationarity) conditions
for p = 1: −1 < φ1 < 1
for p = 2: −1 < φ2 < 1 ∧ φ2 + φ1 < 1 ∧ φ2 − φ1 < 1
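A quick simulation sketch of the AR(1) example above; for a stationary AR(1) (−1 < φ1 < 1) the process mean is c/(1−φ1) = 1.1/0.2 = 5.5, which the sample mean should approach (seed and sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)
c, phi, n = 1.1, 0.8, 20000

# simulate Y_t = 1.1 + 0.8*Y_{t-1} + e_t with standard normal errors
y = np.empty(n)
y[0] = c / (1 - phi)              # start at the stationary mean
for t in range(1, n):
    y[t] = c + phi * y[t - 1] + rng.normal()

print(round(float(y.mean()), 1))  # close to the theoretical mean 5.5
```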


Agenda
Forecasting with Artificial Neural Networks
1. Forecasting?
1. Forecasting as predictive Regression
2. Time series prediction vs. causal prediction
3. SARIMA-Modelling
1. SARIMA – Differencing
2. SARIMA – Autoregressive Terms
3. SARIMA – Moving Average Terms
4. SARIMA – Seasonal Terms
4. Why NN for Forecasting?
2. Neural Networks?
3. Forecasting with Neural Networks …
4. How to write a good Neural Network forecasting paper!


ARIMA Modelling – Moving Average Processes

ƒ Description of the moving average structure
… AR models may not approximate the data generator underlying the
observations perfectly Æ residuals et, et−1, et−2, …, et−q
… Observation Yt may depend on the realisation of previous errors e
… Regress against past errors as explanatory variables:

Yt = c + et − θ1et−1 − θ2et−2 − … − θqet−q

… ARIMA(0,0,q)-model = MA(q)-model

with (invertibility) conditions
for q = 1: −1 < θ1 < 1
for q = 2: −1 < θ2 < 1 ∧ θ2 + θ1 < 1 ∧ θ2 − θ1 < 1


ARIMA Modelling – MA-Model patterns

[ACF: 1st lag significant; PACF: pattern; with confidence intervals over lags 1–16]

ƒ MA(1) model: Yt = c + et − θ1et−1 = ARIMA(0,0,1)



ARIMA Modelling – MA-Model patterns


[ACF: 1st, 2nd & 3rd lag significant; PACF: pattern; with confidence intervals over lags 1–16]

ƒ MA(3) model: Yt = c + et − θ1et−1 − θ2et−2 − θ3et−3 = ARIMA(0,0,3)


ARIMA Modelling – MA-Models


ƒ Moving average model of order one, ARIMA(0,0,1) = MA(1):

Yt = c + et − θ1et−1 = 10 + et + 0.2et−1

[Figure: simulated MA(1) series, 50 periods]
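A matching simulation sketch for the MA(1) example; an MA process is stationary by construction, with mean equal to the constant c = 10, and its ACF cuts off after lag q = 1 (seed and sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 20000

# simulate Y_t = 10 + e_t + 0.2*e_{t-1} with standard normal errors
e = rng.normal(size=n + 1)
y = 10 + e[1:] + 0.2 * e[:-1]

print(round(float(y.mean()), 1))   # close to the constant c = 10

# theoretical lag-1 autocorrelation of this MA(1): 0.2 / (1 + 0.2^2) ~= 0.19
rho1 = 0.2 / (1 + 0.2**2)
print(round(float(np.corrcoef(y[:-1], y[1:])[0, 1]), 2))
```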


ARIMA Modelling – Mixture ARMA-Models

ƒ complicated series may be modelled by combining AR & MA terms


… ARMA(1,1)-Model = ARIMA(1,0,1)-Model

Yt = c + φ1Yt −1 + et − θ1et −1

… Higher order ARMA(p,q)-Models

Yt = c + φ1Yt−1 + φ2Yt−2 + … + φpYt−p + et − θ1et−1 − θ2et−2 − … − θqet−q


ARIMA Modelling – ARMA-Model patterns

[ACF: 1st lag significant; PACF: 1st lag significant; with confidence intervals over lags 1–16]

ƒ AR(1) and MA(1) model: = ARMA(1,1) = ARIMA(1,0,1)


Agenda
Forecasting with Artificial Neural Networks
1. Forecasting?
1. Forecasting as predictive Regression
2. Time series prediction vs. causal prediction
3. SARIMA-Modelling
1. SARIMA – Differencing
2. SARIMA – Autoregressive Terms
3. SARIMA – Moving Average Terms
4. SARIMA – Seasonal Terms
5. SARIMAX – Seasonal ARIMA with Interventions
4. Why NN for Forecasting?
2. Neural Networks?
3. Forecasting with Neural Networks …
4. How to write a good Neural Network forecasting paper!


Seasonality in ARIMA-Models

ƒ Identifying seasonal data: spikes in the ACF / PACF at seasonal lags, e.g.
… t−12 & t−13 for yearly seasonality
… t−4 & t−5 for quarterly seasonality

[Figure: ACF with seasonal spikes and upper/lower confidence limits]

ƒ Differences
… Simple: ΔYt = (1−B)Yt
… Seasonal: ΔsYt = (1−B^s)Yt, with s = seasonality, e.g. 4, 12

ƒ Data may require seasonal differencing to remove seasonality
… To identify the model, specify the seasonal parameters (P,D,Q):
„ the seasonal autoregressive parameters P
„ the seasonal difference D and
„ the seasonal moving average Q
Æ Seasonal ARIMA (p,d,q)(P,D,Q)-model


Seasonality in ARIMA-Models

[Figure: ACF and PACF with seasonal spikes = monthly data; upper/lower confidence limits]


Seasonality in ARIMA-Models

ƒ Extension of notation – backshift operator B:

ΔsYt = Yt − Yt−s = Yt − B^sYt = (1−B^s)Yt

ƒ Seasonal difference followed by a first difference: (1−B)(1−B^s)Yt

ƒ Seasonal ARIMA(1,1,1)(1,1,1)4-model:

$$(1 - \phi_1 B)(1 - \Phi_1 B^4)(1 - B)(1 - B^4)\,Y_t = c + (1 - \theta_1 B)(1 - \Theta_1 B^4)\,e_t$$

with the factors: non-seasonal AR(1), seasonal AR(1), non-seasonal difference,
seasonal difference, non-seasonal MA(1), seasonal MA(1)
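The backshift identities can be sketched as code; the quarterly series below is invented so that the seasonal difference (1−B⁴) removes the pattern exactly, and the combined (1−B)(1−B⁴) then yields a constant zero series.

```python
import numpy as np

def difference(y, lag=1):
    """(1 - B^lag) Y_t: subtract the observation lag steps earlier."""
    return y[lag:] - y[:-lag]

# quarterly series with trend + repeating seasonal pattern (invented numbers)
y = np.array([10., 20., 15., 25., 14., 24., 19., 29., 18., 28., 23., 33.])

seasonal = difference(y, lag=4)     # (1 - B^4) Y_t removes the seasonal pattern
both = difference(seasonal, lag=1)  # (1 - B)(1 - B^4) Y_t

print(seasonal)   # -> [4. 4. 4. 4. 4. 4. 4. 4.]
print(both)       # -> [0. 0. 0. 0. 0. 0. 0.]
```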


Agenda
Forecasting with Artificial Neural Networks
1. Forecasting?
1. Forecasting as predictive Regression
2. Time series prediction vs. causal prediction
3. SARIMA-Modelling
1. SARIMA – Differencing
2. SARIMA – Autoregressive Terms
3. SARIMA – Moving Average Terms
4. SARIMA – Seasonal Terms
5. SARIMAX – Seasonal ARIMA with Interventions
4. Why NN for Forecasting?
2. Neural Networks?
3. Forecasting with Neural Networks …
4. How to write a good Neural Network forecasting paper!


Forecasting Models

ƒ Time series analysis vs. causal modelling

… Time series prediction (Univariate)


„ Assumes that data generating process
that creates patterns can be explained
only from previous observations of
dependent variable
… Causal prediction (Multivariate)
„ Data generating process can be explained
by interaction of causal (cause-and-effect)
independent variables


Causal Prediction

ƒ ARX(p)-Models

ƒ General Dynamic Regression Models


Agenda

Forecasting with Artificial Neural Networks

1. Forecasting?
1. Forecasting as predictive Regression
2. Time series prediction vs. causal prediction
3. SARIMA-Modelling
4. Why NN for Forecasting?
2. Neural Networks?
3. Forecasting with Neural Networks …
4. How to write a good Neural Network forecasting paper!


Why forecast with NN?

ƒ Pattern or noise?

Æ Airline passenger data: seasonal, trended; the real “model” is disagreed on –
multiplicative seasonality, or additive seasonality with level shifts?
Æ Fresh products supermarket sales: seasonal, events, heteroscedastic noise;
the real “model” is unknown


Why forecast with NN?

ƒ Pattern or noise?

Æ Random noise, iid (normally distributed: mean 0, std.dev. 1)
Æ BL(p,q) bilinear autoregressive model: yt = 0.7yt−1εt−2 + εt


Why forecast with NN?

ƒ Pattern or noise?

Æ TAR(p) threshold autoregressive model:
yt = 0.9yt−1 + εt for yt−1 ≤ 1
yt = −0.3yt−1 − εt for yt−1 > 1
Æ Random walk: yt = yt−1 + εt


Motivation for NN in Forecasting – Nonlinearity!

Æ True data generating process in unknown & hard to identify


Æ Many interdependencies in business are nonlinear

ƒ NN can approximate any LINEAR and NONLINEAR function to


any desired degree of accuracy
… Can learn linear time series patterns
… Can learn nonlinear time series patterns
Æ Can extrapolate linear & nonlinear patterns = generalisation!
ƒ NN are nonparametric
… Don’t assume a particular noise process, e.g. Gaussian
ƒ NN model (learn) linear and nonlinear process directly from data
… Approximate underlying data generating process

Æ NN are flexible forecasting paradigm


Motivation for NN in Forecasting - Modelling Flexibility

Æ Unknown data processes require building of many candidate models!

ƒ Flexibility on Input Variables Æ flexible coding


… binary scale [0;1]; [-1,1]
… nominal / ordinal scale (0,1,2,…,10) Æ binary coded [0001, 0010, …]
… metric scale (0.235; 7.35; 12440.0; …)
ƒ Flexibility on Output Variables
… binary Æ prediction of single class membership
… nominal / ordinal Æ prediction of multiple class memberships
… metric Æ regression (point predictions) OR probability of class membership!
ƒ Number of Input Variables
… …
ƒ Number of Output Variables
… …

Æ One SINGLE network architecture Æ MANY applications


Applications of Neural Nets in diverse Research Fields


Æ 2,500+ journal publications on NN & forecasting alone!

[Chart: citation analysis by year, 1987–2003 — publications with titles matching (neural AND net* AND (forecast* OR predict* OR time-ser* …)), plus the sales-forecasting subset; linear trend on “Forecast”, R² = 0.9036]

ƒ Neurophysiology
Æ simulate & explain the brain
ƒ Informatics
Æ eMail & URL filtering
Æ VirusScan (Symantec Norton AntiVirus)
Æ Speech Recognition & Optical Character Recognition
ƒ Engineering
Æ control applications in plants
Æ automatic target recognition (DARPA)
Æ explosive detection at airports
Æ Mineral Identification (NASA Mars Explorer)
Æ starting & landing of Jumbo Jets (NASA)
ƒ Meteorology / weather
Æ Rainfall prediction
Æ El Niño effects
ƒ Corporate Business
Æ credit card fraud detection
Æ simulate forecasting methods
ƒ Business Forecasting Domains
… Electrical Load / Demand
… Financial Forecasting
„ Currency / Exchange rate
„ stock forecasting etc.
… Sales forecasting

[Pie chart: number of publications by business forecasting domain — General Business, Marketing, Finance, Production, Product Sales, Electrical Load; segment counts 83, 52, 51, 32, 21, 10 and 5]

Æ not all NN recommendations are useful for your DOMAIN!


IBF Benchmark– Forecasting Methods used

Applied Forecasting Methods (all industries)

Time series methods (objective) Æ 61%
  Averages 25%
  Autoregressive Methods 7%
  Decomposition 8%
  Exponential Smoothing 24%
  Trend extrapolation 35%

Causal methods (objective) Æ 23%
  Econometric Models 23%
  Neural Networks 9%
  Regression 69%

Judgemental methods (subjective) Æ 2x%
  Analogies 23%
  Delphi 22%
  PERT 6%
  Surveys 49%

Æ Survey at 5 IBF conferences in 2001
… 240 forecasters, 13 industries

Æ NN are applied in corporate Demand Planning / S&OP processes!
[Warning: limited sample size]


Agenda

Forecasting with Artificial Neural Networks


1. Forecasting?
2. Neural Networks?
1. What are NN? Definition & Online Preview …
2. Motivation & brief history of Neural Networks
3. From biological to artificial Neural Network Structures
4. Network Training
3. Forecasting with Neural Networks …
4. How to write a good Neural Network forecasting paper!


What are Artificial Neural Networks?

ƒ Artificial Neural Networks (NN)


… „a machine that is designed to model the way in which the brain
performs a particular task …; the network is … implemented … or ..
simulated in software on a digital computer.“ [Haykin98]
… class of statistical methods for information processing consisting of
large number of simple processing units (neurons), which exchange
information of their activation via directed connections. [Zell97]
Input Æ Processing (“black box”) Æ Output
ƒ Inputs: time series observations; causal variables; image data (pixels/bits); finger prints; chemical readouts; …
ƒ Outputs: time series predictions; dependent variables; class memberships; class probabilities; principal components; …


What are Neural Networks in Forecasting?

ƒ Artificial Neural Networks (NN) Æ a flexible forecasting paradigm


… A class of statistical methods for time-series and causal forecasting

… Highly flexible processing Æ arbitrary input to output relationships


… Properties Æ non-linear, nonparametric (few assumptions), error robust (but not outlier robust!)
… Data driven modelling Æ “learning” directly from data

Input Æ Processing (“black box”) Æ Output
ƒ Inputs (nominal, interval or ratio scale; not ordinal): time series observations; lagged variables; causal / explanatory variables; dummy variables; …
ƒ Outputs (ratio scale): regression; single period ahead; multiple period ahead; …


DEMO: Preview of Neural Network Forecasting

ƒ Simulation of NN for Business Forecasting

ƒ Airline Passenger Data Experiment


ƒ 3 layered NN: (12-8-1) 12 Input units - 8 hidden units – 1 output unit
ƒ 12 input lags t, t-1, …, t-11 (past 12 observations) Æ time series prediction
ƒ t+1 forecast Æ single step ahead forecast

Æ Benchmark Time Series


[Brown / Box&Jenkins]
ƒ 132 observations
ƒ 11 years of monthly data


Demonstration: Preview of Neural Network Forecasting

ƒ NeuraLab Predict! Æ „look inside neural forecasting“

[Screenshot: errors on training / validation / test datasets, updated after each learning step; time series vs. neural network forecast (in-sample observations & out-of-sample forecasts; training ÅÆ validation ÅÆ test); PQ-diagram; actual vs. NN-forecasted values; absolute forecasting errors]


Agenda

Forecasting with Artificial Neural Networks


1. Forecasting?
2. Neural Networks?
1. What are NN? Definition & Online Preview …
2. Motivation & brief history of Neural Networks
3. From biological to artificial Neural Network Structures
4. Network Training
3. Forecasting with Neural Networks …
4. How to write a good Neural Network forecasting paper!


Motivation for using NN … BIOLOGY!

ƒ Human & other nervous systems (animals, insects Æ e.g. bats)


… Ability of various complex functions: perception, motor control,
pattern recognition, classification, prediction etc.
… Speed: e.g. detecting & recognising a changed face in a crowd takes 100–200 ms
… Efficiency etc.
Æ brains are the most efficient & complex computer known to date

                       Human Brain                                   Computer (PC)
Processing speed       10^-3 ms (0.25 MHz)                           10^-9 ms (2,500 MHz PC)
Neurons / transistors  10 billion neurons, 10^3 billion connections  50 million (PC chip)
Weight                 1,500 grams                                   kilograms to tons!
Energy consumption     10^-16 Joule                                  10^-6 Joule
Computation: vision    100 steps                                     billions of steps

Æ Comparison: human = 10,000,000,000 neurons Æ ant = 20,000 neurons


Brief History of Neural Networks

ƒ History
… Developed in interdisciplinary Research (McCulloch/Pitts1943)
… Motivation from Functions of natural Neural Networks
ª neurobiological motivation
ª application-oriented motivation [Smith & Gupta, 2000]

Turing 1936 · Hebb 1949 · Dartmouth Project 1956 · Minsky/Papert 1969 · Werbos 1974 · 1st IJCNN 1987 · 1st journals 1988
McCulloch/Pitts 1943 · Rosenblatt 1959 · Kohonen 1972 · Rumelhart/Hinton/Williams 1986
Minsky 1954: builds 1st neurocomputer · INTEL 1971: 1st microprocessor · IBM 1981: introduces the PC · NeuralWare 1987: founded · SAS 1997: Enterprise Miner
GE 1954: 1st computerised payroll system · White 1988: 1st paper on NN forecasting · IBM 1998: $70bn BI initiative

Neuroscience Applications in Engineering Applications in Business


Æ Pattern Recognition & Control ÆForecasting …

ª Research field of Soft-Computing & Artificial Intelligence


ª Neuroscience, Mathematics, Physics, Statistics, Information Science,
Engineering, Business Management
ª different VOCABULARY: statistics versus neurophysiology !!!


Agenda

Forecasting with Artificial Neural Networks


1. Forecasting?
2. Neural Networks?
1. What are NN? Definition & Online Preview …
2. Motivation & brief history of Neural Networks
3. From biological to artificial Neural Network Structures
4. Network Training
3. Forecasting with Neural Networks …
4. How to write a good Neural Network forecasting paper!


Motivation & Implementation of Neural Networks

ƒFrom biological neural networks … to artificial neural networks

net_i = Σ_j w_ij o_j − θ_i,   a_i = f(net_i),   o_i = a_i

Mathematics as abstract representations of reality
Æ use in software simulators, hardware, engineering etc.

o_i = tanh( Σ_j w_ji o_j − θ_i )

neural_net = eval(net_name);
[num_rows, ins] = size(neural_net.iw{1});
[outs, num_cols] = size(neural_net.lw{neural_net.numLayers, neural_net.numLayers-1});
if (strcmp(neural_net.adaptFcn, ''))
    net_type = 'RBF';
else
    net_type = 'MLP';
end

fid = fopen(path, 'w');


Information Processing in biological Neurons

ƒ Modelling of biological functions in Neurons


… 10-100 Billion Neurons with 10000 connections in Brain
… Input (sensory), Processing (internal) & Output (motoric) Neurons

… CONCEPT of Information Processing in Neurons …


Alternative notations –
Information processing in neurons / nodes

Biological representation Æ graphical notation Æ mathematical representation

Graphical notation:
Input Æ Input Function Æ Activation Function Æ Output
inputs in_1 … in_j with weights w_i,1 … w_i,j Æ net_i = Σ_j w_ij in_j Æ a_i = f(net_i − θ_i) Æ out_i
(in statistical terms: β_i => weights w_ij,  β_0 => bias θ)

Mathematical representation:
y_i = 1 if Σ_j w_ji x_j − θ_i ≥ 0, else 0

alternative:
y_i = tanh( Σ_j w_ji x_j − θ_i )


Information Processing in artificial Nodes


CONCEPT of information processing in nodes:
ƒ Input function (summation of previous signals)
ƒ Activation function (nonlinear)
  „ binary step function {0;1}
  „ sigmoid functions: logistic, hyperbolic tangent etc.
ƒ Output function (linear / identity, SoftMax, ...)

Input Æ Input Function Æ Activation Function Æ Output Function Æ Output
net_i = Σ_j w_ij in_j − θ_i  Æ  a_i = f(net_i)  Æ  o_i = a_i  Æ  out_i
= unidirectional information processing

out_i = 1 if Σ_j w_ji o_j − θ_i ≥ 0, else 0


Input Functions


Binary Activation Functions

ƒ Binary activation calculated from input

a_j = f_act(net_j, θ_j),  e.g.  a_j = f_act(net_j − θ_j)


Information Processing: Node Threshold logic


Node function Æ BINARY THRESHOLD LOGIC
1. weight each input by its connection strength
2. sum the weighted inputs
3. subtract the bias term
4. calculate the node output through the BINARY transfer function Æ RERUN with next input

inputs         weights         information processing              output
o1 = 2.2       w1,i =  0.71      2.2 ·  0.71
o2 = 4.0       w2,i = −1.84    + 4.0 · (−1.84)
o3 = 1.0       w3,i =  9.01    + 1.0 ·  9.01  = 3.212
bias θ = 8.0                     3.212 − 8.0  = −4.788 < 0  Î  oi = 0.00

net_i = Σ_j w_ij o_j − θ_i,   a_i = f(net_i),   o_i = 1 if Σ_j w_ji o_j − θ_i ≥ 0, else 0
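The threshold-logic walkthrough above can be written in a few lines (an illustrative sketch, not the tutorial's own code; the function name is ours):

```python
def binary_node(inputs, weights, theta):
    """Binary threshold node: weight inputs, sum, subtract bias, apply step function."""
    net = sum(w * o for w, o in zip(weights, inputs)) - theta
    return (1.0 if net >= 0 else 0.0), net

# the slide's worked numbers: o = (2.2, 4.0, 1.0), w = (0.71, -1.84, 9.01), bias θ = 8.0
out, net = binary_node([2.2, 4.0, 1.0], [0.71, -1.84, 9.01], 8.0)
# net = 3.212 - 8.0 = -4.788 < 0, so the node outputs 0.0
```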


Continuous Activation Functions

ƒ Activation calculated from input

a_j = f_act(net_j, θ_j),  e.g.  a_j = f_act(net_j − θ_j)

Hyperbolic Tangent

Logistic Function


Information Processing: Node Threshold logic


Node function Æ SIGMOID THRESHOLD LOGIC with TanH activation function
1. weight each input by its connection strength
2. sum the weighted inputs
3. subtract the bias term
4. calculate the node output through the TanH transfer function Æ RERUN with next input

inputs         weights         information processing              output
o1 = 2.2       w1,i =  0.71      2.2 ·  0.71
o2 = 4.0       w2,i = −1.84    + 4.0 · (−1.84)
o3 = 1.0       w3,i =  9.01    + 1.0 ·  9.01  = 3.212
bias θ = 8.0                     3.212 − 8.0  = −4.788  Î  tanh(−4.788) ≈ −0.9999

net_i = Σ_j w_ij o_j − θ_i,   a_i = f(net_i),   o_i = tanh( Σ_j w_ji o_j − θ_i )
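Swapping the binary step for tanh is a one-line change (same illustrative numbers as above):

```python
import math

def tanh_node(inputs, weights, theta):
    """Same input and bias logic as the binary node, but a smooth tanh transfer."""
    net = sum(w * o for w, o in zip(weights, inputs)) - theta
    return math.tanh(net)

o = tanh_node([2.2, 4.0, 1.0], [0.71, -1.84, 9.01], 8.0)
# tanh(-4.788) saturates near -1: like the binary node's hard 0/1, but differentiable
```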


A new Notation … GRAPHICS!

ƒ Single Linear Regression … as an equation:

y = β o + β1 x1 + β 2 x2 + ... + β n xn + ε

ƒ Single Linear Regression … as a directed graph:

1 (β0), x1 (β1), x2 (β2), …, xn (βn)  Æ  Σ_n β_n x_n  Æ  y

Unidirectional Information Processing


Why Graphical Notation?

ƒ Simple neural network equation without recurrent feedbacks:

y_k = tanh( Σ_k w_kj tanh( Σ_i w_ki tanh( Σ_j w_ji x_j − θ_j ) − θ_i ) − θ_k ) ⇒ Min!

… with β => w_i, β0 => θ and

tanh( (Σ_{i=1..N} x_i w_ij) − θ_j )
  = [ e^{(Σ x_i w_ij) − θ_j} − e^{−((Σ x_i w_ij) − θ_j)} ] / [ e^{(Σ x_i w_ij) − θ_j} + e^{−((Σ x_i w_ij) − θ_j)} ]

ƒ Also: [network diagram]

Æ Simplification for complex models!


Combination of Nodes

ƒ “Simple” processing per node
ƒ Combination of simple nodes creates complex behaviour

Each node j computes
tanh( (Σ_{i=1..N} o_i w_ij) − θ_j )
  = [ e^{(Σ o_i w_ij) − θ_j} − e^{−((Σ o_i w_ij) − θ_j)} ] / [ e^{(Σ o_i w_ij) − θ_j} + e^{−((Σ o_i w_ij) − θ_j)} ]
and passes its output o_j on via weights w_j,k to the nodes of the next layer.

Architecture of Multilayer Perceptrons


Architecture of a Multilayer Perceptron = combination of neurons
Æ classic form of feedforward neural network!
ƒ neurons u_n (units / nodes) ordered in layers
ƒ unidirectional connections with trainable weights w_n,n
ƒ vector of input signals x_i (input)
ƒ vector of output signals o_j (output)

[Diagram: inputs X1–X4 Æ input layer u1–u4 Æ hidden layers u5–u11 Æ output layer u12–u14 Æ outputs o1–o3, with weights w1,5, w5,8, w8,12, …]

o_k = tanh( Σ_k w_kj tanh( Σ_i w_ki tanh( Σ_j w_ji o_j − θ_j ) − θ_i ) − θ_k ) ⇒ Min!
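The nested-tanh feedforward pass can be sketched in numpy (layer sizes only loosely follow the diagram; the weights are random placeholders, not trained values):

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(o, W, theta):
    # one layer: weighted sum of incoming outputs minus bias, tanh activation
    return np.tanh(o @ W - theta)

# illustrative sizes: 4 inputs -> two hidden layers -> 3 outputs
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=3)
W2, b2 = rng.normal(size=(3, 4)), rng.normal(size=4)
W3, b3 = rng.normal(size=(4, 3)), rng.normal(size=3)

x = np.array([0.2, -0.5, 0.1, 0.9])
o = layer(layer(layer(x, W1, b1), W2, b2), W3, b3)  # o_k = tanh(Σ w tanh(Σ w tanh(...)))
```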


Dictionary for Neural Network Terminology

ƒ Due to its neuro-biological origins, NN use specific terminology


Neural Networks Statistics
Input Nodes Independent / lagged Variables
Output Node(s) Dependent variable(s)
Training Parameterization
Weights Parameters
… …

Æ don‘t be confused: ASK!


Agenda

Forecasting with Artificial Neural Networks


1. Forecasting?
2. Neural Networks?
1. What are NN? Definition & Online Preview …
2. Motivation & brief history of Neural Networks
3. From biological to artificial Neural Network Structures
4. Network Training
3. Forecasting with Neural Networks …
4. How to write a good Neural Network forecasting paper!


Hebbian Learning

ƒ HEBB introduced the idea of learning by adapting weights [0,1]:

Δw_ij = η o_i a_j

ƒ Delta learning rule of Widrow–Hoff:

Δw_ij = η o_i (t_j − a_j) = η o_i (t_j − o_j) = η o_i δ_j
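A single delta-rule update, with illustrative numbers of our own choosing:

```python
# Widrow-Hoff delta rule: Δw_ij = η · o_i · (t_j − o_j)
eta = 0.5                        # learning rate (illustrative)
o_i, o_j, t_j = 0.8, 0.2, 1.0    # input activation, node output, teaching output
dw = eta * o_i * (t_j - o_j)     # 0.5 · 0.8 · 0.8 = 0.32: weight moves toward the target
```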


Neural Network Training with Back-Propagation


Training Æ LEARNING FROM EXAMPLES
1. Initialise connections with randomised weights (symmetry breaking)
2. Show first input pattern (independent variables) (demo: only for 1 node!)
3. Forward propagation of the input values to the output layer
4. Calculate error between NN output & actual value (using the error / objective function)
5. Backward propagation of errors for each weight back to the input layer
Î RERUN with next input pattern …

[Diagram: input vector x Æ weight matrix W (nodes 1–10) Æ output vector o; teaching output t; error E = o − t]


Neural Network Training

ƒ Simple back propagation algorithm [Rumelhart et al. 1982]

E_p = C(t_pj, o_pj),   o_pj = f_j(net_pj),   Δ_p w_ji ∝ − ∂C(t_pj, o_pj)/∂w_ji

∂C/∂w_ji = (∂C/∂net_pj) · (∂net_pj/∂w_ji)

δ_pj = − ∂C/∂net_pj = − (∂C/∂o_pj) · (∂o_pj/∂net_pj),   with ∂o_pj/∂net_pj = f'_j(net_pj)

For the influence of unit j on the following layer:
Σ_k (∂C/∂net_pk) · (∂net_pk/∂o_pj) = Σ_k (∂C/∂net_pk) · ∂(Σ_i w_ki o_pi)/∂o_pj
  = Σ_k (∂C/∂net_pk) · w_kj = − Σ_k δ_pk w_kj

δ_pj = { (∂C/∂o_pj) · f'_j(net_pj)       if unit j is in the output layer
       { f'_j(net_pj) · Σ_k δ_pk w_kj    if unit j is in a hidden layer

Weight update:  Δw_ij = η o_i δ_j,  with
δ_j = f'_j(net_j) (t_j − o_j)        for output nodes j
δ_j = f'_j(net_j) Σ_k (δ_k w_jk)     for hidden nodes j

For the logistic function f(net_j) = 1 / (1 + e^{−Σ_i o_i(t) w_ij}) → f'(net_j) = o_j(1 − o_j), so
δ_j = o_j(1 − o_j)(t_j − o_j)        for output nodes j
δ_j = o_j(1 − o_j) Σ_k (δ_k w_jk)    for hidden nodes j
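The logistic-function deltas above can be exercised on a toy one-hidden-layer network (a sketch under our own choices of size, data and learning rate; bias terms omitted for brevity):

```python
import numpy as np

rng = np.random.default_rng(1)
sig = lambda z: 1.0 / (1.0 + np.exp(-z))

# tiny 2-2-1 logistic network fitted to a toy 0/1 target
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
T = np.array([[0.], [1.], [1.], [1.]])
W1, W2 = rng.normal(size=(2, 2)), rng.normal(size=(2, 1))
eta = 0.2

def sse():
    o = sig(sig(X @ W1) @ W2)
    return float(((T - o) ** 2).sum())

e_start = sse()
for _ in range(1000):
    h = sig(X @ W1)                            # forward pass, hidden layer
    o = sig(h @ W2)                            # forward pass, output layer
    delta_o = o * (1 - o) * (T - o)            # δ_j = o_j(1-o_j)(t_j-o_j), output layer
    delta_h = h * (1 - h) * (delta_o @ W2.T)   # δ_j = o_j(1-o_j) Σ_k δ_k w_jk, hidden layer
    W2 += eta * h.T @ delta_o                  # Δw_ij = η o_i δ_j
    W1 += eta * X.T @ delta_h
e_end = sse()  # gradient descent on the squared error reduces it from the random start
```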


Neural Network Training = Error Minimization

ƒ Minimize Error through changing ONE weight wj


[Figure: error E(w_j) over a single weight w_j — different random starting points lead to different local minima; only one valley is the GLOBAL minimum]


Error Backpropagation = 3D+ Gradient Descent

ƒ Local search on a multi-dimensional error surface
ƒ task of finding the deepest valley in the mountains
… local search
… stepsize fixed
… follow steepest descent
Æ local optimum = any valley
Æ global optimum = deepest valley with lowest error
Æ varies with the error surface

[Figure: 3D error surfaces with multiple local minima]

Demo: Neural Network Forecasting revisited!

ƒ Simulation of NN for Business Forecasting

ƒ Airline Passenger Data Experiment


ƒ 3 layered NN: (12-8-1) 12 Input units - 8 hidden units – 1 output unit
ƒ 12 input lags t, t-1, …, t-11 (past 12 observations) Æ time series prediction
ƒ t+1 forecast Æ single step ahead forecast

Æ Benchmark Time Series


[Brown / Box&Jenkins]
ƒ 132 observations
ƒ 11 years of monthly data


Agenda

Forecasting with Artificial Neural Networks

1. Forecasting?
2. Neural Networks?
3. Forecasting with Neural Networks …
1. NN models for Time Series & Dynamic Causal Prediction
2. NN experiments
3. Process of NN modelling
4. How to write a good Neural Network forecasting paper!


Time Series Prediction with Artificial Neural Networks

ƒ ANN are universal approximators [Hornik/Stinchcombe/White92 etc.]
ª forecasts as an application of (nonlinear) function approximation
ª various architectures for prediction (time series, causal, combined, ...)

ŷt+h = f(xt) + εt+h
  ŷt+h = forecast for t+h
  f(·) = linear / non-linear function
  xt = vector of observations in t
  εt+h = independent error term in t+h

ª single neuron / node ≈ nonlinear AR(p)
ª feedforward NN (MLP etc.) ≈ hierarchy of nonlinear AR(p)
ª recurrent NN (Elman, Jordan) ≈ nonlinear ARMA(p,q)
ª …

ŷt+1 = f( yt, yt−1, yt−2, …, yt−n−1 )
non-linear autoregressive AR(p) model


Neural Network Training on Time Series

ƒ Sliding window approach of presenting data

1. Present a new data pattern to the neural network
2. Forward pass: calculate the neural network output from the input values
3. Compare the neural network forecast against the actual value
4. Backpropagation: change weights to reduce the output forecast error
5. New data input: slide the window forward to show the next pattern


Neural Network Architectures for


Linear Autoregression

Æ Interpretation
ƒ weights represent
autoregressive terms
ƒ Same problems /
shortcomings as
standard AR-models!

Æ Extensions
ƒ multiple output nodes
= simultaneous autoregression models
ƒ Non-linearity through
different activation
function in output
node

ŷt+1 = f( yt, yt−1, yt−2, …, yt−n−1 )
ŷt+1 = yt wt,j + yt−1 wt−1,j + yt−2 wt−2,j + … + yt−n−1 wt−n−1,j − θj
linear autoregressive AR(p) model


Neural Network Architecture for


Nonlinear Autoregression

Æ Extensions
ƒ additional layers with nonlinear nodes
ƒ linear activation function in the output layer

ŷt+1 = f( yt, yt−1, yt−2, …, yt−n−1 )
ŷt+1 = tanh( Σ_{i=t−n−1..t} yi wij − θj )
nonlinear autoregressive AR(p) model


Neural Network Architectures for


Nonlinear Autoregression

Æ Interpretation
ƒ autoregressive AR(p) modelling WITHOUT the moving average terms of the errors ≠ nonlinear ARIMA
ƒ similar problems / shortcomings as standard AR models!

Æ Extensions
ƒ multiple output nodes = simultaneous autoregression models

ŷt+1 = f( yt, yt−1, yt−2, …, yt−n−1 )
ŷt+1 = tanh( Σ_k w_kj tanh( Σ_i w_ki tanh( Σ_j w_ji yt−j − θj ) − θi ) − θk )
nonlinear autoregressive AR(p) model


Neural Network Architectures for


Multiple Step Ahead Nonlinear Autoregression

Æ Interpretation
ƒ as single autoregressive AR(p) modelling, but with multiple output nodes — one per forecast horizon

ŷt+1, ŷt+2, …, ŷt+n = f( yt, yt−1, yt−2, …, yt−n−1 )
nonlinear autoregressive AR(p) model


Neural Network Architectures for Forecasting -


Nonlinear Autoregression Intervention Model

Æ Interpretation
ƒ as single autoregressive AR(p) modelling
ƒ additional event term to explain external events

Æ Extensions
ƒ multiple output nodes = simultaneous multiple regression

ŷt+1, ŷt+2, …, ŷt+n = f( yt, yt−1, yt−2, …, yt−n−1, event_t )
nonlinear autoregressive ARX(p) model


Neural Network Architecture for


Linear Regression
Demand

Max Temperature

Rainfall

Sunshine Hours

ŷ = f( x1, x2, x3, …, xn )
ŷ = x1 w1j + x2 w2j + x3 w3j + … + xn wnj − θj
linear regression model


Neural Network Architectures for


Non-Linear Regression (≈Logistic Regression)
Demand

Max Temperature

Rainfall

Sunshine Hours

ŷ = f( x1, x2, x3, …, xn )
ŷ = logistic( Σ_i xi wij − θj ) = 1 / ( 1 + e^{−(Σ_i xi wij − θj)} )
nonlinear multiple (logistic) regression model


Neural Network Architectures for


Non-linear Regression

Æ Interpretation
ƒ similar to linear multiple regression modelling
ƒ without nonlinearity in the output node: a weighted expert regime over non-linear regressions
ƒ with nonlinearity in the output layer: ???

ŷ = f( x1, x2, x3, …, xn )
ŷ = x1 w1j + x2 w2j + x3 w3j + … + xn wnj − θj
nonlinear regression model


Classification of Forecasting Methods


Forecasting Methods
ƒ Objective forecasting methods
  „ Time series methods: Averages, Moving Averages, Naive methods, Exponential Smoothing (simple ES, linear ES, seasonal ES, dampened trend ES), Simple Regression, Autoregression, ARIMA, Neural Networks
  „ Causal methods: Linear Regression, Multiple Regression, Dynamic Regression, Vector Autoregression, Intervention models, Neural Networks
ƒ Subjective forecasting methods („prophecy“, educated guessing …): Sales Force Composite, Analogies, Delphi, PERT, Survey techniques

Neural Networks ARE time series methods and causal methods, & CAN be used as averages & ES, regression, …

Demand Planning practice: objective methods + subjective correction


Different model classes of Neural Networks

ƒ Since 1960s a variety of NN were developed for different tasks


Æ Classification ≠ Optimization ≠ Forecasting Æ Application Specific Models


ƒ Different CLASSES of Neural Networks for Forecasting alone!


Æ Focus only on original Multilayer Perceptrons!


Problem!

ƒ MLP most common NN architecture used


ƒ MLPs with sliding window can ONLY capture
nonlinear seasonal autoregressive processes nSAR(p,P)

ƒ BUT:
… Can model MA(q)-process through extended AR(p) window!
… Can model SARMAX-processes through recurrent NN


Agenda

Forecasting with Artificial Neural Networks

1. Forecasting?
2. Neural Networks?
3. Forecasting with Neural Networks …
1. NN models for Time Series & Dynamic Causal Prediction
2. NN experiments
3. Process of NN modelling
4. How to write a good Neural Network forecasting paper!


Time Series Prediction with Artificial Neural Networks

ƒ Which time series patterns can ANNs learn & extrapolate?


[Pegels69/Gardner85]

ƒ … ???
Æ Simulation of
Neural Network prediction of
Artificial Time Series


Time Series Demonstration – Artificial Time Series

ƒ Simulation of NN in Business Forecasting with NeuroPredictor

ƒ Experiment: Prediction of Artificial Time Series (Gaussian noise)


… Stationary Time Series
… Seasonal Time Series
… linear Trend Time Series
… Trend with additive Seasonality Time Series


Time Series Prediction with Artificial Neural Networks

ƒ Which time series patterns can ANNs learn & extrapolate?
[Pegels69/Gardner85]

[Matrix: combinations of trend and seasonality patterns — each checked ;]

Æ Neural networks can forecast ALL major time series patterns
Æ NO time-series-dependent preprocessing / integration necessary
Æ NO time-series-dependent MODEL SELECTION required!!!
Æ SINGLE MODEL APPROACH FEASIBLE!


Time Series Demonstration A - Lynx Trappings

ƒ Simulation of NN in Business Forecasting

ƒ Experiment: Lynx Trappings at the McKenzie River


ƒ 3 layered NN: (12-8-1) 12 Input units - 8 hidden units – 1 output unit
ƒ Different lag structures: t, t-1, …, t-11 (past 12 observations)
ƒ t+1 forecast Æ single step ahead forecast

Æ Benchmark Time Series


[Andrews / Hertzberg]
ƒ 114 observations
ƒ Periodicity? 8 years?


Time Series Demonstration B – Event Model

ƒ Simulation of NN in Business Forecasting

ƒ Experiment: Mouthwash Sales


ƒ 3 layered NN: (12-8-1) 12 Input units - 8 hidden units – 1 output unit
ƒ 12 input lags t, t-1, …, t-11 (past 12 observations) Æ time series prediction
ƒ t+1 forecast Æ single step ahead forecast

Æ Spurious Autocorrelations from Marketing Events


ƒ Advertisement with small lift
ƒ Price reductions with high lift


Time Series Demonstration C – Supermarket Sales

ƒ Simulation of NN in Business Forecasting

ƒ Experiment: Supermarket sales of fresh products with weather


ƒ 4 layered NN: (7-4-4-1) 7 input units – two hidden layers of 4 units – 1 output unit
ƒ Different lag structures: t, t-1, …, t-7 (past 8 observations)
ƒ t+4 forecast Æ four-step-ahead forecast


Agenda

Forecasting with Artificial Neural Networks


1. Forecasting?
2. Neural Networks?
3. Forecasting with Neural Networks …
1. NN models for Time Series & Dynamic Causal Prediction
2. NN experiments
3. Process of NN modelling
1. Preprocessing
2. Modelling NN Architecture
3. Training
4. Evaluation
4. How to write a good Neural Network forecasting paper!


Decisions in Neural Network Modelling

ƒ Data Pre-processing
… Transformation
… Scaling
… Normalizing to [0;1] or [-1;1]
NN Modelling Process — decisions require manual expert knowledge

ƒ Modelling of NN architecture
… Number of INPUT nodes
… Number of HIDDEN nodes
… Number of HIDDEN LAYERS
… Number of OUTPUT nodes
… Information processing in nodes (activation functions)
… Interconnection of nodes

ƒ Training
… Initializing of weights (how often?)
… Training method (backprop, higher order …)
… Training parameters
… Evaluation of best model (early stopping)

ƒ Application of Neural Network Model

ƒ Evaluation
… Evaluation criteria & selected dataset


Modeling Degrees of Freedom


ƒ A variety of parameters must be pre-determined for ANN forecasting:

D = [DSE, DSA]              dataset: selection, sampling
P = [C, N, S]               preprocessing: correction, normalization, scaling
A = [NI, NS, NL, NO, K, T]  architecture: no. of input nodes, hidden nodes, hidden layers,
                            output nodes, connectivity / weight matrix, activation strategy
U = [FI, FA, FO]            signal processing: input function, activation function, output function
L = [G, PT,L, IP, IN, B]    learning: choice of algorithm, learning parameters per phase & layer,
                            initialization procedure, number of initializations, stopping method & parameters
O                           objective function

Æ interactions & interdependencies between parameter choices!


Heuristics to Reduce Design Complexity

ƒ Number of hidden nodes in MLPs (in no. of input nodes n)
… 2n+1 [Lippmann87; Hecht-Nielsen90; Zhang/Patuwo/Hu98]
… 2n [Wong91]; n [Tang/Fishwick93]; n/2 [Kang91]
… 0.75n [Bailey90]; 1.5n to 3n [Kaastra/Boyd96] …
ƒ Activation function and preprocessing
… logistic in hidden & output [Tang/Fishwick93; Lachtermacher/Fuller95; Sharda/Patil92]
… hyperbolic tangent in hidden & output [Zhang/Hutchinson93; DeGroot/Wurtz91]
… linear output nodes [Lapedes/Farber87; Weigend89-91; Wong90]
ƒ ... with interdependencies!

Æ no research on the relative performance of all alternatives
Æ no empirical results to support preference for a single heuristic
Æ ADDITIONAL SELECTION PROBLEM of choosing a HEURISTIC
Æ INCREASED COMPLEXITY through interactions of heuristics
Æ AVOID the selection problem through EXHAUSTIVE ENUMERATION

Tip & Tricks in Data Sampling

ƒ Do’s and Don'ts

… Random order sampling? Yes!


… Sampling with replacement? depends / try!
… Data splitting: ESSENTIAL!!!!
„ Training & Validation for identification, parameterisation & selection
„ Testing for ex ante evaluation (ideally multiple ways / origins!)
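A sketch of such a sequential split (the proportions are illustrative, not a recommendation from the tutorial):

```python
# Sequential train/validation/test split for time series:
# the test block is held out at the end for ex ante evaluation.
def split_series(series, train=0.6, valid=0.2):
    n = len(series)
    i, j = int(n * train), int(n * (train + valid))
    return series[:i], series[i:j], series[j:]

tr, va, te = split_series(list(range(100)))
# 60 observations for training, 20 for validation, the final 20 held out for testing
```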

Æ Simulation Experiments


Data Preprocessing

ƒ Data Transformation
… Verification, correction & editing (data entry errors etc.)
… Coding of Variables
… Scaling of Variables
… Selection of independent Variables (PCA)
… Outlier removal
… Missing Value imputation

ƒ Data Coding
… binary coding of external events (dummy variables)
… n and n−1 coding show no significant accuracy difference; n-coding appears more
robust (despite issues of multicollinearity)

Æ Modification of Data to enhance accuracy & speed
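n-coding of an event variable can be sketched as follows (category names are made up for illustration):

```python
# n-coding of a categorical "event" variable into binary dummies:
# every category gets its own 0/1 column.
def n_code(values, categories):
    return [[1 if v == c else 0 for c in categories] for v in values]

events = ["none", "advert", "price-cut", "none"]
coded = n_code(events, ["none", "advert", "price-cut"])
# "advert" -> [0, 1, 0]; n-1 coding would simply drop one of the columns
```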

EVIC’05 © Sven F. Crone - www.bis-lab.com

Data Preprocessing – Variable Scaling


ƒ Scaling of variables
[Scatter plots: X2 = Income (0 to 100.000) over X1 = Hair Length (0.01 to 0.03), unscaled vs. both axes scaled to [0.0;1.0]]
… Linear interval scaling
      y = I_Lower + ( I_Upper − I_Lower ) · ( x − Min(x) ) / ( Max(x) − Min(x) )

… Interval features, e.g. „turnover“ [28.12; 70; 32; 25.05; 10.17 …]

      Linear interval scaling to target interval, e.g. [-1;1]
      e.g. x = 72, Max(x) = 119.95, Min(x) = 0, target [-1;1]
      y = −1 + (1 − (−1)) · (72 − 0) / (119.95 − 0) = −1 + 144 / 119.95 = 0.2005
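The worked example above can be reproduced with a short helper (a sketch; names are illustrative):

```python
def scale_interval(x, x_min, x_max, lower=-1.0, upper=1.0):
    """Linear interval scaling:
    y = I_Lower + (I_Upper - I_Lower) * (x - Min(x)) / (Max(x) - Min(x))."""
    return lower + (upper - lower) * (x - x_min) / (x_max - x_min)

scale_interval(72, 0, 119.95)  # -> ~0.2005, matching the worked example
```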


Data Preprocessing – Variable Scaling


ƒ Scaling of variables
[Scatter plots: X2 = Income (0 to 100.000) over X1 = Hair Length (0.01 to 0.03), unscaled vs. both axes scaled to [0.0;1.0]]
… Standardisation / Normalisation
      y = ( x − μ ) / σ

ƒ Attention: Interaction of interval with activation Function


… Logistic [0;1]
… TanH [-1;1]
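A minimal standardisation sketch, using the population standard deviation (the sample standard deviation is an equally common choice):

```python
import statistics

def standardise(values):
    """z-score standardisation: y = (x - mean) / std."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)  # population std; pick one convention
    return [(x - mu) / sigma for x in values]
```

Standardised values are unbounded, so they interact with the output activation: a logistic output cannot reproduce negative targets, which is exactly the interval issue noted above.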


Data Preprocessing – Outliers

ƒ Outliers
… extreme values
… Coding errors
… Data errors

ƒ Outlier impact on scaled variables Æ potential to bias the analysis
„ Impact on linear interval scaling (no normalisation / standardisation)
[Illustration: values 0, 10 and outlier 253 scaled linearly to [-1;+1]]

ƒ Actions
Æ Eliminate outliers (delete records)
Æ replace / impute values as missing values
Æ Binning of variable = rescaling
Æ Normalisation of variables = scaling
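One way to implement the "replace" action is to clip values beyond a k·σ band before scaling (the k·σ rule is a hypothetical choice for illustration, not prescribed on the slide):

```python
import statistics

def clip_outliers(values, k=3.0):
    """Winsorise: pull observations beyond k standard deviations
    from the mean back to the nearest admissible bound."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    lo, hi = mu - k * sigma, mu + k * sigma
    return [min(max(x, lo), hi) for x in values]
```

After clipping, linear interval scaling no longer compresses the regular observations into a narrow band.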


Data Preprocessing – Skewed Distributions

ƒ Asymmetry
of observations

ƒ …

Æ Transform data
ƒ Transformation of data (functional transformation of values)
ƒ Linearization or Normalisation
Æ Rescale (DOWNSCALE) data to allow better analysis by
ƒ Binning of data (grouping of data into groups) Æ ordinal scale!


Data Preprocessing – Data Encoding

ƒ Downscaling & Coding of variables


… metric variables Æ create bins/buckets of ordinal variables (=BINNING)
„ Create buckets of equally spaced intervals
„ Create bins of quantiles with equal frequencies

… ordinal variable of n values


Æ rescale to n or n-1 nominal binary variables

… nominal Variable of n values, e.g. {Business, Sports & Fun, Woman}


Æ Rescale to n or n-1 binary variables
„ 0 = Business Press

„ 1 = Sports & Fun

„ 2 = Woman
„ Recode as 1 of N Coding Æ 3 new bit-variables
„ 1 0 0 Æ Business Press
„ 0 1 0 Æ Sports & Fun
„ 0 0 1 Æ Woman
„ Recode 1 of N-1 Coding Æ 2 new bit-variables
„ 1 0 Æ Business Press
„ 0 1 Æ Sports & Fun
„ 0 0 Æ Woman
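The 1-of-N and 1-of-(N-1) codings above can be sketched in a few lines (function name is illustrative):

```python
def one_of_n(value, categories, drop_first=False):
    """1-of-N binary coding of a nominal variable;
    drop_first=True gives the 1-of-(N-1) variant."""
    cats = categories[1:] if drop_first else categories
    return [1 if value == c else 0 for c in cats]

cats = ["Business Press", "Sports & Fun", "Woman"]
one_of_n("Sports & Fun", cats)                     # [0, 1, 0]
one_of_n("Business Press", cats, drop_first=True)  # [0, 0]
```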


Data Preprocessing – Impute Missing Values


ƒ Missing Values
… missing feature value for instance
… some methods interpret “ ” as 0!
… others create special class for missing
… …
[Scatter plot: X2 = Income over X1 = Hair Length with missing values]

ƒ Solutions
… Missing value of interval scale Æ mean, median, etc.
… Missing value of nominal scale Æ most prominent value in feature
set
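The two solutions can be sketched together, marking missing values with None (an illustrative convention; real data may use NaN or sentinel codes):

```python
import statistics
from collections import Counter

def impute(values, scale="interval"):
    """Fill missing values (None): mean for interval-scaled features,
    most prominent value for nominal features."""
    present = [v for v in values if v is not None]
    if scale == "interval":
        fill = statistics.mean(present)      # or median, etc.
    else:
        fill = Counter(present).most_common(1)[0][0]
    return [fill if v is None else v for v in values]
```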


Tip & Tricks in Data Pre-Processing

ƒ Do’s and Don'ts

… De-Seasonalisation? NO! (maybe … you can try!)


… De-Trending / Integration? NO / depends / preprocessing!

… Normalisation? Not necessarily Æ correct outliers!


… Scaling Intervals [0;1] or [-1;1]? Both OK!
… Apply headroom in Scaling? YES!
… Interaction between scaling & preprocessing? limited
… …

Æ Simulation Experiments


Outlier correction in Neural Network Forecasts?

… Outlier correction? YES!

… Neural networks are often characterized as


„Fault tolerant and robust
„Showing graceful degradation regarding errors
Æ Fault tolerance = outlier resistance in time series prediction?

Æ Simulation Experiments


ƒ Number of OUTPUT nodes


… Given by problem domain!
ƒ Number of HIDDEN LAYERS
… 1 or 2 … depends on Information Processing in nodes
… Also depends on nonlinearity & continuity of time series
ƒ Number of HIDDEN nodes
… Trial & error … sorry!

ƒ Information processing in Nodes (Act. Functions)


… Sig-Id
… Sig-Sig (Bounded & additional nonlinear layer)
… TanH-Id
… TanH-TanH (Bounded & additional nonlinear layer)

ƒ Interconnection of Nodes
… ???


Tip & Tricks in Architecture Modelling

ƒ Do’s and Don'ts

… Number of input nodes? DEPENDS! Æ use linear ACF/PACF to start!


… Number of hidden nodes? DEPENDS! Æ evaluate each time (few)
… Number of output nodes? DEPENDS on application!

… fully or sparsely connected networks? ???


… shortcut connections? ???

… activation functions Æ logistic or hyperbolic tangent? TanH !!!


… activation function in the output layer? TanH or Identity!
… …

Æ Simulation Experiments
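The ACF starting point for choosing input lags can be sketched without a library (a hypothetical helper; in practice a statistics package's ACF/PACF routines are preferable):

```python
def acf(series, max_lag):
    """Sample autocorrelations for lags 1..max_lag: a linear first
    guess at which lagged inputs (input nodes) matter."""
    n = len(series)
    mu = sum(series) / n
    c0 = sum((x - mu) ** 2 for x in series)
    return [
        sum((series[t] - mu) * (series[t - k] - mu) for t in range(k, n)) / c0
        for k in range(1, max_lag + 1)
    ]
```

Large spikes at particular lags suggest candidate input nodes, but only as a starting point: the linear ACF cannot reveal purely nonlinear dependencies, which is why the slide says "DEPENDS".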


Agenda

Forecasting with Artificial Neural Networks


1. Forecasting?
2. Neural Networks?
3. Forecasting with Neural Networks …
1. NN models for Time Series & Dynamic Causal Prediction
2. NN experiments
3. Process of NN modelling
1. Preprocessing
2. Modelling NN Architecture
3. Training
4. Evaluation & Selection
4. How to write a good Neural Network forecasting paper!


Tip & Tricks in Network Training

ƒ Do’s and Don'ts

… Initialisations? A MUST! Minimum 5-10 times!!!


[Chart: error variation (MAE) by number of initialisations (1, 5, 10, 25, 50, 100, 200, 400, 800, 1600): lowest vs. highest error]

Æ Simulation Experiments


Tip & Tricks in Network Training & Selection

ƒ Do’s and Don'ts

… Initialisations? A MUST! Minimum 5-10 times!!!


… Selection of Training Algorithm? Backprop OK, DBD OK …
… not higher order methods!
… Parameterisation of Training Algorithm? DEPENDS on dataset!
… Use of early stopping? YES – careful with stopping criteria!
… …

… Suitable Backpropagation training parameters (to start with)


„ Learning rate 0.5 (always <1!)
„ Momentum 0.4
„ Decrease learning rate by 99%

… Early stopping on composite error of


Training & Validation

Æ Simulation Experiments
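The multiple-initialisation rule above amounts to a selection loop; a sketch where train_once is a stand-in wrapper around your actual training routine (an assumption, since no specific simulator API is given):

```python
def select_best(train_once, n_init=10):
    """Train from n_init random initialisations (seeds) and keep the
    network with the lowest validation error."""
    best_model, best_err = None, float("inf")
    for seed in range(n_init):
        model, err = train_once(seed)  # err = validation error
        if err < best_err:
            best_model, best_err = model, err
    return best_model, best_err

# toy stand-in: pretend seed 3 happens to find the best minimum
model, err = select_best(lambda s: (f"net-{s}", abs(s - 3) + 0.01))
```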


Agenda

Forecasting with Artificial Neural Networks


1. Forecasting?
2. Neural Networks?
3. Forecasting with Neural Networks …
1. NN models for Time Series & Dynamic Causal Prediction
2. NN experiments
3. Process of NN modelling
1. Preprocessing
2. Modelling NN Architecture
3. Training
4. Evaluation & Selection
4. How to write a good Neural Network forecasting paper!


Experimental Results

ƒ Experiments ranked by validation error


Rank by           Data Set Errors
valid-error       Training    Validation   Test       ANN ID
overall lowest    0,009207    0,011455     0,017760
overall highest   0,155513    0,146016     0,398628
1st               0,010850    0,011455     0,043413   39 (3579)
2nd               0,009732    0,012093     0,023367   10 (5873)
…                 …           …            …          …
25th              0,009632    0,013650     0,025886   8 (919)
…                 …           …            …          …
14400th           0,014504    0,146016     0,398628   33 (12226)

[Scatter plot: training, validation & test set errors of all 14,400 ANNs over validation-error percentile (error scale 0,00 to 0,25)]
Æ significant positive correlations
„ training & validation set
„ validation & test set
„ training & test set
Æ inconsistent errors by selection criteria
„ low validation error Æ high test error
„ higher validation error Æ lower test error


Problem: Validation Error Correlations

ƒ Correlations between dataset errors


Correlation between datasets
Data included Train - Validate Validate - Test Train - Test
14400 ANNs 0,7786** 0,9750** 0,7686**
top 1000 ANNs 0,2652** 0,0917** 0,4204**
top 100 ANNs 0,2067** 0,1276** 0,4004**

Æ validation error is a questionable selection criterion

[S-diagrams of error correlations (E_Train, E_Valid, E_Test with 20-ANN moving averages) for the top 1000 ANNs, ordered by training, validation & test error]
… decreasing correlation
… high variance on test error
… same results when ordered by training & test error


ƒ Desirable properties of an Error Measure:


… summarizes the cost consequences of the errors
… Robust to outliers
… Unaffected by units of measurement
… Stable if only a few data points are used

[Fildes, IJF, 1992; Armstrong & Collopy, IJF, 1992;
Hendry & Clements; Armstrong & Fildes, JoF, 1993, 1994]


Model Evaluation through Error Measures

ƒ forecasting k periods ahead we can assess the forecast quality


using a holdout sample

ƒ Individual forecast error


… e_t = Actual − Forecast = y_t − F_t

ƒ Mean error (ME)
… Add the individual forecast errors
      ME_t = 1/n · Σ_{k=1..n} ( Y_{t+k} − F_{t+k} )
… As positive errors cancel out negative errors,
  the ME should be approximately zero for an
  unbiased series of forecasts

ƒ Mean squared error (MSE)
… Square the individual forecast errors
… Sum the squared errors and divide by n
      MSE_t = 1/n · Σ_{k=1..n} ( Y_{t+k} − F_{t+k} )²
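Both measures are one-liners; a sketch (actuals and forecasts aligned over the same holdout horizon):

```python
def mean_error(actuals, forecasts):
    """ME: positive and negative errors cancel, so ME ~ 0
    indicates an unbiased series of forecasts."""
    return sum(y - f for y, f in zip(actuals, forecasts)) / len(actuals)

def mean_squared_error(actuals, forecasts):
    """MSE: average of the squared individual forecast errors."""
    return sum((y - f) ** 2 for y, f in zip(actuals, forecasts)) / len(actuals)
```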


Model Evaluation through Error Measures

Æavoid cancellation of positive v negative errors: absolute errors


ƒ Mean absolute error (MAE)
… Take absolute values of forecast errors
… Sum absolute values and divide by n
      MAE = 1/n · Σ_{k=1..n} | Y_{t+k} − F_{t+k} |

ƒ Mean absolute percent error (MAPE)
… Take absolute values of percent errors
… Sum percent errors and divide by n
      MAPE = 1/n · Σ_{k=1..n} | Y_{t+k} − F_{t+k} | / Y_{t+k}

ÆThis summarises the forecast error over different lead-times


ÆMay need to keep k fixed depending on the decision to be made
based on the forecast:
      MAE(k) = 1/(n−k+1) · Σ_{t=T..T+n−k} | Y_{t+k} − F_t(k) |

      MAPE(k) = 1/(n−k+1) · Σ_{t=T..T+n−k} | Y_{t+k} − F_t(k) | / Y_{t+k}
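Sketches of both absolute measures (MAPE assumes non-zero actuals):

```python
def mae(actuals, forecasts):
    """Mean absolute error."""
    return sum(abs(y - f) for y, f in zip(actuals, forecasts)) / len(actuals)

def mape(actuals, forecasts):
    """Mean absolute percent error; undefined when an actual is zero."""
    return sum(abs(y - f) / abs(y)
               for y, f in zip(actuals, forecasts)) / len(actuals)
```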


Selecting Forecasting Error Measures

ƒ MAPE & MSE are subject to upward bias by a single bad forecast
ƒ Alternative measures may be based on the median instead of the mean

ƒ Median Absolute Percentage Error


… median = middle value of a set of errors sorted in ascending order
… If the sorted data set has an even number of elements, the median
is the average of the two middle values

      MdAPE_f = Med( | e_{f,t} / y_t | × 100 )

ƒ Median Squared Error
      MdSE_f = Med( e²_{f,t} )
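The median variants swap the mean for a median; a sketch showing their robustness to a single bad forecast:

```python
import statistics

def mdape(actuals, forecasts):
    """Median absolute percentage error: robust against one bad forecast."""
    return statistics.median(abs(y - f) / abs(y) * 100
                             for y, f in zip(actuals, forecasts))

def mdse(actuals, forecasts):
    """Median squared error."""
    return statistics.median((y - f) ** 2
                             for y, f in zip(actuals, forecasts))
```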


Evaluation of Forecasting Methods

ƒ The baseline model in a forecasting competition is the Naïve
  “no change” model Æ use as a benchmark
      ŷ_{t+f|t} = y_t

ƒ Theil’s U statistic allows us to determine whether our forecasts
  outperform this baseline; our method increases accuracy
  (outperforms the naïve) if U < 1

      U = √[ Σ ( (ŷ_{t+f|t} − y_{t+f}) / y_t )² / Σ ( (y_t − y_{t+f}) / y_t )² ]
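A one-step-ahead sketch of the statistic (f[t] holds the forecast of y[t+1]; the calling convention is an assumption for illustration):

```python
import math

def theils_u(y, f):
    """Theil's U as on the slide; f[t] is the forecast of y[t+1].
    U < 1: the method outperforms the naive no-change benchmark."""
    num = sum(((f[t] - y[t + 1]) / y[t]) ** 2 for t in range(len(y) - 1))
    den = sum(((y[t] - y[t + 1]) / y[t]) ** 2 for t in range(len(y) - 1))
    return math.sqrt(num / den)

theils_u([100, 110, 90, 100], [100, 110, 90])  # naive forecast -> U == 1.0
```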


Tip & Tricks in Network Selection

ƒ Do’s and Don'ts

… Selection of Model with lowest Validation error? NOT VALID!


… Model & forecasting competition? Always multiple origin etc.!
… …

Æ Simulation Experiments


Agenda

Forecasting with Artificial Neural Networks

1. Forecasting?
2. Neural Networks?
3. Forecasting with Neural Networks …
1. NN models for Time Series & Dynamic Causal Prediction
2. NN experiments
3. Process of NN modelling
4. How to write a good Neural Network forecasting paper!


How to evaluate NN performance

Valid Experiments
ƒ Evaluate using ex ante accuracy (HOLD-OUT data)
… Use training & validation set for training & model selection
… NEVER!!! Use test data except for final evaluation of accuracy
ƒ Evaluate across multiple time series
ƒ Evaluate against benchmark methods (NAÏVE + domain!)
ƒ Evaluate using multiple & robust error measures (not MSE!)
ƒ Evaluate using multiple out-of-samples (time series origins)
Æ Evaluate as Empirical Forecasting Competition!

Reliable Results
ƒ Document all parameter choices
ƒ Document all relevant modelling decisions in process
Æ Rigorous documentation to allow re-simulation through others!


Evaluation through Forecasting Competition

ƒ Forecasting Competition
… Split up time series data Æ 2 sets PLUS multiple ORIGINS!
… Select forecasting model
… select best parameters for IN-SAMPLE DATA
… Forecast next values for DIFFERENT HORIZONS t+1, t+3, t+18?
… Evaluate error on hold out OUT-OF-SAMPLE DATA
… choose model with lowest AVERAGE error OUT-OF-SAMPLE DATA

ƒ Results Æ M3-competition
… simple methods outperform complex ones
… exponential smoothing OK
Æ neural networks not necessary
… forecasting VALUE depends on
VALUE of INVENTORY DECISION


Evaluation of Forecasting Methods

ƒ HOLD-OUT DATA Æ out of sample errors count!


… today Future …
… 2003 “today” presumed Future …
Method Jan Feb Mar Apr May Jun Jul Aug Sum Sum
Baseline Sales 90 100 110 ? ? ? ? ?
Method A 90 90 90 90 90 90 90 90
Method B 110 100 120 100 110 100 110 100
absolute error AE(A) 0 10 20 ? ? ? ? ? 30 ?
absolute error AE(B) 20 0 10 ? ? ? ? ? 10 ?

t+1 t+2 t+3


t+1 t+2 t+3
SIMULATED = EX POST Forecasts


Evaluation of Forecasting Methods

ƒ Different Forecasting horizons, emulate rolling forecast …


… 2003 “today” presumed Future …
Method Jan Feb Mar Apr May Jun Jul Aug Sum Sum
Baseline Sales 90 100 110 100 90 100 110 100
Method A 90 90 90 90 90 90 90 90
Method B 110 100 120 100 110 100 110 100
absolute error AE(A) 0 10 20 10 0 10 20 10 30 50
absolute error AE(B) 20 0 10 0 20 0 0 0 30 20

t+1 t+2 t+3

t+1 t+2
ƒ Evaluate only RELEVANT horizons …
… omit t+2 if irrelevant for planning!


Evaluation of Forecasting Methods

ƒ Single vs. Multiple origin evaluation


… 2003 “today” presumed Future …
Method Jan Feb Mar Apr May Jun Jul Aug Sum Sum
Baseline Sales 90 100 110 100 90 100 110 100
Method A 90 90 90 90 90 90 90 90
Method B 110 100 120 100 110 100 110 100
absolute error AE(A) 0 10 20 10 0 10 20 10 30 50
absolute error AE(B) 20 0 10 0 20 0 0 0 30 20

ƒ Problem of sampling variability!
… Evaluate on multiple origins
… Calculate t+1 error
… Calculate average of t+1 error
Æ GENERALIZE about forecast errors
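The multiple-origin procedure can be sketched as a rolling-origin loop (fit_and_forecast is a stand-in wrapper for the model under test, an assumption since no API is given):

```python
def rolling_origin_mae(series, fit_and_forecast, min_train, h=1):
    """Rolling-origin evaluation: refit at every origin, collect the
    t+h absolute error, and average across origins to generalise
    about forecast errors."""
    errors = []
    for origin in range(min_train, len(series) - h + 1):
        forecast = fit_and_forecast(series[:origin], h)
        errors.append(abs(series[origin + h - 1] - forecast))
    return sum(errors) / len(errors)

# naive no-change model as a stand-in benchmark:
naive = lambda history, h: history[-1]
rolling_origin_mae([1, 2, 3, 4, 5], naive, min_train=2)  # -> 1.0
```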


Software Simulators for Neural Networks

Commercial Software by Price Public Domain Software

ƒ High End ƒ Research oriented


… Neural Works Professional
… SNNS
… SPSS Clementine
… SAS Enterprise Miner … JNNS JavaSNNS
ƒ Midprice … JOONE
… Alyuda NeuroSolutions … …
… NeuroShell Predictor
… NeuroSolutions
… NeuralPower Æ FREE CD-ROM for evaluation
… PredictorPro … Data from Experiments
„ M3-competition
ƒ Research „ airline-data
… Matlab library
… R-package „ beer-data
… NeuroLab
… Software Simulators
ƒ…

Æ Consider Tashman/Hoover Tables on forecasting Software for more details


Neural Networks Software - Times Series friendly!


Alyuda Inc.

Ward Systems          AITrilogy: NeuroShell Predictor, NeuroShell Classifier, GeneHunter
                      NeuroShell 2, NeuroShell Trader, Pro, DayTrader
                      (”Let your systems learn the wisdom of age and experience”)

Attrasoft Inc. Predictor


Predictor PRO

Promised Land Braincell

Neural Planner Inc. Easy NN


Easy NN Plus

NeuroDimension Inc.   NeuroSolutions Consultants
                      NeuroSolutions for Excel
                      NeuroSolutions for Matlab
                      Trading Solutions


Neural networks Software – General Applications


Neuralware Inc Neural Works Professional II Plus

SPSS SPSS Clementine DataMining Suite

SAS SAS Enterprise Miner

… …


Further Information

ƒ Literature & websites


… NN Forecasting website www.neural-forecasting.com or www.bis-lab.com
… Google web-resources, SAS NN newsgroup FAQ
ftp://ftp.sas.com/pub/neural/FAQ.html
… BUY A BOOK!!! Only one? Get: Reed & Marks ‘Neural Smithing’

ƒ Journals
… Forecasting … rather than technical Neural Networks literature!
„ JBF – Journal of Business Forecasting
„ IJF – International Journal of Forecasting
„ JoF – Journal of Forecasting

ƒ Contact to Practitioners & Researchers


… Associations
„ IEEE NNS – IEEE Neural Network Society
„ INNS & ENNS – International & European Neural Network Society
… Conferences
„ Neural Nets: IJCNN, ICANN & ICONIP by associations (search google …)
„ Forecasting: IBF & ISF conferences!
… Newsgroups comp.ai.neural-nets
… Call Experts you know … me ;-)


Agenda

Business Forecasting with Artificial Neural Networks

1. Process of NN Modelling
2. Tips & Tricks for Improving Neural Networks based forecasts
a. Copper Price Forecasting
b. Questions & Answers and Discussion
a. Advantages & Disadvantages of Neural Networks
b. Discussion


Advantages … versus Disadvantages!


Advantages
- ANN can forecast any time series pattern (t+1!)
- without preprocessing
- no model selection needed!
- ANN offer many degrees of freedom in modeling
- Freedom in forecasting with one single model
- Complete Model Repository
  - linear models
  - nonlinear models
  - Autoregression models
  - single & multiple regression
  - Multiple step ahead

Disadvantages
- ANN can forecast any time series pattern (t+1!)
- without preprocessing
- no model selection needed!
- ANN offer many degrees of freedom in modeling
- Experience essential!
- Research not consistent
- explanation & interpretation of ANN weights IMPOSSIBLE (nonlinear combination!)
- impact of events not directly deductible
- …


Questions, Answers & Comments?


Summary Day I
- ANN can forecast any time series
pattern (t+1!)
- without preprocessing
- no model selection needed!
- ANN offer many degrees of
freedom in modeling
- Experience essential!
- Research not consistent
Sven F. Crone
crone@bis-lab.de

SLIDES & PAPERS available:
www.bis-lab.de
www.lums.lancs.ac.uk

What we can offer you:
- NN research projects with complimentary support!
- Support through MBA master thesis in mutual projects


Contact Information

Sven F. Crone
Research Associate

Lancaster University Management School


Department of Management Science, Room C54
Lancaster LA1 4YX
United Kingdom

Tel +44 (0)1524 593867


Tel +44 (0)1524 593982 direct
Tel +44 (0)7840 068119 mobile
Fax +44 (0)1524 844885

Internet www.lums.lancs.ac.uk
eMail s.crone@lancaster.ac.uk
