
Part II

Neural Networks: a short introduction

Some biological remarks

` The human brain (whose processing speed is around 100 Hz) is able to make logical inferences, but computers (processing speed around 10⁹ Hz) easily outperform it

` Conversely, the human brain can manage complex tasks (throwing a ball into a basket by trial and error, controlling an unknown machine, recognizing a face in a crowd) that are almost impossible for computers

` The reason lies in the huge number of processing units (10¹⁰ neurons) in the brain and their massive interconnection (10⁴ synapses per neuron)

` In fact, most instinctive activities of our brain are similar to approximating complex functions and do not involve logical inference

Artificial Neural Networks (ANNs)

` Neural networks are parallel processing structures composed of many elementary units that reproduce non-linear relationships learned from examples

` Neural networks are freely inspired by biological concepts

` ANNs are a wide class of logical structures, usually built as computer code

` They perform complex tasks, such as approximation of functions, associative data retrieval, and classification

` Several ANNs can be considered as non-linear and non-parametric multivariate regression methods

ANNs: structure and glossary

` ANNs are composed of units (or neurons), organized in different ways and connected by weighted connections

` The weights contain the stored information

` Each unit performs a non-linear transformation of the sum of its weighted input signals and propagates it to the inputs of the connected units

` The weight values are adjusted by the learning procedure, performed


on a set of examples named Training Set (TS)

` Once trained, ANNs can produce the correct outputs for inputs not
previously included in the TS (generalization)

` ANNs can be supervised (learn input-output relationships) or unsupervised (discover features hidden in the TS)

A generic feed-forward neural network
[Figure: a generic feed-forward network with input units, weighted connections, and output units; the information goes from the input units to the output units]

Some historical remarks

` McCulloch and Pitts (1943) proposed a neuron model composed of binary threshold devices and stochastic algorithms
` Rosenblatt (1958) devised a class of binary linear machines named
Perceptrons
` Minsky and Papert (1969) criticized Perceptrons, demonstrating that their learning capabilities were limited to applications with linearly separable patterns
` Rumelhart, Hinton, and Williams (1980s) proposed the multi-layer perceptron (MLP) with the backpropagation learning algorithm, thereby introducing the modern feed-forward ANNs
` Two other important neural networks are:
– Kohonen's Self-Organizing Map (1981) for data projection and classification
– the Hopfield network (1982) for optimization purposes

A practical classification of the main types of ANNs

Feed-forward networks
– Distributed knowledge
• Multi-Layer Perceptron (MLP)
• Recurrent Neural Networks (RNN)
– Kernel-based
• Radial Basis Function network (RBF)

Vector quantization networks
– Kohonen's Self-Organizing Map (SOM)

Some details on feed-forward neural networks
` A feed-forward ANN is a
supervised network
organized in layers.
` It can have any number of:
• layers
• units per layer
• network inputs
• network outputs
` Hidden layers are the layers
interposed between the
input and output layer
` The information always flows from the input to the output layer

Multi-Layer Perceptron

` The most popular neural network

` Feed-forward network with multiple hidden layers

` It uses non-linear smooth transfer functions (logistic, hyperbolic


tangent)

` The weights and biases are adjusted in order to minimize the


approximation error (difference between target and output)

` The MLP is trained with gradient-descent algorithms, the first of which was the error backpropagation algorithm proposed by Werbos in 1974

MLP structure

[Figure: MLP structure - input units (placeholders), hidden layers (non-linear units), output layer (linear or non-linear units)]

A non-linear unit

y = f(wx + b)

[Figure: a single unit forms the weighted sum wx + b and applies the transfer function f]

x: input
y: output
w: weight
b: bias
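As a minimal sketch (not part of the original slides), the unit above can be evaluated directly in MATLAB; the weight, bias, and input values are arbitrary:

f = @(n) 2./(1 + exp(-2*n)) - 1;   % tan-sigmoid transfer function
w = 0.8;  b = -0.3;                % weight and bias (arbitrary values)
x = 1.5;                           % input signal
y = f(w*x + b)                     % unit output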

Transfer functions

[Figure: plots of the log-sigmoid, tan-sigmoid, and linear transfer functions]
Tan-sigmoid

f(n) = 2 / (1 + e^(−2n)) − 1

Log-sigmoid

f(n) = 1 / (1 + e^(−n))

Effect of the weight (tan-sigmoid): y = f(wx + b)

[Figure: output curves for bias = 0 and weight varying from −2 to 2]

Effect of the bias (tan-sigmoid): y = f(wx + b)

[Figure: output curves for weight = 1 and bias varying from 2 to −2]
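A short MATLAB sketch (added here for illustration, not taken from the slides) that reproduces this kind of plot, sweeping first the weight and then the bias of a tan-sigmoid unit:

f = @(n) 2./(1 + exp(-2*n)) - 1;          % tan-sigmoid
x = -5:0.01:5;
subplot(1,2,1), hold on, grid on           % left panel: bias = 0, weight varies
for w = -2:0.5:2
    plot(x, f(w*x + 0));
end
title('bias = 0, weight from -2 to 2')
subplot(1,2,2), hold on, grid on           % right panel: weight = 1, bias varies
for b = 2:-0.5:-2
    plot(x, f(1*x + b));
end
title('weight = 1, bias from 2 to -2')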

MLP as a general function approximator

` The transfer function of a MLP can be viewed as a non-linear combination of non-linear functions of the inputs

` In the particular case of only one hidden layer and linear outputs, it performs a linear combination of non-linear functions

` Cybenko's theorem states the conditions for the approximation of continuous functions

` A MLP with biases, one sigmoidal hidden layer, and a linear output layer is capable of approximating any function with a finite number of discontinuities to any desired precision

Cybenko’s
Cybenko s theorem (1989)

Any continuous function of n variables F(x1, x2, …, xn) can be represented in the form:

F(x1, x2, …, xn) = Σ_{j=1..m} g_j ( Σ_{i=1..n} h_ij(x_i) )

where g_j and h_ij are continuous functions and h_ij does not depend on the function F.

The theorem states that: A feed-forward neural network with one internal layer and an arbitrary continuous sigmoidal function can approximate continuous functions with arbitrary precision, provided that the number of hidden units m is sufficiently large.

MLP and its approximating function

MLP 3-4-1 with log-sigmoidal hidden layer and linear output layer

[Figure: MLP 3-4-1 with inputs x, hidden weights w, output weights u, and output y]

y = Σ_{j=1..4} u_j / ( 1 + exp( −( Σ_{i=1..3} w_ij x_i + b_j ) ) ) + b
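The expression above can be evaluated directly; the following sketch (purely illustrative, with random weights) computes the output of a 3-4-1 MLP with log-sigmoid hidden units and a linear output unit:

x  = rand(3,1);                   % the 3 network inputs
W  = randn(4,3);                  % hidden layer weights w_ij (4 units x 3 inputs)
bh = randn(4,1);                  % hidden layer biases b_j
u  = randn(1,4);                  % output layer weights u_j
b  = randn;                       % output bias
h  = 1./(1 + exp(-(W*x + bh)));   % log-sigmoid outputs of the hidden layer
y  = u*h + b                      % linear output unit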

Example of a random 3D surface (MLP 2-3-1)

These examples are obtained by giving random values to the weights of the hidden units of a MLP with 2 inputs and 1 output

Example of a random 3D surface (MLP 2-10-1)

Example of a random 3D surface (MLP 2-80-1)

The complexity of the surface increases with the number of hidden units
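A possible way to generate surfaces of this kind (the plotting grid and the number of hidden units K are arbitrary choices):

K = 10;                                        % hidden units, e.g. MLP 2-10-1
W1 = randn(K,2);  b1 = randn(K,1);             % random hidden layer weights and biases
W2 = randn(1,K);  b2 = randn;                  % random output layer weights and bias
[X1,X2] = meshgrid(-2:0.1:2);                  % grid of the two inputs
X = [X1(:)'; X2(:)'];                          % 2 x N input matrix
H = 1./(1 + exp(-(W1*X + b1*ones(1,size(X,2)))));  % log-sigmoid hidden outputs
Y = reshape(W2*H + b2, size(X1));              % network output over the grid
surf(X1, X2, Y)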

MLP design issues

` Network size and structure

` Selection of the training set

` Normalization of the input and output data

` Choice of the learning algorithm

` Generalization

` Overfitting

Network size and structure

` There is no formalized theory available for the design of a MLP

` Often one proceeds by trial and error
` How many hidden units?
A rule of thumb states that: N hidden = N input × N output
` How many hidden layers?
It seems there is no difference, provided that there are enough hidden
units
` Tan-sigmoid or log-sigmoid transfer functions?
It seems there is no difference
` The output layer is often composed by linear units
` Special features are sometimes obtained by 'pruning' the connections (not fully connected MLPs)

Selection of the training set

` The training set (TS) is composed of input-output pairs extracted from the function to approximate

` The TS must cover the whole domain of interest, since the MLP does not have extrapolation capabilities

` According to the usual statistical criteria, the number of samples must be much greater than the number of parameters (units) of the network

Curve fitting of y = 3x² + 2, from x = 3 to x = 6

MLP 1-8-1 with tansig hidden layer and linear output unit

[Figure: the original function and the ANN's output, with the range of the training set marked]

The error increases outside the range of the training set
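A possible way to reproduce this experiment; the fitnet/train calls below assume the MATLAB Neural Network Toolbox (older releases used newff for the same purpose):

x = 3:0.05:6;  t = 3*x.^2 + 2;          % training samples of y = 3x^2 + 2 on [3, 6]
net = fitnet(8);                        % MLP 1-8-1, tansig hidden layer, linear output
net = train(net, x, t);                 % gradient-based training
xt = 0:0.05:9;                          % test range wider than the training range
plot(xt, 3*xt.^2 + 2, 'b', xt, net(xt), 'r'), grid
legend('original function','ANN output')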


Normalization of the input and output data

` The log-sigmoid output function exists in the range [0 ÷ 1], that of the tan-sigmoid in the range [-1 ÷ 1]

` The output data must be accordingly normalized

` The learning algorithm achieves better performance if the normalization range is reduced by 10% on each side ([0.1 ÷ 0.9] and [-0.9 ÷ 0.9])

` With linear output units, the normalization of the output data is not
required

` Large input values (positive or negative) make the learning algorithm work with low derivatives of the transfer function and slow down the process. So it is advisable to normalize the input data as well
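A minimal sketch of this min-max normalization (the data matrix is illustrative; the Neural Network Toolbox function mapminmax serves the same purpose):

P = 100*rand(3,50);                           % example data: 3 variables x 50 samples
Pmin = min(P,[],2);  Pmax = max(P,[],2);      % per-variable minima and maxima
span = (Pmax - Pmin)*ones(1,size(P,2));
Pn = 0.1 + 0.8*(P - Pmin*ones(1,size(P,2)))./span;    % data normalized into [0.1 ÷ 0.9]
P0 = Pmin*ones(1,size(P,2)) + (Pn - 0.1)/0.8.*span;   % inverse transformation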

Learning algorithms: the backpropagation

` The backpropagation algorithm overcame the limits of the Perceptron, allowing the training of multi-layer networks with non-linear transfer functions

` The error computed on the output layer is (back-)propagated to all previous layers and the weights are accordingly modified with:

Δw_ij(t) = −η ∂E/∂w_ij

where the term η is the learning rate

` The procedure is repeated until the overall quadratic error (squared difference between targets and outputs) falls below a user-defined value (goal)

Backpropagation training procedure

` Weights and biases are randomly initialized before the training session. Random weight initialization is the most popular method
` Each learning cycle is composed of three steps:
1. present the input vector of the sample to the network and calculate the output vector (forward step)
2. propagate the error backward from the output layer (backward step)
3. change the weights of each connection to reduce the output error attributable to that connection (adjusting step)

` When these three steps have been performed for the entire TS,
one epoch has occurred
` The goal is to converge to a near-optimal solution based on the overall squared error
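A compact sketch of this cycle for a MLP with one tansig hidden layer and a linear output (toy data and arbitrary hyperparameters; this is not the code used for the examples in these slides):

x = 0:0.02:1;  t = x.^2 + 1;  N = numel(x);   % toy training set
K = 6;  eta = 0.1;  goal = 1e-4;  maxEpochs = 20000;
W1 = 0.5*randn(K,1);  b1 = 0.5*randn(K,1);    % hidden layer (tansig units)
W2 = 0.5*randn(1,K);  b2 = 0.5*randn;         % output layer (linear unit)
for epoch = 1:maxEpochs
    % 1. forward step
    A1 = 2./(1 + exp(-2*(W1*x + b1*ones(1,N)))) - 1;   % tansig hidden outputs
    y  = W2*A1 + b2;                                   % linear output
    e  = t - y;                                        % output error
    if mean(e.^2) < goal, break, end                   % stop when the goal is reached
    % 2. backward step: propagate the error through the layers
    d2 = -2*e/N;                      % derivative of the mean squared error w.r.t. y
    d1 = (W2'*d2).*(1 - A1.^2);       % error back-propagated through the tansig units
    % 3. adjusting step: gradient descent on weights and biases
    W2 = W2 - eta*(d2*A1');   b2 = b2 - eta*sum(d2);
    W1 = W1 - eta*(d1*x');    b1 = b1 - eta*sum(d1,2);
end
plot(x, t, 'b', x, y, 'r'), grid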

Backpropagation learning cycle

Learning algorithms: speed up the backpropagation

` The original backpropagation algorithm requires many epochs to reach


the global minimum

` To reduce the learning time several improvements have been devised.


Some of these are:

ƒ Backpropagation improvements (improve the gradient descent):

– Backpropagation with adaptive learning rate

– Backpropagation with momentum

ƒ Block methods (use a matricial approach):

– Levenberg-Marquardt method

– Levenberg-Marquardt method with Bayesian regularization

Backpropagation with adaptive learning rate

` The performance of the gradient descent algorithm is improved if the learning rate η can change during the learning process:

Δw_ij(t) = −η(t) ∂E/∂w_ij
` The adaptive learning rate attempts to use a step size as large as
possible while keeping learning stable

` In this way the learning rate adapts locally to the complexity of the error surface
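A schematic sketch of the adaptation mechanism on a one-dimensional error surface (the increase/decrease factors are illustrative; MATLAB's traingda follows a similar logic):

E = @(w) (w - 3).^2;  dE = @(w) 2*(w - 3);   % toy error surface and its gradient
w = -5;  eta = 0.1;  Eold = E(w);
for it = 1:200
    wNew = w - eta*dE(w);             % tentative gradient-descent step
    if E(wNew) > 1.04*Eold            % the error grew too much:
        eta = 0.7*eta;                %   reduce the learning rate and discard the step
    else
        if E(wNew) < Eold, eta = 1.05*eta; end   % error decreased: enlarge the step
        w = wNew;  Eold = E(w);       % accept the step
    end
end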

Backpropagation with momentum

` This method uses the weight change Δw of the previous cycle as a parameter for the computation of the new weight change:

Δw_ij(t + 1) = −η ∂E/∂w_ij + α Δw_ij(t)

` The purpose is to give a "momentum" to the search for the


minimum

` This helps prevent the gradient descent algorithm from getting stuck in local minima
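The same one-dimensional toy problem with a momentum term (η and α are arbitrary values):

dE = @(w) 2*(w - 3);                  % gradient of E(w) = (w - 3)^2
w = -5;  eta = 0.05;  alpha = 0.9;  dw = 0;
for it = 1:200
    dw = -eta*dE(w) + alpha*dw;       % the new step keeps a fraction of the previous one
    w  = w + dw;                      % weight update
end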

Levenberg-Marquardt algorithm and Bayesian regularization

` The Levenberg-Marquardt algorithm is designed to approach


second-order learning speed without computing the Hessian matrix

` When the performance function has the form of a sum of squares (as in the MLP learning algorithm), the Jacobian matrix (first-order derivatives of the errors with respect to the weights and biases) can be used to approximate the Hessian matrix

` The Bayesian regularization method, coupled with Levenberg-Marquardt, minimizes a linear combination of squared errors and weights and improves the generalization capabilities of the network
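A sketch of the core Levenberg-Marquardt step on a one-parameter least-squares problem (in a full implementation the damping parameter μ is adapted at every iteration; here it is kept fixed for brevity):

x = (0:0.1:1)';  y = exp(1.7*x);      % data generated by y = exp(a*x) with a = 1.7
a = 1.0;  mu = 0.01;                  % initial guess and damping parameter
for it = 1:20
    e = y - exp(a*x);                 % residual vector
    J = -x.*exp(a*x);                 % Jacobian of the residuals, de/da
    da = -(J'*J + mu) \ (J'*e);       % LM step: (J'J + mu*I) da = -J'e
    a = a + da;                       % parameter update
end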

Learning algorithms: Cover’s theorem and Linear Separation

` The Cover’s theorem (1965) states that: A complex pattern-classification


problem,, cast in a high-dimensional
p g space
p nonlinearly,
y, is more likelyy to be
linearly separable than in a low-dimensional space, provided that the space is
not densely populated

` This theorem is the basis of a very fast learning method, applicable to MLPs
with one hidden layer and linear output layer

` Known since the beginning of neural networks, this method was from time to
time “rediscovered” with different names: Random Activation Weight Neural Net
(RAWN) by Te Braake and Van Straten (1994), Extreme Learning Machine
(ELM) by Huang Guang-Bin (2005)

` These methods, as well as the Support Vector Machines (SVM), exploit the so-called 'kernel trick': transform the input space into a higher dimensional space spanned by the hidden units, where the TS data can (hopefully) be fitted by a linear combination

Cover’s theorem: an application example

[Figure: four panels - points in the 2D original space; projection into the 3D space; linear separation plane; points separated in the original space]

Training the MLP with the kernel trick
` The kernel trick (from Cover's theorem) transforms a non-linear relationship into a linear one, which can be fitted by a linear method

` A MLP with one non-linear hidden layer (logistic, tansig, gaussian) and a linear output layer can be quickly trained using the kernel trick

` Unlike the traditional MLP, the hidden layer weights and biases are
randomly assigned and the output layer weights are found by linear
regression

` In this way there is no iterative learning: the weights are found using
matrix inversion

` The K non-linear mapping functions (units of the hidden layer) which span the high-dimensional space are called 'kernels'

Training the MLP with the kernel trick …

` If K is sufficiently large, any non-linear relationship can be adequately fitted as a linear combination of non-linear functions

` However, this method requires many more hidden units than the classic gradient-descent methods

` Linear separation is effective for fast curve fitting but less accurate in
generalization than the traditional MLP

` Let’s see a simple example in MATLAB:

Kernel trick: a simple example (1)

% non-linear relationship from x to y

x=0:.001:1; % (1 x 1001)
y=x.^2+1; % (1 x 1001)

plot(x,y), grid

This relationship exists in the


bidimensional space and cannot be
directly approximated by linear
methods

Kernel trick: a simple example (2)

Let’s use the kernel trick with K = 2 (MLP 1-2-1). Probably the number
off kernels
k l is
i ttoo llow:

K = 2;
% random choice of the weights and biases of the first layer
W1 = randn(K,size(x,1));   % K x 1 hidden layer input weight matrix
bias = randn(size(W1));    % K x 1 biases
W1 = W1 + bias;            % K x 1 weights plus biases

% the inputs are mapped by the logistic function into the new space
H = 1./(1 + exp(-W1*x));   % K x 1001 hidden layer input/output matrix

Kernel trick: a simple example (3)
Finally, let’s compute the estimated outputs by linear regression and
plot
p ot tthe
e data
data:

W2 = y*pinv(H);   % 1 x K output layer weights
ye = W2*H;        % 1 x 1001 estimated output vector
plot(x,y), grid, hold on
plot(x,ye,'r');

Nothing happened! The nearly linear approximation (red line) is very different from the original relationship (blue line)

Kernel trick: a simple example (4)
Let’s use now a higher number of kernels. The approximation with K = 3 is
now qquite satisfactory.
y Setting
g K = 6 the fitting
g is very
yggood. The same
results can be obtained using a different transfer function in the hidden
layer (tansig or gaussian)

K=3 K=6

Kernel trick: a simple example (5)
The same example using Levenberg-Marquardt learning:

[Figure: fits obtained with MLP 1-2-1 and MLP 1-3-1]


The same fitting quality is obtained with fewer units, but learning
takes much longer
Learning equations using the Kernel trick
` MLP with a single hidden layer:

y_o = Σ_{j=1..N_hidden} w_j,o · f_j( Σ_{k=1..N_input} w_k,j x_k + b_j )
` Procedure
– Randomly assign the weights w_k,j and biases b_j of the hidden layer
– Present the input matrix X (NS x (NI+1)) and compute the output matrix of the hidden layer H (NS x NH)
– Compute the weights w_j,o of the output layer by:

W = H† Y

where W (NH x NO) is the weight matrix of the output layer, Y (NS x NO) is the TS output matrix and H† (NH x NS) is the Moore-Penrose pseudoinverse of H
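A sketch of the procedure for a generic two-input data set; to stay consistent with the earlier MATLAB examples, samples are stored as columns rather than rows, so the output weights are obtained as W = Y H† instead of H† Y:

NS = 500;  X = 4*rand(2,NS) - 2;                % 2 x NS input matrix
Y = X(1,:).^2 - X(2,:).^2 + 0.1*randn(1,NS);    % 1 x NS target vector (toy surface)
NH = 40;                                        % number of hidden kernels
W1 = randn(NH,2);  b1 = randn(NH,1);            % random hidden weights and biases
H  = 1./(1 + exp(-(W1*X + b1*ones(1,NS))));     % NH x NS hidden layer output matrix
W2 = Y*pinv(H);                                 % 1 x NH output weights, by pseudoinverse
Ye = W2*H;                                      % fitted outputs
rmse = sqrt(mean((Y - Ye).^2))                  % fitting error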
Comparison between Linear separation and Levenberg-Marquardt
methods

MATLAB Peaks function

[Figure: fits of the Peaks function obtained with the Levenberg-Marquardt and Linear separation methods]

MATLAB Peaks function

Fitting with MLP 2-60-1 trained with Levenberg-Marquardt method

180 weights, learning time 338 s, MAPE = 330 %
Fitting with MLP 2-120-1 trained with Linear separation method

360 weights, learning time 5 s, MAPE = 614 %
Generalization

` Cybenko’s
Cybenko s theorem ensures the approximation of the training samples
but does not tell anything about the generalization capabilities of the
trained network

` Generalization is the ability to learn the “true” input-output relationship,


hidden in the training samples

` Training samples may be affected by noise or some regressors may be


unavailable

` So, fitting the training data too accurately could result in poor generalization (overfitting)

Overfitting

` Overfitting is caused by the use of too many units compared with the "true" unknown model

` The network reproduces the noisy samples of the training data set
instead of the true relationship

` Practically, overfitting occurs when the number of training data is small compared to the network size (overparameterization): the MLP simply copies the training data

` Overfitting is avoided if the network size fits the actual relationship to be approximated

Overfitting

o samples
— actual
— MLP

The approximation tracks the samples of the training set
Correct fitting

o samples
— actual
— MLP

The approximation fits the true unknown curve
How to avoid overfitting

` Ideally the MLP should reproduce as much as possible the “true”


relationship

` Because it is difficult to know beforehand how large a network should be for a specific application, there are two methods to avoid overfitting:
– Early stopping (cross-validation)
The training samples are divided into a Training set (actual training) and a Validation set (check of the learning quality). When the approximation error decreases on the TS but increases on the VS, the training is stopped (see the sketch after this list)
– Regularization
Overfitting occurs because the network attempts to track down every
single data point of the TS. The learning cost function is then modified
by adding a penalty term (regularization function) for the complexity of
the model (second derivatives increase)
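A sketch of early stopping with the Neural Network Toolbox (the fitnet/train interface and the divideParam fields are assumptions about the toolbox in use; the validation check is performed automatically during training):

x = 0:0.005:1;  t = x.^2 + 1 + 0.05*randn(size(x));   % noisy training samples
net = fitnet(20);                          % deliberately oversized MLP 1-20-1
net.divideParam.trainRatio = 0.70;         % training set (weight updates)
net.divideParam.valRatio   = 0.15;         % validation set (early stopping check)
net.divideParam.testRatio  = 0.15;         % test set (final assessment only)
[net, tr] = train(net, x, t);              % training stops when the validation error rises
plot(x, t, '.', x, net(x), 'r'), grid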

Kernel-based neural networks

` Kernel-based networks are supervised networks with three layers:
– the units of the hidden layer (kernels) form a vector quantization of the input space
– the input layer evaluates some measure of the distance between the input vector and the kernels
– the output layer employs the outputs of the hidden layer to approximate the relationship as a linear combination of non-linear functions

` The information is localized: each hidden unit corresponds to a receptive zone in the input space

` Kernel-based networks allow knowledge removal and incremental learning

A kernel-based neural network

[Figure: a kernel-based network - input layer (distance measure), hidden layer (kernel units), linear output layer; the kernels correspond to receptive zones in the input space]
Radial basis function networks

` A RBF is composed of:
– an input layer that computes the Euclidean distance between the input vector and the weight vectors of the hidden layer
– a hidden layer composed of units with Gaussian transfer function (radial bases), whose weight vectors form a vector quantization of the input space
– a linear output layer

` It performs a linear combination of non-linear functions of the input values

` The learning process is divided in two parts:
– first, the weight vectors of the hidden layer (centroids of the radial bases) are found by clustering techniques or randomly assigned in the input space. Alternatively, an optimal method can be employed (orthogonal least squares algorithm)
– afterwards, the weights of the output layer are computed by linear regression (Linear separation method)
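A minimal sketch of this two-step learning on a one-dimensional toy problem (the centroids are placed on a grid for simplicity, where a clustering algorithm could be used instead, and the width σ of the bases is arbitrary):

x = 0:0.01:1;  y = sin(2*pi*x);  NS = numel(x);      % toy input-output data
NC = 8;  sigma = 0.15;                               % number of bases and their width
c = linspace(0, 1, NC)';                             % centroids of the radial bases
D = abs(c*ones(1,NS) - ones(NC,1)*x);                % NC x NS distances |x - c|
H = exp(-(D/sigma).^2);                              % Gaussian radial bases
W = y*pinv(H);                                       % output weights by linear regression
ye = W*H;                                            % network output
plot(x, y, 'b', x, ye, 'r'), grid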

RBF network

[Figure: RBF network - input units (distance measure), hidden layer of radial bases, linear output layer; input x, output y]

A radial basis unit

[Figure: a radial basis unit computes the distance D(x,w) between the input vector and its weight vector, then applies the transfer function f]

x: input vector
y: output value
w: weight vector

Gaussian

f(n) = e^(−n²)

Some remarks on RBF networks

` Fast learning (if clustering techniques and the linear separation method are employed)

` Localization of the knowledge in the hidden layer (if clustering is used to find the centers of the radial bases)

` Overfitting can be avoided (fast design of the network by trial and error)

` With the usual learning methods (clustering and linear separation), the quality of the approximation is lower with respect to the MLP

Comparison between RBF network and MLP trained with LS

MATLAB Peaks function

[Figure: fits of the Peaks function obtained with the RBF network and with the MLP trained by linear separation]

Fitting with RBF 2-120-1

360 weights, learning time 8 s, MAPE = 1210 %
Fitting with MLP 2-120-1
trained with Linear separation method

360 weights, learning time 5 s, MAPE = 614 %
Vector quantization methods

` Unsupervised/supervised networks that project data from a high-dimensional space onto a bidimensional map, preserving the original topology as much as possible

` They allow the visualization of a multi-dimensional data set for a better understanding

` Also used as a clustering method

` The main VQ networks are:
– Kohonen's Self-Organizing Map (SOM) (unsupervised)
– Curvilinear Component Analysis (CCA) (unsupervised)
– Learning Vector Quantization (LVQ) method (supervised)

Self-Organizing Map (SOM)

` Kohonen’s SOM is a rectangular grid of competitive units with the


following properties:
– each unit has a fixed position in the grid
– each unit is associated to a weight vector of N components, where N is the
di
dimension
i off ththe iinputt space

` The learning procedure is:
– the weights are randomly initialized
– each TS sample is presented as input
– the units compete and the unit closest to the input vector is the winning unit
– the weights of the winning unit and those of the adjacent units (neighborhood) are adjusted to get closer to the input vector
– the procedure is repeated until the map becomes stable
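A from-scratch sketch of this procedure for a small 6 x 6 map with two-dimensional inputs (the neighborhood radius and learning rate schedules are arbitrary):

X = rand(2,500);                               % training samples (2 x NS)
gx = 6;  gy = 6;  M = gx*gy;                   % 6 x 6 grid of units
[gi, gj] = ndgrid(1:gx, 1:gy);
pos = [gi(:) gj(:)]';                          % 2 x M grid positions of the units
W = rand(2,M);                                 % random initial weight vectors
for epoch = 1:20
    r  = max(3*(1 - epoch/20), 0.5);           % shrinking neighborhood radius
    lr = 0.5*(1 - epoch/20) + 0.01;            % decreasing learning rate
    for k = randperm(size(X,2))
        x = X(:,k);                            % present one TS sample
        [~, win] = min(sum((W - x*ones(1,M)).^2));     % winning unit (closest weights)
        d2 = sum((pos - pos(:,win)*ones(1,M)).^2);     % grid distance to the winner
        h  = exp(-d2/(2*r^2));                         % neighborhood function
        W  = W + lr*(x*ones(1,M) - W).*(ones(2,1)*h);  % move winner and neighbors
    end
end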

Structure of the SOM

[Figure: SOM grid showing the winning unit, its neighbor units, and the input vector]

Properties of the SOM

` After the learning session, the units are organized in a smooth


projection of the input data (codebook)

` The original topology is preserved: separate receptive zones (bubbles)


classify similar samples

` Amongst the bubbles there are smooth areas composed of units that do not win the competition for the training data (dead units) but could win for new similar data

` This smoothing property makes the method not suited for clear-cut
separation of the clusters but very useful for data visualization

An example: classification of daily load profiles

– 105 AEM daily load profiles (January – April 1995)

– 105 x 24 input data matrix

– A map with 16 x 12 units with 24-dimension weight vectors has been


chosen

– The units of the map are randomly initialized and after the training session become a smooth topology-preserving projection of the load profiles

– A k-means clustering algorithm can be applied to partition the


codebook’s units in five clusters

– The map shows that holidays and workdays are classified in well-separated clusters
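A possible way to reproduce this kind of experiment; selforgmap/train assume the Neural Network Toolbox, kmeans assumes the Statistics Toolbox, and the data matrix below is random stand-in data, not the actual AEM profiles:

profiles = rand(105, 24);                    % illustrative stand-in for the 105 x 24 load data
net = selforgmap([16 12]);                   % 16 x 12 map
net = train(net, profiles');                 % SOM training (samples as columns)
codebook = net.IW{1};                        % 192 x 24 codebook, one weight vector per unit
idx  = kmeans(codebook, 5);                  % partition the codebook units in five clusters
hits = sum(net(profiles'), 2);               % number of samples classified by each unit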

16 x 12 codebook before training (random initialization)

16 x 12 map codebook after training

Plot of a single unit of the codebook

Load profiles classified by a map unit (“hits”)

Partitioned codebook with the “hits” of the training data

[Figure: partitioned codebook, with separate holiday and weekday regions. Data hits are the white circles, with radius proportional to the number of samples classified by that unit]

Codebook profiles superimposed on the clustered map

[Figure: codebook profiles superimposed on the clustered map (holiday and weekday regions), with a group of weekday samples classified by the same unit highlighted]

