Neural methods for Short-Term Load Forecasting - F. Piglione, 2012
Some biological remarks
Artificial Neural Networks (ANNs)
` Neural networks are freely inspired by biological concepts
ANNs: structure and glossary
` Each unit performs a non-linear transformation of the sum of its weighted input signals and propagates it to the inputs of the connected units
` Once trained, ANNs can produce the correct outputs for inputs not
previously included in the TS (generalization)
A generic feed-forward neural network
The information goes from the input units to the output units
Some historical remarks
A practical classification of the main types of ANNs
Feed-forward networks
– Distributed knowledge
• Multi-Layer Perceptron (MLP)
• Recurrent Neural Networks (RNN)
– Kernel-based
• Radial Basis Function network (RBF)
Some details on feed-forward neural networks
` A feed-forward ANN is a
supervised network
organized in layers.
` It can have any number of:
• layers
• units per layer
• network inputs
• network outputs
` Hidden layers are the layers
interposed between the
input and output layer
` The information flow always moves from the input to the output layer
Multi-Layer Perceptron
MLP structure
input units (placeholders)
hidden layers (non-linear units)
output layer (linear or non-linear units)
A non-linear unit
y = f ( wx + b )
x: input, y: output, w: weight, b: bias
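In code, such a unit is a one-liner; a minimal Python sketch (the slides' examples are in MATLAB, and tanh is chosen here as the non-linear f for illustration):

```python
import math

def unit(x, w, b, f=math.tanh):
    """One neural unit: weighted input plus bias, passed through the transfer function f."""
    return f(w * x + b)

# with w = 1 and b = 0 the unit reduces to the bare transfer function
print(unit(0.0, 1.0, 0.0))
```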
Transfer functions
Log-sigmoid
Tan-sigmoid
Linear
Tan-sigmoid

f(n) = 2 / (1 + e^(−2n)) − 1
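A quick Python check that this expression is the hyperbolic tangent in disguise:

```python
import math

def tansig(n):
    # f(n) = 2 / (1 + e^(-2n)) - 1
    return 2.0 / (1.0 + math.exp(-2.0 * n)) - 1.0

# algebraically identical to tanh(n)
for n in (-2.0, -0.5, 0.0, 0.5, 2.0):
    assert abs(tansig(n) - math.tanh(n)) < 1e-12
```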
Log-sigmoid

f(n) = 1 / (1 + e^(−n))
Effect of the weight (tan-sigmoid): y = f ( wx + b )
Effect of the bias (tan-sigmoid) y = f ( wx + b )
MLP as a general function approximator
` In the particular case of only one hidden layer and linear outputs it performs a linear combination of non-linear functions
` An MLP with biases, one sigmoidal hidden layer, and a linear output layer is capable of approximating any function with a finite number of discontinuities at any desired precision
Cybenko’s theorem (1989)

F(x1, x2, …, xn) = Σ_{j=1..m} g_j ( Σ_{i=1..n} h_ij(x_i) )
where gj and hij are continuous functions and hij does
not depend on function F.
The theorem states that: A feed-forward neural network with one
internal layer, and an arbitrary continuous sigmoidal function can
approximate continuous functions with arbitrary precision,
provided that the number of hidden units m be sufficiently large.
MLP and its approximating function
MLP 3-4-1 with log-sigmoidal hidden layer and linear output layer:

y = Σ_{j=1..4} u_j / (1 + exp(−(Σ_{i=1..3} w_ij x_i + b_j))) + b
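This sum can be evaluated directly; a plain-Python sketch of the 3-4-1 forward pass (the weight values below are placeholders, not from the slides):

```python
import math

def logsig(n):
    return 1.0 / (1.0 + math.exp(-n))

def mlp_341(x, W, b_hidden, u, b_out):
    """Forward pass of an MLP 3-4-1: log-sigmoid hidden layer, linear output.
    W is 4x3 (hidden weights w_ij), b_hidden has 4 biases b_j,
    u has 4 output weights u_j, b_out is the output bias b."""
    y = b_out
    for j in range(4):
        n_j = sum(W[j][i] * x[i] for i in range(3)) + b_hidden[j]
        y += u[j] * logsig(n_j)
    return y

# sanity check: with all-zero weights every hidden unit outputs logsig(0) = 0.5
x = [1.0, 2.0, 3.0]
W = [[0.0] * 3 for _ in range(4)]
y = mlp_341(x, W, [0.0] * 4, [1.0] * 4, 0.0)
print(y)  # 4 * 0.5 = 2.0
```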
Example of a random 3D surface (MLP 2-3-1)
Example of a random 3D surface (MLP 2-10-1)
Example of a random 3D surface (MLP 2-80-1)
MLP design issues
` Generalization
` Overfitting
Network size and structure
Selection of the training set
` The training set (TS) is composed of input-output pairs extracted from the function to approximate
Curve fitting of y = 3x² + 2, from x = 3 to x = 6
` The log-sigmoid output function exists in the range [0, 1], that of the tan-sigmoid in the range [−1, 1]
` With linear output units, the normalization of the output data is not required
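When normalization is needed (e.g. with sigmoidal output units), a simple min-max rescaling does the job; a Python sketch using the targets of the curve-fitting example above (the helper name is illustrative):

```python
def minmax_scale(data, lo=0.0, hi=1.0):
    """Rescale data linearly into [lo, hi]: [0, 1] for log-sigmoid outputs,
    [-1, 1] for tan-sigmoid outputs."""
    dmin, dmax = min(data), max(data)
    return [lo + (hi - lo) * (d - dmin) / (dmax - dmin) for d in data]

y = [3 * x**2 + 2 for x in (3, 4, 5, 6)]  # targets of y = 3x^2 + 2
print(minmax_scale(y))          # smallest target maps to 0.0, largest to 1.0
print(minmax_scale(y, -1, 1))   # tan-sigmoid range
```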
Learning algorithms: the backpropagation
Backpropagation training procedure
` When these three steps have been performed for the entire TS,
one epoch has occurred
` The goal is to converge to a near-optimal solution based on the overall squared error
Backpropagation learning cycle
Learning algorithms: speed up the backpropagation
– Levenberg-Marquardt method
Backpropagation with adaptive learning rate
Backpropagation with momentum
Δw_ij(t+1) = −η ∂E/∂w_ij + α Δw_ij(t)
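A minimal Python illustration of this update rule on a toy quadratic error (the error function E(w) = w², and the values of η and α, are arbitrary choices for the sketch):

```python
def momentum_step(w, grad, delta_prev, eta=0.1, alpha=0.9):
    """One weight update with momentum:
    delta_w(t+1) = -eta * dE/dw + alpha * delta_w(t)"""
    delta = -eta * grad + alpha * delta_prev
    return w + delta, delta

# toy example: minimize E(w) = w^2, whose gradient is 2w
w, delta = 2.0, 0.0
for _ in range(200):
    w, delta = momentum_step(w, 2.0 * w, delta)
print(abs(w))  # close to the minimum at w = 0
```

The momentum term α Δw(t) reuses part of the previous step, which damps oscillations of plain gradient descent across narrow error valleys.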
Levenberg-Marquardt algorithm and Bayesian regularization
` When the performance function has the form of a sum of squares (as in the MLP learning algorithm), the Jacobian matrix (first-order derivatives of the error with respect to the weights and biases) can be used to approximate the Hessian matrix
` The Bayesian regularization method, coupled to Levenberg-Marquardt, minimizes a linear combination of squared errors and weights and improves the generalization capabilities of the network
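A small NumPy sketch of the Hessian approximation, using linear residuals so that JᵀJ is exact (the matrix values are arbitrary illustration data):

```python
import numpy as np

# For a sum-of-squares error E(w) = 0.5 * sum(e_k(w)^2), Levenberg-Marquardt
# approximates the Hessian with J^T J, where J[k, i] = d e_k / d w_i.
A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = np.array([1.0, 2.0, 3.0])

def residuals(w):
    return A @ w - y

J = A                  # Jacobian of the residuals (linear here)
H_approx = J.T @ J     # cheap Gauss-Newton Hessian approximation

# Levenberg-Marquardt step: solve (J^T J + mu*I) step = J^T e
w = np.zeros(2)
mu = 1e-8              # small damping factor
step = np.linalg.solve(H_approx + mu * np.eye(2), J.T @ residuals(w))
w = w - step
print(np.allclose(A @ w, y, atol=1e-3))  # linear problem solved in one step
```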
Learning algorithms: Cover’s theorem and Linear Separation
` This theorem is the basis of a very fast learning method, applicable to MLPs
with one hidden layer and linear output layer
` Known since the beginning of neural networks, this method was from time to
time “rediscovered” with different names: Random Activation Weight Neural Net
(RAWN) by Te Braake and Van Straten (1994), Extreme Learning Machine
(ELM) by Huang Guang-Bin (2005)
` These methods, as well as the Support Vector Machines (SVM), exploit the so-called ‘kernel trick’: transform the input space into a higher-dimensional space spanned by the hidden units, where the TS data can (hopefully) be fitted by linear combination
Cover’s theorem: an application example
Training the MLP with the kernel trick
` The kernel trick (from Cover’s theorem) transforms a non-linear relationship into a linear one, which can be fitted by a linear method
` An MLP with one non-linear hidden layer (logistic, tansig, gaussian) and a linear output layer can be quickly trained using the kernel trick
` Unlike the traditional MLP, the hidden layer weights and biases are randomly assigned and the output layer weights are found by linear regression
` In this way there is no iterative learning: the weights are found using matrix inversion
` The K non-linear mapping functions (units of the hidden layer) which span the high-dimensional space are called ‘kernels’
Training the MLP with the kernel trick …
` However, this method requires many more hidden units than the classic gradient-descent methods
` Linear separation is effective for fast curve fitting but less accurate in
generalization than the traditional MLP
Kernel trick: a simple example (1)
x=0:.001:1; % (1 x 1001)
y=x.^2+1; % (1 x 1001)
plot(x,y), grid
Kernel trick: a simple example (2)
Let’s use the kernel trick with K = 2 (MLP 1-2-1). Probably the number of kernels is too low:
K = 2;
% random choice of the weights and biases of the first layer
W1 = randn(K,size(x,1)); % K x 1 hidden layer input weight matrix
b1 = randn(K,1);         % K x 1 biases
% the inputs are mapped by the logistic function into the new space
H = 1./(1 + exp(-(W1*x + b1*ones(1,size(x,2))))); % K x 1001 hidden layer output matrix
Kernel trick: a simple example (3)
Finally, let’s compute the estimated outputs by linear regression and plot the data:
Kernel trick: a simple example (4)
Let’s now use a higher number of kernels. The approximation with K = 3 is now quite satisfactory. Setting K = 6 the fitting is very good. The same results can be obtained using a different transfer function in the hidden layer (tansig or gaussian)
K=3 K=6
Kernel trick: a simple example (5)
The same example using Levenberg-Marquardt learning:
W = H† Y

where W (NH x NO) is the weight matrix of the output layer, Y (NS x NO) is the TS output matrix and H† (NH x NS) is the Moore-Penrose pseudoinverse of H
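A quick NumPy check of this formula with hypothetical numbers; H below is the NS x NH matrix of hidden-unit outputs (one row per sample), so that its pseudoinverse is NH x NS as in the slide’s dimensions:

```python
import numpy as np

# W = pinv(H) @ Y: the output-layer weights in one shot, no iterative learning
H = np.array([[0.2, 0.7],
              [0.5, 0.1],
              [0.9, 0.3],
              [0.4, 0.8]])              # NS = 4 samples, NH = 2 hidden units
Y = np.array([[1.0], [2.0], [3.0], [4.0]])  # NS x NO targets (NO = 1)

W = np.linalg.pinv(H) @ Y              # NH x NO output weight matrix

# same answer as ordinary least squares on H W = Y
W_ls, *_ = np.linalg.lstsq(H, Y, rcond=None)
print(np.allclose(W, W_ls))  # prints True
```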
Comparison between Linear separation and Levenberg-Marquardt
methods
Levenberg-Marquardt Linear separation
MATLAB Peaks function
Fitting with MLP 2-60-1 trained with Levenberg-Marquardt method
180 weights, learning time 338 s, MAPE = 330 %
Fitting with MLP 2-120-1 trained with Linear separation method
360 weights, learning time 5 s, MAPE = 614 %
Generalization
` Cybenko’s theorem ensures the approximation of the training samples but does not tell anything about the generalization capabilities of the trained network
` So, an overly accurate fitting of the training data could result in poor generalization (overfitting)
Overfitting
` Overfitting is caused by the use of too many units compared with the “true” unknown model
` The network reproduces the noisy samples of the training data set instead of the true relationship
` Practically, overfitting occurs when the number of training data is small compared to the network size (overparameterization): the MLP simply copies the training data
Overfitting
o samples
— actual
— MLP
The approximation tracks the samples of the training set
Correct fitting
o samples
— actual
— MLP
The approximation fits the true unknown curve
How to avoid overfitting
Kernel-based neural networks
A kernel-based neural network
input space
receptive zones
Radial basis function networks
RBF network
input units (distance measure)
linear output layer
A radial basis unit
The unit computes the distance D(x, w) and applies f to it
x: input vector, y: output value, w: weight vector
Gaussian

f(n) = e^(−n²)
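A minimal Python sketch of one radial basis unit: the Gaussian transfer function applied to the distance between the input and the unit’s weight (center) vector; the numeric values are illustrative:

```python
import numpy as np

def rbf_unit(x, w):
    """Radial basis unit: y = exp(-D(x, w)^2) with Euclidean distance D."""
    n = np.linalg.norm(x - w)  # distance measure
    return np.exp(-n**2)       # Gaussian transfer function

x = np.array([1.0, 2.0])
print(rbf_unit(x, x))               # 1.0 exactly at the center
print(rbf_unit(x, x + 10.0) < 1e-6) # essentially 0 far from it
```

The unit responds strongly only inside its receptive zone around w, which is what localizes the knowledge in the hidden layer.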
Some remarks on RBF networks
` Fast learning (if clustering techniques and the linear separation method are employed)
` Localization of the knowledge in the hidden layer (if clustering is used to find the centers of the radial bases)
` Overfitting can be avoided (fast design of the network by trial and error)
` With the usual learning methods (clustering and linear separation) the quality of the approximation is lower with respect to the MLP
Comparison between RBF network and MLP trained with LS
RBF network MLP with linear separation
Fitting with RBF 2-120-1
360 weights, learning time 8 s, MAPE = 1210 %
Fitting with MLP 2-120-1
trained with Linear separation method
360 weights, learning time 5 s, MAPE = 614 %
Vector quantization methods
` Unsupervised/supervised networks that project data from a high-dimensional space into a bidimensional map, preserving the original topology as much as possible
` Also used as a clustering method
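A toy competitive-learning step in Python may help fix the idea; this is a generic SOM update sketch, not the exact algorithm behind the maps below, and the 2x2 grid, learning rate, and Gaussian neighborhood are assumptions:

```python
import numpy as np

def som_step(codebook, grid, x, eta=0.5, radius=1.0):
    """One SOM step: find the winning unit (closest codebook vector)
    and pull it and its grid neighbors toward the input vector x."""
    dists = np.linalg.norm(codebook - x, axis=1)
    win = int(np.argmin(dists))                  # winning unit
    for u in range(len(codebook)):
        g = np.linalg.norm(grid[u] - grid[win])  # distance on the map grid
        h = np.exp(-(g / radius) ** 2)           # neighborhood function
        codebook[u] += eta * h * (x - codebook[u])
    return win

# four units on a 2x2 map, random initialization, one training vector
rng = np.random.default_rng(1)
codebook = rng.standard_normal((4, 3))
grid = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
x = np.array([1.0, 1.0, 1.0])
before = np.linalg.norm(codebook - x, axis=1).min()
win = som_step(codebook, grid, x)
after = np.linalg.norm(codebook - x, axis=1).min()
print(after < before)  # the winner moved closer to the input
```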
Self-Organizing Map (SOM)
Structure of the SOM
winning unit, neighbor units
input vector
Properties of the SOM
` Amongst the bubbles there are smooth areas composed of units that do not win the competition for the training data (dead units) but could win for new similar data
` This smoothing property makes the method not suited for clear-cut separation of the clusters but very useful for data visualization
An example: classification of daily load profiles
– The units of the map are randomly initialized and after the training session become a smooth topology-preserving projection of the load profiles
– The map shows that holidays and workdays are classified in well-separated clusters
16 x 12 codebook before training (random initialization)
16 x 12 map codebook after training
Plot of a single unit of the codebook
Load profiles classified by a map unit (“hits”)
Partitioned codebook with the “hits” of the training data
holidays
weekdays
Codebook profiles superimposed to the clustered map
holidays / weekdays
Weekday samples classified by the same unit