
WK4 – Radial Basis Function Networks

CS 476: Networks of Neural Computation
WK4 – Radial Basis Function Networks

Dr. Stathis Kasderidis
Dept. of Computer Science
University of Crete

Spring Semester, 2009



Contents

•Introduction to Time Series Analysis
•Prediction Problem
•Predicting Time Series with Neural Networks
•Radial Basis Function Network
•Conclusions



Introduction to Time Series Analysis

•There are two major classes of statistical problems:

•Classification problems (given an input x, find to which of a set of K known classes it belongs);
•Regression problems (build a functional relationship between independent and regressed variables; the former are the causes, while the latter are the effects).

•Regression problems arise due to the need for:
•Explanation
•Prediction
•Control



Introduction to Time Series Analysis II

•In a regression problem, there are two high-level issues to determine:

•The nature of the mechanism that generates the data (stochastic or deterministic). This affects which class of models will be used;
•A modelling procedure.



Introduction to Time Series Analysis III

• A modelling procedure usually includes the following steps:

1. Specification of a model:
• If it describes a function or a probability distribution;
• If it is linear or non-linear;
• If it is parametric or non-parametric;
• If it is a mixture or a single function;
• If it includes time explicitly or not;
• If it includes memory or not.



Introduction to Time Series Analysis IV

2. Preparation of the data:
• Noise reduction;
• Scaling;
• Appropriate representation for the target problem;
• Transformations;
• De-correlation (cleaning up spatial or temporal correlation structure);
• Feature extraction;
• Handling missing values.



Introduction to Time Series Analysis V

3. An estimation procedure (i.e. a framework to estimate the model parameters):
• Maximum Likelihood estimation;
• Bayesian estimation;
• (Ordinary) Least Squares (see the sketch below);
• Numerical techniques used in the estimation framework are:
• Optimisation;
• Integration;
• Graph-theoretic methods;
• etc.
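As a small illustration of the (ordinary) least-squares option above, the sketch below estimates the parameters of a simple linear model from synthetic data with numpy. The model y = 2x + 1 and all names are illustrative assumptions, not part of the lecture:

```python
import numpy as np

# Synthetic data from an assumed linear model y = 2x + 1 + noise (illustrative only)
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.5, size=100)

# Ordinary least squares: minimise ||A w - y||^2 with design matrix A = [x, 1]
A = np.column_stack([x, np.ones_like(x)])
w, residuals, rank, sv = np.linalg.lstsq(A, y, rcond=None)
print("estimated slope and intercept:", w)   # close to [2.0, 1.0]
```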



Introduction to Time Series Analysis VI

•Availability of data:
•Enough in number;
•Quality;
•Resolution.

•The resulting estimators created by the framework must be:
•Unbiased (i.e. they do not systematically differ from the true model in a statistical sense);
•Consistent (i.e. as the number of data grows the estimator approaches the true model with probability 1).



Introduction to Time Series Analysis VII

4. A model selection procedure (i.e. to select the best model). Factors include:
• Goodness of Fit (i.e. how well the model fits the given data);
• Generalisation (i.e. how well the model approximates the underlying data generation mechanism);
• Confidence Intervals.



Introduction to Time Series Analysis VIII

5. Testing a model:
• Testing the model on out-of-sample data;
• Re-iterate the modelling procedure until we produce a model with which we are satisfied;
• Compare different classes of models in order to find the best one;
• Usually we select the simplest class which describes the data well;
• A comparison framework among different classes of models is not always available.

• Neural Networks are semi-parametric, non-linear statistical modelling techniques.
The Prediction Problem

•Def: A time series, {Xt}, is a family of real-valued random variables indexed by t. The index t can take values in ℝ or ℤ.

•When the family of variables is defined at all points in time it is called continuous, otherwise it is called discrete.
•In practice we always have a discrete series, due to discrete sampling times of a continuous series or due to digitization.
•The length of a series is the time elapsed between the recorded start and finish of the series.



The Prediction Problem II

•Def: A time series, {Xt}, is called (strictly) stationary if, for any t1, t2, …, tn ∈ I, any k ∈ I and n = 1, 2, …

P{x_{t_1} ≤ x_1, x_{t_2} ≤ x_2, …, x_{t_n} ≤ x_n} = P{x_{t_1+k} ≤ x_1, x_{t_2+k} ≤ x_2, …, x_{t_n+k} ≤ x_n}

where P{·} denotes the joint distribution function of the set of random variables which appear as suffices and I is an appropriate indexing set.

•Broadly speaking, a time series is stationary if there is no systematic change in mean, no systematic change in variance, and if strictly periodic variations have been removed.
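Strict stationarity is rarely verified directly from this definition; a common informal diagnostic is to compare the mean and variance over different segments of the series. A minimal sketch of this heuristic (synthetic data, numpy only; not part of the original slides):

```python
import numpy as np

rng = np.random.default_rng(1)
noise = rng.normal(size=1000)      # white noise: stationary
walk = np.cumsum(noise)            # random walk: non-stationary

def segment_stats(series, n_segments=4):
    """Mean and standard deviation of consecutive segments of the series."""
    return [(s.mean(), s.std()) for s in np.array_split(series, n_segments)]

print("white noise:", segment_stats(noise))   # roughly constant mean/std
print("random walk:", segment_stats(walk))    # mean drifts between segments
```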
The Prediction Problem III

•In classical time series analysis we decompose a time series into the following components:
•A trend (a long-term movement);
•Fluctuations about the trend of greater or lesser regularity;
•A seasonal component;
•A residual (irregular or random effect).

•Typically, the probability theory of time series examines stationary series and investigates the residuals for further structure. However, in other cases we may be interested in capturing the trend (i.e. function approximation).



The Prediction Problem IV

•It is assumed that if the residuals do not contain any further structure, then they behave like an IID (independent and identically distributed) process, which is usually assumed to be normal. Such a stochastic process cannot be modelled further, so the analysis of the time series terminates.
•If, on the other hand, the series contains more structure, we re-iterate the analysis until the residuals do not contain any structure.
•Tests for checking the normality of the residuals include:
•Kolmogorov-Smirnov test;
•BDS test, etc.



The Prediction Problem V

•If the structure of the series is linear then we fit a linear model such as ARMA, or, if it is non-stationary, we fit an ARIMA model.
•On the other hand, for non-linear models we use ARCH, GARCH and neural network models. Typically we first fit the linear component with a linear model and then fit the residuals with a non-linear model.
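A sketch of the "linear model first, then residuals" idea above, assuming the statsmodels library is available (the exact ARIMA API may differ between versions; the AR(1) data and model orders are illustrative):

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA   # assumes statsmodels is installed

rng = np.random.default_rng(2)
eps = rng.normal(size=500)
x = np.zeros(500)
for t in range(1, 500):                          # synthetic AR(1): x_t = 0.7 x_{t-1} + eps_t
    x[t] = 0.7 * x[t - 1] + eps[t]

# Fit the linear component with an ARMA(1,0) model (ARIMA with d = 0)
result = ARIMA(x, order=(1, 0, 0)).fit()
print(result.params)

# The residuals would then be examined (or modelled) with a non-linear model
residuals = result.resid
```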



The Prediction Problem VI

•Usually a time series does not have all the desirable statistical properties, so before we start the analysis we transform it in order to achieve better results. Typical transforms include:
•Stabilising the variance;
•Making seasonal effects additive;
•Making the data normally distributed;
•Filtering (FFT, moving averages, exponential smoothing, low- and high-pass filters, etc.);
•Differencing (the preferred method for de-trending; we apply differencing until the time series becomes stationary).
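A minimal sketch of the differencing transform mentioned in the last bullet (first differences of a synthetic random walk with drift; numpy only, illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
series = np.cumsum(0.1 + rng.normal(size=1000))   # random walk with drift: non-stationary

# First differencing: x'_t = x_t - x_{t-1}; repeat (np.diff again) if still non-stationary
diff1 = np.diff(series)

print("original, mean of each half:   ", series[:500].mean(), series[500:].mean())
print("differenced, mean of each half:", diff1[:500].mean(), diff1[500:].mean())
```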



The Prediction Problem VII

•Restating the prediction problem:

"We want to construct a model with an appropriate technique which, once estimated, can give 'good' forecasts on new data. The new data are commonly some future values of the series. We want the model to predict as accurately as possible the future values of the time series, given as input some previous values of the series."



The Prediction Problem VIII

•There are three main approaches which are used to model the series prediction problem:
•A. Assume a functional relationship as a generating mechanism, e.g. X_{t+1} = F(X_t), where X_t is an appropriate vector of past values and F is the generating mechanism (see the windowing sketch after this list);
•B. Assume that the map F has multiple branches. Then the returned output represents the probability of obtaining X_{t+1} in any one of the branches of F;
•C. Divide the input into a set of classes and try to learn the map from input to classes, i.e. a classification problem.
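For approach A, the training data are usually built by sliding a window over the series: each input vector holds a few past values and the target is the next value. A minimal sketch (window length and toy series are arbitrary choices for illustration):

```python
import numpy as np

def make_windows(series, window=5):
    """Build (X, y) pairs: X holds `window` past values, y the next value."""
    X, y = [], []
    for t in range(window, len(series)):
        X.append(series[t - window:t])
        y.append(series[t])
    return np.array(X), np.array(y)

series = np.sin(np.linspace(0, 20, 200))      # toy series
X, y = make_windows(series, window=5)
print(X.shape, y.shape)                       # (195, 5) (195,)
```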



Time Series Prediction using Neural Networks

•To apply a neural network model in time series prediction we have to make choices on the following issues:
•Preparing the data:
•Transforming the data (see above);
•Handling missing values;
•Smoothing the data (if needed);
•Scaling the data (almost always a good idea!);
•Dimensionality reduction (principal component analysis, factor analysis);
•De-correlating the data;
•Extracting features (i.e. combinations of variables).
Time Series Prediction using Neural Networks II

•Representing variables:
•Continuous or discrete;
•Semantics of variables (i.e. probabilities, categories, data points, etc.);
•Distributed or atomic representation;
•Variables with little information content can be harmful for generalisation;
•In Bayesian estimation the method of Automatic Relevance Determination can be used for selecting variables;
•Selecting features;
•Capturing causal relations.



Time Series Prediction using Neural Networks III

•Discovering 'memory' in the generating process:
•Trial and error;
•Partial and auto-correlation functions (linear; see the sketch below);
•Mutual Information function (non-linear);
•Methods from Dynamical Systems theory;
•Determination of past values by fitting a model (e.g. linear) and eliminating past values with small contribution based on sensitivity.
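A minimal sketch of the sample autocorrelation function referred to above, computed directly with numpy on a synthetic AR(1) series (illustrative; significant lags suggest how many past values to feed the model):

```python
import numpy as np

def autocorrelation(x, max_lag=20):
    """Sample autocorrelation r_k for lags k = 1..max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    var = np.dot(x, x)
    return np.array([np.dot(x[:-k], x[k:]) / var for k in range(1, max_lag + 1)])

rng = np.random.default_rng(4)
eps = rng.normal(size=2000)
x = np.zeros(2000)
for t in range(1, 2000):                       # AR(1): memory of one past value
    x[t] = 0.8 * x[t - 1] + eps[t]

print(autocorrelation(x, max_lag=5))           # decays roughly as 0.8**k
```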



Time Series Prediction using Neural Networks IV

•Selecting an architecture:
•Type of training;
•Family of models;
•Transfer function;
•Memory;
•Network topology;
•Other parameters in network specification.

•Model selection:
•See discussion in WK3.



Time Series Prediction using Neural Networks V

•Determination of Confidence Intervals:
•Jackknife method (a linear approximation of the Bootstrap);
•Bootstrap;
•Moving Blocks Bootstrap;
•Bootstrap t-interval;
•Bootstrap percentile interval (see the sketch below);
•Bias-corrected and accelerated Bootstrap.
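A minimal sketch of the bootstrap percentile interval from the list above, applied to a simple statistic (the sample mean) rather than a full network to keep the example short (synthetic data, numpy only):

```python
import numpy as np

rng = np.random.default_rng(5)
data = rng.normal(loc=1.0, scale=2.0, size=200)

# Bootstrap: resample with replacement and recompute the statistic many times
boot_means = np.array([
    rng.choice(data, size=data.size, replace=True).mean()
    for _ in range(2000)
])

# Percentile interval: 2.5% and 97.5% quantiles of the bootstrap distribution
lower, upper = np.percentile(boot_means, [2.5, 97.5])
print(f"95% bootstrap percentile interval for the mean: [{lower:.3f}, {upper:.3f}]")
```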



Time Series Prediction using Neural Networks VI

• Additional Literature:
1. Masters T. (1995). Neural, Novel & Hybrid Algorithms for Time Series Prediction, Wiley.
2. Pawitan Y. (2001). In All Likelihood: Statistical Modelling and Inference Using Likelihood, Oxford University Press.
3. Chatfield C. (1989). The Analysis of Time Series: An Introduction, 4th Ed., Chapman & Hall.
4. Harvey A. (1993). Time Series Models, Harvester Wheatsheaf.
5. Efron B., Tibshirani R. (1993). An Introduction to the Bootstrap, Chapman and Hall.



Radial Basis Function Model

•There are only three layers: Input, Hidden and Output. There is only one hidden layer.



Radial Basis Function Model II

•The hidden layer provides a non-linear transformation of the input space to the hidden space, which is usually assumed to be of high enough dimension.
•The output layer combines the activations of the hidden layer in a linear way.

•Note: The RBF model owes its development to ideas of fitting hyper-surfaces to data points in a high-dimensional space.
•In Numerical Analysis, radial-basis functions were introduced for the solution of real multivariate interpolation problems.



Radial Basis Function Model III

•In the RBF model the hidden units provide a set of "functions" that constitute an arbitrary "basis" for the input patterns when they are expanded into the hidden space.

•The inspiration for the RBF model is based on Cover's theorem (1965) on the separability of patterns:

"A complex pattern-classification problem cast in a high-dimensional space nonlinearly is more likely to be linearly separable than in a low-dimensional space."

•This leads us to consider the multivariable interpolation problem in high-dimensional space:



Radial Basis Function Model IV

Given a set of N different points {x_i ∈ ℝ^{m0} | i = 1, 2, …, N} and a corresponding set of N real numbers {d_i ∈ ℝ | i = 1, 2, …, N}, find a function F: ℝ^{m0} → ℝ that satisfies the interpolation condition:

F(x_i) = d_i ,  i = 1, 2, …, N

For strict interpolation the interpolating surface, i.e. F, is constrained to pass through all the data points.
•The radial-basis function (RBF) technique consists of choosing a function F that has the following form:

F(x) = Σ_{i=1}^{N} w_i φ(||x − x_i||)



Radial Basis Function Model V

where {φ(||x − x_i||) | i = 1, 2, …, N} is a set of N arbitrary functions, known as radial-basis functions, and ||·|| denotes a norm, which is usually the Euclidean. The data points x_i ∈ ℝ^{m0} are taken to be the centers of the radial-basis functions.
•Assume that d denotes the desired response vector and w the linear weight vector. N is the size of the training set. Let Φ denote the N x N matrix with elements:

Φ_{ji} = φ(||x_j − x_i||) ,  (j, i) = 1, 2, …, N

Φ is called the interpolation matrix.



Radial Basis Function Model VI

•Thus, according to the above, we can write:

Φ w = d

The solution for the weight vector is:

w = Φ^{-1} d

assuming that Φ is non-singular. Micchelli's theorem provides assurances for a set of functions that create a non-singular matrix Φ:

Let {x_i}_{i=1}^{N} be a set of distinct points in ℝ^{m0}. Then the N x N interpolation matrix Φ is nonsingular.
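A minimal sketch of strict interpolation as described above, with Gaussian radial-basis functions and an arbitrarily chosen width (the linear system Φw = d is solved directly rather than forming Φ⁻¹; data and parameters are illustrative):

```python
import numpy as np

def gaussian(r, sigma=1.0):
    return np.exp(-r**2 / (2.0 * sigma**2))

rng = np.random.default_rng(6)
X = rng.uniform(-3, 3, size=(20, 1))           # N = 20 data points = centers
d = np.sin(X[:, 0])                            # desired responses d_i

# Interpolation matrix: Phi_ji = phi(||x_j - x_i||)
dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
Phi = gaussian(dist)

w = np.linalg.solve(Phi, d)                    # w = Phi^{-1} d

# The interpolant F(x) = sum_i w_i phi(||x - x_i||) passes through all data points
print(np.max(np.abs(Phi @ w - d)))             # ~0 up to numerical error
```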



Radial Basis Function Model VII

•Functions that are covered by Micchelli's theorem include:

•Multiquadrics:
φ(r) = (r² + c²)^{1/2} ,  c > 0, r ∈ ℝ

•Inverse Multiquadrics:
φ(r) = 1/(r² + c²)^{1/2} ,  c > 0, r ∈ ℝ

•Gaussian functions:
φ(r) = exp(−r²/(2σ²)) ,  σ > 0, r ∈ ℝ

•All that is required for a nonsingular Φ is that the points x_i be distinct.



Radial Basis Function Model VIII

•Universal Approximation Theorem for RBF Networks:

For any continuous input-output mapping function f(x) there is an RBF network with a set of centers {t_i}_{i=1}^{m1} and a common width σ > 0 such that the input-output mapping function F(x) realized by the RBF network is close to f(x) in the Lp norm, p ∈ [1, ∞].

The RBF network consists of functions F: ℝ^{m0} → ℝ represented by:

F(x) = Σ_{i=1}^{m1} w_i G(||x − t_i|| / σ)



Radial Basis Function Model IX

•Results on Sample Complexity, Computational Complexity and Generalisation Performance for RBF Networks:
•The generalisation error converges to zero only if the number of hidden units m1 increases more slowly than the size N of the training sample;
•For a given size N of the training sample, the optimum number of hidden units, m1*, behaves as:

m1* ∝ N^{1/3}

•The RBF network exhibits a rate of approximation O(1/m1) that is similar to that of an MLP with sigmoid activation functions.
Radial Basis Function Model X

• Comparison of MLP and RBF networks:

1. An RBF network has a single hidden layer. An MLP has one or more hidden layers;
2. Typically the nodes of an MLP in a hidden or output layer share the same neuronal model. On the other hand, the nodes of an RBF network in the hidden layer play a different role than those in the output layer;
3. The hidden layer of an RBF network is non-linear and the output layer is linear. Typically in an MLP both layers are non-linear;



Radial Basis Function Model XI

4. An RBF network computes, as the argument of its activation function, the Euclidean norm of the difference between the input vector and the center of the unit. In MLP networks the activation function computes the inner product of the input vector and the weight vector of the node;
5. MLPs are global approximators; RBFs are local approximators, due to the localised, decaying Gaussian (or other) function.



Learning Law for Radial Basis Networks

• To develop a learning law for RBF networks we assume that the error function has the following form:

E = (1/2) Σ_{j=1}^{N} e_j²

where N is the size of the training sample used to do the learning, and e_j is the error signal defined by:

e_j = d_j − F(x_j)
    = d_j − Σ_{i=1}^{M} w_i G(||x_j − t_i||_{C_i})



Learning Law for Radial Basis Networks II

•We need to find the free parameters w_i, t_i and Σ_i^{-1} so as to minimise E. C_i is a norm weighting matrix, i.e.:

||x||_C² = (Cx)^T (Cx) = x^T C^T C x

•We use a weighted norm matrix when the individual elements of x belong to different classes.
•To calculate the update equations we use gradient descent on the instantaneous error function E. We get the following update rules for the free parameters:



Learning Law for Radial Basis Networks III

1. Linear weights (output layer):

∂E(n)/∂w_i(n) = Σ_{j=1}^{N} e_j(n) G(||x_j − t_i(n)||_{C_i})

w_i(n+1) = w_i(n) − η1 ∂E(n)/∂w_i(n) ,  i = 1, 2, …, m1

2. Positions of centers (hidden layer):

∂E(n)/∂t_i(n) = 2 w_i(n) Σ_{j=1}^{N} e_j(n) G′(||x_j − t_i(n)||_{C_i}) Σ_i^{-1} [x_j − t_i(n)]

t_i(n+1) = t_i(n) − η2 ∂E(n)/∂t_i(n) ,  i = 1, 2, …, m1
Learning Law for Radial Basis Networks IV

3. Spreads of centers (hidden layer):

∂E(n)/∂Σ_i^{-1}(n) = −w_i(n) Σ_{j=1}^{N} e_j(n) G′(||x_j − t_i(n)||_{C_i}) Q_{ji}(n)

Q_{ji}(n) = [x_j − t_i(n)] [x_j − t_i(n)]^T

Σ_i^{-1}(n+1) = Σ_i^{-1}(n) − η3 ∂E(n)/∂Σ_i^{-1}(n)

• Note that three different learning rates η1, η2, η3 are used in the gradient descent equations.
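A training sketch for the gradient-descent rules above, simplified to spherical Gaussian units (a scalar spread σ_i instead of the full weighted-norm matrix Σ_i⁻¹); gradients are averaged over the sample for step-size stability, whereas the slides use the summed form. Data, initialisation and learning rates are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(8)
X = rng.uniform(-3, 3, size=(200, 1))
d = np.sin(X[:, 0])
N = len(X)

m1 = 10
t = X[rng.choice(N, size=m1, replace=False)].copy()   # centers t_i
w = np.zeros(m1)                                      # linear weights w_i
sigma = np.full(m1, 0.8)                              # spreads sigma_i
eta1, eta2, eta3 = 0.5, 0.1, 0.05                     # three learning rates

for epoch in range(2000):
    diff = X[:, None, :] - t[None, :, :]              # x_j - t_i, shape (N, m1, dim)
    sqdist = np.sum(diff**2, axis=2)                  # ||x_j - t_i||^2
    G = np.exp(-sqdist / (2.0 * sigma**2))            # hidden activations G_ji
    e = d - G @ w                                     # errors e_j = d_j - F(x_j)
    eG = e[:, None] * G                               # e_j * G_ji
    # Gradients of E = (1/2N) sum_j e_j^2 w.r.t. weights, centers and spreads
    grad_w = -eG.sum(axis=0) / N
    grad_t = -(w[:, None] / sigma[:, None]**2) * np.einsum('ji,jid->id', eG, diff) / N
    grad_sigma = -w * (eG * sqdist).sum(axis=0) / sigma**3 / N
    w -= eta1 * grad_w
    t -= eta2 * grad_t
    sigma -= eta3 * grad_sigma

print("final RMSE:", np.sqrt(np.mean((d - G @ w) ** 2)))
```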



Conclusions

•In time series modelling we seek to extract the maximum possible structure we can find in the series.
•We terminate the analysis of a series when the residuals do not contain any more structure, i.e. when they have an IID structure.

•Neural networks can be used as models in time series prediction.
•RBF networks provide a second paradigm of multi-layer feedforward networks, alongside MLPs.
•They are inspired by interpolation theory (numerical analysis).
•They can be trained with the gradient descent method, the same as in the MLP case.

