
WK4 – Radial Basis Function Networks

CS 476: Networks of Neural Computation
WK4 – Radial Basis Function Networks

Dr. Stathis Kasderidis
Dept. of Computer Science
University of Crete

Spring Semester, 2009



Contents

•Introduction to Time Series Analysis
•Prediction Problem
•Predicting Time Series with Neural Networks
•Radial Basis Function Network
•Conclusions



Introduction to Time Series Analysis

•There are two major classes of statistical problems:

•Classification problems (given an input x, find to which of a set of K known classes it belongs);
•Regression problems (build a functional relationship between independent and regressed variables; the former are the causes, while the latter are the effects).

•Regression problems arise due to the need for:
•Explanation
•Prediction
•Control



Introduction to Time Series Analysis II

•In a regression problem, there are two high-level issues to determine:

•The nature of the mechanism that generates the data (stochastic or deterministic). This affects which class of models will be used;
•A modelling procedure.



Introduction to Time Series Analysis III

• A modelling procedure usually includes the following steps:

1. Specification of a model:
• If it describes a function or a probability distribution;
• If it is linear or non-linear;
• If it is parametric or non-parametric;
• If it is a mixture or a single function;
• If it includes time explicitly or not;
• If it includes memory or not.



Introduction to Time Series Analysis IV

2. Preparation of the data:
• Noise reduction;
• Scaling;
• Appropriate representation for the target problem;
• Transformations;
• De-correlation (cleaning up spatial or temporal correlation structure);
• Feature extraction;
• Handling missing values.



Introduction to Time Series Analysis V

3. An estimation procedure (i.e. a framework to estimate the model parameters):
• Maximum Likelihood estimation;
• Bayesian estimation;
• (Ordinary) Least Squares (see the sketch below);
• Numerical techniques used in the estimation framework are:
• Optimisation;
• Integration;
• Graph-theoretic methods;
• etc.
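As a small illustration of the (ordinary) least-squares option above, the sketch below estimates the parameters of a simple linear model from synthetic data with numpy. The model y = 2x + 1 and all names are illustrative assumptions, not part of the lecture:

```python
import numpy as np

# Synthetic data from an assumed linear model y = 2x + 1 + noise (illustrative only)
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.5, size=100)

# Ordinary least squares: minimise ||A w - y||^2 with design matrix A = [x, 1]
A = np.column_stack([x, np.ones_like(x)])
w, residuals, rank, sv = np.linalg.lstsq(A, y, rcond=None)
print("estimated slope and intercept:", w)   # close to [2.0, 1.0]
```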



Introduction to Time Series Analysis VI

•Availability of data:
•Enough in number;
•Quality;
•Resolution.

•The resulting estimators created by the framework must be:
•Unbiased (i.e. they do not systematically differ from the true model in a statistical sense);
•Consistent (i.e. as the number of data grows the estimator approaches the true model with probability 1).



Introduction to Time Series Analysis VII

4. A model selection procedure (i.e. to select the best model). Factors include:
• Goodness of Fit (i.e. how well the model fits the given data);
• Generalisation (i.e. how well the model approximates the underlying data generation mechanism);
• Confidence Intervals.



Introduction to Time Series Analysis VIII

5. Testing a model:
• Testing the model on out-of-sample data;
• Re-iterate the modelling procedure until we produce a model with which we are satisfied;
• Compare different classes of models in order to find the best one;
• Usually we select the simplest class which describes the data well;
• A comparison framework among different classes of models is not always available.

• Neural Networks are semi-parametric, non-linear statistical modelling techniques.
The Prediction Problem

•Def: A time series, {Xt}, is a family of real-valued random variables indexed by t. The index t can take values in ℝ or ℤ.

•When the family of variables is defined at all points in time it is called continuous, otherwise it is called discrete.
•In practice we always have a discrete series, due to discrete sampling times of a continuous series or due to digitization.
•The length of a series is the time elapsed between the recorded start and finish of the series.



The Prediction Problem II

•Def: A time series, {Xt}, is called (strictly) stationary if, for any t1, t2, …, tn ∈ I, any k ∈ I and n = 1, 2, …

P{x_{t_1} ≤ x_1, x_{t_2} ≤ x_2, …, x_{t_n} ≤ x_n} = P{x_{t_1+k} ≤ x_1, x_{t_2+k} ≤ x_2, …, x_{t_n+k} ≤ x_n}

where P{·} denotes the joint distribution function of the set of random variables which appear as suffices and I is an appropriate indexing set.

•Broadly speaking, a time series is stationary if there is no systematic change in mean, no systematic change in variance, and if strictly periodic variations have been removed.
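Strict stationarity is rarely verified directly from this definition; a common informal diagnostic is to compare the mean and variance over different segments of the series. A minimal sketch of this heuristic (synthetic data, numpy only; not part of the original slides):

```python
import numpy as np

rng = np.random.default_rng(1)
noise = rng.normal(size=1000)      # white noise: stationary
walk = np.cumsum(noise)            # random walk: non-stationary

def segment_stats(series, n_segments=4):
    """Mean and standard deviation of consecutive segments of the series."""
    return [(s.mean(), s.std()) for s in np.array_split(series, n_segments)]

print("white noise:", segment_stats(noise))   # roughly constant mean/std
print("random walk:", segment_stats(walk))    # mean drifts between segments
```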
The Prediction Problem III

•In classical time series analysis we decompose a time series into the following components:
•A trend (a long-term movement);
•Fluctuations about the trend of greater or lesser regularity;
•A seasonal component;
•A residual (irregular or random effect).

•Typically, the probability theory of time series examines stationary series and investigates the residuals for further structure. However, in other cases we may be interested in capturing the trend (i.e. function approximation).



The Prediction Problem IV

•It is assumed that if the residuals do not contain any further structure, then they behave like an IID (independent and identically distributed) process, which is usually assumed to be normal. Such a stochastic process cannot be modelled further, so the analysis of the time series terminates.
•If, on the other hand, the series contains more structure, we re-iterate the analysis until the residuals do not contain any structure.
•Tests for checking the normality of the residuals include:
•Kolmogorov-Smirnov test;
•BDS test, etc.



The Prediction Problem V

•If the structure of the series is linear then we fit a linear model such as ARMA, or, if it is non-stationary, we fit an ARIMA model.
•On the other hand, for non-linear models we use ARCH, GARCH and neural network models. Typically we first fit the linear component with a linear model and then fit the residuals with a non-linear model.
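A sketch of the "linear model first, then residuals" idea above, assuming the statsmodels library is available (the exact ARIMA API may differ between versions; the AR(1) data and model orders are illustrative):

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA   # assumes statsmodels is installed

rng = np.random.default_rng(2)
eps = rng.normal(size=500)
x = np.zeros(500)
for t in range(1, 500):                          # synthetic AR(1): x_t = 0.7 x_{t-1} + eps_t
    x[t] = 0.7 * x[t - 1] + eps[t]

# Fit the linear component with an ARMA(1,0) model (ARIMA with d = 0)
result = ARIMA(x, order=(1, 0, 0)).fit()
print(result.params)

# The residuals would then be examined (or modelled) with a non-linear model
residuals = result.resid
```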



The Prediction Problem VI

•Usually a time series does not have all the desirable statistical properties, so before we start the analysis we transform it in order to achieve better results. Typical transforms include:
•Stabilising the variance;
•Making seasonal effects additive;
•Making the data normally distributed;
•Filtering (FFT, moving averages, exponential smoothing, low- and high-pass filters, etc.);
•Differencing (the preferred method for de-trending; we apply differencing until the time series becomes stationary).
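A minimal sketch of the differencing transform mentioned in the last bullet (first differences of a synthetic random walk with drift; numpy only, illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
series = np.cumsum(0.1 + rng.normal(size=1000))   # random walk with drift: non-stationary

# First differencing: x'_t = x_t - x_{t-1}; repeat (np.diff again) if still non-stationary
diff1 = np.diff(series)

print("original, mean of each half:   ", series[:500].mean(), series[500:].mean())
print("differenced, mean of each half:", diff1[:500].mean(), diff1[500:].mean())
```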



The Prediction Problem VII

•Restating the prediction problem:

"We want to construct a model with an appropriate technique which, once estimated, can give 'good' forecasts on new data. The new data are commonly some future values of the series. We want the model to predict as accurately as possible the future values of the time series, given as input some previous values of the series."



The Prediction Problem VIII

•There are three main approaches which are used to model the series prediction problem:
•A. Assume a functional relationship as a generating mechanism, e.g. X_{t+1} = F(X_t), where X_t is an appropriate vector of past values and F is the generating mechanism (see the windowing sketch after this list);
•B. Assume that the map F has multiple branches. Then the returned output represents the probability of obtaining X_{t+1} in any one of the branches of F;
•C. Divide the input into a set of classes and try to learn the map from input to classes, i.e. a classification problem.
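For approach A, the training data are usually built by sliding a window over the series: each input vector holds a few past values and the target is the next value. A minimal sketch (window length and toy series are arbitrary choices for illustration):

```python
import numpy as np

def make_windows(series, window=5):
    """Build (X, y) pairs: X holds `window` past values, y the next value."""
    X, y = [], []
    for t in range(window, len(series)):
        X.append(series[t - window:t])
        y.append(series[t])
    return np.array(X), np.array(y)

series = np.sin(np.linspace(0, 20, 200))      # toy series
X, y = make_windows(series, window=5)
print(X.shape, y.shape)                       # (195, 5) (195,)
```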



Time Series Prediction using Neural Networks

•To apply a neural network model in time series prediction we have to make choices on the following issues:
•Preparing the data:
•Transforming the data (see above);
•Handling missing values;
•Smoothing the data (if needed);
•Scaling the data (almost always a good idea!);
•Dimensionality reduction (principal component analysis, factor analysis);
•De-correlating the data;
•Extracting features (i.e. combinations of variables).
Time Series Prediction using Neural Networks II

•Representing variables:
•Continuous or discrete;
•Semantics of variables (i.e. probabilities, categories, data points, etc.);
•Distributed or atomic representation;
•Variables with little information content can be harmful for generalisation;
•In Bayesian estimation the method of Automatic Relevance Determination can be used for selecting variables;
•Selecting features;
•Capturing causal relations.



Time Series Prediction using Neural Networks III

•Discovering 'memory' in the generating process:
•Trial and error;
•Partial and auto-correlation functions (linear; see the sketch below);
•Mutual Information function (non-linear);
•Methods from Dynamical Systems theory;
•Determination of past values by fitting a model (e.g. linear) and eliminating past values with small contribution based on sensitivity.
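A minimal sketch of the sample autocorrelation function referred to above, computed directly with numpy on a synthetic AR(1) series (illustrative; significant lags suggest how many past values to feed the model):

```python
import numpy as np

def autocorrelation(x, max_lag=20):
    """Sample autocorrelation r_k for lags k = 1..max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    var = np.dot(x, x)
    return np.array([np.dot(x[:-k], x[k:]) / var for k in range(1, max_lag + 1)])

rng = np.random.default_rng(4)
eps = rng.normal(size=2000)
x = np.zeros(2000)
for t in range(1, 2000):                       # AR(1): memory of one past value
    x[t] = 0.8 * x[t - 1] + eps[t]

print(autocorrelation(x, max_lag=5))           # decays roughly as 0.8**k
```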



Time Series Prediction using Neural Networks IV

•Selecting an architecture:
•Type of training;
•Family of models;
•Transfer function;
•Memory;
•Network topology;
•Other parameters in network specification.

•Model selection:
•See discussion in WK3.



Time Series Prediction using Neural Networks V

•Determination of Confidence Intervals:
•Jackknife method (a linear approximation of the Bootstrap);
•Bootstrap;
•Moving Blocks Bootstrap;
•Bootstrap t-interval;
•Bootstrap percentile interval (see the sketch below);
•Bias-corrected and accelerated Bootstrap.
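A minimal sketch of the bootstrap percentile interval from the list above, applied to a simple statistic (the sample mean) rather than a full network to keep the example short (synthetic data, numpy only):

```python
import numpy as np

rng = np.random.default_rng(5)
data = rng.normal(loc=1.0, scale=2.0, size=200)

# Bootstrap: resample with replacement and recompute the statistic many times
boot_means = np.array([
    rng.choice(data, size=data.size, replace=True).mean()
    for _ in range(2000)
])

# Percentile interval: 2.5% and 97.5% quantiles of the bootstrap distribution
lower, upper = np.percentile(boot_means, [2.5, 97.5])
print(f"95% bootstrap percentile interval for the mean: [{lower:.3f}, {upper:.3f}]")
```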



Time Series Prediction using Neural Networks VI

• Additional Literature:
1. Masters T. (1995). Neural, Novel & Hybrid Algorithms for Time Series Prediction, Wiley.
2. Pawitan Y. (2001). In All Likelihood: Statistical Modelling and Inference Using Likelihood, Oxford University Press.
3. Chatfield C. (1989). The Analysis of Time Series: An Introduction, 4th Ed., Chapman & Hall.
4. Harvey A. (1993). Time Series Models, Harvester Wheatsheaf.
5. Efron B., Tibshirani R. (1993). An Introduction to the Bootstrap, Chapman and Hall.



Radial Basis Function Model

•There are only three layers: Input, Hidden and Output. There is only one hidden layer.



Radial Basis Function Model II

•The hidden layer provides a non-linear transformation of the input space to the hidden space, which is usually assumed to be of high enough dimension.
•The output layer combines the activations of the hidden layer in a linear way.

•Note: The RBF model owes its development to ideas of fitting hyper-surfaces to data points in a high-dimensional space.
•In Numerical Analysis, radial-basis functions were introduced for the solution of real multivariate interpolation problems.



Radial Basis Function Model III

•In the RBF model the hidden units provide a set of "functions" that constitute an arbitrary "basis" for the input patterns when they are expanded into the hidden space.

•The inspiration for the RBF model is based on Cover's theorem (1965) on the separability of patterns:

"A complex pattern-classification problem cast in a high-dimensional space nonlinearly is more likely to be linearly separable than in a low-dimensional space."

•This leads us to consider the multivariable interpolation problem in high-dimensional space:



Radial Basis Function Model IV

Given a set of N different points {x_i ∈ ℝ^{m0} | i = 1, 2, …, N} and a corresponding set of N real numbers {d_i ∈ ℝ | i = 1, 2, …, N}, find a function F: ℝ^{m0} → ℝ that satisfies the interpolation condition:

F(x_i) = d_i ,  i = 1, 2, …, N

For strict interpolation the interpolating surface, i.e. F, is constrained to pass through all the data points.
•The radial-basis function (RBF) technique consists of choosing a function F that has the following form:

F(x) = Σ_{i=1}^{N} w_i φ(||x − x_i||)



Radial Basis Function Model V

where {φ(||x − x_i||) | i = 1, 2, …, N} is a set of N arbitrary functions, known as radial-basis functions, and ||·|| denotes a norm, which is usually the Euclidean. The data points x_i ∈ ℝ^{m0} are taken to be the centers of the radial-basis functions.
•Assume that d denotes the desired response vector and w the linear weight vector. N is the size of the training set. Let Φ denote the N x N matrix with elements:

Φ_{ji} = φ(||x_j − x_i||) ,  (j, i) = 1, 2, …, N

Φ is called the interpolation matrix.



Radial Basis Function Model VI

•Thus, according to the above, we can write:

Φ w = d

The solution for the weight vector is:

w = Φ^{-1} d

assuming that Φ is non-singular. Micchelli's theorem provides assurances for a set of functions that create a non-singular matrix Φ:

Let {x_i}_{i=1}^{N} be a set of distinct points in ℝ^{m0}. Then the N x N interpolation matrix Φ is nonsingular.
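A minimal sketch of strict interpolation as described above, with Gaussian radial-basis functions and an arbitrarily chosen width (the linear system Φw = d is solved directly rather than forming Φ⁻¹; data and parameters are illustrative):

```python
import numpy as np

def gaussian(r, sigma=1.0):
    return np.exp(-r**2 / (2.0 * sigma**2))

rng = np.random.default_rng(6)
X = rng.uniform(-3, 3, size=(20, 1))           # N = 20 data points = centers
d = np.sin(X[:, 0])                            # desired responses d_i

# Interpolation matrix: Phi_ji = phi(||x_j - x_i||)
dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
Phi = gaussian(dist)

w = np.linalg.solve(Phi, d)                    # w = Phi^{-1} d

# The interpolant F(x) = sum_i w_i phi(||x - x_i||) passes through all data points
print(np.max(np.abs(Phi @ w - d)))             # ~0 up to numerical error
```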



Radial Basis Function Model VII

•Functions that are covered by Micchelli's theorem include:

•Multiquadrics:
φ(r) = (r² + c²)^{1/2} ,  c > 0, r ∈ ℝ

•Inverse Multiquadrics:
φ(r) = 1/(r² + c²)^{1/2} ,  c > 0, r ∈ ℝ

•Gaussian functions:
φ(r) = exp(−r²/(2σ²)) ,  σ > 0, r ∈ ℝ

•All that is required for a nonsingular Φ is that the points x_i be distinct.



Radial Basis Function Model VIII

•Universal Approximation Theorem for RBF Networks:

For any continuous input-output mapping function f(x) there is an RBF network with a set of centers {t_i}_{i=1}^{m1} and a common width σ > 0 such that the input-output mapping function F(x) realized by the RBF network is close to f(x) in the Lp norm, p ∈ [1, ∞].

The RBF network consists of functions F: ℝ^{m0} → ℝ represented by:

F(x) = Σ_{i=1}^{m1} w_i G(||x − t_i|| / σ)



Radial Basis Function Model IX

•Results on Sample Complexity, Computational Complexity and Generalisation Performance for RBF Networks:
•The generalisation error converges to zero only if the number of hidden units m1 increases more slowly than the size N of the training sample;
•For a given size N of the training sample, the optimum number of hidden units, m1*, behaves as:

m1* ∝ N^{1/3}

•The RBF network exhibits a rate of approximation O(1/m1) that is similar to that of an MLP with sigmoid activation functions.
Radial Basis Function Model X

• Comparison of MLP and RBF networks:

1. An RBF network has a single hidden layer. An MLP has one or more hidden layers;
2. Typically the nodes of an MLP in a hidden or output layer share the same neuronal model. On the other hand, the nodes of an RBF network in the hidden layer play a different role than those in the output layer;
3. The hidden layer of an RBF network is non-linear and the output layer is linear. Typically in an MLP both layers are non-linear;



Radial Basis Function Model XI

4. An RBF network computes, as the argument of its activation function, the Euclidean norm of the difference between the input vector and the center of the unit. In MLP networks the activation function computes the inner product of the input vector and the weight vector of the node;
5. MLPs are global approximators; RBFs are local approximators, due to the localised, decaying Gaussian (or other) function.



Learning Law for Radial Basis Networks

• To develop a learning law for RBF networks we assume that the error function has the following form:

E = (1/2) Σ_{j=1}^{N} e_j²

where N is the size of the training sample used to do the learning, and e_j is the error signal defined by:

e_j = d_j − F(x_j)
    = d_j − Σ_{i=1}^{M} w_i G(||x_j − t_i||_{C_i})



Learning Law for Radial Basis Networks II

•We need to find the free parameters w_i, t_i and Σ_i^{-1} so as to minimise E. C_i is a norm weighting matrix, i.e.:

||x||_C² = (Cx)^T (Cx) = x^T C^T C x

•We use a weighted norm matrix when the individual elements of x belong to different classes.
•To calculate the update equations we use gradient descent on the instantaneous error function E. We get the following update rules for the free parameters:



Learning Law for Radial Basis Networks III

1. Linear weights (output layer):

∂E(n)/∂w_i(n) = Σ_{j=1}^{N} e_j(n) G(||x_j − t_i(n)||_{C_i})

w_i(n+1) = w_i(n) − η1 ∂E(n)/∂w_i(n) ,  i = 1, 2, …, m1

2. Positions of centers (hidden layer):

∂E(n)/∂t_i(n) = 2 w_i(n) Σ_{j=1}^{N} e_j(n) G′(||x_j − t_i(n)||_{C_i}) Σ_i^{-1} [x_j − t_i(n)]

t_i(n+1) = t_i(n) − η2 ∂E(n)/∂t_i(n) ,  i = 1, 2, …, m1
Learning Law for Radial Basis Networks IV

3. Spreads of centers (hidden layer):

∂E(n)/∂Σ_i^{-1}(n) = −w_i(n) Σ_{j=1}^{N} e_j(n) G′(||x_j − t_i(n)||_{C_i}) Q_{ji}(n)

Q_{ji}(n) = [x_j − t_i(n)] [x_j − t_i(n)]^T

Σ_i^{-1}(n+1) = Σ_i^{-1}(n) − η3 ∂E(n)/∂Σ_i^{-1}(n)

• Note that three different learning rates η1, η2, η3 are used in the gradient descent equations.
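A training sketch for the gradient-descent rules above, simplified to spherical Gaussian units (a scalar spread σ_i instead of the full weighted-norm matrix Σ_i⁻¹); gradients are averaged over the sample for step-size stability, whereas the slides use the summed form. Data, initialisation and learning rates are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(8)
X = rng.uniform(-3, 3, size=(200, 1))
d = np.sin(X[:, 0])
N = len(X)

m1 = 10
t = X[rng.choice(N, size=m1, replace=False)].copy()   # centers t_i
w = np.zeros(m1)                                      # linear weights w_i
sigma = np.full(m1, 0.8)                              # spreads sigma_i
eta1, eta2, eta3 = 0.5, 0.1, 0.05                     # three learning rates

for epoch in range(2000):
    diff = X[:, None, :] - t[None, :, :]              # x_j - t_i, shape (N, m1, dim)
    sqdist = np.sum(diff**2, axis=2)                  # ||x_j - t_i||^2
    G = np.exp(-sqdist / (2.0 * sigma**2))            # hidden activations G_ji
    e = d - G @ w                                     # errors e_j = d_j - F(x_j)
    eG = e[:, None] * G                               # e_j * G_ji
    # Gradients of E = (1/2N) sum_j e_j^2 w.r.t. weights, centers and spreads
    grad_w = -eG.sum(axis=0) / N
    grad_t = -(w[:, None] / sigma[:, None]**2) * np.einsum('ji,jid->id', eG, diff) / N
    grad_sigma = -w * (eG * sqdist).sum(axis=0) / sigma**3 / N
    w -= eta1 * grad_w
    t -= eta2 * grad_t
    sigma -= eta3 * grad_sigma

print("final RMSE:", np.sqrt(np.mean((d - G @ w) ** 2)))
```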



Conclusions

•In time series modelling we seek to extract the maximum possible structure we can find in the series.
•We terminate the analysis of a series when the residuals do not contain any more structure, i.e. when they have an IID structure.

•Neural networks can be used as models in time series prediction.
•RBF networks provide a second paradigm of multi-layer feedforward networks, alongside MLPs.
•They are inspired by interpolation theory (numerical analysis).
•They can be trained with the gradient descent method, the same as in the MLP case.

