
## Introduction

A kind of supervised neural network.

- Design of neural networks as a curve-fitting (approximation) problem
- Learning: find a surface that best fits the given training data
- Generalization: use of this multidimensional surface to interpolate test data

[Figure: training data points (×) fitted by a surface; generalization (○) evaluates the fitted surface at new points]
The network output is a weighted sum of nonlinear hidden functions:

$$y = \sum_{i=1}^{m_1} w_i \,\varphi_i(X), \qquad \varphi_i(X):\ \text{nonlinear function}$$

[Figure: network diagram with inputs $x_1, \dots, x_N$, hidden functions $\varphi_1, \varphi_2, \varphi_3, \dots, \varphi_{m_1}$, weights $w_1, \dots, w_{m_1}$, and output $y$]
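The output equation above can be sketched as a small forward pass (a minimal sketch; the Gaussian form of $\varphi_i$, the centers, and all numeric values are illustrative assumptions, not from the slides):

```python
import numpy as np

def rbf_forward(X, centers, weights, sigma=1.0):
    """Compute y = sum_i w_i * phi_i(X) with Gaussian hidden functions
    phi_i(X) = exp(-||X - c_i||^2 / (2 * sigma**2))."""
    d2 = np.sum((centers - X) ** 2, axis=1)  # squared distance to each center
    phi = np.exp(-d2 / (2 * sigma ** 2))     # hidden-layer activations
    return float(weights @ phi)              # output layer: linear combination

# Illustrative network: m1 = 3 hidden units over a 2-D input
centers = np.array([[0.0, 0.0], [1.0, 1.0], [1.0, 0.0]])
weights = np.array([1.0, -1.0, 0.5])
y = rbf_forward(np.array([0.5, 0.5]), centers, weights)
```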

- Input layer: the input vector $x_1, \dots, x_N$
- Hidden layer: hidden units provide a set of basis functions. Casting the input into a higher dimension makes it more linearly separable [Cover's Theorem]. Increasing the dimension $N \to m_1$ makes the data more likely to be separated linearly.
- Output layer: the output is a linear combination of the hidden functions.

Examples of separating surfaces with an increasing number of basis functions:

$$\varphi(X) = [x_1, x_2]: \quad a_1 x_1 + a_2 x_2 + a_0 = 0 \quad (\#\text{ of basis functions}: 2)$$

$$\varphi(X) = [x_1, x_2, x_1^2, x_2^2]: \quad a_1 x_1 + a_2 x_2 + a_3 x_1^2 + a_4 x_2^2 + a_0 = 0 \quad (\#\text{ of basis functions}: 4)$$

$$\varphi(X) = [x_1, x_2, x_1 x_2, x_1^2, x_2^2]: \quad a_1 x_1 + a_2 x_2 + a_3 x_1^2 + a_4 x_2^2 + a_5 x_1 x_2 + a_0 = 0 \quad (\#\text{ of basis functions}: 5)$$
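The three feature maps can be written out directly (a minimal sketch; the function names are hypothetical):

```python
def phi_linear(x1, x2):
    """2 basis functions -> the separating surface is a straight line."""
    return [x1, x2]

def phi_quadratic(x1, x2):
    """4 basis functions -> adds pure quadratic terms."""
    return [x1, x2, x1 ** 2, x2 ** 2]

def phi_conic(x1, x2):
    """5 basis functions -> a general conic-section boundary."""
    return [x1, x2, x1 * x2, x1 ** 2, x2 ** 2]

counts = [len(phi_linear(1, 2)), len(phi_quadratic(1, 2)), len(phi_conic(1, 2))]
```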
## Cover's Theorem on Separability (1)

A pattern classification problem cast nonlinearly in a high-dimensional space is more likely to be linearly separable than in a low-dimensional space.

Given $N$ points $X_1, X_2, \dots, X_N$ and hidden functions $\varphi(X) = [\varphi_1(X), \varphi_2(X), \dots, \varphi_{m_1}(X)]^T$, a dichotomy $\{\mathcal{X}^+, \mathcal{X}^-\}$ of the points is $\varphi$-separable if there exists a weight vector $W$ such that

$$W^T \varphi(X) > 0, \quad X \in \mathcal{X}^+$$
$$W^T \varphi(X) < 0, \quad X \in \mathcal{X}^-$$

The probability that a particular dichotomy picked at random is $\varphi$-separable is

$$P(N, m_1) = \left(\frac{1}{2}\right)^{N-1} \sum_{m=0}^{m_1 - 1} \binom{N-1}{m}$$
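Cover's formula is easy to evaluate numerically (a minimal sketch; `p_separable` is a hypothetical name):

```python
from math import comb

def p_separable(N, m1):
    """P(N, m1) = (1/2)**(N-1) * sum_{m=0}^{m1-1} C(N-1, m):
    probability that a random dichotomy of N points in general
    position is phi-separable with m1 degrees of freedom."""
    return 0.5 ** (N - 1) * sum(comb(N - 1, m) for m in range(m1))

# Every dichotomy is separable while N <= m1, and P drops to 1/2 at N = 2*m1
p_all = p_separable(4, 4)   # 1.0
p_half = p_separable(8, 4)  # 0.5
```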

## Cover's Theorem in the Case of Polynomial Functions

Let the hidden functions be polynomials; the separating surfaces are then $r$-th order rational varieties:

$$\sum_{0 \le i_1 \le i_2 \le \dots \le i_r \le m_0} a_{i_1 i_2 \cdots i_r}\, x_{i_1} x_{i_2} \cdots x_{i_r} = 0, \qquad x_0 = 1$$

The probability that a particular dichotomy picked at random is separable is $P(N, m_1)$, where $N$ is the number of data points and $m_1$ is the number of degrees of freedom of the separating surface. If $m_1$ increases, the probability of separability increases.
## Example: XOR Problem

XOR data are not linearly separable. Apply a nonlinear transformation using two Gaussian hidden functions:

$$\varphi_1(x) = \exp(-\lVert x - t_1 \rVert^2), \qquad t_1 = [1, 1]^T$$
$$\varphi_2(x) = \exp(-\lVert x - t_2 \rVert^2), \qquad t_2 = [0, 0]^T$$

| Input pattern | First hidden function $\varphi_1$ | Second hidden function $\varphi_2$ |
| --- | --- | --- |
| (1,1) | 1 | 0.1353 |
| (0,1) | 0.3678 | 0.3678 |
| (0,0) | 0.1353 | 1 |
| (1,0) | 0.3678 | 0.3678 |

The transformed patterns are linearly separable in the $(\varphi_1, \varphi_2)$ space: a straight decision boundary separates $\{(1,1), (0,0)\}$ from $\{(0,1), (1,0)\}$.

[Figure: the four patterns plotted in the $(\varphi_1, \varphi_2)$ plane with a linear decision boundary]
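The table values can be reproduced directly from the two Gaussian hidden functions (a minimal sketch):

```python
import math

def phi(x, t):
    """Gaussian hidden function exp(-||x - t||^2)."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, t)))

t1, t2 = (1, 1), (0, 0)
table = {x: (round(phi(x, t1), 4), round(phi(x, t2), 4))
         for x in [(1, 1), (0, 1), (0, 0), (1, 0)]}
# e.g. table[(1, 1)] == (1.0, 0.1353)
```

Note that the slides truncate $e^{-1} = 0.36788\ldots$ to 0.3678; rounding gives 0.3679.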

## Corollary to Cover's Theorem

The expected maximum number of vectors that are linearly separable in a space of dimensionality $m_1$ is

$$E[N] = 2 m_1, \qquad \mathrm{Median}[N] = 2 m_1$$

A surface of high dimension has high separating capacity.

## Interpolation Problem (1)

Given a set of $N$ different points $\{X_i \in \mathbb{R}^{m_0} \mid i = 1, 2, \dots, N\}$ and a corresponding set of $N$ real numbers $\{d_i \in \mathbb{R}^1 \mid i = 1, 2, \dots, N\}$, find a function $F : \mathbb{R}^{m_0} \to \mathbb{R}^1$ that satisfies the interpolation condition

$$F(X_i) = d_i, \qquad i = 1, 2, \dots, N$$

Function form (radial basis functions):

$$F(X) = \sum_{i=1}^{N} w_i\, \varphi(\lVert X - X_i \rVert)$$

for example $\varphi(r) = \exp\!\left(-\frac{r^2}{2\sigma^2}\right)$ for some $\sigma > 0$, $r \in \mathbb{R}$.

Imposing the interpolation condition on the training data yields the linear system $\Phi w = d$:

$$\begin{bmatrix} \varphi_{11} & \varphi_{12} & \cdots & \varphi_{1N} \\ \varphi_{21} & \varphi_{22} & \cdots & \varphi_{2N} \\ \vdots & \vdots & & \vdots \\ \varphi_{N1} & \varphi_{N2} & \cdots & \varphi_{NN} \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_N \end{bmatrix} = \begin{bmatrix} d_1 \\ d_2 \\ \vdots \\ d_N \end{bmatrix}, \qquad \varphi_{ji} = \varphi(\lVert X_j - X_i \rVert)$$
## Interpolation Problem (2)

Micchelli's Theorem: if the data points $X_1, \dots, X_N$ are distinct, then the $N \times N$ interpolation matrix $\Phi$ is nonsingular.

Types of RBF functions:
- Multiquadrics: $\varphi(r) = (r^2 + c^2)^{1/2}$, $c > 0$
- Inverse multiquadrics: $\varphi(r) = (r^2 + c^2)^{-1/2}$, $c > 0$
- Gaussian functions: $\varphi(r) = \exp\!\left(-\frac{r^2}{2\sigma^2}\right)$, $\sigma > 0$
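The three RBF types can be stated as one-liners (a minimal sketch; the parameter defaults are illustrative):

```python
import math

def multiquadric(r, c=1.0):
    """phi(r) = (r^2 + c^2)^(1/2)"""
    return math.sqrt(r ** 2 + c ** 2)

def inverse_multiquadric(r, c=1.0):
    """phi(r) = (r^2 + c^2)^(-1/2)"""
    return 1.0 / math.sqrt(r ** 2 + c ** 2)

def gaussian(r, sigma=1.0):
    """phi(r) = exp(-r^2 / (2 * sigma^2))"""
    return math.exp(-r ** 2 / (2 * sigma ** 2))
```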

## Interpolation Problem (3)

Example: $\{(x_i, d_i)\}_{i=1}^{3} = \{(-1, 1), (0, 2), (1, 1)\}$

[Figure: the three data points plotted at $x = -1, 0, 1$ with values $1, 2, 1$]

$$F(X) = \sum_{i=1}^{N} w_i\, \varphi(\lVert X - X_i \rVert)$$

The weights $w_1, w_2, w_3$ are obtained by solving the $3 \times 3$ linear system $\Phi w = d$.
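Solving the example numerically (a minimal sketch; the Gaussian width $\sigma = 1$ is an assumption, since the slides leave the parameter of $\varphi$ unspecified):

```python
import numpy as np

# Training data from the example: {(-1, 1), (0, 2), (1, 1)}
xs = np.array([-1.0, 0.0, 1.0])
ds = np.array([1.0, 2.0, 1.0])
sigma = 1.0  # assumed width for the Gaussian RBF

# Interpolation matrix Phi[j, i] = phi(|x_j - x_i|)
Phi = np.exp(-(xs[:, None] - xs[None, :]) ** 2 / (2 * sigma ** 2))

# Distinct points -> Phi is nonsingular (Micchelli), so the system has a unique solution
w = np.linalg.solve(Phi, ds)

def F(x):
    """Interpolant F(x) = sum_i w_i * phi(|x - x_i|)."""
    return float(np.exp(-(x - xs) ** 2 / (2 * sigma ** 2)) @ w)
```

By construction the interpolation conditions hold exactly: $F(-1) = 1$, $F(0) = 2$, $F(1) = 1$.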
## Summary

RBF networks
- Three layers: input layer, hidden layer, output layer
- Hidden units provide a set of basis functions for the input vectors
- The dimension of the hidden layer is much larger than that of the input layer
- The output is obtained from a linear combination of the hidden functions

Cover's theorem
- A pattern classification problem cast nonlinearly in a high-dimensional space is more likely to be linearly separable than in a low-dimensional space

Interpolation problem
- Find a function $F$ that satisfies $F(X_i) = d_i$ for all given data points; with radial basis functions this reduces to solving the linear system $\Phi w = d$