6

Linear Transformations
6 Hopfield Network Questions
[Figure: initial condition / recurrent layer. The input p (S×1) sets the initial condition of a recurrent layer with weight matrix W (S×S), bias b (S×1), and a one-step delay D; the layer output is a(t+1) (S×1).]

$$\mathbf{a}(0) = \mathbf{p} \qquad \mathbf{a}(t+1) = \mathbf{satlins}(\mathbf{W}\mathbf{a}(t) + \mathbf{b})$$

The network output is repeatedly multiplied by the weight matrix W.
What is the effect of this repeated operation?
Will the output converge, go to infinity, or oscillate?
In this chapter we investigate matrix multiplication, which represents a general linear transformation.
6 Linear Transformations

A transformation consists of three parts:

1. A set of elements X = {x_i}, called the domain,
2. A set of elements Y = {y_i}, called the range, and
3. A rule relating each x_i ∈ X to an element y_i ∈ Y.

A transformation is linear if:

1. For all x_1, x_2 ∈ X:  A(x_1 + x_2) = A(x_1) + A(x_2),
2. For all x ∈ X and all scalars a:  A(a x) = a A(x).
6 Example - Rotation
Is rotation linear?
1. [Figure: scaling then rotating gives the same result as rotating then scaling, A(a x) = a A(x).]

2. [Figure: rotating a sum of vectors gives the sum of the rotated vectors, A(x_1 + x_2) = A(x_1) + A(x_2).]
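A quick numeric check of the two linearity conditions for rotation. This is a minimal NumPy sketch; the angle, scalar, and test vectors are arbitrary choices, not values from the slides:

```python
import numpy as np

def rotate(x, theta):
    """Apply a 2-D rotation by angle theta (radians) to vector x."""
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    return R @ x

theta = 0.7              # arbitrary rotation angle
a = 2.5                  # arbitrary scalar
x1 = np.array([1.0, 2.0])
x2 = np.array([-0.5, 3.0])

# Homogeneity: A(a x) = a A(x)
print(np.allclose(rotate(a * x1, theta), a * rotate(x1, theta)))                    # True

# Additivity: A(x1 + x2) = A(x1) + A(x2)
print(np.allclose(rotate(x1 + x2, theta), rotate(x1, theta) + rotate(x2, theta)))   # True
```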
6 Matrix Representation - (1)
Any linear transformation between two finite-dimensional
vector spaces can be represented by matrix multiplication.

Let {v1, v2, ..., vn} be a basis for X, and let {u1, u2, ..., um} be
a basis for Y.
$$\mathbf{x} = \sum_{i=1}^{n} x_i \mathbf{v}_i \qquad \mathbf{y} = \sum_{i=1}^{m} y_i \mathbf{u}_i$$

Let A: X → Y, with A(x) = y. Then

$$A\!\left(\sum_{j=1}^{n} x_j \mathbf{v}_j\right) = \sum_{i=1}^{m} y_i \mathbf{u}_i$$
6 Matrix Representation - (2)
Since A is a linear operator,
$$\sum_{j=1}^{n} x_j A(\mathbf{v}_j) = \sum_{i=1}^{m} y_i \mathbf{u}_i$$

Since the u_i are a basis for Y,

$$A(\mathbf{v}_j) = \sum_{i=1}^{m} a_{ij} \mathbf{u}_i$$

(The coefficients a_ij will make up the matrix representation of the transformation.)

$$\sum_{j=1}^{n} x_j \sum_{i=1}^{m} a_{ij} \mathbf{u}_i = \sum_{i=1}^{m} y_i \mathbf{u}_i$$
6 Matrix Representation - (3)
$$\sum_{i=1}^{m} \mathbf{u}_i \sum_{j=1}^{n} a_{ij} x_j = \sum_{i=1}^{m} y_i \mathbf{u}_i$$

$$\sum_{i=1}^{m} \mathbf{u}_i \left( \sum_{j=1}^{n} a_{ij} x_j - y_i \right) = 0$$

Because the u_i are independent,

$$\sum_{j=1}^{n} a_{ij} x_j = y_i$$

This is equivalent to matrix multiplication:

$$\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}$$
6 Summary

A linear transformation can be represented by matrix multiplication.

To find the matrix which represents the transformation, we must transform each basis vector for the domain and then expand the result in terms of the basis vectors of the range:

$$A(\mathbf{v}_j) = \sum_{i=1}^{m} a_{ij} \mathbf{u}_i$$

Each of these equations gives us one column of the matrix.
6 Example - (1)
Stand a deck of playing cards on edge so that you are looking at the deck sideways. Draw a vector x on the edge of the deck. Now skew the deck by an angle θ, as shown below, and note the new vector y = A(x). What is the matrix of this transformation in terms of the standard basis set?

[Figure: the deck before and after skewing by θ; the vector x maps to y = A(x), with s_1 and s_2 the standard basis vectors.]
6 Example - (2)
To find the matrix we need to transform each of the basis vectors.
$$A(\mathbf{v}_j) = \sum_{i=1}^{m} a_{ij} \mathbf{u}_i$$

We will use the standard basis vectors for both the domain and the range:

$$A(\mathbf{s}_j) = \sum_{i=1}^{2} a_{ij} \mathbf{s}_i = a_{1j} \mathbf{s}_1 + a_{2j} \mathbf{s}_2$$
6 Example - (3)
We begin with s1:
If we draw a line on the bottom card and then skew the
deck, the line will not change.

[Figure: s_1 lies along the bottom card, so the skew leaves it unchanged: A(s_1) = s_1.]

$$A(\mathbf{s}_1) = 1\,\mathbf{s}_1 + 0\,\mathbf{s}_2 = \sum_{i=1}^{2} a_{i1} \mathbf{s}_i = a_{11} \mathbf{s}_1 + a_{21} \mathbf{s}_2$$

This gives us the first column of the matrix.
6 Example - (4)
Next, we skew s2:

[Figure: s_2 tilts with the deck; A(s_2) has horizontal component tan(θ) and vertical component 1.]

$$A(\mathbf{s}_2) = \tan(\theta)\,\mathbf{s}_1 + 1\,\mathbf{s}_2 = \sum_{i=1}^{2} a_{i2} \mathbf{s}_i = a_{12} \mathbf{s}_1 + a_{22} \mathbf{s}_2$$

This gives us the second column of the matrix.
6 Example - (5)

The matrix of the transformation is:

$$\mathbf{A} = \begin{bmatrix} 1 & \tan(\theta) \\ 0 & 1 \end{bmatrix}$$
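The columns of this matrix can be built exactly as described above, by transforming each basis vector. A minimal NumPy sketch (the 45° angle matches the later slides; the function name skew is mine):

```python
import numpy as np

def skew(x, theta):
    """Skew (shear) transformation from the card-deck example."""
    A = np.array([[1.0, np.tan(theta)],
                  [0.0, 1.0]])
    return A @ x

theta = np.deg2rad(45)
s1 = np.array([1.0, 0.0])
s2 = np.array([0.0, 1.0])

# Transforming each basis vector gives one column of the matrix.
col1 = skew(s1, theta)                 # [1, 0]                -> first column
col2 = skew(s2, theta)                 # [tan(45 deg), 1] = [1, 1] -> second column
A = np.column_stack([col1, col2])
print(A)                               # [[1. 1.], [0. 1.]]
```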
6 Change of Basis
Consider the linear transformation A:XY. Let {v1, v2, ..., vn} be
a basis for X, and let {u1, u2, ..., um} be a basis for Y.
$$\mathbf{x} = \sum_{i=1}^{n} x_i \mathbf{v}_i \qquad \mathbf{y} = \sum_{i=1}^{m} y_i \mathbf{u}_i \qquad A(\mathbf{x}) = \mathbf{y}$$

The matrix representation is:

$$\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix} \qquad \mathbf{A}\mathbf{x} = \mathbf{y}$$
6 New Basis Sets
Now let's consider different basis sets. Let {t1, t2, ..., tn} be a basis for X, and let {w1, w2, ..., wm} be a basis for Y.
$$\mathbf{x} = \sum_{i=1}^{n} x'_i \mathbf{t}_i \qquad \mathbf{y} = \sum_{i=1}^{m} y'_i \mathbf{w}_i$$

The new matrix representation is:

$$\begin{bmatrix} a'_{11} & a'_{12} & \cdots & a'_{1n} \\ a'_{21} & a'_{22} & \cdots & a'_{2n} \\ \vdots & & & \vdots \\ a'_{m1} & a'_{m2} & \cdots & a'_{mn} \end{bmatrix} \begin{bmatrix} x'_1 \\ x'_2 \\ \vdots \\ x'_n \end{bmatrix} = \begin{bmatrix} y'_1 \\ y'_2 \\ \vdots \\ y'_m \end{bmatrix} \qquad \mathbf{A}'\mathbf{x}' = \mathbf{y}'$$
6 How are A and A' related?
Expand ti in terms of the original basis vectors for X.
$$\mathbf{t}_i = \sum_{j=1}^{n} t_{ji} \mathbf{v}_j \qquad \mathbf{t}_i = \begin{bmatrix} t_{1i} \\ t_{2i} \\ \vdots \\ t_{ni} \end{bmatrix}$$
Expand w i in terms of the original basis vectors for Y.

$$\mathbf{w}_i = \sum_{j=1}^{m} w_{ji} \mathbf{u}_j \qquad \mathbf{w}_i = \begin{bmatrix} w_{1i} \\ w_{2i} \\ \vdots \\ w_{mi} \end{bmatrix}$$
6 How are A and A' related?

$$\mathbf{B}_t = \begin{bmatrix} \mathbf{t}_1 & \mathbf{t}_2 & \cdots & \mathbf{t}_n \end{bmatrix} \qquad \mathbf{x} = x'_1 \mathbf{t}_1 + x'_2 \mathbf{t}_2 + \cdots + x'_n \mathbf{t}_n = \mathbf{B}_t \mathbf{x}'$$

$$\mathbf{B}_w = \begin{bmatrix} \mathbf{w}_1 & \mathbf{w}_2 & \cdots & \mathbf{w}_m \end{bmatrix} \qquad \mathbf{y} = \mathbf{B}_w \mathbf{y}'$$

$$\mathbf{A}\mathbf{x} = \mathbf{y} \;\Rightarrow\; \mathbf{A}\mathbf{B}_t \mathbf{x}' = \mathbf{B}_w \mathbf{y}' \;\Rightarrow\; [\mathbf{B}_w^{-1} \mathbf{A} \mathbf{B}_t]\, \mathbf{x}' = \mathbf{y}'$$

$$\mathbf{A}' = \mathbf{B}_w^{-1} \mathbf{A} \mathbf{B}_t \qquad \mathbf{A}'\mathbf{x}' = \mathbf{y}' \qquad \text{(Similarity Transform)}$$
6 Example - (1)
Take the skewing problem described previously, and find the new matrix representation using the basis set {t1, t2}.

[Figure: the new basis vectors t_1 and t_2 drawn relative to s_1 and s_2.]

$$\mathbf{t}_1 = 0.5\,\mathbf{s}_1 + \mathbf{s}_2 = \begin{bmatrix} 0.5 \\ 1 \end{bmatrix} \qquad \mathbf{t}_2 = -\mathbf{s}_1 + \mathbf{s}_2 = \begin{bmatrix} -1 \\ 1 \end{bmatrix}$$

$$\mathbf{B}_t = \begin{bmatrix} \mathbf{t}_1 & \mathbf{t}_2 \end{bmatrix} = \begin{bmatrix} 0.5 & -1 \\ 1 & 1 \end{bmatrix} \qquad \mathbf{B}_w = \mathbf{B}_t = \begin{bmatrix} 0.5 & -1 \\ 1 & 1 \end{bmatrix} \quad \text{(Same basis for domain and range.)}$$
6 Example - (2)

$$\mathbf{A}' = \mathbf{B}_w^{-1} \mathbf{A} \mathbf{B}_t = \begin{bmatrix} 2/3 & 2/3 \\ -2/3 & 1/3 \end{bmatrix} \begin{bmatrix} 1 & \tan\theta \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 0.5 & -1 \\ 1 & 1 \end{bmatrix}$$

$$\mathbf{A}' = \begin{bmatrix} (2/3)\tan\theta + 1 & (2/3)\tan\theta \\ -(2/3)\tan\theta & -(2/3)\tan\theta + 1 \end{bmatrix}$$

For θ = 45°:

$$\mathbf{A}' = \begin{bmatrix} 5/3 & 2/3 \\ -2/3 & 1/3 \end{bmatrix} \qquad \mathbf{A} = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$$
6 Example - (3)
Try a test vector:

$$\mathbf{x} = \begin{bmatrix} 0.5 \\ 1 \end{bmatrix} \qquad \mathbf{x}' = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$$

$$\mathbf{y} = \mathbf{A}\mathbf{x} = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 0.5 \\ 1 \end{bmatrix} = \begin{bmatrix} 1.5 \\ 1 \end{bmatrix} \qquad \mathbf{y}' = \mathbf{A}'\mathbf{x}' = \begin{bmatrix} 5/3 & 2/3 \\ -2/3 & 1/3 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 5/3 \\ -2/3 \end{bmatrix}$$

[Figure: the test vector t_1 = x and its image y = A(x) on the skewed deck.]

Check using reciprocal basis vectors:

$$\mathbf{y}' = \mathbf{B}^{-1}\mathbf{y} = \begin{bmatrix} 0.5 & -1 \\ 1 & 1 \end{bmatrix}^{-1} \begin{bmatrix} 1.5 \\ 1 \end{bmatrix} = \begin{bmatrix} 2/3 & 2/3 \\ -2/3 & 1/3 \end{bmatrix} \begin{bmatrix} 1.5 \\ 1 \end{bmatrix} = \begin{bmatrix} 5/3 \\ -2/3 \end{bmatrix}$$
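The similarity transform and the test-vector check can be reproduced numerically. A short NumPy sketch of the same computation, using the values from the example above:

```python
import numpy as np

theta = np.deg2rad(45)
A  = np.array([[1.0, np.tan(theta)],
               [0.0, 1.0]])           # skew matrix in the standard basis
Bt = np.array([[0.5, -1.0],
               [1.0,  1.0]])          # columns are t1 and t2
Bw = Bt                               # same basis for domain and range

# Similarity transform: A' = Bw^-1 A Bt
A_prime = np.linalg.inv(Bw) @ A @ Bt
print(np.round(A_prime, 4))           # [[ 1.6667  0.6667], [-0.6667  0.3333]]

# Test vector x = t1, i.e. x' = [1, 0]
x, x_prime = np.array([0.5, 1.0]), np.array([1.0, 0.0])
y       = A @ x                       # [1.5, 1.0]
y_prime = A_prime @ x_prime           # [ 5/3, -2/3]
print(np.allclose(np.linalg.inv(Bw) @ y, y_prime))   # True
```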
6 Eigenvalues and Eigenvectors
Let A: X → X be a linear transformation. Those vectors z ∈ X which are not equal to zero, and those scalars λ, which satisfy

$$A(\mathbf{z}) = \lambda \mathbf{z}$$

are called eigenvectors and eigenvalues, respectively.

[Figure: the skewed deck again, with x and y = A(x).] Can you find an eigenvector for this transformation?
6 Computing the Eigenvalues
$$\mathbf{A}\mathbf{z} = \lambda \mathbf{z} \qquad [\mathbf{A} - \lambda\mathbf{I}]\mathbf{z} = \mathbf{0} \qquad |[\mathbf{A} - \lambda\mathbf{I}]| = 0$$

Skewing example (θ = 45°):

$$\mathbf{A} = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} \qquad \left| \begin{bmatrix} 1-\lambda & 1 \\ 0 & 1-\lambda \end{bmatrix} \right| = (1-\lambda)^2 = 0 \quad\Rightarrow\quad \lambda_1 = \lambda_2 = 1$$

$$[\mathbf{A} - \lambda_1\mathbf{I}]\mathbf{z}_1 = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} z_{11} \\ z_{21} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \quad\Rightarrow\quad z_{21} = 0 \quad\Rightarrow\quad \mathbf{z}_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$$

For this transformation there is only one eigenvector.
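The same result can be checked with NumPy's eigen-solver. A sketch; because the eigenvalue is repeated, the returned eigenvector columns point in essentially the same direction:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 1.0]])            # 45-degree skew

eigvals, eigvecs = np.linalg.eig(A)
print(eigvals)                        # [1. 1.]  -> repeated eigenvalue
print(eigvecs)                        # both columns lie (numerically) along the s1 axis

# Any vector along s1 is unchanged by the skew:
z = np.array([1.0, 0.0])
print(np.allclose(A @ z, 1.0 * z))    # True
```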
6 Diagonalization

Perform a change of basis (similarity transformation) using the eigenvectors as the basis vectors. If the eigenvalues are distinct, the new matrix will be diagonal.

$$\mathbf{B} = \begin{bmatrix} \mathbf{z}_1 & \mathbf{z}_2 & \cdots & \mathbf{z}_n \end{bmatrix} \qquad \{\mathbf{z}_1, \mathbf{z}_2, \dots, \mathbf{z}_n\} \text{ eigenvectors} \qquad \{\lambda_1, \lambda_2, \dots, \lambda_n\} \text{ eigenvalues}$$

$$[\mathbf{B}^{-1}\mathbf{A}\mathbf{B}] = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & & & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix}$$
6 Example
$$\mathbf{A} = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \qquad \left| \begin{bmatrix} 1-\lambda & 1 \\ 1 & 1-\lambda \end{bmatrix} \right| = \lambda^2 - 2\lambda = \lambda(\lambda - 2) = 0 \quad\Rightarrow\quad \lambda_1 = 0,\; \lambda_2 = 2$$

$$\lambda_1 = 0: \quad \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} z_{11} \\ z_{21} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \;\Rightarrow\; z_{21} = -z_{11} \;\Rightarrow\; \mathbf{z}_1 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$$

$$\lambda_2 = 2: \quad \begin{bmatrix} -1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} z_{12} \\ z_{22} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \;\Rightarrow\; z_{22} = z_{12} \;\Rightarrow\; \mathbf{z}_2 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$$

Diagonal form:

$$\mathbf{A}' = [\mathbf{B}^{-1}\mathbf{A}\mathbf{B}] = \begin{bmatrix} 1/2 & -1/2 \\ 1/2 & 1/2 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 2 \end{bmatrix}$$
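A quick NumPy confirmation of the diagonalization. This is a sketch; eig may return the eigenvalues in a different order and with normalized eigenvectors, but B⁻¹AB is still diagonal:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0]])

eigvals, B = np.linalg.eig(A)          # columns of B are the eigenvectors
A_diag = np.linalg.inv(B) @ A @ B      # similarity transform with the eigenvector basis
print(eigvals)                         # [2. 0.]  (order may differ from the slides)
print(np.round(A_diag, 10))            # diagonal matrix of the eigenvalues
```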
7

Supervised Hebbian Learning

7 Hebb's Postulate
"When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased."
D. O. Hebb, 1949
[Figure: neuron anatomy, showing dendrites, cell body, axon, and a synapse onto cell B.]
7 Linear Associator
[Figure: linear associator, input p (R×1), weight matrix W (S×R), output a = purelin(Wp) (S×1).]

$$\mathbf{a} = \mathbf{W}\mathbf{p} \qquad a_i = \sum_{j=1}^{R} w_{ij} p_j$$

Training Set:

$$\{\mathbf{p}_1, \mathbf{t}_1\},\; \{\mathbf{p}_2, \mathbf{t}_2\},\; \dots,\; \{\mathbf{p}_Q, \mathbf{t}_Q\}$$
7 Hebb Rule
$$w_{ij}^{new} = w_{ij}^{old} + f_i(a_{iq})\, g_j(p_{jq})$$

(f_i(a_iq) is the postsynaptic signal; g_j(p_jq) is the presynaptic signal.)

Simplified Form:

$$w_{ij}^{new} = w_{ij}^{old} + a_{iq}\, p_{jq}$$

Supervised Form:

$$w_{ij}^{new} = w_{ij}^{old} + t_{iq}\, p_{jq}$$

Matrix Form:

$$\mathbf{W}^{new} = \mathbf{W}^{old} + \mathbf{t}_q \mathbf{p}_q^T$$
7 Batch Operation
$$\mathbf{W} = \mathbf{t}_1\mathbf{p}_1^T + \mathbf{t}_2\mathbf{p}_2^T + \cdots + \mathbf{t}_Q\mathbf{p}_Q^T = \sum_{q=1}^{Q} \mathbf{t}_q \mathbf{p}_q^T \qquad \text{(zero initial weights)}$$

Matrix Form:

$$\mathbf{W} = \begin{bmatrix} \mathbf{t}_1 & \mathbf{t}_2 & \cdots & \mathbf{t}_Q \end{bmatrix} \begin{bmatrix} \mathbf{p}_1^T \\ \mathbf{p}_2^T \\ \vdots \\ \mathbf{p}_Q^T \end{bmatrix} = \mathbf{T}\mathbf{P}^T$$

where

$$\mathbf{T} = \begin{bmatrix} \mathbf{t}_1 & \mathbf{t}_2 & \cdots & \mathbf{t}_Q \end{bmatrix} \qquad \mathbf{P} = \begin{bmatrix} \mathbf{p}_1 & \mathbf{p}_2 & \cdots & \mathbf{p}_Q \end{bmatrix}$$
7 Performance Analysis
$$\mathbf{a} = \mathbf{W}\mathbf{p}_k = \left( \sum_{q=1}^{Q} \mathbf{t}_q \mathbf{p}_q^T \right) \mathbf{p}_k = \sum_{q=1}^{Q} \mathbf{t}_q (\mathbf{p}_q^T \mathbf{p}_k)$$

Case I: the input patterns are orthonormal.

$$(\mathbf{p}_q^T \mathbf{p}_k) = 1 \quad (q = k), \qquad (\mathbf{p}_q^T \mathbf{p}_k) = 0 \quad (q \neq k)$$

Therefore the network output equals the target:

$$\mathbf{a} = \mathbf{W}\mathbf{p}_k = \mathbf{t}_k$$

Case II: the input patterns are normalized, but not orthogonal.

$$\mathbf{a} = \mathbf{W}\mathbf{p}_k = \mathbf{t}_k + \sum_{q \neq k} \mathbf{t}_q (\mathbf{p}_q^T \mathbf{p}_k)$$

The second term is the error.
7 Example
Banana and apple prototype patterns, normalized:

$$\mathbf{p}_1 = \begin{bmatrix} -1 \\ 1 \\ -1 \end{bmatrix} \;\rightarrow\; \mathbf{p}_1 = \begin{bmatrix} -0.5774 \\ 0.5774 \\ -0.5774 \end{bmatrix},\; t_1 = -1 \qquad \mathbf{p}_2 = \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix} \;\rightarrow\; \mathbf{p}_2 = \begin{bmatrix} 0.5774 \\ 0.5774 \\ -0.5774 \end{bmatrix},\; t_2 = 1$$

Weight Matrix (Hebb Rule):

$$\mathbf{W} = \mathbf{T}\mathbf{P}^T = \begin{bmatrix} -1 & 1 \end{bmatrix} \begin{bmatrix} -0.5774 & 0.5774 & -0.5774 \\ 0.5774 & 0.5774 & -0.5774 \end{bmatrix} = \begin{bmatrix} 1.1548 & 0 & 0 \end{bmatrix}$$

Tests:

$$\text{Banana: } \mathbf{W}\mathbf{p}_1 = \begin{bmatrix} 1.1548 & 0 & 0 \end{bmatrix} \begin{bmatrix} -0.5774 \\ 0.5774 \\ -0.5774 \end{bmatrix} = -0.6668$$

$$\text{Apple: } \mathbf{W}\mathbf{p}_2 = \begin{bmatrix} 1.1548 & 0 & 0 \end{bmatrix} \begin{bmatrix} 0.5774 \\ 0.5774 \\ -0.5774 \end{bmatrix} = 0.6668$$
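A NumPy sketch of the same calculation, using the pattern values and targets as reconstructed above. Because the normalized patterns are not orthogonal, the outputs miss the ±1 targets:

```python
import numpy as np

# Prototype patterns and their targets
p1 = np.array([-1.0, 1.0, -1.0])   # banana, target t1 = -1
p2 = np.array([ 1.0, 1.0, -1.0])   # apple,  target t2 = +1

P = np.column_stack([p1, p2]) / np.linalg.norm(p1)   # both columns have norm sqrt(3)
T = np.array([[-1.0, 1.0]])                          # row of targets

W = T @ P.T                        # Hebb rule, W = T P^T
print(np.round(W, 4))              # [[1.1547  0.  0.]]
print(W @ P[:, 0], W @ P[:, 1])    # approx -0.667 (banana), +0.667 (apple)
```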
7 Pseudoinverse Rule - (1)
Performance Index: we want

$$\mathbf{W}\mathbf{p}_q = \mathbf{t}_q, \qquad q = 1, 2, \dots, Q$$

$$F(\mathbf{W}) = \sum_{q=1}^{Q} \| \mathbf{t}_q - \mathbf{W}\mathbf{p}_q \|^2$$

Matrix Form:

$$\mathbf{W}\mathbf{P} = \mathbf{T}, \qquad \mathbf{T} = \begin{bmatrix} \mathbf{t}_1 & \mathbf{t}_2 & \cdots & \mathbf{t}_Q \end{bmatrix}, \qquad \mathbf{P} = \begin{bmatrix} \mathbf{p}_1 & \mathbf{p}_2 & \cdots & \mathbf{p}_Q \end{bmatrix}$$

$$F(\mathbf{W}) = \| \mathbf{T} - \mathbf{W}\mathbf{P} \|^2 = \| \mathbf{E} \|^2, \qquad \| \mathbf{E} \|^2 = \sum_i \sum_j e_{ij}^2$$
7 Pseudoinverse Rule - (2)
$$\mathbf{W}\mathbf{P} = \mathbf{T}$$

Minimize:

$$F(\mathbf{W}) = \| \mathbf{T} - \mathbf{W}\mathbf{P} \|^2 = \| \mathbf{E} \|^2$$

If an inverse exists for P, F(W) can be made zero:

$$\mathbf{W} = \mathbf{T}\mathbf{P}^{-1}$$

When an inverse does not exist, F(W) can be minimized using the pseudoinverse:

$$\mathbf{W} = \mathbf{T}\mathbf{P}^{+}, \qquad \mathbf{P}^{+} = (\mathbf{P}^T\mathbf{P})^{-1}\mathbf{P}^T$$
7 Relationship to the Hebb Rule
Hebb Rule:

$$\mathbf{W} = \mathbf{T}\mathbf{P}^T$$

Pseudoinverse Rule:

$$\mathbf{W} = \mathbf{T}\mathbf{P}^{+}, \qquad \mathbf{P}^{+} = (\mathbf{P}^T\mathbf{P})^{-1}\mathbf{P}^T$$

If the prototype patterns are orthonormal:

$$\mathbf{P}^T\mathbf{P} = \mathbf{I} \quad\Rightarrow\quad \mathbf{P}^{+} = (\mathbf{P}^T\mathbf{P})^{-1}\mathbf{P}^T = \mathbf{P}^T$$

and the two rules coincide.
7 Example
$$\mathbf{p}_1 = \begin{bmatrix} -1 \\ 1 \\ -1 \end{bmatrix},\; t_1 = -1 \qquad \mathbf{p}_2 = \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix},\; t_2 = 1 \qquad \mathbf{W} = \mathbf{T}\mathbf{P}^{+} = \begin{bmatrix} -1 & 1 \end{bmatrix} \begin{bmatrix} -1 & 1 \\ 1 & 1 \\ -1 & -1 \end{bmatrix}^{+}$$

$$\mathbf{P}^{+} = (\mathbf{P}^T\mathbf{P})^{-1}\mathbf{P}^T = \begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix}^{-1} \begin{bmatrix} -1 & 1 & -1 \\ 1 & 1 & -1 \end{bmatrix} = \begin{bmatrix} -0.5 & 0.25 & -0.25 \\ 0.5 & 0.25 & -0.25 \end{bmatrix}$$

$$\mathbf{W} = \mathbf{T}\mathbf{P}^{+} = \begin{bmatrix} -1 & 1 \end{bmatrix} \begin{bmatrix} -0.5 & 0.25 & -0.25 \\ 0.5 & 0.25 & -0.25 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \end{bmatrix}$$

$$\mathbf{W}\mathbf{p}_1 = \begin{bmatrix} 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} -1 \\ 1 \\ -1 \end{bmatrix} = -1 \qquad \mathbf{W}\mathbf{p}_2 = \begin{bmatrix} 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix} = 1$$
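The pseudoinverse rule is one call in NumPy. A sketch using np.linalg.pinv, which computes P⁺; unlike the Hebb rule, the outputs now hit the targets exactly:

```python
import numpy as np

P = np.array([[-1.0,  1.0],
              [ 1.0,  1.0],
              [-1.0, -1.0]])          # columns are p1 and p2
T = np.array([[-1.0, 1.0]])           # targets

W = T @ np.linalg.pinv(P)             # pseudoinverse rule, W = T P+
print(np.round(W, 10))                # [[1. 0. 0.]]
print(np.round(W @ P, 10))            # [[-1.  1.]]  -> exact targets
```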
7 Autoassociative Memory

[Figure: three prototype patterns (p1, t1), (p2, t2), (p3, t3); each is a 30-pixel image whose pixels are coded as ±1, so each p_q = t_q is a 30-element vector.]

[Figure: autoassociator network, input p (30×1), weight matrix W (30×30), symmetric hard limit layer, a = hardlims(Wp).]

$$\mathbf{W} = \mathbf{p}_1\mathbf{p}_1^T + \mathbf{p}_2\mathbf{p}_2^T + \mathbf{p}_3\mathbf{p}_3^T$$
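A sketch of the autoassociator in NumPy. The slides store digit images; random bipolar patterns are used here as stand-ins, so the exact recall quality will differ:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in prototypes: three random 30-pixel patterns with +/-1 values.
prototypes = [np.sign(rng.standard_normal(30)) for _ in range(3)]

# Hebb-rule autoassociator: W is the sum of the outer products p_q p_q^T.
W = sum(np.outer(p, p) for p in prototypes)

def recall(p):
    """One pass through the symmetric hard limit layer: a = hardlims(W p)."""
    return np.where(W @ p >= 0, 1.0, -1.0)

# Occlude half of the first prototype (set its first 15 pixels to -1) and recall.
p_occluded = prototypes[0].copy()
p_occluded[:15] = -1.0
print((recall(p_occluded) == prototypes[0]).sum(), "of 30 pixels recovered")
```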
7 Tests
[Figure: recall results for three test conditions: 50% occluded patterns, 67% occluded patterns, and noisy patterns (7 pixels changed).]
7 Variations of Hebbian Learning
Basic Rule:
$$\mathbf{W}^{new} = \mathbf{W}^{old} + \mathbf{t}_q \mathbf{p}_q^T$$

Learning Rate:
$$\mathbf{W}^{new} = \mathbf{W}^{old} + \alpha\, \mathbf{t}_q \mathbf{p}_q^T$$

Smoothing (decay term γ):
$$\mathbf{W}^{new} = \mathbf{W}^{old} + \alpha\, \mathbf{t}_q \mathbf{p}_q^T - \gamma\, \mathbf{W}^{old} = (1 - \gamma)\mathbf{W}^{old} + \alpha\, \mathbf{t}_q \mathbf{p}_q^T$$

Delta Rule:
$$\mathbf{W}^{new} = \mathbf{W}^{old} + \alpha\, (\mathbf{t}_q - \mathbf{a}_q)\, \mathbf{p}_q^T$$

Unsupervised:
$$\mathbf{W}^{new} = \mathbf{W}^{old} + \alpha\, \mathbf{a}_q \mathbf{p}_q^T$$
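A sketch of the delta rule as an iterative update. The learning rate, epoch count, and the reuse of the banana/apple patterns are my choices; with enough passes it converges toward the pseudoinverse solution:

```python
import numpy as np

def delta_rule(P, T, alpha=0.1, epochs=200):
    """Iterative delta-rule training: W_new = W_old + alpha (t - a) p^T."""
    S, R = T.shape[0], P.shape[0]
    W = np.zeros((S, R))
    for _ in range(epochs):
        for q in range(P.shape[1]):
            p, t = P[:, q], T[:, q]
            a = W @ p                           # linear layer output
            W = W + alpha * np.outer(t - a, p)  # move toward the target
    return W

# Banana/apple patterns from the earlier example
P = np.array([[-1.0, 1.0], [1.0, 1.0], [-1.0, -1.0]])
T = np.array([[-1.0, 1.0]])
W = delta_rule(P, T)
print(np.round(W @ P, 3))   # approaches [[-1.  1.]], the pseudoinverse solution
```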
8

Performance Surfaces

8 Taylor Series Expansion

$$F(x) = F(x^*) + \left.\frac{dF(x)}{dx}\right|_{x=x^*} (x - x^*) + \frac{1}{2} \left.\frac{d^2F(x)}{dx^2}\right|_{x=x^*} (x - x^*)^2 + \cdots + \frac{1}{n!} \left.\frac{d^nF(x)}{dx^n}\right|_{x=x^*} (x - x^*)^n + \cdots$$
8 Example
$$F(x) = e^{-x}$$

Taylor series of F(x) about x* = 0:

$$F(x) = e^{-x} = e^{-0} - e^{-0}(x - 0) + \frac{1}{2} e^{-0} (x - 0)^2 - \frac{1}{6} e^{-0} (x - 0)^3 + \cdots$$

$$F(x) = 1 - x + \frac{1}{2} x^2 - \frac{1}{6} x^3 + \cdots$$

Taylor series approximations:

$$F(x) \approx F_0(x) = 1$$

$$F(x) \approx F_1(x) = 1 - x$$

$$F(x) \approx F_2(x) = 1 - x + \frac{1}{2} x^2$$
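A few lines of NumPy show how each added term tightens the approximation near the expansion point (the sample points are my choice):

```python
import numpy as np

def F(x):  return np.exp(-x)
def F0(x): return np.ones_like(x)
def F1(x): return 1.0 - x
def F2(x): return 1.0 - x + 0.5 * x**2

x = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
# Near x* = 0 each added term reduces the error; far from 0 the approximations diverge.
for name, f in [("F0", F0), ("F1", F1), ("F2", F2)]:
    print(name, np.round(np.abs(F(x) - f(x)), 4))
```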
8 Plot of Approximations

[Figure: the Taylor approximations F_0(x), F_1(x), and F_2(x) of F(x) = e^{-x}, plotted for -2 ≤ x ≤ 2.]
8 Vector Case

$$F(\mathbf{x}) = F(x_1, x_2, \dots, x_n)$$

$$F(\mathbf{x}) = F(\mathbf{x}^*) + \left.\frac{\partial F(\mathbf{x})}{\partial x_1}\right|_{\mathbf{x}=\mathbf{x}^*} (x_1 - x_1^*) + \left.\frac{\partial F(\mathbf{x})}{\partial x_2}\right|_{\mathbf{x}=\mathbf{x}^*} (x_2 - x_2^*) + \cdots + \left.\frac{\partial F(\mathbf{x})}{\partial x_n}\right|_{\mathbf{x}=\mathbf{x}^*} (x_n - x_n^*)$$

$$\;+\; \frac{1}{2} \left.\frac{\partial^2 F(\mathbf{x})}{\partial x_1^2}\right|_{\mathbf{x}=\mathbf{x}^*} (x_1 - x_1^*)^2 + \frac{1}{2} \left.\frac{\partial^2 F(\mathbf{x})}{\partial x_1 \partial x_2}\right|_{\mathbf{x}=\mathbf{x}^*} (x_1 - x_1^*)(x_2 - x_2^*) + \cdots$$
8 Matrix Form
$$F(\mathbf{x}) = F(\mathbf{x}^*) + \nabla F(\mathbf{x})^T\big|_{\mathbf{x}=\mathbf{x}^*} (\mathbf{x} - \mathbf{x}^*) + \frac{1}{2} (\mathbf{x} - \mathbf{x}^*)^T\, \nabla^2 F(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}^*}\, (\mathbf{x} - \mathbf{x}^*) + \cdots$$

Gradient:

$$\nabla F(\mathbf{x}) = \begin{bmatrix} \frac{\partial F(\mathbf{x})}{\partial x_1} \\ \frac{\partial F(\mathbf{x})}{\partial x_2} \\ \vdots \\ \frac{\partial F(\mathbf{x})}{\partial x_n} \end{bmatrix}$$

Hessian:

$$\nabla^2 F(\mathbf{x}) = \begin{bmatrix} \frac{\partial^2 F(\mathbf{x})}{\partial x_1^2} & \frac{\partial^2 F(\mathbf{x})}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 F(\mathbf{x})}{\partial x_1 \partial x_n} \\ \frac{\partial^2 F(\mathbf{x})}{\partial x_2 \partial x_1} & \frac{\partial^2 F(\mathbf{x})}{\partial x_2^2} & \cdots & \frac{\partial^2 F(\mathbf{x})}{\partial x_2 \partial x_n} \\ \vdots & \vdots & & \vdots \\ \frac{\partial^2 F(\mathbf{x})}{\partial x_n \partial x_1} & \frac{\partial^2 F(\mathbf{x})}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2 F(\mathbf{x})}{\partial x_n^2} \end{bmatrix}$$
8 Directional Derivatives

First derivative (slope) of F(x) along the x_i axis: ∂F(x)/∂x_i (the i-th element of the gradient).

Second derivative (curvature) of F(x) along the x_i axis: ∂²F(x)/∂x_i² (the i,i element of the Hessian).

First derivative (slope) of F(x) along a vector p:

$$\frac{\mathbf{p}^T \nabla F(\mathbf{x})}{\|\mathbf{p}\|}$$

Second derivative (curvature) of F(x) along a vector p:

$$\frac{\mathbf{p}^T \nabla^2 F(\mathbf{x})\, \mathbf{p}}{\|\mathbf{p}\|^2}$$
8 Example
$$F(\mathbf{x}) = x_1^2 + 2 x_1 x_2 + 2 x_2^2 \qquad \mathbf{x}^* = \begin{bmatrix} 0.5 \\ 0 \end{bmatrix} \qquad \mathbf{p} = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$$

$$\nabla F(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}^*} = \begin{bmatrix} 2x_1 + 2x_2 \\ 2x_1 + 4x_2 \end{bmatrix}_{\mathbf{x}=\mathbf{x}^*} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$$

$$\frac{\mathbf{p}^T \nabla F(\mathbf{x})}{\|\mathbf{p}\|} = \frac{\begin{bmatrix} 1 & -1 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix}}{\sqrt{2}} = \frac{0}{\sqrt{2}} = 0$$
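The same directional-derivative computation in NumPy. A sketch; the Hessian and the second directional derivative along p are added for completeness:

```python
import numpy as np

def F(x):
    return x[0]**2 + 2*x[0]*x[1] + 2*x[1]**2

def grad_F(x):
    # Analytic gradient of F
    return np.array([2*x[0] + 2*x[1], 2*x[0] + 4*x[1]])

x_star = np.array([0.5, 0.0])
p = np.array([1.0, -1.0])

g = grad_F(x_star)
slope = p @ g / np.linalg.norm(p)       # first directional derivative along p
print(g, slope)                         # [1. 1.]  0.0  -> no slope along p

H = np.array([[2.0, 2.0],
              [2.0, 4.0]])              # Hessian of F (constant for a quadratic)
curvature = p @ H @ p / (p @ p)         # second directional derivative along p
print(curvature)                        # 1.0
```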
8 Plots
[Figure: surface and contour plots of F(x) = x_1^2 + 2 x_1 x_2 + 2 x_2^2, with the directional derivative at x* shown for several directions (values from 0.0 up to 1.4).]
8 Minima
Strong Minimum
The point x* is a strong minimum of F(x) if a scalar δ > 0 exists such that F(x*) < F(x* + Δx) for all Δx such that δ > ||Δx|| > 0.

Global Minimum
The point x* is a unique global minimum of F(x) if F(x*) < F(x* + Δx) for all Δx ≠ 0.

Weak Minimum
The point x* is a weak minimum of F(x) if it is not a strong minimum, and a scalar δ > 0 exists such that F(x*) ≤ F(x* + Δx) for all Δx such that δ > ||Δx|| > 0.
8 Scalar Example
$$F(x) = 3x^4 - 7x^2 - \frac{1}{2}x + 6$$

[Figure: plot of F(x) over -2 ≤ x ≤ 2, showing a strong maximum near x = 0, a strong local minimum, and the global minimum.]
8 Vector Example
$$F(\mathbf{x}) = (x_2 - x_1)^4 + 8 x_1 x_2 - x_1 + x_2 + 3 \qquad\qquad F(\mathbf{x}) = (x_1^2 - 1.5 x_1 x_2 + 2 x_2^2)\, x_1^2$$

[Figure: contour and surface plots of the two example functions over -2 ≤ x_1, x_2 ≤ 2.]
8 First-Order Optimality Condition
$$F(\mathbf{x}) = F(\mathbf{x}^* + \Delta\mathbf{x}) = F(\mathbf{x}^*) + \nabla F(\mathbf{x})^T\big|_{\mathbf{x}=\mathbf{x}^*} \Delta\mathbf{x} + \frac{1}{2} \Delta\mathbf{x}^T\, \nabla^2 F(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}^*}\, \Delta\mathbf{x} + \cdots \qquad (\Delta\mathbf{x} = \mathbf{x} - \mathbf{x}^*)$$

For small Δx:

$$F(\mathbf{x}^* + \Delta\mathbf{x}) \approx F(\mathbf{x}^*) + \nabla F(\mathbf{x})^T\big|_{\mathbf{x}=\mathbf{x}^*} \Delta\mathbf{x}$$

If x* is a minimum, this implies:

$$\nabla F(\mathbf{x})^T\big|_{\mathbf{x}=\mathbf{x}^*} \Delta\mathbf{x} \geq 0$$

If instead $\nabla F(\mathbf{x})^T\big|_{\mathbf{x}=\mathbf{x}^*} \Delta\mathbf{x} > 0$, then

$$F(\mathbf{x}^* - \Delta\mathbf{x}) \approx F(\mathbf{x}^*) - \nabla F(\mathbf{x})^T\big|_{\mathbf{x}=\mathbf{x}^*} \Delta\mathbf{x} < F(\mathbf{x}^*)$$

But this would imply that x* is not a minimum. Therefore

$$\nabla F(\mathbf{x})^T\big|_{\mathbf{x}=\mathbf{x}^*} \Delta\mathbf{x} = 0$$

Since this must be true for every Δx,

$$\nabla F(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}^*} = \mathbf{0}$$
8 Second-Order Condition
If the first-order condition is satisfied (zero gradient), then

$$F(\mathbf{x}^* + \Delta\mathbf{x}) = F(\mathbf{x}^*) + \frac{1}{2} \Delta\mathbf{x}^T\, \nabla^2 F(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}^*}\, \Delta\mathbf{x} + \cdots$$

A strong minimum will exist at x* if

$$\Delta\mathbf{x}^T\, \nabla^2 F(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}^*}\, \Delta\mathbf{x} > 0 \quad \text{for any } \Delta\mathbf{x} \neq \mathbf{0}.$$

Therefore the Hessian matrix must be positive definite. A matrix A is positive definite if

$$\mathbf{z}^T \mathbf{A} \mathbf{z} > 0 \quad \text{for any } \mathbf{z} \neq \mathbf{0}.$$

This is a sufficient condition for optimality.

A necessary condition is that the Hessian matrix be positive semidefinite. A matrix A is positive semidefinite if

$$\mathbf{z}^T \mathbf{A} \mathbf{z} \geq 0 \quad \text{for any } \mathbf{z}.$$
8 Example
$$F(\mathbf{x}) = x_1^2 + 2 x_1 x_2 + 2 x_2^2 + x_1$$

$$\nabla F(\mathbf{x}) = \begin{bmatrix} 2x_1 + 2x_2 + 1 \\ 2x_1 + 4x_2 \end{bmatrix} = \mathbf{0} \quad\Rightarrow\quad \mathbf{x}^* = \begin{bmatrix} -1 \\ 0.5 \end{bmatrix}$$

$$\nabla^2 F(\mathbf{x}) = \begin{bmatrix} 2 & 2 \\ 2 & 4 \end{bmatrix} \quad \text{(not a function of } \mathbf{x} \text{ in this case)}$$

To test the definiteness, check the eigenvalues of the Hessian. If the eigenvalues are all greater than zero, the Hessian is positive definite.

$$\left| \nabla^2 F(\mathbf{x}) - \lambda\mathbf{I} \right| = \left| \begin{bmatrix} 2-\lambda & 2 \\ 2 & 4-\lambda \end{bmatrix} \right| = \lambda^2 - 6\lambda + 4 = (\lambda - 0.76)(\lambda - 5.24)$$

$$\lambda = 0.76,\; 5.24 \qquad \text{Both eigenvalues are positive, therefore } \mathbf{x}^* \text{ is a strong minimum.}$$
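Both the stationary point and the definiteness test reduce to a couple of NumPy calls. A sketch; the quadratic-form coefficients are read off the example above:

```python
import numpy as np

# F(x) = x1^2 + 2 x1 x2 + 2 x2^2 + x1  written as  0.5 x^T A x + d^T x
A = np.array([[2.0, 2.0],
              [2.0, 4.0]])        # Hessian
d = np.array([1.0, 0.0])

x_star = np.linalg.solve(A, -d)   # gradient A x + d = 0
print(x_star)                     # [-1.   0.5]

eigvals = np.linalg.eigvalsh(A)   # Hessian is symmetric
print(np.round(eigvals, 2))       # [0.76 5.24] -> positive definite, strong minimum
```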
8 Quadratic Functions
$$F(\mathbf{x}) = \frac{1}{2} \mathbf{x}^T \mathbf{A} \mathbf{x} + \mathbf{d}^T \mathbf{x} + c \qquad \text{(symmetric } \mathbf{A}\text{)}$$

Gradient and Hessian. Useful properties of gradients:

$$\nabla(\mathbf{h}^T\mathbf{x}) = \nabla(\mathbf{x}^T\mathbf{h}) = \mathbf{h}$$

$$\nabla(\mathbf{x}^T\mathbf{Q}\mathbf{x}) = \mathbf{Q}\mathbf{x} + \mathbf{Q}^T\mathbf{x} = 2\mathbf{Q}\mathbf{x} \quad \text{(for symmetric } \mathbf{Q}\text{)}$$

Gradient of the quadratic function:

$$\nabla F(\mathbf{x}) = \mathbf{A}\mathbf{x} + \mathbf{d}$$

Hessian of the quadratic function:

$$\nabla^2 F(\mathbf{x}) = \mathbf{A}$$
8 Eigensystem of the Hessian
Consider a quadratic function which has a stationary point at the origin, and whose value there is zero:

$$F(\mathbf{x}) = \frac{1}{2} \mathbf{x}^T \mathbf{A} \mathbf{x}$$

Perform a similarity transform on the Hessian matrix, using the eigenvectors as the new basis vectors:

$$\mathbf{B} = \begin{bmatrix} \mathbf{z}_1 & \mathbf{z}_2 & \cdots & \mathbf{z}_n \end{bmatrix}$$

Since the Hessian matrix is symmetric, its eigenvectors are orthogonal, so (with normalized eigenvectors)

$$\mathbf{B}^{-1} = \mathbf{B}^T$$

$$\mathbf{A}' = [\mathbf{B}^T \mathbf{A} \mathbf{B}] = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & & & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix} = \mathbf{\Lambda} \qquad \mathbf{A} = \mathbf{B}\mathbf{\Lambda}\mathbf{B}^T$$
8 Second Directional Derivative
$$\frac{\mathbf{p}^T \nabla^2 F(\mathbf{x})\, \mathbf{p}}{\|\mathbf{p}\|^2} = \frac{\mathbf{p}^T \mathbf{A} \mathbf{p}}{\|\mathbf{p}\|^2}$$

Represent p with respect to the eigenvectors (the new basis): p = Bc.

$$\frac{\mathbf{p}^T \mathbf{A} \mathbf{p}}{\|\mathbf{p}\|^2} = \frac{\mathbf{c}^T \mathbf{B}^T (\mathbf{B}\mathbf{\Lambda}\mathbf{B}^T) \mathbf{B} \mathbf{c}}{\mathbf{c}^T \mathbf{B}^T \mathbf{B} \mathbf{c}} = \frac{\mathbf{c}^T \mathbf{\Lambda} \mathbf{c}}{\mathbf{c}^T \mathbf{c}} = \frac{\sum_{i=1}^{n} \lambda_i c_i^2}{\sum_{i=1}^{n} c_i^2}$$

$$\lambda_{min} \leq \frac{\mathbf{p}^T \mathbf{A} \mathbf{p}}{\|\mathbf{p}\|^2} \leq \lambda_{max}$$
8 Eigenvector (Largest Eigenvalue)
$$\mathbf{p} = \mathbf{z}_{max} \qquad \mathbf{c} = \mathbf{B}^T\mathbf{p} = \mathbf{B}^T\mathbf{z}_{max} = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix} \quad \text{(a 1 in the position of } \lambda_{max}\text{)}$$

$$\frac{\mathbf{z}_{max}^T \mathbf{A}\, \mathbf{z}_{max}}{\|\mathbf{z}_{max}\|^2} = \frac{\sum_{i=1}^{n} \lambda_i c_i^2}{\sum_{i=1}^{n} c_i^2} = \lambda_{max}$$

[Figure: contour plot with the eigenvector directions z_1 and z_2, marking the directions of minimum and maximum curvature.]

The eigenvalues represent curvature (second derivatives) along the eigenvectors (the principal axes).
8 Circular Hollow
$$F(\mathbf{x}) = x_1^2 + x_2^2 = \frac{1}{2} \mathbf{x}^T \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix} \mathbf{x}$$

$$\nabla^2 F(\mathbf{x}) = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix} \qquad \lambda_1 = 2,\; \mathbf{z}_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \qquad \lambda_2 = 2,\; \mathbf{z}_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$$

(Any two independent vectors in the plane would work as eigenvectors.)

[Figure: circular contour and surface plots of F(x).]
8 Elliptical Hollow
$$F(\mathbf{x}) = x_1^2 + x_1 x_2 + x_2^2 = \frac{1}{2} \mathbf{x}^T \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} \mathbf{x}$$

$$\nabla^2 F(\mathbf{x}) = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} \qquad \lambda_1 = 1,\; \mathbf{z}_1 = \begin{bmatrix} 1 \\ -1 \end{bmatrix} \qquad \lambda_2 = 3,\; \mathbf{z}_2 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$$

[Figure: elliptical contour and surface plots of F(x).]
8 Elongated Saddle
$$F(\mathbf{x}) = -\frac{1}{4} x_1^2 - \frac{3}{2} x_1 x_2 - \frac{1}{4} x_2^2 = \frac{1}{2} \mathbf{x}^T \begin{bmatrix} -0.5 & -1.5 \\ -1.5 & -0.5 \end{bmatrix} \mathbf{x}$$

$$\nabla^2 F(\mathbf{x}) = \begin{bmatrix} -0.5 & -1.5 \\ -1.5 & -0.5 \end{bmatrix} \qquad \lambda_1 = 1,\; \mathbf{z}_1 = \begin{bmatrix} 1 \\ -1 \end{bmatrix} \qquad \lambda_2 = -2,\; \mathbf{z}_2 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$$

[Figure: contour and surface plots of the saddle; curvature is positive along z_1 and negative along z_2.]
8 Stationary Valley
$$F(\mathbf{x}) = \frac{1}{2} x_1^2 - x_1 x_2 + \frac{1}{2} x_2^2 = \frac{1}{2} \mathbf{x}^T \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix} \mathbf{x}$$

$$\nabla^2 F(\mathbf{x}) = \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix} \qquad \lambda_1 = 2,\; \mathbf{z}_1 = \begin{bmatrix} 1 \\ -1 \end{bmatrix} \qquad \lambda_2 = 0,\; \mathbf{z}_2 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$$

[Figure: contour and surface plots; the function has zero curvature along z_2, forming a stationary valley of weak minima.]
8 Quadratic Function Summary
If the eigenvalues of the Hessian matrix are all positive, the
function will have a single strong minimum.
If the eigenvalues are all negative, the function will have a
single strong maximum.
If some eigenvalues are positive and other eigenvalues are
negative, the function will have a single saddle point.
If the eigenvalues are all nonnegative, but some
eigenvalues are zero, then the function will either have a
weak minimum or will have no stationary point.
If the eigenvalues are all nonpositive, but some
eigenvalues are zero, then the function will either have a
weak maximum or will have no stationary point.

Stationary Point:

$$\mathbf{x}^* = -\mathbf{A}^{-1}\mathbf{d}$$
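The summary above translates directly into a small eigenvalue test. A sketch that classifies each of the chapter's example Hessians and, where it exists, computes the stationary point x* = -A⁻¹d (the function name and thresholds are my choices):

```python
import numpy as np

def analyze_quadratic(A, d):
    """Classify F(x) = 0.5 x^T A x + d^T x + c by the eigenvalues of its Hessian A."""
    eigvals = np.linalg.eigvalsh(A)
    if np.all(eigvals > 0):
        kind = "strong minimum"
    elif np.all(eigvals < 0):
        kind = "strong maximum"
    elif np.any(eigvals > 0) and np.any(eigvals < 0):
        kind = "saddle point"
    else:
        kind = "weak minimum/maximum or no stationary point"
    # Stationary point x* = -A^-1 d exists only when A is invertible
    x_star = -np.linalg.solve(A, d) if abs(np.linalg.det(A)) > 1e-12 else None
    return eigvals, kind, x_star

# The four example Hessians from this chapter (with d = 0)
for A in [np.array([[2.0, 0.0], [0.0, 2.0]]),      # circular hollow
          np.array([[2.0, 1.0], [1.0, 2.0]]),      # elliptical hollow
          np.array([[-0.5, -1.5], [-1.5, -0.5]]),  # elongated saddle
          np.array([[1.0, -1.0], [-1.0, 1.0]])]:   # stationary valley
    print(analyze_quadratic(A, np.zeros(2)))
```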
