Sistemas Neurodifusos (Neuro-Fuzzy Systems)
Linear Transformations
6 Hopfield Network Questions
[Figure: recurrent layer. The input $p$ ($S \times 1$) supplies the initial condition $a(0) = p$; the output $a(t)$ is fed back through a delay, giving $n(t+1) = W a(t) + b$, with $W$ ($S \times S$) and $b$ ($S \times 1$), and $a(t+1)$ is obtained by applying the layer's transfer function to $n(t+1)$.]
6 Example - Rotation
Is rotation linear?
A transformation is linear if it satisfies:

1. $A(x_1 + x_2) = A(x_1) + A(x_2)$
2. $A(ax) = aA(x)$

[Figure: rotating the parallelogram sum $x_1 + x_2$ yields $A(x_1) + A(x_2)$, and rotating a scaled vector $ax$ yields $aA(x)$, so rotation satisfies both properties.]
6 Matrix Representation - (1)
Any linear transformation between two finite-dimensional
vector spaces can be represented by matrix multiplication.
Let {v1, v2, ..., vn} be a basis for X, and let {u1, u2, ..., um} be
a basis for Y.
$$x = \sum_{i=1}^{n} x_i v_i \qquad y = \sum_{i=1}^{m} y_i u_i$$

Let $A : X \to Y$ with

$$A(x) = y$$

Then

$$A\left( \sum_{j=1}^{n} x_j v_j \right) = \sum_{i=1}^{m} y_i u_i$$
6 Matrix Representation - (2)
Since A is a linear operator,
$$\sum_{j=1}^{n} x_j A(v_j) = \sum_{i=1}^{m} y_i u_i$$

Each $A(v_j)$ is a vector in $Y$ and can therefore be expanded in the basis for $Y$, $A(v_j) = \sum_{i=1}^{m} a_{ij} u_i$, giving

$$\sum_{j=1}^{n} x_j \sum_{i=1}^{m} a_{ij} u_i = \sum_{i=1}^{m} y_i u_i$$
6 Matrix Representation - (3)
$$\sum_{i=1}^{m} u_i \left( \sum_{j=1}^{n} a_{ij} x_j \right) = \sum_{i=1}^{m} y_i u_i$$

$$\sum_{i=1}^{m} u_i \left( \sum_{j=1}^{n} a_{ij} x_j - y_i \right) = 0$$

Since the basis vectors $u_i$ are linearly independent, each coefficient must be zero:

$$\sum_{j=1}^{n} a_{ij} x_j = y_i$$

This is equivalent to matrix multiplication:

$$\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}$$
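This column-by-column construction is easy to check numerically. Below is a minimal NumPy sketch (the helper name `matrix_of_transformation` and the rotation example are illustrative, not from the slides): each column of the matrix holds the coordinates of $A(v_j)$ expanded in the basis for $Y$.

```python
import numpy as np

def matrix_of_transformation(A, basis_X, basis_Y):
    """Column j holds the coordinates of A(v_j) in the basis for Y."""
    B_Y = np.column_stack(basis_Y)
    # Solve B_Y @ coords = A(v_j) for the coordinates of each transformed basis vector.
    return np.column_stack([np.linalg.solve(B_Y, A(v)) for v in basis_X])

# Example: rotation by 30 degrees, using the standard basis of R^2.
theta = np.pi / 6
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
rotate = lambda v: R @ v

basis = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
M = matrix_of_transformation(rotate, basis, basis)

x = np.array([2.0, 1.0])
assert np.allclose(M @ x, rotate(x))  # matrix multiplication reproduces A(x)
```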
6 Summary
$$A(v_j) = \sum_{i=1}^{m} a_{ij} u_i$$

The coordinates of $A(v_j)$ relative to the basis for $Y$ form the $j$-th column of the matrix.

6 Example - (1)

Consider the transformation produced by skewing a deck of cards: a vector $x$ drawn on the side of the deck is carried to $y = A(x)$.

[Figure: the deck before and after skewing, with $x$ and $y = A(x)$ shown in the standard basis $\{s_1, s_2\}$.]
6 Example - (2)
To find the matrix we need to transform each of the basis vectors.
$$A(v_j) = \sum_{i=1}^{m} a_{ij} u_i$$
6 Example - (3)
We begin with s1:
If we draw a line on the bottom card and then skew the
deck, the line will not change.
[Figure: $A(s_1)$ coincides with $s_1$.]

$$A(s_1) = 1 s_1 + 0 s_2 = \sum_{i=1}^{2} a_{i1} s_i = a_{11} s_1 + a_{21} s_2$$

A line drawn on the side of the deck, however, is tilted through the skew angle $\theta$:

[Figure: $A(s_2)$ is $s_2$ tilted by $\theta$, with horizontal component $\tan(\theta)$.]

$$A(s_2) = \tan(\theta)\, s_1 + 1 s_2 = \sum_{i=1}^{2} a_{i2} s_i = a_{12} s_1 + a_{22} s_2$$

$$A = \begin{bmatrix} 1 & \tan(\theta) \\ 0 & 1 \end{bmatrix}$$
6 Change of Basis
Consider the linear transformation $A : X \to Y$. Let {v1, v2, ..., vn} be a basis for X, and let {u1, u2, ..., um} be a basis for Y.

$$x = \sum_{i=1}^{n} x_i v_i \qquad y = \sum_{i=1}^{m} y_i u_i \qquad A(x) = y$$

$$\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix} \qquad\Longleftrightarrow\qquad Ax = y$$
6 New Basis Sets
Now let's consider different basis sets. Let {t1, t2, ..., tn} be a basis for X, and let {w1, w2, ..., wm} be a basis for Y.

$$x = \sum_{i=1}^{n} x'_i t_i \qquad y = \sum_{i=1}^{m} y'_i w_i$$

In the new bases the same transformation has a new matrix representation:

$$\begin{bmatrix} a'_{11} & a'_{12} & \cdots & a'_{1n} \\ a'_{21} & a'_{22} & \cdots & a'_{2n} \\ \vdots & \vdots & & \vdots \\ a'_{m1} & a'_{m2} & \cdots & a'_{mn} \end{bmatrix} \begin{bmatrix} x'_1 \\ x'_2 \\ \vdots \\ x'_n \end{bmatrix} = \begin{bmatrix} y'_1 \\ y'_2 \\ \vdots \\ y'_m \end{bmatrix} \qquad\Longleftrightarrow\qquad A'x' = y'$$
6 How are A and A' related?
Expand ti in terms of the original basis vectors for X.
$$t_i = \sum_{j=1}^{n} t_{ji} v_j \qquad\Rightarrow\qquad t_i = \begin{bmatrix} t_{1i} \\ t_{2i} \\ \vdots \\ t_{ni} \end{bmatrix}$$

Similarly, expand $w_i$ in terms of the original basis vectors for $Y$:

$$w_i = \sum_{j=1}^{m} w_{ji} u_j \qquad\Rightarrow\qquad w_i = \begin{bmatrix} w_{1i} \\ w_{2i} \\ \vdots \\ w_{mi} \end{bmatrix}$$
6 How are A and A' related?
Collect the new basis vectors into matrices:

$$B_t = \begin{bmatrix} t_1 & t_2 & \cdots & t_n \end{bmatrix},\quad x = B_t x' \qquad\qquad B_w = \begin{bmatrix} w_1 & w_2 & \cdots & w_m \end{bmatrix},\quad y = B_w y'$$

Substituting into $Ax = y$ gives $A B_t x' = B_w y'$, so

$$[B_w^{-1} A B_t]\, x' = y'$$

$$A' = B_w^{-1} A B_t \qquad A'x' = y' \qquad \text{(Similarity Transform)}$$
6 Example - (1)
Take the skewing problem described previously, and find the
new matrix representation using the basis set {s1, s2}.
[Figure: the new basis vectors $t_1$ and $t_2$ drawn together with the standard basis $\{s_1, s_2\}$.]

$$t_1 = 0.5 s_1 + s_2 = \begin{bmatrix} 0.5 \\ 1 \end{bmatrix} \qquad t_2 = -s_1 + s_2 = \begin{bmatrix} -1 \\ 1 \end{bmatrix}$$

$$B_t = \begin{bmatrix} t_1 & t_2 \end{bmatrix} = \begin{bmatrix} 0.5 & -1 \\ 1 & 1 \end{bmatrix} \qquad B_w = B_t = \begin{bmatrix} 0.5 & -1 \\ 1 & 1 \end{bmatrix} \quad \text{(same basis for domain and range)}$$
6 Example - (2)
$$A' = [B_w^{-1} A B_t] = \begin{bmatrix} 2/3 & 2/3 \\ -2/3 & 1/3 \end{bmatrix} \begin{bmatrix} 1 & \tan\theta \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 0.5 & -1 \\ 1 & 1 \end{bmatrix}$$

For $\theta = 45°$:

$$A' = \begin{bmatrix} 5/3 & 2/3 \\ -2/3 & 1/3 \end{bmatrix} \qquad A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$$
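A quick numerical check of this change of basis (a minimal NumPy sketch; the variable names are illustrative):

```python
import numpy as np

A  = np.array([[1.0, 1.0],   # skew with theta = 45 deg, so tan(theta) = 1
               [0.0, 1.0]])
Bt = np.array([[0.5, -1.0],  # columns are the new basis vectors t1, t2
               [1.0,  1.0]])
Bw = Bt                      # same basis for domain and range

A_prime = np.linalg.inv(Bw) @ A @ Bt
print(A_prime)               # [[ 5/3, 2/3], [-2/3, 1/3]]

# The test vector x = [0.5, 1] is exactly t1, so x' = [1, 0].
x_prime = np.array([1.0, 0.0])
x = Bt @ x_prime
assert np.allclose(Bw @ (A_prime @ x_prime), A @ x)  # same y in both coordinate systems
```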
6 Example - (3)
Try a test vector:

$$x = \begin{bmatrix} 0.5 \\ 1 \end{bmatrix} \qquad x' = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$$

[Figure: the transformation $y = A(x)$ drawn in both coordinate systems, $\{s_1, s_2\}$ and $\{t_1, t_2\}$.]

Can you find an eigenvector for this transformation, i.e. a vector $z$ with

$$A(z) = \lambda z\,?$$
6 Computing the Eigenvalues
$$Az = \lambda z \qquad [A - \lambda I]z = 0 \qquad \big| A - \lambda I \big| = 0$$

For the skew transformation:

$$A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} \qquad \begin{vmatrix} 1-\lambda & 1 \\ 0 & 1-\lambda \end{vmatrix} = (1 - \lambda)^2 = 0 \qquad \lambda_1 = \lambda_2 = 1$$

$$\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} z_{11} \\ z_{21} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \qquad\Rightarrow\qquad z_{21} = 0 \qquad z_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$$

There is only one eigenvector in this case; it lies along the bottom of the deck, which the skew leaves unchanged.

Diagonalization: given the eigenvectors $\{z_1, z_2, \ldots, z_n\}$ and eigenvalues $\{\lambda_1, \lambda_2, \ldots, \lambda_n\}$, let $B = \begin{bmatrix} z_1 & z_2 & \cdots & z_n \end{bmatrix}$. Then

$$[B^{-1} A B] = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix}$$
6 Example
$$A = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$$

$$\begin{vmatrix} 1-\lambda & 1 \\ 1 & 1-\lambda \end{vmatrix} = \lambda^2 - 2\lambda = \lambda(\lambda - 2) = 0 \qquad \lambda_1 = 0,\ \lambda_2 = 2$$

$$\lambda_1 = 0: \quad \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} z_{11} \\ z_{21} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \quad\Rightarrow\quad z_{21} = -z_{11} \qquad z_1 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$$

$$\lambda_2 = 2: \quad \begin{bmatrix} -1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} z_{12} \\ z_{22} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \quad\Rightarrow\quad z_{22} = z_{12} \qquad z_2 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$$

Diagonal form:

$$A' = [B^{-1} A B] = \begin{bmatrix} 1/2 & -1/2 \\ 1/2 & 1/2 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 2 \end{bmatrix}$$
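The same diagonalization can be reproduced with NumPy's eigendecomposition (a minimal sketch; note that `np.linalg.eig` may order and normalize the eigenvectors differently than the slides):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0]])

lam, B = np.linalg.eig(A)            # eigenvalues and eigenvector matrix B
A_diag = np.linalg.inv(B) @ A @ B    # similarity transform with B

assert np.allclose(A_diag, np.diag(lam))  # diagonal form, eigenvalues on the diagonal
print(np.round(lam, 6))                   # 0 and 2, possibly in a different order
```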
Supervised Hebbian Learning
7 Hebb's Postulate

"When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased."

D. O. Hebb, 1949

[Figure: two neurons meeting at a synapse, with dendrites, cell body, and axon labeled.]
7 Linear Associator
[Figure: linear layer with input $p$ ($R \times 1$), weight matrix $W$ ($S \times R$), and output $a$ ($S \times 1$).]

$$a = \mathrm{purelin}(Wp)$$

$$a = Wp \qquad a_i = \sum_{j=1}^{R} w_{ij} p_j$$

Training Set:

$$\{p_1, t_1\},\ \{p_2, t_2\},\ \ldots,\ \{p_Q, t_Q\}$$
7 Hebb Rule
$$w_{ij}^{new} = w_{ij}^{old} + \alpha f_i(a_{iq})\, g_j(p_{jq})$$

where $a_{iq}$ is the postsynaptic signal and $p_{jq}$ is the presynaptic signal.

Simplified Form:

$$w_{ij}^{new} = w_{ij}^{old} + \alpha\, a_{iq}\, p_{jq}$$

Supervised Form:

$$w_{ij}^{new} = w_{ij}^{old} + t_{iq}\, p_{jq}$$

Matrix Form:

$$W^{new} = W^{old} + t_q p_q^T$$
7 Batch Operation
$$W = t_1 p_1^T + t_2 p_2^T + \cdots + t_Q p_Q^T = \sum_{q=1}^{Q} t_q p_q^T \qquad \text{(zero initial weights)}$$

Matrix Form:

$$W = \begin{bmatrix} t_1 & t_2 & \cdots & t_Q \end{bmatrix} \begin{bmatrix} p_1^T \\ p_2^T \\ \vdots \\ p_Q^T \end{bmatrix} = TP^T$$

where

$$T = \begin{bmatrix} t_1 & t_2 & \cdots & t_Q \end{bmatrix} \qquad P = \begin{bmatrix} p_1 & p_2 & \cdots & p_Q \end{bmatrix}$$
7 Performance Analysis
$$a = Wp_k = \left( \sum_{q=1}^{Q} t_q p_q^T \right) p_k = \sum_{q=1}^{Q} t_q \left( p_q^T p_k \right)$$

If the prototype patterns are orthonormal, $p_q^T p_k$ is 1 for $q = k$ and 0 otherwise, so the output reproduces the target exactly: $a = Wp_k = t_k$. If the patterns are normalized but not orthogonal:

$$a = Wp_k = t_k + \sum_{q \neq k} t_q \left( p_q^T p_k \right)$$

where the second term is the error.
7 Example
Banana and apple prototype patterns, normalized, with targets:

$$p_1 = \begin{bmatrix} -1 \\ 1 \\ -1 \end{bmatrix} \rightarrow p_1 = \begin{bmatrix} -0.5774 \\ 0.5774 \\ -0.5774 \end{bmatrix},\ t_1 = -1 \qquad p_2 = \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix} \rightarrow p_2 = \begin{bmatrix} 0.5774 \\ 0.5774 \\ -0.5774 \end{bmatrix},\ t_2 = 1$$

The Hebb rule gives $W = TP^T = \begin{bmatrix} 1.1548 & 0 & 0 \end{bmatrix}$.

Tests:

$$\text{Banana: } Wp_1 = \begin{bmatrix} 1.1548 & 0 & 0 \end{bmatrix} \begin{bmatrix} -0.5774 \\ 0.5774 \\ -0.5774 \end{bmatrix} = -0.6668$$

$$\text{Apple: } Wp_2 = \begin{bmatrix} 1.1548 & 0 & 0 \end{bmatrix} \begin{bmatrix} 0.5774 \\ 0.5774 \\ -0.5774 \end{bmatrix} = 0.6668$$
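The whole example fits in a few lines of NumPy (a minimal sketch of the batch Hebb rule; variable names are illustrative):

```python
import numpy as np

# Normalized banana and apple prototypes and their targets, as on the slide.
p1 = np.array([-1.0, 1.0, -1.0]) / np.sqrt(3)
p2 = np.array([ 1.0, 1.0, -1.0]) / np.sqrt(3)
P = np.column_stack([p1, p2])
T = np.array([[-1.0, 1.0]])      # t1 = -1 (banana), t2 = +1 (apple)

W = T @ P.T                      # batch Hebb rule: W = T P^T
print(np.round(W, 4))            # [[1.1547, 0., 0.]]
print(W @ p1, W @ p2)            # about -0.6668 and +0.6668
```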
7 Pseudoinverse Rule - (1)
Performance Index:

$$Wp_q = t_q, \qquad q = 1, 2, \ldots, Q$$

$$F(W) = \sum_{q=1}^{Q} \| t_q - Wp_q \|^2$$

Matrix Form:

$$WP = T \qquad T = \begin{bmatrix} t_1 & t_2 & \cdots & t_Q \end{bmatrix} \qquad P = \begin{bmatrix} p_1 & p_2 & \cdots & p_Q \end{bmatrix}$$

$$F(W) = \| T - WP \|^2 = \| E \|^2 \qquad \| E \|^2 = \sum_i \sum_j e_{ij}^2$$
7 Pseudoinverse Rule - (2)
$$WP = T$$

Minimize:

$$F(W) = \| T - WP \|^2 = \| E \|^2$$

When the prototype vectors are linearly independent, the minimizing weight matrix is given by the pseudoinverse rule:

$$W = TP^+ \qquad P^+ = (P^T P)^{-1} P^T$$
7 Relationship to the Hebb Rule
Hebb Rule:

$$W = TP^T$$

Pseudoinverse Rule:

$$W = TP^+ \qquad P^+ = (P^T P)^{-1} P^T$$

If the prototype patterns are orthonormal, then

$$P^T P = I \qquad\Rightarrow\qquad P^+ = (P^T P)^{-1} P^T = P^T$$

and the two rules give the same weights.
7 Example
$$p_1 = \begin{bmatrix} -1 \\ 1 \\ -1 \end{bmatrix},\ t_1 = -1 \qquad p_2 = \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix},\ t_2 = 1 \qquad W = TP^+ = \begin{bmatrix} -1 & 1 \end{bmatrix} P^+$$

$$P^+ = (P^T P)^{-1} P^T = \begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix}^{-1} \begin{bmatrix} -1 & 1 & -1 \\ 1 & 1 & -1 \end{bmatrix} = \begin{bmatrix} -0.5 & 0.25 & -0.25 \\ 0.5 & 0.25 & -0.25 \end{bmatrix}$$

$$W = \begin{bmatrix} 1 & 0 & 0 \end{bmatrix}$$

$$Wp_1 = \begin{bmatrix} 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} -1 \\ 1 \\ -1 \end{bmatrix} = -1 \qquad Wp_2 = \begin{bmatrix} 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix} = 1$$
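The same result via NumPy's built-in Moore-Penrose pseudoinverse (a minimal sketch):

```python
import numpy as np

P = np.array([[-1.0,  1.0],     # columns are the prototype patterns p1, p2
              [ 1.0,  1.0],
              [-1.0, -1.0]])
T = np.array([[-1.0, 1.0]])

W = T @ np.linalg.pinv(P)       # pseudoinverse rule: W = T P^+
print(np.round(W, 4))           # [[1., 0., 0.]]
print(W @ P)                    # recovers the targets exactly: [[-1., 1.]]
```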
7 Autoassociative Memory
[Figure: autoassociative network; input $p$ ($30 \times 1$), weight matrix $W$ ($30 \times 30$), output $a$ ($30 \times 1$).]

$$W = p_1 p_1^T + p_2 p_2^T + p_3 p_3^T$$

$$a = \mathrm{hardlims}(Wp)$$
7 Tests
[Figure: recall of the stored patterns from 50% occluded and 67% occluded inputs.]
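A minimal sketch of the same experiment, using random bipolar vectors in place of the 30-pixel digit patterns (the occlusion scheme below is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
# Three random +1/-1 patterns stand in for the three stored digits.
patterns = [rng.choice([-1.0, 1.0], size=30) for _ in range(3)]

# Autoassociative Hebb rule: W is a sum of outer products p p^T.
W = sum(np.outer(p, p) for p in patterns)

probe = patterns[0].copy()
probe[:15] = -1.0                    # occlude 50% of the pattern
recalled = np.sign(W @ probe)        # hardlims transfer function
print(np.array_equal(recalled, patterns[0]))  # usually True for few stored patterns
```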
7 Variations of Hebbian Learning
Basic Rule:

$$W^{new} = W^{old} + t_q p_q^T$$

Learning Rate:

$$W^{new} = W^{old} + \alpha\, t_q p_q^T$$

Delta Rule:

$$W^{new} = W^{old} + \alpha\, (t_q - a_q)\, p_q^T$$

Unsupervised:

$$W^{new} = W^{old} + \alpha\, a_q p_q^T$$
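As an illustration, the delta rule can be iterated over the training set until the outputs approach the targets (a minimal sketch reusing the banana/apple patterns; `alpha` and the epoch count are arbitrary choices):

```python
import numpy as np

def delta_rule(P, T, alpha=0.1, epochs=50):
    """Iterate W <- W + alpha * (t_q - a_q) p_q^T over the training set."""
    W = np.zeros((T.shape[0], P.shape[0]))
    for _ in range(epochs):
        for q in range(P.shape[1]):
            a = W @ P[:, q]                          # linear associator output
            W += alpha * np.outer(T[:, q] - a, P[:, q])
    return W

P = np.array([[-1.0, 1.0], [1.0, 1.0], [-1.0, -1.0]])
T = np.array([[-1.0, 1.0]])
W = delta_rule(P, T)
print(np.round(W @ P, 3))   # approaches the targets [-1, 1]
```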
Performance Surfaces
8 Taylor Series Expansion
$$F(x) = F(x^*) + \frac{dF(x)}{dx}\bigg|_{x=x^*}(x - x^*) + \frac{1}{2}\frac{d^2F(x)}{dx^2}\bigg|_{x=x^*}(x - x^*)^2 + \cdots + \frac{1}{n!}\frac{d^nF(x)}{dx^n}\bigg|_{x=x^*}(x - x^*)^n + \cdots$$
8 Example
$$F(x) = e^{-x}$$

Expanding about the point $x^* = 0$:

$$F(x) = e^{-0} - e^{-0}(x - 0) + \frac{1}{2}e^{-0}(x - 0)^2 - \frac{1}{6}e^{-0}(x - 0)^3 + \cdots$$

$$F(x) = 1 - x + \frac{1}{2}x^2 - \frac{1}{6}x^3 + \cdots$$

Truncated approximations:

$$F(x) \approx F_1(x) = 1 - x$$

$$F(x) \approx F_2(x) = 1 - x + \frac{1}{2}x^2$$
8 Plot of Approximations
[Figure: $F(x) = e^{-x}$ together with the approximations $F_0(x)$, $F_1(x)$, and $F_2(x)$ on $-2 \le x \le 2$; each approximation is accurate near the expansion point $x^* = 0$.]
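The approximations are easy to tabulate (a minimal sketch):

```python
import numpy as np

x = np.linspace(-2.0, 2.0, 5)
F  = np.exp(-x)
F0 = np.ones_like(x)        # zeroth-order approximation
F1 = 1 - x                  # first-order
F2 = 1 - x + x**2 / 2       # second-order

# Higher-order approximations track e^{-x} better near the expansion point x* = 0.
for row in zip(x, F, F0, F1, F2):
    print("x=%5.2f  F=%7.3f  F0=%6.3f  F1=%6.3f  F2=%6.3f" % row)
```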
8 Vector Case
$$F(x) = F(x_1, x_2, \ldots, x_n)$$

$$F(x) = F(x^*) + \frac{\partial F(x)}{\partial x_1}\bigg|_{x=x^*}(x_1 - x_1^*) + \frac{\partial F(x)}{\partial x_2}\bigg|_{x=x^*}(x_2 - x_2^*) + \cdots + \frac{\partial F(x)}{\partial x_n}\bigg|_{x=x^*}(x_n - x_n^*)$$

$$+\ \frac{1}{2}\frac{\partial^2 F(x)}{\partial x_1^2}\bigg|_{x=x^*}(x_1 - x_1^*)^2 + \frac{1}{2}\frac{\partial^2 F(x)}{\partial x_1 \partial x_2}\bigg|_{x=x^*}(x_1 - x_1^*)(x_2 - x_2^*) + \cdots$$
8 Matrix Form
$$F(x) = F(x^*) + \nabla F(x)^T\big|_{x=x^*}\,(x - x^*) + \frac{1}{2}(x - x^*)^T\, \nabla^2 F(x)\big|_{x=x^*}\,(x - x^*) + \cdots$$

Gradient:

$$\nabla F(x) = \begin{bmatrix} \dfrac{\partial F(x)}{\partial x_1} \\[2mm] \dfrac{\partial F(x)}{\partial x_2} \\ \vdots \\ \dfrac{\partial F(x)}{\partial x_n} \end{bmatrix}$$

Hessian:

$$\nabla^2 F(x) = \begin{bmatrix} \dfrac{\partial^2 F(x)}{\partial x_1^2} & \dfrac{\partial^2 F(x)}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 F(x)}{\partial x_1 \partial x_n} \\[2mm] \dfrac{\partial^2 F(x)}{\partial x_2 \partial x_1} & \dfrac{\partial^2 F(x)}{\partial x_2^2} & \cdots & \dfrac{\partial^2 F(x)}{\partial x_2 \partial x_n} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial^2 F(x)}{\partial x_n \partial x_1} & \dfrac{\partial^2 F(x)}{\partial x_n \partial x_2} & \cdots & \dfrac{\partial^2 F(x)}{\partial x_n^2} \end{bmatrix}$$
8 Directional Derivatives
First derivative (slope) of $F(x)$ along vector $p$:

$$\frac{p^T \nabla F(x)}{\|p\|}$$

Second derivative (curvature) of $F(x)$ along vector $p$:

$$\frac{p^T \nabla^2 F(x)\, p}{\|p\|^2}$$
8 Example
$$F(x) = x_1^2 + 2x_1x_2 + 2x_2^2 \qquad x^* = \begin{bmatrix} 0.5 \\ 0 \end{bmatrix} \qquad p = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$$

$$\nabla F(x)\big|_{x=x^*} = \begin{bmatrix} \partial F(x)/\partial x_1 \\ \partial F(x)/\partial x_2 \end{bmatrix}_{x=x^*} = \begin{bmatrix} 2x_1 + 2x_2 \\ 2x_1 + 4x_2 \end{bmatrix}_{x=x^*} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$$

$$\frac{p^T \nabla F(x)}{\|p\|} = \frac{\begin{bmatrix} 1 & -1 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix}}{\sqrt{2}} = \frac{0}{\sqrt{2}} = 0$$
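The same computation in NumPy (a minimal sketch):

```python
import numpy as np

def grad_F(x):
    # Gradient of F(x) = x1^2 + 2 x1 x2 + 2 x2^2
    return np.array([2*x[0] + 2*x[1], 2*x[0] + 4*x[1]])

x_star = np.array([0.5, 0.0])
p = np.array([1.0, -1.0])

slope = p @ grad_F(x_star) / np.linalg.norm(p)
print(slope)   # 0.0: the function has zero slope along this direction
```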
8 Plots
[Figure: surface and contour plots of $F(x)$, with directional derivatives at $x^*$ marked for several directions; the slope is 0.0 along $p$ and grows to about 1.4 along the gradient direction.]
8 Minima
Strong Minimum
The point $x^*$ is a strong minimum of $F(x)$ if a scalar $\delta > 0$ exists such that $F(x^*) < F(x^* + \Delta x)$ for all $\Delta x$ with $\delta > \|\Delta x\| > 0$.

Global Minimum

The point $x^*$ is a unique global minimum of $F(x)$ if $F(x^*) < F(x^* + \Delta x)$ for all $\Delta x \neq 0$.

Weak Minimum

The point $x^*$ is a weak minimum of $F(x)$ if it is not a strong minimum and a scalar $\delta > 0$ exists such that $F(x^*) \le F(x^* + \Delta x)$ for all $\Delta x$ with $\delta > \|\Delta x\| > 0$.
8 Scalar Example
$$F(x) = 3x^4 - 7x^2 - \frac{1}{2}x + 6$$

[Figure: plot of $F(x)$ on $-2 \le x \le 2$ with the strong maximum, strong minimum, and global minimum labeled.]
8 Vector Example
$$F(x) = (x_2 - x_1)^4 + 8x_1x_2 - x_1 + x_2 + 3 \qquad\qquad F(x) = (x_1^2 - 1.5x_1x_2 + 2x_2^2)\,x_1^2$$

[Figure: contour and surface plots of the two functions on $-2 \le x_1, x_2 \le 2$.]
8 First-Order Optimality Condition
$$F(x) = F(x^* + \Delta x) = F(x^*) + \nabla F(x)^T\big|_{x=x^*}\,\Delta x + \frac{1}{2}\,\Delta x^T\, \nabla^2 F(x)\big|_{x=x^*}\,\Delta x + \cdots \qquad \Delta x = x - x^*$$

If $\nabla F(x)^T\big|_{x=x^*}\,\Delta x > 0$ for some small $\Delta x$, then

$$F(x^* - \Delta x) \approx F(x^*) - \nabla F(x)^T\big|_{x=x^*}\,\Delta x < F(x^*)$$

But this would imply that $x^*$ is not a minimum. Therefore

$$\nabla F(x)\big|_{x=x^*} = 0$$

8 Second-Order Condition

With the gradient zero, the second-order term dominates, so the Hessian matrix must be positive definite (a sufficient condition; positive semidefiniteness is necessary). A matrix $A$ is positive definite if

$$z^T A z > 0 \quad \text{for any } z \neq 0$$

8 Example

$$\nabla F(x) = \begin{bmatrix} 2x_1 + 2x_2 + 1 \\ 2x_1 + 4x_2 \end{bmatrix} = 0 \qquad\Rightarrow\qquad x^* = \begin{bmatrix} -1 \\ 0.5 \end{bmatrix}$$

$$\nabla^2 F(x) = \begin{bmatrix} 2 & 2 \\ 2 & 4 \end{bmatrix} \quad \text{(not a function of } x \text{ in this case)}$$

To test the definiteness, check the eigenvalues of the Hessian. If the eigenvalues are all greater than zero, the Hessian is positive definite.

$$\big| \nabla^2 F(x) - \lambda I \big| = \begin{vmatrix} 2 - \lambda & 2 \\ 2 & 4 - \lambda \end{vmatrix} = \lambda^2 - 6\lambda + 4 = (\lambda - 0.76)(\lambda - 5.24)$$

Both eigenvalues are positive, so $x^*$ is a strong minimum.

Because the Hessian of a quadratic function is symmetric, its eigenvectors $\{z_1, \ldots, z_n\}$ can be chosen orthogonal. With $B = \begin{bmatrix} z_1 & z_2 & \cdots & z_n \end{bmatrix}$ we then have $B^{-1} = B^T$ and

$$A' = [B^T A B] = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix} = \Lambda \qquad\Rightarrow\qquad A = B \Lambda B^T$$
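A numerical check of this example (a minimal sketch; the underlying quadratic $F(x) = x_1^2 + 2x_1x_2 + 2x_2^2 + x_1$ is inferred from the gradient above, so treat it as an assumption):

```python
import numpy as np

H = np.array([[2.0, 2.0],        # Hessian (constant for a quadratic)
              [2.0, 4.0]])
d = np.array([1.0, 0.0])         # linear part of the gradient: grad F = H x + d

x_star = np.linalg.solve(H, -d)  # stationary point: H x* = -d
print(x_star)                    # [-1., 0.5]

eigvals = np.linalg.eigvalsh(H)  # symmetric matrix -> eigvalsh
print(np.round(eigvals, 2))      # [0.76, 5.24]: all positive, strong minimum
```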
8 Second Directional Derivative
$$\frac{p^T \nabla^2 F(x)\, p}{\|p\|^2} = \frac{p^T A p}{\|p\|^2}$$

Expand $p$ in terms of the eigenvectors of $A$ by writing $p = Bc$. Since $B^T B = I$ and $A = B\Lambda B^T$:

$$\frac{p^T A p}{\|p\|^2} = \frac{c^T B^T (B \Lambda B^T) B c}{c^T B^T B c} = \frac{c^T \Lambda c}{c^T c} = \frac{\sum_{i=1}^{n} \lambda_i c_i^2}{\sum_{i=1}^{n} c_i^2}$$

$$\lambda_{min} \le \frac{p^T A p}{\|p\|^2} \le \lambda_{max}$$
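The bound is easy to verify by sampling directions (a minimal sketch, using the Hessian of the elliptical hollow example that follows; the random sampling is illustrative):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
lam = np.linalg.eigvalsh(A)      # [1., 3.]

rng = np.random.default_rng(1)
for _ in range(5):
    p = rng.standard_normal(2)
    curvature = p @ A @ p / (p @ p)
    assert lam[0] - 1e-12 <= curvature <= lam[-1] + 1e-12  # bounded by the eigenvalues
```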
8 Eigenvector (Largest Eigenvalue)
Choose $p = z_{max}$, the eigenvector associated with the largest eigenvalue. Its expansion in the eigenvector basis has a single nonzero coefficient:

$$c = B^T p = B^T z_{max} = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix}$$

$$\frac{z_{max}^T A z_{max}}{\|z_{max}\|^2} = \frac{\sum_{i=1}^{n} \lambda_i c_i^2}{\sum_{i=1}^{n} c_i^2} = \lambda_{max}$$

The maximum curvature occurs along $z_{max}$, and the minimum curvature occurs along $z_{min}$.

8 Circular Hollow

$$\nabla^2 F(x) = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix} \qquad \lambda_1 = 2,\ z_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \qquad \lambda_2 = 2,\ z_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$$

[Figure: circular contours; both eigenvalues are equal, so the curvature is the same in every direction.]
8 Elliptical Hollow
$$F(x) = x_1^2 + x_1x_2 + x_2^2 = \frac{1}{2} x^T \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} x$$

$$\nabla^2 F(x) = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} \qquad \lambda_1 = 1,\ z_1 = \begin{bmatrix} 1 \\ -1 \end{bmatrix} \qquad \lambda_2 = 3,\ z_2 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$$

[Figure: elliptical contours, elongated along $z_1$, the direction of smallest curvature.]
8 Elongated Saddle
$$F(x) = -\frac{1}{4}x_1^2 - \frac{3}{2}x_1x_2 - \frac{1}{4}x_2^2 = \frac{1}{2} x^T \begin{bmatrix} -0.5 & -1.5 \\ -1.5 & -0.5 \end{bmatrix} x$$

$$\nabla^2 F(x) = \begin{bmatrix} -0.5 & -1.5 \\ -1.5 & -0.5 \end{bmatrix} \qquad \lambda_1 = 1,\ z_1 = \begin{bmatrix} 1 \\ -1 \end{bmatrix} \qquad \lambda_2 = -2,\ z_2 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$$

[Figure: saddle-shaped surface and contours; curvature is positive along $z_1$ and negative along $z_2$.]
8 Stationary Valley
$$F(x) = \frac{1}{2}x_1^2 - x_1x_2 + \frac{1}{2}x_2^2 = \frac{1}{2} x^T \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix} x$$

$$\nabla^2 F(x) = \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix} \qquad \lambda_1 = 2,\ z_1 = \begin{bmatrix} 1 \\ -1 \end{bmatrix} \qquad \lambda_2 = 0,\ z_2 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$$

[Figure: a valley of weak minima along $z_2$, the direction of zero curvature.]
8 Quadratic Function Summary
If the eigenvalues of the Hessian matrix are all positive, the
function will have a single strong minimum.
If the eigenvalues are all negative, the function will have a
single strong maximum.
If some eigenvalues are positive and other eigenvalues are
negative, the function will have a single saddle point.
If the eigenvalues are all nonnegative, but some
eigenvalues are zero, then the function will either have a
weak minimum or will have no stationary point.
If the eigenvalues are all nonpositive, but some
eigenvalues are zero, then the function will either have a
weak maximum or will have no stationary point.
Stationary Point: for the quadratic function $F(x) = \frac{1}{2} x^T A x + d^T x + c$, setting $\nabla F(x) = Ax + d = 0$ gives

$$x^* = -A^{-1} d$$
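The summary can be mechanized from the eigenvalues alone (a minimal sketch; the helper name and tolerance are illustrative):

```python
import numpy as np

def classify_quadratic(A, tol=1e-9):
    """Classify the stationary point of F(x) = 0.5 x^T A x + d^T x + c
    from the eigenvalues of the symmetric Hessian A."""
    lam = np.linalg.eigvalsh(A)
    if np.all(lam > tol):
        return "strong minimum"
    if np.all(lam < -tol):
        return "strong maximum"
    if lam[0] < -tol and lam[-1] > tol:
        return "saddle point"
    if np.all(lam > -tol):
        return "weak minimum or no stationary point"
    return "weak maximum or no stationary point"

# Hessians from this chapter's examples.
for A in ([[2, 1], [1, 2]],              # elliptical hollow -> strong minimum
          [[-0.5, -1.5], [-1.5, -0.5]],  # elongated saddle  -> saddle point
          [[1, -1], [-1, 1]]):           # stationary valley -> weak-minimum case
    print(classify_quadratic(np.array(A, dtype=float)))
```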