Widrow-Hoff Learning

(LMS Algorithm)

In this chapter we apply the principles of performance learning to a single-layer linear neural network. Widrow-Hoff learning is an approximate steepest descent algorithm in which the performance index is the mean square error.

Bernard Widrow began working on neural networks in the late 1950s, at about the same time that Frank Rosenblatt developed the perceptron learning rule. In 1960 Widrow and Hoff introduced the ADALINE (ADAptive LInear NEuron) network. Its learning rule is called the LMS (Least Mean Square) algorithm. ADALINE is similar to the perceptron, except that its transfer function is linear instead of hard-limiting.


IUT-Ahmadzadeh

1430/10/28

References:

Widrow, B., and Hoff, M. E., 1960, Adaptive switching circuits, 1960 IRE WESCON Convention Record, Part 4, New York: IRE, pp. 96-104.

Widrow, B., and Lehr, M. A., 1990, 30 years of adaptive neural networks: Perceptron, madaline, and backpropagation, Proc. IEEE, 78:1415-1441.

Widrow, B., and Stearns, S. D., 1985, Adaptive Signal Processing, Englewood Cliffs, NJ: Prentice-Hall.

Both the ADALINE and the perceptron can only solve linearly separable problems. The LMS algorithm, however, minimizes mean square error, and therefore tries to move the decision boundaries as far from the training patterns as possible. The LMS algorithm has found many more practical uses than the perceptron learning rule (for example, most long-distance phone lines use ADALINE networks for echo cancellation).


ADALINE Network

$a = \mathrm{purelin}(Wp + b) = Wp + b$

The $i$-th row of the weight matrix is ${}_i w^T = [w_{i,1}\; w_{i,2}\; \cdots\; w_{i,R}]$.

Two-Input ADALINE

$a = {}_1 w^T p + b = w_{1,1} p_1 + w_{1,2} p_2 + b$

The decision boundary is determined by the input vectors for which the net input $n$ is zero.
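As a concrete sketch, the ADALINE computation can be written in a few lines of NumPy; the particular weights and inputs below are illustrative, not from the chapter:

```python
import numpy as np

def purelin(n):
    # Linear transfer function: output equals net input
    return n

def adaline(W, b, p):
    # a = purelin(W p + b) = W p + b
    return purelin(W @ p + b)

# Two-input ADALINE: a = w11*p1 + w12*p2 + b  (illustrative values)
W = np.array([[1.0, -0.5]])
b = np.array([0.25])
p = np.array([2.0, 1.0])
print(adaline(W, b, p))  # [1.75]
```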


Mean Square Error

The LMS algorithm is an example of supervised training, in which the learning rule is provided with a set of examples of proper network behavior:

$\{p_1, t_1\},\ \{p_2, t_2\},\ \ldots,\ \{p_Q, t_Q\}$

where $p_q$ is an input to the network and $t_q$ is the corresponding target.

Notation: lump the weights and the bias into one vector, and the input and the constant 1 into another:

$x = \begin{bmatrix} {}_1 w \\ b \end{bmatrix}, \qquad z = \begin{bmatrix} p \\ 1 \end{bmatrix}, \qquad a = {}_1 w^T p + b = x^T z$

The performance index is the mean square error:

$F(x) = E[e^2] = E[(t - a)^2] = E[(t - x^T z)^2]$

The expectation is taken over all input/target pairs.

Error Analysis

$F(x) = E[e^2] = E[(t - a)^2] = E[(t - x^T z)^2]$

$F(x) = E[t^2 - 2 t\, x^T z + x^T z z^T x]$

$F(x) = E[t^2] - 2 x^T E[t z] + x^T E[z z^T]\, x$

This can be written in the following convenient form:

$F(x) = c - 2 x^T h + x^T R x$

where

$c = E[t^2], \qquad h = E[t z], \qquad R = E[z z^T]$

The vector $h$ gives the cross-correlation between the input vector and its associated target. $R$ is the input correlation matrix; the diagonal elements of this matrix are equal to the mean square values of the elements of the input vectors.

The mean square error for the ADALINE network is therefore a quadratic function:

$F(x) = c + d^T x + \tfrac{1}{2} x^T A x$

where

$d = -2h, \qquad A = 2R$

Stationary Point

Hessian matrix: $A = 2R$. It can be shown that all correlation matrices are either positive definite or positive semidefinite; they can have no negative eigenvalues. If $R$ has any zero eigenvalues, the performance index will either have a weak minimum or no stationary point (depending on $d = -2h$); otherwise there will be a unique global minimum $x^*$ (see Ch. 8).

$\nabla F(x) = \nabla\!\left(c + d^T x + \tfrac{1}{2} x^T A x\right) = d + A x = -2h + 2Rx$

Stationary point:

$-2h + 2R x^* = 0$


If the correlation matrix $R$ is positive definite, there is a unique stationary point, which is a strong minimum:

$x^* = R^{-1} h$

If we could compute $h$ and $R$, we could find the minimum point directly from this equation. But it is often not desirable or convenient to calculate $h$ and $R$, so we use an approximate, iterative algorithm instead.

Approximate Steepest Descent

Approximate mean square error (one sample):

$\hat{F}(x) = (t(k) - a(k))^2 = e^2(k)$

The expectation of the squared error has been replaced by the squared error at iteration $k$.

Approximate (stochastic) gradient:

$\hat{\nabla} F(x) = \nabla e^2(k)$

$[\nabla e^2(k)]_j = \dfrac{\partial e^2(k)}{\partial w_{1,j}} = 2 e(k)\, \dfrac{\partial e(k)}{\partial w_{1,j}}, \qquad j = 1, 2, \ldots, R$

$[\nabla e^2(k)]_{R+1} = \dfrac{\partial e^2(k)}{\partial b} = 2 e(k)\, \dfrac{\partial e(k)}{\partial b}$

Now consider the partial derivatives of the error:

$\dfrac{\partial e(k)}{\partial w_{1,j}} = \dfrac{\partial\, [t(k) - ({}_1 w^T p(k) + b)]}{\partial w_{1,j}} = \dfrac{\partial}{\partial w_{1,j}}\!\left[ t(k) - \left( \sum_{i=1}^{R} w_{1,i}\, p_i(k) + b \right) \right]$

where $p_i(k)$ is the $i$-th element of the input vector at the $k$-th iteration. Therefore

$\dfrac{\partial e(k)}{\partial w_{1,j}} = -p_j(k), \qquad \dfrac{\partial e(k)}{\partial b} = -1$

and the gradient estimate can be written compactly as

$\hat{\nabla} F(x) = \nabla e^2(k) = -2 e(k)\, z(k)$

This approximation to $\nabla F(x)$, in which the mean square error is replaced by the squared error at iteration $k$, can now be used in the steepest descent algorithm:

$x(k+1) = x(k) - \alpha\, \hat{\nabla} F(x)\big|_{x = x(k)}$

LMS Algorithm

If we substitute $\hat{\nabla} F(x) = -2 e(k) z(k)$:

$x(k+1) = x(k) + 2 \alpha\, e(k)\, z(k)$

or, separating the weights and the bias,

${}_1 w(k+1) = {}_1 w(k) + 2 \alpha\, e(k)\, p(k)$

$b(k+1) = b(k) + 2 \alpha\, e(k)$

These last two equations make up the LMS algorithm, also called the delta rule or the Widrow-Hoff learning algorithm.
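The two update equations can be sketched as a training loop; the toy data and the names `lms_step` and `alpha` below are mine, assuming a single-neuron ADALINE with a bias:

```python
import numpy as np

def lms_step(w, b, p, t, alpha):
    # e(k) = t(k) - a(k);  w(k+1) = w(k) + 2*alpha*e(k)*p(k);
    # b(k+1) = b(k) + 2*alpha*e(k)
    e = t - (w @ p + b)
    return w + 2 * alpha * e * p, b + 2 * alpha * e

# Toy problem: targets generated by t = p1 - 2*p2 (exactly realizable)
rng = np.random.default_rng(0)
w, b = np.zeros(2), 0.0
for _ in range(2000):
    p = rng.uniform(-1.0, 1.0, size=2)
    w, b = lms_step(w, b, p, p[0] - 2.0 * p[1], alpha=0.05)
print(np.round(w, 3), round(b, 3))  # weights approach [1, -2], bias 0
```

Because the targets here are exactly realizable, the error is driven to zero and the weights settle at the minimum mean square error solution.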

Multiple-Neuron Case

${}_i w(k+1) = {}_i w(k) + 2 \alpha\, e_i(k)\, p(k)$

$b_i(k+1) = b_i(k) + 2 \alpha\, e_i(k)$

Matrix form:

$W(k+1) = W(k) + 2 \alpha\, e(k)\, p^T(k)$

$b(k+1) = b(k) + 2 \alpha\, e(k)$


Analysis of Convergence

Note that $x(k)$ is a function only of $z(k-1), z(k-2), \ldots, z(0)$. If we assume that successive input vectors are statistically independent, then $x(k)$ is independent of $z(k)$. We will show that for stationary input processes meeting this condition, the expected value of the weight vector converges to

$x^* = R^{-1} h$

the minimum mean square error solution, as we saw before.

Take the expectation of both sides of the LMS update:

$x(k+1) = x(k) + 2 \alpha\, e(k)\, z(k)$

$E[x(k+1)] = E[x(k)] + 2 \alpha\, E[e(k)\, z(k)]$

Substitute $t(k) - x^T(k) z(k)$ for the error:

$E[x(k+1)] = E[x(k)] + 2 \alpha \left( E[t(k) z(k)] - E[(x^T(k) z(k))\, z(k)] \right)$

Since $x^T(k) z(k) = z^T(k) x(k)$:

$E[x(k+1)] = E[x(k)] + 2 \alpha \left( E[t(k) z(k)] - E[(z(k) z^T(k))\, x(k)] \right)$

Since $x(k)$ is independent of $z(k)$:

$E[x(k+1)] = E[x(k)] + 2 \alpha \left( h - R\, E[x(k)] \right)$

$E[x(k+1)] = [I - 2 \alpha R]\, E[x(k)] + 2 \alpha h$

For stability, the eigenvalues of this matrix must fall inside the unit circle:

$\left| \mathrm{eig}(I - 2 \alpha R) \right| = \left| 1 - 2 \alpha \lambda_i \right| < 1$

(where $\lambda_i$ is an eigenvalue of $R$). Since the $\lambda_i$ are real and nonnegative, this reduces to $1 - 2 \alpha \lambda_i > -1$, i.e.

$\alpha < \dfrac{1}{\lambda_i} \quad \text{for all } i$

so the stability condition is

$0 < \alpha < \dfrac{1}{\lambda_{\max}}$

Note that in steepest descent we used the Hessian matrix $A$, while here we use the input correlation matrix $R$ (recall that $A = 2R$).
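This stability condition is easy to check numerically; the matrix `R` below is a small illustrative positive definite correlation matrix, not one from the chapter:

```python
import numpy as np

R = np.array([[1.0, 0.3],
              [0.3, 0.7]])            # illustrative correlation matrix
lam = np.linalg.eigvalsh(R)
alpha_max = 1.0 / lam.max()           # stability: 0 < alpha < 1/lambda_max
alpha = 0.9 * alpha_max               # any alpha below the bound
mags = np.abs(np.linalg.eigvals(np.eye(2) - 2 * alpha * R))
print(round(alpha_max, 4), mags.max() < 1.0)  # 0.8436 True
```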

$E[x(k+1)] = [I - 2 \alpha R]\, E[x(k)] + 2 \alpha h$

If the system is stable, then a steady-state condition will be reached:

$E[x_{ss}] = [I - 2 \alpha R]\, E[x_{ss}] + 2 \alpha h$

The solution to this equation is

$E[x_{ss}] = R^{-1} h = x^*$

Thus the LMS solution, obtained by applying one input at a time, is the same as the minimum mean square error solution $x^* = R^{-1} h$.

Example

Banana: $p_1 = \begin{bmatrix} -1 \\ 1 \\ -1 \end{bmatrix},\ t_1 = -1 \qquad$ Apple: $p_2 = \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix},\ t_2 = 1$

Assuming each input pattern occurs with probability 1/2 (and ignoring the bias for this example), the input correlation matrix is:

$R = E[p p^T] = \tfrac{1}{2}\, p_1 p_1^T + \tfrac{1}{2}\, p_2 p_2^T$

$R = \tfrac{1}{2} \begin{bmatrix} -1 \\ 1 \\ -1 \end{bmatrix} \begin{bmatrix} -1 & 1 & -1 \end{bmatrix} + \tfrac{1}{2} \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix} \begin{bmatrix} 1 & 1 & -1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & -1 \\ 0 & -1 & 1 \end{bmatrix}$

$\lambda_1 = 1.0, \qquad \lambda_2 = 0.0, \qquad \lambda_3 = 2.0$

$\alpha < \dfrac{1}{\lambda_{\max}} = \dfrac{1}{2.0} = 0.5$

(In practice $h$ and $R$ are not available, so stable learning rates cannot be computed this way; we choose them by trial and error. Here we take $\alpha = 0.2$.)
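A quick NumPy check of this correlation matrix and the resulting learning-rate bound:

```python
import numpy as np

p1 = np.array([-1.0, 1.0, -1.0])   # banana, t1 = -1
p2 = np.array([1.0, 1.0, -1.0])    # apple,  t2 = +1
R = 0.5 * np.outer(p1, p1) + 0.5 * np.outer(p2, p2)
lam = np.sort(np.linalg.eigvalsh(R))
print(lam)               # eigenvalues 0.0, 1.0, 2.0
print(1.0 / lam.max())   # 0.5 -> LMS is stable for alpha < 0.5
```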

Iteration One

Present the banana; $W(0)$ is selected arbitrarily (here, the zero vector):

$a(0) = W(0)\, p(0) = W(0)\, p_1 = \begin{bmatrix} 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} -1 \\ 1 \\ -1 \end{bmatrix} = 0$

$e(0) = t(0) - a(0) = t_1 - a(0) = -1 - 0 = -1$

$W(1) = W(0) + 2 \alpha\, e(0)\, p^T(0) = \begin{bmatrix} 0 & 0 & 0 \end{bmatrix} + 2(0.2)(-1) \begin{bmatrix} -1 & 1 & -1 \end{bmatrix} = \begin{bmatrix} 0.4 & -0.4 & 0.4 \end{bmatrix}$

Iteration Two

Present the apple:

$a(1) = W(1)\, p(1) = W(1)\, p_2 = \begin{bmatrix} 0.4 & -0.4 & 0.4 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix} = -0.4$

$e(1) = t(1) - a(1) = t_2 - a(1) = 1 - (-0.4) = 1.4$

$W(2) = W(1) + 2 \alpha\, e(1)\, p^T(1) = \begin{bmatrix} 0.4 & -0.4 & 0.4 \end{bmatrix} + 2(0.2)(1.4) \begin{bmatrix} 1 & 1 & -1 \end{bmatrix} = \begin{bmatrix} 0.96 & 0.16 & -0.16 \end{bmatrix}$

Iteration Three

Present the banana again:

$a(2) = W(2)\, p(2) = W(2)\, p_1 = \begin{bmatrix} 0.96 & 0.16 & -0.16 \end{bmatrix} \begin{bmatrix} -1 \\ 1 \\ -1 \end{bmatrix} = -0.64$

$e(2) = t(2) - a(2) = t_1 - a(2) = -1 - (-0.64) = -0.36$

If we continue in this way, the algorithm converges to

$W = \begin{bmatrix} 1 & 0 & 0 \end{bmatrix}$
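The three iterations above can be reproduced directly (replaying banana, apple, banana with alpha = 0.2):

```python
import numpy as np

alpha = 0.2
patterns = [np.array([-1.0, 1.0, -1.0]), np.array([1.0, 1.0, -1.0])]
targets = [-1.0, 1.0]                      # banana, apple
W = np.zeros(3)                            # W(0): zero vector
for k in range(3):                         # banana, apple, banana
    p, t = patterns[k % 2], targets[k % 2]
    e = t - W @ p                          # e(k) = t(k) - a(k)
    W = W + 2 * alpha * e * p              # LMS update
    print(k, round(e, 2), np.round(W, 3))
```

The printed weights match the hand computation: W(1) = [0.4, -0.4, 0.4], W(2) = [0.96, 0.16, -0.16], then W(3) = [1.104, 0.016, -0.016], on the way to [1, 0, 0].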

To summarize the learning process: computationally, learning goes through all the training examples (an epoch) a number of times, until a stopping criterion is reached. The convergence process can be monitored with a plot of the mean-squared-error function $F(W(k))$.

Typical stopping criteria are that the mean-squared error is sufficiently small, $F(W(k)) < \varepsilon$, or that the rate of change of the mean-squared error is sufficiently small.

Adaptive Filtering

ADALINE is one of the most widely used neural networks in practical applications. One of the major application areas has been adaptive filtering. An adaptive filter is obtained by combining an ADALINE with a tapped delay line at its input.

The output of the adaptive filter is

$a(k) = \mathrm{purelin}(Wp + b) = \sum_{i=1}^{R} w_{1,i}\, y(k - i + 1) + b$

In digital signal processing language, we recognize this network as a finite impulse response (FIR) filter.
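A sketch of this tapped-delay-line computation; the helper name `fir_output` and the two-tap weights are mine, and samples before the start of the sequence are taken as zero:

```python
import numpy as np

def fir_output(w, b, y):
    # a(k) = sum_{i=1..R} w_i * y(k - i + 1) + b, with earlier samples = 0.
    # With 0-based arrays this becomes a[k] = sum_i w[i] * y[k - i] + b.
    a = np.full(len(y), float(b))
    for k in range(len(y)):
        for i in range(len(w)):
            if k - i >= 0:
                a[k] += w[i] * y[k - i]
    return a

w = np.array([0.5, -0.25])          # two-tap FIR weights (illustrative)
y = np.array([1.0, 2.0, 3.0, 4.0])  # input sequence
print(fir_output(w, 0.0, y))        # values 0.5, 0.75, 1.0, 1.25
```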


In the noise-cancellation system considered next, a two-input adaptive filter can attenuate and phase-shift the noise in the desired way.

Correlation Matrix

To analyze this system we need to find the input correlation matrix $R$ and the input/target cross-correlation vector $h$:

$R = E[z z^T], \qquad h = E[t z]$

The input vector and the target are

$z(k) = \begin{bmatrix} v(k) \\ v(k-1) \end{bmatrix}, \qquad t(k) = s(k) + m(k)$

so

$R = \begin{bmatrix} E[v^2(k)] & E[v(k)\, v(k-1)] \\ E[v(k-1)\, v(k)] & E[v^2(k-1)] \end{bmatrix}$

$h = \begin{bmatrix} E[(s(k) + m(k))\, v(k)] \\ E[(s(k) + m(k))\, v(k-1)] \end{bmatrix}$

Now we must define the signal $s$, the noise $v$, and the filtered noise $m$ to be able to obtain specific values. We assume: the EEG signal $s$ is a white (uncorrelated from one time step to the next) random signal uniformly distributed between $-0.2$ and $+0.2$; the noise source (a 60 Hz sine wave sampled at 180 Hz) is given by

$v(k) = 1.2 \sin\!\left( \dfrac{2 \pi k}{3} \right)$

and the filtered noise that contaminates the EEG is the noise attenuated by a factor of 1.0 and shifted in phase by $-3\pi/4$:

$m(k) = 1.2 \sin\!\left( \dfrac{2 \pi k}{3} - \dfrac{3\pi}{4} \right)$

Now calculate the elements of $R$. Since $v(k)$ is periodic with period 3, each expectation is an average over $k = 1, 2, 3$:

$E[v^2(k)] = \dfrac{1}{3} \sum_{k=1}^{3} \left( 1.2 \sin\!\dfrac{2\pi k}{3} \right)^2 = (1.2)^2 (0.5) = 0.72$

$E[v^2(k-1)] = E[v^2(k)] = 0.72$

$E[v(k)\, v(k-1)] = \dfrac{1}{3} \sum_{k=1}^{3} \left( 1.2 \sin\!\dfrac{2\pi k}{3} \right) \left( 1.2 \sin\!\dfrac{2\pi (k-1)}{3} \right) = (1.2)^2 (0.5) \cos\!\dfrac{2\pi}{3} = -0.36$

$R = \begin{bmatrix} 0.72 & -0.36 \\ -0.36 & 0.72 \end{bmatrix}$
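Since the expectations are just averages over one period, this correlation matrix is easy to verify numerically:

```python
import numpy as np

k = np.arange(1, 4)                              # one period: k = 1, 2, 3
v = 1.2 * np.sin(2 * np.pi * k / 3)              # noise at time k
v_prev = 1.2 * np.sin(2 * np.pi * (k - 1) / 3)   # noise at time k-1
R = np.array([[np.mean(v * v),      np.mean(v * v_prev)],
              [np.mean(v_prev * v), np.mean(v_prev * v_prev)]])
print(np.round(R, 2))   # [[ 0.72 -0.36]
                        #  [-0.36  0.72]]
```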

Stationary Point

To find $h$:

$E[(s(k) + m(k))\, v(k)] = E[s(k)\, v(k)] + E[m(k)\, v(k)]$

The first term on the right is zero because $s(k)$ and $v(k)$ are independent and zero mean. The second term:

$E[m(k)\, v(k)] = \dfrac{1}{3} \sum_{k=1}^{3} \left( 1.2 \sin\!\left( \dfrac{2\pi k}{3} - \dfrac{3\pi}{4} \right) \right) \left( 1.2 \sin\!\dfrac{2\pi k}{3} \right) = -0.51$

Similarly,

$E[(s(k) + m(k))\, v(k-1)] = E[s(k)\, v(k-1)] + E[m(k)\, v(k-1)]$

Again the first term is zero, and

$E[m(k)\, v(k-1)] = \dfrac{1}{3} \sum_{k=1}^{3} \left( 1.2 \sin\!\left( \dfrac{2\pi k}{3} - \dfrac{3\pi}{4} \right) \right) \left( 1.2 \sin\!\dfrac{2\pi (k-1)}{3} \right) = 0.70$

Thus

$h = \begin{bmatrix} -0.51 \\ 0.70 \end{bmatrix}$

and the minimum mean square error solution is

$x^* = R^{-1} h = \begin{bmatrix} 0.72 & -0.36 \\ -0.36 & 0.72 \end{bmatrix}^{-1} \begin{bmatrix} -0.51 \\ 0.70 \end{bmatrix} = \begin{bmatrix} -0.30 \\ 0.82 \end{bmatrix}$

What kind of error do we have at this minimum solution?

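The cross-correlation vector and the resulting minimum point can be verified the same way; since $s$ is zero mean and independent of $v$, only the $m$ terms contribute:

```python
import numpy as np

k = np.arange(1, 4)
v = 1.2 * np.sin(2 * np.pi * k / 3)
v_prev = 1.2 * np.sin(2 * np.pi * (k - 1) / 3)
m = 1.2 * np.sin(2 * np.pi * k / 3 - 3 * np.pi / 4)   # filtered noise
h = np.array([np.mean(m * v), np.mean(m * v_prev)])
R = np.array([[0.72, -0.36], [-0.36, 0.72]])
x_star = np.linalg.solve(R, h)
print(np.round(h, 2))        # [-0.51  0.7 ]
print(np.round(x_star, 2))   # [-0.3   0.82]
```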

Performance Index

$F(x) = c - 2 x^T h + x^T R x$

$c = E[t^2(k)] = E[(s(k) + m(k))^2]$

$c = E[s^2(k)] + 2 E[s(k)\, m(k)] + E[m^2(k)]$

The middle term is zero because $s(k)$ and $m(k)$ are independent and zero mean.

$E[s^2(k)] = \dfrac{1}{0.4} \int_{-0.2}^{0.2} s^2\, ds = \dfrac{1}{3 (0.4)}\, s^3 \Big|_{-0.2}^{0.2} = 0.0133$

$E[m^2(k)] = \dfrac{1}{3} \sum_{k=1}^{3} \left( 1.2 \sin\!\left( \dfrac{2\pi k}{3} - \dfrac{3\pi}{4} \right) \right)^2 = 0.72$

$c = 0.0133 + 0.72 = 0.7333$

Substituting $x^*$, $h$ and $R$ gives the minimum mean square error $F(x^*) = c - 2 x^{*T} h + x^{*T} R x^* = 0.0133$. The minimum mean square error is the same as the mean square value of the EEG signal. This is what we expected, since the error of this adaptive noise canceller is in fact the reconstructed EEG signal.

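The pieces of the performance index can be checked numerically as well (recomputing $h$ exactly rather than using the rounded values):

```python
import numpy as np

k = np.arange(1, 4)
v = 1.2 * np.sin(2 * np.pi * k / 3)
v_prev = 1.2 * np.sin(2 * np.pi * (k - 1) / 3)
m = 1.2 * np.sin(2 * np.pi * k / 3 - 3 * np.pi / 4)
E_s2 = (0.2 ** 3 - (-0.2) ** 3) / (3 * 0.4)    # s uniform on [-0.2, 0.2]
E_m2 = np.mean(m * m)
c = E_s2 + E_m2
R = np.array([[np.mean(v * v),      np.mean(v * v_prev)],
              [np.mean(v_prev * v), np.mean(v_prev * v_prev)]])
h = np.array([np.mean(m * v), np.mean(m * v_prev)])
x_star = np.linalg.solve(R, h)
F_min = c - 2 * x_star @ h + x_star @ R @ x_star
print(round(c, 4), round(F_min, 4))   # 0.7333 0.0133
```

`F_min` comes out equal to the mean square value of the EEG signal, confirming that the residual error at the minimum is exactly the EEG signal itself.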

In the weight plane $(w_{1,1}, w_{1,2})$, the trajectory of LMS can be compared with that of steepest descent. The contours of $F(x)$ reflect the fact that the eigenvalues and eigenvectors of the Hessian matrix $A = 2R$ are

$\lambda_1 = 2.16, \; z_1 = \begin{bmatrix} 0.7071 \\ -0.7071 \end{bmatrix}, \qquad \lambda_2 = 0.72, \; z_2 = \begin{bmatrix} 0.7071 \\ 0.7071 \end{bmatrix}$

If the learning rate is decreased, the trajectory is smoother, but the learning proceeds more slowly. Note that $\alpha_{\max} = 2/2.16 = 0.926$ for stability.

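The eigenvalues of the Hessian can be confirmed from $A = 2R$, which also reproduces the stability limit:

```python
import numpy as np

R = np.array([[0.72, -0.36], [-0.36, 0.72]])
A = 2 * R                          # Hessian of the mean square error
lam, Z = np.linalg.eigh(A)         # eigenvalues in ascending order
print(np.round(lam, 2))            # [0.72 2.16]
print(round(2 / lam.max(), 3))     # 0.926
```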

The LMS trajectory is jittery because the algorithm is approximate steepest descent; it uses an estimate of the gradient, not the true gradient. (Demonstration: nnd10eeg)

Echo Cancellation

Another major practical application of the adaptive filter is echo cancellation on long-distance telephone lines, as mentioned earlier.

HW

Ch 4: E 2, 4, 6, 7

Ch 5: 5, 7, 9

Ch 6: 4, 5, 8, 10

Ch 7: 1, 5, 6, 7

Ch 8: 2, 4, 5

Ch 9: 2, 5, 6

Ch 10: 3, 6, 7
