
3. Least Mean Square (LMS) Algorithm

3.1 Spatial Filtering

Uses a single linear neuron and can be understood as adaptive filtering:

    output:  y = ∑k wk xk,  for k = 1 to p
    error:   e = d − y,  where d = desired value

cost function = mean squared error:  J = E[½ e²], estimated by the instantaneous value ½ e²(n)

[Figure: single linear neuron with inputs x1 … xp, bias weight w0 = θ, weights w1 … wp, summing junction ∑ and output y]

3.2 Steepest descent

Setting ∂J/∂wk = 0 determines the optimum weights. Instead, adjust the weights iteratively and move along the error surface towards the optimum value:

    wk(n+1) = wk(n) − η (∂J(n)/∂wk)

i.e. the updated value is proportional to the negative of the gradient of the error surface. Using the instantaneous estimate ½ e²(n) for J gives the LMS update:

    ∴ wk(n+1) = wk(n) + η e(n) xk(n)
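The iterative update above can be sketched in NumPy. The input statistics, the "true" filter weights being identified, and the learning rate below are illustrative assumptions, not values from the notes:

```python
import numpy as np

rng = np.random.default_rng(0)

p = 3
w_true = np.array([0.5, -0.3, 0.2])   # hypothetical system to be identified
w = np.zeros(p)                        # adaptive weights, start at zero
eta = 0.05                             # learning rate

for n in range(5000):
    x = rng.standard_normal(p)         # input vector x(n)
    d = w_true @ x                     # desired response d(n)
    y = w @ x                          # neuron output y = sum_k w_k x_k
    e = d - y                          # error e(n) = d(n) - y(n)
    w += eta * e * x                   # LMS: w_k(n+1) = w_k(n) + eta e(n) x_k(n)

print(np.round(w, 3))                  # converges towards w_true
```

Because the gradient is estimated from a single sample, each step is noisy, but on average the weights descend the error surface.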

[Figure: error surface J against a single weight w0, showing the gradient ∂J/∂w and descent towards the minimum Jmin]

Properties of LMS:
• a stochastic gradient algorithm, in that the gradient vector is 'random', in contrast to steepest descent
• on average, accuracy improves for increasing values of n
• reduces the storage requirement to the information present in its current set of weights, and can operate in a nonstationary environment
• faster convergence is usually obtained by making η a function of n, for example η(n) = c/n for some constant c

Convergence (proof not given):
• convergence in the mean (weight vector → optimum value as n → ∞) requires:

      0 < η < 2/λmax

  where λmax is the max eigenvalue of the autocorrelation matrix Rx = E[x xᵀ]
• convergence in the mean square (mean-square of the error signal → constant as n → ∞) requires:

      0 < η < 2/tr[Rx],  where tr[Rx] = ∑k λk ≥ λmax
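The two step-size bounds can be checked numerically. The 2×2 input covariance below is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(1)

Rx_true = np.array([[1.0, 0.5],
                    [0.5, 1.0]])                 # assumed input covariance
X = rng.multivariate_normal(np.zeros(2), Rx_true, size=100_000)

Rx = X.T @ X / len(X)                            # sample estimate of Rx = E[x x^T]
lams = np.linalg.eigvalsh(Rx)                    # eigenvalues lambda_k of Rx

bound_mean = 2.0 / lams.max()                    # in the mean:        0 < eta < 2/lambda_max
bound_meansq = 2.0 / np.trace(Rx)                # in the mean square: 0 < eta < 2/tr[Rx]

# tr[Rx] = sum_k lambda_k >= lambda_max, so the mean-square bound is the tighter one
print(round(bound_mean, 3), round(bound_meansq, 3))
```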

4. Multilayer Feedforward Perceptron Training

4.1 Back-propagation Algorithm

Let wji be the weight connected from neuron i to neuron j, with yi the outputs of the previous layer:

    error signal:      ej(n) = dj(n) − yj(n)
    net internal sum:  υj(n) = ∑i wji(n) yi(n),  for i = 0 to p
    output:            yj(n) = ϕj(υj(n))

Instantaneous sum of squared errors:  ℰ(n) = ½ ∑j ej²(n),  over all j in the output layer

[Figure: neuron j with inputs yi from the previous layer, weights wji, activation ϕ(•), and output yj compared with the desired dj to give ej]

• The learning goal is to minimise ℰav by adjusting the weights. For N patterns the average squared error is

      ℰav = (1/N) ∑n ℰ(n),  for n = 1 to N

  but instead the estimate ℰ(n) is used on a pattern-by-pattern basis (steepest descent).

∴ weight correction:  Δwji(n) = −η ∂ℰ(n)/∂wji(n)

From the chain rule:

    ∂ℰ(n)/∂wji(n) = [∂ℰ(n)/∂ej(n)] [∂ej(n)/∂yj(n)] [∂yj(n)/∂υj(n)] [∂υj(n)/∂wji(n)]

weight correction = (learning rate) × (local gradient) × (input signal to neuron):

    Δwji(n) = η δj(n) yi(n),  where δj(n) = −∂ℰ(n)/∂υj(n)

Case 1: output node — the local gradient is easily calculated:

    δj(n) = ej(n) ϕj′(υj(n))

Case 2: hidden node — more complex; need to consider neuron j feeding each neuron k of the next layer, where the inputs to neuron j are yi:

    δj(n) = −[∂ℰ(n)/∂yj(n)] ϕj′(υj(n)) = −∑k ek(n) [∂ek(n)/∂yj(n)] ϕj′(υj(n))

    ∴ δj(n) = −ϕj′(υj(n)) ∑k ek(n) [∂ek(n)/∂υk(n)] [∂υk(n)/∂yj(n)] = ϕj′(υj(n)) ∑k δk(n) wkj(n)

• Thus δj(n) is computed in terms of the δk(n) of the layer closer to the output. After calculating the network output in a forward pass, the error is computed and recursively back-propagated through the network in a backward pass.
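The delta rules above can be verified with a finite-difference check on a tiny 2-2-1 logistic network. The weights are random and illustrative, with biases folded in as a constant input y0 = 1:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

rng = np.random.default_rng(0)
W1 = rng.standard_normal((2, 3))   # hidden weights w_ji (column 0 = bias)
W2 = rng.standard_normal((1, 3))   # output weights w_kj (column 0 = bias)

def forward(x):
    y0 = np.concatenate(([1.0], x))                 # inputs with bias term
    y1 = np.concatenate(([1.0], sigmoid(W1 @ y0)))  # hidden outputs with bias unit
    y2 = sigmoid(W2 @ y1)
    return y0, y1, y2

x, d = np.array([0.7, -0.2]), np.array([1.0])
y0, y1, y2 = forward(x)
e = d - y2                                          # e_j(n) = d_j(n) - y_j(n)

delta2 = e * y2 * (1 - y2)                          # output node: delta_j = e_j phi'(v_j)
delta1 = y1[1:] * (1 - y1[1:]) * (W2[:, 1:].T @ delta2)  # hidden: phi' * sum_k delta_k w_kj

grad_W1 = -np.outer(delta1, y0)                     # dE/dw_ji = -delta_j y_i, with E = 0.5 e^2

def loss():
    return 0.5 * float(((d - forward(x)[2]) ** 2).sum())

# central finite difference on one hidden weight
eps, (j, i) = 1e-6, (1, 2)
W1[j, i] += eps; lp = loss()
W1[j, i] -= 2 * eps; lm = loss()
W1[j, i] += eps
numeric = (lp - lm) / (2 * eps)
assert abs(numeric - grad_W1[j, i]) < 1e-6          # analytic and numeric gradients agree
```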

4.2 Back-propagation training

Activation function (logistic):

    yj(n) = ϕj(υj(n)) = 1 / (1 + exp(−υj(n)))

    ϕj′(υj(n)) = ∂yj(n)/∂υj(n) = exp(−υj(n)) / [1 + exp(−υj(n))]² = yj(n) [1 − yj(n)]

Note that the max value of ϕj′(υj(n)) occurs at yj(n) = 0.5 and the min value of 0 occurs at yj(n) = 0 or 1.

Momentum term:

    Δwji(n) = η δj(n) yi(n) + α Δwji(n−1),  0 ≤ |α| < 1

helps locate a more desirable local minimum in a complex error surface:
• no change in error sign ⇒ Δwji(n) increases and descent is accelerated
• changes in error sign ⇒ Δwji(n) decreases, stabilising oscillations
• a large enough α can stop the process terminating in shallow local minima
• with momentum, η can be larger

[Figure: example error surface for a single weight]
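Two quick numerical checks of the statements above; the values of η, α and the constant gradient term are illustrative:

```python
import numpy as np

def phi(v):
    return 1.0 / (1.0 + np.exp(-v))

v = np.linspace(-6, 6, 1001)
y = phi(v)
dphi = np.exp(-v) / (1 + np.exp(-v)) ** 2
assert np.allclose(dphi, y * (1 - y))      # phi'(v) = y(1 - y)
assert np.isclose(dphi.max(), 0.25)        # maximum at y = 0.5 (v = 0)

# momentum: delta_w(n) = eta delta y + alpha delta_w(n-1); with a constant
# gradient term g the step grows geometrically towards eta*g/(1 - alpha)
eta, alpha, g = 0.1, 0.9, 1.0
dw = 0.0
for _ in range(200):
    dw = eta * g + alpha * dw
print(round(dw, 4))                        # close to eta*g/(1 - alpha) = 1.0
```

This is the sense in which a consistent error sign accelerates descent: the effective step is up to 1/(1 − α) times larger than without momentum.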

4.3 Other perspectives for improving generalisation

4.3.1 Pattern vs Batch Mode
Choice depends on the particular problem:
• randomly updating weights after each pattern requires very little storage and leads to a stochastic search which is less likely to get stuck in local minima
• updating after presentation of all training samples (an epoch) provides a more accurate estimate of the gradient vector, since it is based on the average squared error ℰav

4.3.2 Stopping criteria
e.g. gradient vector threshold and/or change in average squared error per epoch

4.3.3 Initialisation
• the default is a uniform distribution inside a small range of values
• values that are too large can lead to premature saturation (neuron outputs close to their limits), which gives small weight adjustments even though the error is large

4.3.4 Training Set Size
Worst-case formula: N > W/ε, where N = no. of examples, W = no. of synaptic weights, ε = fraction of errors permitted on test

4.3.5 Cross-Validation
• measures generalisation on a test set
• various parameters, including the no. of hidden nodes, learning rate and training set size, can be set based on cross-validation performance
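As a quick illustration of the worst-case formula, consider a hypothetical 2-10-1 network; the layer sizes and ε below are assumptions chosen for the example:

```python
inputs, hidden, outputs = 2, 10, 1
W = (inputs + 1) * hidden + (hidden + 1) * outputs   # synaptic weights incl. bias weights
eps = 0.1                                            # 10% errors permitted on test
N_min = W / eps
print(W, N_min)                                      # 41 weights -> need N > 410 examples
```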

4.3.6 Network Pruning by complexity regularisation
(two possibilities: network growing and network pruning)
The goal is to find the weight vector that minimises

    R(w) = ℰs(w) + λ ℰc(w)

where ℰs(w) is a standard error measure, e.g. mean square error; λ is the regularisation parameter; and ℰc(w) is a complexity penalty that depends on the network, e.g. ||w||²
• the regularisation term allows identification of weights having insignificant effect

4.3.7 Other ways of minimising the cost function
• back-propagation uses a relatively simple, quick approach to minimising the cost function by obtaining an instantaneous estimate of the gradient
• methods and techniques from nonlinear optimum filtering and nonlinear function optimisation have been used to provide more sophisticated approaches to minimising the cost function, e.g. Kalman filtering, the conjugate-gradient method

4.4 Universal Approximation Theorem
A single hidden layer with suitable ϕ gets arbitrarily close to any continuous function:
• the logistic function satisfies the ϕ(⋅) definition
• a single hidden layer is sufficient, but the theorem gives no clue on synthesis
• a single hidden layer is restrictive in that hierarchical features are not supported
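A minimal sketch of complexity regularisation with ℰc(w) = ||w||², on hypothetical data where only two of four weights matter. For this quadratic R(w) the minimiser has a ridge-regression closed form, which is an assumption of the sketch rather than anything prescribed by the notes:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((50, 4))
w_sparse = np.array([1.0, 0.0, 0.0, -2.0])     # only weights 0 and 3 matter
d = X @ w_sparse

lam = 0.1                                      # regularisation parameter lambda

def R(w):
    # R(w) = Es(w) + lam * c(w): mean square error plus ||w||^2 penalty
    return np.mean((d - X @ w) ** 2) + lam * (w @ w)

n = len(X)
w_reg = np.linalg.solve(X.T @ X / n + lam * np.eye(4), X.T @ d / n)

# the insignificant weights shrink towards zero -> candidates for pruning
print(np.round(w_reg, 2))
```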

4.5 Example of learning the XOR Problem

    x1  x2 | a  b | target
    0   0  | 0  0 |   0
    0   1  | 0  1 |   1
    1   0  | 0  1 |   1
    1   1  | 1  1 |   0

Network: hidden neuron a has inputs x1, x2 with weights 1, 1 and threshold 1.5; hidden neuron b has inputs x1, x2 with weights 1, 1 and threshold 0.5; output neuron c takes a and b with weights −2, 1 and threshold 0.5.

[Figure: decision boundaries in the (x1, x2) plane — neuron a gives out = 1 above the line x1 + x2 = 1.5, neuron b gives out = 1 above x1 + x2 = 0.5, and neuron c combines them so that out = 1 only between the two lines]
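The weights and thresholds above can be checked directly with threshold (McCulloch-Pitts) units:

```python
def step(v, theta):
    # threshold unit: output 1 when the weighted sum reaches the threshold
    return 1 if v >= theta else 0

def xor_net(x1, x2):
    a = step(x1 + x2, 1.5)        # neuron a: weights (1, 1), threshold 1.5
    b = step(x1 + x2, 0.5)        # neuron b: weights (1, 1), threshold 0.5
    return step(-2 * a + b, 0.5)  # neuron c: weights (-2, 1), threshold 0.5

table = [(x1, x2, xor_net(x1, x2)) for x1 in (0, 1) for x2 in (0, 1)]
print(table)                      # outputs 0, 1, 1, 0 as in the truth table
```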

4.6 Example: vehicle navigation

[Figure: network for steering a vehicle — video input retina, fully connected to 9 hidden units, fully connected to 45 output units ranging from sharp left to sharp right]

• the network computes the steering angle from the video input
• training examples come from a human driver
• obstacles are detected by a laser range finder

5. Associative Memories

5.1 Linear associative memory

A stimulus ak = [ak1 ak2 … akp]ᵀ produces a response bk = [bk1 bk2 … bkp]ᵀ through a weight matrix:

    bk = W(k) ak,  where W(k) = [wij(k)] is the p × p weight matrix

Design of the weight matrix for storing q pattern associations ak → bk:

    estimate of weight matrix  Ŵ = ∑k bk akᵀ,  for k = 1 to q  (Hebbian learning principle)

where bk akᵀ is the outer product of response and stimulus.

Pattern recall: for recall with a stimulus pattern aj:

    b = Ŵ aj = ∑k (akᵀ aj) bk,  for k = 1 to q

Assuming the key patterns have been normalised, akᵀak = 1, so

    b = bj + vj,  where vj = ∑k≠j (akᵀ aj) bk

i.e. vj results from interference from all the other stimulus patterns
∴ (akᵀ aj) = 0 for j ≠ k (orthonormal patterns) → perfect recall

Main features:
• distributed memory
• auto- and hetero-associative
• content addressable and resistant to noise and damage
• interaction between stored patterns may lead to error on recall

The max no. of patterns reliably stored is p, the dimension of the input space, which is also the rank (no. of independent columns or rows) of W. For an auto-associative memory, ideally W ak = ak, showing that the stimulus patterns are eigenvectors of W with all unity eigenvalues.

Example: a1 = [1 0 0 0]ᵀ, a2 = [0 1 0 0]ᵀ, a3 = [0 0 1 0]ᵀ
b1 = [5 1 0]ᵀ, b2 = [−2 1 6]ᵀ, b3 = [−2 4 3]ᵀ

    memory weight matrix =  [ 5  −2  −2  0 ]
                            [ 1   1   4  0 ]
                            [ 0   6   3  0 ]

giving perfect recall, since the stimulus patterns are orthonormal. A noisy stimulus, e.g. [0.8 −0.15 0.15 −0.2]ᵀ, gives [4 1.25 −0.45]ᵀ, which is closer to b1 than to b2 or b3.
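The outer-product construction and the recall behaviour, including the noisy-stimulus case, can be reproduced with NumPy:

```python
import numpy as np

A = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0],
              [0, 0, 1, 0]], dtype=float)   # key patterns a1, a2, a3 (rows)
B = np.array([[5, 1, 0],
              [-2, 1, 6],
              [-2, 4, 3]], dtype=float)     # responses b1, b2, b3 (rows)

# Hebbian estimate: W = sum_k b_k a_k^T
W = sum(np.outer(b, a) for a, b in zip(A, B))

for a, b in zip(A, B):
    assert np.allclose(W @ a, b)            # perfect recall (orthonormal keys)

x = np.array([0.8, -0.15, 0.15, -0.2])      # noisy version of a1
r = W @ x
print(np.round(r, 2))                       # approximately [4, 1.25, -0.45]
dists = [np.linalg.norm(r - b) for b in B]
assert dists.index(min(dists)) == 0         # still closest to b1
```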

6. Radial Basis Functions

6.1 Separability of patterns

The separability theorem (Cover) states that if the mapping ϕ(x) is nonlinear and the hidden-unit space is of high dimension relative to the input space, then patterns are more likely to be separable.

An example of an RBF is a Gaussian:  ϕ(x) = exp(−||x − t||²),  where t = centre of the Gaussian.

[Figure: RBF network — inputs x1 … xp feeding hidden units ϕ1 … ϕp, with a linear output neuron with weights w0, w1, …, wp]

• the output neuron is a linear weighted sum
• ϕ(x) is nonlinear, and the hidden-unit space [ϕ1(x), ϕ2(x), …, ϕp(x)] is usually of high dimension relative to the input space, making patterns more likely to be separable
• a difficult nonlinear optimisation problem has been converted to a linear optimisation problem that can be solved by the LMS algorithm
• if a different RBF is centred on each training pattern, then the training set can be learned perfectly

6.2 Example: XOR

Use two hidden Gaussian functions:

    ϕ1(x) = exp(−||x − t1||²),  t1 = [1, 1]ᵀ
    ϕ2(x) = exp(−||x − t2||²),  t2 = [0, 0]ᵀ

    x1  x2 | ϕ1(x)  ϕ2(x)
    0   0  |  e⁻²    1
    0   1  |  e⁻¹    e⁻¹
    1   0  |  e⁻¹    e⁻¹
    1   1  |  1      e⁻²

[Figure: patterns plotted in the (ϕ1, ϕ2) plane — (0,0) and (1,1) lie on one side of a linear decision boundary, while (0,1) and (1,0) map to the same point on the other side]
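Computing the hidden-unit outputs confirms the table and the new separability:

```python
import numpy as np

t1, t2 = np.array([1.0, 1.0]), np.array([0.0, 0.0])

def phi(x, t):
    # Gaussian RBF centred at t
    return np.exp(-np.linalg.norm(np.asarray(x, float) - t) ** 2)

patterns = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = [0, 1, 1, 0]
feats = [(phi(x, t1), phi(x, t2)) for x in patterns]

for x, f in zip(patterns, feats):
    print(x, np.round(f, 4))

# (0,1) and (1,0) collapse onto the same point (e^-1, e^-1); in (phi1, phi2)
# space the line phi1 + phi2 = 1 now separates the two classes
for f, tgt in zip(feats, targets):
    assert (f[0] + f[1] < 1.0) == (tgt == 1)
```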

6.3 Ill-posed Hypersurface Reconstruction

The inverse problem of finding an unknown mapping F from domain X to range Y is well-posed if:
1. for every x ∈ X there exists y ∈ Y (existence)
2. F(x) = F(t) iff x = t, for every pair of inputs x, t ∈ X (uniqueness)
3. the mapping is continuous (continuity)

Learning is ill-posed because of the sparsity of information and the noise in the training set.

Regularisation theory for solving ill-posed problems (Tikhonov) uses a modified cost functional that includes a complexity term:

    ℰ(F) = ℰs(F) + λ ℰc(F)

where ℰs(F) is the standard error term and ℰc(F) is the regularising term.

One regularised solution is given by a linear superposition of multivariate Gaussian basis functions with centres xi and widths σi:

    F(x) = ∑i wi exp(−||x − xi||² / (2σi²)),  for i = 1 to N

Practical ways of regularising:
• reduce the number of RBFs
• change the σ of the RBFs

6.4 RBF Networks vs MLP
• single vs possibly multiple hidden layers
• common computation nodes vs fundamentally different hidden and output layers
• all layers usually nonlinear vs nonlinear hidden layer with linear output
• computation of the inner product of input vector and weight vector vs the Euclidean norm between the input vector and the centre of the appropriate unit
• global approximation, and therefore good at extrapolation, vs local approximation with fast learning but poor extrapolation

6.5 Learning Strategies

There is a variety of possibilities, since a nonlinear optimisation strategy for the hidden layer is combined with a linear optimisation strategy in the output layer. For the hidden layer the main choice involves how the centres are learned:
• Fixed centres selected at random, e.g. choose the Gaussian exp(−(M/d²) ||x − ti||²), where M = no. of centres and d = distance between them
• Self-organised selection of centres, e.g. k-nearest-neighbour or a self-organising NN
• Supervised selection of centres: choose the position of the centres with error-correction learning, using a suitable cost function and modified gradient descent

6.6 Example: curve fitting

RBF network for approximating (x−2)(2x+1)(1+x²)⁻¹ from 15 noise-free examples, using 15 Gaussian hidden units with the same σ. Three designs are generated, for σ = 0.5, σ = 1.0 and σ = 1.5; the output is shown for 200 inputs uniformly sampled in the range [−8, 12]. The best compromise is σ = 1.0.

[Figure: the three fitted curves plotted against the sampled target function, one per value of σ]
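A sketch of the experiment; the notes only say 15 examples over the plotted range, so the evenly spaced training abscissae below are an assumption:

```python
import numpy as np

def f(x):
    return (x - 2) * (2 * x + 1) / (1 + x ** 2)

x_train = np.linspace(-8, 12, 15)              # assumed evenly spaced examples
d = f(x_train)                                 # noise-free targets
sigma = 1.0                                    # the reported best compromise

def design(x, centres, sigma):
    # Phi[i, j] = exp(-(x_i - t_j)^2 / (2 sigma^2)), one Gaussian per centre
    return np.exp(-(x[:, None] - centres[None, :]) ** 2 / (2 * sigma ** 2))

Phi = design(x_train, x_train, sigma)          # one RBF centred on each example
w = np.linalg.solve(Phi, d)                    # exact interpolation of the 15 points

x_test = np.linspace(-8, 12, 200)              # 200 uniformly sampled inputs
F = design(x_test, x_train, sigma) @ w
err = float(np.max(np.abs(F - f(x_test))))
print(round(err, 3))
```

Repeating the fit with σ = 0.5 and σ = 1.5 reproduces the comparison between the three designs.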