
The error correction learning procedure is simple enough in conception. The procedure is as follows: during training an input is put into the network and flows through the network, generating a set of values on the output units. Then the actual output is compared with the desired target, and a match is computed. If the output and target match, no change is made to the net. However, if the output differs from the target, a change must be made to some of the connections.

Let's first recall the definition of the derivative of single-variable functions.

Definition 1. The derivative of f at (an interior point of its domain) x, denoted by f'(x), is defined by

    f'(x) = lim_{xn → x} (f(x) − f(xn)) / (x − xn)

Let us consider a differentiable function f : R → R.

• If f'(x) > 0 then we say that f is increasing at x.
• If f'(x) < 0 then we say that f is decreasing at x.
• If f'(x) = 0 then f can have a local maximum, minimum or inflection point at x.
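Definition 1 can be checked numerically with a difference quotient; a minimal Python sketch (the function f(x) = x² and the point x = 3 are illustrative choices, not from the notes):

```python
# Approximate f'(x) by the difference quotient (f(x) - f(xn)) / (x - xn)
# with xn approaching x, as in Definition 1.

def f(x):
    return x ** 2  # example function; its derivative is f'(x) = 2x

def difference_quotient(f, x, xn):
    return (f(x) - f(xn)) / (x - xn)

x = 3.0
for h in (1e-1, 1e-3, 1e-6):
    xn = x - h  # xn -> x as h -> 0
    print(difference_quotient(f, x, xn))  # approaches f'(3) = 6
```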

[Figure: derivative of the function f]

A differentiable function is always increasing in the direction of its derivative and decreasing in the opposite direction. It means that if we want to find one of the local minima of a function f starting from a point x0, then we should search for a second candidate:

• in the right-hand side of x0 if f'(x0) < 0, i.e. when f is decreasing at x0;
• in the left-hand side of x0 if f'(x0) > 0, i.e. when f is increasing at x0.

The equation of the tangent line crossing the point (x0, f(x0)) is given by

    (y − f(x0)) / (x − x0) = f'(x0)

that is,

    y = f(x0) + (x − x0) f'(x0)

The next approximation, denoted by x1, is a solution to the equation

    f(x0) + (x − x0) f'(x0) = 0

which is

    x1 = x0 − f(x0) / f'(x0)

This idea can be applied successively, that is,

    xn+1 = xn − f(xn) / f'(xn)
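The successive approximation scheme above is the classical Newton iteration for solving f(x) = 0; a minimal Python sketch (the choice f(x) = x² − 2 and the starting point x0 = 1 are illustrative, not from the notes):

```python
# Iterate x_{n+1} = x_n - f(x_n) / f'(x_n) to solve f(x) = 0.

def f(x):
    return x ** 2 - 2   # root at sqrt(2)

def fprime(x):
    return 2 * x        # derivative of f

x = 1.0                 # starting point x0
for _ in range(10):
    x = x - f(x) / fprime(x)

print(x)  # converges to sqrt(2) = 1.41421356...
```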

The above procedure is a typical descent method.

[Figure: the downhill direction is negative at x0]

Let f : R^n → R be a real-valued function. In a descent method, whatever is the next iteration wn+1, it should satisfy the property

    f(wn+1) < f(wn)

i.e. the value of f at wn+1 is smaller than its value at the previous approximation wn. In the error correction learning procedure, each iteration of a descent method calculates the downhill direction (the opposite of the direction of the derivative) at wn, which means that for a sufficiently small η > 0 the inequality

    f(wn − η f'(wn)) < f(wn)

should hold, and we let wn+1 be the vector

    wn+1 = wn − η f'(wn)

Let e ∈ R^n with ‖e‖ = 1 be a given direction. The derivative of f with respect to e at w is defined as

    ∂_e f(w) = lim_{t→+0} (f(w + te) − f(w)) / t

If e = (0, …, 1, …, 0)^T, i.e. e is the i-th basic direction, then instead of ∂_e f(w)

we write ∂_i f(w), which is defined by

    ∂_i f(w) = lim_{t→+0} (f(w1, …, wi + t, …, wn) − f(w1, …, wi, …, wn)) / t

[Figure: the derivative of f with respect to the direction e]

The gradient of f at w, denoted by f'(w), is defined by

    f'(w) = (∂_1 f(w), …, ∂_n f(w))^T

Example 1. Let f(w1, w2) = w1^2 + w2^2; then the gradient of f is given by

    f'(w) = 2w = (2w1, 2w2)^T

The gradient vector always points to the uphill direction of f. The downhill (steepest descent) direction of f at w is the opposite of the uphill direction, which is

    −f'(w) = (−∂_1 f(w), …, −∂_n f(w))^T

Definition 2. A linear activation function is a mapping f : R → R such that f(t) = t for all t ∈ R.

Suppose we are given a single-layer network with n input units and m linear output units, i.e. the output of the i-th neuron can be written as

    oi = neti = <wi, x> = wi1 x1 + · · · + win xn,  for i = 1, …, m.

Assume we have the following training set

    {(x1, y1), …, (xK, yK)}

where xk = (x1k, …, xnk) and yk = (y1k, …, ymk), k = 1, …, K.

[Figure: single-layer feedforward network with m output units]
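The output formula above amounts to one inner product per output unit; a minimal Python sketch of the forward pass (the weights W and input x below are illustrative, not from the notes):

```python
# Output of a single-layer linear network: o_i = <w_i, x>.

def forward(W, x):
    # W[i] is the weight vector w_i of the i-th output unit.
    return [sum(wij * xj for wij, xj in zip(wi, x)) for wi in W]

W = [[1.0, 2.0],   # w_1
     [0.5, -1.0]]  # w_2
x = [3.0, 1.0]
print(forward(W, x))  # [1*3 + 2*1, 0.5*3 - 1*1] = [5.0, 0.5]
```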

The basic idea of the delta learning rule is to define a measure of the overall performance of the system and then to find a way to optimize that performance. In our network, we can define the performance of the system as

    E = Σ_{k=1}^{K} Ek = (1/2) Σ_{k=1}^{K} ‖yk − ok‖^2

That is,

    E = (1/2) Σ_{k=1}^{K} Σ_{i=1}^{m} (yik − oik)^2 = (1/2) Σ_{k=1}^{K} Σ_{i=1}^{m} (yik − <wi, xk>)^2

where i indexes the output units, k indexes the input/output pairs to be learned, yik indicates the target for a particular output unit on a particular pattern, and oik := <wi, xk>

and E is the total error of the system. if the output functions are differentiable. y) (we omit the index k for the sake of simplicity) is given by the gradient descent method. The rule for changing weights following presentation of input/output pair (x. It turns out. we min12 . we can assign a particular unit blame in proportion to the degree to which changes in that unit’s activity lead to changes in the error. that this problem has a simple solution: namely.e. is to minimize this function.indicates the actual output for that unit on that pattern. i. then. The goal. That is. we change the weights of the system in proportion to the derivative of the error with respect to the weights.

Let us compute now the partial derivative of the error function Ek with respect to wij ∂Ek ∂Ek ∂neti = = −(yi − oi)xj ∂wij ∂neti ∂wij where neti = wi1x1 + · · · + winxn. That is.imize the quadratic error function by using the following iteration process ∂Ek wij := wij − η ∂wij where η > 0 is the learning rate. wij := wij + η(yi − oi)xj 13 .
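A single application of this update rule can be checked numerically; in the sketch below (one output unit; the pattern, target and learning rate are illustrative, not from the notes) the quadratic error decreases after the update:

```python
# One delta-rule update, w_ij := w_ij + eta * (y_i - o_i) * x_j,
# decreases the quadratic error E = 1/2 * (y - o)^2 for one output unit.

def output(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))  # o = <w, x>

def error(w, x, y):
    return 0.5 * (y - output(w, x)) ** 2

w = [0.0, 0.0]          # weights of one output unit (illustrative)
x, y = [1.0, 2.0], 1.0  # one training pattern and its target (illustrative)
eta = 0.1               # learning rate

before = error(w, x, y)
delta = y - output(w, x)                          # the error signal
w = [wj + eta * delta * xj for wj, xj in zip(w, x)]
after = error(w, x, y)
print(before, after)  # 0.5 then 0.125: the error strictly decreases
```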

So the delta learning rule can be written as

    wi := wi + η (yi − oi) x

where wi is the weight vector of the i-th output unit, x is the actual input to the system, yi is the i-th coordinate of the desired output, oi is the i-th coordinate of the computed output and η > 0 is the learning rate.

Definition 3. The error signal term, denoted by δi and called delta, produced by the i-th output neuron is defined as

    δi = − ∂Ek/∂neti = yi − oi

For linear output units, δi is nothing else but the difference between the desired and computed output values of the i-th neuron.
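Definition 3 can be verified numerically by differencing Ek with respect to neti; the net inputs and targets below are illustrative, not from the notes:

```python
# Check delta_i = -dEk/dnet_i = (y_i - o_i) for linear output units,
# where Ek = 1/2 * sum_i (y_i - o_i)^2 and o_i = net_i.

def Ek(net, y):
    return 0.5 * sum((yi - oi) ** 2 for yi, oi in zip(y, net))

net = [0.3, -0.2]   # net inputs of two output units (illustrative)
y = [1.0, 0.5]      # targets (illustrative)

t = 1e-6            # perturb net_1 (index 0) and difference Ek
dEk_dnet = (Ek([net[0] + t, net[1]], y) - Ek(net, y)) / t
delta = y[0] - net[0]
print(-dEk_dnet, delta)  # both approximately 0.7
```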

If we have only one output unit then the delta learning rule collapses into

    w := w + η (y − o) x = w + η δ x

where δ denotes the difference between the desired and the computed output.

[Figure: a single neuron net]

A key advantage of neural network systems is that these simple, yet powerful learning procedures can be defined, allowing the systems to adapt to their environments. The essential character of such networks is that they map similar input patterns to similar output patterns.

This characteristic is what allows these networks to make reasonable generalizations and perform reasonably on patterns that have never before been presented. The similarity of patterns in a connectionist system is determined by their overlap. The overlap in such networks is determined outside the learning system itself, by whatever produces the patterns.

It should be noted that the delta learning rule was introduced only recently for neural network training by McClelland and Rumelhart. The standard delta rule essentially implements gradient descent in sum-squared error for linear activation functions. This rule parallels the discrete perceptron training rule. It also can be called the continuous perceptron training rule.

Summary 1. The delta learning rule with linear activation functions. Given are K training pairs arranged in the training set {(x1, y1), …, (xK, yK)}, where xk = (x1k, …, xnk) and yk = (y1k, …, ymk), k = 1, …, K.

• Step 1: η > 0 and Emax > 0 are chosen.

• Step 2: Weights wij are initialized at small random values, k := 1, and the running error E is set to 0.

• Step 3: Training starts here. Input xk is presented, x := xk, y := yk, and the output o = (o1, …, om)^T is computed:

    oi = <wi, x> = wi^T x,  for i = 1, …, m.

• Step 4: Weights are updated:

    wij := wij + η (yi − oi) xj

• Step 5: The cumulative cycle error is computed by adding the present error to E:

    E := E + (1/2) ‖y − o‖^2

• Step 6: If k < K then k := k + 1 and we continue the training by going back to Step 3; otherwise we go to Step 7.

• Step 7: The training cycle is completed. For E < Emax the training session is terminated. If E > Emax then E is set to 0 and we initiate a new training cycle by going back to Step 3.
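The seven steps above translate almost line by line into code; a minimal Python sketch, assuming an illustrative training set whose targets follow y = x1 + x2 (the data, seed and constants are not from the notes):

```python
import random

# Delta learning rule with linear activation functions (Steps 1-7).

# Training set {(x^k, y^k)}: targets follow y = x1 + x2 (illustrative).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
Y = [[1.0], [1.0], [2.0], [3.0]]

eta, E_max = 0.1, 1e-6                  # Step 1: learning rate and error bound
random.seed(0)
W = [[random.uniform(-0.1, 0.1) for _ in X[0]] for _ in Y[0]]  # Step 2

for cycle in range(10000):
    E = 0.0                             # running error is reset each cycle
    for x, y in zip(X, Y):              # Step 3: present pattern, compute o
        o = [sum(wij * xj for wij, xj in zip(wi, x)) for wi in W]
        for i, wi in enumerate(W):      # Step 4: w_ij += eta*(y_i - o_i)*x_j
            for j in range(len(wi)):
                wi[j] += eta * (y[i] - o[i]) * x[j]
        E += 0.5 * sum((yi - oi) ** 2 for yi, oi in zip(y, o))  # Step 5
    if E < E_max:                       # Steps 6-7: stop once E < E_max
        break

print(W)  # approaches [[1.0, 1.0]], recovering y = x1 + x2
```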
