
Statistical Principles and Computational Methods
Part B - Machine Learning
07/03/14

Prof. Dr. Lars Kaderali
Institut für Medizinische Informatik und Biometrie
Medizinische Fakultät
Technische Universität Dresden
lars.kaderali@tu-dresden.de

Organisational Issues

Slides and exercises are available online at the IMB website, or on my group's website; follow the link "Computational Biology": http://www.kaderali.org/teaching.html

Slides are password-protected:
username: compbio
password: student

In case of any questions, please interrupt at any time.

Office hours: By appointment, or immediately before or after the lectures.

Overview

Date / Content
Wednesday, May 7th: Linear Separation / Classification; Neuronal Networks
Wednesday, May 14th: Support Vector Machines
Wednesday, May 21st: Clustering Algorithms
Wednesday, May 28th: Expectation Maximization

Recommended Literature

Duda, Hart, Stork: Pattern Classification (2nd Edition), Wiley Interscience, ISBN 0-471-05669-3
Baldi, Brunak: Bioinformatics - The Machine Learning Approach (2nd Edition), MIT Press, ISBN 0-262-02506-X

Introduction

[Figure: Data complexity, from sequence (ACTGTT...) via protein structure, pathways, cell, tissue, organ, up to organism - traditional bioinformatics meets high-throughput experiments: Big Data -> Machine Learning]

Introduction

! Biology has become a very data-rich science:
  Genome Sequencing Data
  Protein Structure Data
  Gene Expression Data
  Protein Arrays / Mass Spectrometry
  High-throughput microscopy
  ...
! These data cannot be analyzed manually anymore. Ample opportunities for computer science!
! Requires tight collaboration between computer science, mathematics, biology and medicine
! Methods of machine learning, statistical pattern recognition and data mining are prime tools to automatically analyze the vast amounts of data becoming available

Introduction

! Example questions include:
  Can we correlate phenotypes with genotypes? For example, can we predict how a cancer patient will respond to treatment, based on his genomic profile?
  Can we learn how our genome influences our metabotype? For example, who will respond with weight gain to a fat-rich diet, and who will not?
  Can we learn, based on observational data, how genes / proteins interact with one another and how they form regulatory molecular networks?
  Can we infer the function of genes from large data sets?
  Can we identify subgroups, e.g. in a set of patients with the same disease, based on their molecular profiles?

Machine Learning and Prediction

! In most of the questions on the previous slide, we are concerned with characteristic properties of objects.
! These properties are then used to compare objects, or to classify new objects.
! But...
  - What are characteristic properties of e.g. an apple? How do we as humans recognize an apple?
  - Typical features: Color, form, surface, size, ...
! Color?
! Surface? Form?

Human Perception

Sensory Information -> Preprocessing -> Pattern Recognition -> Action

Machine Perception

! Simple example:
! Build a machine that can classify fish, loaded onto a conveyor belt, into different types
  Species: Seabass, Salmon

Problem Analysis

! Camera that acquires images
! From the images, compute characteristic properties of the fish, for example
  ! Length
  ! Brightness
  ! Width
  ! Number and form of fins
  ! Position of mouth
  ! etc.
! This set of properties are then candidates for classification of the fish

Preprocessing:
  Segmentation (separate fish & background)
  Feature Extraction (compute properties)
  Classification: Seabass / Salmon

Feature Selection

Choose length as the property used for classification?

Feature Selection

! It seems length alone is not a good property for classification.
! We could try with brightness instead:
! The choice of threshold is a further factor that will influence classification outcome (e.g. minimize "seabass in salmon cans")

Feature Selection

Combine width and brightness: Fish x^T = [x_1, x_2], with x_1 = brightness, x_2 = width.

Generalization?

Design Cycle of a Classifier

Linear Discriminant Functions - Introduction

! We will assume in the following that we can (adequately) separate salmon and sea bass using a linear classifier
! Linear functions are not necessarily optimal (in all cases), but have the advantage that they are very easy to use and understand
! Given training data, our objective is to find a line (or a hyperplane in higher dimensional spaces) that optimally separates two classes

Linear Discriminant Functions and Decision Boundaries

Definition: Linear Discriminant Function
A linear discriminant function is a function g(x) which computes a linear combination of the components of x,
    g(x) = w^t x + w_0    (1)
where w is a weight vector and w_0 a bias.

A two-class classifier with a discriminant function of the form (1) uses the following classification rule:
    Decide class ω_1 if g(x) > 0 and class ω_2 if g(x) < 0
! i.e. decide class ω_1 if w^t x > -w_0, and class ω_2 otherwise
If g(x) = 0, x is (by definition) assigned to an arbitrary class.
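The definition and the decision rule above can be sketched in a few lines of code. A minimal illustration; the weight vector and the feature vectors below are made-up values, not from the slides:

```python
import numpy as np

def g(x, w, w0):
    """Linear discriminant function g(x) = w^t x + w_0."""
    return np.dot(w, x) + w0

def classify(x, w, w0):
    """Decide class 1 if g(x) > 0, class 2 if g(x) < 0.
    The tie g(x) == 0 is assigned arbitrarily, here to class 2."""
    return 1 if g(x, w, w0) > 0 else 2

# Hypothetical weights for a 2D feature vector [brightness, width]:
w, w0 = np.array([1.0, -2.0]), 0.5
print(classify(np.array([3.0, 1.0]), w, w0))  # g = 1.5 > 0 -> prints 1
print(classify(np.array([0.0, 1.0]), w, w0))  # g = -1.5 < 0 -> prints 2
```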

Equation of a Hyperplane

[Figure: hyperplane w^t x + w_0 = 0, with normal vector w, positional vector x of a point, the origin, and the distance of the hyperplane to the origin]

Linear Discriminant Functions

! The equation g(x) = 0 defines a decision surface, which separates points of class ω_1 from points of class ω_2.
  For linear (affine) functions g(x), the decision surface is a hyperplane: in 2D it is a line, in 3D a plane, ...
  g(x) is an algebraic measure for the distance of x to the hyperplane.

Linear Discriminant Functions

Write x = x_p + r (w / ||w||), where x_p is the projection of x onto H, since w is collinear to x - x_p.
As g(x_p) = 0 and w^t w = ||w||^2, it follows that g(x) = r ||w||, i.e. r = g(x) / ||w||.
=> Linear discriminant functions separate the space using a hyperplane as decision surface.
Orientation of the surface is determined by w, its position by the bias term w_0.
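The decomposition x = x_p + r w/||w|| can be checked numerically. A small sketch with illustrative numbers (w, w_0 and x are made up):

```python
import numpy as np

# Signed distance r = g(x) / ||w|| of a point x to the hyperplane g(x) = 0.
w, w0 = np.array([3.0, 4.0]), -5.0
x = np.array([2.0, 4.0])

gx = w @ x + w0                        # g(x) = w^t x + w_0 = 17
r = gx / np.linalg.norm(w)             # signed distance, 17 / 5 = 3.4
x_p = x - r * w / np.linalg.norm(w)    # projection of x onto the hyperplane

print(r)            # 3.4
print(w @ x_p + w0)  # ~0: x_p lies on the hyperplane, as g(x_p) = 0 requires
```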

The Multi-Category Case

! In case of c > 2 classes we define c linear discriminant functions
    g_i(x) = w_i^t x + w_i0,   i = 1, ..., c
  and assign x to class ω_i if g_i(x) > g_j(x) for all j ≠ i. In case of ties, the classification is undetermined.
! Such a linear machine separates the feature space into c regions, where g_i(x) is maximal of the c discriminant functions if x is in region R_i.
! Two neighboring regions R_i and R_j are separated by the hyperplane H_ij that is defined by:
    g_i(x) = g_j(x),  i.e.  (w_i - w_j)^t x + (w_i0 - w_j0) = 0
! => w_i - w_j is orthogonal to H_ij, and
    d(x, H_ij) = (g_i(x) - g_j(x)) / ||w_i - w_j||
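The argmax rule of such a linear machine is straightforward to sketch. The weight matrix and bias vector below are invented for illustration (c = 3 classes, 2 features):

```python
import numpy as np

# Rows of W are the weight vectors w_i; w0 holds the biases w_i0.
W = np.array([[1.0, 0.0],     # w_1
              [0.0, 1.0],     # w_2
              [-1.0, -1.0]])  # w_3
w0 = np.array([0.0, 0.1, 0.5])

def linear_machine(x):
    scores = W @ x + w0            # g_i(x) for i = 1..c
    return int(np.argmax(scores))  # winning class (0-based index)

print(linear_machine(np.array([2.0, 0.5])))    # scores [2.0, 0.6, -2.0] -> 0
print(linear_machine(np.array([-3.0, -3.0])))  # scores [-3.0, -2.9, 6.5] -> 2
```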

The Multi-Category Case

It is easy to show that the decision regions of a linear machine are convex. This restriction limits the flexibility and accuracy of a linear classifier.
=> In particular, every decision region is singly connected.
=> This makes the linear machine suitable if the conditional class probabilities p(x|ω) are unimodal.
Nevertheless, there are also multimodal distributions for which linear discriminants give excellent results!

Learning with Linear Discriminants

! Let us assume we are given n data points with class labels ω_1 and ω_2.
! Our objective is to use these to learn a linear discriminant function g(x) = w^t x that separates the classes.
! The affine case g(x) = w^t x + w_0 can be reduced to the case g(x) = w^t x using a simple trick (how?)
! For now, we assume that a solution exists that classifies all points correctly.
! We are hence looking for w, s.t. w^t x > 0 for all samples of class 1, and w^t x < 0 otherwise.
! To simplify the computations, we replace all points x of class 2 by their negative -x.
! Hence our objective becomes: Find w, s.t. w^t x > 0 for all x

[Figure: solution region for w in weight space]

Gradient Descent

! Problem: Find a vector w, s.t. w^t x > 0 for all training points x
! Idea: Define a suitable function J(w) that is minimal if w is a solution.
! Gradient descent starts with an arbitrary (random) vector w, and then iteratively makes a step in the direction of steepest descent of J(w) to find a better point w:
    w_{k+1} = w_k - η_k ∇J(w_k)
! η_k is the learning rate; this is a parameter that must be chosen carefully.
! Gradient descent is not a global optimization procedure; it can get stuck in local optima.

Gradient Descent Algorithm
(1) BEGIN
(2)   Initialize w, threshold θ, η(·), k <- 0
(3)   do
(4)     k <- k + 1
(5)     w <- w - η(k) ∇J(w)
(6)   until |η(k) ∇J(w)| < θ
(7)   return w
(8) END

Excursus: Gradient Descent

Consider the simplest case - minimizing a function f(x) of a scalar x:

Choose a starting point x_0, then compute the gradient of f at x_0. This gradient gives the steepness of f at the point x_0.
Since we wish to minimize f, we will proceed in the direction of (steepest) negative gradient!
Choose the next point
    x_{k+1} = x_k - h f'(x_k)
with stepwidth h.

Example: Gradient Descent

As a simple example, let us consider minimization of
    f(x) = x^2 - 2x  =>  f'(x) = 2x - 2
As (random) starting point, we choose x_0 = 0. Let us set the stepwidth h = 1 (for lack of any better value):
    f'(0) = -2
    x_1 = x_0 - h * f'(x_0) = 2
    f'(2) = 2
    x_2 = x_1 - h * f'(x_1) = 0
    ...
Unfortunately, this will not converge!!!!

Example: Gradient Descent

With the same f(x) = x^2 - 2x, f'(x) = 2x - 2:
    x_0 = 0, h = 2
    f'(0) = -2
    x_1 = x_0 - h * f'(x_0) = 4
    x_2 = x_1 - h * f'(x_1) = -8
    ...
Diverges even!

Example: Gradient Descent

With the same f(x) = x^2 - 2x, f'(x) = 2x - 2:
    x_0 = 0, h = 0.75
    f'(0) = -2
    x_1 = x_0 - h * f'(x_0) = 1.5
    x_2 = x_1 - h * f'(x_1) = 0.75
    ...
Converges, with a rate depending on h!

The usual procedure is to adapt the stepwidth h, decreasing it over time!
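The three stepwidths worked through above can be reproduced directly. A minimal sketch:

```python
# The slide's 1D example: f(x) = x^2 - 2x, so f'(x) = 2x - 2, minimum at x = 1.
def gradient_descent(h, x=0.0, steps=20):
    """Run a fixed number of steps x_{k+1} = x_k - h * f'(x_k)."""
    for _ in range(steps):
        x = x - h * (2 * x - 2)
    return x

print(gradient_descent(1.0))   # oscillates between 0 and 2, never converges
print(gradient_descent(2.0))   # diverges
print(gradient_descent(0.75))  # converges towards the minimum at x = 1
```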


The procedure is completely analogous in higher-dimensional spaces...

Excursus: Gradient Descent

Gradient Descent: Caveats

[Figure: a function with a local minimum and a global minimum - gradient descent may get stuck in the local one]

The Perceptron - Objective Function

! Back to the linear classifier: We are looking for an objective function that depends on the weights w, and that we can minimize so that all constraints w^t x > 0 are satisfied.
! One could take J(w; x_1, ..., x_n) = number of points with w^t x <= 0, but this function is piecewise constant, and we cannot compute a gradient.
! A better alternative is
    J_p(w) = Σ_{x ∈ X(w)} (-w^t x)
  where X(w) is the set of points x that have been wrongly classified.
! J_p is never negative, and 0 iff for all points x: w^t x > 0 holds.
! Geometrically, J_p is proportional to the sum of the distances of the incorrectly classified points to the hyperplane defined by w.
! The gradient of J_p is
    ∇J_p(w) = Σ_{x ∈ X(w)} (-x)
  and hence the GD update rule becomes the batch update
    w_{k+1} = w_k + η_k Σ_{x ∈ X(w)} x
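The batch update rule can be sketched as follows. A toy illustration assuming, as above, that class-2 points have already been negated and the data are linearly separable; the sample points and the learning rate are made up:

```python
import numpy as np

def perceptron_batch(X, eta=0.1, max_iter=1000):
    """Minimize J_p(w) = sum over misclassified x of (-w^t x) by gradient
    descent. Rows of X are the (already normalized) training points; the
    goal is w^t x > 0 for every row x."""
    w = np.zeros(X.shape[1])
    for _ in range(max_iter):
        wrong = X[X @ w <= 0]            # X(w): currently misclassified points
        if len(wrong) == 0:
            return w                     # all points satisfy w^t x > 0
        w = w + eta * wrong.sum(axis=0)  # w_{k+1} = w_k + eta * sum of wrong x
    return w

# Made-up separable toy data (class-2 points negated, bias absorbed):
X = np.array([[1.0, 2.0, 1.0],
              [1.0, 1.0, 2.0],
              [-1.0, 1.0, -3.0]])
w = perceptron_batch(X)
print(np.all(X @ w > 0))  # -> True: all points correctly classified
```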

The Perceptron - Objective Function

! The drawing to the right gives an idea of the way the algorithm works:
! Assume that only one point x is wrongly classified.
! The weight vector w is then corrected in the direction of x.
! This turns the hyperplane, and x is now on the correct side.
(Picture source: H. Burkhardt, Freiburg)

The Perceptron - Objective Function

! Batch learning: Take all wrongly classified points simultaneously for the update
! Alternative: Take just a single incorrectly classified point at a time, and iterate through the points
! J_p is an objective function that focuses on mistakes (error correction)
! If a solution exists that classifies all points correctly, the algorithm will terminate (Proof: Duda, Hart, Stork, p. 230).
! The solution is usually not unique
! If there is no solution, the algorithm will not terminate

[Figure: a data set that is not linearly separable]

Quadratic Error

! In the non-separable case, one may strive to minimize the error made.
! So far, our objective was to find w s.t. w^t x_i > 0 for all x_i. We will now consider the problem to find w with w^t x_i = b_i, where b_i is a constant (e.g. b_i = +1 for points x_i in class 1, b_j = -1 for points x_j in class 2).
! Let X be a matrix that contains x_i in the i-th row. Further, let b be the vector of class labels b_i. In matrix form, our problem can then be written as:
    Find w, s.t. Xw = b
! If X is non-singular (invertible), a solution is given by w = X^{-1} b.
! However, this condition is usually not given:
  ! If we have more data points than unknowns, the system is overdetermined, and there is no exact solution.
  ! We could then still minimize the error e = Xw - b

Quadratic Error

! For technical reasons, one usually minimizes the squared error instead.
! The objective function then becomes:
    J_s(w) = ||Xw - b||^2
! The gradient of J_s is
    ∇J_s(w) = 2 X^t (Xw - b)
  (e.g. for gradient descent)
! Alternatively: Setting the derivative to zero gives the necessary condition
    X^t X w = X^t b
! If X^t X is nonsingular, then a solution is
    w = (X^t X)^{-1} X^t b
! If X^t X is singular, one defines the pseudoinverse X^+, and
    w = X^+ b
  is a solution that minimizes the quadratic error on Xw = b.

Example

! Let the points (1,2), (2,0), (3,1) and (2,3) be given, with class labels 1, 1, -1, -1, respectively.
! Objective: Find w, s.t. Xw = b
! The pseudoinverse is X^+ = (X^t X)^{-1} X^t
! And the solution is w = X^+ b

Neuronal Networks - Introduction

! Objective: Classify objects, learn nonlinear relations
  Many practical problems exist in which linear discriminant functions are not sufficient for error minimization.
  Support Vector Machines offer one way to deal with this situation through the kernel trick - we will see this in the next lecture.
  In many situations, nonlinear functions offer much better classification performance. However, a central problem is the choice of appropriate nonlinear function to use.
  A brute-force approach would be to use a full set of basis functions (e.g. all polynomial functions); however, such a classifier would have too many parameters that cannot be estimated from finite data!

Neuronal Networks - Introduction

! Neuronal networks try to learn the nonlinearities directly from the data.
! NNs were originally developed to model and study information processing and learning in the human brain.
! NNs consist of simulated "neurons" that are connected in a network.
! Nonlinearities are introduced through nonlinear functions of the inputs of a neuron, used to calculate the neuron's output.

Neurons in Neuronal Networks

! Every neuron gets one or more inputs.
! A weighted sum of these inputs is computed:
    net = Σ_{i=1}^n w_i x_i + w_0
! A nonlinear function ("activation function") is applied to the weighted sum:
    output = f(Σ_{i=1}^n w_i x_i + w_0)
! The result of this computation is emitted by the neuron.
! Parameters of the NN are the weights w used to compute the weighted sum.

Neurons in Neuronal Networks

! Every neuron (= node) has one or several inputs from other neurons, and one or several outputs to other nodes.
! Inputs and outputs can be
  Binary {0, 1}
  Bipolar {-1, 1}
  Continuous
! All inputs for a given node arrive simultaneously, and stay active until the output is computed.
! The edges in the network have weights.
! f(net) is a (usually nonlinear) activation function, where net is the weighted sum of the incoming connections.

Activation Functions

! Identity function: f(net) = net
! Step function
! Sigmoidal function, e.g.
  Continuous and differentiable
  Asymptotic against saturation points
  Logistic sigmoidal function: f(net) = 1 / (1 + e^(-net))
  Tanh sigmoidal function: f(net) = tanh(net)
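A single neuron with interchangeable activation functions can be sketched as below; the four inputs and the weights are illustrative values, not from the slides:

```python
import numpy as np

def logistic(net):
    """Logistic sigmoidal activation."""
    return 1.0 / (1.0 + np.exp(-net))

def neuron(x, w, w0, f=logistic):
    """Single neuron: weighted sum net = sum_i w_i x_i + w_0, then f(net)."""
    net = np.dot(w, x) + w0
    return f(net)

x = np.array([1.0, 0.5, -1.0, 2.0])  # four inputs, as in the figure
w = np.array([0.2, -0.4, 0.1, 0.3])  # made-up weights; here net = 0.5
print(neuron(x, w, 0.0))                 # logistic activation of net
print(neuron(x, w, 0.0, f=np.tanh))      # tanh activation of net
print(neuron(x, w, 0.0, f=lambda n: n))  # identity activation -> 0.5
```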

Network Topology

! Feedforward network
  Connections can only go from layer i to layer i+1.
  Most widely used network topology.
  But: Many other topologies are possible!
! Acyclic networks
  Connections do not form (directed) cycles.
  Example: Feedforward neuronal networks.
! Recurrent networks
  Networks with weighted cycles.
  Much more difficult to analyze and handle than acyclic networks.
! Modular networks
  Consist of several different modules; each module is an individual neural network for a subproblem.
  Few connections between modules.

Neuronal Networks and Nonlinear Classification

Expressive Strength of Neuronal Networks

Question: Can any classification decision be learnt by a three-layer neuronal feedforward network?

Answer: Yes! (A. Kolmogorov)
Any continuous function from input to output can be implemented in a three-layer net, given a sufficient number of hidden units n_H, proper nonlinearities, and weights.

Unfortunately: Kolmogorov's theorem does not tell us how to choose the nonlinear activation functions for a given data set, nor how many hidden units we need. This is hence the central problem in pattern recognition with neuronal networks!

Training of a Neuronal Network

! Well now: given some experimental data with class labels ("training data") and agreement on a network topology to use (three-layer feedforward neural network with a certain number of hidden nodes), how do we choose the weights of the network to properly model the data?

=> Backpropagation algorithm

Backpropagation Algorithm

! Network architecture:
  ! Backpropagation works for feedforward networks with at least one layer of nonlinear hidden nodes.
  ! The activation function needs to be differentiable (often used: sigmoidal functions).
! Learning: Supervised, error-driven.
! Objective of the learning procedure: Adapt the weights connecting the input to hidden and the hidden to output layers so as to minimize the classification error.
  Adaptation of the weights for the hidden-to-output connections is clear (simple delta rule for gradient descent), but what about adapting the weights on the input-to-hidden edges?
  How do we compute an error for the hidden layer nodes? This is called the credit assignment problem.

Backpropagation

! Forward computation:
  Present input pattern x at the input layer.
  Compute the outputs of the hidden layer nodes:
      x_j^(1) = S(net_j^(1)) = S(Σ_i w_{j,i}^(1,0) x_i)
  Compute the output o at the output layer:
      o_k = S(net_k^(2)) = S(Σ_j w_{k,j}^(2,1) x_j^(1))
  -> The network computes a function of the inputs x to calculate the outputs o.
! Objective of training:
  Reduce the sum of the squared errors
      E = Σ_{p=1}^P Σ_{j=1}^K (o_{p,j} - d_{p,j})^2
  for the given training data P = (x, d) as far as possible (ideally to zero).

Backpropagation - Principle

Update the hidden -> output layer weights using a gradient descent step (delta rule).
Unfortunately, the same cannot be done for the input -> hidden layer weights, as we do not know the desired values for the hidden layer nodes.
The solution is to distribute the error at the output layer to the hidden layer nodes, and then do the update of the input -> hidden layer weights using this backpropagated error -> hence the name of the method.
Key to the whole procedure is the distribution of the error to the hidden layer nodes.
The same principle can be applied if several hidden layers exist.
The method was originally proposed by Werbos in 1974; today's formulation is due to Rumelhart, Hinton and Williams (1986).

Backpropagation Algorithm
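A minimal sketch of the principle for one hidden layer with logistic activations, on made-up XOR-style data; the network size, learning rate and iteration count are arbitrary choices (bias terms omitted for brevity), not from the slides:

```python
import numpy as np

def S(net):
    """Logistic activation function."""
    return 1.0 / (1.0 + np.exp(-net))

# Made-up training data (XOR-style) with targets d:
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
d = np.array([[0.0], [1.0], [1.0], [0.0]])

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.5, size=(3, 2))  # input -> hidden weights (3 hidden nodes)
W2 = rng.normal(scale=0.5, size=(1, 3))  # hidden -> output weights
eta = 0.5

def forward(X):
    h = S(X @ W1.T)        # hidden layer outputs x^(1)
    return h, S(h @ W2.T)  # network outputs o

_, o = forward(X)
loss_before = np.mean((o - d) ** 2)

for _ in range(2000):
    h, o = forward(X)
    # Delta rule at the output layer (derivative of the logistic is o(1-o)):
    delta_o = (o - d) * o * (1 - o)
    # Distribute the output error back to the hidden nodes:
    delta_h = (delta_o @ W2) * h * (1 - h)
    W2 -= eta * delta_o.T @ h  # hidden -> output update
    W1 -= eta * delta_h.T @ X  # input -> hidden update

_, o = forward(X)
loss_after = np.mean((o - d) ** 2)
print(loss_before, loss_after)  # the squared error decreases with training
```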

Design Cycle of a Classifier

Learning Curves

Before training, the model error on the training data is high. The objective of training is to reduce the training error.
The resulting error per training pattern depends on the number of available data points, as well as on the expressive power of the neural network -> for example, the number of hidden layer nodes.
The expected error on a new, independent dataset not used for training is higher than the error on the training data, and can increase or decrease with further training.
Usually, a validation data set is used to decide when to stop training. The objective of this is to avoid overfitting and to obtain a classifier that generalizes well.
Stop training the network when the minimum error is achieved on an independent validation data set.

[Figure: learning curves - MSE on training and validation data over the course of training]

Any Questions?

Thank you for your attention!

Prof. Dr. Lars Kaderali
Institut für Medizinische Informatik und Biometrie,
Medizinische Fakultät,
Technische Universität Dresden
lars.kaderali@tu-dresden.de
