
Statistical Principles and Computational Methods

Prof. Dr. Lars Kaderali

Institut für Medizinische Informatik und Biometrie
Medizinische Fakultät
Technische Universität Dresden

lars.kaderali@tu-dresden.de

Part B - Machine Learning

Organisational Issues

- Slides and exercises are available online at the IMB website, or on my group's website (follow the link "Computational Biology"): http://www.kaderali.org/teaching.html
- Slides are password-protected: username: compbio, password: student
- In case of any questions, please interrupt at any time.
- Office hours: by appointment, or immediately before or after the lectures.

Overview

Date / Content

- Wednesday, May 7th: Linear Separation / Classification; Neuronal Networks
- Wednesday, May 14th: Support Vector Machines
- Wednesday, May 21st: Clustering Algorithms
- Wednesday, May 28th: Expectation Maximization

Recommended Literature

- Duda, Hart, Stork: Pattern Classification (2nd Edition), Wiley Interscience, ISBN 0-471-05669-3
- Baldi, Brunak: Bioinformatics - The Machine Learning Approach (2nd Edition), MIT Press, ISBN 0-262-02506-X

Introduction

[Figure: Data complexity in biology - from sequence (ACTGTT...), protein structure and pathways up to cell, tissue, organ and organism; traditional bioinformatics meets high-throughput experiments: Big Data -> Machine Learning]

- Biology has become a very data-rich science:
  - Genome sequencing data
  - Protein structure data
  - Gene expression data
  - Protein arrays / mass spectrometry
  - High-throughput microscopy
  - ...
- These data cannot be analyzed manually anymore. Ample opportunities for computer science!
- Requires tight collaboration between computer science, mathematics, biology and medicine
- Methods of machine learning, statistical pattern recognition and data mining are prime tools to automatically analyze the vast amounts of data becoming available

Introduction

- Example questions include:
  - Can we correlate phenotypes with genotypes? For example, can we predict how a cancer patient will respond to treatment, based on his genomic profile?
  - Can we learn how our genome influences our metabotype? For example, who will respond with weight gain to a fat-rich diet, and who will not?
  - Can we learn, based on observational data, how genes / proteins interact with one another and how they form regulatory molecular networks?
  - Can we infer the function of genes from large data sets?
  - Can we identify subgroups, e.g. in a set of patients with the same disease, based on their molecular profiles?

Machine Learning and Prediction

- In most of the questions on the previous slide, we are concerned with characteristic properties of objects.
- These properties are then used to compare objects, or to classify new objects.
- But...
  - What are characteristic properties of e.g. an apple? How do we as humans recognize an apple?
  - Typical features: color, form, surface, size, ...

Machine Learning and Prediction

- Color?

Machine Learning and Prediction

- Surface? Form?

Human Perception

Sensory Information -> Preprocessing -> Pattern Recognition -> Action

Machine Perception

- Simple example:
- Build a machine that can classify fish, loaded onto a conveyor belt, into different types

Species: Seabass / Salmon

Problem Analysis

- Camera that acquires images
- From the images, compute characteristic properties of the fish, for example:
  - Length
  - Brightness
  - Width
  - Number and form of fins
  - Position of mouth
  - etc.
- This set of properties are then candidates for classification of the fish

Preprocessing: Segmentation (separate fish & background) -> Feature Extraction (compute properties) -> Classification: Seabass / Salmon

Feature Selection

Choose length as the property used for classification?

Feature Selection

- It seems length alone is not a good property for classification.
- We could try with brightness instead:
- The choice of threshold is a further factor that will influence classification outcome (e.g. minimize seabass in salmon cans!)

Feature Selection

Combine width and brightness: Fish x^t = [x_1, x_2] (brightness and width)

Generalization?

Design-Cycle of a Classifier

Linear Discriminant Functions - Introduction

- We will assume in the following that we can (adequately) separate salmon and sea bass using a linear classifier
- Linear functions are not necessarily optimal (in all cases), but have the advantage that they are very easy to use and understand
- Given training data, our objective is to find a line (or a hyperplane in higher dimensional spaces) that optimally separates two classes

Linear Discriminant Functions and Decision Boundaries

Definition: Linear Discriminant Function

A linear discriminant function is a function g(x) which computes a linear combination of the components of x,

g(x) = w^t x + w_0    (1)

where w is a weight vector and w_0 a bias.

A two-class classifier with a discriminant function of the form (1) uses the following classification rule:

Decide class ω_1 if g(x) > 0 and class ω_2 if g(x) < 0, i.e. decide class ω_1 if w^t x > -w_0 and class ω_2 otherwise.

If g(x) = 0, x is (by definition) assigned to an arbitrary class.

Equation of a Hyperplane

[Figure: hyperplane with normal vector, positional vector of a point, distance to origin, and origin marked]
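As a minimal sketch, rule (1) translates directly into code; the weights and the sample point below are made-up values for the two fish features, not from the slides:

import numpy as np

def g(x, w, w0):
    # Linear discriminant g(x) = w^t x + w_0
    return np.dot(w, x) + w0

def classify(x, w, w0):
    # Decide class 1 if g(x) > 0, class 2 if g(x) < 0 (ties broken arbitrarily)
    return 1 if g(x, w, w0) > 0 else 2

w, w0 = np.array([1.0, -2.0]), 0.5
print(classify(np.array([3.0, 1.0]), w, w0))  # -> 1, since g = 3 - 2 + 0.5 > 0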

Linear Discriminant Functions

- The equation g(x) = 0 defines a decision surface, which separates points of class ω_1 from points of class ω_2.
- For linear (affine) functions g(x), the decision surface is a hyperplane: in 2D it is a line, in 3D a plane, ...
- g(x) is an algebraic measure for the distance of x to the hyperplane.

Write x = x_p + r (w / ||w||), where x_p is the projection of x onto H and r is the signed distance of x to H (w is collinear to x - x_p). As g(x_p) = 0 and w^t w = ||w||^2, it follows that

g(x) = w^t x_p + w_0 + r (w^t w) / ||w|| = r ||w||,   i.e.   r = g(x) / ||w||

=> Linear discriminant functions separate the space using a hyperplane as decision surface. Orientation of the surface is determined by w, its position by the bias term w_0.

The Multi-Category Case

- In case of c > 2 classes we define c linear discriminant functions

  g_i(x) = w_i^t x + w_{i,0},   i = 1, ..., c

  and assign x to class ω_i if g_i(x) > g_j(x) for all j ≠ i. In case of ties, the classification is undetermined.
- Such a linear machine separates the feature space into c regions, where g_i(x) is maximal of the c discriminant functions if x is in region R_i.
- Two neighboring regions R_i and R_j are separated by the hyperplane H_ij that is defined by g_i(x) = g_j(x), i.e.

  (w_i - w_j)^t x + (w_{i,0} - w_{j,0}) = 0

- => w_i - w_j is orthogonal to H_ij, and

  d(x, H_ij) = (g_i(x) - g_j(x)) / ||w_i - w_j||

The Multi-Category Case

It is easy to show that the decision regions of a linear machine are convex. This restriction limits the flexibility and accuracy of a linear classifier.

=> In particular, every decision region is singly connected.

=> This makes the linear machine suitable if the conditional class probabilities p(x|ω) are unimodal.

Nevertheless, there are also multimodal distributions for which linear discriminants give excellent results!
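A minimal sketch of such a linear machine, with made-up weights for c = 3 classes and two features; the argmax implements the decision rule above (ties are simply broken in favor of the first class here):

import numpy as np

W = np.array([[1.0, 0.5], [-0.5, 1.0], [0.0, -1.0]])  # rows are the w_i
w0 = np.array([0.0, 0.2, -0.1])                       # biases w_{i,0}

def classify(x):
    g = W @ x + w0            # all c discriminants at once
    return int(np.argmax(g))  # region R_i where g_i is maximal

print(classify(np.array([2.0, 1.0])))  # -> 0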

Learning with Linear Discriminants

- Let us assume we are given n data points with class labels ω_1 and ω_2.
- Our objective is to use these to learn a linear discriminant function g(x) = w^t x that separates the classes.
- The affine case g(x) = w^t x + w_0 can be reduced to the case g(x) = w^t x using a simple trick (how?)
- For now, we assume that a solution exists that classifies all points correctly
- We are hence looking for w, s.t. w^t x > 0 for all samples of class 1, and w^t x < 0 otherwise.
- To simplify the computations, we replace all points x of class 2 by their negative -x
- Hence our objective becomes: find w, s.t. w^t x > 0 for all x

[Figure: solution region in weight space]

Gradient Descent

- Problem: find a vector w, s.t. w^t x > 0 for all training points x
- Idea: define a suitable function J(w) that is minimal if w is a solution.
- Gradient descent starts with an arbitrary (random) vector w, and then iteratively makes a step in the direction of steepest descent of J(w) to find a better point w.
- Formally:

  w_{k+1} = w_k - η_k ∇J(w_k)

- η_k is the learning rate; this is a parameter that must be chosen carefully.
- Gradient descent is not a global optimization procedure, it can get stuck in local optima.

Gradient Descent Algorithm

(1) BEGIN
(2) Initialize w, threshold θ, η(·), k <- 0
(3) do
(4)   k <- k + 1
(5)   w <- w - η(k) ∇J(w)
(6) until |η(k) ∇J(w)| < θ
(7) return w
(8) END
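A small Python sketch of this algorithm; the objective J, its gradient, and all parameter values are illustrative assumptions (a constant learning rate eta stands in for η(k)):

import numpy as np

def gradient_descent(grad_J, w, eta=0.1, theta=1e-6, max_iter=1000):
    # Step against the gradient until the update is smaller than theta
    for k in range(max_iter):
        step = eta * grad_J(w)
        w = w - step
        if np.linalg.norm(step) < theta:
            break
    return w

# Example: J(w) = ||w - c||^2 has gradient 2(w - c) and its minimum at c
c = np.array([1.0, -2.0])
print(gradient_descent(lambda w: 2 * (w - c), w=np.zeros(2)))  # ~ [1, -2]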

Excursus: Gradient Descent

Consider the simplest case - minimizing a function f(x) of a scalar x:

[Figure: curve f(x) with starting point x_0 and f(x_0) marked]

- Choose a starting point x_0, then compute the gradient of f() in x_0. This gradient gives the steepness of f() at the point x_0.
- Since we wish to minimize f(), we will proceed in the direction with (steepest) negative gradient!
- Choose the next point

  x_{k+1} = x_k - h f'(x_k)

  with stepwidth h.

Example: Gradient Descent

As a simple example, let us consider minimization of

f(x) = x^2 - 2x   =>   f'(x) = 2x - 2

As (random) starting point, we choose x_0 = 0. Let us set the stepwidth h = 1 (for lack of any better value):

f'(0) = -2
x_1 = x_0 - h * f'(x_0) = 2
f'(2) = 2
x_2 = x_1 - h * f'(x_1) = 0

Unfortunately, this will not converge!!!! (x oscillates between 0 and 2.)

Now with x_0 = 0 and h = 2:

f'(0) = -2
x_1 = x_0 - h * f'(x_0) = 4
x_2 = x_1 - h * f'(x_1) = -8
...

Diverges even!

Now with x_0 = 0 and h = 0.75:

f'(0) = -2
x_1 = x_0 - h * f'(x_0) = 1.5
x_2 = x_1 - h * f'(x_1) = 0.75
...

Converges, with rate depending on h.

The usual procedure is to adapt the stepwidth h, decreasing it over time!
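The three runs above can be checked numerically; the following sketch just iterates x <- x - h f'(x) for each stepwidth:

def f_prime(x):
    return 2 * x - 2  # derivative of f(x) = x^2 - 2x, minimum at x = 1

for h in (1.0, 2.0, 0.75):
    x = 0.0
    for _ in range(10):
        x = x - h * f_prime(x)
    print(f"h={h}: x after 10 steps = {x}")
# h=1.0 oscillates (ends at 0), h=2.0 blows up, h=0.75 approaches 1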

Excursus: Gradient Descent

The procedure is completely analogous in higher dimensional space...

Gradient Descent: Caveats

[Figure: objective landscape with a local minimum and the global minimum - gradient descent can get stuck in the local one]

The Perceptron - Objective Function

- Back to the linear classifier: we are looking for an objective function that depends on the weights w, and that we can minimize so that all w^t x > 0 are satisfied.
- One could take J(w, x_1, ..., x_n) = number of points with w^t x <= 0, but this function is piecewise constant, and we cannot compute a gradient.
- A better alternative is

  J_p(w) = Σ_{x ∈ X(w)} (-w^t x)

  where X(w) is the set of points x that have been wrongly classified.
- J_p is never negative, and 0 iff for all points x: w^t x > 0 holds.
- Geometrically, J_p is proportional to the sum of the distances of the incorrectly classified points to the hyperplane defined by w.
- The gradient of J_p is

  ∇J_p(w) = Σ_{x ∈ X(w)} (-x)

  and hence the GD update rule becomes

  w_{k+1} = w_k + η_k Σ_{x ∈ X(w)} x   (Batch Update)
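A minimal sketch of this batch update; the toy data (already normalized, i.e. class-2 points negated as described above) and the learning rate are assumptions for illustration:

import numpy as np

def perceptron_batch(X, eta=0.1, max_iter=100):
    # X holds the normalized samples as rows; we want w^t x > 0 for all rows
    w = np.zeros(X.shape[1])
    for _ in range(max_iter):
        wrong = X[X @ w <= 0]            # X(w): wrongly classified points
        if len(wrong) == 0:
            return w                     # all constraints satisfied
        w = w + eta * wrong.sum(axis=0)  # w_{k+1} = w_k + eta * sum of wrong x
    return w

X = np.array([[1.0, 2.0], [2.0, 1.0], [0.5, -1.0], [1.5, -0.5]])
print(perceptron_batch(X))  # a separating w (not unique)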

The Perceptron - Objective Function

- The drawing to the right gives an idea of the way the algorithm works:
- Assume that only one point x is wrongly classified.
- The weight vector w is then corrected into the direction of x
- This turns the hyperplane, and x is now on the correct side

(Picture source: H. Burkhardt, Freiburg)

- Batch learning: take all wrongly classified points simultaneously for the update
- Alternative: take just a single incorrectly classified point at a time, iterate through the points
- J_p is an objective function that focuses on mistakes (error correction)
- If a solution exists that classifies all points correctly, the algorithm will terminate (proof: Duda, Hart, Stork, p. 230).
- The solution is usually not unique
- If there is no solution, the algorithm will not terminate

[Figure: a data set that is not linearly separable]

Quadratic Error

- In the non-separable case, one may strive to minimize the error made.
- So far, our objective was to find w s.t. w^t x_i > 0 for all x_i. We will now consider the problem to find w with w^t x_i = b_i, where b_i is a constant (e.g. b_i = +1 for points x_i in class 1, b_j = -1 for points x_j in class 2)
- Let X be a matrix that contains x_i in the i-th row. Further, let b be the vector of class labels b_i. In matrix form, our problem can then be written as:

  Find w, s.t. Xw = b

- If X is non-singular (invertible), a solution is given by w = X^{-1} b
- However, this condition is usually not given:
- If we have more data points than unknowns, the system is overdetermined, and there is no exact solution.
- We could then still minimize the error e = Xw - b

Quadratic Error

- For technical reasons, one usually minimizes the squared error instead.
- The objective function then becomes:

  J_s(w) = ||Xw - b||^2

- The gradient of J_s is

  ∇J_s(w) = 2 X^t (Xw - b)

  (e.g. for gradient descent)
- Alternatively: setting the derivative to zero gives the necessary condition

  X^t X w = X^t b

Quadratic Error

- If X^t X is nonsingular, then a solution is

  w = (X^t X)^{-1} X^t b

- If X^t X is singular, one defines the pseudoinverse X^+ (e.g. as the limit X^+ = lim_{ε→0} (X^t X + εI)^{-1} X^t), and

  w = X^+ b

  is a solution that minimizes the quadratic error on Xw = b.

Example

- Let the points (1,2), (2,0), (3,1) and (2,3) be given, with class labels 1, 1, -1, -1, respectively
- Objective: find w, s.t. Xw = b, with augmented rows x_i^t = (1, x_i1, x_i2) and b = (1, 1, -1, -1)^t
- The pseudoinverse is X^+ = (X^t X)^{-1} X^t
- And the solution is w = X^+ b = (11/3, -4/3, -2/3)^t
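A quick numerical check of this example (assuming the augmented rows (1, x_1, x_2) as above):

import numpy as np

X = np.array([[1, 1, 2], [1, 2, 0], [1, 3, 1], [1, 2, 3]], dtype=float)
b = np.array([1, 1, -1, -1], dtype=float)

w = np.linalg.pinv(X) @ b  # pseudoinverse solution w = X^+ b
print(w)                   # ~ (11/3, -4/3, -2/3)
print(X @ w)               # -> [ 1.  1. -1. -1.]; here the error is even zero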

Neuronal Networks - Introduction

- Objective: classify objects, learn nonlinear relations
- Many practical problems exist in which linear discriminant functions are not sufficient for error minimization.
- Support Vector Machines offer one way to deal with this situation through the kernel trick; we will see this in the next lecture
- In many situations, nonlinear functions offer much better classification performance. However, a central problem is the choice of the appropriate nonlinear function to use.
- A brute force approach would be to use a full set of basis functions (e.g. all polynomial functions); however, such a classifier would have too many parameters that cannot be estimated from finite data!

Neuronal Networks - Introduction

- Neuronal networks try to learn the nonlinearities directly from the data.
- NN were originally developed to model and study information processing and learning in the human brain.
- NNs consist of simulated "neurons" that are connected in a network.
- Nonlinearities are introduced through nonlinear functions of the inputs of a neuron, used to calculate the neuron's output

Neurons in Neuronal Networks

- Every neuron gets one or more inputs
- A weighted sum of these inputs is computed
- A nonlinear function ("activation function") is applied to the weighted sum
- The result of this computation is emitted by the neuron.
- Parameters of the NN are the weights w used to compute the weighted sum.

[Figure: a neuron with inputs x_1, ..., x_4 and weights w_1, ..., w_4, computing the output f(Σ_{i=1}^n w_i x_i + w_0) from the weighted sum net = Σ_{i=1}^n w_i x_i + w_0]

Neurons in Neuronal Networks

- Every neuron (= node) has one or several inputs from other neurons, and one or several outputs to other nodes.
- Inputs and outputs can be:
  - Binary {0, 1}
  - Bipolar {-1, 1}
  - Continuous
- All inputs for a given node arrive simultaneously, and stay active until the output is computed.
- The edges in the network have weights
- f(net) is a (usually nonlinear) activation function, where net is a weighted sum of the incoming connections.

Activation Functions

- Identity function: f(net) = net
- Step function (e.g. f(net) = 1 if net > 0, else 0)
- Sigmoidal function, e.g. the logistic sigmoidal function f(net) = 1 / (1 + e^{-net}) or the tanh sigmoidal function f(net) = tanh(net):
  - Continuous and differentiable
  - Asymptotic against saturation points
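A minimal sketch of these activation functions (the step threshold at 0 and the {0, 1} output range are one common convention, assumed here):

import numpy as np

def identity(net):
    return net

def step(net):
    return np.where(net > 0, 1.0, 0.0)

def logistic(net):
    return 1.0 / (1.0 + np.exp(-net))  # smooth, saturates at 0 and 1

net = np.array([-2.0, 0.0, 2.0])
print(identity(net), step(net), logistic(net), np.tanh(net))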

Network Topology

- Feedforward network
  - Connections can only go from layer i to layer i+1
  - Most widely used network topology
  - But: many other topologies are possible!

Network Topology

- Acyclic networks
  - Connections do not form (directed) cycles.
  - Example: feedforward neuronal networks.
- Recurrent networks
  - Networks with weighted cycles.
  - Much more difficult to analyze and handle than acyclic networks.
- Modular networks
  - Consist of several different modules; each module is an individual neural network for a subproblem.
  - Few connections between modules.

Neuronal Networks and Nonlinear Classification

Expressive Strength of Neuronal Networks

Question: Can any classification decision be learnt by a three-layer neuronal feedforward network?

Answer: Yes! (A. Kolmogorov)

Any continuous function from input to output can be implemented in a three-layer net, given a sufficient number of hidden units n_H, proper nonlinearities, and weights.

Unfortunately: Kolmogorov's theorem does not tell us how to choose the nonlinear activation functions for a given data set, nor how many hidden units we need. This is hence the central problem in pattern recognition with neuronal networks!

Expressive Strength of Neuronal Networks

Training of a Neuronal Network

- Well now, given some experimental data with class labels ("training data") and agreement on a network topology to use (three-layer feedforward neural network with a certain number of hidden nodes), how do we choose the weights of the network to properly model the data?

=> Backpropagation algorithm

Backpropagation Algorithm

- Network architecture:
  - Backpropagation works for feedforward networks with at least one layer of nonlinear hidden nodes
  - The activation function needs to be differentiable (often used: sigmoidal functions)
- Learning: supervised, error-driven
- Objective of the learning procedure: adapt the weights connecting the input to hidden and the hidden to output layers so as to minimize the classification error.
  - Adaptation of the weights for the hidden-to-output connections is clear (simple delta rule for gradient descent), but what about adapting the weights on the input-to-hidden edges?
  - How do we compute an error for the hidden layer nodes? This is called the credit assignment problem

Backpropagation

- Forward computation:
  - Present input pattern x at the input layer
  - Compute outputs x^(1) of the hidden layer nodes
  - Compute output o at the output layer
  - -> The network computes a function of the inputs x to calculate the outputs o:

    x_j^(1) = S(net_j^(1)) = S(Σ_i w_{j,i}^(1,0) x_i)

    o_k = S(net_k^(2)) = S(Σ_j w_{k,j}^(2,1) x_j^(1))

- Objective of training:
  - Reduce the sum of the squared error

    E = Σ_{p=1}^P Σ_{j=1}^K (o_{p,j} - d_{p,j})^2

    for given training data P = (x, d) as far as possible (ideally to zero).

Backpropagation - Principle

- Update the hidden -> output layer weights using a gradient-descent step (delta rule)
- Unfortunately, the same cannot be done for the input -> hidden layer weights, as we do not know the desired value for the hidden layer nodes.
- The solution is to distribute the error on the output layer to the hidden layer nodes, and then do the update of the input -> hidden layer weights using this backpropagated error; hence the name of the method.
- Key to the whole procedure is the distribution of the error to the hidden layer nodes.
- The same principle can be applied if several hidden layers exist
- The method was originally proposed by Werbos in 1974; today's formulation is due to Rumelhart, Hinton and Williams (1986)

Backpropagation Algorithm
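A compact sketch of the whole procedure for one hidden layer, assuming logistic activations and the squared error from the previous slide; network sizes, data and learning rate are made up for illustration:

import numpy as np

def S(z):
    return 1.0 / (1.0 + np.exp(-z))  # logistic activation

rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 2))   # input -> hidden weights
W2 = rng.normal(size=(1, 3))   # hidden -> output weights
x = np.array([0.5, -1.0])      # one training pattern
d = np.array([1.0])            # desired output
eta = 0.5

for _ in range(100):
    # Forward computation
    h = S(W1 @ x)              # hidden outputs x^(1)
    o = S(W2 @ h)              # network output o
    # Backward pass: delta rule at the output, error distributed to hidden nodes
    delta_out = (o - d) * o * (1 - o)
    delta_hid = (W2.T @ delta_out) * h * (1 - h)  # backpropagated error
    W2 -= eta * np.outer(delta_out, h)
    W1 -= eta * np.outer(delta_hid, x)

print(S(W2 @ S(W1 @ x)))  # output approaches d = 1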

Design-Cycle of a Classifier

Learning Curves

- Before training, the model error on the training data is high. The objective of training is to reduce the training error.
- The resulting error per training pattern depends on the number of available data points, as well as the expressive power of the neural network, for example the number of hidden layer nodes
- The expected error on a new, independent dataset not used for training is higher than the error on the training data, and can increase or decrease with further training
- Usually, a validation data set is used to decide when to stop training. The objective of this is to avoid overfitting and obtain a classifier that generalizes well.
- Stop training the network if the minimum error is achieved on an independent validation data set.
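As a toy illustration of this stopping rule (with synthetic error curves, not real training data):

import numpy as np

train_err = [1.0 / (1 + t) for t in range(50)]               # keeps falling
val_err = [0.5 + (t - 20) ** 2 / 1000.0 for t in range(50)]  # falls, then rises

stop = int(np.argmin(val_err))  # stop where validation error is minimal
print(f"stop at epoch {stop}: train {train_err[stop]:.3f}, validation {val_err[stop]:.3f}")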

Learning Curves

[Figure: learning curves - MSE vs. training time]

Any Questions?

Thank you for your attention!

Prof. Dr. Lars Kaderali
Institut für Medizinische Informatik und Biometrie,
Medizinische Fakultät,
Technische Universität Dresden
lars.kaderali@tu-dresden.de
