
Perceptron learning algorithm

Given a training example (x, y) with label y ∈ {0, 1} and prediction ŷ = 1[w^T x + w_0 ≥ 0] (w_0 can be treated as the weight on a constant input 1):

- y = 1, ŷ = 0 (false negative):  w_new = w_old + η x
- y = 0, ŷ = 1 (false positive):  w_new = w_old − η x
- otherwise: keep w unchanged

Geometric interpretation: the decision boundary is the hyperplane w^T x + w_0 = 0. A false negative rotates w toward x, a false positive rotates w away from x, so each update pushes a misclassified point toward the correct side of the boundary.
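A minimal sketch of this update rule, assuming a step-activation perceptron with the bias w_0 folded into w via a constant-1 input (the function name and the default η are illustrative):

```python
import numpy as np

def perceptron_step(w, x, y, eta=1.0):
    """One perceptron update; x includes a constant 1 so the bias w_0 sits inside w."""
    w, x = np.asarray(w, dtype=float), np.asarray(x, dtype=float)
    y_hat = 1 if w @ x >= 0 else 0      # current prediction
    if y == 1 and y_hat == 0:           # false negative
        w = w + eta * x                 # w_new = w_old + eta * x
    elif y == 0 and y_hat == 1:         # false positive
        w = w - eta * x                 # w_new = w_old - eta * x
    return w                            # unchanged if correctly classified
```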

XOR problem

A single perceptron y = 1[w_1 x_1 + w_2 x_2 + w_0 ≥ 0] would have to satisfy:

- (x_1, x_2) = (0, 0), y = 0  ⇒  w_0 < 0
- (x_1, x_2) = (1, 0), y = 1  ⇒  w_1 + w_0 ≥ 0
- (x_1, x_2) = (0, 1), y = 1  ⇒  w_2 + w_0 ≥ 0
- (x_1, x_2) = (1, 1), y = 0  ⇒  w_1 + w_2 + w_0 < 0

Contradiction: adding the two middle conditions gives w_1 + w_2 + 2 w_0 ≥ 0, so w_1 + w_2 + w_0 ≥ −w_0 > 0 (since w_0 < 0), contradicting the last condition.

XOR is not linearly separable.

Solving XOR with a multilayer perceptron

introduce a hidden layer

Desired values (XOR truth table):

x_1  x_2 | y
 0    0  | 0
 0    1  | 1
 1    0  | 1
 1    1  | 0

[Figure: network with input layer (x_1, x_2), a hidden layer of two threshold units, and one output unit. The thresholds legible in the sketch, 0.5 and 1.5, correspond to the standard construction h_1 = 1[x_1 + x_2 − 0.5 ≥ 0] (OR), h_2 = 1[x_1 + x_2 − 1.5 ≥ 0] (AND), y = 1[h_1 − h_2 − 0.5 ≥ 0] = XOR(x_1, x_2).]
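A quick check of the hidden-layer construction above, assuming the OR/AND reading of the 0.5 and 1.5 thresholds (the exact weights in the sketch are only partly legible):

```python
def step(v):
    """Threshold activation: 1 if v >= 0, else 0."""
    return 1 if v >= 0 else 0

def xor_net(x1, x2):
    h1 = step(x1 + x2 - 0.5)     # hidden unit 1: OR(x1, x2)
    h2 = step(x1 + x2 - 1.5)     # hidden unit 2: AND(x1, x2)
    return step(h1 - h2 - 0.5)   # fires for OR but not AND, i.e. XOR

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_net(x1, x2))   # reproduces the desired values above
```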

Feedforward Neural Network

[Figure: fully connected feedforward network with inputs x_1, …, x_m feeding a stack of hidden layers and an output layer, layers indexed l = 1, …, L.]
All layer l neurons are connected to all layer l+1 neurons.


Structure of neuron j at layer l

- inputs: the outputs y_i^(l−1) of the layer l−1 neurons
- induced local field: v_j^(l) = Σ_i w_ij^(l) y_i^(l−1)
- output (activation): y_j^(l) = φ(v_j^(l))

L: number of layers
w_ij^(l): weight between neuron i of layer l−1 and neuron j of layer l
φ: activation function

Forward propagation

Fix the set of all weights w = {w_ij^(l)}.

Given input x, compute the output f_w(x):
compute layer 1 outputs → compute layer 2 outputs → … → compute layer L outputs = f_w(x)
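A sketch of forward propagation for a fully connected network, with the weights stored as one matrix per layer and a logistic activation as an illustrative choice (bias terms omitted; names are assumptions):

```python
import numpy as np

def phi(v):
    """Logistic activation (one possible choice of activation function)."""
    return 1.0 / (1.0 + np.exp(-v))

def forward(weights, x):
    """weights[l-1] has shape (d_l, d_{l-1}); returns f_w(x), the layer-L output."""
    y = np.asarray(x, dtype=float)
    for W in weights:          # layer 1, layer 2, ..., layer L
        v = W @ y              # induced local fields v_j^(l)
        y = phi(v)             # outputs y_j^(l) = phi(v_j^(l))
    return y
```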

How to learn w

Training loss

J(w) = (1/N) Σ_{i=1..N} loss(f_w(x_i), y_i)

w* = argmin_w J(w)
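As a sketch, J(w) is just the per-example loss averaged over the training set (reusing the hypothetical `forward` above; `loss` is any per-example loss function):

```python
def training_loss(weights, X, Y, loss):
    """J(w) = (1/N) * sum_i loss(f_w(x_i), y_i) over the N training pairs."""
    return sum(loss(forward(weights, x), y) for x, y in zip(X, Y)) / len(X)
```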

How to learn w efficiently


Gradient descent; Newton-Raphson ✗ (computationally expensive).

Assumption: the loss is differentiable.


Batch gradient descent (GD)

Initialize w(0) randomly.

At iteration n: w(n+1) = w(n) − η ∇J(w(n))

If η is small enough, J(w(n+1)) ≤ J(w(n)).
Need to re-evaluate J (and its gradient over the whole training set) at each iteration.
Stochastic gradient descent (SGD)

Initialize w(0) randomly.

At iteration n:
- sample (x(n), y(n)) randomly from D_train
- w(n+1) = w(n) − η ∇ loss(f_{w(n)}(x(n)), y(n))

+ Faster updates.
+ Randomization helps in avoiding bad local minima.

Mini batch gradient descent


Pick a mini-batch of K samples at each iteration (e.g. K = 10 samples) and update w with the gradient averaged over the mini-batch.
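A sketch of the mini-batch update loop; with K = 1 it reduces to SGD and with K = N to batch GD (here `grad_loss` stands in for the per-example gradient, computed by backpropagation later in these notes):

```python
import numpy as np

def minibatch_gd(w, X, Y, grad_loss, eta=0.1, K=10, iters=1000):
    """Mini-batch gradient descent on J(w); X, Y hold the N training pairs."""
    N = len(X)
    rng = np.random.default_rng(0)
    for n in range(iters):
        idx = rng.choice(N, size=K, replace=False)                    # sample a mini-batch
        g = np.mean([grad_loss(w, X[i], Y[i]) for i in idx], axis=0)  # average gradient
        w = w - eta * g                                               # w(n+1) = w(n) - eta * g
    return w
```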
Examples of loss functions

Sum of squared errors:

loss(f_w(x_i), y_i) = ‖f_w(x_i) − y_i‖² = Σ_k (y_ik − f_{w,k}(x_i))²

Good for regression problems. Typically d = 1: loss = (f_w(x_i) − y_i)².
Cross entropy:

loss(f_w(x_i), y_i) = − Σ_k y_ik log f_{w,k}(x_i)

Good for classification problems. Typically d = K = number of classes.

Classifier: given input x, predict ĵ = argmax_k f_{w,k}(x).
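The two losses and the argmax classifier written out as a sketch (for cross entropy, f_w(x) is assumed to already be a probability vector and y_i a one-hot target):

```python
import numpy as np

def sse_loss(f_x, y):
    """Sum of squared errors ||f_w(x_i) - y_i||^2 (regression)."""
    return float(np.sum((np.asarray(f_x) - np.asarray(y)) ** 2))

def cross_entropy_loss(f_x, y):
    """- sum_k y_ik log f_{w,k}(x_i); small epsilon added for numerical safety."""
    return float(-np.sum(np.asarray(y) * np.log(np.asarray(f_x) + 1e-12)))

def predict_class(f_x):
    """Classifier: predict j = argmax_k f_{w,k}(x)."""
    return int(np.argmax(f_x))
```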
Examples of activation functions

- linear: φ(v) = v
- logistic: φ(v) = 1 / (1 + e^(−v));  φ′(v) = φ(v)(1 − φ(v))
- hyperbolic tangent: φ(v) = tanh(v)
- softplus: φ(v) = ln(1 + e^v)
- rectified linear unit (ReLU): φ(v) = max(0, v), i.e. φ(v) = 0 for v ≤ 0 and φ(v) = v for v > 0
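The listed activations and their derivatives (needed for backpropagation below), collected in one sketch; the entries expect NumPy array inputs:

```python
import numpy as np

def logistic(v):
    return 1.0 / (1.0 + np.exp(-v))

# each entry: (phi, phi_prime)
activations = {
    "linear":   (lambda v: v,                    lambda v: np.ones_like(v)),
    "logistic": (logistic,                       lambda v: logistic(v) * (1 - logistic(v))),
    "tanh":     (np.tanh,                        lambda v: 1 - np.tanh(v) ** 2),
    "softplus": (lambda v: np.log1p(np.exp(v)),  logistic),  # d/dv ln(1 + e^v) = logistic(v)
    "relu":     (lambda v: np.maximum(0.0, v),   lambda v: (v > 0).astype(float)),
}
```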
Backpropagation

Is there a way to perform gradient descent or SGD efficiently by utilizing the feedforward structure?
→ forward propagation + back propagation

Per-example error: e(w) = loss(f_w(x), y)

To compute ∇e(w) we need ∂e/∂w_ij^(l) for all weights.

Chain rule:

∂e/∂w_ij^(l) = (∂e/∂v_j^(l)) · (∂v_j^(l)/∂w_ij^(l)) = δ_j^(l) · y_i^(l−1)

with the local gradient δ_j^(l) := ∂e/∂v_j^(l). For a hidden layer, applying the chain rule again gives
δ_j^(l) = φ′(v_j^(l)) Σ_k δ_k^(l+1) w_jk^(l+1).
Procedure

1. Start from the final layer: compute δ_j^(L), j = 1, …, d.
2. Use the δ_j^(l+1) to compute δ_j^(l) for l = L−1, …, 1.
3. Use the deltas to compute ∂e/∂w_ij^(l) for all i, j, l.
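A sketch of this procedure for a fully connected network with logistic activations and sum-of-squared-errors loss, reusing the layout of the earlier `forward` sketch (shapes and names are assumptions, not the notes' exact setup):

```python
import numpy as np

def phi(v):
    return 1.0 / (1.0 + np.exp(-v))

def phi_prime(v):
    s = phi(v)
    return s * (1 - s)

def backprop(weights, x, t):
    """Gradients dE/dW^(l) of e(w) = ||f_w(x) - t||^2 for every layer l."""
    # forward pass: store induced local fields v and outputs y per layer
    ys, vs = [np.asarray(x, dtype=float)], []
    for W in weights:
        vs.append(W @ ys[-1])
        ys.append(phi(vs[-1]))
    # backward pass: deltas from the output layer back to layer 1
    grads = [None] * len(weights)
    delta = 2 * (ys[-1] - t) * phi_prime(vs[-1])                    # delta_j^(L)
    for l in reversed(range(len(weights))):
        grads[l] = np.outer(delta, ys[l])                           # dE/dw_ij^(l) = delta_j^(l) * y_i^(l-1)
        if l > 0:
            delta = phi_prime(vs[l - 1]) * (weights[l].T @ delta)   # delta for the previous layer
    return grads
```

The SGD step in the next subsection is then, per layer, `weights[l] -= eta * grads[l]`.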

SGD with backpropagation


w(n+1) = w(n) − η ∇ loss(f_{w(n)}(x(n)), y(n))

Componentwise:

w_ij^(l) ← w_ij^(l) − η ∂e/∂w_ij^(l) = w_ij^(l) − η δ_j^(l)(n) y_i^(l−1)(n)
Extensions

Momentum

Δw_ij^(l)(n) = α Δw_ij^(l)(n−1) − η δ_j^(l)(n) y_i^(l−1)(n),   then   w_ij^(l)(n+1) = w_ij^(l)(n) + Δw_ij^(l)(n)

0 ≤ α < 1: momentum constant

+ Speeds up learning on a steady downhill slope.
+ Stabilizes learning when oscillations happen.
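A sketch of the momentum update applied to the per-layer gradients from the backpropagation sketch; the velocities must start as zero arrays with the same shapes as the weights (the α and η values are illustrative):

```python
def momentum_step(weights, grads, velocities, eta=0.1, alpha=0.9):
    """Delta w(n) = alpha * Delta w(n-1) - eta * grad;  w(n+1) = w(n) + Delta w(n)."""
    for l in range(len(weights)):
        velocities[l] = alpha * velocities[l] - eta * grads[l]   # accumulate momentum
        weights[l] = weights[l] + velocities[l]                  # apply the smoothed step
    return weights, velocities
```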
