
Artificial Neural Network

Neuron in the Brain

Source: Michael Negnevitsky, Artificial Intelligence: A Guide to Intelligent Systems, Second Edition, Addison-Wesley, 2005

• A neuron consists of a cell body, or soma, a number of fibres called dendrites, and a single long fibre called the axon.

• While dendrites branch into a network around the soma, the axon
stretches out to the dendrites and somas of other neurons.
The Artificial Neurons

Source: Michael Negnevitsky, Artificial Intelligence: A Guide to Intelligent Systems, Second Edition, Addison-Wesley, 2005
Proposed by McCulloch and Pitts [1943]; called M-P neurons.
• $w_{ij}$ positive — excitatory; negative — inhibitory; zero — no connection
• $t_i$ — threshold

$$y_i(t+1) = a(f_i), \qquad f_i = \sum_{j=1}^{m} w_{ij} x_j - t_i, \qquad a(f) = \begin{cases} 1 & f \ge 0 \\ 0 & \text{otherwise} \end{cases}$$

(Figure: inputs $x_1, x_2, \ldots$ with weights $w_{i1}, w_{i2}, \ldots$ feeding the summation $f(\cdot)$, threshold $t_i$, and activation $a(\cdot)$ producing $y_i$.)
Neuron
● The neuron is the basic information processing
unit of a Neural Network. It consists of:
1. A set of links, describing the neuron inputs, with weights $w_1, w_2, \ldots, w_m$
2. An adder function (linear combiner) for computing the weighted sum of the inputs (real numbers): $f = \sum_{i=1}^{m} w_i x_i$
3. An activation function $a(\cdot)$ for limiting the amplitude of the neuron output: $y = a(f)$
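As a concrete illustration of these three parts, here is a minimal sketch of a single neuron in Python/NumPy; the input, weight, and threshold values are made up for the example, and the step activation is just one of the choices listed on the next slide.

```python
import numpy as np

def neuron_output(x, w, t):
    """Single neuron: weighted sum of inputs minus threshold, then a step activation."""
    f = np.dot(w, x) - t              # adder (linear combiner) with threshold t
    return 1.0 if f >= 0 else 0.0     # activation a(f): step function

# Example with made-up numbers: two inputs, threshold 0.5
x = np.array([1.0, 0.0])
w = np.array([0.7, 0.2])
print(neuron_output(x, w, 0.5))       # 0.7 - 0.5 = 0.2 >= 0, so the neuron fires: 1.0
```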
Activation Function a(.)
• Step function: $a(f) = \begin{cases} 1 & \text{if } f \ge t \\ 0 & \text{otherwise} \end{cases}$

• Sigmoid function: $a(f) = \dfrac{1}{1 + \exp(-f)}$

• Gaussian function: $a(f) = \dfrac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left( -\dfrac{1}{2} \left( \dfrac{f - \mu}{\sigma} \right)^{2} \right)$
Activation Function a(.)
• Tanh: $\tanh x = \dfrac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$

• Rectified Linear Unit (ReLU): $\mathrm{ReLU}(x) = \max(0, x)$
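These activation functions can be written directly in NumPy. The sketch below follows the formulas above; treating the Gaussian's mean mu and spread sigma as free parameters is an assumption about the slide's notation.

```python
import numpy as np

def step(f, t=0.0):
    return np.where(f >= t, 1.0, 0.0)

def sigmoid(f):
    return 1.0 / (1.0 + np.exp(-f))

def gaussian(f, mu=0.0, sigma=1.0):
    return np.exp(-0.5 * ((f - mu) / sigma) ** 2) / (np.sqrt(2.0 * np.pi) * sigma)

def tanh(f):
    return (np.exp(f) - np.exp(-f)) / (np.exp(f) + np.exp(-f))   # equivalent to np.tanh(f)

def relu(f):
    return np.maximum(0.0, f)
```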


What Can a Neuron Do?
• A hard limiter.
• A binary threshold unit.
• Hyperspace separation.
$$f = w_1 x_1 + w_2 x_2 - t, \qquad y = \begin{cases} 1 & \text{if } f \ge 0 \\ 0 & \text{otherwise} \end{cases}$$

(Figure: a two-input threshold unit with weights $w_1, w_2$ and threshold $t$; the line $w_1 x_1 + w_2 x_2 - t = 0$ separates the $(x_1, x_2)$ plane.)
Artificial Neural Networks (ANN)

X1 X2 X3 | Y
 1  0  0 | 0
 1  0  1 | 1
 1  1  0 | 1
 1  1  1 | 1
 0  0  1 | 0
 0  1  0 | 0
 0  1  1 | 1
 0  0  0 | 0

(Figure: a black box with inputs X1, X2, X3 and output Y.)
Output Y is 1 if at least two of the three inputs are equal to 1.


Artificial Neural Networks (ANN)
Input nodes X1, X2, X3 each connect with weight 0.3 to an output node with threshold t = 0.4:

X1 X2 X3 | Y
 1  0  0 | 0
 1  0  1 | 1
 1  1  0 | 1
 1  1  1 | 1
 0  0  1 | 0
 0  1  0 | 0
 0  1  1 | 1
 0  0  0 | 0

$$Y = a(0.3 X_1 + 0.3 X_2 + 0.3 X_3 - 0.4 > 0), \qquad a(f) = \begin{cases} 1 & \text{if } f \text{ is true} \\ 0 & \text{otherwise} \end{cases}$$
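The weights of 0.3 and the threshold of 0.4 from this slide can be checked against the truth table with a few lines of Python; this is only a verification sketch, not part of the original slides.

```python
def majority_unit(x1, x2, x3):
    # Y = a(0.3*X1 + 0.3*X2 + 0.3*X3 - 0.4 > 0)
    return int(0.3 * x1 + 0.3 * x2 + 0.3 * x3 - 0.4 > 0)

# Y should be 1 exactly when at least two of the three inputs are 1
for x1 in (0, 1):
    for x2 in (0, 1):
        for x3 in (0, 1):
            assert majority_unit(x1, x2, x3) == int(x1 + x2 + x3 >= 2)
print("truth table reproduced")
```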
Algorithm for learning ANN
• Initialize the weights (w0, w1, …, wk)

• Adjust the weights in such a way that the output of the ANN is consistent with the class labels of the training examples
  – Objective function: $E = \dfrac{1}{2} \sum_i \left[ Y_i - f(w_i, X_i) \right]^2$
  – Find the weights $w_i$ that minimize the above objective function
• e.g., backpropagation algorithm
Learning Algorithms

• To design a learning algorithm, we face the following problems:

1. When to stop: is the stopping criterion satisfied?
2. In what direction to proceed? e.g., gradient descent
3. How long a step to take? This is set by the learning rate.
Gradient Descent

Assume there are only two parameters $w_1$ and $w_2$ in the network, $\theta = (w_1, w_2)$, and consider the error surface over them (the colors in the figure represent the value of the cost $C$).

• Randomly pick a starting point $\theta^0$.
• Compute the negative gradient at $\theta^0$: $-\nabla C(\theta^0)$, where $\nabla C(\theta^0) = \left( \partial C(\theta^0)/\partial w_1, \; \partial C(\theta^0)/\partial w_2 \right)^{\mathsf T}$.
• Multiply it by the learning rate $\eta$ to obtain the step $-\eta \nabla C(\theta^0)$.

Source: Hung-yi Lee, Deep Learning Tutorial
Gradient Descent

Repeat the update from each new point: $\theta^1 = \theta^0 - \eta \nabla C(\theta^0)$, $\theta^2 = \theta^1 - \eta \nabla C(\theta^1)$, and so on. Eventually we reach a minimum.

(Figure: the successive steps $-\eta \nabla C(\theta^0)$, $-\eta \nabla C(\theta^1)$, $-\eta \nabla C(\theta^2)$ descending the error surface over $(w_1, w_2)$.)

Source: Hung-yi Lee, Deep Learning Tutorial
Local Minima

• Gradient descent never guarantees a global minimum.

(Figure: a non-convex cost $C$ over $(w_1, w_2)$; different initial points $\theta^0$ reach different minima, so they give different results.)

Who is Afraid of Non-Convex Loss Functions? http://videolectures.net/eml07_lecun_wia/

Source: Hung-yi Lee, Deep Learning Tutorial
Besides local minima ……

• Very slow at a plateau, where $\nabla C(\theta) \approx 0$
• Stuck at a saddle point, where $\nabla C(\theta) = 0$
• Stuck at a local minimum, where $\nabla C(\theta) = 0$

(Figure: the cost plotted against the parameter space, showing a plateau, a saddle point, and a local minimum.)

Source: Hung-yi Lee, Deep Learning Tutorial
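A minimal sketch of the update rule $\theta \leftarrow \theta - \eta \nabla C(\theta)$ for a two-parameter cost; the quadratic cost used here is an arbitrary stand-in for the error surface (a real surface may have the plateaus, saddle points, and local minima just described).

```python
import numpy as np

def grad_C(theta):
    """Gradient of a stand-in cost C(w1, w2) = (w1 - 1)^2 + 2*(w2 + 0.5)^2."""
    w1, w2 = theta
    return np.array([2.0 * (w1 - 1.0), 4.0 * (w2 + 0.5)])

eta = 0.1                          # learning rate
theta = np.array([3.0, 2.0])       # randomly chosen starting point theta^0
for k in range(100):
    theta = theta - eta * grad_C(theta)   # theta^(k+1) = theta^k - eta * grad C(theta^k)

print(theta)                       # approaches the minimum at (1.0, -0.5)
```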
Back propagation algorithm
for a Single-Layer Perceptron
• Step 0 – Initialize the weights (w0, w1, …, wn), set m = 0, the learning rate $\eta$, and the threshold t
• Step 1 – Set m = m + 1
• Step 2 – Select pattern $X_m$
• Step 3 – Calculate the output: $f(w, X_m) = \sum_i w_i X_{m,i} - t$, $\quad o = a(f)$
• Step 4 – Calculate the error (delta): $\delta = d - o$
• Step 5 – Update the weights: $w_i(\text{new}) = w_i(\text{old}) + \eta \, \delta \, X_{m,i}$
• Step 6 – Repeat until the weights converge
• Step 7 – Return w
Source: Michael Negnevitsky, Artificial Intelligence: A Guide to Intelligent Systems, Second Edition, Addison-Wesley, 2005
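A sketch of these steps in Python, trained on the majority-function data from the earlier slide; the zero initialization, fixed threshold of 0.4, learning rate of 0.1, and epoch limit are illustrative choices rather than values from the source.

```python
import numpy as np

# Majority-function training patterns X and desired outputs d (from the earlier slide)
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1],
              [0, 0, 1], [0, 1, 0], [0, 1, 1], [0, 0, 0]], dtype=float)
d = np.array([0, 1, 1, 1, 0, 0, 1, 0], dtype=float)

w = np.zeros(3)      # Step 0: initialize the weights (zeros, for simplicity)
t = 0.4              # fixed threshold
eta = 0.1            # learning rate

for epoch in range(100):               # Step 6: repeat until the weights converge
    mistakes = 0
    for x, target in zip(X, d):        # Steps 1 and 2: select the next pattern
        o = 1.0 if np.dot(w, x) - t >= 0 else 0.0   # Step 3: output o = a(f)
        delta = target - o                          # Step 4: error (delta)
        w = w + eta * delta * x                     # Step 5: update the weights
        mistakes += int(delta != 0)
    if mistakes == 0:
        break

print(w)             # Step 7: return the learned weights (approximately [0.2, 0.2, 0.2] here)
```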
General Structure of ANN

(Figure: a network with an input layer taking $x_1, \ldots, x_5$, a hidden layer, and an output layer producing $y$. The inset shows one neuron $i$: inputs $I_1, I_2, I_3$ with weights $w_{i1}, w_{i2}, w_{i3}$ are summed into $S_i$, passed through the activation function $g(S_i)$ with threshold $t$, and produce the output $O_i$.)

Training an ANN means learning the weights of the neurons.
Multilayer Perceptron

(Figure: an input layer $x_1, x_2, \ldots, x_m$, one or more hidden layers, and an output layer producing $y_1, y_2, \ldots, y_n$.)
How an MLP Works?

Example: XOR
• Not linearly separable.
• Is a single-layer perceptron workable?

(Figure: the four XOR points plotted in the $(x_1, x_2)$ plane; no single line separates the two classes.)
How an MLP Works?

Example: XOR
(Figure: two lines $L_1$ and $L_2$ drawn in the $(x_1, x_2)$ plane around the XOR points. A hidden layer with two threshold units $y_1$ and $y_2$, fed by $x_1$, $x_2$, and a bias input $x_3 = 1$, implements $L_1$ and $L_2$.)
How an MLP Works?

Example: XOR
(Figure: the hidden outputs $(y_1, y_2)$ map the four input points into a new plane, where a single line $L_3$ now separates the two XOR classes.)
How an MLP Works?

Example: XOR
(Figure: an output unit $z$ implements $L_3$ over $y_1$, $y_2$, and a bias $y_3 = 1$, on top of the hidden units for $L_1$ and $L_2$ fed by $x_1$, $x_2$, $x_3 = 1$.)
In the hidden-layer space $(y_1, y_2)$, is the problem linearly separable?
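A sketch of this construction with hand-picked weights; the particular thresholds below are illustrative choices and not necessarily the exact lines drawn on the slides. The two hidden units play the role of L1 and L2, and the output unit plays the role of L3 in the (y1, y2) plane.

```python
def step(f):
    return 1.0 if f >= 0 else 0.0

def xor_mlp(x1, x2):
    # Hidden layer: two threshold units standing in for the lines L1 and L2
    y1 = step(x1 + x2 - 0.5)     # fires if at least one input is 1
    y2 = step(x1 + x2 - 1.5)     # fires only if both inputs are 1
    # Output layer: a single line L3 in the (y1, y2) plane
    return step(y1 - y2 - 0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_mlp(a, b))   # reproduces XOR: 0, 1, 1, 0
```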
Parity Problem

x1 x2 x3 | y
 0  0  0 | 0
 0  0  1 | 1
 0  1  0 | 1
 0  1  1 | 0
 1  0  0 | 1
 1  0  1 | 0
 1  1  0 | 0
 1  1  1 | 1

(Figure: the eight input patterns as the corners of the unit cube in $(x_1, x_2, x_3)$ space.)
Parity Problem

(Figure sequence: three planes $P_1$, $P_2$, $P_3$ cut the input cube between the corners with different numbers of 1s, e.g. between 000, 001, 011, and 111.)

(Figure: a hidden layer of three threshold units $y_1$, $y_2$, $y_3$, fed by $x_1$, $x_2$, $x_3$, implements $P_1$, $P_2$, $P_3$.)

(Figure: in the hidden space $(y_1, y_2, y_3)$ the patterns collapse onto a few points, and a single plane $P_4$ separates odd parity from even parity; an output unit $z$ implements $P_4$ on top of $y_1$, $y_2$, $y_3$.)
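One way to realize this construction with hand-picked weights (an illustrative choice, not necessarily the exact planes P1 to P4 drawn on the slides): the three hidden units test whether at least one, two, or three inputs are on, and the output unit combines them so the result matches the parity table.

```python
def step(f):
    return 1 if f >= 0 else 0

def parity_mlp(x1, x2, x3):
    s = x1 + x2 + x3
    # Hidden layer: three planes through the input cube (P1, P2, P3)
    y1 = step(s - 0.5)    # at least one input is 1
    y2 = step(s - 1.5)    # at least two inputs are 1
    y3 = step(s - 2.5)    # all three inputs are 1
    # Output layer: a single plane P4 in the (y1, y2, y3) space
    return step(y1 - y2 + y3 - 0.5)

for x1 in (0, 1):
    for x2 in (0, 1):
        for x3 in (0, 1):
            assert parity_mlp(x1, x2, x3) == (x1 + x2 + x3) % 2
print("parity table reproduced")
```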
General Problem
Hyperspace Partition

(Figure: three lines $L_1$, $L_2$, $L_3$ partition the input plane into regions.)

Region Encoding

(Figure: each region is labeled by a 3-bit code giving its side of $L_1$, $L_2$, $L_3$: 000, 001, 010, 100, 101, 110, 111.)
Hyperspace Partition & Region Encoding Layer

(Figure: a first layer of three threshold units, one per line $L_1$, $L_2$, $L_3$, fed by the inputs $x_1$, $x_2$, $x_3$; its outputs form the region code of the input point.)
Region Identification Layer

(Figure sequence: on top of the encoding layer $L_1$, $L_2$, $L_3$, a second layer contains one unit per region; each unit fires only when the encoding-layer outputs match its region code. The slides step through the regions 101, 001, 000, 110, 010, 100, and 111 in turn.)
Classification

(Figure: each region-identification unit is assigned a class label (0 or 1), and an output unit combines the units whose regions belong to class 1, producing the final classification.)
Feed-Forward Neural Networks

Back Propagation Learning Algorithm

Supervised Learning
Training set: $T = \{ (\mathbf{x}^{(1)}, \mathbf{d}^{(1)}), (\mathbf{x}^{(2)}, \mathbf{d}^{(2)}), \ldots, (\mathbf{x}^{(p)}, \mathbf{d}^{(p)}) \}$

(Figure: a multilayer network with input layer $x_1, \ldots, x_m$, hidden layers, and an output layer producing $o_1, \ldots, o_n$, compared against the desired outputs $d_1, \ldots, d_n$.)
Supervised Learning

Training set: $T = \{ (\mathbf{x}^{(1)}, \mathbf{d}^{(1)}), (\mathbf{x}^{(2)}, \mathbf{d}^{(2)}), \ldots, (\mathbf{x}^{(p)}, \mathbf{d}^{(p)}) \}$

Sum of squared errors for pattern $l$:
$$E^{(l)} = \frac{1}{2} \sum_{j=1}^{n} \left( d_j^{(l)} - o_j^{(l)} \right)^2$$

Goal: minimize $E = \sum_{l=1}^{p} E^{(l)}$.
Back Propagation Learning Algorithm

$$E^{(l)} = \frac{1}{2} \sum_{j=1}^{n} \left( d_j^{(l)} - o_j^{(l)} \right)^2, \qquad E = \sum_{l=1}^{p} E^{(l)}$$

The weight updates are derived in two parts:
• Learning on output neurons
• Learning on hidden neurons
Learning on Output Neurons

For an output neuron $j$ with incoming weight $w_{ji}$ from neuron $i$:
$$o_j^{(l)} = a(net_j^{(l)}), \qquad net_j^{(l)} = \sum_i w_{ji} \, o_i^{(l)}$$

$$\frac{\partial E}{\partial w_{ji}} = \sum_{l=1}^{p} \frac{\partial E^{(l)}}{\partial w_{ji}}, \qquad \frac{\partial E^{(l)}}{\partial w_{ji}} = \frac{\partial E^{(l)}}{\partial net_j^{(l)}} \cdot \frac{\partial net_j^{(l)}}{\partial w_{ji}}$$

$$\frac{\partial E^{(l)}}{\partial net_j^{(l)}} = \frac{\partial E^{(l)}}{\partial o_j^{(l)}} \cdot \frac{\partial o_j^{(l)}}{\partial net_j^{(l)}}, \qquad \frac{\partial E^{(l)}}{\partial o_j^{(l)}} = -\left( d_j^{(l)} - o_j^{(l)} \right)$$

The remaining factor $\partial o_j^{(l)} / \partial net_j^{(l)}$ depends on the activation function.
Activation Function — Sigmoid

$$y = a(net) = \frac{1}{1 + e^{-net}}$$

$$a'(net) = \left( \frac{1}{1 + e^{-net}} \right)' = \frac{e^{-net}}{\left( 1 + e^{-net} \right)^{2}} = y \, (1 - y)$$

Remember this.
Learning on Output Neurons

Using the sigmoid, $\dfrac{\partial o_j^{(l)}}{\partial net_j^{(l)}} = o_j^{(l)} \left( 1 - o_j^{(l)} \right)$, so

$$\frac{\partial E^{(l)}}{\partial net_j^{(l)}} = -\left( d_j^{(l)} - o_j^{(l)} \right) o_j^{(l)} \left( 1 - o_j^{(l)} \right)$$

Define the output delta
$$\delta_j^{(l)} = -\frac{\partial E^{(l)}}{\partial net_j^{(l)}} = \left( d_j^{(l)} - o_j^{(l)} \right) o_j^{(l)} \left( 1 - o_j^{(l)} \right)$$
Learning on Output Neurons

Since $net_j^{(l)} = \sum_i w_{ji} o_i^{(l)}$, we have $\dfrac{\partial net_j^{(l)}}{\partial w_{ji}} = o_i^{(l)}$, and therefore

$$\frac{\partial E^{(l)}}{\partial w_{ji}} = -\delta_j^{(l)} o_i^{(l)}, \qquad \frac{\partial E}{\partial w_{ji}} = -\sum_{l=1}^{p} \delta_j^{(l)} o_i^{(l)}$$

The gradient-descent update for the weights connecting to the output neurons is

$$\Delta w_{ji} = \eta \sum_{l=1}^{p} \delta_j^{(l)} o_i^{(l)}$$
Learning on Hidden Neurons

For a hidden neuron $i$ with incoming weight $w_{ik}$ from neuron $k$:

$$\frac{\partial E}{\partial w_{ik}} = \sum_{l=1}^{p} \frac{\partial E^{(l)}}{\partial w_{ik}}, \qquad \frac{\partial E^{(l)}}{\partial w_{ik}} = \frac{\partial E^{(l)}}{\partial net_i^{(l)}} \cdot \frac{\partial net_i^{(l)}}{\partial w_{ik}}, \qquad \frac{\partial net_i^{(l)}}{\partial w_{ik}} = o_k^{(l)}$$

Define the hidden delta $\delta_i^{(l)} = -\dfrac{\partial E^{(l)}}{\partial net_i^{(l)}}$. As before,

$$\frac{\partial E^{(l)}}{\partial net_i^{(l)}} = \frac{\partial E^{(l)}}{\partial o_i^{(l)}} \cdot \frac{\partial o_i^{(l)}}{\partial net_i^{(l)}}, \qquad \frac{\partial o_i^{(l)}}{\partial net_i^{(l)}} = o_i^{(l)} \left( 1 - o_i^{(l)} \right) \text{ (sigmoid)}$$

The error reaches hidden neuron $i$ only through the neurons $j$ it feeds:

$$\frac{\partial E^{(l)}}{\partial o_i^{(l)}} = \sum_j \frac{\partial E^{(l)}}{\partial net_j^{(l)}} \cdot \frac{\partial net_j^{(l)}}{\partial o_i^{(l)}} = -\sum_j \delta_j^{(l)} w_{ji}$$

Hence

$$\delta_i^{(l)} = o_i^{(l)} \left( 1 - o_i^{(l)} \right) \sum_j w_{ji} \delta_j^{(l)}, \qquad \frac{\partial E}{\partial w_{ik}} = -\sum_{l=1}^{p} \delta_i^{(l)} o_k^{(l)}, \qquad \Delta w_{ik} = \eta \sum_{l=1}^{p} \delta_i^{(l)} o_k^{(l)}$$
Back Propagation

(Figure: the full network with inputs $x_1, \ldots, x_m$, hidden neurons $k$ and $i$, and output neurons $j$ producing $o_1, \ldots, o_n$, compared with the targets $d_1, \ldots, d_n$. The deltas are computed at the output layer and propagated backward.)

Output neurons:
$$\delta_j^{(l)} = -\frac{\partial E^{(l)}}{\partial net_j^{(l)}} = \left( d_j^{(l)} - o_j^{(l)} \right) o_j^{(l)} \left( 1 - o_j^{(l)} \right), \qquad \Delta w_{ji} = \eta \sum_{l=1}^{p} \delta_j^{(l)} o_i^{(l)}$$

Hidden neurons:
$$\delta_i^{(l)} = -\frac{\partial E^{(l)}}{\partial net_i^{(l)}} = o_i^{(l)} \left( 1 - o_i^{(l)} \right) \sum_j w_{ji} \delta_j^{(l)}, \qquad \Delta w_{ik} = \eta \sum_{l=1}^{p} \delta_i^{(l)} o_k^{(l)}$$
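A quick way to sanity-check these formulas is to compare the backpropagated gradient with a finite-difference estimate of the same derivative; a minimal sketch for a single sigmoid output neuron, where the inputs, weights, and target are arbitrary made-up numbers.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

o_i = np.array([0.2, 0.7, 0.1])     # outputs of the previous layer (arbitrary)
w_ji = np.array([0.5, -0.3, 0.8])   # weights into output neuron j (arbitrary)
d_j = 1.0                           # target

def E(w):
    o_j = sigmoid(np.dot(w, o_i))
    return 0.5 * (d_j - o_j) ** 2

# Backpropagation: dE/dw_ji = -(d_j - o_j) * o_j * (1 - o_j) * o_i = -delta_j * o_i
o_j = sigmoid(np.dot(w_ji, o_i))
grad_bp = -(d_j - o_j) * o_j * (1 - o_j) * o_i

# Finite-difference estimate of the same gradient
eps = 1e-6
grad_fd = np.array([(E(w_ji + eps * e) - E(w_ji - eps * e)) / (2 * eps) for e in np.eye(3)])

print(np.allclose(grad_bp, grad_fd))   # True: the two gradients agree
```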
Multilayer Neural Network

(Figure: input $x_i$ feeds hidden neuron $j$ (output $o_j$) through weight $w_{ji}$, and hidden neuron $j$ feeds output neuron $k$ (output $o_k$) through weight $w_{kj}$.)

$$\delta_k = (d_k - o_k) \, o_k (1 - o_k), \qquad \delta_j = o_j (1 - o_j) \sum_k w_{kj} \, \delta_k$$
Backpropagation algorithm
for a Multilayer Perceptron
• Step 0 – Initialize the weights (w0, w1, …, wn), set m = 0, the learning rate $\eta$, the threshold t, and the sigmoid parameter
• Step 1 – Set m = m + 1
• Step 2 – Select pattern $X_m$
• Step 3 – Calculate the outputs $o_j$ (hidden) and $o_k$ (output), with $o = a(f)$
• Step 4 – Calculate the deltas $\delta_k$ and $\delta_j$
• Step 5 – Update the weights:
  $w_{kj}(\text{new}) = w_{kj}(\text{old}) + \eta \, \delta_k \, o_j$
  $w_{ij}(\text{new}) = w_{ij}(\text{old}) + \eta \, \delta_j \, X_i$
• Step 6 – Repeat until the weights converge
• Step 7 – Return w
https://machinelearningmastery.com/implement-backpropagation-algorithm-scratch-python/
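Putting the update rules together, here is a small from-scratch sketch with one hidden layer, sigmoid activations, and per-pattern updates; the XOR data, network size, learning rate, epoch count, and the use of trainable biases in place of fixed thresholds are illustrative choices, and this is not the code from the linked tutorial.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# XOR training set (inputs X, desired outputs d)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
d = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(1)
W1 = rng.normal(0.0, 1.0, (2, 3))   # input -> hidden weights
b1 = np.zeros(3)                    # hidden biases (trainable thresholds)
W2 = rng.normal(0.0, 1.0, (3, 1))   # hidden -> output weights
b2 = np.zeros(1)                    # output bias
eta = 0.5                           # learning rate

for epoch in range(10000):
    for x, target in zip(X, d):
        # Forward pass
        o_hidden = sigmoid(x @ W1 + b1)        # hidden outputs o_j
        o_out = sigmoid(o_hidden @ W2 + b2)    # network outputs o_k

        # Backward pass: deltas from the derivation above
        delta_out = (target - o_out) * o_out * (1.0 - o_out)        # output delta
        delta_hid = o_hidden * (1.0 - o_hidden) * (W2 @ delta_out)  # hidden delta

        # Weight updates: Delta w = eta * delta * (input feeding that weight)
        W2 += eta * np.outer(o_hidden, delta_out)
        b2 += eta * delta_out
        W1 += eta * np.outer(x, delta_hid)
        b1 += eta * delta_hid

print(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2))   # should approach [0, 1, 1, 0] after training
```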
Learning Factors
• Initial Weights
• Learning Rate ($\eta$)
• Cost Functions
• Momentum
• Update Rules
• Training Data and Generalization
• Number of Layers
• Number of Hidden Nodes
Number of Hidden Layers
• In fact, for many practical problems, there is no
reason to use more than one hidden layer.
No. of Hidden Layers | Result
none | Only capable of representing linearly separable functions or decisions
1 | Can approximate any function that contains a continuous mapping from one finite space to another
2 | Can represent an arbitrary decision boundary to arbitrary accuracy with rational activation functions, and can approximate any smooth mapping to any accuracy
Number of Neurons in the Hidden
Layer
There are many rule-of-thumb methods:
• Between the size of the input layer and the
size of output layer
• 2/3 the size of the input layer, plus the size of
the output layer
• Less than twice the size of the input layer
Number of Neurons in Output Layer
• Regression
– One neuron
• Classification
– Binary class: one neuron
– Multi-class: more than one neuron
Material Resources
• 虞台文, Feed-Forward Neural Networks, Course
slides presentation
• Andrew Ng, Machine Learning, Course slides
presentation
• Michael Negnevitsky, Artificial Intelligence: A Guide to Intelligent Systems, Second Edition, Addison-Wesley, 2005.
• Richard O. Duda, et al., Pattern Classification, 2nd Edition, John Wiley & Sons, Inc., 2001.
• Hung-yi Lee, Deep Learning Tutorial
