
Activation Functions

Week-4
Basic Model of Artificial Neurons
Nonlinear model of an ANN
• $u_q$ is a linear combiner of the inputs ($x_j$) and synaptic weights ($w_{qj}$):

  $u_q = \sum_{j=1}^{n} w_{qj} x_j = \mathbf{w}_q^T \mathbf{x} = \mathbf{x}^T \mathbf{w}_q = w_{q1} x_1 + \dots + w_{qn} x_n$

  where $\mathbf{x} = [x_1, x_2, \dots, x_n]^T$ and $\mathbf{w}_q = [w_{q1}, w_{q2}, \dots, w_{qn}]^T$.

• Output of the activation function $f(\cdot)$:

  $y_q = f(u_q, \theta_q)$

• McCulloch-Pitts model:

  $y_q = \begin{cases} 1 & \text{for } u_q \ge \theta_q \\ 0 & \text{for } u_q < \theta_q \end{cases}$
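As a concrete illustration, the sketch below computes $u_q$ and applies the McCulloch-Pitts threshold (the function names and the AND example are illustrative, not from the slides):

```python
import numpy as np

def mcculloch_pitts(x, w, theta):
    """Basic artificial neuron: linear combiner plus hard threshold.

    u_q = w^T x ; y_q = 1 if u_q >= theta, else 0.
    """
    u = np.dot(w, x)                      # linear combiner u_q
    return 1 if u >= theta else 0

# Example: a 2-input neuron realizing logical AND with theta = 1.5
w = np.array([1.0, 1.0])
for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, mcculloch_pitts(np.array(x), w, theta=1.5))
```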


Basic Model of Artificial Neurons (2)
Alternate model
 $u_q$ is the linear combiner of the inputs ($x_j$) and synaptic weights ($w_{qj}$):

  $u_q = \sum_{j=1}^{n} w_{qj} x_j = \mathbf{w}_q^T \mathbf{x} = \mathbf{x}^T \mathbf{w}_q$

 Activation potential:

  $v_q = u_q - \theta_q$, or equivalently $v_q = \sum_{j=0}^{n} w_{qj} x_j$ with the threshold folded in as a bias term

 Output of the activation function $f(\cdot)$:

  $y_q = f(v_q) = f\!\left(\sum_{j=1}^{n} w_{qj} x_j - \theta_q\right)$


Basic Model of Artificial Neurons (3)
The threshold can be absorbed into the weight vector as a bias weight $w_{q0}$ acting on a fixed input $x_0$:

  $v_q = \sum_{j=1}^{n} w_{qj} x_j - \theta_q \;\rightarrow\; x_0 = -1$

  $v_q = \sum_{j=1}^{n} w_{qj} x_j + \theta_q \;\rightarrow\; x_0 = +1$

In either case:

  $v_q = \sum_{j=1}^{n} w_{qj} x_j + w_{q0} x_0 = \sum_{j=0}^{n} w_{qj} x_j$

  $y_q = f(v_q) = f\!\left(\sum_{j=0}^{n} w_{qj} x_j\right)$
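A minimal sketch of this bias-as-weight trick (the names are illustrative, not from the slides):

```python
import numpy as np

def combiner_with_bias(x, w, theta):
    """v_q = sum_{j=0}^{n} w_qj x_j, with x_0 = -1 and w_q0 = theta."""
    x_aug = np.concatenate(([-1.0], x))   # prepend the fixed input x_0 = -1
    w_aug = np.concatenate(([theta], w))  # prepend the bias weight w_q0 = theta
    return np.dot(w_aug, x_aug)           # equals w^T x - theta

x = np.array([0.5, 1.0])
w = np.array([0.2, 0.4])
print(combiner_with_bias(x, w, theta=0.3))   # 0.2*0.5 + 0.4*1.0 - 0.3 ≈ 0.2
```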


Activation Functions
 The activation function can be a linear or a nonlinear function.
 The selection of the activation function depends on the particular problem to be solved.

A. Linear (identity) function

$y_q = f_{lin}(v_q) = v_q$



Activation Functions (2)
B1. Hard Limiter Function

$y_q = f_{hl}(v_q) = \begin{cases} 0 & \text{if } v_q < 0 \\ 1 & \text{if } v_q \ge 0 \end{cases}$

Example: the McCulloch-Pitts model.



Activation Functions (3)
B2. Symmetric Hard Limiter Function

$y_q = f_{shl}(v_q) = \begin{cases} -1 & \text{if } v_q < 0 \\ 0 & \text{if } v_q = 0 \\ 1 & \text{if } v_q > 0 \end{cases}$



Activation Functions (4)
C1. Piecewise linear function (saturating function)

 1
 0 if vq  −
2
 1 1 1
y q = f sl (vq ) = vq + if -  vq 
 2 2 2
 1
1 if vq 
 2



Activation Functions (5)
C2. Symmetric Piecewise linear function

 − 1 if vq  −1

y q = f ssl (vq ) = vq if - 1  vq  1
 1 if v  1
 q

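The four threshold and piecewise-linear functions above (B1, B2, C1, C2) are easy to express with NumPy; a small illustrative sketch:

```python
import numpy as np

def f_hl(v):   # B1. hard limiter: outputs in {0, 1}
    return np.where(v >= 0, 1.0, 0.0)

def f_shl(v):  # B2. symmetric hard limiter (signum): outputs in {-1, 0, 1}
    return np.sign(v)

def f_sl(v):   # C1. saturating (piecewise) linear, ramp between -1/2 and 1/2
    return np.clip(v + 0.5, 0.0, 1.0)

def f_ssl(v):  # C2. symmetric saturating linear, ramp between -1 and 1
    return np.clip(v, -1.0, 1.0)

v = np.array([-2.0, -0.25, 0.0, 0.25, 2.0])
for f in (f_hl, f_shl, f_sl, f_ssl):
    print(f.__name__, f(v))
```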


Activation Functions (6)
D1. Binary sigmoid function

  $y_q = f_{bs}(v_q) = \dfrac{1}{1 + e^{-v_q}}$



Activation Functions (7)
Derivative of the binary sigmoid function

  $y_q = f_{bs}(v_q) = \dfrac{1}{1 + e^{-v_q}}$

  $g_{bs}(v_q) = \dfrac{d f_{bs}(v_q)}{d v_q} = \dfrac{e^{-v_q}}{(1 + e^{-v_q})^2} = f_{bs}(v_q)\,[1 - f_{bs}(v_q)]$
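The identity $g_{bs} = f_{bs}(1 - f_{bs})$ can be verified numerically against a finite-difference derivative; an illustrative check:

```python
import numpy as np

def f_bs(v):
    return 1.0 / (1.0 + np.exp(-v))

v = np.linspace(-4, 4, 9)
h = 1e-6
numeric = (f_bs(v + h) - f_bs(v - h)) / (2 * h)   # central difference
analytic = f_bs(v) * (1.0 - f_bs(v))              # f * (1 - f)
print(np.max(np.abs(numeric - analytic)))         # tiny, on the order of 1e-10
```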



Activation Functions (8)
D2. Hyperbolic tangent sigmoid (bipolar sigmoid) function
  $y_q = f_{hts}(v_q) = \tanh(v_q) = \dfrac{e^{v_q} - e^{-v_q}}{e^{v_q} + e^{-v_q}} = \dfrac{1 - e^{-2 v_q}}{1 + e^{-2 v_q}}$

  $g_{hts}(v_q) = \dfrac{d f_{hts}(v_q)}{d v_q} = [1 + f_{hts}(v_q)]\,[1 - f_{hts}(v_q)]$
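The same kind of check works here, since $(1 + \tanh v)(1 - \tanh v) = 1 - \tanh^2 v$; an illustrative sketch:

```python
import numpy as np

v = np.linspace(-4, 4, 9)
f = np.tanh(v)
h = 1e-6
numeric = (np.tanh(v + h) - np.tanh(v - h)) / (2 * h)  # central difference
analytic = (1.0 + f) * (1.0 - f)                       # = 1 - tanh(v)**2
print(np.max(np.abs(numeric - analytic)))              # tiny, ~1e-10
```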



Feedforward NN
[Figure: two network diagrams - "Single-Layer Feedforward Networks" (input layer → output layer) and "Multilayer Feedforward Networks" (input layer → hidden layer → output layer)]

The term "hidden" refers to the fact that this part of the neural network is not seen directly from either the input or the output of the network.



ADALINE and MADALINE

Week-4
ADAptive LINear Element (ADALINE)
 ADALINE is a basic building block used in many neural networks.
 The network consists of a linear combiner cascaded with a signum function (bipolar or unipolar).
 ADALINE is an adaptive pattern-classification network that is trained by the LMS algorithm.
 The synaptic weights are updated during training by the LMS adaptive algorithm using the linear error e(k).
 The activation function is used after the network has been trained (e.g., on inputs beyond the training set).

[Figure: ADALINE structure. Inputs 1, x_1(k), ..., x_n(k) with weights w_0(k), w_1(k), ..., w_n(k) feed a linear combiner producing v(k); a signum function produces y(k); the adaptive algorithm adjusts the weights from the linear error e(k) = d(k) - v(k), where d(k) is the desired response]
Simple Adaptive Linear Combiner
 x(k): input vector; d(k): desired response; E: expectation operator; J: objective function.
 Output of the linear combiner:

  $v(k) = \mathbf{w}^T(k)\,\mathbf{x}(k)$

 Error:

  $e(k) = d(k) - \mathbf{w}^T(k)\,\mathbf{x}(k)$

 Objective function (MSE) to be minimized to find the optimum w:

  $J(\mathbf{w}) = \tfrac{1}{2} E\{e^2(k)\} = \tfrac{1}{2} E\{[d(k) - \mathbf{w}^T(k)\,\mathbf{x}(k)]^2\}$


Simple Adaptive Linear Combiner (2)
 LMS algorithm:

  $\mathbf{w}(k+1) = \mathbf{w}(k) + \mu[-\nabla J(\mathbf{w})]$

 where µ is a learning constant.
 Gradient (using the instantaneous error as an estimate of the MSE):

  $\nabla J(\mathbf{w}) = \dfrac{\partial J(\mathbf{w})}{\partial \mathbf{w}} \approx \dfrac{1}{2}\,\dfrac{\partial e^2(k)}{\partial \mathbf{w}}\bigg|_{\mathbf{w} = \mathbf{w}(k)}$
  $\qquad = \dfrac{1}{2}\,\dfrac{\partial}{\partial \mathbf{w}}\,[d(k) - \mathbf{w}^T(k)\,\mathbf{x}(k)]^2$
  $\qquad = \dfrac{1}{2}\,\dfrac{\partial}{\partial \mathbf{w}}\,[d^2(k) - 2 d(k)\,\mathbf{x}^T(k)\,\mathbf{w}(k) + \mathbf{w}^T(k)\,\mathbf{x}(k)\,\mathbf{x}^T(k)\,\mathbf{w}(k)]$
  $\qquad = -d(k)\,\mathbf{x}(k) + \mathbf{x}(k)\,\mathbf{x}^T(k)\,\mathbf{w}(k)$
  $\qquad = -[d(k) - \mathbf{w}^T(k)\,\mathbf{x}(k)]\,\mathbf{x}(k)$
  $\qquad = -e(k)\,\mathbf{x}(k)$

[Figure: successive steps w(k) → w(k+1) descending the error surface]
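The final line, $\nabla J = -e(k)\,\mathbf{x}(k)$, can be sanity-checked against finite differences of the instantaneous objective $J = \tfrac{1}{2} e^2(k)$; an illustrative sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=3)
x = rng.normal(size=3)
d = 0.7

def J(w):                        # instantaneous objective 0.5 * e(k)^2
    e = d - w @ x
    return 0.5 * e * e

e = d - w @ x
analytic = -e * x                # gradient from the derivation above

h = 1e-6                         # central finite differences, coordinate-wise
numeric = np.array([(J(w + h * np.eye(3)[i]) - J(w - h * np.eye(3)[i])) / (2 * h)
                    for i in range(3)])
print(np.max(np.abs(numeric - analytic)))   # tiny, ~1e-10
```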



Simple Adaptive Linear Combiner (3)
 LMS algorithm:

  $\mathbf{w}(k+1) = \mathbf{w}(k) + \mu[-\nabla J(\mathbf{w})] = \mathbf{w}(k) + \mu\, e(k)\, \mathbf{x}(k)$

 where µ is a learning constant (the learning-rate parameter).
 In scalar form, with $x_0 = 1$:

  $e(k) = d(k) - \sum_{i=0}^{n} w_i(k)\, x_i(k)$

  $w_i(k+1) = w_i(k) + \mu\, e(k)\, x_i(k)$

 Choosing µ:
  - If µ is too small, the learning algorithm modifies the weights slowly, and very many iterations are needed to descend the error surface.
  - If µ is too large, the learning rule becomes unstable, because the gradient of the objective function is only approximated by the instantaneous estimate derived above.
  - If µ is too large, the weights fail to converge (they diverge).
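A single scalar-form update step might look like this (illustrative sketch; the numbers are arbitrary):

```python
import numpy as np

def lms_step(w, x, d, mu):
    """One µLMS update: w_i(k+1) = w_i(k) + mu * e(k) * x_i(k)."""
    e = d - np.dot(w, x)          # linear error e(k) = d(k) - v(k)
    return w + mu * e * x, e

# x includes the bias component x_0 = 1
w = np.array([0.1, 0.1, 0.1])
x = np.array([1.0, 0.0, 1.0])
w, e = lms_step(w, x, d=-0.1, mu=0.1)
print(w, e)   # e = -0.1 - 0.2 = -0.3 ; w becomes [0.07, 0.1, 0.07]
```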



Simple Adaptive Linear Combiner (4)
 Convergence property:

  $0 < \mu < \dfrac{2}{\lambda_{\max}}$

  where $\lambda_{\max}$ is the largest eigenvalue of the input covariance matrix.

 Adjustment of µ over the iterations:

  $\mu(k) = \dfrac{\mu_0}{1 + k/\tau}; \quad \mu_0 > 0 \text{ and } \tau \ge 1$



Simple Adaptive Linear Combiner (5)
 Step 1: Set k = 1, initialize the synaptic weight vector w(k=1), and select values for µ0 and τ.
 Step 2: Compute the learning-rate parameter: $\mu(k) = \dfrac{\mu_0}{1 + k/\tau}$
 Step 3: Compute $v(k) = \sum_{i=0}^{n} w_i(k)\, x_i(k)$
 Step 4: Compute the error: $e(k) = d(k) - v(k)$
 Step 5: Update the synaptic weights: $w_i(k+1) = w_i(k) + \mu(k)\, e(k)\, x_i(k)$ for i = 0, 1, 2, ..., n.
 Step 6: If convergence is achieved, stop; else set k ← k+1 and go to Step 2. (A loop implementing these steps is sketched below.)
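Putting Steps 1-6 together, a minimal training loop might look like the following (an illustrative sketch; the stopping rule and hyperparameters are assumptions, with the data taken from the worked example later in the slides):

```python
import numpy as np

def lms_train(X, d, mu0=0.1, tau=1000.0, tol=1e-3, max_epochs=5000):
    """LMS training following Steps 1-6 (rows of X already include x_0 = 1)."""
    w = np.zeros(X.shape[1])                 # Step 1: initialize w, k = 1
    k = 1
    for _ in range(max_epochs):
        worst = 0.0
        for i in range(len(X)):              # cycle through the patterns
            mu = mu0 / (1.0 + k / tau)       # Step 2: learning rate mu(k)
            v = np.dot(w, X[i])              # Step 3: combiner output v(k)
            e = d[i] - v                     # Step 4: linear error e(k)
            w = w + mu * e * X[i]            # Step 5: weight update
            worst = max(worst, abs(e))
            k += 1                           # Step 6: next iteration
        if worst < tol:                      # assumed convergence test
            break
    return w

X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
d = np.array([-0.1, -0.1, -0.1, 0.1])
w = lms_train(X, d)
print(w, (X @ w >= 0).astype(int))   # w near [-0.15, 0.1, 0.1]; outputs 0 0 0 1
```

Note that these targets are not exactly realizable by a linear combiner, so the weights settle near the least-squares solution; the hard-limited outputs nevertheless match the desired classes.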



ADALINE Algorithm
[Figure: ADALINE structure. Inputs 1, x_1(k), ..., x_n(k) with weights w_0(k), ..., w_n(k); combiner output v(k); signum output y(k); adaptive algorithm driven by e(k) and d(k)]

 The activation function is used after the network has been trained.
 Compute y(k) with either the unipolar or the bipolar hard limiter:

  $y(k) = f_{hl}(v(k)) = \begin{cases} 0 & \text{if } v(k) < 0 \\ 1 & \text{if } v(k) \ge 0 \end{cases}$  or  $y(k) = f_{shl}(v(k)) = \begin{cases} -1 & \text{if } v(k) < 0 \\ 0 & \text{if } v(k) = 0 \\ 1 & \text{if } v(k) > 0 \end{cases}$



Example



ADALINE → Method 1
[Figure: ADALINE structure as above]

  $v(k) = \sum_{i=0}^{n} w_i(k)\, x_i(k)$

  $e(k) = d(k) - v(k)$

  $w_i(k+1) = w_i(k) + \mu\, e(k)\, x_i(k)$

  $y(k) = f_{hl}(v(k)) = \begin{cases} 0 & \text{if } v(k) < 0 \\ 1 & \text{if } v(k) \ge 0 \end{cases}$
epoch  k  x0(k)  x1(k)  x2(k)  d(k)  y(k)
  1    1    1      0      0    -0.1    0
  1    2    1      0      1    -0.1    0
  1    3    1      1      0    -0.1    0
  1    4    1      1      1     0.1    1



µ = 0.1      $w_i(k+1) = w_i(k) + \Delta w_i$      $\Delta w_i = \mu\, e(k)\, x_i(k) = 0.1\, e(k)\, x_i(k)$      $v(k) = \sum_{i=0}^{n} w_i(k)\, x_i(k)$

ep  k  x0  x1  x2   d     w0     w1     w2     v      e      ∆w0     ∆w1     ∆w2     y
1   1   1   0   0  -0.1   0.1    0.1    0.1   0.1   -0.2   -0.02    0       0       1
1   2   1   0   1  -0.1   0.08   0.1    0.1   0.18  -0.28  -0.028   0      -0.028   1
1   3   1   1   0  -0.1   ???    ???    ???   ???    ???    ???     ???     ???    ???
1   4   1   1   1   0.1
2   1   1   0   0  -0.1
2   2   1   0   1  -0.1
2   3   1   1   0  -0.1
2   4   1   1   1   0.1
3   1   1   0   0  -0.1
3   2   1   0   1  -0.1
3   3   1   1   0  -0.1
3   4   1   1   1   0.1
4   1   1   0   0  -0.1
4   2   1   0   1  -0.1
4   3   1   1   0  -0.1
4   4   1   1   1   0.1



µ = 0.1      $w_i(k+1) = w_i(k) + \Delta w_i$      $\Delta w_i = \mu\, e(k)\, x_i(k) = 0.1\, e(k)\, x_i(k)$      $v(k) = \sum_{i=0}^{n} w_i(k)\, x_i(k)$

ep  k  x0  x1  x2   d     w0     w1     w2     v      e      ∆w0     ∆w1     ∆w2     y
1   1   1   0   0  -0.1   0.1    0.1    0.1   0.1   -0.2   -0.02    0       0       1
1   2   1   0   1  -0.1   0.08   0.1    0.1   0.18  -0.28  -0.028   0      -0.028   1
1   3   1   1   0  -0.1   0.052  0.1    0.072 0.152 -0.252 -0.0252 -0.0252  0       1
1   4   1   1   1   0.1
2   1   1   0   0  -0.1
2   2   1   0   1  -0.1
2   3   1   1   0  -0.1
2   4   1   1   1   0.1
3   1   1   0   0  -0.1
3   2   1   0   1  -0.1
3   3   1   1   0  -0.1
3   4   1   1   1   0.1
4   1   1   0   0  -0.1
4   2   1   0   1  -0.1
4   3   1   1   0  -0.1
4   4   1   1   1   0.1
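The filled-in rows can be reproduced programmatically; a short illustrative sketch that prints w, v, e, ∆w, and y for each step of epoch 1:

```python
import numpy as np

X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
d = np.array([-0.1, -0.1, -0.1, 0.1])
w = np.array([0.1, 0.1, 0.1])    # initial weights, as in the table
mu = 0.1

for k in range(4):               # epoch 1
    v = np.dot(w, X[k])
    e = d[k] - v
    dw = mu * e * X[k]
    y = 1 if v >= 0 else 0       # hard limiter
    print(f"k={k+1}: w={w.round(4)}, v={v:.4f}, e={e:.4f}, dw={dw.round(4)}, y={y}")
    w = w + dw                   # weights used in the next row
```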



ADALINE → Method 2
[Figure: ADALINE structure with a binary sigmoid in place of the signum]

  $v(k) = \sum_{i=0}^{n} w_i(k)\, x_i(k)$

  $y(k) = f_{bs}(v(k)) = \dfrac{1}{1 + e^{-v(k)}}$

  $e(k) = d(k) - y(k)$

  $w_i(k+1) = w_i(k) + \mu\, e(k)\, x_i(k)$

epoch  k  x0(k)  x1(k)  x2(k)  d(k)
  1    1    1      0      0     0
  1    2    1      0      1     0
  1    3    1      1      0     0
  1    4    1      1      1     1



ADALINE → Method 3
[Figure: ADALINE structure as above]

k = iteration index; n = pattern index (the training patterns are cycled through):

k  n  x0(n)  x1(n)  x2(n)  d(n)
1  1    1      0      0     0
2  2    1      0      1     0
3  3    1      1      0     0
4  4    1      1      1     1
5  1    1      0      0     0
6  2    1      0      1     0
7  3    1      1      0     0
8  4    1      1      1     1
.  .    .      .      .     .

  $v(k) = \sum_{i=0}^{n} w_i(k)\, x_i(k)$      $\mu(k) = \dfrac{\mu_0}{1 + k/\tau}$

  $y(k) = f_{bs}(v(k)) = \dfrac{1}{1 + e^{-v(k)}}$

  $e(k) = d(k) - y(k)$      $MSE(k) = \tfrac{1}{2}\, e(k)^2$      $w_i(k+1) = w_i(k) + \mu(k)\, e(k)\, x_i(k)$
Linear Separability
 Consider a two-input ADALINE network as in the figure.
 Output of the linear combiner:

  $v(k) = w_1(k)\, x_1(k) + w_2(k)\, x_2(k) + w_0(k)$

 Output of the activation function:

  $y(k) = \operatorname{sgn}[v(k)]$

 Borderline (decision boundary) for the classification:

  $w_1(k)\, x_1(k) + w_2(k)\, x_2(k) + w_0(k) = 0$
  $\Rightarrow\; x_2(k) = -\dfrac{w_1(k)}{w_2(k)}\, x_1(k) - \dfrac{w_0(k)}{w_2(k)}$

 This straight line effectively separates the input space into two domains: $v(k) > 0$ and $v(k) < 0$.
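For concreteness, a tiny illustrative sketch that classifies points against the boundary defined by the weights (the weight values here are assumptions):

```python
import numpy as np

w0, w1, w2 = -1.5, 1.0, 1.0            # assumed weights; boundary x2 = -x1 + 1.5

def classify(x1, x2):
    v = w1 * x1 + w2 * x2 + w0
    return np.sign(v)                  # +1 or -1, one per side of the line

for p in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(p, classify(*p))             # only (1, 1) lies on the positive side
```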



ADALINE with Nonlinearly Transformed Inputs
 To solve classification problems for patterns that are not linearly separable, the inputs of the ADALINE can be preprocessed with fixed nonlinearities.
 The separation function of the network in the figure is:

  $v(k) = w_1(k)\, x_1^2(k) + w_2(k)\, x_1(k) + w_3(k)\, x_1(k)\, x_2(k) + w_4(k)\, x_2(k) + w_5(k)\, x_2^2(k) + w_0(k)$

 The preprocessing can be extended with other nonlinearity functions.
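A sketch of this fixed quadratic feature expansion feeding an ordinary LMS update (illustrative; the helper names are not from the slides):

```python
import numpy as np

def expand(x1, x2):
    """Map (x1, x2) to the fixed nonlinear features of the figure, plus x_0 = 1.

    Feature order matches [w0, w1, w2, w3, w4, w5] in the separation function.
    """
    return np.array([1.0, x1 * x1, x1, x1 * x2, x2, x2 * x2])

def lms_step(w, phi, d, mu=0.1):
    e = d - np.dot(w, phi)       # the combiner stays linear in the features
    return w + mu * e * phi

w = np.zeros(6)
w = lms_step(w, expand(0.5, -1.0), d=1.0)
print(w)
```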



Linear Error Correction Rules

 µLMS learning rule:

  $\mathbf{w}(k+1) = \mathbf{w}(k) + \mu\, e(k)\, \mathbf{x}(k)$

  The change of the weights is proportional to the difference between the target and the output of the linear combiner.

 αLMS learning rule:

  $\mathbf{w}(k+1) = \mathbf{w}(k) + \alpha\, \dfrac{e(k)\, \mathbf{x}(k)}{\lVert \mathbf{x}(k) \rVert_2^2}; \quad 0.1 < \alpha < 1$

  A self-normalizing variant of the µLMS learning rule.
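The αLMS update differs from µLMS only in the normalization by the squared input norm; a minimal illustrative sketch:

```python
import numpy as np

def alpha_lms_step(w, x, d, alpha=0.5):
    """alpha-LMS: normalize the update by ||x||^2 (x must not be the zero vector)."""
    e = d - np.dot(w, x)
    return w + alpha * e * x / np.dot(x, x)

w = np.zeros(3)
x = np.array([1.0, 2.0, 2.0])               # ||x||^2 = 9
print(alpha_lms_step(w, x, d=1.0))          # e = 1, so w becomes 0.5 * x / 9
```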



Multiple ADALINE (MADALINE)
 MADALINE overcomes the limitation of a single ADALINE in separation problems with nonlinear boundaries.
 MADALINE does not use any input transformation functions.
 MADALINE I is a single-layer network; MADALINE II is a two-layer network.
 Learning: the LMS algorithm.

[Figure: MADALINE I and MADALINE II structures]



Example of MADALINE (XNOR Logic Function)

[Figure: structure of the MADALINE and its separation properties for the XNOR function]
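The slide gives only the figure; as a hedged illustration, hand-picked (not trained) weights for two ADALINE units plus a bipolar OR unit can realize XNOR. The weight values below are assumptions, not taken from the slides:

```python
def sgn(v):
    return 1.0 if v >= 0 else -1.0

def madaline_xnor(x1, x2):
    """Illustrative MADALINE for XNOR; each unit is a linear combiner + signum."""
    a1 = sgn(1.0 * x1 + 1.0 * x2 - 1.5)   # fires (+1) only for (1, 1)
    a2 = sgn(-1.0 * x1 - 1.0 * x2 + 0.5)  # fires (+1) only for (0, 0)
    return sgn(a1 + a2 + 0.5)             # bipolar OR of a1 and a2

for p in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(p, madaline_xnor(*p))           # +1 for (0,0) and (1,1), else -1
```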

