
Activation Functions

Week-4
Basic Model of Artificial Neurons
Nonlinear model of an ANN
• $u_q$ is a linear combiner of the inputs ($x_j$) and synaptic weights ($w_{qj}$):

  $u_q = \sum_{j=1}^{n} w_{qj} x_j = \mathbf{w}_q^T \mathbf{x} = \mathbf{x}^T \mathbf{w}_q = w_{q1} x_1 + \dots + w_{qn} x_n$

  where $\mathbf{x} = [x_1, x_2, \dots, x_n]^T$ and $\mathbf{w}_q = [w_{q1}, w_{q2}, \dots, w_{qn}]^T$.

• Output of the activation function $f(\cdot)$:

  $y_q = f(u_q, \theta_q)$

• McCulloch-Pitts model:

  $y_q = \begin{cases} 1 & \text{for } u_q \ge \theta_q \\ 0 & \text{for } u_q < \theta_q \end{cases}$
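As a concrete illustration, the sketch below computes $u_q$ and applies the McCulloch-Pitts threshold (the function names and the AND example are illustrative, not from the slides):

```python
import numpy as np

def mcculloch_pitts(x, w, theta):
    """Basic artificial neuron: linear combiner plus hard threshold.

    u_q = w^T x ; y_q = 1 if u_q >= theta, else 0.
    """
    u = np.dot(w, x)                      # linear combiner u_q
    return 1 if u >= theta else 0

# Example: a 2-input neuron realizing logical AND with theta = 1.5
w = np.array([1.0, 1.0])
for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, mcculloch_pitts(np.array(x), w, theta=1.5))
```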


Basic Model of Artificial Neurons (2)
Alternate model
 $u_q$ is the linear combiner of the inputs ($x_j$) and synaptic weights ($w_{qj}$):

  $u_q = \sum_{j=1}^{n} w_{qj} x_j = \mathbf{w}_q^T \mathbf{x} = \mathbf{x}^T \mathbf{w}_q$

 Activation potential:

  $v_q = u_q - \theta_q$, or equivalently $v_q = \sum_{j=0}^{n} w_{qj} x_j$ with the threshold folded in as a bias term

 Output of the activation function $f(\cdot)$:

  $y_q = f(v_q) = f\!\left(\sum_{j=1}^{n} w_{qj} x_j - \theta_q\right)$


Basic Model of Artificial Neurons (3)
The threshold can be absorbed into the weight vector as a bias weight $w_{q0}$ acting on a fixed input $x_0$:

  $v_q = \sum_{j=1}^{n} w_{qj} x_j - \theta_q \;\rightarrow\; x_0 = -1$

  $v_q = \sum_{j=1}^{n} w_{qj} x_j + \theta_q \;\rightarrow\; x_0 = +1$

In either case:

  $v_q = \sum_{j=1}^{n} w_{qj} x_j + w_{q0} x_0 = \sum_{j=0}^{n} w_{qj} x_j$

  $y_q = f(v_q) = f\!\left(\sum_{j=0}^{n} w_{qj} x_j\right)$
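A minimal sketch of this bias-as-weight trick (the names are illustrative, not from the slides):

```python
import numpy as np

def combiner_with_bias(x, w, theta):
    """v_q = sum_{j=0}^{n} w_qj x_j, with x_0 = -1 and w_q0 = theta."""
    x_aug = np.concatenate(([-1.0], x))   # prepend the fixed input x_0 = -1
    w_aug = np.concatenate(([theta], w))  # prepend the bias weight w_q0 = theta
    return np.dot(w_aug, x_aug)           # equals w^T x - theta

x = np.array([0.5, 1.0])
w = np.array([0.2, 0.4])
print(combiner_with_bias(x, w, theta=0.3))   # 0.2*0.5 + 0.4*1.0 - 0.3 ≈ 0.2
```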


Activation Functions
 The activation function can be a linear or a nonlinear function.
 The selection of the activation function depends on the particular problem to be solved.

A. Linear (identity) function

$y_q = f_{lin}(v_q) = v_q$



Activation Functions (2)
B1. Hard Limiter Function

$y_q = f_{hl}(v_q) = \begin{cases} 0 & \text{if } v_q < 0 \\ 1 & \text{if } v_q \ge 0 \end{cases}$

Example: the McCulloch-Pitts model.



Activation Functions (3)
B2. Symmetric Hard Limiter Function

$y_q = f_{shl}(v_q) = \begin{cases} -1 & \text{if } v_q < 0 \\ 0 & \text{if } v_q = 0 \\ 1 & \text{if } v_q > 0 \end{cases}$



Activation Functions (4)
C1. Piecewise linear function (saturating function)

 1
 0 if vq  −
2
 1 1 1
y q = f sl (vq ) = vq + if -  vq 
 2 2 2
 1
1 if vq 
 2



Activation Functions (5)
C2. Symmetric Piecewise linear function

 − 1 if vq  −1

y q = f ssl (vq ) = vq if - 1  vq  1
 1 if v  1
 q

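The four threshold and piecewise-linear functions above (B1, B2, C1, C2) are easy to express with NumPy; a small illustrative sketch:

```python
import numpy as np

def f_hl(v):   # B1. hard limiter: outputs in {0, 1}
    return np.where(v >= 0, 1.0, 0.0)

def f_shl(v):  # B2. symmetric hard limiter (signum): outputs in {-1, 0, 1}
    return np.sign(v)

def f_sl(v):   # C1. saturating (piecewise) linear, ramp between -1/2 and 1/2
    return np.clip(v + 0.5, 0.0, 1.0)

def f_ssl(v):  # C2. symmetric saturating linear, ramp between -1 and 1
    return np.clip(v, -1.0, 1.0)

v = np.array([-2.0, -0.25, 0.0, 0.25, 2.0])
for f in (f_hl, f_shl, f_sl, f_ssl):
    print(f.__name__, f(v))
```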


Activation Functions (6)
D1. Binary sigmoid function

  $y_q = f_{bs}(v_q) = \dfrac{1}{1 + e^{-v_q}}$



Activation Functions (7)
Derivative of the binary sigmoid function

  $y_q = f_{bs}(v_q) = \dfrac{1}{1 + e^{-v_q}}$

  $g_{bs}(v_q) = \dfrac{d f_{bs}(v_q)}{d v_q} = \dfrac{e^{-v_q}}{(1 + e^{-v_q})^2} = f_{bs}(v_q)\,[1 - f_{bs}(v_q)]$
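The identity $g_{bs} = f_{bs}(1 - f_{bs})$ can be verified numerically against a finite-difference derivative; an illustrative check:

```python
import numpy as np

def f_bs(v):
    return 1.0 / (1.0 + np.exp(-v))

v = np.linspace(-4, 4, 9)
h = 1e-6
numeric = (f_bs(v + h) - f_bs(v - h)) / (2 * h)   # central difference
analytic = f_bs(v) * (1.0 - f_bs(v))              # f * (1 - f)
print(np.max(np.abs(numeric - analytic)))         # tiny, on the order of 1e-10
```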



Activation Functions (8)
D2. Hyperbolic tangent sigmoid (bipolar sigmoid) function
  $y_q = f_{hts}(v_q) = \tanh(v_q) = \dfrac{e^{v_q} - e^{-v_q}}{e^{v_q} + e^{-v_q}} = \dfrac{1 - e^{-2 v_q}}{1 + e^{-2 v_q}}$

  $g_{hts}(v_q) = \dfrac{d f_{hts}(v_q)}{d v_q} = [1 + f_{hts}(v_q)]\,[1 - f_{hts}(v_q)]$
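The same kind of check works here, since $(1 + \tanh v)(1 - \tanh v) = 1 - \tanh^2 v$; an illustrative sketch:

```python
import numpy as np

v = np.linspace(-4, 4, 9)
f = np.tanh(v)
h = 1e-6
numeric = (np.tanh(v + h) - np.tanh(v - h)) / (2 * h)  # central difference
analytic = (1.0 + f) * (1.0 - f)                       # = 1 - tanh(v)**2
print(np.max(np.abs(numeric - analytic)))              # tiny, ~1e-10
```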



Feedforward NN
[Figure: two network diagrams - "Single-Layer Feedforward Networks" (input layer → output layer) and "Multilayer Feedforward Networks" (input layer → hidden layer → output layer)]

The term "hidden" refers to the fact that this part of the neural network is not seen directly from either the input or the output of the network.



ADALINE and MADALINE

Week-4
ADAptive LINear Element (ADALINE)
 ADALINE is a basic building block used in many neural networks.
 The network consists of a linear combiner cascaded with a signum function (bipolar or unipolar).
 ADALINE is an adaptive pattern-classification network that is trained by the LMS algorithm.
 The synaptic weights are updated during training by the LMS adaptive algorithm using the linear error e(k).
 The activation function is used after the network has been trained (e.g., on inputs beyond the training set).

[Figure: ADALINE structure. Inputs 1, x_1(k), ..., x_n(k) with weights w_0(k), w_1(k), ..., w_n(k) feed a linear combiner producing v(k); a signum function produces y(k); the adaptive algorithm adjusts the weights from the linear error e(k) = d(k) - v(k), where d(k) is the desired response]
Simple Adaptive Linear Combiner
 x(k): input vector; d(k): desired response; E: expectation operator; J: objective function.
 Output of the linear combiner:

  $v(k) = \mathbf{w}^T(k)\,\mathbf{x}(k)$

 Error:

  $e(k) = d(k) - \mathbf{w}^T(k)\,\mathbf{x}(k)$

 Objective function (MSE) to be minimized to find the optimum w:

  $J(\mathbf{w}) = \tfrac{1}{2} E\{e^2(k)\} = \tfrac{1}{2} E\{[d(k) - \mathbf{w}^T(k)\,\mathbf{x}(k)]^2\}$


Simple Adaptive Linear Combiner (2)
 LMS algorithm:

  $\mathbf{w}(k+1) = \mathbf{w}(k) + \mu[-\nabla J(\mathbf{w})]$

 where µ is a learning constant.
 Gradient (using the instantaneous error as an estimate of the MSE):

  $\nabla J(\mathbf{w}) = \dfrac{\partial J(\mathbf{w})}{\partial \mathbf{w}} \approx \dfrac{1}{2}\,\dfrac{\partial e^2(k)}{\partial \mathbf{w}}\bigg|_{\mathbf{w} = \mathbf{w}(k)}$
  $\qquad = \dfrac{1}{2}\,\dfrac{\partial}{\partial \mathbf{w}}\,[d(k) - \mathbf{w}^T(k)\,\mathbf{x}(k)]^2$
  $\qquad = \dfrac{1}{2}\,\dfrac{\partial}{\partial \mathbf{w}}\,[d^2(k) - 2 d(k)\,\mathbf{x}^T(k)\,\mathbf{w}(k) + \mathbf{w}^T(k)\,\mathbf{x}(k)\,\mathbf{x}^T(k)\,\mathbf{w}(k)]$
  $\qquad = -d(k)\,\mathbf{x}(k) + \mathbf{x}(k)\,\mathbf{x}^T(k)\,\mathbf{w}(k)$
  $\qquad = -[d(k) - \mathbf{w}^T(k)\,\mathbf{x}(k)]\,\mathbf{x}(k)$
  $\qquad = -e(k)\,\mathbf{x}(k)$

[Figure: successive steps w(k) → w(k+1) descending the error surface]
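The final line, $\nabla J = -e(k)\,\mathbf{x}(k)$, can be sanity-checked against finite differences of the instantaneous objective $J = \tfrac{1}{2} e^2(k)$; an illustrative sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=3)
x = rng.normal(size=3)
d = 0.7

def J(w):                        # instantaneous objective 0.5 * e(k)^2
    e = d - w @ x
    return 0.5 * e * e

e = d - w @ x
analytic = -e * x                # gradient from the derivation above

h = 1e-6                         # central finite differences, coordinate-wise
numeric = np.array([(J(w + h * np.eye(3)[i]) - J(w - h * np.eye(3)[i])) / (2 * h)
                    for i in range(3)])
print(np.max(np.abs(numeric - analytic)))   # tiny, ~1e-10
```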



Simple Adaptive Linear Combiner (3)
 LMS algorithm:

  $\mathbf{w}(k+1) = \mathbf{w}(k) + \mu[-\nabla J(\mathbf{w})] = \mathbf{w}(k) + \mu\, e(k)\, \mathbf{x}(k)$

 where µ is a learning constant (the learning-rate parameter).
 In scalar form, with $x_0 = 1$:

  $e(k) = d(k) - \sum_{i=0}^{n} w_i(k)\, x_i(k)$

  $w_i(k+1) = w_i(k) + \mu\, e(k)\, x_i(k)$

 Choosing µ:
  - If µ is too small, the learning algorithm modifies the weights slowly, and very many iterations are needed to descend the error surface.
  - If µ is too large, the learning rule becomes unstable, because the gradient of the objective function is only approximated by the instantaneous estimate derived above.
  - If µ is too large, the weights fail to converge (they diverge).
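A single scalar-form update step might look like this (illustrative sketch; the numbers are arbitrary):

```python
import numpy as np

def lms_step(w, x, d, mu):
    """One µLMS update: w_i(k+1) = w_i(k) + mu * e(k) * x_i(k)."""
    e = d - np.dot(w, x)          # linear error e(k) = d(k) - v(k)
    return w + mu * e * x, e

# x includes the bias component x_0 = 1
w = np.array([0.1, 0.1, 0.1])
x = np.array([1.0, 0.0, 1.0])
w, e = lms_step(w, x, d=-0.1, mu=0.1)
print(w, e)   # e = -0.1 - 0.2 = -0.3 ; w becomes [0.07, 0.1, 0.07]
```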



Simple Adaptive Linear Combiner (4)
 Convergence property:

  $0 < \mu < \dfrac{2}{\lambda_{\max}}$

  where $\lambda_{\max}$ is the largest eigenvalue of the input covariance matrix.

 Adjustment of µ over the iterations:

  $\mu(k) = \dfrac{\mu_0}{1 + k/\tau}; \quad \mu_0 > 0 \text{ and } \tau \ge 1$



Simple Adaptive Linear Combiner (5)
 Step 1: Set k = 1, initialize the synaptic weight vector w(k=1), and select values for µ0 and τ.
 Step 2: Compute the learning-rate parameter: $\mu(k) = \dfrac{\mu_0}{1 + k/\tau}$
 Step 3: Compute $v(k) = \sum_{i=0}^{n} w_i(k)\, x_i(k)$
 Step 4: Compute the error: $e(k) = d(k) - v(k)$
 Step 5: Update the synaptic weights: $w_i(k+1) = w_i(k) + \mu(k)\, e(k)\, x_i(k)$ for i = 0, 1, 2, ..., n.
 Step 6: If convergence is achieved, stop; else set k ← k+1 and go to Step 2. (A loop implementing these steps is sketched below.)
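Putting Steps 1-6 together, a minimal training loop might look like the following (an illustrative sketch; the stopping rule and hyperparameters are assumptions, with the data taken from the worked example later in the slides):

```python
import numpy as np

def lms_train(X, d, mu0=0.1, tau=1000.0, tol=1e-3, max_epochs=5000):
    """LMS training following Steps 1-6 (rows of X already include x_0 = 1)."""
    w = np.zeros(X.shape[1])                 # Step 1: initialize w, k = 1
    k = 1
    for _ in range(max_epochs):
        worst = 0.0
        for i in range(len(X)):              # cycle through the patterns
            mu = mu0 / (1.0 + k / tau)       # Step 2: learning rate mu(k)
            v = np.dot(w, X[i])              # Step 3: combiner output v(k)
            e = d[i] - v                     # Step 4: linear error e(k)
            w = w + mu * e * X[i]            # Step 5: weight update
            worst = max(worst, abs(e))
            k += 1                           # Step 6: next iteration
        if worst < tol:                      # assumed convergence test
            break
    return w

X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
d = np.array([-0.1, -0.1, -0.1, 0.1])
w = lms_train(X, d)
print(w, (X @ w >= 0).astype(int))   # w near [-0.15, 0.1, 0.1]; outputs 0 0 0 1
```

Note that these targets are not exactly realizable by a linear combiner, so the weights settle near the least-squares solution; the hard-limited outputs nevertheless match the desired classes.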



ADALINE Algorithm
[Figure: ADALINE structure. Inputs 1, x_1(k), ..., x_n(k) with weights w_0(k), ..., w_n(k); combiner output v(k); signum output y(k); adaptive algorithm driven by e(k) and d(k)]

 The activation function is used after the network has been trained.
 Compute y(k) with either the unipolar or the bipolar hard limiter:

  $y(k) = f_{hl}(v(k)) = \begin{cases} 0 & \text{if } v(k) < 0 \\ 1 & \text{if } v(k) \ge 0 \end{cases}$  or  $y(k) = f_{shl}(v(k)) = \begin{cases} -1 & \text{if } v(k) < 0 \\ 0 & \text{if } v(k) = 0 \\ 1 & \text{if } v(k) > 0 \end{cases}$



Example



ADALINE → Method 1
[Figure: ADALINE structure as above]

  $v(k) = \sum_{i=0}^{n} w_i(k)\, x_i(k)$

  $e(k) = d(k) - v(k)$

  $w_i(k+1) = w_i(k) + \mu\, e(k)\, x_i(k)$

  $y(k) = f_{hl}(v(k)) = \begin{cases} 0 & \text{if } v(k) < 0 \\ 1 & \text{if } v(k) \ge 0 \end{cases}$
epoch  k  x0(k)  x1(k)  x2(k)  d(k)  y(k)
  1    1    1      0      0    -0.1    0
  1    2    1      0      1    -0.1    0
  1    3    1      1      0    -0.1    0
  1    4    1      1      1     0.1    1



µ = 0.1      $w_i(k+1) = w_i(k) + \Delta w_i$      $\Delta w_i = \mu\, e(k)\, x_i(k) = 0.1\, e(k)\, x_i(k)$      $v(k) = \sum_{i=0}^{n} w_i(k)\, x_i(k)$

ep  k  x0  x1  x2   d     w0     w1     w2     v      e      ∆w0     ∆w1     ∆w2     y
1   1   1   0   0  -0.1   0.1    0.1    0.1   0.1   -0.2   -0.02    0       0       1
1   2   1   0   1  -0.1   0.08   0.1    0.1   0.18  -0.28  -0.028   0      -0.028   1
1   3   1   1   0  -0.1   ???    ???    ???   ???    ???    ???     ???     ???    ???
1   4   1   1   1   0.1
2   1   1   0   0  -0.1
2   2   1   0   1  -0.1
2   3   1   1   0  -0.1
2   4   1   1   1   0.1
3   1   1   0   0  -0.1
3   2   1   0   1  -0.1
3   3   1   1   0  -0.1
3   4   1   1   1   0.1
4   1   1   0   0  -0.1
4   2   1   0   1  -0.1
4   3   1   1   0  -0.1
4   4   1   1   1   0.1



µ = 0.1      $w_i(k+1) = w_i(k) + \Delta w_i$      $\Delta w_i = \mu\, e(k)\, x_i(k) = 0.1\, e(k)\, x_i(k)$      $v(k) = \sum_{i=0}^{n} w_i(k)\, x_i(k)$

ep  k  x0  x1  x2   d     w0     w1     w2     v      e      ∆w0     ∆w1     ∆w2     y
1   1   1   0   0  -0.1   0.1    0.1    0.1   0.1   -0.2   -0.02    0       0       1
1   2   1   0   1  -0.1   0.08   0.1    0.1   0.18  -0.28  -0.028   0      -0.028   1
1   3   1   1   0  -0.1   0.052  0.1    0.072 0.152 -0.252 -0.0252 -0.0252  0       1
1   4   1   1   1   0.1
2   1   1   0   0  -0.1
2   2   1   0   1  -0.1
2   3   1   1   0  -0.1
2   4   1   1   1   0.1
3   1   1   0   0  -0.1
3   2   1   0   1  -0.1
3   3   1   1   0  -0.1
3   4   1   1   1   0.1
4   1   1   0   0  -0.1
4   2   1   0   1  -0.1
4   3   1   1   0  -0.1
4   4   1   1   1   0.1
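The filled-in rows can be reproduced programmatically; a short illustrative sketch that prints w, v, e, ∆w, and y for each step of epoch 1:

```python
import numpy as np

X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
d = np.array([-0.1, -0.1, -0.1, 0.1])
w = np.array([0.1, 0.1, 0.1])    # initial weights, as in the table
mu = 0.1

for k in range(4):               # epoch 1
    v = np.dot(w, X[k])
    e = d[k] - v
    dw = mu * e * X[k]
    y = 1 if v >= 0 else 0       # hard limiter
    print(f"k={k+1}: w={w.round(4)}, v={v:.4f}, e={e:.4f}, dw={dw.round(4)}, y={y}")
    w = w + dw                   # weights used in the next row
```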



ADALINE → Method 2
[Figure: ADALINE structure with a binary sigmoid in place of the signum]

  $v(k) = \sum_{i=0}^{n} w_i(k)\, x_i(k)$

  $y(k) = f_{bs}(v(k)) = \dfrac{1}{1 + e^{-v(k)}}$

  $e(k) = d(k) - y(k)$

  $w_i(k+1) = w_i(k) + \mu\, e(k)\, x_i(k)$

epoch  k  x0(k)  x1(k)  x2(k)  d(k)
  1    1    1      0      0     0
  1    2    1      0      1     0
  1    3    1      1      0     0
  1    4    1      1      1     1



ADALINE → Method 3
[Figure: ADALINE structure as above]

k = iteration index; n = pattern index (the training patterns are cycled through):

k  n  x0(n)  x1(n)  x2(n)  d(n)
1  1    1      0      0     0
2  2    1      0      1     0
3  3    1      1      0     0
4  4    1      1      1     1
5  1    1      0      0     0
6  2    1      0      1     0
7  3    1      1      0     0
8  4    1      1      1     1
.  .    .      .      .     .

  $v(k) = \sum_{i=0}^{n} w_i(k)\, x_i(k)$      $\mu(k) = \dfrac{\mu_0}{1 + k/\tau}$

  $y(k) = f_{bs}(v(k)) = \dfrac{1}{1 + e^{-v(k)}}$

  $e(k) = d(k) - y(k)$      $MSE(k) = \tfrac{1}{2}\, e(k)^2$      $w_i(k+1) = w_i(k) + \mu(k)\, e(k)\, x_i(k)$
Linear Separability
 Consider a two-input ADALINE network as in the figure.
 Output of the linear combiner:

  $v(k) = w_1(k)\, x_1(k) + w_2(k)\, x_2(k) + w_0(k)$

 Output of the activation function:

  $y(k) = \operatorname{sgn}[v(k)]$

 Borderline (decision boundary) for the classification:

  $w_1(k)\, x_1(k) + w_2(k)\, x_2(k) + w_0(k) = 0$
  $\Rightarrow\; x_2(k) = -\dfrac{w_1(k)}{w_2(k)}\, x_1(k) - \dfrac{w_0(k)}{w_2(k)}$

 This straight line effectively separates the input space into two domains: $v(k) > 0$ and $v(k) < 0$.
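For concreteness, a tiny illustrative sketch that classifies points against the boundary defined by the weights (the weight values here are assumptions):

```python
import numpy as np

w0, w1, w2 = -1.5, 1.0, 1.0            # assumed weights; boundary x2 = -x1 + 1.5

def classify(x1, x2):
    v = w1 * x1 + w2 * x2 + w0
    return np.sign(v)                  # +1 or -1, one per side of the line

for p in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(p, classify(*p))             # only (1, 1) lies on the positive side
```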



ADALINE with Nonlinearly Transformed Inputs
 To solve classification problems for patterns that are not linearly separable, the inputs of the ADALINE can be preprocessed with fixed nonlinearities.
 The separation function of the network in the figure is:

  $v(k) = w_1(k)\, x_1^2(k) + w_2(k)\, x_1(k) + w_3(k)\, x_1(k)\, x_2(k) + w_4(k)\, x_2(k) + w_5(k)\, x_2^2(k) + w_0(k)$

 The preprocessing can be extended with other nonlinearity functions.
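A sketch of this fixed quadratic feature expansion feeding an ordinary LMS update (illustrative; the helper names are not from the slides):

```python
import numpy as np

def expand(x1, x2):
    """Map (x1, x2) to the fixed nonlinear features of the figure, plus x_0 = 1.

    Feature order matches [w0, w1, w2, w3, w4, w5] in the separation function.
    """
    return np.array([1.0, x1 * x1, x1, x1 * x2, x2, x2 * x2])

def lms_step(w, phi, d, mu=0.1):
    e = d - np.dot(w, phi)       # the combiner stays linear in the features
    return w + mu * e * phi

w = np.zeros(6)
w = lms_step(w, expand(0.5, -1.0), d=1.0)
print(w)
```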



Linear Error Correction Rules

 µLMS learning rule:

  $\mathbf{w}(k+1) = \mathbf{w}(k) + \mu\, e(k)\, \mathbf{x}(k)$

  The change of the weights is proportional to the difference between the target and the output of the linear combiner.

 αLMS learning rule:

  $\mathbf{w}(k+1) = \mathbf{w}(k) + \alpha\, \dfrac{e(k)\, \mathbf{x}(k)}{\lVert \mathbf{x}(k) \rVert_2^2}; \quad 0.1 < \alpha < 1$

  A self-normalizing variant of the µLMS learning rule.
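The αLMS update differs from µLMS only in the normalization by the squared input norm; a minimal illustrative sketch:

```python
import numpy as np

def alpha_lms_step(w, x, d, alpha=0.5):
    """alpha-LMS: normalize the update by ||x||^2 (x must not be the zero vector)."""
    e = d - np.dot(w, x)
    return w + alpha * e * x / np.dot(x, x)

w = np.zeros(3)
x = np.array([1.0, 2.0, 2.0])               # ||x||^2 = 9
print(alpha_lms_step(w, x, d=1.0))          # e = 1, so w becomes 0.5 * x / 9
```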



Multiple ADALINE (MADALINE)
 MADALINE overcomes the limitation of a single ADALINE in separation problems with nonlinear boundaries.
 MADALINE does not use any input transformation functions.
 MADALINE I is a single-layer network; MADALINE II is a two-layer network.
 Learning: the LMS algorithm.

[Figure: MADALINE I and MADALINE II structures]



Example of MADALINE (XNOR Logic Function)

[Figure: structure of the MADALINE and its separation properties for the XNOR function]
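The slide gives only the figure; as a hedged illustration, hand-picked (not trained) weights for two ADALINE units plus a bipolar OR unit can realize XNOR. The weight values below are assumptions, not taken from the slides:

```python
def sgn(v):
    return 1.0 if v >= 0 else -1.0

def madaline_xnor(x1, x2):
    """Illustrative MADALINE for XNOR; each unit is a linear combiner + signum."""
    a1 = sgn(1.0 * x1 + 1.0 * x2 - 1.5)   # fires (+1) only for (1, 1)
    a2 = sgn(-1.0 * x1 - 1.0 * x2 + 0.5)  # fires (+1) only for (0, 0)
    return sgn(a1 + a2 + 0.5)             # bipolar OR of a1 and a2

for p in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(p, madaline_xnor(*p))           # +1 for (0,0) and (1,1), else -1
```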

