Comparison of Activation Function on Extreme Learning Machine (ELM) Performance for Classifying the Active Compound

Dian Eka Ratnawati1*, Marjono2, Widodo3, Syaiful Anam4

1 Ph.D. Student in Mathematics, Faculty of Mathematics and Natural Sciences, Brawijaya University, Malang 65145, Indonesia
2,4 Department of Mathematics, Faculty of Mathematics and Natural Sciences, Brawijaya University, Malang 65145, Indonesia
3 Department of Biology, Faculty of Mathematics and Natural Sciences, Brawijaya University, Malang 65145, Indonesia

*Corresponding author: dian_ilkom@ub.ac.id

Abstract. Active compounds can interact with other molecules and lead to a variety of positive and negative effects on living systems. Classifying a compound is therefore very important for understanding its character and its functions in human medicine. Manual classification of compound functions by laboratory testing is time- and cost-consuming. Previous research has shown that the functions of a compound can be predicted from its molecular structure written in the Simplified Molecular Input Line Entry System (SMILES) format. We therefore employed the Extreme Learning Machine (ELM) to classify active compounds according to their SMILES structures. The results of this study suggest that ELM can classify active compounds very quickly with good performance. The accuracy and computational time of the classification model depend on the activation function. This experiment uses eleven activation functions: Binary Step Function, Sigmoid, Swish, Exponential Linear Squashing (ELiSH), Hyperbolic Tangent (TanH), Hard Hyperbolic Function (HardTanH), Rectified Linear Unit (ReLU), TanhRe, Exponential Linear Units (ELUs), SoftPlus, and Leaky ReLU (LReLU). The experimental results show that ELU and TanhRe have the best performance based on average and maximal accuracy. The accuracy of the system depends on the patterns in each class and the activation function used. Based on the experimental results, the average accuracy reaches 80.56% with the ELUs activation function and the maximum accuracy 88.73% with TanhRe.

Keywords: activation function, ELM, SMILES

INTRODUCTION

Exploring the function of active compounds in the laboratory is time- and cost-consuming; function elucidation can be accelerated by applying computational approaches [1,2]. Compounds with similar structures tend to have similar activity [3], so active compounds can be classified according to the similarity of their structures. A molecular structure is written in the Simplified Molecular Input Line Entry System (SMILES) format, which is unique [4,5] and suitable as input data for machine learning [4,6,7]. The Extreme Learning Machine (ELM) is currently a popular machine learning method that is extremely fast and tends to achieve good generalization performance [8,9,10]. We therefore employed ELM to classify active compounds according to their SMILES structures.

The accuracy and computational time of an ELM classification model depend on the activation functions (AFs) used [11]. Therefore, the selection of AFs in ELM is the main point of this research. Eleven activation functions, namely Binary Step Function, Sigmoid, hyperbolic tangent (TanH), Rectified Linear Unit (ReLU), TanhRe, ELiSH, Swish, LReLU, HardTanH, SoftPlus, and ELUs, were compared in this study.

Some previous studies also used SMILES to classify active compounds, applying Fuzzy KNN [12], Learning Vector Quantization (LVQ) [13], C4.5 [14], K-Means [15], momentum backpropagation [16], and backpropagation [17]. However, those studies only classified compounds into two classes. This study employs ELM to classify compounds into several classes. This choice is supported by previous research showing that ELM is superior to other methods such as C4.5, backpropagation, SVM, and RBF [8,9].

METHODS

Extreme Learning Machine Algorithm


ELM is one of the more recent neural network methods, with advantages in processing speed and good performance. Given a training set $\aleph = \{(x_i, t_i) \mid x_i \in \mathbb{R}^n,\, t_i \in \mathbb{R}^m,\, i = 1, \ldots, N\}$, where $x_i$ is the input, an activation function $g(x)$, and a number of hidden neurons $\tilde{N}$, the steps of the ELM algorithm are [9]:

1. Create the input weights $w_i$ and biases $b_i$, $i = 1, \ldots, \tilde{N}$, at random in the range $[-1, 1]$.
2. Compute the hidden layer output matrix $H$, as Eq. (1):

$$H = \begin{bmatrix} g(w_1 \cdot x_1 + b_1) & \cdots & g(w_{\tilde{N}} \cdot x_1 + b_{\tilde{N}}) \\ \vdots & \ddots & \vdots \\ g(w_1 \cdot x_N + b_1) & \cdots & g(w_{\tilde{N}} \cdot x_N + b_{\tilde{N}}) \end{bmatrix}_{N \times \tilde{N}} \tag{1}$$

3. Compute the output weight $\beta$, as Eq. (2):

$$\beta = H^{\dagger} T, \qquad \beta = \begin{bmatrix} \beta_1^T \\ \vdots \\ \beta_{\tilde{N}}^T \end{bmatrix}_{\tilde{N} \times m}, \qquad T = \begin{bmatrix} t_1^T \\ \vdots \\ t_N^T \end{bmatrix}_{N \times m} \tag{2}$$

with $H^{\dagger}$ given by Eq. (3):

$$H^{\dagger} = (H^T H)^{-1} H^T \text{ if } H^T H \text{ is non-singular}, \qquad H^{\dagger} = H^T (H H^T)^{-1} \text{ if } H H^T \text{ is non-singular}. \tag{3}$$

The target $t_{ij}$ is defined as Eq. (4) [18,19]:

$$t_{ij} = \begin{cases} 1, & \text{if } c_i = j \\ -1, & \text{if } c_i \neq j \end{cases}, \qquad j = 1, \ldots, m \tag{4}$$

where $\beta$ denotes the matrix of output weights, $H^{\dagger}$ denotes the Moore-Penrose pseudo-inverse of the matrix $H$, $T$ denotes the target matrix, $H$ denotes the hidden layer output matrix, $H^T$ is the transpose of $H$, $c_i$ is the class label of $x_i$, and $m$ is the number of classes.

4. Calculate the prediction result using Eq. (5) [18,19]:

$$\hat{y}_i = h(x_i)\,\beta, \qquad i = 1, \ldots, N \tag{5}$$

where $h(x_i)$ is the $i$-th row of $H$.

5. Estimate the class label using Eq. (6) [18]:

$$\hat{c}_i = \operatorname*{arg\,max}_{j \in \{1, 2, \ldots, m\}} \hat{y}_{ij}. \tag{6}$$

Activation Functions

Activation functions (AFs) are the functions applied in a neural network (here, ELM) to the weighted inputs plus biases; an AF generates the outputs of the network [20]. The AF is a main component in the training and optimization of a neural network, because it determines how the network learns the patterns in a dataset. The eleven activation functions compared in this study are listed below (a code sketch of all eleven follows the list).

1. Binary Step Function.
The binary step function applies a threshold and is suitable for binary classification. It is defined as Eq. (7):

$$f(x) = \begin{cases} 1, & \text{if } x \geq 0 \\ 0, & \text{if } x < 0 \end{cases} \tag{7}$$

2. Sigmoid Function.
The sigmoid function has been applied successfully in binary classification problems. It is defined as Eq. (8):

$$f(x) = \frac{1}{1 + e^{-x}} \tag{8}$$

3. Swish.
The Swish activation function is the product of the sigmoid activation function and the input $x$. It is defined as Eq. (9):

$$f(x) = \frac{x}{1 + e^{-x}} \tag{9}$$

4. Exponential Linear Squashing (ELiSH).
The ELiSH function is a combination of the ELU and sigmoid functions. It is defined as Eq. (10):

$$f(x) = \begin{cases} \dfrac{x}{1 + e^{-x}}, & \text{if } x \geq 0 \\[4pt] \dfrac{e^{x} - 1}{1 + e^{-x}}, & \text{if } x < 0 \end{cases} \tag{10}$$

5. Hyperbolic Tangent Function (TanH).
The TanH function has range $[-1, 1]$ and is calculated as Eq. (11):

$$f(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \tag{11}$$

6. Hard Hyperbolic Function (HardTanH).
HardTanH is a variant of the TanH activation function that is computationally more efficient than TanH [21]. It is defined as Eq. (12):

$$f(x) = \begin{cases} -1, & \text{if } x < -1 \\ x, & \text{if } -1 \leq x \leq 1 \\ 1, & \text{if } x > 1 \end{cases} \tag{12}$$

7. Rectified Linear Unit (ReLU) Function.
ReLU is piecewise linear, which makes it easy to optimize. It is defined as Eq. (13):

$$f(x) = \max(0, x) = \begin{cases} x, & \text{if } x \geq 0 \\ 0, & \text{if } x < 0 \end{cases} \tag{13}$$

8. TanhRe.
TanhRe is a combination of the ReLU and TanH functions. It is defined as Eq. (14) [11]:

$$f(x) = \begin{cases} x, & \text{if } x > 0 \\ \tanh(x), & \text{if } x \leq 0 \end{cases} \tag{14}$$

9. Exponential Linear Units (ELUs).
ELU is a development of ReLU: ReLU outputs zero for negative inputs, while ELU retains smoothed negative values. It is defined as Eq. (15) [22]:

$$f(x) = \begin{cases} x, & \text{if } x > 0 \\ \alpha (e^{x} - 1), & \text{if } x \leq 0 \end{cases} \tag{15}$$

where $\alpha$ is the ELU hyperparameter, usually set to 1.0.

10. Softplus Function.
Softplus is defined as Eq. (16):

$$f(x) = \ln(1 + e^{x}) \tag{16}$$

11. Leaky ReLU (LReLU).
LReLU is a variant of ReLU, developed by Maas et al., that keeps a small negative slope instead of zeroing out negative values. It is defined as Eq. (17) [23]:

$$f(x) = \begin{cases} x, & \text{if } x > 0 \\ \alpha x, & \text{if } x \leq 0 \end{cases} \tag{17}$$

where $\alpha$ is a constant, usually set to 0.01 [23].
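As a reference sketch, the eleven functions above can be written as vectorized NumPy functions; the names and the `a` hyperparameter defaults are ours, and any of them can be passed as the `activation` argument of the ELM sketch in the previous section:

```python
import numpy as np

# Eqs. (7)-(17), vectorized over NumPy arrays.
def binary_step(x): return np.where(x >= 0, 1.0, 0.0)                       # Eq. (7)
def sigmoid(x):     return 1.0 / (1.0 + np.exp(-x))                         # Eq. (8)
def swish(x):       return x * sigmoid(x)                                   # Eq. (9)
def elish(x):       return np.where(x >= 0, x, np.exp(x) - 1) * sigmoid(x)  # Eq. (10)
def tanh(x):        return np.tanh(x)                                       # Eq. (11)
def hard_tanh(x):   return np.clip(x, -1.0, 1.0)                            # Eq. (12)
def relu(x):        return np.maximum(0.0, x)                               # Eq. (13)
def tanh_re(x):     return np.where(x > 0, x, np.tanh(x))                   # Eq. (14)
def elu(x, a=1.0):  return np.where(x > 0, x, a * (np.exp(x) - 1.0))        # Eq. (15)
def softplus(x):    return np.log1p(np.exp(x))                              # Eq. (16)
def leaky_relu(x, a=0.01): return np.where(x > 0, x, a * x)                 # Eq. (17)
```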

Performance Evaluation Method

To evaluate the performance of the system, accuracy is used, as in Eq. (18):

$$\text{accuracy} = \frac{N_c}{N_t} \tag{18}$$

where $N_c$ is the number of correctly classified testing data and $N_t$ is the total number of testing data [24].
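Putting the sketches together, a hypothetical end-to-end run might look as follows; the random placeholder data stands in for the paper's 29-feature SMILES matrix and is not its actual dataset:

```python
# Hypothetical usage of the sketches above with placeholder data.
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(300, 29)), rng.integers(0, 3, size=300)
X_test,  y_test  = rng.normal(size=(100, 29)), rng.integers(0, 3, size=100)

W, b, beta = elm_train(X_train, y_train, n_hidden=100, activation=elu)
y_pred = elm_predict(X_test, W, b, beta, activation=elu)
print("accuracy =", np.mean(y_pred == y_test))  # Eq. (18): Nc / Nt
```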

Preprocessing

Before classification, every SMILES code must be extracted into 29 features. These features are B, C, N, O, P, S, F, Cl, Br, I, OH, =, #, @, (), [], +, -, charge, ionic (.), aromatic (:), NO, epoxy (COC), C=C, N+, C=O, [O-], total valence, and total cyclic.
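The paper does not detail the extraction procedure. The sketch below merely counts token occurrences for the substring-like features and is only an assumption of how such a vector could be built; the charge, total-valence, and total-cyclic features are omitted because their definitions are not given here:

```python
# Hypothetical SMILES feature extraction by naive substring counting.
# Caveat: naive counting overlaps (e.g., "Cl" also matches "C"); a real
# extractor would tokenize the SMILES string first.
FEATURE_TOKENS = ["B", "C", "N", "O", "P", "S", "F", "Cl", "Br", "I",
                  "OH", "=", "#", "@", "(", "[", "+", "-", ".", ":",
                  "NO", "COC", "C=C", "N+", "C=O", "[O-]"]

def smiles_features(smiles: str) -> list[int]:
    return [smiles.count(token) for token in FEATURE_TOKENS]

print(smiles_features("CC(=O)Oc1ccccc1C(=O)O"))  # aspirin, for illustration
```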

RESULTS AND DISCUSSION


This experiment uses a SMILES code dataset from the PubChem database, categorized into five classes: cancer, infection, nervous, hypertension, and inflammation. The five classes were grouped into three experiments: classes 1-2-3 (nervous-virus-bacteria), classes 1-6-7 (nervous-hypertension-inflammation), and classes 1-3-4 (nervous-bacteria-cancer). The various class combinations were chosen to observe the ability of ELM and the AFs to recognize different data patterns. The performance of the eleven AFs was examined based on average accuracy, maximal accuracy, standard deviation of accuracy, and processing time, over 20 trials with 100 hidden neurons. A sketch of this evaluation protocol is given below.
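A hedged sketch of the evaluation loop just described, reusing the earlier ELM and activation-function sketches (the dataset variables are the placeholders from the usage example above, not the PubChem data):

```python
import time
import numpy as np

# 20 trials per activation function, 100 hidden neurons, as in the protocol.
ACTIVATIONS = {"Sigmoid": sigmoid, "TanH": tanh, "ReLU": relu,
               "TanhRe": tanh_re, "ELU": elu, "SoftPlus": softplus}

for name, act in ACTIVATIONS.items():
    accs, t0 = [], time.perf_counter()
    for trial in range(20):
        W, b, beta = elm_train(X_train, y_train, n_hidden=100,
                               activation=act, rng=trial)
        accs.append(np.mean(elm_predict(X_test, W, b, beta, act) == y_test))
    avg_time = (time.perf_counter() - t0) / 20
    print(f"{name:9s} mean={np.mean(accs):.4f} max={np.max(accs):.4f} "
          f"std={np.std(accs):.4f} time={avg_time:.4f}s")
```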

a) Average Accuracy
Figures 1(a), 1(b), and 1(c) show the average accuracies on class combinations 1-2-3, 1-6-7, and 1-3-4, respectively. The average accuracy of each AF fluctuates depending on the class combination, which indicates that the ability of the AFs to classify active compounds still depends on the dataset used. The 5 best AFs by average accuracy are shown in Table 1.



FIGURE 1. (a) The average accuracy on class 1-2-3. (b) The average accuracy on class 1-6-7. (c) The average accuracy on class 1-3-4.

Table 1 shows the 5 best AFs in each group based on average accuracy. The TanhRe and ELU activation functions perform well on all class combinations, i.e., on various datasets. Both act as the identity function for positive inputs while keeping smooth, nonzero outputs for negative inputs [22]. These results correspond to Maimaitiyiming et al. [11], who found that TanhRe performed best compared with TanH and ReLU, and to reports that ELU outperforms ReLU and LReLU [22,25,26].

TABLE 1. The 5 best activation functions based on average accuracy

Ranking   Average Accuracy   Average Accuracy   Average Accuracy
          Class 1-2-3        Class 1-6-7        Class 1-3-4
1         TanhRe             TanhRe             ELU
2         ELU                SoftPlus           TanhRe
3         TanH               ELU                TanH
4         LReLU              ELiSH              ELiSH
5         ReLU               Swish              SoftPlus

b) Maximal Accuracy
Figures 2(a), 2(b), and 2(c) show the maximal prediction accuracies on class combinations 1-2-3, 1-6-7, and 1-3-4. The maximal accuracy of each AF fluctuates depending on the class combination, which again indicates that the ability of the AFs to classify active compounds still depends on the dataset used. The 5 best AFs by maximal accuracy are shown in Table 2.



FIGURE 2. (a) The maximal accuracy on class 1-2-3. (b) The maximal accuracy on class 1-6-7. (c) The maximal accuracy on class 1-3-4.

Table 2 shows the 5 best activation functions in each class combination based on maximal accuracy. TanhRe and ELU appear in all class combinations (1-2-3, 1-6-7, and 1-3-4), which means they can recognize various data patterns well. In addition, Softplus is the best in class 1-6-7; Softplus is an improvement on ReLU, with a softened shape and a nonzero gradient.

TABLE 2. The 5 best activation functions based on maximal accuracy

Ranking   Maximal Accuracy   Maximal Accuracy   Maximal Accuracy
          Class 1-2-3        Class 1-6-7        Class 1-3-4
1         TanhRe             SoftPlus           Sigmoid
2         LReLU              Swish              HardTanH
3         ReLU               ELU                ELU
4         SoftPlus           TanhRe             TanH
5         ELU                ELiSH              TanhRe

c) Standard Deviation of Accuracy

In this study, the average accuracy is more important than the standard deviation of accuracy; the standard deviation is used to see the tendency of the prediction accuracy. If the average accuracy is high and the standard deviation is small, the accuracy tends to be consistently high. Conversely, if the average accuracy is poor and the standard deviation is small, the accuracy tends to be consistently low. Figures 3(a), 3(b), and 3(c) show the standard deviations on class combinations 1-2-3, 1-6-7, and 1-3-4. In each figure the activation functions are shown from left to right in ascending order of standard deviation; a larger standard deviation means more variation in accuracy, which indicates that the activation function is less stable in its predictions, i.e., the difference in prediction accuracy between experiments is quite large.



FIGURE 3. (a) The standard deviation on class combination 1-2-3. (b) The standard deviation on class combination 1-6-7. (c) The standard deviation on class combination 1-3-4.

TABLE 3. The 5 best activation functions based on standard deviation

Ranking   Standard Deviation   Standard Deviation   Standard Deviation
          Class 1-2-3          Class 1-6-7          Class 1-3-4
1         HardTanH             HardTanH             TanhRe
2         ELU                  Binary SF            ELiSH
3         TanhRe               Sigmoid              ELU
4         TanH                 TanH                 SoftPlus
5         SoftPlus             ReLU                 ReLU

The TanhRe and ELUs activation functions have the best average accuracy and are also among the 5 best standard deviations of accuracy in classes 1-2-3 and 1-3-4. This means that TanhRe and ELU tend toward consistently high prediction accuracy.

d) Average Processing Time

Figures 4(a), 4(b), and 4(c) show the average processing times on classes 1-2-3, 1-6-7, and 1-3-4. In each figure the activation functions are shown from left to right in ascending order of processing time. From all figures it can be seen that the Swish activation function is faster than the ReLU activation function, which agrees with the research of Gagana et al. [27].



FIGURE 4. (a) The average processing time on class 1-2-3. (b) The average processing time on class 1-6-7. (c) The average processing time on class 1-3-4.

The experimental results in Table 4 indicate that SoftPlus, Swish, and LReLU appear in all class combinations (1-2-3, 1-6-7, and 1-3-4). The SoftPlus function is always at the top of the ranking, except in class 1-3-4, where it ties with LReLU. This agrees with the study of Zheng et al. [28], which found that Softplus converges faster than the ReLU and Sigmoid activation functions. Faster convergence means less computation time is needed, and indeed in this study Sigmoid and ReLU need longer processing times than the SoftPlus activation function.

TABLE 4. The 5 best activation functions based on processing time

Ranking   Processing Time   Processing Time   Processing Time
          Class 1-2-3       Class 1-6-7       Class 1-3-4
1         SoftPlus          SoftPlus          LReLU
2         Swish             Sigmoid           SoftPlus
3         Sigmoid           Swish             ELU
4         LReLU             ReLU              ELiSH
5         TanH              LReLU             Swish

Based on the average accuracy and maximum accuracy scenarios, ELU and TanhRe are the best activation functions for SMILES code classification, although TanhRe requires a longer processing time than ELU and several other activation functions. If the activation function is judged by processing time, then SoftPlus is the right choice, because it has the fastest processing time while its average and maximal accuracy are still quite good.

CONCLUSIONS

The accuracy of ELM depends on the patterns in each class and the activation function used. The activation functions giving the best ELM performance based on average and maximal accuracy are ELU and TanhRe. The experimental results show that the average accuracy reaches 80.56% with the ELUs function and the maximum accuracy 88.73% with the TanhRe function. Besides their high average and maximal accuracy, the ELUs and TanhRe activation functions also have small standard deviations, which means that ELM with the ELUs or TanhRe activation function can classify the function of an active compound well from its SMILES code.

REFERENCES

[1] E. Karakoc, A. Cherkasov, and S. C. Sahinalp, Bioinformatics 22, 243–251 (2006).
[2] Q. Li, Y. Wang, and S. H. Bryant, Bioinformatics 25, 3310–3316 (2009).
[3] Y. C. Martin, J. L. Kofron, and L. M. Traphagen, Journal of Medicinal Chemistry 45, 4350–4358 (2002).
[4] D. Weininger, J. Chem. Inf. Comput. Sci. 28, 31–36 (1988).
[5] C. Liao, B. Liu, L. Shi, J. Zhou, and X. P. Lu, Eur. J. Med. Chem. 40, 632–640 (2005).
[6] A. P. Toropova and A. A. Toropov, Toxicol. Lett. 268, 51–57 (2017).
[7] M. A. Islam and T. S. Pillay, Chemom. Intell. Lab. Syst. 153, 67–74 (2016).
[8] G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, "Extreme Learning Machine: A New Learning Scheme of Feedforward Neural Networks," in 2004 IEEE International Joint Conference on Neural Networks (IEEE Cat. No. 04CH37541), Vol. 4 (IEEE, 2004), pp. 985–990.
[9] G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, Neurocomputing 70, 489–501 (2006).
[10] R. Zhang, G.-B. Huang, N. Sundararajan, and P. Saratchandran, IEEE/ACM Trans. Comput. Biol. Bioinform. 4, 485–495 (2007).
[11] M. Maimaitiyiming, V. Sagan, and P. Sidike, Remote Sens. 11, 740 (2019).
[12] R. Rizky, W. Tigusti, D. E. Ratnawati, and S. Anam, J. Pengemb. Teknol. Inf. dan Ilmu Komput. 2, 6331–6338 (2018).
[13] S. Ramzini, D. E. Ratnawati, and S. Anam, J. Pengemb. Teknol. Inf. dan Ilmu Komput. 2, 6160–6168 (2018).
[14] A. R. M. Iskandar, D. E. Ratnawati, and S. Anam, J. Pengemb. Teknol. Inf. dan Ilmu Komput. 3, 761–769 (2019).
[15] S. Witanto, D. E. Ratnawati, and S. Anam, J. Pengemb. Teknol. Inf. dan Ilmu Komput. 3, 702–707 (2019).
[16] W. I. N. Ayu, W. Indriana, D. E. Ratnawati, and S. Anam, J. Pengemb. Teknol. Inf. dan Ilmu Komput. 3, 1946–1951 (2019).
[17] D. E. Ratnawati, Marjono, and S. Anam, "Prediction of active compounds from SMILES codes using backpropagation algorithm," in AIP Conference Proceedings 2021, 060009 (AIP Publishing, 2018), pp. 1–6.
[18] S. Saraswathi, S. Sundaram, and N. Sundararajan, IEEE/ACM Trans. Comput. Biol. Bioinform. 8, 452–463 (2011).
[19] R. Ahila, V. Sadasivam, and K. Manimala, Appl. Soft Comput. 32, 23–37 (2015).
[20] S. Jeyanthi and M. Subadra, "Implementation of Single Neuron Using Various Activation Functions with FPGA," in 2014 IEEE International Conference on Advanced Communications, Control and Computing Technologies (IEEE, 2014), pp. 1126–1131.
[21] R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, and P. Kuksa, J. Mach. Learn. Res. 12, 2493–2537 (2011).
[22] D. Clevert, T. Unterthiner, and S. Hochreiter, "Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs)," in ICLR (2016), pp. 1–14.
[23] A. L. Maas, A. Y. Hannun, and A. Y. Ng, "Rectifier Nonlinearities Improve Neural Network Acoustic Models," in Proceedings of the 30th International Conference on Machine Learning, Vol. 28 (JMLR: W&CP, 2013), pp. 1–14.
[24] Z. Wang and Y. Parth, "Extreme Learning Machine for Multi-class Sentiment Classification of Tweets," in Proceedings in Adaptation, Learning and Optimization, Vol. 6 (Springer, 2015), pp. 1–11.
[25] Y. Zhang, Q. Hua, D. Xu, H. Li, Y. Bu, and P. Zhao, "A Complex-Valued CNN for Different Activation Functions in Polarsar Image Classification," in IGARSS 2019 – 2019 IEEE International Geoscience and Remote Sensing Symposium (IEEE, 2019), pp. 10023–10026.
[26] M. M. Lau and K. H. Lim, "Review of Adaptive Activation Function in Deep Neural Network," in 2018 IEEE-EMBS Conference on Biomedical Engineering and Sciences (IECBES) (IEEE, 2018), pp. 686–690.
[27] B. Gagana, H. A. U. Athri, and S. Natarajan, "Activation Function Optimizations for Capsule Networks," in 2018 International Conference on Advances in Computing, Communications and Informatics (ICACCI) (IEEE, 2018), pp. 1172–1178.
[28] H. Zheng, Z. Yang, W. Liu, J. Liang, and Y. Li, "Improving Deep Neural Networks Using Softplus Units," in 2015 International Joint Conference on Neural Networks (IJCNN) (IEEE, 2015), pp. 1–4.
