
Journal Pre-proofs

Q-Learning Embedded Sine Cosine Algorithm (QLESCA)

Qusay Shihab Hamad, Hussein Samma, Shahrel Azmin Suandi, Junita Mohamad-Saleh

PII: S0957-4174(21)01704-8
DOI: https://doi.org/10.1016/j.eswa.2021.116417
Reference: ESWA 116417

To appear in: Expert Systems with Applications

Received Date: 25 February 2021
Revised Date: 10 November 2021
Accepted Date: 14 December 2021

Please cite this article as: Shihab Hamad, Q., Samma, H., Azmin Suandi, S., Mohamad-Saleh, J., Q-Learning
Embedded Sine Cosine Algorithm (QLESCA), Expert Systems with Applications (2021), doi: https://doi.org/
10.1016/j.eswa.2021.116417

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover
page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version
will undergo additional copyediting, typesetting and review before it is published in its final form, but we are
providing this version to give early visibility of the article. Please note that, during the production process, errors
may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

© 2021 Published by Elsevier Ltd.


Q-Learning Embedded Sine Cosine Algorithm (QLESCA)

Qusay Shihab Hamad1,3, Hussein Samma2, Shahrel Azmin Suandi1*, Junita Mohamad-Saleh1
1School of Electrical and Electronic Engineering, Engineering Campus, Universiti Sains Malaysia, 14300 Nibong
Tebal, Penang, Malaysia
2Soft Computing Research Group, School of Computing, Faculty of Engineering, Universiti Teknologi Malaysia,
81310 UTM, Johor Bahru, Johor, Malaysia
3University of Information Technology and Communications (UOITC), Baghdad, Iraq

qusay@student.usm.my, hussein.samma@utm.my, *shahrel@usm.my, jms@usm.my


Abstract

The Sine Cosine Algorithm (SCA) is recognized as a lightweight, efficient optimizer with clear mathematical principles. However, SCA still suffers from a set of problems such as stagnation at local optima, a slow convergence curve, and a lack of efficient balancing between exploration and exploitation search modes. To mitigate these limitations and improve SCA performance, this study introduces a new version of SCA called QLESCA that smartly controls SCA parameters through an embedded Q-learning algorithm at run time. Each QLESCA agent evolves independently and has its own Q-table. The Q-table contains nine different states computed based on population density and distance from the micro-population leader. As such, nine different actions are generated by the Q-table to control the QLESCA parameters r1 and r3. These parameters are responsible for adaptive switching between exploration and exploitation. For each Q-table action, a reward value is given to a well-performing agent
and a penalty to a non-performing agent. To verify the proposed algorithm's performance, QLESCA was evaluated
with 23 continuous benchmarks, 20 large scale benchmark optimization functions, and three engineering design
problems. In addition, QLESCA was compared with various SCA variants and other state-of-the-art swarm-based optimization methods. The numerical results demonstrate that QLESCA was superior in terms of achieved fitness value. Statistical results confirm that QLESCA significantly outperforms the other optimization algorithms. Additionally, the convergence curves show that the proposed QLESCA achieves faster convergence than the other evaluated algorithms.

Keywords: Sine Cosine Optimizer, Swarm intelligence, Q-learning Algorithm, Optimization Algorithms,
Metaheuristic Algorithm, Large-scale Problems.

1. Introduction

SCA is a population-based stochastic optimization algorithm (Mirjalili, 2016). The basic intuition of SCA was inspired
by the mathematical characteristics of sine and cosine trigonometric functions. Owing to its simplicity and less tedious parameter tuning compared with other multi-agent-based optimization algorithms, SCA has shown competitive performance among other meta-heuristic algorithms (Gupta, Deep, Mirjalili, et al., 2020). Indeed, SCA has been
successfully applied to different real-world problems, such as power systems optimization (Ghosh & Mukherjee,
2017) (Mahdad & Srairi, 2018), clustering (Abd Elfattah et al., 2017), breast cancer classification (Majhi, 2018),
pairwise global sequence alignment (Issa et al., 2018), scheduling (Das et al., 2018), grid design (Algabalawy et al.,
2018), finding the optimal design of a power system that reduces annual cost and system emissions (Algabalawy et
al., 2018), image segmentation (Hernandez del Rio et al., 2020), optimize the coordination of directional overcurrent
relays (Sarwagya et al., 2020), and feature selection (Abualigah & Dulaimi, 2021).
Although SCA has the ability to explore the search space, it lacks efficient exploration/exploitation balancing (Huiling Chen et al., 2020). Besides, it suffers from a slow convergence rate, especially when working with high-dimensional optimization problems (Gupta & Deep, 2019b) (Gupta, Deep, Mirjalili, et al., 2020). To overcome
these weaknesses, researchers focused on modifying SCA parameters (Abd Elaziz et al., 2017) (Suid et al., 2018)(Long
et al., 2019), enhancing SCA equations (Hao Chen et al., 2020) (Belazzoug et al., 2020), or integrating SCA with other
optimizers such as differential evolution (DE) (Nenavath & Jatoth, 2018), fruit fly (Fan et al., 2020), particle swarm

(Issa et al., 2018), and whale optimization algorithm (WOA) (Khalilpourazari & Khalilpourazary, 2018). An improved
version of SCA named Opposition-Based Sine Cosine Algorithm (OBSCA) was proposed by (Abd Elaziz et al.,
2017). OBSCA considers opposition-based learning as a mechanism to enhance search exploration ability. Their
results indicated that a better outcome was achieved as compared with the standard SCA. A multi-strategy SCA
algorithm termed MSCA was proposed by (Huiling Chen et al., 2020). MSCA combines multiple control mechanisms,
including the Cauchy mutation operator, chaotic local search mechanism, opposition-based learning strategy, and
other differential evolution operators. These four strategies were sequentially executed to generate a new search
solution. However, MSCA requires fitness evaluation computation after each strategy which increases evaluation cost.
The hybridization of SCA with differential evolution (DE) was proposed by (Nenavath & Jatoth, 2018). Their hybrid scheme was applied to solve the object tracking problem. Nevertheless, integrating SCA with DE increases algorithm complexity and the fitness evaluations required for each population, i.e., SCA and DE. (Belazzoug et al., 2020) suggested an improved SCA that uses different equations to control SCA agents' movement. Unfortunately, this improved algorithm starts in search exploration mode and, over time, shifts to exploitation search. Therefore, if it gets trapped in local optima, it cannot switch back to exploration mode.
Even though the works presented in the previous paragraph introduce good improvements to SCA, the lack of automatic balancing of the exploration/exploitation search mode remains a critical problem. Moreover, these variants were designed to work with a large population (30 search agents), which needs more fitness evaluations than a micro population (Samma et al., 2016). To mitigate these limitations, this work presents QLESCA, which evolves with a micro
population and embedded Q-learning to control SCA parameters. QLESCA has the capability of automatic switching
from exploration to exploitation and vice-versa. Precisely, guided by Q-learning, each QLESCA agent adaptively
moves in the search space according to its performance, where the reward or penalty is given based on the agent's
achievement. It is worth mentioning that Q-learning was successfully integrated with several optimization algorithms
such as PSO (Samma et al., 2016), simulated annealing (SA) algorithm (Samma et al., 2020) and SCA (Zamli et al.,
2018). Zamli et al. (2018) developed a novel hybrid Q-learning with a sine cosine algorithm, which they named
QLSCA. It should be noted that their study is different from our work in many aspects, which can be summarized in
the following points:
1. In QLSCA, Q-learning has been used to dynamically determine the optimal search operator from four
different options (sine, cosine, Levy flight, crossover) during runtime. In contrast to standard SCA, this eliminates the random probabilistic switching between the sine and cosine equations.
Q-learning is used to switch between these four options, and it chooses the optimal option based on previous
rewards. On the other hand, in our proposed algorithm (QLESCA), the Q-Learning algorithm is used to guide
the SCA search agents toward a more efficient discovery of the entire search space and toward the global
solution by skipping the local optima. SCA was injected with the Q-learning algorithm in order to control
the values of two critical parameters (r1 and r3). These parameters are critical in directing the search agent's
movement within the search area, and their values are adjusted in accordance with the Q-table.
2. QLSCA algorithm has only one Q-table, and the table size is 4*4 (4 actions and 4 states), whereas, in
QLESCA, the number of Q-tables equals the number of search agents; in our suggested model, we employ
five agents, resulting in five Q-tables. The table size in QLESCA is 9*9 (9 actions and 9 states).
3. In QLSCA, they did not modify the switching behavior between exploration and exploitation phases. The
same switching strategy was used in SCA. The algorithm starts with exploration 50% of the time, then moves
to exploitation. On the other hand, the switching process in our proposed QLESCA depends on the
values in the Q-table. That means it can begin with exploration, then move on to exploitation, and vice-versa
until it reaches its limit.
4. Two supplementary operators, namely Levy flight motions and crossover, were included in QLSCA, while our
proposed algorithm does not implement any additional operators.
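The per-agent Q-table scheme described above (nine states by nine actions, one table per agent, with a reward or penalty after each action) can be sketched with the standard Q-learning update. The learning rate, discount factor, greedy action selection, and the reward value used below are illustrative assumptions, not the paper's exact settings:

```python
import numpy as np

N_STATES, N_ACTIONS = 9, 9      # 9 states x 9 actions, as in QLESCA
ALPHA, GAMMA = 0.1, 0.9         # learning rate / discount factor (assumed values)

def q_update(q_table, state, action, reward, next_state):
    """Standard Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = q_table[next_state].max()
    q_table[state, action] += ALPHA * (reward + GAMMA * best_next - q_table[state, action])

def select_action(q_table, state):
    """Pick the action (here, a hypothetical (r1, r3) setting) with the highest learned value."""
    return int(q_table[state].argmax())

# Five agents evolve independently, so there are five separate Q-tables.
q_tables = [np.zeros((N_STATES, N_ACTIONS)) for _ in range(5)]

# A well-performing agent in state 2 taking action 4 receives a reward.
q_update(q_tables[0], state=2, action=4, reward=1.0, next_state=3)
```

Because each agent owns its table, one agent's reward never perturbs another agent's learned policy, which matches the independent-evolution property claimed for QLESCA.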

Motivated by their successful integrations, this study incorporates Q-learning into SCA as a single search algorithm
named QLESCA. The main advantages of QLESCA are highlighted as follows.
1. It uses Q-learning to automate switching from exploration to exploitation and vice-versa.

2. It implements two indicators to monitor the agents' behavior in the search space, from which a Q-table is
built for each agent; these indicators are population density and distance from the leader.

3. It evolves each agent independently, and different actions are generated based on agent performance.

4. It runs with a micro population, which reduces fitness evaluations consumed by a large population, i.e., 30
agents.

The remaining parts of this paper are organized as follows. Related studies on SCA variants are described in Section
2. An overview of the Q-learning algorithm is presented in Section 3. The proposed QLESCA algorithm is presented
in Section 4. A comprehensive evaluation and analysis of QLESCA is given in Section 5. Conclusions and suggestions
for further research are given in Section 6.

2. SCA and its variants

SCA is a stochastic population-based optimization algorithm presented by (Mirjalili, 2016). This algorithm, like other
population-based metaheuristics, starts with a set of random solutions, and then each solution updates its position
based on the simple mathematical functions of sine and cosine as in Eq. (1),

$$X_i^{t+1} = \begin{cases} X_i^t + r_1 \cdot \sin(r_2) \cdot \left| r_3 P_i^t - X_i^t \right|, & r_4 < 0.5 \\ X_i^t + r_1 \cdot \cos(r_2) \cdot \left| r_3 P_i^t - X_i^t \right|, & r_4 \ge 0.5 \end{cases} \quad (1)$$
where X_i^t is the position of the current solution in the i-th dimension at the t-th iteration, |·| indicates the absolute value, P_i^t represents the position of the destination solution in the i-th dimension at the t-th iteration, r1, r2 and r3 are random numbers, and r4 is a random number in [0, 1] used to switch between the sine and cosine branches with equal probability. Here, r1 determines the area of the new position, which may be either in the space between Xi and Pi or outside them (see Fig. 1). Eq. (2) is used to update the value of r1 to achieve a balance between exploration and exploitation.
$$r_1 = a - t\frac{a}{T}, \quad (2)$$

[Figure: the next position region lies between the solution (X) and the destination (P) when r1 < 1, and outside them when r1 > 1.]

Fig. 1. The effect of r1 on the movement direction inward or outward from the destination.

where a is a constant, t represents the current iteration, and T is the total number of iterations. The behavior of r1 when a = 2 and T = 100 is visualized in Fig. 2.

[Figure: r1 decreasing linearly from 2 to 0 over 100 iterations.]

Fig. 2. The parameter r1 is linearly decreased from 2 to 0.

The parameter r2 is a random variable in the range [0, 2π] that is used to determine the direction of the next solution's
inward or outward movement relative to Pi. Finally, the parameter r3 specifies the destination agent's contribution
level, i.e., Pi. Similar to r1, when r3 > 1, Pi plays a significant role in the position updating process.

The pseudo-code of this algorithm is presented as follows:


Algorithm 1 The SCA algorithm
1. Initialize a population of random search agents (solutions)
2. while (t < maximum number of iterations) do
3.   for each solution do
4.     Evaluate the solution using the objective function
5.     if the objective value of the solution is better than the destination (P) then
6.       Update P
7.     Update r1, r2, r3, r4
8.     Update the solution using Eq. (1)
9. Return the best solution obtained so far as the global optimum
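Algorithm 1 together with Eqs. (1) and (2) can be sketched in a few lines of NumPy. The function name, default settings, and the per-dimension sampling of r2, r3, and r4 are our assumptions for illustration, not the paper's reference implementation:

```python
import numpy as np

def sca(objective, dim, bounds, n_agents=30, max_iter=1000, a=2.0):
    """Minimal Sine Cosine Algorithm sketch following Eq. (1) and Eq. (2)."""
    lo, hi = bounds
    X = np.random.uniform(lo, hi, (n_agents, dim))    # random initial solutions
    fitness = np.array([objective(x) for x in X])
    best = X[fitness.argmin()].copy()                 # destination solution P
    best_fit = float(fitness.min())
    for t in range(max_iter):
        r1 = a - t * (a / max_iter)                   # Eq. (2): linear decay from a to 0
        for i in range(n_agents):
            r2 = np.random.uniform(0, 2 * np.pi, dim)  # movement direction
            r3 = np.random.uniform(0, 2, dim)          # contribution of P
            r4 = np.random.rand(dim)                   # sine/cosine switch
            step = np.abs(r3 * best - X[i])
            # Eq. (1): sine branch when r4 < 0.5, cosine branch otherwise
            X[i] = np.where(r4 < 0.5,
                            X[i] + r1 * np.sin(r2) * step,
                            X[i] + r1 * np.cos(r2) * step)
            X[i] = np.clip(X[i], lo, hi)
            f = float(objective(X[i]))
            if f < best_fit:                           # update destination P
                best_fit, best = f, X[i].copy()
    return best, best_fit
```

For example, calling `sca(lambda x: np.sum(x**2), dim=5, bounds=(-5.0, 5.0))` drives the best fitness of the 5-D sphere function toward zero, since the shrinking r1 gradually narrows the search around P.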
The standard SCA still has a high probability of falling into the local optimal stagnation or failing to find the global
optimum solution. Additionally, it suffers from premature convergence (Long et al., 2019) (Hao Chen et al., 2020)
(Zhou et al., 2021). Recently, many researchers have presented several schemes to improve the classical SCA. In general, these studies can be divided into two categories: modified-based and hybrid-based schemes. Other researchers investigated the applications of SCA, as explained in Section 2.3.

2.1 Modified-based SCA Algorithms

This is the first approach proposed by researchers to improve standard SCA. It modifies the internal behavior of the SCA algorithm and is further investigated in this section. In this context, Gupta et al. (2020a) proposed a modification to SCA with the main aim of improving the exploitation ability of classical SCA. They suggested several modifications, including using a non-linear transition rule instead of a linear one to improve the shift from exploration to exploitation, and modified search equations that provide leading guidance based on the elite candidate solution. Additionally, a mutation operator combining Gaussian mutation and a chaotic map was employed to generate new locations, preventing the search from being stuck in locally optimal solutions. The proposed version was tested with 33 benchmark optimization functions, and results showed great performance compared with the standard SCA. In the related literature, a new position-updating equation with inertia weight to accelerate the convergence of SCA was given by Long et al. (2019). They introduced a new Gaussian-based non-linear conversion parameter

reduction technique to balance exploration and exploitation, and the algorithm was applied to solve high-dimensional problems. Specifically, the proposed algorithm was evaluated using 24 benchmarks with different dimensionalities, including 30, 100, 500, 1000, and 5000-D. The comparisons showed that the proposed algorithm
demonstrated the ability to escape from local optima, and it has a faster convergence as compared to SCA. An adaptive
sine cosine algorithm (ASCA) was presented by Feng et al. (2020) that incorporates several strategies, including elite
mutation to increase the population diversity, simplex dynamic search to enhance the solution quality, and
neighbourhood search strategy to improve the convergence rate. To assess ASCA, it was applied for reservoir
operation problem scheduling. Results showed good performances with better scheduling results at different runoff
cases.
An improved version of the standard SCA based on memory guidance, termed MG-SCA, was introduced by Gupta et al. (2020b). Their method focused on balancing exploration and exploitation, which prevents becoming
stuck in local optima during the search process. The authors suggested guidance by preserving a personal best history
for each individual solution, which would then be used to update the solutions and provide good exploration. These
search guides are decreased by increasing the number of iterations for the transition from the diversification phase to
the intensification phase. It was evaluated with 30 benchmark optimization functions, and results showed that the decreasing number of guides has a meaningful impact on enhancing the exploration ability. In addition, it was able to
maintain the balance between exploration and exploitation during the search process. Rizk-Allah (2019) presented an
improved version of SCA based on orthogonal parallel information (SCA-OPI). In SCA-OPI, the orthogonal
component of information allows the algorithm to retain variety and improve exploration search, whilst the
parallelized scheme allows the algorithm to find interesting solutions and emphasizes exploitation search. In addition,
to retain the exploration ability, an experience-based opposing direction method is given. Different benchmark and
engineering design problems were used to assess and examine the SCA-OPI. The findings showed that the proposed
algorithm delivered very competitive outcomes. Gupta & Deep (2019a) hybridized a self-adaptive sine cosine
algorithm with opposition-based learning. The main idea was to enhance the search capabilities of search agents in
SCA to overcome the problem of stagnation in local optima. To jump out of the local optima, the opposing population
is produced using opposite numbers based on perturbation rate. Then, a self-adaptive component is introduced to the
SCA search equations to leverage all of the pre-visited promising search areas. The proposed method has been tested
on 23 classical benchmark problems and results showed a great performance as compared with the SCA. Qu et al.
(2018) improved the sine cosine algorithm based on greedy Levy mutation. The main aim was to increase the
population diversity and reduce the search oscillation. Specifically, their algorithm has three optimization strategies: first, it uses an exponentially decreasing conversion parameter and a linearly decreasing inertia weight to balance the system's exploration and local development abilities. Secondly, it replaces the optimal individuals in the primary
algorithm with random individuals near the optimal individuals, allowing it to jump out of the local optimum. Finally,
for the best individuals, the greedy Levy mutation technique is employed to improve local development ability.
Experimental analysis indicated higher optimization accuracy against SCA.
An improved SCA combined with an optimal neighborhood and quadratic interpolation strategy, which is termed
QISCA, was proposed in Guo et al. (2020). The main objective of QISCA is to overcome the shortcoming of updating
the population guided by the global optimal individual in the SCA. To achieve this goal, a Stochastic Optimal
Neighborhood was used for neighborhood updates, and a Quadratic Interpolation curve was used for individual
updates. Moreover, QISCA incorporates Quasi-Opposition Learning strategies to enhance the population's global
exploration capabilities and improve the convergence speed and accuracy. QISCA was tested with 23 benchmark
functions and 30 CEC2017 test functions, and results showed superior performances. Gupta & Deep (2019b) proposed
another variant of SCA that was named ISCA. The following methods were used to improve SCA: Firstly, the search
equations are adjusted by integrating the personal best state instead of the global best state to determine the region of
search space surrounding a solution's personal best state. Then a crossover is performed to prevent the skipping of true
solutions during the search. Finally, greedy selection has been applied to reduce the overflow of diversity. Standard
benchmark functions CEC 2014 and CEC 2017 have been used to test ISCA. The results showed that the proposed
variant gives a higher optimization accuracy compared to SCA. A new variant of SCA was proposed in the work of
Hao Chen et al. (2020). The main goal is to improve the global exploration and local exploitation powers of SCA. The
authors used three strategies: orthogonal learning, multi-swarm, and greedy selection mechanisms so that the proposed
algorithm is called OMGSCA. To improve its neighborhood finding capabilities, the orthogonal learning technique
was added. The multi-swarm technique is used to improve the exploring capabilities. Finally, to improve the quality
of the search agents, a greedy selection method has been added. The obtained results demonstrated that these strategies
can significantly improve the exploratory and exploitative inclinations of the basic algorithm.

Chen et al. (2019) proposed a new optimizer for successfully approximating the unknown parameters of solar cells
and PV modules. ISCA is the name of the new algorithm. The Nelder-Mead simplex (NMs) method and the
opposition-based learning (OBL) method were used to improve it. The NMs is utilized to ensure population
intensification and increase exploitation capability. OBL, on the other hand, can increase population diversification,
ensuring a balance between exploitation and exploration tendencies. In terms of the correctness of concluding
solutions and convergence ratio, the ISCA outperformed most of the published approaches, according to
comprehensive reported findings. Rizk-Allah (2018) introduced the multi-orthogonal sine cosine algorithm
(MOSCA). SCA is used with a multi-orthogonal search strategy (MOSS) in this algorithm. The advantages of the
SCA and MOSS were combined into MOSCA. The suggested approach includes two stages: first, the SCA phase
initiates the search process in order to improve exploration capabilities. Second, the MOSS phase begins its search by
looking for SCA that has already been discovered to increase exploitation tendencies. In this case, the MOSS phase
can help the SCA phase search more thoroughly. The performance of the MOSCA algorithm was investigated by
applying it to eighteen benchmark problems. The experimental results indicate that MOSCA outperformed SCA in most cases.

2.2 Hybrid-based SCA Algorithms

The second approach to improve SCA algorithms is by integrating other optimization algorithms together with SCA.
This method is also known as the hybrid-based technique. A hybrid metaheuristic technique tries to integrate two or
more metaheuristic methods in general. This can fully exploit the original algorithms' beneficial characteristics
(Nenavath & Jatoth, 2019). Some of the well-known hybrid-based techniques are investigated in this section.
Fan et al. (2020) hybridized the SCA with the Fruit Fly optimization algorithm (FOA). The proposed algorithm is
termed SCA_FOA. The sine cosine method is integrated into FOA, allowing the fruit fly to reach the global optimum
in a novel way. Their hybrid approach's key idea is that each FOA individual adopts the SCA movement formula to fly outward or inward to locate the global optimum. A total of 28 benchmark functions with several engineering
problems were used. Reported results confirmed the effectiveness of the hybrid model as compared with individual
and other optimizers. Issa et al. (2018) integrated SCA with PSO. ASCA-PSO is the name of the suggested algorithm.
Their model contains two layers, where the top layer contains a set of particles and the PSO operators perform their
movement. The bottom layer separates the population into sets of groups, and each group contains N search agents;
the new positions are computed using the SCA. Hence, the bottom layer focuses on exploring the search space, while
the top layer focuses on exploiting the best solutions found by the bottom layer. The proposed algorithm has been
tested over 20 benchmark functions, which showed its superiority over the SCA. Another hybridization of SCA with
PSO called H-PSO-SCAC was proposed by Chen et al. (2018). They improved the following areas: To regulate the
local search and convergence to the global optimal solution, they first proposed sine cosine acceleration coefficients
(SCAC). Second, to initialize the population, opposition-based learning (OBL) was used. In addition, a sine map was
used to modify the inertia weight ω. Finally, they presented a new position-updating strategy. The
efficiency of the proposed algorithm was verified by the application of twelve numerical optimization problems.
Experimental results showed that, in most cases, the proposed approach is capable of efficiently solving numerical
optimization tasks. Chegini et al. (2018) integrated SCA with PSO and Levy flight (LF) into one algorithm. This
algorithm is called PSOSCALF. The PSO algorithm was used in conjunction with SCA and LF method in this study.
The mathematical framework for updating the solution in the SCA algorithm is based on the sine and cosine functions'
behavior. These functions ensure the ability to exploit and explore. LF is a random walk that generates search steps
using the Levy distribution, followed by more effective searches in the search space with big leaps. By combining the
SCA and LF in the PSOSCALF method, the original PSO algorithm's exploration capacity is improved, and the
possibility of getting stuck in the local minimum is avoided. The performance and accuracy of the PSOSCALF have
been examined by 23 benchmark functions and 8 engineering problems. The optimization results of the test functions
showed that the PSOSCALF was more successful than the PSO.
To improve standard SCA, Nenavath et al. (2018) presented a hybridization of it with PSO. The hybrid's idea was to
add personal best position (Pbest) and global best position (Gbest) components of PSO to traditional SCA to guide
the search process for potential candidate solutions, and PSO is then initiated with SCA's Pbest to further exploit the
search space. The proposed algorithm combines PSO's exploitation capability and SCA's exploration capability to
achieve optimal global solutions. The effectiveness of this algorithm was evaluated using 23 classical benchmark
functions. Results proved that the proposed algorithm was very competitive. The Hybridizing SCA with Whale

Optimization Algorithm (WOA) was proposed by Khalilpourazari & Khalilpourazary (2018). This algorithm is
named SCWOA. The hybrid algorithm's major goal was to provide a new hybrid solution technique that combines the advantages of SCA and WOA. The first improvement is to incorporate the SCA's updating operator instead of the WOA's shrinking encircling mechanism. In the first iterations, this aids the WOA in making an efficient trade-off between exploration and exploitation, where the exploration of the solution space is guaranteed by the circular movement of whales. The second enhancement is that, under WOA, a humpback whale may update
its position depending on the position of a randomly picked humpback whale when doing exploration. This operator
searches the solution space at random but produces a large number of low-quality solutions. Instead of the humpback
whales' behavior, SCWOA employed the SCA idea, which forces all solutions to update their positions in relation to
the best solution achieved thus far. This ensures that all whales in the SCWOA maintain a current position in relation
to the best humpback whale. The proposed algorithm was compared with SCA and WOA, and the result showed that
SCWOA was capable of efficiently solving numerical optimization tasks. To overcome the weaknesses of the traditional Grey Wolf Optimizer (GWO) and SCA, as well as to improve their searchability, N. Singh & Singh (2017) hybridized these two algorithms into one algorithm called HGWOSCA. The mobility of the GWO alpha agent is enhanced in this version by using SCA position update
equations. This approach aims to enhance global convergence, exploration, and exploitation performance by speeding
up the search process rather than letting the algorithm run for several generations with little progress. The proposed
algorithm was exercised on 22 benchmark tests and five bio-medical dataset problems. The experimental results proved
that the proposed hybrid could be useful in solving benchmark and real-life applications efficiently.
The hybridization of SCA with water wave optimization (WWO), named SCWWO, was proposed by Zhang et al. (2018). SCWWO includes two new features. First, because water wave waveforms and sine and cosine curves are extremely similar, and SCA has a strong global search capability, the WWO algorithm is combined with SCA in parallel with its wave propagation and breaking operations to improve WWO's exploitation and exploration capabilities. Second, the elite opposition-based learning
method is added to the wave refraction operation, which boosts population diversity and improves the WWO
algorithm's exploration capabilities. WWO's convergence speed and calculation accuracy are significantly improved
by the SCWWO method. The proposed algorithm was compared using 9 benchmark functions. The experimental
results demonstrated the feasibility and efficiency of this algorithm. Nenavath & Jatoth (2019) proposed SCA's
hybridization with a teaching-learning-based optimization algorithm (TLBO). The notion of TLBO is integrated into
the SCA in the proposed hybrid SCA-TLBO to increase its searchability. A typical SCA algorithm is used in SCA–
TLBO to search globally with the goal of moving the majority of solutions to a more favorable location. Following
the exploration stage, the TLBO algorithm is used to do a local search with a short step to find the optimal answer.
Based on the mainframe of SCA-TLBO, the SCA emphasizes diversification at the beginning of the search with a big
step to explore the search space extensively and evade trapping into local optima, while later the TLBO algorithm
focuses on intensification and lets individuals move toward the best solution at the later stage of the optimization
process. The effectiveness of SCA-TLBO was evaluated using 23 benchmark functions. Furthermore, results proved
that the proposed algorithm was very competitive.
Singh et al. (2020) suggested a novel hybrid technique dubbed the hybrid salp swarm algorithm and sine cosine algorithm (HSSASCA). SSA is used to explore the vector of solutions in this study, whereas SCA is used as a local search method to increase the solution quality. The sine and cosine functions have been incorporated into the
position update equation in the SSA to improve the algorithm's exploration and exploitation tendencies. This
integration gives the SSA greater freedom in investigating the population and guarantees that the variety of the
population, as well as the right value, is rapidly reached. The SCA algorithm's inherent characteristic is to produce a
compound mutation in the best solutions and avoid becoming trapped in local optima. The goal of this technique is to
enhance global convergence by speeding up the search process rather than letting the algorithm run for numerous
rounds with little progress. The algorithm was validated on 22 mathematical optimization functions and 3 engineering
problems. The experimental results revealed that the hybrid algorithm achieved the highest accuracies in comparison
with the other algorithms. A hybrid optimization technique for numerical optimization and feature selection is
presented by Hussain et al. (2021), which incorporates the SCA into Harris Hawks optimization (HHO). The aim of
SCA integration is to address inefficient exploration in HHO. Furthermore, the delta factor is utilized to improve
exploitation by constantly changing candidate solutions to minimize solution stagnation in HHO. SCHHO is the
approach that has been proposed. By solving 29 CEC2017 functions as well as feature selection, the suggested
technique was able to prove its efficacy and robustness. SCHHO produced better results when compared to original
algorithms, well-known population-based optimization methods, and other hybrid equivalents published in recent

7
research. Kamboj et al. (2020) suggested a hybrid form of the HHO and the SCA, dubbed the Hybrid Harris Hawks-
Sine Cosine Algorithm (hHHO-SCA), in order to speed up the global search phase of the existing Harris Hawks
optimizer. The goal of the proposed study is to use SCA to investigate the exploration phase of HHO and to enhance
the exploitation phase. In order to validate the results of the proposed algorithm, it was applied to 65 standard
benchmark problems including CEC2017, CEC2018, and eleven engineering design problems. From observation, the
outcomes of the proposed algorithm were much better than standard SCA and HHO.
The hybrid-based approach is more complex and more time-consuming than the modified-based approach because it must
take advantage of both of the algorithms involved. On the other hand, the modified-based approach improves the
optimization algorithm by editing the algorithm itself in significant ways, such as adding new parameters, deleting
existing parameters, or editing the search equations of the original algorithm. Therefore, the Q-Learning Embedded
Sine Cosine Algorithm (QLESCA) proposed in this paper follows the modified-based approach.

3 Overview of Q-learning Algorithm

Q-learning is one of the reinforcement learning techniques in artificial intelligence (Watkins & Dayan, 1992)(Sutton
et al., 1999). Basically, Q-learning has five parts: the learning agent, the environment, states, actions, and rewards.
To formalize Q-learning, let S = [S1, S2, S3, …, Sn] be the set of states of the learning agent and A = [A1, A2, A3, …,
An] be the set of actions that the learning agent can execute. Rt is the reward or punishment acquired from executing
action At in state St at time t. α is the learning rate, typically set between 0 and 1: if α is near 0, previously learned
knowledge dominates, while if it is close to 1, newly acquired information becomes more relevant immediately. In
other words, setting α to 0 prevents the Q-table from being updated, and therefore prevents any learning, whereas
setting α to a high value, such as 0.9, enables rapid learning. γ denotes the discount factor, which also lies between 0
and 1 and indicates how much the agent's decision-making is influenced by future reward expectations. When γ is
close to 0, only the current reward is considered; as γ approaches 1, the future reward is given more weight relative
to the immediate reward (Huynh et al., 2021). Q(St, At) denotes the cumulative reward that the learning agent has
gained at time t, and it is updated by Eq. (3).

Q_{t+1}(S_t, A_t) = Q(S_t, A_t) + α [R_t + γ max_{A_{t+1}} Q(S_{t+1}, A_{t+1}) − Q(S_t, A_t)].    (3)

For illustration purposes, a numerical example is given in Fig. 3. Assume the current agent state value Q(St, At) is
equal to 5, as shown in Fig. 3(a), so the next action is one of four: move up, move down, move right, or move left. As
given in Fig. 3, each action is associated with a different accumulated reward: 25 for moving left, 50 for moving up,
75 for moving right, and 100 for moving down. As such, the best action, according to the previously accumulated
rewards, is moving down. Therefore, this action is executed (i.e., move down), and a new reward is calculated; assume
the reward Rt is 1. The literature suggests setting the discount factor γ to 0.1 and the learning rate α to 0.9 (Samma et
al., 2016). The Q-table is then updated using Eq. (3) as follows:
Q_{t+1}(S_t, A_t) = 5 + 0.9 × [1 + 0.1 × max(25, 50, 75, 100) − 5] = 10.4
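As a sanity check on Eq. (3), the following short Python snippet (illustrative only, not part of the original paper) re-implements the update and reproduces the 10.4 above.

```python
def q_update(q_current, reward, next_q_values, alpha=0.9, gamma=0.1):
    """One Q-learning update following Eq. (3)."""
    return q_current + alpha * (reward + gamma * max(next_q_values) - q_current)

# Numbers from the Fig. 3 example: Q(St, At) = 5, reward Rt = 1, and
# accumulated rewards 25, 50, 75, 100 for the four candidate actions.
new_q = q_update(5, 1, [25, 50, 75, 100])
print(round(new_q, 4))  # 10.4
```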

Fig. 3. A numerical illustration of (a) the current state and (b) the next state

The search steps of the Q-learning algorithm are illustrated in Algorithm 2.
Algorithm 2 Q-learning algorithm
1. Initialize the Q(S, A) table with zeros
2. Repeat
3.     Select the best action from the Q-table
4.     Execute the selected action and compute the reward
5.     Update the Q-table using Eq. (3)
6. Until the stopping criterion is met
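These steps can be sketched as a minimal greedy loop. The snippet below is a hedged illustration, not the paper's implementation: the one-state toy environment and its reward scheme are assumptions introduced purely to exercise the update rule.

```python
import numpy as np

def q_learning(n_states, n_actions, step, episodes=100, alpha=0.9, gamma=0.1):
    """Minimal greedy Q-learning loop following Algorithm 2.

    `step(state, action) -> (reward, next_state)` stands for a user-supplied
    environment (a hypothetical toy here, not from the paper).
    """
    Q = np.zeros((n_states, n_actions))        # 1. initialize Q-table with zeros
    state = 0
    for _ in range(episodes):                  # 2. repeat
        action = int(np.argmax(Q[state]))      # 3. select best action from Q-table
        reward, nxt = step(state, action)      # 4. execute it, observe the reward
        Q[state, action] += alpha * (reward + gamma * Q[nxt].max()
                                     - Q[state, action])   # 5. update via Eq. (3)
        state = nxt
    return Q

# One-state toy: action 1 pays +1, action 0 pays -1; the greedy agent
# quickly settles on action 1.
Q = q_learning(1, 2, lambda s, a: ((1.0 if a == 1 else -1.0), 0))
```

Note that a purely greedy policy only works here because the penalty makes the inferior action self-correcting; general Q-learning usually adds an exploration mechanism.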

4 Q-Learning Embedded Sine Cosine Algorithm (QLESCA)

The proposed QLESCA, which incorporates Q-learning into SCA, is shown in Fig. 4. Basically, the Q-learning
technique is embedded in order to control the values of the SCA parameters r1 and r3. As explained earlier,
r1 controls the amount of jump, and r3 is responsible for the destination's contribution level (P).
Under the control of Q-learning, the r1 variable is given a random value that belongs to one of three scales, namely
Low (from 0 to 0.666), Medium (from 0.667 to 1.332), and High (from 1.333 to 2). When r1 is Low, the SCA
algorithm is in exploitation mode; when r1 is High, it performs exploration. On the Medium scale, two scenarios are
possible: if the randomly generated value of r1 falls between 0.667 and 0.999, the algorithm works in exploitation
mode, while values from 1 to 1.332 put it in the exploration phase.
The r3 parameter, like r1, also lies in the range from 0 to 2 with three intervals: Low (from 0 to 0.666),
Medium (from 0.667 to 1.332), and High (from 1.333 to 2). The Q-table therefore has nine actions:
(r1=L, r3=L), (r1=L, r3=M), (r1=L, r3=H), (r1=M, r3=L), (r1=M, r3=M), (r1=M, r3=H), (r1=H, r3=L), (r1=H, r3=M),
and (r1=H, r3=H).
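The nine actions and their interval boundaries can be captured in a few lines; the sketch below (names such as `SCALES` and `sample_r1_r3` are our own, not the paper's) maps an action index to a freshly sampled (r1, r3) pair as in Eqs. (6) and (7).

```python
import random

# Low/Medium/High intervals for r1 and r3, as defined in the text.
SCALES = {'L': (0.0, 0.666), 'M': (0.667, 1.332), 'H': (1.333, 2.0)}

# Nine Q-table actions: every (r1 scale, r3 scale) combination.
ACTIONS = [(a, b) for a in 'LMH' for b in 'LMH']

def sample_r1_r3(action_index, rng=random):
    """Draw r1 and r3 uniformly from the intervals of the chosen action."""
    s1, s3 = ACTIONS[action_index]
    r1 = rng.uniform(*SCALES[s1])
    r3 = rng.uniform(*SCALES[s3])
    return r1, r3

r1, r3 = sample_r1_r3(8)   # action 9: (r1=H, r3=H)
```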
Therefore, different actions are generated from the Q-table and executed by the SCA agents. A reward of 1 is given
if the agent moves to a better location; otherwise, a penalty of -1 is applied. Two indicators are used to measure the
population status and the individual agent's location with respect to the destination P: the population density
(Samma et al., 2016)(Tang et al., 2015) and the distance, defined in Eqs. (4) and (5), respectively.

Fig. 4. The overview of proposed Q-Learning Embedded Sine Cosine Algorithm (QLESCA)

Density (Den) = (1 / (N |L|)) Σ_{i=1}^{N} √( Σ_{j=1}^{D} (X_i^j − X̄^j)^2 )    (4)

Distance (Dis) = √( Σ_{j=1}^{D} (P^j − X_i^j)^2 ) / √( Σ_{j=1}^{D} (Upper_Bound_j − Lower_Bound_j)^2 )    (5)

Here, N is the total number of agents, |L| is the longest diagonal length of the search space, D is the dimension of the
search space, X_i^j is the value of agent i at dimension j, and X̄^j is the mean value of all QLESCA agents at dimension
j. Both indicators lie in the range [0,1], which is further categorized into three sub-ranges: Low (from 0 to 0.333),
Medium (from 0.334 to 0.666), and High (from 0.667 to 1). Table 1 shows the combination of these ranges with their
respective states.

Table 1
Indicator states
State 1 2 3 4 5 6 7 8 9
Den H H H M M M L L L
Dis H M L H M L H M L
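Eqs. (4) and (5) together with the Table 1 lookup can be sketched as follows. This is an illustrative Python rendering under one stated assumption: |L| is taken to be the norm of (ub − lb), i.e., the search-space diagonal, consistent with the text's description.

```python
import numpy as np

def indicators(X, i, P, lb, ub):
    """Population density (Eq. (4)) and agent-to-leader distance (Eq. (5)).

    X is the (N, D) population, i the index of the currently executed agent,
    P the leader position; |L| is assumed to be the diagonal ||ub - lb||.
    """
    N = X.shape[0]
    diag = np.linalg.norm(np.asarray(ub) - np.asarray(lb))   # longest diagonal
    den = np.linalg.norm(X - X.mean(axis=0), axis=1).sum() / (N * diag)
    dis = np.linalg.norm(P - X[i]) / diag
    return den, dis

def categorize(v):
    """Map an indicator value in [0, 1] to the Low/Medium/High ranges."""
    return 'L' if v <= 0.333 else ('M' if v <= 0.666 else 'H')

def state_index(den, dis):
    """States 1..9 exactly as laid out in Table 1 (Den = H,H,H,M,M,M,L,L,L)."""
    order = {'H': 0, 'M': 1, 'L': 2}
    return 3 * order[categorize(den)] + order[categorize(dis)] + 1
```

With Den = H and Dis = H this yields state 1 (Fig. 6), while Den = L and Dis = L yields state 9 (Fig. 9).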

Figure 5 depicts the entire structure of the Q-table used in the proposed model. As previously stated, each QLESCA
agent evolves independently and has its own Q-table. The table has 9 columns and 9 rows: each column denotes a
specific action (from 1 to 9) and each row represents a single state (from 1 to 9). The table is initialized with zeros
and is updated, as the search agent receives rewards or punishments, via Eq. (3).

Fig. 5. The structure of the Q-table in the proposed model

To demonstrate how the density (Den) and distance (Dis) indicators work, Figs. 6 to 9 show several cases. It should
be noted that the green circle represents the leader of the population, i.e., P, the red circles represent search agents,
and the largest red circle (ringed in blue) represents the currently executed agent. The first state is given in Fig. 6,
where it can be seen that the population density is considered High (H) because most of the agents are very close to
the leader. However, the currently executed agent is far away from the population leader, so the distance indicator is
also High (H).

Fig. 6. State 1 (Den=H and Dis=H)

The second case, in Fig. 7, demonstrates state 3, where the population density is H. Here the currently executed agent
is very close to the leader, so its distance is L.

Fig. 7. State 3 (Den=H and Dis=L)

The third case (state 7), in Fig. 8, illustrates a population density of L, where the agents are spread across the search
space. For the currently executed agent, the distance is H because it is located far away from the leader.

Fig. 8. State 7 (Den=L and Dis=H)


The fourth case (state 9), in Fig. 9, demonstrates the situation where most of the agents are located far away from the
leader. Therefore, the population density Den is L, but the currently executed agent is very close to the leader, so its
distance state Dis is L.

Fig. 9. State 9 (Den=L and Dis=L)

Fig. 10 illustrates the behavior of r1 in QLESCA as compared with the standard SCA. It can be seen that r1 has a non-
linear transition, which confirms the ability to change the search behavior from exploration to exploitation adaptively,
several times, under the control of Q-table actions. This allows QLESCA to explore the search space more efficiently
and makes it robust in dealing with different real-life problems.

Fig. 10. r1 behavior in QLESCA against standard SCA

The flowchart and pseudo-code of the QLESCA algorithm are given in Fig. 11 and Algorithm 3, respectively. The
main steps of the QLESCA algorithm are explained as follows:

Step 1: Initialization
The first step is about initializing micro population of QLESCA agents with randomly values according to
the search range of the problem (i.e., Upper Bound (ub) and Lower Bound (lb)). In addition, the Q-table for
each agent will be set to zeros. Then, the fitness function for each agent is calculated and best agent 𝑃 is set
to the best agent in the micro population.

Step 2: Population evolution


The loop is repeated up to the maximum number of iterations. Each agent evolves and moves in the search
space toward a better location.

Step 3: State computation

For the currently executed agent, the population density (Den) and the distance (Dis) from the micro population
leader P are computed at this stage using Eqs. (4) and (5), respectively.

Step 4: Action execution

According to the state (Den, Dis), an action is generated and the ranges of r1 and r3 are identified. Then, the
parameters r1 and r3 are computed using Eqs. (6) and (7), respectively.

r1 = r1_min + (r1_max − r1_min) U(0,1)    (6)

r3 = r3_min + (r3_max − r3_min) U(0,1)    (7)

Consequently, a new 𝑋𝑡𝑖 is computed using Eq. (8).
X_i^t = { X_best_i + r1 sin(r2) (r3 P − X_best_i),   r4 < 0.5
        { X_best_i + r1 cos(r2) (r3 P − X_best_i),   r4 ≥ 0.5        (8)

Finally, the boundaries of X_i^t are checked and corrected so that it lies within the search range [lb, ub].
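Steps of the action execution (Eqs. (6)-(8)) can be sketched as one position update. This is an illustrative Python rendering; the clipping strategy for out-of-bound values is an assumption, since the paper only states that boundaries are corrected.

```python
import numpy as np

def move_agent(x_best, P, r1_range, r3_range, lb, ub,
               rng=np.random.default_rng()):
    """One QLESCA position update with boundary correction.

    r1_range and r3_range are the (min, max) intervals selected by the
    current Q-table action.
    """
    r1 = r1_range[0] + (r1_range[1] - r1_range[0]) * rng.uniform()   # Eq. (6)
    r2 = 2.0 * np.pi * rng.uniform()
    r3 = r3_range[0] + (r3_range[1] - r3_range[0]) * rng.uniform()   # Eq. (7)
    r4 = rng.uniform()
    trig = np.sin(r2) if r4 < 0.5 else np.cos(r2)
    x_new = x_best + r1 * trig * (r3 * P - x_best)                   # Eq. (8)
    return np.clip(x_new, lb, ub)                                    # bounds

x = move_agent(np.array([1.0, 2.0]), np.array([0.5, 0.5]),
               (1.333, 2.0), (0.0, 0.666), lb=-10.0, ub=10.0)
```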

Step 5: Fitness evaluation and reward setting

For the newly generated agent X_i^t, the fitness is computed. If the new position improves on the agent's best
position, a reward of 1 is assigned; otherwise, the reward is -1.

Step 6: Position updating

At this step, if a better position is found, the agent's best position X_best_i is updated to X_i^t (X_best_i = X_i^t).
In addition, if the new agent X_i^t is better than the best agent P, then P is updated too (P = X_i^t).

Step 7: Q-table updating


At this step, the Q-table of the currently executed agent will be updated using Eq. (3).
Step 8: Stopping condition
When the maximum number of fitness evaluations is reached, QLESCA stops and returns the best solution
achieved, P.


Fig. 11. The flowchart of QLESCA.

Algorithm 3: QLESCA pseudo-code

{ Step 1: Initialization }
1. Initialize QLESCA parameters (population size, maximum iterations (Max_itr), fitness function, dimension,
   boundaries (lb, ub), Q-tables)
2. At the first iteration (t = 0)
3. Generate a random set of search agents X_i, i = 1 to 5
4. Check and correct X_i boundaries
5. Calculate the fitness (fitness_i) of each search agent X_i
6. for i = 1 to 5 do
7.     X_best_i = X_i
8.     X_best_fitness_i = fitness_i
9.     If (i = 1)
10.        P = X_i
11.        P_fitness = fitness_i
12.    Else
13.        If (fitness_i < P_fitness)
14.            P_fitness = fitness_i
15.            P = X_i
16. end for
17. t = 1
{ Step 2: Population evolution }
18. while (t ≤ Max_itr) do
19.    for i = 1 to 5 do
{ Step 3: State computation }
20.        Compute the current population density (Den) using Eq. (4)
21.        Compute the distance (Dis) from P using Eq. (5)
{ Step 4: Action execution }
22.        Based on the current action, set r1_range and r3_range
23.        r1 = r1_min + (r1_max − r1_min) U(0,1)
24.        r2 = 2π U(0,1)
25.        r3 = r3_min + (r3_max − r3_min) U(0,1)
26.        r4 = U(0,1)
27.        If (r4 < 0.5)
28.            X_i^t = X_best_i + r1 sin(r2)(r3 P − X_best_i)
29.        Else
30.            X_i^t = X_best_i + r1 cos(r2)(r3 P − X_best_i)
31.        Check and correct X_i^t boundaries
{ Step 5: Fitness evaluation and reward setting }
32.        Calculate the fitness (fitness_i) of the new search agent X_i^t
{ Step 6: Best found positions updating }
33.        If (fitness_i < X_best_fitness_i)
34.            X_best_fitness_i = fitness_i
35.            X_best_i = X_i^t
36.            Reward = 1
37.        Else
38.            Reward = −1
39.        If (fitness_i < P_fitness)
40.            P_fitness = fitness_i
41.            P = X_i^t
{ Step 7: Q-table updating }
42.        Update the Q-table using Eq. (3)
43.    end for
{ Step 8: Check stop condition }
44.    t = t + 1
45. end while
46. return P
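To make the whole procedure concrete, the following compact Python sketch re-implements Algorithm 3 under stated assumptions: the Low/Medium/High interval boundaries from the text, boundary clipping, the search-space diagonal as |L|, and a sphere objective for the demonstration. It is an illustration, not the authors' original MATLAB implementation.

```python
import numpy as np

SCALES = {'L': (0.0, 0.666), 'M': (0.667, 1.332), 'H': (1.333, 2.0)}
ACTIONS = [(a, b) for a in 'LMH' for b in 'LMH']   # nine (r1, r3) range pairs

def categorize(v):
    return 'L' if v <= 0.333 else ('M' if v <= 0.666 else 'H')

def state_of(X, i, P, diag):
    N = len(X)
    den = np.linalg.norm(X - X.mean(axis=0), axis=1).sum() / (N * diag)  # Eq. (4)
    dis = np.linalg.norm(P - X[i]) / diag                                # Eq. (5)
    order = {'H': 0, 'M': 1, 'L': 2}
    return 3 * order[categorize(den)] + order[categorize(dis)]

def qlesca(fitness, dim, lb, ub, n_agents=5, max_itr=200,
           alpha=0.9, gamma=0.1, seed=0):
    rng = np.random.default_rng(seed)
    lb, ub = np.full(dim, lb, float), np.full(dim, ub, float)
    diag = np.linalg.norm(ub - lb)
    X = rng.uniform(lb, ub, (n_agents, dim))          # Step 1: initialization
    fit = np.array([fitness(x) for x in X])
    X_best, best_fit = X.copy(), fit.copy()
    Q = np.zeros((n_agents, 9, 9))                    # one Q-table per agent
    p_idx = int(np.argmin(fit))
    P, P_fit = X[p_idx].copy(), fit[p_idx]
    for _ in range(max_itr):                          # Step 2: evolution
        for i in range(n_agents):
            s = state_of(X_best, i, P, diag)          # Step 3: state
            a = int(np.argmax(Q[i, s]))               # Step 4: action
            (lo1, hi1), (lo3, hi3) = SCALES[ACTIONS[a][0]], SCALES[ACTIONS[a][1]]
            r1 = lo1 + (hi1 - lo1) * rng.uniform()    # Eq. (6)
            r2 = 2 * np.pi * rng.uniform()
            r3 = lo3 + (hi3 - lo3) * rng.uniform()    # Eq. (7)
            trig = np.sin(r2) if rng.uniform() < 0.5 else np.cos(r2)
            x_new = np.clip(X_best[i] + r1 * trig * (r3 * P - X_best[i]), lb, ub)
            f_new = fitness(x_new)                    # Step 5: evaluation
            if f_new < best_fit[i]:                   # Step 6: position update
                X_best[i], best_fit[i], reward = x_new, f_new, 1.0
            else:
                reward = -1.0
            if f_new < P_fit:
                P, P_fit = x_new.copy(), f_new
            s_next = state_of(X_best, i, P, diag)
            Q[i, s, a] += alpha * (reward + gamma * Q[i, s_next].max()
                                   - Q[i, s, a])      # Step 7: Eq. (3)
    return P, P_fit                                   # Step 8: best solution

best_x, best_f = qlesca(lambda x: float(np.sum(x * x)), dim=5, lb=-100, ub=100)
```

The micro population of five agents matches the settings used in the experiments below; only the objective function and bounds need to be swapped in for another problem.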
5 Experimental Analysis

In this section, a series of experiments was conducted to verify the effectiveness of QLESCA. A total of three case
studies were investigated: the basic benchmark problems of CEC 2005 (Suganthan et al., 2005), the large-scale
benchmark problems of CEC 2010 (Tang et al., 2010), and three engineering design problems. They are explained as
follows.

5.1 Case Study I: Basic Benchmark Problems

A total of 23 CEC 2005 continuous benchmarks were used in this section, including 7 unimodal, 6 multimodal, and
10 fixed-dimension multimodal functions, shown in Tables 2, 3, and 4. Unimodal functions have a single global
optimum, whereas multimodal functions have multiple local optima.

Table 2
Description of unimodal benchmark functions.
Function | Description | Dim | Range | f_min
F1(X) = Σ_{i=1}^{D} x_i^2 | Sphere | 30 | [-100,100] | 0
F2(X) = Σ_{i=1}^{D} |x_i| + Π_{i=1}^{D} |x_i| | Schwefel 2.22 | 30 | [-10,10] | 0
F3(X) = Σ_{i=1}^{D} (Σ_{j=1}^{i} x_j)^2 | Schwefel 1.2 | 30 | [-100,100] | 0
F4(X) = max_i {|x_i|, 1 ≤ i ≤ D} | Schwefel 2.21 | 30 | [-100,100] | 0
F5(X) = Σ_{i=1}^{D-1} [100(x_{i+1} − x_i^2)^2 + (x_i − 1)^2] | Rosenbrock | 30 | [-30,30] | 0
F6(X) = Σ_{i=1}^{D} ([x_i + 0.5])^2 | Step | 30 | [-100,100] | 0
F7(X) = Σ_{i=1}^{D} i·x_i^4 + random[0,1) | Quartic | 30 | [-1.28,1.28] | 0

Table 3
Description of multimodal benchmark functions.
Function | Description | Dim | Range | f_min
F8(X) = Σ_{i=1}^{D} −x_i sin(√|x_i|) | Schwefel | 30 | [-500,500] | −418.9829 × D
F9(X) = Σ_{i=1}^{D} [x_i^2 − 10 cos(2πx_i) + 10] | Rastrigin | 30 | [−5.12,5.12] | 0
F10(X) = −20 exp(−0.2 √((1/D) Σ_{i=1}^{D} x_i^2)) − exp((1/D) Σ_{i=1}^{D} cos(2πx_i)) + 20 + e | Ackley | 30 | [-32,32] | 0
F11(X) = (1/4000) Σ_{i=1}^{D} x_i^2 − Π_{i=1}^{D} cos(x_i/√i) + 1 | Griewank | 30 | [-600,600] | 0
F12(X) = (π/D) {10 sin^2(πy_1) + Σ_{i=1}^{D-1} (y_i − 1)^2 [1 + 10 sin^2(πy_{i+1})] + (y_D − 1)^2} + Σ_{i=1}^{D} u(x_i,10,100,4),
    where y_i = 1 + (x_i + 1)/4 and
    u(x_i,a,k,m) = { k(x_i − a)^m if x_i > a; 0 if −a ≤ x_i ≤ a; k(−x_i − a)^m if x_i < −a } | Penalized | 30 | [-50,50] | 0
F13(X) = 0.1 {sin^2(3πx_1) + Σ_{i=1}^{D-1} (x_i − 1)^2 [1 + sin^2(3πx_{i+1})] + (x_D − 1)^2 [1 + sin^2(2πx_D)]} + Σ_{i=1}^{D} u(x_i,5,100,4) | Penalized 2 | 30 | [-50,50] | 0

Table 4
Description of fixed-dimension multimodal benchmark functions.
Function | Description | Dim | Range | f_min
F14(X) = (1/500 + Σ_{j=1}^{25} 1/(j + Σ_{i=1}^{2} (x_i − a_ij)^6))^(−1) | Foxholes | 2 | [-65,65] | 1
F15(X) = Σ_{i=1}^{11} [a_i − x_1(b_i^2 + b_i x_2)/(b_i^2 + b_i x_3 + x_4)]^2 | Kowalik | 4 | [−5,5] | 0.0003
F16(X) = 4x_1^2 − 2.1x_1^4 + (1/3)x_1^6 + x_1 x_2 − 4x_2^2 + 4x_2^4 | Six-Hump Camel-Back | 2 | [-5,5] | -1.0316
F17(X) = (x_2 − (5.1/4π^2) x_1^2 + (5/π) x_1 − 6)^2 + 10 (1 − 1/8π) cos(x_1) + 10 | Branin | 2 | [-5,5] | 0.398
F18(X) = [1 + (x_1 + x_2 + 1)^2 (19 − 14x_1 + 3x_1^2 − 14x_2 + 6x_1x_2 + 3x_2^2)] × [30 + (2x_1 − 3x_2)^2 (18 − 32x_1 + 12x_1^2 + 48x_2 − 36x_1x_2 + 27x_2^2)] | Goldstein-Price | 2 | [-2,2] | 3
F19(X) = −Σ_{i=1}^{4} c_i exp(−Σ_{j=1}^{3} a_ij (x_j − p_ij)^2) | Hartman 3 | 3 | [1,3] | -3.86
F20(X) = −Σ_{i=1}^{4} c_i exp(−Σ_{j=1}^{6} a_ij (x_j − p_ij)^2) | Hartman 6 | 6 | [0,1] | -3.32
F21(X) = −Σ_{i=1}^{5} [(X − a_i)(X − a_i)^T + c_i]^(−1) | Shekel 5 | 4 | [0,10] | -10.1532
F22(X) = −Σ_{i=1}^{7} [(X − a_i)(X − a_i)^T + c_i]^(−1) | Shekel 7 | 4 | [0,10] | -10.4028
F23(X) = −Σ_{i=1}^{10} [(X − a_i)(X − a_i)^T + c_i]^(−1) | Shekel 10 | 4 | [0,10] | -10.5363

5.1.1 Performance Analysis

In this section, a standard benchmark test suite is considered to evaluate the performance of QLESCA against the
classical SCA. This test suite consists of 23 test functions of different complexity levels, and the comparison results
are reported in Table 5. Since the test functions F1 to F7 are unimodal and contain only one global optimum, they can
be utilized to evaluate the exploitation efficiency of the candidate solutions of the algorithm. On all of these unimodal
test cases, QLESCA provides superior results compared to the classical SCA and was able to locate the optimal
solution. This is due to the automatic control of the exploration/exploitation search mode, which leads the search
agents to exploit the search space more deeply and reach a global optimum.
Test problems F8 to F13 are multimodal and, because they have an enormous number of local optima, can be used to
check the exploration and local optima avoidance ability of meta-heuristic algorithms. On all six of these problems,
the proposed QLESCA obtained the optimal solution and provides a far better solution than the classical SCA.
Test problems F14 to F23 are fixed-dimension test problems. As can be seen from Table 5, QLESCA outperformed
the classical SCA on all of them. It is worth mentioning that these problems have fewer local optima than the high-
dimensional multimodal problems; therefore, the ability of QLESCA to maintain a suitable balance between
exploitation and exploration can be verified on these test problems.

Table 5
Comparison Between SCA and QLESCA
F SCA QLESCA
Avg. Std. Avg. Std.
1 55946.85000 16306.45000 1.09E-28 5.34E-28
2 55.09284 13.07359 6.75E-52 3.7E-51
3 83335.39000 9385.27100 161.80320 282.93560
4 87.92184 3.79334 28.71990 12.06261
5 2.47E+08 48043877 27.22215 0.53014
6 46811.88000 16990.53000 0.22526 0.24820
7 1.09E+10 3.1E+09 0.02498 0.02288
8 -3873.55900 266.76010 -6742.01000 656.08400
9 342.96790 65.96998 1.22566 4.30354
10 20.13777 0.18278 0.67459 3.69489
11 498.03060 131.53070 0.00809 0.03765
12 5.77E+08 1.13E+08 3.20868 5.19907
13 1.14E+09 1.54E+08 2.20867 2.16350
14 1.52745 0.72204 1.45844 1.82772
15 0.00139 0.00037 0.00053 0.00023
16 -1.03068 0.00088 -1.03163 5.62E-08
17 0.40835 0.01350 0.39789 1.49E-06
18 3.00525 0.00860 3.00006 0.00015
19 -0.30048 2.26E-16 -0.30048 2.26E-16
20 -2.83712 0.26382 -3.31441 0.02337
21 -1.39947 1.22174 -9.97541 0.92933

22 -1.90094 1.25638 -9.46303 2.12654
23 -2.36823 1.19035 -10.34310 0.98507

5.1.2 Convergence Analysis

In this section, the convergence graphs of the proposed QLESCA and the SCA compared in Table 5 are plotted in
Fig. 12. The curves are plotted based on the average value of the best objective function obtained over 30 runs; the
horizontal axis denotes the iterations (up to 10^4), and the vertical axis represents the best score obtained. As can be
seen, for all functions the convergence of the proposed QLESCA is far better than that of SCA. Fig. 12 shows that
QLESCA has a much faster convergence speed than the original SCA, indicating that the convergence capability of
the original SCA has been improved considerably by the introduction of the Q-learning strategy.

Fig. 12. The convergence curve of SCA and QLESCA


5.1.3 Effect of population size

This section aims to investigate the effect of the population size on QLESCA's performance. Two experiments were
executed, with population sizes of 5 and 30. In addition, the results of SCA are included in Table 6 for comparison
and analysis purposes. As can be observed from Table 6, the search accuracy of QLESCA tends to degrade when the
larger population of 30 agents is used. This is because a larger number of agents consumes more fitness evaluations
per iteration. Unexpectedly, QLESCA with the larger population size outperformed QLESCA with the micro
population on F10 and F14. An explanation is that F10 and F14 possess more complex local optima than the other
tasks.

Table 6
Comparison between SCA, QLESCA with 5 search agents, and QLESCA with 30 search agents
F | SCA (Avg., Std.) | QLESCA, 5 search agents (Avg., Std.) | QLESCA, 30 search agents (Avg., Std.)
1 55946.85000 16306.45000 1.09E-28 5.34E-28 0.00068 0.00170
2 55.09284 13.07359 6.75E-52 3.70E-51 1.12E-07 2.54E-07
3 83335.39000 9385.27100 161.80320 282.93560 3559.34900 2269.65200
4 87.92184 3.79334 28.71990 12.06261 40.26694 11.50641
5 2.47E+08 48043877 27.22215 0.53014 287.66980 522.57260
6 46811.88000 16990.53000 0.22526 0.24820 2.15820 0.47214
7 1.09E+10 3.10E+09 0.02498 0.02288 13031.74000 51050.71000
8 -3873.55900 266.76010 -6742.01000 656.08400 -5442.90000 581.55960
9 342.96790 65.96998 1.22566 4.30354 3.34862 9.15207
10 20.13777 0.18278 0.67459 3.69489 0.00119 0.00258
11 498.03060 131.53070 0.00809 0.03765 0.05878 0.11727
12 5.77E+08 1.13E+08 3.20868 5.19907 7.79900 9.69448
13 1.14E+09 1.54E+08 2.20867 2.16350 21.88741 23.65705
14 1.52745 0.72204 1.45844 1.82772 0.99809 0.00045
15 0.00139 0.00037 0.00053 0.00023 0.00058 0.00016
16 -1.03068 0.00088 -1.03163 5.62E-08 -1.03163 4.71E-07
17 0.40835 0.01350 0.39789 1.49E-06 0.39790 1.24E-05
18 3.00538 0.01190 3.00002 3.33E-05 3.00005 4.90E-05
19 -0.300479 2.26E-16 -0.30048 2.26E-16 -0.30048 2.26E-16
20 -2.76975 0.25869 -3.31646 0.02188 -3.31046 0.02693

21 -1.50286 1.21966 -10.14450 0.01625 -10.02940 0.10910
22 -2.17269 1.33926 -10.28650 0.75048 -10.23770 0.29187
23 -2.13109 1.11692 -10.52500 0.01251 -10.40630 0.11471

5.2 Case Study II: Large-Scale Benchmark Problems

QLESCA was used to solve 20 large-scale benchmark functions from CEC 2010 (Tang et al., 2010). A brief
description of these 20 functions is given in Table 7.

Table 7
Description of Large-Scale Benchmark Functions.
Function | Description | Dim | Range | f_min
F1(X) = Σ_{i=1}^{D} (10^6)^((i−1)/(D−1)) z_i^2 | Shifted Elliptic Function | 1000 | [-100,100] | 0
F2(X) = Σ_{i=1}^{D} [z_i^2 − 10 cos(2πz_i) + 10] | Shifted Rastrigin's Function | 1000 | [-5,5] | 0
F3(X) = −20 exp(−0.2 √((1/D) Σ_{i=1}^{D} z_i^2)) − exp((1/D) Σ_{i=1}^{D} cos(2πz_i)) + 20 + e | Shifted Ackley's Function | 1000 | [-32,32] | 0
F4(X) = F_rot_elliptic[z(P_1 : P_m)] × 10^6 + F_elliptic[z(P_{m+1} : P_D)] | Single-group Shifted and m-rotated Elliptic Function | 1000 | [-100,100] | 0
F5(X) = F_rot_rastrigin[z(P_1 : P_m)] × 10^6 + F_rastrigin[z(P_{m+1} : P_D)] | Single-group Shifted and m-rotated Rastrigin's Function | 1000 | [-5,5] | 0
F6(X) = F_rot_ackley[z(P_1 : P_m)] × 10^6 + F_ackley[z(P_{m+1} : P_D)] | Single-group Shifted and m-rotated Ackley's Function | 1000 | [-32,32] | 0
F7(X) = F_schwefel[z(P_1 : P_m)] × 10^6 + F_sphere[z(P_{m+1} : P_D)] | Single-group Shifted m-dimensional Schwefel's Function | 1000 | [-100,100] | 0
F8(X) = F_rosenbrock[z(P_1 : P_m)] × 10^6 + F_sphere[z(P_{m+1} : P_D)] | Single-group Shifted m-dimensional Rosenbrock's Function | 1000 | [-100,100] | 0
F9(X) = Σ_{k=1}^{D/2m} F_rot_elliptic[z(P_{(k−1)m+1} : P_{km})] + F_elliptic[z(P_{D/2+1} : P_D)] | D/2m-group Shifted and m-rotated Elliptic Function | 1000 | [-100,100] | 0
F10(X) = Σ_{k=1}^{D/2m} F_rot_rastrigin[z(P_{(k−1)m+1} : P_{km})] + F_rastrigin[z(P_{D/2+1} : P_D)] | D/2m-group Shifted and m-rotated Rastrigin's Function | 1000 | [-5,5] | 0
F11(X) = Σ_{k=1}^{D/2m} F_rot_ackley[z(P_{(k−1)m+1} : P_{km})] + F_ackley[z(P_{D/2+1} : P_D)] | D/2m-group Shifted and m-rotated Ackley's Function | 1000 | [-32,32] | 0
F12(X) = Σ_{k=1}^{D/2m} F_schwefel[z(P_{(k−1)m+1} : P_{km})] + F_sphere[z(P_{D/2+1} : P_D)] | D/2m-group Shifted m-rotated Schwefel's Function | 1000 | [-100,100] | 0
F13(X) = Σ_{k=1}^{D/2m} F_rosenbrock[z(P_{(k−1)m+1} : P_{km})] + F_sphere[z(P_{D/2+1} : P_D)] | D/2m-group Shifted m-rotated Rosenbrock's Function | 1000 | [-100,100] | 0
F14(X) = Σ_{k=1}^{D/m} F_rot_elliptic[z(P_{(k−1)m+1} : P_{km})] | D/m-group Shifted and m-rotated Elliptic Function | 1000 | [-100,100] | 0
F15(X) = Σ_{k=1}^{D/m} F_rot_rastrigin[z(P_{(k−1)m+1} : P_{km})] | D/m-group Shifted and m-rotated Rastrigin's Function | 1000 | [-5,5] | 0
F16(X) = Σ_{k=1}^{D/m} F_rot_ackley[z(P_{(k−1)m+1} : P_{km})] | D/m-group Shifted and m-rotated Ackley's Function | 1000 | [-32,32] | 0
F17(X) = Σ_{k=1}^{D/m} F_schwefel[z(P_{(k−1)m+1} : P_{km})] | D/m-group Shifted m-rotated Schwefel's Function | 1000 | [-100,100] | 0
F18(X) = Σ_{k=1}^{D/m} F_rosenbrock[z(P_{(k−1)m+1} : P_{km})] | D/m-group Shifted m-rotated Rosenbrock's Function | 1000 | [-100,100] | 0
F19(X) = Σ_{i=1}^{D} (Σ_{j=1}^{i} z_j)^2 | Shifted Schwefel's Function | 1000 | [-100,100] | 0
F20(X) = Σ_{i=1}^{D−1} [100(z_i^2 − z_{i+1})^2 + (z_i − 1)^2] | Shifted Rosenbrock's Function | 1000 | [-100,100] | 0

5.2.1 Comparison with SCA variants

This section aims to compare the performance of QLESCA against recently developed SCA variants. Specifically,
the algorithms OBSCA (Abd Elaziz et al., 2017), SCADE (Nenavath & Jatoth, 2018), ISCA (Suid et al., 2018), and
MSCA (Huiling Chen et al., 2020) are executed according to the settings shown in Table 8. The mean value and
standard deviation of all conducted algorithms are shown in Table 9. The reported results indicate that QLESCA
outperformed the other algorithms, except on F3, where the result of QLESCA is very close to that of MSCA. For
further analysis, the convergence curves of QLESCA and the other SCA variants are plotted in Fig. 13.

Table 8
Algorithms' parameter settings
Method | Population size | Maximum no. of iterations | Other parameters
QLESCA | 5 | 10^4 | α = 0.9 and γ = 0.1
SCA | 30 | 10^4 | a = 2. All the parameters are set as in (Mirjalili, 2016)
MSCA | 30 | 10^4 | a = 2; μ = 4; F = random in [0.2, 0.8]; Pc = 0.8. All the parameters are set as in (Huiling Chen et al., 2020)
SCADE | 30 | 10^4 | a = 2; F = random in [0.2, 0.8]; Pc = 0.8. All the parameters are set as in (Huiling Chen et al., 2020)
OBSCA | 30 | 10^4 | a = 2. All the parameters are set as in (Huiling Chen et al., 2020)
ISCA | 30 | 10^4 | a = 2, α = 0.03, β = 0.2. All the parameters are set as in (Suid et al., 2018)

Fig. 13. The convergence curve of QLESCA with SCA and its variants

Table 9
Comparison between QLESCA, SCA and its variants (Avg. ± Std.)
F | QLESCA | SCA | MSCA | OBSCA | SCADE | ISCA
1 | 5.32E+10 ± 8.79E+09 | 3.43E+11 ± 3.53E+10 | 1.86E+11 ± 3.98E+09 | 3.51E+11 ± 1.84E+10 | 2.16E+11 ± 1.51E+10 | 1.99E+11 ± 1.33E+10
2 | 16595.29000 ± 361.19060 | 24047.29000 ± 1058.62100 | 16741.70000 ± 95.95124 | 24229.88000 ± 251.16300 | 17922.05000 ± 580.66120 | 22571.15000 ± 2672.00800
3 | 21.03316 ± 0.04607 | 21.53476 ± 0.01325 | 21.00770 ± 0.01063 | 21.52631 ± 0.01076 | 21.15838 ± 0.05930 | 21.44479 ± 0.09891
4 | 1.65E+14 ± 5.24E+13 | 2.88E+15 ± 5.41E+14 | 2.54E+15 ± 5.01E+14 | 3.42E+15 ± 7.09E+14 | 4.93E+15 ± 1.21E+15 | 1.47E+15 ± 3.25E+14
5 | 4.66E+08 ± 45992396 | 8.48E+08 ± 33734849 | 7.96E+08 ± 29861877 | 9.11E+08 ± 58342140 | 8.96E+08 ± 38940582 | 6.93E+08 ± 35357209
6 | 17781724 ± 2191157 | 21109935 ± 96611.60000 | 20592247 ± 85461.84200 | 21161109 ± 69457.31000 | 20929264 ± 126283.10000 | 20195599 ± 290226.30000
7 | 6.16E+10 ± 1.13E+10 | 3.35E+11 ± 5.68E+10 | 4.89E+11 ± 9.83E+10 | 3.98E+11 ± 7.17E+10 | 1.01E+12 ± 3.48E+11 | 2.29E+11 ± 4.86E+10
8 | 3.05E+14 ± 2.94E+14 | 1.03E+17 ± 2.08E+16 | 4.64E+16 ± 4.14E+15 | 1.33E+17 ± 2.78E+16 | 8.81E+16 ± 2.67E+16 | 2.75E+16 ± 5.51E+15
9 | 7.14E+10 ± 6.96E+09 | 3.85E+11 ± 2.57E+10 | 2.26E+11 ± 4.38E+09 | 3.84E+11 ± 2.15E+10 | 2.54E+11 ± 1.81E+10 | 2.40E+11 ± 1.47E+10
10 | 16568.30000 ± 433.61540 | 24429.38000 ± 425.32350 | 16978.39000 ± 73.07203 | 24053.61000 ± 1295.81500 | 18307.45000 ± 691.48050 | 22453.14000 ± 2659.31700
11 | 229.49590 ± 0.51170 | 236.40270 ± 0.25667 | 230.43210 ± 0.13935 | 236.39300 ± 0.19724 | 232.05010 ± 0.79090 | 234.98330 ± 0.94295
12 | 5312623 ± 342615.60000 | 26780263 ± 2390309 | 22027582 ± 3619786.80000 | 30046975 ± 2544311 | 29984926 ± 5292463 | 18165241 ± 1973549
13 | 3.89E+11 ± 2.15E+10 | 3.58E+12 ± 1.52E+11 | 6.81E+11 ± 7.07E+09 | 3.44E+12 ± 1.61E+11 | 1.25E+12 ± 2.68E+11 | 2.86E+12 ± 1.02E+12
14 | 7.55E+10 ± 7.60E+09 | 4.17E+11 ± 2.58E+10 | 2.53E+11 ± 6.43E+09 | 4.11E+11 ± 2.30E+10 | 2.94E+11 ± 2.00E+10 | 2.67E+11 ± 2.11E+10
15 | 16068.15000 ± 347.01880 | 24506.77000 ± 281.37120 | 17044.64000 ± 121.29726 | 24472.85000 ± 253.73940 | 18213.68000 ± 479.52500 | 23357.90000 ± 2115.84500
16 | 418.58900 ± 0.79762 | 430.14620 ± 0.40942 | 419.22570 ± 0.21331 | 430.02270 ± 0.25890 | 421.95070 ± 1.12282 | 428.04100 ± 2.07849
17 | 8639061 ± 742967 | 66434082 ± 5886362 | 48856949 ± 4510592.8 | 72941794 ± 7875182 | 61386883 ± 12502180 | 42796599 ± 4139636
18 | 1.16E+12 ± 4.72E+10 | 7.52E+12 ± 5.35E+11 | 1.45E+12 ± 7.29E+09 | 7.54E+12 ± 2.45E+11 | 2.63E+12 ± 4.85E+11 | 7.63E+12 ± 9.25E+11
19 | 25564257 ± 3533975 | 1.19E+08 ± 15228527 | 1.39E+08 ± 28461571 | 1.47E+08 ± 27216120 | 2.23E+08 ± 62338040 | 83928007 ± 8306343
20 | 1.32E+12 ± 3.97E+10 | 8.41E+12 ± 2.53E+11 | 1.63E+12 ± 7.88E+09 | 8.27E+12 ± 2.03E+11 | 2.93E+12 ± 6.87E+11 | 8.12E+12 ± 1.03E+12

5.2.1.1 Statistical Analysis

To assess the performance of QLESCA statistically, the p-values of the Wilcoxon signed-rank test (García et al.,
2010) were computed. Each value greater than 0.05, indicating that the difference is not significant, is shown in
boldface. As can be seen from Table 10, QLESCA outperformed the original SCA and its variants (MSCA, SCADE,
OBSCA, and ISCA). This indicates that the exploration and exploitation capability of the original SCA has been
improved considerably by the integration with Q-learning. The only exception is F2, where the difference with
MSCA is not statistically significant. Overall, the proposed algorithm surpassed all the other algorithms.
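For readers who wish to reproduce this kind of analysis, SciPy provides the paired signed-rank test directly. The snippet below is a hedged illustration on synthetic paired run data (the numbers are invented; the paper's actual per-run results are not reproduced here).

```python
import numpy as np
from scipy.stats import wilcoxon

# Illustrative only: paired best-fitness values from 30 independent runs
# of two algorithms on the same function (synthetic data).
rng = np.random.default_rng(1)
alg_a = rng.normal(loc=10.0, scale=1.0, size=30)   # e.g. QLESCA
alg_b = rng.normal(loc=12.0, scale=1.0, size=30)   # e.g. a competitor

stat, p = wilcoxon(alg_a, alg_b)
significant = p < 0.05   # difference is significant at the 5% level
```

With 30 paired samples this matches the experimental protocol above (30 runs per function); a p-value below 0.05 rejects the hypothesis that the two algorithms perform equivalently on that function.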
Table 10
p-values for QLESCA versus other competitors on Large Scale
Function no. SCA MSCA OBSCA SCADE ISCA
1 3.02E-11 3.02E-11 3.02E-11 3.02E-11 3.02E-11
2 3.02E-11 0.10547 3.02E-11 4.98E-11 3.02E-11
3 3.02E-11 0.01171 3.02E-11 1.86E-09 3.02E-11
4 3.02E-11 3.02E-11 3.02E-11 3.02E-11 3.02E-11
5 3.02E-11 3.02E-11 3.02E-11 3.02E-11 3.02E-11
6 3.02E-11 3.02E-11 3.02E-11 3.02E-11 3.47E-10
7 3.02E-11 3.02E-11 3.02E-11 3.02E-11 3.02E-11
8 3.02E-11 3.02E-11 3.02E-11 3.02E-11 3.02E-11
9 3.02E-11 3.02E-11 3.02E-11 3.02E-11 3.02E-11
10 3.02E-11 1.16E-07 3.02E-11 1.09E-10 3.02E-11
11 3.02E-11 6.72E-10 3.02E-11 3.02E-11 3.02E-11
12 3.02E-11 3.02E-11 3.02E-11 3.02E-11 3.02E-11
13 3.02E-11 3.02E-11 3.02E-11 3.02E-11 3.02E-11
14 3.02E-11 3.02E-11 3.02E-11 3.02E-11 3.02E-11
15 3.02E-11 7.39E-11 3.02E-11 3.02E-11 3.02E-11
16 3.02E-11 0.00013 3.02E-11 3.34E-11 3.02E-11
17 3.02E-11 3.02E-11 3.02E-11 3.02E-11 3.02E-11
18 3.02E-11 3.02E-11 3.02E-11 3.02E-11 3.02E-11
19 3.02E-11 3.02E-11 3.02E-11 3.02E-11 3.02E-11
20 3.02E-11 3.02E-11 3.02E-11 3.02E-11 3.02E-11

5.2.1.2 Execution Time Analysis

The settings of the machine used in this study are shown in Table 11 below.

Table 11
The detailed settings of the utilized system.
Name Setting
Hardware
CPU Intel(R) Core (TM) i5-9400F
Frequency 2.90 GHz
RAM 16 GB
GPU Nvidia GeForce RTX 2070 Super
SSD 476 GB
Hard drive 1 TB
Software
Operating system Windows 10
Language MATLAB R2020b

The time consumed (in seconds) by each algorithm is reported in this section; Table 12 contains all results. It can be
seen that QLESCA has a slightly higher computational time compared with the other algorithms. This is because, for
each agent in every iteration, the density (𝐷𝑒𝑛) and the distance (𝐷𝑖𝑠) from the micro-population leader 𝑃 must be
calculated; then, according to the state (𝐷𝑒𝑛, 𝐷𝑖𝑠), an action is generated and the ranges of 𝑟1 and 𝑟3 are identified.
Finally, the Q-table of the agent is updated according to the currently executed action.
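The per-agent bookkeeping that causes this overhead can be sketched as follows. This is an illustrative Python sketch, not the authors' MATLAB implementation: the exact 𝐷𝑒𝑛/𝐷𝑖𝑠 formulas, reward, and state grid are defined earlier in the paper, so the helpers `bucket` and `agent_state` and the 3×3 discretization here are assumptions; only the Q-learning update rule and the settings α = 0.9, γ = 0.1 (Table 13) come from the source.

```python
import math
import random

ALPHA, GAMMA = 0.9, 0.1          # learning rate and discount factor (Table 13)
N_STATES, N_ACTIONS = 9, 9       # illustrative sizes, not the paper's exact grid

def bucket(value, lo, hi):
    """Map a normalized indicator into {0, 1, 2} (low / medium / high)."""
    t = (value - lo) / (hi - lo + 1e-12)
    return min(2, int(t * 3))

def agent_state(pos, population, leader, span):
    """Discretized (Den, Dis) state for one agent.

    Here Den is taken as the mean spread of the micro population around its
    centroid and Dis as the agent's Euclidean distance to the leader P, both
    normalized by the search-range span (assumed definitions).
    """
    dim = len(pos)
    centroid = [sum(p[d] for p in population) / len(population) for d in range(dim)]
    den = sum(math.dist(p, centroid) for p in population) / (len(population) * span)
    dis = math.dist(pos, leader) / span
    return bucket(den, 0, 1) * 3 + bucket(dis, 0, 1)   # 3 x 3 = 9 states

def q_update(Q, s, a, reward, s_next):
    """Standard Q-learning update after the agent executes action a."""
    Q[s][a] += ALPHA * (reward + GAMMA * max(Q[s_next]) - Q[s][a])

# one Q-table per agent, touched every iteration -> the extra cost in Table 12
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
pop = [[random.uniform(-1, 1) for _ in range(5)] for _ in range(5)]
s = agent_state(pop[0], pop, pop[1], span=2.0)
q_update(Q, s, a=0, reward=1.0, s_next=s)
print(Q[s][0])
```

Because the state computation touches every member of the micro population for every agent at every iteration, its cost is additive but small, which matches the modest gap observed in Table 12.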

Table 12
Comparison of CPU execution time for QLESCA versus other competitors on Large Scale
Function no. QLESCA SCA MSCA OBSCA SCADE ISCA
1 2.77000 2.45000 1.92000 2.54000 2.04000 2.41000
2 1.86000 1.43000 0.69000 0.92000 0.94000 1.40000
3 1.89000 1.46000 0.72000 0.82000 1.00000 1.43000
4 2.89000 2.54000 2.09000 2.48000 2.12000 2.51000
5 2.01000 1.56000 0.86000 1.10000 1.07000 1.52000
6 2.05000 1.60000 0.91000 0.98000 1.21000 1.56000
7 2.13000 1.77000 1.22000 1.43000 1.71000 1.75000
8 1.74000 1.40000 0.69000 0.91000 0.88000 1.37000
9 3.18000 2.82000 2.34000 3.08000 2.43000 2.79000
10 2.33000 1.94000 1.26000 1.67000 1.47000 1.91000
11 2.49000 2.09000 1.46000 1.61000 1.71000 2.06000
12 5.80000 5.43000 5.75000 6.65000 6.48000 5.39000
13 2.20000 1.85000 1.20000 1.61000 1.32000 1.82000
14 3.45000 3.10000 2.65000 3.49000 2.68000 3.06000
15 2.45000 2.11000 1.50000 2.00000 1.69000 2.07000
16 2.72000 2.36000 1.83000 2.02000 2.03000 2.33000
17 9.81000 9.47000 10.70000 12.80000 11.80000 9.39000
18 2.65000 2.31000 1.72000 2.32000 1.79000 2.29000
19 16.20000 15.80000 19.20000 21.20000 22.60000 15.80000
20 1.68000 1.36000 0.63000 0.84000 0.82000 1.33000

5.2.2 Comparison with State-of-the-Art Algorithms

To demonstrate the efficiency of QLESCA against state-of-the-art algorithms, this section considers several well-
established metaheuristics, namely particle swarm optimization (PSO) (Kennedy & Eberhart, 1995), Multi-Verse
Optimizer (MVO) (Mirjalili et al., 2016), Grasshopper Optimisation Algorithm (GOA) (Saremi et al., 2017), BAT
(Yang & Hossein Gandomi, 2012), artificial bee colony (ABC) (Karaboga & Basturk, 2007), Harmony Search
Algorithm (Geem et al., 2001), moth-flame optimization algorithm (MFO) (Mirjalili, 2015), cultural algorithm (CA)
(Reynolds, 1994), Simulated Annealing Algorithm (SA) (Kirkpatrick et al., 1983), Bees Algorithm (BeA) (Pham et
al., 2005), differential evolution algorithm (DE) (Storn & Price, 1997), and ant colony optimization for continuous
domains (ACOR) (Socha & Dorigo, 2008). The parameters of each algorithm are shown in Table 13.

The average results (Avg), standard deviation (Stdv), and rank on the F1 to F20 tasks are shown in Table 14. The
ranking was based on the average results of 30 runs. It can be seen that the proposed QLESCA outperformed most of
the other algorithms; on some functions, such as F1, F2, F3, F4, F5, F6, F8, and F11, the results of QLESCA are very
close to those of the other algorithms. Nevertheless, based on the overall rank on F1-F20, QLESCA, with an average
rank of 1.65, is superior to all the other optimizers on these global optimization tasks.

In addition, the convergence curves of QLESCA and the studied algorithms are given in Fig. 14. QLESCA, with the
fastest convergence speed, outperformed all the other optimizers in dealing with F7, F9, F10, F12, F13, F14, F15, F16,
F17, F18, F19, and F20, as shown in Fig. 14. However, QLESCA converged poorly on the F1, F2, F3, F4, F5, F6, F8, and
F11 tasks. As explained earlier, SCA and its variant algorithms might not be intrinsically good at dealing with these
problems.

Table 13
Parameter settings
Method    Population size    Maximum no. of iterations    Other parameters
QLESCA    5    10⁴    γ = 0.1 and α = 0.9
PSO    30    10⁴    c1 = 2.5 to 0.5, c2 = 0.5 to 2.5, w = 0.9 to 0.4. All the parameters are set as in (Kennedy & Eberhart, 1995)
MVO    30    10⁴    Minimum of wormhole existence probability 0.2, maximum of wormhole existence probability 1. All the parameters are set as in (Mirjalili et al., 2016)
GOA    30    10⁴    cMax = 1, cMin = 0.00004. All the parameters are set as in (Saremi et al., 2017)
BAT    30    10⁴    Loudness = 0.5, pulse rate = 0.5, frequency minimum = 0, frequency maximum = 2. All the parameters are set as in (Yang & Hossein Gandomi, 2012)
ABC    30    10⁴    Acceleration coefficient upper bound = 1. All the parameters are set as in (S. M. K. Heris, 2015)
Harmony    30    10⁴    Number of new harmonies = 20, harmony memory consideration rate = 0.9, pitch adjustment rate = 0.1, fret width damp ratio = 0.995, fret width = 0.02*(VarMax - VarMin). All the parameters are set as in (Geem et al., 2001)
MFO    30    10⁴    b = 1. All the parameters are set as in (Mirjalili, 2015)
CA    30    10⁴    Acceptance ratio = 0.35, α = 0.3, β = 0.5. All the parameters are set as in (Reynolds, 1994)
SA    30    10⁴    Maximum number of sub-iterations = 20, initial temp = 0.1, α = 0.99, number of neighbors per individual = 5, mutation rate = 0.5, σ = 0.1*(VarMax - VarMin). All the parameters are set as in (M. K. Heris, 2015)
BeA    30    10⁴    Neighborhood radius damp rate = 0.95, neighborhood radius = 0.1*(VarMax - VarMin). All the parameters are set as in (Pham et al., 2005)
DE    30    10⁴    Lower bound of scaling factor = 0.2, upper bound of scaling factor = 0.8, crossover probability = 0.2. All the parameters are set as in (Storn & Price, 1997)
ACOR    30    10⁴    Sample size = 40, intensification factor = 0.5, deviation-distance ratio = 1. All the parameters are set as in (Socha & Dorigo, 2008)

Table 14
Comparison QLESCA with state-of-art algorithms

F1 F2 F3
Avg. Stdv. Rank Avg. Stdv. Rank Avg. Stdv. Rank
QLESCA 5.41E+10 5.45E+09 2 16600.64000 297.45930 3 21.02415 0.04095 2
PSO 8.87E+10 1.03E+10 4 19439.25000 304.10970 5 21.31485 0.02138 5
MVO 1.18E+11 8.16E+09 5 21424.98000 720.08150 8 21.48055 0.01388 10
GOA 1.69E+11 8.31E+09 7 19247.42000 831.23660 4 21.48817 0.02221 11
BAT 8.71E+11 1.7E+10 13 38427.07000 164.29210 13 21.33172 0.02380 6
ABC 3.59E+11 1.98E+10 12 24326.68000 288.18200 12 21.53793 0.01273 13
Harmony 7.94E+10 3.25E+09 3 15668.87000 209.18110 1 21.01589 0.02455 1
MFO 1.97E+11 1.69E+10 9 21982.23000 418.97390 9 21.30880 0.02207 3
CA 1.89E+11 1.60E+10 8 21313.38000 383.37450 7 21.31108 0.03081 4
SA 1.38E+11 5.38E+09 6 20881.62000 289.81860 6 21.44254 0.01059 9
BeA 3.17E+11 1.45E+10 10 23231.21000 250.41820 10 21.41542 0.00901 8
DE 4.00E+10 1.59E+09 1 16536.72000 168.86100 2 21.35020 0.01624 7
ACOR 3.58E+11 1.84E+10 11 24280.08000 289.78110 11 21.53704 0.01535 12

F4 F5 F6
Avg. Stdv. Rank Avg. Stdv. Rank Avg. Stdv. Rank
QLESCA 1.78E+14 5.61E+13 2 4.59E+08 50404330 3 16325642 1965633 4
PSO 3.29E+14 8.14E+13 4 4.73E+08 33344216 5 13157176 705710.90000 3
MVO 4.36E+14 8.16E+13 7 5.51E+08 53146849 8 18916334 2480666 6
GOA 1.74E+14 8.01E+13 1 7.20E+08 89791846 9 20970572 107245.30000 11
BAT 2.26E+16 4.06E+15 13 1.78E+09 39815859 13 20031539 47165.07000 9
ABC 3.34E+15 7.13E+14 10 7.67E+08 47513723 10 21164144 97751.60000 12
Harmony 7.22E+14 1.81E+14 8 4.42E+08 17403006 2 11249309 453708.40000 2
MFO 3.08E+14 1.32E+14 3 4.06E+08 84709436 1 19269247 1018576 7
CA 4.23E+14 1.70E+14 6 5.25E+08 85114636 7 19729076 801814.90000 8
SA 3.55E+14 6.72E+13 5 4.66E+08 22720279 4 16477556 296813.10000 5
BeA 3.40E+15 6.84E+14 11 8.50E+08 36750684 11 20675847 119277.30000 10
DE 7.90E+14 1.20E+14 9 4.76E+08 19316344 6 11189963 3372948 1
ACOR 6.88E+15 1.79E+15 12 1.07E+09 56212991 12 21287380 86799.16000 13

F7 F8 F9
Avg. Stdv. Rank Avg. Stdv. Rank Avg. Stdv. Rank
QLESCA 6.12E+10 9.72E+09 1 8.56E+14 1.85E+15 3 7.37E+10 4.74E+09 1
PSO 8.75E+10 2.12E+10 4 1.27E+15 1.31E+15 4 8.86E+10 7.19E+09 2
MVO 8.21E+10 1.32E+10 3 1.87E+15 5.23E+14 5 1.28E+11 7.12E+09 4
GOA 1.13E+11 6.51E+10 6 1.20E+16 5.64E+15 8 2.04E+11 1.33E+10 8
BAT 8.08E+13 9.13E+13 13 1.06E+18 8.13E+16 13 9.18E+11 1.31E+10 13
ABC 8.21E+11 1.80E+11 10 8.07E+16 1.54E+16 10 4.07E+11 2.05E+10 12
Harmony 1.01E+11 1.72E+10 5 3.49E+14 1.02E+14 2 1.28E+11 5.07E+09 3
MFO 1.26E+11 2.88E+10 8 1.92E+16 1.05E+16 9 1.89E+11 1.23E+10 7
CA 1.54E+11 3.73E+10 9 1.19E+16 4.64E+15 7 2.04E+11 2.01E+10 9
SA 7.83E+10 1.33E+10 2 1.88E+15 4.53E+14 6 1.50E+11 7.04E+09 6
BeA 1.30E+12 4.32E+11 11 1.12E+17 2.30E+16 11 3.50E+11 1.45E+10 10
DE 1.22E+11 1.48E+10 7 5.60E+11 2.21E+11 1 1.38E+11 5.35E+09 5
ACOR 4.18E+12 1.93E+12 12 2.04E+17 4.36E+16 12 4.01E+11 2.11E+10 11

F10 F11 F12


Avg. Stdv. Rank Avg. Stdv. Rank Avg. Stdv. Rank
QLESCA 16596.38000 575.7735 1 229.59640 0.59215 2 5558557 472654.50000 1
PSO 19638.77000 269.0929 5 231.06790 1.80557 3 9259535 511074.90000 2
MVO 22390.32000 798.30120 8 235.88430 0.32286 11 9920623 730563.10000 4
GOA 19234.65000 881.99590 4 235.61690 0.36542 9 10802873 1524735 5
BAT 40836 209.02580 13 233.17490 0.22371 7 3.20E+09 6.86E+08 13
ABC 24477.16000 304.12490 11 236.38140 0.16222 12 36285170 4719861 11
Harmony 16774.23000 184.13750 2 227.10840 0.58783 1 13914334 870210.90000 8
MFO 22489.61000 314.73280 9 232.86130 0.32185 6 11497069 832933.80000 6
CA 21612.33000 515.61400 7 233.57900 0.43983 8 12607645 1826148 7
SA 21049.18000 282.5782 6 232.41510 0.49288 4 9500622 481682.60000 3
BeA 23393.52000 254.65520 10 232.53440 0.15980 5 25066125 3013927 10
DE 18388.60000 228.04390 3 235.65860 0.22462 10 14252435 964791.60000 9
ACOR 24606.33000 316.80430 12 236.52600 0.15574 13 40109483 5821522 12

F13 F14 F15
Avg. Stdv. Rank Avg. Stdv. Rank Avg. Stdv. Rank
QLESCA 3.82E+11 2.57E+10 1 7.07E+10 5.01E+09 1 16128.64 480.9063 1
PSO 8.22E+11 1.24E+11 5 8.19E+10 6.53E+09 2 19752.12000 368.1153 5
MVO 1.15E+12 1.30E+11 6 1.35E+11 6.10E+09 3 22969.42000 683.7036 9
GOA 7.05E+11 3.87E+10 3 2.30E+11 1.40E+10 9 19479.21000 677.6217 3
BAT 1.29E+13 1.20E+11 13 9.27E+11 1.36E+10 13 39284.14000 184.01360 13
ABC 3.59E+12 1.63E+11 12 4.30E+11 2.10E+10 11 24564.76000 301.89590 11
Harmony 7.34E+11 4.22E+10 4 1.68E+11 6.03E+09 6 17751.26000 129.99640 2
MFO 2.30E+12 1.51E+11 8 1.66E+11 1.42E+10 5 22212.58000 269.34550 8
CA 2.34E+12 2.42E+11 9 2.08E+11 1.92E+10 8 21530.14000 497.86470 7
SA 1.46E+12 7.22E+10 7 1.62E+11 6.35E+09 4 21025.84000 212.55520 6
BeA 3.12E+12 1.14E+11 10 3.79E+11 1.26E+10 10 23303.02000 331.74040 10
DE 4.44E+11 2.61E+10 2 1.89E+11 7.28E+09 7 19596.66000 255.53630 4
ACOR 3.58E+12 1.63E+11 11 4.36E+11 2.05E+10 12 24612.47000 215.47090 12

F16 F17 F18


Avg. Stdv. Rank Avg. Stdv. Rank Avg. Stdv. Rank
QLESCA 418.57290 0.64700 1 8984789 873160.9 1 1.18E+12 4.96E+10 1
PSO 425.04600 0.70903 4 16663642 1101651 2 3.30E+12 2.09E+11 6
MVO 429.61790 0.30358 11 18935140 1423758 3 4.48E+12 2.98E+11 7
GOA 429.03770 0.31428 9 26224196 4131571 7 1.57E+12 4.81E+10 2
BAT 426.57860 0.46359 7 6.65E+09 8.81E+08 13 2.58E+13 1.12E+11 5
ABC 430.12750 0.32094 12 86284994 10741175 11 7.77E+12 2.50E+11 13
Harmony 421.75580 0.78320 2 31859277 2302721 9 2.49E+12 6.60E+10 4
MFO 425.25420 0.39979 5 20471494 2281300 5 6.15E+12 3.02E+11 9
CA 425.73720 0.55941 6 21646401 2117012 6 6.38E+12 3.70E+11 10
SA 426.63460 0.61636 8 19186957 1306477 4 4.71E+12 1.89E+11 8
BeA 424.00590 0.24978 3 69837194 5762869 10 7.00E+12 1.47E+11 11
DE 429.51480 0.18799 10 31837067 1740241 8 2.23E+12 1.05E+11 3
ACOR 430.21780 0.30136 13 92037720 12745444 12 7.68E+12 2.80E+11 12

F19 F20 Overall Rank


Avg. Stdv. Rank Avg. Stdv. Rank Sum of ranks Average rank Rank
QLESCA 26424918 4448248 1 1.32E+12 4.57E+10 1 33 1.65000 1
PSO 38434028 3716747 3 3.62E+12 2.36E+11 6 79 3.95000 2
MVO 38940527 4637811 4 4.79E+12 3.87E+11 7 129 6.45000 6
GOA 35037142 6483199 2 1.77E+12 4.15E+10 2 120 6.00000 5
BAT 2.87E+11 2.12E+11 13 2.69E+13 8.64E+10 4 220 11.00000 10
ABC 2.47E+08 53657583 11 8.42E+12 2.96E+11 13 229 11.45000 11
Harmony 69064279 7072802 9 2.72E+12 9.59E+10 5 79 3.95000 2
MFO 40196700 5225606 5 6.89E+12 2.30E+11 10 132 6.60000 7
CA 56212402 10340726 7 6.74E+12 4.86E+11 9 149 7.45000 8
SA 40621044 4077418 6 5.24E+12 1.78E+11 8 113 5.65000 4
BeA 1.44E+08 25515416 10 7.71E+12 1.78E+11 11 192 9.60000 9
DE 65043366 8403146 8 2.42E+12 1.14E+11 3 106 5.30000 3
ACOR 3.91E+08 1.69E+08 12 8.35E+12 2.60E+11 12 239 11.95000 12

Fig. 14. The convergence curves of QLESCA and the state-of-the-art algorithms

5.3 Case Study III: Engineering Design Problems

This section discusses the performance and efficiency of QLESCA in solving three practical engineering design
problems: the cantilever beam design problem, the three-bar truss design problem, and the multiple disk clutch brake
design problem.

5.3.1 Cantilever beam design problem

This problem is a good benchmark to verify the capability of optimization methods for solving continuous, discrete,
and/or mixed-variable structural design problems (Gandomi & Yang, 2011). This benchmark was originally given in
(Thanedar & Vanderplaats, 1995) with ten variables. Fig. 15 illustrates a five-stepped cantilever beam with a
rectangular shape.
The cantilever beam consists of five hollow elements, each with a square cross-section (Kamboj et al., 2020). The
main objective of this optimization is to minimize the weight of the beam. Each element is defined by one variable,
so the overall design structure is described by five structural parameters, while the beam thickness is held constant.
In the final optimal design, the vertical displacement constraint must not be violated. The mathematical model is
given below.

Fig. 15. Cantilever beam design problem

Minimize f(x) = 0.6224(x1 + x2 + x3 + x4 + x5), (9)

Subject to g(x) = 61/x1³ + 37/x2³ + 19/x3³ + 7/x4³ + 1/x5³ ≤ 1, (10)

Variable range: 0.01 ≤ x1, x2, x3, x4, x5 ≤ 100.

The obtained mean values of QLESCA and the other algorithms are given in Table 15. As can be seen, QLESCA,
obtaining the best mean of 13.36700, outperformed all the other algorithms, establishing itself as a highly competitive
candidate for solving the cantilever beam design problem.
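Evaluating a candidate design against Eqs. (9) and (10) is straightforward. The sketch below is a minimal Python illustration; the static-penalty handler `penalized` is one common constraint-handling choice and is our assumption, not necessarily the scheme used in the paper:

```python
def cantilever_cost(x):
    """Objective of Eq. (9): beam weight for the five section variables."""
    return 0.6224 * sum(x)

def cantilever_constraint(x):
    """Left-hand side of Eq. (10); the design is feasible when this is <= 1."""
    x1, x2, x3, x4, x5 = x
    return 61 / x1**3 + 37 / x2**3 + 19 / x3**3 + 7 / x4**3 + 1 / x5**3

def penalized(x, penalty=1e6):
    """Static-penalty fitness: infeasible designs pay for their violation."""
    g = cantilever_constraint(x) - 1
    return cantilever_cost(x) + (penalty * g if g > 0 else 0.0)

x = [6.0, 5.3, 4.5, 3.5, 2.15]   # a near-feasible candidate design
print(cantilever_cost(x), cantilever_constraint(x))
```

Any of the compared optimizers can minimize `penalized` directly over the box 0.01 ≤ xi ≤ 100; the best mean cost of 13.36700 in Table 15 corresponds to designs of roughly this shape.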

Table 15
Cantilever beam design problem
The mean of optimum
Algorithm
cost
QLESCA 13.36700
PSO 40.79000
MVO 15.88000
GOA 15.00000
BAT -
ABC 13.68000
Harmony 13.40000
MFO 13.36800
CA 13.62000
SA 14.69000
BeA 13.67400
DE 13.36800
ACOR 63.85000
SCA 15.90000
MSCA 15.58000
OBSCA 34.89000
SCADE 20.53000
ISCA 15.11000

5.3.2 Three-bar truss design problem

The three-bar truss design optimization problem was first presented by Ray and Saini (Ray et al., 2001). Three bars
are placed as illustrated in Fig. 16, and the aim is to minimize the total weight of the bars in this configuration. This
is a constrained optimization problem with two design parameters (x1, x2) and three constraint functions. The problem
is expressed mathematically as follows:

Fig. 16. Three-bar truss design problem

Minimize f(x) = (2√2 x1 + x2) L, (11)

Subject to:

g1(x) = ((√2 x1 + x2) / (√2 x1² + 2 x1 x2)) P − σ ≤ 0, (12a)

g2(x) = (x2 / (√2 x1² + 2 x1 x2)) P − σ ≤ 0, (12b)

g3(x) = (1 / (x1 + √2 x2)) P − σ ≤ 0, (12c)

where 0 ≤ x1, x2 ≤ 1, L = 100 cm, P = 2 KN/cm², and σ = 2 KN/cm² (Erdoǧan Yildirim & Karci, 2019).

Optimization results are given in Table 16. QLESCA obtained the best mean value of 263.89500 and outperformed
all the other algorithms, establishing itself as a highly competitive candidate for solving the three-bar truss design
problem.
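Eqs. (11)-(12c) translate directly into code. A minimal Python sketch, evaluated at a candidate close to the best designs behind Table 16 (the specific point x1 ≈ 0.78867, x2 ≈ 0.40825 is the well-known near-optimal design for this benchmark, not a value quoted in this paper):

```python
import math

L, P, SIGMA = 100.0, 2.0, 2.0        # cm, KN/cm^2, KN/cm^2
R2 = math.sqrt(2)

def truss_cost(x1, x2):
    """Objective of Eq. (11): total bar weight."""
    return (2 * R2 * x1 + x2) * L

def truss_constraints(x1, x2):
    """Eqs. (12a)-(12c); each value must be <= 0 for a feasible design."""
    g1 = (R2 * x1 + x2) / (R2 * x1**2 + 2 * x1 * x2) * P - SIGMA
    g2 = x2 / (R2 * x1**2 + 2 * x1 * x2) * P - SIGMA
    g3 = 1 / (x1 + R2 * x2) * P - SIGMA
    return g1, g2, g3

x1, x2 = 0.78867, 0.40825            # near-optimal candidate for this benchmark
print(round(truss_cost(x1, x2), 3))  # ~263.9, matching the best means in Table 16
print(all(g <= 1e-3 for g in truss_constraints(x1, x2)))
```

At this point g1 is active (essentially zero) while g2 and g3 are comfortably negative, which is characteristic of the optimum of this problem.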

Table 16
Three-bar truss design problem
The mean of optimum
Algorithm
cost
QLESCA 263.89500
PSO 272.10000
MVO 264.10000
GOA 264.60000
BAT -
ABC 263.91000
Harmony 264.60000
MFO 263.92000
CA 263.90000
SA 264.00000
BeA 263.92000
DE 263.89600
ACOR 280.10000
SCA 264.30000
MSCA 265.50000
OBSCA 265.10000
SCADE 268.50000
ISCA 264.50000

5.3.3 Multiple disk clutch brake (MDCB) design problem

This problem was originally presented by (Steven, 2002). The objective is to minimize the mass of the MDCB. The
problem has five discrete decision variables, namely the inner and outer radius, disc thickness, and actuating force, as
clarified in Fig. 17, as well as the number of friction surfaces, which need to be computed while satisfying eight
constraints (Rao & Waghmare, 2017; Sharma & Abraham, 2019).

Minimize f(x) = π(x2² − x1²) x3 (x5 + 1) ρ, (13)

Subject to:

g1(x) = x2 − x1 − ΔR ≥ 0, (14a)

g2(x) = Lmax − (x5 + 1)(x3 + δ) ≥ 0, (14b)

g3(x) = Pmax − Prz ≥ 0, (14c)

g4(x) = Pmax · Vsr,max − Prz · Vsr ≥ 0, (14d)

g5(x) = Vsr,max − Vsr ≥ 0, (14e)

g6(x) = Tmax − T ≥ 0, (14f)

g7(x) = Mh − s·Ms ≥ 0, (14g)

g8(x) = T ≥ 0, (14h)

where

Mh = (2/3) μ x4 x5 (x2³ − x1³)/(x2² − x1²) N·mm, ω = πn/30 rad/s, A = π(x2² − x1²) mm²,

Prz = x4/A N/mm², Vsr = π Rsr n/30 mm/s, Rsr = 2(x2³ − x1³)/(3(x2² − x1²)) mm,

ΔR = 20 mm, Lmax = 30 mm, μ = 0.6, Pmax = 1 MPa, ρ = 0.0000078 kg/mm³, Vsr,max = 10 m/s,
δ = 0.5 mm, s = 1.5, Tmax = 15 s, n = 250 rpm, Iz = 55 kg·m², Ms = 40 Nm, Mf = 3 Nm,
60 ≤ x1 ≤ 80, 90 ≤ x2 ≤ 110, 1 ≤ x3 ≤ 3, 0 ≤ x4 ≤ 1000, 2 ≤ x5 ≤ 9, i = 1, 2, 3, 4, 5.

Fig. 17. Multiple disk clutch brake design problem

As can be seen from Table 17, QLESCA, obtaining the lowest mean value of 0.26255, outperformed all the other
algorithms, establishing itself as a highly competitive candidate for solving the multiple disk clutch brake
(discrete-variable) design problem.
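The mass objective of Eq. (13) and the two purely geometric constraints of Eqs. (14a)-(14b) can be checked in a few lines. This Python sketch is illustrative only: it implements g1 and g2 (the remaining constraints additionally require Mh, Prz, Vsr, and T), and the candidate design (70, 90, 1, 810, 2) is simply a point inside the stated bounds, not a result reported by the paper:

```python
import math

RHO = 0.0000078                      # material density rho, kg/mm^3
DELTA_R, L_MAX, DELTA = 20.0, 30.0, 0.5   # mm

def clutch_mass(x):
    """Objective of Eq. (13): mass of the multiple disk clutch brake."""
    x1, x2, x3, x4, x5 = x           # inner radius, outer radius, thickness,
    return math.pi * (x2**2 - x1**2) * x3 * (x5 + 1) * RHO   # force, surfaces

def geometric_constraints(x):
    """g1 and g2 of Eqs. (14a)-(14b); both must be >= 0.

    Note: the actuating force x4 does not affect the mass itself; it enters
    only through the pressure/velocity constraints g3-g5 (not sketched here).
    """
    x1, x2, x3, x4, x5 = x
    g1 = x2 - x1 - DELTA_R
    g2 = L_MAX - (x5 + 1) * (x3 + DELTA)
    return g1, g2

x = (70, 90, 1, 810, 2)              # a candidate design within the stated bounds
print(round(clutch_mass(x), 5))
print(geometric_constraints(x))
```

For this candidate g1 sits exactly on its boundary (x2 − x1 = ΔR = 20 mm), and the mass is of the same order as the means reported in Table 17.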

Table 17
Multiple disk clutch brake design problem
The mean of optimum
Algorithm
cost
QLESCA 0.26255
PSO 0.36100
MVO 0.37700
GOA 0.28900
BAT -
ABC 0.27160
Harmony 0.27800
MFO 0.26630
CA 0.29900
SA 0.29000
BeA 0.36740
DE 0.26412
ACOR 0.88600
SCA 0.29100
MSCA 0.35400
OBSCA 0.32700
SCADE 0.40600
ISCA 0.29800

6 Conclusion

This paper introduced an improved SCA obtained by embedding Q-learning into the Sine Cosine Algorithm. The
proposed algorithm is known as QLESCA. QLESCA uses Q-learning to automate the switching from exploration to
exploitation and vice versa. Two indicators (population density and distance from the leader) were used to build the
Q-table. Each agent works independently, and different actions are generated based on agent performance. QLESCA
runs with a micro population, which reduces the fitness evaluations consumed by a large population. The proposed
algorithm was evaluated on several benchmark problems, namely the basic 23 functions, 20 large-scale problems, and
3 engineering design problems. The reported results indicate that QLESCA significantly outperforms the standard
SCA in all studied problems. In addition, QLESCA was compared with four variants of SCA (OBSCA, SCADE,
ISCA, and MSCA) and twelve state-of-the-art algorithms (PSO, MVO, GOA, BAT, ABC, Harmony, MFO, CA, SA,
BeA, DE, and ACOR), and it outperformed these algorithms in most case studies.
As future work, the application of QLESCA to various problems will be studied, including constrained optimization,
feature selection, data mining, and image processing. Additionally, a multi-objective version of QLESCA could be
developed to solve multi-objective problems.

Acknowledgment
This research is supported by the Malaysia Ministry of Higher Education (MOHE) Fundamental Research Grant
Scheme (FRGS), no. FRGS/1/2019/ICT02/USM/03/3.
The authors would like to thank Associate Professor Dr. Huiling Chen, College of Computer Science and Artificial
Intelligence, Wenzhou University, China for sharing the MSCA, OBSCA, SCADE codes that were used in the
comparison experiments. Additionally, the authors also thank Yarpiz Project for sharing source codes for many
algorithms in Engineering Optimization.

References
Abd Elaziz, M., Oliva, D., & Xiong, S. (2017). An improved Opposition-Based Sine Cosine Algorithm for global
optimization. Expert Systems with Applications, 90, 484–500. https://doi.org/10.1016/j.eswa.2017.07.043
Abd Elfattah, M., Abuelenin, S., Hassanien, A. E., & Pan, J. S. (2017). Handwritten Arabic manuscript image
binarization using sine cosine optimization algorithm. In Advances in Intelligent Systems and Computing (Vol.
536, pp. 273–280). Springer Verlag. https://doi.org/10.1007/978-3-319-48490-7_32
Abualigah, L., & Dulaimi, A. J. (2021). A novel feature selection method for data mining tasks using hybrid Sine
Cosine Algorithm and Genetic Algorithm. Cluster Computing, 24(3), 2161–2176.
https://doi.org/10.1007/s10586-021-03254-y
Algabalawy, M. A., Abdelaziz, A. Y., Mekhamer, S. F., & Abdel Aleem, S. H. E. (2018). Considerations on optimal
design of hybrid power generation systems using whale and sine cosine optimization algorithms. Journal of
Electrical Systems and Information Technology, 5(3), 312–325. https://doi.org/10.1016/j.jesit.2018.03.004
Belazzoug, M., Touahria, M., Nouioua, F., & Brahimi, M. (2020). An improved sine cosine algorithm to select
features for text categorization. Journal of King Saud University - Computer and Information Sciences, 32(4),
454–464. https://doi.org/10.1016/j.jksuci.2019.07.003
Chegini, S. N., Bagheri, A., & Najafi, F. (2018). PSOSCALF: A new hybrid PSO based on Sine Cosine Algorithm
and Levy flight for solving optimization problems. Applied Soft Computing, 73, 697–726.
https://doi.org/10.1016/j.asoc.2018.09.019
Chen, Hao, Heidari, A. A., Zhao, X., Zhang, L., & Chen, H. (2020). Advanced orthogonal learning-driven multi-
swarm sine cosine optimization: Framework and case studies. Expert Systems with Applications, 144.
https://doi.org/10.1016/j.eswa.2019.113113
Chen, Huiling, Jiao, S., Heidari, A. A., Wang, M., Chen, X., & Zhao, X. (2019). An opposition-based sine cosine
approach with local search for parameter estimation of photovoltaic models. Energy Conversion and
Management, 195(May), 927–942. https://doi.org/10.1016/j.enconman.2019.05.057
Chen, Huiling, Wang, M., & Zhao, X. (2020). A multi-strategy enhanced sine cosine algorithm for global
optimization and constrained practical engineering problems. Applied Mathematics and Computation, 369,
124872. https://doi.org/10.1016/j.amc.2019.124872
Chen, K., Zhou, F., Yin, L., Wang, S., Wang, Y., & Wan, F. (2018). A hybrid particle swarm optimizer with sine
cosine acceleration coefficients. Information Sciences, 422(C), 218–241.
https://doi.org/10.1016/j.ins.2017.09.015
Das, S., Bhattacharya, A., & Chakraborty, A. K. (2018). Solution of short-term hydrothermal scheduling using sine
cosine algorithm. Soft Computing, 22(19), 6409–6427. https://doi.org/10.1007/s00500-017-2695-3
Erdoǧan Yildirim, A., & Karci, A. (2019). Application of Three Bar Truss Problem among Engineering Design
Optimization Problems using Artificial Atom Algorithm. 2018 International Conference on Artificial
Intelligence and Data Processing, IDAP 2018, September. https://doi.org/10.1109/IDAP.2018.8620762
Fan, Y., Wang, P., Heidari, A. A., Wang, M., Zhao, X., Chen, H., & Li, C. (2020). Rationalized fruit fly
optimization with sine cosine algorithm: A comprehensive analysis. Expert Systems with Applications, 157,
113486. https://doi.org/10.1016/j.eswa.2020.113486
Feng, Z., Niu, W., Liu, S., Luo, B., Miao, S., & Liu, K. (2020). Multiple hydropower reservoirs operation
optimization by adaptive mutation sine cosine algorithm based on neighborhood search and simplex search
strategies. Journal of Hydrology, 590(December 2019), 125223. https://doi.org/10.1016/j.jhydrol.2020.125223
Gandomi, A. H., & Yang, X. S. (2011). Benchmark problems in structural optimization. Studies in Computational
Intelligence, 356, 259–281. https://doi.org/10.1007/978-3-642-20859-1_12
García, S., Fernández, A., Luengo, J., & Herrera, F. (2010). Advanced nonparametric tests for multiple comparisons
in the design of experiments in computational intelligence and data mining: Experimental analysis of power.

Information Sciences, 180(10), 2044–2064. https://doi.org/10.1016/j.ins.2009.12.010
Geem, Kim, J. H., & Loganathan, G. V. (2001). A New Heuristic Optimization Algorithm: Harmony Search.
SIMULATION, 76(2), 60–68. https://doi.org/10.1177/003754970107600201
Ghosh, A., & Mukherjee, V. (2017). Temperature dependent optimal power flow. 2017 International Conference on
Technological Advancements in Power and Energy ( TAP Energy), 1–6.
https://doi.org/10.1109/TAPENERGY.2017.8397287
Guo, W. yan, Wang, Y., Dai, F., & Xu, P. (2020). Improved sine cosine algorithm combined with optimal
neighborhood and quadratic interpolation strategy. Engineering Applications of Artificial Intelligence, 94(July
2018), 103779. https://doi.org/10.1016/j.engappai.2020.103779
Gupta, S., & Deep, K. (2019a). A hybrid self-adaptive sine cosine algorithm with opposition based learning. Expert
Systems with Applications, 119, 210–230. https://doi.org/10.1016/j.eswa.2018.10.050
Gupta, S., & Deep, K. (2019b). Improved sine cosine algorithm with crossover scheme for global optimization.
Knowledge-Based Systems, 165, 374–406. https://doi.org/10.1016/j.knosys.2018.12.008
Gupta, S., Deep, K., & Engelbrecht, A. P. (2020). A memory guided sine cosine algorithm for global optimization.
Engineering Applications of Artificial Intelligence, 93(May), 103718.
https://doi.org/10.1016/j.engappai.2020.103718
Gupta, S., Deep, K., Mirjalili, S., & Kim, J. H. (2020). A modified Sine Cosine Algorithm with novel transition
parameter and mutation operator for global optimization. Expert Systems with Applications, 154.
https://doi.org/doi.org/10.1016/j.eswa.2020.113395
Heris, M. K. (2015). Simulated Annealing in MATLAB - Yarpiz. https://yarpiz.com/223/ypea105-simulated-
annealing
Heris, S. M. K. (2015). Implementation of artificial bee colony in MATLAB. Yarpiz, Project Code: YPEA114.
https://yarpiz.com/297/ypea114-artificial-bee-colony
Hernandez del Rio, A. A., Cuevas, E., & Zaldivar, D. (2020). Multi-level Image Thresholding Segmentation Using
2D Histogram Non-local Means and Metaheuristics Algorithms. In Studies in Computational Intelligence
(Vol. 890, pp. 121–149). Springer. https://doi.org/10.1007/978-3-030-40977-7_6
Hussain, K., Neggaz, N., Zhu, W., & Houssein, E. H. (2021). An efficient hybrid sine-cosine Harris hawks
optimization for low and high-dimensional feature selection. Expert Systems with Applications, 176(July
2020), 114778. https://doi.org/10.1016/j.eswa.2021.114778
Huynh, T. N., Do, D. T. T., & Lee, J. (2021). Q-Learning-based parameter control in differential evolution for
structural optimization. Applied Soft Computing, 107, 107464. https://doi.org/10.1016/j.asoc.2021.107464
Issa, M., Hassanien, A. E., Oliva, D., Helmi, A., Ziedan, I., & Alzohairy, A. (2018). ASCA-PSO: Adaptive sine
cosine optimization algorithm integrated with particle swarm for pairwise local sequence alignment. Expert
Systems with Applications, 99, 56–70. https://doi.org/10.1016/j.eswa.2018.01.019
Kamboj, V. K., Nandi, A., Bhadoria, A., & Sehgal, S. (2020). An intensify Harris Hawks optimizer for numerical
and engineering optimization problems. Applied Soft Computing Journal, 89, 106018.
https://doi.org/10.1016/j.asoc.2019.106018
Karaboga, D., & Basturk, B. (2007). A powerful and efficient algorithm for numerical function optimization:
artificial bee colony (ABC) algorithm. Journal of Global Optimization, 39(3), 459–471.
https://doi.org/10.1007/s10898-007-9149-x
Tang, K., Li, X., Suganthan, P. N., & Yang, Z. (2010). Benchmark functions for the CEC’2010 special session and
competition on large-scale global optimization. Technical report, Nature Inspired Computation and Applications
Laboratory, USTC, China.
Kennedy, J., & Eberhart, R. (1995). Particle swarm optimization. Proceedings of ICNN’95 - International
Conference on Neural Networks, 4, 1942–1948. https://doi.org/10.1109/ICNN.1995.488968

48
Khalilpourazari, S., & Khalilpourazary, S. (2018). SCWOA: an efficient hybrid algorithm for parameter
optimization of multi-pass milling process. Journal of Industrial and Production Engineering, 35(3), 135–147.
https://doi.org/10.1080/21681015.2017.1422040
Kirkpatrick, S., Gelatt, C. D., & Vecchi, M. P. (1983). Optimization by simulated annealing. Science, 220(4598),
671–680.
Long, W., Wu, T., Liang, X., & Xu, S. (2019). Solving high-dimensional global optimization problems using an
improved sine cosine algorithm. Expert Systems with Applications, 123, 108–126.
https://doi.org/10.1016/j.eswa.2018.11.032
Mahdad, B., & Srairi, K. (2018). A new interactive sine cosine algorithm for loading margin stability improvement
under contingency. Electrical Engineering, 100(2), 913–933. https://doi.org/10.1007/s00202-017-0539-x
Majhi, S. K. (2018). An Efficient Feed Foreword Network Model with Sine Cosine Algorithm for Breast Cancer
Classification. International Journal of System Dynamics Applications, 7(2), 1–14.
https://doi.org/10.4018/IJSDA.2018040101
Mirjalili, S. (2015). Moth-flame optimization algorithm: A novel nature-inspired heuristic paradigm. Knowledge-
Based Systems, 89, 228–249. https://doi.org/10.1016/j.knosys.2015.07.006
Mirjalili, S. (2016). SCA: A Sine Cosine Algorithm for solving optimization problems. Knowledge-Based Systems,
96, 120–133. https://doi.org/10.1016/j.knosys.2015.12.022
Mirjalili, S., Mirjalili, S. M., & Hatamlou, A. (2016). Multi-Verse Optimizer: a nature-inspired algorithm for global
optimization. Neural Computing and Applications, 27(2), 495–513. https://doi.org/10.1007/s00521-015-1870-
7
Nenavath, H., & Jatoth, R. K. (2018). Hybridizing sine cosine algorithm with differential evolution for global
optimization and object tracking. Applied Soft Computing, 62, 1019–1043.
https://doi.org/10.1016/j.asoc.2017.09.039
Nenavath, H., & Jatoth, R. K. (2019). Hybrid SCA–TLBO: a novel optimization algorithm for global optimization
and visual tracking. Neural Computing and Applications, 31(9), 5497–5526. https://doi.org/10.1007/s00521-
018-3376-6
Nenavath, H., Kumar Jatoth, D. R., & Das, D. S. (2018). A synergy of the sine-cosine algorithm and particle swarm
optimizer for improved global optimization and object tracking. Swarm and Evolutionary Computation,
43(March), 1–30. https://doi.org/10.1016/j.swevo.2018.02.011
Pham, D. T., Ghanbarzadeh, A., Koc, E., Otri, S., Rahim, S., & Zaidi, M. (2005). The bees algorithm. Technical
Note, Manufacturing Engineering Centre, Cardiff University, UK.
Qu, C., Zeng, Z., Dai, J., Yi, Z., & He, W. (2018). A Modified Sine-Cosine Algorithm Based on Neighborhood
Search and Greedy Levy Mutation. Computational Intelligence and Neuroscience, 2018, 1–19.
https://doi.org/10.1155/2018/4231647
Rao, R. V., & Waghmare, G. G. (2017). A new optimization algorithm for solving complex constrained design
optimization problems. Engineering Optimization, 49(1), 60–83.
https://doi.org/10.1080/0305215X.2016.1164855
Ray, T., & Saini, P. (2001). Engineering design optimization using a swarm with an intelligent information sharing
among individuals. Engineering Optimization, 33(6), 735–748. https://doi.org/10.1080/03052150108940941
Reynolds, R. G. (1994). An introduction to cultural algorithms. Proceedings of the Third Annual Conference on
Evolutionary Programming, 24, 131–139.
Rizk-Allah, R. M. (2018). Hybridizing sine cosine algorithm with multi-orthogonal search strategy for engineering
design problems. Journal of Computational Design and Engineering, 5(2), 249–273.
https://doi.org/10.1016/j.jcde.2017.08.002

Rizk-Allah, R. M. (2019). An improved sine–cosine algorithm based on orthogonal parallel information for global
optimization. Soft Computing, 23(16), 7135–7161. https://doi.org/10.1007/s00500-018-3355-y
Samma, H., Lim, C. P., & Mohamad Saleh, J. (2016). A new Reinforcement Learning-based Memetic Particle
Swarm Optimizer. Applied Soft Computing Journal, 43, 276–297. https://doi.org/10.1016/j.asoc.2016.01.006
Samma, H., Mohamad-Saleh, J., Suandi, S. A., & Lahasan, B. (2020). Q-learning-based simulated annealing
algorithm for constrained engineering design problems. Neural Computing and Applications, 32(9), 5147–
5161. https://doi.org/10.1007/s00521-019-04008-z
Saremi, S., Mirjalili, S., & Lewis, A. (2017). Grasshopper Optimisation Algorithm: Theory and application.
Advances in Engineering Software, 105, 30–47. https://doi.org/10.1016/j.advengsoft.2017.01.004
Sarwagya, K., Nayak, P. K., & Ranjan, S. (2020). Optimal coordination of directional overcurrent relays in complex
distribution networks using sine cosine algorithm. Electric Power Systems Research, 187, 106435.
https://doi.org/10.1016/j.epsr.2020.106435
Sharma, T. K., & Abraham, A. (2019). Artificial bee colony with enhanced food locations for solving mechanical
engineering design problems. Journal of Ambient Intelligence and Humanized Computing.
https://doi.org/10.1007/s12652-019-01265-7
Singh, N., & Singh, S. B. (2017). A novel hybrid GWO-SCA approach for optimization problems. Engineering
Science and Technology, an International Journal, 20(6), 1586–1601.
https://doi.org/10.1016/j.jestch.2017.11.001
Singh, Narinder, Son, L. H., Chiclana, F., & Magnot, J.-P. (2020). A new fusion of salp swarm with sine cosine for
optimization of non-linear functions. Engineering with Computers, 36(1), 185–212.
https://doi.org/10.1007/s00366-018-00696-8
Socha, K., & Dorigo, M. (2008). Ant colony optimization for continuous domains. European Journal of Operational
Research, 185(3), 1155–1173. https://doi.org/10.1016/j.ejor.2006.06.046
Steven, G. (2002). Evolutionary algorithms for single and multicriteria design optimization by A. Osyczka.
Structural and Multidisciplinary Optimization, 24(1), 88.
Storn, R., & Price, K. (1997). Differential Evolution - A Simple and Efficient Heuristic for Global Optimization
over Continuous Spaces. Journal of Global Optimization, 11(4), 341–359.
https://doi.org/10.1023/A:1008202821328
Suganthan, P. N., Hansen, N., Liang, J., & Deb, K. (2005). Problem Definitions and Evaluation Criteria for the
CEC 2005 Special Session on Real-Parameter Optimization. May 2014.
https://www.researchgate.net/profile/Ponnuthurai_Suganthan/publication/235710019_Problem_Definitions_an
d_Evaluation_Criteria_for_the_CEC_2005_Special_Session_on_Real-
Parameter_Optimization/links/0c960525d3990de15c000000/Problem-Definitions-and-Evaluation-
Suid, M. H., Ahmad, M. A., Ismail, M. R. T. R., Ghazali, M. R., Irawan, A., & Tumari, M. Z. (2018). An Improved
Sine Cosine Algorithm for Solving Optimization Problems. 2018 IEEE Conference on Systems, Process and
Control (ICSPC), December, 209–213. https://doi.org/10.1109/SPC.2018.8703982
Sutton, R. S., Precup, D., & Singh, S. (1999). Between MDPs and semi-MDPs: A framework for temporal
abstraction in reinforcement learning. Artificial Intelligence, 112(1), 181–211. https://doi.org/10.1016/S0004-
3702(99)00052-1
Tang, K., Li, Z., Luo, L., & Liu, B. (2015). Multi-strategy adaptive particle swarm optimization for numerical
optimization. Engineering Applications of Artificial Intelligence, 37, 9–19.
https://doi.org/10.1016/j.engappai.2014.08.002
Thanedar, P. B., & Vanderplaats, G. N. (1995). Survey of Discrete Variable Optimization for Structural Design.
Journal of Structural Engineering, 121(2), 301–306. https://doi.org/10.1061/(ASCE)0733-
9445(1995)121:2(301)

Watkins, C. J. C. H., & Dayan, P. (1992). Q-learning. Machine Learning, 8(3–4), 279–292.
https://doi.org/10.1007/BF00992698
Yang, X.-S., & Gandomi, A. H. (2012). Bat algorithm: A novel approach for global engineering optimization.
Engineering Computations, 29(5), 464–483. https://doi.org/10.1108/02644401211235834
Zamli, K. Z., Din, F., Ahmed, B. S., & Bures, M. (2018). A hybrid Q-learning sine-cosine-based strategy for
addressing the combinatorial test suite minimization problem. PLOS ONE, 13(5), e0195675.
https://doi.org/10.1371/journal.pone.0195675
Zhang, J., Zhou, Y., & Luo, Q. (2018). An improved sine cosine water wave optimization algorithm for global
optimization. Journal of Intelligent & Fuzzy Systems, 34(4), 2129–2141. https://doi.org/10.3233/JIFS-171001
Zhou, W., Wang, P., Heidari, A. A., Wang, M., Zhao, X., & Chen, H. (2021). Multi-core sine cosine optimization:
Methods and inclusive analysis. Expert Systems with Applications, 164, 113974.
https://doi.org/10.1016/j.eswa.2020.113974

CRediT Author Statement (ESWA-D-21-00802)

Qusay Shihab Hamad: Conceptualization, Methodology, Software, Validation, Formal Analysis,
Investigation, Writing – Original Draft

Hussein Samma: Conceptualization, Validation, Resources, Supervision

Shahrel Azmin Suandi: Conceptualization, Writing – Review & Editing, Supervision, Project
Administration, Funding Acquisition

Junita Mohamad-Saleh: Formal Analysis, Resources

Q-Learning Embedded Sine Cosine Algorithm (QLESCA) ORCID Information

1. Qusay Shihab Hamad - 0000-0002-8699-2586
2. Hussein Samma - 0000-0002-3562-2788
3. Shahrel Azmin Suandi * - 0000-0001-9980-7426 (Corresponding author)
4. Junita Mohamad-Saleh - 0000-0003-3447-6050

Declaration of interests

☒ The authors declare that they have no known competing financial interests or personal relationships
that could have appeared to influence the work reported in this paper.

☐The authors declare the following financial interests/personal relationships which may be considered
as potential competing interests:

