
Multi-Algorithm Optimization

Hugo Hernandez
ForsChem Research, 050030 Medellin, Colombia
hugo.hernandez@forschem.org

doi: 10.13140/RG.2.2.21772.49284

Abstract

The No Free Lunch (NFL) Theorem states that the average success rate of all optimization
algorithms is basically the same, considering that certain algorithms work well for some types
of problems, but fail for other types of problems. Another interpretation of the NFL Theorem is
that “there is no universal optimizer”, capable of successfully and efficiently solving any type of
optimization problem. In this report, a Multi-Algorithm Optimization strategy is presented
which allows increasing the average success rate at a reasonable cost, by running a sequence
of different optimization algorithms, starting from multiple random points. Optimization of
different benchmark problems performed with this algorithm illustrated that the particular
sequence employed and the number of starting points greatly influence the success rate and
the cost of the optimization. A suggested sequence, consisting of the Broyden-Fletcher-Goldfarb-Shanno,
Nelder-Mead, and adaptive step-size One-at-a-time optimization algorithms applied from multiple
random starting points, achieved a high overall success rate with a reasonable average optimization
time for a benchmark set of 25 global optimization problems. The proposed method (implemented
in R language) is included in the Appendix.

Keywords

Broyden-Fletcher-Goldfarb-Shanno, Global Optimization, Nelder-Mead, No Free Lunch


Theorem, One-at-a-time, Randomistic, Simulated Annealing, Success Rate, Universal Optimizer

1. Introduction

Optimization refers to any procedure used to find the best values of a certain set of decision
variables that optimizes (maximizes or minimizes) a certain objective function (which is a
function of a particular set of decision variables) [1]. The main purpose of any optimization
algorithm is solving any optimization problem in an efficient way, that is, at a reasonable cost
(considering for example, time, computational resources, etc.).


There are currently hundreds of different optimization algorithms available in the scientific
literature [2]. Unfortunately, there is not a single best optimization algorithm. According to the
No Free Lunch (NFL) Theorem [3,4], the average performance of all optimization algorithms is
basically the same, as the improvement in performance in certain types of optimization
problems is compensated by the decrease in performance in other types of problems. In simple
words, the NFL theorem states that ‘‘universal optimizers are impossible.’’ [5]

The NFL Theorem is graphically illustrated in Figure 1, showing the success rate obtained using
6 different optimization algorithms for the different benchmark problems considered in a
previous report [6]. While some methods may show a higher average success rate, for all
methods there are always optimization problems with a very low success rate, as well as
problems with a very high success rate. Also, the differences in average success rates between
methods are usually not significant from a statistical point of view. In addition, higher average
success rates were typically obtained at the expense of higher optimization costs.

Figure 1. Success Rate of different optimization algorithms. Black dots: Success rate of
individual benchmark problems (1000 random runs). Green diamond: Sample average success
rate. Dotted red lines: 95% confidence intervals in the estimation of the mean success rate. NM:
Nelder-Mead. BFGS: Broyden-Fletcher-Goldfarb-Shanno. BFGS-UB: BFGS unbounded. SANN:
Simulated Annealing. OAT: Adaptive step-size One-at-a-time. OAT-UB: OAT unbounded.

The nature of the various numerical optimization algorithms available is also different.
Optimization algorithms can be classified, in a very general way, as depicted in Figure 2. First of
all, we can distinguish between methods that calculate the gradient of the function and
methods that do not require the determination of the gradient. While the gradient of a
function may indicate the most efficient route towards improving the objective function, it may
also result in the stagnation of the algorithm at local optima. In that sense, non-gradient based
methods allow overcoming such limitation. Non-gradient based methods can also be classified
into deterministic and random (or stochastic) methods, depending on the search strategy
employed. Deterministic search rules will lead to the exact same result every time the same
starting point is used for the optimization. Search rules based on random numbers, on the
contrary, may yield different results when the same starting point is considered. A particular
category of non-gradient based algorithms, presented in Figure 2, is the randomistic algorithm,
involving both deterministic and random search rules in the same algorithm. The Adaptive step-
size One-at-a-time (OAT) algorithm presented in a previous report is an example of a
randomistic optimization method [6].

Figure 2. General Classification of Numerical Optimization Algorithms

Gradient-based methods are quite successful in the case of relatively simple optimization
functions (i.e. convex functions), where a local optimum is also a global optimum. For non-
convex optimization problems, gradient-based methods may fail to find the global optimum.

Random methods usually have a higher rate of success compared to deterministic methods,
particularly for non-convex problems, due to the exploratory nature of the random search
strategy. However, this also implies a higher optimization cost, in terms of both the number of
function evaluations and the optimization time.

The purpose of the present report is to explore the possibility of increasing the average success
rate of optimization at a reasonable cost, by combining different optimization algorithms in a
multi-algorithm optimization approach.

Section 2 describes the proposed multi-algorithm optimization method. Section 3 explains the
methodology employed to evaluate optimization performance for the multi-algorithm
optimization using different permutations of algorithms. Section 4 summarizes and discusses
the results obtained with the multi-algorithm optimization method. Finally, the Appendix
includes the corresponding functions implemented in R language (https://cran.r-project.org/).

2. Multi-Algorithm Optimization Method

The idea behind the multi-algorithm optimization method is relatively simple. It basically
consists of the execution of multiple optimization procedures in series, where the best result
obtained after each optimization procedure is used as the starting point of the next algorithm.
This strategy is depicted in Figure 3.

Figure 3. Multi-Algorithm Optimization Strategy

Particularly in this report, only 4 different optimization algorithms are considered. They can be
considered as representative algorithms of the different categories of algorithms previously
discussed. A brief description of each algorithm is presented next.

2.1. Gradient-based Method

The gradient-based method considered is the bounded Broyden-Fletcher-Goldfarb-Shanno
(BFGS) algorithm [7]. The main steps of this algorithm are the following:

 Determine the Cauchy point (x^C) by solving the quadratic sub-problem (subject to the bounds on the decision variables):

m_k(x) = f(x_k) + ∇f(x_k)^T (x - x_k) + (1/2) (x - x_k)^T B_k (x - x_k)
(2.1)

where x_k is the value of the decision variables at the k-th iteration, f is the objective function, ∇f(x_k) is the gradient
and B_k is an approximation to the Hessian.
 Compute a search direction d_k. Three different methods are considered: the direct primal
method, the conjugate gradient method or the dual method.
 Perform a line search along d_k, subject to the bounds, using a step length λ_k as follows:

x_{k+1} = x_k + λ_k d_k
(2.2)
 Compute the objective function and the gradient at the new point, f(x_{k+1}) and ∇f(x_{k+1}).
 Check for convergence, update the parameter values, set k = k + 1 and return to the first
step.

The BFGS algorithm is called in R (https://cran.r-project.org/) through the function optim
from the stats R package (version 3.6.2), by specifying the text “L-BFGS-B” as the method
input argument.
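As an illustration (not part of the original report), a bounded BFGS run on a two-variable version of the Rosenbrock benchmark (see Appendix A.3) could look as follows; the starting point and bounds are arbitrary choices for this sketch:

#Illustrative call of the bounded BFGS method through optim
rosenbrock2<-function(x){
  100*(x[2]-x[1]^2)^2+(1-x[1])^2  #two-variable Rosenbrock function (global optimum at c(1,1))
}
res=optim(par=c(-1.2,1),fn=rosenbrock2,method="L-BFGS-B",lower=c(-5,-5),upper=c(5,5))
res$par    #best decision variables found
res$value  #best objective function value found
res$counts #number of function and gradient evaluations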

2.2. Non-gradient Deterministic Method

The representative method chosen in this case is the Nelder-Mead (NM) algorithm [8], also
known as downhill simplex method. A general description of the method is the following:

 Set the initial position of all vertices of the simplex at a fixed step along each dimension
of the problem.
 Evaluate the objective function at the vertices and establish a hierarchy according to
their values, where x_l is the best and x_h is the worst vertex.
 Determine the centroid (x̄) of the vertices excluding the worst vertex x_h.
 Use the following operations for determining the new vertex:
o Reflection: This is the starting operation. A new vertex (x_r) is proposed by:

x_r = (1 + α) x̄ - α x_h
(2.3)
where α is the reflection coefficient.
o Expansion: If reflection produced a new optimum, then the vertex is expanded
to:

x_e = γ x_r + (1 - γ) x̄
(2.4)
where γ is the expansion coefficient.
o Contraction: On the other hand, if the reflected vertex remains as the worst,
then the new vertex is contracted into:

x_c = β x_h + (1 - β) x̄
(2.5)
where β is the contraction coefficient.

 The worst vertex is then updated by the new position (either x_r, x_e or x_c, depending on
the particular case), and the procedure is repeated until convergence or any other
termination criterion is met.

The NM algorithm is called in R (https://cran.r-project.org/) through the function optim
from the stats R package (version 3.6.2), by specifying the text “Nelder-Mead” as the method
input argument, or no method argument at all (NM is the default method).
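As an illustrative sketch (not from the original report), the same optim function runs Nelder-Mead when no bounds are required; the Himmelblau benchmark from Appendix A.3 and the starting point below are arbitrary choices:

#Illustrative call of the Nelder-Mead method (default method of optim)
himmelblau<-function(x){
  (x[1]^2+x[2]-11)^2+(x[1]+x[2]^2-7)^2
}
res=optim(par=c(0,0),fn=himmelblau,method="Nelder-Mead")
res$par    #one of the optima of the Himmelblau function, depending on the starting simplex
res$value  #best objective function value found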

2.3. Non-gradient Random Method

Perhaps one of the most representative methods for non-gradient stochastic search methods
is Simulated Annealing (SANN) [9]. SANN has been inspired by the Metropolis-Monte Carlo
simulation method used in molecular modeling [10]. The main steps of this method are:

 Set an initial state (x_0), an initial temperature (T_0), and a cooling schedule, where the
temperature of the system is a function of the iteration number.
 For each new iteration i, select a random candidate (x_cand) from a given probability
distribution function. Typically, a Markov kernel is considered, where the new
candidate randomly deviates from the current state (x_i), as follows:

x_cand = x_i + ε_i
(2.6)
where ε_i is a random deviation term following a probability model (e.g. normal,
uniform, etc.).
 Calculate the probability of acceptance for the candidate state as follows:

P_i = min{1, exp(-(f(x_cand) - f(x_i))/T_i)}
(2.7)
The sign of the exponent is negative for minimization problems, and positive for
maximization problems. f is the objective function.
 Calculate a uniform random number (u) to evaluate candidate acceptance:

x_{i+1} = x_cand if u ≤ P_i; x_{i+1} = x_i otherwise
(2.8)
 Update the temperature of the system (T_{i+1}) using the pre-defined cooling schedule.
 Repeat the procedure until any termination criterion is achieved.


The SANN algorithm is called in R (https://cran.r-project.org/) through the function optim
from the stats R package (version 3.6.2), by specifying the text “SANN” as the method
input argument.
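As an illustrative sketch (not from the original report), a SANN run through optim could look as follows; in R, the SANN method is controlled by maxit (total number of function evaluations, 10000 by default), temp (starting temperature) and tmax (evaluations per temperature), and the result is stochastic:

#Illustrative call of Simulated Annealing through optim (stochastic: results vary between runs)
rastrigin2<-function(x){
  10*length(x)+sum(x^2-10*cos(2*pi*x))  #vectorized Rastrigin function (global optimum at the origin)
}
set.seed(123)  #fixing the seed only to make this sketch reproducible
res=optim(par=c(3,3),fn=rastrigin2,method="SANN",control=list(maxit=10000,temp=10))
res$par    #candidate optimum (SANN does not refine the solution to high precision)
res$value  #best objective function value found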

2.4. Non-gradient Randomistic Method

The non-gradient randomistic method selected for the present analysis is the Adaptive step-size
One-at-a-time (OAT) introduced in a previous report [6]. The method procedure is the
following:

 A minimum step size (Δmin,j) and an initial step size (Δ0,j) are defined for each decision variable
(or dimension) j.
 For each optimization cycle, the evaluation order for the decision variables is randomly
determined.
 For each dimension, the new value of the decision variable is calculated as:

x_j,new = x_j* + d Δ_j
(2.9)
where d = ±1 (search direction), Δ_j is the current step size, and x_j* is the value of the j-th
variable at the current best point.
 If the new point improves the objective function, the current best point is updated and
the algorithm is accelerated by setting Δ_j ← 2 Δ_j. Otherwise, the new step size is
randomly decreased:

Δ_j ← Δmin,j ⟦Δ_j / ((2 + 8u) Δmin,j)⟧
(2.10)
where u is a uniform random number, and ⟦ ⟧ is the rounding to the closest integer
operator.
 When Δ_j ≤ Δmin,j, the search direction is switched (only once), otherwise the next
dimension is considered.
 The whole cycle is repeated until any termination criterion is achieved.

The OAT algorithm is called in R (https://cran.r-project.org/) through the function
OAToptim shown in the Appendix.
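As an illustrative example (not part of the original report), OAToptim can be called with the same arguments later used inside maoptim; the bounds, starting point and step sizes below are arbitrary choices:

#Illustrative call of the adaptive step-size OAT optimizer (requires OAToptim from Appendix A.1)
sphere2<-function(x){sum(x^2)}  #sphere function, global optimum at the origin
res=OAToptim(fun=sphere2,x0=c(2,-3),lower=c(-5,-5),upper=c(5,5),step0=c(2,2),
             stepmin=1e-6,ncycles=1000,tol=1e-6,MCcheck=30,display=FALSE)
res[[1]]  #best point found
res[[2]]  #best objective function value
res[[3]]  #number of function evaluations
res[[4]]  #optimization time (s)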

2.5. maoptim Function

The integration of the four algorithms into a single optimization procedure has been
implemented in R language using the following preliminary maoptim function:


#Multi-Algorithm Optimization Function


maoptim<-function(par,fn,gr=NULL,method=c(1,2,3,4),lower=-Inf,upper=Inf,control=list(),hessian=FALSE){
  t0=Sys.time()
  methodv=method
  N=length(methodv)
  counts=0
  for (i in 1:N){
    if (methodv[i]==1){
      OUT=optim(par=par,fn=fn,gr=gr,method="L-BFGS-B",lower=lower,upper=upper,
                control=control,hessian=hessian)
      par=OUT$par
      value=OUT$value
      counts=counts+OUT$counts[1]
    }
    if (methodv[i]==2){
      OUT=optim(par=par,fn=fn,gr=gr,method="Nelder-Mead",control=control,hessian=hessian)
      par=OUT$par
      value=OUT$value
      counts=counts+OUT$counts[1]
    }
    if (methodv[i]==3){
      OUT=optim(par=par,fn=fn,gr=gr,method="SANN",control=control,hessian=hessian)
      par=OUT$par
      value=OUT$value
      counts=counts+OUT$counts[1]
    }
    if (methodv[i]==4){
      OUT=OAToptim(fun=fn,x0=par,lower=lower,upper=upper,step0=(upper-lower)/5,
                   stepmin=1e-6,ncycles=1000,tol=1e-6,MCcheck=30)
      par=OUT[[1]]
      value=OUT[[2]]
      counts=counts+OUT[[3]]
    }
  }
  optime=Sys.time()-t0
  return(list(par,value,counts,optime))
}

The procedure employs as input argument a vector of integer numbers between 1 and 4,
denoting each of the representative methods described earlier, in the same order:

1. Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm


2. Nelder-Mead (NM) algorithm
3. Simulated Annealing (SANN) algorithm
4. Adaptive step-size One-at-a-time (OAT) algorithm

Notice that the length of the vector can be arbitrarily defined, as well as the methods
employed. Thus, any algorithm can be repeatedly used, and any algorithm can be left out of the
evaluation. By default, all algorithms are performed only once in the corresponding arbitrarily
pre-specified order (1, 2, 3, 4).


The output of this function includes: i) The best parameter values obtained after the
procedure, ii) the best objective value found by minimization, iii) the total number of function
evaluations performed, and iv) the total duration of the optimization procedure.
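As an illustrative usage example (not from the original report), the preliminary maoptim function can chain, for instance, BFGS, NM and OAT (skipping SANN) on the Booth benchmark from Appendix A.3; the starting point and bounds are arbitrary choices:

#Illustrative call of the preliminary multi-algorithm optimizer
#Requires the maoptim function above and OAToptim from Appendix A.1
booth<-function(x){(x[1]+2*x[2]-7)^2+(2*x[1]+x[2]-5)^2}  #global optimum at c(1,3)
res=maoptim(par=c(5,-5),fn=booth,method=c(1,2,4),lower=c(-10,-10),upper=c(10,10))
res[[1]]  #best parameter values found
res[[2]]  #best objective function value
res[[3]]  #total number of function evaluations
res[[4]]  #total optimization time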

3. Performance Evaluation Methodology

All the optimization algorithms employed were run using the above-mentioned R functions
always considering their corresponding default parameters. The optimization procedures were
evaluated using an Intel® Core™ i5-2400S processor @ 2.50 GHz with 8 GB RAM.

The performance of each arbitrary array of methods, considering the algorithms previously
described, was evaluated using the same metrics reported in [6] for the 25 benchmark
optimization problems (described in the Appendix). The metrics considered include:

 Average Euclidean distance (⟨d⟩) from the true optimum (x_opt). The average Euclidean
distance to the optimum is determined as follows:

⟨d⟩ = (1/R) Σ_{i=1}^{R} d_i
(3.1)
where d_i is the Euclidean distance of the best result found in the i-th of R
replicates, with respect to the true optimum known for the benchmark function, given
by:

d_i = √(Σ_{j=1}^{n} (x*_{i,j} - x_{opt,j})^2)
(3.2)
In the case of multiple global optima, the Euclidean distance is considered with respect
to the closest optimum.

 Average best objective function value (⟨f*⟩):

⟨f*⟩ = (1/R) Σ_{i=1}^{R} f*_i
(3.3)

 Success rate (SR). A solution of the optimization algorithm will be considered successful
when the Euclidean distance d_i is less than or equal to a threshold distance
d_max = √(Σ_{j=1}^{n} δ_j^2), where δ_j is the tolerance allowed in the j-th dimension [6].

SR = (1/R) Σ_{i=1}^{R} s_i
(3.4)
where
s_i = 1 if d_i ≤ d_max; s_i = 0 otherwise
(3.5)

 Average number of function evaluations (⟨NFE⟩):

⟨NFE⟩ = (1/R) Σ_{i=1}^{R} NFE_i
(3.6)
where NFE_i is the number of function evaluations of each solution, determined and
reported by the optimization algorithm.

 Average computation time (⟨t⟩):

⟨t⟩ = (1/R) Σ_{i=1}^{R} t_i
(3.7)
where t_i was determined in R for each replicate using the function difftime.
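As a minimal sketch (not part of the original report), assuming the best points of R replicates are stored in a matrix X (one row per replicate), together with vectors of best objective values, function evaluation counts and computation times, the above metrics could be computed as follows; the threshold dmax is whatever success criterion is adopted:

#Illustrative computation of the performance metrics (Eqs. 3.1 to 3.7)
#X: R x n matrix of best points; f: best objective values; nfe: function evaluation counts
#tt: computation times (s); xopt: known global optimum; dmax: success threshold distance
performance<-function(X,f,nfe,tt,xopt,dmax){
  d=apply(X,1,function(x) sqrt(sum((x-xopt)^2)))  #Euclidean distances, Eq. (3.2)
  list(dist=mean(d),      #Eq. (3.1)
       fbest=mean(f),     #Eq. (3.3)
       SR=mean(d<=dmax),  #Eqs. (3.4)-(3.5)
       NFE=mean(nfe),     #Eq. (3.6)
       time=mean(tt))     #Eq. (3.7)
}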

4. Results Analysis and Discussion

4.1. Effect of the Permutation of Algorithms

In the first place, a full factorial design was employed to test the effect of the particular array of
methods considered in the multi-algorithm optimization. Only 4 optimization steps were
considered (arrays with a length of 4), where each of the 4 algorithms can be used at each step.
This leads to a 4^4 factorial design with 256 different permutations. All 25 benchmark optimization
problems were evaluated with each permutation, considering 100 random sets of starting
points. Of course, the same starting points were evaluated using each permutation.

Particularly for this design, 4 permutations involved a single algorithm only (i.e. [1,1,1,1], [2,2,2,2],
[3,3,3,3] and [4,4,4,4]), 84 permutations involved only two different algorithms, 144 permutations
involved three different algorithms, and 24 permutations involved all four algorithms at the same
time.
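These counts follow directly from the 4^4 design, and can be verified with a short sketch like the following (not part of the original report):

#Illustrative enumeration of the 256 permutations of 4 algorithms over 4 steps
perms=expand.grid(step1=1:4,step2=1:4,step3=1:4,step4=1:4)
ndiff=apply(perms,1,function(p) length(unique(p)))  #number of different algorithms per sequence
table(ndiff)  #4 sequences with 1 algorithm, 84 with 2, 144 with 3, and 24 with all 4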

The overall success rate values obtained for the 256 different permutations are illustrated in
Figure 4, grouped by the number of different algorithms considered. A significant trend is
observed in the data, indicating that effectively, the combination of different optimization
strategies leads to an increase in the success rate. Figure 5 shows the corresponding effect of
the number of different algorithms on the average number of total function evaluations and
average optimization times. These plots illustrate that the average optimization cost is not
significantly influenced by increasing the number of different algorithms (considering the same
number of steps). However, an increase in diversity reduces the fluctuations in the results.
Notice that the optimization cost observed in these results is strongly influenced by the SANN
algorithm. Table 1 summarizes the results shown in Figure 4 and Figure 5.

Figure 4. Overall Success Rate as a function of the number of different algorithms employed.
Black dots: Success rate of individual permutations considering the 25 benchmark problems (100
random runs). Green diamond: Average success rate. Dotted red lines: 95% confidence
intervals in the estimation of the mean success rate.

Figure 5. Total number of function evaluations (left) and average optimization time (right) as
functions of the number of different algorithms employed. Black dots: Performance of
individual permutations considering the 25 benchmark problems (100 random runs). Green
diamond: Average value. Dotted red lines: 95% confidence intervals in the estimation of
the mean value.


Table 1. Estimates of the average and standard deviation of the average of performance criteria
as a function of the number of different algorithms
# Different Algorithms | Success Rate: Average / Avg. St. Dev. | # Function Evaluations: Average / Avg. St. Dev. | Optimization time (s): Average / Avg. St. Dev.
1 40.5% 5.81% 10411 9865.9 0.1106 0.0877
2 49.6% 0.98% 10407 1229.0 0.1117 0.0111
3 54.8% 0.53% 10412 584.0 0.1131 0.0054
4 59.0% 0.40% 10416 62.4 0.1146 0.0032

Notice that SANN alone has a default maximum of 10000 function evaluations, which
are completed in all optimizations. Thus, while SANN performs 10000 function evaluations in
the optimization cycle, all other algorithms combined perform on average only about 400 function
evaluations, representing only about 4% of the total optimization cost.

Table 2. Performance of the 24 different algorithm sequences employing the 4 different optimization
algorithms. Red values: Worst performance. Green values: Best performance.
Algorithm Sequence | Average Success Rate (SR) | Average Optimization Time ⟨t⟩ (s) | SR/⟨t⟩ Performance Ratio
[NM,BFGS,SANN,OAT] 55.4% 0.1062 5.2127
[NM,BFGS,OAT,SANN] 57.3% 0.1088 5.2676
[NM,SANN,BFGS,OAT] 57.4% 0.1043 5.5000
[NM,SANN,OAT,BFGS] 58.4% 0.1048 5.5734
[NM,OAT,BFGS,SANN] 58.0% 0.1083 5.3544
[NM,OAT,SANN,BFGS] 59.2% 0.1060 5.5897
[BFGS,NM,SANN,OAT] 58.2% 0.1050 5.5401
[BFGS,NM,OAT,SANN] 59.1% 0.1082 5.4641
[BFGS,SANN,NM,OAT] 56.9% 0.1045 5.4387
[BFGS,SANN,OAT,NM] 58.5% 0.1044 5.6017
[BFGS,OAT,NM,SANN] 59.5% 0.1098 5.4191
[BFGS,OAT,SANN,NM] 62.0% 0.1087 5.7090
[SANN,NM,BFGS,OAT] 57.0% 0.1044 5.4628
[SANN,NM,OAT,BFGS] 57.4% 0.1038 5.5293
[SANN,BFGS,NM,OAT] 57.9% 0.1035 5.5968
[SANN,BFGS,OAT,NM] 59.5% 0.1033 5.7577
[SANN,OAT,NM,BFGS] 57.6% 0.1057 5.4513
[SANN,OAT,BFGS,NM] 60.3% 0.1062 5.6767
[OAT,NM,BFGS,SANN] 59.1% 0.1416 4.1723
[OAT,NM,SANN,BFGS] 60.6% 0.1388 4.3641
[OAT,BFGS,NM,SANN] 58.8% 0.1408 4.1737
[OAT,BFGS,SANN,NM] 61.0% 0.1438 4.2449
[OAT,SANN,NM,BFGS] 61.6% 0.1394 4.4158
[OAT,SANN,BFGS,NM] 64.3% 0.1390 4.6274

From these results we may also conclude that algorithm diversity improves success rate,
partially overcoming the limitations described by the NFL theorem, while at the same time
decreasing the fluctuation in optimization costs due to differences between individual
algorithms.

Table 2 summarizes the performance of the 24 permutations considering all 4 optimization
algorithms. The performance results include the average success rate (SR), the average
optimization time (⟨t⟩), and the ratio between success rate and optimization time (SR/⟨t⟩). This
ratio is used to consider the combined effect of success rate and optimization time. The
number of function evaluations was not included as it is closely related to the optimization time.

The highest success rate was obtained by the algorithm sequence [OAT,SANN,BFGS,NM], while
the lowest cost and the highest SR/⟨t⟩ ratio were obtained by the [SANN,BFGS,OAT,NM] sequence.
Only this pair of sequences will be considered for the next analysis.

4.2. Effect of the Number of Cycles

Following the idea of increasing the success rate, the algorithm sequences showing the best
performance ([OAT,SANN,BFGS,NM] and [SANN,BFGS,OAT,NM]) were evaluated considering
multiple consecutive optimization cycles. That is, the whole algorithm sequence was repeated
using the best point obtained by each cycle as the starting point for the next cycle. The
evaluation was performed for up to 5 consecutive cycles for each sequence. All 25 benchmark
optimization problems were considered again, this time using new random sets of starting
points. The results obtained are summarized in Table 3 and Figure 6.

Table 3. Performance of up to 5 consecutive cycles for the two best algorithm sequences employing
all 4 optimization algorithms.
Algorithm Sequence | Number of Cycles (c) | Average Success Rate (SR) | Average Optimization Time ⟨t⟩ (s) | SR/⟨t⟩ Performance Ratio
[OAT,SANN,BFGS,NM] 1 61.7% 0.1391 4.4370
[OAT,SANN,BFGS,NM] 2 67.3% 0.2347 2.8670
[OAT,SANN,BFGS,NM] 3 69.8% 0.3316 2.1052
[OAT,SANN,BFGS,NM] 4 69.9% 0.4285 1.6314
[OAT,SANN,BFGS,NM] 5 71.1% 0.5246 1.3549
[SANN,BFGS,OAT,NM] 1 59.7% 0.0997 5.9838
[SANN,BFGS,OAT,NM] 2 66.5% 0.1980 3.3593
[SANN,BFGS,OAT,NM] 3 70.0% 0.2953 2.3697
[SANN,BFGS,OAT,NM] 4 71.1% 0.3929 1.8099
[SANN,BFGS,OAT,NM] 5 72.4% 0.4898 1.4790

First of all, we may notice that the [OAT,SANN,BFGS,NM] sequence only shows a higher
success rate compared to [SANN,BFGS,OAT,NM] for fewer than 3 cycles, while
[SANN,BFGS,OAT,NM] always shows shorter optimization times compared to
[OAT,SANN,BFGS,NM], independently of the number of cycles.


Figure 6. Overall success rate (left) and average optimization time (right) as functions of the
number of cycles of optimization sequences. Light blue diamonds: [OAT,SANN,BFGS,NM]. Dark
blue circles: [SANN,BFGS,OAT,NM].

In the second place, the optimization time increases linearly with the number of cycles, as
expected. On the other hand, while the success rate increases with the number of cycles, such
increase is less than expected. If we assume that the performance of each cycle is independent
from the previous cycles, we would expect that:

SR(c) = 1 - (1 - SR(1))^c
(4.1)

where SR(c) is the expected success rate after c cycles, and SR(1) is the observed success rate
for a single cycle.
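As an illustrative check (not part of the original report), using the single-cycle success rate of the [SANN,BFGS,OAT,NM] sequence reported in Table 3, Eq. (4.1) would predict:

#Expected success rate after c independent cycles, Eq. (4.1)
SR1=0.597                 #observed single-cycle success rate of [SANN,BFGS,OAT,NM] (Table 3)
cycles=1:5
round(1-(1-SR1)^cycles,3) #0.597 0.838 0.935 0.974 0.989, vs. observed values of 0.597 to 0.724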

The observed success rates are compared to the expected success rates in Figure 7.

Figure 7. Comparison between overall observed (blue) success rate and expected (green)
success rate according to Eq. (4.1) as functions of the number of cycles of optimization
sequences. Diamonds: [OAT,SANN,BFGS,NM]. Circles: [SANN,BFGS,OAT,NM].


The difference in the observed results can be explained by two reasons: 1) The probability of
success is different for each optimization problem. 2) The probability of success is different for
each starting point.

Regarding the second cause, while the average probability of success in a sample of random
starting points is SR(1), the probability of success of certain points is much lower than the average,
resulting in a lower impact of the number of cycles on success. In simpler words, bad
starting points will not be successful no matter how many optimization cycles are performed.
For that reason, a new strategy will be employed in the following section to overcome this
particular difficulty.

4.3. Effect of the Number of Random Starting Points

Considering the previous results, we may conclude that choosing different random starting
points at the beginning of each cycle might be more successful than continuing the optimization
from the current best point. Such randomization of the starting point, for a better exploration of
the search region, is the goal of non-gradient random search methods.

On the other hand, while we have included the SANN algorithm in the multi-algorithm
optimization method, it seems like something is not working properly with SANN. Among the
algorithms considered here, the SANN algorithm provided the highest success rate when a single
algorithm was used. However, it also represents almost all (roughly 96%) of the optimization
cost when all four algorithms are used. Let us recall that SANN evaluates new random points but it
compares the performance of the new point with the current best value. Thus, SANN evaluates
whether the new point is suitable as an optimum value, but it does not evaluate whether the
new point is a good starting point for the optimization using other methods. For this reason,
the SANN algorithm will be removed from the algorithm sequence, and instead it will be
replaced by the evaluation of the sequence for different initial random points. This is equivalent
to performing various optimization cycles simultaneously, instead of performing them in series
(where the current best is used as starting point for the next cycle).
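A minimal sketch of this multiple-starting-point strategy is shown below (an illustration only, not the final implementation given in Appendix A.2); run_sequence is a hypothetical function that applies one full algorithm sequence (e.g. BFGS, NM and OAT) from a given starting point and returns a list with the elements par and value:

#Illustrative multi-start strategy: the same algorithm sequence is run from nsp random
#starting points (uniformly distributed within the bounds) and the overall best is kept
multistart<-function(fn,lower,upper,nsp,run_sequence){
  n=length(lower)
  best=list(par=NULL,value=Inf)
  for (j in 1:nsp){
    x0=lower+(upper-lower)*runif(n)       #random starting point within the bounds
    res=run_sequence(x0)                  #one full optimization sequence from x0
    if (res$value<best$value) best=res    #keep the best result over all starting points
  }
  return(best)
}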

By removing SANN from the list of permutations considered we obtain only 3^4 = 81
permutations. Now, since algorithm diversity improves success rate, let us consider only those
permutations using the 3 different algorithms. This reduces the list to 36 permutations. A
particular feature of all these permutations is that one of the algorithms is executed twice
during the optimization. In order to remove such redundancy, the performance of all possible
permutations considering the three remaining algorithms without repetitions is also evaluated, using
a single random starting point. In this case, we obtain 6 permutations. Table 4 summarizes the
performance of the 42 permutations considered at this stage (with and without redundancy).


Table 4. Performance of the 42 permutations considering 3 optimization algorithms (BFGS, NM,
and OAT). Red values: Worst performance. Green values: Best performance.
Algorithm Sequence | Average Success Rate (SR) | Average Optimization Time ⟨t⟩ (s) | SR/⟨t⟩ Performance Ratio
[BFGS,NM,OAT] 50.1% 0.0076 65.71
[BFGS,OAT,NM] 51.7% 0.0090 57.66
[NM,BFGS,OAT] 47.8% 0.0077 62.25
[NM,OAT,BFGS] 48.4% 0.0079 60.85
[OAT,BFGS,NM] 48.8% 0.0350 13.95
[OAT,NM,BFGS] 48.9% 0.0338 14.48
[NM,NM,BFGS,OAT] 42.0% 0.0102 41.22
[NM,NM,OAT,BFGS] 43.2% 0.0112 38.64
[NM,BFGS,NM,OAT] 42.6% 0.0108 39.65
[NM,BFGS,BFGS,OAT] 41.4% 0.0104 39.82
[NM,BFGS,OAT,NM] 44.6% 0.0104 42.73
[NM,BFGS,OAT,BFGS] 41.8% 0.0115 36.35
[NM,BFGS,OAT,OAT] 43.4% 0.0142 30.64
[NM,OAT,NM,BFGS] 43.2% 0.0116 37.25
[NM,OAT,BFGS,NM] 44.7% 0.0117 38.08
[NM,OAT,BFGS,BFGS] 42.4% 0.0119 35.48
[NM,OAT,BFGS,OAT] 43.9% 0.0150 29.37
[NM,OAT,OAT,BFGS] 43.7% 0.0155 28.14
[BFGS,NM,NM,OAT] 45.7% 0.0104 44.04
[BFGS,NM,BFGS,OAT] 44.6% 0.0104 42.92
[BFGS,NM,OAT,NM] 47.9% 0.0107 44.56
[BFGS,NM,OAT,BFGS] 45.8% 0.0111 41.29
[BFGS,NM,OAT,OAT] 46.6% 0.0143 32.68
[BFGS,BFGS,NM,OAT] 44.9% 0.0100 44.80
[BFGS,BFGS,OAT,NM] 48.3% 0.0118 40.81
[BFGS,OAT,NM,NM] 47.9% 0.0123 39.12
[BFGS,OAT,NM,BFGS] 46.1% 0.0131 35.26
[BFGS,OAT,NM,OAT] 48.9% 0.0166 29.46
[BFGS,OAT,BFGS,NM] 46.9% 0.0125 37.67
[BFGS,OAT,OAT,NM] 49.6% 0.0157 31.65
[OAT,NM,NM,BFGS] 43.1% 0.0475 9.07
[OAT,NM,BFGS,NM] 44.3% 0.0456 9.72
[OAT,NM,BFGS,BFGS] 42.2% 0.0469 8.99
[OAT,NM,BFGS,OAT] 43.8% 0.0505 8.66
[OAT,NM,OAT,BFGS] 44.0% 0.0520 8.47
[OAT,BFGS,NM,NM] 44.6% 0.0493 9.05
[OAT,BFGS,NM,BFGS] 42.4% 0.0470 9.01
[OAT,BFGS,NM,OAT] 46.9% 0.0506 9.27
[OAT,BFGS,BFGS,NM] 44.6% 0.0463 9.64
[OAT,BFGS,OAT,NM] 46.7% 0.0495 9.44
[OAT,OAT,NM,BFGS] 44.6% 0.0559 7.97
[OAT,OAT,BFGS,NM] 45.4% 0.0547 8.31


As expected, sequences with only 3 algorithm steps were on average (about 0.017 s) faster than
sequences with 4 algorithm steps (about 0.025 s). Particularly, the fastest sequence was [BFGS,NM,OAT]
with an average optimization time of 0.0076 s, which also provided the highest performance ratio (65.71).

Also notice the great reduction in optimization time obtained by eliminating SANN from the list
of algorithms (on the order of 0.01-0.05 s without SANN vs. about 0.10-0.14 s with SANN). On average,
sequences including SANN are several times slower than sequences without SANN.

Notice also that the best performance (51.7% success rate) was achieved with the sequence
[BFGS,OAT,NM] without any redundancy.

From these results, the first two sequences ([BFGS,NM,OAT] and [BFGS,OAT,NM]) are selected
to be used for the next evaluation. In this case, the number of starting points used in the
optimization is changed between 1 and 15. In addition, the evaluation with 100 starting points
was also included. All the starting points are randomly chosen considering a uniform
distribution within the boundaries of each decision variable. Each problem is solved 200 times
for each number of starting points. Only the first starting point is identical for all
optimization runs. The results obtained are summarized in Table 5 and Figure 8.

Table 5. Effect of the number of random starting points on the performance of sequences
[BFGS,NM,OAT] and [BFGS,OAT,NM]
# Starting Points | [BFGS,NM,OAT]: Avg. Success Rate / Avg. Optim. Time (s) / Performance Ratio | [BFGS,OAT,NM]: Avg. Success Rate / Avg. Optim. Time (s) / Performance Ratio
1 47.3% 0.0118 40.21 49.8% 0.0135 36.93
2 60.6% 0.0234 25.83 64.8% 0.0246 26.35
3 68.2% 0.0334 20.41 72.5% 0.0381 19.03
4 72.6% 0.0433 16.77 77.7% 0.0497 15.62
5 74.9% 0.0516 14.50 80.4% 0.0631 12.75
6 77.1% 0.0593 12.99 82.7% 0.0750 11.02
7 79.0% 0.0686 11.51 83.9% 0.0867 9.68
8 79.9% 0.0798 10.01 85.5% 0.0988 8.65
9 81.3% 0.0892 9.12 85.9% 0.1105 7.78
10 81.5% 0.0993 8.20 87.4% 0.1242 7.04
11 82.2% 0.1109 7.41 88.1% 0.1332 6.61
12 83.0% 0.1157 7.17 88.7% 0.1436 6.17
13 83.3% 0.1275 6.53 88.7% 0.1606 5.52
14 84.0% 0.1391 6.04 90.1% 0.1703 5.29
15 84.2% 0.1495 5.63 90.3% 0.1833 4.93

100 92.0% 0.9894 0.93 98.9% 1.2275 0.81


Figure 8. Overall success rate (left) and average optimization time (right) as functions of the
number of random starting points. Light blue diamonds: [BFGS,NM,OAT]. Dark blue circles:
[BFGS,OAT,NM]. Solid green line: Eq. (4.1).

The fitted (dashed) curves shown in Figure 8 correspond to empirical models. In the case of the
average success rates, an empirical model based on the error function (erf) of the decimal
logarithm of the number of initial starting points (n_sp) was fitted for each sequence (Eqs. 4.2
and 4.3), with coefficients estimated separately for [BFGS,NM,OAT] and [BFGS,OAT,NM].

In general, the success rate is higher for the sequence [BFGS,OAT,NM] than for
[BFGS,NM,OAT] considering any arbitrary number of starting points. However, they were both
lower than the theoretical success rate determined by Eq. (4.1) (using an intermediate value of
about 48.5% as the success rate for a single starting point).

In the case of the average optimization times, linear models of the form ⟨t⟩(n_sp) = t_0 + t_1 n_sp
were obtained for each sequence (Eqs. 4.4 and 4.5). Based on the data in Table 5, the incremental
cost per additional starting point is roughly 0.010 s for [BFGS,NM,OAT] and 0.012 s for [BFGS,OAT,NM].

Thus, the optimization time for the sequence [BFGS,OAT,NM] is consistently longer than for the
sequence [BFGS,NM,OAT]. In addition, using about 10 random starting points results in
optimization times similar to those obtained by including SANN in the optimization sequence,
while also resulting in higher success rates.


The difference between the theoretical and the observed success rates can be attributed to the
different success rates of each problem. Table 6 shows the individual success rates obtained by
each sequence for each particular problem using 1 and 100 starting points.

Table 6. Individual success rates for the different optimization problems obtained with
sequences [BFGS,NM,OAT] and [BFGS,OAT,NM] for 1 and 100 starting points. Green: 100%
success rate. Red: lowest success rates.
Problem Function | [BFGS,NM,OAT]: 1 SP / 100 SP | [BFGS,OAT,NM]: 1 SP / 100 SP
beale 87.0% 100% 84.0% 100%
booth 100% 100% 100% 100%
bukin6 4.0% 20.5% 1.5% 98.5%
camel 81.0% 100% 82.0% 100%
crossintray 57.5% 100% 66.5% 100%
easom 18.5% 100% 13.0% 100%
eggholder 1.5% 81.5% 0.5% 75.5%
goldsteinprice 55.0% 100% 68.5% 100%
gomezlevyC 46.5% 100% 50.0% 100%
himmelblau 21.0% 48.5% 22.0% 100%
hoeldertable 22.5% 100% 50.0% 100%
levi13 53.0% 100% 45.0% 100%
matyas 100% 100% 100% 100%
mccormick 62.5% 100% 56.0% 100%
mishraC 50.5% 100% 54.0% 100%
rastrigin 33.0% 100% 28.5% 100%
rosenbrock 100% 100% 100% 100%
rosenbrockC 2.0% 53.0% 4.5% 99.0%
schaffer2 26.5% 100% 32.0% 100%
schaffer4 15.0% 100% 21.5% 100%
simionescuC 51.5% 100% 68.5% 100%
sphere 100% 100% 100% 100%
styblinskitang 36.5% 100% 37.5% 100%
townsendC 1.5% 95.5% 8.0% 100%
Overall 47.30% 91.96% 49.84% 98.92%

First of all, notice that some “easy” problems already achieved 100% success with a single
starting point. In addition, most optimization problems reached a 100% success rate when 100
starting points are considered, with the exceptions of the most “difficult” problems. Those
“difficult” problems also resulted in low success rates for a single starting point, commonly
below 10%. For example, a “difficult” problem like the eggholder function, showing a success
rate of only about 1.5% for a single starting point, would result in a theoretical individual success rate
of only about 78% with 100 independent starting points, thus decreasing the overall average success
rate for the set of benchmark problems considered with respect to the theoretical average.


Figure 9 illustrates the evolution of the success rate of individual problems as a function of the
number of starting points for the sequence [BFGS,NM,OAT]. In this graph we can clearly
distinguish a gap between “easy” and “difficult” problems, where “easy” problems are
found at the top of the graph, and “difficult” problems at the bottom. Even though
“difficult” problems improve more slowly than “easy” problems, it is clear that by increasing
the number of starting points, their individual success rate improves.

Figure 9. Success Rate of individual problems as a function of the number of starting points
(n_sp) using the [BFGS,NM,OAT] sequence. Black dots: Success rate of individual benchmark
problems (200 random runs). Green diamond: Sample average success rate. Dotted red lines:
95% confidence intervals in the estimation of the mean success rate.

5. Conclusion

From the results obtained we may conclude that a “universal optimizer”, with an almost 100%
success rate for any type of optimization problem, is possible by considering a large number of
starting points randomly distributed in the search region. Of course, the cost of such a strategy
may be prohibitive in most practical situations. The efficiency of such a “universal optimizer” can
be increased (optimization costs decreased) by using a diversification strategy where multiple
optimization algorithms (having different natures) are used simultaneously (either in sequence
or in parallel). This strategy may overcome some limitations described by the No Free Lunch
theorem [3-5].

In particular, a multi-algorithm optimization method is proposed where multiple random states
are used as starting points for an optimization sequence using the following algorithms:
Broyden-Fletcher-Goldfarb-Shanno (gradient-based) [7], Nelder-Mead (deterministic non-
gradient) [8], and adaptive step-size One-at-a-time optimization (randomistic non-gradient) [6].
This sequence seems to maximize the average success rate for a benchmark set of 25 global
optimization problems, while keeping a competitive optimization cost. For example, using the
default of 14 random starting points with this optimization sequence, the overall success rate
obtained is already 84%, with an average optimization time per run of about 0.14 s for the
benchmark set of problems considered (Table 5).

The final implementation of this algorithm in R language (https://cran.r-project.org/) is included
in the Appendix.

Acknowledgment and Disclaimer

This report provides data, information and conclusions obtained by the author(s) as a result of original
scientific research, based on the best scientific knowledge available to the author(s). The main purpose
of this publication is the open sharing of scientific knowledge. Any mistake, omission, error or inaccuracy
published, if any, is completely unintentional.

This research did not receive any specific grant from funding agencies in the public, commercial, or not-
for-profit sectors.

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC
4.0). Anyone is free to share (copy and redistribute the material in any medium or format) or adapt
(remix, transform, and build upon the material) this work under the following terms:
 Attribution: Appropriate credit must be given, providing a link to the license, and indicating if
changes were made. This can be done in any reasonable manner, but not in any way that
suggests endorsement by the licensor.
 NonCommercial: This material may not be used for commercial purposes.

References

[1] Hernandez, H. (2018). Introduction to Randomistic Optimization. ForsChem Research Reports, 3,


2018-11, 1-25. doi: 10.13140/RG.2.2.30110.18246.
[2] Priyadarshini, J., Premalatha, M., Čep, R., Jayasudha, M., & Kalita, K. (2023). Analyzing physics-
inspired metaheuristic algorithms in feature selection with K-nearest-neighbor. Applied Sciences,
13 (2), 906. doi: 10.3390/app13020906.
[3] Wolpert, D. H., & Macready, W. G. (1997). No free lunch theorems for optimization. IEEE
Transactions on Evolutionary Computation, 1 (1), 67-82. doi: 10.1109/4235.585893.
[4] Adam, S. P., Alexandropoulos, S-A. N., Pardalos, P. M., & Vrahatis, M. N. (2019). No Free Lunch
Theorem: A Review. In: Demetriou, I., Pardalos, P. Approximation and Optimization: Algorithms,
Complexity and Applications. Springer Optimization and Its Applications, Vol 145. Springer, Cham.
pp. 57-82. doi: 10.1007/978-3-030-12767-1_5.
[5] Ho, Y. C., & Pepyne, D. L. (2002). Simple explanation of the no-free-lunch theorem and its
implications. Journal of Optimization Theory and Applications, 115, 549-570. doi:
10.1023/A:1021251113462.
[6] Hernandez, H. and Ochoa, S. (2022). Adaptive Step-size One-at-a-time (OAT) Optimization.
ForsChem Research Reports, 7, 2022-12, 1 - 44. doi: 10.13140/RG.2.2.15208.14087.


[7] Byrd, R. H., Lu, P., Nocedal, J., & Zhu, C. (1995). A limited memory algorithm for bound constrained
optimization. SIAM Journal on Scientific Computing, 16 (5), 1190-1208. doi: 10.1137/0916069.
[8] Nelder, J. A., & Mead, R. (1965). A simplex method for function minimization. The Computer
Journal, 7 (4), 308-313. doi: 10.1093/comjnl/7.4.308.
[9] Bélisle, C. J. (1992). Convergence theorems for a class of simulated annealing algorithms on R^d.
Journal of Applied Probability, 29 (4), 885-895. doi: 10.2307/3214721.
[10] Metropolis, N., & Ulam, S. (1949). The Monte Carlo method. Journal of the American Statistical
Association, 44 (247), 335-341. doi: 10.1080/01621459.1949.10483310.

Appendix

A.1. OAT Optimization Algorithm Implementation in R


OAToptim<-function(fun,x0=NA,lower=NA,upper=NA,step0=NA,stepmin=NA,ncycles=1000,tol=1e-6,
MCcheck=30,display=TRUE,optmode=c('min','max')){
t0=Sys.time()
if (display==TRUE) print('Initializing the Adaptive One-at-a-time Optimizer...')
#Validation of input arguments
nx=length(x0)
nl=length(lower)
nu=length(upper)
nm=length(stepmin)
n0=length(step0)
nd=max(nx,nl,nu,nm,n0)
if (nx<nd){
x0=c(x0,NA*(1:(nd-nx)))
}
if (nl<nd){
lower=c(lower,-Inf*(1:(nd-nl)))
}
if (nu<nd){
upper=c(upper,Inf*(1:(nd-nu)))
}
if (nm<nd){
stepmin=c(stepmin,NA*(1:(nd-nm)))
}
if (n0<nd){
step0=c(step0,NA*(1:(nd-n0)))
}
#Initialization
for (i in 1:nd){
#Lower bounds
if(is.na(lower[i])==TRUE | is.nan(lower[i])==TRUE | is.null(lower[i])==TRUE |
is.infinite(lower[i])==TRUE){
lower[i]=-Inf
}
#Upper bounds
if(is.na(upper[i])==TRUE | is.nan(upper[i])==TRUE | is.null(upper[i])==TRUE |
is.infinite(upper[i])==TRUE){
upper[i]=Inf
}
if(upper[i]<lower[i]){
upper[i]=Inf
if (display==TRUE) print('Warning: An upper bound is less than the corresponding lower
bound. The upper bound is set to Inf.')
}
#Initial point
if(is.na(x0[i])==TRUE | is.nan(x0[i])==TRUE | is.null(x0[i])==TRUE){
xmin=lower[i]

xmax=upper[i]
if(is.infinite(lower[i])==TRUE){
xmin=0
}
if(is.infinite(upper[i])==TRUE){
xmax=1
}
x0[i]=xmin+(xmax-xmin)*round(1000*runif(1))/1000
}
#Initial step size
if (is.na(step0[i]) | is.nan(step0[i]) | is.null(step0[i]) | is.infinite(step0[i])==
TRUE){
step0[i]=0
} else {
step0[i]=abs(step0[i])
}
if (step0[i]==0){
if(is.infinite(lower[i])==TRUE | is.infinite(upper[i])==TRUE){
if(x0[i]==0){
step0[i]=1
} else {
step0[i]=x0[i]/10
}
} else {
step0[i]=(upper[i]-lower[i])/10
}
}
#Minimum step size
if (is.na(stepmin[i]) | is.nan(stepmin[i]) | is.null(stepmin[i]) |
is.infinite(stepmin[i])==TRUE){
stepmin[i]=0
} else {
stepmin[i]=abs(stepmin[i])
}
if (stepmin[i]==0){
stepmin[i]=step0[i]/1000
}
#Minimum step size correction
step0[i]=stepmin[i]*ceiling(step0[i]/stepmin[i])
x0[i]=stepmin[i]*round(x0[i]/stepmin[i])
}
#Tolerance
if(is.na(tol)==TRUE | is.nan(tol)==TRUE | is.null(tol)==TRUE | is.infinite(tol)==TRUE){
tol=1e-6
} else {
tol=abs(tol)
}
#Number of cycles
if(is.na(ncycles)==TRUE | is.nan(ncycles)==TRUE | is.null(ncycles)==TRUE |
is.infinite(ncycles)==TRUE){
ncycles=100
} else {
ncycles=max(1,abs(round(ncycles)))
}
#Monte Carlo check
if(is.na(MCcheck)==TRUE | is.nan(MCcheck)==TRUE | is.null(MCcheck)==TRUE |
is.infinite(MCcheck)==TRUE){
MCcheck=10
} else {
MCcheck=abs(round(MCcheck))
}
#Optimization mode
optmode=substr(optmode[1],1,3)
if (optmode=="Max" | optmode="MAX"){
optmode="max"
}

#Geometric mean step


GMstep=sqrt(step0*stepmin)

#Current best
xopt=x0
if (display==TRUE){
print('Initial point: ')
print(x0)
}
Fobj=tol*round(fun(x0)/tol)
nfeval=1
if (display==TRUE) print(paste('Initial objective function: ',Fobj))
for (i in 1:ncycles){
if (display==TRUE) print(paste('Cycle ',i,'/',ncycles))
xopt0=xopt
cycleorder=sample(1:nd)
for (k in 1:nd){
j=cycleorder[k]
if (display==TRUE) print(paste('Variable ',j,' (',k,'/',nd,')'))
exit=0
dir=1
step=step0[j]
paramv=xopt[j]
paramvopt=paramv
while (dir>=-1){
while (exit==0){
paramv=stepmin[j]*round((paramvopt+dir*step)/stepmin[j])
paramv=max(paramv,lower[j])
paramv=min(paramv,upper[j])
x=xopt
x[j]=paramv
if (optmode=='max'){
Fobjnew=-Inf
Fobjnew=try(fun(x))
nfeval=nfeval+1
if(is.na(Fobjnew)==TRUE | is.nan(Fobjnew)==TRUE | is.null(Fobjnew)==TRUE)
Fobjnew=-Inf
if (Fobjnew>(Fobj+tol)){
paramvopt=paramv
Fobj=tol*round(Fobjnew/tol)
if (display==TRUE) print(paste('New best function value: ',Fobj))
step=step*2
} else {
if (step<=stepmin[j]){
exit=1
} else {
step=stepmin[j]*round(step/((2+8*runif(1))*stepmin[j]))
}
}
} else {
Fobjnew=Inf
Fobjnew=try(fun(x))
nfeval=nfeval+1
if(is.na(Fobjnew)==TRUE | is.nan(Fobjnew)==TRUE | is.null(Fobjnew)==TRUE)
Fobjnew=Inf
if (Fobjnew<(Fobj-tol)){
paramvopt=paramv
Fobj=tol*round(Fobjnew/tol)
if (display==TRUE) print(paste('New best function value: ',Fobj))
step=step*2
} else {
if (step<=stepmin[j] | abs(Fobj)<tol){
exit=1
} else {
step=stepmin[j]*round(step/((2+8*runif(1))*stepmin[j]))
}
}

}
}
if (dir==1){
exit=0
dir=-1
step=step0[j]
} else {
dir=-2
step=step0[j]
}
}
xopt[j]=paramvopt
if (display==TRUE) print(paste('Best variable value =',xopt[j]))
}
if (max(abs(xopt-xopt0))==0){
if (MCcheck>0){
if (display==TRUE) print('Initiating Monte Carlo check of the optimum.')
for (m in 1:MCcheck){
x=stepmin*round((xopt+GMstep*rnorm(nd))/stepmin)
x=pmax(x,lower)
x=pmin(x,upper)
if (optmode=='max'){
Fobjnew=-Inf
Fobjnew=try(fun(x))
nfeval=nfeval+1
if(is.na(Fobjnew)==TRUE | is.nan(Fobjnew)==TRUE | is.null(Fobjnew)==TRUE)
Fobjnew=-Inf
if (Fobjnew>(Fobj+tol)){
xopt=x
Fobj=tol*round(Fobjnew/tol)
if (display==TRUE) print(paste('New best function value: ',Fobj))
}
} else {
Fobjnew=Inf
Fobjnew=try(fun(x))
nfeval=nfeval+1
if(is.na(Fobjnew)==TRUE | is.nan(Fobjnew)==TRUE | is.null(Fobjnew)==TRUE)
Fobjnew=Inf
if (Fobjnew<(Fobj-tol)){
xopt=x
Fobj=tol*round(Fobjnew/tol)
if (display==TRUE) print(paste('New best function value: ',Fobj))
}
}
}
}
if (max(abs(xopt-xopt0))==0){
if (display==TRUE) print('No further improvement in the objective function.
Terminating the optimizer.')
break
} else {
if (display==TRUE) print('Resuming the OAT optimizer.')
}
}
}
if (i==ncycles & display==TRUE) print('Maximum number of cycles reached. Terminating the optimizer.')
ctime=difftime(Sys.time(),t0,units="secs")
if (display==TRUE){
print('Best point found:')
print(xopt)
print(paste('Best objective function: ',Fobj))
print(paste('Number of function evaluations: ',nfeval))
print(paste('Optimization time (s): ',ctime))
}
return(list(xopt,Fobj,nfeval,ctime))
}


A.2. Multi-Algorithm Optimization Implementation in R


#Multi-Algorithm Optimization Function with multiple starting points
maoptim<-function(par,fn,gr=NULL,method=c("L-BFGS-B","Nelder-Mead","OAT"),lower=-Inf,upper=Inf,control=list(),hessian=FALSE){
#This function performs the Multi-Algorithm Optimization of a function of one or more variables.
#It requires optim{stats} and OAToptim (defined at the bottom)
#Usage
#maoptim(par, fn, gr = NULL, method = c("L-BFGS-B","Nelder-Mead","OAT"), lower = -Inf, upper = Inf, control = list(), hessian = FALSE)
#Arguments
#par           Initial values for the parameters to be optimized over.
#fn            A function to be minimized (or maximized), with first argument the vector of parameters over which minimization is to take place. It should return a scalar result.
#gr            A function to return the gradient for the "BFGS", "CG" and "L-BFGS-B" methods. If it is NULL, a finite-difference approximation will be used.
#              For the "SANN" method it specifies a function to generate a new candidate point. If it is NULL a default Gaussian Markov kernel is used.
#...           Further arguments to be passed to fn and gr.
#method        The sequence of methods to be used. See 'Details'.
#lower, upper  Bounds on the variables for the "L-BFGS-B" or "OAT" methods, or bounds in which to search for method "Brent".
#control       A list of control parameters. See 'Details'.
#hessian       Logical. Should a numerically differentiated Hessian matrix be returned?
#Details: See Details in optim{stats}. In addition, the control argument may include the following optional arguments:
#nsp           Non-negative integer indicating the number of starting points. By default, nsp=14.
#step0         Vector representing the initial search step-size for each decision variable.
#stepmin       Vector representing the minimum search step-size for each decision variable. For integer decision variables the minimum search step-size must be 1.
#ncycles       Non-negative integer indicating the maximum number of full cycles to be performed by the optimization algorithm.
#tol           Argument indicating the tolerance (or resolution) for the objective function.
#MCcheck       Non-negative integer indicating the number of Monte Carlo trials used as a test of local optima.
#display       Logical argument used to show the progress of the optimization. By default it is set to FALSE.
#optmode       Character argument indicating the type of optimization to be performed: "min" for minimization or "max" for maximization. A minimization problem is considered by default.
#Output Values
#par           Optimal values found for the decision variables.
#value         Best objective function value found.
#counts        Number of function evaluations performed.
#time          Elapsed computation time in seconds.
t0=Sys.time()
s=1 #Sign of optimization problem (1: minimization, -1: maximization)
if (is.numeric(control$fnscale)==TRUE){
s=sign(control$fnscale)
} else {
if (is.character(control$optmode)==TRUE){
if (control$optmode=="max") {
s=-1
}
}
}
if (is.numeric(control$nsp)==TRUE){
nsp=max(1,control$nsp)
} else {
nsp=14
}

methodv=method
N=length(methodv)
n=length(par)
counts=0
par0=par
valueopt=Inf
for (j in 1:nsp){
if (j>1){
par=lower+(upper-lower)*runif(n)
if(max(is.nan(par))==1) par=paropt+pmax(abs(paropt),abs(par0))*rnorm(n)
}
for (i in 1:N){
if ((min(lower)==-Inf) | (max(upper)==Inf)){
if (methodv[i]=="L-BFGS-B") methodv[i]="BFGS"
} else {
if (n==1 & methodv[i]=="Nelder-Mead") methodv[i]="Brent"
}
if (methodv[i]=="OAT"){
if (is.numeric(control$step0)==TRUE){
step0=control$step0
} else {
step0=(upper-lower)/5
}
if (is.numeric(control$stepmin)==TRUE){
stepmin=control$stepmin
} else {
stepmin=1e-6
}
if (is.numeric(control$ncycles)==TRUE){
ncycles=control$ncycles
} else {
ncycles=1000
}
if (is.numeric(control$tol)==TRUE){
tol=control$tol
} else {
tol=1e-6
}
if (is.numeric(control$MCcheck)==TRUE){
MCcheck=control$MCcheck
} else {
MCcheck=30
}
if (is.logical(control$display)==TRUE){
display=control$display
} else {
display=FALSE
}
if (s==-1) {
optmode="max"
} else {
optmode="min"
}

OUT=OAToptim(fun=fn,x0=par,lower=lower,upper=upper,step0=step0,stepmin=stepmin,
ncycles=ncycles,tol=tol,MCcheck=MCcheck,display=display,optmode=optmode)
par=OUT[[1]]
value=OUT[[2]]
counts=counts+OUT[[3]]
} else {

OUT=optim(par=par,fn=fn,gr=gr,method=methodv[i],lower=lower,upper=upper,
control=control,hessian=hessian)
par=OUT$par
value=OUT$value
counts=counts+OUT$counts[1]
}

}
if (s*value<s*valueopt){
valueopt=value
paropt=par
}
}
optime=Sys.time()-t0
return(list(par=paropt,value=valueopt,counts=counts,time=optime))
}
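As an illustrative usage sketch (not part of the original report), the final maoptim function can be applied to one of the benchmark functions of Appendix A.3 with a chosen number of random starting points; the starting point, bounds and nsp value below are arbitrary choices:

#Illustrative call of the final multi-algorithm optimizer
#Requires the maoptim function above, OAToptim (Appendix A.1) and ackley (Appendix A.3)
set.seed(1)  #the multi-start procedure is random; the seed is fixed only for reproducibility
res=maoptim(par=c(3,-3),fn=ackley,lower=c(-5,-5),upper=c(5,5),control=list(nsp=14))
res$par     #best point found (the global optimum of Ackley is at c(0,0))
res$value   #best objective function value found
res$counts  #total number of function evaluations
res$time    #elapsed computation time (s)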

A.3. Benchmark Functions and their Implementation in R

Ackley Function
f(x_1,x_2) = -20 exp(-0.2 √((x_1^2 + x_2^2)/2)) - exp(0.5 (cos(2π x_1) + cos(2π x_2))) + e + 20

ackley<-function(x){
f=-20*exp(-0.2*sqrt((x[1]^2+x[2]^2)/2))-exp(0.5*(cos(2*pi*x[1])+cos(2*pi*x[2])))+exp(1)+20
return(f)
}

Beale Function

f(x_1,x_2) = (1.5 - x_1 + x_1 x_2)^2 + (2.25 - x_1 + x_1 x_2^2)^2 + (2.625 - x_1 + x_1 x_2^3)^2

beale<-function(x){
f=(1.5-x[1]+x[1]*x[2])^2+(2.25-x[1]+x[1]*x[2]*x[2])^2+(2.625-x[1]+x[1]*(x[2]^3))^2
return(f)
}

Booth Function
f(x_1,x_2) = (x_1 + 2 x_2 - 7)^2 + (2 x_1 + x_2 - 5)^2

booth<-function(x){
f=(x[1]+2*x[2]-7)^2+(2*x[1]+x[2]-5)^2
return(f)
}

Bukin Function # 6
f(x_1,x_2) = 100 √|x_2 - 0.01 x_1^2| + 0.01 |x_1 + 10|

bukin6<-function(x){
f=100*sqrt(abs(x[2]-0.01*x[1]^2))+0.01*abs(x[1]+10)
return(f)
}

Camel Function
f(x_1,x_2) = 2 x_1^2 - 1.05 x_1^4 + x_1^6/6 + x_1 x_2 + x_2^2

camel<-function(x){
f=2*x[1]^2-1.05*x[1]^4+(x[1]^6)/6+x[1]*x[2]+x[2]^2
return(f)
}

Cross-in-Tray Function

f(x_1,x_2) = -0.0001 (1 + |sin(x_1) sin(x_2) exp(|100 - √(x_1^2 + x_2^2)/π|)|)^0.1

crossintray<-function(x){
f=-0.0001*(1+abs(sin(x[1])*sin(x[2])*exp(abs(100-sqrt(x[1]^2+x[2]^2)/pi))))^0.1
return(f)
}

Easom Function
f(x_1,x_2) = -cos(x_1) cos(x_2) exp(-((x_1 - π)^2 + (x_2 - π)^2))

easom<-function(x){
f=-cos(x[1])*cos(x[2])*exp(-((x[1]-pi)^2+(x[2]-pi)^2))
return(f)
}

Eggholder Function
f(x_1,x_2) = -(x_2 + 47) sin(√|x_1/2 + x_2 + 47|) - x_1 sin(√|x_1 - x_2 - 47|)

eggholder<-function(x){
f=-(x[2]+47)*sin(sqrt(abs(0.5*x[1]+x[2]+47)))-x[1]*sin(sqrt(abs(x[1]-x[2]-47)))
return(f)
}

Goldstein-Price Function
f(x_1,x_2) = (1 + (x_1 + x_2 + 1)^2 (19 - 14 x_1 + 3 x_1^2 - 14 x_2 + 6 x_1 x_2 + 3 x_2^2))
× (30 + (2 x_1 - 3 x_2)^2 (18 - 32 x_1 + 12 x_1^2 + 48 x_2 - 36 x_1 x_2 + 27 x_2^2))

goldsteinprice<-function(x){
f=(1+(x[1]+x[2]+1)^2*(19-14*x[1]+3*x[1]^2-14*x[2]+6*x[1]*x[2]+3*x[2]^2))*(30+(2*x[1]-
3*x[2])^2*(18-32*x[1]+12*x[1]^2+48*x[2]-36*x[1]*x[2]+27*x[2]^2))
return(f)
}

Gomez-Levy Constrained Function


f(x_1,x_2) = 4 x_1^2 - 2.1 x_1^4 + x_1^6/3 + x_1 x_2 - 4 x_2^2 + 4 x_2^4

subject to: 2 sin^2(2π x_2) - sin(4π x_1) ≤ 1.5

gomezlevyC<-function(x){
f=4*x[1]^2-2.1*x[1]^4+(x[1]^6)/3+x[1]*x[2]-4*x[2]^2+4*x[2]^4
if (2*(sin(2*pi*x[2]))^2-sin(4*pi*x[1])>1.5) f=Inf
return(f)
}

Himmelblau Function
f(x_1,x_2) = (x_1^2 + x_2 - 11)^2 + (x_1 + x_2^2 - 7)^2

himmelblau<-function(x){
f=(x[1]^2+x[2]-11)^2+(x[1]+x[2]^2-7)^2
return(f)
}

Hölder Table Function



f(x_1,x_2) = -|sin(x_1) cos(x_2) exp(|1 - √(x_1^2 + x_2^2)/π|)|

hoeldertable<-function(x){
f=-abs(sin(x[1])*cos(x[2])*exp(abs(1-sqrt(x[1]^2+x[2]^2)/pi)))
return(f)
}

Lévi Function # 13
f(x_1,x_2) = sin^2(3π x_1) + (x_1 - 1)^2 (1 + sin^2(3π x_2)) + (x_2 - 1)^2 (1 + sin^2(2π x_2))

levi13<-function(x){
f=(sin(3*pi*x[1]))^2+(x[1]-1)^2*(1+(sin(3*pi*x[2]))^2)+(x[2]-1)^2*(1+(sin(2*pi*x[2]))^2)
return(f)
}

Matyas Function
f(x_1,x_2) = 0.26 (x_1^2 + x_2^2) - 0.48 x_1 x_2

matyas<-function(x){
f=0.26*(x[1]^2+x[2]^2)-0.48*x[1]*x[2]
return(f)
}

McCormick Function
f(x_1,x_2) = sin(x_1 + x_2) + (x_1 - x_2)^2 - 1.5 x_1 + 2.5 x_2 + 1

mccormick<-function(x){
f=sin(x[1]+x[2])+(x[1]-x[2])^2-1.5*x[1]+2.5*x[2]+1
return(f)
}

Mishra’s Bird Constrained Function


f(x_1,x_2) = sin(x_2) exp((1 - cos(x_1))^2) + cos(x_1) exp((1 - sin(x_2))^2) + (x_1 - x_2)^2

subject to: (x_1 + 5)^2 + (x_2 + 5)^2 < 25

mishraC<-function(x){
f=sin(x[2])*exp((1-cos(x[1]))^2)+cos(x[1])*exp((1-sin(x[2]))^2)+(x[1]-x[2])^2
if ((x[1]+5)^2+(x[2]+5)^2>=25) f=Inf
return(f)
}

Rastrigin Function
f(x) = 10 n + Σ_{i=1}^{n} (x_i^2 - 10 cos(2π x_i))

rastrigin<-function(x){
nd=length(x)
f=10*nd
for (i in 1:nd){
f=f+x[i]^2-10*cos(2*pi*x[i])
}
return(f)
}

Rosenbrock Function
f(x) = Σ_{i=1}^{n-1} (100 (x_{i+1} - x_i^2)^2 + (1 - x_i)^2)

rosenbrock<-function(x){
nd=length(x)
f=0
for (i in 1:(nd-1)){
f=f+100*((x[i+1]-x[i]^2)^2)+(1-x[i])^2
}
return(f)
}

Rosenbrock Constrained Function


f(x_1,x_2) = 100 (x_2 - x_1^2)^2 + (1 - x_1)^2

subject to: (x_1 - 1)^3 - x_2 + 1 ≤ 0 and x_1 + x_2 - 2 ≤ 0

rosenbrockC<-function(x){
f=100*((x[2]-x[1]^2)^2)+(1-x[1])^2
if ((x[1]-1)^3-x[2]+1>0 | x[1]+x[2]-2>0) f=Inf
return(f)
}

Schaffer Function #2
f(x_1,x_2) = 0.5 + (sin^2(x_1^2 - x_2^2) - 0.5) / (1 + 0.001 (x_1^2 + x_2^2))^2

schaffer2<-function(x){
f=0.5+((sin(x[1]^2-x[2]^2))^2-0.5)/(1+0.001*(x[1]^2+x[2]^2))^2
return(f)
}

Schaffer Function #4
f(x_1,x_2) = 0.5 + (cos^2(sin(|x_1^2 - x_2^2|)) - 0.5) / (1 + 0.001 (x_1^2 + x_2^2))^2

schaffer4<-function(x){
f=0.5+((cos(sin(abs(x[1]^2-x[2]^2))))^2-0.5)/(1+0.001*(x[1]^2+x[2]^2))^2
return(f)
}

Sphere Function
f(x) = Σ_{i=1}^{n} x_i^2

sphere<-function(x){
nd=length(x)
f=0
for (i in 1:nd){
f=f+x[i]^2
}
return(f)
}

Simionescu Constrained Function


f(x_1,x_2) = 0.1 x_1 x_2

subject to: x_1^2 + x_2^2 ≤ (1 + 0.2 cos(8 arctan(x_1/x_2)))^2

simionescuC<-function(x){
f=0.1*x[1]*x[2]
c=((1+0.2*cos(8*atan(x[1]/x[2])))^2)
if (is.nan(c)==TRUE) c=Inf
if (x[1]^2+x[2]^2>c) f=Inf
return(f)
}

Styblinski-Tang Function
f(x) = Σ_{i=1}^{n} 0.5 (x_i^4 - 16 x_i^2 + 5 x_i)

styblinskitang<-function(x){
nd=length(x)
f=0
for (i in 1:nd){
f=f+0.5*(x[i]^4-16*x[i]^2+5*x[i])
}
return(f)
}

Townsend Constrained Function

f(x_1,x_2) = -cos^2((x_1 - 0.1) x_2) - x_1 sin(3 x_1 + x_2)

subject to: x_1^2 + x_2^2 < (2 cos(t) - 0.5 cos(2t) - 0.25 cos(3t) - 0.125 cos(4t))^2 + (2 sin(t))^2, where t = atan2(x_1, x_2)

townsendC<-function(x){
f=-(cos(x[2]*(x[1]-0.1)))^2-x[1]*sin(3*x[1]+x[2])
t=atan2(x[1],x[2])
if (x[1]^2+x[2]^2>=(2*cos(t)-0.5*cos(2*t)-0.25*cos(3*t)-0.125*cos(4*t))^2+(2*sin(t))^2)
f=Inf
return(f)
}
