Professional Documents
Culture Documents
8849-Article Text-13341-1-10-20200411 PDF
8849-Article Text-13341-1-10-20200411 PDF
1,*
Maulana Malik, 2Mustafa Mamat, 3Siti Sabariah Abas, 4Sukono
1
Department of Mathematics, Faculty of Mathematics and Natural Science,
Universitas Indonesia, Depok, Indonesia
2,3
Faculty of Informatics and Computing, Universiti Sultan Zainal Abidin,
Terengganu, Malaysia
4
Department of Mathematics, Faculty of Mathematics and Natural Science,
Universitas Padjadjaran, Bandung, Indonesia
Abstract
Conjugate gradient (CG) methods were instrumental in solving unconstrained, wide-ranging
optimization. In this paper, we propose a new CG coefficient family, which holds conditions of
sufficient descent and global convergence properties. Under exact line search this new CG is
evaluated on a set of functions. Based on number of iterations (NOI) and central processing
unit (CPU) time, it then compared its output with that of some of the well-known previous CG
methods. The results show that of all the methods tested, the latest CG method has the best
performance.
1. Introduction
Consider the problem of unconstrained optimization of the following 𝑛 variables:
min 𝑓(𝒙) (1)
𝒙∈ℝ𝑛
𝑛 𝑛
where 𝑓: ℝ → ℝ is smooth and ℝ denotes an 𝑛-dimensional Euclidean space. The nonlinear
conjugate gradient method for (1) is designed by iterative formula is given by [1]
𝒙𝑘+1 = 𝒙𝑘 + 𝛼𝑘 𝒅𝑘 , 𝑘 = 0, 1, 2, … (2)
where 𝒙𝑘 is 𝑘th iterative point, 𝛼𝑘 > 0 is steplength obtained by one dimensional line search,
and 𝒅𝑘 is a search direction of 𝑓 at 𝒙𝑘 defined by
−𝒈𝑘 , 𝑘 = 0
𝒅𝑘 = { (3)
−𝒈𝑘 + 𝛽𝑘 𝒅𝑘−1 , 𝑘 ≥ 1
with 0 < 𝛿 < 𝜎 < 1 and ‖ . ‖ means the Euclidean norm [2]. Equation (4) is the form of exact
line search and inequality (4), (5) are form of inexact line search. The inexact line search is
methods such as Armijo [3], Goldstein [4], Wolfe [5] and Grippo-Lucidi line search [6]. The
step size 𝛼𝑘 in this paper uses exact line (4).
There are many well-known formulas for 𝛽𝑘 , such as the Hestenes and Stiefel (HS) [7],
Fletcher and Reeves (FR) [8], Conjugate Descent (CD) [9], Dai and Yuan (DY) [10], Wei, Yao
and Liu (WYL) [11], Rivaie, Mustafa, Ismail and Leong (RMIL) [12], and lastly Polak and
Ribiere (PRP) [13]. It is all given as follows, respectively:
𝒈𝑇 (𝒈𝑘 −𝒈𝑘−1 )
𝛽𝑘𝐻𝑆 = 𝒅𝑇𝑘 , (7)
𝑘−1 (𝒈𝑘 −𝒈𝑘−1 )
‖𝒈𝑘 ‖2
𝛽𝑘𝐹𝑅 = ‖𝒈𝑘−1 ‖2
, (8)
‖𝒈𝑘 ‖2
𝛽𝑘𝐶𝐷 = − , (9)
𝒅𝑇
𝑘−1 𝒈𝑘−1
‖𝒈𝑘 ‖2
𝛽𝑘𝐷𝑌 = 𝒅𝑇
, (10)
𝑘−1 𝑘 −𝒈𝑘−1 )
(𝒈
‖𝒈𝑘 ‖
𝒈𝑇
𝑘 (𝒈𝑘 − 𝒈 )
‖𝒈𝑘−1 ‖ 𝑘−1
𝛽𝑘𝑊𝑌𝐿 = 2 , (11)
||𝒈𝑘−1 ||
𝒈𝑇
𝑘 (𝒈𝑘 −𝒈𝑘−1 )
𝛽𝑘𝑅𝑀𝐼𝐿 = ‖𝒅𝑘−1 ‖2
, (12)
𝑇 (𝒈
𝒈𝑘 𝑘 −𝒈𝑘−1 )
𝛽𝑘𝑃𝑅𝑃 = ‖𝒈𝑘−1 ‖2
. (13)
The sufficient descent condition and global convergence properties are the most well studied
properties of CG methods. Previously, Wei et al. [11] proposed a CG coefficient which
modification of CG coefficient PRP known as WYL. This WYL is known fulfil the sufficient
condition and global convergence properties under exact line, Gippo-Lucidi and Wolfe line
Search [11]. As well as Rivaie et al. [12] with CG coefficient RMIL is fulfil to the sufficient
descent condition and global convergence properties under exact line search. RMIL is design
a new formula for the denominator and retained the original numerator as the PRP and HS the
numerical result of RMIL is have best performance compared to the other standard CG methods
(see [12]).
For a good reference for studies describing the latest CG coefficients with important result
and various modifications from 𝛽𝑘 see Rivaie et al. [14], Yousif [15], Basri and Mustafa [16],
Waziri et al. [17], Yuan et al. [18], Liu [19], Babaie-Kafaki [20], Liu et al. [21], Huang et al.
[22], Kui et al. [23], Yang et al. [24], Xu et al. [25], Zhu et al. [26] and Guo and Wan [27].
In this paper we will present our new CG coefficient 𝛽𝑘 , whose efficiency compared to
classic FR, CD, DY, WYL and RMIL formulas. Section 2 introduces a new formula for the CG
coefficient with an algorithm designed to solve unlimited problems of optimization. In Section
3, we will present the sufficient descent condition and proof of our new method of global
convergence. Section 4 will explain important numerical findings and discussions. Finally, the
conclusion is in Section 5.
negative gradient of WYL, retained the original denominator of RMIL and prevent
negative 𝛽𝑘 values. Hence,
‖𝒈𝑘 ‖
𝒈𝑇
𝑘 (𝒈𝑘 − 𝒈 −𝒈𝑘−1 )
‖𝒈𝑘−1 ‖ 𝑘−1
𝛽𝑘𝑀𝑀𝑆𝑆 = max {0, ‖𝒅𝑘−1 ‖ 2 } . (14)
3. Convergence Analysis
In this section we will use the exact line search to demonstrate the sufficient descent condition
and the global convergent properties of 𝛽𝑘𝑀𝑀𝑆𝑆 .
Theorem 1. Consider a CG method with search direction 𝒅𝑘 in (3), 𝛽𝑘𝑀𝑀𝑆𝑆 given as equation
(14), then, the condition (15) will hold for all 𝑘 ≥ 1.
2
Proof: If 𝑘 = 0 , then 𝒅0 = −𝒈0 , so that 𝒈𝑇0 𝒅0 = 𝒈𝑇0 (−𝒈0 ) = − (√𝒈𝑇0 𝒈0 ) = −‖𝒈0 ‖2 .
Hence, condition (15) holds true for 𝑘 = 0. Next, we will show that for 𝑘 ≥ 1, condition (15),
will holds true. Multiply (3) by 𝒈𝑇𝑘 then
Hence, condition (15) holds true for 𝑘 ≥ 1. So, it has been proven that for every 𝑘 ≥ 0, search
direction 𝒅𝑘 is descent direction. The proof is complete.∎
In this subsection, we will demonstrate that CG methods with 𝛽𝑘𝑀𝑀𝑆𝑆 converge globally.
We need to simplify our new 𝛽𝑘𝑀𝑀𝑆𝑆 first, however, so that our proof of convergence will be
substantially easier. From (14), we have two cases:
‖𝒈𝑘 ‖
Case 1: If ‖𝒈𝑘 ‖2 > ‖𝒈 𝒈𝑘−1 − 𝒈𝑘−1 then
𝑘−1 ‖
‖𝒈𝑘‖ ‖𝒈𝑘 ‖
𝒈𝑇
𝑘 (𝒈𝑘 − 𝒈 −𝒈𝑘−1 ) ‖𝒈𝑘 ‖2 − 𝒈𝑇 𝒈 −𝒈𝑇
𝑘 𝒈𝑘−1 ‖𝒈𝑘 ‖2
‖𝒈𝑘−1 ‖ 𝑘−1 ‖𝒈𝑘−1 ‖ 𝑘 𝑘−1
𝛽𝑘𝑀𝑀𝑆𝑆 = 2 = ‖𝒅𝑘−1 ‖2
< ‖𝒅 2.
||𝒅𝑘−1 || 𝑘−1 ‖
‖𝒈𝑘 ‖
Case 2: If ‖𝒈𝑘 ‖2 ≤ ‖𝒈 𝒈𝑘−1 − 𝒈𝑘−1 then 𝛽𝑘𝑀𝑀𝑆𝑆 = 0. (17)
𝑘−1 ‖
The following basic assumptions are often needed when analyzing the global
convergence properties of the CG methods.
Assumption 1.
(i) The level set Ω = {𝒙 ∈ ℝn : 𝑓(𝒙) ≤ 𝑓(𝒙0 )} is bounded, where 𝒙0 is a given starting
point.
(ii) In an open convex set Ω0 that contains Ω, 𝑓 is continuous and differentiable, and
its gradient is Lipschitz continuous; that is, for any 𝒙, 𝒚 ∈ Ω0 , there exist a constant
𝐿 > 0 such that ‖𝑔(𝒙) − 𝑔(𝒚)‖ ≤ 𝐿‖𝒙 − 𝒚‖.
Proof of this Lemma can be found in [28]. The following convergent theorem for the CG
method can be obtained by using Lemma 1 and (17).
Theorem 2: Suppose Assumption 1 hold. Consider any CG method in the form (3), where
the steplenght 𝛼𝑘 is determined by the exact line (4). In addition, suppose that the
sufficient descent condition holds true. Then,
Proof: We use a contradiction proof. Suppose that (18) is not correct then there is a
constant 𝐶 > 0 such that
1 1
‖𝒈𝑘 ‖ ≥ 𝐶 ⟺ ‖𝒈𝑘 ‖2 ≥ 𝐶 2 ⟺ ≤ 𝐶 2 , for every 𝑘 ≥ 0. (19)
‖𝒈 ‖2 𝑘
2
‖𝒅𝑘 ‖2 = (𝛽𝑀𝑀𝑆𝑆
𝑘
) ‖𝒅𝑘−1 ‖2 + ‖𝒈𝑘 ‖2.
Applying (17)
‖𝒅𝑘 ‖2 = ‖𝒈𝑘 ‖2 , for all 𝑘. (20)
‖𝒅𝑘 ‖2 1
‖𝒈𝑘 ‖4
= ‖𝒈 2 .
𝑘‖
and we obtain
‖𝒅𝑘 ‖2 1 ‖𝒅 ‖2
‖𝒈𝑘 ‖4
≤ ‖𝒈 ‖2
+ ‖𝒈𝑘−1 ‖4 . (21)
𝑘 𝑘−1
‖𝒅1 ‖2 ‖𝒅 ‖2 1 1 1 1
for 𝑘 = 1, ‖𝒈1 ‖4
≤ ‖𝒈0 ‖4 + ‖𝒈 2 = ‖𝒈 2 + ‖𝒈 2 = ∑1𝑘=0 ‖𝒈 2 ,
0 1‖ 0‖ 1‖ 𝑘‖
‖𝒅2 ‖2 ‖𝒅 ‖2 1 1 1 1 1
for 𝑘 = 2, ‖𝒈2 ‖4
≤ ‖𝒈1 ‖4 + ‖𝒈 2 ≤ ‖𝒈 2 + ‖𝒈 2 + ‖𝒈 2 = ∑2𝑘=0 ‖𝒈 2 , ……….
1 2‖ 0‖ 1‖ 2‖ 𝑘‖
‖𝒅𝑛 ‖2 ‖𝒅 ‖2 1 1 1 1 1 1
for 𝑘 = 𝑛, ‖𝒈𝑛 ‖4
≤ ‖𝒈𝑛−1 ‖4 + ‖𝒈 2 ≤ ‖𝒈 2 + ‖𝒈 2 + ‖𝒈 2 + ⋯ + ‖𝒈 2 = ∑𝑛𝑘=0 ‖𝒈 2 .
𝑛−1 𝑛‖ 0‖ 1‖ 2‖ 𝑛‖ 𝑘‖
So that,
‖𝒅𝑛 ‖2 1
‖𝒈𝑛 ‖4
≤ ∑𝑛𝑘=0 ‖𝒈 2 . (22)
𝑘‖
‖𝒈𝑘 ‖4 𝐶2
∑𝑛𝑘=0 ≥ ∑𝑛𝑘=0 𝑘+1, and further, we get
‖𝒅𝑘 ‖2
2
(𝒈𝑇
𝑘 𝒅𝑘 ) 1
∑∞
𝑘=0 ≥ 𝐶 2 ∑∞
𝑘=0 𝑘+1. (23)
‖𝒅𝑘 ‖2
function is an artificial device. Artificial functions are functions used to detect algorithmic
actions in various conditions such as the length of the narrow valleys, unimodal functions
and functions with a large number of local optimal functions.
In this paper there are thirty-one non-linear functions to be evaluated as described in
Table 1. In addition, for each test dimension, one of which is the initial point suggested
by Andrei [29]. The comparison between each method is based on the number of iterations
(NOI) and the time in seconds required to run each test problem (CPU). Test evaluation
was based on the Nocedal-line search algorithm for the exact condition (4) and coded in
MATLAB under the stop criterion is set to ‖𝒈𝑘 ‖ ≤ 10−6 . The test was performed on a
laptop with an intel ® Core TM i7- CPU @ 1.80GHz processor (8CPUs), ~2.0GHz, 16
GB RAM memory and windows 10 Professional 64bit operating system.
The numerical results are combined using the profile results described in Dolan and
More [30]. The profile results are illustrated in Figure 1 and 2. Figure 1 and 2,
respectively, are the results of the iteration and running time profiles. The results in
Figures 1 and 2 are obtained in the following way:
𝑎𝑝,𝑠
𝑟𝑝,𝑠 =
min{𝑎𝑝,𝑠 : 𝑠 ∈ 𝑆}
where 𝑟𝑝,𝑠 is performance ratio, 𝑎𝑝,𝑠 is the number of iterations or CPU time, 𝑃 is set test,
and 𝑆 is set of solvers on the test set 𝑃. Overall profile results can be obtained in the
following ways:
1
𝜌𝑠 (𝑡) = 𝑠𝑖𝑧𝑒 {𝑝 ∈ 𝑃: 𝑟𝑝,𝑠 ≤ 𝑡}
𝑛𝑝
with𝜌𝑠 (𝑡) is the probability for solver 𝑠 ∈ 𝑆 that a performance ratio 𝑟𝑝,𝑠 is within a factor
𝑡 ∈ ℝ of the best possible ratio and 𝑛𝑝 is the number of functions. The function 𝜌𝑠 (𝑡) is
the distribution function for the performance ratio. The value of 𝜌𝑠 (1) is the probability
that solver will win over the rest of solvers.
From Table 1 and Table 2, we see that for all given test function, MMSS successfully reach
solution point. The RMIL, FR and CD only reaches 98%, the DY 95% and the WYL 96%.
Based on Figure 1 the performance profile show that MMSS method almost strongly
outperforms other tested methods RMIL, FR, CD, DY, and WYL in terms of the number of
iterations (NOI) since it corresponds to the top curve. As well as on Figure 2, the performance
profile show that MMSS method almost strongly outperforms other tested methods RMIL, FR,
CD, DY, and WYL in terms of CPU time since it corresponds to the top curve. Moreover, based
on the test of the problem being tested, it was found that MMSS was better than the other
methods.
5. Conclusion
In this paper we proposed a new coefficient of conjugate gradient method, namely MMSS.
Firstly, we have proved that the sufficient descent property holds. Under some
assumptions we have showed that our proposed algorithm is globally convergent under
the exact line search. The comparison of our proposed method with RMIL, FR, CD, DY
and WYL methods shows that our new is have the best performance.
References
[1] E. Polak, “Algorithms and Consistent Approximations”, Springer, Berlin, (1997).
[2] J. Nocedal and S.J. Wright, “Numerical Optimization”, Springer, New York, (2000).
[3] L. Armijo, “Minimization of functions having Lipschitz continuous first partial
derivatives”, Pacific Journal of mathematics., vol.16, no. 1, (1966), pp.1-3.
[4] A. A. Goldstein, “On steepest descent”, Journal of the Society for Industrial and Applied
Mathematics, Series A: Control., vol. 3, no. 1, (1965), pp. 147-151.
[5] P. Wolfe, “Convergence conditions for ascent methods”, SIAM review., vol. 11, no. 2,
(1969), pp. 226-235.
[6] L. Grippo and S. Lucidi, “A globally convergent version of the Polak-Ribiere conjugate
gradient method”, Mathematical Programming., vol. 78, no. 3, (1997), pp. 375-391.
[7] M. R. Hestenes and E. Stiefel, “Methods of Conjugate Gradients for Solving Linear
Systems”, Journal of Research of The National Bureau of Standards., vol. 49, no. 6, (1952),
pp. 409-435.
[8] R. Fletcher and C. M. Reeves, “Function minimization by conjugate gradients”, The
computer journal., no. 2, (1964), pp. 149-154.
[9] R. Fletcher, “Practical methods of optimization”, Wiley Interscience John Wiley and Sons,
New York, USA, 2nd edition, (1987).
[10] Y. H. Dai and Y. Yuan, “A Nonlinear Conjugate Gradient Method with A Strong Global
Convergence Property”, SIAM Journal on optimization., vol. 10, no. 1, (1999), pp. 177-
182.
[11] Z. Wei, S. Yao, and L. Liu, “The convergence properties of some new conjugate gradient
methods”, Appl. Math. Comput., vol. 183, no. 2, (2006), pp. 1341-1350.
[12] M. Rivaie, M. Mamat, L.W. June, and I. Mohd, “A new class of nonlinear conjugate
gradient coefficients with global convergence properties”, Appl. Math. Comput., vol. 218,
no. 22, (2012), pp.11323-11332.
[13] E. Polak and G. Ribiere, “Note on The Convergence of Methods of Conjugate Directions”,
Revue Francaise d’Informatique Et De Recherche Operationnelle., vol.3, no. 16, (1969),
pp. 35-43.
[14] M. Rivaie, M. Mamat, and A. Abashar, “A new class of nonlinear conjugate gradient
coefficients with exact and inexact line searches”, Appl. Math. Comput., vol. 268, no.
October, (2015), pp. 1152–1163.
[15] O. O. O. Yousif, “The convergence properties of RMIL+ conjugate gradient method under
the strong Wolfe line search”, Appl. Math. Comput., vol. 367, (2020), p. 124777.
[16] S. Basri and M. Mamat, “A new class of nonlinear conjugate gradient with global
convergence properties”, in Materials Today: Proceedings., (2018).
[17] M. Y. Waziri, K. Ahmed, and J. Sabi’u, “A family of Hager–Zhang conjugate gradient
methods for system of monotone nonlinear equations”, Appl. Math. Comput., vol. 361,
(2019), pp. 645-660.
[18] G. Yuan, T. Li, and W. Hu, “A conjugate gradient algorithm for large-scale nonlinear
equations and image restoration problems”, Appl. Numer. Math., vol. 147, no. 11661009,
(2020), pp. 129-141.
[19] J. Liu, “Convergence properties of a class of nonlinear conjugate gradient methods”,
Comput. Oper. Res., vol. 40, no. 11, (2013), pp. 2656-2661.
[20] S. Babaie-Kafaki, “Two modified scaled nonlinear conjugate gradient methods”, J. Comput.
Appl. Math., vol. 261, (2014), pp. 172-182.
[21] D. Liu, L. Zhang, and G. Xu, “Spectral method and its application to the conjugate gradient
method”, Appl. Math. Comput., vol. 240, (2014), pp. 339-347.
[22] Y. Huang, S. Liu, X. Du, and X. Dong, “A Globally Convergent Hybrid Conjugate Gradient
Method and Its Numerical Behaviors”, Mathematical Problem in Engineering., vol. 2013,
no. 5, (2013), pp. 1-14.
[23] L. Jin-kui, Z. Li-min, and S. Xiao-qian, “Global Convergence of a Nonlinear Conjugate
Gradient Method”, Mathematical Problem in Engineering., vol. 2011, (2011), pp. 1-23.
[24] X. Yang, Z. Luo, and X. Dai, “A Global Convergence of LS-CD Hybrid Conjugate
Gradient Method,” Adv. Numer. Anal., vol. 2013, (2013), pp. 1–5.
[25] C. Xu, J. Zhu, Y. Shang, and Q. Wu, “Method over Networks”, Complexity., vol. 2020,
(2020), pp. 1-13.
[26] T. Zhu, Z. Yan, and X. Peng, “A Modified Nonlinear Conjugate Gradient Method for
Engineering Computation,” Math. Probl. Eng., vol. 2017, (2017), pp. 1-11.
[27] J. Guo and Z. Wan, “A Modified Spectral PRP Conjugate Gradient Projection Method for
Solving Large-Scale Monotone Equations and Its Application in Compressed Sensing”,
Math. Probl. Eng., vol. 2019, (2019), pp. 23–27.