
Applied Mathematics Letters 148 (2024) 108889

Contents lists available at ScienceDirect

Applied Mathematics Letters


www.elsevier.com/locate/aml

An ordinary differential equation for modeling Halpern fixed-point algorithm✩

Lulu Zhang a,b, Weifeng Gao a,b,∗, Jin Xie a,b, Hong Li a,b
a School of Mathematics and Statistics, Xidian University, Xi'an 710071, China
b Key Laboratory of Collaborative Intelligence Systems, Ministry of Education, Xidian University, Xi'an 710071, China

article info

Article history: Received 7 August 2023; Received in revised form 7 October 2023; Accepted 7 October 2023; Available online 10 October 2023

Keywords: Ordinary differential equation; Halpern fixed-point algorithm; Hessian-driven damping

abstract

Ordinary differential equations are a powerful tool for analyzing optimization algorithms. Motivated by this fact, this paper revisits the Halpern fixed-point algorithm from an ordinary differential equation perspective. More specifically, we establish a second-order ordinary differential equation with Hessian-driven damping, which is the limit of the Halpern fixed-point algorithm. The Hessian-driven damping makes it possible to significantly attenuate the oscillations.

© 2023 Elsevier Ltd. All rights reserved.
1. Introduction

In this paper, we are interested in the following monotone inclusion problem

find x∗ ∈ Rd such that Bx∗ = 0, (1)

where B : Rd → Rd is a single-valued and 1/L-co-coercive operator. Variational inequalities and the minimization of convex functions can be viewed as instances of the monotone inclusion problem (1). Problem (1) is widespread in many fields, such as computer vision, machine learning, optimization and statistics (see, e.g., [1–4]). To solve (1), various approaches have been proposed. For example, [2] showed that approximating a solution of (1) is equivalent to approximating a fixed point of a nonexpansive operator. More precisely, the fixed-point problem for a nonexpansive operator has the following form
find x∗ ∈ Rd such that T x∗ = x∗ , (2)

✩ This work was supported in part by the National Natural Science Foundation of China under Grants 62276202 and 62106186, in part by the Natural Science Basic Research Plan in Shaanxi Province of China under Grant 2022JQ-670, and in part by the Fundamental Research Funds for the Central Universities, China under Grants QTZX22047 and JB210701.
∗ Corresponding author.
E-mail addresses: zhanglulu202204@126.com (L. Zhang), gaoweifeng2004@126.com (W. Gao), xj6417@126.com (J. Xie),
lihong@mail.xidian.edu.cn (H. Li).

https://doi.org/10.1016/j.aml.2023.108889
0893-9659/© 2023 Elsevier Ltd. All rights reserved.

where T : Rd → Rd is nonexpansive. If T = I − sB, then (2) is equivalent to (1), as shown in [5], where I : Rd → Rd stands for the identity operator and s ∈ (0, 2/L]. The fixed-point problem (2) for a nonexpansive operator [6] has been investigated in many practical applications, such as convex optimization problems [2], monotone variational inequalities and convex feasibility problems [7]. To solve (2), many fixed-point algorithms have been proposed, such as the Krasnosel'skiĭ–Mann algorithm [8], the Halpern fixed-point algorithm [9] and the hybrid algorithm [10]. With the growing use of the Halpern fixed-point algorithm in applied fields, interest in it has increased significantly. Thus, we mainly focus on the Halpern fixed-point algorithm, which takes the form
xk+1 = βk x0 + (1 − βk )T xk , (3)
where x0 is the initial point, T : Rd → Rd is nonexpansive, and the coefficients βk satisfy βk ∈ (0, 1), limk→∞ βk = 0, ∑_{k=0}^∞ βk = ∞ and ∑_{k=0}^∞ |βk+1 − βk| < ∞. If T = I − sB, then the Halpern fixed-point
algorithm for solving (1) can be written as follows:

xk+1 = βk x0 + (1 − βk )xk − s(1 − βk )Bxk , (4)

where s ∈ (0, 2/L], B : Rd → Rd is a single-valued and 1/L-co-coercive operator. This paper discusses the
Halpern fixed-point algorithm (4).
Notice that the Halpern fixed-point algorithm (4) can be applied to monotone inclusion and convex optimization problems, which arise in game theory, robust optimization and minimization [8]. Thus, we
also consider the unconstrained minimization problem

min_{x∈Rd} f(x),   (5)

where f (x) is a convex function and ∇f is 1/L-co-coercive. Actually, if B = ∇f , then the problem (1)
reduces to (5). Therefore, we can apply the Halpern fixed-point algorithm (4) to solve (5). The Halpern
fixed-point algorithm for solving (5) has the following scheme

xk+1 = βk x0 + (1 − βk )xk − s(1 − βk )∇f (xk ), (6)


where βk ∈ (0, 1), s ∈ (0, 2/L] and ∇f is 1/L-co-coercive. When βk = 1/(k + 2) and βk = (w + 1)/(k + 2w + 2) with w > 2, the iterates {xk}k generated by (6) have been studied in [5] and [11], respectively. Nevertheless, the literature has not yet fully explored the continuous-time perspective, which is the main subject of the current paper.
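For concreteness, the following is a minimal sketch (ours, not from the paper) of scheme (6) applied to an illustrative convex quadratic; the function name halpern_gradient, the matrix A and the default choice βk = 1/(k + 2) are assumptions made purely for this example.

```python
import numpy as np

def halpern_gradient(grad_f, x0, s, n_iters, beta=lambda k: 1.0 / (k + 2)):
    """Scheme (6): x_{k+1} = b_k x_0 + (1 - b_k) x_k - s (1 - b_k) grad_f(x_k)."""
    x = x0.copy()
    for k in range(n_iters):
        bk = beta(k)
        x = bk * x0 + (1.0 - bk) * x - s * (1.0 - bk) * grad_f(x)
    return x

# Illustrative quadratic f(x) = 0.5 x^T A x; grad f(x) = A x is 1/L-co-coercive
# with L = lambda_max(A) = 10, so any step size s in (0, 2/L] is admissible.
A = np.diag([10.0, 1.0])
x = halpern_gradient(lambda z: A @ z, np.array([1.0, 1.0]), s=0.1, n_iters=500)
print(x)  # approaches the unique minimizer (0, 0)
```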
Ordinary differential equations (ODEs) have been recognized as a valuable tool for discovering and studying optimization algorithms, and ODE models of optimization algorithms have enjoyed much attention since the seventies of the last century, see [12–20]. For instance, letting the step size tend to 0 in the gradient algorithm, we obtain the following ODE (gradient flow): Ẋ(t) = −∇f(X(t)), with initial condition X(0) = x0 ∈ Rd. Subsequently, when f is a differentiable convex function with L-Lipschitz continuous gradient, Su et al. [12] showed that the corresponding ODE of Nesterov's accelerated gradient method [13] has the form
Ẍ(t) + (3/t)Ẋ(t) + ∇f(X(t)) = 0,   (7)
with the initial conditions X(0) = x0 and Ẋ(0) = 0. Furthermore, by constructing a Lyapunov function, Su et al. [12] provided a new proof of the fact that Nesterov's accelerated gradient method has a decay rate of O(L/k²). This shows that the convergence rate of an optimization algorithm can be studied by using ordinary differential equations.
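Before turning to more refined models, it is worth making the discrete-to-continuous correspondence explicit: one forward-Euler step of size s on the gradient flow is exactly a gradient descent step. The sketch below is our illustration on an assumed test function, not part of the original text.

```python
import numpy as np

def euler_gradient_flow(grad_f, x0, s, n_steps):
    """Forward Euler on X'(t) = -grad_f(X(t)); each step is exactly the
    gradient descent update x <- x - s * grad_f(x) with step size s."""
    x = x0.copy()
    for _ in range(n_steps):
        x = x - s * grad_f(x)
    return x

# For f(x) = 0.5 * x^2 the flow is X(t) = x0 * exp(-t), and Euler with step s
# approximates X(n * s) by x0 * (1 - s)^n, which converges as s -> 0.
x_end = euler_gradient_flow(lambda x: x, np.array([1.0]), s=0.01, n_steps=100)
print(x_end, np.exp(-1.0))  # ~0.3660 vs ~0.3679
```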
On the other hand, Wibisono et al. [14] derived a low-resolution ODE of Polyak's heavy-ball method [15] and of Nesterov's accelerated gradient method for strongly convex functions. The low-resolution ODE cannot distinguish the two algorithms. To address this, Shi et al. [16,17] proposed the high-resolution

ODE. More precisely, if f is a twice differentiable convex function with L-Lipschitz continuous gradient, the
high-resolution ODE of Nesterov’s accelerated gradient method is

Ẍ(t) + (3/t)Ẋ(t) + √s∇²f(X(t))Ẋ(t) + (1 + 3√s/(2t))∇f(X(t)) = 0,   (8)

with the initial conditions X(3√s/2) = x0 and Ẋ(3√s/2) = −√s∇f(x0). The main difference between (7) and (8) is that (8) possesses the term √s∇²f(X(t))Ẋ(t). We call √s∇²f(X(t))Ẋ(t) the Hessian-driven damping. Furthermore, for a general convex function f, Attouch et al. [18,19] presented an ODE framework
with Hessian-driven damping (DIN-AVD)α,β,b
Ẍ(t) + (α/t)Ẋ(t) + β(t)∇²f(X(t))Ẋ(t) + b(t)∇f(X(t)) = 0,   (9)

where β(t) is the damping parameter, b(t) is the time-scale parameter and b(t) > β̇(t) + β(t)/t. Notice that the ODEs (7) and (8) are two special cases of (DIN-AVD)α,β,b. More specifically, (9) reduces to the continuous version (7) when α = 3, β(t) ≡ 0 and b(t) ≡ 1; when α = 3, β(t) ≡ √s and b(t) = 1 + 3√s/(2t), (9) can
be interpreted as (8). Building on these works, we will consider the Halpern fixed-point algorithm through the lens of an ODE, which we believe offers an interesting new way to understand the algorithm.
Paper organization. The rest of this paper is organized as follows. Section 2 provides some necessary
notations and preliminary results. Section 3 presents the main results and derives the corresponding ODE
(HF-ODE) of Halpern fixed-point algorithm (6). In Section 4, we conclude this work.

2. Notations and preliminary results

This section recalls some necessary notations and concepts. Throughout this paper, we use ∥ · ∥ to denote the standard Euclidean norm and ⟨·, ·⟩ to denote the standard inner product. A single-valued operator B is said to be L-Lipschitz continuous if ∥Bx − By∥ ≤ L∥x − y∥ for all x, y ∈ dom(B), where L ≥ 0 is a Lipschitz constant and dom(B) = {x ∈ Rd : Bx ≠ ∅} denotes the domain of B. If L = 1, then we say that B is nonexpansive. If L ∈ [0, 1), then we say that B is L-contractive, where L is the contraction factor. We say that B is monotone if ⟨Bx − By, x − y⟩ ≥ 0 for all x, y ∈ dom(B). We say that B is 1/L-co-coercive if ⟨Bx − By, x − y⟩ ≥ (1/L)∥Bx − By∥² for all x, y ∈ dom(B). If L = 1, then we say that B is firmly nonexpansive.
Notice that if B is 1/L-co-coercive, then it is also monotone and L-Lipschitz continuous. ∇f (·) denotes the
first-order derivative of f (·), ∇2 f (·) denotes the second-order derivative of f (·) and x∗ denotes a minimizer
of f . Ẋ(·) denotes the first-order derivative of X(·) and Ẍ(·) denotes the second-order derivative of X(·).

3. Main results

In this section, we introduce the limit HF-ODE of the Halpern fixed-point algorithm (6). To the best of our knowledge, this work is the first to use a continuous dynamical system to model the Halpern fixed-point algorithm. Here we offer a new condition, p(βk−1 − βk) = βkβk−1 for some constant p.
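To see what this condition entails, note that it is a simple recursion for 1/βk. The following short calculation (ours, added for the reader's convenience) solves it in closed form and recovers the classical choices of βk discussed below.

```latex
% Dividing p(\beta_{k-1} - \beta_k) = \beta_k \beta_{k-1} by \beta_k \beta_{k-1} gives
\frac{1}{\beta_k} - \frac{1}{\beta_{k-1}} = \frac{1}{p}
\quad\Longrightarrow\quad
\frac{1}{\beta_k} = \frac{1}{\beta_0} + \frac{k}{p}
\quad\Longleftrightarrow\quad
\beta_k = \frac{p\,\beta_0}{p + k\,\beta_0}.
% With \beta_0 = 1/2: p = 1 gives \beta_k = 1/(k+2) (the setting of [5]),
% and p = w + 1 gives \beta_k = (w+1)/(k + 2w + 2) (the setting of [11]).
```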

Theorem 3.1. Let f : Rd → R be a twice continuously differentiable convex function such that ∇f is 1/L-co-coercive. If there exists a constant p ≥ β0/(2 − β0) such that the coefficients βk of (6) satisfy p(βk−1 − βk) = βkβk−1, then the Halpern fixed-point algorithm (6) has the limit HF-ODE

Ẍ(t) + ((p + 1)/t)Ẋ(t) + √s∇²f(X(t))Ẋ(t) + (√s/t)∇f(X(t)) = 0,   (10)

with the initial conditions X((p/β0 − (1 + p)/2)√s) = x0 and Ẋ((p/β0 − (1 + p)/2)√s) = −√s(1 − β0)∇f(x0).

Proof. We start by recalling the Halpern fixed-point algorithm (6), which has the following form

xk+1 = βk x0 + (1 − βk)xk − s(1 − βk)∇f(xk),   (11)

where x0 is the initial point and p(βk−1 − βk) = βkβk−1. Shifting the index from k to k − 1 in (11) and multiplying this expression by −βk, we deduce

−βk xk = −βkβk−1 x0 − (1 − βk−1)βk xk−1 + s(1 − βk−1)βk∇f(xk−1).   (12)

Adding (11) × βk−1 and (12), we obtain

βk−1 xk+1 − βk xk = (1 − βk)βk−1 xk − s(1 − βk)βk−1∇f(xk) − (1 − βk−1)βk xk−1 + s(1 − βk−1)βk∇f(xk−1),

which leads to

βk−1(xk+1 − xk) = (1 − βk−1)βk(xk − xk−1) − s(1 − βk)βk−1∇f(xk) + s(1 − βk−1)βk∇f(xk−1).   (13)
Letting tk := (p/βk − (1 + p)/2)√s, we introduce the Ansatz xk = X(tk) for a smooth curve X(t) defined for t ≥ (p/β0 − (1 + p)/2)√s. Note that the condition p(βk−1 − βk) = βkβk−1 gives p(1/βk − 1/βk−1) = 1, so that tk − tk−1 = √s. By Taylor expansion we then have

(xk+1 − xk)/√s = Ẋ(tk) + (1/2)Ẍ(tk)√s + o(√s),
(xk − xk−1)/√s = Ẋ(tk) − (1/2)Ẍ(tk)√s + o(√s),   (14)

and a Taylor expansion for ∇f gives

∇f(xk−1) = ∇f(xk) − ∇²f(X(tk))Ẋ(tk)√s + o(√s).   (15)

Then plugging (14) and (15) into (13) × 1/√s, we have

βk−1(Ẋ(tk) + (√s/2)Ẍ(tk) + o(√s)) = (1 − βk−1)βk(Ẋ(tk) − (√s/2)Ẍ(tk) + o(√s))
    + √s(1 − βk−1)βk(∇f(X(tk)) − ∇²f(X(tk))Ẋ(tk)√s + o(√s))
    − √s(1 − βk)βk−1∇f(X(tk)).
Rearranging it, we obtain

(√s/2)[βk−1 + (1 − βk−1)βk]Ẍ(tk) + [βk−1 − (1 − βk−1)βk]Ẋ(tk)
    + s(1 − βk−1)βk∇²f(X(tk))Ẋ(tk) + √s(βk−1 − βk)∇f(X(tk)) + o(s) = 0.   (16)

Multiplying both sides of (16) by 2/(√s[βk−1 + (1 − βk−1)βk]), we obtain

Ẍ(tk) + (2[βk−1 − (1 − βk−1)βk]/(√s[βk−1 + (1 − βk−1)βk]))Ẋ(tk)
    + (2√s(1 − βk−1)βk/(βk−1 + (1 − βk−1)βk))∇²f(X(tk))Ẋ(tk)
    + (2(βk−1 − βk)/(βk−1 + (1 − βk−1)βk))∇f(X(tk)) + o(√s) = 0.   (17)

Substituting p(βk−1 − βk) = βkβk−1 into (17), note that 1/βk − 1/βk−1 = 1/p together with the definition of tk yields 1/βk + 1/βk−1 − 1 = 2tk/(p√s); hence the coefficients of Ẋ(tk) and ∇f(X(tk)) in (17) simplify to (p + 1)/tk and √s/tk, while the coefficient of ∇²f(X(tk))Ẋ(tk) equals √s + O(s). Identifying tk with t and ignoring the o(√s) and O(s) terms, we obtain the HF-ODE of the Halpern fixed-point algorithm (11):

Ẍ(t) + ((p + 1)/t)Ẋ(t) + √s∇²f(X(t))Ẋ(t) + (√s/t)∇f(X(t)) = 0,   (18)

where the first initial condition is X((p/β0 − (1 + p)/2)√s) = x0. By (11), we obtain

(x1 − x0)/√s = −√s(1 − β0)∇f(x0).

Therefore, the second initial condition is Ẋ((p/β0 − (1 + p)/2)√s) = −√s(1 − β0)∇f(x0). □

Fig. 1. Evolution of the trajectories for NHF-ODE (p=1.1) and HF-ODE (p=1.1) on an ill-conditioned quadratic problem in R2 .

3.1. Particular cases

As anticipated above, by introducing the condition p(βk−1 − βk) = βkβk−1, we recover the ODEs of the Halpern fixed-point algorithm known in the literature and generalize these results.
• βk = 1/(k + 2)

In this case, we choose p = 1 and tk = (k + 1)√s, and the limit ODE of the Halpern fixed-point algorithm (6) in [5] is

Ẍ(t) + (2/t)Ẋ(t) + √s∇²f(X(t))Ẋ(t) + (√s/t)∇f(X(t)) = 0.
• βk = (w + 1)/(k + 2w + 2)

Let us set p = w + 1 and tk = (k + 3w/2 + 1)√s, and the limit ODE of the Halpern fixed-point algorithm (6) in [11] is

Ẍ(t) + ((w + 2)/t)Ẋ(t) + √s∇²f(X(t))Ẋ(t) + (√s/t)∇f(X(t)) = 0.

Remark 3.1. If we set α = p + 1, β(t) = √s and b(t) = √s/t, then the HF-ODE (10) satisfies b(t) = β̇(t) + β(t)/t with equality, which places it outside the ODE framework (DIN-AVD)α,β,b, where the strict inequality b(t) > β̇(t) + β(t)/t is required.

Remark 3.2. In the absence of the Hessian-driven damping √s∇²f(X(t))Ẋ(t), the corresponding ODE of the Halpern fixed-point algorithm (6) is

Ẍ(t) + ((p + 1)/t)Ẋ(t) + (√s/t)∇f(X(t)) = 0,

which we call the NHF-ODE. We omit the proof; for more details, see [12].

3.2. HF-ODE with Hessian-driven damping

To better understand the role of the Hessian-driven damping, we compare the two continuous systems NHF-ODE and HF-ODE on a simple minimization problem. Taking d = 2, the step size s = 0.09 and the ill-conditioned function f(x1, x2) = (1/2)(10000x1² + x2²), we obtain the trajectories in Figs. 1 and 2. For Fig. 1, we set p = 1.1 and the initial conditions (x1(0.345), x2(0.345)) = (1, 1), (ẋ1(0.345), ẋ2(0.345)) = (−1500, −0.15). For Fig. 2, we take p = 2.1 and the initial conditions (x1(0.795), x2(0.795)) = (1, 1), (ẋ1(0.795), ẋ2(0.795)) = (−1500, −0.15). The wild oscillations of the NHF-ODE are neutralized by the Hessian-driven damping in the HF-ODE; a sketch of this experiment is given below.
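The following scipy sketch is our reconstruction of this experiment, not the authors' code. The value β0 = 1/2 is an inference from the reported initial times (for instance, (p/β0 − (1 + p)/2)√s = 0.345 for p = 1.1 and √s = 0.3).

```python
import numpy as np
from scipy.integrate import solve_ivp

# Ill-conditioned quadratic from Section 3.2: f(x1, x2) = 0.5*(10000*x1^2 + x2^2).
H = np.diag([10000.0, 1.0])                  # constant Hessian of f
grad = lambda x: H @ x
s, p, beta0 = 0.09, 1.1, 0.5                 # beta0 = 1/2 is our inference (see text)
rs = np.sqrt(s)                              # sqrt(s) = 0.3
t0 = (p / beta0 - (1 + p) / 2) * rs          # = 0.345 for p = 1.1
y0 = [1.0, 1.0, -1500.0, -0.15]              # (x1, x2, dx1/dt, dx2/dt) at t = t0

def rhs(t, y, hessian_damping):
    x, v = y[:2], y[2:]
    acc = -(p + 1) / t * v - (rs / t) * grad(x)   # terms shared with the NHF-ODE
    if hessian_damping:                           # HF-ODE adds sqrt(s) * Hess(f) @ X'
        acc = acc - rs * (H @ v)
    return np.concatenate([v, acc])

# Radau handles the stiffness introduced by the 10000-eigenvalue direction.
hf = solve_ivp(rhs, (t0, 10.0), y0, args=(True,), method="Radau", rtol=1e-8, atol=1e-10)
nhf = solve_ivp(rhs, (t0, 10.0), y0, args=(False,), method="Radau", rtol=1e-8, atol=1e-10)
# Plotting (hf.y[0], hf.y[1]) against (nhf.y[0], nhf.y[1]) reproduces the qualitative
# behavior of Figs. 1-2: the NHF-ODE trajectory oscillates wildly along the stiff
# x1-direction, while the Hessian-driven damping in the HF-ODE flattens them.
```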

Fig. 2. Evolution of the trajectories for NHF-ODE (p = 2.1) and HF-ODE (p = 2.1) on an ill-conditioned quadratic problem in R2 .

4. Conclusions

This paper is devoted to an ordinary differential equation for modeling the Halpern fixed-point algorithm. More precisely, we derive the limit ordinary differential equation HF-ODE of the Halpern fixed-point algorithm under a given family of coefficients. This limit ordinary differential equation is approximately equivalent to the Halpern fixed-point algorithm and can thus serve as a tool for its analysis.

CRediT authorship contribution statement

Lulu Zhang: Originated the proposed idea, Algorithm analysis, Writing – original draft. Weifeng Gao:
Responsible for the research direction, Paper organization. Jin Xie: Writing – original draft, Managing of
research funding. Hong Li: Writing – original draft, Managing of research funding.

Data availability

We did not analyze or generate any datasets, because our work is theoretical and mathematical in nature. The relevant materials can be obtained from the references below.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grants 62276202 and 62106186, in part by the Natural Science Basic Research Plan in Shaanxi Province of China under Grants 2022JQ-670 and 2022JM-372, and in part by the Fundamental Research Funds for the Central Universities, China under Grants QTZX22047 and JB210701.

References

[1] H. Attouch, M. Soueycatt, Augmented Lagrangian and proximal alternating direction methods of multipliers in Hilbert spaces. Applications to games, PDE's and control, Pac. J. Optim. 5 (1) (2009) 17–37.
[2] H.H. Bauschke, P.L. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces, Springer, New
York, 2017.
[3] P.L. Combettes, Monotone operator theory in convex optimization, Math. Program. B 170 (1) (2018) 177–206.
[4] R.T. Rockafellar, Monotone operators and the proximal point algorithm, SIAM J. Control Optim. 14 (1976) 877–898.
[5] F. Lieder, On the convergence rate of the Halpern-iteration, Optim. Lett. 15 (2) (2021) 405–418.
[6] K. Goebel, W.A. Kirk, Topics in Metric Fixed Point Theory, in: Cambridge Studies in Advanced Mathematics, Cambridge
University Press, Cambridge, 1990.

[7] H.H. Bauschke, J.M. Borwein, On projection algorithms for solving convex feasibility problems, SIAM Rev. 38 (3) (1996)
367–426.
[8] M.A. Krasnosel'skiĭ, Two remarks on the method of successive approximations, Uspekhi Mat. Nauk 10 (1955) 123–127.
[9] B. Halpern, Fixed points of nonexpansive maps, Bull. Amer. Math. Soc. 73 (1967) 957–961.
[10] K. Nakajo, W. Takahashi, Strong convergence theorems for nonexpansive mappings and nonexpansive semigroups, J.
Math. Anal. Appl. 279 (2003) 372–379.
[11] T.D. Quoc, The connection between Nesterov's accelerated methods and Halpern fixed-point iterations, 2022, arXiv:2203.04869v1.
[12] W.J. Su, S. Boyd, E.L. Candès, A differential equation for modeling Nesterov’s accelerated gradient method: theory and
insights, J. Mach. Learn. Res. 17 (153) (2016) 1–43.
[13] Y. Nesterov, A method of solving a convex programming problem with convergence rate O(1/k²), Sov. Math. Doklady 27 (1983) 372–376.
[14] A. Wibisono, A.C. Wilson, M.I. Jordan, A variational perspective on accelerated methods in optimization, Proc. Natl. Acad. Sci. 113 (47) (2016) E7351–E7358.
[15] B.T. Polyak, Some methods of speeding up the convergence of iteration methods, USSR Comput. Math. Math. Phys. 4
(5) (1964) 1–17.
[16] B. Shi, S.S. Du, M.I. Jordan, W.J. Su, Understanding the acceleration phenomenon via high-resolution differential equations, Math. Program. (2018), arXiv preprint arXiv:1810.08907.
[17] B. Shi, S.S. Du, W.J. Su, M.I. Jordan, Acceleration via symplectic discretization of high-resolution differential equations,
in: Advances in Neural Information Processing Systems, 2019, pp. 5745–5753.
[18] H. Attouch, Z. Chbani, J. Fadili, H. Riahi, First-order optimization algorithms via systems with Hessian driven damping,
Math. Program. 193 (2022) 113–155.
[19] H. Attouch, Z. Chbani, J. Fadili, H. Riahi, Convergence of iterates for first-order optimization algorithms with inertia and Hessian driven damping, Optimization 72 (5) (2023) 1199–1238.
[20] B. Xue, H.L. Du, X.G. Geng, New integrable peakon equation and its dynamic system, Appl. Math. Lett. (2023)
http://dx.doi.org/10.1016/j.aml.2023.108795.
