

Article
An Inertial Parametric Douglas–Rachford Splitting Method for
Nonconvex Problems
Tianle Lu and Xue Zhang *

School of Mathematics and Computer Science, Shanxi Normal University, Taiyuan 030031, China;
221109020@sxnu.edu.cn
* Correspondence: zhangxue2100@sxnu.edu.cn

Abstract: In this paper, we propose an inertial parametric Douglas–Rachford splitting method for min-
imizing the sum of two nonconvex functions, which has a wide range of applications. The proposed
algorithm combines the inertial technique, the parametric technique, and the Douglas–Rachford
method. Subsequently, in theoretical analysis, we construct a new merit function and establish
the convergence of the sequence generated by the inertial parametric Douglas–Rachford splitting
method. Finally, we present some numerical results on nonconvex feasibility problems to illustrate
the efficiency of the proposed method.

Keywords: nonconvex; Douglas–Rachford splitting; inertial; parameterized

MSC: 90C26; 65K05; 90C90

1. Introduction
The Douglas–Rachford (DR) splitting method is a classical optimization algorithm
initially proposed by Douglas and Rachford [1] for the numerical solution of heat differential
equations. Later, Lions and Mercier [2], through their pioneering work, made the algorithm
applicable to a class of optimization problems formulated as follows:

$$\min_{u}\ \phi(u) = f(u) + g(u), \tag{1}$$

where f and g are closed convex functions. Following that, the DR splitting method has also
been widely applied to various optimization problems that arise from signal processing,
tensor recovery, and image processing, where the objective function is the sum of two
proper closed convex functions; see, for example, Combettes and Pesquet [3], Gandy et al. [4],
He and Yuan [5], and Qu et al. [6].

Moreover, the DR splitting method can solve the more general problem of finding the
zeros of two maximal monotone operators. When both f and g in Problem (1) are convex
functions, and the corresponding operators are the subdifferentials of f and g, the DR
splitting method corresponds to the following iterative procedure:

$$x^{k+1} = \frac{1}{2}x^{k} + \frac{1}{2}\big(2\operatorname{prox}_{\gamma g} - I\big)\big(2\operatorname{prox}_{\gamma f} - I\big)(x^{k}), \tag{2}$$

where the step-size parameter γ > 0, I is the identity mapping, and the proximal mapping is
defined as

$$\operatorname{prox}_{\gamma f}(x) := \arg\min_{u}\Big\{ f(u) + \frac{1}{2\gamma}\|u - x\|^{2} \Big\}.$$

Although both f and g are convex functions in Problem (1), the proximal operator of
the objective function ϕ is challenging to compute. As indicated by (2), the DR splitting
method overcomes this difficulty. It converts solving Problem (1) into solving two easily
computable proximal operators. The DR splitting algorithm is a powerful method for
solving problems with such a sum of functions.
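As a concrete illustration of iteration (2), the following Python sketch runs the DR splitting with two
proximal oracles on a small quadratic-plus-ℓ1 test problem. The test problem, the variable names, and
all parameter values are our own illustrative choices and are not taken from the paper.

```python
import numpy as np

def prox_quadratic(x, gamma, A, b):
    # prox of f(u) = 0.5 * ||A u - b||^2: solve (I + gamma * A^T A) u = x + gamma * A^T b
    n = A.shape[1]
    return np.linalg.solve(np.eye(n) + gamma * A.T @ A, x + gamma * A.T @ b)

def prox_l1(x, gamma, lam):
    # prox of g(u) = lam * ||u||_1: componentwise soft-thresholding
    return np.sign(x) * np.maximum(np.abs(x) - gamma * lam, 0.0)

def douglas_rachford(x0, gamma, prox_f, prox_g, iters=500):
    # Iteration (2); the update x <- x + (z - y) is algebraically the same as
    # x <- x/2 + (1/2)(2 prox_{gamma g} - I)(2 prox_{gamma f} - I)(x).
    x = x0.copy()
    for _ in range(iters):
        y = prox_f(x)            # prox_{gamma f}(x^k)
        z = prox_g(2.0 * y - x)  # prox_{gamma g}(2 y^{k+1} - x^k)
        x = x + (z - y)
    return prox_f(x)             # report the y-iterate as the candidate solution

# hypothetical random instance, for illustration only
rng = np.random.default_rng(0)
A, b = rng.standard_normal((20, 50)), rng.standard_normal(20)
sol = douglas_rachford(np.zeros(50), 1.0,
                       lambda v: prox_quadratic(v, 1.0, A, b),
                       lambda v: prox_l1(v, 1.0, 0.1))
```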
Recently, Patrinos and Stella [7] introduced the Douglas–Rachford Envelope (DRE)
function, which has a stationary point corresponding to the solution to the problem of
minimizing the sum of two convex functions, f + g, under linear constraints. Notably, DRE
has greatly assisted in subsequent research on nonconvex optimization. Barshad et al. [8]
recently proposed an unrestricted DR method for solving convex feasibility problems in
Hilbert space. While the DR splitting method is typically limited to a finite number of
sets, they introduced DR algorithms that allow the number of sets involved to be (almost)
unrestricted. For convex cases, the DR splitting method has been extensively utilized over
the past few decades, with significant progress made in proving convergence and exploring
various generalizations and extensions. For more details on the DR splitting method, we
recommend these survey articles [9,10].
More recently, the DR splitting method has demonstrated promising performance
in tackling nonconvex problems [11–16]. For example, Li and Pong [11] introduced a
merit function inspired by the DRE [7] and proved that the DR splitting method with the
following iterative form:

$$\begin{cases}
y^{k+1} \in \arg\min\limits_{y}\Big\{ f(y) + \dfrac{1}{2\gamma}\|y - x^{k}\|^{2} \Big\},\\[4pt]
z^{k+1} \in \arg\min\limits_{z}\Big\{ g(z) + \dfrac{1}{2\gamma}\|z - (2y^{k+1} - x^{k})\|^{2} \Big\},\\[4pt]
x^{k+1} = x^{k} + (z^{k+1} - y^{k+1}),
\end{cases}$$

could be applied to nonconvex feasibility problems under certain assumptions on the


function f , g, and step-size parameter γ > 0. They also showed the global convergence of
the whole iterative sequence when γ satisfies the following inequality:

$$(1 + \gamma L)^{2} + \frac{5}{2}\gamma s - \frac{3}{2} < 0, \tag{3}$$

where L is the Lipschitz constant of ∇f and s ∈ ℝ is such that f + (s/2)∥·∥² is convex. In contrast


to [11], Themelis and Patrinos [12] established the convergence of the alternating direc-
tion method of multipliers (ADMM) and the DR splitting method by relaxing certain
constraints on the algorithm parameters and considering larger step sizes and relaxation
parameters. Subsequently, based on the above findings, Themelis et al. [13] proposed
two line-search algorithms, which enhanced these methods with quasi-Newton directions.
Dao and Phan [14] proved the convergence of the adaptive Douglas–Rachford splitting
method for solving Problem (1) under suitable assumptions, where the functions f and g
are closed and, respectively, a- and b-convex with a + b ≥ 0. For a fundamental nonconvex
composite optimization problem in federated learning, Tran Dinh et al. [15] introduced two
new algorithms called Federated Douglas–Rachford (FedDR) and asynchronous FedDR
(asyncFedDR). These algorithms utilize novel techniques, including the Douglas–Rachford
splitting method and asynchronous implementation, to address statistical and system het-
erogeneity efficiently. Bian and Zhang [16] studied the parameterized Douglas–Rachford
(PDR) splitting method and demonstrated its effectiveness. Notably, they modified the
z-update of the DR splitting method as follows:

$$\begin{cases}
y^{k+1} \in \arg\min\limits_{y}\Big\{ f(y) + \dfrac{1}{2\gamma}\|y - x^{k}\|^{2} \Big\},\\[4pt]
z^{k+1} \in \arg\min\limits_{z}\Big\{ g(z) + \dfrac{1}{2\gamma}\|z - (\alpha y^{k+1} - x^{k})\|^{2} \Big\},\\[4pt]
x^{k+1} = x^{k} + (z^{k+1} - y^{k+1}),
\end{cases}$$

where α ∈ (3/2, 2] is the parameterized coefficient, the step-size γ > 0, and they satisfy the
following inequality:
$$\frac{4-\alpha}{2}(1 + \gamma L)^{2} + \frac{9-2\alpha}{2}\gamma s - \frac{3}{2} < 0. \tag{4}$$
In nonconvex settings, it is evident that the PDR splitting method is a specific instance of the
adaptive DR method [14]. Parameterization techniques make DR algorithms more flexible in
practice. Moreover, the numerical experiments reported for PDR show that it saves a significant
amount of running time compared to DR. Beyond the PDR splitting method, much other work on DR
splitting methods (see [17–19]) introduces parameters into the iteration formulas, which further
illustrates the benefit of parameterization.
On the other hand, the inertial scheme, also called the heavy ball method proposed
by Polyak [20], has been widely used in optimization algorithms, and its effectiveness has
been proved. Inertia strategies help solve nonconvex minimization problems more rapidly.
For instance, Boţ and Csetnek [21] proposed the inertial Douglas–Rachford splitting for
monotone inclusion problems. Based on the inexact proximal point algorithm, Alves et
al. [22] combined inertial step and overrelaxation to propose a partially inexact inertial-
relaxed Douglas–Rachford algorithm. Meanwhile, Han et al. [19] study the randomized
r-sets–Douglas–Rachford (RrDR) method with the inertial scheme and show that it con-
verges at an accelerated linear rate. Additionally, Feng et al. [23] developed an inertial
Douglas–Rachford splitting (IDRS) method and demonstrated its validity in terms of signal
recovery. The IDRS method is given as follows:

$$\begin{cases}
y^{k+1} \in \arg\min\limits_{y}\Big\{ f(y) + \dfrac{1}{2\gamma}\|y - u^{k}\|^{2} \Big\},\\[4pt]
z^{k+1} \in \arg\min\limits_{z}\Big\{ g(z) + \dfrac{1}{2\gamma}\|z - (2y^{k+1} - u^{k})\|^{2} \Big\},\\[4pt]
x^{k+1} = u^{k} + (z^{k+1} - y^{k+1}),\\[2pt]
u^{k+1} = x^{k+1} + \beta(x^{k+1} - x^{k}),
\end{cases}$$

where β > 0 is the inertial parameter, and β and γ satisfy the following conditions:

$$\frac{1}{\gamma} > s, \qquad \beta < \frac{1 - 5\gamma s - 4\gamma L - 2\gamma^{2} L^{2}}{2}. \tag{5}$$

The use of parameterization and inertial strategies can improve the performance of
the DR method. It is natural to pose the following question: Can we develop an efficient
approach to solving Problem (1) by leveraging both parameterization and inertial strategies?
Given the prevalence of nonconvex optimization problems in practical applications,
investigating the DR splitting method for solving these problems holds significant and
far-reaching implications. In this paper, we consider the nonconvex and nonsmooth mini-
mization problems (1), where f and g are properly closed, possibly nonconvex functions.
We propose to combine the inertial and parametric techniques with the DR splitting method
and introduce an inertial parametric Douglas–Rachford (IPDR) splitting method. Moreover,
we demonstrate that the IPDR splitting method generates a stationary point if the sequence
generated by the method has a cluster point. Our analysis relies heavily on our defined
merit function (see Definition 5), which generates a decreasing sequence along the IPDR
splitting method. Additionally, because of our IPDR method, it is easy to obtain DR, PDR,
and IDRS by appropriate choice of parameter simplification. Then, we obtain a unified
convergence analysis of DR, PDR, and IDRS as a byproduct. Finally, numerical results
on nonconvex feasibility problems demonstrate that our algorithm saves computing time
while improving accuracy.
The structure of this paper is organized as follows. In Section 2, we present some
fundamental concepts and preliminary materials. In Section 3, we introduce the IPDR

method. We also demonstrate the convergence of the proposed algorithm with suitable
assumptions. Section 4 presents the results of numerical experiments, which are illustrated
and discussed. Finally, we provide concluding remarks in Section 5.

2. Notation and Preliminaries


In this section, we will provide basic concepts and notations which are necessary for
the understanding of subsequent results.
Let ℝⁿ denote the n-dimensional Euclidean space, ⟨·, ·⟩ denote the inner product,
and ∥·∥ = √⟨·, ·⟩ denote the induced norm. For an extended real valued function
f : Rn → (−∞, ∞], we say that f is proper if it is never −∞ and its domain,
dom f := { x ∈ Rn : f ( x ) < +∞} is nonempty. The function is called closed if it is lower
semicontinuous. We also use dist( x, X ) to denote the distance from the point x to the set
X , i.e., dist( x, X ) := infy∈X ∥ x − y∥. The set of all cluster points of { x k }k∈N is denoted by
C ({ x k }k∈N ). The set of critical points of f is represented by crit f , i.e.,

crit f = {u ∈ dom f : 0 ∈ ∂ f (u)}.

Definition 1 ([24]). (limiting subdifferential) Let f be a proper function. The limiting subdifferen-
tial of f at x ∈ dom f is defined by

$$\partial f(x) := \Big\{ v \in \mathbb{R}^{n} : \exists\, x^{t} \xrightarrow{f} x,\ v^{t} \to v
\ \text{with}\ \liminf_{z \to x^{t}} \frac{f(z) - f(x^{t}) - \langle v^{t}, z - x^{t}\rangle}{\|z - x^{t}\|}
\ge 0 \ \text{for each } t \Big\}, \tag{6}$$

where $x^{t} \xrightarrow{f} x$ means $x^{t} \to x$ and $f(x^{t}) \to f(x)$.

If f is differentiable at x, we have ∂ f ( x ) = {∇ f ( x )}. If f is convex, we have

∂ f ( x ) = {v ∈ Rn : f (z) ≥ f ( x ) + ⟨v, z − x ⟩ for any z ∈ Rn }, (7)

which is the classic definition of subdifferential in convex analysis.


Next, we list some fundamental results for subsequent use.

Lemma 1 ([25]). Let α ∈ (0, 1), let {a_k}_{k∈ℕ} be a summable sequence, and let {b_k}_{k∈ℕ} be a
non-negative real sequence satisfying

$$b_{k+1} \le \alpha b_{k} + a_{k}, \quad \forall k \ge 1;$$

then $\sum_{k=0}^{\infty} b_{k} < +\infty$.

Lemma 2 ([24]). Given a bounded sequence {x^k}_{k∈ℕ} ⊆ ℝⁿ, the set C({x^k}_{k∈ℕ}) is nonempty and
compact. Moreover, we have

lim dist( x k , C ({ x k }k∈N )) = 0.


k→∞

Lemma 3 ([26]). (Descent lemma) Consider a differentiable function f : ℝⁿ → ℝ. If the gradient
∇f is L-Lipschitz continuous, then for all x, y ∈ ℝⁿ, it holds that

$$|f(y) - f(x) - \langle \nabla f(x), y - x\rangle| \le \frac{L}{2}\|y - x\|^{2}.$$

Proposition 1. Let f , f1 : Rn → R ∪ {+∞} be proper functions, then the following conclusions hold.
1. Let { x k }, {vk } be sequences, if x k → x, vk → v, vk ∈ ∂ f ( x k ) and f ( x k ) → f ( x ), then

v ∈ ∂ f ( x );

2. Suppose f is lower semicontinuous, and x ∗ ∈ Rn is a local minimizer of f , then

0 ∈ ∂ f ( x ∗ ); (8)

3. Given a point x̄ with f(x̄) finite, if the function f₁ is continuously differentiable on a
neighborhood of x̄, then
$$\partial(f + f_{1})(\bar{x}) = \partial f(\bar{x}) + \nabla f_{1}(\bar{x});$$
4. For a given function f, if it is strongly convex with coefficient α > 0, then
$$f(y) \ge f(x) + \langle v, y - x\rangle + \frac{\alpha}{2}\|y - x\|^{2}, \quad \forall x, y \ \text{and}\ v \in \partial f(x).$$

Remark 1. We point out that


1. For the objective function in Problem (1) and any point u ∈ dom ϕ, by Proposition 1(3),
we have that
$$\partial \phi(u) = \nabla f(u) + \partial g(u);$$
2. If 0 ∈ ∂f(x∗), then x∗ is a stationary point of the function f. In the next section, the
problem of how to find a stationary point of the objective function ϕ in (1) will be formally
considered.

The proximal operator plays a crucial role in the design of our algorithm. Therefore,
we review its definition and significant properties as follows.

Definition 2 ([24]). (Proximal mapping) Given a proper and lower semicontinuous function
f : Rn → R ∪ {+∞} and a parameter α > 0, the proximal mapping Pα f of f associated with
parameter α > 0 is defined by:
$$P_{\alpha f}(x) = \arg\min_{y}\Big\{ f(y) + \frac{1}{2\alpha}\|y - x\|^{2} \Big\}.$$

Proposition 2. Suppose the function f is proper and lower semicontinuous. Given α > 0.
1. If f is lower bounded, i.e., inf f > −∞, then for each x̄ ∈ dom( f ), the set Pα f ( x̄ ) is
nonempty and compact [24] (Theorem 1.17);
2. If f is convex, the proximal mapping Pα f is single valued and firmly nonexpansive [27]
(Lemma 11.1), i.e.,

∥ Pα f ( x ) − Pα f (y)∥2 ≤ ⟨ Pα f ( x ) − Pα f (y), x − y⟩, ∀ x, y.

Definition 3 ([24]). (Indicator function) For a closed set S ⊆ ℝⁿ, its indicator function I_S is
defined by
$$I_{S}(x) = \begin{cases} 0, & \text{if } x \in S,\\ +\infty, & \text{if } x \notin S.\end{cases}$$
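As a quick illustration of Definitions 2 and 3 (our own toy example, not from the paper): for the
indicator of a box S = {u : ∥u∥∞ ≤ c}, the proximal mapping is the Euclidean projection onto S,
i.e., componentwise clipping, and the firm nonexpansiveness from Proposition 2(2) can be checked
numerically.

```python
import numpy as np

def prox_box_indicator(x, c):
    # For f = I_S with S = {u : ||u||_inf <= c}, P_{alpha f}(x) is the projection
    # onto S for every alpha > 0: componentwise clipping to [-c, c].
    return np.clip(x, -c, c)

rng = np.random.default_rng(1)
x, y, c = rng.standard_normal(5), rng.standard_normal(5), 0.5
px, py = prox_box_indicator(x, c), prox_box_indicator(y, c)
# firm nonexpansiveness (Proposition 2(2)), valid here since I_S is convex
assert np.dot(px - py, px - py) <= np.dot(px - py, x - y) + 1e-12
```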

At last, we list another important concept, the Kurdyka–Łojasiewicz property.

Definition 4 ([28]). (Kurdyka–Łojasiewicz property) A function f : Rn → R ∪ {+∞} is said to


have the KL property at x ∗ ∈ dom ∂ f if there exists η ∈ (0, +∞], a neighborhood U of x ∗ , and a
continuous concave function ψ : [0, η ) → R+ such that:
1. ψ(0) = 0, ψ ∈ C1 (0, η ), and for all s ∈ (0, η ), ψ′ (s) > 0;
2. For all x in U ∩ [f(x∗) < f(x) < f(x∗) + η], the following KL inequality holds

ψ′ ( f ( x ) − f ( x ∗ )) dist(0, ∂ f ( x )) ≥ 1. (9)

A function that satisfies the KL property at each point of its domain is called a
KL function.

Lemma 4 ([29]). (Uniformized KL property) Let Ω ⊆ Rn be a compact set and f : Rn →


(−∞, +∞] be a proper and lower semicontinuous function. Assume that f is constant on Ω and
f satisfies the KL property at each point of Ω. Then there exist ε > 0, η > 0, and a desingularization
function ψ, such that for all x∗ ∈ Ω and for all x in

{ x ∈ Rn : dist( x, Ω) < ε} ∩ { f ( x ∗ ) < f ( x ) < f ( x ∗ ) + η },

one has
ψ′ ( f ( x ) − f ( x ∗ )) · dist(0, ∂ f ( x )) ≥ 1.

3. Algorithm and Convergence


3.1. The IPDR Algorithm
In this section, we present the algorithm for Problem (1) under the following assumptions.

Assumption 1. Functions f , g satisfy


1. The function f has a Lipschitz continuous gradient, i.e., there exists a constant L ≥ 0 such
that
∥∇ f ( x ) − ∇ f (y)∥ ≤ L∥ x − y∥, ∀ x, y ∈ Rn ; (10)
2. The function g is a proper closed function.

Remark 2. Based on Assumption 1, we conclude from Lemma 3 that f(·) + (l/2)∥·∥² is convex for
any fixed l ≥ L. Specifically, take s ≤ L such that
$$f_{s+}(\cdot) := f(\cdot) + \frac{s}{2}\|\cdot\|^{2}$$
is convex.

Algorithm 1 IPDR method for solving (1)

Input: Choose parameters γ, β > 0, α ∈ (3/2, 3]; Initialize: y⁰, z⁰, x⁰, u⁰, and set k = 0.
Output: optimal x∗.
1: repeat
2:   compute
     $$y^{k+1} \in \arg\min_{y}\Big\{ f(y) + \frac{1}{2\gamma}\|y - u^{k}\|^{2} \Big\}; \tag{11}$$
3:   compute
     $$z^{k+1} \in \arg\min_{z}\Big\{ g(z) + \frac{1}{2\gamma}\|z - (\alpha y^{k+1} - u^{k})\|^{2} \Big\}; \tag{12}$$
4:   compute
     $$x^{k+1} = u^{k} + (z^{k+1} - y^{k+1}); \tag{13}$$
5:   compute
     $$u^{k+1} = x^{k+1} + \beta(x^{k+1} - x^{k}); \tag{14}$$
6:   k = k + 1;
7: until a termination criterion is satisfied.
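The following Python sketch spells out one pass of Algorithm 1 with user-supplied proximal
oracles. It is a minimal illustration under the assumption that prox_f and prox_g return
minimizers of the subproblems (11) and (12); the stopping test and all default values are our own
choices, not the criteria used later in the experiments.

```python
import numpy as np

def ipdr(prox_f, prox_g, x0, gamma, alpha, beta, max_iter=1000, tol=1e-8):
    """Sketch of the IPDR iteration (11)-(14).

    prox_f(v) and prox_g(v) should return a minimizer of f(y) + ||y - v||^2 / (2*gamma)
    and g(z) + ||z - v||^2 / (2*gamma), respectively. With alpha = 2 and beta = 0 this
    reduces to the classical DR iteration.
    """
    x = x0.copy()
    u = x0.copy()          # u^0; taking u^0 = x^0 corresponds to a zero initial inertial step
    y = z = x0.copy()
    for _ in range(max_iter):
        y = prox_f(u)                      # step (11)
        z = prox_g(alpha * y - u)          # step (12)
        x_new = u + (z - y)                # step (13)
        u = x_new + beta * (x_new - x)     # step (14): inertial extrapolation (x holds x^k here)
        if np.linalg.norm(x_new - x) <= tol * max(np.linalg.norm(x), 1.0):
            x = x_new
            break
        x = x_new
    return y, z, x
```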

Remark 3. Using the optimality conditions and the subdifferential calculus rules for the y- and
z-updates in (11) and (12), we have
$$0 = \nabla f(y^{k+1}) + \frac{1}{\gamma}\big(y^{k+1} - u^{k}\big), \tag{15}$$
$$0 \in \partial g(z^{k+1}) + \frac{1}{\gamma}\big(z^{k+1} + u^{k} - \alpha y^{k+1}\big). \tag{16}$$

If a limiting point (y∗, z∗, x∗) of the sequence {(y^k, z^k, x^k)} exists, we have y∗ = z∗, and
$$0 \in \nabla f(y^{*}) + \partial g(y^{*}) + \frac{2-\alpha}{\gamma}\, y^{*}. \tag{17}$$

According to Formula (17), the limiting point is a critical point of Problem (1) with the additional
regularization term $\frac{2-\alpha}{2\gamma}\|y\|^{2}$. If the function g in Algorithm 1 is replaced with
$\tilde{g} = g - \frac{2-\alpha}{2\gamma}\|\cdot\|^{2}$, the corresponding cluster points will also be critical
points of Problem (1). We set ϕ̃ = f + g̃.

For simplicity, we use the following notations:

1. For any k ∈ ℕ, define f_k(x) by
$$f_{k}(x) := f(x) + \frac{1}{2\gamma}\|x - u^{k}\|^{2}
= f_{s+}(x) + \frac{1}{2}\Big(\frac{1}{\gamma} - s\Big)\Big\|x - \frac{1}{1-\gamma s}u^{k}\Big\|^{2}
- \frac{s}{2(1-\gamma s)}\|u^{k}\|^{2}. \tag{18}$$
The function f_k(x) is strongly convex with coefficient 1/c, and
$$y^{k+1} = \arg\min_{x}\{f_{k}(x)\} = P_{c f_{s+}}\Big(\frac{1}{1-\gamma s}u^{k}\Big),$$
where c = γ/(1 − γs). By combining Proposition 2(2) with Propositions 1(1) and 1(4),
we obtain that, for all k ∈ ℕ, the following two inequalities hold:
$$\|y^{k+1} - y^{k}\|^{2} \le \frac{1}{1-\gamma s}\,\big\langle y^{k+1} - y^{k},\, u^{k} - u^{k-1}\big\rangle, \tag{19}$$
$$f_{k}(x) - f_{k}(y^{k+1}) \ge \frac{1}{2c}\|x - y^{k+1}\|^{2}, \quad \forall x; \tag{20}$$
2. For any k ∈ ℕ, define g_k : ℝⁿ → ℝ ∪ {+∞} by
$$g_{k}(x) := g(x) + \frac{1}{2\gamma}\|x - (\alpha y^{k+1} - u^{k})\|^{2}. \tag{21}$$
Then
$$z^{k+1} = \arg\min_{x}\{g_{k}(x)\};$$
3. Denote
$$v^{k} = (y^{k}; z^{k}; x^{k}), \qquad w^{k} = (y^{k}; z^{k}; x^{k}; x^{k-1}; x^{k-2}), \tag{22}$$
$$\Delta_{1}^{k} = y^{k} - y^{k-1}, \qquad \Delta_{2}^{k} = z^{k} - z^{k-1}, \qquad \Delta^{k} = x^{k} - x^{k-1}. \tag{23}$$

To analyze the convergence of the IPDR method, we formulate a new merit function.

Definition 5 (Merit function). Let γ > 0. For any y, z, x, x̃, x̂ ∈ ℝⁿ, the merit function is defined by
$$M(y, z, x, \tilde{x}, \hat{x}) := M_{0}(y, z, x) + \Big(\frac{\beta^{2}}{2\gamma} + \rho_{0}\Big)\|\tilde{x} - \hat{x}\|^{2}, \tag{24}$$
where $\rho_{0} = \frac{(2-\alpha)\beta^{2}}{2\gamma}$ and
$$M_{0}(y, z, x) = f(y) + g(z) - \frac{1}{2\gamma}\|y - z\|^{2}
+ \frac{1}{\gamma}\big\langle x - (\alpha-1)y,\, z - y\big\rangle + \frac{2-\alpha}{2\gamma}\|y\|^{2}. \tag{25}$$

This definition is motivated by the Douglas–Rachford merit function in [11] (Definition 2).
Moreover, it is not hard to see that the function M₀ can be alternatively written as
$$\begin{aligned}
M_{0}(y, z, x) &= f(y) + g(z) + \frac{1}{2\gamma}\|z - (\alpha y - x)\|^{2} - \frac{1}{2\gamma}\|x - (\alpha-1)y\|^{2}
- \frac{1}{\gamma}\|z - y\|^{2} + \frac{2-\alpha}{2\gamma}\|y\|^{2} \qquad (26)\\
&= f(y) + g(z) + \frac{1}{2\gamma}\big(\|x - y\|^{2} - \|x - z\|^{2}\big)
+ \frac{1}{\gamma}\big\langle (2-\alpha)y,\, z - y\big\rangle + \frac{2-\alpha}{2\gamma}\|y\|^{2}, \qquad (27)
\end{aligned}$$
where Equation (26) follows from the elementary relation ⟨a, b⟩ = ½(∥a + b∥² − ∥a∥² − ∥b∥²)
applied with a = x − (α − 1)y and b = z − y in (25), and Equation (27) follows from the
relation ⟨a, b⟩ = ½(∥a∥² + ∥b∥² − ∥a − b∥²) applied with a = x − y and b = z − y in (25).
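The equivalence of (25)–(27) is pure algebra; the following short check (our own, using random data
and arbitrary parameter values) verifies numerically that the three expressions for M₀ coincide once
the common terms f(y) + g(z) are dropped.

```python
import numpy as np

rng = np.random.default_rng(0)
y, z, x = rng.standard_normal(4), rng.standard_normal(4), rng.standard_normal(4)
gamma, alpha = 0.3, 1.6  # arbitrary positive step-size and parameter

m25 = (-np.dot(y - z, y - z) / (2 * gamma)
       + np.dot(x - (alpha - 1) * y, z - y) / gamma
       + (2 - alpha) * np.dot(y, y) / (2 * gamma))
m26 = (np.dot(z - (alpha * y - x), z - (alpha * y - x)) / (2 * gamma)
       - np.dot(x - (alpha - 1) * y, x - (alpha - 1) * y) / (2 * gamma)
       - np.dot(z - y, z - y) / gamma
       + (2 - alpha) * np.dot(y, y) / (2 * gamma))
m27 = ((np.dot(x - y, x - y) - np.dot(x - z, x - z)) / (2 * gamma)
       + (2 - alpha) * np.dot(y, z - y) / gamma
       + (2 - alpha) * np.dot(y, y) / (2 * gamma))
assert np.allclose(m25, m26) and np.allclose(m25, m27)
```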

3.2. Convergence Analysis


Before presenting the convergence results, we provide several significant relations.

$$\begin{aligned}
u^{k-1} - u^{k} &= x^{k-1} + \beta(x^{k-1} - x^{k-2}) - \big[x^{k} + \beta(x^{k} - x^{k-1})\big]\\
&= \beta(x^{k-1} - x^{k-2}) + (1+\beta)(x^{k-1} - x^{k}),
\end{aligned}\tag{28}$$
where the first equality follows from the definition of the u-update (14). Next, using the
definitions of the x-update (13) and the u-update (14), we obtain
$$\begin{aligned}
x^{k+1} - x^{k} &= (u^{k} + z^{k+1} - y^{k+1}) - \big[u^{k} - \beta(x^{k} - x^{k-1})\big]\\
&= z^{k+1} - y^{k+1} + \beta(x^{k} - x^{k-1}).
\end{aligned}\tag{29}$$
Using Equations (13) and (14), we can obtain the following two relations:
$$y^{k} - z^{k} = u^{k-1} - x^{k} = u^{k-1} - u^{k} + u^{k} - x^{k}
= u^{k-1} - u^{k} + \beta(x^{k} - x^{k-1}), \tag{30}$$
$$z^{k} - y^{k} = x^{k} - u^{k-1} = x^{k} - x^{k-1} + \beta(x^{k-2} - x^{k-1}). \tag{31}$$

Assumption 2. The following relationships between the parameters are satisfied:

1. α ≤ 3, ρ = −3 − γs + 2α − (4 − α)[2γ(s + L) + γ²L²] > (4 − α)q, where q > 0;
2. (4 − α)β + (2 − α)qβ² ≤ (6 − 2α)q.

The following theorem demonstrates the monotonic decrease of the sequence
{M(w^k)}_{k∈ℕ}, thereby contributing to establishing global subsequential convergence for
the IPDR splitting algorithm.

Theorem 1. (Decrease property) Suppose that Assumptions 1 and 2 hold. Let {w^k} be the sequence
generated by the IPDR algorithm. Then, for all k ∈ ℕ,
$$M(w^{k+1}) - M(w^{k}) \le -\rho_{1}\|\Delta_{1}^{k+1}\|^{2} - (\rho_{2} - \rho_{0})\|\Delta^{k}\|^{2}, \tag{32}$$
where $\rho_{0} = \frac{(2-\alpha)\beta^{2}}{2\gamma} > 0$, $\rho_{1} = \frac{\rho - (4-\alpha)q}{2\gamma} > 0$, and
$\rho_{2} = \frac{(3-\alpha)\beta}{\gamma} - \frac{4-\alpha}{2\gamma}\cdot\frac{\beta^{2}}{q} > \rho_{0}$.

Proof. First, from the definition of M₀ (see (25)) and the equality (29), we can obtain
$$\begin{aligned}
&M_{0}(y^{k+1}, z^{k+1}, x^{k+1}) - M_{0}(y^{k+1}, z^{k+1}, x^{k})\\
&\quad= \frac{1}{\gamma}\big\langle x^{k+1} - x^{k},\, z^{k+1} - y^{k+1}\big\rangle
= \frac{1}{\gamma}\big\langle z^{k+1} - y^{k+1} + \beta(x^{k} - x^{k-1}),\, z^{k+1} - y^{k+1}\big\rangle\\
&\quad= \frac{1}{\gamma}\|z^{k+1} - y^{k+1}\|^{2} + \frac{\beta}{\gamma}\big\langle \Delta^{k},\, z^{k+1} - y^{k+1}\big\rangle.
\end{aligned}\tag{33}$$

Second, employing (26) and the fact that z^{k+1} is a minimizer, we have
$$\begin{aligned}
&M_{0}(y^{k+1}, z^{k+1}, x^{k}) - M_{0}(y^{k+1}, z^{k}, x^{k})\\
&\quad= g(z^{k+1}) + \frac{1}{2\gamma}\|z^{k+1} - (\alpha y^{k+1} - x^{k})\|^{2} - \frac{1}{\gamma}\|y^{k+1} - z^{k+1}\|^{2}\\
&\qquad- \Big(g(z^{k}) + \frac{1}{2\gamma}\|z^{k} - (\alpha y^{k+1} - x^{k})\|^{2} - \frac{1}{\gamma}\|y^{k+1} - z^{k}\|^{2}\Big)\\
&\quad= g_{k}(z^{k+1}) - g_{k}(z^{k}) - \frac{\beta}{\gamma}\langle \Delta_{2}^{k+1}, \Delta^{k}\rangle
- \frac{1}{\gamma}\|y^{k+1} - z^{k+1}\|^{2} + \frac{1}{\gamma}\|y^{k+1} - z^{k}\|^{2}\\
&\quad\le -\frac{\beta}{\gamma}\langle \Delta_{2}^{k+1}, \Delta^{k}\rangle
- \frac{1}{\gamma}\|y^{k+1} - z^{k+1}\|^{2} + \frac{1}{\gamma}\|y^{k+1} - z^{k}\|^{2},
\end{aligned}\tag{34}$$
where the second equality follows from the identity (35) below. Indeed, from (14) and the definition
of g_k(x) (see (21)), we can easily obtain
$$\begin{aligned}
&g_{k}(z^{k+1}) - g_{k}(z^{k})
- \Big(g(z^{k+1}) - g(z^{k}) + \frac{1}{2\gamma}\|z^{k+1} - (\alpha y^{k+1} - x^{k})\|^{2}
- \frac{1}{2\gamma}\|z^{k} - (\alpha y^{k+1} - x^{k})\|^{2}\Big)\\
&\quad= \frac{1}{2\gamma}\|z^{k+1} - (\alpha y^{k+1} - u^{k})\|^{2} - \frac{1}{2\gamma}\|z^{k} - (\alpha y^{k+1} - u^{k})\|^{2}\\
&\qquad- \frac{1}{2\gamma}\|z^{k+1} - (\alpha y^{k+1} - x^{k})\|^{2} + \frac{1}{2\gamma}\|z^{k} - (\alpha y^{k+1} - x^{k})\|^{2}\\
&\quad= \frac{1}{2\gamma}\Big(-2\big\langle z^{k+1} - z^{k},\, \alpha y^{k+1} - u^{k}\big\rangle
+ 2\big\langle z^{k+1} - z^{k},\, \alpha y^{k+1} - x^{k}\big\rangle\Big)\\
&\quad= \frac{1}{\gamma}\big\langle z^{k+1} - z^{k},\, u^{k} - x^{k}\big\rangle
= \frac{\beta}{\gamma}\langle \Delta_{2}^{k+1}, \Delta^{k}\rangle.
\end{aligned}\tag{35}$$

Based on the definition of M₀ (see (27)) and Formula (20), we have
$$\begin{aligned}
&M_{0}(y^{k+1}, z^{k}, x^{k}) - M_{0}(y^{k}, z^{k}, x^{k})\\
&\quad= f(y^{k+1}) + \frac{1}{2\gamma}\|x^{k} - y^{k+1}\|^{2} - \Big(f(y^{k}) + \frac{1}{2\gamma}\|x^{k} - y^{k}\|^{2}\Big)\\
&\qquad+ \frac{1}{\gamma}\big\langle (2-\alpha)y^{k+1},\, z^{k} - y^{k+1}\big\rangle
- \frac{1}{\gamma}\big\langle (2-\alpha)y^{k},\, z^{k} - y^{k}\big\rangle
+ \frac{2-\alpha}{2\gamma}\big(\|y^{k+1}\|^{2} - \|y^{k}\|^{2}\big)\\
&\quad= f_{k}(y^{k+1}) - f_{k}(y^{k}) + \frac{\beta}{\gamma}\langle \Delta_{1}^{k+1}, \Delta^{k}\rangle
+ \frac{2-\alpha}{2\gamma}\|y^{k+1}\|^{2} - \frac{2-\alpha}{2\gamma}\|y^{k}\|^{2}\\
&\qquad+ \frac{2-\alpha}{\gamma}\Big[\big\langle y^{k+1},\, z^{k} - y^{k+1}\big\rangle - \big\langle y^{k},\, z^{k} - y^{k}\big\rangle\Big],
\end{aligned}\tag{36}$$
where the last equality follows from (18). By the inequality (20), we have
$$f_{k}(y^{k+1}) - f_{k}(y^{k}) \le -\frac{1}{2c}\|\Delta_{1}^{k+1}\|^{2}. \tag{37}$$
Moreover, for the last term of (36), we further simplify
$$\begin{aligned}
\big\langle y^{k+1},\, z^{k} - y^{k+1}\big\rangle - \big\langle y^{k},\, z^{k} - y^{k}\big\rangle
&= \big\langle y^{k+1} - y^{k},\, z^{k} - y^{k+1}\big\rangle + \big\langle y^{k},\, y^{k} - y^{k+1}\big\rangle\\
&\le \frac{1}{2}\|y^{k+1} - y^{k}\|^{2} + \frac{1}{2}\|z^{k} - y^{k+1}\|^{2} + \big\langle y^{k},\, y^{k} - y^{k+1}\big\rangle,
\end{aligned}\tag{38}$$
$$2\big\langle y^{k},\, y^{k} - y^{k+1}\big\rangle + \|y^{k+1}\|^{2} - \|y^{k}\|^{2} = \|y^{k} - y^{k+1}\|^{2} = \|\Delta_{1}^{k+1}\|^{2}. \tag{39}$$

Substituting Equations (37)–(39) into (36), we obtain
$$\begin{aligned}
&M_{0}(y^{k+1}, z^{k}, x^{k}) - M_{0}(y^{k}, z^{k}, x^{k})\\
&\quad\le -\frac{1}{2c}\|\Delta_{1}^{k+1}\|^{2} + \frac{\beta}{\gamma}\langle \Delta_{1}^{k+1}, \Delta^{k}\rangle
+ \frac{2-\alpha}{2\gamma}\|\Delta_{1}^{k+1}\|^{2} + \frac{2-\alpha}{2\gamma}\|z^{k} - y^{k+1}\|^{2}
+ \frac{2-\alpha}{2\gamma}\|\Delta_{1}^{k+1}\|^{2}\\
&\quad= \Big(-\frac{1}{2c} + \frac{2-\alpha}{\gamma}\Big)\|\Delta_{1}^{k+1}\|^{2}
+ \frac{\beta}{\gamma}\langle \Delta_{1}^{k+1}, \Delta^{k}\rangle + \frac{2-\alpha}{2\gamma}\|z^{k} - y^{k+1}\|^{2}.
\end{aligned}\tag{40}$$

Summing (33), (34), and (40), we obtain
$$\begin{aligned}
&M_{0}(v^{k+1}) - M_{0}(v^{k})\\
&\quad\le \Big(-\frac{1}{2c} + \frac{2-\alpha}{\gamma}\Big)\|\Delta_{1}^{k+1}\|^{2}
+ \frac{\beta}{\gamma}\big\langle \Delta^{k},\, \Delta_{1}^{k+1} - \Delta_{2}^{k+1} + z^{k+1} - y^{k+1}\big\rangle
+ \Big(\frac{1}{\gamma} + \frac{2-\alpha}{2\gamma}\Big)\|z^{k} - y^{k+1}\|^{2}\\
&\quad= \Big(-\frac{1}{2c} + \frac{2-\alpha}{\gamma}\Big)\|\Delta_{1}^{k+1}\|^{2}
+ \frac{\beta}{\gamma}\big\langle \Delta^{k},\, z^{k} - y^{k}\big\rangle
+ \Big(\frac{1}{\gamma} + \frac{2-\alpha}{2\gamma}\Big)\|z^{k} - y^{k+1}\|^{2}\\
&\quad= \Big(-\frac{1}{2c} + \frac{2-\alpha}{\gamma}\Big)\|\Delta_{1}^{k+1}\|^{2}
+ \frac{\beta}{\gamma}\|\Delta^{k}\|^{2} - \frac{\beta^{2}}{\gamma}\langle \Delta^{k}, \Delta^{k-1}\rangle
+ \Big(\frac{1}{\gamma} + \frac{2-\alpha}{2\gamma}\Big)\|z^{k} - y^{k+1}\|^{2},
\end{aligned}\tag{41}$$

where the second equality uses Formula (31). From Formula (19) and the relation (30), we obtain
$$\begin{aligned}
\|y^{k+1} - z^{k}\|^{2} &= \|y^{k+1} - y^{k}\|^{2} + \|y^{k} - z^{k}\|^{2} + 2\big\langle y^{k+1} - y^{k},\, y^{k} - z^{k}\big\rangle\\
&= \|\Delta_{1}^{k+1}\|^{2} + \|y^{k} - z^{k}\|^{2} - 2\big\langle \Delta_{1}^{k+1},\, u^{k} - u^{k-1}\big\rangle
+ 2\beta\langle \Delta_{1}^{k+1}, \Delta^{k}\rangle\\
&\le -(1 - 2\gamma s)\|\Delta_{1}^{k+1}\|^{2} + \|y^{k} - z^{k}\|^{2} + 2\beta\langle \Delta_{1}^{k+1}, \Delta^{k}\rangle.
\end{aligned}\tag{42}$$

Applying the optimality condition (15) of (11), it is implied that
$$\gamma \nabla f(y^{k+1}) = u^{k} - y^{k+1}.$$
Combining this with Assumption 1, we have that, for any k ∈ ℕ,
$$\|u^{k} - u^{k-1}\| = \|\gamma \nabla f(y^{k+1}) - \gamma \nabla f(y^{k}) + y^{k+1} - y^{k}\| \le (1 + \gamma L)\|\Delta_{1}^{k+1}\|. \tag{43}$$

Therefore, combining (43), (28), and (30), for all k ∈ ℕ, it holds that
$$\begin{aligned}
\|y^{k} - z^{k}\|^{2} &= \|u^{k-1} - u^{k} + \beta(x^{k} - x^{k-1})\|^{2}
= \|u^{k-1} - u^{k}\|^{2} + \|\beta\Delta^{k}\|^{2} + 2\beta\big\langle u^{k-1} - u^{k},\, \Delta^{k}\big\rangle\\
&\le (1 + \gamma L)^{2}\|\Delta_{1}^{k+1}\|^{2} + \beta^{2}\|\Delta^{k}\|^{2} + 2\beta\big\langle u^{k-1} - u^{k},\, \Delta^{k}\big\rangle\\
&= (1 + \gamma L)^{2}\|\Delta_{1}^{k+1}\|^{2} - (2\beta + \beta^{2})\|\Delta^{k}\|^{2} + 2\beta^{2}\langle \Delta^{k-1}, \Delta^{k}\rangle.
\end{aligned}\tag{44}$$

Together with Formulas (41), (42), and (44), we obtain
$$\begin{aligned}
&M_{0}(v^{k+1}) - M_{0}(v^{k})\\
&\quad\le \Big(-\frac{1}{2c} + \frac{2-\alpha}{\gamma}\Big)\|\Delta_{1}^{k+1}\|^{2}
+ \frac{\beta}{\gamma}\|\Delta^{k}\|^{2} - \frac{\beta^{2}}{\gamma}\langle \Delta^{k}, \Delta^{k-1}\rangle\\
&\qquad+ \Big(\frac{1}{\gamma} + \frac{2-\alpha}{2\gamma}\Big)\Big[(2\gamma s - 1)\|\Delta_{1}^{k+1}\|^{2}
+ 2\beta\langle \Delta_{1}^{k+1}, \Delta^{k}\rangle + (1 + \gamma L)^{2}\|\Delta_{1}^{k+1}\|^{2}
- (2\beta + \beta^{2})\|\Delta^{k}\|^{2} + 2\beta^{2}\langle \Delta^{k-1}, \Delta^{k}\rangle\Big]\\
&\quad= -\frac{\rho}{2\gamma}\|\Delta_{1}^{k+1}\|^{2}
- \Big[\frac{4-\alpha}{2\gamma}(2\beta + \beta^{2}) - \frac{\beta}{\gamma}\Big]\|\Delta^{k}\|^{2}
+ \frac{(3-\alpha)\beta^{2}}{\gamma}\langle \Delta^{k}, \Delta^{k-1}\rangle
+ \frac{(4-\alpha)\beta}{\gamma}\langle \Delta_{1}^{k+1}, \Delta^{k}\rangle\\
&\quad\le \frac{(4-\alpha)q - \rho}{2\gamma}\|\Delta_{1}^{k+1}\|^{2}
- \Big[\frac{4-\alpha}{2\gamma}(2\beta + \beta^{2}) - \frac{\beta}{\gamma} - \frac{(3-\alpha)\beta^{2}}{2\gamma}
- \frac{(4-\alpha)\beta^{2}}{2\gamma q}\Big]\|\Delta^{k}\|^{2}
+ \frac{(3-\alpha)\beta^{2}}{2\gamma}\|\Delta^{k-1}\|^{2}\\
&\quad= \frac{(4-\alpha)q - \rho}{2\gamma}\|\Delta_{1}^{k+1}\|^{2}
- \Big[\frac{(3-\alpha)\beta}{\gamma} + \frac{\beta^{2}}{2\gamma} - \frac{(4-\alpha)\beta^{2}}{2\gamma q}\Big]\|\Delta^{k}\|^{2}
+ \frac{(3-\alpha)\beta^{2}}{2\gamma}\|\Delta^{k-1}\|^{2},
\end{aligned}\tag{45}$$
where the second inequality uses the relation $2\langle a, b\rangle \le \frac{\|a\|^{2}}{q} + q\|b\|^{2}$ with q > 0.

Finally, using the definition of M (see (24)) and (45), we obtain
$$\begin{aligned}
&M(w^{k+1}) - M(w^{k})\\
&\quad= M_{0}(v^{k+1}) + \Big(\frac{\beta^{2}}{2\gamma} + \rho_{0}\Big)\|\Delta^{k}\|^{2}
- M_{0}(v^{k}) - \Big(\frac{\beta^{2}}{2\gamma} + \rho_{0}\Big)\|\Delta^{k-1}\|^{2}\\
&\quad\le -\frac{\rho - (4-\alpha)q}{2\gamma}\|\Delta_{1}^{k+1}\|^{2}
- \Big(\frac{(3-\alpha)\beta}{\gamma} - \frac{4-\alpha}{2\gamma}\cdot\frac{\beta^{2}}{q}\Big)\|\Delta^{k}\|^{2}
+ \frac{(2-\alpha)\beta^{2}}{2\gamma}\|\Delta^{k-1}\|^{2}
+ \rho_{0}\|\Delta^{k}\|^{2} - \rho_{0}\|\Delta^{k-1}\|^{2}\\
&\quad= -\rho_{1}\|\Delta_{1}^{k+1}\|^{2} - (\rho_{2} - \rho_{0})\|\Delta^{k}\|^{2},
\end{aligned}\tag{46}$$
where $\rho_{2} = \frac{(3-\alpha)\beta}{\gamma} - \frac{4-\alpha}{2\gamma}\cdot\frac{\beta^{2}}{q}$. This completes the proof.

Remark 4. 1. If α = 2, then the IPDR method becomes the IDRS method, and Assumption 2
reduces to the following relation:
$$-3 - \gamma s + 4 - 4\gamma(s + L) - 2\gamma^{2}L^{2} > 2q, \qquad \beta < q. \tag{47}$$
Compared to (5), it is clear that in this case the parameter requirement has one less condition,
namely 1/γ > s.
2. If the inertial parameter β = 0, then the IPDR method degenerates to the PDR method, and
Assumption 2 also degenerates to inequality (4).
3. When α = 2 and β = 0, it is evident that the IPDR method degenerates to the DR method.
Additionally, the inequality relation on the parameters is the same as in (3).

Theorem 2. Suppose that Assumptions 1 and 2 hold, and let the sequence {v^k} be generated by the IPDR
method. If there exists a bounded subsequence {v^{k_j}} ⊆ {v^k}, then we have
1. $\sum_{k=0}^{\infty}\|\Delta_{1}^{k}\|^{2} < +\infty$ and $\sum_{k=0}^{\infty}\|\Delta^{k}\|^{2} < +\infty$;
2. $\lim_{k\to\infty}\|\Delta_{1}^{k}\| = \lim_{k\to\infty}\|\Delta_{2}^{k}\| = \lim_{k\to\infty}\|\Delta^{k}\| = \lim_{k\to\infty}\|y^{k} - z^{k}\| = 0$.

Proof. 1. The sequence {M(w^k)}_{k∈ℕ} is decreasing by Theorem 1, and {M(w^{k_j})}_{j∈ℕ} is
bounded below by Assumption 1. Then {M(w^k)}_{k∈ℕ} is convergent. Summing (32) from
k = 0 to N ≥ 1, we obtain
$$M(w^{N+1}) - M(w^{0}) \le -\rho_{1}\sum_{k=1}^{N+1}\|\Delta_{1}^{k}\|^{2}
- (\rho_{2} - \rho_{0})\sum_{k=0}^{N}\|\Delta^{k}\|^{2}.$$
Letting N → ∞, we have
$$\rho_{1}\sum_{k=1}^{\infty}\|\Delta_{1}^{k}\|^{2} + (\rho_{2} - \rho_{0})\sum_{k=0}^{\infty}\|\Delta^{k}\|^{2}
\le M(w^{0}) - \lim_{N\to\infty} M(w^{N}) < +\infty.$$
Since ρ₁ > 0 and ρ₂ > ρ₀, we conclude that
$$\sum_{k=0}^{\infty}\|\Delta_{1}^{k}\|^{2} < +\infty, \qquad \sum_{k=0}^{\infty}\|\Delta^{k}\|^{2} < +\infty. \tag{48}$$
2. From (48), we see further that
$$\lim_{k\to\infty}\|\Delta_{1}^{k}\| = \lim_{k\to\infty}\|y^{k} - y^{k-1}\| = 0, \qquad
\lim_{k\to\infty}\|\Delta^{k}\| = \lim_{k\to\infty}\|x^{k} - x^{k-1}\| = 0.$$

Next, using (28) and (14), we have
$$\lim_{k\to\infty}\|u^{k+1} - u^{k}\| = 0, \qquad \lim_{k\to\infty}\|z^{k} - y^{k}\| = 0.$$
Finally, it follows from relation (13) that
$$\lim_{k\to\infty}\|z^{k+1} - z^{k}\| = 0.$$
This completes the proof.

The following theorem demonstrates that all cluster points of {z^k} (equivalently, of {y^k}) are
stationary points of the objective function ϕ̃ = f + g̃.

Theorem 3. Assume {v^{k_j}}_{j∈ℕ} is a convergent subsequence of {v^k}_{k∈ℕ}, and denote
v^{k_j} → v∗ = (y∗; z∗; x∗). Then we have
1. ϕ̃(z^{k_j}) → ϕ̃(z∗);
2. z∗ ∈ crit ϕ̃.

Proof. By Theorem 2, we have y^k − z^k → 0; thus y∗ = z∗.
1. Given that g is lower semicontinuous and f is continuous, it follows that
$$g(z^{*}) \le \liminf_{j\to\infty} g(z^{k_j}), \qquad f(z^{*}) = \lim_{j\to\infty} f(z^{k_j}). \tag{49}$$
From the definition of z^{k+1} as a minimizer of (12), we have
$$g(z^{k+1}) + \frac{1}{2\gamma}\|z^{k+1} - (\alpha y^{k+1} - u^{k})\|^{2}
\le g(z^{*}) + \frac{1}{2\gamma}\|z^{*} - (\alpha y^{k+1} - u^{k})\|^{2}.$$
Substituting k with k_j − 1 and taking the upper limit on both sides, we obtain
$$\limsup_{j\to\infty} g(z^{k_j}) \le g(z^{*}).$$
Furthermore, together with (49), we easily obtain
$$\lim_{j\to\infty}\tilde{\phi}(z^{k_j})
= \lim_{j\to\infty} f(z^{k_j}) + \lim_{j\to\infty} g(z^{k_j}) - \lim_{j\to\infty}\frac{2-\alpha}{2\gamma}\|z^{k_j}\|^{2}
= f(z^{*}) + g(z^{*}) - \frac{2-\alpha}{2\gamma}\|z^{*}\|^{2} = \tilde{\phi}(z^{*}).$$
2. Combining the optimality conditions (15) and (16), we have
$$0 \in \nabla f(y^{k+1}) + \partial g(z^{k+1}) + \frac{1}{\gamma}\big(z^{k+1} - (\alpha - 1)y^{k+1}\big), \quad \forall k \in \mathbb{N}.$$
For the function $\tilde{\phi} = \phi - \frac{2-\alpha}{2\gamma}\|\cdot\|^{2}$, applying the optimality condition to the
subproblem, we obtain further that
$$0 \in \nabla f(y^{k+1}) + \partial\tilde{g}(z^{k+1}) + \frac{1}{\gamma}\big(z^{k+1} - y^{k+1}\big), \quad \forall k \in \mathbb{N}. \tag{50}$$
Thus, for all k ∈ ℕ,
$$q^{k+1} \in \nabla f(z^{k+1}) + \partial\tilde{g}(z^{k+1}) = \partial\Big(\phi - \frac{2-\alpha}{2\gamma}\|\cdot\|^{2}\Big)(z^{k+1}),$$
where
$$q^{k+1} = \nabla f(z^{k+1}) - \nabla f(y^{k+1}) + \frac{1}{\gamma}\big(y^{k+1} - z^{k+1}\big).$$
Thus, q^{k_j} → 0 as j → ∞ by (10) and Theorem 2. Therefore, we obtain the conclusion based on
Proposition 1(1).

To establish the global convergence result, we provide an upper bound on a selected
subgradient sequence {t^{k+1}}, with t^{k+1} ∈ ∂M(w^{k+1}), in terms of quantities associated with
the norms ∥Δ^{k+1}∥ and ∥Δ^k∥.

Lemma 5. For k ∈ ℕ, define
$$t^{k+1} = \frac{1}{\gamma}\begin{pmatrix}
(\alpha - 1)(-\Delta^{k+1} + \beta\Delta^{k})\\
-\Delta^{k+1} + \beta\Delta^{k}\\
\Delta^{k+1} - \beta\Delta^{k}\\
(\beta^{2} + 2\rho_{0}\gamma)\Delta^{k}\\
-(\beta^{2} + 2\rho_{0}\gamma)\Delta^{k}
\end{pmatrix},$$
then
$$t^{k+1} \in \partial M(w^{k+1}), \quad \forall k \in \mathbb{N}.$$
Moreover, there exists a positive constant ρ₃ such that
$$\|t^{k+1}\| \le \rho_{3}\big(\|\Delta^{k+1}\| + \|\Delta^{k}\|\big), \quad \forall k \in \mathbb{N}.$$

Proof. From the definition of M (see (24)), we have
$$\begin{aligned}
M(w^{k+1}) &= f(y^{k+1}) + g(z^{k+1}) - \frac{1}{2\gamma}\|y^{k+1} - z^{k+1}\|^{2}
+ \frac{1}{\gamma}\big\langle x^{k+1} - (\alpha - 1)y^{k+1},\, z^{k+1} - y^{k+1}\big\rangle\\
&\quad+ \frac{2-\alpha}{2\gamma}\|y^{k+1}\|^{2} + \frac{\beta^{2}}{2\gamma}\|x^{k} - x^{k-1}\|^{2} + \rho_{0}\|x^{k} - x^{k-1}\|^{2}.
\end{aligned}\tag{51}$$

We first consider the subdifferential of M at w^{k+1} = (y^{k+1}; z^{k+1}; x^{k+1}; x^{k}; x^{k-1}). Notice
that for any k ≥ 0, we have
$$\begin{aligned}
\partial_{y} M(w^{k+1}) &= \nabla f(y^{k+1}) - \frac{1}{\gamma}\big(y^{k+1} - z^{k+1}\big)
+ \frac{1}{\gamma}\Big[-x^{k+1} - (\alpha - 1)z^{k+1} + 2(\alpha - 1)y^{k+1}\Big] + \frac{2-\alpha}{\gamma}y^{k+1}\\
&= \nabla f(y^{k+1}) + \Big(-\frac{1}{\gamma} + \frac{2(\alpha-1)}{\gamma} + \frac{2-\alpha}{\gamma}\Big)y^{k+1}
+ \Big(\frac{1}{\gamma} - \frac{\alpha-1}{\gamma}\Big)z^{k+1} - \frac{1}{\gamma}x^{k+1}\\
&= \nabla f(y^{k+1}) + \frac{\alpha-1}{\gamma}\big(y^{k+1} - z^{k+1}\big) + \frac{1}{\gamma}\big(z^{k+1} - x^{k+1}\big)\\
&= \frac{\alpha-1}{\gamma}\big(u^{k} - x^{k+1}\big),
\end{aligned}\tag{52}$$
where the last equality uses Formula (15) and the definition of x^{k+1} in (13). From Formulas (13)
and (16), we obtain
$$\begin{aligned}
\partial_{z} M(w^{k+1}) &= \partial g(z^{k+1}) + \frac{2-\alpha}{\gamma}y^{k+1} + \frac{1}{\gamma}\big(x^{k+1} - z^{k+1}\big)\\
&\ni \frac{1}{\gamma}\big(\alpha y^{k+1} - u^{k} - z^{k+1}\big) + \frac{2-\alpha}{\gamma}y^{k+1} + \frac{1}{\gamma}\big(x^{k+1} - z^{k+1}\big)\\
&= \frac{2}{\gamma}\big(y^{k+1} - z^{k+1}\big) + \frac{1}{\gamma}\big(x^{k+1} - u^{k}\big)
= \frac{1}{\gamma}\big(u^{k} - x^{k+1}\big).
\end{aligned}\tag{53}$$
Moreover,
$$\partial_{x} M(w^{k+1}) = \frac{1}{\gamma}\big(z^{k+1} - y^{k+1}\big), \qquad
\partial_{\tilde{x}} M(w^{k+1}) = \Big(\frac{\beta^{2}}{\gamma} + 2\rho_{0}\Big)\big(x^{k} - x^{k-1}\big), \qquad
\partial_{\hat{x}} M(w^{k+1}) = -\Big(\frac{\beta^{2}}{\gamma} + 2\rho_{0}\Big)\big(x^{k} - x^{k-1}\big).$$
Therefore, we obtain
$$\frac{1}{\gamma}\Big[(\alpha - 1)(u^{k} - x^{k+1});\ u^{k} - x^{k+1};\ z^{k+1} - y^{k+1};\
(\beta^{2} + 2\rho_{0}\gamma)(x^{k} - x^{k-1});\ -(\beta^{2} + 2\rho_{0}\gamma)(x^{k} - x^{k-1})\Big]
\in \partial M(w^{k+1}). \tag{54}$$
By (31), we have
$$u^{k} - x^{k+1} = -\big(x^{k+1} - u^{k}\big) = -\big[x^{k+1} - x^{k} + \beta(x^{k-1} - x^{k})\big]
= -\big(x^{k+1} - x^{k}\big) - \beta\big(x^{k-1} - x^{k}\big) = -\Delta^{k+1} + \beta\Delta^{k}, \tag{55}$$
$$z^{k+1} - y^{k+1} = x^{k+1} - x^{k} + \beta(x^{k-1} - x^{k}) = \Delta^{k+1} - \beta(x^{k} - x^{k-1})
= \Delta^{k+1} - \beta\Delta^{k}. \tag{56}$$
Hence, (54) can be rewritten as
$$t^{k+1} \in \partial M(w^{k+1}), \quad \forall k \in \mathbb{N}.$$

Based on the properties of the norm (∥−x∥ = ∥x∥, ∥ax∥ = |a|∥x∥, and the triangle inequality),
we have
$$\begin{aligned}
\|t^{k+1}\| &\le \|t_{y}^{k+1}\| + \|t_{z}^{k+1}\| + \|t_{x}^{k+1}\| + \|t_{\tilde{x}}^{k+1}\| + \|t_{\hat{x}}^{k+1}\|\\
&= \frac{1}{\gamma}\Big[\|(\alpha - 1)(-\Delta^{k+1} + \beta\Delta^{k})\| + \|-\Delta^{k+1} + \beta\Delta^{k}\|
+ \|\Delta^{k+1} - \beta\Delta^{k}\| + \|(\beta^{2} + 2\rho_{0}\gamma)\Delta^{k}\|
+ \|-(\beta^{2} + 2\rho_{0}\gamma)\Delta^{k}\|\Big]\\
&= \frac{1}{\gamma}\Big[(\alpha - 1)\|\Delta^{k+1} - \beta\Delta^{k}\| + 2\|\Delta^{k+1} - \beta\Delta^{k}\|
+ 2(\beta^{2} + 2\rho_{0}\gamma)\|\Delta^{k}\|\Big]\\
&\le \frac{1}{\gamma}\Big[(\alpha + 1)\big(\|\Delta^{k+1}\| + \beta\|\Delta^{k}\|\big)
+ 2(\beta^{2} + 2\rho_{0}\gamma)\|\Delta^{k}\|\Big]\\
&= \frac{1}{\gamma}\Big[(\alpha + 1)\|\Delta^{k+1}\| + \big((\alpha + 1)\beta + (6 - 2\alpha)\beta^{2}\big)\|\Delta^{k}\|\Big]\\
&\le \rho_{3}\big(\|\Delta^{k+1}\| + \|\Delta^{k}\|\big),
\end{aligned}\tag{57}$$
where ρ₃ = max{(α + 1)/γ, [(α + 1)β + (6 − 2α)β²]/γ}. The main convergence result is
as follows.

Theorem 4 (Strong convergence). Suppose that ϕ is a KL function, {v^k}_{k∈ℕ} is a bounded sequence,
and Assumptions 1 and 2 hold. Then
1. $\sum_{k=0}^{\infty}\|v^{k+1} - v^{k}\| < +\infty$;
2. The sequence {z^k}_{k∈ℕ} converges to a stationary point of ϕ̃.

Proof. Let Ω := C({w^k}_{k∈ℕ}). Since the sequence {v^k}_{k∈ℕ} is bounded and
w^k = (y^k; z^k; x^k; x^{k−1}; x^{k−2}), the sequence {w^k}_{k∈ℕ} is bounded as well. Moreover, by
Lemma 2, we deduce that Ω is a nonempty compact set and dist(w^k, Ω) → 0.
1. From Theorem 2, the set Ω has the form
$$\Omega = \{(y; z; x; \tilde{x}; \hat{x}) : (y; z; x) \in C(\{v^{k}\}_{k\in\mathbb{N}})\}.$$
Let w∗ = (y∗; z∗; x∗; x̃∗; x̂∗) ∈ Ω, and suppose that v^{k_j} → v∗. Then, based on
Theorems 2 and 3, we have
$$\lim_{k\to\infty} M(w^{k}) = \lim_{j\to\infty} M(w^{k_j}) = M(w^{*}) = \tilde{\phi}(x^{*}) + \frac{1}{2\gamma}\|y^{*}\|^{2}.$$
Therefore, M is finite and constant on Ω.
Since ϕ is a KL function, so is M; thus, M satisfies the KL property. By Lemma 4, there
exist ε > 0, η > 0, and ψ : [0, η) → ℝ₊ such that, for all w ∈ {w : dist(w, Ω) < ε,
M(w∗) < M(w) < M(w∗) + η},
$$\psi'\big(M(w) - M(w^{*})\big)\,\mathrm{dist}\big(0, \partial M(w)\big) \ge 1. \tag{58}$$
Recall from Theorem 1 that the sequence {M(w^k)}_{k∈ℕ} is nonincreasing, so if for some
k₀ ∈ ℕ we have M(w^{k₀}) = M(w∗), then M(w^k) = M(w∗) for all k ≥ k₀.
Thus, by Theorem 1, {y^k} and {x^k} must be eventually constant, and so is {z^k} by (13).
Hence, {v^k} is of finite length, and the conclusions follow in this case.

On the other hand, if no such k₀ exists, then we have
$$M(w^{k}) > M(w^{*}), \quad \forall k \in \mathbb{N}.$$
Suppose that k₁ ∈ ℕ is sufficiently large; then one obtains
$$\mathrm{dist}(w^{k}, \Omega) < \varepsilon, \qquad M(w^{*}) < M(w^{k}) < M(w^{*}) + \eta, \quad \forall k > k_{1}.$$
In combination with (58), we obtain
$$\psi'\big(M(w^{k+1}) - M(w^{*})\big)\,\mathrm{dist}\big(0, \partial M(w^{k+1})\big) \ge 1, \quad \forall k \ge k_{1}. \tag{59}$$
For ease of understanding, define
$$M_{k} = M(w^{k}) - M(w^{*}), \quad \forall k \in \mathbb{N}.$$
Since ψ is concave, we conclude that
$$\psi(M_{k+1}) - \psi(M_{k+2}) \ge \psi'(M_{k+1})(M_{k+1} - M_{k+2}), \quad \forall k \ge k_{1}.$$
By multiplying both sides of (59) by the non-negative term M_{k+1} − M_{k+2}, we obtain
$$\big(\psi(M_{k+1}) - \psi(M_{k+2})\big)\,\mathrm{dist}\big(0, \partial M(w^{k+1})\big)
\ge M_{k+1} - M_{k+2}, \quad \forall k \ge k_{1}. \tag{60}$$

Next, using Theorem 1 and Lemma 5, for all k ∈ ℕ we have
$$M_{k+1} - M_{k+2} = M(w^{k+1}) - M(w^{k+2}) \ge \rho_{1}\|\Delta_{1}^{k+2}\|^{2} + (\rho_{2} - \rho_{0})\|\Delta^{k+1}\|^{2}, \tag{61}$$
$$\rho_{3}\big(\|\Delta^{k+1}\| + \|\Delta^{k}\|\big) \ge \|t^{k+1}\| \ge \mathrm{dist}\big(0, \partial M(w^{k+1})\big). \tag{62}$$
Since ψ'(t) > 0 for all t ∈ (0, η), we have ψ(M_{k+1}) − ψ(M_{k+2}) > 0. Using the
above two inequalities along with (60), we can conclude that
$$(\rho_{2} - \rho_{0})\|\Delta^{k+1}\|^{2} \le \rho_{3}\big(\|\Delta^{k+1}\| + \|\Delta^{k}\|\big)\big(\psi(M_{k+1}) - \psi(M_{k+2})\big). \tag{63}$$
From this, we see further that
$$\|\Delta^{k+1}\|^{2} \le \frac{\rho_{3}}{\rho_{2} - \rho_{0}}\big(\|\Delta^{k+1}\| + \|\Delta^{k}\|\big)\big(\psi(M_{k+1}) - \psi(M_{k+2})\big), \tag{64}$$
$$\|\Delta^{k+1}\| \le \frac{\rho_{3}}{\rho_{2} - \rho_{0}}\big(\psi(M_{k+1}) - \psi(M_{k+2})\big) + \frac{\|\Delta^{k+1}\| + \|\Delta^{k}\|}{4}. \tag{65}$$
That is,
$$\|\Delta^{k+1}\| \le \frac{1}{3}\|\Delta^{k}\| + \frac{4\rho_{3}}{3(\rho_{2} - \rho_{0})}\big(\psi(M_{k+1}) - \psi(M_{k+2})\big).$$
In addition, we have ψ(M_k) > 0 for k ≥ k₁ + 1, and
$$\sum_{k=k_{1}+1}^{N}\big(\psi(M_{k}) - \psi(M_{k+1})\big) = \psi(M_{k_{1}+1}) - \psi(M_{N+1}) \le \psi(M_{k_{1}+1}), \tag{66}$$
that is,
$$\sum_{k=k_{1}+1}^{\infty}\big(\psi(M_{k}) - \psi(M_{k+1})\big) < +\infty. \tag{67}$$
Thus, by Lemma 1, we obtain
$$\sum_{k=0}^{\infty}\|x^{k+1} - x^{k}\| < +\infty. \tag{68}$$

Similar to the above proof process, replacing the superscript k + 1 by k in Equations (59)–(62),
we also obtain
$$\big(\psi(M_{k}) - \psi(M_{k+1})\big)\,\mathrm{dist}\big(0, \partial M(w^{k})\big) \ge M_{k} - M_{k+1}, \quad \forall k \ge k_{1}; \tag{69}$$
$$M_{k} - M_{k+1} = M(w^{k}) - M(w^{k+1}) \ge \rho_{1}\|\Delta_{1}^{k+1}\|^{2} + (\rho_{2} - \rho_{0})\|\Delta^{k}\|^{2}; \tag{70}$$
$$\rho_{3}\big(\|\Delta^{k}\| + \|\Delta^{k-1}\|\big) \ge \|t^{k}\| \ge \mathrm{dist}\big(0, \partial M(w^{k})\big). \tag{71}$$
Using (69), (70), and (71), we have
$$\rho_{1}\|\Delta_{1}^{k+1}\|^{2} \le \rho_{3}\big(\|\Delta^{k}\| + \|\Delta^{k-1}\|\big)\big(\psi(M_{k}) - \psi(M_{k+1})\big). \tag{72}$$
Then
$$\|\Delta_{1}^{k+1}\| \le \frac{\rho_{3}}{\rho_{1}}\big(\psi(M_{k}) - \psi(M_{k+1})\big) + \frac{\|\Delta^{k}\| + \|\Delta^{k-1}\|}{4}.$$
Therefore, by (67) and (68), we obtain
$$\sum_{k=0}^{\infty}\|y^{k+1} - y^{k}\| < +\infty. \tag{73}$$
Finally, observe from (13) that
$$\sum_{k=0}^{\infty}\|z^{k+1} - z^{k}\| < +\infty. \tag{74}$$
2. From conclusion 1, we know that {v^k}_{k∈ℕ} is a Cauchy sequence, and thus {v^k}_{k∈ℕ}
is convergent. Consequently, by applying conclusion 2 of Theorem 3, we obtain that {z^k}
converges to a stationary point of ϕ̃.

4. Numerical Results
In this section, we evaluate the effectiveness and feasibility of the IPDR splitting
method through the nonconvex feasibility problem coming from [16,30]. All experiments
are implemented in MATLAB R2021a on a laptop computer with a 3.20 GHz AMD Ryzen 7
6800H processor and 16 GB memory.
We are solving a nonconvex feasibility problem, which searches for a point in the
intersection of two nonempty sets. It can be structured as follows:
$$\min_{x}\ \frac{1}{2}d_{C}^{2}(x) \qquad \text{s.t. } x \in D, \tag{75}$$

where C is a nonempty closed convex set, D is a nonempty compact set, and
$d_{C}^{2}(x) = \inf_{y\in C}\|y - x\|^{2}$ is the squared distance function. Specifically, we set
C = {x ∈ ℝⁿ : Ax = b} and D = {x ∈ ℝⁿ : ∥x∥₀ ≤ r, ∥x∥∞ ≤ 10⁶}.
As is well known, this problem can be modeled by the following:

$$\min_{x}\ f(x) + g(x). \tag{76}$$

As discussed in Section 3, when the IPDR splitting method is applied directly to Problem (76)
with f(x) = ½ inf_{y∈C} ∥y − x∥² and g(x) = I_D(x), the resulting sequence does not converge to a
critical point of (76). However, as an alternative, we can take f(x) = ½ inf_{y∈C} ∥y − x∥² and
g(x) = I_D(x) − ((2 − α)/(2γ))∥x∥² and apply the IPDR splitting method accordingly; then the
generated sequence converges to a critical point of (76). Thus, we obtain the following algorithm:
$$(\mathrm{IPDR})\quad
\begin{cases}
y^{t+1} = \dfrac{1}{1+\gamma}\big(u^{t} + \gamma P_{C}(u^{t})\big),\\[6pt]
z^{t+1} \in P_{D}\Big(\dfrac{\alpha y^{t+1} - u^{t}}{\alpha - 1}\Big),\\[6pt]
x^{t+1} = u^{t} + (z^{t+1} - y^{t+1}),\\[2pt]
u^{t+1} = x^{t+1} + \beta(x^{t+1} - x^{t}),
\end{cases}\tag{77}$$
where P_C is the projector onto the set C, P_D is the projector onto the set D, α is the
parameterized coefficient, β is the inertial parameter, and γ > 0 is the step-size.
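A minimal Python sketch of scheme (77) for this feasibility problem is given below. The projections
onto C (an affine set) and onto D (sparsity plus a large box) are written explicitly; the stopping
test, iteration limits, and any parameter values passed in are illustrative choices of ours and not
the tuned settings reported later in this section.

```python
import numpy as np

def proj_affine(x, A, b, A_pinv):
    # Projection onto C = {x : A x = b}: x - A^+ (A x - b), with A of full row rank.
    return x - A_pinv @ (A @ x - b)

def proj_sparse_box(x, r, bound=1e6):
    # One projection onto D = {x : ||x||_0 <= r, ||x||_inf <= bound}: keep the r
    # largest-magnitude entries (clipped to the box) and zero out the rest; this is
    # the exact projection whenever the kept entries do not exceed the bound.
    z = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-r:]
    z[idx] = np.clip(x[idx], -bound, bound)
    return z

def ipdr_feasibility(A, b, r, gamma, alpha, beta, max_iter=5000, tol=1e-8):
    # Scheme (77) with a simple relative-change stopping test.
    n = A.shape[1]
    A_pinv = np.linalg.pinv(A)
    x = np.zeros(n)
    u = np.zeros(n)
    y = np.zeros(n)
    z = np.zeros(n)
    for _ in range(max_iter):
        y_new = (u + gamma * proj_affine(u, A, b, A_pinv)) / (1.0 + gamma)
        z_new = proj_sparse_box((alpha * y_new - u) / (alpha - 1.0), r)
        x_new = u + (z_new - y_new)
        u = x_new + beta * (x_new - x)
        num = max(np.linalg.norm(x_new - x), np.linalg.norm(y_new - y),
                  np.linalg.norm(z_new - z))
        den = max(np.linalg.norm(x), np.linalg.norm(y), np.linalg.norm(z), 1.0)
        x, y, z = x_new, y_new, z_new
        if num / den < tol:
            break
    return z
```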
We compare our method with that of several other algorithms, including DR [11], PR
(the Peaceman–Rachford splitting method [30]), PDR [16], Alt (the alternating projection
method [11]), and IDRS [23].
In Li and Pong [11], the DR splitting method for Problem (76) is as follows:
$$(\mathrm{DR})\quad
\begin{cases}
y^{t+1} = \dfrac{1}{1+\gamma}\big(x^{t} + \gamma P_{C}(x^{t})\big),\\[6pt]
z^{t+1} \in P_{D}\big(2y^{t+1} - x^{t}\big),\\[2pt]
x^{t+1} = x^{t} + (z^{t+1} - y^{t+1}).
\end{cases}\tag{78}$$

To ensure convergence, in Li et al. [30] the authors set f(x) = ½ inf_{y∈C} ∥y − x∥² + (β/2)∥x∥²
and g(x) = I_D(x) − (β/2)∥x∥² to solve Problem (76) by the PR method. Thus, their algorithm can be
given as follows:
$$(\mathrm{PR})\quad
\begin{cases}
y^{t+1} = \dfrac{\gamma P_{C}\big(\frac{x^{t}}{1+\beta\gamma}\big) + x^{t}}{(1+\beta)\gamma + 1},\\[8pt]
z^{t+1} \in P_{D}\Big(\dfrac{2y^{t+1} - x^{t}}{1 - \beta\gamma}\Big),\\[8pt]
x^{t+1} = x^{t} + 2\big(z^{t+1} - y^{t+1}\big).
\end{cases}\tag{79}$$

Similar to the PR method, in Bian and Zhang [16] they take f(x) = ½ inf_{y∈C} ∥y − x∥² and
g(x) = I_D(x) − ((2 − α)/(2γ))∥x∥² in Problem (76). The PDR splitting method is formulated as:
$$(\mathrm{PDR})\quad
\begin{cases}
y^{t+1} = \dfrac{1}{1+\gamma}\big(x^{t} + \gamma P_{C}(x^{t})\big),\\[6pt]
z^{t+1} \in P_{D}\Big(\dfrac{\alpha y^{t+1} - x^{t}}{\alpha - 1}\Big),\\[6pt]
x^{t+1} = x^{t} + (z^{t+1} - y^{t+1}).
\end{cases}\tag{80}$$

In Feng et al. [23], the IDRS method is studied for the signal recovery problem. We state the IDRS
method for the feasibility problem, making the same substitution for the functions f, g as in the PDR
method. The IDRS method is given below:
$$(\mathrm{IDRS})\quad
\begin{cases}
y^{t+1} = \dfrac{1}{1+\gamma}\big(u^{t} + \gamma P_{C}(u^{t})\big),\\[6pt]
z^{t+1} \in P_{D}\big(2y^{t+1} - u^{t}\big),\\[2pt]
x^{t+1} = u^{t} + (z^{t+1} - y^{t+1}),\\[2pt]
u^{t+1} = x^{t+1} + \beta(x^{t+1} - x^{t}).
\end{cases}\tag{81}$$

The Alt method is a typical algorithm for solving Problem (75) and is given as follows:
$$(\mathrm{Alt})\quad x^{t+1} \in \arg\min_{\{\|x\|_{0}\le r,\ \|x\|_{\infty}\le 10^{6}\}}
\big\{\,\|x - x^{t} - A^{\dagger}(b - Ax^{t})\|\,\big\}. \tag{82}$$

We consider the problem of finding an r-sparse solution of the linear system Ax = b.
We first generate an m × n matrix A with i.i.d. standard Gaussian entries. Then we generate
a random sparse vector x̂ ∈ ℝⁿ with r = ⌊m/5⌋ non-zero entries: x̂ is initialized to zero and
r randomly chosen entries are assigned random values. Finally, we set b = Ax̂. We choose the
same initialization and stopping criteria as in Li and Pong [11], Li et al. [30], and Bian and Zhang [16].
All algorithms are initialized at the origin, and the stopping criteria are as follows:
• DR, PR, PDR, IDRS, IPDR:
$$\frac{\max\{\|x^{t} - x^{t-1}\|,\ \|y^{t} - y^{t-1}\|,\ \|z^{t} - z^{t-1}\|\}}
{\max\{\|x^{t-1}\|,\ \|y^{t-1}\|,\ \|z^{t-1}\|,\ 1\}} < 10^{-8}; \tag{83}$$
• Alt:
$$\frac{\|x^{t} - x^{t-1}\|}{\max\{\|x^{t-1}\|,\ 1\}} < 10^{-8}. \tag{84}$$
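The data generation just described and the relative-change test (83) are easy to script; the sketch
below is our own reading of the setup (in particular, the distribution of the non-zero entries of x̂ is
assumed standard Gaussian, which the text does not specify).

```python
import numpy as np

def random_instance(m, n, seed=0):
    # Gaussian A, r-sparse x_hat with r = floor(m/5) nonzeros, and b = A x_hat.
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((m, n))
    r = m // 5
    x_hat = np.zeros(n)
    x_hat[rng.choice(n, size=r, replace=False)] = rng.standard_normal(r)  # assumed distribution
    return A, A @ x_hat, r

def stop_criterion(x, x_old, y, y_old, z, z_old, tol=1e-8):
    # Relative-change test (83) used for DR, PR, PDR, IDRS, and IPDR.
    num = max(np.linalg.norm(x - x_old), np.linalg.norm(y - y_old), np.linalg.norm(z - z_old))
    den = max(np.linalg.norm(x_old), np.linalg.norm(y_old), np.linalg.norm(z_old), 1.0)
    return num / den < tol
```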
The parameters of various methods are listed in Table 1, and for the definition of the
algorithmic parameters in Table 1, please refer to [11,16,23,30].

Table 1. Parameters of various methods for solving the nonconvex feasibility problem.

Method        DR                            PDR                             IDRS                     PR
Parameters    γ₀ = √((1+α)/(4−α)) − 1       γ₀ = √((1+α)/(4−α)) − 1         γ₀ = 0.93                θ = 2.2
              k = 150                       k = 150, α = 1.7                β = 0.4, k = 150         γ₀ = (√(3/2) − 1)/θ
                                                                                                     γ₁ = (β − 2)/(β + 1)²
Note: All algorithms use the heuristic that we initialize γ = k · γ₀ and update γ as
max{γ/2, 0.999 · γ₀} whenever γ > γ₀.

To select the parameter γ, we first choose γ0 > 0. However, γ0 may become very small
during actual numerical calculations. Therefore, we use a heuristic: γ = k · γ0 . Here, in the
DR method, k is chosen to be 150, while in the PDR method, the optimal parameters (as
referenced in [16]) are chosen to be k = 150 and α = 1.7. Most notably, in the IPDR splitting
method, we reduce the value of γ0 while increasing k to ensure appropriate scaling, where
k = 375.
We further verify that the functions f, g satisfy the assumptions in Section 3:
1. Given that C is both closed and convex, the function f is smooth with a Lipschitz
continuous gradient, and the Lipschitz modulus is L = 1. Moreover, similar to PDR [16],
we can set s = 0 in this experiment;
2. From Assumption 2, we have q < ρ/(4 − α); choosing the largest such q, it is easy to
calculate that
$$\beta \le \beta_{1} = \frac{(6 - 2\alpha)q}{(4 - \alpha) + (2 - \alpha)q}.$$
In summary, the parameters of the IPDR method are chosen as follows (a small computational
sketch is given after this list):
1. We choose the largest parameter β, i.e., β = β₁;
2. We choose the parameter α = 1.6;
3. γ₀ = 0.7 (√((1 + α)/(4 − α)) − 1).
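For concreteness, the following small helper sketches this parameter computation. It encodes our
reading of the items above; the grouping of the 0.7 factor, the use of γ₀ (rather than the scaled γ)
inside ρ, and the safety factor on q are assumptions, not statements from the paper.

```python
import numpy as np

def ipdr_parameters(alpha=1.6, L=1.0, s=0.0, k=375, safety=0.999):
    # Item 3: gamma_0 = 0.7 * (sqrt((1 + alpha) / (4 - alpha)) - 1), then gamma = k * gamma_0.
    gamma0 = 0.7 * (np.sqrt((1 + alpha) / (4 - alpha)) - 1.0)
    gamma = k * gamma0
    # Assumption 2(1): q < rho / (4 - alpha); take q close to the largest admissible value.
    rho = -3.0 - gamma0 * s + 2.0 * alpha - (4.0 - alpha) * (
        2.0 * gamma0 * (s + L) + (gamma0 * L) ** 2)
    q = safety * rho / (4.0 - alpha)
    # Item 1: beta = beta_1 from the bound derived above.
    beta1 = (6.0 - 2.0 * alpha) * q / ((4.0 - alpha) + (2.0 - alpha) * q)
    return gamma, beta1
```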
In our experiments, we generated 50 random instances where m was chosen from the
set {300, 400, 500, 600} and n was chosen from the set {4000, 5000, 6000}. The results of
the numerical calculations are presented in Tables 2–5, which include the runtime (tim),
number of iterations (iter), as well as the largest and smallest function values at termination
(fmax, fmin). To assess the correctness of the sparse solution of the linear system, we report
the number of successes (succ) and failures (fail). We declare success when the value of
the function at termination is less than 10⁻¹², and we declare failure when the value of the
function at termination is greater than 10⁻⁶.

Table 2. Compare DR and PR methods for feasibility problems on random instances.

Data DR PR
m n Tim (s) Iter Succ Fail Tim (s) Iter Succ Fail
300 4000 0.41 593 50 0 0.10 133 37 13
300 5000 0.65 710 50 0 0.16 167 29 21
300 6000 0.94 801 50 0 0.24 206 22 28
400 4000 0.79 525 50 0 0.16 97 45 5
400 5000 1.06 578 50 0 0.25 126 44 6
400 6000 1.37 650 50 0 0.34 157 32 18
500 4000 1.10 498 50 0 0.40 176 48 2
500 5000 1.31 525 50 0 0.25 94 49 1
500 6000 1.74 561 50 0 0.36 124 43 7
600 4000 2.17 486 50 0 9.39 2498 4 46
600 5000 2.38 502 50 0 0.58 116 49 1
600 6000 2.12 524 50 0 0.52 92 50 0

Table 3. Compare Alt and IDRS methods for feasibility problems on random instances.

Data Alt IDRS


m n Tim (s) Iter Succ Fail Tim (s) Iter Succ Fail
300 4000 0.66 1023 13 37 0.40 418 39 11
300 5000 0.87 1064 4 46 0.54 420 22 28
300 6000 1.44 1342 3 47 0.75 454 18 32
400 4000 1.20 818 23 27 0.92 465 42 8
400 5000 1.76 991 17 33 0.95 409 35 15
400 6000 2.26 1119 8 42 1.10 419 33 17
500 4000 1.46 663 40 10 1.33 473 50 0
500 5000 2.15 869 31 19 1.26 407 46 4
500 6000 2.87 1049 19 31 1.25 362 35 15
600 4000 1.59 473 48 2 2.45 565 50 0
600 5000 2.41 664 44 6 2.18 464 50 0
600 6000 3.47 877 38 12 1.82 357 47 3

Table 4. Compare PDR and IPDR methods for feasibility problems on random instances.

Data PDR IPDR


m n Tim (s) Iter Succ Fail Tim (s) Iter Succ Fail
300 4000 0.39 348 50 0 0.30 291 50 0
300 5000 0.54 392 40 10 0.59 378 50 0
300 6000 0.71 392 33 17 0.66 386 40 10
400 4000 0.66 291 50 0 0.47 258 50 0
400 5000 0.89 322 50 0 0.59 281 50 0
400 6000 1.33 439 47 3 0.74 324 50 0
500 4000 0.79 259 50 0 0.54 232 50 0
500 5000 0.86 288 50 0 0.67 256 50 0
500 6000 1.15 315 50 0 0.80 278 50 0
600 4000 1.12 237 50 0 0.77 214 50 0
600 5000 1.47 263 50 0 0.92 236 50 0
600 6000 1.69 286 50 0 1.07 253 50 0

Table 5. Value for feasibility problems on random instances.

Data DR PDR IPDR


m n fmax fmin fmax fmin fmax fmin
300 4000 3 × 10−15 3 × 10−16 3 × 10−15 4 × 10−16 2 × 10−15 2 × 10−16
300 5000 3 × 10−15 3 × 10−16 1 × 10−1 2 × 10−16 3 × 10−15 3 × 10−16
300 6000 3 × 10−15 4 × 10−16 9 × 10−2 4 × 10−16 8 × 10−2 4 × 10−16
400 4000 2 × 10−15 2 × 10−17 4 × 10−15 2 × 10−16 3 × 10−15 2 × 10−16
400 5000 3 × 10−15 1 × 10−16 4 × 10−15 7 × 10−16 3 × 10−15 4 × 10−16
400 6000 3 × 10−15 4 × 10−16 1 × 10−1 8 × 10−16 3 × 10−15 3 × 10−16
500 4000 2 × 10−16 2 × 10−18 4 × 10−15 5 × 10−16 3 × 10−15 3 × 10−16
500 5000 2 × 10−15 4 × 10−17 5 × 10−15 9 × 10−16 4 × 10−15 5 × 10−16
500 6000 4 × 10−15 3 × 10−16 3 × 10−15 4 × 10−16 4 × 10−15 6 × 10−16
600 4000 3 × 10−17 3 × 10−20 5 × 10−15 7 × 10−16 4 × 10−15 5 × 10−16
600 5000 2 × 10−16 6 × 10−18 5 × 10−15 7 × 10−16 6 × 10−15 6 × 10−16
600 6000 2 × 10−15 3 × 10−17 5 × 10−15 1 × 10−15 3 × 10−15 9 × 10−16

As demonstrated in the tables, when m ∈ {400, 500, 600}, the success rate of the IPDR
method closely matches that of DR and PDR while exhibiting superior computational
efficiency. In other words, IPDR has a high success rate, fewer iterations, and less CPU time.
In cases where m = 300, our proposed method achieves a success rate that outperforms
all other algorithms except DR, ranking second only to the DR method. It can also be seen that the
success rate of the method may be related to the experimental model, and the DR method
can solve some problems with worse nonconvexity. In Table 5, we list the fmax and fmin
values of the algorithms with high success rates: DR, PDR, and IPDR. Notably, the final
function value of IPDR consistently outperforms PDR in most cases. It is worth mentioning
that the PR, Alt, and IDRS methods exhibited low success rates. These observations show
the effectiveness of combining parametric and inertial techniques.

5. Conclusions
In this paper, we focus on solving a class of nonconvex and nonsmooth minimiza-
tion problems. Specifically, we propose an inertial parametric Douglas–Rachford (IPDR)
splitting method, which combines inertial and parametric techniques with the DR splitting
method. We also construct a new merit function and use the Kurdyka-Łojasiewicz property
to prove the boundedness and convergence of the iterative sequences generated by the
proposed IPDR method. Finally, by applying the IPDR method to nonconvex feasibility
problems, our numerical experimental results demonstrate the potential advantage of
combining parametric and inertia techniques.

Author Contributions: Conceptualization, T.L.; methodology, T.L.; validation, T.L. and X.Z.; inves-
tigation, T.L.; writing—original draft preparation, T.L.; writing—review and editing, T.L. and X.Z.;
supervision, X.Z. All authors have read and agreed to the published version of the manuscript.
Funding: This research was funded by the National Natural Science Foundation of China (11901368)
and the Shanxi Province Science Foundation for Youths (20210302124530).
Data Availability Statement: The data presented in this study are available on request from the
corresponding author.
Conflicts of Interest: There is no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript:

DR Douglas–Rachford
DRE Douglas–Rachford Envelope
ADMM Alternating direction method of multipliers
PDR Parameterized Douglas–Rachford
RrDR Randomized r-sets-Douglas–Rachford
IDRS Inertial Douglas–Rachford splitting
IPDR Inertial parametric Douglas–Rachford
PR Peaceman–Rachford
Alt Alternating projection

References
1. Douglas, J.; Rachford, H.H. On the numerical solution of heat conduction problems in two and three space variables. Trans. Am.
Math. Soc. 1956, 82, 421–439.
2. Lions, P.L.; Mercier, B. Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 1979, 16, 964–979.
3. Combettes, P.L.; Pesquet, J.C. A Douglas–Rachford Splitting Approach to Nonsmooth Convex Variational Signal Recovery. IEEE J.
Sel. Top. Signal Process. 2007, 1, 564–574.
4. Gandy, S.; Recht, B.; Yamada, I. Tensor completion and low-n-rank tensor recovery via convex optimization. Inverse Probl. 2011,
27, 025010.
5. He, B.; Yuan, X. On the O(1/n) convergence rate of the Douglas–Rachford alternating direction method. SIAM J. Numer. Anal.
2012, 50, 700–709.
6. Qu, Y.; He, H.; Han, D. A Partially Inertial Customized Douglas–Rachford Splitting Method for a Class of Structured Optimization
Problems. SIAM J. Sci. Comput. 2024, 98, 9.
7. Patrinos, P.; Stella, L.; Bemporad, A. Douglas-Rachford Splitting: Complexity Estimates and Accelerated Variants. In Proceedings
of the 53rd IEEE Conference on Decision and Control, Piscataway, NJ, USA, 15–17 December 2014; pp. 4234–4239.
8. Barshad, K.; Gibali, A.; Reich, S. Unrestricted Douglas-Rachford algorithms for solving convex feasibility problems in Hilbert
space. Optim. Methods Softw. 2023, 38, 655–667.
9. Lindstrom, S.B.; Sims, B. Survey: Sixty Years of Douglas-Rachford. J. Aust. Math. Soc. 2021, 110, 333–370.
10. Han, D.R. A Survey on Some Recent Developments of Alternating Direction Method of Multipliers. J. Oper. Res. Soc. China. 2022,
10, 1–52.
11. Li, G.; Pong, T.K. Douglas-Rachford Splitting for Nonconvex Optimization with Application to Nonconvex Feasibility Problems.
Math. Program. 2016, 159, 371–401.
12. Themelis, A.; Patrinos, P. Douglas-Rachford Splitting and ADMM for Nonconvex Optimization: Tight Convergence Results.
SIAM J. Optim. 2020, 30, 149–181.
13. Themelis, A.; Stella, L.; Patrinos, P. Douglas-Rachford splitting and ADMM for nonconvex optimization: Accelerated and
Newton-type linesearch algorithms. Comput. Optim. Appl. 2022, 82, 395–440.
14. Dao, M.N.; Phan, H.M. Adaptive Douglas–Rachford Splitting Algorithm for the Sum of Two Operators. SIAM J. Optim. 2019, 29,
2697–2724.
15. Tran Dinh, Q.; Pham, N.H.; Phan, D.; Nguyen, L. FedDR-randomized Douglas-Rachford splitting algorithms for nonconvex
federated composite optimization. Adv. Neural Inf. Process. Syst. 2021, 34, 30326–30338.
16. Bian, F.; Zhang, X. A Parameterized Douglas-Rachford Splitting Algorithm for Nonconvex Optimization. Appl. Math. Comput.
2021, 410, 126425.
17. Dao, M.N.; Phan, H.M. Linear Convergence of Projection Algorithms. Math. Oper. Res. 2019, 44, 715–738.
18. Wang, D.; Wang, X. A Parameterized Douglas–Rachford Algorithm. Comput. Optim. Appl. 2019, 73, 839–869.
19. Han, D.; Su, Y.; Xie, J. Randomized Douglas-Rachford Method for Linear Systems: Improved Accuracy and Efficiency. arXiv 2022,
arXiv:2207.04291.
20. Polyak, B.T. Some Methods of Speeding Up the Convergence of Iteration Methods. USSR Comput. Math. Math. Phys. 1964, 4, 1–17.
21. Boţ, R.I.; Csetnek, E.R.; Hendrich, C. Inertial Douglas–Rachford splitting for monotone inclusion problems. Appl. Math. Comput.
2015, 256, 472–487.
22. Alves, M.M.; Eckstein, J.; Geremia, M.; Melo, J.G. Relative-Error Inertial-Relaxed Inexact Versions of Douglas-Rachford and
ADMM Splitting Algorithms. Comput. Optim. Appl. 2020, 75, 389–422.
23. Feng, J.; Zhang, H.; Zhang, K.; Zhao, P. An Inertial Douglas-Rachford Splitting Algorithm for Nonconvex and Nonsmooth
Problems. Concurr. Comput. Pract. Exper. 2021, e6343.
24. Rockafellar, R.T.; Wets, R.J.B. Variational Analysis; Springer Science & Business Media: New York, NY, USA, 2009; Volume 317.
25. Boţ, R.I.; Csetnek, E.R. An Inertial Tseng’s Type Proximal Algorithm for Nonsmooth and Nonconvex Optimization Problems. J.
Optim. Theory Appl. 2016, 171, 600–616.
26. Nesterov, Y. Introductory Lectures on Convex Programming Volume I: Basic Course. Lect. Notes 1998, 3, 5.

27. Goebel, K.; Reich, S. Uniform Convexity, Hyperbolic Geometry, and Nonexpansive Mappings; Marcel Dekker: New York, NY, USA,
1984; Volume 83.
28. Attouch, H.; Bolte, J.; Svaiter, B.F. Convergence of Descent Methods for Semi-Algebraic and Tame Problems: Proximal Algorithms,
Forward-Backward Splitting, and Regularized Gauss-Seidel Methods. Math. Program. 2013, 137, 91–129.
29. Bolte, J.; Sabach, S.; Teboulle, M. Proximal Alternating Linearized Minimization for Nonconvex and Nonsmooth Problems. Math.
Program. 2014, 146, 459–494.
30. Li, G.; Liu, T.; Pong, T.K. Peaceman-Rachford Splitting for a Class of Nonconvex Optimization Problems. Comput. Optim. Appl.
2017, 68, 407–436.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
