https://doi.org/10.1007/s11228-020-00550-4
Maxim V. Balashov1
Received: 16 January 2019 / Accepted: 19 July 2020 / Published online: 1 August 2020
© Springer Nature B.V. 2020
Abstract
We consider the problem of minimization for a function with Lipschitz continuous gradient
on a proximally smooth and smooth manifold in a finite-dimensional Euclidean space. We
consider the Lezanski-Polyak-Lojasiewicz (LPL) condition in this problem of constrained
optimization, and we prove that the gradient projection algorithm for the problem converges
with a linear rate when the LPL condition holds.
The gradient projection algorithm GPA (also known as the gradient projection method, the
projection-proximal method, etc.) for solving the problem

min_{x∈A} f(x) (1)

was introduced in [1] (without evaluation of the rate of convergence) and [2] (with evaluation
of the rate of convergence). Convexity and smoothness are essential properties for the efficiency of
the method. In particular, if the function in (1) is strongly convex with Lipschitz continuous
gradient and the set is closed and convex, then the GPA converges with the rate of geometric
progression (or linear rate). The GPA proved to be an extremely useful tool for solving
different extremal problems [2, 3]. Numerous variants of the GPA for convex optimization
problems can be found in [4].
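The convex baseline can be sketched in a few lines. The following is our illustration, not code from the paper: the classical GPA iteration x_{k+1} = P_A(x_k − t f′(x_k)) on a closed convex set (here the unit ball) for a strongly convex quadratic, where the linear rate is well known.

```python
import numpy as np

# A hedged sketch (ours, not the paper's code): the classical GPA
# x_{k+1} = P_A(x_k - t f'(x_k)) on a closed convex set A.
# Here A is the Euclidean unit ball and f is a strongly convex
# quadratic, so convergence with a linear rate is expected.

def project_ball(x, radius=1.0):
    """Metric projection onto the ball B_radius(0)."""
    nrm = np.linalg.norm(x)
    return x if nrm <= radius else radius * x / nrm

def gpa(grad, project, x0, t, iters=100):
    x = x0
    for _ in range(iters):
        x = project(x - t * grad(x))
    return x

# f(x) = 0.5*||x - b||^2; the minimizer over the unit ball is b/||b||.
b = np.array([3.0, 4.0])
grad = lambda x: x - b
x_star = b / np.linalg.norm(b)           # = (0.6, 0.8)
x = gpa(grad, project_ball, np.zeros(2), t=0.5, iters=50)
print(np.allclose(x, x_star))            # True
```

Since the projection is nonexpansive, each step contracts the distance to the solution by at least the factor 1 − tκ, which is the geometric-progression rate mentioned above.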
Maxim V. Balashov
balashov73@mail.ru
1 This means that the sequence {x_k} generated by the GPA converges to a solution x_* with the rate
‖x_k − x_*‖ ≤ C1 e^{−C2 k} for all k > k0 with unknown constants k0 ∈ N, C1 > 0 and/or C2 > 0.
The Gradient Projection Algorithm for Smooth Sets and Functions... 343
One of the most important assumptions for a linear rate is some variant of the Lojasiewicz
condition with exponent 1/2; this is the LPL condition (Definition 1) in the notation of
the present paper. We compare our results with [21, Theorem 2.3, Corollary 2.11] and some
other results in Section 3.
Retraction [18, 25] is another important ingredient of gradient methods on smooth manifolds.
When we move along a tangent line (to a geodesic or to the manifold) in the anti-gradient direction,
we typically leave the manifold A. Thus we need a procedure (a retraction) that returns the point
x + tv to the set A (x ∈ A, t > 0, v a tangent vector to the set A at the point x).
In the present paper we plan to prove convergence of the GPA for the problem of min-
imization of a smooth nonconvex function on a proximally smooth and smooth manifold.
This is a key point of the work. Our choice is dictated by the fact that the metric projection
onto such a set exists, is unique and depends continuously on the point in some uniform
neighborhood of the set. We primarily use proximal properties of proximally smooth sets.
As we will see below, these properties are useful in particular in the case when A is a
smooth manifold.
Suppose that the set A is proximally smooth and f has Lipschitz continuous gradient.
Define for t > 0 and a point x ∈ A the gradient mapping G_t(x) = (x − P_A(x − t f′(x)))/t [4]. If
the error bound condition for the gradient mapping
holds for the solution set in problem (1) and t > 0 is sufficiently small, then the standard
GPA for (1) with step-size t converges with a linear rate [29, Theorem 5.1]. This result
shows that proximally smooth sets form a good class for the GPA in problem (1).
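As a small illustration (ours, not from the paper): for a convex set such as the unit ball, the gradient mapping G_t(x) = (x − P_A(x − t f′(x)))/t vanishes exactly at a stationary point of f on A, which is what the error bound condition measures.

```python
import numpy as np

# Sketch of the gradient mapping G_t(x) = (x - P_A(x - t f'(x))) / t
# for the unit ball A. G_t(x) = 0 exactly at a stationary point of
# f on A. Function names are ours, not from the paper.

def project_ball(x):
    n = np.linalg.norm(x)
    return x if n <= 1.0 else x / n

def gradient_mapping(x, grad, t, project):
    return (x - project(x - t * grad(x))) / t

b = np.array([3.0, 4.0])
grad = lambda x: x - b                   # f(x) = 0.5 * ||x - b||^2
x_star = b / np.linalg.norm(b)           # solution on the boundary of A
g = gradient_mapping(x_star, grad, t=0.1, project=project_ball)
print(np.linalg.norm(g) < 1e-12)         # True: x_star is stationary
```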
The mentioned error bound condition for the gradient mapping is mainly of theoretical
interest and is quite complicated to use. Moreover, it does not take into account the specific
situation when A is a smooth manifold. Assume further that A is a smooth manifold
S. In the present paper we consider (instead of an error bound condition for the function
f on a manifold S) the Lezanski-Polyak-Lojasiewicz condition (LPL, Definition 1).
This condition is well known in unconstrained optimization; we adapt it to the case
of constrained optimization problems. The LPL condition (also known as the Lojasiewicz
condition, the Polyak-Lojasiewicz condition, the Kurdyka-Lojasiewicz condition and so on) was
considered before in [6–8], [21, Definition 2.1], [24, §3.2]. In particular it was considered for
constrained optimization of a quadratic function on the Euclidean unit sphere [35] and on
the Stiefel manifold [36].
In Section 2 we recall the LPL condition for problem (1) in the case when A is a
proximally smooth and smooth manifold. In Lemmas 1, 2 and Theorem 1 we study the
relationship between the LPL condition for the function f on the smooth manifold S and the
quadratic growth condition for f.
We prove some lemmas about properties of a proximally smooth set given by
the system {x ∈ Rd | g(x) = 0}, g : Rd → Rm. In particular, we prove in Lemma 4 that
proximal smoothness is a typical property of a compact smooth manifold.
Theorems 2 and 3 are devoted to proving linear convergence of the GPA on a smooth
and proximally smooth manifold S ⊂ Rd under the LPL condition. We obtain an explicit form
of the convergence rate via known constants (25). In the case dim S = d − 1 this was proved in
[24]. Principal difficulties arise in the case dim S < d − 1; Lemma 5 is the main technical
tool in this situation.
Thus smooth and proximally smooth manifolds define a class of sets for which the GPA
under the LPL condition works well and for which the rate of convergence of the GPA can be
obtained quite easily and explicitly, in fact in the same way as in the convex case. Our final results
do not need the concept of geodesics at all. This is another advantage of the proposed approach.
Note that in the case of a smooth manifold A the property of proximal smoothness of
A with constant R is equivalent to the existence of local geodesics on the manifold. Such a
geodesic exists and is unique for any endpoints a, b ∈ A, ‖a − b‖ < 2R [26, Items (1) and
(5) of Theorem 1.14.2], [31, Item (k) of Theorem 3.1].
In the present paper we consider first order algorithms. Some approaches to second
order algorithms (a cubic regularized Newton's method over Riemannian manifolds) can be
found in [37].
1.2 Notation
We denote a Euclidean space of d dimensions by Rd and the inner product by (·, ·). Define
B_R(a) = {x ∈ Rd | ‖x − a‖ ≤ R}, a ∈ Rd, R > 0.
For a set A ⊂ Rd we denote by cl A, int A, ∂A the closure, the interior and the boundary
of the set A, respectively.
Let T ⊂ Rd be a subspace. Denote by T ⊥ its orthogonal complement.
A function f : Rd → R is called strongly convex with constant κ > 0 [3, §1 Chapter 1]
if the function f(x) − (κ/2)‖x‖² is convex.
We denote by f′(x) the Frechet gradient of a differentiable function f at a point x.
Suppose that there exists L1 > 0 with ‖f′(x) − f′(y)‖ ≤ L1 ‖x − y‖ for all x, y. Then we
shall say that the function f is smooth (with constant L1).
If f is smooth with constant L1 then for all x, x0 we have

f(x0) + (f′(x0), x − x0) − (L1/2)‖x − x0‖² ≤ f(x) ≤ f(x0) + (f′(x0), x − x0) + (L1/2)‖x − x0‖².
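This two-sided bound can be spot-checked numerically. The sketch below is ours, not from the paper; the quadratic f and the matrix Q are illustrative, with L1 equal to the spectral norm of Q.

```python
import numpy as np

# Numeric spot check of the two-sided bound for an L1-smooth function:
# f(x0) + (f'(x0), x - x0) - (L1/2)||x - x0||^2 <= f(x)
#   <= f(x0) + (f'(x0), x - x0) + (L1/2)||x - x0||^2.
# Here f(x) = 0.5 (x, Qx), so f'(x) = Qx and L1 = ||Q||_2.

rng = np.random.default_rng(1)
Q = np.array([[2.0, 0.5], [0.5, 1.0]])
f = lambda x: 0.5 * x @ Q @ x
grad = lambda x: Q @ x
L1 = np.linalg.norm(Q, 2)                # spectral norm of Q

ok = True
for _ in range(1000):
    x0, x = rng.standard_normal(2), rng.standard_normal(2)
    lin = f(x0) + grad(x0) @ (x - x0)
    quad = 0.5 * L1 * np.linalg.norm(x - x0) ** 2
    ok &= (lin - quad <= f(x) + 1e-12) and (f(x) <= lin + quad + 1e-12)
print(ok)    # True
```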
For a function f : Rd → R and β ∈ R define the lower level set
Lf (β) = {x ∈ Rd | f (x) ≤ β}.
We denote by ϱ_A(x) = ϱ(x, A) = inf_{a∈A} ‖x − a‖ the distance function.
The metric projection of a point x ∈ Rd onto a set A ⊂ Rd is
P_A x = {a ∈ A | ‖x − a‖ = ϱ_A(x)}.
Let R > 0, x0, x1 ∈ Rd and ‖x0 − x1‖ < 2R. We denote by D_R(x0, x1) the strongly convex
segment with endpoints x0, x1, that is, the intersection of all closed balls of radius R each
of which contains {x0, x1}. The boundary of the set D_R(x0, x1) consists of all small arcs of
circles of radius R with endpoints {x0, x1}. Define also D_R°(x0, x1) = D_R(x0, x1)\{x0, x1}.
A closed set A ⊂ Rd is called proximally smooth (or prox-regular, or weakly convex) with
constant R if the distance function ϱ_A(x) is continuously Frechet differentiable on the set
U_A(R) = {x ∈ Rd | 0 < ϱ_A(x) < R}.
The properties of such sets were considered by different authors [27, 28, 30, 31].
For a proximally smooth set A and a point x ∈ A we denote by N(A, x) the cone of
proximal normals
N(A, x) = {n ∈ Rd | ∃ t = t(n) > 0 : P_A(x + tn) = x}.
For a proximally smooth set in Rd all tangent cones and consequently all normal cones
coincide, see [32, Section 6], [33, Corollary 2.2].
We shall say that x ∈ A is a stationary point of the function f on the set A ⊂ Rd if there
exists some t > 0 with
P_A(x − t f′(x)) = x.
The last equality is equivalent to the inclusion f′(x) ∈ −N(A, x).
For a differentiable mapping ϕ : Rm → Rd, ϕ(u) = (ϕ1(u), . . . , ϕd(u)), we define the
Jacobi matrix
ϕ′(u) = (∂ϕi(u)/∂uj), 1 ≤ i ≤ d, 1 ≤ j ≤ m.
For any point x ∈ S we can find δ > 0, u ∈ Rm, B_δ(u) ⊂ Rm and a function ϕ ∈ C¹(int B_δ(u))
with ϕ(u) = x such that ϕ : int B_δ(u) → ϕ(int B_δ(u)) ⊂ S is a diffeomorphism; in fact, ϕ = ψ⁻¹.
Hence rank ϕ′(v) = m for all v ∈ int B_δ(u).
Smoothness of the manifold S means that for any point x ∈ S there exists a tangent
subspace T_x to S at the point x and a tangent plane x + T_x. It should be noted that T_x =
ϕ′(u)Rm has dimension m for any x ∈ S. Here the function ϕ and the point u are
from the previous paragraph.
Recall that in the case S = {x ∈ Rd | g(x) = 0}, where g : Rd → Rm is a C¹ function
satisfying the full rank condition rank g′(x) = m for all x ∈ S, we have T_x = {v ∈
Rd | g′(x)v = 0}. For a vector h ∈ Rd the metric projection P_{T_x} h onto the subspace T_x is
given by the formula

P_{T_x} h = (I − g′(x)^T (g′(x) g′(x)^T)⁻¹ g′(x)) h.
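The tangent projection formula is easy to evaluate numerically. A sketch (ours, not the paper's): for the unit sphere g(x) = ‖x‖² − 1 we have g′(x) = 2x^T, and the formula reduces to h − (x, h)x when ‖x‖ = 1.

```python
import numpy as np

# Projection onto the tangent space of S = {x : g(x) = 0} via
# P_{T_x} h = (I - g'(x)^T (g'(x) g'(x)^T)^{-1} g'(x)) h.

def tangent_projection(G, h):
    """G: (m, d) Jacobian g'(x) of full rank m; h: vector in R^d."""
    d = G.shape[1]
    P = np.eye(d) - G.T @ np.linalg.solve(G @ G.T, G)
    return P @ h

x = np.array([1.0, 0.0, 0.0])            # point on the unit sphere
G = 2 * x.reshape(1, -1)                 # Jacobian of g(x) = ||x||^2 - 1
h = np.array([0.3, -1.0, 2.0])
p = tangent_projection(G, h)
print(np.allclose(p, h - np.dot(x, h) * x))   # True: sphere formula
print(abs(np.dot(G.flatten(), p)) < 1e-12)    # True: p lies in T_x
```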
Further, in the context of Definition 1, we shall write "for any x ∈ S" instead of "for any
x ∈ S ∩ L_f(β)" for simplicity.
We want to point out that Definition 1 is a natural generalization of the LPL condition
in the unconstrained case. Indeed, in the unconstrained situation we may assume that S = Rd and
thus for any x ∈ S we get P_{T_x} f′(x) = f′(x).
Consider a function f : L_f(β) → R. Suppose that f is strongly convex with constant κ
and Lipschitz continuous with constant L. Let S be a smooth manifold without edge which
is proximally smooth with constant R and L/κ < R. Then f satisfies the LPL condition on
the set S ∩ L_f(β) [24, Example 4].
A strongly convex function need not satisfy the LPL condition without the inequality
L/μ < R. Consider the function f(x, y) = x² + (y − 3/4)² and the curve S =
{(x, y) | y = 2x², x ∈ [0, 1/2]}. Then (1/2, 1/2) is the global minimum for min_S f =
f(1/2, 1/2) = 5/16, f′(1/2, 1/2) = (1, −1/2), f′(x, 2x²) = 2(x, 2x² − 3/4). Note that the circle
x² + (y − 3/4)² = 5/16 is tangent to S at the point (1/2, 1/2). The function f is strongly convex
with constant μ = 2. The LPL condition does not hold because the vector f′(0, 0) = (0, −3/2)
is perpendicular to the curve at the point (0, 0). We have R = 1/4, L = 3/2, μ = 2, so L/μ = 3/4 > R.
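The quantities in this example can be verified numerically. The sketch below is ours (assuming f(x, y) = x² + (y − 3/4)² on the curve y = 2x², as in the example), not code from the paper.

```python
import numpy as np

# Numeric check of the example: f(x, y) = x^2 + (y - 3/4)^2 on the
# curve S = {y = 2x^2, x in [0, 1/2]}.

f = lambda x, y: x**2 + (y - 0.75)**2
grad = lambda x, y: np.array([2 * x, 2 * (y - 0.75)])

xs = np.linspace(0.0, 0.5, 100001)
vals = f(xs, 2 * xs**2)
print(np.isclose(vals.min(), 5 / 16))    # True: minimum value is 5/16
print(xs[vals.argmin()])                 # attained at x = 1/2
print(grad(0.0, 0.0))                    # gradient at the origin, normal to S
```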
Lemma 1 Suppose that S is a proximally smooth with constant R > 0 smooth manifold
without edge and f : Rd → R is a smooth function with constant L1.
Let Ω = Arg min_{x∈S} f(x) ≠ ∅, f0 = f(Ω) and L = sup_{x∈Ω} ‖f′(x)‖ < +∞. Then there
exists B = L/R + L1/2 with

f(x) − f0 ≤ B ϱ_Ω²(x) ∀x ∈ S ∩ U_Ω(R). (5)
Fig. 1 Illustration of the estimate ‖x − z‖ ≤ ‖x − a‖ = R − √(R² − ‖z − x‖²)
M ≥ g(u(0)) − g(u(T)) = −∫₀^T (g′(u), u′(t)) dt = ∫₀^T ‖g′(u)‖² dt ≥ (K/4) T.

Thus the length |γ| ≤ 2M/√K and by (7) γ ⊂ int B_δ(u_*). We get

g(u(0)) − g(u(T)) = g(u(0)) ≥ (√K/2) |γ| ≥ (√K/2) ‖u(0) − u(T)‖ ≥ (√K/(2L_ϕ)) ‖ϕ(u(0)) − ϕ(u(T))‖ ≥ (√K/(2L_ϕ)) ϱ_Ω(x(u(0))).

For any u(0) ∈ int B_σ(u_*) and hence for any x ∈ U(x_*) we have

f(x) − f0 ≥ A ϱ_Ω²(x), A = μϕ0²/(4L_ϕ²).
Theorem 1 Suppose that S is a proximally smooth with constant R > 0 smooth manifold
without edge and f : Rd → R is a smooth function with constant L1. Let Ω = Arg min_{x∈S} f(x) ≠
∅, f0 = f(Ω) and L = sup_{x∈S∩U_Ω(R)} ‖f′(x)‖ < +∞. Assume that there exists A > 0 with

f(x) − f0 ≥ A ϱ_Ω²(x) ∀x ∈ S ∩ U_Ω(R), (8)

f is convex and A > L/R.
Then there exists μ = (A − L/R)² / (L/R + L1/2) such that for all x ∈ S ∩ U_Ω(R) we have

μ (f(x) − f0) ≤ ‖P_{T_x} f′(x)‖².
Remark 1 Notice that we can demand concavity of the function f(x) − (L1/2)‖x‖² instead of
the Lipschitz property for f′. Indeed, concavity of the function f(x) − (L1/2)‖x‖² is equivalent
to concavity of the function f(x) − (L1/2)‖x − x0‖² for any x0, and

f(x) − (L1/2)‖x − x0‖² ≤ f(x0) + (f′(x0), x − x0) ∀x, x0.

We need only the last estimate for proving Lemma 1 and Theorem 1.
Remark 3 Note that in the case (f′(x), w − x) ≤ 0 (in the notation of Theorem 1) we can
refine the estimate for μ in the LPL condition from Theorem 1. The last situation may take
place, for example, when S is the boundary of a convex set. In this case we get

(A² / (L/R + L1/2)) (f(x) − f0) ≤ ‖P_{T_x} f′(x)‖² ∀x ∈ S ∩ U_Ω(R).
Definition 2 Let S be a smooth manifold and r0 > 0. We shall say that the manifold S is r0-
uniform if there exists λ ∈ (0, 1) such that for any x0 ∈ S and for any x ∈ x0 + T_{x0},
‖x − x0‖ < λr0, the set (x + T_{x0}⊥) ∩ S ∩ B_{r0}(x0) is a singleton.
Proof If the set S is proximally smooth with any constant r < R, then S is also proximally
smooth with constant R by the supporting principle for proximally smooth sets. Indeed, the
Fig. 2 Lemma 4
and the Stiefel manifold is proximally smooth with constant of proximal smoothness

R ≥ 2/√(k² + 3k).

At first we use (13) for the surfaces S_ij with i = j and R = 1 (k surfaces) and then we intersect
with the surfaces S_ij with i < j and R = √2 (½(k² − k) surfaces).
We want to emphasize that the result of Lemma 4 gives a lower bound for the constant of
proximal smoothness R. In the case of the Stiefel manifold we can obtain the best possible
value of the constant R.
Proposition 2 [25, Proposition 7] Let X0 ∈ S_{n,k}. Then for any X ∈ R^{n×k} such that ‖X −
X0‖ < σ_k(X0) in the Frobenius norm, the projection (in the sense of the Frobenius norm)
of X onto S_{n,k} exists, is unique, and has the expression P_{S_{n,k}}(X) = Σ_{i=1}^{k} U_i V_i^T = U I_{n,k} V^T,
given by a singular value decomposition X = U Σ V^T. Here σ_k(X0) is the smallest singular
value of X0, U_i (V_i) is the i-th column of U (V) and I_{n,k} = [I_k 0]^T ∈ R^{n×k}.
Strangely, the authors of [25] did not note that all singular values of a matrix from the
Stiefel manifold equal 1. Hence σ_k(X0) = 1 and for any matrix X with ϱ_{S_{n,k}}(X) < 1
(in the sense of the Frobenius norm) there is exactly one metric projection onto S_{n,k}. Thus
the Stiefel manifold is a proximally smooth set with constant R ≥ 1. Consider the matrix
X0 = [e1 e2 . . . ek] ∈ S_{n,k}, where {ei}_{i=1}^n ⊂ Rn is the standard orthonormal basis, and
the normal P = [e1 0 . . . 0]. For any t ∈ (1, 3/2) we have
P_{S_{n,k}}(X0 − tP) = [−e1 e2 . . . ek]. Hence R = 1 for any S_{n,k}.
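The projection formula of Proposition 2 is a one-liner with a thin SVD, since U I_{n,k} V^T reduces to U V^T there. The sketch below is our illustration with NumPy; the random data are illustrative.

```python
import numpy as np

# Metric projection onto the Stiefel manifold S_{n,k} (Proposition 2):
# for a thin SVD X = U S V^T, the Frobenius-nearest matrix with
# orthonormal columns is P(X) = U V^T.

def project_stiefel(X):
    U, _, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ Vt

rng = np.random.default_rng(0)
n, k = 5, 3
X = rng.standard_normal((n, k))
P = project_stiefel(X)
print(np.allclose(P.T @ P, np.eye(k)))   # True: P has orthonormal columns
# Any other point Q of the Stiefel manifold is at least as far from X:
Q = project_stiefel(rng.standard_normal((n, k)))
print(np.linalg.norm(X - P) <= np.linalg.norm(X - Q) + 1e-12)   # True
```

This is the orthogonal Procrustes solution, which is exactly why the retraction x̄_{k+1} = P_S z_k discussed later is cheap for the Stiefel manifold.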
For a smooth manifold S and x0 ∈ S put N1(x0) = N1(S, x0) = {n ∈ T_{x0}⊥ | ‖n‖ ≤ 1}.
We recall the following result about smooth and proximally smooth manifolds.
Proposition 3 [26, Theorem 1.19.2], [31, Theorem 3.1 (k), (l)] Let S be a C¹-smooth
manifold in Rd without edge. Assume that S is a proximally smooth set with constant R. Then
S satisfies the following property: for any x0, x1 ∈ S, ‖x0 − x1‖ < 2R,

h(N1(x0), N1(x1)) ≤ (1/R) ℓ_S(x0, x1),

where ℓ_S(x0, x1) is the length of the geodesic curve γ ⊂ S with endpoints x0, x1. Such a
curve exists; moreover,

ℓ_S(x0, x1) ≤ 2R arcsin(‖x0 − x1‖/(2R))

and thus

h(N1(x0), N1(x1)) ≤ 2 arcsin(‖x0 − x1‖/(2R)) ≤ (π/(2R)) ‖x0 − x1‖. (14)
Lemma 5 Let S be a manifold of dimension m without edge which is smooth and proximally
smooth with constant (π/2)R. Then the set S is r0-uniform with r0 = R and λ = √3/2.
Proof Suppose that r > 0 is a constant such that the set S is proximally smooth with constant
r and h(N1(x0), N1(x1)) ≤ ‖x0 − x1‖/r for all x0, x1 ∈ S. We can take r = R, see (14).
Fix a point x0 ∈ S and x1 ∈ S, ‖x0 − x1‖ < r. Then

h(N1(x0), N1(x1)) ≤ ‖x0 − x1‖/r < 1 (15)

and hence the angle between T_{x0} and T_{x1} (and between T_{x0}⊥ and T_{x1}⊥) is less than π/2. Thus for
any x1 ∈ S, ‖x1 − x0‖ < r, T_{x1} is not perpendicular to T_{x0}.
Define l(x) = r − √(r² − ‖x − x0‖²) and B_{x0} = {x | x ∈ x0 + T_{x0}, ‖x0 − x‖ < (√3/2) r},
see Fig. 3.
From the supporting principle for proximally smooth sets and elementary planimetry, any
point y ∈ (x + T_{x0}⊥) ∩ S ∩ B_r(x0), where x ∈ B_{x0}, belongs to the set

⋃_{x ∈ B_{x0}} ⋃_{n ∈ T_{x0}⊥, ‖n‖=1} {x + tn | t ∈ [0, l(x)]},

and vice versa: any point y ∈ S from this set belongs to the intersection (x + T_{x0}⊥) ∩ S ∩ B_r(x0) for
some x ∈ B_{x0}. For the proof it is sufficient to consider the affine hull of the points {x0, x, y}.
Suppose that x ∈ x0 + T_{x0}, ‖x0 − x‖ < (√3/2) r and (x + T_{x0}⊥) ∩ S ∩ B_r(x0) = ∅. Consider
x(t) = (1 − t)x0 + tx, t ∈ [0, 1]. Let F(t) = (x(t) + T_{x0}⊥) ∩ S ∩ B_r(x0). Put

t0 = inf {t ∈ [0, 1] | F(t) = ∅}.

By the inverse function theorem t0 ∈ (0, 1) and (due to compactness) F(t0) ≠ ∅. Let
y ∈ F(t0) and y0 = P_{x0 + T_{x0}} y.
If T_y is not perpendicular to T_{x0}, then (by the inverse function theorem and C¹-smoothness
of S) there exist small neighborhoods U(y) ⊂ S and V(y0) ⊂ x0 + T_{x0} such that
the metric projection gives a bijection P_{x0 + T_{x0}} U(y) = V(y0). This contradicts the definition
of t0 as an infimum. Hence T_y is perpendicular to T_{x0}. The last assertion contradicts (15).
Suppose that x ∈ x0 + T_{x0}, ‖x0 − x‖ < (√3/2) r and {x1, x2} ⊂ (x + T_{x0}⊥) ∩ S ∩ B_r(x0). Then by
the supporting principle for proximally smooth sets ‖x − xi‖ ≤ r − √(r² − ‖x0 − x‖²) < (1/2) r,
i = 1, 2, ‖x1 − x2‖ ≤ ‖x − x1‖ + ‖x − x2‖ < r and

h(N1(x0), N1(x1)) ≤ ‖x0 − x1‖/r ≤ √(‖x − x0‖² + ‖x − x1‖²)/r < 1.

Note that n0 = (x2 − x1)/‖x2 − x1‖ ∈ N1(x0). Then there exists a unit vector n1 ∈ N1(x1) with
‖n0 − n1‖ < 1. The cone with vertex x1, axis lin {x1, x1 + n1}, generatrix of length r
and with the angle between generatrix and axis equal to π/3 is contained in B_r(x1 + rn1).
Hence the angle between n0 and n1 is strictly less than π/3 and the supporting ball
int B_r(x1 + rn1)
contains the point x2. This contradiction shows that the intersection (x + T_{x0}⊥) ∩ S ∩ B_r(x0) is a
singleton.
Theorem 2 (One step of the GPA) Let S be a smooth and proximally smooth with constant
(π/2)R manifold without edge, x0 ∈ S, t > 0. Suppose
Put

q(t) = t − (L/R) t² − (L1/2) t². (22)

We have

max_{t>0} q(t) = q(t0), where t0 = 1/(2L/R + L1).

Note that t0 < √3 R/(2L) and thus the inequality ‖t f′(x)‖ ≤ (√3/2) R holds for all x ∈ S.
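The step-size formula can be verified directly. The sketch below is ours, assuming the reconstructed form of (22), q(t) = t − (L/R)t² − (L1/2)t²; the constants L, R, L1 are illustrative.

```python
# Step size t0 = 1/(2L/R + L1) maximizing q(t) from (22),
# checked numerically. Constants are illustrative, not from the paper.

L, R, L1 = 1.5, 0.25, 2.0
q = lambda t: t - (L / R) * t**2 - (L1 / 2) * t**2
t0 = 1.0 / (2 * L / R + L1)

# q is a concave parabola in t, so t0 should beat nearby step sizes:
print(q(t0) >= q(0.9 * t0) and q(t0) >= q(1.1 * t0))   # True
print(t0)
```

Setting q′(t) = 1 − (2L/R + L1)t = 0 recovers the same t0 analytically.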
Theorem 3 Let S be a smooth and proximally smooth with constant (π/2)R manifold without
edge, x0 ∈ S.
Suppose that f : Rd → R is a function with the following properties:
1) f is smooth with constant L1,
2) L = sup_{x ∈ S ∪ U_S(R)} ‖f′(x)‖ < +∞,
3) the LPL condition takes place for the function f on the set S ∩ L_f(f(x0)) with constant
μ > 0.
Assume that t ∈ (0, t0]. Put q = q(t), where q(t) is from (22).
Remark 4 Notice that we can demand concavity of the function f(x) − (L1/2)‖x‖² instead
of the Lipschitz property for f′ in Theorems 2 and 3.
By Formula (21)

‖x_{k+1} − z_k‖ ≤ (t²/R) ‖P_{T_{x_k}} f′(x_k)‖².
Hence we should find a (unique) common point of S with the ball of radius
(t²/R)‖P_{T_{x_k}} f′(x_k)‖² centered at z_k in the affine subspace z_k + T_{x_k}⊥. In the case when S is given by the
system g_i(x) = 0, i = 1, . . . , d − m, it can be done by solving the system

g_i(x) = 0, (e_j, x − z_k) = 0, 1 ≤ i ≤ d − m, 1 ≤ j ≤ m,

with ‖x − z_k‖ ≤ (t²/R)‖P_{T_{x_k}} f′(x_k)‖².
In the case d − m = 1 the point x_{k+1} can be found easily. Put p_k = g′(x_k)/‖g′(x_k)‖ and
λ_k = (t²/R)‖P_{T_{x_k}} f′(x_k)‖². Then we can find x_{k+1} by bisection on the segment
[z_k − λ_k p_k, z_k + λ_k p_k] [24].
We note that the retraction

x_{k+1} = S ∩ (z_k + T_{x_k}⊥) ∩ B_R(x_k)

is not the only possible option. It is more a theoretical fact than a guide to computing.
Actually, we can choose x_{k+1} ∈ S arbitrarily, but with the property that there exists C > 0
with ‖x_{k+1} − z_k‖ ≤ C t² ‖P_{T_{x_k}} f′(x_k)‖² for all k. Then we have

f(x_{k+1}) − f(x_k) ≤ −‖P_{T_{x_k}} f′(x_k)‖² (t − CLt² − (L1/2) t²)

and all results of Theorems 2 and 3 remain valid; we only have to replace 1/R by C and
take t ∈ (0, √3 R/(2L)], t ≤ 1/(2CL + L1).
Sometimes we can choose x̄_{k+1} = P_S z_k (e.g. for the Stiefel manifold, see Proposition
2). In comparison with the choice x_{k+1} = S ∩ (z_k + T_{x_k}⊥) ∩ B_R(x_k) we have ‖x̄_{k+1} − z_k‖ ≤
‖x_{k+1} − z_k‖. Hence all results of Theorems 2 and 3 remain the same.
Corollary 2 Suppose that the conditions of Theorem 3 hold. Then the GPA in the form x0 ∈ S,
z_k = x_k − t P_{T_{x_k}} f′(x_k), x_{k+1} = P_S z_k, converges with a linear rate (25) to a solution of the
problem min_{x∈S} f(x).
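The GPA of Corollary 2 is easy to run on the unit sphere, where P_S is just normalization. The sketch below is ours, not the paper's code; the objective f(x) = −(b, x) and all constants are illustrative.

```python
import numpy as np

# GPA in the form of Corollary 2 on the unit sphere S = {x : ||x|| = 1}:
# z_k = x_k - t * P_{T_{x_k}} f'(x_k),  x_{k+1} = P_S z_k (normalization).
# For f(x) = -(b, x) the minimizer over S is b/||b||.

b = np.array([1.0, 2.0, 2.0])
grad = lambda x: -b                       # f'(x) for f(x) = -(b, x)
x_star = b / np.linalg.norm(b)

x = np.array([0.0, 0.0, 1.0])
t = 0.2
errs = []
for _ in range(60):
    g = grad(x)
    tangent_grad = g - np.dot(g, x) * x   # P_{T_x} f'(x) on the sphere
    z = x - t * tangent_grad
    x = z / np.linalg.norm(z)             # retraction P_S
    errs.append(np.linalg.norm(x - x_star))
print(errs[-1] < 1e-8)                    # True: converged
print(errs[10] < 0.5 * errs[0])           # True: fast (linear-rate) decay
```

The error decays by a roughly constant factor per step, which is the linear rate (25) in this simple setting.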
3 Discussion
In most papers about the GPA there is no proof of the rate of convergence in the
nonconvex case, or the proof is implicit. The latter means that such a proof often says that the
sequence {x_k} generated by the GPA converges to a solution x_* with some rate, for example
with a linear rate

∃ C > 0 ∃ k0 ∈ N ∃ q ∈ (0, 1) : ‖x_k − x_*‖ ≤ C q^k ∀k > k0.

But there is no evaluation of the common ratio q or the constants C, k0 in terms of known
constants.
We shall compare the result of [21, Theorem 2.3, Corollary 2.11] with Theorems 2 and 3.
In comparison with [21] we have more information about the manifold S in Theorems 2 and
3: S is proximally smooth with constant R. This allows us to obtain an explicit estimate (25)
for the rate of convergence of the GPA through the Lipschitz constants of the functions f, f′,
the constant of proximal smoothness R and the constant μ from the LPL condition. We also use a
References
1. Goldstein, A.A.: Convex programming in Hilbert space. Bull. Amer. Math. Soc. 70:5, 709–710 (1964)
2. Levitin, E.S., Polyak, B.T.: Constrained minimization methods. Zh. Vychisl. Mat. Mat. Fiz. 6:5, 787–823
(1966)
3. Polyak, B.T.: Introduction to Optimization. M., Science (1983)
4. Nesterov, Yu.: Introductory Lectures on Convex Optimization. A Basic Course. Springer (2004)
5. Karimi, H., Nutini, J., Schmidt, M.: Linear convergence of gradient and proximal-gradient methods
under the Polyak-Lojasiewicz condition. In: Frasconi, P., Landwehr, N., Manco, G., Vreeken, J. (eds.)
Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2016. Lecture Notes in
Computer Science, vol. 9851. Springer, Cham (2016)
6. Ležanski, T.: Über das Minimumproblem für Funktionale in Banachschen Räumen. Math. Ann. 152,
271–274 (1963)
7. Polyak, B.T.: Gradient methods for minimizing functionals. Zh. Vychisl. Mat. Mat. Fiz. 3:4, 643–653
(1963)
8. Lojasiewicz, S.: A Topological Property of Real Analytic Subsets (in French). Coll. du CNRS. Les
Equations aux derives partielles 117, 87–89 (1963)
9. Lojasiewicz, S.: Sur le problème de la division. Studia Math. 18, 87–136 (1959)
10. Ioffe, A.D.: Metric regularity: a survey. arXiv:1505.07920v2 (2015)
11. Luo, Z.-Q.: New error bounds and their applications to convergence analysis of iterative algorithms.
Math. Program. Ser. B. V. 88, 341–355 (2000)
12. Balashov, M.V., Golubev, M.O.: About the Lipschitz property of the metric projection in the
Hilbert space. J. Math. Anal. Appl. 394:2, 545–551 (2012)
13. Balashov, M.V.: Maximization of a function with Lipschitz continuous gradient. J. Math. Sci. 209:1,
12–18 (2015)
14. Drusvyatskiy, D., Lewis, A.: Error bounds, quadratic growth, and linear convergence of proximal
methods arXiv:1602.06661v2 (2016)
15. Edelman, A., Arias, T.A., Smith, S.T.: The geometry of algorithms with orthogonality constraints. SIAM
J. Matrix Anal. Appl. 20:2, 303–353 (1998)
16. Luenberger, D.G.: The gradient projection methods along geodesics. Manag. Sci. 18:11, 620–631 (1972)
17. Hager, W.W.: Minimizing a quadratic over a sphere. SIAM J. Optim. Contr. 12:1, 188–208 (2001)
18. Absil, P.-A., Mahony, R., Sepulchre, R.: Optimization algorithms on matrix manifolds. Princeton
University Press, Princeton and Oxford (2008)
19. da Cruz Neto, J.X., De Lima, L.L., Oliveira, P.R.: Geodesic algorithms on Riemannian manifolds. Balkan
J. Geom. Appl. 3:2, 89–100 (1998)
20. Udrişte, C.: Convex Functions and Optimization Methods on Riemannian Manifolds. Mathematics and
Its Applications series, vol. 297. Springer (1994)
21. Schneider, R., Uschmajew, A.: Convergence results for projected line-search methods on varieties of
low-rank matrices via Lojasiewicz inequality. SIAM J. Optim. 25:1, 622–646 (2015)
22. Jain, P., Kar, P.: Non-convex Optimization for Machine Learning. Now Foundations and Trends
(2017)
23. Barber, R.F., Ha, W.: Gradient descent with nonconvex constraints: local concavity determines conver-
gence, arXiv:1703.07755v3 (2017)
24. Balashov, M., Polyak, B., Tremba, A.: Gradient projection and conditional gradient methods for con-
strained nonconvex minimization. Numerical Functional Analysis and Optimization 41(7), 822–849
(2020)
25. Absil, P.-A., Malick, J.: Projection-like retraction on matrix manifolds. SIAM J. Optim. 22:1, 135–158
(2012)
26. Ivanov, G.E.: Weakly convex sets and functions, M., Fizmatlit. In Russian (2006)
27. Vial, J.-P.h.: Strong and weak convexity of sets and functions. Math. Oper. Res. 8:2, 231–259 (1983)
28. Clarke, F.H., Stern, R.J., Wolenski, P.R.: Proximal smoothness and lower–C 2 property. J. Convex Anal.
2:1-2, 117–144 (1995)
29. Balashov, M.V.: The gradient projection algorithm for a proximally smooth set and a function with
lipschitz continuous gradient. Sbornik: Mathematics 211(4), 481–504 (2020)
30. Balashov, M.V., Ivanov, G.E.: Weakly convex and proximally smooth sets in Banach spaces. Izv. RAN.
Ser. Mat. 73:3, 23–66 (2009)
31. Goncharov, V.V., Ivanov, G.E.: Strong and Weak Convexity of Closed Sets in a Hilbert Space. In: Daras,
N., Rassias, T. (eds.) Operations Research, Engineering, and Cyber Security. Springer Optimization and
Its Applications, vol. 113, pp. 259–297. Springer, Cham (2017)
32. Bounkhel, M., Thibault, L.: On various notions of regularity of sets in nonsmooth analysis. Nonlin. Anal.
48, 223–246 (2002)
33. Poliquin, R.A., Rockafellar, R.T., Thibault, L.: Local differentiability of distance functions. Trans.
Amer. Math. Soc. 352, 5231–5249 (2000)
34. Balashov, M.V.: About the gradient projection algorithm for a strongly convex function and a proximally
smooth set. J. Convex Anal. 24:2, 493–500 (2017)
35. Gao, B., Liu, X., Chen, X., Yuan, Y.-X.: On the Lojasiewicz exponent of the quadratic sphere
constrained optimization problem. arXiv:1611.08781v2
36. Liu, H., Wu, W., So, A.M.C.: Quadratic optimization with orthogonality constraints: explicit
Lojasiewicz exponent and linear convergence of line-search methods. arXiv:1510.01025v1 (2015)
37. Zhang, J., Zhang, S.: A cubic regularized Newton's method over Riemannian manifolds.
arXiv:1805.05565v1 (2018)
38. Merlet, B., Nguyen, T.N.: Convergence to equilibrium for discretizations of gradient-like flows on
Riemannian manifolds. Differ. Integral Equ. 26, 571–602 (2013)