
Set-Valued and Variational Analysis (2021) 29:341–360

https://doi.org/10.1007/s11228-020-00550-4

The Gradient Projection Algorithm for Smooth Sets and Functions in Nonconvex Case

Maxim V. Balashov1

Received: 16 January 2019 / Accepted: 19 July 2020 / Published online: 1 August 2020
© Springer Nature B.V. 2020

Abstract
We consider the problem of minimization of a function with Lipschitz continuous gradient on a proximally smooth and smooth manifold in a finite-dimensional Euclidean space. We consider the Lezanski-Polyak-Lojasiewicz (LPL) condition for this problem of constrained optimization. We prove that the gradient projection algorithm for the problem converges with a linear rate when the LPL condition holds.

Keywords Lipschitz continuous gradient · Proximal smoothness · Gradient projection algorithm · Metric projection · Nonconvex extremal problem · Lezanski-Polyak-Lojasiewicz condition

Mathematics Subject Classification (2010) Primary: 90C26 · 65K05. Secondary: 46N10 · 65K10

1 Introduction and Main Notations


1.1

The gradient projection algorithm GPA (also known as the gradient projection method, the projection-proximal method etc.) for solving the problem

min_{x∈A} f(x) (1)

was introduced in [1] (without evaluating the rate of convergence) and [2] (with evaluating
the rate of convergence). Convexity and smoothness are essential properties for efficiency of
the method. In particular, if the function in (1) is strongly convex with Lipschitz continuous
gradient and the set is closed and convex, then the GPA converges with the rate of geometric
progression (or linear rate). The GPA proved to be an extremely useful tool for solving
different extremal problems [2, 3]. Numerous variants of the GPA for convex optimization
problems can be found in [4].

Maxim V. Balashov
balashov73@mail.ru

1 V. A. Trapeznikov Institute of Control Sciences of Russian Academy of Sciences, 65 Profsoyuznaya street, Moscow 117997, Russia

Convexity assumptions may be weakened. This was first done in unconstrained optimization, when A = R^d. The Lezanski-Polyak-Lojasiewicz (hereafter LPL) condition [6–8] was probably the first reasonable and important property of the function that can replace (strong) convexity; it can be used in the gradient descent method [7]. Different types of conditions replacing convexity for functions in unconstrained optimization and their relationships can be found in [5]. Most of these conditions are ideologically close to the concept of metric regularity [10, 11].
One can require additional convexity properties for the set involved in the algorithm (e.g. strong convexity) and consider a nonconvex and smooth function [12, 13]. There are so-called error bound conditions of different types which allow us to prove linear convergence for gradient descent methods [14] and for gradient algorithms in optimization problems with convex constraints [11].
The property of convexity for the function and for the set does not hold in a number of extremal problems. We can mention the problem of minimization of a smooth function f on some matrix manifold A without edge, for example the real Stiefel manifold [15]. After all, A in (1) can be neither a convex set nor a smooth manifold.
Suppose that the set A in problem (1) is a smooth manifold of dimension ≤ d − 1. Consideration of geodesic-related steps in the GPA is a standard practice. One of the first works in the field is [16]. The special case of problem (1) for a quadratic function f has been studied in [17]. Later publications include [18–20]; in most of them A is a smooth manifold. Newton's method was also used [15, 17] to solve the mentioned smooth and nonconvex problems, basically for quadratic functions/constraints. But the Newton method in general converges only locally and there are very few papers with explicit conditions for its convergence in problem (1). The same is true for the GPA in the nonconvex case for problem (1); we can mention the papers [11, 21, 23, 24, 35] with more or less reasonable estimates for the error of the algorithm in nonconvex constrained optimization problems.
There are very few works where the GPA is considered for an arbitrary nonconvex closed set, in particular for a set which is not a manifold.
In [22, Theorem 3.3] the authors proved linear convergence of the GPA for an arbitrary closed set A but with very strong restrictions on the function f over A, so-called restricted strong convexity and restricted smoothness. Moreover, the graph of the function should be close to the graph of the squared norm. Most functions (e.g. functions with linear sections of the graph) do not satisfy these restrictions.
In [23, Theorem 3] the authors also proved linear convergence of the GPA in some neighborhood of the unique minimum point. In fact they used the class of proximally smooth sets in problem (1), but again with the properties of restricted strong convexity and restricted smoothness for the function f. For the case of a strongly convex function with Lipschitz continuous gradient this result was proved in [34].
Under certain conditions a linear asymptotic for the rate of convergence¹ of the GPA was proved in [21, Theorem 2.3] for an arbitrary closed subset. But the only reasonable class of sets for which the mentioned conditions are fulfilled is the class of smooth manifolds [21, Corollary 2.11].
As far as the author knows, the strongest general results about convergence of the GPA
for problem (1) with a smooth manifold A can be found in [21].

¹ This means that the sequence {x_k} generated by the GPA converges to a solution x_* with the rate ‖x_k − x_*‖ ≤ C_1 e^{−C_2 k} for all k > k_0 with unknown constants k_0 ∈ N, C_1 > 0 and/or C_2 > 0.

One of the most important assumptions for a linear rate is some sort of the Lojasiewicz condition with exponent 1/2. This means the LPL condition (Definition 1) in the notation of the present paper. We compare our results with [21, Theorem 2.3, Corollary 2.11] and some other results in Section 3.
Retraction [18, 25] is another important ingredient of gradient methods on smooth manifolds. When we slide along a tangent line (to a geodesic or to the manifold) in the anti-gradient direction, we typically leave the manifold A. Thus we need a procedure (retraction) to return the point x + tv to the set A (x ∈ A, t > 0, v is a tangent vector to the set A at the point x).
In the present paper we prove convergence of the GPA for the problem of minimization of a smooth nonconvex function on a proximally smooth and smooth manifold. This is a key point in the work. Our choice is dictated by the fact that the metric projection onto such a set exists, is unique and depends continuously on the point in some uniform neighborhood of the set. We use first of all the proximal properties of proximally smooth sets. As we will see further, these properties are useful in particular for the case when A is a smooth manifold.
Suppose that the set A is proximally smooth and f has Lipschitz continuous gradient. Define for t > 0 and for a point x ∈ A the gradient mapping G_t(x) = (x − P_A(x − tf'(x)))/t [4]. If the error bound condition for the gradient mapping

∃ν > 0: ϱ_Ω(x) ≤ ν‖G_t(x)‖ ∀x ∈ A

holds for the solution set Ω of problem (1) and t > 0 is sufficiently small, then the standard GPA for (1) with step-size t converges with a linear rate [29, Theorem 5.1]. This result shows that proximally smooth sets form a good class for the GPA in problem (1).
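To make the gradient mapping concrete, here is a minimal Python sketch (ours, not from the paper; proj_A is an assumed user-supplied metric projection onto A, which is well defined near A when A is proximally smooth):

import numpy as np

def gradient_mapping(x, grad_f, proj_A, t):
    # G_t(x) = (x - P_A(x - t f'(x))) / t
    return (x - proj_A(x - t * grad_f(x))) / t

def gpa_step(x, grad_f, proj_A, t):
    # one step of the standard gradient projection algorithm
    return proj_A(x - t * grad_f(x))

With this notation the standard GPA is simply x_{k+1} = gpa_step(x_k, ...), and a point x is stationary exactly when the gradient mapping vanishes at x.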
The mentioned error bound condition for the gradient mapping is mainly of theoretical interest and is quite complicated to use. Moreover, it does not take into account the specific situation when A is a smooth manifold. Assume further that A is a smooth manifold S. In the present paper we consider (instead of an error bound condition for the function f on a manifold S) the Lezanski-Polyak-Lojasiewicz condition (or LPL, Definition 1). This condition is well known in unconstrained optimization; we adapted it to the case of constrained optimization problems. The LPL condition (which is also known as the Lojasiewicz condition, the Polyak-Lojasiewicz condition, the Kurdyka-Lojasiewicz condition and so on) was considered before [6–8], [21, Definition 2.1], [24, §3.2]. In particular it was considered for constrained optimization of a quadratic function on the Euclidean unit sphere [35] and on the Stiefel manifold [36].
In Section 2 we recall the LPL condition for problem (1) in the case when A is a proximally smooth and smooth manifold. In Lemmas 1, 2 and Theorem 1 we consider the relationship between the LPL condition for the function f on the smooth manifold S and the quadratic growth condition for f.
We prove some lemmas about properties of a proximally smooth set which is given by the system {x ∈ R^d | g(x) = 0}, g : R^d → R^m. In particular, we prove in Lemma 4 that proximal smoothness is a typical property for a compact smooth manifold.
Theorems 2 and 3 are devoted to proving linear convergence of the GPA on a smooth and proximally smooth manifold S ⊂ R^d under the LPL condition. We obtain an explicit form of the convergence rate via known constants (25). In the case dim S = d − 1 this was proved in [24]. Principal difficulties arise in the case dim S < d − 1. Lemma 5 is the main technical tool in this situation.
Thus smooth and proximally smooth manifolds define a class of sets for which the GPA under the LPL condition works well, and the rate of convergence for the GPA can be proven quite easily and explicitly, in fact in the same way as in the convex case. Our final results do not need the concept of geodesics at all. This is another advantage of the proposed approach.
Note that in the case of a smooth manifold A the property of proximal smoothness for A with constant R is equivalent to the existence of local geodesics on the manifold. Such a geodesic exists and is unique for any endpoints a, b ∈ A, ‖a − b‖ < 2R [26, Items (1) and (5) of Theorem 1.14.2], [31, Item (k) of Theorem 3.1].
In the present paper we consider first order algorithms. Some approaches to second order algorithms (a cubic regularised Newton's method over Riemannian manifolds) can be found in [37].

1.2

We denote a Euclidean space of d dimensions by R^d and the inner product by (·, ·). Define B_R(a) = {x ∈ R^d | ‖x − a‖ ≤ R}, a ∈ R^d, R > 0.
For a set A ⊂ R^d we denote by cl A, int A, ∂A the closure, the interior and the boundary of the set A, respectively.
Let T ⊂ R^d be a subspace. Denote by T^⊥ its orthogonal complement.
A function f : R^d → R is called strongly convex with constant κ > 0 [3, §1 Chapter 1] if the function f(x) − (κ/2)‖x‖² is convex.
We denote by f'(x) the Frechet gradient of a differentiable function f at a point x. Suppose that there exists L_1 > 0 with ‖f'(x) − f'(y)‖ ≤ L_1‖x − y‖ for all x, y. Then we shall say that the function f is smooth (with constant L_1).
If f is smooth with constant L_1 then for all x, x_0 we have

f(x_0) + (f'(x_0), x − x_0) − (L_1/2)‖x − x_0‖² ≤ f(x) ≤ f(x_0) + (f'(x_0), x − x_0) + (L_1/2)‖x − x_0‖².

For a function f : R^d → R and β ∈ R define the lower level set L_f(β) = {x ∈ R^d | f(x) ≤ β}.
We denote by ϱ_A(x) = ϱ(x, A) = inf_{a∈A} ‖x − a‖ the distance function.
The metric projection of a point x ∈ R^d onto a set A ⊂ R^d is

P_A x = {a ∈ A | ‖x − a‖ = ϱ_A(x)}.
Let R > 0, x_0, x_1 ∈ R^d and ‖x_0 − x_1‖ < 2R. We denote by D_R(x_0, x_1) the strongly convex segment with endpoints x_0, x_1, that is, the intersection of all closed balls of radius R each of which contains {x_0, x_1}. The boundary of the set D_R(x_0, x_1) consists of all small arcs of circles of radius R with endpoints {x_0, x_1}. Define also D°_R(x_0, x_1) = D_R(x_0, x_1) \ {x_0, x_1}.
A closed set A ⊂ R^d is called proximally smooth (or prox-regular, weakly convex) with constant R if the distance function ϱ_A(x) is continuously Frechet differentiable on the set U_A(R) = {x ∈ R^d | 0 < ϱ_A(x) < R}.
The properties of such sets were considered by different authors [27, 28, 30, 31].
For a proximally smooth set A and a point x ∈ A we denote by N(A, x) the cone of proximal normals

N(A, x) = {n ∈ R^d | ∃t = t(n) > 0: P_A(x + tn) = x}.

For a proximally smooth set in R^d all tangent cones and consequently all normal cones coincide, see [32, Section 6], [33, Corollary 2.2].

Proposition 1 Let A ⊂ R^d be a closed set. The following properties are equivalent.

(i) The set A is proximally smooth with constant R > 0.
(ii) For any points x_0, x_1 ∈ A with 0 < ‖x_0 − x_1‖ < 2R we have D°_R(x_0, x_1) ∩ A ≠ ∅ [27].
(iii) The supporting principle holds [27, 28]: for any point x ∈ ∂A and any unit normal n(x) ∈ N(A, x) we have

A ∩ int B_R(x + Rn(x)) = ∅. (2)

(iv) For any r ∈ (0, R) and x_0, x_1 ∈ U_A(r), a_i = P_A x_i, i = 0, 1, we have [28]

‖a_0 − a_1‖ ≤ (R/(R − r))‖x_0 − x_1‖. (3)

We shall say that x ∈ A is a stationary point of the function f on the set A ⊂ R^d, if there exists some t > 0 with

P_A(x − tf'(x)) = x.

The last equality is equivalent to the inclusion f'(x) ∈ −N(A, x).
For a differentiable mapping ϕ : R^m → R^d, ϕ(u) = (ϕ_1(u), ..., ϕ_d(u)), we define the Jacobi matrix

ϕ'(u) = (∂ϕ_i(u)/∂u_j), 1 ≤ i ≤ d, 1 ≤ j ≤ m,

i.e. the d × m matrix whose i-th row is the gradient of ϕ_i.
We shall say that the Lezanski-Polyak-Lojasiewicz condition [6–8] (simply the LPL condition) takes place for a bounded from below function f : R^d ⊃ U → R, if there exist numbers θ ∈ (1, 2] and μ > 0 with ‖f'(x)‖^θ ≥ μ(f(x) − f_0) for all x ∈ U. Here f_0 = inf_{y∈U} f(y). In the pioneering work [9] Lojasiewicz proved his famous inequality for a real-analytic function f : R^d → R in the following form: for any compact set K ⊂ R^d there exist α > 0 and C > 0 with

ϱ(x, {y ∈ R^d | f(y) = 0}) ≤ C|f(x)|^α ∀x ∈ K. (4)

Note that the power α depends on K and usually the value of α is unpredictable. In the sequel this result was generalized in the form of the LPL condition. The LPL condition was independently formulated by Lezanski and Polyak (with θ = 2). Further we shall assume that θ = 2 in the LPL condition. It is the exponent θ = 2 which guarantees a linear rate of convergence for gradient algorithms.
Let I_k ∈ R^{k×k} be the identity matrix.
Denote by S_{n,k} the real Stiefel manifold with natural parameters n ≥ k, i.e.

S_{n,k} = {X ∈ R^{n×k} | XᵀX = I_k}.

The Frobenius norm of a matrix X ∈ R^{n×k} is ‖X‖ = √(tr(XᵀX)).

2 Minimization of a Smooth Function on a Smooth Manifold

We denote further by S ⊂ R^d a smooth, i.e. C¹, m-dimensional connected manifold without edge, 1 ≤ m ≤ d − 1. Further we shall assume that S is an embedded smooth manifold, i.e. for any point x ∈ S there exists a local chart (W, ψ): W is an open subset of S, x ∈ W, and ψ ∈ C¹(W), ψ : W → ψ(W) ⊂ R^m is a diffeomorphism. In particular this means that for any point x ∈ S we can find δ > 0, u ∈ R^m, B_δ(u) ⊂ R^m and a function ϕ ∈ C¹(int B_δ(u)) with ϕ(u) = x such that ϕ : int B_δ(u) → ϕ(int B_δ(u)) ⊂ S is a diffeomorphism. In fact, ϕ = ψ^{−1}. Hence rank ϕ'(v) = m for all v ∈ int B_δ(u).
Smoothness of the manifold S means that for any point x ∈ S there exist a tangent subspace T_x to S at the point x and a tangent plane x + T_x. It should be noted that T_x = {ϕ'(u)ℓ | ℓ ∈ R^m} has dimension m for any x ∈ S. Here the function ϕ and the point u are from the previous paragraph.
Recall that in the case S = {x ∈ R^d | g(x) = 0}, where g : R^d → R^m is a C¹ function satisfying the full rank condition rank g'(x) = m for all x ∈ S, we have T_x = {v ∈ R^d | g'(x)v = 0}. For a vector h ∈ R^d the metric projection P_{T_x}h onto the subspace T_x is given by the formula

P_{T_x}h = (I − g'(x)ᵀ(g'(x)g'(x)ᵀ)^{−1}g'(x)) h.

Definition 1 [21, 24, 35] Let f : R^d → R be a differentiable function which is bounded from below on a smooth manifold S and let β ∈ R. We shall say that the Lezanski-Polyak-Lojasiewicz (LPL) condition takes place for the function f on the manifold S, if there exists μ > 0 such that for any x ∈ S ∩ L_f(β)

‖P_{T_x}f'(x)‖² ≥ μ(f(x) − f_0),

where f_0 = inf_{y∈S} f(y).

Further, in the context of Definition 1, we shall write "for any x ∈ S" instead of "for any x ∈ S ∩ L_f(β)" for simplicity.
We want to point out that Definition 1 is a natural generalization of the LPL condition in the unconstrained case. Indeed, we may assume that in the unconstrained situation S = R^d and thus for any x ∈ S we get P_{T_x}f'(x) = f'(x).
Consider a function f : L_f(β) → R. Suppose that f is strongly convex with constant κ and Lipschitz continuous with constant L. Let S be a smooth manifold without edge which is proximally smooth with constant R, and let L/κ < R. Then f satisfies the LPL condition on the set S ∩ L_f(β) [24, Example 4].
A strongly convex function does not satisfy the LPL condition without the inequality L/μ < R. Consider the function f(x, y) = x² + (y − 3/4)² and the curve S = {(x, y) | y = 2x², x ∈ [0, 1/2]}. Then (1/2, 1/2) is the global minimum for min_S f = f(1/2, 1/2) = 5/16, f'(1/2, 1/2) = (1, −1/2), f'(x, 2x²) = 2(x, 2x² − 3/4). Note that the circle x² + (y − 3/4)² = 5/16 is tangent to S at the point (1/2, 1/2). The function f is strongly convex with constant μ = 2. The LPL condition does not hold because the vector f'(0, 0) = (0, −3/2) is perpendicular to the curve at the point (0, 0). We have R = 1/4, L = 3/2, μ = 2.

Lemma 1 Suppose that S is a smooth manifold without edge which is proximally smooth with constant R > 0, and f : R^d → R is a smooth function with constant L_1. Let Ω = Arg min_{x∈S} f(x) ≠ ∅, f_0 = f(Ω) and L = sup_{x∈Ω} ‖f'(x)‖ < +∞. Then there exists B = L/R + L_1/2 with

f(x) − f_0 ≤ B ϱ_Ω²(x) ∀x ∈ S ∩ U_Ω(R). (5)

Proof Suppose that x ∈ S ∩ U_Ω(R) and x' ∈ P_Ω x, ‖x − x'‖ < R. Let z = P_{x'+T_{x'}} x. Note that the angle ∠xzx' is π/2 and hence ‖x − z‖ ≤ ‖x − x'‖, ‖x' − z‖ ≤ ‖x − x'‖. By the L_1-Lipschitz condition for f' we have

f(x) − f(x') ≤ (f'(x'), x − z + z − x') + (L_1/2)‖x − x'‖².

The point x' is stationary and z − x' ∈ T_{x'}, thus (f'(x'), z − x') = 0.
Suppose that x ≠ z and p = (x − z)/‖x − z‖. We have p ∈ T_{x'}^⊥ = N(S, x'), ‖p‖ = 1, and by the supporting principle for proximally smooth sets, item (iii) of Proposition 1, S ∩ int B_R(x' + Rp) = ∅. We get

‖x − z‖ ≤ R − √(R² − ‖z − x'‖²) ≤ ‖z − x'‖²/R ≤ ‖x − x'‖²/R,

see Fig. 1. Hence

f(x) − f_0 = f(x) − f(x') ≤ ‖f'(x')‖ · ‖x − x'‖²/R + (L_1/2)‖x − x'‖² ≤ (L/R + L_1/2)‖x − x'‖² = B ϱ_Ω²(x).

Lemma 2 Suppose that S is a smooth manifold without edge, f : R^d → R is a Frechet differentiable function and f satisfies the LPL condition on the set S with constant μ > 0. Let Ω = Arg min_{x∈S} f(x) ≠ ∅ and f_0 = f(Ω). Then for any point x_* ∈ Ω there exist a constant A > 0 and a neighbourhood U(x_*) ⊂ S of the point x_* with

A ϱ_Ω²(x) ≤ f(x) − f_0 ∀x ∈ U(x_*). (6)

Fig. 1 ‖x − z‖ ≤ ‖x − a‖ = R − √(R² − ‖z − x'‖²)

Proof Fix x_* ∈ Ω. There exist δ > 0, u_* ∈ R^m and ϕ : R^m ⊃ int B_δ(u_*) → ϕ(int B_δ(u_*)) ⊂ S, ϕ ∈ C¹(int B_δ(u_*)), ϕ a diffeomorphism, with the properties ϕ(u_*) = x_* and

ϕ_0 = inf_{‖v‖=1, u∈int B_δ(u_*)} ‖ϕ'(u)v‖ > 0.

Let L_ϕ be the Lipschitz constant of ϕ on the set B_δ(u_*).
For a point x ∈ ϕ(int B_δ(u_*)) consider the corresponding point u ∈ int B_δ(u_*): ϕ(u) = x. Consider the function h(u) = f(ϕ(u)) for all u ∈ int B_δ(u_*). For u ∈ int B_δ(u_*) and x = ϕ(u) fix ℓ ∈ R^m, ‖ℓ‖ = 1, with the next property: the angle γ between f'(x) and ϕ'(u)ℓ equals the angle between f'(x) and T_x. Note that ℓ depends on u (or x = ϕ(u)). Then

‖h'(u)‖ = ‖f'(x)ϕ'(u)‖ ≥ (f'(x)ϕ'(u), ℓ) = (f'(x), ϕ'(u)ℓ) = ‖f'(x)‖ · ‖ϕ'(u)ℓ‖ cos γ ≥ ‖P_{T_x}f'(x)‖ ϕ_0.

Hence (1/ϕ_0)‖h'(u)‖ ≥ ‖P_{T_x}f'(x)‖ and

(1/ϕ_0²)‖h'(u)‖² ≥ μ(h(u) − f_0),

f_0 = f(x_*) = min_{x∈ϕ(int B_δ(u_*))} f(x) = min_{u∈int B_δ(u_*)} h(u). Thus h satisfies the LPL condition on the set int B_δ(u_*) with constant K = μϕ_0².
For a real number σ ∈ (0, δ) define

M = sup_{u∈B_σ(u_*)} (h(u) − f_0) < +∞.

Choose σ ∈ (0, δ) with the property

δ > σ + 2√(M/K). (7)

By the equality lim_{σ→+0} M = 0 we can find such σ.
Define U(x_*) = ϕ(int B_σ(u_*)).
Consider the function g(u) = √(h(u) − f_0) and the system

u'(t) = −g'(u), u(0) ∈ int B_σ(u_*).

The solution of the system is the flow orbit Γ starting at the point u(0) and flowing along the antigradient −g'(u).
By the inequality ‖g'(u)‖ = ‖h'(u)‖/(2√(h(u) − f_0)) ≥ √K/2 we have

√M ≥ g(u(0)) − g(u(T)) = −∫₀ᵀ (g'(u), u'(t)) dt = ∫₀ᵀ ‖g'(u)‖² dt ≥ (K/4)T,

hence after "time" T ≤ 4√M/K we get ϕ(u(T)) ∈ Ω. On the other hand,

√M ≥ g(u(0)) − g(u(T)) = ∫₀ᵀ ‖g'(u)‖² dt ≥ (√K/2)∫₀ᵀ ‖u'(t)‖ dt = |Γ|√K/2.

Thus the length |Γ| ≤ 2√(M/K) and by (7) Γ ⊂ int B_δ(u_*). We get (since g(u(T)) = 0)

g(u(0)) − g(u(T)) = g(u(0)) ≥ |Γ|√K/2 ≥ ‖u(0) − u(T)‖√K/2 ≥ ‖ϕ(u(0)) − ϕ(u(T))‖√K/(2L_ϕ) ≥ ϱ_Ω(ϕ(u(0)))√K/(2L_ϕ).

For any u(0) ∈ int B_σ(u_*) and hence for any x ∈ U(x_*) we have

f(x) − f_0 ≥ A ϱ_Ω²(x), A = μϕ_0²/(4L_ϕ²).

Note that the function itself is not necessarily quadratic in a neighbourhood of Ω. Indeed, the function (p, x) can satisfy the LPL condition, for example in the case p = (−1, 0, ..., 0)ᵀ ∈ R^d and S = {x ∈ R^d | ‖x‖ = 1, x_1 ≥ 1/2}.

Theorem 1 Suppose that S is a smooth manifold without edge which is proximally smooth with constant R > 0, and f : R^d → R is a smooth function with constant L_1. Let Ω = Arg min_{x∈S} f(x) ≠ ∅, f_0 = f(Ω) and L = sup_{x∈S∩U_Ω(R)} ‖f'(x)‖ < +∞. Assume that there exists A > 0 with

f(x) − f_0 ≥ A ϱ_Ω²(x) ∀x ∈ S ∩ U_Ω(R), (8)

f is convex and A > L/R.
Then there exists μ = (A − L/R)²/(L/R + L_1/2) such that for all x ∈ S ∩ U_Ω(R) we have

μ(f(x) − f_0) ≤ ‖P_{T_x}f'(x)‖².

Proof Suppose that x ∈ S ∩ U_Ω(R) and x' ∈ P_Ω x, ‖x − x'‖ < R. Let w = P_{x+T_x} x'. Note that the angle ∠x'wx is π/2 and hence ‖x' − w‖ ≤ ‖x − x'‖, ‖x − w‖ ≤ ‖x − x'‖.
Due to the convexity of f we have

f(x') ≥ f(x) + (f'(x), x' − x), (9)
f(x) − f(x') ≤ (f'(x), x − x').

Thus

(f'(x), x − w + w − x') ≤ ‖P_{T_x}f'(x)‖ · ‖x − w‖ + (f'(x), w − x') ≤ ‖P_{T_x}f'(x)‖ · ‖x − x'‖ + L‖w − x'‖.

Repeating the arguments from Lemma 1 we obtain that ‖w − x'‖ ≤ ‖x − x'‖²/R and

A‖x − x'‖² ≤ f(x) − f(x') ≤ ‖P_{T_x}f'(x)‖ · ‖x − x'‖ + (L/R)‖x − x'‖²,
C‖x − x'‖² ≤ ‖P_{T_x}f'(x)‖²

for C = (A − L/R)².
By Lemma 1, for B = L/R + L_1/2 (the constant from Lemma 1 does not exceed this B) we have

(1/B)(f(x) − f(x')) ≤ ‖x − x'‖² ≤ (1/C)‖P_{T_x}f'(x)‖²

and μ(f(x) − f_0) ≤ ‖P_{T_x}f'(x)‖² for all x ∈ S ∩ U_Ω(R), with μ = C/B.

Remark 1 Notice that we can demand concavity of the function f(x) − (L_1/2)‖x‖² instead of the Lipschitz property for f'. Indeed, concavity of the function f(x) − (L_1/2)‖x‖² is equivalent to concavity of the function f(x) − (L_1/2)‖x − x_0‖² for any x_0 and

f(x) − (L_1/2)‖x − x_0‖² ≤ f(x_0) + (f'(x_0), x − x_0) ∀x, x_0.

We need only the last estimate for proving Lemma 1 and Theorem 1.

Remark 2 Unfortunately, convexity of the function f is essential for the suggested proof of Theorem 1. If f is not convex (and all other assumptions are satisfied) then we should take

f(x') ≥ f(x) + (f'(x), x' − x) − (L_1/2)‖x' − x‖²

instead of (9). Hence

f(x) − f(x') ≤ (f'(x), x − x') + (L_1/2)‖x − x'‖²

and

A‖x − x'‖² ≤ f(x) − f_0 ≤ ‖P_{T_x}f'(x)‖ · ‖x − x'‖ + (L/R + L_1/2)‖x − x'‖².

By Lemma 1 f(x) − f_0 ≤ (L/R + L_1/2)‖x − x'‖² and the last estimate is trivial.

Remark 3 Note that in the case (f'(x), w − x') ≤ 0 (in the notation of Theorem 1) we can refine the estimate for μ in the LPL condition from Theorem 1. This situation may take place, for example, when S is the boundary of a convex set. In this case we get

(A²/(L/R + L_1/2))(f(x) − f_0) ≤ ‖P_{T_x}f'(x)‖² ∀x ∈ S ∩ U_Ω(R).

Consider an example: f(x) = (−e_1, x) and S = {x ∈ R^d | ‖x‖ = 1, x_1 ≥ 1/2}. We obtain in the notation of Theorem 1 that Ω = {e_1} and for all x ∈ S we have f(x) − f(e_1) = 1 − x_1, ϱ_Ω²(x) = 2(1 − x_1). Hence A = 1/2, L = 1, R = 1, L_1 = 0 and μ = A²/(L/R + L_1/2) = 1/4.
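This example is easy to verify numerically. A sketch (ours, not from the paper): on S we have T_x = {v | (x, v) = 0}, so P_{T_x}f'(x) = f'(x) − (f'(x), x)x, and sampling the cap confirms ‖P_{T_x}f'(x)‖² ≥ (1/4)(f(x) − f_0):

import numpy as np

rng = np.random.default_rng(0)
d, mu, f0 = 3, 0.25, -1.0                # f(x) = -x_1, f_0 = f(e_1) = -1
for _ in range(10000):
    x = rng.normal(size=d)
    x /= np.linalg.norm(x)               # random point on the unit sphere
    if x[0] < 0.5:                       # keep only the cap x_1 >= 1/2
        continue
    grad = np.zeros(d); grad[0] = -1.0   # f'(x) = -e_1
    pt = grad - (grad @ x) * x           # P_{T_x} f'(x)
    assert pt @ pt >= mu * (-x[0] - f0) - 1e-12

Indeed, ‖P_{T_x}f'(x)‖² = 1 − x_1² = (1 − x_1)(1 + x_1) ≥ (3/2)(1 − x_1) on the cap, which is even stronger than the bound with μ = 1/4.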

Definition 2 Let S be a smooth manifold, r_0 > 0. We shall say that the manifold S is r_0-uniform if there exists λ ∈ (0, 1) such that for any x_0 ∈ S and for any x ∈ x_0 + T_{x_0}, ‖x − x_0‖ < λr_0, the set (x + T_{x_0}^⊥) ∩ S ∩ B_{r_0}(x_0) is a singleton.

Definition 2 is technical; we need it to prove Lemma 5.

Lemma 3 Let R > 0. Suppose that g : R^d → R is a differentiable function and S = {x ∈ R^d | g(x) = 0} is a smooth oriented manifold of dimension d − 1 without edge. Suppose that for any x_0, x_1 ∈ S and (similarly oriented) n_i ∈ N(S, x_i), ‖n_i‖ = 1, i = 0, 1,

‖n_0 − n_1‖ ≤ (1/R)‖x_0 − x_1‖.

Then the set S is proximally smooth with constant R.

Proof If the set S is proximally smooth with any constant r < R, then S is also proximally smooth with constant R by the supporting principle for proximally smooth sets. Indeed, the last assertion follows from the equality

int B_R(x_0 + Rn_0) = ∪_{r<R} int B_r(x_0 + rn_0)

for any x_0 ∈ S, n_0 ∈ N(S, x_0), ‖n_0‖ = 1.
Suppose that S is not proximally smooth with some constant r < R. Then [27] there exist points x_0, x_1 ∈ S, 0 < ‖x_0 − x_1‖ < 2r, with

D°_r(x_0, x_1) ∩ S = ∅.

We may assume without loss of generality that D°_r(x_0, x_1) ⊂ {x | g(x) < 0}.
Put

N_i = N(D_r(x_0, x_1), x_i), i = 0, 1.

Then we have for any oriented (directed to the set {x | g(x) ≥ 0}) unit vectors n_i ∈ N(S, x_i), i = 0, 1,

n_i ∈ N_i ⊂ int N(D_R(x_0, x_1), x_i).

The last means that R‖n_0 − n_1‖ > ‖x_0 − x_1‖. A contradiction.

Consider S = {x ∈ R^d | g(x) = 0}, where g : R^d → R is a smooth function with constant L_1 and there exists m > 0 such that ‖g'(x)‖ ≥ m for all x ∈ S. Then S gives an example of an R-proximally smooth set from Lemma 3. Moreover,

R = m/L_1, (10)

see [27].
We shall call a manifold of n − 1 dimensions in R^n a surface.

Lemma 4 Let g_i : R^d → R, 1 ≤ i ≤ m < d, be differentiable functions such that for each i the set S_i = {x ∈ R^d | g_i(x) = 0}, 1 ≤ i ≤ m, has a Lipschitz continuous oriented unit normal with Lipschitz constant R^{−1}. Suppose that the set S = ∩_{i=1}^m S_i is nonempty and for any point x ∈ S the unit normals to the sets S_i at the point x are linearly independent. Let S_1 be a compact surface without edge and assume that S_i, 2 ≤ i ≤ m, are surfaces without edge.
Then the set S is proximally smooth. The constant of proximal smoothness can be estimated from below by formula (12).

Proof By Lemma 3 the sets S_i are proximally smooth with constant R.
Suppose that we have proved proximal smoothness of the set

G_{l−1} = ∩_{i=1}^{l−1} S_i, l ≤ m,

with constant of proximal smoothness r = R_{l−1} ≤ R, and let G_l = G_{l−1} ∩ S_l.
Define

δ = δ_l = max_{x∈G_{l−1}, n_{l−1}∈N(G_{l−1},x), ‖n_{l−1}‖=1} ‖n_{l−1} ± n_l(x)‖, n_l(x) = g_l'(x)/‖g_l'(x)‖. (11)

We get δ ∈ (0, 2) because the normals from the subspace N(G_{l−1}, x) and n_l(x) are linearly independent and G_{l−1} is compact. In fact, in our situation δ ∈ [√2, 2).
Fix x ∈ G_l and n ∈ N(G_l, x), ‖n‖ = 1. Then

n ∈ cone{n_{l−1}, n_l},

where n_{l−1} ∈ N(G_{l−1}, x), ‖n_{l−1}‖ = 1, ‖n_l‖ = 1 and n_l ∈ N(S_l, x).
Put u = x + rn_{l−1}, v = x + Rn_l, w = P_{lin{u,v}} x and h = ‖x − w‖.
If w ∈ [u, v], then (as follows from elementary planimetry in the plane aff{x, u, v}) B_h(x + hn) ⊂ B_r(u) ∪ B_R(v). If w ∉ [u, v] (and hence w ∈ {u + λ(u − v) | λ > 0}) then B_r(x + rn) ⊂ B_r(u) ∪ B_R(v).
Denote by ϕ the angle between n_{l−1} and n_l. If ϕ ≥ π/2 then by the cosine theorem from the triangle 0n_{l−1}n_l we have

0 ≥ cos ϕ ≥ 1 − δ²/2

and sin ϕ ≥ √(δ² − δ⁴/4).
From the triangle xuv we get ‖u − v‖² = R² + r² − 2Rr cos ϕ ≤ (R − r)² + Rrδ²,

h = (Rr/‖u − v‖) sin ϕ ≥ Rr√(δ² − δ⁴/4)/√((R − r)² + Rrδ²) =: R_l. (12)

If ϕ < π/2 and w ∈ [u, v] then for θ = π − ϕ we have

0 < cos ϕ = −cos θ ≤ δ²/2 − 1,

‖u − v‖² ≤ R² + r² and

h = (Rr/‖u − v‖) sin ϕ ≥ Rr√(δ² − δ⁴/4)/√(R² + r²).

Thus in any case h is estimated from below by the value R_l (12) and R_l ≤ r.
So for any unit normal n ∈ N(G_l, x) the ball B_{R_l}(x + R_l n) is supporting to the set G_l at the point x ∈ G_l. Hence the set G_l is proximally smooth with constant R_l (12) for any l, 2 ≤ l ≤ m.

Lemma 4 is exact for the intersection of two spheres (Fig. 2).

Fig. 2 Lemma 4

Example 1 Let S = S_{n,k} be the real Stiefel manifold with n ≥ k, i.e. S = {X ∈ R^{n×k} | XᵀX = I_k}.
We will treat matrices X as elements of R^{nk}:

R^{n×k} ∋ X ⇔ x = (X_1ᵀ ... X_kᵀ) ∈ R^{nk},

where X_i is the i-th column of the matrix X.
Then we can consider S as the solution set of the system

g_{ij}(x) = (X_i, X_j) − δ_{ij} = 0, 1 ≤ i ≤ j ≤ k,

where δ_{ij} is the Kronecker delta.
It is easy to see that if i = j then, by (10), the set S_{ij} = {x ∈ R^{nk} | g_{ij}(x) = 0} is proximally smooth with constant R = 1, and if i < j then the set S_{ij} is proximally smooth with constant R = √2. Moreover, for any x ∈ S the normals {g'_{ij}(x)} are orthogonal, hence δ takes the value √2 in the notation of Lemma 4. Thus from Lemma 4

R_l ≥ RR_{l−1}/√(R² + R_{l−1}²) (13)

and the Stiefel manifold is proximally smooth with constant of proximal smoothness

R ≥ 2/√(k² + 3k).

At first we use (13) for the surfaces S_{ij} with i = j and R = 1 (k surfaces) and then we intersect with the S_{ij}, i < j, with R = √2 (½(k² − k) surfaces).
We want to emphasize that the result of Lemma 4 gives a lower bound for the constant of proximal smoothness R. In the case of the Stiefel manifold we can obtain the best possible value for the constant R.
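The recursion (13) is easy to evaluate; the following sketch (ours, not from the paper) checks the closed form 2/√(k² + 3k) against it:

import math

def stiefel_prox_bound(k):
    # lower bound on the proximal smoothness constant of S_{n,k} via (13)
    r = 1.0                               # first surface g_11 = 0 has R = 1
    for _ in range(k - 1):                # intersect the remaining k - 1 surfaces with R = 1
        r = r / math.sqrt(1.0 + r * r)
    R = math.sqrt(2.0)                    # surfaces g_ij, i < j, have R = sqrt(2)
    for _ in range(k * (k - 1) // 2):
        r = R * r / math.sqrt(R * R + r * r)
    return r

for k in (1, 2, 3, 5, 10):
    assert abs(stiefel_prox_bound(k) - 2.0 / math.sqrt(k**2 + 3*k)) < 1e-12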

Proposition 2 [25, Proposition 7] Let X_0 ∈ S_{n,k}. Then for any X ∈ R^{n×k} such that ‖X − X_0‖ < σ_k(X_0) in the Frobenius norm, the projection (in the sense of the Frobenius norm) of X onto S_{n,k} exists, is unique, and has the expression

P_{S_{n,k}}(X) = Σ_{i=1}^k U_i V_iᵀ = U I_{n,k} Vᵀ,

given by a singular value decomposition X = UΣVᵀ. Here σ_k(X_0) is the smallest singular value of X_0, U_i (V_i) is the i-th column of U (V) and I_{n,k} = [I_k 0]ᵀ ∈ R^{n×k}.

For some strange reason the authors of [25] did not pay attention to the fact that all singular values of a matrix from the Stiefel manifold equal 1. Hence σ_k(X_0) = 1 and for any matrix X with ϱ_{S_{n,k}}(X) < 1 (in the sense of the Frobenius norm) there is exactly one metric projection onto S_{n,k}. Thus the Stiefel manifold is a proximally smooth set with constant R ≥ 1. Consider the matrix X_0 = [e_1 e_2 ... e_k] ∈ S_{n,k}, where {e_i}_{i=1}^n ⊂ R^n is the standard orthonormal basis, and the normal P = [e_1 0 ... 0]. For any t ∈ (1, 3/2) we have P_{S_{n,k}}(X_0 − tP) = [−e_1 e_2 ... e_k]. Hence R = 1 for any S_{n,k}.
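In NumPy the projection of Proposition 2 is a single thin SVD; a sketch (ours), applicable whenever the smallest singular value of X is positive, i.e. whenever the projection is unique:

import numpy as np

def proj_stiefel(X):
    # P_{S_{n,k}}(X) = U V^T for the thin SVD X = U diag(s) V^T
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    assert s[-1] > 0.0, "metric projection is unique only for sigma_k(X) > 0"
    return U @ Vt

One can check the example above: for X_0 = [e_1 e_2 ... e_k] and t ∈ (1, 3/2), proj_stiefel applied to X_0 − tP flips the sign of the first column.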
For a smooth manifold S and x_0 ∈ S put N_1(x_0) = N_1(S, x_0) = {n ∈ T_{x_0}^⊥ | ‖n‖ ≤ 1}.

The Hausdorff distance between compact sets A, B ⊂ R^d is

h(A, B) = max{ max_{b∈B} ϱ(b, A), max_{a∈A} ϱ(a, B) }.

We want to recall the following result about smooth and proximally smooth manifolds.

Proposition 3 [26, Theorem 1.19.2], [31, Theorem 3.1 (k), (l)] Let S be a C¹-smooth manifold in R^d without edge. Assume that S is a proximally smooth set with constant R. Then S satisfies the property: for any x_0, x_1 ∈ S, ‖x_0 − x_1‖ < 2R,

h(N_1(x_0), N_1(x_1)) ≤ (1/R)ℓ_S(x_0, x_1),

where ℓ_S(x_0, x_1) is the length of the geodesic curve Γ ⊂ S with endpoints x_0, x_1. Such a curve exists; moreover,

ℓ_S(x_0, x_1) ≤ 2R arcsin(‖x_0 − x_1‖/(2R))

and thus

h(N_1(x_0), N_1(x_1)) ≤ 2 arcsin(‖x_0 − x_1‖/(2R)) ≤ (π/(2R))‖x_0 − x_1‖. (14)
Lemma 5 Let S be a manifold of dimension m without edge which is smooth and proximally smooth with constant (π/2)R. Then the set S is r_0-uniform with r_0 = R and λ = √3/2.

Proof Suppose that r > 0 is a constant such that the set S is proximally smooth with constant r and h(N_1(x_0), N_1(x_1)) ≤ ‖x_0 − x_1‖/r for all x_0, x_1 ∈ S. We can take r = R, see (14).
Fix a point x_0 ∈ S and x_1 ∈ S, ‖x_0 − x_1‖ < r. Then

h(N_1(x_0), N_1(x_1)) ≤ ‖x_0 − x_1‖/r < 1 (15)

and hence the angle between T_{x_0} and T_{x_1} (and between T_{x_0}^⊥ and T_{x_1}^⊥) is less than π/2. Thus for any x_1 ∈ S, ‖x_1 − x_0‖ < r, T_{x_1} is not perpendicular to T_{x_0}.
Define l(x) = r − √(r² − ‖x − x_0‖²) and B_{x_0} = {x | x ∈ x_0 + T_{x_0}, ‖x_0 − x‖ < (√3/2)r}, see Fig. 3.

Fig. 3 The set Σ (grey colour)

From the supporting principle for proximally smooth sets and elementary planimetry any point y ∈ (x + T_{x_0}^⊥) ∩ S ∩ B_r(x_0), where x ∈ B_{x_0}, belongs to the set

Σ = ∪_{x∈B_{x_0}} ∪_{n∈T_{x_0}^⊥, ‖n‖=1} {x + tn | t ∈ [0, l(x)]}

and vice versa: any point y ∈ S ∩ Σ belongs to the intersection (x + T_{x_0}^⊥) ∩ S ∩ B_r(x_0) for some x ∈ B_{x_0}. It is sufficient to consider the affine hull of the points {x_0, x, y} for the proof.
Suppose that x ∈ x_0 + T_{x_0}, ‖x_0 − x‖ < (√3/2)r and (x + T_{x_0}^⊥) ∩ S ∩ B_r(x_0) = ∅. Consider x(t) = (1 − t)x_0 + tx, t ∈ [0, 1]. Let F(t) = (x(t) + T_{x_0}^⊥) ∩ S ∩ B_r(x_0). Put

t_0 = inf{t ∈ [0, 1] | F(t) = ∅}.

By the inverse function theorem t_0 ∈ (0, 1) and (due to compactness) F(t_0) ≠ ∅. Let y ∈ F(t_0) and y_0 = P_{x_0+T_{x_0}} y.
If T_y is not perpendicular to T_{x_0}, then (by the inverse function theorem and C¹-smoothness of S) there exist small neighborhoods U(y) ⊂ S and V(y_0) ⊂ x_0 + T_{x_0} such that the metric projection gives a bijection P_{x_0+T_{x_0}} U(y) = V(y_0). This contradicts the definition of t_0 as an infimum. Hence T_y is perpendicular to T_{x_0}. The last assertion contradicts (15).
Suppose that x ∈ x_0 + T_{x_0}, ‖x_0 − x‖ < (√3/2)r and {x_1, x_2} ⊂ (x + T_{x_0}^⊥) ∩ S ∩ B_r(x_0). Then by the supporting principle for proximally smooth sets ‖x − x_i‖ ≤ r − √(r² − ‖x_0 − x‖²) < (1/2)r, i = 1, 2, ‖x_1 − x_2‖ ≤ ‖x − x_1‖ + ‖x − x_2‖ < r and

h(N_1(x_0), N_1(x_1)) ≤ ‖x_0 − x_1‖/r ≤ √(‖x − x_0‖² + ‖x − x_1‖²)/r < 1.

Note that n_0 = (x_2 − x_1)/‖x_2 − x_1‖ ∈ N_1(x_0). Then there exists a unit vector n_1 ∈ N_1(x_1) with ‖n_0 − n_1‖ < 1. The cone with vertex x_1, axis lin{x_1, x_1 + n_1}, generatrix of length r and with the angle between generatrix and axis equal to π/3, is contained in B_r(x_1 + rn_1). Hence the angle between n_0 and n_1 is strictly less than π/3 and the supporting ball

int B_r(x_1 + rn_1)

contains the point x_2. The contradiction shows that the intersection (x + T_{x_0}^⊥) ∩ S ∩ B_r(x_0) is a singleton.

Theorem 2 (One step of the GPA) Let S be a smooth and proximally smooth with constant (π/2)R manifold without edge, x_0 ∈ S, t > 0. Suppose that f : R^d → R is a function with the following properties:

1) f is smooth with constant L_1,
2) L = sup_{x∈S∪U_S(R)} ‖f'(x)‖ < +∞.

Assume that t‖f'(x_0)‖ ≤ (√3/2)R. Let z = P_{x_0+T_{x_0}}(x_0 − tf'(x_0)) = x_0 − tP_{T_{x_0}}f'(x_0) and x_1 = S ∩ (z + T_{x_0}^⊥) ∩ B_R(x_0). Then

f(x_1) − f(x_0) ≤ −‖P_{T_{x_0}}f'(x_0)‖² (t − t²(L/R + L_1/2)). (16)

Note that the point x_1 exists and is unique by Lemma 5.



Proof By the definition z = x_0 − tP_{T_{x_0}}f'(x_0) we get

‖x_0 − z‖ = t‖P_{T_{x_0}}f'(x_0)‖,
(f'(x_0), z − x_0) = (f'(x_0), −tP_{T_{x_0}}f'(x_0)) = −t‖P_{T_{x_0}}f'(x_0)‖².

By the estimate

‖x_0 − z‖² = t²‖P_{T_{x_0}}f'(x_0)‖² < (3/4)R²

and Lemma 5 we get S ∩ (z + T_{x_0}^⊥) ∩ B_R(x_0) = {x_1}.
Consider the inequality

f(x_1) − f(x_0) ≤ f(x_1) − f(z) + f(z) − f(x_0). (17)

We have

f(z) − f(x_0) ≤ (f'(x_0), z − x_0) + (L_1/2)‖x_0 − z‖² ≤ −‖P_{T_{x_0}}f'(x_0)‖² (t − (L_1/2)t²). (18)

Let n ∈ N(S, x_0), ‖n‖ = 1, be such that n = (x_1 − z)/‖x_1 − z‖ (in the case x_1 = z we have f(x_1) − f(z) = 0). From proximal smoothness of the set S we have S ∩ int B_R(x_0 + Rn) = ∅. Let a be the point of the intersection ∂B_R(x_0 + Rn) ∩ {z + λ(x_1 − z) | λ ∈ R} nearest to the point z.
From the section in the affine hull of the points {x_0, z, x_1} we obtain that

‖z − a‖ = R − √(R² − ‖x_0 − z‖²) ≤ ‖x_0 − z‖²/R. (19)

Hence

‖x_1 − z‖ ≤ ‖z − a‖ ≤ (t²/R)‖P_{T_{x_0}}f'(x_0)‖², (20)

f(x_1) − f(z) ≤ L‖x_1 − z‖ ≤ (t²L/R)‖P_{T_{x_0}}f'(x_0)‖². (21)

From Formulae (17), (18) and (21) we obtain (16).

Put

q(t) = t − t²(L/R + L_1/2). (22)

We have

max_{t>0} q(t) = q(t_0), where t_0 = 1/(2L/R + L_1).

Note that t_0 < √3R/(2L) and thus the inequality t‖f'(x)‖ ≤ (√3/2)R holds for all x ∈ S and all t ∈ (0, t_0].
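For reference, a tiny helper (an illustrative sketch, ours) computing q(t) and the optimal step-size t_0 from the constants L, L_1, R:

def q(t, L, L1, R):
    # per-step decrease coefficient q(t) = t - t**2 * (L/R + L1/2), see (22)
    return t - t * t * (L / R + L1 / 2.0)

def t0(L, L1, R):
    # maximizer of q: t_0 = 1/(2L/R + L1)
    return 1.0 / (2.0 * L / R + L1)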
Theorem 3 Let S be a smooth and proximally smooth with constant (π/2)R manifold without edge, x_0 ∈ S.
Suppose that f : R^d → R is a function with the following properties:

1) f is smooth with constant L_1,
2) L = sup_{x∈S∪U_S(R)} ‖f'(x)‖ < +∞,
3) the LPL condition takes place for the function f on the set S ∩ L_f(f(x_0)) with constant μ > 0.

Assume that t ∈ (0, t_0]. Put q = q(t), where q(t) is from (22).
Then the iteration process x_0 ∈ S,

z_k = x_k − tP_{T_{x_k}}f'(x_k), x_{k+1} = S ∩ (z_k + T_{x_k}^⊥) ∩ B_R(x_k),

converges with linear rate (25) to a solution of the problem min_{x∈S} f(x).

Proof From Theorem 2 and by the definition of t we obtain that

f(x_{k+1}) − f(x_k) ≤ −‖P_{T_{x_k}}f'(x_k)‖² · q. (23)

Define ϕ(x) = f(x) − f_0, where f_0 = inf_S f. Then

ϕ(x_{k+1}) − ϕ(x_k) ≤ −μqϕ(x_k),
ϕ(x_{k+1}) ≤ (1 − μq)ϕ(x_k). (24)

Note that inequality (24) means that μq(t) ∈ (0, 1] for any t ∈ (0, t_0].
By Formula (20) of Theorem 2 and the Pythagoras theorem

‖x_{k+1} − x_k‖² = ‖x_k − z_k‖² + ‖z_k − x_{k+1}‖² ≤ ‖P_{T_{x_k}}f'(x_k)‖² (t² + t⁴L²/R²).

By Formula (23)

‖P_{T_{x_k}}f'(x_k)‖² · q ≤ f(x_k) − f(x_{k+1}) ≤ ϕ(x_k)

and hence we obtain (by (24)) that

‖x_{k+1} − x_k‖² ≤ ((t² + t⁴L²/R²)/q)ϕ(x_k) ≤ ((t² + t⁴L²/R²)/q(t))(1 − μq(t))^k ϕ(x_0), (25)

where q(t) is from (22).

Remark 4 Notice that we can demand concavity of the function f(x) − (L_1/2)‖x‖² instead of the Lipschitz property for f' in Theorems 2 and 3.

Corollary 1 Under the assumptions of Theorem 3 we have the following inequality:

μq(t_0) = μ/(4L/R + 2L_1) ≤ 1. (26)

Proof The proof follows from Formula (24) if we take t = t_0 = 1/(2L/R + L_1).
We want to note that in the algorithm from Theorem 3 we should first find the metric projection

z_k = x_k − tP_{T_{x_k}}f'(x_k)

of the point x_k − tf'(x_k) onto the plane x_k + T_{x_k}. In fact we should find a basis {e_i}_{i=1}^m (where m = dim T_{x_k} = dim S) in the subspace T_{x_k} and numbers {λ_i}_{i=1}^m (depending on x_k) such that

x_k − tf'(x_k) − (x_k + Σ_{i=1}^m λ_i e_i) ⊥ T_{x_k}.

Then z_k = x_k + Σ_{i=1}^m λ_i e_i.
The second step is to find the point

x_{k+1} = S ∩ (z_k + T_{x_k}^⊥) ∩ B_R(x_k).

By Formula (20)

‖x_{k+1} − z_k‖ ≤ (t²/R)‖P_{T_{x_k}}f'(x_k)‖².

Hence we should find a (unique) common point with S of the ball in the affine subspace z_k + T_{x_k}^⊥ with center z_k and radius (t²/R)‖P_{T_{x_k}}f'(x_k)‖². In the case when S is given by the system g_i(x) = 0, i = 1, ..., d − m, it can be done by solving the system

g_i(x) = 0, (e_j, x − z_k) = 0, 1 ≤ i ≤ d − m, 1 ≤ j ≤ m,

with ‖x − z_k‖ ≤ (t²/R)‖P_{T_{x_k}}f'(x_k)‖².
In the case d − m = 1 the point x_{k+1} can be easily found. Put p_k = g'(x_k)/‖g'(x_k)‖ and λ_k = (t²/R)‖P_{T_{x_k}}f'(x_k)‖². Then we can find x_{k+1} by bisection of the segment [z_k − λ_k p_k, z_k + λ_k p_k] [24].
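A sketch of this codimension-one retraction (ours; it assumes, as the estimate above guarantees for admissible step-sizes, that g changes sign on the segment):

import numpy as np

def retract_bisection(g, z, p, lam, tol=1e-12):
    # find x on [z - lam*p, z + lam*p] with g(x) = 0 by bisection
    a, b = z - lam * p, z + lam * p
    ga = g(a)
    if ga * g(b) > 0:
        raise ValueError("g must change sign on the segment")
    while np.linalg.norm(b - a) > tol:
        c = 0.5 * (a + b)
        if ga * g(c) <= 0:
            b = c                        # root lies in [a, c]
        else:
            a, ga = c, g(c)              # root lies in [c, b]
    return 0.5 * (a + b)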
We want to note that the retraction

x_{k+1} = S ∩ (z_k + T_{x_k}^⊥) ∩ B_R(x_k)

is not the only possible option. It is more a theoretical fact than a guide to computing. Actually, we can choose x_{k+1} ∈ S arbitrarily, but with the property that there exists C > 0 with ‖x_{k+1} − z_k‖ ≤ Ct²‖P_{T_{x_k}}f'(x_k)‖² for all k. Then we have

f(x_{k+1}) − f(x_k) ≤ −‖P_{T_{x_k}}f'(x_k)‖² (t − (L_1/2)t² − CLt²)

and all results of Theorems 2 and 3 remain valid; we only have to replace 1/R by C and take t ∈ (0, √3R/(2L)], t ≤ 1/(2CL + L_1).
Sometimes we can choose x̄_{k+1} = P_S z_k (e.g. for the Stiefel manifold, see Proposition 2). In comparison with the choice x_{k+1} = S ∩ (z_k + T_{x_k}^⊥) ∩ B_R(x_k) we have ‖x̄_{k+1} − z_k‖ ≤ ‖x_{k+1} − z_k‖. Hence all results of Theorems 2 and 3 remain the same.

Corollary 2 Suppose that the conditions of Theorem 3 hold. Then the GPA in the form x_0 ∈ S,

z_k = x_k − tP_{T_{x_k}}f'(x_k), x_{k+1} = P_S z_k,

converges with a linear rate (25) to a solution of the problem min_{x∈S} f(x).
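As an illustration (ours, not from the paper) of the scheme of Corollary 2, take S the unit sphere in R^d, where P_S z = z/‖z‖ and P_{T_x}h = h − (h, x)x, and minimize the quadratic f(x) = (Ax, x)/2; the iterations then converge (generically) to a unit eigenvector for the smallest eigenvalue of A:

import numpy as np

rng = np.random.default_rng(1)
d = 5
M = rng.normal(size=(d, d))
A = M + M.T                              # symmetric matrix, f(x) = (Ax, x)/2
x = rng.normal(size=d)
x /= np.linalg.norm(x)                   # x_0 on the unit sphere S
t = 0.01                                 # a small fixed step-size t in (0, t_0]
for k in range(5000):
    g = A @ x                            # f'(x) = Ax
    pg = g - (g @ x) * x                 # P_{T_x} f'(x) on the sphere
    z = x - t * pg                       # z_k = x_k - t P_{T_{x_k}} f'(x_k)
    x = z / np.linalg.norm(z)            # retraction x_{k+1} = P_S z_k
print(x @ A @ x, np.linalg.eigvalsh(A)[0])   # Rayleigh quotient vs lambda_min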

3 Discussions

In most papers about the GPA there is no proof of the rate of convergence of the GPA in the nonconvex case, or the proof is implicit. The latter means that such a proof often says that the sequence {x_k} generated by the GPA converges to a solution x_* with some rate, for example with a linear rate

∃C > 0 ∃k_0 ∈ N ∃q ∈ (0, 1): ‖x_k − x_*‖ ≤ Cq^k ∀k > k_0.

But there is no evaluation of the common ratio q or the constants C, k_0 in terms of known constants.
We shall compare the result [21, Theorem 2.3, Corollary 2.11] with Theorems 2 and 3. In comparison with [21] we have more information about the manifold S in Theorems 2 and 3: S is proximally smooth with constant R. This allows us to obtain an explicit estimate (25) for the rate of convergence of the GPA through the Lipschitz constants of the functions f, f', the constant of proximal smoothness R and the constant μ from the LPL condition. We also use a fixed step-size t ∈ (0, t_0], t_0 = 1/(2L/R + L_1), in all steps of the algorithm. In [21, Theorem 2.3] we are forced to take a complicated Armijo-related step-size α_n to satisfy condition (A3), see [21, Corollary 2.9, Theorem 2.10]. But even this does not lead to an explicit estimate for the constant in (A3); the constant in (A3) is generally unknown, see [21, Theorem 2.10]. Note that without condition (A3) [21, Theorem 2.3] guarantees only convergence of the GPA without any estimate for the rate of convergence. Under conditions (L) and (A1-A3) from [21], Theorem 2.3 and Corollary 2.11 give a linear asymptotic behaviour for the rate of convergence in problem (1) with a smooth function f and a smooth manifold A, but not an explicit upper bound. We also want to notice that in the paper [38] the authors assumed a uniform bound on the second-order terms in the Taylor expansion of the metric projection, i.e. a curvature bound for the manifold and in fact proximal smoothness. Nevertheless, the result [38, Corollary 4.2] is in fact the same as [21, Theorem 2.3], but with a fixed step-size t.
Even in the case of a quadratic function on the Stiefel manifold [36] the above-mentioned difficulties from [21] cannot be overcome. Theorem 3 works well in this case.
Another peculiarity of our work is the retraction scheme x_{k+1} = S ∩ (z_k + T_{x_k}^⊥) ∩ B_R(x_k). Moreover, ‖x_{k+1} − z_k‖ ≤ (t²/R)‖P_{T_{x_k}}f'(x_k)‖². This result is crucial for the estimate of the error of the algorithm and it takes place due to the proximal smoothness of the set S with constant R. On the basis of this retraction we also proved that the retraction x_{k+1} = P_S z_k leads to the same estimate for the error, see Corollary 2.

Acknowledgments The author is grateful to B. T. Polyak for useful comments. The author is grateful to the reviewers for numerous comments and suggestions.
The work was supported by the Russian Science Foundation (Project 16-11-10015).

References

1. Goldstein, A.A.: Convex programming in Hilbert space. Bull. Amer. Math. Soc. 70:5, 709–710 (1964)
2. Levitin, E.S., Polyak, B.T.: Constrained minimization methods. Zh. Vychisl. Mat. Mat. Fiz. 6:5, 787–823
(1966)
3. Polyak, B.T.: Introduction to Optimization. M., Science (1983)
4. Nesterov, Yu.: Introductory Lectures on Convex Optimization. A Basic Course. Springer (2004)
5. Karimi, H., Nutini, J., Schmidt, M.: Linear convergence of gradient and proximal-gradient methods under the Polyak-Lojasiewicz condition. In: Frasconi, P., Landwehr, N., Manco, G., Vreeken, J. (eds.) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2016. Lecture Notes in Computer Science, vol. 9851. Springer, Cham (2016)
6. Ležanski, T.: Über das Minimumproblem für Funktionale in Banachschen Räumen. Math. Ann. 152, 271–274 (1963)
7. Polyak, B.T.: Gradient methods for minimizing functionals. Zh. Vychisl. Mat. Mat. Fiz. 3:4, 643–653
(1963)
8. Lojasiewicz, S.: A topological property of real analytic subsets (in French). Coll. du CNRS, Les équations aux dérivées partielles 117, 87–89 (1963)
9. Lojasiewicz, S.: Sur le problème de la division. Studia Math. 18, 87–136 (1959)
10. Ioffe, A.D.: Metric regularity: a survey. arXiv:1505.07920v2 (2015)
11. Luo, Z.-Q.: New error bounds and their applications to convergence analysis of iterative algorithms.
Math. Program. Ser. B. V. 88, 341–355 (2000)
12. Balashov, M.V., Golubev, M.O.: About the Lipschitz property of the metric projection in the Hilbert space. J. Math. Anal. Appl. 394:2, 545–551 (2012)
13. Balashov, M.V.: Maximization of a function with Lipschitz continuous gradient. J. Math. Sci. 209:1,
12–18 (2015)

14. Drusvyatskiy, D., Lewis, A.: Error bounds, quadratic growth, and linear convergence of proximal
methods arXiv:1602.06661v2 (2016)
15. Edelman, A., Arias, T.A., Smith, S.T.: The geometry of algorithms with orthogonality constraints. SIAM
J. Matrix Anal. Appl. 20:2, 303–353 (1998)
16. Luenberger, D.G.: The gradient projection methods along geodesics. Manag. Sci. 18:11, 620–631 (1972)
17. Hager, W.W.: Minimizing a quadratic over a sphere. SIAM J. Optim. 12:1, 188–208 (2001)
18. Absil, P.-A., Mahony, R., Sepulchre, R.: Optimization algorithms on matrix manifolds. Princeton
University Press, Princeton and Oxford (2008)
19. da Cruz Neto, J.X., De Lima, L.L., Oliveira, P.R.: Geodesic algorithms on Riemannian manifolds. Balkan
J. Geom. Appl. 3:2, 89–100 (1998)
20. Udrişte, C.: Convex Functions and Optimization Methods on Riemannian Manifolds. Mathematics and
Its Applications series, vol. 297. Springer (1994)
21. Schneider, R., Uschmajew, A.: Convergence results for projected line-search methods on varieties of low-rank matrices via Lojasiewicz inequality. SIAM J. Optim. 25:1, 622–646 (2015)
22. Jain, P., Kar, P.: Non-convex Optimization for Machine Learning. Foundations and Trends in Machine Learning. Now Publishers (2017)
23. Barber, R.F., Ha, W.: Gradient descent with nonconvex constraints: local concavity determines conver-
gence, arXiv:1703.07755v3 (2017)
24. Balashov, M., Polyak, B., Tremba, A.: Gradient projection and conditional gradient methods for con-
strained nonconvex minimization. Numerical Functional Analysis and Optimization 41(7), 822–849
(2020)
25. Absil, P.-A., Malick, J.: Projection-like retraction on matrix manifolds. SIAM J. Optim. 22:1, 135–158
(2012)
26. Ivanov, G.E.: Weakly convex sets and functions, M., Fizmatlit. In Russian (2006)
27. Vial, J.-P.h.: Strong and weak convexity of sets and functions. Math. Oper. Res. 8:2, 231–259 (1983)
28. Clarke, F.H., Stern, R.J., Wolenski, P.R.: Proximal smoothness and the lower-C² property. J. Convex Anal. 2:1-2, 117–144 (1995)
29. Balashov, M.V.: The gradient projection algorithm for a proximally smooth set and a function with
lipschitz continuous gradient. Sbornik: Mathematics 211(4), 481–504 (2020)
30. Balashov, M.V., Ivanov, G.E.: Weakly convex and proximally smooth sets in Banach spaces. Izv. RAN.
Ser. Mat. 73:3, 23–66 (2009)
31. Goncharov, V.V., Ivanov, G.E.: Strong and Weak Convexity of Closed Sets in a Hilbert Space. In: Daras,
N., Rassias, T. (eds.) Operations Research, Engineering, and Cyber Security. Springer Optimization and
Its Applications, vol. 113, pp. 259–297. Springer, Cham (2017)
32. Bounkhel, M., Thibault, L.: On various notions of regularity of sets in nonsmooth analysis. Nonlin. Anal.
48, 223–246 (2002)
33. Poliquin, R.A., Rockafellar, R.T., Thibault, L.: Local differentiability of distance functions. Trans. Amer. Math. Soc. 353, 5231–5249 (2000)
34. Balashov, M.V.: About the gradient projection algorithm for a strongly convex function and a proximally
smooth set. J. Convex Anal. 24:2, 493–500 (2017)
35. Gao, B., Liu, X., Chen, X., Yuan, Y.-X.: On the Lojasiewicz Exponent of the Quadratic Sphere
Constrained Optimization Problem, arXiv:1611.08781v2
36. Liu, H., Wu, W., So, A.M.C.: Quadratic optimization with orthogonality constraints: explicit Lojasiewicz exponent and linear convergence of line-search methods. arXiv:1510.01025v1 (2015)
37. Zhang, J., Zhang, S.: A cubic regularized Newton's method over Riemannian manifolds. arXiv:1805.05565v1 (2018)
38. Merlet, B., Nguyen, T.N.: Convergence to equilibrium for discretizations of gradient-like flows on
Riemannian manifolds, vol. 26 (2013)

