
Stiefel_manifolds

August 18, 2020

Optimization on Stiefel Manifolds


http://noodle.med.yale.edu/~hdtag/notes/steifel_notes.pdf

1. Problem formulation

We wish to find a set of p orthonormal vectors in $\mathbb{R}^n$ which are optimal with respect to a real-valued function F:
$\min_{X \in \mathbb{R}^{n \times p}} F(X)$ s.t. $X^T X = I$

• Overview of the algorithm


1. The constraint set $X^T X = I$ is a submanifold of $\mathbb{R}^{n \times p}$ called the Stiefel manifold.
2. The algorithm works by finding the gradient of F in the tangent plane of the manifold at the current iterate $X^{[k]}$. A curve is found on the manifold that proceeds along the projected negative gradient, and a curvilinear search is made along the curve for the next iterate $X^{[k+1]}$.
3. The search curve is not a geodesic, but is instead constructed using a Cayley transformation (explained in section 4). This has the advantage that matrix exponentials are not required. The algorithm only requires inversion of a $2p \times 2p$ matrix, which makes it especially suitable for high-dimensional spaces when $p \ll n$.

2. Preliminaries
• Two subspaces A and B of a vector space V are called orthogonal complements if they are orthogonal and their direct sum is V.
• To show that two subspaces A and B are equal, it is sufficient to show that A is a subspace of B and that dim(A) = dim(B).
• To show that two subspaces A and B are orthogonal complements in V, it is sufficient to show that A is orthogonal to B and that dim(V) = dim(A) + dim(B).
• Representation: If $\langle \cdot, \cdot \rangle$ is an inner product defined on a finite-dimensional space V, then corresponding to any linear functional $L : V \to \mathbb{R}$ there is a $v \in V$ with the property that $\langle v, u \rangle = L(u)$ for all $u \in V$. The vector v is the representation of L. The representation of L depends on the inner product.
• An invertible linear map $L : V \to W$ between two vector spaces V, W is an isomorphism. An isomorphism between two spaces shows that the spaces are essentially identical as far as their vector space structure is concerned. An isomorphism between a space and itself is an automorphism.

• The Euclidean inner product for the vector space of matrices is $\langle A, B \rangle = \mathrm{tr}(A^T B)$.
• The subspaces consisting of all symmetric and all skew-symmetric matrices are orthogonal complements in the space of square matrices under this inner product.
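The last two points can be checked numerically. The following is a minimal NumPy sketch, not part of the original notes: it splits an arbitrary square matrix (a hypothetical random example) into its symmetric and skew-symmetric parts and evaluates their trace inner product, which should vanish.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))   # arbitrary square matrix (illustrative only)

S = 0.5 * (M + M.T)               # symmetric part
K = 0.5 * (M - M.T)               # skew-symmetric part

# Euclidean (trace) inner product <S, K> = tr(S^T K)
inner = np.trace(S.T @ K)

print("S + K reproduces M:", np.allclose(S + K, M))
print("<S, K> =", inner)
```

The dimensions also add up as the complement criterion requires: for 4 x 4 matrices, 10 (symmetric) + 6 (skew-symmetric) = 16.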

2.1 Manifold structure of $X^T X = I$


• The set $\{ X \in \mathbb{R}^{n \times p} \mid X^T X = I \}$ is a manifold.
• This manifold is the Stiefel manifold.
• The standard notation for the set of p-tuples of orthonormal vectors in $\mathbb{R}^n$ is $V_p(\mathbb{R}^n)$.
• It has dimension equal to $np - \frac{1}{2} p(p+1)$. Proof for the case p = n here. We will view this manifold as an embedded submanifold of $\mathbb{R}^{n \times p}$. This means that we identify tangent vectors to the manifold with $n \times p$ matrices.
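As a concrete illustration, a point on the manifold can be produced by orthonormalizing the columns of any full-rank $n \times p$ matrix. The sketch below is an assumption-laden example rather than anything from the original notes: the helper names `random_stiefel_point` and `stiefel_dimension` are hypothetical, and a reduced QR factorization is used as the orthonormalization.

```python
import numpy as np

def random_stiefel_point(n, p, seed=0):
    """Return an n x p matrix with orthonormal columns (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Reduced QR of a random n x p matrix: Q is n x p with Q^T Q = I.
    Q, _ = np.linalg.qr(rng.standard_normal((n, p)))
    return Q

def stiefel_dimension(n, p):
    """Dimension np - p(p+1)/2 of V_p(R^n)."""
    return n * p - p * (p + 1) // 2

X = random_stiefel_point(5, 2)
constraint_error = np.linalg.norm(X.T @ X - np.eye(2))

print("||X^T X - I|| =", constraint_error)
print("dim V_2(R^5) =", stiefel_dimension(5, 2))
```

For p = n the formula gives $n(n-1)/2$, the dimension of the orthogonal group.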

2.2 The tangent space

• Lemma 1. If $Z \in T_X V_p(\mathbb{R}^n)$, then Z (as an element of $\mathbb{R}^{n \times p}$) satisfies $Z^T X + X^T Z = 0$. That is, $Z^T X$ is a skew-symmetric $p \times p$ matrix.
• Proof. Let $Y(t)$ be a curve in $V_p(\mathbb{R}^n)$. Then $Y^T(t) Y(t) = I$. Differentiating this with respect to t gives $Y'^T(t) Y(t) + Y^T(t) Y'(t) = 0$. At t = 0, setting $Y(0) = X$ and $Y'(0) = Z$ gives the result.
• Now we wish to find representations of the tangent vectors.
• Recall that if $X \in V_p(\mathbb{R}^n)$, then its columns are orthonormal vectors in $\mathbb{R}^n$. For any such X, we can always find n − p additional vectors in $\mathbb{R}^n$ (by the Gram-Schmidt procedure, for example) which, when combined with the p columns of X, form an orthonormal basis of $\mathbb{R}^n$. Let $X_\perp$ denote the $n \times (n-p)$ matrix whose columns are these new n − p vectors, and consider the matrix $[X \; X_\perp]$ (the concatenation of the columns of X and $X_\perp$). This is an $n \times n$ orthogonal matrix.
• Lemma 2. Left multiplication by the matrix $[X \; X_\perp]$ is an automorphism of $\mathbb{R}^{n \times p}$.
• Proof: $[X \; X_\perp]$ is orthogonal, hence invertible. Multiplying an $n \times p$ matrix on the left by $[X \; X_\perp]$ gives an $n \times p$ matrix, and multiplying on the left by its inverse also gives an $n \times p$ matrix. Hence the map is an automorphism.
• Thus, any element $U \in \mathbb{R}^{n \times p}$ can be written as $U = [X \; X_\perp] C$, where C is an $n \times p$ matrix.
• Now we can represent elements of $\mathbb{R}^{n \times p}$ as $U = XA + X_\perp B$, where A is the top $p \times p$ block and B is the bottom $(n-p) \times p$ block of C.
• Lemma 3. A matrix $Z = XA + X_\perp B$ belongs to the tangent space $T_X V_p(\mathbb{R}^n)$ if and only if A is skew-symmetric.
• Proof:
• (=>) Since Z belongs to $\mathbb{R}^{n \times p}$, it has the representation above. Since Z also lies in the tangent space, Lemma 1 gives $Z^T X + X^T Z = 0$. Substituting $Z = XA + X_\perp B$ and using $X^T X = I$ and $X_\perp^T X = 0$ yields $A^T + A = 0$.
• (<=) The space S of matrices with this representation (A skew-symmetric) has dimension $\frac{1}{2} p(p-1) + (n-p)p = np - \frac{1}{2} p(p+1)$, the same as the tangent space. Also, from (=>) we know that the tangent space is a subspace of S. Hence the two are equal. qed.
• The elements of A above the principal diagonal and all elements of B are coordinates of the tangent vector Z.
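The representation in Lemma 3 can be sketched numerically as follows. This is an illustrative check, not code from the original notes: it assumes a complete QR factorization as the way to obtain $X_\perp$ (instead of Gram-Schmidt), and the sizes and random matrices are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 6, 3

# Complete QR gives an n x n orthogonal matrix; split it as [X, X_perp].
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
X, X_perp = Q[:, :p], Q[:, p:]

# Coordinates of a tangent vector: skew-symmetric A and arbitrary B.
M = rng.standard_normal((p, p))
A = M - M.T
B = rng.standard_normal((n - p, p))

Z = X @ A + X_perp @ B

# Lemma 1: Z^T X + X^T Z should vanish for a tangent vector.
residual = np.linalg.norm(Z.T @ X + X.T @ Z)
print("||Z^T X + X^T Z|| =", residual)

# The coordinates are recovered by projecting onto X and X_perp.
print("A recovered:", np.allclose(X.T @ Z, A))
print("B recovered:", np.allclose(X_perp.T @ Z, B))
```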

2.3 Euclidean and Canonical Inner Products for the Tangent Space

• The Stiefel manifold becomes a Riemannian manifold by introducing an inner product in its
tangent spaces.

• There are two natural inner products for tangent spaces of Stiefel manifolds: the Euclidean inner product and the canonical inner product.
• We define the Euclidean inner product of two elements $Z_1$ and $Z_2$ as $\langle Z_1, Z_2 \rangle_e = \mathrm{tr}(Z_1^T Z_2)$.
• From our representation $Z = XA + X_\perp B$ we know that $\langle Z, Z \rangle_e = \mathrm{tr}(A^T A) + \mathrm{tr}(B^T B) = \sum_{i>j} 2a^2(i,j) + \sum_{i,j} b^2(i,j)$, after some reformulation.
• Thus, the Euclidean metric $\langle Z, Z \rangle_e$ induced by the Euclidean inner product weights the independent elements of A (the 'A' coordinates) twice as much as the elements of B (the 'B' coordinates).
• The canonical inner product weights the coordinates equally.
• The idea is to find the 'A' matrix of the tangent vector Z and weight it by $\frac{1}{2}$ in the inner product.
• From this, we define the canonical inner product as $\langle Z_1, Z_2 \rangle_c = \mathrm{tr}\left( Z_1^T \left( I - \frac{1}{2} X X^T \right) Z_2 \right)$.
• Because of the above argument, we use exclusively the canonical inner product.
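The different weighting of the 'A' and 'B' coordinates can be seen in a small numerical example. The sketch below is illustrative only: the function names `euclidean_inner` and `canonical_inner` are hypothetical, and the test matrices are random.

```python
import numpy as np

def euclidean_inner(Z1, Z2):
    """<Z1, Z2>_e = tr(Z1^T Z2)."""
    return np.trace(Z1.T @ Z2)

def canonical_inner(X, Z1, Z2):
    """<Z1, Z2>_c = tr(Z1^T (I - 1/2 X X^T) Z2), without forming an n x n matrix."""
    return np.trace(Z1.T @ Z2) - 0.5 * np.trace((Z1.T @ X) @ (X.T @ Z2))

rng = np.random.default_rng(1)
n, p = 6, 3
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
X, X_perp = Q[:, :p], Q[:, p:]

M = rng.standard_normal((p, p))
A = M - M.T                               # skew-symmetric 'A' coordinates
B = rng.standard_normal((n - p, p))       # 'B' coordinates
Z = X @ A + X_perp @ B                    # tangent vector at X

a_part = np.trace(A.T @ A)                # = 2 * sum_{i>j} a(i,j)^2
b_part = np.trace(B.T @ B)                # = sum_{i,j} b(i,j)^2

e_val = euclidean_inner(Z, Z)
c_val = canonical_inner(X, Z, Z)
print("Euclidean:", e_val, "expected", a_part + b_part)
print("canonical:", c_val, "expected", 0.5 * a_part + b_part)
```

Under the canonical inner product each independent entry of A and each entry of B contributes its square exactly once.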

3. Differentials and Gradients

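• If F is a function from $\mathbb{R}^{n \times p}$ to $\mathbb{R}$, and $X, Z \in \mathbb{R}^{n \times p}$, then $DF_X : \mathbb{R}^{n \times p} \to \mathbb{R}$, called the differential of F, gives the derivative of F in the Z direction at X by $DF_X(Z) = \sum_{i,j} \frac{\partial F}{\partial X_{i,j}} Z_{i,j} = \mathrm{tr}(G^T Z)$, where $G = \left[ \frac{\partial F}{\partial X_{i,j}} \right] \in \mathbb{R}^{n \times p}$.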
• Because $DF_X$ is a linear functional on $\mathbb{R}^{n \times p}$, a representation of $DF_X$ in $\mathbb{R}^{n \times p}$ or in any of its subspaces is the gradient of F in $\mathbb{R}^{n \times p}$ or in the subspace (?). We can view the sum above as the Euclidean inner product of G and Z, and thus G is the representation of $DF_X$ under the Euclidean inner product.
• $DF_X$ is also a linear functional on the tangent space $T_X V_p(\mathbb{R}^n)$.
• $DF_X$ restricted to the tangent space has the following representation:
• Lemma 4. Under the canonical inner product, the vector $AX$ with $A = G X^T - X G^T$ represents the action of $DF_X$ on the tangent space $T_X V_p(\mathbb{R}^n)$.
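Lemma 4 can be checked numerically: for any tangent vector Z, the canonical inner product of $AX$ with Z should equal $\mathrm{tr}(G^T Z)$. The sketch below is not from the original notes. It assumes a hypothetical objective $F(X) = \frac{1}{2} \mathrm{tr}(X^T M X)$ with symmetric M, whose matrix of partial derivatives is $G = MX$.

```python
import numpy as np

def canonical_inner(X, Z1, Z2):
    """<Z1, Z2>_c = tr(Z1^T (I - 1/2 X X^T) Z2)."""
    return np.trace(Z1.T @ Z2) - 0.5 * np.trace((Z1.T @ X) @ (X.T @ Z2))

rng = np.random.default_rng(2)
n, p = 6, 2
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
X, X_perp = Q[:, :p], Q[:, p:]

# Hypothetical objective F(X) = 0.5 * tr(X^T M X), M symmetric, so G = M X.
S = rng.standard_normal((n, n))
M = S + S.T
G = M @ X

A = G @ X.T - X @ G.T        # skew-symmetric n x n matrix
grad_c = A @ X               # candidate gradient under the canonical metric

# A random tangent vector Z = X A_z + X_perp B_z with A_z skew-symmetric.
C = rng.standard_normal((p, p))
Z = X @ (C - C.T) + X_perp @ rng.standard_normal((n - p, p))

lhs = canonical_inner(X, grad_c, Z)   # <AX, Z>_c
rhs = np.trace(G.T @ Z)               # DF_X(Z)
tangency = np.linalg.norm(grad_c.T @ X + X.T @ grad_c)

print("<AX, Z>_c =", lhs, " tr(G^T Z) =", rhs)
print("AX is tangent, residual:", tangency)
```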

4. Cayley Transform and the Search Curve


Let W be any skew-symmetric matrix. Consider the curve $Y(\tau) = \left( I + \frac{\tau}{2} W \right)^{-1} \left( I - \frac{\tau}{2} W \right) X$. It has the following properties:

1. It stays in the Stiefel manifold, i.e. $Y(\tau)^T Y(\tau) = I$.
2. Its tangent vector at τ = 0 is $Y'(0) = -WX$.
3. If we set $W = A = G X^T - X G^T$, then the curve is a descent curve for F: its initial tangent $-AX$ is the direction of steepest descent, which is the negative gradient by Lemma 4.
• We can view $Y(\tau)$ as the point X transformed by $\left( I + \frac{\tau}{2} W \right)^{-1} \left( I - \frac{\tau}{2} W \right)$.
• This is called the Cayley transformation.
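The first two properties of the curve can be verified numerically. The following is a minimal sketch with a hypothetical helper `cayley_curve` and random test data; the tangent at τ = 0 is approximated by a central finite difference, which is an illustrative choice and not part of the original notes.

```python
import numpy as np

def cayley_curve(X, W, tau):
    """Y(tau) = (I + tau/2 W)^{-1} (I - tau/2 W) X, using a linear solve."""
    I = np.eye(X.shape[0])
    return np.linalg.solve(I + 0.5 * tau * W, (I - 0.5 * tau * W) @ X)

rng = np.random.default_rng(3)
n, p = 6, 2
X, _ = np.linalg.qr(rng.standard_normal((n, p)))

S = rng.standard_normal((n, n))
W = S - S.T                               # arbitrary skew-symmetric matrix

# Property 1: the curve stays on the manifold.
Y = cayley_curve(X, W, 0.8)
constraint_error = np.linalg.norm(Y.T @ Y - np.eye(p))

# Property 2: Y'(0) = -W X, checked with a central finite difference.
h = 1e-6
Y_prime = (cayley_curve(X, W, h) - cayley_curve(X, W, -h)) / (2 * h)
tangent_error = np.linalg.norm(Y_prime + W @ X)

print("||Y^T Y - I|| =", constraint_error)
print("||Y'(0) + W X|| =", tangent_error)
```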

Sketch of the algorithm: Generate $X^{[k+1]}$ from $X^{[k]}$ by a curvilinear search along $Y(\tau)$ by changing τ.
The search is carried out using the Armijo-Wolfe rule.
Since inverting the $n \times n$ matrix is costly, we use the Sherman-Morrison-Woodbury formula.
We get $Y(\tau) = X - \tau U \left( I + \frac{\tau}{2} V^T U \right)^{-1} V^T X$, where $U = [G, X]$ and $V = [X, -G]$ (the brackets denote column concatenation), so that $W = U V^T = G X^T - X G^T$.
This reduces the size of the matrix to be inverted from $n \times n$ to $2p \times 2p$.
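The equivalence of the two forms of $Y(\tau)$ can be checked directly. In the sketch below the function names `cayley_full` and `cayley_low_rank` are hypothetical, and G is a random matrix standing in for the matrix of partial derivatives.

```python
import numpy as np

def cayley_full(X, G, tau):
    """Y(tau) via the n x n system, with W = G X^T - X G^T."""
    n = X.shape[0]
    W = G @ X.T - X @ G.T
    I = np.eye(n)
    return np.linalg.solve(I + 0.5 * tau * W, (I - 0.5 * tau * W) @ X)

def cayley_low_rank(X, G, tau):
    """Y(tau) = X - tau U (I + tau/2 V^T U)^{-1} V^T X, a 2p x 2p system."""
    U = np.hstack([G, X])                 # n x 2p
    V = np.hstack([X, -G])                # n x 2p
    K = np.eye(U.shape[1]) + 0.5 * tau * (V.T @ U)
    return X - tau * U @ np.linalg.solve(K, V.T @ X)

rng = np.random.default_rng(4)
n, p = 50, 3
X, _ = np.linalg.qr(rng.standard_normal((n, p)))
G = rng.standard_normal((n, p))           # stand-in for the partial derivatives
tau = 0.7

Y_full = cayley_full(X, G, tau)
Y_low = cayley_low_rank(X, G, tau)
difference = np.linalg.norm(Y_full - Y_low)

print("difference between the two forms:", difference)
print("||Y^T Y - I|| =", np.linalg.norm(Y_low.T @ Y_low - np.eye(p)))
```

The low-rank form only solves a $2p \times 2p$ system, which is where the savings for $p \ll n$ come from.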

6. Curvilinear Search

Curvilinear search is just traditional line search applied to the curve $Y(\tau)$. The search terminates when the Armijo-Wolfe conditions are satisfied. The Armijo-Wolfe conditions require two parameters $0 < \rho_1 < \rho_2 < 1$.
Curvilinear Search: Initialize τ to a non-zero value.
Until { $F(Y(\tau)) \le F(Y(0)) + \rho$ …
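A curvilinear search of this kind can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure from the notes: it enforces only the Armijo (sufficient decrease) condition in its standard form, $F(Y(\tau)) \le F(Y(0)) + \rho_1 \tau F'(Y(0))$, by backtracking, and omits the curvature (Wolfe) condition with $\rho_2$. The function names, the shrink factor, and the test objective $F(X) = \frac{1}{2} \mathrm{tr}(X^T M X)$ are all hypothetical.

```python
import numpy as np

def cayley_step(X, G, tau):
    """Point Y(tau) on the Cayley curve, using the 2p x 2p form."""
    U = np.hstack([G, X])
    V = np.hstack([X, -G])
    K = np.eye(U.shape[1]) + 0.5 * tau * (V.T @ U)
    return X - tau * U @ np.linalg.solve(K, V.T @ X)

def armijo_curvilinear_search(F, grad_F, X, tau0=1.0, rho1=1e-4,
                              shrink=0.5, max_backtracks=50):
    """Backtrack along Y(tau) until the Armijo condition holds (assumed form)."""
    G = grad_F(X)
    A = G @ X.T - X @ G.T
    # Slope at tau = 0: tr(G^T Y'(0)) = -tr(G^T A X) = -0.5 * ||A||_F^2
    slope0 = -0.5 * np.sum(A * A)
    f0 = F(X)
    tau = tau0
    for _ in range(max_backtracks):
        Y = cayley_step(X, G, tau)
        if F(Y) <= f0 + rho1 * tau * slope0:
            return Y, tau
        tau *= shrink
    return X, 0.0                         # no acceptable step found

# Hypothetical test problem: F(X) = 0.5 * tr(X^T M X) with symmetric M.
rng = np.random.default_rng(5)
n, p = 8, 2
S = rng.standard_normal((n, n))
M = S + S.T
F = lambda X: 0.5 * np.trace(X.T @ M @ X)
grad_F = lambda X: M @ X

X0, _ = np.linalg.qr(rng.standard_normal((n, p)))
X = X0
for _ in range(100):
    X, tau = armijo_curvilinear_search(F, grad_F, X)

f_initial, f_final = F(X0), F(X)
constraint_error = np.linalg.norm(X.T @ X - np.eye(p))
print("F before:", f_initial, "F after:", f_final)
print("||X^T X - I|| =", constraint_error)
```

A full Armijo-Wolfe search would additionally test the derivative of F along the curve at the trial τ against $\rho_2$ times the initial slope before accepting the step.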

• Find constraints of the problem
• Prove that the feasible set is a manifold
• Find the tangent space of the manifold
• Find/derive an inner product for the tangent space
• Find a representation of the gradient and with this the descent curve
• Use curvilinear search, i.e. gradient descent
The Cayley transformation of a skew-symmetric matrix A is orthogonal:
• $Q = (I - A)(I + A)^{-1} = (I + A)^{-1}(I - A)$
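The orthogonality claim can be checked numerically, reading the second factor as the inverse $(I + A)^{-1}$, which is the standard form of the Cayley transform. This is a small illustrative sketch with a hypothetical random skew-symmetric A.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 5
S = rng.standard_normal((n, n))
A = S - S.T                                  # skew-symmetric
I = np.eye(n)

Q_right = (I - A) @ np.linalg.inv(I + A)     # (I - A)(I + A)^{-1}
Q_left = np.linalg.solve(I + A, I - A)       # (I + A)^{-1}(I - A)

orthogonality_error = np.linalg.norm(Q_left.T @ Q_left - I)
print("factors commute:", np.allclose(Q_right, Q_left))
print("||Q^T Q - I|| =", orthogonality_error)
```

The two factors commute because both are polynomials (or inverses of polynomials) in the same matrix A.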
