
On Sums of Singular Values

Sadhana Subramani

A thesis submitted for the degree of


Master of Science
in the School of Mathematics,
University of New South Wales, Australia

February, 1993
I hereby declare that this submission is my own work and that, to the
best of my knowledge and belief, it contains no material previously published
or written by another person nor material which to a substantial extent has
been accepted for the award of any other degree or diploma of a university
or other institute of higher learning, except where due acknowledgement is
made in the text.
Abstract

The sum of the $\kappa$ largest singular values of a real $m$ by $n$ matrix is a
convex function of the matrix elements. A max characterization for this sum
involving partial isometries is given in Horn and Johnson [16]. Using the
Frobenius inner product, a different partial isometry characterization is first
established in this thesis. This result is then generalized to establish another
max characterization for the singular value sum. Identification of the matrices
which attain the maximum leads to a concise expression for the subdifferential
in terms of a "dual" matrix. From this, a useful characterization of the
subdifferential of the convex composite function — the sum of the $\kappa$ largest
singular values of a smooth $m$ by $n$ matrix-valued function of a set of real
parameters — is obtained. The first-order optimality conditions at a point are
verifiable computationally and involve the computation of dual matrices. If
the conditions are not satisfied, then the dual matrix provides information
that leads to the generation of a descent direction from that point, splitting
a multiple singular value if necessary. The dimension of the dual matrix is
determined by the multiplicity of the $\kappa$th singular value. Finally, it is indicated
how the eigenvalue results of Overton and Womersley [24] are related
to results on singular values.
Contents

Abstract

Acknowledgements

1 Introduction

2 Preliminaries
2.1 Introduction
2.2 Properties of the SVD
2.3 Miscellania

3 Sums of the largest singular values
3.1 Introduction
3.2 A partial isometry characterization
3.3 Another max characterization

4 Differential properties
4.1 Introduction
4.2 The subdifferential of $f_\kappa(A)$
4.3 The generalized gradient of $f_\kappa(x)$
4.4 Necessary conditions
4.5 The directional derivative
4.6 Splitting multiple singular values

5 Conclusions
5.1 Introduction
5.2 Relation to $A^T A$
5.3 Further research

Bibliography
Acknowledgements

I am indebted to my supervisor Dr. R. S. Womersley for his very able


guidance throughout the course of my study. His constant encouragement
and enthusiasm were of great help and very much appreciated. His comments
and interest in my work ensured the successful completion of this thesis.

Some of the results in this thesis were presented at the 29th Applied
Mathematics Conference in Adelaide. Sincere thanks are extended to the
School of Mathematics for giving me the opportunity and financial assistance
to attend this conference. The excellent resources and facilities at the School
provided the required backup for my studies.

It is my pleasure to acknowledge my family for their support and patience,


and all my friends especially Xiaojun Chen, Pritha Das, Maria Natividad,
Andreas Stephens, Prabhakar Tripathi and Tinny Widjaja for providing the
necessary change from my studies. Special thanks are due to Emeritus Pro-
fessor V. T. Buchwald for his encouragement.

Finally I gratefully acknowledge the financial support of the Australian


Government which sponsored me under the Equity and Merit Scholarship
Scheme. It facilitated not only my research but also a cultural exchange
which I found very rewarding.

Chapter 1

Introduction

In this thesis, we study the properties of sums of the largest singular values
of real m by n matrices.

We introduce some notation first. Let $\mathbb{R}^{m \times n}$ denote the linear space of all
real $m$ by $n$ matrices and let $\mathcal{O}_{m,n}$ denote the set of all real $m$ by $n$ matrices
with orthonormal columns, where $m \ge n$. Thus, $Z^T Z = I$ for all $Z \in \mathcal{O}_{m,n}$.
We use $I$ to denote the identity matrix when the dimension is implicit in the
context and $I_n$ to denote the identity matrix of order $n$. Also, let $\mathcal{D}_n$ denote
the set of all real $n$ by $n$ diagonal matrices. The notation $\mathrm{diag}(\alpha_1, \ldots, \alpha_n)$
with all $\alpha_i \in \mathbb{R}$, or $\mathrm{diag}(\mu)$ where $\mu \in \mathbb{R}^n$, refers to a matrix in $\mathcal{D}_n$ with
diagonal entries $\alpha_1, \ldots, \alpha_n$ or $\mu_1, \ldots, \mu_n$ respectively.

If $A \in \mathbb{R}^{m \times n}$, then its singular value decomposition (SVD) is defined by

$$A = X \Sigma Y^T \qquad (1.1)$$

where $X \in \mathcal{O}_{m,m}$, $Y \in \mathcal{O}_{n,n}$ and, for $m \ge n$, $\Sigma \in \mathbb{R}^{m \times n}$ is of the form

$$\Sigma = \begin{bmatrix} \Sigma_1 \\ 0 \end{bmatrix}, \qquad \Sigma_1 \in \mathcal{D}_n, \quad 0 \in \mathbb{R}^{(m-n) \times n}. \qquad (1.2)$$

Proofs of the existence of this matrix decomposition can be found in many
places, see for example [15] or [19]. Equation (1.1) may be rewritten equivalently
as $X^T A Y = \Sigma$.

The diagonal elements of the matrix $\Sigma_1$ are denoted by $\sigma_1, \ldots, \sigma_n$ and ordered
as $\sigma_1 \ge \cdots \ge \sigma_n \ge 0$. The nonnegative real numbers $\sigma_i = \sigma_i(A)$, $i = 1, \ldots, n$,
are known as the singular values of $A$. The columns of $X$ are called the left
singular vectors of $A$ while the columns of $Y$ are called the right singular
vectors of $A$.

For $m \le n$, $\Sigma$ would be of the form $[\Sigma_1 \;\; 0]$ where $\Sigma_1 = \mathrm{diag}(\sigma_1, \ldots, \sigma_m)$ and
$0 \in \mathbb{R}^{m \times (n-m)}$. Throughout this thesis, we shall assume that $m \ge n$. For
complex $A$, $X$ and $Y$ are unitary matrices and $Y^T$ is replaced by $Y^*$, where
$Y^*$ represents the complex conjugate transpose of $Y$. It is easy to extend
the theory to the complex case. All our results will be given in terms of real
matrices. We choose to consider the case of real $A$ with $m \ge n$ as results are
easily adapted to the $n > m$ case, and because this is the one which is typical
of many applications.

The SVD can be easily obtained using sophisticated and widely accessible
numerical software. See the LINPACK User's Guide [4], the EISPACK Guide
Extension [12] or the more recent LAPACK User's Guide [1] for subroutines
to compute the SVD. Our emphasis is on applications rather than the computation.
Numerical examples in this thesis were generated by implementing
routines in MATLAB [21].

Singular values are a useful tool in many applications. For instance, they
play a key role in control system design where many important structural
properties, such as robustness and noise sensitivity, can be expressed as inequalities
involving the singular values of appropriate transfer function matrices
(see [6] and the references therein; also [26] and [28]).

The SVD figures prominently in schemes for reducing data based on approximating
a given matrix with one of lower rank. For example, a problem which
sometimes arises in image processing is that the amount of data which is
generated cannot be transferred reasonably, so that it becomes necessary to
reduce the data. In some cases, this can be achieved by using the singular
vectors corresponding to the largest singular values of the image matrix (see
[2] for examples of such data reduction schemes).

Further details on the SVD, including its properties, computation and numerous
other applications can be found in [13], [15] and [18].

The primary motivation for this study on properties of sums of the $\kappa$
largest singular values of real $m$ by $n$ matrices and matrix-valued functions
is the work of Overton and Womersley [24] on minimizing sums of the $\kappa$
largest eigenvalues of real symmetric matrices and matrix-valued functions.

The singular values can be characterized in terms of the eigenvalues of a
symmetric matrix. For example, for the case of $m \ge n$, the singular values
of the $m$ by $n$ matrix $A$ are the square roots of the eigenvalues of $A^T A$.
Alternatively, the eigenvalues of the block matrix

$$\begin{bmatrix} 0 & A \\ A^T & 0 \end{bmatrix}$$

are the singular values of $A$ and their negatives, with $|m - n|$ zero eigenvalues
if $m \ne n$. However, there are disadvantages to both these representations. For
example, if $A(x)$ is affine, then working with $A^T A$ can destroy the structure in
the matrix-valued function $A(x)$, while the second characterization increases
the dimension. Therefore, we work directly with the SVD of $A$ to obtain
information on the singular value problem.
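As an illustrative check of these relationships, the three routes to the singular values can be compared numerically. The thesis's own computations were done in MATLAB [21]; the sketch below is a NumPy substitute, not taken from the thesis.

    import numpy as np

    np.random.seed(0)
    m, n = 6, 4
    A = np.random.randn(m, n)

    # Route 1: singular values directly from the SVD of A.
    sigma = np.linalg.svd(A, compute_uv=False)

    # Route 2: square roots of the eigenvalues of A^T A (n by n, symmetric).
    sigma_ata = np.sqrt(np.sort(np.linalg.eigvalsh(A.T @ A))[::-1])

    # Route 3: the eigenvalues of the (m+n) by (m+n) block matrix [0 A; A^T 0]
    # are the singular values of A and their negatives, plus |m - n| zeros.
    B = np.block([[np.zeros((m, m)), A], [A.T, np.zeros((n, n))]])
    eigs = np.sort(np.linalg.eigvalsh(B))[::-1]

    print(np.allclose(sigma, sigma_ata))   # True
    print(np.allclose(sigma, eigs[:n]))    # True: largest n eigenvalues of B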

Let the singular values of $A \in \mathbb{R}^{m \times n}$ be ordered by

$$\sigma_1(A) \ge \cdots \ge \sigma_n(A) \ge 0$$

and define

$$f_\kappa(A) = \sum_{i=1}^{\kappa} \sigma_i(A)$$

where $\kappa \in \{1, \ldots, n\}$. Here, $f_1(A)$ denotes the largest singular value of $A$ and
$f_n(A)$ denotes the sum of all the singular values of $A$. It is well known that
the largest singular value is a convex function of the elements of the matrix
$A$. The fact that the sum of the $\kappa$ largest singular values of a real $m$ by $n$
matrix is a convex function of the matrix elements appears less well known,
but it is implied in the work by Horn and Johnson [16]. It is an immediate
consequence of the results obtained in this thesis.

Results on sums of singular values are of particular interest since using these
one may obtain estimates of the smallest and intermediate singular values.
For example, the smallest singular value, $\sigma_n(A)$, can be written as

$$\sigma_n(A) = f_n(A) - f_{n-1}(A),$$

which is a difference of convex functions. The properties of d.c. (difference
of convex) functions are summarized in [14]. The sensitivity analysis of sums
of singular values has also been studied by Seeger [29].
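A minimal NumPy sketch of $f_\kappa$ and this d.c. decomposition (illustrative only; the function name f below is not from the thesis):

    import numpy as np

    def f(A, k):
        """Sum of the k largest singular values of A."""
        return np.linalg.svd(A, compute_uv=False)[:k].sum()

    np.random.seed(1)
    A = np.random.randn(5, 3)
    n = 3

    # Smallest singular value as a difference of the convex functions f_n, f_{n-1}.
    sigma_min = f(A, n) - f(A, n - 1)
    print(np.isclose(sigma_min, np.linalg.svd(A, compute_uv=False)[-1]))  # True

    # Spot-check convexity of f_k along a segment: f(midpoint) <= average of ends.
    B = np.random.randn(5, 3)
    k = 2
    print(f(0.5 * (A + B), k) <= 0.5 * (f(A, k) + f(B, k)))  # True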

The situation for eigenvalues of symmetric matrices is slightly different. The
sum of all the eigenvalues of a symmetric matrix is equal to the trace of the
matrix. Hence, it is a linear function of the elements of the matrix. By [9]
and [24], the sum of the $\kappa$ largest eigenvalues of a symmetric matrix is a
(nonsmooth) convex function of the matrix elements. Hence, the smallest
eigenvalue is a concave function of the matrix elements.

Issues of determining the smallest nonzero singular value, say $\sigma_p$, are significant,
as the "condition number" $\sigma_1 / \sigma_p$ gives an approximate measure of
ill-conditioning. See [8] for an example of the use of the smallest nonzero
singular value and intermediate singular values in the numerical stabilization
of an ill-conditioned problem which occurs in geophysics.

Let $\kappa \in \{1, \ldots, n\}$ and let $m \ge n$. A matrix $W \in \mathbb{R}^{m \times n}$ is said to be a
rank $\kappa$ partial isometry if $\tau_1 = \cdots = \tau_\kappa = 1$ and $\tau_{\kappa+1} = \cdots = \tau_n = 0$,
where $\tau_1, \ldots, \tau_n$ are the singular values of $W$. Define $\mathcal{P}_{m \times n, \kappa}$ to be the set
of all rank $\kappa$ partial isometries in $\mathbb{R}^{m \times n}$. It is established that

$$f_\kappa(A) = \max_{W \in \mathcal{P}_{m \times n, \kappa}} |\langle A, W \rangle|.$$

A generalization of the set $\mathcal{P}_{m \times n, \kappa}$ is the set $\Phi_{m \times n, \kappa}$ defined by

$$\Phi_{m \times n, \kappa} = \Big\{ W \in \mathbb{R}^{m \times n} : \tau_\ell \le 1 \text{ for } \ell = 1, \ldots, n, \; \sum_{\ell=1}^{n} \tau_\ell \le \kappa \Big\}.$$

The set $\Phi_{m \times n, \kappa}$ is a compact convex subset of $\mathbb{R}^{m \times n}$. It is shown that $f_\kappa$ is
the support function for $\Phi_{m \times n, \kappa}$, i.e.,

$$f_\kappa(A) = \max_{W \in \Phi_{m \times n, \kappa}} |\langle A, W \rangle|. \qquad (1.3)$$

Another max characterization for $f_\kappa$ is given by Theorem 3.4.1 of Horn and
Johnson [16], namely,

$$f_\kappa(A) = \max_{U \in \mathcal{O}_{m,\kappa}, \; V \in \mathcal{O}_{n,\kappa}} \{ |\mathrm{tr}(U^T A V)| \}. \qquad (1.4)$$

In [16], it is also proved that this variational formula is equivalent to the
partial isometry characterization

$$f_\kappa(A) = \max_{C \in \mathbb{R}^{n \times m}} \{ |\mathrm{tr}(AC)| : C \text{ is a rank } \kappa \text{ partial isometry} \}.$$

Both (1.3) and (1.4) show that $f_\kappa(A)$ is a convex function. The advantage
of (1.3) over (1.4) is that, since $\Phi_{m \times n, \kappa}$ is convex, (1.3) leads directly to a
characterization of the subdifferential of $f_\kappa$ which does not involve a convex
hull operation.

One of the earliest results for the sum of the $\kappa$ largest singular values of
a matrix is due to von Neumann [30] in 1937 and later, in 1951, to Fan [7].
Sums of singular values have also been addressed in Horn and Johnson [16];
for other comments, see [20]. However, it appears that not much has been
done on the interconnection between this subject and the sets $\mathcal{P}_{m \times n, \kappa}$ and
$\Phi_{m \times n, \kappa}$.

We also consider the function

$$f_\kappa(x) = f_\kappa(A(x))$$

where $A(x)$ is a smooth, i.e., continuously differentiable, $m$ by $n$ matrix-valued
function on $\mathbb{R}^l$. For convenience, the same symbol $f_\kappa$ is used for a
function defined on the set of $m$ by $n$ matrices and on the parameter space
$\mathbb{R}^l$. The distinction should be clear from the context.

The function $f_\kappa(x)$ is a convex composite function as it is a composition of a
convex function $f_\kappa(A)$ with a smooth function $A(x)$. The Clarke generalized
gradient (see [3]) of $f_\kappa(x)$ may therefore be defined by means of a chain
rule. In general, convex composite functions are neither convex nor smooth,
but they have many of the same local properties as convex functions; more
complete discussions are given in [10] and [17].

The smoothness of the function $f_\kappa$ depends on the multiplicity of the $\kappa$th
singular value. If the $\kappa$th singular value, $\sigma_\kappa$, has multiplicity greater than one
at some value of $x$ and not all of these multiple singular values are included in
the sum, then $f_\kappa$ may be nonsmooth at that $x$ value. The following example
illustrates this.

Example 1.0.1 Let $m = 8$, $n = 5$, $\kappa = 3$ and define $A(x) : \mathbb{R} \to \mathbb{R}^{8 \times 5}$ by

$$A(x) = A_0 + x A_1$$

where

$$A_0 = \begin{bmatrix} \mathrm{diag}(2, 1, 1, 1, 0.5) \\ 0 \end{bmatrix} \in \mathbb{R}^{8 \times 5}$$

and $A_1$ is a randomly generated 8 by 5 matrix. The top curve in Figure 1.1
is a plot of the sum of the 3 largest singular values of $A(x)$ against $x$; the
remaining 5 curves represent plots of the individual singular values of $A(x)$
versus $x$. Note that $f_\kappa$ is a convex function with $x = 0$ as the minimizer. At
this $x$, the $\kappa$th singular value $\sigma_\kappa = 1$ with multiplicity 3 and we see that $f_\kappa$ is
nonsmooth. Near $x = -0.9$, there is a multiple singular value with $\sigma_1 = \sigma_2$.
All the multiple singular values are included in the sum and $f_\kappa$ is smooth at
this $x$-value.

[Plot: singular values and sum of the 3 largest singular values of $A(x)$ versus $x$.]
Figure 1.1: Plot of Example 1.0.1.
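The example can be reproduced in outline as follows. This is a NumPy sketch; the random $A_1$ generated here is not the matrix used for Figure 1.1, so the curves differ in detail but not in character.

    import numpy as np

    np.random.seed(2)
    m, n, k = 8, 5, 3
    A0 = np.zeros((m, n))
    A0[:n, :n] = np.diag([2.0, 1.0, 1.0, 1.0, 0.5])
    A1 = np.random.randn(m, n)

    xs = np.linspace(-1.0, 1.0, 201)
    sigmas = np.array([np.linalg.svd(A0 + x * A1, compute_uv=False) for x in xs])
    f3 = sigmas[:, :k].sum(axis=1)   # sum of the 3 largest singular values

    # At x = 0 the third singular value is 1 with multiplicity 3 and f3 is
    # nonsmooth there: the one-sided difference quotients of f3 disagree.
    i0 = np.argmin(np.abs(xs))
    left = (f3[i0] - f3[i0 - 1]) / (xs[i0] - xs[i0 - 1])
    right = (f3[i0 + 1] - f3[i0]) / (xs[i0 + 1] - xs[i0])
    print(left, right)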

The remainder of this thesis is organized as follows. In Chapter 2, some
further notation, definitions and basic results are presented. In Chapter 3,
using the Frobenius inner product, max characterizations for the function
$f_\kappa(A)$ are established, first over the maximizing set $\mathcal{P}_{m \times n, \kappa}$ and then over the
set $\Phi_{m \times n, \kappa}$. In each case, the elements which achieve the maximum are also
identified.

In Chapter 4, the result on the elements in the set $\Phi_{m \times n, \kappa}$ which attain
the maximum is used to obtain a concise formula for the subdifferential of
$f_\kappa(A)$. Optimality conditions for $f_\kappa(x)$ are derived by characterizing Clarke's
generalized gradient in terms of a dual matrix which has dimension equal
to the multiplicity of the $\kappa$th singular value. The directional derivative of
$f_\kappa(x)$ is discussed and it is emphasized how optimality may be verified by
computation of the appropriate dual matrix. This chapter is concluded by
showing how, for a non-optimal point, the dual matrix information may be
used in the generation of a descent direction, splitting a multiple singular
value if necessary.

The final chapter shows that the eigenvalue results of Overton and Womersley
[24] may be applied to eigenvalues of $A^T A$ to derive results on singular
values of $A$. An indication of how the theoretical results of this thesis may
be used for effective algorithm development is also given, together with some
other concluding remarks on further research that may be done in this area.

Chapter 2

Preliminaries

2.1 Introduction

This chapter introduces definitions, specifies further notation and gives


fundamental results. Much of this material either implicitly or explicitly
underlies the work in later chapters.

In Section 2.2, we summarize the basic properties of singular values and


singular vectors and briefly discuss the relationship between singular values
and eigenvalues. In Section 2.3, we give notational detail, definitions and a
summary of properties of the Frobenius inner product relevant to our work.
In addition to this, we establish some basic results which will prove useful in
Chapter 3.

2.2 Properties of the SVD

The singular values and vectors of $A \in \mathbb{R}^{m \times n}$ with $m \ge n$ satisfy the following
properties for $i = 1, \ldots, n$. These properties follow from the definition
of the SVD:

$$\sigma_i = x_i^T A y_i, \quad A y_i = \sigma_i x_i, \quad A^T x_i = \sigma_i y_i, \quad A^T A y_i = \sigma_i^2 y_i, \quad A A^T x_i = \sigma_i^2 x_i.$$

Here, $x_i$ denotes a left singular vector corresponding to $\sigma_i$ and $y_i$ denotes
a right singular vector corresponding to $\sigma_i$. Notice that the singular values
of $A$ are the positive square roots of the eigenvalues of the $n$ by $n$ positive
semidefinite matrix $A^T A$, or of the $n$ largest eigenvalues of the $m$ by $m$ positive
semidefinite matrix $A A^T$; the remaining eigenvalues of $A A^T$, if any, are all
zero. The right singular vectors are the orthonormal eigenvectors of $A^T A$
while the left singular vectors are the orthonormal eigenvectors of $A A^T$. In
addition, for any $P \in \mathcal{O}_{m,m}$ and $Q \in \mathcal{O}_{n,n}$, the singular values of $P^T A Q$ are
the same as those of $A$. This expresses the orthogonal invariance of the set
of singular values of a real matrix. When $A$ is square and symmetric, its
singular values are just the absolute values of its eigenvalues.

If $A$ has rank $r$, then exactly $r$ of its singular values are positive, i.e.,

$$\sigma_1 \ge \cdots \ge \sigma_r > \sigma_{r+1} = \cdots = \sigma_n = 0,$$

and using (1.1) we have the SVD expansion

$$A = \sum_{i=1}^{r} \sigma_i x_i y_i^T,$$

which expresses $A$ as a sum of $r$ matrices of rank one.
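These identities and the rank-one expansion are easy to confirm numerically; the following NumPy sketch (illustrative, not part of the thesis) does so for a random full-rank matrix:

    import numpy as np

    np.random.seed(3)
    m, n = 6, 4
    A = np.random.randn(m, n)
    X, s, Yt = np.linalg.svd(A)      # full SVD: A = X @ Sigma @ Y^T
    Y = Yt.T

    for i in range(n):
        xi, yi, si = X[:, i], Y[:, i], s[i]
        assert np.isclose(xi @ A @ yi, si)       # sigma_i = x_i^T A y_i
        assert np.allclose(A @ yi, si * xi)      # A y_i = sigma_i x_i
        assert np.allclose(A.T @ xi, si * yi)    # A^T x_i = sigma_i y_i

    # Rank-one expansion A = sum_i sigma_i x_i y_i^T (here rank(A) = n).
    A_sum = sum(s[i] * np.outer(X[:, i], Y[:, i]) for i in range(n))
    print(np.allclose(A, A_sum))  # True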

There is a very straightforward relationship between the singular value
decomposition of any $A \in \mathbb{R}^{m \times n}$ and the spectral decomposition of the real
symmetric $m+n$ by $m+n$ block matrix defined by

$$\tilde{A} = \begin{bmatrix} 0 & A \\ A^T & 0 \end{bmatrix}. \qquad (2.1)$$

Suppose $m \ge n$ and that $A$ has a singular value decomposition $A = X \Sigma Y^T$.
If the columns of $X$ are partitioned as $X = [X_1 \; X_2]$ such that $X_1 \in \mathbb{R}^{m \times n}$ and
$X_2 \in \mathbb{R}^{m \times (m-n)}$, with $\Sigma$ given by (1.2), then $\tilde{A}$ has the spectral decomposition

$$\tilde{A} = Q \begin{bmatrix} \Sigma_1 & & \\ & -\Sigma_1 & \\ & & 0 \end{bmatrix} Q^T, \qquad 0 \in \mathbb{R}^{(m-n) \times (m-n)},$$

where all unspecified entries are zero and the matrix

$$Q = \frac{1}{\sqrt{2}} \begin{bmatrix} X_1 & X_1 & \sqrt{2}\, X_2 \\ Y & -Y & 0 \end{bmatrix}, \qquad 0 \in \mathbb{R}^{n \times (m-n)},$$

is orthogonal. The eigenvalues of $\tilde{A}$ are related to the singular values of $A$ as
stated in the following theorem.

Theorem 2.2.1 ([15], Theorem 7.3.7) Let $A \in \mathbb{R}^{m \times n}$, let $q = \min\{m, n\}$,
and define $\tilde{A}$ by (2.1). Let $\sigma_1, \ldots, \sigma_q$ be nonnegative real numbers. The singular
values of $A$ are $\sigma_1, \ldots, \sigma_q$ if and only if the $m+n$ eigenvalues of $\tilde{A}$ are
$\sigma_1, \ldots, \sigma_q, -\sigma_q, \ldots, -\sigma_1$ and zero repeated $|m - n|$ times.

The block matrix (2.1) plays a key role in relating eigenvalue results for real
symmetric matrices to singular value results for general real matrices.

The largest and smallest singular values of $A$, denoted $\sigma_1(A)$ and $\sigma_n(A)$
respectively, are sometimes equivalently defined in terms of the spectral norm,
$\|\cdot\|_2$, as

$$\sigma_1(A) = \max_{\|x\|_2 \ne 0} \frac{\|Ax\|_2}{\|x\|_2} = \|A\|_2$$

and

$$\sigma_n(A) = \min_{\|x\|_2 \ne 0} \frac{\|Ax\|_2}{\|x\|_2} = \begin{cases} \|A^{-1}\|_2^{-1} & \text{if } \det(A) \ne 0, \\ 0 & \text{if } \det(A) = 0. \end{cases}$$

The largest singular value obeys the triangle inequality, i.e.,

$$\sigma_1(A + B) \le \sigma_1(A) + \sigma_1(B)$$

for any $A, B \in \mathbb{R}^{m \times n}$. However,

$$\sigma_i(A + B) \le \sigma_i(A) + \sigma_i(B)$$

is not true for all $i = 1, 2, \ldots$, where $\{\sigma_i(A)\}$ and $\{\sigma_i(B)\}$ are the singular values
of $A$ and $B$ respectively, both arranged in decreasing order. Nevertheless,
the sum of the $\kappa$ largest singular values does satisfy the triangle inequality.

Theorem 2.2.2 ([16], Corollary 3.4.3) Let $A, B \in \mathbb{R}^{m \times n}$ have respective
ordered singular values $\sigma_1(A) \ge \cdots \ge \sigma_q(A) \ge 0$ and $\sigma_1(B) \ge \cdots \ge \sigma_q(B) \ge 0$
where $q = \min\{m, n\}$, and let $\sigma_1(A+B) \ge \cdots \ge \sigma_q(A+B) \ge 0$ be the
ordered singular values of $A + B$. Then for $\kappa = 1, \ldots, q$

$$\sum_{i=1}^{\kappa} \sigma_i(A + B) \le \sum_{i=1}^{\kappa} \sigma_i(A) + \sum_{i=1}^{\kappa} \sigma_i(B).$$

An immediate corollary is

Corollary 2.2.1 For any $\kappa \in \{1, \ldots, q\}$, the function $f_\kappa(A) = \sum_{i=1}^{\kappa} \sigma_i(A)$ is
convex.

The Frobenius norm of $A$, denoted $\|A\|_F$, can be neatly characterized
in terms of singular values as

$$\|A\|_F^2 = \sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|^2 = \sum_{i=1}^{n} \sigma_i^2$$

where $a_{ij}$ denotes the $(i,j)$-th element of the matrix $A$.

We next note that the singular values of an arbitrary matrix have a minimax
characterization. This is a generalization of the minimax characterization
for the eigenvalues of real symmetric matrices.

Theorem 2.2.3 ([15], Theorem 7.3.10) Let $A \in \mathbb{R}^{m \times n}$, let $q = \min\{m, n\}$,
let $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_q \ge 0$ be the ordered singular values of $A$, and let
$k = 1, \ldots, q$. Then

$$\sigma_k = \min_{w_1, \ldots, w_{k-1} \in \mathbb{R}^n} \;\; \max_{\substack{x \ne 0, \; x \in \mathbb{R}^n \\ x \perp w_1, \ldots, w_{k-1}}} \frac{\|Ax\|_2}{\|x\|_2}$$

and

$$\sigma_k = \max_{w_1, \ldots, w_{n-k} \in \mathbb{R}^n} \;\; \min_{\substack{x \ne 0, \; x \in \mathbb{R}^n \\ x \perp w_1, \ldots, w_{n-k}}} \frac{\|Ax\|_2}{\|x\|_2}.$$

This characterization can be used to show the well-conditioned nature of
singular values. Perturbations of the elements of a matrix produce perturbations
of the same, or smaller, magnitude in the singular values. The theorem
below gives a precise statement about this stability.

Theorem 2.2.4 ([13], Corollary 8.3.2) Let $A$ and $E$ be in $\mathbb{R}^{m \times n}$, where $E$
represents the perturbation in $A$, and let $q = \min\{m, n\}$. Let the ordered
singular values of $A$ be $\sigma_1(A) \ge \cdots \ge \sigma_q(A) \ge 0$ and similarly for $E$ and
$A + E$. Then for $i = 1, \ldots, q$

$$|\sigma_i(A + E) - \sigma_i(A)| \le \sigma_1(E).$$

According to this theorem, the singular values of a matrix resulting from
such a perturbation cannot differ from the singular values of the original
matrix by more than the largest singular value of the perturbation matrix.
In other words, there is an upper bound for the perturbations in the singular
values. Also, from the above inequality, it is clear that $\sigma_i(A + E) \to \sigma_i(A)$
as $A + E \to A$. This illustrates the continuity of singular values under
perturbations.

Another perturbation result for singular values which is useful in practice
is given below.

Theorem 2.2.5 ([13], Theorem 8.3.4) With hypotheses as in Theorem 2.2.4,

$$\sum_{i=1}^{q} (\sigma_i(A + E) - \sigma_i(A))^2 \le \|E\|_F^2.$$
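Both perturbation bounds can be spot-checked numerically, as in the following NumPy sketch (illustrative only):

    import numpy as np

    np.random.seed(4)
    m, n = 7, 4
    A = np.random.randn(m, n)
    E = 0.1 * np.random.randn(m, n)

    sA = np.linalg.svd(A, compute_uv=False)
    sAE = np.linalg.svd(A + E, compute_uv=False)

    # Theorem 2.2.4: |sigma_i(A+E) - sigma_i(A)| <= sigma_1(E) = ||E||_2.
    print(np.all(np.abs(sAE - sA) <= np.linalg.norm(E, 2) + 1e-12))      # True

    # Theorem 2.2.5: sum_i (sigma_i(A+E) - sigma_i(A))^2 <= ||E||_F^2.
    print(np.sum((sAE - sA) ** 2) <= np.linalg.norm(E, 'fro') ** 2)      # True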
We conclude this section by giving an interlacing property for singular
values.

Theorem 2.2.6 ([15], Theorem 7.3.9) Let $A \in \mathbb{R}^{m \times n}$ be a given matrix and
let $\hat{A}$ be the matrix obtained by deleting any one column of $A$. Let $\{\sigma_i\}$ and
$\{\hat{\sigma}_i\}$ denote the singular values of $A$ and $\hat{A}$ respectively, both arranged in
nonincreasing order. Then the singular values $\sigma_i$ of $A$ interlace with those $\hat{\sigma}_i$
of $\hat{A}$ as follows.

(a) If $m \ge n$, then

$$\sigma_1 \ge \hat{\sigma}_1 \ge \sigma_2 \ge \hat{\sigma}_2 \ge \cdots \ge \sigma_{n-1} \ge \hat{\sigma}_{n-1} \ge \sigma_n.$$

(b) If $m < n$, then

$$\sigma_1 \ge \hat{\sigma}_1 \ge \sigma_2 \ge \hat{\sigma}_2 \ge \cdots \ge \sigma_m \ge \hat{\sigma}_m \ge 0.$$

The inequalities associated with the two cases (a) and (b) are interchanged
if a row of $A$ is deleted instead of a column. This theorem gives bounds on
the perturbation of singular values due to removing a column of a matrix.
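A numerical spot-check of case (a) (a NumPy sketch, not part of the thesis):

    import numpy as np

    np.random.seed(5)
    m, n = 6, 4                       # m >= n
    A = np.random.randn(m, n)
    Ahat = np.delete(A, 1, axis=1)    # delete one column

    s = np.linalg.svd(A, compute_uv=False)        # n singular values
    shat = np.linalg.svd(Ahat, compute_uv=False)  # n-1 singular values

    # sigma_1 >= sigma_hat_1 >= sigma_2 >= ... >= sigma_hat_{n-1} >= sigma_n.
    print(all(s[i] >= shat[i] >= s[i + 1] for i in range(n - 1)))  # True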

2.3 Miscellania

We start with some notation. Throughout this thesis, matrices are denoted
by capital Roman letters, except for diagonal matrices whose diagonal
entries are eigenvalues or singular values. Such diagonal matrices will be denoted
by capital Greek letters such as $\Sigma$ or $\Lambda$. Diagonal matrices may also be
denoted by $\mathrm{diag}(\alpha_1, \ldots, \alpha_n)$ where $\alpha_1, \ldots, \alpha_n$ are the elements on the diagonal.

Let $\mathcal{S}_n$ be the set of all $n$ by $n$ real symmetric matrices and let $\mathcal{K}_n$ be the
set of all $n$ by $n$ real skew-symmetric matrices. $A \in \mathcal{S}_n$ satisfies the identity
$A = A^T$ while $A \in \mathcal{K}_n$ satisfies $A = -A^T$.

Let $A$ be in $\mathbb{R}^{m \times n}$ and let $q = \min\{m, n\}$. The $(i,j)$-th element of the
matrix $A$ is denoted by $a_{ij}$. In some cases, $A_{ij}$ is used to denote the $(i,j)$-th
entry of $A$. Thus, for any matrix $A$ we have the notational equivalence $a_{ij} = A_{ij}$.

The trace of $A$ is defined by

$$\mathrm{tr}(A) = \sum_{i=1}^{q} a_{ii}$$

and the absolute trace of $A$ by

$$\mathrm{atr}(A) = \sum_{i=1}^{q} |a_{ii}|.$$

If $A$ is square, then the trace of $A$ also equals the sum of its eigenvalues.

Matrix inequalities are expressed using the positive semidefinite partial
ordering on $\mathcal{S}_n$ (see [15], for example). By $A \ge 0$, where $A \in \mathcal{S}_n$, we mean
that $A$ is positive semidefinite (equivalently, $z^T A z \ge 0$ for all $z \in \mathbb{R}^n$, or $\lambda_i \ge 0$
for $i = 1, \ldots, n$, where $\lambda_1, \ldots, \lambda_n$ denote the eigenvalues of the matrix $A$). For
example, the constraints $0 \le A \le I$ on $A \in \mathcal{S}_n$ imply that $0 \le \lambda_i \le 1$ for
$i = 1, \ldots, n$.

The Frobenius inner product for any two matrices $A, B \in \mathbb{R}^{m \times n}$ is

$$\langle A, B \rangle = \mathrm{tr}(A B^T) = \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij} b_{ij}.$$

This inner product is the extension of the standard inner product $x^T y =
\sum_{i=1}^{n} x_i y_i$ for $x, y \in \mathbb{R}^n$ to real rectangular matrices (of the same dimension).
Some useful properties of the Frobenius inner product are summarized below.

1. $\langle A, A \rangle = \|A\|_F^2$.

2. $\langle A, I \rangle = \mathrm{tr}(A)$ (for a square matrix $A$).

3. $\langle A, B \rangle = \langle B, A \rangle$ (since $\mathrm{tr}(A B^T) = \mathrm{tr}(B A^T)$).

4. If $A \in \mathcal{S}_n$ and $K \in \mathcal{K}_n$, then $\langle A, K \rangle = 0$.

5. If $A \in \mathbb{R}^{m \times n}$, $B \in \mathbb{R}^{\kappa \times \ell}$, $U \in \mathbb{R}^{m \times \kappa}$ and $V \in \mathbb{R}^{n \times \ell}$, then

$$\langle B, U^T A V \rangle = \langle U B V^T, A \rangle.$$

Proof: Expand both sides using the definition of the Frobenius inner
product. $\Box$
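Property 5 in particular is used repeatedly in Chapter 3; the following NumPy sketch (illustrative only; the helper name frob is not from the thesis) confirms it on random data:

    import numpy as np

    def frob(A, B):
        """Frobenius inner product <A, B> = tr(A B^T)."""
        return np.trace(A @ B.T)

    np.random.seed(6)
    m, n, k, l = 5, 4, 3, 2
    A = np.random.randn(m, n)
    B = np.random.randn(k, l)
    U = np.random.randn(m, k)
    V = np.random.randn(n, l)

    # Property 5: <B, U^T A V> = <U B V^T, A>.
    print(np.isclose(frob(B, U.T @ A @ V), frob(U @ B @ V.T, A)))  # True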

A matrix $U \in \mathbb{R}^{m \times \kappa}$ with $m \ge \kappa$ is said to have orthonormal columns if
and only if $U^T U = I$. The matrix $U$ is referred to as an orthogonal matrix.
If $U$ is square, then $U U^T = I$ also, so that the rows of $U$ are orthonormal as
well. Note that if $Z \in \mathcal{O}_{m,n}$, then $Z \in \mathbb{R}^{m \times n}$ with $m \ge n$ and $Z^T Z = I$.

The following lemma summarizes basic properties of orthogonal matrices.

Lemma 2.3.1 Let $U \in \mathcal{O}_{m,\kappa}$. Then for $i = 1, \ldots, m$ and $j = 1, \ldots, \kappa$

(a) (i)

$$(U^T U)_{ij} = \sum_{\ell=1}^{m} U_{\ell i} U_{\ell j} = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{otherwise.} \end{cases} \qquad (2.2)$$

(ii) $|U_{ij}| \le 1$.

(iii)

$$|(U U^T)_{ij}| = \Big| \sum_{\ell=1}^{\kappa} U_{i\ell} U_{j\ell} \Big| \le 1. \qquad (2.3)$$

(b) $|U_{ij}| = 1$ implies

$$U_{\ell j} = 0 \text{ for } \ell \ne i, \; \ell = 1, \ldots, m$$

and

$$U_{i\ell} = 0 \text{ for } \ell \ne j, \; \ell = 1, \ldots, \kappa.$$

(c) $|(U U^T)_{ij}| = 1$ implies

$$(U U^T)_{\ell j} = 0 \text{ for } \ell \ne i, \; \ell = 1, \ldots, m$$

and

$$(U U^T)_{i\ell} = 0 \text{ for } \ell \ne j, \; \ell = 1, \ldots, m.$$

Proof:

(a) (i) This follows from the statement $U^T U = I$.

(ii) It is an immediate consequence of (2.2) with $i = j$.

(iii) Extend $U$ to $W = [U \; V]$ such that $W$ is square and orthogonal, so
$W^T W = I = W W^T$. Diagonal elements of $W W^T$ then give the
desired result.

(b) Using (2.2) with $i = j$, we obtain

$$|(U^T U)_{jj}| = \sum_{\ell=1}^{m} U_{\ell j}^2 = 1.$$

If $|U_{ij}| = 1$, this implies

$$U_{\ell j} = 0 \text{ for } \ell \ne i, \; \ell = 1, \ldots, m.$$

Similarly, (2.3) with $i = j$ gives

$$|(U U^T)_{ii}| = \sum_{\ell=1}^{\kappa} U_{i\ell}^2 \le 1.$$

If $|U_{ij}| = 1$, this implies

$$U_{i\ell} = 0 \text{ for } \ell \ne j, \; \ell = 1, \ldots, \kappa.$$

(c) Suppose $B = U U^T$. Then

$$B^T B = U U^T U U^T = U U^T = B = B B^T.$$

Using this together with (2.2),

$$|(B^T B)_{jj}| = \sum_{\ell=1}^{m} B_{\ell j}^2 = |B_{jj}| \le 1,$$

which for $|B_{ij}| = 1$ implies

$$B_{\ell j} = (U U^T)_{\ell j} = 0 \text{ for } \ell \ne i, \; \ell = 1, \ldots, m.$$

Similarly,

$$|(B B^T)_{ii}| = \sum_{\ell=1}^{m} B_{i\ell}^2 = |B_{ii}| \le 1,$$

which for $|B_{ij}| = 1$ implies

$$B_{i\ell} = (U U^T)_{i\ell} = 0 \text{ for } \ell \ne j, \; \ell = 1, \ldots, m. \quad \Box$$

The next two lemmas give results related to the outer product $W = U V^T$,
where $U \in \mathcal{O}_{m,\kappa}$ and $V \in \mathcal{O}_{n,\kappa}$. These lemmas are necessary for proving the
partial isometry result of Section 3.2.

Lemma 2.3.2 Let $U \in \mathcal{O}_{m,\kappa}$, let $V \in \mathcal{O}_{n,\kappa}$ and let $W = U V^T$. Then for
$i = 1, \ldots, m$ and $j = 1, \ldots, n$

(a) $|W_{ij}| \le 1$.

(b) $|W_{ij}| = 1$ implies

(i) $U_{i\ell} = \zeta V_{j\ell}$ for $\ell = 1, \ldots, \kappa$, where

$$\zeta = \begin{cases} 1 & \text{if } W_{ij} = 1, \\ -1 & \text{if } W_{ij} = -1. \end{cases}$$

(ii) $W_{i\ell} = 0$ for $\ell \ne j$, $\ell = 1, \ldots, n$, and $W_{\ell j} = 0$ for $\ell \ne i$, $\ell = 1, \ldots, m$.

(c) $\sum_{i=1}^{n} |W_{ii}| \le \kappa$.

Proof:

(a) Let $\zeta$ be either 1 or $-1$. Since

$$0 \le \sum_{\ell=1}^{\kappa} (U_{i\ell} - \zeta V_{j\ell})^2 = \sum_{\ell=1}^{\kappa} U_{i\ell}^2 + \sum_{\ell=1}^{\kappa} V_{j\ell}^2 - 2\zeta \sum_{\ell=1}^{\kappa} U_{i\ell} V_{j\ell}, \qquad (2.4)$$

we have

$$2\zeta \sum_{\ell=1}^{\kappa} U_{i\ell} V_{j\ell} \le \sum_{\ell=1}^{\kappa} U_{i\ell}^2 + \sum_{\ell=1}^{\kappa} V_{j\ell}^2 \le 2,$$

where the second inequality follows from the fact that $U$ and $V$ have
orthonormal columns. Thus if $\zeta = 1$,

$$W_{ij} = \sum_{\ell=1}^{\kappa} U_{i\ell} V_{j\ell} \le 1,$$

and if $\zeta = -1$,

$$W_{ij} = \sum_{\ell=1}^{\kappa} U_{i\ell} V_{j\ell} \ge -1.$$

Lemma 2.3.2 (a) follows from these last two inequalities.

(b) From (2.4) and the orthonormality property of $U$ and $V$ it follows that

$$0 \le \sum_{\ell=1}^{\kappa} (U_{i\ell} - \zeta V_{j\ell})^2 \le 2 \Big( 1 - \zeta \sum_{\ell=1}^{\kappa} U_{i\ell} V_{j\ell} \Big).$$

Thus if $W_{ij} = \sum_{\ell=1}^{\kappa} U_{i\ell} V_{j\ell} = \zeta$ (where $\zeta$ is either 1 or $-1$), then

$$U_{i\ell} = \zeta V_{j\ell} \text{ for } \ell = 1, \ldots, \kappa.$$

Hence, for $\ell = 1, \ldots, n$,

$$W_{i\ell} = \sum_{k=1}^{\kappa} U_{ik} V_{\ell k} = \zeta \sum_{k=1}^{\kappa} V_{jk} V_{\ell k} = \zeta (V V^T)_{j\ell},$$

while, for $\ell = 1, \ldots, m$,

$$W_{\ell j} = \sum_{k=1}^{\kappa} U_{\ell k} V_{jk} = \zeta \sum_{k=1}^{\kappa} U_{\ell k} U_{ik} = \zeta (U U^T)_{\ell i}.$$

In particular $|(V V^T)_{jj}| = \sum_{k=1}^{\kappa} V_{jk}^2 = 1$ and $|(U U^T)_{ii}| = \sum_{k=1}^{\kappa} U_{ik}^2 = 1$, so
Lemma 2.3.1 (c) gives $W_{i\ell} = 0$ for $\ell \ne j$ and $W_{\ell j} = 0$ for $\ell \ne i$.

(c) As $W_{ii} = \sum_{\ell=1}^{\kappa} U_{i\ell} V_{i\ell}$ for $i = 1, \ldots, n$, choosing $\epsilon_i$ equal to either 1 or $-1$
such that $\epsilon_i W_{ii} = |W_{ii}|$,

$$\sum_{i=1}^{n} |W_{ii}| = \sum_{i=1}^{n} \epsilon_i \sum_{\ell=1}^{\kappa} U_{i\ell} V_{i\ell} = \sum_{\ell=1}^{\kappa} \sum_{i=1}^{n} \epsilon_i U_{i\ell} V_{i\ell}. \qquad (2.5)$$

Since

$$0 \le \sum_{i=1}^{n} (\epsilon_i U_{i\ell} - V_{i\ell})^2 = \sum_{i=1}^{n} U_{i\ell}^2 + \sum_{i=1}^{n} V_{i\ell}^2 - 2 \sum_{i=1}^{n} \epsilon_i U_{i\ell} V_{i\ell},$$

we obtain

$$2 \sum_{i=1}^{n} \epsilon_i U_{i\ell} V_{i\ell} \le \sum_{i=1}^{n} U_{i\ell}^2 + \sum_{i=1}^{n} V_{i\ell}^2 \le 2,$$

where the second inequality follows from the fact that $U$ and $V$ have
orthonormal columns. Hence

$$\sum_{i=1}^{n} \epsilon_i U_{i\ell} V_{i\ell} \le 1 \text{ for each } \ell,$$

and Lemma 2.3.2 (c) follows from this inequality and (2.5). $\Box$

Remark 2.3.1 As $|\sum_{i=1}^{n} W_{ii}| \le \sum_{i=1}^{n} |W_{ii}|$, Lemma 2.3.2 (c) implies that
$|\sum_{i=1}^{n} W_{ii}| \le \kappa$.
Lemma 2.3.3 Let $\zeta$ be either 1 or $-1$, and let $W = U V^T$ where $U \in \mathcal{O}_{m,\kappa}$ and
$V \in \mathcal{O}_{n,\kappa}$ with $m \ge n \ge \kappa$. Also, let $r$ and $t$ be integers such that $r \ge 0$, $t \ge 1$
and $1 \le r + t \le n$. If

$$W_{ii} = \zeta \text{ for } i = 1, \ldots, r, \qquad (2.6a)$$

$$|W_{ii}| \le 1 \text{ for } i = r+1, \ldots, r+t, \text{ and} \qquad (2.6b)$$

$$\sum_{i=r+1}^{r+t} W_{ii} = \zeta(\kappa - r), \qquad (2.6c)$$

then

$$W_{ij} = 0 \text{ for } i = r+t+1, \ldots, m \text{ and } j = 1, \ldots, n$$

and

$$W_{ij} = 0 \text{ for } i = 1, \ldots, m \text{ and } j = r+t+1, \ldots, n.$$

Proof: Let $\zeta$ be either 1 or $-1$ and assume that the diagonal elements of $W$
satisfy (2.6). As (2.6a) holds, Lemma 2.3.2 (b) (i) implies that

$$U_{i\ell} = \zeta V_{i\ell} \text{ for } i = 1, \ldots, r \text{ and } \ell = 1, \ldots, \kappa. \qquad (2.7)$$

Since

$$W_{ii} = \sum_{\ell=1}^{\kappa} U_{i\ell} V_{i\ell}, \qquad (2.8)$$

it follows that for $i = 1, \ldots, r$

$$\sum_{\ell=1}^{\kappa} U_{i\ell}^2 = 1 \qquad (2.9)$$

and

$$\sum_{\ell=1}^{\kappa} V_{i\ell}^2 = 1. \qquad (2.10)$$

Using (2.9), we may write

$$\kappa = \sum_{\ell=1}^{\kappa} \sum_{i=1}^{m} U_{i\ell}^2 \ge \sum_{i=1}^{r+t} \sum_{\ell=1}^{\kappa} U_{i\ell}^2 = \sum_{i=1}^{r} \sum_{\ell=1}^{\kappa} U_{i\ell}^2 + \sum_{i=r+1}^{r+t} \sum_{\ell=1}^{\kappa} U_{i\ell}^2 = r + \sum_{i=r+1}^{r+t} \sum_{\ell=1}^{\kappa} U_{i\ell}^2,$$

so that

$$\kappa - r \ge \sum_{i=r+1}^{r+t} \sum_{\ell=1}^{\kappa} U_{i\ell}^2. \qquad (2.11)$$

Using (2.10) and a similar argument,

$$\kappa - r \ge \sum_{i=r+1}^{r+t} \sum_{\ell=1}^{\kappa} V_{i\ell}^2. \qquad (2.12)$$

Also, from (2.6c) and (2.8) we obtain

$$\sum_{i=r+1}^{r+t} \sum_{\ell=1}^{\kappa} U_{i\ell} V_{i\ell} = \zeta(\kappa - r). \qquad (2.13)$$

Now,

$$0 \le \sum_{i=1}^{r+t} \sum_{\ell=1}^{\kappa} (U_{i\ell} - \zeta V_{i\ell})^2 = \sum_{i=r+1}^{r+t} \sum_{\ell=1}^{\kappa} (U_{i\ell} - \zeta V_{i\ell})^2$$

$$= \sum_{i=r+1}^{r+t} \sum_{\ell=1}^{\kappa} U_{i\ell}^2 + \sum_{i=r+1}^{r+t} \sum_{\ell=1}^{\kappa} V_{i\ell}^2 - 2\zeta \sum_{i=r+1}^{r+t} \sum_{\ell=1}^{\kappa} U_{i\ell} V_{i\ell} \le (\kappa - r) + (\kappa - r) - 2(\kappa - r) = 0$$

from (2.7), and (2.11) to (2.13). This implies that

$$U_{i\ell} = \zeta V_{i\ell} \text{ for } i = 1, \ldots, r+t \text{ and } \ell = 1, \ldots, \kappa. \qquad (2.14)$$

Substituting for $V_{i\ell}$ in (2.13) gives

$$\sum_{i=r+1}^{r+t} \sum_{\ell=1}^{\kappa} U_{i\ell}^2 = \kappa - r,$$

or

$$\sum_{i=1}^{r+t} \sum_{\ell=1}^{\kappa} U_{i\ell}^2 = \kappa. \qquad (2.15)$$

Since $U$ has orthonormal columns,

$$\sum_{i=1}^{m} \sum_{\ell=1}^{\kappa} U_{i\ell}^2 = \sum_{\ell=1}^{\kappa} \sum_{i=1}^{m} U_{i\ell}^2 = \kappa.$$

This result and (2.15) imply that

$$U_{i\ell} = 0 \text{ for } i = r+t+1, \ldots, m \text{ and } \ell = 1, \ldots, \kappa. \qquad (2.16)$$

Hence

$$W_{ij} = 0 \text{ for } i = r+t+1, \ldots, m \text{ and } j = 1, \ldots, n.$$

Note that if we substitute for $U_{i\ell}$ in (2.13), we obtain

$$\sum_{i=r+1}^{r+t} \sum_{\ell=1}^{\kappa} V_{i\ell}^2 = \kappa - r,$$

or

$$\sum_{i=1}^{r+t} \sum_{\ell=1}^{\kappa} V_{i\ell}^2 = \kappa.$$

Again, using the orthonormality property of $V$, this implies

$$V_{i\ell} = 0 \text{ for } i = r+t+1, \ldots, n \text{ and } \ell = 1, \ldots, \kappa, \qquad (2.17)$$

so that

$$W_{ij} = 0 \text{ for } i = 1, \ldots, m \text{ and } j = r+t+1, \ldots, n. \quad \Box$$

The following lemma completes this section. It provides results which
we shall use in Section 3.3 to establish a max characterization of singular
value sums and to identify the elements which achieve the maximum in this
characterization.

Lemma 2.3.4 Let $m \ge n$, let $\kappa \in \{1, \ldots, n\}$ and let $\zeta$ be either 1 or $-1$.
Let the SVD of $W \in \mathbb{R}^{m \times n}$ be given by $W = U \Sigma V^T$, $U \in \mathcal{O}_{m,m}$, $V \in \mathcal{O}_{n,n}$
and $\Sigma \in \mathbb{R}^{m \times n}$, where $1 \ge \tau_1 \ge \cdots \ge \tau_n \ge 0$ are the singular values of $W$ and
$\sum_{\ell=1}^{n} \tau_\ell \le \kappa$. Then

(a) $|W_{ij}| \le 1$ for $i = 1, \ldots, m$ and $j = 1, \ldots, n$.

(b) $W_{ii} = \zeta$ for any $i = 1, \ldots, n$ implies that

(i)

$$\tau_\ell U_{i\ell} = \zeta V_{i\ell} \text{ for } \ell = 1, \ldots, n \qquad (2.18)$$

and

$$U_{i\ell} = \zeta \tau_\ell V_{i\ell} \text{ for } \ell = 1, \ldots, n. \qquad (2.19)$$

(ii) for any $\ell = 1, \ldots, n$,

$$U_{i\ell} = \zeta V_{i\ell} \ne 0 \text{ implies } \tau_\ell = 1,$$

$$U_{i\ell} = V_{i\ell} = 0 \text{ implies } \tau_\ell \le 1,$$

and

$$\tau_\ell < 1 \text{ implies } U_{i\ell} = 0 \text{ and } V_{i\ell} = 0. \qquad (2.20)$$

(iii) $U_{i\ell} = 0$ for $\ell = n+1, \ldots, m$.

(c) $\sum_{i=1}^{n} |W_{ii}| \le \kappa$.

Proof:

(a) From the SVD of $W$,

$$W_{ij} = \sum_{\ell=1}^{n} \tau_\ell U_{i\ell} V_{j\ell}.$$

Since

$$0 \le \sum_{\ell=1}^{n} (\tau_\ell U_{i\ell} - \zeta V_{j\ell})^2 \qquad (2.21)$$

$$= \sum_{\ell=1}^{n} (\tau_\ell U_{i\ell})^2 + \sum_{\ell=1}^{n} V_{j\ell}^2 - 2\zeta \sum_{\ell=1}^{n} \tau_\ell U_{i\ell} V_{j\ell}, \qquad (2.22)$$

it follows that

$$2\zeta \sum_{\ell=1}^{n} \tau_\ell U_{i\ell} V_{j\ell} \le \sum_{\ell=1}^{n} (\tau_\ell U_{i\ell})^2 + \sum_{\ell=1}^{n} V_{j\ell}^2 \le \sum_{\ell=1}^{n} U_{i\ell}^2 + \sum_{\ell=1}^{n} V_{j\ell}^2 \le 2, \qquad (2.23)$$

using the fact that $\tau_\ell \le 1$ for $\ell = 1, \ldots, n$ and the orthogonality of $U$
and $V$. From (2.23) we obtain

$$W_{ij} = \sum_{\ell=1}^{n} \tau_\ell U_{i\ell} V_{j\ell} \le 1$$

if we let $\zeta = 1$, and

$$W_{ij} = \sum_{\ell=1}^{n} \tau_\ell U_{i\ell} V_{j\ell} \ge -1$$

if we let $\zeta = -1$. Lemma 2.3.4 (a) follows from these two inequalities.
Notice that we may also obtain this result by using the inequality

$$\sum_{\ell=1}^{n} (U_{i\ell} - \zeta \tau_\ell V_{j\ell})^2 \ge 0 \qquad (2.24)$$

instead of (2.21).

(b) (i) For $i = j$,

$$0 \le \sum_{\ell=1}^{n} (\tau_\ell U_{i\ell} - \zeta V_{i\ell})^2 \le 2 \Big( 1 - \zeta \sum_{\ell=1}^{n} \tau_\ell U_{i\ell} V_{i\ell} \Big),$$

using (2.21), (2.22), the fact that $\tau_\ell \le 1$ for $\ell = 1, \ldots, n$, together
with the orthogonality of $U$ and $V$. If $W_{ii} = \sum_{\ell=1}^{n} \tau_\ell U_{i\ell} V_{i\ell} = \zeta$ for
any $i = 1, \ldots, n$ (where $\zeta$ is either 1 or $-1$), then

$$\tau_\ell U_{i\ell} = \zeta V_{i\ell} \text{ for } \ell = 1, \ldots, n.$$

Equation (2.19) can be obtained by a similar argument, starting
with (2.21) replaced by (2.24).

(ii) When $W_{ii} = \sum_{\ell=1}^{n} \tau_\ell U_{i\ell} V_{i\ell} = \zeta$, (2.18) implies that

$$\zeta W_{ii} = \sum_{\ell=1}^{n} \tau_\ell^2 U_{i\ell}^2 = 1 \qquad (2.25)$$

and

$$\sum_{\ell=1}^{n} V_{i\ell}^2 = 1, \qquad (2.26)$$

while (2.19) implies

$$\zeta W_{ii} = \sum_{\ell=1}^{n} \tau_\ell^2 V_{i\ell}^2 = 1 \qquad (2.27)$$

and

$$\sum_{\ell=1}^{n} U_{i\ell}^2 = 1. \qquad (2.28)$$

As $\tau_\ell^2 \le 1$ for $\ell = 1, \ldots, n$, (2.25) and (2.28) imply that if $U_{i\ell} \ne 0$
for any $\ell$, then $\tau_\ell = 1$, and $U_{i\ell} = 0$ for any $\ell$ implies $\tau_\ell \le 1$. If
$\tau_\ell = 1$, then from (2.18) we have $U_{i\ell} = \zeta V_{i\ell}$. Likewise, (2.26) and
(2.27) imply that if $V_{i\ell} \ne 0$ for any $\ell$, then $\tau_\ell = 1$, and $V_{i\ell} = 0$
for any $\ell$ implies $\tau_\ell \le 1$. Again, from (2.18) we obtain $U_{i\ell} = \zeta V_{i\ell}$
when $\tau_\ell = 1$. Moreover, from (2.25) and (2.28), as $U_{i\ell}^2 > 0$ for
any $\ell$ implies that $\tau_\ell = 1$, it follows that $\tau_\ell < 1$ implies $U_{i\ell} = 0$.
Similarly, from (2.26) and (2.27) it follows that $\tau_\ell < 1$ for any $\ell$
implies $V_{i\ell} = 0$.

(iii) Lemma 2.3.4 (b) (iii) follows from (2.28) and the orthogonality of
$U$.

(c) Since

$$\sum_{i=1}^{n} |W_{ii}| = \sum_{i=1}^{n} \Big| \sum_{\ell=1}^{n} \tau_\ell U_{i\ell} V_{i\ell} \Big| \le \sum_{\ell=1}^{n} \tau_\ell \sum_{i=1}^{n} |U_{i\ell} V_{i\ell}|,$$

Lemma 2.3.4 (c) follows easily, as $\sum_{i=1}^{n} |U_{i\ell} V_{i\ell}| \le 1$ (by the argument in
the proof of Lemma 2.3.2 (c)) and $\sum_{\ell=1}^{n} \tau_\ell \le \kappa$. $\Box$
Chapter 3

Sums of the largest singular values

3.1 Introduction

We shall be concerned with sums of the $\kappa$ largest singular values of real
$m$ by $n$ matrices, i.e., functions of the form

$$f_\kappa(A) = \sum_{i=1}^{\kappa} \sigma_i(A) \qquad (3.1)$$

where $A \in \mathbb{R}^{m \times n}$, $m \ge n$ and $\kappa \in \{1, \ldots, n\}$. The singular values of $A$,
$\sigma_i(A) = \sigma_i$, are ordered according to

$$\sigma_1(A) \ge \cdots \ge \sigma_n(A) \ge 0.$$

In view of the next section, we have the following definition. A matrix $W \in
\mathbb{R}^{m \times n}$ is said to be a rank $\kappa$ partial isometry if $\sigma_1(W) = \cdots = \sigma_\kappa(W) = 1$
and $\sigma_{\kappa+1}(W) = \cdots = \sigma_q(W) = 0$, where $q = \min\{m, n\}$.

In Section 3.2, we establish a max characterization for $f_\kappa(A)$ over the set
$\mathcal{P}_{m \times n, \kappa}$ (defined by (3.11)), which is the set of all real $m$ by $n$ rank $\kappa$ partial
isometries. In addition, we characterize the matrices in $\mathcal{P}_{m \times n, \kappa}$ which achieve
the maximum. Section 3.3 establishes a max characterization for $f_\kappa(A)$ over
a different set $\Phi_{m \times n, \kappa}$ (defined by (3.23)), which is a generalization of the set
$\mathcal{P}_{m \times n, \kappa}$. The matrices in the set $\Phi_{m \times n, \kappa}$ which achieve the maximum are also
identified.

3.2 A partial isometry characterization

Theorem 3.2.1, which is from Horn and Johnson ([16], Theorem 3.4.1), gives
a max characterization for the sum of the $\kappa$ largest singular values of an $m$
by $n$ matrix $A$.

Theorem 3.2.1 Let $A \in \mathbb{R}^{m \times n}$, let $q = \min\{m, n\}$ and denote the ordered
singular values of $A$ by $\sigma_1(A) \ge \cdots \ge \sigma_q(A) \ge 0$. Then for each $\kappa = 1, \ldots, q$

$$f_\kappa(A) = \max_{U \in \mathcal{O}_{m,\kappa}, \; V \in \mathcal{O}_{n,\kappa}} \{ |\mathrm{tr}(U^T A V)| \} \qquad (3.2)$$

$$= \max_{C \in \mathbb{R}^{n \times m}} \{ |\mathrm{tr}(AC)| : C \text{ is a rank } \kappa \text{ partial isometry} \}$$

where $f_\kappa(A)$ is defined by (3.1).

Proof of (3.2): Suppose $m \ge n$ and let $A = X \Sigma Y^T$ be a singular value
decomposition of $A$. Partition the columns of $X$ as $X = [X_1 \; X_2]$ such that
$X_1 \in \mathcal{O}_{m,\kappa}$ and $X_2 \in \mathcal{O}_{m,m-\kappa}$, and partition the columns of $Y$ as $Y = [Y_1 \; Y_2]$
such that $Y_1 \in \mathcal{O}_{n,\kappa}$ and $Y_2 \in \mathcal{O}_{n,n-\kappa}$. Also, define $\Sigma$ by (1.2), let $U \in \mathcal{O}_{m,\kappa}$
and let $V \in \mathcal{O}_{n,\kappa}$. Since $A = X \Sigma Y^T$, we have

$$\mathrm{tr}(U^T A V) = \mathrm{tr}(\tilde{U}^T \Sigma \tilde{V}) \qquad (3.3)$$

where $\tilde{U} \in \mathbb{R}^{m \times \kappa}$ and $\tilde{V} \in \mathbb{R}^{n \times \kappa}$ are defined as

$$\tilde{U} = X^T U$$

and

$$\tilde{V} = Y^T V.$$

Then, using the orthogonality of $X$ and $Y$,

$$\tilde{U}^T \tilde{U} = U^T X X^T U = I_\kappa$$

and

$$\tilde{V}^T \tilde{V} = V^T Y Y^T V = I_\kappa.$$

Here $I_\kappa$ is the identity matrix of order $\kappa$, and hence $\tilde{U}$ and $\tilde{V}$ are matrices
whose columns are $\kappa$ orthonormal vectors in $\mathbb{R}^m$ and $\mathbb{R}^n$ respectively. If we
denote the $r$th diagonal entry of $\tilde{U}^T \Sigma \tilde{V}$ by $(\tilde{U}^T \Sigma \tilde{V})_{rr}$ and note that
$\Sigma_{i\ell} = \sigma_i(A)$ if $i = \ell$ and $\Sigma_{i\ell} = 0$ otherwise, then using (3.3) we may write

$$\mathrm{tr}(U^T A V) = \mathrm{tr}(\tilde{U}^T \Sigma \tilde{V}) = \sum_{r=1}^{\kappa} (\tilde{U}^T \Sigma \tilde{V})_{rr} = \sum_{r=1}^{\kappa} \sum_{i=1}^{m} \sum_{\ell=1}^{n} \tilde{U}_{ir} \Sigma_{i\ell} \tilde{V}_{\ell r}$$

$$= \sum_{i=1}^{n} \sigma_i(A) \Big( \sum_{r=1}^{\kappa} \tilde{U}_{ir} \tilde{V}_{ir} \Big) = \sum_{i=1}^{n} \sigma_i(A) (\tilde{U} \tilde{V}^T)_{ii}.$$

By parts (a) and (c) of Lemma 2.3.2, $|(\tilde{U} \tilde{V}^T)_{ii}| \le 1$ for $i = 1, \ldots, n$ and
$\sum_{i=1}^{n} |(\tilde{U} \tilde{V}^T)_{ii}| \le \kappa$, so that

$$|\mathrm{tr}(U^T A V)| = \Big| \sum_{i=1}^{n} \sigma_i(A) (\tilde{U} \tilde{V}^T)_{ii} \Big| \le \sum_{i=1}^{n} \sigma_i(A) |(\tilde{U} \tilde{V}^T)_{ii}| \le \sum_{i=1}^{\kappa} \sigma_i(A). \qquad (3.4)$$

If we let $U = X_1$ and $V = Y_1$, then from the SVD of $A$ and the partitioning
of $X$ and $Y$,

$$U^T A V = X_1^T [X_1 \; X_2] \, \Sigma \begin{bmatrix} Y_1^T \\ Y_2^T \end{bmatrix} Y_1 = \Sigma_\kappa$$

where $\Sigma_\kappa = \mathrm{diag}(\sigma_1(A), \ldots, \sigma_\kappa(A))$. Hence

$$|\mathrm{tr}(U^T A V)| = \mathrm{tr}(\Sigma_\kappa) = \sum_{i=1}^{\kappa} \sigma_i(A),$$

so the upper bound in (3.4) can be achieved. For a different proof, see p. 195
of [16]. $\Box$

In order to characterize the elements which achieve the maximum in the
following results, information about the multiplicity of the singular values
of $A$ is required. Let $\sigma_\kappa$ be the $\kappa$th largest singular value of $A$ and let the
singular values of $A$ be written

$$\sigma_1 \ge \cdots \ge \sigma_r > \sigma_{r+1} = \cdots = \sigma_\kappa = \cdots = \sigma_{r+t} > \sigma_{r+t+1} \ge \cdots \ge \sigma_n \ge 0 \qquad (3.5)$$

where $r \ge 0$ and $t \ge 1$ are integers. The number of singular values larger than
$\sigma_\kappa$ is $r$. The $\kappa$th singular value has multiplicity $t$. Note that by definition

$$r + 1 \le \kappa \le r + t \le n.$$

Thus, $r = 0$ if $\kappa = 1$. Also, if $t = 1$, then $\kappa = r + 1$. We may write

$$f_\kappa(A) = \sum_{i=1}^{r} \sigma_i + (\kappa - r) \sigma_\kappa. \qquad (3.6)$$

Assumption: All the results which follow are derived on the assumption that
$\sigma_\kappa > 0$.

Remark 3.2.1 The case where $\sigma_\kappa = 0$ is briefly discussed in Section 3.3.
First we establish the following lemma, which depends only on the definition
of the set $\phi_{n,\kappa}$ and the ordering (3.5) for the $n$ nonnegative real numbers
$\sigma_i$. Specifically, it does not require $\sigma_i$ to be a singular value of a matrix.

Lemma 3.2.1 Let $\kappa \in \{1, \ldots, n\}$ and define $\phi_{n,\kappa}$ by

$$\phi_{n,\kappa} = \Big\{ w \in \mathbb{R}^n : |w_i| \le 1 \text{ for } i = 1, \ldots, n, \; \Big| \sum_{i=1}^{n} w_i \Big| \le \kappa \Big\}. \qquad (3.7)$$

If the elements of $\sigma \in \mathbb{R}^n$ are ordered as in (3.5), then

$$\max_{w \in \phi_{n,\kappa}} |\sigma^T w| = \sum_{i=1}^{\kappa} \sigma_i$$

and

$$\mathrm{argmax} \{ |\sigma^T w| : w \in \phi_{n,\kappa} \} = \Big\{ w \in \mathbb{R}^n : \; w_i = \zeta \text{ for } i = 1, \ldots, r; \; |w_i| \le 1 \text{ for } i = r+1, \ldots, r+t;$$

$$w_i = 0 \text{ for } i = r+t+1, \ldots, n; \; \sum_{i=r+1}^{r+t} w_i = \zeta(\kappa - r); \text{ where } \zeta \text{ is either 1 or } -1 \Big\}. \qquad (3.8)$$

Proof: From (3.5) and the definition (3.7) of $\phi_{n,\kappa}$,

$$|\sigma^T w| = \Big| \sum_{i=1}^{n} \sigma_i w_i \Big| \le \sum_{i=1}^{\kappa} \sigma_i$$

for any $w \in \phi_{n,\kappa}$. Hence

$$\max_{w \in \phi_{n,\kappa}} |\sigma^T w| \le \sum_{i=1}^{\kappa} \sigma_i.$$

Let $\zeta$ be either 1 or $-1$. If $w^* = (w_1^*, \ldots, w_n^*)^T$ is any element of the right
hand side of (3.8), then

$$|\sigma^T w^*| = \Big| \sum_{i=1}^{r} \sigma_i w_i^* + \sum_{i=r+1}^{r+t} \sigma_i w_i^* \Big| = \Big| \sum_{i=1}^{r} \sigma_i \zeta + \sigma_\kappa \zeta(\kappa - r) \Big| = \sum_{i=1}^{\kappa} \sigma_i.$$

Conversely, let $w^* \in \mathrm{argmax} \{ |\sigma^T w| : w \in \phi_{n,\kappa} \}$. Then $w^* \in \phi_{n,\kappa}$ satisfies
$\sigma^T w^* = \zeta \sum_{i=1}^{\kappa} \sigma_i$ and from (3.7) we have

$$\sum_{i=1}^{n} w_i^* = \zeta \kappa.$$

There are two possible cases corresponding to this. The first case is where
$\zeta = 1$, so that

$$\sigma^T w^* = \sum_{i=1}^{\kappa} \sigma_i$$

and

$$\sum_{i=1}^{n} w_i^* = \kappa. \qquad (3.9)$$

From (3.5) it follows that

$$w_i^* = 1 \text{ for } i = 1, \ldots, r$$

and

$$w_i^* = 0 \text{ for } i = r+t+1, \ldots, n.$$

Thus, from (3.9),

$$\sum_{i=r+1}^{r+t} w_i^* = \kappa - r.$$

The other case is where $\zeta = -1$, so that

$$\sigma^T w^* = -\sum_{i=1}^{\kappa} \sigma_i$$

and

$$\sum_{i=1}^{n} w_i^* = -\kappa. \qquad (3.10)$$

Again, from (3.5),

$$w_i^* = -1 \text{ for } i = 1, \ldots, r$$

and

$$w_i^* = 0 \text{ for } i = r+t+1, \ldots, n.$$

Therefore, from (3.10),

$$\sum_{i=r+1}^{r+t} w_i^* = -(\kappa - r). \quad \Box$$
Lemma 3.2.2 Let $\kappa \in \{1, \ldots, n\}$, let $m \ge n$ and define the set $\mathcal{P}_{m \times n, \kappa}$ by

$$\mathcal{P}_{m \times n, \kappa} = \{ W \in \mathbb{R}^{m \times n} : W = U V^T, \; U \in \mathcal{O}_{m,\kappa}, \; V \in \mathcal{O}_{n,\kappa} \}. \qquad (3.11)$$

Then $W \in \mathbb{R}^{m \times n}$ is in $\mathcal{P}_{m \times n, \kappa}$ if and only if $W$ is a rank $\kappa$ partial isometry.

Proof: If $W \in \mathcal{P}_{m \times n, \kappa}$, then there exist $U \in \mathcal{O}_{m,\kappa}$ and $V \in \mathcal{O}_{n,\kappa}$ such that
$W = U V^T$. Form the matrices $[U \; \bar{U}] \in \mathcal{O}_{m,m}$ and $[V \; \bar{V}] \in \mathcal{O}_{n,n}$. Then

$$W = [U \; \bar{U}] \, \bar{\Sigma} \begin{bmatrix} V^T \\ \bar{V}^T \end{bmatrix}, \quad \text{where } \bar{\Sigma} = \begin{bmatrix} I_\kappa & 0 \\ 0 & 0 \end{bmatrix} \in \mathbb{R}^{m \times n},$$

and so $W$ is a rank $\kappa$ partial isometry.

Conversely, let $W \in \mathbb{R}^{m \times n}$ be a given rank $\kappa$ partial isometry. Then, from
the definition of the SVD and that of a rank $\kappa$ partial isometry, it follows
that

$$W = \bar{U} \begin{bmatrix} I_\kappa & 0 \\ 0 & 0 \end{bmatrix} \bar{V}^T = U V^T$$

where $U \in \mathcal{O}_{m,\kappa}$ and $V \in \mathcal{O}_{n,\kappa}$ are the first $\kappa$ columns of
$\bar{U} \in \mathcal{O}_{m,m}$ and $\bar{V} \in \mathcal{O}_{n,n}$ respectively. $\Box$

Remark 3.2.2 $\mathcal{P}_{m \times n, \kappa}$ is invariant under orthogonal transformations, i.e.,

$$\mathcal{P}_{m \times n, \kappa} = \{ P^T W Q : W \in \mathcal{P}_{m \times n, \kappa} \}$$

where $P \in \mathcal{O}_{m,m}$ and $Q \in \mathcal{O}_{n,n}$.

The following lemma gives a max characterization for $f_\kappa(A)$ over the set
$\mathcal{P}_{m \times n, \kappa}$. In the proof of this lemma, we establish an upper bound for the
sum of the $\kappa$ largest singular values of $A$. We then use an appropriate square
diagonal matrix to show that the upper bound can be achieved.

Lemma 3.2.3 Let $\kappa \in \{1, \ldots, n\}$, let $m \ge n$ and define $\mathcal{P}_{m \times n, \kappa}$ by (3.11).
Then

$$\max_{W \in \mathcal{P}_{m \times n, \kappa}} |\langle A, W \rangle| = \sum_{i=1}^{\kappa} \sigma_i(A).$$
Remark 3.2.3 From the properties of the Frobenius inner product,

$$\langle A, W \rangle = \mathrm{tr}(A W^T) = \mathrm{tr}(A V U^T) = \mathrm{tr}(U^T A V)$$

for any $W \in \mathcal{P}_{m \times n, \kappa}$, $W = U V^T$, where $U \in \mathbb{R}^{m \times \kappa}$ and $V \in \mathbb{R}^{n \times \kappa}$, so
Lemma 3.2.3 follows from Theorem 3.2.1. A direct proof is given below.

Proof: For any $W \in \mathcal{P}_{m \times n, \kappa}$, (1.1) and the properties of the Frobenius inner
product imply that

$$\langle A, W \rangle = \langle X \Sigma Y^T, W \rangle = \langle \Sigma, X^T W Y \rangle. \qquad (3.12)$$

So,

$$\max_{W \in \mathcal{P}_{m \times n, \kappa}} |\langle A, W \rangle| = \max_{W \in \mathcal{P}_{m \times n, \kappa}} |\langle \Sigma, X^T W Y \rangle| = \max_{W \in \mathcal{P}_{m \times n, \kappa}} |\langle \Sigma, W \rangle|$$

from the orthogonal invariance of $\mathcal{P}_{m \times n, \kappa}$ (see Remark 3.2.2). Now,

$$\max_{W \in \mathcal{P}_{m \times n, \kappa}} |\langle \Sigma, W \rangle| = \max_{W \in \mathcal{P}_{m \times n, \kappa}} \Big| \sum_{i=1}^{n} \sigma_i(A) W_{ii} \Big| \le \max_{W \in \mathcal{P}_{m \times n, \kappa}} \sum_{i=1}^{n} \sigma_i(A) |W_{ii}| \le \sum_{i=1}^{\kappa} \sigma_i(A),$$

as from Lemma 2.3.2 (a) and (c), $W \in \mathcal{P}_{m \times n, \kappa}$ implies that $|W_{ii}| \le 1$ for
$i = 1, \ldots, n$ and $\sum_{i=1}^{n} |W_{ii}| \le \kappa$.

If we let $W = \mathrm{diag}(W_{11}, \ldots, W_{nn})$ where $W_{ii} = 1$ for $i = 1, \ldots, \kappa$ and
$W_{ii} = 0$ otherwise, then $W \in \mathcal{P}_{m \times n, \kappa}$ and

$$|\langle \Sigma, W \rangle| = \sigma_1(A) W_{11} + \cdots + \sigma_\kappa(A) W_{\kappa\kappa} = \sum_{i=1}^{\kappa} \sigma_i(A),$$

so that the maximum is achieved. $\Box$
Remark 3.2.4 Another matrix which achieves the maximum can be obtained
from (1.1), the SVD of $A$. Suppose we partition $X$ as $X = [X_1 \; X_2]$ such
that $X_1 \in \mathcal{O}_{m,\kappa}$ and $X_2 \in \mathcal{O}_{m,m-\kappa}$, and $Y$ as $[Y_1 \; Y_2]$ such that $Y_1 \in \mathcal{O}_{n,\kappa}$ and
$Y_2 \in \mathcal{O}_{n,n-\kappa}$. Then, letting $W = X_1 Y_1^T$ gives

$$|\langle A, W \rangle| = |\langle \Sigma, X^T W Y \rangle| = |\langle \Sigma_\kappa, I_\kappa \rangle| = \sum_{i=1}^{\kappa} \sigma_i(A)$$

where $\Sigma_\kappa = \mathrm{diag}(\sigma_1(A), \ldots, \sigma_\kappa(A))$.
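Lemma 3.2.3 and Remark 3.2.4 can be spot-checked numerically: random rank-$\kappa$ partial isometries $U V^T$ never exceed the bound, and $X_1 Y_1^T$ attains it. A NumPy sketch (illustrative only, not part of the thesis):

    import numpy as np

    np.random.seed(7)
    m, n, k = 6, 4, 2
    A = np.random.randn(m, n)
    X, s, Yt = np.linalg.svd(A)
    fk = s[:k].sum()

    def frob(A, B):
        return np.trace(A @ B.T)

    # Random rank-k partial isometries W = U V^T never exceed f_k(A)...
    for _ in range(100):
        U, _ = np.linalg.qr(np.random.randn(m, k))   # U in O_{m,k}
        V, _ = np.linalg.qr(np.random.randn(n, k))   # V in O_{n,k}
        assert abs(frob(A, U @ V.T)) <= fk + 1e-10

    # ...and W = X1 Y1^T (leading k singular vectors) attains the maximum.
    W = X[:, :k] @ Yt[:k, :]
    print(np.isclose(frob(A, W), fk))  # True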

The following lemma is used in the proof of Theorem 3.2.2.

Lemma 3.2.4 Let $\mathcal{S}_t^+$ be the set of all real $t \times t$ symmetric positive semidefinite
matrices, let $\mathcal{S}_t^-$ be the set of all real $t \times t$ symmetric negative semidefinite
matrices and let $\mathcal{P}_{t \times t, \kappa-r}$ be the set of all real $t \times t$ rank $\kappa - r$ partial isometries.
Then

(a)

$$\mathcal{S}_t^+ \cap \mathcal{P}_{t \times t, \kappa-r} = \{ B : B = Q_1 Q_1^T, \; Q_1 \in \mathcal{O}_{t,\kappa-r} \}. \qquad (3.13)$$

(b)

$$\mathcal{S}_t^- \cap \mathcal{P}_{t \times t, \kappa-r} = \{ B : B = -Q_1 Q_1^T, \; Q_1 \in \mathcal{O}_{t,\kappa-r} \}.$$

Proof: Let $B \in \mathcal{S}_t^+ \cap \mathcal{P}_{t \times t, \kappa-r}$. Let $\sigma_1(B), \ldots, \sigma_t(B)$ denote the singular
values of $B$ with $\sigma_1(B) \ge \cdots \ge \sigma_t(B) \ge 0$ and let $\lambda_1(B), \ldots, \lambda_t(B)$ denote
the eigenvalues of $B$ with $|\lambda_1(B)| \ge \cdots \ge |\lambda_t(B)|$. Suppose $B$ has the spectral
decomposition

$$B = Q \Lambda Q^T \qquad (3.14)$$

where $Q \in \mathcal{O}_{t,t}$ and $\Lambda = \mathrm{diag}(\lambda_1(B), \ldots, \lambda_t(B))$. Since $B$ is symmetric,

$$\sigma_i(B) = |\lambda_i(B)| \text{ for } i = 1, \ldots, t.$$

As $B$ is positive semidefinite, $\lambda_i(B) \ge 0$, so

$$\sigma_i(B) = \lambda_i(B) \text{ for } i = 1, \ldots, t. \qquad (3.15)$$

If $B \in \mathcal{P}_{t \times t, \kappa-r}$, then by the definition of a partial isometry

$$\sigma_1(B) = \cdots = \sigma_{\kappa-r}(B) = 1; \quad \sigma_{\kappa-r+1}(B) = \cdots = \sigma_t(B) = 0.$$

Using (3.15) we therefore obtain

$$\lambda_1(B) = \cdots = \lambda_{\kappa-r}(B) = 1; \quad \lambda_{\kappa-r+1}(B) = \cdots = \lambda_t(B) = 0. \qquad (3.16)$$

Now, let $Q_1 \in \mathcal{O}_{t,\kappa-r}$ be the matrix consisting of the first $\kappa - r$ columns of
$Q$. Then from (3.14) and (3.16),

$$B = Q_1 Q_1^T$$

is an element of the right hand side of (3.13).

Conversely, let $B$ be any element of the right hand side of (3.13). Clearly,
$B$ is symmetric and positive semidefinite. Furthermore, as there exists a
$Q_1 \in \mathcal{O}_{t,\kappa-r}$ such that $B = Q_1 Q_1^T$, from Lemma 3.2.2 it follows that $B$ is a
rank $\kappa - r$ partial isometry. Thus, $B \in \mathcal{S}_t^+ \cap \mathcal{P}_{t \times t, \kappa-r}$.

The proof for Lemma 3.2.4 (b) is similar, with (3.15) replaced by

$$\sigma_i(B) = -\lambda_i(B) \text{ for } i = 1, \ldots, t. \quad \Box$$

Example 3.2.1 This shows that there exists $B \in \mathcal{S}_t \cap \mathcal{P}_{t \times t, \kappa-r}$ which cannot
be expressed as $Q_1 Q_1^T$ for some $Q_1 \in \mathcal{O}_{t,\kappa-r}$; one such matrix is

$$B = \mathrm{diag}(1, -1, 0).$$

Clearly, $B$ is a $3 \times 3$ symmetric matrix and a partial isometry of rank 2, but it
is not positive semidefinite and so is not of the form $Q_1 Q_1^T$.

Theorem 3.2.2 Let $\mathcal{P}_{m \times n, \kappa}$ be defined as in (3.11) and let $\Sigma = \begin{bmatrix} \Sigma_1 \\ 0 \end{bmatrix}$ where
$\Sigma_1 = \mathrm{diag}(\sigma)$ and $0 \in \mathbb{R}^{(m-n) \times n}$. Then

$$\max_{W \in \mathcal{P}_{m \times n, \kappa}} |\langle \Sigma, W \rangle| = \sum_{i=1}^{\kappa} \sigma_i \qquad (3.17)$$

with

$$\mathrm{argmax} \{ |\langle \Sigma, W \rangle| : W \in \mathcal{P}_{m \times n, \kappa} \} = \Big\{ W \in \mathbb{R}^{m \times n} : W = \zeta \begin{bmatrix} I_r & & \\ & \hat{W} & \\ & & 0 \end{bmatrix},$$

$$\text{where } \zeta \text{ is either 1 or } -1 \text{ and } \hat{W} \in \Psi'_{t \times t, \kappa-r} \Big\} \qquad (3.18)$$

where

$$\Psi'_{t \times t, \kappa-r} = \{ \hat{W} \in \mathcal{S}_t : \hat{W} = Z Z^T, \; Z \in \mathcal{O}_{t,\kappa-r} \}. \qquad (3.19)$$

Here the diagonal blocks of $W$ are $r \times r$, $t \times t$ and $(m - r - t) \times (n - r - t)$
matrices respectively.

Remark 3.2.5 Note that $\mathrm{tr}(\hat{W}) = \kappa - r$.

Proof: Equation (3.17) follows from the proof of Lemma 3.2.3.

Let $\zeta$ be either 1 or $-1$. If $W^*$ is any element of the right hand side of
(3.18), then from (3.5) and (3.6)

$$|\langle \Sigma, W^* \rangle| = \Big| \sum_{i=1}^{n} \sigma_i W_{ii}^* \Big| = \Big| \sum_{i=1}^{r} \sigma_i \zeta + \sigma_\kappa \zeta \, \mathrm{tr}(\hat{W}) \Big| = \sum_{i=1}^{\kappa} \sigma_i.$$

Conversely, let $W^* \in \mathrm{argmax} \{ |\langle \Sigma, W \rangle| : W \in \mathcal{P}_{m \times n, \kappa} \}$. Then $w^* =
(W_{11}^*, \ldots, W_{nn}^*)^T \in \phi_{n,\kappa}$ satisfies $\sigma^T w^* = \zeta \sum_{i=1}^{\kappa} \sigma_i$, and therefore also satisfies
the properties given on the right hand side of (3.8). Furthermore, $W^*$ has the
representation $W^* = U^* V^{*T}$ where $U^* \in \mathcal{O}_{m,\kappa}$, $V^* \in \mathcal{O}_{n,\kappa}$, $m \ge n \ge \kappa$, with

$$W_{ii}^* = \zeta \text{ for } i = 1, \ldots, r, \qquad (3.20a)$$

$$|W_{ii}^*| \le 1 \text{ for } i = r+1, \ldots, r+t, \qquad (3.20b)$$

$$W_{ii}^* = 0 \text{ for } i = r+t+1, \ldots, n \text{ and} \qquad (3.20c)$$

$$\sum_{i=r+1}^{r+t} W_{ii}^* = \zeta(\kappa - r). \qquad (3.20d)$$

Partition the rows and columns of $W^* \in \mathbb{R}^{m \times n}$ into blocks as

$$W^* = \begin{bmatrix} F_{11} & F_{12} & F_{13} \\ F_{21} & F_{22} & F_{23} \\ F_{31} & F_{32} & F_{33} \end{bmatrix}$$

where the dimensions of the square matrices $F_{ii}$ for $i = 1, 2$ are respectively
$r$ and $t$.

Also partition $U^* \in \mathcal{O}_{m,\kappa}$ and $V^* \in \mathcal{O}_{n,\kappa}$ into blocks as

$$U^* = \begin{bmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \\ C_{31} & C_{32} \end{bmatrix}, \qquad V^* = \begin{bmatrix} E_{11} & E_{12} \\ E_{21} & E_{22} \\ E_{31} & E_{32} \end{bmatrix}$$

where $C_{i1}$, $E_{i1}$ are $r$-column blocks and $C_{i2}$, $E_{i2}$ are $(\kappa - r)$-column blocks, with
$C_{11}$, $E_{11}$ of size $r \times r$ and $C_{22}$, $E_{22}$ of size $t \times (\kappa - r)$.

From (3.20a), the diagonal elements of $W^*$ are all equal to either 1 or $-1$ for
$i = 1, \ldots, r$. Then Lemma 2.3.2 (b) implies that $F_{11} = \zeta I_r$, $F_{1j} = 0$ for all
$j \ne 1$, $F_{i1} = 0$ for all $i \ne 1$, and $[C_{11} \; C_{12}] = \zeta [E_{11} \; E_{12}]$. As the diagonal
elements of $W^*$ satisfy (3.20), from Lemma 2.3.3 it follows that $F_{3j} = 0$ for
$j = 1, 2, 3$ and $F_{i3} = 0$ for $i = 1, 2, 3$. Furthermore, from (2.14),

$$\begin{bmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{bmatrix} = \zeta \begin{bmatrix} E_{11} & E_{12} \\ E_{21} & E_{22} \end{bmatrix}.$$

From (2.16) we have $C_{3j} = 0$, and from (2.17) $E_{3j} = 0$, for all $j$. Using all
these results and setting $\bar{U}^T = [C_{11} \; C_{12}] \in \mathbb{R}^{r \times \kappa}$ and $\hat{U}^T = [C_{21} \; C_{22}] \in \mathbb{R}^{t \times \kappa}$,

$$U^* = \begin{bmatrix} \bar{U}^T \\ \hat{U}^T \\ 0 \end{bmatrix}, \quad 0 \in \mathbb{R}^{(m-r-t) \times \kappa},$$

and

$$V^* = \zeta \begin{bmatrix} \bar{U}^T \\ \hat{U}^T \\ 0 \end{bmatrix}, \quad 0 \in \mathbb{R}^{(n-r-t) \times \kappa},$$

and so

$$W^* = U^* V^{*T} = \zeta \begin{bmatrix} \bar{U}^T \bar{U} & \bar{U}^T \hat{U} & 0 \\ \hat{U}^T \bar{U} & \hat{U}^T \hat{U} & 0 \\ 0 & 0 & 0 \end{bmatrix}.$$

Since $F_{11} = \zeta I_r$, it follows that $\bar{U}^T \bar{U} = I_r$. Also, $F_{12} = 0$ implies $\bar{U}^T \hat{U} = 0$,
while $F_{21} = 0$ implies $\hat{U}^T \bar{U} = 0$.

Let $\hat{W} = \hat{U}^T \hat{U}$. From (3.20d), it follows that $\mathrm{tr}(\hat{W}) = \kappa - r$. Notice that
$\hat{W}$ is symmetric, positive semidefinite and also a rank $\kappa - r$ partial isometry.
Hence, by Lemma 3.2.4 we can express $\hat{W}$ as

$$\hat{W} = Z Z^T, \quad Z \in \mathcal{O}_{t,\kappa-r}.$$

If we let $\hat{W}$ have this representation, then the matrix $W^*$ reduces to that
given in (3.18). $\Box$

Remark 3.2.6 If $\kappa = r + t$, then

$$\mathrm{argmax} \{ |\langle \Sigma, W \rangle| : W \in \mathcal{P}_{m \times n, \kappa} \} = \Big\{ W \in \mathbb{R}^{m \times n} : W = \zeta \begin{bmatrix} I_r & & \\ & I_t & \\ & & 0 \end{bmatrix}, \text{ where } \zeta \text{ is either 1 or } -1 \Big\}.$$

Recall that $X \in \mathcal{O}_{m,m}$ is a matrix of left singular vectors of $A$ and $Y \in \mathcal{O}_{n,n}$
is a matrix of right singular vectors of $A$ satisfying

$$X^T A Y = \Sigma. \qquad (3.21)$$

Let $X_1 \in \mathcal{O}_{m,r}$ be the matrix consisting of the first $r$ columns of $X$ and
let $X_2 \in \mathcal{O}_{m,t}$ be the matrix consisting of the next $t$ columns of $X$. Also,
let $Y_1 \in \mathcal{O}_{n,r}$ be the matrix consisting of the first $r$ columns of $Y$ and let
$Y_2 \in \mathcal{O}_{n,t}$ be the matrix consisting of the next $t$ columns of $Y$. By (3.5) and
(3.21),

$$X_1^T A Y_1 = \mathrm{diag}(\sigma_1, \ldots, \sigma_r); \qquad X_2^T A Y_2 = \sigma_\kappa I_t. \qquad (3.22)$$

We now characterize the matrices $W \in \mathcal{P}_{m \times n, \kappa}$ which achieve the maximum
in the characterization of $f_\kappa(A)$ given by Lemma 3.2.3.

Corollary 3.2.1 If the singular values of $A$ satisfy (3.5), then

$$\mathrm{argmax} \{ |\langle A, W \rangle| : W \in \mathcal{P}_{m \times n, \kappa} \} = \Omega_{\mathcal{P}} \cup -\Omega_{\mathcal{P}}$$

where $\Omega_{\mathcal{P}} = \{ X_1 Y_1^T + X_2 Z Z^T Y_2^T : Z \in \mathcal{O}_{t,\kappa-r} \}$ and $X_1$, $X_2$, $Y_1$, $Y_2$ satisfy
(3.22).

Proof: From the proof of Lemma 3.2.3,

$$\max_{W \in \mathcal{P}_{m \times n, \kappa}} |\langle A, W \rangle| = \max_{W \in \mathcal{P}_{m \times n, \kappa}} |\langle \Sigma, W \rangle| = \sum_{i=1}^{\kappa} \sigma_i.$$

From (3.12) and the invariance of $\mathcal{P}_{m \times n, \kappa}$ under orthogonal transformations,

$$\mathrm{argmax} \{ |\langle A, W \rangle| : W \in \mathcal{P}_{m \times n, \kappa} \} = \{ X W^* Y^T : W^* \in \Omega' \}$$

where $\Omega' = \mathrm{argmax} \{ |\langle \Sigma, W \rangle| : W \in \mathcal{P}_{m \times n, \kappa} \}$. From Theorem 3.2.2,

$$W^* = \zeta \begin{bmatrix} I_r & & \\ & Z Z^T & \\ & & 0 \end{bmatrix}$$

where $\zeta$ is either 1 or $-1$ and $Z \in \mathcal{O}_{t,\kappa-r}$, and so

$$X W^* Y^T = \zeta ( X_1 Y_1^T + X_2 Z Z^T Y_2^T )$$

from the partitioning of $X$ and $Y$. $\Box$

Below are two possible generalizations of the set $\mathcal{P}_{m \times n, \kappa}$. Let $m \ge n$, let
$\kappa \in \{1, \ldots, n\}$ and define

$$\Phi'_{m \times n, \kappa} = \{ W \in \mathbb{R}^{m \times n} : \tau_\ell \le 1 \text{ for } \ell = 1, \ldots, n, \; |\mathrm{tr}(W)| \le \kappa \}$$

and

$$\Phi''_{m \times n, \kappa} = \{ W \in \mathbb{R}^{m \times n} : \tau_\ell \le 1 \text{ for } \ell = 1, \ldots, n, \; \mathrm{atr}(W) \le \kappa \},$$

where $\tau_1, \ldots, \tau_n$ are the singular values of $W$. Both sets are compact and
convex. However, the following example illustrates that they are not suitable
for our purposes.

Example 3.2.2 Let $m = 4$, $n = 3$, $\kappa = 2$ and let

$$W = \begin{bmatrix} 0.7205 & -0.0162 & 0.6346 \\ 0.1737 & 0.3351 & 0.2083 \\ -0.6209 & 0.4215 & 0.6349 \\ 0.2554 & 0.8425 & -0.3883 \end{bmatrix}$$

with singular value decomposition $W = X \bar{\Sigma} Y^T$. The singular values of $W$ are
$\tau_1 = \tau_2 = \tau_3 = 1$ and $|\mathrm{tr}(W)| = \mathrm{atr}(W) = 1.6905 \le \kappa$. Thus, $W$ is in $\Phi'_{m \times n, \kappa}$
and also in $\Phi''_{m \times n, \kappa}$, but if $A = X \Sigma Y^T$ with singular values $\sigma_1 = 3$, $\sigma_2 = 2$ and
$\sigma_3 = 1$, then

$$|\langle A, W \rangle| = |\mathrm{tr}(A W^T)| = 6 \ge \sum_{i=1}^{\kappa} \sigma_i = 5.$$
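The claims in Example 3.2.2 can be verified numerically (a NumPy sketch using the entries of $W$ as printed above; not part of the thesis):

    import numpy as np

    W = np.array([[ 0.7205, -0.0162,  0.6346],
                  [ 0.1737,  0.3351,  0.2083],
                  [-0.6209,  0.4215,  0.6349],
                  [ 0.2554,  0.8425, -0.3883]])

    print(np.linalg.svd(W, compute_uv=False))  # approximately [1, 1, 1]
    print(np.trace(W))                         # 1.6905 = |tr(W)| = atr(W)

    # Build A = X Sigma Y^T from the singular vectors of W with
    # sigma = (3, 2, 1); then <A, W> = 3 + 2 + 1 = 6 > f_2(A) = 5.
    X, tau, Yt = np.linalg.svd(W)
    Sigma = np.zeros((4, 3))
    Sigma[:3, :3] = np.diag([3.0, 2.0, 1.0])
    A = X @ Sigma @ Yt
    print(np.trace(A @ W.T))                   # approximately 6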

3.3 Another max characterization

Now, consider another set $\Phi_{m \times n, \kappa}$ defined by

$$\Phi_{m \times n, \kappa} = \Big\{ W \in \mathbb{R}^{m \times n} : \tau_\ell \le 1 \text{ for } \ell = 1, \ldots, n, \; \sum_{\ell=1}^{n} \tau_\ell \le \kappa \Big\} \qquad (3.23)$$

where $m \ge n$, $\kappa \in \{1, \ldots, n\}$ and $\tau_1, \ldots, \tau_n$ are the singular values of $W$.

Remark 3.3.1 $\Phi_{m \times n, \kappa}$ is a compact convex set. It is also invariant under orthogonal
transformations. The convexity and orthogonal invariance of $\Phi_{m \times n, \kappa}$
can be established using the properties of the SVD given in Section 2.2.

Lemma 3.3.1 Let $A \in \mathbb{R}^{m \times n}$ have singular values $\sigma_1 \ge \cdots \ge \sigma_n > 0$ and let
$\Phi_{m \times n, \kappa}$ be defined by (3.23). Then

$$\max_{W \in \Phi_{m \times n, \kappa}} |\langle A, W \rangle| = f_\kappa(A). \qquad (3.24)$$

Proof: For any $W \in \Phi_{m \times n, \kappa}$, equation (1.1) and the properties of the Frobenius
inner product imply that

$$\langle A, W \rangle = \langle X \Sigma Y^T, W \rangle = \langle \Sigma, X^T W Y \rangle = \langle \Sigma, \tilde{W} \rangle \qquad (3.25)$$

with $\tilde{W} = X^T W Y \in \Phi_{m \times n, \kappa}$, as $\Phi_{m \times n, \kappa}$ is invariant under orthogonal transformations.
Hence

$$\max_{W \in \Phi_{m \times n, \kappa}} |\langle A, W \rangle| = \max_{\tilde{W} \in \Phi_{m \times n, \kappa}} |\langle \Sigma, \tilde{W} \rangle| = \max_{\tilde{W} \in \Phi_{m \times n, \kappa}} \Big| \sum_{i=1}^{n} \sigma_i(A) \tilde{W}_{ii} \Big|$$

$$\le \max_{\tilde{W} \in \Phi_{m \times n, \kappa}} \sum_{i=1}^{n} \sigma_i(A) |\tilde{W}_{ii}| \le \sum_{i=1}^{\kappa} \sigma_i(A) = f_\kappa(A),$$

as from parts (a) and (c) of Lemma 2.3.4, $\tilde{W} \in \Phi_{m \times n, \kappa}$ implies that $|\tilde{W}_{ii}| \le 1$
for $i = 1, \ldots, n$ and $\sum_{i=1}^{n} |\tilde{W}_{ii}| \le \kappa$.

Let $\tilde{W} = \mathrm{diag}(\tilde{W}_{11}, \ldots, \tilde{W}_{nn})$ such that $\tilde{W}_{ii} = 1$ for $i = 1, \ldots, \kappa$ and
$\tilde{W}_{ii} = 0$ otherwise. This $\tilde{W}$ is in $\Phi_{m \times n, \kappa}$ and achieves the maximum. $\Box$

Remark 3.3.2 For any $W \in \Phi_{m \times n, \kappa}$ the inner products $\langle A, W \rangle$ and $-\langle A, W \rangle$
are linear functions of $A \in \mathbb{R}^{m \times n}$. As a pointwise maximum of convex functions
is convex, the convexity of $f_\kappa(A)$ follows from the max characterization
in Lemma 3.3.1. In a similar way, the convexity of $f_\kappa(A)$ also follows
from the partial isometry characterization in Lemma 3.2.3.

We now state the main result of this section.

Theorem 3.3.1 Let $\Phi_{m \times n, \kappa}$ be defined by (3.23) and let $\Sigma = \begin{bmatrix} \Sigma_1 \\ 0 \end{bmatrix}$ where
$\Sigma_1 = \mathrm{diag}(\sigma)$ and $0 \in \mathbb{R}^{(m-n) \times n}$. Then

$$\max_{W \in \Phi_{m \times n, \kappa}} |\langle \Sigma, W \rangle| = \sum_{i=1}^{\kappa} \sigma_i \qquad (3.26)$$

with

$$\mathrm{argmax} \{ |\langle \Sigma, W \rangle| : W \in \Phi_{m \times n, \kappa} \} = \Big\{ W \in \mathbb{R}^{m \times n} : W = \zeta \begin{bmatrix} I_r & & \\ & \hat{W} & \\ & & 0 \end{bmatrix},$$

$$\zeta \text{ is either 1 or } -1 \text{ and } \hat{W} \in \Psi_{t \times t, \kappa-r} \Big\} \qquad (3.27)$$

where

$$\Psi_{t \times t, \kappa-r} = \{ \hat{W} \in \mathcal{S}_t : 0 \le \hat{W} \le I \text{ and } \mathrm{tr}(\hat{W}) = \kappa - r \}. \qquad (3.28)$$

Here the diagonal blocks of $W$ are $r \times r$, $t \times t$ and $(m - r - t) \times (n - r - t)$
matrices respectively.

Proof: Equation (3.26) follows from the proof of Lemma 3.3.1.

Let $\zeta$ be either 1 or $-1$. If $W^*$ is any element of the right hand side of
(3.27), then from (3.5) and (3.6)

$$|\langle \Sigma, W^* \rangle| = \Big| \sum_{i=1}^{n} \sigma_i W_{ii}^* \Big| = \Big| \sum_{i=1}^{r} \sigma_i \zeta + \sigma_\kappa \zeta \, \mathrm{tr}(\hat{W}) \Big| = \sum_{i=1}^{\kappa} \sigma_i.$$

Conversely, let $W^* \in \mathrm{argmax} \{ |\langle \Sigma, W \rangle| : W \in \Phi_{m \times n, \kappa} \}$. Then $W^*$ satisfies

$$\sum_{i=1}^{n} \sigma_i W_{ii}^* = \zeta \sum_{i=1}^{\kappa} \sigma_i \qquad (3.29)$$

and from the definition of $\Phi_{m \times n, \kappa}$ we have

$$\tau_\ell = \tau_\ell(W^*) \le 1 \text{ for } \ell = 1, \ldots, n$$

and

$$\sum_{\ell=1}^{n} \tau_\ell(W^*) = \kappa. \qquad (3.30)$$

From parts (a) and (c) of Lemma 2.3.4, $W \in \Phi_{m \times n, \kappa}$ implies that $|W_{ii}| \le 1$ for
$i = 1, \ldots, n$ and $\sum_{i=1}^{n} |W_{ii}| \le \kappa$. This together with (3.5) and (3.29) implies

$$W_{ii}^* = \zeta \text{ for } i = 1, \ldots, r, \qquad (3.31a)$$

$$|W_{ii}^*| \le 1 \text{ for } i = r+1, \ldots, r+t, \qquad (3.31b)$$

$$W_{ii}^* = 0 \text{ for } i = r+t+1, \ldots, n \text{ and} \qquad (3.31c)$$

$$\sum_{i=r+1}^{r+t} W_{ii}^* = \zeta(\kappa - r). \qquad (3.31d)$$

Let $W^*$ have SVD $W^* = U^* \Sigma^* V^{*T}$ where $U^* \in \mathcal{O}_{m,m}$, $V^* \in \mathcal{O}_{n,n}$ and
$\Sigma^* \in \mathbb{R}^{m \times n}$. Let the integers $p$ and $q$ be defined by

$$1 = \tau_1 = \cdots = \tau_p > \tau_{p+1} \ge \cdots \ge \tau_q > \tau_{q+1} = \cdots = \tau_n = 0 \qquad (3.32)$$

where $0 \le p \le \kappa$ and $r + 1 \le \kappa \le q \le n$.

We shall now establish that $p \ge r$ using a proof by contradiction. If we
assume that $p < r$, then by the definition of $p$, $\tau_\ell < 1$ for $\ell = p+1, \ldots, n$,
and hence by Lemma 2.3.4 (b) (iii) and (2.20), $U_{i\ell}^* = 0$ for $i = 1, \ldots, r$ and
$\ell = p+1, \ldots, m$. Thus, the first $r$ rows of $U^*$ are effectively vectors in $\mathbb{R}^p$.
If $r > p$, then these must be linearly dependent. This is a contradiction,
as the first $r$ rows of $U^*$ must be linearly independent since $U^*$ is orthogonal.
Therefore, $p \ge r$.

Partition the rows and columns of $U^*$ into blocks as

$$U^* = \begin{bmatrix} C_{11} & C_{12} & C_{13} & C_{14} \\ C_{21} & C_{22} & C_{23} & C_{24} \\ C_{31} & C_{32} & C_{33} & C_{34} \\ C_{41} & C_{42} & C_{43} & C_{44} \end{bmatrix}$$

where $C_{ii}$, $i = 1, \ldots, 4$, are $r \times p$, $t \times (q-p)$, $(n-r-t) \times (n-q)$ and
$(m-n) \times (m-n)$ matrices respectively. Furthermore, partition $\Sigma^*$ as

$$\Sigma^* = \begin{bmatrix} I_p & & \\ & \tilde{\Sigma} & \\ & & 0 \\ & 0 & \end{bmatrix}$$

where $I_p$, $\tilde{\Sigma}$ and the diagonal zero are square matrices with dimensions $p$,
$q-p$ and $n-q$ respectively. Also, partition $V^*$ as

$$V^* = \begin{bmatrix} E_{11} & E_{12} & E_{13} \\ E_{21} & E_{22} & E_{23} \\ E_{31} & E_{32} & E_{33} \end{bmatrix}$$

where $E_{ii}$, $i = 1, 2, 3$, are $r \times p$, $t \times (q-p)$ and $(n-r-t) \times (n-q)$ matrices
respectively.

Using the orthogonality of $U^*$ and (3.30),

$$\sum_{i=1}^{n} \sum_{\ell=1}^{n} \tau_\ell U_{i\ell}^{*2} = \sum_{\ell=1}^{n} \tau_\ell \sum_{i=1}^{n} U_{i\ell}^{*2} \le \sum_{\ell=1}^{n} \tau_\ell = \kappa, \qquad (3.33)$$

and using the orthogonality of $V^*$ and (3.30),

$$\sum_{i=1}^{n} \sum_{\ell=1}^{n} \tau_\ell V_{i\ell}^{*2} = \sum_{\ell=1}^{n} \tau_\ell \sum_{i=1}^{n} V_{i\ell}^{*2} = \sum_{\ell=1}^{n} \tau_\ell = \kappa. \qquad (3.34)$$

Moreover, from (3.31) it follows that $\sum_{i=1}^{n} W_{ii}^* = \zeta \kappa$. As $W_{ii}^* = \sum_{\ell=1}^{n} \tau_\ell U_{i\ell}^* V_{i\ell}^*$,
we obtain

$$\zeta \sum_{i=1}^{n} W_{ii}^* = \zeta \sum_{i=1}^{n} \sum_{\ell=1}^{n} \tau_\ell U_{i\ell}^* V_{i\ell}^* = \kappa. \qquad (3.35)$$

Now,

$$0 \le \sum_{i=1}^{n} \sum_{\ell=1}^{n} \big( \tau_\ell^{1/2} U_{i\ell}^* - \zeta \tau_\ell^{1/2} V_{i\ell}^* \big)^2 = \sum_{i=1}^{n} \sum_{\ell=1}^{n} \tau_\ell U_{i\ell}^{*2} - 2\zeta \sum_{i=1}^{n} \sum_{\ell=1}^{n} \tau_\ell U_{i\ell}^* V_{i\ell}^* + \sum_{i=1}^{n} \sum_{\ell=1}^{n} \tau_\ell V_{i\ell}^{*2}$$

$$\le \kappa - 2\kappa + \kappa = 0$$

from (3.33) to (3.35). It follows that

$$\tau_\ell^{1/2} U_{i\ell}^* = \zeta \tau_\ell^{1/2} V_{i\ell}^* \text{ for } i = 1, \ldots, n \text{ and } \ell = 1, \ldots, q,$$

i.e.,

$$C_{11} = \zeta E_{11}, \quad C_{12} = \zeta E_{12}, \quad C_{21} = \zeta E_{21}, \quad C_{22} = \zeta E_{22}, \quad C_{31} = \zeta E_{31}, \quad C_{32} = \zeta E_{32}. \qquad (3.36)$$

From (3.31a), $W_{ii}^* = \zeta$ for $i = 1, \ldots, r$. By (3.32), $\tau_\ell < 1$ for $\ell = p+1, \ldots, n$,
so that from (2.20) we obtain

$$C_{12} = 0 = E_{12} \text{ and } C_{13} = 0 = E_{13}, \qquad (3.37)$$

and from Lemma 2.3.4 (b) (iii) it follows that

$$C_{14} = 0. \qquad (3.38)$$

As $C_{12} = 0$, $C_{13} = 0$ and $C_{14} = 0$, from the orthogonality of $U^*$

$$C_{11} C_{11}^T = I_r. \qquad (3.39)$$

The orthogonality of $U^*$ also implies that

$$C_{21} C_{11}^T = 0, \quad C_{31} C_{11}^T = 0, \quad C_{41} C_{11}^T = 0. \qquad (3.40)$$

From the partitioning of $U^*$, $\Sigma^*$ and $V^*$, and (3.36) to (3.40),

$$W^* = U^* \Sigma^* V^{*T} = \zeta \begin{bmatrix} I_r & & \\ & C_{21} C_{21}^T + C_{22} \tilde{\Sigma} C_{22}^T & \\ & & C_{31} C_{31}^T + C_{32} \tilde{\Sigma} C_{32}^T \end{bmatrix}$$

where the sizes of the diagonal blocks of the $m \times n$ matrix $W^*$ are respectively
$r$, $t$ and $n - r - t$.

By (3.31c), $W_{ii}^* = 0$ for $i = r+t+1, \ldots, n$. This implies that the diagonal of

$$C_{31} C_{31}^T + C_{32} \tilde{\Sigma} C_{32}^T$$

is zero, and as $\tilde{\Sigma} > 0$ it follows that

$$C_{31} = 0 \text{ and } C_{32} = 0.$$

Therefore

$$E_{31} = 0 \text{ and } E_{32} = 0.$$

From the orthogonality of $U^*$,

$$C_{11}^T C_{11} + C_{21}^T C_{21} + C_{41}^T C_{41} = I_p,$$

while from the orthogonality of $V^*$,

$$E_{11}^T E_{11} + E_{21}^T E_{21} = I_p.$$

This implies that $C_{41} = 0$. Similarly, from the orthogonality of $U^*$ and $V^*$ it
follows that $C_{42} = 0$. Hence $W^*$ reduces to

$$W^* = \zeta \begin{bmatrix} I_r & & \\ & \hat{W} & \\ & & 0 \end{bmatrix}$$

where the diagonal blocks are $r \times r$, $t \times t$ and $(m-r-t) \times (n-r-t)$ matrices
respectively.

Let $\hat{W} = C_{21} C_{21}^T + C_{22} \tilde{\Sigma} C_{22}^T$ where $\tilde{\Sigma} > 0$. The trace condition $\mathrm{tr}(\hat{W}) = \kappa - r$
comes directly from (3.31d). It is obvious that $\hat{W} \in \mathcal{S}_t$. Also, $\hat{W}$ is a positive
semidefinite matrix as it is a sum of positive semidefinite matrices. Therefore

$$\lambda_i(\hat{W}) = \tau_i(\hat{W}) \text{ for } i = 1, \ldots, t,$$

and since the singular values of the submatrix $\hat{W}$ cannot exceed those of $W^*$,
which are at most 1, we have $0 \le \hat{W} \le I$. The matrix $W^*$ is now in the form
given in (3.27). $\Box$
Example 3.3.1 A matrix $W^* \in \mathrm{argmax} \{ |\langle \Sigma, W \rangle| : W \in \Phi_{m \times n, \kappa} \}$ is given
below. In this example $m = 6$, $n = 5$, $p = q = 3$, $r = 2$, $t = 2$ and $\kappa = 3$. The
singular values of $W^*$ are $\tau_1 = \tau_2 = \tau_3 = 1$ and $\tau_4 = \tau_5 = 0$, so $W^* = U_1 V_1^T$,
where $U_1$ and $V_1$ consist of the first three columns of the orthogonal matrices
$U^*$ and $V^*$ from the SVD of $W^*$ (the remaining columns complete $U^*$ and
$V^*$ to orthogonal matrices):

$$V_1 = \begin{bmatrix} 0.5085 & -0.7807 & 0.3632 \\ 0.6602 & 0.0827 & -0.7465 \\ 0.5514 & 0.6178 & 0.5560 \\ -0.0395 & -0.0442 & -0.0398 \\ 0 & 0 & 0 \end{bmatrix},$$

with $U_1$ equal to $V_1$ with a sixth zero row appended, and

$$W^* = \begin{bmatrix} 1 & & & & \\ & 1 & & & \\ & & 0.9949 & -0.0712 & \\ & & -0.0712 & 0.0051 & \\ & & & & 0 \\ & & & & \end{bmatrix} = \begin{bmatrix} I_2 & & \\ & \hat{W} & \\ & & 0 \end{bmatrix}.$$

Notice that

$$\sum_{i=r+1}^{r+t} W_{ii}^* = 1 = \kappa - r = \mathrm{tr}(\hat{W}).$$

Corollary 3.3.1 If the singular values of $A$ satisfy (3.5), then

$$\mathrm{argmax} \{ |\langle A, W \rangle| : W \in \Phi_{m \times n, \kappa} \} = \Omega \cup -\Omega \qquad (3.41)$$

where $\Omega = \{ X_1 Y_1^T + X_2 \hat{W} Y_2^T : \hat{W} \in \Psi_{t \times t, \kappa-r} \}$, $\Psi_{t \times t, \kappa-r}$ is defined in (3.28)
and $X_1$, $X_2$, $Y_1$, $Y_2$ satisfy (3.22).

Proof: In the proof of Lemma 3.3.1 it was shown that

$$\max_{W \in \Phi_{m \times n, \kappa}} |\langle A, W \rangle| = \max_{W \in \Phi_{m \times n, \kappa}} |\langle \Sigma, W \rangle| = \sum_{i=1}^{\kappa} \sigma_i.$$

From (3.25) and the invariance of $\Phi_{m \times n, \kappa}$ under orthogonal transformations,

$$\mathrm{argmax} \{ |\langle A, W \rangle| : W \in \Phi_{m \times n, \kappa} \} = \{ X W^* Y^T : W^* \in \Omega'' \}$$

where $\Omega'' = \mathrm{argmax} \{ |\langle \Sigma, W \rangle| : W \in \Phi_{m \times n, \kappa} \}$. From Theorem 3.3.1,

$$W^* = \zeta \begin{bmatrix} I_r & & \\ & \hat{W} & \\ & & 0 \end{bmatrix}, \quad \hat{W} \in \Psi_{t \times t, \kappa-r},$$

so that

$$X W^* Y^T = \zeta ( X_1 Y_1^T + X_2 \hat{W} Y_2^T )$$

from the partitioning of $X$ and $Y$. $\Box$
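Corollary 3.3.1 can be spot-checked numerically: for a matrix with a multiple $\kappa$th singular value, any $X_1 Y_1^T + X_2 \hat{W} Y_2^T$ with $\hat{W} \in \Psi_{t \times t, \kappa-r}$ attains $f_\kappa(A)$. A NumPy sketch (illustrative only; the specific $A$ below is chosen so that $\sigma_2 = 1$ has multiplicity 3, giving $r = 1$, $t = 3$ for $\kappa = 2$):

    import numpy as np

    m, n, kappa = 6, 5, 2
    A = np.zeros((m, n))
    A[:n, :n] = np.diag([2.0, 1.0, 1.0, 1.0, 0.5])  # sigma_2 = 1, multiplicity 3
    X, s, Yt = np.linalg.svd(A)
    Y = Yt.T
    r, t = 1, 3
    X1, X2 = X[:, :r], X[:, r:r + t]
    Y1, Y2 = Y[:, :r], Y[:, r:r + t]

    rng = np.random.default_rng(8)
    Z = np.linalg.qr(rng.standard_normal((t, kappa - r)))[0]
    # Two elements of Psi: an extreme point Z Z^T and the "centre" (1/t) tr I.
    for What in (Z @ Z.T, np.eye(t) * (kappa - r) / t):
        W = X1 @ Y1.T + X2 @ What @ Y2.T
        print(np.isclose(np.trace(A @ W.T), s[:kappa].sum()))  # True, f_2(A) = 3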

Remark 3.3.3 Either $W \in \Phi_{m\times n,\kappa}$ or $\bar W \in \Psi_{t\times t,\kappa-r}$ may be referred to as a "dual" matrix. The distinction between $W$ and $\bar W$ is analogous to the question of whether or not to assign zero Lagrange multipliers to inactive constraints in a nonlinear program.

Remark 3.3.4 The degrees of freedom in the argmax result of Corollary 3.3.1 are parametrized by a symmetric matrix $\bar W$. The trace condition on $\bar W$ gives a linear equation while the positive semidefinite constraint gives eigenvalue inequalities. Furthermore, $\Phi_{m\times n,\kappa}$ is a convex set. Thus, the argmax result leads to a minimal representation of the subdifferential of $f_\kappa$ so that a nonsingular system can be solved to verify optimality. The argmax result for the partial isometry characterization of $f_\kappa(A)$ given by Corollary 3.2.1 is less useful because it involves orthogonal matrices. Also, the set $P_{m\times n,\kappa}$ is not convex.

Remark 3.3.5 The set $\Psi_{t\times t,\kappa-r}$ is the convex hull of the set $\hat\Psi_{t\times t,\kappa-r}$ (defined by (3.19)), and $\hat\Psi_{t\times t,\kappa-r}$ is the set of extreme points of $\Psi_{t\times t,\kappa-r}$ (see Overton and Womersley [25], Theorem 3).

Remark 3.3.6 If $\kappa = r + t$, i.e., $\sigma_\kappa > \sigma_{\kappa+1}$, then $\bar W = I_t$ and $\Omega$ reduces to the single matrix $X_1Y_1^T + X_2Y_2^T$.

Remark 3.3.7 When $\kappa = 1$ the condition $\bar W \le I$ is unnecessary as it is implied by $\bar W \ge 0$, $\operatorname{tr}(\bar W) = 1$.

We now consider the situation where $\sigma_\kappa = 0$. For this case, the appropriate ordering of the singular values of A is

$$\sigma_1 \ge \cdots \ge \sigma_r > \sigma_{r+1} = \cdots = \sigma_\kappa = \cdots = \sigma_n = 0$$

where $n = r + t$.

Remark 3.3.8 The multiplicity t of $\sigma_\kappa$ is $n - r$.

Lemma 3.3.2 Let $\Phi_{m\times n,\kappa}$ be defined by (3.23) and let

$$\Sigma = \begin{bmatrix} \Sigma_1 \\ 0 \end{bmatrix}$$

where $\Sigma_1 = \operatorname{diag}(\sigma)$ and $0 \in \mathbb{R}^{(m-n)\times n}$. Then

$$\operatorname{argmax}\,\{|\langle \Sigma, W\rangle| : W \in \Phi_{m\times n,\kappa}\} = \left\{ W \in \mathbb{R}^{m\times n} : W = \zeta \begin{bmatrix} I_r & 0 \\ 0 & \bar W \\ 0 & 0 \end{bmatrix},\ \zeta \text{ is either } 1 \text{ or } -1 \text{ and } \bar W \in \Psi_{t\times t,\kappa-r} \right\}$$

where

$$\Psi_{t\times t,\kappa-r} = \{\bar W \in S_t : 0 \le \bar W \le I \text{ and } \operatorname{tr}(\bar W) = \kappa - r\}.$$

Here $I_r$ and $\bar W$ are r × r and t × t matrices respectively and $0 \in \mathbb{R}^{(m-n)\times n}$.

Proof: The proof is similar to the proof of Theorem 3.3.1 with equations (3.31) replaced by the equations

$$w_{ii} = \zeta \quad \text{for } i = 1, \dots, r,$$
$$|w_{ii}| \le 1 \quad \text{for } i = r + 1, \dots, n, \quad \text{and}$$
$$\sum_{i=r+1}^{n} w_{ii} = \zeta(\kappa - r),$$

and with U*, $\tilde\Sigma^*$ and V* from the SVD of W* partitioned so that $C_{11}$, $C_{22}$ and $C_{33}$ are r × p, t × (q − p) and (m − n) × (m − q) matrices respectively;

$$\tilde\Sigma^* = \begin{bmatrix} I_p & 0 \\ 0 & \tilde\Sigma \end{bmatrix},$$

where $I_p$ and $\tilde\Sigma$ are p × p and (q − p) × (n − p) matrices respectively; and $E_{11}$ and $E_{22}$ are r × p and t × (n − p) matrices respectively. □

Chapter 4

Differential properties

4.1 Introduction

In this chapter, we study the differential properties of singular value sums.


Let "' E { 1, ... , n}. In Section 3.3, we established that for the function
K

JK(A) = L o"i(A),
i=l

the following cha.ra.cteriza.tion holds:

JK(A) = max l(A, W)I.


WE~mxn,"

Moreover, we identified the matrices which achieved the maximum in this characterization. In Section 4.2, we show that this leads to a concise characterization of the subdifferential of $f_\kappa$. Subsequently, we focus on the convex composite function

$$f_\kappa(x) = f_\kappa(A(x)),$$

where $x \in \mathbb{R}^\ell$ and $A : \mathbb{R}^\ell \to \mathbb{R}^{m\times n}$ is a continuously differentiable matrix-valued function. Section 4.3 discusses the generalized gradient of $f_\kappa(x)$, Section 4.4 gives necessary conditions for x to minimize $f_\kappa$ locally, Section 4.5 gives a formula for the directional derivative and Section 4.6 considers the generation of descent directions by splitting multiple singular values.

4.2 The subdifferential of $f_\kappa(A)$

Lemma 4.2.1 The function $f_\kappa : \mathbb{R}^{m\times n} \to \mathbb{R}$ is convex and its subdifferential $\partial f_\kappa(A)$ is the nonempty compact convex set

$$\partial f_\kappa(A) = \{X_1Y_1^T + X_2\bar W Y_2^T : \bar W \in \Psi_{t\times t,\kappa-r}\} \qquad (4.1)$$

where $\Psi_{t\times t,\kappa-r}$ is defined by (3.28) and $X_1$, $X_2$, $Y_1$, $Y_2$ satisfy (3.22). Furthermore, $f_\kappa$ is differentiable at the point A if and only if $\kappa = r + t$, in which case $\partial f_\kappa(A)$ reduces to $X_1Y_1^T + X_2Y_2^T$, the derivative of $f_\kappa$ at A.

Proof: By Lemma 3.3.1 the function $f_\kappa$ is the support function for $\Phi_{m\times n,\kappa}$, and is therefore convex.

From Corollary 23.5.3 of Rockafellar [27], the subdifferential of a function defined as a pointwise maximum of a set of linear functions is the convex hull of the gradients of the linear functions achieving the maximum at the given point. As $f_\kappa$ is a finite-valued function, the subdifferential is a nonempty compact convex set in $\mathbb{R}^{m\times n}$, i.e.,

$$\partial f_\kappa(A) = \operatorname{conv}\{\zeta W : W \in \Omega_1\}$$

where "conv" denotes convex hull, $\Omega_1 = \operatorname{argmax}\,\{|\langle A, W\rangle| : W \in \Phi_{m\times n,\kappa}\}$ and

$$\zeta = \operatorname{sgn}(\langle A, W\rangle), \quad W \in \Omega_1. \qquad (4.2)$$

From Corollary 3.3.1,

$$\Omega_1 = \Omega \cup (-\Omega).$$

The properties of the Frobenius inner product and (3.22) imply that

$$\langle A, W\rangle = \zeta\left[\langle A, X_1Y_1^T\rangle + \langle A, X_2\bar W Y_2^T\rangle\right] = \zeta\left[\sum_{i=1}^{r}\sigma_i + \sigma_\kappa \operatorname{tr}(\bar W)\right] = \zeta f_\kappa(A),$$

so $\zeta$ in Lemma 3.3.1 is given by (4.2). Note that $f_\kappa(A) = 0$ if and only if $A = 0$.

Equation (4.1) follows from Corollary 3.3.1 as the convex hull of a convex set is the set itself. Clearly, the right-hand side of (4.1) is a singleton if and only if $\kappa = r + t$. The last part of the result then follows from Theorem 25.1 of Rockafellar [27]. □
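The subdifferential formula (4.1) can be checked numerically. The following sketch, assuming NumPy and random data (the construction of A with a prescribed multiple singular value is purely illustrative), builds one element of the set (4.1) and spot-checks the subgradient inequality $f_\kappa(B) \ge f_\kappa(A) + \langle G, B - A\rangle$:

```python
import numpy as np

rng = np.random.default_rng(0)

def f_kappa(A, kappa):
    return np.linalg.svd(A, compute_uv=False)[:kappa].sum()

# A with a multiple singular value straddling kappa: sigma = (3, 2, 2, 2, 1),
# so r = 1, t = 3; take kappa = 2, hence kappa - r = 1.
m, n, kappa, r, t = 6, 5, 2, 1, 3
U, _ = np.linalg.qr(rng.standard_normal((m, m)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
S = np.zeros((m, n)); S[:n, :n] = np.diag([3.0, 2.0, 2.0, 2.0, 1.0])
A = U @ S @ V.T

X, s, Vt = np.linalg.svd(A)
Y = Vt.T
X1, Y1 = X[:, :r], Y[:, :r]
X2, Y2 = X[:, r:r+t], Y[:, r:r+t]

# Any Wbar in S_t with 0 <= Wbar <= I and tr(Wbar) = kappa - r is admissible.
z = rng.standard_normal(t); z /= np.linalg.norm(z)
Wbar = np.outer(z, z)                    # eigenvalues {1, 0, 0}, trace 1
G = X1 @ Y1.T + X2 @ Wbar @ Y2.T         # an element of (4.1)

for _ in range(5):                       # subgradient inequality spot checks
    B = A + rng.standard_normal((m, n))
    assert f_kappa(B, kappa) >= f_kappa(A, kappa) + np.sum(G * (B - A)) - 1e-8
print("subgradient inequality holds")
```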

4.3 The generalized gradient of $f_\kappa(x)$

Let $A : \mathbb{R}^\ell \to \mathbb{R}^{m\times n}$ be a smooth (at least once continuously differentiable) matrix-valued function whose partial derivative with respect to $x_k$ is

$$A_k(x) = \frac{\partial A(x)}{\partial x_k} \quad \text{for } k = 1, \dots, \ell.$$

In this section we characterize the generalized gradient of the convex composite function

$$f_\kappa(x) = f_\kappa(A(x)).$$

Although $A_k$ and the singular values and singular vectors of $A(x)$ are functions of $x \in \mathbb{R}^\ell$, the explicit dependence on x will usually be omitted. Therefore, as before the singular values of $A(x)$ are denoted by (3.5), with r and t now dependent on x, and with the corresponding left and right singular vectors satisfying (3.22).

Lemma 4.3.1 The generalized gradient of $f_\kappa(x)$ is given by

$$\partial f_\kappa(x) = \{w \in \mathbb{R}^\ell : \exists\, \bar W \in S_t \text{ with } 0 \le \bar W \le I,\ \operatorname{tr}(\bar W) = \kappa - r, \text{ and } w_k = \operatorname{tr}(X_1^T A_k Y_1) + \langle X_2^T A_k Y_2, \bar W\rangle,\ k = 1, \dots, \ell\}. \qquad (4.3)$$

Proof: Since $f_\kappa(A)$ is convex and $A(x)$ is smooth, the chain rule of Theorem 2.3.10 from Clarke [3] implies that

$$\partial f_\kappa(x) = \{w \in \mathbb{R}^\ell : w_k = \langle A_k, W'\rangle,\ k = 1, \dots, \ell, \text{ where } W' \in \partial f_\kappa(A(x))\}.$$

Lemma 4.2.1 and the properties of the Frobenius inner product complete the proof. □

Remark 4.3.1 Since by Lemma 3.3.1

$$f_\kappa(x) = \max_{W \in \Phi_{m\times n,\kappa}} |\langle A(x), W\rangle|,$$

the result also follows from Theorem 2.8.6 of Clarke [3] which characterizes the generalized gradients of functions defined by a pointwise maximum. From the Clarke characterization

$$\partial f_\kappa(x) = \operatorname{conv}\{w \in \mathbb{R}^\ell : w_k = \langle A_k, W'\rangle,\ k = 1, \dots, \ell, \text{ where } \langle A(x), W'\rangle = f_\kappa(x) \text{ and } W' \in \Phi_{m\times n,\kappa}\}.$$

Equation (4.3) follows from Lemma 3.3.1 as the maximizing set is already convex.

Remark 4.3.2 The form of the generalized gradient given by Lemma 4.3.1
is computationally convenient as it does not involve taking a convex hull.
The absence of the convex hull operation also means that the structure of the
subdifferential is displayed.

Corollary 4.3.1 If $\kappa = r + t$, i.e., $\sigma_\kappa > \sigma_{\kappa+1}$, the function $f_\kappa$ is differentiable at x with

$$\frac{\partial f_\kappa(x)}{\partial x_k} = \operatorname{tr}(X_1^T A_k Y_1) + \operatorname{tr}(X_2^T A_k Y_2).$$

Proof: It follows from Lemma 4.3.1 using the ordinary chain rule. □
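The gradient formula of Corollary 4.3.1 is easy to confirm against finite differences. A minimal sketch, assuming NumPy and random data (generically the singular values of A(x) are distinct, so $\kappa = r + t$ and $f_\kappa$ is differentiable):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, ell, kappa = 5, 4, 3, 2
A0 = rng.standard_normal((m, n))
Aks = [rng.standard_normal((m, n)) for _ in range(ell)]

def A(x):                                  # smooth (affine) matrix function
    return A0 + sum(xk * Ak for xk, Ak in zip(x, Aks))

def f(x):                                  # sum of the kappa largest singular values
    return np.linalg.svd(A(x), compute_uv=False)[:kappa].sum()

x = rng.standard_normal(ell)
X, s, Vt = np.linalg.svd(A(x)); Y = Vt.T
# tr(X1^T Ak Y1) + tr(X2^T Ak Y2) equals tr(X12^T Ak Y12), where X12, Y12
# hold the first kappa = r + t left and right singular vectors.
X12, Y12 = X[:, :kappa], Y[:, :kappa]
grad = np.array([np.trace(X12.T @ Ak @ Y12) for Ak in Aks])

h = 1e-6                                   # central-difference comparison
fd = np.array([(f(x + h*e) - f(x - h*e)) / (2*h) for e in np.eye(ell)])
print(np.max(np.abs(grad - fd)))           # small, e.g. ~1e-9
```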

4.4 Necessary conditions

The first-order necessary condition for x to be a local minimizer of $f_\kappa$ is $0 \in \partial f_\kappa(x)$ from Proposition 2.3.2 of Clarke [3], i.e., there exists $\bar W \in S_t$ such that

$$0 \le \bar W \le I, \quad \operatorname{tr}(\bar W) = \kappa - r, \qquad (4.4)$$

and

$$\operatorname{tr}(X_1^T A_k Y_1) + \langle X_2^T A_k Y_2, \bar W\rangle = 0, \quad k = 1, \dots, \ell. \qquad (4.5)$$

These two conditions are computationally very useful as one can relax the inequalities on $\bar W$ and solve (4.5) together with $\operatorname{tr}(\bar W) = \kappa - r$ for $\bar W$. This requires solving a system of $\ell + 1$ linear equations for the $t(t+1)/2$ unknowns in the symmetric matrix $\bar W$. If the inequalities $0 \le \bar W \le I$ are not satisfied then a descent direction may be generated. This is discussed in Section 4.6.

If $f_\kappa$ is convex (for example if $A(x)$ is affine), then equations (4.4) and (4.5) are both necessary and sufficient for x to be a minimizer of $f_\kappa$.
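One way to organize this computation is sketched below, assuming NumPy; the quantities $b_k$ and $B_k$ are random stand-ins for the values computed from A(x) at the current point. The symmetric unknown $\bar W$ is vectorized so that the Frobenius inner products in (4.5) become ordinary dot products:

```python
import numpy as np

rng = np.random.default_rng(2)
t, ell, kappa_minus_r = 3, 5, 2

# Random stand-ins for b_k = tr(X1^T A_k Y1) and for the symmetric parts
# B_k of X2^T A_k Y2, all evaluated at the current point x.
Bks = [(M + M.T) / 2 for M in rng.standard_normal((ell, t, t))]
b = rng.standard_normal(ell)

iu = np.triu_indices(t)
scale = np.where(iu[0] == iu[1], 1.0, np.sqrt(2.0))

def svec(S):
    """Vectorize the upper triangle so that <S1, S2> = svec(S1) @ svec(S2)."""
    return S[iu] * scale

# Rows: tr(W) = kappa - r, then <B_k, W> = -b_k for k = 1, ..., ell  (cf. (4.5)).
M = np.vstack([svec(np.eye(t))] + [svec(Bk) for Bk in Bks])
rhs = np.concatenate([[kappa_minus_r], -b])
w, *_ = np.linalg.lstsq(M, rhs, rcond=None)   # ell + 1 equations, t(t+1)/2 unknowns

W = np.zeros((t, t)); W[iu] = w / scale
W = W + W.T - np.diag(np.diag(W))             # rebuild the symmetric dual matrix
lam = np.linalg.eigvalsh(W)
print("eigenvalues of the dual matrix:", lam)
print("first-order conditions hold" if lam.min() >= -1e-9 and lam.max() <= 1 + 1e-9
      else "0 not in the subdifferential: generate a descent direction (Section 4.6)")
```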

4.5 The directional derivative

As $f_\kappa(x)$ is a convex composite function the standard one-sided directional derivative at x in a direction $d \in \mathbb{R}^\ell$ exists and is obtained by composing the directional derivatives of $f_\kappa(A)$ and $A(x)$. By Theorem 23.4 from [27],

$$f_\kappa'(x; d) = \lim_{\alpha\to 0^+} \frac{f_\kappa(x + \alpha d) - f_\kappa(x)}{\alpha} = f_\kappa'(A(x); A'(x; d)) = \max_{W \in \partial f_\kappa(A)} \left\langle W, \sum_{k=1}^{\ell} d_k A_k \right\rangle \qquad (4.6)$$

$$= \max_{\bar W \in \Psi_{t\times t,\kappa-r}} \left[ \operatorname{tr}\!\left(X_1^T \Big(\sum_{k=1}^{\ell} d_k A_k\Big) Y_1\right) + \left\langle X_2^T \Big(\sum_{k=1}^{\ell} d_k A_k\Big) Y_2, \bar W \right\rangle \right] \qquad (4.7)$$

where (4.7) follows from Lemma 4.2.1. Recall that the matrices $X_1$, $X_2$, $Y_1$ and $Y_2$ defined by (3.22) are evaluated at the point x, and that $A_k$ is the partial derivative of $A(x)$ with respect to $x_k$ evaluated at the point x. For $k = 1, \dots, \ell$ define

$$b_k = \operatorname{tr}(X_1^T A_k Y_1), \qquad (4.8)$$

$$B_k = \tfrac{1}{2}(X_2^T A_k Y_2 + Y_2^T A_k^T X_2) \qquad (4.9)$$

and

$$K_k = \tfrac{1}{2}(X_2^T A_k Y_2 - Y_2^T A_k^T X_2).$$

Also, define $B(d) \in S_t$ by

$$B(d) = \sum_{k=1}^{\ell} d_k B_k. \qquad (4.10)$$

Note that $X_2^T A_k Y_2 = B_k + K_k$ where $B_k \in S_t$ and $K_k \in K_t$. As $\bar W$ is symmetric, from the properties of the Frobenius inner product,

$$\langle X_2^T A_k Y_2, \bar W\rangle = \langle B_k, \bar W\rangle.$$

Therefore from (4.7), (4.8) and (4.10),

$$f_\kappa'(x; d) = b^T d + \max_{\bar W \in \Psi_{t\times t,\kappa-r}} \langle \bar W, B(d)\rangle.$$

Let the eigenvalues of the symmetric matrix $B(d)$ be $\beta_1 \ge \cdots \ge \beta_t$. Then from Theorem 3.4 of Overton and Womersley [24], it follows that

$$f_\kappa'(x; d) = b^T d + \sum_{i=1}^{\kappa - r} \beta_i. \qquad (4.11)$$

Example 4.5.1 Reconsider Example 1.0.1 where r = 1 and κ = 3. If x = 0 and d = 1, then $b^T d = 0.7098$ and $B(d)$ has eigenvalues $\beta_1 = 1.5129$, $\beta_2 = -0.0176$ and $\beta_3 = -0.8353$, so that (4.11) gives $f_\kappa'(0; +1) = 2.2051$. If x = 0 and d = −1, then $b^T d = -0.7098$ and $B(d)$ has eigenvalues $\beta_1 = 0.8353$, $\beta_2 = 0.0176$ and $\beta_3 = -1.5129$. Again from (4.11), we obtain $f_\kappa'(0; -1) = 0.1431$. For comparison, the definition

$$f_\kappa'(x; d) = \lim_{\alpha\to 0^+} \frac{f_\kappa(x + \alpha d) - f_\kappa(x)}{\alpha}$$

with $\alpha = 1\times 10^{-7}$ gave the approximations $f_\kappa'(0; +1) = 2.2051$ and $f_\kappa'(0; -1) = 0.1431$.
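The computation of Example 4.5.1 can be reproduced in outline as follows; this sketch assumes NumPy and uses random matrices $A_k$ rather than the data of Example 1.0.1, with $A_0$ chosen so that $\sigma_\kappa$ has multiplicity t = 2 at x = 0:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, ell, kappa, r, t = 4, 3, 2, 2, 1, 2

# A(x) affine with a double singular value at x = 0: sigma = (3, 2, 2).
A0 = np.zeros((m, n)); A0[:n, :n] = np.diag([3.0, 2.0, 2.0])
Aks = [rng.standard_normal((m, n)) for _ in range(ell)]
A = lambda x: A0 + sum(xk * Ak for xk, Ak in zip(x, Aks))
f = lambda x: np.linalg.svd(A(x), compute_uv=False)[:kappa].sum()

x0 = np.zeros(ell)
X, s, Vt = np.linalg.svd(A(x0)); Y = Vt.T
X1, Y1 = X[:, :r], Y[:, :r]
X2, Y2 = X[:, r:r+t], Y[:, r:r+t]

d = rng.standard_normal(ell)
b = np.array([np.trace(X1.T @ Ak @ Y1) for Ak in Aks])          # (4.8)
Bd = sum(dk * (X2.T @ Ak @ Y2 + Y2.T @ Ak.T @ X2) / 2           # (4.9) and (4.10)
         for dk, Ak in zip(d, Aks))
beta = np.sort(np.linalg.eigvalsh(Bd))[::-1]
fprime = b @ d + beta[:kappa - r].sum()                         # formula (4.11)

alpha = 1e-7                                                    # one-sided difference
print(fprime, (f(x0 + alpha * d) - f(x0)) / alpha)              # should agree
```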

4.6 Splitting multiple singular values

In this section, for given x, we wish to either (a) generate a descent direction for $f_\kappa$, or (b) demonstrate that x satisfies the first-order conditions for optimality. If $\kappa = r + t$, then $f_\kappa(x)$ is differentiable; consequently it is sufficient to examine the gradient, which has entries given by (4.3). If the gradient is zero, the first-order optimality conditions hold; otherwise, the negative gradient provides a descent direction. The function $f_\kappa$ may be nonsmooth for $\kappa < r + t$. We consider only this case in the remainder of this section. The steepest descent direction is not of interest as it is known that the method of steepest descent may converge to a non-optimal point when applied to a nonsmooth function. Instead, we consider a descent direction which maintains the multiplicity t of $\sigma_\kappa$, to first order, when possible. This is possible in the first of the following three cases. In the second case, generation of a descent direction requires splitting a group of singular values corresponding to $\sigma_\kappa$. The third case is a degenerate case.
Case 1. $I \in \operatorname{span}\{B_1, \dots, B_\ell\}$.

Solve the system

$$\delta I - \sum_{k=1}^{\ell} d_k B_k = 0 \qquad (4.12)$$

$$(\kappa - r)\delta + \sum_{k=1}^{\ell} d_k b_k = -1. \qquad (4.13)$$

This is a system of $t(t + 1)/2 + 1$ linear equations in $\ell + 1$ unknowns $\delta, d_1, \dots, d_\ell$. Equation (4.12) implies that the eigenvalues of $B(d)$ defined by (4.10) are all equal to $\delta$. The system is solvable since (4.12) is solvable for any $\delta$ by assumption, and (4.13) scales this solution. Hence, from equations (4.11) and (4.13), $f_\kappa'(x; d) = -1$, where the direction $d \in \mathbb{R}^\ell$ has components $d_1, \dots, d_\ell$. Note that the −1 on the right-hand side of (4.13) is just a normalization constant and can be replaced by any $\eta < 0$, giving $f_\kappa'(x; d) = \eta$. To first order, all the singular values $\sigma_{r+1}(x), \dots, \sigma_{r+t}(x)$ decrease at the same rate along d, and $\delta$ gives a first-order estimate of the change in their common value.
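A sketch of the Case 1 computation, assuming NumPy; the data are random stand-ins with $B_1 = I$ forced so that $I \in \operatorname{span}\{B_1, \dots, B_\ell\}$ and Case 1 applies:

```python
import numpy as np

rng = np.random.default_rng(4)
t, ell, kappa_minus_r = 2, 4, 1

# Stand-in data with I in span{B_1, ..., B_ell}: force B_1 = I.
Bks = [np.eye(t)] + [(M + M.T) / 2 for M in rng.standard_normal((ell - 1, t, t))]
b = rng.standard_normal(ell)

# Unknowns (delta, d_1, ..., d_ell); rows: (4.12) on the upper triangle,
# t(t+1)/2 equations, then the normalization (4.13).
iu = np.triu_indices(t)
rows = []
for i, j in zip(*iu):
    rows.append([1.0 if i == j else 0.0] + [-Bk[i, j] for Bk in Bks])
rows.append([kappa_minus_r] + list(b))
rhs = np.zeros(len(rows)); rhs[-1] = -1.0

sol, *_ = np.linalg.lstsq(np.array(rows), rhs, rcond=None)
delta, d = sol[0], sol[1:]

Bd = sum(dk * Bk for dk, Bk in zip(d, Bks))
print("eigenvalues of B(d):", np.linalg.eigvalsh(Bd))   # all equal to delta
print("f'(x; d) =", b @ d + kappa_minus_r * delta)      # equals -1 by (4.11)
```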
Case 2. Case 1 does not apply and the span of the $\ell + 1$ vectors in $\mathbb{R}^{t(t+1)/2}$ associated with $I, B_1, \dots, B_\ell$ has the maximum dimension $t(t + 1)/2$.

Solve the linear system

$$\operatorname{tr}(\bar W) = \kappa - r \qquad (4.14)$$

$$-\langle \bar W, B_k\rangle = b_k, \quad k = 1, \dots, \ell, \qquad (4.15)$$

for the dual matrix $\bar W \in S_t$. Notice that the trace condition (4.14) is equivalent to $\langle I, \bar W\rangle = \kappa - r$. Since the $\{B_k\}$ may not form a linearly independent set, (4.15) may be replaced by considering only a maximal independent set of $\{B_k\}$. The resulting system cannot be inconsistent because of the related definitions of the left and right-hand sides (i.e. $B_k$ and $b_k$). By the rank assumption, the resulting linear system is square and nonsingular, with order $t(t + 1)/2$, and has a unique solution $\bar W$. If $\bar W$ satisfies $0 \le \bar W \le I$ then $0 \in \partial f_\kappa(x)$, so x satisfies the first-order necessary conditions for a minimum. If these inequalities on $\bar W$ are not satisfied then a descent direction can be generated using the following lemma. This lemma shows the importance of the eigenvalues of the t by t dual matrix $\bar W$.

Lemma 4.6.1 Suppose (4.14) and (4.15) are satisfied but $0 \notin \partial f_\kappa(x)$, so $\bar W$ has an eigenvalue $\theta$ outside $[0, 1]$. Let $z \in \mathbb{R}^t$ be the corresponding normalized eigenvector of $\bar W$. Choose $\beta \in \mathbb{R}$ so that $\beta < 0$ if $\theta > 1$ and $\beta > 0$ if $\theta < 0$. Solve

$$\delta I - \sum_{k=1}^{\ell} d_k B_k = \beta z z^T.$$

Then $d = [d_1, \dots, d_\ell]^T$ is a descent direction.

Proof: The proof is omitted because it is almost identical to the proof of Theorem 3.13 of Overton and Womersley [24]. □

Remark 4.6.1 The descent direction splits the multiple singular value into two clusters, one of unit multiplicity and one of multiplicity t − 1, to first order. This is analogous to moving off only one active constraint at a time in linear or nonlinear programming. A descent direction is generated using information provided by the dual matrix $\bar W$. Negative Lagrange multipliers provide similar information in constrained optimization. Lemma 4.6.1 guarantees an overall reduction in $f_\kappa$ as follows: if $\theta < 0$, then one singular value in the group of multiplicity t is separated from the others by a reduction, reducing the approximate multiplicity but leaving the number of singular values larger than $\sigma_\kappa$, to first order, unchanged; if $\theta > 1$, then one singular value in the group of multiplicity t is separated from the others by an increase, again reducing the approximate multiplicity but increasing the number of larger singular values (to first order).
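The construction of Lemma 4.6.1 is illustrated below, assuming NumPy; the dual matrix $\bar W$ is a made-up example with an eigenvalue $\theta > 1$, and the $B_k$ are random stand-ins. The eigenvalues of the resulting $B(d)$ show the split described in Remark 4.6.1:

```python
import numpy as np

rng = np.random.default_rng(5)
t, ell = 2, 5

# A made-up dual matrix from (4.14)-(4.15) with an eigenvalue outside [0, 1].
Wbar = np.array([[1.3, 0.2],
                 [0.2, 0.4]])
theta_all, Z = np.linalg.eigh(Wbar)
k = int(np.argmax((theta_all < 0) | (theta_all > 1)))   # offending eigenvalue
theta, z = theta_all[k], Z[:, k]                        # here theta > 1
beta = -1.0 if theta > 1 else 1.0                       # sign rule of the lemma

# Solve delta*I - sum_k d_k B_k = beta * z z^T, as in Case 1.
Bks = [(M + M.T) / 2 for M in rng.standard_normal((ell, t, t))]
iu = np.triu_indices(t)
rows, rhs = [], []
for i, j in zip(*iu):
    rows.append([1.0 if i == j else 0.0] + [-Bk[i, j] for Bk in Bks])
    rhs.append(beta * z[i] * z[j])
sol, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
delta, d = sol[0], sol[1:]

Bd = sum(dk * Bk for dk, Bk in zip(d, Bks))
print("eigenvalues of B(d):", np.linalg.eigvalsh(Bd))
# One eigenvalue is delta - beta (the split value); the other t-1 equal delta.
```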

Case 3. Neither of Cases 1 and 2 applies. Degeneracy is said to occur in this case. Generation of a descent direction is not straightforward in the degenerate case, just as in linear or nonlinear programming.

Chapter 5

Conclusions

5.1 Introduction

If $A \in \mathbb{R}^{m\times n}$ with $m \ge n$, then the squares of the singular values of A are equal to the eigenvalues of the symmetric matrix $A^TA$, i.e.,

$$\sigma_i^2(A) = \lambda_i(A^TA) \quad \text{for } i = 1, \dots, n. \qquad (5.1)$$

Therefore, we can apply the eigenvalue results of Overton and Womersley [24] to $A^TA$ and simplify these to obtain results on singular values of A.

In Section 5.2, we give some indication of how this may be done. We also derive an expression for $B_k$ (defined by (4.9)). In Section 5.3, we make some concluding remarks on possible extensions of the work done in this thesis.

5.2 Relation to AT A

Let $A \in \mathbb{R}^{m\times n}$ with $m \ge n$ and define

$$\Phi_{n\times n,\kappa} = \{W \in S_n : 0 \le W \le I,\ \operatorname{tr}(W) = \kappa\}.$$

Suppose the eigenvalues of $A^TA$ are ordered as

$$\lambda_1(A^TA) \ge \cdots \ge \lambda_r(A^TA) > \lambda_{r+1}(A^TA) = \cdots = \lambda_\kappa(A^TA) = \cdots = \lambda_{r+t}(A^TA) > \lambda_{r+t+1}(A^TA) \ge \cdots \ge \lambda_n(A^TA) \ge 0 \qquad (5.2)$$

where $t \ge 1$ and $r \ge 0$ are integers.

Let the SVD of A be given by $A = X\Sigma Y^T$ where $X \in O_{m,m}$, $Y \in O_{n,n}$ and $\Sigma \in \mathbb{R}^{m\times n}$. Then the spectral decomposition of $A^TA$ can be written as

$$A^TA = Y D Y^T, \quad D = \Sigma^T\Sigma \text{ diagonal},$$

where the columns of Y form an orthonormal set of eigenvectors for $A^TA$ and $D = \operatorname{diag}(\lambda_1(A^TA), \dots, \lambda_n(A^TA))$. Alternatively, the columns of Y are the set of right singular vectors of A, and $D = \operatorname{diag}(\sigma_1^2(A), \dots, \sigma_n^2(A))$ as a consequence of (5.1).

Let $Y_1 \in O_{n,r}$ be the matrix consisting of the first r columns of Y, let $Y_2$ be the matrix consisting of the next t columns of Y and let $Y_3$ be the matrix consisting of the remaining columns of Y. By (5.1) and (5.2),

$$Y_1^T(A^TA)Y_1 = \operatorname{diag}(\sigma_1^2(A), \dots, \sigma_r^2(A)); \quad Y_2^T(A^TA)Y_2 = \sigma_\kappa^2(A)\,I_t. \qquad (5.3)$$

As $A^TA \in S_n$ with $\lambda_1(A^TA) \ge \cdots \ge \lambda_n(A^TA)$, from Theorem 3.4 of Overton and Womersley [24], it follows that

$$\max_{W \in \Phi_{n\times n,\kappa}} \langle A^TA, W\rangle = \sum_{i=1}^{\kappa} \lambda_i(A^TA) = \sum_{i=1}^{\kappa} \sigma_i^2(A).$$

Furthermore, if the eigenvalues of $A^TA$ satisfy (5.2), then

$$\operatorname{argmax}\,\{\langle A^TA, W\rangle : W \in \Phi_{n\times n,\kappa}\} = \{W \in S_n : W = Y_1Y_1^T + Y_2\bar W Y_2^T,\ \bar W \in \Phi_{t\times t,\kappa-r}\}$$

with

$$\Phi_{t\times t,\kappa-r} = \{\bar W \in S_t : 0 \le \bar W \le I \text{ and } \operatorname{tr}(\bar W) = \kappa - r\},$$

where $Y_1$, $Y_2$ satisfy (5.3).

Remark 5.2.1 If $\kappa = r + t$, then $\bar W = I$ and

$$\operatorname{argmax}\,\{\langle A^TA, W\rangle : W \in \Phi_{n\times n,\kappa}\} = Y_1Y_1^T + Y_2Y_2^T.$$

Now define

$$g_\kappa(B) = \sum_{i=1}^{\kappa} \lambda_i(B) \quad \text{for } B \in S_n,$$

so that $g_\kappa(A^TA) = \sum_{i=1}^{\kappa} \sigma_i^2(A)$. From Theorem 3.5 of [24], it follows that the function $g_\kappa : S_n \to \mathbb{R}$ is convex with subdifferential

$$\partial g_\kappa(A^TA) = \operatorname{argmax}\,\{\langle A^TA, W\rangle : W \in \Phi_{n\times n,\kappa}\}.$$
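Both (5.1) and the form of the maximizer are straightforward to confirm numerically. A sketch assuming NumPy and generic random data (so $\sigma_\kappa > \sigma_{\kappa+1}$, $\bar W = I$ and the maximizer is the projector onto the leading κ right singular vectors):

```python
import numpy as np

rng = np.random.default_rng(6)
m, n, kappa = 6, 4, 2
A = rng.standard_normal((m, n))

# (5.1): the squared singular values of A are the eigenvalues of A^T A.
s = np.linalg.svd(A, compute_uv=False)
lam = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
print(np.max(np.abs(s**2 - lam)))                    # ~1e-13

# Generic data: r + t = kappa, Wbar = I, and the maximizer is the projector
# onto the leading kappa right singular vectors, W = Y1 Y1^T + Y2 Y2^T.
Y = np.linalg.svd(A)[2].T
Yk = Y[:, :kappa]
W = Yk @ Yk.T
lamW = np.linalg.eigvalsh(W)
assert abs(np.trace(W) - kappa) < 1e-12
assert lamW.min() > -1e-12 and lamW.max() < 1 + 1e-12
print(np.sum((A.T @ A) * W), (s[:kappa]**2).sum())   # <A^T A, W> attains the max
```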

For completeness, we give the derivation of the expression for $B_k$. Let $A(x)$ be a real m by n (where $m \ge n$) matrix affine function of a real parameter vector $x = (x_1, \dots, x_\ell)^T \in \mathbb{R}^\ell$, i.e.,

$$A(x) = A_0 + \sum_{k=1}^{\ell} x_k A_k,$$

where $\{A_k\}$ are given real m by n matrices. This expression can also be obtained by a first-order Taylor series expansion.

We have

$$A(x)^TA(x) = \left(A_0^T + \sum_{k=1}^{\ell} x_k A_k^T\right)\left(A_0 + \sum_{k=1}^{\ell} x_k A_k\right) = A_0^TA_0 + \sum_{k=1}^{\ell} x_k(A_0^TA_k + A_k^TA_0) + \sum_{j=1}^{\ell}\sum_{k=1}^{\ell} x_j x_k A_j^TA_k.$$

Therefore, the partial derivative of $A(x)^TA(x)$ with respect to $x_k$, evaluated at $x = 0$, is

$$\left.\frac{\partial (A(x)^TA(x))}{\partial x_k}\right|_{x=0} = A_0^TA_k + A_k^TA_0. \qquad (5.4)$$

Let the SVD of $A_0$ be given by $A_0 = X\Sigma Y^T$ where $X \in O_{m,m}$, $Y \in O_{n,n}$ and $\Sigma \in \mathbb{R}^{m\times n}$. Partition the columns of X as $[X_1\ X_2\ X_3]$ where $X_1$ is the first r columns of X, $X_2$ is the next t columns of X and $X_3$ is the remaining columns of X. Similarly, partition Y as $[Y_1\ Y_2\ Y_3]$ where $Y_1$ is the first r columns of Y, $Y_2$ is the next t columns of Y and $Y_3$ is the remaining columns of Y.

From (5.4) and the SVD of $A_0$,

$$\frac{1}{2}\, Y_2^T \left(\left.\frac{\partial (A(x)^TA(x))}{\partial x_k}\right|_{x=0}\right) Y_2 = \frac{1}{2}\, Y_2^T(A_0^TA_k + A_k^TA_0)Y_2 = \sigma_\kappa \cdot \frac{1}{2}(X_2^TA_kY_2 + Y_2^TA_k^TX_2),$$

where the second equality follows from the SVD of $A_0$ and the partitioning of X and Y.

By assumption, $\sigma_\kappa \ne 0$. Thus, using the previous expression we may write

$$B_k = \frac{1}{2\sigma_\kappa}\, Y_2^T(A_0^TA_k + A_k^TA_0)Y_2.$$
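This expression for $B_k$ can be verified numerically; a sketch assuming NumPy, with $A_0$ constructed so that $\sigma_\kappa = 2$ has multiplicity two:

```python
import numpy as np

rng = np.random.default_rng(7)
m, n, r, t = 5, 4, 1, 2

# A0 with sigma_kappa = 2 of multiplicity t = 2: sigma = (3, 2, 2, 1).
U, _ = np.linalg.qr(rng.standard_normal((m, m)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
S = np.zeros((m, n)); S[:n, :n] = np.diag([3.0, 2.0, 2.0, 1.0])
A0 = U @ S @ V.T
Ak = rng.standard_normal((m, n))                 # one coordinate direction A_k

X, s, Vt = np.linalg.svd(A0); Y = Vt.T
X2, Y2 = X[:, r:r+t], Y[:, r:r+t]
sigma_kappa = s[r]

Bk = (X2.T @ Ak @ Y2 + Y2.T @ Ak.T @ X2) / 2     # definition (4.9)
Bk_via_ATA = Y2.T @ (A0.T @ Ak + Ak.T @ A0) @ Y2 / (2 * sigma_kappa)
print(np.max(np.abs(Bk - Bk_via_ATA)))           # ~1e-15
```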

5.3 Further research

The theoretical results of this thesis may be used to develop practical algorithms for minimizing $f_\kappa(x)$ based on successive linear or quadratic programming. Algorithms may be defined to take full advantage of the structure of the generalized gradient of $f_\kappa$ that is estimated to apply at the optimal point. In [22] and [23], Overton has described and extensively tested such algorithms when minimizing the largest eigenvalue (i.e., κ = 1) of a symmetric matrix-valued function.

Suppose that $x^*$ is a (local) minimizer of $f_\kappa(x)$, with corresponding values $r^*$ and $t^*$ defined by (3.5). A model algorithm would require estimates, say r and t, of $r^*$ and $t^*$, which are obtained and revised during the course of the minimization process. The basic iteration of a model algorithm for minimizing $f_\kappa(x)$ is a solution of the following quadratic program:

$$\min_{d \in \mathbb{R}^\ell,\ \delta \in \mathbb{R}} \; b^T d + (\kappa - r)\delta + \tfrac{1}{2} d^T H d \qquad (5.5)$$

$$\text{subject to} \quad \delta I - \sum_{k=1}^{\ell} d_k X_2^T A_k Y_2 = \operatorname{diag}(\sigma_{r+1}, \dots, \sigma_{r+t}). \qquad (5.6)$$

Here, H is some positive semidefinite matrix. All the quantities $X_2$, $Y_2$, $A_k$, $\sigma_i$ and b (defined in (4.8)) are evaluated at the current point x. The new point is $x + d$, and $\delta$ gives an estimate of $\sigma_\kappa(x + d)$.

Equation (5.6) represents the appropriate linearization of the nonlinear system

$$\sigma_{r+1}(x + d) = \cdots = \sigma_{r+t}(x + d) = \delta;$$

a justification of this is needed as this system is not differentiable (see Friedland, Nocedal and Overton [11]). In the objective function (5.5), the first two terms represent a linearization of $f_\kappa(x + d)$ while the third term may be used to incorporate second order information. The Lagrange multipliers corresponding to the $t(t + 1)/2$ equality constraints (5.6) make up a dual matrix estimate of $\bar W$.
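A sketch of one iteration of such a model algorithm is given below, assuming NumPy, H = I, random stand-ins for the data, and the reconstruction of (5.5)–(5.6) above; the equality constraints are imposed on the symmetric part, giving $t(t + 1)/2$ rows, and the equality-constrained quadratic program is solved through its KKT system:

```python
import numpy as np

rng = np.random.default_rng(8)
ell, t, r, kappa = 4, 2, 1, 2       # kappa - r = 1; sigma_kappa has multiplicity t

# Random stand-ins for the data evaluated at the current point x.
b = rng.standard_normal(ell)                               # (4.8)
Ms = [rng.standard_normal((t, t)) for _ in range(ell)]     # X2^T A_k Y2
sig = np.array([2.0, 2.0])                                 # sigma_{r+1}, ..., sigma_{r+t}
H = np.eye(ell)                                            # simple model Hessian

# Variables u = (d_1, ..., d_ell, delta); constraint (5.6) on the symmetric
# part, one row per upper-triangular entry.
iu = np.triu_indices(t)
G = np.zeros((len(iu[0]), ell + 1)); h = np.zeros(len(iu[0]))
for row, (i, j) in enumerate(zip(*iu)):
    for k, M in enumerate(Ms):
        G[row, k] = -(M[i, j] + M[j, i]) / 2
    G[row, ell] = 1.0 if i == j else 0.0
    h[row] = sig[i] if i == j else 0.0

# KKT system for min b^T d + (kappa - r) delta + 0.5 d^T H d  s.t.  G u = h.
Q = np.zeros((ell + 1, ell + 1)); Q[:ell, :ell] = H
c = np.concatenate([b, [kappa - r]])
KKT = np.block([[Q, G.T], [G, np.zeros((G.shape[0], G.shape[0]))]])
sol, *_ = np.linalg.lstsq(KKT, np.concatenate([-c, h]), rcond=None)
d, delta, lam = sol[:ell], sol[ell], sol[ell + 1:]
print("step d:", d, "\ndelta (estimate of sigma_kappa(x + d)):", delta)
# The multipliers lam, arranged as a symmetric t x t matrix, estimate Wbar.
```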

Another possible extension of this work may be to investigate the differential properties of the structured singular value (see Doyle [5]), which plays a key role in robust control applications. Some methods of computing the structured singular value are considered in [31].

Bibliography

[1] E. Anderson et al., LAPACK Users' Guide, Society for Industrial and Applied Mathematics, Philadelphia, 1992.

[2] H. C. Andrews and B. R. Hunt, Digital Image Restoration, Prentice Hall,


Englewood Cliffs, New Jersey, 1977.

[3] F. H. Clarke, Optimization and Nonsmooth Analysis, John Wiley, New York, 1983. Reprinted by SIAM, Philadelphia, 1990.

[4] J. J. Dongarra et al., LINPACK Users' Guide, SIAM, Philadelphia, 1984.

[5] J. C. Doyle, "Analysis of feedback systems with structured uncertainties", Proc. IEE, Vol. 129, Pt. D, No. 6, (1982), 242-250.

[6] J. C. Doyle, J. E. Wall, and G. Stein, "Performance and robustness analysis for structured uncertainty", Proc. 21st IEEE Conf. Decision Contr., Orlando, FL, (1982), 629-636.

[7] K. Fan, "Maximum properties and inequalities for the eigenvalues of completely continuous operators", Proceedings of the National Academy of Sciences of the United States of America, Vol. 37, (1951), 760-766.

[8] N. J. Fisher and L. E. Howard, "Gravity interpretation with the aid of


quadratic programming", Geophysics, Vol. 45, No. 3, (1980), 403-419.

[9] R. Fletcher, "Semi-definite matrix constraints in optimization", SIAM Journal on Control and Optimization, Vol. 23, No. 4, (1985), 493-513.

[10] R. Fletcher, Practical Methods of Optimization, (second edition), John
Wiley, Chichester and New York, 1987.

[11] S. Friedland, J. Nocedal and M. L. Overton, "The formulation and analy-


sis of numerical methods for inverse eigenvalue problems", SIAM Journal
on Numerical Analysis, Vol. 24, No. 3, (1987), 634-667.

[12] B. S. Garbow et al., Matrix Eigensystem Routines-EISPACK Guide Ex-


tension, Lecture Notes in Computer Science 51, Springer-Verlag, New
York, 1977.

[13] G. H. Golub and C. F. Van Loan, Matrix Computations, The Johns Hopkins University Press, Baltimore, 1983.

[14] J. -B. Hiriart-Urruty, "Generalized differentiability, duality and opti-


mization for problems dealing with differences of convex functions", in:
Convexity and Duality in Optimization, Lecture Notes in Economics and
Mathematical Systems 256 ( J. Ponstein, ed.), Springer-Verlag, Berlin,
1985.

[15] R. A. Horn and C. R. Johnson, Matrix Analysis, Cambridge University


Press, New York, 1985.

[16] R. A. Horn and C. R. Johnson, Topics in Matrix Analysis, Cambridge


University Press, New York, 1991.

[17] A. D. Ioffe and V. M. Tihomirov, Theory of Extremal Problems, North-Holland, Amsterdam, 1979.

[18] V. C. Klema and A. J. Laub, "The singular value decomposition: Its computation and some applications", IEEE Transactions on Automatic Control, Vol. AC-25, No. 2, (1980), 164-176.

[19] C. L. Lawson and R. J. Hanson, Solving Least Squares Problems, Prentice-Hall, Englewood Cliffs, New Jersey, 1974.

[20] A. W. Marshall and I. Olkin, Inequalities: Theory of Majorization and


its Applications, Academic Press, London, 1979.

[21] The MathWorks, Inc., PRO-MATLAB User's Guide, Cochituate Place,
24 Prime Park Way, South Natick, MA 01760, 1991.

[22] M. L. Overton, "On minimizing the maximum eigenvalue of a symmetric


matrix", SIAM Journal on Matrix Analysis and Applications, Vol. 9,
No. 2, (1988), 256-268.

[23] M. L. Overton, "Large-scale optimization of eigenvalues", SIAM Journal


on Optimization, Vol. 2, No. 1, (1992), 88-120.

[24] M. L. Overton and R. S. Womersley, "Optimality conditions and duality theory for minimizing sums of the largest eigenvalues of symmetric matrices", UNSW Applied Mathematics Preprint AM91/18, (1991); to appear in Mathematical Programming (1993).

[25] M. L. Overton and R. S. Womersley, "On the sum of the largest eigenvalues of a symmetric matrix", SIAM Journal on Matrix Analysis and Applications, Vol. 13, No. 1, (1992), 41-45.

[26] E. Polak and Y. Wardi, "Nondifferentiable optimization algorithm for designing control systems having singular value inequalities", Automatica, Vol. 18, No. 3, (1982), 267-283.

[27] R. T. Rockafellar, Convex Analysis, Princeton University Press, Princeton, New Jersey, 1970.

[28] N. R. Sandell, "Robust stability of systems with application to singular perturbations", Automatica, Vol. 15, (1979), 467-470.

[29] A. Seeger, "Sensitivity analysis of nondifferentiable sums of singular values of rectangular matrices", technical report, Departament de Matemàtica Aplicada i Anàlisi, Universitat de Barcelona (Barcelona, 1990).

[30] J. von Neumann, "Some matrix-inequalities and metrization of matrix-


space", in: John von Neumann Collected Works (A. H. Taub, ed.),
Vol. IV, 205-218, Pergamon, Oxford, 1962.

[31] G. A. Watson, "Computing the structured singular value", SIAM Journal on Matrix Analysis and Applications, Vol. 13, No. 4, (1992), 1054-1066.

