Sadhana Subramani
February, 1993
I hereby declare that this submission is my own work and that, to the
best of my knowledge and belief, it contains no material previously published
or written by another person nor material which to a substantial extent has
been accepted for the award of any other degree or diploma of a university
or other institute of higher learning, except where due acknowledgement is
made in the text.
Abstract

Contents

Abstract
Acknowledgements
1 Introduction
2 Preliminaries
2.1 Introduction
2.3 Miscellania
3.1 Introduction
3.3 Another max characterization
4 Differential properties
4.1 Introduction
5 Conclusions
5.1 Introduction
5.2 Relation to A^T A
Bibliography
Acknowledgements
Some of the results in this thesis were presented at the 29th Applied
Mathematics Conference in Adelaide. Sincere thanks are extended to the
School of Mathematics for giving me the opportunity and financial assistance
to attend this conference. The excellent resources and facilities at the School
provided the required backup for my studies.
Chapter 1
Introduction
In this thesis, we study the properties of sums of the largest singular values of real m by n matrices.

We introduce some notation first. Let $\mathbb{R}^{m\times n}$ denote the linear space of all real m by n matrices and let $\mathcal{O}_{m,n}$ denote the set of all real m by n matrices with orthonormal columns, where $m \ge n$. Thus, $Z^T Z = I$ for all $Z \in \mathcal{O}_{m,n}$. We use $I$ to denote the identity matrix when the dimension is implicit in the context and $I_n$ to denote the identity matrix of order n. Also, let $\mathcal{D}_n$ denote the set of all real n by n diagonal matrices. The notation $\mathrm{diag}(\sigma_1,\ldots,\sigma_n)$ with all $\sigma_i \in \mathbb{R}$, or $\mathrm{diag}(\mu)$ where $\mu \in \mathbb{R}^n$, refers to a matrix in $\mathcal{D}_n$ with diagonal entries $\sigma_1,\ldots,\sigma_n$ or $\mu_1,\ldots,\mu_n$ respectively.
Every matrix $A \in \mathbb{R}^{m\times n}$ has a singular value decomposition (SVD)
$$A = X\Sigma Y^T, \quad (1.1)$$
where $X \in \mathcal{O}_{m,m}$, $Y \in \mathcal{O}_{n,n}$ and $\Sigma \in \mathbb{R}^{m\times n}$ is diagonal. The SVD is discussed in many places, see for example [15] or [19]. Equation (1.1) may be rewritten equivalently as $X^T A Y = \Sigma$.
The diagonal elements of the matrix $\Sigma_1$ are denoted by $\sigma_1,\ldots,\sigma_n$ and ordered as $\sigma_1 \ge \cdots \ge \sigma_n \ge 0$. The nonnegative real numbers $\sigma_i = \sigma_i(A)$, $i = 1,\ldots,n$, are known as the singular values of A. The columns of X are called the left singular vectors of A while the columns of Y are called the right singular vectors of A.

For $m \le n$, $\Sigma$ would be of the form $[\Sigma_1\ \ 0]$ where $\Sigma_1 = \mathrm{diag}(\sigma_1,\ldots,\sigma_m)$ and $0 \in \mathbb{R}^{m\times(n-m)}$. Throughout this thesis, we shall assume that $m \ge n$. For complex A, X and Y are unitary matrices and $Y^T$ is replaced by $Y^*$, where $Y^*$ represents the complex conjugate transpose of Y. It is easy to extend the theory to the complex case. All our results will be given in terms of real matrices. We choose to consider the case of real A with $m \ge n$ as results are easily adapted to the $n > m$ case, and because this is the one which is typical of many applications.
The SVD can be easily obtained using sophisticated and widely accessible numerical software. See the LINPACK User's Guide [4], the EISPACK Guide Extension [12] or the more recent LAPACK User's Guide [1] for subroutines to compute the SVD. Our emphasis is on applications rather than the computation. Numerical examples in this thesis were generated by implementing routines in MATLAB [21].
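As a concrete illustration of the central quantity of this thesis, here is a minimal NumPy sketch of computing the SVD and the sum of the $\kappa$ largest singular values. The thesis's own experiments used MATLAB; this Python stand-in and the helper name `f_kappa` are ours.

```python
import numpy as np

def f_kappa(A, kappa):
    """Sum of the kappa largest singular values of A."""
    s = np.linalg.svd(A, compute_uv=False)  # singular values, decreasing order
    return s[:kappa].sum()

A = np.diag([3.0, 2.0, 1.0])
print(np.isclose(f_kappa(A, 2), 5.0))  # True: sigma_1 + sigma_2 = 3 + 2
```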
Singular values are a useful tool in many applications. For instance, they
play a key role in control system design where many important structural
properties, such as robustness and noise sensitivity, can be expressed as in-
equalities involving the singular values of appropriate transfer function ma-
trices (see [6] and the references therein; also [26] and [28]).
The SVD figures prominently in schemes for reducing data based on approximating a given matrix with one of lower rank. For example, a problem which
sometimes arises in image processing is that the amount of data which is
generated cannot be transferred reasonably so that it becomes necessary to
reduce the data. In some cases, this can be achieved by using the singular
vectors corresponding to the largest singular values of the image matrix (see
[2] for examples of such data reduction schemes).
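A minimal sketch of such a data reduction scheme, using the best rank-k approximation built from the leading singular triplets (the Eckart-Young property; the function name and dimensions are our own illustrative choices):

```python
import numpy as np

def best_rank_k(A, k):
    """Best rank-k approximation of A in the 2- and Frobenius norms,
    built from the k largest singular values and vectors."""
    X, s, Yt = np.linalg.svd(A, full_matrices=False)
    return X[:, :k] @ np.diag(s[:k]) @ Yt[:k, :]

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 5))
A2 = best_rank_k(A, 2)
s = np.linalg.svd(A, compute_uv=False)
# in the 2-norm the approximation error equals the (k+1)st singular value
print(np.isclose(np.linalg.norm(A - A2, 2), s[2]))  # True
```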
Further details on the SVD, including its properties, computation and nu-
merous other applications can be found in [13], [15] and [18].
Two symmetric eigenvalue problems are closely related to the singular value problem: the eigenvalues of $A^TA$ are the squares of the singular values of A, while the eigenvalues of the symmetric block matrix $\begin{bmatrix} 0 & A \\ A^T & 0 \end{bmatrix}$ are the singular values of A and their negatives, with $|m-n|$ zero eigenvalues if $m \ne n$. However, there are disadvantages to both these representations. For example, if $A(x)$ is affine, then working with $A^TA$ can destroy the structure in the matrix-valued function $A(x)$, while the second characterization increases the dimension. Therefore, we work directly with the SVD of A to obtain information on the singular value problem.
For $\kappa \in \{1,\ldots,n\}$, define
$$f_\kappa(A) = \sum_{i=1}^{\kappa}\sigma_i(A).$$
Here, $f_1(A)$ denotes the largest singular value of A and
$f_n(A)$ denotes the sum of all the singular values of A. It is well known that the largest singular value is a convex function of the elements of the matrix A. The fact that the sum of the $\kappa$ largest singular values of a real m by n matrix is a convex function of the matrix elements appears less well known, but it is implied in the work of Horn and Johnson [16]. It is an immediate consequence of the results obtained in this thesis.
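The convexity claim can be spot-checked numerically. The sketch below (our own construction) samples random matrix pairs and convex combinations and verifies $f_\kappa(\lambda A + (1-\lambda)B) \le \lambda f_\kappa(A) + (1-\lambda)f_\kappa(B)$:

```python
import numpy as np

def f_kappa(A, kappa):
    return np.linalg.svd(A, compute_uv=False)[:kappa].sum()

rng = np.random.default_rng(1)
kappa, ok = 3, True
for _ in range(200):
    A = rng.standard_normal((8, 5))
    B = rng.standard_normal((8, 5))
    lam = rng.uniform()
    lhs = f_kappa(lam * A + (1 - lam) * B, kappa)
    rhs = lam * f_kappa(A, kappa) + (1 - lam) * f_kappa(B, kappa)
    ok = ok and lhs <= rhs + 1e-10  # convexity inequality
print(ok)  # True
```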
Results on sums of singular values are of particular interest since, using these, one may obtain estimates of the smallest and intermediate singular values. For example, the smallest singular value, $\sigma_n(A)$, can be written as
$$\sigma_n(A) = f_n(A) - f_{n-1}(A).$$
Issues of determining the smallest nonzero singular value, say $\sigma_p$, are significant as the "condition number", $\sigma_1/\sigma_p$, gives an approximate measure of ill-conditioning. See [8] for an example of the use of the smallest nonzero singular value and intermediate singular values in the numerical stabilization of an ill-conditioned problem which occurs in geophysics.
where $\tau_1,\ldots,\tau_n$ are the singular values of W. Define
$$\Phi_{m\times n,\kappa} = \Big\{W \in \mathbb{R}^{m\times n} : \tau_1 \le 1 \text{ and } \sum_{i=1}^{n}\tau_i \le \kappa\Big\}.$$
The set $\Phi_{m\times n,\kappa}$ is a compact convex subset of $\mathbb{R}^{m\times n}$. It is shown that $f_\kappa$ is the support function for $\Phi_{m\times n,\kappa}$, i.e.,
$$f_\kappa(A) = \max\{|\langle A, W\rangle| : W \in \Phi_{m\times n,\kappa}\}. \quad (1.3)$$
Another max characterization for $f_\kappa$ is given by Theorem 3.4.1 of Horn and Johnson [16], namely,
$$f_\kappa(A) = \max\{|\mathrm{tr}(U^TAV)| : U \in \mathcal{O}_{m,\kappa},\ V \in \mathcal{O}_{n,\kappa}\}.$$
In [16], it is also proved that this variational formula is equivalent to the partial isometry characterization
$$f_\kappa(A) = \max_{C \in \mathbb{R}^{n\times m}}\{|\mathrm{tr}(AC)| : C \text{ is a rank } \kappa \text{ partial isometry}\}. \quad (1.4)$$
Both (1.3) and (1.4) show that $f_\kappa(A)$ is a convex function. The advantage of (1.3) over (1.4) is that, since $\Phi_{m\times n,\kappa}$ is convex, (1.3) leads directly to a characterization of the subdifferential of $f_\kappa$ which does not involve a convex hull operation.
One of the earliest results for the sum of the $\kappa$ largest singular values of a matrix is due to von Neumann [30] in 1937 and later in 1951 to Fan [7]. Sums of singular values have also been addressed in Horn and Johnson [16]; for other comments, see [20]. However, it appears that not much has been done on the interconnection between this subject and the sets $\mathcal{P}_{m\times n,\kappa}$ and $\Phi_{m\times n,\kappa}$.

We also consider the composite function $f_\kappa(x) = f_\kappa(A(x))$, where $A(x)$ is a smooth matrix-valued function of $x$.
Example 1.0.1 Let $m = 8$, $n = 5$, $\kappa = 3$ and define $A(x) : \mathbb{R} \to \mathbb{R}^{8\times 5}$ by
$$A(x) = A_0 + xA_1,$$
where $A_0 \in \mathbb{R}^{8\times 5}$ has diagonal entries $2, 1, 1, 1, 0.5$ and zeros elsewhere,
and $A_1$ is a randomly generated 8 by 5 matrix. The top curve in Figure 1.1 is a plot of the sum of the 3 largest singular values of $A(x)$ against x; the remaining 5 curves represent plots of the individual singular values of $A(x)$ versus x. Note that $f_\kappa$ is a convex function with $x = 0$ as the minimizer. At this x, the $\kappa$th singular value $\sigma_\kappa = 1$ with multiplicity 3 and we see that $f_\kappa$ is nonsmooth. Near $x = -0.9$, there is a multiple singular value with $\sigma_1 = \sigma_2$. All the multiple singular values are included in the sum and $f_\kappa$ is smooth at this x-value.
[Figure 1.1: The sum of the 3 largest singular values of A(x) (top curve) and the individual singular values of A(x), plotted against x for x between -1 and 1.]
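Example 1.0.1 can be reproduced approximately as follows. Since $A_1$ is random, the curves differ in detail from Figure 1.1, but the value $f_3(0) = 2 + 1 + 1 = 4$ and the multiplicity-3 singular value at $x = 0$ are as described (NumPy sketch, our naming):

```python
import numpy as np

# A(x) = A0 + x*A1 with A0 = diag(2, 1, 1, 1, 0.5) padded to 8x5.
m, n, kappa = 8, 5, 3
A0 = np.zeros((m, n))
A0[:n, :n] = np.diag([2.0, 1.0, 1.0, 1.0, 0.5])
rng = np.random.default_rng(42)
A1 = rng.standard_normal((m, n))

def f_kappa(x):
    s = np.linalg.svd(A0 + x * A1, compute_uv=False)
    return s[:kappa].sum()

# At x = 0 the 3rd singular value is 1 with multiplicity 3, and f_3(0) = 4.
print(np.isclose(f_kappa(0.0), 4.0))  # True
```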
In Chapter 4, the result on the elements in the set $\Phi_{m\times n,\kappa}$ which attain the maximum is used to obtain a concise formula for the subdifferential of $f_\kappa(A)$. Optimality conditions for $f_\kappa(x)$ are derived by characterizing Clarke's generalized gradient in terms of a dual matrix which has dimension equal to the multiplicity of the $\kappa$th singular value. The directional derivative of $f_\kappa(x)$ is discussed and it is emphasized how optimality may be verified by computation of the appropriate dual matrix. This chapter is concluded by showing how, for a non-optimal point, the dual matrix information may be used in the generation of a descent direction, splitting a multiple singular value if necessary.
The final chapter shows that the eigenvalue results of Overton and Womersley [24] may be applied to eigenvalues of $A^TA$ to derive results on singular values of A. An indication of how the theoretical results of this thesis may be used for effective algorithm development is also given, together with some other concluding remarks on further research that may be done in this area.
Chapter 2
Preliminaries
2.1 Introduction
The singular values and vectors of $A \in \mathbb{R}^{m\times n}$ with $m \ge n$ satisfy the following properties for $i = 1,\ldots,n$. These properties follow from the definition of the SVD:
$$\sigma_i = x_i^TAy_i, \qquad Ay_i = \sigma_ix_i, \qquad A^Tx_i = \sigma_iy_i, \qquad A^TAy_i = \sigma_i^2y_i, \qquad AA^Tx_i = \sigma_i^2x_i.$$
Here, $x_i$ denotes a left singular vector corresponding to $\sigma_i$ and $y_i$ denotes a right singular vector corresponding to $\sigma_i$. Notice that the singular values of A are the nonnegative square roots of the eigenvalues of the n by n positive semidefinite matrix $A^TA$, or of the n largest eigenvalues of the m by m positive semidefinite matrix $AA^T$; the remaining eigenvalues of $AA^T$, if any, are all zero. The right singular vectors are the orthonormal eigenvectors of $A^TA$ while the left singular vectors are the orthonormal eigenvectors of $AA^T$. In addition, for any $P \in \mathcal{O}_{m,m}$ and $Q \in \mathcal{O}_{n,n}$, the singular values of $P^TAQ$ are the same as those of A. This expresses the orthogonal invariance of the set of singular values of a real matrix. When A is square and symmetric, its singular values are just the absolute values of its eigenvalues.
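These relations are easy to confirm numerically; a sketch (dimensions and seed our own):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 6, 4
A = rng.standard_normal((m, n))

# singular values are the nonnegative square roots of the eigenvalues of A^T A
s = np.linalg.svd(A, compute_uv=False)
lam = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
print(np.allclose(s, np.sqrt(np.clip(lam, 0, None))))  # True

# orthogonal invariance: P^T A Q has the same singular values as A
P, _ = np.linalg.qr(rng.standard_normal((m, m)))
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
s2 = np.linalg.svd(P.T @ A @ Q, compute_uv=False)
print(np.allclose(s, s2))  # True
```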
If A has rank r, then exactly r of its singular values are positive, i.e., $\sigma_1 \ge \cdots \ge \sigma_r > \sigma_{r+1} = \cdots = \sigma_n = 0$. Define
$$\tilde A = \begin{bmatrix} 0 & A \\ A^T & 0 \end{bmatrix}. \quad (2.1)$$
If X is partitioned as $X = [X_1\ X_2]$ with $X_1 \in \mathbb{R}^{m\times n}$ and $X_2 \in \mathbb{R}^{m\times(m-n)}$, with $\Sigma$ given by (1.2), then $\tilde A$ has the spectral decomposition
$$\tilde A = Q\begin{bmatrix} \Sigma_1 & 0 & 0 \\ 0 & -\Sigma_1 & 0 \\ 0 & 0 & 0 \end{bmatrix}Q^T, \qquad Q = \begin{bmatrix} \tfrac{1}{\sqrt2}X_1 & \tfrac{1}{\sqrt2}X_1 & X_2 \\ \tfrac{1}{\sqrt2}Y & -\tfrac{1}{\sqrt2}Y & 0 \end{bmatrix},$$
where the trailing diagonal zero block lies in $\mathbb{R}^{(m-n)\times(m-n)}$ and the zero block of Q lies in $\mathbb{R}^{n\times(m-n)}$.
The block matrix (2.1) plays a key role in relating eigenvalue results for real
symmetric matrices to singular value results for general real matrices.
The largest and smallest singular values of A, denoted $\sigma_1(A)$ and $\sigma_n(A)$ respectively, are sometimes equivalently defined in terms of the spectral norm, $\|\cdot\|_2$, as
$$\sigma_1(A) = \max_{x \ne 0}\frac{\|Ax\|_2}{\|x\|_2} = \|A\|_2$$
and
$$\sigma_n(A) = \min_{x \ne 0}\frac{\|Ax\|_2}{\|x\|_2} = \begin{cases} \|A^{-1}\|_2^{-1} & \text{if } \det(A) \ne 0, \\ 0 & \text{if } \det(A) = 0, \end{cases}$$
the latter cases applying when A is square.
The function $f_\kappa$ satisfies the triangle inequality
$$f_\kappa(A + B) \le f_\kappa(A) + f_\kappa(B)$$
for any $A, B \in \mathbb{R}^{m\times n}$. However,
$$\sigma_i(A + B) \le \sigma_i(A) + \sigma_i(B)$$
is not true for all $i = 1, 2, \ldots$, where $\{\sigma_i(A)\}$ and $\{\sigma_i(B)\}$ are the singular values of A and B respectively, both arranged in decreasing order. Nevertheless, the sum of the $\kappa$ largest singular values does satisfy the triangle inequality. An immediate corollary is

Corollary 2.2.1 For any $\kappa \in \{1,\ldots,q\}$, the function $f_\kappa(A) = \sum_{i=1}^{\kappa}\sigma_i(A)$ is convex.
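A two-by-two diagonal pair already shows the failure for an individual singular value while the sum still satisfies the triangle inequality (our own illustrative choice of A and B):

```python
import numpy as np

A = np.diag([1.0, 0.0])
B = np.diag([0.0, 1.0])
s = lambda M: np.linalg.svd(M, compute_uv=False)

# sigma_2 fails the triangle inequality: sigma_2(A+B)=1 but sigma_2(A)=sigma_2(B)=0
print(s(A + B)[1] > s(A)[1] + s(B)[1])  # True
# ... but f_2 = sigma_1 + sigma_2 does satisfy it
print(s(A + B)[:2].sum() <= s(A)[:2].sum() + s(B)[:2].sum() + 1e-12)  # True
```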
We next note that the singular values of an arbitrary matrix have a minimax characterization. This is a generalization of the minimax characterization for the eigenvalues of real symmetric matrices. For $k = 1,\ldots,q$,
$$\sigma_k(A) = \min_{w_1,\ldots,w_{k-1}\in\mathbb{R}^n}\ \max_{\substack{x \ne 0,\ x \in \mathbb{R}^n \\ x \perp w_1,\ldots,w_{k-1}}}\frac{\|Ax\|_2}{\|x\|_2}$$
and
$$\sigma_k(A) = \max_{w_1,\ldots,w_{n-k}\in\mathbb{R}^n}\ \min_{\substack{x \ne 0,\ x \in \mathbb{R}^n \\ x \perp w_1,\ldots,w_{n-k}}}\frac{\|Ax\|_2}{\|x\|_2}.$$
We conclude this section by giving an interlacing property for singular
values.
Theorem 2.2.6 ([15], Theorem 7.3.9) Let $A \in \mathbb{R}^{m\times n}$ be a given matrix and let $\hat A$ be the matrix obtained by deleting any one column of A. Let $\{\sigma_i\}$ and $\{\hat\sigma_i\}$ denote the singular values of A and $\hat A$ respectively, both arranged in nonincreasing order. Then the singular values $\hat\sigma_i$ of $\hat A$ interlace with those $\sigma_i$ of A as follows.

(a) If $m \ge n$, then
$$\sigma_1 \ge \hat\sigma_1 \ge \sigma_2 \ge \hat\sigma_2 \ge \cdots \ge \sigma_{n-1} \ge \hat\sigma_{n-1} \ge \sigma_n.$$

(b) If $m < n$, then
$$\sigma_1 \ge \hat\sigma_1 \ge \sigma_2 \ge \hat\sigma_2 \ge \cdots \ge \sigma_m \ge \hat\sigma_m \ge 0.$$
The inequalities associated with the two cases (a) and (b) are interchanged
if a row of A is deleted instead of a column. This theorem gives bounds on
the perturbation of singular values due to removing a column of a matrix.
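The case (a) interlacing can be observed numerically; the following sketch (data our own) deletes one column of a random 6 by 4 matrix:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 4))          # m >= n
Ahat = np.delete(A, 1, axis=1)           # delete one column
s = np.linalg.svd(A, compute_uv=False)       # sigma_1 >= ... >= sigma_4
sh = np.linalg.svd(Ahat, compute_uv=False)   # sigmahat_1 >= ... >= sigmahat_3

# sigma_i >= sigmahat_i >= sigma_{i+1} for i = 1, 2, 3
print(all(s[i] + 1e-9 >= sh[i] >= s[i + 1] - 1e-9 for i in range(3)))  # True
```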
2.3 Miscellania
We start with some notation. Throughout this thesis, matrices are denoted by capital Roman letters except for diagonal matrices whose diagonal entries are eigenvalues or singular values. Such diagonal matrices will be denoted by capital Greek letters such as $\Sigma$ or $\Lambda$. Diagonal matrices may also be denoted by $\mathrm{diag}(\sigma_1,\ldots,\sigma_n)$, where $\sigma_1,\ldots,\sigma_n$ are the elements on the diagonal. Let $\mathcal{S}_n$ be the set of all n by n real symmetric matrices and let $\mathcal{K}_n$ be the set of all n by n real skew-symmetric matrices. $A \in \mathcal{S}_n$ satisfies the identity $A = A^T$ while $A \in \mathcal{K}_n$ satisfies $A = -A^T$.
The trace and the absolute trace of A are defined by
$$\mathrm{tr}(A) = \sum_{i=1}^{n} a_{ii} \qquad \text{and} \qquad \mathrm{atr}(A) = \sum_{i=1}^{n} |a_{ii}|.$$
If A is square, then the trace of A also equals the sum of its eigenvalues.
Some useful properties of the Frobenius inner product $\langle A, B\rangle = \mathrm{tr}(A^TB)$ are summarized below.

1. $\langle A, A\rangle = \|A\|_F^2$.
2. $\langle A, I\rangle = \mathrm{tr}(A)$ (for a square matrix A).
3. If $A \in \mathcal{S}_n$ and $K \in \mathcal{K}_n$, then $\langle A, K\rangle = 0$.
4. If $A \in \mathbb{R}^{m\times n}$, $B \in \mathbb{R}^{k\times\ell}$, $U \in \mathbb{R}^{m\times k}$ and $V \in \mathbb{R}^{n\times\ell}$, then
$$\langle A, UBV^T\rangle = \langle U^TAV, B\rangle.$$

Proof: Expand both sides using the definition of the Frobenius inner product. $\square$
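The last listed property of the Frobenius inner product can be checked directly; a sketch with randomly generated matrices (dimensions our own):

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, k, l = 5, 4, 3, 2
A = rng.standard_normal((m, n))
B = rng.standard_normal((k, l))
U = rng.standard_normal((m, k))
V = rng.standard_normal((n, l))

frob = lambda P, Q: np.trace(P.T @ Q)  # Frobenius inner product <P, Q>

# <A, U B V^T> = <U^T A V, B>
print(np.isclose(frob(A, U @ B @ V.T), frob(U.T @ A @ V, B)))  # True
```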
Lemma 2.3.1 Let $U \in \mathcal{O}_{m,\kappa}$. Then:

(a)
$$(U^TU)_{ij} = \sum_{\ell=1}^{m} U_{\ell i}U_{\ell j} = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{otherwise}, \end{cases} \quad (2.2)$$
and, for each row i,
$$(UU^T)_{ii} = \sum_{\ell=1}^{\kappa} U_{i\ell}^2 \le 1.$$

(b) $|U_{ij}| = 1$ implies that every other entry of row i and of column j of U is zero.

(c) Suppose $B = UU^T$. Then $B^2 = B$, so B is the orthogonal projection onto the range of U, with $\mathrm{tr}(B) = \kappa$.

Proof: (a) follows from the orthonormality of the columns of U; the row bound holds since U can be extended to an orthogonal matrix, whose rows have unit norm. (b) follows from the row bound in (a) and the unit norm of column j. For (c), $B^2 = U(U^TU)U^T = UU^T = B$ by (2.2), and $\mathrm{tr}(B) = \mathrm{tr}(U^TU) = \kappa$. $\square$
The next two lemmas give results related to the outer product $W = UV^T$, where $U \in \mathcal{O}_{m,\kappa}$ and $V \in \mathcal{O}_{n,\kappa}$. These lemmas are necessary for proving the partial isometry result of Section 3.2.

Lemma 2.3.2 Let $U \in \mathcal{O}_{m,\kappa}$, let $V \in \mathcal{O}_{n,\kappa}$ and let $W = UV^T$. Then for $i = 1,\ldots,m$ and $j = 1,\ldots,n$:

(a) $|W_{ij}| \le 1$.

(b) $W_{ij} = \zeta$, where $\zeta$ is either 1 or $-1$, implies (i) $U_{i\ell} = \zeta V_{j\ell}$ for $\ell = 1,\ldots,\kappa$; (ii) $W_{i\ell} = 0$ for $\ell \ne j$, $\ell = 1,\ldots,n$; and (iii) $W_{kj} = 0$ for $k \ne i$, $k = 1,\ldots,m$.

(c) $\sum_{i=1}^{n} |W_{ii}| \le \kappa$.
Proof:

(a) Since
$$0 \le \sum_{\ell=1}^{\kappa}(U_{i\ell} - \zeta V_{j\ell})^2 = \sum_{\ell=1}^{\kappa} U_{i\ell}^2 + \sum_{\ell=1}^{\kappa} V_{j\ell}^2 - 2\zeta\sum_{\ell=1}^{\kappa} U_{i\ell}V_{j\ell}, \quad (2.4)$$
we have
$$2\zeta\sum_{\ell=1}^{\kappa} U_{i\ell}V_{j\ell} \le \sum_{\ell=1}^{\kappa} U_{i\ell}^2 + \sum_{\ell=1}^{\kappa} V_{j\ell}^2 \le 2,$$
where the second inequality follows from the fact that U and V have orthonormal columns, so each of their rows has norm at most 1. Thus if $\zeta = 1$,
$$W_{ij} = \sum_{\ell=1}^{\kappa} U_{i\ell}V_{j\ell} \le 1,$$
and if $\zeta = -1$,
$$W_{ij} = \sum_{\ell=1}^{\kappa} U_{i\ell}V_{j\ell} \ge -1.$$
Hence $|W_{ij}| \le 1$.

(b) From (2.4) and the orthonormality property of U and V it follows that if $W_{ij} = 1$, then $U_{i\ell} = V_{j\ell}$ for $\ell = 1,\ldots,\kappa$, while if $W_{ij} = -1$, then $U_{i\ell} = -V_{j\ell}$ for $\ell = 1,\ldots,\kappa$. In either case row i of U and row j of V have unit norm, and the remaining entries of row i and of column j of W vanish by the argument of part (a) together with Lemma 2.3.1 (b).

(c) By the Cauchy-Schwarz inequality,
$$\sum_{i=1}^{n}|W_{ii}| = \sum_{i=1}^{n}\Big|\sum_{\ell=1}^{\kappa} U_{i\ell}V_{i\ell}\Big| \le \sum_{\ell=1}^{\kappa}\sum_{i=1}^{n}|U_{i\ell}||V_{i\ell}| \le \sum_{\ell=1}^{\kappa}\Big(\sum_{i=1}^{n} U_{i\ell}^2\Big)^{1/2}\Big(\sum_{i=1}^{n} V_{i\ell}^2\Big)^{1/2} \le \kappa. \qquad \square$$
Remark 2.3.1 As $|\sum_{i=1}^{n} W_{ii}| \le \sum_{i=1}^{n}|W_{ii}|$, Lemma 2.3.2 (c) implies that $|\sum_{i=1}^{n} W_{ii}| \le \kappa$.
Lemma 2.3.3 Let $\zeta$ be either 1 or $-1$, let $W = UV^T$ where $U \in \mathcal{O}_{m,\kappa}$ and $V \in \mathcal{O}_{n,\kappa}$ with $m \ge n \ge \kappa$. Also, let r and t be integers such that $r \ge 0$, $t \ge 1$ and $1 \le r + t \le n$. If
$$W_{ii} = \zeta \ \text{ for } i = 1,\ldots,r \quad \text{and} \quad \sum_{i=r+1}^{r+t} W_{ii} = \zeta(\kappa - r),$$
then
$$W_{ij} = 0 \ \text{ for } i = r+t+1,\ldots,m \text{ and } j = 1,\ldots,n$$
and
$$W_{ij} = 0 \ \text{ for } i = 1,\ldots,m \text{ and } j = r+t+1,\ldots,n.$$
Proof: Since
$$W_{ii} = \sum_{\ell=1}^{\kappa} U_{i\ell}V_{i\ell} \quad (2.8)$$
and $W_{ii} = \zeta$ for $i = 1,\ldots,r$, Lemma 2.3.2 (b) gives $U_{i\ell} = \zeta V_{i\ell}$ for $i = 1,\ldots,r$ and $\ell = 1,\ldots,\kappa$, and each such row of U has unit norm, so
$$\sum_{i=1}^{r}\sum_{\ell=1}^{\kappa} U_{i\ell}^2 = r.$$
From the hypothesis $\sum_{i=r+1}^{r+t} W_{ii} = \zeta(\kappa - r)$ and the inequality
$$0 \le \sum_{i=r+1}^{r+t}\sum_{\ell=1}^{\kappa}(U_{i\ell} - \zeta V_{i\ell})^2,$$
it follows, as in the proof of Lemma 2.3.2, that
$$U_{i\ell} = \zeta V_{i\ell} \quad \text{for } i = 1,\ldots,r+t \text{ and } \ell = 1,\ldots,\kappa \quad (2.14)$$
and
$$\sum_{i=r+1}^{r+t}\sum_{\ell=1}^{\kappa} U_{i\ell}^2 = \kappa - r, \quad \text{so that} \quad \sum_{i=1}^{r+t}\sum_{\ell=1}^{\kappa} U_{i\ell}^2 = \kappa. \quad (2.15)$$
Since U has orthonormal columns, $\sum_{i=1}^{m}\sum_{\ell=1}^{\kappa} U_{i\ell}^2 = \kappa$, so (2.15) forces $U_{i\ell} = 0$ for $i = r+t+1,\ldots,m$ and all $\ell$. Hence
$$W_{ij} = 0 \quad \text{for } i = r+t+1,\ldots,m \text{ and } j = 1,\ldots,n.$$
In the same way, $\sum_{i=r+1}^{r+t}\sum_{\ell=1}^{\kappa} V_{i\ell}^2 = \kappa - r$, so $\sum_{i=1}^{r+t}\sum_{\ell=1}^{\kappa} V_{i\ell}^2 = \kappa$. Again, using the orthonormality property of V, this implies $V_{j\ell} = 0$ for $j = r+t+1,\ldots,n$ and all $\ell$, so that
$$W_{ij} = 0 \quad \text{for } i = 1,\ldots,m \text{ and } j = r+t+1,\ldots,n. \qquad \square$$
Lemma 2.3.4 Let $m \ge n$, let $\kappa \in \{1,\ldots,n\}$ and let $\zeta$ be either 1 or $-1$. Let the SVD of $W \in \mathbb{R}^{m\times n}$ be given by $W = U\Sigma V^T$, $U \in \mathcal{O}_{m,m}$, $V \in \mathcal{O}_{n,n}$ and $\Sigma \in \mathbb{R}^{m\times n}$, where $1 \ge \tau_1 \ge \cdots \ge \tau_n \ge 0$ are the singular values of W and $\sum_{\ell=1}^{n}\tau_\ell \le \kappa$. Then:

(a) $|W_{ii}| \le 1$ for $i = 1,\ldots,n$.

(b) If $W_{ii} = \zeta$ for some i, then

(i)
$$\tau_\ell U_{i\ell} = \zeta V_{i\ell} \quad \text{for } \ell = 1,\ldots,n \quad (2.18)$$
and
$$U_{i\ell} = \zeta\tau_\ell V_{i\ell} \quad \text{for } \ell = 1,\ldots,n; \quad (2.19)$$

(ii) for any $\ell = 1,\ldots,n$, $\tau_\ell = 1$ implies $U_{i\ell} = \zeta V_{i\ell}$, and
$$\tau_\ell < 1 \text{ implies } U_{i\ell} = 0 \text{ and } V_{i\ell} = 0; \quad (2.20)$$

(iii) the remaining entries of row i and of column i of W are zero.

(c) $\sum_{i=1}^{n} |W_{ii}| \le \kappa$.
Proof:

(a) Since
$$0 \le \sum_{\ell=1}^{n}(\tau_\ell U_{i\ell} - \zeta V_{i\ell})^2 \quad (2.21)$$
$$= \sum_{\ell=1}^{n}(\tau_\ell U_{i\ell})^2 + \sum_{\ell=1}^{n} V_{i\ell}^2 - 2\zeta\sum_{\ell=1}^{n}\tau_\ell U_{i\ell}V_{i\ell}, \quad (2.22)$$
it follows that
$$2\zeta\sum_{\ell=1}^{n}\tau_\ell U_{i\ell}V_{i\ell} \le \sum_{\ell=1}^{n}(\tau_\ell U_{i\ell})^2 + \sum_{\ell=1}^{n} V_{i\ell}^2 \le 2,$$
using $\tau_\ell \le 1$ and the orthogonality of U and V. Hence
$$W_{ii} = \sum_{\ell=1}^{n}\tau_\ell U_{i\ell}V_{i\ell} \le 1$$
if we let $\zeta = 1$, and
$$W_{ii} = \sum_{\ell=1}^{n}\tau_\ell U_{i\ell}V_{i\ell} \ge -1$$
if we let $\zeta = -1$. Lemma 2.3.4 (a) follows from these two inequalities. Notice that we may also obtain this result by using the inequality
$$\sum_{\ell=1}^{n}(U_{i\ell} - \zeta\tau_\ell V_{i\ell})^2 \ge 0 \quad (2.24)$$
instead of (2.21).

(b) (i) Equations (2.18) and (2.19) follow from equality holding in (2.21) and (2.24) respectively, using (2.22), the fact that $\tau_\ell \le 1$ for $\ell = 1,\ldots,n$, and the orthogonality of U and V, whenever $W_{ii} = \sum_{\ell=1}^{n}\tau_\ell U_{i\ell}V_{i\ell} = \zeta$ for some $i = 1,\ldots,n$ (where $\zeta$ is either 1 or $-1$).

(ii) If $\tau_\ell = 1$, then (2.18) gives $U_{i\ell} = \zeta V_{i\ell}$. If $\tau_\ell < 1$, substituting (2.19) into (2.18) gives $\tau_\ell^2 U_{i\ell} = U_{i\ell}$, so $U_{i\ell} = 0$; similarly $V_{i\ell} = 0$. Moreover, from (2.18), (2.19) and the orthogonality of U and V,
$$\sum_{\ell=1}^{n} V_{i\ell}^2 = 1 \quad (2.26)$$
and
$$\sum_{\ell=1}^{n} U_{i\ell}^2 = 1. \quad (2.28)$$

(iii) Lemma 2.3.4 (b) (iii) follows from (2.28) and the orthogonality of U.

(c) Since
$$\sum_{i=1}^{n}|W_{ii}| = \sum_{i=1}^{n}\Big|\sum_{\ell=1}^{n}\tau_\ell U_{i\ell}V_{i\ell}\Big| \le \sum_{\ell=1}^{n}\tau_\ell\sum_{i=1}^{n}|U_{i\ell}V_{i\ell}|,$$
Lemma 2.3.4 (c) follows easily, as $\sum_{i=1}^{n}|U_{i\ell}V_{i\ell}| \le 1$ (by the Cauchy-Schwarz argument in the proof of Lemma 2.3.2) and $\sum_{\ell=1}^{n}\tau_\ell \le \kappa$. $\square$
Chapter 3
3.1 Introduction
In Section 3.2, we establish a max characterization for $f_\kappa(A)$ over the set $\mathcal{P}_{m\times n,\kappa}$ (defined by (3.11)), which is the set of all real m by n rank $\kappa$ partial isometries. In addition, we characterize the matrices in $\mathcal{P}_{m\times n,\kappa}$ which achieve the maximum. Section 3.3 establishes a max characterization for $f_\kappa(A)$ over a different set $\Phi_{m\times n,\kappa}$ (defined by (3.23)), which is a generalization of the set $\mathcal{P}_{m\times n,\kappa}$. The matrices in the set $\Phi_{m\times n,\kappa}$ which achieve the maximum are also identified.
Theorem 3.2.1, which is from Horn and Johnson ([16], Theorem 3.4.1), gives a max characterization for the sum of the $\kappa$ largest singular values of an m by n matrix A.

Theorem 3.2.1 Let $A \in \mathbb{R}^{m\times n}$, let $q = \min\{m,n\}$ and denote the ordered singular values of A by $\sigma_1(A) \ge \cdots \ge \sigma_q(A) \ge 0$. Then for each $\kappa = 1,\ldots,q$,
$$f_\kappa(A) = \max_{U \in \mathcal{O}_{m,\kappa},\ V \in \mathcal{O}_{n,\kappa}}\{|\mathrm{tr}(U^TAV)|\}, \quad (3.2)$$
i.e., the maximum is over all U and V with
$$U^TU = I_\kappa \quad \text{and} \quad V^TV = I_\kappa. \quad (3.3)$$
Here $I_\kappa$ is the identity matrix of order $\kappa$, and hence U and V are matrices whose columns are $\kappa$ orthonormal vectors in $\mathbb{R}^m$ and $\mathbb{R}^n$ respectively.

Proof: Write $\bar U = X^TU$ and $\bar V = Y^TV$, which again satisfy (3.3), so that $\mathrm{tr}(U^TAV) = \mathrm{tr}(\bar U^T\Sigma\bar V)$. Then
$$\mathrm{tr}(\bar U^T\Sigma\bar V) = \sum_{r=1}^{\kappa}(\bar U^T\Sigma\bar V)_{rr} = \sum_{i=1}^{n}\sigma_i(A)(\bar U\bar V^T)_{ii}.$$
By parts (a) and (c) of Lemma 2.3.2, $|(\bar U\bar V^T)_{ii}| \le 1$ for $i = 1,\ldots,n$ and $\sum_{i=1}^{n}|(\bar U\bar V^T)_{ii}| \le \kappa$, so that
$$|\mathrm{tr}(U^TAV)| = \Big|\sum_{i=1}^{n}\sigma_i(A)(\bar U\bar V^T)_{ii}\Big| \le \sum_{i=1}^{\kappa}\sigma_i(A). \quad (3.4)$$
If we let $U = X_1$ and $V = Y_1$, the first $\kappa$ columns of X and Y respectively, then from the SVD of A and the partitioning of X and Y,
$$\mathrm{tr}(U^TAV) = \sum_{i=1}^{\kappa}\sigma_i(A),$$
so the upper bound in (3.4) can be achieved. For a different proof, see p. 195 of [16]. $\square$
Assume that the singular values of A are ordered as
$$\sigma_1 \ge \cdots \ge \sigma_r > \sigma_{r+1} = \cdots = \sigma_\kappa = \cdots = \sigma_{r+t} > \sigma_{r+t+1} \ge \cdots \ge \sigma_n \ge 0, \quad (3.5)$$
where
$$r + 1 \le \kappa \le r + t \le n.$$
Thus, $r = 0$ if $\kappa = 1$. Also, if $t = 1$, then $\kappa = r + 1$. We may write
$$f_\kappa(A) = \sum_{i=1}^{r}\sigma_i + (\kappa - r)\sigma_\kappa. \quad (3.6)$$

Assumption: All the results which follow are derived on the assumption that $\sigma_\kappa > 0$.
First we establish the following lemma, which depends only on the definition of the set $\Phi_{n,\kappa}$ and the ordering (3.5) for the n nonnegative real numbers $\sigma_i$. Specifically, it does not require $\sigma_i$ to be a singular value of a matrix. The lemma asserts that
$$\max_{w \in \Phi_{n,\kappa}}|\sigma^Tw| = \sum_{i=1}^{\kappa}\sigma_i \quad (3.7)$$
and identifies the maximizing set (3.8): the vectors $w^*$ with $w_i^* = \zeta$ for $i = 1,\ldots,r$, with $\sum_{i=r+1}^{r+t} w_i^* = \zeta(\kappa - r)$ and $|w_i^*| \le 1$ for $i = r+1,\ldots,r+t$, and with $w_i^* = 0$ for $i = r+t+1,\ldots,n$, where $\zeta$ is either 1 or $-1$.

Let $\zeta$ be either 1 or $-1$. If $w^* = (w_1^*,\ldots,w_n^*)^T$ is any element of the right hand side of (3.8), then
$$|\sigma^Tw^*| = \Big|\sum_{i=1}^{r}\sigma_iw_i^* + \sum_{i=r+1}^{r+t}\sigma_iw_i^*\Big| = \Big|\sum_{i=1}^{r}\sigma_i\zeta + \sigma_\kappa\zeta(\kappa - r)\Big| = \sum_{i=1}^{r}\sigma_i + (\kappa - r)\sigma_\kappa = \sum_{i=1}^{\kappa}\sigma_i.$$
Conversely, let $w^* \in \arg\max\{|\sigma^Tw| : w \in \Phi_{n,\kappa}\}$. Then $w^* \in \Phi_{n,\kappa}$ satisfies $\sigma^Tw^* = \zeta\sum_{i=1}^{\kappa}\sigma_i$ and from (3.7) we have
$$\sum_{i=1}^{n} w_i^* = \zeta\kappa.$$
There are two possible cases corresponding to this. The first case is where $\zeta = 1$, so that
$$\sigma^Tw^* = \sum_{i=1}^{\kappa}\sigma_i \quad \text{and} \quad \sum_{i=1}^{n} w_i^* = \kappa. \quad (3.9)$$
From (3.5) it follows that
$$w_i^* = 1 \ \text{ for } i = 1,\ldots,r, \qquad \sum_{i=r+1}^{r+t} w_i^* = \kappa - r,$$
and
$$w_i^* = 0 \ \text{ for } i = r+t+1,\ldots,n.$$
The second case is where $\zeta = -1$, so that
$$\sigma^Tw^* = -\sum_{i=1}^{\kappa}\sigma_i \quad \text{and} \quad \sum_{i=1}^{n} w_i^* = -\kappa, \quad (3.10)$$
and the corresponding conclusions follow with the signs reversed.
Lemma 3.2.2 Let $\kappa \in \{1,\ldots,n\}$, let $m \ge n$ and define the set $\mathcal{P}_{m\times n,\kappa}$ by
$$\mathcal{P}_{m\times n,\kappa} = \{W \in \mathbb{R}^{m\times n} : W = UV^T,\ U \in \mathcal{O}_{m,\kappa},\ V \in \mathcal{O}_{n,\kappa}\}. \quad (3.11)$$
Then $W \in \mathcal{P}_{m\times n,\kappa}$ if and only if W is a rank $\kappa$ partial isometry, i.e., the singular values of W are $\tau_1 = \cdots = \tau_\kappa = 1$ and $\tau_{\kappa+1} = \cdots = \tau_n = 0$.

Proof: If $W \in \mathcal{P}_{m\times n,\kappa}$, then there exist $U \in \mathcal{O}_{m,\kappa}$ and $V \in \mathcal{O}_{n,\kappa}$ such that $W = UV^T$. Form the matrices $[U\ \bar U] \in \mathcal{O}_{m,m}$ and $[V\ \bar V] \in \mathcal{O}_{n,n}$. Then
$$W = [U\ \bar U]\begin{bmatrix} I_\kappa & 0 \\ 0 & 0 \end{bmatrix}[V\ \bar V]^T,$$
which is an SVD of W with the required singular values. Conversely, if $W = U\Sigma V^T$ is such an SVD, then $W = \hat U\hat V^T$, where $\Sigma \in \mathbb{R}^{m\times n}$, and $\hat U \in \mathcal{O}_{m,\kappa}$ and $\hat V \in \mathcal{O}_{n,\kappa}$ are the first $\kappa$ columns of $U \in \mathcal{O}_{m,m}$ and $V \in \mathcal{O}_{n,n}$ respectively. $\square$
The following lemma gives a max characterization for $f_\kappa(A)$ over the set $\mathcal{P}_{m\times n,\kappa}$. In the proof of this lemma, we establish an upper bound for the sum of the $\kappa$ largest singular values of A. We then use an appropriate square diagonal matrix to show that the upper bound can be achieved.

Lemma 3.2.3 Let $\kappa \in \{1,\ldots,n\}$, let $m \ge n$ and define $\mathcal{P}_{m\times n,\kappa}$ by (3.11). Then
$$f_\kappa(A) = \sum_{i=1}^{\kappa}\sigma_i(A) = \max_{W \in \mathcal{P}_{m\times n,\kappa}}|\langle A, W\rangle|.$$
Remark 3.2.3 From the properties of the Frobenius inner product, $\langle A, UV^T\rangle = \mathrm{tr}(U^TAV)$, so Lemma 3.2.3 restates the max characterization of Theorem 3.2.1.

Proof: For any $W \in \mathcal{P}_{m\times n,\kappa}$, (1.1) and the properties of the Frobenius inner product imply that
$$\langle A, W\rangle = \langle X\Sigma Y^T, W\rangle = \langle\Sigma, X^TWY\rangle.$$
Since $X^TWY \in \mathcal{P}_{m\times n,\kappa}$ whenever W is,
$$\max_{W \in \mathcal{P}_{m\times n,\kappa}}|\langle A, W\rangle| = \max_{W \in \mathcal{P}_{m\times n,\kappa}}|\langle\Sigma, X^TWY\rangle| = \max_{W \in \mathcal{P}_{m\times n,\kappa}}|\langle\Sigma, W\rangle| = \max_{W \in \mathcal{P}_{m\times n,\kappa}}\Big|\sum_{i=1}^{n}\sigma_i(A)W_{ii}\Big| \le \sum_{i=1}^{\kappa}\sigma_i(A),$$
as from Lemma 2.3.2 (a) and (c), $W \in \mathcal{P}_{m\times n,\kappa}$ implies that $|W_{ii}| \le 1$ for $i = 1,\ldots,n$ and $\sum_{i=1}^{n}|W_{ii}| \le \kappa$. Let $W = \mathrm{diag}(W_{11},\ldots,W_{nn})$ with $W_{ii} = 1$ for $i = 1,\ldots,\kappa$ and $W_{ii} = 0$ otherwise; this W is in $\mathcal{P}_{m\times n,\kappa}$ and achieves the upper bound. $\square$
Remark 3.2.4 Another matrix which achieves the maximum can be obtained from (1.1), the SVD of A. Suppose we partition X as $X = [X_1\ X_2]$ such that $X_1 \in \mathcal{O}_{m,\kappa}$ and $X_2 \in \mathcal{O}_{m,m-\kappa}$, and Y as $[Y_1\ Y_2]$ such that $Y_1 \in \mathcal{O}_{n,\kappa}$ and $Y_2 \in \mathcal{O}_{n,n-\kappa}$. Then, letting $W = X_1Y_1^T$ gives
$$\langle A, W\rangle = \sum_{i=1}^{\kappa}\sigma_i(A).$$
Lemma 3.2.4 Let $\mathcal{S}_t^+$ be the set of all real $t \times t$ symmetric positive semidefinite matrices, let $\mathcal{S}_t^-$ be the set of all real $t \times t$ symmetric negative semidefinite matrices and let $\mathcal{P}_{t\times t,\kappa-r}$ be the set of all real $t \times t$ rank $\kappa - r$ partial isometries. Then

(a)
$$\mathcal{S}_t^+ \cap \mathcal{P}_{t\times t,\kappa-r} = \{B : B = Q_1Q_1^T,\ Q_1 \in \mathcal{O}_{t,\kappa-r}\}; \quad (3.13)$$

(b)
$$\mathcal{S}_t^- \cap \mathcal{P}_{t\times t,\kappa-r} = \{B : B = -Q_1Q_1^T,\ Q_1 \in \mathcal{O}_{t,\kappa-r}\}.$$

Proof: Let $B \in \mathcal{S}_t^+ \cap \mathcal{P}_{t\times t,\kappa-r}$. Let $\sigma_1(B),\ldots,\sigma_t(B)$ denote the singular values of B with $\sigma_1(B) \ge \cdots \ge \sigma_t(B) \ge 0$ and let $\lambda_1(B),\ldots,\lambda_t(B)$ denote the eigenvalues of B with $|\lambda_1(B)| \ge \cdots \ge |\lambda_t(B)|$. Suppose B has the spectral decomposition
$$B = Q\Lambda Q^T \quad (3.14)$$
where $Q \in \mathcal{O}_{t,t}$ and $\Lambda = \mathrm{diag}(\lambda_1(B),\ldots,\lambda_t(B))$. Since B is symmetric,
$$\sigma_i(B) = |\lambda_i(B)| \quad \text{for } i = 1,\ldots,t. \quad (3.15)$$
As B is positive semidefinite, $\lambda_i(B) \ge 0$, so
$$\sigma_i(B) = \lambda_i(B) \quad \text{for } i = 1,\ldots,t.$$
If $B \in \mathcal{P}_{t\times t,\kappa-r}$, then by the definition of a partial isometry $\lambda_1(B) = \cdots = \lambda_{\kappa-r}(B) = 1$ and the remaining eigenvalues are zero, so $B = Q_1Q_1^T$, where $Q_1 \in \mathcal{O}_{t,\kappa-r}$ consists of the first $\kappa - r$ columns of Q.

Conversely, let B be any element of the right hand side of (3.13). Clearly, B is symmetric and positive semidefinite. Furthermore, as there exists a $Q_1 \in \mathcal{O}_{t,\kappa-r}$ such that $B = Q_1Q_1^T$, from Lemma 3.2.2 it follows that B is a rank $\kappa - r$ partial isometry. Thus, $B \in \mathcal{S}_t^+ \cap \mathcal{P}_{t\times t,\kappa-r}$.

The proof for Lemma 3.2.4 (b) is similar, with (3.15) replaced by $\sigma_i(B) = -\lambda_i(B)$. $\square$
Example 3.2.1 This shows that there exists a symmetric $B \in \mathcal{P}_{t\times t,\kappa-r}$ which cannot be expressed as $Q_1Q_1^T$ for some $Q_1 \in \mathcal{O}_{t,\kappa-r}$; a symmetric partial isometry need not be positive semidefinite, whereas every matrix of the form $Q_1Q_1^T$ is. This motivates the set
$$\mathcal{V}'_{t\times t,\kappa-r} = \{\bar W \in \mathcal{S}_t : \bar W = ZZ^T,\ Z \in \mathcal{O}_{t,\kappa-r}\}. \quad (3.19)$$
Partition the rows and columns of $W^* \in \mathbb{R}^{m\times n}$ into blocks as
$$W^* = \begin{bmatrix} F_{11} & F_{12} & F_{13} \\ F_{21} & F_{22} & F_{23} \\ F_{31} & F_{32} & F_{33} \end{bmatrix},$$
where the dimensions of the square matrices $F_{ii}$ for $i = 1, 2$ are respectively r and t, and partition U and V conformally into blocks $C_{ij}$ and $E_{ij}$, where $C_{11}$ and $C_{22}$ ($E_{11}$ and $E_{22}$) are $r \times r$ and $t \times (\kappa - r)$ matrices respectively.

From (3.20a) the diagonal elements of $W^*$ are all either equal to 1 or $-1$ for $i = 1,\ldots,r$. Then Lemma 2.3.2 (b) implies that $F_{11} = \zeta I_r$, $F_{1j} = 0$ for all $j \ne 1$, $F_{i1} = 0$ for all $i \ne 1$ and $[C_{11}\ C_{12}] = \zeta[E_{11}\ E_{12}]$. As the diagonal elements of $W^*$ satisfy (3.20), from Lemma 2.3.3 it follows that $F_{3j} = 0$ for $j = 1, 2, 3$ and $F_{i3} = 0$ for $i = 1, 2, 3$. Furthermore, from (2.14),
$$\begin{bmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{bmatrix} = \zeta\begin{bmatrix} E_{11} & E_{12} \\ E_{21} & E_{22} \end{bmatrix}.$$
From (2.16) we have $C_{3j} = 0$, and from (2.17) $E_{3j} = 0$, for all j. Using all these results and setting $\hat U^T = [C_{11}\ C_{12}] \in \mathbb{R}^{r\times\kappa}$ and $\tilde U^T = [C_{21}\ C_{22}] \in \mathbb{R}^{t\times\kappa}$,
$$U^* = \begin{bmatrix} \hat U^T \\ \tilde U^T \\ 0 \end{bmatrix}, \quad 0 \in \mathbb{R}^{(m-r-t)\times\kappa},
\qquad
V^* = \zeta\begin{bmatrix} \hat U^T \\ \tilde U^T \\ 0 \end{bmatrix}, \quad 0 \in \mathbb{R}^{(n-r-t)\times\kappa},$$
and so
$$W^* = \zeta\begin{bmatrix} \hat U^T\hat U & \hat U^T\tilde U & 0 \\ \tilde U^T\hat U & \tilde U^T\tilde U & 0 \\ 0 & 0 & 0 \end{bmatrix}.$$
Let $\bar W = \tilde U^T\tilde U$. From (3.20d), it follows that $\mathrm{tr}(\bar W) = \kappa - r$. Notice that $\bar W$ is symmetric, positive semidefinite and also a rank $\kappa - r$ partial isometry. Hence, by Lemma 3.2.4 we can express $\bar W$ as
$$\bar W = ZZ^T, \quad Z \in \mathcal{O}_{t,\kappa-r}. \quad (3.21)$$
The maximizing set is therefore
$$\Omega_{\mathcal{P}} = \{X_1Y_1^T + X_2ZZ^TY_2^T : Z \in \mathcal{O}_{t,\kappa-r}\},$$
where $X_1$, $X_2$, $Y_1$, $Y_2$ satisfy (3.22).
Below are two possible generalizations of the set $\mathcal{P}_{m\times n,\kappa}$. Let $m \ge n$, let $\kappa \in \{1,\ldots,n\}$ and define
$$\Phi^1_{m\times n,\kappa} = \{W \in \mathbb{R}^{m\times n} : \tau_1 \le 1,\ |\mathrm{tr}(W)| \le \kappa\}
\quad\text{and}\quad
\Phi^2_{m\times n,\kappa} = \{W \in \mathbb{R}^{m\times n} : \tau_1 \le 1,\ \mathrm{atr}(W) \le \kappa\},$$
where $\tau_1,\ldots,\tau_n$ are the singular values of W. Both sets are compact and convex. However, the following example illustrates that they are not suitable for our purposes.
Let $m = 4$, $n = 3$, $\kappa = 2$ and let $W \in \mathbb{R}^{4\times 3}$ be a matrix with singular value decomposition $W = \bar X\bar\Sigma\bar Y^T$ whose singular values are $\tau_1 = \tau_2 = \tau_3 = 1$ and for which $|\mathrm{tr}(W)| = \mathrm{atr}(W) = 1.6905 \le \kappa$. Thus, W is in $\Phi^1_{m\times n,\kappa}$ and also in $\Phi^2_{m\times n,\kappa}$, but if $A = \bar X\Sigma\bar Y^T$ with singular values $\sigma_1 = 3$, $\sigma_2 = 2$ and $\sigma_3 = 1$, then
$$|\langle A, W\rangle| = \sigma_1 + \sigma_2 + \sigma_3 = 6 > \sigma_1 + \sigma_2 = f_\kappa(A),$$
so the maximum of $|\langle A, W\rangle|$ over either set exceeds $f_\kappa(A)$.
We therefore define
$$\Phi_{m\times n,\kappa} = \Big\{W \in \mathbb{R}^{m\times n} : \tau_1 \le 1 \text{ and } \sum_{i=1}^{n}\tau_i \le \kappa\Big\}, \quad (3.23)$$
where $m \ge n$, $\kappa \in \{1,\ldots,n\}$ and $\tau_1,\ldots,\tau_n$ are the singular values of W.

Remark 3.3.1 $\Phi_{m\times n,\kappa}$ is a compact convex set. It is also invariant under orthogonal transformations. The convexity and orthogonal invariance of $\Phi_{m\times n,\kappa}$ can be established using the properties of the SVD given in Section 2.2.
Lemma 3.3.1 Let $A \in \mathbb{R}^{m\times n}$ have singular values $\sigma_1 \ge \cdots \ge \sigma_n > 0$ and let $\Phi_{m\times n,\kappa}$ be defined by (3.23). Then
$$f_\kappa(A) = \max\{|\langle A, W\rangle| : W \in \Phi_{m\times n,\kappa}\}. \quad (3.24)$$

Proof: For any $W \in \Phi_{m\times n,\kappa}$, equation (1.1) and the properties of the Frobenius inner product imply that
$$\langle A, W\rangle = \langle\Sigma, X^TWY\rangle. \quad (3.25)$$
By the orthogonal invariance of $\Phi_{m\times n,\kappa}$,
$$\max_{W \in \Phi_{m\times n,\kappa}}|\langle A, W\rangle| = \max_{W \in \Phi_{m\times n,\kappa}}|\langle\Sigma, W\rangle| = \max_{W \in \Phi_{m\times n,\kappa}}\Big|\sum_{i=1}^{n}\sigma_i(A)W_{ii}\Big| \le \max_{W \in \Phi_{m\times n,\kappa}}\sum_{i=1}^{n}\sigma_i(A)|W_{ii}| \le \sum_{i=1}^{\kappa}\sigma_i(A) = f_\kappa(A),$$
as from parts (a) and (c) of Lemma 2.3.4, $W \in \Phi_{m\times n,\kappa}$ implies that $|W_{ii}| \le 1$ for $i = 1,\ldots,n$ and $\sum_{i=1}^{n}|W_{ii}| \le \kappa$.

Let $W = \mathrm{diag}(W_{11},\ldots,W_{nn})$ such that $W_{ii} = 1$ for $i = 1,\ldots,\kappa$ and $W_{ii} = 0$ otherwise. This W is in $\Phi_{m\times n,\kappa}$ and achieves the maximum. $\square$
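The support-function characterization can be sanity-checked by sampling from $\Phi_{m\times n,\kappa}$. The construction of random members below, by clipping and rescaling singular values, is our own device:

```python
import numpy as np

rng = np.random.default_rng(5)
m, n, kappa = 6, 4, 2
A = rng.standard_normal((m, n))
s = np.linalg.svd(A, compute_uv=False)
f = s[:kappa].sum()

def random_phi_member():
    # random W with tau_1 <= 1 and sum(tau) <= kappa
    W = rng.standard_normal((m, n))
    U, tau, Vt = np.linalg.svd(W, full_matrices=False)
    tau = np.minimum(tau, 1.0)
    tau *= min(1.0, kappa / tau.sum())
    return U @ np.diag(tau) @ Vt

# |<A, W>| <= f_kappa(A) for all sampled W, with equality at W = X1 Y1^T
ok = all(abs(np.trace(A.T @ random_phi_member())) <= f + 1e-10 for _ in range(200))
U, _, Vt = np.linalg.svd(A)
Wstar = U[:, :kappa] @ Vt[:kappa, :]
print(ok, np.isclose(np.trace(A.T @ Wstar), f))  # True True
```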
Remark 3.3.2 For any $W \in \Phi_{m\times n,\kappa}$, the inner products $\langle A, W\rangle$ and $-\langle A, W\rangle$ are linear functions of $A \in \mathbb{R}^{m\times n}$. As a pointwise maximum of convex functions is convex, the convexity of $f_\kappa(A)$ follows from the max characterization in Lemma 3.3.1. In a similar way, the convexity of $f_\kappa(A)$ also follows from the partial isometry characterization in Lemma 3.2.3.
Let $W^*$ attain the maximum and have the SVD $W^* = U^*\Sigma^*V^{*T}$ with
$$\Sigma^* = \begin{bmatrix} I_p & 0 & 0 \\ 0 & \tilde\Sigma & 0 \\ 0 & 0 & 0 \end{bmatrix},$$
where
$$1 = \tau_1 = \cdots = \tau_p > \tau_{p+1} \ge \cdots \ge \tau_q > \tau_{q+1} = \cdots = \tau_n = 0 \quad (3.32)$$
and $0 \le p < \kappa$ and $r + 1 \le \kappa \le q \le n$. Here $I_p$, $\tilde\Sigma$ and the diagonal zero are square matrices with dimensions p, $q - p$ and $n - q$ respectively. Also, partition $U^*$ and $V^*$ conformally.
Moreover, from (3.31) it follows that $\sum_{i=1}^{n} W^*_{ii} = \kappa\zeta$. As $W^*_{ii} = \sum_{\ell=1}^{n}\tau_\ell U^*_{i\ell}V^*_{i\ell}$, we obtain
$$\zeta\sum_{i=1}^{n} W^*_{ii} = \zeta\sum_{i=1}^{n}\sum_{\ell=1}^{n}\tau_\ell U^*_{i\ell}V^*_{i\ell} = \kappa. \quad (3.35)$$
Now, arguing as in Lemma 2.3.4,
$$\tau_\ell^{1/2}U^*_{i\ell} = \zeta\tau_\ell^{1/2}V^*_{i\ell} \quad \text{for } i = 1,\ldots,n \text{ and } \ell = 1,\ldots,q,$$
i.e.,
$$C_{11} = \zeta E_{11}, \quad C_{12} = \zeta E_{12}, \quad C_{21} = \zeta E_{21}, \quad C_{22} = \zeta E_{22}, \quad C_{31} = \zeta E_{31}, \quad C_{32} = \zeta E_{32}. \quad (3.36)$$
From (3.31a), $W^*_{ii} = \zeta$ for $i = 1,\ldots,r$. By (3.32), $\tau_\ell < 1$ for $\ell = p+1,\ldots,n$, so that from (2.20) we obtain
$$W^* = U^*\Sigma^*V^{*T} = \zeta\begin{bmatrix} I_r & 0 & 0 \\ 0 & \bar W & 0 \\ 0 & 0 & 0 \end{bmatrix},$$
where the sizes of the diagonal blocks of the $m \times n$ matrix $W^*$ are respectively $r \times r$, $t \times t$ and $(m-r-t)\times(n-r-t)$. Therefore $E_{31} = 0$ and $E_{32} = 0$.

Let $\bar W = C_{21}C_{21}^T + C_{22}\tilde\Sigma C_{22}^T$, where $\tilde\Sigma \ge 0$. The trace condition $\mathrm{tr}(\bar W) = \kappa - r$ comes directly from (3.31d). It is obvious that $\bar W \in \mathcal{S}_t$. Also, $\bar W$ is a positive semidefinite matrix, as it is a sum of positive semidefinite matrices. Therefore
Example 3.3.1 A matrix $W^* \in \arg\max\{|\langle\Sigma, W\rangle| : W \in \Phi_{m\times n,\kappa}\}$, together with the orthogonal matrices $U^*$ and $V^*$ from its SVD, illustrates the structure above. The singular values of $W^*$ are $\tau_1 = \tau_2 = \tau_3 = 1$ and $\tau_4 = \tau_5 = 0$. In this example $m = 6$, $n = 5$, $p = q = 3$, $r = 2$, $t = 2$ and $\kappa = 3$, and the middle $t \times t$ diagonal block of $W^*$ is
$$\bar W = \begin{bmatrix} 0.9949 & -0.0712 \\ -0.0712 & 0.0051 \end{bmatrix}.$$
Notice that
$$\sum_{i=r+1}^{r+t} W^*_{ii} = 1 = \kappa - r$$
and
$$\max_{W \in \Phi_{m\times n,\kappa}}|\langle A, W\rangle| = \max_{W \in \Phi_{m\times n,\kappa}}|\langle\Sigma, W\rangle| = \sum_{i=1}^{\kappa}\sigma_i.$$
From (3.25) and the invariance of $\Phi_{m\times n,\kappa}$ under orthogonal transformations, a maximizer for A is obtained from a maximizer $W^*$ for $\Sigma$ by setting $W = XW^*Y^T$, so that $\langle A, W\rangle = \langle\Sigma, W^*\rangle$.
Remark 3.3.4 The degrees of freedom in the argmax result of Corollary 3.3.1 are parametrized by a symmetric matrix $\bar W$. The trace condition on $\bar W$ gives a linear equation, while the positive semidefinite constraints give eigenvalue inequalities. Furthermore, $\Phi_{m\times n,\kappa}$ is a convex set. Thus, the argmax result leads to a minimal representation of the subdifferential of $f_\kappa$, so that a nonsingular system can be solved to verify optimality. The argmax result for the partial isometry characterization of $f_\kappa(A)$ given by Corollary 3.2.1 is less useful because it involves orthogonal matrices. Also, the set $\mathcal{P}_{m\times n,\kappa}$ is not convex.
Remark 3.3.5 The set $\mathcal{V}_{t\times t,\kappa-r}$ is the convex hull of the set $\mathcal{V}'_{t\times t,\kappa-r}$ (defined by (3.19)), and $\mathcal{V}'_{t\times t,\kappa-r}$ is the set of extreme points of $\mathcal{V}_{t\times t,\kappa-r}$ (see Overton and Womersley [25], Theorem 3).
Remark 3.3.6 If $\kappa = r + t$, i.e., $\sigma_\kappa > \sigma_{\kappa+1}$, then $\bar W = I_t$ and the argmax set is a singleton.

We now consider the situation where $\sigma_\kappa = 0$. For this case, the appropriate ordering of the singular values of A is
$$\sigma_1 \ge \cdots \ge \sigma_r > \sigma_{r+1} = \cdots = \sigma_n = 0,$$
where $n = r + t$. The corresponding argmax set is
$$\Big\{W \in \mathbb{R}^{m\times n} : W = \begin{bmatrix} \zeta I_r & 0 \\ 0 & \bar W \\ 0 & 0 \end{bmatrix},\ \zeta \text{ is either } 1 \text{ or } -1 \text{ and } \bar W \in \mathcal{V}_{t\times t,\kappa-r}\Big\},$$
where
$$\mathcal{V}_{t\times t,\kappa-r} = \{\bar W \in \mathcal{S}_t : 0 \le \bar W \le I \text{ (in the semidefinite order) and } \mathrm{tr}(\bar W) = \kappa - r\}.$$
Proof: The proof is similar to the proof of Theorem 3.3.1, with equations (3.31) replaced by the corresponding equations for this ordering, and with $U^*$, $\Sigma^*$ and $V^*$ from the SVD of $W^*$ partitioned as follows:
$$\Sigma^* = \begin{bmatrix} I_p & 0 \\ 0 & \tilde\Sigma \end{bmatrix},$$
where $I_p$ and $\tilde\Sigma$ are $p \times p$ and $(q-p)\times(n-p)$ matrices respectively. $\square$
Chapter 4
Differential properties
4.1 Introduction
Recall that
$$f_\kappa(A) = \sum_{i=1}^{\kappa}\sigma_i(A).$$
4.2 The subdifferential of $f_\kappa(A)$

Lemma 4.2.1 The function $f_\kappa : \mathbb{R}^{m\times n} \to \mathbb{R}$ is convex and its subdifferential $\partial f_\kappa(A)$ is the nonempty compact convex set
$$\partial f_\kappa(A) = \{X_1Y_1^T + X_2\bar WY_2^T : \bar W \in \mathcal{V}_{t\times t,\kappa-r}\}, \quad (4.1)$$
where $\mathcal{V}_{t\times t,\kappa-r}$ is defined by (3.28) and $X_1$, $X_2$, $Y_1$, $Y_2$ satisfy (3.22). Furthermore, $f_\kappa$ is differentiable at the point A if and only if $\kappa = r + t$, in which case $\partial f_\kappa(A)$ reduces to $X_1Y_1^T + X_2Y_2^T$, the derivative of $f_\kappa$ at A.

Proof: By Lemma 3.3.1 the function $f_\kappa$ is the support function for $\Phi_{m\times n,\kappa}$, and is therefore convex. Its subdifferential at A is the set of maximizers in (3.24), taken with the sign
$$\zeta = \operatorname{sgn}\langle A, W^*\rangle \quad (4.2)$$
for a maximizer $W^*$. The properties of the Frobenius inner product and (3.22) imply that $\zeta$ in Lemma 3.3.1 is given by (4.2). Note that $f_\kappa(A) = 0$ if and only if $A = 0$.

Equation (4.1) follows from Corollary 3.3.1, as the convex hull of a convex set is the set itself. Clearly, the right-hand side of (4.1) is a singleton if and only if $\kappa = r + t$. The last part of the result then follows from Theorem 25.1 of Rockafellar [27]. $\square$
Suppose now that A is a smooth function of $x \in \mathbb{R}^\ell$, with partial derivatives
$$A_k(x) = \frac{\partial A(x)}{\partial x_k} \quad \text{for } k = 1,\ldots,\ell.$$
In this section we characterize the generalized gradient of the convex composite function
$$f_\kappa(x) = f_\kappa(A(x)).$$
Although $A_k$ and the singular values and singular vectors of $A(x)$ are functions of $x \in \mathbb{R}^\ell$, the explicit dependence on x will usually be omitted. Therefore, as before, the singular values of $A(x)$ are denoted by (3.5), with r and t now dependent on x, and with the corresponding left and right singular vectors satisfying (3.22).
Lemma 4.3.1 The generalized gradient of $f_\kappa(x)$ is
$$\partial f_\kappa(x) = \big\{v \in \mathbb{R}^\ell : v_k = \mathrm{tr}(X_1^TA_kY_1) + \langle\bar W, X_2^TA_kY_2\rangle,\ k = 1,\ldots,\ell,\ \bar W \in \mathcal{V}_{t\times t,\kappa-r}\big\}.$$

Proof: Since $f_\kappa(A)$ is convex and $A(x)$ is smooth, the chain rule of Theorem 2.3.10 from Clarke [3] implies that
$$\partial f_\kappa(x) = \{(\langle G, A_1\rangle,\ldots,\langle G, A_\ell\rangle)^T : G \in \partial f_\kappa(A(x))\}.$$
Lemma 4.2.1 and the properties of the Frobenius inner product complete the proof. $\square$

Remark 4.3.1 Since $f_\kappa(x)$ is a pointwise maximum of functions that are smooth in x, the result also follows from Theorem 2.8.6 of Clarke [3], which characterizes the generalized gradients of functions defined by a pointwise maximum.
Remark 4.3.2 The form of the generalized gradient given by Lemma 4.3.1
is computationally convenient as it does not involve taking a convex hull.
The absence of the convex hull operation also means that the structure of the
subdifferential is displayed.
Corollary 4.3.1 If $\kappa = r + t$, i.e., $\sigma_\kappa > \sigma_{\kappa+1}$, the function $f_\kappa$ is differentiable at x with
$$\frac{\partial f_\kappa(x)}{\partial x_k} = \mathrm{tr}(X_1^TA_kY_1) + \mathrm{tr}(X_2^TA_kY_2). \quad (4.3)$$

Proof: It follows from Lemma 4.3.1 using the ordinary chain rule. $\square$
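When all singular values are simple, (4.3) reduces to $\sum_{i=1}^{\kappa} x_i^TA_ky_i$, which can be verified against a central finite difference. The test problem below, a diagonal $A_0$ with distinct singular values perturbed along a random direction, is our own:

```python
import numpy as np

rng = np.random.default_rng(6)
m, n, kappa = 6, 4, 2
A0 = np.vstack([np.diag([4.0, 3.0, 2.0, 1.0]),  # distinct singular values
                np.zeros((2, 4))])
A1 = rng.standard_normal((m, n))

# analytic derivative of f_kappa(A0 + x A1) at x = 0: sum_i x_i^T A1 y_i
X, s, Yt = np.linalg.svd(A0)
grad = sum(X[:, i] @ A1 @ Yt[i, :] for i in range(kappa))

# central finite difference
h = 1e-6
f = lambda x: np.linalg.svd(A0 + x * A1, compute_uv=False)[:kappa].sum()
fd = (f(h) - f(-h)) / (2 * h)
print(np.isclose(grad, fd, atol=1e-4))  # True
```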
54
4.4 Necessary conditions
and
These two conditions are computationally very useful, as one can relax the
inequalities on W and solve (4.5) together with tr(W) = κ − r for W. This
requires solving a system of ℓ + 1 linear equations for the t(t + 1)/2 unknowns
in the symmetric matrix W. If the inequalities 0 ≤ W ≤ I are not satisfied
then a descent direction may be generated. This is discussed in Section 4.6.
If f_κ is convex (for example, if A(x) is affine), then equations (4.4) and
(4.5) are both necessary and sufficient for x to be a minimizer of f_κ.
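Checking the inequalities 0 ≤ W ≤ I (in the semidefinite order) reduces to inspecting the eigenvalues of W. A small Python sketch; the function name and return convention are illustrative:

```python
import numpy as np

def within_unit_interval(W, tol=1e-10):
    """Check 0 <= W <= I (in the semidefinite order) by inspecting the
    eigenvalues of the symmetric matrix W.  Returns (ok, theta, z): when
    some eigenvalue lies outside [0, 1], theta is that eigenvalue and z a
    unit eigenvector of W -- the quantities used later to build a descent
    direction.  Names and conventions here are illustrative."""
    vals, vecs = np.linalg.eigh(W)   # eigenvalues in ascending order
    if vals[0] < -tol:
        return False, vals[0], vecs[:, 0]
    if vals[-1] > 1.0 + tol:
        return False, vals[-1], vecs[:, -1]
    return True, None, None
```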
where (4.6) follows from Lemma 4.2.1. Recall that the matrices X_1, X_2, Y_1
and Y_2, defined by (3.22), are evaluated at the point x, and that A_k is the
partial derivative of A(x) with respect to x_k evaluated at the point x. For
k = 1, …, ℓ define
b_k = tr(X_1^T A_k Y_1),  (4.8)
and
B_k = (1/2)(X_2^T A_k Y_2 + Y_2^T A_k^T X_2).  (4.9)
Also, define B(d) ∈ S_t by
B(d) = Σ_{k=1}^ℓ d_k B_k.  (4.10)
Let the eigenvalues of the symmetric matrix B(d) be β_1 ≥ ··· ≥ β_t. Then
from Theorem 3.4 of Overton and Womersley [24], it follows that
f_κ'(x; d) = b^T d + Σ_{i=1}^{κ−r} β_i.  (4.11)
and d = −1, then b^T d = −0.7098 and B(d) has eigenvalues β_1 = 0.8353, β_2 =
0.0176 and β_3 = −1.5129. Again from (4.11), we obtain f_κ'(0; −1) = 0.1431.
For comparison, the definition
f_κ'(x; d) = lim_{α→0+} [f_κ(x + αd) − f_κ(x)]/α
with α = 1×10⁻⁷ gave the approximations f_κ'(0; +1) = 2.2051 and f_κ'(0; −1) =
0.1431.
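The difference-quotient approximation above is easy to reproduce numerically. A Python sketch, assuming an affine A(x); the diagonal test matrix is illustrative, not the example from the text:

```python
import numpy as np

def sigma_sum(A, kappa):
    # sum of the kappa largest singular values of A
    return np.linalg.svd(A, compute_uv=False)[:kappa].sum()

def fd_directional(A0, Aks, x, d, kappa, alpha=1e-7):
    """One-sided difference quotient (f(x + alpha*d) - f(x)) / alpha,
    approximating the directional derivative f_kappa'(x; d) as in the text."""
    A = lambda y: A0 + sum(yk * Ak for yk, Ak in zip(y, Aks))
    x = np.asarray(x, dtype=float)
    d = np.asarray(d, dtype=float)
    return (sigma_sum(A(x + alpha * d), kappa) - sigma_sum(A(x), kappa)) / alpha
```

For a smooth instance, e.g. A(x) = diag(3 + x, 2, 1) with κ = 1, the quotient approaches the exact derivative 1 as α shrinks.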
In this section, for given x, we wish to either (a) generate a descent direction
for f_κ, or (b) demonstrate that x satisfies the first-order conditions for
optimality. If κ = r + t, then f_κ(x) is differentiable; consequently it is sufficient
to examine the gradient, which has entries given by (4.3). If the gradient
is zero, the first-order optimality conditions hold; otherwise, the negative gradient
provides a descent direction. The function f_κ may be nonsmooth for
κ < r + t. We consider only this case in the remainder of this section. The
steepest descent direction is not of interest, as it is known that the method
of steepest descent may converge to a non-optimal point when applied to a
nonsmooth function. Instead, we consider a descent direction which maintains
the multiplicity t for σ_κ, to first order, when possible. This is possible
in the first of the following three cases. In the second case, generation of a
descent direction requires splitting a group of singular values corresponding
to σ_κ. The third case is a degenerate case.
Case 1. I ∈ span{B_1, …, B_ℓ}.
Solve the system
δI − Σ_{k=1}^ℓ d_k B_k = 0  (4.12)
(κ − r)δ + Σ_{k=1}^ℓ d_k b_k = −1.  (4.13)
This is a system of t(t + 1)/2 + 1 linear equations in ℓ + 1 unknowns
δ, d_1, …, d_ℓ. Equation (4.12) implies that the eigenvalues of B(d) defined by
(4.10) are all equal to δ. The system is solvable since (4.12) is solvable for
any δ by assumption, and (4.13) scales this solution. Hence, from equations
(4.11) and (4.13), f_κ'(x; d) = −1, where the direction d ∈ ℝ^ℓ has components
d_1, …, d_ℓ. Note that the −1 on the right-hand side of (4.13) is just a normalization
constant and can be replaced by any η < 0, giving f_κ'(x; d) = η. To
first order, all the singular values σ_{r+1}(x), …, σ_{r+t}(x) decrease at the same
rate along d, and δ gives a first-order estimate of the change in their common
value.
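Setting up the Case 1 system numerically amounts to stacking the upper-triangular entries of the matrix equation (4.12) with the scalar equation (4.13). A Python sketch, solving by least squares since the stacked system is generally rectangular; names and toy data are illustrative:

```python
import numpy as np

def case1_direction(Bks, bks, kappa_minus_r):
    """Solve the Case 1 system: stack the upper-triangular entries of
    delta*I - sum_k d_k B_k = 0 (as in (4.12)) with the normalization
    (kappa - r)*delta + sum_k d_k b_k = -1 (as in (4.13)).  Toy sketch."""
    t = Bks[0].shape[0]
    ell = len(Bks)
    iu = np.triu_indices(t)
    M = np.zeros((len(iu[0]) + 1, ell + 1))
    rhs = np.zeros(len(iu[0]) + 1)
    M[:-1, 0] = np.eye(t)[iu]                 # coefficients of delta in (4.12)
    for k, Bk in enumerate(Bks):
        M[:-1, k + 1] = -np.asarray(Bk)[iu]   # coefficients of d_k in (4.12)
    M[-1, 0] = kappa_minus_r                  # scalar equation (4.13)
    M[-1, 1:] = bks
    rhs[-1] = -1.0
    sol, *_ = np.linalg.lstsq(M, rhs, rcond=None)
    return sol[0], sol[1:]                    # delta, d
```

For instance, with t = 1, B_1 = [1], b_1 = 0 and κ − r = 1, the system forces δ = d_1 = −1.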
Case 2. Case 1 does not apply and the span of the ℓ + 1 vectors in ℝ^{t(t+1)/2}
for the dual matrix W ∈ S_t. Notice that the trace condition (4.14) is equivalent
to ⟨I, W⟩ = κ − r. Since the {B_k} may not form a linearly independent
set, (4.15) may be replaced by considering only a maximal independent set
of {B_k}. The resulting system cannot be inconsistent because of the related
definitions of the left and right-hand sides (i.e. B_k and b_k). By the rank
assumption, the resulting linear system is square and nonsingular, with order
t(t + 1)/2, and has a unique solution W. If W satisfies 0 ≤ W ≤ I then
0 ∈ ∂f_κ(x), so x satisfies the first-order necessary conditions for a minimum.
If these inequalities on W are not satisfied then a descent direction can be
generated using the following lemma. This lemma shows the importance of
the eigenvalues of the t by t dual matrix W.
Lemma 4.6.1 Suppose (4.14) and (4.15) are satisfied but 0 ∉ ∂f_κ(x), so W
has an eigenvalue θ outside [0, 1]. Let z ∈ ℝ^t be the corresponding normalized
eigenvector of W. Choose β ∈ ℝ so that β < 0 if θ > 1 and β > 0 if θ < 0.
Solve
δI − Σ_{k=1}^ℓ d_k B_k = βzz^T.
Remark 4.6.1 The descent direction splits the multiple singular value into
two clusters, one of unit multiplicity and one of multiplicity t − 1, to first order.
This is analogous to moving off only one active constraint at a time in linear
or nonlinear programming. A descent direction is generated using information
provided by the dual matrix W. Negative Lagrange multipliers provide similar
information in constrained optimization. Lemma 4.6.1 guarantees an overall
reduction in f_κ as follows: if θ < 0, then one singular value in the group
of multiplicity t is separated from the others by a reduction, reducing the
approximate multiplicity but leaving the number of singular values larger than
σ_κ, to first order, unchanged; if θ > 1, then one singular value in the group of
multiplicity t is separated from the others by an increase, again reducing the
approximate multiplicity but increasing the number of larger singular values
(to first order).
Case 3. Neither of Cases 1 and 2 applies. Degeneracy is said to occur in this case.
Generation of a descent direction is not straightforward in the degenerate
case, just as in linear or nonlinear programming.
Chapter 5
Conclusions
5.1 Introduction
Therefore, we can apply the eigenvalue results of Overton and Womersley [24]
to A^T A and simplify these to obtain results on singular values of A.
In Section 5.2, we give some indication of how this may be done. We also
derive an expression for Bk (defined by (4.9)). In Section 5.3, we make some
concluding remarks on possible extensions of the work done in this thesis.
5.2 Relation to A^T A
A^T A = Y D Y^T,  D = Σ^T Σ ∈ 𝒟_n,  (5.1)
where the columns of Y form an orthonormal set of eigenvectors for A^T A
and D = diag(λ_1(A^T A), …, λ_n(A^T A)). Alternatively, the columns of Y are
the set of right singular vectors of A, and D = diag(σ_1²(A), …, σ_n²(A)) as a
consequence of (5.1).
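The identity D = diag(σ_1²(A), …, σ_n²(A)) is easy to confirm numerically. A short Python sketch with a random illustrative matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))   # an arbitrary real m x n matrix, m >= n

sigma = np.linalg.svd(A, compute_uv=False)   # singular values of A, descending
lam = np.linalg.eigvalsh(A.T @ A)[::-1]      # eigenvalues of A^T A, descending

# the eigenvalues of A^T A are the squared singular values of A
assert np.allclose(lam, sigma ** 2)
```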
Furthermore, if the eigenvalues of A^T A satisfy (5.2), then the elements of the subdifferential take the form
Y_1 Y_1^T + Y_2 W Y_2^T
with
W ∈ {W ∈ S_t : 0 ≤ W ≤ I and tr(W) = κ − r},
where Y_1, Y_2 satisfy (5.3).
g_κ(A) = Σ_{i=1}^κ λ_i(A).
From Theorem 3.5 of [24], it follows that the function g_κ : S_n → ℝ is convex
with subdifferential
For completeness, we give the derivation of the expression for B_k. Let A(x)
be a real m by n (where m ≥ n) matrix affine function of a real parameter
vector x = (x_1, …, x_ℓ)^T ∈ ℝ^ℓ, i.e.,
A(x) = A_0 + Σ_{k=1}^ℓ x_k A_k,
where {A_k} are given real m by n matrices. This expression can also be
obtained by a first-order Taylor series expansion.
We have
A(x)^T A(x) = (A_0^T + Σ_{k=1}^ℓ x_k A_k^T)(A_0 + Σ_{k=1}^ℓ x_k A_k)
            = A_0^T A_0 + Σ_{k=1}^ℓ x_k (A_0^T A_k + A_k^T A_0) + Σ_{j=1}^ℓ Σ_{k=1}^ℓ x_j x_k A_j^T A_k.
Therefore, the partial derivative of A(x)^T A(x) with respect to x_k is
where the second equality follows from the SVD of Ao and the partitioning
of X and Y.
point. In [22] and [23], Overton has described and extensively tested such
algorithms when minimizing the largest eigenvalue (i.e., κ = 1) of a symmetric
matrix-valued function.
(5.5)
subject to  δI − Σ_{k=1}^ℓ d_k X_2^T A_k Y_2 = diag(σ_{r+1}, …, σ_{r+t}).  (5.6)
Here, H is some positive semidefinite matrix. All the quantities X_2, Y_2, A_k, σ_i
and b (defined in (4.8)) are evaluated at the current point x. The new point
is x + d, and δ gives an estimate of σ_κ(x + d).
Bibliography
[1] E. Anderson et al., LAPACK Users' Guide, Society for Industrial and
Applied Mathematics, Philadelphia, 1992.
[10] R. Fletcher, Practical Methods of Optimization, (second edition), John
Wiley, Chichester and New York, 1987.
[21] The MathWorks, Inc., PRO-MATLAB User's Guide, Cochituate Place,
24 Prime Park Way, South Natick, MA 01760, 1991.
[25] M. L. Overton and R. S. Womersley, "On the sum of the largest eigenvalues
of a symmetric matrix", SIAM Journal on Matrix Analysis and
Applications, Vol. 13, No. 1, (1992), 41-45.
[31] G. A. Watson, "Computing the structured singular value", SIAM Journal
on Matrix Analysis and Applications, Vol. 13, No. 4, (1992), 1054-1066.