Convex Optimization & Euclidean Distance Geometry

Jon Dattorro

Mεβoo Publishing
Meboo Publishing USA, PO Box 12, Palo Alto, California 94302.
Dattorro, Convex Optimization & Euclidean Distance Geometry, Mεβoo, 2005, v2012.01.28.
ISBN 0976401304 (English)
ISBN 9780615193687 (International III)

This is version 2012.01.28: available in print, as conceived, in color.
cybersearch: I. convex optimization, II. semidefinite program, III. rank constraint, IV. convex geometry, V. distance matrix, VI. convex cones, VII. cardinality minimization. Programs by Matlab.

typesetting by with donations from SIAM and AMS.

This searchable electronic color pdfBook is click-navigable within the text by page, section, subsection, chapter, theorem, example, definition, cross reference, citation, equation, figure, table, and hyperlink. A pdfBook has no electronic copy protection and can be read and printed by most computers. The publisher hereby grants the right to reproduce this work in any format but limited to personal use.

© 2001-2012 Meboo. All rights reserved.

for Jennie Columba ♦ Antonio ♦ & Sze Wan

EDM^N = S_h^N ∩ (S_c^{N⊥} − S_+^N)

Prelude
The constant demands of my department and university and the ever increasing work needed to obtain funding have stolen much of my precious thinking time, and I sometimes yearn for the halcyon days of Bell Labs.
−Steven Chu, Nobel laureate [81]

Convex Analysis is the calculus of inequalities while Convex Optimization is its application. Analysis is inherently the domain of a mathematician while Optimization belongs to the engineer. A convex optimization problem is conventionally regarded as minimization of a convex objective function subject to an artificial convex domain imposed upon it by the problem constraints. The constraints comprise equalities and inequalities of convex functions whose simultaneous solution set generally constitutes the imposed convex domain: called feasible set.

It is easy to minimize a convex function over any convex subset of its domain because any locally minimum value must be the global minimum. But it is difficult to find the maximum of a convex function over some convex domain because there can be many local maxima. Although this has practical application (Eternity II, §4.6.0.0.15), it is not a convex problem. Tremendous benefit accrues when a mathematical problem can be transformed to an equivalent convex optimization problem, primarily because any locally optimal solution is then guaranteed globally optimal.^0.1
^0.1 Solving a nonlinear system, for example, by instead solving an equivalent convex optimization problem is therefore highly preferable and what motivates geometric programming; a form of convex optimization invented in the 1960s [60] [79] that has driven great advances in the electronic circuit design industry. [34, §4.7] [249] [386] [389] [101] [181] [190] [191] [192] [193] [194] [265] [266] [311]

Recognizing a problem as convex is an acquired skill; that being, to know when an objective function is convex and when constraints specify a convex feasible set. The challenge, which is indeed an art, is how to express difficult problems in a convex way: perhaps, problems previously believed nonconvex. Practitioners in the art of Convex Optimization engage themselves with discovery of which hard problems can be transformed into convex equivalents; because, once convex form of a problem is found, then a globally optimal solution is close at hand − the hard work is finished: Finding convex expression of a problem is itself, in a very real sense, its solution.

Yet, that skill acquired by understanding the geometry and application of Convex Optimization will remain more an art for some time to come; the reason being, there is generally no unique transformation of a given problem to its convex equivalent. This means, two researchers pondering the same problem are likely to formulate a convex equivalent differently; hence, one solution is likely different from the other although any convex combination of those two solutions remains optimal. Any presumption of only one right or correct solution becomes nebulous. Study of equivalence & sameness, uniqueness, and duality therefore pervade study of Optimization.

It can be difficult for the engineer to apply convex theory without an understanding of Analysis. These pages comprise my journal over a ten year period bridging gaps between engineer and mathematician; they constitute a translation, unification, and cohering of about four hundred papers, books, and reports from several different fields of mathematics and engineering. Beacons of historical accomplishment are cited throughout. Much of what is written here will not be found elsewhere. Care to detail, clarity, accuracy, consistency, and typography accompanies removal of ambiguity and verbosity out of respect for the reader. Consequently there is much cross-referencing and background material provided in the text, footnotes, and appendices so as to be more self-contained and to provide understanding of fundamental concepts.

−Jon Dattorro
Stanford, California
2011


Contents

1 Overview

2 Convex geometry
 2.1 Convex set
 2.2 Vectorized-matrix inner product
 2.3 Hulls
 2.4 Halfspace, Hyperplane
 2.5 Subspace representations
 2.6 Extreme, Exposed
 2.7 Cones
 2.8 Cone boundary
 2.9 Positive semidefinite (PSD) cone
 2.10 Conic independence (c.i.)
 2.11 When extreme means exposed
 2.12 Convex polyhedra
 2.13 Dual cone & generalized inequality

3 Geometry of convex functions
 3.1 Convex function
 3.2 Practical norm functions, absolute value
 3.3 Inverted functions and roots
 3.4 Affine function
 3.5 Epigraph, Sublevel set
 3.6 Gradient
 3.7 Convex matrix-valued function
 3.8 Quasiconvex
 3.9 Salient properties

4 Semidefinite programming
 4.1 Conic problem
 4.2 Framework
 4.3 Rank reduction
 4.4 Rank-constrained semidefinite program
 4.5 Constraining cardinality
 4.6 Cardinality and rank constraint examples
 4.7 Constraining rank of indefinite matrices
 4.8 Convex Iteration rank-1

5 Euclidean Distance Matrix
 5.1 EDM
 5.2 First metric properties
 5.3 ∃ fifth Euclidean metric property
 5.4 EDM definition
 5.5 Invariance
 5.6 Injectivity of D & unique reconstruction
 5.7 Embedding in affine hull
 5.8 Euclidean metric versus matrix criteria
 5.9 Bridge: Convex polyhedra to EDMs
 5.10 EDM-entry composition
 5.11 EDM indefiniteness
 5.12 List reconstruction
 5.13 Reconstruction examples
 5.14 Fifth property of Euclidean metric

6 Cone of distance matrices
 6.1 Defining EDM cone
 6.2 Polyhedral bounds
 6.3 √EDM cone is not convex
 6.4 EDM definition in 11^T
 6.5 Correspondence to PSD cone S_+^{N−1}
 6.6 Vectorization & projection interpretation
 6.7 A geometry of completion
 6.8 Dual EDM cone
 6.9 Theorem of the alternative
 6.10 Postscript

7 Proximity problems
 7.1 First prevalent problem
 7.2 Second prevalent problem
 7.3 Third prevalent problem
 7.4 Conclusion

A Linear algebra
 A.1 Main-diagonal δ operator, λ, tr, vec
 A.2 Semidefiniteness: domain of test
 A.3 Proper statements
 A.4 Schur complement
 A.5 Eigenvalue decomposition
 A.6 Singular value decomposition, SVD
 A.7 Zeros

B Simple matrices
 B.1 Rank-one matrix (dyad)
 B.2 Doublet
 B.3 Elementary matrix
 B.4 Auxiliary V-matrices
 B.5 Orthomatrices

C Some analytical optimal results
 C.1 Properties of infima
 C.2 Trace, singular and eigen values
 C.3 Orthogonal Procrustes problem
 C.4 Two-sided orthogonal Procrustes
 C.5 Nonconvex quadratics

D Matrix calculus
 D.1 Directional derivative, Taylor series
 D.2 Tables of gradients and derivatives

E Projection
 E.1 Idempotent matrices
 E.2 I − P, Projection on algebraic complement
 E.3 Symmetric idempotent matrices
 E.4 Algebra of projection on affine subsets
 E.5 Projection examples
 E.6 Vectorization interpretation
 E.7 Projection on matrix subspaces
 E.8 Range/Rowspace interpretation
 E.9 Projection on convex set
 E.10 Alternating projection

F Notation, Definitions, Glossary

Bibliography

Index

List of Figures
1 Overview
 1 Orion nebula
 2 Application of trilateration is localization
 3 Molecular conformation
 4 Face recognition
 5 Swiss roll
 6 USA map reconstruction
 7 Honeycomb
 8 Robotic vehicles
 9 Reconstruction of David
 10 David by distance geometry

2 Convex geometry
 11 Slab
 12 Open, closed, convex sets
 13 Intersection of line with boundary
 14 Tangentials
 15 Inverse image
 16 Inverse image under linear map
 17 Tesseract
 18 Linear injective mapping of Euclidean body
 19 Linear noninjective mapping of Euclidean body
 20 Convex hull of a random list of points
 21 Hulls
 22 Two Fantopes
 23 Convex hull of rank-1 matrices
 24 A simplicial cone
 25 Hyperplane illustrated ∂H is a partially bounding line
 26 Hyperplanes in R^2
 27 Affine independence
 28 {z ∈ C | a^T z = κ_i}
 29 Hyperplane supporting closed set
 30 Minimizing hyperplane over affine set in nonnegative orthant
 31 Maximizing hyperplane over convex set
 32 Closed convex set illustrating exposed and extreme points
 33 Two-dimensional nonconvex cone
 34 Nonconvex cone made from lines
 35 Nonconvex cone is convex cone boundary
 36 Union of convex cones is nonconvex cone
 37 Truncated nonconvex cone X
 38 Cone exterior is convex cone
 39 Not a cone
 40 Minimum element, Minimal element
 41 K is a pointed polyhedral cone not full-dimensional
 42 Exposed and extreme directions
 43 Positive semidefinite cone
 44 Convex Schur-form set
 45 Projection of truncated PSD cone
 46 Circular cone showing axis of revolution
 47 Circular section
 48 Polyhedral inscription
 49 Conically (in)dependent vectors
 50 Pointed six-faceted polyhedral cone and its dual
 51 Minimal set of generators for halfspace about origin
 52 Venn diagram for cones and polyhedra
 53 Simplex
 54 Two views of a simplicial cone and its dual
 55 Two equivalent constructions of dual cone
 56 Dual polyhedral cone construction by right angle
 57 K is a halfspace about the origin
 58 Iconic primal and dual objective functions
 59 Orthogonal cones
 60 Blades K and K*
 61 Membership w.r.t K and orthant
 62 Shrouded polyhedral cone
 63 Simplicial cone K in R^2 and its dual
 64 Monotone nonnegative cone K_M+ and its dual
 65 Monotone cone K_M and its dual
 66 Two views of monotone cone K_M and its dual
 67 First-order optimality condition

3 Geometry of convex functions
 68 Convex functions having unique minimizer
 69 Minimum/Minimal element, dual cone characterization
 70 1-norm ball B1 from compressed sensing/compressive sampling
 71 Cardinality minimization, signed versus unsigned variable
 72 1-norm variants
 73 Affine function
 74 Supremum of affine functions
 75 Epigraph
 76 Gradient in R^2 evaluated on grid
 77 Quadratic function convexity in terms of its gradient
 78 Contour plot of convex real function at selected levels
 79 Iconic quasiconvex function

4 Semidefinite programming
 80 Venn diagram of convex program classes
 81 Visualizing positive semidefinite cone in high dimension
 82 Primal/Dual transformations
 83 Projection versus convex iteration
 84 Trace heuristic
 85 Sensor-network localization
 86 2-lattice of sensors and anchors for localization example
 87 3-lattice of sensors and anchors for localization example
 88 4-lattice of sensors and anchors for localization example
 89 5-lattice of sensors and anchors for localization example
 90 Uncertainty ellipsoids orientation and eccentricity
 91 2-lattice localization solution
 92 3-lattice localization solution
 93 4-lattice localization solution
 94 5-lattice localization solution
 95 10-lattice localization solution
 96 100 randomized noiseless sensor localization
 97 100 randomized sensors localization
 98 Regularization curve for convex iteration
 99 1-norm heuristic
 100 Sparse sampling theorem
 101 Signal dropout
 102 Signal dropout reconstruction
 103 Simplex with intersecting line problem in compressed sensing
 104 Geometric interpretations of sparse-sampling constraints
 105 Permutation matrix column-norm and column-sum constraint
 106 max cut problem
 107 Shepp-Logan phantom
 108 MRI radial sampling pattern in Fourier domain
 109 Aliased phantom
 110 Neighboring-pixel stencil on Cartesian grid
 111 Differentiable almost everywhere
 112 Eternity II
 113 Eternity II game-board grid
 114 Eternity II demo-game piece illustrating edge-color ordering
 115 Eternity II vectorized demo-game-board piece descriptions
 116 Eternity II difference and boundary parameter construction
 117 Sparsity pattern for composite Eternity II variable matrix
 118 MIT logo
 119 One-pixel camera
 120 One-pixel camera - compression estimates

5 Euclidean Distance Matrix
 121 Convex hull of three points
 122 Complete dimensionless EDM graph
 123 Fifth Euclidean metric property
 124 Fermat point
 125 Arbitrary hexagon in R^3
 126 Kissing number
 127 Trilateration
 128 This EDM graph provides unique isometric reconstruction
 129 Two sensors • and three anchors ◦
 130 Two discrete linear trajectories of sensors
 131 Coverage in cellular telephone network
 132 Contours of equal signal power
 133 Depiction of molecular conformation
 134 Square diamond
 135 Orthogonal complements in S^N abstractly oriented
 136 Elliptope E^3
 137 Elliptope E^2 interior to S_+^2
 138 Smallest eigenvalue of −V_N^T D V_N
 139 Some entrywise EDM compositions
 140 Map of United States of America
 141 Largest ten eigenvalues of −V_N^T O V_N
 142 Relative-angle inequality tetrahedron
 143 Nonsimplicial pyramid in R^3

6 Cone of distance matrices
 144 Relative boundary of cone of Euclidean distance matrices
 145 Example of V_X selection to make an EDM
 146 Vector V_X spirals
 147 Three views of translated negated elliptope
 148 Halfline T on PSD cone boundary
 149 Vectorization and projection interpretation example
 150 Intersection of EDM cone with hyperplane
 151 Neighborhood graph
 152 Trefoil knot untied
 153 Trefoil ribbon
 154 Orthogonal complement of geometric center subspace
 155 EDM cone construction by flipping PSD cone
 156 Decomposing a member of polar EDM cone
 157 Ordinary dual EDM cone projected on S_h^3

7 Proximity problems
 158 Pseudo-Venn diagram
 159 Elbow placed in path of projection
 160 Convex envelope

A Linear algebra
 161 Geometrical interpretation of full SVD

B Simple matrices
 162 Four fundamental subspaces of any dyad
 163 Four fundamental subspaces of a doublet
 164 Four fundamental subspaces of elementary matrix
 165 Gimbal

D Matrix calculus
 166 Convex quadratic bowl in R^2 × R

E Projection
 167 Action of pseudoinverse
 168 Nonorthogonal projection of x ∈ R^3 on R(U) = R^2
 169 Biorthogonal expansion of point x ∈ aff K
 170 Dual interpretation of projection on convex set
 171 Projection on orthogonal complement
 172 Projection on dual cone
 173 Projection product on convex set in subspace
 174 von Neumann-style projection of point b
 175 Alternating projection on two halfspaces
 176 Distance, optimization, feasibility
 177 Alternating projection on nonnegative orthant and hyperplane
 178 Geometric convergence of iterates in norm
 179 Distance between PSD cone and iterate in A
 180 Dykstra's alternating projection algorithm
 181 Polyhedral normal cones
 182 Normal cone to elliptope
 183 Normal-cone progression

List of Tables

2 Convex geometry
 Table 2.9.2.3.1, rank versus dimension of S_+^3 faces
 Table 2.10.0.0.1, maximum number of c.i. directions
 Cone Table 1
 Cone Table S
 Cone Table A
 Cone Table 1*

4 Semidefinite programming
 Faces of S_+^3 corresponding to faces of S_+^3

5 Euclidean Distance Matrix
 Précis 5.7.2, affine dimension in terms of rank

B Simple matrices
 Auxiliary V-matrix Table B.4.4

D Matrix calculus
 Table D.2.1, algebraic gradients and derivatives
 Table D.2.2, trace Kronecker gradients
 Table D.2.3, trace gradients and derivatives
 Table D.2.4, logarithmic determinant gradients, derivatives
 Table D.2.5, determinant gradients and derivatives
 Table D.2.6, logarithmic derivatives
 Table D.2.7, exponential gradients and derivatives


Chapter 1

Overview

Convex Optimization
Euclidean Distance Geometry

People are so afraid of convex analysis.
−Claude Lemaréchal, 2003

In layman's terms, the mathematical science of Optimization is the study of how to make a good choice when confronted with conflicting requirements. The qualifier convex means: when an optimal solution is found, then it is guaranteed to be a best solution; there is no better choice. Any convex optimization problem has geometric interpretation. If a given optimization problem can be transformed to a convex equivalent, then this interpretive benefit is acquired. That is a powerful attraction: the ability to visualize geometry of an optimization problem. Conversely, recent advances in geometry and in graph theory hold convex optimization within their proofs' core. [397] [321]

This book is about convex optimization, convex geometry (with particular attention to distance geometry), and nonconvex, combinatorial, and geometrical problems that can be relaxed or transformed into convex problems. A virtual flood of new applications follows by epiphany that many problems, presumed nonconvex, can be so transformed. [10] [11] [59] [93] [150] [152] [279] [300] [308] [364] [365] [394] [397] [34, §4.3, p.316-322]

Figure 1: Orion nebula. (astrophotography by Massimo Robberto)

Euclidean distance geometry is, fundamentally, a determination of point conformation (configuration, relative position or location) by inference from interpoint distance information. By inference we mean: e.g., given only distance information, determine whether there corresponds a realizable conformation of points; a list of points in some dimension that attains the given interpoint distances. Each point may represent simply location or, abstractly, any entity expressible as a vector in finite-dimensional Euclidean space; e.g., distance geometry of music [108].

It is a common misconception to presume that some desired point conformation cannot be recovered in absence of complete interpoint distance information. We might, for example, want to realize a constellation given only interstellar distance (or, equivalently, parsecs from our Sun and relative angular measurement; the Sun as vertex to two distant stars); called stellar cartography, an application evoked by Figure 1. At first it may seem O(N^2) data is required, yet there are many circumstances where this can be reduced to O(N).

Figure 2: Application of trilateration (§5.4.2.3.5) is localization (determining position) of a radio signal source in 2 dimensions; more commonly known by radio engineers as the process "triangulation". In this scenario, anchors x̌2, x̌3, x̌4 are illustrated as fixed antennae. [210] The radio signal source (a sensor • x1) anywhere in affine hull of three antenna bases can be uniquely localized by measuring distance to each (dashed white arrowed line segments). Ambiguity of lone distance measurement to sensor is represented by circle about each antenna.

If we agree that a set of points may have a shape (three points can form a triangle and its interior, for example; four points a tetrahedron), then we can ascribe shape of a set of points to their convex hull. It should be apparent: from distance, these shapes can be determined only to within a rigid transformation (rotation, reflection, translation).

Absolute position information is generally lost, given only distance information, but we can determine the smallest possible dimension in which an unknown list of points can exist; that attribute is their affine dimension (a triangle in any ambient space has affine dimension 2, for example). In circumstances where stationary reference points are also provided, it becomes possible to determine absolute position or location; e.g., Figure 2. Trilateration is expressible as a semidefinite program; hence, a convex optimization problem. [322]
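The text casts trilateration as a semidefinite program (§5.4.2.3.5). For intuition only, here is a minimal Matlab sketch of the elementary noiseless planar case, where subtracting squared-distance equations linearizes the problem; the anchor positions and sensor location below are hypothetical choices, and this linearization is not the book's SDP formulation:

% sketch: noiseless planar trilateration from three anchors
A = [0 0; 4 0; 1 3]';                 % anchor (antenna) positions, columns
x = [2; 1];                           % true sensor location, to be recovered
d = sqrt(sum((A - x).^2))';           % measured distances to each anchor
M = 2*(A(:,2:3) - A(:,1)*[1 1])';     % subtract first squared-distance equation
b = d(1)^2 - d(2:3).^2 + sum(A(:,2:3).^2)' - sum(A(:,1).^2);
xhat = M\b                            % unique solution: xhat equals x

Because the sensor lies in the affine hull of the three anchor bases, the 2×2 system is nonsingular and the recovery is exact, mirroring the uniqueness claim in Figure 2's caption.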

Figure 3: [189] [117] Distance data collected via nuclear magnetic resonance (NMR) helped render this 3-dimensional depiction of a protein molecule.

At the beginning of the 1980s, Kurt Wüthrich [Nobel laureate] developed an idea about how NMR could be extended to cover biological molecules such as proteins. He invented a systematic method of pairing each NMR signal with the right hydrogen nucleus (proton) in the macromolecule. The method is called sequential assignment and is today a cornerstone of all NMR structural investigations. He also showed how it was subsequently possible to determine pairwise distances between a large number of hydrogen nuclei and use this information with a mathematical method based on distance-geometry to calculate a three-dimensional structure for the molecule.
−[283] [381] [184]

Geometric problems involving distance between points can sometimes be reduced to convex optimization problems. Mathematics of this combined study of geometry and optimization is rich and deep. Its application has already proven invaluable discerning organic molecular conformation by measuring interatomic distance along covalent bonds; e.g., Figure 3. Many disciplines have already benefitted and simplified consequent to this theory; e.g., distance-based pattern recognition (Figure 4), localization in wireless sensor networks by measurement of intersensor distance along channels of communication, [47] [392] [45] wireless location of a radio-signal source such as a cell phone by multiple measurements of signal strength, the global positioning system (GPS), and multidimensional scaling which is a numerical representation of qualitative data by finding a low-dimensional scale. [88] [354] [140] [46]

Figure 4: This coarsely discretized triangulated algorithmically flattened human face (made by Kimmel & the Bronsteins [227]) represents a stage in machine recognition of human identity; called face recognition. Distance geometry is applied to determine discriminating-features.

Euclidean distance geometry together with convex optimization have also found application in artificial intelligence:

- to machine learning by discerning naturally occurring manifolds in:
  – Euclidean bodies (Figure 5, §6.7.0.0.1),
  – Fourier spectra of kindred utterances [213],
  – and photographic image sequences [375],

- to robotics; e.g., automatic control of vehicles maneuvering in formation. (Figure 8)

by chapter

We study pervasive convex Euclidean bodies, their many manifestations and representations. In particular, we make convex polyhedra, cones, and dual cones visceral through illustration in chapter 2 Convex geometry where geometric relation of polyhedral cones to nonorthogonal bases (biorthogonal expansion) is studied in depth.

Figure 5: Swiss roll from Weinberger & Saul [375]. The problem of manifold learning, illustrated for N = 800 data points sampled from a "Swiss roll" (1). A discretized manifold is revealed by connecting each data point and its k = 6 nearest neighbors (2). An unsupervised learning algorithm unfolds the Swiss roll while preserving the local geometry of nearby data points (3). Finally, the data points are projected onto the two-dimensional subspace that maximizes their variance, yielding a faithful embedding of the original manifold (4).

We interpret, for example, inverse image of the positive semidefinite cone under affine transformation. (Example 2.9.1.0.2) Subsets of the positive semidefinite cone, discriminated by rank exceeding some lower bound, are convex. In other words, high-rank subsets of the positive semidefinite cone boundary united with its interior are convex. (Theorem 2.9.2.9.3) There is a closed form for projection on those convex subsets.

The conic analogue to linear independence, called conic independence, is introduced as a tool for study, analysis, and manipulation of cones; a natural extension and next logical step in progression: linear, affine, conic. It is shown that coordinates are unique in any conic system whose basis cardinality equals or exceeds spatial dimension; for high cardinality, a new definition of conic coordinate is provided in Theorem 2.13.12. The concepts of face, extreme point, and extreme direction of a convex Euclidean body are explained here; crucial to understanding convex optimization. How to find the smallest face of any pointed closed convex cone containing convex set C , for example, is divulged; later shown to have practical application to presolving convex programs. The positive semidefinite cone is a circular cone in low dimension, while Gershgorin discs specify inscription of a polyhedral cone into that positive semidefinite cone. (Figure 48)

We explain conversion between halfspace- and vertex-description of a convex cone; we motivate the dual cone and provide formulae for finding it, and we show how first-order optimality conditions or alternative systems of linear inequality or linear matrix inequality can be explained by dual generalized inequalities with respect to convex cones. Arcane theorems of alternative generalized inequality are, in fact, simply derived from cone membership relations; generalizations of algebraic Farkas' lemma translated to geometry of convex cones. Any convex optimization problem can be visualized geometrically. Desire to visualize in high dimension [Sagan, Cosmos −The Edge of Forever, 22:55′] is deeply embedded in the mathematical psyche. [1] Chapter 2 provides tools to make visualization easier, and we teach how to visualize in high dimension.

Figure 6: About five thousand points along borders constituting United States were used to create an exhaustive matrix of interpoint distance for each and every pair of points in an ordered set (a list); panels: original, reconstruction. From that noiseless distance information, it is easy to reconstruct the map exactly via the Schoenberg criterion (941). (§5.13.1.0.1, confer Figure 140) Map reconstruction is exact (to within a rigid transformation) given any number of interpoint distances; the greater the number of distances, the greater the detail (as it is for all conventional map preparation).

Chapter 3 Geometry of convex functions observes Fenchel's analogy between convex sets and functions: We explain, for example, how the real affine function relates to convex functions as the hyperplane relates to convex sets. Partly a toolbox of practical useful convex functions and a cookbook for optimization problems, methods are drawn from the appendices about matrix calculus for determining convexity and discerning geometry.

Semidefinite programming is reviewed in chapter 4 with particular attention to optimality conditions for prototypical primal and dual problems, their interplay, and a perturbation method for rank reduction of optimal solutions (extant but not well known). Positive definite Farkas' lemma is derived, and we also show how to determine if a feasible set belongs exclusively to a positive semidefinite cone boundary.

An arguably good three-dimensional polyhedral analogue to the positive semidefinite cone of 3 × 3 symmetric matrices is introduced: a new tool for visualizing coexistence of low- and high-rank optimal solutions in six isomorphic dimensions and a mnemonic aid for understanding semidefinite programs. We show how to handle polynomial constraints, and how to transform a rank-constrained problem to a rank-1 problem.

We introduce a method of convex iteration for constraining rank in the form rank G ≤ ρ and cardinality in the form card x ≤ k. Cardinality minimization is applied to a discrete image-gradient of the Shepp-Logan phantom, from Magnetic Resonance Imaging (MRI) in the field of medical imaging, for which we find a new lower bound of 1.9% cardinality. We find a minimal cardinality Boolean solution to an instance of Ax = b :

    minimize_x   ‖x‖_0
    subject to   Ax = b
                 x_i ∈ {0, 1} ,  i = 1 ... n        (679)

The sensor-network localization problem is solved in any dimension in this chapter. We explain why trilateration (a.k.a. localization) is a convex optimization problem. We show how to recover relative position given incomplete interpoint distance information, and how to pose EDM problems or transform geometrical problems to convex optimizations; e.g., kissing number of packed spheres about a central sphere (solved in R^3 by Isaac Newton).

The EDM is studied in chapter 5 Euclidean Distance Matrix, its properties and relationship to both positive semidefinite and Gram matrices. We relate the EDM to the four classical properties of Euclidean metric; thereby, observing existence of an infinity of properties of the Euclidean metric beyond triangle inequality. We proceed by deriving the fifth Euclidean metric property and then explain why furthering this endeavor is inefficient because the ensuing criteria (while describing polyhedra in angle or area, volume, content, and so on ad infinitum) grow linearly in complexity and number with problem size. Reconstruction methods are explained and applied to a map of the United States; e.g., Figure 6. We also experimentally test a conjecture of Borg & Groenen by reconstructing a distorted but recognizable isotonic map of the USA using only ordinal (comparative) distance data: Figure 140e-f. We demonstrate an elegant method for including dihedral (or torsion) angle constraints into a molecular conformation problem.
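Returning to problem (679) above: a minimal sketch of its standard 1-norm relaxation, a box-constrained linear program, assuming Matlab's Optimization Toolbox linprog and an arbitrarily chosen instance. This is only the convex first step; it does not always recover a Boolean solution, and the book's convex iteration refines such relaxations with a sequence of weighted problems:

% sketch: 1-norm (boxed LP) relaxation of Boolean problem (679)
m = 3;  n = 8;
A = randn(m,n);                               % arbitrary instance
xtrue = [1; 1; zeros(n-2,1)];                 % a sparse Boolean solution
b = A*xtrue;
f = ones(n,1);                                % on 0 <= x <= 1, 1'*x is the 1-norm
x = linprog(f, [], [], A, b, zeros(n,1), ones(n,1));
nnz(round(x))                                 % cardinality of the rounded solution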

Figure 7: These bees construct a honeycomb by solving a convex optimization problem (§5.4.2.3.4). The most dense packing of identical spheres about a central sphere in 2 dimensions is 6. Sphere centers describe a regular lattice.

The set of all Euclidean distance matrices forms a pointed closed convex cone called the EDM cone: EDM^N. We offer a new proof of Schoenberg's seminal characterization of EDMs:

    D ∈ EDM^N  ⇔  −V_N^T D V_N ⪰ 0 ,  D ∈ S_h^N        (941)

Our proof relies on fundamental geometry; assuming, any EDM must correspond to a list of points contained in some polyhedron (possibly at its vertices) and vice versa. It is known, but not obvious, this Schoenberg criterion implies nonnegativity of the EDM entries; proved herein.

We characterize the eigenvalue spectrum of an EDM, and then devise a polyhedral spectral cone for determining membership of a given matrix (in Cayley-Menger form) to the convex cone of Euclidean distance matrices; id est, a matrix is an EDM if and only if its nonincreasingly ordered vector of eigenvalues belongs to a polyhedral spectral cone for EDM^N :

    D ∈ EDM^N  ⇔  λ([0 1^T ; 1 −D]) ∈ (R_+^N × R_−) ∩ ∂H ,  D ∈ S_h^N        (1157)

We will see: spectral cones are not unique.
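A minimal Matlab sketch of the Schoenberg criterion (941) and of the list reconstruction it licenses; the point list and tolerance are arbitrary choices, and the 1/√2 scaling that the book attaches to V_N is omitted since it does not affect semidefiniteness:

% sketch: test criterion (941), then recover a list from a noiseless D
N = 5;  X = randn(2,N);                       % arbitrary list of N points in R^2
G = X'*X;                                     % Gram matrix of the list
D = diag(G)*ones(1,N) + ones(N,1)*diag(G)' - 2*G;   % its EDM
VN = [-ones(1,N-1); eye(N-1)];                % V_N (scaling omitted)
isEDM = all(diag(D) == 0) && min(eig(-VN'*D*VN)) > -1e-9   % criterion (941)
V = eye(N) - ones(N)/N;                       % geometric centering matrix
M = -V*D*V/2;                                 % Gram matrix of centered list
[Q,L] = eig((M+M')/2);
[lam,i] = sort(diag(L),'descend');
r = sum(lam > 1e-9);                          % affine dimension (here 2)
Xr = diag(sqrt(lam(1:r)))*Q(:,i(1:r))';       % list recovered to within a rigid transformation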

Figure 8: Robotic vehicles in concert can move larger objects, guard a perimeter, perform surveillance, find land mines, or localize a plume of gas, liquid, or radio waves.

In chapter 6 Cone of distance matrices we explain a geometric relationship between the cone of Euclidean distance matrices, two positive semidefinite cones, and the elliptope. The faces of the EDM cone are described, but still open is the question whether all its faces are exposed as they are for the positive semidefinite cone. [139] The Schoenberg criterion

    D ∈ EDM^N  ⇔  −V_N^T D V_N ∈ S_+^{N−1} ,  D ∈ S_h^N        (941)

for identifying a Euclidean distance matrix is revealed to be a discretized membership relation (dual generalized inequalities, a new Farkas'-like lemma) between the EDM cone and its ordinary dual: EDM^N*. We illustrate geometric requirements, in particular, for projection of a given matrix on a positive semidefinite cone that establish its membership to the EDM cone. A matrix criterion for membership to the dual EDM cone is derived that is simpler than the Schoenberg criterion:

    D* ∈ EDM^N*  ⇔  δ(D*1) − D* ⪰ 0        (1307)

There is a concise equality, relating the convex cone of Euclidean distance matrices to the positive semidefinite cone, apparently overlooked in the literature; an equality between two large convex Euclidean bodies:

    EDM^N = S_h^N ∩ (S_c^{N⊥} − S_+^N)        (1301)

In chapter 7 Proximity problems we explore methods of solution to a few fundamental and prevalent Euclidean distance matrix proximity problems; the problem of finding that distance matrix closest, in some sense, to a given matrix H :

    minimize_D  ‖−V(D − H)V‖_F^2                minimize_◦√D  ‖◦√D − H‖_F^2
    subject to  rank V D V ≤ ρ                  subject to    rank V D V ≤ ρ
                D ∈ EDM^N                                     ◦√D ∈ EDM^N
                                                                        (1343)
    minimize_D  ‖D − H‖_F^2                     minimize_◦√D  ‖−V(◦√D − H)V‖_F^2
    subject to  rank V D V ≤ ρ                  subject to    rank V D V ≤ ρ
                D ∈ EDM^N                                     ◦√D ∈ EDM^N

We explain how this problem is transformed to a convex optimization for any rank ρ. We apply a convex iteration method for constraining rank. Known heuristics for rank minimization are also explained. We offer new geometrical proof, of a famous discovery by Eckart & Young in 1936 [130], with particular regard to Euclidean projection of a point on that generally nonconvex subset of the positive semidefinite cone boundary comprising all semidefinite matrices having rank not exceeding a prescribed bound ρ.
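The Eckart & Young discovery [130] referenced above is, in modern terms, the truncated-SVD solution to nearest rank-ρ approximation in Frobenius norm; a minimal numeric sketch with arbitrary data:

% sketch: nearest rank-rho matrix in Frobenius norm (Eckart & Young, 1936)
H = randn(6);  rho = 2;                       % arbitrary data and rank bound
[U,S,V] = svd(H);
S(rho+1:end, rho+1:end) = 0;                  % keep only rho largest singular values
Hr = U*S*V';                                  % minimizes ||H - B||_F over rank B <= rho
[rank(Hr)  norm(H - Hr,'fro')]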

appendices

We presume a reader already comfortable with elementary vector operations; [13, §3] formally known as analytic geometry. [383] Sundry toolboxes are provided, in the form of appendices, so as to be more self-contained:

- linear algebra (appendix A is primarily concerned with proper statements of semidefiniteness for square matrices),

- simple matrices (dyad, doublet, elementary, Householder, Schoenberg, orthogonal, etcetera, in appendix B),

- collection of known analytical solutions to some important optimization problems (appendix C),

- matrix calculus remains somewhat unsystematized when compared to ordinary calculus (appendix D concerns matrix-valued functions, matrix differentiation and directional derivatives, Taylor series, and tables of first- and second-order gradients and matrix derivatives),

- an elaborate exposition offering insight into orthogonal and nonorthogonal projection on convex sets (the connection between projection and positive semidefiniteness, for example, or between projection and a linear objective function in appendix E),

- Matlab code on Wıκımization to discriminate EDMs, to determine conic independence, to reduce or constrain rank of an optimal solution to a semidefinite program, compressed sensing (compressive sampling) for digital image and audio signal processing, and two distinct methods of reconstructing a map of the United States: one given only distance data, the other given only comparative distance data.

Figure 9: Three-dimensional reconstruction of David from distance data.

Figure 10: Digital Michelangelo Project, Stanford University. Measuring distance to David by laser rangefinder. Spatial resolution is 0.29mm.

Chapter 2

Convex geometry

Convexity has an immensely rich structure and numerous applications. On the other hand, almost every "convex" idea can be explained by a two-dimensional picture.
−Alexander Barvinok [26, p.vii]

We study convex geometry because it is the easiest of geometries. For that reason, much of a practitioner's energy is expended seeking invertible transformation of problematic sets to convex ones.

As convex geometry and linear algebra are inextricably bonded by asymmetry (linear inequality), we provide much background material on linear algebra (especially in the appendices) although a reader is assumed comfortable with [333] [335] [203] or any other intermediate-level text. The essential references to convex analysis are [200] [309]. The reader is referred to [332] [26] [374] [42] [61] [306] [361] for a comprehensive treatment of convexity. There is relatively less published pertaining to convex matrix-valued functions. [216] [204, §6.6] [296]

2.1 Convex set

A set C is convex iff for all Y, Z ∈ C and 0 ≤ µ ≤ 1

    µ Y + (1 − µ)Z ∈ C        (1)

Under that defining condition on µ, the linear sum in (1) is called a convex combination of Y and Z. If Y and Z are points in real finite-dimensional Euclidean vector space [228] [383] R^n or R^{m×n} (matrices), then (1) represents the closed line segment joining them. Line segments are thereby convex sets; C is convex iff the line segment connecting any two points in C is itself in C. Apparent from this definition: a convex set is a connected set. [259, §3.4, §3.5] [42, p.2] A convex set can, but does not necessarily, contain the origin 0.

An ellipsoid centered at x = a (Figure 13), given matrix C ∈ R^{m×n}

    {x ∈ R^n | ‖C(x − a)‖_2 = √((x − a)^T C^T C (x − a)) ≤ 1}        (2)

is a good icon for a convex set.^2.1

2.1.1 subspace

A nonempty subset R of real Euclidean vector space R^n is called a subspace (§2.5) if every vector^2.2 of the form αx + βy, for α, β ∈ R, is in R whenever vectors x and y are. [251, §2.3] A subspace is a convex set containing the origin, by definition. [309, p.4] Any subspace is therefore open in the sense that it contains no boundary, but closed in the sense [259, §2]

    R + R = R        (3)

It is not difficult to show

    R = −R        (4)

as is true for any subspace R, because x ∈ R ⇔ −x ∈ R. Given any x ∈ R

    x + R = R        (5)

The intersection of an arbitrary collection of subspaces remains a subspace. Any subspace not constituting the entire ambient vector space R^n is a proper subspace; e.g.,^2.3 any line through the origin in two-dimensional Euclidean space R^2. The vector space R^n is itself a conventional subspace, inclusively, [228, §2.1] although not proper.

^2.1 This particular definition is slablike (Figure 11) in R^n when C has nontrivial nullspace.
^2.2 A vector is assumed, throughout, to be a column vector.
^2.3 We substitute abbreviation e.g. in place of the Latin exempli gratia.

2.1.2 linear independence

Arbitrary given vectors in Euclidean space {Γ_i ∈ R^n, i = 1 ... N} are linearly independent (l.i.) if and only if, for all ζ ∈ R^N (ζ_i ∈ R)

    Γ_1 ζ_1 + · · · + Γ_{N−1} ζ_{N−1} − Γ_N ζ_N = 0        (6)

has only the trivial solution ζ = 0; in other words, iff no vector from the given set can be expressed as a linear combination of those remaining. Geometrically, two nontrivial vector subspaces are linearly independent iff they intersect only at the origin.

2.1.2.1 preservation of linear independence

(confer §2.4.2.4, §2.10.1) Linear transformation preserves linear dependence. [228, p.86] Conversely, linear independence can be preserved under linear transformation. Given Y = [y_1 y_2 · · · y_N] ∈ R^{N×N}, consider the mapping

    T(Γ) : R^{n×N} → R^{n×N} ≜ Γ Y        (7)

whose domain is the set of all matrices Γ ∈ R^{n×N} holding a linearly independent set columnar. Linear independence of {Γ y_i ∈ R^n, i = 1 ... N} demands, by definition, there exist no nontrivial solution ζ ∈ R^N to

    Γy_1 ζ_1 + · · · + Γy_{N−1} ζ_{N−1} − Γy_N ζ_N = 0        (8)

By factoring out Γ, we see that triviality is ensured by linear independence of {y_i ∈ R^N}.

2.1.3 Orthant:

name given to a closed convex set that is the higher-dimensional generalization of quadrant from the classical Cartesian partition of R^2; a Cartesian cone. The most common is the nonnegative orthant R_+^n or R_+^{n×n} (analogue to quadrant I) to which membership denotes nonnegative vector- or matrix-entries respectively;

    {x ∈ R^n | x_i ≥ 0 ∀ i}        (9)

The nonpositive orthant R_−^n or R_−^{n×n} (analogue to quadrant III) denotes negative and 0 entries. Orthant convexity^2.4 is easily verified by definition (1). All orthants are selfdual simplicial cones. (§2.13.5.1)
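A minimal numeric check of definition (1) against the ellipsoid icon (2); the matrix C, center a, and test points below are arbitrary choices:

% sketch: convex combination (1) of two points in ellipsoid icon (2)
C = [2 0; 0 1];  a = [0; 0];                 % ellipsoid {x | ||C(x-a)|| <= 1}
inE = @(x) norm(C*(x - a)) <= 1;             % membership test for set (2)
Y = [0.3; 0.5];  Z = [-0.4; 0.2];            % two points in the ellipsoid
mu = rand;                                   % any 0 <= mu <= 1
[inE(Y)  inE(Z)  inE(mu*Y + (1-mu)*Z)]       % combination (1) stays inside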

Figure 11: A slab is a convex Euclidean body infinite in extent but not affine. Illustrated in R^2, it may be constructed by intersecting two opposing halfspaces whose bounding hyperplanes are parallel but not coincident. Because number of halfspaces used in its construction is finite, slab is a polyhedron (§2.12). (Cartesian axes + and vector inward-normal, to each halfspace-boundary, are drawn for reference.)

    {y ∈ R^2 | c ≤ a^T y ≤ b}        (2001)

2.1.4 affine set

A nonempty affine set (from the word affinity) is any subset of R^n that is a translation of some subspace. Any affine set is convex and open so contains no boundary: e.g., empty set ∅, point, line, plane, hyperplane (§2.4.2), subspace, etcetera. The intersection of an arbitrary collection of affine sets remains affine. We analogize affine subset to subspace,^2.5 defining it to be any nonempty affine set of vectors; an affine subset of R^n :

2.1.4.0.1 Definition. Affine subset. For some parallel^2.6 subspace R and any point x ∈ A

    A is affine ⇔ A = x + R = {y | y − x ∈ R}        (10)
                                                        △

Affine hull of a set C ⊆ R^n (§2.3.1) is the smallest affine set containing it.

^2.5 The popular term affine subspace is an oxymoron.
^2.6 Two affine sets are parallel when one is a translation of the other. [309, §1, p.4]

2.1.5 dimension

Dimension of an arbitrary set S is Euclidean dimension of its affine hull; [374, §1]

    dim S ≜ dim aff S = dim aff(S − s) ,  s ∈ S        (11)

the same as dimension of the subspace parallel to that affine set aff S when nonempty. Hence dimension (of a set) is synonymous with affine dimension. [200, §A.2.1]

2.1.6 empty set versus empty interior

Emptiness ∅ of a set is handled differently than interior in the classical literature. It is common for a nonempty convex set to have empty interior; e.g., paper in the real world: An ordinary flat sheet of paper is a nonempty convex set having empty interior in R^3 but nonempty interior relative to its affine hull.

2.1.6.1 relative interior

Although it is always possible to pass to a smaller ambient Euclidean space where a nonempty set acquires an interior [26, §II.2.3], we prefer the qualifier relative which is the conventional fix to this ambiguous terminology.^2.7 So we distinguish interior from relative interior throughout: [332] [374] [361]

Classical interior int C is defined as a union of points: x is an interior point of C ⊆ R^n if there exists an open ball of dimension n and nonzero radius centered at x that is contained in C. Relative interior rel int C of a convex set C ⊆ R^n is interior relative to its affine hull.^2.8

Thus defined, it is common (though confusing) for int C the interior of C to be empty while its relative interior is not: this happens whenever dimension of its affine hull is less than dimension of the ambient space (dim aff C < n; e.g., were C paper) or in the exception when C is a single point; [259, §2.2.1]

    rel int{x} ≜ aff{x} = {x} ,  int{x} = ∅ ,  x ∈ R^n        (12)

In any case, closure of the relative interior of a convex set C always yields closure of the set itself;

    rel int C̄ = C̄        (13)

Closure is invariant to translation. If C is convex then rel int C and C̄ are convex. [200, p.24] If C has nonempty interior, then

    rel int C = int C        (14)

Given the intersection of convex set C with affine set A

    rel int(C ∩ A) = rel int(C) ∩ A  ⇐  rel int(C) ∩ A ≠ ∅        (15)

Because an affine set A is open

    rel int A = A        (16)

^2.7 Superfluous mingling of terms as in relatively nonempty set would be an unfortunate consequence. From the opposite perspective, some authors use the term full or full-dimensional to describe a set having nonempty interior.
^2.8 Likewise for relative boundary (§2.1.7.2), although relative closure is superfluous. [200, §A.2.1]

2.1.7 classical boundary

(confer §2.1.7.2) Boundary of a set C is the closure of C less its interior;

    ∂C = C̄ \ int C        (17)

[55, §1.1] which follows from the fact

    int C̄ = int C        (18)

and presumption of nonempty interior. One implication is: int C = C \ ∂C; a bounded open set has boundary defined but not contained in the set; interior of an open set is equivalent to the set itself; from which an open set is defined: [259, p.109]

    C is open  ⇔  int C = C        (19)
    C is closed ⇔  C̄ = C        (20)

^2.9 Otherwise, for x ∈ R^n as in (12), int{x} = ∅ = ∅̄; the empty set is both open and closed.

The set illustrated in Figure 12b is not open because it is not equivalent to its interior, it is not closed because it does not contain its boundary, and it is not convex because it does not contain all convex combinations of its boundary points.

Figure 12: (a) Closed convex set. (b) Neither open, closed, or convex. Yet PSD cone can remain convex in absence of certain boundary components (§2.9.2.9.3). Nonnegative orthant with origin excluded (§2.6) and positive orthant with origin adjoined [309, p.49] are convex. (c) Open convex set.

2.1.7.1 Line intersection with boundary

A line can intersect the boundary of a convex set in any dimension at a point demarcating the line's entry to the set interior. On one side of that entry-point along the line is the exterior of the set, on the other side is the set interior. In other words, starting from any point of a convex set, a move toward the interior is an immediate entry into the interior. [26, §II.2]

Figure 13: (a) Ellipsoid in R is a line segment whose boundary comprises two points. Intersection of line with ellipsoid in R, (b) in R^2, (c) in R^3. Each ellipsoid illustrated has entire boundary constituted by zero-dimensional faces; in fact, by vertices (§2.6.1.0.1). Intersection of line with boundary is a point at entry to interior. These same facts hold in higher dimension.

When a line intersects the interior of a convex body in any dimension, the boundary appears to the line to be as thin as a point. This is intuitively plausible because, for example, a line intersects the boundary of the ellipsoids in Figure 13 at a point in R, R^2, and R^3. Such thinness is a remarkable fact when pondering visualization of convex polyhedra (§2.12) in four Euclidean dimensions, for example, having boundaries constructed from other three-dimensional convex polyhedra called faces.

We formally define face in (§2.6). For now, we observe the boundary of a convex body to be entirely constituted by all its faces of dimension lower than the body itself. Any face of a convex set is convex. The ellipsoids in Figure 13 have boundaries composed only of zero-dimensional faces. The three-dimensional bounded polyhedron in Figure 20 has zero-, one-, and two-dimensional polygonal faces constituting its boundary.

2.1.7.1.1 Example. Intersection of line with boundary in R^6.
The convex cone of positive semidefinite matrices S_+^3 (§2.9), in the ambient subspace of symmetric matrices S^3 (§2.2.2.0.1), is a six-dimensional Euclidean body in isometrically isomorphic R^6 (§2.2.1). Boundary of the positive semidefinite cone, in this dimension, comprises faces having only the dimensions 0, 1, and 3; id est, {ρ(ρ+1)/2, ρ = 0, 1, 2}.

Unique minimum-distance projection PX (§E.9) of any point X ∈ S^3 on that cone S_+^3 is known in closed form (§7.1.2). Given, for example, λ ∈ int R_+^3 and diagonalization (§A.5.1) of exterior point

    X = Q Λ Q^T ∈ S^3 ,   Λ = diag(λ_1 , λ_2 , −λ_3)        (21)

where Q ∈ R^{3×3} is an orthogonal matrix, then the projection on S_+^3 in R^6 is

    PX = Q diag(λ_1 , λ_2 , 0) Q^T ∈ S_+^3        (22)

This positive semidefinite matrix PX nearest X thus has rank 2, found by discarding all negative eigenvalues in Λ. The line connecting these two points is {X + (PX − X)t | t ∈ R} where t = 0 ⇔ X and t = 1 ⇔ PX. Because this line intersects the boundary of the positive semidefinite cone S_+^3 at point PX and passes through its interior (by assumption), then the matrix corresponding to an infinitesimally positive perturbation of t there should reside interior to the cone (rank 3). Indeed, for ε an arbitrarily small positive constant,

    X + (PX − X)t|_{t=1+ε} = Q(Λ + (PΛ − Λ)(1+ε))Q^T = Q diag(λ_1 , λ_2 , ελ_3) Q^T ∈ int S_+^3        (23)

2.1.7.1.2 Example. Tangential line intersection with boundary.
A higher-dimensional boundary ∂C of a convex Euclidean body C is simply a dimensionally larger set through which a line can pass when it does not intersect the body's interior. Still, a line existing in five or more dimensions may pass tangentially (intersecting no point interior to C [220, §15.3]) through a single point relatively interior to a three-dimensional face on ∂C. Let's understand why by inductive reasoning.

Figure 14a shows a vertical line-segment whose boundary comprises its two endpoints. For a line to pass through its zero-dimensional boundary (one of its vertices) tangentially (intersecting no point relatively interior to the line-segment), it must exist in an ambient space of at least two dimensions. Otherwise, the line is confined to the same one-dimensional space as the line-segment and must pass along the segment to reach the end points.

Figure 14b illustrates a two-dimensional ellipsoid whose boundary is constituted entirely by zero-dimensional faces. Again, a line must exist in at least two dimensions to tangentially pass through any single arbitrarily chosen point on the boundary (without intersecting the ellipsoid interior).

Now let's move to an ambient space of three dimensions. Figure 14c shows a polygon rotated into three dimensions. For a line to pass through its zero-dimensional boundary (one of its vertices) tangentially, it must exist in at least the two dimensions of the polygon. But for a line to pass tangentially through a single arbitrarily chosen point in the relative interior of a one-dimensional face on the boundary as illustrated, it must exist in at least three dimensions.

Figure 14d illustrates a solid circular cone (drawn truncated) whose one-dimensional faces are halflines emanating from its pointed end (vertex). This cone's boundary is constituted solely by those one-dimensional halflines. A line may pass through the boundary tangentially, striking only one arbitrarily chosen point relatively interior to a one-dimensional face, if it exists in at least the three-dimensional ambient space of the cone.

Figure 14: Line tangential: (a) (b) to relative interior of a zero-dimensional face in R^2, (c) (d) to relative interior of a one-dimensional face in R^3.

Now the interesting part, with regard to Figure 20 showing a bounded polyhedron in R^3; call it P : A line existing in at least four dimensions is required in order to pass tangentially (without hitting int P) through a single arbitrary point in the relative interior of any two-dimensional polygonal face on the boundary of polyhedron P. Now imagine that polyhedron P is itself a three-dimensional face of some other polyhedron in R^4. To pass a line tangentially through polyhedron P itself, striking only one point from its relative interior rel int P as claimed, requires a line existing in at least five dimensions.

It is not too difficult to deduce: A line may pass through a single arbitrarily chosen point interior to a k-dimensional convex Euclidean body (hitting no other interior point) if that line exists in dimension at least equal to k+1.^2.10 In layman's terms, this means: a being capable of navigating four spatial dimensions (one Euclidean dimension beyond our physical reality) could see inside three-dimensional objects.

From these few examples, we may deduce a general rule (without proof): A line may pass tangentially through a single arbitrarily chosen point relatively interior to a k-dimensional face on the boundary of a convex Euclidean body if the line exists in dimension at least equal to k+2.

^2.10 This rule can help determine whether there exists unique solution to a convex optimization problem whose feasible set is an intersecting line.

2.1.7.2 Relative boundary

The classical definition of boundary of a set C presumes nonempty interior:

    ∂C = C̄ \ int C        (17)

1. thm. (12) rel ∂{x} = {x}\{x} = ∅ . A. For a finite collection of N sets. defined [200.2.0. [309. while a convex polychoron (a bounded polyhedron in R4 [376]) has boundary constructed from three-dimensional convex polyhedra.2] rel ∂ C C \ rel int C (24) boundary relative to affine hull of C . [200. In converse this theorem is implicitly false insofar as a convex set can be formed by the intersection of sets that are not.3. 2.8 intersection. Unions of convex sets are generally not convex. .5] Intersection of an arbitrary collection of convex sets {Ci } is convex. nontrivial. in R2 from at least three line segments. p. Applying this definition to nonempty convex set C1 .1) in subspace R .1. p. 2. sum.2. in R3 from convex polygons. has boundary constructed from two points. its interior therefore excludes that hyperplane. difference.. for example.1 Theorem. and convex. a necessarily nonempty intersection of relative interior N rel int Ci = rel int N Ci equals relative interior of intersection.12.g. Ci = Ci . its self-difference C1 − C1 is generally nonempty. Intersection. we can similarly define vector difference of two convex sets C1 − C2 = {x − y | x ∈ C1 .0. 2. A halfspace is partially bounded by a hyperplane.8.6. product 2.24] C1 + C2 = {x + y | x ∈ C1 . By additive inverse. CONVEX SET 47 More suitable to study of convex sets is the relative boundary.1.2. In the exception when C is a single point {x} .22] Vector sum of two convex sets C1 and C2 is convex [200. e. y ∈ C2 } (26) but not necessarily closed unless at least one set is closed and bounded.1.0. An affine set has no relative boundary. x ∈ Rn (25) A bounded convex polyhedron ( 2. i=1 i=1 ⋄ And for a possibly infinite collection. y ∈ C2 } (27) which is convex.

Inverse image of a convex set F .4): 2. it therefore follows from (26).1.1 Theorem. CONVEX GEOMETRY for any convex cone K . a Cartesian product is convex iff each set is. Inverse image. id est.23] Convex results are also obtained for scaling κ C of a convex set C . y ∈ C2 } = T (C1 ) + T (C2 ) = {T (x) + T (y) | x ∈ C1 . y ∈ C2 } (30) 2. Let f be a mapping from Rp×k to Rm×n . or translation C + α . it generally holds that inverse image (Figure 15) of a convex function is not. The most prominent examples to the contrary are affine functions ( 3. rotation/reflection Q C .0. we are prone to write T (C) meaning T (C) {T (x) | x ∈ C} (29) T (C1 + C2 ) = {T (x + y) | x ∈ C1 . ( 2.9 inverse image While epigraph ( 3. p.or many-valued mapping.15] Cartesian product of convex sets C1 × C2 = x y | x ∈ C1 . under any affine function f is convex. p. The image of a convex set C under any affine function f (C) = {f (X) | X ∈ C} ⊆ Rm×n (31) is convex. remains convex. y ∈ C2 = C1 C2 (28) Given linear operator T . 3] f −1 (F ) = {X | f (X) ∈ F} ⊆ Rp×k (32) a single. [309. ⋄ . The converse also holds. Given any operator T and convex set C .2) the set K − K constitutes its affine hull.1. [200.48 CHAPTER 2. each similarly defined.9. [309.7.5) of a convex function must be convex.

but there is no converse.1. (b) Inverse image under convex function f .2. A†Ax = xp R(AT ) xp {x} 0 η N (A) R(A) b 0 {b} Rn Rm xp = A† b Axp = b Ax = b x = xp + η Aη = 0 N (AT ) Figure 16:(confer Figure 167) Action of linear map represented by A ∈ Rm×n : Component of vector x in nullspace N (A) maps to origin while component in rowspace R(AT ) maps to range R(A). . CONVEX SET 49 f (C) (a) C f (b) f −1 (F ) f F Figure 15: (a) Image of convex set in domain of any convex function f is convex. For any A ∈ Rm×n . AA†Ax = b ( E) and inverse image of b ∈ R(A) is a nonempty affine set: xp + N (A).

4.11 2 = y yTy . Orthogonal projection of points on affine subsets is reviewed in E. the converse is false.0. CONVEX GEOMETRY In particular.9. [309.50 CHAPTER 2.7. Although not precluded.5. Shadows.4. A counterexample. any affine transformation of an affine set remains affine. 2 y⊥z ⇔ y. Each converse of this two-part theorem is generally false. for example.4. A vector inner-product defines Euclidean norm (vector 2-norm. ⋄ Again. e.1. are umbral projections that can be convex when the body providing the shade is not.g.0. z y Tz = y z cos ψ (33) Euclidean space Rn comes equipped with a vector inner-product where ψ (950) represents angle (in radians) between vectors y and z .2. A nonempty affine set is called an affine subset ( 2.1.11 Orthogonal projection of a convex set on a subspace or nonempty affine set is another convex set.2 Vectorized-matrix inner product y. z = 0 (34) When orthogonal vectors each have unit norm. mechanizes inverse image under a general linear map. a convex image f (C) does not imply that set C is convex.0.0.6]. vector y might represent a hyperplane normal ( 2.1.2. and neither does a convex inverse image f −1 (F ) imply set F is convex. Figure 16.1). We prefer those angle brackets to connote a geometric rather than algebraic perspective.2).9.0. for example.1) y 2. this inverse image theorem does not require a uniquely invertible mapping f . given f affine. y =0 ⇔ y=0 (35) For hyperplane representations see 2. is easy to visualize when the affine function is an orthogonal projector [333] [251]: (1973) [309. A. Example 2.2 offer further applications. . id est. 6.2 and Example 3. then they are orthonormal. Projection on subspace. 3] 2. p.8] Ellipsoids are invariant to any [sic] affine transformation.2 Corollary. 2. For projection of convex sets on hyperplanes see [374. Two vectors are orthogonal (perpendicular ) to one another if and only if their inner product vanishes (iff ψ is an odd multiple of π ).4.. invalidating a converse.

p. 2.2. For example. not necessarily square. for Z ∈ Rp×k . 8] [368. [203.10] y . yk Then the vectorized-matrix inner-product is trace of matrix inner-product. linear bijective vectorization is distributive with respect to Hadamard product. the adjoint operator AT is its transposition.12 tr(Y TZ) = vec(Y )T vec Z (38) tr(Y TZ) = tr(Z Y T ) = tr(Y Z T ) = tr(Z T Y ) = 1T (Y ◦ Z)1 (39) (42) Hadamard product is a simple entrywise product of corresponding entries from two matrices of like size. id est. represented by real matrix A . 3.1] [200. the Hadamard product can be extracted from within a Kronecker product.1) and where ◦ denotes the Hadamard product 2. by first transforming a matrix in Rp×k to a vector in Rpk by concatenating its columns in the natural order. C1 = v . This operator is self-adjoint when A = AT . Vector inner-product for matrices is calculated just as it is for vectors. [61.3.2. and consider any particular dimensionally compatible real vectors v and w .1. The adjoint AT operation on a matrix can therefore be defined in like manner: Y .6. Then vector inner-product of C1 with vwT is vwT . A commutative operation.1] [385. 1. z (36) For linear operation on a vector. for example.2]) transformation vectorization.475] . For lack of a better term. VECTORIZED-MATRIX INNER PRODUCT 51 For linear operator A . 0. its adjoint AT is a linear operator defined by [228.  . we shall call that linear bijective (one-to-one and onto [228.2] Y. vec(Y ◦ Z) = vec(Y ) ◦ vec(Z ) 2. Z (40) Take any element C1 from a matrix-valued set in Rp×k .12 of matrices [160. ATZ AY . 2.A1.Z where ( A.4]. ATz Ay . the vectorization of Y = [ y1 y2 · · · yk ] ∈ Rp×k [167] [329] is   y1  y2   . App.  ∈ Rpk (37) vec Y  . C1 w = v TC1 w = tr(wv T C1 ) = 1T (vwT )◦ C1 1 (41) Further. id est.1.

by replacing v and w with any particular respective matrices U and W of dimension compatible with all elements of convex set C . C must be convex consequent to inverse image theorem 2.13 Instead given convex set Y . Since that holds true for any two elements from Y .9. 2. hence.1 Example. (b) Tesseract is a projection of hypercube in R4 on R3 . then it must be a convex set.13 The two sides are equivalent by linearity of inner product. C2 = vwT . To verify that. C1 + (1 − µ) vwT . the set of vector inner-products Y v TCw = vwT . Then for any particular vectors v ∈ Rp and w ∈ Rk .1. The right-hand side remains a vector inner-product of vwT with an element µ C1 + (1 − µ)C2 from the convex set C . and then form the vector inner-products (43) that are two elements of Y by definition. C ⊆ R (43) is convex. CONVEX GEOMETRY R3 (a) (b) Figure 17: (a) Cube in R3 projected on paper-plane R2 .2. for 0 ≤ µ ≤ 1 µ vwT . Application of inverse image theorem. then set U TCW is convex by the inverse image theorem because it is a linear mapping of C . vwT in (43) may be replaced with any particular matrix Z ∈ Rp×k while convexity of set Z . videlicet.1. Subspace projection operator is not an isomorphism because new adjacencies are introduced. it belongs to Y . Further.0.2. µ C1 + (1 − µ)C2 2. More generally.0. . It is easy to show directly that convex combination of elements from Y remains an element of Y . Now make a convex combination of those inner products. take any two elements C1 and C2 from the convex matrix-valued set C . Suppose set C ⊆ Rp×k were convex. C ⊆ R persists.0.52 R2 CHAPTER 2.

1] Projection ( E) is not an isomorphism. id est. j 2 2 = Y.4] Frobenius’ norm is the Euclidean norm of vectorized matrices. When Z = Y ∈ Rp×k in (38). Because of this Euclidean structure.1] thus Y 2 F = i λ(Y )2 = λ(Y ) i 2 2 = λ(Y ) .5. under which the bodies correspond. λ(Y ) = Y . for X ∈ Rp×k vec X − vec Y 2 = X −Y F (46) and because vectorization (37) is a linear bijective map. of their vector spaces. An isomorphism of a vector space is a transformation equivalent to a linear bijective mapping.5. then vector space Rp×k is isometrically isomorphic with vector space Rpk in the Euclidean sense and vec is an isometric isomorphism of Rp×k .2.2.1 Definition.Y i = tr(Y T Y ) i 2 Yij = λ(Y T Y )i = σ(Y )2 i (44) where λ(Y T Y )i is the i th eigenvalue of Y T Y . all the known results from convex analysis in Euclidean space Rn carry over directly to the space of real matrices Rp×k . hence. [370. if v and w are points connected by a line segment in one vector space. 8. Were Y a normal matrix ( A. Y (45) The converse (45) ⇒ normal matrix Y also holds. perfect reconstruction (inverse projection) is generally impossible without additional information. I.0. Image and inverse image under the transformation operator are then called isomorphic vector spaces.1 Frobenius’ 2.1.2. Frobenius’ norm is resultant from vector inner-product. Figure 17 for example.1). VECTORIZED-MATRIX INNER PRODUCT 53 2. Two Euclidean bodies may be considered isomorphic if there exists an isomorphism. then σ(Y ) = |λ(Y )| [396. Isomorphic. Because the metrics are equivalent. . 2. △ Isomorphic vector spaces are characterized by preservation of adjacency. and σ(Y )i the i th singular value of Y . [203. (confer (1723)) Y 2 F = = vec Y i.2. then their images will be connected by a line segment in the other.

2. y ∈ dom T Tx − Ty = x − y Then isometric isomorphism T is called a bijective isometry. is an isometric isomorphism. represented by orthogonal matrix Q ∈ Rk×k ( B. synonymous with uniquely invertible linear mapping on Euclidean space.5.14 Then we also say any matrix U whose columns are orthonormal with respect to each other (U T U = I ). 2. e.1 Injective linear operators Injective mapping (transformation) means one-to-one mapping.1 Definition.2.1.0.54 CHAPTER 2.1. CONVEX GEOMETRY T R3 R2 dim dom T = dim R(T ) Figure 18: Linear injective mapping T x = Ax : R2 → R3 of Euclidean body remains two-dimensional under mapping represented by skinny full-rank matrix A ∈ R3×2 .1. discrete Fourier transform via (835).2).1. 2. these include the orthogonal matrices. Suppose T (X) = U XQ is a bijective isometry where U is a dimensionally compatible orthonormal matrix. 2..14 . Isometric isomorphism.2.2.1. An isometric isomorphism of a vector space having a metric defined on it is a linear bijective mapping T that preserves distance.g. △ (47) Unitary linear operator Q : Rk → Rk . Linear injective mappings are fully characterized by lack of nontrivial nullspace. id est. for all x. two bodies are isomorphic by Definition 2.

Y ∈ Rp×k U (X −Y )Q F 55 = X −Y F (48)   1 0 Yet isometric operator T : R2 → R3 . ( E.0. Mappings in Euclidean space created by noninjective linear operators can be characterized in terms of an orthogonal projector ( E).2.1. for any invertible linear transformation T [ibidem] dim dom(T ) = dim R(T ) (49) e.1) This means vector Ax can be succinctly interpreted as coefficients of orthogonal projection.1. In fact.. 1. meaning. image P (T B) of projection P T (B) on R(AT ) in Rn must therefore be isomorphic with the set of projection coefficients T B = {Ax | x ∈ B} in Rm and have the same affine dimension by (49).2 Example. 2. 2. any linear injective transformation has a range whose dimension equals that of its domain. Noninjective linear operators.3.6] This operator T can therefore be a bijective isometry only with respect to its range. 0 0 3 is injective but not a surjective map to R . of any n-dimensional Euclidean body in domain of operator T .1. Pseudoinverse matrix A† is skinny and full-rank.1. [228. so operator P y is a linear bijection with respect to its range R(A† ).2. VECTORIZED-MATRIX INNER PRODUCT Frobenius’ norm is orthogonally invariant. . for X.6. represented by A =  0 1  on R2 . What can be said about the nature of this m-dimensional mapping? Concurrently. By Definition 2.2.g. (Figure 18) An important consequence of this fact is: Affine dimension. consider injective linear operator P y = A† y : Rm → Rn where R(A† ) = R(AT ).2. Consider noninjective linear operator T x = Ax : Rn → Rm represented by fat matrix A ∈ Rm×n (m < n). In other words. To illustrate. we present a three-dimensional Euclidean body B in Figure 19 where any point x in the nullspace N (A) maps to the origin. P (Ax) = P T x achieves projection of vector x on the row space R(AT ). Any linear injective transformation on Euclidean space is uniquely invertible on its range. is invariant to linear injective transformation. T represented by skinny-or-square full-rank matrices.

Symmetric matrix subspace.2. Any square matrix A ∈ RM ×M can be written as a sum of its .2 Symmetric matrices 2.0. The orthogonal complement [333] [251] of SM is SM ⊥ A ∈ RM ×M | A = −AT ⊂ RM ×M SM ⊕ SM ⊥ = RM ×M where unique vector sum ⊕ is defined on page 786.2. CONVEX GEOMETRY R3 P T (B) x PTx Figure 19: Linear noninjective mapping P T x = A†Ax : R3 → R3 of three-dimensional Euclidean body B has affine dimension 2 under projection on rowspace of fat full-rank matrix A ∈ R2×3 . 2.56 PT B R 3 CHAPTER 2.1 Definition. Define a subspace of RM ×M : the convex set of all symmetric M×M matrices. Set of coefficients of orthogonal projection T B = {Ax | x ∈ B} is isomorphic with projection P (T B) [sic]. id est. SM A ∈ RM ×M | A = AT ⊆ RM ×M (50) This subspace comprising symmetric matrices SM is isomorphic with the vector space RM (M +1)/2 whose dimension is the number of free variables in a symmetric M ×M matrix. △ (51) the subspace of antisymmetric matrices in RM ×M .2. (52) All antisymmetric matrices are hollow by definition (have 0 main diagonal).

2. 1 1 A = (A + AT ) + (A − AT ) 2 2 2 57 (53) The symmetric part is orthogonal in RM to the antisymmetric part. To realize isometrically in RM (M +1)/2 .2. vec .2.2.Z tr(Y TZ) = vec(Y )T vec Z = svec(Y )T svec Z (57) . videlicet.   . because of antisymmetry. Such a realization would remain isomorphic but not isometric.1 Isomorphism of symmetric matrix subspace When a matrix is symmetric in SM . YMM where all entries off the main diagonal have been scaled.2. . Further confined to the ambient subspace of symmetric matrices. VECTORIZED-MATRIX INNER PRODUCT symmetric and antisymmetric parts: respectively. Now for Z ∈ SM Y.2. SM ⊥ would become trivial. Lack of 2 isometry is a spatial distortion due now to disparity in metric between RM and RM (M +1)/2 . tr (A + AT )(A − AT ) = 0 (54) In the ambient space of real matrices. we may still employ the vectorization 2 transformation (37) to RM . we must make a correction: For Y = [Yij ] ∈ SM we take symmetric vectorization [216. the antisymmetric matrix subspace can be described SM ⊥ = 1 (A − AT ) | A ∈ RM ×M 2 ⊂ RM ×M (55) because any matrix in SM is orthogonal to any matrix in SM ⊥ . We might instead choose to realize in the lower-dimensional subspace RM (M +1)/2 by ignoring redundant entries (below the main diagonal) during transformation. an isometric isomorphism. 2.1]   Y11  √2Y12     Y22  √   2Y   √ 13  ∈ RM (M +1)/2 (56) svec Y  2Y   23   Y   33  .

2.58 CHAPTER 2. The set of all symmetric matrices SM forms a proper subspace in RM ×M .. 2. so for it there exists a standard orthonormal basis in isometrically isomorphic RM (M +1)/2    ei eT . CONVEX GEOMETRY Then because the metrics become equivalent. 1 ≤ i < j ≤ M  j i 2 where M (M + 1)/2 standard basis matrices Eij are formed from the standard basis vectors ei = 1. for X ∈ SM svec X − svec Y 2 = X−Y F (58) and because symmetric vectorization (56) is a linear bijective mapping.M  i (59) {Eij ∈ SM } = 1  √ ei eT + ej eT ... then svec is an isometric isomorphism of the symmetric matrix subspace. RM ×M h A ∈ RM ×M | A = AT .M √ Yij 2 . δ(A) = 0 ⊂ RM ×M (63) .2. .1 Definition. In other words. i = j ..3 Symmetric hollow subspace 2. Hollow subspaces. . Y = Yii . SM is isometrically isomorphic with RM (M +1)/2 in the Euclidean sense under transformation svec . Y Eij (61) whose unique coefficients Eij . i = j = 1. i = j (60) Thus we have a basic orthogonal expansion for Y ∈ SM M j Y = j=1 i=1 Eij . 1 ≤ i < j ≤ M (62) correspond to entries of the symmetric vectorization (56).3. M ∈ RM 0. [355] Define a subspace of RM ×M : the convex set of all (real) symmetric M ×M matrices having 0 main diagonal. j = 1 . i = 1.0.

operator δ(δ(Λ)) returns Λ itself. tr 1 (A + AT ) − δ 2(A) 2 1 (A − AT ) + δ 2(A) 2 = 0 (70) . Operating recursively on a vector Λ ∈ RN or diagonal matrix Λ ∈ SN . id est.1) δ(A) ∈ RM (1448) 59 Operating on a vector. δ(δ(A)) is a diagonal matrix. (65) Yet defined instead as a proper subspace of ambient SM SM h A ∈ SM | δ(A) = 0 ≡ RM ×M ⊂ SM h SM ⊥ h A ∈ SM | A = δ 2(A) ⊆ SM (66) the orthogonal complement SM ⊥ of symmetric hollow subspace SM . h h (67) called symmetric antihollow subspace. is simply the subspace of diagonal matrices. its orthogonal complement is M Rh ×M ⊥ A ∈ RM ×M | A = −AT + 2δ 2(A) ⊆ RM ×M M RM ×M ⊕ Rh ×M ⊥ = RM ×M h (64) the subspace of antisymmetric antihollow matrices in RM ×M . A= 1 (A + AT ) − δ 2(A) + 2 1 (A − AT ) + δ 2(A) 2 (69) The symmetric hollow part is orthogonal to the antisymmetric antihollow 2 part in RM . δ 2(Λ) ≡ δ(δ(Λ)) = Λ (1450) The subspace RM ×M (63) comprising (real) symmetric hollow matrices is h isomorphic with subspace RM (M −1)/2 .2. VECTORIZED-MATRIX INNER PRODUCT where the main diagonal of A ∈ RM ×M is denoted ( A.2. SM ⊕ SM ⊥ = SM (68) h h △ Any matrix A ∈ RM ×M can be written as a sum of its symmetric hollow and antisymmetric antihollow parts: respectively. linear operator δ naturally returns a diagonal matrix. id est. videlicet.

2.   . .60 CHAPTER 2. dvec is an isometric isomorphism on the symmetric hollow subspace. YM −1.2 characterizes eigenvalues of symmetric hollow matrices.1. for symmetric hollow matrices we introduce the linear bijective vectorization dvec that is the natural analogue to symmetric matrix vectorization svec (56): for Y = [Yij ] ∈ SM h   Y12  Y13     Y23    √  Y  14  ∈ RM (M −1)/2 (73) dvec Y 2   Y 24     Y 34   .M Like svec . which reduces to the diagonal matrices in the ambient space of symmetric matrices SM ⊥ = h δ 2(A) | A ∈ SM = δ(u) | u ∈ RM ⊆ SM (72) In anticipation of their utility with Euclidean distance matrices (EDMs) in 5. j i 2 1≤i<j≤M (75) where M (M − 1)/2 standard basis matrices Eij are formed from the standard basis vectors ei ∈ RM . The symmetric hollow majorization corollary A. . so for it there must be a standard orthonormal basis in isometrically isomorphic RM (M −1)/2 {Eij ∈ SM } = h 1 √ ei eT + ej eT . CONVEX GEOMETRY because any matrix in subspace RM ×M is orthogonal to any matrix in the h antisymmetric antihollow subspace M Rh ×M ⊥ = 1 (A − AT ) + δ 2(A) | A ∈ RM ×M 2 ⊆ RM ×M (71) of the ambient space of real matrices.0. For X ∈ SM h dvec X − dvec Y 2 = X−Y F (74) The set of all symmetric hollow matrices SM forms a proper subspace h in RM ×M .

3. For nonempty sets.2. we define affine dimension r of the N -point list X as dimension of the smallest affine set in Euclidean space Rn that contains X . N } = aff X = x1 + R{xℓ − x1 . ℓ = 1 . hyperplane ( 2. .12). . [355] [186] That affine set A in which those points are embedded is unique and called the affine hull [332. convex.2. HULLS 61 Figure 20: Convex hull of a random list of points in R3 . and conic hulls: convex sets that may be regarded as kinds of Euclidean container or vessel united with its interior.4. .1] Ascribe the points in a list {xℓ ∈ Rn . 2. affine dimension is the same as dimension of the subspace parallel to that affine set. affine dimension In particular. point. ℓ = 1 . . . plane. line.2). Some points from that generating list reside interior to this convex polyhedron ( 2. A.3 Hulls We focus on the affine. ℓ = 2 .1]. Convex Polyhedron] (Avis-Fukuda-Mizukoshi) 2. r dim aff X Affine dimension of any set in Rn is the dimension of the smallest affine set (empty set.3. translated subspace.1 Affine hull. N } = {Xa | aT 1 = 1} ⊆ Rn (78) . [376. . Rn ) that contains it. 1] [200. N } to the columns of matrix X : X = [ x1 · · · xN ] ∈ Rn×N (76) (77) Affine dimension r is a lower bound sometimes called embedding dimension. A aff {xℓ ∈ Rn . [309. 2.

3. (Figure 21) The affine hull of three noncollinear points in any dimension is that unique plane containing the points.0.1.1.1 Example. CONVEX GEOMETRY for which we call list X a set of generators. . N } = R(X − x1 1T ) ⊆ Rn where R(A) = {Ax | ∀ x} aff C = x + aff(C − x) where aff(C −x) is a subspace.9. aff{x} = {x} (82) (142) (79) Given some arbitrary set C and any x ∈ C (80) Affine hull of two distinct points is the unique line through them. Affine hull of correlation matrices.3. The subspace of symmetric matrices Sm is the affine hull of the cone of positive semidefinite matrices. Affine hull of rank-1 correlation matrices. Hull A is parallel to subspace R{xℓ − x1 .0. ( 2. and so on. ℓ = 2 . [219] The set of all m × m rank-1 correlation matrices is defined by all the binary vectors y in Rm (confer 5.9) aff Sm = Sm + (83) 2.62 CHAPTER 2. id est.1) {yy T ∈ Sm | δ(yy T ) = 1} + (84) Affine hull of the rank-1 correlation matrices is equal to the set of normalized symmetric matrices. aff ∅ ∅ (81) The affine hull of a point x is that point itself.2 Exercise. Find the convex hull instead. .1.0. aff{yy T ∈ Sm | δ(yy T ) = 1} = {A ∈ Sm | δ(A) = 1} + (85) 2. . Prove (85) via definition of affine hull.

C. A . HULLS 63 A affine hull (drawn truncated) C convex hull K conic hull (truncated) range or span is a plane (truncated) R Figure 21: Given two points in Euclidean vector space of any dimension.2. generally. K ⊆ R ∋ 0. their various hulls are illustrated.) .3. (Cartesian axes drawn for reference. Each hull is a subset of range.

0.1. Given some arbitrary set C ⊆ Rn .3.9. 2. 2. With particular respect to the nonnegative orthant. like I A 0 + in Example 2. . A. a b ⇔ a i ≥ b i ∀ i (369). Given P .2) The symbol ≥ is reserved for scalar comparison on the real line R with respect to the nonnegative real line R+ as in aTy ≥ b .4] [309] of any bounded2. a conv C ⊆ aff C = aff C = aff C = aff conv C 2. Comparison of matrices with respect to the positive semidefinite cone SM .12. [114.12.1.1. . its convex hull conv C is equivalent to the smallest convex set containing it.1. a b means a − b belongs to the nonnegative orthant but neither a or b is necessarily nonnegative. id est.7.2 Convex hull The convex hull [200. ℓ = 1 .4. CONVEX GEOMETRY Partial order induced by RN and SM + + Notation a 0 means vector a belongs to nonnegative orthant RN while + a ≻ 0 means vector a belongs to the nonnegative orthant’s interior int RN .15 list or set of N points X ∈ Rn×N forms a unique bounded convex polyhedron (confer 2.1.3. + a b denotes comparison of vector a to vector b on RN with respect to the nonnegative orthant. N } = conv X = {Xa | aT 1 = 1. (confer 2..1 CHAPTER 2.2] (confer 5. the smallest closed convex set that contains the list X .7.3. but the supremum may be difficult to ascertain.2) of the polyhedron comprise its convex hull P . . 2.1.0.g. is explained in 2. the generating list {xℓ } is not unique.7.3. Figure 20.2] the vertices of P comprise a minimal set of generators.0.15 (87) An arbitrary set C in Rn is bounded iff it can be contained in a Euclidean ball having finite radius. 0} ⊆ Rn (86) Union of relative interior and relative boundary ( 2.2.64 2. e.1) whose vertices constitute some subset of that list. a K b or a ≻K b denotes comparison with respect to pointed closed convex cone K . [332.1) The smallest ball containing C has radius inf sup x − y and center x⋆ whose determination is a convex problem because sup x − y x y∈C y∈C is a convex function of x .0.0. But because every bounded polyhedron is the convex hull of its vertices.2.1) The convex hull is a subset of the affine hull. P conv {xℓ . ( 2. but equivalence with entrywise comparison does not hold and neither a or b necessarily belongs to K . More generally.1.

1) of its convex hull.0.0. ( 2. there is slight simplification: ((1652).1). U T U = I = A ∈ SN | I 0 . I .1) A conv U U T | U ∈ RN . id est. N = 2 .2.2. 2. but that circumscription does not hold in higher dimension.2.3.3. A = k ⊂ SN + (90) This important convex body we call Fantope (after mathematician Ky Fan). A =1 (91) In case k = N . 3] [240. each and every extreme point U U T has only k nonzero eigenvalues λ and they all equal 1 . the Fantope is identity matrix I .6. U T U = 1 = A ∈ SN | A 0.1) conv U U T | U ∈ RN ×k .0. [144] [289] [11.1] [294. In case k = 1. HULLS 65 Any closed bounded convex set C is equal to the convex hull of its boundary.2. 4. So Frobenius’ norm of each and every extreme point equals the same constant UUT 2 F =k (93) Each extreme point simultaneously lies on the boundary of the positive semidefinite cone (when k < N .2. the set U U T | U ∈ RN ×k . λ(U U T )1:k = λ(U T U ) = 1.9.4] [241] Convex hull of the set comprising outer product of orthonormal matrices has equivalent expression: for 1 ≤ k ≤ N ( 2.9.8) . C = conv ∂ C conv ∅ ∅ (88) (89) 2.2. the boundary of a disc corresponding to k = 1. By (1496). I . Hull of rank-k projection matrices.0.9) and on the boundary of a hypersphere 1 k k of dimension k(N − k + 2 ) and radius k(1 − N ) centered at N I along 2 the ray (base 0) through the identity matrix I in isomorphic vector space RN (N +1)/2 ( 2. Figure 22 illustrates extreme points (92) comprising the boundary of a Fantope. Example 2.7. More generally.9. 2. U T U = I (92) comprises the extreme points ( 2.1 Example.

CONVEX GEOMETRY svec ∂ S2 + α β β γ I γ α √ 2β √ Figure 22: Two Fantopes. Lone point illustrated is identity matrix I .) . Circle (radius 1/ 2). interior to PSD cone. N = 2). + comprises boundary of a Fantope (90) in this dimension (k = 1. N = 2. (View is from inside PSD cone looking toward origin.66 CHAPTER 2. and is that Fantope corresponding to k = 2. shown here on boundary of positive semidefinite cone S2 in isometrically isomorphic R3 from Figure 43.

.2 Example. As in A. v ∈ Rn } = {X ∈ Rm×n | i σ(X)i ≤ 1} (94) where σ(X) is a vector of singular values.1) in the hull is effectively bounded above by 1. u ∈ Rm . n} ] ∈ Rm×min{m. conv{uv T | uv T ≤ 1 . u ∈ Rm . T T Given any particular dyad uvp in the convex hull. .3. (Since uv T = u v (1643).7 we find the convex hull of all symmetric rank-1 matrices to be the entire positive semidefinite cone. ∂{X ∈ Rm×n | i σ(X)i ≤ 1} ⊇ {uv T | uv T = 1 . .1.6. n} . therefore. we learn that the convex hull of normalized symmetric rank-1 matrices is a slice of the positive semidefinite cone.3. In 2.9.3. Any point formed by convex combination of dyads in the hull must therefore be expressible as a convex combination of dyads on the boundary: Figure 23. v ∈ Rn } (96) Normalized dyads constitute the set of extreme points ( 2. v ∈ Rn } (95) id est. n} ] ∈ Rn×min{m.4. because its polar −uvp and every convex combination of the two belong to that hull. . vmin{m.6. (⇐) Suppose σ(X)i ≤ 1. In the present example we abandon symmetry.2. n} . From (91).0. dyads may be normalized and the hull’s boundary contains them. HULLS 67 2. . in Example 2. define a singular value decomposition: X = U ΣV T where U = [u1 .2. u ∈ Rm . instead posing. their convex hull. Nuclear norm ball : convex hull of rank-1 matrices.) Proof.3] T T σ(X)i ≤ σ(αi ui vi ) = αi ui vi ≤ αi . and ui vi ≤ 1 ∀ i . Then by triangle inequality for singular values [204.2.0. and whose sum of singular values is √ √ T σ(X)i = tr Σ = κ ≤ 1.1) of this nuclear norm ball which is. norm of each vector constituting a dyad uv T ( B. (Srebro) (⇒) Now suppose we are given a convex combination of dyads T T X = αi ui vi such that αi = 1.0. what is the convex hull of bounded nonsymmetric rank-1 matrices: conv{uv T | uv T ≤ 1 . umin{m. v ∈ Rn } ≡ conv{uv T | uv T = 1 .3.0. Then we may write X = σi κui κvi which is a κ convex combination of dyads each of whose norm does not exceed 1. cor.2. then the unique T line containing those two points ±uvp (their affine combination (78)) must intersect the hull’s boundary at the normalized dyads {±uv T | uv T = 1}. αi ≥ 0 ∀ i . V = [v1 . u ∈ Rm .

3 Exercise. U V T ≤ 1} (97) Find the convex hull of nonsymmetric matrices bounded under some norm: (98) 2. CONVEX GEOMETRY uv T = 1 T uvp {X ∈ Rm×n | i σ(X)i ≤ 1} uv T ≤ 1 0 T −uvp T −xyp −uv T = 1 −xy T = 1 T Figure 23: uvp is a convex combination of normalized dyads ±uv T = 1 .1.2. Find the convex hull of nonorthogonal projection matrices ( E. the set of all permutation matrices is a proper subset of the nonconvex manifold of orthogonal matrices ( B. the only orthogonal matrices having all nonnegative entries are permutations of the identity: Ξ−1 = ΞT . In fact. T T T similarly for xyp . V ∈ Rn×k . V T U = I } {U V T | U ∈ Rm×k . [202] [319] [262] A permutation matrix Ξ is formed by interchanging rows and columns of identity matrix I . 2. Any point in line segment joining xyp to uvp is expressible as a convex combination of two to four points indicated on boundary. Ξ≥0 (99) .0. Permutation polyhedron.3.2.0.4 Example.68 xy T = 1 T xyp CHAPTER 2. Convex hull of outer product.1): {U V T | U ∈ RN ×k .5). V ∈ RN ×k . Since Ξ is square and ΞT Ξ = I .3. Describe the interior of a Fantope.

HULLS 69 And the only positive semidefinite permutation matrix is the identity. its convex hull is a bounded polyhedron ( 2. [335. is also known as the set of doubly stochastic matrices.5 Example. by this device.0. . It is remarkable that n! permutation matrices can be described as the extreme points ( 2.16 [66] [285. then. X ≥ 0} (100) th where Π i is a linear operator here representing the i permutation. The only orthogonal matrices belonging to this polyhedron are the permutation matrices.1) of a bounded polyhedron. the spectral norm X 2 constraining largest singular value σ(X)1 can be expressed as a semidefinite constraint X 2. can be a device for relaxing an integer. Convex hull of orthonormal matrices. . the so-called assignment problem can be formulated as a linear program. Their convex hull is expressed. conversely. for 1 ≤ k ≤ n conv{U ∈ Rn×k | U T U = I } = {X ∈ Rn×k | X 2 ≤ 1} = {X ∈ Rn×k | X Ta ≤ a ∀ a ∈ Rn } (101) By Schur complement ( A.5] . 8.2. n!} = {X ∈ Rn×n | X T 1 = 1. whose n! vertices are the permutation matrices.4). [318] [26. of all orthogonal 1 matrices. a closed bounded submanifold. The permutation matrices are the minimal cardinality (fewest nonzero entries) doubly stochastic matrices. that is itself described by 2n equalities (2n−1 linearly independent equality constraints in n2 nonnegative variables).1] 2. [27.20] Regarding the permutation matrices as a set of points in Euclidean space.0. having dimension nk − 2 k(k + 1) [52]. of affine dimension (n−1)2 .2] Consider rank-k matrices U ∈ Rn×k such that U T U = I .3.2. any doubly stochastic matrix can be described as a convex combination of at most (n−1)2 + 1 permutation matrices.16 2 ≤1 ⇔ I X XT I 0 (102) Relaxation replaces an objective function with its convex envelope or expands a feasible set to one that is convex.7] [55. These are the orthonormal matrices.5 prob. By Carath´odory’s e theorem. 1946) conv{Ξ = Π i (I ∈ S n ) ∈ Rn×n . This polyhedral hull.12) described (Birkhoff.0. or Boolean optimization problem.3.1. 3.6. [203. Dantzig first showed in 1951 that.2. X1 = 1. i = 1 .2. combinatorial. thm. 1.5] This polyhedron. II. 6.

the aberration [332.7. [309] [200] Every hyperplane partially bounds a halfspace (which is convex. Hull intersection with the nonnegative orthant Rn×n contains the + permutation polyhedron (100).158] [331. p. ℓ = 1 .6. (86). and (103) respectively define an affine combination.1) of this hull. .0. matrices U are orthogonal and the convex hull is called the spectral norm ball which is the set of all contractions.12. and the only nonempty convex set in Rn whose complement is convex and nonempty). 2.313] The orthogonal matrices then constitute the extreme points ( 2. every nonnegative combination of points from the list. it is apparent conv C ⊆ cone C (105) 2. (1599) (1483) (1484) When k = n .4 Vertex-description The conditions in (78). Whenever a Euclidean body can be described as some hull or span of a set of points.1.3. 2. Conic hull of any finite-length list forms a polyhedral cone [200. A.3 Conic hull In terms of a finite-length point list (or set) arranged columnar in X ∈ Rn×N (76).4 Halfspace. CONVEX GEOMETRY because of equivalence X TX I ⇔ σ(X) 1 with singular values. and conic combination of elements from the set or list.1] cone ∅ {0} (104) Given some arbitrary set C . 2. Figure 50a).2) that contains the list. e. then that representation is loosely called a vertex-description and those points are called generators.0. ..3] ( 2. An (n − 1)-dimensional affine subset of Rn is called a hyperplane.4. [204.0. its conic hull is expressed K cone {xℓ . .1.70 CHAPTER 2. Hyperplane A two-dimensional affine subset is called a plane.3. By convention. p. but not affine. N } = cone X = {Xa | a 0} ⊆ Rn (103) id est. convex combination. the smallest closed convex cone ( 2.g.

Its extreme directions give rise to what is called a vertex-description of this polyhedral cone.4.3). Because this set is polyhedral. entire boundary can be constructed from an aggregate of rays emanating exclusively from the origin. exposed directions are in one-to-one correspondence with extreme directions. HALFSPACE.6.1). there are only three. Obviously this cone can also be constructed by intersection of three halfspaces.12.0. constructed using A ∈ R3×3 and C = 0 in (286). hence the equivalent halfspace-description. .2. and linearly independent for this cone.3.0.1) in R3 whose boundary is drawn truncated. HYPERPLANE 71 Figure 24: A simplicial cone ( 2. simply.1. they are conically. By the most fundamental definition of a cone ( 2.7. Each of three extreme directions corresponds to an edge ( 2. affinely. the conic hull of extreme directions.

halfspace. vector a is inward-normal with respect to H+ . ∆ is distance from origin to hyperplane. and nullspace are each drawn truncated. . and vector c − d is normal to it. Points c and d are equidistant from hyperplane. Halfspace H− contains nullspace N (aT ) (dashed line through origin) because aTyp > 0. Shaded is a rectangular piece of semiinfinite H− with respect to which vector a is outward-normal to bounding hyperplane. CONVEX GEOMETRY H+ = {y | aT(y − yp ) ≥ 0} a yp y d c ∂H = {y | aT(y − yp ) = 0} = N (aT ) + yp ∆ H− = {y | aT(y − yp ) ≤ 0} N (aT ) = {y | aTy = 0} Figure 25: Hyperplane illustrated ∂H is a line partially bounding halfspaces H− and H+ in R2 .72 CHAPTER 2. Hyperplane.

An equivalent more intuitive representation of a halfspace comes about when we consider all the points in Rn closer to point d than to point c or equidistant. Halfspaces. id est. in terms of proximity.2b] [42.1 PRINCIPLE 1: Halfspace-description of convex sets The most fundamental principle in convex geometry follows from the geometric Hahn-Banach theorem [251.4. H− = {y | y − d ≤ y − c } (108) This representation. is resolved with the more conventional representation of a halfspace (106) by squaring both sides of the inequality in (108). both partially bounded by ∂H . HALFSPACE.4] n A closed convex set in R is equivalent to the intersection of all halfspaces that contain it.1.1. may be described as an asymmetry H− = {y | aTy ≤ b} = {y | aT(y − yp ) ≤ 0} ⊂ Rn H+ = {y | aTy ≥ b} = {y | aT(y − yp ) ≥ 0} ⊂ Rn (106) (107) Halfspaces H+ and H− where nonzero vector a ∈ Rn is an outward-normal to the hyperplane partially bounding H− while an inward-normal with respect to H+ . 5.1 Euclidean space Rn is partitioned in two by any hyperplane ∂H .12] [18. For any vector y − yp that makes an obtuse angle with normal a . [200.4. H− + H+ = Rn .2] which guarantees any closed convex set to be an intersection of halfspaces.4.1. I.1 Theorem. from Figure 25. A. 2. 2. H− = y | (c − d)T y ≤ c 2 − d 2 2 = y | (c − d)T y − c+d 2 ≤0 (109) 2. HYPERPLANE 73 2.1.4. 1] [133. vector y will lie in the halfspace H− on one side (shaded in Figure 25) of the hyperplane while acute angles denote y in H+ on the other side. ⋄ . The resulting (closed convex) halfspaces.2.4. in the Euclidean sense.

CONVEX GEOMETRY Intersection of multiple halfspaces in Rn may be represented using a matrix constant A i Hi − = {y | ATy Hi + = {y | ATy b} = {y | AT(y − yp ) b} = {y | AT(y − yp ) 0} 0} (110) (111) i where b is now a vector. ∂H {y | aTy = b} = {y | aT(y−yp ) = 0} = {Z ξ+yp | ξ ∈ Rn−1 } ⊂ Rn (114) All solutions y constituting the hyperplane are offset from the nullspace of aT by the same constant vector yp ∈ Rn that is any particular solution to aTy = b . it is itself a subspace if and only if it contains the origin. 2.17 We will later find this expression for y in terms of nullspace of aT (more generally.2. 19] ∂H = H− ∩ H+ = {y | aTy ≤ b . then any hyperplane in Rn can be described as the solution set to vector equation aTy = b (illustrated in Figure 25 and Figure 26 for R2 ). a plane in R3 . and the i th column of A is normal to a hyperplane ∂Hi partially bounding Hi .2 Hyperplane ∂H representations Every hyperplane ∂H is an affine set parallel to an (n − 1)-dimensional subspace of Rn . aTy ≥ b} = {y | aTy = b} (113) a halfspace-description. y = Z ξ + yp (115) where the columns of Z ∈ Rn×n−1 constitute a basis for the nullspace N (aT ) = {x ∈ Rn | aTx = 0} .4. intersections like this can describe interesting convex Euclidean bodies such as polyhedra and cones. giving rise to the term halfspace-description. of matrix AT (144)) to be a useful trick (a practical device) for eliminating affine equality constraints. much as we did here. [309. 2. By the halfspaces theorem.74 CHAPTER 2. Every hyperplane can be described as the intersection of complementary halfspaces.17 . id est. dim ∂H = n − 1 (112) so a hyperplane is a point in R . a line in R2 . Assuming normal a ∈ Rn to be nonzero. and so on.

20]. [61. exer.2.2. e.4. .4. HYPERPLANE 75 1 a= 1 1 1 b= {y | aTy = 1} −1 −1 −1 1 1 (a) −1 −1 (b) −1 {y | bTy = −1} {y | bTy = 1} {y | aTy = −1} c= (c) −1 1 −1 1 1 1 −1 −1 −1 1 (d) 1 −1 d= {y | cTy = 1} {y | cTy = −1} {y | d Ty = −1} {y | d Ty = 1} e= −1 1 1 0 (e) {y | eTy = −1} {y | eTy = 1} Figure 26: (a)-(d) Hyperplanes in R2 (truncated) redundantly emphasize: hyperplane movement opposite to its normal direction minimizes vector inner-product.6.2.4.4. HALFSPACE.0. This concept is exploited to attain analytical solution of linear programs by visual inspection.2. Example 2.4.5. (e) |β|/ α from ∂H = {x | αTx = β} represents radius of hypersphere about 0 supported by any hyperplane with same ratio |inner product|/norm.g.2. Each graph is also interpretable as contour plot of a real affine function of two variables as in Figure 73. Exercise 2. Example 3.8-exer.1.2.0..

4. we can find its representation ∂H by dropping a perpendicular. given constants A ∈ Rm×n and b = A .0.0. Y − Yp ≥ 0} (118) (119) Recall vector inner-product from 2. draw the hyperplane {x ∈ R | x z = 1}. Given the (shortest) distance ∆ ∈ R+ from the origin to a hyperplane having normal vector a . 2. Suppose z = 2 y . draw a hyperplane {x ∈ R2 | xTy = 1}. Y ≥ b} = {Y ∈ Rmn | A .2 Example. when H− ∋ 0 ∂H = or when H+ ∋ 0 ∂H = y | aT (y + a∆/ a ) = 0 = y | aTy = − a ∆ (117) y | aT (y − a∆/ a ) = 0 = y | aTy = a ∆ (116) Knowledge of only distance ∆ and normal a thus introduces ambiguity into the hyperplane representation.0.2: A . The point thus found is the orthogonal projection of the origin on ∂H ( E. equal to a∆/ a if the origin is known a priori to belong to halfspace H− (Figure 25).2. 2. Y ≤ b} = {Y ∈ Rmn | A . given any point yp in Rn . Yp ∈ R H− = {Y ∈ Rmn | A . Now suppose z = 2y . Hyperplane scaling.5). 2 T On the same plot.5. the unique hyperplane containing it having normal a is the affine set ∂H (114) where b equals aTyp and where a basis for N (aT ) is arranged in Z columnar.0. Y = tr(AT Y ) = vec(A)T vec(Y ). For variable Y ∈ Rm×n .1 Matrix variable Any halfspace in Rmn may be represented using a matrix variable. id est. then draw the last hyperplane again with this new z .4.76 CHAPTER 2. What is the apparent effect of scaling normal y ? 2. . Y − Yp ≤ 0} H+ = {Y ∈ Rmn | A .2. 1 Given normal y . or equal to −a∆/ a if the origin belongs to halfspace H+ . Distance from origin to hyperplane. that hyperplane is parallel to the span of its columns.2. CONVEX GEOMETRY Conversely.1 Exercise.4. Hyperplane dimension is apparent from dimension of Z .

. of course. N } is an affinely independent set if and only if {xi − x1 . Any hyperplane in Rn may be described as affine hull of a minimal set of points {xℓ ∈ Rn .i. ℓ = 1 . Y − Yp = 0} ⊂ Rmn (120) Vector a from Figure 25 is normal to the hyperplane illustrated. i= · · · =j =k = 1 . ℓ = 2 .2. {xi . . . (124) Consequently. i = 1 . ⇒ a. . n} = n−1 = x1 + R(X − x1 1T ) . A. .2. .3 Affine independence.) if and only if. ∂H = {Y | A . in words. .1. = aff X . A ⊥ ∂H in Rmn 2. dim R{xℓ − x1 .i. .2. .3] . N (123) has no solution ζ . [200.4. ζ k = 0 ∈ R (confer 2. . i = 1 . Y = b} = {Y | A . N } is a linearly independent (l. HALFSPACE. Arbitrary given points {xi ∈ Rn . n} . iff no point from the given set can be expressed as an affine combination of those remaining. nonzero vectorized matrix A is normal to hyperplane ∂H . ℓ = 2 . n} arranged columnar in a matrix X ∈ Rn×n : (78) dim aff X = n−1 dim aff{xℓ ∀ ℓ} = n−1 (122) = x1 + R{xℓ − x1 .1.) set.4. ℓ = 1 .i. N } are affinely independent (a. a minimal set of points constituting its vertex-description is an affinely independent generating set and vice versa. dim R(X − x1 1T ) = n−1 (142) where R(A) = {Ax | ∀ x} 2.2 Vertex-description of hyperplane (121) ∂H = aff{xℓ ∈ Rn . HYPERPLANE 77 Hyperplanes in Rmn may. minimal set For any particular affine set. also be represented using matrix variables. Likewise. . i = 2 . . [206. . We deduce l. 3] (Figure 27) X This is equivalent to the property that the columns of (for X ∈ Rn×N 1T as in (76)) form a linearly independent set. n} .i.2) xi ζi + · · · + xj ζj − xk = 0 . . .4. over all ζ ∈ RN ζ T 1 = 1. .

1. Three corresponding vectors in R2 are. and conic ( 2. . . each drawn truncated) of points remaining.1).10. to Xyi ζi + · · · + Xyj ζj − Xyk = 0 . they intersect only at a point. affine. CONVEX GEOMETRY A2 A3 A1 0 Figure 27: Of three points illustrated. N } demands (by definition (123)) there exist no solution ζ ∈ RN ζ T 1 = 1. i= · · · =j =k = 1 .1) senses can be preserved under linear transformation. N (126) By factoring out X .2. Consider a transformation on the domain of such matrices T (X) : Rn×N → Rn×N XY (125) where fixed matrix Y [ y1 y2 · · · yN ] ∈ RN ×N represents linear operator T .4 Preservation of affine independence Independence in the linear ( 2. any one particular point does not belong to affine hull A i (i ∈ 1. 2. ζ k = 0. 2. 3. we see that is ensured by affine independence of {yi ∈ RN } and by R(Y )∩ N (X) = 0 where N (A) = {x | Ax = 0} (143) .78 CHAPTER 2. Affine independence of {Xyi ∈ Rn . affinely independent (but neither linearly or conically independent).2. therefore. Two nontrivial affine subsets are affinely independent iff their intersection is empty {∅} or. . i = 1 . analogously to subspaces.4. Suppose a matrix X ∈ Rn×N (76) holds an affinely independent set in its columns. .

12] [18. Given any affine mapping T of vector spaces and some arbitrary set C [309.) 2. of equal inner product in vector z with normal a . 5. represents i th hyperplane in R2 parametrized by scalar κi .6 PRINCIPLE 2: Supporting hyperplane (127) The second most fundamental principle of convex geometry also follows from the geometric Hahn-Banach theorem [251.2.5 Affine maps Affine transformations preserve affine hulls.4. HALFSPACE. Inner product κi increases in direction of normal a . p. HYPERPLANE 79 C H+ H− {z ∈ R2 | aTz = κ1 } a 0 > κ3 > κ2 > κ1 {z ∈ R2 | aTz = κ2 } {z ∈ R2 | aTz = κ3 } Figure 28: (confer Figure 73) Each linear contour.2. i th line segment {z ∈ C | aTz = κi } represents intersection with hyperplane. 1] that guarantees .8] aff(T C) = T (aff C) 2. In convex set C ⊂ R2 .4. (Cartesian axes drawn for reference.4.2.

figure (a). id est. .80 CHAPTER 2. but outward-normal with respect to set Y . Vector a is inward-normal to hyperplane now with respect to both ˜ halfspace H+ and set Y . A supporting hyperplane can be considered the limit of an increasing sequence in the normal-direction like that in Figure 28. Tradition [200] [309] recognizes only positive normal polarity in support function σY as in (129). CONVEX GEOMETRY yp tradition (a) Y H− a H+ ∂H− yp nontraditional (b) Y a ˜ H+ H− ∂H+ Figure 29: (a) Hyperplane ∂H− (128) supporting closed set Y ⊂ R2 . (b) Hyperplane ∂H+ nontraditionally supporting Y . Vector a is inward-normal to hyperplane with respect to halfspace H+ . normal a . But both interpretations of supporting hyperplane are useful.

2. ˜ When a supporting hyperplane contains only a single point of Y .1 Definition.6. e.100] Hyperplanes in support of lower-dimensional bodies are admitted. 2.g. 11] 2. the hyperplane supporting Y is equivalently described (129) is called the support function for Y .44] (confer [309. Hiriart-Urruty & Lemar´chal [200.169] a definition we do not adopt because our only criterion for tangency is e intersection exclusively with a relative boundary.20 useful for constructing the dual cone. yp ∈ Y .21 Rockafellar terms a strictly supporting hyperplane tangent to Y if it is unique there.4. 2.18 . Another equivalent but nontraditional representation2. p. [309. HYPERPLANE 81 existence of at least one hyperplane in Rn supporting a full-dimensional convex set2. HALFSPACE. called nontrivial support. 18. p. that hyperplane is termed strictly supporting. which is. (1714) ∂H+ = = ˜ y | aT(y − yp ) = 0 . Figure 55b. Supporting hyperplane ∂H . [309.18 at each point on its boundary. then a hyperplane supporting Y at point yp ∈ ∂Y is described ∂H− = y | aT(y − yp ) = 0 . Tradition would instead have us construct the polar cone. the negative dual cone. p. p..19 Normal a belongs to H+ by definition. [309.19 (Figure 29a).2. 2.21 △ It is customary to speak of a hyperplane supporting set C but not containing C . 2.20 for a supporting hyperplane is obtained by reversing polarity of normal a . yp ∈ Y .2.4.100]) do not demand any tangency of a supporting hyperplane. The partial boundary ∂H of a halfspace that contains arbitrary set Y is called a supporting hyperplane ∂H to Y when the hyperplane contains at least one point of Y . aT(z − yp ) ≥ 0 ∀ z ∈ Y ˜ T T y | a y = − inf{˜ z | z ∈ Y} = sup{−˜Tz | z ∈ Y} ˜ a a (130) where normal a and set Y both now reside in H+ (Figure 29b). aT(z − yp ) ≤ 0 ∀ z ∈ Y ∂H− = where real function σY (a) = sup{aTz | z ∈ Y} (552) y | aTy = sup{aTz | z ∈ Y} (128) Given only normal a . Assuming set Y and some normal a = 0 reside in opposite halfspaces2.

2. [94] mathematicians see this geometry as a means to relax a discrete problem (whose desired solution is integer or combinatorial. It was originally meant with regard to a plan or to efficient organization or systematization of some industrial process.2.6. Minimization over hypercube. 2] 2.3. x⋆ = − sgn(c) is an optimal solution to this minimization problem but is not necessarily unique.1. 2.135] C= y | aTy ≤ σC (a) (131) a∈Rn by the halfspaces theorem ( 2. and Figure 29. p.1). given vector c minimize x cTx x 1 subject to −1 (132) This convex optimization problem is called a linear program 2. Applying graphical concepts from Figure 26.0. conversely.1) regardless of value of nonzero vector c in (132).4. [240. 3.24 of minimization cTx is a linear function of variable x and the constraints describe a polyhedron (intersection of a finite number of halfspaces and hyperplanes).82 CHAPTER 2. Consider minimization of a linear function over a hypercube.23 because the objective 2.24 The objective is the function that is argument to minimization or maximization. [251.23 The term program has its roots in economics.1.22 an ordinary hyperplane ∂H coincident with them. Any vector x satisfying the constraints is called a feasible solution.22 . can be expressed as the intersection of all halfspaces partially bounded by hyperplanes supporting it. confer Example 4.2 Example. CONVEX GEOMETRY A full-dimensional set that has a supporting hyperplane at every point on its boundary.1.1. is convex.4. videlicet. There is no geometric difference between supporting hyperplane ∂H+ or ∂H− or ∂H and2. Figure 28.6. A convex set C ⊂ Rn . 2. we may instead signify a supporting hyperplane by ∂H . 2.1] [241] If vector-normal polarity is unimportant. for example. It generally holds for optimization problem solutions: optimal ⇒ feasible (133) Because an optimal solution always exists at a hypercube vertex ( 2.1). [94.

Suppose instead we minimize over the unit hypersphere in Example 2.2. 2.4.1] . x = α − β (extensible to vectors). Are the following programs equivalent for some arbitrary real convex set C ? (confer (516)) minimize x∈R subject to −1 ≤ x ≤ 1 x∈C |x| minimize α . What is an expression for optimal solution now? Is that program still linear? Now suppose minimization of absolute value in (132). 5. 2.2] that guarantees existence of a hyperplane separating two nonempty convex sets in Rn whose relative interiors are nonintersecting. then there exists a separating hyperplane ∂H containing the intersection but containing no points relatively interior to either set.2.2.2.5. If at least one of the two sets is open. Unbounded below.7. β α+β (134) ≡ subject to 1 ≥ β ≥ 0 1≥α≥0 α−β ∈ C Many optimization problems of interest and some methods of solution require nonnegative variables. There are two cases of interest: 1) If the two sets intersect only at their relative boundaries ( 2. The method illustrated below splits a variable into parts. 1] [133. HALFSPACE.6.4. [61.7 PRINCIPLE 3: Separating hyperplane The third most fundamental principle of convex geometry again follows from the geometric Hahn-Banach theorem [251.3 Exercise. I. Separation intuitively means each set belongs to a halfspace on an opposing side of the hyperplane. x ≤ 1.6. Under what conditions on vector a and scalar b is an optimal solution x⋆ negative infinity? minimize α∈R .1.4. β∈ R subject to β ≥ 0 α≥0 α aT β α−β (135) = b Minimization of the objective function entails maximization of β .12] [18.4. conversely.2. HYPERPLANE 83 2.2). then the existence of a separating hyperplane implies the two sets are nonintersecting.1.

The fundamental vector subspaces associated with a matrix A ∈ Rm×n [333. 3. the rowspace and range identify one form of subspace description (range form or vertex-description ( 2.b a b radians (136) 2.4.a.k.4. p. p.5 Subspace representations There are two common forms of expression for Euclidean subspaces.1] are ordinarily related by orthogonal complement R(AT ) ⊥ N (A) . Given normals a and b respectively describing ∂Ha and ∂Hb .3. a. its existence is guaranteed when intersection of the closures is empty and at least one set is bounded.84 CHAPTER 2. dihedral angle between hyperplanes or halfspaces is defined as the angle between their defining normals.a conservation of dimension) dim N (A) = n − rank A . and of dimension: dim R(AT ) = dim R(A) = rank A ≤ min{m . ∂Hb ) arccos a. dim N (AT ) = m − rank A (140) (139) N (AT ) ⊥ R(A) N (AT ) ⊕ R(A) = Rm (137) (138) These equations (137)-(140) comprise the fundamental theorem of linear algebra. vertex-description and halfspace-description respectively. R(AT ) ⊕ N (A) = Rn . y ∈ R(A)} (141) span A = {Ax | x ∈ Rn } = {y ∈ Rm | Ax = y . CONVEX GEOMETRY 2) A strictly separating hyperplane ∂H intersects the closure of neither set.k. both coming from elementary linear algebra: range form R and nullspace form N .3 Angle between hyperspaces Given halfspace-descriptions. [333. [200.95.138] From these four fundamental subspaces.1] 2. for example (∂Ha .4)) R(AT ) R(A) span AT = {ATy | y ∈ Rm } = {x ∈ Rn | ATy = x . x ∈ R(AT )} (142) . A. n} with complementarity (a.

5. (145) (146) (147) 2. .1 . Any particular vector subspace Rp can be described as nullspace N (A) of some matrix A or as range R(B) of some matrix B . .0. More generally. for (143). each row of A is normal to a hyperplane while each row of AT is a normal for (144). .3. as hyperplane intersection Any affine subset A of dimension n − m can be described as an intersection of m hyperplanes in Rn . halfspace-description N (A) is actually the intersection of as many hyperplanes through the origin.0.0.1. given fat (m ≤ n) full-rank (rank = min{m .. .6.1. Subspace algebra.1 Subspace or affine subset. Theorem A.  ∈ Rm×n A (148) . or as the offset span of n − m vectors: 2.1 Given prove Exercise.g.2. we have the choice of expressing an n − m-dimensional affine subset of Rn as the intersection of m hyperplanes. aT m . 2. whereas nullspace form (143) or (144) is the solution set to a linear equation similar to hyperplane definition (114).5.5.5. R(A) + N (AT ) = R(B) + N (B T ) = Rm R(A) ⊇ N (B T ) ⇔ N (AT ) ⊆ R(B) R(A) ⊇ R(B) ⇔ N (AT ) ⊆ N (B T ) e.  . n}) matrix  T a1 . Yet because matrix A generally has multiple rows. SUBSPACE REPRESENTATIONS 85 while the nullspaces identify the second common form (nullspace form or halfspace-description (113)) N (A) N (AT ) {x ∈ Rn | Ax = 0} = {x ∈ Rn | x ⊥ R(AT )} {y ∈ Rm | ATy = 0} = {y ∈ Rm | y ⊥ R(A)} (143) (144) Range forms (141) (142) are realized as the respective span of the column vectors in matrices AT and A .

5.2 . A describes a subspace whenever xp = 0.1. . In R4 .2. . For n > k m A ∩ Rk = {x ∈ Rn | Ax = b} ∩ Rk = i=1 x ∈ Rk | a i (1 : k)Tx = b i (150) The result in 2. whereas three independent hyperplanes intersect at a point.1) and then equivalently express affine subset A as its span plus an offset: Define Z basis N (A) ∈ Rn×n−rank A Z ξ + xp | ξ ∈ Rn−rank A ⊆ Rn (151) so AZ = 0.2. ( 2. A = {x ∈ Rn | Ax = b} = (152) the offset span of n − rank A column vectors.1.4. Then we have a vertex-description in Z . as span of nullspace basis Alternatively. This misuse departs from independence of two affine subsets that demands intersection only at a point or not at all. where xp is any particular solution to Ax = b .1) 2. Two planes can intersect at a point in four-dimensional Euclidean vector space.2 is extensible.3.1. any affine subset A also has a vertex-description: 2. (113) For example: The intersection of any two independent2.25 hyperplanes in R3 is a line. It is easy to visualize intersection of two planes in three dimensions. Any number of hyperplanes are called independent when defining normals are linearly independent. id est.4.86 and vector b ∈ Rm .g. whereas three hyperplanes intersect at a line. and so on.25 .5.1 Example.1). CONVEX GEOMETRY m A {x ∈ R | Ax = b} = n i=1 x | aTx = b i i (149) a halfspace-description. CHAPTER 2. Intersecting planes in 4-space. 2.1.. we may compute a basis for nullspace of matrix A ( E. A describes a subspace whenever b = 0 in (149). the intersection of two independent hyperplanes is a plane (Example 2.2. four at a point.5. e.0.

they intersect at a A1 is invertible.2. So let’s resort to the tools acquired. Two cases of interest are drawn in Figure 30. Why is α⋆ negative in both cases? Is there solution on the vertical axis? What causes objective unboundedness in the latter case (b)? Describe all vectors c that would yield finite optimal objective in (b). Minimize a hyperplane over affine set A in the nonnegative orthant minimize x cTx subject to Ax = b x 0 (156) where A = {x | Ax = b}.1.5. Why is optimal solution x⋆ not aligned with vector c as in Cauchy-Schwarz inequality (2098)? subject to x ∈ P (157) . Bounded set P is an intersection of many halfspaces. Linear program. Graphical solution to linear program maximize cTx x is illustrated in Figure 31. Suppose an intersection of two hyperplanes in four dimensions is specified by a fat full-rank matrix A1 ∈ R2×4 (m = 2. SUBSPACE REPRESENTATIONS 87 a line can be formed. n = 4) as in (149): A1 x ∈ R4 a11 a12 a13 a14 x = b1 a21 a22 a23 a24 (153) The nullspace of A1 is two dimensional (from Z in (152)). point because A2 A 1 ∩ A2 = x ∈ R4 A1 x= A2 b1 b2 (155) 2.2 Exercise. Similarly define a second plane in terms of A2 ∈ R2×4 : A2 x ∈ R4 a31 a32 a33 a34 x = b2 a41 a42 a43 a44 (154) If the two planes are affinely independent and intersect.2. so A1 represents a plane in four dimensions.5. In four dimensions it is harder to visualize. Graphically illustrate and explain optimal solutions indicated in the caption.

88 CHAPTER 2. + Solutions are visually ascertainable: (a) Optimal solution is • .1) are the nonnegative Cartesian axes. (b) Optimal objective α⋆ = −∞. CONVEX GEOMETRY (a) c {x | cT x = α} A = {x | Ax = b} A = {x | Ax = b} (b) c {x | cT x = α} Figure 30: Minimizing hyperplane over affine set A in nonnegative orthant R2 whose extreme directions ( 2. .8.

6]. Then [160.5. over polyhedral set P in R2 is a linear program (157).2. Suppose the columns of a matrix Z constitute a basis for N (A) while the columns of a matrix W constitute a basis for N (BZ ). SUBSPACE REPRESENTATIONS 89 P c x⋆ ∂H Figure 31: Maximizing hyperplane ∂H . 2.5. 12. whose normal is vector c ∈ P . Optimal solution x⋆ at • . or in the circumstance A and B are two linearly independent dyads .2] N (A) ∩ N (B) = R(ZW ) (159) If each basis is orthonormal.4. then the columns of ZW constitute an orthonormal basis for the intersection. In the particular circumstance A and B are each positive semidefinite [21.2 Intersection of subspaces The intersection of nullspaces associated with two matrices A ∈ Rm×n and B ∈ Rk×n can be expressed most simply as N (A) ∩ N (B) = N A B {x ∈ Rn | A x = 0} B (158) nullspace of their rowwise concatenation.

CHAPTER 2.1 and where ( B. CONVEX GEOMETRY 2. and w ∈ N (AT ) A . T i=1 wM where nullspace eigenvectors are real by theorem A.  = (1580) λ i s i wi . such as R(AT ) ⊥ N (A) .5. But to aid visualization of involved geometry.5.1). −1 T A = SΛS = [ s1 · · · sM ] Λ  . ssT = 0 .1): for rank-ρ matrix A ∈ RM ×M  T w1 M .0.3 Visualization of matrix subspaces   A.) Fundamental subspace relations.1. then N (A) ∩ N (B) = N (A + B) . it sometimes helps to vectorize matrices. This innocuous observation becomes a sharp instrument for visualization of diagonalizable matrices ( A.B ∈ SM + or  T T A + B = u1 v 1 + u 2 v 2 (160) (l. s ∈ N (A) .0. For any square matrix A .1) R{si ∈ RM | λ i = 0} = R R{wi ∈ R | λ i = 0} = R M M i=ρ+1 M si sT i = N (A) = N (A ) T (162) T wi wi i=ρ+1 Define an unconventional basis among column vectors of each summation: M basis N (A) ⊆ basis N (AT ) ⊆ i=ρ+1 M si sT ⊆ N (A) i T wi wi ⊆ N (AT ) (163) i=ρ+1 . N (AT ) ⊥ R(A) (137) are partially defining. A .90 ( B.i. wwT = 0 (161) because sTA s = wTA w = 0.5.1.

6. EXTREME. µ ∈ [0. An extreme point xε of a convex set C is a point.0. Now. id est.2. 3.6] An extreme point of a convex set C is a point xε in C whose relative complement C \ xε is convex. [361. vec basis N (AT ) ⊥ vec A (166) These vectorized subspace orthogonality relations represent a departure (absent T ) from fundamental subspace relations (137) stated at the outset. 1] (167) △ In other words.6. 2. Extreme point.3. vec A ⊥ vec basis N (A) .1 Definition. because M M A. that is not expressible as a convex combination of points in C distinct from xε .1 M vec basis N (A) = vec vec basis N (AT ) = vec i=ρ+1 M si sT i (164) i=ρ+1 T wi wi By this reckoning. e.g.. T wi wi = 0 i=ρ+1 (165) then vectorized matrix A is normal to a hyperplane (of dimension M 2 − 1) that contains both vectorized nullspaces (each of whose dimension is M − ρ).1.0. . belonging to its closure C [42. The set consisting of a single point C = {xε } is itself an extreme point. i i=ρ+1 A.3]. 2. si sT = 0 .6 Extreme. EXPOSED 91 We shall regard a vectorized subspace as vectorization of any M ×M matrix whose columns comprise an overcomplete basis for that subspace.10] Borwein & Lewis offer: [55. for xε ∈ C and all x1 . xε is an extreme point of C if and only if xε is not a point relatively interior to any line segment in C . x2 ∈ C \xε µ x1 + (1 − µ)x2 = xε . Exposed 2. E. 4. vec basis R(A) = vec A but is not unique.

id est.3 Definition. ⋄ 2. [309. [200. A.5] A nonempty closed convex set containing no lines has at least one extreme point. facet. [309. vertex. in general. 18] All faces F are extreme sets by definition. exposed point. Exposed face.3) belonging to it.2.0.4.0. 18. dim F = sup dim{x2 − x1 .2. .3] A face F of convex set C is a convex subset F ⊆ C such that every closed line segment x1 x2 in C . for F ⊆ C and all x1 . The zero-dimensional faces of C constitute its extreme points.2. xρ − x1 | xi ∈ F .2.1.2 Theorem. having a relatively interior point (x ∈ rel int x1 x2) in F . . .3] [26. has both endpoints in F .5.0. the concept exposed face requires introduction: 2.6. . The nonempty intersection of any supporting hyperplane with C identifies a face. 1] / (168) A one-dimensional face of a convex set is called an edge.4] F is an exposed face of an n-dimensional convex set C iff there is a supporting hyperplane ∂H to C such that F = C ∩ ∂H (170) Only faces of dimension −1 through n − 1 can be exposed by a hyperplane. △ Dimension of a face is the penultimate number of affinely independent points ( 2.0. but not vice versa.92 CHAPTER 2. Face.6. . edge. [200. but not vice versa. µ ∈ [0. x3 − x1 . To acquire a converse. CONVEX GEOMETRY 2. The empty set ∅ and C itself are conventional faces of C . A.0. . x2 ∈ C \F µ x1 + (1 − µ)x2 ∈ F . i = 1 .3. ρ} (169) ρ The point of intersection in C with a strictly supporting hyperplane identifies an extreme point.3. Extreme existence.6.6.1 Definition. II.1 Exposure 2. . A.

Union of all rotations of this entire set about its vertical edge produces another convex set in three dimensions having no edges. Point D is neither an exposed or extreme point although it belongs to a one-dimensional exposed face. The arc is not a conventional face. Point B is extreme but not an exposed point. zero-dimensional exposure makes it a vertex. . Point A is exposed hence extreme. EXTREME. yet it is composed entirely of extreme points. EXPOSED 93 A B C D Figure 32: Closed convex set in R2 .6. a classical vertex. A.6] Closed face AB is exposed. Point C is exposed and extreme.2.4] [332. 3. but that convex set produced by rotation about horizontal edge containing D has edges.2. a facet. [200.

there are no more. consists entirely of extreme points. A facet is an (n − 1)-dimensional exposed face of an n-dimensional convex set C . [224.115/6 p.6. then B is an extreme point by definition. 18] [337] [332. any extreme point of F2 is an extreme point of F3 . [309. For the polyhedron in R3 from Figure 20. and so on. point B cannot be exposed because it relatively bounds both the facet AB and the closed quarter circle. is equivalent to a zero-dimensional exposed face.1 Density of exposed points For any closed convex set C . and facets.6. each bounding the set.26 . facets exist in one-to-one correspondence with the (n − 1)-dimensional faces. Nor is a face of dimension 2 or more entirely composed of edges. For the convex set illustrated in Figure 32. If F1 is a face (an extreme set) of F2 which in turn is a face of F3 . CONVEX GEOMETRY An exposed point. of dimension 1 or more. F2 could be a face exposed by a hyperplane supporting polyhedron F3 . in this example. This coincidence occurs simply because the facet’s dimension is the same as the dimension of the supporting hyperplane exposing it. (The parallel statement for exposed faces is false. the point of intersection with a strictly supporting hyperplane. its exposed points constitute a dense subset of its extreme points. for example. then it is always true that F1 is a face of F3 .1. 2. 18]) For example.2. def. [309. Point B may be regarded as the limit of some sequence of exposed points beginning at vertex C . the definition of vertex. and two-dimensional faces are in one-to-one correspondence with the exposed faces in that example. 2. p.358] Yet it is erroneous to presume that a face.26 {exposed points} = {extreme points} {exposed faces} ⊆ {faces} △ 2. 3. The zero-.1. the nonempty faces exposed by a hyperplane are the vertices.94 CHAPTER 2. edges.115] dense in the sense [376] that closure of that subset yields the set of extreme points. Since B is not relatively interior to any line segment in the set.6.2 Face transitivity and algebra Faces of a convex set enjoy transitive relation. one-.

1 Definition. central to the definition of exposure.1. 2.7. that contains G .1. Most compelling is the projection analogy: Projection on a subspace can be ascertained from projection on its orthogonal complement (Figure 171).2) Relative boundary rel ∂ C = C \ rel int C is equivalent to: 2.2).100] Any face F of convex set C (that is not C itself) belongs to rel ∂ C . ( 2. 2. does not contain C . The smallest face. that contains G .8. E.1.6.6.1] The relative boundary ∂ C of a nonempty convex set C is the union of all exposed faces of C . △ Equivalence to (24) comes about because it is conventionally presumed that any supporting hyperplane.2. p.9. C. Figure 172. called the polar cone.6.3. An affine set has no faces except itself and the empty set. C ⊃ rel int F(C ∋ G) ∋ G .7 Cones In optimization.3 Smallest face 95 Define the smallest face F . that contains some element G .1) (24) 2.13. of set C . . CONES 2. [309.1. of intersection of convex set C with an affine set A [240. Conventional boundary of convex set. of a convex set C : F(C ∋ G) (171) videlicet.2. convex cones achieve prominence because they generalize subspaces. whereas projection on a closed convex cone can be determined from projection instead on its algebraic complement ( 2.4] [241] F((C ∩ A) ∋ G) = F(C ∋ G) ∩ A (172) equals intersection of A with the smallest face.4.7.4 Conventional boundary (confer 2. [200.

2. Boundary of this cone is itself a cone. simply.. [374. (b) This convex cone (drawn truncated) is a line through the origin in any dimension.7.5] Similar examples are shown in Figure 33 and Figure 37. When B = 0.0. 2. (173) {ζ Γ + B | ζ ≥ 0 . Γ = 0} ⊂ Rn defines a halfline called a ray in nonzero direction Γ ∈ Rn having base B ∈ Rn . is the origin because that is the union of all exposed faces not containing the entire set. X = {x ∈ R2 | x1 x2 ≥ 0} which is not convex. e.g. △ Relative boundary of a single ray.1 Definition. base 0 in any dimension. The one-dimensional set Ray.1 Cone defined Γ ∈ X ⇒ ζ Γ ∈ X for all ζ ≥ 0 (174) A set X is called. Its relative interior is the ray itself excluding the origin. cone if and only if where X denotes closure of cone X . An example of such a cone is the union of two opposing quadrants. hence a convex cone. Each polar half is itself a convex cone. 0 (b) 2. a ray is the conic hull of direction Γ .7.0. Hence all closed cones contain . All cones can be defined by an aggregate of rays emanating exclusively from the origin (but not all cones are convex). while its relative interior comprises entire line. CONVEX GEOMETRY X 0 (a) X Figure 33: (a) Two-dimensional nonconvex cone drawn truncated. It has no relative boundary.96 CHAPTER 2.

a pair of rays emanating from the origin.2. . X X Figure 36: Union of two pointed closed convex cones is nonconvex cone X .4] Because the lines are linearly independent. [251. they are algebraic complements whose vector sum is R2 a convex cone.7. 0 Figure 35: Boundary of a convex cone in R2 is a nonconvex cone. 2. CONES 97 0 Figure 34: This nonconvex cone in R2 is a pair of lines through the origin.

Boundary is also a cone.98 CHAPTER 2. CONVEX GEOMETRY X X Figure 37: Truncated nonconvex cone X = {x ∈ R2 | x1 ≥ x2 . [251. X 0 Figure 38: Nonconvex cone X drawn truncated in R2 .4] Cone exterior is convex cone.4] Cartesian axes drawn for reference. Boundary is also a cone. x1 x2 ≥ 0}. 2. [251. 2. Each half (about the origin) is itself a convex cone. .

ξ ≥ 0 µ ζ Γ1 + (1 − µ) ξ Γ2 ∈ K ∀ µ ∈ [0. the positive semidefinite cone SM (191).1. The set of convex cones is a narrower but more familiar class of cone.7. quadratic cone. Obviously. Γ2 ∈ K ⇒ ζ Γ1 + ξ Γ2 ∈ K for all ζ . any ray having the origin as base such as the nonnegative real line R+ in subspace R . and Euclidean vector space Rn .9.7. 1] because µ ζ . e. ζ Γ1 ∈ K and ξ Γ2 ∈ K ∀ ζ . for any particular ζ .1). a. (1 − µ) ξ ≥ 0. Esoteric examples of convex cones include the point at the origin. circular cone ( 2. {X } ⊃ {K} (177) the set of all convex cones is a proper subset of all cones. ℓ=2 (178) and polyhedral cone ( 2. unbounded ice-cream cone united with its interior. any halfspace partially bounded by a hyperplane through the origin. cone ∅ = {0} (104) 99 The 2.2 Convex cone Γ1 .8. if and only if any conic combination of elements from K belongs to its closure. completely positive semidefinite matrices {CC T | C ≥ 0} [40.12. More familiar convex cones are Lorentz cone (confer Figure 46)2.2. any subspace. p. Apparent from this definition. any orthant generated by Cartesian half-axes ( 2. Set K is convex since.27 Kℓ = x t ∈ Rn × R | x ℓ (176) ≤t . meaning.a: second-order cone.1.1). but its conic hull is. 2.3). empty set ∅ is not a cone.. the cone of Euclidean distance matrices EDMN (924) + ( 6). CONES the origin 0 and are unbounded.g.0. excepting the simplest cone {0}. a halfspace-description ( 2. Convex cones need not be full-dimensional. any member of which can be equivalently described as the intersection of a possibly (but not necessarily) infinite number of hyperplanes (through the origin) and halfspaces whose bounding hyperplanes pass through the origin. ξ ≥ 0 (175) We call set K a convex cone iff id est.4).71].k. ξ ≥ 0 .2. K is a cone.27 . any line through the origin.

[309. 2. 19] Intersection of an arbitrary collection of closed convex cones is a closed convex cone.12.1. the convex cone.2. p.22] 2.1 cone invariance More Euclidean bodies are cones. ⋄ confer Figures: 24 33 34 35 38 37 39 41 43 50 54 57 59 60 62 63 64 65 66 144 157 182 2.7. than are not. [259.142.0.28 .100 CHAPTER 2. 2.7. Intersection of an arbitrary collection of convex cones is a convex cone. and Cartesian product.1) remains a polyhedral cone.or many-valued inverse linear transformation. Cone intersection (nonempty). 2.28 This class of convex body.1 Theorem.2.3] Intersection of a finite number of polyhedral cones (Figure 50 p. the three-dimensional flared horn (with or without its interior) resembling mathematical symbol ≻ denoting strict cone membership and partial order. 2.1. vector summation.2. [309. ironically. it seems. linear and single. but is not invariant to translation. CONVEX GEOMETRY Figure 39: Not a cone. is invariant to scaling.

and vice versa. but not full-dimensional. CONES The property pointedness is associated with a convex cone.2. if K ∩ −K = {0} (179) then K is pointed.2) A convex cone K is pointed iff it contains no line. but.7.3]: the 2. has a strictly supporting hyperplane at the origin. bounded.4. △ Then a pointed closed convex cone.7.7). K is pointed if and only if there exists a vector β normal to a hyperplane strictly supporting K . Equivalently. emanating from the origin in Rn . exer. h + 2. β ≥ ǫ x ∀ x∈ K (181) ⋄ If closed convex cone K is not pointed. equivalently.3. If the origin is an extreme point of K or. K is not pointed iff there exists any nonzero direction Γ ∈ K such that −Γ ∈ K . The simplest and only bounded [374.12. id est. .1. [332.7.2.2. has relative boundary 0 while its relative interior is the halfline itself excluding 0.29 Yet a pointed closed convex cone has only one extreme point [42.2. Its relative boundary is the empty set ∅ (25) while its relative interior is the point 0 itself (12). α = 1} (180) is closed.2. by convention. 3. Pointed are any Lorentz cone. then it has no extreme point.3 Theorem. The pointed convex cone that is a halfline. by principle of separating hyperplane ( 2.10] A convex cone is pointed iff the origin is the smallest nonempty face of its closure. and K = cone C .2. Equivalently. p. Figure 36) 2. cone of Euclidean distance matrices EDMN in symmetric hollow subspace SN . 3. for some positive scalar ǫ x.15. Pointed convex cone.2 Definition.1.75] convex cone K = {0} ⊆ Rn is pointed.20] n A closed convex cone K ⊂ R is pointed if and only if there exists a normal α such that the set C {x ∈ K | x . (confer 2.1). pointed cone convex cone 101 (Figure 35. and positive semidefinite cone SM in ambient SM . 2.8.29 nor does it have extreme directions ( 2. Pointed cones. [55.

7] essentially defined by vector or matrix inequality.30 . [21. implying. y x ⇒ x = z) z⇒x z). e.102 CHAPTER 2. Visceral mechanics of actually comparing points. x y or y x . We say two points x and y are comparable when x y or y x with respect to pointed closed convex cone K . Only when K is a nonnegative orthant Rn do these inequalities reduce + to ordinary entrywise comparison ( 2. And from the cone intersection theorem it follows that an intersection of convex cones is pointed if at least one of the cones is.2. (x y .30 x) z. its vertex.4. Pointedness is invariant to Cartesian product by (179). each and every nonempty exposed face of a pointed closed convex cone is a pointed closed convex cone.or matrix-valued partially ordered set are thus well defined. are well illustrated in the example of Figure 63 which relies on the equivalent membership-interpretation in definition (182) or (183). p. we ascribe nomenclature generalized inequality to comparison with respect to a pointed closed convex cone. y ≺ z ⇒ x ≺ z) A pointed closed convex cone K induces partial order on Rn or Rm×n . x K z ⇔ ⇔ z−x∈ K z − x ∈ rel int K (182) (183) x ≺ z K Neither x or z is necessarily a member of K for these relations to hold. 2. Comparable points and the minimum element of some vector. so nonincreasing A set is totally ordered if it further obeys a comparability property of the relation: for each and every x and y from the set. Inclusive of that special case. CONVEX GEOMETRY exposed point residing at the origin. 1] [326.. z y.2 Relation reflexivity (x antisymmetry (x transitivity (x Pointed closed convex cone induces partial order represents partial order on some set if that relation possesses2.3) while partial order lingers. when cone K is not an orthant.13.7.g. 2.2. one-dimensional real vector space R is the smallest unbounded totally ordered and connected set.

7.) (b) Point y is a minimal element of set C2 with respect to cone K because negative cone translated to y ∈ C2 contains only y . . CONES 103 R2 C1 (a) x+K y x C2 (b) y−K Figure 40: (confer Figure 69) (a) Point x is the minimum element of set C1 with respect to cone K because cone translated to x ∈ C1 contains entire set. minimum/minimal.2. become equivalent under a total order. (Cones drawn truncated. These concepts.

is useful for partially ordered sets having no minimum element: Point x ∈ C is a minimal element of set C with respect to pointed closed convex cone K if and only if (x − K) ∩ C = x . λ > 0 ⇒ λx ≺ λz) z.31 A closely related concept.1. λ ≥ 0 ⇒ λx λz).2. u v ⇒ x + u ≺ z + v) additivity (x 2.1) it follows: Borwein & Lewis [55. Proper cone: a cone that is pointed closed convex full-dimensional. (x ≺ y . 2.21] ignore possibility of equality to x + K in this condition. and require a second condition: .1 Definition. 1] [61. .7. A.3 exer.16] or univariate degree-2n polynomials nonnegative over R [61.4. (x ≺ z . [90. CONVEX GEOMETRY homogeneity (x sequences with respect to cone K can therefore converge in this sense: Point x ∈ C is the (unique) minimum element of set C with respect to cone K iff for each and every z ∈ C we have x z .2. although implicit is the assumption: dim K ≥ dim aff C .2] Because any supporting hyperplane to a convex cone must therefore itself be a cone. exmp.2.8 Cone boundary Every hyperplane supporting a convex cone contains the origin.31 . equivalently.104 CHAPTER 2. 3. iff C ⊆ x + K . . In words. and the set of all coefficients of univariate degree-n polynomials nonnegative on interval [0. or any orthant in Rn . u v ⇒ x+u z + v).2.2.7. (Figure 40) No uniqueness is implied here. 5. the nonnegative real line R+ in vector space R . Further properties of partial order with respect to pointed closed convex cone K are not defining: y .2. 2. minimal element. [200. a point that is a minimal element is smaller (with respect to K) than any other point in the set to which it is comparable. △ A proper cone remains proper under injective linear transformation.1] Examples of proper cones are the positive semidefinite cone SM in + the ambient space of symmetric matrices ( 2.37]. then from the cone intersection theorem ( 2.9). and C ⊂ y + K for some y in Rn implies x ∈ y +K. exer.

and the ray belongs to K by definition (174). [309.8.1 Extreme direction The property extreme direction arises naturally in connection with the pointed closed convex cone K ⊂ Rn . By the cone faces lemma. Then it follows that the ray {ζ Γ | ζ ≥ 0} also belongs to ∂K . p.7. ⋄ Proof.3]2. {ζ Γ | ζ ≥ 0} ⊆ K ∩ ∂H ⊂ ∂K Proper cone {0} in R0 has no boundary (24) because (12) rel int{0} = {0} (185) (184) The boundary of any proper cone in R is the origin. being analogous to extreme point. ∂H must therefore expose a face of K that contains the ray. 105 [26. So the ray through Γ belongs both to K and to ∂H . Each nonempty exposed face of a convex cone is a convex cone.1 Lemma. The boundary of any convex cone whose dimension exceeds 1 can be constructed entirely from an aggregate of rays emanating exclusively from the origin. cor. 2. 18.33 We diverge from Rockafellar’s extreme direction: “extreme point at infinity”.0.11.0.6.33 An extreme direction Γε of pointed K is a vector Rockafellar’s corollary yields a supporting hyperplane at the origin to any convex cone in Rn not equal to Rn .0. By virtue of its propriety. Hence the closed line segment 0Γ must lie in an exposed face of K because both endpoints do by Definition 2.2 Theorem. 2.2. The origin belongs to the ray through Γ .162]2. id est.32 Hence the origin belongs to the boundary of K because it is the zero-dimensional exposed face. Cone faces.4. a proper cone guarantees existence of a strictly supporting hyperplane at the origin. That means there exists a supporting hyperplane ∂H to K containing 0Γ .8] ⋄ 2. Proper-cone boundary. [309.8. II. each and every nonempty exposed face must include the origin.8.1.32 .8. 2. CONE BOUNDARY 2.1.0. Suppose a nonzero point Γ lies on the boundary ∂K of proper cone K in Rn .

. CONVEX GEOMETRY ∂K ∗ K Figure 41: K is a pointed polyhedral cone not full-dimensional in R3 (drawn truncated in a plane parallel to the floor upon which you stand). Dual cone ∗ K is a wedge whose truncated boundary is illustrated (drawn perpendicular ∗ to the floor). K ⊂ int K (excepting the origin).106 CHAPTER 2. In this particular instance. Cartesian coordinate axes drawn for reference.

9) comprise the infinite set of all symmetric rank-one matrices.6. for example. e. The extreme directions of the positive semidefinite cone ( 2. in this context.2. 2. 6] [196. we are referring to distinctness of rays containing them. but its vector representation Γε is not because any positive scaling of it produces another vector in the same (extreme) direction. the corresponding extreme ray {ζ Γε ∈ K | ζ ≥ 0} is not a ray relatively interior to any plane segment 2.71). an extreme direction in a pointed closed convex cone is the direction of a ray. For any pointed polyhedral cone.0. extreme direction Γε is not a point relatively interior to any line segment in K \{ζ Γε ∈ K | ζ ≥ 0}. correspond to its three edges.36 The extreme directions of the polyhedral cone in Figure 24 (p. by analogy.2. [21. uniqueness An extreme direction is unique.35 in K . that cannot be expressed as a conic combination of directions of any rays in the cone distinct from it. a planar cone.g. Nonzero vectors of various length in the same extreme direction are therefore interpreted to be identical extreme directions. III] It is sometimes prudent to instead consider the less infinite but complete normalized set.3) of a convex cone is not necessarily a ray.1 extreme distinction. Hence an extreme direction is unique to within a positive scaling.36 Like vectors. a wedge-shaped polyhedral cone (K in Figure 41). there is a one-to-one correspondence of one-dimensional faces with extreme directions..0. 2.34 (187) An edge ( 2. Γ2 ∈ K \{ζ Γε ∈ K | ζ ≥ 0} (186) In words.8. When we say extreme directions are distinct. Thus.34 Nonzero direction Γε in pointed K is extreme if and only if ζ1 Γ1 + ζ2 Γ2 = Γε ∀ ζ1 . . for M > 0 (confer (235)) {zz T ∈ SM | z = 1} 2. A convex cone may ∗ contain an edge that is a line.1. ζ2 ≥ 0 . called an extreme ray. ∀ Γ1 . an extreme direction can be identified with the Cartesian point at the vector’s head with respect to the origin. By (105). CONE BOUNDARY 107 corresponding to an edge that is a ray emanating from the origin.8. An extreme ray is a one-dimensional face of K . 2.35 A planar fragment.2.

then it has no extreme directions and no vertex.1) Any closed convex set containing no lines can be expressed as the convex hull of its extreme points and extreme rays. [309. an idiosyncrasy of dimension 1. 18.1. generate affine sets by taking the affine hull of any collection of points and directions. p. Pointed closed convex cone K = {0} has no extreme direction because extreme directions are nonzero by definition. the extreme points and directions are called generators of a convex set. 18. a vertex at the origin (if it exists) becomes benign.167] That is the practical utility of extreme direction.6] [309.2. 1] Conversely.166] (confer 2. pointed closed convex cone K is equivalent to the convex hull of its vertex and all its extreme directions.1.1 Theorem. We broaden. p.2.1.8.9.1.2. of course. When the extremes theorem applies.3. generators for a convex set comprise any collection of points and directions whose convex hull constructs the set.g. Example 2. the set of extreme elements of a convex set is a minimal set of generators for that convex set.7. When the convex set under scrutiny is a closed convex cone.0. to facilitate construction of polyhedral sets. (Klee) Extremes.1.8. [21.12.8. [332.. If closed convex cone K is not pointed. ⋄ It follows that any element of a convex set containing no lines may be expressed as a linear combination of its extreme elements. has one extreme direction belonging to its relative interior.10.1.2. thereby. 2. the meaning of generator to be inclusive of all kinds of hulls.2. S+ the nonnegative real line.0. So. e. 3. apparent from the extremes theorem: 2.108 CHAPTER 2. An arbitrary collection of generators for a convex set includes its extreme elements as a subset.1 and Example 2. conic combination of generators during construction is implicit as shown in Example 2. We can. 2. . CONVEX GEOMETRY The positive semidefinite cone in one dimension M = 1.2 Generators In the narrowest sense. Any polyhedral set has a minimal set of generators whose cardinality is finite.

2.3.0. {exposed directions} ⊆ {extreme directions} △ For a proper cone in vector space Rn with n ≥ 2.8. denoting the i th extreme direction by Γi ∈ Rn . an exposed point.1) When a convex cone has a vertex. The conventionally exposed directions of EDM2 constitute the empty set ∅ ⊂ {extreme direction}. it resides at the origin. there can be only one. an exposed direction is the direction of a one-dimensional exposed face that is a ray emanating from the origin. we can say more: {exposed directions} = {extreme directions} (189) It follows from Lemma 2.8.0.1) is the origin. an idiosyncrasy of dimension 1.8.0. .1 Definition. there is one-to-one correspondence of one-dimensional exposed faces with exposed directions. then their convex hull is (86) P = = = [0 Γ1 Γ2 · · · ΓN ] a ζ | aT 1 = 1.1. so any basis constitutes generators for a vertex-description. a 0. [309. for example.1. ( 2. This cone has one extreme direction belonging to its relative interior. 2.6.2 Exposed direction 2.1 Example. 18] (confer 2.2.1 for any pointed closed convex cone.4.1.2. a 0. Given an extreme point at the origin and N extreme rays. span basis R(A).8. The pointed closed convex cone EDM2 . 2.6. CONE BOUNDARY 109 Any hull of generators is loosely called a vertex-description. ζ ≥ 0 [Γ1 Γ2 · · · ΓN ] b | b 0 ⊂ Rn (188) a closed convex set that is simply a conic hull like (103). there is no one-dimensional exposed face that is not a ray base 0.8.4) Hulls encompass subspaces. is a ray in isomorphic subspace R whose relative boundary ( 2. Application of extremes theorem. In the closure of a pointed convex cone. Exposed point & direction of pointed convex cone. id est. ζ ≥ 0 [Γ1 Γ2 · · · ΓN ] a ζ | aT 1 ≤ 1.0.

18] Four rays (drawn truncated) on boundary of conic hull of two-dimensional closed convex set from Figure 32 lifted to R3 . B}.110 CHAPTER 2. . although it belongs to the exposed face cone{A . Extreme ray through C is exposed. Ray through point A is exposed hence extreme. Point D is neither an exposed or extreme direction although it belongs to a two-dimensional exposed face of the conic hull. CONVEX GEOMETRY C B A D 0 Figure 42: Properties of extreme points carry over to extreme directions. Extreme direction B on cone boundary is not an exposed direction. [309.

2. although it can correctly be inferred: each and every extreme point and direction belongs to some exposed face. 18.1.1 Connection between boundary and extremes 111 2.6.1. the direction of an arbitrary ray. so dim F < dim C .8.1.1).8.7] (confer 2.1. CONE BOUNDARY 2. Neither is an extreme direction on the boundary of a pointed convex cone necessarily an exposed direction.1) Any closed convex set C containing no lines (and whose dimension is at least 2) can be expressed as closure of the convex hull of its exposed points and exposed rays.2. 18. [309. [309. ⋄ From Theorem 2.8. there are three two-dimensional exposed faces constituting the entire boundary.3] 2. Exposed. The relationship between extreme sets and the relative boundary actually goes deeper: Any face F of convex set C (that is not C itself) belongs to rel ∂ C . Lift the two-dimensional set in Figure 32. for example.2. For the polyhedral cone illustrated in Figure 24.4. Yet there are only three exposed directions.2. into three dimensions such that no two points in the set are .1 Theorem. base 0. as might be erroneously inferred from the conventional boundary definition ( 2.2 Converse caveat It is inconsequent to presume that each and every extreme point and direction is necessarily exposed. for example.1.1. Similarly. while each and every extreme direction of a convex set (that is not a halfline and contains no line) resides on its relative boundary because extreme points and directions of such respective sets do not belong to relative interior by definition.1.8. each composed of an infinity of rays.1. on the boundary of a convex cone is not necessarily an exposed or extreme direction. rel ∂ C = C \ rel int C  (24)    = conv{exposed points and exposed rays} \ rel int C    = conv{extreme points and extreme rays} \ rel int C (190) Thus each and every extreme point of a convex set (that is not a point) resides on its relative boundary. Arbitrary points residing on the relative boundary of a convex set are not necessarily exposed or extreme points.8.8.

p.78] 2.9. The positive definite (full-rank) matrices comprise the cone interior int SM = + = = A ∈ SM | A ≻ 0 A ∈ SM | y TAy > 0 ∀ y = 1 y =1 A ∈ SM | yy T .4.1) in vectorized variable2.2. Because y TA y = y TAT y . The set of all symmetric positive semidefinite matrices of particular dimension M is called the positive semidefinite cone: SM + = = A ∈ SM | A 0 (191) A ∈ SM | y TAy ≥ 0 ∀ y = 1 y =1 A ∈ SM | yy T .1 Definition. illustrated in Figure 42. CONVEX GEOMETRY collinear with the origin.1) 2. −Alexander Barvinok [26.37 . ( A. Positive semidefinite cone. A ≥ 0 = {A ∈ SM | rank A ≤ M } + formed by the intersection of an infinite number of halfspaces ( 2. A > 0 (192) = {A ∈ SM | rank A = M } + while all singular positive semidefinite matrices (having at least one infinite in number when M > 1.9 Positive semidefinite (PSD) cone The cone of positive semidefinite matrices studied in this section is arguably the most important of all non-polyhedral cones whose facial structure we completely understand. matrix A is almost always assumed symmetric.0.112 CHAPTER 2.1. It is a unique immutable proper cone in the ambient space of symmetric matrices SM .0. 2.37 A . each halfspace having partial boundary containing the origin in isomorphic RM (M +1)/2 . Then its conic hull can have an extreme direction B on the boundary that is not an exposed direction.

1. B 0 ⇒ A 0 .3.1) 2.9.2) for the pointed closed convex positive semidefinite cone.0.1 Membership Observe notation A 0 denoting a positive semidefinite matrix.0.1. given A + S = C S 0 ⇔ A C S≻0 ⇔ A≺C 2. (1519) A B .1 Example. E. ( A. i = 1 .7. Employing properties of partial order ( 2. (confer Figure 63) + 2.1. Yet.5) ∂ SM = A ∈ SM | A + 0.38 (194) the same as nonnegative definite matrix.6.1). id est. can be read: symmetric matrix A exceeds the origin with respect to the positive semidefinite cone.13.7. ( A. . Equality constraints in semidefinite program (648).1) id est.0.2. ( 2. eigenvalues must be (nonnegative) positive.0.38 meaning (confer 2.5. matrix A belongs to the positive semidefinite cone in the subspace of symmetric matrices whereas A ≻ 0 denotes membership to that cone’s interior.9. it is easy to show. These notations further imply that coordinates [sic] for orthogonal expansion of a positive (semi)definite matrix must be its (nonnegative) positive eigenvalues ( 2. id est.3. ( A. .3.4. △ The only symmetric positive semidefinite matrix in SM having M + 0-eigenvalues resides at the origin.2) Notation A ≻ 0.2. A ∈ SM .3).7. A ⊁ 0 = A ∈ SM | min{λ(A)i .1) when expanded in its eigenmatrices ( A.13.9. A = 0 for some y = 1 + (193) where λ(A) ∈ RM holds the eigenvalues of A .1. the notation A B denotes comparison with respect to the positive semidefinite cone.7.2. denoting a positive definite matrix. M } = 0 = {A ∈ SM | rank A < M } + 113 = A ∈ SM | yy T . POSITIVE SEMIDEFINITE (PSD) CONE 0 eigenvalue) reside on the cone boundary (Figure 43). . Generalizing comparison on the real line. A B ⇔ A − B ∈ SM but neither matrix A or B necessarily belongs to + the positive semidefinite cone.1.

darkest shading is farthest and inside shell.114 CHAPTER 2.9.12] although PSD cone is selfdual (376) in ambient real space of symmetric matrices. A circular cone in this dimension ( 2.0. 0-contour of smallest eigenvalue (193). [196. II] PSD cone has no two-dimensional face in any dimension. PSD cone geometry is not as simple in higher dimensions [26.7. each and every ray on boundary corresponds to an extreme direction but such is not the case in any higher dimension (confer Figure 24).8). Entire boundary can be constructed from an aggregate of rays ( 2. II.0. CONVEX GEOMETRY γ svec ∂ S2 + α β β γ α √ 2β Minimal set of generators are the extreme directions: svec{yy T | y ∈ RM } Figure 43: (d’Aspremont) Truncated boundary of PSD cone in S2 plotted in isometrically isomorphic R3 via svec (56). z ∈ R2 .2. its only extreme point residing at 0.1) √ 2 2 emanating exclusively from origin: κ2 [ z1 2z1 z2 z2 ]T | κ ∈ R . . Lightest shading is closest.

and so on. ζ2 ≥ 0 (197) (196) ζ1 xTA1 x + ζ2 xTA2 x ≥ 0 for each and every normalized x ∈ RM The convex cone SM is more easily visualized in the isomorphic vector + space RM (M +1)/2 whose dimension is the number of free variables in a symmetric M ×M matrix. having boundary illustrated in Figure 43.9. When M = 2 the PSD cone is semiinfinite in expanse in R3 .9. . ζ2 ≥ 0 and each and every A1 . for all ζ1 . 7. When M = 3 the PSD cone is six-dimensional.1] videlicet. 2. for A1 .2. A2 0 (195) a fact easily verified by the definitive test for positive semidefiniteness of a symmetric matrix ( A): A id est.1 Positive semidefinite cone is convex The set of all positive semidefinite matrices forms a convex cone in the ambient space of symmetric matrices because any pair satisfies definition (175). [203. A2 0 ⇔ xTA x ≥ 0 for each and every x = 1 0 and each and every ζ1 . A2 ∈ SM ζ1 A1 + ζ2 A2 0 ⇐ A1 0 . POSITIVE SEMIDEFINITE (PSD) CONE C 115 X x Figure 44: Convex set C = {X ∈ S × x ∈ R | X xxT } drawn truncated.

Assign 2 x vec X ∈ RN 2 b vec B ∈ RN . (205) is block-diagonal formed by Kronecker product ( A. Inverse image of positive semidefinite cone.0. in contrast. ( A. The set {X ∈ S n × x ∈ Rn | X xxT } is not convex. Yet for fixed x = xp .1.1).1. CONVEX GEOMETRY Example. Yet we would like a less amorphous characterization of this set.2.9. Sets from maps of positive semidefinite cone.0.9. Define the set X {X | AX + B 0} ⊆ SN (202) 0 (201) which is the inverse image of the positive semidefinite cone under affine transformation g(X) AX + B .116 2.9. C = {X ∈ S n × x ∈ Rn | X xxT } X x xT 1 (198) is convex because it has Schur-form.1.1.1. x) 0 (199) n+1 e.2 Example. Set C is the inverse image ( 2..1) of S+ under affine mapping f . the set {X ∈ S n | X xp xT } p (200) is simply the negative semidefinite cone shifted to xp xT .0. so instead we consider its vectorization (37) which is easier to visualize: vec g(X) = vec(AX) + vec B = (I ⊗ A) vec X + vec B where I ⊗A QΛQT ∈ SN 2 (203) (204) D. Figure 44. Set X must therefore be convex by Theorem 2.31.1 no.4) X − xxT 0 ⇔ f (X . having no Schur-form.1.0. B ∈ SN . p 2.g.1.1 The set CHAPTER 2.9. Now consider finding the set of all matrices X ∈ SN satisfying AX + B given A .

0.3).9. Should I ⊗ A have no nullspace. vec X = (I ⊗ A)† vec(SN − B) ⊕ N (I ⊗ A) + (212) In words. Utilizing the diagonalization (204). the vectorization of every element of SN .3 and A.7. + .3. + vec X = {x | ΛQTx ∈ QT(K − b)} 2 = {x | ΦQTx ∈ Λ† QT(K − b)} ⊆ RN where † (208) denotes matrix pseudoinverse ( E) and Φ Λ† Λ (209) is a diagonal projection matrix whose entries are either 1 or 0 ( E.1). and ΦΛ† = Λ† . then vec X = (I ⊗ A)−1 vec(SN − B) which is the expected result.6. set vec X is the vector sum of the translated PSD cone (linearly mapped onto the rowspace of I ⊗ A ( E)) and the nullspace of I ⊗ A (synthesis of fact from A. We have the complementary sum ΦQTx + (I − Φ)QTx = QTx (210) So.2. POSITIVE SEMIDEFINITE (PSD) CONE then make the equivalent problem: Find vec X = {x ∈ RN | (I ⊗ A)x + b ∈ K} where K vec SN + 2 117 (206) (207) is a proper cone isometrically isomorphic with the positive semidefinite cone in the subspace of symmetric matrices. adding (I − Φ)QTx to both sides of the membership within (208) admits vec X = {x ∈ RN | QTx ∈ Λ† QT(K − b) + (I − Φ)QTx} 2 = {x | QTx ∈ Φ Λ† QT(K − b) ⊕ (I − Φ)RN } 2 = {x ∈ QΛ† QT(K − b) ⊕ Q(I − Φ)RN } = (I ⊗ A)† (K − b) ⊕ N (I ⊗ A) 2 2 (211) where we used the facts: linear function QTx in x on RN is a bijection.

CONVEX GEOMETRY √ 2β svec S2 + (a) (b) α α β β γ Figure 45: (a) Projection of truncated PSD cone S2 . + on αβ-plane in isometrically isomorphic R3 . γ = ∞. View is from above with respect to Figure 43. . √ for example. truncated above γ = 1. (b) Truncated above γ = 2. line [ 0 1/ 2 γ ]T | γ ∈ R intercepts PSD cone at some large value of γ .118 CHAPTER 2. in fact. From these plots we might infer.

1.2). Y ∈ RM ×ρ (213) Then the boundary of the positive semidefinite cone may be expressed ∂ SM = A ∈ SM | rank A < M = Y Y T | Y ∈ RM ×M −1 + + (214) Because the boundary of any convex body is obtained with closure of its relative interior ( 2. ∂ SM = A ∈ SM | rank A = M − 1 = A ∈ SM | rank A ≤ M − 1 + + + (217) In S2 . . from (192) we must also have SM = A ∈ SM | rank A = M + + = Y Y T | Y ∈ RM ×M . each and every ray on the boundary of the positive semidefinite cone in isomorphic R3 corresponds to a symmetric rank-1 matrix (Figure 43).3] A = Y Y T ∈ SM . For example. For ρ < M this subset.2.2. but that does not hold in any higher dimension.1. 2. this applies more generally. POSITIVE SEMIDEFINITE (PSD) CONE 119 2.9.1 rank ρ subset of the positive semidefinite cone For the same reason (closure).9.9. nonconvex for M > 1. + rank A = rank Y = ρ . resides on the positive semidefinite cone boundary.1.9. Closure and rank ρ subset. we give such generally nonconvex sets a name: rank ρ subset of a positive semidefinite cone. there must exist a rank ρ matrix Y such that A be expressible as an outer product in Y .7. [333. 6.1 Exercise.7.2 Positive semidefinite cone boundary For any symmetric positive semidefinite matrix A of rank ρ . 2. rank Y = M = Y Y T | Y ∈ RM ×M (215) 2.2. for 0 ≤ ρ ≤ M A ∈ SM | rank A = ρ = A ∈ SM | rank A ≤ ρ + + (216) For easy reference. Prove equality in (216).

0.9. that contains a given + positive semidefinite matrix A .3 Faces of PSD cone. each face belongs to a subspace of dimension the same as the face. having ordered diagonalization A = QΛQT ∈ SM ( A. Specifically. 6] [216.0.2 by (2042).120 2.1).0. [247.5. having dimension less than that of the cone.2. their dimension versus rank Each and every face of the positive semidefinite cone. orthogonal complement to the geometric center subspace. prop.1). 5. + Another good example of tangent subspace is given in E. 2. + Subspace RM (B) is a hyperplane supporting SN when B ∈ M(N − 1).4] Because each and every face of the positive semidefinite cone contains the origin ( 2. the subspace RM tangent to manifold M(ρ) at B ∈ M(ρ) [187.3. RM (11T ) = SN ⊥ .547) 2. Then + matrix A .7.1] RM (B) has dimension dim svec RM (B) = ρ N − ρ−1 2 = ρ(N − ρ) + ρ(ρ + 1) 2 (220) {XB + BX T | X ∈ RN ×N } ⊆ SN (219) Tangent subspace RM contains no member of the positive semidefinite cone SN whose rank exceeds ρ .2. is exposed.2. CONVEX GEOMETRY Subspace tangent to open rank ρ subset When the positive semidefinite cone subset in (216) is left unclosed as in M(ρ) A ∈ SN | rank A = ρ + (218) then we can specify a subspace tangent to the positive semidefinite cone at a particular member of manifold M(ρ).8. of positive semidefinite cone SM .9. is + .1. Define F(SM ∋ A) (171) as the smallest face. c (Figure 154 p.2 CHAPTER 2.

7. the smaller the face. then R(X) ⊥ N (X) ⊇ N (A). X = 0. (140) Thus dimension of the smallest face that contains given matrix A is dim F SM ∋ A = rank(A)(rank(A) + 1)/2 + (222) in isomorphic RM (M +1)/2 . (⇐) Assume R(X) ⊥ N (A) .39 [26.9.2. 31. ( A..40 The boundary constitutes all the one-dimensional faces.4] [241] F SM ∋ A = {X ∈ SM | N (X) ⊇ N (A)} + + = {QΛΛ† ΨΛΛ† QT | Ψ ∈ SM } + 121 = {X ∈ SM | Q(I − ΛΛ† )QT .g.40 1.39 . in R3 .3. The positive semidefinite cone has no facets. then X Q(I − ΛΛ† )QT = 0 ⇒ N (X) ⊇ N (A). A = QΛQT ∈ SM . The + + + larger the nullspace of A . For X ∈ SM . for example. 2.2.9. X = 0 ⇔ R(X) ⊥ N (A). 2. rank-2 matrices belong to the interior of that face having dimension 3 (the entire closed cone). Observe: not all dimensions are represented. and the only zero-dimensional face is the origin.1 Table: Rank k versus dimension of S3 faces + k dim F(S3 ∋ rank-k matrix) + 0 0 1 boundary ≤ 1 ≤2 3 interior ≤3 6 For the positive semidefinite cone S2 in isometrically isomorphic R3 + depicted in Figure 43. II. which are rays emanating from the origin.4) (⇒) Assume N (X) ⊇ N (A) .5. rank-1 matrices belong to relative interior of a face having dimension2.12] [114.3] [240. X = 0} + (221) = QΛΛ† SM ΛΛ† QT + ≃ Srank A + which is isomorphic with convex cone Srank A . + + Given Q(I − ΛΛ† )QT . POSITIVE SEMIDEFINITE (PSD) CONE relatively interior to2. and each and every face of SM is isomorphic with + a positive semidefinite cone having dimension the same as the face. show N (X) ⊇ N (A) ⇔ Q(I − ΛΛ† )QT . 2. Q SM QT = SM . and the only rank-0 matrix is the point at the origin (the zero-dimensional face). 2. e.

2.122 CHAPTER 2. There are M !/(1!(M − 1)!) principal 1 × 1 submatrices.5 PSD cone face containing principal submatrix A principal submatrix of a matrix A ∈ RM ×M is formed by discarding any particular subset of its rows and columns having the same indices. Each and every orthogonal matrix Q makes projectors Q(: . X = 0} + + = X ∈ SM + Q I− I ∈ Sk 0 0T 0 QT .1) + + because PSD cone is selfdual. Simultaneously diagonalizable means commutative. totaling 2M − 1 principal submatrices Any vectorized nonzero matrix ∈ SM is normal to a hyperplane supporting SM ( 2.1 Exercise. Normal on boundary exposes nonzero face by (329) (330). described by intersection with a hyperplane: for + ordered diagonalization of A = QΛQT ∈ SM rank(A) = k < M + F SM ∋ A = {X ∈ SM | Q(I − ΛΛ† )QT .41 svec Q(: . 2. + + 2. 2. M !/(2!(M − 2)!) principal 2 × 2 submatrices. Prove that the smallest face of positive semidefinite cone SM . Bijective isometry. k+1 : M )T indexed by k . and discretely indexed by rank k . show how I − ΛΛ† and ΛΛ† ΨΛΛ† share a complete set of eigenvectors.3.9. containing a + particular full-rank matrix A having ordered diagonalization QΛQT . k+1 : M )Q(: .9.2. of positive semidefinite cone SM . prove Q SM QT = SM from (221).4. and so on.11) of the positive semidefinite cone containing rank-k (and less) matrices. 2. k+1 : M )T to a supporting hyperplane ∂H+ (containing the origin) exposing a face ( 2.9.4 rank-k face of PSD cone Any rank-k < M positive semidefinite matrix A belongs to a face. each projector describing a normal2. both in SM from (221). is the entire cone: id est.2 Exercise.2.2.9. X = 0 (223) ≃ Sk + = SM ∩ ∂H+ + Faces are doubly indexed: continuously indexed by orthogonal matrix Q . in other words.41 . CONVEX GEOMETRY 2. k+1 : M )Q(: .13. Given diagonalization of rank-k ≤ M positive semidefinite matrix A = QΛQT and any particular Ψ 0.

We can find the smallest face.2.42 so embedded can + be expressed ΦAΦ . for then description of smallest face becomes simpler. X = 0} + ≃ Srank Φ + = {ΦΨΦ | Ψ ∈ SM } + (225) Smallest face that contains an embedded principal submatrix. If positive semidefinite matrix A ∈ SM has principal + submatrix of dimension ρ with rank r . Because each and every principal submatrix of a positive semidefinite matrix in SM is positive semidefinite. 1} (224) having diagonal entry 0 corresponding to a discarded row and column from A ∈ SM . may be expressed like (221): For embedded principal submatrix ΦAΦ ∈ SM rank ΦAΦ ≤ rank Φ . [299. apply ordered diagonalization + instead to ˆ ˆ ΦTA Φ = U ΥU T ∈ Srank Φ + 2. Φii ∈ {0 . then rank A ≤ M −ρ + r by (1569). id est. A given symmetric matrix has rank ρ iff it has a nonsingular principal ρ×ρ submatrix but none larger. POSITIVE SEMIDEFINITE (PSD) CONE 123 including A itself.0. whose rank is not necessarily full. Principal submatrices of a symmetric matrix are symmetric.9. Of special interest + are full-rank positive semidefinite principal submatrices. 5-10] By loading vector y in test y T Ay ( A.2) with various binary patterns. then each principal submatrix belongs to a certain face of positive semidefinite cone SM by (222).3.1. by embedding that submatrix in a 0 matrix of the same dimension as A : Were Φ a binary diagonal matrix Φ = δ 2(Φ) ∈ SM . Φ = . then any principal submatrix2. 0 To express a leading principal submatrix. for an embedded principal submatrix ΦAΦ ∈ SM rank ΦAΦ = rank Φ ≤ rank A + F SM ∋ ΦAΦ = {X ∈ SM | N (X) ⊇ N (ΦAΦ)} + + = {X ∈ SM | I − Φ . for example.42 (226) I 0T 0 . it follows that any principal submatrix must be positive (semi)definite whenever A is (Theorem A. that contains a particular complete full-rank principal submatrix of A .4).

Ai = 0 ⇔ XAi = Ai X = 0 .5.124 CHAPTER 2. Because our interest is only that part of the nullspace in the positive semidefinite cone. each nullspace has only one affine dimension because A1 and A2 are rank-1. ˆ rank Φ = rank Φ .1 Example. Smallest face formula (221) can be altered to accommodate a union of points {Ai ∈ SM }: + F SM ⊃ + Ai = i X ∈ SM + N (X) ⊇ i N (Ai ) (230) To see that.5. X = 0} + ˆ ˆ = {ΦU ΥΥ† ΨΥΥ† U T ΦT | Ψ ∈ Srank Φ } + rank ≃ S+ ΦAΦ (227) where binary diagonal matrix Φ is partitioned into nonzero and zero columns by permutation Ξ ∈ RM ×M .9. CONVEX GEOMETRY where U −1 = U T is an orthogonal matrix and Υ = δ 2(Υ) is diagonal.3). Similarly. imagine two vectorized matrices A1 and A2 on diametrically opposed sides of the positive semidefinite cone S2 boundary pictured in + Figure 43. Then F SM ∋ ΦAΦ = {X ∈ SM | N (X) ⊇ N (ΦAΦ)} + + ˆ ˆ = {X ∈ SM | ΦU (I − ΥΥ† )U T ΦT + I − Φ . X. there is a second hyperplane containing svec basis N (A2 ) having normal svec A2 .Ai ∈ SM + (1623) we may ignore the fact that vectorized nullspace svec basis N (Ai ) is a proper subspace of the hyperplane. 2. ˆˆ Φ = ΦΦT ∈ SM . We may think instead in terms of whole . While each hyperplane is two-dimensional. then by X . Smallest face containing disparate elements. ΦΞT ˆ [ Φ 0 ] ∈ RM ×M .2. Regard svec A1 as normal to a hyperplane in R3 containing a vectorized basis for its nullspace: svec basis N (A1 ) ( 2. ˆ ˆ ΦT Φ = I (228) Any embedded principal submatrix may be expressed ˆˆ ˆˆ ΦAΦ = ΦΦTA ΦΦT ∈ SM + (229) ˆˆ ˆˆ ˆ ˆ where ΦTA Φ ∈ Srank Φ extracts the principal submatrix whereas ΦΦTA ΦΦT + embeds it.

One way is by + ⊥ M M showing N (Ai ) ∩ S+ = conv ({Ai }) ∩ S+ . To maintain generality.1. with perpendicularity ⊥ as in (372). 2.9. the larger the smallest face.4. Prove that (230) holds for an arbitrary set {Ai ∈ SM ∀ i ∈ I }. And so hyperplane intersection makes a line intersecting the positive semidefinite cone S2 but only at the origin. basis N ˆ ˆ ΦTA Φ B T B C span ⊇ 0 I − CC † (232) Hence N (ΞXΞT ) ⊇ N (ΞAΞT ) 2. I − Φ is dropped from (227).5. whose normal is Ai . In this hypothetical example. given fixed principal submatrix.6 face of all PSD matrices having same principal submatrix Now we ask what is the smallest face of the positive semidefinite cone containing all matrices having a complete principal submatrix in common.43 2. more pertinently. + smallest face containing those two matrices therefore comprises the entire cone because every positive semidefinite matrix has nullspace containing 0. constituting N (Ai ). that face containing all PSD matrices (of any rank) with particular entries fixed − the smallest face containing all PSD matrices whose fixed entries correspond to some given embedded principal submatrix ΦAΦ .9.44 we move an extracted principal submatrix ˆ ˆ ΦTA Φ ∈ Srank Φ into leading position via permutation Ξ from (228): for + A ∈ SM + ˆ ˆ ΦTA Φ B ΞAΞT ∈ SM (231) + T B C By properties of partitioned PSD matrices in A.2. are admitted: Hint: (1623) (1950).2.2 Exercise.9. not only leading principal submatrices.2.43 0 in a smallest face F formula2. 2.2.2. Disparate elements. The smaller the intersection. POSITIVE SEMIDEFINITE (PSD) CONE 125 hyperplanes because equivalence (1623) says that the positive semidefinite cone effectively filters that subset of the hyperplane. to fix any principal submatrix.0.45 meaning.45 I because all PSD matrices. 2.44 . in other words.

1.126 Define a set of all PSD matrices S A = ΞT ˆ ˆ ΦTA Φ B Ξ T B C CHAPTER 2.1.1.7. CONVEX GEOMETRY 0 B ∈ Rrank Φ×M −rank Φ . because of the cone faces lemma ( 2. Symmetric dyads constitute the set of all extreme directions: For M > 1 {yy T ∈ SM | y ∈ RM } ⊂ ∂ SM + (235) this superset of extreme directions (infinite in number. Smallest face of the positive semidefinite cone.0.1) conv{yy ∈ S T M | y∈R } = M ∞ i=1 bi zi ziT | zi ∈ RM .1.8.9.2. id est. there is a one-to-one correspondence of one-dimensional faces with extreme directions in any dimension M .3.8.9.1.2. X = 0} + U ΥΥ† 0 ΥΥ† U T 0 Ψ Ξ Ψ ∈ SM + 0T I 0T I (234) ≃ SM −rank Φ+rank ΦAΦ + Ξ = I whenever ΦAΦ denotes a leading principal submatrix. the convex hull of extreme rays and origin is the positive semidefinite cone: ( 2.2. b 0 = SM + (236) .0. So T 0 0 ˆ ˆ = {X ∈ SM | ΦU (I − ΥΥ† )U T ΦT . 2. Υ 0). confer (187)) for the positive semidefinite cone is. is the entire cone (Exercise 2. C ∈ SM −rank Φ + (233) having fixed embedded principal submatrix ΦAΦ = ΞT F SM ⊇ S = X ∈ SM | N (X) ⊇ N (S) + + = ΞT ˆ ˆ ΦTA Φ 0 Ξ . generally.7 Extreme directions of positive semidefinite cone Because the positive semidefinite cone is pointed ( 2. By extremes theorem 2.1) and direct correspondence of exposed faces to faces of SM .2). it follows: there + is no one-dimensional face of the positive semidefinite cone that is not a ray emanating from the origin.8. a subset of the boundary.2).2. containing all matrices having the same full-rank principal submatrix (ΥΥ† = I .

7) {yy ∈ S | y = 0} = int S+ 127 (237) (238) Each and every extreme direction yy T makes the same angle with the identity matrix in isomorphic RM (M +1)/2 .46 .1 Example. POSITIVE SEMIDEFINITE (PSD) CONE For two-dimensional matrices (M = 2. I ) = arccos yy T . Positive semidefinite matrix from extreme directions.5) of symmetric matrices yields the following results: Any positive semidefinite matrix (1483) in SM can be written in the form M A = i=1 ˆˆ λ i zi ziT = AAT = i ai aT ˆ ˆi 0.2.2. I yy T F I 1 = arccos √ M ∀ y ∈ RM (239) F 2. M A= i=1 λ i zi ziT ∈ C . 2. The cone’s axis is −E = 11T − I (1127). Extreme directions of the EDM cone can be found in 6.2.7. in exception.2.4. λ 0 (240) a conic combination of linearly independent extreme directions (ˆi aT or zi ziT a ˆi where zi = 1).3. where λ is a vector of eigenvalues. λ 0 (241) Implications are: Analogy with respect to the EDM cone is considered in [186. If we limit consideration to all symmetric positive semidefinite matrices bounded via unity trace C {A 0 | tr A = 1} (91) then any matrix A from that set may be expressed as a convex combination of linearly independent extreme directions. (M = 1. 1T λ = 1 . Figure 43) {yy T ∈ S2 | y ∈ R2 } = ∂ S2 + while for one-dimensional matrices. dependent only on dimension.9. p.9. Diagonalizability ( A.162] where it is found: angle is not constant. 2.46 (yy T . videlicet.

1. set C is convex (an intersection of PSD cone with hyperplane);
2. because the set of eigenvalues corresponding to a given square matrix A is unique ( A.5.0.1), no single eigenvalue can exceed 1 ; I ≽ A ;
3. and the converse holds: set C is an instance of Fantope (91).

2.9.2.7.2 Exercise. Extreme directions of positive semidefinite cone.
Prove, directly from definition (186), that symmetric dyads (235) constitute the set of all extreme directions of the positive semidefinite cone.

2.9.2.8 Positive semidefinite cone is generally not circular
Extreme angle equation (239) suggests that the positive semidefinite cone might be invariant to rotation about its axis of revolution; id est, a circular cone. We investigate this now:

2.9.2.8.1 Definition. Circular cone:^2.47
a pointed closed convex cone having hyperspherical sections orthogonal to its axis of revolution about which the cone is invariant to rotation. △

A conic section is the intersection of a cone with any hyperplane. In three dimensions, an intersecting plane perpendicular to a circular cone's axis of revolution produces a section bounded by a circle. (Figure 46) A prominent example of a circular cone in convex analysis is Lorentz cone (178). We also find that the positive semidefinite cone and cone of Euclidean distance matrices are circular cones, but only in low dimension.

The positive semidefinite cone has axis of revolution that is the ray (base 0) through the identity matrix I. Consider a set of normalized extreme directions of the positive semidefinite cone: for some arbitrary positive constant a ∈ R_+

    { yyᵀ ∈ S^M | ‖y‖ = √a } ⊂ ∂S^M_+                               (242)

The distance from each extreme direction to the axis of revolution is radius

    R ≜ inf_c ‖yyᵀ − cI‖_F = √a √(1 − 1/M)                          (243)

2.47 A circular cone is assumed convex throughout, although not so by other authors. We also assume a right circular cone.

Figure 46: This solid circular cone in R^3 continues upward infinitely. Axis of revolution is illustrated as vertical line through origin. R represents radius: distance measured from an extreme direction to axis of revolution. Were this a Lorentz cone, any plane slice containing axis of revolution would make a right angle.

Figure 47: Illustrated is a section, perpendicular to axis of revolution, of circular cone from Figure 46. Radius R is distance from any extreme direction yyᵀ to axis at (a/M)I. Vector (a/M)11ᵀ is an arbitrary reference by which to measure angle θ.

From Example 2.9.2.7.1, the convex hull (excluding vertex at the origin) of the normalized extreme directions is a conic section

    C ≜ conv{ yyᵀ | y ∈ R^M , yᵀy = a } = S^M_+ ∩ { A ∈ S^M | ⟨I, A⟩ = a }   (244)

orthogonal to identity matrix I ;

    ⟨ C − (a/M)I , I ⟩ = tr( C − (a/M)I ) = 0                        (245)

Proof. Because distance R (in a particular dimension) from the axis of revolution to each and every normalized extreme direction is identical, the extreme directions lie on the boundary of a hypersphere in isometrically isomorphic R^{M(M+1)/2}; id est, the length of vector yyᵀ − (a/M)I, which is the distance from yyᵀ to (a/M)I, is constant. Although the positive semidefinite cone possesses some characteristics of a circular cone, we can show it is not by demonstrating shortage of extreme directions; id est, some extreme directions corresponding to each and every angle of rotation about the axis of revolution are nonexistent:

Referring to Figure 47, [203, 1-7]

    cos θ = ⟨ (a/M)11ᵀ − (a/M)I , yyᵀ − (a/M)I ⟩ / ( a²(1 − 1/M) )  (246)

Solving for vector y we get

    a(1 + (M − 1) cos θ) = (1ᵀy)²                                   (247)

which does not have real solution ∀ 0 ≤ θ ≤ 2π in every matrix dimension M. Because of a shortage of extreme directions, conic section (244) cannot be hyperspherical by the extremes theorem ( 2.8.1.1.1). ∎

From the foregoing proof we can conclude that the positive semidefinite cone might be circular but only in matrix dimensions 1 and 2.^2.48

2.9.2.8.2 Exercise. Circular semidefinite cone.
Prove the PSD cone to be circular in matrix dimensions 1 and 2 while it is a rotation of Lorentz cone (178) in matrix dimension 2 ;

    [ α     β/√2 ]
    [ β/√2  γ    ]  ,   √(α² + β²) ≤ γ

Hint: Given that cone, show

    (1/√2) [ γ+α  β   ]
           [ β    γ−α ]

is a vector rotation that is positive semidefinite under the same inequality.

2.9.2.8.3 Example. PSD cone inscription in three dimensions.

Theorem. Geršgorin discs. [203, 6.1] [367] [256, p.140] For p ∈ R^m_+ given A = [A_ij] ∈ S^m, then all eigenvalues of A belong to the union of m closed intervals on the real line:

    λ(A) ∈ ⋃_{i=1}^m { ξ ∈ R | |ξ − A_ii| ≤ ϱ_i ≜ (1/p_i) Σ_{j=1, j≠i}^m p_j |A_ij| }
         = ⋃_{i=1}^m [ A_ii − ϱ_i , A_ii + ϱ_i ]                    (248)

Furthermore, if a union of k of these m [intervals] forms a connected region that is disjoint from all the remaining n − k [intervals], then there are precisely k eigenvalues of A in this region. ⋄

2.48 Figure 42 illustrates the circular section in matrix dimension 2.
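The weighted interval bound (248) is simple to check numerically. Below is a minimal sketch in Python with NumPy, assuming a small random symmetric test matrix and unweighted discs (p = 1); these choices are illustrative only.

    import numpy as np

    def gershgorin_intervals(A, p):
        # weighted Gershgorin intervals [A_ii - r_i, A_ii + r_i] from (248)
        A = np.asarray(A, dtype=float)
        r = np.array([(p * np.abs(A[i])).sum() - p[i]*abs(A[i, i])
                      for i in range(len(A))]) / p
        return np.column_stack((np.diag(A) - r, np.diag(A) + r))

    rng = np.random.default_rng(1)
    S = rng.standard_normal((4, 4)); S = (S + S.T)/2
    iv = gershgorin_intervals(S, np.ones(4))
    for lam in np.linalg.eigvalsh(S):        # every eigenvalue lies in some interval
        assert any(lo <= lam <= hi for lo, hi in iv)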

Figure 48: Proper polyhedral cone K, created by intersection of halfspaces, inscribes PSD cone in isometrically isomorphic R^3 as predicted by Geršgorin discs theorem for A = [A_ij] ∈ S^2. Hyperplanes supporting K intersect along boundary of PSD cone. Four extreme directions of K coincide with extreme directions of PSD cone. (Axes: A_11 , A_22 , √2 A_12 ; drawn boundary: svec ∂S^2_+ ; panels correspond to Geršgorin parameter values p = [1 1]ᵀ, p = [2 1]ᵀ, p = [1/2 1]ᵀ.)

    p₁A₁₁ ≥ ±p₂A₁₂
    p₂A₂₂ ≥ ±p₁A₁₂

To apply the theorem to determine positive semidefiniteness of symmetric matrix A, we observe that for each i we must have

    A_ii ≥ ϱ_i                                                       (249)

Suppose

    m = 2                                                            (250)

so A ∈ S^2. Vectorizing A as in (56), svec A belongs to isometrically isomorphic R^3. Then we have m2^{m−1} = 4 inequalities, in the matrix entries A_ij with Geršgorin parameters p = [p_i] ∈ R^2_+ :

    p₁A₁₁ ≥ ±p₂A₁₂
    p₂A₂₂ ≥ ±p₁A₁₂                                                   (251)

which describe an intersection of four halfspaces in R^{m(m+1)/2}. That intersection creates the proper polyhedral cone K ( 2.12.1) whose construction is illustrated in Figure 48. Drawn truncated is the boundary of the positive semidefinite cone svec S^2_+ and the bounding hyperplanes supporting K.

Created by means of Geršgorin discs, K always belongs to the positive semidefinite cone for any nonnegative value of p ∈ R^m_+. Hence any point in K corresponds to some positive semidefinite matrix A. Only the extreme directions of K intersect the positive semidefinite cone boundary in this dimension; the four extreme directions of K are extreme directions of the positive semidefinite cone. As p₁/p₂ increases in value from 0, two extreme directions of K sweep the entire boundary of this positive semidefinite cone. Because the entire positive semidefinite cone can be swept by K, the system of linear inequalities

    Yᵀ svec A ≜ [ p₁   ±p₂/√2   0  ] svec A ≽ 0
                [ 0    ±p₁/√2   p₂ ]                                 (252)

when made dynamic can replace a semidefinite constraint A ≽ 0 ; id est, for

    K = { z | Yᵀz ≽ 0 } ⊂ svec S^m_+                                 (253)

given p where Y ∈ R^{m(m+1)/2 × m2^{m−1}}

    svec A ∈ K ⇒ A ∈ S^m_+                                           (254)
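Implication (254) is easy to exercise numerically. Here is a minimal sketch in Python with NumPy, assuming fixed parameters p₁ = p₂ = 1 and random 2×2 symmetric matrices; note that for fixed p the test is only sufficient, consistent with (255) below, which is why no assertion is made for matrices failing the halfspace test.

    import numpy as np

    def in_K(A, p1, p2):
        # the four halfspace inequalities (251), assuming p1, p2 > 0
        return p1*A[0, 0] >= abs(p2*A[0, 1]) and p2*A[1, 1] >= abs(p1*A[0, 1])

    rng = np.random.default_rng(2)
    for _ in range(1000):
        a11, a12, a22 = rng.uniform(-1, 1, 3)
        A = np.array([[a11, a12], [a12, a22]])
        if in_K(A, 1.0, 1.0):                 # svec A in K  =>  A is PSD (254)
            assert np.linalg.eigvalsh(A).min() >= -1e-12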

but

    ∃ p   such that   Yᵀ svec A ≽ 0 ⇔ A ≽ 0                          (255)

In other words, diagonal dominance [203, p.349, 7.2.3]

    A_ii ≥ Σ_{j=1, j≠i}^m |A_ij| ,   ∀ i = 1 ... m                   (256)

is only a sufficient condition for membership to the PSD cone; but by dynamic weighting p in this dimension, it was made necessary and sufficient.

In higher dimension (m > 2), boundary of the positive semidefinite cone is no longer constituted completely by its extreme directions (symmetric rank-one matrices); its geometry becomes intricate. How all the extreme directions can be swept by an inscribed polyhedral cone,^2.49 similarly to the foregoing example, remains an open question.

2.9.2.8.4 Exercise. Dual inscription.
Find dual proper polyhedral cone K* from Figure 48.

2.9.2.9 Boundary constituents of the positive semidefinite cone

2.9.2.9.1 Lemma. Sum of positive semidefinite matrices.
For A, B ∈ S^M_+

    rank(A + B) = rank(µA + (1 − µ)B)                                (257)

over open interval (0, 1) of µ. ⋄

Proof. Any positive semidefinite matrix belonging to the PSD cone has an eigenvalue decomposition that is a positively scaled sum of linearly independent symmetric dyads. By the linearly independent dyads definition in B.1.1, rank of the sum A + B is equivalent to the number of linearly independent dyads constituting it. Linear independence is insensitive to further positive scaling by µ. The assumption of positive semidefiniteness prevents annihilation of any dyad from the sum A + B. ∎

2.49 It is not necessary to sweep the entire boundary in higher dimension.

2.9.2.9.2 Example. Rank function quasiconcavity. (confer 3.8)
For A, B ∈ R^{m×n} [203, 0.4]

    rank A + rank B ≥ rank(A + B)                                    (258)

that follows from the fact [333, 3.6]

    dim R(A) + dim R(B) = dim R(A + B) + dim(R(A) ∩ R(B))            (259)

For A, B ∈ S^M_+ [61, 3.4.2]

    rank A + rank B ≥ rank(A + B) ≥ min{rank A , rank B}             (1499)

that follows from the fact

    N(A + B) = N(A) ∩ N(B) ,   A, B ∈ S^M_+                          (160)

Rank is a quasiconcave function on S^M_+ because the right-hand inequality in (1499) has the concave form (642); videlicet, Lemma 2.9.2.9.1. From this example we see, unlike convex functions, quasiconvex functions are not necessarily continuous. ( 3.8) We also glean:

2.9.2.9.3 Theorem. Convex subsets of positive semidefinite cone.
Subsets of the positive semidefinite cone S^M_+, for 0 ≤ ρ ≤ M

    S^M_+(ρ) ≜ { X ∈ S^M_+ | rank X ≥ ρ }                            (260)

are pointed convex cones, but not closed unless ρ = 0 ; id est, S^M_+(0) = S^M_+. ⋄

Proof. Given ρ, a subset S^M_+(ρ) is convex if and only if convex combination of any two members has rank at least ρ. That is confirmed by applying identity (257) from Lemma 2.9.2.9.1 to (1499); id est, for A, B ∈ S^M_+(ρ) on closed interval [0, 1] of µ

    rank(µA + (1 − µ)B) ≥ min{rank A , rank B}                       (261)

It can similarly be shown, almost identically to proof of the lemma, any conic combination of A, B in subset S^M_+(ρ) remains a member; id est, ∀ ζ, ξ ≥ 0

    rank(ζA + ξB) ≥ min{rank(ζA) , rank(ξB)}                         (262)

Therefore, S^M_+(ρ) is a convex cone. ∎

Another proof of convexity can be made by projection arguments:
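The rank inequalities above are easily probed numerically. A minimal sketch in Python with NumPy follows; the dimensions and target ranks are arbitrary illustrative choices, and random factors give the stated ranks almost surely.

    import numpy as np

    def psd_of_rank(M, r, rng):
        G = rng.standard_normal((M, r))
        return G @ G.T                      # PSD with rank r (almost surely)

    rng = np.random.default_rng(3)
    A, B = psd_of_rank(6, 2, rng), psd_of_rank(6, 4, rng)
    rAB = np.linalg.matrix_rank(A + B)
    for mu in np.linspace(0.01, 0.99, 9):   # open interval (0,1)
        r = np.linalg.matrix_rank(mu*A + (1-mu)*B)
        assert r == rAB                     # constant rank, Lemma (257)
        assert r >= min(2, 4)               # right-hand inequality in (1499)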

2.9.2.10 Projection on S^M_+(ρ)
Because these cones S^M_+(ρ) indexed by ρ (260) are convex, projection on them is straightforward. Given a symmetric matrix H having diagonalization H ≜ QΛQᵀ ∈ S^M ( A.5.1) with eigenvalues Λ arranged in nonincreasing order, then its Euclidean projection (minimum-distance projection) on S^M_+(ρ)

    P_{S^M_+(ρ)} H = QΥ*Qᵀ                                           (263)

corresponds to a map of its eigenvalues:

    Υ*_ii = max{ǫ , Λ_ii} ,   i = 1 ... ρ
    Υ*_ii = max{0 , Λ_ii} ,   i = ρ+1 ... M                          (264)

where ǫ is positive but arbitrarily close to 0. Because each H ∈ S^M has unique projection on S^M_+(ρ) (despite possibility of repeated eigenvalues in Λ), we may conclude it is a convex set by the Bunt-Motzkin theorem ( E.9.0.0.1).

Compare (264) to the well-known result regarding Euclidean projection on a rank ρ subset of the positive semidefinite cone ( 2.9.2.1)

    S^M_+ \ S^M_+(ρ + 1) = { X ∈ S^M_+ | rank X ≤ ρ }                (216)

    P_{S^M_+\S^M_+(ρ+1)} H = QΥ*Qᵀ                                   (265)

As proved in 7.1.4, this projection of H corresponds to the eigenvalue map

    Υ*_ii = max{0 , Λ_ii} ,   i = 1 ... ρ
    Υ*_ii = 0 ,              i = ρ+1 ... M                           (1374)

Together these two results (264) and (1374) mean: A higher-rank solution to projection on the positive semidefinite cone lies arbitrarily close to any given lower-rank projection, but not vice versa. Were the number of nonnegative eigenvalues in Λ known a priori not to exceed ρ, then these two different projections would produce identical results in the limit ǫ → 0.

2.9.2.10.1 Exercise. Projection on open convex cones.
Prove (264) using Theorem E.9.2.0.1.
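Both eigenvalue maps are direct to implement. A minimal sketch in Python with NumPy, assuming a symmetric input and a small ǫ chosen arbitrarily:

    import numpy as np

    def proj_rank_geq(H, rho, eps=1e-9):
        # projection on S^M_+(rho) = {X >= 0 : rank X >= rho}, eigenvalue map (264)
        lam, Q = np.linalg.eigh(H)
        lam, Q = lam[::-1], Q[:, ::-1]            # nonincreasing order
        ups = np.maximum(lam, 0.0)
        ups[:rho] = np.maximum(lam[:rho], eps)    # leading rho eigenvalues kept positive
        return Q @ np.diag(ups) @ Q.T

    def proj_rank_leq(H, rho):
        # projection on {X >= 0 : rank X <= rho}, eigenvalue map (1374)
        lam, Q = np.linalg.eigh(H)
        lam, Q = lam[::-1], Q[:, ::-1]
        ups = np.maximum(lam, 0.0)
        ups[rho:] = 0.0                           # trailing eigenvalues truncated
        return Q @ np.diag(ups) @ Q.T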

2.9.2.11 Uniting constituents
Interior of the PSD cone int S^M_+ is convex by Theorem 2.9.2.9.3, because all positive semidefinite matrices having rank M constitute the cone interior. All positive semidefinite matrices of rank less than M constitute the cone boundary; an amalgam of positive semidefinite matrices of different rank. Thus each nonconvex subset of positive semidefinite matrices, for 0 < ρ < M

    { Y ∈ S^M_+ | rank Y = ρ }                                       (266)

having rank ρ successively 1 lower than M, appends a nonconvex constituent to the cone boundary; but only in their union is the boundary complete: (confer 2.9.2)

    ∂S^M_+ = ⋃_{ρ=0}^{M−1} { Y ∈ S^M_+ | rank Y = ρ }                (267)

The composite sequence, the cone interior in union with each successive constituent, remains convex at each step; id est, for 0 ≤ k ≤ M

    ⋃_{ρ=k}^{M} { Y ∈ S^M_+ | rank Y = ρ }                           (268)

is convex for each k by Theorem 2.9.2.9.3.

2.9.2.12 Peeling constituents
Proceeding the other way: To peel constituents off the complete positive semidefinite cone boundary, one starts by removing the origin; the only rank-0 positive semidefinite matrix. What remains is convex. Next, the extreme directions are removed because they constitute all the rank-1 positive semidefinite matrices. What remains is again convex, and so on. Proceeding in this manner eventually removes the entire boundary leaving, at last, the convex interior of the PSD cone; all the positive definite matrices.

2.9.2.12.1 Exercise. Difference A − B.
What about a difference of matrices A, B belonging to the PSD cone? Show:

    Difference of any two points on the boundary belongs to the boundary or exterior.
    Difference A − B, where A belongs to the boundary while B is interior, belongs to the exterior.

2.9.3 Barvinok's proposition
Barvinok posits existence and quantifies an upper bound on rank of a positive semidefinite matrix belonging to the intersection of the PSD cone with an affine subset:

2.9.3.0.1 Proposition. Affine intersection with PSD cone. [26, II.13] [24, 2.2]
Consider finding a matrix X ∈ S^N satisfying

    X ≽ 0 ,   ⟨A_j , X⟩ = b_j ,   j = 1 ... m                        (269)

given nonzero linearly independent (vectorized) A_j ∈ S^N and real b_j. Define the affine subset

    A ≜ { X | ⟨A_j , X⟩ = b_j , j = 1 ... m } ⊆ S^N                  (270)

If the intersection A ∩ S^N_+ is nonempty given a number m of equalities, then there exists a matrix X ∈ A ∩ S^N_+ such that

    rank X (rank X + 1)/2 ≤ m                                        (271)

whence the upper bound^2.50

    rank X ≤ ⌊(√(8m + 1) − 1)/2⌋                                     (272)

Given desired rank instead, equivalently,

    m < (rank X + 1)(rank X + 2)/2                                   (273)

2.50 4.1.2.2 contains an intuitive explanation. This bound is itself limited above, of course, by N ; a tight limit corresponding to an interior point of S^N_+.

An extreme point of A ∩ S^N_+ satisfies (272) and (273). (confer 4.1.1.3) A matrix X ≜ RᵀR is an extreme point if and only if the smallest face, that contains X, of A ∩ S^N_+ has dimension 0 ; [240, 2.4] [241] id est, iff (171)

    dim F( (A ∩ S^N_+) ∋ X )
    = rank(X)(rank(X) + 1)/2 − rank[ svec RA₁Rᵀ  svec RA₂Rᵀ  ···  svec RA_mRᵀ ]   (274)

equals 0 in isomorphic R^{N(N+1)/2}.

Now the intersection A ∩ S^N_+ is assumed bounded: Assume a given nonzero upper bound ρ on rank, a number of equalities

    m = (ρ + 1)(ρ + 2)/2                                             (275)

and matrix dimension N ≥ ρ + 2 ≥ 3. If the intersection is nonempty and bounded, then there exists a matrix X ∈ A ∩ S^N_+ such that

    rank X ≤ ρ                                                       (276)

This represents a tightening of the upper bound; a reduction by exactly 1 of the bound provided by (272) given the same specified number m (275) of equalities; id est,

    rank X ≤ (√(8m + 1) − 1)/2 − 1                                   (277)
⋄

2.10 Conic independence (c.i.)
In contrast to extreme direction, the property conically independent direction is more generally applicable; inclusive of all closed convex cones (not only pointed closed convex cones). Arbitrary given directions {Γ_i ∈ R^n , i = 1 ... N} comprise a conically independent set if and only if (confer 2.1.2, 2.4.2.3)

    Γ_i ζ_i + ··· + Γ_j ζ_j − Γ_ℓ = 0 ,   i ≠ ··· ≠ j ≠ ℓ = 1 ... N   (278)

has no solution ζ ∈ R^N (ζ_i ∈ R_+); id est, iff no direction from the given set can be expressed as a conic combination of those remaining; e.g., Figure 49 [test-(278) Matlab implementation, Wıκımization]. Arranging any set of generators for a particular closed convex cone in a matrix columnar,

    X ≜ [ Γ₁ Γ₂ ··· Γ_N ] ∈ R^{n×N}                                  (279)

then this test of conic independence (278) may be expressed as a set of linear feasibility problems: for ℓ = 1 ... N

    find  ζ ∈ R^N
    subject to  Xζ = Γ_ℓ
                ζ ≽ 0
                ζ_ℓ = 0                                              (280)

If feasible for any particular ℓ, then the set is not conically independent. To find all conically independent directions from a set via (280), generator Γ_ℓ must be removed from the set once it is found (feasible) conically dependent on remaining generators in X ; id est, Γ_ℓ must be discarded from matrix X before proceeding to test remaining generators. A c.i. set thus found is not necessarily unique. A generator Γ_ℓ that is instead found conically independent of remaining generators in X, on the other hand, is conically independent of any subset of remaining generators.

It is evident that linear independence (l.i.) of N directions implies their conic independence;

    l.i. ⇒ c.i.

which suggests, number of l.i. generators in the columns of X cannot exceed number of c.i. generators. Denoting by k the number of conically independent generators contained in X, we have the most fundamental rank inequality for convex cones

    dim aff K = dim aff[ 0 X ] = rank X ≤ k ≤ N                      (281)

Whereas N directions in n dimensions can no longer be linearly independent once N exceeds n, conic independence remains possible:

2.10.0.0.1 Table: Maximum number of c.i. directions

    dimension n    sup k (pointed)    sup k (not pointed)
    0              0                  0
    1              1                  2
    2              2                  4
    3              ∞                  ∞
    ⋮              ⋮                  ⋮
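The feasibility test (280) is directly expressible as a linear program. The book's own test-(278) implementation is in Matlab on Wıκımization; the following is an independent minimal sketch in Python with SciPy, simplified in that it tests each generator against all the others rather than discarding dependent generators as it proceeds.

    import numpy as np
    from scipy.optimize import linprog

    def conically_dependent_indices(X):
        # feasibility test (280) for each generator (column of X)
        n, N = X.shape
        dependent = []
        for l in range(N):
            keep = [j for j in range(N) if j != l]   # zeta_l = 0 enforced by deletion
            res = linprog(c=np.zeros(len(keep)),     # pure feasibility problem
                          A_eq=X[:, keep], b_eq=X[:, l],
                          bounds=[(0, None)]*len(keep))
            if res.status == 0:                      # X zeta = Gamma_l , zeta >= 0 feasible
                dependent.append(l)
        return dependent

    # three generators in R^2 ; the middle one is a conic combination of the others
    X = np.array([[1.0, 1.0, 0.0],
                  [0.0, 1.0, 1.0]])
    print(conically_dependent_indices(X))            # -> [1]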

Figure 49: Vectors in R^2 : (a) affinely and conically independent, (b) affinely independent but not conically independent, (c) conically independent but not affinely independent. None of the examples exhibits linear independence. (In general, a.i. does not imply c.i.)

Assuming veracity of this table, there is an apparent vastness between two and three dimensions. The finite numbers of conically independent directions indicate: Convex cones in dimensions 0, 1, and 2 must be polyhedral. ( 2.12.1) Conic independence is certainly one convex idea that cannot be completely explained by a two-dimensional picture as Barvinok suggests [26, p.vii]. From this table it is also evident that dimension of Euclidean space cannot exceed the number of conically independent directions possible;

    n ≤ sup k

2.10.1 Preservation of conic independence
Independence in the linear ( 2.1.2.1), affine ( 2.4.2.4), and conic senses can be preserved under linear transformation. Suppose a matrix X ∈ R^{n×N} (279) holds a conically independent set columnar. Consider a transformation on the domain of such matrices

    T(X) : R^{n×N} → R^{n×N} ≜ XY                                    (282)

where fixed matrix Y ≜ [ y₁ y₂ ··· y_N ] ∈ R^{N×N} represents linear operator T. Conic independence of {Xy_i ∈ R^n , i = 1 ... N} demands, by definition (278),

    Xy_i ζ_i + ··· + Xy_j ζ_j − Xy_ℓ = 0 ,   i ≠ ··· ≠ j ≠ ℓ = 1 ... N   (283)

Figure 50: (a) A pointed polyhedral cone (drawn truncated) in R^3 having six facets. The extreme directions, corresponding to six edges emanating from the origin, are generators for this cone; not linearly independent but they must be conically independent. (b) The boundary of dual cone K* (drawn truncated) is now added to the drawing of same K. K* is polyhedral, proper, and has the same number of extreme directions as K has facets.

have no solution ζ ∈ R^N_+ ; e.g., seen by factoring out X. That is ensured by conic independence of {y_i ∈ R^N} and by R(Y) ∩ N(X) = 0.

2.10.1.1 linear maps of cones
[21, 7] If K is a convex cone in Euclidean space R and T is any linear mapping from R to Euclidean space M, then T(K) is a convex cone in M and x ⪯ y with respect to K implies T(x) ⪯ T(y) with respect to T(K). If K is full-dimensional in R, then so is T(K) in M. If T is a linear bijection, then x ⪯ y ⇔ T(x) ⪯ T(y). If F is a face of K, then T(F) is a face of T(K). If K is pointed, then so is T(K).^2.51 And if K is closed, so is T(K).

2.10.2 Pointed closed convex K & conic independence
The following bullets can be derived from definitions (186) and (278) in conjunction with the extremes theorem ( 2.8.1.1.1): The set of all extreme directions from a pointed closed convex cone K ⊂ R^n is not necessarily a linearly independent set, yet it must be a conically independent set; (compare Figure 24 on page 71 with Figure 50a)

    {extreme directions} ⇒ {c.i.}

When a conically independent set of directions from pointed closed convex cone K is known to comprise generators, conversely, then all directions from that set must be extreme directions of the cone; id est,

    {extreme directions} ⇔ {c.i. generators of pointed closed convex K}

Barker & Carlson [21, 1] call the extreme directions a minimal generating set for a pointed closed convex cone. A minimal set of generators is therefore a conically independent set of generators, and vice versa, for a pointed closed convex cone. This converse does not hold for nonpointed closed convex cones as Table 2.10.0.0.1 implies; e.g., ponder four conically independent generators for a plane (n = 2 , Figure 49).

2.51 Linear bijection is only a sufficient condition for pointedness and closedness. Convex polyhedra ( 2.12) are invariant to any linear or inverse linear transformation [26, I.9] [309, p.44, thm.19.3].

i. e. generators of closed convex K} 2. Vertex-description of halfspace H about origin.9] {extreme directions} ⇐ {l.i.3. By demanding the augmented set .10. p..10. CONVEX GEOMETRY x2 H x3 0 ∂H x1 Figure 51: Minimal set of generators X = [ x1 x2 x3 ] ∈ R2×3 (not extreme directions) for halfspace about origin. it only first comes to light in four dimensions! But there is a converse: [332.0. {≤ n extreme directions in Rn } {l. dual extreme directions (482) from Example 2. n} constitutes a minimal set of generators for a hyperplane ∂H through the origin. From n + 1 points in Rn we can make a vertex-description of a convex cone that is a halfspace H . An example is illustrated in Figure 51.} Linear dependence of few extreme directions is another convex idea that cannot be explained by a two-dimensional picture as Barvinok suggests [26.vii].1 Example. An arbitrary collection of n or fewer distinct extreme directions from pointed closed convex cone K ⊂ Rn is not necessarily a linearly independent set. . indeed. . 2.g.2.144 CHAPTER 2.13.0.11. affinely and conically independent. where {xℓ ∈ Rn . ℓ = 1 .

.10.2] for faces of . distinction between the terms extreme and exposed vanishes [332. .3 Utility of conic independence Perhaps the most useful application of conic independence is determination of the intersection of closed convex cones from their halfspace-descriptions. .10.2 Exercise.10. Generators for the sum of any number of closed convex cones Ki can be determined by retaining only the conically independent generators from the aggregate of all the vertex-descriptions.2. WHEN EXTREME MEANS EXPOSED 145 {xℓ ∈ Rn .2. (confer Table 2.10. 2. only the conically independent normals from the aggregate of all the halfspace-descriptions need be retained.11 When extreme means exposed For any convex full-dimensional polyhedral set in Rn . but {xℓ ∀ ℓ} is a minimal set of generators for this convex cone H which has no extreme elements. then H = = (ζ xn+1 + ∂H) ζ ≥0 {ζ xn+1 + cone{xℓ ∈ Rn . n} | ζ ≥ 0} (284) a union of parallel hyperplanes. Enumerating conically independent directions. .1 in R and R3 by drawing two figures corresponding to Figure 51 and enumerating n + 1 conically independent generators for each.2. ℓ = 1 . .0. n + 1} 2. 2. n+1} be affinely independent (we want vector xn+1 not parallel to ∂H ). or representation of the sum of closed convex cones from their vertex-descriptions. ℓ = 1 . Do Example 2. Ki Such conically independent sets are not necessarily unique or minimal. 2. .0. 2.0. ℓ = 1 . Ki A halfspace-description for the intersection of any number of closed convex cones Ki can be acquired by pruning normals.11. Describe a nonpointed polyhedral cone in three dimensions having more than 8 conically independent generators. specifically.4] [114.1) = cone{xℓ ∈ Rn .0. Cardinality is one step beyond dimension of the ambient space.

all dimensions except n ; their meanings become equivalent as we saw in Figure 20 (discussed in 2.6.1.2). In other words, each and every face of any polyhedral set (except the set itself) can be exposed by a hyperplane, and vice versa; e.g., Figure 24. Lewis [247, 6] [216, 2.3.4] claims nonempty extreme proper subsets and the exposed subsets coincide for S^n_+ ; id est, each and every face of the positive semidefinite cone, whose dimension is less than dimension of the cone, is exposed. A more general discussion of cones having this property can be found in [343]; e.g., Lorentz cone (178) [20, II.A].

2.12 Convex polyhedra
Every polyhedron, such as the convex hull (86) of a bounded list X, can be expressed as the solution set of a finite system of linear equalities and inequalities, and vice versa. [114, 2.2]

2.12.0.0.1 Definition. Convex polyhedra, halfspace-description.
A convex polyhedron is the intersection of a finite number of halfspaces and hyperplanes;

    P = { y | Ay ≽ b , Cy = d } ⊆ R^n                                (285)

where coefficients A and C generally denote matrices. Each row of C is a vector normal to a hyperplane, while each row of A is a vector inward-normal to a hyperplane partially bounding a halfspace. △

By the halfspaces theorem in 2.4.1.1.1, a polyhedron thus described is a closed convex set possibly not full-dimensional; e.g., Figure 20. Convex polyhedra^2.52 are finite-dimensional comprising all affine sets ( 2.3.1), polyhedral cones, line segments, rays, halfspaces, convex polygons, solids [224, def.104/6 p.343], polychora, polytopes,^2.53 etcetera. It follows from definition (285) by exposure that each face of a convex polyhedron is a convex polyhedron. Projection of any polyhedron on a subspace remains a polyhedron. More generally, image and inverse image of a convex polyhedron under any linear transformation remains a convex polyhedron; [26, I.9] [309, 19.3] the foremost consequence being, invariance of polyhedral set closedness.

2.52 We consider only convex polyhedra throughout, but acknowledge the existence of concave polyhedra. [376, Kepler-Poinsot Solid]
2.53 Some authors distinguish bounded polyhedra via the designation polytope. [114, 0.1]

2.12.1 Polyhedral cone
From our study of cones, we see: the number of intersecting hyperplanes and halfspaces constituting a convex cone is possibly but not necessarily infinite. When the number is finite, the convex cone is termed polyhedral. That is the primary distinguishing feature between the set of all convex cones and polyhedra; all polyhedra, including polyhedral cones, are finitely generated [309, 19]. (Figure 52) We distinguish polyhedral cones in the set of all convex cones for this reason, although all convex cones of dimension 2 or less are polyhedral.

2.12.1.0.1 Definition. Polyhedral cone, halfspace-description.^2.54 (confer (103))
A polyhedral cone is the intersection of a finite number of halfspaces and hyperplanes about the origin;

    K = { y | Ay ≽ 0 , Cy = 0 } ⊆ R^n                        (a)
      = { y | Ay ≽ 0 , Cy ≽ 0 , −Cy ≽ 0 }                    (b)
      = { y | [ A ; C ; −C ] y ≽ 0 }                         (c)      (286)

where coefficients A and C generally denote matrices of finite dimension. Each row of C is a vector normal to a hyperplane containing the origin, while each row of A is a vector inward-normal to a hyperplane containing the origin and partially bounding a halfspace. △

A polyhedral cone thus defined is closed, convex ( 2.4.1.1), has only a finite number of generators ( 2.8.1.2), and can be not full-dimensional. (Minkowski) Conversely, any finitely generated convex cone is polyhedral; (Weyl) [332, 2.8] [309, thm.19.1] any subspace, evidently, included.

2.12.1.0.2 Exercise. Unbounded convex polyhedra.
Illustrate an unbounded polyhedron that is not a cone or its translation.

2.54 Rockafellar [309, 19] proposes affine sets be handled via complementary pairs of affine inequalities; e.g., Cy ≽ d and Cy ≼ d which, when taken together, can present severe difficulties to some interior-point methods of numerical solution.

Figure 52: Polyhedral cones are finitely generated, unbounded, and convex.

From the definition it follows that any single hyperplane through the origin, or any halfspace partially bounded by a hyperplane through the origin is a polyhedral cone. The most familiar example of polyhedral cone is any quadrant (or orthant, 2.1.3) generated by Cartesian half-axes. Esoteric examples of polyhedral cone include the point at the origin, any line through the origin, any ray having the origin as base such as the nonnegative real line R_+ in subspace R, any subspace, and R^n. More polyhedral cones are illustrated in Figure 50 and Figure 24. The set of all polyhedral cones is clearly a subset of convex polyhedra and a subset of convex cones (Figure 52).

2.12.2 Vertices of convex polyhedra
By definition, a vertex ( 2.6.1.0.1) always lies on the relative boundary of a convex polyhedron. [224, def.115/6 p.358] In Figure 20, each vertex of the polyhedron is located at an intersection of three or more facets, and every edge belongs to precisely two facets [26, VI.1 p.252]. In Figure 24, the only vertex of that polyhedral cone lies at the origin.

Not all convex polyhedra are bounded; evidently, neither can they all be described by the convex hull of a bounded set of points as defined in (86). Hence a universal vertex-description of polyhedra in terms of that same finite-length list X (76):

2.12.2.0.1 Definition. Convex polyhedra, vertex-description. (confer 2.8.1.1.1)
Denote the truncated a-vector

    a_{i:ℓ} = [ a_i ; ... ; a_ℓ ]                                    (287)

By discriminating a suitable finite-length generating list (or set) arranged columnar in X ∈ R^{n×N}, then any particular polyhedron may be described

    P = { Xa | a_{1:k}ᵀ1 = 1 , a_{m:N} ≽ 0 , {1 ... N} ⊇ {1 ... k} ∪ {m ... N} }   (288)

where 0 ≤ k ≤ N and 1 ≤ m ≤ N + 1. Setting k = 0 removes the affine equality condition. Setting m = N + 1 removes the inequality. △

Coefficient indices in (288) may or may not be overlapping, but all the coefficients are assumed constrained. From (78), (86), and (103), we summarize how the coefficient conditions may be applied;

    affine sets      →  a_{1:k}ᵀ1 = 1
    polyhedral cones →  a_{m:N} ≽ 0      ←  convex hull (m ≤ k)       (289)

It is always possible to describe a convex hull in the region of overlapping indices because, for 1 ≤ m ≤ k ≤ N

    { a_{m:k} | a_{m:k}ᵀ1 = 1 , a_{m:k} ≽ 0 } ⊆ { a_{m:k} | a_{1:k}ᵀ1 = 1 , a_{m:N} ≽ 0 }   (290)

Members of a generating list are not necessarily vertices of the corresponding polyhedron; certainly true for (86) and (288), some subset of list members reside in the polyhedron's relative interior. Conversely, when boundedness (86) applies, convex hull of the vertices is a polyhedron identical to convex hull of the generating list.

2.12.2.1 Vertex-description of polyhedral cone
Given closed convex cone K in a subspace of R^n having any set of generators for it arranged in a matrix X ∈ R^{n×N} as in (279), then that cone is described setting m = 1 and k = 0 in vertex-description (288):

    K = cone X = { Xa | a ≽ 0 } ⊆ R^n                                (103)

a conic hull of N generators.

Figure 53: Unit simplex S = { s | s ≽ 0 , 1ᵀs ≤ 1 } in R^3 is a unique solid tetrahedron but not regular.

2.12.2.2 Pointedness
( 2.7.2.1.2) [332, 2.10] Assuming all generators constituting the columns of X ∈ R^{n×N} are nonzero, polyhedral cone K is pointed if and only if there is no nonzero a ≽ 0 that solves Xa = 0 ; id est, iff

    find  a
    subject to  Xa = 0
                1ᵀa = 1
                a ≽ 0                                                (291)

is infeasible or iff N(X) ∩ R^N_+ = 0.^2.55 Otherwise, the cone will contain at least one line and there can be no vertex; id est, the cone cannot otherwise be pointed. Any subspace, Euclidean vector space R^n, or any halfspace are examples of nonpointed polyhedral cone; hence, no vertex. This null-pointedness criterion Xa = 0 means that a pointed polyhedral cone is invariant to linear injective transformation.

Examples of pointed polyhedral cone K include: the origin, any 0-based ray in a subspace, any two-dimensional V-shaped cone in a subspace, any orthant in R^n or R^{m×n}; e.g., nonnegative real line R_+ in vector space R.

2.55 If rank X = n, then the dual cone K* ( 2.13.1) is pointed. (309)
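Feasibility problem (291) is itself a linear program, so pointedness of a polyhedral cone given by generators is mechanically testable. A minimal sketch in Python with SciPy, under the assumption that all columns of X are nonzero as the criterion requires:

    import numpy as np
    from scipy.optimize import linprog

    def is_pointed(X):
        # cone {Xa | a >= 0} is pointed iff feasibility problem (291) is infeasible
        n, N = X.shape
        res = linprog(c=np.zeros(N),
                      A_eq=np.vstack([X, np.ones((1, N))]),
                      b_eq=np.concatenate([np.zeros(n), [1.0]]),
                      bounds=[(0, None)]*N)
        return res.status != 0          # infeasible  =>  pointed

    print(is_pointed(np.array([[1.0, 0.0], [0.0, 1.0]])))   # True : R^2_+
    print(is_pointed(np.array([[1.0, -1.0]])))              # False: the real line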

2.12.3 Unit simplex
A peculiar subset of the nonnegative orthant with halfspace-description

    S ≜ { s | s ≽ 0 , 1ᵀs ≤ 1 } ⊆ R^n_+                              (292)

is a unique bounded convex full-dimensional polyhedron called unit simplex (Figure 53) having n + 1 facets, n + 1 vertices, and dimension

    dim S = n                                                        (293)

The origin supplies one vertex while heads of the standard basis [203] [333] {e_i , i = 1 ... n} in R^n constitute those remaining;^2.56 thus its vertex-description:

    S = conv{ 0 , {e_i , i = 1 ... n} }
      = { [ 0 e₁ e₂ ··· e_n ] a | aᵀ1 = 1 , a ≽ 0 }                  (294)

2.12.3.1 Simplex
The unit simplex comes from a class of general polyhedra called simplex, having vertex-description: [94] [309] [374] [114]

    conv{x_ℓ ∈ R^n} | ℓ = 1 ... k+1 , dim aff{x_ℓ} = k , n ≥ k       (295)

So defined, a simplex is a closed bounded convex set possibly not full-dimensional. Examples of simplices, by increasing affine dimension, are: a point, any line segment, any triangle and its relative interior, a general tetrahedron, any five-vertex polychoron, and so on. In R^0 the unit simplex is the point at the origin, in R the unit simplex is the line segment [0, 1], in R^2 it is a triangle and its relative interior, in R^3 it is the convex hull of a tetrahedron (Figure 53), in R^4 it is the convex hull of a pentatope [376], and so on.

2.12.3.1.1 Definition. Simplicial cone.
A proper polyhedral ( 2.7.2.2.1) cone K in R^n is called simplicial iff K has exactly n extreme directions; [20, II.A] equivalently, iff proper K has exactly n linearly independent generators contained in any given set of generators. △

There are an infinite variety of simplicial cones in R^n ; e.g., Figure 24, Figure 54, Figure 64. Any orthant is simplicial, as is any rotation thereof.

2.56 simplicial cone ⇒ proper polyhedral cone

Figure 54: Two views of a simplicial cone and its dual in R^3 (second view on next page). Semiinfinite boundary of each cone is truncated for illustration. Each cone has three facets (confer 2.13.11.0.3). Cartesian axes are drawn for reference.


2.12.4 Converting between descriptions
Conversion between halfspace-descriptions (285) (286) and vertex-descriptions (86) (288) is nontrivial, in general, [15] [114, 2.2] but the conversion is easy for simplices. [61, 2.2.4] Nonetheless, we tacitly assume the two descriptions to be equivalent. [309, 19, thm.19.1] We explore conversions in 2.13.4, 2.13.9, and 2.13.11:

2.13 Dual cone & generalized inequality & biorthogonal expansion
These three concepts, dual cone, generalized inequality, and biorthogonal expansion, are inextricably melded; meaning, it is difficult to completely discuss one without mentioning the others. The dual cone is critical in tests for convergence by contemporary primal/dual methods for numerical solution of conic problems. [394] [279, 4.5] For unique minimum-distance projection on a closed convex cone K, the negative dual cone −K* plays the role that orthogonal complement plays for subspace projection.^2.57 ( E.9.2, Figure 172) Indeed, −K* is the algebraic complement in R^n ;

    K ⊞ −K* = R^n                                                    (2066)

where ⊞ denotes unique orthogonal vector sum.

One way to think of a pointed closed convex cone is as a new kind of coordinate system whose basis is generally nonorthogonal; a conic system, very much like the familiar Cartesian system whose analogous cone is the first quadrant (the nonnegative orthant). Generalized inequality ⪰_K is a formalized means to determine membership to any pointed closed convex cone K ( 2.7.2.2) whereas biorthogonal expansion is, fundamentally, an expression of coordinates in a pointed conic system whose axes are linearly independent but not necessarily orthogonal. When cone K is the nonnegative orthant, then these three concepts come into alignment with the Cartesian prototype: biorthogonal expansion becomes orthogonal expansion, the dual cone becomes identical to the orthant, and generalized inequality obeys a total order entrywise.

2.57 Namely, projection on a subspace is ascertainable from projection on its orthogonal complement (Figure 171).

 . .2. [200.  . DUAL CONE & GENERALIZED INEQUALITY 155 2. for example.4.7] 2. . Find the dual of each polyhedral cone from Figure 56 by using dual cone equation (296).0.g. e. The dual cone is the negative polar cone defined by many authors. 14] [41] [26] [332.13. belonging to K . When K is represented by a halfspace-description such as (286). .4.. e.58 that is always closed and convex because it is an intersection of halfspaces ( 2. Perhaps the most instructive graphical method of dual cone construction is cut-and-try. there is a second and equivalent construction: ∗ Dual cone K is the union of each and every vector y inward-normal to a hyperplane supporting K ( 2. Figure 55b. where  aT 1 . Manual dual cone construction. .2: for PK x the Euclidean projection of point x on closed convex cone K −K = {x − PK x | x ∈ Rn } = {x ∈ Rn | PK x = 0} ∗ (2067) 2.  ∈ Rp×n .1 Dual cone For any set K (convex or not). aT m  ∗ A C  cT 1 .1). cT p  (297) then the dual cone can be represented as the conic hull K = cone{a1 .g. x ≥ 0 for all x ∈ K} (296) is a unique cone2.2.  ∈ Rm×n . the dual cone [110. . 2.9.58 ∗ ◦ .3.1). ±cp } (298) a vertex-description. Each halfspace has inward-normal x . When cone K is convex. ∗ K can also be constructed pointwise using projection theory from E.2] [309.1 Exercise. ..13. because each and every conic combination of normals from the halfspace-description of K yields another inward-normal to a hyperplane supporting K .13. . 4.1. K = −K .2] K ∗ {y ∈ Rn | y . ±c1 .1. .1. Figure 55a. and boundary containing the origin. . am .6. A.

Figure 55: Two equivalent constructions of dual cone K* in R^2 : (a) Showing construction by intersection of halfspaces about 0 (drawn truncated). Only those two halfspaces whose bounding hyperplanes have inward-normal corresponding to an extreme direction of this pointed closed convex cone K ⊂ R^2 need be drawn; by (368). (b) Suggesting construction by union of inward-normals y to each and every hyperplane ∂H_+ supporting K. This interpretation is valid when K is convex because existence of a supporting hyperplane is then guaranteed ( 2.4.2.6).

Figure 56: Dual cone construction by right angle. Each extreme direction of a proper polyhedral cone is orthogonal to a facet of its dual cone, and vice versa, in any dimension. ( 2.13.6.1) (a) This characteristic guides graphical construction of dual cone in two dimensions: It suggests finding dual-cone boundary ∂K* by making right angles with extreme directions of polyhedral cone K. The construction is then pruned so that each dual boundary vector does not exceed π/2 radians in angle with each and every vector from polyhedral cone. Were dual cone in R^2 to narrow, Figure 57 would be reached in limit. (b) Same polyhedral cone and its dual continued into three dimensions. (confer Figure 64)

    x ∈ K ⇔ ⟨y , x⟩ ≥ 0 for all y ∈ G(K*)                            (365)

Figure 57: Polyhedral cone K is a halfspace about origin in R^2. Dual cone K* is a ray (base 0), hence not full-dimensional in R^2 ; so K cannot be pointed, hence has no extreme directions. (Both convex cones appear truncated.)

2.13.1.0.2 Exercise. Dual cone definitions.
What is { x ∈ R^n | xᵀz ≥ 0 ∀ z ∈ R^n } ?
What is { x ∈ R^n | xᵀz ≥ 1 ∀ z ∈ R^n } ?
What is { x ∈ R^n | xᵀz ≥ 1 ∀ z ∈ R^n_+ } ?

As defined, dual cone K* exists even when the affine hull of the original cone is a proper subspace; id est, even when the original cone is not full-dimensional.^2.59

To further motivate our understanding of the dual cone, consider the ease with which convergence can be ascertained in the following optimization problem (301p):

2.13.1.0.3 Example. Dual problem. (confer 4.1)
Duality is a powerful and widely employed tool in applied mathematics for a number of reasons. First, the dual program is always convex even if the primal is not. Second, the number of variables in the dual is equal to the number of constraints in the primal, which is often less than the number of variables in the primal program.

2.59 Rockafellar formulates dimension of K and K*. [309, 14.6.1] His monumental work Convex Analysis has not one figure or illustration. See [26, II.16] for a good illustration of Rockafellar's recession cone [42].

although objective functions from conic problems (301p) (301d) are linear. meaning. DUAL CONE & GENERALIZED INEQUALITY 159 f (x . duality gap is 0 . When problems are strong duals. dual functions never meet (f (x) > g(z)) by (299). g(z) (dotted) kiss at saddle value as depicted at center. Barbosa) This serves as mnemonic icon for primal and dual problems. zp ) or f (x) f (xp . z) or g(z) x z Figure 58: (Drawing by Lucas V. Otherwise.2.13. . functions f (x) .

Third, the maximum value achieved by the dual problem is often equal to the minimum of the primal. [301, 2.1.3] When not equal, the dual always provides a bound on the primal optimal objective. For convex problems, the dual variables provide necessary and sufficient optimality conditions: Essentially, Lagrange duality theory concerns representation of a given optimization problem as half of a minimax problem. [309, 36] [61, 5.4] Given any real function f(x, z)

    minimize_x maximize_z f(x, z) ≥ maximize_z minimize_x f(x, z)    (299)

always holds. When

    minimize_x maximize_z f(x, z) = maximize_z minimize_x f(x, z)    (300)

we have strong duality and then a saddle value [153] exists. (Figure 58) [306, p.3] Consider primal conic problem (p) (over cone K) and its corresponding dual problem (d): [295, 3.3.1] [240, 2.1] [241] given vectors α, β and matrix constant C

    (p)  minimize_x  αᵀx              maximize_{y,z}  βᵀz  (d)
         subject to  x ∈ K            subject to  y ∈ K*
                     Cx = β                       Cᵀz + y = α        (301)

Observe: the dual problem is also conic, and its objective function value never exceeds that of the primal;

    αᵀx ≥ βᵀz
    xᵀ(Cᵀz + y) ≥ (Cx)ᵀz
    xᵀy ≥ 0                                                          (302)

which holds by definition (296). Under the sufficient condition that (301p) is a convex problem^2.60 satisfying Slater's condition (p.283), then equality

    x*ᵀy* = 0                                                        (303)

2.60 In this context, problems (p) and (d) are convex if K is a convex cone.
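For the selfdual nonnegative orthant, prototype (301) is an ordinary linear program whose zero duality gap (303) can be observed with any LP solver. A minimal sketch in Python with SciPy, using small illustrative data α, β, C of my own choosing:

    import numpy as np
    from scipy.optimize import linprog

    # primal (p): minimize a'x  subject to  x in R^3_+ , Cx = b
    a = np.array([1.0, 2.0, 0.5])
    C = np.array([[1.0, 1.0, 1.0]])
    b = np.array([1.0])
    p = linprog(c=a, A_eq=C, b_eq=b, bounds=[(0, None)]*3)

    # dual (d): maximize b'z  subject to  y = a - C'z in K* = R^3_+
    d = linprog(c=-b, A_ub=C.T, b_ub=a, bounds=[(None, None)])

    assert np.isclose(p.fun, -d.fun)            # zero duality gap under Slater
    y = a - C.T @ d.x                           # dual slack y in K*
    assert np.isclose(p.x @ y, 0, atol=1e-8)    # complementarity x*'y* = 0  (303)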

is achieved; which is necessary and sufficient for optimality ( 2.13.10.1.5); each problem (p) and (d) attains the same optimal value of its objective and each problem is called a strong dual to the other because the duality gap (optimal primal−dual objective difference) becomes 0. Then (p) and (d) are together equivalent to the minimax problem

    (p)−(d)  minimize_{x,y,z}  αᵀx − βᵀz
             subject to  x ∈ K , y ∈ K*
                         Cx = β , Cᵀz + y = α                        (304)

whose optimal objective always has the saddle value 0 (regardless of the particular convex cone K and other problem parameters). [362, 3.2] Thus determination of convergence for either primal or dual problem is facilitated.

Were convex cone K polyhedral ( 2.12.1), then problems (p) and (d) would be linear programs. Selfdual nonnegative orthant K yields the primal prototypical linear program and its dual. Were K a positive semidefinite cone, then problem (p) has the form of prototypical semidefinite program (648) with (d) its dual. It is sometimes possible to solve a primal problem by way of its dual; advantageous when the dual problem is easier to solve than the primal problem, for example, because it can be solved analytically, or has some special structure that can be exploited. [61, 5.5.5] ( 4.2.3.1)

2.13.1.1 Key properties of dual cone

    For any cone, (−K)* = −K*

    For any cones K₁ and K₂ , K₁ ⊆ K₂ ⇒ K₁* ⊇ K₂*  [332, 16]

    (Cartesian product) For closed convex cones K₁ and K₂ , their Cartesian product K = K₁ × K₂ is a closed convex cone, and

        K* = K₁* × K₂*                                               (305)

    where each dual is determined with respect to a cone's ambient space.

    (conjugation) [309, 14] [110, 4.5] [332, p.52] When K is any convex cone, dual of the dual cone equals closure of the original cone;

        K** = K̄                                                      (306)

    is the intersection of all halfspaces about the origin that contain K. Because K*** = K̄* always holds,

        K* = (K̄)*                                                    (307)

    When convex cone K is closed, then dual of the dual cone is the original cone; K** = K ⇔ K is a closed convex cone: [332, p.53, exer.16] [110, 4.1]

        K = { x ∈ R^n | ⟨y , x⟩ ≥ 0 ∀ y ∈ K* }                       (308)

    If any cone K is full-dimensional, then K* is pointed;

        K full-dimensional ⇒ K* pointed                              (309)

    If the closure of any convex cone K is pointed, conversely, then K* is full-dimensional;

        K̄ pointed ⇒ K* full-dimensional                              (310)

    Given that a cone K ⊂ R^n is closed and convex, K is pointed if and only if K* − K* = R^n ; id est, iff K* is full-dimensional. [55, 3.3 exer.20]

    (vector sum) [309, thm.3.8] For convex cones K₁ and K₂

        K₁ + K₂ = conv(K₁ ∪ K₂)                                      (311)

    is a convex cone.

    (dual vector-sum) [309, 16.4.2] [110, 4.6] For convex cones K₁ and K₂

        K₁* ∩ K₂* = (K₁ + K₂)* = (K₁ ∪ K₂)*                          (312)

    (closure of vector sum of duals)^2.61 For closed convex cones K₁ and K₂

        (K₁ ∩ K₂)* = K₁* + K₂* = conv(K₁* ∪ K₂*)                     (313)

    [332, p.96] where operation closure becomes superfluous under the sufficient condition K₁ ∩ int K₂ ≠ ∅ [55, 3.3 exer.16, 4.1 exer.7].

2.61 These parallel analogous results for subspaces R₁ , R₂ ⊆ R^n ; [110, 4.6]
    (R₁ + R₂)^⊥ = R₁^⊥ ∩ R₂^⊥
    (R₁ ∩ R₂)^⊥ = R₁^⊥ + R₂^⊥
    R^⊥⊥ = R for any subspace R.

    (Krein-Rutman) Given closed convex cones K₁ ⊆ R^m and K₂ ⊆ R^n and any linear map A : R^n → R^m, then provided int K₁ ∩ AK₂ ≠ ∅ [55, 3.3.13, confer 4.1 exer.9]

        (A⁻¹K₁ ∩ K₂)* = AᵀK₁* + K₂*                                  (314)

    where dual of cone K₁ is with respect to its ambient space R^m and dual of cone K₂ is with respect to R^n, where A⁻¹K₁ denotes inverse image ( 2.1.9.0.1) of K₁ under mapping A, and where Aᵀ denotes adjoint operator. The particularly important case K₂ = R^n is easy to show: for AᵀA = I

        (AᵀK₁)* ≜ { y ∈ R^n | xᵀy ≥ 0 ∀ x ∈ AᵀK₁ }
                = { y ∈ R^n | (Aᵀz)ᵀy ≥ 0 ∀ z ∈ K₁ }
                = { Aᵀw | zᵀw ≥ 0 ∀ z ∈ K₁ }
                = AᵀK₁*                                              (315)

    K is proper if and only if K* is proper.

    K is polyhedral if and only if K* is polyhedral. [332, 2.8]

    K is simplicial if and only if K* is simplicial. ( 2.13.9.2) A simplicial cone and its dual are proper polyhedral cones (Figure 64, Figure 54), but not the converse.

    K ⊞ −K* = R^n ⇔ K is closed and convex. (2066)

    Any direction in a proper cone K is normal to a hyperplane separating K from −K*.

2.13.1.2 Examples of dual cone
When K is R^n, K* is the point at the origin, and vice versa. When K is a subspace, K* is its orthogonal complement, and vice versa. ( E.9.2.1, Figure 59) When cone K is a halfspace in R^n with n > 0 (Figure 57 for example), the dual cone K* is a ray (base 0) belonging to that halfspace but orthogonal to its bounding hyperplane (that contains the origin); hence, not full-dimensional.

Figure 59: When convex cone K is any one Cartesian axis, its dual cone is the convex hull of all axes remaining; its orthogonal complement. In R^3, dual cone K* (drawn tiled and truncated) is a hyperplane through origin; its normal belongs to line K. In R^2, dual cone K* is a line through origin while convex cone K is that line orthogonal.

When convex cone K is a closed halfplane in R^3 (Figure 60), it is neither pointed or full-dimensional; hence, the dual cone K* can be neither full-dimensional or pointed.

When K is any particular orthant in R^n, the dual cone is identical; id est, K = K*. When K is any quadrant in subspace R^2, K* is a wedge-shaped polyhedral cone in R^3 ; e.g., for K equal to quadrant I , K* = [ R^2_+ ; R ].

When K is a polyhedral flavor Lorentz cone (confer (178))

    K_ℓ = { [ x ; t ] ∈ R^n × R | ‖x‖_ℓ ≤ t } ,   ℓ ∈ {1, ∞}         (316)

its dual is the proper cone [61, exmp.2.25]

    K_q = K_ℓ* = { [ x ; t ] ∈ R^n × R | ‖x‖_q ≤ t } ,   ℓ ∈ {1, ∞}  (317)

where ‖x‖_q = ‖x‖_ℓ* is that norm dual to ‖x‖_ℓ determined via solution to 1/ℓ + 1/q = 1. Figure 62 illustrates K = K₁ and K* = K₁* = K_∞ in R^2 × R.

2.13.2 Abstractions of Farkas' lemma

2.13.2.0.1 Corollary. Generalized inequality and membership relation. [200, A.4.2]
Let K be any closed convex cone and K* its dual, and let x and y belong to a vector space R^n. Then

    y ∈ K* ⇔ ⟨y , x⟩ ≥ 0 for all x ∈ K                               (318)

which is, merely, a statement of fact by definition of dual cone (296). By closure we have conjugation: [309, thm.14.1]

    x ∈ K ⇔ ⟨y , x⟩ ≥ 0 for all y ∈ K*                               (319)

which may be regarded as a simple translation of Farkas' lemma [134] as in [309, 22] to the language of convex cones, and a generalization of the well-known Cartesian cone fact

    x ≽ 0 ⇔ ⟨y , x⟩ ≥ 0 for all y ≽ 0                                (320)

Figure 60: K and K* are halfplanes in R^3 ; blades. Each cone is like K from Figure 57 but embedded in a two-dimensional subspace of R^3. Cartesian coordinate axes drawn for reference. Both semiinfinite convex cones appear truncated.

for which implicitly K = K* = R^n_+ the nonnegative orthant.

Membership relation (319) is often written instead as dual generalized inequalities, when K and K* are pointed closed convex cones,

    x ⪰_K 0 ⇔ ⟨y , x⟩ ≥ 0 for all y ⪰_{K*} 0                         (321)

meaning, coordinates for biorthogonal expansion of x ( 2.13.7.1.2, 2.13.8) [368] must be nonnegative when x belongs to K. Conjugating,

    y ⪰_{K*} 0 ⇔ ⟨y , x⟩ ≥ 0 for all x ⪰_K 0                         (322)
⋄

When pointed closed convex cone K is not polyhedral, coordinate axes for biorthogonal expansion asserted by the corollary are taken from extreme directions of K ; expansion is assured by Carathéodory's theorem ( E.6.4.1.1). We presume, throughout, the obvious:

    x ∈ K ⇔ ⟨y , x⟩ ≥ 0 for all y ∈ K*   (319)
    ⇔
    x ∈ K ⇔ ⟨y , x⟩ ≥ 0 for all y ∈ K* , ‖y‖ = 1                     (323)

2.13.2.0.2 Exercise. Dual generalized inequalities.
Test Corollary 2.13.2.0.1 and (323) graphically on the two-dimensional polyhedral cone and its dual in Figure 56.

(confer 2.7.2.2) When pointed closed convex cone K is implicit from context:

    x ⪰ 0 ⇔ x ∈ K
    x ≻ 0 ⇔ x ∈ rel int K                                            (324)

Strict inequality x ≻ 0 means coordinates for biorthogonal expansion of x must be positive when x belongs to rel int K. Strict membership relations are useful; e.g., for any proper cone^2.62 K and its dual K*

    x ∈ int K ⇔ ⟨y , x⟩ > 0 for all y ∈ K* , y ≠ 0                   (325)
    x ∈ K , x ≠ 0 ⇔ ⟨y , x⟩ > 0 for all y ∈ int K*                   (326)

2.62 An open cone K is admitted to (325) and (328) by (19).

Conjugating, we get the dual relations:

    y ∈ int K* ⇔ ⟨y , x⟩ > 0 for all x ∈ K , x ≠ 0                   (327)
    y ∈ K* , y ≠ 0 ⇔ ⟨y , x⟩ > 0 for all x ∈ int K                   (328)

Boundary-membership relations for proper cones are also useful:

    x ∈ ∂K ⇔ ∃ y ≠ 0 such that ⟨y , x⟩ = 0 , y ∈ K* , x ∈ K          (329)
    y ∈ ∂K* ⇔ ∃ x ≠ 0 such that ⟨y , x⟩ = 0 , x ∈ K , y ∈ K*         (330)

which are consistent; e.g., x ∈ ∂K ⇔ x ∉ int K and x ∈ K.

2.13.2.0.3 Example. Linear inequality. [339, 4] (confer 2.13.5.1.1)
Consider a given matrix A and closed convex cone K. By membership relation we have

    Ay ∈ K* ⇔ xᵀAy ≥ 0 ∀ x ∈ K
            ⇔ yᵀz ≥ 0 ∀ z ∈ {Aᵀx | x ∈ K}
            ⇔ y ∈ {Aᵀx | x ∈ K}*                                     (331)

This implies

    { y | Ay ∈ K* } = { Aᵀx | x ∈ K }*                               (332)

When K is the selfdual nonnegative orthant ( 2.13.5.1), for example, then

    { y | Ay ≽ 0 } = { Aᵀx | x ≽ 0 }*                                (333)

and the dual relation

    { y | Ay ≽ 0 }* = { Aᵀx | x ≽ 0 }                                (334)

comes by a theorem of Weyl (p.147) that yields closedness for any vertex-description of a polyhedral cone.

2.13.2.1 Null certificate, Theorem of the alternative
If in particular x_p ∉ K a closed convex cone, then construction in Figure 55b suggests there exists a supporting hyperplane (having inward-normal belonging to dual cone K*) separating x_p from K ; indeed, by (319):

Beginning from the simplest Cartesian dual generalized inequalities (320) (with respect to nonnegative orthant Rm ). From (319) we have y ∈ K ⇔ bTy ≥ 0 for all b ∈ K T A y 0 ⇔ bTy ≥ 0 for all b ∈ {Ax | x ∗ 0} (341) .1 Example. b = Ax for all x ATy 0 By complementing sense of the scalar inequality: or in the alternative bTy < 0 .2. ∃ b = Ax . then y ∈ K or in the alternative y ∈ K . we make vector substitution y ← ATy ATy ATy Introducing a new vector by calculating b 0 ⇔ xTAT y ≥ 0 for all x 0 (338) Ax we get 0 (339) 0 ⇔ xTy ≥ 0 for all x 0 (337) 0 ⇔ bTy ≥ 0 .2.3. one system is feasible while the other is not. / T Scalar inequality b y < 0 is movable to the other side of alternative (340). 2. Theorem of the alternative for linear inequality. xp < 0 By alternative is meant: these two systems are incompatible. From a different perspective. x 0 (340) If one system has a solution. but that requires some explanation: From results in Example 2. xp ∈ K or in the alternative ∃y∈K ∗ (336) y . then the other does not.13.2. define a convex cone K = {y | ATy 0} . the ∗ dual cone is K = {Ax | x 0}.1.13. Myriad alternative systems of linear inequality can be explained in terms of pointed closed convex cones and their duals. xp < 0 Existence of any one such y is a certificate of null membership. + y Given A ∈ Rn×m . DUAL CONE & GENERALIZED INEQUALITY / xp ∈ K ⇔ ∃ y ∈ K ∗ 169 (335) y .13.0.

From this, alternative systems of linear inequality [200, p.59] (Farkas/Tucker)

    Aᵀy ≽ 0 , bᵀy < 0
    or in the alternative
    b = Ax , x ≽ 0                                                   (342)

Geometrically this means: either vector b belongs to convex cone K* or it does not. When b ∉ K*, then there is a vector y ∈ K normal to a hyperplane separating point b from cone K*.

For another example, from membership relation (318) with affine transformation of dual variable we may write: Given A ∈ R^{n×m} and b ∈ R^n

    b − Ay ∈ K* ⇔ xᵀ(b − Ay) ≥ 0 ∀ x ∈ K                             (343)
    Aᵀx = 0 , b − Ay ∈ K* ⇒ xᵀb ≥ 0 ∀ x ∈ K                          (344)

From membership relation (343), conversely, suppose we allow any y ∈ R^m. Then because −xᵀAy is unbounded below, xᵀ(b − Ay) ≥ 0 implies Aᵀx = 0 : for y ∈ R^m

    Aᵀx = 0 , b − Ay ∈ K* ⇐ xᵀ(b − Ay) ≥ 0 ∀ x ∈ K                   (345)

In toto,

    b − Ay ∈ K* ⇔ xᵀb ≥ 0 , Aᵀx = 0 ∀ x ∈ K                          (346)

Vector x belongs to cone K but is also constrained to lie in a subspace of R^n specified by an intersection of hyperplanes through the origin { x ∈ R^n | Aᵀx = 0 }. From this, alternative systems of generalized inequality with respect to pointed closed convex cones K and K*

    Ay ⪯_{K*} b
    or in the alternative
    xᵀb < 0 , Aᵀx = 0 , x ⪰_K 0                                      (347)

derived from (346) simply by taking the complementary sense of the inequality in xᵀb. These two systems are alternatives; if one system has a solution, then the other does not.^2.63

By invoking a strict membership relation between proper cones (325), we can construct a more exotic alternative strengthened by demand for an interior point;

    b − Ay ≻_{K*} 0 ⇔ xᵀb > 0 , Aᵀx = 0 ∀ x ⪰_K 0 , x ≠ 0            (348)

From this, alternative systems of generalized inequality [61, pages: 50, 54, 262]

    Ay ≺_{K*} b
    or in the alternative
    xᵀb ≤ 0 , Aᵀx = 0 , x ⪰_K 0 , x ≠ 0                              (349)

derived from (348) by taking complementary sense of the inequality in xᵀb. And from this, alternative systems with respect to the nonnegative orthant attributed to Gordan in 1873: [161] [55, p.201] substituting A ← Aᵀ and setting b = 0

    Aᵀy ≺ 0
    or in the alternative
    Ax = 0 , x ≽ 0 , ‖x‖₁ = 1                                        (350)

Ben-Israel collects related results from Farkas, Motzkin, Gordan, and Stiemke in Motzkin transposition theorem. [33]

2.63 If solutions at ±∞ are disallowed, then the alternative systems become instead mutually exclusive with respect to nonpolyhedral cones. Simultaneous infeasibility of the two systems is not precluded by mutual exclusivity; called a weak alternative. Ye provides an example illustrating simultaneous infeasibility with respect to the positive semidefinite cone: x ∈ S^2, y ∈ R,
    A = [ 1 0 ]    b = [ 0 1 ]
        [ 0 0 ] ,      [ 1 0 ]
where xᵀb means ⟨x , b⟩. A better strategy than disallowing solutions at ±∞ is to demand an interior point as in (349) or Lemma 4.2.1.1.2. Then question of simultaneous infeasibility is moot.
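Alternative (342) is constructive: a linear program either finds x ≽ 0 with b = Ax, or a separating y serves as null certificate. A minimal sketch in Python with SciPy; the box bound on y is an artificial device of this sketch to keep the certificate search bounded.

    import numpy as np
    from scipy.optimize import linprog

    def farkas(A, b):
        # alternative (342): either b = Ax with x >= 0, or
        # there exists y with A'y >= 0 and b'y < 0 (null certificate)
        n, m = A.shape
        res = linprog(c=np.zeros(m), A_eq=A, b_eq=b, bounds=[(0, None)]*m)
        if res.status == 0:
            return 'b in cone', res.x
        # certificate search: minimize b'y subject to A'y >= 0, -1 <= y <= 1
        cert = linprog(c=b, A_ub=-A.T, b_ub=np.zeros(m), bounds=[(-1, 1)]*n)
        return 'certificate', cert.x          # b'y < 0 at the optimum

    A = np.eye(2)
    print(farkas(A, np.array([1.0, 1.0]))[0])    # 'b in cone'
    print(farkas(A, np.array([-1.0, 1.0]))[0])   # 'certificate'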

2.13.3 Optimality condition
(confer 2.13.10.1) The general first-order necessary and sufficient condition for optimality of solution x* to a minimization problem ((301p) for example) with real differentiable convex objective function f(x) : R^n → R is [308, 3]

    ∇f(x*)ᵀ(x − x*) ≥ 0 ∀ x ∈ C ,   x* ∈ C                           (351)

where C is a convex feasible set,^2.64 and where ∇f(x*) is the gradient ( 3.6) of f with respect to x evaluated at x*. In words, negative gradient is normal to a hyperplane supporting the feasible set at a point of optimality. (Figure 67)

Direct solution to variation inequality (351), instead of the corresponding minimization, spawned from calculus of variations. [251, p.178] [133, p.37] One solution method solves an equivalent fixed point-of-projection problem

    x = P_C(x − ∇f(x))                                               (352)

that follows from a necessary and sufficient condition for projection on convex set C (Theorem E.9.1.0.2):

    P(x* − ∇f(x*)) ∈ C ,   ⟨x* − ∇f(x*) − x* , x − x*⟩ ≤ 0 ∀ x ∈ C   (2052)

Proof of equivalence [Wıκımization, Complementarity problem] is provided by Németh. Given minimum-distance projection problem

    minimize_x  (1/2)‖x − y‖²
    subject to  x ∈ C                                                (353)

on convex feasible set C for example, the equivalent fixed point problem

    x = P_C(x − ∇f(x)) = P_C(y)                                      (354)

is solved in one step.

In the unconstrained case (C = R^n), optimality condition (351) reduces to the classical rule (p.250)

    ∇f(x*) = 0 ,   x* ∈ dom f                                        (355)

which can be inferred from the following application:

2.64 presumably nonempty set of all variable values satisfying all given problem constraints; the set of feasible solutions.
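The one-step fixed point (354) is easy to observe numerically. A minimal sketch in Python with NumPy, assuming for illustration that C is the unit box so its projection is a coordinatewise clip:

    import numpy as np

    P = lambda x: np.clip(x, 0.0, 1.0)        # projection on C = [0,1]^n

    y  = np.array([1.7, -0.3, 0.4])
    gf = lambda x: x - y                      # gradient of (1/2)||x - y||^2 in (353)

    x = P(y)                                  # x = P_C(y) , one step (354)
    assert np.allclose(x, P(x - gf(x)))       # x is a fixed point of (352)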

But this is simply half of a membership relation where the cone dual to Rn−rank C is the origin in Rn−rank C . [307] [251.2. DUAL CONE & GENERALIZED INEQUALITY 173 2. we now derive condition (357) from the general first-order condition for optimality (351): For problem (356) C {x ∈ Rn | C x = d} = {Z ξ + xp | ξ ∈ Rn−rank C } (358) is the feasible set where Z ∈ Rn×n−rank C holds basis N (C ) columnar.13.1 Example. are necessary and sufficient for optimality. We must therefore have Z T ∇f (x⋆ ) = 0 ⇔ ∇f (x⋆ )T Z ξ ≥ 0 ∀ ξ ∈ Rn−rank C (360) meaning. Since x⋆ ∈ C .1 Discretization of membership relation Dual halfspace-description ∗ Halfspace-description of dual cone K is equally simple as vertex-description K = cone(X) = {Xa | a 0} ⊆ Rn (103) .0. and vector d ∈ Rp .4. the convex optimization problem minimize f (x) x (356) subject to C x = d is characterized by the well-known necessary and sufficient optimality condition [61. ∇f (x⋆ ) must be orthogonal to N (C ). 4.13. C x⋆ = d (361) 2.2.3. and xp is any particular solution to C x = d . a fat full-rank matrix C ∈ Rp×n . Given a real differentiable convex function f (x) : Rn → R defined on domain Rn .3] ∇f (x⋆ ) + C T ν = 0 (357) where ν ∈ Rp is the eminent Lagrange multiplier. Optimality for equality constrained problem.13. we arbitrarily choose xp = x⋆ which yields an equivalent optimality condition ∇f (x⋆ )T Z ξ ≥ 0 ∀ ξ ∈ Rn−rank C (359) when substituted into (351).4 2. solution x⋆ is optimal if and only if ∇f (x⋆ ) belongs to R(C T ). These conditions Z T ∇f (x⋆ ) = 0 .13. Via membership relation.188] [230] In other words. p.

the conjugate statement must also hold.2.9. 1] without proof for pointed closed convex case. Whenever cone K is known to be closed and convex.1. and any set of generators denoted G(K ) for its dual such that K = cone G(K) . ( 2.2 ∗ 0} ⇔ K = K = z | Y Tz ∗∗ 0 (363) First dual-cone formula From these two results (362) and (363) we deduce a general principle: From any [sic] given vertex-description (103) of closed convex cone K .0. ∗ a halfspace-description of the dual cone K is immediate by matrix transposition (362). a 0 X Ty 0 0 (362) that follows from the generalized inequality and membership corollary (320).13. [309.8. 2.2) denoted by G(K) for closed convex ∗ cone K ⊆ Rn .4.8] K = {Y a | a 2. given any set of generators for dual cone ∗ K arranged columnar in Y .4. ( 2. then the consequent vertex-description of dual cone connotes a halfspace-description for K : [332. (confer (286)) K ∗ = = = = y ∈ Rn y ∈ Rn y ∈ Rn y ∈ Rn | | | | z Ty ≥ 0 for all z ∈ K z Ty ≥ 0 for all z = X a .56] The generalized inequality and membership corollary is discretized in the following theorem inspired by (362) and (363): 2. the dual cone K is also ∗∗ polyhedral and K = K . conversely. id est.11) ∗ We deduce further: For any polyhedral cone K . from any given halfspace-description (286) of K .65 K = cone G(K ) ∗ ∗ (364) Stated in [21.2.13.1)2. p. p. CONVEX GEOMETRY for corresponding closed convex cone K : By definition (296). The semi-infinity of tests specified by all z ∈ K has been reduced to a set of generators for K constituting the columns of X .1 Theorem.13. . a dual vertex-description is immediate (363). 2.13. 2.65 Given any set of generators. id est.174 CHAPTER 2.13. [332. (confer 2. a aTX Ty ≥ 0 . for X ∈ Rn×N as in (279). Discretized membership.122] Various other converses are just a little trickier. the test has been discretized.

a i αi ei where ei is the i th member of a standard basis of possibly infinite ∗ ∗ cardinality.. ∗ ∗ x ∈ K ⇔ y .1 on Figure 56a using extreme directions there as generators.2.13.165) is necessary and sufficient for certifying cone membership: for x and y in vector space Rn x ∈ K ⇔ γ ∗ . x ≥ 0 ∀ αi ≥ 0 ⇔ ∗ G(K )ei . x ≥ 0 for all γ ∗ ∈ G(K )} K = {y ∈ Rn | γ . x ≥ 0 ∀ i . e. x ≥ 0 for all γ ∗ ∈ G(K ) y ∈ K ⇔ γ . Conjugate relation (366) is similarly derived. x ≥ 0 ∀ a 0 ⇔ i αi G(K )ei .2.2.4. x ≥ 0 ∀ a 0 (319).2 Exercise.13. then + from the discretized membership theorem it directly follows: x z ⇔ xi ≤ zi ∀ i (369) Generate simple counterexamples demonstrating that this equivalence with entrywise inequality holds only when the underlying cone inducing partial order is the nonnegative orthant.2. y ∈ K ⇔ y = G(K )a for some a 0. explain Figure 61. From the discretized membership theorem we may further deduce a more surgical description of closed convex cone that prescribes only a finite number of halfspaces for its construction when polyhedral: (Figure 55a) K = {x ∈ Rn | γ ∗ .4. Partial order induced by orthant.4. y ≥ 0 for all γ ∈ G(K)} ∗ ∗ (367) (368) 2. Discretized dual generalized inequalities. Test Theorem 2.13. y ≥ 0 for all γ ∈ G(K) ∗ ∗ ∗ ∗ ∗ ∗ (365) (366) ⋄ Proof. G(K )a .3 Exercise. DUAL CONE & GENERALIZED INEQUALITY 175 then discretization of the generalized inequality and membership corollary (p. x ≥ 0 ∀ y ∈ K ⇔ G(K )a . K = {G(K )a | a 0}. .g.13. 2. When comparison is with respect to the nonnegative orthant K = Rn .

Say proper polyhedral cone K is specified completely by generators that are arranged columnar in X = [ Γ1 · · · ΓN ] ∈ Rn×N id est. For a polyhedral cone. y (279) ∗ 0}.4. 2. Boundary membership to proper polyhedral cone.g. . test (329) of boundary membership can be formulated as a linear program. via constraint 1T y = 1 if c = 1. c ∈ K y=0 (370) (329) c ∈ ∂K ⇔ ∃ y = 0 subject to cT y = 0 X Ty 0 Xa = c a 0 This linear feasibility problem has a solution iff c ∈ ∂K .4 Example. 2. the smallest face of K containing C is equivalent to intersection of all faces of K that contain C .4. K = {X a | a may be expressed2. e.13. y ∈ K . c = 0.13.3 smallest face of pointed closed convex cone Given nonempty convex subset C of a convex set K . + 2.66 Nonzero y ∈ Rn may be realized in many ways.66 find a..176 CHAPTER 2. CONVEX GEOMETRY x K Figure 61: x 0 with respect to K but not with respect to nonnegative orthant R2 (pointed convex cone K drawn truncated). Then membership relation y.2.

67 Consider generator Γi . in that case. y = 0 ∀ z ∈ C} ∗ (371) (372) When C ∩ int K = ∅ then F(K ⊃ C) = K . Suppose polyhedral cone K is completely specified by generators arranged columnar in X = [ Γ1 · · · ΓN ] ∈ Rn×N (279) To find its smallest face F(K ∋ c) containing a given point c ∈ K . is ∗ the intersection of all hyperplanes containing C whose normals are in K . If there exists a vector z ∈ K orthogonal to c but not to Γi . Γi is a generator of F(K ∋ c). conversely. then i Γi ⊥ K ∩ c ⊥ = {z ∈ Rn | X Tz ∗ (373) 0 . containing C ⊂ ∂K ⊂ Rn on boundary of proper cone K . then Γi cannot belong to the smallest face of K containing c . where F(K ⊃ C) = {x ∈ K | x ⊥ K ∩ C ⊥ } C⊥ {y ∈ Rn | z . It follows that the smallest face F .1. cTz = 0} (374) so Γi ∈ F(K ∋ c) .1 Example. Such a vector z can be realized by a linear feasibility problem: find z ∈ Rn subject to cTz = 0 X Tz 0 ΓTz = 1 i If there exists a solution z for which ΓTz = 1.4.2.13.67 . it is necessary and sufficient to find generators for the smallest face. then Γi ∈ F(K ∋ c) ∗ by (371) and (362) because Γi ⊥ K ∩ c ⊥ . We may do so one generator at a ∗ time:2.2.13.164] By (308). When finding a smallest face.4. generators of K in matrix X may not be diminished in number (by discarding columns) until all generators of the smallest face have been found. 2. DUAL CONE & GENERALIZED INEQUALITY 177 [309.3. solution z is a certificate of null membership. p. Finding smallest face of cone. If this / problem is infeasible for generator Γi ∈ K .13. by the discretized membership theorem 2. membership relation (329) means that each and every point on boundary ∂K of proper cone K belongs to a hyperplane ∗ supporting K whose normal y belongs to dual cone K . 2.

then (373) is infeasible and Γi ∈ F(K ∋ c) is a generator of the smallest face that contains c .2.13. with normal in K . µ∈R find a. CONVEX GEOMETRY Since the constant in constraint ΓTz = 1 is arbitrary positively.68 Hint: A hyperplane. Finding smallest face of pointed closed convex cone.178 CHAPTER 2. Dual generalized inequalities with respect to the positive semidefinite cone in the ambient space of symmetric matrices can therefore be simply stated: (Fej´r) e X 0 ⇔ tr(Y TX) ≥ 0 for all Y 0 (377) Membership to this cone can be determined in the isometrically isomorphic 2 Euclidean space RM via (38). Smallest face of positive semidefinite cone. ( 2.3 Exercise.5 Dual PSD cone and generalized inequality ∗ The dual positive semidefinite cone K is confined to SM by convention. id est. containing cone K is admissible. a full-dimensional cone K is an unnecessary condition. 2. 2.13. II].2. SM + ∗ {Y ∈ SM | Y .24] [39] [196. 2.1.3. ∗ . then by i theorem of the alternative there is correspondence between (373) and (347) admitting the alternative linear problem: for a given point c ∈ K a∈RN .2 Exercise.2.4.3. µ (375) subject to µc − Γi = X a a 0 Now if this problem is feasible (bounded) for generator Γi ∈ K . positive semidefinite matrix Y can be interpreted as inward-normal to a hyperplane supporting the positive semidefinite cone.68 2. X ≥ 0 for all X ∈ SM } = SM + + (376) The positive semidefinite cone is selfdual in the ambient space of symmetric ∗ matrices [61. K = K .4.1) By the two interpretations in 2. Show that formula (371) and algorithms (373) and (375) apply more broadly. exmp.13.13. Derive (221) from (371).

1).1. a necessary but insufficient condition for selfdualness (Figure 62. evokes a particular instance of these dual generalized inequalities (377): X 0 ⇔ yy T . This means each extreme direction of K is normal to a hyperplane exposing one of its own faces.1].2. [30.2. 2.1.5.3) Consider a peculiar vertex-description for a convex cone K defined over a positive semidefinite cone (instead of a nonnegative orthant as in .0.13. 2.13.3. II.13. exmp. X ≥ 0 ∀ yy T( 0) (1475) Discretization ( 2.1 selfdual cones From (131) (a consequence of the halfspaces theorem.9. 2.1 Example.13.0. In three dimensions.4.7).2. or from discretized definition (368) of the dual cone we get a rather self-evident characterization of selfdual cones: K=K ∗ ⇔ K = γ∈G(K) y | γ Ty ≥ 0 (378) In words: Cone K is selfdual iff its own extreme directions are inward-normals to a (minimal) set of hyperplanes bounding halfspaces whose intersection constructs it. (confer 2.3. the positive semidefinite cone SM in the ambient space of symmetric matrices (376).1).A] [61. Linear matrix inequality.2.4. for example).0.13. and + Lorentz cone (178) [20. where the only finite value of the support function for a convex cone is 0 [200. Selfdual cones are necessarily full-dimensional. I] Their most prominent representatives are the orthants (Cartesian cones). a plane containing the axis of revolution of a selfdual cone (and the origin) will produce a slice whose boundary makes a right angle.2.1) allows replacement of positive semidefinite matrices Y with this minimal set of generators comprising the extreme directions of the positive semidefinite cone ( 2.1.5.25]. y TXy ≥ 0 ∀ y ( A.2. DUAL CONE & GENERALIZED INEQUALITY 179 The fundamental statement of positive semidefiniteness. C.

∗ Each of four extreme directions from K belongs to a face of dual cone K . is symmetrical about its axis of revolution. thm.. Cone K . CONVEX GEOMETRY (a) 0 K ∂K ∗ ∂K K (b) ∗ Figure 62: Two (truncated) views of a polyhedral cone K and its dual in R3 . [20. 2.180 CHAPTER 2.21] e. An orthant (or any rotation thereof. In fact. every selfdual polyhedral cone in R3 has an odd number of extreme directions.g. [22. consider an equilateral having five extreme directions. a simplicial cone) is not the only selfdual polyhedral cone in three or more dimensions. shrouded by its dual.3] . Each pair of diametrically opposed extreme directions from K makes a right angle.A.

2. and convex set C ⊆ Rp g −1(rel int C) = ∅ .2) Although matrix A is finite-dimensional.6.1.A. K n is generally not a polyhedral cone (unless m = 1 or 2) simply because X ∈ S+ . DUAL CONE & GENERALIZED INEQUALITY n definition (103)): for X ∈ S+ given Aj ∈ S n .   Am . Convex cone K can be closed. . Theorem. X   . X    T   svec(A1 ) . j = 1 . rel int K = int K (14) ∗ meaning.7] Given affine operator g : Rm → Rp . A svec Xp2 ∈ K ⇒ A(ζ svec Xp1+ξ svec Xp2 ) ∈ K for all ζ . by (175) A svec Xp1 .13.  | X 0 ⊆ Rm . and where symmetric vectorization svec is defined in (56). cone K is full-dimensional ⇒ dual cone K is pointed by (309).12] [309. .1. svec X | X 0  = . convex set D ⊆ Rm . prop. Cone K is indeed convex because.   svec(Am )T 181 (379) {A svec X | X 0} where A ∈ Rm×n(n+1)/2 . [200.0. thm. then rel int g(D) = g(rel int D) . g −1 C = g −1 C (381) ⋄ By this theorem. Inverse image closedness. m    A1 .3. rel int g −1 C = g −1(rel int C) . ξ ≥ 0 (380) since a nonnegatively weighted sum of positive semidefinite matrices must be positive semidefinite. by this corollary: . provided the vectorized Aj matrices are linearly independent. relative interior of K may always be expressed rel int K = {A svec X | X ≻ 0} (382) Because dim(aff K) = dim(A svec S n ) (127) then.2. ( A. K =  . .

2.2. 3] p m Given linear operator A : R → R and closed convex cone X ⊆ Rp .0. dual cone relative interior may always be expressed rel int K = {y ∈ Rm | m ∗ m j=1 yj Aj ≻ 0} (387) Function g(y) on R is an isomorphism when the vectorized Aj matrices ∗ are linearly independent.6. hence. And although we already know that the dual cone is closed. [56.1. thm. X ⊁ 0} (384) Now consider the (closed convex) dual cone: {y {y {y y y | | | | | z . then A svec X is an isomorphism n in X between cone S+ and range R(A) of matrix A . [59] [152] [263] [365] Although we already know that the dual cone is convex ( 2.1) sufficient for convex cone K to be closed and have relative boundary rel ∂K = {A svec X | X K = = = = = ∗ 0 .6] If matrix A has no nontrivial nullspace.1).182 CHAPTER 2.1. it is certified by (381). convex cone K = A(X ) is closed A(X ) = A(X ) if and only if N (A) ∩ X = {0} or N (A) ∩ X rel ∂X ⋄ (383) Otherwise. y ≥ 0 for all X 0} svec X . Inverse image K must . y ≥ 0 for all z ∈ K} ⊆ Rm z . inverse image theorem 2. CONVEX GEOMETRY Corollary. 3] [51. Cone closedness invariance.1 ∗ certifies convexity of K which is the inverse image of positive semidefinite n cone S+ under linear transformation g(y) yj Aj . y ≥ 0 for all z = A svec X .1.9.1. K = A(X ) ⊇ A(X ) ⊇ A(X ).13. uniquely invertible. X A svec X .0. [309.10. ATy ≥ 0 for all X 0 svec−1 (ATy) 0 0} (385) that follows from (377) and leads to an equally peculiar halfspace-description K = {y ∈ Rm | ∗ m yj Aj j=1 0} (386) n The summation inequality with respect to positive semidefinite cone S+ is known as linear matrix inequality. ( 2. By the inverse image closedness theorem.

1 Definition. ζ ≥ 0 X a ζ | aT 1 ≤ 1. vertex-description. ζ ≥ 0 X b | b 0 ⊆ Rn (389) that is simply a conic hull (like (103)) of a finite number N of directions.8.6 Dual of pointed polyhedral cone In a subspace of Rn . DUAL CONE & GENERALIZED INEQUALITY 183 n therefore have dimension equal to dim R(AT )∩ svec S+ (49) and relative boundary ∗ m m m rel ∂K = {y ∈ R | yj Aj j=1 0. denoting its i th extreme direction by Γi ∈ Rn arranged in a matrix X as in (279). a 0. Relative interior may always be expressed rel int K = {X b | b ≻ 0} ⊂ Rn but identifying the cone’s relative boundary in this manner rel ∂K = {X b | b 0 . now we consider a pointed polyhedral cone K given in terms of its extreme directions Γi arranged columnar in X = [ Γ1 Γ2 · · · ΓN ] ∈ Rn×N (279) The extremes theorem ( 2. j=1 ∗ yj Aj ⊁ 0} (388) When this dimension equals m . 2. then dual cone K is full-dimensional rel int K = int K ∗ ∗ (14) which implies: closure of convex cone K is pointed (309).1.1) provides the vertex-description of a pointed polyhedral cone in terms of its finite number of extreme directions and its lone vertex at the origin: 2.6. Given pointed polyhedral cone K in a subspace of Rn .13. b ⊁ 0} (391) (390) . a 0.13. then that cone may be described: (86) (confer (188) (292)) K = = = [ 0 X ] a ζ | aT 1 = 1.0. Pointed polyhedral cone.1.2.13.

Examine Figure 56 or Figure 62.1) and exposing a dual facet when N is ∗ finite.69 That dual cone so defined is unique.7. . Simplicial cone K illustrated in Figure 63 induces a partial order on R2 . . the extreme directions of pointed closed convex cone K constituting the N columns of X ) each define an inward-normal to a hyperplane ∗ supporting dual cone K ( 2. the extreme directions ∗ of proper polyhedral K are respectively orthogonal to the facets of K .4.13.0. Formulae and conversions to vertex-description of the dual cone are in 2.13. some coefficients meeting lower bound zero (b ∈ ∂RN ) do not + necessarily provide membership to relative boundary of cone K .1 allows relaxation of the requirement ∀ x ∈ K in (296) to ∀ γ ∈ {Γi } directly.13. N } ⊆ rel ∂K ∗ (392) because when {Γi } constitutes any set of generators for K . CONVEX GEOMETRY holds only when matrix X represents a bijection onto its range.2. identical to (296). Relationship to dual polyhedral cone.2. id est.13. i = 1 . likewise.1 Example.69 .11. then by closure the conjugate ∗ statement would also hold.13. for example. 2. polyhedral whenever the number of generators N is finite K = y | X Ty ∗ 0 ⊆ Rn (362) ∗ and is full-dimensional because K is assumed pointed.6.1 Facet normal & extreme direction We see from (362) that the conically independent generators of cone K (namely.1). then ∗ dual cone K has a halfspace-description in terms of the extreme directions Γi of K : K = y | γ Ty ≥ 0 for all γ ∈ {Γi .7 Biorthogonal expansion by example 2. in other words.1.184 CHAPTER 2. the extreme directions of pointed K each define an inward-normal to a hyperplane supporting K and exposing a facet when N is finite. All The extreme directions of K constitute a minimal set of generators.13. 2. △ Whenever cone K is pointed closed and convex (not only polyhedral). the extreme directions of proper polyhedral K are ∗ respectively orthogonal to the facets of K . 2. the discretization result in 2. We may conclude.13.4.9 and 2. Were K pointed and finitely generated.6. But K is not necessarily pointed unless K is full-dimensional ( 2.

∗ . Translating a negated cone is quite helpful for visualization: u K w ⇔ u ∈ w − K ⇔ u − w K 0.2. Point y is not comparable to z because z does not belong to y ± K . Point x is comparable to point z (and vice versa) but not to y .13. z K x ⇔ z − x ∈ K ⇔ z − x K 0 iff ∃ nonnegative coordinates for biorthogonal expansion of z − x . DUAL CONE & GENERALIZED INEQUALITY 185 Γ4 Γ2 K ∗ z x K y Γ1 u K ∗ w Γ1 ⊥ Γ 4 Γ2 ⊥ Γ 3 0 w−K Γ3 Figure 63: (confer Figure 169) Simplicial cone K in R2 and its dual K drawn truncated.g. Dotted ray-pairs bound translated cones K . all points less than w (with respect to K) belong to w − K .. e. Points need not belong to K to be comparable. Conically independent generators Γ1 and Γ2 constitute extreme ∗ directions of K while Γ3 and Γ4 constitute extreme directions of K .

732). iff X has no nullspace. Γ T Γ 2 = 0 3 4 Biorthogonal expansion of x ∈ K is then x = Γ1 ΓT x ΓT x 3 4 + Γ2 T ΓT Γ1 Γ4 Γ 2 3 (393) (394) where ΓT x/(ΓT Γ1 ) is the nonnegative coefficient of nonorthogonal projection 3 3 ( E.2). Those coefficients must be nonnegative x K 0 because x ∈ K (324) and K is simplicial. CONVEX GEOMETRY points greater than x with respect to K . .1) of x on Γ1 in the direction orthogonal to Γ3 (y in Figure 169 p.0.186 CHAPTER 2.0.70 Expansion w = XX † w . If we ascribe the extreme directions of K to the columns of a matrix X [ Γ1 Γ 2 ] (395) then we find that the pseudoinverse transpose matrix X †T = Γ3 1 Γ T Γ1 3 Γ4 1 ΓT Γ2 4 (402) (396) holds the extreme directions of the dual cone. they are coordinates in this nonorthogonal system.1).6.t X if and only if the extreme directions of K populating the columns of X ∈ Rn×N are linearly independent.0. for example. for any particular w ∈ Rn more generally.70 X †X = I (403) Possibly confusing is the fact that formula XX † x is simultaneously: the orthogonal projection of x on R(X) (1949). Therefore x = XX † x is biorthogonal expansion (394) ( E. neither do extreme directions Γ3 and Γ4 of dual cone K . id est. 2.1. rather. we have the biorthogonality condition [368] ΓT Γ1 = ΓT Γ2 = 0 4 3 Γ T Γ1 = 0 .r. The extreme directions Γ1 and Γ2 of K do not make ∗ an orthogonal set. is unique w. and where ΓT x/(ΓT Γ2 ) is the nonnegative coefficient of nonorthogonal 4 4 projection of x on Γ2 in the direction orthogonal to Γ4 (z in Figure 169).5. and biorthogonality condition (393) can be expressed succinctly ( E. are contained in the translated cone x + K .1)2. and a sum of nonorthogonal projections of x ∈ R(X) on the range of each and every column of full-rank X skinny-or-square ( E.

1 means Γ1 and Γ2 are linearly independent generators of K ( B.2. When S is real and X belongs to that polyhedral cone K .7. id est. generators because every x ∈ K is their conic combination. When X = QΛQT is symmetric.1).13. T wM X  M T λ i s i wi i=1 (397) Coordinate values depend upon geometric relationship of X to its linearly T independent eigenmatrices si wi .  = .1.13.5.13. X = SS −1 X = [ s1 · · · sM ] . A biorthogonal expansion is necessarily associated with a pointed closed convex cone. From 2. for example. for X ∈ SM M M T qi qi X i=1 X = QQ X = T = i=1 T λ i qi qi ∈ SM (398) . T wM  M T λ i s i wi i=1 (1580) coordinates for biorthogonal expansion are its eigenvalues λ i (contained in diagonal matrix Λ) when expanded in S .1. X = SΛS −1  T w1 .2 we know that means Γ1 and Γ2 must be extreme directions of K . Expansions implied by diagonalization. (confer 6.1 Pointed cones and biorthogonality 187 Biorthogonality condition X † X = I from Example 2.3.7.7.1) T Eigenmatrices si wi are linearly independent dyads constituted by right and left eigenvectors of diagonalizable X and are generators of some pointed polyhedral cone K in a subspace of RM ×M . coordinates for biorthogonal expansion are its eigenvalues when expanded in Q .1 Example.0.0.10.1. B. ( A. = [ s1 · · · sM ] Λ  .8.2.3. We will address biorthogonal expansion with respect to a pointed polyhedral cone not full-dimensional in 2. then coordinates of expansion (the eigenvalues λ i ) must be nonnegative.5).1.1).13.  = . 2.  T w1 X . pointed.1) When matrix X ∈ RM ×M is diagonalizable ( A.8. otherwise there can be no extreme directions ( 2.4.13. DUAL CONE & GENERALIZED INEQUALITY 2.

1. 2. This means matrix X simultaneously belongs to the positive semidefinite cone and to the pointed polyhedral cone K formed by the conic hull of its eigenmatrices.8.188 CHAPTER 2. id est. qi is the corresponding i th eigenvector arranged columnar in orthogonal matrix T and where eigenmatrix qi qi is an extreme direction of some pointed M polyhedral cone K ⊂ S and an extreme direction of the positive semidefinite cone SM . Expansion respecting nonpositive orthant.13.7. and −ei is an extreme direction ( 2. CONVEX GEOMETRY becomes an orthogonal expansion with orthonormality condition QTQ = I where λ i is the i th eigenvalue of X . 2. when X = QΛQT belongs to the positive semidefinite cone in the subspace of symmetric matrices. − Then biorthogonal expansion becomes orthogonal expansion n n x = XX x = i=1 T −ei (−eTx) i = i=1 −ei |eTx| ∈ Rn i − (401) and the coordinates of expansion are nonnegative.2. in fact. + Q = [ q1 q2 · · · qM ] ∈ RM ×M (399) Orthogonal expansion is a special case of biorthogonal expansion of X ∈ aff K occurring when polyhedral cone K is any rotation about the origin of an orthant belonging to a subspace. this expansion x = XX T x applies more broadly to domain Rn .71 Then coordinates for biorthogonal expansion of x must be nonnegative. Suppose. For this orthant K we have orthonormality condition X TX = I where X = −I . in particular. Similarly. for X ∈ SM + M M X = QQ X = i=1 T T qi qi X = i=1 T λ i qi qi ∈ SM + (400) where λ i ≥ 0 is the i th eigenvalue of X . x belongs to the nonpositive orthant K = Rn . absolute value of the Cartesian coordinates. coordinates for orthogonal expansion must be its nonnegative eigenvalues (1483) when expanded in Q . ei ∈ Rn is a standard basis vector.1) of K .71 An orthant is simplicial and selfdual.2 Example. Suppose x ∈ K any orthant in Rn . but then the coordinates each belong to all of R . Of course. .

the ordinary dual cone K will have more generators in any minimal set than K has extreme directions. assume N linearly independent extreme When K is contained in a proper subspace of Rn .13. Then we ∗ need only observe that section of dual cone K in the affine hull of K because. otherwise.3).t K could not be unique.13. by expansion of x . derivation Biorthogonal expansion is a means for determining coordinates in a pointed conic coordinate system characterized by a nonorthogonal basis.r.13. Biorthogonal expansion x = XX † x ∈ aff K = aff cone(X) is expressed in the extreme directions {Γi } of K arranged columnar in X = [ Γ1 Γ2 · · · ΓN ] ∈ Rn×N under assumption of biorthogonality X †X = I (403) (279) (402) where † denotes matrix pseudoinverse ( E). we consider a basis spanning a subspace. 2. We therefore seek.72 as the extreme directions {Γi } of K = cone(X) = {Xa | a 0} ⊆ Rn (103) We assume the quantity of extreme directions N does not exceed the dimension n of ambient vector space because. Those extreme directions must be linearly independent to uniquely represent any point in their span.2. id est. membership x ∈ aff K is implicit and because any breach of the ordinary dual cone into ambient space becomes irrelevant ( 2. Study of nonorthogonal bases invokes pointed polyhedral cones and their duals. expansion w. We consider nonempty pointed polyhedral cone K possibly not full-dimensional. id est. extreme directions of a cone K are assumed to constitute the basis while ∗ those of the dual cone K determine coordinates. then it possesses extreme directions. a vertex-description for K ∩ aff K in terms of linearly independent ∗ dual generators {Γi } ⊂ aff K in the same finite quantity2.72 ∗ . in this ∗ section. DUAL CONE & GENERALIZED INEQUALITY 189 2.8 Biorthogonal expansion.9. Unique biorthogonal expansion with respect to K relies upon existence of its linearly independent extreme directions: Polyhedral cone K must be pointed.

1 x∈K Suppose x belongs to K ⊆ Rn . in which case a is a standard basis vector.13.2. while the last inequality is a consequence of that corollary’s discretization ( 2.1). The pseudoinverse B = X † ∈ RN ×n ( E) is suitable when X is skinny-or-square and full-rank. polyhedral cones whose extreme directions number in excess of the ambient space dimension are precluded in biorthogonal expansion. . Conic independence alone ( 2. a 0 ⇔ ai ei 0 for all i . we know 2. . fat full-rank matrix X is prohibited by uniqueness because of existence of an infinity of right inverses. In other words. . .73 -or-square full-rank). CONVEX GEOMETRY directions hence N ≤ n (X skinny 2. In that case rank X = N . Then x = Xa for some a 0. because c 0 defines a pointed polyhedral cone (in fact.8.13.74 2. Then we require Xa = XBx = x and Bx = BXa = a . 2. Theoretically.73 2.75 From (404) and (392) we deduce K ∩ aff K = cone(X †T ) = {X †T c | c ∗ ∗ 0} ⊆ Rn (405) is the vertex-description for that section of K in the affine hull of K because R(X †T ) = R(X) by definition of the pseudoinverse. From (309). X † X must be a matrix whose entries are each nonnegative. more rows than columns.4. a 0 ⇔ aT X TX †T c ≥ 0 ∀ (c ∀ (c 0 0 ⇔ ⇔ aT X TX †T c ≥ 0 ∀ a 0) ∀i) ΓT X †Tc ≥ 0 i Intuitively. we can take (404) one step further by discretizing c : a 0 ⇔ ΓT Γj ≥ 0 for i. and for all c 0 and i = 1 . Coordinate vector a is unique only when {Γi } is a linearly independent set.74 Vector a ∈ RN can take the form a = Bx if R(B) = RN .10) is insufficient to guarantee uniqueness. j = 1 . a = ei 0. N a 0 ⇔ X † Xa 0 ⇔ aT X TX †T c ≥ 0 ⇔ ΓT X †Tc ≥ 0 i (404) The penultimate inequality follows from the generalized inequality and membership corollary.190 CHAPTER 2.75 “Skinny” meaning thin. The last inequality in (404) is a consequence of the fact that x = Xa may be any extreme direction of K .2.2. N ⇔ X † X ≥ 0 i ∗ In words. . any nonnegative vector a is a conic combination of the standard basis {ei ∈ RN }. the nonnegative orthant in RN ).

i = 1 . ∗ because X † X = I .76 When closed convex cone K is not full-dimensional. . . id est. ΓT Γi = 1. ( 2. assuming x ∈ aff cone X x ∈ K ⇔ γ ∗Tx ≥ 0 for all γ ∗ ∈ Γi . . . i 2. When X is full-rank. 2. and for all i we have eTX † Xa = eTa ≥ 0 i i only when a 0.2.2. suppose full-rank skinny-or-square matrix (N ≤ n) X †T Γ1 Γ2 · · · ΓN ∗ ∗ ∗ ∗ ∈ Rn×N (406) comprises the extreme directions {Γi } ⊂ aff K of the dual-cone intersection with the affine hull of K . Hence x ∈ K .13. ( E) In the present case. the columns constitute its extreme directions. then for i. hence. so is its pseudoinverse X † . the columns of X †T are linearly independent ∗ and generators of the dual cone K ∩ aff K .2 x ∈ aff K ∗ The extreme directions of K and K ∩ aff K have a distinct relationship. Whenever X is full-rank. ∗ . K has no extreme directions. then unique biorthogonal expansion of x ∈ K becomes (402) N x = XX † x = i=1 ∗T Γi x Γi Γi x ∗T (410) must be nonnegative because K is assumed whose coordinates a = pointed closed and convex.j = 1 . N ⇔ X †x 0 that leads to a partial halfspace-description.8. K = x ∈ aff cone X | X † x 0 (409) ∗ ⊂ ∂K ∩ aff K ∗ (407) (408) For γ ∗ = X †T ei .1) having the same number of extreme directions as K . while for i = j . 2.13. N .7.2) That section of the dual cone is itself a polyhedral cone (by (286) or the cone intersection theorem.76 From the discretized membership theorem and (313) we get a partial dual to (392).2.1. any x = Xa . DUAL CONE & GENERALIZED INEQUALITY ∗ 191 K ∩ aff K must be pointed if rel int K is logically assumed nonempty with respect to aff K . Conversely.10.

dual.77 from (313) K = { [ X †T X ⊥ −X ⊥ ]b | b 2. CONVEX GEOMETRY ∗ ΓT Γj = 0.10.0.1) Biorthogonal expansion therefore applies more broadly. biorthogonal expansion (410) becomes x = XX † Xb = Xb . [368.1.4] [203] implying each set of extreme directions is linearly independent.0. 2. ( B.1). 2. is i necessarily orthogonal.1. Thus a halfspace-description K = {x ∈ Rn | X † x 0 . Figure 49) when only three would suffice. A vertex-description of the dual cone. {Γi } nor {Γi } . Yet neither set of extreme directions.9 2.1. meaning.2. vector x can be uniquely expressed x = Xb where b ∈ RN because aff K contains the origin. we can say meaning K lies in a subspace.77 (413) (414) (415) basis N (X T ) aff cone X = aff K = {x | X ⊥T x = 0} ∗ 0 } ⊆ Rn (417) These descriptions are not unique.1 Formulae finding dual cone Pointed K . for any such x ∈ R(X) (confer E. for any x ∈ aff K . might use four conically independent generators for a plane ( 2.9.1. .13.13.192 ∗ CHAPTER 2. Thus. This is a biorthogonality condition. perhaps Rn . X skinny-or-square full-rank We wish to derive expressions for a convex cone and its ordinary dual under the general assumptions: pointed polyhedral K denoted by its linearly independent extreme directions arranged columnar in matrix X such that rank(X ∈ Rn×N ) = N The vertex-description is given: K = {Xa | a ∗ dim aff K ≤ n 0} ⊆ Rn 0} (411) (412) from which a halfspace-description for the dual cone follows directly: K = {y ∈ Rn | X T y By defining a matrix X⊥ (a columnar basis for the orthogonal complement of R(X)). for example. X ⊥T x = 0} (416) and a vertex-description2. precisely.

vertex-description (417) simplifies to K = {X †T b | b ∗ 0} ⊂ Rn ∗ (419) Now. Cone Table 1 simplifies because then aff cone X = Rn : For square X and assuming simplicial K such that rank(X ∈ Rn×N ) = N we have Cone Table S K X †T XT K ∗ dim aff K = n (418) vertex-description X halfspace-description X † For example. 2. having linearly independent generators. that its determination instead in subspace S ⊆ R is identical to its intersection with S .9.13.13. ( E) the dual cone K is simplicial whenever K is.9. X ⊥T K ∗ X †T .13. because dim R(X) = dim R(X †T ) . DUAL CONE & GENERALIZED INEQUALITY 193 These results are summarized for a pointed polyhedral cone. x ≥ 0 for all x ∈ K} = {y ∈ R | y . x ≥ 0 for all x ∈ K} ∩ S (421) .3 Cone membership relations in a subspace ∗ It is obvious by definition (296) of ordinary dual cone K .2. assuming closed convex cone K ⊆ S and ∗ K ⊆R ∗ ∗ (K were ambient S) ≡ (K in ambient R) ∩ S (420) because {y ∈ S | y . ±X ⊥ XT When a convex cone is simplicial ( 2.2 Simplicial case K X † X . in ambient vector space R . and its ordinary dual: Cone Table 1 vertex-description halfspace-description 2.3). id est.12.

13. x = 0 ⇔ 2. for proper cone K with respect to subspace S (confer (325)) x ∈ int K ⇔ y . X ⊥T K ∩ aff K X †T X . y = 0 y . this table reduces to Cone Table S. CONVEX GEOMETRY From this. x > 0 for all y ∈ rel int(K ∩ aff K) ∗ ∗ x ∈ K. x > 0 for all y ∈ K ∩ S . a constrained membership relation for the ordinary dual cone ∗ K ⊆ R . 2. X ⊥T T ∗ (411) We want expressions for the convex cone and its dual in subspace S = aff K : When dim aff K = n . For sake of completeness.78 The dual cone of positive semidefinite matrices SN = SN remains in SN by convention.194 CHAPTER 2. often serves as ambient space.78 ∗ . + + whereas the ordinary dual cone would venture into RN ×N . for example. we also have the conjugate relations. x ≥ 0 for all x ∈ K ∗ (422) By closure in subspace S we have conjugation ( 2.) Yet when S equals aff K for K a closed convex cone x ∈ rel int K ⇔ x ∈ K.4 y .2.1. x = 0 ⇔ (426) (427) Subspace S = aff K Assume now a subspace S that is the affine hull of cone K : Consider again a pointed polyhedral cone K denoted by its extreme directions arranged columnar in matrix X such that rank(X ∈ Rn×N ) = N Cone Table A vertex-description halfspace-description † dim aff K ≤ n K X X . y = 0 y . x > 0 for all y ∈ int K ∩ S ∗ ∗ (424) (425) (By closure. These descriptions facilitate work in a proper subspace.9. y ∈ S and closed convex cone K ⊆ S y∈K ∩S ⇔ x∈K ⇔ ∗ y . assuming x. The subspace of symmetric matrices SN .1): y . x ≥ 0 for all y ∈ K ∩ S (423) This means membership determination in subspace S requires knowledge of dual cone only in S . x > 0 for all y ∈ K ∩ aff K .13.

.9. Monotone nonnegative cone.1. exer. id est. simple algebra demands n x y = i=1 T xi yi = (x1 − x2 )y1 + (x2 − x3 )(y1 + y2 ) + (x3 − x4 )(y1 + y2 + y3 ) + · · · + (xn−1 − xn )(y1 + · · · + yn−1 ) + xn (y1 + · · · + yn ) (430) Because xi − xi+1 ≥ 0 ∀ i by assumption whenever x ∈ KM+ .13. 2] Simplicial cone ( 2.33] [356. eT x ≥ 0} n 0} (428) a halfspace-description where ei is the i th standard basis vector. 2. where 2. .9.4.. Conically independent columns and rows.79 y1 + y 2 ≥ 0 .2 Example. xTy ≥ 0 ∀ X † x . We suspect the number of conically independent columns (rows) of X to be the same for X †T . where † denotes matrix pseudoinverse ( E). i = 1 . y1 + y 2 + · · · + y n ≥ 0 0 (431) (432) (433) 0 ⇔ X Ty X = [ e1 e1 + e2 e1 + e2 + e3 · · · 1 ] ∈ Rn×n With X † in hand.2.4.79 [ e1 −e2 e2 −e3 · · · en ] ∈ Rn×n (429) For any vectors x and y .3. we might concisely scribe the remaining vertex.1) KM+ is the cone of all nonnegative vectors having their entries sorted in nonincreasing order: KM+ {x | x1 ≥ x2 ≥ · · · ≥ xn ≥ 0} ⊆ Rn + † = {x | X x X †T = {x | (ei − ei+1 )Tx ≥ 0. . We can say xTy ≥ 0 for all X † x 0 [sic] if and only if y1 ≥ 0 .i.. ⇔ the columns (rows) of X †T are c.1 Exercise.2.13.i.13. . Instead we use dual generalized inequalities in their derivation. DUAL CONE & GENERALIZED INEQUALITY 195 2. [61. and where2.and halfspace-descriptions from the tables for KM+ and its dual. Prove whether it holds that the columns (rows) of X are c.12. we can employ dual generalized inequalities (322) with respect to the selfdual nonnegative orthant Rn to find the halfspace-description of dual monotone nonnegative + ∗ cone KM+ . n−1.

6 KM+ −0. 1) =  −1  0   1 1 1 X = 0 1 1  0 0 1   0 X †T(: . CONVEX GEOMETRY 1 0. 2) =  1  −1  Figure 64: Simplicial cones. (a) Monotone nonnegative cone KM+ and its ∗ dual KM+ (drawn truncated) in R2 .5 1 1. 3) =  0  1 (b) ∗ ∂KM+  KM+  1 X †T(: .6 KM+ ∗ (a) 0.4 KM+ 0.2 −0.4 −0.196 CHAPTER 2.2 0 −0.8 0. (b) Monotone nonnegative cone and boundary of its dual (both drawn truncated) in R3 . Extreme directions of ∗ KM+ are indicated.5 ∗ −0.5 0 0. .8 −1  0 X †T(: .

5 2 ∗ −2 −1 −0. the orthogonal to the facets of KM+ .2.13. the extreme directions of proper KM+ are respectively ∗ ∗ Because KM+ is simplicial. then by (296) the dual cone we seek comprises all y for which (432) holds.1.9. n} = {y | X Ty 0} ⊂ Rn (434) The monotone nonnegative cone and its dual are simplicial. . So the ∗ vertex-description for KM+ employs the columns of X †T .5 0 KM KM −0. (Figure 65. thus its halfspace-description KM+ = {y ∗ ∗ KM+ ∗ 0} = {y | k i=1 yi ≥ 0 .4.5 KM 0 0. inward-normals to its facets constitute the linearly independent rows of X T by (434). Monotone cone. DUAL CONE & GENERALIZED INEQUALITY x2 1 197 0. the monotone cone is polyhedral and defined by the halfspace-description KM {x ∈ Rn | x1 ≥ x2 ≥ · · · ≥ xn } = {x ∈ Rn | X ∗Tx 0} (435) .5 1 1.5 −1 −1. . Likewise.13. the extreme ∗ directions of proper KM+ are respectively orthogonal to the facets of KM+ whose inward-normals are contained in the rows of X † by (428). 2. From 2.3 Example. Hence the vertex-description for KM+ employs the columns of X in agreement with Cone Table S because X † = X −1 . illustrated for two Euclidean spaces in Figure 64. k = 1 . Because X † x 0 connotes membership of x to pointed KM+ .6. Figure 66) Full-dimensional but not pointed.5 x1 Figure 65: Monotone cone KM and its dual KM (drawn truncated) in R2 .13.

Cartesian coordinate axes are drawn for reference. CONVEX GEOMETRY x3 ∂KM -x2 KM x3 ∂KM ∗ x2 KM ∗ Figure 66: Two views of monotone cone KM and its dual KM (drawn truncated) in R3 . ∗ . Monotone cone is not pointed. Dual monotone cone is not full-dimensional.198 CHAPTER 2.

.13.  −(n − 4) −3   ∈ Rn−1×n .  . In three dimensions.. . X ∗† =  . .9.. −(n − 3) −(n − 3)   . X ∗⊥T K ∗ X ∗†T . n−3 n−3 1 .. n−4   2 ···  2 1 1 1 while ∗ 0} ⊂ Rn (438) ··· . ±X ∗⊥ X ∗T K =K ∗∗ The vertex-description for KM is therefore KM = {[ X ∗†T X ∗⊥ −X ∗⊥ ]a | a where X ∗⊥ = 1 and  n − 1 −1 −1   n − 2 n − 2 −2  . and because KM is closed and convex. . Mathematically describe the respective interior of the monotone nonnegative cone and monotone cone. Inside the monotone cones.13.2..193) as follows: Cone Table 1* vertex-description halfspace-description ∗† X∗ X .. we may adapt Cone Table 1 (p.4 Exercise.. −1 ··· −1 −2 . 2.4..  . ···    . . n 3 . . also describe the relative interior of each face. 2 −(n − 2) −(n − 2)  1 1 −(n − 1) (439) 0 . Because dual monotone cone KM is pointed and satisfies rank(X ∗ ∈ Rn×N ) = N dim aff K ≤ n ∗ (437) where N = n − 1. DUAL CONE & GENERALIZED INEQUALITY Its dual is therefore pointed but not full-dimensional. X ∗⊥T y = 0} (440) −2 −1  KM = {y ∈ Rn | X ∗† y is the dual monotone cone halfspace-description. KM = { X ∗ b ∗ 199 [ e1 − e2 e2 − e3 · · · en−1 − en ] b | b 0 } ⊂ Rn (436) the dual cone vertex-description where the columns of X ∗ comprise its ∗ extreme directions. .

of course.5 CHAPTER 2. id est.2.19) to apply. be removed. ˆ K = {Z(AZ )† b | b ∗ 0} (443) From this and (362) we get a halfspace-description of the dual cone ˆ K = {y ∈ Rn | (Z TAT )† Z Ty 0} (444) From this and Cone Table 1 (p. This can be equivalently written in terms of nullspace of C and vector ξ : K = {Z ξ ∈ Rn | AZ ξ where R(Z ∈ Rn×n−rank C ) rank X 0} (441) N (C ). (1912) ∗ ˆ K = {[ Z †T(AZ )T C T −C T ]c | c 0} (445) (446) Yet because then.9. 2.80 . we are given the ordinary halfspace-description K = {x | Ax 0 .10) are removed. CONVEX GEOMETRY More pointed cone descriptions with equality condition Consider pointed polyhedral cone K having a linearly independent set of generators and whose subspace membership is explicit.13. we get an equivalent vertex-description for the dual cone K = {x | Ax ∗ K = {x | Ax 0} ∩ {x | Cx = 0} = {[ AT C T −C T ]b | b 0}∗ + {x | Cx = 0}∗ 0} (447) from which the conically dependent columns may.193) we get a vertex-description.200 2. m−ℓ×n ˆ where A ∈ R . So we get the ˆ vertex-description. When the conically dependent rows ( 2. by (313).80 ˆ ˆ Then results collected there admit assignment X (AZ )† ∈ Rn−rank C×m−ℓ . the rows remaining must be linearly independent for the Cone Tables (p. for full-rank (AZ )† skinny-or-square. Assuming (411) is satisfied rank (AZ )† ∈ Rn−rank C×m = m−ℓ = dim aff K ≤ n−rank C (442) where ℓ is the number of conically dependent rows in AZ which must ˆ be removed to make AZ before the Cone Tables become applicable. C x = 0} ⊆ Rn (286a) where A ∈ Rm×n and C ∈ Rp×n . followed with linear transformation by Z .

10. x − a ≤ 0 for all x ∈ K} = {y ∈ Rn | y . a new membership relation like (319): y ∈ −(K − a)∗ ⇔ x∈K ⇔ 2.restatement (confer 2.2.3) The general first-order necessary and sufficient condition for optimality of solution x⋆ to a minimization problem with real differentiable convex objective function f (x) : Rn → R over convex feasible set C is [308.1.13.3.1) First-order optimality condition (351) inspires a dual-cone variant: For any set K . The + normal cone to Rn at a ∈ K is (2134) + ⊥ KRn(a ∈ Rn ) = −(Rn − a)∗ = −Rn ∩ a⊥ . This means: When point a is interior to Rn . the selfdual nonnegative orthant in Rn . x − a ≤ 0 for all y ∈ −(K − a)∗ (450) first-order optimality condition . + .1 y .10. If np represents number + of nonzero entries in vector a ∈ ∂ Rn .6) belongs to the normal cone to C at x⋆ as in Figure 67. and a⊥ is the + + orthogonal complement to range of vector a .13. Consider proper cone K = Rn .10.10 Dual cone-translate ( E. the normal cone is the origin. for closed convex cone K y . x⋆ ∈ C (451) id est. DUAL CONE & GENERALIZED INEQUALITY 201 2.2. then dim(−Rn ∩ a⊥ ) = n − np and + + there is a complementary relationship between the nonzero entries in vector a and the nonzero entries in any vector x ∈ −Rn ∩ a⊥ .13.13. Normal cone to orthant. x ≤ 0 for all x ∈ K − a} K⊥ (a) (448) a closed convex cone called normal cone to K at point a . 3] −∇f (x⋆ ) ∈ −(C − x⋆ )∗ .13. From this. x − a ≤ 0 for all x ∈ K (449) and by closure the conjugate. + + + + ∗ a ∈ Rn + (452) where −Rn = −K is the algebraic complement of Rn . the negative dual of its translation by any a ∈ Rn is −(K − a)∗ = {y ∈ Rn | y . 2.1 Example. the negative gradient ( 3.

In circumstance depicted. gradient ∇f (x⋆ ) is normal to γ-sublevel set Lγ f (558) by Definition E. β .6.1.13. contours of equal level f (level sets) drawn dashed in function’s domain. CONVEX GEOMETRY α β C x⋆ γ {z | f (z) = α} α ≥ β ≥ γ −∇f (x⋆ ) {y | ∇f (x⋆ )T (y − x⋆ ) = 0 .202 CHAPTER 2. function is minimized over convex set C at point x⋆ iff negative gradient −∇f (x⋆ ) belongs to normal cone to C there. . and γ .256). normal cone is a ray whose direction is coincident with negative gradient.9. From 2.1. f (x⋆ ) = γ} Figure 67: (confer Figure 78) Shown is a plausible contour plot in R2 of some arbitrary differentiable convex real function f (x) at selected levels α . From results in 3.1.2 (p. So.10. id est.0. gradient is normal to a hyperplane supporting both C and the γ-sublevel set.

b − b⋆ ≤ 0 for all b ∈ Rm−ℓ + ⇔ ˆ −(Z TAT )† Z T ∇f (x⋆ ) ∈ −Rm−ℓ ∩ b⋆⊥ + (457) Then equivalent necessary and sufficient conditions for optimality of conic problem (453) with feasible set K are: (confer (361)) ˆ (Z TAT )† Z T ∇f (x⋆ ) 0.2. Consider a convex optimization problem having real differentiable convex objective function f (x) : Rn → R defined on domain Rn minimize x f (x) subject to x ∈ K (453) Let’s first suppose that the feasible set is a pointed polyhedral cone K possessing a linearly independent set of generators and whose subspace membership is made explicit by fat full-rank matrix C ∈ Rp×n . ℓ is the number of conically dependent rows in AZ ( 2.13.1. Optimality conditions for conic problem.13. for A ∈ Rm×n K = {x | Ax 0 .1. is ˆ K = {Z(AZ )† b | b 0} (443) ˆ where A ∈ Rm−ℓ×n . Cx = 0} ⊆ Rn (286a) (We’ll generalize to any convex cone K shortly.1 ˆ −(Z TAT )† Z T ∇f (x⋆ ) . DUAL CONE & GENERALIZED INEQUALITY 203 2.10.) Vertex-description of this ˆ cone. and Z ∈ Rn×n−rank C holds basis N (C ) columnar. ˆ ∇f (x⋆ )T (Z(AZ )† b − x⋆ ) ≥ 0 ∀ b ˆ −∇f (x⋆ )T Z(AZ )† (b − b⋆ ) ≤ 0 ∀ b because x⋆ ˆ Z(AZ )† b⋆ ∈ K 0 0 (454) (455) (456) From membership relation (449) and Example 2. assuming (AZ )† skinny-or-square full-rank. From optimality condition (351). we are given the halfspace-description.2 Example.10. id est.13. m−ℓ R+ m−ℓ R+ ˆ ∇f (x⋆ )T Z(AZ )† b⋆ = 0 (458) .10) which must be removed. b⋆ 0.

then C = 0. + minimize x f (x) 0 Rn + subject to x (460) Necessary and sufficient optimality conditions become (confer [61.3] because the dual variable becomes gradient ∇f (x). + 2.2.13. A complementarity problem in nonlinear function f is nonconvex find z ∈ K ∗ subject to f (z) ∈ K z . and equivalent to well-known Karush-Kuhn-Tucker (KKT) optimality conditions [61. 2. 5. A = Z = I ∈ S n . 4.10. id est. f (z) = 0 [211] (462) yet bears strong resemblance to (459) and to (2070) Moreau’s decomposition ∗ theorem on page 755 for projection P on mutually polar cones K and −K .81 .3 Example. When K = Rn . ∇f (x⋆ ) ∈ K .3]) ∇f (x⋆ ) 0.5.1. ∇f (x⋆ )T x⋆ = 0 (461) equivalent to condition (329)2. CONVEX GEOMETRY x⋆ ∈ K . x⋆ ∈ K ∗ (451) whose membership to normal cone. by (444). ∗ CHAPTER 2. ∇f (x⋆ )T x⋆ = 0 (459) This result (459) actually applies more generally to any convex cone K comprising the feasible set: Necessary and sufficient optimality conditions are in terms of objective gradient −∇f (x⋆ ) ∈ −(K − x⋆ )∗ . assuming only cone K convexity.81 (under nonzero gradient) for membership to the nonnegative orthant boundary ∂Rn . Complementarity problem.204 expressible. in particular. Rn + x⋆ Rn + 0. ⊥ −(K − x⋆ )∗ = KK (x⋆ ∈ K) = −K ∩ x⋆⊥ (2134) equivalently expresses conditions (459).

there must be a complementary relationship between the nonzero entries of vectors w and z . z = PK x and −f (z) = P−K∗ x .0. then z belongs to the convex set of solutions: ⊥ z ∈ −KRn(w ∈ Rn ) ∩ A = Rn ∩ w⊥ ∩ A + + + ⊥ where KRn(w) is the normal cone to Rn at w (452). Given w 0.2. Linear complementarity problem.9.10. DUAL CONE & GENERALIZED INEQUALITY 205 Identify a sum of mutually orthogonal projections x z −f (z) .4) and z is a solution to the complementarity problem iff it is a fixed point of z = PK x = PK (z − f (z)) (463) Given that a solution exists. . this linear complementarity problem can be solved by identifying w ← ∇f (z) = q + Bz then substituting that gradient into (462) find z ∈ K ∗ subject to ∇f (z) ∈ K z . p. wi zi = 0 ∀ i . ∇f (z) = 0 2.155] Elegant proofs of equivalence between complementarity e problem (462) and fixed point problem (463) are provided by N´meth [Wıκımization. If this intersection is nonempty. a prototypical complementarity problem on the nonnegative orthant K = Rn is linear in w = f (z) : + find z 0 subject to w 0 wTz = 0 w = q + Bz (464) This problem is not convex when both vectors w and z are variable.9.13. Then f (z) ∈ K ( E.13.82 (465) But if one of them is fixed. [204. existence of a fixed point would be guaranteed by theory of contraction.3. id est. Complementarity problem]. [87] [273] [313] Given matrix B ∈ Rn×n and vector q ∈ Rn . p. uniqueness cannot be assured.1. then + + the problem is solvable.300] But because only nonexpansivity (Theorem E. 2.4 Example.2. in Moreau’s ∗ terms. then the problem becomes convex with a very simple geometric interpretation: Define the affine subset A {y ∈ Rn | B y = w − q} For wTz to vanish.2.82 Notwithstanding.2 no.1) is achievable by a projector. [228.

by means of first-order optimality condition (351) or (451). constrained to affine set A = {x | C x = d}.10) of an arbitrary proper polyhedral cone K in Rn arranged columnar in X ∈ Rn×N such that N > n (fat) and rank X = n .83 Proper nonsimplicial K .10. Show. 2. Suitable f (z) is the quadratic objective from convex problem minimize z 1 T z Bz 2 + q Tz subject to z 0 (466) n which means B ∈ S+ should be (symmetric) positive semidefinite for solution of (464) by this method. CONVEX GEOMETRY which is simply a restatement of optimality conditions (459) for conic problem (453). that necessary and sufficient optimality conditions are: (confer (459)) x⋆ ∈ K C x⋆ = d (468) ∗ ∇f (x⋆ ) + C Tν ⋆ ∈ K ∇f (x⋆ ) + C Tν ⋆ .206 CHAPTER 2.13.1. Optimality for equality constrained conic problem. then assume we are given a set of N conically independent generators ( 2.5.13.3]. the easiest way to find a vertex-description of proper dual an optimal dual variable. these optimality conditions are equivalent to the KKT conditions [61. Then (464) has solution iff (466) does.5 Exercise. 5. Consider a conic optimization problem like (453) having real differentiable convex objective function f (x) : Rn → R minimize x f (x) (467) subject to C x = d x∈K where ν ⋆ is any vector2. x⋆ = 0 2. minimized over convex cone K but.83 satisfying these conditions. Having found formula (419) to determine the dual of a simplicial cone. this time. dual.11 Since conically dependent columns can always be removed from X to ∗ construct K or to determine K [Wıκımization]. 2. X fat full-rank .

84 0        (472) Because extreme directions of the simplices K i are extreme directions of K by assumption.   . Then proper dual cone K ∗ is the intersection of M dual simplicial cones K i . b .   c K = { [ X1 X2 · · · XM ] d | d 2.13. can no longer be unique because number N of extreme directions in K exceeds dimension n of the space.3.11. For Xi ∈ Rn×n . that we know how to perform simplicial decomposition efficiently. then 0} (473) That proposition presupposes. 3.1. Dual cone intersection.0.1 Theorem. . . like (410). id est. here.12. corresponding simplicial K i ( 2. .2.84 Each component simplicial cone in K corresponds to some subset of n linearly independent columns from X .  | a . M K= i=1 Ki ⇒ K = ∗ M i=1 Ki ∗ (469) ⋄ Proof. 3. also called “triangulation”. a complete matrix of linearly independent extreme directions (p. [332. M M 0} (470) K = i=1 Ki = i=1 {Xi c | c 0} (471) The union of all K i can be equivalently expressed    a     b  [ X1 X2 · · · XM ]  . The key idea.2.1) has vertex-description K i = {Xi c | c Now suppose.1] [175. 2.7] Suppose proper cone K ⊂ Rn equals the union of M simplicial cones K i whose ∗ extreme directions all coincide with those of K . of course. Finding the dual of K amounts to finding the dual of each simplicial part: 2.144) arranged columnar. c K =  . is how the extreme directions of the simplicial parts must remain extreme directions of K .13.1] Existence of multiple simplicial parts means expansion of x ∈ K . [305] [174. DUAL CONE & GENERALIZED INEQUALITY ∗ 207 cone K is to first decompose K into simplicial parts K i so that K = K i .

. i = 1 . Defining X [ X1 X2 · · · XM ] (with ∗ any redundant [sic] columns optionally removed from X ). the extreme ∗ directions of dual cone K are respectively orthogonal to the facets of K .i. . n | Xi† (j.193). N (476) where c.0. Γ2 . Xi†T(:. CONVEX GEOMETRY by the extremes theorem ( 2. . ℓ = 1 . Given conically independent generators for pointed closed convex cone K in . For any particular proper polyhedral cone K . those ∗ normals are extreme directions of the dual simplicial cones K i .:)Γℓ ≥ 0.j) . From the theorem and Cone Table S (p.142) shows a cone and its dual found via this algorithm. K = ∗ M ∗ Ki M = i=1 ∗ i=1 {Xi†T c | c 0} ∗ (475) The set of extreme directions {Γi } for proper dual cone K is therefore constituted by those conically independent generators. argument (:.j) denotes the j th column while (j. Γ 1 . .1. Cone Table S p.1).:) denotes the j th row. M . .2 Example. Figure 50b (p. .208 CHAPTER 2.13. that do not violate discrete ∗ definition (362) of K . and {Γℓ } constitutes the extreme directions of K .193) K = {y | X y ∗ M T M 0} = i=1 {y | XiTy 0} = i=1 Ki ∗ (474) To find the extreme directions of the dual cone. Γ N ∗ ∗ ∗ = c.8.6.1. first we observe that some facets of each simplicial part K i are common to facets of K by assumption. ( 2. j = 1 .13. from the columns of all the dual simplicial matrices {Xi†T } . .1) Then the extreme directions of the dual cone can be found among inward-normals to facets of the component simplicial cones K i . then K can be expressed ((362).i. . and the union of all those common facets comprises the set of all facets of K by design. denotes selection of only the conically independent vectors from the argument set.11. 2. Dual of K nonsimplicial in subspace aff K .

   .  1 1 0  −1 0 0 X2 =   0 −1 1 0 0 −1  1 0 0  0 1 0 X4 =   −1 0 1 0 −1 −1          (478) The corresponding dual simplicial cones in aff K have generators respectively columnar in     2 1 1 1 2 1  −2  1 1  2 1  †T †T . 4X4 =  −1 4X3 =   −1 −1   −1 −2 2  3 −1 −1 −2 −1 −2 −1 Applying algorithm (476) we get  1 2 3 2 1 1 2 −1 −2   =   1 −2 −1 2  4 −3 −2 −1 −2  Γ 1 Γ 2 Γ3 Γ4 ∗ ∗ ∗ ∗ (480) . DUAL CONE & GENERALIZED INEQUALITY R4 arranged columnar in  1 1 0 0  −1 0 1 0   Γ4 ] =   0 −1 0 1  0 0 −1 −1  209 X = [ Γ1 Γ2 Γ3 (477) having dim aff K = rank X = 3. 4X2 =  −3  4X1 =   2 −3  1 −2  1 1  −2 1 −3 1 −2 −3 (479)     3 −1 2 3 2 −1   −1 3 −2  2 −1  †T †T  . (281) then performing the most inefficient simplicial decomposition in aff K we find 1 1 0  −1 0 1 X1 =   0 −1 0 0 0 −1  1 0 0  −1 1 0 X3 =   0 0 1 0 −1 −1    .13.2.

2.11. 2. ( 2.1) this proper polyhedral K = cone(X) has six (three-dimensional) facets generated G by its {extreme directions}:     F1          F2            F3 = G  F4          F5            F6 Γ 1 Γ2 Γ3 Γ 1 Γ2 Γ1 Γ4 Γ1 Γ3 Γ4 Γ3 Γ4 Γ2 Γ3     Γ5     Γ5    Γ5     Γ5 (483) These calculations proceed so as to be consistent with [115. we find the six extreme directions of dual cone K (with Γ2 = Γ5 ) ∗ ∗ ∗ ∗ ∗ ∗ X∗ = Γ 1 Γ2 Γ3 Γ 4 Γ 5 Γ 6 1 0 0  1 0 0 =   1 0 −1 1 −1 −1  1 1 1 0 0 −1 1 0  1 0   1  0 (482) which means. In that ambient space.86 There are no linearly dependent combinations of three or four extreme directions in the primal cone.13. 2.85 a conically independent set ∗ of generators for that pointed section of the dual cone K in aff K . id est.2.0.13. 4. as if the ambient vector space were proper subspace aff K whose dimension is 3. Dual of proper polyhedral K in R4 .86 Applying algorithm ∗ ∗ (476). Yet that author (from the citation) erroneously states dimension of the ordinary dual cone to be 3 .6.210 CHAPTER 2.3 Example. CONVEX GEOMETRY whose rank is 3.2. and is the known result. 6]. in fact. K may be regarded as a proper cone. it is. ∗ K ∩ aff K .85 . Given conically independent generators for a full-dimensional pointed closed convex cone K 1 1 0 1  −1 0 1 0 Γ5 ] =   0 −1 0 1 0 0 −1 −1   0 1   0  0 X = [ Γ1 Γ2 Γ3 Γ4 (481) we count 5!/((5 − 4)! 4!) = 5 component simplices.

3 whose convex hull passes through that cone’s interior. Name two extreme directions Γi of cone K from Example 2. Reaching proper polyhedral cone interior. With respect to vector v in some finite-dimensional Euclidean space Rn .11.2.t pointed closed convex K .11. like biorthogonal expansion w. We can check this result (482) by reversing the process.13. define a coordinate t⋆ of point x in full-dimensional pointed closed convex v cone K (485) t⋆ (x) sup{t ∈ R | x − tv ∈ K} v Three combinations of four dual extreme directions are linearly dependent. Conic coordinates. DUAL CONE & GENERALIZED INEQUALITY whereas dual proper polyhedral cone K has only five:  ∗  ∗ ∗ ∗ ∗   F1   Γ1 Γ2 Γ3 Γ4     ∗  ∗ ∗   F2   Γ1 Γ2   Γ∗     6  ∗ Γ∗ Γ ∗ Γ ∗ Γ∗ G F3 = 4 5 6  ∗  1  ∗  F4    Γ3 Γ∗ Γ∗     4 5  ∗      ∗ ∗ ∗ ∗  F5 Γ2 Γ3 Γ 5 Γ6 ∗ 211 (484) Six two-dimensional cones.0.3). Explain why.13.87 Applying algorithm (476) to those simplices returns a conically independent set of generators for K equivalent to (481).r. 2. having generators respectively {Γ1 Γ3 } {Γ2 Γ4 } ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ {Γ1 Γ5 } {Γ4 Γ6 } {Γ2 Γ5 } {Γ3 Γ6 } . we find 6!/((6 − 4)! 4!) − 3 = 12 component simplices in the dual cone.4 Exercise.0.2. But there are no linearly dependent combinations of three dual extreme directions. 2.13.12 coordinates in proper nonsimplicial system A natural question pertains to whether a theory of unique coordinates. is extensible to proper cones whose extreme directions number in excess of ambient spatial dimensionality. 2.0.13. ∗ so cannot be two-dimensional faces of K (by Definition 2.13. are relatively interior to dual facets.1 Theorem.0.87 . Are there two ∗ such extreme directions of dual cone K ? ∗ ∗ ∗ ∗ 2. they belong to the dual facets.6.0.12.

To the i th extreme direction v = Γi ∈ Rn of cone K . by boundary membership relation for full-dimensional pointed closed convex cones ( 2. comprises generators for cone K .  t⋆ (x) =  . If x − t⋆ v is nonzero.0. x−t⋆ v ∈ K . its extreme directions which constitute a minimal generating set. id est. λ ∈ K ∗ ∗ (330) This description of K means: cone K is an intersection of halfspaces whose inward-normals are generators of the dual cone. Coordinate t⋆ (c) = 0. Each and every vector x − t⋆ v on the cone boundary must therefore be orthogonal ∗ to an extreme direction constituting generators G(K ) of the dual cone. for v example. the mapping   t⋆ (x) 1  . Vector x − t⋆ v must belong to the cone boundary ∂K by definition (485). of possibly infinite cardinality N ..2) x − t⋆ v ∈ ∂K ⇔ ∃ λ = 0 where ∗ λ . x−t⋆ v = 0 .12.  : Rn → RN (486) . ⋆ tN (x) K = {z ∈ Rn | λ . So there must exist a nonzero vector λ that is inward-normal to a hyperplane supporting cone K and containing x − t⋆ v . ⋄ Conic coordinate definition (485) acquires its heritage from conditions (375) for generator membership to a smallest face. if t⋆ (x) = t⋆ (y) for each and every extreme v v direction v of K then x = y .13. corresponds to unbounded µ in (375). z ≥ 0 for all λ ∈ G(K )} (367) . w ≥ 0 for all v ∈ G(K)} λ . λ ∈ K . The set G(K) .g. On domain K . Each and every face of cone K (except the cone itself) belongs to a hyperplane supporting K . indicating.2 Proof. extreme direction v cannot belong to the smallest face of cone K that contains c . any such vector λ must belong to the dual cone boundary by conjugate boundary membership relation λ ∈ ∂K ⇔ ∃ x−t⋆ v = 0 where ∗ K = {w ∈ Rn | v . e.212 CHAPTER 2. 2.13. CONVEX GEOMETRY Given points x and y in cone K . x − t⋆ v ∈ K (329) (368) ∗ is the full-dimensional pointed closed convex dual cone. x − t⋆ v = 0 . ascribe a coordinate ⋆ ti (x) ∈ R of x from definition (485).

13.2. Otherwise. . index j must be judiciously selected from a set I .13. 2. we need only show mapping t⋆ (x) to be invertible. Because such dual extreme directions preexist by (329). x is recoverable given t⋆ (x) : N x = arg minimize n x∈R ˜ subject to x − t⋆ Γi ∈ K . it establishes existence of dual extreme directions {Γj∗ } that will reconstruct . the mapping t⋆ (x) is equivalent to a convex problem (separable in index i) whose objective (by (329)) is tightly bounded below by 0 : t⋆ (x) ≡ arg minimize t∈RN N i=1 ∗T Γj(i)(x − ti Γi ) (487) i=1 . DUAL CONE & GENERALIZED INEQUALITY 213 has no nontrivial nullspace. ˜ i i=1 ∗T Γj(i)(˜ − t⋆ Γi ) x i (488) i=1 . [129. closed. To prove injectivity when extreme-direction cardinality N > n exceeds spatial dimension. and convex by assumption. t⋆ (x) is invertible. thm. t⋆ (x) holds coordinates for biorthogonal expansion. in the case N ≤ n .2.9. N subject to x − ti Γi ∈ K . . The i objective function’s linear part describes movement in normal-direction −Γj∗ for each of N hyperplanes. Because extreme-direction cardinality N for i ∗ cone K is not necessarily the same as for dual cone K . . N The feasible set of this nonseparable convex problem is an intersection of translated full-dimensional pointed closed convex cones i K + t⋆ Γi . So we need simply choose N spanning dual extreme directions that make the optimal objective vanish. Because the ∗ dual cone K is full-dimensional. where index j ∈ I is dependent on i and where (by (367)) λ = Γj∗ ∈ Rn is an ∗ extreme direction of dual cone K that is normal to a hyperplane supporting K and containing x − t⋆ Γi . The optimal point of hyperplane intersection is the unique solution x when {Γj∗ } comprises n linearly independent normals that come from the dual cone and make the objective vanish. .3] id est. pointed.1 reconstruction from conic coordinates The foregoing proof of the conic coordinates theorem is not constructive. Reconstruction of x is therefore unique.12. Because x − t⋆ v must belong to ∂K by definition. ∗ there exist N extreme directions {Γj∗ } from K ⊂ Rn that span Rn .

88 meaning.2. Under this assumption of strong duality.283) is satisfied. sup L(t .214 CHAPTER 2. but does not prescribe the index set I . . t λ K∗ 0 1 − λT v = 0 Dual variable (Lagrange multiplier [251. primal and dual optimal objectives are equal t⋆ = λ⋆T x assuming Slater’s condition (p. which implies. CONVEX GEOMETRY a point x from its coordinates t⋆ (x) via (488). We describe the latter: Convex problem (P) (P) maximize t t∈R minimize n λ∈ R λT x (D) (489) subject to x − tv ∈ K subject to λT v = 1 ∗ λ∈K is equivalent to definition (485) whereas convex problem (D) is its dual. i=1 . p. we propose inversion by alternating solution of primal and dual problems N minimize n x∈R i=1 Γ∗T(x − t⋆ Γi ) i i t⋆ Γi i ∈K. .216]) λ generally has a nonnegative sense for primal maximization with any cone membership constraint. λ⋆T (x − t⋆ v) = t⋆ (1 − λ⋆T v) = 0 . .88 Form a Lagrangian associated with primal problem (P): L(t . the other is a geometric method that searches for a minimum of a nonconvex function. λ) = t + λT (x − t v) = λT x + t(1 − λT v) . whereas λ would have a nonpositive sense were the primal instead a minimization problem having a membership constraint. There are at least two computational methods for specifying ∗ {Γj(i) } : one is combinatorial but sure to succeed. the primal problem is equivalent to minimize λ⋆T(x − tv) t∈R (P) (490) subject to x − tv ∈ K while the dual problem is equivalent to minimize n λ∈ R subject to λT v = 1 ∗ λ∈K λT (x − t⋆ v) (D) (491) Instead given coordinates t⋆ (x) and a description of cone K . N (492) subject to x − 2. λ) = λT x .

not in theory. Convex problems (492) and (493) are iterated until convergence which is guaranteed by virtue of a monotonically nonincreasing real sequence of objective values. But this global optimality condition cannot be guaranteed by the alternation because the common objective function. i ∗ Γ∗ ∈ K . is generally neither quasiconvex or monotonic. A nonzero objective indicates that the global minimum of a multimodal objective function could not be found by this alternation. Γ∗T(x⋆ − t⋆ Γi ) i i i=1 Γ∗T Γi = 1 . . . that suitable dual extreme directions {Γj∗ } always exist. i N i=1 .0. N i=1 .13.13. id est. . Convergence can be fast.2. DUAL CONE & GENERALIZED INEQUALITY minimize ∗ n subject to Γi ∈ R 215 where dual extreme directions Γ∗ are initialized arbitrarily and ultimately i ascertained by the alternation. when Γ∗T(x⋆ − t⋆ Γi ) = 0 ∀ i . But such an algorithm can be combinatorial.12.89 A numerical remedy is to reinitialize the Γ∗ to i different values. 2.2. a nonzero objective at convergence is a certificate that inversion was not performed properly. the unique inversion x .2. a zero i i objective can occur only at the true solution x . means that a global optimization algorithm would always find the zero objective of alternation (492) (493). ( 3. That is a flaw in this particular iterative algorithm for inversion. . when regarded in both primal x and dual Γ∗ variables i simultaneously.8. The mapping t⋆ (x) is uniquely inverted when the necessarily nonnegative objective vanishes.0. N (493) The Proof 2. hence.89 . Here.3) Conversely.0.

216 CHAPTER 2. CONVEX GEOMETRY .

2.1 to finite-dimensional Euclidean space. with its tables of first. whereas the concave icon is the inverted bowl. exer.3.3.Chapter 3 Geometry of convex functions The link between convex sets and convex functions is via the epigraph: A function is convex if and only if its epigraph is a convex set.1 217 . concave.46]. is the practical adjunct to this chapter. v2012.01. Mεβoo Publishing USA.5]. 2005. All rights reserved. Convex Optimization & Euclidean Distance Geometry. [61. vector. Appendix D. co&edg version 2012. the reader is reminded that the set of all convex. 3.01. Then an icon for a one-dimensional (real) convex function is bowl-shaped (Figure 77). 3.28.and second-order gradients. respectively characterized by a unique global minimum and maximum whose existence is assumed. Because of this simple relationship.g. citation: Dattorro. quasiconvex. e. Despite iconic imagery. usage of the term convexity is often implicitly inclusive of concavity.or matrix-valued functions including the real functions. −Werner Fenchel We limit our treatment of multidimensional functions 3. and quasiconcave functions contains the monotonic functions [204] [216.6.28.. 2001 Jon Dattorro.

Z ∈ dom f and α . p. Linear (and affine 3. 3.4. Reversing sense of the inequality flips this definition to concavity. + . for each and every Y. Z ∈ dom f and 0 ≤ µ ≤ 1 f (µ Y + (1 − µ)Z) RM + µf (Y ) + (1 − µ)f (Z ) (496) As defined. a proper + cone.2 Apparently some.4)3. e.g. .t RM ⇔ wTf convex ∀ w ∈ G(RM ) + + 3. β ∈ R f (α Y + βZ) = αf (Y ) + βf (Z ) (495) A vector-valued function f (X) : Rp×k → RM is convex in X if and only if dom f is a convex set and.2.13. continuity is implied but not differentiability (nor smoothness).1 3. Function f (X) is linear in X on its domain if and only if. ∗ ∗ Referred to in the literature as K-convexity. Linear functions are therefore simultaneously convex and concave.3 While linear functions are not invariant to translation (offset). [244]..2.0. but not all.4 Definition of convexity can be broadened to other (not necessarily proper) cones.13.4 In this case.2 ∗ (497) Figure 68b illustrates a nondifferentiable convex function.3 functions attain equality in this definition. [296] RM (497) generalizes to K .1. the test prescribed by (496) is simply a comparison on R of each entry fi of a vector-valued function f .3] of its range (a subset of RM ). fM (X)  Vector-valued function f (X) : Rp×k → RM (494) assigns each X in its domain dom f (a subset of ambient vector space Rp×k ) to a specific element [259.218 CHAPTER 3. for each and every Y.3. convex functions are.r. GEOMETRY OF CONVEX FUNCTIONS 3.3. =   . This conclusion follows from theory of dual generalized inequalities ( 2. ( 2.3) The vector-valued function case is therefore a straightforward generalization of conventional convexity theory for a real function.1) which asserts f convex w. nonlinear functions are convex. Differentiability is certainly not a requirement for optimization of convex functions by numerical methods. Vector-valued functions are most often compared (182) as in (496) with respect to the M -dimensional selfdual nonnegative orthant RM .1 Convex function real and vector-valued function  f1 (X)   . 3.

More tests for strict + convexity are given in 3. for each and every distinct Y and Z in dom f and all 0 < µ < 1 on an open interval f (µ Y + (1 − µ)Z) ≺ µf (Y ) + (1 − µ)f (Z ) RM + (499) then we shall say f is a strictly convex function.2. the test for convexity of a vector-valued function is a comparison on R of each entry.2. M } + ∗ (498) (the standard basis for RM and a minimal set of generators ( 2.2.3. Strict convexity of a real function is only a sufficient condition for minimizer uniqueness.1. . shown by substitution of the defining inequality (496).1) of a semiinfinite number of conditions {w ∈ RM } to: + {w ∈ G(RM )} ≡ {ei ∈ RM . id est.0. .6.13. . Discretization allows ∗ relaxation ( 2. √ (x) = x2 = x 2 is strictly convex whereas nondifferentiable function f1 2 f2 (x) = x2 = |x| = x 2 is convex but not strictly.r.2 strict convexity When f (X) instead satisfies. and 3.2) for RM ) + from which the stated conclusion follows. For x ∈ R .0.3.4. CONVEX FUNCTION f1 (x) f2 (x) 219 (a) (b) Figure 68: Convex real functions here have a unique minimizer x⋆ .8. Similarly to (497) f strictly convex w. w = 0} (325) to a finite number (498).1. i = 1 . 3.1.6.7.t RM ⇔ wTf strictly convex ∀ w ∈ G(RM ) + + ∗ (500) discretization allows relaxation of the semiinfinite number of conditions ∗ {w ∈ RM .1. 3.4.

3. Quadratic real functions xTAx + bTx + c are convex in x iff A 0. e. for each and every ∗ w ∈ int RM . not unique.2) Study of multidimensional objective functions is called multicriteria.[326] or multiobjective. ( 2. it is the unique minimizer of wTf . 3. + It is more customary to consider only a real function for the objective of a convex optimization problem because vector.2.5) at its minimum over C is necessary and sufficient for uniqueness. uniqueness of solution A local minimum of any convex real function is also its global minimum. ( 3. then there exists a nonzero ∗ w ∈ RM such that f (X ⋆ ) minimizes wTf . But strict convexity is not necessary for minimizer uniqueness: existence of any strictly supporting hyperplane to a convex function epigraph (p. then f (X ⋆ ) is a minimal element of its range.or matrix-valued functions can introduce ambiguity into the optimal objective value.2.g. [304.5 .1. Figure 68a illustrates a strictly convex real function.or vector-optimization.3] + If f (X ⋆ ) is a minimal element of its range.2. GEOMETRY OF CONVEX FUNCTIONS local/global minimum. 3.5 function and set C convex.2.0. In fact. it is sufficient (Figure 68) that f (X) be a strictly convex real3.6.1 CHAPTER 3.2 minimum/minimal element. for the optimal solution set in (502) to be a single point.2.6. id est. dual cone characterization f (X ⋆ ) is the minimum element of its range if and only if. for example. conversely. The vector 2-norm square x 2 (Euclidean norm square) and Frobenius’ norm square X 2 .4. (Figure 69) [61.1. in general.1) Quadratics characterized by positive definite matrix A ≻ 0 are strictly convex and vice versa. any convex real function f (X) has unique minimum value over any convex subset of its domain.. p.1.217.7.220 3. are strictly F convex functions of their respective argument (each absolute norm is convex but not strictly). 3. given minimization of a convex real function over some convex feasible set C minimize X f (X) subject to X ∈ C any optimal solution X ⋆ comes from a convex set of optimal solutions X ⋆ ∈ {X | f (X) = inf f (Y ) } ⊆ C Y∈C (501) (502) But a strictly convex real function has a unique minimizer X ⋆ . If f (X ⋆ ) minimizes wTf for some + ∗ w ∈ int RM .123] Yet solution to some convex optimization problem is. 2.

) . (b) Point f (X ⋆ ) is a minimal element of depicted range.3. (a) Point f (X ⋆ ) is the unique minimum element of function range Rf . (Cartesian axes drawn for reference. CONVEX FUNCTION 221 Rf (b) f (X ⋆ ) f (X ⋆ ) w f (X)2 f (X)1 Rf w (a) f f Figure 69: (confer Figure 40) Function range is convex for a convex problem.1.

. Cone of convex functions. id est.1).6 Strict case excludes cone’s point at origin and zero weighting. Indeed. whereas any affine function must satisfy (505). h − a = 0 for all ν ∈ A⊥ (505) In other words: all affine functions are convex (with respect to any given proper cone).2. Relatively interior to each face of this cone are the strictly convex functions of corresponding dimension. The cone of convex functions.1. provides foundation for what is known as a Lagrangian function.2. given point a ∈ A . so we may write    M∗   M w R+ R+ f f g convex w. A real Lagrangian arises from the scalar term in (504).t  K  ⇔ [ wT λT ν T ] g convex ∀  λ  ∈  K∗  h h ν A A⊥ (504) ⊥ where A is a normal cone to A . Prove that relation (497) implies: The set of all convex vector-valued functions forms a convex cone in some space.222 CHAPTER 3.3. [253.10. So trivial function f = 0 is convex. A normal cone to an affine subset is the orthogonal complement of its parallel subspace ( E.398] [282] Consider a conic optimization problem. Conic origins of Lagrangian.1. Membership relation (504) holds because of equality for h in convexity criterion (496) and because normal-cone membership relation (450). GEOMETRY OF CONVEX FUNCTIONS 3. p.6 How do convex real functions fit into this cone? Where are the affine functions? 3. L f [ w T λT ν T ] g h = w T f + λT g + ν T h (506) subject to g(x) K 0 h(x) ∈ A (503) 3.2.3. all convex functions are translation invariant.1 Exercise.2. any nonnegatively weighted sum of convex functions remains convex. implied by membership relation (497).2 Example.2. becomes h∈A ⇔ ν .r. for proper cone K and affine subset A minimize x f (x) A Cartesian product of convex functions remains convex.

. in this sense.7 3. 2-. theoretically.2. Vector 1 may be replaced with any positive [sic] vector to get absolute value. ABSOLUTE VALUE 223 3. PRACTICAL NORM FUNCTIONS. But to statisticians and engineers. p. describes a norm ball.3. .53]. f (αx) = |α|f (x) (nonnegativity) 3.59] [160. whose focus is predominantly 1-norm. .8 = minimize t∈ R t tI x xT t 0 n+1 S+ subject to where x 2 (508) = x √ xTx = t⋆ . y ∈ Rn . ratios of different norms are bounded above and below by finite predeterminable constants. = α∈ R . as evidenced by the compressed sensing (sparsity) revolution. x ∞ = minimize t∈ R t x t1 where max{|xi | . p. n} = t⋆ because x ∞ = max{|xi |} ≤ t ⇔ |x| Absolute value |x| inequality. β∈ R minimizen 1T (α + β) n subject to α . f (x) ≥ 0 (f (x) = 0 ⇔ x = 0) 3.3. “all norms on Rn are equivalent” [160. Most useful are 1-. meaning. p. A norm on Rn is a convex function f : Rn → R satisfying: for x. and ∞-norm: x 1 1T t subject to −t x 2 x t (507) where |x| = t⋆ (entrywise absolute value equals optimal t ).52] 1.2 Practical norm functions. absolute value To some mathematicians. x 1 subject to −t1 (509) t1. β 0 x=α−β 3. i = 1 . all norms are certainly not created equal. begun in 2004. f (x + y) ≤ f (x) + f (y) (triangle inequality) (nonnegative homogeneity) = minimize n t∈ R Convexity follows by properties 2 and 3. α ∈ R [228. with equality iff x = κ y where κ ≥ 0. although 1 provides the 1-norm.7 2.8 (510) x + y ≤ x + y for any norm.

. t∈ R t tI x xT t x∈ C 0 n+1 S+ minimize n x∈ R x 2 subject to x ∈ C ≡ subject to (514) minimize n x∈ R x ∞ minimize n ≡ x∈ R . Over some convex set C given vector constant y or matrix constant Y arg inf x − y = arg inf x − y x∈C x∈C 2 (511) 2 arg inf X − Y X∈ C = arg inf X − Y X∈ C (512) are unconstrained convex problems for any convex norm and any affine transformation of variable.3.2). β 0 x=α−β x∈C These foregoing problems (507)-(516) are convex whenever set C is. β∈ R 1 minimizen 1T (α + β) n (516) subject to x ∈ C ≡ subject to α . But equality does not hold for a sum of norms ( 5. GEOMETRY OF CONVEX FUNCTIONS where |x| = α⋆ + β ⋆ because of complementarity α⋆Tβ ⋆ = 0 at optimality.297] minimize n x∈ R x 1 subject to x ∈ C ≡ x∈ R . x ∞ is maximum |coordinate|.2. minimize n x∈ R x α∈ R . x 2 is Euclidean length.4.224 CHAPTER 3. Their equivalence transformations make objectives smooth. p. (508) is a semidefinite program. t∈ R t t1 (515) subject to x ∈ C subject to −t1 x x∈C In Rn these norms represent: x 1 is length measured along a grid (taxicab distance). (507) (509) (510) represent linear programs. Optimal solution is norm dependent: [61. t∈ R minimize 1T t n n t (513) subject to −t x x∈C minimize n x∈ R .

If 1-norm ball is expanded until it kisses A (intersects ball only at boundary). ABSOLUTE VALUE 225 A = {x ∈ R3 | Ax = b} R3 B1 = {x ∈ R3 | x 1≤ 1} Figure 70: 1-norm ball B1 is convex hull of all cardinality-1 vectors of unit norm (its vertices).2. Only were A parallel to two axes could there be a minimal cardinality least Euclidean norm solution.3. Plane A is overhead (drawn truncated). A-orientations resulting in a minimal cardinality solution. then distance (in 1-norm) from origin to A is achieved.) . Ball boundary contains all points equidistant from origin in 1-norm. PRACTICAL NORM FUNCTIONS. but not all. (1-norm ball is an octahedron in this dimension while ∞-norm ball is a cube. Yet 1-norm ball offers infinitely many. Cartesian axes drawn for reference. Euclidean ball would be spherical in this dimension.

while relative interiors of facets represent maximal cardinality 3. . in 1-norm. rather. . in 1-norm. Optimal solution can be interpreted as an oblique projection of the origin on A simply because the Euclidean metric is not employed. Projecting the origin. x∈R c (519) subject to Ax = b 3. all kissable faces of the Euclidean ball are single points (vertices). 3. n = 3).0.0.1 Example.2. Maximization of the 1-norm ball until it kisses A is equivalent to minimization of the 1-norm ball until it no longer intersects A . By reorienting affine subset A so it were parallel to an edge or facet.a compressed sensing problem.9 The 1-norm ball in Rn has 2n facets and 2n vertices.3. which can be explained intuitively with reference to Figure 70: [19] Projection of the origin.9 ≡ subject to x ∈ cB1 Ax = b This is unlike the case for the Euclidean ball (1923) where minimum-distance projection on a convex set is unique ( E. it becomes evident as we expand or contract the ball that a kissing point is not necessarily unique. . GEOMETRY OF CONVEX FUNCTIONS 3.k. we use taxicab distance to do projection. on an affine subset. instead of the Euclidean metric.10 The ∞-norm ball in Rn has 2n facets and 2n vertices. on affine subset A is equivalent to maximization (in this case) of the 1-norm ball B1 until it kisses A . whereas edges represent cardinality 2. it appears that a vertex of the ball will be first to touch A . This problem statement sometimes returns optimal x⋆ having minimal cardinality.226 CHAPTER 3. Then projection of the origin on affine subset A is minimize n x∈R x 1 minimize n c∈R . Suppose. for b ∈ R(A) minimize x 1 x (517) subject to Ax = b a.9). For the example illustrated (m = 1. n} (518) is a vertex-description of the unit 1-norm ball. a kissing point in A achieves the distance in 1-norm from the origin to A .3. In (1923) we interpret least norm solution to linear system Ax = b as orthogonal projection of the origin 0 on affine subset A = {x ∈ Rn | Ax = b} where A ∈ Rm×n is fat full-rank. Then the least 1-norm problem is stated.10 For n > 0 B1 = {x ∈ Rn | x 1≤ 1} = conv{±ei ∈ Rn . i = 1 . 1-norm ball vertices in R3 represent nontrivial points of minimal cardinality 1.

2 0.2 0.6 0.8 0.6 0.3.1 0 0 positive 227 0. f2 (x) xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx f3 (x) xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx xxx f4 (x) xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx xx Figure 72: Under 1-norm f2 (x) .3 0.7 0.4 0.4 0.6 0.7 0. thick curve demarcates phase transition in ability to find sparsest solution x by linear programming.2 0. ABSOLUTE VALUE k/m 1 0.4 0.9 0. Under norm f4 (x) (not discussed). histogram (hatched) of residual amplitudes Ax − b exhibits predominant accumulation of zero-residuals.8 1 0.6 0. PRACTICAL NORM FUNCTIONS.8 1 m/n (517) minimize x x 1 minimize x x 1 subject to Ax = b subject to Ax = b x 0 (522) Figure 71: Exact recovery transition: Respectively signed [118] [120] or positive [125] [123] [124] solutions x to Ax = b with sparsity k below thick curve are recoverable.2.5 0.8 0.9 0. These results were empirically reproduced in [38].4 0.3 0.2 0.1 0 0 signed 1 0. Nonnegatively constrained 1-norm f3 (x) from (522) accumulates more zero-residuals than f2 (x). .5 0. histogram would exhibit predominant accumulation of (nonzero) residuals at gradient discontinuities. For Gaussian random matrix A ∈ Rm×n .

22].399) Any vector 1-norm minimization problem may have its variable replaced with a nonnegative variable of the same optimal cardinality but twice the length. p. a 0} (520) minimize n 2n c minimize a 1 2n x = [ I −I ]a a∈R ≡ subject to [ A −A ]a = b aT 1 = c a 0 a 0 Ax = b (521) subject to where x⋆ = [ I −I ]a⋆ .3.g. All other things being equal. because those columns correspond to 0-entries in vector x (and vice versa). GEOMETRY OF CONVEX FUNCTIONS Then (519) is equivalent to c∈R ..0. the permutation matrices of dimension n . (Figure 71. nonnegative variables are easier to solve for sparse solutions. A device commonly employed to relax combinatorial problems is to arrange desirable solutions at vertices of bounded polyhedra.13. In the latter case. Figure 100) The compressed sensing problem (517) becomes easier to interpret. which are factorial in number.0. (confer (516)) Significance of this result: (confer p. we find practical application of the smallest face F containing b from 2. e.0.. Figure 30) over a bounded polyhedron always has a vertex solution [94. for A ∈ Rm×n minimize x x 1 minimize 1T x ≡ x subject to Ax = b x 0 subject to Ax = b x 0 (522) movement of a hyperplane (Figure 26. x∈R . a∈R cB1 = {[ I −I ]a | aT 1 = c .2.2 Exercise.228 where CHAPTER 3. .3 to remove all columns of matrix A not belonging to F . are the extreme points of a polyhedron in the nonnegative orthant described by an intersection of 2n hyperplanes ( 2. Or vector b might lie on the relative boundary of a pointed polyhedral cone K = {Ax | x 0}.g.2. Figure 72. Minimizing a linear objective function over a bounded polyhedron is a convex problem (a linear program) that always has an optimal solution residing at a vertex. 3. Combinatorial optimization. e.4.4).

geometrically describe three circumstances under which there are likely to exist vertex solutions to minimize n x∈R Ax 1 subject to x ∈ P optimized over some bounded polyhedron P .11 (523) 3. P belongs to an orthant and A were orthogonal.3. Finding k smallest entries of an n-length vector x is expressible as an infimum of n!/(k!(n − k)!) linear functions of x .1.2 k largest entries Sum of the k largest entries of x ∈ Rn is the optimal objective value from: [61. . t .1 k smallest entries n Sum of the k smallest entries of x ∈ Rn is the optimal objective value from: for 1 ≤ k ≤ n n π(x)i = i=n−k+1 minimize n y∈ R x y ≡ T i=n−k+1 π(x)i = maximize k t + 1T z n z∈ R . 3.2. The sum π(x)i is therefore a concave function of x . Or rewrite x the problem as a linear program like (513) and (515) but in a composite variable ← y. Begin with A = I and apply level sets of the objective.2. exer. monotonic ( 3.0. i = 1 .3.6.1) in this instance. .11 subject to x z Hint: Suppose. for example. t∈ R subject to 0 y 1 1T y = k subject to x z t1 + z 0 (524) which are dual linear programs. PRACTICAL NORM FUNCTIONS. t∈ R k t + 1T z t1 + z 0 (525) subject to 0 y 1 1T y = k 3.19] k i=1 π(x)i = maximize xTy n y∈ R k π(x)i = ≡ i=1 minimize n z∈ R . in fact. n} where π is a nonlinear permutation-operator sorting its vector argument into nonincreasing order. as in Figure 67 and Figure 70. where π(x)1 = max{xi .5.2. ABSOLUTE VALUE 229 What about minimizing other functions? Given some nonsingular matrix A .

230 CHAPTER 3.2).6. Finding k largest entries of an n-length vector x is expressible as a supremum of n!/(k!(n − k)!) linear functions of x . x∈ Rn minimize k t + 1T z −t 1 − z z 0 x∈C x t1 + z (528) subject to x ∈ C ≡ subject to .1 k-largest norm Let Π x be a permutation of entries xi such that their absolute value becomes arranged in nonincreasing order: |Π x|1 ≥ |Π x|2 ≥ · · · ≥ |Π x|n . [61. this norm is convex in x . exer.3e] minimize n x∈ R x n k z∈ Rn . 3.1. (Figure 74) hence. and k = = = x x 1 ∞ 1 (527) π(|x|)1:k Sum of k largest absolute entries of an n-length vector x is expressible as a supremum of 2k n!/(k!(n − k)!) linear functions of x . GEOMETRY OF CONVEX FUNCTIONS which are dual linear programs. 1} card ai = k = maximize n y1 . and is the optimal objective value of a linear program: k k x n k i=1 |Π x|i = π(|x|)i i=1 = minimize n z∈ R .6. t∈ R k t + 1T z x t1 + z subject to −t 1 − z z 0 = sup aTx i i∈I aij ∈ {−1. by properties of vector norm ( 3. Sum of the k largest entries of |x| ∈ Rn is a norm. t∈ R .1).2. 3.0.2. 0. (Figure 74) The summation is therefore a convex function (and monotonic in this instance. y 2 ∈ R (y1 − y2 )T x (526) subject to 0 y1 1 0 y2 1 (y1 + y2 )T 1 = k where the norm subscript derives from a binomial coefficient x x x n n n 1 n k n .

1.2. x n is the convex k-largest norm of x k (monotonic on Rn ) while x 0 (quasiconcave on Rn ) expresses its cardinality. but a statement of compressed sensing more precise than (522) because of its equivalence to 0-norm. when the smallest n − k entries of variable vector x are zeroed. id est. Assuming a cardinality-k solution exists. minimization of a concave function on Rn ( 3. this compressed sensing problem (529) becomes the same as minimize n z(x) .3. x∈R (1 − z)T x subject to Ax = b x 0 where z = ∇ x 1 = ∇ x 1 n k (530) = ∇1T x = ∇z T x .2. Compressed sensing problem. Polyhedral epigraph of k-largest norm. ABSOLUTE VALUE 3.2. Under nonnegativity constraint.2. x 0 (531) . PRACTICAL NORM FUNCTIONS.1.2 Example. the compressed sensing problem may be written as a difference of two convex functions: for A ∈ Rm×n minimize n x∈R x 1 − x n k subject to Ax = b x 0 ≡ find x ∈ Rn subject to Ax = b x 0 x 0≤k (529) which is a nonconvex statement. 3. Conventionally posed as convex problem (517). + + Global optimality occurs at a zero objective of minimization. Plot x 2 2 and 2 2 and x 2 1 in three dimensions. a minimization of n−k smallest entries of variable vector x .2. we showed: the compressed sensing problem can always be posed equivalently with a nonnegative variable as in convex statement (522). Make those card I = 2k n!/(k!(n − k)!) linear functions explicit for x x 2 1 231 on R2 and x 3 2 on R3 . it inherently minimizes cardinality under some technical conditions. [68] and because the desirable 0-norm is intractable. 32].2.1 Exercise. The 1-norm predominantly appears in the literature because it is convex.1) + [309.

x 0 (532)   ∇ x n = arg maximize y Tx   n k  y∈R   subject to 0 y 1     yT1 = k 3. n = 1 (x + |x|) 2 (534) minimize 1T t n n (535) minimize n x∈ R x+ subject to x ∈ C 3. .3 Exercise.12 3.3.12 ≡ subject to x t 0 t x∈C Hint: D. . for x = [xi . i=1 .232 CHAPTER 3.1. Is ∇ x n unique? Find ∇ x 1 and ∇ x k n k on Rn .2.2. . i = 1 . xi ≥ 0 xi < 0 . 0. t∈ R 1 t⋆ = xi . GEOMETRY OF CONVEX FUNCTIONS where gradient of k-largest norm is an optimal solution to a convex problem:  maximize y Tx x n =   n k  y∈R   subject to 0 y 1     yT1 = k   .1. .2.2. .3 clipping Zeroing negative vector entries under 1-norm is accomplished: x+ 1 = minimize n t∈ R 1T t t t (533) subject to x 0 where. k-largest norm gradient. Prove (531). n] ∈ Rn x+ (clipping) x∈ R .

g. . i = 1 . y∈ R y x 1 1 y x∈C 0 0 (538) subject to x > 0 x∈C rather subject to x > 0. e. INVERTED FUNCTIONS AND ROOTS 233 3. . . i=1 . f inverted about ordinate 1 is concave. We wish to implement objectives of the form x−1 .13 ⇔ x √i n √ n y 0.3.3 Inverted functions and roots A given function f is convex iff −f is concave. n] ∈ Rn n minimize n x∈R i=1 x−1 i ≡ minimize n x∈R . Suppose we have a 2×2 matrix x z T ∈ R2 (536) z y which is positive semidefinite by (1547) when T 0 A polynomial constraint such as this is therefore called a conic constraint. . as semidefinite programs in Schur-form.3. y∈R y √ n y 0.3.13 This means we may formulate convex problems. Minimization of f is maximization of 1/f .. y ≥ tr δ(x) 3. Both functions are loosely referred to as convex since −f is simply f inverted about the abscissa axis. y ≥ 1 x ⇔ x 1 1 y (539) (inverted) For vector x = [xi . . A given positive function f is convex iff 1/f is concave. n (541) In this dimension. having inverted variables. . n (540) subject to x ≻ 0 x∈C rather x √i subject to n x∈C −1 x ≻ 0 . z} satisfying constraint (537) is called a rotated quadratic or circular cone or positive semidefinite cone. minimize x∈R ⇔ x > 0 and xy ≥ z 2 (537) x−1 ≡ minimize x. i=1 . the convex cone formed from the set of all values {x . y . . and minimization of f is equivalent to maximization of −f .

for quantized 0 ≤ α < 1 maximize xα x∈R x∈R . q (543) subject to x > 0 x∈C ≡ subject to where nonnegativity of yq is enforced by maximization.3. . . 2 q − 1}. i=1 . .1 fractional power [150] To implement an objective of the form xα for positive α .3. 1}. Define vector y = [yi . 2 . i=1 . y q ≤ xα 3. . GEOMETRY OF CONVEX FUNCTIONS 3. id est.1. q (544) It is also desirable to implement an objective of the form x−α for positive α . y∈R maximize q+1 yq yi−1 yi yi x b i x∈C 0. .1. The technique is nearly the same as before: for quantized 0 ≤ α < 1 x . we quantize α and work instead with that approximation. .3. . z∈R . y∈R minimize q+1 z yi−1 yi yi x b i z 1 1 yq x∈C 0 (545) 0. q minimize x∈R x−α ≡ subject to subject to x > 0 x∈C . i=1 .2 negative ⇔ yi−1 yi yi x b i 0. 1.1 Then we have the equivalent semidefinite program for maximizing a concave function xα . Choose nonnegative integer q for adequate quantization of α like so: k (542) 2q Any k from that set may be written α where k ∈ {0. . q k= i=1 b i 2i−1 where b i ∈ {0. . . q] with y0 = 1 : positive 3. x > 0 .234 CHAPTER 3. i = 0 .

. . To implement an objective x1/α for quantized 0 ≤ α < 1 as in (542) minimize x∈R x 1/α y∈R . when B = 0.g. q 235 x > 0 . e.0. affine function convexity holds with respect to any dimensionally compatible proper cone substituted into convexity . y ≥ x1/α ≡ subject to (547) ⇔ ti−1 ti ti y b i x = tq ≥ 0 0.. i=1 .3. . The linear functions constitute a proper subset of affine functions.2) f (X) = AX + B (549) All affine functions are simultaneously convex and concave. z ≥ x−α ⇔ (546) 3.4 Affine function A function f (X) is affine when it is continuous and has the dimensionally extensible form (confer 2. . q] with t0 = 1. t∈R minimize q+1 y ti−1 ti ti y b i x = tq ≥ 0 x∈C 0.4. q (548) 3. i=1 . Both −AX + B and AX + B .3 positive inverted Now define vector t = [ti .9. .3. AFFINE FUNCTION rather yi−1 yi yi x b i z 1 1 yq 0 0. q subject to x ≥ 0 x∈C rather x ≥ 0 . . . are convex functions of X . Unlike other convex functions.1. . for example.1. function f (X) is linear. i=1 . i = 0 .

1) 3.2]3. GEOMETRY OF CONVEX FUNCTIONS definition (496).0.2 Example.14 . (confer 4. 3. Were convex set C polyhedral ( 2. Linear objective.14 M For X ∈ S and matrices A . All affine functions satisfy a membership relation. Were The interpretation from this citation of {X ∈ SM | g(X) 0} as “an intersection between a linear subspace and the cone of positive semidefinite matrices” is incorrect.2.12).0.236 CHAPTER 3. [391. The real affine function in Figure 73 illustrates hyperplanes. Engineering control.2 for a similar example. the problem posed is the same as the convex optimization minimize aTz z subject to z ∈ C (551) whose objective of minimization is a real linear function. constituting contours of equal function-value (level sets (554)). 2. ATXA + B TB is affine in X .0. B . engineering control.0.0. a real linear function expressible as inner product f (X) = A . Trace is an affine function.0. id est. R of any compatible dimensions. for some normal cone. actually. entries of the function are characterized by only linear combinations of argument entries plus constants. X with matrix A being the identity.1 Example. the expression XAX is not affine in X whereas g(X) = R B TX XB Q + ATX + XA (550) is an affine multidimensional function. Since scalar b is fixed.1. Consider minimization of a real affine function f (z) = aTz + b over the convex feasible set C in its domain R2 illustrated in Figure 73. for example. in its domain.4. for example. then this problem would be called a linear program. 3. (See 2.) The conditions they state under which strong duality holds for semidefinite programming are conservative.4.or many-valued inverse of an affine function is affine. Affine multidimensional functions are more easily recognized by existence of no multiplicative multivariate terms and no polynomial terms of degree higher than 1 . [59] [152] Such a function is typical in (confer Figure 16) Any single. like (505).3.9. Q .

3). is increasing in normal direction (Figure 26) because affine function increases in direction of gradient a ( 3.4. Minimization of aTz + b over C is equivalent to minimization of aTz . AFFINE FUNCTION 237 A f (z) z2 C H+ H− a z1 {z z { ∈ R ∈ R 2| a T 2 z {z ∈ Rz2| a T | a T = κ2 = κ1 } } z = κ3 } Figure 73: Three hyperplanes intersecting convex set C ⊂ R2 from Figure 28.3.r. a plane with third dimension.0.6. We say sequence of hyperplanes. .0.t domain R2 of affine function f (z) = aTz + b : R2 → R . w. Cartesian axes in R3 : Plotted is affine subset A = f (R2 ) ⊂ R2 × R .

0. the other including the ambient space of the objective R2 function’s range as in . Given nonempty closed bounded convex sets Y and Z in Rn and nonnegative scalars β and γ [374.6)). One solves optimization problem (551) graphically by finding that hyperplane intersecting feasible set C furthest right (in the direction of negative gradient −a ( 3. Figure 67) [153] Then the nonlinear objective function can be replaced with a dynamic linear objective. GEOMETRY OF CONVEX FUNCTIONS convex set C an intersection with a positive semidefinite cone. its support function σY (a) : Rn → R is defined σY (a) sup aTz z∈Y (552) a positively homogeneous function of direction a whose range contains ±∞.9) In this circumstance.13. it is convex in a (Figure 74). Both visualizations are illustrated in Figure 73. its level sets are defined Lκ f κ {z | f (z) = κ} (554) (553) .0. Level sets. R Visualization in the function domain is easier because of lower dimension and because level sets (554) of any affine function are affine. Given a function f and constant κ .2.0.1.3. ( 2.3 Example. Application of the support function is illustrated in Figure 29a for one particular normal a . Because σY (a) is a pointwise supremum of linear functions. When a differentiable convex objective function f is nonlinear.135] For each z ∈ Y .234] σβ Y+γZ (a) = β σY (a) + γ σZ (a) 3. p.C.1. [200.4. C. p.0.4 Exercise. the level sets are parallel hyperplanes with respect to R2 . aTz is a linear function of vector a . [251. ( 2.10. There are two distinct ways to visualize this problem: one in the objective function’s domain R2 . the negative gradient −∇f is a viable search direction (replacing −a in (551)).1] n For arbitrary set Y ⊆ R .1. 3. then this problem would be called a semidefinite program.2. Support function.4.238 CHAPTER 3. linear as in (551).

4. having convex level sets. 3. Piecewise-continuous convex functions are admitted.5. Supremum of affine functions in variable a evaluated at argument ap is illustrated. Give two distinct examples of convex function. (confer Figure 74) Draw three hyperplanes in R3 representing max(0. x) sup{0. What is the normal to each hyperplane? 3.0. [200] [309] [361] [374] [251] epigraph is the connection between convex sets and convex functions (p. Sublevel set It is well established that a continuous real function is convex if and only if its epigraph makes a convex set. that are not affine.5 Epigraph. by epigraph intersection.217).3.15 Why is max(x) convex? 3. SUBLEVEL SET {aTz1 + b1 | a ∈ R} sup aTz i p i + bi 239 {aTz2 + b2 | a ∈ R} {aTz3 + b3 | a ∈ R} a {aTz4 + b4 | a ∈ R} {aTz5 + b5 | a ∈ R} Figure 74: Pointwise supremum of any convex functions remains convex.15 Hint: page 256.5 Exercise. xi | x ∈ Rn } in R2 × R to see why maximum of nonnegative vector entries is a convex function of variable x . thereby. and 3. .0. Topmost affine function per a is supremum. EPIGRAPH. Epigraph intersection.

t) ∈ Rp×k × RM | X ∈ dom f . we must show f (µX + (1− µ)Y ) RM + µ u + (1− µ)v (557) Yet this holds by definition because f (µX + (1−µ)Y ) The converse also holds. f (X) t} (555) RM + Necessity is proven: [61.0. we must show for all µ ∈ [0. Sublevel sets are necessarily convex for either function. (Y . then f must be convex.60] Given any (X . µf (X)+ (1−µ)f (Y ). GEOMETRY OF CONVEX FUNCTIONS q(x) f (x) quasiconvex convex x x Figure 75: Quasiconvex function q epigraph is not necessarily convex. u) . . Epigraph sufficiency. (Y . 3. but convex function f epigraph is convex in any dimension. all invariant properties of convex sets carry over directly to convex functions. if for all µ ∈ [0. u) . f convex ⇔ epi f convex (556) {(X . v) ∈ epi f . v) ∈ epi f .3. v) ∈ epi f .0. 1] that µ(X .240 CHAPTER 3. v) ∈ epi f holds. but sufficient only for quasiconvexity.1 Exercise. Prove that converse: Given any (X . id est. u) + (1− µ)(Y .5. exer. Generalization of epigraph to a vector-valued function f (X) : Rp×k → RM is straightforward: [296] epi f id est. 1] µ(X . u) + (1− µ)(Y .

4: for t ∈ R . Matrix pseudofractional function. Sense of the inequality is reversed in (555). Consider Schur-form (1544) from A. Sense of the inequality in (558) is reversed. we need only demonstrate that the function epigraph is convex. This function is convex simultaneously in both variables when variable matrix A belongs to the entire positive semidefinite n cone S+ and variable vector x is confined to range R(A) of matrix A . for concave functions. (Figure 75) To prove necessity of convex sublevel sets: For any X .5. similarly. SUBLEVEL SET 241 Sublevel sets of a convex real function are convex. with each convex set then called superlevel set. f (µ X + (1−µ)Y ) RM + µf (X) + (1−µ)f (Y ) RM + ν (559) When an epigraph (555) is artificially bounded above. Consider a real function of two variables f (A .3.5. then the corresponding sublevel set can be regarded as an orthogonal projection of epigraph on the function domain. 1] that µ X + (1−µ)Y ∈ Lν f .2 Example. To explain this. Y ∈ Lν f we must show for each and every µ ∈ [0. x) : S n × Rn → R = xTA† x (560) n on dom f = S+ × R(A). Likewise. EPIGRAPH. 3. As for convex real functions. and we use instead the nomenclature hypograph.0. the converse does not hold.0. By definition. corresponding to each and every ν ∈ RM Lν f {X ∈ dom f | f (X) ν } ⊆ Rp×k (558) RM + sublevel sets of a convex vector-valued function are convex. t ν .

0. x) = ǫ xT(A + ǫ I )−1 x (565) where the inverse always exists by (1484). GEOMETRY OF CONVEX FUNCTIONS G(A .1) Continuing Example 3.0. z ∈ R(A) . 3. x) whose epigraph is epi f (A . x) = (A . Matrix fractional function. Matrix product function. t) | A is convex.0. Continue Example 3.3 Exercise. This function is convex n simultaneously in both variables over the entire positive semidefinite cone S+ .2.5.242 CHAPTER 3. for all x ∈ Rn f (A .0.5. Of the equivalent conditions for positive semidefiniteness of G . t) | A 0 .1. n+1 0 . Function f (A . z) = (A .2 by introducing vector variable x and making the substitution z ← Ax .5.5.9. xTA x ≤ t (564) (563) Provide two simple explanations why f (A .0. z . x . (confer 3. Because of matrix symmetry ( E). t) = T A z zT t 0 (561) ⇔ z (I − AA† ) = 0 t − z TA† z ≥ 0 A 0 n+1 Inverse image of the positive semidefinite cone S+ under affine mapping G(A . z .2. the first is an equality demanding vector z belong to R(A). x) = xTA x is not a function convex simultaneously in positive semidefinite matrix A and vector x on n dom f = S+ × Rn .0. z) = z TA† z is convex on convex domain n S+ × R(A) because the Cartesian product constituting its epigraph epi f (A . z . z TA† z ≤ t = G−1 S+ (562) 3.0.4 Example.0.0. now consider a real function of two variables n on dom f = S+ × Rn for small positive constant ǫ (confer (1909)) f (A .7.0.1. z(x)) = z TA† z = xTATA† A x = xTA x = f (A . t) is convex by Theorem 2.

0.2). x) = lim ǫ xT(W + ǫ I )−1 x = xT(I − W )x + ǫ→0 (570) where I − W is also an orthogonal projector ( E.3. From (1909) for any ǫ > 0 and idempotent symmetric W . t) is convex by Theorem 2. z) is convex on n S+ × Rn because its epigraph is that inverse image: n+1 epi f (A . Any orthogonal projector can be decomposed into an outer product of orthonormal matrices W = U U T where U T U = I as explained in E. t) = A + ǫI z zT ǫ−1 t ⇔ T t − ǫ z (A + ǫ I )−1 z ≥ 0 A + ǫI ≻ 0 0 243 (566) n+1 Inverse image of the positive semidefinite cone S+ under affine mapping G(A .4 that f (W . ǫ(W + ǫ I )−1 = I − (1 + ǫ)−1 W from which f (W .1. x) = ǫ xT(W + ǫ I )−1 x (568) Consider nonlinear function f having orthogonal projector W as argument: Projection matrix W has property W † = W T = W 0 (1955).1. ǫ z T(A + ǫ I )−1 z ≤ t = G−1 S+ (567) 3. x) = ǫ xT(W + ǫ I )−1 x = xT I − (1 + ǫ)−1 W x Therefore ǫ→0+ (569) lim f (W . t) | A + ǫ I ≻ 0 .0. x) = ǫ xT(W + ǫ I )−1 x is n convex simultaneously in both variables over all x ∈ Rn when W ∈ S+ is confined to the entire positive semidefinite cone (including its boundary). We learned from Example 3. EPIGRAPH.9. SUBLEVEL SET and all x ∈ Rn : Consider Schur-form (1547) from A. It is now our goal to incorporate f into an optimization problem such that .4: for t ∈ R G(A .5.5.3.1 matrix fractional projector f (W . z) = (A .2.5. Function f (A .0. z . z . z .

where x⋆ represents an optimal solution of (573) from any particular iteration.16 A convex problem has convex feasible set. there is no guarantee that optimal W is a projection matrix because only extreme points of a Fantope are orthogonal projection matrices U U T . substitute equivalence (569). and then iterate solution of convex problem minimize xT(I − (1 + ǫ)−1 W )x n x∈R (573) subject to x ∈ C with convex problem minimize n W∈S x∈R . for convex set C and 0 < k < n as in minimizen ǫ xT(W + ǫ I )−1 x n subject to 0 W I (572) tr W = k x∈C Although this is a convex problem.3. p. the partitioned feasible sets are not interdependent. Suppose we allow dom f to constitute the entire positive semidefinite cone but constrain W to a Fantope (90). GEOMETRY OF CONVEX FUNCTIONS an optimal solution returned always comprises a projection matrix W .. So f cannot be convex on the projection matrices. and the fact that the original problem (though nonlinear) is convex simultaneously in both variables. The idea is to optimally solve for the partitioned variables which are later combined to solve the original problem (572). [304. Let’s try partitioning the problem into two convex parts (one for x and one for W ). What makes this approach sound is that the constraints are separable. e.244 CHAPTER 3. and the objective surface has one and only one global minimum. x) = xT I − (1 + ǫ)−1 W x (571) cannot be convex simultaneously in both variables on either the positive semidefinite or symmetric projection matrices.g. and its equivalent (for idempotent W ) f (W .123] 3. The set of orthogonal projection matrices is a nonconvex subset of the positive semidefinite cone. W ∈ S (a) subject to 0 W I tr W = k x⋆T(I − (1 + ǫ)−1 W )x⋆ maximize x⋆T W x⋆ n ≡ subject to 0 W I tr W = k W∈S (574) until convergence.16 .

U ⋆ ) must also be optimal to (572) in the limit.5. by iterating solution to problem (573) with optimal (projector) solution (575) to convex problem (574). this convergent optimal objective value f(573) (for positive ǫ arbitrarily close to 0) is necessarily optimal for (572). Proof. 2 : k)Q(: . W = U ⋆ U ⋆T = x⋆ x⋆T + Q(: . id est.2] [42. [259. . then converged (x⋆ . so ⋆ f(573) = x⋆ 2 (1 − (1 + ǫ)−1 ) (576) ⋆ after each iteration. 1 : k) ∈ Rn×k per (1738c). 2 : k)T ⋆ 2 x (575) Then optimal solution (x⋆ .3.1) and set U ⋆ = Q(: . and because their objectives are equivalent for projectors by (569). 1. Optimal vector x⋆ is orthogonal to the last n − 1 columns of orthogonal matrix Q . and ǫ→0+ ⋆ lim f(573) = 0 (578) Since optimal (x⋆ . Convergence of f(573) is proven with the observation that iteration (573) (574a) is a nonincreasing sequence that is bounded below by 0. we must invoke a known analytical optimal solution to problem (574): Diagonalize optimal solution from problem (573) x⋆ x⋆T QΛQT ( A. 1. EPIGRAPH. for small ǫ . Any bounded monotonic sequence in R is convergent.1] Expression (575) for optimal projector W holds at each iteration. Because the objective f(572) from problem (572) is also bounded below ⋆ by 0 on the same domain. U ⋆ ) from problem (573) is feasible to problem (572). ⋆ ⋆ f(573) ≥ f(572) ≥ 0 (577) by (1720). SUBLEVEL SET 245 But partitioning alone does not guarantee a projector as solution. Because problem (572) is convex. U ⋆ ) to problem (572) is found. therefore ⋆ x⋆ 2 (1 − (1 + ǫ)−1 ) must also represent the optimal objective value f(573) at convergence. this represents a globally optimal solution.5. To make orthogonal projector W a certainty.

it demonstrates how a quadratic objective or constraint can be converted to a semidefinite constraint. equivalent to (579). First we write the epigraph form: X∈ SM . t∈ R subject to S TXS tI vec(A − X) T vec(A − X) 1 0 0 (581) This semidefinite program ( 4) is an epigraph form in disguise. but only on a subspace determined by S . Suppose.2): minimize t X∈ SM .4. we are presented with the constrained projection problem studied by Hayden & Wells in [185] (who provide analytical solution): Given A ∈ RM ×M and some full-rank matrix S ∈ RM ×L with L < M minimize X∈ SM subject to S TXS A −X 2 F 0 (579) Variable X is constrained to be positive semidefinite. 6.3] [249] and matrix vectorization ( 2. GEOMETRY OF CONVEX FUNCTIONS 3. t∈ R A −X T F 0 (582) minimize t A −X F ≤ t S TXS 0 (583) subject to .5.2 semidefinite program via Schur Schur complement (1544) can be used to convert a projection problem to an optimization problem in epigraph form. for example. Were problem (579) instead equivalently expressed without the square minimize X∈ SM subject to S XS then we get a subtle variation: X∈ SM . t∈ R minimize t A −X 2 ≤ t F S TXS 0 (580) subject to Next we use Schur complement [279.246 CHAPTER 3.

t∈ R subject to (584) S TXS 3.3.2. Consider a problem abstract in the convex constraint.1 Example. Schur anomaly. EPIGRAPH.5. α .0. given symmetric matrix A minimize X∈ SM X 2 F subject to X ∈ C − A −X 2 F (585) the minimization of a difference of two quadratic functions each convex in matrix X . ( 2.1) (588) .1. SUBLEVEL SET that leads to an equivalent for (582) (and for (579) by (512)) minimize t tI vec(A − X) T vec(A − X) t 0 0 247 X∈ SM . Observe equality X 2 F − A −X 2 F = 2 tr(XA) − A 2 F (586) So problem (585) is equivalent to the convex optimization minimize X∈ SM tr(XA) subject to X ∈ C But this problem (585) does not have Schur-form minimize t−α (587) X∈ SM . t subject to X ∈ C X 2 ≤t F A −X 2 ≥ α F because the constraint in α is nonconvex.9.0.5.

5 0 0. Circular contours are level sets.  ∂f (x) ∂xK    ∈ RK   (1798) while the second-order gradient of the twice differentiable real function.5 2 3. . Gradient of a differentiable real function f (x) : RK → R with respect to its vector argument is uniquely defined  ∂f (x) ∂x1 ∂f (x) ∂x2   ∇f (x) =    . with .1) maps each entry fi to a space having the same dimension as the ambient space of its domain.5 1 1. ∇f (y) can mean ∇y f (y) or gradient ∇x f (y) of f (x) with respect to x evaluated at y . .248 CHAPTER 3. each defined by a constant function-value.5 −2 Y1 2 Figure 76: Gradient in R evaluated on grid over some open disc in domain of convex quadratic bowl f (Y ) = Y T Y : R2 → R illustrated in Figure 77. GEOMETRY OF CONVEX FUNCTIONS 2 1.6 Gradient Gradient ∇f of any differentiable multidimensional function f (formally defined in D. Notation ∇f is shorthand for gradient ∇x f (x) of f with respect to x . −2 −1.5 −1 −1.5 1 0. a distinction that should become clear from context.5 Y2 0 −0.5 −1 −0.

Figure 78. zero gradient ∇f = 0 is a condition necessary for its unconstrained minimization [153.2.   .2 .9. the gradient of is (589) ∇x (xTx)2 − 2xTA x = 4(xTx)x − 4Ax 3. for example. the gradient evaluated at any point in the function domain is just the slope (or derivative) of that function there. e.17   ∂ 2f (x) ∂ 2f (x) ∂ 2f (x) · · · ∂x1 ∂xK ∂x1 ∂x2 ∂x2   2 1  ∂ f (x) ∂ 2f (x)  ∂ 2f (x) · · · ∂x2 ∂xK   ∂x2 ∇ 2 f (x) =  ∂x2. λ1 ≤ 0 √ v 1 λ1 . Figure 67. GRADIENT respect to its vector argument. 3.3.g. . . λ1 ≤ 0 2 − λ1 . .1. [153. .18 [378] The gradient can also be interpreted as that vector normal to a sublevel set.. p. is traditionally called the Hessian .0.∂x1 (1799)  ∈ SK . For a one-dimensional function of real variable.3.17 3..2]: 3.0. consider the unconstrained nonconvex optimization that is a projection on the rank-1 subset ( 2. Newton’s direction −∇ 2 f (x)−1 ∇f (x) is better for optimization of nonlinear functions well approximated locally by a quadratic function. Projection on rank-1 subset.1 Example. the gradient maps to R2 . 15.6..6. λ1 > 0 xxT − A 2 F (1734) From (1828) and D. . illustrated in Figure 76.1) For any differentiable multidimensional function.   2 ∂ f (x) ∂ 2f (x) ∂ 2f (x) ··· ∂xK ∂x1 ∂xK ∂x2 ∂x2 K 249 The gradient can be interpreted as a vector pointing in the direction of greatest change.3. .18 Jacobian is the Hessian transpose. For A ∈ SN having eigenvalues λ(A) = [λ i ] ∈ RN .1.105] .4.2. For the quadratic bowl in Figure 77.1) of positive semidefinite cone SN : Defining λ1 max i {λ(A)i } and + corresponding eigenvector v1 minimize xxT − A x 2 F = minimize tr(xxT(xTx) − 2AxxT + ATA) x = λ(A) λ(A) 2 F 2 2 . so commonly confused in matrix calculus. (confer D. [220.6] or polar to direction of steepest descent. λ 1 > 0 (1733) arg minimize xxT − A x = 0.

4]: given A ∈ Rm×n where whose gradient ( D.0. 5. nevertheless. corresponding to a positive eigenvalue λ i . The gradient provides a necessary and sufficient condition (351) (451) for optimality in the constrained case. zero gradient ∇f = 0 is a necessary and sufficient condition for its unconstrained minimization [61. 5. and AA† = I .2) when λ i = λ1 is the largest positive eigenvalue of A .5. Then (590) is satisfied x ← vi λ i ⇒ Avi = vi λ i (591) T xxT = λ i vi vi is a rank-1 matrix on the positive semidefinite cone boundary. we can make AAT invertible by adding a positively scaled identity. X = A (AA ) is the pseudoinverse A† .3) XA − I 2 F minimize XA − I n×m X∈ R 2 F (592) (593) (594) (595) ⋆ T T −1 = tr ATX TXA − XA − ATX T + I 2 F ∇X XA − I vanishes when = 2 XAAT − AT = 0 XAAT = AT T When A is fat full-rank. and the minimum is achieved ( 7. Replace vector x with a normalized eigenvector vi of A ∈ SN . scaled by square root of that eigenvalue.3]: 3. Pseudoinverse. If A has no positive eigenvalue. Otherwise.6.250 CHAPTER 3.2.5. for any A ∈ Rm×n X = AT(AAT + t I )−1 (596) . GEOMETRY OF CONVEX FUNCTIONS Setting the gradient to 0 Ax = x(xTx) (590) is necessary for optimal solution. then AA is invertible. as it does in the unconstrained case: For any differentiable multidimensional convex function.1. then x = 0 yields the minimum. Differentiability is a prerequisite neither to convexity or to numerical solution of a convex optimization problem.2 Example. The pseudoinverse matrix is the unique solution to an unconstrained convex optimization problem [160.0.

Hyperplane.1. ∂H = having nonzero normal η = a −1 ∈ Rp × R (599) x a x+b T | x ∈ Rp ⊂ Rp × R (598) This equivalence to a hyperplane holds only for real functions.6. 3.0. (confer Figure 73) f (x) : Rp → R = aTx + b (597) whose domain is Rp and whose gradient ∇f (x) = a is a constant vector (independent of x).6.3. and it describes a nonvertical [200.19 To prove that.0. consider a vector-valued affine function f (x) : Rp→ RM = Ax + b | x ∈ Rp ⊂ Rp × RM having gradient ∇f (x) = AT ∈ Rp×M : The affine set x Ax + b is perpendicular to η because ηT ∇f (x) −I x Ax + b − ∈ Rp×M × RM ×M 0 b = 0 ∀ x ∈ Rp Yet η is a vector (in Rp × RM ) only when M = 1.3 Example. This function describes the real line R (its range). described by affine function. F 3.2] hyperplane ∂H in the space Rp × R for any particular vector a (confer 2.2). . B. Minimizing instead AX − I 2 yields the second flavor in (1908). Consider the real affine function of vector variable.4. so we have: of real affine function f (x) is therefore a halfspace in R The real affine function is to convex functions as the hyperplane is to convex sets. GRADIENT 251 Invertibility is guaranteed for any finite positive value of t by (1484). Then matrix X becomes the pseudoinverse X → A† X ⋆ in the limit t → 0+ .19 Epigraph Rp . line.3.

the matrix-valued affine function of real variable x . In pointed closed convex cone K . h(x) : R → RM ×N = Ax + B describes a line in RM ×N in direction A {Ax + B | x ∈ R} ⊆ RM ×N and describes a line in R×RM ×N x Ax + B | x∈ R ⊂ R×RM ×N (602) (601) (600) whose slope with respect to x is A .1 Definition. it is necessary and sufficient for each entry fi to be monotonic in the same sense.252 CHAPTER 3.1. which follows from a result (377) of Fej´r. tr(Z TX) is a + nondecreasing monotonic function of matrix X ∈ SM when constant matrix e Z is positive semidefinite. Any affine function is monotonic. Y Y K K X ⇒ f (Y ) X ⇒ f (Y ) f (X) f (X) (603) △ For monotonicity of vector-valued functions. 3. multidimensional function f (X) is nondecreasing monotonic when nonincreasing monotonic when ∀ X. GEOMETRY OF CONVEX FUNCTIONS Similarly. Monotonicity. for example. A convex function can be characterized by another kind of nondecreasing monotonicity of its gradient: . In K = SM . for any particular matrix A ∈ RM ×N .1 monotonic function A real function of real argument is called monotonic when it is exclusively nonincreasing or nondecreasing over the whole of its domain.6. Y ∈ dom f . A real differentiable function of real argument is monotonic when its first derivative (not necessarily continuous) maintains sign over the function domain.0.6. f compared with respect to the nonnegative orthant. 3.

0.1. ⋄ 3.2. these rules are consequent to (1829).1. 3.4.0. Convexity (concavity) of any g is preserved when h is affine. [250.6.3. For differentiable functions.3 Example. then (f (x)k + g(x)k )1/k is also convex for integer k ≥ 1. GRADIENT 253 3. [200. Gradient monotonicity.20] Given real differentiable function f (X) : Rp×k → R with matrix argument on open convex domain.4] [55. Strict inequality and caveat Y = X provide necessary and sufficient conditions for strict convexity.6.2 Theorem. B.44] A squared norm is convex having the same minimum because a squaring operation is convex nondecreasing monotonic on the nonnegative real line. their composition f = g(h) : Rn → R defined by f (x) = g(h(x)) .4] [200. Given functions g : Rk → R and h : Rn → Rk .1 exer. the condition ∇f (Y ) − ∇f (X) .6. Y ∈ dom f (604) is necessary and sufficient for convexity of f . B. 3. Y − X ≥ 0 for each and every X. p.2. If f and g are nonnegative convex real functions. [61. is convex if g is convex nondecreasing monotonic and h is convex g is convex nonincreasing monotonic and h is concave and composite function f is concave if g is concave nondecreasing monotonic and h is concave g is concave nonincreasing monotonic and h is convex where ∞ (−∞) is assigned to convex (concave) g when evaluated outside its domain. dom f = {x ∈ dom h | h(x) ∈ dom g} (605) .1. Composition of functions.1] Monotonic functions play a vital role determining convexity of functions constructed by transformation.

1. positive.6.6.1.32] In general the product or ratio of two convex functions is not convex.7) f (Y ) ≥ f (X) + ∇f (X) . GEOMETRY OF CONVEX FUNCTIONS 3.1.4 Exercise. (c) If f is convex.5.2 first-order convexity condition. then f /g is convex. [61. with one nondecreasing and the other nonincreasing. I. 4. and g is concave. real function 0 in (497) invites refocus to the real-valued function: Discretization of w 3.3. [61. Explain this anomaly.1. Y − X for each and every X. then f g is convex. 1. B. and positive.6. [231] However.2] [332. Product and ratio of convex functions.2. The following result for product of real functions is extensible to inner product of multidimensional functions on real domain: 3. Necessary and sufficient convexity condition.1.2] [308. 3] For real differentiable function f (X) : Rp×k → R with matrix argument on open convex domain. both nondecreasing (or nonincreasing).1.6.6.5a by verifying Jensen’s inequality ((496) at µ = 1 ).20 Hint: Prove 3. 1.1. nondecreasing. (b) If f . Order of composition.1 Theorem. the condition (confer D. exer.3 when g = h(x)−1 and h = x2 . nonincreasing.3] [42.2] [394.3] [133. [200.0.2. then f g is concave. Real function f = x−2 is convex on R+ but not predicted so by results in Example 3. there are some results that apply to functions on R [real domain]. g are concave. 3. 2 . and positive.6.254 CHAPTER 3.0.0. 3.1] 3. Y ∈ dom f (606) is necessary and sufficient for convexity of f . and positive functions on an interval.0. Prove the following.3.0.5 Exercise. Caveat Y = X and strict inequality again provide necessary and sufficient conditions for strict ⋄ convexity.4.20 (a) If f and g are convex.

694). GRADIENT 255 f (Y ) ∇f (X) −1 ∂H− Figure 77: When a real function f is differentiable at each point in its open domain.6. Unique strictly supporting hyperplane ∂H− ⊂ R2 × R (only partially drawn) and its normal vector [ ∇f (X)T −1 ]T at the particular point of support [ X T f (X) ]T are illustrated. . there is an intuitive geometric interpretation of function convexity in terms of its gradient ∇f and its epigraph: Drawn is a convex quadratic bowl in R2 × R (confer Figure 166 p.3. there is a unique hyperplane containing [ Y T f (Y ) ]T and supporting the epigraph of convex differentiable f . The interpretation: At each and every coordinate Y . f (Y ) = Y T Y : R2 → R versus Y on some open disc in R2 .

From (607) we deduce.1. there is simplification of the first-order condition (606). ∇f (X)T (Y − X) ≥ 0 ⇒ f (Y ) ≥ f (X) (610) meaning.501] . expressed in terms of function gradient. 3.256 CHAPTER 3. p. Y ∈ dom f f (Y ) ≥ f (X) + ∇f (X)T (Y − X) (607) From this we can find ∂H− a unique [374. GEOMETRY OF CONVEX FUNCTIONS When f (X) : Rp → R is a real differentiable convex function with vector argument on open convex domain. defining f (Y ∈ dom f ) ∞ [61.220-229] nonvertical [200.1. B. the gradient at X identifies a supporting hyperplane there in Rp {Y ∈ Rp | ∇f (X)T (Y − X) = 0} to the convex sublevel sets of convex function f (confer (558)) Lf (X) f {Z ∈ dom f | f (Z) ≤ f (X)} ⊆ Rp (612) (611) illustrated for an arbitrary convex real function in Figure 78 and Figure 67. That supporting hyperplane is unique for twice differentiable f . there exists a hyperplane ∂H− in Rp × R having normal ∇f (X) X supporting the function epigraph at ∈ ∂H− −1 f (X) ∂H− = Y t ∈ Rp R ∇f (X)T −1 Y t − X f (X) =0 (609) Such a hyperplane is strictly supporting whenever a function is strictly convex. Y ∈ dom f in the domain. for each and every X .2] hyperplane ( 2. [220. One such supporting hyperplane (confer Figure 29a) is illustrated in Figure 77 for a convex quadratic. for each and every point X in the domain of a convex real function f (X) .4). p.7] / at f (X) Y t ∈ epi f ⇔ t ≥ f (Y ) ⇒ ∇f (X)T −1 Y t − X f (X) ≤0 (608) This means. for each and every X. supporting epi f X : videlicet.

f (X) = α} (611) Figure 78:(confer Figure 67) Shown is a plausible contour plot in R2 of some arbitrary real differentiable convex function f (Z ) at selected levels α . 4. [309. the family comprising all its sublevel sets is nested. . A convex function has convex sublevel sets Lf (X) f (612). Contour plots of real affine functions are illustrated in Figure 26 and Figure 73.6. p. GRADIENT 257 α β α ≥ β ≥ γ Y −X ∇f (X) γ {Z | f (Z ) = α} {Y | ∇f (X)T (Y − X) = 0 . For any particular convex function. we may certainly conclude the corresponding function is neither convex. for instance. β .75] Were sublevel sets not convex.6] The sublevel set whose boundary is the level set at α . contours of equal level f (level sets) drawn in the function’s domain.3. and γ . [200. comprises all the shaded regions.

7). Y ∈ dom f f (Y ) RM + →Y −X f (X) + df (X) →Y −X f (X) + d dt t=0 f (X + t (Y − X)) (613) where df (X) is the directional derivative 3.2. This. The right-hand side of inequality (613) is the first-order Taylor series expansion of f about X . The vector-valued function case (613) is therefore a straightforward application of the first-order convexity condition for real functions to each entry of the vector-valued function. . GEOMETRY OF CONVEX FUNCTIONS 3.1.  tr ∇fM (X)T (Y − X) →Y −X RM + 0 ⇔ f (Y ) − f (X) − df (X) . vector-valued f Now consider the first-order necessary and sufficient condition for convexity of a vector-valued function: Differentiable function f (X) : Rp×k → RM is convex if and only if dom f is open.258 CHAPTER 3. M }} the extreme directions of the selfdual nonnegative orthant.6. convex. then Theorem 3.13. thereby broadening scope of the Taylor series ( D.0.3 first-order convexity condition.4 so that direction may be indicated by a vector or a matrix.2.1). w ≥ 0 ∀ w →Y −X 0 ∗ RM + (614) (615) Necessary and sufficient discretization (497) allows relaxation of the semiinfinite number of conditions {w 0} instead to {w ∈ {ei . . .21 →Y −X . Each extreme direction picks out a real entry fi and df (X) i from the vector-valued function and its directional derivative. i = 1 .1. of course. and for each and every X. follows from the real-valued function case: by dual generalized inequalities ( 2.0.   . f (Y ) − f (X) − df (X) where  tr ∇f1 (X)T (Y − X)  tr ∇f2 (X)T (Y − X)  →Y −X   df (X) =   ∈ RM .1 applies. 3.6. We extend the traditional definition of directional derivative in D.21 [220] [334] of f at X in direction Y − X .

Quadratic forms constitute a notable exception where the strict-case converse holds reliably.4) Prove that real function f (x.2. even on the first quadrant.6.0. when M = 1. hence precluding points of inflection as in Figure 79 (p.4 second-order convexity condition Again. (f is quasilinear (p.2 Exercise. . y) = x/y is not convex on the first quadrant. id est. 3. we are obliged only to consider each individual entry fi of a vector-valued function f .6. Real quadratic multivariate polynomial in matrix A and vector b xTA x + 2bTx + c is convex if and only if A gradient: ( D. and nonmonotonic. a twice differentiable vector-valued function with vector argument on open convex domain.1 Example.3.3. . ∇ 2fi (X) 0 ∀ X ∈ dom f . but that is nothing new.1) 0. this convexity condition also serves for a real function. Condition (616) demands nonnegative curvature. Strict inequality in (616) provides only a sufficient condition for strict convexity.4.5. Also exhibit this in a plot of the function.6.0. intuitively.4. GRADIENT 259 3. by discretization (497).6.0. i=1 . matrix A can be assumed symmetric.0. Real fractional function. in fact. videlicet. For f (X) : Rp → RM .) . M (616) p S+ is a necessary and sufficient condition for convexity of f . 3. 3.269) on {y > 0} . strictly convex real function fi (x) = x4 does not have positive second derivative at each and every x ∈ R . Convex quadratic.268). Obviously. the real functions {fi }. (617) Proof follows by observing second order 2 ∇x xTA x + 2bTx + c = A + AT (618) Because xT(A + AT )x = 2xTA x . (confer 3.

For N = 6 and hij data from (1414). 1. 1] such that [394.6.3. We choose symmetric g(X) ∈ SM because matrix-valued functions are most often compared (620) with respect to the positive semidefinite cone SM in the ambient space of symmetric + matrices. a consequence of the mean value theorem from calculus allows compression of its complete Taylor series expansion about X ∈ dom fi ( D.7 Convex matrix-valued function We need different tools for matrix argument: We are primarily interested in continuous matrix-valued functions g(X). so that each and every line segment [X .3] [42.3. apply line theorem 3.6.0.1. for A ∈ Rm×p and B ∈ Rm×k .0.4.1 to plot f along some arbitrary lines through its domain.7) to three terms: On some open interval of Y 2 . indeed. 3.260 CHAPTER 3. g(X) = AX + B is a convex (affine) function in X on domain Rp×k with 3. id est.1. there exists an α ∈ [0.7. 1. (x − y)2 and Define |x − y| Given symmetric nonnegative data [hij ] ∈ SN ∩ RN ×N .22 . Stress function.3 Exercise.4. consider function + f (vec X) = N −1 N i=1 j=i+1 X = [ x1 · · · xN ] ∈ R1×N (76) (|xi − xj | − hij )2 ∈ R (1341) Find a gradient and Hessian for f . Then explain why f is not a convex function. why doesn’t second-order condition (616) apply to the constant positive semidefinite Hessian matrix you found.1 second-order ⇒ first-order condition For a twice-differentiable real function fi (X) : Rp → R having open domain. Y ] belongs to dom fi .22 Function symmetry is not a necessary requirement for convexity.4] 1 fi (Y ) = fi (X) + ∇fi (X)T (Y −X) + (Y −X)T ∇ 2fi (αX + (1 − α)Y )(Y −X) 2 (619) The first-order condition for convexity (607) follows directly from this and the second-order condition (616). GEOMETRY OF CONVEX FUNCTIONS 3. 3.2.

0. CONVEX MATRIX-VALUED FUNCTION 261 3.t SM ⇔ W .13. ( 2.7) comprises a minimal set of generators for that cone. for each and every Y. By dual generalized inequalities we have the equivalent but more broad criterion.7] liken symmetric matrices to real numbers.7.0.4.r. 2.2.7. . Z ∈ dom g and all 0 ≤ µ ≤ 1 [216. Then distance between points x1 and x2 is the norm of their difference: x1 − x2 1 . △ 3.0.3. discretization ( 2. Symmetric convex functions share the same + benefits as symmetric matrices. Horn & Johnson [203.3. A function g(X) : Rp×k → SM is convex in X iff dom g is a convex set and. Convex matrix-valued function: 1) Matrix-definition. Strict convexity is defined less a stroke of the pen in (620) similarly to (499). Because the set of all extreme directions for the selfdual positive semidefinite cone ( 2. It follows that g(X) : Rp×k → SM is convex in X iff wTg(X)w : Rp×k → R is convex in X for each and every w = 1 .0. Given a list of points arranged columnar in a matrix X = [ x1 · · · xN ] ∈ Rn×N (76) respect to the nonnegative orthant Rm×k .1 Definition. Taxicab distance matrix.5) g convex w. 7. 2) Scalar-definition.2.9.2 Example. Consider an n-dimensional vector space Rn with metric induced by the 1-norm.7. shown by substituting the defining inequality (620).1) allows replacement of matrix W with symmetric dyad wwT as proposed.7] g(µ Y + (1 − µ)Z) µ g(Y ) + (1 − µ)g(Z ) (620) SM + Reversing sense of the inequality flips this definition to concavity. g convex for each and every W + 0 (621) SM + ∗ Strict convexity on both sides requires caveat W = 0.13. and (symmetric) positive definite matrices to positive real numbers.

Prove (623).0.13. .0. . road distance is a sum of grid lengths.7. ( 2. Y − X →Y −X (625) where dg(X) is the directional derivative ( D. for each and every Y..0.1) of a convex matrix-valued function. . 1-norm distance matrix.1 first-order convexity condition. . matrix-valued f From the scalar-definition ( 3. GEOMETRY OF CONVEX FUNCTIONS then we could define a taxicab distance matrix D1 (X) (I ⊗ 1T ) | vec(X)1T − 1 ⊗ X | ∈ SN ∩ RN ×N + h n  0 x 1 − x2 1 x 1 − x3 1 · · ·  x 1 − x2 1 0 x 2 − x3 1 · · ·   =  x 1 − x3 1 x 2 − x3 1 0  . wwT ≥ 0 ∀ wwT( 0) SM + ∗ →Y −X (626) . This matrix-valued function is convex with respect to the nonnegative orthant since. The 1-norm is called taxicab distance because to go from one point to another in a city by car. x 1 − xN 1 x 1 − xN x 2 − xN x 3 − xN . 3. By discretized dual generalized inequalities. Z ∈ Rn×N and all 0 ≤ µ ≤ 1 D1 (µ Y + (1 − µ)Z) RN ×N + x 2 − xN 1 x 3 − xN 1 ···   (622)     µ D1 (Y ) + (1 − µ)D1 (Z ) (623) 3.  . 0 1 1 1  where 1n is a vector of ones having dim 1n = n and where ⊗ represents Kronecker product. for differentiable function g and for each and every real vector w of unit norm w = 1. ..1.262 CHAPTER 3.0. .7. .7.5) g(Y ) − g(X) − dg(X) →Y −X SM + 0 ⇔ g(Y ) − g(X) − dg(X) . we have wTg(Y )w ≥ wTg(X)w + wT dg(X) w →Y −X (624) that follows immediately from the first-order condition (606) for convexity of a real function because →Y −X wT dg(X) w = ∇X wTg(X)w .4) of function g at X in direction Y − X .3 Exercise.

5 on page 240.155] g(X) : Rp×k → SM : epi g {(X . 3.3. p. sublevel sets We generalize epigraph to a continuous matrix-valued function [34. Y ∈ dom g (confer (613)) g(Y ) SM + 263 g(X) + dg(X) →Y −X (627) must therefore be necessary and sufficient for convexity of a matrix-valued function of matrix variable on open convex domain. X) = ǫ X(A + ǫ I )−1 X T (631) where the inverse always exists by (1484).5.4 consider a matrix-valued function of two variables on dom g = SN × Rn×N for small positive constant ǫ (confer (1909)) + g(A .1 Example. [34. CONVEX MATRIX-VALUED FUNCTION For each and every X .2 epigraph of matrix-valued function.0. Matrix fractional function redux.0.7. Sublevel sets of a convex matrix-valued function corresponding to each and every S ∈ SM (confer (558)) LS g {X ∈ dom g | g(X) S } ⊆ Rp×k (630) SM + are convex. T ) ∈ Rp×k × SM | X ∈ dom g . This function is convex simultaneously in both variables over the entire positive semidefinite cone SN + . There is no converse.7. p. g(X) T} (628) SM + from which it follows g convex ⇔ epi g convex (629) Proof of necessity is similar to that in 3. 3.0.7.155] Generalizing Example 3.2.

X . Y ∈ Rp×k not necessarily in that function’s domain. [61. T ) = A + ǫ I XT X ǫ−1 T ⇔ T − ǫ X(A + ǫ I )−1 X T 0 A + ǫI ≻ 0 0 (632) epi g(A .0. Line theorem.9. ǫ X(A + ǫ I )−1 X T By Theorem 2.1 Theorem.7. 3. ⋄ Now we assume a twice differentiable function. Matrix-valued function g(X) : Rp×k → SM is convex in X iff dom g is an open convex set. X .3 second-order convexity condition. id est. X) is convex on SN × Rn×N because its epigraph is that inverse image: + T = G−1 SN +n + (633) 3.264 CHAPTER 3.3. Function g(A . 3. inverse image of the positive semidefinite cone SN +n + under affine mapping G(A .4: for T ∈ S n G(A .3.1] p×k M Matrix-valued function g(X) : R → S is convex in X if and only if it remains convex on the intersection of any line with its domain.0. 3.7.2 Definition.1. and its second derivative g ′′(X + t Y ) : R → SM is positive semidefinite on each point of intersection along every line {X + t Y | t ∈ R} that intersects dom g . then we say a line {X + t Y | t ∈ R} passes through dom g when X + t Y ∈ dom g over some interval of t ∈ R . Given a function g(X) : Rp×k → SM and particular X . T ) is convex. matrix-valued f The following line theorem is a potent tool for establishing convexity of a multidimensional function. GEOMETRY OF CONVEX FUNCTIONS and all X ∈ Rn×N : Consider Schur-form (1547) from A. what is meant by line must first be solidified. X) = (A . Differentiable convex matrix-valued function. X .1. iff for each and every X .1. To understand it. Y ∈ Rp×k such that X + t Y ∈ dom g over some open interval of t ∈ R d2 g(X + t Y ) dt2 0 SM + (634) . T ) | A + ǫ I ≻ 0 .7.0.

3.3 Example. for X . The matrix-valued function g(X) = X 2 is convex on the domain of symmetric matrices. [61.1.6.25 By (1508) in A. and strictly convex when A is positive definite The strict-case converse is reliably true for quadratic forms.7. [61.0. 7.23 . g(X) is convex in X . if d2 g(X + t Y ) ≻ 0 dt2 M S+ 265 (635) then g is strictly convex.23 △ 3.1) d2 d2 g(X + t Y ) = 2 (X + t Y )2 = 2Y 2 dt2 dt (637) which is positive semidefinite when Y is symmetric because then Y 2 = Y T Y (1490).25 A more appropriate matrix-valued counterpart for f is g(X) = X TX which is a convex function on domain {X ∈ Rm×n } .2] In particular. A. Matrix inverse. Hence.3. 3.24 tr X −1 is convex on that same domain.0. will not produce a convex function. [203. 3. This result is extensible. [204. and strictly convex whenever X is skinny-or-square full-rank.3.1.491] 3.3.24 3. CONVEX MATRIX-VALUED FUNCTION Similarly.1) The matrix-valued function X µ is convex on int SM for −1 ≤ µ ≤ 0 + or 1 ≤ µ ≤ 2 and concave for 0 ≤ µ ≤ 1.1. the function g(X) = X −1 is convex on int SM . 3. 3.3. Iconic real function f (x) = x2 is strictly convex on R . changing the domain instead to all symmetric and nonsymmetric positive semidefinite matrices.6 prob.3. d/dt tr g(X + t Y ) = tr d/dt g(X + t Y ).3.7. for example. Y ∈ SM and any open interval of t ∈ R ( D.25] 3.1.2] [55.706).5) d2 g(X + t Y ) = 2(X + t Y )−1 Y (X + t Y )−1 Y (X + t Y )−1 dt2 0 SM + (636) on some open interval of t ∈ R such that X + t Y ≻ 0.7.0. the converse is generally false.1 exer.4 Example. This matrix-valued function can be generalized to g(X) = X TAX which is convex whenever matrix A is positive semidefinite (p. p.2. (confer 3.3.4]3. For each and every Y ∈ SM + ( D.2. Matrix squared.

7 we have d 2 X+ t Y e = Y eX+ t Y Y dt2 0.7.5). .0. Give seven examples of distinct polyhedra P for which the set n {X TX | X ∈ P} ⊆ S+ (639) were convex. Squared inverse. SM + (XY )T = XY (640) because all circulant matrices are commutative and. X 2 sup Xa a =1 2 = σ(X)1 = λ(X TX)1 = minimize t∈R t tI X X T tI (638) 0 subject to The matrix 2-norm (spectral norm) coincides with largest singular value. Given symmetric argument. for any polyhedron P ? (confer (1266)(1273)) Is the epigraph of function g(X) = X TX convex for any polyhedral domain? 3. XY = Y X ⇔ (XY )T = XY (1507).6 Exercise.0.5 Exercise. 3. (confer 3. show how it follows via Definition 3. Given positive definite matrix constant A .0. Matrix exponential. Squared maps. in general.2.1) −2 For positive scalar a .26 From this result.0.7 Example.0.3.7. the matrix 3.3. from Table D. Applying the line theorem.0.2. for symmetric matrices. prove via line theorem that g(X) = tr (X TA−1X)−1 is generally not convex unless X ≻ 0 .3.3. Y ∈ SM .7. The matrix-valued function g(X) = eX : SM → SM is convex on the subspace of circulant [169] symmetric matrices.0.266 CHAPTER 3.2. GEOMETRY OF CONVEX FUNCTIONS and X is skinny-or-square full-rank (Corollary A.1-2 that h(X) = (X TA−1X)−1 is generally neither convex. 3.7. real function f (x) = ax is convex on the nonnegative real line. Is this set convex.7. This supremum of a family of convex functions in X must be convex because it constitutes an intersection of epigraphs of convex functions.3.1. for all t ∈ R and circulant X .3.26 Hint: D.

log det X = − log det X −1 is concave.0.3. 5. There are.8 Quasiconvex Quasiconvex functions [61. ( A. 3. Quasiconvex function. f (X) : Rp×k → R is a quasiconvex function of matrix X iff dom f is a convex set and for each and every Y. 3. f (Z )} A quasiconcave function is determined: f (µ Y + (1 − µ)Z) ≥ min{f (Y ) .1. p.4] [200] [332] [374] [245. Then for any matrix Y of compatible dimension. 2] are useful in practical problem solving because they are unimodal (by definition when nonmonotonic).3. changing the function domain to the subspace of real diagonal matrices reduces the matrix exponential to a vector-valued function in an isometrically isomorphic subspace RM . multifarious methods to determine function convexity..1) from the real-valued function case [61. QUASICONVEX 267 exponential always resides interior to the cone of positive semidefinite matrices in the symmetric matrix subspace. [34. [252.149] Show by three different methods: On interior of the positive semidefinite cone.0. f (Z )} (642) △ (641) . but its inverse is convex when domain is restricted to interior of a positive semidefinite cone. e.1 Definition. [61] [42] [133] each of them efficient when appropriate. in general. Y TeA Y is positive semidefinite. of course.g. The matrix exponential of any diagonal matrix eΛ exponentiates each individual entry on the main diagonal.8. log det function. known convex ( 3.0. eA ≻ 0 ∀ A ∈ SM (1906). a global minimum is guaranteed to exist over any convex set in the function domain. 0 ≤ µ ≤ 1 f (µ Y + (1 − µ)Z) ≤ max{f (Y ) . 3. 3.8.0.7.8 Exercise. Matrix determinant is neither a convex or concave function.5].1. Figure 79. 3.5) The subspace of circulant symmetric matrices contains all diagonal matrices.3] So.1.3. Z ∈ dom f .

△ .g. Note reversal of curvature in direction of gradient.268 CHAPTER 3. GEOMETRY OF CONVEX FUNCTIONS Figure 79: Iconic unimodal differentiable quasiconvex function of two variables graphed in R2 × R on some open disc in R2 . e.9.2. quasiconcave rank(X) on SM ( 2.0. or matrix-valued function g(X) : Rp×k → SM is a quasiconvex function of matrix X iff dom g is a convex set and the sublevel set corresponding to each and every S ∈ SM LS g = {X ∈ dom g | g(X) S } ⊆ Rp×k (630) is convex.2) and card(x) + on RM . Vectors are compared with respect to the nonnegative orthant RM + while matrices are with respect to the positive semidefinite cone SM .0.. Although insufficient for convex functions.9.2 Definition. Unlike convex functions.8. likewise LSg = {X ∈ dom g | g(X) S } ⊆ Rp×k (643) is necessary and sufficient for quasiconcavity. convexity of each and + every sublevel set serves as a definition of quasiconvexity: 3. vector-. quasiconvex functions are not necessarily continuous. + Convexity of the superlevel set corresponding to each and every S ∈ SM . Quasiconvex multidimensional function. Scalar-.

28 ( 3. monotonicity ⇒ quasilinearity ⇔ convex level sets.27 3. .8. 2. Exercise 3. Consider real function f on a positive definite domain f (X) = tr(X1 X2 ) .4..g.0. but dare not confuse these terms. it is called quasilinear. X2 ≥ s } X X1 ∈ dom f X2 rel int SN + rel int SN + 269 (644) (645) Prove: f (X) is not quasiconcave except when N = 1.2 quasilinear When a function is simultaneously quasiconvex and quasiconcave. convex hypograph ⇔ concavity ⇒ quasiconcavity ⇔ convex superlevel. but not vice versa.1 bilinear Bilinear function3. 3. Quasilinear functions are completely determined by convex level sets. 10] A quasiconvex function is not necessarily a continuous function.0.8.0. [3] e. convex epigraph ⇔ convexity ⇒ quasiconvexity ⇔ convex sublevel sets. nor is it quasiconvex unless X1 = X2 . [309.0. a monotonic concave function is quasiconvex.9 1.8.9 no. 3.9. One-dimensional function f (x) = x3 and vector-valued signum function sgn(x) .8.2). for example.6. 3.0. Nonconvexity of matrix product. are quasilinear. 3.27 xTy of vectors x and y is quasiconcave (monotonic) on the entirety of the nonnegative orthants only when vectors are of dimension 1.1. SALIENT PROPERTIES 3.4. Salient properties of convex and quasiconvex functions A convex function is assumed continuous but not necessarily differentiable on the relative interior of its domain. 3.3 Exercise.3. with superlevel sets Ls f = {X ∈ dom f | X1 . Any monotonic function is quasilinear 3.28 Convex envelope of bilinear functions is well known.

homogeneity) Function convexity. B.2. where h(Rm×n ) ∩ dom g = ∅ . concavity. 4.2. 5] – Sum of quasiconvex functions not necessarily quasiconvex.2] 5.30 (concave) functions remains convex (concave).270 3. 3.0. ( 3.29 . CHAPTER 3.1) – Nonnegatively weighted nonzero sum of strictly convex (concave) functions remains strictly convex (concave).30 3. – Pointwise supremum (infimum) of convex (concave) functions remains convex (concave).4. 7.7. Log-convex means: logarithm of function f is convex on dom f . – Nonnegatively weighted maximum (minimum) of quasiconvex (quasiconcave) functions remains quasiconvex (quasiconcave). (Figure 74) [309. and quasiconcavity are invariant to offset and nonnegative scaling. [61. [200.3. GEOMETRY OF CONVEX FUNCTIONS log-convex ⇒ convex ⇒ quasiconvex. quasiconvexity.1] Likewise for the quasiconvex (quasiconcave) functions g .1) translates identically to quasiconvexity (quasiconcavity).1. 3. – Nonnegatively weighted maximum (minimum) of convex3.2. The line theorem ( 3. (affine transformation of argument) Composition g(h(X)) of convex (concave) function g with any affine function h : Rm×n → Rp×k remains convex (concave) in X ∈ Rm×n . – Pointwise supremum (infimum) of quasiconvex (quasiconcave) functions remains quasiconvex (quasiconcave). 3. 6.29 concave ⇒ quasiconcave ⇐ log-concave ⇐ positive concave. g convex ⇔ −g concave g quasiconvex ⇔ −g quasiconcave g log-convex ⇔ 1/g log-concave (translation. – Nonnegatively weighted sum of convex (concave) functions remains convex (concave). Supremum and maximum of convex functions are proven convex by intersection of epigraphs.

All rights reserved. becomes conversion of a given problem (perhaps a nonconvex or combinatorial problem statement) to an equivalent convex form or to an alternation of convex subproblems convergent to a solution of the original problem: nascence of polynomial-time interior-point methods of solution [366] [384]. Convex Optimization & Euclidean Distance Geometry.) −Forsgren. 2005. 2001 Jon Dattorro.01. 4. citation: Dattorro.28.4.1 one a subset of the other.28. Linear programming ⊂ (convex ∩ nonlinear) programming. rather.01.Chapter 4 Semidefinite programming Prior to 1984. [168] [387] [388] [386] [351] [338] The goal.5. (The use of “programming” to mean “optimization” serves as a persistent reminder of these differences. Mεβoo Publishing USA. v2012. co&edg version 2012.1 271 . The explanation is: typically we do not seek analytical solution because there are relatively few. then. ( 3. linear and nonlinear programming. Gill. C) If a problem can be expressed in convex form. it may at first seem puzzling why a search for its solution ends abruptly with a formalized statement of the problem itself as a constrained optimization. had evolved for the most part along unconnected paths.2. without even a common terminology. then there exist computer programs providing efficient numerical global solution. & Wright (2002) [146] Given some practical application of convex analysis.

0. [61.2] [308. 1] Given convex real objective function g and convex feasible set D ⊆ dom g . 4. any locally optimal point (or solution) of a convex problem is globally optimal. there is much ongoing research to determine which problem classes have convex expression or relaxation.272 CHAPTER 4.0. we pose a generic convex optimization problem minimize X g(X) subject to X ∈ D (646) where constraints are abstract here in membership of variable X to convex feasible set D . and the lack of support for SDP solvers in popular modelling languages likely discourages submissions. which is the set of all variable values satisfying the problem constraints.9] . −SIAM News. We speculate that semidefinite programming is simply experiencing the fate of most new areas: Users have yet to understand how to pose their problems as semidefinite programs. Affine equality constraint functions. p. Similarly. 2002. [34] [59] [152] [279] [348] [150] 4. the problem maximize X g(X) subject to X ∈ D (647) is called convex were g a real concave function and feasible set D convex.1 Conic problem Still. as opposed to the superset of all convex equality constraint functions having convex level sets ( 3. we are surprised to see the relatively small number of submissions to semidefinite programming (SDP) solvers. [116.2. As conversion to convex form is not always possible. make convex optimization tractable.4. SEMIDEFINITE PROGRAMMING By the fundamental theorem of Convex Optimization. Inequality constraint functions of a convex optimization problem are convex while equality constraint functions are conventionally affine. but not necessarily so. as this is an area of significant current interest to the optimization community.4).

When K is a polyhedral cone ( 2. 4. negative quantities of activities are not possible.X y∈R . 6. K is its dual.8] minimize n (P) X∈ S C. Unlike the optimal objective value. 3-1]4.3. 3. where matrix C ∈ S n and vector b ∈ Rm are fixed. [94. y subject to X 0 A svec X = b subject to S 0 svec−1 (AT y) + S = C (D) (648) This is the prototypical semidefinite program and its dual. matrix A is fixed. the optimal solution set {x⋆ } or {y ⋆ .1] [240. . . a negative number of cases cannot be shipped. . . i = 1 .1 a Semidefinite program n When K is the selfdual cone of positive semidefinite matrices S+ in the subspace of symmetric matrices S n . S∈ S maximize m n b. each optimization problem is convex when K is a closed convex cone. are given.1.4] primal problem (P) having matrix variable X ∈ S n while corresponding dual (D) has slack variable S ∈ S n and vector variable y = [yi ] ∈ Rm : [10] [11. a solution to each convex problem is not necessarily unique.1). Thus 4. .2 Dantzig explains reasoning behind a nonnegativity constraint: . 2. T svec(Am ) where Ai ∈ S n . [279. the selfdual nonnegative orthant providing the prototypical primal linear program and its dual. as is   svec(A1 )T .1] [241] minimize (p) x c Tx maximize y.  ∈ Rm×n(n+1)/2  (649) A .12. s⋆ } is convex and may comprise more than a single point although the corresponding optimal objective value is unique when the feasible set is nonempty. 1. m . 2] [394. . CONIC PROBLEM 273 (confer p.3. .1. in other words. then each conic problem becomes a linear program.s b Ty ∗ subject to x ∈ K Ax = b subject to s ∈ K A Ty + s = c ∗ (d) (301) where K is a closed convex cone. . .160) Consider a conic problem (p) and its dual (d): [295. and the remaining quantities are vectors.4. then each conic problem is called a semidefinite program (SDP).2 More generally.

274 CHAPTER 4. id est. high capacity electronic computor that will be available here from the middle of next year.3] but numerical performance of contemporary general-purpose semidefinite program solvers remains limited: Computational intensity for dense systems varies as O(m2 n) (m constraints ≪ n variables) based on interior-point methods that produce solutions no more relatively accurate than 1E-8. SEMIDEFINITE PROGRAMMING  A1 . . There are no solvers capable of handling in excess of n =100.3] Heuristics are not ruled out by SIOPT. p. there are still questions relating to high-accuracy and speed. invented by Dantzig in 1947 [94]. indeed I would suspect that most successful methods have (appropriately described ) heuristics under the hood . one may easily run into several hundred variables and perhaps a hundred or more degrees of freedom. Gould. p. Of course. X tr(C TX) = svec(C )T svec X (38) where svec X defined by (56) denotes symmetric vectorization. Am .000 variables without significant. 11] [384] [366] [279. −Ragnar Frisch [148] Linear programming. In a national planning problem of some size. loss of precision or time.  . A svec X =  . sometimes crippling. C.258] [67. Stanford alumnus. [31] [102] Interior-point methods for numerical solution can be traced back to the logarithmic barrier of Frisch in 1954 and Fiacco & McCormick in 1968 [142]. . . . X . Karmarkar’s polynomial-time interior-point method sparked a log-barrier renaissance in 1984. Our outlook may perhaps be changed when we get used to the super modern. .my codes certainly do. The same cannot yet be said of semidefinite programming whose roots trace back to systems of positive semidefinite linear inequalities studied by Bellman & Fan in 1963. but for many applications a few digits of accuracy suffices and overnight runs for non-real-time delivery is acceptable. It should always be remembered that any mathematical method and particularly methods in linear programming must be judged with reference to the type of computing machinery available. X m  (650) svec−1 (AT y) = yi Ai i=1 The vector inner-product for matrices is defined in the Euclidean/Frobenius sense in the isomorphic vector space Rn(n+1)/2 . SIOPT Editor in Chief 4.4.3 [35] [278. M. is now integral to modern technology. . [276. p.3 . −Nicholas I.

Second-order cone program and quadratic program each subsume linear program. Semidefinite program is a subset of convex program PC.1. Nonconvex program \PC comprises those for which convex equivalents have not yet been found. Semidefinite program subsumes other convex program classes excepting geometric program.4. CONIC PROBLEM 275 PC quadratic program semidefinite program second-order cone program linear program Figure 80: Venn diagram of programming hierarchy. .

4 4. it asserts an extreme point ( 2.3] that contemporary interior-point methods [385] [291] [279] [11] [61. for example.5 This phenomenon can be explained by recognizing that interior-point methods generally find solutions relatively interior to a feasible set by design. [249] This characteristic might be regarded as a disadvantage to this method of numerical solution. [59] and because it theoretically subsumes other convex techniques: (Figure 80) linear programming and quadratic programming and second-order cone programming.0.3. That proposition asserts existence of feasible solutions with an upper bound on their rank. [26.276 CHAPTER 4. 4.1. is presented in 4. semidefinite programming has recently emerged to prominence because it admits a new class of problem previously unsolvable by convex optimization techniques. generally find vertex solutions.3. 4.5 {X ∈ S n | A svec X = b} (651) SOCP came into being in the 1990s. II. are rendered more difficult to find as numerical error becomes more prevalent there.4.3] Log barriers are designed to fail numerically at the feasible set boundary. 11] [146] (developed circa 1990 [152] for numerical solution of semidefinite programs) can converge to a solution of maximal complementarity. in contrast. A 4.5.13.0. 8. for construction of a primal optimal solution X ⋆ to (648P) satisfying an upper bound on rank governed by Proposition 2.0.1] specifically.4. given A ∈ Rm×n(n+1)/2 and b ∈ Rm . 4.6 Simplex methods.1 Reduced-rank solution A simple rank reduction algorithm.4. p. 5] [393] [254] [159] not a vertex-solution but a solution of highest cardinality or rank among all optimal solutions.2 Maximal complementarity It has been shown [394. it is not posable as a quadratic program. 2. 13]. all on the boundary. but this behavior is not certain and depends on solver implementation.6 [6.9.1. So low-rank solutions.6.2. can be posed as a semidefinite program.1.4 Determination of the Riemann mapping function from complex analysis [288] [29. . [177.1) of primal feasible set n A ∩ S+ satisfies upper bound √ 8m + 1 − 1 rank X ≤ (272) 2 where. SEMIDEFINITE PROGRAMMING Nevertheless.

+ Euclidean distance from any particular rank-3 positive semidefinite matrix (in the cone interior) to the closest rank-2 positive semidefinite matrix (on the boundary) is generally less than the distance to the closest rank-1 positive semidefinite matrix.12.1 similarly hold for S3 and S+ .7. analogy That low-rank and high-rank optimal solutions {X ⋆ } of (648P) coexist may be grasped with the following analogy: We compare a proper polyhedral cone 3 S+ in R3 (illustrated in Figure 81) to the positive semidefinite cone S3 in + isometrically isomorphic R6 .9.2. Rank of a sum of members F + G in Lemma 2. Rank-1 matrices are in one-to-one correspondence with extreme 3 directions of S3 and S+ . + 3 int S+ has three dimensions. + 3 boundary ∂S+ contains 0-. The set of all rank-1 symmetric matrices in + this dimension G ∈ S3 | rank G = 1 + is not a connected set.2. 1-. and 2-dimensional faces. rank-1.9.1.2. boundary ∂ S3 contains rank-0.and high-rank solutions.1. CONIC PROBLEM is the affine subset from primal problem (648P).9. (652) . and rank-2 matrices.1.1.1.1 and location of 3 a difference F − G in 2.2) distance from any point in ∂ S3 to int S3 is infinitesimal ( 2. difficult to visualize. The analogy is good: int S3 is constituted by rank-3 matrices.4.2 Coexistence of low. 277 4. the only rank-0 matrix resides in the vertex at the origin.1). ( 7. + + 3 3 distance from any point in ∂S+ to int S+ is infinitesimal.

Number of facets is + arbitrary (analogy is not inspired by eigenvalue decomposition). rank-2 to the facet relative interiors. rank-1 positive semidefinite matrices correspond to the edges of the polyhedral cone. X on P . and 3 extreme directions of S+ . and rank-3 to the polyhedral cone interior. 3 Vertices Γ1 and Γ2 are extreme points of polyhedron P = ∂H ∩ S+ .r. SEMIDEFINITE PROGRAMMING C 0 Γ1 P Γ2 3 S+ A = ∂H Figure 81: Visualizing positive semidefinite cone in high dimension: Proper 3 polyhedral cone S+ ⊂ R3 representing positive semidefinite cone S3 ⊂ S3 . (confer Figure 26. A given vector C is normal to another hyperplane (not illustrated but independent w. + analogizing its intersection with hyperplane S3 ∩ ∂H . Figure 30) .278 CHAPTER 4. The rank-0 positive semidefinite matrix corresponds to the origin in R3 .t ∂H) containing line segment Γ1 Γ2 minimizing real linear function C .

the analogue to (272). 3 from all X ∈ ∂H ∩ S+ there exists an X such that rank X ≤ 1.1.9. Proof. Now visualize intersection of the polyhedral cone with two (m = 2) hyperplanes having linearly independent normals. The hyperplane intersection A makes a line. providing an upper bound. there exists a positive semidefinite matrix X belonging to any line intersecting the polyhedral cone such that rank X ≤ 2. For A equal to intersection of m hyperplanes having linearly 3 independent normals. . and for X ∈ S+ ∩ A . the hyperplane intersection A makes a point that can reside anywhere in 3 the polyhedral cone. In other words. Every intersecting line contains at least one matrix having rank less than or equal to 2.2. we have rank X ≤ m . In the case of three independent intersecting hyperplanes (m = 3). Positive semidefinite cone S3 has four kinds of faces.4. whose dimensions in isometrically isomorphic R6 are listed under dim F(S3 ).3.1): + 279 k 0 boundary 1 2 interior 3 3 dim F(S+ ) dim F(S3 ) + 0 0 1 1 2 3 3 6 dim F(S3 ∋ rank-k matrix) + 0 1 3 6 3 Integer k indexes k-dimensional faces F of S+ . Rank 1 is therefore an upper bound in this case. CONIC PROBLEM 3 faces of S3 correspond to faces of S+ (confer Table 2. The upper bound on a point in S+ is also the greatest upper bound: rank X ≤ 3. id est. With reference to Figure 81: Assume one (m = 1) hyperplane A = ∂H intersects the polyhedral cone. Every intersecting plane contains at least one matrix having rank less than or equal to 1 . including cone itself (k = 3. + boundary + interior). Smallest face F(S3 ∋ rank-k matrix) + + that contains a rank-k positive semidefinite matrix has dimension k(k + 1)/2 by (222).

As predicted by analogy to Barvinok’s Proposition 2.0. X = f0 }. and every face contains a subset of the extreme points of P by the extreme existence theorem ( 2. 4. Consider minimization of the real linear function C .2).280 CHAPTER 4.9.1. This means: because the affine subset A and hyperplane ⋆ {X | C .6. Γ2 } ⊆ F(P) (657) As all linear functions on a polyhedron are minimized on a face.0.7 in particular.2)) extreme points of the intersection P as well. The rank-1 matrices Γ1 and Γ2 in face F(P) are extreme points of that face and (by transitivity ( 2. X over P a polyhedral feasible set. calculation of an upper bound on ⋆ rank of X ignores counting the hyperplane when determining m in (272). Optimization over A ∩ S+ . The upper bound on rank of an optimal solution X ⋆ existent in F(P) is thereby also satisfied by an extreme point of P precisely because {X ⋆ } constitutes F(P) .0. on any X belonging to the face of P 3 ⋆ F(P) = {X | C . In other words.2. the set of all ⋆ optimal points X is a face of P {X ⋆ } = F(P) = Γ1 Γ2 (656) comprising rank-1 and rank-2 positive semidefinite matrices.2. [94] [253] [275] [282] by analogy we so demonstrate coexistence of optimal solutions X ⋆ of (648P) having assorted rank. X = f0 } must intersect a whole face of P . ⋆ f0 3 A ∩ S+ (653) minimize X C. SEMIDEFINITE PROGRAMMING 3 4.3.7 .1. the upper bound on rank of X existent in the feasible set P is satisfied by an extreme point. {X ⋆ ∈ P | rank X ⋆ ≤ 1} = {Γ1 . Γ2 } and all the rank-2 matrices in between.4.1 Example. X = f0 } ∩ A ∩ S+ (655) ⋆ exposed by the hyperplane {X | C .1. this linear function is minimized on any X belonging to the face of P containing extreme points {Γ1 . Rank 1 is the upper bound on existence in the feasible set P for this case m = 1 hyperplane constituting A .6. id est.X 3 subject to X ∈ A ∩ S+ (654) As illustrated for particular vector C and hyperplane A = ∂H in Figure 81.

minimize n X∈ S subject to I 0 .1. 2.2] when given a positive definite matrix C and an arbitrarily small neighborhood of C comprising positive definite matrices. Assume dimension n to be an even positive number.9. Clearly.3]. optimal solution (659) is a terminal solution along the central path taken by the interior-point method as implemented in [394. We will review such an algorithm in 4. certainly nothing given any symmetric matrix C . This can be proved by example: 4. ˜ there exists a matrix C from that neighborhood such that optimal solution n ⋆ ˜ X to (648P) (substituting C ) is an extreme point of A ∩ S+ and satisfies 4.1. Then the particular instance of problem (648P).2. X 0 2I X 0 I. Indeed.3.3 Previous work 281 Barvinok showed.1 Example. it is also a solution of highest rank among all optimal solutions to (658). 2. .1.2.0.3. [24. as the problem is posed.1. but first we provide more background. rank of this primal optimal solution exceeds by far a rank-1 solution predicted by upper bound (272). (Ye) Maximal Complementarity. CONIC PROBLEM 4.5.8 upper bound (272). 4. the set of all such C in that neighborhood is open and dense. this means nothing inherently guarantees that an optimal solution X ⋆ to problem (648P) satisfies (272).8 ˜ Further.2.4.1.X =n 2I 0 0 0 (658) has optimal solution X⋆ = ∈ Sn (659) with an equal number of twos and zeros along the main diagonal.4 Later developments This rational example (658) indicates the need for a more generally applicable and simple algorithm to identify an optimal solution X ⋆ satisfying Barvinok’s Proposition 2.3. 4. Given arbitrary positive definite C .

X (660) D∗ = n S ∈ S+ . ( 2.9 n Equivalently. each assumed nonempty. primal feasible set A ∩ S+ is nonempty if and only if yi Ai i=1 0.1. n primal feasible A ∩ S+ represents an intersection of the positive semidefinite n cone S+ with an affine subset A of the subspace of symmetric matrices S n in isometrically isomorphic Rn(n+1)/2 . m} 4. m} . A has dimension n(n + 1)/2 − m when the vectorized Ai are linearly independent. each and every vector y = [yi ] ∈ Rm such that 0.   Am .0.1 n A ∩ S+ emptiness determination via Farkas’ lemma (651) A = {X ∈ S n | Ai . X   .9. Dual feasible set D∗ is a Cartesian product of the positive semidefinite cone with its inverse image ( 2. 4.1) under affine transformation4. matrix S is known as a slack variable (a term borrowed from linear programming [94]) since its inclusion raises this inequality to equality.2.0.2.1. . .2. Hence. (648P) and (648D) are convex optimization problems. i = 1 . n n  = b = A ∩ S+ . . SEMIDEFINITE PROGRAMMING 4. D = X ∈ S+ |  .5.1 Framework Feasible sets Denote by D and D∗ the convex sets of primal and dual points respectively satisfying the primal and dual constraints in (648). X = b i .1) and is known as a linear matrix inequality.1) Because yi Ai C .1. Semidefinite Farkas’ lemma. and affine subset n then primal feasible set A ∩ S+ is nonempty if and only if y T b ≥ 0 holds for m {A svec X | X yi Ai i=1 0} (379) is closed. i = 1 . Given set {Ai ∈ S n .2 4. Geometrically.     A1 . Both feasible sets are convex. y = [yi ] ∈ Rm | m yi Ai + S = C i=1 These are the primal feasible set and dual feasible set. . m y T b ≥ 0 holds for each and every vector y = 1 such that 4.1.9.1.13.1 Lemma. vector b = [b i ] ∈ Rm .1. .9 C − yi Ai . ⋄ Inequality C − yi Ai 0 follows directly from (648D) ( 2. and the objective functions are linear on a Euclidean vector space.282 CHAPTER 4.

1 K = {y | ∗ m yj Aj j=1 0} (386) then we can apply membership relation b∈K ⇔ to obtain the lemma b∈K ⇔ ∃X 0 A svec X = b ∗ y. T svec(Am ) semidefinite Farkas’ lemma assumes that a convex cone K = {A svec X | X 0} (379) is closed per membership relation (319) from which the lemma springs: [233. a positive definite lemma is required to insure that a point of intersection closest to the origin is not at infinity. any point satisfying the respective affine constraints and relatively interior to the convex cone.  ∈ Rm×n(n+1)/2 (649) A = .182).8] because existence of a feasible n solution in the cone interior A ∩ int S+ is required by Slater’s condition 4.g.2.3] ..10 Slater’s sufficient constraint qualification is satisfied whenever any primal or dual strictly feasible solution exists.1.10 to achieve 0 duality gap (optimal primal−dual objective difference 4.4. Geometrically. 4. 6. 1. 5.6] [41.2. e. then Slater’s constraint qualification is satisfied when any feasible solution exists (relatively interior to the cone or on its relative boundary). p. id est.3. FRAMEWORK 283 Semidefinite Farkas’ lemma provides necessary and sufficient conditions n for a set of hyperplanes to have nonempty intersection A ∩ S+ with the positive semidefinite cone. a positive definite version is required for semidefinite programming [394. [61. [332. b ≥ 0 ∀ y ∈ K ⇔ ⇔ n A ∩ S+ = ∅ n A ∩ S+ = ∅ (661) (662) The final equivalence synopsizes semidefinite Farkas’ lemma.2. .325] If the cone were polyhedral. I] K closure is attained when matrix A satisfies the cone closedness invariance corollary (p. b ≥ 0 ∀y ∈ K ∗ (319) b∈K ⇔ y. Figure 58).13. While the lemma is correct as stated. Given closed convex cone K and its dual from Example 2. Given   svec(A1 )T .3.5.

X = b i . m} (651) holds for each and every vector y = [yi ] = 0 such that Equivalently.3 Example. . to support closedness in semidefinite Farkas’ Lemma 4. for y = 1 y1 4. videlicet. “New” Farkas’ lemma.11 0 1 √ 2 0 + y2 0 0 0 1 0 ⇔ y= 0 1 ⇒ yTb = 0 (666) n Detection of A ∩ int S+ = ∅ by examining int K instead is a trick need not be lost.1.1.2. . ⋄ 4. make affine set n Primal feasible cone interior A ∩ int S+ is nonempty if and only if y T b > 0 m A = {X ∈ S n | Ai .1. Positive definite Farkas’ lemma.2.i. K (379) and K (386). primal feasible cone interior A ∩ only if y T b > 0 holds for each and every vector y = 1 yi Ai 0. SEMIDEFINITE PROGRAMMING Figure 45. i=1 n int S+ is nonempty m yi Ai i=1 if and 0. Yet the dual system yi Ai i=1 0 ⇒ y T b ≥ 0 erroneously indicates nonempty intersection because 1 √ 2 K (379) violates a closedness condition of the lemma.1.1. i = 1 . Lasserre [233. b= 1 0 (664) n Intersection A ∩ S+ is practically empty because the solution set {X m 0 | A svec X = b} = α 1 √ 2 1 √ 2 0 0 | α∈ R (665) is positive semidefinite only asymptotically (α → ∞). originally offered by Ben-Israel in 1969 [32. i = 1 . b > 0 ∀y ∈ K .1: A svec(A1 )T svec(A2 )T = 0 1 0 0 0 1 . y = 0 ∗ ⇔ n A ∩ int S+ = ∅ (663) ∗ Positive definite Farkas’ lemma is made from proper cones. and membership relation (325) for which K closedness is unnecessary: 4. m} and vector b = [b i ] ∈ Rm . set {Ai ∈ S n . .284 CHAPTER 4. .11 (382) b ∈ int K ⇔ y.1. p.4.2 Lemma. we wish to detect existence of nonempty primal feasible set interior to the PSD cone. Then given A ∈ Rm×n(n+1)/2 having rank m .378]. Given l. .2. III] presented an example in 1995.

2. what we need to know for semidefinite programming. 4. i = 1 .1.2. 4. .1.13.3 Boundary-membership criterion (confer (662)(663)) From boundary-membership relation (329). b = 0.1. y = 0 n Any single vector y satisfying the alternative certifies A ∩ int S+ is empty. m} minimize y yTb m subject to i=1 yi Ai y 2 0 (668) ≤1 If an optimal vector y ⋆ = 0 can be found such that y ⋆T b ≤ 0. . we may construct alternative systems from them. positive definite Farkas’ Lemma 4.2. y ∈ K . Such a vector can be found as a solution to another semidefinite program: for linearly independent (vectorized) set {Ai ∈ S n . But positive definite Farkas’ lemma ( 4.2. yi Ai i=1 (667) 0.2 certifies that n A ∩ int S+ is empty. b ∈ ∂K ⇔ ∃y =0 y. Applying the method of 2.1) to make a new lemma having no closedness condition. for proper ∗ cones K (379) and K (386) of linear matrix inequality.1.2.1.1.2.2) is simpler and obviates the additional condition proposed.1. then from positive definite Farkas’ lemma we get n A ∩ int S+ = ∅ m or in the alternative yTb ≤ 0 .1. then primal n feasible cone interior A ∩ int S+ is empty.4.2 Theorem of the alternative for semidefinite programming Because these Farkas’ lemmas follow from membership relations. FRAMEWORK 285 On the other hand.1.1. Lasserre suggested addition of another condition to semidefinite Farkas’ lemma ( 4. b ∈ K ∗ ⇔ n n ∂ S+ ⊃ A ∩ S+ = ∅ (669) .2.

y i yi Ai + S . . one that is certainly expressible as a feasibility problem: Given linearly independent set4.1.2 Duals The dual objective function from (648D) evaluated at any feasible solution represents a lower bound on the primal optimal objective value from (648P). 4.X ≥0 (671) The converse also follows because X 0. for b ∈ K (661) find y = 0 subject to y T b = 0 m (670) 0 yi Ai i=1 Any such nonzero solution y certifies that affine subset A (651) intersects the n positive semidefinite cone S+ only on its boundary.12 .8] and can be used to detect convergence in any primal/dual numerical method of solution. X ] y S. vector b on the boundary of K cannot be detected simply by looking for 0 eigenvalues in matrix X .12 {Ai ∈ S n .5. We do not consider a n skinny-or-square matrix A because then feasible set A ∩ S+ is at most a single point.X ≥0 (1518) Optimal value of the dual objective thus represents the greatest lower bound on the primal.1.13. Then it is always true: C . 1. nonempty n n feasible set A ∩ S+ belongs to the positive semidefinite cone boundary ∂ S+ . . SEMIDEFINITE PROGRAMMING Whether vector b ∈ ∂K belongs to cone K boundary.286 CHAPTER 4. S 0 ⇒ S. n We can see this by direct substitution: Assume the feasible sets A ∩ S+ and ∗ D are nonempty. X · · · Am . in other words. that is a determination we can indeed make. 4. X ≥ [ A1 . m} . X ≥ b. This fact is known as the weak duality theorem for semidefinite programming.2. [394. From the results of Example 2. i = 1 .3.

˜ Derive prototypical primal (648P) from its canonical dual (648D).1 Dual problem statement is not unique 287 Even subtle but equivalent restatements of a primal convex problem can lead to vastly different statements of a corresponding dual problem. y C subject to svec−1 (AT y) ˜ (648D) n Dual feasible cone interior in int S+ (660) (650) thereby corresponds with ˜ canonical dual (D) feasible interior m ˜ rel int D∗ y∈ R | m i=1 yi Ai ≺ C (672) 4. X ⋆ = 0 . X ⋆ = b .2.2. the primal optimal objective value becomes equal to the dual optimal objective value: there is no duality gap (Figure 58) and so determination of n ˜ convergence is facilitated.2. X ⋆ ] y ⋆ (673) S⋆ .283). then these two problems (648P) (648D) become strong duals by Slater’s sufficient condition (p. FRAMEWORK 4.2.4. demonstrate that particular connectivity in Figure 82.3 Optimality conditions n When primal feasible cone interior A ∩ int S+ exists in S n or when canonical ˜ dual feasible interior rel int D∗ exists in Rm .1. y⋆ i ⋆ yi Ai + S ⋆ . In other words.2. X ⋆ · · · Am . S∈ Sn maximize b.1 Exercise. for example. X ⋆ = [ A1 . Primal prototypical semidefinite program. This phenomenon is of interest because a particular instantiation of dual problem might be easier to solve numerically or it might take one of few forms for which analytical solution is known. Here is a canonical restatement of prototypical dual semidefinite program (648D). id est. equivalent by (194): y∈Rm. y (D) ≡ subject to S 0 −1 T svec (A y) + S = C maximize m y∈R b.2. 4. id est. if ∃ X ∈ A ∩ int S+ or ∃ y ∈ rel int D∗ then C .

13 We summarize this: 4.1] [394. [362. y ⋆ are optimal for (648D) duality gap C .3. dual of a dual problem is not necessarily stated in precisely same manner as corresponding primal convex problem. in keeping with linear programming tradition [94]. that forbids dual inequalities in (648) to simultaneously hold strictly.1. assume primal n and dual feasible sets A ∩ S+ ⊂ S n and D∗ ⊂ S n × Rm (660) are nonempty. More generally.4. in other words. SEMIDEFINITE PROGRAMMING ˜ P duality duality D transformation ˜ D Figure 82: Connectivity indicates paths between particular primal and dual problems from Exercise 4. X = 0 4.1 Corollary. 3.3. Optimality and strong duality.2. [308. X ⋆ = 0 is called a complementary slackness condition. although its solution set is equivalent to within some transformation. 4] .1.2. X ⋆ − b .13 ˜ ∃ y ∈ rel int D∗ ⋄ Optimality condition S ⋆ .8] For semidefinite programs (648P) and (648D). y ⋆ denote a dual optimal solution. where S ⋆ .288 P CHAPTER 4.0.2. y ⋆ is 0 if and only if n i) ∃ X ∈ A ∩ int S+ or and ⋆ ⋆ ii) S . 1. Then X ⋆ is optimal for (648P) S ⋆ . any given path is not necessarily circuital. any path between primal ˜ ˜ problems P (and equivalent P) and dual D (and equivalent D) is possible: implying.

3.4) S ⋆ . .2. X ⋆ = 0 ⇔ S ⋆X ⋆ = X ⋆S ⋆ = 0 (674) Commutativity of diagonalizable matrices is a necessary and sufficient condition [203.1 solving primal problem via dual The beauty of Corollary 4.1 is its conjugacy. The product of the nonnegative diagonal Λ matrices can be 0 if their main diagonal zeros are complementary or coincide.2). FRAMEWORK 289 For symmetric positive semidefinite matrices.2. When a dual optimal solution is known. the total number of nonzero diagonal entries from both Λ cannot exceed n .14 distinct from maximal complementarity ( 4.2. because of the complementarity.4. Due only to symmetry.11] S ⋆ X ⋆ = X ⋆ S ⋆ = QΛS ⋆ ΛX ⋆ QT = 0 ⇔ ΛX ⋆ ΛS ⋆ = 0 (676) where Q is an orthogonal matrix.7.14 Logically it follows that a necessary and sufficient condition for strict complementarity of an optimal primal and dual solution is X ⋆ + S⋆ ≻ 0 (678) 4.12] for these two optimal symmetric matrices to be simultaneously diagonalizable. Therefore rank X ⋆ + rank S ⋆ ≤ n (675) Proof. requirement ii is equivalent to the complementarity ( A.4. rank X ⋆ = rank ΛX ⋆ and rank S ⋆ = rank ΛS ⋆ for these optimal primal and dual solutions. S ⋆ ∈ S n must itself be symmetric because of commutativity.1. cor. X = 0}. id est. a primal optimal solution is any primal feasible solution in hyperplane {X | S ⋆ . the product of symmetric optimal matrices X ⋆ .2. 4. and so we have what is called strict complementarity.3. To see that. one can solve either the primal or dual problem in (648) and then find a solution to the other via the optimality conditions. When equality is attained in (675) rank X ⋆ + rank S ⋆ = n (677) there are no coinciding main diagonal zeros in ΛX ⋆ ΛS ⋆ .3. 1.0. for example. (1493) So. (1507) The symmetric product has diagonalization [11.

4] [348] (confer Example 4.5.1) Consider finding a minimal cardinality Boolean solution x to the classic linear algebra problem Ax = b given noiseless data A ∈ Rm×n and b ∈ Rm . 4. (679) i=1 . i = 1 . .1 Example. Given data  −1 −9 1 2 4 8 8 8 1 1 2 1 4 1 1 3 1 9 0 1 −1 2 3 1 −1 4 9  A =  −3   . .1. the two problems minimize (679) x x 0 minimize = i=1 . n x x 1 subject to Ax = b xi ∈ {0.290 CHAPTER 4. The Boolean constraint makes the 1-norm problem nonconvex. 1} .1.3. x⋆ = e4 ∈ R6 (682) .a 0-norm. The Boolean constraint forces an identical solution were the norm in problem (679) instead the 1-norm or 2-norm.k. n where x 0 denotes cardinality of vector x (a. In this example.2. 1} . minimize x x 0 subject to Ax = b xi ∈ {0.  b =  1 1 2 1 4    (681) the obvious and desired solution to the problem posed. . [93] [34. .5. arising in many fields of science and across many disciplines. id est. . [320] [215] [172] [171] Yet designing an efficient algorithm to optimize cardinality has proved difficult. Minimal cardinality Boolean. n are the same. we also constrain the variable to be Boolean. (680) subject to Ax = b xi ∈ {0.3. SEMIDEFINITE PROGRAMMING 4. not a convex function). A minimal cardinality solution answers the question: “Which fewest linear combination of columns in A constructs vector b ?” Cardinality problems have extraordinarily wide appeal. 1} . .

has norm ‖x⋆‖_2 = 1 and minimal cardinality; id est, the minimum number of nonzero entries in vector x. The Matlab backslash command x=A\b finds

  x_M = [ 2/128   0   5/128   0   90/128   0 ]^T   (683)

having norm ‖x_M‖_2 = 0.7044. Coincidentally, x_M is a 1-norm solution; id est, an optimal solution to

  minimize_x   ‖x‖_1
  subject to   Ax = b   (517)

The pseudoinverse solution (rounded)

  x_P = A†b = [ −0.0456  −0.1881  0.0623  0.2668  0.3770  −0.1102 ]^T   (684)

has least norm ‖x_P‖_2 = 0.5165; id est, x_P is the optimal solution to (§E.0.1)

  minimize_x   ‖x‖_2
  subject to   Ax = b   (685)

Certainly none of these traditional methods provide x⋆ = e_4 (682) because, for Ax = b and in general,

  ‖arg inf ‖x‖_2‖_2  ≤  ‖arg inf ‖x‖_1‖_2  ≤  ‖arg inf ‖x‖_0‖_2   (686)
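These traditional solutions are reproducible in a few lines of Matlab; a sketch, with A and b entered from (681) and cvx assumed available for the 1-norm program:

    xM = A\b;                % basic solution from QR; coincides with (683) here
    xP = pinv(A)*b;          % minimum 2-norm solution (684), optimal for (685)
    [norm(xM) norm(xP)]      % approximately [0.7044 0.5165]
    cvx_begin quiet          % 1-norm problem (517)
        variable x(6)
        minimize( norm(x,1) )
        subject to
            A*x == b;
    cvx_end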

We can reformulate this minimal cardinality Boolean problem (679) as a semidefinite program: First transform the variable

  x ≜ (x̂ + 1)/2   (687)

so x̂_i ∈ {−1, 1};

  minimize_x̂   ‖(x̂ + 1)/2‖_0
  subject to   A(x̂ + 1)/2 = b
               δ(x̂x̂^T) = 1   (688)

where δ is the main-diagonal linear operator (§A.1). By assigning (§B.1)

  G = [ x̂ ] [ x̂^T  1 ] = [ X     x̂ ]  ≜  [ x̂x̂^T   x̂ ]  ∈ S^{n+1}   (689)
      [ 1 ]              [ x̂^T   1 ]     [ x̂^T     1 ]

problem (688) becomes equivalent to: (Theorem A.3.1.0.7)

  minimize_{X ∈ S^n, x̂ ∈ R^n}   1^T x̂
  subject to   A(x̂ + 1)/2 = b
               G = [ X     x̂ ]   (⪰ 0)
                   [ x̂^T   1 ]
               δ(X) = 1
               rank G = 1   (690)

where the solution is confined to rank-1 vertices of the elliptope in S^{n+1} (§5.9.1.0.1) by the rank constraint, the positive semidefiniteness, and the equality constraints δ(X) = 1. The rank constraint makes this problem nonconvex; by removing it^4.15 we get the semidefinite program

4.15 Relaxed problem (691) can also be derived via Lagrange duality; it is a dual of a dual program [sic] to (690). [306] [61, §5, exer.5.39] [380, §IV] [151, §11.3.4] The relaxed problem must therefore be convex having a larger feasible set; its optimal objective value represents a generally loose lower bound (1720) on the optimal objective of problem (690).

  minimize_{X ∈ S^n, x̂ ∈ R^n}   1^T x̂
  subject to   A(x̂ + 1)/2 = b
               G = [ X     x̂ ]  ⪰ 0
                   [ x̂^T   1 ]
               δ(X) = 1   (691)

whose optimal solution x⋆ (687) is identical to that of minimal cardinality Boolean problem (679) if and only if rank G⋆ = 1. Hope^4.16 of acquiring a rank-1 solution is not ill-founded because 2^n elliptope vertices have rank 1, and we are minimizing an affine function on a subset of the elliptope (Figure 136) containing rank-1 vertices; id est, by assumption that the feasible set of minimal cardinality Boolean problem (679) is nonempty, a desired solution resides on the elliptope relative boundary at a rank-1 vertex.^4.17

For that data given in (681), our semidefinite program solver sdpsol [387] [388] (accurate in solution to approximately 1E-8)^4.18 finds optimal solution to (691)

  round(G⋆) = [  1   1   1  −1   1   1  −1
                 1   1   1  −1   1   1  −1
                 1   1   1  −1   1   1  −1
                −1  −1  −1   1  −1  −1   1
                 1   1   1  −1   1   1  −1
                 1   1   1  −1   1   1  −1
                −1  −1  −1   1  −1  −1   1 ]   (692)

near a rank-1 vertex of the elliptope in S^{n+1} (Theorem 5.9.1.0.2).

4.16 A more deterministic approach to constraining rank and cardinality is developed in §4.6.
4.17 Confinement to the elliptope can be regarded as a kind of normalization akin to the matrix A column normalization suggested in [121] and explored in Example 4.2.3.1.2.
4.18 A typically ignored limitation of interior-point solution methods is their relative accuracy of only about 1E-8 on a machine using 64-bit (double precision) floating-point arithmetic; id est, optimal solution x⋆ cannot be more accurate than square root of machine epsilon (ε = 2.2204E-16). Nonzero primal−dual objective difference is not a good measure of solution accuracy.
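The book's computations used sdpsol; an equivalent transcription of (691) into cvx syntax (an assumption of ours, not the original script) might read, with A and b from (681):

    cvx_begin sdp quiet
        variable X(6,6) symmetric
        variable xh(6)
        minimize( sum(xh) )                % 1'*xh
        subject to
            A*(xh + 1)/2 == b;
            diag(X) == ones(6,1);          % delta(X) = 1
            [X xh; xh' 1] >= 0;            % G psd; rank constraint dropped
    cvx_end
    xstar = round((xh + 1)/2)              % recover x via (687); expect e4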

These numerical results are solver dependent; insofar, not all SDP solvers will return a rank-1 vertex solution. Its sorted eigenvalues,

  λ(G⋆) = [ 6.99999977799099
            0.00000022687241
            0.00000002250296
            0.00000000262974
            0.00000000127947
           −0.00000000999738
           −0.00000001000000 ]   (693)

Negative eigenvalues are undoubtedly finite-precision effects. Because the largest eigenvalue predominates by many orders of magnitude, we can expect to find a good approximation to a minimal cardinality Boolean solution by truncating all smaller eigenvalues. We find, indeed, the desired result (682)

  x⋆ = round( [ 0.00000001408950
                0.00000000482903
                0.00000000527369
                0.99999997469044
                0.00000000181001
                0.00000000999875 ] ) = e_4   (694)

4.2.3.1.2 Example. Optimization over elliptope versus 1-norm polyhedron for minimal cardinality Boolean Example 4.2.3.1.1.
A minimal cardinality problem is typically formulated via, what is by now, the standard practice [121] [68, §3.2, §3.4] of column normalization applied to a 1-norm problem surrogate like (517). Suppose we define a diagonal matrix

  Λ ≜ [ ‖A(:,1)‖_2                             0
                   ‖A(:,2)‖_2
                               ⋱
        0                             ‖A(:,6)‖_2 ]  ∈ S^6   (695)

used to normalize the columns (assumed nonzero) of given noiseless data matrix A. Then approximate the minimal cardinality Boolean problem

  minimize_x   ‖x‖_0
  subject to   Ax = b
               x_i ∈ {0, 1},  i = 1 ... n   (679)

as

  minimize_ỹ   ‖ỹ‖_1
  subject to   AΛ^{-1}ỹ = b
               1 ⪰ Λ^{-1}ỹ ⪰ 0   (696)

where an optimal solution is recovered by

  y⋆ = round(Λ^{-1}ỹ⋆)   (697)

The inequality in (696) relaxes the Boolean constraint y_i ∈ {0, 1} from (679); it bounds any solution y⋆ to a nonnegative unit hypercube whose vertices are binary numbers. Convex problem (696) is justified by the convex envelope

  cenv ‖x‖_0 on {x ∈ R^n | ‖x‖_∞ ≤ κ} = (1/κ)‖x‖_1   (1399)

Donoho concurs with this particular formulation, equivalently expressible as a linear program via (513). Approximation (696) is therefore equivalent to minimization of an affine function (§3.2) on a bounded polyhedron, whereas semidefinite program

  minimize_{X ∈ S^n, x̂ ∈ R^n}   1^T x̂
  subject to   A(x̂ + 1)/2 = b
               G = [ X     x̂ ]  ⪰ 0
                   [ x̂^T   1 ]
               δ(X) = 1   (691)

minimizes an affine function on an intersection of the elliptope with hyperplanes. Although the same Boolean solution is obtained from this approximation (696) as compared with semidefinite program (691) when given that particular data from Example 4.2.3.1.1, Singer confides a counterexample: Instead, given data

  A = [ 1   0   1/√2
        0   1   1/√2 ] ,    b = [ 1
                                  1 ]   (698)

solving approximation (696) yields

  y⋆ = round( [ 1 − 1/√2
                1 − 1/√2
                    1     ] ) = [ 0
                                  0
                                  1 ]   (699)

(infeasible, with respect to original problem (679)) whereas solving semidefinite program (691) produces

  round(G⋆) = [  1   1  −1   1
                 1   1  −1   1
                −1  −1   1  −1
                 1   1  −1   1 ]   (700)

with sorted eigenvalues

  λ(G⋆) = [ 3.99999965057264
            0.00000035942736
           −0.00000000000000
           −0.00000001434518 ]   (701)

Truncating all but the largest eigenvalue, from (687) we obtain (confer y⋆)

  x⋆ = round( [ 0.99999999625299
                0.99999999625299
                0.00000001000000 ] ) = [ 1
                                         1
                                         0 ]   (702)

the desired minimal cardinality Boolean result.

4.2.3.1.3 Exercise. Minimal cardinality Boolean art.
Assess general performance of standard-practice approximation (696), with or without rounding, as compared with the proposed semidefinite program (691). ▨

4.2.3.1.4 Exercise. Conic independence.
Matrix A from (681) is full-rank having three-dimensional nullspace. Find its four conically independent columns. (§2.10) To what part of proper cone K = {Ax | x ⪰ 0} does vector b belong? ▨

4.2.3.1.5 Exercise. Linear independence.
Show why fat matrix A, from compressed sensing problem (517) or (522), may be regarded full-rank without loss of generality. In other words: Is a minimal cardinality solution invariant to linear dependence of rows? ▨
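As a starting point for Exercise 4.2.3.1.3, approximation (696) on Singer's data (698) can be posed as follows; a sketch assuming cvx (for this particular data Λ = I because every column of A is unit-norm):

    A  = [1 0 1/sqrt(2); 0 1 1/sqrt(2)];   b = [1; 1];
    Lam = diag(sqrt(sum(A.^2,1)));         % column norms as in (695)
    Li  = inv(Lam);
    cvx_begin quiet
        variable yt(3)
        minimize( norm(yt,1) )
        subject to
            A*Li*yt == b;
            0 <= Li*yt <= 1;
    cvx_end
    ystar = round(Li*yt)                   % (697): returns [0;0;1], infeasible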

4.3 Rank reduction

  ... it is not clear generally how to predict rank X⋆ or rank S⋆ before solving the SDP problem.
                                      −Farid Alizadeh (1995) [11, p.22]

Rank reduction is a means to adjust rank of an optimal solution, returned by a solver, until it satisfies Barvinok's upper bound. The premise of rank reduction in semidefinite programming is: an optimal solution found does not satisfy Barvinok's upper bound (272) on rank. The particular numerical algorithm solving a semidefinite program may have instead returned a high-rank optimal solution (§4.1.1; e.g., (659)) when a lower-rank optimal solution was expected.

4.3.1 Posit a perturbation of X⋆

Recall from §4.1.1 [24, §2.2] that, given any optimal solution X⋆ to

  minimize_{X ∈ S^n}   ⟨C, X⟩
  subject to   X ∈ A ∩ S+^n   (648P)

whose rank does not satisfy upper bound (272), there is an extreme point of A ∩ S+^n (651) satisfying upper bound (272) on rank. It is therefore sufficient to locate an extreme point of the intersection whose primal objective value (648P) is optimal:^4.19 [114, §31.5.3] [240, §2.4] [241] [7, §3] [293]

Consider again the affine subset

  A = {X ∈ S^n | A svec X = b}   (649)

where, for A_i ∈ S^n,

  A ≜ [ svec(A_1)^T
            ⋮
        svec(A_m)^T ]  ∈ R^{m × n(n+1)/2}

4.19 There is no known construction for Barvinok's tighter result (277).
                                      −Monique Laurent (2004)

Given any optimal solution X⋆ to (648P) whose rank does not satisfy upper bound (272), we posit existence of a set of perturbations

  {t_j B_j | t_j ∈ R, B_j ∈ S^n, j = 1 ... n}   (703)

such that, for some 0 ≤ i ≤ n and scalars {t_j, j = 1 ... i},

  X⋆ + Σ_{j=1}^{i} t_j B_j   (704)

becomes an extreme point of A ∩ S+^n and remains an optimal solution of (648P). Membership of (704) to affine subset A is secured for the i-th perturbation by demanding

  ⟨B_i, A_j⟩ = 0 ,   j = 1 ... m   (705)

while membership to the positive semidefinite cone S+^n is insured by small perturbation (714). Feasibility of (704) is insured in this manner; optimality is proved in §4.3.3.

The following simple algorithm has very low computational intensity and locates an optimal extreme point, assuming a nontrivial solution:

4.3.1.0.1 Procedure. Rank reduction. [Wıκımization]
  initialize: B_i = 0 ∀ i
  for iteration i = 1 ... n
  {
    1. compute a nonzero perturbation matrix B_i of X⋆ + Σ_{j=1}^{i−1} t_j B_j
    2. maximize t_i
       subject to   X⋆ + Σ_{j=1}^{i} t_j B_j ∈ S+^n
  }   ¶

A rank-reduced optimal solution is then

  X⋆ ← X⋆ + Σ_{j=1}^{i} t_j B_j   (706)

4.3.2 Perturbation form

Perturbations of X⋆ are independent of constants C ∈ S^n and b ∈ R^m in primal and dual problems (648). Numerical accuracy of any rank-reduced result, found by perturbation of an initial optimal solution X⋆, is therefore quite dependent upon initial accuracy of X⋆.

4.3.2.0.1 Definition. Matrix step function. (confer §A.6.5)
Define the signum-like quasiconcave real function ψ : S^n → R

  ψ(Z) ≜ {  1 ,   Z ⪰ 0
           −1 ,   otherwise   (707)

The value −1 is taken for indefinite or nonzero negative semidefinite argument. △

Deza & Laurent [114, §31.5.3] prove: every perturbation matrix B_i, i = 1 ... n, is of the form

  B_i = −ψ(Z_i) R_i Z_i R_i^T  ∈ S^n   (708)

where

  X⋆ ≜ R_1 R_1^T ,    X⋆ + Σ_{j=1}^{i−1} t_j B_j ≜ R_i R_i^T  ∈ S^n   (709)

where the t_j are scalars and R_i ∈ R^{n×ρ} is full-rank and skinny where

  ρ ≜ rank( X⋆ + Σ_{j=1}^{i−1} t_j B_j )   (710)

and where matrix Z_i ∈ S^ρ is found at each iteration i by solving a very simple feasibility problem:^4.20

  find   Z_i ∈ S^ρ
  subject to   ⟨Z_i, R_i^T A_j R_i⟩ = 0 ,   j = 1 ... m   (711)

Were there a sparsity pattern common to each member of the set {R_i^T A_j R_i ∈ S^ρ, j = 1 ... m}, then a good choice for Z_i has 1 in each entry corresponding to a 0 in the pattern; id est, a sparsity pattern complement. At iteration i

  X⋆ + Σ_{j=1}^{i−1} t_j B_j + t_i B_i = R_i (I − t_i ψ(Z_i) Z_i) R_i^T   (712)

By fact (1483), therefore

  X⋆ + Σ_{j=1}^{i−1} t_j B_j + t_i B_i ⪰ 0  ⇔  1 − t_i ψ(Z_i) λ(Z_i) ⪰ 0   (713)

where λ(Z_i) ∈ R^ρ denotes the eigenvalues of Z_i. Maximization of each t_i in step 2 of the Procedure reduces rank of (712) and locates a new point on the boundary ∂(A ∩ S+^n).^4.21 Necessity and sufficiency are due to the facts: R_i can be completed to a nonsingular matrix (§A.3.1.0.5), and I − t_i ψ(Z_i) Z_i can be padded with zeros while maintaining equivalence in (712). Maximization of t_i thereby has closed form;

4.20 A simple method of solution is closed-form projection of a random nonzero point on that proper subspace of isometrically isomorphic R^{ρ(ρ+1)/2} specified by the constraints. (§E.5) Such a solution is nontrivial assuming the specified intersection of hyperplanes is not the origin; guaranteed by ρ(ρ+1)/2 > m. Indeed, this geometric intuition about forming the perturbation is what bounds any solution's rank from below: m is fixed by the number of equality constraints in (648P) while rank ρ decreases with each iteration i. Otherwise, we might iterate indefinitely.
4.21 This holds because rank of a positive semidefinite matrix in S^n is diminished below n by the number of its 0 eigenvalues (1493), and because a positive semidefinite matrix having one or more 0 eigenvalues corresponds to a point on the PSD cone boundary (193).

  (t_i⋆)^{-1} = max {ψ(Z_i) λ(Z_i)_j ,  j = 1 ... ρ}   (714)

When Z_i is indefinite, direction of perturbation (determined by ψ(Z_i)) is arbitrary. We may take an early exit from the Procedure were Z_i to become 0 or were

  rank [ svec(R_i^T A_1 R_i)   svec(R_i^T A_2 R_i)   ···   svec(R_i^T A_m R_i) ] = ρ(ρ+1)/2   (715)

which characterizes rank ρ of any [sic] extreme point in A ∩ S+^n. [240, §2.4] [241]

Proof. Assuming the form of every perturbation matrix is indeed (708), then by (711)

  svec Z_i ⊥ [ svec(R_i^T A_1 R_i)   svec(R_i^T A_2 R_i)   ···   svec(R_i^T A_m R_i) ]   (716)

By orthogonal complement we have

  rank [ svec(R_i^T A_1 R_i)  ···  svec(R_i^T A_m R_i) ]^⊥
    + rank [ svec(R_i^T A_1 R_i)  ···  svec(R_i^T A_m R_i) ] = ρ(ρ+1)/2   (717)

When Z_i can only be 0, then the perturbation is null because an extreme point has been found; thus

  [ svec(R_i^T A_1 R_i)  ···  svec(R_i^T A_m R_i) ]^⊥ = 0   (718)

from which the stated result (715) directly follows. ∎
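Footnote 4.20 suggests solving (711) by projecting a random point onto the subspace cut by the constraints. A sketch, with R and a hypothetical 3-D array Amats of the A_j assumed given (plain vec replaces svec here, which is harmless because every R'A_jR is symmetric):

    rho = size(R,2);  m = size(Amats,3);
    M = zeros(m, rho^2);
    for j = 1:m
        M(j,:) = reshape(R'*Amats(:,:,j)*R, 1, []);  % <Z,R'A_jR> = M(j,:)*Z(:)
    end
    P = null(M);                           % orthonormal basis of nullspace
    z = P*(P'*randn(rho^2,1));             % project a random nonzero point
    Z = reshape(z, rho, rho);
    Z = (Z + Z')/2;                        % symmetrize; constraints preserved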

4.3.3 Optimality of perturbed X⋆

We show that the optimal objective value is unaltered by perturbation (708); id est,

  ⟨C, X⋆ + Σ_{j=1}^{i} t_j B_j⟩ = ⟨C, X⋆⟩   (719)

Proof. From Corollary 4.2.3.0.1 we have the necessary and sufficient relationship between optimal primal and dual solutions under assumption of nonempty primal feasible cone interior A ∩ int S+^n :

  S⋆ X⋆ = S⋆ R_1 R_1^T = X⋆ S⋆ = R_1 R_1^T S⋆ = 0   (720)

This means R(R_1) ⊆ N(S⋆) and R(S⋆) ⊆ N(R_1^T). From (709) and (712) we get the sequence:

  X⋆ = R_1 R_1^T
  X⋆ + t_1 B_1 = R_2 R_2^T = R_1 (I − t_1 ψ(Z_1) Z_1) R_1^T
  X⋆ + t_1 B_1 + t_2 B_2 = R_3 R_3^T = R_2 (I − t_2 ψ(Z_2) Z_2) R_2^T
                         = R_1 (I − t_1 ψ(Z_1) Z_1)(I − t_2 ψ(Z_2) Z_2) R_1^T
    ⋮
  X⋆ + Σ_{j=1}^{i} t_j B_j = R_1 ( Π_{j=1}^{i} (I − t_j ψ(Z_j) Z_j) ) R_1^T   (721)

Substituting C = svec^{-1}(A^T y⋆) + S⋆ from (648),

  ⟨C, X⋆ + Σ_{j=1}^{i} t_j B_j⟩
    = ⟨ Σ_{k=1}^{m} y_k⋆ A_k + S⋆ ,  R_1 ( Π_{j=1}^{i} (I − t_j ψ(Z_j) Z_j) ) R_1^T ⟩
    = ⟨ Σ_{k=1}^{m} y_k⋆ A_k ,  X⋆ + Σ_{j=1}^{i} t_j B_j ⟩
    = ⟨ Σ_{k=1}^{m} y_k⋆ A_k + S⋆ ,  X⋆ ⟩  =  ⟨C, X⋆⟩   (722)

because ⟨B_i, A_j⟩ = 0 ∀ i, j by design (705). ∎

4.3.3.0.1 Example. Aδ(X) = b.
This academic example demonstrates that a solution found by rank reduction can certainly have rank less than Barvinok's upper bound (272): Assume a given vector b belongs to the conic hull of columns of a given matrix A;

  A = [ −1   1   8    1     1
        −3   2   8   1/2   1/3
        −9   4   8   1/4   1/9 ]  ∈ R^{m×n} ,    b = [  1
                                                       1/2
                                                       1/4 ]  ∈ R^m   (723)

Consider the convex optimization problem

  minimize_{X ∈ S^5}   tr X
  subject to   X ⪰ 0
               Aδ(X) = b   (724)

that minimizes the 1-norm of the main diagonal; id est, problem (724) is the same as

  minimize_{X ∈ S^5}   ‖δ(X)‖_1
  subject to   X ⪰ 0
               Aδ(X) = b   (725)

that finds a solution to Aδ(X) = b. Rank-3 solution X⋆ = δ(x_M) is optimal, where (confer (683))

  x_M = [ 2/128   0   5/128   0   90/128 ]^T   (726)

Yet upper bound (272) predicts existence of at most a

  rank-⌊(√(8m+1) − 1)/2⌋ = 2   (727)

feasible solution from m = 3 equality constraints. To find a lower rank ρ optimal solution to (724) (barring combinatorics), we invoke Procedure 4.3.1.0.1:

Initialize: C = I, ρ = 3, n = 5, m = 3, X⋆ = δ(x_M), A_j ≜ δ(A(j,:)), j = 1, 2, 3.
{
Iteration i=1:

  Step 1:   R_1 = [ √(2/128)       0           0
                        0          0           0
                        0       √(5/128)       0
                        0          0           0
                        0          0       √(90/128) ]

  find   Z_1 ∈ S^3
  subject to   ⟨Z_1, R_1^T A_j R_1⟩ = 0 ,   j = 1, 2, 3   (728)

A nonzero randomly selected matrix Z_1, having 0 main diagonal, is a solution yielding nonzero perturbation matrix B_1. Choose arbitrarily

  Z_1 = 11^T − I  ∈ S^3   (729)

then (rounding)

  B_1 = [    0      0   0.0247   0   0.1048
             0      0      0     0      0
          0.0247    0      0     0   0.1657
             0      0      0     0      0
          0.1048    0   0.1657   0      0    ]   (730)

  Step 2:   t_1⋆ = 1 because λ(Z_1) = [−1 −1 2]^T. So,

  X⋆ ← δ(x_M) + B_1 = [ 2/128    0   0.0247   0   0.1048
                           0     0      0     0      0
                        0.0247   0   5/128    0   0.1657
                           0     0      0     0      0
                        0.1048   0   0.1657   0   90/128 ]   (731)

has rank ρ ← 1 and produces the same optimal objective value.
}
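The arithmetic of this example is easily verified; a short Matlab transcription (numbers from (723)-(731)):

    A  = [-1 1 8 1 1; -3 2 8 1/2 1/3; -9 4 8 1/4 1/9];
    b  = [1; 1/2; 1/4];
    xM = [2; 0; 5; 0; 90]/128;                 % rank-3 optimal X* = delta(xM)
    R1 = diag(sqrt(xM));  R1 = R1(:, xM > 0);  % X* = R1*R1', R1 skinny
    Z1 = ones(3) - eye(3);                     % arbitrary choice (729)
    B1 = R1*Z1*R1';                            % (708) with psi(Z1) = -1
    t1 = 1/max(-eig(Z1));                      % (714): t1* = 1
    Xnew = diag(xM) + t1*B1;                   % rank-reduced solution (731)
    [rank(Xnew), norm(A*diag(Xnew) - b), trace(Xnew) - sum(xM)]
    % returns 1, 0, 0 : rank one, still feasible, objective unaltered (719)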

4.3.3.0.2 Exercise. Rank reduction of maximal complementarity.
Apply rank reduction Procedure 4.3.1.0.1 to the maximal complementarity example (§4.1.2). Demonstrate a rank-1 solution; which can certainly be found (by Barvinok's Proposition 2.9.3.0.1) because there is only one equality constraint. ▨

4.3.4 thoughts regarding rank reduction

Because rank reduction Procedure 4.3.1.0.1 is guaranteed only to produce another optimal solution conforming to Barvinok's upper bound (272), the Procedure will not necessarily produce solutions of arbitrarily low rank; but if they exist, the Procedure can certainly find them. Arbitrariness of search direction when matrix Z_i becomes indefinite (mentioned on page 301), and the enormity of choices for Z_i (711), are liabilities for this algorithm.

4.3.4.1 inequality constraints

The question naturally arises: what to do when a semidefinite program (not in prototypical form (648))^4.22 has linear inequality constraints of the form

  α_i^T svec X ≤ β_i ,   i = 1 ... k   (732)

where {β_i} are given scalars and {α_i} are given vectors. One expedient way to handle this circumstance is to convert the inequality constraints to equality constraints by introducing a slack variable γ; id est,

  α_i^T svec X + γ_i = β_i ,   i = 1 ... k ,   γ ⪰ 0   (733)

thereby converting the problem to prototypical form.

Alternatively, we say the i-th inequality constraint is active when it is met with equality; id est, when for particular i in (732), α_i^T svec X⋆ = β_i. An optimal high-rank solution X⋆ is, of course, feasible (satisfying all the constraints). But for the purpose of rank reduction, inactive inequality constraints are ignored while active inequality constraints are interpreted as

4.22 Contemporary numerical packages for solving semidefinite programs can solve a range of problems wider than prototype (648). Generally, they do so by transforming a given problem into prototypical form by introducing new constraints and variables. [11] [388] We are momentarily considering a departure from the primal prototype that augments the constraint set with linear inequalities.

equality constraints. In other words, we take the union of active inequality constraints (as equalities) with equality constraints A svec X = b to form a composite affine subset Â substituting for (651). Then we proceed with rank reduction of X⋆ as though the semidefinite program were in prototypical form (648P).

4.4 Rank-constrained semidefinite program

We generalize the trace heuristic (§7.2.2.1), for finding low-rank optimal solutions to SDPs of a more general form:

4.4.1 rank constraint by convex iteration

Consider a semidefinite feasibility problem of the form

  find_{G ∈ S^N}   G
  subject to   G ∈ C
               G ⪰ 0
               rank G ≤ n   (734)

where C is a convex set presumed to contain positive semidefinite matrices of rank n or less; id est, C intersects the positive semidefinite cone boundary. We propose that this rank-constrained feasibility problem can be equivalently expressed as iteration of the convex problem sequence (735) and (1738a):

  minimize_{G ∈ S^N}   ⟨G, W⟩
  subject to   G ∈ C
               G ⪰ 0   (735)

where direction vector^4.23 W is an optimal solution to semidefinite program, for 0 ≤ n ≤ N−1,

  Σ_{i=n+1}^{N} λ(G⋆)_i  =  minimize_{W ∈ S^N}   ⟨G⋆, W⟩
                            subject to   0 ⪯ W ⪯ I
                                         tr W = N − n   (1738a)

4.23 Search direction W is a hyperplane-normal pointing opposite to direction of movement describing minimization of a real linear function ⟨G, W⟩ (p.75).

whose feasible set is a Fantope (§2.3.2.0.1), and where G⋆ is an optimal solution to problem (735) given some iterate W. The idea is to iterate solution of (735) and (1738a) until convergence as defined in §4.4.1.2:^4.24 (confer (757))

  Σ_{i=n+1}^{N} λ(G⋆)_i = ⟨G⋆, W⋆⟩ = λ(G⋆)^T λ(W⋆) ≜ 0   (736)

defines global convergence of the iteration; a vanishing objective that is a certificate of global optimality but cannot be guaranteed. Optimal direction vector W⋆ is defined as any positive semidefinite matrix yielding optimal solution G⋆ of rank n or less to then convex equivalent (735) of feasibility problem (734):

  find_{G ∈ S^N}   G                     minimize_{G ∈ S^N}   ⟨G, W⋆⟩
  subject to   G ∈ C            ≡        subject to   G ∈ C
               G ⪰ 0                                  G ⪰ 0        (735)
               rank G ≤ n   (734)

id est, any direction vector for which the last N−n nonincreasingly ordered eigenvalues λ of G⋆ are zero. [240, §2.4] [241] We emphasize that convex problem (735) is not a relaxation of rank-constrained feasibility problem (734); at global convergence, convex iteration (735) (1738a) makes it instead an equivalent problem.

4.4.1.1 direction matrix interpretation

(confer §4.5.1.2) The feasible set of direction matrices in (1738a) is the convex hull of outer product of all rank-(N−n) orthonormal matrices; videlicet,

  conv { U U^T | U ∈ R^{N×N−n} , U^T U = I } = { A ∈ S^N | I ⪰ A ⪰ 0 , ⟨I, A⟩ = N−n }   (90)

This set (92), argument to conv{ }, comprises the extreme points of this Fantope (90). An optimal solution W to (1738a), that is an extreme point, is known in closed form (p.672):

4.24 Proposed iteration is neither dual projection (Figure 170) nor alternating projection (Figure 174). Sum of eigenvalues follows from results of Ky Fan (page 672). Inner product of eigenvalues follows from (1623) and properties of commutative matrix products (page 619).

Given ordered diagonalization G⋆ = QΛQ^T ∈ S^N (§A.5.1), then direction matrix W = U⋆U⋆^T is optimal and extreme where U⋆ = Q(:, n+1:N) ∈ R^{N×N−n}. Eigenvalue vector λ(W) has 1 in each entry corresponding to the N−n smallest entries of δ(Λ) and has 0 elsewhere. By (221) (223), polar direction −W can be regarded as pointing toward the set of all rank-n (or less) positive semidefinite matrices whose nullspace contains that of G⋆. For that particular closed-form solution W, (confer (759))

  Σ_{i=n+1}^{N} λ(G⋆)_i = ⟨G⋆, W⟩ = λ(G⋆)^T λ(W) ≥ 0   (737)

This is the connection to cardinality minimization of vectors;^4.25 id est, eigenvalue λ cardinality (rank) is analogous to vector x cardinality via (759).

So that this method, for constraining rank, will not be misconstrued under closed-form solution W to (1738a): Define (confer (221))

  S_n ≜ {(I−W) G (I−W) | G ∈ S^N} = {X ∈ S^N | N(X) ⊇ N(G⋆)}   (738)

as the symmetric subspace of rank ≤ n matrices whose nullspace contains N(G⋆). Then projection of G⋆ on S_n is (I−W)G⋆(I−W). (§E.7) Direction of projection is −WG⋆W, and tr(WG⋆W) is a measure of proximity to S_n because its orthogonal complement is S_n^⊥ = {WGW | G ∈ S^N}. (Figure 83) The point being: convex iteration incorporating constrained tr(WGW) = ⟨G, W⟩ minimization is not a projection method; certainly not on these two subspaces. Closed-form solution W to problem (1738a), though efficient, comes with a caveat: there exist cases where this projection-matrix solution W does not provide the shortest route to an optimal rank-n solution G⋆; id est, direction W is not unique. So we sometimes choose to solve (1738a) instead of employing a known closed-form solution.

4.25 not trace minimization of a nonnegative diagonal matrix δ(x) as in [136, §1] [302, §2]. To make rank-constrained problem (734) resemble cardinality problem (529), we could make C an affine subset:

  find X ∈ S^N
  subject to   A svec X = b
               X ⪰ 0
               rank X ≤ n
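A sketch of that closed form (the names Gstar and n are our assumptions):

    [Q, Lam] = eig(Gstar);                     % G* = Q*Lam*Q'
    [lam, idx] = sort(diag(Lam), 'descend');   % nonincreasing ordering
    U = Q(:, idx(n+1:end));                    % N-n trailing eigenvectors
    W = U*U';                                  % extreme point of Fantope (90)
    sum(lam(n+1:end))                          % equals <G*,W> as in (737)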

Figure 83: (confer Figure 171) Projection of G⋆ on subspace S_n of rank ≤ n matrices whose nullspace contains N(G⋆). This direction W is closed-form solution to (1738a).

4.4.1.2 convergence

We study convergence to ascertain conditions under which a direction matrix will reveal a feasible solution G, of rank n or less, to semidefinite program (735). Denote by W⋆ a particular optimal direction matrix from semidefinite program (1738a) such that (736) holds (feasible rank G ≤ n found). Then we define global convergence of the iteration (735) (1738a) to correspond with this vanishing vector inner-product (736) of optimal solutions.

Because this iterative technique for constraining rank is not a projection method, it can find a rank-n solution G⋆ ((736) will be satisfied) only if at least one exists in the feasible set of program (735). When direction matrix W = I, as in the trace heuristic for example, then −W points directly at the origin (the rank-0 PSD matrix; Figure 84). Vector inner-product of an optimization variable with direction matrix W is therefore a generalization of the trace heuristic (§7.2.2.1) for rank minimization; −W is instead trained toward the boundary of the positive semidefinite cone.

Figure 84: (confer Figure 99) Trace heuristic can be interpreted as minimization of a hyperplane ∂H = {G | ⟨G, I⟩ = κ}, with normal I, over positive semidefinite cone S²_+ drawn here in isometrically isomorphic R³. Polar of direction vector W = I points toward origin.

When a rank-n feasible solution to (735) exists, it remains an open problem to state conditions under which ⟨G⋆, W⋆⟩ = τ = 0 (736) is achieved by iterative solution of semidefinite programs (735) and (1738a). There can be no proof of global convergence because of the implicit high-dimensional multimodal manifold in variables (G, W): only local convergence can be established, because objective ⟨G, W⟩, when instead regarded simultaneously in two variables (G, W), is generally multimodal. Local convergence, in this context, means convergence to a fixed point of possibly infeasible rank. A nonexistent rank-n feasible solution would mean certain failure to converge globally by definition (736) (convergence to τ = 0) but, as proved next, convex iteration always converges locally if not globally.

4.4.1.2.1 Proof. Local convergence to τ = 0 and definition of a stall.
Suppose ⟨G⋆, W⟩ = τ is satisfied for some nonnegative constant τ after any particular iteration (735) (1738a) of the two minimization problems. Once a particular value of τ is achieved, it can never be exceeded by subsequent iterations because existence of feasible G and W having that vector inner-product τ has been established simultaneously in each problem. Because the infimum of vector inner-product of two positive semidefinite matrix variables is zero, the nonincreasing sequence of iterations is bounded below, hence convergent: any bounded monotonic sequence in R is convergent. [259, §1.2] [42, §1.1] Local convergence to some nonnegative objective value τ is thereby established. In practice, rank and objective sequence ⟨G⋆, W⟩ with respect to iteration tend to be noisily monotonic. ∎

Suppose ⟨G⋆, W⋆⟩ = τ = 0 (736) is achieved. Then rank G⋆ ≤ n and the pair (G⋆, W⋆) becomes a globally optimal fixed point of iteration. Convergence to τ > 0, in contrast, is a stall: convergence to a fixed point of possibly infeasible rank. When stall occurs, direction vector W can be manipulated to steer out of local minima; e.g., by reversal of search direction as in Example 4.6.0.0.15, or by reinitialization to a random rank-(N−n) matrix in the same positive semidefinite cone face (§2.9.2.3) demanded by the current iterate: given ordered diagonalization G⋆ = QΛQ^T ∈ S^N, then W = U⋆ΦU⋆^T where U⋆ = Q(:, n+1:N) ∈ R^{N×N−n} and where eigenvalue vector λ(W)_{1:N−n} = λ(Φ) has nonnegative uniformly distributed random entries in (0, 1] by selection of Φ ∈ S+^{N−n}, while λ(W)_{N−n+1:N} = 0. Zero eigenvalues act as memory while randomness largely reduces likelihood of stall.
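A sketch of that face reinitialization (Gstar, n, N assumed given; a diagonal Φ is one admissible selection from S+^{N−n}):

    [Q, Lam] = eig(Gstar);
    [~, idx] = sort(diag(Lam), 'descend');
    U = Q(:, idx(n+1:N));                  % U* spans the nullspace of G*
    Phi = diag(rand(N-n, 1));              % random eigenvalues in (0,1)
    W = U*Phi*U';                          % lambda(W) as stated in the text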

4.4.1.2.2 Exercise. Completely positive semidefinite matrix. [40]
Given rank-2 positive semidefinite matrix

  G = [ 0.50   0.55   0.20
        0.55   0.61   0.22
        0.20   0.22   0.08 ]

find a positive factorization G = X^T X (931) by solving

  find_{X ∈ R^{2×3}}   X
  subject to   Z = [ I     X ]  ⪰ 0
                   [ X^T   G ]
               rank Z ≤ 2
               X ≥ 0   (739)

via convex iteration. ▨

4.4.1.2.3 Exercise. Nonnegative matrix factorization.
Given rank-2 nonnegative matrix

  X = [ 17  28  42
        16  47  51
        17  82  72 ]

find a nonnegative factorization

  X = WH   (740)

by solving

  find_{A ∈ S^3, B ∈ S^3, W ∈ R^{3×2}, H ∈ R^{2×3}}   W, H
  subject to   Z = [ I     W^T   H
                     W     A     X
                     H^T   X^T   B ]  ⪰ 0
               W ≥ 0
               H ≥ 0
               rank Z ≤ 2   (741)

which follows from the fact, at optimality,

  Z⋆ = [ I
         W   ] [ I   W^T   H ]   (742)
         H^T

Use the known closed-form solution for a direction vector Y to regulate rank by convex iteration; set Z⋆ = QΛQ^T ∈ S^8 to an ordered diagonalization and U⋆ = Q(:, 3:8) ∈ R^{8×6}, then Y = U⋆U⋆^T (§4.4.1.1).
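One possible realization of this exercise's convex iteration, assuming cvx and alternating the program summarized next with the closed-form direction:

    Xd = [17 28 42; 16 47 51; 17 82 72];   % the given data
    Y  = zeros(8);                          % initial direction matrix
    for iter = 1:20
        cvx_begin sdp quiet
            variable A(3,3) symmetric
            variable B(3,3) symmetric
            variable W(3,2)
            variable H(2,3)
            minimize( trace(Y'*[eye(2) W' H; W A Xd; H' Xd' B]) )
            subject to
                [eye(2) W' H; W A Xd; H' Xd' B] >= 0;
                W >= 0;  H >= 0;
        cvx_end
        Z = [eye(2) W' H; W A Xd; H' Xd' B];        % numeric optimal Z*
        [Q, L] = eig(Z);  [lam, idx] = sort(diag(L), 'descend');
        Y = Q(:, idx(3:8))*Q(:, idx(3:8))';         % Y = U*U'  (section 4.4.1.1)
        if sum(lam(3:8)) < 1e-8, break, end         % rank Z* <= 2 achieved
    end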

In summary, initialize Y then iterate numerical solution of (convex) semidefinite program

  minimize_{A ∈ S^3, B ∈ S^3, W ∈ R^{3×2}, H ∈ R^{2×3}}   ⟨Z, Y⟩
  subject to   Z = [ I     W^T   H
                     W     A     X
                     H^T   X^T   B ]  ⪰ 0
               W ≥ 0
               H ≥ 0   (743)

with Y = U⋆U⋆^T until convergence (which is global and occurs in very few iterations for this instance). ▨

Now, an application to optimal regulation of affine dimension:

4.4.1.2.4 Example. Sensor-Network Localization or Wireless Location.
Heuristic solution to a sensor-network localization problem, proposed by Carter, Jin, Saunders, & Ye in [72],^4.26 is limited to two Euclidean dimensions and applies semidefinite programming (SDP) to little subproblems. There, a large network is partitioned into smaller subnetworks (as small as one sensor − a mobile point, whereabouts unknown) and then semidefinite programming and heuristics called spaseloc are applied to localize each and every partition by two-dimensional distance geometry. Their partitioning procedure is one-pass, yet termed iterative; a term applicable only insofar as adjoining partitions can share localized sensors and anchors (absolute sensor positions known a priori). But there is no iteration on the entire network, hence the term "iterative" is perhaps inappropriate. As partitions are selected based on "rule sets" (heuristics, not geographics), they also term the partitioning adaptive. But no adaptation of a partition actually occurs once it has been determined.

One can reasonably argue that semidefinite programming methods are unnecessary for localization of small partitions of large sensor networks. [280] [86] In the past, these nonlinear localization problems were solved algebraically and computed by least squares solution to hyperbolic equations;

4.26 The paper constitutes Jin's dissertation for University of Toronto [217] although her name appears as second author. Ye's authorship is honorary.

called multilateration.^4.27 [229] [267] Indeed, practical contemporary numerical methods for global positioning (GPS) by satellite do not rely on convex optimization. [292]

Modern distance geometry is inextricably melded with semidefinite programming. The beauty of semidefinite programming, as relates to localization, lies in convex expression of classical multilateration: So & Ye showed [322] that the problem of finding unique solution, to a noiseless nonlinear system describing the common point of intersection of hyperspheres in real Euclidean vector space, can be expressed as a semidefinite program

Figure 85: Sensor-network localization in R², illustrating connectivity and circular radio-range per sensor. Smaller dark grey regions each hold an anchor at their center; sensor/anchor distance is measurable with negligible uncertainty for a sensor within those grey regions. (Graphic by Geoff Merrett.)

4.27 Multilateration − literally, having many sides; shape of a geometric figure formed by nearly intersecting lines of position. In navigation systems, therefore: obtaining a fix from multiple lines of position. Multilateration can be regarded as noisy trilateration.

via distance geometry.

But the need for SDP methods in Carter & Jin et alii is enigmatic for two more reasons: 1) guessing solution to a partition whose intersensor measurement data or connectivity is inadequate for localization by distance geometry, 2) reliance on complicated and extensive heuristics for partitioning a large network that could instead be efficiently solved whole by one semidefinite program. [225, §3] While partitions range in size between 2 and 10 sensors, 5 sensors optimally, heuristics provided are only for two spatial dimensions (no higher-dimensional heuristics are proposed). For these small numbers it remains unclarified as to precisely what advantage is gained over traditional least squares: it is difficult to determine what part of their noise performance is attributable to SDP and what part is attributable to their heuristic geometry.

Partitioning of large sensor networks is a compromise to rapid growth of SDP computational intensity with problem size. But when impact of noise on distance measurement is of most concern, one is averse to a partitioning scheme because noise-effects vary inversely with problem size. [53, §2.2] (§5.13.2) Since an individual partition's solution is not iterated in Carter & Jin and is interdependent with adjoining partitions, we expect errors to propagate from one partition to the next; the ultimate partition solved, expected to suffer most.

Heuristics often fail on real-world data because of unanticipated circumstances. When heuristics fail, generally they are repaired by adding more heuristics. Tenuous is any presumption, for example, that distance measurement errors have distribution characterized by circular contours of equal probability about an unknown sensor-location. (Figure 85) That presumption effectively appears within Carter & Jin's optimization problem statement as affine equality constraints relating unknowns to distance measurements that are corrupted by noise. Yet in most all urban environments, this measurement noise is more aptly characterized by ellipsoids of varying orientation and eccentricity as one recedes from a sensor, primarily determined by the terrain.^4.28 (Figure 132) Each unknown sensor must therefore instead be bound to its own particular range of distance. The nonconvex problem we must instead solve is:

4.28 A distinct contour map corresponding to each anchor is required in practice.

  find_{i,j ∈ I}   {x_i , x_j}
  subject to   d̲_ij ≤ ‖x_i − x_j‖² ≤ d̄_ij   (744)

where x_i represents sensor location, and where d̲_ij and d̄_ij respectively represent lower and upper bounds on measured distance-square from i-th to j-th sensor (or from sensor to anchor). Figure 90 illustrates contours of equal sensor-location uncertainty. By establishing these individual upper and lower bounds, orientation and eccentricity can effectively be incorporated into the problem statement.

Generally speaking, there can be no unique solution to the sensor-network localization problem because there is no unique formulation; that is the art of Optimization. Any optimal solution obtained depends on whether or how a network is partitioned, whether distance data is complete, presence of noise, and how the problem is formulated. When a particular formulation is a convex optimization problem, then the set of all optimal solutions forms a convex set containing the actual or true localization. Measurement noise precludes equality constraints representing distance. The optimal solution set is consequently expanded; necessitated by introduction of distance inequalities admitting more and higher-rank solutions. Even were the optimal solution set a single point, it is not necessarily the true localization because there is little hope of exact localization by any algorithm once significant noise is introduced.

Carter & Jin gauge performance of their heuristics to the SDP formulation of author Biswas whom they regard as vanguard to the art. [14, §1] Biswas posed localization as an optimization problem minimizing a distance measure. [47] [45] Intuitively, minimization of any distance measure yields compacted solutions; (confer §6.7) precisely the anomaly motivating Carter & Jin. Their two-dimensional heuristics outperformed Biswas' localizations both in execution-time and proximity to the desired result. Perhaps, instead of heuristics, Biswas' approach to localization can be improved: [44] [46].

The sensor-network localization problem is considered difficult. [14, §2] Rank constraints in optimization are considered more difficult. Control of affine dimension in Carter & Jin is suboptimal because of implicit projection on R². In what follows, we present the localization problem as a semidefinite program (equivalent to (744)) having an explicit rank constraint which controls affine dimension of an optimal solution. We show how to achieve that rank constraint only if the feasible set contains a matrix of desired rank.

Figure 86: 2-lattice in R², hand-drawn. Nodes 3 and 4 are anchors; remaining nodes are sensors. Radio range of sensor 1 indicated by arc.

proposed standardized test

Jin proposes an academic test in two-dimensional real Euclidean space R² that we adopt. In essence, this test is a localization of sensors and anchors arranged in a regular triangular lattice. Lattice connectivity is solely determined by sensor radio range; a connectivity graph is assumed incomplete. In the interest of test standardization, we propose adoption of a few small examples: Figure 86 through Figure 89 and their particular connectivity represented by matrices (745) through (748) respectively.

  [ 0  •  ?  •
    •  0  •  •
    ?  •  0  ◦
    •  •  ◦  0 ]   (745)

Matrix entries dot • indicate measurable distance between nodes while unknown distance is denoted by ? (question mark). Matrix entries hollow dot ◦ represent known distance between anchors (to high accuracy) while zero distance is denoted 0. Because measured distances are quite unreliable in practice, our solution to the localization problem substitutes a distinct

range of possible distance for each measurable distance; equality constraints exist only for anchors. Anchors are chosen so as to increase difficulty for algorithms dependent on existence of sensors in their convex hull. The challenge is to find a solution in two dimensions close to the true sensor positions given incomplete noisy intersensor distance information.

Figure 87: 3-lattice in R², hand-drawn. Nodes 7, 8, and 9 are anchors; remaining nodes are sensors. Radio range of sensor 1 indicated by arc.

  [ 0  •  •  ?  •  ?  ?  •  •
    •  0  •  •  ?  •  ?  •  •
    •  •  0  •  •  •  •  •  •
    ?  •  •  0  ?  •  •  •  •
    •  ?  •  ?  0  •  •  •  •
    ?  •  •  •  •  0  •  •  •
    ?  ?  •  •  •  •  0  ◦  ◦
    •  •  •  •  •  •  ◦  0  ◦
    •  •  •  •  •  •  ◦  ◦  0 ]   (746)

Figure 88: 4-lattice in R², hand-drawn. Nodes 13, 14, 15, and 16 are anchors; remaining nodes are sensors. Radio range of sensor 1 indicated by arc.

  [ 0  ?  ?  •  ?  ?  •  ?  ?  ?  ?  ?  ?  ?  •  •
    ?  0  •  •  •  •  ?  •  ?  ?  ?  ?  ?  •  •  •
    ?  •  0  ?  •  •  ?  ?  •  ?  ?  ?  ?  ?  •  •
    •  •  ?  0  •  ?  •  •  ?  •  ?  ?  •  •  •  •
    ?  •  •  •  0  •  ?  •  •  ?  •  •  •  •  •  •
    ?  •  •  ?  •  0  ?  •  •  ?  •  •  ?  ?  ?  ?
    •  ?  ?  •  ?  ?  0  ?  ?  •  ?  ?  •  •  •  •
    ?  •  ?  •  •  •  ?  0  •  •  •  •  •  •  •  •
    ?  ?  •  ?  •  •  ?  •  0  ?  •  •  •  ?  •  ?
    ?  ?  ?  •  ?  ?  •  •  ?  0  •  ?  •  •  •  ?
    ?  ?  ?  ?  •  •  ?  •  •  •  0  •  •  •  •  ?
    ?  ?  ?  ?  •  •  ?  •  •  ?  •  0  ?  ?  ?  ?
    ?  ?  ?  •  •  ?  •  •  •  •  •  ?  0  ◦  ◦  ◦
    ?  •  ?  •  •  ?  •  •  ?  •  •  ?  ◦  0  ◦  ◦
    •  •  •  •  •  ?  •  •  •  •  •  ?  ◦  ◦  0  ◦
    •  •  •  •  •  ?  •  •  ?  ?  ?  ?  ◦  ◦  ◦  0 ]   (747)

Figure 89: 5-lattice in R², hand-drawn. Nodes 21 through 25 are anchors; remaining nodes are sensors. Radio range of sensor 1 indicated by arc.

  [ 0 • ? ? • • ? ? • ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? •
    • 0 ? ? • • ? ? ? • ? ? ? ? ? ? ? ? ? ? ? ? ? • •
    ? ? 0 • ? • • • ? ? • • ? ? ? ? ? ? ? ? ? ? • • •
    ? ? • 0 ? ? • • ? ? ? • ? ? ? ? ? ? ? ? ? ? ? • ?
    • • ? ? 0 • ? ? • • ? ? • • ? ? • ? ? ? ? ? • ? •
    • • • ? • 0 • ? • • • ? ? • ? ? ? ? ? ? ? ? • • •
    ? ? • • ? • 0 • ? ? • • ? ? • • ? ? ? ? ? ? • • •
    ? ? • • ? ? • 0 ? ? • • ? ? • • ? ? ? ? ? ? ? • ?
    • ? ? ? • • ? ? 0 • ? ? • • ? ? • • ? ? ? ? ? ? ?
    ? • ? ? • • ? ? • 0 • ? • • ? ? ? • ? ? • • • • •
    ? ? • ? ? • • • ? • 0 • ? • • • ? ? • ? ? • • • •
    ? ? • • ? ? • • ? ? • 0 ? ? • • ? ? • • ? • • • ?
    ? ? ? ? • • ? ? • • ? ? 0 • ? ? • • ? ? • • ? ? ?
    ? ? ? ? • • ? ? • • • ? • 0 • ? • • • ? • • • • ?
    ? ? ? ? ? ? • • ? ? • • ? • 0 • ? ? • • • • • • ?
    ? ? ? ? ? ? • • ? ? • • ? ? • 0 ? ? • • ? • ? ? ?
    ? ? ? ? • ? ? ? • ? ? ? • ? ? ? 0 • ? ? • ? ? ? ?
    ? ? ? ? ? ? ? ? • • ? ? • • ? ? • 0 • ? • • • ? ?
    ? ? ? ? ? ? ? ? ? ? • • ? • • • ? • 0 • • • • ? ?
    ? ? ? ? ? ? ? ? ? ? ? • ? ? • • ? ? • 0 • • ? ? ?
    ? ? ? ? ? ? ? ? ? • ? ? • • • ? • • • • 0 ◦ ◦ ◦ ◦
    ? ? ? ? ? ? ? ? ? • • • • • • • ? • • • ◦ 0 ◦ ◦ ◦
    ? ? • ? • • • ? ? • • • ? • • ? ? • • ? ◦ ◦ 0 ◦ ◦
    ? • • • ? • • • ? • • • ? • • ? ? ? ? ? ◦ ◦ ◦ 0 ◦
    • • • ? • • • ? ? • • ? ? ? ? ? ? ? ? ? ◦ ◦ ◦ ◦ 0 ]   (748)

Figure 90: Location uncertainty ellipsoid in R² for each of 15 sensors • within three city blocks in downtown San Francisco. (Data by Polaris Wireless. [324])

problem statement

Ascribe points in a list {x_ℓ ∈ R^n, ℓ = 1 ... N} to the columns of a matrix X;

  X = [ x_1 ··· x_N ] ∈ R^{n×N}   (76)

where N is regarded as cardinality of list X. Positive semidefinite matrix X^T X, formed from inner product of the list, is a Gram matrix; [251, §3.6]

  G = X^T X = [ ‖x_1‖²      x_1^T x_2   x_1^T x_3   ···   x_1^T x_N
                x_2^T x_1   ‖x_2‖²      x_2^T x_3   ···   x_2^T x_N
                x_3^T x_1   x_3^T x_2   ‖x_3‖²      ···   x_3^T x_N
                    ⋮            ⋮           ⋮        ⋱        ⋮
                x_N^T x_1   x_N^T x_2   x_N^T x_3   ···   ‖x_N‖²   ]  ∈ S+^N   (931)

where S+^N is the convex cone of N×N positive semidefinite matrices in the symmetric matrix subspace S^N.

Existence of noise precludes measured distance from the input data. We instead assign measured distance to a range estimate specified by individual upper and lower bounds: d̄_ij is an upper bound on distance-square from i-th

to j-th sensor, while d̲_ij is a lower bound. These bounds become the input data. Each measurement range is presumed different from the others because of measurement uncertainty; e.g., Figure 90.

Our mathematical treatment of anchors and sensors is not dichotomized.^4.29 A sensor position that is known a priori to high accuracy (with absolute certainty), denoted x̌_i, is called an anchor. Then the sensor-network localization problem (744) can be expressed equivalently: Given a number m of anchors and a set I of indices (corresponding to all measurable distances •), for 0 < n < N

  find_{G ∈ S^N, X ∈ R^{n×N}}   X
  subject to   d̲_ij ≤ ⟨G, (e_i − e_j)(e_i − e_j)^T⟩ ≤ d̄_ij    ∀(i,j) ∈ I
               ⟨G, e_i e_i^T⟩ = ‖x̌_i‖² ,   i = N−m+1 ... N
               ⟨G, (e_i e_j^T + e_j e_i^T)/2⟩ = x̌_i^T x̌_j ,   i < j , ∀ i,j ∈ {N−m+1 ... N}
               X(:, N−m+1:N) = [ x̌_{N−m+1} ··· x̌_N ]
               Z = [ I     X ]  ⪰ 0
                   [ X^T   G ]
               rank Z = n   (749)

where e_i is the i-th member of the standard basis for R^N. Distance-square

  d_ij = ‖x_i − x_j‖² = ⟨x_i − x_j , x_i − x_j⟩   (918)

is related to Gram matrix entries G ≜ [g_ij] by vector inner-product

  d_ij = g_ii + g_jj − 2 g_ij
       = ⟨G, (e_i − e_j)(e_i − e_j)^T⟩ = tr(G^T (e_i − e_j)(e_i − e_j)^T)   (933)

hence the scalar inequalities. Each linear equality constraint in G ∈ S^N represents a hyperplane in isometrically isomorphic Euclidean vector space R^{N(N+1)/2}, while each linear inequality pair represents a convex Euclidean body known as a slab: an intersection of two parallel but opposing halfspaces (Figure 11). In terms of position X, this distance slab can be thought of as a thick hypershell instead of a hypersphere boundary.

By Schur complement (§A.4), any solution (G, X) provides comparison with respect to the positive semidefinite cone

  G ⪰ X^T X   (969)

4.29 Wireless location problem thus stated identically; difference being: fewer sensors.

which is a convex relaxation of the desired equality constraint

  [ I     X ]  =  [ I   ] [ I   X ]   (970)
  [ X^T   G ]     [ X^T ]

The rank constraint insures this equality holds (§A.4), thus restricting solution to R^n. Assuming a full-rank solution (list) X,

  rank Z = rank G = rank X   (750)

convex equivalent problem statement

Problem statement (749) is nonconvex because of the rank constraint. We do not eliminate or ignore the rank constraint; rather, we find a convex way to enforce it: for 0 < n < N

  minimize_{G ∈ S^N, X ∈ R^{n×N}}   ⟨Z, W⟩
  subject to   d̲_ij ≤ ⟨G, (e_i − e_j)(e_i − e_j)^T⟩ ≤ d̄_ij    ∀(i,j) ∈ I
               ⟨G, e_i e_i^T⟩ = ‖x̌_i‖² ,   i = N−m+1 ... N
               ⟨G, (e_i e_j^T + e_j e_i^T)/2⟩ = x̌_i^T x̌_j ,   i < j , ∀ i,j ∈ {N−m+1 ... N}
               X(:, N−m+1:N) = [ x̌_{N−m+1} ··· x̌_N ]
               Z = [ I     X ]  ⪰ 0
                   [ X^T   G ]   (751)

Convex function tr Z is a well-known heuristic whose sole purpose is to represent convex envelope of rank Z. (§7.2.2.1) In this convex optimization problem (751), a semidefinite program, we substitute a vector inner-product objective function for trace;

  tr Z = ⟨Z, I⟩  ←  ⟨Z, W⟩   (752)

a generalization of the trace heuristic for minimizing convex envelope of rank, as explained beginning in §4.4.1. Matrix W ∈ S^{N+n} is constant with respect to (751); it is normal to a hyperplane in S^{N+n} minimized over a convex feasible set specified by the constraints in (751). Matrix W is chosen so that −W points in direction of rank-n feasible solutions G. Thus the purpose of vector inner-product objective (752) is to locate a rank-n feasible Gram matrix assumed existent on the boundary of positive semidefinite cone S+^N. For properly chosen W, problem (751) becomes an equivalent to (749). How to choose direction vector W is explained in §4.4.1 and in what follows:
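A sketch of (751) in cvx syntax (an assumption of ours; the names N, n, m, the pair list I with bounds dl, du, anchor block xc, and direction W are likewise ours, not the text's):

    cvx_begin sdp quiet
        variable G(N,N) symmetric
        variable X(n,N)
        minimize( trace(W'*[eye(n) X; X' G]) )       % <Z,W> of (752)
        subject to
            for kk = 1:size(I,1)
                i = I(kk,1);  j = I(kk,2);
                e = zeros(N,1);  e(i) = 1;  e(j) = -1;
                dl(kk) <= e'*G*e <= du(kk);          % distance slab
            end
            G(N-m+1:N, N-m+1:N) == xc'*xc;           % anchor Gram entries
            X(:, N-m+1:N) == xc;                     % anchor positions
            [eye(n) X; X' G] >= 0;                   % Z psd; rank dropped
    cvx_end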

Figure 91: Typical solution for 2-lattice in Figure 86 with noise factor η = 0.1. Two red rightmost nodes are anchors; two remaining nodes are sensors. Radio range of sensor 1 indicated by arc; radius = 1.14. Actual sensor position indicated by target; its localization is indicated by bullet •. Rank-2 solution found in 1 iteration (751) (1738a) subject to reflection error.

Figure 92: Typical solution for 3-lattice in Figure 87 with noise factor η = 0.1. Three red vertical middle nodes are anchors; remaining nodes are sensors. Radio range of sensor 1 indicated by arc; radius = 1.12. Actual sensor position indicated by target; its localization is indicated by bullet •. Rank-2 solution found in 2 iterations (751) (1738a).

Figure 93: Typical solution for 4-lattice in Figure 88 with noise factor η = 0.1. Four red vertical middle-left nodes are anchors; remaining nodes are sensors. Radio range of sensor 1 indicated by arc; radius = 0.75. Actual sensor position indicated by target; its localization is indicated by bullet •. Rank-2 solution found in 7 iterations (751) (1738a).

Figure 94: Typical solution for 5-lattice in Figure 89 with noise factor η = 0.1. Five red vertical middle nodes are anchors; remaining nodes are sensors. Radio range of sensor 1 indicated by arc; radius = 0.56. Actual sensor position indicated by target; its localization is indicated by bullet •. Rank-2 solution found in 3 iterations (751) (1738a).

Figure 95: Typical solution for 10-lattice with noise factor η = 0.1; compares better than Carter & Jin [72, fig.4.2]. Ten red vertical middle nodes are anchors; remaining nodes are sensors. Radio range of sensor 1 indicated by arc; radius = 0.25. Actual sensor position indicated by target; its localization is indicated by bullet •. Rank-2 solution found in 5 iterations (751) (1738a).

Figure 96: Typical localization of 100 randomized noiseless sensors (η = 0) is exact despite incomplete EDM. Ten red vertical middle nodes are anchors; the rest are sensors. Radio range of sensor at origin indicated by arc; radius = 0.25. Rank-2 solution found in 3 iterations (751) (1738a).

Figure 97: Typical solution for 100 randomized sensors with noise factor η = 0.1; worst measured average sensor error ≈ 0.0044 compares better than Carter & Jin's 0.0154 computed in 0.71s [72, p.19]. Ten red vertical middle nodes are anchors, same as before; remaining nodes are sensors. Radio range of sensor at origin indicated by arc; radius = 0.25. Interior anchor placement makes localization difficult. Rank-2 solution found in 3 iterations (751) (1738a): after 1 iteration rank G = 92, after 2 iterations rank G = 4. (Regular lattice in Figure 95 is actually harder to solve, requiring more iterations.) Runtime for SDPT3 [351] under cvx [168] is a few minutes on a 2009 vintage laptop Core 2 Duo CPU (Intel T6400@2GHz, 800MHz FSB).

direction matrix W

Denote by Z⋆ an optimal composite matrix from semidefinite program (751). Then for Z⋆ ∈ S^{N+n} whose eigenvalues λ(Z⋆) ∈ R^{N+n} are arranged in nonincreasing order, (Ky Fan)

  Σ_{i=n+1}^{N+n} λ(Z⋆)_i  =  minimize_{W ∈ S^{N+n}}   ⟨Z⋆, W⟩
                              subject to   0 ⪯ W ⪯ I
                                           tr W = N   (1738a)

which has an optimal solution that is known in closed form (p.672, §4.4.1.1). This eigenvalue sum is zero when Z⋆ has rank n or less.

Foreknowledge of optimal Z⋆, to make possible this search for W, implies iteration; id est, semidefinite program (751) is solved for Z⋆ initializing W = I or W = 0. Once found, Z⋆ becomes constant in semidefinite program (1738a) where a new normal direction W is found as its optimal solution. Then this cycle (751) (1738a) iterates until convergence. When rank Z⋆ = n, solution via this convex iteration solves sensor-network localization problem (744) and its equivalent (749).

numerical solution

In all examples to follow, number of anchors

  m = √N   (753)

equals the square root of cardinality N of list X. Indices set I identifying all measurable distances • is ascertained from connectivity matrix (745), (746), (747), or (748). We solve iteration (751) (1738a) in dimension n = 2 for each respective example illustrated in Figure 86 through Figure 89.

In presence of negligible noise, true position is reliably localized for every standardized example; noteworthy insofar as each example represents an incomplete graph. This implies that the set of all optimal solutions having least rank must be small.

To make the examples interesting and consistent with previous work, we randomize each range of distance-square that bounds ⟨G, (e_i − e_j)(e_i − e_j)^T⟩ in (751); id est, for each and every (i,j) ∈ I

  d̄_ij = d_ij (1 + √3 η χ_l)²
  d̲_ij = d_ij (1 − √3 η χ_{l+1})²   (754)

where η = 0.1 is a constant noise factor, χ_l is the l-th sample of a noise process realization uniformly distributed in the interval (0,1) like rand(1) from Matlab, and d_ij is the actual distance-square from i-th to j-th sensor.
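Generating bounds (754) in Matlab is immediate; a sketch where xt(:,i) denotes true position of node i and I lists the measurable pairs (both names are our assumptions):

    eta = 0.1;                                        % constant noise factor
    for kk = 1:size(I,1)
        dij = norm(xt(:,I(kk,1)) - xt(:,I(kk,2)))^2;  % actual distance-square
        du(kk) = dij*(1 + sqrt(3)*eta*rand)^2;        % upper bound
        dl(kk) = dij*(1 - sqrt(3)*eta*rand)^2;        % lower bound
    end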

Because of distinct function calls to rand(), each range of distance-square [d̲_ij, d̄_ij] is not necessarily centered on actual distance-square d_ij. Unit stochastic variance is provided by factor √3.

Figure 91 through Figure 94 each illustrate one realization of numerical solution to the standardized lattice problems posed by Figure 86 through Figure 89 respectively. Exact localization, by any method, is impossible because of measurement noise. Certainly, by inspection of their published graphical data, our results are better than those of Carter & Jin. (Figures 95, 96, 97) Obviously our solutions do not suffer from those compaction-type errors (clustering of localized sensors) exhibited by Biswas' graphical results for the same noise factor η.

localization example conclusion

Solution to this sensor-network localization problem became apparent by understanding geometry of optimization. Trace of a matrix, to a student of linear algebra, is perhaps a sum of eigenvalues. But to us, trace represents the normal I to some hyperplane in Euclidean vector space. (Figure 84) Our solutions are globally optimal, requiring: 1) no centralized-gradient postprocessing heuristic refinement as in [44], because there is effectively no relaxation of (749) at global optimality; 2) no implicit postprojection on rank-2 positive semidefinite matrices induced by nonzero G − X^T X denoting suboptimality, as occurs in [45] [46] [47] [72] [217] [225]; id est, G⋆ = X⋆^T X⋆ by convex iteration.

Numerical solution to noisy problems containing sensor variables well in excess of 100 becomes difficult via the holistic semidefinite program we proposed. When problem size is within reach of contemporary general-purpose semidefinite program solvers, then the convex iteration we presented inherently overcomes limitations of Carter & Jin with respect to both noise performance and ability to localize in any desired affine dimension.

The legacy of Carter, Jin, Saunders, & Ye [72] is a sobering demonstration of the need for more efficient methods for solution of semidefinite programs, while that of So & Ye [322] forever bonds distance geometry to semidefinite programming.

Elegance of our semidefinite problem statement (751), for constraining affine dimension of sensor-network localization, should provide some impetus to focus more research on computational intensity of general-purpose semidefinite program solvers. An approach different from interior-point methods is required: higher speed and greater accuracy from a simplex-like solver is what is needed. ▨

Figure 98: Regularization curve, parametrized by weight w, for real convex objective f minimization (755) with rank constrained to k by convex iteration.

4.4.3 regularization

We numerically tested the foregoing technique for constraining rank on a wide range of problems including localization of randomized positions (Figure 97), stress (§7.2.2.7.1), ball packing (§5.4.2.3.4), and cardinality problems (§4.6). We have had some success introducing the direction matrix inner-product (752) as a regularization term^4.31

  minimize_{Z ∈ S^N}   f(Z) + w ⟨Z, W⟩
  subject to   Z ∈ C
               Z ⪰ 0   (755)

4.31 called multiobjective- or vector optimization. Proof of convergence for this convex iteration is identical to that in §4.4.1.2.1 because f is a convex real function, hence bounded below, and f(Z⋆) is constant in (756).

  minimize_{W ∈ S^N}   f(Z⋆) + w ⟨Z⋆, W⟩
  subject to   0 ⪯ W ⪯ I
               tr W = N − n   (756)

whose purpose is to constrain rank, affine dimension, or cardinality: The abstraction, that is Figure 98, is a synopsis; a broad generalization of accumulated empirical evidence: There exists a critical (smallest) weight w_c for which a minimal-rank constraint is met. Graphical discontinuity can subsequently exist when there is a range of greater w providing required rank k but not necessarily increasing a minimization objective function f. Positive scalar w is well chosen by cut-and-try.

4.5 Constraining cardinality

The convex iteration technique for constraining rank can be applied to cardinality problems. There is a direct analogy to linear programming that is simpler to present but, perhaps, more difficult to solve. In Optimization, that analogy is known as the cardinality problem.

4.5.1 nonnegative variable

Our goal has been to reliably constrain rank in a semidefinite program. There are parallels in its development analogous to how prototypical semidefinite program (648) resembles linear program (301) on page 273 [385]: Consider a feasibility problem Ax = b, but with an upper bound k on cardinality ‖x‖_0 of a nonnegative solution x: for A ∈ R^{m×n} and vector b ∈ R(A)

  find   x ∈ R^n
  subject to   Ax = b
               x ⪰ 0
               ‖x‖_0 ≤ k   (529)

where ‖x‖_0 ≤ k means^4.32 vector x has at most k nonzero entries; such a vector is presumed existent in the feasible set. Nonnegativity constraint x ⪰ 0 is analogous to positive semidefiniteness; the notation means vector x belongs to the nonnegative orthant R+^n. Cardinality is quasiconcave on R+^n just as rank is quasiconcave on S+^n.

4.32 Although it is a metric (§5.2), cardinality ‖x‖_0 cannot be a norm (§3.2) because it is not positively homogeneous.

306) N subject to 0 y 1 yT1 = n − k λ(G⋆ )i = minimize N i=k+1 W∈S G⋆ . (p. and cardinality of x⋆ ∈ Rn is 4. . Then n−k entries of x⋆ are themselves zero whenever their absolute sum is.4. W (1738a) subject to 0 W I tr W = N − k Linear program (524) sums the n−k smallest entries from vector x . But this global convergence cannot always be guaranteed. we want n−k entries of x to sum to zero.332 CHAPTER 4. In context of problem (529).5. y ⋆ = |x⋆ |T y ⋆ 0 (757) defines global convergence for the iteration. (confer (736)) n i=k+1 π(|x⋆ |)i = |x⋆ | .1 direction vector We propose that cardinality-constrained feasibility problem (529) can be equivalently expressed as iteration of a sequence of two convex problems: for 0 ≤ k ≤ n−1 minimize n x∈ R x.2] 4. a sequence may be regarded as a homotopy to minimal 0-norm. until desired cardinality is achieved. SEMIDEFINITE PROGRAMMING x 0 is analogous to positive semidefiniteness. This sequence is iterated until x⋆Ty ⋆ vanishes. [61.1. we want a globally optimal objective x⋆Ty ⋆ to vanish: more generally. y (156) subject to Ax = b x 0 n π(x⋆ )i = i=k+1 minimize n y∈ R x⋆ . y (524) where π is the (nonincreasing) presorting function. 3.33 Problem (524) is analogous to the rank constraint problem. Cardinality is quasiconcave on Rn + + n just as rank is quasiconcave on S+ . id est.4.33 When it succeeds. the notation means vector x belongs to the nonnegative orthant Rn . id est.

1) Vector y may be interpreted as a negative search direction.4. . over nonnegative orthant drawn here in R3 . (p.2 direction vector interpretation (confer 4.75) Direction vector y is not unique. it points opposite to direction of movement of hyperplane {x | x . y over the feasible set in linear program (156). at most k . 1 = κ} Figure 99: 1-norm heuristic for cardinality minimization can be interpreted as minimization of a hyperplane. is obvious assuming existence of a cardinality-k solution x⋆ .1.4. Optimal direction vector y ⋆ is defined as any nonnegative vector for which find x ∈ Rn subject to Ax = b x 0 x 0≤k minimize n ≡ x∈ R x . whose nonzero entries are complementary to those of x⋆ . 4. CONSTRAINING CARDINALITY 0 333 R3 + ∂H 1 (confer Figure 84) ∂H = {x | x .1. videlicet. Polar of direction vector y = 1 points toward origin. y = τ } in a minimization of real linear function x . The feasible set of direction vectors in (524) is the convex hull of all cardinality-(n−k) one-vectors. y⋆ (156) (529) subject to Ax = b x 0 Existence of such a y ⋆ . ∂H with normal 1.5.5.

[259. a fixed point of possibly infeasible cardinality. 4.1. a = n − k} (758) conv{u ∈ Rn | card u = n − k .5.2] [42. then polar direction −y points directly at the origin (the cardinality-0 nonnegative vector) as in Figure 99. or reinitialization to a random cardinality-(n−k) vector in the same nonnegative orthant face demanded by the current Convex iteration (156) (524) is not a projection method because there is no thresholding or discard of variable-vector x entries. 4.1. to steer out of local minima. argument to conv{ } . Constraining cardinality. 1. by virtue of a monotonically nonincreasing real objective sequence.1] There can be no proof of global convergence.5. for that closed-form solution. y = |x⋆ |T y ≥ 0 (759) When y = 1. called reweighting [162]. ui ∈ {0. An optimal solution y to (524).3. 1. y = τ > 0. solution to problem (529). at a positive objective value x⋆ . comprises the extreme points of set (758) which is a nonnegative hypercube slice.34 by Proposition 7. (confer (737)) n i=k+1 π(|x⋆ |)i = |x⋆ | . whose basis is a subset of the Cartesian axes. 1}} = {a ∈ Rn | 1 This set.3] in 1999.. is known in closed form: it has 1 in each entry corresponding to the n−k smallest entries of x⋆ and has 0 elsewhere.34 . as in 1-norm minimization for example. can often be achieved but simple examples can be contrived that stall at a fixed point of infeasible cardinality. containing all cardinality k (or less) vectors having the same ordering as x⋆ . Setting direction vector y instead in accordance with an iterative inverse weighting scheme. id est. it is ill-advised to attempt simultaneous optimization of variables x and y . 1 . That particular polar direction −y can be interpreted4. that is an extreme point of its feasible set. Direction vector y is then manipulated. e.334 CHAPTER 4.3 convergence can mean stalling Convex iteration (156) (524) always converges to a locally optimal solution. An optimal direction vector y must always reside on the feasible set boundary in (524) page 332.g.11. as countermeasure.1.5. SEMIDEFINITE PROGRAMMING a 0 .1. 4. Consequently.0. We sometimes solve (524) instead of employing a known closed form because a direction vector is not unique.3 as pointing toward the nonnegative orthant in the Cartesian subspace. complete randomization as in Example 4. defined by (757). was described for the 1-norm by Huo [208.

There we showed + k how (529) is equivalently stated in terms of gradients minimize n x∈R x.5. confer D.4. y versus iteration are characterized by noisy monotonicity.5.6. x n k = xT ∇ x n k .1.2. 4. the compressed sensing problem was precisely represented as a nonconvex difference of convex functions bounded below by 0 find x ∈ Rn subject to Ax = b x 0 x 0≤k minimize n ≡ x∈R x 1 − x n k subject to Ax = b x 0 (529) where convex k-largest norm x n is monotonic on Rn .1.  T z 1=k . Zero entries behave like memory or state while randomness greatly diminishes likelihood of a stall. 1] corresponding to the n−k smallest entries of x⋆ and has 0 elsewhere. ∇ x 1 −∇ x n k subject to Ax = b x 0 because x 1 (760) = xT ∇ x 1 .1.4 algebraic derivation of direction vector for convex iteration In 3. cardinality and objective sequence x⋆ .1.1.1) of the objective function from (529) while the direction vector of convex iteration y=∇ x is an objective gradient where ∇ x ∇ x n k 1 1 −∇ x n k (762) = ∇1T x = 1 under nonnegativity and  = ∇z Tx = arg maximize z Tx   n z∈R x 0 (532) subject to 0 z 1  .4. x 0 (761) The objective function from (760) is a directional derivative (at x in direction x . When this particular heuristic is successful.2. CONSTRAINING CARDINALITY 335 iterate: y has nonnegative uniformly distributed random entries in (0.2. D.

. 5.5 subject to 0 z 1 (524) zT1 = n − k optimality conditions for compressed sensing Now we see how global optimality conditions can be stated without reference to a dual problem: From conditions (468) for optimality of (529). π(|x|)k+1:n 1 =0 ⇔ x 0 ≤k (4g) (766) is a necessary condition for global optimality. By definition.336 CHAPTER 4. By assumption. Assuming k existence of a cardinality-k solution. it is necessary [61.5.1. the fourth condition is identical to x⋆ Because a 1-norm x 1 1 − x⋆ n k n k + ν ⋆TAx⋆ = 0 1 (4ℓ) (764) (765) = x + π(|x|)k+1:n is separable into k largest and n − k smallest absolute entries.3] that x⋆ 0 Ax⋆ = b ∇ x⋆ 1 − ∇ x⋆ n + ATν ⋆ 0 1 ∇ x⋆ − ∇ x⋆ n k + A ν . ∇ x 1 ∇ x n always holds. global optimality of a feasible solution to (529) is identified by a zero objective.5. SEMIDEFINITE PROGRAMMING is not unique. This means ν ⋆ ∈ N (AT ) ⊂ Rm . and ν ⋆ = 0 when A is full-rank. x⋆ = 0 k T ⋆ (1) (2) (3) (4ℓ) (763) These conditions must hold at any optimal solution (locally or globally). then only three of the four conditions are necessary and sufficient for global optimality of (529): x⋆ 0 Ax⋆ = b x ⋆ 1 − x⋆ n = 0 k (1) (2) (4g) (767) meaning. By (761). matrix A is fat and b = 0 ⇒ Ax⋆ = 0. Substituting 1 − z ← z the direction vector becomes y = 1 − arg maximize z Tx n z∈R ← arg minimize z T x n z∈R subject to 0 z 1 zT1 = k 4.

2 0. id est.8 1 k/n Figure 100: For Gaussian random matrix A ∈ Rm×n . Problems having nonnegativity constraint (dotted) are easier to solve.4 0. graph illustrates Donoho/Tanner least lower bound on number of measurements m below which recovery of k-sparse n-length signal x by linear programming fails with overwhelming probability. but not the reverse. [123] [124] .6 0.5. Inequality demarcates approximation (dashed curve) empirically observed in [23]. Hard problems are below curve. CONSTRAINING CARDINALITY 337 7 m/k 6 Donoho bound approximation x > 0 constraint minimize x x 1 subject to Ax = b 5 (517) m > k log2 (1+ n/k) 4 minimize x x 1 3 subject to Ax = b (522) x 0 2 1 0 0. failure above depends on proximity.4.

index = round(5∗rand(1)) + 1.2. in Matlab notation A = randn(3 .338 CHAPTER 4.5. SEMIDEFINITE PROGRAMMING 4. for m = 3.1.2) Problem (681) has sparsest solution not easily recoverable by least 1-norm.3) Stalls are simply detected as fixed points x of infeasible cardinality.1. we then apply convex iteration to 22. ( 3. sometimes remedied by reinitializing direction vector y to a random positive state. namely.2. 6). we discern it instead by convex iteration. Iteration continues until xTy vanishes (to within some numerical precision). id est. n = 6.8.1 Example. whose occurrence is sensitive to initial conditions of convex iteration. Although the sparsest solution is recoverable by inspection. Bolstered by success in breaking out of a stall. k = 1.5. Stalling. If .0. id est. is a consequence of finding a local minimum of a multimodal objective x .0. index) (768) is a scaled standard basis vector. the sparsest solution x ∝ eindex Without convex iteration or a nonnegativity constraint x 0. −9 4 8 1 4 1 9 1 1 −9 4 1 4 the sparsest solution to classical linear equation Ax = b is x = e4 ∈ R6 (confer (694)). [70] [119] (confer Example 4. b = rand(1)∗ A(: . rate of failure for this minimal cardinality problem Ax = b by 1-norm minimization of x is 22%.0. until desired cardinality is achieved.5. Sparsest solution to Ax = b .3. But this comes not without a stall. not by compressed sensing because of proximity to a theoretical lower bound on number of measurements m depicted in Figure 100: for A ∈ Rm×n Given data from Example 4.1. by iterating problem sequence (156) (524) on page 332. From the numerical data given. k = 1     1 −1 1 8 1 1 0  1   1 1 1  1 b = 2  (681) A =  −3 2 8 2 3 2 − 3  .000 randomized problems: Given random data for m = 3. n = 6. cardinality x 0 = 1 is expected. y when regarded as simultaneously variable in x and y . That failure rate drops to 6% with a nonnegativity constraint.

audio data from a compact disc (CD) or video data from a digital video disc (DVD). or a typically ravaged cell phone communication. The linear channel is assumed to introduce no filtering. while passing through some channel. Stalling is not an inevitable behavior.4.5. [122. Transmitted signal is denoted s = Ψz ∈ Rn (769) whose upper bound on DCT basis coefficient cardinality card z ≤ k is assumed known. in the Fourier sense. This simplifies exposition. caused by some man-made or natural phenomenon. details of the basis are unimportant except for its orthogonality ΨT = Ψ−1 . Essentially dropout means momentary loss or gap in a signal.35 . 6. For some problem types (beyond mere Ax = b). etcetera.5. The signal could be voice over Internet protocol (VoIP).4. Signal dropout. a television transmission over cable or the airwaves. well studied from both an industrial and academic perspective. failure rate drops to 0% but the amount of computation is approximately doubled.2] Signal dropout is an old problem. What remains within the time-gap is system or idle channel noise. We create a discretized windowed signal for this example by positively combining k randomly chosen vectors from a discrete cosine transform (DCT) basis denoted Ψ ∈ Rn×n . for example. Otherwise. with noise. convex iteration succeeds nearly all the time. The signal lost is assumed completely destroyed somehow. 4. CONSTRAINING CARDINALITY 339 we then engage convex iteration. and randomly reinitialize the direction vector. whose statement is just a bit more intricate but easy to solve in a few convex iterations: 4.35 hence a critical assumption: transmitted signal s is sparsely supported (k < n) on the DCT basis. from DC toward Nyquist as column index of basis Ψ increases.2 Example. Here we consider signal dropout in a discrete-time signal corrupted by additive white noise assumed uncorrelated to the signal. although it may be an unrealistic assumption in many applications.5. It is further assumed that nonzero signal coefficients in vector z place each chosen basis vector above the noise floor.1. Here is a cardinality problem. Frequency increases. detect stalls.

We speculate that is because signal cardinality 2ℓ becomes the predominant cardinality.340 CHAPTER 4. index ℓ locates the last sample prior to the gap’s onset. with caveat. while index n−ℓ+1 locates the first sample subsequent to the gap: for rectangularly windowed received signal g possessing a time-gap loss and additive noise η ∈ Rn   s1:ℓ + η1:ℓ ηℓ+1:n−ℓ  ∈ Rn g = (770) sn−ℓ+1:n + ηn−ℓ+1:n The window is thereby centered on the gap and short enough so that the DCT spectrum of signal s can be assumed static over the window’s duration n . number of equations exceeds cardinality of signal representation (roughly ℓ ≥ k) with respect to DCT basis. id est. But addition of a significant amount of noise η increases level of difficulty dramatically. We can instead formulate the dropout recovery problem as a best approximation: f1:ℓ − g1:ℓ fn−ℓ+1:n − gn−ℓ+1:n x∈ R subject to f = Ψ x x 0 card x ≤ k minimize n (772) . constraints equating reconstructed signal f to received signal g are not possible. almost always returns DCT basis coefficients numbering in excess of minimal cardinality. knowing the signal DCT basis and having a good estimate of basis coefficient cardinality makes perfectly reconstructing gap-loss easy: it amounts to solving a linear system of equations and requires little or no optimization. SEMIDEFINITE PROGRAMMING We also assume that the gap’s beginning and ending in time are precisely localized to within a sample. for example. Signal to noise ratio within this window is defined s1:ℓ SNR 20 log sn−ℓ+1:n η (771) In absence of noise. DCT basis coefficient cardinality is an explicit constraint to the optimization problem we shall pose: In presence of noise. a 1-norm based method of reducing cardinality.

1 0.2 0.1 0.15 −0.05 0 −0.15 −0.05 −0.15 0.25 flatline and g 0. (b) Reconstructed signal f (red) overlaid with corrupted signal g . .25 0 s+η s+η η (a) dropout (s = 0) 100 200 300 400 500 0.4.25 f and g 0. CONSTRAINING CARDINALITY 341 0. Flatline indicates duration of signal dropout.2 0.5.25 0 100 200 300 400 500 f (b) Figure 101: (a) Signal dropout in signal s corrupted by noise η (SNR = 10dB.05 −0.1 −0. g = s + η).15 0.05 0 −0.2 −0.2 −0.1 −0.

15 −0.15 −0.05 0 −0.1 0.2 −0. SEMIDEFINITE PROGRAMMING f−s 0. (b) Original signal s overlaid with reconstruction f (red) from signal g having dropout plus noise.2 0.15 0.1 −0.25 0.342 CHAPTER 4.2 −0.05 0 −0.2 0.25 f and s 0.05 −0.25 0 100 200 300 400 500 (a) 0. .1 −0.25 0 100 200 300 400 500 (b) Figure 102: (a) Error signal power (reconstruction f less original noiseless signal s) is 36dB below s .05 −0.1 0.15 0.

from Ψ (n = 500 in this example). n/2}. When this experiment is repeated 1000 times on noisy signals averaging 10dB SNR. When number of samples in the dropout region exceeds half the window size. 2ℓ ≥ max{2k . The time gap contains much noise. Correct cardinality is also recovered (card x = card z) along with the basis vector indices used to make original signal s . y + subject to f = Ψ x x 0 and minimize n y∈ R f1:ℓ − g1:ℓ fn−ℓ+1:n − gn−ℓ+1:n (773) x⋆ . Original signal s is created by adding four (k = 4) randomly selected DCT basis vectors. original signal s is recovered with relative error power 36dB down. by observation. by iteration of two convex problems until convergence: minimize n x∈ R x. Then a 240-sample dropout is realized (ℓ = 130) and Gaussian noise η added to make corrupted signal g (from which a best approximation f will be made) having 10dB signal to noise ratio (771).4. Thus. in the interval [10−10/20 .5. id est. then that deficient cardinality of signal remaining becomes a source of degradation to reconstruction in presence of noise. y (524) subject to 0 y 1 yT1 = n − k Signal cardinality 2ℓ is implicit to the problem statement. we divine a reconstruction rule for this signal dropout problem to attain good noise suppression: ℓ must exceed a maximum of cardinality bounds.5. as apparent from Figure 101a. we get perfect reconstruction in one iteration. illustrated in Figure 102. Without noise. Approximation error is due to DCT basis coefficient estimate error. Figure 101 and Figure 102 show one realization of this dropout problem. CONSTRAINING CARDINALITY 343 We propose solving this nonconvex problem (772) by moving the cardinality constraint to the objective as a regularization term as explained in 4. whose amplitudes are randomly selected from a uniform distribution above the noise floor. But in only a few iterations (773) (524). [Matlab code: Wıκımization] . 1]. the correct cardinality and indices are recovered 99% of the time with average relative error power 30dB down.

344 CHAPTER 4. Kissing point (• on edge) is achieved as ball contracts or expands under optimization. . SEMIDEFINITE PROGRAMMING R3 c S = {s | s A 0 . 1Ts ≤ c} F Figure 103: Line A intersecting two-dimensional (cardinality-2) face F of nonnegative simplex S from Figure 53 emerges at a (cardinality-1) vertex. S is first quadrant of 1-norm ball B1 from Figure 70.

there is a geometrical interpretation that leads to an algorithm more efficient than convex iteration. Empirically. Figure 100).6 345 Compressed sensing geometry with a nonnegative variable It is well known that cardinality problem (529).g.5. which requires explanation: Under the assumption that nonnegative cardinality-1 solutions exist in feasible set A . this conclusion also holds in other Euclidean dimensions (Figure 71. If variable x is constrained to the nonnegative orthant minimize x x 1 subject to Ax = b x 0 (522) Whereas 1-norm ball B1 has only six vertices in R3 corresponding to cardinality-1 solutions. And whereas B1 has twelve edges containing cardinality-2 solutions. In other words. In the special case of cardinality-1 solution to prototypical compressed sensing problem (522). S has three (out of total four) facets constituting cardinality-2 solutions.4. 4. e. likelihood of a low-cardinality solution is higher by kissing nonnegative simplex S (522) than by kissing 1-norm ball B1 (517) because facial dimension (corresponding to given cardinality) is higher in S . Figure 103 shows a vertex-solution to problem (522) when desired cardinality is 1. is easier to solve by linear programming when variable x is nonnegatively constrained than when not.1. Prototypical compressed sensing problem. CONSTRAINING CARDINALITY 4. for A ∈ Rm×n minimize x x 1 subject to Ax = b (517) is solved when the 1-norm ball B1 kisses affine subset A .1. it so happens. Figure 100.5. on page 231. We postulate a simple geometrical explanation: Figure 70 illustrates 1-norm ball B1 in R3 and affine subset A = {x | Ax = b}..7 cardinality-1 compressed sensing problem always solvable then 1-norm ball B1 (Figure 70) becomes simplex S in Figure 103. simplex S has three edges (along the Cartesian axes) containing an infinity of cardinality-1 solutions.5. But first-quadrant S of 1-norm ball B1 does not kiss line A . . Figure 71.

37 the elimination of redundant constraints and identically zero variables prior to numerical solution.36 from A ∈ Rm×n in a linear program like (522) ( 3. but not in execution time.4. ( 2. do not intersect that index corresponding to the cardinality-1 solution.0.13. A similar argument holds for any orientation of line A and point of entry to simplex S . 4.1 (p. When that line segment kisses line A . and because a cardinality-1 constraint in (517) transforms to a cardinality-1 constraint in its nonnegative equivalent (521). as we learned in Example 3. perhaps because of sheer dimension.3) Removing columns (and rows)4. Imagining that the corresponding cardinality-2 face F has been removed. corresponding entries of vector b may also be removed without loss of generality.2. Although it is more efficient to search over the columns of matrix A for a cardinality-1 solution known a priori to exist. 4. We offer a different and geometric presolver: Rows of matrix A are removed based upon linear dependence.228). When problem (522) is first solved.0. then this cardinality-1 recursive reconstruction algorithm continues to hold for a signed variable as in (517). may be deprecated and the problem solved again with those columns missing.4. Either a solution to problem (522) is cardinality-1 or column indices of A . then the cardinality-1 vertex-solution illustrated has been found. solution is cardinality-2 at the kissing point • on the indicated edge of simplex S .346 CHAPTER 4. in the example of Figure 103. SEMIDEFINITE PROGRAMMING columns of measurement matrix A .36 . This geometric presolver becomes attractive when a linear or integer program is not solvable by other means. This cardinality-1 reconstruction algorithm also holds more generally when affine subset A has any higher dimension n−m . corresponding to a higher cardinality solution. Because signed compressed sensing problem (517) can be equivalently expressed in a nonnegative variable.5. Such columns are recursively removed from A until a cardinality-1 solution is found. Assuming b ∈ R(A) . a combinatorial approach. tables are turned when cardinality exceeds 1 : 4. then the simplex collapses to a line segment along one of the Cartesian axes. corresponding to any particular solution (to (522)) of cardinality greater than 1.2) is known in the industry as presolving.37 The conic technique proposed here can exceed performance of the best industrial presolvers in terms of number of columns removed.2 cardinality-k geometric presolver This idea of deprecating columns has foundation in convex cone theory.

Number of extreme directions in K may exceed dimensionality of ambient space.5. (Drawing + by Pedro S´nchez) (b) Vertex-description of constraints in problem (522): a point b belongs to polyhedral cone K = {Ax | x 0}. CONSTRAINING CARDINALITY 347 Rn Rn + (a) A P 0 F Rm b (b) K 0 Figure 104: Constraint interpretations: (a) Halfspace-description of feasible set in problem (522) is a polyhedron P formed by intersection of nonnegative orthant Rn with hyperplanes A prescribed by equality constraint.4. .

5.4.2. In the example of Figure 104b. n linear feasibility problems plus numerical solution of the presolved problem. Minimal cardinality generators. would be removed from A . interpreted literally in R3 . save one. There are always exactly n linear feasibility problems to solve in order to discern generators of the smallest face of K containing b . When vector b resides in a face F of K that is not cone K itself. columns are removed from A if they do not belong to the smallest face F of K containing vector b . k = 2 (A ∈ Rm×n . (confer Example 4. all but two columns of A are discarded by our presolver when b belongs to facet F . then the smallest face is an edge that is a ray containing b .348 CHAPTER 4. Assuming that a cardinality-k solution exists and matrix A describes a pointed polyhedral cone K = {Ax | x 0} .0.13. Number of columns removed depends completely on geometry of a given problem.1 Exercise.5.1. n = 6.0. location of b within K .5. Presolving for cardinality-2 solution to Ax = b . desired cardinality of x is k) Comparison of computational intensity for this conic presolver to a brute force search n would pit combinatorial complexity.1) Again taking data from Example 4. as in Figure 104b. this geometrically describes a cardinality-1 case where all columns. particularly. at the other end of the spectrum.3.2).38 4. because a generator outside the smallest face (having positive coefficient) would violate the assumption that b belongs to that face.2. benefit is realized as a reduction in computational intensity because the consequent equivalent problem has smaller dimension. 4. there would be no columns to remove were b ∈ rel int K since the smallest face becomes cone K itself (Example 4.5.2 Example. 4.0.1. Benefit accrues when vector b does not belong to relative interior of K .2. the topic of 2.4. Were b an extreme direction. in other words. against polynomial k complexity.1. Prove that generators of the smallest face F of K = {Ax | x 0} containing vector b always hold a minimal cardinality solution to Ax = b .2.38 . a binomial coefficient ∝ . Generators of that smallest face always hold a minimal cardinality solution. those columns correspond to 0-entries in variable vector x (and vice versa). SEMIDEFINITE PROGRAMMING Two interpretations of the constraints from problem (522) are realized in Figure 104.3. for m = 3.5.

13.39 .12. so absolute value |x| is needed here. sum of the last two columns of matrix A . So geometry of this particular problem does not permit number of generators to be reduced below n by discerning the smallest face. CONSTRAINING CARDINALITY  −1 −9 1 2 4 8 8 8 1 1 2 1 4 349  1 1 2 1 4 1 1 3 1 9 0 1 −1 2 3 1 −1 4 9  A =  −3   . or nonnegative. 4.5.  b =    (681) proper cone K = {Ax | x 0} is pointed as proven by method of 2. but with an upper bound k on cardinality x 0 : for vector b ∈ R(A) find x ∈ Rn subject to Ax = b (774) x 0≤k where x 0 ≤ k means vector x has at most k nonzero entries. Convex iteration ( 4. such a vector is presumed existent in the feasible set.2. A cardinality-2 solution is known to exist. 4. We propose that nonconvex problem (774) can be equivalently written as a sequence of convex But a canonical set of conically independent generators of K comprise only the first two and last two columns of A .2.13. After columns are removed by theory of convex cones (finding the smallest face). When nonnegative rows appear in an equality constraint to 0.1.4.3.4.2.3 constraining cardinality of signed variable Now consider a feasibility problem equivalent to the classical problem from linear algebra Ax = b . identical to other rows.1).39 There is wondrous bonus to presolving when a constraint matrix is sparse. found by the method in Example 2.4. more columns may be removed. meaning. some remaining rows may become 0T . Once rows and columns have been removed from a constraint matrix.7.5. comprise the entire A matrix because b ∈ int K ( 2. all nonnegative variables corresponding to nonnegative entries in those rows must vanish ( A.4).1) utilizes a nonnegative variable.4. even more rows and columns may be removed by repeating the presolver procedure.5. Generators of the smallest face that contains vector b .

1 local convergence As before ( 4..5. vector y can be interpreted as a direction of search. but their geometrical interpretation is not as apparent: e.350 CHAPTER 4. SEMIDEFINITE PROGRAMMING problems that move the cardinality constraint to the objective: minimize n x∈ R |x| . y subject to Ax = b ≡ x∈ R . id est.5. 4. 4. x∈ R .5.2 simple variations on a signed variable Several useful equivalents to linear programs (775) (524) are easily devised.3).g.1. t∈ R subject to Ax = b −t x (776) t . ε1 in (775) is necessary to determine absolute value |x⋆ | = t⋆ ( 3.3. The term t . y + ε1 minimize n y∈ R subject to 0 y 1 yT1 = n − k (524) where ε is a relatively small positive constant. convex iteration (775) (524) always converges to a locally optimal solution. This sequence is iterated until a direction vector y is found that makes |x⋆ |T y ⋆ vanish. t∈ R minimize n n t.2) because vector y can have zero-valued entries. a fixed point of possibly infeasible cardinality. equivalent in the limit ε → 0+ minimize t . the first iteration of problem (775) is a 1-norm problem (513). y can be interpreted as corrections to this 1-norm problem leading to a 0-norm solution. t∈ R minimize n n t .1 t ≡ subject to Ax = b −t x minimize n x∈ R x 1 subject to Ax = b (517) Subsequent iterations of problem (775) engaging cardinality term t . y n n x∈ R .3. y + ε1 (775) t subject to Ax = b −t x t⋆ . By initializing y to (1−ε)1.

Projection on ellipsoid boundary. by interpreting problem (517) as infimum to a vertex-description of the 1-norm ball (Figure 70. y subject to 0 y 1 yT1 = n − k (524) We get another equivalent to linear programs (775) (524). CARDINALITY AND RANK CONSTRAINT EXAMPLES minimize n y∈ R 351 |x⋆ | .6.1] [248.4. 5.0.0. 4. y (524) subject to 0 y 1 y T 1 = 2n − k where x⋆ = [ I −I ]a⋆ . given matrices C .2. fat A . Example 3. and vector b ∈ R(A) find x ∈ RN subject to Ax = b Cx = 1 (778) The set {x | C x = 1} (2) describes an ellipsoid boundary (Figure 13). 2] Consider classical linear equation Ax = b but with constraint on norm of solution x .0.1. [52] [149.6 Cardinality and rank constraint examples 4.0. in the limit.1 Example. This is a nonconvex problem because solution is constrained to that boundary. from which it may be construed that any vector 1-norm minimization problem has equivalent expression in a nonnegative variable.6. y (777) subject to Ax = b subject to [ A −A ]a = b a 0 minimize 2n y∈ R a⋆ . confer (516)): minimize n x∈ R x 1 minimize 2n ≡ a∈ R a. Assign G= Cx 1 [ xTC T 1 ] = X Cx x CT 1 T C xxTC T C x xTC T 1 ∈ SN +1 (779) .

2 : N+ 1) ∗ (U(: . 1)′ ) (783) in Matlab notation. Y Ax = b X Cx T x C 1 tr X = 1 G= T subject to 0 (781) with minimize Y ∈ SN +1 G⋆ . making convex problem (781) its equivalent.2) Ellipsoidally constrained feasibility problem (778) is equivalent to: X∈ SN find x ∈ RN G= X Cx ( T x C 1 rank G = 1 tr X = 1 T subject to Ax = b 0) (780) This is transformed to an equivalent convex problem by moving the rank constraint to the objective: We iterate solution of X∈ SN .0. initially 0. 2 : N + 1)T that optimally solves (782) at each iteration. In that case.1. SEMIDEFINITE PROGRAMMING Any rank-1 solution must have this form. . were a rank-1 G matrix not found. 2 : N+ 1)′ + randn(N .6) provides a + new direction matrix Y = U (: . This heuristic is quite effective for problem (778) which is exceptionally easy to solve by convex iteration. Direction matrix Y ∈ SN +1 . x∈RN minimize G. It remains possible for the iteration to stall. 2 : N + 1)U (: . Y (782) subject to 0 Y I tr Y = N until convergence. the current search direction is momentarily reversed with an added randomized element: Y = −U(: . 1) ∗ U(: . (1738a) Singular value decomposition G⋆ = U ΣQT ∈ SN +1 ( A. regulates rank.352 CHAPTER 4. ( B. An optimal solution to (778) is thereby found in a few iterations.

By vectorizing   T T  x [x y 1] X Z G = y  =  ZT Y xT y T 1 X .0. Problem (781) in turn becomes minimize G . Y ∈S. Y + Ax − b X∈ SN .0. [52] Example 4. x.6. y (786) Now orthonormal Procrustes problem (786) can be equivalently restated: minimize A[ x y ] − B F   X Z x G =  Z T Y y ( xT y T 1 rank G = 1 tr X = 1 tr Y = 1 tr Z = 0 matrix Q we can make the assignment:    xxT xy T x x y   yxT yy T y ∈ S2n+1 (787) xT y T 1 1 subject to 0) (788) . Consider the particular case Q = [ x y ] ∈ Rn×2 as variable to a Procrustes problem ( C.0.0.1 is extensible. An orthonormal matrix Q ∈ Rn×p is characterized QTQ = I .3): given A ∈ Rm×n and B ∈ Rm×2 minimize n×2 Q∈ R AQ − B F subject to QTQ = I which is nonconvex.2 Example. x∈RN X Cx (785) 0 T x C 1 tr X = 1 We iterate this with calculation (782) of direction matrix Y as before until a rank-1 G matrix is found. Z . Orthonormal Procrustes. CARDINALITY AND RANK CONSTRAINT EXAMPLES 353 When b ∈ R(A) then problem (778) must be restated as a projection: / minimize x∈RN Ax − b subject to Cx = 1 (784) This is a projection of point b on an ellipsoid boundary because any affine transformation of an ellipsoid remains an ellipsoid. subject to G= T 4.4.6.6.

This problem formulation is extensible. a solution seems always to be found in one or few iterations. Numerically. the other on cardinality.354 CHAPTER 4. B ∈ Rn . solution to orthogonal Procrustes problem minimize n×n X∈ R A − XB F subject to X T = X −1 (1750) is not necessarily a permutation matrix Ξ even though an optimal objective value of 0 is found by the known analytical solution ( C. Because permutation matrices are sparse by definition.672). y A[ x y ] − B F  X Z  ZT Y subject to G = xT y T tr X = 1 tr Y = 1 tr Z = 0 minimize minimize 2n+1 W∈S + G. of course. we form the convex problem sequence: X . 4. W (790) subject to 0 W I tr W = 2n which has an optimal solution W that is known in closed form (p.0. Combinatorial Procrustes problem.0. Optimal Q⋆ equals [ x⋆ y ⋆ ]. Instead of sorting. we design two different convex problems each of whose optimal solution is a permutation matrix: one design is based on rank constraint. this Procrustes problem is easy to solve. x. when vector A = ΞB is known to be a permutation of vector B .6. Y . A good initial value for direction matrix W is 0. to orthogonal (square) matrices Q .W  x y  0 1 (789) and G⋆ .3 Example. Z . In case A . These two problems are iterated until convergence and a rank-1 G matrix is found. we depart from a traditional Procrustes problem by instead . SEMIDEFINITE PROGRAMMING To solve this.3). The simplest method of solution finds permutation matrix X ⋆ = Ξ simply by sorting vector B with respect to A .

the only orthogonal matrices having exclusively nonnegative entries are permutations of the identity.4. It is this combination of nonnegativity.4. Matrix X must also belong to that polyhedron.0. i)T 1 tr Gi = 2 X T1 = 1 X1 = 1 X≥0 (793) This fact would be superfluous were the objective of minimization linear. But as posed. ΞT 1 = 1 . 2) all points extreme to the polyhedron (100) of doubly stochastic matrices. There are two principal facts exploited by the first convex iteration design ( 4. CARDINALITY AND RANK CONSTRAINT EXAMPLES 355 demanding a vector 1-norm which is known to produce solutions more sparse than Frobenius’ norm. i) Gi = X(: . because the permutation matrices reside at the extreme points of a polyhedron (100) implied by (792).6. like tr Z = 0 from Example 4. only either rows or columns need be constrained to unit norm because matrix orthogonality implies transpose orthogonality. 1 : n) X(: . That means: 1) norm of each row and column is 1 .2) Absence of vanishing inner product constraints that help define orthogonality.5. 4. B n X∈ R minimizen+1 n×n . sum.4. and sum square constraints that extracts the permutation matrices: (Figure 105) given nonzero vectors A .4) Ξ1 = 1 .3.40 Ξ(: . n (791) 2) sum of each nonnegative row and column is 1. i=1 . i) = 1 . . id est. . .6.0. implied by constraints (792). n subject to Gi (1 : n . Permutation matrices Ξ constitute: 1) the set of all nonnegative orthogonal matrices.40 . Ξ(i . . (100) in the nonnegative orthant.0.1) we propose. Gi ∈ S A − XB 1 + w i=1 Gi . Wi 0    . ( 2.2. Ξ≥0 (792) solution via rank constraint The idea is to individually constrain each column of variable matrix X to have unity norm.2. ( B. so each row-sum and column-sum of X must also be unity. is a consequence of nonnegativity. :) = 1 . i=1 .

i)T X(: . SEMIDEFINITE PROGRAMMING {X(: .  tr Wi = n    i=1 . Optimal solutions G⋆ are key to finding direction matrices Wi for the next iteration i of semidefinite programs (793) (794): minimize n+1 Wi ∈ S G⋆ . i)T 1 ] 1 (795) subject to 0 Wi I  . Binary-valued X ⋆ column entries result from the further sum constraint X1 = 1. . i) | X(: .433) G⋆ = i X ⋆ (: . when rank-1 constraint is satisfied. i) = 1} Figure 105: Permutation matrix i th column-sum and column-norm constraint. Columnar orthogonality is a consequence of the further transpose-sum constraint X T 1 = 1 in conjunction with nonnegativity constraint X ≥ 0 . Constraint on trace of G⋆ normalizes the i th column of X ⋆ to unity i because (confer p. i) | 1T X(: . n (794) at convergence. . but we leave proof of orthogonality an exercise. i) = 1} {X(: .356 CHAPTER 4. abstract in two dimensions. The . i) [ X ⋆ (: . Optimal solutions reside at intersection of hyperplane with unit circle. where w ≈ 10 positively weights the rank regularization term. Wi i Direction matrices thus found lead toward rank-1 matrices G⋆ on subsequent i iterations.

4.6. CARDINALITY AND RANK CONSTRAINT EXAMPLES

357

optimal objective value is 0 for both semidefinite programs when vectors A and B are related by permutation. In any case, optimal solution X ⋆ becomes a permutation matrix Ξ . Because there are n direction matrices Wi to find, it can be advantageous to invoke a known closed-form solution for each from page 672. What makes this combinatorial problem more tractable are relatively small semidefinite constraints in (793). (confer (789)) When a permutation A of vector B exists, number of iterations can be as small as 1. But this combinatorial Procrustes problem can be made even more challenging when vector A has repeated entries.

solution via cardinality constraint Now the idea is to force solution at a vertex of permutation polyhedron (100) by finding a solution of desired sparsity. Because permutation matrix X is n-sparse by assumption, this combinatorial Procrustes problem may instead be formulated as a compressed sensing problem with convex iteration on cardinality of vectorized X ( 4.5.1): given nonzero vectors A , B minimize n×n
X∈ R

A − XB

1

+ w X, Y (796)

subject to X T 1 = 1 X1 = 1 X≥0 where direction vector Y is an optimal solution to minimize n×n
Y ∈R

X ⋆, Y (524)

subject to 0 ≤ Y ≤ 1 1T Y 1 = n2 − n

each a linear program. In this circumstance, use of closed-form solution for direction vector Y is discouraged. When vector A is a permutation of B , both linear programs have objectives that converge to 0. When vectors A and B are permutations and no entries of A are repeated, optimal solution X ⋆ can be found as soon as the first iteration. In any case, X ⋆ = Ξ is a permutation matrix.

358

CHAPTER 4. SEMIDEFINITE PROGRAMMING

4.6.0.0.4 Exercise. Combinatorial Procrustes constraints. Assume that the objective of semidefinite program (793) is 0 at optimality. Prove that the constraints in program (793) are necessary and sufficient to produce a permutation matrix as optimal solution. Alternatively and equivalently, prove those constraints necessary and sufficient to optimally produce a nonnegative orthogonal matrix. 4.6.0.0.5 Example. Tractable polynomial constraint. The set of all coefficients for which a multivariate polynomial were convex is generally difficult to determine. But the ability to handle rank constraints makes any nonconvex polynomial constraint transformable to a convex constraint. All optimization problems having polynomial objective and polynomial constraints can be reformulated as a semidefinite program with a rank-1 constraint. [287] Suppose we require 3 + 2x − xy ≤ 0 Identify   2 x [x y 1] x xy  y   xy y 2 G= = x y 1  tr(GA) ≤ 0 G33 = 1 (G 0) rank G = 1  x y  ∈ S3 1 (798) (797)

Then nonconvex polynomial constraint (797) is equivalent to constraint set

(799)

with direct correspondence to sense of trace inequality where G is assumed symmetric ( B.1.0.2) and
1 0 −2 1 0 A = −2 1 0

 1 0  ∈ S3 3

(800)

Then the method of convex iteration from 4.4.1 is applied to implement the rank constraint.

4.6. CARDINALITY AND RANK CONSTRAINT EXAMPLES

359

4.6.0.0.6 Exercise. Binary Pythagorean theorem. Technique of Example 4.6.0.0.5 is extensible to any quadratic constraint; e.g., xTA x + 2bTx + c ≤ 0 , xTA x + 2bTx + c ≥ 0 , and xTA x + 2bTx + c = 0. Write a rank-constrained semidefinite program to solve (Figure 105) x+y =1 x2 + y 2 = 1 (801)

whose feasible set is not connected. Implement this system in cvx [168] by convex iteration. 4.6.0.0.7 Example. Higher-order polynomials. Consider nonconvex problem from Canadian Mathematical Olympiad 1999:
x , y , z∈ R

find

x, y, z
22 33

subject to x2 y + y 2 z + z 2 x = x+y+z =1 x, y, z ≥ 0

(802)

2 We wish to solve for, what is known to be, a tight upper bound 33 on the constrained polynomial x2 y + y 2 z + z 2 x by transformation to a rank-constrained semidefinite program. First identify

2

  2 x [x y z 1] x xy zx  y   xy y 2 yz G =  =  z   zx yz z 2 1 x y z      X=      x2 y2 z2 x y z 1           [ x2 y 2 z 2 x y z 1 ]     =     

 x y   ∈ S4 z  1

(803)

 x4 x2 y 2 z 2 x2 x3 x2 y zx2 x2 x2 y 2 y 4 y 2 z 2 xy 2 y 3 y 2 z y 2   z 2 x2 y 2 z 2 z 4 z 2 x yz 2 z 3 z 2   x3 xy 2 z 2 x x2 xy zx x  ∈ S7  x2 y y3 yz 2 xy y 2 yz y   zx2 y 2 z z3 zx yz z 2 z  x2 y2 z2 x y z 1 (804)

360

CHAPTER 4. SEMIDEFINITE PROGRAMMING

then apply convex iteration ( 4.4.1) to implement rank constraints:
A , C∈ S , b

find

b
22 33

subject to tr(XE) = G= X = 

A b ( bT 1 C

0) δ(A) b 1  ( 0) (805)

δ(A)T bT

1T b = 1 b 0 rank G = 1 rank X = 1 where  0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 

(Matlab code on Wıκımization). Positive semidefiniteness is optional only when rank-1 constraints are explicit by Theorem A.3.1.0.7. Optimal solution 2 (x , y , z) = (0 , 3 , 1 ) to problem (802) is not unique. 3 4.6.0.0.8 Exercise. Motzkin polynomial. Prove xy 2 + x2 y − 3xy + 1 to be nonnegative on the nonnegative orthant. 4.6.0.0.9 Example. Boolean vector satisfying Ax b . (confer 4.2.3.1.1) Now we consider solution to a discrete problem whose only known analytical method of solution is combinatorial in complexity: given A ∈ RM ×N and b ∈ RM find x ∈ RN (807) subject to Ax b T δ(xx ) = 1

    E =    

   1  ∈ S7 2   

(806)

4.6. CARDINALITY AND RANK CONSTRAINT EXAMPLES

361

This nonconvex problem demands a Boolean solution [ xi = ±1, i = 1 . . . N ]. Assign a rank-1 matrix of variables; symmetric variable matrix X and solution vector x : G= x 1 [ xT 1 ] = X x xT 1 xxT x xT 1 ∈ SN +1 (808)

Then design an equivalent semidefinite feasibility problem to find a Boolean solution to Ax b :
X∈ SN

find

x ∈ RN b 0) (809) G= X x ( xT 1 rank G = 1 δ(X) = 1

subject to Ax

where x⋆ ∈ {−1, 1} , i = 1 . . . N . The two variables X and x are made i dependent via their assignment to rank-1 matrix G . By (1653), an optimal rank-1 matrix G⋆ must take the form (808). As before, we regularize the rank constraint by introducing a direction matrix Y into the objective:
X∈ SN , x∈RN

minimize

G, Y Ax G= b 0 (810) X x xT 1 δ(X) = 1

subject to

Solution of this semidefinite program is iterated with calculation of the direction matrix Y from semidefinite program (782). At convergence, in the sense (736), convex problem (810) becomes equivalent to nonconvex Boolean problem (807). Direction matrix Y can be an orthogonal projector having closed-form expression, by (1738a), although convex iteration is not a projection method. ( 4.4.1.1) Given randomized data A and b for a large problem, we find that stalling becomes likely (convergence of the iteration to a positive objective G⋆ , Y ). To overcome this behavior, we introduce a heuristic into the

362

CHAPTER 4. SEMIDEFINITE PROGRAMMING

implementation on Wıκımization that momentarily reverses direction of search (like (783)) upon stall detection. We find that rate of convergence can be sped significantly by detecting stalls early. 4.6.0.0.10 Example. Variable-vector normalization. Suppose, within some convex optimization problem, we want vector variables x , y ∈ RN constrained by a nonconvex equality: x y =y id est, x = 1 and x points in the same direction as y = 0 ; e.g., minimize
x, y

(811)

f (x , y) (812)

subject to (x , y) ∈ C x y =y

where f is some convex function and C is some convex set. We can realize the nonconvex equality by constraining rank and adding a regularization term to the objective. Make the assignment:   T T     x [x y 1] X Z x xxT xy T x G = y  =  Z Y y   yxT yy T y ∈ S2N +1 (813) xT y T 1 xT y T 1 1

where X , Y ∈ SN , also Z ∈ SN [sic] . Any rank-1 solution must take the form of (813). ( B.1) The problem statement equivalent to (812) is then written minimize f (x , y) + X − Y F
X , Y ∈S, Z , x, y

subject to

(x , y) ∈ C   X Z x G =  Z Y y ( xT y T 1 rank G = 1 tr(X) = 1 δ(Z) 0

0)

(814)

The trace constraint on X normalizes vector x while the diagonal constraint on Z maintains sign between respective entries of x and y . Regularization

4.6. CARDINALITY AND RANK CONSTRAINT EXAMPLES

363

term X − Y F then makes x equal to y to within a real scalar; ( C.2.0.0.2) in this case, a positive scalar. To make this program solvable by convex iteration, as explained in Example 4.4.1.2.4 and other previous examples, we move the rank constraint to the objective
X , Y , Z , x, y

minimize

subject to (x , y) ∈ C   X Z x G= Z Y y  xT y T 1 tr(X) = 1 δ(Z) 0

f (x , y) + X − Y

F

+ G, W

0

(815)

by introducing a direction matrix W found from (1738a): minimize
W ∈ S2N +1

G⋆ , W (816)

subject to 0 W I tr W = 2N

This semidefinite program has an optimal solution that is known in closed form. Iteration (815) (816) terminates when rank G = 1 and linear regularization G , W vanishes to within some numerical tolerance in (815); typically, in two iterations. If function f competes too much with the regularization, positively weighting each regularization term will become required. At convergence, problem (815) becomes a convex equivalent to the original nonconvex problem (812). 4.6.0.0.11 Example. fast max cut. [114]

Let Γ be an n-node graph, and let the arcs (i , j) of the graph be associated with [ ] weights aij . The problem is to find a cut of the largest possible weight, i.e., to partition the set of nodes into two parts S , S ′ in such a way that the total weight of all arcs linking S and S ′ (i.e., with one incident node in S and the other one in S ′ [Figure 106]) is as large as possible. [34, 4.3.3] Literature on the max cut problem is vast because this problem has elegant primal and dual formulation, its solution is very difficult, and there exist

364

CHAPTER 4. SEMIDEFINITE PROGRAMMING

2 1

3

4

S

CUT

5 16 S′ 6 7

1

15
4

-1 -3 5 2 1 -3

14 13

8 9 10 12 11

Figure 106: A cut partitions nodes {i = 1 . . . 16} of this graph into S and S ′ . Linear arcs have circled weights. The problem is to find a cut maximizing total weight of all arcs linking partitions made by the cut.

4.6. CARDINALITY AND RANK CONSTRAINT EXAMPLES

365

many commercial applications; e.g., semiconductor design [126], quantum computing [390]. Our purpose here is to demonstrate how iteration of two simple convex problems can quickly converge to an optimal solution of the max cut problem with a 98% success rate, on average.4.41 max cut is stated: maximize
x

subject to δ(xx ) = 1 where [aij ] are real arc weights, and binary vector x = [xi ] ∈ Rn corresponds to the n nodes; specifically, node i ∈ S ⇔ xi = 1 node i ∈ S ′ ⇔ xi = −1 (818)

aij (1 1≤i<j≤n T

1 − xi x j ) 2

(817)

If nodes i and j have the same binary value xi and xj , then they belong to the same partition and contribute nothing to the cut. Arc (i , j) traverses the cut, otherwise, adding its weight aij to the cut. max cut statement (817) is the same as, for A = [aij ] ∈ S n maximize
x 1 4

subject to δ(xxT ) = 1 Because of Boolean assumption δ(xxT ) = 1

11T − xxT , A

(819)

11T − xxT , A = xxT , δ(A1) − A so problem (819) is the same as maximize
x 1 4

(820)

subject to δ(xxT ) = 1

xxT , δ(A1) − A

(821)

This max cut problem is combinatorial (nonconvex).
We term our solution to max cut fast because we sacrifice a little accuracy to achieve speed; id est, only about two or three convex iterations, achieved by heavily weighting a rank regularization term.
4.41

366

CHAPTER 4. SEMIDEFINITE PROGRAMMING

Because an estimate of upper bound to max cut is needed to ascertain convergence when vector x has large dimension, we digress to derive the dual problem: Directly from (821), its Lagrangian is [61, 5.1.5] (1451) L(x , ν) = = =
1 4 1 4 1 4

xxT , δ(A1) − A + ν , δ(xxT ) − 1 xxT , δ(A1) − A + δ(ν) , xxT − ν , 1 xxT , δ(A1 + 4ν) − A − ν , 1

(822)

where quadratic xT(δ(A1 + 4ν)−A)x has supremum 0 if δ(A1 + 4ν)−A is negative semidefinite, and has supremum ∞ otherwise. The finite supremum of dual function g(ν) = sup L(x , ν) =
x

− ν , 1 , A − δ(A1 + 4ν) ∞ otherwise

0

(823)

is chosen to be the objective of minimization to dual (convex) problem minimize
ν

subject to A − δ(A1 + 4ν)

−ν T 1

0

(824)

whose optimal value provides a least upper bound to max cut, but is not 1 tight ( 4 xxT , δ(A1)−A < g(ν) , duality gap is nonzero). [158] In fact, we find that the bound’s variance with problem instance is too large to be useful for this problem; thus ending our digression.4.42 To transform max cut to its convex equivalent, first define X = xxT ∈ S n then max cut (821) becomes maximize n
X∈ S 1 4

(829)

X , δ(A1) − A (825)

subject to δ(X) = 1 (X 0) rank X = 1
4.42

Taking the dual of dual problem (824) would provide (825) but without the rank constraint. [151] Dual of a dual of even a convex primal problem is not necessarily the same primal problem; although, optimal solution of one can be obtained from the other.

4.6. CARDINALITY AND RANK CONSTRAINT EXAMPLES whose rank constraint can be regularized as in maximize n
X∈ S 1 4

367

X , δ(A1) − A − w X , W

subject to δ(X) = 1 X 0

(826)

where w ≈ 1000 is a nonnegative fixed weight, and W is a direction matrix determined from
n

λ(X ⋆ )i = minimize n
i=2 W∈S

X ⋆, W W I tr W = n − 1

(1738a)

subject to 0

which has an optimal solution that is known in closed form. These two problems (826) and (1738a) are iterated until convergence as defined on page 307. Because convex problem statement (826) is so elegant, it is numerically solvable for large binary vectors within reasonable time.4.43 To test our convex iterative method, we compare an optimal convex result to an actual solution of the max cut problem found by performing a brute force combinatorial search of (821)4.44 for a tight upper bound. Search-time limits binary vector lengths to 24 bits (about five days CPU time). 98% accuracy, actually obtained, is independent of binary vector length (12, 13, 20, 24) when averaged over more than 231 problem instances including planar, randomized, and toroidal graphs.4.45 When failure occurred, large and small errors were manifest. That same 98% average accuracy is presumed maintained when binary vector length is further increased. A Matlab program is provided on Wıκımization.
We solved for a length-250 binary vector in only a few minutes and convex iterations on a 2006 vintage laptop Core 2 CPU (Intel T7400@2.16GHz, 666MHz FSB). 4.44 more computationally intensive than the proposed convex iteration by many orders of magnitude. Solving max cut by searching over all binary vectors of length 100, for example, would occupy a contemporary supercomputer for a million years. 4.45 Existence of a polynomial-time approximation to max cut with accuracy provably better than 94.11% would refute NP-hardness; which H˚ astad believes to be highly unlikely. [182, thm.8.2] [183]
4.43

368

CHAPTER 4. SEMIDEFINITE PROGRAMMING

4.6.0.0.12 Example. Cardinality/rank problem. d’Aspremont, El Ghaoui, Jordan, & Lanckriet [95] propose approximating a positive semidefinite matrix A ∈ SN by a rank-one matrix having constraint + on cardinality c : for 0 < c < N minimize
z

subject to card z ≤ c which, they explain, is a hard problem equivalent to maximize xTA x
x

A − zz T

F

(827)

subject to

√ λ x and where optimal solution x⋆ is a principal eigenvector where z (1732) ( A.5) of A and λ = x⋆TA x⋆ is the principal eigenvalue [160, p.331] when c is true cardinality of that eigenvector. This is principal component analysis with a cardinality constraint which controls solution sparsity. Define the matrix variable X xxT ∈ SN (829) whose desired rank is 1, and whose desired diagonal cardinality card δ(X) ≡ card x (830)

x =1 card x ≤ c

(828)

is equivalent to cardinality c of vector x . Then we can transform cardinality problem (828) to an equivalent problem in new variable X :4.46 maximize
X∈ SN

X, A X, I =1 (X 0) rank X = 1 card δ(X) ≤ c (831)

subject to

We transform problem (831) to an equivalent convex problem by introducing two direction matrices into regularization terms: W to achieve
A semidefiniteness constraint X 0 is not required, theoretically, because positive semidefiniteness of a rank-1 matrix is enforced by symmetry. (Theorem A.3.1.0.7)
4.46

4.6. CARDINALITY AND RANK CONSTRAINT EXAMPLES

369

desired cardinality card δ(X) , and Y to find an approximating rank-one matrix X : maximize
X∈ SN

X , A − w1 Y − w2 δ(X) , δ(W ) X, I =1 X 0

subject to

(832)

where w1 and w2 are positive scalars respectively weighting tr(XY ) and δ(X)T δ(W ) just enough to insure that they vanish to within some numerical precision, where direction matrix Y is an optimal solution to semidefinite program minimize X ⋆ , Y subject to 0 Y I tr Y = N − 1
Y ∈ SN

(833)

and where diagonal direction matrix W ∈ SN optimally solves linear program minimize 2
W = δ (W )

δ(X ⋆ ) , δ(W ) (834)

subject to 0 δ(W ) 1 tr W = N − c

Both direction matrix programs are derived from (1738a) whose analytical solution is known but is not necessarily unique. We emphasize (confer p.307): because this iteration (832) (833) (834) (initial Y, W = 0) is not a projection method ( 4.4.1.1), success relies on existence of matrices in the feasible set of (832) having desired rank and diagonal cardinality. In particular, the feasible set of convex problem (832) is a Fantope (91) whose extreme points constitute the set of all normalized rank-one matrices; among those are found rank-one matrices of any desired diagonal cardinality. Convex problem (832) is neither a relaxation of cardinality problem (828); instead, problem (832) becomes a convex equivalent to (828) at global convergence of iteration (832) (833) (834). Because the feasible set of convex problem (832) contains all normalized ( B.1) symmetric rank-one matrices of every nonzero diagonal cardinality, a constraint too low or high in cardinality c will not prevent solution. An optimal rank-one solution X ⋆ , whose diagonal cardinality is equal to cardinality of a principal eigenvector of matrix A , will produce the least residual Frobenius norm (to within machine noise processes) in the original problem statement (827).

370

CHAPTER 4. SEMIDEFINITE PROGRAMMING

phantom(256)

Figure 107: Shepp-Logan phantom from Matlab image processing toolbox.

4.6.0.0.13 Example. Compressive sampling of a phantom. In summer of 2004, Cand`s, Romberg, & Tao [70] and Donoho [119] e released papers on perfect signal reconstruction from samples that stand in violation of Shannon’s classical sampling theorem. These defiant signals are assumed sparse inherently or under some sparsifying affine transformation. Essentially, they proposed sparse sampling theorems asserting average sample rate independent of signal bandwidth and less than Shannon’s rate. Minimum sampling rate: of Ω-bandlimited signal: 2Ω (Shannon) (Cand`s/Donoho) e

of k-sparse length-n signal: k log2 (1+ n/k)

(Figure 100). Certainly, much was already known about nonuniform or random sampling [36] [209] and about subsampling or multirate systems [91] [360]. Vetterli, Marziliano, & Blu [369] had congealed a theory of noiseless signal reconstruction, in May 2001, from samples that violate the Shannon rate. [Wıκımization, Sampling Sparsity] They anticipated the sparsifying transform by recognizing: it is the innovation (onset) of functions constituting a (not necessarily bandlimited) signal that determines minimum sampling rate for perfect reconstruction. Average onset (sparsity) Vetterli et alii call rate of innovation. Vector inner-products that Cand`s/Donoho call samples or measurements, Vetterli calls projections. e

4.6. CARDINALITY AND RANK CONSTRAINT EXAMPLES

371

From those projections Vetterli demonstrates reconstruction (by digital signal processing and “root finding”) of a Dirac comb, the very same prototypical signal from which Cand`s probabilistically derives minimum e sampling rate [Compressive Sampling and Frontiers in Signal Processing, University of Minnesota, June 6, 2007]. Combining their terminology, we paraphrase a sparse sampling theorem: Minimum sampling rate, asserted by Cand`s/Donoho, is proportional e to Vetterli’s rate of innovation (a.k.a: information rate, degrees of freedom [ibidem June 5, 2007]). What distinguishes these researchers are their methods of reconstruction. Properties of the 1-norm were also well understood by June 2004 finding applications in deconvolution of linear systems [83], penalized linear regression (Lasso) [320], and basis pursuit [214]. But never before had there been a formalized and rigorous sense that perfect reconstruction were possible by convex optimization of 1-norm when information lost in a subsampling process became nonrecoverable by classical methods. Donoho named this discovery compressed sensing to describe a nonadaptive perfect reconstruction method by means of linear programming. By the time Cand`s’ e and Donoho’s landmark papers were finally published by IEEE in 2006, compressed sensing was old news that had spawned intense research which still persists; notably, from prominent members of the wavelet community. Reconstruction of the Shepp-Logan phantom (Figure 107), from a severely aliased image (Figure 109) obtained by Magnetic Resonance Imaging (MRI), was the impetus driving Cand`s’ quest for a sparse sampling e theorem. He realized that line segments appearing in the aliased image were regions of high total variation. There is great motivation, in the medical community, to apply compressed sensing to MRI because it translates to reduced scan-time which brings great technological and physiological benefits. MRI is now about 35 years old, beginning in 1973 with Nobel laureate Paul Lauterbur from Stony Brook USA. There has been much progress in MRI and compressed sensing since 2004, but there have also been indications of 1-norm abandonment (indigenous to reconstruction by compressed sensing) in favor of criteria closer to 0-norm because of a correspondingly smaller number of measurements required to accurately reconstruct a sparse signal:4.47
Efficient techniques continually emerge urging 1-norm criteria abandonment; [80] [359] [358, IID] e.g., five techniques for compressed sensing are compared in [37] demonstrating
4.47

372

CHAPTER 4. SEMIDEFINITE PROGRAMMING

5481 complex samples (22 radial lines, ≈ 256 complex samples per) were required in June 2004 to reconstruct a noiseless 256×256-pixel Shepp-Logan phantom by 1-norm minimization of an image-gradient integral estimate called total variation; id est, 8.4% subsampling of 65536 data. [70, 1.1] [69, 3.2] It was soon discovered that reconstruction of the Shepp-Logan phantom were possible with only 2521 complex samples (10 radial lines, Figure 108); 3.8% subsampled data input to a (nonconvex) 1 -norm 2 total-variation minimization. [76, IIIA] The closer to 0-norm, the fewer the samples required for perfect reconstruction. Passage of a few years witnessed an algorithmic speedup and dramatic reduction in minimum number of samples required for perfect reconstruction of the noiseless Shepp-Logan phantom. But minimization of total variation is ideally suited to recovery of any piecewise-constant image, like a phantom, because gradient of such images is highly sparse by design. There is no inherent characteristic of real-life MRI images that would make reasonable an expectation of sparse gradient. Sparsification of a discrete image-gradient tends to preserve edges. Then minimization of total variation seeks an image having fewest edges. There is no deeper theoretical foundation than that. When applied to human brain scan or angiogram, with as much as 20% of 256×256 Fourier samples, we have observed4.48 a 30dB image/reconstruction-error ratio4.49 barrier that seems impenetrable by the total-variation objective. Total-variation minimization has met with moderate success, in retrospect, only because some medical images are moderately piecewise-constant signals. One simply hopes a reconstruction, that is in some sense equal to a known subset of samples and whose gradient is most sparse, is that unique image we seek.4.50
that 1-norm performance limits for cardinality minimization can be reliably exceeded. 4.48 Experiments with real-life images were performed by Christine Law at Lucas Center for Imaging, Stanford University. 4.49 Noise considered here is due only to the reconstruction process itself; id est, noise in excess of that produced by the best reconstruction of an image from a complete set of samples in the sense of Shannon. At less than 30dB image/error, artifacts generally remain visible to the naked eye. We estimate about 50dB is required to eliminate noticeable distortion in a visual A/B comparison. 4.50 In vascular radiology, diagnoses are almost exclusively based on morphology of vessels and, in particular, presence of stenoses. There is a compelling argument for total-variation reconstruction of magnetic resonance angiogram because it helps isolate structures of particular interest.

4.6. CARDINALITY AND RANK CONSTRAINT EXAMPLES

373

The total-variation objective, operating on an image, is expressible as norm of a linear transformation (853). It is natural to ask whether there exist other sparsifying transforms that might break the real-life 30dB barrier (any sampling pattern @20% 256×256 data) in MRI. There has been much research into application of wavelets, discrete cosine transform (DCT), randomized orthogonal bases, splines, etcetera, but with suspiciously little focus on objective measures like image/error or illustration of difference images; the predominant basis of comparison instead being subjectively visual (Duensing & Huang, ISMRM Toronto 2008).4.51 Despite choice of transform, there seems yet to have been a breakthrough of the 30dB barrier. Application of compressed sensing to MRI, therefore, remains fertile in 2008 for continued research. Lagrangian form of compressed sensing in imaging We now repeat Cand`s’ image reconstruction experiment from 2004 which e led to discovery of sparse sampling theorems. [70, 1.2] But we achieve perfect reconstruction with an algorithm based on vanishing gradient of a compressed sensing problem’s Lagrangian, which is computationally efficient. Our contraction method (p.380) is fast also because matrix multiplications are replaced by fast Fourier transforms and number of constraints is cut in half by sampling symmetrically. Convex iteration for cardinality minimization ( 4.5) is incorporated which allows perfect reconstruction of a phantom at 4.1% subsampling rate; 50% Cand`s’ rate. By making neighboring-pixel e selection adaptive, convex iteration reduces discrete image-gradient sparsity of the Shepp-Logan phantom to 1.9% ; 33% lower than previously reported. We demonstrate application of discrete image-gradient sparsification to the n×n = 256×256 Shepp-Logan phantom, simulating idealized acquisition of MRI data by radial sampling in the Fourier domain (Figure 108).4.52
I have never calculated the PSNR of these reconstructed images [of Barbara]. −Jean-Luc Starck The sparsity of the image is the percentage of transform coefficients sufficient for diagnostic-quality reconstruction. Of course the term “diagnostic quality” is subjective. . . . I have yet to see an “objective” measure of image quality. Difference images, in my experience, definitely do not tell the whole story. Often I would show people some of my results and get mixed responses, but when I add artificial Gaussian noise to an image, often people say that it looks better. −Michael Lustig 4.52 k-space is conventional acquisition terminology indicating domain of the continuous raw data provided by an MRI machine. An image is reconstructed by inverse discrete Fourier transform of that data interpolated on a Cartesian grid in two dimensions.
4.51

. . .31 we have a vectorized two-dimensional DFT via Kronecker product ⊗ vec F(U) (F ⊗F ) vec U (839) and from (838) its inverse [167.  . . . . like DFT matrix F . . −(n−1)2π/n −(n−1)4π/n −(n−1)6π/n 1 e e e  ··· 1 · · · e−(n−1)2π/n   · · · e−(n−1)4π/n  1 √ ∈ C n×n · · · e−(n−1)6π/n  n  . define a circulant [169] symmetric permutation matrix4.53 Θ 4.. 2 2π/n · · · e−(n−1) (835) a symmetric (nonHermitian) unitary matrix characterized F = FT F −1 = F H (836) Denoting an unknown image U ∈ Rn×n .24] vec U = (F H ⊗ F H )(F ⊗F ) vec U = (F HF ⊗ F HF ) vec U (840) Idealized radial sampling in the Fourier domain can be simulated by Hadamard product ◦ with a binary mask Φ ∈ Rn×n whose nonzero entries could.1 no.374 CHAPTER 4. correspond with the radial line segments in Figure 108. . .53 0 I I 0 ∈ Sn (841) Matlab fftshift() . .1. . . . . for example. SEMIDEFINITE PROGRAMMING Define a Nyquist-centric discrete Fourier transform (DFT) matrix  F        1 1 1 1 − 2π/n − 4π/n − 6π/n 1 e e e − 4π/n − 8π/n 1 e e e−12π/n 1 e− 6π/n e−12π/n e−18π/n . To make the mask Nyquist-centric. its two-dimensional discrete Fourier transform F is F(U) F UF (837) hence the inverse discrete transform U = F H F(U)F H (838) From A. p.

in DC-centric Fourier domain. there are actually twice the number of equality constraints as there are measurements. representing 4. We can cut that number of constraints in half via vertical and horizontal mask Φ symmetry which forces the imaginary inverse transform to 0 : The inverse subsampled transform in matrix form is F H (ΘΦΘ ◦ F UF )F H = F HKF H and in vector form (F H ⊗ F H )δ(vec ΘΦΘ)(F ⊗F ) vec U = (F H ⊗ F H ) vec K (845) (844) .1% (10 lines) subsampled data. Only half of these complex samples. MRI acquisition time is proportional to number of lines. CARDINALITY AND RANK CONSTRAINT EXAMPLES 375 Φ Figure 108: MRI radial sampling pattern. Then given subsampled Fourier domain (MRI k-space) measurements in incomplete K ∈ C n×n .6. need be acquired for a real image because of conjugate symmetry. Due to MRI machine imperfections. we might constrain F(U) thus: ΘΦΘ ◦ F UF = K and in vector form. samples are generally taken over full extent of each radial line segment. in any halfspace about the origin in theory.4. (42) (1826) δ(vec ΘΦΘ)(F ⊗F ) vec U = vec K (843) (842) Because measurements K are complex.

.376 later abbreviated CHAPTER 4. P is a projection matrix.. 4. . id est. p.     ..54 where Ξ ∈ S n−1 is the order-reversing permutation matrix (1766).   1 0   0T −1 1 (851) (847) is a diagonalization of matrix P whose binary eigenvalues are δ(vec ΘΦΘ) while the corresponding eigenvectors constitute the columns of unitary matrix F H ⊗ F H . when for positive even integer n Φ= Φ11 Φ(1 . 2 Define   1 0 0   1 0  −1    .54 P is an orthogonal projector. In words.... 2 : n)Ξ ΞΦ(2 : n . this necessary and sufficient condition on Φ (for a real inverse subsampled transform [286.   −1 1   ∆  (850)  ∈ Rn×n .24] P = (F H ⊗ F H )δ(vec ΘΦΘ)(F ⊗F ) = (F ⊗F )H δ(vec ΘΦΘ)(F H ⊗ F H )H = P H (848) 4. SEMIDEFINITE PROGRAMMING P vec U = f where P (F H ⊗ F H )δ(vec ΘΦΘ)(F ⊗F ) ∈ C n 2 × n2 (846) (847) Because of idempotence P = P 2 .... 1) ΞΦ(2 : n .55 This condition on Φ applies to both DC. P vec U is real when P is real..55 about column n +1. 2 : n)Ξ ∈ Rn×n (849) Express an image-gradient estimate   U∆  U ∆T  4n×n   ∇U  ∆U ∈ R ∆T U 4.and Nyquist-centric DFT matrices. .53]) demands vertical symmetry about row n +1 and 2 horizontal symmetry4. Because of its Hermitian symmetry [167. p. .

CARDINALITY AND RANK CONSTRAINT EXAMPLES 377 vec−1 f Figure 109: Aliasing of Shepp-Logan phantom in Figure 107 resulting from k-space subsampling pattern in Figure 108.6.31.4. given only U 0 = vec−1 f .1.4. Improvement is due to centering.1 no. left. We find small improvement on real-life images.56 By A. its vectorization: for 2 2 Ψi ∈ Rn × n  ∆T ⊗ I  ∆⊗I   vec ∇U =   I ⊗ ∆ vec U I ⊗ ∆T  2 2  Ψ1  ΨT   1 vec U  Ψ2  ΨT 2  Ψ vec U ∈ R4n 2 (852) where Ψ ∈ R4n ×n .56 subject to P vec U = f Ψ vec U 1 (853) There is significant improvement in reconstruction quality by augmentation of a nominally two-point discrete image-gradient estimate to four points per pixel by inclusion of two polar directions. It is remarkable that the phantom can be reconstructed. symmetry of discrete differences about a central pixel. by convex iteration. that is a simple first-order difference of neighboring pixels (Figure 110) to the right. by further augmentation with diagonally adjacent pixel differences. may be concisely posed minimize U 4. . that is known suboptimal [208] [71]. and below. This image is real because binary mask Φ is vertically and horizontally symmetric. above. A total-variation minimization for reconstructing MRI image U . ≈ 1dB empirically.

the best commercial solvers fail simply because computer numerics become nonsense. The obstacle is.. execution time.378 where CHAPTER 4. 1} such that minimize U subject to P vec U = f Ψ vec U 0 ≡ 1 minimize |Ψ vec U| .2. Figure 109). SEMIDEFINITE PROGRAMMING f = (F H ⊗ F H ) vec K ∈ C n 2 (854) is the known inverse subsampled Fourier data (a vectorized aliased image. y + 1 λ P vec U − f 2 U |Ψ vec U| . We 2 4n2 introduce a direction vector y ∈ R+ as part of a convex iteration ( 4. there exists a vector y ⋆ with entries yi ∈ {0. memory.1. Because (855b) is an unconstrained convex problem. ( D. its numerical solution is beyond the capability of even the most highly regarded of contemporary commercial solvers. y ⋆ + 2 λ P vec U − f 2 2 U (856) Existence of such a y ⋆ .3) to overcome that known suboptimal minimization of discrete image-gradient ⋆ cardinality: id est. and where a norm of discrete image-gradient ∇U is equivalently expressed as norm of a linear transformation Ψ vec U . Although this simple problem statement (853) is equivalent to a linear program ( 3. a zero objective function gradient is necessary and sufficient for optimality ( 2. numerical errors enter significant digits and the algorithm exits prematurely. complementary to an optimal vector Ψ vec U ⋆ .1).2).2.57 Our only recourse is to recast the problem in Lagrangian form ( 3.13. loops indefinitely. 4.7. id est. Even when all dependent equality constraints are manually removed.4. or produces an infeasible solution. inadequate numerical precision.3).g. is obvious by definition of global optimality |Ψ vec U ⋆ | . in fact. y (a) (855) 2 2 (b) where multiobjective parameter λ ∈ R+ is quite large (λ ≈1E8) so as to enforce the equality constraint: P vec U −f = 0 ⇔ P vec U −f 2 = 0 ( A.5. id est. Obstacle to numerical solution is not a computer resource: e.57 .2) and write customized code to solve it: minimize U subject to P vec U = f ≡ minimize |Ψ vec U| .1) for images as small as 128×128 pixels.2. y ⋆ = 0 (757) under which a cardinality-c optimal objective Ψ vec U ⋆ 0 is assumed to exist.

Continuous image-gradient from two pixels holds only in a limit. By trading execution time and treating discrete image-gradient cardinality as a known quantity for this phantom. we may ignore the limiting operation. Speaking more analytically.16GHz. Implementation selects adaptively from darkest four • about central. For discrete differences.1% subsampled data (10 radial lines in k-space). introduction of ǫ serves to uniquely define the objective’s gradient everywhere in the function domain. With this setting of ǫ . Then the We are looking for at least 50dB image/error ratio from only 4. we actually attain in excess of 100dB from a simple Matlab program in about a minute on a 2006 vintage laptop Core 2 CPU (Intel T7400@2.4. better practical estimates are obtained when centered. this is equivalent to lim ΨT δ(y)δ(|Ψ vec U| + ǫ1)−1 Ψ + λP vec U = λP f ǫ→0 (858) where small positive constant ǫ ∈ R+ has been introduced for invertibility. it transforms absolute value in (855b) from a function differentiable almost everywhere into a differentiable function.58 (ǫ ≈1E-3). 666MHz FSB). When small enough for practical purposes4. ΨT δ(y) sgn(Ψ vec U ) + λP H (P vec U − f ) = 0 (857) Because of P idempotence and Hermitian symmetry and sgn(x) = x/|x| . id est. CARDINALITY AND RANK CONSTRAINT EXAMPLES 379 Figure 110: Neighboring-pixel stencil [359] for image-gradient estimation on Cartesian grid. over 160dB is achievable. An example of such a transformation in one dimension is illustrated in Figure 111.6.58 . 4.

4.4.59 Observe that P (847). is not a fat matrix. 1] from Figure 68b superimposed upon integral of its derivative at ǫ = 0.60 Conjugate gradient method requires positive definiteness.05 which smooths objective function. a two-dimensional fast Fourier transform of U is computed followed by masking with ΘΦΘ and then an inverse fast Fourier transform. matrix P is square but never actually formed during computation. . together with contraction method of solution. so instead we apply the conjugate gradient method of solution to ΨT δ(y)δ(|Ψ vec U t | + ǫ1)−1 Ψ + λP vec U t+1 = λP f (860) which is linear in U t+1 at each recursion in the Matlab program. mapping.3.380 CHAPTER 4. p..2] Fat is typical of compressed sensing problems. p.8. for 0 y 1 −1 vec U t+1 = ΨT δ(y)δ(|Ψ vec U t | + ǫ1)−1 Ψ + λP λP f (859) is a contraction in U t that can be solved recursively in t for its unique fixed point. SEMIDEFINITE PROGRAMMING x y dy −1 |y|+ǫ |x| Figure 111: Real absolute value function f2 (x) = |x| on x ∈ [−1. e. [228. in the equality constraint from problem (855a).155] Calculating this inversion directly is not possible for large matrices on contemporary computers because of numerical precision. This technique significantly reduces memory requirements and.g.300] [204. is the principal reason for relatively fast computation.4. [153.59 4. [69] [76]. Rather. 4. id est.60 Although number of Fourier samples taken is equal to the number of nonzero entries in binary mask Φ . until U t+1 → U t .

[358. is known in closed form: it has 1 in each entry corresponding to the 4n2 − c smallest entries of |Ψ vec U ⋆ | and has 0 elsewhere (page 334).g.6. Once U ⋆ is found. e. vector y is updated according to an estimate of discrete image-gradient 2 cardinality c : Sum of the 4n2 − c smallest entries of |Ψ vec U ⋆ | ∈ R4n is the optimal objective value from a linear program. cardinality minimization of a discrete image-gradient. for 0 ≤ c ≤ 4n2 − 1 (524) 4n2 i=c+1 π(|Ψ vec U ⋆ |)i = minimize 2 y∈ R4n |Ψ vec U ⋆ | . that is an extreme point of its feasible set.. number of points constituting an estimate is directly determined by direction vector y . direction vector y is updated again. There are two features that distinguish problem formulation (855b) and our particular implementation of it [Matlab on Wıκımization]: 1) An image-gradient estimate may engage any combination of four adjacent pixels. Direction vector y is initialized to 1 until the first fixed point is found. the contraction recursion begins calculating a (1-norm) solution U ⋆ to (853) via problem (855b). which means. In other words.61 Indeed. An optimal solution y to (861). Updated image U ⋆ is assigned to U t . in this case. It is an artifact of the convex iteration method for minimal cardinality solution. the contraction is recomputed solving (855b). IIB]. meaning. This adaptive gradient was not contrived. discrete image-gradient sparsity is actually closer to 1. y subject to 0 y 1 y T 1 = 4n2 − c (861) where π is the nonlinear permutation-operator sorting its vector argument into nonincreasing order.4. we find only c = 5092 zero entries in y ⋆ for the Shepp-Logan phantom. and so on until convergence which is guaranteed by virtue of a monotonically nonincreasing real sequence of objective values in (855a) and (861). 4.61 .4.9% than the 3% reported elsewhere. CARDINALITY AND RANK CONSTRAINT EXAMPLES 381 convex iteration By convex iteration we mean alternation of solution to (855a) and (861) until convergence. the algorithm is not locked into a four-point gradient estimate (Figure 110).

introduced in London by Christopher Walter Monckton. meaning.62 can yield the sum of smallest entries in |Ψ vec U ⋆ |. Game-rules state that a °2M prize would be awarded to the first person who completely solves the puzzle before December 31. Contraction operator. The full game comprises M = 256 square pieces and a 16×16 gridded Simultaneous optimization of these two variables U and y should never be a pinnacle of aspiration. Called Eternity II. well below an 11% least lower bound predicted by the sparse sampling theorem. SEMIDEFINITE PROGRAMMING 2) Numerical precision of the fixed point of contraction (859) (≈1E-2 for perfect reconstruction @−103dB error) is a parameter to the implementation. for then. there is apparently vast distance to a solution because no complete solution was found in 2009. 4. playable by children. 4. but the prize remains unclaimed after the deadline.0.63 No prize was awarded for 2009 and 2010. its name derives from time that would pass while trying all allowable tilings of puzzle pieces before obtaining a complete solution. Perfect reconstruction of the Shepp-Logan phantom (at 103dB image/error) is achieved in a Matlab minute with 4. Because reconstruction approaches optimal solution to a 0-norm problem.0.62 . 4.6.9%.6.63 That score means all but a few of the 256 pieces had been placed successfully (including the mandatory piece). 2010.1% subsampled data (2671 complex samples). optimal y might not attain a boundary point. A tessellation puzzle game.382 CHAPTER 4.14 Exercise. By the end of 2008. commenced world-wide in July 2007. a complete solution had not yet been found although a °10. Determine conditions on λ and ǫ under which (859) is a contraction and ΨT δ(y)δ(|Ψ vec U t | + ǫ1)−1 Ψ + λP from (860) is positive definite.0.15 Example.0. 3rd Viscount Monckton of Brenchley.4. Impact of this idiosyncrasy tends toward simultaneous optimization in variables U and y while insuring y settles on a boundary point of its feasible set (nonnegative hypercube slice) in (861) at every iteration. Although distance between 467 to 480 is relatively small. minimum number of Fourier-domain samples is bounded below by cardinality of discrete image-gradient at 1.000 USD prize was awarded for a high √ √ score 467 (out of 480 = 2 M ( M − 1)) obtained by heuristic methods. Eternity II. direction vector y is updated after contraction begins but prior to its culmination. 4. for only a boundary point4.

(a) Shown are all of the 16 puzzle pieces (indexed as in the tableau alongside) from a scaled-down computerized demonstration-version on the TOMY website. whose border is lightly outlined. except the grey board-boundary. There is no mandatory piece placement as for the full game.6. was placed last in this realization. removed. indexed 1 through 4 . grey corresponds to 0. CARDINALITY AND RANK CONSTRAINT EXAMPLES 1 5 9 383 13 (a) pieces 2 6 10 14 3 7 11 15 4 8 12 16 13 4 16 5 (b) one solution 3 2 10 11 12 14 6 8 1 15 7 9 e1 (c) colors e2 e3 e4 Figure 112: Eternity II is a board game in the puzzle genre.4. (b) Illustrated is one complete solution to this puzzle whose solution is not unique. Pieces may be moved. and rotated at random on a 4×4 board. Puzzle pieces are square and triangularly partitioned into four colors (with associated symbols). Solution time for a human is typically on the order of a minute. (c) This puzzle has four colors. The piece. .

Differences between the full game (Figure 113) and scaled-down game are the number of edge-colors L (22 versus 4. in any order face-up on the square board. 6) The board must be tiled completely (covered ). The object of the game is to completely tile the board with pieces whose touching edges have identical color. which are not necessarily the same per piece or from piece to piece. id est. Even so. The scaled-down game has four distinct edge-colors. 2) Only one piece may occupy any particular cell on the board. A scaled-down demonstration version of the game is illustrated in Figure 112. whose coding is illustrated in Figure 112c. and a single mandatory piece placement interior to the board in the full game. different pieces may or may not have some edge-colors in common. SEMIDEFINITE PROGRAMMING board (Figure 113) whose complete tessellation is considered NP-hard. ignoring solid grey). retile. 5) One mandatory piece (numbered 139 in the full game) must have a predetermined orientation in a predetermined cell on the board. There are L = 4 distinct edge-colors and M = 16 pieces and dimension √ √ M × M = 4×4 for the scaled-down demonstration board. as demonstrated by Yannick Kirschhoffer. The boundary of the board must be colored grey. There are L = 22 distinct edge-colors plus a solid grey. 3) All adjacent pieces must match in color (and symbol) at their touching edges. full-game rules 1) Any puzzle piece may be rotated face-up in quadrature and placed or replaced on the square board. indexed 1 through 256. Pieces are immutable in the sense that each is characterized by 4 colors (and their uniquely associated symbols).64 . and rotate pieces. number of pieces M (256 versus 16).384 CHAPTER 4. plus a solid grey. There is a steep rise in level of difficulty going to a 15×15 board. L = 22 distinct edge-colors and number of puzzle pieces M = 256 and √ √ board-dimension M × M = 16×16 for the full game.4. 4) Solid grey edges must appear all along the board’s boundary.64 [346] [107] A player may tile. 4. one at each edge. combinatorial-intensity brute-force backtracking methods can solve similar puzzles in minutes given M = 196 pieces on a 14×14 test board.

2. one point representing board coordinates and color per edge.6. they are hard-pressed to find a solution for variable matrices of dimension as small as 100. facilitates piece placement within unit-square cell.65 in EDM 4M . ( 5.0. rotation occurs when colors are aligned with a neighboring piece. Because distance information provides for reconstruction of point position to within an isometry ( 5.65 .4.5. then this EDM would have a block-diagonal structure of known entries. CARDINALITY AND RANK CONSTRAINT EXAMPLES 1 242 3 244 5 246 7 248 9 121 250 11 252 13 254 15 256 385 Figure 113: Eternity II full-game board (16×16. ( 6. 4.66 Convex constraints can be devised to prevent puzzle-piece reflection and to quantize rotation such that piece-edges stay aligned with the board boundary. Since all interpoint distances per piece are known.7) Were edge-points ordered sequentially with piece number.1) But manipulating such a large EDM is too numerically difficult for contemporary general-purpose semidefinite program solvers which incorporate interior-point methods. then Euclidean distance geometry would be suitable for solving this puzzle. not actual size).5). Our challenge.4. Grid Euclidean distance intractability If each square puzzle piece were characterized by four points in quadrature. indeed. 4.66 Translation occurs when a piece moves on the board in Figure 113. piece translation and rotation are isometric transformations that abide by rules of the game. one piece per cell. this game may be regarded as a Euclidean distance matrix completion problem4.

and then relax a very hard combinatorial problem: All pieces are initially placed. Each square piece is characterized by four colors. . M (862) In other words. assign an index i representing a unique piece-number. a loose upper bound on number of combinations is M ! 4M = 256! 4256 . i = 1 . in order of their given index. 4} of Pi are ordered ij counterclockwise as in Figure 114. from a set of M pieces {Pi . SEMIDEFINITE PROGRAMMING p62 p63 p64 p61 P6 = [ p61 p62 p63 p64 ]T ∈ R4×L Figure 114: Demo-game piece P6 illustrating edge-color • p6j ∈ RL counterclockwise ordering in j beginning from right. is to express this game’s rules as constraints in a convex and numerically tractable way so as to find one solution from a googol of possible combinations. Then matrix Pi describes the i th piece. L} identifying a standard basis vector eℓ ∈ RL (Figure 112c) that becomes a vector pij ∈ {e1 . each distinct nongrey color is assigned a unique corresponding index ℓ ∈ {1 . . . Color data is given in Figure 115 for the demonstration board. as in There exists at least one solution. . as in Figure 115.4. Ignoring boundary constraints and the single mandatory piece placement in the full game. . eL . Each color pij ∈ RL is represented by eℓ ∈ RL an L-dimensional standard basis vector or 0 if grey.386 CHAPTER 4. corresponding to its four edges. . one matrix per piece Pi [ pi1 pi2 pi3 pi4 ]T ∈ R4×L . . in quadrature. Our strategy to solve Eternity II is to first vectorize the board. M } . therefore. . Rows {pT .67 . i=1 . j = 1 . excepting its orientation and location on the board. 0} ⊂ RL constituting matrix Pi representing a particular piece. their exact number is unknown although Monckton insists they number in thousands. That number gets loosened: 150638!/(256!(150638−256)!) after presolving Eternity II (883). . with respect to the whole pieces. . These four edge-colors are represented in a 4×L-dimensional matrix.67 permutation polyhedron strategy To each puzzle piece. Then the vectorized game-board has initial state. 4.

6. CARDINALITY AND RANK CONSTRAINT EXAMPLES 387 P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12 P13 P14 P15 P16 [ e3 [ e2 [ e2 [ e4 [0 [ e2 [ e2 [ e4 [0 [ e2 [ e2 [ e4 [0 [ e2 [ e2 [ e4 0 e4 e1 e1 0 e2 e3 e3 e3 e2 e3 e1 e1 e2 e1 e3 0 e4 0 0 e3 e4 0 0 e3 e4 0 0 e1 e4 0 0 e1 ]T e4 ]T e1 ]T e1 ]T e1 ]T e2 ]T e3 ]T e3 ]T 0 ]T e4 ]T e1 ]T e3 ]T 0 ]T e4 ]T e3 ]T e1 ]T Figure 115: Vectorized demo-game board has M = 16 matrices in R4×L describing initial state of game pieces. 4 colors per puzzle-piece (Figure 114). standard basis vectors in RL represent color.4. . So color difference measurement remains unweighted. L = 4 colors total in game (Figure 112c).

388 CHAPTER 4. √ √ comprising numeric color differences between 2 M ( M − 1) touching edges. PM (863) and a rotation Πi ∈ R4×4 of each individual piece (4M ) that solve the puzzle. 4. π4 } ⊂ R4×4 correspond to clockwise piece-rotations {0◦ . Rules of the game dictate that adjacent pieces on the square board have colors that match at their touching edges as in Figure 112b.  4M ×L P  . is doubly combinatorial (M ! 4M combinations) because we must find a permutation of pieces (M !) Ξ ∈ RM ×M (864) Figure 115. ∈ R ..68  4M ×4M ∈ R Piece adjacencies on the square board map linearly to the vectorized board. This permutation problem. SEMIDEFINITE PROGRAMMING Moving pieces about the square board all at once corresponds to permuting pieces Pi on the vectorized board represented by matrix P . as stated.  11 0 0 0 0 0 1 1 0 0 0  0 0 11 . 180◦ . 270◦ }. (Ξ ⊗ I4 )ΠP = (Ξ ⊗ I4 ) .  ∈ R4M ×L .68 A complete match is therefore equivalent to demanding that a constraint. Circulant [169] permutation matrices {π1 . 90◦ .  00 1 1 0 .4.  Π1 P1 . π2 . while rotating the i th piece is equivalent to circularly shifting row indices of Pi (rowwise permutation). ΠM 1 0 0 0  0 1 0 0  0 0 00 . π3 . π4 } 0 0 0 0 1 0 0 1 (866) (867)  1    0  0    0 Initial game-board state P (863) corresponds to Ξ = I and Πi = π1 = I ∀ i . π2 .. . represented within a matrix   P1  .  00 0 0 (865) where Πi ∈ {π1 . π3 . of course. ΠM PM    1   0  0   0 Π 0 1 0 0    0 0 1 0 Π1 0  0 0 00 .

∆  (868) . line segments indicate differences ∆ . 0.6. locations of adjacent edges in R4M ×L are known a priori. 1} . Defining sparse constant fat matrix   ∆T 1 √ √ . then the desired constraint is (869) Boundary of the square board must be colored grey. Entries are indices ℓ to standard basis vectors eℓ ∈ RL from Figure 115. . Because the vectorized board layout is fixed and its cells are loaded or reloaded with pieces during play. These can all be lumped into one equality constraint β T(Ξ ⊗ I4 )ΠP 1 = 0 (870) . 2 M ( M − 1)}. vanish. .4. CARDINALITY AND RANK CONSTRAINT EXAMPLES 389 Figure 116: indicate boundary β . We need simply form differences between colors from adjacent edges of pieces loaded into those known locations (Figure 116). ∈ R ∆T√M (√M −1) 2 ∆(Ξ ⊗ I4 )ΠP = 0 ∈ R2 √ √ M ( M −1)×L whose entries belong to {−1.   2 M ( M −1)×4M . Each difference √ be represented by a constant vector from a √ may 4M set {∆i ∈ R . This means there √ are 4 M boundary locations in R4M ×L that must have value 0T . i = 1 .

Any M = 16-piece solution may be verified on the TOMY website. . 1} complementary to the known 4 M zeros (Figure 116). . π4 . π3 . 1} because (866) ij this square matrix becomes a structured permutation matrix replacing the product of permutation matrices. . Now we ask what are necessary conditions on Φ⋆ at optimality: . (872) . where β ∈ R4M is a sparse √ constant vector having entries in {0. φM 1 · · · φM M φ⋆ ∈ {π1 . Each four-point cluster represents a circulant permutation matrix from (866)..390 0 CHAPTER 4. By combining variables: Φ (Ξ ⊗ I4 )Π ∈ R4M ×4M (871) where Φ⋆ ∈ {0. SEMIDEFINITE PROGRAMMING 10 20 Φ⋆ M = 16 30 40 50 60 0 10 20 30 40 50 60 Figure 117: Sparsity pattern for composite permutation matrix Φ⋆ ∈ R4M ×4M representing solution from Figure 112b.  Φ  .. π2 . Partition the composite variable Φ into blocks   φ11 · · · φ1M  .  ∈ R4M ×4M . 0} ⊂ R4×4 ij (873) An optimal composite permutation matrix Φ⋆ is represented pictorially in Figure 117.

4. Sφ 0 0 1 0 0 0 0 1 ∈ R2×4 (876) These matrices R and S enforce circulance. that piece numbered 139 by the game designer must be placed in cell 121 on the vectorized board (Figure 113) with orientation π3 (p. Corner pair of 2×2 submatrices on antidiagonal of each and every 4×4 block φ⋆ are equal. 4. Each column has one 1.70 Since 0 is the trivial circulant matrix. Constraints Φ1 = 1 meaning. application is democratic over all blocks φij . certifies that the puzzle has been solved.69 and where     0 1 1 0  ∈ R3×4  ∈ R3×4 . Sd  0 1 1 0 (875) Rd  0 1 1 0 Rφ 1 0 0 0 0 1 0 0 ∈ R2×4 .70 Mandatory piece placement in the full game requires the equality constraint in π3 . in terms of this Φ partitioning (872).6. if attained. ij We want an objective function whose global optimum. 391 Entries along each and every diagonal of each and every 4×4 block φ⋆ ij are equal. CARDINALITY AND RANK CONSTRAINT EXAMPLES 4M -sparsity and nonnegativity.4.69 . the Eternity II problem is a minimization of cardinality Φ∈ R4M ×4M minimize vec Φ 0 subject to ∆ΦP = 0 β T ΦP 1 = 0 Φ1 = 1 ΦT 1 = 1 T T (I ⊗ Rd )Φ(I ⊗ Rd ) = (I ⊗ Sd )Φ(I ⊗ Sd ) T T (I ⊗ Rφ )Φ(I ⊗ Sφ ) = (I ⊗ Sφ )Φ(I ⊗ Rφ ) T (e121 ⊗ I4 ) Φ(e139 ⊗ I4 ) = π3 Φ≥0 (874) which is convex in the constraints where e121 . Each row has one 1.388). Then. 4.4. e139 ∈ RM are members of the standard basis representing mandatory piece placement in the full game.

It is critical. ( 2.2. id est.0. A compressed sensing paradigm [70] is not inherent here.1.0.22].6. a vertex of the permutation polyhedron persists in the feasible set.3. to add enough random noise to the direction vector so as to insure a vertex solution [94. choosing a direction vector randomly will find an optimal solution in only a Vertex means zero-dimensional exposed face ( 2.593 × 1. p. SEMIDEFINITE PROGRAMMING and ΦT 1 = 1 confine nonnegative Φ to the permutation polyhedron (100) in R4M ×4M .1).4. Any vertex in the permutation polyhedron. a direction vector is introduced for cardinality minimization as in 4. such a solution is known to exist. dimension of E is 864.5. problem (874) is equivalent to Φ∈ R4M ×4M minimize vec Φ 0 subject to (P T ⊗ ∆) vec Φ = 0 (P 1 ⊗ β)T vec Φ = 0 (1T ⊗ I4M ) vec Φ = 14M 4M (I4M ⊗ 1T ) vec Φ = 14M 4M (I ⊗ Rd ⊗ I ⊗ Rd − I ⊗ Sd ⊗ I ⊗ Sd ) vec Φ = 0 (I ⊗ Sφ ⊗ I ⊗ Rφ − I ⊗ Rφ ⊗ I ⊗ Sφ ) vec Φ = 0 (e139 ⊗ I4 ⊗ e121 ⊗ I4 )T vec Φ = vec π3 Φ≥0 This problem is abbreviated: Φ∈ R4M ×4M (877) minimize vec Φ 0 subject to E vec Φ = τ Φ≥0 √ √ 2 2 (878) where E ∈ R2L M ( M −1)+8M +13M +17×16M is sparse and optimal objective value is 4M . has minimal cardinality.048. There can be no further intersection with a feasible affine subset that would enlarge that face. To solve this by linear programming.392 CHAPTER 4. in this case.71 In the vectorized variable. 4.71 . so it must also be a vertex of the intersection.576. For the demonstration game. intersection with a strictly supporting hyperplane. The feasible set of problem (874) is an intersection of the permutation polyhedron with a large number of hyperplanes. in fact. which is the convex hull of permutation matrices.4) The intersection contains a vertex of the permutation polyhedron because a solution Φ⋆ cannot otherwise be a permutation matrix.

A randomized direction vector also works for Clue Puzzles provided by the toy maker: similar 6×6 and 6×12 puzzles whose solution each provide a clue to solution of the full game. 1}.6.593 is too large.4. Eternity II (877) is thus equivalently transformed ˜ minimize vec Φ 0 ˜ subject to (P T ⊗ ∆)Y vec Φ = 0 T ˜ (P 1 ⊗ β) Y vec Φ = 0 T T ˜ (1 ⊗ I ⊗ 14 ) vec Φ = 1 ˜ (I ⊗ 1T ) vec Φ = 1 4M ˜ (e139 ⊗ e1 ⊗ e121 ⊗ I4 )T Y vec Φ = π3 e1 ˜ Φ≥0 4. Then. 1) Φ(: . zeroed. we may take as variable every fourth column of Φ : ˜ Φ [ Φ(: .40 ˜ vec Φ = (I ⊗ e1 ⊗ I4M + I ⊗ e2 ⊗ I ⊗ π4 + I ⊗ e3 ⊗ I ⊗ π3 + I ⊗ e4 ⊗ I ⊗ π2 ) vec Φ 2 ˜ Y vec Φ ∈ R16M (881) where Y ∈ R16M × 4M . numerical errors prevent solution of (878).4. in case (883)). Even better is a nonnegative uniformly distributed randomized direction vector having 4M entries (M entries.73 So again. 9) · · · Φ(: . A simplex-method solver is required for numerical solution.508 constraints by eliminating linearly dependent rows of matrix E . 4M − 3) ] ∈ R4M ×M (879) ˜ where Φij ∈ {0.1 no. 5) Φ(: . corresponding to the largest entries of Φ⋆ .4.72 But for the full game.1.72 2 2 ˜ Φ∈ R4M ×M (882) This can only mean: there are many optimal solutions. for ei ∈ R4 ˜ ˜ ˜ ˜ Φ = (Φ⊗eT )+(I ⊗π4 )(Φ⊗eT )+(I ⊗π3 )(Φ⊗eT )+(I ⊗π2 )(Φ⊗eT ) ∈ R4M ×4M 1 2 3 4 (880) From A. interior-point methods will not work. number of equality constraints 864. 4 Constraints in R and S (which are most numerous) may be dropped because circulance of φij is built into Φ-reconstruction formula (880). we reformulate the problem: canonical Eternity II Because each block φij of Φ (872) is optimally circulant having only four degrees of freedom (873). 4. CARDINALITY AND RANK CONSTRAINT EXAMPLES 393 few iterations.31 and no. but that reduction is not enough to overcome numerical issues with the best contemporary linear program solvers. .73 Saunders’ program lusol can reduce that number to 797. Permutation polyhedron (100) now demands that ˜ ˜ each consecutive quadruple of adjacent rows of Φ sum to 1 : (I ⊗ 1T )Φ1 = 1.

especially disappointing insofar ˜ as sparsity of E is high with only 0. ( 2.2.4) We may apply techniques from 2.13.2) Even more intriguing is the observation: vector τ resides on that polyhedral cone’s ˜ boundary. For the full game. The ˜ zero sum means that all vec Φ entries.394 CHAPTER 4.3 to prune generators not belonging to the smallest face of that cone. After zero-row ˜ and dependent row removal. 0.144. 1.1 no. But this dimension remains out of reach of most highly regarded contemporary academic and commercial linear program solvers because of numerical failure. 1}.12. to which τ belongs.39) and e121 . which zeroes the board at its edges.4. Number of equality constraints in abbreviation of reformulation (882) minimize ˜ Φ∈ R4M ×M ˜ vec Φ 0 ˜ ˜ subject to E vec Φ = τ ˜ ˜≥0 Φ (883) is now 11.07% nonzero entries ∈ {−1. i) of matrix E Column elimination can be quite dramatic but is dependent upon problem geometry.74 .144. element {2} arising only in the β constraint which is soon to disappear after presolving.2. By method of convex cones. a total 4. this means we may immediately eliminate 57. ˜ Number of columns in matrix E has been reduced from a million to 262. e139 ∈ R . must be zero.144 . dimension of E goes to 10. 0.077 × 262. corresponding to nonzero entries in T row vector (P 1 ⊗ β) Y . we discard 53.054 × 204. 2} .840 variables from 262. because ˜ generators of that smallest face must hold a minimal cardinality solution. has all positive coefficients.4.077 (an order of magnitude fewer constraints than (878)) where √ √ 2 ˜ sparse E ∈ R2L M ( M −1)+2M +5×4M replaces the E matrix from (878). SEMIDEFINITE PROGRAMMING ˜ whose optimal objective value is M with Φ⋆ -entries in {0. ˜ ˜ Matrix dimension is thereby reduced:4. 1} and where M 4 e1 ∈ R ( A.304 with entries in {−1.666 more columns via Saunders’ pdco.74 The i th column E(: .13.1. polyhedral cone theory Eternity II problem (883) constraints are interpretable in the language of ˜ convex cones: The columns of matrix E constitute a set of generators ˜ ˜ ˜ for a pointed polyhedral cone K = {E vec Φ | Φ ≥ 0}. ( 2. Notice that the constraint in β . ˜ dimension of E goes to 11. ˜ Variable vec Φ itself is too large in dimension.

638. . CARDINALITY AND RANK CONSTRAINT EXAMPLES belongs to the smallest face F of K that contains τ if and only if ˜ ˜ Φ∈ R4M ×M . This tightened problem (885) therefore ˜ tells us two things when feasible: E(: .6. dimension ˜ of E goes to 7. Any process of discarding rows and columns prior to optimization is presolving. µ ˜ ˜ µ τ − E(: . 4M 2 distinct feasibility problems ˜ Φ∈ R4M ×M find ˜ Φ (885) ˜ ˜ ˜ subject to E vec Φ = τ ˜ vec Φ 0 ˜ (vec Φ)i = 1 whose feasible set is a proper subset of that in (884). . If infeasible here in (885). i) may be discarded after all columns have been tested. µ ˜ Φ∈ R4M ×M . column E(: . then feasible (vec Φ)i = 1 could not be feasible to Eternity II (883).362 × 150.638 . id est. . i) belongs to the smallest face of ˜ K that contains τ .144 leaving all remaining column entries unaltered. µ∈R 395 find ˜ Φ. for i = 1 . dimension of E becomes 7. After presolving via this conic pruning method (with subsequent zero-row and dependent row removal). By a transformation of Saunders.506 columns removed from 262. this linear feasibility problem is the same as ˜ find Φ. call that A .362 × 150. So this test (884) of membership to F(K ∋ τ ) may be tightened ˜ ˜ to a test of (vec Φ)i = 1 . then the only ˜ ˜ choice remaining for (vec Φ)i is 0 . of 111. i) = E vec Φ ˜ ˜ ˜ vec Φ 0 (375) subject to is feasible. ˜ Following dependent row removal via lusol.4. Real variable µ can ˜ be set to 1 because if it must not be. meaning. and (vec Φ)i constitutes a nonzero vertex-coordinate ˜ of permutation polyhedron (100). µ∈R subject to ˜ ˜ E vec Φ = µ τ ˜ ˜ 0 vec Φ ˜ (vec Φ)i ≥ 1 (884) ˜ A minimal cardinality solution to Eternity II (883) implicitly constrains Φ⋆ to be binary.

. (887) is never feasible and so no column of A can be discarded (A remains unaltered). .396 CHAPTER 4.638 find x subject to Ax = A(: . id est. .10 (280): for i = 1 . 1 x by confinement of Φ in (874) to the permutation polyhedron (100). then column A(: .76 affinity for maximization Designate vector b ∈ Rm to be τ after discarding all entries corresponding ˜ ˜ . Changing 1 to 0 in (885) is always feasible for Eternity II. . previously found via (885). SEMIDEFINITE PROGRAMMING generators of smallest face are conically independent ˜ Designate A ∈ R7362×150638 Rm×n as matrix E after having discarded all generators not in the smallest face F of cone K that contains τ .4. i) is a conically dependent generator of the smallest face and must be discarded from matrix A before proceeding with test of remaining columns. One cannot help but notice a binary selection of variable by tests (885) and (887): Geometrical test (885) (smallest face) checks feasibility of vector entry 1 while geometrical test (887) (conic independence) checks feasibility of 0. The ˜ Eternity II problem (883) becomes minimize n x∈R x 0 subject to Ax = b x 0 (886) To further prune all generators relatively interior to that smallest face. 150. Then to dependent rows in E ˜ Eternity II resembles Figure 30a (not (b)) because variable x is implicitly bounded above by design. i) (887) x 0 xi = 0 ˜ ˜ where x is vec Φ corresponding to columns of E not previously discarded by (885).638 1 = maximize xi x subject to Ax = b x 0 4. It turns out. b is τ subsequent to presolving.75 If feasible. comprise a minimal set. for Eternity II: generators of the smallest face. .4. for i = 1 .76 (888) ˜ Discarded entries in vec Φ are optimally 0.75 4. 150. id est. we may subsequently test for conic dependence as described in 2.

4.5.5. nevertheless. by (885). k = M ). A globally optimal complementary direction vector z ⋆ will always exactly match an optimal solution x⋆ for convex iteration of any problem formulated as maximization of a Boolean variable.1. here we have z ⋆Tx⋆ M (891) . we propose iterating a convex problem sequence: for 1 − y ← z maximize z Tx n x∈ R subject to Ax = b x 0 maximize z Tx⋆ n z∈ R (890) subject to 0 z 1 zT1 = M (525) Variable x is implicitly bounded above at unity by design of A .1 and x n is a k-largest norm ( 3.1. in what follows. We therefore choose to work with a complementary direction vector z .2. x 397 maximize ≡ x x n M subject to Ax = b x 0 where subject to Ax = b x 0 n M (889) y =1−∇ x (762) is a direction vector from the technique of convex iteration in 4. solution x will be optimal for Eternity II because it must then be a Boolean vector with minimal cardinality M . [309. CARDINALITY AND RANK CONSTRAINT EXAMPLES Unity is always attainable.4) M = maximize (1 − y)T x y(x) . 32] This problem formulation is unusual. Maximization of convex function x n (monotonic on Rn ) is not a + M convex problem. which is difficult. though the constraints are convex. insofar as its geometrical visualization is quite clear. When upper bound M in (889) M is met. direction vector is optimal solution at global convergence Instead of solving (889). in predilection for a mental picture of convex function maximization.2.1.6. By (879) this means ( 4.

Because z ⋆ = x⋆ .77 such that x = Z ξ + xp (115) then the equivalent problem formulation maximize (Z ξ + xp )T (Z ξ + xp ) ξ subject to Z ξ + xp 0 (893) might invoke optimality conditions as obtained in [198. Indeed. . which follows. not only to positive semidefinite matrices.398 rumination CHAPTER 4. . SEMIDEFINITE PROGRAMMING Adding a few more clue pieces makes the problem harder in the sense that solution space is diminished. If it were possible to form a nullspace basis Z . Z find X Y X XT Z rank G ≤ k G= (894) X∈ R subject to X ∈ C rank X ≤ k 4. the assumptions in Theorem 8 ask for the Qi being positive definite (see the top of the page of Theorem 8 ). 4.8]4. −Jean-Baptiste Hiriart-Urruty 4.77 subject to X ∈ C Define sparsity as ratio of number of nonzero entries to matrix-dimension product. thm. I must confess that I do not remember why.0.7 Constraining rank of indefinite matrices Example 4. demonstrates that convex iteration is more generally applicable: to indefinite or nonsquare matrices X ∈ Rm×n .7. Eternity II can instead be formulated equivalently as a convex-quadratic maximization: maximize xTx n x∈ R subject to Ax = b x 0 (892) a nonconvex problem but requiring no convex iteration. .78 . for A of about equal sparsity.0.1. the target gets smaller.4.78 . find m×n X ≡ X . Y.

W⋆ (895) 0 subject to X ∈ C G= subject to X ∈ C Y X G= XT Z Were W ⋆ = I . (1498) rank X ≤ rank B.7. rank C ≤ rank R = rank G is tight. The phenomenon of ubiquitous compressibility raises very natural questions: Why go to so much effort to acquire all the data when most of what we get will be thrown away? Can’t we just directly measure the part that won’t end up being thrown away? −David Donoho [119] Lossy data compression techniques like JPEG are popular. and specialized technical data. everyone now knows that most of the data we acquire can be thrown away with almost no perceptual loss − witness the broad success of lossy compression formats for sounds. But there are better direction vectors than identity I which occurs only under special conditions: 4.4.228) Any nuclear norm minimization problem may have its variable replaced with a composite semidefinite variable of the same optimal rank but doubly dimensioned. but it is also well known that compression artifacts become quite perceptible with . Compressed sensing. So there must exist an optimal direction vector W ⋆ such that X . compressive sampling. rank C ≤ rank R and B ∈ Rk×m and C ∈ Rk×n . rank G ≤ k ⇒ rank X ≤ k because X is the projection of composite matrix G on subspace Rm×n . Z find X Y X XT Z rank G ≤ k ≡ minimize X .0. Y.7. Because Y and Z and X = B T C are variable. [302] As our modern technology-driven civilization acquires and exploits ever-increasing amounts of data. images. by (1725) the optimal resulting trace objective would be equivalent to the minimization of nuclear norm of X over C .0. G = RTR where R [ B C ] and rank B. This means: (confer p. Then Figure 84 becomes an accurate geometrical description of a consequent composite semidefinite problem objective.1 Example. CONSTRAINING RANK OF INDEFINITE MATRICES 399 Proof. any rank-k positive semidefinite composite matrix G can be factored into rank-k terms R . Y. For symmetric Y and Z . Z G.

(Stanford University logo rank is much higher. SEMIDEFINITE PROGRAMMING Figure 118: Massachusetts Institute of Technology (MIT) logo. desirable artifacts are forever lost. After stabilizing the mirrors to a fixed but pseudorandom pattern.4. This sampling process is repeated As simple a process as upward scaling of signal amplitude or image size will always introduce noise. In this example we throw out only so much information as to leave perfect reconstruction within reach.g. Specifically. so rendered imperceptible even with significant postfiltering (of the compressed signal) that is meant to reveal them.. may be interpreted as a rank-5 matrix. there will always be need for raw (noncompressed) data. The MIT-logo image in this example impinges a 46×81 array micromirror device. A single photodiode (one pixel) integrates incident light from all mirrors. the MIT logo in Figure 118 is perfectly reconstructed from 700 time-sequential samples {yi } acquired by the one-pixel camera illustrated in Figure 119. e.7. there can be no universally acceptable unique metric of perception for gauging exactly how much data can be tossed. For these reasons.) This constitutes Scene Y observed by the one-pixel camera in Figure 119 for Example 4. [221] [246] Spatial or audio frequencies presumed masked by a simultaneity are not encoded.79 . id est. for example. so highly compressed data is not amenable to analysis and postprocessing: e. signal postprocessing that goes beyond mere playback of a compressed signal. This mirror array is modulated by a pseudonoise source that independently positions all the individual mirrors.400 CHAPTER 4.0. sound effects [97] [98] [99] or image enhancement (Photoshop).79 Further. including its white boundary. But scaling-noise is particularly noticeable in a JPEG-compressed image. even to a noncompressed signal. 4.0.g. text or any sharp edge. light so collected is then digitized into one sample yi by analog-to-digital (A/D) conversion.1..

CONSTRAINING RANK OF INDEFINITE MATRICES Y 401 yi Figure 119: One-pixel camera. as reported in [302. Each different mirror pattern produces a voltage at the single photodiode that corresponds to one measurement yi . [347] [373] Compressive imaging camera block diagram.80 (Figure 120) Our approach to reconstruction is to look for low-rank solution to an underdetermined system: subject to A vec X = y rank X ≤ 5 4.4. The most important questions are: How many samples do we need for perfect reconstruction? Does that number of samples represent compression of the original data? We claim that perfect reconstruction of the MIT logo can be reliably achieved with as few as 700 samples y = [yi ] ∈ R700 from this one-pixel camera. Incident lightfield (corresponding to the desired image Y ) is reflected off a digital micromirror device (DMD) array whose mirror orientations are modulated in the pseudorandom pattern supplied by the random number generators (RNG). If a minimal basis for the MIT logo were instead constructed. with the micromirror array modulated to a new pseudorandom pattern. 6]. (Figure 120) .4. which corresponds to 8% of the original information. This means a lower bound on achievable compression is about 5×46 = 230 samples plus 81 samples column encoding. That number represents only 19% of information obtainable from 3726 micromirrors.80 X ∈ R46×81 find X (896) That number (700 samples) is difficult to achieve. only five rows or columns worth of data (from a 46×81 matrix) are linearly independent.7.

5) no compression (3726 samples). 6]. where vec X is the vectorized (37) unknown image matrix.402 1 2 3 CHAPTER 4. Perfect reconstruction here means optimal solution X ⋆ equals scene Y ∈ R46×81 to within machine precision. :) vec Y . 4) JPEG @100% quality (2588 samples). X ∈ R46×81 find X A vec X = y W1 X rank X T W2 (897) ≤5 subject to This rank constraint on the composite (block) matrix insures rank X ≤ 5 for any choice of dimensionally compatible matrices W1 and W2 . k = 140. y = A vec Y . Since these patterns are deterministic (known). 3) convex iteration (700 samples) can achieve lower bound predicted by compressed sensing (670 samples. X ∈ R subject to A vec X = y (898) W1 X 0 X T W2 . we constrain its rank by rewriting the problem equivalently W1 ∈ R46×46 . we alternate solution of semidefinite program W1 X Z minimize 46×81 tr X T W2 W1 ∈ S46 . then the i th sample yi equals A(i . W2 ∈ S81 . id est. SEMIDEFINITE PROGRAMMING 4 samples 5 0 1000 2000 3000 3726 Figure 120: Estimates of compression for various encoding methods: 1) linear interpolation (140 samples). Each row of fat matrix A is one realization of a pseudorandom pattern applied to the micromirrors. Because variable matrix X is generally not square or positive semidefinite. Figure 100) whereas nuclear norm minimization alone does not [302. But to solve this problem by convex iteration. n = 46×81. W2 ∈ R81×81 . 2) minimal columnar basis (311 samples).

requires a problem sequence in a progressively larger number of balls to find a good initial value for the direction matrix.g. But each problem type (per Example) possesses its own idiosyncrasies that slightly modify how a rank-constrained optimal solution is actually obtained: The ball packing problem in Chapter 5.4. that being. 4.7. Finding a Boolean solution in Example 4. convex iteration with a direction matrix as normal to a linear regularization (a generalization of the well-known trace heuristic). while other problems have no such requirement.6..9 requires a procedure to detect stalls.672) until a rank-5 composite matrix is found. It would be nice if there were a universally .0. convergence occurs in two iterations.7. then that constraint becomes a linear regularization term in the objective. p. This idea of direction matrix is good principally because of its simplicity: When confronted with a problem otherwise convex if not for a rank or cardinality constraint. whereas other problems do not.3 allows use of a known closed-form solution for direction vector when solved via rank constraint.0. for example. but not when solved via cardinality constraint. There exists a common thread through all these Examples. binary sequences succeed with the same efficiency as Gaussian or uniformly distributed sequences.4.6.3. Iterating more admits taking of fewer samples.4. Some problems require a careful weighting of the regularization term. 700 samples require more than ten iterations but reconstruction remains perfect. Reconstruction is independent of pseudorandom sequence parameters. With 1000 samples {yi } . e.0. The combinatorial Procrustes problem in Example 4. CONSTRAINING RANK OF INDEFINITE MATRICES with semidefinite program W1 X minimize tr Z 46+81 X T W2 Z∈ S subject to 0 Z I tr Z = 46 + 81 − 5 ⋆ 403 (899) (which has an optimal solution that is known in closed form. whereas many of the examples in the present chapter require an initial value of 0.1 rank-constraint midsummary We find that this direction matrix idea works well and quite independently of desired rank or affine dimension.0.2. and so on.

. T svec(Am ) Recall that. we now propose a further generalization of convex iteration for constraining rank that attempts to ameliorate quirks and unify problem types: 4. This suggests minimization of an objective function directly in terms of eigenvalues.5).1.404 CHAPTER 4. It turns out that this general method always requires solution to a rank-1 constrained problem regardless of desired rank from the original problem.81 . We speculate one reason to be a simple dearth of optimal solutions of desired rank or cardinality.0. in Convex Optimization. e. Procrustes Example 4. This method is motivated by observation ( 4.2. one that is less susceptible to quirks of a particular problem type.8 Convex Iteration rank-1 We now develop a general method for constraining rank that first decomposes a given problem via standard diagonalization of matrices ( A.81 an unfortunate choice of initial search direction leading astray. . A second motivating observation is that variable orthogonal matrices seem easily found by convex iteration. Ease of solution by convex iteration occurs when optimal solutions abound. we pose a semidefinite feasibility problem find X ∈ Sn subject to A svec X = b X 0 rank X ≤ ρ (900) given an upper bound 0 < ρ < n on rank. 4. To demonstrate.4.6.1) that an optimal direction matrix can be simultaneously diagonalizable with an optimal variable matrix. SEMIDEFINITE PROGRAMMING applicable method for constraining rank. Poor initialization of the direction matrix from the regularization can lead to an erroneous result.0. With this speculation in mind. an optimal solution generally comes from a convex set of optimal solutions that can be large.g.  ∈ Rm×n(n+1)/2 (649) A = . and typically fat full-rank   svec(A1 )T . vector b ∈ Rm .4.

Thus   tr(A1 X) . .   0T λn The fact Λ allows splitting semidefinite feasibility problem (900) into two parts: ρ 0 ⇔ X 0 (1483) minimize Λ A svec i=1 Rρ + λ i Qii − b (903) subject to δ(Λ) ∈ minimize Q 0 ρ i=1 A svec T −1 λ⋆i Qii − b (904) subject to Q = Q The linear equality constraint A svec X = b has been moved to the objective within a norm because these two problems (903) (904) are iterated. II. tr(Am X) X QΛQT = i=1 λ i Qii ∈ S n (901) which is a sum of rank-1 orthogonal projection matrices Qii weighted by T eigenvalues λ i where Qij qi qj ∈ Rn×n . [26. QT = Q−1 .2] Express the nonincreasingly ordered diagonalization of variable matrix n where Ai ∈ S n . equality .  .4. (650) A svec X =  . i = 1 . . CONVEX ITERATION RANK-1 405 This program (900) states the classical problem of finding a matrix of specified rank in the intersection of the positive semidefinite cone with a number m of hyperplanes in the subspace of symmetric matrices S n . Q = [ q1 · · · qn ] ∈ Rn×n .13] [24.. m are also given. Λii = λ i ∈ R .. 2.8. and   λ1 0   λ2   Λ = (902)  ∈ Sn . Symmetric matrix vectorization svec is defined in (56).

tr Qij = 0 . 1. .  G = . . any monotonically nonincreasing real sequence converges.2] [42. .  . . SEMIDEFINITE PROGRAMMING might only become attainable near convergence. . . ( G = . .   . . ρ i<j = 2 . T T T qρ q1 · · · qρ qρ Q1ρ · · · Qρρ  Given ρ eigenvalues λ⋆i . .   Q11 · · · Q1ρ . [259.  . T Q1ρ · · · Qρρ rank G = 1 (906) This new problem always enforces a rank-1 constraint on matrix G . ρ 0) subject to tr Qii = 1 . qρ    T T  Q11 · · · Q1ρ q1 q1 · · · q1 qρ . . .. The second problem (904) in the iteration is not convex. .. . this equivalent problem (906) always poses a rank-1 constraint. . Qij ∈ R (905) minimizen×n n A svec i=1 λ⋆i Qii − b i=1 . then a problem equivalent to (904) is ρ Qii ∈ S .406 CHAPTER 4.. a certificate of global optimality for the iteration. regardless of upper bound on rank ρ of variable matrix X . 1. . This iteration always converges to a local minimum because the sequence of objective values is monotonic and nonincreasing. Given some fixed weighting ... id est. . ∈ S nρ . the main diagonal of Λ must belong to the nonnegative orthant in a ρ-dimensional subspace of Rn . Positive semidefiniteness of matrix X with an upper bound ρ on rank is assured by the constraint on eigenvalue matrix Λ in convex problem (903). . We propose solving it by convex iteration: Make the assignment  T T q1 [ q1 · · · qρ ] .. = .1] A rank ρ matrix X solving the original problem (900) is found when the objective converges to 0 . . it means.

With dimension n = 26 and number of equality constraints m = 10.8. including optimization problems having nontrivial This finding is significant insofar as Barvinok’s Proposition 2. one numerical implementation is disclosed on Wıκımization. but his upper bound can be no tighter than 3. We find a good value of weight w ≈ 5 for randomized problems.1 predicts existence of matrices having rank 4 or less in this intersection.82 .0. of course.. CONVEX ITERATION RANK-1 factor w . Initial value of direction matrix W is not critical. . For a fixed number of equality constraints m and upper bound ρ .4. Empirically.672). convergence to a rank-ρ = 2 solution to semidefinite feasibility problem (900) is achieved for nearly all realizations in less than 10 iterations. An optimal solution to (908) has closed form (p.. ρ 0 subject to tr Qii = 1 . 4.  . .3. G = .4. the feasible set in rank-constrained semidefinite feasibility problem (900) grows with matrix dimension n . .   Q11 · · · Q1ρ . . stall detection is required. These convex problems (907) (908) are iterated with (903) until a rank-1 G matrix is found and the objective of (907) vanishes. . . tr Qij = 0 . T Q1ρ · · · Qρρ with minimize nρ W∈S (907) tr(G⋆ W ) (908) subject to 0 W I tr W = nρ − 1 the latter providing direction of search W for a rank-1 matrix G in (907).82 For small n .9. . ρ i<j = 2 . . This diagonalization decomposition technique is extensible to other problem types. this semidefinite feasibility problem becomes easier to solve by method of convex iteration as n increases. we propose solving (906) by iteration of the convex sequence ρ 407 minimize n×n Qij ∈ R A svec i=1 λ⋆i Qii − b + w tr(G W ) i=1 .

83 The inequality in A remains in the constraints after diagonalization. Because of a greatly expanded feasible set.2] [42. [259. for example. although.4. local convergence is guaranteed by virtue of monotonically nonincreasing real objective sequences.1] 4.83 Because of the nonconvex nature of a rank-constrained problem. . 1. find X ∈ Sn subject to A svec X b X 0 rank X ≤ ρ (909) this relaxation of (900) [226. SEMIDEFINITE PROGRAMMING objectives. more generally.408 CHAPTER 4. III] is more easily solved by convex iteration rank-1 . there can be no proof of global convergence of convex iteration from an arbitrary initialization. 1.

−John Clifford Gower [163.1 2001 Jon Dattorro. We might want to know more. or specified only by certain tolerances (affine inequalities).28. 5. All rights reserved. ◦ D ∈ RN ×N . a classical two-dimensional matrix representation of absolute interpoint distance because its entries (in ordered rows and columns) can be written neatly on a piece of paper. A question naturally arising in some fields (e. v2012.01.01.. 2005.Chapter 5 Euclidean Distance Matrix These results [(941)] were obtained by Schoenberg (1935 ). genetics. we will answer some of these questions via Euclidean distance matrices. co&edg version 2012. a surprisingly late date for such a fundamental property of Euclidean geometry. relative or absolute position or dimension of some hull. 409 .g.. citation: Dattorro.28. Matrix D will be reserved throughout to hold distance-square. engineering) [104] asks what facts can be deduced given only distance information.g.5. These questions motivate a study of interpoint distance. psychology. √ e.1 In what follows. such as. Mεβoo Publishing USA. or suppose the distance information is not reliable. Convex Optimization & Euclidean Distance Geometry. biochemistry. available. 3] By itself. What can we know about the underlying points that the distance information purports to describe? We also ask what it means when given distance information is incomplete. geodesy. distance information between many points in Euclidean space is lacking. economics. well represented in any spatial dimension by a simple matrix from linear algebra.

hence the row or column index of an EDM.1] A Euclidean distance matrix. 3. [228. i or j = 1 .1 EDM Euclidean space Rn is a finite-dimensional real vector space having an inner product defined on it. . individually addresses all the points in the list. Dotted lines are imagined vectors to points whose affine dimension is 2. . Consider the following example of an EDM for the case N = 3 :       0 d12 d13 d11 d12 d13 0 1 5 D = [dij ] =  d21 d22 d23  =  d12 0 d23  =  1 0 4  (911) d13 d23 0 d31 d32 d33 5 4 0 Matrix D has N 2 entries but only N (N − 1)/2 pieces of information. x i − xj (910) Each point is labelled ordinally. . the measure of distance-square: dij = xi − xj 2 2 x i − xj . an EDM in RN ×N . In Figure 121 are shown three points in R3 that can be arranged in a list . is an exhaustive table of distance-square dij + between points taken by pair from a list of N points {xℓ . 5.410 CHAPTER 5. inducing a metric. the squared metric. EUCLIDEAN DISTANCE MATRIX 1 2 3 1 2 3 2 √ 5 1  0 1 5 D = 1 0 4  5 4 0  Figure 121: Convex hull of three points (N = 3) is shaded in R3 (n = 3). ℓ = 1 . . N } in Rn . N .

Implicit from the terminology.6] Kreyszig calls these first metric properties axioms of the metric. 2. Marsden calls nondegeneracy. [50. 1.4] Blumenthal refers to them as postulates.3 and an EDM must be symmetric.2 First metric properties For i. The fourth property provides upper and lower bounds for each entry.15] 5.2. i = j = k nonnegativity self-distance symmetry triangle inequality Then all entries of an EDM must be in concord with these Euclidean metric properties: specifically. N . 1. [259. [163. . p. 5. for Euclidean metric dij ( 5.j = 1 . absolute distance between points xi and xj must satisfy the defining requirements imposed upon any metric space: [228.2 the main diagonal must be 0 .k . But such a list is not unique because any rotation.5. FIRST METRIC PROPERTIES 411 to correspond to D in (911). [228. dij ≥ 0 . What we call self-distance.7] namely.5. each entry must be nonnegative. 3. or translation ( 5. Property 4 is true more generally when there are no restrictions on indices i. i = j dij = 0 ⇔ xi = xj dij = dij ≤ dji dik + dkj . 2] Yet any list of points or the vertices of any bounded convex polyhedron must conform to the properties. reflection.3 ∃ fifth Euclidean metric property The four properties of the Euclidean metric provide information insufficient to certify that a bounded convex polyhedron more complicated than a triangle has a Euclidean realization. p.j. 4. 1. . 5.5) of those points would produce the same EDM D .1] [259.3 5. dij ≥ 0 ⇔ dij ≥ 0 is always assumed. but furnishes no new information.4) in Rn 1.2 .5.

2] specify impossible distances in any dimension. x3 . while property 3 makes d31 = d13 . √ property 2 requires the main diagonal 31 be 0. d14 . Yet there is only one possible choice for d14 because √ points x2 . but missing one of its entries:   0 1 d13 D= 1 0 4  d31 4 0 (912) Indeed. nor should they be from the information given for this example. Now consider the polyhedron in Figure 122b formed from an unknown list {x1 . The corresponding EDM less one critical piece of information. Small completion problem. is given by   0 1 5 d14  1 0 4 1   (914) D=  5 4 0 1  d14 1 1 0 From metric property 4 we may write a few inequalities for the two triangles common to d14 . All other values of d14 in the interval √ [ 5−1.1 Example.0.3. Consider the EDM in (911). x2 . x3 . Triangle. I.1.1). in√ this particular example the triangle inequality does not yield an interval for d14 over which a family of convex polyhedra can be reconstructed. . described over that closed interval [1. x4 must be collinear. yes we can determine the unknown entries of D .3. EUCLIDEAN DISTANCE MATRIX Can we determine unknown entries of D by applying the metric properties? √ √ Property 1 demands d13 .0. id est. So. The fourth property tells us 1≤ d13 ≤ 3 (913) 5. x4 }.0. we find √ 5−1 ≤ d14 ≤ 2 (915) √ We cannot further narrow those bounds on d14 using only the four metric √ properties ( 5.0. d√ ≥ 0.412 CHAPTER 5.3.8.2 Example. 3] is a family of triangular polyhedra whose angle at vertex x2 varies from 0 to π radians. 5. but they are not unique.

. Until then.7) is unique as proved in 5. now only five (2N − 3) absolute distances are specified.0.9.2.3. and 5.1.2 to illustrate more elegant methods of solution in 5.3. First four properties of Euclidean metric are not a recipe for reconstruction of this polyhedron.3. ∃ FIFTH EUCLIDEAN METRIC PROPERTY x3 1 (a) √ 5 x4 1 x1 1 x2 x1 x2 2 x4 (b) x3 413 Figure 122: (a) Complete dimensionless EDM graph.4.9.3.4 Four Euclidean metric properties are insufficient for reconstruction.1.5.3.1 and 5.14.3. 5.1. missing d14 as in (914). (b) Emphasizing obscured segments x2 x4 .8.1. We will return to this simple Example 5. and x2 x3 .5.0.4.0. EDM so represented is incomplete. we can deduce some general principles from the foregoing examples: Unknown dij of an EDM are not necessarily uniquely determinable. The triangle inequality does not produce necessarily tight bounds.0.1.4.1.14. yet the isometric reconstruction ( 5. 5.1.4 The term tight with reference to an inequality means equality is achievable. x4 x3 .

14. and for N ≥ 4 distinct points {xk } .1 Lookahead There must exist at least one requirement more than the four properties of the Euclidean metric that makes them altogether necessary and sufficient to certify realizability of bounded convex polyhedra. 5.3. .1. One of our early goals is to determine matrix criteria that subsume all the Euclidean metric properties and any further requirements. .0.5 matrices ( 2.5.2. From an EDM. must be satisfied at each point xk regardless of affine dimension.3. (Numerical test isedm(D) provided on Wıκımization. for all i. N } .2. a generating list ( 2. Indeed.6 5. θikj ≤ π (916) where θikj = θjki represents angle between vectors at vertex xk (986) (Figure 123).0. 2. j.12.9 . ℓ = k ∈ {1 . EUCLIDEAN DISTANCE MATRIX 5. there are precisely N + 1 necessary and sufficient Euclidean metric requirements for N points constituting a generating list ( 2. (confer 5. θℓkj . we will see there is a bridge from bounded convex polyhedra to EDMs in 5.414 CHAPTER 5.1. and translation ( 5.3.5).2) for a polyhedron can be found ( 5. reflection.2). we will find (1265) (941) (945)  −z TDz ≥ 0   1Tz = 0  ⇔ D ∈ EDMN (917) (∀ z = 1)    D ∈ SN h where the convex cone of Euclidean distance matrices EDMN ⊆ SN belongs h to the subspace of symmetric hollow5. Relative-angle inequality. there are infinitely many more.3.1).) Having found equivalent matrix criteria. ⋄ We will explore this in 5.5 .1 Fifth Euclidean metric property.2. i < j < ℓ .12) correct to within a rotation. Looking ahead.14. Here is the fifth requirement: 5. the inequalities cos(θikℓ + θℓkj ) ≤ cos θikj ≤ cos(θikℓ − θℓkj ) 0 ≤ θikℓ .1) Augmenting the four fundamental properties of the Euclidean metric in Rn .3.6 0 main diagonal.

k. this fifth property is necessary and sufficient for realization of four points. Together with the first four Euclidean metric properties. ∃ FIFTH EUCLIDEAN METRIC PROPERTY 415 k θikℓ θj kℓ θi kj i ℓ j Figure 123: Fifth Euclidean metric property nomenclature. .5. The fifth property is necessary for realization of four or more points. a reckoning by three angles in any dimension. j. Each angle θ is made by a vector pair at vertex k while i. ℓ index four points at the vertices of a generally irregular tetrahedron.3.

id est. Φij = ei eT + ej eT − ei eT − ej eT i j j i = (ei − ej )(ei − ej )T δ((ei eT + ej eT )1) − (ei eT + ej eT ) ∈ SN j i j i + (920) where {ei ∈ RN . .1). . ℓ = 1 . EUCLIDEAN DISTANCE MATRIX Now we develop some invaluable concepts. .0. N ( A. [309. N } to the columns of a matrix X = [ x1 · · · xN ] ∈ Rn×N (76) where N is regarded as cardinality of list X . moving toward a link of the Euclidean metric properties to matrix criteria. its entries must be related to those points constituting the list by the Euclidean distance-square: for i.2) of vec X (37).1 no. j = 1 .4.33) dij = xi − xj = xT i xT j 2 = vec(X)T (Φij ⊗ I ) vec X = Φij .0.1.2. . 6] .4 EDM definition Ascribe points in a list {xℓ ∈ Rn . X TX where   vec X =    x1 x2 . .1.416 CHAPTER 5. 5. i = 1 . N } is the set of standard basis vectors. . Φij ⊗ I is positive semidefinite (1515) having I ∈ S n in its ii th and jj th block of entries while −I ∈ S n fills its ij th and ji th block. xN     ∈ RnN  (919) I −I −I I = (xi − xj )T (xi − xj ) = xi xj xi 2 + xj 2 − 2xTxj i (918) and where ⊗ signifies Kronecker product ( D. . When matrix D = [dij ] is an EDM. Thus each entry dij is a convex quadratic function ( A. .

4. it must have the form D(X) δ(X TX)1T + 1δ(X TX)T − 2X TX ∈ EDMN = [vec(X)T (Φij ⊗ I ) vec X . 5. δ(D) = 0 and DT = D or. When we say D is an EDM. for ζ ∈ R ◦ D(ζX) = |ζ| ◦ D(X) (925) where the positive square root is entrywise ( ◦ ).1 homogeneity Function D(X) is homogeneous in the sense.1).5. i. . . Figure 158 p. conversely.4.2) is obvious from the distance-square definition (918).4. id est. 5. j = 1 . id est. it implicitly means the main diagonal must be 0 (property 2.0. . D ∈ SN h are necessary matrix criteria. N ] (922) (923) Function D(X) will make an EDM given any X ∈ Rn×N . EDM DEFINITION 417 The collection of all Euclidean distance matrices EDMN is a convex subset of RN ×N called the EDM cone ( 6. Figure 150). self-distance) and D must be symmetric (property 3). Now the EDM cone may be described: EDMN = D(X) | X ∈ RN −1×N (924) Expression D(X) is a matrix definition of EDM and so conforms to the Euclidean metric properties: Nonnegativity of EDM entries (property 1. Any nonnegatively scaled EDM remains an EDM. the matrix class EDM is invariant to nonnegative scaling (α D(X) for α ≥ 0) because all EDMs of dimension N constitute a convex cone EDMN ( 6. reading from (922). + 0 ∈ EDMN ⊆ SN ∩ RN ×N ⊂ SN h + (921) An EDM D must be expressible as a function of some list X . equivalently. so holds for any D expressible in the form D(X) in (922). but D(X) is not a convex function of X ( 5.564).

7.418 CHAPTER 5.2).3.2 over its domain that is all of Rn×N . [160. .4.2. 7.   0 1 R(VN ) = N (1T ) . The outcome is different when instead we consider T T −VN D(X)VN = 2VN X TXVN (927) (N (VN ) = 0) having range where we introduce the full-rank skinny Schoenberg auxiliary matrix ( B...7. T VN 1 = 0 (929) Matrix-valued function (927) meets the criterion for convexity in 3.4.0.2. for any Y ∈ Rn×N d2 T T − 2 VN D(X + t Y )VN = 4VN Y T Y VN dt 0 (930) T Quadratic matrix-valued function −VN D(X)VN is therefore convex in X achieving its minimum. 4. videlicet. Yet xj −D(X) (922) is not a quasiconvex function of matrix X ∈ Rn×N because the second directional derivative ( 3.8] [203. strict convexity of the quadratic (927) becomes impossible because (930) could not then be positive definite. with respect to a positive semidefinite cone ( 2.2)   −1 −1 · · · −1  1 0   1  1 −1T   1 √  VN ∈ RN ×N −1 (928)  = √ I  2 2 .8) We saw that EDM entries dij − d2 D(X + t Y ) = 2 −δ(Y T Y )1T − 1δ(Y T Y )T + 2 Y T Y dt2 t=0 (926) is indefinite for any Y ∈ Rn×N since its main diagonal is 0. EUCLIDEAN DISTANCE MATRIX T −VN D(X)VN convexity 5. at X = 0.1 prob. When the penultimate number of points exceeds the dimension of the space n < N − 1.1 xi are convex quadratic functions.2] Hence −D(X) can neither be convex in X .

 cos ψ23 1 13   .  . xN xN cos ψ1N cos ψ2N cos ψ3N · · · 1 δ 2(G) Ψ δ 2(G) (931) where ψij (950) is angle between vectors xi and xj .3. . . . EDM DEFINITION 419 5. i. is known as a Gram matrix .  N .  =  xTx1 xTx2 . G where Φij is defined in (920). . G . G 0 ⇐ Ψ 0 (932) [gij ] (933) When δ 2(G) is nonsingular. N ] ⇐ G 0 (934) The EDM cone may be described.5. . T T T 2 xN x1 xN x2 xN x3 · · · xN   1 cos ψ12 cos ψ13 · · · cos ψ1N       x1 x1 1 cos ψ23 · · · cos ψ2N    x2   cos ψ12   x2      ..5) Distance-square dij (918) is related to Gram matrix entries G T = G dij = gii + gjj − 2gij = Φij .. [251.. . .. . .0. . X TX =  . 3. .. . formed from inner product of list X . ..4. ( A. .   cos ψ3N  δ  . x3 2 .2 Gram-form EDM definition G Positive semidefinite matrix X TX in (922). xN  . (confer (1021)(1027)) EDMN = D(G) | G ∈ SN + (935) . .. Hence the linear EDM definition D(G) δ(G)1T + 1δ(G)T − 2G ∈ EDMN = [ Φij . then G 0 ⇔ Ψ 0. . xTxN  ∈ S+  3  3 3  .  .1.  cos ψ = δ  .   . Positive semidefiniteness of interpoint angle matrix Ψ implies positive semidefiniteness of Gram matrix G .. .    .  T . .4...  . and where δ 2 denotes a diagonal matrix in this case. .6]   x1 2 xTx2 xTx3 · · · xTxN 1 1 1  T   x1 [ x1 · · · xN ]  xTx1 x2 2 xTx3 · · · xTxN  2 2 2    . j = 1 . .

From (939) we get sufficiency of the first matrix criterion for an EDM proved by Schoenberg in 1935. = G − (Ge1 1T + 1eTG) + 1eTGe1 1T 1 1 (937) where   1  0  (938) e1  . Consider the symmetric translation (I − 1eT )D(G)(I − e1 1T ) that shifts 1 the first row and column of D(G) to the origin. Then it follows for D ∈ SN h − D − (De1 1T + 1eTD) + 1eTDe1 1T 1 1 1 2 1 G = − D − (De1 1T + 1eTD) 2 .4.1) on subspace SN = {G ∈ SN | Ge1 = 0} 1 √ √ T 2VN Y 0 2VN = 0 (2044) | Y ∈ SN in the Euclidean sense. EUCLIDEAN DISTANCE MATRIX First point at origin Assume the first point x1 in an unknown list X resides at the origin. setting Gram-form EDM operator D(G) = D for convenience. But VN is the matrix implicit in Schoenberg’s seminal exposition. .2.  .420 5. so (941) is the same as (917). .7 N ⇔ T −VN DVN ∈ SN −1 + D ∈ SN h (941) From (929) we know R(VN ) = N (1T ) . [314]5. 1 √ √ T 2VN D 0 2VN =− 0 Xe1 = 0 ⇔ Ge1 = 0 (936) x1 = 0 1 2 = where 1 T T VN GVN = −VN DVN 2 0 0T T 0 −VN DVN (939) ∀X I − e1 1T = 0 √ 2VN (940) is a projector nonorthogonally projecting ( E. 0 is the first vector from the standard basis.7 D ∈ EDM 5.1 CHAPTER 5. In fact. any matrix V in place of VN will satisfy (941) whenever R(V ) = R(VN ) = N (1T ).

X1 = 0 ⇔ G1 = 0 Now consider the calculation (I − centering or projection operation.2.4.2.1) of an unknown list X is the origin.9. From (943) and the assumption D ∈ SN we get sufficiency of the more popular h form of Schoenberg criterion: D ∈ EDM N ⇔ −V D V ∈ SN + D ∈ SN h (945) Of particular utility when D ∈ EDMN is the fact. [isedm(D) on Wıκımization] 0 0T By substituting G = (939) into D(G) (934). a geometric ( E. T 0 −VN DVN D= 0 T δ −VN DVN T 1T + 1 0 δ −VN DVN T −2 0 0T T 0 −VN DVN (1041) assuming x1 = 0.2) Setting D(G) = D for 11TD) + 1 1 11TD11T 2 N2 .4.5.2. G =− D− 1 (D11T + N 1 2 1 2 1 11T )D(G)(I N (942) 1 − N 11T ) .0. projection) matrix 1 V I − 11T ∈ SN (944) N are found in B. = tr G N = ℓ=1 xℓ 2 = X 2 F .1.j Φij ⊗ I vec X (946) = tr(V G V ) . 5.0.1.5.1.20) (918) tr −V D V 1 2 = 1 2N dij i.4.2 no. Details of this bijection are provided in 5.6. X1 = 0 (943) ∀X V G V = −V D V = −V D V where more properties of the auxiliary (geometric centering.4. V G V may be regarded as a covariance matrix of means 0. ( B.0. EDM DEFINITION 421 We provide a rigorous complete more geometric proof of this fundamental Schoenberg criterion in 5.j = G 0 1 2N vec(X)T i. X1 = 0 . convenience as in 5.4.2.4.2 0 geometric center Assume the geometric center ( 5.7.

1. EUCLIDEAN DISTANCE MATRIX where Φij ∈ SN (920).2. 5. for example. is equivalent to the constraint that all points lie on a hypersphere of radius 1 centered at the origin. p. therefore convex in vec X .0. [89. (Laura Klanfer 1933 [314. assuming X1 = 0 D = δ −V D V 1 2 1T + 1δ −V D V 1 T 2 − 2 −V D V 1 2 (1031) Details of this bijection can be found in 5. it compacts a reconstructed list by minimizing total norm-square of the vertices. a Gram matrix may always be constrained to have equal positive values along its main diagonal.116] Any further constraint on that Gram matrix applies only to interpoint angle matrix Ψ = G . D = 2(g11 11T − G) ∈ EDMN (948) Requiring 0 geometric center then becomes equivalent to the constraint: D1 = 2N 1. but it tends to facilitate reconstruction of a list configuration having least energy.2.1.9. We will find this trace + useful as a heuristic to minimize affine dimension of an unknown list arranged columnar in X ( 7.6.422 CHAPTER 5. (1130) (1274)) Constraining all main diagonal entries gii of a Gram matrix to 1.4.9.3. 1 By substituting G = −V D V 2 (943) into D(G) (934). . id est. Because any point list may be constrained to lie on a hypersphere boundary whose affine dimension exceeds that of the list.1.0.1.3 hypersphere These foregoing relationships allow combination of distance and Gram constraints in any optimization problem we might pose: Interpoint angle Ψ can be constrained by fixing all the individual point lengths δ(G)1/2 . 3]) This observation renewed interest in the elliptope ( 5.1).2). then 1 Ψ = − δ 2(G)−1/2 V D V δ 2(G)−1/2 2 (947) (confer 5.

for G ∈ SN ⇔ G1 = 0 c G∈SN c (952) find G ∈ SN ˇ G .3.4.4. be replaced with the equivalent Gram-form (934). For M list members.10.2. Consider the academic problem of finding a Gram matrix subject to constraints on each and every entry of the corresponding EDM: D∈SN h find −V D V 1 2 subject to ˇ D . (ei eT + ej eT ) 1 = dij . for A ∈ SN D . j i j i G 0 i. j = 1 . .2. Capitalizing on identity (943) relating Gram and EDM D matrices. j = 1 . A = δ(G)1T + 1δ(G)T − 2G .2. . . . δ(A1) − A 2 Then the problem equivalent to (951) becomes.3. Requiring only the self-adjointness property (1451) of the main-diagonal linear operator δ we get. of course. N . N . there total M (M+1)/2 such constraints. i<j (953) subject to .4.5. j i 2 0 ∈ SN i.4. EDM DEFINITION 423 5.1 Example. i<j (951) −V D V ˇ where the dij are given nonnegative constants.3.3 and Example 5. Angle constraints are incorporated in Example 5. List-member constraints via Gram matrix. EDM D can. δ (ei eT + ej eT )1 − (ei eT + ej eT ) = dij . a constraint set such as  tr − 1 V D V ei eT = xi 2  i  2  1 T T 1 T tr − 2 V D V (ei ej + ej ei ) 2 = xi xj (949)    tr − 1 V D V e eT = x 2 2 j j j relates list member xi to xj to within an isometry through inner-product identity [383. 1-7] xTxj i (950) cos ψij = xi xj where ψij is angle between the two vectors as in (931). A = G .

4.2 Example. dual to a given minimization problem (related to each other by a Lagrangian function).1. rank V D V ≤ 8(N (N − 1)/2) + 1 − 1 2 = N−1 (954) because. (Figure 58 Example 2.9 to be recorded in the literature dates back to 1755. ( E.4.8 5.1.5.3. Three circles intersect at Fermat point of minimum total distance from three vertices of (and interior to) red/black/white triangle. Fermat point] Perhaps more −V D V |N ← 1 = 0 ( B.1) and is the greatest upper bound. in the strongest sense: the optimal objective achieved by a maximization problem. in each case. 5. Barvinok’s Proposition 2.3) A dual problem is always convex. First duality.1).2.7.9. the Gram matrix is confined to a face of positive semidefinite cone SN isomorphic with SN −1 ( 6.7.0.1 predicts existence for either formulation (951) or (953) such that implicit equality constraints induced by subspace membership are ignored rank G .13.2.8 . is always equal to the optimal objective achieved by the minimization.0.2) This bound + + is tight ( 5.424 CHAPTER 5.6. Kuhn reports that the first dual optimization problem5. [Wıκımization. EUCLIDEAN DISTANCE MATRIX Figure 124: Rendering of Fermat point in acrylic on canvas by Suman Vaze.3.0.1) By dual problem is meant.9 5.

The historically first dual problem formulation asks for the smallest equilateral triangle encompassing (N = 3) three points xi .4.10 . D 1 2 ˇ = dij ∀(i. Minimization of −V. EDM DEFINITION 425 intriguing is the fact: this earliest instance of duality is a two-dimensional Euclidean distance geometry problem known as a Fermat point (Figure 124) named after the French mathematician.5. 3} . where propensity of (955) for producing α⋆ is geometric center of points xi (1005).2) 5. Given N distinct points in the plane {xi ∈ R2 . 2. Another problem dual to (955) (Kuhn 1967) N maximize {zi } i=1 N zi .1) is a penultimate minimum eigenvalue problem having application to PageRank calculation by search engines [232. 4]. [377] It is not so straightforward to write the Fermat point problem (955) equivalently in terms of a Gram matrix from this section.7) must be 2 because a third dimension can only increase total distance. ei eT + ej eT j i −V D V 0 −V .0.7.5. but for multiple unknowns in R2 .10 Going the other way. Squaring instead N minimize α i=1 α−xi 2 minimize ≡ D∈SN +1 subject to D . the Fermat point y is an optimal solution to N minimize y i=1 y − xi (955) a convex minimization of total distance. I = {1. N } . . For three points. i = 1 . j) ∈ I (957) yields an inequivalent convex geometric centering problem whose equality constraints comprise EDM D main-diagonal zeros and known distances-square. optimal affine dimension ( 5. xi zi = 0 i=1 subject to (956) zi ≤ 1 ∀ i has interpretation as minimization of work required to balance potential energy in an N -way tug-of-war between equally matched opponents situated at {xi }. a problem dual to total distance-square maximization (Example 6.2.0. [342] Fermat function (955) is empirically compared with (957) in [61.7. D is a heuristic for rank minimization. 8.3]. . ( 7.

3. a constraint on the sum of norm-square of each and every vertex x .2. .2. zero distance between unknowns is revealed. 6) mod 6 2 1 subject to tr D(ei eT + ej eT ) 2 = lij . prescribed angles ϕ between the three pairs of opposing faces 3.426 CHAPTER 5. Let’s realize this as a convex feasibility problem (with constraints written in the same order) also assuming 0 geometric center (942): D∈Sh find −V D V 6 1 2 ∈ S6 j −1 = (i = 1 .1). Problem (957) is interpreted instead using linear springs.11 5. otherwise.4. 2. Barvinok [25. . i = 1.6] poses a problem in geometric realizability of an arbitrary hexagon (Figure 125) having: 1. prescribed (one-dimensional) face-lengths l 2. 0 force. j i 1 tr − 2 V D V (Ai + AT ) 1 = cos ϕi . as in Figure 72. 3 i 2 tr(− 1 V D V ) = 1 2 −V D V 0 5. Hexagon. An optimal solution to (955) gravitates toward gradient discontinuities ( D. .3 Example. whereas optimal solution to (957) is less compact in the unknowns. EUCLIDEAN DISTANCE MATRIX x3 x1 x6 x2 x4 x5 Figure 125: Arbitrary hexagon in R3 whose vertices are labelled clockwise. ten affine equality constraints in all on a Gram matrix G ∈ S6 (943).5.11 (958) Optimal solution to (955) has mechanical interpretation in terms of interconnecting springs with constant force when distance is nonzero. 2.

3.9. An elementary geometrical problem can be posed: Given hyperspheres. In two dimensions.4.2.1. whose affine dimension ( 5. In one dimension the answer to this kissing problem is 2.3. Two nonoverlapping Euclidean balls are said to kiss if they touch.1. 6. kissing number ]. As posed.1 asserts existence of a list. for Ai ∈ R6×6 (950) A1 = (e1 − e6 )(e3 − e4 )T /(l61 l34 ) A2 = (e2 − e1 )(e4 − e5 )T /(l12 l45 ) A3 = (e3 − e2 )(e5 − e6 )T /(l23 l56 ) (959) 2 and where the first constraint on length-square lij can be equivalently written 1 as a constraint on the Gram matrix −V D V 2 via (952). The total number of spheres is N = K + 1. EDM DEFINITION 427 Figure 126: Sphere-packing illustration from [376. where. but not required. We show how to numerically solve such a problem by alternating projection in E. corresponding to Gram matrix G solving this feasibility problem.5. Kissing number of sphere packing.1.4. Translucent balls illustrated all have the same diameter. to kiss. Barvinok’s Proposition 2.10.2. (Figure 7) .1) does not exceed 3 because the convex feasible set is bounded by 1 the third constraint tr(− 2 V D V ) = 1 (946). the problem seeks the maximal number of spheres K kissing a central sphere in a particular dimension. how many hyperspheres can simultaneously kiss one central hypersphere? [397] The noncentral hyperspheres are allowed.0.4 Example. 5. each having the same diameter 1.7.

[92] Translating this problem to an EDM graph realization (Figure 122. Delsarte’s method provides an infinite number [16] of linear inequalities necessary for converting a rank-constrained semidefinite program5.13 [274] There are no proofs known for kissing number in higher dimension excepting dimensions eight and twenty four.13 Simplex-method solvers for linear programs produce numerically better results than contemporary log-barrier (interior-point method) solvers for semidefinite programs by about 7 orders of magnitude.1) bounded above by some desired rank. 5. Assign index 1 to the central sphere.N (960) K = Nmax − 1 5.0. Their dispute was finally resolved in 1953 by Sch¨tte & van der Waerden. . Then kissing number D ∈ EDMN j = 2. And so was born a controversy between the two scholars on the campus of Trinity College Cambridge in 1694.428 CHAPTER 5. Figure 127) is suggested by Pfender & Ziegler. to Isaac Newton by David Gregory in the context of celestial mechanics. Interest persists [84] because sphere packing has found application to error correcting codes from the fields of communications and information theory. From this perspective. EUCLIDEAN DISTANCE MATRIX The question was presented. in three dimensions. Then the distance between centers must obey simple criteria: Each sphere touching the central sphere has a line segment of length exactly 1 joining its center to the central sphere’s center. and they are far more predisposed to vertex solutions.9. and assume a total of N spheres: T minimize − tr W VN DVN D∈SN subject to D1j = 1 .12 to a linear program. specifically to quantum computing. the kissing problem can be posed as a semidefinite program.N 2 ≤ i < j = 3.. All spheres. excepting the central sphere..5.1. Dij ≥ 1 .. must have centers separated by a distance of at least 1.12 (961) is found from the maximal number N of spheres that solve this semidefinite whose feasible set belongs to that subset of an elliptope ( 5. Oleg Musin tightened u the upper bound on kissing number K in four dimensions from 25 to K = 24 by refining a method of Philippe Delsarte from 1973.. Imagine the centers of each sphere are connected by line segments. [300] In 2003. Newton correctly identified the kissing number as 12 (Figure 126) while Gregory argued for 13.

                    W =                      10 −1 1 −1 0 0 −1 1 1 −1 1 0 1 2 1 1 0 −1 −1 −1 1 0 0 −1 −1 10 1 0 1 −1 −1 0 1 0 1 −1 0 1 −1 2 1 −1 1 0 0 −1 1 −1 0 1 1 −1 10 0 1 −1 −1 1 −1 0 1 0 1 −1 0 1 −1 −1 1 2 0 −1 0 −1 1 1 0 10 1 −1 1 1 −1 0 −1 0 −1 1 2 −1 1 −1 1 0 0 −1 1 1 0 −1 −1 1 0 −1 10 1 0 −1 1 −1 0 −1 −1 2 0 1 −1 1 1 0 −1 0 −1 0 1 1 −1 2 1 10 1 1 0 1 1 0 −1 −1 −1 0 0 −1 −1 1 1 1 0 1 −1 −1 2 −1 0 1 10 1 −1 −1 0 −1 1 0 0 −1 1 1 −1 0 0 −1 1 −1 0 0 −1 −1 −1 1 1 10 1 0 −1 1 0 1 1 1 −1 0 2 −1 1 0 −1 2 1 −1 1 0 1 0 −1 1 10 −1 −1 0 1 −1 1 0 0 −1 −1 1 2 1 −1 1 0 0 1 −1 −1 1 −1 0 −1 10 −1 −1 0 1 1 1 −1 0 0 1 1 −1 0 1 1 −1 0 −1 0 1 0 −1 −1 −1 10 1 1 0 2 1 −1 −1 1 0 1 2 −1 0 −1 1 1 0 −1 0 −1 1 0 −1 1 10 −1 1 −1 0 0 1 −1 1 0 1 −1 −1 0 2 −1 1 −1 −1 1 0 1 0 1 −1 10 1 −1 1 −1 0 0 1 −1 −1 0 1 1 −1 0 1 2 −1 0 1 −1 1 0 1 1 10 0 −1 1 −1 −1 0 −1 1 0 −1 −1 1 0 1 0 −1 0 1 1 1 2 −1 −1 0 10 −1 1 1 −1 0 3 −1 2 1 9 1 1 −1 −1 −1 1 −1 −1 1 −1 −1 1 9 2 −1 2 −1 2 3 −1 1 −1 1 1 2 9 3 −1 1 −2 −1 1 −2 2 −1 −1 −1 3 9 2 −1 1 1 2 1 −1 3 −1 2 −1 2 9 −1 1 −1 1 2 −1 2 −1 −1 1 −1 −1 9 3 1 −2 −1 1 −1 1 2 −2 1 1 3 9 −1 1 −1 2 1 −1 3 −1 1 −1 1 −1 9          1    12         −1 0 1 0 −1 −1 1 0 1 0 −1 1 0 1 1 0 1 −1 −1 10 2 1 −1 −1 (963) 1 0 −1 0 1 1 −1 0 −1 0 1 −1 0 −1 −1 0 −1 1 1 2 10 −1 1 1 0 −1 −1 1 2 0 −1 1 1 −1 1 0 −1 0 −1 1 0 −1 1 1 −1 10 0 1 (964) 0 1 −1 1 0 0 1 1 1 −1 −1 2 −1 0 1 −1 0 −1 −1 −1 1 0 10 1 −1 −1 2 −1 −1 −1 0 −1 0 1 0 −1 1 1 0 1 1 0 0 −1 1 1 1 10                      1    12                   . 9  1   −2   −1   3    −1 W =  −1   1   2   1   −2 1 1 1 10 1 1 1 0 1 0 −1 0 1 −1 −1 0 −1 −1 0 0 1 −1 −1 −1 2 −1 0 1 10 −1 1 −1 0 −1 0 1 −1 2 1 1 0 −1 1 −1 0 0 1 1 −1 program in a given affine dimension r whose realization is assured by 0 optimal objective.5.1. In two dimensions. EDM DEFINITION 429 In three dimensions. Matrix W is constant. determined by a method disclosed in 4.4.4. in this program. Matrix W can be interpreted as direction of search through the positive semidefinite cone SN −1 for a rank-r optimal solution + T −VN D⋆ VN .   4 1 2 −1 −1 1  1 4 −1 −1 2 1     2 −1 4 1 1 −1  1  (962) W =  −1 −1 1 4 1 2 6    −1 2 1 1 4 −1  1 1 −1 2 −1 4  1 9 3 −1 −1 1 1 −2 1 2 −1 −1 −2 3 9 1 2 −1 −1 2 −1 −1 1 2 −1 −1 0 −1 1 1 10 1 0 −1 2 −1 1 1 0 1 −1 0 0 1 −1 −1 1 0 −1 −1 1 9 1 −1 1 −1 3 2 −1 1 1 0 1 0 −1 −1 1 10 −1 2 −1 −1 0 −1 −1 0 1 1 1 0 0 1 1 −1 A four-dimensional solution has rational direction matrix as well.

0.8146 0 0.2004 0. the conjecture in the general case also is: k(5) = 40.6863 0.0624 -0.6533 0.0611 0.4835 -0.7147 -0.5059 0.2926 -0.1613 0. Here is an optimal point list5.2313 -0.2004 Columns 7 through 12 -0. we are considering points with coordinates that have two A and three 0 with any choice of signs and any ordering of the coordinates. 0.6239 0.1698 -0.5759 -0.4815 -0.1632 0.2936 -0. then k(5) = 40.3721 Columns 19 through 25 -0.1698 0.3922 -0.1657 0.4584 0 0.9176 -0.5059 0.1983 -0. −Oleg Musin .4093 5.2936 -0.6473 -0.3922 0. 0. You would like to see coordinates? Easily.9152 0. Actually.0624 0.3640 -0.2313 0.4835 0.2936 0.3427 Columns 13 through 18 0.0943 0. Moreover.3721 0.0372 0.0372 An optimal five-dimensional point list is known: The answer was known at least 175 years ago. EUCLIDEAN DISTANCE MATRIX but these direction matrices are not unique and their precision not critical. the same coordinates-expression in dimensions 3 and 4.9176 -0. Of course.1613 -0. people in coding theory have conjectures on the kissing numbers for dimensions up to 32 (or even greater? ). p(2) = (−A.2313 0. that is the so-called Leech lattice. 0.3427 -0. It’s a real miracle how dense the packing is in eight dimensions (E8 =Korkine & Zolotarev packing that was discovered in 1880s) and especially in dimension 24.8756 0.430 CHAPTER 5. −A) .2309 0.2088 0. i.5431 -0.0943 -0.e.9152 -0. Korkine & Zolotarev proved in 1882 that D5 is the densest lattice in five dimensions. p(3) = (A.7416 0. 0).2936 0 -0. 0).2224 0. However.6863 0. A. 0.3640 0. 0. −A.6239 0.3927 -0.2832 -0. .0611 -0. 0. −A.8146 -0.8756 -0.6448 -0. There are better packings than D6 (Conjecture: k(6) = 72).4584 -0. 0.2313 -0.1657 0.7147 0.7520 0.4815 0 0.4093 0.6239 -0. .5759 0.14 in Matlab output format: Columns 1 through 6 X = 0 -0. p(40) = (0.6448 -0.1632 -0.14 0. The first miracle happens in dimension 6.6863 -0. sometimes they found better lower bounds. Then p(1) = (A.3927 -0.2832 -0.9399 -0. 0). I believe Gauss knew it. A..5431 -0.6863 0.6473 0. √ Let A = 2 .2088 0.2309 0.6239 -0.9399 -0.4224 0. I know that Ericson & Zinoviev a few years ago discovered (by hand.2224 -0. So they proved that if a kissing arrangement in five dimensions can be extended to some lattice.7416 0 0. .1983 -0.6533 0.7520 0.4224 -0. no computer ) in dimensions 13 and 14 better kissing arrangements than were known before.2926 -0.

Given three known absolute point positions in R2 (three anchors x2 . x4 ) ˇ ˇ ˇ and only one unknown point (one sensor x1 )..9. 5.3 we have T −VN DVN = 11T − [ 0 I ] D 0T I 1 2 (965) (which does not hold more generally) where identity matrix I ∈ SN −1 has one less dimension than EDM D . a nonlinear problem. matrix size can be reduced: From 5.1.5. Trilateration in wireless sensor network. By eliminating some equality constraints for this particular problem.4. This trilateration can be expressed as a convex optimization problem in terms of list X [ x1 x2 x3 x4 ] ∈ R2×4 and ˇ ˇ ˇ . has convex expression. remaining T eigenvalues are −W ⋆ VN D⋆ VN zero. ˆ 11T − D 1 2 ˆ − tr(W D) 0 1 ≤ i < j = 2.3. N−1 (967) ˆ δ(D) = 0 ˆ Any feasible solution 11T − D 1 belongs to an elliptope ( 5.8. 2 This next example shows how finding the common point of intersection for three circles in a plane. the sensor’s absolute position is ˇ determined from its noiseless measured distance-square di1 to each of three anchors (Figure 2. By defining an EDM principal submatrix ˆ D [0 I ] D 0T I ∈ SN −1 h (966) we get a convex problem equivalent to (960) minimize ˆ D∈SN −1 ˆ subject to Dij ≥ 1 . x3 .4..5 Example.2. Figure 127a). Numerical problems begin to arise with matrices of this size due to interior-point methods of solution.0. EDM DEFINITION 431 T The r nonzero optimal eigenvalues of −VN D⋆ VN are equal.1).

x3 .432 CHAPTER 5. EUCLIDEAN DISTANCE MATRIX x3 ˇ √ √ d12 d13 √ x4 ˇ x3 x4 (a) x2 ˇ (b) d14 x2 x1 x1 x4 x2 (c) x3 x5 x5 x2 x6 (d) x4 x1 x3 x1 Figure 127: (a) Given three distances indicated with absolute point positions x2 . (b) Four-point list must always be embeddable in affine subset having T dimension rank VN DVN not exceeding 3. only distances to x3 . absolute position of x1 in R2 ˇ ˇ ˇ can be precisely and uniquely determined by trilateration. x4 known and noncollinear. yet unique isometric reconstruction ( 5. Line segments represent known absolute distances and may cross without vertex at intersection. d12 .6)) point position x1 is uniquely determined to within an isometry in any dimension.7) of six points in R2 is certain. Dimensionless EDM graphs (b) (c) (d) represent EDMs in various states of completion.4.14. x5 are required to determine relative position of x2 √ R2 . x4 . . √ √ √ ( d12 . x4 . d23 ) from complete set of interpoint distances.2.1). then (by injectivity of D ( 5. To determine relative position of x2 . solution to a system of nonlinear equations. triangle inequality is necessary and sufficient ( 5. Knowing all distance information. (c) When fifth point is introduced. Graph represents first instance in (d) Three distances are absent of missing distance information.3. x3 . d13 .

4. 4 xi 2 .5.4. id est.1) It may be interpreted as squashing G which is bounded below by X TX as in (969).2) to the positive semidefinite cone boundary.1. There are 9 linearly independent affine equality constraints on that Gram matrix while the sensor is constrained. δ(G−X TX) = 0 ⇔ G = X TX ( A.0.2.6.2. only by dimensioning.1). ˇ i = 2.7. G ) minimization is a heuristic for rank minimization.2. G−X TX 0 ⇒ tr G ≥ tr X TX (1522). 4 [ x2 x 3 x 4 ] ˇ ˇ ˇ 0 subject to (968) where ˇ and where the constraint on distance-square di1 is equivalently written as a constraint on the Gram matrix via (933).7. 3. .3.15 Trace (tr G = I . ( 7.1.2. the relaxation admits (3 ≥) rank G ≥ rank X 5. 2. 2 : 4) I X XT G = = = = ˇ di1 . we claim that the set of feasible + Gram matrices forms a line ( 2.1.15 insures a solution on the boundary of positive semidefinite cone S4 .9. 4 T x i xj . Although the objective tr G of minimization5. to lie in R2 .1) in isomorphic R10 tangent ( 2. 3.2) ⇒ tr G = tr X TX which is a condition necessary for equality.1. ( 5.1. by (1524). i = 2. (971) expected positive semidefinite rank-2 under noiseless conditions. any feasible G and X provide G X TX (969) Φij = (ei − ej )(ei − ej )T ∈ SN + (920) which is a convex relaxation of the desired (nonconvex) equality constraint I X XT G = I [I X ] XT (970) But. for this problem. confer 4.3) By Schur complement ( A. EDM DEFINITION Gram matrix G ∈ S4 (931): G∈ S4 . X∈ R2×4 433 minimize tr G tr(GΦi1 ) tr Gei eT i tr(G(ei eT + ej eT )/2) j i X(: .4. ˇ ˇ 2 ≤ i < j = 3.5.

three noncollinear) anchors. 4 2 ≤ i < j = 3. Assuming the anchors exhibit no rotational or reflective symmetry in their affine hull ( 5. rank I X⋆ X ⋆T G⋆ = 2 (972) then G⋆ = X ⋆TX ⋆ by Theorem A. 4 which is invertible:  svec(I )T  svec(Φ21 )T   svec(Φ31 )T   svec(Φ41 )T   svec(e2 eT )T 2  svec G =  svec(e3 eT )T 3   svec(e4 eT )T 4   svec (e eT + e eT )/2  2 3 3 2   svec (e2 eT + e4 eT )/2 4 2 svec (e3 eT + e4 eT )/2 4 3 T             T   T   −1                 x1 2 + x2 ˇ 2 + x3 ˇ ˇ21 d ˇ d31 ˇ d41 x2 2 ˇ x3 2 ˇ x4 2 ˇ xTx3 ˇ2 ˇ xTx4 ˇ2 ˇ xTx4 ˇ3 ˇ 2 + x4 ˇ 2          (974)        . ˇ = xTxj .2. 1) is unique under noiseless measurement.4. 3.434 CHAPTER 5. tr Gei eT i tr(G(ei eT + ej eT )/2) j i = xi . If rank of an optimal solution equals 2. Only the sensor location x1 is unknown. not circles.3. ˇi ˇ 2 2 + x3 ˇ 2 + x4 ˇ 2 i = 2.3. The objective function together with the equality constraints make a linear system of equations in Gram matrix variable G tr G = x1 2 + x2 ˇ ˇ tr(GΦi1 ) = di1 .6 Proof (sketch).4.0.5.1. 3. 4 (973) i = 2. EUCLIDEAN DISTANCE MATRIX (a third dimension corresponding to an intersection of three spheres. this localization problem does not require affinely independent (Figure 27. As posed. 5.2) and assuming the sensor x1 lies in that affine hull. were there noise). then sensor position solution x⋆ = X ⋆ (: . [322] 1 This preceding transformation of trilateration to a semidefinite program works all the time ((972) holds) despite relaxation (969) because the optimal solution set is a unique point.

413. and Pardalos [206.5.3) Isometric reconstruction from an EDM means building a list X correct to within a rotation.4. △ How much distance information is needed to uniquely localize a sensor (to recover actual relative position)? The narrative in Figure 127 helps dispel any notion of distance data proliferation in low affine dimension (r < N − 2) . then all distances-square in the EDM must be known for unique isometric reconstruction in Rr . 4. going the other way. as it turns out. 2. when r = 1 then the condition that the dimensionless EDM graph be connected is necessary and sufficient.4. such problems can remain convex although the set of all optimal solutions generally becomes a convex set bigger than a single point (and still containing the noiseless solution).0. and translation. from small completion problem Example 5.0. [188. correct to within an isometry. reflection. is traced by x1 2 ∈ R on the right-hand side.16 .3. 5.3. (confer 5. EDM DEFINITION 435 That line in the ambient space S4 of G .2) is an example in R2 requiring only 2N − 3 = 5 known symmetric entries for unique isometric reconstruction.7 Definition. Unique r-dimensional isometric reconstruction by semidefinite relaxation like When affine dimension r reaches N − 2.1. claimed on page 433. Isometric reconstruction. correct to within a rigid transformation.2. [9] [4] [5] Figure 122b (p.2) to S4 in order to prove uniqueness.5. achievable under certain noiseless conditions on graph connectivity and point position. having only 2N − 3 = 9 absolute distances specified. Liang. has only one realization in R2 but has more realizations in higher dimensions. The list represented by the particular dimensionless EDM graph in Figure 128. Alfakih shows how to ascertain uniqueness over all affine dimensions via Gale matrix. in other terms.7. One must show this line to be tangential ( 2.2] claim O(2N ) distances is a least lower bound (independent of affine dimension r) for unique isometric reconstruction. But as soon as significant noise is introduced or whenever distance data is incomplete. Tangency is + possible for affine dimension 1 or 2 while its occurrence depends completely on the known measurement data.2] 5. although the four-point example in Figure 127b will not yield a unique reconstruction when any one of the distances is left unspecified. reconstruction of relative position.1.5.16 Huang.

This example differs from Example 5.2. We express this localization as a convex optimization problem (a semidefinite program. Tandem trilateration in wireless sensor network. An ad hoc method for recovery of the least-rank optimal solution under noiseless conditions is introduced: 5.3. tr Gei eT = xi 2 . 3 : 5) = [ x3 x 4 x 5 ] ˇ ˇ ˇ I X 0 XT G Φij = (ei − ej )(ei − ej )T ∈ SN + i = 2. x2 ∈ R have absolute position determinable from their noiseless distances-square (as indicated in Figure 129) assuming the anchors exhibit no rotational or reflective symmetry in their affine hull ( 5. we provide the complete corresponding EDM:   0 50641 56129 8245 18457 26645  50641 0 49300 25994 8810 20612     56129 49300 0 24202 31330 9160   (975) D =  8245 25994 24202 0 4680 5290     18457 8810 31330 4680 0 6658  26645 20612 9160 5290 6658 0 We consider paucity of distance information in this next example which shows it is possible to recover exact relative position given incomplete noiseless distance information.8 Example. ˇ ˇ ˇ 2 two unknown sensors x1 .436 CHAPTER 5. 4. 4.3. Given three known absolute point-positions in R2 (three anchors x3 .4. ˇi ˇ j i X(: . 5 i = 3.2). 5 3 ≤ i < j = 4.4. x4 . EUCLIDEAN DISTANCE MATRIX (969) occurs iff realization in Rr is unique and there exist no nontrivial higher-dimensional realizations. 4.2. X∈ R2×5 minimize tr G ˇ = di1 .5 insofar as trilateration of each sensor is now in terms of one unknown position: the other sensor.5.1) in terms of list X [ x1 x2 x3 x4 x5 ] ∈ R2×5 and Gram ˇ ˇ ˇ 5 matrix G ∈ S (931) via relaxation (969): G∈ S5 . ˇ tr(GΦi2 ) = di2 . 5 i = 3. ˇ i tr(G(ei eT + ej eT )/2) = xTxj . x5 ). [322] For sake of reference. 5 (976) subject to tr(GΦi1 ) where (920) .

Connecting line-segments denote known absolute distances.4. . Incomplete EDM corresponding to this dimensionless EDM graph provides unique isometric reconstruction in R2 .5. (drawn freehand. no symmetry intended. EDM DEFINITION 437 x1 x4 x5 x6 x3 x2 Figure 128: Incomplete EDM corresponding to this dimensionless EDM graph provides unique isometric reconstruction in R2 . confer (975)) x4 ˇ x1 x2 x3 ˇ x5 ˇ Figure 129: (Ye) Two sensors • and three anchors ◦ in R2 .

x4 . On this run. Anchors x3 .438 CHAPTER 5. These errors can be corrected by choosing a different normal in objective of minimization. x5 corresponding to Figure 129 are indicated by ⊗ . localization is successful. two visible localization errors are due to rank-3 Gram optimal solutions. . EUCLIDEAN DISTANCE MATRIX 10 5 x4 ˇ 0 x3 ˇ x5 ˇ −5 x2 −10 x1 −15 −6 −4 −2 0 2 4 6 8 Figure 130: Given in red are two discrete linear trajectories of sensors x1 and x2 in R2 localized by algorithm (976) as indicated by blue bullets • . When ˇ ˇ ˇ targets and bullets • coincide under these noiseless conditions.

2.1) on rank of G⋆ is 4.2. now implicitly the identity matrix I . Now we illustrate how a problem in distance geometry can be solved without equality constraints representing measured distance.4 does not work here because it chooses the wrong normal. their individual effect being to expand the optimal solution set. 7.4. in an affine subset. an optimal solution is known a priori to exist in two dimensions. instead. 1 : 2) due to a rank-3 optimal Gram matrix G ⋆ . and (we speculate) because the feasible affine-subset slices the positive semidefinite cone thinly in the Euclidean sense.18 Projection on that nonconvex subset of all N × N -dimensional positive semidefinite matrices.9. Wireless location in a cellular telephone network. 5.17 . id est.2. A random search for a good normal δ(u) in only a few iterations is quite easy and effective because: the problem is small. EDM DEFINITION 439 This problem realization is fragile because of the unknown distances between sensors and anchors. 4] 5. Yet there is no more information we may include beyond the 11 independent equality constraints on the Gram matrix (nonredundant constraints not antithetical) to reduce the feasible set. 5.1) A rank-2 optimal Gram matrix can be found and the errors corrected by choosing a different normal for the linear objective function.18 rank G . time of flight. tr G = G .9. Utilizing measurements of distance. Rank reduction ( 4. The trace objective is a heuristic minimizing convex envelope of quasiconcave function5.2.19 The log det rank-heuristic from 7.1.5. ( 2.2. angle of arrival.9.17 Exhibited in Figure 130 are two mistakes in solution X ⋆ (: . Hence our focus shifts in 4. [356.9 Example.2. we have only upper and lower bounds on distances measured: 5. a good normal direction is not necessarily unique.2. δ(u) (977) where vector u ∈ R5 is randomly selected.2.5.19 We explore ramifications of noise and incomplete data throughout.5.1) is unsuccessful here because Barvinok’s upper bound ( 2. introducing more solutions and higher-rank solutions.4 to discovery of a reliable means for diminishing the optimal solution set by introduction of a rank constraint.0. or signal By virtue of their dimensioning.4. I ← G . the sensors are already constrained to R2 the affine hull of the anchors. whose rank does not exceed 2 is a problem considered difficult to solve.3.3.

There can be as many as 20 base-station antennae capable of receiving signal from any given cell phone • . The term cellular arises from packing of regions best covered by neighboring base stations. Like a Voronoi diagram. it is not unusual to find base stations spaced approximately 1 mile apart. Illustrated is a pentagonal cell best covered by base station x2 . cell geometry depends on ˇ base-station arrangement. In some US urban environments. . that number is closer to 6. practically.440 CHAPTER 5. EUCLIDEAN DISTANCE MATRIX x3 ˇ x7 ˇ x1 x6 ˇ x2 ˇ x4 ˇ x5 ˇ Figure 131: Regions of coverage by base stations ◦ in a cellular telephone network.

On a large open flat expanse of terrain.20 Those signal power measurements are transmitted from that cell phone to Cell phone signal power is typically encoded logarithmically with 1-decibel increment and 64-decibel dynamic range. turning one’s head. so we assume distance to any particular base station can be inferred from received signal power. EDM DEFINITION 441 Figure 132: Some fitted contours of equal signal power in R2 transmitted from a commercial cellular telephone • over about 1 mile suburban terrain outside San Francisco in 2005.20 . multilateration is the process of localizing (determining absolute position of) a radio signal source • by inferring geometry relative to multiple fixed base stations ◦ whose locations are known. 5. Depicted in Figure 131 is one cell phone x1 whose signal power is automatically and repeatedly measured by 6 base stations ◦ nearby. Consequently. [324] power in the context of wireless telephony.5. or rolling the windows up in an automobile. We consider localization of a cellular telephone by distance geometry. mountainous terrain. But it is not uncommon for measurement of signal power to suffer 20 decibels in loss caused by factors such as multipath interference (signal reflections).4. contours of equal signal power are no longer circular.5. signal-power measurement corresponds well with inverse distance. their geometry is irregular and would more aptly be approximated by translated ellipsoids of graduated orientation and eccentricity as in Figure 132. man-made structures. Data courtesy Polaris Wireless.

. 5.21 . EUCLIDEAN DISTANCE MATRIX base station x2 who decides whether to transfer (hand-off or hand-over ) ˇ responsibility for that call should the user roam outside its cell.5.8. 5.1. 4 in three.4. localization of a particular cell phone • by distance geometry would be far easier were the whole cellular system instead conceived so cell phone x1 also transmits (to base station x2 ) its signal power as received by all other ˇ cell phones within range. i=2. point list X is simply redimensioned in the semidefinite program.21 Due to noise. These bounds become the input data..2.7 [ x2 x 3 x 4 x 5 x 6 x 7 ] ˇ ˇ ˇ ˇ ˇ ˇ 0 (978) subject to where Φij = (ei − ej )(ei − ej )T ∈ SN + (920) This semidefinite program realizes the wireless location problem illustrated in Figure 131. at least one distance measurement more than the minimum number of measurements is required for reliable localization in practice. 1) is taken as solution.442 CHAPTER 5. although measurement noise will often cause rank G⋆ to exceed 2.2. Because distance to base station is quite difficult to infer from signal power measurements in an urban environment.3. 2 : 7) I X XT G ≤ = = = di1 .7 2 xi . 3 measurements are minimum in two dimensions.22 In Example 4. X∈ R2×7 minimize tr G tr(GΦi1 ) di1 ≤ T tr Gei ei tr(G(ei eT + ej eT )/2) j i X(: . ˇ i=2. ˇ ˇ 2≤ i< j = 3.. while di1 is the lower bound.7 T xi x j .22 Existence of noise precludes measured distance from the input data. Each measurement range is presumed different from the others. we explore how this convex optimization algorithm fares in the face of measurement noise.4. We instead assign measured distance to a range estimate specified by individual upper and lower bounds: di1 is the upper bound on distance-square from the cell phone to i th base station...5. To formulate this same problem in three dimensions.. Location X ⋆ (: .4.4 for enforcing the stronger rank-constraint (972). Randomized search for a rank-2 optimal solution is not so easy here as in Example 5. Then convex problem (968) takes the form: G∈ S7 . We introduce a method in 4.

10 Example. Nigam. EDM DEFINITION 443 Figure 133: A depiction of molecular conformation. the bond lengths and angles are treated as completely fixed. These are the dihedral angles between the planes spanned by the two consecutive triples in a chain of four covalently bonded atoms. F. . M. e. .g. angle. From this technique.5. and dihedral angle measurements can be obtained. In the rigid covalent geometry approximation.4.1] Crippen & Havel recommend working exclusively with distance data because they consider angle data to be mathematically cumbersome. −G. Figure 133. . (Biswas. so that a given spatial structure can be described very compactly indeed by a list of torsion angles alone. Crippen & T. Figure 3. [117] 5.4. The subatomic measurement technique called nuclear magnetic resonance spectroscopy (NMR) is employed to ascertain physical conformation of molecules. distance. The present example shows instead how inclusion of dihedral angle data into a problem statement can be made elegant and convex. Ye) Molecular Conformation. Havel (1988) [88.2.. 1.3. Dihedral angles arise consequent to a phenomenon where atom subsets are physically constrained to Euclidean planes.

Aℓ12 = 0 Z . There might occur. δ(Gℵ ) = 1 (983) (982) NMR data is noisy. Z . So our problem’s variables are the two Gram matrices GX and Gℵ and matrix Z = ℵTX of cross products. dihedral angle ϕ measurements can be expressed as linear constraints on Gℵ by (950). we preferentially work with Gram matrices G because of the bridge they provide between other variables. Given bounds on dihedral angles respecting 0 ≤ ϕj ≤ π and bounds on distances di and given constant matrices Ak (982) and symmetric matrices Φi (920) and Bj per (950). Then measurements of distance-square d can be expressed as linear constraints on GX as in (978). so measurements are given as upper and lower bounds. ascribe position information to the matrix X = [ x1 · · · xN ] ∈ R3×N (76) and introduce a matrix ℵ holding normals η to planes respecting dihedral angles ϕ : ℵ [ η 1 · · · ηM ] ∈ R3×M (979) As in the other examples. the independent constraints T ηℓ (x1 − x2 ) = 0 T ηℓ (x2 − x3 ) = 0 (981) which are expressible in terms of constant matrices A ∈ RM ×N . 3 assumed to lie in the ℓ th plane whose normal is ηℓ . we define Gℵ Z Z T GX ℵT ℵ ℵTX X T ℵ X TX = ℵT [ ℵ X ] ∈ RN +M ×N +M (980) XT whose rank is 3 by assumption. 2 . and normal-vector η conditions can be expressed by vanishing linear constraints on cross-product matrix Z : Consider three points x labelled 1 . EUCLIDEAN DISTANCE MATRIX As before. then a molecular conformation problem can be .444 CHAPTER 5. for example. Aℓ23 = 0 Although normals η can be constrained exactly to unit length.

k are positive integers. and θikj is the angle at vertex xk formed by vectors xi − xk and xj − xk . parsecs from our Sun and relative angular measurement.4.4. EDM DEFINITION expressed: Gℵ ∈ SM .22) Equivalent to (918) is [383. 5. want to realize a constellation given only interstellar distance (or.4. j . cos θikj = 1 (d 2 ik + dkj − dij ) dik dkj = (xi − xk )T (xj − xk ) x i − xk x j − xk (986) . What binds these three variables is the rank constraint. ∀ j ∈ I2 ∀ k ∈ I3 (984) Z . Ak GX 1 δ(Gℵ ) Gℵ Z Z T GX Gℵ Z rank Z T GX where GX 1 = 0 provides a geometrically centered list X ( 5. 3.4.2). Ignoring the rank constraint would tend to force cross-product matrix Z to zero.2] dij = dik + dkj − 2 dik dkj cos θikj = dik dkj 1 −e −ıθikj −e ıθikj 1 dik dkj (985) √ called law of cosines where ı −1 .2. called stellar cartography. Z∈ RM ×N 445 find GX di ≤ tr(GX Φi ) ≤ di . the Sun as vertex to two distant stars). 1-7] [333. GX ∈ SN . =0. for example. i .3 Inner-product form EDM definition We might. (p. we show how to satisfy it in 4. equivalently. =0 =1 0 =3 ∀ i ∈ I1 subject to cos ϕj ≤ tr(Gℵ Bj ) ≤ cos ϕj .5.

23 on R2 whereas dij (θikj ) is dij + dkj ⋆ quasiconvex ( 3. . having eigenvalues {0.23 1 dik dkj −e = −ıθikj −e ıθikj 0. µ ≥ 0 . Distance-square dik is a convex quadratic function5. . d1N (989) then any EDM may be expressed D(Θ) = 0 δ(ΘT Θ) 0 δ(ΘT Θ) 1T + 1 0 δ(ΘT Θ)T − 2 0 0T 0 ΘT Θ ∈ EDMN (990) (991) δ(ΘT Θ)T δ(ΘT Θ)1T + 1δ(ΘT Θ)T − 2ΘT Θ D(Θ) | Θ ∈ RN −1×N −1 EDMN = 5. d12 d1N cos θ21N  2 .. d13 d1N cos θ31N d12 d14 cos θ214 d13 d14 cos θ314 d14 . we get the Pythagorean theorem when θikj = ±π/2..2.4. . 2}. ··· ··· .8) minimized over domain {−π ≤ θikj ≤ π} by θikj = 0.446 CHAPTER 5. θikj = 0 (987) ΘT Θ d14 d1N cos θ41N · · ·  d13 d1N cos θ31N   N −1 d14 d1N cos θ41N  ∈ S   . −π ≤ θikj ≤ π . We may construct an inner-product form of the EDM definition for matrices by evaluating (985) for k = 1 : By defining         d12 d12 d13 cos θ213 d12 d14 cos θ214 . Minimum is attained for 1 µ1 .  . . [61..5]) 0 ... exmp. θikj = ±π θikj = ± π 2 2 .. Euclidean metric property 4. holds for any EDM D . θikj = 0 . dik + dkj dij = dij = dik + dkj . θikj = 0 . ( D. dij = dik − dkj so | dik − dkj | ≤ dij ≤ dik + dkj (988) Hence the triangle inequality. .1. . d12 d1N cos θ21N d12 d13 cos θ213 d13 d13 d14 cos θ314 . EUCLIDEAN DISTANCE MATRIX where the numerator forms an inner product of vectors. and dij (θikj ) is maximized ⋆ when θikj = ±π .

Entries of ΘT Θ result from vector inner-products as in (986).. .1 Relative-angle form The inner-product form EDM definition is not a unique definition of Euclidean distance matrix. Here is another one: Like D(X) (922). 0 d1N cos θ31N · · · 1 (995) 0    cos θ213  . ... id est... distances therein are all with respect to point x1 ..4..  . i. Scrutinizing ΘT Θ (989) we find that because of the arbitrary choice k = 1.. d1N cos θ21N Expression D(Θ) defines an EDM for any positive semidefinite relative-angle matrix Ω = [cos θi1j . .. .  .4.3.3.2). N ] = δ(ΘT Θ) ∈ RN −1 (997) . Yet picking arbitrary θi1j to fill ΘT Θ will not necessarily make an EDM. x1 = 0 (993) For D = D(Θ) and no condition on the list X (confer (939) (943)) T ΘT Θ = −VN DVN ∈ RN −1×N −1 (994) 5.  . d13  cos θ31N   1   . and it is homogeneous in the sense (925). there are approximately five flavors distinguished by their argument to operator D . j = 2 . N ] ∈ SN −1 (996) and any nonnegative distance vector d = [d1j . . G= 0 0T 0 ΘT Θ .  . it is neither a convex function of Θ ( 5..5.4. EDM DEFINITION 447 for which all Euclidean metric properties hold. √ (992) Θ = [ x2 − x1 x3 − x1 · · · xN − x1 ] = X 2VN ∈ Rn×N −1 Inner product ΘT Θ is obviously related to a Gram matrix (931). . Similarly. inner product (989) must be positive semidefinite. j = 2 .  1 δ(d) Ω δ(d)  √  cos θ213 · · · cos θ21N 0 d12 √   .  . ΘT Θ = √     d12 √ 0 d13 . D(Θ) will make an EDM given any Θ ∈ Rn×N −1 . relative angles in ΘT Θ are between all vector pairs having vertex x1 .

3.1. −D(Θ) is not a quasiconvex function of Θ by the same reasoning. EUCLIDEAN DISTANCE MATRIX Ω 0 ⇒ ΘT Θ 0 (998) Decomposition (995) and the relative-angle matrix inequality Ω 0 lead to a different expression of an inner-product form EDM definition (990) 0 d = 0 d 0 d δ(d) 0 0 0T Ω 0 d (999) D(Ω . Ψ ≡ 1 0 0T Ω .4.0.4. x1 = 0 (1001) 5. we can relate interpoint angle matrix Ψ from the Gram decomposition in (931) to relative-angle matrix Ω in (995). Here the situation for of dkj inner-product form EDM operator D(Θ) (990) is identical to that in 5.2 T Inner-product form −VN D(Θ)VN convexity On page 446 we saw that each EDM entry dij is a convex quadratic function dik and a quasiconvex function of θikj . δ(d) 0 (1000) In the particular circumstance x1 = 0.448 because ( A.3. d) | Ω 0. . and from (994) T −VN D(Θ)VN = ΘT Θ (1002) is a convex quadratic function of Θ on domain Rn×N −1 achieving its minimum at Θ = 0. Thus. d) 1T + 1 0 d T − 2 δ δ dT d1T + 1d T − 2 δ(d) Ω ∈ EDMN and another expression of the EDM cone: EDMN = D(Ω .1 for list-form D(X) .5) CHAPTER 5.

and translation (offset or shift). Use of a fixed reference in measurement of angles and distances would reduce required information but is antithetical.3 Inner-product form. 5. ordering all points xℓ (in a length-N list) by increasing angle of vector xℓ − x1 with respect to j−1 k=i x2 − x1 . for example. θi1j becomes equivalent to reduced to 2N − 3 . INVARIANCE 5. interpoint distances are unaffected by offset.5.24 with respect to EDM D . In the particular case n = 2.1 Translation Any translation common among all the points xℓ in a list will be cancelled in the formation of each dij . O(N ).5. Proof follows directly from (918).5 Invariance When D is an EDM. there exist an infinite number of corresponding N -point lists X (76) in Euclidean space.3. 5. we may remove it from the list constituting the columns of X by subtracting α1T . Knowing that translation α in advance. All those lists are related by isometric transformation: rotation.24 √ 2VN = D(X) (1004) The reason for amount O(N 2 ) information is because of the relative measurements. Then it stands to reason by list-form definition (922) of an EDM.4. EDM D is translation invariant. rather. θk.k+1 ≤ 2π and the amount of information is . discussion 449 We deduce that knowledge of interpoint distance is equivalent to knowledge of distance and angle from the perspective of one point.5. reflection. x1 in our chosen case.1. When α = x1 in particular. √ [ x2 − x1 x3 − x1 · · · xN − x1 ] = X 2VN ∈ Rn×N −1 (992) and so D(X −x1 1T ) = D(X −Xe1 1T ) = D X 0 5. The total amount of information N (N − 1)/2 in ΘT Θ is unchanged5. for any translation α ∈ Rn D(X − α1T ) = D(X) (1003) In words.

5. The geometric center is the same as the center of mass under the assumption of uniform mass density across points. id est.5. the translation-invariant subspace SN ⊥ (2042) of the symmetric matrices SN . This means matrix G c is not unique and can belong to an expanse more broad than a positive semidefinite cone.1 Gram-form invariance Following from (1007) and the linear Gram-form EDM operator (934): D(G) = D(V G V ) = δ(V G V )1T + 1δ(V G V )T − 2V G V ∈ EDMN (1008) The Gram-form consequently exhibits invariance to translation by a doublet ( B.5. G ∈ SN − SN ⊥ . So explains Gram matrix sufficiency + c in EDM definition (934). ( B.25 Subtracting the geometric center from every list member.5) 5. EUCLIDEAN DISTANCE MATRIX 5.2) u1T + 1uT D(G) = D(G − (u1T + 1uT )) (1009) because.0.25 . T id est. The collection of all such doublets forms the nullspace (1025) to the operator. 5. for any u ∈ RN .4. [355] [164] α = αc X bc 1 X1 N ∈ P ⊆ A (1005) where A represents the list’s affine hull.450 CHAPTER 5. D(u1T + 1uT ) = 0. X − αc 1T = X − 1 X11T N = X(I − 1 11T ) N = XV ∈ Rn×N (1006) where V is the geometric centering matrix (944). If we were to associate a point-mass mℓ with each of the points xℓ in the list.1 Example.26 A constraint G1 = 0 would prevent excursion into the translation-invariant subspace (numerical unboundedness). makes an auxiliary V -matrix.1. Translating geometric center to origin. more generally. We might choose to shift the geometric center αc of an N -point list {xℓ } (arranged columnar in X) to the origin. αc ∈ P because bc 1 = 1 and bc 0 .26 Any b from α = Xb chosen such that bT 1 = 1.5. So we have (confer (922)) D(X) = D(XV ) = δ(V TX TXV )1T+1δ(V TX TXV )T−2V TX TXV ∈ EDMN (1007) 5.1. [220] The geometric center always lies in the convex hull P of the list. then their center of mass (or gravity) would be ( xℓ mℓ ) / mℓ .

can be accomplished via Q(X −α1T ) where Q is an orthogonal matrix ( B.5. EDM D is rotation/reflection invariant. Consider the EDM graph in Figure 134b representing known distance between vertices (Figure 134a) of a tilted-square diamond in R2 .5). an orthonormal matrix. it appears that any matrix Qp such that X T QTQp X = X TX p will have the property D(Qp X) = D(X) (1012) An example is skinny Qp ∈ Rm×n (m > n) having orthonormal columns. Proof follows from the fact.1 Example. Interpoint distances are unaffected by rotation or reflection. INVARIANCE 451 5.2.5. Looking at EDM definition (922). 5. we may safely ignore offset and consider only the impact of matrices that premultiply X . we say. or reflection through some affine subset containing α . QT = Q−1 ⇒ X TQTQX = X TX . a counterclockwise rotation of any vector about the origin by angle θ is prescribed by the orthogonal matrix Q = cos θ − sin θ sin θ cos θ (1013) (1011) . only multiples of π/2. Suppose some geometrical optimization problem were posed where isometric transformation is allowed excepting reflection. We rightfully expect D Q(X − α1T ) = D(QX − β1T ) = D(QX) = D(X) (1010) Because list-form D(X) is translation invariant.5. Reflection prevention and quadrature rotation. and where rotation must be quantized so that only quadrature rotations are allowed. The class of premultiplying matrices for which interpoint distances are unaffected is a little more broad than orthogonal matrices. So (1010) follows directly from (922).5. In two dimensions.0.2 Rotation/Reflection Rotation of the list X ∈ Rn×N about some arbitrary point α ∈ Rn .

1. .5.5. EUCLIDEAN DISTANCE MATRIX x2 x3 x1 x3 x1 x4 (a) x4 (b) (c) Figure 134: (a) Four points in quadrature in two dimensions about their geometric center.452 x2 CHAPTER 5. whereas reflection of any point through a hyperplane containing the origin ∂H = x∈ R 2 cos θ sin θ T x=0 (1014) is accomplished via multiplication with symmetric orthogonal matrix ( B.3) R = sin(θ)2 − cos(θ)2 −2 sin(θ) cos(θ) −2 sin(θ) cos(θ) cos(θ)2 − sin(θ)2 (1015) Rotation matrix Q is characterized by identical diagonal entries and by antidiagonal entries equal but opposite in sign. define Y XV ∈ R2×4 (1016) . 4 to columns of a matrix X = [ x1 x2 x3 x4 ] ∈ R2×4 (76) Our scheme to prevent reflection enforces a rotation matrix characteristic upon the coordinates of adjacent points themselves: First shift the geometric center of X to the origin. whereas reflection matrix R is characterized in the reverse sense. Assign the diamond vertices xℓ ∈ R2 . (c) Quadrature rotation of a Euclidean body in R2 first requires a shroud that is the smallest Cartesian square containing it.1). for geometric centering matrix V ∈ S4 ( 5. ℓ = 1 . . (b) Complete EDM graph of diamond-shaped vertices.0.

Counterclockwise rotation is thereby enforced by constraining equality among diagonal and antidiagonal entries as prescribed by (1013). it is sufficient that all interpoint distances be specified and that adjacencies Y (: . as desired. absolute rotation. Y (: . any clockwise rotation would ascribe a reflection matrix characteristic. 2 : 3) . 3 : 4) be proportional to 2×2 rotation matrices.5. D(Θ) (990) is rotation/reflection invariant. and translation information is lost. the affine constraint YC + quantizes rotation. 1 : 3) = 0 1 Y (: . reconstruction of point position ( 5. this isometric reconstruction is unique insofar as every realization of a corresponding list X is congruent: . 1 : 2) .5. We capture the four square vertices as columns of a product Y C where   1 0 0 1  1 1 0 0   C = (1018)  0 1 1 0  0 0 1 1 Then.2. reflection. Given a noiseless complete EDM. 5.5.1 Inner-product form invariance D(Qp Θ) = D(QΘ) = D(Θ) so (1011) and (1012) similarly apply.5. assuming a unit-square shroud. and Y (: . Given an EDM. INVARIANCE 453 To maintain relative quadrature between points (Figure 134a) and to prevent reflection. 5.3 Invariance conclusion In the making of an EDM.12. Y (: . Our scheme to quantize rotation requires that all square vertices be described by vectors whose entries are nonnegative when the square is translated anywhere interior to the nonnegative orthant. (1020) 1/2 T 1 ≥0 1/2 (1019) Likewise. the list X) can be guaranteed correct only in affine dimension r and relative position. 2 : 4) −1 0 (1017) Quadrature quantization of rotation can be regarded as a constraint on tilt of the smallest Cartesian square containing the diamond as in Figure 134c.

7).2.1] on the geometric center subspace5. we endeavor to demonstrate it.2) SN c {G ∈ SN | G1 = 0} N (2040) (2041) (1022) = {V Y V | Y ∈ SN } ⊂ SN T ≡ {VN AVN | A ∈ SN −1 } = {G ∈ S | N (G) ⊇ 1} = {G ∈ SN | R(G) ⊆ N (1T )} then D(G) on that subspace is injective.3.27 ( E. EUCLIDEAN DISTANCE MATRIX 5.6. and inner-product form D(Θ) (990) are many-to-one surjections ( 5.2.0. EDM operators list-form D(X) (922). 5.5. hence. then VN BVN = A or VN Y VN = A .4.7.5) onto the same range.1 Gram-form bijectivity Because linear Gram-form EDM operator D(G) = δ(G)1T + 1δ(G)T − 2G (934) has no nullspace [85.454 CHAPTER 5.1. T The equivalence ≡ in (1022) follows from the fact: Given B = V Y V = VN AVN ∈ SN c † †T † †T N −1 with only matrix A ∈ S unknown.27 . A. the EDM cone ( 6): (confer (935) (1027)) EDMN = = = where ( 5.6 Injectivity of D & unique reconstruction Injectivity implies uniqueness of isometric reconstruction ( 5. Gram-form D(G) (934).1) SN ⊥ = {u1T + 1uT | u ∈ RN } ⊆ SN c (2042) D(X) : RN −1×N → SN | X ∈ RN −1×N h N N D(G) : S → Sh | G ∈ SN − SN ⊥ + c N −1×N −1 N N −1×N −1 D(Θ) : R → Sh | Θ ∈ R (1021) 5.

6.5. INJECTIVITY OF D & UNIQUE RECONSTRUCTION 455 ∈ basis SN ⊥ h dim SN = dim SN = c h N (N −1) 2 in RN (N +1)/2 dim SN ⊥ = dim SN ⊥ = N in RN (N +1)/2 c h basis SN = V {Eij }V (confer (59)) c SN c basis SN ⊥ c SN h Figure 135: Orthogonal complements in SN abstractly oriented in isometrically isomorphic RN (N +1)/2 . such is not the case in higher dimension.) ∈ basis SN ⊥ h . Basis vectors for each ⊥ space outnumber those for its respective orthogonal complement. Case N = 2 accurately illustrated in R3 . Orthogonal projection of basis for SN ⊥ on SN ⊥ yields another basis for SN ⊥ . h c c (Basis vectors for SN ⊥ are illustrated lying in a plane orthogonal to SN in this c c dimension.

11. and because the Gram-form EDM operator D is translation invariant and N (D) is the translation-invariant subspace SN ⊥ .2) were a subset of R(V Y V ) .1. EUCLIDEAN DISTANCE MATRIX To prove injectivity of D(G) on SN : Any matrix Y ∈ SN can be c decomposed into orthogonal components in SN . c then EDM definition D(G) (1021) on5.1.6. 5.4. Because of translation c c invariance ( 5. A. which is contradictory.1.0. . c It remains only to show D(V Y V ) = 0 ⇔ V Y V = 0 (1024) ⇔ Y = u1T + 1uT for some u ∈ RN . (confer (943)) V(D) : SN → SN SN = aff EDMN h 5. 4.28 (confer 6.5.0. 6.29 Critchley cites Torgerson (1958) [352. then VN BVN = A and A ∈ S+ must be positive semidefinite by positive semidefiniteness of B and Corollary A.5.1. D(Y − V Y V ) = 0 hence N (D) ⊇ SN ⊥ . (confer (935)) EDMN = D(G) | G ∈ SN ∩ SN c + 5.3.3]5.5.28 −V D V 1 2 (1028) [89.1 Gram-form operator D inversion (1027) Define the linear geometric centering operator V .456 CHAPTER 5.1) SN ∩ SN = {V Y V c + T 0 | Y ∈ SN } ≡ {VN AVN | A ∈ SN −1 } ⊂ SN + (1026) must be surjective onto EDMN .7.6.29 This orthogonal projector V has no nullspace on (1282) T Equivalence ≡ in (1026) follows from the fact: Given B = V Y V = VN AVN ∈ SN + † †T N −1 with only matrix A unknown.1) and linearity. But this implies R(1) ( B. Thus we have N (D) = {Y | D(Y ) = 0} = {Y | V Y V = 0} = SN ⊥ c (1025) Since G1 = 0 ⇔ X1 = 0 (942) simply means list X is geometrically centered at the origin. 2] for a history and derivation of (1028). ch. D(V Y V ) will vanish whenever T T 2V Y V = δ(V Y V )1 + 1δ(V Y V ) .1. Y = V Y V + (Y − V Y V ) (1023) where V Y V ∈ SN and Y − V Y V ∈ SN ⊥ (2042).

line 11] x3 = 2 (not 1). h c D SN = D V(SN ⊕ SN ⊥ ) = SN + D V(SN ⊥ ) = SN c h h h h h (1036) because SN ⊇ D V(SN ⊥ ) . for the Gram-form we have the h h isomorphisms [90.107] [7. p.1. p. [7. but SN ⊥ ∩ SN = 0 (Figure 135). line 20].30 [6. 5.1] SN = D(SN ) h c N Sc = V(SN ) h and from bijectivity results in 5. INJECTIVITY OF D & UNIQUE RECONSTRUCTION 457 because the projection of −D/2 on SN (2040) can be 0 if and only if c D ∈ SN ⊥ .6. 18. SN = V(SN ) = V SN ⊕ SN ⊥ = V SN (1029) c h h h because (72) V SN h ⊇ V SN ⊥ = V δ 2(SN ) h (1030) Because D(G) on SN is injective. D = D V(D) or −V D V = V D(−V D V ) SN = D V(SN ) h h −V SN V = V D(−V SN V ) h h These operators V and D are mutual inverses. 2.2. −V SN V /2 is equivalent to the h geometric center subspace SN in the ambient space of symmetric matrices. we find for D ∈ SN h 1 1 1 1 D(−V D V 2 ) = δ(−V D V 2 )1T + 1δ(−V D V 2 )T − 2(−V D V 2 ) (1031) (1032) (1033) (1034) (1035) id est. In summary. The Gram-form D SN (934) is equivalent to SN .5. 2] [8. p.6. . Further.76. 2.1] [2.1]5. 2] [89. not a singleton set. p. Projector V on SN is therefore h c c h injective hence uniquely invertible. and aff D V(EDMN ) = D V(aff EDMN ) c by property (127) of the affine hull. delete sentence: Since G is also . a c surjection.30 (1037) (1038) (1039) (1040) .10.6. . EDMN = D(SN ∩ SN ) c + SN ∩ SN = V(EDMN ) c + In [7.

D(D) becomes an injective h map onto that same subspace. h c N⊥ 5. EUCLIDEAN DISTANCE MATRIX 5. There exists a linear mapping such that T (VN (D)) †T † VN VN (D)VN = −V D V 1 2 = V(D) where the equality SN ⊥ = N (V) is known ( E. Proof follows directly from the fact: linear D has no nullspace [85. on the domain that is the subspace of symmetric matrices having all zeros in the first row and column SN = {G ∈ SN | Ge1 = 0} 1 = 0 0T 0 I Y (2044) 0 0T 0 I | Y ∈ SN because it obviously has no nullspace there.6.1 T Inversion of D −VN DVN Injectivity of D(D) suggests inversion of (confer (939)) VN (D) : SN → SN −1 5.7. −VN D(VN Y VN /2)VN = Y .0. T Substituting ΘT Θ ← −VN DVN (1002) into inner-product form EDM definition D(Θ) (990). Yet when its domain is instead the entire symmetric hollow subspace SN = aff EDMN .31 mapping onto SN −1 having nullspace5.32 SN ⊥ . h 5.458 CHAPTER 5. for example.31 T −VN DVN (1042) a linear surjective5.6.2. then for any Y ∈ SN −1 . it may be further decomposed: D(D) = 0 δ T −VN DVN T 1T + 1 0 δ −VN DVN T − 2 0 0T T 0 −VN DVN (1041) This linear operator D is another flavor of inner-product form and an injective map of the EDM cone onto itself. c Surjectivity of VN (D) is demonstrated via the Gram-form EDM operator D(G) : †T † T Since SN = D(SN ) (1036).32 N (VN ) ⊇ Sc is apparent. c N (T (VN )) = N (V) ⊇ N (VN ) ⊇ SN ⊥ = N (V) c .1] on SN = aff D(EDMN ) = D(aff EDMN ) (127).2. Since Ge1 = 0 ⇔ Xe1 = 0 (936) means the first point in the list X resides at the origin. A.2 Inner-product form bijectivity The Gram-form EDM operator D(G) = δ(G)1T + 1δ(G)T − 2G (934) is an injective map. then D(G) on SN ∩ SN 1 + must be surjective onto EDMN .2).

any EDM may be expressed by the new flavor D(Φ) 0 δ(Φ) 1T + 1 0 δ(Φ)T Φ ⇔ 0 − 2 0 0 0T Φ ∈ EDMN (1049) where this D is a linear surjective operator onto EDMN by definition. we get another flavor T D −VN DVN = 0 0T T 0 −VN DVN (1044) N and we obtain mutual inversion of operators VN and D .6. Revising the argument of h c h this inner-product form (1041). and by bijectivity of this inner-product form: EDMN = D(SN −1 ) + N −1 S+ = VN (EDMN ) (1051) (1052) . injective because it has no nullspace on domain SN −1 .5. INJECTIVITY OF D & UNIQUE RECONSTRUCTION VN (SN ) = SN −1 h 459 (1043) injective on domain SN because SN ⊥ ∩ SN = 0. for D ∈ Sh 0 T δ −VN DVN T 1T + 1 0 δ −VN DVN T −2 D = D VN (D) T T −VN DVN = VN D(−VN DVN ) (1045) (1046) (1047) (1048) or SN = D VN (SN ) h h T T −VN SN VN = VN D(−VN SN VN ) h h Substituting ΘT Θ ← Φ into inner-product form EDM definition (990). More broadly. + + SN = D(SN −1 ) h SN −1 = VN (SN ) h (1050) demonstrably isomorphisms. + aff D(SN −1 ) = D(aff SN −1 ) (127).

( 2. Affine hull A is the unique plane that contains the triangle. 2]. EUCLIDEAN DISTANCE MATRIX 5. For the particular example illustrated in Figure 121.7 Embedding in affine hull The affine hull A (78) of a point list {xℓ } (arranged columnar in X ∈ Rn×N (76)) is identical to the affine hull of that polyhedron P (86) formed from all convex combinations of the xℓ . subtracting any α ∈ A in the affine hull from every list member will work. equal to dimension of the subspace parallel to that nonempty affine set A in which the points are embedded. . P is the triangle in union with its relative interior while its three vertices constitute the entire list X . A ⊇ P ⊇ {xℓ } (1054) Recall: affine dimension r is a lower bound on embedding.33 A − α = aff(X − α1T ) = aff(X) − α P − α = conv(X − α1T ) = conv(X) − α (1055) (1056) (1057) 5. 17] A = aff X = aff P (1053) Comparing hull definitions (78) and (86).460 CHAPTER 5. then the affine hull would instead be the unique line passing through them. r would become 1 while rank would then be 2.3. [61.1 Determining affine dimension Knowledge of affine dimension r becomes important because we lose any absolute offset common to all the generating xℓ in Rn when reconstructing convex polyhedra given only distance information. 5.1) We define dimension of the convex hull P to be the same as dimension r of the affine hull A [309. X − α1T translating A to the origin:5.7. Were there only two points in Figure 121. ( 5. so affine dimension r = 2 in that example while rank of X is 3. but r is not necessarily equal to rank of X (1073).33 The manipulation of hull functions aff and conv follows from their definitions.1) To calculate r .5. we first remove any offset that serves to increase dimensionality of the subspace required to contain polyhedron P . 2] [309. it becomes obvious that the xℓ and their convex hull P are embedded in their unique affine hull A .

Applying (1058) to (1061). x1 ∈ P ⊆ A to the origin in Rn . in particular. X − x1 1T = X − Xe1 1T = X(I − e1 1T ) = X 0 √ 2VN ∈ Rn×N (1061) where VN is defined in (928).1. and whose dimension is more easily calculated.7.1) translates the affine hull to the origin as well. p. Rn ⊇ R(XVN ) = A−x1 = aff(X − x1 1T ) = aff(P − x1 ) ⊇ P − x1 ∋ 0 (1062) n×N −1 where XVN ∈ R . the affine set A − α becomes a unique subspace of Rn in which the {xℓ − α} and their convex hull P − α are embedded (1058). [309. (1056)-(1060) remain true. (77) r dim A = dim(A − α) dim(P − α) = dim P (1060) For any α ∈ Rn .1.5. EMBEDDING IN AFFINE HULL Because (1053) and (1054) translate.0. Subtracting the first member α x1 from every list member will translate their affine hull A and their convex hull P and.12] Yet when α ∈ A . videlicet. Hence r = dim R(XVN ) (1063) Since shifting the geometric center to the origin ( 5. and e1 in (938).4.7. then it must also be true r = dim R(XV ) (1064) .5.1 Example. Translating first list-member to origin. 461 Rn ⊇ A − α = aff(X − α1T ) = aff(P − α) ⊇ P − α ⊇ {xℓ − α} (1058) where from the previous relations it is easily shown aff(P − α) = aff(P) − α (1059) Translating A neither changes its dimension or the dimension of the embedded polyhedron P . 5.0. p.

2) are more closely related. These auxiliary matrices ( B.. [203.g. (confer (943)) Similarly pre. multiplying inner-product form EDM definition (990).4]5. by (1066). 3.1.3] . rank ATA = rank A = rank AT . [333.1 Affine dimension r versus rank Now. EUCLIDEAN DISTANCE MATRIX For any matrix whose range is R(V ) = N (1T ) we get the same result. † V = VN VN R(XV ) = {Xz | z ∈ N (1T )} (1066) (1689) 5. N (ATA) = N (A).4.462 CHAPTER 5.34 (1069) For A ∈ Rm×n . Likewise.7.34 So. it is always true that T T T −VN DVN = 2VN X TXVN = 2VN GVN ∈ SN −1 (1067) where G is a Gram matrix. †T r = dim R(XVN ) (1065) because †T and R(V ) = R(VN ) = R(VN ) ( E). it always holds: T −VN DVN = ΘT Θ ∈ SN −1 (994) For any matrix A . affine dimension †T r = rank XV = rank XVN = rank XVN = rank Θ T T = rank V D V = rank V G V = rank VN DVN = rank VN GVN 5.and postmultiplying by V (1068) −V D V = 2V X TX V = 2V G V ∈ SN always holds because V 1 = 0 (1679). e. 0. suppose D is an EDM as defined by D(X) = δ(X TX)1T + 1δ(X TX)T − 2X TX ∈ EDMN (922) T T and we premultiply by −VN and postmultiply by VN . Then because VN 1 = 0 (929).

N − 1} (1073) is evident from (1061) but can be visualized in the example illustrated in Figure 121. Ge1 = 0 (939) or G1 = 0 (943)  † T T T rank VN DVN = rank V D V = rank VN DVN = rank VN (VN DVN )VN    rank Λ (1160) D ∈ EDMN 0 1T 0 1T    N − 1 − dim N = rank − 2 (1081) 1 −D 1 −D 5.7.35 (confer (954)) r ≤ min{n . The general fact5.7.7.5.0. X e1 = 0 or X 1 = 0 (1074) † T rank VN GVN = rank V G V = rank VN GVN rank G . N } . There we imagine a vector from the origin to each point in the list. EMBEDDING IN AFFINE HULL By conservation of dimension. but affine dimension r is 2 because the three points lie in a plane. 5.1) T r + dim N (VN DVN ) = N − 1 463 (1070) (1071) For D ∈ EDM N r + dim N (V D V ) = N T −VN DVN ≻ 0 ⇔ r = N − 1 (1072) but −V D V ⊁ 0. ( A.3. it becomes the only subspace of dimension r = 2 that can contain the translated triangular polyhedron. When that plane is translated to the origin. Those three vectors are linearly independent in R3 .35 rank X ≤ min{n .2 Pr´cis e We collect expressions for affine dimension r : for list X ∈ Rn×N and Gram matrix G ∈ SN + r = = = = = = = = = = dim(P − α) = dim P = dim conv X dim(A − α) = dim A = dim aff X rank(X − x1 1T ) = rank(X − αc 1T ) rank Θ (992) †T rank XVN = rank XV = rank XVN rank X .

40] ( 5.2). EDM rank versus affine dimension r . 3.7. λ i = 0 (1076) † −V D VN VN vi = λ i vi † † −VN V D VN νi = λ i VN vi † −VN DVN νi = λ i νi (1077) (1078) (1079) † the eigenvectors of −VN DVN are given by (1076) while its corresponding † nonzero eigenvalues are identical to those of −V D V although −VN DVN T is not necessarily positive semidefinite.2] [88.11. In Cayley-Menger form [114. 3] [185. [164.7.3] [50.464 CHAPTER 5. r = rank(D) − 2 ⇔ 1TD† 1 = 0 There can be no circumhypersphere whose relative boundary contains a generating list for the corresponding polyhedron. 7] . In contrast.0. 3] [163. EUCLIDEAN DISTANCE MATRIX † Eigenvalues of −V D V versus −VN DVN 5. r = rank(D) − 1 ⇔ 1TD† 1 = 0 Points constituting a list X generating the polyhedron corresponding to D lie on the relative boundary of an r-dimensional circumhypersphere having √ −1/2 diameter = 2 1TD† 1 (1080) XD† circumcenter = 1TD†1 1 2. 5. 6. 3] For D ∈ EDMN (confer (1234)) 1. i = 1... −VN DVN is positive semidefinite but its nonzero eigenvalues are generally different. 3.N (1075) Then we have: † From these we can determine the eigenvectors and eigenvalues of −VN DVN : Define † νi VN vi .1 Theorem. [350. r = N − 1 − dim N 0 1T 1 −D = rank 0 1T 1 −D −2 (1081) ⋄ Circumhyperspheres exist for r < rank(D)−2.3 Suppose for D ∈ EDMN we are given eigenvectors vi ∈ RN of −V D V and corresponding eigenvalues λ ∈ RN so that −V D V vi = λ i vi .3.

i+1 + d1. 1 1T − D2:N. ATA 0 ⇔ y TATAy = Ay full-rank skinny-or-square.5.8. d1N  (1085) When A is For A ∈ Rm×n . .36 We claim nonnegativity of the dij is enforced primarily by the matrix inequality (1083). T −VN DVN 0 SN h D∈ ⇒ dij ≥ 0 . ATA 0 ..  . . 5.5.j+1 + d1.i+1 )  .2:N + D2:N.j+1 + d1.i+1 − dj+1. .. 2:N ) ∈ SN −1 2    1  2 (d14 + d1N − d4N )  . 1 (d13 + d1N − d3N ) 2 d1.. ··· ··· 1 2 (d12 + d1N − d2N ) 1 2 (d13 + d1N − d3N ) 1 = 2 (1D1.i+1 + d1. then it is apparent from (1067) 0 (1083) because for any matrix A .8.j+1 − di+1. (1073) max{0 .i+1 .8 5. . 1 2 (d14 + d1N − d4N ) ··· .j+1 ) 1 2 (d1.i+1 − dj+1. id est.36 ≥ 0 for all y = 1. . rank(D) − 2} ≤ r ≤ min{n . EUCLIDEAN METRIC VERSUS MATRIX CRITERIA For all practical purposes. . . N − 1} 465 (1082) 5. (1087)) We now support our claim: If any matrix A ∈ Rm×m is positive semidefinite.2] Given D ∈ SN h T −VN DVN =  d12 1  2 (d12 + d13 − d23 )  1  2 (d1. ATA ≻ 0. then its main diagonal δ(A) ∈ Rm must have all nonnegative entries. [160.1 Euclidean metric versus matrix criteria Nonnegativity property 1 T T 2VN X TXVN = −VN DVN When D = [dij ] is an EDM (922). .j+1 ) .j+1 − di+1.i+1 ) 1 2 (d1. . i = j (1084) (The matrix inequality to enforce strict positivity differs by a stroke of the pen. 4. 1 2 (d12 + d1N − d2N ) 1 2 (d12 + d13 − d23 ) d13 1 2 (d1.  .

id est. In other words. then δ(−VN ΞTDΞ rVN ) ∈ RN −1 is some r th permutation of the i row or column of D excepting the 0 entry from the main diagonal. . N − 1}. . nonnegativity symmetry and hollowness become necessary and sufficient criteria satisfying matrix inequality (1083).2) of the Euclidean metric.15] that properties 2 through 4 of the Euclidean metric ( 5. −VN DVN 0 ⇔ −VN ΞTDΞ r VN 0 (⇔ −ΞT VN DVN Ξc 0).1 prob. then entries of D off the main diagonal must be strictly positive {dij > 0.8.1.8. [314] It follows: T −VN DVN 0 D ∈ SN h ⇒ T δ(−VN DVN ) Multiplication of VN by any permutation matrix Ξ has null effect on its range and nullspace. .j ∈ {1 . Hence −VN DVN 0 is a sufficient test for the first property ( 5.1 Strict positivity   =   d12 d13 .1) 5. 1) = 1.5. i = j} and only those entries along the main diagonal of D are 0.0. ( 6. T The rule of thumb is: If Ξ r (i . . By similar argument. 5.466 CHAPTER 5. the strict matrix inequality is a sufficient test for strict positivity of Euclidean distance-square.0. nonnegativity. EUCLIDEAN DISTANCE MATRIX where row. any permutation of the rows or columns of VN produces a basis for N (1T ).37 . T T T Hence. 1. R(Ξ r VN ) = R(VN Ξc ) = R(VN ) = N (1T ).2) together imply nonnegativity property 1.column indices i. When affine dimension r equals 1. T −VN DVN ≻ 0 D ∈ SN h ⇒ dij > 0 .2 Triangle inequality property 4 In light of Kreyszig’s observation [228. r c 5. i = j (1087) 5. in particular.37 Various permutation matrices will sift the remaining dij similarly T to (1086) thereby proving their nonnegativity. d1N      0 (1086) Should we require the points in Rn to be distinct.

5.8. EUCLIDEAN METRIC VERSUS MATRIX CRITERIA 2 djk = djk + dkj ≥ djj = 0 , j =k

467 (1088) 0

T nonnegativity criterion (1084) suggests that matrix inequality −VN DVN might somehow take on the role of triangle inequality; id est,

We now show that is indeed the case: Let T be the leading principal T submatrix in S2 of −VN DVN (upper left 2×2 submatrix from (1085)); T d12 1 (d12 +d13 −d23 ) 2
1 (d +d13 −d23 ) 2 12

 δ(D) = 0  DT = D ⇒  T −VN DVN 0

dij ≤

dik +

dkj ,

i=j =k

(1089)

d13

(1090)

T Submatrix T must be positive (semi)definite whenever −VN DVN ( A.3.1.0.4, 5.8.3) Now we have,

is.

T −VN DVN 0 ⇒ T 0 ⇔ λ1 ≥ λ2 ≥ 0 T −VN DVN ≻ 0 ⇒ T ≻ 0 ⇔ λ1 > λ2 > 0

(1091)

where λ1 and λ2 are the eigenvalues of T , real due only to symmetry of T : λ1 = λ2 =
1 2 1 2

d12 + d13 + d12 + d13 −

2 2 2 d23 − 2(d12 + d13 )d23 + 2(d12 + d13 )

2 2 2 d23 − 2(d12 + d13 )d23 + 2(d12 + d13 )

∈R ∈R

(1092)

Nonnegativity of eigenvalue λ1 is guaranteed by only nonnegativity of the dij which in turn is guaranteed by matrix inequality (1084). Inequality between the eigenvalues in (1091) follows from only realness of the dij . Since λ1 always equals or exceeds λ2 , conditions for positive (semi)definiteness of submatrix T can be completely determined by examining λ2 the smaller of its two eigenvalues. A triangle inequality is made apparent when we express

468

CHAPTER 5. EUCLIDEAN DISTANCE MATRIX

T eigenvalue nonnegativity in terms of D matrix entries; videlicet, T 0 ⇔ det T = λ1 λ2 ≥ 0 , d12 , d13 ≥ 0 ⇔ λ2 ≥ 0 ⇔ √ √ √ √ √ | d12 − d23 | ≤ d13 ≤ d12 + d23 (c) (b) (a) (1093)

Triangle inequality (1093a) (confer (988) (1105)), in terms of three rooted entries from D , is equivalent to metric property 4 d13 ≤ d23 ≤ d12 ≤ d12 + d12 + d13 + d23 d13 d23

(1094)

for the corresponding points x1 , x2 , x3 from some length-N list.5.38

5.8.2.1

Comment

Given D whose dimension N equals or exceeds 3, there are N !/(3!(N − 3)!) distinct triangle inequalities in total like (988) that must be satisfied, of which each dij is involved in N − 2, and each point xi is in (N − 1)!/(2!(N −1 − 2)!). We have so far revealed only one of those triangle inequalities; namely, T (1093a) that came from T (1090). Yet we claim if −VN DVN 0 then all triangle inequalities will be satisfied simultaneously; | dik − dkj | ≤ dij ≤ dik + dkj , i<k<j (1095)

(There are no more.) To verify our claim, we must prove the matrix inequality T −VN DVN 0 to be a sufficient test of all the triangle inequalities; more efficient, we mention, for larger N :
Accounting for symmetry property 3, the fourth metric property demands three inequalities be satisfied per one of type (1093a). The first of those inequalities in (1094) is self evident from (1093a), while the two remaining follow from the left-hand side of (1093a) and the fact for scalars, |a| ≤ b ⇔ a ≤ b and −a ≤ b .
5.38

5.8. EUCLIDEAN METRIC VERSUS MATRIX CRITERIA

469

5.8.2.1.1 Shore. The columns of Ξ r VN Ξc hold a basis for N (1T ) when Ξ r and Ξc are permutation matrices. In other words, any permutation of the rows or columns of VN leaves its range and nullspace unchanged; id est, R(Ξ r VN Ξc ) = R(VN ) = N (1T ) (929). Hence, two distinct matrix inequalities can be equivalent tests of the positive semidefiniteness of D on T R(VN ) ; id est, −VN DVN 0 ⇔ −(Ξ r VN Ξc )TD(Ξ r VN Ξc ) 0. By properly choosing permutation matrices,5.39 the leading principal submatrix TΞ ∈ S2 of −(Ξ r VN Ξc )TD(Ξ r VN Ξc ) may be loaded with the entries of D needed to test any particular triangle inequality (similarly to (1085)-(1093)). Because all the triangle inequalities can be individually tested using a test equivalent T to the lone matrix inequality −VN DVN 0, it logically follows that the lone matrix inequality tests all those triangle inequalities simultaneously. We T conclude that −VN DVN 0 is a sufficient test for the fourth property of the Euclidean metric, triangle inequality. 5.8.2.2 Strict triangle inequality

Without exception, all the inequalities in (1093) and (1094) can be made strict while their corresponding implications remain true. The then strict inequality (1093a) or (1094) may be interpreted as a strict triangle inequality under which collinear arrangement of points is not allowed. T [224, 24/6, p.322] Hence by similar reasoning, −VN DVN ≻ 0 is a sufficient test of all the strict triangle inequalities; id est,  δ(D) = 0  ⇒ DT = D  T −VN DVN ≻ 0

dij <

dik +

dkj ,

i=j =k

(1096)

5.8.3

T From (1090) observe that T = −VN DVN |N ← 3 . In fact, for D ∈ EDMN , T the leading principal submatrices of −VN DVN form a nested sequence (by inclusion) whose members are individually positive semidefinite [160] [203] [333] and have the same form as T ; videlicet,5.40

T −VN DVN nesting

To individually test triangle inequality | dik − dkj | ≤ dij ≤ particular i, k , j, set Ξ r (i, 1) = Ξ r (k , 2) = Ξ r (j , 3) = 1 and Ξc = I . 0 5.40 −V D V |N ← 1 = 0 ∈ S+ ( B.4.1)

5.39

dik +

dkj for

470
T −VN DVN |N ← 1 = [ ∅ ]

CHAPTER 5. EUCLIDEAN DISTANCE MATRIX (o) (a)
1 (d +d13 −d23 ) 2 12

T −VN DVN |N ← 2 = [d12 ] ∈ S+ T −VN DVN |N ← 3 =

d12 1 (d +d13 −d23 ) 2 12 

d13

= T ∈ S2 +
1 (d +d14 −d24 ) 2 12 1 (d +d14 −d34 ) 2 13

(b) 

T −VN DVN |N ← 4

d12  1 =  2 (d12 +d13 −d23 ) 1 (d +d14 −d24 ) 2 12 . . .

1 (d +d13 −d23 ) 2 12

d13 1 (d +d14 −d34 ) 2 13

d14

  (c)

T −VN DVN |N ← i = 

T −VN DVN |N ← i−1 ν(i)

. . .

ν(i)T

d1i

i−1  ∈ S+

(d)

T −VN DVN

= 

T −VN DVN |N ← N −1 ν(N )

ν(N )

T

d1N

 ∈ SN −1 + 

(e) (1097)

where 1   2  d12 +d1i −d2i d13 +d1i −d3i . . . d1,i−1 +d1i −di−1,i    ∈ Ri−2 , 

ν(i)

i>2

(1098)

Hence, the leading principal submatrices of EDM D must also be EDMs.5.41 Bordered symmetric matrices in the form (1097d) are known to have intertwined [333, 6.4] (or interlaced [203, 4.3] [330, IV.4.1]) eigenvalues; (confer 5.11.1) that means, for the particular submatrices (1097a) and (1097b),
5.41

In fact, each and every principal submatrix of an EDM D is another EDM. [236, 4.1]

5.8. EUCLIDEAN METRIC VERSUS MATRIX CRITERIA λ2 ≤ d12 ≤ λ1

471 (1099)

where d12 is the eigenvalue of submatrix (1097a) and λ1 , λ2 are the eigenvalues of T (1097b) (1090). Intertwining in (1099) predicts that should d12 become 0, then λ2 must go to 0 .5.42 The eigenvalues are similarly intertwined for submatrices (1097b) and (1097c); γ 3 ≤ λ 2 ≤ γ2 ≤ λ 1 ≤ γ1 (1100)

where γ1 , γ2 , γ3 are the eigenvalues of submatrix (1097c). Intertwining likewise predicts that should λ2 become 0 (a possibility revealed in 5.8.3.1), then γ3 must go to 0. Combining results so far for N = 2, 3, 4: (1099) (1100) γ3 ≤ λ2 ≤ d12 ≤ λ1 ≤ γ1 (1101)

The preceding logic extends by induction through the remaining members of the sequence (1097). 5.8.3.1 Tightening the triangle inequality

Now we apply Schur complement from A.4 to tighten the triangle inequality from (1089) in case: cardinality N = 4. We find that the gains by doing so are modest. From (1097) we identify: A B BT C A
T −VN DVN |N ← 4 T T = −VN DVN |N ← 3

(1102) (1103)

both positive semidefinite by assumption, where B = ν(4) (1098), and C = d14 . Using nonstrict CC † -form (1544), C 0 by assumption ( 5.8.1) and CC † = I . So by the positive semidefinite ordering of eigenvalues theorem ( A.3.1.0.1),
T −VN DVN |N ← 4

0 ⇔ T

−1 d14 ν(4)ν(4)T ⇒

−1 λ1 ≥ d14 ν(4) λ2 ≥ 0

2

(1104)

−1 −1 where {d14 ν(4) 2 , 0} are the eigenvalues of d14 ν(4)ν(4)T while λ1 , λ2 are the eigenvalues of T .

If d12 were 0, eigenvalue λ2 becomes 0 (1092) because d13 must then be equal to d23 ; id est, d12 = 0 ⇔ x1 = x2 . ( 5.4)

5.42

472

CHAPTER 5. EUCLIDEAN DISTANCE MATRIX

5.8.3.1.1 Example. Small completion problem, II. Applying the inequality for λ1 in (1104) to the small completion problem on √ page 412 Figure 122, the lower bound on d14 (1.236 in (915)) is tightened √ to 1.289 . The correct value of d14 to three significant figures is 1.414 .

5.8.4

Affine dimension reduction in two dimensions

T (confer 5.14.4) The leading principal 2×2 submatrix T of −VN DVN has largest eigenvalue λ1 (1092) which is a convex function of D .5.43 λ1 can never be 0 unless d12 = d13 = d23 = 0. Eigenvalue λ1 can never be negative while the dij are nonnegative. The remaining eigenvalue λ2 (1092) is a concave function of D that becomes 0 only at the upper and lower bounds of triangle inequality (1093a) and its equivalent forms: (confer (1095))

√ √ √ √ √ | d12 − d23 | ≤ d13 ≤ d12 + d23 √ √ √⇔ √ √ | d12 − d13 | ≤ d23 ≤ d12 + d13 √ √⇔ √ √ √ | d13 − d23 | ≤ d12 ≤ d13 + d23

(a) (b) (c) (1105)

In between those bounds, λ2 is strictly positive; otherwise, it would be negative but prevented by the condition T 0. When λ2 becomes 0, it means triangle △123 has collapsed to a line segment; a potential reduction in affine dimension r . The same logic is valid T for any particular principal 2×2 submatrix of −VN DVN , hence applicable to other triangles.
The largest eigenvalue of any symmetric matrix is always a convex function of its entries, while the smallest eigenvalue is always concave. [61, exmp.3.10] In our particular   d12 case, say d d13 ∈ R3 . Then the Hessian (1799) ∇ 2 λ1 (d) 0 certifies convexity d23 whereas ∇ 2 λ2 (d) 0 certifies concavity. Each Hessian has rank equal to 1. The respective gradients ∇λ1 (d) and ∇λ2 (d) are nowhere 0 and can be uniquely defined.
5.43

5.9. BRIDGE: CONVEX POLYHEDRA TO EDMS

473

5.9

Bridge: Convex polyhedra to EDMs

The criteria for the existence of an EDM include, by definition (922) (990), the properties imposed upon its entries dij by the Euclidean metric. From 5.8.1 and 5.8.2, we know there is a relationship of matrix criteria to those properties. Here is a snapshot of what we are sure: for i , j , k ∈ {1 . . . N } (confer 5.2) dij dij dij dij ≥ 0, i = j T −VN DVN 0 = 0, i = j δ(D) = 0 ⇐ = dji DT = D ≤ dik + dkj , i = j = k

(1106)

all implied by D ∈ EDMN . In words, these four Euclidean metric properties are necessary conditions for D to be a distance matrix. At the moment, we have no converse. As of concern in 5.3, we have yet to establish metric requirements beyond the four Euclidean metric properties that would allow D to be certified an EDM or might facilitate polyhedron or list reconstruction from an incomplete EDM. We deal with this problem in 5.14. Our present goal is to establish ab initio the necessary and sufficient matrix criteria that will subsume all the Euclidean metric properties and any further requirements5.44 for all N > 1 ( 5.8.3); id est,
T −VN DVN 0 D ∈ SN h

⇔ D ∈ EDMN

(941)

or for EDM definition (999), Ω δ(d) 0 0 ⇔ D = D(Ω , d) ∈ EDMN (1107)

T In 1935, Schoenberg [314, (1)] first extolled matrix product −VN DVN (1085) (predicated on symmetry and self-distance) specifically incorporating VN , albeit T algebraically. He showed: nonnegativity −y T VN DVN y ≥ 0, for all y ∈ RN −1 , is necessary and sufficient for D to be an EDM. Gower [163, 3] remarks how surprising it is that such a fundamental property of Euclidean geometry was obtained so late. 5.44

474

CHAPTER 5. EUCLIDEAN DISTANCE MATRIX

Figure 136: Elliptope E 3 in isometrically isomorphic R6 (projected on R3 ) is a convex body that appears to possess some kind of symmetry in this dimension; it resembles a malformed pillow in the shape of a bulging tetrahedron. Elliptope relative boundary is not smooth and comprises all set members (1108) having at least one 0 eigenvalue. [239, 2.1] This elliptope has an infinity of vertices, but there are only four vertices corresponding to a rank-1 matrix. Those yy T , evident in the illustration, have binary vector y ∈ R3 with entries in {±1}.

5.9. BRIDGE: CONVEX POLYHEDRA TO EDMS

475

0

Figure 137: Elliptope E 2 in isometrically isomorphic R3 is a line segment illustrated interior to positive semidefinite cone S2 (Figure 43). +

5.9.1

Geometric arguments

5.9.1.0.1 Definition. Elliptope: [239] [236, 2.3] [114, 31.5] a unique bounded immutable convex Euclidean body in S n ; intersection of n positive semidefinite cone S+ with that set of n hyperplanes defined by unity main diagonal; n En S+ ∩ {Φ ∈ S n | δ(Φ) = 1} (1108) a.k.a the set of all correlation matrices of dimension dim E n = n(n − 1)/2 in Rn(n+1)/2 (1109)

An elliptope E n is not a polyhedron, in general, but has some polyhedral faces and an infinity of vertices.5.45 Of those, 2n−1 vertices (some extreme points of the elliptope) are extreme directions yy T of the positive semidefinite cone, where the entries of vector y ∈ Rn belong to {±1} and exercise every combination. Each of the remaining vertices has rank, greater than one, belonging to the set {k > 0 | k(k + 1)/2 ≤ n}. Each and every face of an elliptope is exposed. △
Laurent defines vertex distinctly from the sense herein ( 2.6.1.0.1); she defines vertex as a point with full-dimensional (nonempty interior) normal cone ( E.10.3.2.1). Her definition excludes point C in Figure 32, for example.
5.45

476

CHAPTER 5. EUCLIDEAN DISTANCE MATRIX

In fact, any positive semidefinite matrix whose entries belong to {±1} is a rank-one correlation matrix; and vice versa:5.46 5.9.1.0.2 Theorem. Elliptope vertices rank-one. [126, 2.1.1] For Y ∈ S n , y ∈ Rn , and all i, j ∈ {1 . . . n} (confer 2.3.1.0.1) Y 0, Yij ∈ {±1} ⇔ Y = yy T , yi ∈ {±1}

(1110) ⋄

The elliptope for dimension n = 2 is a line segment in isometrically n isomorphic Rn(n+1)/2 (Figure 137). Obviously, cone(E n ) = S+ . The elliptope for dimension n = 3 is realized in Figure 136. 5.9.1.0.3 Lemma. Hypersphere. (confer bullet p.422) [17, 4] N Matrix Ψ = [Ψij ] ∈ S belongs to the elliptope in SN iff there exist N points p on the boundary of a hypersphere in Rrank Ψ having radius 1 such that pi − pj
2

= 2(1 − Ψij ) ,

i, j = 1 . . . N

(1111) ⋄

There is a similar theorem for Euclidean distance matrices: We derive matrix criteria for D to be an EDM, validating (941) using simple geometry; distance to the polyhedron formed by the convex hull of a list of points (76) in Euclidean space Rn . 5.9.1.0.4 EDM assertion. D is a Euclidean distance matrix if and only if D ∈ SN and distances-square h from the origin correspond to points p in some bounded convex polyhedron having N or fewer vertices embedded in an r-dimensional subspace A − α of Rn , where α ∈ A = aff P and where domain of linear surjection p(y) is the unit simplex S ⊂ RN −1 shifted such that its vertex at the origin is translated + to −β in RN −1 . When β = 0, then α = x1 . ⋄
As there are few equivalent conditions for rank constraints, this device is rather important for relaxing integer, combinatorial, or Boolean problems.
5.46

{ p(y)

2

T = −y T VN DVN y | y ∈ S − β}

(1112) (1113)

P − α = {p(y) | y ∈ S − β}

5.9. BRIDGE: CONVEX POLYHEDRA TO EDMS

477

where e1 is as in (938). Incidental to the EDM assertion, shifting the unit-simplex domain in RN −1 translates the polyhedron P in Rn . Indeed, there is a map from vertices of the unit simplex to members of the list generating P ; p : RN −1 →                Rn   −β   e1 − β     e2 − β p  .  .  .    e N −1 − β   x1 − α    x2 − α   x3 − α  .  .  .    x −α N             

In terms of VN , the unit simplex (292) in RN −1 has an equivalent representation: √ (1114) S = {s ∈ RN −1 | 2VN s −e1 }

(1115)

=

5.9.1.0.5 Proof. EDM assertion. (⇒) We demonstrate that if D is an EDM, then each distance-square p(y) 2 described by (1112) corresponds to a point p in some embedded polyhedron P − α . Assume D is indeed an EDM; id est, D can be made from some list X of N unknown points in Euclidean space Rn ; D = D(X) for X ∈ Rn×N as in (922). Since D is translation invariant ( 5.5.1), we may shift the affine hull A of those unknown points to the origin as in (1055). Then take any point p in their convex hull (86); P − α = {p = (X − Xb 1T )a | aT 1 = 1, a 0} (1116)

where α = Xb ∈ A ⇔ bT 1 = 1. Solutions to aT 1 = 1 are:5.47 √ a ∈ e1 + 2VN s | s ∈ RN −1

(1117)

√ where e1 is as in (938). Similarly, b = e1 + 2VN β . √ √ √ P − α = {p = X(I − (e1 + 2V√β)1T )(e1 + 2VN s) | 2VN s −e1 } N √ (1118) = {p = X 2VN (s − β) | 2VN s −e1 }
5.47

Since R(VN ) = N (1T ) and N (1T ) ⊥ R(1) , then over all s ∈ RN −1 , VN s is a hyperplane through the origin orthogonal to 1. Thus the solutions {a} constitute a hyperplane orthogonal to the vector 1, and offset from the origin in RN by any particular solution; in this case, a = e1 .

478

CHAPTER 5. EUCLIDEAN DISTANCE MATRIX

that describes the domain of p(s) as the unit simplex √ S = {s | 2VN s −e1 } ⊂ RN −1 + Making the substitution s − β ← y √ P − α = {p = X 2VN y | y ∈ S − β}

(1114)

(1119)

Point p belongs to a convex polyhedron P −α embedded in an r-dimensional subspace of Rn because the convex hull of any list forms a polyhedron, and because the translated affine hull A − α contains the translated polyhedron P − α (1058) and the origin (when α ∈ A), and because A has dimension r by definition (1060). Now, any distance-square from the origin to the polyhedron P − α can be formulated {pTp = p
2 T = 2y T VN X TXVN y | y ∈ S − β}

(1120)

Applying (1067) to (1120) we get (1112). (⇐) To validate the EDM assertion in the reverse direction, we prove: If each distance-square p(y) 2 (1112) on the shifted unit-simplex S − β ⊂ RN −1 corresponds to a point p(y) in some embedded polyhedron P − α , then D is an EDM. The r-dimensional subspace A − α ⊆ Rn is spanned by p(S − β) = P − α (1121)

because A − α = aff(P − α) ⊇ P − α (1058). So, outside domain S − β of linear surjection p(y) , simplex complement \S − β ⊂ RN −1 must contain domain of the distance-square p(y) 2 = p(y)T p(y) to remaining points in subspace A − α ; id est, to the polyhedron’s relative exterior \P − α . For T p(y) 2 to be nonnegative on the entire subspace A − α , −VN DVN must be positive semidefinite and is assumed symmetric;5.48
T −VN DVN

ΘT Θp p

(1122)

where5.49 Θp ∈ Rm×N −1 for some m ≥ r . Because p(S − β) is a convex polyhedron, it is necessarily a set of linear combinations of points from some
T T The antisymmetric part −VN DVN − (−VN DVN )T /2 is annihilated by p(y) 2 . By the same reasoning, any positive (semi)definite matrix A is generally assumed symmetric because only the symmetric part (A + AT )/2 survives the test y TA y ≥ 0. [203, 7.1] 5.49 T A = A 0 ⇔ A = RTR for some real matrix R . [333, 6.3] 5.48

5.9. BRIDGE: CONVEX POLYHEDRA TO EDMS

479

length-N list because every convex polyhedron having N or fewer vertices can be generated that way ( 2.12.2). Equivalent to (1112) are {pTp | p ∈ P − α} = {pTp = y T ΘT Θp y | y ∈ S − β} p (1123)

Because p ∈ P − α may be found by factoring (1123), the list Θp is found by factoring (1122). A unique EDM can be made from that list using inner-product form definition D(Θ)|Θ=Θp (990). That EDM will be identical to D if δ(D) = 0, by injectivity of D (1041).

5.9.2

Necessity and sufficiency

T From (1083) we learned that matrix inequality −VN DVN 0 is a necessary test for D to be an EDM. In 5.9.1, the connection between convex polyhedra and EDMs was pronounced by the EDM assertion; the matrix inequality together with D ∈ SN became a sufficient test when the EDM assertion h demanded that every bounded convex polyhedron have a corresponding EDM. For all N > 1 ( 5.8.3), the matrix criteria for the existence of an EDM in (941), (1107), and (917) are therefore necessary and sufficient and subsume all the Euclidean metric properties and further requirements.

5.9.3

Example revisited

Now we apply the necessary and sufficient EDM criteria (941) to an earlier problem. 5.9.3.0.1 Example. Small completion problem, III. (confer 5.8.3.1.1) Continuing Example 5.3.0.0.2 pertaining to Figure 122 where N = 4, T distance-square d14 is ascertainable from the matrix inequality −VN DVN 0. √ Because all distances in (914) are known except d14 , we may simply T calculate the smallest eigenvalue of −VN DVN over a range of d14 as in Figure 138. We observe a unique value of d14 satisfying (941) where the abscissa axis is tangent to the hypograph of the smallest eigenvalue. Since the smallest eigenvalue of a symmetric matrix is known to be a concave function ( 5.8.4), we calculate its second partial derivative with respect to d14 evaluated at 2 and find −1/3. We conclude there are no other satisfying values of d14 . Further, that value of d14 does not meet an upper or lower bound of a triangle inequality like (1095), so neither does it cause the collapse

480

CHAPTER 5. EUCLIDEAN DISTANCE MATRIX

of any triangle. Because the smallest eigenvalue is 0, affine dimension r of any point list corresponding to D cannot exceed N − 2. ( 5.7.1.1)

5.10

EDM-entry composition

Laurent [236, 2.3] applies results from Schoenberg (1938) [315] to show certain nonlinear compositions of individual EDM entries yield EDMs; in particular, D ∈ EDMN ⇔ [1 − e−α dij ] ∈ EDMN ∀ α > 0 ⇔ [e−α dij ] ∈ E N ∀α> 0 where D = [dij ] and E N is the elliptope (1108). 5.10.0.0.1 Proof. (Monique Laurent, 2003) [315] (confer [228]) (a) (b) (1124)

Lemma 2.1. from A Tour d’Horizon . . . on Completion Problems. [236] For D = [dij , i, j = 1 . . . N ] ∈ SN and E N the elliptope in SN h ( 5.9.1.0.1), the following assertions are equivalent: (i) D ∈ EDMN

(ii) e−αD

(iii) 11T − e−αD

[e−αdij ] ∈ E N for all α > 0

[1 − e−αdij ] ∈ EDMN for all α > 0

1) Equivalence of Lemma 2.1 (i) (ii) is stated in Schoenberg’s Theorem 1 [315, p.527]. 2) (ii) ⇒ (iii) can be seen from the statement in the beginning of section 3, saying that a distance space embeds in L2 iff some associated matrix is PSD. We reformulate it: Let d = (dij )i,j=0,1...N be a distance space on N + 1 points (i.e., symmetric hollow matrix of order N + 1) and let p = (pij )i,j=1...N be the symmetric matrix of order N related by: (A) 2pij = d0i + d0j − dij for i, j = 1 . . . N or equivalently (B) d0i = pii , dij = pii + pjj − 2pij for i, j = 1 . . . N

5.10. EDM-ENTRY COMPOSITION smallest eigenvalue
1 2 3 4

481

d14

-0.2

-0.4

-0.6

T Figure 138: Smallest eigenvalue of −VN DVN makes it a PSD matrix for only one value of d14 : 2.

1.4 1.2 1 0.8

dij

1/α 1/α

log2 (1+ dij ) 1− e−α dij

(a)

0.6 0.4 0.2 0.5 4 1 1.5 2

dij

3

(b)

2

1

dij 1/α log2 (1+ dij ) 1− e−α dij
0.5 1 1.5 2

1/α

α

Figure 139: Some entrywise EDM compositions: (a) α = 2. Concave nondecreasing in dij . (b) Trajectory convergence in α for dij = 2.

482

CHAPTER 5. EUCLIDEAN DISTANCE MATRIX Then d embeds in L2 iff p is a positive semidefinite matrix iff d is of negative type (second half page 525/top of page 526 in [315]). For the implication from (ii) to (iii), set: p = e−αd and define d ′ from p using (B) above. Then d ′ is a distance space on N + 1 points that embeds in L2 . Thus its subspace of N points also embeds in L2 and is precisely 1 − e−αd . Note that (iii) ⇒ (ii) cannot be read immediately from this argument since (iii) involves the subdistance of d ′ on N points (and not the full d ′ on N + 1 points).

3) Show (iii) ⇒ (i) by using the series expansion of the function 1 − e−αd : the constant term cancels, α factors out; there remains a summation of d plus a multiple of α . Letting α go to 0 gives the result. This is not explicitly written in Schoenberg, but he also uses such an argument; expansion of the exponential function then α → 0 (first proof on [315, p.526]). Schoenberg’s results [315, 6 thm.5] (confer [228, p.108-109]) also suggest certain finite positive roots of EDM entries produce EDMs; specifically, D ∈ EDMN ⇔ [dij ] ∈ EDMN
1/α

∀α> 1

(1125)

The special case α = 2 is of interest because it corresponds to absolute distance; e.g., √ ◦ D ∈ EDMN ⇒ D ∈ EDMN (1126) Assuming that points constituting a corresponding list X are distinct (1087), then it follows: for D ∈ SN h
α→∞

lim [dij ] = lim [1 − e−α dij ] = −E
α→∞

1/α

11T − I

(1127)

Negative elementary matrix −E ( B.3) is relatively interior to the EDM cone ( 6.5) and terminal to respective trajectories (1124a) and (1125) as functions of α . Both trajectories are confined to the EDM cone; in engineering terms, the EDM cone is an invariant set [312] to either trajectory. Further, if D is not an EDM but for some particular αp it becomes an EDM, then for all greater values of α it remains an EDM.

5.11. EDM INDEFINITENESS

483

5.10.0.0.2 Exercise. Concave nondecreasing EDM-entry composition. Given EDM D = [dij ] , empirical evidence suggests that the composition 1/α [log2 (1 + dij )] is also an EDM for each fixed α ≥ 1 [sic] . Its concavity in dij is illustrated in Figure 139 together with functions from (1124a) and (1125). Prove whether it holds more generally: Any concave nondecreasing composition of individual EDM entries dij on R+ produces another EDM.

5.10.0.0.3 Exercise. Taxicab distance matrix as EDM. Determine whether taxicab distance matrices (D1 (X) in Example 3.7.0.0.2) are all numerically equivalent to EDMs. Explain why or why not.

5.10.1

EDM by elliptope

(confer (948)) For some κ ∈ R+ and C ∈ SN in elliptope E N ( 5.9.1.0.1), + Alfakih asserts: any given EDM D is expressible [9] [114, 31.5] D = κ(11T − C ) ∈ EDMN (1128) This expression exhibits nonlinear combination of variables κ and C . We therefore propose a different expression requiring redefinition of the elliptope (1108) by scalar parametrization;
n where, of course, E n = E1 . Then any given EDM D is expressible

Etn

n S+ ∩ {Φ ∈ S n | δ(Φ) = t1}

(1129) (1130)

which is linear in variables t ∈ R+ and E ∈ EtN .

D = t11T − E ∈ EDMN

5.11

EDM indefiniteness

By known result ( A.7.2) regarding a 0-valued entry on the main diagonal of a symmetric positive semidefinite matrix, there can be no positive or negative semidefinite EDM except the 0 matrix because EDMN ⊆ SN (921) and h the origin. So when D ∈ EDMN , there can be no factorization D = ATA or −D = ATA . [333, 6.3] Hence eigenvalues of an EDM are neither all nonnegative or all nonpositive; an EDM is indefinite and possibly invertible. SN ∩ SN = 0 h + (1131)

N − 1} the T eigenvalues of −VN DVN and denote eigenvalues of the congruence −W TDW by {ζ i . if we denote by {γi .3] [333. N − 1. 4. i = 1 . ζi ≥ 0. then by theory of interlacing eigenvalues for bordered symmetric matrices [203. then γi ≥ 0 ∀ i (1483) because −VN DVN 0 as we know.50 All entries of the corresponding eigenvector must have the same sign.4.3. [89. . . and negative eigenvalues as −D . zero. .1 EDM eigenvalues. By congruence. . [203. 6. nonnegative matrices are guaranteed only to have at least one nonnegative eigenvalue. nontrivial −D must therefore have exactly one negative eigenvalue. . IV. EUCLIDEAN DISTANCE MATRIX 5. . an impossibility. Unlike positive semidefinite matrices. i = 1 .1] ζN ≤ γN −1 ≤ ζN −1 ≤ γN −2 ≤ · · · ≤ γ2 ≤ ζ2 ≤ γ1 ≤ ζ1 (1135) T When D ∈ EDMN .11. we can characterize its eigenvalues by congruence transformation: [333. . then (1549) [ VN 1 ] ∈ RN ×N (1133) inertia(−D) = inertia −W TDW (1134) the congruence (1132) has the same number of positive. That means the congruence must have N − 1 nonnegative eigenvalues.4.4] [330. with respect to each other. Further.50 [114.484 CHAPTER 5. 2. p.5. congruence transformation For any symmetric −D .1] the predominant characteristic of square nonnegative matrices. The remaining eigenvalue ζN cannot be nonnegative because then −D would be positive semidefinite. N } and if we arrange each respective set of eigenvalues in nonincreasing order. 6.116] because that eigenvector is the Perron vector corresponding to spectral radius. so ζN < 0.5] 5. i = 1 . 8.3] T T VN T T VN DVN VN D1 −W D W = − Because 1 T D [ VN 1] = − 1 DVN T 1 D1 T ∈ SN (1132) W is full-rank.

Then for D ∈ EDM^N

   D ∈ EDM^N ⇒ { λ(−D)_i ≥ 0, i = 1 ... N−1 ;  Σ_{i=1}^N λ(−D)_i = 0 ;  D ∈ S^N_h ∩ R^{N×N}_+ }    (1136)

where the λ(−D)_i are nonincreasingly ordered eigenvalues of −D whose sum must be 0 only because tr D = 0 [333, §5.1]. The eigenvalue summation condition, therefore, can be considered redundant. Even so, all these conditions are insufficient to determine whether some given H ∈ S^N_h is an EDM, as shown by counterexample.^{5.51}

^{5.51} When N = 3, for example, the symmetric hollow matrix

   H = [0 1 1 ; 1 0 5 ; 1 5 0] ∈ S^N_h ∩ R^{N×N}_+

is not an EDM, although λ(−H) = [5 0.3723 −5.3723]^T conforms to (1136).

5.11.1.0.1 Exercise. Spectral inequality. Prove whether it holds: for D = [d_ij] ∈ EDM^N

   λ(−D)_1 ≥ d_ij ≥ λ(−D)_{N−1}   ∀ i ≠ j    (1137)

5.11.2 Spectral cones

Terminology: a spectral cone is a convex cone containing all eigenspectra [228] corresponding to some set of matrices. Any positive semidefinite matrix, for example, possesses a vector of nonnegative eigenvalues corresponding to an eigenspectrum contained in a spectral cone that is a nonnegative orthant. Denoting the eigenvalues of Cayley-Menger matrix [0 1^T ; 1 −D] ∈ S^{N+1} by

   λ([0 1^T ; 1 −D]) ∈ R^{N+1}    (1138)
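The footnoted counterexample is easily reproduced (a sketch):

   % H satisfies (1136) yet fails the Schoenberg criterion, so H is not an EDM
   H = [0 1 1; 1 0 5; 1 5 0];
   disp(sort(eig(-H),'descend')')                % approx [5 0.3723 -5.3723]
   VN = [-1 -1; 1 0; 0 1]/sqrt(2);
   disp(min(eig(-VN'*H*VN)))                     % negative (-0.5): not an EDM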

we have the Cayley-Menger form (§5.7.3.0.1) of necessary and sufficient conditions for D ∈ EDM^N from the literature: [185] [17] [218] (confer (941) (917))

   D ∈ EDM^N ⇔ { λ([0 1^T ; 1 −D])_i ≥ 0, i = 1 ... N ;  D ∈ S^N_h }  ⇔  { −V_N^T D V_N ⪰ 0 ;  D ∈ S^N_h }    (1139)

These conditions say the Cayley-Menger form has one and only one negative eigenvalue.^{5.52} When D is an EDM, eigenvalues λ([0 1^T ; 1 −D]) belong to that particular orthant in R^{N+1} having the N+1th coordinate as sole negative coordinate^{5.53}:

   [R^N_+ ; R_−] = cone{e_1, e_2, ··· e_N, −e_{N+1}}    (1140)

^{5.52} Recall: for D ∈ S^N_h, −V_N^T D V_N ⪰ 0 subsumes nonnegativity property 1 (§5.8.1).
^{5.53} Empirically, all except one entry of the corresponding eigenvector have the same sign with respect to each other.

5.11.2.1 Cayley-Menger versus Schoenberg

Connection to the Schoenberg criterion (941) is made when the Cayley-Menger form is further partitioned:

   [0 1^T ; 1 −D] = [ [0 1 ; 1 0]   [1^T ; −D_{1,2:N}] ; [1  −D_{2:N,1}]   −D_{2:N,2:N} ]    (1141)

Matrix D ∈ S^N_h is an EDM if and only if the Schur complement of [0 1 ; 1 0] (§A.4) in this partition is positive semidefinite. [74] [114]
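A numerical illustration of (1139) (a sketch; random points):

   % Cayley-Menger form of an EDM has one and only one negative eigenvalue
   N = 4;  X = randn(2,N);  G = X'*X;
   D = diag(G)*ones(1,N) + ones(N,1)*diag(G)' - 2*G;
   CM = [0 ones(1,N); ones(N,1) -D];             % [0 1'; 1 -D]
   disp(sort(eig(CM),'descend')')                % first N entries nonnegative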

(confer (1085))

   D ∈ EDM^N ⇔ { −D_{2:N,2:N} − [1  −D_{2:N,1}] [0 1 ; 1 0] [1^T ; −D_{1,2:N}] = −2V_N^T D V_N ⪰ 0 ;  D ∈ S^N_h }    (1142)

Positive semidefiniteness of that Schur complement insures nonnegativity (D ∈ R^{N×N}_+, §5.8.1), whereas complementary inertia (1551) insures existence of that lone negative eigenvalue of the Cayley-Menger form.

Now we apply results from chapter 2 with regard to polyhedral cones and their duals.

5.11.2.2 Ordered eigenspectra

Conditions (1139) specify eigenvalue membership to the smallest pointed polyhedral spectral cone for [0 1^T ; 1 −EDM^N]:

   K_λ ≜ {ζ ∈ R^{N+1} | ζ_1 ≥ ζ_2 ≥ ··· ≥ ζ_N ≥ 0 ≥ ζ_{N+1}, 1^T ζ = 0}
       = K_M ∩ [R^N_+ ; R_−] ∩ ∂H
       = λ([0 1^T ; 1 −EDM^N])    (1143)

where

   ∂H ≜ {ζ ∈ R^{N+1} | 1^T ζ = 0}    (1144)

is a hyperplane through the origin, and K_M is the monotone cone (§2.13.9.4.3, implying ordered eigenspectra) which is full-dimensional but is not pointed;

   K_M = {ζ ∈ R^{N+1} | ζ_1 ≥ ζ_2 ≥ ··· ≥ ζ_{N+1}}    (435)
   K_M^* = { [e_1 − e_2  e_2 − e_3  ···  e_N − e_{N+1}] a | a ⪰ 0 } ⊂ R^{N+1}    (436)

So because of the hyperplane, dim aff K_λ = dim ∂H = N, indicating K_λ is not full-dimensional. Defining

   A ≜ [ e_1^T − e_2^T ; e_2^T − e_3^T ; ⋮ ; e_N^T − e_{N+1}^T ] ∈ R^{N×(N+1)}    (1145)

   B ≜ [ e_1^T ; e_2^T ; ⋮ ; e_N^T ; −e_{N+1}^T ] ∈ R^{(N+1)×(N+1)}    (1146)

we have the halfspace-description:

   K_λ = {ζ ∈ R^{N+1} | Aζ ⪰ 0, Bζ ⪰ 0, 1^T ζ = 0}    (1147)

From this and (443) we get a vertex-description for a pointed spectral cone that is not full-dimensional:

   K_λ = { V_N ([Â ; B̂] V_N)^† b | b ⪰ 0 }    (1148)

where V_N ∈ R^{(N+1)×N}, and where [sic]

   B̂ = e_N^T ∈ R^{1×(N+1)}    (1149)

and

   Â = [ e_1^T − e_2^T ; e_2^T − e_3^T ; ⋮ ; e_{N−1}^T − e_N^T ] ∈ R^{(N−1)×(N+1)}    (1150)

hold those rows of A and B corresponding to conically independent rows (§2.10) in A V_N and B V_N.

Conditions (1139) can be equivalently restated in terms of a spectral cone for Euclidean distance matrices:

   D ∈ EDM^N ⇔ { λ([0 1^T ; 1 −D]) ∈ K_M ∩ [R^N_+ ; R_−] ∩ ∂H ;  D ∈ S^N_h }    (1151)

Vertex-description of the dual spectral cone is, from (313):

   K_λ^* = K_M^* + [R^N_+ ; R_−]^* + ∂H^* ⊆ R^{N+1} = { [Â^T  B̂^T  1  −1] a | a ⪰ 0 }    (1152)
        = { [A^T  B^T  1  −1] b | b ⪰ 0 }    (1153)

From (1148) and (444) we get a halfspace-description:

   K_λ^* = { y ∈ R^{N+1} | (V_N^T [Â^T  B̂^T])^† V_N^T y ⪰ 0 }

This polyhedral dual spectral cone K_λ^* is closed, convex, full-dimensional because K_λ is pointed, but is not pointed because K_λ is not full-dimensional.

5.11.2.3 Unordered eigenspectra

Spectral cones are not unique; eigenspectra ordering can be rendered benign within a cone by presorting a vector of eigenvalues into nonincreasing order.^{5.54} Then things simplify: Conditions (1139) now specify eigenvalue membership to the spectral cone

   λ([0 1^T ; 1 −EDM^N]) = [R^N_+ ; R_−] ∩ ∂H = { [I ; −1^T] b | b ⪰ 0 }    (1154)

From (443) we get a vertex-description for a pointed spectral cone not full-dimensional:

   λ([0 1^T ; 1 −EDM^N]) = { V_N (B̃ V_N)^† b | b ⪰ 0 }    (1155)

where V_N ∈ R^{(N+1)×N} and

   B̃ ≜ [ e_1^T ; e_2^T ; ⋮ ; e_N^T ] ∈ R^{N×(N+1)}    (1156)

^{5.54} Eigenspectra ordering (represented by a cone having monotone description such as (1143)) becomes benign in (1365), for example, where projection of a given presorted vector on the nonnegative orthant in a subspace is equivalent to its projection on the monotone nonnegative cone in that same subspace; equivalence is a consequence of presorting.

holds only those rows of B corresponding to conically independent rows in B V_N.

When we consider the nonnegative orthant, then (1139) can be equivalently restated

   D ∈ EDM^N ⇔ { λ([0 1^T ; 1 −D]) ∈ [R^N_+ ; R_−] ∩ ∂H ;  D ∈ S^N_h }    (1157)

Vertex-description of the dual spectral cone is, from (313):

   λ([0 1^T ; 1 −EDM^N])^* = [R^N_+ ; R_−]^* + ∂H^* ⊆ R^{N+1}
       = { [B̃^T  1  −1] a | a ⪰ 0 } = { [B^T  1  −1] b | b ⪰ 0 }    (1158)

From (444) we get a halfspace-description:

   λ([0 1^T ; 1 −EDM^N])^* = { y ∈ R^{N+1} | (V_N^T B̃^T)^† V_N^T y ⪰ 0 }
       = { y ∈ R^{N+1} | [I  −1] y ⪰ 0 }    (1159)

This polyhedral dual spectral cone is closed, convex, full-dimensional but is not pointed. (Notice that any nonincreasingly ordered eigenspectrum belongs to this dual spectral cone.)

5.11.2.4 Dual cone versus dual spectral cone

An open question regards the relationship of convex cones and their duals to the corresponding spectral cones and their duals. A positive semidefinite cone, for example, is selfdual. Both the nonnegative orthant and the monotone nonnegative cone are spectral cones for it. When we consider the nonnegative orthant, then that spectral cone for the selfdual positive semidefinite cone is also selfdual.

5.12 List reconstruction

The term metric multidimensional scaling^{5.55} [258] [106] [356] [104] [261] [89] refers to any reconstruction of a list X ∈ R^{n×N} in Euclidean space from interpoint distance information, possibly incomplete (§6.7), ordinal (§5.13.2), or specified perhaps only by bounding-constraints (§5.4.2.3.9) [354]. Techniques for reconstruction are essentially methods for optimally embedding an unknown list of points, corresponding to given Euclidean distance data, in an affine subset of desired or minimum dimension. The oldest known precursor is called principal component analysis [166] which analyzes the correlation matrix (§5.9.1.0.1); [53, §22] a.k.a. the Karhunen-Loève transform in the digital signal processing literature. Isometric reconstruction (§5.5.3) of point list X is best performed by eigenvalue decomposition of a Gram matrix. When data comprises measurable distances, then reconstruction is termed metric multidimensional scaling.

^{5.55} Scaling [352] means making a scale, i.e., a numerical representation of qualitative data. If the scale is multidimensional, it's multidimensional scaling. −Jan de Leeuw. A goal of multidimensional scaling is to find a low-dimensional representation of list X so that distances between its elements best fit a given set of measured pairwise dissimilarities. In one dimension, N coordinates in X define the scale.

5.12.1 x_1 at the origin. V_N

At the stage of reconstruction, we have D ∈ EDM^N and wish to find a generating list (§2.3.2) for polyhedron P − α by factoring Gram matrix −V_N^T D V_N (939) as in (1122). One way to factor −V_N^T D V_N is via diagonalization of symmetric matrices (§A.5.1, §A.5.2), [333, §5.6] [203] for then numerical errors of factorization are easily spotted in the eigenvalues:

   −V_N^T D V_N ≜ QΛQ^T    (1160)

   QΛQ^T ⪰ 0 ⇔ Λ ⪰ 0    (1161)

where Q ∈ R^{(N−1)×(N−1)} is an orthogonal matrix containing eigenvectors while Λ ∈ S^{N−1} is a diagonal matrix containing corresponding nonnegative eigenvalues ordered by nonincreasing value. From the diagonalization, identify the list using (1067):

   −V_N^T D V_N = 2V_N^T X^T X V_N ≜ Q √Λ Q_p^T Q_p √Λ Q^T    (1162)

Now we consider how rotation/reflection and translation invariance factor into a reconstruction.
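The reconstruction (1160)-(1164) is a few lines of Matlab (a sketch, not the Wıκımization code; the random list stands in for measured data):

   % classical isometric reconstruction: factor -VN'*D*VN, rebuild the list
   N = 6;  r = 3;  X = randn(r,N);  G = X'*X;
   D = diag(G)*ones(1,N) + ones(N,1)*diag(G)' - 2*G;
   VN = [-ones(1,N-1); eye(N-1)]/sqrt(2);
   [Q,L] = eig(-VN'*D*VN);                       % diagonalization (1160)
   [lam,ix] = sort(diag(L),'descend');  Q = Q(:,ix);
   Xr = [zeros(r,1) diag(sqrt(lam(1:r)))*Q(:,1:r)'];   % list (1164), Qp = [I 0]
   Gr = Xr'*Xr;
   Dr = diag(Gr)*ones(1,N) + ones(N,1)*diag(Gr)' - 2*Gr;
   disp(norm(D - Dr,'fro'))                      % ~0 : distances recovered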

where √Λ Q^T Q_p √Λ ≜ Λ = √Λ √Λ and where Q_p ∈ R^{n×(N−1)} is unknown as is its dimension n. Rotation/reflection is accounted for by Q_p yet only its first r columns are necessarily orthonormal.^{5.56} Assuming membership to the unit simplex y ∈ S (1119), then point p = X √2 V_N y = Q_p √Λ Q^T y in R^n belongs to the translated polyhedron P − x_1 whose generating list constitutes the columns of (1061)

   [0  X √2 V_N] = [0  Q_p √Λ Q^T] ∈ R^{n×N}
                 = [0  x_2 − x_1  x_3 − x_1  ···  x_N − x_1]    (1163) (1164)

The scaled auxiliary matrix V_N represents that translation. A simple choice for Q_p has n set to N−1; id est, Q_p = I. Ideally, each member of the generating list has at most r nonzero entries; r being, affine dimension

   rank V_N^T D V_N = rank Q_p √Λ Q^T = rank Λ = r    (1165)

Each member then has at least N−1−r zeros in its higher-dimensional coordinates because r ≤ N−1. (1073) To truncate those zeros, choose n equal to affine dimension which is the smallest n possible because X V_N has rank r ≤ n (1069). In that case, the simplest choice for Q_p is [I 0] having dimension r×(N−1).^{5.57}

^{5.56} Recall r signifies affine dimension. Q_p is not necessarily an orthogonal matrix. Q_p is constrained such that only its first r columns are necessarily orthonormal because there are only r nonzero eigenvalues in Λ when −V_N^T D V_N has rank r (§5.7.1.1). Remaining columns of Q_p are arbitrary.

^{5.57} If we write Q^T = [q_1 ··· q_{N−1}]^T as rowwise eigenvectors, Λ in terms of eigenvalues, and Q_p = [q_{p1} ··· q_{p,N−1}] as column vectors, then Q_p √Λ Q^T = Σ_{i=1}^r √λ_i q_{pi} q_i^T is a sum of r linearly independent rank-one matrices (§B.1.1). Hence the summation has rank r.

We may wish to verify the list (1164) found from the diagonalization of −V_N^T D V_N. Because of rotation/reflection and translation invariance (§5.5), EDM D can be uniquely made from that list by calculating: (922)

   D(X) = D(X [0 √2 V_N]) = D(Q_p [0 √Λ Q^T]) = D([0 √Λ Q^T])    (1166)

This suggests a way to find EDM D given −V_N^T D V_N (confer (1045)):

   D = [0 ; δ(−V_N^T D V_N)] 1^T + 1 [0  δ(−V_N^T D V_N)^T] − 2 [0 0^T ; 0 −V_N^T D V_N]    (1041)

5.12.2 0 geometric center. V

Alternatively we may perform reconstruction using auxiliary matrix V (§B.4.1) and Gram matrix −V D V/2 (943) instead, to find a generating list for polyhedron

   P − α_c    (1167)

whose geometric center has been translated to the origin. Redimensioning diagonalization factors Q, Λ ∈ R^{N×N} and unknown Q_p ∈ R^{n×N}, (1068)

   −V D V = 2V X^T X V ≜ Q √Λ Q_p^T Q_p √Λ Q^T ≜ QΛQ^T    (1168)

where the geometrically centered generating list constitutes (confer (1164))

   X V = (1/√2) Q_p √Λ Q^T ∈ R^{n×N}
       = [x_1 − (1/N)X1  x_2 − (1/N)X1  x_3 − (1/N)X1  ···  x_N − (1/N)X1]    (1169)

where α_c = (1/N)X1. (§5.5.1.0.1) The simplest choice for Q_p is [I 0] ∈ R^{r×N}. Now EDM D can be uniquely made from the list found: (922)

   D(X) = D(X V) = D((1/√2) Q_p √Λ Q^T) = D((1/√2) √Λ Q^T)    (1170)

This EDM is, of course, identical to (1166). Similarly to (1041), from −V D V we can find EDM D (confer (1032)):

   D = δ(−V D V/2) 1^T + 1 δ(−V D V/2)^T − 2(−V D V/2)    (1031)

Figure 140: Map of United States of America showing some state boundaries and the Great Lakes. All plots made by connecting 5020 points. Any difference in scale in (a) through (d) is artifact of plotting routine. (a) Shows original map made from decimated (latitude, longitude) data. (b) Original map data rotated (freehand) to highlight curvature of Earth. (c) Map isometrically reconstructed from an EDM (from distance only). (d) Same reconstructed map illustrating curvature. (e)(f) Two views of one isotonic reconstruction (from comparative distance); problem (1180) with no sort constraint Π d (and no hidden line removal).

5.13 Reconstruction examples

5.13.1 Isometric reconstruction

5.13.1.0.1 Example. Cartography. The most fundamental application of EDMs is to reconstruct relative point position given only interpoint distance information. Drawing a map of the United States is a good illustration of isometric reconstruction (§5.4.2.3.7) from complete distance data. We obtained latitude and longitude information for the coast, border, states, and Great Lakes from the usalo atlas data file within the Matlab Mapping Toolbox; conversion to Cartesian coordinates (x, y, z) via:

   φ ≜ π/2 − latitude
   θ ≜ longitude    (1171)
   x = sin(φ) cos(θ)
   y = sin(φ) sin(θ)
   z = cos(φ)

We used 64% of the available map data to calculate EDM D from N = 5020 points. The original (decimated) data and its isometric reconstruction via (1162) are shown in Figure 140a-d. Matlab code is on Wıκımization. The eigenvalues computed for (1160) are

   λ(−V_N^T D V_N) = [199.8  152.3  2.465  0  0  0 ···]^T    (1172)

The 0 eigenvalues have absolute numerical error on the order of 2E-13; meaning, the EDM data indicates three dimensions (r = 3) are required for reconstruction to nearly machine precision.

5.13.2 Isotonic reconstruction

Sometimes only comparative information about distance is known (Earth is closer to the Moon than it is to the Sun). Suppose, for example, EDM D for three points is unknown:

   D = [d_ij] = [0 d_12 d_13 ; d_12 0 d_23 ; d_13 d_23 0] ∈ S^3_h    (911)

but comparative distance data is available:

   d_13 ≥ d_23 ≥ d_12    (1173)
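The conversion (1171) in Matlab (a sketch; the three coordinate pairs below are hypothetical, not the usalo data):

   % lat/long (degrees) -> points on unit sphere -> EDM for reconstruction
   lat = [ 37.4;  40.7;  41.9];  lon = [-122.1; -74.0; -87.6];
   phi = pi/2 - lat*pi/180;  theta = lon*pi/180; % as in (1171)
   P = [sin(phi).*cos(theta) sin(phi).*sin(theta) cos(phi)]';
   G = P'*P;  N = size(P,2);
   D = diag(G)*ones(1,N) + ones(N,1)*diag(G)' - 2*G;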

Any process of reconstruction that leaves comparative distance information intact is called ordinal multidimensional scaling or isotonic reconstruction. With vectorization d = [d_12 d_13 d_23]^T ∈ R^3, we express the comparative data as the nonincreasing sorting

   Π d = [0 1 0 ; 0 0 1 ; 1 0 0] [d_12 ; d_13 ; d_23] = [d_13 ; d_23 ; d_12] ∈ K_{M+}    (1174)

where Π is a given permutation matrix expressing known sorting action on the entries of unknown EDM D, and K_{M+} is the monotone nonnegative cone (§2.13.9.4.2), generally defined (428)

   K_{M+} = {z | z_1 ≥ z_2 ≥ ··· ≥ z_{N(N−1)/2} ≥ 0} ⊆ R^{N(N−1)/2}_+

where N(N−1)/2 = 3 for the present example. From sorted vectorization (1174) we create the sort-index matrix

   O = [0 1² 3² ; 1² 0 2² ; 3² 2² 0] ∈ S^3_h ∩ R^{3×3}_+    (1175)

generally defined

   O_ij ≜ k² | d_ij = (Ξ Π d)_k ,  j ≠ i    (1176)

where Ξ is a permutation matrix (1766) completely reversing order of vector entries. Replacing EDM data with indices-square of a nonincreasing sorting like this is, of course, a heuristic we invented and may be regarded as a nonlinear introduction of much noise into the Euclidean distance matrix. For large data sets, this heuristic makes an otherwise intense problem computationally tractable; we see an example in relaxed problem (1181).

Beyond rotation, reflection, and translation error (§5.5), list reconstruction by isotonic reconstruction is subject to error in absolute scale (dilation) and distance ratio. Yet Borg & Groenen argue: [53, §2.2] reconstruction from complete comparative distance information for a large number of points is as highly constrained as reconstruction from an EDM; the larger the number, the smaller the optimal solution set:

   isotonic solution set ⊇ isometric solution set    (1177)
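Construction of a sort-index matrix per (1176) (a sketch; the distance vector is an arbitrary example consistent with (1173)):

   % build sort-index matrix O: entries are squared ranks, smallest distance -> 1
   N = 3;  d = [1; 4; 2];                        % [d12 d13 d23]', only order matters
   [ignore,idx] = sort(d,'ascend');              % Xi*Pi*d : nondecreasing
   r2 = zeros(N*(N-1)/2,1);  r2(idx) = (1:numel(d)).^2;
   O = zeros(N);  k = 1;
   for j = 2:N
      for i = 1:j-1
         O(i,j) = r2(k);  O(j,i) = r2(k);  k = k+1;
      end
   end
   disp(O)                                       % [0 1 9; 1 0 4; 9 4 0] as in (1175)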

Figure 141: Largest ten eigenvalues, sorted by nonincreasing value, of −V_N^T O V_N for map of USA, calculated with absolute numerical error approximately 5E-7.

5.13.2.1 Isotonic cartography

To test Borg & Groenen's conjecture, suppose we make a complete sort-index matrix O ∈ S^N_h ∩ R^{N×N}_+ for the map of the USA and then substitute O in place of EDM D in the reconstruction process of §5.12. (In the code on Wıκımization, matrix O is normalized by (N(N−1)/2)².) The eigenvalues, sorted by nonincreasing value, are plotted in Figure 141:

   λ(−V_N^T O V_N) = [880.1  463.9  186.1  46.20  17.12  9.625  8.257  1.701  0.7128  0.6460 ···]^T    (1178)

Whereas EDM D returned only three significant eigenvalues (1172), the sort-index matrix O is generally not an EDM (certainly not an EDM with corresponding affine dimension 3) so returns many more. The extra eigenvalues indicate that affine dimension corresponding to an EDM near O is likely to exceed 3. To realize the map, we must simultaneously reduce that dimensionality and find an EDM D closest to O in some sense^{5.58} while maintaining the known comparative distance relationship. For example: given permutation matrix Π expressing the known sorting action like (1174) on entries

   d ≜ (1/√2) dvec D = [d_12 ; d_13 ; d_23 ; d_14 ; d_24 ; d_34 ; ⋮ ; d_{N−1,N}] ∈ R^{N(N−1)/2}    (1179)

of unknown D ∈ S^N_h, we can make sort-index matrix O input to the optimization problem

   minimize_D   ‖−V_N^T (D − O) V_N‖_F
   subject to   rank V_N^T D V_N ≤ 3
                Π d ∈ K_{M+}
                D ∈ EDM^N    (1180)

that finds the EDM D (corresponding to affine dimension not exceeding 3 in isomorphic dvec EDM^N ∩ Π^T K_{M+}) closest to O in the sense of Schoenberg (941). That sort constraint demands: any optimal solution D⋆ must possess the known comparative distance relationship that produces the original ordinal distance data O (1176). Ignoring the sort constraint Π d ∈ K_{M+}, we get the convex optimization [sic] (§7.1)

   minimize_D   ‖−V_N^T (D − O) V_N‖_F
   subject to   rank V_N^T D V_N ≤ 3
                D ∈ EDM^N    (1181)

Analytical solution to this problem, ignoring the sort constraint, is known [356]: only the three largest nonnegative eigenvalues in (1178) need be retained to make list (1164); the rest are discarded. The reconstruction from EDM D found in this manner is plotted in Figure 140e-f. Matlab code is on Wıκımization. Any solution of relaxed problem (1181), ignoring the sort constraint, apparently violates it. From these plots it becomes obvious that inclusion of the sort constraint is necessary for isotonic reconstruction. Yet even more remarkable is how much the map reconstructed using only ordinal data still resembles the original map of the USA after suffering the many violations produced by solving relaxed problem (1181). This suggests the simple reconstruction techniques of §5.12 are robust to a significant amount of noise.

^{5.58} A problem explored more in §7.

5.13.2.2 Isotonic solution with sort constraint

Because problems involving rank are generally difficult, we will partition (1180) into two problems we know how to solve and then alternate their solution until convergence:

   (a)  minimize_D   ‖−V_N^T (D − O) V_N‖_F
        subject to   rank V_N^T D V_N ≤ 3
                     D ∈ EDM^N
                                                   (1182)
   (b)  minimize_σ   ‖σ − Π d‖
        subject to   σ ∈ K_{M+}

where sort-index matrix O (a given constant in (a)) becomes an implicit vector variable o_i solving the i th instance of (1182b)

   (1/√2) dvec O_i = o_i ≜ Π^T σ⋆ ∈ R^{N(N−1)/2} ,   i ∈ {1, 2, 3 ...}    (1183)

As mentioned in discussion of relaxed problem (1181), a closed-form solution to problem (1182a) exists. Only the first iteration of (1182a) sees the original sort-index matrix O whose entries are nonnegative whole numbers; O_0 = O ∈ S^N_h ∩ R^{N×N}_+ (1176). Subsequent iterations i take the previous solution of (1182b) as input

   O_i = dvec^{−1}(√2 o_i) ∈ S^N    (1184)

real successors, id est, estimating distance-square not order, to the sort-index matrix O. New convex problem (1182b) finds the unique minimum-distance projection of Π d on the monotone nonnegative cone K_{M+}. By defining

   Y^{†T} = [e_1 − e_2  e_2 − e_3  e_3 − e_4  ···  e_m] ∈ R^{m×m}    (429)

where m ≜ N(N−1)/2, we may rewrite (1182b) as an equivalent quadratic program, a convex problem in terms of the halfspace-description of K_{M+}:

   minimize_σ   (σ − Π d)^T (σ − Π d)
   subject to   Y† σ ⪰ 0    (1185)

This quadratic program can be converted to a semidefinite program via Schur-form (§3.5.2); we get the equivalent problem

   minimize_{t ∈ R, σ}   t
   subject to   [ tI  σ − Π d ; (σ − Π d)^T  1 ] ⪰ 0
                Y† σ ⪰ 0    (1186)

5.13.2.3 Convergence

In §E.10 we discuss convergence of alternating projection on intersecting convex sets in a Euclidean vector space, where projection is on their intersection. Here the situation is different for two reasons: Firstly, sets of positive semidefinite matrices having an upper bound on rank are generally not convex. Yet in §7.1.4.0.1 we prove (1182a) is equivalent to a projection of nonincreasingly ordered eigenvalues on a subset of the nonnegative orthant:

   minimize_D   ‖−V_N^T (D − O) V_N‖_F          minimize_Υ   ‖Υ − Λ‖_F
   subject to   rank V_N^T D V_N ≤ 3       ≡    subject to   δ(Υ) ∈ [R^3_+ ; 0]    (1187)
                D ∈ EDM^N

where −V_N^T D V_N ≜ U ΥU^T ∈ S^{N−1} and −V_N^T O V_N ≜ QΛQ^T ∈ S^{N−1} are ordered diagonalizations (§A.5). It so happens: optimal orthogonal U⋆ always equals Q given. Linear operator T(A) = U⋆^T A U⋆, acting on square matrix A, is an isometry because Frobenius' norm is orthogonally invariant (48). This isometric isomorphism T thus maps a nonconvex problem to a convex one that preserves distance.

Secondly, the second half (1182b) of the alternation takes place in a different vector space; S^N_h (versus S^{N−1}). From §5.6 we know these two vector spaces are related by an isomorphism, S^{N−1} = V_N(S^N_h) (1050), but not by an isometry. We have, therefore, no guarantee from theory of alternating projection that the alternation (1182) converges to a point, in the set of all EDMs corresponding to affine dimension not in excess of 3, belonging to dvec EDM^N ∩ Π^T K_{M+}.

5.13.2.3.1 Exercise. Convergence of isotonic solution by alternation. Empirically demonstrate convergence, discussed in §5.13.2.3, on a smaller data set.

5.13.2.4 Interlude

We have not yet implemented the second half (1185) of alternation (1182) for USA map data because memory-demands exceed capability of our computer. Map reconstruction from comparative distance data, isotonic reconstruction, would also prove invaluable to stellar cartography where absolute interstellar distance is difficult to acquire.

It would be remiss not to mention another method of solution to this isotonic reconstruction problem: Once again we assume only comparative distance data like (1173) is available. Given known set of indices I,

   minimize_D   rank V D V
   subject to   d_ij ≤ d_kl ≤ d_mn   ∀(i, j, k, l, m, n) ∈ I
                D ∈ EDM^N    (1188)

this problem minimizes affine dimension while finding an EDM whose entries satisfy known comparative relationships. Suitable rank heuristics are discussed in §4.4.1 and §7.2.2 that will transform this to a convex optimization problem. Using contemporary computers, even with a rank heuristic in place of the objective function, this problem formulation is more difficult to compute than the relaxed counterpart problem (1181). That is because there exist efficient algorithms to compute a selected few eigenvalues and eigenvectors from a very large matrix. Regardless, it is important to recognize: the optimal solution set for this problem (1188) is practically always different from the optimal solution set for its counterpart, problem (1180).
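Projection (1182b) on the monotone nonnegative cone admits a simple direct solver: a minimal sketch assuming the standard pool-adjacent-violators method followed by clamping negative entries at zero (an alternative to quadratic program (1185), not the author's implementation), saved as proj_monotone_nonneg.m:

   function s = proj_monotone_nonneg(v)
   % Euclidean projection of v on K_M+ = {s : s1 >= s2 >= ... >= 0}
   n = numel(v);  vals = zeros(n,1);  wts = zeros(n,1);  k = 0;
   for i = 1:n                                   % pool adjacent violators
      k = k+1;  vals(k) = v(i);  wts(k) = 1;
      while k > 1 && vals(k-1) < vals(k)         % enforce nonincreasing blocks
         vals(k-1) = (wts(k-1)*vals(k-1) + wts(k)*vals(k))/(wts(k-1)+wts(k));
         wts(k-1) = wts(k-1) + wts(k);  k = k-1;
      end
   end
   s = zeros(n,1);  j = 1;
   for b = 1:k
      s(j:j+wts(b)-1) = vals(b);  j = j + wts(b);
   end
   s = max(s,0);                                 % clamp: projection on K_M+

For example, proj_monotone_nonneg([3;5;1;-2]) returns [4;4;1;0].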

5.14 Fifth property of Euclidean metric

We continue now with the question raised in §5.3 regarding the necessity for at least one requirement more than the four properties of the Euclidean metric (§5.2) to certify realizability of a bounded convex polyhedron or to reconstruct a generating list for it from incomplete distance information.

5.14.1 Recapitulate

In the particular case N = 3, the triangle inequality is then the only Euclidean condition bounding the necessarily nonnegative d_ij, and those bounds are tight. By (1093), −V_N^T D V_N ⪰ 0 and D ∈ S^3_h are necessary and sufficient conditions for D to be an EDM:

   −V_N^T D V_N ⪰ 0        d_ij ≥ 0, i ≠ j
   D ∈ S^3_h          ⇔    d_ij = 0, i = j          ⇔   D ∈ EDM^3    (1189)
                           d_ij = d_ji
                           d_ij ≤ d_ik + d_kj, i ≠ j ≠ k

for i, j ∈ {1, 2, 3}. That means the first four properties of the Euclidean metric are necessary and sufficient conditions for D to be an EDM in the case N = 3. Yet those four properties become insufficient when cardinality N exceeds 3 (regardless of affine dimension).

5.14.2 Derivation of the fifth

Correspondence between the triangle inequality and the EDM was developed in §5.8.2 where a triangle inequality (1093a) was revealed within the leading principal 2×2 submatrix of −V_N^T D V_N when positive semidefinite. Our choice of the leading principal submatrix was arbitrary; actually, a unique triangle inequality like (988) corresponds to any one of the (N−1)!/(2!(N−1−2)!) principal 2×2 submatrices.^{5.59} Assuming D ∈ S^4_h and −V_N^T D V_N ∈ S^3, then by the positive (semi)definite principal submatrices theorem (§A.3.3.0.2), it is sufficient to prove: all d_ij are nonnegative, all triangle inequalities are satisfied, and det(−V_N^T D V_N) is nonnegative. When N = 4, in other words, that nonnegative determinant becomes the fifth and last Euclidean metric requirement for D ∈ EDM^N. We now endeavor to ascribe geometric meaning to it.

^{5.59} There are fewer principal 2×2 submatrices in −V_N^T D V_N than there are triangles made by four or more points because there are N!/(3!(N−3)!) triangles made by point triples. The triangles corresponding to those submatrices all have vertex x_1.

5.14.2.1 Nonnegative determinant

By (994) when D ∈ EDM^4, −V_N^T D V_N is equal to inner product (989),

   Θ^T Θ = [ d_12                  √(d_12 d_13) cosθ_213   √(d_12 d_14) cosθ_214 ;
             √(d_12 d_13) cosθ_213   d_13                  √(d_13 d_14) cosθ_314 ;
             √(d_12 d_14) cosθ_214   √(d_13 d_14) cosθ_314   d_14 ]    (1190)

Because Euclidean space is an inner-product space, the more concise inner-product form of the determinant is admitted;

   det(Θ^T Θ) = −d_12 d_13 d_14 (cos(θ_213)² + cos(θ_214)² + cos(θ_314)² − 2 cosθ_213 cosθ_214 cosθ_314 − 1)    (1191)

The determinant is nonnegative if and only if

   cosθ_214 cosθ_314 − √(sin(θ_214)² sin(θ_314)²) ≤ cosθ_213 ≤ cosθ_214 cosθ_314 + √(sin(θ_214)² sin(θ_314)²)
 ⇔ cosθ_213 cosθ_314 − √(sin(θ_213)² sin(θ_314)²) ≤ cosθ_214 ≤ cosθ_213 cosθ_314 + √(sin(θ_213)² sin(θ_314)²)
 ⇔ cosθ_213 cosθ_214 − √(sin(θ_213)² sin(θ_214)²) ≤ cosθ_314 ≤ cosθ_213 cosθ_214 + √(sin(θ_213)² sin(θ_214)²)    (1192)

which simplifies, for 0 ≤ θ_i1ℓ, θ_ℓ1j, θ_i1j ≤ π and all i ≠ j ≠ ℓ ∈ {2, 3, 4}, to

   cos(θ_i1ℓ + θ_ℓ1j) ≤ cosθ_i1j ≤ cos(θ_i1ℓ − θ_ℓ1j)    (1193)

Analogously to triangle inequality (1105), the determinant is 0 upon equality on either side of (1193) which is tight. Inequality (1193) can be equivalently written linearly as a triangle inequality between relative angles [396, §1.4]:

   |θ_i1ℓ − θ_ℓ1j| ≤ θ_i1j ≤ θ_i1ℓ + θ_ℓ1j
   θ_i1ℓ + θ_ℓ1j + θ_i1j ≤ 2π
   0 ≤ θ_i1ℓ, θ_ℓ1j, θ_i1j ≤ π    (1194)

Figure 142: The relative-angle inequality tetrahedron (1195) bounding EDM^4 is regular; drawn in entirety (axes θ_213, θ_214, θ_314; scale π). Each angle θ (986) must belong to this solid to be realizable.

Because point labelling is arbitrary, this fifth Euclidean metric requirement must apply to each of the N points as though each were in turn labelled x_1; hence the new index k in (1195). Generalizing this:

5.14.2.1.1 Fifth property of Euclidean metric - restatement. Relative-angle inequality. (confer §5.2) [49] [50, p.107] [236, §3.1] Augmenting the four fundamental Euclidean metric properties in R^n, for all i, j, ℓ ≠ k ∈ {1 ... N}, i < j < ℓ, and for N ≥ 4 distinct points {x_k}, the inequalities

   |θ_ikℓ − θ_ℓkj| ≤ θ_ikj ≤ θ_ikℓ + θ_ℓkj    (a)
   θ_ikℓ + θ_ℓkj + θ_ikj ≤ 2π                (b)    (1195)
   0 ≤ θ_ikℓ, θ_ℓkj, θ_ikj ≤ π               (c)

where θ_ikj = θ_jki is the angle between vectors at vertex x_k (as defined in (986) and illustrated in Figure 123), must be satisfied at each point x_k regardless of affine dimension. ⋄
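The inequalities (1195) are readily checked at a vertex of a random tetrahedron (a sketch):

   % relative-angle inequality (1195) at vertex x_k, k = 1, for N = 4 points
   X = randn(3,4);  k = 1;
   ang = @(i,j) acos( (X(:,i)-X(:,k))'*(X(:,j)-X(:,k)) / ...
                      (norm(X(:,i)-X(:,k))*norm(X(:,j)-X(:,k))) );
   t213 = ang(2,3);  t214 = ang(2,4);  t314 = ang(3,4);
   disp([abs(t213-t314) <= t214, t214 <= t213+t314, t213+t214+t314 <= 2*pi])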

For the case N = 4 we have, only assuming nonnegative d_ij (confer §5.2, (941), (1107), (917)):

   Angle θ inequality (916) or (1195)        −V_N^T D V_N ⪰ 0
   Euclidean metric property 1 (§5.2)    ⇔                      ⇔  D = D(Θ) ∈ EDM^4    (1196)
   D ∈ S^4_h                                 D ∈ S^4_h

   Angle matrix inequality Ω ⪰ 0 (995)       −V_N^T D V_N ⪰ 0
   Four Euclidean metric properties (§5.2) ⇔                    ⇔  D = D(Ω, d) ∈ EDM^N    (1197)
   D ∈ S^N_h                                 D ∈ S^N_h

Like matrix criteria (917), (941), and (1107), the relative-angle matrix inequality and nonnegativity property subsume all the Euclidean metric properties and further requirements. For N = 4, relative-angle matrix inequality in (1107) is necessary and sufficient to certify realizability.

5.14.2.2 Beyond the fifth metric property

When cardinality N exceeds 4, the first four properties of the Euclidean metric and the relative-angle inequality together become insufficient conditions for realizability. When N = 5 in particular, the four Euclidean metric properties and relative-angle inequality remain necessary but become a sufficient test only for positive semidefiniteness of all the principal 3×3 submatrices [sic] in −V_N^T D V_N. Just as the triangle inequality is the ultimate test for realizability of only three points, the relative-angle inequality is the ultimate test for only four. Relative-angle inequality can be considered the ultimate test only for realizability at each vertex x_k of each and every purported tetrahedron constituting a hyperdimensional body. In other words, this fifth Euclidean metric requirement must be applied at each of the N points as though each were in turn labelled x_1, and so on. When N = 5, relative-angle inequality becomes the penultimate Euclidean metric requirement while nonnegativity of the then unwieldy det(Θ^T Θ) corresponds (by the positive (semi)definite principal submatrices theorem in §A.3.3.0.2) to the sixth and last Euclidean metric requirement. Together these six tests become necessary and sufficient, and so on. Yet for all values of N, the triangle inequality remains a necessary although penultimate test.

5.14.3 Path not followed

As a means to test for realizability of four or more points, an intuitively appealing way to augment the four Euclidean metric properties is to recognize generalizations of the triangle inequality: In the case of cardinality N = 4, the three-dimensional analogue to triangle & distance is tetrahedron & facet-area, while in case N = 5 the four-dimensional analogue is polychoron & facet-volume, ad infinitum.

5.14.3.1 N = 4

Each of the four facets of a general tetrahedron is a triangle and its relative interior. Suppose we identify each facet of the tetrahedron by its area-square: c_1, c_2, c_3, c_4. Then analogous to metric property 4, we may write a tight^{5.60} area inequality for the facets

   √c_i ≤ √c_j + √c_k + √c_ℓ ,   i ≠ j ≠ k ≠ ℓ ∈ {1, 2, 3, 4}    (1198)

which is a generalized "triangle" inequality [228, §1.1] that follows from

   √c_i = √c_j cos ϕ_ij + √c_k cos ϕ_ik + √c_ℓ cos ϕ_iℓ    (1199)

[243] [376, Law of Cosines] where ϕ_ij is the dihedral angle at the common edge between triangular facets i and j.

^{5.60} The upper bound is met when all angles in (1199) are simultaneously 0; that occurs, for example, if one point is relatively interior to the convex hull of the three remaining.

If D is the EDM corresponding to the whole tetrahedron, then area-square of the i th triangular facet has a convenient formula in terms of D_i ∈ EDM^{N−1}, the EDM corresponding to that particular facet: From the Cayley-Menger determinant^{5.61} [376] [132] [163] [88]

^{5.61} whose foremost characteristic is: the determinant vanishes if and only if affine dimension does not equal penultimate cardinality; id est, det([0 1^T ; 1 −D]) = 0 ⇔ r < N−1 where D is any EDM (§5.7.3.0.1). Otherwise, the determinant is negative.

the i th facet area-square for i ∈ {1 ... N} is (§A.1.1)

   c_i = (−1/(2^{N−2}((N−2)!)²)) det [0 1^T ; 1 −D_i]    (1200)
       = ((−1)^N/(2^{N−2}((N−2)!)²)) det(D_i) 1^T D_i^{−1} 1    (1201)
       = ((−1)^N/(2^{N−2}((N−2)!)²)) 1^T cof(D_i)^T 1    (1202)

where D_i is the i th principal (N−1)×(N−1) submatrix^{5.62} of D ∈ EDM^N, and cof(D_i) is the (N−1)×(N−1) matrix of cofactors [333, §4] corresponding to D_i. The number of principal 3×3 submatrices in D is, of course, equal to the number of triangular facets in the tetrahedron; four (N!/(3!(N−3)!)) when N = 4.

^{5.62} Every principal submatrix of an EDM remains an EDM. [236, §4.1]

5.14.3.1.1 Exercise. Sufficiency conditions for an EDM of four points. Triangle inequality (property 4) and area inequality (1198) are conditions necessary for D to be an EDM. Prove their sufficiency in conjunction with the remaining three Euclidean metric properties.

5.14.3.2 N = 5

Moving to the next level, we might encounter a Euclidean body called polychoron: a bounded polyhedron in four dimensions.^{5.63} The simplest polychoron is called a pentatope [376]: a regular simplex hence convex. Our polychoron has five (N!/(4!(N−4)!)) facets, each of them a general tetrahedron whose volume-square c_i is calculated using the same formula, (1200) where D is the EDM corresponding to the polychoron, and D_i is the EDM corresponding to the i th facet (the principal 4×4 submatrix of D ∈ EDM^N corresponding to the i th tetrahedron). The analogue to triangle & distance is now polychoron & facet-volume. We could then write another generalized "triangle" inequality like (1198) but in terms of facet volume: [382, §IV]

   √c_i ≤ √c_j + √c_k + √c_ℓ + √c_m ,   i ≠ j ≠ k ≠ ℓ ≠ m ∈ {1 ... 5}    (1203)

^{5.63} (A pentahedron is a three-dimensional body having five vertices.)
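Formula (1200) is easily exercised on a facet whose area is known (a sketch): a 3-4-5 right triangle has area 6:

   % facet area-square from facet EDM via Cayley-Menger determinant, N = 4
   Di = [0 9 16; 9 0 25; 16 25 0];               % squared side lengths
   N  = 4;
   ci = -det([0 ones(1,3); ones(3,1) -Di]) / (2^(N-2)*factorial(N-2)^2);
   disp(sqrt(ci))                                % 6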

5.14.3.2.1 Exercise. Sufficiency for an EDM of five points. For N = 5, triangle (distance) inequality (§5.2), area inequality (1198), and volume inequality (1203) are conditions necessary for D to be an EDM. Prove their sufficiency.

Figure 143: Length of one-dimensional face a equals height h = a = 1 of this convex nonsimplicial pyramid in R^3 with square base inscribed in a circle of radius R centered at the origin. [376, Pyramid] [394, p.173]

5.14.3.3 Volume of simplices

There is no known formula for the volume of a bounded general convex polyhedron expressed either by halfspace or vertex-description. [234] [174] [175] [284, p.407] Volume is a concept germane to R^3; in higher dimensions it is called content. Applying the EDM assertion (§5.9.1.0.4) and a result from [61, p.108], a general nonempty simplex (§2.12.3) in R^{N−1} corresponding to an EDM D ∈ S^N_h has content

   √c = content(S) √(det(−V_N^T D V_N))    (1204)

where content-square of the unit simplex S ⊂ R^{N−1} is proportional to its Cayley-Menger determinant;

   content(S)² = (−1/(2^{N−1}((N−1)!)²)) det [0 1^T ; 1 −D([0 e_1 e_2 ··· e_{N−1}])]    (1205)

where e_i ∈ R^{N−1} and the EDM operator used is D(X) (922).

5.14.3.3.1 Example. Pyramid. The pyramid in Figure 143 has volume 1/3; id est, the product of its base area with 1/3 its height.^{5.64} To find its volume using EDMs, we must first decompose the pyramid into simplicial parts. Slicing it in half along the plane containing the line segments corresponding to radius R and height h, we find the vertices of one simplex,

   X = [ 1/2  1/2  −1/2  0 ; 1/2  −1/2  −1/2  0 ; 0  0  0  1 ] ∈ R^{n×N}    (1206)

where N = n + 1 for any nonempty simplex in R^n. The volume of this simplex is half that of the entire pyramid; id est, √c = 1/6, found by evaluating (1204). With that, we conclude digression of path.

^{5.64} A formula for volume of a pyramid is known; it is 1/3 the product of its base area with its height. [224] Pyramid volume is independent of the paramount vertex position as long as its height remains constant.

5.14.4 Affine dimension reduction in three dimensions

(confer §5.8.4) The determinant of any M×M matrix is equal to the product of its M eigenvalues. [333, §5.1] When N = 4 and det(Θ^T Θ) is 0, that means one or more eigenvalues of Θ^T Θ ∈ R^{3×3} are 0. The determinant will go to 0 whenever equality is attained on either side of (916), (1195a), or (1195b), meaning that a tetrahedron has collapsed to a lower affine dimension; id est, r = rank Θ^T Θ = rank Θ is reduced below N−1 exactly by the number of 0 eigenvalues (§5.7.1.1). In solving completion problems of any size N where one or more entries of an EDM are unknown, therefore, dimension r of the affine hull required to contain the unknown points is potentially reduced by selecting distances to attain equality in (916) or (1195a) or (1195b).
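Evaluating (1204) for the half-pyramid simplex (1206) confirms the example (a sketch):

   % simplex content from its EDM; half-pyramid has content 1/6
   X = [1/2 1/2 -1/2 0; 1/2 -1/2 -1/2 0; 0 0 0 1];
   N = 4;  G = X'*X;
   D = diag(G)*ones(1,N) + ones(N,1)*diag(G)' - 2*G;
   VN = [-ones(1,N-1); eye(N-1)]/sqrt(2);
   vol = sqrt(det(-VN'*D*VN))/factorial(N-1);    % (1204): unit-simplex content 1/(N-1)!
   disp([vol 2*vol])                             % 1/6 , and pyramid volume 1/3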

5.14.4.1 Exemplum redux

We now apply the fifth Euclidean metric property to an earlier problem:

5.14.4.1.1 Example. Small completion problem, IV. (confer §5.9.3.0.1) Returning again to Example 5.3.0.0.2 that pertains to Figure 122 where N = 4, distance-square d_14 is ascertainable from the fifth Euclidean metric property. Because all distances in (914) are known except √d_14, then cosθ_123 = 0 and θ_324 = 0 result from identity (986). Applying (916),

   cos(θ_123 + θ_324) ≤ cosθ_124 ≤ cos(θ_123 − θ_324)
   0 ≤ cosθ_124 ≤ 0    (1207)

It follows again from (986) that d_14 can only be 2. As explained in this subsection, affine dimension r cannot exceed N−2 because equality is attained in (1207).

Chapter 6

Cone of distance matrices

   For N > 3, the cone of EDMs is no longer a circular cone and the geometry becomes complicated...
   −Hayden, Wells, Liu, & Tarazaga (1991) [186, §3]

In the subspace of symmetric matrices S^N, we know that the convex cone of Euclidean distance matrices EDM^N (the EDM cone) does not intersect the positive semidefinite cone S^N_+ (PSD cone) except at the origin, their only vertex; there can be no positive or negative semidefinite EDM. (1131) [236]

   EDM^N ∩ S^N_+ = 0    (1208)

Even so, the two convex cones can be related. In §6.8.1 we prove the equality

   EDM^N = S^N_h ∩ (S^{N⊥}_c − S^N_+)    (1301)

a resemblance to EDM definition (922) where

   S^N_h = {A ∈ S^N | δ(A) = 0}    (66)

is the symmetric hollow subspace (§2.2.3) and where

   S^{N⊥}_c = {u1^T + 1u^T | u ∈ R^N}    (2042)

is the orthogonal complement of the geometric center subspace (§E.7.2.0.2)

   S^N_c = {Y ∈ S^N | Y1 = 0}    (2040)

6.0.1 gravity

Equality (1301) is equally important as the known isomorphisms (1039) (1040) (1051) (1052) relating the EDM cone EDM^N to positive semidefinite cone S^{N−1}_+ (§5.6.2.1) or to an N(N−1)/2-dimensional face of S^N_+ (§5.6.1.1).^{6.1} But those isomorphisms have never led to this equality relating whole cones EDM^N and S^N_+. Equality (1301) is not obvious from the various EDM definitions such as (922) or (1224) because inclusion must be proved algebraically in order to establish equality: EDM^N ⊇ S^N_h ∩ (S^{N⊥}_c − S^N_+). We will instead prove (1301) using purely geometric methods.

^{6.1} Because both positive semidefinite cones are frequently in play, dimension is explicit.

6.0.2 highlight

In §6.7 we show: the Schoenberg criterion for discriminating Euclidean distance matrices

   D ∈ EDM^N ⇔ { −V_N^T D V_N ∈ S^{N−1}_+ ;  D ∈ S^N_h }    (941)

is a discretized membership relation (§2.13.4, dual generalized inequalities) between the EDM cone and its ordinary dual.

6.1 Defining EDM cone

We invoke a popular matrix criterion to illustrate correspondence between the EDM and PSD cones belonging to the ambient space of symmetric matrices:

   D ∈ EDM^N ⇔ { −V D V ∈ S^N_+ ;  D ∈ S^N_h }    (945)

where V ∈ S^N is the geometric centering matrix (§B.4). The set of all EDMs of dimension N×N forms a closed convex cone EDM^N because any pair of EDMs satisfies the definition of a convex cone (175); videlicet, for each and every ζ_1, ζ_2 ≥ 0 (§A.3.1.0.2)

   ζ_1 V D_1 V + ζ_2 V D_2 V ⪰ 0  ⇐  V D_1 V ⪰ 0, V D_2 V ⪰ 0
   ζ_1 D_1 + ζ_2 D_2 ∈ S^N_h     ⇐  D_1 ∈ S^N_h, D_2 ∈ S^N_h    (1209)

and convex cones are invariant to inverse linear transformation. [309, p.22] Hence EDM^N is not full-dimensional with respect to S^N because it is confined to the symmetric hollow subspace S^N_h.

6.1.0.0.1 Definition. Cone of Euclidean distance matrices. In the subspace of symmetric matrices, the set of all Euclidean distance matrices forms a unique immutable pointed closed convex cone called the EDM cone: for N > 0

   EDM^N ≜ {D ∈ S^N_h | −V D V ∈ S^N_+}
        = ∩_{z ∈ N(1^T)} {D ∈ S^N | ⟨zz^T, −D⟩ ≥ 0, δ(D) = 0}    (1210)

The EDM cone in isomorphic R^{N(N+1)/2} [sic] is the intersection of an infinite number (when N > 2) of halfspaces about the origin and a finite number of hyperplanes through the origin in vectorized variable D = [d_ij]. The EDM cone relative interior comprises

   rel int EDM^N = ∩_{z ∈ N(1^T)} {D ∈ S^N | ⟨zz^T, −D⟩ > 0, δ(D) = 0}
                = {D ∈ EDM^N | rank(V D V) = N−1}    (1211)

while its relative boundary comprises

   rel ∂ EDM^N = {D ∈ EDM^N | rank(V D V) < N−1}
              = {D ∈ EDM^N | ⟨zz^T, −D⟩ = 0 for some z ∈ N(1^T)}    (1212)   △

Figure 144: Relative boundary (tiled) of EDM cone EDM^3 drawn truncated in isometrically isomorphic subspace R^3. View is from interior toward origin. Unlike positive semidefinite cone, EDM cone is not selfdual; neither is it proper in ambient symmetric subspace (dual EDM cone for this example belongs to isomorphic R^6). (a) EDM cone drawn in usual distance-square coordinates d_ij. (b) Drawn in its natural coordinates √d_ij (absolute distance); intersection of three halfspaces (1094) whose partial boundaries each contain origin. Cone geometry becomes nonconvex (nonpolyhedral) in higher dimension. (§6.3) (c) Two coordinate systems artificially superimposed. Coordinate transformation from d_ij to √d_ij appears a topological contraction. (d) Sitting on its vertex 0, pointed EDM^3 is a circular cone having axis of revolution dvec(−E) = dvec(11^T − I) (1127) (73). (Rounded vertex is plot artifact.)

This cone is more easily visualized in the isomorphic vector subspace R^{N(N−1)/2} corresponding to S^N_h:

In the case N = 1 point, the EDM cone is the origin in R^0.

In the case N = 2, the EDM cone is the nonnegative real line in R; a halfline in a subspace of the realization in Figure 148.

The EDM cone in the case N = 3 is a circular cone in R^3 illustrated in Figure 144(a)(d); the set of all matrices

   D = [0 d_12 d_13 ; d_12 0 d_23 ; d_13 d_23 0] ∈ EDM^3    (1213)

makes a circular cone in this dimension. In this case, the first four Euclidean metric properties are necessary and sufficient tests to certify realizability of triangles. (1189) Thus triangle inequality property 4 describes three halfspaces (1094) whose intersection makes a polyhedral cone in R^3 of realizable √d_ij (absolute distance); an isomorphic subspace representation of the set of all EDMs D in the natural coordinates

   ◦√D ≜ [0 √d_12 √d_13 ; √d_12 0 √d_23 ; √d_13 √d_23 0]    (1214)

illustrated in Figure 144b.

6.2 Polyhedral bounds

The convex cone of EDMs is nonpolyhedral in d_ij for N > 2; e.g., Figure 144a. Still we found necessary and sufficient bounding polyhedral relations consistent with EDM cones for cardinality N = 1, 2, 3, 4:

N = 3. Transforming distance-square coordinates d_ij by taking their positive square root provides the polyhedral cone in Figure 144b; polyhedral because an intersection of three halfspaces in natural coordinates √d_ij is provided by triangle inequalities (1094). This polyhedral cone implicitly encompasses necessary and sufficient metric properties: nonnegativity, self-distance, symmetry, and triangle inequality. (1189)

N = 4. Relative-angle inequality (1195) together with four Euclidean metric properties are necessary and sufficient tests for realizability of tetrahedra. (1196) Albeit relative angles θ_ikj (986) are nonlinear functions of the d_ij, relative-angle inequality provides a regular tetrahedron in R^3 [sic] (Figure 142) bounding angles θ_ikj at vertex x_k consistently with EDM^4.

Yet were we to employ the procedure outlined in §5.14.3 for making generalized triangle inequalities, then we would find all the necessary and sufficient d_ij-transformations for generating bounding polyhedra consistent with EDMs of any higher dimension (N > 3).

6.3 √EDM cone is not convex

For some applications, like a molecular conformation problem (Figure 3, Figure 133) or multidimensional scaling, [103] [357] absolute distance √d_ij is the preferred variable. Taking square root of the entries in all EDMs D of dimension N×N, we get another cone but not a convex cone when N > 3 (Figure 144b): [89, §4.5.2]

   √EDM^N ≜ {◦√D | D ∈ EDM^N}    (1215)

where ◦√D is defined like (1214). It is a cone simply because any cone is completely constituted by rays emanating from the origin: (§2.7) Any given ray {ζΓ ∈ R^{N(N−1)/2} | ζ ≥ 0} remains a ray under entrywise square root: {√ζ √Γ ∈ R^{N(N−1)/2} | ζ ≥ 0}. It is already established that

   ◦√D ∈ EDM^N ⇒ D ∈ EDM^N    (1126)

But because of how √EDM^N is defined, it is obvious that (confer §5.10)

   D ∈ EDM^N ⇔ ◦√D ∈ √EDM^N    (1216)

Were √EDM^N convex, then given ◦√D_1, ◦√D_2 ∈ √EDM^N we would expect their conic combination ◦√D_1 + ◦√D_2 to be a member of √EDM^N. That is easily proven false by counterexample via (1216), for then (◦√D_1 + ◦√D_2) ◦ (◦√D_1 + ◦√D_2) would need to be a member of EDM^N. Notwithstanding,

   √EDM^N ⊆ EDM^N    (1217)

by (1126) (Figure 144), and we learn how to transform a nonconvex proximity problem in the natural coordinates √d_ij to a convex optimization in §7.2.1.

6.4 EDM definition in 11^T

Any EDM D corresponding to affine dimension r has representation

   D(V_X) ≜ δ(V_X V_X^T)1^T + 1δ(V_X V_X^T)^T − 2V_X V_X^T ∈ EDM^N    (1218)

where R(V_X ∈ R^{N×r}) ⊆ N(1^T) = 1^⊥ and

   V_X^T V_X = δ²(V_X^T V_X)    (1219)

and V_X is full-rank with orthogonal columns. Equation (1218) is simply the standard EDM definition (922) with a centered list X as in (1007); Gram matrix X^T X has been replaced with the subcompact singular value decomposition (§A.6.2)^{6.3}

   V_X V_X^T ≡ V^T X^T X V ∈ S^N_c ∩ S^N_+    (1220)

This means: inner product V_X^T V_X is an r×r diagonal matrix Σ of nonzero singular values. Although affine dimension r = rank(V_X) = rank(XV), V_X is not necessarily XV (§5.5.1.0). (1064)

^{6.3} Subcompact SVD: V_X V_X^T ≜ Q √Σ √Σ Q^T ≡ V^T X^T X V.

Vector δ(V_X V_X^T) may be decomposed into complementary parts by projecting it on orthogonal subspaces 1^⊥ and R(1): namely,

   P_{1^⊥} δ(V_X V_X^T) = V δ(V_X V_X^T)    (1221)
   P_1 δ(V_X V_X^T) = (1/N) 11^T δ(V_X V_X^T)    (1222)

Of course

   δ(V_X V_X^T) = V δ(V_X V_X^T) + (1/N) 11^T δ(V_X V_X^T)    (1223)
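Construction (1218) is direct (a sketch; VX below is an arbitrary choice of orthogonal columns in N(1^T)):

   % build an EDM from VX with orthogonal columns in the nullspace of 1'
   N = 5;  r = 2;
   B = null(ones(1,N));                          % orthonormal basis for N(1')
   VX = B(:,1:r)*diag([2 3]);                    % R(VX) in N(1'), VX'*VX diagonal
   G = VX*VX';
   D = diag(G)*ones(1,N) + ones(N,1)*diag(G)' - 2*G;   % (1218)
   VN = [-ones(1,N-1); eye(N-1)]/sqrt(2);
   disp([min(eig(-VN'*D*VN)), rank(VN'*D*VN)])   % EDM with affine dimension r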

by (944). We get the Hayden, Liu, Wells, & Tarazaga EDM formula [186, §2] when, e.g., X = I in EDM definition (922):

   D(V_X, y) ≜ y1^T + 1y^T + (λ/N)11^T − 2V_X V_X^T ∈ EDM^N    (1224)

where

   λ ≜ 2‖V_X‖²_F = 1^T δ(V_X V_X^T) 2   and   y ≜ δ(V_X V_X^T) − (λ/2N)1 = V δ(V_X V_X^T)    (1225)

and y = 0 if and only if 1 is an eigenvector of EDM D. Scalar λ becomes an eigenvalue when corresponding eigenvector 1 exists.

Proof. We validate eigenvector 1 and eigenvalue λ.
(⇒) Suppose 1 is an eigenvector of EDM D. Then because

   V_X^T 1 = 0    (1227)

it follows

   D1 = δ(V_X V_X^T)1^T 1 + 1δ(V_X V_X^T)^T 1 = N δ(V_X V_X^T) + ‖V_X‖²_F 1  ⇒  δ(V_X V_X^T) ∝ 1    (1228)

For some κ ∈ R_+

   δ(V_X V_X^T)^T 1 = Nκ = tr(V_X V_X^T) = ‖V_X‖²_F  ⇒  δ(V_X V_X^T) = (1/N)‖V_X‖²_F 1    (1229)

so y = 0.
(⇐) Now suppose δ(V_X V_X^T) = (λ/2N)1; id est, y = 0. Then

   D = (λ/N)11^T − 2V_X V_X^T ∈ EDM^N    (1230)

1 is an eigenvector with corresponding eigenvalue λ. ∎

The particular dyad sum from (1224)

   (λ/N)11^T ∈ S^{N⊥}_c    (1226)

must belong to the orthogonal complement of the geometric center subspace (p.746), whereas V_X V_X^T ∈ S^N_c ∩ S^N_+ (1220) belongs to the positive semidefinite cone in the geometric center subspace.

Figure 145: Example of V_X selection to make an EDM corresponding to cardinality N = 3 and affine dimension r = 1. V_X is a vector in nullspace N(1^T) ⊂ R^3. Nullspace of 1^T is hyperplane in R^3 (drawn truncated) having normal 1. Vector δ(V_X V_X^T) may or may not be in plane spanned by {1, V_X}, but belongs to nonnegative orthant which is strictly supported by N(1^T).

6.4.1 Range of EDM D

From §B.1.1^{6.5} pertaining to linear independence of dyad sums: If the transpose halves of all the dyads in the sum (1218) make a linearly independent set, then the nontranspose halves constitute a basis for the range of EDM D. Saying this mathematically: For D ∈ EDM^N

   R(D) = R([δ(V_X V_X^T)  1  V_X])  ⇐  rank([δ(V_X V_X^T)  1  V_X]) = 2 + r
   R(D) = R([1  V_X])                ⇐  otherwise    (1231)

^{6.5} Identifying columns V_X ≜ [v_1 ··· v_r], then V_X V_X^T = Σ_i v_i v_i^T is also a sum of dyads.

6.4.1.1 Proof. We need that condition under which the rank equality in (1231) is satisfied: We know R(V_X) ⊥ 1, but what is the relative geometric orientation of δ(V_X V_X^T)? δ(V_X V_X^T) ⪰ 0 because V_X V_X^T ⪰ 0, and δ(V_X V_X^T) ∉ N(1^T) simply because it has no negative entries. (Figure 145) If the projection of δ(V_X V_X^T) on N(1^T) does not belong to R(V_X), then that is a necessary and sufficient condition for linear independence (l.i.) of δ(V_X V_X^T) with respect to R([1  V_X]); id est,

   V δ(V_X V_X^T) = (I − (1/N)11^T)δ(V_X V_X^T) = δ(V_X V_X^T) − (1/N)‖V_X‖²_F 1 = δ(V_X V_X^T) − (λ/2N)1 = y ≠ V_X a for any a ∈ R^r
   ⇔ {1, V_X, δ(V_X V_X^T)} is l.i.    (1232)

When this condition is violated (when (1225) y = V_X a_p for some particular a_p ∈ R^r), then from (1224) we have

   R(D) = R((V_X a_p + (λ/N)1)1^T + (1a_p^T − 2V_X)V_X^T)
        = R([V_X a_p + (λ/N)1   1a_p^T − 2V_X]) = R([1  V_X])    (1233)

An example of such a violation is (1230) where, in particular, a_p = 0. ∎

Then a statement parallel to (1231) is, for D ∈ EDM^N (Theorem 5.7.3.0.1):

   rank(D) = r + 2 ⇔ y ∉ R(V_X)  (⇔ 1^T D† 1 = 0)
   rank(D) = r + 1 ⇔ y ∈ R(V_X)  (⇔ 1^T D† 1 ≠ 0)    (1234)

6.4.2 Boundary constituents of EDM cone

Expression (1218) has utility in forming the set of all EDMs corresponding to affine dimension r:

   {D ∈ EDM^N | rank(V D V) = r}
     = {D(V_X) | V_X ∈ R^{N×r}, rank V_X = r, V_X^T V_X = δ²(V_X^T V_X), R(V_X) ⊆ N(1^T)}    (1235)

whereas {D ∈ EDM^N | rank(V D V) ≤ r} is the closure of this same set;

   {D ∈ EDM^N | rank(V D V) ≤ r} = closure{D ∈ EDM^N | rank(V D V) = r}    (1236)

For example,

   EDM^N = ∪_{r=0}^{N−1} {D ∈ EDM^N | rank(V D V) = r}
         = closure{D ∈ EDM^N | rank(V D V) = N−1}
   rel int EDM^N = {D ∈ EDM^N | rank(V D V) = N−1}    (1237)

are pointed convex cones, while

   rel ∂ EDM^N = ∪_{r=0}^{N−2} {D ∈ EDM^N | rank(V D V) = r}
              = {D ∈ EDM^N | rank(V D V) < N−1}    (1238)

None of these unions of rank-r sets are necessarily convex sets.

When cardinality N = 3 and affine dimension r = 2, for example, the relative interior rel int EDM^3 is realized via (1235). When N = 3 and r = 1, the relative boundary of the EDM cone dvec rel ∂ EDM^3 is realized in isomorphic R^3 as in Figure 144d. This figure could be constructed via (1236) by spiraling vector V_X tightly about the origin in N(1^T), as can be imagined with aid of Figure 145. Vectors close to the origin in N(1^T) are correspondingly close to the origin in EDM^N. As vector V_X orbits the origin in N(1^T), the corresponding EDM orbits the axis of revolution while remaining on the boundary of the circular cone dvec rel ∂ EDM^3. (Figure 146)

Figure 146: (a) Vector V_X from Figure 145 spirals in N(1^T) ⊂ R^3 decaying toward origin. (Spiral is two-dimensional in vector space R^3.) (b) Corresponding trajectory D(V_X) on EDM cone relative boundary creates a vortex also decaying toward origin. There are two complete orbits on EDM cone boundary about axis of revolution for every single revolution of V_X about origin. (Vortex is three-dimensional in isometrically isomorphic R^3.)

6.4.3 Faces of EDM cone

Like the positive semidefinite cone, EDM cone faces are EDM cones.

6.4.3.0.1 Exercise. Isomorphic faces. Prove that in high cardinality N, any set of EDMs made via (1235) or (1236) with particular affine dimension r is isomorphic with any set admitting the same affine dimension but made in lower cardinality.

6.4.3.1 Smallest face that contains an EDM

Now suppose we are given a particular EDM D(V_{Xp}) ∈ EDM^N corresponding to affine dimension r and parametrized by V_{Xp} in (1218). The EDM cone's smallest face that contains D(V_{Xp}) is

   F(EDM^N ∋ D(V_{Xp}))
     = {D(V_X) | V_X ∈ R^{N×r}, rank V_X = r, V_X^T V_X = δ²(V_X^T V_X), R(V_X) ⊆ R(V_{Xp})}    (1239)

which is isomorphic^{6.6} with convex cone EDM^{r+1}, hence of dimension

   dim F(EDM^N ∋ D(V_{Xp})) = (r+1)r/2    (1240)

in isomorphic R^{N(N−1)/2}. Not all dimensions are represented; e.g., the EDM cone has no two-dimensional faces. When cardinality N = 4 and affine dimension r = 2, so that R(V_{Xp}) is any two-dimensional subspace of three-dimensional N(1^T) in R^4, for example, then the corresponding face of EDM^4 is isometrically isomorphic with: (1236)

   EDM^3 = {D ∈ EDM^3 | rank(V D V) ≤ 2} ≃ F(EDM^4 ∋ D(V_{Xp}))    (1241)

Each two-dimensional subspace of N(1^T) corresponds to another three-dimensional face. Because each and every principal submatrix of an EDM in EDM^N (§5.14.3) is another EDM [236, §4.1], for example, then each principal submatrix belongs to a particular face of EDM^N.

^{6.6} The fact that the smallest face is isomorphic with another EDM cone (perhaps smaller than EDM^N) is implicit in [186, §2].

6.4.3.2 Extreme directions of EDM cone

In particular, extreme directions (§2.8.1) of EDM^N correspond to affine dimension r = 1 and are simply represented: for any particular cardinality N ≥ 2 (§2.8.2) and each and every nonzero vector z in N(1^T)

   Γ ≜ δ(zz^T)1^T + 1δ(zz^T)^T − 2zz^T = (z◦z)1^T + 1(z◦z)^T − 2zz^T ∈ EDM^N    (1242)

is an extreme direction corresponding to a one-dimensional face of the EDM cone EDM^N that is a ray in isomorphic subspace R^{N(N−1)/2}. Proving this would exercise the fundamental definition (186) of extreme direction: For the same reason (1500) that zz^T is an extreme direction of the positive semidefinite cone (§2.9.2.7) for any particular nonzero vector z, there is no conic combination of distinct EDMs (each conically independent of Γ (§2.10)) equal to Γ.

6.4.3.2.1 Example. Biorthogonal expansion of an EDM. (confer §2.13.7.1.1) When matrix D belongs to the EDM cone, nonnegative coordinates for biorthogonal expansion are the eigenvalues λ ∈ R^N of −V D V/2: For any D ∈ S^N_h it holds

   D = δ(−V D V/2)1^T + 1δ(−V D V/2)^T − 2(−V D V/2)    (1031)

By diagonalization −V D V/2 ≜ QΛQ^T ∈ S^N (§A.5.1) we may write

   D = Σ_{i=1}^N λ_i (δ(q_i q_i^T)1^T + 1δ(q_i q_i^T)^T − 2q_i q_i^T)    (1243)

where q_i is the i th eigenvector of −V D V/2 arranged columnar in orthogonal matrix

   Q = [q_1  q_2  ···  q_N] ∈ R^{N×N}    (399)

and where {δ(q_i q_i^T)1^T + 1δ(q_i q_i^T)^T − 2q_i q_i^T, i = 1 ... N} are extreme directions of some pointed polyhedral cone K ⊂ S^N_h and extreme directions of EDM^N. Invertibility of (1243),

   −V D V/2 = −V (Σ_{i=1}^N λ_i (δ(q_i q_i^T)1^T + 1δ(q_i q_i^T)^T − 2q_i q_i^T)) V/2 = Σ_{i=1}^N λ_i q_i q_i^T    (1244)

implies linear independence of those extreme directions. Then biorthogonal expansion is expressed

   dvec D = Y Y† dvec D = Y λ(−V D V/2)    (1245)

where

   Y ≜ [dvec(δ(q_i q_i^T)1^T + 1δ(q_i q_i^T)^T − 2q_i q_i^T), i = 1 ... N] ∈ R^{N(N−1)/2×N}    (1246)

When D belongs to the EDM cone in the subspace of symmetric hollow matrices, unique coordinates Y† dvec D for this biorthogonal expansion must be the nonnegative eigenvalues λ of −V D V/2. This means D simultaneously belongs to the EDM cone and to the pointed polyhedral cone dvec K = cone(Y).

6.5 Correspondence to PSD cone S^{N−1}_+

Hayden, Liu, Wells, & Tarazaga [186, §2] assert one-to-one correspondence of EDMs with positive semidefinite matrices in the symmetric subspace.
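Expansion (1243) with eigenvalue coordinates is verifiable numerically (a sketch; random points):

   % rebuild D from extreme directions made of eigenvectors of -V*D*V/2
   N = 4;  X = randn(2,N);  G = X'*X;
   D = diag(G)*ones(1,N) + ones(N,1)*diag(G)' - 2*G;
   V = eye(N) - ones(N)/N;                       % geometric centering matrix
   [Q,L] = eig(-V*D*V/2);  lam = diag(L);
   Dr = zeros(N);
   for i = 1:N
      q = Q(:,i);
      Dr = Dr + lam(i)*((q.*q)*ones(1,N) + ones(N,1)*(q.*q)' - 2*(q*q'));
   end
   disp(norm(D - Dr,'fro'))                      % ~0 : (1243) holds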

7) T rank VN DVN = 1 D ∈ SN ∩ RN ×N h + D ∈ EDMN ⇔ ⇔ D is an extreme direction T −VN DVN ≡ uuT D ∈ SN h (1248) . r < N − 1: Symmetric hollow matrices −D positive semidefinite on N (1T ) . we invoke inner-product form EDM definition D(Φ) = 0 δ(Φ) 1T + 1 0 δ(Φ)T Φ ⇔ 0 − 2 0 0 0T Φ ∈ EDMN (1049) Then the EDM cone may be expressed EDMN = D(Φ) | Φ ∈ SN −1 + (1247) Hayden & Wells’ assertion can therefore be equivalently stated in terms of an inner-product form EDM operator D(SN −1 ) = EDMN + VN (EDMN ) = SN −1 + (1051) (1052) identity (1052) holding because R(VN ) = N (1T ) (929). [8. 18.0.2.6. Hayden & Wells claim particular correspondence between PSD and EDM cones: r = N − 1: Symmetric hollow matrices −D positive definite on N (1T ) correspond to points relatively interior to the EDM cone.3. linear functions D(Φ) T and VN (D) = −VN DVN ( 5. In terms of affine dimension r . CONE OF DISTANCE MATRICES the EDM cone can only be SN −1 .2. where T −VN DVN has at least one 0 eigenvalue.1. r = 1: Symmetric hollow nonnegative matrices rank-one on N (1T ) correspond to extreme directions (1242) of the EDM cone. correspond to points on the relative boundary of the EDM cone.526 CHAPTER 6. for some nonzero vector u ( A.1) being mutually inverse. id est.1] To clearly demonstrate this + correspondence.

nonnegativity.  0 1 1 1 T T −V  1 0 5 V = z1 z1 − z2 z2 2 1 5 0  (1252) 6.2) N (VN (D)) = {1q T + q 1T | q ∈ RN } ⊆ SN (1250) (1249) Hollowness demands q = δ(zz T ) while nonnegativity demands choice of positive sign in (1249).8 †T (⇐) For ai ∈ RN −1 . id est. e. and symmetric hollowness conditions but not positive semidefiniteness on subspace R(VN ) .0.5. .0. i = 1 .1.q ∈ RN and nonzero z ∈ N (1T ) . for {zi ∈ N (1T ) .6. let zi = VN ai .8. The foregoing proof is not extensible in rank: An EDM with corresponding affine dimension r has the general form. extreme direction (1242). and Schoenberg criterion (941).. confer E. r T D = 1δ i=1 zi ziT r +δ i=1 zi ziT 1T − 2 r i=1 zi ziT ∈ EDMN (1251) The EDM so defined relies principally on the sum zi ziT having positive T 6. Then it is easy to find a sum incorporating negative coefficients while meeting rank. prove T rank VN DVN = 1 D ∈ SN ∩ RN ×N h + D ∈ EDMN ⇒ D is an extreme direction Any symmetric matrix D satisfying the rank condition must have the form. CORRESPONDENCE TO PSD CONE SN −1 + 527 6. for z.2.7.0. Case r = 1 is easily proved: From the nonnegativity development in 5. Matrix D thus takes the form of an extreme direction (1242) of the EDM cone.6.5. r} an independent set.8 summand coefficients (⇔ −VN DVN 0) .1 Proof.1. D = ±(1q T + q 1T − 2zz T ) because ( 5. from page 485. we need show only sufficiency. . .g.2.

5} . this conventional boundary does not include D which belongs to the relative interior in this dimension. CONE OF DISTANCE MATRICES Example.4.1.1 provide [2.1) 6. for any particular κ > 0.0.1. + = V(EDMN ) 6. Extreme rays versus rays on the boundary.1.5.6] T EDMN = D V(EDMN ) ≡ D VN SN −1 VN + T T VN SN −1 VN ≡ V D VN SN −1 VN + + (1253) −V EDMN V 1 = SN ∩ SN c + 2 (1254) a one-to-one correspondence between EDMN and SN −1 . Because EDM2 is a ray whose relative boundary ( 2.   0 1 4 The EDM D =  1 0 1  is an extreme direction of EDM3 where 4 1 0 1 T in (1248). Because −VN DVN has eigenvalues {0. is an 1 0 T extreme direction of EDM2 but −VN DVN has only one eigenvalue: {κ}.5. (1129) (1255) ( 5. the ray u= 2 whose direction is D also lies on the relative boundary of EDM3 .6.528 6. elliptope .1 Gram-form correspondence to SN −1 + With respect to D(G) = δ(G)1T + 1δ(G)T − 2G (934) the linear Gram-form EDM operator.7.9.2 EDM cone by elliptope Having defined the elliptope parametrized by scalar t > 0 EtN = SN ∩ {Φ ∈ SN | δ(Φ) = t1} + then following Alfakih [9] we have N N EDMN = cone{11T − E1 } = {t(11T − E1 ) | t ≥ 0} N Identification E N = E1 equates the standard Figure 136) to our parametrized elliptope.1) is the origin.0.5.2 CHAPTER 6.6. ( 2. EDM D = κ . results in 5.0.0.0. 2. 0 1 In exception.

The fractal is trying to convey existence of a neighborhood about the origin where the translated elliptope boundary and EDM cone boundary intersect. CORRESPONDENCE TO PSD CONE SN −1 + 529 dvec rel ∂ EDM3 dvec(11T − E 3 ) EDMN = cone{11T − E N } = {t(11T − E N ) | t ≥ 0} (1255) 3 Figure 147: Three views of translated negated elliptope 11T − E1 (confer Figure 136) shrouded by truncated EDM cone.6. .5. Fractal on EDM cone relative boundary is numerical artifact belonging to intersection with elliptope relative boundary.

5.530 CHAPTER 6. CONE OF DISTANCE MATRICES ⊥ The normal cone KE (11T ) to the elliptope at 11T is a closed convex cone defined ( E.4] ⊥ KE (11T ) = TE (11T ) ◦ (1260) From Deza & Laurent [114.1.0. (1256) T E (11T ) {t(E − 11T ) | t ≥ 0} {B | B . id est.2. We further propose. Define T E (11T ) to be the tangent cone to the elliptope E at point 11T .3. p.535] we have the EDM cone EDM = −TE (11T ) (1261) The polar EDM cone is also expressible in terms of the elliptope. Φ ∈ E } (1257) The polar cone of any set K is the closed convex cone (confer (296)) K ◦ {B | B . for any particular t > 0 EDMN = cone{t11T − EtN } Proof.1 Expository. tangent cone. Normal cone. From (1259) we have ◦ ⊥ (1262) EDM = −KE (11T ) ⋆ In 5. elliptope.2.10. ⊥ KE (11T ) = T E (11T ) (1259) and vice versa. A ≤ 0 . Φ − 11T ≤ 0 . Figure 182) ⊥ KE (11T ) 6.2. [200. for all A ∈ K} ◦ (1258) The normal cone is well known to be the polar of the tangent cone.10.1 we proposed the expression for EDM D D = t11T − E ∈ EDMN (1130) where t ∈ R+ and E belongs to the parametrized elliptope EtN . (1263) .5. Pending. A.

to the Schoenberg criterion T −VN DVN 0 D∈ SN h ⇔ D ∈ EDMN (941) because N (11T ) = R(VN ). When D ∈ EDMN . Yet there is another useful projection interpretation: Revising the fundamental matrix criterion for membership to the EDM cone (917).2 Exercise.2 we learn: −V D V can be interpreted as orthogonal projection [6. VECTORIZATION & PROJECTION INTERPRETATION 531 6.6.2.6.6 Vectorization & projection interpretation In E. Relationship of the translated negated elliptope with the EDM cone is illustrated in Figure 147.6.6.7.1).2.9 zz T .2.9 N (11T ) = N (1T ) and R(zz T ) = R(z) .0. Prove whether it holds that EDMN = lim t11T − EtN t→∞ (1264) 6.0. of course. Figure 149) of −D in isometrically isomorphic 6. EDM cone from elliptope.5.4. 2] of vectorized −D ∈ SN on the subspace of geometrically centered h symmetric matrices SN = {G ∈ SN | G1 = 0} c (2040) (2041) (1022) = {G ∈ SN | N (G) ⊇ 1} = {G ∈ SN | R(G) ⊆ N (1T )} T ≡ {VN AVN | A ∈ SN −1 } = {V Y V | Y ∈ SN } ⊂ SN because elementary auxiliary matrix V is an orthogonal projector ( B. correspondence (1265) means −z TDz is proportional to a nonnegative coefficient of orthogonal projection ( E. −D ≥ 0 ∀ zz T | 11Tzz T = 0 D∈ SN h ⇔ D ∈ EDMN (1265) this is equivalent.4.

+ + To show this. only one extreme direction of the PSD cone is involved: zz T = 1 −1 −1 1 (1268) Any nonnegative scaling of vectorized zz T belongs to the set T illustrated in Figure 148 and Figure 149.532 CHAPTER 6. 2. From (221). Yet only those dyads in R(VN ) are included in the test (1265). + In the particularly simple case D ∈ EDM2 = {D ∈ S2 | d12 ≥ 0} .2. thus only a subset T of all vectorized extreme directions of SN is observed. we must first find the smallest face that contains auxiliary matrix V and then determine its extreme directions.3).1 Face of PSD cone SN containing V + In any case.6. hence lie on + its boundary.2.1) in the nullspace of 11T .2.9) SN .6. a face isomorphic with SN −1 = Srank V ( 2. for h example.8. on each and every member of T svec(zz T ) | z ∈ N (11T ) = R(VN ) ⊂ svec ∂ SN + (1266) T = svec(VN υυ T VN ) | υ ∈ RN −1 whose dimension is dim T = N (N − 1)/2 (1267) The set of all symmetric dyads {zz T | z ∈ RN } constitute the extreme directions of the positive semidefinite cone ( 2.1) symmetric dyad ( B. id est. F SN ∋ V + = {W ∈ SN | N (W ) ⊇ N (V )} = {W ∈ SN | N (W ) ⊇ 1} + + T ≃ Srank V = −VN EDMN VN + = {V Y V T 0 | Y ∈ SN } ≡ {VN B VN | B ∈ SN −1 } + (1269) where the equivalence ≡ is from 5.1. CONE OF DISTANCE MATRICES RN (N +1)/2 on the range of each and every vectorized ( 2. set T (1266) constitutes the vectorized extreme directions of an N (N − 1)/2-dimensional face of the PSD cone SN containing auxiliary + matrix V .1 while isomorphic equality ≃ with transformed EDM cone is from (1052).9. Projector V belongs to F SN ∋ V + . 6.

Relative boundary of √ EDM cone is constituted solely by matrix 0. constituted solely by its extreme directions. Halfline T = {κ2 [ 1 − 2 1 ]T | κ ∈ R} on PSD cone boundary depicts that lone extreme ray (1268) on which orthogonal projection of −D must be positive semidefinite if D is to belong to EDM2 . in this dimension.6. (1273) Dual c EDM cone is halfspace in R3 whose bounding hyperplane has inward-normal svec EDM2 . aff cone T = svec S2 .6. Truncated cone of Euclidean distance matrices EDM2 in isometrically isomorphic subspace R . κ∈ R −1 ⊂ svec ∂ S2 + Figure 148: Truncated boundary of positive semidefinite cone S2 in + isometrically isomorphic R3 (via svec (56)) is. VECTORIZATION & PROJECTION INTERPRETATION d11 d12 d12 d22 svec ∂ S2 + 533 d22 svec EDM2 0 −T √ d11 2d12 T svec(zz T ) | z ∈ N (11T ) = κ 1 . .

Set T (1266) has only one ray member in this dimension. Orthogonal projection of −D on T (indicated by half dot) has h nonnegative projection coefficient. we find its polar along h S2 . not orthogonal h to S2 . CONE OF DISTANCE MATRICES d11 d12 d12 d22 svec ∂ S2 + d22 D 0 −D −T √ svec S2 h d11 2d12 Projection of vectorized −D on range of vectorized zz T : Psvec zzT (svec(−D)) = zz T . −D zz T zz T . zz T D ∈ EDM N ⇔ D ∈ SN h zz T .534 CHAPTER 6. −D ≥ 0 ∀ zz T | 11Tzz T = 0 (1265) Figure 149: Given-matrix D is assumed to belong to symmetric hollow subspace S2 . Negating D . . Matrix D must therefore be an EDM. a line in this dimension.

4.10.6. the origin constitutes the extreme points of F .1 we propose D ∈ EDMN ⇔ ∃ t ∈ R+ E ∈ EtN D = t11T − E (1274) .6.3] D ∈ EDMN ⇔ [1 − e−α dij ] ∈ EDMN ∀ α > 0 (1124a) From the parametrized elliptope EtN in 6. In fact the smallest face. (948)) Laurent specifies an elliptope trajectory condition for EDM cone membership: [236.1. and auxiliary matrix V is some convex combination of those extreme points and directions by the extremes theorem ( 2. of the PSD cone SN is the intersection with the geometric center subspace (2040) (2041).8.4. ( B. + F SN ∋ V + T = cone VN υυ T VN | υ ∈ RN −1 ≡ {X = SN ∩ SN c + (1271) (1626) 0 | X .3) Each and every rank-one matrix belonging to this face is therefore of the form: T VN υυ T VN | υ ∈ RN −1 (1270) Because F SN ∋ V is isomorphic with a positive semidefinite cone SN −1 . + + then T constitutes the vectorized extreme directions of F . 11T = 0} In isometrically isomorphic RN (N +1)/2 svec F SN ∋ V = cone T + related to SN by c aff cone T = svec SN c (1273) (1272) 6.1.2 EDM criteria in 11T (confer 6.5. that contains auxiliary matrix V .2 and 5. 2. VECTORIZATION & PROJECTION INTERPRETATION 535 † †T T because VN VN VN VN = V .1).6.

Intuitively. but is no more practical: for D ∈ SN h D ∈ EDMN (1276) ⇔ ∀ κ ≥ κp .7 A geometry of completion It is not known how to proceed if one wishes to restrict the dimension of the Euclidean space in which the configuration of points may be constructed. 4] prove a different criterion they attribute to Finsler (1937) [145]. Given an incomplete noiseless EDM.. This representation is practical because it is easily combined with or replaced by other convex constraints.536 CHAPTER 6. Trosset (2000) [355.g. CONE OF DISTANCE MATRICES Chabrillac & Crouzeix [74. With reference to Figure 148. Depicted in Figure 150a is an intersection of the EDM cone EDM3 with a single hyperplane representing the set of all EDMs having one fixed symmetric entry. Chabrillac & Crouzeix invent a criterion for the semidefinite case. the offset 11T is simply a direction orthogonal to T in isomorphic R3 .10 −Michael W. slab inequalities in (749) that admit bounding of noise processes.10 Assuming a nonempty intersection. [2] [4] [5] [6] [8] [17] [67] [206] [218] [235] [236] [237] If one or more entries of a particular EDM are fixed. translation of −D in direction 11T is like orthogonal projection on T insofar as similar information can be obtained.6. intriguing is the question of whether a list in X ∈ Rn×N (76) may be reconstructed and under what circumstances reconstruction is unique. 6. then geometric interpretation of the feasible set of completions is the intersection of the EDM cone EDMN in isomorphic subspace RN (N −1)/2 with as many hyperplanes as there are fixed symmetric entries. −D − κ11T [sic] has exactly one negative eigenvalue ∃ κp > 0 6. We apply it to EDMs: for D ∈ SN (1072) h T −VN DVN ≻ 0 ⇔ ∃ κ > 0 −D + κ11T ≻ 0 ⇔ D ∈ EDMN with corresponding affine dimension r = N − 1 (1275) This Finsler criterion has geometric interpretation in terms of the vectorization & projection already discussed in connection with (1265). 1] . e. When the Finsler criterion (1275) is applied despite lower affine dimension. the constant κ can go to infinity making the test −D + κ11T 0 impractical for numerical computation.

8. (b) d13 = κ . In this dimension it is impossible to represent a unique nonzero completion corresponding to affine dimension 1. (c) d12 = κ . then it is apparent there exist infinitely many EDM completions corresponding to affine dimension 1.6. intersection of EDM3 with hyperplane ∂H representing one fixed symmetric entry d23 = κ (both drawn truncated.1. EDMs in this dimension corresponding to affine dimension 1 comprise relative boundary of EDM cone ( 6.0.7.0. rounded vertex is artifact of plot).5). . A GEOMETRY OF COMPLETION 537 (b) (c) dvec rel ∂ EDM3 (a) 0 ∂H Figure 150: (a) In isometrically isomorphic subspace R3 . using a single hyperplane because any hyperplane supporting relative boundary at a particular point Γ contains an entire ray {ζ Γ | ζ ≥ 0} belonging to rel ∂ EDM3 by Lemma 2. Since intersection illustrated includes a nontrivial subset of cone’s relative boundary. for example.

2. Figure 151 realizes a densely sampled neighborhood. incomplete even though it is a superset of the neighborhood graph. because the neighbor number is relatively small. imagine. who originated the technique.0. unfolding. N ] employing only 2 nearest neighbors is banded having the incomplete form Local reconstruction of point position. The common number of nearest neighbors to each sample is a data-dependent algorithmic parameter whose minimum value connects the graph.10. The dimensionless EDM subgraph between each sample and its nearest neighbors is completed from available data and included as input. called. we find it deformed from flatness in some way introducing neither holes.g. 6.6. it is this variability that admits unfurling. A corresponding EDM D = [dij .6.9. Maximum variance unfolding.12). e.2) that can be as small as a point. decompacting. and those corresponding to particular affine dimension r < N − 1 belong to some generally nonconvex subset of that intersection (confer 2.1. consider untying the trefoil knot drawn in Figure 152a.11 .1.2). [375.7. in general. Remaining distances (those not graphed at all) are squared then made variables within the algorithm. CONE OF DISTANCE MATRICES then the number of completions is generally infinite.9. the algorithmic process preserves local isometry between nearest neighbors allowing distant neighbors to excurse expansively by “maximizing variance” (Figure 5). is unique to within an isometry ( 5.6). 6. or unraveling. In particular instances.2. i.2] The physical process is intuitively described as unfurling. specify an applicable manifold in three dimensions by analogy to an ordinary sheet of paper (confer 2.538 CHAPTER 6.. neighborhood graph. Weinberger & Saul. diffusing. one such EDM subgraph completion is drawn superimposed upon the neighborhood graph in Figure 151 .1 Example. or self-intersections. .5) of certain kinds of Euclidean manifold by topological transformation can be posed as a completion problem (confer E.0.11 The consequent dimensionless EDM graph comprising all the subgraphs is incomplete. [375] A process minimizing affine dimension ( 2. 5.1. Essentially. from the EDM submatrix corresponding to a complete dimensionless EDM subgraph. the process is a sort of flattening by stretching until taut (but not by crushing). tears. .2. j = 1 . unfurling a three-dimensional Euclidean body resembling a billowy national flag reduces that manifold’s affine dimension to r = 2. To demonstrate. Data input to the proposed process originates from distances between neighboring relatively dense samples of a given manifold.

7. All line segments measure absolute distance. 2]) Solid line segments represent completion of EDM subgraph from available distance data for an arbitrarily chosen sample and its nearest neighbors. suggesting. best number of nearest neighbors can be greater than value of embedding dimension after topological transformation. Local view of a few dense samples from relative interior of some arbitrary Euclidean manifold whose affine dimension appears two-dimensional in this neighborhood. A GEOMETRY OF COMPLETION 539 (a) 2 nearest neighbors (b) 3 nearest neighbors Figure 151: Neighborhood graph (dashed) with one dimensionless EDM subgraph completion (solid) superimposed (but not obscuring dashed). (confer [213. .6. Dashed line segments help visually locate nearest neighbors. Each distance from EDM subgraph becomes distance-square in corresponding EDM submatrix.

CONE OF DISTANCE MATRICES (a) (b) Figure 152: (a) Trefoil knot in R3 from Weinberger & Saul [375]. . (b) Topological transformation algorithm employing 4 nearest neighbors and N = 539 samples reduces affine dimension of knot to r = 2. Choosing instead 2 nearest neighbors would make this embedding more circular.540 CHAPTER 6.

. a corresponding solution D⋆ can be found.. .. D 1 2 ˇ = dij ∀(i.. j) ∈ I (1278) where ei ∈ RN is the i th member of the standard basis.. Yet because the decomposition for the trefoil knot reveals only two dominant eigenvalues..    ? ? ?   ˇ ?  d1. where V ∈ RN ×N is the geometric centering matrix ( B. constrained total distance-square maximization: maximize D  ˇ ˇ  d12 0 d23   ˇ ˇ  d13 d23 0   ˇ ˇ  ? d24 d34 D =  .7.. A GEOMETRY OF COMPLETION  0 ˇ d12 ˇ d13 ? ˇ d24 ··· .j (946) where G is the Gram matrix producing D assuming G1 = 0. . ? ? ? ..N ˇ d1.. 0 ˇ dN −2.... ? ? .N −1 ? ˇ ˇ d1N d2N ? ˇ . ei eT + ej eT j i rank(V D V ) = 2 D ∈ EDMN −V . .4.... .. .1).6.1.... . a low-dimensional embedding of the trefoil knot.N −1 ˇ dN −2.N −1 dN −2..  .N ˇ 0 dN −1.. If the (rank) constraint on affine dimension is ignored. ˇ d1N ˇ d2N ? ? ?  541 ˇ where dij denotes a given fixed distance-square. and a nearest rank-2 solution is then had by ordered eigenvalue decomposition R2 + of −V D⋆ V followed by spectral projection ( 7.3) on ⊂ RN . Such a reconstruction of point position ( 5. and where −V . 0 .. ? ˇ ˇ dN −2. . .N 0          (1277)         subject to D ...N −1 ? ? ? . then problem (1278) becomes convex. D = tr(−V D V ) = 2 tr G = 1 N dij i.. where set I indexes the given distance-square data like that in (1277). the spectral projection is nearly benign. d34 .N ˇ dN −1. . The unfurling algorithm can be expressed as an optimization problem. This 0 two-step process is necessarily suboptimal.12) utilizing 4 nearest neighbors is drawn in Figure 152b.

courtesy. . Kilian Weinberger. CONE OF DISTANCE MATRICES Figure 153: Trefoil ribbon.542 CHAPTER 6. Same topological transformation algorithm with 5 nearest neighbors and N = 1617 samples.

8. (Example 5.1.1. constraints on any or all of these three variables may now be introduced. D ≤ 0 .8 6.6.3. closed. D ≥ 0 .2..4. dual cone K comprises each and every vector inward-normal to a hyperplane supporting convex cone K . Optimal Gram rank turns out to be tightly bounded above by minimum multiplicity of the second smallest eigenvalue of a dual optimal variable.1. and has direct application to quick calculation of PageRank by search engines [232.8. j) ∈ I (1279) subject to The advantage to converting EDM to Gram is: Gram matrix G is a bridge between point list X and EDM D . The dual EDM cone in the ambient space of symmetric matrices is therefore expressible as .4.1. zz T . for Φij as in (920) maximize G∈SN c I.5) Confinement to the geometric center subspace SN (implicit constraint G1 = 0) keeps G c independent of its translation-invariant subspace SN ⊥ ( 5. of course. 4]. That dual has simple interpretations in graph and circuit theory and in mechanical and thermal systems. explored in [342]. but not full-dimensional. 6.G ˇ G . −D ≥ 0 i i z∈ N (1T ) (1280) i=1.5. The set of all EDMs in SN is a closed convex cone because it is the intersection of halfspaces about the origin in vectorized variable D ( 2.1 Dual EDM cone Ambient SN We consider finding the ordinary dual EDM cone in ambient space SN where EDMN is pointed.. A problem dual to maximum variance unfolding problem (1279) (less the Gram rank constraint) has been called the fastest mixing Markov process. be written equivalently in terms of Gram matrix G . videlicet. facilitated by (952).1. DUAL EDM CONE 543 This problem (1278) can. convex. ei eT .7. Φij = dij rank G = 2 G 0 ∀(i.2): EDMN = D ∈ SN | ei eT .N ∗ By definition (296). 2. Figure 154) c so as not to become numerically unbounded.

ξ j ≥ 0} − cone{zz T | 11Tzz T = 0} j T = {δ(u) | u ∈ R } − cone VN υυ T VN | υ ∈ RN −1 . CONE OF DISTANCE MATRICES the aggregate of every conic combination of inward-normals from (1280): EDMN = cone{ei eT . j = 1 .13. (confer 2.2. N } − cone{zz T | 11Tzz T = 0} i j N ∗ ={ i=1 ζ i ei eT − i N j=1 N ξj ej eT | ζ i . the EDM cone is the point at the origin in R . ( 2. . The dual EDM cone is formed by translating the hyperplane along the negative semidefinite halfline −T . −A ≥ 0 N T T (1283) 0) A ∈ S | −V A V A ∈ S | −z V A V z ≥ 0 ∀ zz ( N 0 K1 ∩ K2 = EDMN (1284) .10.1 EDM cone and its dual in ambient SN K1 = = so SN h y∈N (1T ) Consider the two convex cones K2 A ∈ SN | yy T . and dual cone EDM∗ is the real line. ( 2. . When N = 2. the dual EDM cone can no longer be polyhedral simply because the EDM cone cannot. Diagonal matrices {δ(u)} in (1281) are represented by a hyperplane T through the origin {d | [ 0 1 0 ]d = 0} while the term cone{VN υυ T VN } is represented by the halfline T in Figure 148 belonging to the positive semidefinite cone boundary.1) 6. Dual cone h ∗ EDM2 is the particular halfspace in R3 whose boundary has inward-normal EDM2 . the EDM cone is a nonnegative real line in isometrically isomorphic R3 .1) When N = 1. there S2 is a real line containing the EDM cone.544 CHAPTER 6.1) When cardinality N exceeds 2.1.13. ( v = 1) ⊂ SN T = {δ 2(Y ) − VN ΨVN | Y ∈ SN .1. therefore. Ψ ∈ SN −1 } + (1281) The EDM cone is not selfdual in ambient SN because its affine hull belongs to a proper subspace aff EDMN = SN (1282) h The ordinary dual EDM cone cannot.0. the union of each and every translation. Auxiliary matrix VN is empty [ ∅ ] . be pointed.8.1. −ej eT | i .

9. (V H V − H) = 0 + Product V H V belongs to the geometric center subspace.1.6.0. we observe membership of H −PSN (V H V ) to K2 because + PSN (V H V ) − H = + PSN (V H V ) − V H V + + (V H V − H) (1287) The term PSN (V H V ) − V H V necessarily belongs to the (dual) positive + semidefinite cone by Theorem E.3. while projection of H ∈ SN on K2 is PK2 H = H − PSN (V H V ) (1286) + Proof.5) whose nullspace is spanned by the eigenvectors corresponding to 0 eigenvalues by Theorem A. H − PSN (V H V ) = 0 + + (1289) 0 (1288) (1290) (1291) (1292) PSN (V H V ) .5. PK2 H − H . T K2 = − cone VN υυ T VN | υ ∈ RN −1 ⊂ SN ∗ (1285) Gaffke & Mathar [149.1.2. Next.3.1. hence −V H − PSN (V H V ) V + by Corollary A. we require Expanding. First. h From (1281) via (313).1) simply zeroes negative eigenvalues in diagonal matrix Λ . 5.7. Projection of V H V on the PSD cone ( 7. V 2 = V . DUAL EDM CONE ∗ 545 Dual cone K1 = SN ⊥ ⊆ SN (72) is the subspace of diagonal matrices. ( E.7.2) V H V ∈ SN = {Y ∈ SN | N (Y ) ⊇ 1} c (1293) Diagonalize V H V QΛQT ( A.0. Then N (PSN (V H V )) ⊇ N (V H V ) ( ⊇ N (V ) ) + (1294) .0. PK2 H = 0 −PSN (V H V ) . (PSN (V H V ) − V H V ) + (V H V − H) = 0 + + PSN (V H V ) .8.2.3] observe that projection on K1 and K2 have simple closed forms: Projection on subspace K1 is easily performed by symmetrization and zeroing the main diagonal or vice versa.0.

3) which was + + + already established in (1294). ( A. we must have PK2 H − H = −PSN (V H V ) ∈ K2 . .3. projection of PSN H on the positive c semidefinite cone remains in the geometric center subspace SN = {G ∈ SN | G1 = 0} c (2040) (1022) (2041) = {G ∈ SN | N (G) ⊇ 1} = {G ∈ SN | R(G) ⊆ N (1T )} = {V Y V | Y ∈ SN } ⊂ SN That is because: eigenvectors of PSN H corresponding to 0 eigenvalues c span its nullspace N (PSN H ).5. Projection on PSD cone ∩ geometric center subspace. its negative eigenvalues are zeroed. we know matrix product V H V is the orthogonal projection of H ∈ SN on the geometric center subspace SN .2) The nullspace is thereby expanded while eigenvectors originally spanning N (PSN H ) c remain intact. From results in E. From 6.7.1 + ∗ we know dual cone K2 = −F SN ∋ V is the negative of the positive + semidefinite cone’s smallest face that contains auxiliary matrix V .8.546 from which it follows: CHAPTER 6.1.1 (1296) Lemma. Thus c the projection product PK2 H = H − PSN PSN H c + 6.9. Because the geometric center subspace is invariant to projection on the PSD cone. ( 7.1) To project PSN H on the positive c c semidefinite cone.0.7. Thus PSN (V H V ) ∈ F SN ∋ V ⇔ N (PSN (V H V )) ⊇ N (V ) ( 2. then the rule for projection on a convex set in a subspace governs ( E.2.6. projectors do not commute) and statement (1297) follows directly.0.1. c + ∗ Finally.2.2. For each and every H ∈ SN . PSN ∩ SN = PSN PSN c c + + (1297) ⋄ Proof.9. CONE OF DISTANCE MATRICES PSN (V H V ) ∈ SN c + (1295) so PSN (V H V ) ⊥ (V H V −H) because V H V −H ∈ SN ⊥ .1.

(Rounded vertex is artifact of plot. drawn is a tiled fragment) apparently supporting positive semidefinite cone.) Line svec S2 = aff cone T (1273) intersects svec ∂ S2 . DUAL EDM CONE 547 EDM2 = S2 ∩ S2⊥ − S2 h c + svec S2⊥ c svec S2 c svec ∂ S2 + svec S2 h 0 Figure 154: Orthogonal complement S2⊥ (2042) ( B.6. also drawn in c + Figure 148. (confer Figure 135) .8. it runs along PSD cone boundary.2) of geometric center c subspace (a plane in isometrically isomorphic R3 .

.548 CHAPTER 6. CONE OF DISTANCE MATRICES EDM2 = S2 ∩ S2⊥ − S2 h c + svec S2⊥ c svec S2 c svec S2 h 0 − svec ∂ S2 + Figure 155: EDM cone construction in isometrically isomorphic R3 by adding polar PSD cone to svec S2⊥ . EDM cone is nonnegative halfline along svec S2 in h this dimension. Difference svec S2⊥ − S2 is halfspace partially c c + 2⊥ bounded by svec Sc .

1): EDMN = K1 + K2 = SN ⊥ − SN ∩ SN h c + which bears strong resemblance to (1281). The dual EDM cone follows directly from (1301) by standard properties of cones ( 2.1.2 nonnegative orthant contains EDMN ∗ ∗ ∗ (1302) That EDMN is a proper subset of the nonnegative orthant is not obvious from (1301).8. 6. p. and it is not an equivalence between EDM operators or an isomorphism. [89. it is a recipe for constructing the EDM cone whole from large Euclidean bodies: the positive semidefinite cone. Rather. orthogonal complement of the geometric center subspace. it is not an EDM definition.1. Formula (1301) is not a matrix criterion for membership to the EDM cone.6.13. A realization of this construction in low dimension is illustrated in Figure 154 and Figure 155.8. in hindsight. DUAL EDM CONE From the lemma it follows {PSN PSN H | H ∈ SN } = {PSN ∩ SN H | H ∈ SN } c c + + Then from (2067) − SN ∩ SN c + ∗ 549 (1298) = {H − PSN PSN H | H ∈ SN } c + (1299) From (313) we get closure of a vector sum K2 = − SN ∩ SN c + therefore the equality [100] EDMN = K1 ∩ K2 = SN ∩ SN ⊥ − SN h c + (1301) ∗ = SN ⊥ − SN c + (1300) whose veracity is intuitively evident. We wish to verify EDMN = SN ∩ SN ⊥ − SN ⊂ RN ×N h c + + (1303) . and symmetric hollow subspace.109] from the most fundamental EDM definition (922).

the doublet δ(zz T )1T + 1δ(zz T )T ∈ SN ⊥ (1305) c to the orthogonal complement (2042) of the geometric center subspace.3.8. N ] ∈ N (1T ) ( 6. We observe that the dyad 2zz T ∈ SN belongs to the positive semidefinite + cone. δ(zz T )1T + 1δ(zz T )T − 2zz T ≥ 0 (1304) where the inequality denotes entrywise comparison. This in turn implies δ(D∗ 1) = δ(y). for D∗ ∈ SN + D∗ ∈ EDMN ⇔ δ(D∗ 1) − D∗ where. CONE OF DISTANCE MATRICES While there are many ways to prove this. id est.1. Then we have. h Here is an algebraic method to prove nonnegativity: Suppose we are given A ∈ SN ⊥ and B = [Bij ] ∈ SN and A − B ∈ SN . Positive semidefiniteness of B requires nonnegativity A − B ≥ 0 because (ei −ej )TB(ei −ej ) = (Bii −Bij )−(Bji −Bjj ) = 2(ui +uj )−2Bij ≥ 0 6.550 CHAPTER 6. . it is sufficient to show that all entries of the extreme directions of EDMN must be nonnegative.4. . for any symmetric matrix D∗ δ(D∗ 1) − D∗ ∈ SN c (1308) ∗ 0 (1307) To show sufficiency of the matrix criterion in (1307). for c + h some vector u . for any particular nonzero vector z = [zi . A = u1T + 1uT = [Aij ] = [ui + uj ] and δ(B) = δ(A) = 2u .3 Dual Euclidean distance matrix criterion (1306) Conditions necessary for membership of a matrix D∗ ∈ SN to the dual ∗ ∗ EDM cone EDMN may be derived from (1281): D∗ ∈ EDMN ⇒ T D∗ = δ(y) − VN AVN for some vector y and positive semidefinite matrix A ∈ SN −1 . The inequality holds because the i. i = 1 . j th entry of an extreme direction is squared: (z i − zj )2 . while their difference is a member of the symmetric hollow subspace SN .2). Then. recall Gram-form EDM operator D(G) = δ(G)1T + 1δ(G)T − 2G (934) .

1) D∗ ∈ EDMN ∗ ⇔ D . For any D∗ ∈ SN it is obvious: δ(D∗ 1) ∈ SN ⊥ (1312) h Recall: Schoenberg criterion (941) for membership to the EDM cone requires membership to the symmetric hollow subspace. ∗ Find a spectral cone as in 5. 6.8.111] EDMN = {D∗ ∈ SN | D . for Y ∈ SN DT (Y ) (δ(Y 1) − Y ) 2 (1310) Then from (1309) and (935) we have: [89. and recall the self-adjointness property of the main-diagonal linear operator δ ( A.12 Linear Gram-form EDM operator D(Y ) (934) has adjoint. D∗ = δ(G)1T + 1δ(G)T − 2G .4 Nonorthogonal components of dual EDM Now we tie construct (1302) for the dual EDM cone together with the matrix criterion (1307) for dual EDM cone membership.3.1): D .8. δ(D∗ 1) − D∗ ≥ 0 (1518). 6.1 Exercise. DUAL EDM CONE 551 where Gram matrix G is positive semidefinite by definition. D∗ ≥ 0 ∀ D ∈ EDMN (1309) Elegance of this matrix criterion (1307) for membership to the dual EDM cone is lack of any other assumptions except D∗ be symmetric:6.0.6. 6. D∗ = G . A dual EDM cone determined this way is illustrated in Figure 157.2 corresponding to EDMN .1. p. then we have known membership relation ( 2. Dual EDM spectral cone.1. DT (D∗ ) ≥ 0 ∀ G ∈ SN } + 0} (1311) = {D∗ ∈ SN | δ(D∗ 1) − D∗ the dual EDM cone expressed in terms of the adjoint operator.11.2. D∗ ≥ 0 ∀ G ∈ SN } + = {D∗ ∈ SN | G .13.12 . δ(D∗ 1) − D∗ 2 (952) Assuming G . D∗ ≥ 0 ∀ D ∈ EDMN } ∗ = {D∗ ∈ SN | D(G) .8.

Any member D◦ of polar EDM cone can be decomposed into two linearly independent nonorthogonal components: δ(D◦ 1) and D◦ − δ(D◦ 1).552 CHAPTER 6. . CONE OF DISTANCE MATRICES D◦ = δ(D◦ 1) + (D◦ − δ(D◦ 1)) ∈ SN ⊥ ⊕ SN ∩ SN = EDMN h c + SN ∩ SN c + ◦ EDMN ◦ D◦ δ(D◦ 1) D◦ − δ(D◦ 1) 0 EDMN ◦ SN ⊥ h Figure 156: Hand-drawn abstraction of polar EDM cone (drawn truncated).

for some A ∈ SN −1 (confer (1313)) + T δ(D∗ 1) − D∗ = VN AVN ∈ SN ∩ SN c + (1315) T if and only if D∗ belongs to the dual EDM cone. DUAL EDM CONE 553 any diagonal matrix belongs to the subspace of diagonal matrices (67). rewriting.1.8. 6.5 Affine dimension complementarity From 6.8. minimize subject to D ◦ ∈ SN D − δ(D 1) (D◦ − δ(D◦ 1)) − (H − δ(D◦ 1)) ◦ ◦ F 0 which is the projection of affinely transformed optimal solution H − δ(D◦⋆ 1) on SN ∩ SN . Empirically.13 D◦ − H F 0 (1316) This polar projection can be solved quickly (without semidefinite programming) via Lemma 6.13 minimize D◦ ∈ SN subject to D◦ − δ(D◦ 1) 6. Call rank(VN AVN ) dual affine dimension.1. id est. EDMN = −EDMN . c + D◦⋆ − δ(D◦⋆ 1) = PSN PSN (H − δ(D◦⋆ 1)) c + Foreknowledge of an optimal solution D◦⋆ as argument to projection suggests recursion. and its projection on the EDM cone. Figure 156). Any nonzero + dual EDM D∗ = δ(D∗ 1) − (δ(D∗ 1) − D∗ ) ∈ SN ⊥ ⊖ SN ∩ SN = EDMN + c h ∗ (1314) can therefore be expressed as the difference of two linearly independent (when vectorized) nonorthogonal components (Figure 135.1.8.1. we find a complementary relationship in affine dimension between the projection of some arbitrary symmetric matrix H on ◦ ∗ the polar EDM cone. .1. the optimal solution of 6.6. We ∗ know when D∗ ∈ EDMN δ(D∗ 1) − D∗ ∈ SN ∩ SN c + (1313) this adjoint expression (1310) belongs to that face (1271) of the positive semidefinite cone SN in the geometric center subspace.8.3 we have.

1. because then we have no complementarity relation like (1318) or (1319) ( 7. 6. and rank(D◦⋆ − δ(D◦⋆ 1)) ≤ N − 1 because vector 1 is always in the nullspace of rank’s argument.6.6 EDM cone is not selfdual In 5. CONE OF DISTANCE MATRICES has dual affine dimension complementary to affine dimension corresponding to the optimal solution of minimize D∈SN h D−H F T subject to −VN DVN 0 (1317) Precisely. Convex polar problem (1316) can be solved for D◦⋆ by transforming to an equivalent Schur-form semidefinite program ( 3.1.2.2) Then D⋆ = H − D◦⋆ ∈ EDMN by Corollary E. and D⋆ will tend to have low affine dimension.2).2. This approach breaks when attempting projection on a cone subset discriminated by affine dimension or rank. via Gram-form EDM operator D(G) = δ(G)1T + 1δ(G)T − 2G ∈ EDMN ⇐ G 0 (934) we established clear connection between the EDM cone and that face (1271) of positive semidefinite cone SN in the geometric center subspace: + EDMN = D(SN ∩ SN ) c + N (1039) (1040) (1028) V(EDM ) = where SN ∩ c SN + V(D) = −V D V 1 2 . Interior-point methods for numerically solving semidefinite programs tend to produce high-rank solutions.1.4. This is similar to the known result for projection on the selfdual positive semidefinite cone and its polar: rank P−SN H + rank PSN H = N + + (1319) T rank(D◦⋆ − δ(D◦⋆ 1)) + rank(VN D⋆ VN ) = N − 1 (1318) When low affine dimension is a desirable result of projection on the EDM cone.1.9.8.554 CHAPTER 6.5.1). ( 4.1. projection on the polar EDM cone should be performed instead.1.

4) between a closed convex ∗ cone K and its dual K like y .1 we established T SN ∩ SN = VN SN −1 VN c + + 555 (1026) Then from (1307).7 Schoenberg criterion is discretized membership relation We show the Schoenberg criterion T −VN DVN ∈ SN −1 + D ∈ SN h ⇔ D ∈ EDMN (941) to be a discretized membership relation ( 2. it means the EDM cone is not selfdual in SN . and (1281) we can deduce T δ(EDMN 1) − EDMN = VN SN −1 VN = SN ∩ SN + c + ∗ ∗ (1320) which. −D ≥ 0 ∀ zz T | 11Tzz T = 0 D ∈ SN h ⇔ D ∈ EDMN (1265) .6.7.2.8.13.8. means the EDM cone can be related to the dual EDM cone by an equality: EDMN = D δ(EDMN 1) − EDMN ∗ ∗ ∗ ∗ (1321) (1322) V(EDMN ) = δ(EDMN 1) − EDMN This means projection −V(EDMN ) of the EDM cone on the geometric center subspace SN ( E. Secondarily. by (1039) and (1040).1. 6.6. (1315).0.2) is a linear transformation of the dual EDM c ∗ ∗ cone: EDMN − δ(EDMN 1). DUAL EDM CONE In 5. x ≥ 0 for all y ∈ G(K ) ⇔ x ∈ K ∗ ∗ (365) where G(K ) is any set of generators whose conic hull constructs closed ∗ convex dual cone K : The Schoenberg criterion is the same as zz T .

Hitherto a correspondence between the EDM cone and a face of a PSD cone. is the same as T zz T . υ ∈ RN −1 } ⇔ D ∈ EDMN i j (1325) Because {δ(u) | u ∈ RN } . −D ≥ 0 ∀ zz T ∈ VN υυ T VN | υ ∈ RN −1 D∈ SN h ⇔ D ∈ EDMN (1323) where the zz T constitute a set of generators G for the positive semidefinite cone’s smallest face F SN ∋ V ( 6.2 Ambient SN h When instead we consider the ambient space of symmetric hollow matrices (1282). .1 asserts that . Then for D ∈ SN h T D∗ . we can restrict observation h to the symmetric hollow subspace without loss of generality.13.8. by (1266).4.58] D∗ . assuming only D ∈ SN [200. N . then still we find the EDM cone is not selfdual for N > 2. + From the aggregate in (1281) we get the ordinary membership relation. CONE OF DISTANCE MATRICES which. the Schoenberg criterion is now accurately interpreted as a discretized membership relation between the EDM cone and its ordinary dual. −ej eT . identical to the Schoenberg criterion.2. The simplest way to prove this is as follows: Given a set of generators G = {Γ} (1242) for the pointed closed convex EDM cone. 6. D ≥ 0 ∀ D∗ ∈ {δ(u) | u ∈ RN } − cone VN υυ T VN | υ ∈ RN −1 Discretization (365) yields: T D∗ . D ≥ 0 ∀ D∗ ∈ EDMN ⇔ D ∈ EDMN ∗ (1324) ⇔ D ∈ EDMN T D∗ . −VN υυ T VN | i .6.1) that contains auxiliary matrix V . the discretized membership theorem in 2. p. D ≥ 0 ∀ D∗ ∈ −VN υυ T VN | υ ∈ RN −1 ⇔ D ∈ EDMN (1326) this discretized membership relation becomes (1323).556 CHAPTER 6. D ≥ 0 ∀ D∗ ∈ {ei eT . D ≥ 0 ⇔ D ∈ SN . j = 1 . .

selfdual in this dimension. the EDM cone and its dual in ambient Sh each comprise the origin in isomorphic R0 . D∗ ≥ 0 ∀ z ∈ N (1T )} h By comparison EDMN = {D ∈ SN | −zz T . no longer selfdual. 1 1 ∗ ⇒ d12 ≥ 0. thus.9 Theorem of the alternative In 2. and δ(zz T ) = κ2 z=κ 1 −1 The first case adverse to selfdualness N = 3 may be deduced from Figure 144. Figure 157 illustrates the EDM cone and its dual in ambient S3 .1. THEOREM OF THE ALTERNATIVE 557 members of the dual EDM cone in the ambient space of symmetric hollow matrices can be discerned via discretized membership relation: EDMN ∩ SN h ∗ {D∗ ∈ SN | Γ . D ≥ 0 ∀ z ∈ N (1T )} h the term δ(zz T )T D∗ 1 foils any hope of selfdualness in ambient SN .13.4 we prune the h aggregate in (1281) describing the ordinary dual EDM cone. D∗ ≥ 0 ∀ Γ ∈ G(EDMN )} h = {D ∈ ∗ (1327) ∗ SN h = {D∗ ∈ SN | 1δ(zz T )T − zz T . h | δ(zz )1 + 1δ(zz ) − 2zz .9.1 we showed how alternative systems of generalized inequality can be derived from closed convex cones and their duals.2.6. a fitting postscript to the discussion of the dual EDM cone. verified directly: for all κ ∈ R . the EDM cone is a circular cone in isomorphic R3 corresponding to no rotation of Lorentz cone (178) (the selfdual circular cone). the EDM cone is the nonnegative real line in isomorphic R . . This section is.13. thus selfdual in this dimension.9. removing any member having nonzero main diagonal: T T EDMN ∩ SN = cone δ 2(VN υυ T VN ) − VN υυ T VN | υ ∈ RN −1 h T T = {δ 2(VN ΨVN ) − VN ΨVN | Ψ ∈ SN −1 } + ∗ (1329) When N = 1. D T T T T T ≥ 0 ∀ z ∈ N (1T )} (1328) To find the dual EDM cone in ambient SN per 2. therefore. ∗ (Figure 148) EDM2 in S2 is identical. h 6. h This result is in agreement with (1327). (confer (104)) When N = 2.

558 CHAPTER 6. h drawn tiled in isometrically isomorphic R3 . (It so happens: intersection ∗ EDMN ∩ SN ( 2.3) is identical to projection of dual EDM cone on SN . CONE OF DISTANCE MATRICES 0 dvec rel ∂ EDM3 dvec rel ∂(EDM3 ∩ S3 ) h D∗ ∈ EDMN ⇔ δ(D∗ 1) − D∗ ∗ ∗ 0 (1307) Figure 157: Ordinary dual EDM cone projected on S3 shrouds EDM3 .13.) h h .9.

0.1. EDM alternative. a convex optimization problem.9. 2] ( E. ( 7. in fact. POSTSCRIPT 6. the alternatives are incompatible.10 Postscript We provided an equality (1301) relating the convex cone of Euclidean distance matrices to the convex cone of positive semidefinite matrices.4) In the past. either N (D) intersects hyperplane {z | 1Tz = 1} or D is an EDM.0. But a solution acquired that way is necessarily suboptimal. 2] N (D) ⊂ N (1T ) = {z | 1Tz = 0} Because [164. a matter of truncating a list of eigenvalues.3 we present a method for projecting directly on the EDM cone under a constraint on rank or affine dimension. A surrogate method was to invoke the Schoenberg criterion (941) and then project on a positive semidefinite cone under a rank constraint bounding affine dimension from above.3.1) DD† 1 = 1 1T D† D = 1T then R(1) ⊂ R(D) (1333) (1332) (1331) 6.6. constrained by an upper bound on rank. In 7. Given D ∈ SN h D ∈ EDMN 559 [164.10. [130] simply.1 Theorem. . is easy and well known. 1] or in the alternative 1Tz = 1 Dz = 0 (1330) ∃ z such that In words. Projection on a positive semidefinite cone with such a rank constraint is. Projection on a positive semidefinite cone.0. it was difficult to project on the EDM cone under a constraint on rank or affine dimension. ⋄ When D is an EDM [260.

560 CHAPTER 6. CONE OF DISTANCE MATRICES .

Mεβoo Publishing USA.1.3.01. co&edg version 2012.Chapter 7 Proximity problems In the “extremely large-scale case” (N of order of tens and hundreds of thousands). 5. 2005. v2012.28.1. including all known polynomial time algorithms. 561 . [iteration cost O(N 3 )] rules out all advanced convex optimization techniques. rather. citation: Dattorro. All rights reserved.01. −Arkadi Nemirovski (2004) A problem common to various sciences is to find the Euclidean distance matrix (EDM) D ∈ EDMN closest in some sense to a given complete matrix of measurements H under a constraint on affine dimension 0 ≤ r ≤ N − 1 ( 2.1). Convex Optimization & Euclidean Distance Geometry. 2001 Jon Dattorro. r is bounded above by desired affine dimension ρ .28.7.

to belong to the intersection of the orthant of nonnegative matrices RN ×N with the symmetric + hollow subspace SN ( 2. Sometimes realization of an optimization problem demands that its input. possess some particular characteristics.3).2). h When measurements of distance in H are negative.0. then we must impose them upon H prior to optimization: When measurement matrix H is not symmetric or hollow. on the nonnegative orthant in the symmetric hollow subspace that is. When that H given does not have the desired properties. and only then clipping negative entries of the result to 0 .2. perhaps symmetry and hollowness or nonnegativity.1. there is only one correct order of projection. on an orthant intersecting a subspace: .562 CHAPTER 7. H can possess significant measurement uncertainty (noise).2.1) K SN ∩ RN ×N h + (1334) Yet in practice.2.1 Measurement matrix H Ideally. the given matrix H . PROXIMITY PROBLEMS 7.0.3. taking its symmetric hollow part is equivalent to orthogonal projection on the symmetric hollow subspace SN . zeroing negative entries effects unique minimum-distance projection on the orthant of 2 nonnegative matrices RN ×N in isomorphic RN ( E.1). we want H to belong to the h polyhedral cone ( 2. in fact.12. in general. the intersection.9. + 7. we want to project on that subset of the orthant belonging to the subspace. id est. we want a given matrix of measurements H ∈ RN ×N to conform with the first three Euclidean metric properties ( 5.1. Geometrically.1 Order of imposition Since convex cone K (1334) is the intersection of an orthant with a subspace. For that reason alone. unique minimum-distance projection 2 of H on K (that member of K closest to H in isomorphic RN in the Euclidean sense) can be attained by first taking its symmetric hollow part.0.0.

Thus minimization proceeds inherently following the correct order for projection on SN ∩ C . h There are two equivalent interpretations of projection ( E. Consider the proximity problem 7. minimum distance between a point and a set.1 over convex feasible set SN ∩ C given nonsymmetric h nonhollow H ∈ RN ×N : minimize B∈ SN h B−H B∈C 2 F subject to (1335) a convex optimization problem. Here we realize the latter view.9): one finds a set normal.3). order of projection on an intersection of subspaces is arbitrary.563 project on the subspace. Therefore we may either assume h H ∈ SN . This action of Frobenius’ norm (1337) is effectively a Euclidean projection (minimum-distance projection) of H on the symmetric hollow subspace SN h prior to minimization. (confer E. then project the result on the orthant in that subspace.1 . of course. then h for B ∈ SN h tr B T 1 (H − H T ) + δ 2(H) 2 =0 (1336) so the objective function is equivalent to B−H 2 F ≡ B− 1 (H + H T ) − δ 2(H) 2 2 + F 1 (H − H T ) + δ 2(H) 2 F (1337) 2 This means the antisymmetric antihollow part of given matrix H would be ignored by minimization with respect to symmetric hollow variable B under Frobenius’ norm. That order-of-projection rule applies more generally. the other.5) In contrast.9. or take its symmetric hollow part prior to optimization. to the intersection of any convex set C with any subspace.2. id est. minimization proceeds as though given the symmetric hollow part of H . Because the symmetric hollow subspace SN h is orthogonal to the antisymmetric antihollow subspace RN ×N ⊥ ( 2. 7.

.. . . .. ... ..... .. ..... .... ....... .. .. . . ... .. . .. ........ .. ..... N . . . . .. ... . .... .. ............. . ..... ... . . ............. . ......... R b b b ¨¨ " b ¨¨ " b ¨ " b " b ¨¨ " b ¨ " ¨¨ b " ¨ b h " 0 ™hhhhh b hhhhh " b hhhhh EDM " S b ™ hhh " b ™ " b ™ " b ™ " b " b ™ " b ™ " b ™ K=S ∩ R " b ™ " b " b ™ S " b ™ b ™ "" b " b b" " " " " "b b Figure 158: Pseudo-Venn diagram: The EDM cone belongs to the intersection of the symmetric hollow subspace with the nonnegative orthant. Now comes a surprising fact: Even were we to correctly follow the order-of-projection rule and provide H ∈ K prior to optimization. .. ... ... .. .. ............. . ... .. ........ .. .. .... .. .. ........ ... ....... .. ........... . . ......... . ...... . . . .... .... .. .. ... . .. .. ....... . ... . ...... . ........ N N ×N .. .. . . . ......... .. ..... . .. . ......... ... ... . .. ........... .. .... .......... N ×N .. . .... ............. . . .1........... . ...... . . ...... ......... ..... . ... ... .. .... . .... .......... . ..... ... ... then the ensuing projection on EDMN would be guaranteed incorrect (out of order).. ......... .. .... EDMN cannot exist outside SN ....... . ....... ............. . ..... . ...... .... . .. ........ . .. ..... . ..... . ... .............. ... . ... and were we only to zero negative entries of a nonsymmetric nonhollow input H prior to optimization........ . . . ................... . ... .. ............ EDMN ⊆ K (921). ..... ...... . .... . . ... ....... .... .. .. .... . .. .......... .. .... .. ............ . .... ............564 CHAPTER 7... .... .... .......... N .. .. . .... ......... N ...... . .. .... ..... . .. . ..0.... ... ... ......... h + 7. .... ... . ........ . . ......... .. h + . ... .. ..... ... .... ...... ........ ... but RN ×N does.................. PROXIMITY PROBLEMS . ... ....... . . ................. .. . . ... ........... . ... .... .. . ... ....... .... .. ..... . . .. then the ensuing projection on EDMN will be incorrect whenever input H has negative entries and some proximity problem demands nonnegative input H .......... . . .... .... ..... ........ ....... .......... ........... .. ....... ...2 Egregious input error under nonnegativity demand More pertinent to the optimization problems presented herein where C EDMN ⊆ K = SN ∩ RN ×N h + (1338) then should some particular realization of a proximity problem demand input H be nonnegative. ...... .... .. ....... ... ... ... .... ...... . . ..... ..... . .... ..... .... h ...

The first two line segments leading away from H result from correct order-of-projection required to provide nonnegative H prior to optimization. Were H nonnegative.565 0 H SN h EDMN K = SN ∩ RN ×N h + Figure 159: Pseudo-Venn diagram from Figure 158 showing elbow placed in path of projection of H on EDMN ⊂ SN by an optimization problem h demanding nonnegative input matrix H . h (confer Figure 173) . making the elbow disappear. then its projection on SN would instead belong to K .

7.6)). ΣB − ΣA F ≤ U . There is no resolution unless input H is guaranteed nonnegative with no tinkering. one not demanding nonnegative input. that being. T T Denoting by A U ΣA QA and B UB ΣB QB their full singular value A decompositions (whose singular values are always nonincreasingly ordered ( A. a particular proximity problem realization requiring nonnegative input.4. UB . QA .566 CHAPTER 7. then the direction of projection is orthogonal to C . Yet h such an imposition prior to projection on EDMN generally introduces an elbow into the path of projection (illustrated in Figure 159) caused by the technique itself. This particular objective denotes Euclidean projection ( E) of vectorized matrix A on the set C which may or may not be convex. [203. Otherwise. When C is a subspace. PROXIMITY PROBLEMS This is best understood referring to Figure 158: Suppose nonnegative input H is demanded.51] . 7.3]. there exists a tight lower bound on the objective over the manifold of orthogonal matrices. When C is convex. and then the problem realization correctly projects its input first on SN and then directly on C = EDMN . we have no choice but to employ a different problem realization.0.2.1) including the Frobenius and spectral norm [330. QB A inf B−A F (1340) This least lower bound holds more generally for any orthogonally invariant norm on Rm×n ( 2. Any procedure for imposition of nonnegativity on input H can only be incorrect in this circumstance. then the projection is unique minimum-distance because Frobenius’ norm when squared is a strictly convex function of variable B and because the optimal solution is the same regardless of the square (512).2 Lower bound minimize B Most of the problems we encounter in this chapter have the general form: subject to B ∈ C B−A F (1339) where A ∈ Rm×n is given data. That demand h for nonnegativity effectively requires imposition of K on input H prior to optimization so as to obtain correct order of projection (on SN first). II.

(That conversion is performed regardless of whether known data is complete. the three prevalent proximity problems are (1343. i = 1 .0.7. the Gram matrix acting as bridge between position and distance.0. j ∈ I i.1) or (1342).2.2.3): [344] for D [dij ] and ◦ D [ dij ] minimize (1) D subject to rank V D V ≤ ρ D ∈ EDMN minimize D −V (D − H)V 2 F minimize √ ◦ D √ ◦ D−H 2 F subject to rank V D V ≤ ρ √ ◦ D ∈ EDMN minimize √ ◦ D (2) (1343) 2 F (3) subject to rank V D V ≤ ρ D ∈ EDMN D−H 2 F −V ( D − H)V √ ◦ subject to rank V D V ≤ ρ √ ◦ D ∈ EDMN (4) . the multiplicity due primarily to choice of projection on the EDM versus positive semidefinite (PSD) cone and vacillation between the distance-square variable dij versus absolute distance dij .1). √ (1343.2). This approach is taken because we prefer introduction of rank constraints into convex problems rather than searching a googol of local minima in nonconvex problems like (1341) [105] ( 3.0. N } such as minimize ( xi − xj − hij )2 (1341) {xi } or minimize {xi } i. and (1343. . 7. In their most fundamental form.6.4 Three prevalent proximity problems There are three statements of the closest-EDM problem prevalent in the literature.3 Problem approach Problems traditionally posed in terms of point position {xi ∈ Rn .4.) Then the techniques of chapter 5 or chapter 6 are applied to find relative or absolute position.3. j ∈ I ( x i − xj 2 − hij )2 (1342) (where I is an abstract set of indices and hij is given data) are everywhere converted herein to the distance-square variable D or to Gram matrix G . . 7.567 7.

We show in 7. the geometric centering matrix V (944) appears in the literature most often although VN (928) is the auxiliary matrix naturally consequent to Schoenberg’s seminal exposition [314].3).4 how (1343. produces a different result because minimize D subject to D ∈ EDMN 7.2 −V (D − H)V 2 F (1344) Because −VHV is orthogonal projection of −H on the geometric center subspace SN c ( E. whereas problems (1343.2).3) becomes equivalent to (1342).2. PROXIMITY PROBLEMS where we have made explicit an imposed upper bound ρ on affine dimension T r = rank VN DVN = rank V D V (1074) that is benign when ρ = N − 1 or H were realizable with r ≤ ρ .1. T Substitution of VN for left-hand V in (1343. Generally speaking.7. p. Of the various auxiliary V -matrices ( B.4) is not posed in the literature because it has limited theoretical foundation. as the problem is stated.1) becomes a convex optimization problem for any ρ when transformed to the spectral domain.4). each problem in (1343) produces a different result because there is no isometry relating them. −V ◦ D V ∈ SN ( 5.3 D ∈ EDMN ⇒ ◦ D ∈ EDMN . in particular. Even with the rank constraint removed from (1343.4) may be interpreted as oblique (nonminimum distance) projections of −H on a positive semidefinite cone.2 Problem (1343.34] [103] [357] Problems (1343.3) are convex optimization problems in D for the case ρ = N − 1 wherein (1343.3 Analytical solution to (1343. Problems (1343.2) and (1343.1).7. problem (1343.1) and (1343. we will see that the convex problem remaining inherently minimizes affine dimension. [53.1) and (1343.3) are Euclidean projections of a vectorized matrix H on an EDM cone ( 6. √ √ 7. Substitution of any auxiliary matrix or its pseudoinverse into these problems produces another valid problem.2) and (1343.2) becomes a variant of what is known in statistics literature as a stress problem. problems (1343. it is a convex optimization only in the case ρ = N − 1. When expressed as a function of point list in a matrix X as in (1341).568 CHAPTER 7.4) are Euclidean projections of a vectorized matrix −VHV on a PSD cone.0.10) + .1) is known in closed form for any bound ρ although.2).7.

7.g. implicit to a realizable measurement matrix H .4. e. but is not an isometry.4. .6.3) or VN yields the same † T result as (1343. the optimal objective value will vanish ( ⋆ = 0).4      Problem 1 (1347) †T † The isomorphism T (Y ) = VN Y VN onto SN = {V X V | X ∈ SN } relates the map in c (1345) to that in (1344).1) because V = VW VW = VN VN . † T But substitution of auxiliary matrix VW ( B.5.2. 6. B..5 7. 5.4. whereas minimize D T −VN (D − H)VN 2 F subject to D ∈ EDMN (1345) T attains Euclidean distance of vectorized −VN H VN to the positive semidefinite cone in isometrically isomorphic subspace RN (N −1)/2 . Each has its own coherent interpretations.5 All four problem formulations (1343) produce identical results when affine dimension r .7. FIRST PREVALENT PROBLEM: 569 finds D to attain Euclidean distance of vectorized −V H V to the positive semidefinite cone in ambient isometrically isomorphic RN (N +1)/2 .4 regardless of whether affine dimension is constrained. 7.1. id est. quite different projections7. −V (D − H)V 2 F T T = −VW VW (D − H)VW VW † † = −VN VN (D − H)VN VN 2 F 2 F = = T −VW (D − H)VW † −VN (D − H)VN 2 F 2 F (1346) We see no compelling reason to prefer one particular auxiliary V -matrix over another.1 First prevalent problem: Projection on PSD cone This first problem minimize D T −VN (D − H)VN 2 F T subject to rank VN DVN ≤ ρ D ∈ EDMN 7. Neither can we say any particular problem formulation produces generally better results than another. does not exceed desired affine dimension ρ . because.

PROXIMITY PROBLEMS T poses a Euclidean projection of −VN H VN in subspace SN −1 on a generally nonconvex subset (when ρ < N − 1) of the positive semidefinite cone boundary ∂ SN −1 whose elemental matrices have rank no greater than desired + affine dimension ρ ( 5.6 .570 CHAPTER 7. It is wrong here to zero the main diagonal of given H because first projecting H on the symmetric hollow subspace places an elbow in the path of projection in Problem 1.2] for all nonnegative ρ ≤ N − 1 .1) in the subspace of symmetric matrices SN −1 . thm.7 H ∈ SN (1348) T Arranging the eigenvalues λ i of −VN H VN in nonincreasing order for all i . λ i ≥ λ i+1 with vi the corresponding i th eigenvector. this optimization problem is convex only when desired affine dimension is largest ρ = N − 1 although its analytical solution is known [258.4.7 Projection in Problem 1 is on a rank ρ subset of the positive semidefinite cone SN −1 + ( 2.14.1. Problem 1 finds the closest EDM D in the sense of Schoenberg. then an optimal solution to Problem 1 is [356.7. In 7. (941) [314] As it is stated.1.6 We assume only that the given measurement matrix H is symmetric.2) to Eckart & Young.7.2.7.9. being first pronounced in the context of multidimensional scaling by Mardia [257] in 1978 who attributes the generic result ( 7. λ i } vi vi (1349) where T −VN H VN N −1 i=1 T λ i vi vi ∈ SN −1 (1350) is an eigenvalue decomposition and D⋆ ∈ EDMN (1351) is an optimal Euclidean distance matrix. 2] ρ T −VN D⋆ VN = i=1 T max{0. 1936 [130]. 7. (Figure 159) 7.1.4 we show how to transform Problem 1 to a convex optimization problem for any ρ .1).

1) (1623) (confer (2074)) T −VN D⋆ VN 0 T ⋆ T T −VN D VN −VN D⋆ VN + VN H VN = 0 T T −VN D⋆ VN + VN H VN 0 (1353) T Symmetric −VN H VN is diagonalizable hence decomposable in terms of its eigenvectors v and eigenvalues λ as in (1350).1. To see that.0. p.8 . Therefore (confer (1349)) N −1 i=1 T −VN D⋆ VN = T max{0.9. When desired affine dimension is unconstrained. the rank function disappears from (1347) leaving a convex optimization problem. 5.1 Closest-EDM Problem 1. λ i }vi vi (1355) The Karush-Kuhn-Tucker (KKT) optimality conditions [282.8 ( E. convex case 7. convex case. 7.2. Because T SN −1 = −VN SN VN h (1043) then the necessary and sufficient conditions for projection in isometrically isomorphic RN (N −1)/2 on the selfdual (376) positive semidefinite cone SN −1 + are:7. λ i }vi vi (1354) these satisfies (1353).3] for problem (1352) are identical to these conditions for projection on a convex cone. recall: eigenvectors constitute an orthogonal set and T −VN D⋆ VN + T VN H VN = − N −1 i=1 T min{0. FIRST PREVALENT PROBLEM: 571 7.1 Proof. optimally solving (1352).7. ρ = N − 1.1. a simple unique minimum-distance projection on the positive semidefinite cone SN −1 : + videlicet T minimize −VN (D − H)VN 2 F N D∈ Sh (1352) T subject to −VN DVN 0 by (941).1. Solution (1349).5.1.0.328] [61.

1. T then Problem 1 is truly a Euclidean projection of vectorized −VN H VN on that generally nonconvex subset of symmetric matrices (belonging to the positive semidefinite cone SN −1 ) having rank no greater than desired affine + . PROXIMITY PROBLEMS 7. a unique correspondence by injectivity arguments in 5.12 can be used to determine a uniquely corresponding optimal Euclidean distance matrix D⋆ .1. the technique of 5. 7. λ i } vi vi ∈ SN −1 (1349) Once optimal B ⋆ is found.2.2.572 CHAPTER 7.2 generic problem Prior to determination of D⋆ .1 Projection on rank ρ subset of PSD cone Because Problem 1 is the same as minimize D∈ SN h T −VN (D − H)VN 2 F T subject to rank VN DVN ≤ ρ T −VN DVN 0 (1357) and because (1043) provides invertible mapping to the generic problem. analytical solution (1349) to Problem 1 is equivalent to solution of a generic rank-constrained projection problem: Given desired affine dimension ρ and A T −VN H VN = N −1 i=1 T λ i vi vi ∈ SN −1 (1350) Euclidean projection on a rank ρ subset of a PSD cone (on a generally nonconvex subset of the PSD cone boundary ∂ SN −1 when ρ < N − 1) + minimize B∈ SN −1 has well known optimal solution (Eckart & Young) [130] ρ subject to rank B ≤ ρ   B 0 = i=1 B−A 2 F    Generic 1 (1356) B ⋆ T −VN D⋆ VN T max{0.6.

Define a nonlinear permutation-operator π(x) : Rn → Rn that sorts its vector argument x into nonincreasing order.0.0. Then spectral projection simply means projection of δ(Λ) on a subset of the nonnegative orthant.5.1) and diagonal matrix Λ is ordered during diagonalization ( A. We will see why an orthant turns out to be the best choice of spectral cone. Let R be an orthogonal matrix and Λ a nonincreasingly ordered diagonal matrix of eigenvalues. Although solution to generic problem (1356) is well known since 1936 [130].g. In this section we develop a method of spectral projection for constraining rank of positive semidefinite matrices in a proximity problem like (1356). equal to dimension of the smallest affine set in which points from a list X corresponding to an EDM D can be embedded.1.1). and why presorting is critical.3.2. as we shall now ascertain: It is curious how nonconvex Problem 1 has such a simple analytical solution (1349).3). projection on a positive semidefinite cone. FIRST PREVALENT PROBLEM: dimension ρ . orthogonal matrix R equals I ( 7. Spectral projection. its equivalence was observed in 1997 [356. 7.5.1.7.11.5) nonincreasingly ordered (π) vector (δ) of eigenvalues π δ(RTΛR) (1358) on a polyhedral cone containing all eigenspectra corresponding to a rank ρ subset of a positive semidefinite cone ( 2.3 Choice of spectral cone Spectral projection substitutes projection on a polyhedral cone.1 Definition. containing a complete set of eigenspectra.1.1) or the EDM cone (in Cayley-Menger form. △ In the simplest and most common case..1.7.9. 7. (1369). Spectral projection means unique minimum-distance projection of a rotated (R .9 .9 called rank ρ subset: (260) SN −1 \ SN −1(ρ + 1) = {X ∈ SN −1 | rank X ≤ ρ} + + + (216) 573 7. 2] to projection Recall: affine dimension is a lower bound on embedding ( 2. e.3. B. in place of projection on a convex set of diagonalizable matrices.2. 5.4.1).

1.3.13.9.0. id est. + ρ KM+ {υ ∈ Rρ | υ1 ≥ υ2 ≥ · · · ≥ υρ ≥ 0} ⊆ Rρ + (1359) a pointed polyhedral cone.2] Any vectors σ and γ in RN −1 satisfy a tight inequality π(σ)T π(γ) ≥ σ T γ ≥ π(σ)T Ξ π(γ) (1362) where Ξ is the order-reversing permutation matrix defined in (1766). 1.1).2 Exercise.3. momentarily. PROXIMITY PROBLEMS of an ordered vector of eigenvalues (in diagonal matrix Λ) on a subset of the monotone nonnegative cone ( 2. 7.3 Proposition. 7. Smallest spectral cone.2. [180. is only the smallest convex subset of the monotone nonnegative cone KM+ containing every nonincreasingly ordered eigenspectrum corresponding to a rank ρ subset of the positive semidefinite cone SN −1 .574 CHAPTER 7.4. ρ KM+ 0 N −1 = π(λ(rank ρ subset)) ⊆ KM+ KM+ (1360) For each and every elemental eigenspectrum γ ∈ λ(rank ρ subset) ⊆ RN −1 + (1361) of the rank ρ subset (ordered or unordered in λ). ⋄ . and permutator π(γ) is a nonlinear function that sorts vector γ into nonincreasing order thereby providing the greatest upper bound and least lower bound with respect to every possible sorting.2) KM+ = {υ | υ1 ≥ υ2 ≥ · · · ≥ υN −1 ≥ 0} ⊆ RN −1 + (428) Of interest.0. X] o [55. there is a nonlinear ρ surjection π(γ) onto KM+ . (Hardy-Littlewood-P´lya) Inequalities. a ρ-dimensional convex subset of the monotone nonnegative cone KM+ ⊆ RN −1 having property.1.9. for λ denoting + eigenspectra. ρ Prove that there is no convex subset of KM+ smaller than KM+ containing every ordered eigenspectrum corresponding to the rank ρ subset of a positive semidefinite cone ( 2.

Any given vectors σ . Proof.1.1. ⋄ Given γ ∈ RN −1 σ∈RN −1 + inf σ−γ = inf σ∈RN −1 + π(σ)−π(γ) = inf σ∈RN −1 + σ−π(γ) = inf σ∈KM+ σ−π(γ) (1364) Yet for γ representing an arbitrary vector of eigenvalues. Simply because π(γ)1:ρ infρ σ∈ π(γ1:ρ ) σ1:ρ − γ1:ρ 2 R+ 0 σ−γ 2 T = γρ+1:N −1 γρ+1:N −1 + inf σ∈RN −1 + T T = γ T γ + inf σ1:ρ σ1:ρ − 2σ1:ρ γ1:ρ σ∈RN −1 + T T ≥ γ T γ + inf σ1:ρ σ1:ρ − 2σ1:ρ π(γ)1:ρ σ∈RN −1 + (1366) R σ∈ + 0 infρ σ−γ 2 ≥ infρ R σ∈ + 0 σ − π(γ) 2 . γ ∈ RN −1 satisfy a tight Euclidean distance inequality π(σ) − π(γ) ≤ σ − γ (1363) where nonlinear function π(γ) sorts vector γ into nonincreasing order thereby providing the least lower bound with respect to every possible sorting. because infρ σ∈ R+ 0 σ−γ 2 ≥ infρ σ∈ R+ 0 σ − π(γ) 2 = σ∈ inf ρ KM+ 0 σ − π(γ) 2 (1365) then projection of γ on the eigenspectra of a rank ρ subset can be tightened simply by presorting γ into nonincreasing order. Monotone nonnegative sort.0.7. FIRST PREVALENT PROBLEM: 575 7.3.4 Corollary.

2] Here we derive + that known result but instead using a more geometric argument via spectral projection on a polyhedral cone (subsuming the proof in 7.4.3. + in other words. 7.1.1. nonconvex case.576 7. we may instead work with the more facile generic problem (1356).0.1. unique minimum-distance projection of sorted γ on the nonnegative orthant in a ρ-dimensional subspace of RN is indistinguishable ρ from its projection on the subset KM+ of the monotone nonnegative cone in that same subspace. Only then does unique spectral projection on a subset ρ KM+ of the monotone nonnegative cone become equivalent to unique spectral projection on a subset Rρ of the nonnegative orthant (which is simpler). “nonconvex” case Proof of solution (1349).1 Proof.1.Υ ≡ subject to rank Υ ≤ ρ Υ 0 R −1 = RT Υ − RTΛR 2 F (1369) . Solution (1349). As explained in 7. In so doing. we demonstrate how nonconvex Problem 1 is transformed to a convex optimization: 7. [356. With diagonalization of unknown B U ΥU T ∈ SN −1 (1367) given desired affine dimension 0 ≤ ρ ≤ N − 1 and diagonalizable A QΛQT ∈ SN −1 (1368) having eigenvalues in Λ arranged in nonincreasing order. can be algebraic in nature. for projection on a rank ρ subset of the positive semidefinite cone SN −1 . PROXIMITY PROBLEMS Orthant is best spectral cone for Problem 1 This means unique minimum-distance projection of γ on the nearest spectral member of the rank ρ subset is tantamount to presorting γ into nonincreasing order.1 CHAPTER 7. by (48) the generic problem is equivalent to minimize B∈ SN −1 subject to rank B ≤ ρ B 0 B−A 2 F minimize R .1.2.1).4 Closest-EDM Problem 1.

1. hence. FIRST PREVALENT PROBLEM: where R QT U ∈ RN −1×N −1 577 (1370) in U on the set of orthogonal matrices is a bijection. (3) unique + Rρ + . Unique minimum-distance projection Rρ + T of vector π δ(R ΛR) on the ρ-dimensional subset of nonnegative 0 .7.9. ( E. δ(RTΛR) ∈ RN −1 can always be arranged in nonincreasing order without loss of generality. (2) nonincreasingly ordering the result.5) minimum-distance projection of the ordered result on 0 Projection on that N− 1-dimensional subspace amounts to zeroing RTΛR at all entries off the main diagonal. the equivalent sequence leading with a spectral projection: minimize Υ δ(Υ) − π δ(RTΛR) Rρ + 0 2 subject to δ(Υ) ∈ minimize R (a) (1372) subject to R −1 = RT Υ⋆ − RTΛR 2 F (b) Because any permutation matrix is an orthogonal matrix. thus. permutation operator π . We propose solving (1369) by instead solving the problem sequence: minimize Υ subject to rank Υ ≤ ρ Υ 0 minimize R Υ − RTΛR 2 F (a) (1371) 2 F subject to R −1 = RT Υ⋆ − RTΛR (b) Problem (1371a) is equivalent to: (1) orthogonal projection of RTΛR on an N − 1-dimensional subspace of isometrically isomorphic RN (N −1)/2 containing δ(Υ) ∈ RN −1 .

10 . i (1373) Any value Υ⋆ satisfying . a nonconvex subset of its boundary) from either the exterior or interior of that cone. conditions (1373) is optimal for (1372a). . π δ(RTΛR) 0. Projection on the boundary from the interior of a convex Euclidean body is generally a nonconvex problem. For selection of Υ⋆ as in (1374). PROXIMITY PROBLEMS orthant RN −1 requires: ( E.578 CHAPTER 7.2. . .1) + δ(Υ⋆ )ρ+1:N −1 = 0 δ(Υ⋆ ) δ(Υ⋆ ) − π(δ(RTΛR)) 0 0 δ(Υ⋆ )T δ(Υ⋆ ) − π(δ(RTΛR)) = 0 which are necessary and sufficient conditions. 7.4. this lower bound is attained when (confer C. is generally nonconvex. The lower bound on the objective with respect to R in (1372b) is tight: by (1340) |Υ⋆ | − |Λ| F ≤ Υ⋆ − RTΛR F (1375) where | | denotes absolute entry-value. . i=1 .4. projection on a rank ρ subset becomes a convex optimization problem.1. So δ(Υ⋆ )i = max 0 .7. ρ i = ρ+1 .2.9. 7.1 Significance (1376) Importance of this well-known [130] optimal solution (1349) for projection on a rank ρ subset of a positive semidefinite cone should not be dismissed: Problem 1.2) R⋆ = I which is the known solution.0. This analytical solution at once encompasses projection on a rank ρ subset (216) of the positive semidefinite cone (generally. as stated. N − 1 (1374) specifies an optimal solution.10 By problem transformation to the spectral domain.

5 Problem 1 in spectral norm.12 7. 0 t .13 giving rise to nonorthogonal projection ( E. Otherwise.1 for unique projection on a closed convex cone does not apply here because the direction of projection is not necessarily a member of the dual PSD cone. This occurs.7. its solution set includes the Frobenius solution (1349) for the convex T case whenever −VN H VN is a normal matrix.D µ µI (1378) T subject to −µI −VN (D − H)VN D ∈ EDMN 7. for example.1. 8. Because U ⋆ = Q .9.1) on the positive semidefinite cone SN −1 . 7. 1] [178] [61.2.1. convex case When instead we pose the matrix 2-norm (spectral norm) in Problem 1 (1347) for the convex case ρ = N − 1. then the new problem minimize D subject to D ∈ EDM T −VN (D − H)VN N 2 (1377) is convex although its solution is not necessarily unique.11 But Theorem E.0.3. for example.7. 579 This solution is equivalent to projection on a polyhedral cone in the spectral domain (spectral projection.1 . a minimum-distance projection on a rank ρ subset of the positive semidefinite cone is a positive semidefinite matrix orthogonal (in the Euclidean sense) to direction of projection.1) for membership of a symmetric matrix to a rank ρ subset of a positive semidefinite cone ( 2. distinct eigenvalues (multiplicity 1) in Λ guarantees uniqueness of this solution by the reasoning in A.1. this solution is always unique.5. a necessary and sufficient condition ( A. has the same spectral-norm value.3. projection on a spectral cone 7.11 For the convex case problem (1352). 2 0 7. whenever positive eigenvalues are truncated.0.0.13 For each and every |t| ≤ 2. [185.2.9. FIRST PREVALENT PROBLEM: This solution is closed form.1).0.0.9.1). + Indeed.1] Proximity problem (1377) is equivalent to minimize µ.7.7.12 Uncertainty of uniqueness prevents the erroneous conclusion that a rank ρ subset (216) were a convex body by the Bunt-Motzkin theorem ( E.1.1).

√ √ ◦ ◦ D = [dij ] D ◦ D ∈ EDMN (1381) where ◦ denotes Hadamard product. For lack of a unique solution here.. 7. confer Figure 144b)  √ ◦  minimize D−H 2 √ F  ◦  D T Problem 2 (1382) subject to rank VN DVN ≤ ρ  √   ◦ D ∈ EDMN where √ ◦ EDMN = { D | D ∈ EDMN } (1215) This statement of the second proximity problem is considered difficult to solve because of the constraint on desired affine dimension ρ ( 5.2 Let Second prevalent problem: Projection on EDM cone in √ ◦ D [ dij ] ∈ K = SN ∩ RN ×N h + dij (1380) be an unknown matrix of absolute distance. i = 1. . N−1 ∈ R+ (1379) the minimized largest absolute eigenvalue (due to matrix symmetry).2) and because the objective function √ ◦ ( dij − hij )2 D−H 2 = (1383) F i. we prefer the Frobenius rather than spectral norm.7. id est.j is expressed in the natural coordinates.580 by (1742) where µ⋆ = max i CHAPTER 7. projection on a doubly nonconvex set. The second prevalent proximity problem is a Euclidean projection (in the natural coordinates dij ) of matrix H on a nonconvex subset of the boundary of the nonconvex cone of Euclidean absolute-distance matrices rel ∂ EDMN : ( 6..3. PROXIMITY PROBLEMS T λ −VN (D⋆ − H)VN i .

Otherwise we might erroneously conclude EDMN were a convex body by the Bunt-Motzkin theorem ( E.7. the rank constraint vanishes and a convex problem that is equivalent to (1341) emerges:7. projection of H on K = SN ∩ RN ×N (1334) prior to application of this proposed solution is h + incorrect. then the technique of solution presented here becomes invalid.2.j dij − 2hij dij + h2 ij (1385) subject to D∈ EDMN subject to D ∈ EDMN For any fixed i and j .15 The transformed problem in variable D no longer describes Euclidean projection on an EDM cone. strictly convex in D on the nonnegative orthant. moreover. Existence of a unique solution D⋆ for this second prevalent problem depends upon nonnegativity of H and a convex feasible set ( 3.2.14 minimize √ ◦ D √ ◦ √ ◦ D−H 2 F minimize ⇔ D i.2. it can be easily √ ascertained: ◦ D − H F is not convex in D . SECOND PREVALENT PROBLEM: 581 Our solution to this second problem prevalent in the literature requires measurement matrix H to be nonnegative.0. 3.1).1 Convex case When ρ = N − 1. 7. H = [hij ] ∈ RN ×N + (1384) If the H matrix given has negative entries. 7. we have a convex optimization problem: √ minimize 1T(D − 2H ◦ ◦ D )1 + H 2 F D (1386) N subject to D ∈ EDM The objective function being a sum of strictly convex functions is. .1.1] and because the feasible set is convex in D .14 still thought to be a nonconvex problem as late as 1997 [357] even though discovered convex by de Leeuw in 1993. the argument of summation is a convex function of dij because (for nonnegative constant hij ) the negative square root is convex in nonnegative dij and because dij + h2 is affine (convex).15 7.1. [103] [53.0. As explained in 7.9. 13. Because ij the sum of any number of convex functions in D remains convex [61.2).6] Yet using methods from 3.0.7.

1. .20).Y i. as desired.2 no.1. Problem 2. √ then Y = H ◦ ◦ D must belong to K = SN ∩ RN ×N (1334). + dij yij yij h2 ij 0 ⇔ hij dij ≥ 2 yij (1389) Minimization of the objective function implies maximization of yij that is bounded above.582 7. Because Y ∈ SN h + h ( B. If the given matrix H is now assumed symmetric and nonnegative. yij → hij dij as optimization proceeds.j = 1 .j dij − 2yij + h2 ij 0. PROXIMITY PROBLEMS Equivalent semidefinite program. Hence nonnegativity of yij is implicit to (1388) and. (941)).8.2. We translate (1385) to an equivalent semidefinite program (SDP) for a pedagogical reason made clear in 7.1 CHAPTER 7. So when H ∈ RN ×N is nonnegative as assumed.j H = [hij ] ∈ SN ∩ RN ×N + (1390) . i.2. Substituting a new matrix variable Y [yij ] ∈ RN ×N + hij dij ← yij (1387) Boyd proposes: problem (1385) is equivalent to the semidefinite program minimize D.2 and because there exist readily available computer programs for numerical solution [168] [387] [388] [362] [34] [386] [351] [338]. .2.4. N (1388) subject to dij yij yij h2 ij D ∈ EDMN To see that. then √ ◦ D−H 2 = dij − 2yij + h2 = −N tr(V (D − 2 Y )V ) + H 2 (1391) F ij F i. convex case Convex problem (1385) is numerically solvable for its global minimum using an interior-point method [394] [291] [279] [385] [11] [146]. recall dij ≥ 0 is implicit to D ∈ EDMN ( 5.

7.2. This way. Example 6.7. N ≥ j > i = 1.Y 583 subject to dij yij yij h2 ij D ∈ EDMN Y ∈ SN h 0. N−1 (1393) subject to where distance-square D = [dij ] ∈ SN (918) is related to Gram matrix entries h G = [gij ] ∈ SN ∩ SN by c + dij = gii + gjj − 2gij = Φij .4.2. G yij G 0 yij h2 ij 0.3.0. Convex problem (1392) may be equivalently written via linear bijective ( 5. Gram matrix G . 7.0. and distance matrix D at our discretion.1) EDM operator D(G) (934).g.2 Gram-form semidefinite program.. G∈SN . Y ∈ SN c h minimize − tr(V (D(G) − 2 Y )V ) Φij .2.5... G where Φij = (ei − ej )(ei − ej )T ∈ SN + (920) (933) .6. Problem 2. Example 5.1.. convex case There is great advantage to expressing problem statement (1392) in Gram-form because Gram matrix G is a bidirectional bridge between point list X and distance matrix D . N−1 (1392) where the constants h2 and N have been dropped arbitrarily from the ij objective.1. SECOND PREVALENT PROBLEM: So convex problem (1388) is equivalent to the semidefinite program minimize − tr(V (D − 2 Y )V ) D. N ≥ j > i = 1. problem convexity can be maintained while simultaneously constraining point list X . e..

5.2 Minimization of affine dimension in Problem 2 When desired affine dimension ρ is diminished.1 Definition. if desired.2. [199] Convex envelope cenv f of a function f : C → R is defined to be the largest convex function g such that g ≤ f on convex domain C ⊆ Rn . N ≥ j > i = 1.9.2. X∈ Rn×N c h minimize − tr(V (D(G) − 2 Y )V ) Φij .2. Convex envelope.2) id est.2.1] 7. [200.1 Rank minimization heuristic A remedy developed in [263] [136] [137] [135] introduces convex envelope of the quasiconcave rank function: (Figure 160) 7.584 CHAPTER 7. Y ∈ SN .8) on the positive semidefinite cone. then the convex envelope is equal to the convex conjugate (the Legendre-Fenchel transform) of the convex conjugate of f .16 . its sublevel sets are not convex.16 △ Provided f ≡ +∞ and there exists an affine function h ≤ f on Rn .. id est. implicit constraint G1 = 0 is otherwise unnecessary. more constraints on G could be added. we would first rewrite (1393) G∈SN . the rank function h is quasiconcave ( 3. G yij I X XT G X∈ C yij h2 ij 0 0.2. This problem realization includes a convex relaxation of the nonconvex constraint G = X TX and.2. Indeed. Y } loses convexity in SN × RN ×N .4. 7. This technique is discussed in 5. N−1 subject to (1394) and then add the constraints.. the rank function becomes reinserted into problem (1388) that is then rendered difficult to solve because feasible set {D . realized here in abstract membership to some convex set C .7. E.9.1. 7. To include constraints on the list X ∈ Rn×N .3.2. PROXIMITY PROBLEMS Confinement of G to the geometric center subspace provides numerical stability and no loss of generality (confer (1279)). ( 2. the conjugate-conjugate function f ∗∗ .

large decline in trace required here for only small decrease in rank. Vertical bar labelled g measures a trace/rank gap. rank found always exceeds estimate. id est. Rank is a quasiconcave function on positive semidefinite cone. .2. SECOND PREVALENT PROBLEM: 585 rank X g cenv rank X Figure 160: Abstraction of convex envelope of rank function.7. but its convex envelope is the largest convex function whose epigraph contains it.

2 ∞ ≤ κ} = 1 T 1 x κ (1400) Applying trace rank-heuristic to Problem 2 Substituting rank envelope for rank function in Problem 2.2. PROXIMITY PROBLEMS [136] [135] Convex envelope of rank function: for σi a singular value.2.5) + rank A ← cenv(rank A) ∝ tr A = σ(A)i = i i λ(A)i (1398) which is equivalent to the sum of all eigenvalues or singular values. (1599) 1 1 √ cenv(rank A) on {A ∈ Rm×n | A 2 ≤ κ} = 1T σ(A) = tr ATA (1395) κ κ 1 √ 1 λ(A) 1 = tr ATA (1396) cenv(rank A) on {A normal | A 2 ≤ κ} = κ κ 1 1 n cenv(rank A) on {A ∈ S+ | A 2 ≤ κ} = 1T λ(A) = tr(A) (1397) κ κ A properly scaled trace thus represents the best convex lower bound on rank for positive semidefinite matrices. [135] Convex envelope of the cardinality function is proportional to the 1-norm: 1 x 1 (1399) cenv(card x) on {x ∈ Rn | x ∞ ≤ κ} = κ cenv(card x) on {x ∈ Rn | x + 7. The idea.586 CHAPTER 7.6. is to substitute convex envelope for rank of some variable A ∈ SM ( A. then. for D ∈ EDMN (confer (1074)) T cenv rank(−VN DVN ) = cenv rank(−V D V ) ∝ − tr(V D V ) (1401) and for desired affine dimension ρ ≤ N − 1 and nonnegative H [sic] we get a convex optimization problem √ ◦ minimize D−H 2 F D subject to − tr(V D V ) ≤ κ ρ D ∈ EDM N (1402) .

5. (1388). the problem with no explicit constraint on affine dimension..3). Example 4. Those non−rank-constrained problems are each inherently equivalent to cenv(rank)-minimization problem (1403). or when the diagonal represents a Boolean vector. e.20..9.Y . As the present problem is stated. it tends to optimize compaction of the reconstruction by minimizing total distance. SECOND PREVALENT PROBLEM: 587 where κ ∈ R+ is a constant determined by cut-and-try.4. 5].0.4.2. N−1 subject to (1403) − tr(V D V ) ≤ κ ρ Y ∈ SN h D ∈ EDMN which is the same as (1392). contrast no.8.2.3. rather.g. For this Problem 2. The equivalent semidefinite program makes κ variable: for nonnegative and symmetric H minimize D.22) although the rank function itself is insensitive to choice of auxiliary matrix. the trace rank-heuristic arose naturally in the T objective in terms of V .2. .0.14.2 no.4. We observe: V (in contrast to VN ) spreads energy over all available distances ( B. Example 4..4).g. Trace rank-heuristic (1397) is useless when a main diagonal is constrained to be constant. [355..1. e. in other words. Yet this result is an illuminant for problem (1392) and it equivalents (all the way back to (1385)): When the given measurement matrix H is nonnegative and symmetric. 7. finding the closest EDM D as in problem (1385). or (1392) implicitly entails minimization of affine dimension (confer 5.6.7.3 Rank-heuristic insight Minimization of affine dimension by use of this trace rank-heuristic (1401) tends to find a list configuration of least energy. N ≥ j > i = 1. ρ is effectively ignored. the desired affine dimension ρ yields to the variable scale factor κ .2. Such would be the case were optimization over an elliptope ( 5. (946) It is best used where some physical equilibrium implies such an energy minimization.2. and their optimal solutions are unique because of the strictly convex objective function in (1385).1. κ κ ρ + 2 tr(V Y V ) dij yij yij h2 ij 0.

. to limit maximum weight. the smallest entry on the main diagonal of Υ gets the largest weight. substitute the nonincreasingly ordered diagonalizations Yi + εI Y T Q(Λ + εI )QT (a) (b) U ΥU T (1408) into (1407). .11.2. the first step becomes equivalent to finding the infimum of tr Y .7) we make the surrogate sequence of infima over bounded convex feasible set C arg inf rank Y ← lim Yi+1 Y ∈C i→∞ log det(Y + εI) ≈ log det(Yi + εI) + tr (Yi + εI)−1 (Y − Yi ) (1405) (1406) where. & Boyd [137] [390] [138] propose a rank heuristic more potent than trace (1398) for problems of rank minimization.9. The role of ε is. from the first-order Taylor series expansion about Yi on some open interval of Y 2 ( D. rank Y ← log det(Y + εI) (1404) the concave surrogate function log det in place of quasiconcave rank Y n ( 2. To see that. for i = 0 . Hindi.3]. They propose minimization of the surrogate by substituting a sequence comprising infima of a linearized surrogate about the current estimate Yi .2.9. .4 CHAPTER 7. specifically. id est. (Yi + εI)−1 weights Y so that relatively small eigenvalues of Y found by the infimum are made even smaller.2. the trace rank-heuristic (1398).2) when Y ∈ S+ is variable and where ε is a small positive constant. The intuition underlying (1407) is the new term in the argument of trace.1. PROXIMITY PROBLEMS Rank minimization heuristic beyond convex envelope Fazel. therefore. Then from (1739) we have.588 7. Υ∈U ⋆TCU ⋆ inf δ((Λ + εI )−1 ) δ(Υ) = ≤ T Υ∈U TCU RT=R−1 Y ∈C inf inf tr (Λ + εI )−1 RT ΥR (1409) inf tr((Yi + εI)−1 Y ) where R Q U in U on the set of orthogonal matrices is a bijection. Choosing Y0 = I . Yi+1 = arg inf tr (Yi + εI)−1 Y Y ∈C (1407) a matrix analogue to the reweighting scheme disclosed in [208. 4.

2. problem (1403) becomes the problem sequence in i minimize D.2. λ(Yi + εI) = δ(Λ + εI ) .2. we may certainly force selected weights to ε−1 by manipulating diagonalization (1408a).1 to numerically solve an instance of Problem 2. κ κ ρ + 2 tr(V Y V ) djl yjl yjl h2 jl 0.2. (1410) Tightening this log det rank-heuristic Like the trace method.1 Example. 7.2.17 that does not solve the ball packing problem presented in 5.6 D⋆ ∈ EDMN and D0 11T − I .4.2.1.4.Y . although affine dimension of a solution cannot be guaranteed.17 7. 7.4.2. We apply the convex iteration method from 4.2. l > j = 1.5 Applying log det rank-heuristic to Problem 2 589 When the log det rank-heuristic is inserted into Problem 2... .10).2. this log det technique for constraining rank offers no provision for meeting a predetermined upper bound ρ .7 Cumulative summary of rank heuristics We have studied a perturbation method of rank reduction in 4.3 as well as the trace heuristic (convex envelope method 7.2.2.7.2. There is another good contemporary method called LMIRank [287] based on alternating projection ( E. Unidimensional scaling.1) and log det heuristic in 7.2. N−1 subject to − tr((−V Di V + εI )−1 V D V ) ≤ κ ρ Y ∈ SN h D ∈ EDMN where Di+1 7.3. Empirically we find this sometimes leads to better results. a method empirically superior to the foregoing convex envelope and log det heuristics for rank regularization and enforcing affine dimension.7.7. SECOND PREVALENT PROBLEM: 7. Yet since eigenvalues are simply determined.2.4.

g. I − w VN DVN . This problem has proven NP-hard. W D. N ≥ j > i = 1. [73]. geometrically. W vanish to within some numerical precision. the nonconvex problem is: given nonnegative symmetric matrix H = [hij ] ∈ SN ∩ RN ×N (1390) whose entries + hij are all known. so h begin with convex problem (1392) on page 583 minimize − tr(V (D − 2 Y )V ) D. e. it means reconstructing a list constrained to lie in one affine dimension. we first transform variables to distance-square D ∈ SN .Y subject to dij yij yij h2 ij D ∈ EDMN Y ∈ SN h 0.. N ≥ j > i = 1.Y subject to dij yij yij h2 ij Y ∈ SN h 0. In terms of point list. entails solution of an optimization problem having local minima whose multiplicity varies as the factorial of point-list cardinality. N−1 (1412) T where w (≈ 10) is a positive scalar just large enough to make VN DVN . N }.. [105] a historically practical application of multidimensional scaling ( 5.. The iteration is formed by moving the dimensional constraint to the objective: T minimize − V (D − 2 Y )V . PROXIMITY PROBLEMS Unidimensional scaling. .590 CHAPTER 7. j =1 (|xi − xj | − hij )2 (1341) called a raw stress problem [53.34] which has an implicit constraint on dimensional embedding of points {xi ∈ R . . N−1 (1411) D ∈ EDMN T rank VN DVN = 1 that becomes equivalent to (1341) by making explicit the constraint on affine dimension via rank. N minimize {xi ∈ R} i . p.12).. As always. i = 1 .. and where direction matrix W is .

679550 0.764096 2.020339 4.353489   5. SECOND PREVALENT PROBLEM: an optimal solution to semidefinite program (1738a) T minimize − VN D⋆ VN . problem (1412) is a convex equivalent to (1411) at convergence of the iteration.679550 4. instead.543120   4. ( 4.555130 0.239842 6.000000 5. Because of machine numerical precision and an interior-point method of solution. This iteration is not a projection method.822032 ] =[ x⋆ 1 x⋆ 2 x⋆ 3 x⋆ 4 x⋆ 5 x⋆ 6 ] (1415) found by searching 6! local minima of (1341) [105].404294 6.7.543120 6.038738 4.499274 3.263265 5. W W 591 subject to 0 W I tr W = N − 1 (1413) one of which is known in closed form. By iterating convex problems (1412) and (1413) about twenty times (initial W = 0) we find the global infimum 98.121026 −1.263265 5.618718  6.404294 5.840931 5.486829 6. Here we found the infimum to accuracy of the given data.1) Convex problem (1412) is neither a relaxation of unidimensional scaling problem (1411). accuracy degrades quickly as problem size increases beyond this.840931 3.000000 3.235301 5.353489 5.862884 0.208028 0.000000 4.2.981494 −2.000000 4. Semidefinite programs (1412) and (1413) are iterated until convergence in the sense defined on page 307. and by (1164) we find a corresponding one-dimensional point list that is a rigid transformation in R of X ⋆ .12812 of stress problem (1341).239842   4.499274 6. we speculate.235301 0. but that ceases to hold as problem size increases.559010 5.208028 5.000000 (1414)    H=    and a globally optimal solution X ⋆ = [ −4.618718  0.020339 5. .486829 3.000000 5. Jan de Leeuw provided us with some test data  0.4.1.559010 4.862884 4.

PROXIMITY PROBLEMS 7.3 Third prevalent problem: Projection on EDM cone in dij In summary. numerical solution to Problem 3 is generally different than that of Problem 2. For the moment.580) in terms of EDM D changes the problem considerably:  minimize D−H 2  F  D T Problem 3 (1416) subject to rank VN DVN ≤ ρ   D ∈ EDMN This third prevalent proximity problem is a Euclidean projection of given matrix H on a generally nonconvex subset (ρ < N − 1) of ∂ EDMN the boundary of the convex cone of Euclidean distance matrices relative to subspace SN (Figure 144d). we find that the solution to problem [(1343. −Hayden. Liu.592 CHAPTER 7. & Tarazaga (1991) [186. 7. this third problem becomes obviously convex because the feasible set is then the entire EDM cone and because the objective function D−H 2 F = i.3.3) p.j (dij − hij )2 (1418) .1 Convex case minimize D subject to D ∈ EDMN D−H 2 F (1417) When the rank constraint disappears (for ρ = N − 1). we need make no assumptions regarding measurement matrix H . 3] Reformulating Problem 2 (p. Because coordinates of projection are h distance-square and H now presumably holds distance-square measurements.567] is difficult and depends on the dimension of the space as the geometry of the cone of EDMs becomes more complex. Wells.

7. for 2 (1420) ∂ [dij ] = D ◦D a matrix of distance-square squared.1) [156] [149] [186. then the technique of solution presented here becomes invalid..0. this convex problem was solved numerically by means of alternating projection. (Example 7..19 . for this simple projection on the EDM cone equivalent to (1342). Problem 3. is incorrect. convex case In the past.3) h d2 (D + t Y ) − H dt2 2 F = 2 tr Y T Y > 0 If that H given has negative entries.2. THIRD PREVALENT PROBLEM: is a strictly convex quadratic in D .1 Equivalent semidefinite program.7.1.1. N ≥ j > i = 1.3.7. as expected.1. N−1 (1421) 7.0.1.j 2 dij − 2hij dij + h2 ij 593 (1419) subject to D ∈ EDMN Optimal solution D⋆ is therefore unique.3. D.18 For nonzero Y ∈ SN and some open interval of t ∈ R ( 3.19 H = [hij ] ∈ SN ∩ RN ×N + (1390) We then propose: Problem (1419) is equivalent to the semidefinite program. 7.D subject to ∂ij dij dij 1 D ∈ EDMN ∂ ∈ SN h 0. as explained in 7.18 minimize D i.3. 7.3. 1] We translate (1419) to an equivalent semidefinite program because we have a good solver: Assume the given measurement matrix H to be nonnegative and symmetric. minimize − tr(V (∂ − 2 H ◦ D)V ) ∂.7. Projection of H on K (1334) prior to application of this proposed technique.2.

3. input H goes to D⋆ with an accuracy of four decimal places in about 17 iterations.594 where ∂ij dij dij 1 CHAPTER 7.8.1 Example.1.10):     0 19 19 0 1 1 9 9   D⋆ =  19 0 76  H= 1 0 9 .1. D t D−H 2 ≤t F D ∈ EDMN (1424) subject to . while 2 its nonnegativity causes ∂ij → dij as optimization proceeds.1. PROXIMITY PROBLEMS 2 0 ⇔ ∂ij ≥ dij (1422) Symmetry of input H facilitates trace in the objective ( B. (1423) 9 9 19 76 1 9 0 0 9 9 The original problem (1417) of projecting H on the EDM cone is transformed to an equivalent iterative sequence of projections on the two convex cones (1283) from 6. By solving (1421) we confirm the result from an example given by Glunt. 7. Schur-form semidefinite program. Hayden. This makes an equivalent epigraph form of the problem: for any measurement matrix H minimize t∈ R .3. Alternating projection on nearest EDM. Obviation of semidefinite programming’s computational expense is the principal advantage of this alternating projection technique. & Wells [156. Using ordinary alternating projection.1. Hong.1. Affine dimension corresponding to this optimal solution is r = 1.2 Semidefinite program (1421) can be reformulated by moving the objective function in minimize D−H 2 F D (1417) subject to D ∈ EDMN to the constraints. Problem 3 convex case 7. 6] who found analytical solution to convex optimization problem (1417) for particular cardinality N = 3 by using the alternating projection method of von Neumann ( E.2 no.20).4.

3. ( 7.6.7.4 Dual interpretation.5.2. projection on EDM cone From E.2) 7. this problem statement may be equivalently written in terms of a Gram matrix via linear bijective ( 5. t∈ R . t∈ R c minimize t tI vec(D(G) − H) T vec(D(G) − H) 1 0 0 (1426) subject to G To include constraints on the list X ∈ Rn×N .2) minimize t∈ R .. THIRD PREVALENT PROBLEM: 595 We can transform this problem to an equivalent Schur-form semidefinite program. we would rewrite this: G∈ SN .4.5. The advantage of this SDP is lack of conditions on input H . D t tI vec(D − H) T vec(D − H) 1 0 (1425) subject to D ∈ EDMN characterized by great sparsity and structure.1.1.3.1 we learn that projection on a convex set has a dual form. In the circumstance K is a convex cone and point x exists exterior to the cone .3.0. negative entries would invalidate any solution provided by (1421).3. Problem 3 convex case Further. 7. ( 3. G∈ SN . X∈ Rn×N c minimize t tI vec(D(G) − H) T vec(D(G) − H) 1 0 0 (1427) subject to I X XT G X∈ C where C is some abstract convex set.1.1) EDM operator D(G) (934).g. e.3 Gram-form semidefinite program. This technique is discussed in 5.9.1.

we get a convex optimization for any given symmetric matrix H exterior to or on the EDM cone boundary: minimize D ◦ subject to D ∈ EDMN D−H 2 F maximize ◦ A A◦ . Problem 3 (1416) is difficult to solve [186. H which is equal to (1429) when A⋆ solves maximize A (1430) A. H (1429) Critchley proposed.H A F subject to =1 (1431) A ∈ EDMN This projection of symmetric H on polar cone EDMN can be made a convex problem.113]: In that circumstance.596 CHAPTER 7. p. of course. by projection on the algebraic complement ( E. 3] because the feasible set in RN (N −1)/2 loses convexity. by relaxing the equality constraint ( A F ≤ 1). instead.3.1).2. distance to the nearest point P x in K is found as the optimal value of the objective x − P x = maximize aTx a subject to a ≤1 a∈K ◦ (2058) where K is the polar cone. PROXIMITY PROBLEMS or on its boundary.2 Minimization of affine dimension in Problem 3 When desired affine dimension ρ is diminished.9. Applying this result to (1417).2. ◦ 7. projection on the polar EDM cone in his 1980 thesis [89. D⋆ = A⋆ A⋆ . H A◦ F ≡ subject to A◦ ∈ EDM ≤1 (1428) N◦ Then from (2060) projection of H on cone EDMN is D⋆ = H − A◦⋆ A◦⋆ . By .

D subject to ∂ij dij dij 1 0. spectral projection can be considered a natural means in light of its successful application to projection on a rank ρ subset of the positive semidefinite cone in 7..3 Constrained affine dimension.5. Another approach to affine dimension minimization is to project instead on the polar EDM cone. discussed in 6. Problem 3 When one desires affine dimension diminished further below what can be achieved via cenv(rank)-minimization as in (1433). minimize − tr(V (∂ − 2 H ◦ D)V ) ∂. because that particular method comes from . N−1 (1433) − tr(V D V ) ≤ κ ρ D ∈ EDMN ∂ ∈ SN h That means we will not see equivalence of this cenv(rank)-minimization problem to the non−rank-constrained problems (1419) and (1421) like we saw for its counterpart (1403) in Problem 2..1. then for any given H we get a convex problem minimize D subject to − tr(V D V ) ≤ κ ρ D ∈ EDMN D−H 2 F (1432) where κ ∈ R+ is a constant determined by cut-and-try.3.3. 7. N ≥ j > i = 1. problem (1432) is a convex optimization having unique solution in any desired affine dimension ρ . Given κ .8. Yet it is wrong here to zero eigenvalues of −V D V or −V G V or a variant to reduce affine dimension. an approximation to Euclidean projection on that nonconvex subset of the EDM cone containing EDMs with corresponding affine dimension no greater than ρ .1. The SDP equivalent to (1432) does not move κ into the variables as on page 587: for nonnegative symmetric input H and distance-square squared variable ∂ as in (1420). THIRD PREVALENT PROBLEM: 597 substituting rank envelope (1401) into Problem 3.7.4.

7. to which membership subsumes the rank constraint.598 CHAPTER 7. Rank of an optimal solution is intrinsically bounded above and below. ( 5.3. spectral projection on  ρ+1  R+  0  ∩ ∂H ⊂ RN +1 (1436) R− where ∂H = {λ ∈ RN +1 | 1T λ = 0} (1144) is a hyperplane through the origin.11.0.1 Cayley-Menger form We use Cayley-Menger composition of the Euclidean distance matrix to solve a problem that is the same as Problem 3 (1416): ( 5. 2 ≤ rank 0 1T 1 −D⋆ ≤ ρ+2 ≤ N +1 (1435) Our proposed strategy for low-rank solution is projection on that subset 0 1T of a spectral cone λ ( 5. .1) 0 1T 0 1T − minimize 1 −H 1 −D D 0 1T ≤ ρ+2 subject to rank 1 −D D ∈ EDMN 2 F (1434) a projection of H on a generally nonconvex subset (when ρ < N − 1) of the Euclidean distance matrix cone boundary rel ∂ EDMN . Problem 3 is instead a projection on the EDM cone whose associated spectral cone is considerably different.2.3. This pointed polyhedral cone (1436).2. projection from the EDM cone interior or exterior on a subset of its relative boundary ( 6. (1212)). PROXIMITY PROBLEMS projection on a positive semidefinite cone (1369). id est.5. zeroing those eigenvalues here in Problem 3 would place an elbow in the projection path (Figure 159) thereby producing a result that is necessarily suboptimal.3) corresponding to affine 1 −EDMN dimension not in excess of that ρ desired.3.11.3) Proper choice of spectral cone is demanded by diagonalization of that variable argument to the objective: 7. is not full-dimensional. id est.

1. R  ρ+1  R+  ∩ ∂H  0 subject to δ(Υ) ∈ R− minimize δ(QR ΥRTQT ) = 0 R −1 = RT 2 (1439) where π is the permutation operator from argument in nonincreasing order. any permutation matrix is an orthogonal matrix. and where ∂H insures one negative eigenvalue. THIRD PREVALENT PROBLEM: 599 Given desired affine dimension 0 ≤ ρ ≤ N − 1 and diagonalization ( A.3. Hollowness constraint δ(QR ΥRTQT ) = 0 makes problem (1439) difficult by making the two variables dependent.20 where R 7.3 arranging its vector (1440) in U on the set of orthogonal matrices is a bijection. Our plan is to instead divide problem (1439) into two and then iterate their solution: δ(Υ) − π δ(RTΛR) Υ  ρ+1  R+  ∩ ∂H subject to δ(Υ) ∈  0 R− minimize minimize R 2 QT U ∈ RN +1×N +1 (a) (1441) subject to δ(QR Υ⋆ RTQ ) = 0 R −1 = RT 7.5) of unknown EDM D 0 1T U ΥU T ∈ SN +1 (1437) h 1 −D and given symmetric H in diagonalization 0 1T 1 −H QΛQT ∈ SN +1 (1438) having eigenvalues arranged in nonincreasing order.20 R Υ⋆ RT − Λ 2 F T (b) Recall.7. then by (1157) problem (1434) is equivalent to δ(Υ) − π δ(RTΛR) Υ. .7.

The cone membership constraint in (1441a) therefore inherently insures existence of a symmetric hollow matrix.9. intersection of 1 −D  ρ+1  R+  ∩ ∂H ∩ KM from (1442) with the cone of  0 this feasible superset R− ∗ majorization Kλδ is a benign operation. PROXIMITY PROBLEMS Proof.3) among the aggregate of halfspace-description normals.600 CHAPTER 7.10.4.2).1.0.13. Curiously.3).3 with regard to π the permutation operator.9. .4. id est. cone membership constraint  ρ+1 R+  ∩ ∂H from (1441a) is equivalent to  0 δ(Υ) ∈ R−  Rρ+1 +  ∩ ∂H ∩ KM δ(Υ) ∈  0 R−  (1442) where KM is the monotone cone ( 2.1.13.2. ∂H ∩ KM+ ∩ KM = ∂H ∩ KM ∗ (1443) verifiable by observing conic dependencies ( 2. Membership of δ(Υ) to the polyhedral cone of majorization (Theorem A. We justify disappearance of the hollowness constraint in convex optimization problem (1441a): From the arguments in 7.1) ∗ ∗ Kλδ = ∂H ∩ KM+ ∗ (1459) where KM+ is the dual monotone nonnegative cone ( 2. is a condition (in absence of a hollowness constraint) that would insure existence 0 1T of a symmetric hollow matrix .

4.  .. tr Rij= 0 . .3..  .  G= T RN +1. .  .N +1 rN +1  R1. .6. N + 1    (  0) (1445) rank G = 1 The rank constraint is regularized by method of convex iteration developed in 4. R11 · · · R1. .0. .N +1 rN +1   rN +1 r1  T T T T r1 ··· rN +1 1 r1 ··· rN +1 1 where Rij T ri rj ∈ RN +1×N +1 and Υ⋆ ∈ R .N +1 T T r1 ··· rN +1 1 N +1 i=1 ⋆ Υii Rii QT = 0 i=1 . instead.0.   . it is. .N +1 r1 .. .. a minimization over the intersection of the nonconvex manifold of orthogonal matrices with another nonconvex set in variable R specified by the hollowness constraint.N +1 r1 r1 r1 · · · r1 rN +1 r1 . . N + 1 i<j = 2 .4) were it not for the hollowness constraint. Rij .. .   .2: Define R = [ r1 · · · rN +1 ] ∈ RN +1×N +1 and make the assignment  r1 . . Problem (1445) is partitioned into two convex problems: . .7. = T    T T rN +1 rN +1 rN +1  R1. . ri 2 F minimize i=1 ⋆ Υii Rii − Λ subject to δ Q tr Rii = 1 .  T T [ r1 · · · rN +1 1 ]   2   (1444) G = ∈ S(N +1) +1  rN +1   1     T T R11 · · · R1. .  .. .N +1 RN +1. . We solve problem (1441b) by a method introduced in 4.  . . Since R Υ⋆ RT = ii N +1 i=1 ⋆ Υii Rii then problem (1441b) is equivalently expressed: N +1 Rii ∈ S . . . THIRD PREVALENT PROBLEM: 601 Optimization (1441b) would be a Procrustes problem ( C.

latent semantic indexing [232]. . .4 Conclusion The importance and application of solving rank.or cardinality-constrained problems are enormous. . a conclusion generally accepted gratis by the mathematics and engineering communities. An optimal solution to (1447) is known in closed form (p. tr Rij= 0 . PROXIMITY PROBLEMS ⋆ Υii Rii − Λ minimize Rij . it can be moved to the objective within an added weighted norm.11. Figure 112).0. video surveillance. W 0 W I tr W = (N + 1)2 (1447) subject to then iterated with convex problem (1441a) until a rank-1 G matrix is found and the objective of (1441a) is minimized.0. sparse or low-rank matrix completion for preference models and . [264] Rank and cardinality constraints also arise naturally in combinatorial optimization ( 4. . optics [75] (Figure 119). R11 · · · R1. in that case. . . N + 1 i<j = 2 .N +1 r1 . The hollowness constraint in (1446) may cause numerical infeasibility.672).. .  . and communications [143] [255] (Figure 101). and find application to face recognition (Figure 4).6.602 N +1 2 CHAPTER 7. . For example.. 7. one might be interested in the minimal order dynamic output feedback which stabilizes a given linear time invariant plant (this problem is considered to be among the most important open problems in control ). ri i=1 + G.N +1 RN +1. cartography (Figure 140). W F subject to tr Rii = 1 . N + 1      (1446) 0 δ Q ⋆ Υii Rii QT = 0 and W ∈ S(N +1) +1 minimize 2 G⋆ . . Conditions for termination of the iteration would then comprise a vanishing norm of hollowness.  G= T R1.N +1 rN +1  T T r1 ··· rN +1 1 N +1 i=1 i=1 . Rank-constrained semidefinite programs arise in many vital feedback and control problems [173].

It was assumed that if H did not already belong to the EDM cone. for example. any solution acquired that way is necessarily suboptimal.2.1. or lack thereof. sensor-network localization or wireless location (Figure 85). hollowness.12).7. 5. One popular recourse is intentional misapplication of Eckart & Young’s result by introducing spectral projection on a positive semidefinite cone into Problem 3 via D(G) (934). digital filter design with time domain constraints [Wıκımization]. then we wanted an EDM closest to H in some sense. 7. at once encompasses proximity and completion problems.7. p. is based on their discovery (Problem 1. For practical problems. symmetry. having a constraint on rank. That particular redesign (the art. id est. . We considered H having various properties such as nonnegativity. There has been little progress in spectral projection since the discovery by Eckart & Young in 1936 [130] leading to a formula for projection on a rank ρ subset of a positive semidefinite cone ( 2. it places a regularization term in the objective that enforces the rank constraint. medical imaging (Figure 107).1). input-matrix H was assumed corrupted somehow. molecular conformation (Figure 133). A third recourse is to apply the method of convex iteration just like we did in 7. etcetera.2. the norm objective is thereby eliminated as in the development beginning on page 321. [73] Since Problem 3 instead demands spectral projection on the EDM cone. multidimensional scaling or principal component analysis ( 5.1. it withstands reason that such a proximity problem could instead be reformulated so that some or all entries of H were unknown but bounded above and below by known limits.8).13).4. in terms of the Gram-matrix bridge between point-list X and EDM D .9. This technique is applicable to any semidefinite problem requiring a rank constraint.2. [147] The only closed-form spectral method presently available for solving proximity problems. CONCLUSION 603 collaborative filtering. A second recourse is problem redesign: A presupposition to all proximity problems in this chapter is that matrix H is given.

PROXIMITY PROBLEMS .604 CHAPTER 7.

citation: Dattorro. 9. y = A . All rights reserved. v2012. λ . Mεβoo Publishing USA. X2 = X1 .1 Main-diagonal δ operator. δ(A) returns a vector composed of all the entries from the main diagonal in the natural order.28. When linear function δ operates on a square matrix A ∈ RN ×N .1 (1451) Linear operator T : Rm×n → RM ×N is self-adjoint when. vec Operating on a vector y ∈ RN . Convex Optimization & Euclidean Distance Geometry.01.5-1]. main-diagonal linear operator δ is self-adjoint [228.1 videlicet. ∀ X1 . co&edg version 2012. 605 . X2 ∈ Rm×n T (X1 ) .10.2) δ(A)T y = δ(A) . T (X2 ) 2001 Jon Dattorro.01.A. δ(y) = tr AT δ(y) A. δ(δ(Λ)) returns Λ itself. δ(y) ∈ SN We introduce notation δ denoting the main-diagonal linear self-adjoint operator. δ(A) ∈ RN (1448) (1449) Operating recursively on a vector Λ ∈ RN or diagonal matrix Λ ∈ SN . ( 2. δ naturally returns a diagonal matrix. 2005.28.Appendix A Linear algebra A. 3. tr . δ 2(Λ) ≡ δ(δ(Λ)) Λ (1450) Defined in this manner.

tr(A) = tr(AT ) = δ(A)T 1 = I .4] of matrices of like size. vec(cA) = c vec(A) 6. π(δ(A)) = λ(I ◦ A) where π is the presorting function. σ(A) a vector of (nonincreasingly) ordered singular values. and λ(A) a vector of nonincreasingly ordered eigenvalues of matrix A : 1. δ(A) = δ(AT ) 2. X a matrix.1. δ(AB)T = 1T(AT ◦ B) = 1T(B ◦ AT ) c∈R c∈R c∈R c∈R c∈R c∈R c∈R c∈R .1 Identities This δ notation is efficient and unambiguous as illustrated in the following examples where: A ◦ B denotes Hadamard product [203] [160. sgn(c) λ(|c|A) = c λ(A) 14.1. ei the i th member of the standard basis for Rn . (A + B) ⊗ C = A ⊗ C + B ⊗ C A ⊗ (B + C ) = A ⊗ B + A ⊗ C 13. 17. A 3.1.1). SN the symmetric h hollow subspace. LINEAR ALGEBRA A. A ⊗ cB = cA ⊗ B 8. tr(c ATA ) = c tr ATA = c 1T σ(A) 16. vec(A + B) = vec(A) + vec(B) 11. ⊗ Kronecker product [167] ( D. sgn(c) σ(|c|A) = c σ(A) √ √ 15.2. y a vector. (A + B) ◦ C = A ◦ C + B ◦ C A ◦ (B + C ) = A ◦ B + A ◦ C 12.606 APPENDIX A. tr(cA) = c tr(A) = c 1T λ(A) 5. δ(AB) = (A ◦ B T )1 = (B T ◦ A)1 18. 1. A ◦ cB = cA ◦ B 7. δ(cA) = c δ(A) 4. tr(A + B) = tr(A) + tr(B) 10. δ(A + B) = δ(A) + δ(B) 9.

For ζ = [ζ i ] ∈ Rk and x = [xi ] ∈ Rk . D = [dij ] ∈ SN . 30. tr(ΛA) = δ(Λ)T δ(A) . δ(uv T ) =  . vec(A XB) = (B T ⊗ A) vec X 32. T = δ vec(X) vec(X)T (B T ⊗ A) 1 34. ζ i /xi = ζ T δ(x)−1 1 i vec(A ◦ B) = vec(A) ◦ vec(B) = δ(vec A) vec(B) = vec(B) ◦ vec(A) = δ(vec B) vec(A) (42)(1826) not H 31.  = u ◦ v . δ(I 1) = δ(1) = I 28. uN v N  607 u. 27. SN (confer B. 23. vec(B XA) = (AT ⊗ B) vec X tr(A XBX T ) = vec(X)T vec(A XB) = vec(X)T (B T ⊗ A) vec X 33. δ δ(A)1 T = δ 1 δ(A)T = δ(A) δ(y)1 = δ(y1T ) = y 26. tr(A X TBX) = vec(X)T vec(B XA) = vec(X)T (AT ⊗ B) vec X T = δ vec(X) vec(X)T (AT ⊗ B) 1 [167] . V = I − h h 22.j y TB δ(A) = tr B δ(A)y T = δ(A)TB Ty = tr y δ(A)TB T = tr AT δ(B Ty) = tr δ(B Ty)AT 24.A. .v ∈ RN 20. TR .20) dij hij i. VEC  u1 v 1  . δ 2(ATA) = i ei eTAT Aei eT i i 25. δ(ei eT 1) = δ(ei ) = ei eT j i 29.  19. λ .2 no. tr(ATB) = tr(AB T ) = tr(BAT ) = tr(B TA) = 1T(A ◦ B)1 = 1T δ(AB T ) = δ(ATB)T 1 = δ(BAT )T 1 = δ(B TA)T 1 N tr(−V (D ◦ H)V ) = tr(DTH) = 1T(D ◦ H)1 = tr(11T(D ◦ H)) = δ 2(Λ) Λ ∈ SN = tr δ(B Ty)A = tr A δ(B Ty) 1 11T ∈ N 21. MAIN-DIAGONAL δ OPERATOR. H = [hij ] ∈ SN .1. δ(A1)1 = δ(A11T ) = A1 .4.

(A ⊗ B)T = AT ⊗ B T 42. There exist permutation matrices Ξ1 and Ξ2 such that [167. For A a vector. p. tr(A ⊗ B) = tr A tr B 44.28] . For B a row vector. B ∈ Rn×n .608 APPENDIX A. Given analytic function [82] f : C n×n → C n×n . For A ∈ Rm×m . (A ⊗ B) = A(I ⊗ B) 41. δ 2(B) = Ξ δ 2(ΞTB Ξ)ΞT 36. (A ⊗ B)−1 = A−1 ⊗ B −1 43. A ⊗ 1 = 1 ⊗ A = A 37. A ⊗ (B ⊗ C ) = (A ⊗ B) ⊗ C 38. det(A ⊗ B) = detn (A) detm (B) 45. (A ⊗ B) = (A ⊗ I )B 40. for example. For any permutation matrix Ξ and dimensionally compatible vector y or matrix A δ(Ξ y) = Ξ δ(y) ΞT δ(Ξ A ΞT ) = Ξ δ(A) (1452) (1453) So given any permutation matrix Ξ and any dimensionally compatible matrix B . For eigenvalues λ(A) ∈ C n and eigenvectors v(A) ∈ C n×n such that A = vδ(λ)v −1 ∈ Rn×n λ(A ⊗ B) = λ(A) ⊗ λ(B) . (A ⊗ B)(C ⊗ D) = AC ⊗ BD 39. v(A ⊗ B) = v(A) ⊗ v(B) (1456) 47. LINEAR ALGEBRA 35. then f (I ⊗A) = I ⊗f (A) and f (A ⊗ I) = f (A) ⊗ I [167. p.28] A ⊗ B = Ξ1 (B ⊗ A)Ξ2 (1455) (1454) 46.

MAIN-DIAGONAL δ OPERATOR.2 Corollary.2 Majorization A.0. both arranged in nonincreasing order. 4. VEC 609 A. ⋄ Majorization cone Kλδ is naturally consequent to the definition of majorization.1. ∗ (1463) ⋄ .0.2.5] Let λ ∈ R denote a given vector of eigenvalues and let δ ∈ RN denote a given vector of main diagonal entries.4] [203. 7. 5. id est. and where the ∗ dual of the line is a hyperplane.A. λ . Then ∃ A ∈ SN and conversely λ(A) = λ and δ(A) = δ ⇐ λ − δ ∈ Kλδ A ∈ SN ⇒ λ(A) − δ(A) ∈ Kλδ ∗ ∗ (1457) (1458) The difference belongs to the pointed polyhedral cone of majorization (not a full-dimensional cone.1.2.1. rather. In the particular circumstance δ(A) = 0 we get: A. vector y ∈ RN majorizes vector x if and only if k k ∗ i=1 xi ≤ yi i=1 ∀1 ≤ k ≤ N (1460) and 1T x = 1T y (1461) Under these circumstances. N Let λ ∈ R denote a given vector of eigenvalues arranged in nonincreasing order.1. [396. confer (312)) Kλδ ∗ ∗ KM+ ∩ {ζ 1 | ζ ∈ R} ∗ ∗ (1459) where KM+ is the dual monotone nonnegative cone (434). Then ∗ ∃ A ∈ SN λ(A) = λ ⇐ λ ∈ Kλδ (1462) h and conversely where ∗ Kλδ A ∈ SN ⇒ λ(A) ∈ Kλδ h is defined in (1459). Symmetric hollow majorization. (Schur) Majorization. ∂H = {ζ 1 | ζ ∈ R} = 1⊥ . vector x is majorized by vector y .1 Theorem. TR .3] N [204.

4. A. By so doing. A. A Hermitian matrix has real eigenvalues and real main diagonal. the complex test xH ≥ 0 is most fundamental.1 Symmetry versus semidefiniteness We call (1464) the most fundamental test of positive semidefiniteness.3 unless that complex test is adopted. then it is nonnegative for any αxp where α ∈ R . only Hermitian matrices (AH = A where superscript H denotes conjugate transpose)A. for example. xT A. for all real A .3 Golub & Van Loan [160.610 APPENDIX A. for real A and complex domain {x ∈ C n } . . A= (A + AT ) (A − AT ) + 2 2 (53) Because. sufficient.2. id est. Thus.2].2. namely. Any real square matrix A has a representation in terms of its symmetric and antisymmetric parts.2 Semidefiniteness: domain of test The most fundamental necessary. the antisymmetric part vanishes under real test. only normalized x in Rn need be evaluated.2 are admitted to the set of positive semidefinite matrices (1467). that standard textbook requirement is far over-reaching because if xTA x is nonnegative for particular x = xp . 1] xTA x ≥ 0 for each and every x ∈ Rn such that x = 1 (1464) Traditionally. Yet admitting nonsymmetric real matrices or not is a matter of preferenceA. Yet some authors instead say. That complex broadening of the Ax domain of test causes nonsymmetric real matrices to be excluded from the set of positive semidefinite matrices. Many authors add the further requirement that the domain be complex. as we shall now explain. admit nonsymmetric real matrices. authors demand evaluation over broader domain. Indeed. LINEAR ALGEBRA A. over all x ∈ Rn which is sufficient but unnecessary. the broadest domain.2 (A − AT ) x=0 2 (1465) Hermitian symmetry is the complex analogue. the real part of a Hermitian matrix is symmetric while its imaginary part is antisymmetric. an unnecessary prohibitive condition. and definitive test for positive semidefiniteness of matrix A ∈ Rn×n is: [204.

It is customary. re(xH Ax) = xH rather. if A ∈ C n×n .4 A + AH x 2 (1466) xH ∈ R ⇔ AH = A Ax (1467) . according to Horn & Johnson. 52] Perhaps a better explanation of this pervasive presumption of symmetry comes from Horn & Johnson [203. Certainly misleading under (1464).A.5 is the complex matrix. A totally complex perspective is not necessarily more advantageous.3) For that reason. II] A. (A + AT )/2. then A is Hermitian. They explain. Hence the oft-made presumption that only symmetric matrices may be positive semidefinite is. So the Hermitian symmetry assumption is unnecessary. not Symmetric matrices are certainly pervasive in the our chosen subject as well. because the imaginary part of xH Ax comes from the anti-Hermitian part (A − AH )/2 .1] whose perspectiveA. the assumption that A is Hermitian is not necessary in the definition of positive definiteness. erroneous under (1464). The positive semidefinite cone. thus necessitating the complex domain of test throughout their treatise.5) in the ambient space of Hermitian matrices. SEMIDEFINITENESS: DOMAIN OF TEST 611 only the symmetric part of A . Because eigenvalue-signs of a symmetric matrix translate unequivocally to its semidefiniteness. has a role determining positive semidefiniteness. [196. the convention adopted in the literature is that semidefinite matrices are synonymous with symmetric semidefinite matrices. the real part of xH Ax comes H from the Hermitian part (A + A )/2 of A . the eigenvalues that determine semidefiniteness are always those of the symmetrized matrix.5 A. and because symmetric (or Hermitian) matrices must have real eigenvalues. .13. ( A. of course. A − AH im(xH Ax) = xH x (1468) 2 that vanishes for nonzero x if and only if A = AH . Their comment is best explained by noting. however.2. Thus. for example. and if xH Ax is real for all x ∈ C n . . is not selfdual ( 2. 7. that presumption is typically bolstered with compelling examples from the physical sciences where symmetric matrices occur within the mathematical exposition of natural phenomena.A.4 [141.

A and B are assumed individually Hermitian and the domain of test is assumed complex.1. A.6 prob. LINEAR ALGEBRA because nonHermitian matrices could be regarded positive semidefinite.2.5   λ(B) =   3. One important exception occurs for rank-one matrices Ψ = uv T where u and v are real vectors: Ψ is positive semidefinite if and only if Ψ = uuT . The alternative to expanding domain of test is to assume all matrices of interest to be symmetric. Yet that complex edifice is Ax dismantled in the test of real matrices (1464) because the domain of test is no longer necessarily complex. Horn & Johnson assert and Zhang agrees: If A.9  0 5  4. In summary. We prove that assertion to be false for real matrices under (1464) that adopts a real domain of test. 7.2.7) We might choose to expand the domain of test to all x in C n so that only symmetric matrices would be comparable. 3.10] [396.8  5. regardless of symmetry. then we know that the product AB is positive definite if and only if AB is Hermitian. 6.4  4 1 0 0 1 4 2.  AT = A =  λ(A) =  (1469)  −1 1   3.2] Implicitly in their statement.3. meaning.1.0 4  4 BT = B =   −1 −1   4 −1 −1 5 0 0  . that is commonly done.1 Example.     3 0 −1 0 5.0.612 APPENDIX A. xT Ax will certainly always be real.B ∈ C n×n are positive definite.5  1 0  . and so real A will always be comparable.0.24  (1470) . [203. rather because nonHermitian (includes nonsymmetric real) matrices are not comparable on the real line under xH . then nonsymmetric real matrices are admitted to the realm of semidefinite matrices because they become comparable on the real line. if we limit the domain of test to all x in Rn as in (1464).3  0. ( A. hence the synonymous relationship with semidefinite matrices. 0 5 1  0 1 4  8. Nonsymmetric positive definite product.

5.1. So motivated is our consideration of proper statements of positive semidefiniteness under real domain of test.5 25 3 0.5  1 .5 0. (AB + (AB)T ) =  2  −6. while those remaining are well established or obvious.   λ(AB) =   10.3.3.  29.  0.   λ 1 (AB + (AB)T ) =  2  10. which may have served to advance any confusion. It is a little more difficult to find a counterexample in R2×2 or R3×3 .6 .A. they do so by expanding the domain of test to C n . we never adopt a complex domain of test with real matrices. a few require proof as they depart from the standard texts.5  15. then λ(AB) = λ( AB A) will always be a nonnegative vector by (1496) and Corollary A.0.  0. complicates the facts when compared to corresponding statements for the complex case (found elsewhere [203] [396]).1. A.5 9 17    36.  36.3 Proper statements of positive semidefiniteness Unlike Horn & Johnson and Zhang.6 Horn & Johnson and Zhang resolve the anomaly by choosing to exclude nonsymmetric matrices and products.0.1) despite the fact that AB is not symmetric.5 −4. This restriction.  30. Yet positive definiteness of product AB is certified instead by the nonnegative eigenvalues 1 λ 2 (AB + (AB)T ) in (1472) ( A. ironically.5 −6. PROPER STATEMENTS 613  13 12 −8 −4  19 25 5 1  . We state several fundamental facts regarding positive semidefiniteness of real matrix A and the product AB and sum A + B of real matrices under fundamental real test (1464).014  A. (AB)T = AB =   −5 1 22 9  −5 0 9 17  13 15.72  (1471) (1472) √ √ n n Whenever A ∈ S+ and B ∈ S+ .3.5 3 22 9  −4.A.

The set of norm-1 vectors is necessary and sufficient to establish positive semidefiniteness. the extreme directions xxT of the positive semidefinite cone constitute a minimal set of generators necessary and sufficient for discretization of dual generalized inequalities (1476) certifying membership to that cone.1.4. any particular norm and any nonzero norm-constant will work. Statements (1473) and (1474) are each a particular instance of dual generalized inequalities ( 2.5) of unit norm. as in A 0 ⇔ tr(XA) ≥ 0 ∀ X ∈ SM + (1476) A ≻ 0 ⇔ tr(XA) > 0 ∀ X ∈ SM .1 Theorem. xxT = 0 (1475) This says: positive semidefinite matrix A must belong to the normal side of every hyperplane whose normal is an extreme direction of the positive semidefinite cone.7 .0. x = 1 .3.13. The traditional condition requiring all x ∈ RM for defining positive (semi)definiteness is actually more than what is necessary. actually. X = 0 + But that condition is more than what is necessary. xxT = 0 (1474) ⋄ Proof.A. X = 1. [363] A 0 ⇔ xxT .13. A ∈ SM is positive semidefinite if and only if for each and every vector x ∈ RM of unit norm. A ≻ 0 ⇔ tr(xxTA) = xTA x > 0 ∀ xxT . Relations (1473) and (1474) remain true when xxT is replaced with “for each and every” positive semidefinite matrix X ∈ SM + ( 2. By the discretized membership theorem in 2. A 0 ⇔ tr(xxTA) = xTA x ≥ 0 ∀ xxT (1473) x =1 Matrix A ∈ SM is positive definite if and only if for each and every we have xTA x > 0 .0.13.2) with respect to the positive semidefinite cone. A.614 APPENDIX A.7 we have xTA x ≥ 0 (1464). videlicet.2. LINEAR ALGEBRA A. A > 0 ∀ xxT( 0) 0) . A ≥ 0 ∀ xxT( A ≻ 0 ⇔ xxT . Positive (semi)definite matrix.

nonsymmetric 1 When A ∈ Rn×n . because the anti-Hermitian part does not vanish under complex test unless A is Hermitian. App.1) xTA x ≥ 0 ∀ x ∈ Rn ⇔ A + AT ( 2.334] it is not λ(A) that requires observation. λ(A) λ(A) If AT = A then λ(A) meaning. δ(J ) = λ(A) .0.B] id est.0. let λ 2 (A + AT ) ∈ Rn denote eigenvalues of the A. 1 The symmetrization of A is (A + AT )/2. A .1] (confer A. (1468) A. matrix A belongs to the positive semidefinite cone in the subspace of symmetric matrices if and only if its eigenvalues belong to the nonnegative orthant.9 [273. Yet he is mistaken by proposing the Hermitian part alone xH(A+AH )x be tested.A. eigenvalues. λ 2 (A + AT ) = λ(A + AT )/2.3. λ(A) (45) 0 ⇔ A 0 (1483) 0 ⇒ xTA x ≥ 0 ∀ x (1481) (1482) 0 ⇐ AT = A . p. symmetrized matrix By positive semidefiniteness of A ∈ Rn×n we mean.9 .8 A. xTA x ≥ 0 ∀ x For µ ∈ R . A = λ(A) . AT = A .3.8 arranged in nonincreasing order. PROPER STATEMENTS 615 A.1.A.9.1) A A T 0 ⇔ λ(A + AT ) 0 (1477) B ⇔ A−B 0 ⇒ AT = A 0 A T (1478) 0 or B 0 (1479) (1480) x A x≥ 0 ∀ x A =A Matrix symmetry is not intrinsic to positive semidefiniteness.6. 1.3.1 Semidefiniteness.3. Strang agrees [333. [333. and vector λ(A) ∈ C n holding the ordered eigenvalues of A λ(µI + A) = µ1 + λ(A) (1484) Proof: A = M JM −1 and µI + A = M (µI + J )M −1 where J is the Jordan form for A . so λ(µI + A) = δ(µI + J ) because µI + J is also a Jordan form. 5. A ∈ Rn×n .

7.4) det(A ) = i=1 n m λ(A)m i λ(A)m i i=1 (1491) (1492) tr(Am ) = . 2.7 prob. LINEAR ALGEBRA λ(I + µA) = 1 + λ(µA) For vector σ(A) holding the singular values of any matrix A σ(I + µATA) = π(|1 + µσ(ATA)|) σ(µI + ATA) = π(|µ1 + σ(ATA)|) (1485) (1486) (1487) where π is the nonlinear permutation-operator sorting its vector argument into nonincreasing order. xT Tx = ATx 2 . (1907) n .616 By similar reasoning.1. For A ∈ SM and each and every w = 1 [203. AA 2 For A ∈ Rn×n and c ∈ R tr(cA) = c tr(A) = c 1T λ(A) For m a nonnegative integer.5. for dimensionally compatible vector x . APPENDIX A. AAT 0 (1490) xT TAx = Ax A 2 2 because.4] (confer (44)) A is normal matrix ⇔ For A ∈ Rm×n ATA A 2 F µI ⇔ λ(A) µ1 (1488) = λ(A)T λ(A) (1489) 0.1 no.9] wTAw ≤ µ ⇔ A [203. ( A.

1). n} (1496) Any eigenvalues remaining are zero. B.20] id est.3. By the 0 eigenvalues theorem ( A.3.A.0.7. rank is equal to the number of nonzero eigenvalues in vector λ(A) δ(Λ) (1494) by the 0 eigenvalues theorem ( A.0.3. 1.2] (confer (1779)) tr(AB) ≤ λ(A)T λ(B) (1765) with equality (Theobald) when A and B are simultaneously diagonalizable [203] with the same ordering of eigenvalues. p. 1.1). A = SΛS −1 .3.B ∈ S+ ( 2. [203. including their multiplicity. λ(AB)1:η = λ(BA)1:η . B ∈ S n [55. η min{m .2. (confer [333. rank(AB) = rank(BA) . rank B} ≥ rank(AB) (1499) n For linearly independent matrices A. For A ∈ Rm×n and B ∈ Rn×m tr(AB) = tr(BA) (1495) and η eigenvalues of the product and commuted product are identical.4] min{rank A . B [203. 0.7. rank A + rank B = rank(A + B) > min{rank A . R(AT ) ∩ R(B T ) = 0.B ∈ S+ (1498) rank A + rank B ≥ rank(A + B) ≥ min{rank A . PROPER STATEMENTS For A diagonalizable ( A.255]) rank A = rank δ(λ(A)) = rank Λ 617 (1493) meaning.5).1. AB and BA diagonalizable (1497) For any compatible matrices A .1. rank B} ≥ rank(AB) (1500) . rank B} ≥ rank(AB) n For A.1). (Ky Fan) For A . R(A) ∩ R(B) = 0.

4. then N (AB) = N (B) and the stated result follows by conservation of dimension (1616).26] Matrix symmetry alone is insufficient for product symmetry. p. [203. N (CAB) ⊇ N (AB) ⊇ N (B) is obvious. B ∈ Rn×n square. (1507) . Let C = A† . (⇒) Suppose AB = (AB)T . [333. (⇐) Suppose AB = BA . LINEAR ALGEBRA Because R(ATA) = R(AT ) and R(AAT ) = R(A) (p. AB = (AB)T ⇒ AB = BA . (AB)T = AB ⇔ AB = BA Proof.k. 2.5] det(AB) = det(BA) det(AB) = det A det B Yet for A ∈ Rm×n and B ∈ Rn×m [77. By assumption ∃ A† A†A = I . (1549) [114. p. 0. For A ∈ S n and any nonsingular matrix Y inertia(A) = inertia(YAY T ) a. BA = B TAT = (AB)T . and for any B ∈ Rn×k rank(AB) = rank(B) (1502) (1501) Proof.3.648).B ∈ S n . product AB is symmetric iff AB is commutative. AB = BA ⇒ AB = (AB)T .a Sylvester’s law of inertia. (AB)T = B TAT = BA . for any A ∈ Rm×n rank(AAT ) = rank(ATA) = rank A = rank AT For A ∈ Rm×n having no nullspace.618 APPENDIX A.3] For A . Commutativity alone is insufficient for product symmetry. For any compatible matrix C .72] det(I + AB) = det(I + BA) (1506) (1504) (1505) (1503) For A.

Because (AB)T = AB .12] A product of diagonal matrices is always commutative. Now assume A. conclude nothing about the semidefiniteness of A and B . For A. [203. [203.B 0. B ∈ Rn×n commute if and only if they are simultaneously diagonalizable.3.A. 5. Because all symmetric matrices are diagonalizable. . then T must equal S . xTB x ≥ 0 ∀ x ⇒ λ(A+AT )i λ(B+B T )i ≥ 0 ∀ i the negative result arising because of the schism between the product of eigenvalues λ(A + AT )i λ(B + B T )i and the eigenvalues of the symmetrized matrix product λ(AB + (AB)T )i . .B ∈ Rn×n and AB = BA xTAB x ≥ 0 ∀ x (1508) xTA x ≥ 0 .6] we have A = SΛS T and B = T ∆T T . meaning. then (1490) applies. B and AB = BA 0 (1509) 0 ⇒ λ(AB)i = λ(A)i λ(B)i ≥ 0 ∀ i ⇔ AB Positive semidefiniteness of commutative A and B is sufficient but not necessary for positive semidefiniteness of product AB .3] and the eigenvalues of A are ordered in the same way as those of B . and so all products λ(A)i λ(B)i must be nonnegative. Then Λ∆ 0 by (1477). PROPER STATEMENTS 619 Diagonalizable matrices A .B ∈ S n A 0. X 2 is generally not positive semidefinite unless matrix X is symmetric. where Λ and ∆ are real diagonal matrices while S and T are orthogonal matrices. AB = SΛ∆S T is symmetric and has nonnegative eigenvalues contained in diagonal matrix Λ∆ by assumption. id est. therefore. 1. hence positive semidefinite by (1477).1) [333. Simply substituting symmetric matrices changes the outcome: For A. sgn(λ(A)) = sgn(λ(B)). Proof. implies λ(A)i λ(B)i ≥ 0 for all i because all the individual eigenvalues are nonnegative. We may. (⇐) Suppose AB = SΛ∆S T 0. 1.5. of course. n . For example.3. That. ( A. . λ(A)i = δ(Λ)i and λ(B)i = δ(∆)i correspond to the same eigenvector. (⇒) Assume λ(A)i λ(B)i ≥ 0 for i = 1 .

3.1] A 0. There is no converse. B 0 ⇒ λ(AB) = λ( AB A) 0 by (1496) and Corollary A. xTA x ≥ xTB x ∀ x ⇔ λ((A −B) + (A −B)T )/2 0 ⇒ tr(A +AT − (B +B T ))/2 = tr(A −B) ≥ 0.1) 0 (1510) 0 (1511) AB = BA ⇒ λ(AB)i = λ(A)i λ(B)i ≥ 0 ∀ i ⇒ AB AB = BA ⇒ λ(AB)i ≥ 0 .C ∈ S n (L¨wner) o A A A A B. A B ⇒ A=B A (transitivity) (additivity) (antisymmetry) (reflexivity) (strict transitivity) (strict additivity) (1519) A B.13.2.2.5. A ≻ 0.2] Because A 0 . A 0. A ≻ 0. (1483) A A A A 0.0. B 0 (Example A.1.0. 6. 0 ⇔ λ(A) 0 ⇒ δ(A) 0 0 ⇒ tr A ≥ 0 (1516) (1517) 0 ⇒ tr A tr B ≥ tr(AB) ≥ 0 (1518) √ √ [396.B.B ∈ Rn×n (1520) xTA x ≥ xTB x ∀ x ⇒ tr A ≥ tr B (1521) Proof.2.620 For A. A 0 ⇔ tr(AB) ≥ 0 ∀ B 0 (377) For A.3] [204.B ∈ S n and A APPENDIX A. B ≺C ⇒ A≺ C A≺B ⇔ A+C ≺ B+C For A. 4.7 prob. B B 0 B 0 B≻0 B≻0 ⇒ ⇒ ⇒ ⇒ A⊗B 0 A◦B 0 A⊗B ≻0 A◦B ≻0 0 yet (1512) (1513) (1514) (1515) where Kronecker and Hadamard products are symmetric. .B ∈ S n [203. λ(A)i λ(B)i ≥ 0 ∀ i ⇔ AB For A. 7.1. then we have tr(AB) ≥ 0. B C ⇒ A C B ⇔ A+C B+C B. LINEAR ALGEBRA 0. 5.B ∈ S n . For A.

6.3.0. Then [203. and so on.1. 6. Positive semidefinite ordering of eigenvalues. 6] x Ax > 0 ∀x = 0 ⇔ λ A +A xTA x ≥ 0 ∀ x T ⇔ λ A + AT T 0 ≻0 (1530) (1531) 1 because xT(A − AT )x = 0.3] A B 0 ⇒ Ak k = 1. Then [333. All-strict versions hold. 2.B ∈ int S+ [34. . 6.7.2] A For A.2 prob.A.2] [203.1 prob.B ∈ S n [396.1 Theorem. (1465) Now arrange the entries of λ 2 (A + AT ) 1 1 and λ 2 (B + B T ) in nonincreasing order so λ 2 (A + AT ) 1 holds the largest eigenvalue of symmetrized A while λ 1 (B + B T ) 1 holds the largest 2 eigenvalue of symmetrized B . B ∈ RM ×M .1.B ∈ S n B A √ 0 ⇒ A 0 ⇐ A1/2 Bk . (1529) A.0. For A . . A ≻ 0 ⇔ A−1 ≻ 0 √ (1524) (1525) (1526) A B A≻B B ⇔ A−1 n For A. 7.2] (Theorem A.3. . and restriction to the positive semidefinite cone does not improve the situation.9] for κ∈R xTA x ≥ xTI x κ ∀ x ⇔ λ xTA x ≥ xTB x ∀ x ⇒ λ A + AT 1 (A 2 λ B + BT κ1 + AT ) (1532) . 4.B ∈ S n [396. 7. λ 2 (B + B T ) ∈ RM . A B 0 ⇒ rank A ≥ rank B 0 ⇒ det A ≥ det B ≥ 0 0 ⇒ det A > det B ≥ 0 B −1 .3.7 prob.4) A A B ⇒ tr A ≥ tr B B ⇒ δ(A) δ(B) 621 (1522) (1523) There is no converse. place the eigenvalues of each symmetrized matrix into 1 1 the respective vectors λ 2 (A + AT ) . PROPER STATEMENTS For A.4] A (1527) For A. B 0 (1528) and AB = BA [396.

λ(B) ∈ RM in nonincreasing order so λ(A)1 holds the largest eigenvalue of A while λ(B)1 holds the largest eigenvalue of B . 7.1.3 Corollary. [203. M } λ(A − B)k ≤ λ(A)k ≤ λ(A + B)k (1540) ⋄ . for A . 2 Then. for any k ∈ {1 . (Weyl) Eigenvalues of sum. [396. B ∈ R . LINEAR ALGEBRA Now let A .3.5] B ⇔ λ(A−B) 0 B ⇒ λ(A) λ(B) B λ(A) λ(B) B ⇐ λ(A) λ(B) (1533) (1534) (1535) (1536) ⋄ λ A + AT A. 4. place the eigenvalues of each matrix into respective + vectors λ(A). .3.622 APPENDIX A. and positive semidefiniteness of a sum of positive semidefinite matrices. Eigenvalues of sum and difference.0.3] For A ∈ SM and B ∈ SM .0. . for any k ∈ {1 . B ∈ SM + Because SM is a convex cone ( 2. then by (175) + A.3.0. and so on. Then A A A T S AS where S = QU T .1).0. Then. λ 2 (B + B T ) ∈ RM in nonincreasing 2 1 order so λ 2 (A + AT ) 1 holds the largest eigenvalue of symmetrized A while λ 1 (B + B T ) 1 holds the largest eigenvalue of symmetrized B . M } k + λ B + BT M ≤ λ (A + AT ) + (B + B T ) k ≤ λ A + AT k + λ B + BT (1537) ⋄ 1 Weyl’s theorem establishes: concavity of the smallest λM and convexity of the largest eigenvalue λ1 of a symmetric matrix.2 Theorem.1.B 0 ⇒ ζA + ξB λ(A)k + λ(B)M ≤ λ(A + B)k ≤ λ(A)k + λ(B)1 0 for all ζ . [203. 4. ξ ≥ 0 (1538) (1539) A. B ∈ SM have diagonalizations A = QΛQT and B = U ΥU T with λ(A) = δ(Λ) and λ(B) = δ(Υ) arranged in nonincreasing order. .9. place the eigenvalues of each symmetrized matrix into 1 the respective vectors λ 1 (A + AT ) . and so on.1] M ×M For A . via (496). .

1.10 . If A ∈ SM is positive semidefinite and singular it remains possible.10 A ∈ SM is positive semidefinite if and only if all M principal submatrices of dimension M −1 are positive semidefinite and det A is nonnegative.0.6. [203. id est. A. for some skinny Z ∈ RM ×N with N < M . this means coefficients of orthogonal projection of vectorized A on a subset of extreme directions from SM determined by Z can be positive. [273. conversely. for B = qq T λ(A)k−1 ≤ λ(A − qq T )k ≤ λ(A)k ≤ λ(A + qq T )k ≤ λ(A)k+1 A. If matrix A ∈ SM is positive (semi)definite then.4. A ∈ SM is positive definite if and only if any one principal submatrix of dimension M − 1 is positive definite and det A is positive.3.3. 1. this theorem is a synthesis of facts from [203.3. 1.11 ⋄ A recursive condition for positive (semi)definiteness.11 Using the interpretation in E. ⋄ If any one principal submatrix of dimension M−1 is not positive definite.A. for any matrix Z of compatible dimension.399] If A ∈ SM is positive definite and any particular dimensionally compatible matrix Z has no nullspace. Regardless of symmetry.A.1]).1] A. p. 7.3. 6. if A ∈ RM ×M is positive (semi)definite. Z T AZ is positive semidefinite. then Z T AZ is positive definite. PROPER STATEMENTS 623 When B is rank-one positive semidefinite.A.3.3.3] (confer [273. Positive (semi )definite principal submatrices.5 Corollary.1. the eigenvalues interlace. then A can neither be. + A. that Z T AZ becomes positive definite. A ∈ SM is positive (semi)definite if and only if there exists a nonsingular Z such that Z T AZ is positive (semi)definite. Positive (semi )definite symmetric products.0. then the determinant of each and every principal submatrix is (nonnegative) positive.4 (1541) Theorem.2] [333.

⋄ A.624 APPENDIX A. In other words. from the only if YT Corollary it follows: for dimensionally compatible Z A 0 ⇔ ZT AZ 0 and Z T has a left inverse Products such as Z †Z and ZZ † are symmetric and positive semidefinite although.7 Theorem. suppose uv T is positive semidefinite. The same does not hold true for matrices of higher rank.1. but that is possible only if v = u . p. Given real matrix Ψ with rank Ψ = 1 Ψ 0 ⇔ Ψ = uuT [203.3. y Tu uTy ≥ 0. as Example A.2.R P 0 (1542) R ⇔ R(P ) ⊇ R(R) ⇔ N (P ) ⊆ N (R) [20. Z †AZ and ZAZ † are neither necessarily symmetric or positive semidefinite.0. Symmetric positive semidefinite. Any rank-one matrix must have the form Ψ = uv T .5 prob. We know that can hold if and only if uv T + vuT 0 ⇔ for all normalized y ∈ RM .400] (1543) where u is some real vector. p. v = u . id est.0. given nonsingular matrix Z and any particular dimensionally compatible Y : matrix A ∈ SM is positive semidefinite if and ZT A [ Z Y ] is positive semidefinite. ⋄ Proof. 6] [223. ( B. III] Projector P is never positive definite [335. LINEAR ALGEBRA We can deduce from these.3. [21.20] unless it is the identity matrix.6 Theorem.55] For symmetric idempotent matrices P and R P.1. Symmetric projector semidefinite. . symmetry is necessary and sufficient for positive semidefiniteness of a rank-1 matrix.1) Suppose Ψ is symmetric. 2 y Tu v Ty ≥ 0 . id est. Conversely. 6.1.1 shows. For all y ∈ RM . given A 0. A.0.

in the second instance. R(B) is in R(A).4 Schur complement Consider Schur-form partitioned matrix G : Given AT = A and C T = C .8] ⇔ A ≻ 0 . Note that A ≻ 0 ⇒ A† = A−1 . Again. G= A B BT C ≻0 A ≻ 0 . A−B C −1 B T ≻ 0 (1548) . for all B of compatible dimension. while the Schur complement of C in G is A − B C −1 B T . for A or C nonsingular. (1949) It is apparently required which precludes A = 0 when B is any nonzero matrix. the projection matrix vanishes.4. for A and C nonsingular. when C is full-rank. C −B TA−1 B or C ≻ 0 . C −B TA† B 0 . 4. then [59] G= ⇔ A ⇔ C A B BT C 0 0 0 (1544) where A† denotes the Moore-Penrose (pseudo)inverse ( E). It is required which precludes C = 0 for B nonzero. Likewise.A. B(I − CC † ) = 0 . C −B TA−1B ≻ 0 ⇔ C ≻ 0 . G= A B BT C ⇔ 0 (1547) R(B T ) ⊥ N (C T ) † −1 0 . C ≻ 0 ⇒ C = C get. So we When A is full-rank then. A−B C −1 B T 0 0 where C − B TA−1B is called the Schur complement of A in G . I − AA† is a symmetric projection matrix orthogonally projecting on N (AT ). Likewise. A−B C † B T R(B) ⊥ N (AT ) (1545) (1546) . B T(I −AA† ) = 0 . thereby. [154. SCHUR COMPLEMENT 625 A. In the first instance. Thus the flavor. R(B T ) is in R(C ). I − CC † projects orthogonally on N (C T ).

p. then we have eigenvalues + λ(G) = λ and 0 B BT 0 = σ(B) −Ξ σ(B) (1553) (1554) inertia(G) = inertia(B TB) + inertia(−B TB) where Ξ is the order-reversing permutation matrix defined in (1766). zero.0. n} (1549) where p . when A is invertible.4] Define inertia G ∈ SM {p .163] Quadratic multivariate polynomial xTA x + 2bTx + c is a convex function of vector x if and only if A 0. LINEAR ALGEBRA Origin of the term Schur complement is from complementary inertia: [114.0.2 Example.4. id est. and negative eigenvalues of G . M = p+z+n (1550) Then. inertia(G) = inertia(C ) + inertia(A − B C −1 B T ) (1552) (1551) A.17] When A = C = 0. n respectively represent number of positive. Nonnegative polynomial.0. [55.g. Equipartition inertia. inertia(G) = inertia(A) + inertia(C − B TA−1B) and when C is invertible. 2. z .626 APPENDIX A. e. z . Schur-form positive semidefiniteness is sufficient for polynomial convexity but necessary and sufficient for nonnegativity: A b bT c 0 ⇔ [ xT 1 ] A b bT c x 1 ≥ 0 ⇔ xTA x + 2bTx + c ≥ 0 (1555) All is extensible to univariate polynomials. x [ tn tn−1 tn−2 · · · t ]T .4.1 Example. . [34. denoting nonincreasingly ordered singular values of matrix B ∈ Rm×m by σ(B) ∈ Rm .2 exer. 1.4. A. but sublevel set {x | xTA x + 2bTx + c ≤ 0} is convex if A 0 yet not vice versa.0..

G= A B BT C 0 A 0 T 0 C −B TA† B A−B C † B T 0 0T C 0 and B T(I −AA† ) = 0 (1557) 0 and B(I − CC † ) = 0 These facts plus Moore-Penrose condition ( E.1 Schur-form nullspace basis From (1544).4. SCHUR COMPLEMENT 627 A. then minimization of tr C is necessary and sufficient for minimization of A B tr(C −B TA−1B) when both are under constraint 0. A B BT C 0 ⇒ tr(A + C ) ≥ 0 A 0 0T C −B TA−1B A−B C −1 B T 0 0T C 0 ⇒ tr(C −B TA−1B) ≥ 0 (1556) 0 ⇒ tr(A−B C −1 B T ) ≥ 0 Since tr(C −B TA−1B) ≥ 0 ⇔ tr C ≥ tr(B TA−1B) ≥ 0 for example.1) provide a partial basis: basis N A B BT C ⊇ I −AA† 0 T 0 I − CC † (1558) .0.4.4.0. From (1517).0.3 Example.A.0. BT C A. Schur-form fractional function trace minimization.

2 Exercise. 7. rank A B BT C = rank A − B C −1 B T 0 0T C −1 = rank A + rank(C −B TA−1B) (1562) ⋄ = rank(A − B C B ) + rank C T Proof.0.7.1. consequence relates definiteness of three quantities: I B BT C 0 ⇔ C − B TB 0 ⇔ I 0 0T C −B TB One 0 (1559) A. for B ∈ Rm×n and C ∈ S n   1 + λ(C )i .4.4.6] Y = XT = I −A−1B 0T I (1564) X A B Y = BT C A 0 T 0 C −B TA−1B (1563) . Sparse Schur conditions. otherwise (1560) A. when symmetric matrix C is invertible and A is symmetric.0. The first assertion (1561) holds if and only if [203. 1 ≤ i ≤ n  I B 1.2 prob. Y Let [203. [396. Prove: given C −B TB = 0.7] When symmetric matrix A is invertible and C is symmetric.1.1 Example.1.4. 2.0. Setting matrix A to the identity simplifies the Schur conditions.6c] ∃ nonsingular X.3 Theorem. Rank of partitioned matrices.628 APPENDIX A.4. LINEAR ALGEBRA A. 0. n<i≤m = λ BT C   i 0. Eigenvalues λ of sparse Schur-form. rank A B BT C = rank A 0 0T C −B TA−1B (1561) equals rank of main diagonal block A plus rank of its Schur complement. Similarly.

0. but not necessarily positive (semi)definite. When A is invertible.A. rank B .4. even in absence of semidefiniteness. But.1 Determinant G= A B BT C (1571) We consider again a matrix G partitioned like (1544). [137] [135] Matrix B ∈ Rm×n has rank B ≤ ρ if and only if there exist matrices A ∈ Sm and C ∈ S n such that rank A 0 0T C ≤ 2ρ and G= A B BT C 0 (1570) ⋄ Schur-form positive semidefiniteness alone implies rank A ≥ rank B and rank C ≥ rank B . we must always have rank G ≥ rank A . A.1.4.0. Rank of Schur-form block. where A and C are symmetric. det G = det A det(C − B TA−1B) When C is invertible.4 Lemma.3.4. SCHUR COMPLEMENT From Corollary A. rank C by fundamental linear algebra.3 eigenvalues are related by 0 0 which means rank(C −B TA−1B) ≤ rank C rank(A − B C −1 B T ) ≤ rank A Therefore rank A B BT C ≤ rank A + rank C λ(C −B TA−1B) λ(A − B C −1 B T ) λ(C) λ(A) 629 (1565) (1566) (1567) (1568) (1569) A.1. det G = det C det(A − B C −1B T ) (1573) (1572) .

A = 0. = X = SΛS = [ s1 · · · sm ] Λ λ i s i wi (1580) . LINEAR ALGEBRA When B is full-rank and skinny. A. then (1577) When B is full-rank and fat. Eigenvectors must be nonzero. if not square. and C det G = 0 ⇔ C + B TB ≻ 0 When B is a row-vector.1] (1574) When B is a (column) vector. 5. 10. C = 0. p. something akin to “characteristic”. then for A = 0 and all C of dimension compatible with G det G = A det(C − while for all A ∈ R T det G = det(C )A − B Ccof B T 1 T B B) A (1578) (1579) where Ccof is the matrix of cofactors corresponding to C . then for all C ∈ R and all A of dimension compatible with G det G = det(A)C − B TAT B cof while for C = 0 det G = C det(A − (1575) where Acof 1 BB T ) (1576) C is the matrix of cofactors [333. 4] corresponding to A .  . then [61.5 Eigenvalue decomposition All square matrices have associated eigenvalues λ and eigenvectors. Ax = λ i x becomes impossible dimensionally. Prefix eigen is from the German. T i=1 wm . [333.6] then  T w1 m −1 T . and A det G = 0 ⇔ A + BB T ≻ 0 0. 0.1. in this context meaning.14] When a square matrix X ∈ Rm×m is diagonalizable. [330.630 APPENDIX A.

where xH = x∗T . There is no connection between diagonalizability and invertibility of X .1 Uniqueness 2 = xTA x∗ ⇒ From the fundamental theorem of algebra.0. many different matrices may share the same unique set of eigenvalues..2] Diagonalizability is guaranteed by a full set of linearly independent eigenvectors. in contrast. for a given square matrix are unique.) Uniqueness of eigenvectors. whereas invertibility is guaranteed by all nonzero eigenvalues. i = 1. λ(X) = λ(X T ).. EIGENVALUE DECOMPOSITION 631 where {si ∈ N (X − λ i I ) ⊆ Cm } are l.5. Ax = λx .0. meaning. xHAx = λxHx = λ x x = x∗ . there is no other set of eigenvalues for that matrix.m (1582) and where {λ i ∈ C} are eigenvalues (1494) δ(λ(X)) = Λ ∈ Cm×m (1583) corresponding to both left and right eigenvectors. id est. A.A. 5.. ⋄ Proof. eigenvectors ⇔ diagonalizable not diagonalizable ⇒ repeated eigenvalue (1584) A. disallows multiplicity of the same direction: . [220] which guarantees existence of zeros for a given polynomial.. Real eigenvector. it follows: eigenvalues. (Conversely. Given λ = λ∗ .i. for any X . e.5. distinct eigenvalues ⇒ l.5. (right-)eigenvectors constituting the columns of S ∈ Cm×m defined by XS = SΛ rather Xsi λ i si . [333. i = 1.1 Theorem.. λ(X) = λ(X T ).i. Eigenvectors of a real matrix corresponding to real eigenvalues must be real.0.g. including their multiplicity. The converse is equally simple.m (1581) {wi ∈ N (X T − λ i I ) ⊆ Cm } are linearly independent left-eigenvectors of X (eigenvectors of X T ) constituting the rows of S −1 defined by [203] S −1X = ΛS −1 rather T wi X T λ i wi .

.1. eigenvectors −S are equivalent to eigenvectors S by Definition A.i.0. distinct eigenvalues ⇔ eigenvectors unique (1587) A.r.632 APPENDIX A. When eigenvectors are unique.12 (1588) A matrix is diagonalizable iff algebraic multiplicity (number of occurrences of same eigenvalue) equals geometric multiplicity dim N (X − λI ) = m − rank(X − λI ) [330. We may conclude. any choice of independent vectors from N (X − λI ) (of the same multiplicity) constitutes eigenvectors corresponding to λ .5.0.220] distinct eigenvalues ⇒ eigenvectors unique (1585) Eigenvectors corresponding to a repeated eigenvalue are not unique for a diagonalizable matrix.15] (number of Jordan blocks w. △ If S is a matrix of eigenvectors of X as in (1580). in other words. Caveat diagonalizability insures linear independence which implies existence of distinct eigenvectors.12 dim N (X − λI ) exceeds 1.0.2 Invertible matrix When diagonalizable matrix X ∈ Rm×m is nonsingular (no zero eigenvalues).5. produces another eigenvector. [330. eigenvectors). the eigenvector corresponding to a distinct eigenvalue is unique. corresponding to a particular eigenvalue.1. For eigenvalue λ whose multiplicityA. Unique eigenvectors. For any square matrix. and their directions are distinct.1. then it has an inverse obtained simply by inverting eigenvalues in (1580): X −1 = S Λ−1 S −1 A.t λ or number of corresponding l. p. p. then −S is certainly another matrix of eigenvectors decomposing X with the same eigenvalues. repeated eigenvalue ⇒ eigenvectors not unique (1586) Proof follows from the observation: any linear combination of distinct eigenvectors of diagonalizable X . Although directions are distinct.5. for example. LINEAR ALGEBRA A. we mean: unique to within a real nonzero scaling. for diagonalizable matrices.1 Definition.

0.5. All normal matrices are diagonalizable.3] e. p. A symmetric matrix is a special normal matrix whose eigenvalues Λ must be realA. 2. that set of all real matrices having a complete orthonormal set of eigenvectors.2. orthogonal.g.5.10. for i = 1 . Suppose λ i is an eigenvalue corresponding to eigenvector si of real A = AT .1 Symmetric matrix diagonalization The set of normal matrices is. [335.4] [333. wi si = 1 . sT m  m λ i si sT i i=1 (1592) A. each dyad from a diagonalization is an independent ( B.31] id est. Consequently.1) nonorthogonal projector because T T T s i wi s i wi = s i wi (1589) (whereas the dyads of singular value decomposition are not inherently projectors (confer (1596))). There is no converse. called a biorthogonality condition [368.1] [335. Dyads of eigenvalue decomposition can be termed eigenmatrices because T T X si wi = λ i s i wi (1590) Sum of the eigenmatrices is the identity.  = . symmetric.1. any matrix X for which XX T = X TX .2. 8. and circulant matrices [169].13 and whose eigenvectors S can be chosen to make a real orthonormal set. .A. m T s i wi = I i=1 (1591) A. EIGENVALUE DECOMPOSITION A.315] id est. precisely. m . .3 eigenmatrix 633 T The (right-)eigenvectors {si } (1580) are naturally orthogonal wi sj = 0 T to left-eigenvectors {wi } except. i . [396.13 Proof. So we have λ i si 2 = λ∗ si 2 . for X ∈ Sm  sT 1 .1. p.4] [203] because neither set of left or right eigenvectors is necessarily an orthogonal set.5. prob. 6.3] [330. [160. Then sHAsi = sTAs∗ (by transposition) ⇒ s∗Tλ i si = sTλ∗ s∗ because (Asi )∗ = (λ i si )∗ by i i i i i i i assumption.. X = SΛS T = [ s1 · · · sm ] Λ  . 7.

LINEAR ALGEBRA where δ 2(Λ) = Λ ∈ Sm ( A. Σ∈ Rη×η .1) and S −1 = S T ∈ Rm×m (orthogonal matrix.634 APPENDIX A.1. A = U ΣQT = [ u1 · · · uη ] Σ  . 2.5.7. A.0. Then X = X X . A.5.6. R{si | λ i = 0} = R(A) = R(AT ) (1593) R{si | λ i = 0} = N (AT ) = N (A) A.6 A. SVD Compact SVD T  q1 .4] For any A ∈ Rm×n  η i=1 T σi ui qi (1596) U∈R m×η .1.1.5. we almost always arrange eigenvalues in nonincreasing order as is the convention for singular value decomposition. By 0 eigenvalues theorem A.3 Positive semidefinite matrix square root (1594) When X ∈ Sm . T qη [160. its unique positive semidefinite matrix square root is defined + √ √ X S Λ S T ∈ Sm (1595) + √ where the square root of nonnegative diagonal matrix Λ is taken entrywise √ √ and positive. Then to diagonalize a symmetric matrix that is already a diagonal matrix.1. orthogonal matrix S becomes a permutation matrix.3.1 Singular value decomposition.5. then its inverse (obtained by inverting eigenvalues in (1592)) is also symmetric: X −1 = S Λ−1 S T ∈ Sm A.2) because of symmetry: SΛS −1 = S −TΛS T .1 Diagonal matrix diagonalization Because arrangement of eigenvectors and their corresponding eigenvalues is arbitrary.  = . B.5. + QT Q = I Q∈R n×η .2 Invertible symmetric matrix When symmetric matrix X ∈ Sm is nonsingular (invertible). U TU = I .

η} | σi = 0} ≤ η (1602) There are η singular values. . σ(A) = |λ(A)|. For any flavor SVD. .1] For η = n .15 When matrix A is normal. SVD 635 where U and Q are always skinny-or-square. 1 ≤ i ≤ ρ i i σ(A)i = σ(AT )i =  0.14 A. For η = m .2 Subcompact SVD Some authors allow only nonzero singular values. 2.14  √ √  λ(ATA)i = λ(AAT )i = λ ATA = λ AAT > 0 . ρ<i≤η (1599) of which the last η − ρ are 0 . σ(A) = λ(ATA) = λ ATA . In that case the compact decomposition can be made smaller. A.A.6. rank is equivalent to the number of nonzero singular values on the main diagonal of Σ . where [160. 8.5.1) δ 2(Σ) = Σ ∈ Rη×η + (1598) holding the singular values {σi ∈ R} of A which are always arranged in nonincreasing order by convention and are related to eigenvalues λ byA. it can be redimensioned in terms of rank ρ because. σ(A) = . and where η min{m . for any A ∈ Rm×n ρ = rank A = rank Σ = max {i ∈ {1 .1. [396.6. SINGULAR VALUE DECOMPOSITION.A. each having orthonormal columns. n} (1597) Square matrix Σ is diagonal ( A.15 where ρ rank A = rank Σ (1600) A point sometimes lost: Any real matrix may be decomposed in terms of its real singular values σ(A) ∈ Rη and real matrices U and Q as in (1596). λ(AAT ) = λ AAT .3] R{ui | σi = 0} = R(A) R{ui | σi = 0} ⊆ N (AT ) (1601) R{qi | σi = 0} = R(AT ) R{qi | σi = 0} ⊆ N (A) A.

U TU = I .  Σ ∈ Rm×n .636 Now APPENDIX A. App. and R{ui } = R(A) R{qi } = R(AT ) (1604) A. Orthonormal matrices U and Q become orthogonal matrices ( B. + QT = Q−1 Q ∈ Rn×n   T  n×ρ basis R(AT )     (n×n−ρ basis N (A))T (1606) . A = U ΣQT = [ u1 · · · uρ ] Σ  . LINEAR ALGEBRA  T q1 . Completing the nullspace bases in U and Q from (1601) provides what is called the full singular value decomposition of A ∈ Rm×n [333. U T = U −1 .A]..3 Full SVD Another common and useful expression of the SVD makes U and Q square. i=1 T qn  = m×ρ basis R(A)   m×m−ρ basis N (AT )   U ∈ Rm×m .  .5): R{ui | σi = 0} = R{ui | σi = 0} = R{qi | σi = 0} = R{qi | σi = 0} = R(A) N (AT ) R(AT ) N (A) (1605) For any matrix A having rank ρ (= rank Σ)  T  q1 η T T . σ1 σ2 .  = . Σ∈ Rρ×ρ . = A = U ΣQ = [ u1 · · · um ] Σ σi ui qi .6. making the decomposition larger than compact SVD. + Q∈R n×ρ QT Q = I where the main diagonal of diagonal matrix Σ has no 0 entries. T qρ  U∈R m×ρ ρ i=1 T σi ui qi (1603) ..

given A = U ΣQT ∈ Rm×n A† = QΣ†T U T ∈ Rn×m (1609) . Figure 161 shows how the axes q1 and q2 are first rotated by QT to coincide with the coordinate axes. & Herbst [272] A. An important geometrical interpretation of SVD is given in Figure 161 for m = n = 2 : The image of the unit sphere under any m × n matrix multiplication is an ellipse. and Σ follow readily from (1606).6. stretched by a factor σj .4 Pseudoinverse by SVD Matrix pseudoinverse ( E) is nearly synonymous with singular value decomposition because of the elegant expression.m (see [336]). the principal axes of the final ellipse. and then rotated to point in the direction of uj . SVD 637 where upper limit of summation η is defined in (1597). the circle is stretched by Σ in the directions of the coordinate axes to form an ellipse. A direct calculation shows that Aqj = σj uj . SINGULAR VALUE DECOMPOSITION. note that QT is a pure rotation of the circle.A. Second. All of this is beautifully illustrated for 2 × 2 matrices by the Matlab code eigshow. the nonincreasingly ordered (possibly 0) singular values appear along its main diagonal as for compact SVD (1599). Thus qj is first rotated to coincide with the j th coordinate axis.6. Expressions for U . Magaia. Note how q1 and q2 are rotated to end up as u1 and u2 . −Muller. AAT U = U ΣΣT and ATAQ = QΣT Σ (1608) demonstrating that the columns of U are the eigenvectors of AAT and the columns of Q are the eigenvectors of ATA . Q . A direct consequence of the geometric interpretation is that the largest singular value σ1 measures the “magnitude” of A (its 2-norm): A 2 = sup Ax x 2 =1 2 = σ1 (1607) This means that A 2 is the length of the longest principal semiaxis of the ellipse. The third step rotates the ellipse by U into its final position. now padded with respect to (1598) by m−η zero rows or n−η zero columns. Considering the three factors of the SVD separately. Matrix Σ is no longer necessarily square.

an ellipse. For the example illustrated. in general.638 APPENDIX A. A = U ΣQT = [ u1 · · · um ] Σ  . . LINEAR ALGEBRA  T q1 . Q [ q1 q2 ] ∈ R2×2 .  = . T qn  η i=1 T σi ui qi q1 q2 QT q2 q1 q2 Σ q1 U u2 u1 Figure 161: Geometrical interpretation of full SVD [272]: Image of circle {x ∈ R2 | x 2 = 1} under matrix multiplication Ax is. U [ u1 u2 ] ∈ R2×2 .

p. . ZEROS 639 that applies to all three flavors of SVD.5.7 A. Step function. A.7. (confer 4.2. [353.5. where Σ† simply inverts nonzero entries of matrix Σ .1 Zeros norm zero x =0 ⇔ x=0 (1615) For any given norm. n ∈ Rn ai < 0 (1612) △ Eigenvalue signs of a symmetric matrix having diagonalization A = SΛS T (1592) can be absorbed either into real U or real Q from the full SVD.5 SVD of symmetric matrices 1≤i≤ρ ρ<i≤η A.A.2. its pseudoinverse simply inverts all nonzero eigenvalues: A† = SΛ† S T ∈ S n (1610) A. (1611) lim xi = |xi | 1.6.1) or A = SΛS T = Sδ(ψ(δ(Λ))) |Λ|S T A = SΛS T = S|Λ| δ(ψ(δ(Λ)))S T U ΣQT ∈ S n U Σ QT ∈ S n (1613) (1614) where matrix of singular values Σ = |Λ| denotes entrywise absolute value of diagonal eigenvalue matrix Λ .6. ℓ .0. i σ(A)i =  0.3. by definition. ai ≥ 0 .0. Given symmetric matrix A ∈ S n and its diagonalization A = SΛS T ( A. i = 1 .4.1 Definition.34] (confer C. −1 . .1) n n Define the signum-like quasilinear function ψ : R → R that takes value 1 corresponding to a 0-valued entry in its argument: ψ(a) xi →ai From (1599) and (1595) for A = AT  √  λ(A2 )i = λ A2 = |λ(A)i | > 0 .1).7.

number of 0 eigenvalues is at least equal to dim N (A) dim N (A) ≤ number of 0 eigenvalues ≤ m (1617) while all eigenvectors corresponding to those 0 eigenvalues belong to N (A).16 A. and widely applicable: A. 1.g.1 Theorem. We take as given the well-known fact that the number of 0 eigenvalues cannot be less than dimension of the nullspace.1) dyad uv T ∈ Rm×m has m 0-eigenvalues when u ∈ v ⊥ . [203. powerful. Number of 0 eigenvalues. [333.1] Any symmetric positive semidefinite matrix having a 0 entry on its main diagonal must be 0 along the entire row and column to which that 0 entry belongs.2 0 entry If a positive semidefinite matrix A = [Aij ] ∈ Rn×n has a 0 entry Aii on its main diagonal. three eigenvectors in the nullspace but only two are independent.3.1 prob.3. e.7.7. λ(A) = [ 0 0 0 1 ]T . We offer an example of the converse:   1 0 1 0 0 0 1 0  A = 0 0 0 0 1 0 0 0 . then Aij + Aji = 0 ∀ j . The right-hand side of (1617) is tight for nonzero matrices.8] [203.4. For any matrix A ∈ Rm×n rank(A) + dim N (A) = n (1616) by conservation of dimension. [273.640 APPENDIX A. 7.1]A. 4. a generally nonconvex constraint in x like becomes convex when κ = 0. 0. 5.2. [160. LINEAR ALGEBRA Ax − b = κ Consequently..3 0 eigenvalues theorem This theorem is simple.4] For any square matrix A ∈ Rm×m .0.16 dim N (A) = 2. ( B.2] A.7. A.

and each of those eigenvectors must be in N (A).5).1] Suppose rank(A) < m . Real and imaginary parts of the left-eigenvectors remaining span R(AT ). Let ∗ denote complex conjugation. Suppose A = A∗ and Asi = 0.A.2] Eigenvectors of a real matrix corresponding to 0 eigenvalues must be real. ⋄ Proof.7. number of 0 eigenvalues is at least equal to dim N (AT ) = dim N (A) while all left-eigenvectors (eigenvectors of AT ) corresponding to those 0 eigenvalues belong to N (AT ).) Likewise. the number of 0 eigenvalues is precisely dim N (A) while the corresponding eigenvectors span N (A).17 Proof. A. [333. 5. Then dim N (A) = m−rank(A). for any matrix A ∈ Rm×n rank(AT ) + dim N (AT ) = m (1618) For any square A ∈ Rm×m . Now suppose A has more than m−rank(A) eigenvalues equal to 0. Diagonalizable A therefore has rank(A) nonzero eigenvalues and exactly m−rank(A) eigenvalues equal to 0 whose corresponding eigenvectors span N (A). (Transpose. si = s∗ ⇒ Asi = As∗ ⇒ As∗ = 0. First we show. Then there are more than m−rank(A) linearly independent eigenvectors associated with 0 eigenvalues. the number of 0 eigenvalues is precisely the dimension of its nullspace while the eigenvectors corresponding to those 0 eigenvalues span the nullspace: Any diagonalizable matrix A ∈ Rm×m must possess a complete set of linearly independent eigenvectors. i i i i i i Then .17 Thus A has at least m−rank(A) eigenvalues equal to 0. a contradiction.A. If A is full-rank (invertible). [333. Thus there is a set of m−rank(A) linearly independent vectors spanning N (A). As∗ = 0 ⇒ Asi = As∗ ⇒ si = s∗ . 5. ZEROS 641 For diagonalizable matrix A ( A. then all m = rank(A) eigenvalues are nonzero. Conversely. number of 0 eigenvalues is precisely dim N (AT ) while the corresponding left-eigenvectors span N (AT ). Real and imaginary parts of the eigenvectors remaining span R(A). Each of those can be an eigenvector associated with a 0 eigenvalue because A is diagonalizable ⇔ ∃ m linearly independent eigenvectors. For diagonalizable A . Thus there are more than m−rank(A) linearly independent vectors in N (A) . for a diagonalizable matrix.

Given any particular conjugate pair from (1619). m}} must therefore span the range of diagonalizable matrix A . . . and must come in complex conjugate pairs for the summation to remain real. The argument is similar regarding span of the left-eigenvectors. LINEAR ALGEBRA By similar argument.1.0. we get the partial summation ∗T T T λ i si wi + λ∗ s∗ wi = 2 re(λ i si wi ) i i T T = 2 re si re(λ i wi ) − im si im(λ i wi ) (1620) whereA.and left-eigenvectors. Assume that conjugate pairs of eigenvalues appear in sequence. Then (1619) is T T re s2i re(λ2i w2i ) − im s2i im(λ2i w2i ) + T λ j s j wj j λ∈R λj = 0 (1621) The summation (1621) shows: A is a linear combination of real and imaginary parts of its (right-)eigenvectors corresponding to nonzero eigenvalues.642 APPENDIX A. The k vectors {re si ∈ Rm . So.1. from the diagonalization (1580). A. matrix A has a representation in terms of its right. the dyads {si wi } must be independent because each set of eigenvectors are. i ∈ {1 . the number of nonzero eigenvalues. hence rank A = k . Conjugate transpose is denoted wH = w∗T . Next we show when A is diagonalizable. Complex eigenvectors and eigenvalues are common for real matrices.18 Complex conjugate of w is denoted w∗ . and wi wi+1 . . assuming 0 eigenvalues are ordered last. m A = i=1 T λ i s i wi = k≤m i=1 λi= 0 T λ i s i wi (1619) T From the linearly independent dyads theorem ( B. the left-eigenvectors corresponding to 0 eigenvalues span N (AT ). the real and imaginary parts of its eigenvectors (corresponding to nonzero eigenvalues) span R(A) : The (right-)eigenvectors of a diagonalizable matrix A ∈ Rm×m are linearly independent if and only if the left-eigenvectors are.18 λ∗ λ i+1 .2). im si ∈ Rm | λ i = 0. i equivalently written A = 2 i λ∈C λi= 0 s∗ i ∗ si+1 .

7.3.7. (⇒) Suppose tr(XA) = 0. 1.0. tr(XA) = tr( A X A) whose argument is positive semidefinite by Corollary A. Arguing similarly yields AX = 0. having all 0 eigenvalues.6.3.0. [203. given A 0 {X | X . Eigenvalues of a positive semidefinite matrix can total 0 if and only if each and every nonnegative eigenvalue is 0.12] id est. √ √ T√ √ √ √ AX A = X A X A = 0 (1624) √ √ √ √ √ √ implying X A = 0 which in turn implies X( X A) A = XA = 0. An equivalence in nonisomorphic spaces. A. ZEROS 643 A. (confer (1647)) id est. A = 0} ∩ {X 0} ≡ {X | N (X) ⊇ R(A)} ∩ {X 0} (1625) (one expressed in terms of a hyperplane.5. SM = {Y ∈ SM | Y 1 = 0} c (2040) = {Y ∈ SM | N (Y ) ⊇ 1} = {Y ∈ SM | R(Y ) ⊆ N (1T )} . √ Then √ tr(XA) = 0 is obvious. Identity (1623) leads to an unusual equivalence relating convex geometry to traditional linear algebra: The convex sets.A ∈ RM ×N (39) + For X. the other in terms of nullspace and range) are equivalent only when symmetric matrix A is positive semidefinite. Trace of any square matrix is equivalent to the sum of its eigenvalues.7.1 exer. Diagonalizable matrices A and X are simultaneously diagonalizable if and only if they are commutative under multiplication. (⇐) Suppose XA = AX = 0.1.A. resides at the origin.2. 3.4 0 trace and matrix product tr(X TA) = 0 ⇔ X ◦ A = A ◦ X = 0 (1622) For X. We might apply this equivalence to the geometric center subspace. 2.1 Example. The only positive semidefinite matrix.4. for example. iff they share a complete set of eigenvectors.8] [362.1] + tr(XA) = 0 ⇔ XA = AX = 0 (1623) Proof.A ∈ SM [34.

3. Yet {x | xT = 0} = {x | 1T x = 0} ⊂ R2 Ax (1632) 1 2 0 1 (1631) (1630) (1629) (1628) which is the nullspace of the symmetrized matrix. B= 1 2 2 1 (1633) . videlicet. LINEAR ALGEBRA 0 | X .2 prob. and N (A+AT ) = N (R).5] {x | xT = 0} = RM ⇔ AT = −A Ax while {x | xH = 0} = CM ⇔ A = 0 Ax The positive semidefinite matrix A= for example.7. Rx = 0 ⇔ Rx = 0. For p any positive definite matrix A (for A + AT ≻ 0) {x | xT = 0} = 0 Ax Further.5 Zero definite The domain over which an arbitrary real matrix A is zero definite can exceed its left and right nullspaces. For any positive semidefinite matrix A ∈ RM ×M (for A + AT 0) {x | xT = 0} = N (A + AT ) Ax (1627) because ∃ R A+AT = RTR . has no nullspace. xTAxp = 0 ⇔ xp ∈ N (A + AT ). 11T = 0} (1626) A. Symmetric matrices are not spared from the excess. Then given any particular vector xp . [396.644 from which we derive (confer (1026)) SM ∩ SM ≡ {X c + APPENDIX A.

5. A .19 √ X {x ∈ R2 | x2 = (−2 ± 3 )x1 } 645 (1634) A. Sturm & Zhang give a simple procedure for constructing the dyad-decomposition [Wıκımization]. Symmetrizing the dyad does not change the outcome: {x | xT(uv T + vuT )x/2 = 0} = {x | xTu = 0} ∪ {x | v Tx = 0} A. ZEROS has eigenvalues {−1. but is zero definite onA.2] Let positive semidefinite matrix X ∈ SM have rank ρ . matrix A may be regarded as a parameter. for matrix B and set X as defined ε→0+ lim {x ∈ R2 | xTB x = ε} = X .0.7.1 Proposition.7.0. [339. Then given symmetric + matrix A ∈ SM . That means. elemental dyads in decomposition (1635) constitute a generally nonorthogonal set. unless ρ = 1 or the given matrix A is simultaneously diagonalizable ( A.1) is zero definite on all x for which either xTu = 0 or xTv = 0 . on u⊥ ∪ v ⊥ .2 Example. (Sturm/Zhang) Dyad-decompositions. . A.4) with X . . id est.7. X = 0 if and only if there exists a dyad-decomposition ρ X= j=1 xj xjT (1635) satisfying A . xj xjT = 0 for each and every j ∈ {1 . 3} . The dyad uv T ∈ RM ×M ( B.7. no nullspace. Dyad.19 (1638) These two lines represent the limit in the union of two generally distinct hyperbolae.5. {x | xT uv T x = 0} = {x | xTu = 0} ∪ {x | v Tx = 0} (1637) id est.5.A. ρ} (1636) ⋄ The dyad-decomposition of X proposed is generally not that obtained from a standard diagonalization by eigenvalue decomposition.

646 APPENDIX A. LINEAR ALGEBRA .

citation: Dattorro. Later the physicist P. All rights reserved.01. The first vector algebra that involved a noncommutative vector product (that is. Vitulli [371] 2001 Jon Dattorro.28. M. as sums of simple matrices. 2005. A. −Marie A. which Gibbs called dyads.01. Dirac introduced the term “bra-ket” for what we now call the scalar product of a “bra” (row ) vector times a “ket” (column) vector and the term “ket-bra” for the product of a ket times a bra. v2012. Grassmann’s text also introduced the product of a column matrix and a row matrix. resulting in what we now call a simple matrix. co&edg version 2012. Mεβoo Publishing USA. as above. In that treatise Gibbs represented general matrices. In the late 19th century the American mathematical physicist Willard Gibbs published his famous treatise on vector analysis. v×w need not equal w×v) was proposed by Hermann Grassmann in his book Ausdehnungslehre (1844 ). 647 . Convex Optimization & Euclidean Distance Geometry.Appendix B Simple matrices Mathematicians also attempted to develop algebra of vectors but there was no natural definition of the product of two vectors that held in arbitrary dimensions. Our convention of identifying column matrices and vectors was introduced by physicists in the 20th century. which he called dyadics.28. which resulted in what is now called a simple or a rank-one matrix.

1 Rank-one matrix (dyad) Any matrix formed from the unsigned outer product of two vectors.648 APPENDIX B. It is obvious a dyad can be 0 only when u or v is 0 . we have R(AB T ) ⊆ R(A) . Ψ = uv T ∈ RM ×N (1639) where u ∈ RM and v ∈ RN .1] Product −uv T is a negative dyad.4. Ψ 2 (1642) = σ1 = uv T F = uv T 2 = u v (1643) When u and v are normalized.1. Conversely. SIMPLE MATRICES B. Otherwise. Ψ = uv T = 0 ⇔ u = 0 or v = 0 The matrix 2-norm for Ψ is equivalent to Frobenius’ norm. in general. 3.1 Proof. N (AB T ) ⊇ N (B T ) (1640) with equality when B = A [333. N (uv T ) = N (v T ) ≡ v ⊥ (1641) where dim v ⊥ = N − 1.6]B. R(AAT ) ⊆ R(A) is obvious. is rank-one and called a dyad. [203. R(AAT ) = {AAT y | y ∈ Rm } ⊇ {AAT y | ATy ∈ R(AT )} = R(A) by (142) .1 or respectively when B is invertible and N (A) = 0.3. For matrix products AB T . the pseudoinverse is the transposed dyad. any rank-one matrix must have the form Ψ . prob. Yet for all nonzero dyads we have R(uv T ) = R(u) . 3. vuT † T † Ψ = (uv ) = (1644) u 2 v 2 B.

a projector dyad.1] When dyad uv T ∈ RN ×N is square.B. has two 0-eigenvalues.5) because its eigenvectors are not necessarily independent.2 λ = 0 together with the first N − 1 0-eigenvalues.1. RANK-ONE MATRIX (DYAD) R(v) 649 R(Ψ) = R(u) r0 0r N (uT ) N (Ψ) = N (v T ) RN = R(v) ⊞ N (uv T ) N (uT ) ⊞ R(uv T ) = RM Figure 162: The four fundamental subspaces [335. it is always true that det Ψ = det(uv T ) = 0 (1646) (1645) When λ = 1. E. The remaining eigenvector u spans the range of uv T with corresponding eigenvalue λ = v Tu = tr(uv T ) ∈ R Determinant is a product of the eigenvalues. the square dyad is a nonorthogonal projector projecting on its range (Ψ2 = Ψ . id est. uv T has at least N − 1 0-eigenvalues and corresponding eigenvectors spanning v ⊥ . because u and v share the same dimension. It is quite possible that u ∈ v ⊥ making the remaining eigenvalue instead 0 . In other words.6] of any dyad Ψ = uv T ∈ RM ×N : R(v) ⊥ N (Ψ) & N (uT ) ⊥ R(Ψ). The matrix 1 −1 [1 1] = 1 1 −1 −1 (1647) for example. dim u = M = dim v = N : A dyad is not always diagonalizable ( A. 3. The explanation is. Ψ(x) uv T x is a linear mapping from RN to RM . 3.6). it is possible uv T were nonzero while all its eigenvalues are 0. so. simply. eigenvector u may simultaneously be a member of the nullspace and range of the dyad. [333.2 . B. Map from R(v) to R(u) is bijective.B.

(1652) (1653) .2 dyad symmetry In the specific circumstance that v = u . the dyad becomes an orthogonal projector. [154. rank-one. 2. B.11. (Theorem A. App. and positive semidefinite having exactly N − 1 0-eigenvalues.0.1] Axy TA rank A − T y Ax = rank(A) − 1 (1649) Given nonsingular matrix A ∈ RN ×N 1 + v TA−1 u = 0. then uuT ∈ RN ×N is symmetric. u ∈ N (uv T ) ⇔ v Tu = 0 R(uv T ) ∩ N (uv T ) = ∅ u ∈ R(uv T ) (1648) B. λ = uT u = tr(uuT ) > 0 unless u = 0 When λ = 1.1.6] [396. and y TAx = 0 [207. 2. x ∈ RN . Matrix Ψ u uT 1 for example.650 APPENDIX B.2] [222. y ∈ RM .3.1.0.0. SIMPLE MATRICES Proof. Figure 162 shows the four fundamental subspaces for the dyad. by Saunders’ reckoning. 4. In fact.16] (Sherman-Morrison-Woodbury) (A + uv T )−1 = A−1 − A−1 uv T A−1 1 + v TA−1 u (1650) which accomplishes little numerically. Linear operator Ψ : RN → RM provides a map between vector spaces that remain distinct when M = N .1 rank-one modification For A ∈ RM ×N .3 prob.7) uv T 0 ⇔ v=u (1651) and the remaining eigenvalue is almost always positive. is rank-1 positive semidefinite if and only if Ψ = uuT .1.

i.i.i. any rank-k matrix can be written in the form SW T by singular value decomposition. nor does it imply SW T were full-rank.1.1.11] (1656) where si ∈ CM and wi ∈ CN .i.1.1 Definition. . if and only if dyads {si wi ∈ CM ×N . k R i=1 T T T si wi = R s1 w1 ⊕ . .B.1. generally. Conversely. ( A. k} are T l. as defined. k} are l. and vectors {wi ∈ CN . i = 1 . then vector sums are unique (p.2] The set of k dyads T s i wi | i = 1 . p.1 Dyad independence Now we consider a sum of dyads like (1639) as encountered in diagonalization and singular value decomposition: k k T s i wi i=1 k R = i=1 R T s i wi = i=1 R(si ) ⇐ wi ∀ i are l. .3 (Theorem B. k} are l.i. In absence of assumption of independence.1. .3 Move of range R to inside summation admitted by linear independence of {wi }. k [212.i. .29 thm. Linearly independent dyads.2 Theorem. (1654) range of summation is the vector sum of ranges. is said to be linearly independent iff k rank SW T i=1 T s i wi = k (1657) △ where S [s1 · · · sk ] ∈ CM ×k and W [w1 · · · wk ] ∈ CN ×k . [341. . ⋄ B. Linearly independent (l. RANK-ONE MATRIX (DYAD) 651 B. ⊕ R sk wk = R(s1 ) ⊕ . . ⊕ R(sk ) (1655) B.). i = 1 . i = 1 . Vectors {si ∈ CM . .) dyads.i. and {si } l.1) Under the assumption the dyads are linearly independent (l. p.1.0.B. Dyad independence does not preclude existence of a nullspace N (SW T ) . .1. . .i. rank SW T ≤ k .0.6) B.786): for {wi } l. .1.1. .

suppose rank(SW T ) = k .4 (⇐) Assume BS = I . 0.1.1. Range and Nullspace of Sum (1659) Dyads characterized by biorthogonality condition W TS = I are independent. id est. B. it immediately follows: N (W ) = 0 ⇔ ∃ S S TW = I .652 APPENDIX B. [203. ( A.3) The converse is generally false. (1640) (⇒) If N (S) = 0 then S must be full-rank skinny-or-square.1.1 Biorthogonality condition. B.1. for S ∈ CM ×k and W ∈ CN ×k . for example. rank W } (≤ k) (1658) Then k ≤ rank(SW T ) ≤ k which implies the dyads are independent. if W TS = I then rank(SW T ) = k by the linearly independent dyads theorem because (confer E. Because of reciprocity with S . we need only show: N (S) = 0 ⇔ ∃ B BS = I . B. . Then N (BS) = 0 = {x | BSx = 0} ⊇ N (S).5.4] [396. Invoking Sylvester’s rank inequality. are independent because of their inherent biorthogonality.B.1 Theorem. id est.0.4 Left inverse is not unique. C Left inverse B is given as W T here. Linear independence of k dyads is identical to definition (1657).1. Nullspace and range of dyad sum. Then k ≤ min{rank S . Given a sum of dyads represented by SW T where S ∈ CM ×k and W ∈ CN ×k N (SW T ) = N (W T ) ⇐ R(SW T ) = R(S) ⇐ ∃Z ∃B BS = I W TZ = I ⋄ (1661) B. Dyads produced by diagonalization.1) W TS = I ⇒ rank S = rank W = k ≤ M = N (1660) To see that. B ∴ ∃ A. linearly independent dyads are not necessarily biorthogonal. rank W } ≤ k implying the vector sets are each independent. in general. 2.4] rank S + rank W − k ≤ rank(SW T ) ≤ min{rank S .1. SIMPLE MATRICES Proof. (⇐) Conversely. (⇒) Suppose {si } and {wi } are each linearly independent sets. C [ S A ] = I (id est. [ S A ] is invertible) ⇒ BS = I .

Proof. (⇒) N (SW T ) ⊇ N (W T ) and R(SW T ) ⊆ R(S) are obvious.6] of a doublet Π = uv T + vuT ∈ SN . one a transposition of the other: Π = uv T + vuT = [u v] vT uT = SW T ∈ SN (1664) where u . Π(x) = (uv T + vuT )x is a linear bijective mapping from R([ u v ]) to R([ u v ]). .. v ∈ RN . S = W = [ 1 0 ] . Π = uv T + vuT = 0 ⇔ u = 0 or v = 0 (1665) By assumption of independence. a doublet can be 0 only when u or v is 0 . Like the dyad. DOUBLET R([ u v ]) 653 R(Π) = R([u v]) 0r v ⊥ ∩ u⊥ N vT uT ⊞ R([ u v ]) = RN r0 N (Π) = v ⊥ ∩ u⊥ RN = R([ u v ]) ⊞ N vT uT Figure 163: The four fundamental subspaces [335. the theorem’s converse cannot be true.B.g. λ2 uTv − uv T (1666) B. e. a nonzero doublet has two nonzero eigenvalues λ1 uTv + uv T .2 Doublet Consider a sum of two linearly independent square dyads.2.B. 3.5 By counterexample.5 R(SW T ) = {SW Tx | x ∈ RN } ⊇ {SW TZ y | Zy ∈ RN } = R(S) N (SW T ) = {x | SW Tx = 0} ⊆ {x | BSW Tx = 0} = N (W T ) (1662) (1663) B. (⇐) Assume the existence of a left inverse B ∈ Rk×N and a right inverse Z ∈ RN ×k .

N (Π) = v ⊥ ∩ u⊥ (1668) B.6] of uv T elementary matrix E as a linear mapping E(x) = I − T x . We therefore have N uT R(Π) = R([ u v ]) . of respective dimension 2 and N − 2. so Π is indefinite. v u where λ1 > 0 > λ2 . doublet Π has N − 2 zero-eigenvalues remaining and corresponding eigenvectors spanning vT . Eigenvalue λ1 cannot be 0 unless u and v have opposing directions. again antithetical. By the nullspace and range of dyad sum theorem.654 N (uT ) APPENDIX B. R(v) ⊞ R(E ) = RN The four fundamental subspaces [335.3 Elementary matrix E = I − ζ uv T ∈ RN ×N (1669) A matrix of the form where ζ ∈ R is finite and u. is called an elementary matrix or a rank-one modification of the identity. Generally we have λ1 > 0 and λ2 < 0. with corresponding eigenvectors x1 v u + . SIMPLE MATRICES R(E ) = N (v T ) r0 0r N (E ) = R(u) R(v) RN = N (uT ) ⊞ N (E ) Figure 164: v Tu = 1/ζ . u v x2 u v − u v (1667) spanning the doublet range. but that is antithetical since then the dyads would no longer be independent. 3. [205] Any elementary matrix in RN ×N . Eigenvalue λ2 is 0 if and only if u and v share the same direction.v ∈ RN .

Otherwise. for nonzero vector u [160. In summary. when λ = 0 .10. can be at most one.3.2] [154.7. but dim v = N − 1. N (E ) = R(u) . 2. ( A. 5.1. Then (when λ = 0) elementary matrix E is a nonorthogonal projector projecting on its range (E 2 = E . and the remaining eigenvectors span it. dim R(E ) = N − dim N (E ). here.3. whenever u belongs to hyperplane {z ∈ RN | v Tz = 1/ζ}. 7.2] uuT H = I − 2 T ∈ SN u u B. E. R(E ) = N (v T ) . u ∈ v ⊥ is possible. when v Tu = 1/ζ .A.3] [203. therefore.26] the determinant: (1671) Eigenvectors corresponding to 0 eigenvalues belong to N (E ) . App.6 (1675) Elementary matrix E is not always diagonalizable because eigenvector u need not be independent of the others. v Tu = 1/ζ (1673) illustrated in Figure 164. eigenvector u spans the nullspace when it exists. [154] (confer B.B.1] [333.1) and N (E ) = R(u) . 4.1 Householder matrix An elementary matrix is called a Householder matrix when it has the defining form. and the number of 0 eigenvalues must be at least dim N (E ) which.1. It is apparent ⊥ ⊥ from (1669) that v ⊆ R(E ) . the spectral norm is E 2 = max{1 . The remaining eigenvalue λ = 1 − ζ v Tu det E = 1 − tr ζ uv T = λ If λ = 0 then E is invertible.0.3. When E = E T . .1) The nullspace exists. |λ|} (1674) B. id est. R(E ) = RN .B. which is opposite to the assignment of subspaces for a dyad (Figure 162).0.7. By conservation of dimension. Hence R(E ) ≡ v ⊥ when the nullspace exists. id est. ELEMENTARY MATRIX 655 has N − 1 eigenvalues equal to 1 corresponding to real eigenvectors that span v ⊥ . rather.6 From [222. when a nontrivial nullspace of E exists.1) E −1 = I + ζ T uv λ (1672) (1670) corresponds to eigenvector u .

. In place of X − αc 1T we may write XV as in (1006) where V = I− 1 T 11 ∈ SN N (944) .2. Matrix H has N − 1 orthonormal eigenvectors spanning that reflecting subspace u⊥ with corresponding eigenvalues equal to 1.4] because they are made by permuting rows and columns of the identity matrix.1 Auxiliary V -matrices Auxiliary projector matrix V It is convenient to define a matrix V that arises naturally as a consequence of translating the geometric center αc ( 5. Not all permutation matrices are Householder matrices.5. Neither are  all symmetric permutation matrices Householder matrices.4 B. the permutation matrix   1 0 0 Ξ = 0 0 1  0 1 0 (1678) B. Hu⊥ = u⊥ while Hu = −u .0.5. the matrix 2-norm (the spectral norm) is equal to the largest eigenvalue-magnitude. A Householder matrix is thus characterized. Vector u is normal to an N −1-dimensional subspace u⊥ through which this particular H effects pointwise reflection.  0 0 0 1  0 0 1 0   Ξ =  0 1 0 0  (1766) is not a Householder matrix. ΞT Ξ = I ) [333.656 APPENDIX B. e. H −1 = H T .1) of some list X to the origin.5. so det H = −1 (1676) Due to symmetry of H .1. H 2 = 1. although all permutation matrices are orthogonal matrices ( B.g. H 0 (1677) √ is a Householder matrix having u = [ 0 1 −1 ]T / 2 .g. 3..3)). SIMPLE MATRICES which is a symmetric orthogonal (reflection) matrix (H −1 = H T = H ( B. 1 0 0 0 For example. HT = H . e.4. The remaining eigenvector u has corresponding eigenvalue −1 .

B.4 Auxiliary V-matrices

B.4.1 Auxiliary projector matrix V

It is convenient to define a matrix V that arises naturally as a consequence of translating the geometric center α_c (§5.5.1.0.1) of some list X to the origin. In place of X − α_c 1ᵀ we may write XV as in (1006) where

    V = I − (1/N) 11ᵀ ∈ Sᴺ    (944)

is an elementary matrix called the geometric centering matrix.

Any elementary matrix in R^{N×N} has N−1 eigenvalues equal to 1. For the particular elementary matrix V, the Nth eigenvalue equals 0. The number of 0 eigenvalues must equal dim N(V) = 1, by the 0 eigenvalues theorem (§A.7.3.0.1), because V = Vᵀ is diagonalizable. Because

    V1 = 0    (1679)

the nullspace N(V) = R(1) is spanned by the eigenvector 1. The remaining eigenvectors span R(V) ≡ 1⊥ = N(1ᵀ) that has dimension N−1.

Because

    V² = V    (1680)

and Vᵀ = V, elementary matrix V is also a projection matrix (§E.3) projecting orthogonally on its range N(1ᵀ), which is a hyperplane containing the origin in Rᴺ:

    V = I − 1(1ᵀ1)⁻¹1ᵀ    (1681)

The {0, 1} eigenvalues also indicate diagonalizable V is a projection matrix [396, §4.1 thm.4.1]. Symmetry of V denotes orthogonal projection; from (1955),

    Vᵀ = V ,   V† = V ,   ‖V‖₂ = 1 ,   V ⪰ 0    (1682)

Matrix V is also circulant. [169]

B.4.1.0.1 Example. Relationship of Auxiliary to Householder matrix.
Let H ∈ Sᴺ be a Householder matrix (1675) defined by

    u = [ 1 ; ⋮ ; 1 ; 1+√N ] ∈ Rᴺ    (1683)

Then we have [156, §2]

    V = H [ I 0 ; 0ᵀ 0 ] H    (1684)

Let D ∈ S_h^N and define

    −HDH ≜ [ −A b ; bᵀ c ]    (1685)

where b is a vector. Then because H is nonsingular [185, §3]

    −V D V = −H [ A 0 ; 0ᵀ 0 ] H ⪰ 0  ⇔  −A ⪰ 0    (1686)

and affine dimension is r = rank A when D is a Euclidean distance matrix.
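A sketch in Matlab (assumed names and data) checking the centering matrix properties (1679)-(1682) and the Householder relationship (1684):

    % sketch: geometric centering matrix and (1684)
    N = 6; V = eye(N) - ones(N)/N;
    disp(norm(V*ones(N,1)))        % V1 = 0
    disp(norm(V*V - V))            % idempotent
    disp(norm(V - V'))             % symmetric: orthogonal projector
    u = [ones(N-1,1); 1+sqrt(N)];  % (1683)
    H = eye(N) - 2*(u*u')/(u'*u);
    P = blkdiag(eye(N-1), 0);
    disp(norm(V - H*P*H))          % (1684): ~ 0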

B.4.2 Schoenberg auxiliary matrix V_N

1.  V_N = (1/√2) [ −1ᵀ ; I ] ∈ R^{N×(N−1)}
2.  V_Nᵀ1 = 0
3.  I − e₁1ᵀ = [ 0 √2V_N ]
4.  [ 0 √2V_N ] V_N = V_N
5.  [ 0 √2V_N ] V = V
6.  V [ 0 √2V_N ] = [ 0 √2V_N ]
7.  [ 0 √2V_N ][ 0 √2V_N ] = [ 0 √2V_N ]
8.  [ 0 √2V_N ]† = [ 0 0ᵀ ; 0 I ] V
9.  [ 0 √2V_N ]† V = [ 0 √2V_N ]†
10. [ 0 √2V_N ]† [ 0 √2V_N ] = [ 0 0ᵀ ; 0 I ]
11. [ 0 √2V_N ][ 0 √2V_N ]† = V
12. [ 0 √2V_N ][ 0 √2V_N ]† [ 0 √2V_N ] = [ 0 √2V_N ]
13. [ 0 √2V_N ]† [ 0 √2V_N ][ 0 √2V_N ]† = [ 0 √2V_N ]†

14. [ V_N (1/√2)1 ]⁻¹ = [ V_N† ; (√2/N)1ᵀ ]
15. V_N† = √2 [ −(1/N)1   I − (1/N)11ᵀ ] ∈ R^{(N−1)×N}
16. V_N†1 = 0
17. V_N†V_N = I
18. Vᵀ = V = V_N V_N† = I − (1/N)11ᵀ ∈ Sᴺ
19. −V_N†(11ᵀ − I)V_N = I   (11ᵀ − I ∈ EDMᴺ)
20. D = [d_ij] ∈ S_h^N (946)
    tr(−V D V) = tr(−V D) = tr(−V_N† D V_N) = (1/N) 1ᵀD1 = (1/N) tr(11ᵀD) = (1/N) Σ_{i,j} d_ij
21. D = [d_ij] ∈ Sᴺ
    tr(−V D V) = (1/N) Σ_{i≠j} d_ij − ((N−1)/N) Σ_i d_ii = (1/N) 1ᵀD1 − tr D
22. D = [d_ij] ∈ S_h^N
    tr(−V_NᵀD V_N) = Σ_j d_1j
23. For Y ∈ Sᴺ
    V(Y − δ(Y1))V = Y − δ(Y1)

Any elementary matrix E ∈ Sᴺ of the particular form E = k₁I − k₂11ᵀ, where k₁, k₂ ∈ R,^{B.7} will make tr(−ED) proportional to Σ d_ij. [6]

^{B.7} If k₁ is 1−ρ while k₂ equals −ρ ∈ R, then all eigenvalues of E for −1/(N−1) < ρ < 1 are guaranteed positive and therefore E is guaranteed positive definite. [303]
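A few of these identities are easy to confirm numerically; the Matlab sketch below (my names) exercises items 2, 3, 17, 18, and 20:

    % sketch: Schoenberg auxiliary matrix identities
    N = 5;
    VN = [-ones(1,N-1); eye(N-1)]/sqrt(2);    % item 1
    V  = eye(N) - ones(N)/N;
    e1 = [1; zeros(N-1,1)];
    disp(norm(VN'*ones(N,1)))                               % item 2
    disp(norm([zeros(N,1) sqrt(2)*VN] - (eye(N)-e1*ones(1,N))))  % item 3
    disp(norm(pinv(VN)*VN - eye(N-1)))                      % item 17
    disp(norm(V - VN*pinv(VN)))                             % item 18
    D = abs(randn(N)); D = D + D'; D(1:N+1:end) = 0;        % D in S_h^N
    disp(trace(-V*D*V) - sum(D(:))/N)                       % item 20: ~ 0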

B.4.3 Orthonormal auxiliary matrix V_W

The skinny matrix

    V_W ≜
    [ −1/√N          −1/√N         ···   −1/√N
      1 − 1/(N+√N)   −1/(N+√N)     ···   −1/(N+√N)
      −1/(N+√N)      1 − 1/(N+√N)         ⋮
         ⋮                          ⋱
      −1/(N+√N)         ···              1 − 1/(N+√N) ]  ∈ R^{N×(N−1)}    (1688)

has R(V_W) = N(1ᵀ) and orthonormal columns. We defined three auxiliary V-matrices: V, V_N (928), and V_W, sharing some attributes listed in Table B.4.4. For example, V can be expressed

    V = V_W V_Wᵀ = V_N V_N†    (1689)

but V_WᵀV_W = I means V is an orthogonal projector (1952), and

    V_W† = V_Wᵀ ,   ‖V_W‖₂ = 1 ,   V_Wᵀ1 = 0    (1690)

B.4.4 Auxiliary V-matrix Table

           dim V      rank V   R(V)     N(Vᵀ)   VᵀV               V Vᵀ                         V V†
    V      N×N        N−1      N(1ᵀ)    R(1)    V                 V                            V
    V_N    N×(N−1)    N−1      N(1ᵀ)    R(1)    (1/2)(I + 11ᵀ)    (1/2)[ N−1 −1ᵀ ; −1 I ]      V
    V_W    N×(N−1)    N−1      N(1ᵀ)    R(1)    I                 V                            V

B.4.5 More auxiliary matrices

Mathar shows [260, §2] that any elementary matrix (§B.3) of the form

    V_S = I − b 1ᵀ ∈ R^{N×N}    (1691)

such that bᵀ1 = 1 (confer [163, §2]), is an auxiliary V-matrix having

    R(V_Sᵀ) = N(bᵀ) ,   R(V_S) = N(1ᵀ) ,   N(V_S) = R(b) ,   N(V_Sᵀ) = R(1)    (1692)

Given X ∈ R^{n×N}, the choice b = (1/N)1 (so that V_S = V) minimizes ‖X(I − b1ᵀ)‖_F. [165, §2.1]
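Construction (1688) and properties (1689)-(1690) can be verified numerically; a Matlab sketch under assumed dimensions:

    % sketch: orthonormal auxiliary matrix (1688)-(1690)
    N = 5;
    VW = -ones(N,N-1)/(N+sqrt(N));
    VW(1,:) = -1/sqrt(N);
    VW(2:N,:) = VW(2:N,:) + eye(N-1);
    V = eye(N) - ones(N)/N;
    disp(norm(VW'*VW - eye(N-1)))   % orthonormal columns
    disp(norm(VW*VW' - V))          % (1689): V = VW*VW'
    disp(norm(VW'*ones(N,1)))       % (1690): VW'*1 = 0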

B.5 Orthomatrices

B.5.1 Orthonormal matrix

Property QᵀQ = I completely defines orthonormal matrix Q ∈ R^{n×k} (k ≤ n); a skinny-or-square full-rank matrix characterized by nonexpansivity (1956)

    ‖Qᵀx‖₂ ≤ ‖x‖₂  ∀ x ∈ Rⁿ ,   ‖Qy‖₂ = ‖y‖₂  ∀ y ∈ Rᵏ    (1693)

and preservation of vector inner-product

    ⟨Qy, Qz⟩ = ⟨y, z⟩    (1694)

B.5.2 Orthogonal matrix & vector rotation

An orthogonal matrix is a square orthonormal matrix. Property Q⁻¹ = Qᵀ completely defines orthogonal matrix Q ∈ R^{n×n} employed to effect vector rotation; [333] [335] [203] for any x ∈ Rⁿ

    ‖Qx‖₂ = ‖x‖₂    (1695)

In other words, the 2-norm is orthogonally invariant. Any antisymmetric matrix constructs an orthogonal matrix; id est, for A = −Aᵀ

    Q = (I + A)⁻¹(I − A)    (1696)

A unitary matrix is a complex generalization of orthogonal matrix; conjugate transpose defines it: U⁻¹ = Uᴴ. An orthogonal matrix is simply a real unitary matrix.^{B.8}

Orthogonal matrix Q is a normal matrix further characterized by spectral norm:

    Q⁻¹ = Qᵀ ,   ‖Q‖₂ = 1    (1697)

Applying characterization (1697) to Qᵀ we see it too is an orthogonal matrix. Hence the rows and columns of Q respectively form an orthonormal set.

^{B.8} Orthogonal and unitary matrices are called unitary linear operators. Normalcy guarantees diagonalization (§A.5) so, for Q ≜ SΛSᴴ, SΛ⁻¹Sᴴ = S*ΛSᵀ = SΛ*Sᴴ and

    ‖δ(Λ)‖_∞ = 1    (1698)

characterizes an orthogonal matrix in terms of eigenvalues and eigenvectors.
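The Cayley-type construction (1696) is easy to test; a Matlab sketch with random data:

    % sketch: (1696) builds an orthogonal matrix from any antisymmetric A
    n = 4; M = randn(n); A = M - M';    % antisymmetric
    Q = (eye(n) + A)\(eye(n) - A);      % Q = inv(I+A)*(I-A)
    disp(norm(Q'*Q - eye(n)))           % ~ 0
    x = randn(n,1);
    disp(norm(Q*x) - norm(x))           % (1695): ~ 0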

All permutation matrices Ξ, for example, are nonnegative orthogonal matrices, and vice versa. Product or Kronecker product of any permutation matrices remains a permutator; any product of permutation matrix with orthogonal matrix remains orthogonal. In fact, any product of orthogonal matrices AQ remains orthogonal by definition. Given any other dimensionally compatible orthogonal matrix U, the mapping g(A) = UᵀAQ is a bijection on the domain of orthogonal matrices, a nonconvex manifold of dimension (1/2)n(n−1) [52]. [240, §2.1] [241] The largest magnitude entry of an orthogonal matrix is 1; for each and every j ∈ 1...n

    ‖Q(j, :)‖_∞ ≤ 1 ,   ‖Q(:, j)‖_∞ ≤ 1    (1699)

Each and every eigenvalue of a (real) orthogonal matrix has magnitude 1 (Λ⁻¹ = Λ*)

    λ(Q) ∈ Cⁿ ,   |λ(Q)| = 1    (1700)

Orthogonal matrices have complex eigenvalues in conjugate pairs, so det Q = ±1; but only the identity matrix can be simultaneously orthogonal and positive definite.

B.5.3 Reflection

A matrix for pointwise reflection is defined by imposing symmetry upon the orthogonal matrix; id est, a reflection matrix is completely defined by Q⁻¹ = Qᵀ = Q. The reflection matrix is a symmetric orthogonal matrix, characterized:

    Qᵀ = Q ,   Q⁻¹ = Qᵀ ,   ‖Q‖₂ = 1    (1701)

The Householder matrix (§B.3.1) is an example of symmetric orthogonal (reflection) matrix. Reflection matrices have eigenvalues equal to ±1 and so det Q = ±1. It is natural to expect a relationship between reflection and projection matrices because all projection matrices have eigenvalues belonging to {0, 1}; in fact, any reflection matrix Q is related to some orthogonal projector P by [205, §1 prob.44]

    Q = I − 2P    (1702)
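Relationship (1702) is verified below in a Matlab sketch; the subspace basis B is my stand-in test data:

    % sketch: reflection from an orthogonal projector, per (1702)
    n = 5; B = orth(randn(n,2));   % orthonormal basis of a random subspace
    P = B*B';                      % orthogonal projector on R(B)
    Q = eye(n) - 2*P;
    disp(norm(Q - Q'))             % symmetric
    disp(norm(Q*Q - eye(n)))       % and orthogonal: a reflection
    disp(sort(eig(Q))')            % eigenvalues are +/- 1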

Yet P is, generally, neither orthogonal or invertible (§E.3.0.2). Reflection is with respect to R(P)⊥; matrix 2P − I represents antireflection. In terms of eigenvalues, reflection is further characterized (confer (1700)):

    λ(Q) ∈ Rⁿ ,   |λ(Q)| = 1    (1703)

Every orthogonal matrix can be expressed as the product of a rotation and a reflection. The collection of all orthogonal matrices of particular dimension does not form a convex set.

[Figure 165: Gimbal: a mechanism imparting three degrees of dimensional freedom to a Euclidean body suspended at the device's center. Each ring is free to rotate about one axis. (Courtesy of The MathWorks Inc.)]

B.5.4 Rotation of range and rowspace

Given orthogonal matrix Q, column vectors of a matrix X are simultaneously rotated about the origin via product QX. In three dimensions (X ∈ R^{3×N}), the precise meaning of rotation is best illustrated in Figure 165 where the gimbal aids visualization of what is achievable; mathematically (§5.5.2),

    Q = [ cosθ 0 −sinθ ; 0 1 0 ; sinθ 0 cosθ ] [ 1 0 0 ; 0 cosφ −sinφ ; 0 sinφ cosφ ] [ cosψ −sinψ 0 ; sinψ cosψ 0 ; 0 0 1 ]    (1704)
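For any angle triple, the product (1704) is a pure rotation; a minimal Matlab sketch with arbitrary angles:

    % sketch: the triple product (1704) is orthogonal with det = 1
    th = 0.3; ph = -1.1; ps = 2.0;
    Q = [cos(th) 0 -sin(th); 0 1 0; sin(th) 0 cos(th)] * ...
        [1 0 0; 0 cos(ph) -sin(ph); 0 sin(ph) cos(ph)] * ...
        [cos(ps) -sin(ps) 0; sin(ps) cos(ps) 0; 0 0 1];
    disp(norm(Q'*Q - eye(3)))   % ~ 0
    disp(det(Q) - 1)            % ~ 0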

Another interpretation of product QX is rotation/reflection of R(X). Rotation of X as in QXQᵀ is a simultaneous rotation/reflection of range and rowspace.^{B.9} Any matrix can be expressed as a singular value decomposition X = UΣWᵀ (1596) where δ²(Σ) = Σ, R(U) ⊇ R(X), and R(W) ⊇ R(Xᵀ).

^{B.9} The product QᵀA Q can be regarded as a coordinate transformation; e.g., given linear map y = Ax : Rⁿ → Rⁿ and orthogonal Q, the transformation Qy = A Qx is a rotation/reflection of range and rowspace (141) of matrix A where Qy ∈ R(A) and Qx ∈ R(Aᵀ) (142).

B.5.4.0.1 Example. One axis of revolution.
Partition an n+1-dimensional Euclidean space R^{n+1} ≜ [ Rⁿ ; R ] and define an n-dimensional subspace

    R ≜ {λ ∈ R^{n+1} | 1ᵀλ = 0}    (1705)

(a hyperplane through the origin). We want an orthogonal matrix that rotates a list in the columns of matrix X ∈ R^{(n+1)×N} through the dihedral angle between Rⁿ and R

    ∠(Rⁿ, R) = arccos( ⟨e_{n+1}, 1⟩ / (‖e_{n+1}‖ ‖1‖) ) = arccos( 1/√(n+1) ) radians    (1706)

The vertex-description of the nonnegative orthant in R^{n+1} is

    { [ e₁ e₂ ··· e_{n+1} ] a | a ⪰ 0 } = { a ⪰ 0 } = R₊^{n+1} ⊂ R^{n+1}    (1707)

Consider rotation of these vertices via orthogonal matrix

    Q ≜ [ (1/√(n+1)) 1   Ξ V_W ] Ξ ∈ R^{(n+1)×(n+1)}    (1708)

where permutation matrix Ξ ∈ S^{n+1} is defined in (1766), and V_W ∈ R^{(n+1)×n} is the orthonormal auxiliary matrix defined in §B.4.3. This particular orthogonal matrix is selected because it rotates any point in subspace Rⁿ about one axis of revolution onto R; e.g., rotation Qe_{n+1} aligns the last standard basis vector with subspace normal R⊥ = 1. The rotated standard basis vectors remaining are orthonormal spanning R.

B.5.5 Matrix rotation

Orthogonal matrices are also employed to rotate/reflect other matrices like vectors: [160] Given orthogonal matrix Q, the product QᵀA will rotate A ∈ R^{n×n} in the Euclidean sense in R^{n²} because Frobenius' norm is orthogonally invariant (§2.2.1):

    ‖QᵀA‖_F = √tr(AᵀQQᵀA) = ‖A‖_F    (1709)

(likewise for AQ). Were A symmetric, such a rotation would depart from Sⁿ. One remedy is to instead form product QᵀA Q because

    ‖QᵀA Q‖_F = √tr(QᵀAᵀQQᵀA Q) = ‖A‖_F    (1710)

By §A.1.1,

    vec QᵀA Q = (Q ⊗ Q)ᵀ vec A    (1711)

which is a rotation of the vectorized A matrix because Kronecker product of any orthogonal matrices remains orthogonal; e.g.,

    (Q ⊗ Q)ᵀ(Q ⊗ Q) = I    (1712)

Matrix A is orthogonally equivalent to B if B = SᵀA S for some orthogonal matrix S. Every square matrix, for example, is orthogonally equivalent to a matrix having equal entries along the main diagonal. [203, §2.2 prob.3]
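Identities (1711)-(1712) check out numerically; a Matlab sketch using vec(X) = X(:):

    % sketch: vectorized matrix rotation (1711)-(1712)
    n = 4; [Q,~] = qr(randn(n));   % a random orthogonal matrix
    A = randn(n);
    L = Q'*A*Q;
    disp(norm(L(:) - kron(Q,Q)'*A(:)))            % (1711): ~ 0
    disp(norm(kron(Q,Q)'*kron(Q,Q) - eye(n^2)))   % (1712): ~ 0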


Appendix C

Some analytical optimal results

    People have been working on Optimization since the ancient Greeks [Zenodorus, circa 200bc] learned that a string encloses the most area when it is formed into the shape of a circle.   −Roman Polyak

We speculate that optimization problems possessing analytical solution have convex transformation or constructive global optimality conditions, perhaps yet unknown; e.g., (1738).

C.1 Properties of infima

    inf ∅ ≜ ∞ ,   sup ∅ ≜ −∞    (1713)

Given f(x) : X → R defined on arbitrary set X [200, §0.1.2]

    inf_{x∈X} f(x) = −sup_{x∈X} −f(x) ,   sup_{x∈X} f(x) = −inf_{x∈X} −f(x)    (1714)

    arg inf_{x∈X} f(x) = arg sup_{x∈X} −f(x) ,   arg sup_{x∈X} f(x) = arg inf_{x∈X} −f(x)    (1715)

Given scalar κ and f(x) : X → R and g(x) : X → R defined on arbitrary set X [200, §0.1.2]

    inf_{x∈X} (κ + f(x)) = κ + inf_{x∈X} f(x) ,   arg inf_{x∈X} (κ + f(x)) = arg inf_{x∈X} f(x)    (1716)

    inf_{x∈X} κ f(x) = κ inf_{x∈X} f(x) ,  κ > 0 ,   arg inf_{x∈X} κ f(x) = arg inf_{x∈X} f(x)    (1717)

    inf_{x∈X} (f(x) + g(x)) ≥ inf_{x∈X} f(x) + inf_{x∈X} g(x)    (1718)

Given f(x) : X → R defined on arbitrary set X

    arg inf_{x∈X} |f(x)| = arg inf_{x∈X} f(x)²    (1719)

Given f(x) : X ∪ Y → R and arbitrary sets X and Y [200, §0.1.2]

    X ⊂ Y ⇒ inf_{x∈X} f(x) ≥ inf_{x∈Y} f(x)    (1720)

    inf_{x∈X∪Y} f(x) = min{ inf_{x∈X} f(x) , inf_{x∈Y} f(x) }    (1721)

    inf_{x∈X∩Y} f(x) ≥ max{ inf_{x∈X} f(x) , inf_{x∈Y} f(x) }    (1722)

C.2 Trace, singular and eigen values

For A ∈ R^{m×n} and σ(A)₁ connoting spectral norm,

    σ(A)₁ = √λ(AᵀA)₁ = ‖A‖₂ = sup_{‖x‖=1} ‖Ax‖₂ = minimize_{t∈R} t  subject to [ tI A ; Aᵀ tI ] ⪰ 0    (638)

For A ∈ R^{m×n} and σ(A) denoting its singular values, the nuclear norm (Ky Fan norm) of matrix A (confer (44), (1599), [204, p.200]) is

    Σᵢ σ(A)ᵢ = tr√(AᵀA) = sup_{‖X‖₂≤1} tr(XᵀA)
             = maximize_{X∈R^{m×n}} tr(XᵀA)  subject to [ I X ; Xᵀ I ] ⪰ 0
             = (1/2) minimize_{X∈S^m, Y∈Sⁿ} tr X + tr Y  subject to [ X A ; Aᵀ Y ] ⪰ 0    (1723)
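Before continuing, the supremum characterization in (1723) admits a quick numerical check in Matlab (a sketch; the pairing of singular vectors is an assumption of the test, not part of the statement):

    % sketch: nuclear norm as sup of tr(X'A) over ||X||_2 <= 1
    A = randn(4,6);
    [S,Sig,Q] = svd(A,'econ');        % A = S*Sig*Q'
    X = S*Q';                         % candidate optimal X*
    disp(sum(svd(A)) - trace(X'*A))   % ~ 0
    disp(norm(X) - 1)                 % feasible: spectral norm of X* is 1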

This nuclear norm is convex^{C.1} and dual to spectral norm. [204, p.214] [61, §A.1.6] Given singular value decomposition A = SΣQᵀ ∈ R^{m×n} (§A.6), then X* = SQᵀ ∈ R^{m×n} is an optimal solution to maximization (confer §2.3.2.0.5) while X* = SΣSᵀ ∈ S^m and Y* = QΣQᵀ ∈ Sⁿ is an optimal solution to minimization. [136] Srebro [325] asserts

    Σᵢ σ(A)ᵢ = (1/2) minimize_{U,V} ‖U‖²_F + ‖V‖²_F  subject to A = UVᵀ
             = minimize_{U,V} ‖U‖_F ‖V‖_F  subject to A = UVᵀ    (1724)

C.2.0.0.1 Exercise. Optimal matrix factorization. Prove (1724).^{C.2}

For X ∈ S^m, Y ∈ Sⁿ, A ∈ C ⊆ R^{m×n} for set C convex, and σ(A) denoting the singular values of A [136, §3]

    minimize_A Σᵢ σ(A)ᵢ subject to A ∈ C
      ≡ (1/2) minimize_{A,X,Y} tr X + tr Y  subject to [ X A ; Aᵀ Y ] ⪰ 0 , A ∈ C    (1725)

^{C.1} discernible as envelope of the rank function (1395) or as supremum of functions linear in A (Figure 74).
^{C.2} Hint: Write A = SΣQᵀ ∈ R^{m×n} and

    [ X A ; Aᵀ Y ] = [ U ; V ][ Uᵀ Vᵀ ] ⪰ 0

Show U* = S√Σ ∈ R^{m×min{m,n}} and V* = Q√Σ ∈ R^{n×min{m,n}}, hence ‖U*‖²_F = ‖V*‖²_F.

For A ∈ Sᴺ₊ and β ∈ R

    β tr A = maximize_{X∈Sᴺ} tr(XA)  subject to X ⪯ βI    (1726)

But the following statement is numerically stable, preventing an unbounded solution in direction of a 0 eigenvalue:

    maximize_{X∈Sᴺ} sgn(β) tr(XA)  subject to X ⪯ |β|I , X ⪰ −|β|I    (1727)

where β tr A = tr(X*A). If β ≥ 0, then constraint X ⪰ −|β|I may be replaced with X ⪰ 0.

For symmetric A ∈ Sᴺ, its smallest and largest eigenvalue in λ(A) ∈ Rᴺ are respectively [11, §4.1] [43, §I.6.15] [203, §4.2] [240, §2.1] [241]

    min_i{λ(A)ᵢ} = inf_{‖x‖=1} xᵀA x = minimize_{X∈Sᴺ₊} tr(XA) subject to tr X = 1
                 = maximize_{t∈R} t subject to A ⪰ tI    (1728)

    max_i{λ(A)ᵢ} = sup_{‖x‖=1} xᵀA x = maximize_{X∈Sᴺ₊} tr(XA) subject to tr X = 1
                 = minimize_{t∈R} t subject to A ⪯ tI    (1729)

whereas

    λ_N I ⪯ A ⪯ λ₁ I    (1730)

The largest eigenvalue λ₁ is always convex in A ∈ Sᴺ because, given particular x, xᵀA x is linear in matrix A; supremum of a family of linear functions is convex, as illustrated in Figure 74. So for A, B ∈ Sᴺ (1537)

    λ₁(A + B) ≤ λ₁(A) + λ₁(B)

Similarly, the smallest eigenvalue λ_N of any symmetric matrix is a concave function of its entries; (1537)

    λ_N(A + B) ≥ λ_N(A) + λ_N(B)

For v₁ a normalized eigenvector of A corresponding to the largest eigenvalue, and v_N a normalized eigenvector corresponding to the smallest eigenvalue,

    v_N = arg inf_{‖x‖=1} xᵀA x    (1731)

    v₁ = arg sup_{‖x‖=1} xᵀA x    (1732)
For A ∈ Sᴺ having eigenvalues λ(A) ∈ Rᴺ, consider the unconstrained nonconvex optimization that is a projection on the rank-1 subset (§2.9.2.1, §3.6.0.0.1) of the boundary of positive semidefinite cone Sᴺ₊: Defining λ₁ ≜ max_i{λ(A)ᵢ} and corresponding eigenvector v₁,

    minimize_x ‖xxᵀ − A‖²_F = minimize_x tr( xxᵀ(xᵀx) − 2Axxᵀ + AᵀA )
      = { ‖λ(A)‖²₂ ,        λ₁ ≤ 0
          ‖λ(A)‖²₂ − λ₁² ,  λ₁ > 0    (1733)

    arg minimize_x ‖xxᵀ − A‖²_F = { 0 ,       λ₁ ≤ 0
                                    v₁√λ₁ ,   λ₁ > 0    (1734)

Proof. This is simply the Eckart & Young solution from §7.1.2:

    x*x*ᵀ = { 0 ,         λ₁ ≤ 0
              λ₁v₁v₁ᵀ ,   λ₁ > 0    (1735)

Given nonincreasingly ordered diagonalization A = QΛQᵀ where Λ = δ(λ(A)) (§A.5), then (1733) has minimum value

    minimize_x ‖xxᵀ − A‖²_F = { ‖QΛQᵀ‖²_F = ‖δ(Λ)‖²₂ ,  λ₁ ≤ 0
          ‖Q( δ([λ₁ 0 ··· 0]ᵀ) − Λ )Qᵀ‖²_F = ‖[λ₁ 0 ··· 0]ᵀ − δ(Λ)‖²₂ ,  λ₁ > 0    (1736)

C.2.0.0.2 Exercise. Rank-1 approximation.
Given symmetric matrix A ∈ Sᴺ, prove:

    v₁ = arg minimize_x ‖xxᵀ − A‖²_F  subject to ‖x‖ = 1    (1737)

where v₁ is a normalized eigenvector of A corresponding to its largest eigenvalue. What is the objective's optimal value?
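A short Matlab sketch (assumed random data) confirming (1733)-(1734): the optimal value achieved by x* = v₁√λ₁ matches ‖λ(A)‖²₂ − λ₁² when λ₁ > 0, and 0 is optimal otherwise.

    % sketch: projection on the rank-1 PSD subset
    N = 5; A = randn(N); A = (A+A')/2;
    [Q,L] = eig(A); [l1,i] = max(diag(L)); v1 = Q(:,i);
    if l1 > 0, xs = v1*sqrt(l1); else xs = zeros(N,1); end
    disp(norm(xs*xs' - A,'fro')^2 - (norm(eig(A))^2 - max(l1,0)^2))  % ~ 0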
(Ky Fan, 1949) For B ∈ Sᴺ whose eigenvalues λ(B) ∈ Rᴺ are arranged in nonincreasing order, and for 1 ≤ k ≤ N [11, §4.1] [216] [203, §4.3.18] [362, §2] [240, §2.1] [241]

    Σ_{i=N−k+1}^{N} λ(B)ᵢ = inf_{U∈R^{N×k}, UᵀU=I} tr(UUᵀB)
      = minimize_{X∈Sᴺ₊} tr(XB) subject to X ⪯ I , tr X = k    (a)
      = maximize_{µ∈R, Z∈Sᴺ₊} (k−N)µ + tr(B − Z) subject to µI + Z ⪰ B    (b)

    Σ_{i=1}^{k} λ(B)ᵢ = sup_{U∈R^{N×k}, UᵀU=I} tr(UUᵀB)
      = maximize_{X∈Sᴺ₊} tr(XB) subject to X ⪯ I , tr X = k    (c)
      = minimize_{µ∈R, Z∈Sᴺ₊} kµ + tr Z subject to µI + Z ⪰ B    (d)    (1738)

Given ordered diagonalization B = QΛQᵀ (§A.5.1), then an optimal U for the infimum is U* = Q(:, N−k+1:N) ∈ R^{N×k} whereas U* = Q(:, 1:k) ∈ R^{N×k} for the supremum is more reliably computed. In both cases, X* = U*U*ᵀ. Optimization (a) searches the convex hull of outer product UUᵀ of all N×k orthonormal matrices (§2.3.2.0.1).

For B ∈ Sᴺ whose eigenvalues λ(B) ∈ Rᴺ are arranged in nonincreasing order, and for diagonal matrix Υ ∈ Sᵏ whose diagonal entries are arranged in nonincreasing order where 1 ≤ k ≤ N, we utilize the main-diagonal δ operator's self-adjointness property (1451): [12, §4.2]

    Σ_{i=1}^{k} Υᵢᵢ λ(B)_{N−i+1} = inf_{U∈R^{N×k}, UᵀU=I} tr(Υ UᵀBU) = inf_{U∈R^{N×k}, UᵀU=I} δ(Υ)ᵀδ(UᵀBU)
      = minimize_{Vᵢ∈Sᴺ} tr( B Σ_{i=1}^{k} (Υᵢᵢ − Υ_{i+1,i+1})Vᵢ )
        subject to tr Vᵢ = i ,  i = 1...k
                   I ⪰ Vᵢ ⪰ 0 ,  i = 1...k    (1739)

where Υ_{k+1,k+1} ≜ 0. We speculate,

    Σ_{i=1}^{k} Υᵢᵢ λ(B)ᵢ = sup_{U∈R^{N×k}, UᵀU=I} tr(Υ UᵀBU) = sup_{U∈R^{N×k}, UᵀU=I} δ(Υ)ᵀδ(UᵀBU)    (1740)

Alizadeh shows: [11, §4.2]

    Σ_{i=1}^{k} Υᵢᵢ λ(B)ᵢ = minimize_{µ∈Rᵏ, Zᵢ∈Sᴺ} Σ_{i=1}^{k} iµᵢ + tr Zᵢ
        subject to µᵢI + Zᵢ − (Υᵢᵢ − Υ_{i+1,i+1})B ⪰ 0 ,  i = 1...k
                   Zᵢ ⪰ 0 ,  i = 1...k
      = maximize_{Vᵢ∈Sᴺ} tr( B Σ_{i=1}^{k} (Υᵢᵢ − Υ_{i+1,i+1})Vᵢ )
        subject to tr Vᵢ = i ,  i = 1...k
                   I ⪰ Vᵢ ⪰ 0 ,  i = 1...k    (1741)

where Υ_{k+1,k+1} ≜ 0.
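The feasible point X* = U*U*ᵀ named after (1738) is easy to exhibit numerically; a Matlab sketch for the supremum (c):

    % sketch: sum of the k largest eigenvalues, per (1738c)
    N = 6; k = 3; B = randn(N); B = (B+B')/2;
    [Q,L] = eig(B); [lam,idx] = sort(diag(L),'descend'); Q = Q(:,idx);
    U = Q(:,1:k);                      % optimal U for the supremum
    X = U*U';                          % X* feasible: 0 <= X <= I, tr X = k
    disp(sum(lam(1:k)) - trace(X*B))   % ~ 0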
The largest eigenvalue magnitude µ of A ∈ Sᴺ

    max_i{ |λ(A)ᵢ| } = minimize_{µ∈R} µ  subject to −µI ⪯ A ⪯ µI    (1742)

is minimized over convex set C by semidefinite program (confer §7.1.5):

    minimize_A ‖A‖₂ subject to A ∈ C  ≡  minimize_{µ,A} µ  subject to −µI ⪯ A ⪯ µI , A ∈ C    (1743)

id est,

    µ* ≜ max_i{ |λ(A*)ᵢ| , i = 1...N } ∈ R₊    (1744)
For B ∈ Sᴺ whose eigenvalues λ(B) ∈ Rᴺ are arranged in nonincreasing order, let Πλ(B) be a permutation of eigenvalues λ(B) such that their absolute value becomes arranged in nonincreasing order: |Πλ(B)|₁ ≥ |Πλ(B)|₂ ≥ ··· ≥ |Πλ(B)|_N. Then, for 1 ≤ k ≤ N [11, §4.3]^{C.3}

    Σ_{i=1}^{k} |Πλ(B)|ᵢ = minimize_{µ∈R, Z∈Sᴺ₊} kµ + tr Z
        subject to µI + Z + B ⪰ 0
                   µI + Z − B ⪰ 0
      = maximize_{V,W∈Sᴺ₊} ⟨B, V − W⟩
        subject to I ⪰ V, W
                   tr(V + W) = k    (1745)

For diagonal matrix Υ ∈ Sᵏ whose diagonal entries are arranged in nonincreasing order where 1 ≤ k ≤ N

    Σ_{i=1}^{k} Υᵢᵢ |Πλ(B)|ᵢ = minimize_{µ∈Rᵏ, Zᵢ∈Sᴺ} Σ_{i=1}^{k} iµᵢ + tr Zᵢ
        subject to µᵢI + Zᵢ + (Υᵢᵢ − Υ_{i+1,i+1})B ⪰ 0 ,  i = 1...k
                   µᵢI + Zᵢ − (Υᵢᵢ − Υ_{i+1,i+1})B ⪰ 0 ,  i = 1...k
                   Zᵢ ⪰ 0 ,  i = 1...k
      = maximize_{Vᵢ,Wᵢ∈Sᴺ} tr( B Σ_{i=1}^{k} (Υᵢᵢ − Υ_{i+1,i+1})(Vᵢ − Wᵢ) )
        subject to tr(Vᵢ + Wᵢ) = i ,  i = 1...k
                   I ⪰ Vᵢ ⪰ 0 ,  i = 1...k
                   I ⪰ Wᵢ ⪰ 0 ,  i = 1...k    (1746)

where Υ_{k+1,k+1} ≜ 0.

C.2.0.0.3 Exercise. Weighted sum of largest eigenvalues. Prove (1740).

^{C.3} We eliminate a redundant positive semidefinite variable from Alizadeh's minimization. There exist typographical errors in [294, (6.49) (6.55)] for this minimization.
For A, B ∈ Sᴺ whose eigenvalues λ(A), λ(B) ∈ Rᴺ are respectively arranged in nonincreasing order, and for nonincreasingly ordered diagonalizations A = W_A ΥW_Aᵀ and B = W_B ΛW_Bᵀ [201] [240, §2.1] [241]

    λ(A)ᵀλ(B) = sup_{U∈R^{N×N}, UᵀU=I} tr(Aᵀ UᵀBU) ≥ tr(AᵀB)    (1765)

(confer (1770)) where optimal U is

    U* = W_B W_Aᵀ ∈ R^{N×N}    (1762)

We can push that upper bound higher using a result in §C.4.2.1:

    |λ(A)|ᵀ|λ(B)| = sup_{U∈C^{N×N}, UᴴU=I} re tr(Aᵀ UᴴBU)    (1747)

For step function ψ as defined in (1612), optimal U becomes

    U* = W_B δ(ψ(δ(Λ)))ᴴ δ(ψ(δ(Υ))) W_Aᵀ ∈ C^{N×N}    (1748)

C.3 Orthogonal Procrustes problem
Given matrices A, B ∈ R^{n×N}, their product having full singular value decomposition (§A.6.3)

    ABᵀ ≜ UΣQᵀ ∈ R^{n×n}    (1749)

then an optimal solution R* to the orthogonal Procrustes problem

    minimize_R ‖A − RᵀB‖_F  subject to Rᵀ = R⁻¹    (1750)

maximizes tr(AᵀRᵀB) over the nonconvex manifold of orthogonal matrices: [203, §7.4.8]

    R* = QUᵀ ∈ R^{n×n}    (1751)

A necessary and sufficient condition for optimality

    ABᵀR* ⪰ 0    (1752)

holds whenever R* is an orthogonal matrix. [165, §4]
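A Matlab sketch of the closed-form fit (1749)-(1751), on synthetic lists that are exactly related by an orthogonal transformation (my construction, for testing only):

    % sketch: orthogonal Procrustes recovery
    n = 3; N = 10;
    B = randn(n,N);
    [R0,~] = qr(randn(n)); A = R0'*B;   % list related to B by rotation/reflection
    [U,~,Q] = svd(A*B');                % (1749)
    R = Q*U';                           % (1751)
    disp(norm(A - R'*B,'fro'))          % ~ 0: transformation recovered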
Optimal solution R* can reveal rotation/reflection (§5.5.2, §B.5) of one list in the columns of matrix A with respect to another list in B. Solution is unique if rank BV_N = n. [114, §2.4.1] In the case that A is a vector and permutation of B, solution R* is not necessarily a permutation matrix (§4.6.0.0.3) although the optimal objective will be zero. More generally, the optimal value for objective of minimization is

    tr( AᵀA + BᵀB − 2ABᵀR* ) = tr(AᵀA) + tr(BᵀB) − 2 tr(UΣUᵀ) = ‖A‖²_F + ‖B‖²_F − 2δ(Σ)ᵀ1    (1753)

while the optimal value for corresponding trace maximization is

    sup_{Rᵀ=R⁻¹} tr(AᵀRᵀB) = tr(AᵀR*ᵀB) = δ(Σ)ᵀ1 ≥ tr(AᵀB)    (1754)

The same optimal solution R* solves

    maximize_R ‖A + RᵀB‖_F  subject to Rᵀ = R⁻¹    (1755)

C.3.0.1 Procrustes relaxation
By replacing its feasible set with the convex hull of orthogonal matrices (Example 2.3.2.0.5), we relax Procrustes problem (1750) to a convex problem

    minimize_R ‖A − RᵀB‖²_F subject to Rᵀ = R⁻¹
      ≡ tr(AᵀA + BᵀB) − 2 maximize_R tr(AᵀRᵀB) subject to [ I R ; Rᵀ I ] ⪰ 0    (1756)

whose adjusted objective must always equal Procrustes'.^{C.4}

C.3.1 Effect of translation
Consider the impact on problem (1750) of dc offset in known lists A, B ∈ R^{n×N}. Rotation of B there is with respect to the origin, so better results may be obtained if offset is first accounted. Because the geometric centers of the lists AV and BV are the origin, instead we solve

    minimize_R ‖AV − RᵀBV‖_F  subject to Rᵀ = R⁻¹    (1757)

^{C.4} (because orthogonal matrices are the extreme points of this hull) and whose optimal numerical solution (SDPT3 [351]) [168] is reliably observed to be orthogonal for n ≤ N.
where V ∈ Sᴺ is the geometric centering matrix (§B.4.1). Now we define the full singular value decomposition

    A V Bᵀ ≜ UΣQᵀ ∈ R^{n×n}    (1758)

and an optimal rotation matrix

    R* = QUᵀ ∈ R^{n×n}    (1751)

The desired result is an optimally rotated offset list

    R*ᵀB V + A(I − V) ≈ A    (1759)

which most closely matches the list in A. Equality is attained when the lists are precisely related by a rotation/reflection and an offset. When R*ᵀB = A or B1 = A1 = 0, this result (1759) reduces to R*ᵀB ≈ A.

C.3.1.1 Translation of extended list

Suppose an optimal rotation matrix R* ∈ R^{n×n} were derived as before from matrix B ∈ R^{n×N}, but B is part of a larger list in the columns of [C B] ∈ R^{n×(M+N)} where C ∈ R^{n×M}. In that event, we wish to apply the rotation/reflection and translation to the larger list. The expression supplanting the approximation in (1759) makes 1ᵀ of compatible dimension:

    R*ᵀ[ C − (1/N)B11ᵀ   B V ] + (1/N)A11ᵀ    (1760)

id est, C − (1/N)B11ᵀ ∈ R^{n×M} and (1/N)A11ᵀ ∈ R^{n×(M+N)}.
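A Matlab sketch of the translation-aware fit (1757)-(1759), again on synthetic data built to be exactly related by rotation plus offset:

    % sketch: Procrustes with offset, via the centering matrix V
    n = 3; N = 8; B = randn(n,N);
    [R0,~] = qr(randn(n)); t = randn(n,1);
    A = R0'*B + t*ones(1,N);            % rotated and offset copy of B
    V = eye(N) - ones(N)/N;
    [U,~,Q] = svd(A*V*B');              % (1758)
    R = Q*U';
    Ahat = R'*B*V + A*(eye(N)-V);       % (1759)
    disp(norm(Ahat - A,'fro'))          % ~ 0: equality attained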
C.4 Two-sided orthogonal Procrustes

C.4.0.1 Minimization

Given symmetric A, B ∈ Sᴺ, each having diagonalization (§A.5.1)

    A ≜ Q_A Λ_A Q_Aᵀ ,   B ≜ Q_B Λ_B Q_Bᵀ    (1761)

where eigenvalues are arranged in their respective diagonal matrix Λ in nonincreasing order, then an optimal solution [131]

    R* = Q_B Q_Aᵀ ∈ R^{N×N}    (1762)

to the two-sided orthogonal Procrustes problem

    minimize_R ‖A − RᵀBR‖_F subject to Rᵀ = R⁻¹
      = minimize_R tr( AᵀA − 2AᵀRᵀBR + BᵀB ) subject to Rᵀ = R⁻¹    (1763)

maximizes tr(AᵀRᵀBR) over the nonconvex manifold of orthogonal matrices. Optimal product R*ᵀBR* has the eigenvectors of A but the eigenvalues of B. [165, §7.5.1] The optimal value for the objective of minimization is, by (48),

    ‖Q_AΛ_AQ_Aᵀ − R*ᵀQ_BΛ_BQ_BᵀR*‖_F = ‖Q_A(Λ_A − Λ_B)Q_Aᵀ‖_F = ‖Λ_A − Λ_B‖_F    (1764)

while the corresponding trace maximization has optimal value

    sup_{Rᵀ=R⁻¹} tr(AᵀRᵀBR) = tr(AᵀR*ᵀBR*) = tr(Λ_AΛ_B) ≥ tr(AᵀB)    (1765)

The lower bound on inner product of eigenvalues is due to Fan (p.617).

C.4.0.2 Maximization

Any permutation matrix is an orthogonal matrix. Defining a row- and column-swapping permutation matrix (a reflection matrix, §B.5.3) having ones along its main antidiagonal and zeros elsewhere,

    Ξ = Ξᵀ ≜ [ 0 ··· 0 1 ; ⋮ ⋰ ⋮ ; 1 0 ··· 0 ]    (1766)

then an optimal solution R* to the maximization problem [sic]

    maximize_R ‖A − RᵀBR‖_F  subject to Rᵀ = R⁻¹    (1767)

minimizes tr(AᵀRᵀBR): [201] [240, §2.1] [241]

    R* = Q_B ΞQ_Aᵀ ∈ R^{N×N}    (1768)

The optimal value for the objective of maximization is

    ‖Q_AΛ_AQ_Aᵀ − R*ᵀQ_BΛ_BQ_BᵀR*‖_F = ‖Q_AΛ_AQ_Aᵀ − Q_AΞᵀΛ_BΞQ_Aᵀ‖_F = ‖Λ_A − ΞΛ_BΞ‖_F    (1769)

while the corresponding trace minimization has optimal value

    inf_{Rᵀ=R⁻¹} tr(AᵀRᵀBR) = tr(AᵀR*ᵀBR*) = tr(Λ_A ΞΛ_BΞ)    (1770)
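A Matlab sketch of the two-sided solution (1762) and optimal values (1764)-(1765), on random symmetric test matrices:

    % sketch: two-sided orthogonal Procrustes between symmetric matrices
    N = 5; A = randn(N); A = (A+A')/2; B = randn(N); B = (B+B')/2;
    [QA,LA] = eig(A); [la,ia] = sort(diag(LA),'descend'); QA = QA(:,ia);
    [QB,LB] = eig(B); [lb,ib] = sort(diag(LB),'descend'); QB = QB(:,ib);
    R = QB*QA';                                   % (1762)
    disp(norm(A - R'*B*R,'fro') - norm(la - lb))  % (1764): ~ 0
    disp(trace(A'*R'*B*R) - la'*lb)               % (1765): ~ 0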
C.4.1 Procrustes' relation to linear programming

Although these two-sided Procrustes problems are nonconvex, there is a connection with linear programming [12, §3] [240, §2.1] [241]: Given A, B ∈ Sᴺ, this semidefinite program in S and T

    minimize_R tr(AᵀRᵀBR) subject to Rᵀ = R⁻¹
      = maximize_{S,T∈Sᴺ} tr(S + T) subject to Aᵀ⊗B − I⊗S − T⊗I ⪰ 0    (1771)

(where ⊗ signifies Kronecker product (§D.1.2.1)) has optimal objective value (1770). These two problems in (1771) are strong duals (§2.13.1.0.3). Given ordered diagonalizations (1761), make the observation:

    inf_R tr(AᵀRᵀBR) = inf_{R̂} tr(Λ_A R̂ᵀΛ_B R̂)    (1772)

because R̂ ≜ Q_BᵀR Q_A on the set of orthogonal matrices (which includes the permutation matrices) is a bijection. This means, basically, diagonal matrices of eigenvalues Λ_A and Λ_B may be substituted for A and B, so only the main diagonals of S and T come into play:

    maximize_{S,T∈Sᴺ} 1ᵀδ(S + T) subject to δ( Λ_A⊗(ΞΛ_BΞ) − I⊗S − T⊗I ) ⪰ 0    (1773)

a linear program in δ(S) and δ(T) having the same optimal objective value as the semidefinite program (1771). We relate their results to Procrustes problem (1763) by manipulating signs (1714) and permuting eigenvalues:

    maximize_R tr(AᵀRᵀBR) subject to Rᵀ = R⁻¹
      = minimize_{S,T∈Sᴺ} 1ᵀδ(S + T) subject to δ( I⊗S + T⊗I − Λ_A⊗Λ_B ) ⪰ 0
      = minimize_{S,T∈Sᴺ} tr(S + T) subject to I⊗S + T⊗I − Aᵀ⊗B ⪰ 0    (1774)

This formulation has optimal objective value identical to that in (1765).

C.4.2 Two-sided orthogonal Procrustes via SVD
By making left- and right-side orthogonal matrices independent, we can push the upper bound on trace (1765) a little further: Given real matrices A, B each having full singular value decomposition (§A.6.3)

    A ≜ U_A Σ_A Q_Aᵀ ∈ R^{m×n} ,   B ≜ U_B Σ_B Q_Bᵀ ∈ R^{m×n}    (1775)

then a well-known optimal solution R*, S* to the problem

    minimize_{R,S} ‖A − SBR‖_F  subject to Rᴴ = R⁻¹ , Sᴴ = S⁻¹    (1776)

maximizes re tr(AᵀSBR): [316] [290] [52] [195] optimal orthogonal matrices

    S* = U_A U_Bᴴ ∈ R^{m×m} ,   R* = Q_B Q_Aᴴ ∈ R^{n×n}    (1777)

[sic] are not necessarily unique [203, §7.4.13] because the feasible set is not convex. The optimal value for the objective of minimization is, by (48),

    ‖U_AΣ_AQ_Aᴴ − S*U_BΣ_BQ_BᴴR*‖_F = ‖U_A(Σ_A − Σ_B)Q_Aᴴ‖_F = ‖Σ_A − Σ_B‖_F    (1778)

while the corresponding trace maximization has optimal value [43, §III.6.12]

    sup_{Rᴴ=R⁻¹, Sᴴ=S⁻¹} |tr(AᵀSBR)| = sup_{Rᴴ=R⁻¹, Sᴴ=S⁻¹} re tr(AᵀSBR) = re tr(AᵀS*BR*) = tr(Σ_AΣ_B) ≥ tr(AᵀB)    (1779)

for which it is necessary

    AᵀS*BR* ⪰ 0 ,   BR*AᵀS* ⪰ 0    (1780)
The lower bound on inner product of singular values in (1779) is due to von Neumann. Equality is attained if U_AᴴU_B = I and Q_BᴴQ_A = I.

C.4.2.1 Symmetric matrices

Now optimizing over the complex manifold of unitary matrices (§B.5.2), the upper bound on trace (1765) is thereby raised: Suppose we are given diagonalizations for (real) symmetric A, B (§A.5)

    A = W_A ΥW_Aᵀ ∈ Sⁿ ,   δ(Υ) ∈ K_M    (1781)

    B = W_B ΛW_Bᵀ ∈ Sⁿ ,   δ(Λ) ∈ K_M    (1782)

having their respective eigenvalues in diagonal matrices Υ, Λ ∈ Sⁿ arranged in nonincreasing order (membership to the monotone cone K_M (435)). Then by splitting eigenvalue signs, we invent a symmetric SVD-like decomposition

    A ≜ U_A Σ_A Q_Aᴴ ∈ Sⁿ ,   B ≜ U_B Σ_B Q_Bᴴ ∈ Sⁿ    (1783)

where U_A, U_B, Q_A, Q_B ∈ C^{n×n} are unitary matrices defined by (confer §A.6.5)

    U_A ≜ W_A √δ(ψ(δ(Υ))) ,   Q_A ≜ W_A √δ(ψ(δ(Υ)))ᴴ ,   Σ_A = |Υ|    (1784)

    U_B ≜ W_B √δ(ψ(δ(Λ))) ,   Q_B ≜ W_B √δ(ψ(δ(Λ)))ᴴ ,   Σ_B = |Λ|    (1785)

where step function ψ is defined in (1612). In this circumstance,

    S* = U_AᴴU_B = R*ᵀ ∈ C^{n×n}    (1786)

The optimal matrices (1777), now unitary, are related by transposition. The optimal value of objective (1778) is

    ‖U_AΣ_AQ_Aᴴ − S*U_BΣ_BQ_BᴴR*‖_F = ‖ |Υ| − |Λ| ‖_F    (1787)

while the corresponding optimal value of trace maximization (1779) is

    sup_{Rᴴ=R⁻¹, Sᴴ=S⁻¹} re tr(AᵀSBR) = tr( |Υ| |Λ| )    (1788)

C.4.2.2 Diagonal matrices

Now suppose A and B are diagonal matrices

    A = Υ = δ²(Υ) ∈ Sⁿ ,   δ(Υ) ∈ K_M    (1789)

    B = Λ = δ²(Λ) ∈ Sⁿ ,   δ(Λ) ∈ K_M    (1790)

both having their respective main diagonal entries arranged in nonincreasing order:

    minimize_{R,S} ‖Υ − SΛR‖_F  subject to Rᴴ = R⁻¹ , Sᴴ = S⁻¹    (1791)

Then we have a symmetric decomposition from unitary matrices as in (1783) where

    U_A ≜ √δ(ψ(δ(Υ))) ,   Q_A ≜ √δ(ψ(δ(Υ)))ᴴ ,   Σ_A = |Υ|    (1792)

    U_B ≜ √δ(ψ(δ(Λ))) ,   Q_B ≜ √δ(ψ(δ(Λ)))ᴴ ,   Σ_B = |Λ|    (1793)

Procrustes solution (1777) again sees the transposition relationship (1786)

    S* = U_AᴴU_B = R*ᵀ ∈ C^{n×n}

but both optimal unitary matrices are now themselves diagonal. So,

    S*ΛR* = δ(ψ(δ(Υ)))Λδ(ψ(δ(Λ))) = δ(ψ(δ(Υ))) |Λ|    (1794)
C.5 Nonconvex quadratics

Given A ∈ Sⁿ, b ∈ Rⁿ, and ρ > 0 [349, §2] [323, §2]

    minimize_x (1/2)xᵀAx + bᵀx  subject to ‖x‖ ≤ ρ    (1795)

is a nonconvex problem for symmetric A unless A ⪰ 0. But necessary and sufficient global optimality conditions are known for any symmetric A: vector x* solves (1795) iff ∃ Lagrange multiplier λ* ≥ 0 such that

    i)   (A + λ*I)x* = −b
    ii)  λ*(‖x*‖ − ρ) = 0 ,   ‖x*‖ ≤ ρ
    iii) A + λ*I ⪰ 0    (1796)

λ* is unique.^{C.5} x* is unique if A + λ*I ≻ 0. Equality constrained problem

    minimize_x (1/2)xᵀAx + bᵀx  subject to ‖x‖ = ρ    (1797)

is nonconvex for any symmetric A matrix. x* solves (1797) iff ∃ λ* ∈ R satisfying the associated conditions

    i)   (A + λ*I)x* = −b
    ii)  ‖x*‖ = ρ
    iii) A + λ*I ⪰ 0

λ* and x* are unique as before. Hiriart-Urruty disclosed global optimality conditions in 1998 [198] for maximizing a convexly constrained convex quadratic; a nonconvex problem. [309, §32]

^{C.5} The first two are necessary KKT conditions [61, §5.5.3] while condition iii governs passage to nonconvex global optimality.
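Conditions (1796) can be exercised numerically by searching for λ* directly; the Matlab sketch below forces A indefinite so the solution lies on the boundary (an assumption of this test, not of the theory):

    % sketch: solving (1795) via its optimality conditions (1796)
    n = 4; A = randn(n); A = (A+A')/2;
    A = A - mean(eig(A))*eye(n);              % force indefiniteness: tr A = 0
    b = randn(n,1); rho = 1;
    ls = max(0, -min(eig(A)));                % smallest shift making A+l*I PSD
    f = @(l) norm((A + l*eye(n))\b) - rho;    % boundary residual in lambda
    lam = fzero(f, [ls+1e-9, ls+1e3]);        % boundary multiplier lambda*
    x = -(A + lam*eye(n))\b;                  % i)
    disp(norm(x) - rho)                       % ii): ~ 0
    disp(min(eig(A + lam*eye(n))))            % iii): >= 0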

Appendix D

Matrix calculus

    From too much study, and from extreme passion, cometh madnesse.   −Isaac Newton [155, §5]

D.1 Directional derivative, Taylor series

D.1.1 Gradients

Gradient of a differentiable real function f(x) : R^K → R with respect to its vector argument is defined uniquely in terms of partial derivatives

    ∇f(x) ≜ [ ∂f(x)/∂x₁ ; ∂f(x)/∂x₂ ; ⋮ ; ∂f(x)/∂x_K ] ∈ R^K    (1798)

while the second-order gradient of the twice differentiable real function with respect to its vector argument is traditionally called the Hessian:

    ∇²f(x) ≜
    [ ∂²f(x)/∂x₁²       ∂²f(x)/∂x₁∂x₂   ···   ∂²f(x)/∂x₁∂x_K
      ∂²f(x)/∂x₂∂x₁     ∂²f(x)/∂x₂²     ···   ∂²f(x)/∂x₂∂x_K
      ⋮                  ⋮               ⋱     ⋮
      ∂²f(x)/∂x_K∂x₁    ∂²f(x)/∂x_K∂x₂  ···   ∂²f(x)/∂x_K² ]  ∈ S^K    (1799)
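Definition (1798) is easy to test against finite differences; a Matlab sketch for the quadratic f(x) = xᵀMx with symmetric M (whose gradient is 2Mx and whose Hessian (1799) is the constant 2M):

    % sketch: central-difference check of the gradient (1798)
    K = 4; M = randn(K); M = (M+M')/2;
    f = @(x) x'*M*x;
    x = randn(K,1); h = 1e-6; g = zeros(K,1);
    for i = 1:K
        e = zeros(K,1); e(i) = h;
        g(i) = (f(x+e) - f(x-e))/(2*h);
    end
    disp(norm(g - 2*M*x))   % ~ 0: matches the analytic gradient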
The gradient of vector-valued function v(x) : R → RN on real domain is a row-