Approximation of Large-Scale Dynamical Systems

A.C. Antoulas
Department of Electrical and Computer Engineering
Rice University, Houston, Texas 77251-1892, USA
email: aca@rice.edu
URL: http://www.ece.rice.edu/~aca

SIAM Book Series: Advances in Design and Control

April 14, 2004
Αφιερώνεται στούς γονείς μου
Θάνος Κ. Αντούλας

Dedicated to my parents
Thanos C. Antoulas

Μέ λογισμό καί μ’ όνειρο
Οι Ελεύθεροι Πολιορκημένοι, Σχεδίασμα Γ’, Διονύσιος Σολωμός

With reflection and with vision
The Free Besieged, Third Draft, Dionysios Solomos
Contents
Preface
   0.1  Acknowledgements
   0.2  How to use this book

Part I  Introduction

1  Introduction
   1.1  Problem setup
        1.1.1  Approximation by projection
   1.2  Summary of contents

2  Motivating examples
   2.1  Passive devices
   2.2  Weather prediction - data assimilation
        2.2.1  North Sea wave surge forecast
        2.2.2  Pacific storm tracking
        2.2.3  America's Cup
   2.3  Air quality simulations - data assimilation
   2.4  Biological systems: honeycomb vibrations
   2.5  Molecular dynamics
   2.6  ISS - International Space Station
   2.7  Vibration/acoustic systems
   2.8  CVD reactors
   2.9  MEMS devices
        2.9.1  Micromirrors
        2.9.2  Elk sensor
   2.10 Optimal cooling of steel profile
Part II  Preliminaries

3  Tools from matrix theory
   3.1  Norms of finite-dimensional vectors and matrices
   3.2  The singular value decomposition
        3.2.1  Three proofs
        3.2.2  Properties of the SVD
        3.2.3  Comparison with the eigenvalue decomposition
        3.2.4  Optimal approximation in the 2-induced norm
        3.2.5  Further applications of the SVD
        3.2.6  SDD: the semi-discrete decomposition*
   3.3  Basic numerical analysis
        3.3.1  Condition numbers
        3.3.2  Stability of solution algorithms
        3.3.3  Two consequences
        3.3.4  Distance to singularity
        3.3.5  LAPACK software
   3.4  General rank additive matrix decompositions*
   3.5  Majorization and interlacing*
   3.6  Chapter summary

4  Linear dynamical systems: Part I
   4.1  External description
   4.2  Internal description
        4.2.1  The concept of reachability
        4.2.2  The state observation problem
        4.2.3  The duality principle in linear systems
   4.3  The infinite gramians
        4.3.1  The energy associated with reaching/observing a state
        4.3.2  The cross-gramian
        4.3.3  A transformation between continuous- and discrete-time systems
   4.4  The realization problem
        4.4.1  The solution of the realization problem
        4.4.2  Realization of proper rational matrix functions
        4.4.3  Symmetric systems and symmetric realizations
        4.4.4  The partial realization problem
   4.5  The rational interpolation problem*
        4.5.1  Problem formulation*
        4.5.2  The Löwner matrix approach to rational interpolation*
        4.5.3  Generating system approach to rational interpolation*
   4.6  Chapter summary

5  Linear dynamical systems: Part II
   5.1  Time and frequency domain spaces and norms
        5.1.1  Banach and Hilbert spaces
        5.1.2  The Lebesgue spaces ℓp and Lp
        5.1.3  The Hardy spaces hp and Hp
        5.1.4  The Hilbert spaces ℓ2 and L2
   5.2  The spectrum and singular values of the convolution operator
   5.3  Computation of the 2-induced or H∞ norm
   5.4  The Hankel operator and its spectra
        5.4.1  Computation of the eigenvalues of H
        5.4.2  Computation of the singular values of H
        5.4.3  The Hilbert-Schmidt norm
        5.4.4  The Hankel singular values of two related operators
   5.5  The H2 norm
        5.5.1  A formula based on the gramians
        5.5.2  A formula based on the EVD of A
   5.6  Further induced norms of S and H*
   5.7  Summary of norms
   5.8  System stability
        5.8.1  Review of system stability
        5.8.2  Lyapunov stability
        5.8.3  L2-systems and norms of unstable systems
   5.9  System dissipativity*
        5.9.1  Passivity and the positive real lemma*
        5.9.2  Contractivity and the bounded real lemma*
   5.10 Chapter summary

6  Sylvester and Lyapunov equations
   6.1  The Sylvester equation
        6.1.1  The Kronecker product method
        6.1.2  The eigenvalue/complex integration method
        6.1.3  The eigenvalue/eigenvector method
        6.1.4  Characteristic polynomial methods
        6.1.5  The invariant subspace method
        6.1.6  The sign function method
        6.1.7  Solution as an infinite sum
   6.2  The Lyapunov equation and inertia
   6.3  Sylvester and Lyapunov equations with triangular coefficients
        6.3.1  The Bartels-Stewart algorithm
        6.3.2  The Lyapunov equation in the Schur basis
        6.3.3  The square root method for the Lyapunov equation
        6.3.4  Algorithms for Sylvester and Lyapunov equations
   6.4  Numerical issues: forward and backward stability
   6.5  Chapter summary
Part III  SVD-based approximation methods

7  Balancing and balanced approximations
   7.1  The concept of balancing
   7.2  Model reduction by balanced truncation
        7.2.1  Proof of the two theorems
        7.2.2  H2 norm of the error system for balanced truncation
   7.3  Numerical issues: four algorithms
   7.4  A canonical form for continuous-time balanced systems
   7.5  Other types of balancing*
        7.5.1  Lyapunov balancing*
        7.5.2  Stochastic balancing*
        7.5.3  Bounded real balancing*
        7.5.4  Positive real balancing*
   7.6  Frequency weighted balanced truncation*
        7.6.1  Frequency weighted balanced truncation*
        7.6.2  Frequency weighted balanced reduction without weights*
        7.6.3  Balanced reduction using time-limited gramians*
        7.6.4  Closed-loop gramians and model reduction*
        7.6.5  Balancing for unstable systems*
   7.7  Chapter summary

8  Hankel-norm approximation
   8.1  Introduction
        8.1.1  Main ingredients
   8.2  The AAK theorem
   8.3  The main result
   8.4  Construction of approximants
        8.4.1  A simple input-output construction method
        8.4.2  An analogue of Schmidt-Eckart-Young-Mirsky
        8.4.3  Unitary dilation of constant matrices
        8.4.4  Unitary system dilation: suboptimal case
        8.4.5  Unitary system dilation: optimal case
        8.4.6  All suboptimal solutions
   8.5  Error bounds for optimal approximants
   8.6  Balanced and Hankel-norm approximation: a polynomial approach*
        8.6.1  The Hankel operator and its SVD*
        8.6.2  The Nehari problem and Hankel-norm approximants*
        8.6.3  Balanced canonical form*
        8.6.4  Error formulae for balanced and Hankel-norm approximations*
   8.7  Chapter summary

9  Special topics in SVD-based approximation methods
   9.1  Proper Orthogonal Decomposition (POD)
        9.1.1  POD
        9.1.2  Galerkin and Petrov-Galerkin projections
        9.1.3  POD and balancing
   9.2  Modal approximation
   9.3  Truncation and residualization
   9.4  A study of the decay rates of the Hankel singular values*
        9.4.1  Preliminary remarks on decay rates*
        9.4.2  Cauchy matrices*
        9.4.3  Eigenvalue decay rates for system gramians*
        9.4.4  Decay rate of the Hankel singular values*
        9.4.5  Numerical examples and discussion*
   9.5  Assignment of eigenvalues and Hankel singular values*
   9.6  Chapter summary
Part IV  Krylov-based approximation methods

10 Eigenvalue computations
   10.1 Introduction
        10.1.1  Condition numbers and perturbation results
   10.2 Pseudospectra*
   10.3 Iterative methods for eigenvalue estimation
        10.3.1  The Rayleigh-Ritz procedure
        10.3.2  The simple vector iteration (Power method)
        10.3.3  The inverse vector iteration
        10.3.4  Rayleigh quotient iteration
        10.3.5  Subspace iteration
   10.4 Krylov methods
        10.4.1  A motivation for Krylov methods
        10.4.2  The Lanczos method
        10.4.3  Convergence of the Lanczos method
        10.4.4  The Arnoldi method
        10.4.5  Properties of the Arnoldi method
        10.4.6  An alternative way of looking at Arnoldi
        10.4.7  Properties of the Lanczos method
        10.4.8  Two-sided Lanczos
        10.4.9  The Arnoldi and Lanczos algorithms
        10.4.10 Rational Krylov methods
        10.4.11 Implicitly restarted Arnoldi and Lanczos methods
   10.5 Chapter summary

11 Model reduction using Krylov methods
   11.1 The moments of a function
   11.2 Approximation by moment matching
        11.2.1  Two-sided Lanczos and moment matching
        11.2.2  Arnoldi and moment matching
   11.3 Partial realization and rational interpolation by projection
        11.3.1  Two-sided projections
        11.3.2  How general is model reduction by rational Krylov?
        11.3.3  Model reduction with preservation of stability and passivity
   11.4 Chapter summary
Part V  SVD-Krylov methods and case studies

12 SVD-Krylov methods
   12.1 Connection between SVD and Krylov methods
        12.1.1  Krylov methods and the Sylvester equation
        12.1.2  From weighted balancing to Krylov
   12.2 Introduction to iterative methods
   12.3 An iterative method for approximate balanced reduction
        12.3.1  Approximate balancing through low rank approximation of the cross gramian
   12.4 Iterative Smith-type methods for model reduction
        12.4.1  ADI, Smith, and cyclic Smith(l) iterations
        12.4.2  LR-ADI and LR-Smith(l) iterations
        12.4.3  The modified LR-Smith(l) iteration
        12.4.4  A discussion on the approximately balanced reduced system
        12.4.5  Relation of the Smith method to the trapezoidal rule
   12.5 Chapter summary

13 Case studies
   13.1 Approximation of images
   13.2 Model reduction algorithms applied to low order systems
        13.2.1  Building model
        13.2.2  Heat diffusion model
        13.2.3  The CD player
        13.2.4  Clamped beam model
        13.2.5  Low-pass Butterworth filter
   13.3 Approximation of the ISS 1R and 12A flex models
        13.3.1  Stage 1R
        13.3.2  Stage 12A
   13.4 Iterative Smith-type methods
        13.4.1  The CD player
        13.4.2  A system of order n = 1006
        13.4.3  Aluminum plate n = 3468
        13.4.4  Heat distribution on a plate n = 20736
   13.5 Chapter summary

14 Epilogue
   14.1 Projectors, computational complexity and software
   14.2 Open problems

15 Problems

Bibliography

Index
List of Figures

The broad setup
Applications leading to large-scale dynamical systems
Explicit finite-dimensional dynamical system
Flowchart of approximation methods and their interconnections
Evolution of IC design [330]
An RLC circuit
Wave surge prediction problem: depth of the North Sea
Wave surge prediction problem: discretization grid close to the coast
Wave surge prediction problem: measurement locations
Wave surge prediction problem: error covariance of the estimated water level without measurements (above) and with assimilation of 8 water level measurements (below); the bar on the right shows the color-coding of the error
Ozone concentrations in Texas and vicinity on 8 September 1993
Honeybee waggle dance. Left: figure "eight" dances; right: dancer and followers
Molecular dynamics: Myoglobin (protein). Heme: active site. Histidine: part that opens and closes catching oxygen molecules
Frequency response of the ISS as complexity increases
The CVD reactor and the required discretization grid
MEMS rollover sensor
Optimal cooling: discretization grid
Progression of the cooling of a steel profile (from left to right and from top to bottom); the bars are cooled from 1000°C to 500°C
Unit balls in R2: 1-norm (inner square), 2-norm (circle), ∞-norm (outer square)
Quantities describing the singular value decomposition in R2
Approximation of an image using the SVD. Left: original; middle: rank 2; right: rank 1 approximants
Parallel connection of a capacitor and an inductor
Electric circuit example: minimum energy inputs steering the system to x1 = [1 0]∗, for T = 1; N = 4 (circles), N = 6 (crossed circles), N = 10 (dots)
Feedback interpretation of the parametrization of all solutions of the rational interpolation problem
Feedback interpretation of recursive rational interpolation
Cascade (feedback) decomposition of scalar systems
A general cascade decomposition of systems
Spectrum of the convolution operator S compared with the eigenvalues of the finite approximants SN
The action of the Hankel operator: past inputs are mapped into future outputs. Left plot: time axis running backward; right plot: time axis running forward
Top pane: square pulse input applied between t = −1 and t = 0. Lower pane: resulting output considered for t > 0
16th order Butterworth filter. Upper left frame: Hankel singular values; upper right frame: Nyquist plot; lower left frame: impulse response and step response; lower right frame: total variation of step response
Locus of the eigenvalues of the Hamiltonian as a function of γ; the zero crossings are the Hankel singular values
The first 6 Hankel singular values (left); plot of the determinant versus 0 ≤ γ ≤ 1 (right); the abscissa is γ
Balancing transformations for Butterworth filters of order 1-30
Left figure: errors of three balancing transformations. Right figure: errors of three observability Lyapunov equations
Approximation of the unstable system in example 7.1 by balanced truncation
Approximation of the unstable system in example 7.2 by balanced truncation
Construction of approximants which are optimal and suboptimal in the Hankel norm
Hankel norm approximation: dimension δ of all-pass dilation systems; δ+ of stable part of all-pass dilation systems
ANALOG FILTER APPROXIMATION: Hankel singular values
ANALOG FILTER APPROXIMATION: Bode plots of the error systems for model reduction by optimal Hankel-norm approximation (continuous curves), balanced truncation (dash-dot curves), and the upper bound (8.21) (dash-dash curves)
Left pane: decay rates for Penzl's example. Right pane: fast eigendecay rate for A with clustered eigenvalues having dominant imaginary parts
The first 10 Chebyshev polynomials
Eigenvalue decay for the discrete Laplacian
Eigenvalue decay for shifted random matrices
Eigenvalue decay for V-shaped spectrum
Eigenvalue decay for a spectrum included in a pentagon-shaped region
Fast eigendecay rate for large cond(A)
Eigendecay rate vs Cholesky estimates for some real LTI systems. Right pane: comparison of estimates for a nonsymmetric A
Left pane: absolute value of the first 100 coefficients of the expansion of H (log scale). Right pane: five pairs of γ/σ curves for all-pole systems of order 19
MIMO cases: the effect of different p
Left pane: normalized Hankel singular values of the heat model, the building example, the CD player model, the clamped beam model, the Butterworth filter. Right pane: relative degree reduction versus error tolerance σk/σ1
Perturbation of the eigenvalues of a 12 × 12 Jordan block (left) and its pseudospectra for real perturbations with ε = 10−9, 10−8, ..., 10−6 (right)
Leftmost pane: pseudospectra of a 10 × 10 companion matrix for ε = 10−9, ..., 10−2. Second pane from left: pseudospectra of a 50 × 50 upper triangular matrix for ε = 10−14, 10−12, ..., 10−2. Second pane from right: complex pseudospectrum of the 3 × 3 matrix above for ε = 0.025. Rightmost pane: real pseudospectrum of the same matrix for the same ε
10−3 and 10−2 pseudospectra of the 2 × 2 matrix of example 3.
Plot of the eigenvalues of a symmetric 20 × 20 matrix using the symmetric Lanczos procedure
The Arnoldi process: given A ∈ Rn×n, construct V ∈ Rn×k with orthonormal columns, such that H ∈ Rk×k is an upper Hessenberg matrix and only the last column of the residual R ∈ Rn×k is nonzero
The Arnoldi and the two-sided Lanczos algorithms
The Implicitly Restarted Arnoldi algorithm
An implicitly restarted method for the cross gramian
The normalized singular values of the clown and galaxies NGC6822 and NGC6782
The original images together with the 10% and 5% approximants
Hankel singular values and poles of the 1R ISS model
The leading Hankel singular values (top) and the poles (bottom) of the 12A ISS model
Upper panes: Bode plots of original and reduced systems. Lower panes: error Bode plots
Upper panes: Bode plots of original and reduced models for the CD player model. Lower panes: error Bode plots
σmax of the frequency response of the reduced systems (upper plot) and of the error systems of the clamped beam (lower plot)
Upper panes: Bode plots of original and reduced systems for the Butterworth filter. Lower panes: error Bode plots
Right pane: Bode plots of the original system and three approximants
. . . . . . . . . . . . . . . Left pane: Bode plot of original and reduced systems for the heat diffusion model. 10−2 . . . . . . . . . . .2. . . . . . . . . P Pk Left pane: k i=1 log γi  (upper curve) and i=1 log σi (lower curve). . . . . . . . . . . . . . . . . . . Model reduction plots of the 1R ISS model . . . . . . . . . . . . . . . . . . . . . . Eigendecay rate and Cholesky estimates for the slowdecay cases . . Left pane: Comparison of estimates for the discrete Laplacian. . . . .6 10. . . . . . . . . . . . . . . . .
. . k k Top left pane: the normalized Hankel singular values σi ’s. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . k k Frequency response of FOM . . . . . . . ΣSl k and Σk . . . . . and σ ˜i . . . . . .1 15. . . σ ˆi .14 13.13 Model reduction plots of the 12A ISS model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C = 1pF . . . Plots for the CD player. . . . . . . . . . Upper left pane: the normalized Hankel singular values σi . . . . . Top right pane: the amplitude Bode ˜ plot of error system ΣSl k − Σk for the 1006 example.3 334 334 335 335 Heatconducting rod . . . . (b) Amplitude Bode plots of the error systems for heat example i “oa” 2004/4/14 page xiii i xiii 330 332 13. . . . . 2004 i i i i . . . . .1Ω. . . . . . . . . . . . . . R The kth section of an RLC circuit . . . . . . . . . .2 15. . . . . . . . . . . .17 15. . . . . . . . . . . . . . . Lower left pane: amplitude Bode plots of the FOM Σ and ˜ the reduced systems Σk . . .12 13. . . . . . Bottom right pane: the amplitude Bode plots of error systems ˜ Σ − Σk .16 13. .i i List of Figures 13. . . . expansion point s = 100 (right pane). . . . . (a) Amplitude Bode plots of the FOM and ROM. . . Lower right pane: amplitude Bode plots of the error systems Σ − Σk . . . . . . . . . . . . and σ ˜i ’s. . . . . . . . . . . . 354 A section of an RLC cascade circuit: R = . 350 ¯ = 100Ω. . . . . ΣSl k and Σk . . . . . . . . . . . . . . . Bottom left pane: the amplitude Bode plots of the FOM ˜ Σ and the reduced systems Σk . ˜ Σ − ΣSl and Σ − Σ . . . . . Upper right ˜ pane: amplitude Bode plot of error system ΣSl k − Σk . . . . . . . . . . . . . . . . Σ − ΣSl and Σ − Σ for the n=1006 example. 356 ACA April 14. . . . . . . . . . . . . . . σ ˆi ’s. . L = 10pH .15 13. . . . . Rational Krylov method: expansion point s = 0 (left pane). . . .
Preface

Lectori Salutem.

This book deals with important problems and results on the interface between three central areas of (applied) mathematics: linear algebra, numerical analysis, and system theory. Whereas the first two of these fields have had a very fruitful symbiosis for a long time in the form of numerical linear algebra, the third area has, until recently, remained somewhat distant from all this. The book in front of you is an example of this rapprochement.

In center stage in this book, we find the problem of model reduction. This problem can be formulated as follows. Given a linear time-invariant input/output system defined by the convolution integral, a state space representation, a transfer function, or their discrete-time counterparts, approximate it by a simpler one. As all approximation questions, this problem can be viewed as a trade-off between complexity and misfit. We are looking for a system of minimal complexity that approximates the original one (optimally) with a maximal allowed misfit, or, conversely, we are looking for a system that approximates the original one with minimal misfit within the class of systems with maximal admissible complexity. Three key questions arise:

1. What is meant by the complexity of a linear time-invariant system? How is the complexity computed from the impulse response matrix, a state space representation, a transfer function, or from another system representation?
2. What is meant by the misfit between two systems? How is this misfit computed?
3. Give algorithms that compute optimal approximants.

The complexity issue leads to the theory of state space representations, commonly called realization theory, originally championed in the work of Kalman. This theory is one of the most beautiful and useful parts of system theory. The misfit issue leads to an in-depth discussion of matrix, operator, and system norms. There are numerous inequalities relating matrix norms, and some of these extend to system norms. Two important system norms emerge: the L2-induced, or H∞, norm and the Hankel norm.

The issue of system approximation centers around the singular values of the associated Hankel operator. Two effective approximation algorithms, both related to these Hankel singular values, are: (i) approximation by balancing, and (ii) AAK model reduction. Model reduction by balancing is based on a very elegant method of finding a state space representation of a system with state components that are, so to speak, equally controllable as observable. This reduction method is heuristic, but it is shown that the resulting reduced system has very good properties. AAK model reduction is based on a remarkable result of Adamjan, Arov, and Krein, three Russian mathematicians who proved that a Hankel operator can be approximated equally well within the class of Hankel operators as in the class of general linear operators. While neither balancing nor AAK are optimal reductions in the all-important H∞ norm, very nice inequalities bounding the H∞ approximation error can be derived. Perhaps the most remarkable of these relations is the inequality that bounds the H∞ norm of a system by twice the sum (without repetition) of its Hankel singular values.

Unfortunately, these singular value oriented methods are computationally rather complex, requiring of the order of n³ operations, where n denotes the dimension of the state space of the system to be approximated. Together with accuracy considerations, this makes these methods applicable to systems of only modest dimension (a few hundred variables).

The book also introduces a second set of approximation methods, based on moment matching. In system theory language, this moment matching is a generalization of the well-known partial realization problem. These methods can be iteratively implemented using standard algorithms from numerical linear algebra, namely the Krylov iterative methods. These schemes were originally developed for computing eigenvalues and eigenvectors, but can be applied to model reduction via moment matching. Typically, these methods require only of the order of n² operations. The downside, however, is that stability of the reduced model is not guaranteed, nor is there a known global error bound. This brings us to the last part of the book, which aims at combining the SVD based methods and the Krylov methods in what are called SVD–Krylov methods.

The SVD based approach can be extended to nonlinear systems. The resulting method is known as POD (Proper Orthogonal Decomposition) and is widely used by the PDE community.

The scope of this book is the complete theory of mostly linear system approximation. Special attention is paid to numerical aspects, simulation questions, and practical applications. It is hard to overestimate the importance of the theory presented in this book. The mathematical ideas underlying the interplay of the singular value decomposition and linear system theory are among the most refined mathematical ideas in the field of system theory. I believe that its impact (for example, for numerical simulation of PDE's) has not yet been fully achieved. The book in front of you is unique in its coverage, and promises to be a stimulating experience to everyone interested in mathematics and its relevance to practical problems.

Jan C. Willems
Leuven, 03-03-03
0.1 Acknowledgements

There are numerous individuals without whose help this book would not have been completed. First, I would like to thank Jan Willems for his friendship and inspiration for over a quarter century, and for writing the Preface of this book. I would also like to thank Paul van Dooren, longtime friend and colleague from our years as graduate students under R.E. Kalman, for his advice on the book while I was visiting Louvain-la-Neuve. The next recipient of my gratitude is Paolo Rapisarda, who was always willing to read critically and provide invaluable advice over extensive portions of the book. But I am even more thankful to him for introducing me to Dan Sorensen. Meeting Dan was an event which added a new dimension to my research. Dan took it upon himself to teach me numerical linear algebra — not some watered-down version, but the "real deal." Dan is also acknowledged for contributing part of the section on the decay rates, as well as part of the last two chapters.

Many thanks go furthermore to Mark Embree for numerous discussions on pseudospectra and for his very careful reading and substantial comments on several chapters. Next, special thanks go to Serkan Gugercin, who contributed in many ways over the last three years, and to whom most numerical experiments and figures are due. I would also like to thank Angelika Bunse-Gerstner, Peter Benner, Alessandro Astolfi, Matthias Heinkenschloss, Michael Hinze, and Stefan Volkwein for reading and commenting on various parts of the book. Thanks go also to Brian Anderson, Roberto Tempo, Siep Weiland, Yunkai Zhou, and Caroline Boss for their comments. I would also like to thank the anonymous referees and several students who commented on the book at various stages. Finally, my acknowledgements go to Yutaka Yamamoto, and I would like to thank the editors at SIAM for their professional and efficient handling of this project.

0.2 How to use this book

• A first course in model reduction would consist of the following sections:
  – Chapter 3 (selected sections)
  – Chapter 4 (selected sections)
  – Chapter 5 (selected sections)
  – Chapter 6
  – Chapter 7 (selected sections)
  – Chapter 10 (selected sections)
  – Chapter 11 (selected sections)

• Prerequisites. The most important prerequisite is familiarity with linear algebra. Knowledge of elementary system theory and numerical analysis is desirable. The target readership consists of graduate students and researchers in the fields of System and Control Theory, Numerical Analysis, and Theory of PDE/CFD (Partial Differential Equations/Computational Fluid Dynamics), and in general of anyone interested in model reduction.

• Sections omitted at first reading. Sections marked with an asterisk ∗ contain material which can be omitted at first reading.

• Notation
R (R+, R−) : real numbers (positive, negative)
C (C+, C−) : complex numbers (with positive, negative real part)
Z (Z+, Z−) : integers (positive, negative)
λmax(M) : largest (in magnitude) eigenvalue of M ∈ Rn×n
δmax(M) : largest (in magnitude) diagonal entry of M ∈ Rn×n
adj M : adjoint (matrix of cofactors) of M ∈ Rn×n
χM(s) : characteristic polynomial of M ∈ Rn×n
rank M : rank of M ∈ Rn×m
σi(M) : ith singular value of M ∈ Rn×m
κ(M) : condition number of M ∈ Rn×m
M∗ : transpose if M ∈ Rn×m; complex conjugate transpose if M ∈ Cn×m
M(:,j) : jth column of M ∈ Rn×m
M(i,:) : ith row of M ∈ Rn×m
spec(M) : spectrum (set of eigenvalues) of M ∈ Rn×n
specε(M) : pseudospectrum of M ∈ Rn×n
in(A) : inertia of A ∈ Rn×n
Σ : general dynamical system
U, X, Y : input, state, output spaces
Σ = (A, B, C, D) : linear system matrices
Σ = (A, B, C) : linear system with missing D
Σ = (A, B) : linear system with missing C and D
Σ = (A, C) : linear system with missing B and D
Σ+ : stable subsystem of Σ
Σ∗ : dual or adjoint of Σ
Π : projection
Σ̂ : reduced order dynamical system
Σ̂ = (Â, B̂, Ĉ, D̂) : system matrices of linear reduced order system
S : convolution operator (discrete and continuous time)
S∗ : adjoint of convolution operator
hk ∈ Rp×m : Markov parameters
h(t) : impulse response (discrete- and continuous-time)
H(s) : transfer function (discrete- and continuous-time)
H+(s) : transfer function of stable subsystem of Σ
‖x‖p : p-norm of x ∈ Rn
‖A‖p,q : (p, q)-induced norm of A
fl(x) : floating point representation of x ∈ R
e^{Mt} : matrix exponential
R(A, B) : reachability matrix
Rn(A, B) : finite reachability matrix
P(t) : (finite) reachability gramian
P : infinite reachability gramian (continuous and discrete time; also in the frequency domain)
O(C, A) : observability matrix
On(C, A) : finite observability matrix
Q(t) : (finite) observability gramian
Q : infinite observability gramian (continuous and discrete time; also in the frequency domain)
X : cross gramian (discrete- and continuous-time case)
H : Hankel matrix
P : array of pairs of points for rational interpolation
L : Löwner matrix
(s) : Lagrange interpolating polynomial
Rr : generalized reachability matrix
Op : generalized observability matrix
D : time-series associated with an array
Θ : generating system / Lyapunov function / storage function
s : supply function
⟨·, ·⟩ : inner product
ℓp^n(I), Lp(I) : Lebesgue spaces of functions
hp^{q×r}, Hp^{q×r} : Hardy spaces of functions
‖F‖h∞, ‖F‖H∞ : h∞ norm, H∞ norm
‖F‖∞, ‖F‖L∞ : ∞-norm, L∞ norm
‖Σ‖2 : 2-norm of the system Σ
‖S‖2−ind : 2-induced norm of the convolution operator S
‖H‖H∞ : H∞ norm of the transfer function
H : Hankel operator of Σ (discrete- and continuous-time)
H∗ : adjoint of continuous-time Hankel operator
σi(Σ) : Hankel singular values
‖Σ‖H : Hankel norm of Σ, ‖Σ‖H = σ1(Σ)
‖H‖2−ind : 2-induced norm of the Hankel operator, ‖H‖2−ind = σ1(Σ)
‖Σ‖H2 : H2 norm of Σ
‖f‖(p,q) : mixed norm of vector-valued function of time
‖T‖(p,q),(r,s) : mixed induced operator norm
Pi : reachability input-weighted gramian
Qo : observability output-weighted gramian
P(ω) : frequency-limited reachability gramian
Q(ω) : frequency-limited observability gramian
S(ω) : integral of log of resolvent (sI − A)−1
P(T) : time-limited reachability gramian
Q(T) : time-limited observability gramian
C(x, y) : Cauchy matrix
d(x, y) : diagonal of Cauchy matrix
≺ : multiplicative majorization
∇ρ : gradient
ηk, ηk(s0) : moments, generalized moments of h
Part I

Introduction
Chapter 1

Introduction

In today's technological world, physical as well as artificial processes are described mainly by mathematical models. These can be used to simulate the behavior of the processes in question. Sometimes, they are also used to modify or control their behavior. In this framework of mathematical models, there is an ever increasing need for improved accuracy, which leads to models of high complexity. The basic motivation for system approximation is the need for simplified models of dynamical systems, which capture the main features of the original complex model. This need arises from limited computational, accuracy, and storage capabilities. The simplified model is then used in place of the original complex model, either for simulation or for control.

The weather on the one hand and Very Large Scale Integration (VLSI) circuits on the other, the former physical, the latter artificial, constitute examples of such processes. Furthermore these are dynamical systems, as their future behavior depends on the past evolution. In the former case (simulation) one seeks to predict the system behavior; prominent examples include weather prediction and air quality simulations. However, often simulation of the full model is not feasible. Consequently, an appropriate simplification of this model is necessary, resulting in simulation with reduced computational complexity. In such cases reduced simulation models are essential for the quality and timeliness of the prediction. Other methods for accelerating the simulation time exist, like parallelization of the corresponding algorithm. (These aspects however are not addressed here.)

In the latter case (control) we seek to modify the system behavior to conform with certain desired performance specifications; e.g., we seek to control a CD (Compact Disk) player in order to decrease its sensitivity to disturbances (outside shocks). Such modifications are achieved in the vast majority of cases by interconnecting the original system with a second dynamical system, called the controller. Generically, the complexity of the controller (the number of first-order differential or difference equations describing its behavior) is approximately the same as that of the system to be controlled. Hence if the latter has high complexity, so will the controller. This however has three potential problems:

Computational speed: due to limited computational speed, the time needed to compute the parameters of such a controller may be prohibitively large.
Accuracy: due to computational considerations (ill-conditioning) it may be impossible to compute such a high order controller with any degree of accuracy.
Storage: it may be hard to implement a high order controller on a chip.

The design of reduced-order controllers is a challenging problem, aspects of which have been investigated at least for systems of not-too-high complexity (see [252]). It will not be addressed in the sequel (see section 14.2 on open problems, page 341).

The complexity of models, measured in terms of the number of coupled first order differential or difference equations, may reach the tens or hundreds of thousands. In particular, discretization in problems which arise from dynamical PDEs (Partial Differential Equations) which evolve in three spatial dimensions can easily lead to one million equations.

In a broader context, this book deals with what is called the curse of dimensionality or, paraphrasing, the curse of complexity. In the computer science community, efficient algorithms are those which can be executed in polynomial time, that is, algorithms whose execution time grows polynomially with the size of the problem. Generically, the complexity of a problem is of the order nγ, where n is the size of the problem. Here are examples of problems which can be solved in polynomial time:

• Set of linear equations: γ = 3
• Eigenvalue problem: γ = 3
• LMI (Linear Matrix Inequalities): γ ≈ 4 · · · 6

On the other hand, the following problems

• Factorization of an integer into prime factors
• Stabilization of a linear system with constant output feedback

can be solved in exponential, but not polynomial, time.

There are numerous remedies against the curse of dimensionality. Randomized algorithms are one instance; the solution obtained, however, may fail to satisfy all constraints for some specified percentage of the cases. In our framework there are two additional constraints. First, the algorithm, besides being efficient in the sense mentioned above, must produce an answer in a given amount of time. Second, the solution must be accurate enough, which is a problem given that numbers can be represented in a computer only with finite precision. Other kinds of remedies, mostly applicable to problems with polynomial time solution algorithms but very large complexity, are the parallelization methods mentioned earlier. It should be mentioned at this point that the increase in available computing power turns out to be a mixed blessing: as the complexity of the problems attacked increases, the numerical errors increase as well. This becomes problematic for sufficiently large complexities, even if the underlying algorithm is polynomial in time.

This book addresses the approximation of dynamical systems that are described by a finite set of differential or difference equations, together with a finite set of algebraic equations. Our goal is to present approximation methods related to the SVD (singular value decomposition) on one hand, and approximation methods related to Krylov or moment matching concepts on the other. Roughly speaking, the former family preserves important properties of the original system, like stability, and in addition provides an explicit quantization of the approximation error. The latter family lacks these properties, but leads to methods which can be implemented in a numerically much more efficient way. Thus, while the former family of methods can be applied to relatively low dimensional systems (a few hundred states), the latter methods are applicable to problems whose complexity can be several orders of magnitude higher. The combination of these two basic approximation methods leads to a third one, which aims at merging their salient features and will be referred to as SVD–Krylov based approximation.

We will see later on that a popular method for model reduction of dynamical systems, namely balanced truncation, requires of the order of n³ operations, where n is the complexity of the system to be approximated. It will also follow that methods which reduce the number of operations to k · n² or k² · n, where k is the complexity of the reduced model, represent considerable improvements. The reason is that with order n³ operations one can deal with system complexities of a few hundred states, while with k · n² or k² · n the complexity of the systems that can be dealt with climbs into the millions.

Finally, some thoughts which have guided our choice of topics concerning the dilemma linear vs. nonlinear. The basic argument used is that real systems are nonlinear, and therefore methods addressing nonlinear system approximation should be primarily considered. However, we espouse the following arguments:

• All physical systems are locally linear; in applications typically one linearizes around an operating point of interest, and this linearity holds for large ranges of the operating conditions. If the operating point cannot be fixed, linear time-varying models or piecewise linear models can be considered.
• Many physical laws, e.g. Newton's second law, the diffusion equation, the wave equation, Maxwell's equations, Schrödinger's equation, probability laws (Markov equations), Kirchhoff's Voltage Laws (KVL) and Kirchhoff's Current Laws (KCL), are linear.
• Artificial systems are sometimes designed to be linear.
• Linear theory is rich, extensive, and offers a coherent picture.
• There are attempts in developing a nonlinear approximation theory, but they remain mostly ad hoc.

This book is dedicated to the presentation of mostly linear theory.

We conclude this introduction by pointing to several books in the areas of interest: Obinata and Anderson [252] (model reduction for control system design, in particular controller reduction); Fortuna, Nunnari, and Gallo [114] (model reduction with applications in electrical engineering); Gawronski [135] (model reduction of flexible structures); Holmes, Berkooz, and Lumley [62] (dynamical systems described by partial differential equations); Datta [90] (comprehensive treatment of numerical issues in systems and control); Banks [42] (control and estimation for distributed parameter systems); and Zhou et al. [370], [371] (comprehensive treatment of systems and control with emphasis on robustness issues). Insights into aspects mentioned earlier but not discussed further in this book can be found in the book by Skelton, Grigoriadis and Iwasaki [299] (controller complexity), and in the surveys by Tempo and Dabbene [323] (randomized algorithms) and by Blondel and Tsitsiklis [65] (complexity of algorithms in system theory). See also the special issue of the Control Systems Magazine [347] on numerical awareness in control, as well as the collection of reprints [262] on numerical linear algebra and control.
1.1 Problem setup

The broad setup

The broader framework of the problems to be investigated is shown in figure 1.1. The starting point is a physical or artificial system together with measured data. The modeling phase consists in deriving a set of ordinary differential equations (ODEs) or partial differential equations (PDEs); in the latter case the equations are typically discretized in the space variables, leading to a system of ODEs. This system will be denoted by Σ. The model reduction step consists in coming up with a dynamical system Σ̂ by appropriately reducing the number of ODEs describing the system. Σ̂ is now used in order to simulate and possibly control Σ.

Figure 1.1. The broad setup: a physical/artificial system plus data is modeled by Σ (ODEs, or PDEs followed by discretization); model reduction produces Σ̂ (reduced number of ODEs), which is used for simulation and control.

Next, we will formalize the notion of a dynamical system Σ. First the time axis T is needed; we will assume for simplicity that T = R, the real numbers. Other choices are R+ (R−), the positive (negative) real numbers, Z, all integers, or Z+, Z−, the positive, negative integers, respectively. In the sequel we will assume that Σ consists of first order ODEs together with a set of algebraic equations:

Σ: d/dt x = f(x, u), y = g(x, u),   (1.1)

where

u ∈ U = {u : T → Rm}, x ∈ X = {x : T → Rn}, y ∈ Y = {y : T → Rp},   (1.2)

are the input (excitation), an internal variable (usually the state), and the output (observation), respectively, while f, g are vector-valued maps of appropriate dimensions. The complexity n of such systems is measured by the number of internal variables involved (assumed finite); that is, n is the size of x = (x1 · · · xn)∗.¹ Thus we will be dealing with dynamical systems that are finite-dimensional and described by a set of explicit first-order differential equations. It should be mentioned that sometimes the ODEs are discretized in time as well, yielding discrete-time dynamical systems.

¹ Given a vector or matrix with real entries, the superscript ∗ denotes its transpose. If the entries are complex, the same superscript denotes complex conjugation with transposition.
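As a concrete instance of a system in the form (1.1), the following sketch integrates a one-state example d/dt x = f(x, u), y = g(x, u) with a simple forward-Euler scheme. The particular maps f and g, the step size, and the step input are invented purely for illustration and are not taken from the text.

```python
# Forward-Euler simulation of a system in the form (1.1):
#   d/dt x = f(x, u),  y = g(x, u)
# The choices f(x, u) = -x + u and g(x, u) = 2*x are hypothetical examples.

def f(x, u):
    return -x + u      # state equation (hypothetical example)

def g(x, u):
    return 2.0 * x     # output equation (hypothetical example)

def simulate(x0, u_of_t, t_end, dt=1e-3):
    """Integrate d/dt x = f(x, u) by forward Euler; return the final (x, y)."""
    x, t = x0, 0.0
    while t < t_end:
        u = u_of_t(t)
        x = x + dt * f(x, u)
        t += dt
    return x, g(x, u_of_t(t_end))

# Step input u(t) = 1: the state approaches the equilibrium of f(x, 1) = 0,
# i.e. x = 1, so the output approaches y = 2.
x_final, y_final = simulate(x0=0.0, u_of_t=lambda t: 1.0, t_end=10.0)
print(round(x_final, 3), round(y_final, 3))
```

For a model with n states, x becomes a vector and f, g vector-valued maps, but the integration loop is the same; the cost per step grows with n, which is where the complexity issues discussed above enter.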
[Figure 1.1. An explicit finite-dimensional dynamical system Σ, mapping the inputs u1, ..., um to the outputs y1, ..., yp.]

Explicit finite-dimensional dynamical systems within this framework have the form

Σ:  (d/dt)x = f(x, u),   y = g(x, u).     (1.1)

Problem statement. Given Σ = (f, g) with u ∈ U, x ∈ X, y ∈ Y, find Σ̂ = (f̂, ĝ) with u ∈ U, x̂ ∈ X̂ = {x̂ : T → R^k}, ŷ ∈ Y, where k < n, such that (some or all of) the following conditions are satisfied:

(1) the approximation error is small — existence of a global error bound;
(2) stability and passivity are preserved;
(3) the procedure is computationally stable and efficient.     (COND)

Often in practice, systems whose behavior is discrete in time can be treated equally well. In this case T = Z (or Z+, or Z−), and the first equation in (1.1) is replaced by the difference equation x(t + 1) = f(x(t), u(t)). In the sequel, however, we concentrate mostly on continuous-time systems.

A more general class of systems is obtained if we assume that the first equation in (1.1) is implicit in the derivative of x, that is, F((d/dt)x, x, u) = 0 for an appropriate vector-valued function F. Such systems are known as DAE (differential-algebraic equation) systems; we refer to the book by Brenan, Campbell, and Petzold [73] for an account of this class of systems. Besides an occasional mention (see, e.g., the remark on page 289), this more general class of systems will not be addressed in the sequel.

Special case: linear dynamical systems. Often in practice, the explicit nonlinear system (1.1) is linearized around some equilibrium trajectory (fixed point), the resulting system being linear, parameter time-varying, denoted by ΣLPTV; if this trajectory happens to be stationary (independent of time), the resulting system is linear, time-invariant, denoted by ΣLTI:

ΣLPTV:  ẋ(t) = A(t)x(t) + B(t)u(t),  y(t) = C(t)x(t) + D(t)u(t);

ΣLTI:  ẋ(t) = Ax(t) + Bu(t),  y(t) = Cx(t) + Du(t).     (1.3)

If we consider linear, time-invariant dynamical systems ΣLTI as in (1.3), denoted by

Σ = [ A  B ; C  D ] ∈ R^((n+p)×(n+m)),     (1.4)

the problem consists in approximating Σ with

Σ̂ = [ Â  B̂ ; Ĉ  D̂ ] ∈ R^((k+p)×(k+m)),  k < n,     (1.5)

so that the above conditions (COND) are satisfied.
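To make the state-space notation concrete, the following sketch simulates a small discrete-time instance of the linear equations, x(t+1) = Ax(t) + Bu(t), y(t) = Cx(t) + Du(t); the 2-state, single-input, single-output matrices are hypothetical, chosen only for illustration.

```python
# Simulate a discrete-time LTI system x(t+1) = A x(t) + B u(t), y(t) = C x(t) + D u(t).
# The 2-state matrices below are hypothetical illustrations.

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def simulate(A, B, C, D, inputs, x0):
    """Return the output sequence y(0), y(1), ... for a scalar input sequence."""
    x, ys = list(x0), []
    for u in inputs:                      # u is a scalar input here (m = 1)
        y = matvec(C, x)[0] + D * u       # y(t) = C x(t) + D u(t)
        ys.append(y)
        Ax = matvec(A, x)
        Bu = [b_row[0] * u for b_row in B]
        x = [a + b for a, b in zip(Ax, Bu)]   # x(t+1) = A x(t) + B u(t)
    return ys

A = [[0.5, 0.1], [0.0, 0.8]]   # stable: eigenvalues 0.5 and 0.8 inside the unit disk
B = [[1.0], [0.5]]
C = [[1.0, 0.0]]
D = 0.0

y = simulate(A, B, C, D, inputs=[1.0] + [0.0] * 4, x0=[0.0, 0.0])
print(y)  # impulse response samples: y(0) = D = 0, y(1) = C B = 1, ...
```

A model-reduction method replaces (A, B, C, D) by the smaller (Â, B̂, Ĉ, D̂) while keeping such output sequences nearly unchanged.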
Pictorially, we have the following:

Σ = [ A  B ; C  D ]  ⇒  Σ̂ = [ Â  B̂ ; Ĉ  D̂ ]

1.1.1 Approximation by projection

A unifying feature of the approximation methods discussed in the sequel is projection. Consider the change of basis T ∈ R^(n×n) in the state space, x̄ = Tx; this is equivalent to truncation in an appropriate basis. We define the following quantities by partitioning x̄, T, and T^(-1):

x̄ = [ x̂ ; x̃ ],  x̂ ∈ R^k,  x̃ ∈ R^(n−k),   T = [ W* ; T2* ],   T^(-1) = [ V  T1 ],   V, W ∈ R^(n×k).

Since W*V = Ik, it follows that

Π = VW* ∈ R^(n×n)     (1.6)

is an oblique projection onto the k-dimensional subspace spanned by the columns of V, along the kernel of W*. Substituting for x in (1.1), we obtain (d/dt)x̄ = Tf(T^(-1)x̄, u) and y = g(T^(-1)x̄, u). Retaining the first k differential equations leads to

(d/dt)x̂ = W*f(Vx̂ + T1x̃, u),   y = g(Vx̂ + T1x̃, u).     (1.7)

These equations describe the evolution of the k-dimensional trajectory x̂; notice that they are exact. Approximation occurs by neglecting the term T1x̃. What results is a dynamical system which evolves in a k-dimensional subspace, obtained by restricting the full state as follows: x̂ = W*x. The resulting approximant Σ̂ of Σ is

Σ̂:  (d/dt)x̂(t) = W*f(Vx̂(t), u(t)),   ŷ(t) = g(Vx̂(t), u(t)).

In the linear, time-invariant case the resulting approximant is

Σ̂ = [ Â  B̂ ; Ĉ  D̂ ] = [ W*AV  W*B ; CV  D ].     (1.8)

That is, only the size of A is reduced, while the number of columns of B and the number of rows of C remain unchanged. Consequently, if Σ̂ is to be a "good" approximation of Σ, the neglected term T1x̃ must be "small" in some appropriate sense.

One possible measure for judging how well Σ̂ approximates Σ consists in comparing the outputs y and ŷ obtained by using the same excitation function u on Σ and Σ̂. We require, namely, that the size (norm) of the worst output error y − ŷ (that is, over all normalized inputs u) be kept small or even minimized. Assuming that Σ is stable (all eigenvalues of A are in the left half of the complex plane), this measure of fit is known as the H∞ norm of the error. In the sequel we will also make use of another norm for measuring approximation errors, the so-called H2 norm; this turns out to be the norm of the impulse response of the error system. For details on norms of linear systems we refer to chapter 5.
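As a sanity check on (1.6)–(1.8), the following sketch forms the reduced matrices Â = W*AV, B̂ = W*B, Ĉ = CV for a hypothetical 3-state real system and a hand-picked basis pair with W*V = I2 (here W = V, so the projection is in fact orthogonal), and verifies that Π = VW* is a projection.

```python
# Projection-based reduction (real case, so W* = W^T), as in (1.8):
# A_hat = W^T A V, B_hat = W^T B, C_hat = C V.
# The 3-state system and the basis V, W below are hypothetical illustrations.

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

A = [[-1.0, 0.2, 0.0],
     [0.0, -2.0, 0.1],
     [0.0, 0.0, -5.0]]
B = [[1.0], [1.0], [0.0]]
C = [[1.0, 0.0, 1.0]]

# k = 2 bases; W^T V = I_2 holds by construction.
V = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
W = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]

Wt = transpose(W)
A_hat = matmul(matmul(Wt, A), V)   # W^T A V  (2 x 2): only A shrinks
B_hat = matmul(Wt, B)              # W^T B    (2 x 1)
C_hat = matmul(C, V)               # C V      (1 x 2)

Pi = matmul(V, Wt)                 # projector Pi = V W^T
assert matmul(Pi, Pi) == Pi        # a projector satisfies Pi^2 = Pi
print(A_hat)
```

The methods described in this book differ precisely in how V and W are chosen.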
1.2 Summary of contents

A flowchart of the contents of this book is shown in figure 1.2.

[Figure 1.2. Flowchart of approximation methods and their interconnections: the basic approximation methods split into SVD methods (for linear systems: balanced truncation, Hankel approximation; for nonlinear systems: POD methods, empirical gramians), weighted SVD methods, Krylov methods (realization, interpolation, Lanczos, Arnoldi), and SVD-Krylov methods combining the two.]

SVD stands for Singular Value Decomposition and POD for Proper Orthogonal Decomposition (concepts to be explained later). The purpose of this book is to describe certain families of approximation methods by projection, that is, methods for choosing the projection Π (the matrices V and W) so that some or all conditions (COND) are satisfied. Three different types of methods are discussed:

(I) SVD-based methods, which are well known in the systems and control community, and have good system-theoretic properties;
(II) Krylov-based methods, which are well known in the numerical analysis community and to a lesser degree in the system theory community, and have good numerical properties;
(III) SVD-Krylov-based methods, which seek to merge the best attributes of (I) and (II).

Furthermore, the class of weighted SVD methods, which establishes a link between (I) and (II), will also be discussed.

Part II is devoted to a review of the necessary mathematical and system-theoretic prerequisites. In particular, norms of vectors and (finite) matrices are introduced in chapter 3, together with a detailed
discussion of the Singular Value Decomposition (SVD) of matrices. The approximation problem in the induced 2-norm and its solution given by the Schmidt-Eckart-Young-Mirsky theorem are tackled next; this result is generalized to linear dynamical systems in chapter 8. Elements of numerical linear algebra are also presented in chapter 3.

Chapter 4 presents some basic concepts from linear system theory. The first section discusses the external description of linear systems in terms of convolution integrals or convolution sums, depending on whether we are dealing with continuous- or discrete-time systems. The following section treats the internal description of linear systems; this is a representation in terms of first-order ordinary differential or difference equations. The associated structural concepts of reachability and observability are analyzed. The gramians, which are important tools for system approximation, are introduced in this chapter and their properties are explored. The last part of the chapter is concerned with the relationship between internal and external descriptions, which is known as the realization problem. Furthermore, aspects of the more general problem of rational interpolation are displayed.

Chapter 5 introduces various norms of linear systems, which are essential for system approximation and for the quantification of approximation errors. The Hankel operator H is introduced, together with the convolution operator S, and its eigenvalues and singular values are computed. This leads to the concept of the Hankel singular values, which play an important role in Hankel-norm system approximation, and of the H∞ norm of a system. After a more general discussion on induced norms of linear systems, we turn our attention to a brief review of system stability, followed by a discussion of L2 systems and all-pass L2 systems. The concept of dissipativity, which generalizes that of stability from autonomous systems to systems with external influences, is introduced; the special cases of bounded real and positive real systems are briefly explored.

Chapter 6 is devoted to the study of two linear matrix equations: the Sylvester equation and the closely related Lyapunov equation. These equations are central to SVD-based methods, and various solution methods (from complex integration to the Cayley-Hamilton theorem to the sign function method) are discussed. Next, the inertia theorem for the Lyapunov equation is investigated. The chapter concludes with numerically reliable algorithms for the solution of these equations.

Having completed the presentation of the preparatory material, the book commences with the exposition of the first class of approximation methods, namely SVD-based approximation methods (Part III). Chapter 7 is devoted to approximation by balanced truncation; the ingredients are Lyapunov equations and the Hankel singular values. The main result is followed by a canonical form which can be applied to balanced systems. The last part of the chapter is involved with special types of balancing, including bounded real, positive real, and frequency weighted balancing, which lead to methods for approximating unstable systems. The latter part of the chapter is dedicated to a study of the decay rates of the Hankel singular values, which is of importance in predicting how well a given system can be approximated by a low-order system. Furthermore, two iterative methods are presented which provide approximate solutions to Lyapunov equations and can therefore be used to obtain reduced-order systems which are approximately balanced.

Chapter 8 presents the theory of optimal and suboptimal approximation in the induced 2-norm of the Hankel operator H, that is, Hankel-norm approximation, which can be viewed as a refinement of approximation by balanced truncation. The final section of this chapter is devoted to the exposition of a polynomial approach that offers new insights and connections between balancing and Hankel-norm approximation. Part III of the book concludes with a chapter dedicated to special topics in SVD-based approximation methods: approximation by modal truncation is discussed, and in this context a brief description of the POD (Proper Orthogonal Decomposition) method is given in section 9.1; its relation with balanced truncation is also mentioned.

The next part of the book (Part IV) is concerned with Krylov-based approximation methods. These methods have their roots in numerical linear algebra and address the problem of providing good estimates for a few eigenvalues of a big matrix. Consequently, chapter 10 gives an account of the Lanczos and Arnoldi methods as they apply to eigenvalue problems. The following chapter 11 discusses the application of these methods to system approximation: Krylov methods lead to approximants by matching moments, and the connection with rational interpolation is discussed in some detail. These methods turn out to be numerically efficient, although they lack other important properties, such as quantification of the approximation error.

The final part of the book (Part V) is concerned with the connections between SVD-based and Krylov-based approximation methods, which seek to merge the best attributes of both; in particular, a method which involves least squares combines attributes of both approaches.
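The Lyapunov equations at the heart of the SVD-based methods can be illustrated on a toy case. For a stable diagonal A, the gramian P solving AP + PA^T + BB^T = 0 has the closed-form entries P_ij = −(BB^T)_ij / (a_i + a_j); the sketch below (hypothetical 2-state data) builds P this way and checks the residual.

```python
# Controllability gramian of a stable LTI system: A P + P A^T + B B^T = 0.
# For diagonal A = diag(a_1, ..., a_n): P_ij = -(B B^T)_ij / (a_i + a_j).
# The 2-state data below are hypothetical.

a = [-1.0, -3.0]            # eigenvalues of the diagonal A (all in the left half-plane)
B = [1.0, 2.0]              # single-input case: B is a column vector

n = len(a)
BBt = [[B[i] * B[j] for j in range(n)] for i in range(n)]
P = [[-BBt[i][j] / (a[i] + a[j]) for j in range(n)] for i in range(n)]

# Residual check: A P + P A^T + B B^T should vanish entry by entry.
residual = [[a[i] * P[i][j] + P[i][j] * a[j] + BBt[i][j] for j in range(n)]
            for i in range(n)]
print(P)
```

General (non-diagonal) A requires the numerically reliable algorithms of chapter 6; the closed form above is only a check on what those algorithms compute.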
In chapter 13, aspects of the approximation methods presented earlier are illustrated by means of numerical experiments, by applying them to various systems; algorithms are compared in terms of both approximation error and computational effort. The book concludes with a chapter on projections, computational complexity, software, and open problems, followed by a collection of exercises (chapter 15) appropriate for classroom use.
Chapter 2

Motivating examples

In this section we describe various applications where large-scale dynamical systems arise.

[Figure 2.1. Applications leading to large-scale dynamical systems:
1. Passive devices — VLSI circuits; EIP (Electrical Interconnect and Packaging).
2. Weather prediction / data assimilation — North Sea wave surge forecast; Pacific storm forecast; America's Cup.
3. Air quality / data assimilation — ozone propagation.
4. Biological systems — honeycomb vibrations.
5. Molecular systems — dynamics simulations; heat capacity.
6. Vibration/acoustic problems — windscreen vibrations.
7. ISS: International Space Station — stabilization.
8. CVD reactor — bifurcations.
9. MEMS: Micro Electro-Mechanical Systems — micromirrors; elk sensor.
10. Optimal cooling — steel profile.]

In the sequel the examples listed in figure 2.1 are described briefly. There are two categories of examples: those where simulation and/or prediction of future behavior is of interest (cases 1, 2, 3, 4, 5, 6 in figure 2.1), and those where simulation and control are of primary interest (cases 7, 8, 9, 10 of the same table).
2.1 Passive devices

High-frequency, sub-micron VLSI circuits. The IC (Integrated Circuit) was introduced in the 1960s. Since then, the scaling trends in VLSI (Very Large Scale Integration) design have been: (i) the decrease in feature size is greater than 10% per year; (ii) the increase in chip size is greater than 10% per year; as a consequence, chip complexity has been increasing by at least 50% each year; (iii) there is an increase in operating frequency, which now reaches the gigahertz range. These trends impact the physical parameters, due to the increase of interconnect length and interconnect resistance; furthermore, capacitance/inductance effects influence the chip and there is a decrease of metal width and dielectric thickness. This evolution is depicted in figure 2.2.

[Figure 2.2. Evolution of IC design [330]. 1960s: the IC. 1971: Intel 4004 — 10µ details, 2300 components, 64KHz speed. 2001: Intel Pentium IV — 0.18µ details, 42M components, 2GHz speed, 2km interconnect, 7 layers.]

There is a general-purpose simulation program for electric circuits known as SPICE (Simulation Program with Integrated Circuit Emphasis), which was originally developed in the 1970s at the University of California, Berkeley. It allows the following components: resistors, capacitors, inductors, independent sources, dependent sources, transmission lines, diodes, and transistors. SPICE can handle complexities of a few hundred such elements; it thus becomes inadequate for complexities of the order mentioned above.

The design phase of a VLSI circuit is followed by the physical verification phase, where design flaws are to be discovered. The resulting chips are multi-layered, and the passive parts correspond to 3-dimensional RLC (resistor-inductor-capacitor) circuits. Simulations are thus required to verify that internal electromagnetic fields do not significantly delay or distort circuit signals. This requires the solution of Maxwell's equations for 3-dimensional circuits with interconnections of the order of several kilometers. The complexity of the resulting model reaches unmanageable proportions, and can be anywhere from n ≈ 10^5 to 10^6. We thus seek to generate models which are as simple as possible but are nevertheless capable of reproducing the actual chip behavior; therefore reduced-order modeling is of great importance. For details see van der Meijs [330].

The first attempts to use model reduction to simplify such circuits were made by Pillage and Rohrer [269], who proposed the method of asymptotic waveform evaluation (AWE); the AWE method is also described in [86]. This consists in computing some of the moments of the corresponding circuits (see section 11.1 for the definition of moments and details on moment matching methods). It turns out, however, that computing moments is numerically ill-conditioned, and solving such problems efficiently using Krylov methods was soon proposed for AWE problems [124]. The next step along this line of approach to circuit simulation came with the work of Feldmann and Freund, who proposed the Padé via Lanczos approach. See also [279] and [202].

A more general problem is the electromagnetic modeling of packages and interconnects; for an overview, we refer to [316]. Again, the starting point is Maxwell's equations, and one method for deriving a model in this case is known as PEEC (Partial Element Equivalent Circuit), which works by spatial discretization of Maxwell's equations for 3D geometries to obtain a finite-dimensional approximation. A related problem was described in [79]. There are numerous references on this topic; we give just a few: [31], [33, 34], [89], [113], [116], [194].
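The moments matched by AWE and Krylov methods can be shown on a toy case. Under one common convention (the book's precise definition appears in section 11.1), the moments are the Taylor coefficients of G(s) = C(sI − A)^{-1}B about s = 0, i.e. η_k = −CA^{-(k+1)}B; the sketch below uses a hypothetical 2-state diagonal system so the inverse is trivial, and cross-checks the truncated series against G(s) for small s.

```python
# Moments of G(s) = C (sI - A)^{-1} B expanded about s = 0:
# G(s) = sum_k eta_k s^k with eta_k = -C A^{-(k+1)} B (one common convention).
# Diagonal A keeps the inverse trivial; the data are hypothetical.

a = [-1.0, -2.0]                 # diagonal entries of A (invertible)
b = [1.0, 1.0]
c = [1.0, 0.5]

def moment(k):
    """eta_k = -sum_i c_i * a_i^{-(k+1)} * b_i for diagonal A."""
    return -sum(ci * ai ** (-(k + 1)) * bi for ci, ai, bi in zip(c, a, b))

moments = [moment(k) for k in range(4)]

# Cross-check: the truncated series should approximate G(s) for small s.
s = 0.01
G = sum(ci * bi / (s - ai) for ci, ai, bi in zip(c, a, b))   # exact, diagonal A
series = sum(moments[k] * s ** k for k in range(4))
print(moments, abs(G - series))
```

Krylov methods construct a reduced system matching several such moments without ever forming A^{-k} explicitly, which is what makes them numerically viable.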
2.2 Weather prediction — data assimilation

2.2.1 North Sea wave surge forecast

Due to the fact that part of the Netherlands is below sea level, it is important to monitor wave surges at river openings. In case of such a surge, there are water barriers that can be closed to prevent flooding. Since these rivers are in many cases important waterways, the barriers must only stay closed while the surge lasts; furthermore, the warning has to come about six hours in advance. The equations governing the evolution of the wave surge are in this case the shallow water equations, which are PDEs (partial differential equations). In figure 2.3 the horizontal and vertical axes indicate the number of discretization points, while the color bar on the right-hand side indicates the depth of the sea at the various locations of interest (justifying the use of the shallow water equations).

[Figure 2.3. Wave surge prediction problem: depth of the North Sea.]

The discretization grid used in this case is shown in figure 2.4.

[Figure 2.4. Wave surge prediction problem: discretization grid close to the coast.]

There are several locations where the wave surge is measured; there are also locations where the movement of the sea currents is measured (figure 2.5). The problem in this case is not just prediction of the wave surge based on initial conditions: it becomes one of data assimilation, as one wishes to predict the wave surge based both on the model and on the provided measurements. This is achieved by means of a Kalman filter.

[Figure 2.5. Wave surge prediction problem: measurement locations (water level, current, and HF-radar stations).]

The FE (finite element) discretization of the shallow water equations yields approximately 60,000 equations, and the resulting computational time is several times the allowed limit of 6 hours; therefore reduced-order models are necessary. This problem has been studied by Verlaan and Heemink at Delft University of Technology in the Netherlands; for details see [348], [168]. Figure 2.6 shows the error covariance of the water level prediction in two cases: first, with wind disturbance and no additional measurements, and second, assimilating the measurements from the 8 locations. The bar on the right shows the color coding of the error.

[Figure 2.6. Wave surge prediction problem: error covariance of the estimated water level without measurements (above) and with assimilation of 8 water level measurements (below).]
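The measurement-update step that makes such assimilation work can be shown in a scalar sketch (all numbers hypothetical): a model forecast with variance p is blended with a measurement of variance r, and the posterior variance is smaller than either.

```python
# Scalar Kalman filter measurement update: blend a model forecast with a measurement.
# All numbers are hypothetical; the real surge model updates a ~60,000-dim state.

def kalman_update(x_prior, p_prior, z, r):
    """Return (posterior mean, posterior variance) after assimilating measurement z."""
    k = p_prior / (p_prior + r)          # Kalman gain
    x_post = x_prior + k * (z - x_prior) # corrected estimate
    p_post = (1.0 - k) * p_prior         # reduced uncertainty
    return x_post, p_post

# Forecast: surge of 1.20 m with variance 0.30; a tide gauge reads 1.50 m, variance 0.10.
x, p = kalman_update(1.20, 0.30, 1.50, 0.10)
print(x, p)   # estimate moves toward the measurement; variance drops below 0.10
```

The cost of the full filter grows with the cube of the state dimension, which is why the 60,000-state model must be reduced before the filter can run within the 6-hour window.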
2.2.2 Pacific storm tracking

The issue here is to determine the sensitivity of atmospheric equilibria to perturbations; in particular, we wish to determine the initial perturbation which produces the greatest perturbation growth over some specified interval of time. In [108], perturbations to the vorticity equation of a Couette flow are studied. These are governed by the Orr-Sommerfeld equation: assuming harmonic perturbations in the wind velocity of the form Φ(x, y, t) = φ(y, t)e^(ikx), we have

∂φ(y, t)/∂t = [ −iky + (1/Re)(∂²/∂y² − k²) ] φ(y, t) = A φ(y, t),

where Re denotes the Reynolds number. Discretization in y yields the set of ODEs

dφ̂(t)/dt = Â φ̂(t),   Â ∈ R^(n×n).

We assume that this system is influenced by perturbations; in particular, we assume that (i) random inputs are affecting all the variables φ̂i, and (ii) all these variables are measured (observed). The discretized system is thus a linear system having the same number of inputs m, state variables n, and outputs p:

Σ = [ Â  In ; In  0 ]  ⇒  m = p = n.

Such models are used for storm tracking in the mid-latitude Pacific. For data assimilation in this context see [109].

2.2.3 America's Cup

The America's Cup is a race of sailing boats which takes place every 4 years. The 31st competition since 1848 took place in 2003 in New Zealand between Team NZ and the Swiss team Alinghi. There is a lot of technological know-how which goes into the construction of the boats. A fact, however, which is less well known is that the contestants invest effort in setting up weather teams. These are meant to advise on the weather (in particular, the wind direction and speed) which will prevail during the approximately 8 hours of the race. The goal is to choose the right sails and develop appropriate strategies for the race; it is important to be able to predict wind shifts.

The 2003 winner, the Alinghi team, set up a strong weather forecasting group, led by Dr. J. Katzfey, an expert in weather forecasting at the CSIRO (Commonwealth Scientific and Industrial Research Organization), Australia. Furthermore, the Alinghi team set up 8 weather stations which provided data for the data assimilation part of the model. At 7:00am before each race, the weather team presented the weather prediction for the next 6-8 hours of sailing. This strategy turned out to be an important factor for the Alinghi: in the 3rd regatta, for instance, last-minute updates brought the Alinghi 200m ahead of the New Zealand boat, which proved decisive for the winners.

2.3 Air quality simulations — data assimilation

Air pollution was thought to be a local phenomenon until the 1970s, and a regional one during the 1980s. At present it is recognized that air pollution extends from urban to the regional and global scales; emissions from fossil fuel combustion and biomass burning and the resulting photochemical production of tropospheric ozone and climate warming are considered global problems.
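Returning to the storm-tracking model of section 2.2.2, the discretization step can be sketched as follows (all parameter values hypothetical): replacing ∂²/∂y² by a central difference on n interior grid points of [0, 1] turns the operator −iky + (1/Re)(∂²/∂y² − k²) into a complex tridiagonal matrix Â.

```python
# Finite-difference discretization of A = -i k y + (1/Re)(d^2/dy^2 - k^2)
# on n interior points of [0, 1] with zero boundary conditions.
# Parameter values are hypothetical.

k, Re, n = 1.0, 500.0, 8
h = 1.0 / (n + 1)                      # grid spacing
y = [(i + 1) * h for i in range(n)]    # interior grid points

A_hat = [[0j] * n for _ in range(n)]
for i in range(n):
    # diagonal: advection term -i k y_i plus diffusion (-2/h^2 - k^2) scaled by 1/Re
    A_hat[i][i] = -1j * k * y[i] + (-2.0 / h**2 - k**2) / Re
    if i > 0:
        A_hat[i][i - 1] = 1.0 / (Re * h**2)
    if i < n - 1:
        A_hat[i][i + 1] = 1.0 / (Re * h**2)

print(A_hat[0][0], A_hat[0][1])
```

The non-normality of Â (its skew diagonal part grows with y) is what allows large transient perturbation growth even when all eigenvalues are stable.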
A current challenge to the atmospheric science community is to quantify the impact of human activities on global atmospheric photochemistry. This is achieved by means of air quality models (AQMs), which provide a mechanism for elucidating the underlying physical and chemical processes responsible for the formation, accumulation, transport, and removal of air pollutants. Air quality models are designed to calculate concentrations of ozone and other pollutants and their variations in space and time, in response to particular emission inputs and for specified meteorological scenarios. The air quality model is, at present, the only prognostic tool available to the policy-making community, i.e., a tool capable of quantitatively estimating future air quality outcomes for conditions and/or emissions different from those that have existed in the past. Because of this unique capability, air quality models have come to play a central role in the process of determining how pollutant emissions should be managed to achieve air quality goals.

There are, at present, a variety of air quality models being applied on the urban, regional, and global scales. Many models share common features; in particular, air quality models are based on solving the same species conservation equations describing the formation, transport, and fate of air pollutants:

∂ci/∂t + ∇·(u ci) − ∇·(K∇ci) = Ri(c1, ..., cs) + Si,   i = 1, ..., s,

where ci is the (averaged) concentration of species i, u(x, t) is the wind velocity vector at location x and time t, K(x, t) is the turbulence diffusivity tensor, Ri is the rate of concentration change of species i by chemical reactions, Si(x, t) is the source/sink of species i, and s is the number of predicted species. Ensemble averaging is used to dispense with the need to capture the extremely small-scale fluctuations due to turbulence. Ri can also be a function of meteorological variables (e.g., temperature), and the source/sink term Si can include emissions of a species as well as its loss due to various processes such as dry deposition and rainout. Differences between various air quality models stem from alternative choices made by their developers in characterizing these physical and chemical processes, the procedures for their numerical solution, and the approach taken to adapt the model to the computational domain of interest.

With appropriate initial and boundary conditions, the system described by the above equation represents the continuum chemistry-transport model (CTM). A chemistry-transport modeling system is composed of four basic components: a chemical kinetic mechanism, a source emissions inventory, a description of pollutant transport and removal, and a set of numerical algorithms for integrating the governing equations. After spatial discretization of the CTM we obtain

dc/dt = f(c) = fa(c) + fd(c) + fr(c) + u,   c(t0) = c0,   c = [c1, ..., cs].

The grid in the x and y directions is 1km (100 points each), and in the z direction 10km (30 points). This results in 300,000 equations. Thus the problem becomes, once more, one of data assimilation. It should be mentioned that many measurements of pollutants exist; these are mainly satellite images like figure 2.7. In the past few years the MOPITT satellite was launched by NASA for monitoring purposes.

[Figure 2.7. Ozone concentrations in Texas and vicinity on 8 September 1993. The color bar on the left-hand side indicates the concentration of pollutants in parts per billion (PPB), ranging from zero (blue) to 92 (green) to 184 (red).]
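A method-of-lines sketch of such a semi-discretization, reduced to one space dimension and a single species (all parameter values hypothetical): ∂c/∂t + u ∂c/∂x − K ∂²c/∂x² = S becomes dc/dt = f(c) on a periodic grid.

```python
# Method of lines for a 1-D, single-species transport equation
#   dc/dt = -u dc/dx + K d2c/dx2 + S
# on a periodic grid; a toy stand-in for the 300,000-equation CTM.
# Parameter values are hypothetical.

n, dx = 50, 1.0          # grid points and spacing (km)
u, K, S = 0.5, 0.1, 0.01 # wind speed, diffusivity, uniform source

def f(c):
    """Right-hand side of the semi-discretized transport equation."""
    dcdt = []
    for i in range(n):
        left, right = c[i - 1], c[(i + 1) % n]       # periodic neighbors
        advection = -u * (right - left) / (2 * dx)   # central difference
        diffusion = K * (right - 2 * c[i] + left) / dx**2
        dcdt.append(advection + diffusion + S)
    return dcdt

# For a spatially uniform concentration, advection and diffusion vanish,
# so dc/dt reduces to the source term S everywhere.
rate = f([1.0] * n)
print(rate[:3])
```

The real CTM couples tens of such grids through the nonlinear chemistry terms Ri, which is what pushes the state dimension into the hundreds of thousands.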
This SVD provides a way of decomposing ACA April 14. The goal is therefore to ﬁnd a model of the honeycomb which explains the phenomena observed and measured and provides new insights into these phenomena. The equations describing the motion of proteins are of the type: dt 2 x(t) = −∇φ(x(t)) where 3n φ is a potential function and x ∈ R .8). The distribution of these substates and their transitions are major factors in determining the biological d2 properties of proteins. Biological systems: Honeycomb vibrations 19 i “oa” 2004/4/14 page 19 i For details we refer to [102]. there are no optical signals involved given the darkness of the hive. Honeybee Waggle dance. In particular we would like to know to what extent a honeycomb is appropriate as a medium for the transmission of information through vibrations. 2.8. Proteins are dynamic entities primarily due to the thermal motion of their atoms. During the waggle dance (see right pane in ﬁgure 2. It has also been experimentally observed that the combs exhibit an impedance minimum to horizontal vibrations at 230270Hz .4µm. J.8). furthermore the comb seems to amplify vibrations in frequencies around 250Hz . where n is the number of atoms in the protein.4. They have different states which are attained by thermal motion and determine their biological properties. The main tool for doing this is the SVD (Singular Value Decomposition). while the chemical one results by transmission of pollen and/or nectar. Right: Dancer and followers. a conformational substate being a collection of structures that are energetically degenerate.1. The thermal motion of the atoms drives the transitions between the different substates accessible to a protein. An important recent development in threedimensional chemistry transport modeling is called MOZART (Model of Ozone and Related Tracers). The possible mechanisms involved in this unique communication are (i) mechanical and (ii) chemical. 
During the waggle dance (see right pane in figure 2.8), the dancer bee waggles her body at 15Hz and vibrates her wings intermittently at 200-300Hz. The possible mechanisms involved in this unique communication are (i) mechanical and (ii) chemical; there are no optical signals involved, given the darkness of the hive. The mechanical mechanism results from the vibration of the honeycomb, while the chemical one results from the transmission of pollen and/or nectar.

Most of the experimental investigations of this problem have been performed by Dr. J. Tautz at the Universität Würzburg. Experimental measurements have shown that the vibrations occur only in the direction horizontal to the plane of the combs. It has also been experimentally observed that the combs exhibit an impedance minimum to horizontal vibrations at 230-270Hz; furthermore, the comb seems to amplify vibrations at frequencies around 250Hz. It should be mentioned that the transmitted vibrations have an amplitude of about 1.4µm. The question thus arises as to the ability of the bees to detect weak vibrations in a noisy environment (hive). The goal is therefore to find a model of the honeycomb which explains the phenomena observed and measured and provides new insights into these phenomena; in particular, we would like to know to what extent a honeycomb is appropriate as a medium for the transmission of information through vibrations. For details see [235], [246], [275], [292], [293].

2.5 Molecular dynamics

This example involves simulation in molecular dynamics, and in particular protein substate modeling and identification. Proteins are dynamic entities, primarily due to the thermal motion of their atoms. They can exist in a number of different conformational substates, a conformational substate being a collection of structures that are energetically degenerate. The thermal motion of the atoms drives the transitions between the different substates accessible to a protein; the distribution of these substates and their transitions are major factors in determining the biological properties of proteins. The equations describing the motion of proteins are of the type

d²x(t)/dt² = −∇φ(x(t)),

where φ is a potential function and x ∈ R^(3n), n being the number of atoms in the protein. To find the most important protein configurations, an SVD of snapshots of x is used; this method is known as POD (Proper Orthogonal Decomposition) and is described in section 9.1.
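The snapshot idea can be sketched as follows (synthetic data, not a real trajectory): collect state snapshots as the columns of a matrix X; the dominant POD mode is then the leading left singular vector of X, computed here by power iteration on XX^T.

```python
# Dominant POD mode of a snapshot matrix X (columns = state snapshots),
# via power iteration on X X^T. The snapshot data below are synthetic.

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def norm(v):
    return sum(x * x for x in v) ** 0.5

# Three snapshots of a 3-dimensional "trajectory", dominated by direction (1, 1, 0).
X = [[1.0, 2.0, 3.0],
     [1.1, 2.1, 2.9],
     [0.0, 0.1, -0.1]]
Xt = [list(col) for col in zip(*X)]

mode = [1.0, 0.0, 0.0]               # initial guess
for _ in range(100):                  # power iteration: mode <- X X^T mode / ||.||
    w = matvec(X, matvec(Xt, mode))
    mode = [x / norm(w) for x in w]

print(mode)  # close to the dominant direction (1, 1, 0)/sqrt(2)
```

For a protein, each snapshot has 3n entries and the modes are the preferred collective atomic motions; in practice a full SVD routine replaces this power iteration.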
This SVD decomposes a molecular dynamics trajectory into fundamental modes of atomic motion. The left singular vectors describe the direction each atom prefers to move in; if an atom were constrained to move only along one of the left singular vectors, then its motion in cartesian space could be projected onto this vector. The right singular vectors provide temporal information: they are projections of the protein conformations onto these modes, showing the protein motion in a generalized low-dimensional basis. The elements of a right singular vector can be thought of as scaling the size of the corresponding left singular vector to describe where the atom is at each time point or in each conformation.

In figure 2.9 a protein called myoglobin (more precisely, the F46V mutant) is depicted. The active site which catches oxygen molecules is called the heme; oxygen molecules are captured by means of the (distal) histidine, the part that opens and closes. The first left singular vector of the distal histidine in this F46V mutant describes more than 90% of the total motion of the histidine; this histidine also ranked second on the list of residues. Thus: (1) incorporating conformational substate information improves the refinement model; (2) multi-conformer refinement appears to be the better method for overall improvement of the model; (3) the SVD provides a powerful tool for visualizing complex high-dimensional systems in a low-dimensional space.

[Figure 2.9. Molecular dynamics: myoglobin (protein). Heme: active site. Histidine: part that opens and closes, catching oxygen molecules.]

Heat capacity of molecular systems. The next application is concerned with the determination of the heat capacity of a molecular system. This involves the calculation of the following integral:

Cv = ∫₀^∞ f(ω) g(ω) dω,

where g(ω)dω gives the number of vibrational frequencies in the interval (ω, ω + dω). Since the molecular system is discrete, we have

σ(ω) = ∫₀^ω g(τ) dτ = Σ_{i=1}^n I(ω − ωi),

where n is the number of particles in the molecular system and I is the Heaviside step function. This requires all the fundamental frequencies of the system to be computed in advance. Instead, Gauss quadrature is used; this involves the Lanczos algorithm, where A is the Hessian of φ and b is arbitrary. For details see [364].

2.6 ISS — International Space Station

The International Space Station (ISS) is a complex structure composed of many modules; these modules are contributed by NASA and other space agencies. The flex modes of each module are described in terms of n ≈ 10³ state variables. The goal is to develop controllers which are implemented on board the space station; because of hardware, throughput, radiation, and testing issues, controllers of low complexity are needed.

Figure 2.10 shows the frequency response (amplitude Bode plot) as more components are added to the space station; the complexity of the resulting model is reflected in the number of spikes present in the frequency response. In chapter 13, model reduction of the flex models for two specific modules, namely the 1R (Russian service module) and 12A (second left-side truss segment), is discussed. These models were provided by Draper Laboratories, Houston. For details on the ISS and its assembly we refer to
Figure 2.10. Flex structure variation during assembly: frequency response of the ISS as complexity increases (assembly stages 1R, 4A, and 13A, Dec 99 to May 02). http://spaceflight.nasa.gov/station/

2.7 Vibration/Acoustic systems

Consider a car windscreen subject to an acceleration load. The problem consists in computing the noise generated at points away from the window. The first step in solving this problem is the partial differential equation which describes the deformation of the windscreen, for a specific material; in the case at hand the material is glass with Young modulus 7 · 10¹⁰ N/m², density 2490 kg/m³, and Poisson ratio 0.23. These parameters help determine the coefficients of the resulting FE model experimentally. The finite element discretization gives, in this particular case, 7564 nodes (3 layers of 60 by 30 elements); the discretized problem has dimension 22,692. Since this is a second-order system, its complexity is twice as high (45,384 states). The windscreen is subjected to a point force at some given point (node 1891), and the goal is to compute the displacement at the same point. This problem was provided by Karl Meerbergen, Free Field Technologies, Leuven, Belgium.

Notice that the last two problems (the windscreen and the ISS) yield second-order equations:

M (d²/dt²) x(t) + C (d/dt) x(t) + K x(t) = f(t),

where x is the position and (d/dt)x the velocity of the windscreen at the grid points chosen, and M, C, K are the mass, damping, and stiffness matrices, respectively. For details on eigenvalue problems for second-order systems see [325].

2.8 CVD reactors

An important problem in semiconductor manufacturing is the control of chemical reactors, for instance CVD (Chemical Vapor Deposition) reactors. This problem is addressed in the literature using POD (Proper Orthogonal Decomposition) methods; for details the reader is referred to the work of Banks and coworkers [199], [48], [200]. Another issue concerning CVD reactors is the determination of the stability of steady states. To address this problem, the transient behavior is linearized around a steady state. This leads to a generalized eigenvalue problem; the eigenvalues with largest real part are calculated using the Arnoldi iteration. The dimension of the resulting linear systems is on the order of a few thousand state variables. A model problem of 3D incompressible flow and heat transfer in a rotating disk CVD reactor is used to analyze the effect of parameter change on the performance of the eigenvalue algorithm.
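For simulation and model reduction, second-order systems of the kind above are commonly rewritten in first-order state-space form with state z = [x; dx/dt]. The sketch below illustrates the conversion with small invented M, C, K matrices; the actual windscreen and ISS matrices are of course far larger.

```python
import numpy as np

# Toy mass/damping/stiffness data (2 degrees of freedom); the windscreen and
# ISS models have the same structure at much larger dimension.
M = np.array([[2.0, 0.0], [0.0, 1.0]])
C = np.array([[0.4, 0.0], [0.0, 0.2]])
K = np.array([[4.0, -1.0], [-1.0, 3.0]])

n = M.shape[0]
Minv = np.linalg.inv(M)

# State z = [x; dx/dt] turns  M x'' + C x' + K x = f  into  z' = A z + B f.
A = np.block([[np.zeros((n, n)), np.eye(n)],
              [-Minv @ K, -Minv @ C]])
B = np.vstack([np.zeros((n, n)), Minv])
```

Since M, K are symmetric positive definite and C is positive definite, all eigenvalues of A have negative real part, i.e., the toy system is stable.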
These calculations lead to the critical Grashof, Rayleigh, and Reynolds numbers for a Hopf bifurcation. The description of this system requires a full 3D Navier-Stokes model; the calculation of leading eigenvalues for matrix systems of order 4 to 16 million is required.

Figure 2.11. The CVD reactor and the required discretization grid. (Panels (a), (b): inlet, heated rotating disk, outlet.)

2.9 MEMS devices

Microelectromechanical Systems, or MEMS, are integrated systems combining electrical and mechanical components. They are usually fabricated using integrated circuit (IC) techniques and can range in size from micrometers to millimeters. In the most general form, MEMS consist of mechanical microstructures, microsensors, microactuators, and electronics, all integrated onto the same chip. For a description we refer to [88]. To model such devices, Finite Element Model (FEM) simulation is often used, which results in complex systems. System-level models with reduced order and accuracy have to be generated as a basis of system simulation [291].

2.9.1 Micromirrors

One such MEMS device is the micromirror and the micromirror array. They are used for precision light manipulation, e.g., as an optical switch in fiber optics: a mirror tilts to reflect light from one fiber to another fiber. Its advantages as an optical switch are that it is faster, smaller, and cheaper as compared to electrical switches. Large arrays, up to 1,000 x 1,000 mirrors, will be needed for telecommunication applications. As the fiber-to-mirror distance increases, strict precision of mirror tilt control is necessary; for a distance of 500 microns, at least 0.10 degree precision is required. Feedback control is needed to achieve precision positioning of a two-degree-of-freedom micromirror. A second application of micromirrors is maskless lithography (virtual masks). Beams, electrostatic gaps, circuit elements, etc., are modeled by small, coupled systems of differential equations. For details, see [226].

SUGAR is a simulation tool for MEMS devices based on nodal analysis techniques from the world of integrated circuit simulation: http://wwwbsac.eecs.berkeley.edu/cadtools/sugar/sugar/

2.9.2 Elk sensor

A few years ago, production of the Mercedes A-Class cars had to be stopped just before their launch because they failed the elk test. This test consists in forcing the car to take a sharp turn in order to avoid an obstacle which suddenly appears on the road. To remedy this situation the company incorporated a rollover (elk) sensor, which could detect turning movement and apply the brakes to slow the rotational movement down. The first rollover sensors were mechanical. Subsequently the company Bosch AG developed a microelectromechanical sensor at reduced cost and reduced size. Nowadays the elk sensor belongs to the standard equipment of many cars.
Figure 2.12. MEMS rollover sensor

Once such a device has been designed, the next issue consists in testing its performance by simulation. One method is physically oriented modeling, using appropriate simulation packages as described in [291]; see, e.g., Schwarz [290], Teegarden et al. [322]. The more detailed the modeling, the higher the complexity (i.e., the number of differential equations) of the resulting model. And as the available simulation packages are built to handle low complexities, there is a need for simplification of the model through model reduction.

2.10 Optimal cooling of steel profile

Many problems in control are both structured and computationally intensive. For an overview see Fassbender [111]; an application arising in tractor steering can be found in [60]. Here we will describe an application which is worked out by Benner [59]. In rolling mills, steel bars have to be heated and cooled quickly and uniformly in order to achieve a fast production rate. Cooling takes place by spraying the bars with cooling fluids. The problem consists in devising and applying an optimal cooling strategy.

The steel bar is assumed to have infinite length, thus reducing the problem to a 2-dimensional one. The domain is obtained by cutting the steel bar vertically; this domain can further be halved due to symmetry. It is assumed that there are 8 nozzles spraying cooling liquid uniformly on the boundary of the bar. Thus a 2D boundary control problem results.

Figure 2.13. Optimal cooling: discretization grid

The heat equation is used to model the cooling process. The heat-diffusion equation with Neumann boundary conditions is discretized in the spatial variables using the finite element method (FEM) described in the package ALBERT. The initial mesh leads to a system of order n = 106; subsequently the mesh is refined, leading to systems of order n = 371, 1357, and 5177. The resulting mass and stiffness matrices are sparse (with approximately 3 · 10⁵ and 3 · 10⁴ nonzero elements, respectively). The model uses water whose temperature is 20°C and has the spraying intensity as control parameter. The control law is obtained by means of LQR (linear quadratic regulator) design. Finally, the bars are cooled from 1000°C to 500°C.

Figure 2.14. Progression of the cooling of a steel profile (from left to right and from top to bottom)
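The cooling setup can be caricatured in a few lines. The sketch below is not the ALBERT/FEM model of [59]: it is a 1D finite-difference stand-in with invented parameters (diffusivity alpha, transfer coefficient beta, grid size), in which a boundary heat flux proportional to T − T_water plays the role of the spray nozzles.

```python
import numpy as np

# 1D rod cooled at both ends; explicit Euler in time, central differences in
# space. All physical parameters are illustrative, not from the steel model.
N, L, alpha = 50, 1.0, 1e-2
h = L / (N - 1)
T = np.full(N, 1000.0)            # initial temperature, degrees C
T_water = 20.0
beta = 0.1                        # boundary heat-transfer coefficient
dt = 0.4 * h**2 / alpha           # below the explicit stability bound 0.5 h^2/alpha

for _ in range(20000):
    Tn = T.copy()
    # interior: heat diffusion
    Tn[1:-1] += dt * alpha * (T[2:] - 2 * T[1:-1] + T[:-2]) / h**2
    # boundary: cooling flux proportional to (T - T_water), like the nozzles
    Tn[0] += dt * (alpha * (T[1] - T[0]) / h**2 - beta * (T[0] - T_water) / h)
    Tn[-1] += dt * (alpha * (T[-2] - T[-1]) / h**2 - beta * (T[-1] - T_water) / h)
    T = Tn
```

After enough steps the whole rod has relaxed toward the coolant temperature; the bar never undershoots 20°C, since each update is a convex combination of neighboring values and the coolant value.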
Part II

Preliminaries
Chapter 3

Tools from matrix theory

The broad topic of this book is approximation. In order to be able to discuss approximation problems, we need to be able to measure the sizes of various objects, like the size of the elements belonging to certain linear function spaces, as well as the size of operators acting between these spaces. The most commonly used norm for this purpose is the 2-norm for elements of function spaces, and the associated 2-induced norm for measuring the size of operators between these spaces. The 2-norm of the n-vector x = (x₁ ··· xₙ)* ∈ Rⁿ is the usual Euclidean norm:

‖x‖₂ = √(x₁² + ··· + xₙ²).

The 2-induced norm of A : Rⁿ → Rⁿ is defined as

‖A‖₂₋ᵢₙ𝒹 = sup_{x≠0} ‖Ax‖₂ / ‖x‖₂.

Furthermore, the complexity of A is defined as its rank. In this simple case, one approximation problem is as follows. Given a matrix (operator) A of complexity n, find a matrix (operator) X_k of lower complexity (say, k < n) so that the 2-induced norm of the error is minimized:

X_k = arg min_{rank X ≤ k} ‖A − X‖₂₋ᵢₙ𝒹.

The problem just defined is a nonconvex optimization problem. Despite the lack of convexity, however, it can be solved explicitly by means of the singular value decomposition (SVD). This is a decomposition of A in terms of two unitary matrices U, V and a diagonal matrix with nonnegative entries σ₁, ···, σₙ on the diagonal, such that A = UΣV*, where Σ = diag(σ₁, ···, σₙ). The σᵢ are the singular values of A, which are the square roots of the largest n eigenvalues of A*A or AA*; furthermore, σ₁ = ‖A‖₂₋ᵢₙ𝒹. This decomposition is one of the most useful tools in applied linear algebra; it can be efficiently computed and is an important tool both for theoretical and computational considerations. For an overview of the role of matrix decompositions in computing, we refer to [315], which lists the big six matrix decompositions that form the foundations of matrix computations: (i) the Cholesky decomposition, (ii) the pivoted LU decomposition, (iii) the QR algorithm, (iv) the spectral decomposition, (v) the Schur decomposition, and finally (vi) the SVD.

The solution of the approximation problem introduced above can now be obtained as follows. First, one notices that the minimum value of the error is

min_{X, rank X ≤ k} ‖A − X‖₂₋ᵢₙ𝒹 = σ_{k+1}(A).

This is a powerful result relating the complexity k of the approximant with the (k+1)st largest singular value of A, that is, the square root of the (k+1)st largest eigenvalue of AA* (or A*A). Next, it is easy to convince one's self that this lower bound is achieved by simple truncation of the SVD of A: X_k = UΣ_kV*, where Σ_k = diag(σ₁, ···, σ_k, 0, ···, 0), is an optimal solution. Details will be discussed in the sequel.
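The truncation just described is easy to verify numerically. The following sketch, a generic illustration with random data rather than one of the book's examples, forms the rank-k truncated SVD of a matrix and checks that the 2-induced norm of the error equals σ_{k+1}.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 6))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 3
Xk = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # truncated SVD, rank k

err = np.linalg.norm(A - Xk, ord=2)          # 2-induced norm of the error
```

For a generic matrix the truncation has rank exactly k, and the error norm is σ_{k+1} = s[k] (0-based indexing).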
This chapter contains some fundamental results from linear algebra and numerical linear algebra. The first section lists various norms of vectors and matrices. The second presents a discussion of an important decomposition of matrices and operators, the SVD (Singular Value Decomposition), which turns out to be important both in linear algebra and in the approximation of linear dynamical systems. Then we briefly review two concepts from numerical analysis which are essential in assessing the accuracy of computations: the condition number of a given problem and the stability of an algorithm to solve the problem.

Most of the material presented in this chapter is standard and can be found in many textbooks. For the material of the first three sections see, for example, Meyer [239], Lax [225], Higham [171], Stewart and Sun [314], Horn and Johnson [181], Golub and Van Loan [143], Thompson [324], and Trefethen and Bau [328]. For a treatment with a more functional analytic flavor, see Bhatia [64]. For the material in section 3.3 we also refer to the notes by Sorensen [308]; it has to do with the Krylov iteration. More material from linear algebra is presented in chapter 10, in particular as it relates to the Krylov methods; in the same chapter, related issues dealing with eigenvalue computations and pseudospectra are also briefly discussed.

3.1 Norms of finite-dimensional vectors and matrices

Let X be a linear space over the field of reals R or complex numbers C. A norm on X is a function ν : X → R such that the following three properties are satisfied:

strict positiveness: ν(x) ≥ 0, ∀ x ∈ X, with equality iff x = 0,
triangle inequality: ν(x + y) ≤ ν(x) + ν(y), ∀ x, y ∈ X,
positive homogeneity: ν(αx) = |α| ν(x), ∀ α ∈ C, ∀ x ∈ X.   (3.1)

For vectors x = (x₁ ··· xₙ)* ∈ Cⁿ, the Hölder or p-norms are defined as follows:

‖x‖_p = ( Σ_{i=1}^n |xᵢ|^p )^{1/p}, 1 ≤ p < ∞,  and  ‖x‖_∞ = max_{i∈{1,···,n}} |xᵢ|.   (3.2)

These norms satisfy Hölder's inequality:

|x*y| ≤ ‖x‖_p ‖y‖_q  for  1/p + 1/q = 1.   (3.3)

For p = q = 2 this becomes the Cauchy-Schwarz inequality: |x*y| ≤ ‖x‖₂ ‖y‖₂, with equality holding iff y = cx, c ∈ C. The following relationship between the Hölder norms for p = 1, 2, ∞ holds:

‖x‖_∞ ≤ ‖x‖₂ ≤ ‖x‖₁ ≤ √n ‖x‖₂ ≤ n ‖x‖_∞.

The unit balls in the 1-, 2-, and ∞-norms are shown in figure 3.1. Notice that the unit ball for the p-norm, 1 < p < 2, is a circular figure lying between the diamond and the circle, and for p > 2 it is also a circular figure, lying between the circle and the outer square.

Figure 3.1. Unit balls in R²: 1-norm (inner square/diamond), 2-norm (circle), ∞-norm (outer square).

An important property of the 2-norm is that it is invariant under unitary (orthogonal) transformations. Let U be n × n and U*U = Iₙ, where Iₙ is the n × n identity matrix. It follows that

‖Ux‖₂² = x*U*Ux = x*x = ‖x‖₂².

This holds also if U has size n × m, where m ≤ n, and U*U = Iₘ; in this case U is sometimes called suborthogonal (subunitary).

An important class of matrix norms are those which are induced by the vector p-norms defined above. For A = (Aᵢⱼ) ∈ Cⁿ×ᵐ, the induced p, q norm of A is

‖A‖_{p,q} = sup_{x≠0} ‖Ax‖_q / ‖x‖_p.   (3.4)
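The inequalities above can be checked numerically. The snippet below, an illustration with random data, verifies the chain of norm equivalences, Hölder's inequality for p = 3, q = 3/2, and the unitary invariance of the 2-norm.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 7
x = rng.standard_normal(n)
y = rng.standard_normal(n)

n1, n2, ninf = (np.linalg.norm(x, p) for p in (1, 2, np.inf))

# chain: ||x||_inf <= ||x||_2 <= ||x||_1 <= sqrt(n) ||x||_2 <= n ||x||_inf
tol = 1e-12
chain = (ninf <= n2 + tol) and (n2 <= n1 + tol) \
    and (n1 <= np.sqrt(n) * n2 + tol) and (np.sqrt(n) * n2 <= n * ninf + tol)

# Hölder's inequality with p = 3, q = 3/2 (so that 1/p + 1/q = 1)
p, q = 3.0, 1.5
holder = abs(x @ y) <= np.linalg.norm(x, p) * np.linalg.norm(y, q) + tol

# unitary invariance of the 2-norm: ||Qx||_2 = ||x||_2 for orthogonal Q
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
unitary_ok = np.isclose(np.linalg.norm(Q @ x), np.linalg.norm(x))
```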
Here A_{:,j} and A_{i,:} denote the jth column and the ith row of A, respectively. For equi-induced norms, i.e., p = q ∈ {1, 2, ∞}, the following expressions hold:

‖A‖₁ = max_{j∈{1,···,m}} Σ_{i=1}^n |Aᵢⱼ|,
‖A‖_∞ = max_{i∈{1,···,n}} Σ_{j=1}^m |Aᵢⱼ|,
‖A‖₂ = λ_max(AA*)^{1/2} = λ_max(A*A)^{1/2}.

More generally, the following expressions hold for mixed induced norms [85]:

‖A‖_{1,p} = max_j ‖A_{:,j}‖_p,  ‖A‖_{p,∞} = max_i ‖A_{i,:}‖_p̄,  where p̄ = p/(p − 1), p ∈ [1, ∞].

In particular, the following special cases hold:

‖A‖_{1,∞} = max_{i,j} |Aᵢⱼ|,  ‖A‖_{1,2} = δ_max(A*A)^{1/2},  ‖A‖_{2,∞} = δ_max(AA*)^{1/2},

where δ_max(M) denotes the largest diagonal entry of the matrix M.

There exist other norms besides the induced matrix norms. One such class are the Schatten p-norms; these noninduced norms are unitarily invariant. To define them, we introduce the singular values of A, denoted by σᵢ(A). In the next section we describe the singular values in greater detail; for now, it suffices to say that σᵢ(A) is the square root of the ith largest eigenvalue of AA*, i = 1, ···, min(n, m). Then, for m ≤ n,

‖A‖_{S,p} = ( Σ_{i=1}^m σᵢ(A)^p )^{1/p}, 1 ≤ p < ∞.   (3.5)

It follows that the Schatten norm for p = ∞ is

‖A‖_{S,∞} = σ_max(A),   (3.6)
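The induced and Schatten norms above translate directly into short computations. The snippet below, illustrative only, evaluates them for a random matrix and checks them against the built-in norms of numpy (`ord=1`, `ord=inf`, `ord=2`, `'fro'`, `'nuc'`).

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 5))
s = np.linalg.svd(A, compute_uv=False)

norm1 = np.abs(A).sum(axis=0).max()      # ||A||_1   : max column abs-sum
norminf = np.abs(A).sum(axis=1).max()    # ||A||_inf : max row abs-sum
norm2 = s[0]                             # ||A||_2   : largest singular value
norm_1_inf = np.abs(A).max()             # ||A||_{1,inf}: largest |A_ij|
schatten1 = s.sum()                      # Schatten 1-norm (trace norm)
frob = np.sqrt((s ** 2).sum())           # Frobenius = Schatten 2-norm
```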
which is the same as the 2-induced norm of A. For p = 1 we obtain the trace norm:

‖A‖_{S,1} = Σ_{i=1}^m σᵢ(A).

For p = 2 the resulting norm is also known as the Frobenius norm, the Schatten 2-norm, or the Hilbert-Schmidt norm of A:

‖A‖_F = ( Σ_{i=1}^m σᵢ²(A) )^{1/2} = [tr(A*A)]^{1/2},   (3.7)

where tr(·) denotes the trace of a matrix. All of the matrix norms discussed above satisfy the submultiplicativity property:

‖AB‖ ≤ ‖A‖ ‖B‖.   (3.8)

It should be noticed that there exist matrix norms which do not satisfy this relationship. As an example, consider the matrix norm ‖A‖ = max_{i,j} |Aᵢⱼ|.

3.2 The singular value decomposition

The Singular Value Decomposition, abbreviated SVD, is one of the most useful tools in applied linear algebra. It can be efficiently computed and is an important tool both for theoretical and computational considerations. For an overview of the role of matrix decompositions in computing, we refer to the article by Stewart [315], which lists the big six matrix decompositions that form the foundations of matrix computations:

1. the Cholesky decomposition,
2. the pivoted LU decomposition,
3. the QR algorithm,
4. the spectral decomposition,
5. the Schur decomposition, and finally
6. the SVD.

It is safe to say that if we were to keep only one of these decompositions, it would be the SVD.

Given a matrix A ∈ Cⁿ×ᵐ, n ≤ m, let the ordered nonnegative numbers σ₁ ≥ σ₂ ≥ ··· ≥ σₙ ≥ 0 be the positive square roots of the eigenvalues of AA*; let also I_k denote the k × k identity matrix. There exist unitary matrices U ∈ Cⁿ×ⁿ, UU* = Iₙ, and V ∈ Cᵐ×ᵐ, VV* = Iₘ, such that

A = UΣV*,   (3.9)

where Σ is an n × m matrix with Σᵢᵢ = σᵢ, i = 1, ···, n, and zero elsewhere. Thus if n = m, Σ is a square diagonal matrix with the ordered σᵢ on the diagonal. The decomposition (3.9) is called the singular value decomposition (SVD) of the matrix A; the σᵢ are the singular values of A, while the columns of

U = (u₁ u₂ ··· uₙ), V = (v₁ v₂ ··· vₘ)

are called the left and right singular vectors of A, respectively. These singular vectors are the eigenvectors of AA* and A*A, respectively.
It readily follows that Avᵢ = σᵢuᵢ, i = 1, ···, n.

Figure 3.2. Quantities describing the singular value decomposition in R².

Example 3.1. Consider the matrix

A = [ 1  1 ; 0  √2 ].

It readily follows that the eigenvalue decompositions of the matrices AA* = [ 2  √2 ; √2  2 ] and A*A = [ 1  1 ; 1  3 ] are AA* = UΣ²U* and A*A = VΣ²V*, where

U = (1/√2) [ 1  1 ; 1  −1 ],  Σ = [ σ₁  0 ; 0  σ₂ ],  σ₁ = √(2+√2),  σ₂ = √(2−√2),

V = [ 1/(√2 σ₁)  1/(√2 σ₂) ; (1+√2)/(√2 σ₁)  (1−√2)/(√2 σ₂) ].

Notice that A maps v₁ → σ₁u₁ and v₂ → σ₂u₂ (see figure 3.2). Since A = σ₁u₁v₁* + σ₂u₂v₂*, it follows from the Schmidt-Mirsky theorem discussed on page 35 that X = σ₂u₂v₂* is a perturbation of smallest 2-norm (equal to σ₂) such that A − X is singular:

X = (1/2) [ 1  1−√2 ; −1  √2−1 ]  ⇒  A − X = (1/2) [ 1  1+√2 ; 1  1+√2 ].

In other words, the distance of A to singularity is σ₂.

The singular values of A are unique. The left and right singular vectors corresponding to singular values of multiplicity one are also uniquely determined, up to simultaneous sign change. Thus the SVD is unique in case the matrix A is square and the singular values have multiplicity one.

Lemma 3.2. The largest singular value of a matrix A is equal to its induced 2-norm: σ₁ = ‖A‖₂.

Proof. By definition,

‖A‖₂² = sup_{x≠0} ‖Ax‖₂² / ‖x‖₂² = sup_{x≠0} (x*A*Ax) / (x*x).
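Example 3.1 can be checked numerically. The snippet below verifies, for the matrix A = [1 1; 0 √2] of the example, the stated singular values and the fact that the rank-one perturbation X = σ₂u₂v₂* of norm σ₂ makes A singular.

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, np.sqrt(2.0)]])
U, s, Vt = np.linalg.svd(A)

sigma1 = np.sqrt(2 + np.sqrt(2))
sigma2 = np.sqrt(2 - np.sqrt(2))

# rank-one perturbation of smallest 2-norm making A singular
X = s[1] * np.outer(U[:, 1], Vt[1, :])
```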
Let y be defined as y = V*x, where V is the matrix containing the eigenvectors of A*A: A*A = VΣ*ΣV*. Substituting in the above expression we obtain

σₙ² ≤ (x*A*Ax)/(x*x) = (σ₁²y₁² + ··· + σₙ²yₙ²)/(y₁² + ··· + yₙ²) ≤ σ₁².

This latter expression is maximized, and equals σ₁², for y = e₁ (the first canonical unit vector [1 0 ··· 0]*), that is, for x = v₁, where v₁ is the first column of V.

We are now ready to state the existence of the SVD for all matrices.

Theorem 3.3. Every matrix A with entries in C has a singular value decomposition.

3.2.1 Three proofs

Given the importance of the SVD, we provide three proofs.

First proof. This is based on the eigenvalue decomposition of the positive semidefinite Hermitian matrix AA*:

AA* = UΛU*, UU* = Iₙ, Λ = diag(λ₁, λ₂, ···, λₙ) ∈ Rⁿ×ⁿ,

where λᵢ ≥ λᵢ₊₁. If n > m, follow the same procedure with A*A; for the rest of the proof we assume for simplicity that λₙ ≠ 0. Set Σ = Λ^{1/2} and define K₁ = Λ^{−1/2}U*A ∈ Cⁿ×ᵐ. The above relationship implies K₁K₁* = Iₙ. Therefore K₁* can be extended to a unitary matrix, say V = (K₁* K₂), of size m × m; consequently

U [Λ^{1/2} 0] V* = A,

which completes the proof.

Second proof. This is based on the lemma above. Let σ₁ be the 2-norm of A. There exist unit-length vectors x₁ ∈ Cᵐ, x₁*x₁ = 1, and y₁ ∈ Cⁿ, y₁*y₁ = 1, such that Ax₁ = σ₁y₁. Define the unitary matrices V₁ = [x₁ V̂₁], U₁ = [y₁ Û₁], so that their first columns are x₁, y₁, respectively. Then

U₁*AV₁ = [ σ₁  w* ; 0  B ] = A₁,

where w ∈ Cᵐ⁻¹ and B has size (n−1) × (m−1). Consequently,

U₁*AA*U₁ = A₁A₁* = [ σ₁² + w*w  w*B* ; Bw  BB* ].

Since the 2-norm of every matrix is greater than or equal to the norm of any of its submatrices, we conclude that σ₁² + w*w ≤ ‖AA*‖₂ = σ₁². This implies that w must be the zero vector, w = 0, and thus

U₁*AV₁ = [ σ₁  0 ; 0  B ].

The procedure is now repeated for B, which has size (n−1) × (m−1). Again, since the norm of any submatrix is no greater than the norm of the whole matrix, σ₂ = ‖B‖ ≤ σ₁.

Third proof [357]. First recall that the SVD of A ∈ Rⁿ×ᵐ, n ≤ m, corresponds to the existence of orthonormal bases (u₁, u₂, ···, uₙ) of Rⁿ and (v₁, v₂, ···, vₘ) of Rᵐ, such that Avₖ = σₖuₖ, k = 1, 2, ···, n, where the σₖ are nonnegative real numbers. The proof goes by induction on n.

(i) If A = 0, take for U and V any orthogonal (unitary) matrices, and put Σ = 0.

(ii) Assume that A ≠ 0 and n ≤ m (otherwise consider A*).

(ii.1) For n = 1, A = a is a row vector. Take U = 1, V any orthogonal matrix with first column a*/‖a‖, and Σ = (‖a‖ 0 ··· 0).

(ii.2) Assume that the result holds for matrices having k − 1 rows. We show that it holds for matrices having k rows. Let w ≠ 0 be an eigenvector of A*A corresponding to an eigenvalue λ > 0. Define the spaces

V⊥_w = {x ∈ Rᵐ : x*w = 0},  V⊥_{Aw} = {y ∈ Rⁿ : y*Aw = 0}.

Since x ∈ V⊥_w implies ⟨Ax, Aw⟩ = x*A*Aw = λ x*w = 0, we conclude that A V⊥_w ⊂ V⊥_{Aw}. By the induction hypothesis, there are orthonormal bases u₂, u₃, ···, uₙ and v₂, v₃, ···, vₘ for (a subspace of) V⊥_{Aw} and V⊥_w, respectively, such that Avₖ = σₖuₖ, k = 2, ···, n. Now define

u₁ = Aw/‖Aw‖, v₁ = w/‖w‖, σ₁ = ‖Aw‖/‖w‖.

Since A(w/‖w‖) = (‖Aw‖/‖w‖)(Aw/‖Aw‖), we have Av₁ = σ₁u₁.

(ii.3) If the resulting σ₁, ···, σₙ are not ordered in decreasing order, apply a suitable permutation to the bases to achieve this. The third proof is complete.

3.2.2 Properties of the SVD

The SVD has important properties, which are stated next.

Corollary 3.4. Assume that in (3.9) σᵣ > 0 while σᵣ₊₁ = 0. Partition the matrices U, Σ, V compatibly in two blocks, the first having r columns:

U = [U₁ U₂],  Σ = [ Σ₁  0 ; 0  Σ₂ ] ∈ Rⁿ×ᵐ,  V = [V₁ V₂],

Σ₁ = diag(σ₁, ···, σᵣ) > 0 ∈ Rʳ×ʳ, Σ₂ = 0 ∈ R⁽ⁿ⁻ʳ⁾×⁽ᵐ⁻ʳ⁾,   (3.10)

where U₁, V₁ have r columns, and U₂, V₂ have n − r, m − r columns, respectively. Given (3.9) and (3.10), the following statements hold.

1. rank A = r.

2. The four fundamental spaces associated with A are:
span col A = span col U₁, ker A = span col V₂, span col A* = span col V₁, ker A* = span col U₂.

3. Dyadic decomposition. A can be decomposed as a sum of r outer products of rank one:

A = σ₁u₁v₁* + σ₂u₂v₂* + ··· + σᵣuᵣvᵣ*.   (3.11)
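The statements of the corollary translate directly into computations. The sketch below, using a random rank-2 matrix for illustration, builds the orthogonal projections onto the fundamental subspaces from U₁, V₁ and checks the dyadic decomposition.

```python
import numpy as np

rng = np.random.default_rng(4)
# rank-2 matrix in R^{5x4}
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))

U, s, Vt = np.linalg.svd(A)
r = int((s > 1e-10).sum())               # numerical rank; here r = 2

U1, V1t = U[:, :r], Vt[:r, :]
P_col = U1 @ U1.T                        # projection onto span col A
P_ker = np.eye(4) - V1t.T @ V1t          # projection onto ker A

# dyadic decomposition: the sum of r rank-one terms reproduces A
A_dyadic = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(r))
```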
4. The orthogonal projection onto the span of the columns of A is U₁U₁*.

5. The orthogonal projection onto the kernel of A* is Iₙ − U₁U₁* = U₂U₂*.

6. The orthogonal projection onto the span of the columns of A* is V₁V₁*.

7. The orthogonal projection onto the kernel of A is Iₘ − V₁V₁* = V₂V₂*.

8. The short form of the SVD in (3.10) is A = U₁Σ₁V₁*, where U₁ ∈ Rⁿ×ʳ, Σ₁ ∈ Rʳ×ʳ, V₁ ∈ Rᵐ×ʳ, and r is the rank of A. The Frobenius norm of A is ‖A‖_F = √(σ₁² + ··· + σₙ²).

Remark 3.2.1. It follows that the outer products in (3.11) are unique: given a pair of left and right singular vectors (uᵢ, vᵢ), the only other option for this pair is (−uᵢ, −vᵢ). On the other hand, the matrices U₂, V₂ are not necessary for the computation of the SVD of A. The columns of U₂ are arbitrary, subject to the constraint that they be linearly independent, normalized, and orthogonal to the columns of U₁; similarly, the columns of V₂ are arbitrary, subject to linear independence, normalization, and orthogonality with the columns of V₁.

In MATLAB, the command svd(A) computes the full SVD of A, while the command svds(A,k) computes a short SVD containing k terms, that is, the first k singular values and singular vectors. The use of the short SVD is recommended when only k ≪ min(n, m) terms are needed.

More about the EVD, in particular as it relates to the Krylov methods, will be discussed in chapter 10; its sensitivity to perturbations, as formalized by means of the concept of pseudospectra, will also be briefly discussed there.

3.2.3 Comparison with the eigenvalue decomposition

We will now compare some similarities and differences between the eigenvalue decomposition, abbreviated EVD, and the SVD. Given a square matrix A ∈ Cⁿ×ⁿ, there exist matrices X and Λ, with det X ≠ 0, such that

A = XΛX⁻¹,

where the λₖ are the eigenvalues of A and the columns of X are the eigenvectors and generalized eigenvectors of A. In general, Λ is in Jordan form, i.e., it is a direct sum of Jordan blocks Λₖ of size mₖ:

Λₖ = λₖ I_{mₖ} + J_{mₖ} ∈ C^{mₖ×mₖ},

where J_{mₖ} is a nilpotent matrix with ones on the superdiagonal and zeros everywhere else. Every square matrix has an eigenvalue decomposition (EVD) as above. Symmetric matrices are diagonalizable; in addition, the set of eigenvectors can be chosen to form an orthonormal set in Cⁿ:

A = A* ⇒ XX* = Iₙ and Λ = diag(λ₁, ···, λₙ), λᵢ ∈ R,

that is, Λ is a diagonal matrix (each Jordan block is scalar). If we require that X be unitary even if A is not Hermitian, we obtain the Schur decomposition A = UTU*, UU* = Iₙ, where U is unitary and T is upper triangular with the eigenvalues on the diagonal. Thus the EVD, the SVD, and the Schur decomposition are decompositions of the general form AY = ZΦ, where Φ is either diagonal, or "close" to diagonal (i.e., Jordan), or upper triangular:

EVD: AX = XΛ, det X ≠ 0;  SVD: AV = UΣ, UU* = Iₙ, VV* = Iₘ.   (3.12)
By forcing Y = Z = X, it sometimes happens that Φ is not diagonal and X is not orthogonal. In the Schur decomposition, the unitarity of X forces the matrix U*AU to be upper triangular, with the eigenvalues on the diagonal. In the SVD, the constraint Y = Z is relaxed and replaced by the requirement that Y and Z be unitary; this leads to a Φ = Σ matrix which is always diagonal, and furthermore its elements are nonnegative. Finally, in contrast to the EVD, the condition that A be square is also relaxed: even rectangular matrices have an SVD.

A further difference between the EVD and the SVD is illustrated by the following example.

Example 3.5. Consider the 2 × 2 matrix

A = [ 1  0 ; ε  0 ].

The eigenvalues of this matrix are 1 and 0, irrespective of the value of ε. The SVD of A is

A = [ cos θ  sin θ ; sin θ  −cos θ ] [ √(1+ε²)  0 ; 0  0 ] I₂,  where cos θ = 1/√(1+ε²), sin θ = ε/√(1+ε²).

Hence the largest singular value, i.e., the 2-norm of A, is equal to √(1+ε²) and depends on the value of ε. This example shows that big changes in the entries of a matrix may have no effect on the eigenvalues. This matrix is revisited in chapter 10, page 259, from the point of view of pseudospectra.

SVD for symmetric matrices. For symmetric matrices the SVD can be directly obtained from the EVD. Let the EVD of a given A ∈ Rⁿ×ⁿ be A = A* = VΛV*. Define S = diag(sgn λ₁, ···, sgn λₙ), where sgn λ is the signum function: it equals +1 if λ ≥ 0, and −1 if λ < 0. Then the SVD of A is A = UΣV*, where U = VS and Σ = diag(|λ₁|, ···, |λₙ|).

3.2.4 Optimal approximation in the 2-induced norm

The singular value decomposition is the tool which leads to the solution of the problem of approximating a matrix by one of lower rank, optimally in the 2-induced norm. The solution of this problem is given in the following result, due to four people: Schmidt, Mirsky, Eckart, and Young. Schmidt derived a version of it in the early 1900s in connection with integral equations [289]. Then two researchers from the quantitative social sciences, Eckart and Young, published another version of this result in the mid 1930s [100]. Finally, Mirsky in the 1960s derived a general version which is valid for all unitarily invariant norms [241]. A detailed account of this result can be found in the book by Stewart and Sun [314].

Problem. Given A ∈ Cⁿ×ᵐ, rank A = r ≤ n ≤ m, find X ∈ Cⁿ×ᵐ, rank X = k < r, such that the 2-norm of the error matrix E = A − X is minimized.

Theorem 3.5 (Schmidt, Eckart, Young, Mirsky). With the notation introduced above,

min_{X, rank X ≤ k} ‖A − X‖₂ = σ_{k+1}(A).   (3.14)

A (nonunique) minimizer X* is obtained by truncating the dyadic decomposition (3.11) to contain the first k terms:

X* = σ₁u₁v₁* + σ₂u₂v₂* + ··· + σₖuₖvₖ*,   (3.13)

provided that σₖ > σ_{k+1}.
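The sensitivity phenomenon of Example 3.5 is easy to reproduce. In the sketch below (the two ε values are chosen arbitrarily), the eigenvalues stay fixed at {0, 1} while the largest singular value √(1 + ε²) tracks ε.

```python
import numpy as np

def eig_and_sigma(eps):
    A = np.array([[1.0, 0.0],
                  [eps, 0.0]])
    eigvals = np.sort(np.linalg.eigvals(A).real)   # eigenvalues of A
    s1 = np.linalg.norm(A, 2)                      # largest singular value
    return eigvals, s1

eig_small, s_small = eig_and_sigma(1e-8)   # tiny perturbation
eig_big, s_big = eig_and_sigma(100.0)      # huge perturbation
```

In both cases the eigenvalues are 0 and 1, but the 2-norm grows from essentially 1 to about 100.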
The proof of this result is based on the next lemma.

Lemma 3.6. For all X of rank less than or equal to k, there holds

‖A − X‖₂ ≥ σ_{k+1}(A).   (3.15)

Proof of lemma. Let yᵢ ∈ Cᵐ, i = 1, ···, n − k, be a basis for the kernel of X:

ker X = span{y₁, ···, y_{n−k}}.

From a dimension argument it follows that the intersection of the two spans below is nontrivial:

span{y₁, ···, y_{n−k}} ∩ span{v₁, ···, v_{k+1}} ≠ {0}.

Let z ∈ Cᵐ, z*z = 1, belong to this intersection. It follows that (A − X)z = Az, and thus

‖A − X‖₂² ≥ ‖(A − X)z‖₂² = ‖Az‖₂² = Σ_{i=1}^{k+1} σᵢ² (vᵢ*z)² ≥ σ²_{k+1}.

This completes the proof.

Proof of theorem. There remains to show that the lower bound obtained in the above lemma is attained; the minimizer (3.13) given in the Schmidt, Eckart, Young, Mirsky theorem does precisely this.

Remark 3.2.2.
(a) Optimal approximation in the Frobenius norm. As it turns out, the Schmidt-Mirsky, Eckart-Young theorem also provides the solution to the optimal approximation problem in the Frobenius norm. The minimum value of the error (3.13) becomes in this case

min_{X, rank X = k} ‖A − X‖_F = ( Σ_{i=k+1}^m σᵢ²(A) )^{1/2}.

Furthermore, provided that σₖ > σ_{k+1}, there is a unique minimizer, given by (3.13).

(b) The importance of the Schmidt-Mirsky result is the relationship between the rank k of the approximant and the (k+1)st singular value of A.

(c) The minimizer (3.13) is not unique in the 2-induced norm. A class of minimizers is given as follows:

X(η₁, ···, ηₖ) = Σ_{i=1}^k (σᵢ − ηᵢ) uᵢvᵢ*, where |ηᵢ| ≤ σ_{k+1}.   (3.16)

This property is important in the Hankel norm approximation problem; see, e.g., formula (8.23).

(d) It is easy to see that the rank of the sum (or the linear combination) of two matrices is in general not equal to the sum of their ranks. This is also true if the linear combination is convex, that is, if the coefficients are nonnegative and their sum is equal to one. As a consequence, the problem of minimizing the 2-induced norm of A − X over all matrices X of rank (at most) k is a nonconvex problem, and there is little hope of solving it by applying general optimization algorithms.

(e) In order to obtain a larger set of solutions to the problem of optimal approximation in the 2-induced norm, one may relax the optimality condition (3.14). Instead, one may wish to obtain all matrices X of rank k satisfying

σ_{k+1}(A) < ‖A − X‖₂ < ε, where σ_{k+1}(A) < ε < σₖ(A).

This is the suboptimal approximation problem in the 2-induced norm, which was solved in [331].
Example 3.2.1. Approximation of an image using the SVD. Here we consider the following 4 × 3 matrix, which has rank 3:

T = [1 1 0; 0 1 0; 0 1 0; 1 1 1].

In order to be able to represent the results graphically, we notice that T can be considered as a 12-pixel image of the number one. In MATLAB each pixel is depicted as a gray square: a pixel with value 0/1 represents black/white, with 1 being white and 0 being black, and in-between values represent levels of gray. Here there are 128 levels of gray; numbers which are not integer multiples of 1/128 are approximated by the nearest integer multiple of 1/128, and finally, numbers less than zero are depicted by black pixels while numbers bigger than one are depicted by white pixels.

The singular values of T are σ₁ = √(3 + √7) ≈ 2.38, σ₂ = 1, σ₃ = √(3 − √7) ≈ 0.59. We will compute optimal approximations of rank 1 and 2. The approximants T₁, T₂ of rank 1, 2, respectively, obtained by means of the dyadic decomposition, turn out to be:

T₁ ≈ [0.69 1.07 0.38; 0.42 0.65 0.23; 0.42 0.65 0.23; 0.84 1.30 0.46],
T₂ ≈ [0.69 1.07 0.38; 0.09 0.98 −0.10; 0.09 0.98 −0.10; 1.17 0.96 0.79].

Notice that the norms of the errors are the corresponding neglected singular values: the largest singular value of E₁ = T − T₁ is σ₂ = 1, and that of E₂ = T − T₂ is σ₃ ≈ 0.59. It also follows that no entry of E₂ can exceed σ₃, while no entry of E₁ can exceed √(σ₂² + σ₃²) ≈ 1.16. Figure 3.3 shows a pictorial representation in MATLAB of the original figure and its approximants.

Figure 3.3. Approximation of an image using the SVD. Left: original (rank 3); middle: rank 2 approximant; right: rank 1 approximant.

3.2.5 Further applications of the SVD

The SVD can be used to tackle a number of other problems, some of which we mention here. First is the determination of the rank of a matrix. This is done by counting the number of singular values which are above a certain threshold: the threshold is zero for the actual rank and, for the numerical rank, some "small" number determined by the user according to the application at hand. Because the SVD is well-conditioned, it can be used to determine both the numerical and the actual rank of a matrix.
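The example above can be reproduced in a few lines. The book works in MATLAB; the following NumPy sketch (ours) performs the same computation and confirms the singular values and the error norms:

```python
import numpy as np

# 4x3 "image" of the digit one, rank 3
T = np.array([[1., 1., 0.],
              [0., 1., 0.],
              [0., 1., 0.],
              [1., 1., 1.]])
U, s, Vt = np.linalg.svd(T)

T1 = s[0] * np.outer(U[:, 0], Vt[0])        # optimal rank-1 approximant
T2 = T1 + s[1] * np.outer(U[:, 1], Vt[1])   # optimal rank-2 approximant

err1 = np.linalg.norm(T - T1, 2)            # = sigma_2 = 1
err2 = np.linalg.norm(T - T2, 2)            # = sigma_3 = sqrt(3 - sqrt(7))
```

Displaying T, T2, T1 with an image viewer (e.g. `imshow`) reproduces the three panels of Figure 3.3.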
The second application of the SVD is the calculation of the Moore–Penrose pseudoinverse. Given a not necessarily invertible or even square matrix A ∈ C^{n×m}, a pseudoinverse or generalized inverse is defined as a matrix X ∈ C^{m×n} that satisfies the relationships

AXA = A,  XAX = X,  (AX)* = AX,  (XA)* = XA.   (3.17)

Often the pseudoinverse of A is denoted by A⁺. The solution can now be given using the SVD of A: if A = UΣV* with Σ = diag(Σ₁, 0), where Σ₁ > 0 contains the nonzero singular values, then

X = V [Σ₁⁻¹ 0; 0 0] U*.

The least squares (LS) problem is closely related to the pseudoinverse of a matrix. Given is A ∈ C^{n×m}, n ≥ m, and a vector b ∈ C^n. We wish to find x ∈ C^m which solves

min_x ‖Ax − b‖₂.

We denote the norm of this residual by ρ_LS. The solution can be given using the SVD of A:

x_LS = Xb = Σ_{i=1}^{m} (u_i* b / σ_i) v_i,  ρ²_LS = Σ_{i=m+1}^{n} (u_i* b)².

The problem can also be formulated equivalently as follows: find X that solves min_X ‖AX − I_n‖_F; this version can be analyzed by applying the Schmidt–Eckart–Young–Mirsky theorem.

The final problem discussed here is that of the stability radius in robust control. Given M ∈ C^{n×m}, the complex stability radius is the inverse of the smallest 2-norm of a complex ∆ ∈ C^{m×n} such that I − ∆M is singular; it can be shown that the complex stability radius is equal to σ₁(M). It was recently shown that the real stability radius (i.e., ∆ ∈ R^{m×n}) can be computed as the second singular value of a real matrix derived from the complex matrix M.

3.2.6 SDD: The semidiscrete decomposition*

A variant of the SVD is the SDD (semidiscrete decomposition) of a matrix. This decomposition was introduced in [255]. Recall the dyadic decomposition (3.11). The rank-k approximation of a matrix A ∈ R^{n×m} by means of the SDD is given by a different dyadic decomposition:

A_k = δ₁x₁y₁* + δ₂x₂y₂* + ··· + δ_k x_k y_k* = X_k ∆_k Y_k*.

In this expression the entries of the vectors x_i ∈ R^n and y_i ∈ R^m are restricted to belong to the set {−1, 0, 1}, while the δ_i ∈ R are real numbers. The first remark is that this decomposition does not reproduce A even for k = n or k = m. The advantage obtained comes from the fact that the rank-k approximation requires storage of only 2k(n + m) bits plus k scalars, as opposed to (n + m + 1)k scalars for the rank-k approximation of A using the SVD.

The computation of this decomposition proceeds iteratively. Given A_{k−1}, let E_{k−1} = A − A_{k−1}; we seek x, y, δ such that the Frobenius norm of the error matrix is minimized:

‖E_{k−1} − δ xy*‖_F.

This problem is solved iteratively by fixing y and solving for x ∈ {−1, 0, 1}^n and δ ∈ R, and subsequently by fixing x and solving for y ∈ {−1, 0, 1}^m and δ ∈ R. The iteration is stopped when the difference in two successive iterates of δ drops below a specified tolerance.
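Returning to the pseudoinverse and least squares formulas above, both are easy to check numerically. The following NumPy sketch (ours; the random instance is arbitrary) builds X from the SVD and compares the LS solution and residual against the stated expressions:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((7, 4))      # n >= m, generically of full column rank
b = rng.standard_normal(7)

U, s, Vt = np.linalg.svd(A, full_matrices=True)
# X = V diag(1/sigma_i, 0) U^*  -- the Moore-Penrose pseudoinverse
Sinv = np.zeros((4, 7))
Sinv[np.arange(4), np.arange(4)] = 1.0 / s
X = Vt.T @ Sinv @ U.T

# x_LS = sum_i (u_i^* b / sigma_i) v_i ; rho^2 = sum_{i>m} (u_i^* b)^2
x_ls = sum((U[:, i] @ b) / s[i] * Vt.T[:, i] for i in range(4))
rho2 = sum((U[:, i] @ b) ** 2 for i in range(4, 7))
```

The four Penrose conditions (3.17) are verified implicitly by comparing X against a library pseudoinverse.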
Applying the iterative SDD approximation to Example 3.2.1, we obtain approximants T̂₁, T̂₂ of rank one and two, respectively:

T̂₁ = (3/4) [1 1 0; 1 1 0; 1 1 0; 1 1 0],
T̂₂ = T̂₁ − (3/4) [0 0 0; 1 0 0; 1 0 0; 0 0 0] = (3/4) [1 1 0; 0 1 0; 0 1 0; 1 1 0].

The corresponding largest singular values of the error matrices T − T̂₁ and T − T̂₂ are 1.188 and 1.0877, respectively; as expected, these errors exceed the optimal values σ₂ = 1 and σ₃ ≈ 0.59 attained by the SVD approximants of Example 3.2.1.

3.3 Basic numerical analysis

In this section we present a brief introduction to issues of accuracy in numerical computations. Two concepts are involved. First, the condition of the problem, which is a measure of the sensitivity of the solution to perturbations in the data; the condition of a problem does not depend on the algorithm used to compute an answer. Second, for a given solution algorithm, error analysis, which quantifies the error due to floating point arithmetic. A problem is ill-conditioned if small changes in the data cause relatively large changes in the solution; otherwise a problem is well-conditioned.

A forward error analysis is concerned with how close the computed solution is to the exact solution, while a backward error analysis is concerned with whether the computed inexact solution can be interpreted as the exact solution of the same problem with perturbed initial data. For problems of linear algebra it is often advantageous to perform a backward error analysis: a knowledge of the backward error, together with a knowledge of the sensitivity of the problem, allows us to measure the forward error. Besides the references listed already at the beginning of this chapter (see page 27), we also recommend the lecture notes by Sorensen [308], the work of Smale [301], the papers by Higham [172] and van Dooren [334], and the papers by Higham, Konstantinov, Mehrmann, and Petkov [335], [337], [338]. See also [206].

3.3.1 Condition numbers

First we examine the problem of estimating the effect of perturbations on input data. This leads to the concept of the condition number of the problem at hand. To begin with, let us look at the sensitivity of evaluating a given function f at a point x₀. Assume that f : R → R is twice differentiable in a neighborhood of x₀. A Taylor series expansion of f(x) around x₀ gives

f(x₀ + δ) = f(x₀) + f′(x₀)δ + O(δ²),

where the prime denotes the derivative with respect to x. The absolute condition number is precisely this derivative, |f′(x₀)|. For the relative condition number, notice that

[f(x₀ + δ) − f(x₀)] / f(x₀) = [f′(x₀)·x₀ / f(x₀)] · (δ/x₀) + O(δ²).

The relative condition number is therefore defined as the absolute value of the ratio of the relative change of f(x) over the relative change in x:

κ_f(x₀) = |f′(x₀)·x₀| / |f(x₀)|.

The interpretation of the condition number is that, given a small relative error in x₀, the resulting relative error in f(x₀) will be amplified by κ_f(x₀).

Next we examine the condition number of evaluating the vector-valued function f : R^n → R^m at a point x₀ ∈ R^n. The Taylor series expansion becomes in this case

f(x₀ + δ) = f(x₀) + J(x₀)δ + O(‖δ‖²),

where δ ∈ R^n is a vector perturbation and J(x₀) is the Jacobian of f evaluated at x₀: J_{i,j}(x₀) = ∂f_i/∂x_j at x = x₀. It tells us by how much infinitesimal perturbations in x₀ will be amplified to produce the perturbation in the evaluation. A perturbation δ in x₀ will result in a perturbation J(x₀)δ in f(x₀), which depends on the direction of δ. Thus, assuming that 2-norms are used, the amplification factor for the perturbations lies between the smallest and the largest singular values of the Jacobian J(x₀).
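The scalar formula κ_f(x₀) = |f′(x₀)·x₀/f(x₀)| can be checked against an actual finite perturbation. In the following sketch (our illustration, plain Python) we take f(x) = x², for which κ_f = 2 everywhere, so relative errors in x are roughly doubled in f(x):

```python
def f(x):
    return x * x

x0 = 3.7
kappa = abs(2 * x0 * x0 / f(x0))   # |f'(x0) x0 / f(x0)| = 2 for f(x) = x^2

delta = 1e-8 * x0                  # small relative perturbation of x0
rel_in = delta / x0
rel_out = abs(f(x0 + delta) - f(x0)) / abs(f(x0))
amplification = rel_out / rel_in   # observed amplification, close to kappa
```

Up to terms of order δ, the observed amplification equals the condition number, independently of the particular x₀ chosen here.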
The absolute condition number of f at x₀ is defined as the norm ‖J(x₀)‖ induced by the vector norms in the domain and the range of f, i.e., the 2-induced norm ‖J(x₀)‖₂ if 2-norms are used. The relative condition number is defined as

κ_f(x₀) = ‖J(x₀)‖ ‖x₀‖ / ‖f(x₀)‖,

where the matrix norm in the above expression is induced by the vector norm used: given a relative error in x₀, the largest resulting relative error in f(x₀) will be amplified by κ_f(x₀).

In the next section we will make use of the condition number of the inner product. For a, x ∈ R^n, we wish to compute the condition number of the inner product a*·x at x = x₀. Since the Jacobian in this case is J = a*, the earlier formula yields

κ = ‖a‖ ‖x₀‖ / |a*x₀|.   (3.18)

The condition number of a matrix–vector product Ab can be determined as follows. Starting from the equality b = AA⁻¹b, we derive the inequality ‖b‖ ≤ ‖A⁻¹‖ ‖Ab‖ and hence ‖Ab‖ ≥ ‖b‖/‖A⁻¹‖. Using this last inequality together with the relative error, we obtain the following upper bound:

‖Ab − Ab̂‖ / ‖Ab‖ ≤ ‖A‖ ‖b − b̂‖ / ‖Ab‖ ≤ ‖A‖ ‖A⁻¹‖ ‖b − b̂‖ / ‖b‖.

Thus the condition number, i.e., the largest amplification of perturbations in b, is κ = ‖A‖ ‖A⁻¹‖. This is called the condition number of the matrix A. This formula implies that the condition number of A is the same as that of its inverse A⁻¹; therefore the condition number of solving the set of linear equations Ax = b, i.e., x = A⁻¹b, is the same as that of the product Ab.

Finally we turn our attention to the condition number of the linear system of equations Ax = b, where both b and A are perturbed. The system actually solved is Âx̂ = b̂, where the norms of A − Â and b − b̂ are small. It can be shown (see, e.g., [171], page 145) that the relative error in the solution has the upper bound

‖x − x̂‖/‖x‖ ≤ κ (‖A − Â‖/‖A‖ + ‖b − b̂‖/‖b‖),  where κ = ‖A‖ ‖A⁻¹‖.

Hence the relative error in the solution can be amplified up to κ times with respect to the relative error in the data. If the 2-induced norm is used, it follows from the SVD of A that ‖A‖₂ = σ₁ and ‖A⁻¹‖₂ = 1/σ_n, and thus the condition number

κ = σ₁(A)/σ_n(A)   (3.19)

is the ratio of the largest to the smallest singular values of A.

Condition number of the EVD and SVD

A problem of considerable importance is that of computing the eigenvalue or the singular value decomposition of a given matrix. The sensitivity of these calculations to uncertainty in the data must therefore be studied. Given the square matrix A, let x, y be the right/left eigenvectors of A corresponding to the eigenvalue λ:

Ax = λx,  y*A = λy*.

If A is perturbed to A + ∆A, where ∆A is small, the other quantities in the above equation will be perturbed too:

(A + ∆A)(x + δx) = (λ + δλ)(x + δx).

Our goal is to describe the change δλ for infinitesimal perturbations in A. Expanding, neglecting second order terms in the perturbations, and multiplying on the left by y*, we obtain

δλ = y*∆Ax / (y*x),  and hence  |δλ| ≤ (‖y‖ ‖x‖ / |y*x|) ‖∆A‖.
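The first-order formula δλ = y*∆Ax/(y*x) can be tested directly. In the following NumPy sketch (ours; the random instance and tolerances are our choices) we track one simple eigenvalue of a random matrix under a small perturbation and compare the predicted and the actual change:

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((4, 4))
lam, X = np.linalg.eig(A)
Y = np.linalg.inv(X).conj().T          # columns: left eigenvectors, y_j* x_j = 1

k = 0                                   # follow one (generically simple) eigenvalue
x, y = X[:, k], Y[:, k]
dA = 1e-7 * rng.standard_normal((4, 4))

predicted = (y.conj() @ dA @ x) / (y.conj() @ x)   # first-order delta-lambda
lam_pert = np.linalg.eig(A + dA)[0]
actual = lam_pert[np.argmin(np.abs(lam_pert - lam[k]))] - lam[k]
```

The discrepancy between `actual` and `predicted` is of second order in ‖∆A‖, so for a perturbation of size 10⁻⁷ it is far below the change itself.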
The quantity κ_λ = ‖y‖ ‖x‖ / |y*x| is called the absolute condition number of the eigenvalue λ. In the special case where the matrix is Hermitian, A = A*, we have x = y, and therefore the condition number of computing eigenvalues of symmetric matrices is perfect: κ_λ = 1. Moreover, since the singular values of a matrix are the square roots of the eigenvalues of the symmetric (and positive semidefinite) matrices AA* and A*A, the conditioning of the singular values is also perfect: κ_σ = 1.

In case there is a nontrivial Jordan block associated with λ, then x ⊥ y, so their inner product is zero and the conditioning is the worst possible: the linearization is invalid, and the eigenvalues associated with the Jordan block split like ‖∆A‖^{1/i}, where i is the index of λ. For this reason, the numerical computation of the Jordan canonical form is an ill-posed problem; for details see Golub and Van Loan [143].

In order to study the condition number of the eigenvectors, we need to study the sensitivity of the corresponding eigenspaces. These results will not be discussed here; we refer to the literature, e.g., [256], [171]. We just quote one simple case which gives the type of results obtained. Suppose that λ is an eigenvalue of algebraic multiplicity one, and let λ̂ denote the eigenvalue of the matrix under consideration which is closest to λ. Then a small perturbation in A will result in a perturbation of the eigenvector which is inversely proportional to the difference |λ − λ̂|. Thus the smaller the gap between the eigenvalue of interest and the remaining ones, the more sensitive the calculation of the corresponding eigenvector becomes. A similar result holds for eigenvectors of symmetric matrices and hence for singular vectors of matrices.

3.3.2 Stability of solution algorithms

The stability of a method for solving a problem (algorithm) is concerned with the sensitivity of the method to rounding errors in the solution process. Error analysis is concerned with establishing whether or not an algorithm is stable for the problem at hand; the purpose of error analysis is the discovery of the factors determining the stability of an algorithm. For details see chapter 2 of the book by Wilkinson [355].

Floating point arithmetic. Numbers in a computer are represented by means of finitely many bits; therefore the precision is also finite. There exists a variety of floating point arithmetics; by far the most commonly used is the ANSI/IEEE 754-1985 Standard for Binary Floating Point Arithmetic [186]. In this arithmetic, only real numbers x in the interval [−N_max, N_max] can be represented, where for double precision arithmetic N_max = 2^1024 ≈ 1.79·10^308. Given a real number x in the above interval, we will denote by fl(x) its representation in floating point. It satisfies

fl(x) = x·(1 + δ)  where  |δ| ≤ ε,   (3.20)

where ε is also referred to as machine precision; double precision corresponds to a quantization error ε = 2^{−53} ≈ 1.1·10^{−16}. The operations of floating point addition and multiplication in IEEE arithmetic yield a result which is close to the true result up to ε, namely:

fl(x ± y) = (x ± y)·(1 + δ),  fl(x·y) = (x·y)·(1 + δ),  fl(x/y) = (x/y)·(1 + δ),  |δ| ≤ ε.   (3.21)

There are two ways of interpreting the error due to floating point arithmetic. Let our goal be to compute f(x), that is, evaluate the function f at the point x. Using our favorite algorithm, denoted by "A", the result obtained will be f_A(x) ≠ f(x), due to floating point errors during its execution. It is now assumed that there is a point x_A such that f(x_A) = f_A(x); this means that there exists a point which is mapped to the actual result under the function f. A forward error analysis is concerned with how close the computed solution f_A(x) is to the exact solution f(x). A backward error analysis is concerned with how well the computed solution satisfies the problem to be solved, that is, with how close x_A is to x.
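The eigenvalue condition number κ_λ = ‖y‖‖x‖/|y*x| is directly computable. The following NumPy sketch (ours; the function name and test matrices are our choices) shows the two extremes: κ_λ = 1 for a symmetric matrix, and a very large κ_λ for a nonnormal matrix whose eigenvectors are nearly parallel:

```python
import numpy as np

def eig_condition(A, k):
    # absolute condition number ||y|| ||x|| / |y* x| of the k-th eigenvalue
    lam, X = np.linalg.eig(A)
    Y = np.linalg.inv(X).conj().T      # columns are left eigenvectors
    x, y = X[:, k], Y[:, k]
    return np.linalg.norm(x) * np.linalg.norm(y) / abs(y.conj() @ x)

sym = np.array([[2.0, 1.0],
                [1.0, 3.0]])
kappa_sym = eig_condition(sym, 0)      # symmetric: x = y, kappa = 1

nonnormal = np.array([[1.0, 1e3],
                      [0.0, 1.001]])   # eigenvalues 1 and 1.001, nearly defective
kappa_nn = eig_condition(nonnormal, 0)
```

For the nonnormal matrix the right and left eigenvectors are nearly orthogonal, so κ_λ is of order 10⁶: a perturbation of size η in the entries can move the eigenvalue by roughly 10⁶·η.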
A method that guarantees as accurate a solution as the data warrants is said to be stable; otherwise the method is unstable. In other words, the forward error analysis aims at estimating how close the final result is to the exact result, while the backward error analysis interprets this same result as an exact calculation on perturbed data x_A, and seeks to determine how close x_A is to x. In the evaluation of the function f at the given point there are thus four quantities involved, namely f, f_A, x, x_A:

x → f(x) :  exact arithmetic on given input data,
x → f_A(x) = fl(f(x)) :  floating point arithmetic on given input data,
x_A → f(x_A) = f(fl(x)) :  exact arithmetic on different input data;

‖f_A(x) − f(x)‖ :  forward error,   ‖x_A − x‖ :  backward error.

We are now ready to define stability of an algorithm. The algorithm f_A for computing f(x) is forward stable if

f_A(x) = f(x) + e_f  where  ‖e_f‖ = O(ε)‖f(x)‖,   (3.22)

and the same algorithm is called backward stable if

f_A(x) = f(x + e_x)  where  ‖e_x‖ = O(ε)‖x‖,   (3.23)

where O(ε) is interpreted as a polynomial of modest degree in the size n of the problem times ε. Forward stability is defined using the first two of the four quantities above, while backward stability is defined using the latter two.

The two types of error introduced above are related through a Taylor series expansion,

f_A(x) = f(x_A) = f(x) + J(x)(x_A − x) + O(‖x_A − x‖²),

so the forward error is bounded by ‖J(x)‖ ‖x_A − x‖ to first order; in relative terms, it is bounded by κ_f(x) times the relative backward error. Thus if the backward error is small (say, on the order of machine precision), the forward error is bounded from above by the product of the backward error and the condition number (see page 10 of Higham's book [171]).

Rule of Thumb:  Forward Error ≤ Condition Number × Backward Error.

Example 3.3.1. In floating point arithmetic the order of summation plays a role. Here is a simple example. Let a = 1/2, b = −1/3, c = −1/6. In exact arithmetic the sum of these three fractions is zero, and that holds for summation in any order. However, the following experiment, conducted in MATLAB, shows that in floating point arithmetic we get

(a + b) + c = (b + a) + c = 2.77·10^{−17},
(a + c) + b = (c + a) + b = 5.55·10^{−17},
(b + c) + a = (c + b) + a = 0.

The above differences are due to the fact that fl(a) = 5.000000000000000·10^{−1} is exact, while fl(b) = −3.333333333333333·10^{−1} and fl(c) = −1.666666666666667·10^{−1} are not; the sum b + c, however, happens to be exactly representable in floating point arithmetic.
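Example 3.3.1 can be reproduced in any IEEE double precision environment, not only MATLAB; here is the same experiment as a plain Python sketch (the exact power-of-two values in the comments follow from round-to-nearest-even):

```python
a, b, c = 1/2, -1/3, -1/6   # fl(a) is exact; fl(b) and fl(c) are rounded

s1 = (a + b) + c            # 2**-55, about 2.77e-17
s2 = (a + c) + b            # 2**-54, about 5.55e-17
s3 = (b + c) + a            # 0.0 : fl(b) + fl(c) rounds to exactly -1/2

eps = 2.0 ** -53            # double precision machine epsilon
```

So floating point addition is not associative, even though each individual operation is correct to within ε.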
n. The exact result gives a · b = e · 106 . and second. the error can be interpreted in two different ways. Consider the column vector of length one million: b=ones(1000000. is 1.i i 3. and thus the relative error has the upper bound µκ where κ =  n i=1 xi yi  . while the actual forward error shows a loss of precision of 5 digits. Indeed the backward error implies a loss of precision of 6 digits. denoted by µ. The condition number for this problem. Assuming that n < 1.3. for t > 0. we assume that there is no perturbation in x. Thus y ˆ−y y ≤ µ. the forward error is at most equal to the backward error. As indicated in the previous section. In the former case. and α = ±1.718281828459054. the forward error is bounded from above by the product of the backward error times the condition number of the problem at hand. We will estimate both of these ˆ = (µi yi ) errors. Example 3. as resulting from perturbation of the ﬁnal result. i = 1. and the relative error in y has the upper bound µ: and y = (yi ). Let µk = πk − 1. n. 2004 i i i i . We will now estimate µk . Since µ ≈ n . S1 = s1 . given by (3. Therefore according to the Rule of Thumb given above. which implies that Sn is the desired result (or more precisely.3.  i  ≤ f l(S2 ) = f l(f l(s2 ) + f l(s1 )) = [s1 · (1 + 1 ) + s2 · (1 + 2 )] · (1 + δ2 ). First as resulting from perturbation of the data.3. Illconditioned inner product gives large forward error. Therefore we have lost 5 signiﬁcant digits of accuracy.718281828476833 · 106 while exp(1) = 2. namely: Sk = sk + Sk−1 . Backward and forward stable inner product calculation. This can be explained as follows. 2 ≤ k ≤ n. In ﬂoating point the following sums and products are computed f l(si ) = f l(xi · yi ) = xi · yi · (1 + i ). Basic numerical analysis 43 i “oa” 2004/4/14 page 43 i Therefore our algorithm for computing the inner product will consist of choosing a particular order of summation. is: n µk ≤ µ = n 1 + 2 The assumption n < 1. 
Then we can write n n n f l(Sn ) = i=1 xi yi (1 + µi ) = ex i=1 xi yi + i=1 xi yi µi . ef where ex is the backward and ef is the forward error. · · · . According to the considerations above the backward error is of the order n ≈ 106 · 10−16 = 10−10 . We compute the inner product a*b. using the truncated power series expansion: exp(αt) = 1 + ACA αn−1 tn−1 α t α 2 t2 + + ··· + = x∗ · y 1! 2! (n − 1)! April 14. while there is perturbation in y: y ˆ − y ≤ µ y . Thus as expected. an approximation thereof). The result in MATLAB gives 2. n i=1 xi yi  n i=1 xi yi µi can Recall from the previous section that κ is the condition number of the inner product. it can be shown that the largest µk . δ2  ≤ f l(S3 ) = f l(f l(s3 ) + f l(S2 )) = [s3 · (1 + 3 ) + f l(S2 )] · (1 + δ3 ) f l(Sn ) = f l(f l(sn ) + f l(Sn−1 )) = [sn (1 + n ) + f l(Sn−1 )] · (1 + δn ) n n = i=1 xi yi πi where πk = (1 + k ) i=k (1 + δi ). We will investigate the computation of the exponentials Eα = exp(αt). let a be the row vector of the same length: a=exp(1)*b’. implies that the number of terms in the inner product cannot be too big. δ1 = 0. A simple example in MATLAB illustrates the issues discussed above.3. k = 1. Example 3. The forward error n n be bounded as  i=1 xi yi µi  ≤ µ i=1 xi yi . y This is the backward error. since i and δi are bounded in absolute 2 value by machine precision. the algorithm used is backward stable. · · · . thus πk  ≤ (1 + )n = 1 + n + n(n − 1) 2 + · · ·.2.1).18).
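Example 3.3.3 is easy to reproduce. The following plain Python sketch (ours; the book phrases the sum as an inner product x*·y, here it is accumulated term by term) shows the catastrophic cancellation for α = −1 and the stable alternative of computing e^t and inverting:

```python
import math

def exp_series(t, n=150):
    # truncated power series 1 + t + t^2/2! + ... with n terms
    s, term = 1.0, 1.0
    for i in range(1, n):
        term *= t / i
        s += term
    return s

t = 20.0
good = exp_series(t)                  # alpha = +1: kappa = 1, accurate
naive = exp_series(-t)                # alpha = -1: kappa ~ e^{2t}, inaccurate
stable = 1.0 / exp_series(t)          # compute e^t, then invert

rel_good = abs(good - math.exp(t)) / math.exp(t)
rel_naive = abs(naive - math.exp(-t)) / math.exp(-t)
rel_stable = abs(stable - math.exp(-t)) / math.exp(-t)
```

The intermediate terms of the alternating series reach about 4·10⁷, so rounding errors of size 4·10⁷·ε ≈ 10⁻⁹ swamp the result exp(−20) ≈ 2·10⁻⁹, while the inverted computation keeps nearly full precision.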
Taken as an algorithm for computing e^{−z}, z > 0, this series expansion is therefore an unstable algorithm. On the other hand, computing e^z and inverting it, e^{−z} = 1/e^z, gives an algorithm for computing e^{−z} which is both forward and backward stable.

Conclusion. When is a numerical result accurate? The answer depends on two issues: the problem has to have a good condition number (i.e., it must not be too sensitive to perturbations in the data), and in addition the algorithm used must be backward stable. Thus:

Backward stability by itself does not imply accurate answers.

3.3.3 Two consequences

We will now briefly discuss two consequences of the above considerations, namely the stability of the SVD and the ill-posedness of the rank of a matrix. Both are at the very foundations of subsequent developments.

Stability of the SVD

An important property of the SVD is that it can be computed in a backward stable way. In MATLAB notation, let [U, Σ, V] = svd(A) be the exact decomposition, and let [Û, Σ̂, V̂] = fl(svd(A)) be the one obtained taking into account floating point errors. This means that the errors due to floating point arithmetic (roundoff errors) are attributable to execution of the algorithm chosen on data which are close to the initial data: it can be shown (see, e.g., [143]) that

Â = ÛΣ̂V̂* = A + E,  where  ‖E‖ = O(ε)‖A‖.   (3.24)

As before, O(ε) is a polynomial of modest degree in the size of the problem times machine precision ε. In other words, (3.24) says that the SVD can be computed using a backward stable algorithm: the computed singular values of A are the exact singular values of the perturbed matrix A + E. Furthermore, it follows from the Fan inequalities (3.27) that, given any E,

|σ_i(A + E) − σ_i(A)| = |σ̂_i − σ_i| ≤ σ₁(E) = ‖E‖₂,  for all i.

Thus |σ̂_i − σ_i| ≤ ‖E‖ = O(ε)‖A‖, which implies that the computation of the singular values can be accomplished in a forward stable manner as well. The SVD of a matrix is well-conditioned with respect to perturbations of its entries.

We conclude this section with the forward error for a collection of important expressions. For this purpose we need to introduce the following notation: given the matrix A = (a_ij), its absolute value is the matrix of absolute values, |A| = (|a_ij|), and the inequality |A| ≤ |B| means |a_ij| ≤ |b_ij| for all i, j. Note that |AB| ≤ |A| |B|.

Lemma. Forward errors for various vector–matrix operations. Let A ∈ R^{m×n}, B ∈ R^{n×k}, x ∈ R^n, and μ ≈ nε. Then:
• fl(Ax) = Ax + z, where |z| ≤ μ |A| |x|;
• fl(AB) = AB + Z, where |Z| ≤ μ |A| |B|;
• fl(‖x‖₂) = ‖x‖₂·(1 + δ), where |1 + δ| ≤ (1 + μ)(1 + ε).
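The perturbation bound |σ_i(A + E) − σ_i(A)| ≤ ‖E‖₂ can be observed directly. The following NumPy sketch (ours, with an arbitrary random instance) perturbs a matrix and checks the bound for every singular value simultaneously:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((8, 5))
E = 1e-6 * rng.standard_normal((8, 5))

sA = np.linalg.svd(A, compute_uv=False)
sAE = np.linalg.svd(A + E, compute_uv=False)
normE = np.linalg.norm(E, 2)

worst = np.max(np.abs(sAE - sA))   # largest shift over all singular values
```

Every singular value moves by at most ‖E‖₂, which is exactly the well-conditioning statement above; no analogous uniform bound holds for eigenvalues of nonnormal matrices.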
This property does not hold for the EVD. There exist matrices where a small change in the parameters causes a large change in the eigenvalues. Consider for instance a 10 × 10 matrix A in companion form with all 10 eigenvalues equal to 1. A small change, of the order of 10^{−3}, in one of the entries of A causes the eigenvalues to change drastically. The reason is that the condition number of the eigenvalues of this matrix is of the order of 10^13, while the condition number of A (which determines the distance to singularity) is only of the order of 10⁵. Such phenomena motivate the use of pseudospectra (see section 10.2 on page 259).

Ill-posedness of rank computation

The rank of a matrix A ∈ R^{n×m}, n ≤ m, can be considered as a map ρ between the space of all n × m matrices and the set of natural numbers:

ρ : R^{n×m} → N = {0, 1, ···, n}.

We will show that for rank-deficient matrices this map is not differentiable. The Fréchet derivative of ρ, denoted by Dρ, is defined by means of the equation

ρ(A + H) − ρ(A) − Dρ(H) = O(‖H‖),

where Dρ(H) is linear in H. We distinguish two cases. If A has full rank, ρ(A) = n, then for H of sufficiently small norm (less than the smallest singular value of A) we have ρ(A + H) = ρ(A): the rank remains constant for sufficiently small perturbations, and Dρ can be chosen as the zero function. If instead ρ(A) = k < n, the rank of the perturbed matrix becomes full, ρ(A + H) = n, for almost all H of infinitesimally small norm. Therefore there exists no Fréchet derivative Dρ (the defining equation cannot be satisfied), and we have the following result.

Lemma 3.7. The determination of the rank of a matrix is an ill-posed problem.

A basic problem in numerical analysis is trying to decide when a small floating point number should be considered as zero; or, in a similar fashion, trying to decide when a cluster of eigenvalues should be regarded as one multiple eigenvalue or as nearby but distinct eigenvalues.

The numerical rank

According to corollary 3.3, the rank of a matrix is equal to the number of nonzero singular values. Although rank determination is an ill-posed problem, it can be adapted to the problem at hand so that the determination of the numerical rank becomes well posed. To define the numerical rank we need a tolerance δ > 0 (e.g., δ = ε‖A‖₂ = εσ₁, where ε is machine precision); in general, the tolerance is user specified. The numerical rank of a matrix is now defined as the number of singular values which are bigger than δ:

rank_δ A = k,  where  σ_k(A) > δ  and  σ_{k+1}(A) ≤ δ.

For instance, if there is a sufficient gap between σ_k and σ_{k+1}, the tolerance may be set as δ = (σ_k + σ_{k+1})/2, in which case the numerical rank will be k. In terms of backward error analysis, the computed singular values of A are the exact singular values of the perturbed matrix A + E, and are therefore well conditioned.

3.3.4 Distance to singularity

The remedy for the ill-posedness of the rank determination problem is to consider instead the distance to singularity. For a given A ∈ R^{n×m}, n ≤ m, the distance to singularity in a given norm is defined as the smallest norm of a matrix H ∈ R^{n×m} such that the rank of the perturbed matrix A − H is less than n. Similarly, the distance to the matrices of rank k can be defined as the smallest norm of H such that A − H has rank at most k. These distance problems in the 2-norm can be solved explicitly by means of the SVD. From the dyadic decomposition (3.11) of A, the perturbation

H = Σ_{i=k+1}^{n} σ_i u_i v_i*

is such that A − H has k nonzero singular values and hence rank k. The minimality of ‖H‖₂ follows from the lemma used in the proof of the Schmidt–Mirsky theorem. We thus have the following result.
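Both ideas, the numerical rank with a tolerance and the distance to singularity σ_n, can be sketched together (NumPy; the function names and the test matrices are our constructions):

```python
import numpy as np

def numerical_rank(A, tol):
    # number of singular values above the tolerance delta
    s = np.linalg.svd(A, compute_uv=False)
    return int(np.sum(s > tol))

# a rank-2 matrix contaminated by a rank-one perturbation of size 1e-10
A = np.outer([1., 2., 3., 4., 5., 6.], np.ones(5)) \
  + np.outer([1., 0., 1., 0., 1., 0.], [1., 2., 3., 4., 5.])
noisy = A + 1e-10 * np.outer(np.ones(6), [1., -1., 1., -1., 1.])
r_fine = numerical_rank(noisy, tol=1e-12)   # counts the perturbation: 3
r_gap = numerical_rank(noisy, tol=1e-8)     # tolerance inside the gap: 2

# distance to singularity: ones on the diagonal, minus ones above it
def triangular_example(n):
    return np.eye(n) - np.triu(np.ones((n, n)), k=1)

U, s, Vt = np.linalg.svd(triangular_example(20))
# subtracting sigma_n u_n v_n^* produces a singular matrix at distance sigma_n
nearest_singular = triangular_example(20) - s[-1] * np.outer(U[:, -1], Vt[-1])
residual = np.linalg.svd(nearest_singular, compute_uv=False)[-1]
```

Placing the tolerance inside the gap between σ₂ and σ₃ makes the numerical rank well posed, and the explicit perturbation of norm σ_n indeed reaches a singular matrix.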
Lemma 3.8. Let A ∈ R^{n×m}, n ≤ m, have singular values σ_i(A), i = 1, ···, n. The distance of A to the set of rank-k matrices in the 2-norm is σ_{k+1}(A), the (k + 1)st singular value of A. Therefore the distance to singularity is σ_n(A).

Often the relative distance to singularity is of importance. It is the distance to singularity scaled by the norm of A; thus this distance in the 2-norm is the quotient of the smallest to the largest singular values,

relative distance to singularity = σ_n(A)/σ₁(A) = 1/κ(A),

which according to (3.19) is the inverse of the 2-condition number. We can say in general that the reciprocal of the relative distance to ill-posed problems is equal to the 2-condition number; see Demmel [93] for an analysis of this issue. For matrix nearness problems see also Higham [169], and for an account of other matrix distances to singularity we refer to Rump [282] and references therein.

Example 3.3.4. Consider the matrix A = [1 1; 0 1]. The SVD of A is UΣV*, where

u₁ = [cos θ; sin θ],  u₂ = [sin θ; −cos θ],  v₁ = [sin θ; cos θ],  v₂ = [cos θ; −sin θ],

with σ₁ = (√5 + 1)/2 ≈ 1.618, σ₂ = (√5 − 1)/2 ≈ 0.618, and tan θ = σ₂ ⇒ θ = 31.72°. It follows that

H₂ = σ₂ u₂ v₂* = σ₂/(1 + σ₂²) [σ₂ −σ₂²; −1 σ₂] ≈ [0.276 −0.171; −0.447 0.276]

is a matrix of least 2-norm (namely σ₂) such that the perturbed A − H₂ is singular.

Example 3.3.5. Consider the square matrix of size n with ones on the diagonal, minus ones in all entries above the diagonal, and zeros elsewhere. The absolute and the relative distance to singularity decrease rapidly with increasing n:

n        5            10           15           20           50
σ_n      9.40·10^{−2}  2.92·10^{−3}  9.15·10^{−5}  2.86·10^{−6}  2.65·10^{−15}
σ_n/σ₁   2.92·10^{−2}  5.12·10^{−4}  1.05·10^{−5}  2.41·10^{−7}  8.60·10^{−17}

Thus for n = 50 the matrix is singular for machine precision tolerance. We also notice that the distance of the above matrix to matrices of rank n − 2, namely σ_{n−1} ≈ 1.50, is practically independent of n. Such phenomena (small σ_n but bounded σ_{n−1}) are typical of Toeplitz matrices; for details we refer to the book of Böttcher and Silbermann [69]. It should be mentioned that this issue can also be understood in terms of pseudospectra.

3.3.5 LAPACK software

LAPACK [7], which stands for Linear Algebra PACKage, is a state of the art software package for the numerical solution of problems in dense and banded linear algebra. It allows users to assess the accuracy of their solutions: it provides error bounds for most quantities computed by LAPACK (see chapter 4 of the LAPACK Users' Guide, "Accuracy and Stability"). Here is a list of its salient features.
• Solution of systems of linear equations
• Solution of linear least squares problems
• Solution of eigenvalue and singular value problems, including generalized problems
• Matrix factorizations
• Condition and error estimates
• The BLAS (Basic Linear Algebra Subprograms) as a portability layer
• Linear algebra package for high-performance computers
• Dense and banded linear algebra for shared memory

It should be mentioned that, starting with Version 6, MATLAB has incorporated LAPACK; therefore calls to dense linear algebra algorithms (e.g., LU, QR, SVD, EVD) use LAPACK.

3.4 General rank additive matrix decompositions*

In this section we investigate matrix decompositions Q = R + S which satisfy the property that the rank of the sum is equal to the sum of the ranks:

rank Q = rank R + rank S.

From our earlier discussion it follows that the SVD provides such a decomposition. Below, following the developments in [87], a general characterization of such decompositions is given; see also [185].

First recall the identity

[A B; C D] = [I_n BD^{−1}; 0 I_k] [A − BD^{−1}C 0; 0 D] [I_n 0; D^{−1}C I_k],

where A − BD^{−1}C is called the Schur complement of D in the block matrix in question. Therefore

rank [A B; C D] = rank (A − BD^{−1}C) + rank D.   (3.25)

Lemma 3.9. Let A ∈ R^{n×n}, B ∈ R^{n×k}, C ∈ R^{k×n}, D ∈ R^{k×k}, with rank B = rank C = rank D = k. Then

rank [A B; C D] = rank A

if, and only if, there exist matrices X ∈ R^{n×k} and Y ∈ R^{k×n} such that

B = AX,  C = YA,  D = YAX.   (3.26)

Proof. If conditions (3.26) are satisfied, then

[A B; C D] = [A AX; YA YAX] = [I_n; Y] A [I_n X],

and therefore the rank of this matrix is equal to the rank of A; this proves the sufficiency of (3.26). Conversely, let rank [A B; C D] = rank A. The fact that the rank of the first block row and the rank of the first block column both equal the rank of A implies the existence of matrices X, Y such that the first two conditions in (3.26) are satisfied; the third condition then follows from the fact that the whole matrix should have rank equal to that of A.

From the above analysis it follows that rank [A B; C D] = rank A implies, by (3.25), that rank A = rank (A − BD^{−1}C) + rank D; since rank D = rank (BD^{−1}C), this yields the following consequence.

Corollary 3.10. Let conditions (3.26) hold. Then rank (A − BD^{−1}C) = rank A − rank (BD^{−1}C).
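Lemma 3.9 and its corollary can be probed on a generic instance. The following NumPy sketch (our construction) builds B, C, D from (3.26) and checks both rank statements:

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 5, 2
A = rng.standard_normal((n, n))      # generically of full rank n
X = rng.standard_normal((n, k))
Y = rng.standard_normal((k, n))

B, C, D = A @ X, Y @ A, Y @ A @ X    # conditions (3.26)
M = np.block([[A, B], [C, D]])

cross = B @ np.linalg.inv(D) @ C     # B D^{-1} C, of rank k
rank = np.linalg.matrix_rank
r_M, r_A = rank(M), rank(A)          # equal, by Lemma 3.9
r_schur = rank(A - cross)            # rank A - rank(B D^{-1} C), by the corollary
r_cross = rank(cross)
```

Here D = YAX is invertible for a generic choice of X and Y, so the Schur complement in (3.25) is well defined.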
The following result for the rankone case was discovered by Wedderburn in the early 1930s [353]. which satisfy the property that the rank of the sum is equal to the sum of the ranks. Lemma 3. ACA April 14. the third condition follows then from the fact that the whole matrix should have rank equal to that of A. let (3. MATLAB has incorporated LAPACK.
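Wedderburn's rank-one reduction is easy to exercise numerically. The sketch below (numpy, illustrative, not from the book) builds a matrix of rank 3, applies one reduction step A − π⁻¹vw∗ with v = Ax, w∗ = y∗A, π = y∗Ax, and checks that the rank drops by exactly one.

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a random 6 x 6 matrix of rank 3.
n, r = 6, 3
A = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))
assert np.linalg.matrix_rank(A) == r

# Pick x, y with pi = y* A x nonzero (generic random vectors work).
x = rng.standard_normal(n)
y = rng.standard_normal(n)
v = A @ x            # v  = A x
w = A.T @ y          # w* = y* A
pi = y @ A @ x       # pi = y* A x

# Wedderburn rank-one reduction: the rank drops by exactly one.
B = A - np.outer(v, w) / pi
assert np.linalg.matrix_rank(B) == r - 1
```

Iterating this step r times yields a rank additive decomposition of A into rank-one terms; the SVD is the special case in which the terms are mutually orthogonal.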
Least squares. Given is the set of data (measurements)

    xk = [yk; zk] ∈ Rn, yk ∈ Rn1, zk ∈ Rn−n1, k = 1, ..., N, N ≥ n.

This data is arranged in matrix form:

    X = [x1 · · · xN] = [Y; Z] ∈ Rn×N, where Y = [y1 · · · yN] ∈ Rn1×N, Z = [z1 · · · zN] ∈ R(n−n1)×N.

The least squares (LS) problem consists in finding A ∈ R(n−n1)×n1 such that

    ‖AY − Z‖²F

is minimized, where the subscript "F" denotes the Frobenius norm defined in (3.7). This latter expression can also be written in the more familiar form Σ_{i=1}^N ‖Ayi − zi‖²₂. The LS solution leads to the following rank additive decomposition of the data matrix, X = X̂ + X̃, where

    X̂ = [Y; ZY⁺Y] and X̃ = [0; Z(IN − Y⁺Y)],

and Y⁺ denotes a generalized inverse of Y (see (3.17)). Notice also that the decomposition is orthogonal, since X̂X̃∗ = 0. It follows from the above considerations that in LS the first n1 components of each measurement vector xk are assumed error-free, an assumption which may not always be justified.

3.5 Majorization and interlacing*

In this section we review some relationships between the eigenvalues, the singular values, and the diagonal entries of a matrix. These relationships are known as majorization inequalities; they will be used in section 9.4 on page 233 to express an average rate of decay of the so-called Hankel singular values. For details we refer to Marshall and Olkin [233], Horn and Johnson [182], and Thompson [324].

Definition 3.11. Given two real vectors x, y ∈ Rn with entries arranged in decreasing order, xi ≥ xi+1, yi ≥ yi+1, i = 1, ..., n − 1, we say that y majorizes x additively, and write x ≺Σ y, or multiplicatively, and write x ≺π y, if the following inequalities hold:

    x ≺Σ y:  Σ_{i=1}^k xi ≤ Σ_{i=1}^k yi, k = 1, ..., n − 1, and Σ_{i=1}^n xi = Σ_{i=1}^n yi;
    x ≺π y:  Π_{i=1}^k xi ≤ Π_{i=1}^k yi, k = 1, ..., n − 1, and Π_{i=1}^n xi = Π_{i=1}^n yi.

We say that y weakly majorizes x, and write x ≺Σ,w y or x ≺π,w y, respectively, if the last relationship above is an inequality: Σ_{i=1}^n xi < Σ_{i=1}^n yi, or Π_{i=1}^n xi < Π_{i=1}^n yi.

Given the above definitions, the following holds.

Proposition 3.12. Given two real vectors x, y ∈ Rn with ordered entries as above, y majorizes x additively, x ≺Σ y, if, and only if, x = Sy for some doubly stochastic matrix S ∈ Rn×n, i.e., a matrix with nonnegative entries whose rows and columns each sum to one.

Next, consider matrices A and B such that B = UAV∗ (all matrices square, A diagonal), and let α = [A11 · · · Ann]∗ and β = [B11 · · · Bnn]∗ denote the vectors containing the diagonal entries of each matrix. A straightforward computation shows that

    β = (U ∘ V) α = [u11v11 · · · u1nv1n; u21v21 · · · u2nv2n; · · ·; un1vn1 · · · unnvnn] [α1; α2; · · ·; αn].
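The rank additive LS decomposition can be verified directly. The sketch below (numpy, with made-up sizes n1 = 2, n − n1 = 1, N = 20; the Moore–Penrose pseudoinverse plays the role of the generalized inverse Y⁺) checks rank additivity and orthogonality of X̂ and X̃.

```python
import numpy as np

rng = np.random.default_rng(1)
n1, n2, N = 2, 1, 20            # hypothetical sizes: n = n1 + n2
Y = rng.standard_normal((n1, N))
Z = rng.standard_normal((n2, N))
X = np.vstack([Y, Z])

# LS fit Z ~ A Y in the Frobenius norm: A = Z Y+.
Yp = np.linalg.pinv(Y)          # Moore-Penrose generalized inverse
A = Z @ Yp

Xhat = np.vstack([Y, A @ Y])                          # fitted part
Xtil = np.vstack([np.zeros((n1, N)), Z - A @ Y])      # residual part

assert np.allclose(X, Xhat + Xtil)
# Rank additivity: rank X = rank Xhat + rank Xtil.
assert np.linalg.matrix_rank(X) == (np.linalg.matrix_rank(Xhat)
                                    + np.linalg.matrix_rank(Xtil))
# Orthogonality of the decomposition: Xhat Xtil* = 0.
assert np.allclose(Xhat @ Xtil.T, 0)
```

The orthogonality follows because Y⁺Y is the orthogonal projector onto the row space of Y, so the residual Z(I − Y⁺Y) is orthogonal to every row of X̂.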
The matrix U ∘ V is the Hadamard product of U and V, defined as the elementwise multiplication of these two matrices: (U ∘ V)ij = Uij Vij.

Proposition 3.13. The ordered vector of eigenvalues of a symmetric matrix majorizes the ordered vector of its diagonal entries: α ≺Σ λ. Conversely, given any pair of vectors satisfying this majorization relationship, there exists a symmetric matrix with the prescribed diagonal entries and eigenvalues.

Proof. Consider the symmetric matrix A = A∗ ∈ Rn×n, and let its eigenvalue decomposition be A = UΛU∗. Let α ∈ Rn be such that αi = Aii, and let λ = (λ1 · · · λn)∗ be the vector composed of the eigenvalues of A; we assume, without loss of generality, that the entries of α and λ are ordered in decreasing order. Simple algebra reveals that α = (U ∘ U)λ, and since U is unitary, U ∘ U is doubly stochastic (it is actually called orthostochastic); therefore α ≺Σ λ.

A similar result holds for the singular values and the absolute values of the diagonal entries of an arbitrary matrix A ∈ Rn×n.

Proposition 3.14. Given A ∈ Rn×n, let α denote the vector of its diagonal entries, ordered so that |αi| ≥ |αi+1|, and let σ = (σ1, ..., σn), σi ≥ σi+1, denote its singular values. Then σ weakly majorizes the vector of the |αi| and, in addition, satisfies the inequality

    |α1| + · · · + |αn−1| − |αn| ≤ σ1 + · · · + σn−1 − σn.

These conditions are also sufficient: given vectors α and σ satisfying the above inequalities, there exists a matrix A with the prescribed diagonal entries and singular values.

We now turn our attention to the relationship between eigenvalues and singular values. For a symmetric (Hermitian) matrix, the singular values are equal to the absolute values of the eigenvalues: σi = |λi|. For arbitrary matrices, both additive and multiplicative versions of the majorization inequalities hold.

Proposition 3.15. Given a matrix A ∈ Rn×n, let λi and σi denote its ith eigenvalue and singular value, respectively; let σ ∈ Rn be such that σi² = λi(AA∗), and let µi = λi((A + A∗)/2). These quantities are ordered as follows: |λi| ≥ |λi+1|, σi ≥ σi+1, µi ≥ µi+1. The following majorizations hold:

    (a) |λ1 · · · λk−1 λk| ≤ σ1 · · · σk−1 σk, k = 1, ..., n − 1, and Π_{i=1}^n |λi| = Π_{i=1}^n σi;
    (b) (|λ1|, ..., |λn|) ≺Σ,w (σ1, ..., σn);
    (c) (|λ1|², ..., |λn|²) ≺Σ,w (σ1², ..., σn²);
    (d) (Re(λ1), ..., Re(λn)) ≺Σ (µ1, ..., µn);
    (e) |µi| ≤ σi, i = 1, ..., n.

The next result is concerned with sums of matrices. In the following, σ(·) denotes the vector of ordered singular values and λ(·) the vector of ordered eigenvalues.

Proposition 3.16. Consider the matrices A, B ∈ Rn×n. Then

    σ(A + B) ≺Σ,w σ(A) + σ(B).

For Hermitian matrices, the symmetry A = A∗, B = B∗ implies

    λ(A + B) ≺Σ λ(A) + λ(B).

In the opposite direction, we have Σ_{i=1}^k σi(A + B) ≥ Σ_{i=1}^k [σi(A) + σn−i+1(B)]. Finally, the following inequality is due to Fan:

    σ_{r+s+1}(A + B) ≤ σ_{r+1}(A) + σ_{s+1}(B),    (3.27)

where r, s ≥ 0 and r + s + 1 ≤ n.
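Proposition 3.13 and the orthostochastic mechanism behind it can be spot-checked numerically. A numpy sketch (illustrative, not from the book):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
M = rng.standard_normal((n, n))
A = (M + M.T) / 2                           # symmetric test matrix

alpha = np.sort(np.diag(A))[::-1]            # diagonal entries, decreasing
lam = np.sort(np.linalg.eigvalsh(A))[::-1]   # eigenvalues, decreasing

# alpha majorizes-check: partial sums of lam dominate, total sums agree (trace).
for k in range(1, n):
    assert alpha[:k].sum() <= lam[:k].sum() + 1e-12
assert np.isclose(alpha.sum(), lam.sum())

# Underlying reason: alpha = (U o U) lam, and the Hadamard square of the
# orthogonal eigenvector matrix U is doubly stochastic (orthostochastic).
w, U = np.linalg.eigh(A)
S = U * U
assert np.allclose(S.sum(axis=0), 1) and np.allclose(S.sum(axis=1), 1)
assert np.allclose(S @ w, np.diag(A))
```

The converse direction of Proposition 3.13 (constructing a symmetric matrix with prescribed diagonal and spectrum) is the harder half and is not attempted here.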
Thus, by relabeling the quantities involved, we obtain a result concerning the singular values of the product of two matrices.

Proposition 3.17. Given two matrices A ∈ Rn×k and B ∈ Rk×m, there holds the multiplicative majorization

    Π_{i=1}^r σi(AB) ≤ Π_{i=1}^r σi(A)σi(B), with Π_{i=1}^n σi(AB) = Π_{i=1}^n σi(A)σi(B) for square factors,

together with the additive inequalities

    Σ_{i=1}^r σi(AB) ≤ Σ_{i=1}^r σi(A)σi(B) ≤ Σ_{i=1}^r ½ [σi²(A) + σi²(B)].

Inequality (3.27) can be used to derive the lower bound given by (3.15), an important ingredient for carrying out an analysis of the quality of approximations and of the numerical accuracy of computations. For r = 0 and s = k we obtain σk+1(A + B) ≤ σ1(A) + σk+1(B). Given A ∈ Rn×n, let X be an approximant of A of rank at most k, k ≤ n; applying the above inequality to the splitting A = (A − X) + X, we get

    σk+1(A) ≤ σ1(A − X) + σk+1(X).

Since the rank of X is at most k, σk+1(X) = 0, and the desired inequality σk+1(A) ≤ σ1(A − X) follows. In other words, provided that the approximant has rank at most k, the 2-norm of the error is lower-bounded by the (k + 1)st singular value of A.

We conclude with a discussion of the relationship between the spectral properties of matrices and some of their submatrices. The classical result in this area is known as the Cauchy interlacing property. It states that if λi, i = 1, ..., n, are the eigenvalues of a given symmetric matrix A, arranged in decreasing order, and λ̂i, i = 1, ..., n − 1, are the eigenvalues of the principal submatrix A[k] of A obtained by deleting column k and row k from A, then

    λ1 ≥ λ̂1 ≥ λ2 ≥ λ̂2 ≥ · · · ≥ λn−1 ≥ λ̂n−1 ≥ λn.

A more general result states the following.

Proposition 3.19. Given A = A∗ ∈ Rn×n and U ∈ Rn×k, k ≤ n, such that U∗U = Ik, let B = U∗AU ∈ Rk×k. Denote the ordered eigenvalues of A and B by αi and βi, respectively. The following interlacing inequalities hold:

    αi ≥ βi ≥ αi+n−k, i = 1, ..., k.

Conversely, given A and k real numbers βi satisfying the above inequalities, there exists a matrix U composed of k orthonormal columns such that the eigenvalues of B = U∗AU are precisely the desired βi.

The above result has an analog for arbitrary matrices, where the singular values take the place of the eigenvalues. If B is obtained by deleting column k and row k from A, the singular values αi of A and βi of B satisfy the inequalities

    αi ≥ βi ≥ αi+2, i = 1, ..., n − 2, and αn−1 ≥ βn−1.

3.6 Chapter summary

The focus of this chapter is on tools from matrix analysis which will be used subsequently. The first section introduces norms of vectors and matrices. The second section introduces the Singular Value Decomposition (SVD), which, as mentioned, is one of the most important, if not the most important, factorizations in matrix analysis. Associated with the SVD is the Schmidt–Eckart–Young–Mirsky theorem 3.4, which provides the solution of the problem of approximating matrices (operators) by ones of lower rank that are optimal in the 2-induced norm. This is a prototype
of the kind of approximation result one would like to obtain, with the singular values of the original matrix providing a trade-off between achievable accuracy and desired complexity. The third section's goal is to briefly discuss the issue of accuracy of computations, by providing an analysis of the various sources of error; backward stability and forward stability are important concepts in this regard. The fourth section, which can be omitted on a first reading, gives a general account of rank additive matrix decompositions. The dyadic decomposition (3.11) leads to rank additive decompositions of matrices; it is also shown that the popular least squares data fitting method induces a rank additive decomposition of the data matrix. The final section (which can also be omitted at first reading) lists a collection of results on majorization inequalities.
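The interlacing results of this section can be verified numerically; the following numpy sketch (illustrative) deletes one row and column from a random symmetric matrix and checks the Cauchy interlacing inequalities.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 6, 2                       # delete row/column k (0-based) from A
M = rng.standard_normal((n, n))
A = (M + M.T) / 2                 # symmetric test matrix

idx = [i for i in range(n) if i != k]
Ak = A[np.ix_(idx, idx)]          # principal submatrix A[k]

lam = np.sort(np.linalg.eigvalsh(A))[::-1]    # eigenvalues of A, decreasing
lah = np.sort(np.linalg.eigvalsh(Ak))[::-1]   # eigenvalues of A[k], decreasing

# Cauchy interlacing: lam_i >= lah_i >= lam_{i+1}, i = 1, ..., n-1.
for i in range(n - 1):
    assert lam[i] >= lah[i] - 1e-12
    assert lah[i] >= lam[i + 1] - 1e-12
```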
Chapter 4. Linear dynamical systems: Part I

In this chapter we review some basic results concerning linear dynamical systems. We consider convolution systems, i.e., systems where the relation between the input u and the output y is given by a convolution sum or integral,

    y = h ∗ u,    (4.1)

where h is an appropriate weighting pattern. Here it is assumed that the external variables have been partitioned into input variables u and output variables y. This is called the external description; it describes how the system interacts with the outside world. We are also concerned with systems where, besides the input and output variables, the state x has been defined as well. The relationship between x and u is given by means of a set of first order difference or differential equations with constant coefficients, while that of y with x and u is given by a set of linear algebraic equations. It is assumed that x lives in a finite-dimensional space:

    σx = Ax + Bu, y = Cx + Du,    (4.2)

where σ is the derivative operator or shift operator, and A, B, C, D are linear constant maps. This is called the internal description.

The first section is devoted to the discussion of systems governed by (4.1), while the next section investigates some structural properties of systems described by (4.2), namely reachability and observability. Closely related is the concept of gramians for linear systems, which is central in subsequent developments. The third section discusses the equivalence of the external and internal descriptions. Going from the latter to the former involves the elimination of x and is thus straightforward; the converse, however, is far from trivial, as it involves the construction of the state. It is called the realization problem. General references for the material in this chapter are [280], [371], [75], [304]. For an introduction to linear systems from basic principles the reader may consult the book by Willems and Polderman [270].

4.1 External description

Let U = {u : Z → Rm} and Y = {y : Z → Rp}. A discrete-time linear system Σ with m input and p output channels can be viewed as an operator between the input space U and the output space Y,

    S : U → Y, u → y = S(u),

which is linear. There exists a sequence of matrices h(i, j) ∈ Rp×m such that

    y(i) = Σ_{j∈Z} h(i, j) u(j), i ∈ Z.
This relationship can be written in matrix form as

    [· · · y(−2) y(−1) y(0) y(1) · · ·]∗ = H [· · · u(−2) u(−1) u(0) u(1) · · ·]∗,

where H is the doubly infinite matrix whose (i, j) block is h(i, j). The system Σ described by S is called causal if h(i, j) = 0 for i ≤ j, and time-invariant if h(i, j) = h_{i−j} ∈ Rp×m. For a time-invariant system Σ we can define the sequence of p × m constant matrices

    h = (· · ·, h−2, h−1, h0, h1, h2, · · ·),

and the matrix representation of S in this case is a (doubly infinite) block Toeplitz matrix, whose (t, k) block is h_{t−k}; for a causal system it is, in addition, block lower triangular. The operation of S can now be represented as a convolution sum:

    S : u → y = S(u) = h ∗ u, where (h ∗ u)(t) = Σ_{k=−∞}^{∞} h_{t−k} u(k), t ∈ Z.    (4.3)

The convolution sum is also known as a Laurent operator in the theory of Toeplitz matrices (see, e.g., [69]). In the sequel we will restrict our attention to causal and time-invariant linear systems. In this case

    Σ : y(t) = h0 u(t) + h1 u(t − 1) + · · · + hk u(t − k) + · · ·, t ∈ Z.    (4.4)

The first term, h0 u(t), denotes the instantaneous action of the system; the remaining terms denote the delayed or dynamic action of Σ. The sequence h is called the impulse response of Σ. In the single-input, single-output (SISO) case m = p = 1, it is the output obtained in response to a unit pulse u(t) = δ(t), where δ(0) = 1 and δ(t) = 0 for t ≠ 0. In the multi-input, multi-output (MIMO) case, the subsequence of h composed of the kth column of each entry hi is produced by applying the input ek δ(t), where ek is the kth canonical unit vector (all entries zero except the kth, which is 1).
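For a causal, time-invariant system the convolution sum (4.3) truncates to y(t) = Σ_{k=0}^{t} h_k u(t − k) when the input starts at t = 0. A minimal numpy sketch with a made-up scalar Markov-parameter sequence (illustrative, not from the book):

```python
import numpy as np

# Hypothetical SISO impulse response: h_k = (1/2)^k, k >= 0 (causal).
T = 10
h = 0.5 ** np.arange(T)
u = np.ones(T)                    # unit-step input applied from t = 0

# Causal convolution sum: y(t) = sum_{k=0}^{t} h_k u(t-k).
y = np.array([sum(h[k] * u[t - k] for k in range(t + 1)) for t in range(T)])

# Same result via numpy's convolution, truncated to t = 0, ..., T-1.
assert np.allclose(y, np.convolve(h, u)[:T])

# The step response of h_k = (1/2)^k approaches the geometric sum 2.
assert abs(y[-1] - 2) < 0.01
```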
In analogy to the discrete-time case, let U = {u : R → Rm} and Y = {y : R → Rp}. A continuous-time linear system Σ with m input and p output channels can be viewed as an operator S mapping the input space U to the output space Y, which is linear. In particular we will be concerned with systems for which S can be expressed by means of an integral,

    S : u → y, y(t) = ∫_{−∞}^{∞} h(t, τ) u(τ) dτ,    (4.5)

where h(t, τ), t, τ ∈ R, is a matrix-valued function called the kernel or weighting pattern of Σ. The system just defined is causal if h(t, τ) = 0 for t ≤ τ, and time-invariant if h depends on the difference of its two arguments: h(t, τ) = h(t − τ). It is assumed from now on that S is both causal and time-invariant, which means that the upper limit of integration can be replaced by t. In this case S is a convolution operator,

    S : u → y = S(u) = h ∗ u, where (h ∗ u)(t) = ∫_{−∞}^{t} h(t − τ) u(τ) dτ, t ∈ R.

It readily follows that h is the response of the system to the impulse δ, and it is therefore termed the impulse response of Σ. In addition, we will require that the output y be at least as smooth as the input u. In particular, this requirement implies that h can be expressed as

    h(t) = h0 δ(t) + ha(t), t ≥ 0,    (4.6)

where δ denotes the δ distribution, h0 ∈ Rp×m, and ha is a smooth kernel. Hence, just like in the case of discrete-time systems, we will distinguish between instantaneous and purely dynamic action, expressing the output as a sum of two terms:

    y(t) = h0 u(t) + ∫_{−∞}^{t} ha(t − τ) u(τ) dτ.

In the sequel we will assume that ha is an analytic function, and Σ is consequently called a smooth system. This assumption implies that ha is uniquely determined by the coefficients of its Taylor series expansion at t = 0⁺:

    ha(t) = h1 + h2 t/1! + h3 t²/2! + · · · + h_{k+1} tᵏ/k! + · · ·, hk ∈ Rp×m.

It follows that if (4.6) is satisfied, smooth continuous-time linear systems can be described, as in the discrete-time case, by means of the infinite sequence of p × m matrices hi, i ≥ 0. We formalize this conclusion next.

Definition 4.1. The external description of a time-invariant, causal, and smooth continuous-time system, and that of a time-invariant, causal discrete-time linear system with m inputs and p outputs, is given by an infinite sequence of p × m matrices:

    Σ = (h0, h1, h2, · · ·, hk, · · ·), hk ∈ Rp×m.    (4.7)

The matrices hk are often referred to as the Markov parameters of the system Σ. Notice that, by abuse of notation, we use Σ to denote both the system operator and the underlying sequence of Markov parameters.

The (continuous- or discrete-time) Laplace transform of the impulse response yields the transfer function of the system:

    H(ξ) = (Lh)(ξ).    (4.8)

The Laplace variable is denoted by ξ for both the continuous- and discrete-time cases; it should be clear from the context which of the two applies. It readily follows that H can be expanded in a formal power series in ξ:

    H(ξ) = h0 + h1 ξ⁻¹ + h2 ξ⁻² + · · · + hk ξ⁻ᵏ + · · ·.
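As a sanity check on the formal power series, consider the hypothetical first-order transfer function H(ξ) = 1/(ξ − a): its Markov parameters are h0 = 0 and hk = a^{k−1} for k ≥ 1, and the truncated series reproduces H at any point with |ξ| > |a|. A short Python sketch (illustrative):

```python
# Hypothetical first-order example: H(xi) = 1/(xi - a), expanded at infinity.
a, xi = 0.5, 2.0
K = 60  # truncation order

# Markov parameters: h0 = 0, h_k = a**(k-1) for k >= 1.
series = sum(a ** (k - 1) * xi ** (-k) for k in range(1, K))

# The truncated Laurent series matches H(xi) = 1/(xi - a).
assert abs(series - 1.0 / (xi - a)) < 1e-12
```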
Remark 4.1.1. The behavioral framework. In the classical framework, a dynamical system is viewed as a mapping which transforms inputs u into outputs y; see, e.g., the book by Kalman, Falb, and Arbib [192]. Two basic considerations express the need for a framework at a more fundamental level. First, in many cases (think, for example, of electrical circuits) the distinction between inputs and outputs is not a priori clear; instead, it should follow as a consequence of the modeling. Second, it is desirable to be able to treat the different representations of a given system (for example, input-output and state-space representations) in a unified way, the advantages of representation-independent results, and the disadvantages of representation-dependent ones, being well recognized.

In the behavioral setting, the basic variables considered are the external or manifest variables w, which consist of u and y, without distinguishing between them. The collection of trajectories describing the evolution of w over time defines a dynamical system. It turns out that this definition provides the right level of abstraction, necessary for accommodating the two considerations laid out above. Subsequently, inputs and outputs can be introduced, and the corresponding I/O operator recovered. The resulting central object is the most powerful unfalsified model (MPUM) derived from the data. This establishes the foundations of a parameter-free theory of dynamical systems. For details on the behavioral framework we refer to the book by Polderman and Willems [270]; see also the paper by Rapisarda and Willems [272].

4.2 Internal description

Alternatively, we can characterize a linear system via its internal description, which uses, in addition to the input u and the output y, the state x. For our purposes, given are three linear spaces: the state space X, the input space U, and the output space Y. Each is a space of trajectories, containing functions taking values in Rn, Rm, and Rp, respectively. The state equations describing a linear system are a set of first order linear differential or difference equations, according to whether we are dealing with a continuous- or a discrete-time system:

    d/dt x(t) = Ax(t) + Bu(t), t ∈ R,    (4.9)
    x(t + 1) = Ax(t) + Bu(t), t ∈ Z.    (4.10)

Equations (4.9) and (4.10) can be written in a unified way, for both discrete- and continuous-time linear systems:

    σx = Ax + Bu,    (4.11)

where σ denotes the derivative operator for continuous-time systems and the (backwards) shift operator for discrete-time systems. The output equations are composed of a set of linear algebraic equations:

    y = Cx + Du,    (4.12)

where y is the output function (response), u ∈ U is the input function, and x ∈ X is the state of the system. Here A : Rn → Rn, B : Rm → Rn, C : Rn → Rp, and D : Rm → Rp are (constant) linear maps; B is called the input map and C the output map. The state equation (4.11) describes the dynamics, i.e., the internal evolution of the system, while the output equation (4.12) describes how the system interacts with the outside world. Again, for a first-principles treatment of the concept of state, we refer to the book by Willems and Polderman [270].
In the sequel, the term linear system in the internal description is used to denote a linear, time-invariant, continuous- or discrete-time system which is finite-dimensional. Linear means: U, X, Y are linear spaces, and A, B, C, D are linear maps. Time-invariant means: A, B, C, D do not depend on time. Finite-dimensional means: m, n, p are all finite positive integers. In the sequel (by slight abuse of notation) the linear maps A, B, C, D, as well as their matrix representations in some appropriate basis, are denoted by the same symbols; these matrix representations are constant n × n, n × m, p × n, and p × m matrices, respectively. We are now ready to give the following.

Definition 4.2. (a) A linear system in internal or state space description is a quadruple of linear maps (matrices)

    Σ = [A B; C D], A ∈ Rn×n, B ∈ Rn×m, C ∈ Rp×n, D ∈ Rp×m.    (4.13)

The dimension of the system is defined as the dimension of the associated state space:

    dim Σ = n.    (4.14)

(b) Σ is called stable if the eigenvalues of A have negative real parts (for continuous-time systems) or lie inside the unit disk (for discrete-time systems).

The concept of stability is introduced formally in the above definition; for a more detailed discussion, see section 5.8 on page 126. In the sequel we will also use the notation

    Σ = [A B; C],    (4.15)

which denotes a linear system where either D = 0 or D is irrelevant for the argument pursued.

Example 4.2. We consider the dynamical system Σ shown in figure 4.1. The external variables are the voltage applied at the terminals, denoted by u, and the voltage across the resistor, denoted by y. One choice for the internal or state variables is to pick the current through the inductor, denoted by x1, and the voltage across the capacitor, denoted by x2; thus x = (x1∗, x2∗)∗. The state equations are u = Rx1 + Lẋ1 + x2 and Cẋ2 = x1, while the output equation is y = Rx1. Consequently,

    A = [−R/L −1/L; 1/C 0], B = [1/L; 0], C = [R 0], D = 0.

Figure 4.1. An RLC circuit.

The system has dimension n = 2. Furthermore, assuming that R, L, C are positive, it is stable, since the characteristic polynomial of A,

    χA(s) = s² + (R/L) s + 1/(CL),

has roots with negative real parts.

Solution of the state equations. We will now give the solution of (4.11). For this we will need the matrix exponential: given M ∈ Rn×n, we define the matrix exponential by means of the same series representation as the scalar exponential, namely

    e^{Mt} = In + (t/1!) M + (t²/2!) M² + · · · + (tᵏ/k!) Mᵏ + · · ·.    (4.16)
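The stability claim of Example 4.2 and the series (4.16) can be checked numerically. The sketch below (Python/scipy, with R = L = C = 1 as in the text; illustrative, not from the book) verifies that the eigenvalues of A lie in the open left half-plane and that the truncated series agrees with scipy's expm.

```python
import numpy as np
from scipy.linalg import expm

R, L, C = 1.0, 1.0, 1.0
A = np.array([[-R / L, -1.0 / L],
              [1.0 / C, 0.0]])

# Stability: chi_A(s) = s^2 + s + 1 has roots -1/2 +/- i*sqrt(3)/2.
eigs = np.linalg.eigvals(A)
assert np.all(eigs.real < 0)
assert np.allclose(np.sort(eigs.imag), [-np.sqrt(3) / 2, np.sqrt(3) / 2])

# Matrix exponential via the defining series (4.16), truncated at 30 terms.
t = 1.3
S, term = np.eye(2), np.eye(2)
for k in range(1, 30):
    term = term @ (t * A) / k
    S = S + term
assert np.allclose(S, expm(t * A))
```

In practice one never sums the series directly; better methods are discussed at the end of this section.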
Let φ(u, x0, t) denote the solution of the state equations (4.11), i.e., the state of the system at time t attained from the initial state x0 at time t0, under the influence of the input u. For the continuous-time state equations (4.9),

    φ(u, x0, t) = e^{A(t−t0)} x0 + ∫_{t0}^{t} e^{A(t−τ)} B u(τ) dτ, t ≥ t0,    (4.17)

while for the discrete-time state equations (4.10),

    φ(u, x0, t) = A^{t−t0} x0 + Σ_{j=t0}^{t−1} A^{t−1−j} B u(j), t ≥ t0.    (4.18)

For both discrete- and continuous-time systems, it follows that the output is given by

    y(t) = Cφ(u, x0, t) + Du(t) = Cφ(0, x0, t) + Cφ(u, 0, t) + Du(t).    (4.19)

If we compare the above expressions for t0 = −∞ and x0 = 0 with (4.3) and (4.5), it follows that the impulse response h has the form below. For continuous-time systems,

    h(t) = Ce^{At}B + δ(t)D, t ≥ 0, and h(t) = 0, t < 0,    (4.20)

where δ denotes the δ distribution. For discrete-time systems,

    h(t) = CA^{t−1}B, t > 0; h(t) = D, t = 0; h(t) = 0, t < 0.    (4.21)

Thus, by (4.8), the Laplace transform of the impulse response, which is called the transfer function of Σ, is

    HΣ(ξ) = D + C(ξI − A)⁻¹B,    (4.22)

where ξ = s (continuous-time Laplace transform) and ξ = z (discrete-time Laplace or Z transform). Expanding the transfer function in a Laurent series for large ξ, i.e., in the neighborhood of infinity, we get

    HΣ(ξ) = D + CB ξ⁻¹ + CAB ξ⁻² + · · · + CA^{k−1}B ξ⁻ᵏ + · · ·,

and the corresponding external description, given by the Markov parameters (4.7), is

    Σ = (D, CB, CAB, CA²B, · · ·, CA^{k−1}B, · · ·).    (4.23)

Sometimes it is advantageous to describe the system from a different point of view than the original one. In our case, since the external variables (i.e., the input and the output) are fixed, only the state variables can be transformed; the corresponding matrices describing the system will then change. More precisely, given the state transformation T, det T ≠ 0, if the new state is x̃ = Tx, the equations (4.11) and (4.12) become

    σx̃ = TAT⁻¹ x̃ + TB u, y = CT⁻¹ x̃ + D u.

The corresponding system triples are called equivalent. Put differently, Σ and Σ̃ are equivalent if

    [Ã B̃; C̃ D̃] [T 0; 0 Im] = [T 0; 0 Ip] [A B; C D],    (4.24)

where D remains unchanged.
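The invariance implied by (4.24) is easy to confirm numerically: an equivalence transformation T changes (A, B, C) but not the Markov parameters or the transfer function. An illustrative numpy sketch with random data (not from the book):

```python
import numpy as np

rng = np.random.default_rng(5)
n, m, p = 4, 2, 2
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
C = rng.standard_normal((p, n))
T = rng.standard_normal((n, n))        # generically invertible

At, Bt, Ct = T @ A @ np.linalg.inv(T), T @ B, C @ np.linalg.inv(T)

# Markov parameters C A^(k-1) B are invariant under equivalence.
for k in range(1, 8):
    hk = C @ np.linalg.matrix_power(A, k - 1) @ B
    hk_t = Ct @ np.linalg.matrix_power(At, k - 1) @ Bt
    assert np.allclose(hk, hk_t)

# So is the transfer function C (xi*I - A)^{-1} B at any evaluation point xi.
xi = 2.7
H1 = C @ np.linalg.solve(xi * np.eye(n) - A, B)
H2 = Ct @ np.linalg.solve(xi * np.eye(n) - At, Bt)
assert np.allclose(H1, H2)
```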
Given the state transformation T above, it readily follows that

    HΣ(ξ) = D + C(ξI − A)⁻¹B = D + CT⁻¹T(ξI − A)⁻¹T⁻¹TB = D + CT⁻¹(ξI − TAT⁻¹)⁻¹TB = D + C̃(ξI − Ã)⁻¹B̃ = HΣ̃(ξ).

This immediately implies that hk = h̃k, k ∈ N. We have thus proved the following.

Proposition 4.4. Equivalent triples have the same transfer function and, consequently, the same Markov parameters.

Continuation of Example 4.2. The first five Markov parameters of Σ are

    0, R/L, −R²/L², R(CR² − L)/(CL³), −R²(CR² − 2L)/(CL⁴).

Assuming that R = 1, L = 1, C = 1, the matrix exponential is

    e^{At} = e^{−t/2} cos(√3 t/2) [1 0; 0 1] + (2/√3) e^{−t/2} sin(√3 t/2) [−1/2 −1; 1 1/2].

Consequently, the impulse response is

    h(t) = Ce^{At}B = e^{−t/2} [cos(√3 t/2) − (1/√3) sin(√3 t/2)], t ≥ 0,

while the transfer function (in terms of R, L, C) is

    HΣ(s) = (R/L) s / (s² + (R/L) s + 1/(CL)).

Finally, if the state is changed to x̃ = Tx, where T = [1 1; 1 −1], the new state space representation of Σ is

    Ã = TAT⁻¹ = −(1/(2LC)) [RC − L + C, RC − L − C; RC + L + C, RC + L − C], B̃ = TB = (1/L) [1; 1], C̃ = CT⁻¹ = (R/2) (1 1).

4.2.1 The concept of reachability

In this subsection we introduce and discuss the fundamental concept of reachability of a linear system Σ. This concept allows us to identify the extent to which the state of the system x can be manipulated through the input u. Reachability and the observability concept introduced later involve only the state equations; thus, for this subsection and the next, C and D will be ignored: Σ = [A B], A ∈ Rn×n, B ∈ Rn×m. The related concept of controllability is discussed subsequently. For a survey of reachability and observability, see [16].

Definition 4.3. Given Σ = [A B], A ∈ Rn×n, B ∈ Rn×m, a state x̄ ∈ X is reachable from the zero state if there exist an input function ū(t), of finite energy, and a time T̄ < ∞, such that

    x̄ = φ(ū, 0, T̄).

The reachable subspace Xreach ⊂ X of Σ is the set containing all reachable states of Σ. The system Σ is (completely) reachable if Xreach = X. Furthermore,

    R(A, B) = [B AB A²B · · · A^{n−1}B · · ·]    (4.25)

is the reachability matrix of Σ.
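Reachability of a pair (A, B) can be tested through the rank of the reachability matrix, truncated after n block columns. A numpy sketch with one reachable and one unreachable pair (both made up for illustration):

```python
import numpy as np

def reachability_matrix(A, B):
    """Finite reachability matrix [B, AB, ..., A^(n-1) B]."""
    n = A.shape[0]
    blocks, AkB = [], B
    for _ in range(n):
        blocks.append(AkB)
        AkB = A @ AkB
    return np.hstack(blocks)

# A single-input chain of integrators: reachable (rank n).
n = 4
A = np.diag(np.ones(n - 1), k=-1)      # shift (nilpotent) matrix
B = np.eye(n, 1)                        # input enters the first state only
assert np.linalg.matrix_rank(reachability_matrix(A, B)) == n

# A diagonal system whose last two modes receive no input: unreachable.
A2 = np.diag([1.0, 2.0, 3.0, 4.0])
B2 = np.array([[1.0], [1.0], [0.0], [0.0]])
assert np.linalg.matrix_rank(reachability_matrix(A2, B2)) == 2
```

In the second pair the reachable subspace is the span of the first two coordinate directions, matching the modes actually excited by the input.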
By the Cayley–Hamilton theorem, the rank of the reachability matrix and the span of its columns are determined (at most) by its first n terms. Thus, for computational purposes, the following (finite) reachability matrix is of importance:

    Rn(A, B) = [B AB A²B · · · A^{n−1}B].    (4.26)

The image of a linear map L is denoted by im L. The space im R(A, B) is referred to as the Krylov subspace in the numerical linear algebra community, while it is known as the reachability space in the systems community. The fundamental result concerning reachability is the following.

Theorem 4.5. Given Σ = [A B], Xreach is a linear subspace of X, given by the formula

    Xreach = im R(A, B).

Corollary 4.6. (a) AXreach ⊂ Xreach. (b) Σ is (completely) reachable if, and only if, rank R(A, B) = n. (c) Reachability is basis independent.

Before proceeding with the proof of the theorem, some remarks are in order. The above theorem, together with a similar result on observability, shows that for linear, time-invariant, finite-dimensional systems, reachability reduces to an algebraic concept, depending only on properties of A and B (in particular, on the rank of the reachability matrix R(A, B)), but independent of time and the input function. This has as a consequence the fact that many tools for studying linear systems are algebraic. In general, reachability is an analytic concept. It should be noticed, however, that the physical significance of A and B is different for the discrete- and continuous-time cases: if we discretize, for instance, the continuous-time system ẋ(t) = Acont x(t) + Bcont u(t), t ∈ R, to x(t + 1) = Adiscr x(t) + Bdiscr u(t), t ∈ Z, then Adiscr = e^{Acont}. It is also worthwhile to notice that formula (4.27) below is valid for both the continuous- and discrete-time cases.

Proof. We will first prove corollary 4.6. (a) AXreach = A im R(A, B) = im (AB A²B · · ·) ⊂ im (B AB · · ·) = Xreach. (b) The result follows by noticing that Σ is (completely) reachable if, and only if, im R(A, B) = Rn. (c) Let T be a nonsingular transformation in X, det T ≠ 0; the pair A, B is transformed to the pair TAT⁻¹, TB. It is readily checked that R(TAT⁻¹, TB) = TR(A, B), which shows that the ranks of the original and of the transformed reachability matrices are the same.

A very useful concept is that of the reachability gramian, defined next. A hermitian matrix X = X∗ is called positive semidefinite (positive definite) if its eigenvalues are nonnegative (positive). The difference of two hermitian matrices X, Y satisfies X ≥ Y (X > Y) if the eigenvalues of X − Y are nonnegative (positive).

Definition 4.7. The finite reachability gramians at time t < ∞ are defined for continuous-time systems as

    P(t) = ∫₀ᵗ e^{Aτ} BB∗ e^{A∗τ} dτ, t ∈ R₊,    (4.28)

and for discrete-time systems as

    P(t) = Rt(A, B) Rt∗(A, B) = Σ_{k=0}^{t−1} Aᵏ BB∗ (A∗)ᵏ, t ∈ Z₊.    (4.29)
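For discrete-time systems, (4.29) is a finite sum that can be checked directly against the finite reachability matrix. A numpy sketch (illustrative sizes, random data):

```python
import numpy as np

rng = np.random.default_rng(6)
n, m, t = 4, 1, 4
A = 0.5 * rng.standard_normal((n, n))
B = rng.standard_normal((n, m))

# Finite reachability matrix R_t = [B, AB, ..., A^(t-1) B].
Rt = np.hstack([np.linalg.matrix_power(A, k) @ B for k in range(t)])

# Discrete-time finite reachability gramian (4.29):
# P(t) = sum_{k=0}^{t-1} A^k B B* (A*)^k = R_t R_t*.
P = sum(np.linalg.matrix_power(A, k) @ B @ B.T @ np.linalg.matrix_power(A.T, k)
        for k in range(t))
assert np.allclose(P, Rt @ Rt.T)

# P is symmetric; for this (generically reachable) pair, P(n) is
# positive definite, consistent with the rank test.
assert np.allclose(P, P.T)
assert np.linalg.matrix_rank(Rt) == n and np.linalg.eigvalsh(P).min() > 0
```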
A very useful concept is that of the reachability gramian. First some notation: a Hermitian matrix is positive semidefinite (positive definite) if its eigenvalues are nonnegative (positive); for two Hermitian matrices $X$, $Y$ we write $X \geq Y$ if the difference $X - Y$ is positive semidefinite. Recall also that $*$ denotes transposition if the matrix or vector is real, and complex conjugation together with transposition if it is complex.

**Definition 4.7.** The finite reachability gramians at time $t < \infty$ are defined for continuous-time systems $\dot{x}(t) = Ax(t) + Bu(t)$ as
$$\mathcal{P}(t) = \int_0^t e^{A\tau} BB^* e^{A^*\tau}\, d\tau,$$
and for discrete-time systems $x(t+1) = Ax(t) + Bu(t)$ as
$$\mathcal{P}(t) = \sum_{k=0}^{t-1} A^k BB^* (A^*)^k = \mathcal{R}_t(A,B)\,\mathcal{R}_t^*(A,B).$$

These gramians satisfy
$$\operatorname{im} \mathcal{P}(t) = \operatorname{im} \mathcal{R}(A,B). \tag{4.27}$$
This relationship holds for continuous-time systems for all $t > 0$, and for discrete-time systems (at least) for $t \geq n$; thus expression (4.27) is valid for both the continuous- and the discrete-time case. It is used in the proof of the theorem below and extensively in later chapters.

By the Cayley–Hamilton theorem, the rank of the reachability matrix and the span of its columns are determined (at most) by the first $n$ terms, i.e., by $\mathcal{R}_n(A,B) = (B\ \ AB\ \cdots\ A^{n-1}B)$. The span of the columns of such a matrix is referred to as a Krylov subspace in the numerical linear algebra community. Moreover, it is readily checked that $\mathcal{R}(TAT^{-1}, TB) = T\,\mathcal{R}(A,B)$ for any $T$ with $\det T \neq 0$, which shows that the rank of the original and the transformed reachability matrices is the same: reachability is basis independent. It is a structural property of the pair $(A,B)$, independent of time and of the input function.

**Theorem 4.5.** (a) $X_{\mathrm{reach}}$ is a linear subspace, namely $X_{\mathrm{reach}} = \operatorname{im} \mathcal{R}(A,B)$. (b) $\Sigma$ is (completely) reachable if, and only if, $\operatorname{rank} \mathcal{R}(A,B) = n$.

*Proof.* First we show that $X_{\mathrm{reach}}$ is a linear space, i.e., if $x_i = \phi(u_i, 0, T_i) \in X_{\mathrm{reach}}$, $i = 1, 2$, then $\alpha_1 x_1 + \alpha_2 x_2 \in X_{\mathrm{reach}}$ for all $\alpha_1, \alpha_2 \in \mathbb{R}$. Let $T_1 \geq T_2$ and define the input function
$$\hat{u}_2(t) = \begin{cases} 0, & t \in [0,\, T_1 - T_2], \\ u_2(t - T_1 + T_2), & t \in [T_1 - T_2,\, T_1]. \end{cases}$$
It is readily checked that for both continuous- and discrete-time systems $\phi(\hat{u}_2, 0, T_1) = x_2$; thus $\alpha_1 x_1 + \alpha_2 x_2 = \phi(\alpha_1 u_1 + \alpha_2 \hat{u}_2, 0, T_1) \in X_{\mathrm{reach}}$.

Next we prove $X_{\mathrm{reach}} = \operatorname{im}\mathcal{R}(A,B)$ for continuous-time systems. Let $\bar{x} = \phi(\bar{u}, 0, T) \in X_{\mathrm{reach}}$. Making use of the expansion $e^{At} = \sum_{i>0} \frac{t^{i-1}}{(i-1)!}\, A^{i-1}$ from (4.16), we have
$$\bar{x} = \int_0^T e^{A(T-\tau)} B\, \bar{u}(\tau)\, d\tau = \sum_{i>0} A^{i-1} B \int_0^T \frac{(T-\tau)^{i-1}}{(i-1)!}\, \bar{u}(\tau)\, d\tau,$$
which shows that $\bar{x} \in \operatorname{im}\mathcal{R}(A,B)$. For the converse inclusion, consider an element $\bar{x} \in \operatorname{im}\mathcal{R}_n(A,B)$; by the proposition below there exists $\bar{\xi}$ such that $\bar{x} = \mathcal{P}(\bar{T})\bar{\xi}$, and the corresponding input (4.31) steers the system from $0$ to $\bar{x}$, so that $\bar{x} = \phi(\bar{u}, 0, \bar{T}) \in X_{\mathrm{reach}}$. The proof for discrete-time systems is similar: here $\bar{x} = \phi(\bar{u}, 0, T) = \sum_{j=0}^{T-1} A^{T-1-j} B\, \bar{u}(j) \in \operatorname{im}\mathcal{R}_T(A,B) \subset \operatorname{im}\mathcal{R}(A,B)$.

Finally, note that $X_{\mathrm{reach}}$ is $A$-invariant:
$$A\, X_{\mathrm{reach}} = A \operatorname{im} \mathcal{R}(A,B) = \operatorname{im}(AB\ \ A^2B\ \cdots) \subset \operatorname{im}(B\ \ AB\ \cdots) = X_{\mathrm{reach}}. \qquad\blacksquare$$

**Proposition 4.8.** The reachability gramians have the following properties: (a) $\mathcal{P}(t) = \mathcal{P}^*(t) \geq 0$; (b) $\operatorname{im}\mathcal{P}(t) = \operatorname{im}\mathcal{R}(A,B)$.

*Proof.* (a) The symmetry of the reachability gramian follows by definition; moreover
$$q^* \mathcal{P}(t)\, q = \int_0^t \left\| B^* e^{A^*(t-\tau)} q \right\|^2 d\tau \geq 0.$$
(b) To prove the second property, it is enough to show that $q \perp \mathcal{P}(t)$ if, and only if, $q \perp \mathcal{R}(A,B)$. Since $\mathcal{P}(t)$ is symmetric, $q \perp \mathcal{P}(t)$ if, and only if, $q^* \mathcal{P}(t) q = 0$, which by the formula above is equivalent to $B^* e^{A^* t} q = 0$ for all $t \geq 0$. Since the exponential is an analytic function, this condition is equivalent to the function and all its derivatives being zero at $t = 0$, i.e., $B^*(A^*)^{i-1} q = 0$ for $i > 0$. This in turn is equivalent to $q \perp A^{i-1}B$, $i > 0$, i.e., $q \perp \mathcal{R}(A,B)$. The proof for discrete-time systems is similar. $\blacksquare$

**Corollary 4.9.** $\Sigma = (A, B)$ is reachable if, and only if, $\mathcal{P}(t)$ is positive definite for some $t > 0$.
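The objects just defined are straightforward to compute numerically. The following Python sketch (an illustration added here, not part of the original text; the $2\times 2$ pair is hypothetical) builds the finite reachability matrix $\mathcal{R}_n(A,B)$ and the finite discrete-time gramian $\mathcal{P}(t)$, and confirms that their ranks coincide, as asserted by Proposition 4.8(b):

```python
import numpy as np

def reachability_matrix(A, B):
    """R_n(A, B) = [B, AB, ..., A^{n-1} B]; by Cayley-Hamilton its span
    equals that of the infinite reachability matrix."""
    n = A.shape[0]
    blocks = [B]
    for _ in range(n - 1):
        blocks.append(A @ blocks[-1])
    return np.hstack(blocks)

def reachability_gramian(A, B, t):
    """Finite discrete-time gramian P(t) = sum_{k<t} A^k B B^T (A^T)^k."""
    n = A.shape[0]
    P, Ak = np.zeros((n, n)), np.eye(n)
    for _ in range(t):
        P += Ak @ B @ B.T @ Ak.T
        Ak = A @ Ak
    return P

A = np.array([[0.0, 1.0], [0.0, 0.5]])   # hypothetical example pair
B = np.array([[0.0], [1.0]])
R = reachability_matrix(A, B)
P = reachability_gramian(A, B, 2)        # t = n = 2 suffices in discrete time
```

Both `R` and `P` here have full rank $2$, so this pair is reachable.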
**A formula for the matrix exponential.** Consider the square matrix $A \in \mathbb{R}^{\nu\times\nu}$ with eigenvalues $\lambda_i$, $i = 1, \ldots, \nu$. One way to compute the matrix exponential of $A$ given by (4.16) is
$$e^{At} = f_\nu(t) A^{\nu-1} + f_{\nu-1}(t) A^{\nu-2} + \cdots + f_2(t) A + f_1(t) I_\nu,$$
where
$$[f_1(t)\ \cdots\ f_\nu(t)] = [\phi_1(t)\ \cdots\ \phi_\nu(t)]\; \mathbf{V}(\lambda_1, \ldots, \lambda_\nu)^{-1}, \qquad
\mathbf{V}(\lambda_1, \ldots, \lambda_\nu) = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ \lambda_1 & \lambda_2 & \cdots & \lambda_\nu \\ \vdots & & & \vdots \\ \lambda_1^{\nu-1} & \lambda_2^{\nu-1} & \cdots & \lambda_\nu^{\nu-1} \end{pmatrix}$$
is the Vandermonde matrix. In general, the functions $\phi_i$ are a fundamental set of solutions of the autonomous differential equation $q(D)f = 0$, where $D = \frac{d}{dt}$ and $q$ is the characteristic polynomial of $A$. If the eigenvalues of $A$ are distinct, $\phi_i(t) = e^{\lambda_i t}$, $i = 1, \ldots, \nu$; if the eigenvalues are not distinct, the Vandermonde matrix has to be modified accordingly. From a numerical viewpoint, the computation of the matrix exponential is a challenging proposition. A method known as scaling and squaring yields the best results for a wide variety of matrices $A$; a survey of this topic can be found in Moler and Van Loan [242].

Next, recall the definition of the inner product for vector-valued sequences and functions given in (5.12). The energy or norm of a sequence or function $f$, denoted by $\|f\|$, is defined as its 2-norm:
$$\|f\|^2 = \langle f, f \rangle = \int f^*(t) f(t)\, dt.$$

**Proposition 4.10.** Given a reachable state $\bar{x}$, let $\bar{\xi}$ be a solution of
$$\bar{x} = \mathcal{P}(\bar{T})\,\bar{\xi}, \tag{4.30}$$
where $\mathcal{P}$ is the reachability gramian and $\bar{T}$ is any positive real number for the continuous-time case and at least $n$ for the discrete-time case. Consider $\bar{u}$ defined by
$$\bar{u}(t) = B^* e^{A^*(\bar{T}-t)}\, \bar{\xi} \tag{4.31}$$
for the continuous-time case, and
$$\bar{u}(t) = B^* (A^*)^{\bar{T}-1-t}\, \bar{\xi} \tag{4.32}$$
for the discrete-time case. Then
$$\phi(\bar{u}, 0, \bar{T}) = \bar{x}, \tag{4.33}$$
and if $\hat{u}$ is any input function which reaches the state $\bar{x}$ at time $\bar{T}$, then $\|\hat{u}\| \geq \|\bar{u}\|$. Furthermore, the minimal energy required to reach $\bar{x}$ at time $\bar{T}$ is equal to the energy of the input function $\bar{u}$, namely $\bar{\xi}^* \mathcal{P}(\bar{T})\,\bar{\xi}$; if the system is reachable, this formula becomes
$$\|\bar{u}\|^2 = \bar{x}^*\, \mathcal{P}(\bar{T})^{-1}\, \bar{x}. \tag{4.34}$$

*Proof.* The proof is based on the fact that the inner product of $\bar{u}$ with $\hat{u} - \bar{u}$ is zero; hence $\|\hat{u}\|^2 = \|\bar{u}\|^2 + \|\hat{u} - \bar{u}\|^2 \geq \|\bar{u}\|^2$. $\blacksquare$

**Remark.** The input function $\bar{u}$ defined by (4.31) is thus a minimal-energy input which steers the system to the desired state at a given time. From the above considerations, we can also quantify the time needed to arrive at a given reachable state.
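As a numerical aside (an added illustration; the matrix is hypothetical, and the Taylor-based scaling step is a simplification, since production codes use a Padé approximant instead), the two approaches mentioned above can be compared directly: the spectral formula for distinct eigenvalues versus a crude scaling-and-squaring scheme.

```python
import numpy as np

def expm_spectral(A, t):
    """e^{At} via the eigendecomposition; valid when A is diagonalizable."""
    lam, V = np.linalg.eig(A)
    return (V @ np.diag(np.exp(lam * t)) @ np.linalg.inv(V)).real

def expm_squaring(A, s=30):
    """Scaling and squaring: e^A = (e^{A/2^s})^{2^s}; the scaled matrix is
    so small that a short Taylor series suffices (illustration only)."""
    X = A / 2.0**s
    E = np.eye(A.shape[0]) + X + X @ X / 2.0 + X @ X @ X / 6.0
    for _ in range(s):
        E = E @ E
    return E

A = np.array([[-1.0, 0.0], [0.0, -2.0]])   # hypothetical stable matrix
E1 = expm_spectral(A, 1.0)
E2 = expm_squaring(A)
```

For this diagonal matrix both routines reproduce $\operatorname{diag}(e^{-1}, e^{-2})$ to high accuracy.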
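Proposition 4.10 is easy to verify numerically in discrete time. The sketch below (an added illustration with a hypothetical reachable pair) builds the minimum-energy input (4.32), simulates the system, and checks that the state $\bar{x}$ is reached with energy $\bar{x}^*\mathcal{P}(\bar{T})^{-1}\bar{x}$, as in (4.34):

```python
import numpy as np

A = np.array([[0.0, 1.0], [-0.5, 1.0]])   # hypothetical reachable pair
B = np.array([[0.0], [1.0]])
T = 4
n = A.shape[0]

# Finite gramian P(T) = sum_{k<T} A^k B B^T (A^T)^k
P = sum(np.linalg.matrix_power(A, k) @ B @ B.T @ np.linalg.matrix_power(A.T, k)
        for k in range(T))

x_bar = np.array([1.0, -1.0])             # desired state
xi = np.linalg.solve(P, x_bar)            # xi solves x_bar = P(T) xi, cf. (4.30)
# Minimum-energy input (4.32): u(t) = B^T (A^T)^{T-1-t} xi
u = [B.T @ np.linalg.matrix_power(A.T, T - 1 - t) @ xi for t in range(T)]

x = np.zeros(n)                           # simulate x(t+1) = A x(t) + B u(t)
for t in range(T):
    x = A @ x + (B @ u[t]).ravel()

energy = sum(float(ut @ ut) for ut in u)  # equals x_bar^T P(T)^{-1} x_bar
```

The simulated trajectory ends exactly at `x_bar`, and the input energy matches the gramian formula.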
**Proposition 4.11.** Given is $\Sigma = (A, B)$. (a) For discrete-time systems, every reachable state can be reached in at most $n$ time steps. (b) For continuous-time systems, every reachable state can be reached arbitrarily fast.

*Proof.* Part (a) follows immediately from the Cayley–Hamilton theorem together with (4.27). For part (b), recall that in the latter part of the proof of theorem 4.5 we showed that for any $\bar{x} \in X_{\mathrm{reach}}$ we have $\bar{x} = \phi(\bar{u}, 0, \bar{T})$, where $\bar{u}$ is defined by (4.30) and (4.31), and $\bar{T}$ is an arbitrary positive real number. This establishes claim (b). $\blacksquare$

The second part of the proposition implies that, in linear continuous-time systems, there is no intrinsic delay in reaching a given state; the delay in reaching a given state can thus be attributed to the nonlinearities present in the system.

Next we show that a nonreachable system can be decomposed in a canonical way into two subsystems: one whose states are all reachable, and a second whose states are all unreachable.

**Lemma 4.12 (Reachable canonical decomposition).** Given is $\Sigma = (A, B)$. There exists a basis in $X$ such that $A$ and $B$ have the following matrix representations:
$$A = \begin{pmatrix} A_r & A_{r\bar{r}} \\ 0 & A_{\bar{r}} \end{pmatrix}, \qquad B = \begin{pmatrix} B_r \\ 0 \end{pmatrix}, \qquad A_r \in \mathbb{R}^{q\times q},\ B_r \in \mathbb{R}^{q\times m}, \tag{4.35}$$
where the subsystem $\Sigma_r = (A_r, B_r)$ is reachable and $q = \operatorname{rank}\mathcal{R}(A, B) = \dim X_{\mathrm{reach}}$.

*Proof.* Let $X' \subset X$ be such that $X = X_{\mathrm{reach}} + X'$. Choose a basis $x_1, \ldots, x_n$ of $X$ so that $x_1, \ldots, x_q$ is a basis of $X_{\mathrm{reach}}$ and $x_{q+1}, \ldots, x_n$ is a basis of $X'$. Since $X_{\mathrm{reach}}$ is $A$-invariant, the matrix representation of $A$ in the above basis has the form given by formula (4.35); since $\operatorname{im} B \subset X_{\mathrm{reach}}$, the matrix representation of $B$ in the above basis has the form given by the same formula. Finally, to prove that $(A_r, B_r)$ is a reachable pair, it suffices to notice that $\operatorname{rank}\mathcal{R}(A_r, B_r) = \operatorname{rank}\mathcal{R}(A, B) = \dim X_{\mathrm{reach}} = q$. This concludes the proof of the lemma. $\blacksquare$

It should be noticed that although $X'$ in the proof above is not unique, the form (block structure) of the reachable decomposition (4.35) is unique. Thus every system $\Sigma = (A, B)$ can be decomposed into a subsystem $\Sigma_r = (A_r, B_r)$ which is reachable, and a subsystem $\Sigma_{\bar{r}} = (A_{\bar{r}}, 0)$ which is completely unreachable; since $B_{\bar{r}} = 0$, the latter cannot be influenced by outside forces. The interaction between $\Sigma_r$ and $\Sigma_{\bar{r}}$ is given by $A_{r\bar{r}}$; it follows that the unreachable subsystem $\Sigma_{\bar{r}}$ influences the reachable subsystem $\Sigma_r$ but not vice versa.
We conclude this subsection by stating a result collecting various equivalent conditions for reachability.

**Theorem 4.13 (Reachability conditions).** Given is $\Sigma = (A, B)$, $A \in \mathbb{R}^{n\times n}$, $B \in \mathbb{R}^{n\times m}$. The following are equivalent:
1. The pair $(A, B)$ is reachable.
2. The rank of the reachability matrix is full: $\operatorname{rank}\mathcal{R}(A, B) = n$.
3. The reachability gramian is positive definite: $\mathcal{P}(t) > 0$ for some $t > 0$.
4. No left eigenvector $v$ of $A$ is in the left kernel of $B$: $v^*A = \lambda v^* \ \Rightarrow\ v^*B \neq 0$.
5. $\operatorname{rank}\,(\mu I_n - A,\ B) = n$ for all $\mu \in \mathbb{C}$.
6. The polynomial matrices $sI - A$ and $B$ are left coprime.

*Proof.* The equivalence of the first three statements has already been proved. We will prove the equivalence between conditions 1 and 4. If there exists some nonzero $v$ for which $v^*A = \lambda v^*$ and $v^*B = 0$, then clearly $v^*\mathcal{R}(A, B) = 0$; this implies the lack of reachability of $(A, B)$. Conversely, let $(A, B)$ be unreachable. Recall the canonical decomposition (4.35) of a pair $(A, B)$: there exists a basis in the state space such that $A$ and $B$ have the form given by (4.35). Let $v_2 \neq 0$ be a left eigenvector of $A_{\bar{r}}$. Then $v^* = (0\ \ v_2^*)$ (where $0$ is the zero vector of appropriate dimension) is a left eigenvector of $A$ that is also in the left kernel of $B$. The equivalence between conditions 4 and 5 is straightforward, and 6 can be considered as a different way of stating 5. This concludes the proof. $\blacksquare$

**Remark.** The fourth and fifth conditions in the theorem above are known as the PBH or Popov–Belevitch–Hautus tests for reachability. The last condition is given for completeness; the concept of left coprimeness of polynomial matrices (for a definition we refer, for instance, to [122]) is not used further in this book.

**Remark.** Reachability is a generic property. This means, intuitively, that almost every $n\times n$, $n\times m$ pair of matrices $A$, $B$ satisfies $\operatorname{rank}\mathcal{R}_n(A, B) = n$, i.e., is reachable. Put differently, in the space of all $n\times n$, $n\times m$ pairs of matrices, the unreachable pairs form a hypersurface (of "measure" zero).

After introducing the reachability property, we introduce a weaker concept which is sufficient for many problems of interest.

**Definition 4.14.** Recall the reachable canonical decomposition (4.35). The pair $\Sigma = (A, B)$ is stabilizable if $A_{\bar{r}}$ is stable, that is, if all its eigenvalues have either negative real parts (continuous-time case) or are inside the unit disk (discrete-time case).

A concept which is closely related to reachability is that of controllability. Here, instead of driving the zero state to a desired state, a given nonzero state is steered to the zero state. More precisely, we have:

**Definition 4.15.** Given is $\Sigma = (A, B)$. A (nonzero) state $\bar{x} \in X$ is controllable to the zero state if there exist an input function $\bar{u}(t)$ and a time $\bar{T} < \infty$ such that $\phi(\bar{u}, \bar{x}, \bar{T}) = 0$. The controllable subspace $X_{\mathrm{contr}}$ of $\Sigma$ is the set of all controllable states. The system $\Sigma$ is (completely) controllable if $X_{\mathrm{contr}} = X$.

The next theorem shows that for continuous-time systems the concepts of reachability and controllability are equivalent, while for discrete-time systems the latter is weaker.

**Theorem 4.16.** (a) For continuous-time systems, $X_{\mathrm{contr}} = X_{\mathrm{reach}}$. (b) For discrete-time systems, $X_{\mathrm{reach}} \subset X_{\mathrm{contr}}$; in particular, $X_{\mathrm{contr}} = X_{\mathrm{reach}} + \ker A^n$.

*Proof.* (a) By definition, $x \in X_{\mathrm{contr}}$ implies that there exist $u$ and $T$ such that $\phi(u, x, T) = 0$, i.e.,
$$e^{AT}x = -\int_0^T e^{A(T-\tau)} B\, u(\tau)\, d\tau \in X_{\mathrm{reach}}.$$
Since $X_{\mathrm{reach}}$ is $A$-invariant, $x \in e^{-AT} X_{\mathrm{reach}} \subset X_{\mathrm{reach}}$; thus $X_{\mathrm{contr}} \subset X_{\mathrm{reach}}$. Conversely, let $x \in X_{\mathrm{reach}}$; then $e^{AT}x \in X_{\mathrm{reach}}$, so there exist $u$ and $T$ with $-e^{AT}x = \phi(u, 0, T)$, which in turn implies $\phi(u, x, T) = 0$, i.e., $x \in X_{\mathrm{contr}}$. Thus $X_{\mathrm{reach}} \subset X_{\mathrm{contr}}$, which completes the proof of (a).

(b) In discrete time, $x \in X_{\mathrm{contr}}$ implies $A^T x \in X_{\mathrm{reach}}$ for some $T$. Following part (a), the $A$-invariance of $X_{\mathrm{reach}}$ yields $X_{\mathrm{reach}} \subset X_{\mathrm{contr}}$, and $X_{\mathrm{contr}} = X_{\mathrm{reach}} + \ker A^n$. The converse inclusion does not hold in general, as $A$ may be singular: states in $\ker A^n$ are controllable but, in general, are not reachable. $\blacksquare$

From the above results it follows that, for any two states $x_1$ and $x_2$, there is a trajectory passing through the two points if, and only if,
$$x_2 - f(A, T)\, x_1 \in X_{\mathrm{reach}} \ \text{for some}\ T, \tag{4.36}$$
where $f(A, T) = e^{AT}$ for continuous-time systems and $f(A, T) = A^T$ for discrete-time systems. In particular, if $x_1, x_2 \in X_{\mathrm{reach}}$, there exist $u_{12}$ and $T_{12}$ such that $x_2 = \phi(u_{12}, x_1, T_{12})$. To see this, note that since $x_1$ is reachable it is also controllable, so there exist $u_2$, $T_2$ such that $\phi(u_2, x_1, T_2) = 0$; the reachability of $x_2$ implies the existence of $u_1$, $T_1$ such that $x_2 = \phi(u_1, 0, T_1)$. The function $u_{12}$ is then the concatenation of $u_2$ with $u_1$, while $T_{12} = T_1 + T_2$. This shows that if we start from a reachable state $x_1 \neq 0$, the states that can be attained are also within the reachable subspace.

**Distance to reachability/controllability.** Following the considerations in section 3.3, the numerical computation of rank is an ill-posed problem. Therefore the same holds for the numerical determination of the reachability (controllability) of a given pair $(A, B)$. One could consider instead the numerical rank of the reachability matrix $\mathcal{R}(A, B)$, of the reachability gramian $\mathcal{P}(T)$, or, in case the system is stable, of the infinite gramian $\mathcal{P}$. A measure of reachability which is well-posed is the distance of the pair to the set of unreachable/uncontrollable pairs, denoted by $\delta_r(A, B)$ and defined as
$$\delta_r(A, B) = \inf_{\mu\in\mathbb{C}} \sigma_{\min}\, [\mu I_n - A,\ B].$$
In words, this is the infimum over all complex $\mu$ of the smallest singular value of $[\mu I_n - A,\ B]$.
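The PBH test of Theorem 4.13 translates directly into a rank computation at the eigenvalues of $A$ (only there can the rank of $[\mu I - A,\ B]$ drop below $n$). A Python sketch, added as an illustration with a hypothetical diagonal pair:

```python
import numpy as np

def pbh_reachable(A, B, tol=1e-9):
    """PBH test: (A, B) is reachable iff rank [lam*I - A, B] = n
    for every eigenvalue lam of A."""
    n = A.shape[0]
    for lam in np.linalg.eigvals(A):
        M = np.hstack([lam * np.eye(n) - A, B.astype(complex)])
        if np.linalg.matrix_rank(M, tol) < n:
            return False
    return True

A = np.array([[-1.0, 0.0], [0.0, -2.0]])
B_good = np.array([[1.0], [1.0]])   # excites both modes
B_bad = np.array([[1.0], [0.0]])    # leaves the second mode untouched
```

For `B_bad` the left eigenvector $(0\ \,1)$ of $A$ lies in the left kernel of $B$, so the test fails.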
**4.2.2 The state observation problem**

In order to be able to modify the dynamical behavior of a system, very often the state $x$ needs to be available. Typically, however, the state variables are inaccessible and only certain linear combinations $y$ thereof, given by the output equations (4.18) and (4.19), are known. Thus we need to discuss the problem of reconstructing the state $x(T)$ from observations $y(\tau)$, where $\tau$ lies in some appropriate interval. If $\tau \in [T,\ T+t]$, we have the state observation problem, while if $\tau \in [T-t,\ T]$, we have the state reconstruction problem. The observation problem is discussed first.

Since the input $u$ is known, the latter two terms in (4.18) and (4.19) are also known for $t \geq 0$; thus, in determining $x(0)$ we may assume without loss of generality that $u(\cdot) = 0$. Without loss of generality, we also assume that $T = 0$. Since $B$ and $D$ are irrelevant, for this subsection
$$\Sigma = \begin{pmatrix} A \\ \hline C \end{pmatrix}, \qquad A \in \mathbb{R}^{n\times n},\ C \in \mathbb{R}^{p\times n}.$$
The observation problem thus reduces to the following: given $y(t) = C\phi(0, x(0), t)$ for $t \geq 0$, find $x(0)$.

**Definition 4.17.** A state $\bar{x} \in X$ is unobservable if $y(t) = C\phi(0, \bar{x}, t) = 0$ for all $t \geq 0$, i.e., if $\bar{x}$ is indistinguishable from the zero state for all $t \geq 0$. The unobservable subspace $X_{\mathrm{unobs}}$ of $X$ is the set of all unobservable states of $\Sigma$. $\Sigma$ is (completely) observable if $X_{\mathrm{unobs}} = \{0\}$.

The observability matrix of $\Sigma$ is
$$\mathcal{O}(C, A) = (C^*\ \ A^*C^*\ \ (A^*)^2C^*\ \cdots)^*, \tag{4.37}$$
with finite version $\mathcal{O}_n(C, A) = (C^*\ \ A^*C^*\ \cdots\ (A^*)^{n-1}C^*)^*$. Again by the Cayley–Hamilton theorem, the kernel of $\mathcal{O}(C, A)$ is determined by the first $n$ terms; therefore, for computational purposes, the finite version $\mathcal{O}_n(C, A)$ is used. We are now ready to state the main result.

**Theorem 4.18.** Given is $\Sigma = (A, C)$ as above. (a) The unobservable subspace $X_{\mathrm{unobs}}$ is $A$-invariant, and
$$X_{\mathrm{unobs}} = \ker \mathcal{O}(C, A) = \{ x \in X :\ CA^{i-1}x = 0,\ i > 0 \}. \tag{4.39}$$
(b) $\Sigma$ is observable if, and only if, $\operatorname{rank}\mathcal{O}(C, A) = n$. (c) Observability is basis independent.

*Proof.* Let $x_1$, $x_2$ be unobservable states. Then, for all constants $\alpha_1$, $\alpha_2$ and all $t \geq 0$,
$$C\phi(0, \alpha_1 x_1 + \alpha_2 x_2, t) = \alpha_1 C\phi(0, x_1, t) + \alpha_2 C\phi(0, x_2, t) = \alpha_1 y_1(t) + \alpha_2 y_2(t) = 0.$$
This proves the linearity of the unobservable subspace; $X_{\mathrm{unobs}}$ is thus a linear subspace of $X$. For discrete-time systems, formula (4.39) follows from the fact that the unobservability of $x$ is equivalent to $y(i) = CA^i x = 0$, $i \geq 0$. For continuous-time systems, $x$ is unobservable if $y(t) = Ce^{At}x = 0$ for $t \geq 0$; since $Ce^{At}x$ is analytic, it is completely determined by all its derivatives at $t = 0$, i.e., by $CA^i x$, $i \geq 0$. This implies (4.39) for both $t \in \mathbb{Z}$ and $t \in \mathbb{R}$. $\blacksquare$

An immediate consequence of the above formula is the following. Let $Y_0$ denote the $np\times 1$ vector
$$Y_0 = (y^*(0)\ \ Dy^*(0)\ \cdots\ D^{n-1}y^*(0))^*,\ \ D = \tfrac{d}{dt}, \quad \text{for continuous-time systems},$$
$$Y_0 = (y^*(0)\ \ y^*(1)\ \cdots\ y^*(n-1))^* \quad \text{for discrete-time systems}.$$

**Corollary 4.19.** The observation problem reduces to the solution of the linear set of equations
$$\mathcal{O}_n(C, A)\, x(0) = Y_0.$$
This set of equations is solvable for all initial conditions $x(0)$; it has a unique solution if, and only if, $\Sigma$ is observable. Otherwise $x(0)$ can only be determined modulo $X_{\mathrm{unobs}}$, i.e., up to an arbitrary linear combination of unobservable states.
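Corollary 4.19 amounts to a linear solve. The following Python sketch (an added illustration with a hypothetical observable pair) recovers the initial state of a discrete-time system from its first $n$ output samples:

```python
import numpy as np

def observability_matrix(C, A):
    """O_n(C, A): rows C, CA, ..., CA^{n-1}."""
    n = A.shape[0]
    rows = [C]
    for _ in range(n - 1):
        rows.append(rows[-1] @ A)
    return np.vstack(rows)

A = np.array([[0.0, 1.0], [-0.5, 0.3]])   # hypothetical observable pair
C = np.array([[1.0, 0.0]])
n = A.shape[0]
x0 = np.array([2.0, -1.0])                # true (unknown) initial state

# With u = 0, the outputs are y(i) = C A^i x(0), i = 0, ..., n-1
Y0 = np.concatenate([C @ np.linalg.matrix_power(A, i) @ x0 for i in range(n)])

O = observability_matrix(C, A)
x0_hat, *_ = np.linalg.lstsq(O, Y0, rcond=None)   # unique since rank O = n
```

Since the pair is observable, the recovered `x0_hat` equals the true initial state.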
In a similar way, we now briefly turn our attention to the reconstructibility problem. First, the finite observability gramians at time $t < \infty$ are
$$\mathcal{Q}(t) = \int_0^t e^{A^*\tau} C^*C\, e^{A\tau}\, d\tau \tag{4.40}$$
for continuous-time systems, and
$$\mathcal{Q}(t) = \mathcal{O}_t^*(C, A)\, \mathcal{O}_t(C, A) \tag{4.41}$$
for discrete-time systems. It readily follows that $\ker \mathcal{Q}(t) = \ker \mathcal{O}(C, A)$; this relationship holds for continuous-time systems for $t > 0$ and for discrete-time systems at least for $t \geq n$. The energy of the output function $y$ at time $T$, caused by the initial state $x$ (in the absence of an input), can be expressed in terms of the observability gramian:
$$\|y\|^2 = x^*\, \mathcal{Q}(T)\, x. \tag{4.42}$$

**Definition 4.20.** Given is $\Sigma = (A, C)$. A state $\bar{x} \in X$ is unreconstructible if $y(t) = C\phi(0, \bar{x}, t) = 0$ for all $t \leq 0$, i.e., if $\bar{x}$ is indistinguishable from the zero state for all $t \leq 0$. The unreconstructible subspace $X_{\mathrm{unrec}}$ of $X$ is the set of all unreconstructible states of $\Sigma$. $\Sigma$ is (completely) reconstructible if $X_{\mathrm{unrec}} = \{0\}$.

By analogy with the preceding development we obtain the following results. For continuous-time systems, $X_{\mathrm{unrec}} = X_{\mathrm{unobs}}$. For discrete-time systems, $X_{\mathrm{unrec}} \supset X_{\mathrm{unobs}}$; in particular, $X_{\mathrm{unobs}} = X_{\mathrm{unrec}} \cap \operatorname{im} A^n$. This shows that while for continuous-time systems the concepts of observability and reconstructibility are equivalent, for discrete-time systems the latter is weaker. For this reason, only the concept of observability is used in the sequel.

**4.2.3 The duality principle in linear systems**

Let $A^*$, $B^*$, $C^*$, $D^*$ be the dual maps of $A$, $B$, $C$, $D$, respectively, computed in appropriate dual bases. With respect to the standard inner product, $A^*$, $B^*$, $C^*$, $D^*$ are the complex conjugate transposes of $A$, $B$, $C$, $D$. The dual system $\Sigma^*$ of $\Sigma = \left(\begin{smallmatrix} A & B \\ C & D \end{smallmatrix}\right)$ is formally defined as
$$\Sigma^* = \begin{pmatrix} -A^* & -C^* \\ B^* & D^* \end{pmatrix} \in \mathbb{R}^{(n+m)\times(n+p)},$$
i.e., the dynamics are given by $-A^*$, the input map is given by $-C^*$, and the output map by $B^*$. One may think of the dual system $\Sigma^*$ as the system $\Sigma$ but with the roles of the inputs and outputs interchanged, or with the flow of causality reversed and time running backwards. In section 5.2 (page 108) it is shown that the dual system is also the adjoint system defined by (5.15). The main result is the duality principle:

**Theorem 4.21.** The system $\Sigma$ is reachable if and only if its dual $\Sigma^*$ is observable. Moreover, the orthogonal complement of the reachable subspace of $\Sigma$ is equal to the unobservable subspace of its dual $\Sigma^*$:
$$\left(X_{\mathrm{reach}}^{\Sigma}\right)^\perp = X_{\mathrm{unobs}}^{\Sigma^*}.$$

*Proof.* The result follows immediately from formulae (4.25) and (4.37), upon recalling that for a linear map $\mathbf{M}$ there holds $(\operatorname{im} \mathbf{M})^\perp = \ker \mathbf{M}^*$: the pair $(A, B)$ is reachable if, and only if, $(B^*, A^*)$ is observable. $\blacksquare$

In the same way, one can prove that controllability and reconstructibility are dual concepts. For completeness, we also state the dual of the reachable canonical decomposition; its proof follows by duality from the corresponding result for reachability and is omitted.

**Lemma 4.22 (Observable canonical decomposition).** Given is $\Sigma = (A, C)$. There exists a basis in $X$ such that $A$ and $C$ have the matrix representations
$$A = \begin{pmatrix} A_o & 0 \\ A_{\bar{o}o} & A_{\bar{o}} \end{pmatrix}, \qquad C = (\, C_o\ \ 0\, ),$$
where the subsystem $\Sigma_o = (A_o, C_o)$ is observable.
We conclude this subsection by stating the dual of theorem 4.13; the proofs follow by duality from the corresponding results for reachability and are omitted.

**Theorem 4.23 (Observability conditions).** Given is $\Sigma = (A, C)$, $A \in \mathbb{R}^{n\times n}$, $C \in \mathbb{R}^{p\times n}$. The following are equivalent:
1. The pair $(C, A)$ is observable.
2. The rank of the observability matrix is full: $\operatorname{rank}\mathcal{O}(C, A) = n$.
3. The observability gramian is positive definite: $\mathcal{Q}(t) > 0$ for some $t > 0$.
4. No right eigenvector $v$ of $A$ is in the right kernel of $C$: $Av = \lambda v \ \Rightarrow\ Cv \neq 0$.
5. $\operatorname{rank}\,(\mu I_n - A^*,\ C^*) = n$ for all $\mu \in \mathbb{C}$.
6. The polynomial matrices $sI - A$ and $C$ are right coprime.

As in the reachability case, the numerical determination of the observability of a given pair $(C, A)$ is an ill-posed problem. A well-posed measure is the distance of the pair to the set of unobservable pairs, denoted by $\delta_o(C, A)$; following section 3.3, it is defined by means of the distance to unreachability of the dual pair $(A^*, C^*)$:
$$\delta_o(C, A) = \delta_r(A^*, C^*) = \inf_{\mu\in\mathbb{C}} \sigma_{\min}\, [\mu I_n - A^*,\ C^*] = \inf_{\mu\in\mathbb{C}} \sigma_{\min} \begin{pmatrix} \mu I_n - A \\ C \end{pmatrix}.$$

The dual of stabilizability is detectability: $\Sigma = (A, C)$ is detectable if $A_{\bar{o}}$ in the observable canonical decomposition is stable, i.e., has eigenvalues either in the left half of the complex plane or inside the unit disk, depending on whether we are dealing with a continuous- or a discrete-time system.

The reachable and observable canonical decompositions given in lemmata 4.12 and 4.22 can be combined to obtain the following decomposition of the triple $(A, B, C)$.

**Lemma 4.24 (Reachable–observable canonical decomposition).** Given is $\Sigma = \left(\begin{smallmatrix} A & B \\ C & \end{smallmatrix}\right)$. There exists a basis in $X$ such that
$$\Sigma = \begin{pmatrix} A & B \\ \hline C & \end{pmatrix} = \left(\begin{array}{cccc|c} A_{r\bar{o}} & A_{12} & A_{13} & A_{14} & B_{r\bar{o}} \\ 0 & A_{ro} & 0 & A_{24} & B_{ro} \\ 0 & 0 & A_{\bar{r}\bar{o}} & A_{34} & 0 \\ 0 & 0 & 0 & A_{\bar{r}o} & 0 \\ \hline 0 & C_{ro} & 0 & C_{\bar{r}o} & \end{array}\right),$$
where the triple $\Sigma_{ro} = (A_{ro}, B_{ro}, C_{ro})$ is both reachable and observable.

**4.3 The infinite gramians**

Consider a continuous-time linear system $\Sigma_c = \left(\begin{smallmatrix} A & B \\ C & D \end{smallmatrix}\right)$ that is stable, i.e., such that all eigenvalues of $A$ have negative real part. In this case both (4.28) and (4.40) are defined for $t = \infty$:
$$\mathcal{P} = \int_0^\infty e^{A\tau} BB^* e^{A^*\tau}\, d\tau, \tag{4.43} $$
$$\mathcal{Q} = \int_0^\infty e^{A^*\tau} C^*C\, e^{A\tau}\, d\tau. \tag{4.44}$$
$\mathcal{P}$ and $\mathcal{Q}$ are the infinite reachability and infinite observability gramians associated with $\Sigma_c$. These gramians satisfy the following linear matrix equations, called Lyapunov equations.

**Proposition 4.25.** Given the stable, continuous-time system $\Sigma_c$ as above, the associated infinite reachability gramian $\mathcal{P}$ satisfies the continuous-time Lyapunov equation
$$A\mathcal{P} + \mathcal{P}A^* + BB^* = 0, \tag{4.45}$$
while the associated infinite observability gramian $\mathcal{Q}$ satisfies
$$A^*\mathcal{Q} + \mathcal{Q}A + C^*C = 0. \tag{4.46}$$

*Proof.* It is readily checked that
$$A\mathcal{P} + \mathcal{P}A^* = \int_0^\infty \left( A e^{A\tau}BB^*e^{A^*\tau} + e^{A\tau}BB^*e^{A^*\tau} A^* \right) d\tau = \int_0^\infty d\!\left( e^{A\tau}BB^*e^{A^*\tau} \right) = -BB^*,$$
due to stability. This proves (4.45); (4.46) is proved similarly. $\blacksquare$

The matrices $\mathcal{P}$ and $\mathcal{Q}$ are indeed gramians in the following sense. Recall that the impulse response of a continuous-time system $\Sigma$ is $h(t) = Ce^{At}B$, $t > 0$, and consider the following two maps: the input-to-state map $\xi(t) = e^{At}B$ and the state-to-output map $\eta(t) = Ce^{At}$. If the input to the system is the impulse $\delta(t)$, the resulting state is $\xi(t)$; if the initial condition of the system is $x(0)$, in the absence of a forcing function $u$, the resulting output is $y(t) = \eta(t)\, x(0)$. The gramians corresponding to $\xi(t)$ and $\eta(t)$, for time running from $0$ to $\infty$, are
$$\mathcal{P} = \int_0^\infty \xi(t)\xi(t)^*\, dt = \int_0^\infty e^{At}BB^*e^{A^*t}\, dt, \qquad \mathcal{Q} = \int_0^\infty \eta(t)^*\eta(t)\, dt = \int_0^\infty e^{A^*t}C^*Ce^{At}\, dt.$$
These are the expressions that we have encountered earlier as (finite) gramians.

Similarly, if the discrete-time system $\Sigma_d = \left(\begin{smallmatrix} A & B \\ C & D \end{smallmatrix}\right)$ is stable, i.e., all eigenvalues of $A$ are inside the unit disk, the gramians (4.29) and (4.41) are defined for $t = \infty$:
$$\mathcal{P} = \mathcal{R}\mathcal{R}^* = \sum_{k=0}^{\infty} A^k BB^* (A^*)^k, \tag{4.47}$$
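For small dimensions, the Lyapunov equation (4.45) can be solved by brute-force vectorization. The following Python sketch is an added illustration (a stand-in for dedicated solvers; the small diagonal pair is hypothetical):

```python
import numpy as np

def lyap(A, W):
    """Solve A P + P A^T + W = 0 via Kronecker vectorization:
    (I (x) A + A (x) I) vec(P) = -vec(W).  Suitable for small n only."""
    n = A.shape[0]
    K = np.kron(np.eye(n), A) + np.kron(A, np.eye(n))
    p = np.linalg.solve(K, -W.reshape(-1, order="F"))
    return p.reshape((n, n), order="F")

A = np.array([[-1.0, 0.0], [0.0, -2.0]])   # stable diagonal example
B = np.array([[1.0], [2.0]])
P = lyap(A, B @ B.T)                        # infinite reachability gramian
residual = A @ P + P @ A.T + B @ B.T        # should vanish
```

For a diagonal stable $A$ the entries have the closed form $\mathcal{P}_{ij} = -B_iB_j/(\lambda_i + \lambda_j^*)$, which here gives $\mathcal{P} = \left(\begin{smallmatrix} 1/2 & 2/3 \\ 2/3 & 1 \end{smallmatrix}\right)$.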
$$\mathcal{Q} = \mathcal{O}^*\mathcal{O} = \sum_{k=0}^{\infty} (A^*)^k C^*C A^k. \tag{4.48}$$
Notice that $\mathcal{P}$ can be written as $\mathcal{P} = BB^* + A\mathcal{P}A^*$; moreover $\mathcal{Q} = C^*C + A^*\mathcal{Q}A$. In other words:

**Proposition 4.26.** Given the stable, discrete-time system $\Sigma_d$ as above, the associated infinite reachability gramian $\mathcal{P}$ satisfies the discrete-time Lyapunov equation
$$A\mathcal{P}A^* + BB^* = \mathcal{P}, \tag{4.49}$$
while the associated infinite observability gramian $\mathcal{Q}$ satisfies
$$A^*\mathcal{Q}A + C^*C = \mathcal{Q}. \tag{4.50}$$
These are the so-called discrete-time Lyapunov or Stein equations.

**The infinite gramians in the frequency domain.** The infinite gramians can also be expressed in the frequency domain. Applying Plancherel's theorem¹ to (4.43) yields
$$\mathcal{P} = \frac{1}{2\pi}\int_{-\infty}^{\infty} (i\omega I - A)^{-1} BB^* (-i\omega I - A^*)^{-1}\, d\omega; \tag{4.51}$$
similarly, (4.44) yields
$$\mathcal{Q} = \frac{1}{2\pi}\int_{-\infty}^{\infty} (-i\omega I - A^*)^{-1} C^*C\, (i\omega I - A)^{-1}\, d\omega. \tag{4.52}$$
In the discrete-time case, the infinite gramians defined by (4.47) and (4.48) can also be expressed as
$$\mathcal{P} = \frac{1}{2\pi}\int_0^{2\pi} (e^{i\theta}I - A)^{-1} BB^* (e^{-i\theta}I - A^*)^{-1}\, d\theta, \tag{4.53}$$
$$\mathcal{Q} = \frac{1}{2\pi}\int_0^{2\pi} (e^{-i\theta}I - A^*)^{-1} C^*C\, (e^{i\theta}I - A)^{-1}\, d\theta. \tag{4.54}$$
These expressions will be useful in model reduction methods involving frequency weighting.

¹ Plancherel's theorem states that the inner product of two (matrix-valued) functions in the time domain and in the frequency domain is (up to a constant) the same. In continuous time, $2\pi\int_{-\infty}^{\infty} g^*(t)f(t)\, dt = \int_{-\infty}^{\infty} F^*(-i\omega)G(i\omega)\, d\omega$, while in discrete time, $2\pi\sum_{t} g^*(t)f(t) = \int_0^{2\pi} F^*(e^{-i\theta})G(e^{i\theta})\, d\theta$.

**4.3.1 The energy associated with reaching/observing a state**

An important consideration in model reduction is the ability to classify states according to their degree of reachability or their degree of observability. Recall (4.33), (4.34), and (4.42). From the definition of the gramians it follows that they are monotone functions of time:
$$\mathcal{P}(t_2) \geq \mathcal{P}(t_1), \qquad \mathcal{Q}(t_2) \geq \mathcal{Q}(t_1), \qquad t_2 \geq t_1,$$
irrespective of whether we are dealing with discrete- or continuous-time systems.
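Expression (4.51) can be checked by quadrature. Below, an added illustrative sketch: for a stable diagonal pair (hypothetical values) the rows of the resolvent term have closed forms, and trapezoidal integration over a truncated frequency grid reproduces the gramian to a few digits:

```python
import numpy as np

def trapezoid(y, x):
    """Composite trapezoidal rule (simple, version-independent helper)."""
    dx = np.diff(x)
    return np.sum((y[1:] + y[:-1]) * dx / 2.0)

A = np.diag([-1.0, -2.0])                  # stable diagonal example
B = np.array([[1.0], [2.0]])

w = np.linspace(-2000.0, 2000.0, 400_001)  # truncated grid; tails decay like 1/w^2
g1 = 1.0 / (1j * w + 1.0)                  # rows of (iw I - A)^{-1} B
g2 = 2.0 / (1j * w + 2.0)
# P ~ (1/2pi) * integral of g(w) g(w)^* dw, computed entry by entry
P_num = np.array([
    [trapezoid(g1 * np.conj(g1), w), trapezoid(g1 * np.conj(g2), w)],
    [trapezoid(g2 * np.conj(g1), w), trapezoid(g2 * np.conj(g2), w)],
]).real / (2.0 * np.pi)
```

The exact gramian for this pair is $\left(\begin{smallmatrix} 1/2 & 2/3 \\ 2/3 & 1 \end{smallmatrix}\right)$; truncating the integral to $|\omega| \leq 2000$ costs roughly $10^{-3}$.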
Hence, from (4.34) it follows that, assuming stability and (complete) reachability, the minimal energy for the transfer from state $0$ to $x_r$ is obtained as $\bar{T}\to\infty$; likewise, the largest observation energy produced by the initial state $x_o$ is obtained for an infinite observation interval. We summarize these results as follows: let $\mathcal{P}$ and $\mathcal{Q}$ denote the infinite gramians of a stable, reachable, and observable system $\Sigma$. (a) The minimal energy required to steer the state of the system from $0$ to $x_r$ is given by
$$x_r^*\, \mathcal{P}^{-1}\, x_r. \tag{4.55}$$
(b) The maximal energy produced by observing the output of the system whose initial state is $x_o$ is given by
$$x_o^*\, \mathcal{Q}\, x_o. \tag{4.56}$$

This provides a way to determine the degree of reachability or the degree of observability of the states of $\Sigma$. The states which are the most difficult to reach, namely, require the most energy, are (have a significant component) in the span of those eigenvectors of $\mathcal{P}$ that correspond to small eigenvalues. Similarly, the states that are difficult to observe, i.e., produce small observation energy, are (have a significant component) in the span of those eigenvectors of $\mathcal{Q}$ which correspond to small eigenvalues. This conclusion is at the heart of the concept of balancing discussed in chapter 7.

Recall the definition of equivalent systems (4.24). Under a state space transformation $T$, $\det T \neq 0$, the gramians are transformed as follows:
$$\tilde{\mathcal{P}} = T\mathcal{P}T^*, \quad \tilde{\mathcal{Q}} = T^{-*}\mathcal{Q}T^{-1} \quad \Rightarrow \quad \tilde{\mathcal{P}}\tilde{\mathcal{Q}} = T\, (\mathcal{P}\mathcal{Q})\, T^{-1}.$$
Therefore, the product of the two gramians of equivalent systems is related by similarity transformation, and hence has the same eigenvalues. Quantities which are invariant under state space transformation are called input–output invariants of the associated system $\Sigma$.

**Lemma 4.28.** The eigenvalues of the product of the reachability and of the observability gramians are input–output invariants. Their square roots, $\sigma_i = \lambda_i^{1/2}(\mathcal{P}\mathcal{Q})$, are important invariants called the Hankel singular values of the system; they turn out to be equal to the singular values of the Hankel operator introduced in chapter 5.

**A formula for the reachability gramian.** Given a continuous-time system described by the pair $A \in \mathbb{R}^{n\times n}$, $B \in \mathbb{R}^{n\times m}$, assume that the eigenvalues of $A$ are distinct, so that $A$ is diagonalizable. Let the EVD be $A = V\Lambda V^{-1}$, where $V = [v_1\ v_2\ \cdots\ v_n]$, $\Lambda = \operatorname{diag}(\lambda_1, \cdots, \lambda_n)$, and $v_i$ denotes the eigenvector corresponding to the eigenvalue $\lambda_i$. Notice that if the $i$th eigenvalue is complex, the corresponding eigenvector is also complex. Let $W = V^{-1}B \in \mathbb{C}^{n\times m}$, and denote by $W_i \in \mathbb{C}^{1\times m}$ the $i$th row of $W$.

**Proposition 4.27.** With the notation introduced above, the following formula holds:
$$\mathcal{P}(T) = V\mathcal{R}(T)V^*, \qquad [\mathcal{R}(T)]_{ij} = -W_iW_j^*\, \frac{1 - e^{(\lambda_i+\lambda_j^*)T}}{\lambda_i + \lambda_j^*} \in \mathbb{C}, \tag{4.57}$$
provided that $\lambda_i + \lambda_j^* \neq 0$; if $\lambda_i + \lambda_j^* = 0$, then $[\mathcal{R}(T)]_{ij} = (W_iW_j^*)\, T$. This formula accomplishes both the computation of the matrix exponential and the integration implicitly. If in addition $A$ is stable, the gramian is positive definite for $T > 0$ (assuming complete reachability), and the infinite gramian (4.43) is given in terms of the EVD of $A$ by
$$\mathcal{P} = V\mathcal{R}V^*, \qquad \mathcal{R}_{ij} = \frac{-W_iW_j^*}{\lambda_i + \lambda_j^*}.$$
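The invariance claimed in Lemma 4.28 can be confirmed numerically. In this added sketch (hypothetical SISO system; the Lyapunov solves use Kronecker vectorization), the eigenvalues of $\mathcal{P}\mathcal{Q}$ are computed before and after a state space transformation $T$:

```python
import numpy as np

def lyap(F, W):
    """Solve F X + X F^T + W = 0 by vectorization (small n only)."""
    n = F.shape[0]
    K = np.kron(np.eye(n), F) + np.kron(F, np.eye(n))
    return np.linalg.solve(K, -W.reshape(-1, order="F")).reshape((n, n), order="F")

A = np.array([[-1.0, 0.5], [0.0, -2.0]])   # hypothetical stable minimal system
B = np.array([[1.0], [1.0]])
C = np.array([[1.0, -1.0]])

P = lyap(A, B @ B.T)                # A P + P A^T + B B^T = 0
Q = lyap(A.T, C.T @ C)              # A^T Q + Q A + C^T C = 0

T = np.array([[1.0, 2.0], [0.0, 1.0]])     # an equivalence transformation
Ti = np.linalg.inv(T)
At, Bt, Ct = T @ A @ Ti, T @ B, C @ Ti     # equivalent realization
Pt = lyap(At, Bt @ Bt.T)                   # equals T P T^*
Qt = lyap(At.T, Ct.T @ Ct)                 # equals T^{-*} Q T^{-1}

ev = np.sort(np.linalg.eigvals(P @ Q).real)     # input-output invariants
ev_t = np.sort(np.linalg.eigvals(Pt @ Qt).real)
```

The square roots of these common eigenvalues are the Hankel singular values; the eigenvalues of $\mathcal{P}$ or $\mathcal{Q}$ alone do change under $T$.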
**Example 4.3.1.** Consider the electric circuit consisting of the parallel connection of two branches, as shown in figure 4.1: an inductor $L$ in series with a resistor $R_L$, in parallel with a capacitor $C$ in series with a resistor $R_C$. The input is the voltage $u$ applied to this parallel connection, while the output is the current $y$. As states we choose the current through the inductor, $x_1$, and the voltage across the capacitor, $x_2$. The state equations are
$$L\dot{x}_1 = -R_Lx_1 + u, \qquad R_CC\dot{x}_2 = -x_2 + u,$$
while the output equation is $R_Cy = R_Cx_1 - x_2 + u$. Thus
$$\Sigma = \begin{pmatrix} A & B \\ \hline C & D \end{pmatrix} = \left(\begin{array}{cc|c} -\frac{1}{\tau_L} & 0 & \frac{1}{R_L\tau_L} \\ 0 & -\frac{1}{\tau_C} & \frac{1}{\tau_C} \\ \hline 1 & -\frac{1}{R_C} & \frac{1}{R_C} \end{array}\right), \tag{4.58}$$
where $\tau_L = L/R_L$ and $\tau_C = R_CC$ are the time constants of the two branches of the circuit. It readily follows that this system is reachable and observable if the two time constants are different ($\tau_L \neq \tau_C$). The impulse response is
$$h(t) = \frac{1}{R_L\tau_L}\, e^{-t/\tau_L} - \frac{1}{R_C\tau_C}\, e^{-t/\tau_C} + \frac{1}{R_C}\,\delta(t), \qquad t \geq 0.$$

*Figure 4.1. Parallel connection of a capacitor and an inductor.*

Reachability in this case inquires about the existence of an input voltage $u$ which will steer the state of the system to some desired state $\tilde{x}$; since the system is reachable (for positive values of the parameters with $\tau_L \neq \tau_C$), any state can be reached. Assuming that the values of these elements are such that $\tau_L = 1$ and $\tau_C = \tfrac{1}{2}$ (e.g., $L = 1$, $R_L = 1$, $C = 1$, $R_C = \tfrac{1}{2}$), we obtain
$$A = \begin{pmatrix} -1 & 0 \\ 0 & -2 \end{pmatrix}, \quad B = \begin{pmatrix} 1 \\ 2 \end{pmatrix} \ \Rightarrow\ e^{At}B = \begin{pmatrix} e^{-t} \\ 2e^{-2t} \end{pmatrix}.$$
The gramian $\mathcal{P}(T)$ and the infinite gramian $\mathcal{P}$ are
$$\mathcal{P}(T) = \begin{pmatrix} \frac{1}{2} - \frac{1}{2}e^{-2T} & \frac{2}{3} - \frac{2}{3}e^{-3T} \\[2pt] \frac{2}{3} - \frac{2}{3}e^{-3T} & 1 - e^{-4T} \end{pmatrix}, \qquad \mathcal{P} = \lim_{T\to\infty}\mathcal{P}(T) = \begin{pmatrix} \frac{1}{2} & \frac{2}{3} \\[2pt] \frac{2}{3} & 1 \end{pmatrix}.$$
We choose $x^1 = [1\ \ 0]^*$ and $x^2 = [0\ \ 1]^*$. The corresponding minimum-energy inputs, expressed in the reversed time variable $\bar{t} = T - t$ (so that $\bar{t}$ runs from $0$ to $T$ as $t$ runs backwards from $T$ to $0$), are
$$\bar{u}_{1,T}(\bar{t}) = \frac{18\,(1 - e^{-4T})\, e^{-\bar{t}} - 24\,(1 - e^{-3T})\, e^{-2\bar{t}}}{e^{-6T} - 9e^{-4T} + 16e^{-3T} - 9e^{-2T} + 1},$$
$$\bar{u}_{2,T}(\bar{t}) = \frac{-12\,(1 - e^{-3T})\, e^{-\bar{t}} + 18\,(1 - e^{-2T})\, e^{-2\bar{t}}}{e^{-6T} - 9e^{-4T} + 16e^{-3T} - 9e^{-2T} + 1};$$
hence
$$\bar{u}_{1,\infty}(\bar{t}) = \lim_{T\to\infty}\bar{u}_{1,T}(\bar{t}) = 18e^{-\bar{t}} - 24e^{-2\bar{t}}, \qquad \bar{u}_{2,\infty}(\bar{t}) = \lim_{T\to\infty}\bar{u}_{2,T}(\bar{t}) = -12e^{-\bar{t}} + 18e^{-2\bar{t}}.$$
In figure 4.2, the minimum-energy inputs required to steer the system to $x^1$ for $T = 1, 2, 4, 10$ are shown. In the left plot the time axis is $\bar{t}$ (time running backward from $T$ to $0$), while in the right plot the time axis is $t$ (running forward). Both plots show that for $T = 10$ the input function is zero for most of the interval: the activity occurs close to the end of the interval. As $T \to \infty$ the reachability gramian is defined only in the limit, and the input function can thus be plotted only against the $\bar{t}$ axis.

*Figure 4.2. Electric circuit example: minimum-energy inputs steering the system to $x^1 = [1\ \ 0]^*$, for $T = 1, 2, 4, 10$ units of time. Left plot: time axis running backward; right plot: time axis running forward.*

We conclude this example with the computation of the reachability gramian in the frequency domain:
$$\mathcal{P} = \frac{1}{2\pi}\int_{-\infty}^{\infty}(i\omega I_2 - A)^{-1}BB^*(-i\omega I_2 - A^*)^{-1}\, d\omega = \frac{1}{2\pi}\int_{-\infty}^{\infty}\begin{pmatrix} \frac{1}{\omega^2+1} & \frac{2}{\omega^2+i\omega+2} \\[2pt] \frac{2}{\omega^2-i\omega+2} & \frac{4}{\omega^2+4} \end{pmatrix} d\omega$$
$$= \frac{1}{2\pi}\left.\begin{pmatrix} \arctan\omega & \frac{2}{3}\arctan\omega + \frac{2}{3}\arctan\frac{\omega}{2} \\[2pt] \frac{2}{3}\arctan\omega + \frac{2}{3}\arctan\frac{\omega}{2} & 2\arctan\frac{\omega}{2} \end{pmatrix}\right|_{\omega=-\infty}^{\omega=\infty} = \frac{1}{2\pi}\begin{pmatrix} \pi & \frac{4\pi}{3} \\[2pt] \frac{4\pi}{3} & 2\pi \end{pmatrix} = \begin{pmatrix} \frac{1}{2} & \frac{2}{3} \\[2pt] \frac{2}{3} & 1 \end{pmatrix}.$$

**Example 4.3.2.** A second simple example is the following:
$$A = \begin{pmatrix} 0 & 1 \\ -2 & -3 \end{pmatrix}, \quad B = \begin{pmatrix} 0 \\ 1 \end{pmatrix} \ \Rightarrow\ e^{At} = \begin{pmatrix} 2e^{-t} - e^{-2t} & e^{-t} - e^{-2t} \\ -2e^{-t} + 2e^{-2t} & -e^{-t} + 2e^{-2t} \end{pmatrix}.$$
This implies
$$\mathcal{P}(T) = \begin{pmatrix} \frac{1}{12} - \frac{1}{2}e^{-2T} + \frac{2}{3}e^{-3T} - \frac{1}{4}e^{-4T} & \frac{1}{2}e^{-2T} - e^{-3T} + \frac{1}{2}e^{-4T} \\[2pt] \frac{1}{2}e^{-2T} - e^{-3T} + \frac{1}{2}e^{-4T} & \frac{1}{6} - \frac{1}{2}e^{-2T} + \frac{4}{3}e^{-3T} - e^{-4T} \end{pmatrix},$$
and finally the infinite gramian is
$$\mathcal{P} = \lim_{T\to\infty}\mathcal{P}(T) = \begin{pmatrix} \frac{1}{12} & 0 \\ 0 & \frac{1}{6} \end{pmatrix}.$$
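The closed-form gramian of Proposition 4.27 can be cross-checked against direct quadrature of the defining integral. An added Python sketch with a hypothetical non-normal pair having distinct eigenvalues:

```python
import numpy as np

A = np.array([[-1.0, 1.0], [0.0, -3.0]])   # hypothetical pair, eigenvalues -1, -3
B = np.array([[0.0], [1.0]])
T = 1.5

lam, V = np.linalg.eig(A)
Vinv = np.linalg.inv(V)
W = Vinv @ B                               # W = V^{-1} B; rows W_i

# R(T)_ij = -W_i W_j^* (1 - exp((lam_i + conj(lam_j)) T)) / (lam_i + conj(lam_j))
L = lam[:, None] + np.conj(lam)[None, :]
R = -(W @ W.conj().T) * (1.0 - np.exp(L * T)) / L
P_formula = (V @ R @ V.conj().T).real

# Direct trapezoidal quadrature of P(T) = int_0^T e^{At} B B^T e^{A^T t} dt
ts = np.linspace(0.0, T, 4001)
vals = []
for t in ts:
    f = ((V * np.exp(lam * t)) @ Vinv @ B).real   # e^{At} B
    vals.append(f @ f.T)
vals = np.array(vals)
dt = ts[1] - ts[0]
P_quad = (vals[0] + vals[-1]) / 2.0 * dt + vals[1:-1].sum(axis=0) * dt
```

The two computations agree to quadrature accuracy, confirming that (4.57) performs the exponentiation and the integration implicitly.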
If the system is stable, i.e., $\mathrm{Re}\,\lambda_i(A) < 0$, and if in addition the pair $(A, B)$ is reachable, the infinite gramian $\mathcal{P}$ satisfies the Lyapunov equation (4.45) and is positive definite. Consequently, the infinite gramian can be computed as the solution to this linear matrix equation: no explicit calculation of the matrix exponentials, multiplication, and subsequent integration is required. In MATLAB, the computation is carried out using the 'lyap' command: for the matrices of example 4.3.1, P = lyap(A, B*B') yields, in format 'short',
$$\mathcal{P} = \begin{pmatrix} 0.50000 & 0.66666 \\ 0.66666 & 1.00000 \end{pmatrix},$$
while for example 4.3.2 we get $\mathcal{P} = \mathrm{lyap}(A,\ B{*}B') = \left(\begin{smallmatrix} \frac{1}{12} & 0 \\ 0 & \frac{1}{6} \end{smallmatrix}\right)$, as before.

In the frequency domain, for example 4.3.2,
$$\mathcal{P} = \frac{1}{2\pi}\int_{-\infty}^{\infty}(i\omega I_2 - A)^{-1}BB^*(-i\omega I_2 - A^*)^{-1}\, d\omega = \frac{1}{2\pi}\int_{-\infty}^{\infty}\begin{pmatrix} \frac{1}{\omega^4+5\omega^2+4} & \frac{i\omega}{\omega^4+5\omega^2+4} \\[2pt] \frac{-i\omega}{\omega^4+5\omega^2+4} & \frac{\omega^2}{\omega^4+5\omega^2+4} \end{pmatrix} d\omega = \frac{1}{2\pi}\begin{pmatrix} \frac{\pi}{6} & 0 \\[2pt] 0 & \frac{\pi}{3} \end{pmatrix} = \begin{pmatrix} \frac{1}{12} & 0 \\ 0 & \frac{1}{6} \end{pmatrix},$$
where the diagonal entries follow from the partial fraction decompositions $\frac{1}{\omega^4+5\omega^2+4} = \frac{1}{3}\big(\frac{1}{\omega^2+1} - \frac{1}{\omega^2+4}\big)$ and $\frac{\omega^2}{\omega^4+5\omega^2+4} = \frac{1}{3}\big(\frac{4}{\omega^2+4} - \frac{1}{\omega^2+1}\big)$, and the off-diagonal integrands are odd functions of $\omega$.

**Example 4.3.3.** We will now compute the gramian of a simple discrete-time, second-order system:
$$A = \begin{pmatrix} 0 & 1 \\ 0 & \frac{1}{2} \end{pmatrix}, \qquad B = \begin{pmatrix} 0 \\ 1 \end{pmatrix}.$$
To compute the reachability gramian in the time domain we make use of (4.47):
$$\mathcal{P} = \mathcal{R}\mathcal{R}^* = \sum_{k=0}^{\infty} A^kBB^*(A^*)^k = \begin{pmatrix} 1 + \frac{1}{4} + \frac{1}{16} + \cdots & \frac{1}{2} + \frac{1}{8} + \frac{1}{32} + \cdots \\[2pt] \frac{1}{2} + \frac{1}{8} + \frac{1}{32} + \cdots & 1 + \frac{1}{4} + \frac{1}{16} + \cdots \end{pmatrix} = \frac{2}{3}\begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}.$$
For the frequency-domain computation we make use of formula (4.53): each entry of the integrand is a rational function of $e^{i\theta}$ whose denominator is $(e^{-i\theta}-2)(2e^{-i\theta}-1)$ up to conjugation, the antiderivatives involve terms of the form $\ln(e^{i\theta}-2) - \ln(2e^{i\theta}-1)$, and carrying out the integration over $\theta \in [0, 2\pi]$ yields the same result, $\mathcal{P} = \frac{2}{3}\left(\begin{smallmatrix} 2 & 1 \\ 1 & 2 \end{smallmatrix}\right)$.

**4.3.2 The cross gramian**

In addition to the reachability and observability gramians, a third one, the cross gramian, is used in the sequel. This concept is first defined for discrete-time systems $\Sigma = \left(\begin{smallmatrix} A & B \\ C & \end{smallmatrix}\right)$. Given the (infinite) reachability matrix $\mathcal{R}(A, B)$ of (4.25) and the (infinite) observability matrix $\mathcal{O}(C, A)$ of (4.37), the three (infinite) gramians of $\Sigma$ are
$$\mathcal{P} = \mathcal{R}\mathcal{R}^*, \qquad \mathcal{Q} = \mathcal{O}^*\mathcal{O}, \qquad \mathcal{X} = \mathcal{R}\mathcal{O} \in \mathbb{R}^{n\times n}.$$
Notice that these gramians are the three finite matrices which can be formed from the reachability matrix (which has infinitely many columns) and the observability matrix (which has infinitely many rows). The cross gramian $\mathcal{X}$ satisfies a Stein equation as well, but its form depends on the number of inputs and outputs $m$, $p$ of $\Sigma$. If $m = p$, the cross gramian is the $n\times n$ matrix defined by $\mathcal{X} = \mathcal{R}\mathcal{O}$, and a moment's reflection shows that it satisfies
$$A\mathcal{X}A + BC = \mathcal{X}.$$
If $m = 2p$, let $B = [B_1\ \ B_2]$, $B_i \in \mathbb{R}^{n\times p}$; it can then be verified that $\mathcal{X} = \mathcal{X}_1 + \mathcal{X}_2$, where $\mathcal{X}_1$ and $\mathcal{X}_2$ are the cross gramians formed from $B_1$ and $B_2$, respectively, with $A^2$ in place of $A$.
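The discrete-time gramian just computed can be reproduced in a few lines of Python (added illustration). The partial sum converges geometrically since $\rho(A) = \tfrac{1}{2} < 1$, and the result satisfies the Stein equation (4.49):

```python
import numpy as np

A = np.array([[0.0, 1.0], [0.0, 0.5]])
B = np.array([[0.0], [1.0]])

# Partial sum of P = sum_k A^k B B^T (A^T)^k (converges since rho(A) = 1/2)
P = np.zeros((2, 2))
Ak = np.eye(2)
for _ in range(60):
    P += Ak @ B @ B.T @ Ak.T
    Ak = A @ Ak

P_exact = (2.0 / 3.0) * np.array([[2.0, 1.0], [1.0, 2.0]])
```

Sixty terms already agree with the closed form to machine precision.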
These first two gramians satisfy the Stein equations (4.49) and (4.50); a moment's reflection shows that, for m = p, X satisfies the following Stein equation:

A X A + B C = X.
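For continuous-time systems, the analogous characterizations (given below) are Lyapunov equations for P and Q and a Sylvester equation for X. The following sketch, assuming scipy is available and using an arbitrarily chosen stable SISO system, solves these equations and checks that the eigenvalues of PQ are the squares of those of X, in line with the relations λᵢ(H) = λᵢ(X) and σᵢ(H) = √λᵢ(PQ) discussed below.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, solve_sylvester

# An illustrative stable SISO system with poles at -1, -2, -3.
A = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [-6.0, -11.0, -6.0]])
B = np.array([[0.0], [0.0], [1.0]])
C = np.array([[1.0, 0.5, 0.0]])

# Gramians via their defining equations:
#   A P + P A* + B B* = 0,  A* Q + Q A + C* C = 0,  A X + X A + B C = 0.
P = solve_continuous_lyapunov(A, -B @ B.T)
Q = solve_continuous_lyapunov(A.T, -C.T @ C)
X = solve_sylvester(A, A, -B @ C)

# For SISO systems X^2 = PQ, so the eigenvalues of PQ are the squares
# of the (real) eigenvalues of the cross gramian X.
ev_PQ = np.sort(np.linalg.eigvals(P @ Q).real)
ev_X2 = np.sort(np.linalg.eigvals(X).real ** 2)
print(ev_PQ, ev_X2)
```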
Combining these expressions, we obtain the Stein equation satisfied by X in the case m = 2p:

A X A² + B [ C ; C A ] = X.

For general m and p, the Stein equation involves Aʳ, where r is the least common multiple of m and p.

As in the discrete-time case, if the number of inputs of the stable continuous-time system Σ is equal to the number of outputs, m = p, a cross gramian can be defined. Similarly to (4.58), it can readily be shown that X can be expressed as

X = ∫₀^∞ e^{At} B C e^{At} dt;   (4.59)

equivalently, the cross gramian X is defined as the solution to the Sylvester equation

A X + X A + B C = 0.   (4.60)

Under a state space transformation T, the three gramians are transformed to T P T*, T^{−*} Q T^{−1}, and T X T^{−1}, respectively. Thus both the product P Q and the cross gramian X are transformed by similarity, and their eigenvalues are input-output invariants for the associated Σ, while the eigenvalues of the reachability and observability gramians separately are not input-output invariants. All three gramians are related to the eigenvalues and the singular values of the Hankel operator H associated with Σ, which will be introduced later (see page 113); namely,

λᵢ(H) = λᵢ(X),  σᵢ(H) = √λᵢ(PQ).

The cross gramian for SISO systems was introduced in [112].

Example 4.26. Consider the circuit discussed earlier in this chapter. In terms of the time constants τ_L and τ_C, the reachability and observability gramians can be computed explicitly, by solving the corresponding equations:

P = [ 1/(2τ_L)   (1/R_L)(1/(τ_L + τ_C)) ; (1/R_L)(1/(τ_L + τ_C))   1/(2τ_C R_L²) ],
Q = [ 1/(2τ_L)   −(1/R_C)(1/(τ_L + τ_C)) ; −(1/R_C)(1/(τ_L + τ_C))   1/(2τ_C R_C²) ],

while the entries of the cross gramian X involve the factor 1/(τ_L + τ_C) together with 1/(2R_L), 1/(2R_C), and 1/(R_L R_C). Notice that P and Q become semidefinite if the two time constants are equal, τ_L = τ_C; if in addition R_L = R_C, the product P Q = 0, which is reflected in the fact that X in this case has two zero eigenvalues.

4.3.3 A transformation between continuous- and discrete-time systems

Often it is advantageous to transform a given problem in a way that its solution becomes easier, either theoretically or computationally. A transformation that is of interest in the present context is the bilinear transformation, defined by

z = (1 + s)/(1 − s);

it maps the open left half of the complex plane onto the inside of the unit disc, and the imaginary axis onto the unit circle. We will mention here some cases where this transformation is important, both for discrete- and continuous-time systems. First, the theory of optimal approximation in the Hankel norm, discussed in chapter 8, is easier to formulate for discrete-time systems, while it is easier to solve for continuous-time systems; thus, given a discrete-time system, the bilinear transformation is used to obtain the solution in continuous time and then to transform back (see example 8.1 on page 216). Secondly, in chapter 12 the invariance of the gramians under this transformation is used in order to iteratively solve a continuous-time Lyapunov equation in discrete time.
The gramians remain invariant under the bilinear transformation, as stated in the next proposition. In terms of transfer functions, the transfer function H_c(s) of a continuous-time system is obtained from that of the discrete-time transfer function H_d(z) as follows:

H_c(s) = H_d( (1 + s)/(1 − s) ).   (4.61)
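The invariance of the gramians just mentioned is easy to verify numerically. The sketch below (assuming scipy is available; the system is an arbitrary stable example) maps a continuous-time system through the bilinear transformation and compares the continuous- and discrete-time reachability gramians.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, solve_discrete_lyapunov

# An illustrative stable continuous-time system.
A = np.array([[-1.0, 2.0],
              [0.0, -3.0]])
B = np.array([[1.0],
              [1.0]])
I = np.eye(2)

# Bilinear transformation: F = (I + A)(I - A)^{-1}, G = sqrt(2) (I - A)^{-1} B.
F = (I + A) @ np.linalg.inv(I - A)
G = np.sqrt(2.0) * np.linalg.inv(I - A) @ B

# Continuous-time gramian: A Pc + Pc A* + B B* = 0.
Pc = solve_continuous_lyapunov(A, -B @ B.T)
# Discrete-time gramian: F Pd F* - Pd + G G* = 0.
Pd = solve_discrete_lyapunov(F, G @ G.T)
print(Pc, Pd)
```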
Consequently, the matrices

Σ_c = [ A  B ; C  D ],  Σ_d = [ F  G ; H  J ]

of these two systems are related as given in the following table:

Continuous time → Discrete time (z = (1 + s)/(1 − s)):
  F = (I + A)(I − A)^{−1}
  G = √2 (I − A)^{−1} B
  H = √2 C (I − A)^{−1}
  J = D + C (I − A)^{−1} B

Discrete time → Continuous time (s = (z − 1)/(z + 1)):
  A = (F + I)^{−1}(F − I)
  B = √2 (F + I)^{−1} G
  C = √2 H (F + I)^{−1}
  D = J − H (F + I)^{−1} G

Proposition 4.29. Given the stable continuous-time system Σ_c = [A B; C D] with infinite gramians P_c, Q_c, let Σ_d = [F G; H J] be the stable discrete-time system obtained by means of the bilinear transformation given above, with infinite gramians P_d, Q_d. It follows that this bilinear transformation preserves the gramians:

P_c = P_d and Q_c = Q_d.

Furthermore, the transformation preserves the infinity norms as defined by (5.8) and (5.9); in addition, the Hankel norm of Σ_c and Σ_d is the same. The above result implies that the bilinear transformation between discrete- and continuous-time systems preserves balancing; this concept is discussed in chapter 7.

4.4 The realization problem

In the preceding sections we have presented two ways of describing linear systems: the internal and the external. The former makes use of the inputs u, the states x, and the outputs y; the latter makes use only of the inputs u and the outputs y. The question thus arises as to the relationship between these two descriptions. In one direction this problem is trivial:
given the internal description

Σ = [ A  B ; C  D ]

of a system, the external description is readily derived. As shown on page 58, the transfer function of the system is given by (4.22),

H_Σ(s) = D + C (sI − A)^{−1} B,

while from (4.23), the Markov parameters are given by

h₀ = D,  h_k = C A^{k−1} B ∈ R^{p×m},  k = 1, 2, ⋯.

The converse problem, namely: given the external description, that is, the transfer function H, or equivalently the impulse response h, or the Markov
parameters h_k of a system, derive the internal one, is far from trivial. This is the realization problem: given the external description of a linear system, construct an internal or state variable description. Which sequences of Markov parameters are realizable? This question will be answered in the examples of section 4.4.1.

Example 4.30. Consider the following (scalar) sequences:

Σ₁ = {1, 1, 1, 1, 1, 1, ⋯}
Σ₂ = {1, 2, 3, 4, 5, 6, ⋯}  (natural numbers)
Σ₃ = {1, 2, 3, 5, 8, 13, 21, 34, 55, ⋯}  (Fibonacci numbers)
Σ₄ = {2, 3, 5, 7, 11, 13, 17, 19, ⋯}  (primes)
Σ₅ = {1/1!, 1/2!, 1/3!, 1/4!, 1/5!, 1/6!, ⋯}  (inverse factorials)

It is assumed that for all sequences h₀ = D = 0. For further references on realization see, e.g., Kalman [193], van Barel and Bultheel [329], and Antoulas [9]; the latter presents the complete theory of recursive realization for multi-input, multi-output systems.
Hence the following problem results.

Definition 4.31. Given the sequence of p × m matrices h_k, k = 1, 2, ⋯, the realization problem consists of finding a positive integer n and constant matrices (C, A, B), C ∈ R^{p×n}, A ∈ R^{n×n}, B ∈ R^{n×m}, such that

h_k = C A^{k−1} B,  k > 0.   (4.62)

The triple (C, A, B) is then called a realization of the sequence h_k, k > 0, and the latter is called a realizable sequence. (C, A, B) is a minimal realization if, among all realizations of the sequence, its dimension is the smallest possible.

Remark. The realization problem is sometimes referred to as the problem of construction of state for linear systems described by convolution relationships. Realization was formally introduced in the 60s (see Kalman, Falb, and Arbib [192]) and eventually two approaches crystallized: the state space and the polynomial (see Fuhrmann [120] for an overview of the interplay between these two approaches in linear system theory). The state space method uses the Hankel matrix as main tool, and will be presented next. The polynomial approach has the Euclidean division algorithm as focal point; see, e.g., Fuhrmann [122].

The following problems arise:
(a) Existence: given a sequence h_k, k > 0, determine whether there exist a positive integer n and a triple of matrices (C, A, B) such that (4.62) holds.
(b) Uniqueness: in case such an integer and triple exist, are they unique in some sense?
(c) Construction: in case of existence, find n and give an algorithm to construct such a triple.

The main tool for answering the above questions is the matrix H of Markov parameters:

H = [ h₁  h₂  ⋯  h_k  h_{k+1}  ⋯ ;
      h₂  h₃  ⋯  h_{k+1}  h_{k+2}  ⋯ ;
      ⋮              ⋮ ;
      h_k  h_{k+1}  ⋯  h_{2k−1}  h_{2k}  ⋯ ;
      h_{k+1}  h_{k+2}  ⋯  h_{2k}  h_{2k+1}  ⋯ ;
      ⋮              ⋮ ],   (H)_{i,j} = h_{i+j−1}, for i, j > 0.   (4.63)

This is the Hankel matrix. Notice that it has infinitely many rows, infinitely many columns, and block Hankel structure.
Lemma 4.32. Each statement below implies the one that follows:
(a) The sequence h_k, k > 0, is realizable.
(b) The formal power series Σ_{k>0} h_k s^{−k} is rational.
(c) The sequence h_k, k > 0, satisfies a recursion with constant coefficients: there exist a positive integer r and constants αᵢ, 0 ≤ i < r, such that

h_{r+k} = −α₀ h_k − α₁ h_{k+1} − α₂ h_{k+2} − ⋯ − α_{r−2} h_{r+k−2} − α_{r−1} h_{r+k−1},  k > 0.   (4.64)

(d) The rank of H is finite.

Proof. (a) ⇒ (b): Realizability implies (4.62); hence

Σ_{k>0} h_k s^{−k} = Σ_{k>0} C A^{k−1} B s^{−k} = C ( Σ_{k>0} A^{k−1} s^{−k} ) B = C (sI − A)^{−1} B.

Notice that the quantity in parentheses is a formal power series, so convergence is not an issue.

(b) ⇒ (c): Let det(sI − A) = α₀ + α₁ s + ⋯ + α_{r−1} s^{r−1} + sʳ = χ_A(s). The previous relationship implies

χ_A(s) Σ_{k>0} h_k s^{−k} = C [adj (sI − A)] B,

where adj(M) denotes the matrix adjoint of M, i.e., the matrix of cofactors (for the definition and properties of the cofactors and of the adjoint, see chapter 6 of Meyer's book [239]). On the left-hand side there are terms having both positive and negative powers of s,
while on the right-hand side there are only terms having positive powers of s. Hence the coefficients of the negative powers of s on the left-hand side must be identically zero; this implies precisely (4.64).

(c) ⇒ (d): Relationships (4.64) imply that the (r+1)st block column of H is a linear combination of the previous r block columns. Furthermore, because of the block Hankel structure, every block column of H is a subcolumn of the previous one; this implies that all block columns after the rth are linearly dependent on the first r, which in turn implies the finiteness of the rank of H. This completes the proof.

The following lemma describes a fundamental property of H.

Lemma 4.33 (Factorization of H). If the sequence of Markov parameters is realizable by means of the triple (C, A, B), H can be factored:

H = [ CB  CAB  ⋯ ; CAB  CA²B  ⋯ ; ⋮  ⋱ ] = O(C, A) R(A, B).   (4.65)

It follows that rank H ≤ min{rank O, rank R} ≤ dim(A); this also provides a direct proof of the implication (a) ⇒ (d).

In order to discuss the uniqueness issue of realizations, we need to recall the concept of equivalent systems defined by (4.24). Equivalent triples (C, A, B) have the same Markov parameters; hence the best one can hope for concerning the uniqueness question is that realizations be equivalent. As shown in the next section, this indeed holds for realizations with the smallest possible dimension.
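Condition (d) of lemma 4.32 can be observed numerically. In the sketch below (assuming numpy; the sequences are those of example 4.30), the Hankel matrices of a sequence satisfying a two-term recursion have rank 2 regardless of size, while those of the prime sequence keep gaining rank.

```python
import numpy as np

def hankel_rank(seq, k):
    """Rank of the k x k Hankel matrix with (H)_{i,j} = h_{i+j-1}."""
    H = np.array([[seq[i + j] for j in range(k)] for i in range(k)], dtype=float)
    return np.linalg.matrix_rank(H)

# Fibonacci-type sequence: h1 = 1, h2 = 2, h_{k+2} = h_{k+1} + h_k.
fib = [1, 2]
while len(fib) < 20:
    fib.append(fib[-1] + fib[-2])

primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71]

fib_ranks = [hankel_rank(fib, k) for k in range(1, 8)]
prime_ranks = [hankel_rank(primes, k) for k in range(1, 8)]
print(fib_ranks)    # stays at 2 from k = 2 on
print(prime_ranks)  # grows with k
```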
4.4.1 The solution of the realization problem

We are now ready to answer the three questions posed at the beginning of this section.

Theorem 4.34 (Main Result). (1) The sequence h_k, k > 0, is realizable if, and only if, rank H = n < ∞. (2) The state space dimension of any solution is at least n. (3) All minimal realizations are equivalent.

Silverman Realization Algorithm. Let rank H = n. Find an n × n submatrix Φ of H which has full rank. Construct the following matrices: (i) σΦ ∈ R^{n×n}, the submatrix of H having the rows with the same index as those of Φ, and the columns obtained by shifting each individual column of Φ by one block column (i.e., by m individual columns); (ii) Γ ∈ R^{n×m}, composed of the same rows as Φ, its columns being the first m columns of H; (iii) Λ ∈ R^{p×n}, composed of the same columns as Φ, its rows being the first p rows of H. The triple

C = Λ,  A = Φ^{−1} σΦ,  B = Φ^{−1} Γ

is a realization of dimension n of the given sequence of Markov parameters.

Lemma 4.32 proves part (1) of the main theorem in one direction. To prove (1) in the other direction we will actually construct a realization, assuming that the rank of H is finite; in the process we prove the implication (d) ⇒ (a), and hence the equivalence of the statements in lemma 4.32. For this we need to define the shift σ. We denote by σ the column right-shift operator; it acts on the columns of the Hankel matrix: if (H)_k denotes the kth column of H, then σ(H)_k = (H)_{k+m}, in other words σ is a shift by m columns.

Proof. By assumption there exist n = rank H columns of H that span its column space; denote these columns by Φ∞ (note that the columns making up Φ∞ need not be consecutive columns of H). Let σΦ∞ denote the n columns of H obtained by shifting those of Φ∞ by one block column, i.e., m columns, and let Γ∞ denote the first m columns of H. Since the columns of Φ∞ form a basis for the space spanned by the columns of H, there exist unique matrices A ∈ R^{n×n} and B ∈ R^{n×m} such that

σΦ∞ = Φ∞ A,   (4.66)
Γ∞ = Φ∞ B.   (4.67)

Finally, define C as the first block row, i.e., the first p individual rows, of Φ∞: C = (Φ∞)₁. For this proof, (M)_k denotes the kth block row of the matrix M. Recall that the first block element of Γ∞ is h₁; using our notation, (Γ∞)₁ = h₁. Thus (4.67) implies

h₁ = (Γ∞)₁ = (Φ∞ B)₁ = (Φ∞)₁ B = C B.   (4.68)
(The shift applied to a submatrix of H consisting of several columns is applied to each column separately.) For the next Markov parameter, notice that h₂ = (σΓ∞)₁ = (Γ∞)₂. Thus, making use of (4.66) together with (4.67),

h₂ = (σΓ∞)₁ = (σΦ∞ B)₁ = (Φ∞ A B)₁ = (Φ∞)₁ A B = C A B.
For the kth Markov parameter we obtain, similarly,

h_k = (σ^{k−1} Γ∞)₁ = (σ^{k−1} Φ∞ B)₁ = (Φ∞ A^{k−1} B)₁ = (Φ∞)₁ A^{k−1} B = C A^{k−1} B,  k > 0.

Thus (C, A, B) is indeed a realization of dimension n. The state dimension of a realization cannot be less than n: since H = O R, we have rank H ≤ min{rank O, rank R} ≤ dim(A). Thus a realization of Σ whose dimension equals rank H is called a minimal realization. Furthermore, notice that the Silverman algorithm constructs minimal realizations. In this context the following holds true.

Lemma 4.35. A realization of Σ is minimal if, and only if, it is reachable and observable.

Proof. All realizations which are minimal are both reachable and observable: let (C, A, B) be some realization of h_n, n > 0, of dimension equal to rank H = n; if it were not reachable and observable,
the rank of H would be less than n, which is a contradiction to the assumption that the rank of H is equal to n. Conversely, let (Ĉ, Â, B̂) be a reachable and observable realization. Since H = Ô R̂, and each of the matrices Ô, R̂ contains a nonsingular submatrix of size dim(Â), we conclude that dim(Â) ≤ rank H; minimality follows. This concludes the proof.

We are now left with the proof of part (3) of the main theorem, namely that minimal realizations are equivalent. We only provide the proof for a special case, the single-input, single-output case (i.e., p = m = 1); the proof of the general case follows along similar lines.

Outline of proof. Let (Cᵢ, Aᵢ, Bᵢ), i = 1, 2, be minimal realizations of Σ. We will show the existence of a nonsingular transformation T, det T ≠ 0, such that (4.24) holds. Because of minimality, det Oⁱ_n ≠ 0 and det Rⁱ_n ≠ 0, i = 1, 2. From lemma 4.32 we conclude that

H_{n,n} = O¹_n R¹_n = O²_n R²_n,   (4.69)

where the superscript is used to distinguish between the two different realizations; the same lemma also implies

H_{n,n+1} = O¹_n [B₁  A₁ R¹_n] = O²_n [B₂  A₂ R²_n].   (4.70)

We now define

T = (O¹_n)^{−1} O²_n = R¹_n (R²_n)^{−1},  det T ≠ 0.

Equation (4.69) implies C₁ = C₂ T^{−1} and B₁ = T B₂, while (4.70) implies A₁ = T A₂ T^{−1}, which concludes the proof.

Example 4.36. We now investigate the realization problem for the Fibonacci sequence Σ₃ given in example 4.30, which is constructed according to the rule h₁ = 1, h₂ = 2, and h_{k+2} = h_{k+1} + h_k, k > 0. The Hankel matrix (4.63) becomes

H = [ 1  2  3  5  8  13  ⋯ ;
      2  3  5  8  13  21  ⋯ ;
      3  5  8  13  21  34  ⋯ ;
      5  8  13  21  34  55  ⋯ ;
      8  13  21  34  55  89  ⋯ ;
      13  21  34  55  89  144  ⋯ ;
      ⋮ ].
It readily follows from the law of construction of the sequence that the rank of the Hankel matrix is two; for instance, the following determinants are nonzero:

det [ 1  2 ; 2  3 ] = −1,  det [ 2  3 ; 3  5 ] = 1.

In particular, Φ is chosen so that it contains rows 2, 4 and columns 2, 5 of H:

Φ = [ 3  13 ; 8  34 ]  ⇒  Φ^{−1} = [ −17  13/2 ; 4  −3/2 ].

The remaining matrices are now

σΦ = [ 5  21 ; 13  55 ],  Γ = [ 2 ; 5 ],  Λ = (2  8).

It follows that

A = Φ^{−1} σΦ = [ −1/2  1/2 ; 1/2  3/2 ],  B = Φ^{−1} Γ = [ −3/2 ; 1/2 ],  C = Λ = (2  8).

Furthermore,

H₃(s) = Σ_{k>0} h_k s^{−k} = (s + 1)/(s² − s − 1).

Concerning the remaining four sequences of example 4.30: Σ₁ and Σ₂ are realizable, with transfer functions

H₁(s) = 1/(s − 1) and H₂(s) = s/(s − 1)²,

realized by a one- and a two-dimensional triple, respectively. The last two, Σ₄ and Σ₅, are not realizable. In the last case,

H₅(s) = e^{1/s} − 1,

which is not rational and hence has no finite-dimensional realization. The fact that Σ₅ is not realizable follows also from the fact that the determinant of the associated Hankel matrix, H_{i,j} = 1/(i+j−1)!, is nonzero for every n; namely, it has been shown in [5] that

det H_n = Π_{i=2}^{n} Π_{j=2}^{i} (j−1)² / [ (2j−1)(2j−2)²(2j−3) ],

which implies that the Hankel determinant for n = 4, 5, 6 is of the order of 10^{−7}, 10^{−17}, and 10^{−43}, respectively.

4.4.2 Realization of proper rational matrix functions

Given is a p × m matrix H(s) with proper rational entries, i.e., entries whose numerator degree is no larger than the denominator degree. Consider first the scalar case, p = m = 1. We can write

H(s) = D + p(s)/q(s),

where D is a constant in R and p, q are polynomials in s:

p(s) = p₀ + p₁ s + ⋯ + p_{ν−1} s^{ν−1},  pᵢ ∈ R,
q(s) = q₀ + q₁ s + ⋯ + q_{ν−1} s^{ν−1} + s^ν,  qᵢ ∈ R.

In terms of these coefficients pᵢ and qᵢ, we can write down a realization Σ_H = [A B; C D] of H(s), of size (ν+1) × (ν+1), as follows:

A = [ 0  1  0  ⋯  0 ;
      0  0  1  ⋯  0 ;
      ⋮           ⋮ ;
      0  0  0  ⋯  1 ;
      −q₀  −q₁  −q₂  ⋯  −q_{ν−1} ],  B = [ 0 ; ⋮ ; 0 ; 1 ],  C = ( p₀  p₁  ⋯  p_{ν−1} ).   (4.71)
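Stepping back to example 4.36 for a moment: the Silverman algorithm can be mechanized directly. The sketch below (assuming numpy) applies it to the Fibonacci sequence using the leading 2 × 2 submatrix as Φ, a different but equally valid choice of submatrix; by theorem 4.34 the resulting realization is equivalent to the one computed above, and it regenerates the sequence.

```python
import numpy as np

# Fibonacci-type Markov parameters: h1 = 1, h2 = 2, h_{k+2} = h_{k+1} + h_k.
h = [1, 2]
while len(h) < 12:
    h.append(h[-1] + h[-2])

H = np.array([[h[i + j] for j in range(6)] for i in range(6)], dtype=float)
n = np.linalg.matrix_rank(H)           # n = 2

# Silverman algorithm with Phi = leading n x n submatrix (full rank here).
Phi = H[:n, :n]
sPhi = H[:n, 1:n + 1]                  # columns shifted by one (m = 1)
Gamma = H[:n, :1]                      # first m columns, same rows as Phi
Lam = H[:1, :n]                        # first p rows, same columns as Phi

A = np.linalg.solve(Phi, sPhi)         # A = Phi^{-1} sigma(Phi)
B = np.linalg.solve(Phi, Gamma)        # B = Phi^{-1} Gamma
C = Lam                                # C = Lambda

markov = [(C @ np.linalg.matrix_power(A, k) @ B).item() for k in range(8)]
print(markov)   # 1, 2, 3, 5, 8, 13, 21, 34
```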
It can be shown that Σ_H is indeed a realization of H, and that it is reachable; the realization is, in addition, observable if the polynomials p and q are coprime. Thus (4.71) is minimal if p and q are coprime, and in this case the rank of the associated Hankel matrix H is precisely ν.

In the general case, we can write

H(s) = D + (1/q(s)) P(s),

where q is a scalar polynomial that is the least common multiple of the denominators of the entries of H,

q(s) = q₀ + q₁ s + ⋯ + q_{ν−1} s^{ν−1} + s^ν,  qᵢ ∈ R,

and P is a polynomial matrix of size p × m:

P(s) = P₀ + P₁ s + ⋯ + P_{ν−1} s^{ν−1},  Pᵢ ∈ R^{p×m}.

The construction given above provides a realization:

Σ_H = [ 0_m  I_m  ⋯  0_m | 0_m ;
        ⋮        ⋱      | ⋮ ;
        0_m  0_m  ⋯  I_m | 0_m ;
        −q₀ I_m  −q₁ I_m  ⋯  −q_{ν−1} I_m | I_m ;
        P₀  P₁  ⋯  P_{ν−1} | D ],

where 0_m is a square zero matrix of size m and I_m is the identity matrix of the same size. Thus the state of this realization has size νm. This realization is reachable but not necessarily observable; unlike the scalar case, the realization Σ_H need not be minimal.

An important attribute of a rational matrix function is its McMillan degree. For proper rational matrix functions H, the McMillan degree turns out to equal the rank of the associated Hankel matrix H; in other words, the McMillan degree in this case is equal to the dimension of any minimal realization of H.

The Markov parameters can be computed using the following relationship. Given the polynomial q as above, let

q^{(k)}(s) = s^{ν−k} + q_{ν−1} s^{ν−k−1} + ⋯ + q_{k+1} s + q_k,  k = 1, ⋯, ν,   (4.72)

denote its ν pseudo-derivative polynomials. Assume that H(s) = C(sI − A)^{−1}B, and let q(s) denote the characteristic polynomial of A. Then

adj (sI − A) = q^{(ν)}(s) A^{ν−1} + q^{(ν−1)}(s) A^{ν−2} + ⋯ + q^{(2)}(s) A + q^{(1)}(s) I.

It follows that the numerator polynomial P(s) is related to the Markov parameters h_k and the denominator polynomial q as follows:

P(s) = h₁ q^{(1)}(s) + h₂ q^{(2)}(s) + ⋯ + h_{ν−1} q^{(ν−1)}(s) + h_ν q^{(ν)}(s).   (4.73)
This can be verified by direct calculation; the result (4.73) follows by noting that P(s) = C adj (sI − A) B.

One way to obtain a minimal realization is by applying the reachable-observable canonical decomposition given in lemma 4.23. An alternative way is to apply the Silverman algorithm; in this case H has to be expanded into a formal power series

H(s) = h₀ + h₁ s^{−1} + h₂ s^{−2} + ⋯ + h_t s^{−t} + ⋯.   (4.74)

Since H is rational, the rank of the ensuing Hankel matrix associated with the sequence of Markov parameters h_k, k > 0, is guaranteed to be finite. In particular the following upper bound holds:

rank H ≤ min{νm, νp}.
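The scalar companion realization (4.71) is easy to exercise numerically. The sketch below (assuming numpy; the polynomials are an arbitrary illustrative example) builds (A, B, C) from the coefficients of p and q and checks that D + C(sI − A)^{−1}B reproduces D + p(s)/q(s) at a test point.

```python
import numpy as np

# H(s) = D + p(s)/q(s) with p(s) = 2 + s and q(s) = 6 + 11 s + 6 s^2 + s^3.
D = 1.0
p = [2.0, 1.0, 0.0]          # p0, p1, p2
q = [6.0, 11.0, 6.0]         # q0, q1, q2 (q is monic of degree 3)
nu = len(q)

# Companion realization (4.71): ones on the superdiagonal, last row = -q.
A = np.zeros((nu, nu))
A[:-1, 1:] = np.eye(nu - 1)
A[-1, :] = -np.array(q)
B = np.zeros((nu, 1)); B[-1, 0] = 1.0
C = np.array([p])

s = 2.5   # arbitrary evaluation point (not a root of q)
H_state_space = D + (C @ np.linalg.solve(s * np.eye(nu) - A, B)).item()
H_rational = D + (p[0] + p[1] * s + p[2] * s**2) / (q[0] + q[1] * s + q[2] * s**2 + s**3)
print(H_state_space, H_rational)
```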
4.4.3 Symmetric systems and symmetric realizations

A system Σ is called symmetric if its Markov parameters are symmetric:

h_k = h_k*,  k ≥ 0.

(Recall that if a matrix is real, the superscript (·)* denotes simple transposition.) Recall section 4.4 and in particular definition 4.31: Σ is symmetric if h₀ = h₀* and the associated Hankel matrix (4.63) is symmetric, H = H*.

Definition 4.36. A realization is called symmetric if D = D* and there exists a symmetric matrix Ψ = Ψ* such that

A Ψ = Ψ A*,  B = Ψ C*.   (4.75)

Lemma 4.37. A reachable and observable system Σ is symmetric if, and only if, it possesses a symmetric realization.

Proof. A moment's reflection shows that if Σ has a symmetric realization, it is symmetric: A Ψ = Ψ A* implies Aᵏ Ψ = Ψ (A*)ᵏ for all k ≥ 0, hence R(A, B) = Ψ O*(C, A), and therefore

H = O(C, A) R(A, B) = O(C, A) Ψ O*(C, A) = H*.

Conversely, let the system be symmetric. Since O has full column rank, it has an n × n nonsingular submatrix, which we denote by O_I, composed of the rows with index I = {i₁, ⋯, i_n}. Define

Ψ = [O_I]^{−1} H_{I,I} [O_I*]^{−1},

where H_{I,I} is the submatrix of the Hankel matrix composed of those rows and columns indexed by I, so that O_I Ψ O_I* = H_{I,I}. Since H_{I,I} is symmetric, this proves the symmetry of Ψ. Using the factorization (4.65) and the symmetry H = H*, one obtains O(C, A) Ψ = R*(A, B); since O has full column rank, comparing the first block rows yields C Ψ = B*, while comparing the shifted blocks yields A Ψ = Ψ A*, since the column spans of H and H* coincide. The proof is thus complete. It follows that every symmetric system has a symmetric realization.

4.4.4 The partial realization problem

This problem was studied in [193]; recursive solutions were provided in [12] and [11]. The realization problem with partial data is defined as follows.

Definition 4.38. Given the finite sequence of p × m matrices h_k, k = 1, ⋯, N, the partial realization problem consists in finding a positive integer n and constant matrices (C, A, B), C ∈ R^{p×n}, A ∈ R^{n×n}, B ∈ R^{n×m}, such that

h_k = C A^{k−1} B,  k = 1, ⋯, N.

The triple (C, A, B) is then called a partial realization of the sequence h_k.

In contrast with the (infinite) realization problem, a finite sequence of matrices is always realizable. As a consequence, the set of problems arising are: (a) minimality: given the sequence h_k, find the smallest positive integer n for which the partial realization problem is solvable; (b) parametrization of solutions: parametrize all minimal and other solutions; (c) recursive construction: recursive construction of solutions. Similarly to the realization problem,
the partial realization problem can be studied by means of the partially
. i i ACA : April 14. b. which is a minimal partial realization. as a matter of fact the above expressions provide a parametrization of all minimal solutions. C = (1 1 1) 0 1 a−2 0 Hence. H3 = 1 a b 1 b c 1 1 2 1 2 . Example 4. We illustrate this procedure by means of a simple example. there always exists a partial realization of dimension n. Silverman’s algorithm (see lemma 4. As already noted. 1. H2 = a 1 1 1 1 . Once the rank of Hk is determined. hr ? ··· ··· hr .i i 84 deﬁned Hankel matrix: Chapter 4. the corresponding Hankel matrix H4 together with its ﬁrst 3 submatrices are: 1 1 H4 = 1 2 1 1 2 a 1 2 1 2 a . c denote the unknown continuation of the original sequence.e. ∗ 4. .4. k = 1. ··· ··· ? ? where ? denote unknown matrices deﬁning the continuation of the given ﬁnite sequence hk . This problem is known as rational interpolation. . It follows that rank H4 = 3 By lemma 4. which implies 0 0 a2 − 4a + 8 − b 1 A = 1 0 −a2 + 3a − 4 + b . H1 = (1) where a. we note that the value of c is uniquely determined by a. Hr = . The rank of the partially deﬁned Hankel matrix Hk is deﬁned as the size of the largest nonsingular submatrix of Hk . In this section we will address the problem of constructing an external or internal description of a system based on ﬁnitely many samples of the transfer function taken at ﬁnite points in the complex plane. Linear dynamical systems: Part I hr ? .3. 1. m = p = 1) sequence Σ = (1. B = 0 . For simplicity we will assume that the systems in question have a single input and a single output and thus their transfer function is scalar rational. We distinguish between two approaches. independently of the unknown parameters ”?”. . the Markov parameters constitute information about the transfer function H(s) obtained by expanding it around inﬁnity H(s) = k≥0 hk s−k . det H2 = 0.34. det H1 = 1. N . we choose: Φ = H3 .. we must have: c = −a3 +5a2 −12a+2ab−4b+16. In this case. there are multiple minimal partial realizations of Σ. b ∈ R. 
4.5 The rational interpolation problem*

The realization problem aims at constructing a linear system in internal (state space) form, from some or all Markov parameters h_k. As already noted, the Markov parameters constitute information about the transfer function H(s) obtained by expanding it around infinity:

H(s) = Σ_{k≥0} h_k s^{−k}.

Most often, however, information about the transfer function is available at finite points in the complex plane. In this section we will address the problem of constructing an external or internal description of a system based on finitely many samples of the transfer function taken at finite points in the complex plane. This problem is known as rational interpolation. For simplicity we will assume that the systems in question have a single input and a single output, and thus their transfer function is scalar rational. We distinguish between two approaches.
The first approach, presented in subsection 4.5.2, has as main tool the so-called Löwner matrix. The Löwner matrix encodes the information about the admissible complexity of the solutions, as a simple function of its rank. The computation of the solutions can be carried out both in the external (transfer function) and in the internal (state space) frameworks. The second approach, presented in subsection 4.5.3, is known as the generating system approach and involves the construction of a polynomial or rational matrix, such that any polynomial or rational combination of its rows yields a solution of the problem at hand. This method leads to recursive solutions in a natural way, and it leads to a generalization of the classical, system theoretic, concept of realization of linear dynamical systems. The solutions can be classified according to properties like complexity, norm boundedness or positive realness.

4.5.1 Problem formulation*

Consider the array of pairs of points

P = {(sᵢ, φᵢ) : i = 1, ⋯, N},   (4.76)

where we will assume that there are N given pieces of data. We are looking for all rational functions

φ(s) = n(s)/d(s),   (4.77)

where n(s), d(s) are coprime polynomials, gcd(n, d) = 1, that is, their greatest common divisor is a (nonzero) constant, which interpolate the points of the array P:

φ(sᵢ) = φᵢ,  i = 1, ⋯, N.   (4.78)

We distinguish two special cases: (a) the distinct point interpolation problem,

P = {(sᵢ, φᵢ) : i = 1, ⋯, N, sᵢ ≠ s_j, i ≠ j},   (4.79)

and (b) the single multiple point interpolation problem, P = {(s₀, φ₀ⱼ) : j = 0, 1, ⋯, N − 1}, where the value of the function and derivatives thereof are provided only at s = s₀. In the general case the data are

(dʲφ(s)/dsʲ)|_{s=sᵢ} = φᵢⱼ,  j = 0, 1, ⋯, kᵢ − 1,  i = 1, ⋯, k,  Σ_{i=1}^{k} kᵢ = N,   (4.80)

in other words, the jth derivative of φ(s) evaluated at s = sᵢ is equal to φᵢⱼ.

Solution of the unconstrained problem. The Lagrange interpolating polynomial associated with P is the unique polynomial of degree less than N which interpolates the points of this array. In the distinct point case (4.79), it is

ℓ(s) = Σ_{j=1}^{N} φⱼ Π_{i≠j} (s − sᵢ)/(sⱼ − sᵢ).   (4.81)

A similar formula holds for the general case. A parametrization of all solutions to (4.78) can be given in terms of ℓ(s) as follows:

φ(s) = ℓ(s) + r(s) Π_N(s),  where Π_N(s) = Π_{i=1}^{N} (s − sᵢ),   (4.82)

and the parameter r(s) is an arbitrary rational function with no poles at the sᵢ. Most often, however, one is interested in parametrizing all solutions to (4.78) which satisfy additional constraints; in such cases this formula, although general, provides little insight.
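The parametrization (4.82) can be illustrated with a small numerical sketch (assuming numpy; the data and the choice of r(s) are arbitrary): any rational r(s) without poles at the nodes yields another interpolant of the same data.

```python
import numpy as np

s_pts = np.array([0.0, 1.0, 2.0, 3.0])
phi = np.array([1.0, -1.0, 0.5, 2.0])
N = len(s_pts)

def ell(s):
    """Lagrange interpolating polynomial (4.81)."""
    total = 0.0
    for j in range(N):
        term = phi[j]
        for i in range(N):
            if i != j:
                term *= (s - s_pts[i]) / (s_pts[j] - s_pts[i])
        total += term
    return total

def Pi_N(s):
    """Node polynomial Pi_N(s) = prod_i (s - s_i)."""
    return np.prod([s - si for si in s_pts])

def r(s):
    return 1.0 / (s + 5.0)   # arbitrary rational function, no poles at the nodes

def phi_sol(s):
    return ell(s) + r(s) * Pi_N(s)   # a solution of the form (4.82)

vals = [phi_sol(s) for s in s_pts]
print(vals)
```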
1. d(s) are coprime polynomials. · · · .78) can be given in terms of (s) as follows: φ(s) = (s) + r(s)ΠN (4.i i 4. norm boundedness or positive realness. their greatest common divisor is a (nonzero) constant. dj φ(s) dsj = φij . i = 1. We distinguish two special cases. the j th derivative of φ(s) evaluated at s = si is equal to φij . i. by expanding namely. N.2. i = j } and (b) the single multiple point interpolation problem: P = {(s0 .5. In such cases.78) In other words.82) i=1 (s − si ). · · · . ACA April 14. si = sj . this formula although general. where the parameter r(s) is an arbitrary rational function with no poles at the si . Solution of the unconstrained problem The Lagrange interpolating polynomial associated with P is the unique polynomial of degree less than N which interpolates the points of this array. This approach to rational interpolation leads to a generalization of the classical. as a simple function of its rank. that is. s= si i − 1.
p. r}. p + r = N . Chapter 4. φ ˆi = φr+i . what is the minimum norm and how can such interpolating functions be constructed? A third constraint of interest is positive realness of interpolants.5. Another constraint of interest is bounded realness.r. with deg φ = π .77).2 and 4. φi ) : i = 1. i = 1.e. i. r.77). the following problems arise. Problem (B): NevanlinnaPick. r j =1 γj Πi=j (s − si ) Solving for φ(s) we obtain φ(s) = γ j = 0. those positive integers π for which there exist solutions φ(s) to the interpolation problem (4. A function φ : C → C is positive real (p. p}. (4. It is deﬁned as: deg φ = max{deg n. construct all corresponding solutions. J = {(si . φ 1. will be provided.i i 86 Constrained interpolation problems The ﬁrst parameter of interest is the complexity or degree of rational interpolants (4. (a) Find the admissible degrees of complexity. i = where for simplicity of notation some of the points have being redeﬁned as follows: s ˆi = sr+i . Problem (C): Parametrization of positive real interpolants. 2004 i i . (b) Given an admissible degree π . At the end of section 4. · · · . give a procedure to construct such interpolating functions. · · · . that is problems (B) and (C).r.83) i i ACA : April 14. Consider φ(s) deﬁned by the following equation: r γi i=1 φ(s) − φi = 0. Linear dynamical systems: Part I i “oa” 2004/4/14 page 86 i ¨ 4. deg d}.81) which would be valid for rational functions. The following problems arise: Problem (A): Parametrization of interpolants by complexity. Before introducing this formula.5. In sections 4. Parametrization of interpolants by norm. γi = 0.5.2 The Lowner matrix approach to rational interpolation* The idea behind this approach to rational interpolation is to use a formula similar to (4. that is ﬁnding interpolants which have poles in the lefthalf of the complex plane (called stable) and whose magnitude on the imaginary axis is less than some given positive number µ. (4. · · · . s − si r j =1 φj γj Πi=j (s − si ) . 
4.5.2 The Löwner matrix approach to rational interpolation*

The idea behind this approach to rational interpolation is to use a formula similar to (4.81) which would be valid for rational functions. Before introducing this formula, we partition the array P given by (4.79) into two disjoint subarrays J and I as follows:

    J = {(s_i, φ_i) : i = 1, ..., r},   I = {(ŝ_i, φ̂_i) : i = 1, ..., p},   p + r = N,

where for simplicity of notation some of the points have been redefined as follows: ŝ_i = s_{r+i}, φ̂_i = φ_{r+i}, i = 1, ..., p.

Consider φ(s) defined by the following equation:

    Σ_{i=1}^r γ_i (φ(s) − φ_i)/(s − s_i) = 0,   γ_i ≠ 0,   i = 1, ..., r,   r ≤ N.

Solving for φ(s) we obtain

    φ(s) = [Σ_{j=1}^r φ_j γ_j Π_{i≠j}(s − s_i)] / [Σ_{j=1}^r γ_j Π_{i≠j}(s − s_i)].      (4.83)
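Formula (4.83) can be checked numerically: whatever the (nonzero) coefficients γ_i, the resulting rational function passes through its defining points, since at s = s_j only the j-th term of each sum survives. A small sketch with hypothetical data:

```python
import numpy as np

# Hypothetical data (s_i, phi_i), i = 1..r, with r = 3, and arbitrary gamma_i != 0.
sJ = np.array([0.0, 1.0, 2.0])
fJ = np.array([1.0, -2.0, 0.5])
gamma = np.array([0.7, 1.3, -0.9])

def phi(x):
    """Evaluate (4.83): a ratio of gamma-weighted Lagrange-type sums."""
    num = sum(gamma[j] * fJ[j] * np.prod([x - sJ[i] for i in range(3) if i != j])
              for j in range(3))
    den = sum(gamma[j] * np.prod([x - sJ[i] for i in range(3) if i != j])
              for j in range(3))
    return num / den

# At x = s_j only the j-th term of numerator and denominator survives,
# so phi(s_j) = phi_j regardless of gamma:
for sj, fj in zip(sJ, fJ):
    assert abs(phi(sj) - fj) < 1e-12
```

The r − 1 remaining degrees of freedom in γ (up to scaling) are exactly what the Löwner matrix condition below pins down.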
Clearly, the rational function φ(s) defined by (4.83) interpolates the points of the array J, that is, the first r points of the array P. In order for φ(s) to interpolate the points of the array I as well, the coefficients γ_i have to satisfy the following equation:

    Lγ = 0,      (4.84)

where

    L = [ (φ̂_1 − φ_1)/(ŝ_1 − s_1)  ⋯  (φ̂_1 − φ_r)/(ŝ_1 − s_r)
           ⋮                                ⋮
          (φ̂_p − φ_1)/(ŝ_p − s_1)  ⋯  (φ̂_p − φ_r)/(ŝ_p − s_r) ] ∈ R^{p×r},   γ = [γ_1; ⋯; γ_r] ∈ R^r.

L is called the Löwner matrix defined by means of the row array I and the column array J. As it turns out, L is the main tool of this approach to the rational interpolation problem. Notice that the (generalized) Löwner matrix associated with the array P consisting of one multiple point (4.80) has Hankel structure. In particular,

    L = [ φ^(1)(s_0)/1!   φ^(2)(s_0)/2!   φ^(3)(s_0)/3!   ⋯   φ^(k)(s_0)/k!
          φ^(2)(s_0)/2!   φ^(3)(s_0)/3!   φ^(4)(s_0)/4!   ⋯   φ^(k+1)(s_0)/(k+1)!
          φ^(3)(s_0)/3!   φ^(4)(s_0)/4!   φ^(5)(s_0)/5!   ⋯   φ^(k+2)(s_0)/(k+2)!
           ⋮
          φ^(k)(s_0)/k!   φ^(k+1)(s_0)/(k+1)!   φ^(k+2)(s_0)/(k+2)!   ⋯   φ^(2k)(s_0)/(2k)! ].

This shows that the Löwner matrix is the right tool for generalizing realization theory to rational interpolation.

From rational function to Löwner matrix

The key result in connection with the Löwner matrix is the following.

Lemma 4.39. Let L be any p × r Löwner matrix consisting of samples taken from a given rational function φ(s), with p, r ≥ deg φ. Then rank L = deg φ.

Corollary 4.41. Under the assumptions of the lemma, any square Löwner submatrix of L of size deg φ is nonsingular.

For a proof of this result see Anderson and Antoulas [24]. In the sequel, the following matrices will be of interest. Given A ∈ R^{π×π}, b, c* ∈ R^π, let s_i, i = 1, ..., r, and ŝ_i, i = 1, ..., p, be scalars which are not eigenvalues of A, and define

    R_r = [(s_1 I − A)^{-1} b  ⋯  (s_r I − A)^{-1} b] ∈ R^{π×r},      (4.85)
    O_p = [(ŝ_1 I − A*)^{-1} c*  ⋯  (ŝ_p I − A*)^{-1} c*]* ∈ R^{p×π}.      (4.86)

As will be shown subsequently in (4.87), the Löwner matrix factors into a product of O_p times R_r. Therefore, in analogy with the realization problem, where the Hankel matrix factors into a product of an observability times a reachability matrix, we will call O_p the generalized observability matrix and R_r the generalized reachability matrix associated with the underlying interpolation problem.

Proposition 4.40. Let (A, b) be a reachable pair, where A is a square matrix and b is a vector. It follows that the generalized reachability matrix defined by (4.85) has rank equal to the size of A, provided that r ≥ size(A).

Based on these facts, we can now provide a proof of lemma 4.39.
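Lemma 4.39 is easy to confirm numerically. In the sketch below (a hypothetical setup, not from the text), a Löwner matrix is filled with samples of a rational function of McMillan degree 2 taken at six distinct points; its rank comes out equal to 2, not to its full size 3.

```python
import numpy as np

phi = lambda s: (2 * s + 1) / (s**2 + s + 1)      # deg phi = 2

pts = np.array([0.0, 1.0, 2.0, 3.0, -1.0, -2.0])  # N = 6 distinct points
vals = phi(pts)

# Column array J: first r = 3 points; row array I: remaining p = 3 points.
sJ, fJ = pts[:3], vals[:3]
sI, fI = pts[3:], vals[3:]

# Löwner matrix [L]_ij = (phihat_i - phi_j)/(shat_i - s_j), cf. (4.84).
L = (fI[:, None] - fJ[None, :]) / (sI[:, None] - sJ[None, :])

# p, r >= deg phi, so rank L = deg phi = 2 although L is 3 x 3.
assert np.linalg.matrix_rank(L, tol=1e-8) == 2
```

Repeating the experiment with more sample points enlarges L but leaves its rank pinned at deg φ, which is the content of the lemma.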
Proof. We distinguish two cases: (a) φ(s) is proper rational; (b) φ(s) is not proper rational.

(a) If φ(s) is proper rational, there exists a minimal quadruple (A, b, c, d) of dimension π such that φ(s) = d + c(sI − A)^{-1}b. This expression implies

    φ̂_i − φ_j = c(ŝ_i I − A)^{-1}[(s_j I − A) − (ŝ_i I − A)](s_j I − A)^{-1}b
               = c(ŝ_i I − A)^{-1}[s_j − ŝ_i](s_j I − A)^{-1}b,

and hence

    [L]_{i,j} = (φ̂_i − φ_j)/(ŝ_i − s_j) = −c(ŝ_i I − A)^{-1}(s_j I − A)^{-1}b.

Consequently, L can be factored as follows:

    L = −O_p R_r,      (4.87)

where R_r and O_p are the generalized reachability and observability matrices defined by (4.85) and (4.86), respectively. Since the quadruple is minimal, the rank of both O_p and R_r is π; this implies that the rank of their product L is also π, which is the desired result. This completes the proof when φ(s) is proper.

(b) If φ(s) is not proper, it can be made proper by means of a bilinear transformation

    s → (αs + β)/(s + γ),   αγ − β ≠ 0.

For almost all α, the rational function φ̃(s) = φ((αs + β)/(s + γ)) is proper. The Löwner matrices L and L̃, attached to φ and φ̃ respectively, are related as follows:

    (αγ − β) L̃ = diag(α − ŝ_i) · L · diag(α − s_i).

The parameter α can be chosen so that diag(α − ŝ_i) and diag(α − s_i) are nonsingular; hence rank L̃ = rank L, and the result follows from part (a). This concludes the proof of the lemma.

From Löwner matrix to rational function

Given the array of points P defined by (4.76), we are now ready to tackle the interpolation problem (4.77), and in particular to solve the two parts (a) and (b) of problem (A). The following definition is needed first.

Definition 4.42. (a) The rank of the array P is

    rank P = max_L {rank L} = q,

where the maximum is taken over all possible Löwner matrices which can be formed from P, the sum of the number of rows and columns being equal to N. (b) We will call a Löwner matrix almost square if it has at most one more row than column, or vice versa.

The following is a consequence of lemma 4.39.

Proposition 4.43. Let q = rank P, and assume that 2q < N. The rank of all Löwner matrices having at least q rows and q columns is equal to q. Consequently, almost square Löwner matrices have rank q.

For any Löwner matrix with rank L = q having more than q columns (respectively, rows), there exists a vector γ ≠ 0 of appropriate dimension satisfying

    Lγ = 0   or   γ* L = 0.      (4.88)

Because of the proposition given above, we are now ready to tackle the interpolation problem. We distinguish two cases.
In this case we can attach to L a rational function, denoted by

    φ_L(s) = n_L(s)/d_L(s),      (4.90)

where, for γ = (γ_1, ..., γ_{r+1})* satisfying Lγ = 0, the numerator and denominator polynomials are constructed from γ and the column array of L:

    n_L(s) = Σ_{j=1}^{r+1} γ_j φ_j Π_{i≠j}(s − s_i),   d_L(s) = Σ_{j=1}^{r+1} γ_j Π_{i≠j}(s − s_i).      (4.89)

The rational function φ_L(s) just defined has the following properties.

Lemma 4.45. (a) deg φ_L ≤ r < N. (b) There is a unique φ_L attached to all L and γ satisfying (4.88), as long as rank L = q. (c) The numerator and denominator polynomials n_L, d_L have q − deg φ_L common factors of the form (s − s_i). (d) φ_L interpolates exactly N − q + deg φ_L points of the array P; in particular, φ_L interpolates all given points if, and only if, deg φ_L = q.

The admissible degree problem can now be solved in terms of the rank of the array P. We are now ready to state the main result, from Antoulas and Anderson [25].

Theorem 4.44. Given the array of N points P, let rank P = q.

(a) If 2q = N, and all square Löwner matrices of size q which can be formed from P are nonsingular, the admissible degrees are q and all integers bigger than or equal to N − q.

(b) Otherwise, the admissible degrees are all integers bigger than or equal to N − q.

The proof of this result can be found in Antoulas and Anderson [25]. The first part follows from the previous corollary. The second part can be justified as follows. Part (b) of the lemma above says that as long as L has rank q, there is a unique rational function φ_L attached to it. Consequently, in order for L to yield a different rational function φ_L defined by (4.90), it will have to lose rank. This occurs when L has at most q − 1 rows; in this case its rank is q − 1, and there exists a column vector γ such that Lγ = 0. Since L then has N − q + 1 columns, the degree of the attached φ_L will generically (i.e., for almost all γ) be N − q, and it readily follows that for almost all γ, φ_L will interpolate all the points of the array P. This argument also shows that there can never exist interpolating functions of degree between q and N − q.

As a consequence of the above lemma and lemma 4.39, we obtain the following.

Corollary 4.46. Under the assumptions of the main theorem, there exist no interpolating functions of degree between q and N − q.

Corollary 4.47. Under the assumptions of the main theorem, if deg φ_min = q, there is a unique interpolating function of minimal degree, denoted by φ_min(s); while if deg φ_min = N − q, φ_min(s) is not unique.

Remark. (i) If 2q = N but some square Löwner matrix of size q formed from P is singular, an interpolant of degree q does not exist — the only admissible solution γ of (4.88) is γ = 0 — and part (b) of the above theorem applies.

(ii) In order to distinguish between case (a) and case (b) of this theorem, we only need to check the nonsingularity of 2q + 1 Löwner matrices. Construct from P any Löwner matrix of size q × (q + 1), call it L_q, with row and column sets denoted by I_q, J_q. The Löwner matrix L*_q of size (q + 1) × q is now constructed: its row set I*_q contains the points of the row set I_q together with the last point of the column set J_q, while its column set J*_q contains the points of the column set J_q with the exception of the last one. The 2q + 1 Löwner matrices which need to be checked are the q × q submatrices of L_q and L*_q.
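The attachment (4.89)–(4.90) can be illustrated numerically (hypothetical data again): sampling a rational function of degree q = 2 at N = 6 points gives 2q < N; a null vector γ of an almost square Löwner matrix then produces φ_L, which has degree at most q and agrees with the underlying function at all N points — hence coincides with it everywhere.

```python
import numpy as np

phi = lambda s: (2 * s + 1) / (s**2 + s + 1)      # degree q = 2, N = 6 > 2q

pts = np.array([0.0, 1.0, 2.0, 3.0, -1.0, -2.0])
vals = phi(pts)

# Column array J (3 points) and row array I (3 points); L is 3 x 3 of rank 2.
sJ, fJ = pts[:3], vals[:3]
sI, fI = pts[3:], vals[3:]
L = (fI[:, None] - fJ[None, :]) / (sI[:, None] - sJ[None, :])

# A nonzero null vector gamma of L (rank L = 2 < 3 columns).
gamma = np.linalg.svd(L)[2][-1]

def phi_L(x):
    """phi_L = n_L/d_L as in (4.89)-(4.90), built from gamma and the column array."""
    nL = sum(gamma[j] * fJ[j] * np.prod([x - sJ[i] for i in range(3) if i != j])
             for j in range(3))
    dL = sum(gamma[j] * np.prod([x - sJ[i] for i in range(3) if i != j])
             for j in range(3))
    return nL / dL

# phi_L matches phi away from the data as well, not just at the samples:
for x in [0.5, 4.0, -3.0]:
    assert abs(phi_L(x) - phi(x)) < 1e-7
```

The uniqueness seen here is part (b) of the lemma: any rank-q Löwner matrix built from this array, with any null vector, yields the same φ_L.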
The construction of interpolating functions

Given an admissible degree π, we now discuss the construction of all corresponding interpolating functions. Two construction methods will be presented: the first is based on an external (input–output) framework, while the second is based on a state-space framework.

For the polynomial construction we need to form from P any Löwner matrix having π + 1 columns,

    L ∈ R^{(N−π−1)×(π+1)},

and determine a parametrization of all γ such that Lγ = 0; all such γ depend on 2π + 1 − N scalar parameters (up to scaling). A parametrization of all interpolating functions of degree π is then

    φ_L(s) = n_L(s)/d_L(s),

where the numerator and denominator polynomials are defined by (4.89). Three conditions have to be kept in mind. First, in order for φ_L to actually interpolate, the 2π + 1 − N parameters which parametrize all γ have to avoid the hypersurfaces defined by the equations d_L(s_i) = 0, i = 1, ..., N; since we can always make sure that γ depends affinely on these parameters, we are actually dealing with hyperplanes. Second, we have to make sure that there are no common factors between numerator and denominator of φ_L; this is the case for almost all γ. Third, notice that φ_L will be proper rational if, and only if, the leading coefficient of d_L, namely γ_1 + ⋯ + γ_{π+1}, is different from zero; otherwise, we need to perform a bilinear transformation which will assure the properness of the function under construction (see the proof of lemma 4.39).

For the state-space construction of interpolating functions of admissible degree π, we need a Löwner matrix of size π × (π + 1),

    L̄ ∈ R^{π×(π+1)},

built on an array P̄ which contains, besides the original N points of the array P, another 2π + 1 − N points (in case π ≥ N − q), chosen arbitrarily but subject to the nonsingularity condition given in part (a) of the main theorem (see also the remark at the end of the previous section). Let γ̄ ∈ R^{π+1} be such that L̄γ̄ = 0. Once the properness condition is guaranteed, the state-space construction proceeds by defining the following two π × π matrices:

    Q = L̄ [ 1        ⋯  0
             ⋱
             0        ⋯  1
            −1  −1  ⋯  −1 ],     σQ = L̄ [ s_1       ⋯  0
                                            ⋱
                                            0        ⋯  s_π
                                           −s_{π+1}  ⋯  −s_{π+1} ],      (4.91)

where both bracketed matrices belong to R^{(π+1)×π} and s_i, i = 1, ..., π + 1, are the points which define the column array of L̄. The quadruple of constant matrices (A, b, c, d) is then defined by A = (σQ)Q^{-1}, by b and c obtained from the first column and first row of appropriate combinations of (s_1 I − A) with L̄ and Q (for the explicit expressions see Antoulas and Anderson [25]), and by

    d = y_i − c(s_i I − A)^{-1} b,

for any interpolation point s_i with value y_i. For further details and examples, see Antoulas and Anderson [25].
It can be shown that the above quadruple is a minimal realization of the desired interpolating function φ(s) of the prescribed degree π:

    φ(s) = d + c(sI − A)^{-1} b.      (4.92)

The steps involved in proving this result are as follows. First, formula (4.91) shows that AQ = σQ. If we define the shift operation in this case as the operation which assigns to the i-th column of the Löwner matrix s_i times itself, then σQ is indeed the shifted version of Q; it follows that A is determined by this shift. Next, we need to show that none of the s_i is an eigenvalue of A, and consequently that (s_i I − A) is invertible for any s_i. Finally, because of the properness of the underlying function, the matrix Q is nonsingular. These steps can be found in Anderson and Antoulas [24].

Remark. (i) In the realization problem the shift is defined in terms of the associated Hankel matrix, as the operation which assigns to the i-th column the (i + 1)-st column. In the present setting, A is again determined by a shift, now acting on the Löwner matrix. In this sense, our theory has generalized the theory of realization to the theory of interpolation.

(ii) The theory presented above has been worked out for the multiple point as well as for more general interpolation problems; for the more general interpolation problem, the reader is referred to Antoulas and Anderson [25]. All missing proofs, as well as other details and examples, can be found in the references Antoulas and Anderson [25] and Anderson and Antoulas [24].

(iii) It is readily checked that the classical system-theoretic problem of realization can be interpreted as a rational interpolation problem where all the data are provided at a single point.

4.5.3 Generating system approach to rational interpolation*

This method for dealing with the rational interpolation problem is based on the factorization of a rational matrix expressed in terms of the data (4.76). It leads to a parametrization of all interpolants which solve problems (A), (B), and (C). Through the years, solutions to various special cases of the general rational interpolation problem have been worked out in what amounts to a generating system approach; problem (B), for example, was solved this way, by means of a recursively constructed generating system, more than three quarters of a century ago. Some of the results discussed can also be found in Belevitch [49].

To keep the exposition simple, in the sequel only the distinct point and the single multiple point interpolation problems will be considered. The tools presented, however, are applicable in the general case. This section will make use of certain elementary concepts and results concerning polynomial and rational matrices: unimodular polynomial matrices, invariant factors of polynomial matrices, row reduced polynomial matrices, left coprime polynomial matrices, and Bezout equations. We refer, e.g., to section 6.3 of the book by Kailath [191] for an exposition of these concepts and the underlying theory.

The data in terms of time series

The interpolation array P defined by (4.76) can be interpreted in terms of time functions. To the distinct point array P defined by (4.79), we associate the following exponential time series (i.e., functions of time):

    D = { w_k(t) = [u_k(t); −y_k(t)], t ≥ 0, k = 1, ..., N },      (4.93)

where u_k(t) = e^{s_k t} and y_k(t) = φ_k e^{s_k t}, k = 1, ..., N. We will also consider the (unilateral) Laplace transform of these time series; in particular, we will consider the 2 × N matrix of rational entries whose k-th column is the transform of w_k(t):

    W(s) = [W_1(s) ⋯ W_N(s)],   where   W_k(s) = (1/(s − s_k)) [1; −φ_k],   k = 1, ..., N.      (4.94)
It is easy to see that W(s) has a realization

    W(s) = C(sI_N − A)^{-1},   where   C = [ 1    1    ⋯  1
                                             −φ_1  −φ_2  ⋯  −φ_N ] ∈ C^{2×N},   A = diag(s_1, ..., s_N) ∈ C^{N×N}.      (4.95)

Since the interpolation points are distinct, s_i ≠ s_j for i ≠ j, the pair (C, A) is observable.

For the single multiple point interpolation array P defined by (4.80), a similar construction holds. The time series in this case are polynomial–exponential:

    D = { w_k(t) = [u_k(t); −y_k(t)] = e^{s_0 t} p_k(t), t ≥ 0, k = 1, ..., N },      (4.96)

where p_k is the vector-valued polynomial function

    p_k(t) = [1; −φ_0] t^{k−1}/(k−1)! + ⋯ + [0; −φ_{0j}/j!] t^{k−j−1}/(k−j−1)! + ⋯ + [0; −φ_{0,k−1}/(k−1)!].      (4.97)

A straightforward calculation yields the following realization for the (unilateral) Laplace transform of this set of time series:

    W(s) = [W_1(s) ⋯ W_N(s)] = C(sI_N − A)^{-1},

where

    C = [ 1     0        0        ⋯  0
          −φ_0  −φ_{01}/1!  −φ_{02}/2!  ⋯  −φ_{0,N−1}/(N−1)! ] ∈ C^{2×N},   A = [ s_0  1
                                                                                      s_0  1
                                                                                        ⋱  ⋱
                                                                                           s_0 ] ∈ C^{N×N}.

Again, by construction, the pair (C, A) is observable.
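The realization (4.95) is a one-line computation. The sketch below (hypothetical data, not from the text) confirms that the k-th column of C(sI_N − A)^{-1} is precisely W_k(s) = [1; −φ_k]/(s − s_k) of (4.94).

```python
import numpy as np

# Hypothetical distinct-point data.
s_pts = np.array([1.0, 2.0, 3.0])
phi_vals = np.array([0.5, -1.0, 2.0])
N = len(s_pts)

C = np.vstack([np.ones(N), -phi_vals])    # C = [1 ... 1; -phi_1 ... -phi_N]
A = np.diag(s_pts)                        # A = diag(s_1, ..., s_N)

def W(s):
    """W(s) = C (s I_N - A)^{-1}, cf. (4.95)."""
    return C @ np.linalg.inv(s * np.eye(N) - A)

# Column k of W(s) equals W_k(s) = [1; -phi_k]/(s - s_k), cf. (4.94).
s0 = 5.0
for k in range(N):
    expected = np.array([1.0, -phi_vals[k]]) / (s0 - s_pts[k])
    assert np.allclose(W(s0)[:, k], expected)
```

The diagonal A makes the resolvent (sI − A)^{-1} diagonal, so each data point contributes one decoupled column — exactly the time series picture.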
The realization problem discussed in section 4.4 can also be expressed in terms of the rational interpolation problem. Given the scalar Markov parameters h_0, h_1, ..., h_{N−1}, we seek to determine all rational functions φ(s) whose behavior at infinity (i.e., whose formal power series expansion) is

    φ(s) = h_0 + h_1 s^{-1} + ⋯ + h_{N−1} s^{−N+1} + ⋯ .

By introducing s^{-1} as the new variable, the behavior at infinity is transformed into the behavior at zero, and the Markov parameters become moments:

    φ̃(s) = φ(s^{-1}) = h_0 + h_1 s + ⋯ + h_{N−1} s^{N−1} + ⋯ ,

and consequently h_k = (1/k!) d^k φ̃/ds^k |_{s=0}. In other words, the realization problem is equivalent to a rational interpolation problem where all the data are provided at zero. From the above considerations, the corresponding time series matrix W = [W_1 ⋯ W_N] can be expressed in terms of (4.95):

    W(s) = C(sI_N − A)^{-1},   where   C = [ 1     0     0     ⋯  0
                                             −h_0  −h_1  −h_2  ⋯  −h_{N−1} ] ∈ R^{2×N},   A = [ 0  1
                                                                                                  0  1
                                                                                                    ⋱  ⋱
                                                                                                       0 ] ∈ R^{N×N}.      (4.98)

Again, by construction, the pair (C, A) is observable.
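The equivalence between realization and interpolation at a single point can be probed numerically. In the hypothetical strictly proper example below, the Markov parameters h_k = cA^{k−1}b of a degree-2 transfer function are computed, and the truncated series h_1 s^{-1} + h_2 s^{-2} + ⋯ is compared with φ(s) at a large |s|.

```python
import numpy as np

# Minimal realization of phi(s) = (2s + 1)/(s^2 + s + 1) = c (sI - A)^{-1} b.
A = np.array([[0.0, 1.0], [-1.0, -1.0]])   # companion matrix of s^2 + s + 1
b = np.array([0.0, 1.0])
c = np.array([1.0, 2.0])

phi = lambda s: c @ np.linalg.solve(s * np.eye(2) - A, b)

# Markov parameters h_k = c A^{k-1} b; the expansion at infinity is
# phi(s) = h_1 s^{-1} + h_2 s^{-2} + ... (h_0 = 0 since phi is strictly proper).
h = [c @ np.linalg.matrix_power(A, k - 1) @ b for k in range(1, 9)]

s0 = 10.0
series = sum(hk / s0**k for k, hk in enumerate(h, start=1))
assert abs(phi(s0) - series) < 1e-6
```

Substituting s → s^{-1} turns this expansion at infinity into a Taylor expansion at zero, which is the moment/interpolation viewpoint of the text.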
Interpolation in terms of the time series data

Given the polynomial r(s) = Σ_{i=0}^n r_i s^i, r_i ∈ R, we will denote by r(d/dt) the constant coefficient differential operator

    r(d/dt) = r_0 + r_1 d/dt + r_2 d²/dt² + ⋯ + r_n dⁿ/dtⁿ.

The following is a characterization of rational interpolants in terms of the time series, both in the time and in the frequency domain.

Proposition 4.48. Consider the rational function φ(s) = n(s)/d(s), where n, d are coprime. This rational function interpolates the points of the array P defined by (4.79) if, and only if, one of the following equivalent conditions holds for k = 1, ..., N:

    [n(d/dt)  d(d/dt)] w_k(t) = 0,  t ≥ 0,      (4.99)

    [n(s)  d(s)] W_k(s) = r_k(s),      (4.100)

where r_k(s) is a polynomial resulting from the initial conditions of w_k at t = 0. Equation (4.99) provides a time domain, and (4.100) a frequency domain, characterization of rational interpolants.

Proof. We will give the proof only for the distinct point interpolation problem. Given the definition of the time series w_k, there holds

    [n(d/dt)  d(d/dt)] w_k(t) = (n(s_k) − φ_k d(s_k)) e^{s_k t};

since this must hold for all t ≥ 0, the necessity and sufficiency of the interpolation conditions n(s_k)/d(s_k) = φ_k follows for all k. Taking the unilateral Laplace transform of (4.99) we obtain (4.100); in particular,

    [n(s)  d(s)] W_k(s) = (n(s) − φ_k d(s))/(s − s_k) = r_k(s),

and the expression on the left-hand side is a polynomial if, and only if, φ(s_k) = φ_k.

Because of the observability of the pair (C, A) in the realization (4.95) of the data, we can construct a pair of polynomial matrices Ξ(s), Θ(s), of sizes 2 × N and 2 × 2 respectively, with det Θ(s) ≠ 0, such that

    W(s) = Θ(s)^{-1} Ξ(s) = C(sI − A)^{-1}.      (4.101)

The above operation consists in computing a left polynomial denominator Θ(s) for the data W(s). Moreover, the polynomial matrices Θ(s), Ξ(s) must be left coprime; Θ(s) is then unique up to left multiplication by a unimodular matrix, that is, a polynomial matrix whose determinant is a nonzero constant. A consequence of left coprimeness is the existence of polynomial matrices P(s), Q(s), of sizes 2 × 2 and N × 2 respectively, such that the so-called Bezout equation is satisfied:

    Θ(s)P(s) + Ξ(s)Q(s) = I_2.

A Θ constructed this way has the following properties.

Proposition 4.49. After a possible normalization by a (nonzero) constant, the matrix Θ(s) constructed above satisfies: (a) its invariant factors are 1 and χ(s), where

    det Θ(s) = χ(s) = Π_{i=1}^N (s − s_i);      (4.102)

(b) its (1,2) and (2,2) entries θ_12(s), θ_22(s) are coprime.

Proof. (a) Because of the observability of the pair (C, A) and the left coprimeness of Ξ, Θ, the polynomial matrices sI − A and Θ(s) both have a single non-unity invariant factor, which is the same, namely the characteristic polynomial of A: χ(s) = Π_{i=1}^N (s − s_i) (see, e.g., [8] or chapter 6 of [191]). (b) If θ_12, θ_22 were not coprime, it would follow that all four entries of Θ have the same common factor; this, however, is a contradiction to the fact that one of the two invariant factors of Θ is equal to 1.

The solution of problem (A)

From the frequency domain characterization (4.100) of the interpolation conditions, we obtain the following result.
Lemma 4.50. The rational function φ(s) = n(s)/d(s), with n, d coprime, is an interpolant for the array P if, and only if, there exist coprime polynomials a(s), b(s) such that

    [n(s)  d(s)] = [a(s)  b(s)] Θ(s)      (4.103)

and

    a(s_i)θ_12(s_i) + b(s_i)θ_22(s_i) ≠ 0,   i = 1, ..., N.      (4.104)

Proof. From the Bezout equation follows that

    P(s) + W(s)Q(s) = [Θ(s)]^{-1}.

Multiplying this relationship on the left by the row vector [n(s) d(s)], and noting that by proposition 4.48 [n(s) d(s)]W(s) is a polynomial row vector whenever φ is an interpolant, we conclude that [n(s) d(s)][Θ(s)]^{-1} must be a polynomial row vector; denoting it by [a(s) b(s)], we obtain (4.103). Conversely, let (4.103) and (4.104) hold. The i-th column of (4.101) yields the equation

    (s − s_i)[Ξ(s)](:,i) = Θ(s) [1; −φ_i];

evaluating this expression at s = s_i we obtain

    θ_11(s_i) = φ_i θ_12(s_i),   θ_21(s_i) = φ_i θ_22(s_i),   i = 1, ..., N.

Therefore n(s_i) − φ_i d(s_i) = [a(s_i) b(s_i)] Θ(s_i) [1; −φ_i] = 0; since d = aθ_12 + bθ_22, condition (4.104) guarantees d(s_i) ≠ 0, and by proposition 4.48, φ is an interpolant. Finally, the coprimeness of n, d implies the coprimeness of a, b; conversely, if a, b are coprime, then so are n, d, because otherwise their greatest common divisor would have to be a product of terms (s − s_i), where the s_i are the interpolation points, which is impossible by (4.104). The desired coprimeness is thus established.

As shown above, all interpolants φ = n/d can be parametrized by means of (4.103), where Θ is fixed and the parameter Γ = a/b is an arbitrary rational function subject to the constraints (4.104). This parametrization can be given a feedback interpretation. An interpolant φ describes a linear system governed by the linear, constant coefficient differential equation d(d/dt)y = n(d/dt)u. This system, in turn, can be represented by means of a feedback interconnection between Θ and Γ. We can interpret Θ as a two-port with inputs u, y and outputs ŷ, û (see figure 4.4):

    [ŷ; û] = [θ_11  θ_12; θ_21  θ_22] [u; y],

while Γ can be seen as relating û and ŷ in the following manner: b(d/dt)û = a(d/dt)ŷ. As a consequence of the above interpretation, Θ is called the generating system or generating matrix of the rational interpolation problem at hand.

Figure 4.4. Feedback interpretation of the parametrization of all solutions of the rational interpolation problem.
Equation (4.103) shows that every interpolant can be expressed in terms of the linear fractional representation

    φ(s) = (a(s)θ_11(s) + b(s)θ_21(s)) / (a(s)θ_12(s) + b(s)θ_22(s)).

Remark. The coprimeness constraint on n, d has been expressed equivalently as a coprimeness constraint on the parameter polynomials a, b. The former is a nonlinear constraint in the space of coefficients of a, b, while the constraints (4.104) are linear in these coefficients. The coprimeness constraint is automatically satisfied in the case of the minimal interpolants discussed below. Examples will be worked out later.

In order to tackle problem (A), the concept of a row reduced generating matrix is needed. Let ν_i be the degree of the i-th row of Θ, i = 1, 2. The row-wise highest coefficient matrix [Θ]_hr is the 2 × 2 constant matrix whose (i, j)-th entry is the coefficient of the term s^{ν_i} of the polynomial θ_ij(s). We will call Θ row reduced if [Θ]_hr is nonsingular. Notice that the row degrees of Θ and the degree of its determinant, which by (4.102) is N, satisfy in general ν_1 + ν_2 ≥ N; an equivalent characterization of row reducedness is that the row degrees satisfy ν_1 + ν_2 = N.

The matrix Θ in equation (4.101) is unique up to left multiplication with a unimodular matrix (which is a polynomial matrix with constant nonzero determinant). We use this freedom to transform Θ into row reduced form; for simplicity we will use the same symbol, namely Θ, to denote the row reduced version of this matrix. Let the corresponding row degrees be

    κ_1 = deg (θ_11(s)  θ_12(s)) ≤ κ_2 = deg (θ_21(s)  θ_22(s)),   κ_1 + κ_2 = N.      (4.105)

The row degrees of row reduced polynomial matrices have two important properties. (a) Although the row reduced version of any polynomial matrix is non-unique, the corresponding row degrees are unique. (b) Because of the so-called predictable-degree property of row reduced polynomial matrices (see, e.g., chapter 6 of [191]), the degree of r(s)Θ(s), with Θ row reduced and r some polynomial row vector with coprime entries, can be either κ_1 or greater than or equal to κ_2.

We will now show how a row reduced generating matrix can be constructed directly from the observability matrix O(C, A) of the pair (C, A) (for details see [8] or [191]). Let c_i denote the i-th row of C, i = 1, 2. The procedure involves the determination of two linear dependencies among the rows c_1, c_2, c_1A, c_2A, c_1A², ... of this observability matrix, which leads to the two observability indices κ_i. For simplicity we will assume that, working from top to bottom of the observability matrix, c_2A^{κ_1} is the first row of O which is linearly dependent on the preceding ones, where κ_1 < κ_2. Then, because of observability, the next row to be linearly dependent on the previous ones will be c_1A^{κ_2}, and κ_1 + κ_2 = N:

    c_2 A^{κ_1} = Σ_{i=0}^{κ_1} α_i c_1 A^i + Σ_{j=0}^{κ_1−1} β_j c_2 A^j,
    c_1 A^{κ_2} = Σ_{i=0}^{κ_2−1} γ_i c_1 A^i + Σ_{j=0}^{κ_1} δ_j c_2 A^j,   j ≤ κ_1.

It follows that Θ can be read off of the above relationships:

    Θ(s) = [ −Σ_{i=0}^{κ_1} α_i s^i            s^{κ_1} − Σ_{j=0}^{κ_1−1} β_j s^j
             s^{κ_2} − Σ_{i=0}^{κ_2−1} γ_i s^i   −Σ_{j=0}^{κ_1} δ_j s^j          ].

Its row-wise highest coefficient matrix is

    [Θ]_hr = [ −α_{κ_1}  1 ; 1  0 ],   det [Θ]_hr = −1,

which implies that Θ is row reduced.
Combining the preceding lemma with the above considerations, we obtain the main result, which provides the solution of problem (A). This result was proved in Antoulas and Willems [15] as well as in Antoulas, Ball, Kang, and Willems [22].

Theorem 4.51. Consider Θ defined by (4.101), transformed to row reduced form, with row degrees κ_1 ≤ κ_2, κ_1 + κ_2 = N.

(i) If κ_1 < κ_2 and θ_11, θ_21 are coprime,

    φ_min(s) = θ_11(s)/θ_12(s)

is the unique minimal interpolant, and δ(φ_min) = κ_1.

(ii) Otherwise, there is a family of interpolating functions of minimal complexity, which can be parametrized as follows:

    φ_min(s) = (θ_21(s) + a(s)θ_11(s)) / (θ_22(s) + a(s)θ_12(s)),

where the polynomial a(s) satisfies deg a = κ_2 − κ_1 and θ_22(s_i) + a(s_i)θ_12(s_i) ≠ 0, i = 1, ..., N; in this case δ(φ_min) = κ_2 = N − κ_1.

(iii) In both cases (i) and (ii), there are families of interpolants φ = n/d of every degree κ ≥ κ_2, given by (4.103) with deg a = κ − κ_1, deg b = κ − κ_2, a, b coprime, subject to (4.104). There are no interpolants of complexity between κ_1 and κ_2.

Corollary 4.52. Proper rational and polynomial interpolants. The interpolants above are proper rational provided that a, b satisfy

    [b(s)θ_22(s) + a(s)θ_12(s)]_h ≠ 0,

where [r]_h is used to denote the coefficient of the highest power of the polynomial r. All polynomial interpolants φ(s) = n(s) are obtained when the polynomials a, b satisfy the Bezout equation

    b(s)θ_22(s) + a(s)θ_12(s) = 1.

One polynomial interpolant is the Lagrange interpolating polynomial ℓ(s), given in the distinct point case by (4.81). It is also worth mentioning that a generating matrix (which is not row reduced) can be written down in terms of ℓ(s) and χ(s) given by (4.102):

    Θ(s) = [ χ(s)  0 ; ℓ(s)  1 ].

The solution (4.82) of the unconstrained interpolation problem can be obtained in this framework by means of polynomial linear combinations of the rows of this particular generating matrix.

Recursive interpolation

The fact that, in the parametrization of all interpolants by means of the generating system, Γ is arbitrary – except for the avoidance conditions (4.104) – yields, at least conceptually, the solution of the recursive interpolation problem for free. More generally, if D = D_1 ∪ D_2, we define Θ_1 as a generating system for the data set D_1 and Θ_2 as a generating system for the modified data set Θ_1(d/dt)D_2. Then the cascade Θ = Θ_2 Θ_1 of the two generating systems provides a generating system for D. Thus we have the following pictorial representation of the solution of the recursive interpolation problem.

Figure 4.5. Feedback interpretation of recursive rational interpolation.
By appropriate scaling. corresponds to attaching an appropriately deﬁned component to a cascade interconnection of systems. we will assume that the polynomials are monic (coefﬁcient of highest degree is one).107) β2 s + α1 + β3 s + α2 + . For details we refer to [145] and [10]. Here we will illustrate the generic case only.. .. Let the transfer function be H(s) = n d(s) . there is a state space realization which follows from this decomposition.6.107): 1 −α1 β2 0 · · · 0 0 −1 −α2 β3 · · · 0 0 0 . the Nevanlinna algorithm. . . strictly proper.6 and 4. . and Hankel norm approximation. + where the di are polynomials of degree κi . A general cascade decomposition of systems algorithm.108) C 0 0 · · · −1 −αn−1 βn 0 0 0 ··· 0 −1 −αn 0 β1 0 ··· 0 0 0 ACA April 14. that is the case where κi = 1. A B . the BerlekampMassey algorithm. The rational interpolation problem* 97 i “oa” 2004/4/14 page 97 i u E y ' Θ1 ' E Θ2 ' E .7.7.. .. .. We conclude this account on recursiveness by making this cascade interconnection explicit.106) dN (s) 1 κi = deg d = n. Furthermore. for (s) singleinput singleoutput systems. Earlier.. . In the generating system framework we have 0 1 1 −di Θ= i=1 Θi where Θi = This cascade decomposition can be simpliﬁed in this case as shown in ﬁgure 4. As shown in Kalman [193]. 2004 i i i i . is that recursive update of the solution of the realization or interpolation problems. and N i (4. . . recursive realization corresponds to the decomposition of H(s) is a continued fraction: H(s) = d1 (s) + d 2 ( s) + d3 (s) + 1 1 1 1 . Problem (A) was solved recursively in [12]. Darlington synthesis. . . whenever it has a generic decomposition of the form (4. . for all i. = (4. The main result of these two papers which is shown pictorially in ﬁgures 4. (4.5. and hence N = n. The scalar case. βn + s+ αn The following triple is a minimal realization of H(s)..i i 4.106) becomes β1 H(s) = (4.. . continued fractions. 
The recursive realization problem with a degree constraint was solved in [9].
We conclude this account of recursiveness by making the cascade interconnection explicit. [Figure 4.7. Cascade (feedback) decomposition of scalar systems, with sections 1/d1, 1/d2, 1/d3, ..., 1/dk.] Notice that A in (4.108) is in tridiagonal form, while B and C∗ are multiples of the first canonical unit vector in Rⁿ. Thus partial realization consists in truncating the tail of the continued fraction, or equivalently of the cascade decomposition of the system, or of the tridiagonal state-space realization. To summarize, we have seen important connections between the following topics: (a) realization/interpolation; (b) cascade/feedback interconnection; (c) linear fractions; (d) continued fractions; (e) tridiagonal state-space realizations. These issues will play a role in chapter 10, where an iterative method of constructing this tridiagonal realization, the so-called Lanczos procedure, will be of central importance.

The solution of problems (B) and (C). Given the array P defined by (4.79) -- again we restrict attention to the distinct point case -- together with µ > 0, we wish to find out whether there exist interpolants which are stable (poles in the left half of the complex plane) with magnitude bounded by µ on the imaginary axis. The tool for investigating the existence issue is the Nevanlinna-Pick matrix

Πµ = [ (µ² − φ̄i φj) / (s̄i + sj) ],   i, j = 1, ···, N,

where (¯·) denotes complex conjugation. A solution exists if, and only if, this matrix is positive (semi-)definite: Πµ ≥ 0. Write Πµ = µ² Π1 − Π2, where Π1 > 0, Π2 > 0, and let µi² be the eigenvalues of Π1⁻¹Π2, with µ1 the largest. As long as µ > µ1, Πµ > 0; for µ = µ1 it becomes semidefinite, and for µ < µ1 it is indefinite. Thus the smallest norm for which there exist solutions to problem (B) is the square root of the largest eigenvalue of Π1⁻¹Π2.
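The smallest achievable norm µ1 can be computed directly from the splitting Πµ = µ²Π1 − Π2. A minimal sketch (hypothetical real data si, φi, with the si in the open right half-plane; numpy assumed):

```python
import numpy as np

# hypothetical interpolation data (s_i, phi_i), s_i in the open right half-plane
s = np.array([1.0, 2.0, 3.0])
phi = np.array([0.5, -0.3, 0.2])

# splitting Pi_mu = mu^2 * Pi1 - Pi2 of the Nevanlinna-Pick matrix
Pi1 = 1.0 / (s[:, None] + s[None, :])            # Cauchy matrix, positive definite
Pi2 = (phi[:, None] * phi[None, :]) * Pi1        # entries phi_i phi_j / (s_i + s_j)

# smallest feasible mu: square root of the largest eigenvalue of Pi1^{-1} Pi2
mu1 = np.sqrt(np.max(np.linalg.eigvals(np.linalg.solve(Pi1, Pi2)).real))

def min_eig(mu):
    # smallest eigenvalue of the (symmetric) Pick matrix at level mu
    return np.min(np.linalg.eigvalsh(mu**2 * Pi1 - Pi2))
```

As the theory predicts, the Pick matrix is positive definite just above mu1 and indefinite just below it.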
Concluding remarks and generalizations. The problem of rational interpolation has a long history. It was only recently recognized, however, that it can be treated as a problem which generalizes the realization problem. One can distinguish two approaches: state space and polynomial. The generalization of the state-space framework from the realization to the rational interpolation problem is due to Antoulas and Anderson [25]; therein, the Löwner matrix replaces and generalizes the Hankel matrix as the main tool. See also [26] and Anderson and Antoulas [24]. For related treatments of problem (B) we refer to [271] and [195]; for a general account of the generating system approach to rational interpolation, see the book by Ball, Gohberg, and Rodman [38].

In Antoulas and Anderson [26] it was shown that the Nevanlinna-Pick interpolation problem can be transformed into an interpolation problem without norm constraint. This is achieved by adding the so-called mirror-image interpolation points to the original data. Consider the array consisting of the distinct interpolation data (si, φi), i = 1, ···, N, satisfying si ≠ sj for i ≠ j, together with the mirror-image set D̂ of D. In terms of trajectories,

D = { e^{si t} (1, −φi) : i = 1, ···, N }   and   D̂ = { e^{−s̄i t} (−φ̄i, 1) : i = 1, ···, N },

where (¯·) denotes complex conjugation. The augmented data set is thus Daug = D ∪ D̂, and the corresponding pair of matrices is

Aaug = [ A   0
         0  −Ā ],   Caug = [ C   JC̄ ],   where J = [ 0  1
                                                      1  0 ],

and (¯·) applied to a matrix denotes complex conjugation (without transposition). We now construct left coprime polynomial matrices Θaug, Ξaug such that Caug (sI − Aaug)⁻¹ = [Θaug(s)]⁻¹ Ξaug(s). The main result is that the generating system for the data set Daug is the generating system which solves problem (B). The following result can be proved using the results in Antoulas and Anderson [26].

The above results can be generalized considerably, keeping track of the complexity of the interpolants. The left tangential or left directional interpolation problem consists in finding all p × m rational matrices Φ(s) satisfying Vi Φ(si) = Yi, i = 1, ···, N, where Vi, Yi, of sizes ri × p, ri × m respectively, are given, with rank Vi = ri ≤ p; in this setting there are p + m (observability) indices which enter the picture. The bitangential or bidirectional interpolation problem can be treated as well; for details we refer to [23], [234], [309].
We conclude this part by mentioning that problem (C), that is, the problem of positive real interpolation, can also be turned into an unconstrained interpolation problem by adding an appropriately defined mirror-image set of points. For details we refer to [27].

The generating system or polynomial approach to rational interpolation, with the complexity (McMillan degree) as constraint, was put forward in Antoulas and Willems [15] and Antoulas, Ball, Kang, and Willems [22]. The solution of problem (A) in all its matrix and tangential versions, viewed as a problem which generalizes the realization problem, has been given in the generating system framework in Antoulas, Ball, Kang, and Willems [22]. The right tangential or right directional interpolation problem can be defined similarly; in this case the generating matrix Θ is square of size p + m. These techniques also permit the classification of interpolants by complexity and by norm boundedness or positive realness at the same time.

Theorem 4.53. Let Θaug be as defined above, and for simplicity of notation let its entries be denoted by θij. The interpolants φ = n/d of D with norm of φ less than or equal to µ on the imaginary axis are given by

n = a θ11 + b θ21,   d = a θ12 + b θ22,

provided that the parameters a, b are appropriately restricted: in particular, the magnitude of b on the imaginary axis must be at most µ. The above result achieves an algebraization of the Nevanlinna-Pick interpolation problem.

Examples

Example. Consider the data array containing 4 pairs:

P = { (0, 0), (1, 3), (2, 4), (1/2, 2) }.
According to (4.96),

C = [ 1   1   1   1
      0  −3  −4  −2 ],   A = diag ( 0, 1, 2, 1/2 ).

According to (4.105) we need the observability matrix of the (C, A) pair:

O4 = [ 1   1    1    1
       0  −3   −4   −2
       0   1    2    1/2
       0  −3   −8   −1
       0   1    4    1/4
       0  −3  −16   −1/2
       0   1    8    1/8
       0  −3  −32   −1/4 ].

Examining the rows from top to bottom, the first linear dependence occurring in O4 is that of the fourth row on the previous ones: c2 A + 6 c1 A + c2 = 0. It follows that κ1 = 1. By (i) of the theorem, there is a unique minimal interpolant with McMillan degree 1, namely

φmin(s) = 6s / (s + 1),

since 6s and s + 1 are coprime polynomials. Therefore κ2 = N − κ1 = 3, and there are no interpolants of degree 2. This means that the next linear dependence is that of the 7th row on the previous ones:

2 c1 A³ − 9 c1 A² + 4 c1 A − 2 c2 A + c2 = 0.

Following the construction leading to (4.105), a row reduced generating matrix Θ is therefore

Θ(s) = [ 6s                  s + 1
         s(s − 4)(2s − 1)   −(2s − 1) ].

The next family of interpolants has McMillan degree 3. It can be parametrized in terms of the second degree polynomial p(s) = p0 + p1 s + p2 s², as follows:

φ(s) = [ s(s − 4)(2s − 1) + (p2 s² + p1 s + p0) 6s ] / [ −(2s − 1) + (p2 s² + p1 s + p0)(s + 1) ].

The coefficients of p must satisfy the constraints (4.104), which in this case amount to avoiding the following hyperplanes in R³:

p0 + 1 = 0,   2p2 + 2p1 + 2p0 − 1 = 0,   4p2 + 2p1 + p0 − 1 = 0,   p2 + 2p1 + 4p0 = 0.

By letting p2 = p1 = 0 and p0 = 2 in the above family of interpolants, we obtain the Lagrange interpolating polynomial for the data array P:

ℓ(s) = (s/3)(2s² − 9s + 16).

The next family of interpolants has McMillan degree 4. It can be parametrized in terms of a third degree polynomial p and a first degree polynomial q, as follows:

φ(s) = [ (p3 s³ + p2 s² + p1 s + p0) 6s + (q1 s + q0) s(s − 4)(2s − 1) ] / [ (p3 s³ + p2 s² + p1 s + p0)(s + 1) − (q1 s + q0)(2s − 1) ].
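Both the minimal interpolant and the Lagrange polynomial obtained above can be checked against the data array P. A small exact-arithmetic sketch (standard-library `fractions` only):

```python
from fractions import Fraction as F

# the data array P = {(0,0), (1,3), (2,4), (1/2,2)}
P = [(F(0), F(0)), (F(1), F(3)), (F(2), F(4)), (F(1, 2), F(2))]

phi_min = lambda s: 6 * s / (s + 1)                   # unique degree-1 interpolant
lagrange = lambda s: s * (2 * s**2 - 9 * s + 16) / 3  # degree-3 member (p0 = 2)

ok = all(phi_min(s) == y and lagrange(s) == y for s, y in P)
```

Both functions reproduce all four data pairs exactly.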
Example (continued): recursive construction of interpolants. Following figure 4.5, we will now construct the generating systems Θi, i = 1, 2, 3, 4, satisfying Θ = Θ4 Θ3 Θ2 Θ1; there must hold det Θi = s − si (up to a nonzero constant factor), since det Θ(s) = Π_{i=1}^{4} (s − si). The time series associated with P are

w1 = [ 1 ; 0 ],   w2 = e^{t} [ 1 ; −3 ],   w3 = e^{2t} [ 1 ; −4 ],   w4 = e^{t/2} [ 1 ; −2 ].

The first error time series, defined as e1 = Θ1(d/dt) w1, is zero, since the generating system which annihilates w1 is

Θ1(s) = [ s  0
          0  1 ].

The second error time series is defined similarly, e2 = Θ1(d/dt) w2, and we have e2 = w2; the requirement Θ2(d/dt) e2 = Θ2 Θ1 (d/dt) w2 = 0 yields

Θ2(s) = [ 3    1
          0  s − 1 ],

while the first error time series remains zero: Θ2 Θ1 (d/dt) w1 = 0. Proceeding in the same way with the third error time series yields Θ3; finally e4 = Θ3 Θ2 Θ1 (d/dt) w4 = [ 3/4 ; 0 ] e^{t/2}, so that

Θ4(s) = [ 2s − 1  0
          0       1 ]   ⇒   Θ(s) = Θ4(s) Θ3(s) Θ2(s) Θ1(s) = [ 3s(s − 2)(2s − 1)   (s − 2)(2s − 1)
                                                               6s                  s + 1 ].

Although this generating matrix is not the same as the one obtained in the previous example, it differs from it only by left multiplication with a unimodular matrix: in particular U Θ4 Θ3 Θ2 Θ1, where

U(s) = [ 1  1 − 2s
         0  1 ],

coincides (up to row ordering and scaling) with the generating matrix obtained in the previous example.
The third error time series is Θ2 Θ1 (d/dt) w3 = [ 2 ; −4 ] e^{2t}, which implies

Θ3(s) = [ s − 2  0
          2      1 ]   ⇒   Θ3 Θ2 Θ1 (d/dt) wi = 0,   i = 1, 2, 3.

Example (realization revisited). Following the construction leading to (4.105), we need the observability matrix of the (C, A) pair defined by (4.98):

C = [ 1   0   0   0   0
      0  −1  −1  −1  −2 ],   A = [ 0  1  0  0  0
                                   0  0  1  0  0
                                   0  0  0  1  0
                                   0  0  0  0  1
                                   0  0  0  0  0 ].
The first row of the observability matrix which is linearly dependent on the preceding ones is the 6th, with dependence relation c1 A² − c2 A² + c2 A = 0; the next one is the 7th. According to the theory, the minimal interpolant has degree 3, and all minimal interpolants form a two-parameter family, obtained by multiplying Θ on the left by [ αs + β   1 ]:

φ̄min(s) = − [ (αs + β) s² + (s³ + s² − s) ] / [ (αs + β)(s² − s) − (2s − 1) ],   α, β ∈ R.

According to (4.104), the parameters α, β must be such that the denominator of the above expression is nonzero for s = 0; since the value of the denominator for s = 0 is 1, these two parameters are free. In order to obtain matching of the Markov parameters we replace s by s⁻¹:

φmin(s) = φ̄min(s⁻¹) = − [ (α + βs) + (1 + s − s²) ] / [ (α + βs)(1 − s) − 2s² + s³ ] = [ s² − (β + 1)s − (α + 1) ] / [ s³ − (β + 2)s² + (β − α)s + α ].

It is readily checked that the power series expansion of φmin around infinity is:

φmin(s) = s⁻¹ + s⁻² + s⁻³ + 2s⁻⁴ + (β + 4)s⁻⁵ + (β² + 4β + α + 8)s⁻⁶ + ···

4.6 Chapter summary

The purpose of the preceding chapter was to familiarize the reader with fundamental concepts from system theory. Three problems are discussed: the external description, the internal description, and the realization/interpolation problem. The last section, on the rational interpolation problem, can be omitted on a first reading; many aspects discussed in it will however turn out to be important in the sequel. It forms a natural extension and generalization of the realization problem which is interesting in its own right.

The section on the external description states that a linear time-invariant system is an operator (map) which assigns inputs to outputs: it is the convolution operator defined in terms of its kernel, which is the impulse response of the system. In the frequency domain, the external description is given in terms of the transfer function, whose series expansion around infinity yields the Markov parameters. The internal description introduces, besides the external variables (that is, the input and the output), an internal variable called the state. The internal description consists thus of a set of equations describing how the input affects the state, and a second set describing how the output is obtained from the input and the state.
The associated observability matrix is O5:

O5(C, A) = [ 1   0   0   0   0
             0  −1  −1  −1  −2
             0   1   0   0   0
             0   0  −1  −1  −1
             0   0   1   0   0
             0   0   0  −1  −1
             0   0   0   1   0
             0   0   0   0  −1
             0   0   0   0   1
             0   0   0   0   0 ].

The dependence relation for the 7th row reads

c1 A³ + c1 A² − c1 A + 2 c2 A − c2 = 0.

The corresponding generating system is

Θ(s) = [ s²             −s² + s
         s³ + s² − s     2s − 1 ].
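The series expansion of φmin can be verified numerically: writing d(s) φmin(s) = n(s) and matching coefficients gives a simple recurrence for the Markov parameters. A sketch, using the sample values α = 1, β = 2 (plain Python, no libraries):

```python
def markov_parameters(num, den, K):
    """First K coefficients of num(s)/den(s) = sum_k h_k s^(-k);
    num, den by descending powers, den monic, deg num < deg den."""
    n = len(den) - 1
    num = [0.0] * (n - len(num)) + [float(c) for c in num]  # pad to length n
    h = []
    for k in range(K):
        rhs = num[k] if k < n else 0.0
        h.append(rhs - sum(den[j] * h[k - j] for j in range(1, min(k, n) + 1)))
    return h

alpha, beta = 1.0, 2.0
num = [1.0, -(beta + 1), -(alpha + 1)]         # s^2 - (b+1)s - (a+1)
den = [1.0, -(beta + 2), beta - alpha, alpha]  # s^3 - (b+2)s^2 + (b-a)s + a
h = markov_parameters(num, den, 6)
# expected: 1, 1, 1, 2, beta+4, beta^2 + 4*beta + alpha + 8
```

For any α, β the first four coefficients are 1, 1, 1, 2, matching the given data regardless of the free parameters.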
Eliminating the state from the internal description is straightforward and leads to the external description. The latter set contains algebraic equations (no derivatives/shifts of the state are allowed). The internal description is thus completely determined by means of the quadruple of matrices

[ A  B
  C  D ],

defined by (4.13).

To get a better understanding of a dynamical system, one asks to what extent the states can be influenced by manipulating the input, and what the influence of the state on the output is. Besides the reachability and observability matrices, the so-called gramians can be used to answer these questions; their advantage lies in the fact that they are square, symmetric, and semidefinite. Furthermore, if the assumption of stability is made, the infinite gramians arise, which can be computed by solving appropriately defined linear matrix equations, the so-called Lyapunov equations. These equations play a fundamental role in the computation of approximants to a given system; they will be the subject of detailed study in chapter 6.

Deducing an internal description involves the construction of state; if the data consist of all Markov parameters, this is known as the realization problem. Conditions for solvability and ways of constructing solutions (if they exist) are discussed in the section on realization. If the data consist of a partial set of Markov parameters, the existence of solutions is no longer an issue, and the parametrization of all (minimal complexity) solutions becomes important; this is discussed briefly.

The final section on rational interpolation provides a generalization of the realization results in several ways. First, the input-output data need not be confined to Markov parameters: instead, samples of the transfer function at arbitrary points in the complex plane are allowed. Secondly, both state-space and polynomial ways of constructing all solutions of a given complexity are discussed. Finally, it is pointed out that the machinery set up (the generating system method) can be used to solve the problems of constructing systems with special properties, like bounded real or positive real transfer functions.
The former set contains differential or difference equations, which are first order in the state (only the first derivative/shift of the state is involved); it describes the dynamics of the system. The converse, however, that is, deducing an internal description from the external description, is nontrivial. To this end two structural concepts are introduced: those of reachability and observability. The former is used to investigate the extent to which the states can be influenced by manipulating the input; the reachability and observability matrices are used to quantitatively answer these questions.
Chapter 5

Linear dynamical systems: Part II

To assess the quality of a given approximation, we must be able to measure "sizes" of dynamical systems, as well as the distance between two of them. In the first part of this chapter we discuss various notions of norms of linear dynamical systems. The second part offers a brief review of system stability and an introduction to the related concept of system dissipativity. The chapter concludes with the discussion of L2 systems, which plays a role in Hankel norm approximation. For the material in the sections that follow, see [95], [361], [115], [288], and in addition [118], [70], [148], [177], [370], [360], [197].

5.1 Time and frequency domain spaces and norms

5.1.1 Banach and Hilbert spaces

A Banach space is a normed linear space X which is complete. Consider a linear space X over R, not necessarily finite dimensional, and let a norm ν be defined on X, satisfying the three usual axioms; X is then called a normed space. In such spaces the concept of convergence can be defined as follows. We say that a sequence xk, k = 1, 2, ···, converges to x∗ ∈ X if the sequence of real numbers ν(xk − x∗) = ‖xk − x∗‖ converges to zero. A sequence xk, k = 1, 2, ···, is a Cauchy sequence if for all ε > 0 there exists an integer m such that ‖xi − xj‖ < ε for all indices i, j > m. If every Cauchy sequence converges, then X is called complete.

We now turn our attention to Hilbert spaces. These spaces have more structure than Banach spaces. The additional structure results from the existence of an inner product, a function from the Cartesian product X × X to R:

⟨·, ·⟩ : X × X −→ R,   (x, y) ↦ ⟨x, y⟩ ∈ R.        (5.1)

This map must satisfy the following four properties: ⟨x, y⟩ = ⟨y, x⟩; for fixed x∗ ∈ X, ⟨x∗, y⟩ is a linear function of y; ⟨x, x⟩ ≥ 0; and ⟨x, x⟩ = 0 ⇔ x = 0. The inner product induces a norm on X, namely ‖x‖² = ⟨x, x⟩.

A subspace Y of X is closed if every sequence in Y which converges in X has its limit in Y. If X is finite dimensional,
every subspace is closed; this does not hold if X is infinite dimensional.
5.1.2 The Lebesgue spaces ℓp and Lp

In this section we define the p-norms of infinite sequences and functions. Consider vector-valued sequences f : I → Rⁿ, where I = Z, I = Z₊, or I = Z₋, and continuous-time vector-valued functions f : I → Rⁿ, where I = R, I = R₊, I = R₋, or the finite interval I = [a, b]. The spaces defined below are sometimes referred to as time-domain ℓp, Lp spaces, since their elements are functions of a real variable, which is mostly time. In the sequel, frequency-domain ℓp, Lp spaces will also be introduced.

5.1.3 The Hardy spaces hp and Hp

We now turn our attention to spaces of functions of a complex variable. Since this variable can be interpreted as complex frequency, we will refer to them as frequency domain spaces. Consider the matrix-valued function F : C → C^{q×r}; we choose ‖F(z0)‖p to be the Schatten p-norm of F evaluated at z = z0 (the Schatten norms are defined in chapter 3, page 29; there are, however, other possible choices).

hp spaces. Let D, ∂D, and D̄ ⊂ C denote the (open) unit disc, the unit circle, and the complement of the closed unit disc, respectively. The p-norm of a function F which is analytic in D̄ is defined as follows:

‖F‖hp = sup_{r>1} [ (1/2π) ∫₀^{2π} ‖F(re^{iθ})‖ₚᵖ dθ ]^{1/p}   for p ∈ [1, ∞),   and   ‖F‖h∞ = sup_{z∈D̄} ‖F(z)‖ₚ   for p = ∞.

The resulting hp spaces are defined as follows:

hp^{q×r} = hp^{q×r}(D̄) = { F : C → C^{q×r}, analytic in D̄ : ‖F‖hp < ∞ }.
The Hölder p-norms of these sequences are defined as:

‖f‖p = [ Σ_{t∈I} ‖f(t)‖ₚᵖ ]^{1/p},   1 ≤ p < ∞,   and   ‖f‖∞ = sup_{t∈I} ‖f(t)‖∞,   p = ∞.        (5.2)

The corresponding ℓp spaces are:

ℓpⁿ(I) = { f : I → Rⁿ : ‖f‖p < ∞ },   where I = Z, Z₊, or Z₋.

Consider now continuous-time vector-valued functions f : I → Rⁿ. Their p-norms are defined as:

‖f‖p = [ ∫_I ‖f(t)‖ₚᵖ dt ]^{1/p},   1 ≤ p < ∞,   and   ‖f‖∞ = sup_{t∈I} ‖f(t)‖∞,   p = ∞.        (5.3)

The corresponding Lp spaces are:

Lpⁿ(I) = { f : I → Rⁿ : ‖f‖p < ∞ },   where I = R, R₊, R₋, or [a, b].

We close this introductory section by defining the space Y⊥ of a subspace Y ⊂ X:

Y⊥ = { x ∈ X : ⟨x, y⟩ = 0, ∀ y ∈ Y }.

The space Y⊥ is always closed; if Y is also closed, then Y⊥ is called its orthogonal complement, and in this case X can be decomposed into the direct sum X = Y ⊕ Y⊥.
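A small sketch of the Hölder norms (5.2) for a finite sequence, with the inner vector norm taken with the same index p (numpy assumed):

```python
import numpy as np

def holder_p_norm(f, p):
    """Hoelder p-norm of a vector-valued sequence; rows of f are the f(t)."""
    if p == np.inf:
        return np.abs(f).max(axis=1).max()       # sup over t of the inf-norm of f(t)
    inner = np.linalg.norm(f, ord=p, axis=1)     # ||f(t)||_p for each t
    return (inner ** p).sum() ** (1.0 / p)

f = np.array([[3.0, 4.0], [0.0, 1.0]])
n2 = holder_p_norm(f, 2)         # sqrt(5^2 + 1^2) = sqrt(26)
ninf = holder_p_norm(f, np.inf)  # largest entry in absolute value
```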
‖F‖Hp = sup_{x>0} [ ∫_{−∞}^{∞} ‖F(x + iy)‖ₚᵖ dy ]^{1/p}   for p ∈ [1, ∞),   and   ‖F‖H∞ = sup_{z∈C₊} ‖F(z)‖ₚ   for p = ∞.        (5.5)
Hp spaces. Let C₊ ⊂ C denote the (open) right half of the complex plane: s = x + iy ∈ C with x > 0. Consider q × r complex-valued functions F which are analytic in C₊; again, ‖F(s0)‖ₚ is chosen to be the Schatten p-norm of F evaluated at s = s0. The resulting Hp spaces are defined analogously to the hp spaces:

Hp^{q×r} = Hp^{q×r}(C₊) = { F : C → C^{q×r}, analytic in C₊ : ‖F‖Hp < ∞ }.

The Hardy spaces Hp^{q×r}(C₋) can be defined in a similar way. Two special cases are worth noting:

‖F‖H2 = sup_{x>0} [ ∫_{−∞}^{∞} trace [ F∗(x − iy) F(x + iy) ] dy ]^{1/2},   ‖F‖H∞ = sup_{s∈C₊} σmax (F(s)),        (5.6)

where (·)∗ denotes complex conjugation and transposition. The search for the suprema in the formulae above can be simplified by making use of the maximum modulus theorem, which states that a function f, continuous inside a domain D ⊂ C as well as on its boundary ∂D and analytic inside D, attains its maximum on the boundary ∂D of D. Thus the norms above become:

‖F‖h2 = [ (1/2π) ∫₀^{2π} trace [ F∗(e^{−iθ}) F(e^{iθ}) ] dθ ]^{1/2},   ‖F‖h∞ = sup_{θ∈[0,2π]} σmax ( F(e^{iθ}) ),        (5.8)

and

‖F‖H2 = [ ∫_{−∞}^{∞} trace [ F∗(−iy) F(iy) ] dy ]^{1/2},   ‖F‖H∞ = sup_{y∈R} σmax ( F(iy) ).        (5.9)

Frequency domain Lp spaces. If the function F : C → C^{q×r} has no singularities (poles) on the imaginary axis, but is not necessarily analytic in either the left or right half of the complex plane, the Hp norms are not defined. Instead, the frequency domain Lp norms of F are defined as follows:

‖F‖Lp = [ ∫_{−∞}^{∞} ‖F(iy)‖ₚᵖ dy ]^{1/p}   for p ∈ [1, ∞),   and   ‖F‖L∞ = sup_{y∈R} σmax ( F(iy) ).        (5.10)

The corresponding frequency domain Lp spaces are

Lp(iR) = { F : C → C^{q×r} : ‖F‖Lp < ∞ }.        (5.11)

In a similar way one may define the frequency domain ℓp spaces.
Elements (vectors or matrices) with entries in ℓ2(Z) and L2(R) have a transform defined as follows:

f ↦ F(ξ) = Σ_{t=−∞}^{∞} f(t) ξ^{−t}   for discrete-time functions,
f ↦ F(ξ) = ∫_{−∞}^{∞} f(t) e^{−ξt} dt   for continuous-time functions.

Thus, if the domain of f is discrete, F(e^{iθ}) = F(f)(θ) is the Fourier transform of f and belongs to L2[0, 2π]; if the domain of f is continuous, F(iω) = F(f)(ω) is the Fourier transform of f, belonging to the space L2(iR) defined by (5.11). The Fourier transform maps L2(R₊), L2(R₋) onto H2(C₊), H2(C₋), respectively. Furthermore, the following bijective correspondences hold:

ℓ2(Z) = ℓ2(Z₋) ⊕ ℓ2(Z₊)   −→   L2[0, 2π] = h2(D) ⊕ h2(D̄),
L2(R) = L2(R₋) ⊕ L2(R₊)   −→   L2(iR) = H2(C₋) ⊕ H2(C₊).

For simplicity the above relationships are shown for spaces containing scalar-valued functions.
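The isometry property of the transform can be illustrated in the discrete case: with a unitary normalization, the DFT preserves the ℓ2 norm (a sketch; numpy assumed, sample vector arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
f = rng.standard_normal(64)
F = np.fft.fft(f) / np.sqrt(64)   # unitary normalization makes the DFT an isometry
err = abs(np.linalg.norm(f) - np.linalg.norm(F))
```

This is the finite-dimensional analogue of the Parseval/Plancherel statement below.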
They are, however, equally valid for the corresponding spaces containing matrix-valued functions. There are two results connecting the spaces introduced above; we only state the continuous-time versions. The first bears the names of Parseval, Plancherel, and Paley-Wiener.

Proposition 5.1. The Fourier transform F is a Hilbert space isomorphism between L2(R) and L2(iR). In addition it is an isometry, that is, it preserves distances.

The second result asserts that the product of an L∞ function and a frequency domain L2 function is a frequency domain L2 function; it also asserts that the L∞ and H∞ norms can be viewed as induced norms. Recall that if (X, α) and (Y, β) are two normed spaces with norms α, β respectively, the α, β induced norm of an operator T with domain X and codomain Y is ‖T‖α,β = sup_{x≠0} ‖Tx‖β / ‖x‖α.

Proposition 5.2. Let F ∈ L∞ and G ∈ L2(iR). The product FG ∈ L2(iR). Furthermore, the L∞ norm can be viewed as an induced norm in the frequency domain space L2(iR):

‖F‖L∞ = ‖F‖_{L2-ind} = sup_{X≠0} ‖FX‖L2 / ‖X‖L2.

If F ∈ H∞ and G ∈ H2(C₊), then FG ∈ H2(C₊). In addition, the H∞ norm can be viewed as an induced norm in the frequency domain space H2, as well as in the time domain space L2:

‖F‖H∞ = ‖F‖_{H2-ind} = sup_{X≠0} ‖FX‖H2 / ‖X‖H2 = sup_{x≠0} ‖f ∗ x‖L2 / ‖x‖L2,

where x ∈ L2(R₊).

5.1.4 The Hilbert spaces ℓ2(I) and L2(I)

The (real) spaces ℓp(I) and Lp(I) are Banach spaces. The spaces ℓ2(I) and L2(I), however, can be given the structure of Hilbert spaces, where the inner products are defined, just as in the finite-dimensional case, as follows:

⟨x, y⟩_{ℓ2} = Σ_{t∈I} x∗(t) y(t),   ⟨x, y⟩_{L2} = ∫_I x∗(t) y(t) dt,        (5.12)

for I = Z and I = R, respectively.

5.2 The spectrum and singular values of the convolution operator

Given a linear system Σ, recall the definition of the convolution operator S given by (4.4) and (4.5).
Spectrum and singular values of S. Consider first the case of single-input single-output (m = p = 1) discrete-time systems: y(t) = Σ_k h_{t−k} u(k). The convolution operator S is also known as the Laurent operator. To this operator we associate the transfer function H(z) = Σ_{t=−∞}^{∞} h_t z^{−t}, also known as the symbol of S. The spectrum of this operator is composed of all complex numbers λ such that the resolvent S − λI is not invertible; λ is an eigenvalue if there exists v ∈ ℓ2(Z) such that (S − λI)v = 0, in which case (λ, v) is an eigenvalue-eigenvector pair of S. A classical result, which can be found, e.g., in [69], asserts that S is a bounded operator in ℓ2(Z) if, and only if, H belongs to the (frequency domain) space ℓ∞(∂D), that is, ‖H‖∞ < ∞. Otto Toeplitz showed in 1911 that the spectrum of S is

Λ(S) = { H(z) : |z| = 1 }.

If we plot the elements of this set in the complex plane we obtain a closed curve, which is sometimes referred to as the Nyquist plot of the underlying system; the spectrum of the associated Toeplitz operator consists of this curve together with all points enclosed by it which have nonzero winding number. Notice that S is not invertible whenever 0 ∈ Λ(S); if H is rational, this means that S is not invertible whenever H has a zero on the unit circle.

Furthermore it follows that [S(H)]∗ = S(H∗) and S(H1) S(H2) = S(H1 H2). Thus

S(H) [S(H)]∗ = S(HH∗) = S(H∗H) = [S(H)]∗ S(H),

which shows that S is a normal operator. Due to the normality of S, the set of singular values of S is given by the absolute values of the elements of its spectrum:

Σ(S) = { |H(z)| : |z| = 1 }.

Therefore the largest entry of this set is the ℓ∞ norm of H: denoting the convolution operator by S(H) to stress the connection with H, we have ‖S(H)‖2 = ‖H‖∞. This also implies that if λ belongs to the spectrum of S, then |λ| is a singular value of S.

The spectrum Λ(S) can be approximated by considering a finite portion of the impulse response, namely h_k, k = 0, 1, ···, N − 1 (where k may run over any consecutive set of N integers); then H̄(z) = Σ_{k=0}^{N−1} h_k z^{−k} can be considered an approximant of H. For simplicity let k = 0, 1, ···, N − 1. The corresponding input and output sequences are finite: u_k, y_k, k = 0, 1, ···, N − 1. In this case y = h ⊗ u, where ⊗ denotes periodic convolution: the usual convolution, but with the sum taken over N consecutive values of u, the functions being assumed periodic outside the interval [0, N − 1]: y(t) = Σ_{k=0}^{N−1} h_{t−k} u(k). In matrix form we have

[ y(0) ; y(1) ; ··· ; y(N−1) ] = S_N [ u(0) ; u(1) ; ··· ; u(N−1) ],
S_N = [ h0       h_{N−1}  ···  h2  h1
        h1       h0       ···  h3  h2
        ···
        h_{N−1}  h_{N−2}  ···  h1  h0 ].

The N × N matrix S_N is called a circulant; it is a special Toeplitz matrix constructed by means of cyclic permutation of its first column (or row). Circulant matrices are normal; they are diagonalized by means of the DFT (Discrete Fourier Transform) matrix (Φ_N)_{k,ℓ} = (1/√N) w_N^{(k−1)(ℓ−1)}, where w_N = e^{−i2π/N} is the principal Nth root of unity. The eigenvalues of S_N are H̄(w_N^k), k = 0, 1, ···, N − 1.
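The statement about the eigenvalues of S_N is easy to verify: the eigenvalues of a circulant built from h0, ..., h_{N−1} coincide, as a set, with the values of H̄(z) at the Nth roots of unity. A sketch with hypothetical impulse-response samples (numpy assumed):

```python
import numpy as np

h = np.array([1.0, 0.5, 0.25, 0.125])    # hypothetical h_0, ..., h_{N-1}
N = len(h)
# circulant S_N with first column h
SN = np.array([[h[(i - j) % N] for j in range(N)] for i in range(N)])

# Hbar(z) = sum_k h_k z^{-k} evaluated at the N-th roots of unity
roots = np.exp(2j * np.pi * np.arange(N) / N)
Hbar = np.array([np.sum(h * z ** (-np.arange(N))) for z in roots])

# every value Hbar(w^k) occurs among the eigenvalues of S_N
eigs = np.linalg.eigvals(SN)
max_gap = max(np.min(np.abs(eigs - v)) for v in Hbar)
```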
Closely related to S is the Toeplitz operator T : ℓ2(Z₊) → ℓ2(Z₊), where y(t) = Σ_{k≥0} h_{t−k} u(k), t ≥ 0. It can be shown (see [69]) that ‖T‖2 = ‖H‖h∞ and that the spectrum Λ(T) contains Λ(S). Some examples will illustrate these issues.

Example 5.2.1. We will first compute the singular values of the convolution operator S associated with the discrete-time system

y(k + 1) − a y(k) = b u(k).
The impulse response of this system is h0 = 0, h_k = b a^{k−1} for k > 0. The spectrum (Nyquist plot) of this system consists of a circle centered at ( ab/(1 − a²), 0 ) of radius b/(1 − a²). Hence from the above discussion it follows that the largest singular value of S is b/(1 − a), while the smallest is b/(1 + a).

As a second example we consider a discrete-time third order system which is known under the name Butterworth filter. Its transfer function is

H(z) = (1/6) (z + 1)³ / [ z (z² + 1/3) ].

The first few Markov parameters are h0 = 1/6, h1 = 1/2, h2 = 4/9, h3 = 0, h4 = −4/27, h5 = 0, h6 = 4/81, h7 = 0, h8 = −4/243. The singular values of S are Σ(S) = [0, 1]. The error between the eigenvalues of S_N and the corresponding elements of the spectrum of S is of the order of 10⁻² for N = 4, and of the order of 10⁻³ for N = 6.

[Figure 5.1. Nyquist diagram for the 3rd order discrete-time Butterworth filter: spectrum of the convolution operator S compared with the eigenvalues of the finite approximants S_N, for N = 4 (circles), N = 6 (crossed circles), and N = 10 (dots).]
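For the first-order system, the extreme singular values b/(1 − a) and b/(1 + a) can be reproduced from a finite circulant approximant S_N. A sketch with hypothetical values of a and b (numpy assumed):

```python
import numpy as np

a, b, N = 0.6, 1.0, 200
# impulse response of y(k+1) - a y(k) = b u(k):  h_0 = 0, h_k = b a^{k-1}
h = np.concatenate(([0.0], b * a ** np.arange(N - 1)))
SN = np.array([[h[(i - j) % N] for j in range(N)] for i in range(N)])  # circulant

sv = np.linalg.svd(SN, compute_uv=False)
# extremes approach b/(1-a) and b/(1+a) as N grows, since the tail a^N is negligible
```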
For rational H, this invertibility condition is equivalent to the absence of zeros of H on the unit circle. For the first-order example above, the circulant approximant is

S_N = b · circ ( 0, 1, a, a², ···, a^{N−2} ) ∈ R^{N×N},

that is, the circulant matrix whose first column is ( 0, 1, a, a², ···, a^{N−2} )ᵀ. The error between the eigenvalues of S_N and the corresponding elements of the spectrum of S is of the order of 10⁻² for N = 6 and 10⁻³ for N = 10; just as for the spectrum, a similar approximation property holds for the singular values of S and S_N.

Generalization to continuous-time systems. For SISO continuous-time systems similar results hold, provided that the associated transfer function H(s) belongs to (the frequency domain space) L∞(iR), that is, H is bounded on the imaginary axis. Without proof we state the following facts. The convolution operator defined by (4.5) is bounded in L2(R). Just as in the discrete-time case, the spectrum of S is composed of all complex numbers λ such that the resolvent S − λI is not invertible; S is not invertible whenever H has a zero on the imaginary axis. Finally, the spectrum of the associated Toeplitz operator Λ(T) is composed of Λ(S) and all points in the interior of this curve. The parallel of the Otto Toeplitz
result in the continuous-time case is that the spectrum of S is

Λ(S) = { H(iω) : ω ∈ R }.

Similar arguments show that S is normal; due to normality, the set of singular values of S is given by

Σ(S) = { |H(iω)| : ω ∈ R }.

The largest entry of this set is the L∞ norm of H:

‖S‖ = sup_{ω∈R} σ1 ( H(iω) ),

where, as before, σ1 denotes the largest singular value; denoting the convolution operator by S(H), we thus have ‖S(H)‖2 = ‖H‖L∞.

Generalization to MIMO systems. The spectrum of the convolution operator associated with a MIMO (multi-input multi-output) system is defined for m = p (same number of input and output channels). In this case it is composed of the union of the m curves which constitute the m eigenvalues of the transfer function H(iω):

Λ(S) = { λk ( H(iω) ) : ω ∈ R, k = 1, ···, m }.

Its singular values (which are defined even if m ≠ p) are those of H(iω):

Σ(S) = { σk ( H(iω) ) : ω ∈ R, k = 1, ···, min(m, p) }.

The convolution operator for MIMO systems is in general not normal; the largest entry of the above set is again the L∞ norm of H. Similar statements can be made for discrete-time MIMO systems.

The adjoint of the convolution operator. A detailed study of the singular values of the convolution operator S requires the use of the adjoint operator; although it will not be used in the sequel, this issue is briefly addressed next, for completeness. By definition, given the inner product ⟨·, ·⟩, the adjoint of S, denoted by S∗, is the unique operator satisfying

⟨y, S u⟩ = ⟨S∗ y, u⟩        (5.13)

for all y, u in the appropriate spaces. For continuous-time systems, S is an integral operator with kernel h(·), which by means of state-space data can be expressed as h(·) = D δ(·) + C e^{A(·)} B. The former part, namely D, is an instantaneous and not a dynamical action; therefore its adjoint is given by D∗. The adjoint due to the latter part, h_a(·) = C e^{A(·)} B, t > 0, which denotes dynamical action, follows from (5.13): S∗ : L2ᵖ(R) → L2ᵐ(R),

y ↦ u = S∗(y),   where   u(t) = ∫_t^{+∞} B∗ e^{−A∗(t−τ)} C∗ y(τ) dτ,        (5.14)

with time t running backwards, that is, from +∞ to −∞.
Given $\Sigma = \left(\begin{smallmatrix}\mathbf{A} & \mathbf{B}\\ \mathbf{C} & \mathbf{D}\end{smallmatrix}\right)$, its adjoint with respect to the usual inner product in $\mathcal{L}_2(\mathbb{R})$ can be computed as follows:
$$\langle \mathbf{y}, \mathcal{S}\mathbf{u} \rangle = \int_{-\infty}^{\infty} \mathbf{y}^*(t) \int_{-\infty}^{\infty} \mathbf{C} e^{\mathbf{A}(t-\tau)} \mathbf{B}\, \mathbf{u}(\tau)\, d\tau\, dt = \int_{-\infty}^{\infty} \mathbf{u}^*(\tau) \int_{-\infty}^{\infty} \mathbf{B}^* e^{\mathbf{A}^*(t-\tau)} \mathbf{C}^*\, \mathbf{y}(t)\, dt\, d\tau = \langle \mathcal{S}^*\mathbf{y}, \mathbf{u} \rangle.$$
Thus, from (5.13), the adjoint is
$$\mathcal{S}^* : \mathcal{L}^p(\mathbb{R}) \to \mathcal{L}^m(\mathbb{R}),\quad \mathbf{y} \mapsto \mathbf{u} = \mathcal{S}^*(\mathbf{y}),\quad \text{where } \mathbf{u}(t) = \int_{-\infty}^{+\infty} \mathbf{B}^* e^{-\mathbf{A}^*(t-\tau)} \mathbf{C}^*\, \mathbf{y}(\tau)\, d\tau. \tag{5.14}$$
In other words, the (dynamical part of the) adjoint $\mathcal{S}^*$ can be defined by means of the system
$$\Sigma^* = \left(\begin{array}{c|c} -\mathbf{A}^* & -\mathbf{C}^* \\ \hline \mathbf{B}^* & \mathbf{D}^* \end{array}\right), \tag{5.15}$$
with time $t$ running backwards, that is, from $+\infty$ to $-\infty$.
Given $\Sigma = \left(\begin{smallmatrix}\mathbf{A} & \mathbf{B}\\ \mathbf{C} & \mathbf{D}\end{smallmatrix}\right)$, if $\mathbf{H}_\Sigma(s) = \mathbf{D} + \mathbf{C}(s\mathbf{I}-\mathbf{A})^{-1}\mathbf{B}$, the transfer function of the adjoint system $\Sigma^*$ is
$$\mathbf{H}_{\Sigma^*}(s) = \mathbf{H}^*(-s) = \mathbf{D}^* - \mathbf{B}^*(s\mathbf{I}+\mathbf{A}^*)^{-1}\mathbf{C}^*.$$
Note that the adjoint can also be defined by replacing $-\mathbf{C}^*$, $\mathbf{B}^*$ in (5.15) with $\mathbf{C}^*$, $-\mathbf{B}^*$, respectively. If $\Sigma$ is stable, that is, the eigenvalues of $\mathbf{A}$ have negative real parts, its 2-norm is
$$\|\Sigma\|_2 = \|\mathcal{S}\|_{2\text{-ind}} = \|\mathbf{H}\|_{\mathcal{H}_\infty} = \sup_{\omega\in\mathbb{R}} \sigma_{\max}(\mathbf{H}(i\omega)); \tag{5.16}$$
the last equality follows from proposition 5.2, which asserts that the 2-induced norm and the $\mathcal{H}_\infty$ norm of a stable $\Sigma$ are the same.

One way to define the adjoint $\Sigma^*$ of a discrete-time system $\Sigma$ is by means of the transfer function. Given $\mathbf{H}(z) = \mathbf{D} + \mathbf{C}(z\mathbf{I}-\mathbf{A})^{-1}\mathbf{B}$, the transfer function of the adjoint system is
$$\mathbf{H}_{\Sigma^*}(z) = \mathbf{H}^*(z^{-1}) = \mathbf{D}^* + \mathbf{B}^*(z^{-1}\mathbf{I}-\mathbf{A}^*)^{-1}\mathbf{C}^*.$$
A simple calculation shows that if $\mathbf{A}$ is invertible, the adjoint system has the following realization:
$$\Sigma^* = \left(\begin{array}{c|c} (\mathbf{A}^*)^{-1} & -(\mathbf{A}^*)^{-1}\mathbf{C}^* \\ \hline \mathbf{B}^*(\mathbf{A}^*)^{-1} & \mathbf{D}^* - \mathbf{B}^*(\mathbf{A}^*)^{-1}\mathbf{C}^* \end{array}\right).$$

Consider now the rational function
$$\Phi_\gamma(s) = \mathbf{I}_m - \frac{1}{\gamma^2}\, \mathbf{H}^*_\Sigma(-s)\, \mathbf{H}_\Sigma(s).$$
If the $\mathcal{H}_\infty$ norm of $\Sigma$ is less than $\gamma > 0$, there is no real $\omega$ such that $\Phi_\gamma(i\omega)$ is singular; equivalently, $\Phi_\gamma^{-1}(s)$ has no pure imaginary poles. Let
$$\Sigma_{\Phi_\gamma} = \left(\begin{array}{c|c} \tilde{\mathbf{A}}(\gamma) & \tilde{\mathbf{B}}(\gamma) \\ \hline \tilde{\mathbf{C}}(\gamma) & \tilde{\mathbf{D}}(\gamma) \end{array}\right)$$
be a realization of $\Phi_\gamma^{-1}$. If $\mathbf{D} = 0$,
$$\tilde{\mathbf{A}}(\gamma) = \begin{pmatrix} \mathbf{A} & \frac{1}{\gamma}\mathbf{B}\mathbf{B}^* \\ -\frac{1}{\gamma}\mathbf{C}^*\mathbf{C} & -\mathbf{A}^* \end{pmatrix} \in \mathbb{R}^{2n\times 2n}. \tag{5.17}$$
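The relation $\mathbf{H}_{\Sigma^*}(i\omega) = \mathbf{H}(i\omega)^*$ on the imaginary axis is easy to check numerically. The sketch below is not from the book; it uses the third-order Butterworth realization appearing in the example of the next subsection as an arbitrary stable test case with $\mathbf{D} = 0$.

```python
import numpy as np

# Sample stable realization (third-order Butterworth filter, D = 0)
A = np.array([[-1., 0., 0.], [1., -1., -1.], [0., 1., 0.]])
B = np.array([[1.], [0.], [0.]])
C = np.array([[0., 0., 1.]])
I = np.eye(3)

H = lambda s: C @ np.linalg.solve(s * I - A, B)            # H(s) = C(sI - A)^{-1} B
Hadj = lambda s: -B.T @ np.linalg.solve(s * I + A.T, C.T)  # H_{Sigma*}(s) = -B*(sI + A*)^{-1} C*

w = 0.7  # any real frequency
print(np.allclose(Hadj(1j * w), H(1j * w).conj().T))  # adjoint equals H(iw)^* on the axis
```

The same check fails off the imaginary axis, which is consistent with the adjoint being defined with respect to the $\mathcal{L}_2(\mathbb{R})$ inner product.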
Remark 5.2.1. (a) According to the computations above, two systems with inputs $\mathbf{u}$, $\tilde{\mathbf{u}}$ and outputs $\mathbf{y}$, $\tilde{\mathbf{y}}$, whose corresponding signals are elements of the same spaces, are adjoints of each other if, and only if, they admit minimal state-space representations such that the states $\mathbf{x}$ and $\mathbf{p}$ satisfy the instantaneous condition
$$\frac{d}{dt}\, \mathbf{p}^*(t)\mathbf{x}(t) = \tilde{\mathbf{u}}^*(t)\mathbf{u}(t) - \tilde{\mathbf{y}}^*(t)\mathbf{y}(t);$$
in other words, for every fixed $t$, the derivative of the usual inner product of the state of the original system and that of the adjoint system is equal to the difference of the inner products between $\tilde{\mathbf{u}}$, $\mathbf{u}$ and $\tilde{\mathbf{y}}$, $\mathbf{y}$.

(b) Given the state-space representation of the original system, $\dot{\mathbf{x}} = \mathbf{A}\mathbf{x} + \mathbf{B}\mathbf{u}$, $\mathbf{y} = \mathbf{C}\mathbf{x} + \mathbf{D}\mathbf{u}$, one for the adjoint system is
$$\dot{\mathbf{p}} = -\mathbf{A}^*\mathbf{p} - \mathbf{C}^*\tilde{\mathbf{y}},\qquad \tilde{\mathbf{u}} = \mathbf{B}^*\mathbf{p} + \mathbf{D}^*\tilde{\mathbf{y}}.$$
A straightforward calculation shows that the relationship in (a) holds.

(c) According to the above considerations, the adjoint of the autonomous system $\frac{d}{dt}\mathbf{x}(t) = \mathbf{A}\mathbf{x}(t)$ is $\frac{d}{dt}\mathbf{p}(t) = -\mathbf{A}^*\mathbf{p}(t)$. It follows that $\frac{d}{dt}[\mathbf{p}^*(t)\mathbf{x}(t)] = 0$, which means that $\mathbf{p}^*(t)\mathbf{x}(t) = \mathbf{p}^*(t_0)\mathbf{x}(t_0)$ for all $t \geq t_0$.

(d) Discrete-time systems: the adjoint can be defined by means of the transfer function, as shown above. For details see [287], [119].

5.2.3 Computation of the 2-induced or H-infinity norm

According to (5.16), the 2-induced norm of a stable $\Sigma$ coincides with its $\mathcal{H}_\infty$ norm. Its computation can be based on the following result.

Proposition 5.3. The $\mathcal{H}_\infty$ norm of $\Sigma$ is less than $\gamma$ if, and only if, $\tilde{\mathbf{A}}(\gamma)$ has no eigenvalues on the imaginary axis.
The proof of this proposition is left as an exercise. The above fact can be used to compute the $\mathcal{H}_\infty$ norm of $\Sigma$ to any given accuracy by means of the bisection algorithm. We proceed as follows. For a sufficiently large $\bar\gamma$, the Hamiltonian $\tilde{\mathbf{A}}(\bar\gamma)$ has no pure imaginary eigenvalues, while for a sufficiently small $\underline\gamma$, it does. The algorithm used to find an approximation of the $\mathcal{H}_\infty$ norm consists in bisecting the interval $[\underline\gamma, \bar\gamma]$: let $\tilde\gamma = (\underline\gamma + \bar\gamma)/2$. If $\tilde{\mathbf{A}}(\tilde\gamma)$ has imaginary eigenvalues, the interval above is replaced by the one where $\underline\gamma = \tilde\gamma$; otherwise $\bar\gamma = \tilde\gamma$. Both of these intervals have half the length of the previous interval. The procedure continues until the difference $\bar\gamma - \underline\gamma$ is sufficiently small.

Therefore the computation of the $\mathcal{H}_\infty$ norm requires the (repeated) computation of the eigenvalues of structured, in this case Hamiltonian, matrices. It can be shown that $\tilde{\mathbf{A}} = \tilde{\mathbf{A}}(\gamma)$ satisfies
$$(\tilde{\mathbf{A}}\mathbf{J})^* = \tilde{\mathbf{A}}\mathbf{J},\qquad \mathbf{J} = \begin{pmatrix} 0 & \mathbf{I} \\ -\mathbf{I} & 0 \end{pmatrix};$$
such matrices are called Hamiltonian matrices. The sensitivity of their eigenvalues, especially close to the $\mathcal{H}_\infty$ norm, can be very high; to remedy this situation, a procedure for computing the infinity norm which bypasses this problem has been proposed in [74].

Example. Consider the continuous-time third-order Butterworth filter. The state-space matrices are
$$\mathbf{A} = \begin{pmatrix} -1 & 0 & 0 \\ 1 & -1 & -1 \\ 0 & 1 & 0 \end{pmatrix},\quad \mathbf{B} = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix},\quad \mathbf{C} = (0\ \ 0\ \ 1),\quad \mathbf{D} = 0,$$
and the Hamiltonian (5.17) is
$$\tilde{\mathbf{A}}(\gamma) = \begin{pmatrix} -1 & 0 & 0 & \frac{1}{\gamma} & 0 & 0 \\ 1 & -1 & -1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 0 & 1 & -1 \\ 0 & 0 & -\frac{1}{\gamma} & 0 & 1 & 0 \end{pmatrix}.$$
As shown below by examining the eigenvalues of $\tilde{\mathbf{A}}(\gamma)$, the $\mathcal{H}_\infty$ norm of this filter (which is the 2-norm of the associated convolution operator $\mathcal{S}$) is equal to 1.
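The bisection scheme just described can be sketched in a few lines of code. This is a bare illustration, not the robust algorithm of [74]; in particular, the tolerance used to declare an eigenvalue "imaginary" is an ad-hoc choice of this sketch.

```python
import numpy as np

# Third-order Butterworth filter of the example (H-infinity norm = 1)
A = np.array([[-1., 0., 0.], [1., -1., -1.], [0., 1., 0.]])
B = np.array([[1.], [0.], [0.]])
C = np.array([[0., 0., 1.]])

def hamiltonian(g):
    # Hamiltonian A~(gamma) of (5.17), for D = 0
    return np.block([[A, (1 / g) * B @ B.T], [(-1 / g) * C.T @ C, -A.T]])

def has_imaginary_eig(g, tol=1e-7):
    # gamma lies below the H-infinity norm iff an eigenvalue sits on the imaginary axis
    return bool(np.any(np.abs(np.linalg.eigvals(hamiltonian(g)).real) < tol))

def hinf_bisection(lo=1e-2, hi=10.0, tol=1e-4):
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if has_imaginary_eig(mid) else (lo, mid)
    return 0.5 * (lo + hi)

print(round(hinf_bisection(), 3))  # ≈ 1.0 for the Butterworth filter
```

Note that for this example the eigenvalues of $\tilde{\mathbf{A}}(\gamma)$ are the roots of $s^6 = 1 - \gamma^{-2}$, so an imaginary pair exists exactly for $\gamma < 1$, and the bisection converges to 1.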
Indeed, a direct computation shows that for this example
$$\Phi_\gamma(s) = 1 + \frac{1}{\gamma^2(s^6-1)},$$
so that the eigenvalues of $\tilde{\mathbf{A}}(\gamma)$ are the six roots of $s^6 = 1 - \gamma^{-2}$. The locus of the eigenvalues of the Hamiltonian as a function of the parameter $\gamma \geq 0$ is shown in Figure 5.2. For $\gamma = \infty$, the six eigenvalues lie on the unit circle, at the locations indicated by circles. As $\gamma$ decreases, the eigenvalues move in straight lines towards the origin; they meet at the origin for $\gamma = 1$. For smaller values of $\gamma$, the eigenvalues move again in straight lines away from the origin, with two of them traversing the imaginary axis in the positive and negative directions, respectively. As $\gamma$ approaches 0, the six eigenvalues go to infinity. Consequently $\tilde{\mathbf{A}}(\gamma)$ has purely imaginary eigenvalues precisely for $\gamma < 1$, and the $\mathcal{H}_\infty$ norm is equal to 1.

It is worth examining the sensitivity of the eigenvalues as a function of $\gamma$. $\tilde{\mathbf{A}}(1)$ has six zero eigenvalues, while the dimension of its null space is one; $\tilde{\mathbf{A}}(1)$ is therefore similar to a Jordan block of size six. Thus at the bifurcation point of the locus we have very high sensitivity: a perturbation in $\gamma$ of the order $10^{-8}$ will result in a perturbation of the eigenvalues of the order $10^{-3}$.

If $\mathbf{D} \neq 0$ and $\gamma^2\mathbf{I} - \mathbf{D}\mathbf{D}^*$ is nonsingular, the Hamiltonian (5.17) generalizes to
$$\tilde{\mathbf{A}}(\gamma) = \begin{pmatrix} \mathbf{F} & \mathbf{G} \\ -\mathbf{H} & -\mathbf{F}^* \end{pmatrix} \in \mathbb{R}^{2n\times 2n}, \tag{5.18}$$
where
$$\mathbf{F} = \mathbf{A} + \gamma\,\mathbf{B}\mathbf{D}^*(\gamma^2\mathbf{I}-\mathbf{D}\mathbf{D}^*)^{-1}\mathbf{C},\quad \mathbf{G} = \gamma\,\mathbf{B}(\gamma^2\mathbf{I}-\mathbf{D}^*\mathbf{D})^{-1}\mathbf{B}^*,\quad \mathbf{H} = \gamma\,\mathbf{C}^*(\gamma^2\mathbf{I}-\mathbf{D}\mathbf{D}^*)^{-1}\mathbf{C}.$$

5.4 The Hankel operator and its spectra

Recall section 4.1 and, in particular, the definition of the convolution operator $\mathcal{S}$ (4.5). In this section we first define the Hankel operator $\mathcal{H}$, which is obtained from $\mathcal{S}$ by restricting its domain and codomain; then both the eigenvalues and the singular values of $\mathcal{H}$ are determined. A general reference on Hankel operators is [261].
[Figure 5.2. Locus of the eigenvalues of the Hamiltonian as a function of $\gamma$ (root locus in the complex plane).]

Given a linear, time-invariant, not necessarily causal system, the convolution operator $\mathcal{S}$ induces an operator which is of interest in the theory of linear systems: the Hankel operator $\mathcal{H}$, defined as follows:
$$\mathcal{H}: \ell_2^m(\mathbb{Z}_-) \to \ell_2^p(\mathbb{Z}_+),\quad \mathbf{u}_- \mapsto \mathbf{y}_+,\quad \text{where } \mathbf{y}_+(t) = \mathcal{H}(\mathbf{u}_-)(t) = \sum_{k=-\infty}^{-1} \mathbf{h}_{t-k}\, \mathbf{u}_-(k),\ t \geq 0. \tag{5.19}$$
The matrix representation of $\mathcal{H}$ is given by the lower-left block of the matrix representation (4.4) of $\mathcal{S}$:
$$\begin{pmatrix} \mathbf{y}(0) \\ \mathbf{y}(1) \\ \mathbf{y}(2) \\ \vdots \end{pmatrix} = \underbrace{\begin{pmatrix} \mathbf{h}_1 & \mathbf{h}_2 & \mathbf{h}_3 & \cdots \\ \mathbf{h}_2 & \mathbf{h}_3 & \mathbf{h}_4 & \cdots \\ \mathbf{h}_3 & \mathbf{h}_4 & \mathbf{h}_5 & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix}}_{\mathcal{H}} \begin{pmatrix} \mathbf{u}(-1) \\ \mathbf{u}(-2) \\ \mathbf{u}(-3) \\ \vdots \end{pmatrix}.$$
Thus $\mathcal{H}$ turns out to be the same matrix as the Hankel matrix defined by (4.63) on page 77. Moreover, for $t \geq 0$ we have
$$\mathbf{y}(t) = \sum_{j=-\infty}^{-1} \mathbf{C}\mathbf{A}^{t-j-1}\mathbf{B}\,\mathbf{u}(j) = \mathbf{C}\mathbf{A}^{t}\,\mathbf{x}(0),\qquad \mathbf{x}(0) = \sum_{j=1}^{\infty} \mathbf{A}^{j-1}\mathbf{B}\,\mathbf{u}(-j).$$
Briefly, the properties of this operator are the following: it has finite rank (at most $n$); the rank equals $n$ if, and only if, $\Sigma$ is reachable and observable; and, if in addition the system is stable, its 2-induced norm satisfies
$$\|\mathcal{H}\|_{2\text{-ind}} = \sigma_{\max}(\mathcal{H}) = \|\Sigma\|_{\mathcal{H}} \leq \|\Sigma\|_\infty.$$
The quantity $\|\Sigma\|_{\mathcal{H}}$ is called the Hankel norm of the system $\Sigma$. By combining the discrete-time versions of propositions 5.1 and 5.2, it follows that the 2-induced norm of $\mathcal{S}$ is equal to the $h_\infty$ Schatten norm (page 29) of its transform, which is the transfer function $\mathbf{H}_\Sigma$; in this case $\|\Sigma\|_\infty$ above can be replaced by $\|\Sigma\|_{h_\infty}$, and we often refer to this quantity as the $h_\infty$ norm of $\Sigma$. Finally, since the Hankel operator of $\Sigma$ maps past inputs into future outputs and has a finite set of nonzero singular values, its eigenvalues and singular values can be computed by means of finite-dimensional quantities, namely the three gramians,
as shown in the next section.
In continuous time the analogous statement involves the transform of the kernel, which is the transfer function $\mathbf{H}_\Sigma$. Given a linear, time-invariant, continuous-time, not necessarily causal system, the convolution operator $\mathcal{S}$ induces a Hankel operator $\mathcal{H}$ defined as follows:
$$\mathcal{H}: \mathcal{L}_2^m(\mathbb{R}_-) \to \mathcal{L}_2^p(\mathbb{R}_+),\quad \mathbf{u}_- \mapsto \mathbf{y}_+,\quad \text{where } \mathbf{y}_+(t) = \mathcal{H}(\mathbf{u}_-)(t) = \int_{-\infty}^{0} \mathbf{h}(t-\tau)\, \mathbf{u}_-(\tau)\, d\tau,\ t \geq 0. \tag{5.20}$$
Sometimes this integral is written in the form $\mathbf{y}_+(t) = \int_t^\infty \mathbf{h}(\tau)\,\mathbf{u}_-(t-\tau)\, d\tau$. As in the discrete-time case, if the system is stable, its $\mathcal{L}_2$-induced norm satisfies
$$\|\mathcal{H}\|_{\mathcal{L}_2\text{-ind}} = \sigma_{\max}(\mathcal{H}) = \|\Sigma\|_{\mathcal{H}} \leq \|\mathcal{S}\| = \|\hat{\mathbf{h}}\|_{\mathcal{L}_\infty}. \tag{5.21}$$
The quantity $\|\Sigma\|_{\mathcal{H}}$ is called the Hankel norm of the system $\Sigma$. Furthermore, by combining propositions 5.1 and 5.2, it follows that the $\mathcal{L}_2$-induced norm of the system defined by the kernel $\mathbf{h}$ is equal to the $\mathcal{H}_\infty$ Schatten norm of its transform, whenever defined, and we often refer to this quantity as the $\mathcal{H}_\infty$ norm of $\Sigma$.

Remark 5.4.1. The Hankel operator and the Hankel matrix. We have defined three objects which carry the name Hankel and are denoted, by abuse of notation, with the same symbol $\mathcal{H}$. The first is the Hankel matrix defined by (4.63), page 77, for both discrete- and continuous-time systems. The second is the Hankel operator (5.19), page 114, defined for discrete-time systems; the Hankel matrix is precisely the matrix representation (in the canonical basis) of this discrete-time Hankel operator. The third object is the Hankel operator for continuous-time systems, defined by (5.20) above. The Hankel matrix and the continuous-time Hankel operator are not related; in particular, the eigenvalues of the former are not the same as those of the latter.

Figure 5.3 illustrates the action of the Hankel operator: past inputs are mapped into future outputs. The operator associated with the second-order system with transfer function $\mathbf{H}(s) = \frac{s+1}{s^2+1}$ is applied to a square pulse of unit amplitude acting from $t = -1$ until $t = 0$ (upper plot), and
the resulting response for t > 0 is depicted (lower plot). The Hankel operator and the Hankel matrix.1 and 5.20) above. Figure 5. Furthermore the Hankel matrix and the continuoustime Hankel operator are not related.3 illustrates the action of the Hankel operator.and continuoustime systems.1 −2 −1 0 2 4 6 8 10 12 Figure 5. asserts that in this case.20) Sometimes this integral is written in the form y+ (t) = codomain Lp 2 (R+ ). Lower pane: Resulting output considered for t > 0. As in the discretetime case. the second is the Hankel operator (5. by combining propositions 5.63). Input: square pulse applied between t = −1 and t = 0 1 0. lemma 5. deﬁned for discretetime systems.i i 5.
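The computation behind Figure 5.3 can be sketched numerically: the past input enters only through the state $\mathbf{x}(0)$, which then drives the free output response for $t > 0$. The realization below is one minimal realization of $\mathbf{H}(s) = (s+1)/(s^2+1)$, and the crude Riemann-sum quadrature is an assumption of this sketch, not taken from the text.

```python
import numpy as np
from scipy.linalg import expm

# Minimal realization of H(s) = (s+1)/(s^2+1)
A = np.array([[0., 1.], [-1., 0.]])
B = np.array([[0.], [1.]])
C = np.array([[1., 1.]])

# Past input: unit square pulse on [-1, 0].  The Hankel operator maps it to the
# future output y(t) = C e^{At} x(0), where x(0) = integral_{-1}^{0} e^{-A tau} B dtau.
taus = np.linspace(-1.0, 0.0, 2001)
dtau = taus[1] - taus[0]
x0 = sum(expm(-A * tau) @ B for tau in taus) * dtau  # Riemann-sum quadrature

t = np.linspace(0.0, 12.0, 1201)
y = np.array([(C @ expm(A * ti) @ x0).item() for ti in t])
print(y.min(), y.max())  # sustained oscillation, as in the lower plot of Figure 5.3
```

Since the poles of this example lie on the imaginary axis, the future output is a pure sinusoid of constant amplitude, which is exactly the behavior visible in the lower pane of the figure.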
5.4.1 Computation of the eigenvalues of H

We begin by computing the eigenvalues of the Hankel operator; this problem makes sense only for square systems, $m = p$. The first result asserts that in this case the (nonzero) eigenvalues of the Hankel operator are equal to the eigenvalues of the cross gramian. Second, it is shown (lemma 5.6) that the (nonzero) singular values of the Hankel operator are equal to the square roots of the eigenvalues of the product of the reachability and observability gramians. First, we need a definition. Recall that the Hankel operator maps past inputs into future outputs, and that the cross gramian was defined in section 4.3.2 for discrete-time systems. The cross gramian $\mathcal{X}$ of a continuous-time square system $\Sigma$ ($m = p$) is defined as the solution $\mathcal{X}$ of the Sylvester equation
$$\mathbf{A}\mathcal{X} + \mathcal{X}\mathbf{A} + \mathbf{B}\mathbf{C} = 0. \tag{5.22}$$

Definition 5.5. The Hankel singular values of the stable system $\Sigma$ are the singular values of the Hankel operator $\mathcal{H}_\Sigma$, denoted by
$$\sigma_1(\Sigma) > \cdots > \sigma_q(\Sigma),$$
each with multiplicity $m_i$, $i = 1, \dots, q$, $\sum_{i=1}^q m_i = n$. The Hankel norm of $\Sigma$ is the largest Hankel singular value: $\|\Sigma\|_{\mathcal{H}} = \sigma_1(\Sigma)$. The Hankel operator of a not necessarily stable system $\Sigma$ is defined as the Hankel operator of its stable and causal part $\Sigma_+$: $\mathcal{H}_\Sigma = \mathcal{H}_{\Sigma_+}$.

Lemma. For square minimal systems $\Sigma$, the nonzero eigenvalues of the Hankel operator $\mathcal{H}$ are equal to the eigenvalues of the cross gramian $\mathcal{X}$.

Proof. The eigenvalue problem of $\mathcal{H}$ for square systems is $\mathcal{H}(\mathbf{u}_-) = \lambda\,\mathbf{y}_+$, where $\mathbf{y}_+(t) = \mathbf{u}_-(-t)$. Let the function $\mathbf{u}_-$ be an eigenfunction of $\mathcal{H}$. Then
$$\int_{-\infty}^{0} \mathbf{C}e^{\mathbf{A}(t-\tau)}\mathbf{B}\,\mathbf{u}_-(\tau)\,d\tau = \lambda\,\mathbf{u}_-(-t),\ t \geq 0 \;\Rightarrow\; \mathbf{C}e^{\mathbf{A}t}\,\boldsymbol{\xi} = \lambda\,\mathbf{u}_-(-t),\quad \boldsymbol{\xi} = \int_{-\infty}^{0} e^{-\mathbf{A}\tau}\mathbf{B}\,\mathbf{u}_-(\tau)\,d\tau.$$
Substituting $\mathbf{u}_-(-t) = \lambda^{-1}\mathbf{C}e^{\mathbf{A}t}\boldsymbol{\xi}$ into the defining integral of $\boldsymbol{\xi}$ yields
$$\boldsymbol{\xi} = \int_0^\infty e^{\mathbf{A}t}\mathbf{B}\,\mathbf{u}_-(-t)\,dt = \frac{1}{\lambda}\int_0^\infty e^{\mathbf{A}t}\mathbf{B}\,\mathbf{C}e^{\mathbf{A}t}\,dt\ \boldsymbol{\xi};$$
the integral appearing here is precisely the cross gramian of $\Sigma$.
Here $\mathbf{h}(t) = \mathbf{C}e^{\mathbf{A}t}\mathbf{B}$, $t \geq 0$, is the impulse response of $\Sigma$, and, if $\mathbf{A}$ is stable, it is well known that the solution of the Sylvester equation (5.22) can be written as
$$\mathcal{X} = \int_0^\infty e^{\mathbf{A}t}\,\mathbf{B}\mathbf{C}\,e^{\mathbf{A}t}\,dt. \tag{4.59}$$
We thus have $\mathcal{X}\boldsymbol{\xi} = \lambda\boldsymbol{\xi}$, which shows that if $\lambda$ is a nonzero eigenvalue of $\mathcal{H}$, it is also an eigenvalue of $\mathcal{X}$.

Conversely, let $(\lambda, \mathbf{v})$, $\mathbf{v} \in \mathbb{R}^n$, be an eigenpair of $\mathcal{X}$, i.e. $\mathcal{X}\mathbf{v} = \lambda\mathbf{v}$:
$$\int_0^\infty e^{\mathbf{A}\tau}\mathbf{B}\mathbf{C}e^{\mathbf{A}\tau}\,d\tau\ \mathbf{v} = \lambda\mathbf{v} \;\Rightarrow\; \mathbf{C}e^{\mathbf{A}t}\int_{-\infty}^{0} e^{-\mathbf{A}\tau}\mathbf{B}\,\tilde{\mathbf{u}}(\tau)\,d\tau = \lambda\,\mathbf{C}e^{\mathbf{A}t}\mathbf{v} = \lambda\,\tilde{\mathbf{u}}(-t),\ t \geq 0,$$
where $\tilde{\mathbf{u}}(\tau) = \mathbf{C}e^{-\mathbf{A}\tau}\mathbf{v}$, $\tau \leq 0$. In other words $\mathcal{H}(\tilde{\mathbf{u}})(t) = \lambda\,\tilde{\mathbf{u}}(-t)$, i.e. $\tilde{\mathbf{u}}$ is an eigenfunction of $\mathcal{H}$ corresponding to the eigenvalue $\lambda$. The proof is thus complete.
5.4.2 Computation of the singular values of H

The discrete-time case. We start by computing the singular values of the Hankel operator $\mathcal{H}$ defined by (5.19), page 114. In the sequel it is assumed that the underlying system is finite-dimensional, i.e. the rank of the Hankel matrix derived from the corresponding Markov parameters $\mathbf{h}_t$ of $\Sigma$ is finite; this holds if, and only if, there exists a triple $(\mathbf{C}, \mathbf{A}, \mathbf{B})$ such that $\mathbf{h}_t = \mathbf{C}\mathbf{A}^{t-1}\mathbf{B}$. Because of the factorization of $\mathcal{H}$ given in lemma 4.32, page 78, we have $\mathcal{H} = \mathcal{O}\mathcal{R}$, where $\mathcal{O}$, $\mathcal{R}$ are the infinite observability and
reachability matrices, respectively. Moreover $\mathcal{R}\mathcal{R}^* = \mathcal{P}$ is the reachability and $\mathcal{O}^*\mathcal{O} = \mathcal{Q}$ the observability gramian of the system, obtained by solving the discrete Lyapunov equations (4.47), (4.48) on page 69. Therefore $\mathcal{H}$, in contrast to $\mathcal{S}$, has a discrete set of singular values. Let $\mathbf{v}$ be an eigenvector of $\mathcal{H}^*\mathcal{H}$ corresponding to the eigenvalue $\sigma^2$. Since $\mathcal{H}^*\mathcal{H} = \mathcal{R}^*\mathcal{O}^*\mathcal{O}\mathcal{R}$,
$$\mathcal{R}^*\mathcal{O}^*\mathcal{O}\mathcal{R}\,\mathbf{v} = \sigma^2\mathbf{v} \;\Rightarrow\; \underbrace{\mathcal{R}\mathcal{R}^*}_{\mathcal{P}}\,\underbrace{\mathcal{O}^*\mathcal{O}}_{\mathcal{Q}}\,(\mathcal{R}\mathbf{v}) = \sigma^2\,(\mathcal{R}\mathbf{v}),$$
which implies that if $\mathbf{v}$ is an eigenvector of $\mathcal{H}^*\mathcal{H}$, then $\mathcal{R}\mathbf{v}$ is an eigenvector of the product of the two gramians $\mathcal{P}\mathcal{Q}$. Conversely, let $\mathcal{P}\mathcal{Q}\mathbf{w} = \sigma^2\mathbf{w}$; then
$$\mathcal{O}\mathcal{P}\mathcal{Q}\mathbf{w} = \sigma^2\mathcal{O}\mathbf{w} \;\Rightarrow\; \underbrace{\mathcal{O}\mathcal{R}}_{\mathcal{H}}\,\underbrace{\mathcal{R}^*\mathcal{O}^*}_{\mathcal{H}^*}\,(\mathcal{O}\mathbf{w}) = \sigma^2\,(\mathcal{O}\mathbf{w}),$$
and therefore $\sigma^2$ is an eigenvalue of $\mathcal{H}\mathcal{H}^*$ with corresponding eigenvector $\mathcal{O}\mathbf{w}$. We conclude that the (nonzero) singular values of the Hankel operator $\mathcal{H}$ are the square roots of the eigenvalues of the product of the infinite gramians $\mathcal{P}$ and $\mathcal{Q}$ of the system.

The continuous-time case. In order to compute the singular values of $\mathcal{H}$ we need its adjoint $\mathcal{H}^*$. For continuous-time systems the adjoint is defined as follows:
$$\mathcal{H}^*: \mathcal{L}^p(\mathbb{R}_+) \to \mathcal{L}^m(\mathbb{R}_-),\quad \mathbf{y}_+ \mapsto \mathbf{u}_- = \mathcal{H}^*(\mathbf{y}_+),\quad \mathbf{u}_-(t) = \int_0^\infty \mathbf{h}^*(\tau - t)\,\mathbf{y}(\tau)\,d\tau,\ t \leq 0. \tag{5.23}$$
It follows that
$$(\mathcal{H}\mathbf{u})(t) = \int_{-\infty}^{0} \mathbf{h}(t-\tau)\mathbf{u}(\tau)\,d\tau = \mathbf{C}e^{\mathbf{A}t}\int_{-\infty}^{0} e^{-\mathbf{A}\tau}\mathbf{B}\mathbf{u}(\tau)\,d\tau = \mathbf{C}e^{\mathbf{A}t}\,\mathbf{x}_i,\qquad \mathbf{x}_i \in \mathbb{R}^n,$$
and moreover
$$(\mathcal{H}^*\mathcal{H}\mathbf{u})(t) = \mathbf{B}^*e^{-\mathbf{A}^*t}\left(\int_0^\infty e^{\mathbf{A}^*\sigma}\mathbf{C}^*\mathbf{C}e^{\mathbf{A}\sigma}\,d\sigma\right)\mathbf{x}_i = \mathbf{B}^*e^{-\mathbf{A}^*t}\,\mathcal{Q}\,\mathbf{x}_i,$$
where the expression in parenthesis is $\mathcal{Q}$, the infinite observability gramian defined by (4.44). The requirement for $\mathbf{u}$ to be an eigenfunction of $\mathcal{H}^*\mathcal{H}$ is that this last expression be equal to $\sigma_i^2\mathbf{u}(t)$. This implies
$$\mathbf{u}(t) = \frac{1}{\sigma_i^2}\,\mathbf{B}^*e^{-\mathbf{A}^*t}\,\mathcal{Q}\,\mathbf{x}_i,\qquad t \leq 0.$$
Substituting $\mathbf{u}$ in the expression for $\mathbf{x}_i$ and recalling the definition (4.43) of the infinite reachability gramian $\mathcal{P}$, we obtain
$$\frac{1}{\sigma_i^2}\left(\int_{-\infty}^{0} e^{-\mathbf{A}\tau}\mathbf{B}\mathbf{B}^*e^{-\mathbf{A}^*\tau}\,d\tau\right)\mathcal{Q}\,\mathbf{x}_i = \mathbf{x}_i \;\Rightarrow\; \mathcal{P}\mathcal{Q}\,\mathbf{x}_i = \sigma_i^2\,\mathbf{x}_i.$$
We conclude that $\sigma$ is a singular value of the Hankel operator $\mathcal{H}$ if, and only if, $\sigma^2$ is an eigenvalue of the product of gramians $\mathcal{P}\mathcal{Q}$. In summary we have:
Lemma 5.6. Given a reachable, observable, and stable discrete- or continuous-time system $\Sigma$ of dimension $n$, the Hankel singular values of $\Sigma$ are equal to the positive square roots of the eigenvalues of the product of gramians $\mathcal{P}\mathcal{Q}$:
$$\sigma_k(\Sigma) = \sqrt{\lambda_k(\mathcal{P}\mathcal{Q})},\qquad k = 1, \dots, n. \tag{5.24}$$

Remark 5.4.2. (a) For discrete-time systems, $\sigma_k(\Sigma) = \sigma_k(\mathcal{H})$: the singular values of the system are the singular values of the (block) Hankel matrix defined by (4.63). For continuous-time systems, however, the $\sigma_k(\Sigma)$ are the singular values of a continuous-time Hankel operator; they are not equal to the singular values of the associated matrix of Markov parameters. (b) It should be noticed that, following proposition 4.3, the Hankel singular values of a continuous-time stable system and those of a discrete-time stable system related by means of the bilinear transformation $s = \frac{z-1}{z+1}$ are the same.

The Hankel singular values and the cross gramian. If we are dealing with a symmetric and stable system $\Sigma$, its Hankel singular values can be computed using only one gramian, namely the cross gramian $\mathcal{X}$ defined in (5.22), (4.59). Recall that, according to lemma 4.29, a symmetric system always possesses a realization which satisfies (4.75); that is, there exists a symmetric invertible $\boldsymbol{\Psi}$ such that $\mathbf{A}\boldsymbol{\Psi} = \boldsymbol{\Psi}\mathbf{A}^*$ and $\mathbf{C}\boldsymbol{\Psi} = \mathbf{B}^*$. Substituting these relations in equation (5.22), we obtain
$$\mathbf{A}\mathcal{X} + \mathcal{X}\mathbf{A} + \mathbf{B}\mathbf{C} = 0 \;\Rightarrow\; \mathbf{A}\mathcal{X}\boldsymbol{\Psi} + \mathcal{X}\mathbf{A}\boldsymbol{\Psi} + \mathbf{B}\mathbf{C}\boldsymbol{\Psi} = 0 \;\Rightarrow\; \mathbf{A}[\mathcal{X}\boldsymbol{\Psi}] + [\mathcal{X}\boldsymbol{\Psi}]\mathbf{A}^* + \mathbf{B}\mathbf{B}^* = 0.$$
Since the Lyapunov equation (4.45) has a unique solution, we conclude that $\mathcal{P} = \mathcal{X}\boldsymbol{\Psi}$. Using a similar argument involving (4.46), we obtain $\mathcal{Q} = \boldsymbol{\Psi}^{-1}\mathcal{X}$. The reachability, observability, and cross gramians are thus related in this case:
$$\mathcal{P} = \mathcal{X}\boldsymbol{\Psi},\quad \mathcal{Q} = \boldsymbol{\Psi}^{-1}\mathcal{X} \;\Rightarrow\; \mathcal{X}^2 = \mathcal{P}\mathcal{Q}.$$
We have thus proved the following result, which also follows as a corollary of the lemma on the eigenvalues of the Hankel operator.

Proposition 5.7. Given a symmetric minimal system $\Sigma = \left(\begin{smallmatrix}\mathbf{A} & \mathbf{B}\\ \mathbf{C} & \end{smallmatrix}\right)$, the reachability, observability, and cross gramians satisfy
$$\mathcal{X}^2 = \mathcal{P}\mathcal{Q}. \tag{5.25}$$
It follows that the eigenvalues of $\mathcal{X}$ are the eigenvalues of the associated Hankel operator $\mathcal{H}$, and the absolute values of these eigenvalues are the Hankel singular values of $\Sigma$.
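Lemma 5.6 and Proposition 5.7 are easy to verify numerically for a SISO (hence symmetric) system. The sketch below uses the third-order Butterworth filter of the earlier example as an arbitrary test case; the equation solvers are standard dense routines, not the book's algorithms.

```python
import numpy as np
from scipy import linalg

# Third-order Butterworth filter (SISO, hence symmetric)
A = np.array([[-1., 0., 0.], [1., -1., -1.], [0., 1., 0.]])
B = np.array([[1.], [0.], [0.]])
C = np.array([[0., 0., 1.]])

# Gramians: A P + P A* + B B* = 0  and  A* Q + Q A + C* C = 0
P = linalg.solve_continuous_lyapunov(A, -B @ B.T)
Q = linalg.solve_continuous_lyapunov(A.T, -C.T @ C)

# Hankel singular values (5.24): square roots of the eigenvalues of P Q
hsv = np.sort(np.sqrt(np.linalg.eigvals(P @ Q).real))[::-1]

# Cross gramian (5.22): A X + X A + B C = 0; for a symmetric system X^2 = P Q  (5.25)
X = linalg.solve_sylvester(A, A, -B @ C)
print(hsv)
print(np.sort(np.abs(np.linalg.eigvals(X)))[::-1])  # same values
```

The trace identity discussed below, $2\,\mathrm{trace}\,\mathcal{X} = -\mathrm{trace}(\mathbf{C}\mathbf{A}^{-1}\mathbf{B})$, can be checked on the same data.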
If the system is not symmetric, the Hankel singular values $\sigma_i$ and the singular values $\pi_i$ of $\mathcal{X}$ satisfy the following majorization inequalities:
$$\sum_{i=1}^k \sigma_i \geq \sum_{i=1}^k \pi_i \quad\text{and}\quad \sum_{i=1}^k \pi_{n-i+1} \geq \sum_{i=1}^k \sigma_{n-i+1},\qquad k = 1, \dots, n.$$
If the Hankel operator is symmetric, the following majorization inequality holds:
$$\sum_{i=1}^k \sigma_i \leq \sum_{i=1}^k \frac{\lambda_i(\mathcal{P}) + \lambda_i(\mathcal{Q})}{2};$$
thus in the balanced basis (see chapter 7), $\sum_{i=1}^k \sigma_i \geq \sum_{i=1}^k \pi_i$.

Consider a square system $\Sigma = \left(\begin{smallmatrix}\mathbf{A} & \mathbf{B}\\ \mathbf{C} & \end{smallmatrix}\right)$ with cross gramian $\mathcal{X}$. Since the eigenvalues of $\mathcal{X}$ are the same as the eigenvalues of the Hankel operator $\mathcal{H}$, it follows that $\mathrm{trace}\,\mathcal{H} = \mathrm{trace}\,\mathcal{X}$. Furthermore, according to [250], there holds
$$2\,\mathrm{trace}\,\mathcal{H} = -\mathrm{trace}\left(\mathbf{C}\mathbf{A}^{-1}\mathbf{B}\right).$$
We conclude therefore that twice the trace of the cross gramian is equal to minus the trace of the transfer function of $\Sigma$ evaluated at $s = 0$. This fact can also be seen directly by taking the trace of $\mathcal{X} + \mathbf{A}^{-1}\mathcal{X}\mathbf{A} = -\mathbf{A}^{-1}\mathbf{B}\mathbf{C}$ (which follows from (5.22)): the trace of the left-hand side is equal to $2\,\mathrm{trace}\,\mathcal{X}$, while that of the right-hand side is equal to the desired $-\mathrm{trace}(\mathbf{C}\mathbf{A}^{-1}\mathbf{B})$.

5.4.3 The Hilbert-Schmidt norm

An operator $\mathcal{T}: \mathbf{X} \to \mathbf{Y}$, where $\mathbf{X}$, $\mathbf{Y}$ are Hilbert spaces, is a Hilbert-Schmidt operator if there exists a complete orthonormal sequence $\{w_n\} \in \mathbf{X}$ such that
$$\sum_{n>0} \|\mathcal{T}(w_n)\|^2 < \infty.$$
This property can be readily checked for integral
1. . y )dxdy is ﬁnite. 1 − a2 In this case.5. the HilbertSchmidt norm (which is equal to the b trace of H) is 1− a2 .4 The Hankel singular values of two related operators Given a system Σ = A C B D . . The integral operator b i “oa” 2004/4/14 page 119 i 119 T (w)(x) = a k(x. d]. In particular: κ2 = trace 0 ∞ 0 h∗ (t − τ )h(t − τ ) dτ dt −∞ Assuming that the system is stable. In this case κ is the HilbertSchmidt norm of I . since H is symmetric λ1 (H) = ±σ1 (H)..4. The HilbertSchmidt norm for SISO linear systems was given an interpretation in [165]. Following [365]. T ] → L2 [0. It readily follows that the convolution operator S deﬁned above is not HilbertSchmidt.8. this expression is equal to κ2 = trace 0 ∞ ∞ 0 −∞ B∗ e A ∗ ∗ (t−τ ) C∗ CeA(t−τ ) B dτ dt ∗ = trace 0 ∞ B∗ eA B∗ eA t 0 −∞ e−A τ B∗ Be−Aτ dτ eAt B dt ∞ 0 = trace 0 ∗ t Q eAt B dt = trace eAt BB∗ eA ∗ t 2 2 Q dt = trace [PQ] = σ1 + · · · + σn .4. discussed earlier. consider the operator T K : L2 [0. − ay (k ) = bu(k ). .i i 5. See also problem 58. such operators are compact and hence have a discrete spectrum. while the Hankel operator H is. page 357. . where σi are the Hankel singular values of the system Σ. . . y )k(x. Furthermore. multiplicities included. where each eigenvalue has ﬁnite multiplicity. 5. For the discretetime system y (k + 1) operator is: 1 a H = b a2 . x ∈ [c. a2 a3 a4 . An illustration of this result is given in example 5.4. if its kernel k is square integrable in both variables: κ2 = trace a b c d k∗ (x. y )w(y )dy. the Hankel a a2 a3 . . T ]. a < 1. singleoutput stable system. which turns out to be σ1 (H) = b  . πκ2 is equal to the area of the Nyquist plot of the associated transfer function. is a HilbertSchmidt operator. . Proposition 5. The Hankel operator and its spectra operators T . K(u) = 0 ACA h(t − τ )u(τ )dτ April 14. Example 5.2. 2004 i i i i . The result asserts the following. Given a singleinput. This operator has a single nonzero singular value. 
and the only possible accumulation point is zero. ··· ··· ··· .
121. For the generalization to the case D = 0. in particular. Figure 5. [368. The bisection search on the interval (0. which conﬁrms the singular values in the table above.17). 1 − ce−s s+1 The singular values of the associated Hankel operator are characterized by the above ∆(γ ) = 0. with conjecture for the general case γmax = inf q∈H∞ ˆ − m(s)q h ∞ ˜ (γ )T ) by m(−A ˜ (γ )).4. γ thus (1 + c2 ) 2 c (1 − c2 ) (1 − 3ω 2 ) sin ω + ω − 3 cos ω − 3 − ω2 4 4 ω 2 The Newton iteration can be applied. A second instance of the existence of an exact formula for the Hankel singular values is given in [253].g. H(s) = . ˜ (γ ) is 2 × 2. which yields the Hankel singular values shown below. The where m is a general (real) inner function. the determinant of ˜ (γ )T is zero: the (2. 363]) that γ > 0 is a singular value of K if. This paper shows that the Hankel singular values of a system whose transfer function is given by the product of a scalar inner (stable and allpass) function and of a rational MIMO function can be determined using an appropriately deﬁned Hamiltonian. [363] gave a completely basisfree proof. where φ(s) = c < 1 e− s − c 1 . Linear dynamical systems: Part II i “oa” 2004/4/14 page 120 i ˜ (γ ) The singular values of this operator can be determined as follows. By straightforward calculation. and ω = ∆= 1 Ω 1−γ 2 .17). Recall the deﬁnition of the Hamiltonian A deﬁned by (5.75.2. where ∆(γ ) = det I 0 1 −γ P 0 − 0 0 1 −γ Q I ˜ (γ )) φ(−A ˜ (γ ) is the associated Hamiltonian deﬁned by (5. and A Example 5.i i 120 Chapter 5. P . Since A ˜ (γ )) is equal to φ(A 1 Ω 1 1 + c2 cos ω − ω 1 − c2 sin ω − 2c 1 − γω 1 − c2 sin ω 1+c 2 √ 1 − c2 sin ω 1 cos ω + ω 1 − c2 sin ω − 2c 1 γω where Ω = 1 − 2c cos ω + c2 . 2 Q= 1 2 For this example φ(−F) = eF − cI I − ceF . 1) is employed for c = 0. where F= −1 1 −γ −1 1 1 γ .07. it is possible to compute the determinant in terms of γ . 
The conjecture amounts to replacing exp(A conjecture was resolved ﬁrst by [232] and later streamlined by [302. The Hankel singular values of the system whose transfer function is φ(s)HΣ (s) are the solutions of the following transcendental equation ∆(γ ) = 0 in γ . P= 1 . i i ACA : April 14. [254] The transfer function from u to y for the delay differential equation y ˙ (t) = −y (t) + cy (t − 1) + cy ˙ (t − 1) − cu(t) + u(t − 1).4 shows the determinant as a function of γ . and only if. 363] in a more generalized context.26) ˆ − e− T s q h ∞ This was ﬁrst solved in the above form in [368].. Let Σ = A C B 0 be given with transfer function HΣ (s). is ψ (s) = φ(s)H(s). It is known (e. 2004 i i . 2) block of the matrix exponential of A ˜ (γ )T )]22 = 0 det[exp(A This problem arises as the computation of the optimal sensitivity for a pure delay: γmax = inf q∈H∞ (5. Here are some details. We also note that the result played a crucial role in the solution of the sampleddata H∞ control problem [39]. together with φ(s) which is scalar and inner. Q are the gramians of Σ. see [302].
i i 5.5.0807 0.1470 0. 2004 i i i i . Thus using (4. only in the singleinput or singleoutput (m = 1 or p = 1) case. Furthermore in [85] it is shown that the largest eigenvalue λmax (CP C∗ ) is an induced norm. (5.9): Σ 2 H2 = ∞ 0 trace [h∗ (t)h(t)] dt = 1 2π ∞ −∞ trace [H∗ (−iω )H(iω )] dω where the second equality is a consequence of Parseval’s theorem. in case that the coefﬁcients of H are real H(s) = H(−s). the eigenvalues of A are in the open lefthalf of the complex plane. · · · . the H2 norm is an induced norm.2 A formula based on the EVD of A Continuoustime SISO Given is a stable SISO system Σ with transfer function H(s). · · · .0771 . Let ci be the corresponding residues: ci = H(s)(s − λi )s=λi . . The question reduces to whether the Frobenius norm is induced. Consequently. in this case there holds (5. . The following result holds ACA April 14. Plot of the determinant versus 0 ≤ γ ≤ 1 (right).5. The ﬁrst 6 Hankel singular values (left). 0.9 1 Figure 5. The H2 norm 10 i “oa” 2004/4/14 page 121 i 121 γ1 γ2 γ3 γ4 γ5 γ6 .6 0. (5. . i. 8 6 4 2 determinant 0 2 4 6 8 10 0 0. n. therefore Σ H2 = trace [B∗ QB] = trace [CP C∗ ] .2 0.4 0.5.6.1615 0. i = 1.e. n ci i = 1. The zero crossings are the Hankel singular values.44) we obtain Σ 2 H2 = ∞ 0 trace [B∗ eA t C∗ CeAt B]dt = trace [B∗ QB] ∗ Furthermore since trace [h∗ (t)h(t)] = trace [h(t)h∗ (t)]. This norm can be interpreted as the maximum amplitude of the output which results from ﬁnite energy input signals.5 The H2 norm 5.27) Similarly for discretetime systems. . We ﬁrst assume that all poles λi are distinct: λi = λj .1 0.4. In [84] it is shown that the Frobenius norm is not induced (see also [98]).8 0. n. 5. For details on the deﬁnition of this (nonequi) induced norm see section 5. Thus H(s) = i=1 s− λi .9228 0.3 0. using (4.7 0. 5. Therefore this norm is bounded only if D = 0 and Σ is stable. 
this last expression is also equal to trace [CP C∗ ].43).5 sigma 0.4465 0.28) A question which arises is whether the H2 norm is induced.1 A formula based on the gramians The H2 norm of the continuoustime system Σ = the time domain): Σ H2 = A C B D is deﬁned as the L2 norm of the impulse response (in h L2 . ∗ ∗ ∗ We will use the notation H(s) = H (−s).
2004 i i . let H(z ) = C(z I − A)−1 B = i=1 z− λi . Then n Σ 2 H2 = k=1 ck (−1)k−1 dk−1 H(s) (k − 1)! dsk−1 n = s=−λ k=1 dk−1 ck H(−s) (k − 1)! dsk−1 s= λ Continuoustime MIMO Assume that the matrix A is diagonalizable (i. C = [C1 C2 · · · Ck ]. j = 1. λn ]. · · · . i=1 (5. where In is the n × n identity matrix and Jn is ∗ the matrix with 1s on the superdiagonal and zeros elsewhere.29) Proof. n λi + λ∗ j Thus the square of the H2 norm is CP C∗ and the desired result follows. Thus. λk Ik ). and H(s) = C(sI − A)−1 B. The following realization for H holds A = diag [λ1 . The H2 norm of this system is n Σ 2 H2 = i=1 ci 1 H z 1 z z =λ∗ i We now consider the case where H has a single real pole λ of multiplicity n: H(z ) = c1 cn + ··· + n (z − λ) ( z − λ) In this case (as before) a minimal realization of G is A = λIn + Jn .e. Thus let in the eigenvector basis: ∗ ∗ ∗ A = diag (λ1 I1 . where In is the n × n identity matrix and Jn is the ∗ matrix with 1s on the superdiagonal and zeros elsewhere. C = [c1 · · · cn ] Therefore the reachability gramian is Pi. multiplicities are allowed but the algebraic and geometric multiplicity of eigenvalues is the same). We now consider the case where H has a single real pole λ of multiplicity n: H(s) = cn c1 + ··· + (s − λ)n ( s − λ) In this case a minimal realization of H is given by A = λIn + Jn . where Ij is the identity matrix. B = [1 · · · 1]∗ . Then n Σ 2 H2 = k=1 dk −1 ck (k − 1)! dsk−1 1 H z 1 z s=λ∗ i i ACA : April 14.j = 1 . B = [0 · · · 0 1] and C = [cn · · · c1 ]. Linear dynamical systems: Part II i “oa” 2004/4/14 page 122 i n n Σ 2 H2 = i=1 ci H(s)∗ s=λ∗ = i ci H(−λ∗ i ). i. · · · . · · · .i i 122 Chapter 5. B = [B∗ 1 B2 · · · Bk ] . Then the H2 norm is given by the expression k Σ 2 H2 = i=1 ∗ ∗ trace [H∗ (−λ∗ i )Bi Ci ] Discretetime SISO In this case we will consider systems whose transfer function is strictly proper. B = [0 · · · 0 1] and C = [cn · · · c1 ]. The constant or nonproper part can n ci be easily taken care of.
’s’). Lower left frame: impulse response and step response. this gives an area of 3π for the Nyquist diagram.5 shows the Hankel singular values.2 The step response and its total variation (sum of vertical lines) 1 1 0. continuoustime low–pass Butterworth ﬁlter. which veriﬁes the result mentioned earlier. 1 is the cutoff frequency and ’s’ indicates that it is a continuous time ﬁlter. where 16 is the order. the sum of all peaktovalley variations of the step response. the step response. the H2 norm is 0.6 0.2 0. this norm is equal to the 2.2 0 −0.6 7 6 7 6 7 6 7 6 5 4 4.8 0.6.2 0 5 10 15 20 25 30 35 40 45 50 −0.e. and the Nyquist diagram of this system.4 0.5197e − 17 3.2) on page 28: φ(t0 ) = f (t0 ) ACA q April 14. In this case the 2induced norm of the convolution and the Hankel operators is the same. Further induced norms of S and H* An example We consider a 16th order. f (t0 ) is a vector in Rn and we can deﬁne its norm in one of the different ways indicated in (3. b) ⊂ R.4 −0. This number is also equal to the total variation of the step response. in MATLAB it can be generated by using the command butter(16.8 0.1. 2004 i i i i . Upper right frame: Nyquist plot.6 0. 16th order Butterworth ﬁlter. Nyquist diagram: 16th order Butterworth filter 1 i “oa” 2004/4/14 page 123 i 123 2 6 6 6 6 6 6 6 6 6 6 4 9. 5.4789e − 04 3 2 7 6 7 6 7 6 7 6 7 6 7.2 0 5 10 15 20 25 30 35 40 45 50 Figure 5.9963e − 01 9.4 0. The Frobenius norm of the Hankel operator (also known as the HilbertSchmidt norm) is equal to 4.4 0. i.6 −0.2 0. this is equal to the peak amplitude of the output obtained for some input signal of unit largest amplitude.2 −0.2471e − 01 6. Notice that the Nyquist diagram winds 3 times around 0 drawing almost perfect circles. Upper left frame: Hankel singular values.6892e − 14 7.i i 5.1336e − 01 7.6 −0.6 Further induced norms of S and H* Consider a vectorvalued function of time f : I → Rn . also known as the peaktopeak gain is 2.2 0 0 −0. 
since the system is SISO.7116e − 02 1. Figure 5.2 1.4 −0.5646. The ∞. For a ﬁxed t = t0 .9163e − 01 9. The various norms are as follows: the H∞ norm is equal to 1.4 0.6103e − 08 5. Lower right frame: total variation of step response.5242e − 05 1.0359e − 02 8. in other words it is the maximum amplitude obtained by using unit energy (2norm) input signals. where I = (a.8 −0.6 0.5903e − 06 3.0738e − 10 4.8 0.1314. the impulse response.8 (HS norm)2 = 4 Real Axis Impulse and step responses for the Butterworth filter 1. This is a linear dynamical system with low– pass frequency characteristics.1111e − 12 1.2 0 0.8230e − 01 3. ∞ induced norm of the convolution operator.5. The remaining area is actually equal to one circle.2466e − 17 3 7 7 7 7 7 7 7 7 7 7 5 Imaginary Axis 0. thus the total area is estimated to 4π . ∞ induced norm.8 −1 −1 −0.6 0.
φ in turn is a scalar (and nonnegative) function of time t, and we can define its p-norm as follows:

‖φ‖_p = ( ∫_a^b [φ(t)]^p dt )^(1/p).

Combining these two formulae we get the (p, q) norm of f, where p is the temporal norm while q is the spatial norm. To distinguish this from the cases discussed earlier, we will use a different notation:

‖f‖_(p,q) = ( ∫_a^b ‖f(t)‖_q^p dt )^(1/p).    (5.30)

The (p, q) norm of a discrete-time vector-valued function, I ⊂ Z, can be defined similarly; the time integrals in the above expressions are replaced by (discrete) time sums. We can now define the corresponding (p, q) Lebesgue spaces:

L(p,q)(I, Rⁿ) = { f : I → Rⁿ : ‖f‖_(p,q) < ∞ }.

Recall the Hölder inequality (3.3) on page 28, valid for constant vectors. The generalized Hölder inequality valid for vector-valued time signals becomes:

|⟨f, g⟩| ≤ ‖f‖_(p,q) · ‖g‖_(p̄,q̄), where p⁻¹ + p̄⁻¹ = 1, q⁻¹ + q̄⁻¹ = 1.    (5.31)

Consider the map T : L(p,q)(Rᵐ) → L(r,s)(Rᵖ). With these generalized definitions, the associated induced operator norm is

‖T‖_(p,q)→(r,s) = sup_{u≠0} ‖y‖_(r,s) / ‖u‖_(p,q).    (5.32)

These induced operator norms are called equi-induced if p = q = r = s; otherwise they are called mixed-induced norms.

Such norms are used extensively in control theory. In the H∞ control problem, p = q = r = s = 2. Roughly speaking, given a system with control and disturbance inputs u, v, and to-be-controlled and measured outputs w, y, the problem consists in finding a feedback control law mapping y to u, such that the (convolution operator of the) closed-loop system S_cl from v to w is minimized in some appropriate sense; that is, the largest system energy gain from the disturbance to the to-be-controlled outputs is minimized. If instead of energy one is interested in maximum amplitude minimization, namely p = q = r = s = ∞, the resulting problem is the L1 optimal control problem. If p = q = 2 and r = 2, ∞ and s = ∞, we obtain the generalized H2 control problem. We refer to [85] for a more complete description of such problems and appropriate references.

We will now conclude this section by quoting some results from [85] on induced norms of the convolution operator S : L(p,q)(R₊, Rᵐ) → L(r,s)(R₊, Rᵖ), and of the Hankel operator H : L(p,q)(R₋, Rᵐ) → L(r,s)(R₊, Rᵖ). The following notation will be used: given a matrix M ≥ 0, λmax(M) denotes the largest eigenvalue of M, while δmax(M) denotes the largest diagonal entry of M.

Proposition 5.9. The following are equi-induced norms, in time and in space, in the domain and in the range: p = q = r = s.

• ‖Σ‖₂ = ‖S‖(2,2)→(2,2) = sup_{u≠0} [ ∫₀^∞ (y₁²(t)+···+y_p²(t)) dt ]^(1/2) / [ ∫₀^∞ (u₁²(t)+···+u_m²(t)) dt ]^(1/2) = sup_{ω∈R} σmax(H(iω));

• ‖Σ‖_H = ‖H‖(2,2)→(2,2) = [λmax(PQ)]^(1/2);

• ‖Σ‖₁ = ‖S‖(1,1)→(1,1) = sup_{u≠0} ∫₀^∞ (|y₁(t)|+···+|y_p(t)|) dt / ∫₀^∞ (|u₁(t)|+···+|u_m(t)|) dt = max_j Σᵢ η_ij, where η_ij = ∫₀^∞ |h_ij(t)| dt;

• ‖Σ‖_∞ = ‖S‖(∞,∞)→(∞,∞) = sup_{u≠0} sup_i max_t |y_i(t)| / sup_i max_t |u_i(t)| = max_i Σⱼ η_ij.

For a proof see [85].
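For a concrete instance of the last two formulas, the sketch below (Python; the system and all names are illustrative choices of mine, not from the book) computes η_ij = ∫₀^∞ |h_ij(t)| dt for a 2×2 impulse response with decoupled first-order channels, where the integrals are known in closed form, and then forms the 1- and ∞-induced norms as the peak column and row sums.

```python
import numpy as np

# Impulse response of a 2x2 system with decoupled channels:
#   h_ij(t) = g_ij * exp(-lam_ij * t),  so  eta_ij = |g_ij| / lam_ij
G   = np.array([[1.0, -2.0],
                [3.0,  0.5]])
lam = np.array([[1.0,  1.0],
                [1.0,  1.0]])

# eta_ij = integral_0^inf |h_ij(t)| dt, by the trapezoid rule
t = np.linspace(0.0, 40.0, 40_001)
dt = t[1] - t[0]
vals = np.abs(G[..., None] * np.exp(-lam[..., None] * t))
eta = (vals[..., :-1] + vals[..., 1:]).sum(axis=-1) * dt / 2

norm_1   = np.max(eta.sum(axis=0))   # (1,1)-induced norm: peak column sum
norm_inf = np.max(eta.sum(axis=1))   # (inf,inf)-induced norm: peak row sum
```

Here η = [[1, 2], [3, 0.5]], so the peak column sum is 4 and the peak row sum is 3.5.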
The following are mixed-induced norms, that is, norms which are equi-induced in time and in space in the domain, p = q, and separately in time and in space in the range, r = s:

• ‖S‖(2,2)→(∞,2) = sup_{u≠0} sup_{t≥0} (y₁²(t)+···+y_p²(t))^(1/2) / [ ∫₀^∞ (u₁²(t)+···+u_m²(t)) dt ]^(1/2) = [λmax(C P C*)]^(1/2);

• ‖S‖(2,2)→(∞,∞) = sup_{u≠0} sup_{t≥0} max_i |y_i(t)| / [ ∫₀^∞ (u₁²(t)+···+u_m²(t)) dt ]^(1/2) = [δmax(C P C*)]^(1/2);

• ‖S‖(1,2)→(∞,2) = sup_{u≠0} sup_{t≥0} (y₁²(t)+···+y_p²(t))^(1/2) / ∫₀^∞ (u₁²(t)+···+u_m²(t))^(1/2) dt = sup_{t≥0} σmax(h(t)).

The corresponding expressions for the Hankel operator H are obtained with B*QB in place of C P C*, namely [λmax(B*QB)]^(1/2) and [δmax(B*QB)]^(1/2), respectively.

Further mixed-induced norms. Finally, the induced norms ‖·‖(1,q)→(r,s), for q, r, s ∈ [1, ∞], are L1 operator norms. Notice that for SISO systems the L1-induced and the L∞-induced norms of the convolution operator S are the same, and equal to ∫₀^∞ |h(t)| dt.

5.7 Summary of norms

Time-domain Lebesgue norms. Consider a vector-valued function of time w : R → Rⁿ. The L1, L2, and L∞ norms of w are defined as follows; it is assumed that both the spatial and the temporal norms are the same:

• ‖w‖L1 = ∫₀^∞ |w(t)| dt, where |w(t)| = |w₁(t)| + ··· + |w_n(t)|;
• ‖w‖L2 = ( ∫₀^∞ ‖w(t)‖² dt )^(1/2), where ‖w(t)‖² = w₁(t)² + ··· + w_n(t)²;
• ‖w‖L∞ = max_i sup_t |w_i(t)|.

Frequency-domain Lebesgue norms. Consider a matrix-valued function of s = iω, H : iR → C^(p×m), with p ≤ m. The L1, L2, and L∞ norms of H are defined as follows; notice that the spatial and the frequency norms are the same:

• ‖H‖L1 = ∫₀^∞ ‖H(iω)‖ dω, where ‖H(iω)‖ = σ₁(H(iω)) + ··· + σ_p(H(iω));
• ‖H‖L2 = ( ∫₀^∞ ‖H(iω)‖² dω )^(1/2), where ‖H(iω)‖² = σ₁²(H(iω)) + ··· + σ_p²(H(iω));
• ‖H‖L∞ = sup_ω ‖H(iω)‖, where ‖H(iω)‖ = σ₁(H(iω)).

Frequency-domain Hardy norms. Consider a matrix-valued function H : C → C^(p×m), with p ≤ m, of the complex variable s = σ + iω ∈ C, and of the two real variables σ and ω. We assume that H is analytic in the closed RHP (i.e., analytic for Re(s) ≥ 0). The H1, H2, and H∞ norms of H are defined as follows:

• ‖H‖H1 = sup_{σ>0} ∫₀^∞ ‖H(σ + iω)‖ dω, where ‖H(s)‖ = σ₁(H(s)) + ··· + σ_p(H(s));
• ‖H‖H2 = sup_{σ>0} ( ∫₀^∞ ‖H(σ + iω)‖² dω )^(1/2), where ‖H(s)‖² = σ₁²(H(s)) + ··· + σ_p²(H(s));
• ‖H‖H∞ = sup_{σ>0} [ sup_ω ‖H(σ + iω)‖ ], where ‖H(s)‖ = σ₁(H(s)).
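These frequency-domain norms are easy to evaluate numerically for a concrete transfer function. The sketch below (Python; the example H(s) = 1/(s+1)² is my own choice, not from the book) uses the fact that |H(iω)| = 1/(1+ω²), so the three Lebesgue norms over [0, ∞) have closed forms: L1 = π/2, L2 = (π/4)^(1/2), L∞ = 1.

```python
import numpy as np

# H(s) = 1/(s+1)^2 sampled on the imaginary axis s = i w, w >= 0
w = np.linspace(0.0, 1000.0, 2_000_001)
H = 1.0 / (1.0 + 1j * w)**2
mag = np.abs(H)                      # equals 1/(1+w^2)

dw = w[1] - w[0]
L1   = np.sum((mag[:-1] + mag[1:]) / 2) * dw            # trapezoid rule
L2   = np.sqrt(np.sum((mag[:-1]**2 + mag[1:]**2) / 2) * dw)
Linf = np.max(mag)                   # attained at w = 0
```

The truncation of the L1 integral at ω = 1000 contributes an error of about 10⁻³, which sets the accuracy of the check.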
As a consequence of the maximum modulus theorem, the Hardy norms are equal to the corresponding frequency-domain Lebesgue norms:

‖H‖H1 = ‖H‖L1, ‖H‖H2 = ‖H‖L2, ‖H‖H∞ = ‖H‖L∞.

Induced and non-induced norms of Σ. The expressions for the most common induced and non-induced norms of Σ are summarized next. They involve the norms of the convolution operator S : u → y = S(u) = h ∗ u, and of the Hankel operator H : u₋ → y₊ = H(u₋). We note that in the case of induced norms, the spatial and the temporal norms are taken to be the same.

• p = q = 2, r = s = 2: ‖Σ‖₂ = ‖S‖₂-ind = sup_ω σ₁(H(iω)) = ‖H‖H∞ (L∞ norm of H: supremum over frequency);
• p = q = 2, r = s = 2: ‖Σ‖_H = ‖H‖₂-ind = σ₁(H) = [λmax(PQ)]^(1/2) (Hankel norm);
• p = q = 1, r = s = 1: ‖Σ‖₁ = ‖S‖₁-ind = max_j Σᵢ η_ij (peak column gain), where η_ij = ∫₀^∞ |h_ij(t)| dt;
• p = q = ∞, r = s = ∞: ‖Σ‖_∞ = ‖S‖_∞-ind = max_i Σⱼ η_ij (peak row gain);
• non-induced: ‖Σ‖H2 = (trace [C P C*])^(1/2) = (trace [B* Q B])^(1/2) (RMS gain).

Let σᵢ, i = 1, ···, q, be the distinct Hankel singular values of Σ (see (5.22)). The various induced norms of S and the Hankel singular values satisfy the following relationships:

• ‖Σ‖_H ≤ ‖Σ‖₂ ≤ ‖Σ‖_∞;
• σ₁ ≤ ‖Σ‖₂ ≤ 2(σ₁ + ··· + σ_q);
• ‖Σ‖_∞ ≤ 2(σ₁ + ··· + σ_q) ≤ 2n ‖Σ‖₂.

5.8 System stability

Having introduced norms for dynamical systems, we are now in a position to discuss the important concept of stability, followed by a brief account of the more general concept of dissipativity.

5.8.1 Review of system stability

Consider the autonomous or closed system (i.e., a system having no external influences):

d/dt x(t) = A x(t), A ∈ R^(n×n).    (5.33)

This system is called stable if all solution trajectories x are bounded for positive time: x(t), t > 0, is bounded. The system is called asymptotically stable if all solution trajectories go to zero as time tends to infinity: x(t) → 0 for t → ∞. Furthermore, the matrix A is called Hurwitz if all its eigenvalues have negative real parts (belong to the LHP, the left half of the complex plane). The following holds.

Theorem 5.10. The autonomous system (5.33) is
• Asymptotically stable if, and only if, all eigenvalues of A have negative real parts, that is, A is Hurwitz.
• Stable if, and only if, all eigenvalues of A have non-positive real parts and, in addition, all purely imaginary eigenvalues have multiplicity one.
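Both stability notions in the theorem above can be tested directly from the spectrum of A. The sketch below (Python; the helper names and test tolerances are my own, not from the book) checks the Hurwitz property and, for plain stability, checks that no eigenvalue lies in the open right half-plane and that imaginary-axis eigenvalues are distinct (multiplicity one, tested via pairwise separation).

```python
import numpy as np

def is_hurwitz(A, tol=1e-12):
    """Asymptotic stability: all eigenvalues in the open left half-plane."""
    return bool(np.all(np.linalg.eigvals(A).real < -tol))

def is_stable(A, tol=1e-9):
    """Stability: no eigenvalue in the open RHP, and every eigenvalue on
    the imaginary axis has multiplicity one (checked via distinctness)."""
    lam = np.linalg.eigvals(A)
    if np.any(lam.real > tol):
        return False
    axis = lam[np.abs(lam.real) <= tol]
    for i in range(len(axis)):
        for j in range(i + 1, len(axis)):
            if abs(axis[i] - axis[j]) <= tol:
                return False        # repeated imaginary-axis eigenvalue
    return True
```

For example, a damped oscillator is Hurwitz, an undamped oscillator (eigenvalues ±i) is stable but not asymptotically stable, and a double integrator (eigenvalue 0 with multiplicity two) is not stable.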
The discrete-time analogue of (5.33) is

x(t + 1) = A x(t), A ∈ R^(n×n).    (5.34)

This system is stable if the eigenvalues of A have norm at most one, |λ(A)| ≤ 1, where any eigenvalues on the unit circle must have multiplicity one; it is asymptotically stable if |λ(A)| < 1, in which case A is called a Schur matrix. The eigenvalues of A are often referred to as the poles of (5.33), (5.34), respectively. Thus we have the well-known statement:

stability ⇔ poles in the LHP / in the unit disc.

In the sequel we identify stability with asymptotic stability. We now turn our attention to forced systems Σ = (A, B; C, D), that is, systems with inputs and outputs:

d/dt x(t) = A x(t) + B u(t), y(t) = C x(t) + D u(t),

where A ∈ R^(n×n), B ∈ R^(n×m), C ∈ R^(p×n), D ∈ R^(p×m). This system is called internally stable if u(t) = 0, t > 0, implies x(t) → 0 for t → ∞; in other words, internal stability of a forced system is equivalent to zero-input stability.

Let us now consider the input-output (i.e., the convolution) representation of forced systems: y = h ∗ u, where h is the impulse response of the system, assumed causal (h(t) = 0, t < 0). We want to address the question of what is meant by stability of such a system, in particular what properties of h guarantee stability of the system. The system Σ described by the convolution integral

y(t) = ∫_{−∞}^{t} h(t − τ) u(τ) dτ

is BIBO stable (bounded-input, bounded-output stable) if any bounded input u results in a bounded output y: u ∈ L∞^m(R) ⇒ y ∈ L∞^p(R). To avoid ambiguity, we stress that both the spatial and the temporal norms are the infinity norms (in the domain and the codomain). Closely related to BIBO stability is L2 stability: the convolution system Σ is said to be L2 input-output stable if all square-integrable inputs u ∈ L2^m(R) produce square-integrable outputs y ∈ L2^p(R). In a similar way, one can define Lp input-output stability. The following results hold.

Theorem 5.12. The convolution system Σ is L2 input-output stable if, and only if, the L2-induced norm of Σ is finite, which is in turn equivalent to the finiteness of the H∞ norm of the associated transfer function H.

Theorem 5.13. The convolution system Σ mentioned above is BIBO stable if, and only if, each entry h_ij, i = 1, ···, p, j = 1, ···, m, of the p × m impulse response h is absolutely integrable,

∫₀^∞ |h_ij(t)| dt < ∞,

that is, if, and only if, the ∞-induced norm of the convolution operator is finite. Moreover, the finiteness of the L1 norm of the impulse response h implies the Lp input-output stability of Σ for 1 ≤ p ≤ ∞.

Finally, we also want to ask the question: if the system with impulse response h is realizable, what is the relation between the stability of the external representation and that of the realization? For this it is assumed that h is realizable, that is, there exist A ∈ R^(n×n), B ∈ R^(n×m), C ∈ R^(p×n), with n finite, such that h(t) = C e^(At) B. We conclude this section with a result which relates internal and I/O stability.
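The BIBO criterion can be illustrated numerically: for h(t) = e^(−t) the L1 norm of h is 1, and the constant input u ≡ 1 (of unit amplitude) drives the output to that value, so the peak output equals the ∞-induced gain. The sketch below (Python; discretization choices are mine, not from the book) approximates the convolution on a grid.

```python
import numpy as np

# Convolution system with impulse response h(t) = exp(-t): ||h||_L1 = 1.
# Since h >= 0, the bounded input u = 1 is worst-case, and the peak of y
# approaches the integral of |h|, i.e. the inf-induced (BIBO) gain.
dt = 2e-3
t = np.arange(0.0, 20.0, dt)
h = np.exp(-t)

u = np.ones_like(t)                      # |u(t)| <= 1 for all t
y = np.convolve(h, u)[: t.size] * dt     # discretized y = (h * u)(t)

l1_h = np.sum(np.abs(h)) * dt            # ~ integral_0^inf |h(t)| dt = 1
peak = np.max(np.abs(y))                 # ~ sup_t |y(t)|
```

The discrete peak can never exceed the discrete L1 sum times the input amplitude, mirroring the bound |y(t)| ≤ ‖h‖_L1 · ‖u‖_L∞.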
Theorem 5.14. If the realization Σ = (A, B; C, 0) is internally stable, then it is BIBO stable. Conversely, if Σ is BIBO stable and the realization is minimal, then it is internally stable.

This theorem asserts that internal stability implies external stability and that, conversely, external stability together with minimality implies internal stability. For instance, the system

d/dt x(t) = [1 1; 0 −1] x(t) + [0; 1] u(t), y(t) = [0 2] x(t),

with impulse response h(t) = 2e^(−t), while BIBO stable, is not asymptotically stable: the unstable mode e^t is unobservable and hence does not appear in the impulse response. The next result summarizes some properties which are discussed in section 6.

Theorem 5.15. Given a system Σ with impulse response h, and a finite-dimensional realization thereof, the following statements are equivalent:
1. ‖Σ‖₁ < ∞.
2. ‖Σ‖₂ < ∞.
3. There exists a realization of Σ with A Hurwitz.
4. Every minimal realization of Σ has A Hurwitz.

5.8.2 Lyapunov stability

Consider the (autonomous) system described by

d/dt x(t) = f(x(t)), x(t) ∈ Rⁿ, t ≥ 0.

The function Θ : Rⁿ → R is said to be a Lyapunov function for the above system if, for all solutions x : R → Rⁿ, there holds

d/dt Θ(x(t)) ≤ 0.    (5.35)

Let ∇Θ(x) be the gradient of Θ, which is a row vector whose ith entry is ∂Θ/∂xᵢ. With this notation, d/dt Θ(x(t)) = ∇Θ(x(t)) · f(x(t)), and hence the above inequality can also be written as

∇Θ · f ≤ 0.

In the case of linear systems (5.33) and quadratic Lyapunov functions Θ(x) = x*Px, P = P* ∈ R^(n×n), it follows that

d/dt Θ = (d/dt x)* P x + x* P (d/dt x) = x* [A*P + PA] x = x* Q x.

Usually Q = Q* is given; thus the problem of constructing a Lyapunov function amounts to solving for P the Lyapunov equation:

A*P + PA = Q.    (5.36)

The discrete-time analogue of this equation is

A*PA − P = Q,    (5.37)

known as the discrete-time Lyapunov or Stein equation. If A is Hurwitz, then for all Q ∈ R^(n×n) there exists a unique P ∈ R^(n×n) which satisfies (5.36); moreover, if Q = Q*, then P = P*, and if Q = Q* ≤ 0 and (A, Q) is observable, then P = P* > 0. Conversely, if Q = Q* ≤ 0 and P = P* > 0 satisfy (5.36), then (5.33) is stable; if in addition the pair (A, Q) is observable, then (5.33) is asymptotically stable. The formulae below hold for continuous-time systems; the corresponding ones for discrete-time systems follow in a similar way and are omitted.

Example 5.8.1. (a) Consider a mass-spring-damper system attached to a wall. Let x, ẋ be the position and velocity of the mass; for simplicity, time derivatives are denoted by dots. The differential equation describing this system is

m ẍ + c ẋ + k x = 0.

As Lyapunov function we take the sum of the potential energy stored in the spring plus the kinetic energy of the mass,

V = ½ k x² + ½ m ẋ²,

which is always positive whenever the system is not at rest. The system can be written as a first-order system ẇ = A w, where

A = [0 1; −k/m −c/m], w = [x; ẋ].

We postulate the quadratic Lyapunov function V = ½ w*Pw, where P = [k 0; 0 m]. Then V̇ = ½ w*Qw, where Q = A*P + PA = −[0 0; 0 2c] ≤ 0. Stability follows from the negative semidefiniteness of Q, and asymptotic stability from the additional fact that (A, Q) is an observable pair.

(b) We now consider the electrical circuit of example 4.1 with u = 0 (i.e., the input terminal is short-circuited); x₁ is the current through the inductor and x₂ is the voltage across the capacitor. Eliminating x₂, the differential equation describing this system is

L ẍ₁ + R ẋ₁ + (1/C) x₁ = 0.

As Lyapunov function we take the energy stored in this circuit, namely in the inductor and the capacitor:

V = ½ L x₁² + ½ C x₂² = x*Px, where x* = [x₁, x₂], P = ½ [L 0; 0 C].

In order to prove stability we compute Q = A*P + PA; it turns out that Q = −[R 0; 0 0] ≤ 0. The negative semidefiniteness of Q proves stability, and asymptotic stability follows from the additional fact that the pair (A, Q) is observable.

5.8.3 L2-systems and norms of unstable systems

We conclude this section with a brief account of L2 systems defined on the whole real line R; these are systems which produce L2 trajectories on R when fed with trajectories which are L2 on R. Given is the linear, continuous-time system Σ = (A, B; C, D), defined for all time t ∈ R. Its impulse response and transfer function are

hΣ(t) = C e^(At) B + D δ(t), t ≥ 0; hΣ(t) = 0, t < 0; HΣ(s) = D + C(sI − A)^(−1) B.

The impulse response hΣ as defined above is not square integrable on R unless the eigenvalues of A lie in the left half of the complex plane. Therefore the associated convolution operator SΣ defined by (4.5), with hΣ as defined above, does not map L2 inputs u into L2 outputs y unless A is stable. However, by giving up causality we can define an L2 system even if the poles lie both in the left as well as in the right half of the complex plane, provided none lie on the imaginary axis.

Here are the details. Let A have no eigenvalues on the imaginary axis, that is, Re λᵢ(A) ≠ 0. After a basis change, A can be written in block-diagonal form:

A = [A₊ 0; 0 A₋], B = [B₊; B₋], C = (C₊ C₋),

where Re λᵢ(A₊) < 0 and Re λᵢ(A₋) > 0. We now redefine the impulse response h as follows:

hL2(t) = h₊(t) = C₊ e^(A₊ t) B₊, t > 0; hL2(t) = h₋(t) = −C₋ e^(A₋ t) B₋, t ≤ 0.

It is assumed that Σ is at rest in the distant past: x(−∞) = 0. The associated system operator SL2 defines a map from L2^m(R) to L2^p(R), and the associated Hankel operator HL2 defines a map from L2^m(R₋) to L2^p(R₊); they are defined as follows:

SL2 : u → y = SL2 u, y(t) = (SL2 u)(t) = D u(t) + ∫_{−∞}^{∞} hL2(t − τ) u(τ) dτ,

HL2 : u₋ → y₊ = HL2 u₋, y₊(t) = (HL2 u₋)(t) = ∫_{−∞}^{0} h₊(t − τ) u(τ) dτ.

An important property of the L2 system is that it has the same transfer function as the original system: HL2(s) = H(s). According to the above considerations, in the sequel all-pass L2 systems will be defined and characterized in terms of the transfer function and a state representation.
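The Lyapunov computation in example (a) can be checked directly. The sketch below (Python with NumPy/SciPy; the parameter values m, c, k are arbitrary choices of mine) forms Q = A*P + PA for the energy-based P and also solves the Lyapunov equation for a prescribed negative definite right-hand side, confirming that the solution is symmetric positive definite when A is Hurwitz.

```python
import numpy as np
from scipy import linalg

# Mass-spring-damper in first-order form: w' = A w, w = [x, xdot]
m, c, k = 2.0, 0.5, 8.0
A = np.array([[0.0, 1.0],
              [-k / m, -c / m]])

# Energy-based Lyapunov function V = (1/2) w^T P w with P = diag(k, m):
P = np.diag([k, m])
Q = A.T @ P + P @ A            # should equal -[[0, 0], [0, 2c]] <= 0

# Alternatively, solve A^T Pd + Pd A = Qd for a prescribed Qd < 0
Qd = -np.eye(2)
Pd = linalg.solve_continuous_lyapunov(A.T, Qd)
```

Since c > 0, A is Hurwitz, so Pd is the unique solution and is symmetric positive definite.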
The above considerations allow the definition of the 2-norm and of the Hankel norm of unstable systems. Norms of unstable systems play an important role in the theory of Hankel norm approximation presented in chapter 8 on page 205. Here is the precise statement.

Norms of unstable systems. Given an unstable system with no poles on the imaginary axis (unit circle), its 2-norm is defined as the 2-induced norm of the convolution operator of the associated L2 (ℓ2) system. This is equal to the supremum of the largest singular value of the transfer function on the imaginary axis (the unit circle), and is known as its L∞ (ℓ∞) norm:

‖Σ‖₂ = ‖SL2‖₂-ind = sup_ω σmax(H(iω)) = ‖H‖L∞.    (5.38)

The Hankel norm of an unstable Σ is equal to the Hankel norm of its stable subsystem Σ₊, and there holds

‖Σ‖_H = ‖Σ₊‖_H, ‖Σ‖₂ ≥ ‖Σ‖_H.    (5.39)

The former is computed by means of hL2, and the latter by means of h₊. The steps presented in proposition 5.3 for the computation of the H∞ norm of a stable system by means of the Hamiltonian matrix Ã(γ) can also be applied if the eigenvalues of A are not necessarily in the left half of the complex plane; in this case the result of the computation provides the L∞ norm of the transfer function H.

Example 5.8.2. We consider a simple second-order discrete-time system with one stable and one unstable pole:

Σ : y(k + 2) − (a + b) y(k + 1) + ab y(k) = (a − b) u(k + 1), a < 1, b > 1.

The transfer function of Σ is

H(z) = (a − b) z / ((z − a)(z − b)) = a/(z − a) − b/(z − b).

The impulse response of this system is

h(0) = 0, h(k) = aᵏ − bᵏ, k = 1, 2, ···.

The H2 norm of the system is the ℓ2 norm of h, which, since b > 1, is infinite. The 2-induced norm of the convolution operator of Σ is the largest singular value of the lower-triangular Toeplitz matrix S built from the samples a − b, a² − b², a³ − b³, ···, of h along its subdiagonals; since b > 1, this norm is also infinite. As indicated above, the norms of Σ are instead defined as the corresponding norms of the associated ℓ2 system Σℓ2, whose stable and anti-stable subsystems are

y(k + 1) − a y(k) = a u(k), y(k) − b⁻¹ y(k + 1) = u(k).

In this case the H2 norm is the square root of the quantity

1 + Σ_{j=1}^{∞} (a^(2j) + b^(−2j)) = (b² − a²) / ((1 − a²)(b² − 1)).
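The two closed-form norms of this example are easy to verify numerically. The sketch below (Python; the concrete values a = 0.5, b = 2 are my own choice) sums the H2 series and compares it with the closed form, and evaluates |H(z)| on a grid of the unit circle, whose peak is attained at z = 1 and equals (b − a)/((1 − a)(b − 1)).

```python
import numpy as np

a, b = 0.5, 2.0                       # one stable pole a, one unstable pole b

# H2 norm (squared) of the associated l2 system:
#   1 + sum_{j>=1}(a^(2j) + b^(-2j)) = (b^2 - a^2)/((1 - a^2)(b^2 - 1))
j = np.arange(1, 200)
h2_sq_series = 1.0 + np.sum(a**(2 * j) + b**(-2.0 * j))
h2_sq_closed = (b**2 - a**2) / ((1 - a**2) * (b**2 - 1))

# l-infinity norm: peak of |H(z)| on the unit circle, attained at z = 1
theta = np.linspace(0.0, 2 * np.pi, 100_001)
z = np.exp(1j * theta)
H = (a - b) * z / ((z - a) * (z - b))
linf = np.max(np.abs(H))
linf_closed = (b - a) / ((1 - a) * (b - 1))
```

For a = 0.5, b = 2, the H2 norm squared is 5/3 and the ℓ∞ norm is 3.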
Let the circulant matrix S_{2n+1} ∈ R^((2n+1)×(2n+1)) associated with the convolution operator of Σℓ2 be built from the impulse-response samples b^(−n), ···, b^(−1), 1, a, a², ···, aⁿ. As n → ∞, its eigenvalues approach those of the convolution operator S. Moreover, the singular values of S_{2n+1} lie between

(b − a) / ((1 + a)(1 + b)) ≤ σᵢ(S_{2n+1}) ≤ (b − a) / ((1 − a)(b − 1)), i = 1, ···, 2n + 1.

Thus the ℓ2-induced norm of Σℓ2 is (b − a) / ((1 − a)(b − 1)), which is the ℓ∞ norm of H; this is also the maximum of the transfer function on the unit circle.

Remark 5.8.1. (a) The 2-norm of an unstable system, as defined in (5.38), is the L∞ (ℓ∞) norm of its transfer function H. (b) The Hankel norm of Σ is the Hankel norm of its stable part Σ₊; for the example above it is readily computed to be a/(1 − a²).

Allpass L2 systems

We are now ready to characterize (square) L2 systems (i.e., L2 systems having the same number of inputs and outputs, p = m) which are all-pass, i.e., whose transfer functions are unitary on the iω axis.

Definition 5.16. An L2 system is all-pass or unitary if, by definition, ‖u‖₂ = ‖y‖₂ for all (u, y) satisfying the system equations.

The basic statement given below is that a system is all-pass if, and only if, all its Hankel singular values are equal to one, for some appropriate D. The results are presented only for continuous-time systems, which need not be stable. As before, the transfer function of Σ is denoted by H(s) = D + C(sI − A)^(−1)B.

Theorem 5.17. The following statements are equivalent:
1. The L2 system Σ is square, p = m, and all-pass.
2. D*D = I_m, and H*(−iω) H(iω) = I_m.
3. There exists Q such that A*Q + QA + C*C = 0, QB + C*D = 0, and D*D = I_m.
4. The solutions P, Q of the Lyapunov equations (4.45) and (4.46) satisfy PQ = Iₙ, and D*D = I_m.

Corollary 5.18. (a) The transfer function of an all-pass system can be written as follows:

H(s) = (I − C(sI − A)^(−1) Q^(−1) C*) D.

(b) Given A, C, there exists D such that the system is all-pass if, and only if, the solutions P, Q of the Lyapunov equations (4.45) and (4.46) satisfy PQ = I.

(c) The third condition of the above theorem can be expressed by means of the following single equation:

[A*; B*] [Q 0] + [Q; 0] [A B] + [C*; D*] [C D] + [0 0; 0 −I_m] = 0,

which is a structured Sylvester equation.
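Part (a) of the corollary can be checked numerically. The sketch below (Python with NumPy/SciPy; the matrices A and C are arbitrary choices of mine, not from the book) solves the Lyapunov equation of condition 3 for Q, chooses B so that QB + C*D = 0 with D = 1, and then verifies that |H(iω)| = 1 at several frequencies, as the theorem predicts.

```python
import numpy as np
from scipy import linalg

# Arbitrary stable A and observable output map C (SISO), with D = 1
A = np.array([[-1.0, 2.0],
              [ 0.0, -3.0]])
C = np.array([[1.0, 0.0]])
D = np.array([[1.0]])

# Solve A^T Q + Q A + C^T C = 0, then pick B so that Q B + C^T D = 0
Q = linalg.solve_continuous_lyapunov(A.T, -C.T @ C)
B = -np.linalg.solve(Q, C.T @ D)

# The conditions of the theorem now hold by construction, so
# H(s) = D + C (sI - A)^{-1} B should be all-pass: |H(iw)| = 1
I = np.eye(2)
gains = [abs((D + C @ np.linalg.solve(1j * w * I - A, B))[0, 0])
         for w in (0.0, 0.5, 2.0, 10.0)]
```

All sampled gains agree with 1 to machine precision, even though the realization itself is not balanced.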
Example 5.8.3. Consider the second-order all-pass system described by the transfer function

H(s) = (s + a)(s + b) / ((s − a)(s − b)), a + b ≠ 0.

We first assume that a > 0 and b < 0. In this case a minimal realization of H is given as follows:

[A B; C D] = [a 0 c; 0 b d; c d 1], where c² = 2a (a + b)/(a − b) > 0, d² = −2b (a + b)/(a − b) > 0.

It follows that the solutions of the two Lyapunov equations are equal:

P = Q = (1/(a − b)) [a + b, 2√(−ab); 2√(−ab), −(a + b)] ⇒ PQ = P² = I.

Notice that the eigenvalues of P are λ₁ = 1, λ₂ = −1.

Next, we assume that a > b > 0. In this case a minimal realization of H differs from the one above by a sign:

[A B; C D] = [a 0 c; 0 b d; c −d 1], where c² = 2a (a + b)/(a − b) > 0, d² = 2b (a + b)/(a − b) > 0.

In this case P and Q are no longer equal:

P = (1/(a − b)) [a + b, 2√(ab); 2√(ab), a + b], Q = (1/(a − b)) [a + b, −2√(ab); −2√(ab), a + b].

Clearly, PQ = I. Furthermore, these two matrices have the same set of eigenvalues, namely

λ₁ = (√a + √b)² / (a − b), λ₂ = (√a − √b)² / (a − b) = 1/λ₁,

whose product is equal to one. Thus the product of P and Q is the identity matrix, irrespective of the parameters a and b.

Proof (of Theorem 5.17).

• [1] ⇔ [2]: First we notice that y = S(u) is equivalent, in the frequency domain, to Y = HU. Hence the following string of equivalences holds:

‖Y‖₂ = ‖HU‖₂ ⇔ ⟨Y, Y⟩ = ⟨HU, HU⟩ = ⟨U, H*HU⟩ = ⟨U, U⟩ ⇔ H*H = I.

• [2] ⇒ [3]: For m = p, from [H(s)]^(−1) = H*(−s), s = iω, we make use of the following auxiliary result:

(D + C(sI − A)^(−1)B)^(−1) = D^(−1)(I − C(sI − A^×)^(−1)BD^(−1)), where A^× = A − BD^(−1)C.

Hence we obtain

D^(−1)(I − C(sI − A^×)^(−1)BD^(−1)) = D* + B*(sI + A*)^(−1)(−C*) ⇒ D^(−1) = D*,

and there exists Q such that (a) B*Q = −D^(−1)C, (b) −Q^(−1)C* = BD^(−1), and (c) −Q^(−1)A*Q = A^× = A − BD^(−1)C. Thus (a) and (b) imply QB + C*D = 0, while (c) implies A*Q + QA − QBD^(−1)C = 0. Combining the last two equations we obtain A*Q + QA + C*C = 0. Since Re λᵢ(A) ≠ 0, this equation has a unique solution Q, which is symmetric, since Q* also satisfies A*Q* + Q*A + C*C = 0.

• [3] ⇒ [2]: By direct computation we obtain

H*(v) H(w) = I − (v + w) C (vI − A)^(−1) Q^(−1) (wI − A*)^(−1) C*,

which implies the desired result for w = iω = −v.
• [3] ⇒ [4]: Let P satisfy AP + PA* + BB* = 0. It follows that Q(AP + PA* + BB*)Q = 0, and hence QA(PQ) + (QP)A*Q + QBB*Q = 0. Substituting QB = −C*D and subtracting from A*Q + QA + C*C = 0, we obtain

QA(I − PQ) + (I − QP)A*Q = 0 ⇒ RX + X*R* = 0, where R = QA, X = I − PQ.

Since Re λᵢ(A) ≠ 0, the unique solution is X = 0, which implies the desired PQ = I.

• [4] ⇒ [3]: Follows similarly. ∎

5.9 System dissipativity*

We now wish to generalize the concept of Lyapunov stability from autonomous or closed systems to open systems, that is, systems with inputs and outputs:

d/dt x = f(x, u), y = g(x, u).

We define a function s, called the supply function,

s : U × Y → R, (u, y) → s(u, y),

which represents something like the power delivered to the system by the environment through the external variables, namely the input u and the output y. Therefore, if ∫_{t₀}^{t₁} s dt > 0 we will say that work is done on the system, while if this integral is negative we will say that work is done by the system.

More precisely, the system defined above is said to be dissipative with respect to the supply function s if there exists a non-negative function Θ : X → R, called a storage function, such that the following dissipation inequality holds:

Θ(x(t₁)) − Θ(x(t₀)) ≤ ∫_{t₀}^{t₁} s(u(t), y(t)) dt,    (5.40)

for all t₀ ≤ t₁ and all trajectories (u, x, y) which satisfy the system equations. Thus the above definition says that the change in internal storage, Θ(x(t₁)) − Θ(x(t₀)), can never exceed what is supplied to the system: dissipativeness means that the system absorbs supply (energy). The storage function thus generalizes the concept of Lyapunov function given in (5.35); it is a generalized energy function for the dissipative system in question, and consequently the concept of dissipativity generalizes to open systems the concept of stability. If Θ(x) is differentiable with respect to time, then, along trajectories x of the system, the dissipation inequality can be written as

d/dt Θ(x(t)) ≤ s(u(t), y(t)).    (5.41)

There are two universal storage functions which can be constructed, namely the available storage and the required supply. The former is defined as follows:

Θavail(x₀) = sup { − ∫₀^∞ s(u(t), y(t)) dt },    (5.42)

where (u, y) satisfy the system equations, with x(0) = x₀ and x(∞) = 0. Therefore Θavail(x₀) is obtained by seeking to maximize the supply extracted from the system, starting at a fixed initial condition. The latter is

Θreq(x₀) = inf ∫_{−∞}^{0} s(u(t), y(t)) dt,    (5.43)

where (u, y) satisfy the system equations, and in addition x(−∞) = 0 and x(0) = x₀.
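The dissipation inequality can be observed in simulation. The sketch below (Python; the scalar system, input signal, and discretization are my own illustrative choices, not from the book) uses ẋ = −x + u, y = x with the supply rate s(u, y) = u² − y². For this system the storage function Θ(x) = x² works, since s − dΘ/dt = (x − u)² ≥ 0, so the accumulated supply must always dominate the stored energy.

```python
import numpy as np

# Scalar system x' = -x + u, y = x, supply rate s(u, y) = u^2 - y^2.
# Candidate storage function: Theta(x) = x^2, with dissipation (x - u)^2.
dt = 1e-4
t = np.arange(0.0, 10.0, dt)
u = np.sin(3 * t) + 0.5 * np.cos(0.7 * t)   # arbitrary test input

x = np.zeros_like(t)
for k in range(t.size - 1):                  # forward-Euler simulation
    x[k + 1] = x[k] + dt * (-x[k] + u[k])

supply = np.cumsum(u**2 - x**2) * dt         # integral of s(u(t), y(t))
storage = x**2                                # Theta(x(t)), Theta(x(0)) = 0
# dissipation inequality: stored energy never exceeds supplied energy
violation = np.max(storage - storage[0] - supply)
```

Up to the small forward-Euler discretization error, the inequality holds at every time step.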
Θreq(x₀) is obtained by seeking to minimize the supply needed to achieve a fixed initial state. Notice that if s = 0, the storage function Θ becomes a Lyapunov function, and dissipativity is equivalent to stability. A result due to Willems [360], [361] states that a system is dissipative if and only if Θreq(x₀) is finite. In this case, storage functions form a convex set, and in addition the following inequality holds:

0 ≤ Θavail(x₀) ≤ Θ(x₀) ≤ Θreq(x₀).

Thus the available storage and the required supply are extremal storage functions.

Some commonly used supply functions are:

s(u, y) = ‖u‖₂² − ‖y‖₂², Q = [I 0; 0 −I];    (5.44)
s(u, y) = u*y + y*u, Q = [0 I; I 0];    (5.45)
s(u, y) = ‖u‖₂² + ‖y‖₂², Q = [I 0; 0 I].    (5.46)

Remark 5.9.1. Electrical and mechanical systems. For electrical circuits consisting of passive components, that is, resistors, capacitors, inductors and ideal transformers, the sum of the products of the voltage and current at the external ports, Σ_k V_k I_k (the electrical power), constitutes a supply function. In mechanical systems, the sum of the products of force and velocity, as well as of angle and torque, of the various particles is a possible supply function: Σ_k ((d/dt x_k) F_k + (d/dt θ_k) T_k) (the mechanical power).

Linear systems and quadratic supply functions. Consider the linear system Σ = (A, B; C, D), which is assumed reachable and observable, together with the general quadratic supply function, which is a function of the external variables, namely the input u and the output y:

s(u, y) = [y* u*] [Qyy Qyu; Quy Quu] [y; u], Q = Q* ∈ R^((p+m)×(p+m)),    (5.47)

where no definiteness assumptions on Q are necessary at this stage. Given that y = Cx + Du, the supply function can also be expressed in terms of the state x and the input u:

s = x* Q̂₁₁ x + x* Q̂₁₂ u + u* Q̂₂₁ x + u* Q̂₂₂ u, where

Q̂ = [Q̂₁₁ Q̂₁₂; Q̂₂₁ Q̂₂₂] = [C* 0; D* I_m] [Qyy Qyu; Quy Quu] [C D; 0 I_m],

and Q̂ inherits the symmetry property from Q. The central issue now is, given s, to determine whether Σ is dissipative with respect to s; this amounts to the construction of a storage function Θ such that the dissipation inequality holds. We are now ready to state the following theorem, due to Willems.

Theorem 5.19. Consider the system Σ = (A, B; C, D), with the supply rate s. The following statements are equivalent.

(1) Σ is dissipative with respect to s.

(2) Σ admits a quadratic storage function Θ = x*Xx, with X = X* ≥ 0.
(3) There exists a positive semidefinite solution X to the following set of LMIs (Linear Matrix Inequalities):

Φ(X) = − [A*X + XA, XB; B*X, 0] + [C* 0; D* I_m] [Qyy Qyu; Quy Quu] [C D; 0 I_m] ≥ 0.

(4) There exist X = X* ≥ 0, K and L such that

A*X + XA + K*K = Q̂₁₁, XB + K*L = Q̂₁₂, L*L = Q̂₂₂.

(5) There exists X₋ = X₋* ≥ 0, such that Θavail = x*X₋ x.

(6) There exists X₊ = X₊* ≥ 0, such that Θreq = x*X₊ x.

(7) The transfer function H(s) = D + C(sI − A)^(−1)B satisfies

[H*(−iω) I] [Qyy Qyu; Quy Quu] [H(iω); I] ≥ 0,    (5.48)

for all ω ∈ R such that iω is not an eigenvalue of A.

Corollary 5.20. Let Q̂₂₂ be nonsingular, and define the quantities

F = A − B Q̂₂₂^(−1) Q̂₂₁, G = B Q̂₂₂^(−1) B*, J = Q̂₁₂ Q̂₂₂^(−1) Q̂₂₁ − Q̂₁₁.

There are two further conditions which are equivalent to the seven listed in the theorem:

(8) There exists a solution X = X* ≥ 0 to the Riccati equation

F*X + XF + XGX + J = 0.    (ARE)

(9) There exists a solution X = X* ≥ 0 to the Riccati inequality

F*X + XF + XGX + J ≤ 0.

Remark 5.9.2. In case of dissipativity, the quadratic form

d = [x* u*] Φ(X) [x; u] = ‖Kx + Lu‖² ≥ 0,

where the second equality follows from parts (3) and (4) of the theorem, is a dissipation function for the system Σ with the supply rate s, and it is always non-negative. Thus

d = s − d/dt Θ, Θ = x*Xx ⇒ ‖Kx + Lu‖² = s − d/dt (x*Xx) ≥ 0.

This relationship says that what gets dissipated at time t is equal to the supply at time t minus the rate of change of the storage function.

Two refinements. (a) If the dissipation inequality (5.40) (and consequently (5.41)) is satisfied with equality, the system Σ is called conservative with respect to s; this means that the energy supplied is stored, and no part of it is dissipated as heat. In this case the inequality in part (3) of theorem 5.19 is satisfied with equality, Φ(X) = 0; consequently, in part (4) of the same theorem, K = 0 and L = 0. Moreover, the Riccati equation is replaced by a Lyapunov equation, which has a unique solution; therefore, in the case of conservative systems there is a unique storage function, Θavail = Θ = Θreq.
(b) If the dissipation inequality (5. Therefore in the case of conservative systems. The equations of motion are as follows. the Riccati equation is replaced by a Lyapunov equation which has a unique solution.e. Consequently in part (4) of the same theorem K = 0 and L = 0. b1 = b2 = 0. To formalize this we introduce the supply function which consists of the sum of products of force times velocity s = fq ˙1 − [b1 (q ˙1 − q ˙0 ) + k1 (q1 − q0 )] q ˙0 . The car chassis has a mass m2 and its connection with the wheel is modeled by means of a second spring and damper with constants k2 .17 coincide with (5.e. The wheel follows the road which has a proﬁle described by its distance q0 from a ﬁxed position. Q22 = 0. which means that the energy supplied is stored and no part of it is dissipated as heat. As a consequence. We will not provide a proof of the above theorem and corollary.40) (and consequently (5. q2 . The dissipation function d = b2 (q ˙2 − q ˙1 )2 + b1 (q ˙1 − q ˙0 )2 . [361]. there is a unique storage function Θavail = Θ = Θreq . m2 from the same ﬁxed position are q1 . In particular the conditions in part 3. See also the lecture notes of Weiland and Scherer [288]. ˙ .i i 136 Chapter 5. Thus conditions (5. From physical considerations if follows that a storage function is Θ= 1 1 1 1 2 2 + m2 q + k1 (q1 − q0 )2 + k2 (q2 − q1 )2 . the distance of the masses m1 . the system Σ is called strictly dissipative with respect to s.1. namely the work of Willems [360]. all inequalities in theorem 5. of theorem 5.19. is the force exerted by the road on the wheel. It is composed of a mass m1 which models the wheel.45). i. b2 . The above storage function is not unique. This is a dissipative system as there are no active components.19 are strict. where Q11 = −C∗ C. Furthermore. To characterize all of them we can apply theorem 5. For simplicity time derivatives are denoted by dots. 
3 33 3 T 3 q0 m2 q ¨2 + b2 (q ˙2 − q ˙1 ) + k2 (q2 − q1 ) = 0 m1 q ¨1 − b2 (q ˙2 − q ˙1 ) − k2 (q2 − q1 ) + b1 (q ˙1 − q ˙0 ) + k1 (q1 − q0 ) = f Notice that b1 (q ˙1 − q ˙0 ) + k1 (q1 − q0 ). as a consequence.49). of theorem 5. Notice is the energy dissipated in the two dashpots. Linear dynamical systems: Part II i “oa” 2004/4/14 page 136 i satisﬁed with equality: Φ(X) = 0. Q12 = −C∗ D. i i ACA : April 14. XB = Q12 . 2004 i i . together with a spring with constant k1 and a damper with constant b1 which model the interaction of the wheel and the road.41)) is strict. respectively. Indeed a straightforward computation shows that d = s − Θ that if the dashpots are absent. Furthermore the inequality in part (6) of theorem 5.9. m2 b2 k2 m1 q1 T b1 k1 3 3 3road proﬁle T q2 We consider a simpliﬁed model of a car suspension. It readily follows that a stable system is unitary or allpass if it is conservative with respect to the supply function (5.17. the system is conservative. We refer instead to the original source.48) become (5. This is left as an exercise for the reader (see problem 31). and Q22 = I − D∗ D. The inputs to this system are therefore the road proﬁle q0 and the control force f .19 becomes an equality and in the allpass case becomes identical with part 3. to the wheel) which acts on m1 vertically. The output may be chosen as q2 . There is a (control) force f applied to the axle (i. respectively. Example 5. m1 q ˙1 ˙2 2 2 2 2 which is the sum of the kinetic energies of the two masses and the sum of the energy stored in the two springs.49) A∗ X + XA = Q11 .
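The energy balance d = s − Θ̇ claimed in the example can be checked numerically. The sketch below (parameter values and input signals are illustrative choices of ours, not from the text) integrates the equations of motion with a classical Runge–Kutta scheme and verifies the integrated form Θ(T) − Θ(0) = ∫₀ᵀ (s − d) dt:

```python
import numpy as np

# Quarter-car suspension of the example; parameters and inputs are illustrative.
m1, m2, k1, k2, b1, b2 = 1.0, 4.0, 20.0, 10.0, 1.5, 2.0

q0    = lambda t: 0.1 * np.sin(2.0 * t)       # road profile
q0dot = lambda t: 0.2 * np.cos(2.0 * t)
f     = lambda t: np.sin(t)                   # control force on the axle

def rhs(t, x):
    q1, q2, v1, v2 = x
    road = b1 * (v1 - q0dot(t)) + k1 * (q1 - q0(t))   # force of road on wheel
    a1 = (f(t) + b2 * (v2 - v1) + k2 * (q2 - q1) - road) / m1
    a2 = (-b2 * (v2 - v1) - k2 * (q2 - q1)) / m2
    return np.array([v1, v2, a1, a2])

def theta(t, x):   # storage: kinetic energy plus spring energy
    q1, q2, v1, v2 = x
    return (0.5*m1*v1**2 + 0.5*m2*v2**2
            + 0.5*k1*(q1 - q0(t))**2 + 0.5*k2*(q2 - q1)**2)

def supply(t, x):  # s = f q1' - [b1(q1'-q0') + k1(q1-q0)] q0'
    q1, q2, v1, v2 = x
    return f(t)*v1 - (b1*(v1 - q0dot(t)) + k1*(q1 - q0(t))) * q0dot(t)

def dissip(t, x):  # d = b2(q2'-q1')^2 + b1(q1'-q0')^2
    q1, q2, v1, v2 = x
    return b2*(v2 - v1)**2 + b1*(v1 - q0dot(t))**2

T, N = 5.0, 20000
dt = T / N
x = np.zeros(4)
t, work, th0 = 0.0, 0.0, theta(0.0, np.zeros(4))
for _ in range(N):
    sd0 = supply(t, x) - dissip(t, x)
    ka = rhs(t, x); kb = rhs(t + dt/2, x + dt/2*ka)
    kc = rhs(t + dt/2, x + dt/2*kb); kd = rhs(t + dt, x + dt*kc)
    x = x + dt/6*(ka + 2*kb + 2*kc + kd)       # classical RK4 step
    t += dt
    work += dt * (sd0 + supply(t, x) - dissip(t, x)) / 2   # trapezoid of s - d
balance = abs(theta(T, x) - th0 - work)        # should vanish: d = s - dTheta/dt
print(balance)
```

The balance is zero up to the discretization error of the time integration, in accordance with the dissipation equality derived above.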
Instead, in the sections that follow, we will attempt to clarify two special cases. The first uses the supply function (5.46) and is used to discuss issues related to passivity of a given system. This turns out to be closely related with the concept of positive realness, which is important in electric networks composed of passive components (resistors R, capacitors C, and inductors L). The equalities (5.48) provide in this case a state space characterization of passivity and are known as the positive real lemma, or as the KYP lemma, where the abbreviation stands for "Kalman-Yakubovich-Popov". The second case makes use of the supply function (5.45), and is used to investigate contractivity of the system, that is, whether the system has H∞ norm less than unity. The equalities (5.48) provide a characterization of contractivity and are known as the bounded real lemma.

5.9.1 Passivity and the positive real lemma*

In this section we will consider a special class of systems, namely passive systems. Roughly speaking, these are systems which do not generate energy, that is, the energy dissipated is never greater than the energy supplied. We will show that such systems are dissipative with respect to the supply rate s = y∗u + u∗y defined in (5.46). For simplicity we consider only SISO (single-input single-output) systems. This section follows the notes [359].

The real rational function H(s) is positive real (p.r.) if it maps the right half of the complex plane into itself:

    s ∈ C, Re(s) ≥ 0, s not a pole of H  ⇒  Re(H(s)) ≥ 0.

The following result follows from the definition.

Proposition 5.21. H(s) ≢ 0 is p.r. if, and only if, H(s−1) is p.r.; it is p.r. if, and only if, H−1(s) is p.r.

Proof. The map s → 1/s is a bijection for s ≠ 0. Observe (using continuity) the obvious fact that H is p.r. if, and only if, Re H(s) ≥ 0 for all but a finite number of points in Re(s) ≥ 0. Hence Re H(s) ≥ 0 for all but a finite number of points s with Re(s) ≥ 0 if, and only if, Re(H(s−1)) ≥ 0 for all but a finite number of points in Re(s) ≥ 0. Hence the first equivalence is proved. For the second one we notice that α ≠ 0 satisfies Re(α) ≥ 0 if, and only if, Re(α−1) ≥ 0. Hence Re H(s) ≥ 0 for all but a finite number of points in Re(s) ≥ 0 if, and only if, Re H−1(s) ≥ 0 for all but a finite number of points, and the proof is complete. ∎

Recall first the partial fraction expansion of a rational function. Let H = n/d, with n, d coprime polynomials, deg(d) = n and deg(n) = m ≤ n. Let λ1, λ2, ..., λk be the distinct zeros of d (the poles of H) and n1, n2, ..., nk their multiplicities (Σᵢ ni = n). We can now express H as follows:

    H(s) = a0 + Σ_{i=1}^{k} Σ_{ℓ=1}^{ni} a_{iℓ} / (s − λi)^ℓ,

where a0 and the a_{iℓ} are complex numbers that are uniquely determined by H, and a_{i,ni} ≠ 0, for i = 1, 2, ..., k. If a pole λk is simple, that is nk = 1, a_{k1} is the residue of H at the pole λk:

    a_{k1} = (s − λk) H(s) |_{s=λk}.

The next result provides the characterization of positive realness in terms of the external representation of Σ.

Theorem 5.22. The following statements are equivalent.

(a) H is p.r.

(b) H satisfies the following conditions: (i) Re H(iω) ≥ 0 for all ω ∈ R such that iω is not a pole of H; (ii) the system is stable, that is, all poles λ of H lie in the left half of the complex plane: Re(λ) ≤ 0; (iii) the poles of H on the imaginary axis are simple, and the associated residues are real and positive; furthermore, if there is a pole at infinity, it is simple, and the associated residue is positive: lim_{s→∞} H(s)/s ≥ 0; (iv) the difference between the degree of the numerator and that of the denominator is at most one: deg(n) − deg(d) ≤ 1.
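The conditions of part (b) of theorem 5.22 lend themselves to a simple numerical screen. The sketch below (function name and tolerances are our own; the frequency test covers condition (i) only at sampled points and assumes no poles on the imaginary axis, so condition (iii) is not separately checked) tests a candidate H = n/d:

```python
import numpy as np

def positive_real_check(num, den, wmax=1e3, npts=200001):
    """Screen the conditions of theorem 5.22(b) for H = n/d.
    Coefficients use numpy convention (highest power first).
    The grid test samples condition (i) and assumes no imaginary-axis
    poles, so condition (iii) is not tested here."""
    poles = np.roots(den)
    stable = bool(np.all(poles.real <= 1e-12))             # (ii)
    degree_ok = abs((len(num) - 1) - (len(den) - 1)) <= 1  # (iv)
    w = np.linspace(0.0, wmax, npts)
    H = np.polyval(num, 1j*w) / np.polyval(den, 1j*w)
    nonneg = bool(H.real.min() >= -1e-9)                   # (i) on the grid
    return stable and degree_ok and nonneg

print(positive_real_check([1.0, 2.0], [1.0, 1.0]))   # (s+2)/(s+1): True
print(positive_real_check([1.0, -2.0], [1.0, 1.0]))  # (s-2)/(s+1): Re H(0) < 0, False
```

For H(s) = (s+2)/(s+1) one has Re H(iω) = (2 + ω²)/(1 + ω²) > 0, so the check succeeds; replacing the zero at −2 by one at +2 makes Re H(0) = −2 and the check fails.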
(c) All input-output pairs (u, y) of compact support, which are compatible with the system equations

    d(d/dt) y = n(d/dt) u,        (5.50)

satisfy the dissipation inequality

    ∫_{−∞}^{0} u(t) y(t) dt ≥ 0.        (5.51)

Proof. (a) ⇒ (b): (b)(i) is trivial. To prove (b)(ii) and (iii), examine the partial fraction expansion of H in a small circle around a right half plane or an imaginary axis pole. To prove (b)(iv), examine the real part of H(s) in a large circle, centered at 0, where the polynomial part of H dominates.

(b) ⇒ (c): Consider the partial fraction expansion of H. Part (b) shows that H may be written as

    H(s) = a0 s + b0/s + Σ_{k=1}^{r} a_k s / (s² + ω_k²) + H0(s),

with a0 ≥ 0, b0 ≥ 0, ak > 0, 0 ≠ ωk ∈ R, for k = 1, 2, ..., r. Let H0 = n0/d0, with n0, d0 coprime polynomials. H0 is proper and all its poles are in the open left half of the complex plane; note that d0 is Hurwitz. Now consider the systems

    Σ0 :     y0 = a0 (d/dt) u,
    Σ1 :     (d/dt) y1 = b0 u,
    Σ2,i :   (d²/dt² + ωi²) y2i = ai (d/dt) u,   i = 1, 2, ..., k,
    Σ3 :     d0(d/dt) y3 = n0(d/dt) u,
    Σ :      y = y0 + y1 + Σ_{i=1}^{k} y2i + y3.

It is easy to see (using observability) that (u, y) satisfies (5.50) if, and only if, y0, y1, y2i, y3 satisfy the above equations, where y = y0 + y1 + Σᵢ y2i + y3. Consider compact support trajectories satisfying the equations of each of these systems. Clearly, if the pair (u, y) has compact support, the summands y0, y1, y2i, y3 must have compact support. Hence, in order to prove that (5.51) holds for Σ, it suffices to prove it for Σ0, Σ1, Σ2,i, and Σ3. We now prove that for each of these systems the inequality analogous to (5.51) holds.

For Σ0,

    ∫_{−∞}^{0} u(t) y0(t) dt = a0 ∫_{−∞}^{0} u(t) (d/dt) u(t) dt = ½ a0 u²(0) ≥ 0.

For Σ1, assuming b0 > 0,

    ∫_{−∞}^{0} u(t) y1(t) dt = (1/b0) ∫_{−∞}^{0} ((d/dt) y1(t)) y1(t) dt = y1²(0)/(2 b0) ≥ 0.

For Σ2,i we introduce the auxiliary variable ξ: u = (ωi² + d²/dt²) ξ, y2i = ai (d/dt) ξ. Then

    ∫_{−∞}^{0} u(t) y2i(t) dt = ai ∫_{−∞}^{0} (ωi² + d²/dt²) ξ · (d/dt)ξ dt = ½ ai ωi² ξ²(0) + ½ ai ((d/dt)ξ(0))² ≥ 0.

Finally, consider Σ3. Note that Re H0(iω) = Re H(iω) for all ω ∈ R, iω not a pole of H. Using Parseval's theorem, it follows that for L2 signals there holds

    ∫_{−∞}^{∞} u(t) y3(t) dt = (1/2π) ∫_{−∞}^{∞} û(−iω) H0(iω) û(iω) dω = (1/2π) ∫_{−∞}^{∞} Re(H0(iω)) |û(iω)|² dω ≥ 0,
where û is the Fourier transform of u. Since u ∈ L2(R, R), and since d0 is Hurwitz, it follows that y3 ∈ L2(R, R). To show that ∫_{−∞}^{0} u(t) y3(t) dt ≥ 0, consider the input u′ defined as u′(t) = u(t), t ≤ 0, and u′(t) = 0, t > 0, and let the corresponding output be y3′. Obviously, (u′, y3′) is an admissible input-output pair for the system. Note that u′ may not be in C∞, and therefore a smoothness argument is strictly speaking required to complete the proof of nonnegativity; but we need the inequality for integrals on the negative real line R− only. From the previous result it follows that

    ∫_{−∞}^{∞} u′(t) y3′(t) dt = ∫_{−∞}^{0} u(t) y3(t) dt ≥ 0.

The desired inequality thus follows: ∫_{−∞}^{0} u(t) y(t) dt ≥ 0, where y = y0 + y1 + Σ_{i=1}^{k} y2i + y3.

(c) ⇒ (a): We show this by considering exponential trajectories. Consider u(t) = e^{st} and y(t) = H(s) e^{st}, for s with Re(s) > 0 and s not a pole of H. Note that such input-output pairs do not have compact support, and therefore an approximation argument is strictly speaking required. Now by (c), there holds

    0 ≤ Re ∫_{−∞}^{0} u∗(t) y(t) dt = Re H(s) ∫_{−∞}^{0} e^{2 Re(s) t} dt.

Hence Re H(s) ≥ 0, and H is p.r. ∎

We now discuss what constraints are imposed by positive realness on state space representations. Let Σ = [A B; C D] be a minimal (i.e., reachable and observable) state space representation of H, with state space Rn. Assume, without loss of generality, that H is proper (otherwise, exchange the roles of the input and the output, and consider H−1). Define Θreq : Rn → R by (5.43), namely

    Θreq(x0) = inf ∫_{−∞}^{0} u(t) y(t) dt,

where the infimum is taken over all (u, x, y) of compact support that satisfy the system equations and in addition x(0) = x0.

Theorem 5.23. The following conditions are equivalent.

1. H is p.r.

2. Σ is dissipative with respect to the supply rate s = u∗y + y∗u.

3. There exists X ∈ Rn×n, X = X∗ > 0, such that the Linear Matrix Inequality (LMI)

    [ A∗X + XA    XB − C∗  ]  ≤ 0
    [ B∗X − C    −D − D∗   ]

holds.

Proof. (1) ⇒ (2). From the previous result it follows that ∫_{−∞}^{0} u(t) y(t) dt ≥ 0 for all (u, x, y) of compact support; hence Θreq(x0) ≥ 0. Further, for all such (u, x, y), there holds

    Θreq(x(t1)) = inf ∫_{−∞}^{t1} u′(t) y′(t) dt ≤ ∫_{t0}^{t1} u(t) y(t) dt + inf ∫_{−∞}^{t0} u′(t) y′(t) dt = ∫_{t0}^{t1} u(t) y(t) dt + Θreq(x(t0)),

where the infima are taken over all elements (u′, x′, y′) of compact support, constrained by x′(t1) = x(t1) for the first infimum, and x′(t0) = x(t0) for the second. This proves that Σ defines a dissipative system with storage function Θreq, since the dissipation inequality (5.40) is satisfied.
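The LMI of theorem 5.23 can be verified directly on a small example of our own choosing: a minimal realization of H(s) = 1/(s + 1), a passive RC-type impedance, with the candidate storage matrix X = 1. The LMI matrix then equals diag(−2, 0), whose eigenvalues are all ≤ 0:

```python
import numpy as np

# Minimal realization of H(s) = 1/(s+1): A = -1, B = 1, C = 1, D = 0.
A = np.array([[-1.0]]); B = np.array([[1.0]])
C = np.array([[1.0]]);  D = np.array([[0.0]])

def pr_lmi(X):
    """LMI matrix of theorem 5.23 for a candidate X = X* > 0."""
    top = np.hstack([A.T @ X + X @ A, X @ B - C.T])
    bot = np.hstack([B.T @ X - C, -D - D.T])
    return np.vstack([top, bot])

X = np.array([[1.0]])            # candidate storage matrix
eigs = np.linalg.eigvalsh(pr_lmi(X))
print(eigs)                      # all eigenvalues <= 0: the LMI holds, H is p.r.
```

The off-diagonal blocks X B − C∗ vanish for X = 1, so the LMI reduces to A∗X + XA = −2 ≤ 0; H is indeed positive real.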
(2) ⇒ (3). Let Θ be the storage function. We first prove that (2) implies ∫_{−∞}^{0} u(t) y(t) dt ≥ 0 for all (u, x, y) of compact support. Since Θ is bounded from below, there holds

    ∫_{−∞}^{0} u(t) y(t) dt ≥ Θ(x(0)) − Θ(0) ≥ inf_{x ∈ Rn} Θ(x) − Θ(0).

Hence ∫_{−∞}^{0} u(t) y(t) dt is bounded from below for all (u, x, y) of compact support. We claim that 0 is a lower bound. This is easily proven by contradiction: assume that there exists (u, y) of compact support such that ∫_{−∞}^{0} u(t) y(t) dt < 0; now use the input κu and let κ → ∞ to obtain a contradiction.

We now prove that ∫_{−∞}^{0} u(t) y(t) dt ≥ 0 for all (u, x, y) of compact support implies (3). Here is the idea. It suffices to prove that Θreq(x) is a quadratic function of x, equivalently, that x∗Xx is a storage function. We will need the convolution operator S, which maps the space of C∞(R−, Rm) functions of compact support into itself by

    S(u)(t) = D u(t) + ∫_{−∞}^{t} C e^{A(t−τ)} B u(τ) dτ.

We will also need the operator T, which maps the space of C∞(R−, Rm) functions of compact support into Rn by

    T(u) = ∫_{−∞}^{0} e^{−At} B u(t) dt.

Θreq can be defined as

    Θreq(x0) = inf ⟨u, S(u)⟩_{L2(R−, Rm)},

subject to the constraint T(u) = x0, where the infimum is taken over all u ∈ C∞(R−, Rm) of compact support. By assumption, ⟨u, S(u)⟩_{L2(R−, Rm)} ≥ 0. The fact that the infimum is a quadratic function of x0 then readily follows from first principles: Θreq(x0) is the infimum of a quadratic functional with a linear constraint.

(3) ⇒ (1) is straightforward. Note that the LMI implies

    (d/dt)(x∗Xx) ≤ u∗y + y∗u,

for all (u, x, y), and in particular that the dissipation inequality is satisfied. ∎

From theorem 5.19, in particular combining (5.48) and (ARE), we obtain the following result.

Corollary 5.24. (a) Positive Real Lemma. The minimal system Σ = [A B; C D] is dissipative with respect to the supply rate s = y∗u + u∗y, if, and only if, there exist X = X∗ ≥ 0, K and L such that

    A∗X + XA + K∗K = 0,   XB + K∗L = C∗,   D + D∗ = L∗L.        (5.52)

(b) Let D + D∗ be nonsingular, and ∆ = (D + D∗)−1. The system Σ is positive real if, and only if, there exists a positive semidefinite solution X = X∗ ≥ 0 to the Riccati equation

    (A∗ − C∗∆B∗) X + X (A − B∆C) + XB∆B∗X + C∗∆C = 0.        (PRARE)

We conclude this section with a result concerning p.r. rational functions with real or pure imaginary poles.

Corollary 5.25. Assume that all the roots of both d and n are real. Then H is p.r. if, and only if, the roots of d and n are nonpositive and interlacing, and the leading coefficients of d and n have the same sign. If all the roots of both d and n are purely imaginary, H is p.r. if, and only if, the roots of d and n interlace, and the leading coefficients of d and n have the same sign.

Proof. The first assumption on the poles and zeros implies that H can be written as

    H(s) = a0 + Σ_{i=1}^{n} ai / (s + bi),

with 0 ≤ b0 < b1 < ⋯ < bn, and ai > 0, for i = 1, 2, ..., n. This implies that H is p.r., since it is the sum of p.r. functions: each term of the sum is p.r. This can also be seen from the state representation with

    A = diag(−b1, −b2, ..., −bn),   B∗ = C = [ √a1  √a2  ⋯  √an ],   D = a0.

The LMI is satisfied with X = In. To prove the converse, use the interlacing property to conclude that H admits a partial fraction expansion as above.

In the purely imaginary case, by investigating the behavior of the imaginary part of H on the imaginary axis, we conclude that between any two poles there must be at least one zero; since H is strictly proper, it follows that there is exactly one zero between any two poles. The assumptions on the zeros (examine the behavior of H on the real axis) and on the leading coefficients imply further that a0 ≥ 0 and ai > 0, for i = 1, 2, ..., n. Hence H p.r. implies that its partial fraction expansion looks like

    H(s) = a0/s + Σ_{i=1}^{n} ai s / (s² + ωi²),

with a0 ≥ 0, ai > 0, for i = 1, 2, ..., n, and 0 ≤ ω1 < ω2 < ⋯ < ωn. Obviously now, H is p.r., since each term of the sum is. This can also be seen from the state representation with

    A = diag( 0, [0 −ω1; ω1 0], ⋯, [0 −ωn; ωn 0] ),   B∗ = C = [ √a0  √a1  0  √a2  0  ⋯  √an  0 ],   D = 0;

if a0 = 0, delete the first element. Observe that the LMI is satisfied with X = In. ∎

Some remarks on network synthesis.

Theorem 5.26. H is the driving point impedance of an electric circuit that consists of an interconnection of a finite number of positive R's, positive C's, positive L's, and transformers if and only if H is p.r.

The only if part of the above theorem is analysis; the if part is synthesis. The if part, undoubtedly the most spectacular result in electric circuit theory, was proved by Otto Brune in his M.I.T. Ph.D. dissertation [76]. Brune's synthesis, however, requires ideal transformers. It turns out that the required number of reactances (that is, the number of L's and C's combined) is equal to max{deg(d), deg(n)}, the McMillan degree of H. In 1949, Bott and Duffin proved, in a half-page (!) paper which appeared in the "Letters to the Editor" section of the Journal of Applied Physics [68], that transformers are not needed. This problem was a well-known open problem at that time, and Raoul Bott solved it as a graduate student working under the direction of Richard Duffin at Carnegie Institute of Technology. However, the price one has to pay to avoid transformers is the exponential increase in the number of reactances: transformerless synthesis requires a number of reactances that is larger than max{deg(d), deg(n)}. In terms of state representations, this means that we will end up with a state dimension that is larger than the McMillan degree of H. The Bott-Duffin synthesis is a strong case for the importance of nonminimal state representations.

Problem. When does d(d/dt)V = n(d/dt)I describe an electric circuit that consists of an interconnection of a finite number of positive R's, positive C's, positive L's, and transformers? The above is an open problem. The answer is more involved than: if, and only if, the transfer function is p.r. and the common factors are stable.

An important advance came in the 1960s with the Positive Real Lemma, which introduced the state description in network analysis and synthesis. A detailed treatment of these issues can be found in the book by Anderson and Vongpanitlerd [6].

5.9.2 Contractivity and the bounded real lemma*

Consider a stable system Σ (eigenvalues of A have negative real part), with the quadratic supply rate s = ‖u‖² − ‖y‖² defined by (5.45).
We will now see that the dissipativity of Σ with respect to s is equivalent, among other things, to the H∞ norm of Σ being no greater than one. Systems satisfying this relationship are, for instance, passive electric circuits with input a voltage V and output some current I, where u = ½(V + I) and y = ½(V − I).

The following is a special case of theorem 5.19.

Lemma 5.27. Bounded Real Lemma. Given is a stable and reachable system Σ = [A B; C 0]. The following statements are equivalent.

[1] The 2-induced norm of Σ is at most equal to one: ‖Σ‖₂ ≤ 1.

[2] There exists a positive semidefinite X ∈ Rn×n, X ≥ 0, and a matrix K ∈ Rm×n such that

    (d/dt)(x∗Xx) = u∗u − y∗y − ‖u + Kx‖².        (5.53)

[3] There exists a positive semidefinite solution X ≥ 0 of the Riccati equation

    A∗X + XA + C∗C + XBB∗X = 0.        (5.54)

The solution is positive definite, X > 0, if the pair (C, A) is observable.

[4] There exist X ≥ 0 and K such that

    A∗X + XA + C∗C + K∗K = 0,   XB + K∗ = 0.

Proof. [1] → [2]. Consider the available storage,

    Θavail(x(0)) = sup { − ∫_0^∞ ( ‖u‖² − ‖y‖² ) dt },

subject to the system equations. First notice that Θavail(x(0)) ≥ 0 (take u = 0 and y(t) = C e^{At} x(0), t ≥ 0). We will show that it is a quadratic storage function. This is done as follows. With t1 ≥ 0, we have

    Θavail(x(0)) ≥ − ∫_0^{t1} s dt − ∫_{t1}^{∞} s dt.

Taking the supremum over all trajectories with x(t1) fixed, the second term on the right-hand side of the above inequality becomes Θavail(x(t1)). Therefore Θavail is a storage function, since it satisfies

    Θavail(x(t0)) ≥ − ∫_{t0}^{t1} s dt + Θavail(x(t1)).

Furthermore, since the supply function is quadratic, by optimal control the storage function Θavail (which is the optimal cost) is also quadratic: Θ = x∗Xx. If a storage function exists, there exists K such that (5.53) is satisfied, where X satisfies the Riccati equation (5.54).

[2] → [1]. Integrating equation (5.53) from t = 0 to t = T, with initial condition x(0) = 0, we obtain

    ‖u‖²_{L2(0,T)} − ‖y‖²_{L2(0,T)} ≥ x∗(T) X x(T) ≥ 0,  ∀ T > 0,        (5.56)

where the last inequality follows from the fact that X is positive semidefinite. Therefore

    ‖y‖_{L2(0,T)} ≤ ‖u‖_{L2(0,T)},  ∀ T > 0,        (5.57)

and this property holds as T → ∞. In other words, the L2(0, ∞) norm of y is always less than that of u, and hence the 2-induced norm of the convolution operator is less than or equal to 1.
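As an illustration of lemma 5.27 (the numerical values are our own, not from the text), consider the scalar system A = −1, B = 1, C = 0.6, for which H(s) = 0.6/(s + 1). The Riccati equation of part [3] can be solved by hand, and part [1] can be checked on a frequency grid:

```python
import numpy as np

# Scalar example: A = -1, B = 1, C = 0.6, D = 0, so H(s) = 0.6/(s+1).
a, b, c = -1.0, 1.0, 0.6

# Riccati equation of part [3]: 2aX + c^2 + (bX)^2 = 0, i.e. X^2 - 2X + 0.36 = 0.
# The stabilizing root (a + b^2 X = -0.8 < 0) is X = 1 - sqrt(1 - c^2).
X = 1.0 - np.sqrt(1.0 - c**2)
residual = 2*a*X + c**2 + (b*X)**2
print(X, residual)               # X = 0.2 and residual (numerically) 0

# Part [1] on a frequency grid: sup_w |H(iw)| = 0.6 <= 1.
w = np.linspace(0.0, 100.0, 10001)
hinf = np.abs(c / (1j*w - a)).max()
print(hinf)                      # 0.6
```

Both positive semidefinite roots 0.2 and 1.8 solve the quadratic; the smaller one is the stabilizing solution, and the supremum of |H(iω)|, attained at ω = 0, is 0.6 ≤ 1, consistent with the lemma.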
[2] → [3]. The left-hand side of (5.53) can be rewritten as

    ((d/dt)x∗) X x + x∗ X ((d/dt)x) = (Ax + Bu)∗ X x + x∗ X (Ax + Bu).

Thus, expanding the right-hand side of this equation and collecting terms on the left-hand side, we obtain e = 0, where

    e = (d/dt)(x∗Xx) − u∗u + y∗y + (u∗ + x∗K∗)(u + Kx).

Computing and collecting terms we obtain the expression

    e = x∗(A∗X + XA + C∗C + K∗K) x + u∗(B∗X + K) x + x∗(XB + K∗) u.

By requiring that e = 0 for all x, u, we obtain K = −B∗X; substituting in the term which is quadratic in x, we obtain (5.54). The implication [3] → [2] follows by reversing the above steps. Finally, the equivalence [3] ↔ [4] is straightforward. ∎

Remark. The Bounded Real Lemma states that the H∞ norm of Σ is at most one, if, and only if, there exist X ≥ 0, K, L such that

    A∗X + XA + C∗C + K∗K = 0,
    XB + C∗D + K∗L = 0,
    I − D∗D = L∗L.        (5.58)

This follows from (5.48) with Q11 = −C∗C, Q12 = −C∗D, and Q22 = I − D∗D. If D = 0, these equations reduce to those in part [4] of lemma 5.27.

5.10 Chapter summary

In the first part of this chapter we discussed various norms for vector- and matrix-valued functions of time and frequency.
As in the case of constant vectors and matrices, the most useful are the p norms, where p = 1, 2, ∞. For vector- or matrix-valued time/frequency signals, one needs to choose a norm both for the spatial dimension as well as for time/frequency. Although these are usually taken to be the same, this is not necessary. Various so-called mixed induced norms are discussed in section 5.6. For a summary of this part, see section 5.3.

Norms of linear systems Σ are introduced next. For open systems, that is, systems with inputs and outputs, of importance are the 2-induced norm of the convolution operator S and the 2-induced norm of the associated Hankel operator H. The former is also known as the ∞-norm of the system, because it is the ∞-norm of the transfer function of Σ; the latter is known as the Hankel norm of Σ. Another important norm is the H2 norm of Σ; following formulae (5.38) and (5.39), two different ways of computing it are given. Important invariants are the singular values of H, known as the Hankel singular values of Σ; they are the square roots of the eigenvalues of the product of the (infinite) reachability and observability gramians, and consequently their computation involves the solution of two Lyapunov equations. As we will see in the sequel, these invariants provide a tradeoff between desired accuracy and required complexity for a certain type of approximation. The system norms discussed above assume the stability of Σ; these definitions can be extended to unstable Σ.

The second part of the chapter is devoted to stability and dissipativity. The former is a property of the trajectories of an autonomous or free system: they must all remain bounded or decay to zero. The concept of Lyapunov function was also discussed. For open systems, the proper generalization of stability is provided by the concept of dissipativity. The idea is that, given a supply function (e.g., the power supplied to an electric circuit), only part of the supplied energy can be stored, while the rest is dissipated. Besides the supply function s, important are the storage function Θ and the dissipation function d, which are related by means of the equation

    d = s − (d/dt) Θ.

The important special cases of passive and contractive systems are discussed in some detail.
Chapter 6

Sylvester and Lyapunov equations

This chapter is dedicated to the investigation of the Sylvester equation, which is a linear matrix equation. A special case is the Lyapunov equation. These equations underly many of the considerations for model reduction.

Lyapunov was interested in studying the distribution of the roots of a polynomial equation with respect to the imaginary axis in the complex plane. Of course, one can do this by defining a matrix whose characteristic polynomial is equal to the given one, and checking the signs of the principal minors of the matrix; for details see, e.g., chapter X of Gantmacher [133]. But Lyapunov wanted to accomplish this using quadratic forms. He observed that under mild assumptions, the solution of an appropriate Lyapunov equation is symmetric (and thus defines a quadratic form), and its eigenvalues are distributed in the same way as the roots of the original polynomial. The distribution of the eigenvalues of a symmetric matrix can in turn be determined, without computing the solution first, using a classical result of Jacobi. Thus the Lyapunov equation has a remarkable property, called the inertia theorem, which is a powerful way of checking stability of approximants.

We begin by listing several ways for solving the Sylvester and Lyapunov equations (section 6.1). For the semidefinite Lyapunov equation, the most reliable is the square root method, which computes a Cholesky factor of the solution directly. In the following section 6.2, the inertia result is stated and proved using two different methods. The chapter concludes with remarks on the numerical stability of solution algorithms for the Sylvester and Lyapunov equations.

6.1 The Sylvester equation

Given the matrices A ∈ Rn×n, B ∈ Rk×k, and C ∈ Rn×k, the matrix equation

    AX + XB = C        (6.1)

in the unknown X ∈ Rn×k is referred to as the Sylvester equation. In case B = A∗ and C = Q = Q∗, the resulting equation

    AP + PA∗ = Q        (6.2)

in the unknown P ∈ Rn×n is referred to as the Lyapunov equation. Before discussing methods for solving these equations, we state a condition for the existence of a solution. First we need to define the following matrices:

    M1 = [ A  −C ],   M2 = [ A   0 ],        (6.3)
         [ 0  −B ]         [ 0  −B ]

both being square with size (n + k). Roth [276] proved the following result.
Proposition 6.1. Equation (6.1) has a solution if, and only if, the matrices M1 and M2 are similar.

Proof. Let X be a solution of (6.1). Then

    M3 = [ I  X ]
         [ 0  I ]

is nonsingular and satisfies M1 M3 = M3 M2. Conversely, if M1 and M2 are similar, there exists a nonsingular matrix

    M3 = [ X11  X12 ]
         [ 0    X22 ]

such that M1 M3 = M3 M2. It follows that X = X12 X22−1 is a solution of (6.1). ∎

It should be remarked that reachability and observability are intimately related with properties of the Sylvester equation. We will also derive conditions (6.7) and (6.8), which guarantee the existence of a unique solution. Several methods for solving this equation have been developed and will be discussed in the sequel:

1. The Kronecker product method
2. The eigenvalue/complex integration method
3. The eigenvalue/eigenvector method
4. The characteristic polynomial method
5. The invariant subspace method
6. The sign function method
7. The infinite sum method
8. The square root method (section 6.8)

6.1.1 The Kronecker product method

The Sylvester equation (6.1) can be analyzed using the Kronecker product. Given P = (pij) ∈ Rp×q and R ∈ Rr×s, their Kronecker product is defined as follows:

    P ⊗ R = [pij R] = [ p11 R  ⋯  p1q R ]
                      [  ⋮           ⋮  ]  ∈ Rpr×qs.
                      [ pp1 R  ⋯  ppq R ]

We can also write

    P ⊗ R = [ p1 ⊗ R  ⋯  pq ⊗ R ] = [ p1 ⊗ r1  ⋯  p1 ⊗ rs  ⋯  pq ⊗ r1  ⋯  pq ⊗ rs ],

where pi, rj denote the ith column of P and the jth column of R, respectively. The vector associated to a matrix T ∈ Rq×r is

    vec(T) = ( t11 ⋯ tq1  t12 ⋯ tq2  ⋯  t1r ⋯ tqr )∗ ∈ Rqr.

It follows that vec(xy∗) = [yi x] = y ⊗ x, and therefore

    vec( [x x̂] [y ŷ]∗ ) = y ⊗ x + ŷ ⊗ x̂.

Therefore, if C = (c1 ⋯ ck) ∈ Rn×k, we can write C = c1 e1∗ + ⋯ + ck ek∗, which implies

    vec(C) = e1 ⊗ c1 + ⋯ + ek ⊗ ck ∈ Rnk.

Using the Kronecker product, the left-hand side of equation (6.1) can be rewritten as vec(AX + XB) = A_B vec(X), and thus

    A_B vec(X) = vec(C),  where  A_B = Ik ⊗ A + B∗ ⊗ In.        (6.4)

For details we refer to, among others, Horn and Johnson [181], [219], [220], [96].
Let the EVD (eigenvalue decomposition) of A and B be:

    AV = VΛ,  V = [v1 ⋯ vn],    W∗B = MW∗,  W = [w1 ⋯ wk].        (6.5)

We assume for simplicity that these matrices are diagonalizable: Λ = diag(λi), M = diag(µj). It can be shown (see Horn and Johnson [181, page 268]) that the eigenvalues of A_B are

    λ(A_B) = λi(A) + λj(B),  ∀ i, j.

Given the eigenvectors of A and B, the eigenvectors of A_B are

    [ vec(v1 w1∗)  ⋯  vec(vn wk∗) ] = [ w1 ⊗ v1  ⋯  w1 ⊗ vn  ⋯  wk ⊗ v1  ⋯  wk ⊗ vn ] = [ w1 ⊗ V  ⋯  wk ⊗ V ] = W ⊗ V.        (6.6)

As a consequence we obtain:

Proposition 6.2. The Sylvester equation (6.1) has a unique solution X if, and only if,

    λi(A) + λj(B) ≠ 0,  ∀ i, j.        (6.7)

Corollary 6.3. The Lyapunov equation (6.2) has a unique solution P if, and only if,

    λi(A) + λj∗(A) ≠ 0,  ∀ i, j.        (6.8)

Furthermore, if the solution is unique, it is hermitian: P = P∗.

In this case (6.4) yields

    vec(X) = A_B−1 vec(C).
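Formula (6.4) translates directly into a dense solver: form A_B with Kronecker products, solve the nk × nk linear system, and unstack. The sketch below (function name ours; the triangular test matrices are chosen so that λi(A) + λj(B) ≠ 0, as proposition 6.2 requires) illustrates this:

```python
import numpy as np

def sylvester_kron(A, B, C):
    """Solve AX + XB = C via (6.4): (I_k kron A + B^T kron I_n) vec(X) = vec(C).
    vec stacks columns, matching the definition in the text."""
    n, k = A.shape[0], B.shape[0]
    AB = np.kron(np.eye(k), A) + np.kron(B.T, np.eye(n))
    x = np.linalg.solve(AB, C.flatten(order="F"))   # vec(C), column stacking
    return x.reshape((n, k), order="F")

# Triangular test data: the eigenvalues are the diagonal entries, so
# lambda_i(A) + lambda_j(B) is never zero and proposition 6.2 applies.
A = np.diag([-1.0, -2.0, -3.0, -4.0]) + np.triu(np.ones((4, 4)), 1)
B = np.diag([1.5, 2.5, 3.5]) + np.triu(np.ones((3, 3)), 1)
C = np.arange(12.0).reshape(4, 3)
X = sylvester_kron(A, B, C)
print(np.linalg.norm(A @ X + X @ B - C))   # residual, numerically zero
```

The method is conceptually transparent but requires O(n³k³) work, which is why the structured methods discussed next are preferred for all but small problems.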
6.1.2 The eigenvalue/complex integration method

Consider the Sylvester equation where the term zX, z ∈ C, is added and subtracted on both sides:

    (A − zI)X + X(B + zI) = C  ⇒  X(zI + B)−1 + (A − zI)−1X = (A − zI)−1 C (B + zI)−1.

Assuming that the sets of eigenvalues of A and −B are disjoint, let Γ be a Cauchy contour¹ which contains all the eigenvalues of A, but none of the eigenvalues of −B. There holds

    ∮_Γ (B + zI)−1 dz = 0,    ∮_Γ (A − zI)−1 dz = 2πi In.

The earlier expression together with these integrals yields the solution of the Sylvester equation as a complex integral:

    X = (1/2πi) ∮_Γ (A − zI)−1 C (B + zI)−1 dz.        (6.9)

Remark. This method can also be used to obtain a solution of the equation A1XA2 + B1XB2 = C. Provided that the spectra of the associated pencils zB1 − A1 and zA2 + B2 are disjoint, the solution can be expressed as follows:

    X = (1/2πi) ∮_Γ (zB1 − A1)−1 C (zA2 + B2)−1 dz,

where Γ is a Cauchy contour which includes the spectrum of the pencil zB1 − A1 and excludes that of the pencil zA2 + B2.

¹The reader may consult (almost) any book on complex analysis for details on complex integration and Cauchy contours.
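The contour-integral representation can be evaluated numerically. The sketch below (test matrices ours) integrates (zI − A)−1 C (zI + B)−1 over a counterclockwise circle enclosing the eigenvalues of A and excluding those of −B, which matches (6.9) up to the orientation convention of the contour; the trapezoidal rule on a circle converges rapidly for analytic integrands:

```python
import numpy as np

# X = (1/2 pi i) contour integral of (zI - A)^{-1} C (zI + B)^{-1} dz over a
# counterclockwise circle enclosing eig(A) = {-1, -2} and excluding
# eig(-B) = {-3, -4} (equivalent to (6.9) up to contour orientation).
A = np.diag([-1.0, -2.0])
B = np.diag([3.0, 4.0])
C = np.array([[1.0, 2.0], [3.0, 4.0]])

center, radius, m = -1.5, 1.0, 200
X = np.zeros((2, 2), dtype=complex)
for j in range(m):
    phi = 2*np.pi*j/m
    z = center + radius*np.exp(1j*phi)
    dz = 1j*radius*np.exp(1j*phi) * (2*np.pi/m)    # quadrature weight on the circle
    X += np.linalg.inv(z*np.eye(2) - A) @ C @ np.linalg.inv(z*np.eye(2) + B) * dz
X = (X / (2j*np.pi)).real

print(np.linalg.norm(A @ X + X @ B - C))           # residual, numerically zero
```

For diagonal A and B the exact solution is Xij = Cij/(λi + µj), and the quadrature reproduces it to machine precision with a modest number of contour points.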
The special case of Lyapunov equations

If we consider the Lyapunov equation, in other words B = A∗, C = Q, and A has eigenvalues in C−, we can choose the contour Γ above as the imaginary axis: z = iω. Formula (6.9) becomes

    P = (1/2π) ∫_{−∞}^{∞} (iωI − A)−1 Q (−iωI − A∗)−1 dω.        (6.10)

It readily follows by Parseval's theorem of Fourier transforms that P can be expressed in the time domain as

    P = ∫_0^∞ e^{At} Q e^{A∗t} dt,

which is the same as (4.43) with Q in place of BB∗.

We now examine the case where A has eigenvalues both in C− and in C+, but not on the imaginary axis. In such a case the solution of the Lyapunov equation does not admit an integral representation in the time domain (such as ∫ e^{At} Q e^{A∗t} dt). The question posed is of interest when dealing with Lyapunov equations where A is neither stable (eigenvalues in C−) nor antistable (eigenvalues in C+), but has eigenvalues in both half planes. The question we wish to address is: what equation does the quantity

    P̂ = (1/2π) ∫_{−∞}^{∞} (iωI − A)−1 Q (−iωI − A∗)−1 dω ∈ Rn×n        (6.11)

satisfy? First notice that under the assumptions above, this expression is well defined. In order to address this issue, we define the projection Π onto the stable eigenspace of A in Rn; then I − Π is the projection onto the antistable eigenspace of A in Rn. This projection can be computed as follows. Let T be a transformation such that T−1AT = diag{Λ−, Λ+}, where all the eigenvalues of Λ− are in C− and all the eigenvalues of Λ+ are in C+. Then

    Π = T [ I  0 ] T−1.
          [ 0  0 ]

Notice that

    A = ΠAΠ + (I − Π)A(I − Π) = A− + A+.

The main result of this section is due to Godunov [141].

Lemma 6.4. P̂ defined by (6.11) solves the following Lyapunov equation:

    AP̂ + P̂A∗ = (I − Π)Q(I − Π) − ΠQΠ.        (6.12)

With the notation Q− = ΠQΠ and Q+ = (I − Π)Q(I − Π), this Lyapunov equation can be considered as the sum of the Lyapunov equations

    A− P̂− + P̂− A∗− = −Q−,    A+ P̂+ + P̂+ A∗+ = Q+.        (6.13)

Thus P̂ = P̂− + P̂+ has an integral representation in the time domain, namely

    P̂ = ∫_0^∞ e^{A−t} Q− e^{A∗−t} dt + ∫_{−∞}^0 e^{A+t} Q+ e^{A∗+t} dt,
        ———— P̂− ————                 ———— P̂+ ————

where the signs above are the ones consistent with the stable and antistable special cases of (6.10).
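The frequency-domain integral defining P̂ can be evaluated numerically for a small A with eigenvalues on both sides of the imaginary axis; the substitution ω = tan θ maps the infinite range to a finite one. (Example data ours; the right-hand side checked below is the sign convention consistent with the stable and antistable special cases of (6.10).)

```python
import numpy as np

# A has one stable and one antistable eigenvalue (none on the imaginary axis).
A = np.diag([-1.0, 2.0])
Q = np.eye(2)

# P_hat = (1/2 pi) integral of (iwI - A)^{-1} Q (-iwI - A*)^{-1} dw, with w = tan(theta)
m = 4001
theta = np.linspace(-np.pi/2, np.pi/2, m + 2)[1:-1]
h = theta[1] - theta[0]
vals = []
for th in theta:
    w = np.tan(th)
    L = np.linalg.inv(1j*w*np.eye(2) - A)
    R = np.linalg.inv(-1j*w*np.eye(2) - A.conj().T)
    vals.append(L @ Q @ R / np.cos(th)**2)         # dw = sec^2(theta) dtheta
V = np.array(vals)
P = ((V[0] + V[-1])/2 + V[1:-1].sum(axis=0)).real * h / (2*np.pi)  # trapezoid

# Here Pi = diag(1, 0), so the right-hand side of lemma 6.4 is
# (I-Pi) Q (I-Pi) - Pi Q Pi = diag(-1, 1):
print(A @ P + P @ A.T)
```

For this A the exact value is P̂ = diag(1/2, 1/4), and the residual A P̂ + P̂ A∗ indeed approximates diag(−1, 1), exhibiting the splitting of Q by the projection Π.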
Remark 6. The ﬁrst formula follows. ∗ Recall the EVDs given in (6.i i 6. Let also the left eigenvectors ∗ ˆ1 v . wj are right. c1 ∈ Rn . In order to ACA April 14..6). Due to i=1 vi v vi v can be shown similarly. and the right eigenvectors of B be W = W = [w ∗ ˆn v k n ∗ ∗ ˆ i wi ˆi = Ik . where λi is an eigenvalue of A. . ∗ Let us assume for argument’s sake that C is rank one. ˆ1 X = V4 5 = (µ1 In + A) C w .5.c1 ) ˆ1 c∗ 2w . it can then be written as C = c1 c2 . ˆn c∗ 2w ˜∗ W ∗ W ˜ is the matrix of scaled right eigenvectors of A while W ˜ is the matrix of scaled left eigenvectors of B. 2004 i i i i . . . the second ˆ i = In we have i=1 vi v ˆ i C(λi Ik + B) . µj . the solution of the Sylvester = In . The Sylvester equation One way to analyze the operator L is using the eigenvalue decomposition (EVD) of A and B. c2 ∈ Rk . Then.5) and (6.2.1. With the notation established above.15) can be written as follows: ∗ ∗ ˆ 1 c1 v c2 (λ1 I + B)−1 . −1 . ∗ ˆn v c1 ˜ V −1 c∗ 2 (λn I + B) O (c∗ 2 .1.. left eigenvectors of A.15) Proof. vi . Given these eigenvalue decompositions.1) can be expressed as a sum of rankone matrices: n k ∗ ˆi vi v C (λi Ik + B)−1 = i=1 j =1 ∗ ˆ j wj (µj In + A)−1 C w X= (6. It should be mentioned that the formulae above contain a number of ingredients which are the same as those in [150]. i “oa” 2004/4/14 page 149 i 149 Theorem 6. We will now present a connection between the solution of the Sylvester equation and the L¨ owner matrix.B) = (µ1 I + A)−1 c1 · · · (µk I + A)−1 c1 R(A. The Sylvester equation and the L¨ owner matrix. since ∗ ∗ ∗ ∗ L(vi wj ) = Avi wj + vi w j B = (λi + µj )vi wj ∗ we conclude that vi wj is an eigenvector of L corresponding to the eigenvalue λi + µj . .14) These expressions can also be written as ∗ ˆ1 v C (λ1 Ik + B)−1 6 7 . X = V . ∗ −1 ˆ n C (λn Ik + B) v 2 3 ··· ˆ k W∗ (µk In + A)−1 C w (6. it readily follows of A be V = V = .1) follows (A − λi In )X + X(B + λi Ik ) = C. with corresponding ∗ ∗ ∗ ∗ ˆ i . respectively. 
hence vi v X= n n ∗ ∗ ∗ − 1 ˆ i X = X. and i=1 w that i=1 vi v equation can be explicitly written down. B. the solution of the Sylvester equation (6. consequently a connection is established between the Sylvester equation and rational interpolation. hence v ˆi ˆi ˆi ˆi left/right eigenvectors v (A − λi In ) = 0. Then (6. which leads to v X=v C(λi Ik + B)−1 . From (6. . −1 −∗ ˆ ˆ ˆ1 ···w ˆ k ]. corresponding to the eigenvalues λi . It is Thus V interesting to notice that the remaining matrices O and R are directly related with rational interpolation. it follows that vi . .
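The first formula in (6.14) can be sketched in a few lines of Python/NumPy, assuming A diagonalizable and λi(A) + μj(B) ≠ 0; the matrices are random illustrations, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 5, 3
A = rng.standard_normal((n, n))
B = rng.standard_normal((k, k))
C = rng.standard_normal((n, k))

# EVD of A: columns of V are right eigenvectors; the rows of V^{-1} play
# the role of the scaled left eigenvectors vhat_i^* in (6.14)
lam, V = np.linalg.eig(A)
Vinv = np.linalg.inv(V)

# X = sum_i v_i vhat_i^* C (lam_i I_k + B)^{-1}
X = np.zeros((n, k), dtype=complex)
for i in range(n):
    X += np.outer(V[:, i], Vinv[i, :]) @ C @ np.linalg.inv(lam[i]*np.eye(k) + B)

print(np.max(np.abs(A @ X + X @ B - C)))  # ~ 0 (X is real up to rounding)
```

Each rank-one term contributes along one eigendirection of A, mirroring the proof of the theorem.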
In order to establish this, recall (4.86) as well as (4.87), which show that the Löwner matrix can be factorized as a product of generalized reachability and observability matrices. The matrices R(A, c1) and O(c2∗, B) defined above are precisely these generalized matrices; thus the solution X of the rank-one Sylvester equation and the Löwner matrix are the same up to congruence, i.e. up to multiplication on either side by the nonsingular scaling matrices Ṽ and W̃∗.

6.1.4 Characteristic polynomial methods

There are two methods that make use of characteristic polynomials. The first uses the characteristic polynomial of the operator L, and the second the characteristic polynomial of either A or B. We begin with the former.

Given a matrix Γ ∈ R^{n×n} with characteristic polynomial χΓ(s) = s^n + γ_{n−1}s^{n−1} + ⋯ + γ1s + γ0, the Cayley–Hamilton theorem implies that χΓ(Γ) = 0. If Γ is invertible, that is det Γ ≠ 0, equivalently γ0 ≠ 0, the inverse is given by

  Γ^{−1} = −(1/γ0) [ Γ^{n−1} + γ_{n−1}Γ^{n−2} + ⋯ + γ2Γ + γ1I ].   (6.16)

Consequently, if the coefficients of the characteristic polynomial of Γ are known, an expression for the solution of Γx = b, involving powers of Γ, can be written down:

  x = Γ^{−1}b = −(1/γ0) [ Γ^{n−1}b + γ_{n−1}Γ^{n−2}b + ⋯ + γ2Γb + γ1b ].

Next we will apply this method to the Sylvester equation (6.1). Recall the Sylvester operator L defined by (6.13); the Sylvester equation can be written as L(X) = C, and therefore X = L^{−1}(C) (provided that the inverse exists). The characteristic polynomial χL of L has degree nk, and its roots are λi + μj, i = 1, ..., n, j = 1, ..., k, where λi, μj are the eigenvalues of A, B, respectively. Therefore the characteristic polynomial of L is the same as that of AB defined in (6.4). Let

  χL(s) = s^{nk} + c_{nk−1}s^{nk−1} + c_{nk−2}s^{nk−2} + ⋯ + c2s² + c1s + c0.

In analogy to (6.16), this yields an expression for the solution of the Sylvester equation (6.1), of the following form:

  X = −(1/c0) [ L^{nk−1}(C) + c_{nk−1}L^{nk−2}(C) + ⋯ + c2L¹(C) + c1L⁰(C) ].   (6.17)

The quantities L^j(C) can be obtained recursively as follows:

  L⁰(C) = C,  L^j(C) = A L^{j−1}(C) + L^{j−1}(C) B,  j = 1, ..., nk.   (6.18)

This procedure has been adapted from Peeters and Rapisarda [264], who, using (6.18) as a basis, propose a general recursive method for solving polynomial Lyapunov equations. A similar method for solving Sylvester and Lyapunov equations was proposed by Hanzon and Peeters [166].

The second method which falls under the characteristic polynomial approach makes use of the characteristic polynomial of either A or B. Let these be

  α(s) = s^n + α_{n−1}s^{n−1} + ⋯ + α1s + α0 and β(s) = s^k + β_{k−1}s^{k−1} + ⋯ + β1s + β0,

respectively. We also consider the pseudo-derivative polynomials α^{(i)}, i = 1, ..., n, and β^{(j)}, j = 1, ..., k, defined as in (4.72). Then from AX + XB = C follow the relationships

  X − X = 0,
  AX + XB = C,
  A²X − XB² = AC − CB,
  A³X + XB³ = A²C − ACB + CB²,
  A⁴X − XB⁴ = A³C − A²CB + ACB² − CB³,
  ⋮
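The first characteristic-polynomial method, combining (6.17) and (6.18), can be sketched in Python/NumPy. The coefficients of χL are obtained here from the Kronecker-sum matrix; the small triangular matrices are illustrative choices (the method is numerically delicate for larger or badly scaled problems).

```python
import numpy as np

# small, well-separated illustrative matrices
A = np.diag([-1.0, -2.0, -3.0])
B = np.array([[-4.0, 1.0],
              [0.0, -5.0]])
C = np.arange(1.0, 7.0).reshape(3, 2)
n, k = C.shape
N = n*k

# chi_L equals the characteristic polynomial of the Kronecker-sum matrix
M = np.kron(np.eye(k), A) + np.kron(B.T, np.eye(n))
c = np.poly(M)            # [1, c_{N-1}, ..., c_1, c_0]
c0 = c[-1]

# recursion (6.18): L^0(C) = C, L^j(C) = A L^{j-1}(C) + L^{j-1}(C) B
Lpow = [C]
for _ in range(N - 1):
    Lpow.append(A @ Lpow[-1] + Lpow[-1] @ B)

# formula (6.17): X = -(1/c0) * sum_m c_{m+1} L^m(C), with c_N = 1
X = -sum(c[N - m - 1] * Lpow[m] for m in range(N)) / c0

print(np.max(np.abs(A @ X + X @ B - C)))  # ~ 0
```

Note that only matrix products with A and B are needed; the nk × nk Kronecker matrix appears here solely to obtain the coefficients of χL.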
We now form a linear combination of the above equations; this linear combination is given by the coefficients of the characteristic polynomial of A. We thus obtain

  α(A)X − Xα(−B) = α^{(1)}(A)C − α^{(2)}(A)CB + α^{(3)}(A)CB² − α^{(4)}(A)CB³ + ⋯.

By Cayley–Hamilton, the first term is zero, and therefore we obtain the following solution:

  X = −Σ_{i=1}^{n} α^{(i)}(A)CB^{i−1} [α(−B)]^{−1},   (6.19)

where α(−B) is invertible because A and −B are assumed to have no common eigenvalues. In a similar way, making use of the characteristic polynomial of B, we obtain the dual formula to (6.19):

  X = −[β(−A)]^{−1} Σ_{j=1}^{k} A^{j−1}Cβ^{(j)}(B).   (6.20)

We can go one step further and compute the inverse of the quantity in the second bracket. For this purpose, we need to solve the following polynomial Bezout equation for the coefficients of the polynomials p(−s) and q(s):

  α(s)p(−s) + β(−s)q(s) = 1.

Then, since α(−B)p(B) = Ik and β(−A)q(A) = In, we obtain the formulas

  X = −Σ_{i=1}^{n} α^{(i)}(A)CB^{i−1} p(B) = −q(A) Σ_{j=1}^{k} A^{j−1}Cβ^{(j)}(B).

Example 6.1.1. We now consider the above methods for solving the Sylvester equation, here written in the form AX + XB + C = 0, i.e. (6.1) with C replaced by −C. It is assumed that A ∈ R^{4×4} is the companion matrix with last row (−24, −50, −35, −10), so that its characteristic polynomial is α(s) = s⁴ + 10s³ + 35s² + 50s + 24 = (s+1)(s+2)(s+3)(s+4); that B = [ −1 −3/2 ; −1/2 0 ], whose eigenvalues are 1/2 and −3/2, so that β(s) = s² + s − 3/4; and that C ∈ R^{4×2} is a given right-hand side.

(a) The first method uses the Kronecker product:

  AB = (I2 ⊗ A) + (B∗ ⊗ I4) ∈ R^{8×8},  vec(X) = −AB^{−1} vec(C),

from which the 4×2 solution X is recovered by un-stacking the vector vec(X).
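Method (a), in the convention AX + XB = C of (6.1), can be sketched in Python/NumPy. A and B are chosen consistently with the data of Example 6.1.1, while C is an illustrative right-hand side, not the one of the example.

```python
import numpy as np

A = np.array([[0., 1., 0., 0.],
              [0., 0., 1., 0.],
              [0., 0., 0., 1.],
              [-24., -50., -35., -10.]])   # companion of (s+1)(s+2)(s+3)(s+4)
B = np.array([[-1.0, -1.5],
              [-0.5, 0.0]])                # eigenvalues 1/2 and -3/2
C = np.array([[0., 1.],
              [1., 0.],
              [0., 0.],
              [1., 1.]])                   # illustrative right-hand side

n, k = C.shape
# vec(AX + XB) = [(I_k kron A) + (B^T kron I_n)] vec(X), column-major vec
AB = np.kron(np.eye(k), A) + np.kron(B.T, np.eye(n))
X = np.linalg.solve(AB, C.flatten(order='F')).reshape((n, k), order='F')

print(np.max(np.abs(A @ X + X @ B - C)))  # ~ 0
```

The method is general but costs O((nk)³) operations, which motivates the structured algorithms of the following sections.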
(b) We will now use the Cayley–Hamilton method applied to the characteristic polynomial of L. We have

  χL(s) = s⁸ + 24s⁷ + 243s⁶ + 1350s⁵ + (35787/8)s⁴ + (17937/2)s³ + (167387/16)s² + (50505/8)s + 363825/256.

Using formula (6.18), we successively apply the Sylvester operator (6.13) to C for j = 1, ..., 7. Then X = (256/363825) X̃, where

  X̃ = L⁷(C) + 24L⁶(C) + 243L⁵(C) + 1350L⁴(C) + (35787/8)L³(C) + (17937/2)L²(C) + (167387/16)L¹(C) + (50505/8)L⁰(C),

and X is equal to the expression obtained above.

(c) The next method uses complex integration. The expression F(s) = (A − sI4)^{−1}C(B + sI2)^{−1} is a 4×2 matrix of rational functions with common denominator

  d(s) = (s + 4)(s + 3)(s + 2)(s + 1)(2s + 1)(2s − 3).

A partial fraction expansion can be obtained in MATLAB by using the command diff(int(F)). To evaluate the integral (6.9), we need the residues of F(s) at the eigenvalues of A, that is at s = −4, s = −3, s = −2, and s = −1:

  X1 = (s + 1)F(s)|_{s=−1},  X2 = (s + 2)F(s)|_{s=−2},  X3 = (s + 3)F(s)|_{s=−3},  X4 = (s + 4)F(s)|_{s=−4}.

It follows that X = X1 + X2 + X3 + X4. We can also obtain this solution by using the dual expression of (6.9); this involves the complex integration of G(s) = (A + sI4)^{−1}C(B − sI2)^{−1}. In this case the Cauchy contour must contain the eigenvalues of B but not those of A. Therefore the residues of G(s) at s = 1/2 and s = −3/2 are needed:

  Y1 = (s − 1/2)G(s)|_{s=1/2},  Y2 = (s + 3/2)G(s)|_{s=−3/2},

and in this case X = Y1 + Y2.

(d) The eigenvector method is applied to B; we have B = VΛV^{−1}, with

  V = [ −1 3 ; 1 1 ],  Λ = [ 1/2 0 ; 0 −3/2 ].

Therefore, according to the right-hand side formula in (6.14), the solution is

  X = [ (Λ(1,1)I4 + A)^{−1}CV(:,1)  (Λ(2,2)I4 + A)^{−1}CV(:,2) ] V^{−1}.

(e) Finally, the characteristic polynomial method, and in particular formula (6.20), yields

  X = [ A² − A − (3/4)I4 ]^{−1} (−AC + CB + C).

Using the Bezout equation we can express the inverse in the above expression as a polynomial in A. This is done as follows: we solve the polynomial equation

  p(s)a(s) + q(−s)b(s) = 1,

which is always possible, since p(s) and q(−s) are coprime. Since p(A) = 0 by Cayley–Hamilton, it turns out that

  p(A)a(A) + q(−A)b(A) = I4 ⟹ q(−A)b(A) = I4,

which implies that b(A) is the inverse of q(−A). The four polynomials are

  p(s) = s⁴ + 10s³ + 35s² + 50s + 24,  q(−s) = s² − s − 3/4,
  a(s) = (4/3465)(−64s + 100),  b(s) = (4/3465)(64s³ + 604s² + 1892s + 2045).

Thus

  X = (4/3465)(64A³ + 604A² + 1892A + 2045 I4)(−AC + CB + C).

All five methods yield the same 4×2 solution X.
6.1.5 The invariant subspace method
Recall the definition (6.3) of M1. The method proposed here stems from the following observation. Suppose that a relationship of the following form has been obtained:

  M1 V = [ A −C ; 0 −B ] [ V1 ; V2 ] = [ V1 ; V2 ] R,  where V = [ V1 ; V2 ] ∈ R^{(n+k)×k}, V∗V = Ik, R ∈ R^{k×k}.   (6.21)

If V2 is nonsingular, the solution of the Sylvester equation can be obtained as X = V1V2^{−1}. Indeed, (6.21) yields AV1 − CV2 = V1R and −BV2 = V2R; hence

  A(V1V2^{−1}) − C = V1RV2^{−1} = (V1V2^{−1})(V2RV2^{−1}) = −(V1V2^{−1})B,

that is, X = V1V2^{−1} satisfies AX + XB = C.
Moreover, it turns out that V2 is nonsingular if and only if R and −B are similar (see exercise 7). In this case the columns of V form a basis for the eigenspace of M1 which corresponds to the eigenvalues of −B. We summarize this discussion in the following lemma.
Lemma 6.6. Let (6.21) hold, where the columns of V = [V1∗, V2∗]∗ span the eigenspace of M1 corresponding to the eigenvalues of −B. The solution of the Sylvester equation (6.1) is then given by X = V1V2^{−1}.

The desired invariant subspace can be obtained by computing a real partial Schur decomposition of M1, in which R is quasi-upper triangular with the eigenvalues of −B appearing on the diagonal.
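Assuming A and −B have disjoint spectra, the invariant-subspace idea can be sketched in a few lines of Python/NumPy. Here the eigenspace is picked out with a plain eigendecomposition rather than the partial Schur form mentioned above, and the matrices are random illustrations.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 4, 2
A = rng.standard_normal((n, n))
B = rng.standard_normal((k, k))
C = rng.standard_normal((n, k))

# M1 = [[A, -C], [0, -B]]: its spectrum is eig(A) together with eig(-B)
M1 = np.block([[A, -C], [np.zeros((k, n)), -B]])

# select the eigenvectors of M1 that belong to the eigenvalues of -B
wM, VM = np.linalg.eig(M1)
idx = []
for mu in np.linalg.eigvals(-B):
    j = int(np.argmin(np.abs(wM - mu)))
    idx.append(j)
    wM[j] = np.inf          # do not pick the same eigenvalue twice
V1, V2 = VM[:n, idx], VM[n:, idx]

X = V1 @ np.linalg.inv(V2)  # complex in general, real up to rounding here
print(np.max(np.abs(A @ X + X @ B - C)))  # ~ 0
```

In practice an ordered Schur decomposition is preferred over the eigendecomposition, since it avoids the (possibly ill-conditioned) eigenvector matrix.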
6.1.6 The sign function method
Consider the matrix Z ∈ R^{n×n} with eigenvalue decomposition Z = VΛV^{−1}, Λ = diag(Λ+, Λ−), where Λ+, Λ− contain the Jordan blocks corresponding to the r, n−r eigenvalues with positive, negative real parts, respectively. The sign function of this matrix is defined as

  Zσ = V diag(Ir, −In−r) V^{−1}.

The purpose of this section is to show that, under certain conditions, the sign function can be used to solve the Sylvester and the Lyapunov equations. Given a complex number z ∈ C with Re(z) ≠ 0, we perform the following iteration, known as the sign iteration:

  z_{n+1} = (1/2)(z_n + 1/z_n),  z_0 = z.

It readily follows that the fixed points of this iteration are ±1: if Re(z) > 0, then lim_{n→∞} z_n = 1, and if Re(z) < 0, then lim_{n→∞} z_n = −1. This iteration can also be defined for matrices Z ∈ C^{r×r}:

  Z_{n+1} = (1/2)(Z_n + Z_n^{−1}),  Z_0 = Z.

Fixed points in this case are matrices which satisfy Z² = Ir, in other words, matrices which are diagonalizable and whose eigenvalues are ±1. It can easily be shown that if the matrix has eigenvalues ±1 but is not diagonalizable, convergence to a diagonalizable matrix with eigenvalues ±1 is achieved in finitely many steps (equal to half the size of the largest Jordan block). An important property of the sign iteration is that it is invariant under similarity: if the jth iterate of Z is Z_j, the jth iterate of VZV^{−1} is VZ_jV^{−1}. The following holds.

Proposition 6.7. Let J be a Jordan block, i.e. J = λIr + N, where λ has positive real part and N is the nilpotent matrix with 1s above the diagonal and zeros elsewhere. The sign iteration of J converges to Ir.

Proof. Since the matrix is upper triangular, the elements on the diagonal converge to 1. Each iterate of J is furthermore upper triangular and Toeplitz; once the diagonal has converged, each subsequent iteration brings zeros to one more superdiagonal. Thus the iterates (upper triangular and Toeplitz with ones on the diagonal) converge in r steps to the identity matrix.

Corollary 6.8.
If Z ∈ C^{r×r} and Re λi(Z) < 0 (respectively Re λi(Z) > 0) for all i, the sign iteration converges: Z_n → −Ir (respectively Z_n → +Ir).

Proof. Let Z = VΛV^{−1} be the EVD of Z. Since the iteration is not affected by similarity, we need only consider the iterates of the Jordan blocks Ji of Λ. The proposition implies that each Jordan block converges to ±I, depending on whether the real part of the corresponding eigenvalue is positive or negative.

We will now consider a matrix of the following type:

  Z = [ A −C ; 0 −B ],  A ∈ R^{n×n}, Re λi(A) < 0,  B ∈ R^{k×k}, Re λi(B) < 0,  C ∈ R^{n×k}.
Proposition 6.9. The iteration Z_{n+1} = (1/2)(Z_n + Z_n^{−1}), Z_0 = Z, defined above, converges to

  lim_{j→∞} Z_j = [ −In 2X ; 0 Ik ],

where AX + XB = C.

Proof. Let ZV = VΛ, with Λ = diag(Λ1, Λ2), be the eigenvalue decomposition of Z; V is upper block triangular:

  [ A −C ; 0 −B ] [ V11 V12 ; 0 V22 ] = [ V11 V12 ; 0 V22 ] [ Λ1 0 ; 0 Λ2 ].

This readily implies that the solution of the Sylvester equation is V12V22^{−1}. The block triangular structure of Z is preserved during the iterations, i.e. Z_j is also block upper triangular. Furthermore, by the proposition above, the limits of the (1,1) and (2,2) blocks are −In, Ik, respectively. Thus Z_j as j → ∞ has the form claimed. There remains to show that X satisfies the Sylvester equation as stated. This follows since lim_{j→∞} Z_jV = V lim_{j→∞} Λ_j, and therefore

  [ −In 2X ; 0 Ik ] [ V11 V12 ; 0 V22 ] = [ V11 V12 ; 0 V22 ] [ −In 0 ; 0 Ik ].

This implies X = V12V22^{−1}, which is indeed the solution of the Sylvester equation, as claimed.
Remark 6.1.3. (a) The above result shows that a Sylvester equation in which each of A and B has its eigenvalues entirely in one (left or right) half-plane can be solved by means of the sign iteration.

(b) The above result also shows that, given a matrix Z = VΛV^{−1} ∈ C^{n×n} with k eigenvalues in the left-half plane, the sign iteration yields the matrix Z∞ = V diag(−Ik, In−k)V^{−1}. Therefore Π± = (1/2)(In ± Z∞) yields the spectral projectors of Z onto its unstable/stable eigenspaces, respectively.

(c) For Lyapunov equations AP + PA∗ = Q, the starting matrix is

  Z = [ A −Q ; 0 −A∗ ],  A ∈ R^{n×n}, Re λi(A) < 0  ⟹  Z_j = [ A_j −Q_j ; 0 −A_j∗ ].

The iterations can also be written as follows:

  A_{j+1} = (1/2)(A_j + A_j^{−1}),  A_0 = A;  Q_{j+1} = (1/2)(Q_j + A_j^{−1}Q_jA_j^{−∗}),  Q_0 = Q.

The limits of these iterations are A∞ = −In and Q∞ = 2P, where P is the solution of the Lyapunov equation.

(d) The convergence of the sign iteration is ultimately quadratic. To accelerate convergence in the beginning, one can introduce a scaling constant as follows: Z_{n+1} = (1/2γn)(Z_n + γn² Z_n^{−1}). It has been proposed in [63] that for the solution of the Lyapunov equation this constant be chosen as γn = |det A_n|^{1/n}.

(e) Often, in the solution of the Lyapunov equation, the constant term is provided in factored form Q = RR∗. As a consequence we can obtain the (j+1)st iterate in factored form:

  Q_{j+1} = R_{j+1}R_{j+1}∗, where R_{j+1} = (1/√2) [ R_j, A_j^{−1}R_j ]  ⟹  Q∞ = R∞R∞∗ = 2P.

Thus the solution is obtained in factored form. However, R∞ has infinitely many columns, although its rank cannot exceed n. In order to avoid this, we need to perform at each step a rank-revealing RQ factorization R_j = T_jU_j, where T_j = [∆_j∗, 0]∗, ∆_j is upper triangular, and U_jU_j∗ = I. Thus at the jth step R_j can be replaced by T_j, which has exactly as many columns as the rank of R_j.

(f) The sign function was first used for solving the Lyapunov equation by Roberts [274]. For a recent overview of the sign function, see Kenney and Laub [198]. Benner and coworkers have contributed to the solution of Lyapunov equations by computing full-rank (triangular) factors [51], as well as by computing low-rank factors of rank k by means of an O(k³) SVD (k is the maximum numerical rank of the solution) [54]. The use of the sign function in Newton's method for AREs (Algebraic Riccati Equations) and its application to stochastic truncation is described in [53]. In [51], the sign function is used for the first time to solve Lyapunov equations arising in generalized linear systems (i.e. systems containing both differential and algebraic equations). The computation of (triangular) Cholesky factors with the sign function was also introduced
independently in [223]. In [55], [56] the sign function method is used for solving discrete-time Lyapunov equations; balancing-related model reduction techniques for discrete-time systems are also developed there. [57] gives a survey of all these model reduction algorithms together with detailed performance studies. Portable software for large-scale model reduction on parallel computers, based on the sign function, has been developed in the references cited above.
Example 6.1.2. Consider the number z = 2. We perform the iteration z_{n+1} = (1/2)(z_n + 1/z_n), z_0 = 2. The iterates converge to 1. The error e_n = z_n − 1 is as follows: e1 = 2.5000×10^{−1}, e2 = 2.5000×10^{−2}, e3 = 3.0488×10^{−4}, e4 = 4.6461×10^{−8}, e5 = 1.0793×10^{−15}, e6 = 5.8246×10^{−31}, e7 = 1.6963×10^{−61}, e8 = 1.4388×10^{−122}. This shows that convergence is fast (the exponent of the error doubles at each iteration).
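The scalar iteration of this example can be reproduced in a few lines (Python):

```python
z = 2.0
errors = []
for _ in range(8):
    z = 0.5*(z + 1.0/z)           # scalar sign iteration
    errors.append(abs(z - 1.0))
print(errors[:4])  # [0.25, 0.025, ~3.05e-04, ~4.65e-08]: the exponent doubles
```

Beyond the fifth step the error drops below machine precision, so in floating point the iterate becomes exactly 1.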
Example 6.1.3. Let

  A = [ 0 1 0 ; 0 0 1 ; −1 −3 −3 ],  B = [ 0 1 ; −1 −2 ],  C = [ 0 −1 ; −1 2 ; −2 −7 ].

We form the matrix

  Z = [ A −C ; 0 −B ].

Since both A and B are composed of a single Jordan block with eigenvalue −1, the iteration will converge in finitely many steps. Indeed, after two iterations we get

  Z_2 = [ −I3 2X ; 0 I2 ],

and therefore the solution of the Sylvester equation AX + XB = C is

  X = [ 1 1 ; 1 0 ; −1 1 ].
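Proposition 6.9 translates directly into a short Python sketch; the matrices below are illustrative stable choices, not those of the examples.

```python
import numpy as np

# illustrative stable A, B (all eigenvalues in the open left half-plane)
A = np.array([[-1., 1., 0.],
              [0., -1., 1.],
              [0., 0., -1.]])
B = np.array([[-2., 1.],
              [0., -3.]])
C = np.array([[1., 0.],
              [0., 1.],
              [1., 1.]])
n, k = C.shape

Z = np.block([[A, -C], [np.zeros((k, n)), -B]])
for _ in range(50):                    # sign iteration Z <- (Z + Z^{-1})/2
    Z = 0.5*(Z + np.linalg.inv(Z))

# limit is [[-I_n, 2X], [0, I_k]] with A X + X B = C (Proposition 6.9)
X = Z[:n, n:] / 2.0
print(np.max(np.abs(A @ X + X @ B - C)))  # ~ 0
```

Fifty iterations are far more than needed; in practice one stops when the iterates stagnate, typically after fewer than ten steps thanks to the quadratic convergence.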
6.1.7 Solution as an inﬁnite sum
Equations of the type AXB + C = X, for the unknown matrix X, are known as discrete-time Sylvester or Stein equations. They are closely related to continuous-time Sylvester equations.

Proposition 6.10. Given K ∈ R^{n×n}, L ∈ R^{m×m}, M ∈ R^{n×m}, where the first two have no eigenvalues equal to 1, we define the matrices

  K̂ = (I + K)(I − K)^{−1},  L̂ = (I + L)(I − L)^{−1},  M̂ = 2(I − K)^{−1}M(I − L)^{−1}.

It follows that X ∈ R^{n×m} satisfies the Sylvester equation KX + XL = M if, and only if, it satisfies the Stein equation

  K̂XL̂ − X = M̂.

Furthermore, Re(λi(K)) < 0 is equivalent to |λi(K̂)| < 1, and Re(λi(L)) < 0 is equivalent to |λi(L̂)| < 1. If these conditions are satisfied, the solution of KX + XL + M = 0 is given by

  X = ∫_{0}^{∞} e^{Kt} M e^{Lt} dt = Σ_{k≥0} K̂^k M̂ L̂^k.
Thus, according to the above formula, provided that K and L have eigenvalues in the lefthalf plane, the solution of the Sylvester equation can be expressed as an inﬁnite sum.
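Proposition 6.10 can be checked numerically. The following Python/SciPy sketch, with illustrative matrices and the sign convention KX + XL + M = 0 used in the integral formula above, truncates the infinite sum.

```python
import numpy as np
from scipy.linalg import solve_sylvester

# illustrative stable matrices
K = np.array([[-2., 1.], [0., -1.]])
L = np.array([[-3., 0.], [1., -2.]])
M = np.array([[1., 2.], [3., 4.]])
I = np.eye(2)

# reference: solution of K X + X L + M = 0
X_ref = solve_sylvester(K, L, -M)

# Cayley transforms of Proposition 6.10
Kh = (I + K) @ np.linalg.inv(I - K)
Lh = (I + L) @ np.linalg.inv(I - L)
Mh = 2.0 * np.linalg.inv(I - K) @ M @ np.linalg.inv(I - L)

# partial sum of X = sum_{j >= 0} Kh^j Mh Lh^j
X, T = np.zeros_like(M), Mh.copy()
for _ in range(100):
    X += T
    T = Kh @ T @ Lh

print(np.max(np.abs(X - X_ref)))  # ~ 0 (the sum converges geometrically)
```

Since the spectral radii of K̂ and L̂ are below 1, the truncation error decays geometrically, and far fewer than 100 terms already give full accuracy here.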
6.2 The Lyapunov equation and inertia
In this section a remarkable property of the Lyapunov equation (6.2), AP + PA∗ = Q, for which Q is semidefinite (Q ≥ 0 or Q ≤ 0), is analyzed. This property allows one to deduce the eigenvalue locations of A from those of the solution P, and vice versa. This is the inertia result associated with the Lyapunov equation. We begin with a definition.

Definition 6.11. Given a square matrix A ∈ R^{n×n}, let the number of its eigenvalues in the left half-plane, on the imaginary axis, and in the right half-plane be denoted by ν(A), δ(A), π(A), respectively. The triple in(A) = (ν(A), δ(A), π(A)) is called the inertia of A. Matrices with inertia (n, 0, 0) are referred to as stable, while matrices with inertia (0, 0, n) are referred to as antistable.

The following result is a consequence of the Courant–Fischer max–min characterization of eigenvalues.

Proposition 6.12 (Sylvester's law of inertia). Let A = A∗ and X be real, with det X ≠ 0. Then in(A) = in(X∗AX).

The first fundamental relationship between inertia and the Lyapunov equation is the following.

Lemma 6.13. Given Q > 0, the Lyapunov equation has a unique positive definite solution P > 0 if, and only if, A is an antistable matrix, i.e. in(A) = (0, 0, n).

Proof. (a) We first show that if P > 0, then A has inertia (0, 0, n). Let A∗yi = λiyi, 1 ≤ i ≤ n. If we multiply (6.2) by yi∗ on the left and by yi on the right, we obtain

  (λi∗ + λi) yi∗Pyi = yi∗Qyi > 0,  1 ≤ i ≤ n.

By assumption yi∗Pyi > 0 and yi∗Qyi > 0 for 1 ≤ i ≤ n. We conclude that λi∗ + λi > 0, which proves the assertion.

(b) Conversely, if A is antistable, we define

  P = ∫_{−∞}^{0} e^{Aτ} Q e^{A∗τ} dτ,

which exists because of the antistability of A. Furthermore it is positive definite, P > 0. Finally, it is a solution of (6.2), since AP + PA∗ = ∫_{−∞}^{0} [Ae^{Aτ}Qe^{A∗τ} + e^{Aτ}Qe^{A∗τ}A∗] dτ = ∫_{−∞}^{0} d[e^{Aτ}Qe^{A∗τ}] = Q.

The following generalization holds [257].

Theorem 6.14. Let A, P, Q satisfy (6.2) with Q > 0. It follows that in(A) = in(P).

Proof. Assume that δ(A) ≠ 0; this implies the existence of x ≠ 0 such that x∗A = iωx∗. Multiplying both sides of (6.2) by x∗, x, respectively, we obtain x∗Qx = 0, which is a contradiction to the positive definiteness of Q. It follows that δ(A) = 0, i.e. in(A) = (k, 0, r) for r + k = n and k ≥ 0. We may thus assume, without loss of generality, that A has the form

  A = [ A1 0 ; 0 A2 ],  where in(A1) = (0, 0, r) and in(A2) = (k, 0, 0).

If we partition P conformally with A, (6.2) implies

  A1P11 + P11A1∗ = Q11 > 0 and A2P22 + P22A2∗ = Q22 > 0.
Using the previous lemma we conclude that P11 > 0 while P22 < 0. There remains to show that the matrix P has the same number of positive eigenvalues as P11 and the same number of negative eigenvalues as P22, i.e. that in(P) = (k, 0, r). Towards this goal, notice that we can write

  P11 = V11V11∗ ∈ R^{r×r},  P22 = −V22V22∗ ∈ R^{k×k},  det Vii ≠ 0, i = 1, 2.

Hence

  P = diag(V11, V22) Y diag(V11∗, V22∗),  where Y = [ Ir Y12 ; Y12∗ −Ik ].

Finally, assuming that r ≤ k, let Y12 = UΣW∗, where UU∗ = I, WW∗ = I, and Σ = (Σ̄, 0); here 0 is the zero matrix with r rows and k − r columns, and Σ̄ = diag{σ1, ..., σr}, σi ≥ σi+1 ≥ 0. Then

  Y = [ U 0 ; 0 W ] [ Ir Σ̄ 0 ; Σ̄∗ −Ir 0 ; 0 0 −I_{k−r} ] [ U∗ 0 ; 0 W∗ ],  where Φ = [ Ir Σ̄ ; Σ̄∗ −Ir ].

Clearly the matrix Φ has eigenvalues ±√(1 + σi²), i = 1, ..., r; together with the eigenvalue −1 of multiplicity k − r, the middle factor therefore has r positive and k negative eigenvalues. Hence, by the Sylvester law of inertia, both Y and P have r positive and k negative eigenvalues as well. This completes the proof.
If we relax the positive definiteness of Q, we obtain the following result.

Lemma 6.15. Let A, P, Q satisfy (6.2) with Q ≥ 0. Then δ(A) = 0 implies π(A) ≥ π(P) and ν(A) ≥ ν(P). Furthermore, δ(P) = 0 implies π(P) ≥ π(A) and ν(P) ≥ ν(A).

Proof. The proof uses continuity arguments, as follows. First, assume that the quantities A, P, Q satisfy (6.2) with Q ≥ 0 and δ(A) = 0. Let P̃ be the solution of AP̃ + P̃A∗ = I; thus in(P̃) = in(A). For ε > 0, define Pε = P + εP̃, which is the solution of

  APε + PεA∗ = Q + εI > 0.

By theorem 6.14, this implies that in(Pε) = in(A) for all ε > 0. Since the above holds for all ε > 0 and due to the continuity of eigenvalues as a function of ε, the result follows by letting ε → 0. Similarly, if δ(P) = 0, define Aε = A + εP^{−1}, ε > 0. Substituting this into (6.2) we obtain

  AεP + PAε∗ = Q + 2εIn > 0.

This implies in(P) = in(Aε) for ε > 0. Since this expression is positive definite for all ε > 0, and again due to the continuity of the eigenvalues as a function of ε, the desired result follows by letting ε → 0.
Theorem 6.16. Let A, P , Q satisfy (6.2) with Q ≥ 0. If the pair (A, Q) is reachable then in (A) = in (P ).
Proof. We will show that reachability implies (a) δ (A) = 0, and (b) δ (P ) = 0. Hence the result follows by applying the lemma above. There remains to show (a) and (b).
Assume that δ(A) ≠ 0; then there exist ω and v ≠ 0 such that A∗v = iωv. Thus v∗(AP + PA∗)v = v∗Qv. The left-hand side is equal to (−iω + iω)v∗Pv = 0. Hence the right-hand side is zero, and we conclude that v∗Q = 0. This is a contradiction to the fourth condition of lemma 4.13; that is, if δ(A) ≠ 0 the pair (A, Q) is not reachable. This proves (a).

If we assume now that δ(P) ≠ 0, there exists a vector v ≠ 0 such that Pv = 0. Hence v∗(AP + PA∗)v = v∗Qv. The left-hand side is zero and therefore v∗Qv = 0, which implies v∗Q = 0. This in turn implies v∗(AP + PA∗) = v∗AP = v∗Q = 0, i.e. v∗AP = 0. Repeating the above procedure with the equation A(AP + PA∗)A∗ = AQA∗, we conclude that v∗AQ = 0. Consequently, δ(P) ≠ 0 implies v∗A^{k−1}Q = 0 for k > 0, which means that the pair (A, Q) is not reachable. This proves (b).
Remark 6.2.1. An inertia result with respect to the unit circle can also be formulated. Consider the discrete-time Lyapunov or Stein equation

  FXF∗ + R = X.   (6.22)

If the inertia of F with respect to the unit circle ∂D is defined as the number of eigenvalues outside, on, and inside the unit circle, and denoted by in∂D, and the inertia of X = X∗ is defined as before, we conclude from proposition 6.10 that if the pair (F, R) is reachable, then

  in∂D(F) = in(X).
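The inertia theorem 6.14 is easy to observe numerically. The sketch below (Python/SciPy; the matrices are illustrative, and `solve_sylvester` is used as a Lyapunov solver) builds A with known inertia and a positive definite Q.

```python
import numpy as np
from scipy.linalg import solve_sylvester

rng = np.random.default_rng(3)

# A with inertia (2, 0, 3): two stable and three antistable eigenvalues
T = rng.standard_normal((5, 5))
A = T @ np.diag([-1.0, -2.0, 1.0, 2.0, 3.0]) @ np.linalg.inv(T)

# positive definite right-hand side
R = rng.standard_normal((5, 5))
Q = R @ R.T + 5.0*np.eye(5)

# Lyapunov equation A P + P A^* = Q
P = solve_sylvester(A, A.T, Q)

def inertia(M):
    lam = np.linalg.eigvals(M).real
    return (int((lam < -1e-9).sum()), int((np.abs(lam) <= 1e-9).sum()),
            int((lam > 1e-9).sum()))

print(inertia(A), inertia(0.5*(P + P.T)))  # theorem 6.14: both equal (2, 0, 3)
```

The tolerance 1e-9 in `inertia` is an assumption for this well-scaled example; a robust implementation would scale it with the norm of the matrix.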
6.3 Sylvester and Lyapunov equations with triangular coefﬁcients
Consider the Sylvester equation (6.1). There exist orthogonal matrices U, V such that UAU∗ and VBV∗ are in Schur form, that is upper triangular. Thus if we multiply (6.1) on the left by U and on the right by V∗ we obtain (UAU∗ )(UXV∗ ) + (UXV∗ )(VBV∗ ) = (UCV∗ ). For simplicity we will denote the transformed quantities by the same symbol A ← UAU∗ , X ← UXV∗ , B ← VBV∗ , C ← UCV∗ , where A, B are in Schur (upper triangular) form. In this case we will refer to the Sylvester and Lyapunov equations in Schur basis.
6.3.1 The Bartels–Stewart algorithm
With the coefficient matrices triangular, the solution can be obtained columnwise as follows. Let the columns of X, C be denoted by xℓ, cℓ, and the entries of A, B by aij, bij, respectively. The first column of X satisfies Ax1 + x1b11 = c1; thus x1 = (A + b11I)^{−1}c1. The second column satisfies Ax2 + x1b12 + x2b22 = c2; thus x2 = (A + b22I)^{−1}(c2 − b12x1), which can be computed once x1 is known. Similarly,

  (A + bℓℓI)xℓ = cℓ − Σ_{j=1}^{ℓ−1} bjℓxj  ⟹  xℓ = (A + bℓℓI)^{−1} ( cℓ − Σ_{j=1}^{ℓ−1} bjℓxj ),  ℓ = 1, ..., k.   (6.23)

Thus the fact that B is in Schur form allows the calculation of the columns of the solution recursively. In addition, the fact that A is in Schur form implies that A + bℓℓI is also in Schur form; therefore its inverse is also in Schur form and can be computed explicitly.

Remark 6.3.1. Recall that given an upper triangular M = (μij), with μij = 0 for i > j, i, j = 1, ..., n, its inverse N = (νij) is also upper triangular, that is νij = 0 for i > j, and its nonzero elements are defined recursively as follows: νnn = 1/μnn, and, having computed the entries of all rows i = k + 1, ..., n, the entries of the kth row are

  νkk = 1/μkk,  νkℓ = −(1/μkk) Σ_{j=k+1}^{ℓ} μkjνjℓ,  ℓ = k + 1, ..., n,  for k = n − 1, ..., 2, 1.
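The columnwise substitution (6.23) translates directly into a short Python/SciPy sketch of the complex-arithmetic Bartels–Stewart algorithm; the function name and test matrices are illustrative.

```python
import numpy as np
from scipy.linalg import schur

def bartels_stewart(A, B, C):
    """Solve A X + X B = C via complex Schur forms (a sketch, not LAPACK-grade)."""
    Ta, Za = schur(A, output='complex')      # A = Za Ta Za^*
    Tb, Zb = schur(B, output='complex')      # B = Zb Tb Zb^*
    F = Za.conj().T @ C @ Zb                 # right-hand side in the Schur basis
    n, k = C.shape
    Y = np.zeros((n, k), dtype=complex)
    for l in range(k):                       # columnwise substitution, cf. (6.23)
        rhs = F[:, l] - Y[:, :l] @ Tb[:l, l]
        Y[:, l] = np.linalg.solve(Ta + Tb[l, l]*np.eye(n), rhs)
    return Za @ Y @ Zb.conj().T              # transform back

rng = np.random.default_rng(4)
A = rng.standard_normal((5, 5))
B = rng.standard_normal((3, 3))
C = rng.standard_normal((5, 3))
X = bartels_stewart(A, B, C)
print(np.max(np.abs(A @ X + X @ B - C)))     # ~ 0
```

Production codes (LAPACK's triangular Sylvester solver) use the real Schur form discussed next, to avoid complex arithmetic for real data.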
there is a 2 × 2 block on the diagonal between rows/columns and + 1. (6. we also assume that Q = −BB∗ . c ˜ +1 . c ˜ +1 − b . ˆ 1 ) is reachable. The BartelsStewart algorithm in real arithmetic In order to avoid complex arithmetic. P= P11 ∗ P12 P12 P22 (6. even if A has complex eigenvalues X will be real. +1 −1 = [c − j =1 =˜ c bj xj .2 The Lyapunov equation in the Schur basis For this purpose we assume that in (6. is reachable. and partitioning (6. with (A. where B ˆ 1 = B1 − P12 P −1 B2 . This can be solved using where c the block version system solve routine in LAPACK [7]. and c ¯ +1 = Ac ˜ +1 + b .13. +1 )A + (b . A has no eigenvalues on the imaginary axis and both matrices are nonsingular. The where the entries of the righthand side of this expression are denoted by c above system of equations can be solved using different methods. x +1 ] b b . B2 . there is no loss in generality with this assumption as it amounts to a transformation of the Lyapunov equation to an equivalent system using the Schur form basis vectors. = 0. As shown earlier. equation (6. x +1 ] + [x . In this case we must solve simultaneously for x and x +1 . +1 −b .e. +1 b +1. Following [311]. x +1 ] ¯ = [¯ c.17. Recursive algorithms for the solution of linear matrix equations with triangular coefﬁcients is also treated in [189]. where the blocks on the diagonal are 1 × 1 for real and 2 × 2 for complex eigenvalues. B ∈ Rn×m . b +1. respectively. the real Schur form can be used. i. Proposition 6. ) I [x . for the Lyapunov equation.2) P is symmetric solution. c +1 ] ¯ = Ac ˜ + b +1. (4. (b) δ (P22 ) = 0.24).e. i. the solution of the Sylvester equation X in the (complex) Schur basis will be complex. 6. 2004 i i . +1 +1. i. B 22 i i ACA : April 14. +1 +1 xj ]. P satisfy the Lyapunov equation. +1. the above system of equations is equivalent to A2 + ( b . c +1 − j =1 =˜ c bj. partition A. for simplicity. On the other hand. A is upper triangular. The following statements hold: (a) The pair A22 . 
The above procedure is the Bartels–Stewart algorithm [43] in complex arithmetic.

The Bartels–Stewart algorithm in real arithmetic

If A or B have complex eigenvalues, the solution X of the Sylvester equation in the (complex) Schur basis will be complex, even though X in the original real basis is real. In order to avoid complex arithmetic, the real Schur form can be used: if A or B have complex eigenvalues, they can be transformed by (real) orthogonal transformations to quasi-upper triangular form, where the blocks on the diagonal are 1×1 for real eigenvalues and 2×2 for complex-conjugate pairs of eigenvalues. In the latter case let bℓ+1,ℓ ≠ 0, that is, there is a 2×2 block on the diagonal between rows/columns ℓ and ℓ+1. In this case we must solve simultaneously for xℓ and xℓ+1, and equation (6.23) becomes

  A[xℓ, xℓ+1] + [xℓ, xℓ+1] [ bℓ,ℓ bℓ,ℓ+1 ; bℓ+1,ℓ bℓ+1,ℓ+1 ] = [c̃ℓ, c̃ℓ+1],   (6.24)

where c̃ℓ = cℓ − Σ_{j=1}^{ℓ−1} bjℓxj and c̃ℓ+1 = cℓ+1 − Σ_{j=1}^{ℓ−1} bj,ℓ+1xj. Following [311], the above system of equations is equivalent to

  [ A² + (bℓ,ℓ + bℓ+1,ℓ+1)A + (bℓ,ℓbℓ+1,ℓ+1 − bℓ,ℓ+1bℓ+1,ℓ)I ] [xℓ, xℓ+1] = [c̄ℓ, c̄ℓ+1],

where the entries of the right-hand side are c̄ℓ = Ac̃ℓ + bℓ+1,ℓ+1c̃ℓ − bℓ+1,ℓc̃ℓ+1 and c̄ℓ+1 = Ac̃ℓ+1 + bℓ,ℓc̃ℓ+1 − bℓ,ℓ+1c̃ℓ. The above system of equations can be solved using different methods, for instance using the block-version system-solve routine in LAPACK [7]; we refer to [311] for a comparison between different methods of solving (6.24). Recursive algorithms for the solution of linear matrix equations with triangular coefficients are also treated in [189].

6.3.2 The Lyapunov equation in the Schur basis

For this purpose we assume that in (6.2), Q = −BB∗, B ∈ R^{n×m}; thus, for simplicity, we consider the Lyapunov equation

  AP + PA∗ + BB∗ = 0,   (6.25)

which is the same as (4.45), where A ∈ R^{n×n} and P = P∗ ∈ R^{n×n}. Throughout this section we assume that (A, B) is reachable. As shown earlier, as a consequence both δ(A) = 0 and δ(P) = 0, i.e. A has no eigenvalues on the imaginary axis and P is nonsingular. To continue our discussion it will be convenient to assume that A is in Schur form, i.e. upper triangular; there is no loss of generality in this assumption, as it amounts to a transformation of the Lyapunov equation to an equivalent one using the Schur-form basis vectors. Recall also the PBH condition for reachability (lemma 4.13).

Proposition 6.17. Assume that A, B and P satisfy the Lyapunov equation (6.25), with (A, B) reachable and A upper triangular, and partition

  A = [ A11 A12 ; 0 A22 ],  B = [ B1 ; B2 ],  P = [ P11 P12 ; P12∗ P22 ],   (6.26)

where A11 and A22 are upper triangular. The following statements hold: (a) the pair (A22, B2) is reachable; (b) δ(P22) = 0, i.e. P22 is nonsingular; (c) the pair (A11, B̂1) is reachable, where B̂1 = B1 − P12P22^{−1}B2.
Proof. (a) Let z2 be a left eigenvector of A22, and suppose z2∗ B2 = 0. Then z∗ = [0, z2∗] is a left eigenvector of A, and the PBH condition implies 0 ≠ z∗ B = z2∗ B2, a contradiction. This holds for every left eigenvector of A22, and hence (A22, B2) is reachable.

(b) Since A22 P22 + P22 A22∗ + B2 B2∗ = 0, with (A22, B2) reachable, part (b) follows from the fact that δ(A) = δ(P) = 0.

(c) As a consequence of (b), P22 is nonsingular, and the Lyapunov equation can be transformed to

  Â P̂ + P̂ Â∗ + B̂ B̂∗ = 0,  Â = T A T^{−1},  B̂ = T B,  P̂ = T P T∗,   (6.27)

where

  T = [ I  −P12 P22^{−1} ; 0  I ],

and where

  Â = [ A11  Â12 ; 0  A22 ],  B̂ = [ B̂1 ; B2 ],  P̂ = [ P̂11  0 ; 0  P22 ],   (6.28)

with Â12 = A12 − P12 P22^{−1} A22 + A11 P12 P22^{−1}, B̂1 = B1 − P12 P22^{−1} B2, and P̂11 = P11 − P12 P22^{−1} P12∗. From (6.27) and (6.28) follow the three equations

  A11 P̂11 + P̂11 A11∗ + B̂1 B̂1∗ = 0,  A22 P22 + P22 A22∗ + B2 B2∗ = 0,  Â12 = −B̂1 B2∗ P22^{−1}.   (6.29)

Suppose now that there is a left eigenvector z1 of A11 such that z1∗ B̂1 = 0. Then z1∗ Â12 = 0, and it follows that ẑ∗ = [z1∗, 0] is a left eigenvector of Â such that ẑ∗ B̂ = z1∗ B̂1 = 0, in contradiction to the PBH condition. Hence (A11, B̂1) is reachable. ∎

An iterative proof of the inertia result

A consequence of the above proposition is a self-contained proof, by induction on the size of the equation, of the inertia result associated with the Lyapunov equation; it first appeared in [19]. We need a definition and a lemma.

Definition 6.18. A diagonal matrix is called a signature if its diagonal entries consist only of 1 or −1.

Lemma 6.19. Assume that A, B and P satisfy the Lyapunov equation (6.25), with (A, B) reachable. If A is in Schur form, then P can be expressed in factored form P = U S U∗, where U is upper triangular and S is a signature matrix.

Proof. The proof is given by induction on n, the order of A. The required property clearly holds for n = 1. Assume that it holds for Lyapunov equations of order k < n; we show that it must then also hold for Lyapunov equations of order n. Partition A, B, P (where A has dimension n) as in (6.26), where the (1,1) block has dimension k < n and the (2,2) block has dimension n − k < n. By Proposition 6.17, we may assume without loss of generality that these matrices are in the form (6.28) and satisfy the transformed Lyapunov equation (6.27). Both of the pairs (A11, B̂1) and (A22, B2) are reachable, and therefore the induction hypothesis can be applied to each of the two reduced-order Lyapunov equations in (6.29), giving P̂11 = U11 S1 U11∗ and P22 = U22 S2 U22∗. Transforming back from equation (6.27) gives P = U S U∗ with

  U = [ U11  U12 ; 0  U22 ] = [ I  P12 P22^{−1} ; 0  I ] [ U11  0 ; 0  U22 ],  S = [ S1  0 ; 0  S2 ].

The induction is thus complete. ∎

Now we are ready to prove the main theorem 6.20.

Theorem 6.20. Let A, B and P satisfy the Lyapunov equation (4.45), with (A, B) reachable. Then ν(A) = π(P) and π(A) = ν(P).
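Theorem 6.20 is easy to confirm numerically. In the sketch below (our own illustration of the statement, not an algorithm from the text), A is chosen with two eigenvalues in the left half plane and one in the right half plane, so ν(A) = 2 and π(A) = 1; the inertia of the symmetric solution P must then be the reverse:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# A with nu(A) = 2 stable and pi(A) = 1 anti-stable eigenvalues
A = np.diag([-1.0, -2.0, 3.0])
B = np.ones((3, 1))                         # (A, B) is reachable here
# solve A P + P A^T + B B^T = 0
P = solve_continuous_lyapunov(A, -B @ B.T)
lam = np.linalg.eigvalsh(P)
n_pos = int(np.sum(lam > 0))                # should equal nu(A) = 2
n_neg = int(np.sum(lam < 0))                # should equal pi(A) = 1
```

The assertion holds for any reachable pair with δ(A) = 0; the specific diagonal A above is only a convenient instance.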
Proof. Again, the proof is given by induction on n, the order of A, and we assume that A is in Schur form. The properties stated in the theorem clearly hold for n = 1. Assume they hold for Lyapunov equations of order k < n; we show as a consequence that they must also hold for Lyapunov equations of order n. If we partition the matrices as in the proof of Lemma 6.19, it follows that the Lyapunov equations (6.29) are satisfied. Each one of these has size less than n, and hence the induction hypothesis applies: ν(A11) = π(P̂11), π(A11) = ν(P̂11), and ν(A22) = π(P22), π(A22) = ν(P22). Since A is in Schur form, there holds ν(A) = ν(A11) + ν(A22) and π(A) = π(A11) + π(A22); due to the block-diagonal structure (6.28) of P̂ and the congruence P = T^{−1} P̂ T^{−∗}, we have ν(P) = ν(P̂11) + ν(P22) and π(P) = π(P̂11) + π(P22). The desired equalities follow, completing the induction. ∎

Remark 6.3.2. (a) Consider the Lyapunov equation (6.25). No a priori assumptions on the spectrum of A are required for the inertia result; in particular, A may have eigenvalues with both positive and negative real parts.

(b) If A is stable (all eigenvalues in the left half plane), the solution of the Lyapunov equation is positive definite, P > 0. Furthermore, if a symmetric solution to a Lyapunov equation is available (as it is in our special case) and the condition A + A∗ < 0 is satisfied, the following relationship holds:

  trace P = − trace [ B∗ (A + A∗)^{−1} B ].

This follows from the fact that, by assumption, we can write −(A + A∗) = R∗R for some R ∈ R^{n×n}; if P xi = λi xi, then λi xi∗(A + A∗)xi = −xi∗ B B∗ xi, and thus, with yi = R xi, λi yi∗ yi = yi∗ R^{−∗} B B∗ R^{−1} yi, that is, λi trace[yi yi∗] = trace[B∗ R^{−1} yi yi∗ R^{−∗} B]. The desired result follows by summing the above equations for i = 1, …, n and using the facts that Σi yi yi∗ = In and trace P = Σi λi. The condition A + A∗ < 0 cannot always be satisfied; however, if A is stable, there exists Q > 0 such that AQ + QA∗ < 0, and the condition Ā + Ā∗ < 0 is then satisfied by Ā = Q^{−1/2} A Q^{1/2}, which is similar to A.

Remark 6.3.3. The considerations laid out in the proof of the theorem above lead to a UL factorization of the solution P of the Lyapunov equation in the Schur basis: P = USU∗, with U upper triangular and S a signature matrix. If A is not in Schur form, let Ã = Q∗AQ be the Schur form of the original matrix; the solution of the corresponding Lyapunov equation in the original coordinate system is then P̃ = (QU) S (QU)∗. It should be noted that the existence of such a factorization is basis dependent. The question is: when does the solution P̃ in the original coordinate system possess such a factorization? If the trailing principal minors det P̃(k : n, k : n), k = 1, …, n, are different from zero, the UL factorization of P̃ exists. Of course, if P is positive or negative definite, the result is the Cholesky factorization, which always exists. In the indefinite case it is easy to show that there always exists a basis such that P̃ does not have an LU factorization. For example, if n = 2, let the solution P1 be diagonal,

  P1 = [ α  0 ; 0  −β ],  α, β > 0  ⇒  P2 = [ 0  √(αβ) ; √(αβ)  α − β ],

where P2, obtained by an orthogonal change of basis, does not have an LU factorization. This situation can arise precisely when A has eigenvalues with both positive and negative real parts.

6.3.3 The square root method for the Lyapunov equation

If A is stable (that is, all eigenvalues are in the left half plane), the solution of the Lyapunov equation is positive definite, P > 0. In this case the factorization of lemma 6.19 holds with S = I, and can be written as P = UU∗, where U is upper triangular.
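Before specializing to the definite case, note that the factorization P = USU∗ of lemma 6.19 can be computed by exactly the deflation used in the proofs above: peel off the trailing diagonal entry, form the Schur complement, and recurse. A Python sketch (our own illustration; as in the text, it assumes the trailing principal minors of P are nonsingular, so no pivot vanishes):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def usu_factor(P):
    """Factor a symmetric P with nonzero trailing principal minors as
    P = U S U^T, with U upper triangular and S a signature matrix."""
    n = P.shape[0]
    P = P.copy()
    U = np.zeros((n, n))
    s = np.zeros(n)
    for j in range(n - 1, -1, -1):
        pivot = P[j, j]                     # nonzero by the minor assumption
        s[j] = np.sign(pivot)
        U[j, j] = np.sqrt(abs(pivot))
        U[:j, j] = P[:j, j] * s[j] / U[j, j]
        # Schur complement: remove row/column j from the remaining problem
        P[:j, :j] -= np.outer(P[:j, j], P[:j, j]) / pivot
    return U, np.diag(s)

# indefinite solution of a Lyapunov equation with A in (diagonal) Schur form
A = np.diag([-1.0, -2.0, 3.0])
B = np.ones((3, 1))
P = solve_continuous_lyapunov(A, -B @ B.T)
U, S = usu_factor(P)
```

For a positive definite P all signs come out +1 and the recursion reduces to an upper triangular Cholesky factorization computed from the bottom-right corner.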
Thus, assuming stability of A, P can be expressed in factored form

  P = [ P̂11^{1/2}  P12 P22^{−1/2} ; 0  P22^{1/2} ] [ P̂11^{1/2}  P12 P22^{−1/2} ; 0  P22^{1/2} ]∗ = U U∗,   (6.30)

where U is upper triangular and P̂11 = P11 − P12 P22^{−1} P12∗. It was observed by Hammarling [164] that this square root factor U can be computed without explicitly computing P first. The problem is to successively compute the smaller pieces of P; in other words, we compute successively the last row/column of the square root factor.

Let us assume that A is in Schur form, that is, (quasi) upper triangular; this is a consequence of the discussion of the preceding section. To apply the method in practice, we choose k = n − 1 and partition A, B, P as in (6.26), namely P22 ∈ R^{(n−k)×(n−k)}, P12 ∈ R^{k×(n−k)}, P̂11 ∈ R^{k×k}. The Lyapunov equation can be further transformed to (6.27) using the transformation T of the previous section. Then we have the three equations (6.29), which we rewrite as follows:

  A22 P22 + P22 A22∗ + B2 B2∗ = 0,   (6.31)
  A11 P12 + A12 P22 + P12 A22∗ + B1 B2∗ = 0,
  A11 P̂11 + P̂11 A11∗ + B̂1 B̂1∗ = 0.

From the first equation (6.31) we obtain

  P22 = − (B2 B2∗) / (A22 + A22∗) ∈ R,   (6.32)

and from the middle equation follows [A11 + A22∗ I_{n−1}] P12 = −A12 P22 − B1 B2∗, which implies

  P12 = [A11 + A22∗ I_{n−1}]^{−1} ( A12 (B2 B2∗)/(A22 + A22∗) − B1 B2∗ ) ∈ R^{n−1}.   (6.33)

We can thus solve the first (scalar Lyapunov) equation (6.32) for P22; then we solve the second equation, which is a Sylvester-type equation, for P12. Thus we have determined the last column [P12 P22^{−1/2} ; P22^{1/2}] of the upper triangular square root factor U in (6.30); because of triangularity, we have computed the last n − k rows and columns of U. There remains to determine the upper triangular square root factor P̂11^{1/2}, which satisfies the Lyapunov equation given by the third equation above. Recall that B̂1 = B1 − P12 P22^{−1} B2, and by proposition 6.17 the pair (A11, B̂1) is reachable. Consequently, the problem of solving a Lyapunov equation of size n is reduced to one of smaller size k < n, that is, of dimension one less than the original one.
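The recursion (6.31)–(6.33) translates directly into code. The sketch below (ours, not the book's algorithm; it covers only the simplest case of a single input column and a real triangular A with real eigenvalues, so no 2 × 2 blocks arise, and it assumes the deflated input stays nonzero, which holds generically) computes the upper triangular factor U column by column and never forms P:

```python
import numpy as np
from scipy.linalg import solve, solve_continuous_lyapunov

def sqrt_lyap_triangular(A, B):
    """Upper triangular U with U U^T = P, where A P + P A^T + B B^T = 0,
    for stable real upper-triangular A and a single input column B."""
    b = B.copy().ravel()
    n = A.shape[0]
    U = np.zeros((n, n))
    for j in range(n - 1, 0, -1):
        a22 = A[j, j]
        p22 = -b[j] ** 2 / (2.0 * a22)                 # scalar version of (6.32)
        rhs = -A[:j, j] * p22 - b[:j] * b[j]
        p12 = solve(A[:j, :j] + a22 * np.eye(j), rhs)  # version of (6.33)
        U[j, j] = np.sqrt(p22)
        U[:j, j] = p12 / np.sqrt(p22)                  # last column of U, per (6.30)
        b = b[:j] - p12 * (b[j] / p22)                 # deflated input B1 - P12 P22^{-1} B2
    U[0, 0] = np.sqrt(-b[0] ** 2 / (2.0 * A[0, 0]))
    return U

A = np.array([[-1.0, 1, 1, 1],
              [ 0.0, -2, 1, 1],
              [ 0.0, 0, -3, 1],
              [ 0.0, 0, 0, -4]])
B = np.ones((4, 1))
U = sqrt_lyap_triangular(A, B)
P_ref = solve_continuous_lyapunov(A, -B @ B.T)         # reference solution
```

Hammarling's actual algorithm, quoted later in this chapter, handles multiple inputs and complex Schur forms; the sketch only exhibits the one-column-at-a-time structure of the recursion.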
Example 6.3.1. Let

  A = [ 0  1  0  0 ; 0  0  1  0 ; 0  0  0  1 ; −24  −50  −35  −10 ],  B = [ 0 ; 0 ; 0 ; 1 ].

The Schur decomposition A = ŨÃŨ∗ is computed first; Ã is upper triangular with the eigenvalues −1, −2, −3, −4 of A on its diagonal. Applying the recursion above in the Schur basis and transforming back, one obtains the solution of the Lyapunov equation and its triangular square root factors in the original coordinates:

  P = [ 1/2016  0  −1/2520  0 ; 0  1/2520  0  −1/504 ; −1/2520  0  1/504  0 ; 0  −1/504  0  151/2520 ] = LL∗ = UU∗,

where

  L = [ √14/168  0  0  0 ; 0  √70/420  0  0 ; −√14/210  0  √6/60  0 ; 0  −√70/84  0  √5/10 ],

  U = [ √6/120  0  −√14/420  0 ; 0  √755/1510  0  −√10570/12684 ; 0  0  √14/84  0 ; 0  0  0  √10570/420 ].

Example 6.3.2. We wish to solve FP + PF∗ + GG∗ = 0, using Hammarling's algorithm, where

  F = [ −2  √2/2  √2  0 ; −√2  0  6  0 ; −√2/2  0  0  √2/2 ; 0  √2/2  0  −2 ],  G = [ 0 ; √2 ; √2 ; 0 ].

First, F is transformed into Schur form; the orthogonal matrix which achieves this is

  W = [ 1  1  0  1 ; −1  1  0  −1 ; 1  0  1  −1 ; −1  0  1  1 ] · diag ( 1/2, √2/2, √2/2, 1/2 ).

Thus

  A = WFW∗ = [ −1  2  3  4 ; 0  −1  2  3 ; 0  0  −1  2 ; 0  0  0  −1 ],  B = WG = [ 1 ; 1 ; 1 ; 1 ].

• Step 1: A ← A, B ← B, n ← 4, k ← n − 1. Equation (6.32) is solved for P22: P22 = 1/2; then equation (6.33) is solved for the (n − 1)-vector P12: P12 = [21/4  9/4  1]∗. According to (6.30), the last column of U is U(1 : 3, 4) = √2 [21/4  9/4  1]∗, U(4, 4) = √2/2. We also have B2 = B(1 : 3, 1) − P12 P22^{−1} B(4, 1) = [−19/2  −7/2  −1]∗.

• Step 2: A ← A(1 : 3, 1 : 3), B ← B2, n ← n − 1, k ← n − 1. Equation (6.32) is solved for P22: P22 = 1/2; then equation (6.33) is solved for the (n − 2)-vector P12: P12 = [31/4  9/4]∗. According to (6.30), the third column of U is U(1 : 2, 3) = √2 [31/4  9/4]∗, U(3, 3) = √2/2. We also have B3 = B2(1 : 2, 1) − P12 P22^{−1} B2(3, 1) = [6  1]∗.

• Step 3: A ← A(1 : 2, 1 : 2), B ← B3, n ← n − 1, k ← n − 1. Equation (6.32) is solved for P22: P22 = 1/2; then equation (6.33) is solved for the (n − 3)-vector P12: P12 = [7/2]∗. According to (6.30), the second column of U is U(1, 2) = √2 [7/2], U(2, 2) = √2/2. We also have B4 = B3(1 : 1, 1) − P12 P22^{−1} B3(2, 1) = [−1]∗.
• Step 4: A ← A(1 : 1, 1 : 1), B ← B4, n ← n − 1; there is no equation (6.33) in this case. Equation (6.32) gives P22 = 1/2, and hence U(1, 1) = √2/2.

Putting together the columns computed above, we obtain the upper triangular factor of the solution to the Lyapunov equation in the Schur basis:

  U = (√2/2) [ 1  7  31/2  21/2 ; 0  1  9/2  9/2 ; 0  0  1  2 ; 0  0  0  1 ].

To obtain the square root factor in the original coordinates, we have to multiply by W∗:

  U ← W∗U = diag ( √2/4, 1/2, 1/2, √2/4 ) · [ 1  6  12  7 ; 1  8  20  15 ; 0  0  1  3 ; 1  6  10  5 ].

Thus the solution P in the original coordinates is

  P = UU∗ = [ 115/4  197√2/4  33√2/8  24 ; 197√2/4  345/2  65/4  81√2/2 ; 33√2/8  65/4  5/2  25√2/8 ; 24  81√2/2  25√2/8  81/4 ].

Finally, MATLAB gives the following solution:

  lyap(F, G*G') =
    2.8750e+001  6.9650e+001  5.8336e+000  2.4000e+001
    6.9650e+001  1.7250e+002  1.6250e+001  5.7276e+001
    5.8336e+000  1.6250e+001  2.5000e+000  4.4194e+000
    2.4000e+001  5.7276e+001  4.4194e+000  2.0250e+001

which up to machine precision is the same as the P obtained above.

6.3.4 Algorithms for Sylvester and Lyapunov equations

We will now quote two algorithms given in [311]. The first computes the solution of the Sylvester equation in real arithmetic, and the second computes the square root factor of the solution of a Lyapunov equation in complex arithmetic.
Algorithm: Solution of the Sylvester equation AX + XB = C in real arithmetic.

Input data: A ∈ R^{n×n}, B ∈ R^{k×k}, C ∈ R^{n×k}. Output data: X ∈ R^{n×k}.

1. Compute the real Schur decomposition A = QRQ∗, with R quasi upper triangular.
2. If B = A∗, set idx = [size(A, 1) : −1 : 1], T = R(idx, idx)∗, U = Q(:, idx); else compute the real Schur decomposition B = UTU∗, with T quasi upper triangular.
3. C ← Q∗CU, R2 ← R∗R, I ← eye(n), j ← 1.
4. While (j < k + 1):
  • if j < k and |T(j + 1, j)| < 10 · ε · max( |T(j, j)|, |T(j + 1, j + 1)| )   (real eigenvalue)
   (a) b ← C(:, j) − X(:, 1 : j − 1) ∗ T(1 : j − 1, j);
   (b) solve (R + T(j, j) I) x = b, and set X(:, j) ← x;
   (c) j ← j + 1.
  • else   (2 × 2 block)
   (a) b ← C(:, j : j + 1) − X(:, 1 : j − 1) ∗ T(1 : j − 1, j : j + 1);
   (b) t11 ← T(j, j), t12 ← T(j, j + 1), t21 ← T(j + 1, j), t22 ← T(j + 1, j + 1),
     b ← [ R b(:, 1) + t22 b(:, 1) − t21 b(:, 2),  R b(:, 2) + t11 b(:, 2) − t12 b(:, 1) ];
   (c) block solve [ R2 + (t11 + t22) R + (t11 t22 − t12 t21) I ] x = b, and set X(:, j : j + 1) ← x;
   (d) j ← j + 2.
  • end if
5. The solution X in the original basis is X ← QXU∗.

The above algorithm, which is coded in MATLAB as lyap.m, is the Bartels–Stewart algorithm. As quoted it works in real arithmetic: all quantities obtained are real. This is achieved at the expense of introducing 2 × 2 blocks on the diagonal of the Schur form, which makes a step like the "block solve" in part (c) necessary. The algorithm is also implemented in SLICOT. For details we refer to [311].

The following is essentially Hammarling's algorithm for the computation of the square root factor of the solution to the Lyapunov equation (in case the latter is positive definite). It is implemented in complex arithmetic; if A has complex eigenvalues and R is the complex Schur form, the resulting square root factor in the Schur basis will in general be complex. Of course, in the original basis it will be real. A version of this algorithm in real arithmetic can be derived in a way similar to that employed for the Bartels–Stewart algorithm, up to part (c) of the "else" branch.
Algorithm: Computation of the square root factor of the solution to the Lyapunov equation AP + PA∗ + BB∗ = 0, in complex arithmetic.

Input data: A = URU∗ ∈ R^{n×n}, with R ∈ C^{n×n} upper triangular and Re λj(A) < 0; B ∈ R^{n×m}, with (A, B) reachable. Output data: square root factor T of the solution P = TT∗ > 0.

• T ← zeros(n, n)
• for j = n : −1 : 2 do
  1. b ← B(j, :), µ ← ‖b‖2, µ1 ← √( −2 ∗ real(R(j, j)) )
  2. if µ ≠ 0
    b ← b/µ, I ← eye(j − 1),
    temp ← B(1 : j − 1, :) ∗ b∗ ∗ µ1 + R(1 : j − 1, j) ∗ µ/µ1,
    solve for u: ( R(1 : j − 1, 1 : j − 1) + R(j, j)∗ ∗ I ) u = −temp,
    B(1 : j − 1, :) ← B(1 : j − 1, :) − u ∗ b ∗ µ1,
   else u ← zeros(j − 1, 1)
  3. T(1 : j − 1, j) ← u, T(j, j) ← µ/µ1
• end for
4. T(1, 1) ← ‖B(1, :)‖2 / √( −2 ∗ real(R(1, 1)) )
• Square root factor in the original basis: T ← U ∗ T.

6.4 Numerical issues: forward and backward stability

In applications, due to finite precision and various errors, the Sylvester and Lyapunov equations can only be solved approximately. The question of how good these approximate solutions are therefore arises. As discussed in chapter 3, there are two ways of assessing the quality of a solution: forward and backward stability. The former is explained as follows. Let X̃ be an approximate solution of the Sylvester equation (6.1), and let R = AX̃ + X̃B − C be the associated residual; in general R is different from zero. It was shown in [43] that

  ‖R‖F / ‖X̃‖F ≤ O(ε) ( ‖A‖F + ‖B‖F ),

where O(ε) is a polynomial of modest degree in the machine precision ε. A small relative residual indicates forward stability; therefore, the solution of the Sylvester equation by the Bartels–Stewart algorithm is forward stable.

In order to address the issue of backward stability, we need to define the backward error of an approximate solution X̃. Following [170], the normwise backward error is

  η(X̃) = min{ ε : (A + E)X̃ + X̃(B + F) = C + G,  ‖E‖F ≤ εα,  ‖F‖F ≤ εβ,  ‖G‖F ≤ εγ }.
The positive constants α, β, γ provide freedom in measuring the perturbations; a frequent choice is α = ‖A‖F, β = ‖B‖F, γ = ‖C‖F.

To put the issue of backward error in perspective, we mention that the solution of the linear system of equations Ax = b, where x, b are vectors, is backward stable. In fact the following can be proved:

  min{ ε : (A + E)x̃ = b + f,  ‖E‖2 ≤ εα,  ‖f‖2 ≤ εβ } = ‖r‖2 / ( α ‖x̃‖2 + β ),  r = b − Ax̃.

In this case the backward error is directly linked to the relative residual. In contrast, the solution of the system of equations AX = B, where B is a matrix and not just a vector, is not backward stable; this is the special case of the Sylvester equation where B = 0. If we choose B = I, this amounts to the computation of the inverse of the matrix A.

Before stating a general result, we discuss an example.

Example 6.4.1. Given the positive numbers σ1, σ2, let n = k = 2 and

  A = [ 1/(2σ1)  1/(σ1 + σ2) ; 1/(σ1 + σ2)  1/(2σ2) ] = B,  C = [ 1  1 ; 1  1 ].

The exact solution of the Sylvester equation (6.1) is X = diag(σ1, σ2). Assume now that the following approximate solution has been obtained instead: X̃ = diag(σ1, σ̃2). The residual in this case is

  R = AX̃ + X̃A − C = [ 0  (σ̃2 − σ2)/(σ1 + σ2) ; (σ̃2 − σ2)/(σ1 + σ2)  (σ̃2 − σ2)/σ2 ].

Assuming that C is known exactly (i.e., G = 0), to determine the backward error we need to solve EX̃ + X̃F = −R for the perturbations E and F. To preserve the structure, we assume that E = F∗ = [ a  b ; c  d ]. It turns out that a = 0 and d = (σ̃2 − σ2)/(2σ̃2), while b, c are obtained as the least-squares (smallest-norm) solution of the remaining equations. Thus, assuming σ1 = 1 and σ2, σ̃2 small with σ2 − σ̃2 small, the 2-norm of the relative residual is approximately φ = |σ2 − σ̃2|/(σ1 + σ̃2), while that of the perturbation E is approximately φ (σ1 + σ2)/σ̃2. Depending on the value of σ̃2, the backward error can therefore be big: it is, in essence, the forward error divided by the smallest singular value of the approximate solution. Thus, even without knowledge of the formula below, we would not expect the solution of the Sylvester equation to be backward stable.

The fact that the conclusion reached in the example holds in general was shown by Higham in [170]. We give the formula here assuming that the right-hand side of the equation is known exactly (i.e., G = 0). Let the approximate solution X̃ have full rank n, and let its smallest singular value be σn(X̃). The following upper bound holds for the backward error:

  η(X̃) ≤ µ · ‖R‖F / ( (α + β) ‖X̃‖F ),  where  µ = ( √(α² + β²) / (α + β) ) · ( ‖X̃‖F / σn(X̃) ).

Here µ is an amplification factor which indicates by how much the backward error can exceed the relative residual. Furthermore, if the approximate solution is badly conditioned (near rank-deficient), σn(X̃) is small and the backward error may become arbitrarily large.
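The phenomenon of Example 6.4.1 (a small relative residual coexisting with a large backward error) can be reproduced numerically. In the sketch below (our own code), the amplification is estimated simply by ‖X̃‖F / σmin(X̃), the ingredient of µ that blows up:

```python
import numpy as np

s1, s2 = 1.0, 1e-4          # sigma_1, sigma_2
st2 = 1.1e-4                # perturbed small entry sigma~_2
A = np.array([[1/(2*s1), 1/(s1 + s2)],
              [1/(s1 + s2), 1/(2*s2)]])
C = np.ones((2, 2))
X_exact = np.diag([s1, s2])          # exact solution of A X + X A = C
Xt = np.diag([s1, st2])              # approximate solution

R = A @ Xt + Xt @ A - C              # residual
rel_res = np.linalg.norm(R) / (2 * np.linalg.norm(A) * np.linalg.norm(Xt))
amplification = np.linalg.norm(Xt) / np.linalg.svd(Xt, compute_uv=False).min()
```

With these numbers the relative residual is of order 1e-5, yet the amplification factor is of order 1e4: a tiny residual does not certify a small backward error.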
The condition number of the solutions

Another question which comes up when solving Sylvester or Lyapunov equations concerns the properties that the coefficient matrices must satisfy so that the solution is well conditioned. Before addressing this question, we define the condition number of the Sylvester equation around a solution X. Consider

  (A + δA)(X + δX) + (X + δX)(B + δB) = C,

where again we assume for simplicity that the right-hand side is known exactly. After retaining only first-order terms we get

  A(δX) + (δX)B = −(δA)X − X(δB)  ⇒  L vec(δX) = −[ X∗ ⊗ In,  Ik ⊗ X ] [ vec(δA) ; vec(δB) ],

where L = Ik ⊗ A + B∗ ⊗ In is the Sylvester (Lyapunov) operator. Following [170], we call the quantity

  κ = (1/‖X‖F) · ‖ L^{−1} [ α (X∗ ⊗ In),  β (Ik ⊗ X) ] ‖2   (6.34)

the condition number of the Sylvester equation.

We shall attempt to clarify this formula by considering an example.

Example 6.4.2. Consider a parametrized 2 × 2 Lyapunov-type instance of (6.1), depending on a parameter b > 0, whose solution is the identity matrix X = I2, which is perfectly conditioned. We will show that, depending on b, the associated condition number (6.34) can be arbitrarily large: as b ranges over several orders of magnitude, the computed values grow from κ = 27.5 through κ = 2.5 · 10³ and κ = 2.5 · 10⁵, up to κ = 2.5 · 10¹⁵.

The conclusion from this example is that there does not seem to be any connection between the condition number of the Sylvester or the Lyapunov equation and that of their solutions. In other words, the derivative of the Lyapunov operator can be big even around solutions which are perfectly conditioned.
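Formula (6.34) is straightforward to evaluate for small equations. The following sketch (our own instance, not the matrices of Example 6.4.2) reproduces the qualitative conclusion: the solution X = I is perfectly conditioned, yet κ grows without bound with the parameter b:

```python
import numpy as np

def sylvester_condition(A, B, X):
    """Condition number (6.34) with the common weights alpha = ||A||_F, beta = ||B||_F."""
    n, k = A.shape[0], B.shape[0]
    L = np.kron(np.eye(k), A) + np.kron(B.T, np.eye(n))   # Sylvester operator
    alpha, beta = np.linalg.norm(A), np.linalg.norm(B)
    M = np.linalg.solve(L, np.hstack([alpha * np.kron(X.T, np.eye(n)),
                                      beta * np.kron(np.eye(k), X)]))
    return np.linalg.norm(M, 2) / np.linalg.norm(X)

def kappa_for(b):
    # our illustrative family: X = I solves A X + X B = A + A^T exactly
    A = np.array([[-1.0, b], [0.0, -1.0]])
    B = A.T.copy()
    return sylvester_condition(A, B, np.eye(2))

k_small, k_large = kappa_for(1.0), kappa_for(100.0)
```

For this family the operator L becomes highly non-normal as b grows, so κ explodes even though the solution stays equal to I.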
6.5 Chapter summary

The Sylvester equation is a linear matrix equation; the Lyapunov equation is also a linear matrix equation, one in which the coefficient matrices satisfy symmetry constraints. The solution of these equations is a prominent part of many model reduction schemes; therefore, in the preceding sections these equations were studied in some detail.

First, we notice the multitude of methods for solving the Sylvester/Lyapunov equations. The Kronecker method reduces their solution to that of a linear system of equations of the form Ax = b, where x, b are vectors. Then come the eigenvalue/eigenvector method, followed by two characteristic-polynomial methods; the complex integration method, which uses the properties of Cauchy integrals to compute solutions; an invariant subspace method; and one using the sign function. The Bartels–Stewart algorithm was the first numerically reliable method for solving the Sylvester equation.

The second section discusses an important property of the Lyapunov equation, known under the name of inertia. The inertia of a matrix (with respect to some domain D in the complex plane) is composed of three nonnegative numbers, namely the number of eigenvalues inside the domain, on its boundary, and outside the domain. Depending on the application, D can be taken as the open left-half plane or the open unit disc. The inertia result then asserts that if the right-hand side of the equation (6.2) is (semi-) definite, the inertia of the coefficient matrix A and of the solution P are related.

In the third section, the Lyapunov equation with triangular coefficient matrices is studied: the consequences of transforming the coefficient matrices to triangular form are discussed. Following a proof of the inertia result which is iterative in the size of the equation, an algorithm is derived which, in case the solution to the Lyapunov equation is positive (semi-) definite, computes a square-root (i.e., Cholesky) factor of the solution. This is the algorithm first derived by Hammarling. It has the advantage of increased precision and will prove essential for the computation of the Hankel singular values.

The last section addresses the numerical issue of the accuracy of a computed solution. As discussed in chapter 3, the stability of the corresponding algorithm is important. It is thus shown that although the solution of the Sylvester equation is forward stable, it is, in general, not backward stable. This is a property shared with the computation of the inverse of a matrix (which can be obtained by solving a special Sylvester equation). Moreover, the conditioning of the Sylvester equation and of its solution have, in general, no connection.
Part III

SVD-based approximation methods
Chapter 7

Balancing and balanced approximations

A central concept in system theory, with application to model reduction, is that of the balanced representation of a system Σ. Roughly speaking, the states in such a representation are such that the degree of reachability and the degree of observability of each state are the same. From a mathematical viewpoint, the balancing methods consist in the simultaneous diagonalization of appropriate reachability and observability gramians, which are positive definite matrices.

Model reduction requires the elimination of some of the state variables from the original or a transformed system representation, which can be accomplished easily. Difficulties arise, however, when one wishes (i) to determine whether the reduced system has inherited properties from the original one, for instance stability, and (ii) to have some idea of what has been eliminated, for instance when one seeks an estimate of the norm of the error system. The appeal of balancing is that, if we first transform the system to a balanced representation and then eliminate (truncate) some of the state variables, stability is preserved and there is an a priori computable error bound for the error system.

The concept of balancing is first encountered in the work of Mullis and Roberts [245] on the design of digital filters. The system theoretic significance of this concept was recognized a few years later by Moore [243].

Technical description

Given a stable linear system Σ with impulse response h(t) = C e^{At} B, t > 0 (D is irrelevant in this case and is omitted), let h_r(t) = e^{At} B and h_o(t) = C e^{At} be the input-to-state and the state-to-output responses of the system; clearly h(t) = C h_r(t) = h_o(t) B. The reachability gramian is then defined as P = ∫₀^∞ h_r(t) h_r∗(t) dt, while the observability gramian is defined as Q = ∫₀^∞ h_o∗(t) h_o(t) dt. The significance of these quantities stems from the fact that, given a state x, the smallest amount of energy needed to steer the system from 0 to x is given by the square root of

  E_r² = x∗ P^{−1} x,

while the energy obtained by observing the output of the system with initial condition x and no excitation function is given by the square root of

  E_o² = x∗ Q x.

Thus, one way to reduce the number of states is to eliminate those which require a large amount of energy E_r to be reached and/or yield small amounts of observation energy E_o. However, these concepts are basis dependent, and therefore, in order for such a scheme to work, one would have to look for a basis in which these two concepts are equivalent. Such a basis actually exists: it is called a balanced basis, and in this basis there holds P = Q = Σ = diag(σ1, …, σn), where the σi are the Hankel singular values of Σ.

In system theory there are also other instances of positive definite matrices attached to a linear system, notably solutions to various Riccati equations. Correspondingly, there exist several other types of balancing. Section 7.5 lists some of these methods and examines in some detail weighted balancing, which provides a link with moment-matching reduction methods.
Approximation in a balanced basis takes place by truncating the state x = (x1 ··· xn)∗ to x̂ = (x1 ··· xk)∗, k < n. Approximation by balanced truncation preserves stability, and the H∞ norm (the maximum of the frequency response) of the error system is bounded above by twice the sum of the neglected singular values, 2(σ_{k+1} + ··· + σ_n). Although balanced truncation does not seem to be optimal in any norm, it nevertheless has a variational interpretation. The balanced basis is determined by a nonsingular transformation T which solves the minimization problem

  min_T trace ( T P T∗ + T^{−∗} Q T^{−1} ).

The minimum of this expression is twice the sum of the Hankel singular values, 2 Σ_{i=1}^n σi, and minimizing T's are balancing transformations, as shown below.

This chapter is concerned with balancing and approximation by balanced truncation. After an introduction, approximation by truncation is defined and the two main properties (preservation of stability and the error bound) are discussed in detail. Subsequently, a canonical form for continuous-time balanced systems is derived, followed by numerical considerations in computing balanced representations. The latter part of the chapter, which can be omitted on first reading, discusses various generalizations of the concept of balancing. In particular stochastic, bounded real and positive real balancing are presented, followed by frequency selective balancing. This leads to balancing and balanced truncation for unstable systems.

7.1 The concept of balancing

The reachability gramian P and the observability gramian Q can be obtained by solving the Lyapunov equations (4.45) and (4.46), respectively. From lemma 4.27 it follows that the states which are difficult to reach, i.e., those which require a large amount of energy to reach, are (have a significant component) in the span of the eigenvectors of the reachability gramian P corresponding to small eigenvalues. Similarly, the states which are difficult to observe, i.e., those which yield small amounts of observation energy, are those which lie (have a significant component) in the span of the eigenvectors of the observability gramian Q corresponding to small eigenvalues. This observation suggests that reduced order models may be obtained by eliminating those states which are difficult to reach and/or difficult to observe. However, states which are difficult to reach may not be difficult to observe, and vice versa. Here is a simple example illustrating this point.

Example 7.1.1. Consider the following continuous-time, stable and minimal system:

  A = [ 1  3 ; −1  −2 ],  B = [ 1 ; 0 ],  C = [ 0  1 ].

The reachability and observability gramians are, respectively,

  P = [ 5/2  −1 ; −1  1/2 ],  Q = [ 1/2  1/2 ; 1/2  1 ].

The sets of eigenvalues Λ and the corresponding eigenvector matrices V are

  ΛP = { 2.91421, 0.08578 },  VP = [ 0.92388  0.38268 ; −0.38268  0.92388 ],
  ΛQ = { 1.30901, 0.19098 },  VQ = [ 0.52573  −0.85065 ; 0.85065  0.52573 ].
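The gramians and the two energies are easy to compute for a concrete system. In the sketch below (our own simple stable system, chosen for illustration rather than taken from the text), both gramians are obtained from their Lyapunov equations, and the Hankel singular values are recovered as the square roots of the eigenvalues of PQ:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# a simple stable SISO system (illustrative choice)
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])

P = solve_continuous_lyapunov(A, -B @ B.T)     # A P + P A^T + B B^T = 0
Q = solve_continuous_lyapunov(A.T, -C.T @ C)   # A^T Q + Q A + C^T C = 0

# Hankel singular values: square roots of the eigenvalues of P Q
sigma = np.sqrt(np.sort(np.linalg.eigvals(P @ Q).real)[::-1])

# energies attached to a state x
x = np.array([1.0, 1.0])
Er2 = x @ np.linalg.solve(P, x)    # E_r^2 = x^T P^{-1} x (minimal input energy)
Eo2 = x @ Q @ x                    # E_o^2 = x^T Q x   (observation energy)
```

For this particular system one can check by hand that P = diag(1/12, 1/6) and Q = [[11/12, 1/4], [1/4, 1/12]], and that P agrees with a direct quadrature of the defining integral.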
such that the transformed Gramians P ¯=Q ¯. The transformation which achieves this goal is called a balancing transformation. does there exist a basis in the state space in which states which are difﬁcult to reach are also difﬁcult to observe? The answer to this question is afﬁrmative. or difﬁcult to observe. In the general case we have T ACA April 14. ˆ are uniquely deterIf the Hankel singular values are distinct (i. i = 1. U∗ QU = KΣ2 K∗ . Deﬁnition 7. det T = 0.e.2) Proof. P This ensures that the states which are difﬁcult to reach are precisely those which are difﬁcult to observe. and β = 0. observable and stable system Σ is balanced if P = Q. and viceversa. Σ is principalaxis balanced if P = Q = Σ = diag (σ1 . i. the question arises: Question: Given a continuous. observable and stable system corresponding gramians P and Q. n. From these considerations.i i 7.1) Lemma 7. (7.1. Recall that under an equivalence transformation the gramians are transformed as shown in (4. The concept of balancing 175 i “oa” 2004/4/14 page 175 i where α = 0. implies that the quantities σi . have multiplicity one) balancing transformations T mined from T given above. we need to search for a basis in which states which are difﬁcult to reach are simultaneously difﬁcult to observe. This means that the eigenvector of P which corresponds to the smallest eigenvalue. Q ˜ = T−∗ QT−1 ⇒ P ˜Q ˜ = T (PQ) T−1 .98709.57): ˜ = TP T∗ .e.1. stable system Σ = A C B . a diagonal matrix with ±1 on the diagonal: ˆ = ST. Q ˜ are equal: The problem is to ﬁnd T. the state which is the easiest state to reach.or discretetime. a (principal axis) balancing transformation is given as follows: A C B and the T = Σ1/2 K∗ U−1 and T−1 = UKΣ−1/2 (7. We can now state the main lemma of this section.16018. Furthermore lemma 5. gives almost maximum observation energy. i. 2004 i i i i .e. the eigenvector of P which corresponds to the largest eigenvalue. Given the reachable.e. P ˜. gives almost minimum observation energy. 
Proof. It is readily verified that T P T* = Σ and T^{-*} Q T^{-1} = Σ.

Remark 7.1.1. From a linear algebraic point of view, the concept of balancing consists in the simultaneous diagonalization of two positive (semi-)definite matrices. Furthermore, it follows from the results of chapter 5 that the quantities σi, i = 1, · · · , n, are the Hankel singular values of the system Σ.

If the Hankel singular values are distinct, i.e., have multiplicity one, balancing transformations T̂ are uniquely determined from the T given above, up to multiplication by a sign matrix S, that is, a diagonal matrix with ±1 on the diagonal: T̂ = S T. In the general case we have the following result.
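The construction in the lemma above is straightforward to carry out numerically. The following Python/NumPy sketch (the function name and the use of a lower- rather than upper-triangular Cholesky factor are illustrative choices, not taken from the text; any factor P = U U* works) computes a principal-axis balancing transformation from a pair of positive definite gramians:

```python
import numpy as np

def balancing_transformation(P, Q):
    """Principal-axis balancing transformation (Lemma 7.2):
    returns T and the Hankel singular values sigma such that
    T P T^T = T^{-T} Q T^{-1} = diag(sigma), sigma decreasing."""
    U = np.linalg.cholesky(P)                 # P = U U^T (lower-triangular factor)
    lam, K = np.linalg.eigh(U.T @ Q @ U)      # U^T Q U = K Sigma^2 K^T
    lam, K = lam[::-1], K[:, ::-1]            # reorder eigenvalues decreasingly
    sigma = np.sqrt(lam)                      # Hankel singular values
    T = np.diag(np.sqrt(sigma)) @ K.T @ np.linalg.inv(U)   # T = Sigma^{1/2} K^T U^{-1}
    return T, sigma
```

By construction, T P T* and T^{-*} Q T^{-1} both equal diag(σ1, · · · , σn).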
Corollary 7.3. Let there be k distinct singular values σi, i = 1, · · · , k, with multiplicities mi, respectively. Every principal-axis balancing transformation T̂ has the form T̂ = S T, where T is defined by (7.2) and S is a block-diagonal unitary matrix with an arbitrary mi × mi unitary matrix as its ith block.

Variational interpretation. Balancing transformations can also be obtained by minimization of an appropriate expression.

Proposition 7.4. Given that trace P Q = trace Σ², we have trace (P + Q) ≥ 2 trace Σ, and the lower bound is attained in the balanced case P = Q = Σ.

Proof. First recall that the product of the gramians is similar to a positive definite matrix: P Q ∼ P^{1/2} Q P^{1/2} = U Σ² U*, for an orthogonal matrix U. The following equalities hold:

   P + Q = P + P^{-1/2} U Σ² U* P^{-1/2}
         = (P^{1/2} U − P^{-1/2} U Σ)(U* P^{1/2} − Σ U* P^{-1/2}) + P^{-1/2} U Σ U* P^{1/2} + P^{1/2} U Σ U* P^{-1/2}.
Notice that M = (P^{1/2} U − P^{-1/2} U Σ)(U* P^{1/2} − Σ U* P^{-1/2}) is a positive (semi-)definite expression, and hence its trace is non-negative, while the trace of each of the two remaining terms equals trace Σ. Thus we obtain

   trace (P + Q) = trace M + 2 trace Σ  ⇒  trace (P + Q) ≥ 2 trace Σ,

with strict inequality holding if the Hankel singular values are distinct and the system is not balanced. It readily follows that the lower bound is attained in the balanced case: P = Q = Σ.

Example 7.2. (Continuation) Using (5.24) we obtain the Hankel singular values σ1 = 0.8090, σ2 = 0.3090, together with the signs s1 = −1, s2 = 1. Using (7.2) we obtain the (principal-axis) balancing transformation T and the balanced realization Σbal = (T A T^{-1}, T B, C T^{-1}). Finally, notice that the resulting form matches the canonical form (7.25), with σ1, σ2 and s1, s2 as above.

Balanced systems have the following property.

Proposition 7.5. Let Σ be balanced. In continuous time, ‖e^{A t}‖₂ ≤ 1 for t > 0, while in discrete time, ‖A‖₂ ≤ 1.

7.2 Model Reduction by balanced truncation

Let Σ = (A, B, C) be balanced with gramians equal to Σ, and partition the corresponding matrices as follows:

   A = [ A11 A12 ; A21 A22 ],   Σ = [ Σ1 0 ; 0 Σ2 ],   B = [ B1 ; B2 ],   C = ( C1 C2 ).   (7.3)
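Balancing followed by truncation of the partition above can be sketched end to end. The following Python fragment (a sketch; the function name is ours, and `scipy.linalg.solve_continuous_lyapunov` is used to compute the gramians) balances a continuous-time system and keeps the leading k states:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def balanced_truncation(A, B, C, k):
    """Balance (A,B,C) as in Lemma 7.2, partition as in (7.3),
    and keep the leading k-th order subsystem (A11, B1, C1)."""
    P = solve_continuous_lyapunov(A, -B @ B.T)     # A P + P A^T + B B^T = 0
    Q = solve_continuous_lyapunov(A.T, -C.T @ C)   # A^T Q + Q A + C^T C = 0
    U = np.linalg.cholesky(P)
    lam, K = np.linalg.eigh(U.T @ Q @ U)
    lam, K = lam[::-1], K[:, ::-1]                 # decreasing order
    sigma = np.sqrt(lam)                           # Hankel singular values
    T = np.diag(np.sqrt(sigma)) @ K.T @ np.linalg.inv(U)
    Ti = np.linalg.inv(T)
    Ab, Bb, Cb = T @ A @ Ti, T @ B, C @ Ti         # balanced realization
    return Ab[:k, :k], Bb[:k, :], Cb[:, :k], sigma
```

In continuous time the truncated subsystem is again balanced, with gramian Σ1 = diag(σ1, · · · , σk); this is checked in the usage below.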
Definition 7.6. With the partitioning (7.3), the systems

   Σi = [ Aii Bi ; Ci ],  i = 1, 2,   (7.4)

are reduced-order systems obtained from Σ by balanced truncation.

Reduced-order models obtained by balanced truncation have certain guaranteed properties. However, these properties are slightly different for discrete- and continuous-time systems, and hence we state two theorems. Recall definition 5.5 of the Hankel singular values and the notation (5.22), page 116. Let the distinct singular values of Σ be σi, with multiplicities mi, i = 1, · · · , q, and let Σ1 have the singular values σi with the multiplicities mi, i = 1, · · · , k, k < q.

Theorem 7.7. Balanced truncation: continuous-time systems. Given the reachable, observable and stable (poles in the open left-half plane) continuous-time system Σ, the reduced-order systems Σi, i = 1, 2, obtained by balanced truncation have the following properties.
1. Σi is balanced and has no poles in the open right-half plane.
2. If λmin(Σ1) > λmax(Σ2), then Σ1 is in addition reachable, observable, and has no poles on the imaginary axis; more generally, if λp(Σ1) ≠ λq(Σ2) for all p, q, then Σi, for both i = 1 and i = 2, are in addition reachable and observable.
3. The H∞ norm of the error system is bounded as in (7.5) below.

Theorem 7.8. Balanced truncation: discrete-time systems. Given the reachable, observable and stable (poles in the open unit disc) discrete-time system Σ, the reduced-order system Σ1 obtained by balanced truncation has the following properties.
1. Σ1 has poles in the closed unit disc; in contrast to the continuous-time case, these systems are in general not balanced.
2. If λmin(Σ1) > λmax(Σ2), Σ1 is in addition reachable and observable.
3. The h∞ norm of the difference between full- and reduced-order models is upper-bounded by twice the sum of the neglected Hankel singular values, multiplicities not included: ‖Σ − Σ1‖_{h∞} ≤ 2 (σ_{k+1} + · · · + σ_q).

Remark 7.2.1. These error bounds are due to Glover [138] and Enns [106]. In addition to the bound (7.5) for the H∞ norm of the error system, bounds for the L1 norm of the impulse response of the error system can be derived, where h denotes the impulse response of the original system and h_k that of the kth-order approximant obtained by balanced truncation. An a priori computable bound on ‖h − h_k‖_{L1}, involving the neglected Hankel singular values σ_{k+1}, · · · , σ_n and their multiplicities, was derived in [218]; it takes an explicit form when the Hankel singular values have multiplicity one. A further such bound was obtained in [140]; it is valid also for infinite-dimensional systems and involves a minimization.
3. The H∞ norm of the difference between the full-order system Σ and the reduced-order system Σ1 is upper-bounded by twice the sum of the neglected Hankel singular values:

   ‖Σ − Σ1‖_{H∞} ≤ 2 (σ_{k+1} + · · · + σ_q).   (7.5)

Furthermore, equality holds if Σ2 = σ_q I_{m_q}, that is, if Σ2 is composed of the smallest Hankel singular value of Σ.

Notice that in part 3, for both continuous- and discrete-time systems, the multiplicities of the neglected singular values do not enter the upper bound. The last part of the above theorems says that if the neglected singular values are small, the amplitude Bode plots of Σ and Σ1 are guaranteed to be close. Below is an outline of the proof of parts 1 and 2.
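The bound (7.5) is easy to probe numerically. The sketch below (our own construction; the frequency-grid maximum is only a lower bound of the true H∞ norm, so it must stay below the right-hand side of (7.5)) balances a random stable SISO system, truncates it, and compares the sampled error against twice the sum of the neglected Hankel singular values:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def freq_resp(A, B, C, w):
    """Sample H(jw) = C (jwI - A)^{-1} B on the grid w (SISO)."""
    n = A.shape[0]
    return np.array([(C @ np.linalg.solve(1j * wi * np.eye(n) - A, B))[0, 0]
                     for wi in w])

rng = np.random.default_rng(3)
n, k = 5, 2
S = rng.standard_normal((n, n))
A = S - S.T - np.eye(n)                     # stable: real parts of eigenvalues = -1
B = rng.standard_normal((n, 1)); C = rng.standard_normal((1, n))

# balancing (Lemma 7.2) followed by truncation of (7.3)
P = solve_continuous_lyapunov(A, -B @ B.T)
Q = solve_continuous_lyapunov(A.T, -C.T @ C)
U = np.linalg.cholesky(P)
lam, K = np.linalg.eigh(U.T @ Q @ U)
lam, K = lam[::-1], K[:, ::-1]
sigma = np.sqrt(lam)
T = np.diag(np.sqrt(sigma)) @ K.T @ np.linalg.inv(U)
Ti = np.linalg.inv(T)
Ab, Bb, Cb = T @ A @ Ti, T @ B, C @ Ti

w = np.logspace(-3, 3, 400)
err = np.abs(freq_resp(A, B, C, w) - freq_resp(Ab[:k, :k], Bb[:k, :], Cb[:, :k], w))
bound = 2 * sigma[k:].sum()                 # right-hand side of (7.5)
```

Here `err.max()` never exceeds `bound`, in accordance with the theorem.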
Example 7.3. (Continuation) If we reduce the system discussed earlier by balanced truncation, the H∞ norm of the error between Σ and Σ1 is equal to the theoretical upper bound 2σ2 = 0.6180; similarly, the H∞ norm of the error between Σ and Σ2 is also equal to the theoretical upper bound, which in this case is 2σ1 = 1.618033.

7.2.1 Proof of the two theorems

Balanced truncation: continuous-time

Proof. Part 1. We work with the subsystem Σ1 = (C1, A11, B1). By construction the following equations hold:

   A11 Σ1 + Σ1 A11* + B1 B1* = 0,   Σ1 A11 + A11* Σ1 + C1* C1 = 0.

To prove stability, note that by lemma 6.15 these equations imply that the eigenvalues of A11 are in the left-half plane or on the imaginary axis. Assume, to obtain a contradiction, that A11 has eigenvalues on the imaginary axis; for simplicity we assume that A11 has only one such complex eigenvalue, ν = jω, with eigenvector v: A11 v = ν v. Multiplying the second (observability) equation above on the left by v* and on the right by v, we obtain C1 v = 0. By multiplying the same equation only on the right by v we obtain (A11* + ν I) Σ1 v = 0, and upon multiplication on the right by Σ1 v instead, (A11 − ν I) Σ1² v = 0. Since Σ1 is positive definite, the consequence of these relationships is that Σ1 v must be a multiple of v, say Σ1 v = σ v. Similarly, multiplying the first (reachability) equation on the left by v* and on the right by v, we conclude that B1* Σ1 v = 0, and hence B1* v = 0.

Next we need to consider the coupling equations

   A21 Σ1 + Σ2 A12* + B2 B1* = 0,   Σ2 A21 + A12* Σ1 + C2* C1 = 0.

Due to the assumptions above we can take v to be the first unit vector, v = e1. Denote the first columns of A21 and A12* by a and b, respectively. Multiplying the two equations above on the right by v, we obtain σ a + Σ2 b = 0 and Σ2 a + σ b = 0, and hence (σ² I − Σ2²) a = 0. Since the eigenvalues of Σ2 are different from those of Σ1, and hence different from σ, it follows that the first column of A21 is zero, a = 0, and likewise b = 0.
Therefore the column vector (v*, 0)* is an eigenvector of the whole matrix A corresponding to the eigenvalue ν, and, since B1* v = 0 and the first column of A21 vanishes, it also lies in the left kernel of B. This, however, is a contradiction to the reachability of the pair (A, B). The conclusion is that A11 cannot have eigenvalues on the imaginary axis, which completes the proof of part 1.

Part 2. There remains to show that if Σ1 and Σ2 have no diagonal entries in common, then A11 has no eigenvalues on the imaginary axis and the subsystem Σ1 is reachable and observable; notice that by part 1 the subsystem Σ1 is balanced.

Alternative proof of part 2. Assume, on the contrary, that A11 has eigenvalues both on the imaginary axis and in the LHP. Then there exists a block-diagonal transformation T = diag (T11, I) such that T11 A11 T11^{-1} = diag (F11, F22), where all eigenvalues of F11 are strictly in the LHP (i.e., have negative real parts) and the eigenvalues of F22 are on the imaginary axis. Let the quantities A, B, C and the gramians be transformed accordingly:

   Ā = [ F11 0 F13 ; 0 F22 F23 ; F31 F32 F33 ],   B̄ = [ G1 ; G2 ; G3 ],   C̄ = ( H1 H2 H3 ),

   P̄ = [ P11 P12 0 ; P12* P22 0 ; 0 0 P33 ],   Q̄ = [ Q11 Q12 0 ; Q12* Q22 0 ; 0 0 Q33 ],

where F33 = A22, G3 = B2, H3 = C2, and P33 = Q33 = Σ2. The proof now proceeds in three steps: (a) first, we show that G2 = 0 and H2 = 0; (b) next, that P12 = 0 and Q12 = 0; (c) finally, that F23 = 0 and F32 = 0.
The proof of these three facts is based on the Lyapunov equations of the transformed triple, namely Ā* Q̄ + Q̄ Ā + C̄* C̄ = 0 and Ā P̄ + P̄ Ā* + B̄ B̄* = 0.

(a) The (2,2) entry of the reachability equation reads F22 P22 + P22 F22* + G2 G2* = 0. Due to the fact that F22 has imaginary eigenvalues exclusively, it follows that G2 = 0 (this is left as an exercise; see Problem [5]). Similarly H2 = 0.

(b) With G2 = 0, the (1,2) entry of the reachability equation reads F11 P12 + P12 F22* = 0. Since F11 and F22 have disjoint sets of eigenvalues, we conclude that P12 = 0; similarly Q12 = 0, which proves (b): in the above basis the gramians are block diagonal.

(c) The (2,3) entry of the reachability equation reads F23 P33 + P22 F32* = 0, while the observability Lyapunov equation yields F32* Q33 + Q22 F23 = 0. These two equations yield F23 P33 Q33 + P22 Q22 F23 = 0. Recall that P33 = Q33 = Σ2, and that the eigenvalues of P22 Q22 are a subset of the eigenvalues of Σ1². Due to the fact that the singular values are ordered in decreasing order, and because of the assumption σk ≠ σk+1, the spectra of P33 Q33 and P22 Q22 are disjoint, i.e., λi(P22 Q22) ≠ λj(P33 Q33) for all i, j. Therefore F23 = 0, and consequently F32 = 0.

The consequence of these three relationships is that (Ā, B̄) is not reachable and (C̄, Ā) is not observable, which is a contradiction to the assumption of minimality. This proves part 2. For the proof of part 3 we refer to [173].

Balanced truncation: discrete-time

Proof. Part 1. From the (1,1) entry of the Stein (discrete-time Lyapunov) equation of the balanced system it follows that

   A11 Σ1 A11* + A12 Σ2 A12* + B1 B1* = Σ1.

This also shows that, in contrast to the continuous-time case, the truncated subsystem need not be balanced. Let A11* v = ν̄ v. Multiplying the above equation on the left by v* and on the right by v, we obtain

   (1 − |ν|²) v* Σ1 v = v* A12 Σ2 A12* v + v* B1 B1* v ≥ 0.

Since Σ1 > 0, this immediately implies |ν| ≤ 1.
Hence the reduced system is asymptotically stable, provided that σk ≠ σk+1.

Part 2. The proof of part 2 is also by contradiction. Assume that the subsystem Σ1 is not reachable. Then there exists a left eigenvector v of unit length such that A11* v = ν̄ v and B1* v = 0. If |ν| = 1, the equation given at the beginning of the proof implies v* A12 = 0; hence (v* 0) is a left eigenvector of A and at the same time lies in the left kernel of B, which means that (A, B) is not reachable — a contradiction. If |ν| < 1, the same equation implies

   (1 − |ν|²) v* Σ1 v = v* A12 Σ2 A12* v.

Moreover v* Σ1 v ≥ σmin(Σ1), while the right-hand side is no bigger than ‖A12* v‖² σmax(Σ2); furthermore, since ‖A‖ ≤ 1, we have ‖A12* v‖² ≤ 1 − |ν|². Combining all these relationships we get

   (1 − |ν|²) σmin(Σ1) ≤ (1 − |ν|²) σmax(Σ2)  ⇒  σmin(Σ1) ≤ σmax(Σ2).

Since the singular values are ordered in decreasing order and σk ≠ σk+1, this last inequality yields the desired contradiction, which proves part 2.
A proof of the H∞ error bound (7.5) proceeds as follows. Recall the partitioning (7.3) of the balanced system Σ, with accordingly partitioned state x = (x1*, x2*)*. The reduced system is Σ1 defined by (7.4); it has input u and its state will be denoted by ξ. The resulting error system is Σerr = Σ − Σ1; it has input u, state (x1*, x2*, ξ*)* and output e = y − ŷ, with state-space realization

   Σerr = [ A11 A12 0 B1 ; A21 A22 0 B2 ; 0 0 A11 B1 ; C1 C2 −C1 ].   (7.6)

The following is the crucial result.

Lemma 7.9. [358] With the setup as above, assume that the part which is eliminated is all-pass, that is, Σ2 = σ I, σ ∈ R, I ∈ R^{r×r}. Then the H∞ norm of the error system satisfies

   ‖Σerr‖_{H∞} ≤ 2 σ.   (7.7)

Proof. Recall the Lyapunov equations corresponding to the balanced system Σ, written in partitioned form. The next step is to make a basis change T in the state space:

   x̄1 = x1 − ξ,   x̄2 = x2,   x̄3 = x1 + ξ,

where the transformed quantities are

   F̄ = T F T^{-1} = [ A11 A12 0 ; ½ A21 A22 ½ A21 ; 0 A12 A11 ],   Ḡ = T G = [ 0 ; B2 ; 2 B1 ],   H̄ = H T^{-1} = ( C1 C2 0 ),

so that Σerr = (F̄, Ḡ, H̄). We show that ‖(1/(2σ)) Σerr‖_{H∞} ≤ 1, which is equivalent to (7.7). The matrix X = diag (Σ1, 2σ I, σ² Σ1^{-1}) is clearly positive definite, and it can readily be verified that it satisfies the Riccati equation

   F̄* X + X F̄ + H̄* H̄ + (1/(2σ)²) X Ḡ Ḡ* X = 0.
The claim then follows from lemma 5.27, page 121, applied to the scaled error system (1/(2σ)) Σerr = (F̄, Ḡ, (1/(2σ)) H̄). To verify the Riccati equation, substitute the partitioned quantities and expand; the following string of equalities holds:

   F̄* X + X F̄ + H̄* H̄ + (1/(2σ)²) X Ḡ Ḡ* X =

   [ A11* Σ1 + Σ1 A11 + C1* C1 ,  σ A21* + Σ1 A12 + C1* C2 ,  0 ;
     A12* Σ1 + σ A21 + C2* C1 ,  2σ (A22 + A22*) + C2* C2 + B2 B2* ,  σ² A12* Σ1^{-1} + σ A21 + σ B2 B1* Σ1^{-1} ;
     0 ,  σ² Σ1^{-1} A12 + σ A21* + σ Σ1^{-1} B1 B2* ,  σ² ( Σ1^{-1} A11 + A11* Σ1^{-1} + Σ1^{-1} B1 B1* Σ1^{-1} ) ].
From the Lyapunov equations given earlier, and keeping in mind that Σ2 = σ I, it follows that each one of the (block) entries of the above matrix is zero. This completes the proof of the lemma.

The above proof leads to a proof of the error bound (7.5) in the general case, as follows. Let Σ have r distinct Hankel singular values σ1, · · · , σr, with multiplicities m1, · · · , mr, such that Σi mi = n. The following notation is used:

   Σ1,k = [ A11 · · · A1k B1 ; ... ; Ak1 · · · Akk Bk ; C1 · · · Ck ],  k = 1, · · · , r,

i.e., Σ1,k is the system obtained by balanced truncation, retaining the singular values σ1 I_{m1}, · · · , σk I_{mk}. Notice that with this notation Σ = Σ1,r, Σ1 is the reduced system obtained by balanced truncation, and Σ1,0 is the zero system. We can thus write

   Σ = (Σ1,r − Σ1,r−1) + (Σ1,r−1 − Σ1,r−2) + · · · + (Σ1,2 − Σ1,1) + (Σ1,1 − Σ1,0)
   ⇒ ‖Σ‖ ≤ ‖Σ1,r − Σ1,r−1‖ + · · · + ‖Σ1,1 − Σ1,0‖ ⇒ ‖Σ‖ ≤ 2σr + 2σr−1 + · · · + 2σ2 + 2σ1.

The first inequality follows from the triangle inequality, and the second from lemma 7.9, which asserts that ‖Σ1,k − Σ1,k−1‖ ≤ 2σk for all k = 1, · · · , r. Applying the same argument to Σ − Σ1 proves the bound (7.5) of part 3. It should be mentioned that (7.7) holds with equality; the results in section 8.4 lead to a proof of (7.5) with equality (at least in the SISO case).

7.2.2 H2 norm of the error system for balanced truncation

In this section we compute the H2 norm of the error system. Again, recall that Σ is the balanced partitioned system (7.3), Σ1 is the reduced system obtained by balanced truncation, and Σerr is the resulting error system (7.6). Let Y be such that A* Y + Y A11 + C* C1 = 0, which in partitioned form reads

   [ A11* A21* ; A12* A22* ] [ Y1 ; Y2 ] + [ Y1 ; Y2 ] A11 + [ C1* ; C2* ] C1 = 0.   (7.8)

From the gramian-based characterization of the H2 norm (chapter 5), the H2 norm of the error system Σerr is the square root of the following expression:

   ‖Σerr‖²_{H2} = trace ( ( B1* B2* B1* ) [ Σ1 0 −Y1 ; 0 Σ2 −Y2 ; −Y1* −Y2* Σ1 ] [ B1 ; B2 ; B1 ] ).
Lemma 7.10. With the notation established above, the H2 norm of the error system Σe = Σ − Σ1 is

   ‖Σe‖²_{H2} = trace [ (B2 B2* + 2 Y2 A12) Σ2 ].   (7.9)

The first term in (7.9) is the H2 norm of the neglected subsystem of the original system; it is the second term that we express by means of the original system data below. Next, we derive an upper bound for the H2 norm of the error in terms of the H∞ norm of the auxiliary system

   Σaux = [ A11 A12 0 ; A21 A22 Σ2 A21 ; 0 A12 Σ2 0 ].

Notice that the transfer function of Σaux can be written as

   Haux(s) = A12 Σ2 ( s I − A22 − A21 (s I − A11)^{-1} A12 )^{-1} Σ2 A21.

It is worth noting that this expression is quadratic in the neglected part Σ2 of the gramian.
Proof of the lemma. The following equalities hold:

   ‖Σe‖²_{H2} = trace [ B1* Σ1 B1 − B1* Y1 B1 + B2* Σ2 B2 − B2* Y2 B1 − B1* Y1 B1 − B1* Y2* B2 + B1* Σ1 B1 ]
             = trace [ B2* Σ2 B2 ] + 2 trace [ B1* Σ1 B1 − B1* Y1 B1 − B2* Y2 B1 ].

From the (1,2) entry of the Lyapunov equation for the balanced reachability gramian it follows that A12 Σ2 + Σ1 A21* + B1 B2* = 0, and consequently − (B1 B2*) Y2 = (A12 Σ2 + Σ1 A21*) Y2. This implies

   trace [ − B2* Y2 B1 ] = trace [ − B1 B2* Y2 ] = trace [ A12 Σ2 Y2 + Σ1 A21* Y2 ].

Substituting yields ‖Σe‖²_{H2} = trace [ B2* Σ2 B2 ] + 2 trace [ A12 Σ2 Y2 ] + 2 trace [ B1* Σ1 B1 − B1* Y1 B1 + Σ1 A21* Y2 ] (⋆). We show that (⋆) = 0. The (1,1) entry of equation (7.8) implies A11* Y1 + A21* Y2 + Y1 A11 + C1* C1 = 0, and consequently Σ1 A21* Y2 = − (Σ1 A11* Y1 + Σ1 Y1 A11 + Σ1 C1* C1). Thus the expression trace [ B1* Σ1 B1 − B1* Y1 B1 + Σ1 A21* Y2 ] equals

   trace [ B1* Σ1 B1 ] − trace [ C1 Σ1 C1* ] − trace [ (B1 B1* + Σ1 A11* + A11 Σ1) Y1 ] = 0 − 0 = 0,

where the first difference vanishes by the balanced Lyapunov equations, and the last trace vanishes because A11 Σ1 + Σ1 A11* + B1 B1* = 0. This concludes the proof of (7.9).

Corollary 7.11. With the notation established above, there holds

   ‖Σe‖²_{H2} ≤ trace [ B2* Σ2 B2 ] + 2 k ‖Σaux‖_{H∞}.   (7.10)

Remark 7.2.2. The first term in (7.10) is linear in the neglected singular values Σ2, while the second is quadratic in Σ2.
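Formula (7.9) can be verified directly against the gramian of the error system. The sketch below (our own; it uses `scipy.linalg.solve_sylvester` for (7.8), and the worked example is a second-order system built from the canonical form (7.25) with σ = (2, 1), γ = (1, 1) and both signs +1):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, solve_sylvester, block_diag

def h2_error_sq(A, B, C, k):
    """Evaluate (7.9) for a system given in balanced coordinates:
    ||Sigma - Sigma_1||_{H2}^2 = trace[(B2 B2^T + 2 Y2 A12) Sigma_2]."""
    P = solve_continuous_lyapunov(A, -B @ B.T)   # equals diag of the HSVs if balanced
    sigma = np.diag(P)
    # the partitioned Sylvester equation (7.8): A^T Y + Y A11 + C^T C1 = 0
    Y = solve_sylvester(A.T, A[:k, :k], -C.T @ C[:, :k])
    B2, Y2, A12 = B[k:, :], Y[k:, :], A[:k, k:]
    return np.trace((B2 @ B2.T + 2 * Y2 @ A12) @ np.diag(sigma[k:]))

# balanced example from the canonical form (7.25)
A = np.array([[-0.25, -1/3], [-1/3, -0.5]])
B = np.array([[1.0], [1.0]])
C = np.array([[1.0, 1.0]])
val = h2_error_sq(A, B, C, 1)
```

For this example a hand computation of the error-system gramian gives ‖Σe‖²_{H2} = 11/19, which the formula reproduces.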
Proof of the corollary. First we notice that the shifted quantity Ỹ = [ Y1 − Σ1 ; Y2 ] satisfies the Sylvester equation

   A* Ỹ + Ỹ A11 = [ 0 ; Σ2 A21 ].

Thus, using the expression (6.14) derived for the solution of the Sylvester equation, and multiplying on the left by ( 0  A12 Σ2 ), we obtain

   A12 Σ2 Y2 = Σ_{i=1}^{k} ( 0  A12 Σ2 ) (μi I + A)^{-1} [ 0 ; Σ2 A21 ] w̄i wi*,

where μi and wi are the eigenvalues and left eigenvectors of the A-matrix of the reduced-order system, A11. Clearly, each term in the above sum is bounded from above by the H∞ norm of the auxiliary system Σaux; hence the sum is bounded by k times the same norm. We thus obtain the upper bound (7.10) for the H2 norm from (7.9).

Remark 7.2.3. If we do not assume that the gramian Q is diagonal, then instead of the expression above we obtain the following expression for the H2 error:

   ‖Σe‖²_{H2} = trace ( ( B1* B2* B1* ) [ Q11 Q12 −Y1 ; Q12* Q22 −Y2 ; −Y1* −Y2* Q̂ ] [ B1 ; B2 ; B1 ] ),   (7.11)

where A11* Q̂ + Q̂ A11 + C1* C1 = 0. Clearly, (7.11) reduces to (7.9) when the gramian is block diagonal, i.e., Q12 = 0, Q21 = 0 and Q̂ = Σ1.
Using manipulations similar to those above, it can be shown that the error then reduces to the following expression:

   ‖Σe‖²_{H2} = trace [ B2* Q22 B2 ] + trace [ B1* (Q̂ − Q11) B1 ] + 2 trace [ A12 (Q12* Y1 + Q22 Y2) ].

Interpretation. The first term in the above expression is the H2 norm of the neglected subsystem of the original system; the second term is the difference between the H2 norms of the reduced-order system and the dominant subsystem of the original system; finally, the third term is twice the trace of the inner product of the second block row of the observability gramian Q with Y, weighted by the block off-diagonal entry A12 of A. Notice that Q̂ − Q11 satisfies the Sylvester equation

   A11* (Q̂ − Q11) + (Q̂ − Q11) A11 = Q12 A21 + A21* Q12*,

which implies that if either the gramian Q or A has small (zero) off-diagonal entries, then Q̂ − Q11 will be small (zero).

7.3 Numerical issues: four algorithms

Model reduction by balanced truncation, as discussed above, requires balancing the whole system Σ, followed by truncation. This approach may turn out to be numerically inefficient and ill-conditioned, especially for large-scale problems. The reason is that in many cases the eigenvalues of the gramians P, Q, as well as the singular values σi(Σ), decay rapidly (see section 9.4 for details), so that P and Q often have numerically low rank compared to n. In this section we list several algorithms for balancing and balanced truncation which, although identical in theory, in practice yield quite different numerical properties. For a more in-depth account we refer to [54], [340]. In particular, it is important to avoid formulas involving matrix inverses.

The gramians of a reachable, observable and stable system Σ = (A, B, C) of dimension n are n × n positive definite matrices, which we have denoted by P, Q. Consequently they have Cholesky factorizations

   P = U U*,   Q = L L*,   where U is upper triangular and L is lower triangular.   (7.12)
The eigenvalue decomposition of U* Q U produces the orthogonal matrix K and the diagonal matrix Σ, which is composed of the Hankel singular values of Σ (this follows from the fact that U* Q U is similar to P Q):

   U* Q U = K Σ² K*.   (7.13)

The SVD of the product U* L produces the orthogonal matrices W and V:

   U* L = W Σ V*.   (7.14)

Thus the Hankel singular values are also the singular values of the product of the triangular matrices U* and L. Finally, we need the QR factorizations of the products U W and L V:

   U W = X Φ,   L V = Y Ψ,   (7.15)

where X, Y are orthogonal and Φ, Ψ are upper triangular. We are now ready to derive various balancing transformations, as well as projections which produce systems obtained from Σ by balanced truncation.

1. The first transformation and its inverse are

   T = Σ^{1/2} K* U^{-1}   and   T^{-1} = U K Σ^{-1/2}.

2. The second transformation follows from (7.14):

   T = Σ^{-1/2} V* L*   and   T^{-1} = U W Σ^{-1/2}.

It is readily checked that Ti = T^{-1} is indeed the inverse of T, and that

   T P T* = Σ^{-1/2} V* L* U U* L V Σ^{-1/2} = Σ,   T^{-*} Q T^{-1} = Σ^{-1/2} W* U* L L* U W Σ^{-1/2} = Σ.

Therefore the transformed system (T A Ti, T B, C Ti) is balanced and equal to the one produced by the first transformation.

3. The third transformation is derived from the previous one by dropping the factor Σ^{-1/2}:

   T3 = V* L*   and   T3i = U W,   which satisfy T3 T3i = T3i T3 = Σ.

Removing the factor Σ^{-1/2} may in some cases make the transformation better conditioned. The resulting system (A3, B3, C3, D) = (T3 A T3i, T3 B, C T3i, D) is balanced only up to diagonal scaling, and is no longer the same as the realization above; however, if we scale by Σ^{-1/2}, the system

   ( Σ^{-1/2} A3 Σ^{-1/2}, Σ^{-1/2} B3, C3 Σ^{-1/2}, D )   (7.18)

is again balanced. To summarize: from the gramians P, Q we generate the orthogonal matrices K, W, V, X, Y and the diagonal matrix Σ.
Square root algorithm. Partition the matrices entering the SVD (7.14) conformally:

   W = [ W1 W2 ],   V = [ V1 V2 ],   Σ = diag (Σ1, Σ2),

where W1, V1 have k columns and Σ1 ∈ R^{k×k}. The transformations

   T1 = Σ1^{-1/2} V1* L* ∈ R^{k×n}   and   Ti1 = U W1 Σ1^{-1/2} ∈ R^{n×k}   (7.16)

satisfy T1 Ti1 = Ik, and applied to Σ they yield the reduced-order system of dimension k obtained from Σ by balanced truncation:

   Σ1 = [ T1 A Ti1 , T1 B ; C Ti1 , D ].   (7.17)
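The square root algorithm above can be sketched in a few lines. The following Python fragment (function name ours; for simplicity the Cholesky factors are computed from explicitly formed gramians, rather than directly as discussed below) builds the projections (7.16) and the reduced system (7.17):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def square_root_bt(A, B, C, k):
    """Square root algorithm (7.16)-(7.17): balanced truncation via
    Cholesky factors of the gramians and the SVD U^T L = W Sigma V^T."""
    P = solve_continuous_lyapunov(A, -B @ B.T)
    Q = solve_continuous_lyapunov(A.T, -C.T @ C)
    U = np.linalg.cholesky(P)                  # P = U U^T
    L = np.linalg.cholesky(Q)                  # Q = L L^T
    W, sigma, Vt = np.linalg.svd(U.T @ L)      # HSVs = singular values of U^T L
    s_inv_sqrt = np.diag(sigma[:k] ** -0.5)
    T1  = s_inv_sqrt @ Vt[:k, :] @ L.T         # T1  = Sigma_1^{-1/2} V_1^T L^T
    Ti1 = U @ W[:, :k] @ s_inv_sqrt            # Ti1 = U W_1 Sigma_1^{-1/2}
    return T1 @ A @ Ti1, T1 @ B, C @ Ti1, sigma
```

Since T1 Ti1 = Ik, the reduced triple coincides with the balanced-truncated one; its gramians equal diag(σ1, · · · , σk).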
4. The fourth transformation is obtained from the QR factorizations (7.15): T4 = (Y* X)^{-1} Y* and T4i = X. The resulting system

   [ T4 A T4i , T4 B ; C T4i , D ]   (7.20)

is balanced up to an upper triangular matrix.

Balancing-free square root algorithm. Here the Cholesky factors are combined with truncated QR factorizations. We truncate the QR factorizations of U W1 and L V1:

   U W1 = Xk Φk,  Xk ∈ R^{n×k},   L V1 = Yk Ψk,  Yk ∈ R^{n×k},   (7.21)

where Xk* Xk = Ik and Yk* Yk = Ik, while Φk, Ψk are the k × k leading submatrices of Φ, Ψ, respectively, and hence upper triangular. We thus obtain the transformations

   T̂1 = (Yk* Xk)^{-1} Yk* ∈ R^{k×n}   and   T̂i1 = Xk ∈ R^{n×k}.

Then the system

   Σ̂1 = [ T̂1 A T̂i1 , T̂1 B ; C T̂i1 , D ]   (7.22)

is similar to Σ1; therefore the truncated system is similar to the system obtained by balanced truncation. This method was introduced by Varga [340]; see also the paper of Safonov and Chiang [283].

Remark 7.3.1. (a) Summarizing, there are four different ways of computing a balanced realization, and consequently a reduced system obtained by balanced truncation. In general the transformations have different condition numbers: T1 has the largest condition number, and the triangular factor K = Σ^{1/2} Φ^{-1} can be ill conditioned due to Σ^{1/2}; the transformations T3 and T4 have almost the same condition number, which is in general much lower than those of the first two transformations.

(b) As shown in section 6.3, the Cholesky factors U and L of the gramians can be computed directly, without first computing the gramians P, Q and subsequently computing their Cholesky decompositions. This method for solving the Lyapunov equation was first proposed by Hammarling [164].

(c) The number of operations required to obtain a balanced realization is O(n³), where n is the dimension of the system under investigation, while the storage required is O(n²) (the storage of n × n matrices is required).
In the above, X ∈ R^{n×n} and Y ∈ R^{n×n} are the unitary matrices and Φ, Ψ the upper triangular matrices of the QR factorizations U W = X Φ, L V = Y Ψ of (7.15). The goal of the balancing-free approach is to transform the system by means of orthogonal transformations in such a way that the result is similar to a balanced system up to a triangular transformation. Indeed, the truncated system Σ̂1 of (7.22) is balanced up to an upper triangular matrix: the similarity transformation is K = Σ1^{1/2} Φk^{-1},

   [ K 0 ; 0 I ] [ Â1 B̂1 ; Ĉ1 D ] [ K^{-1} 0 ; 0 I ] = [ A1 B1 ; C1 D ],

where Σ1 = (A1, B1, C1, D) is the system obtained by balanced truncation. In the transformed bases, if we partition Σ as in (7.3), we obtain the right projectors, denoted
by T̂j, and the associated left projectors T̂ji, j = 1, 2, 3, 4.

Summary of balancing transformations

The table below summarizes the four balancing transformations discussed earlier, together with the factorizations on which they are based and the corresponding truncated projectors.

Factorizations: P = U U*, Q = L L* (U upper triangular, L lower triangular); U* Q U = K Σ² K* (K unitary); U* L = W Σ V* (W, V unitary); U W = X Φ, L V = Y Ψ (X, Y unitary; Φ, Ψ upper triangular).

1. T1 = Σ^{1/2} K* U^{-1}, T1i = U K Σ^{-1/2};  truncated: T̂1 = Σ1^{1/2} K1* U^{-1}, T̂1i = U K1 Σ1^{-1/2}.
2. T2 = Σ^{-1/2} V* L*, T2i = U W Σ^{-1/2};  truncated: T̂2 = Σ1^{-1/2} V1* L*, T̂2i = U W1 Σ1^{-1/2}.
3. T3 = V* L*, T3i = U W;  truncated: T̂3 = V1* L*, T̂3i = U W1.
4. T4 = (Y* X)^{-1} Y*, T4i = X;  truncated: T̂4 = (Y1* X1)^{-1} Y1*, T̂4i = X1.

The first is the usual balance-and-truncate transformation; the second is the square root balance and truncate;
the third is the up-to-diagonal-scaling balance and truncate; and the fourth is the square root balancing-free transformation.

An example

We conclude this section with numerical experiments performed on continuous-time low-pass Butterworth filters; in MATLAB these can be obtained as [A,B,C,D] = butter(n,1,'s') (cutoff frequency normalized to 1 rad/s). The order n of the filter varies, n = 1 : 30. Our goal is to compare the first three balancing transformations Tj, j = 1, 2, 3, and the corresponding balanced triples (Cj, Aj, Bj). The errors that we look at are the 2-norms of the differences between the identity and the product of each transformation with its inverse, and the norm of the left-hand side of the observability Lyapunov equation:

   E(1)j = norm (In − Tj Tji),   E(2)j = norm (Aj* Σ + Σ Aj + Cj* Cj),   j = 1, 2, 3.

All these quantities are theoretically zero. Their actual values in floating-point arithmetic are plotted, as a function of the order n of the filter, in figure 7.1: the first set in the left plot, the second in the right plot. The Hankel singular values for n = 30 have condition number close to 10^{19}, and the first 6 are nearly equal to 1; these singular values were computed by means of the square root algorithm. A similar example is explored in problem 47.

7.4 A canonical form for continuous-time balanced systems

In the sequel we present a canonical form for systems which are continuous-time, reachable, observable, stable and balanced. We will only deal with the single-input single-output case; for the general case we refer to the original source [248] and references therein.
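The Butterworth experiment can be reproduced in Python (a sketch, not the original MATLAB code: `scipy.signal.butter` with `analog=True` plays the role of `butter(n,1,'s')`, and the cutoff of 1 rad/s is our normalization; moderate orders are used, since for large n the gramians become numerically singular, which is precisely the ill-conditioning discussed above):

```python
import numpy as np
from scipy import signal
from scipy.linalg import solve_continuous_lyapunov

def butter_hsv(n):
    """Hankel singular values of an order-n analog low-pass Butterworth
    filter (cutoff 1 rad/s), computed via the square root algorithm."""
    b, a = signal.butter(n, 1.0, analog=True)   # analog of butter(n,1,'s')
    A, B, C, D = signal.tf2ss(b, a)
    P = solve_continuous_lyapunov(A, -B @ B.T)
    Q = solve_continuous_lyapunov(A.T, -C.T @ C)
    U = np.linalg.cholesky(P)
    L = np.linalg.cholesky(Q)
    return np.linalg.svd(U.T @ L, compute_uv=False)   # HSVs = sv of U^T L
```

As the order grows, the ratio σ1/σn grows rapidly, illustrating the fast decay of the Hankel singular values reported in the text.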
Figure 7.1. Balancing transformations for Butterworth filters of order 1–30. Left plot: errors E(1)j = ‖In − Tj Tji‖ of the three balancing transformations. Right plot: errors E(2)j of the three observability Lyapunov equations. (Both plotted on a logarithmic scale versus the filter dimension.)

Let Σ = (A, B, C) be stable, reachable, observable and balanced, with Hankel singular values as given by (5.22) on page 116:

   Σ = diag ( σ1 I_{m1}, σ2 I_{m2}, · · · , σq I_{mq} ),   σj > σ_{j+1} > 0,

where mj is the multiplicity of σj. Consequently, the following Lyapunov equations are satisfied:

   A Σ + Σ A* + B B* = 0,   (7.23)
   Σ A + A* Σ + C* C = 0.   (7.24)

Before stating the result valid for the general case, we examine two special cases. The first is when the multiplicity of all singular values is equal to one, i.e., the singular values are distinct. Let the entries of B be denoted by γi: B = (γ1, · · · , γn)*. Notice that, due to the reachability of (A, B) and to the observability of (C, A), the γi must be different from zero. It follows from (7.23) that the entries of A can be chosen as

   a_ij = − γi γj / (si sj σi + σj),   si = ±1,

and from (7.24) that C = (s1 γ1 · · · sn γn); in other words, the si are signs associated with the singular values σi. Summarizing, the triple (C, A, B) may be assumed to have the following form:

   [ A B ; C ] =
   [ −γ1²/(2σ1)  −γ1γ2/(s1 s2 σ1 + σ2)  · · ·  −γ1γn/(s1 sn σ1 + σn)  γ1 ;
     −γ2γ1/(s2 s1 σ2 + σ1)  −γ2²/(2σ2)  · · ·  −γ2γn/(s2 sn σ2 + σn)  γ2 ;
     · · · ;
     −γnγ1/(sn s1 σn + σ1)  −γnγ2/(sn s2 σn + σ2)  · · ·  −γn²/(2σn)  γn ;
     s1γ1  s2γ2  · · ·  snγn ].   (7.25)
As we will see later. · · · .24) is BB∗ = diag (γ. . γ > 0. For simplicity we denote this single singular value σ . · · · .e. . . . . 0 0 (7. mi = 0. To further simplify this form we are allowed to use orthogonal transformations U of the type U = diag (1..29) are σi > 0. αi.mi −1 0 0 γi 0 0 . . . . 0. . . i = 1. . i > 1.1 0 = 0 0 si γi 2 −γi 2σi Let there be q distinct singular values each of A1 q A2 n . · · · . . . . Aq 1 Aq 2 · · · C1 C2 · · · The following hold true: −α i. 0 0 0 0 αi. where U2 is an arbitrary orthogonal matrix of size n − 1. mi ≥ 0. Aqq Cq B1 B2 .26) . . . .i i 188 Chapter 7. Bq (7. 0) + L 2σ where L is an arbitrary skew symmetric matrix.. . mi (7. these systems are precisely the allpass systems for an appropriate choice of D.. m1 = n.28) Aij Cj Bi = −γi γj si sj σi +σj 0 . γi > 0. . . · · · .27). The general case is a combination of the above two special cases. i. . U2 ). . · · · . The ﬁnal form in this case is −γ 2 α1 0 ··· 0 0 γ 2σ −α1 0 0 α2 0 0 0 −α2 0 0 0 0 A B .27) Aii Ci Bi αi. i = 1. . 0 0 γi 0 . 0. . γ > 0 −1 A= diag (γ. 0 ··· 0 ··· (7. Following corollary 7. C .2 0 0 0 0 ··· 0 0 0 . . = (7. . . . 0 0 0 . 2004 i i .28) and (7. . C 0 0 0 ··· 0 αn−1 0 0 0 0 · · · −αn−1 0 0 sγ 0 0 ··· 0 0 where σ > 0.2 . . 0 sj γj 0 ··· . ··· 0 · · · −αi. q.3 the balanced representation is unique up to orthogonal transformation in the state space.. 0 0 . . .29) The parameters which enter the forms described by (7. . j = 1. αi > 0.30) i i ACA : April 14. It follows that one solution of (7.23) and (7. . A11 A12 · · · A21 A22 · · · A B = . (7. q q i=1 mi = n αij > 0. si = ±1. 0) = C∗ C.mi −1 ··· 0 0 ··· 0 0 . . multiplicity mi . Balancing and balanced approximations i “oa” 2004/4/14 page 188 i The second special case is that of one singular value of multiplicity n. s = ±1.1 0 −αi.
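The distinct-singular-value form (7.25) is easy to check numerically. The sketch below, with arbitrarily chosen parameters σi, γi, si (our own numbers, not from the text), builds the triple (C, A, B) and verifies the two Lyapunov equations (7.23) and (7.24):

```python
import numpy as np

# hypothetical parameters: distinct sigma_i > 0, gamma_i > 0, signs s_i = +/-1
sigma = np.array([2.0, 1.0, 0.5])
gamma = np.array([1.0, 0.7, 0.3])
s = np.array([1.0, -1.0, 1.0])
n = len(sigma)

# a_ij = -gamma_i gamma_j / (s_i s_j sigma_i + sigma_j); diagonal -gamma_i^2/(2 sigma_i)
A = np.array([[-gamma[i] * gamma[j] / (s[i] * s[j] * sigma[i] + sigma[j])
               for j in range(n)] for i in range(n)])
B = gamma.reshape(-1, 1)
C = (s * gamma).reshape(1, -1)
Sigma = np.diag(sigma)

r1 = A @ Sigma + Sigma @ A.T + B @ B.T   # (7.23) residual
r2 = Sigma @ A + A.T @ Sigma + C.T @ C   # (7.24) residual
```

Both residuals vanish, and since Σ > 0 solves the reachability Lyapunov equation, A is automatically stable.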
We are now ready to state the main result of this section.

Theorem 7.11. There is a one-to-one correspondence between the family of stable, reachable, observable, single-input single-output, continuous-time systems with Hankel singular values σi of multiplicity mi, and the family of systems parametrized by (7.27)–(7.30).

It thus becomes clear under what conditions the truncated system is minimal.

Remark. (a) It is interesting to notice that a class of stable and minimal systems is parametrized in terms of the positive parameters σi, mi, γi, αij and the sign parameters si. Recall that the parametrization of minimal (let alone stable) systems in terms of other canonical forms is complicated. (b) Using the parametrization of balanced systems derived above, one can obtain an alternative proof of part 2 of the balanced truncation theorem; moreover, the canonical form holds even if the condition given in part 2 of that theorem is not satisfied.

Example 7.12. A third-order discrete-time FIR system described by the transfer function

H(z) = (z² + 1)/z³

is considered. We first investigate approximation by balanced truncation, first in discrete time and then in continuous time (using the bilinear transformation of section 4.3), and compare the results; Hankel norm approximation for the same system is investigated in chapter 8. A balanced realization of this system is given by

Σd = ( F  G )   (  0    α    0  |  β )
     ( H  J ) = (  α    0    α  |  0 )
                (  0   −α    0  |  γ )
                (  β    0   −γ  |  0 )

where α = 5^(−1/4), β = √((3√5+5)/10), γ = √((3√5−5)/10), and the reachability and observability gramians are equal and diagonal:

P = Q = Σ = diag(σ1, σ2, σ3),   σ1 = (√5+1)/2,  σ2 = 1,  σ3 = (√5−1)/2.

The second- and first-order balanced truncated systems are

Σd,2 = ( F2 G2 )   ( 0  α | β )        Σd,1 = ( F1 G1 )   ( 0 | β )
       ( H2 J2 ) = ( α  0 | 0 ),              ( H1 J1 ) = ( β | 0 ).
                   ( β  0 | 0 )

Notice that Σd,2 is balanced, but has singular values which are different from σ1, σ2; Σd,1 is likewise balanced, but its gramians are not equal to σ1.

Let Σc denote the continuous-time system obtained from Σd by means of the bilinear transformation described in section 4.3:

Σc = ( A  B )   ( −1−2α²    2α    −2α²   |  δ+  )
     ( C  D ) = (   2α      −1     2α    | −√2  )
                (   2α²    −2α   −1+2α²  |  δ−  )
                (   δ+    −√2    −δ−     |  −2  )

where δ+ = √2(α+β) and δ− = √2(γ−α); Σc is again balanced with gramian Σ. We now compute first- and second-order reduced systems Σc,1 and Σc,2 by truncating Σc:

Σc,2 = ( A2 B2 )   ( −1−2α²   2α  |  δ+ )        Σc,1 = ( −1−2α² | δ+ )
       ( C2 D2 ) = (   2α     −1  | −√2 ),              (   δ+   | −2 ).
                   (   δ+    −√2  |  −2 )
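A quick cross-check of the Hankel singular values claimed in the example: for an FIR system they are the singular values of the finite Hankel matrix built from the impulse response of H(z) = z^(−1) + z^(−3) (the code and variable names are ours):

```python
import numpy as np

# impulse response of H(z) = (z^2+1)/z^3: h_1 = 1, h_2 = 0, h_3 = 1, h_k = 0 after
Hk = np.array([[1.0, 0.0, 1.0],
               [0.0, 1.0, 0.0],
               [1.0, 0.0, 0.0]])            # Hankel matrix [h_{i+j-1}]

hsv = np.linalg.svd(Hk, compute_uv=False)   # Hankel singular values
golden = (np.sqrt(5) + 1) / 2
```

The three values come out as (√5+1)/2, 1 and (√5−1)/2, as stated.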
The numerical values of the quantities above are: σ1 = 1.618034, σ2 = 1, σ3 = 0.618034, α = 0.668740, β = 1.082040, γ = 0.413304, δ+ = 2.475977, δ− = −0.361241.

Let Σ̄d,2 and Σ̄d,1 be the discrete-time systems obtained by transforming Σc,2 and Σc,1 back to discrete time. The conclusion is that Σ̄d,2 and Σ̄d,1 are balanced but different from Σd,2 and Σd,1: since the bilinear transformation preserves the gramians, Σ̄d,2 and Σ̄d,1 have singular values σ1, σ2 and σ1, respectively. The singular values of the directly truncated system Σd,2, on the other hand, are 5β²/4 and √5·β²/4, which satisfy the interlacing inequalities

σ3 < √5·β²/4 < σ2 < 5β²/4 < σ1,

while the singular value of Σd,1 is β², which satisfies σ2 < β² < σ1.

7.5 Other types of balancing*

In general terms, balancing consists of the simultaneous diagonalization of two appropriately chosen positive definite matrices; this problem has been studied in linear algebra, see, e.g., [78]. Depending on the method, these positive definite matrices are solutions either of Lyapunov equations or of Riccati equations. Here is a list of four types of balancing; in each case σi denotes the corresponding singular values, and the reduced model keeps the first r of them.

– Lyapunov balancing. Equations: AP + PA∗ + BB∗ = 0, A∗Q + QA + C∗C = 0. Error bound: ‖H − Hr‖H∞ ≤ 2 ∑_{i=r+1}^n σi.

– Stochastic balancing. Equations: AP + PA∗ + BB∗ = 0 with B̂ = PC∗ + BD∗, and A∗Q + QA + (C − B̂∗Q)∗(DD∗)^(−1)(C − B̂∗Q) = 0. Error bound: ‖H^(−1)(H − Hr)‖H∞ ≤ ∏_{i=r+1}^n (1+σi)/(1−σi) − 1.

– Bounded real balancing. Equations: A∗Q + QA + C∗C + (QB + C∗D)(I − D∗D)^(−1)(QB + C∗D)∗ = 0, AP + PA∗ + BB∗ + (PC∗ + BD∗)(I − DD∗)^(−1)(PC∗ + BD∗)∗ = 0. Error bound: ‖H − Hr‖H∞ ≤ 2 ∑_{i=r+1}^n σi.

– Positive real balancing. Equations: AP + PA∗ + (PC∗ − B)(D + D∗)^(−1)(PC∗ − B)∗ = 0, A∗Q + QA + (QB − C∗)(D∗ + D)^(−1)(QB − C∗)∗ = 0. Error bound: ‖(H + D∗)^(−1) − (Hr + D∗)^(−1)‖H∞ ≤ 2 ‖(D + D∗)^(−1)‖ ∑_{i=r+1}^n σi.

7.5.1 Lyapunov balancing*

This is the method that was discussed in detail in the preceding sections: the solutions of the two Lyapunov equations are simultaneously diagonalized. Reduced models, obtained by simple truncation, preserve stability and satisfy an H∞ error bound.

7.5.2 Stochastic balancing*

The starting point is a system Σ = (A, B, C, D) which satisfies the following properties: (i) it is square, m = p; (ii) it is stable (all eigenvalues of A are in the left half of the complex plane); and (iii) D is nonsingular. If H(s) = C(sI − A)^(−1)B + D is the transfer function of Σ, we denote by Φ its power spectrum and by W(s) a minimum phase
right spectral factor; furthermore the phase system, whose transfer function is Z(s), is introduced. These quantities satisfy

Φ(s) = H(s)H∗(−s) = W∗(−s)W(s) = Z(s) + Z∗(−s).

Let P be the usual reachability gramian of Σ, satisfying AP + PA∗ + BB∗ = 0; define B̄ = PC∗ + BD∗, and let Q be a stabilizing solution of the following Riccati equation:

A∗Q + QA + (C − B̄∗Q)∗(DD∗)^(−1)(C − B̄∗Q) = 0.

It can be shown that realizations of W and Z are given in terms of the two gramians P and Q as follows:

ΣW = ( Ā B̄ )   ( A                 PC∗ + BD∗ )        ΣZ = ( Â B̂ )   ( A   PC∗ + BD∗ )
     ( C̄ D̄ ) = ( D^(−1)(C − B̄∗Q)   D∗        ),            ( Ĉ D̂ ) = ( C   DD∗/2     ).

By construction, the eigenvalues of the product PQ are the squares of the Hankel singular values σi of the auxiliary system

( A               B )
( (PC∗ + BD∗)∗    0 ).

This method proceeds by computing a balancing transformation T for this auxiliary system, which is applied to the original system Σ; reduction by truncation follows as usual. If we let the neglected distinct singular values be σi, i = k+1, ..., n, the following relative error bound holds:

σk+1 ≤ ‖H(s)^(−1)[H(s) − Hk(s)]‖H∞ ≤ ∏_{i=k+1}^n (1+σi)/(1−σi) − 1,

where only the distinct singular values enter the formula. It is worth noting that σi ≤ 1, and the number of σi which are equal to one is equal to the number of right-half-plane (i.e. unstable) zeros of Σ. Furthermore the H∞ norm of the relative error Hk(s)^(−1)[H(s) − Hk(s)] satisfies this same bound.

It is pointed out in chapter 2 of the Obinata and Anderson book [252] that the multiplicative error measure above focuses on Bode diagram errors (errors in log-magnitudes and phases). This follows from the following considerations. In the SISO case let

Δ = (H − Hk)/H

be the relative error. Then Hk/H = 1 − Δ, which implies

ln(Hk/H) = ln(1 − Δ) = −(Δ + Δ²/2 + Δ³/3 + ···).

By neglecting higher-order terms we have ln(Hk/H) ≈ −Δ. Thus, assuming that Δ is small, from ln(Hk/H) = ln|Hk/H| + i·arg(Hk/H) it follows that

| ln|Hk/H| | ≤ |Δ|   and   | arg(Hk/H) | ≤ |Δ|.

Transforming these relationships to base-10 logarithms we get

| 20 log10 |Hk/H| | = 20 log10 e · | ln|Hk/H| | ≤ 8.69 |Δ| dB,

and similarly |arg(H) − arg(Hk)| ≤ |Δ| radians. In other words, the Bode plot (both amplitude and phase) of the relative approximant Hk/H is bounded, up to a constant, by the multiplicative bound given above.
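The two kinds of bounds above are easy to evaluate. The following fragment (with hypothetical singular values of our own) computes the additive bound 2∑σi, the multiplicative bound ∏(1+σi)/(1−σi) − 1 in which repeated singular values enter only once, and the 20·log10(e) ≈ 8.69 dB-per-unit factor used in the Bode-error argument:

```python
import numpy as np

# hypothetical singular values; sigma_3 = sigma_4 shows that only distinct
# neglected values enter the multiplicative bound
sigma = np.array([0.9, 0.5, 0.2, 0.2, 0.01])
k = 2                                   # order of the reduced model
neglected = sigma[k:]
distinct = np.unique(neglected)

abs_bound = 2 * neglected.sum()                             # 2 * sum of neglected sigma_i
rel_bound = np.prod((1 + distinct) / (1 - distinct)) - 1    # prod (1+s)/(1-s) - 1

db_factor = 20 * np.log10(np.e)         # ~8.69 dB of magnitude error per unit of |Delta|
```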
It is of interest to discuss the special case when Σ is minimum phase. The first consequence is that, with Q the usual observability gramian of Σ, the difference R = Q^(−1) − P is nonsingular and satisfies the Lyapunov equation

(A − BD^(−1)C)R + R(A − BD^(−1)C)∗ + (D^(−1)C)∗(D^(−1)C) = 0.

Thus R is the observability gramian of the inverse system, and no Riccati equation needs to be solved. Zhou [369] showed that for minimum phase systems, stochastic balanced truncation is the same as self-weighted balanced truncation where only an input weighting exists, given by H^(−1); this issue is discussed in more detail in section 7.6.1. As a consequence of these facts, the stochastic singular values νi are related to the singular values σi of this self-weighted problem by

σi = νi / √(1 + νi²).

Since νi < 1, we have σi < 2^(−1/2). Moreover, in this case the minimum phase property is preserved by balanced stochastic truncation.

Stochastic balanced truncation can be applied to all asymptotically stable dynamical systems which are square and nonsingular. Stochastic balancing was introduced by Desai and Pal [94]; the relative error bound is due to Green [146, 147]. In the former reference it is stated that stochastic balanced truncation yields a uniformly good approximant over the whole frequency range, rather than yielding small absolute errors. For the application to singular systems, see [342] and [139]; for numerical algorithms we refer to the work of Benner [53]. For a recent overview see the book of Obinata and Anderson [252].

7.5.3 Bounded real balancing*

The asymptotically stable system Σ is bounded real if its H∞ norm is no greater than one, that is, its transfer function H satisfies

I − H∗(−iω)H(iω) ≥ 0,  ω ∈ R.

It is called strictly bounded real if this inequality is strict; in the sequel we will, for simplicity, be concerned with strictly bounded real systems only. Bounded real systems were discussed in detail in chapter 5; recall in particular the bounded real lemma, which implies that (strict) bounded realness is equivalent to the existence of a positive definite solution Y = Y∗ > 0 of the Riccati equation

A∗Y + YA + C∗C + (YB + C∗D)(I − D∗D)^(−1)(YB + C∗D)∗ = 0   (7.31)

such that A + B(I − D∗D)^(−1)(B∗Y + D∗C) is asymptotically stable. Any solution Y lies between two extremal solutions, 0 < Ymin ≤ Y ≤ Ymax, and Ymin is the unique solution of (7.31) for which the above matrix is asymptotically stable. Positive definiteness of a solution Z of the dual Riccati equation

AZ + ZA∗ + BB∗ + (ZC∗ + BD∗)(I − DD∗)^(−1)(ZC∗ + BD∗)∗ = 0   (7.32)

is likewise equivalent to bounded realness; again any solution lies between two extremal solutions, 0 < Zmin ≤ Z ≤ Zmax. Equations (7.31) and (7.32) are the bounded real Riccati equations of the system Σ. If Y solves (7.31), then Z = Y^(−1) solves (7.32), and conversely; hence Zmin = Y^(−1)max and Zmax = Y^(−1)min.

Bounded real balancing is obtained by the simultaneous diagonalization of Ymin and Zmin. The realization of a bounded real system Σ is called bounded real balanced if

Ymin = Zmin = Y^(−1)max = Z^(−1)max = diag(ξ1 Il1, ..., ξq Ilq),

where 1 ≥ ξ1 > ξ2 > ··· > ξq > 0 and the li are the multiplicities, l1 + ··· + lq = n. We will call ξi the bounded real singular values of Σ. Truncation now follows as usual after transformation to the balanced basis, by eliminating the states which correspond to small bounded real singular values.

There is also a canonical form associated with bounded real balancing. In the SISO case, assuming distinct bounded real singular values and with (B)i = γi > 0, (C)j = sjγj, D = d, si = ±1, the entries of A are

(A)ij = −γiγj (1 + sisjξiξj + d(siξi + sjξj)) / ((1 − d²)(sisjξi + ξj)).

Let the reduced order model Σr = (A11, B1, C1, D) be obtained by bounded real balanced truncation, and let W(s) and V(s) be stable and minimum phase spectral factors, W∗(s)W(s) =
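For a first-order illustration of the bounded real Riccati equations (7.31)–(7.32) (a toy example of ours, not from the text), take H(s) = 1/(s+2), which is strictly bounded real since ‖H‖H∞ = 1/2. Both Riccati equations reduce to the same scalar quadratic, and the relation Zmin = Y^(−1)max can be checked directly:

```python
import numpy as np

# H(s) = 1/(s+2): A = -2, B = C = 1, D = 0.
# (7.31) collapses to the scalar quadratic Y^2 - 4Y + 1 = 0; the dual (7.32) is identical.
Y = np.sort(np.roots([1.0, -4.0, 1.0]))
Ymin, Ymax = Y[0], Y[1]
Zmin, Zmax = Ymin, Ymax                 # dual equation has the same solutions here

# the stabilizing solution: A + B(I - D*D)^{-1}(B*Y + D*C) = -2 + Y must be stable
closed_loop = -2 + Ymin
```

Ymin = 2 − √3 is the stabilizing solution, and (2 − √3)(2 + √3) = 1 confirms Zmin = Y^(−1)max; the bounded real singular value ξ1 = Ymin indeed satisfies ξ1 ≤ 1.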
I − G∗(s)G(s) and V(s)V∗(s) = I − G(s)G∗(s), where G denotes the transfer function of Σ. Define Wr(s) and Vr(s) similarly for Σr. The following hold true: Σr is asymptotically stable, minimal, bounded real balanced, and satisfies

max{ ‖H − Hr‖H∞, ‖W − Wr‖H∞, ‖V − Vr‖H∞ } ≤ 2 ∑_{i=k+1}^q ξi.   (7.33)

Thus if 2 ∑_{i=k+1}^q ξi is small, not only are Σ and Σr close, but the reduced spectral factors Wr and Vr are also guaranteed to be close to the full-order spectral factors W and V. Bounded real balancing and the above error bounds are due to Opdenacker and Jonckheere [251].

7.5.4 Positive real balancing*

An important class of linear dynamical systems are those whose transfer function is positive real. The definition and properties of such systems are discussed in section 5.9; in particular, corollary 5.24 on page 140 states that a necessary and sufficient condition for a system Σ to have this property is that the Riccati equation

A∗K + KA + (KB − C∗)(D + D∗)^(−1)(KB − C∗)∗ = 0   (PRARE)

have a positive (semi-) definite solution. The dual Riccati equation in this case is

AL + LA∗ + (LC∗ − B)(D + D∗)^(−1)(LC∗ − B)∗ = 0.

These are the positive real Riccati equations of the passive system Σ. The solutions K and L of these equations lie between two extremal solutions, 0 < Kmin ≤ K ≤ Kmax and 0 < Lmin ≤ L ≤ Lmax. If K = K∗ is a solution of the former, L = K^(−1) is a solution of the latter; hence Kmin = L^(−1)max and Kmax = L^(−1)min. A balancing transformation is obtained by the simultaneous diagonalization of the minimal solutions Kmin, Lmin:

Kmin = Lmin = K^(−1)max = L^(−1)max = diag(π1 Is1, ..., πq Isq) = Π,

where 1 ≥ π1 > π2 > ··· > πq > 0 and the block sizes are the multiplicities of the πi, which are termed the positive real singular values of Σ.

Proposition 7.13. Let the reduced order model Σr = (A11, B1, C1, D) be obtained by positive real balanced truncation, by keeping the first k positive real singular values. Then Σr is asymptotically stable, minimal and positive real balanced, and it satisfies the error bound [159]

‖(D∗ + H(s))^(−1) − (D∗ + Hr(s))^(−1)‖H∞ ≤ 2 ‖R‖² ∑_{i=k+1}^q πi,   where R² = (D + D∗)^(−1).

There is also a canonical form for the positive real balanced system. In the SISO case, with (B)i = γi > 0, (C)j = sjγj, D = d > 0, si = ±1, and distinct positive real singular values, the entries of A are

(A)ij = −γiγj (1 − siπi)(1 − sjπj) / (2d(sisjπi + πj)).

For details on the general case of this canonical form see Ober [249].
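Analogously to the bounded real case, the PRARE can be illustrated on a first-order toy example of ours: H(s) = 1 + 1/(s+1) is positive real since Re H(iω) = 1 + 1/(1+ω²) > 0, and the Riccati equation becomes a scalar quadratic:

```python
import numpy as np

# H(s) = 1 + 1/(s+1): A = -1, B = C = 1, D = 1, so D + D* = 2.
# PRARE: -2K + (K - 1)^2 / 2 = 0  <=>  K^2 - 6K + 1 = 0; the dual for L is identical.
K = np.sort(np.roots([1.0, -6.0, 1.0]))
Kmin, Kmax = K[0], K[1]
Lmin, Lmax = Kmin, Kmax
pi1 = Kmin                  # positive real singular value of this first-order system
```

Kmin = 3 − 2√2, and (3 − 2√2)(3 + 2√2) = 1 confirms the relation Kmin = L^(−1)max; the positive real singular value satisfies 0 < π1 < 1.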
Remark. The error bound of proposition 7.13 is equivalent to

‖[D∗ + H(s)]^(−1)[H(s) − Hr(s)][D∗ + Hr(s)]^(−1)‖H∞ ≤ 2 ‖R²‖ ∑_{i=k+1}^q πi,

that is, a frequency weighted bound for the error system Σ − Σr, where the input and the output weightings are (D∗ + H(s))^(−1) and (D∗ + Hr(s))^(−1), respectively.

Proof. We can assume that Σ is in the positive real balanced basis, K = L = Π, where Π1 = diag(π1 Im1, ..., πk Imk), Π2 = diag(πk+1 Imk+1, ..., πq Imq), Π = diag(Π1, Π2). Hence the following two Riccati equations hold:

AΠ + ΠA∗ + (ΠC∗ − B)(D + D∗)^(−1)(ΠC∗ − B)∗ = 0,
A∗Π + ΠA + (ΠB − C∗)(D + D∗)^(−1)(ΠB − C∗)∗ = 0.

With R² = (D + D∗)^(−1), these can be written as

(A − BR²C)Π + Π(A − BR²C)∗ + ΠC∗R²CΠ + BR²B∗ = 0,
(A − BR²C)∗Π + Π(A − BR²C) + ΠBR²B∗Π + C∗R²C = 0.

It follows that the system Σ̂ = (Â, B̂, Ĉ, 0), with Â = A − BR²C, B̂ = BR, Ĉ = RC, is bounded real balanced with bounded real gramian Π. Partition

Â = ( Â11 Â12 ; Â21 Â22 ),   B̂ = ( B̂1 ; B̂2 ),   Ĉ = ( Ĉ1 Ĉ2 )

conformally with Π = diag(Π1, Π2), and define the bounded real reduced system Σ̂r = (Â11, B̂1, Ĉ1, 0). A realization of Θ(s) and Θr(s) can be obtained as

Θ(s) = ( A − BR²C   −BR² )        Θr(s) = ( A11 − B1R²C1   −B1R² )
       ( R²C          0  ),                ( R²C1             0  ).

Noting that Θ(s) = R Ĥ(s)(−R), the bounded real error bound (7.33) applied to Σ̂ yields

‖Θ(s) − Θr(s)‖H∞ = ‖R (Ĥ(s) − Ĥr(s)) (−R)‖H∞ ≤ 2 ‖R‖² ∑_{i=k+1}^q πi.

Since (D∗ + H(s))^(−1) = Θ(s) + R² and (D∗ + Hr(s))^(−1) = Θr(s) + R², we conclude

‖(D∗ + H(s))^(−1) − (D∗ + Hr(s))^(−1)‖H∞ = ‖Θ(s) − Θr(s)‖H∞ ≤ 2 ‖R‖² ∑_{i=k+1}^q πi,

which is the desired inequality.

A modified positive real balancing method with an absolute error bound

We will now introduce a modified positive real balancing method for a subclass of positive real systems; for this subclass, we will derive an absolute error bound for the reduction method. Given the system Σ with positive real transfer function H, let Σ̃ be defined by means of its transfer function as

H̃(s) = (D∗ + H(s))^(−1) − ½ R².
In this case the positive real balanced truncation discussed above is applied to Σ̃. Let the positive real singular values of Σ̃ be Π̃ = diag(π̄1 Is1, ..., π̄q Isq), and let Σ̃r denote the intermediate reduced model obtained by keeping the first k of the π̄i. The final reduced order model Σ̄r is then obtained from Σ̃r by means of the equation

D̄r + H̄r(s) = (R²/2 + H̃r(s))^(−1).

Proposition 7.14. Given the positive real system Σ, let Σ̄r be obtained by the modified positive real balancing method discussed above. Then Σ̄r is asymptotically stable and positive real, and satisfies

‖H(s) − H̄r(s)‖H∞ ≤ 2 ‖R^(−1)‖² ∑_{i=k+1}^q π̄i.

Proof. Since Σ̃r is obtained from Σ̃ by positive real balanced truncation, proposition 7.13 yields

‖(D̃∗ + H̃(s))^(−1) − (D̃∗ + H̃r(s))^(−1)‖H∞ ≤ 2 ‖R̃‖² ∑_{i=k+1}^q π̄i,

where R̃² = (D̃ + D̃∗)^(−1) = R^(−2). By construction D̃ = ½R², so that D̃∗ + H̃(s) = (D∗ + H(s))^(−1), hence (D̃∗ + H̃(s))^(−1) = D∗ + H(s); similarly (D̃∗ + H̃r(s))^(−1) = D∗ + H̄r(s). This implies the desired inequality; asymptotic stability and positive realness follow by construction.

The assumption in this section is that H̃(s) is positive real. This condition is not always satisfied. For example, if H(s) = 1 + 1/(s+p), where p ≥ 0, then

H̃(s) = (2s + 2p − 1) / (4(2s + 2p + 1)),

and the above condition is satisfied for p > ½. A state-space representation of Σ̃ is easily computed as

Σ̃ = ( A − BR²C   −BR²  )
    ( R²C         R²/2 ).

7.6 Frequency weighted balanced truncation*

The balancing methods discussed above aim at approximating the system Σ over all frequencies. In many cases, however, a good approximation is required only in a specific frequency range. This leads to approximation by frequency weighted balanced truncation. Several methods have been proposed for frequency weighted model reduction; Varga and Anderson proposed in [346] the application of square-root and balancing-free square-root techniques for accuracy enhancement. The next section gives a brief overview of existing frequency weighted model reduction methods, and discusses how the square-root and balancing-free square-root techniques can be incorporated.

Given the system Σ, a system Σi which is the input weight, and a system Σo which is the output weight,

Σ = ( A B ; C D ),   Σi = ( Ai Bi ; Ci Di ),   Σo = ( Ao Bo ; Co Do ),

the problem is to compute a reduced order system Σr such that the weighted error

‖Ho(s)(H(s) − Hr(s))Hi(s)‖H∞

is small.

Gramians for weighted systems

Consider the system Σ together with the input and output weights Σi, Σo. The weights are incorporated by forming the cascaded systems Σ̂i and Σ̂o, whose transfer functions are Ĥi = H·Hi and Ĥo = Ho·H, respectively:

Σ̂i = ( Âi B̂i )   ( A    BCi  | BDi )        Σ̂o = ( Âo B̂o )   ( A    0   | B   )
     ( Ĉi D̂i ) = ( 0    Ai   | Bi  ),             ( Ĉo D̂o ) = ( BoC  Ao  | BoD )
                 ( C    DCi  | DDi )                          ( DoC  Co  | DoD )
Recall the definition of the system gramians in the frequency domain:

P = (1/2π) ∫_{−∞}^{∞} (iωI − A)^(−1) BB∗ (−iωI − A∗)^(−1) dω,   (4.51)
Q = (1/2π) ∫_{−∞}^{∞} (−iωI − A∗)^(−1) C∗C (iωI − A)^(−1) dω,   (4.52)

where (sI − A)^(−1)B is the input-to-state map and C(sI − A)^(−1) is the state-to-output map. Given that the input to the system has to be weighted, the weighted input-to-state map becomes (sI − A)^(−1)BHi(s), while the weighted state-to-output map becomes Ho(s)C(sI − A)^(−1). Consequently the reachability and observability weighted gramians are:

Pi = (1/2π) ∫_{−∞}^{∞} (iωI − A)^(−1) B Hi(iω)Hi∗(−iω) B∗ (−iωI − A∗)^(−1) dω,   (7.34)
Qo = (1/2π) ∫_{−∞}^{∞} (−iωI − A∗)^(−1) C∗ Ho∗(−iω)Ho(iω) C (iωI − A)^(−1) dω.   (7.35)

We can now find a balancing transformation T which simultaneously diagonalizes these two gramians:

T Pi T∗ = T^(−∗) Qo T^(−1) = Σ = diag(σ1, ..., σn).

Subsequently, order reduction takes place exactly as for balanced truncation. The question which arises is whether these weighted gramians can be obtained as solutions of Lyapunov equations. Let P̂i, Q̂o be the reachability and observability gramians of Σ̂i, Σ̂o respectively, that is,

Âi P̂i + P̂i Âi∗ + B̂i B̂i∗ = 0,   Âo∗ Q̂o + Q̂o Âo + Ĉo∗ Ĉo = 0,   (7.36)

and partition the gramians in two-by-two blocks, so that the dimension of the (1,1) block is the same as that of A, i.e. the order of the to-be-reduced system:

P̂i = ( P̂11 P̂12 ; P̂21 P̂22 ),   Q̂o = ( Q̂11 Q̂12 ; Q̂21 Q̂22 ).

Proposition 7.15. The frequency weighted reachability and observability gramians defined by (7.34), (7.35) are equal to the (1,1) blocks of the gramians obtained by solving the Lyapunov equations (7.36), respectively; that is,

Pi = P̂11,   Qo = Q̂11.   (7.37)

Proof. The proof follows by noticing that

(sI − A)^(−1) B Hi(s) = ( I  0 ) ( sI − A   −BCi ; 0   sI − Ai )^(−1) ( BDi ; Bi ) = ( I  0 ) (sI − Âi)^(−1) B̂i,

Ho(s) C (sI − A)^(−1) = ( DoC  Co ) ( sI − A   0 ; −BoC   sI − Ao )^(−1) ( I ; 0 ) = Ĉo (sI − Âo)^(−1) ( I ; 0 ).

Therefore the required gramians are obtained as the (1,1) blocks of P̂i and Q̂o.
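Proposition 7.15 can be checked on a small example (a first-order plant and weight of our own): the (1,1) block of the augmented gramian must equal the squared H2 norm of (sI − A)^(−1)BHi(s):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# toy data: plant H(s) = 1/(s+2), input weight Hi(s) = 1/(s+1)
A,  B,  C      = np.array([[-2.0]]), np.array([[1.0]]), np.array([[1.0]])
Ai, Bi, Ci, Di = np.array([[-1.0]]), np.array([[1.0]]), np.array([[1.0]]), np.array([[0.0]])

# augmented system realizing the weighted input-to-state map
Ahat = np.block([[A, B @ Ci], [np.zeros((1, 1)), Ai]])
Bhat = np.vstack([B @ Di, Bi])

Phat = solve_continuous_lyapunov(Ahat, -Bhat @ Bhat.T)
Pi = Phat[0, 0]     # (1,1) block = weighted reachability gramian of the plant
```

Here (sI − A)^(−1)BHi(s) = 1/((s+2)(s+1)) has impulse response e^(−t) − e^(−2t), so the weighted gramian is ∫(e^(−t) − e^(−2t))² dt = 1/2 − 2/3 + 1/4 = 1/12.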
7.6.1 Frequency weighted balanced truncation*

Once the weighted gramians Pi and Qo have been defined, a balanced realization is obtained by determining the transformation which simultaneously diagonalizes them. Truncation, in the balanced basis, of the states which correspond to small singular values yields a system Σr of reduced order, obtained by weighted balanced truncation.

The method of Enns. The first attempt to deal with weightings in model reduction is due to Enns [106].

Lemma 7.16. Given the stable and minimal system Σ, together with the input and output weights Σi, Σo respectively, let Σr be obtained by frequency weighted balanced truncation as above, where

Pi = Qo = diag(σ1 Im1, ..., σq Imq),

the mi being the multiplicities of σi, m1 + ··· + mq = n. If either Hi = I or Ho = I, then Σr is stable.

In general, this method does not guarantee stability of the reduced model in the case of two-sided weighting, and the original work of Enns did not provide an error bound. However, whenever Σr is stable, the error is bounded by the following expression:

‖Ho(s)(H(s) − Hr(s))Hi(s)‖H∞ ≤ 2 ∑_{i=k+1}^q √( σi² + (αi + βi)σi^(3/2) + αiβiσi ),

where αi and βi are the H∞ norms of transfer functions depending on the weights and the reduced system. For details on the computation of the αi and βi, see [201].

The method by Lin and Chiu. To guarantee stability in the case of two-sided weighting, Lin and Chiu [229] compute P̂i, Q̂o as above, but replace the gramians (7.37) by their Schur complements:

P = P̂11 − P̂12 P̂22^(−1) P̂21,   Q = Q̂11 − Q̂12 Q̂22^(−1) Q̂21.

If the realizations of the weights are minimal, these gramians are nonsingular, and after simultaneous diagonalization and truncation, stability of the reduced system is guaranteed. The main drawback of this method is that the realizations of Σi and Σo as given above must be minimal. Furthermore an error bound can be computed; for details we refer to [229].

Zhou's self-weighted method. Zhou's method [369] is applicable to any stable Σ which has a stable right inverse and full-rank D. For simplicity, we will only discuss the case where H(s) is square, det(D) ≠ 0, and H^(−1) is asymptotically stable; that is, the system is minimum phase. The weights are Hi(s) = I and

Ho(s) = H^(−1)(s) = ( A − BD^(−1)C   −BD^(−1) )
                    ( D^(−1)C         D^(−1)  ).

The gramians to be simultaneously diagonalized are therefore the solutions Pi, Qo of

A Pi + Pi A∗ + BB∗ = 0,
Qo (A − BD^(−1)C) + (A − BD^(−1)C)∗ Qo + (D^(−1)C)∗ (D^(−1)C) = 0.

The self-weighted balanced realization is then obtained by simultaneously diagonalizing these two gramians, Pi = Qo = diag(σ1 In1, ..., σq Inq), with balancing and truncation applied as earlier.

Lemma 7.17. Let H(s) be asymptotically stable, square, nonsingular and minimum phase as above, and let Hr(s) be obtained by the self-weighted frequency weighted truncation method. Then Hr(s) is asymptotically stable, square, nonsingular, minimum phase, and satisfies

‖H^(−1)(H − Hr)‖H∞ ≤ ∏_{i=k+1}^q ( 1 + 2σi√(1 + σi²) + 2σi² ) − 1,

and the relative error ‖Hr^(−1)(H − Hr)‖H∞ satisfies the same bound.

It can be shown that if H(s) is square, this method is equivalent to stochastic balancing; in this case the σi and the stochastic singular values µi are related by

µi = σi / √(1 + σi²).

A method due to Wang et al. The method of [351] provides computation of an a priori computable error bound for the weighted error, and also guarantees stability in the two-sided case. Let

X = −A P11 − P11 A∗,   Y = −A∗ Q11 − Q11 A,

where P11, Q11 are the weighted gramians. If these two matrices X and Y are positive (semi-) definite, the reduced system is guaranteed to be stable. Otherwise, let X = UΛU∗ be an eigenvalue decomposition, where U is orthogonal and Λ is diagonal with real entries λi; let |Λ| = diag(|λ1|, ..., |λn|) and |X| = U|Λ|U∗, and similarly for |Y|. The gramians are now computed by solving the Lyapunov equations

A P + P A∗ + |X| = 0,   A∗ Q + Q A + |Y| = 0.

Subsequently, balancing and truncation are applied as earlier. Since |X| and |Y| are positive (semi-) definite, stability of the reduced system follows from the inertia results for Lyapunov equations discussed in chapter 6, and an error bound can be computed (see [351] for details). In the same paper a modification of this method is presented, consisting in generating a semidefinite matrix from X by setting all its negative eigenvalues to zero (and similarly for Y). One issue which needs to be addressed is the sensitivity of the reduced model to the choice of the input and output weightings.

The combination method by Varga and Anderson. The combination method proposed in [346] is formulated by combining Enns' method with that of Lin and Chiu. It introduces two parameters α, β and computes the frequency weighted gramians as follows:

P = P̂11 − α² P̂12 P̂22^(−1) P̂21,   Q = Q̂11 − β² Q̂12 Q̂22^(−1) Q̂21.

Since for α = β = 1 (Lin–Chiu) stability of the reduced system is guaranteed, while for α = β = 0 (Enns) it may not be, a family of stable weighted reduced order models is obtained by choosing the parameters in the neighborhood of one; using this approach the reduced model retains the features of both methods. Although the computation in this method is more involved, the weighted error system satisfies an error bound of the form

‖Ho(H − Hr)Hi‖H∞ ≤ 2 ∑_{i=k+1}^q √( (σ̃i² + αi + λi)(σ̃i + βi + ωi) ),

where αi, βi, λi and ωi denote H∞ norms of transfer functions depending on the data. To improve accuracy, in all the methods mentioned above, the square-root and balancing-free square-root techniques can be used.
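A sketch of the eigenvalue-flipping step of the method of Wang et al., on a toy plant and weight of our own: X is replaced by |X| = U|Λ|U∗, and the resulting Lyapunov equation then has a positive semidefinite right-hand side, which restores the stability (inertia) argument:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# toy two-state plant with a one-state input weight (our own numbers)
A = np.array([[-1.0, 2.0], [0.0, -3.0]])
B = np.array([[1.0], [1.0]])
Ai, Bi, Ci, Di = np.array([[-1.0]]), np.array([[1.0]]), np.array([[1.0]]), np.array([[0.0]])

Ahat = np.block([[A, B @ Ci], [np.zeros((1, 2)), Ai]])
Bhat = np.vstack([B @ Di, Bi])
Phat = solve_continuous_lyapunov(Ahat, -Bhat @ Bhat.T)
P11 = Phat[:2, :2]                        # Enns' weighted gramian, the (1,1) block

X = -(A @ P11 + P11 @ A.T)                # need not be >= 0 for two-sided weights
lam, Umat = np.linalg.eigh((X + X.T) / 2)
Xfix = Umat @ np.diag(np.abs(lam)) @ Umat.T   # |X|: flip negative eigenvalues
P = solve_continuous_lyapunov(A, -Xfix)   # A P + P A* + |X| = 0: a true gramian again
```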
7.6.2 Frequency weighted balanced reduction without weights*

The goal of the frequency weighted reduction problem is to find a system Σr of order less than that of Σ such that the weighted error ‖Ho(s)(H(s) − Hr(s))Hi(s)‖H∞ is small. It should be mentioned, however, that in many cases the input and output weightings Hi(s) and Ho(s) are not given. Instead, the problem is to approximate Σ over a given frequency range [ω1, ω2]. As shown by Gawronski and Juang [134], this problem can be attacked directly, without constructing input and output weights. This is achieved by using the frequency domain representation of the gramians, with the limits of integration appropriately restricted:

P(ω) = (1/2π) ∫_{−ω}^{ω} (iνI − A)^(−1) BB∗ (−iνI − A∗)^(−1) dν,   (7.38)
Q(ω) = (1/2π) ∫_{−ω}^{ω} (−iνI − A∗)^(−1) C∗C (iνI − A)^(−1) dν.   (7.39)

Thus if we are interested in the frequency interval [0, ω], we simultaneously diagonalize the two gramians P(ω) and Q(ω) defined above and proceed as before. In case the frequency interval [ω1, ω2] is of interest, the gramians are defined as follows:

P(ω1, ω2) = P(ω2) − P(ω1),   Q(ω1, ω2) = Q(ω2) − Q(ω1).   (7.40)

It is readily seen that these expressions are positive (semi-) definite. Balancing involves the simultaneous diagonalization of P(ω1, ω2) and Q(ω1, ω2):

P(ω1, ω2) = Q(ω1, ω2) = diag(σ1 Im1, ..., σq Imq),

where, as before, mi are the multiplicities of each distinct singular value σi. The reduced model is obtained as usual, by truncation; it yields small errors in the interval [ω1, ω2]. The question now is whether these gramians satisfy Lyapunov equations. To answer this question we introduce the following notation:

S(ω) = (1/2π) ∫_{−ω}^{ω} (iνI − A)^(−1) dν = (i/2π) ln[ (iωI − A)(−iωI − A)^(−1) ],

Wr(ω) = S(ω) BB∗ + BB∗ S∗(ω),   Wo(ω) = S∗(ω) C∗C + C∗C S(ω),   (7.41)

together with Wr(ω1, ω2) = Wr(ω2) − Wr(ω1) and Wo(ω1, ω2) = Wo(ω2) − Wo(ω1).

Lemma 7.18. With the notation introduced above, the gramians satisfy the following Lyapunov equations:

A P(ω) + P(ω) A∗ + Wr(ω) = 0,   A∗ Q(ω) + Q(ω) A + Wo(ω) = 0.

Furthermore,

A P(ω1, ω2) + P(ω1, ω2) A∗ + Wr(ω1, ω2) = 0,   A∗ Q(ω1, ω2) + Q(ω1, ω2) A + Wo(ω1, ω2) = 0.

The computation of the various gramians thus requires the evaluation of a matrix logarithm, in addition to the solution of two Lyapunov equations, so S(ω) can be computed efficiently as well. For small-to-medium scale problems, for which an exact balancing transformation can be computed, this poses no difficulty. We note, however, that computing an exact solution of a Lyapunov equation in large-scale settings is an ill-conditioned problem itself; for large-scale problems this issue is still under investigation. Finally, since Wr and Wo are not guaranteed to be positive definite, stability of the reduced model
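Lemma 7.18 can be verified numerically without the matrix logarithm, by evaluating P(ω) and S(ω) directly as quadratures (toy system and helper names are ours; by symmetry of the interval only the real parts survive):

```python
import numpy as np
from scipy.integrate import quad

A = np.array([[-1.0, 0.5], [0.0, -2.0]])   # hypothetical stable system
B = np.array([[1.0], [1.0]])
w = 3.0                                    # frequency band [0, 3] rad/s
n = A.shape[0]
I = np.eye(n)

def re_int(f):
    # entrywise integral of the real part of a matrix-valued function over [0, w]
    return np.array([[quad(lambda nu: f(nu)[r, c].real, 0, w, limit=200)[0]
                      for c in range(n)] for r in range(n)])

res = lambda nu: np.linalg.inv(1j * nu * I - A)          # (i nu I - A)^{-1}
Pw = re_int(lambda nu: res(nu) @ B @ B.T @ res(nu).conj().T) / np.pi   # P(w), (7.38)
Sw = re_int(res) / np.pi                                 # S(w)
Wr = Sw @ B @ B.T + B @ B.T @ Sw.T                       # Wr(w), (7.41)
```

The Lyapunov residual A·P(ω) + P(ω)·A∗ + Wr(ω) vanishes to quadrature accuracy, and P(ω) is positive semidefinite.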
This lack of guaranteed stability can, however, be fixed as shown in [159], where a modified reduction method is presented in which the Lyapunov equations are forced to be definite (much the same way as in the method of [351]). This also results in error bounds.

Remark 7.6.2. (a) In section 6.2 on page 147, a Lyapunov equation was derived for an expression of the form (6.44). We recognize this as the gramian P(∞) discussed above. In that section, a Lyapunov equation satisfied by this gramian was given; this equation is in terms of the spectral projector Π onto the stable invariant subspace of A. This projector is related to S(∞) defined above: it can be shown that Π = ½ I + S(∞).

(b) The above discussion reveals the connection between Enns' and Gawronski and Juang's frequency weighted balancing methods. The latter is obtained from the former by choosing Hi(s) and Ho(s) as perfect bandpass filters over the frequency range of interest. We note that an infinite dimensional realization would be needed to obtain a perfect bandpass filter. In Enns' method, these bandpass filters are approximated by lower order bandpass filters; the resulting Lyapunov equations have dimension n + ni, where ni is the order of Hi(s). In Gawronski and Juang's method, the realizations of the weightings are never computed. It can be shown by means of examples that as the order of the weightings increases, i.e., as they get closer to a perfect bandpass filter, the two methods produce similar results.

7.6.3 Balanced reduction using time-limited gramians*

Yet another set of gramians is obtained by restricting in time the gramians P, Q given by (4.43) and (4.44). For T = [t1, t2], the time-limited gramians are defined as

  P(T) = ∫_{t1}^{t2} e^{Aτ} BB∗ e^{A∗τ} dτ and Q(T) = ∫_{t1}^{t2} e^{A∗τ} C∗C e^{Aτ} dτ.  (7.42)

These quantities are positive (semi) definite and therefore qualify as gramians. Let

  Vr(T) = e^{A t1} BB∗ e^{A∗ t1} − e^{A t2} BB∗ e^{A∗ t2},  Vo(T) = e^{A∗ t1} C∗C e^{A t1} − e^{A∗ t2} C∗C e^{A t2}.

Lemma 7.19. The gramians P(T) and Q(T) are solutions of the following Lyapunov equations:

  A P(T) + P(T) A∗ + Vr(T) = 0,  A∗ Q(T) + Q(T) A + Vo(T) = 0.

Balancing in this case is obtained by simultaneous diagonalization of the time-limited gramians P(T) and Q(T):

  P(T) = Q(T) = diag(σ_{n1} I_{n1}, · · · , σ_{nq} I_{nq}).

The reduced model is obtained by truncation in this basis, and the impulse response of the reduced model is expected to match that of the full order model in the time interval T = [t1, t2]. As for the frequency-limited gramians, the reduced model is not guaranteed to be stable; as in the frequency weighted case, this can be fixed as shown in [159].

7.6.4 Closed-loop gramians and model reduction*

In this case, the given system Σ is part of a closed loop with controller Σc. In the simplest case, the overall system is characterized by the transfer function

  φ(H, Hc) = H(s)(I + Hc(s)H(s))^{−1}.

A reduced order system Σ̂ is sought such that the error ‖φ(H, Hc) − φ(Ĥ, Hc)‖ is kept small or minimized. Alternatively, one may also wish to find a reduced order controller Σ̂c such that ‖φ(H, Hc) − φ(H, Ĥc)‖ or ‖φ(H, Hc) − φ(Ĥ, Ĥc)‖ is kept small or minimized. For an illustration of these facts, see the example at the end of this section; for details, see [134].
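Lemma 7.19 can be verified directly: since d/dτ (e^{Aτ}BB∗e^{A∗τ}) = A(·) + (·)A∗, integrating over [t1, t2] yields exactly A P(T) + P(T) A∗ + Vr(T) = 0. A minimal numerical sketch (hypothetical data):

```python
import numpy as np
from scipy.linalg import expm, solve_continuous_lyapunov

# Hypothetical stable system (illustrative data only).
A = np.array([[-1.0, 0.4],
              [ 0.0, -3.0]])
B = np.array([[1.0],
              [2.0]])
t1, t2 = 0.5, 2.0

# V_r(T) = e^{A t1} BB* e^{A* t1} - e^{A t2} BB* e^{A* t2}
Vr = expm(A*t1) @ B @ B.T @ expm(A.T*t1) - expm(A*t2) @ B @ B.T @ expm(A.T*t2)

# Lemma 7.19: A P(T) + P(T) A* + V_r(T) = 0
P_T = solve_continuous_lyapunov(A, -Vr)

# Compare with midpoint quadrature of the defining integral (7.42).
taus = np.linspace(t1, t2, 1601)
mid = (taus[:-1] + taus[1:]) / 2
dt = taus[1] - taus[0]
P_quad = sum(expm(A*t) @ B @ B.T @ expm(A.T*t) for t in mid) * dt
print(np.max(np.abs(P_T - P_quad)))  # quadrature-level discrepancy only
```

Note that Vr(T) is an indefinite right-hand side in general, which is exactly why stability of the truncated model is not automatic here.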
We now turn our attention to the definition of gramians for closed-loop systems. Consider the system Σ in the usual feedback loop together with the controller Σc, and let

  Σ = (A, B; C, 0),  Σc = (Ac, Bc; Cc, 0)

be the corresponding system quadruples. Then a realization (At, Bt, Ct) of the compound φ(H, Hc) is given by

  At = (A −BCc; BcC Ac),  Bt = (B; 0),  Ct = (C 0).

This can be considered as a weighted composite system with input weighting Hi(s) = (I + Hc(s)H(s))^{−1}. The gramians of the system Σ with respect to the given closed loop with the controller Σc are now defined, as shown earlier for the input weighting case, as the (1,1) blocks of the two gramians of the partitioned system above; they can be computed by solving Lyapunov equations. Similarly, the gramians of the controller Hc(s) in the same closed loop can be defined by noting that the transfer function of interest is Hc(s)H(s)(I + Hc(s)H(s))^{−1}; this can again be considered as a weighted system with input weighting Hi(s) = H(s)(I + Hc(s)H(s))^{−1}, and one can again use the (1,1) blocks of the gramians of the corresponding partitioned system. Thus, in order to obtain an approximation by balanced truncation of the system Σ in the sense that the closed-loop norm is kept small or minimized, the closed-loop gramians defined above are simultaneously diagonalized and truncated; the resulting gramians of Hc(s) with respect to the given closed loop can similarly be used to reduce the order of the controller in the closed loop. A related approach, applicable to open-loop prefiltered systems, is described in [132].

7.6.5 Balancing for unstable systems*

The problem with balancing of unstable systems, i.e., systems where the eigenvalues of A may lie both in the left- and the right-hand side of the complex plane, is that the infinite gramians P, Q given by (4.43), (4.44) on page 69 are not defined, and therefore one cannot talk of their simultaneous diagonalization. There are two remedies to this situation. The first is to use time-limited gramians, which are defined by (7.42) over any finite interval [t1, t2]; balancing P(T) and Q(T) followed by truncation would then provide a reduced-order system which approximates well the impulse response over the chosen interval. The second approach, proposed in [372], consists in using the expressions for the gramians in the frequency domain, (4.51), (4.52) on page 70. These frequency domain expressions are indeed defined even if the system is unstable (they are not defined when the system has poles on the imaginary axis) and are positive definite; therefore, they can be simultaneously diagonalized. Furthermore, as noted in [372], for ω → ∞ balancing based on simultaneous diagonalization of the infinite gramians consists in separate balancing of the stable and of the antistable parts of the system. The approach follows [362].

We will now illustrate some of the properties of the reduction of unstable systems by balanced truncation with the following example.

Example 7.6.1. Consider the fourth order single-input single-output system Σ = (A, B; C, D).
The eigenvalues of A are λ1,2 = ±√2 and λ3,4 = −1/20 ± i√799/20, and the transfer function is

  H(s) = a/(s + √2) + b/(s − √2) − (10/√799) (s − 399/5)/(s² + s/10 + 2),

where a = 10√2 + 1/2 and b = 10√2 − 1/2. We will balance this system for ω = 1, ω = 100, and ω → ∞. In each case the singular values of the frequency-limited gramians are computed, the balancing transformation T is formed, and the balanced realization follows as Abal = TAT^{−1}, Bbal = TB, Cbal = CT^{−1}. For ω → ∞, the balanced realization of the system consists of the direct sum of the balanced realization of the stable part and that of the unstable part:

  Σ = Σ⁻bal ⊕ Σ⁺bal,

that is, the direct sum of the balanced realizations of the stable subsystem Σ⁻ and of the antistable subsystem Σ⁺. The eigenvalues of S(1) and S(100) are complex and converge, as ω → ∞, to the eigenvalues of S(∞), which are 1/2, and −1/2 with multiplicity 3.

Finally, we recall that S(∞) defines the spectral projectors related to A: Π⁺ = ½ I + S(∞) is an (oblique) projection onto the invariant subspace of A corresponding to the eigenvalues in the right-half plane (unstable eigenvalues), and Π⁻ = ½ I − S(∞) is a projection onto the invariant subspace corresponding to the eigenvalues in the left-half plane (stable eigenvalues). It follows that Π⁺ = vw∗ for suitable vectors v, w with w∗v = 1; this implies w∗Av = 1.4142 ≈ √2, the unstable eigenvalue of A.
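The projectors Π± can also be computed without evaluating any integral: with the convention that Π⁺ = ½(I + sign(A)) projects onto the unstable subspace, the matrix sign function plays the role of 2S(∞). A minimal sketch (hypothetical 2×2 data, Newton iteration for the sign function):

```python
import numpy as np

def matrix_sign(A, iters=50):
    """Newton iteration X <- (X + X^{-1})/2 for the matrix sign function."""
    X = A.astype(float).copy()
    for _ in range(iters):
        X = 0.5 * (X + np.linalg.inv(X))
    return X

# Hypothetical A with one unstable and one stable eigenvalue.
A = np.array([[1.0, 2.0],
              [0.0, -3.0]])
S_inf = 0.5 * matrix_sign(A)          # plays the role of S(infinity)
Pi_plus = 0.5 * np.eye(2) + S_inf     # projector onto the unstable subspace
Pi_minus = 0.5 * np.eye(2) - S_inf    # projector onto the stable subspace
print(Pi_plus)
```

The iteration converges quadratically whenever A has no eigenvalues on the imaginary axis; the trace of Π⁺ equals the number of unstable eigenvalues (here, one).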
Similarly, Π⁻ = VW∗, which implies that the projected A is W∗AV = diag[−1.4142, −.05 + 1.4133i, −.05 − 1.4133i].

We will compare the reduced order systems of order two and three computed for ω = 1 and ω = 100 by balanced truncation. On the left-hand side of the figure, the frequency response of the original system is plotted together with the frequency responses of the second and third order approximants, which are labelled 2a, 3a for ω = 1 and 2b, 3b for ω = 100. The plots show that the approximants for ω = 1 approximate the frequency response well for low frequencies, up to 1Hz, while the approximants for ω = 100 do a better job in approximating the peak, which lies in the band from .3Hz to 7Hz. These results can also be observed in the frequency response plots of the error systems, which are depicted on the right.

Figure. Approximation of the unstable system in example 7.6.1 by balanced truncation: frequency responses of the original system and of the approximants (left), and error plots (right).

Example 7.6.2. We consider a second unstable system, Σ = (A, B; C, D), whose poles again lie in both the left and the right half of the complex plane. We compute second and fourth order approximants obtained by frequency weighted balanced truncation with horizon [−1, 1] and with infinite horizon. More precisely, the original system is denoted with the subscript 0; the second order approximant obtained with infinite horizon is denoted by 1 and the one with horizon [−1, 1] by 2; subsequently, 3 is the fourth order system with horizon [−1, 1], and 4 the infinite horizon fourth order model. Notice that the second order infinite horizon approximant misses the resonance, while the finite horizon one does not. Both fourth order approximants reproduce the peak, but the infinite horizon fourth order one gives the overall better approximation.

Figure. Frequency weighted balanced approximation of the unstable system in example 7.6.2: frequency response of the original system together with the approximants 1–4.

7.7 Chapter summary

A balanced realization of a system is one in which states which are difficult (easy) to reach are also difficult (easy) to observe. Mathematically, balancing amounts to the simultaneous diagonalization of two positive (semi) definite gramians; the minimal energy required to reach a given state is the inverse of the observation energy generated by the same state.
These facts form the basis for one of the most effective approximation methods, namely approximation by balanced truncation, called Lyapunov balancing. It has the following properties: (i) stability is preserved, and in addition (ii) an a priori computable error bound exists. The latter provides a trade-off between accuracy of the approximation and complexity of the approximant, just like the singular values of a (constant) matrix provide a trade-off between accuracy and complexity. The main results, together with their proofs, can be found in section 7.2. A canonical form for continuous-time balanced systems follows. The third section is devoted to numerical issues; the difficulty in applying balanced truncation in large-scale settings is the fact that O(n³) operations and O(n²) storage are required. The main method used for computing the gramians is the square-root (Hammarling's) method which, as discussed in the previous chapter, yields directly the square roots (Cholesky factors) of the gramians, without computing the gramians themselves first. Subsequently, two specialized sections are presented, namely additional types of balancing and frequency weighted balancing; they can be omitted at first reading. Since balancing consists in the simultaneous diagonalization of two positive (semi) definite matrices, different kinds of balancing can be obtained by considering various Riccati equations: besides the usual kind of balancing, we discuss stochastic balancing, bounded real balancing, and positive real balancing. Section 7.6 explores the issue of balancing which is frequency selective, both with and without the choice of frequency weights. The section concludes with time-limited balancing and balancing for unstable systems.
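The square-root mechanics summarized above can be sketched compactly. The following is an illustrative implementation on a random hypothetical stable system (all data and names are assumptions, not from the text): Cholesky factors of the gramians, an SVD of their product, oblique projection, and a check of stability and of the error bound 2(σ_{r+1} + · · · + σ_n).

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, cholesky, svd

np.random.seed(0)
n, r = 4, 2
# Triangular perturbation keeps the eigenvalues on the diagonal: A is stable.
A = np.diag([-1.0, -2.0, -3.0, -4.0]) + np.triu(0.3 * np.random.randn(n, n), 1)
B = np.random.randn(n, 1)
C = np.random.randn(1, n)

P = solve_continuous_lyapunov(A, -B @ B.T)       # reachability gramian
Q = solve_continuous_lyapunov(A.T, -C.T @ C)     # observability gramian
U = cholesky((P + P.T) / 2, lower=True)          # P = U U^T
L = cholesky((Q + Q.T) / 2, lower=True)          # Q = L L^T
W, sig, Vt = svd(L.T @ U)                        # sig = Hankel singular values

Si = np.diag(sig[:r] ** -0.5)
Tl = Si @ W[:, :r].T @ L.T                       # left projection (r x n)
Tr = U @ Vt[:r, :].T @ Si                        # right projection (n x r)
Ar, Br, Cr = Tl @ A @ Tr, Tl @ B, C @ Tr         # balanced truncated model

# Sampled H-infinity error vs. the a priori bound 2(sigma_{r+1}+...+sigma_n).
freqs = np.concatenate(([0.0], np.logspace(-2, 3, 300)))
err = max(abs((C @ np.linalg.solve(1j*f*np.eye(n) - A, B)
             - Cr @ np.linalg.solve(1j*f*np.eye(r) - Ar, Br)).item())
          for f in freqs)
print(err, 2 * sig[r:].sum())
```

The truncated model inherits stability, the projected gramian is diag(σ1, · · · , σr), and the sampled error stays below the bound, as the theory predicts.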
Finally, it should be mentioned that a balanced state space representation of a linear system Σ can be derived by minimizing an appropriate criterion; see also [238].

Chapter 8. Hankel-norm approximation

As mentioned in the previous chapter, approximation by balanced truncation does not seem to minimize any norm, except in very special cases. A refinement leads to an approximation method which is optimal with respect to the 2-induced norm of the Hankel operator. The resulting theory is the generalization to dynamical systems of the optimal approximation, in the 2-induced norm, of finite dimensional matrices and operators discussed in chapter 3. It provides explicit formulae for optimal and suboptimal approximants, as well as an error bound in the related H∞ norm (the 2-induced norm of the convolution operator). The developments in this chapter are due to Adamjan, Arov, and Krein [1], [2], and especially Glover [138]. For surveys, see [14], [17], [136], and the books [148] and [370]. Section 8.6 gives an exposition of a polynomial approach to Hankel norm approximation. Although this is presently limited to single-input single-output systems, it provides additional insights to the theory presented in the first part of the chapter. It shows, for instance, that the allpass dilation system has additional structure, since it can be constructed using rational interpolation at the poles of Σ.

8.1 Introduction

In order to successfully apply the SVD approach to the approximation of dynamical systems, we need to come up with the singular value decomposition of some input/output operator associated with such systems. Given a linear system Σ, the natural choice for an input-output operator is the convolution operator S, defined by (4.5) on page 55:

  S : u → y = S(u)(t) = ∫_{−∞}^{∞} h(t − τ) u(τ) dτ, t ∈ R,

where h is the impulse response of the system. If the system is stable (the eigenvalues of A are in the left half of the complex plane), the 2-induced norm of S turns out to be equal to the H∞ norm of the Laplace transform H(s) of h(t), that is, of the transfer function of Σ (see (5.16) on page 112); this is referred to as the H∞ norm of the system. The operator S is not compact, however, and it does not have a discrete singular value decomposition (see section 5.2). The consequence of this lack of a discrete singular value decomposition is that the problem of approximating S by an operator of lower complexity that minimizes the 2-induced norm of the error cannot be solved at present. Therefore, searching for an operator associated with Σ that has a discrete SVD, we define one which makes use of the same convolution sum or integral, but which has domain and range different from those of S:

  H : u− → y+ = H(u−)(t) = ∫_{−∞}^{0} h(t − τ) u(τ) dτ, t ∈ R+.

This is the Hankel operator of Σ, defined by means of (5.20) on page 115; it maps past inputs u− into future outputs y+. If the system is finite dimensional, H has finite rank (that is, its range is a finite dimensional space). Therefore,
since it is bounded and compact, it possesses a discrete singular value decomposition (SVD), and in principle one could try to solve the optimal approximation problem as defined above. As discussed in section 5.2, the corresponding induced norm is called the Hankel norm and the corresponding singular values are the Hankel singular values of the system. It is a nontrivial fact that the optimal approximation problem in the 2-induced norm of H can be solved. This solution originated in the work of Adamjan, Arov, and Krein [1], [2], with subsequent contributions by Glover [138].

The main attributes of optimal approximation in the Hankel norm are: (i) preservation of stability, and (ii) the existence of an error bound: the H∞ norm of the error system is bounded by twice the sum of the neglected Hankel singular values, 2(σk+1 + · · · + σn). This bound is the same as the one valid in the balanced truncation case, and it is reminiscent of the bound which holds for constant matrices. It should also be mentioned that in some cases balanced truncation is superior to Hankel norm approximation in the H∞ norm; the reason is that Hankel norm approximants are optimal in the Hankel norm, but not in the H∞ norm. Although we are seeking to approximate stable systems, the theory involves unstable systems.

In the following sections we report the results which are valid for continuous-time systems. These results can also be used for approximating discrete-time systems by means of the bilinear transformation of subsection 4.3; therefore, formulas (5.38) and (5.39) of the 2-norm (which is equal to the L∞ norm of the associated transfer function) and of the Hankel norm of such systems should be kept in mind.

8.1.1 Main ingredients

The Hankel-norm approximation theory is based on the following four facts.

Fact I. Given a stable system Σ and a positive real number ε such that σk(Σ) > ε ≥ σk+1(Σ), with σ0(Σ) = ∞, there exists a system Σ̂ having exactly k stable poles, which is in general not stable, such that the parallel connection Σe = Σ − Σ̂ is allpass (having both stable and unstable poles) and

  ‖Σ − Σ̂‖2 = ε.  (8.1)

This construction, depicted in figure 8.1, is discussed in section 8.4.

Fact II. The 2-norm of any L2 system Σ is no less than the Hankel norm of its stable part Σ+:

  ‖Σ‖2 ≥ ‖Σ+‖H.  (8.2)

Fact III. Given k ∈ {0, 1, · · · , n − 1}, where n > k, and a stable system Σ of McMillan degree n, there exists a system Σ̂ having exactly k stable poles, such that Σ − Σ̂ is allpass and

  ‖Σ − Σ̂‖2 = σk+1(Σ).  (8.3)

Fact IV. Given a stable system Σ and a system Σ̂ having k stable poles, there holds

  ‖Σ − Σ̂‖H ≥ σk+1(Σ).  (8.4)

Conclusion. The solution of the Hankel-norm approximation problem, both for the optimal and the suboptimal cases, follows as a consequence of the facts listed above: given Σ which is stable, construct Σ̂ so that the parallel connection Σe is allpass (having both stable and unstable poles) and 2-norm equal to ε, where

  σk(Σ) > ε ≥ σk+1(Σ).  (8.5)

Then the stable part of Σ̂ is the suboptimal (respectively, for ε = σk+1(Σ), optimal) approximant of Σ, and the associated error system has Hankel norm between σk+1 and ε.
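The finite-rank property of H and the meaning of the Hankel singular values can be checked concretely in discrete time, where the Hankel operator is the (infinite) matrix with entries h(j + k − 1), h(k) = CA^{k−1}B. A small sketch with hypothetical data: the singular values of a large finite section of the Hankel matrix coincide with √λi(PQ) computed from the discrete gramians, and the numerical rank equals the system order.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

# Hypothetical stable discrete-time SISO system of order n = 2.
A = np.array([[0.5, 0.2],
              [0.0, -0.4]])
B = np.array([[1.0], [1.0]])
C = np.array([[1.0, -1.0]])

# Gramians: A P A* - P + BB* = 0 and A* Q A - Q + C*C = 0.
P = solve_discrete_lyapunov(A, B @ B.T)
Q = solve_discrete_lyapunov(A.T, C.T @ C)
hsv = np.sqrt(np.sort(np.linalg.eigvals(P @ Q).real)[::-1])

# Finite section of the Hankel matrix: entries h(j+k-1), h(k) = C A^{k-1} B.
N = 60
h = [(C @ np.linalg.matrix_power(A, j) @ B).item() for j in range(2 * N)]
H = np.array([[h[j + k] for k in range(N)] for j in range(N)])
sv = np.linalg.svd(H, compute_uv=False)
print(hsv, sv[:3])  # the section has numerical rank 2
```

Because the spectral radius of A is 0.5, the truncation error of the finite section is negligible, and sv[:2] reproduces the Hankel singular values essentially to machine precision.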
Figure 8.1. Construction of approximants which are optimal and suboptimal in the Hankel norm.

8.2 The AAK theorem

Consider the stable systems Σ, Σ̂, of McMillan degree n, k respectively, where n > k. As shown earlier, the associated Hankel operator HΣ has rank n and HΣ̂ has rank k. The Schmidt-Eckart-Young-Mirsky theorem implies that

  ‖HΣ − HΣ̂‖2−ind ≥ σk+1(HΣ).  (8.6)

The question which arises is to find the infimum of the above norm, given the fact that the approximant is structured (a block Hankel matrix):

  inf_{rank Σ̂ = k} ‖HΣ − HΣ̂‖2−ind.

A result due to Adamjan, Arov, and Krein asserts that this lower bound is indeed attained for some Σ̂ of dimension k. The original sources for this result are [1] and [2].

Theorem 8.1 (AAK). Given the sequence of p × m matrices h = (h(k))_{k>0}, such that the associated Hankel matrix H has finite rank n, there exists a sequence of p × m matrices h∗ = (h∗(k))_{k>0}, such that the associated Hankel matrix H∗ has rank k, and in addition

  ‖H − H∗‖2−ind = σk+1(H).  (8.7)

Optimal here means inf ‖H − H∗‖ = inf ‖H − K‖, where the first infimum is taken over all Hankel matrices H∗ and the second over all arbitrary matrices K. If p = m = 1, the optimal approximant is unique. This result says that every stable and causal system Σ can be optimally approximated by a stable and causal system Σ∗ of lower dimension. The optimality is with respect to the 2-induced norm of the associated Hankel operator. Notice that the above theorem holds for continuous-time systems as well; both continuous- and discrete-time systems can be handled within the same framework.

8.3 The main result

In this section we present the main result. Actually, one can consider both suboptimal and optimal approximants within the same framework; as it turns out, the formulae for suboptimal approximants are simpler than their optimal counterparts.

Problem 8.1. Given a stable system Σ, we seek stable approximants Σ∗ of dimension k, satisfying

  σk+1(Σ) ≤ ‖Σ − Σ∗‖H ≤ ε < σk(Σ).

This is a generalization of optimal approximation in the Hankel norm. The concept introduced in the next definition is the key to its solution.
Definition 8.2. Let Σe be the parallel connection of Σ and Σ̂: Σe = Σ − Σ̂. If Σe is an allpass system with norm ε, then Σ̂ is called an allpass dilation of Σ.

The analogue of the Schmidt-Eckart-Young-Mirsky result (8.6) for dynamical systems is stated next.

Proposition 8.3. Given the stable system Σ, let Σ̂ have at most k stable poles. There holds

  ‖Σ − Σ̂‖H ≥ σk+1(Σ).

This means that the 2-induced norm of the Hankel operator of the difference between Σ and Σ̂ is no less than the (k+1)-st singular value of the Hankel operator of Σ; it is proved in section 8.4.2.

As a consequence of the inertia result of chapter 6, the allpass dilation system has the following crucial property.

Main Lemma. Let Σ̂ be an allpass dilation of the linear, discrete- or continuous-time system Σ, where ε satisfies σk+1(Σ) ≤ ε < σk(Σ). Then Σ̂+ has exactly k stable poles: dim Σ̂+ = k. It follows that

  σk+1(Σ) ≤ ‖Σ − Σ̂+‖H ≤ ε,  (8.8)

and in case σk+1(Σ) = ε,

  σk+1(Σ) = ‖Σ − Σ̂‖H.  (8.9)

We are now ready for the main result.

Theorem 8.4. Let Σ̂, with dim Σ̂ ≤ dim Σ, be an allpass dilation of the stable system Σ, where σk+1(Σ) ≤ ε < σk(Σ). Then the stable part Σ̂+ of Σ̂ has exactly k poles, and

  σk+1(Σ) ≤ ‖Σ − Σ̂+‖H ≤ ε.  (8.10)

Proof. The result is a consequence of the following sequence of equalities and inequalities:

  σk+1(Σ) ≤ ‖Σ − Σ̂+‖H = ‖Σ − Σ̂‖H ≤ ‖Σ − Σ̂‖∞ = ε.

The first inequality on the left is a consequence of the main lemma together with proposition 8.3; the equality follows by definition (recall that if a system has both stable and unstable poles, its Hankel norm is that of its stable part); the second inequality follows from (5.21); and the last equality holds by construction, since the allpass dilation system Σ̂ satisfies ‖Σ − Σ̂‖∞ = ε.

Remark. (a) An important special case of the above problem is obtained for k = 0 and ε = σ1(Σ). This is the Nehari problem: it seeks the distance of a stable system from the set of antistable systems (i.e., systems whose poles are all unstable). The construction mentioned above implies that this distance is equal to the Hankel norm of Σ:

  inf ‖Σ − Σ̂‖2 = ‖Σ‖H = σ1(Σ),

where the infimum is taken over all antistable systems Σ̂. The Nehari problem and the associated construction of Σ̂ form one of the cornerstones of Hankel-norm approximation theory.

(b) We are given a stable system Σ and seek to compute an approximant in the same class (i.e., stable). In order to achieve this, the construction given above takes us outside this class of systems, since the allpass dilation system Σ̂ has poles which are both stable and unstable.
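The lower bound ‖Σ − Σ̂‖H ≥ σk+1(Σ) stated above can be observed numerically: for any stable approximant of order k, the Hankel norm of the error system, computed from the gramians of the block-diagonal error realization, never falls below σk+1 of the original system. A sketch with hypothetical random data:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def hankel_sv(A, B, C):
    """Hankel singular values: square roots of the eigenvalues of P Q."""
    P = solve_continuous_lyapunov(A, -B @ B.T)
    Q = solve_continuous_lyapunov(A.T, -C.T @ C)
    ev = np.sort(np.linalg.eigvals(P @ Q).real)[::-1]
    return np.sqrt(np.maximum(ev, 0.0))

np.random.seed(1)
n, k = 5, 2
A = np.diag([-1.0, -2.0, -3.0, -4.0, -5.0]) + np.triu(0.2*np.random.randn(n, n), 1)
B = np.random.randn(n, 1)
C = np.random.randn(1, n)
sigma = hankel_sv(A, B, C)

errs = []
for trial in range(5):                       # a few random stable order-k models
    Ak = np.diag(-np.random.rand(k) - 0.1)
    Bk = np.random.randn(k, 1)
    Ck = np.random.randn(1, k)
    Ae = np.block([[A, np.zeros((n, k))], [np.zeros((k, n)), Ak]])
    Be = np.vstack([B, Bk])
    Ce = np.hstack([C, -Ck])                 # realization of Sigma - Sigma_hat
    errs.append(hankel_sv(Ae, Be, Ce)[0])    # Hankel norm of the error
print(min(errs), sigma[k])                    # min error >= sigma_{k+1}
```

No matter how the order-k model is chosen, the computed error Hankel norm stays above σ3 = sigma[k], in line with the proposition.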
(c) The suboptimal and optimal approximants can be constructed using explicit formulae; the formulae obtained are explicit in the system parameters. See sections 8.4 and 8.5.

(d) It is interesting to notice that, while the AAK result settles the problem of optimal approximation of infinite-dimensional Hankel matrices of finite rank with Hankel matrices of lower rank, the problem of optimal approximation concerning finite-dimensional Hankel matrices is still open. A special case of this problem is treated in [13].

8.4 Construction of approximants

We are ready to give some of the formulae for the construction of suboptimal and optimal Hankel-norm approximants. As mentioned earlier, all formulae describe the construction of allpass dilation systems (see definition 8.2). Both optimal and suboptimal approximants are treated; the optimality of the approximants is with respect to the Hankel norm (the 2-induced norm of the Hankel operator). Section 8.5 gives an account of error bounds for the infinity norm of the approximants (the 2-induced norm of the convolution operator). We will discuss the following cases.

• An input-output construction applicable to scalar systems (section 8.4.1). The advantage of this approach is that the equations can be set up in a straightforward manner using the numerator and denominator polynomials of the transfer function of the given system. The drawback is that the proof of the main lemma is not easy in this framework.

• A state space based construction method for suboptimal approximants (section 8.4.4) and for optimal approximants (section 8.4.5) of square systems.

• A state space based parametrization of all suboptimal approximants for general (i.e., not necessarily square) systems (section 8.4.6).

8.4.1 A simple input-output construction method

Given the polynomials a = Σ_{i=0}^{α} ai s^i, b = Σ_{i=0}^{β} bi s^i, and c = Σ_{i=0}^{γ} ci s^i, satisfying c = ab, the coefficients of the product c are a linear combination of those of b: c = T(a) b, where c = (cγ cγ−1 · · · c1 c0)∗ ∈ R^{γ+1}, b = (bβ · · · b0)∗ ∈ R^{β+1}, and T(a) is a Toeplitz matrix of appropriate size with first column (aα · · · a0 0 · · · 0)∗ ∈ R^{γ+1} and first row (aα 0 · · · 0) ∈ R^{1×(β+1)}. Given a polynomial c with real coefficients, the polynomial c∗ is defined as c∗(s) = c(−s). We also define the sign matrix whose last diagonal entry is 1: J = diag(· · · , −1, 1). This means that c∗ = Jc.

The basic construction given in theorem 8.4 hinges on the construction of an allpass dilation Σ̂ of Σ. Let

  HΣ(s) = p(s)/q(s),  HΣ̂(s) = p̂(s)/q̂(s),  deg(p) ≤ deg(q) = n.

We require that the difference HΣ − HΣ̂ = HΣe be allpass. To motivate this in discrete time: we start with a system whose convolution operator SΣ is a (block) lower triangular Toeplitz matrix, and require that the difference SΣ − SΣ̂ = SΣe be unitary. We then compute a (block) Toeplitz matrix SΣ̂, which is no longer lower triangular. It then follows that the lower left-hand portion of SΣ̂, which is the Hankel matrix HΣ̂, has rank r and approximates the Hankel matrix HΣ, so that the 2-norm of the error satisfies (8.6). For continuous-time systems, the problem is: given ε, find polynomials p̂, q̂, of degree at most n, such that

  p/q − p̂/q̂ = ε q∗q̂∗/(q q̂) ⇔ p q̂ − q p̂ = ε q∗ q̂∗.
We illustrate the features of this approach by means of a simple example.

Example. Let Σ be a second order system, n = 2. The polynomial equation p q̂ − q p̂ = ε q∗ q̂∗ can be rewritten as a matrix equation involving the quantities defined above:

  T(p) q̂ − T(q) p̂ = ε T(q∗) q̂∗ = ε T(q∗) J q̂.

Collecting terms, we have

  (T(p) − ε T(q∗) J, −T(q)) (q̂; p̂) = 0.

If we normalize the coefficient of the highest power of q̂, q̂2 = 1, we obtain a linear system of equations W(ε) x = w(ε) for the remaining coefficients x = (q̂1, q̂0, p̂2, p̂1, p̂0)∗, where the entries of W(ε) are built from the coefficients qi and pi ± ε qi as above. This system can be solved for all ε which are not roots of the equation det W(ε) = 0. It can be shown that the roots of this determinant are the eigenvalues of the Hankel operator HΣ; since in the SISO case HΣ is self-adjoint (symmetric), the absolute values of its eigenvalues ε1, ε2 are the singular values σ1, σ2 of HΣ. There are thus two values of ε for which the determinant of W is zero. The solution of this set of linear equations provides the coefficients of the allpass dilation system Σ̂; thus this system can be solved both for the suboptimal case ε ≠ σi and for the optimal case ε = σi, and both suboptimal and optimal approximants can be computed this way. For an alternative approach along similar lines, see [120].

8.4.2 An analogue of Schmidt-Eckart-Young-Mirsky

In this section we prove proposition 8.3. Let (H, F, G) and (C, A, B) be realizations for Σ and Σ̂, where Σ has McMillan degree n and Σ̂ has McMillan degree k < n. A realization of the difference Σe = Σ − Σ̂ is given by

  Fe = (F 0; 0 A),  Ge = (G; B),  He = (H −C).

By definition,

  ‖Σ − Σ̂‖²H = λmax(Pe Qe),

where Pe, Qe ∈ R^{(n+k)×(n+k)} are the reachability and observability gramians of Σe, respectively, satisfying

  Fe Pe + Pe Fe∗ + Ge Ge∗ = 0,  Fe∗ Qe + Qe Fe + He∗ He = 0.

Consider the Cholesky factorization of Pe: Pe = R∗R, where

  R = (R11 R12; 0 R22) and Pe = (P11 P12; P12∗ P22).

The following relationships hold between the Pij and the Rij:

  P11 = R11∗ R11,  P12 = R11∗ R12,  P22 = R12∗ R12 + R22∗ R22.

Notice that

  λmax(Pe Qe) = λmax(R∗R Qe) = λmax(R Qe R∗) ≥ λmax((R Qe R∗)11),

where the subscript (·)11 denotes the (1,1) block of (·). Furthermore,

  (R Qe R∗)11 = (R11 R12) Qe (R11 R12)∗ = R11 (Qe)11 R11∗ + X,

where X = (R11 R12) (0 Q12; Q12∗ Q22) (R11 R12)∗ involves only the blocks Q12 = (Qe)12 and Q22 = (Qe)22 of Qe.
Since rank X ≤ k, we conclude that

  λmax(R11 (Qe)11 R11∗ + X) ≥ λk+1(R11 (Qe)11 R11∗) = σ²k+1(Σ).

This completes the proof of proposition 8.3.

8.4.3 Unitary dilation of constant matrices

In this section we investigate the problem of augmenting a matrix so that it becomes unitary, or augmenting two matrices so that their product becomes the identity. We proceed from the simple to the more involved cases.

1. Given σ ∈ R, |σ| < 1, find M = (σ ?; ? ?) ∈ R^{2×2} such that MM∗ = I. Solution:

  M = (σ √(1−σ²); √(1−σ²) −σ).

2. Given σ ∈ R, σ ≠ ±1, find M = (σ ?; ? ?), N = (σ ?; ? ?) ∈ R^{2×2} such that MN = I. Solution:

  M = (σ 1−σ²; 1 −σ),  N = M.

3. Given p, q ∈ R, pq ≠ 1, find M = (p ?; ? ?), N = (q ?; ? ?) such that MN = I. Solution:

  M = (p 1−pq; 1 −q),  N = (q 1−pq; 1 −p).

4. Given Σ = diag(σ1, σ2, · · · , σn), σi ≠ 1, find M = (Σ ?; ? ?), N = (Σ ?; ? ?) ∈ R^{2n×2n} such that MN = I. Solution: let Γ = I − Σ²; then

  M = N = (Σ Γ; I −Σ).

5. Given P, Q ∈ R^{n×n} with λi(PQ) ≠ 1, find Pe = (P ?; ? ?), Qe = (Q ?; ? ?) such that Pe Qe = I. Solution: let Γ = I − PQ; then

  Pe = (P Γ; Γ∗ −QΓ),  Qe = (Q I; I −Γ⁻¹P).  (8.11)

The last dilation (8.11) is now used to construct suboptimal approximants.
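These dilations are easy to verify numerically. In the sketch below (hypothetical values), the lower blocks of Pe in case 5, which the scan leaves ambiguous, are taken as Γ∗ and −QΓ; that this choice indeed yields Pe Qe = I is checked directly.

```python
import numpy as np

# Case 1: embed 0 < sigma < 1 into a 2x2 orthogonal matrix.
s = 0.6
c = np.sqrt(1 - s**2)
M = np.array([[s, c],
              [c, -s]])

# Case 5: dilate P, Q > 0 (with lambda_i(PQ) != 1) so that Pe Qe = I.
P = np.array([[2.0, 0.3], [0.3, 1.0]])
Q = np.array([[0.2, 0.1], [0.1, 0.3]])
I2 = np.eye(2)
G = I2 - P @ Q                                  # Gamma = I - PQ
Pe = np.block([[P, G], [G.T, -Q @ G]])
Qe = np.block([[Q, I2], [I2, -np.linalg.inv(G) @ P]])
print(np.max(np.abs(M @ M.T - np.eye(2))),
      np.max(np.abs(Pe @ Qe - np.eye(4))))      # both essentially zero
```

The identity Pe Qe = I holds even when P and Q do not commute, which is the situation arising from the two gramians of a system.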
8.4.4 Unitary system dilation: suboptimal case

Recall the definition of allpass or unitary systems given in section 5.8.3, page 131. Assume, after normalization, that σ_k > 1 > σ_{k+1}.

Problem. Given Σ = (A, B, C), p = m, find Â, B̂, Ĉ, D̂ such that

  A_e = [ A 0 ; 0 Â ],  B_e = [ B ; B̂ ],  C_e = ( C  −Ĉ ),  D_e = −D̂,   (8.12)

defines a unitary (allpass) system Σ_e.

Solution. The conditions for the dilation to be allpass, given in part 3 of theorem 5.17, are

  (i) D_e* D_e = I,  (ii) Q_e B_e + C_e* D_e = 0,  (iii) A_e* Q_e + Q_e A_e + C_e* C_e = 0.

Following the dilation (8.11), we define

  Q_e = [ Q  I ; I  −Γ^{−1}P ],  Γ = I − PQ.   (8.13)

Solving these relationships for the quantities sought, we obtain

  B̂ = −QB + C* D̂,  Ĉ = (CP − D̂ B*)(Γ*)^{−1},  Â = −A* + C* Ĉ,   (8.14)

where D̂ is any unitary matrix, D̂* D̂ = I. Let Ĥ(s) = D̂ + Ĉ(sI − Â)^{−1}B̂, and decompose Ĥ into a stable and an antistable part: Ĥ = Ĥ_+ + Ĥ_−. First notice that Q̂ = −(Γ*)^{−1}P satisfies Â* Q̂ + Q̂ Â + Ĉ* Ĉ = 0. Hence, if σ_{k+1} < 1 < σ_k, then −Γ has k positive eigenvalues, and consequently −(Γ*)^{−1}P has k positive eigenvalues; since the inertia of Â is determined by that of Q̂, we conclude that Â has k eigenvalues in the left half of the complex plane C_−, so Ĥ_+ has McMillan degree k. Then, due to proposition 8.4 and the allpass property of the dilation, the desired inequality

  σ_{k+1} < ‖H − Ĥ_+‖_H < 1

holds true.

8.4.5 Unitary system dilation: optimal case

We now discuss the construction of solutions in the optimal case, i.e. the case where 1 is a singular value of multiplicity r of the system to be approximated. Let A, B, C and the gramians be partitioned as follows:

  A = [ A_11 A_12 ; A_21 A_22 ],  B = [ B_11 ; B_21 ],  C = ( C_11  C_12 ),
  P = [ I_r 0 ; 0 P_22 ],  Q = [ I_r 0 ; 0 Q_22 ],   (8.15)

where A_11 ∈ R^{r×r} and I_r denotes the r × r identity matrix. First, notice that A_11, B_11, C_11 satisfy

  A_11 + A_11* + B_11 B_11* = 0,  A_11 + A_11* + C_11* C_11 = 0.

Consequently B_11 B_11* = C_11* C_11, which implies the existence of a unitary matrix D̂ such that

  B_11 D̂ = C_11*.

Using formulae (8.14), we construct an allpass dilation of the subsystem (A_22, B_21, C_12), namely

  B̂ = −Q_22 B_21 + C_12* D̂,  Ĉ = (C_12 P_22 − D̂ B_21*)(Γ_2*)^{−1},  Â = −A_22* + C_12* Ĉ,   (8.16)

where Γ_2 = I − P_22 Q_22 ∈ R^{(n−r)×(n−r)}.

Lemma 8.5. The system A_e, B_e, C_e, D_e defined by (8.12) and (8.16) is allpass. Moreover Â has k eigenvalues in the left-half plane and n − r − k eigenvalues in the right-half plane.

Hence, unlike the suboptimal case, D̂ is not arbitrary. In the single-input single-output case D̂ is completely determined by the above relationship (it is either +1 or −1) and hence there is a unique optimal approximant. This is not true for systems with more than one input and/or more than one output.
Example 8.4.2. Consider a system of McMillan degree 4 with Hankel singular values σ_1 > σ_2 > σ_3 > σ_4 > 0. Let δ denote the McMillan degree of the allpass dilation system Σ̂, and δ_+ the McMillan degree of its stable part Σ̂_+; that is, δ denotes the number of poles of the dilation system while δ_+ denotes the number of its stable poles. The figures referred to below show δ and δ_+ as functions of γ.

[Figure 8.2. Hankel norm approximation: dimension δ of the allpass dilation systems (left panel) and McMillan degree δ_+ of the stable part of the allpass dilation systems (right panel); the abscissa is γ, with the values σ_4, σ_3, σ_2, σ_1 marked; the system to be approximated has dimension n = 4.]

8.4.6 All suboptimal solutions

Above we provided a way to construct one suboptimal approximant of a square system. It is, however, rather straightforward to provide a parametrization of all suboptimal approximants of an arbitrary system Σ. The exposition below follows section 24.3 of the book by Ball, Gohberg, and Rodman [38].

Given the system Σ = (A, B, C) with m inputs and p outputs, let σ_k(Σ) > ε > σ_{k+1}(Σ). The following rational matrix Θ(s) has dimension (p+m) × (p+m):

  Θ(s) = I_{p+m} − H_Θ (sI − F_Θ)^{−1} Q_Θ^{−1} H_Θ* J,

where

  H_Θ = [ C 0 ; 0 B* ] ∈ R^{(p+m)×2n},  F_Θ = [ A 0 ; 0 −A* ] ∈ R^{2n×2n},
  J = [ I_p 0 ; 0 −I_m ] ∈ R^{(p+m)×(p+m)},  Q_Θ = [ (1/ε²)Q  I_n ; I_n  P ] ∈ R^{2n×2n},
  Γ = I − (1/ε²) PQ.   (8.17)

The inverse of Q_Θ is

  Q_Θ^{−1} = [ −Γ^{−1}P  Γ^{−1} ; Γ^{−*}  −(1/ε²) QΓ^{−1} ],

thus

  Θ(s) = [ I_p + C(sI−A)^{−1}Γ^{−1}P C*      C(sI−A)^{−1}Γ^{−1}B ;
           −B*(sI+A*)^{−1}Γ^{−*}C*      I_m − (1/ε²) B*(sI+A*)^{−1}QΓ^{−1}B ].

Notice that by construction Θ is J-unitary on the jω axis. Define

  [ Φ_1(∆) ; Φ_2(∆) ] = Θ [ ∆ ; I ] = [ Θ_11(s)∆(s) + Θ_12(s) ; Θ_21(s)∆(s) + Θ_22(s) ].   (8.18)

The following result holds.

Theorem 8.6. Σ̂ is an approximant of Σ with k stable poles satisfying ‖Σ − Σ̂‖_{L∞} < ε if, and only if, the associated transfer functions satisfy

  H(s) − Ĥ(s) = Φ_1(∆) Φ_2(∆)^{−1} = Z(s),

where Φ_i(∆), i = 1, 2, are defined by (8.18) and the L∞ norm of ∆ is less than 1, i.e. ∆(s) is a p × m antistable contraction.

Outline of proof. The proof can be divided into three parts.

(a) Let Z(∆) = Φ_1(∆)Φ_2(∆)^{−1}. We show that the number of LHP poles of Z is n + k + ν, where ν is the number of LHP poles of ∆. To prove this fact, we write Z as follows:

  Z = (Θ_11 ∆ + Θ_12) · (Θ_22^{−1}Θ_21 ∆ + I)^{−1} · Θ_22^{−1} = ψ_1 · ψ_2^{−1} · ψ_3^{−1}.

We examine the poles of each one of the above terms. ψ_1 has n + ν poles in the LHP. By a homotopy argument, ψ_2 has no zeros in the LHP and hence ψ_2^{−1} has no poles there. The last term can be treated as follows. A realization for ψ_3 is

  ψ_3 = [ −A*  (1/ε²)QΓ^{−1}B ; −B*  I_m ]  ⇒  ψ_3^{−1} = [ −A* + (1/ε²)QΓ^{−1}BB*  (1/ε²)QΓ^{−1}B ; B*  I_m ].

Using an inertia argument, −A* + (1/ε²)QΓ^{−1}BB* has exactly k eigenvalues in the LHP. This concludes the proof of (a).

(b) With Z as above and ∆ antistable, we can write Z = H − Ĥ for some Ĥ having k stable poles. We only show this for ∆ = 0. In this case

  Z(s) = C(sI − A)^{−1}Γ^{−1}B [ I_m − (1/ε²) B*(sI + A*)^{−1}QΓ^{−1}B ]^{−1}
       = C(sI − A)^{−1}Γ^{−1}B [ I_m + (1/ε²) B*(sI + A* − (1/ε²)QΓ^{−1}BB*)^{−1}QΓ^{−1}B ].

This cascade has a realization whose state matrix is block triangular with diagonal blocks A and −A* + (1/ε²)QΓ^{−1}BB*; applying an equivalence transformation T, block upper triangular with identity diagonal blocks (new state equal to T times the old state), the realization decouples into the parallel connection of (A, B, C), i.e. H itself, and a system with state matrix −A* + (1/ε²)QΓ^{−1}BB*, which by part (a) has exactly k stable poles. This proves the desired decomposition for ∆ = 0.

(c) Given Ĥ with k stable poles such that ‖H − Ĥ‖_{L∞} < ε, there exists an antistable contractive ∆ such that H − Ĥ = Z(∆). Let Φ_1 = H − Ĥ and Φ_2 = I. Solving equation (8.18) for ∆ we obtain

  ( I  ∆ ) = ( I  Φ_1 ) Θ̄,  Θ̄ = [ −Θ_11  Θ_12 ; Θ_21  −Θ_22 ].

Since Θ is J-unitary, so is Θ̄. It remains to show that ∆ is antistable and has the correct L∞ norm (i.e. size less than one on the jω axis); this follows from part (a) of the proof above. For details see section 9.2 of [138].

8.5 Error bounds for optimal approximants

We now concentrate on error bounds which exist for optimal Hankel-norm approximants. Given is a square system Σ (m = p) which is stable; assume that H(∞) = 0. Let its distinct Hankel singular values be as defined by (5.22) on page 116:

  σ_1(Σ) > · · · > σ_q(Σ),  each with multiplicity m_i, i = 1, …, q,  m_1 + · · · + m_q = n.

Let Σ̂ be an optimal allpass dilation of Σ defined by (8.16). This system is decomposed into its stable and antistable parts:

  Σ̂ = Σ̂_+ + Σ̂_−,

where the poles of Σ̂_+ are all in the LHP and those of Σ̂_− are in the RHP. Furthermore, σ_i(Σ̂_+) = σ_i(Σ), i = 1, …, n − r_q, and Σ − Σ̂ is allpass with magnitude σ_q. This fact implies the following decomposition of the transfer function H:

  H(s) = D_0 + σ_1 H_1(s) + · · · + σ_q H_q(s),   (8.19)

where H_k(s), k = 1, …, q, are stable and allpass, and the McMillan degree of σ_1 H_1(s) + · · · + σ_k H_k(s) is m_1 + · · · + m_k. From the above decomposition we can derive the following upper bound on the H∞ norm of Σ:

  ‖Σ‖_{H∞} ≤ 2(σ_1 + · · · + σ_q);   (8.20)
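The bound (8.20), and the truncation bound that follows from it below, are easy to probe numerically. The sketch below (NumPy/SciPy) uses a randomly generated stable SISO system; the system and the truncation order are arbitrary illustrative choices. Since the maximum of |H(jω)| over a frequency grid is a lower bound for the true H∞ norm, the assertions are conservative in the right direction.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

rng = np.random.default_rng(1)
n, k = 6, 2
A = rng.standard_normal((n, n))
A -= (np.linalg.eigvals(A).real.max() + 1.0) * np.eye(n)     # shift to make A stable
B = rng.standard_normal((n, 1))
C = rng.standard_normal((1, n))

P = solve_continuous_lyapunov(A, -B @ B.T)                   # reachability gramian
Q = solve_continuous_lyapunov(A.T, -C.T @ C)                 # observability gramian
hsv = np.sort(np.sqrt(np.linalg.eigvals(P @ Q).real))[::-1]  # Hankel singular values

def freq_resp(A, B, C, w):
    return np.array([(C @ np.linalg.solve(1j*wi*np.eye(len(A)) - A, B))[0, 0]
                     for wi in w])

w = np.logspace(-3, 3, 1500)
assert np.abs(freq_resp(A, B, C, w)).max() <= 2 * hsv.sum() + 1e-8   # bound (8.20)

# balanced truncation to order k (square-root algorithm) and its error bound
U = np.linalg.cholesky(P)                       # P = U U^T
S2, V = np.linalg.eigh(U.T @ Q @ U)
idx = np.argsort(S2)[::-1]
S, V = np.sqrt(S2[idx]), V[:, idx]
T, Ti = U @ V @ np.diag(S**-0.5), np.diag(S**0.5) @ V.T @ np.linalg.inv(U)
Ab, Bb, Cb = (Ti @ A @ T)[:k, :k], (Ti @ B)[:k], (C @ T)[:, :k]

err = np.abs(freq_resp(A, B, C, w) - freq_resp(Ab, Bb, Cb, w))
assert err.max() <= 2 * hsv[k:].sum() + 1e-8    # twice the sum of neglected HSVs
```

The second assertion is the balanced-truncation bound of the form 2(σ_{k+1} + · · · + σ_n) discussed in the text.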
furthermore, there exists D_0 ∈ R^{m×m} such that

  ‖H(s) − D_0‖_{H∞} ≤ σ_1 + · · · + σ_q.

Let H_{b,k} denote a balanced approximant of McMillan degree m_1 + · · · + m_k, and let H_{h,k} denote an optimal Hankel approximant of McMillan degree k. In addition to the approximant Ĥ constructed above, which will be denoted by H_h in the sequel, consider the balanced truncated system (A_22, B_21, C_12), whose transfer function will be denoted by H_b. It follows that H_b(s) − H_h(s) is allpass with norm σ_q. A consequence of the above inequalities is that the error for reduction by balanced truncation can be upper-bounded by means of the singular values of Σ, and that both the H∞ and the Hankel norms of the error system have the same upper bound:

  ‖H(s) − H_{b,k}(s)‖_{H∞},  ‖H(s) − H_{b,k}(s)‖_H ≤ 2(σ_{k+1} + · · · + σ_q).   (8.21)

Below we assume that each Hankel singular value has multiplicity equal to one: m_i = 1, i = 1, …, n. The Hankel singular values of the error can be explicitly determined:

  σ_i(H − H_{h,k}) = σ_{k+1}(H),  i = 1, …, 2k+1,
  σ_{2k+1+j}(H − H_{h,k}) = σ_j(H_{e−}) ≤ σ_{k+1+j}(H),  j = 1, …, n − k − 1.

Since we know explicitly the singular values of the stable part of the error system Σ_{e−}, we conclude

  ‖H − H_{h,k}‖_{H∞} ≤ 2(σ_{k+1} + · · · + σ_n).

The final set of inequalities, concerning the error of the approximation in the Hankel norm, follows from (8.20). Notice that the above formulae remain valid if we scale the singular values by γ, i.e. replace σ_i by σ_i γ.

Example 8.5.1. (Continuation.) Consider the system described by the transfer function

  Z(z) = (z² + 1)/z³.

As already mentioned, this is an FIR system. In the sequel we investigate the optimal approximation of this system in the Hankel norm. The corresponding Hankel matrix is

  H = [ 1 0 1 0 … ; 0 1 0 0 … ; 1 0 0 0 … ; 0 0 0 0 … ; ⋮ ].

The SVD of the 3 × 3 submatrix of H is

  [ 1 0 1 ; 0 1 0 ; 1 0 0 ] = [ a 0 −b ; 0 1 0 ; b 0 a ] · diag(σ_1, σ_2, σ_3) · [ a 0 b ; 0 1 0 ; b 0 −a ]*,   (8.22)

where a = sqrt(σ_1/√5), b = sqrt(σ_3/√5), and

  σ_1 = (√5 + 1)/2,  σ_2 = 1,  σ_3 = (√5 − 1)/2.
It is tempting to conjecture that the optimal second-order approximant is obtained by setting σ_3 = 0 in (8.22). The problem with this procedure is that the resulting approximant does not have Hankel structure. To compute the optimal approximant, we proceed as follows. First, we transform the system to a continuous-time system using the transformation of subsection 4.3.3; we obtain the transfer function

  G(s) = Z( (1+s)/(1−s) ) = 2(s³ − s² + s − 1)/(s + 1)³.

Applying the theory discussed in the preceding subsections, we obtain the following second-order continuous-time optimal approximant:

  G_2(s) = −(s² − 1) / ( (1 − σ_3)s² + 2σ_1 s + (1 − σ_3) ).

Again using the transformation of subsection 4.3.3, we obtain the following discrete-time optimal approximant:

  Z_2(z) = G_2( (z−1)/(z+1) ) = z/(z² − σ_3).

Notice that the optimal approximant is not an FIR (finite impulse response) system; it has poles at ±√σ_3. Furthermore, the error Z(z) − Z_2(z) is allpass with magnitude equal to σ_3 on the unit circle:

  Z(z) − Z_2(z) = σ_3 · (σ_3 z² − 1) / ( z³ (z² − σ_3) ).

The decomposition (8.19) of Z is:

  Z(z) = σ_1 · (1/z) + σ_2 · (1 − σ_3 z²)/( z (z² − σ_3) ) + σ_3 · (−1 + σ_3 z²)/( z³ (z² − σ_3) )
       = σ_1 Z_1(z) + σ_2 Z_2(z) + σ_3 Z_3(z).

Notice that each Z_i is allpass; the McMillan degree of Z_1 is one, that of σ_1 Z_1 + σ_2 Z_2 is two, and finally that of all three summands is three. The corresponding optimal Hankel matrix of rank 2 is:

  Ĥ = [ 1 0 σ_3 0 σ_3² … ; 0 σ_3 0 σ_3² 0 … ; σ_3 0 σ_3² 0 σ_3³ … ; 0 σ_3² 0 σ_3³ 0 … ; ⋮ ].

Furthermore, Ĥ is also an optimal approximant of the corresponding submatrix of H. In this particular case the 3 × 3 submatrix of Ĥ is

  [ 1 0 σ_3 ; 0 σ_3 0 ; σ_3 0 σ_3² ] = [ a 0 −b ; 0 1 0 ; b 0 a ] · diag(1 + σ_3², σ_3, 0) · [ a 0 −b ; 0 1 0 ; b 0 a ]*,   (8.23)

with a, b as in (8.22). This decomposition is an application of formula (3.16), which provides a class of minimizers. It is straightforward to verify that the corresponding singular values of (8.22) and (8.23) differ by

  η_1 = σ_1 − σ_3² − 1 = √5 − 2,  which implies η_1 < σ_3,

and

  η_2 = σ_2 − σ_3 = (3 − √5)/2,  which implies η_2 ≤ σ_3,

in accordance with the fact that the error has norm σ_3. Finally, it is readily checked that the Hankel matrix consisting of σ_1 as its (1,1) entry and 0 everywhere else, i.e. the Hankel matrix of σ_1 Z_1(z) = σ_1/z, is the optimal approximant of H of rank one.
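Both the singular values in (8.22) and the allpass property of the error Z − Z_2 can be confirmed with a few lines of NumPy:

```python
import numpy as np

s5 = np.sqrt(5.0)
sigma = np.array([(s5 + 1)/2, 1.0, (s5 - 1)/2])

# singular values of the 3x3 Hankel submatrix of Z(z) = (z^2 + 1)/z^3
H3 = np.array([[1., 0., 1.],
               [0., 1., 0.],
               [1., 0., 0.]])
assert np.allclose(np.linalg.svd(H3, compute_uv=False), sigma)

# the error Z - Z2 is allpass of magnitude sigma_3 on the unit circle
s3 = sigma[2]
theta = np.linspace(0, 2*np.pi, 400, endpoint=False)
z = np.exp(1j*theta)
Z  = (z**2 + 1)/z**3
Z2 = z/(z**2 - s3)
assert np.allclose(np.abs(Z - Z2), s3)
```

The second assertion exploits the identity σ_3² = 1 − σ_3 satisfied by the golden-ratio conjugate, which is what makes the error numerator proportional to σ_3 z² − 1.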
Example 8.5.2. A continuous-time suboptimal approximation. Consider the system Σ given by (4.13), where n = 2, m = p = 1 and

  A = − [ 1/(2σ_1)  1/(σ_1+σ_2) ; 1/(σ_1+σ_2)  1/(2σ_2) ],  B = [ 1 ; 1 ],  C = ( 1  1 ),  D = 0,

where σ_1 > σ_2 > 0. This system is in balanced canonical form (this canonical form is a special case of the forms discussed in section 8.6.3); this means that the gramians are P = Q = S = diag(σ_1, σ_2). In order to determine the suboptimal Hankel-norm approximants for σ_1 > ε > σ_2, from (8.17) we have Γ = ε²I − S² = diag(ε² − σ_1², ε² − σ_2²), and from (8.14), choosing D̂ = ε:

  Â = [ (ε−σ_1)/(2σ_1(ε+σ_1))    (ε−σ_1)/((σ_1+σ_2)(ε+σ_2)) ;
        (ε−σ_2)/((σ_1+σ_2)(ε+σ_1))    (ε−σ_2)/(2σ_2(ε+σ_2)) ],
  B̂ = [ ε−σ_1 ; ε−σ_2 ],  Ĉ = ( −1/(ε+σ_1)  −1/(ε+σ_2) ),  D̂ = ε.

The formulae (8.14) depend on the choice of D̂; here it may equally be chosen as −ε. Since the inertia of Â is equal to the inertia of −Γ, and Γ has one negative and one positive eigenvalue, Â has one stable and one unstable pole (this can also be checked directly by noticing that the determinant of Â is negative).

Finally, we compute the limit of this family of approximants as ε → σ_2. We obtain

  Â = [ (σ_2−σ_1)/(2σ_1(σ_1+σ_2))  (σ_2−σ_1)/(2σ_2(σ_1+σ_2)) ; 0  0 ],
  B̂ = [ σ_2−σ_1 ; 0 ],  Ĉ = ( −1/(σ_1+σ_2)  −1/(2σ_2) ),  D̂ = σ_2.

This system is not reachable but observable (i.e., there is a pole-zero cancellation in the transfer function). A state-space representation of the reachable and observable subsystem is, after the pole-zero cancellation,

  Ā = (σ_2−σ_1)/(2σ_1(σ_1+σ_2)),  B̄ = (σ_2−σ_1)/(σ_1+σ_2),  C̄ = −1,  D̄ = σ_2.

Similarly, if ε → σ_1, the limit still exists and gives, after a pole-zero cancellation, the following reachable and observable approximant:

  Ā = (σ_1−σ_2)/(2σ_2(σ_1+σ_2)),  B̄ = (σ_1−σ_2)/(σ_1+σ_2),  C̄ = −1,  D̄ = σ_1.

This is the best antistable approximant of Σ, i.e., the Nehari solution (see (8.10) on page 208).

Example 8.5.3. Balancing and Hankel-norm approximation applied to lowpass filters. Four types of analog filters will be approximated next. In each case we consider 20th-order lowpass filters, with pass-band gain equal to 1 and cutoff frequency normalized to 1. The filters are:

  Σ_B    Butterworth
  Σ_C1   Chebyshev-1: 1% ripple in the pass band
  Σ_C2   Chebyshev-2: 1% ripple in the stop band
  Σ_E    Elliptic: 0.1% ripple both in the pass and stop bands

In this example the subscript "bal" stands for approximation by balanced truncation, the subscript "hank" stands for optimal Hankel-norm approximation, "FOM" stands for "full order model", and "ROM" stands for "reduced order model".

Figure 8.3 shows the Hankel singular values of the full-order models. Observe that for the Chebyshev-2 filter the difference of the singular values σ_13 − σ_20 is of the order 10^{−7}; thus Σ_C2 has an 8th-order (approximately) allpass subsystem of magnitude 0.05. Similarly, the Chebyshev-1 filter has an allpass subsystem of order 3. It follows from these plots that, in order to obtain roughly comparable approximation errors, Σ_B and Σ_C2 have to be approximated by systems of lower order than Σ_C1 and Σ_E; we thus choose to approximate Σ_B and Σ_C2 by 8th-order models, and Σ_C1 and Σ_E by 10th-order models.

Figure 8.4 gives the amplitude Bode plots of the error systems; the accompanying table compares the peak values of these Bode plots with the lower and upper bounds predicted by the theory. We observe that the 10th-order Hankel-norm approximants of Σ_C1 and Σ_E are not very good in the stop band; one way of improving them is to increase the approximation order, another is to compute weighted approximants.
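The construction in Example 8.5.2 can be checked numerically. The sketch below (NumPy/SciPy) instantiates the balanced canonical form with the illustrative values σ1 = 2, σ2 = 1/2 and ε = 1, builds (Â, B̂, Ĉ, D̂) from the formulas above, and verifies that the error H − Ĥ has constant magnitude ε on the imaginary axis.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

s1, s2, eps = 2.0, 0.5, 1.0          # sigma_1 > eps > sigma_2
A = -np.array([[1/(2*s1), 1/(s1+s2)],
               [1/(s1+s2), 1/(2*s2)]])
B = np.array([[1.0], [1.0]])
C = np.array([[1.0, 1.0]])

# the realization is balanced: P = Q = diag(sigma_1, sigma_2)
P = solve_continuous_lyapunov(A, -B @ B.T)
assert np.allclose(P, np.diag([s1, s2]))

# suboptimal allpass dilation with D_hat = eps
Ah = np.array([[(eps-s1)/(2*s1*(eps+s1)), (eps-s1)/((s1+s2)*(eps+s2))],
               [(eps-s2)/((s1+s2)*(eps+s1)), (eps-s2)/(2*s2*(eps+s2))]])
Bh = np.array([[eps-s1], [eps-s2]])
Ch = np.array([[-1/(eps+s1), -1/(eps+s2)]])
Dh = eps

# |H(jw) - Hhat(jw)| = eps for all w: the error is allpass of magnitude eps
w = np.logspace(-2, 2, 300)
err = []
for wi in w:
    H  = (C  @ np.linalg.solve(1j*wi*np.eye(2) - A,  B ))[0, 0]
    Hh = Dh + (Ch @ np.linalg.solve(1j*wi*np.eye(2) - Ah, Bh))[0, 0]
    err.append(abs(H - Hh))
assert np.allclose(err, eps)
```

Note also that Â here has negative determinant, i.e. one stable and one unstable eigenvalue, in agreement with the inertia argument in the example.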
[Figure 8.3. ANALOG FILTER APPROXIMATION: Hankel singular values of the full-order models; four panels: HSV Butterworth N=20, HSV Chebyshev-1 N=20, HSV Chebyshev-2 N=20, HSV Elliptic N=20.]

[Figure 8.4. ANALOG FILTER APPROXIMATION: Bode plots of the error systems for model reduction by optimal Hankel-norm approximation (continuous curves), balanced truncation (dash-dot curves), and the upper bound (8.21) (dashed curves); four panels: Butterworth FOM − ROM, Chebyshev-1 FOM − ROM, Chebyshev-2 FOM − ROM, Elliptic FOM − ROM.]

[Table: H∞ norm of the error and bounds. For Σ_B and Σ_C2 (8th-order approximants) the columns are σ_9, ‖Σ − Σ_hank‖_∞, ‖Σ − Σ_bal‖_∞, and the upper bound 2(σ_9 + · · · + σ_20); for Σ_C1 and Σ_E (10th-order approximants) the columns are σ_11, ‖Σ − Σ_hank‖_∞, ‖Σ − Σ_bal‖_∞, and 2(σ_11 + · · · + σ_20).]

8.6 Balanced and Hankel-norm approximation: a polynomial approach*

In chapter 7 and in the preceding sections of the current chapter, we presented the theory of balanced and Hankel-norm approximations by employing state-space methods. In the sequel we present an approach based on polynomial methods. This approach provides new insights and connections between the concepts discussed above. The exposition is based on Fuhrmann [118].

8.6.1 The Hankel operator and its SVD*

In this section we restrict our attention to the single-input single-output case. Recall that L_2(jR) = H_2(C_−) ⊕ H_2(C_+), where H_2(C_−), H_2(C_+) contain functions which are analytic in C_−, C_+ respectively. The projections onto H_2(C_−), H_2(C_+) are denoted by P_−, P_+. Given φ ∈ L_2(jR), we use φ* to denote φ*(s) = φ(−s).

The Hankel operator with symbol φ ∈ L_2(jR) is defined as follows:

  H_φ : H_2(C_−) → H_2(C_+),  f ↦ H_φ f = P_+(φ f),

and its adjoint, the dual Hankel operator, as

  H_φ* : H_2(C_+) → H_2(C_−),  f ↦ H_φ* f = P_−(φ* f).

An immediate consequence of the above definitions is: P_+ ψ H_φ f = H_φ ψ f, ψ ∈ H_2(C_−). For the rest of this section we consider Hankel operators with stable, strictly proper rational symbol:

  H_φ,  φ = n/d ∈ H_2(C_+),  gcd(n, d) = 1.   (8.24)

In order to state the next result we need to introduce the space X_d of all strictly proper rational functions with denominator d:

  X_d = { a/d : deg a < deg d }.

X_d is a finite-dimensional linear space with dimension deg d. The shift operator in this space is defined as

  F_d( h(s)/d(s) ) = π_−( s · h(s)/d(s) ),

where π_+, π_− denote the projections onto the polynomial and the strictly proper part, respectively. It readily follows that the characteristic polynomial of this shift operator is d.

Proposition 8.7. Consider a Hankel operator H_φ satisfying (8.24), and its adjoint H_φ*.

1. The kernel of this Hankel operator is: ker H_φ = (d/d*) H_2(C_−).
2. The image of H_φ is: im H_φ = X_d.
3. The kernel of the adjoint Hankel operator is: ker H_φ* = (d*/d) H_2(C_+).
4. The image of the adjoint operator is: im H_φ* = X_{d*}.
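The shift F_d has a concrete matrix representation, acting on polynomial coordinates modulo d. The sketch below (NumPy) builds this matrix column by column for d(s) = s² + s + 1, the denominator of the example treated later in this section, and confirms that its characteristic polynomial is d.

```python
import numpy as np

d = np.array([1., 1., 1.])            # d(s) = s^2 + s + 1, highest power first
nu = len(d) - 1

# column i of Fd: coordinates of s * s^i mod d(s) in the basis {1, s, ..., s^(nu-1)}
Fd = np.zeros((nu, nu))
for i in range(nu):
    e = np.zeros(i + 2); e[0] = 1.0   # the polynomial s * s^i = s^(i+1)
    _, r = np.polydiv(e, d)           # remainder of division by d
    r = np.concatenate([np.zeros(nu - len(r)), r])
    Fd[:, i] = r[::-1]                # store in ascending-power order

# the characteristic polynomial of Fd equals d (monic, highest power first)
assert np.allclose(np.poly(Fd), d)
```

For this d one obtains Fd = [[0, −1], [1, −1]], the matrix that reappears in the worked example below.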
Next, we compute the singular values and corresponding singular vectors of the Hankel operator. A pair of functions f ∈ H_2(C_−), g ∈ H_2(C_+) is a Schmidt pair of H_φ corresponding to the singular value σ, provided that the following relationships are satisfied:

  H_φ f = σ g,  H_φ* g = σ f.

Since g ∈ im H_φ and f ∈ im H_φ*, the Schmidt pairs of H_φ satisfying (8.24) have the form

  ( p/d*,  p̂/d ),

where p, p̂ are polynomials of degree less than ν = deg d. Substituting, we obtain the following equations:

  P_+( (n/d)(p/d*) ) = σ p̂/d  ⇒  (n p)/(d d*) = σ p̂/d + π/d*  ⇒  n p = σ d* p̂ + d π,   (8.25)
  P_−( (n*/d*)(p̂/d) ) = σ p/d*  ⇒  (n* p̂)/(d* d) = σ p/d* + ξ/d  ⇒  n* p̂ = σ d p + d* ξ,   (8.26)

where the quantities π, ξ are polynomials with deg π, deg ξ < deg d = ν. By conjugating the second equation and eliminating n we obtain

  0 = σ d*( p̂ p̂* − p p* ) + d( π p̂* − ξ* p ),

while by conjugating the first and eliminating n* we obtain

  0 = σ d( p̂ p̂* − p p* ) + d*( π* p̂ − ξ p* ).

Since d is stable and d* is antistable, d and d* are coprime; hence the first equation implies that d divides p̂ p̂* − p p*, while the second implies that d* divides the same quantity. Therefore d d* divides p̂ p̂* − p p*; but the degree of p̂ p̂* − p p* is less than 2ν, so

  p̂ p̂* − p p* = 0  ⇒  p̂ = ε p*,  ε = ±1.

Using an argument similar to the above, we can show that if (p/d*, p̂/d) and (q/d*, q̂/d) are Schmidt pairs corresponding to the same singular value σ, then the quotient q̂/q = p̂/p remains independent of the particular Schmidt pair. Let (p, p̂) be a minimal-degree solution of (8.25) for some fixed σ. Then all other solutions of this equation for the same σ are given by

  (q, q̂) = (a p, a p̂),  deg a < deg d − deg p.   (8.27)

Proposition 8.8. The Schmidt pairs of H_φ corresponding to σ have the form

  ( a p/d*,  ε a p*/d ),  ε = ±1,  deg a < deg d − deg p.

Next, we discuss the issue of the multiplicity of singular values.
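Equation (8.25) is a polynomial identity and can be checked mechanically with NumPy's polynomial arithmetic. The sketch below does this for the symbol φ(s) = 1/(s² + s + 1) treated in the example at the end of this section, whose minimal-degree solution for the largest singular value is p(s) = s − 2λ₁, π(s) = λ₁s + λ₂, with λ₁ = (1+√5)/4 and λ₂ = (1−√5)/4.

```python
import numpy as np

l1, l2 = (1 + np.sqrt(5))/4, (1 - np.sqrt(5))/4

n  = np.array([1.0])             # n(s)  = 1
d  = np.array([1.0, 1.0, 1.0])   # d(s)  = s^2 + s + 1
ds = np.array([1.0, -1.0, 1.0])  # d*(s) = d(-s) = s^2 - s + 1
p  = np.array([1.0, -2*l1])      # p(s)  = s - 2*lambda_1
ps = np.array([-1.0, -2*l1])     # p*(s) = p(-s) = -s - 2*lambda_1
pi = np.array([l1, l2])          # pi(s) = lambda_1*s + lambda_2

lhs = np.polymul(n, p)                                        # n p
rhs = np.polyadd(l1 * np.polymul(ds, ps), np.polymul(d, pi))  # lambda d* p* + d pi
assert np.allclose(np.polysub(lhs, rhs), 0)                   # identity (8.25) holds
```

The identity pins down λ₁ as an eigenvalue of the matrix M constructed next.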
Thus

  ker( H_φ* H_φ − σ² I ) = { a p/d* : deg a < deg d − deg p }.

This implies that the multiplicity of a singular value is equal to the degree of d minus the degree of the minimal-degree p which satisfies (8.25):

  dim ker( H_φ* H_φ − σ² I ) = deg d − deg p.

Combining the above results, we can state the main result with regard to the SVD of H_φ.

Theorem 8.9. Given H_φ with φ = n/d ∈ H_2(C_+), n, d coprime: σ is a singular value of the Hankel operator H_φ, with ( p/d*, ε p*/d ), ε = ±1, a corresponding Schmidt pair, if and only if the following equation is satisfied:

  n p = λ d* p* + d π,  λ = ε σ,   (8.28)

where deg π = deg p < deg d = ν.

This is the eigenvalue equation which has to be solved in order to compute the singular values and singular vectors of the Hankel operator H_φ. Equation (8.28) is an eigenvalue equation modulo the polynomial d. It can be converted into a matrix equation, and subsequently solved, as follows. Let e_i, i = 1, …, ν, be a basis of the space X_d, identified with the polynomials of degree less than ν, and let F_d be the matrix representation of the shift in X_d, acting modulo d:

  F_d(r) = s · r(s) mod d(s).

Notice that the characteristic polynomial of F_d is d. Since d, d* are coprime, there exist polynomials a, b such that

  a d + b d* = 1;

thus the inverse of d*(F_d) is b(F_d). From (8.28) we obtain the matrix equation

  n(F_d) p = λ d*(F_d) p*,

where p, p* denote the vector representations of the polynomials p, p* in the basis e_i. This implies b(F_d) n(F_d) p = λ p*. Finally, let K be the map such that K p* = p; in the basis 1, s, …, s^{ν−1} the matrix representation of K is a sign matrix. Hence there holds

  M p = λ p,  where M = K b(F_d) n(F_d) ∈ R^{ν×ν}.   (8.29)

Theorem 8.10. With the notation above, let λ_i, i = 1, …, r, be the eigenvalues of M, with multiplicities μ_i, μ_1 + · · · + μ_r = ν, and let p_i be the corresponding minimal-degree solutions of the eigenvalue equation defined by (8.29). The singular values of H_φ and their multiplicities are

  σ_i = |λ_i|, with multiplicity μ_i, i = 1, …, r,

while the Schmidt pairs of H_φ corresponding to σ_i have the form

  ( a_i p_i/d*,  ε_i a_i p_i*/d ),  ε_i = λ_i/σ_i,  deg a_i < deg d − deg p_i.

Example 8.6.1. We consider again the system defined by

  φ(s) = 1/(s² + s + 1).

We choose the basis 1, s in X_d. Thus, since d = s² + s + 1 and d* = s² − s + 1, with b(s) = (s+1)/2 satisfying a d + b d* = 1 for a(s) = (1−s)/2,

  F_d = [ 0 −1 ; 1 −1 ],  n(F_d) = I_2,  b(F_d) = (1/2)[ 1 −1 ; 1 0 ],  K = [ 1 0 ; 0 −1 ].

Hence M in equation (8.29) is

  M = (1/2) [ 1 −1 ; −1 0 ].

The eigenvalues of M are the roots of λ² − (1/2)λ − 1/4 = 0:

  λ_1 = (1+√5)/4,  λ_2 = (1−√5)/4  ⇒  σ_1 = λ_1,  σ_2 = −λ_2.

The corresponding (unnormalized) eigenvectors are [ −2λ_1 ; 1 ] and [ −2λ_2 ; 1 ]. Therefore p_1(s) = s − 2λ_1, which implies π_1 = λ_1 s + λ_2, and p_2(s) = s − 2λ_2, which implies π_2 = λ_2 s + λ_1.
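These values can be cross-checked against the state-space characterization of the Hankel singular values as square roots of the eigenvalues of PQ; a minimal realization of φ(s) = 1/(s² + s + 1) is A = [[0, 1], [−1, −1]], B = [0; 1], C = (1 0):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

A = np.array([[0., 1.], [-1., -1.]])
B = np.array([[0.], [1.]])
C = np.array([[1., 0.]])

P = solve_continuous_lyapunov(A, -B @ B.T)       # reachability gramian
Q = solve_continuous_lyapunov(A.T, -C.T @ C)     # observability gramian
hsv = np.sort(np.sqrt(np.linalg.eigvals(P @ Q).real))[::-1]

s5 = np.sqrt(5.0)
assert np.allclose(hsv, [(1 + s5)/4, (s5 - 1)/4])

# the same numbers are the absolute eigenvalues of M = K b(Fd) n(Fd)
M = 0.5 * np.array([[1., -1.], [-1., 0.]])
assert np.allclose(np.sort(np.abs(np.linalg.eigvals(M)))[::-1], hsv)
```

Both routes give σ₁ = (1+√5)/4 and σ₂ = (√5−1)/4, as claimed.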
8.6.2 The Nehari problem and Hankel-norm approximants*

For simplicity we assume in the sequel that the Hankel operator is generic, in the sense that it has ν distinct singular values of multiplicity one:

  σ_1 > σ_2 > · · · > σ_ν > 0.

Thus there are polynomials p_i, π_i of degree equal to ν − 1, and signs ε_i = ±1, such that (8.28) is satisfied:

  n p_i = λ_i d* p_i* + d π_i,  λ_i = ε_i σ_i,  i = 1, …, ν.

Upon dividing by d · p_i, this equation becomes

  n/d − π_i/p_i = λ_i (d* p_i*)/(d p_i).   (8.30)

We now have the following main result.

Theorem 8.11. The k-th order stable and optimal Hankel-norm approximant φ_k of φ = n/d is given by

  φ_k = [ π_{k+1}/p_{k+1} ]_+,  k = 1, …, ν − 1,

where [ · ]_+ denotes the stable part of [ · ].

Corollary 8.12. Nehari's theorem. The best antistable approximant of φ = n/d is

  φ_0 = π_1/p_1.

The L∞ norm of the difference φ − φ_0 is equal to the Hankel norm of φ, namely σ_1.

Corollary 8.13. One-step reduction. The best stable approximant of φ = n/d of order ν − 1 is

  φ_{ν−1} = π_ν/p_ν.

The difference φ − φ_{ν−1} is stable and allpass; its H∞ and Hankel norms are equal to σ_ν.
Proof of corollary 8.13. The proof is based on the fact that the Hankel operator H_r with rational symbol r defined by

  r = (1/λ_ν) · ( (d* p_ν*)/(d p_ν) )^{−1}

has singular values σ_1^{−1} < σ_2^{−1} < · · · < σ_ν^{−1}, with corresponding Schmidt pairs built from p_i*/d* and p_i/d, i = 1, …, ν. By corollary 8.12, applied to this operator, the polynomial playing the role of p_1, namely p_ν*, is antistable; this shows that p_ν is stable, with degree ν − 1. Hence the difference between n/d and π_ν/p_ν is allpass with norm σ_ν:

  n/d − π_ν/p_ν = λ_ν (d* p_ν*)/(d p_ν)  ⇒  ‖n/d − π_ν/p_ν‖_H = ‖n/d − π_ν/p_ν‖_{H∞} = σ_ν.

The proof is thus complete.

Proof of corollary 8.12. In order to prove corollary 8.12, it suffices to show that the zeros of p_1 in (8.30) are all unstable. Let (f_1, g_1) be a Schmidt pair corresponding to the largest singular value of H_φ:

  H_φ f_1 = σ_1 g_1,  and  ‖H_φ‖_{2−ind} = σ_1.

Assume, to the contrary, that p_1 has stable zeros, and factorize p_1 into its stable and antistable parts, p_1 = [p_1]_+ [p_1]_−. Then

  f_1 = p_1/d* = ( [p_1]_+ / [p_1]_+* ) · ( [p_1]_+* [p_1]_− / d* ) = f_i · f_o,

where f_i is the inner factor (i.e. allpass) and f_o is the outer factor. The following string of equalities and inequalities holds:

  σ_1 ‖f_1‖ = ‖H_φ f_1‖ = ‖P_+ ( f_i · H_φ f_o )‖ ≤ ‖H_φ f_o‖ ≤ ‖H_φ‖ · ‖f_o‖ = σ_1 ‖f_1‖,

since ‖f_o‖ = ‖f_1‖ (f_i being allpass). Hence equality holds throughout, and the above relationships imply that f_o is a singular vector of the Hankel operator corresponding to the singular value σ_1. Since σ_1 has multiplicity one, f_o must be a multiple of f_1, which is impossible unless [p_1]_+ is a constant. Hence p_1 has only unstable zeros. The proof is complete.

Example 8.6.2. (Continuation.) Consider again the Hankel operator H_φ with symbol

  φ(s) = 1/(s² + s + 1).

The best antistable approximant and the best stable first-order approximant are, respectively,

  π_1/p_1 = (λ_1 s + λ_2)/(s − 2λ_1) = λ_1 + 1/(s − 2λ_1),
  π_2/p_2 = (λ_2 s + λ_1)/(s − 2λ_2) = λ_2 + 1/(s − 2λ_2).

Since 2λ_1 > 0, the single pole of the first approximant is indeed antistable, while 2λ_2 < 0 makes the second approximant stable.
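By (8.30), the errors of the two approximants in the continuation example are allpass of magnitudes σ₁ and σ₂ on the imaginary axis; a direct NumPy check:

```python
import numpy as np

l1 = (1 + np.sqrt(5))/4       # lambda_1 = sigma_1
l2 = (1 - np.sqrt(5))/4       # lambda_2 = -sigma_2

w = np.linspace(-20, 20, 801)
s = 1j*w
phi  = 1/(s**2 + s + 1)
phi0 = l1 + 1/(s - 2*l1)      # Nehari (antistable) approximant
phi1 = l2 + 1/(s - 2*l2)      # best stable first-order approximant

assert np.allclose(np.abs(phi - phi0), l1)    # |error| = sigma_1 everywhere
assert np.allclose(np.abs(phi - phi1), -l2)   # |error| = sigma_2 everywhere
```

Constant error modulus on the axis is exactly the allpass property asserted by (8.30).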
8.6.3 Balanced canonical form*

Recall equation (8.30). Let the p_i be normalized so that ‖p_i*/d‖_2² = σ_i, and assume without loss of generality that λ_1 > 0. The function φ can always be decomposed as a linear combination of the singular vectors of the Hankel operator H_φ; namely, there exist γ_i ∈ R, i = 1, …, ν, such that

  n/d = γ_1 (p_1*/d) + γ_2 (p_2*/d) + · · · + γ_ν (p_ν*/d).   (8.31)

Then φ has a minimal realization

  ( A_bal  B_bal ; C_bal  D_bal ) ∈ R^{(ν+1)×(ν+1)},   (8.32)

of the form discussed in chapter 7, with entries

  (A_bal)_ij = − ε_i γ_i γ_j / (ε_i σ_i + ε_j σ_j),  (B_bal)_i = ε_i γ_i,  (C_bal)_j = γ_j,  D_bal = φ(∞);

in particular the diagonal entries are (A_bal)_ii = −γ_i²/(2σ_i), and when all signs ε_i are positive the off-diagonal entries reduce to −γ_i γ_j/(σ_i + σ_j).

Remark 8.6.1. The above result shows an additional close connection between the Hankel operator H_φ and balancing: the balanced realization is the realization obtained by using as basis in X_d (this space can be chosen as state space for a realization of φ) the Schmidt vectors of H_φ.

Remark 8.6.2. Another important connection between balancing and the Schmidt pairs of H_φ is that the Nehari extension, that is, the solution of the Hankel-norm approximation problem for antistable approximants, can be written down explicitly in terms of the balanced realization of φ. For this purpose define

  ω_i = sqrt( (λ_1 − λ_i)/(λ_1 + λ_i) ),  i = 2, …, ν.

Then

  π_1/p_1 = λ_1 + C_neh (sI − A_neh)^{−1} B_neh,   (8.33)

where the realization (A_neh, B_neh, C_neh, D_neh) is obtained from (8.32) by deleting the first row and column and replacing γ_i by ω_i γ_i, i = 2, …, ν, with the signs adjusted so that A_neh is antistable, and D_neh = λ_1. In the example discussed above,

  ω_2 = sqrt( (λ_1 − λ_2)/(λ_1 + λ_2) ) = 5^{1/4}.

Recalling that λ_1 λ_2 = −1/4, the realization of π_1/p_1 according to (8.33) is

  ( 2λ_1  1 ; 1  λ_1 ),

which agrees with the expression for π_1/p_1 obtained previously.

8.6.4 Error formulae for balanced and Hankel-norm approximations*

Let

  g_b = n_b/d_b,  g_h = π_ν/p_ν

be the reduced-order models of order ν − 1 obtained from φ by balanced truncation and by Hankel-norm approximation, respectively. The following relation holds:

  σ_ν ( d d_b* + d* d_b ) = p_ν p_ν*.

Furthermore,

  e_h = φ − g_h = λ_ν (d* p_ν*)/(d p_ν)  ⇒  e_hb = g_h − g_b = λ_ν (d_b* p_ν*)/(d_b p_ν),

and

  e_b = φ − g_b = e_h + e_hb = λ_ν (p_ν*/p_ν) ( d*/d + d_b*/d_b ).

These relationships imply that the error e_b is the sum of two allpass functions, so that its norm is at most 2σ_ν, while the errors e_h and e_hb are allpass with norm σ_ν.

8.7 Chapter summary

Although approximation by balanced truncation enjoys two important properties (namely, preservation of stability and existence of an error bound), it is not optimal in any norm. The most natural norm in which to seek optimality is the two-induced norm of the convolution operator S associated with the linear system Σ in question, in other words the H∞ norm of Σ. This problem, however (except in very special cases), remains unsolved. If this norm is replaced by the two-induced norm of the Hankel operator H associated with Σ, the optimal approximation problem can be solved. Recall that the Hankel norm is the two-induced norm of the map which assigns past inputs into future outputs, and is different from the H∞ norm (less than or equal to it in value). The preceding sections were accordingly dedicated to a detailed discussion of system approximation in the Hankel norm.

The key construction for solving this problem is the allpass or unitary dilation of the system Σ which is to be approximated. This means that Σ is embedded in a larger system (i.e., a system with more states) which is allpass (unitary) and denoted by Σ_e. The beauty of the theory lies in the fact that the norm of the allpass system determines the order of the optimal (or suboptimal) approximants. It should be stressed that, as far as the H∞ norm of the error system is concerned, optimal approximation in the Hankel norm need not provide better approximants than balanced approximation.

In the latter part of the chapter, a polynomial approach to this theory is presented. Although it uses a framework not discussed elsewhere in this book, it is included because it yields new insights, and it can be omitted without loss of continuity. For instance, it is interesting to note that the decomposition of the transfer function in terms of singular (Schmidt) vectors yields the coefficients of a balanced realization of the underlying system.
Chapter 9

Special topics in SVD-based approximation methods

The preceding chapter was dedicated to a detailed discussion of system approximation in the Hankel norm. A model reduction method which is in wide use is POD (Proper Orthogonal Decomposition). This can be considered as an application of the singular value decomposition (SVD) to the approximation of general dynamical systems. The starting point for this method is an input function and/or an initial condition. The resulting state trajectory, which lives in R^n, where n is the number of state variables needed to describe the system, is measured. The issue then becomes whether it is possible to approximate this state trajectory with one living in a lower dimensional space R^k, where k is smaller than n. Towards this goal a projector, known as a Galerkin or Petrov–Galerkin projector, is constructed, and the original dynamical system of dimension n is reduced to one which has dimension k. The problem which arises is to determine how well this reduced system approximates trajectories other than the measured one.

Subsequently another widely used approximation method, namely modal approximation, is discussed. This is an EVD (Eigenvalue Decomposition) based approximation method; it may fail, however, if A has Jordan blocks, and may lead to approximants which are not close to the original system.

The chapter concludes with a study of the decay rate of the Hankel singular values of a system Σ. This is important because the sum of the neglected Hankel singular values provides error bounds for balanced and Hankel-norm approximants; thus the faster the decay of these singular values, the tighter the error bound.

9.1 Proper Orthogonal Decomposition (POD)

The purpose of this section is to discuss a model reduction method known as POD, which is short for Proper Orthogonal Decomposition. Below we describe this method and show its connection with both the SVD (Singular Value Decomposition) and balanced truncation. In the next two sections 9.1.1 and 9.1.2 we provide a brief exposition of the POD method and the associated Galerkin projection. Section 9.1.3 examines the similarities between POD and balancing and proposes a method for obtaining a global error bound, that is, an error bound which is valid for trajectories other than the ones used to obtain the projector; this approach (inspired by balanced truncation) concludes that projection onto the dominant eigenspace of the product of two gramians leads, in the linear case, to a global error bound. In section 9.3 we provide a general view of approximation by truncation and the associated approximation by residualization.

9.1.1 POD

Given a function x(t, w) of time t and the vector-valued variable w, let x(t_i, w_j) be a finite number of samples of x, for t = t_1, ..., t_N and w = w_1, ..., w_n. The data is collected in time-snapshots:

  x_i = ( x(t_i, w_1), ..., x(t_i, w_n) )^T ∈ R^n,  i = 1, ..., N.
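A minimal numerical sketch of the snapshot setup just described (the toy system, sizes, and sampling are mine, not from the book): sample a trajectory of a stable linear system at N time instants, stack the snapshots x_i as columns of X, and compute the SVD; the 2-norm error of the best rank-k reconstruction equals the (k+1)-st singular value, anticipating the Schmidt–Mirsky result invoked below.

```python
import numpy as np
from scipy.linalg import expm

# Toy stable system x' = A x sampled at N instants (all values illustrative).
n, N, k = 8, 40, 3
rng = np.random.default_rng(0)
M = rng.standard_normal((n, n))
A = 0.5 * (M - M.T) - np.eye(n)      # real parts of eigenvalues are -1: stable
x0 = rng.standard_normal(n)

ts = np.linspace(0.0, 4.0, N)
X = np.column_stack([expm(A * t) @ x0 for t in ts])   # snapshot matrix, n x N

U, s, Vt = np.linalg.svd(X, full_matrices=False)
Xk = U[:, :k] * s[:k] @ Vt[:k, :]    # best rank-k approximant (POD)

# Schmidt-Mirsky: the 2-norm error of the best rank-k approximant is s[k]
err = np.linalg.norm(X - Xk, 2)
print(err, s[k])
```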
We are looking for a set of orthonormal basis vectors u_j ∈ R^n, j = 1, ..., N, such that

  x_i = Σ_{j=1}^{N} γ_ji u_j ,  i = 1, ..., N,

that is,

  X = [x_1 ... x_N] = [u_1 ... u_N] Γ,  Γ = (γ_ji),  U^*U = I_N.  (9.1)

The u_i are sometimes referred to as empirical eigenfunctions or principal directions of the "cloud" of data {x_i}, and equation (9.1) is known as the proper orthogonal decomposition of the family {x_i}. In addition it is required that the truncated elements

  x̂_i = Σ_{j=1}^{k} γ_ji u_j ,  k < N,

approximate the elements of the family {x_i} optimally, in some average sense; namely, it is required that the 2-induced norm of the difference X − X̂ be minimized, where

  X̂ = [x̂_1 ... x̂_N] = [u_1 ... u_k] Γ_k ∈ R^{n×N}

collects the snapshots reconstructed from only k empirical eigenfunctions. The above optimality condition can also be defined by means of the gramian (or autocorrelation matrix) of the data, namely

  P = Σ_{i=1}^{N} x_i x_i^* ∈ R^{n×n}.

The optimization problem can then be formulated as a matrix approximation problem: given P, find P̂ = Σ_{i=1}^{N} x̂_i x̂_i^* with k = rank P̂ < rank P, such that the 2-induced norm of the error P − P̂ is minimized.

The problem just stated is precisely the one solved by the Schmidt–Mirsky–Eckart–Young theorem 3.4 on page 35. Given the matrix of snapshots X = [x(t_1) x(t_2) ... x(t_N)], let X = UΣV^* be its SVD. The columns of U are the desired empirical eigenfunctions, and the coefficients are given by Γ = ΣV^*. If the number of spatial samples n is bigger than that of the time samples N, the computation of the SVD of X can be achieved by solving the (small) eigenvalue problem X^*X = VΣ²V^*, assuming that Σ ∈ R^{N×N}; the columns of XV are then, up to scaling, the required eigenvectors, namely U = XVΣ^{−1}.

9.1.2 Galerkin and Petrov–Galerkin projections

The next concept is that of a projection. Recall the dynamical system defined by (1.1) on page 6:

  Σ:  d/dt x(t) = f(x(t), u(t)),  y(t) = g(x(t), u(t)).  (1.1)

Using the projection Π = VW^*, where W^*V = I_k, V, W ∈ R^{n×k}, we obtain the reduced order dynamical system

  Σ̂:  d/dt x̂ = W^* f(Vx̂, u),  ŷ = g(Vx̂, u),  where x̂ = W^*x.

If V = W, that is, the columns of V form an orthonormal set, Π is orthogonal and bears the name of Galerkin projection; otherwise we speak of a Petrov–Galerkin projection. Using the considerations developed in the previous section, Π is chosen using a POD method applied to a number of snapshots (i.e., samples of the state computed at given time instants) of the original system; the trajectories x̂ of the reduced system then evolve in a k-dimensional subspace.
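The two kinds of projector can be sketched in a few lines (toy matrices and all names below are mine): a Galerkin projector VV^* with orthonormal V is orthogonal and idempotent, while a Petrov–Galerkin projector VW^* with W normalized so that W^*V = I_k is oblique but still idempotent; either yields reduced matrices W^*AV, W^*B, CV for a linear system.

```python
import numpy as np

# Sketch of Galerkin vs Petrov-Galerkin projectors Pi = V W^T with W^T V = I_k
# (toy linear system; sizes and matrices are illustrative).
rng = np.random.default_rng(1)
n, k = 6, 2
A = -np.eye(n) + 0.1 * rng.standard_normal((n, n))
B = rng.standard_normal((n, 1))
C = rng.standard_normal((1, n))

# Galerkin: V = W with orthonormal columns -> orthogonal projector
V, _ = np.linalg.qr(rng.standard_normal((n, k)))
Pi_g = V @ V.T

# Petrov-Galerkin: W chosen so that W^T V = I_k -> oblique projector
W0 = rng.standard_normal((n, k))
W = W0 @ np.linalg.inv(W0.T @ V).T
Pi_pg = V @ W.T

# Reduced-order matrices of x' = A x + B u, y = C x
Ar, Br, Cr = W.T @ A @ V, W.T @ B, C @ V
print(Ar.shape)
```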
9.1.3 POD and balancing

The method just described is POD combined with a Galerkin projection. It is in wide use for the simulation and control of nonlinear dynamical systems and systems described by PDEs (partial differential equations). We will now show that the method of approximation of linear systems by balanced truncation discussed in chapter 7, applied to the impulse response of the system, is a POD method combined with a Petrov–Galerkin projection.

Consider a linear system of the usual form, namely ẋ(t) = Ax(t) + Bu(t), y(t) = Cx(t) + Du(t). One choice for the input function is the impulse δ(t). If m = 1, the corresponding state is x(t) = e^{At}B. If m > 1, each input channel is excited separately, u(t) = δ(t)e_i, where e_i ∈ R^m is the canonical i-th unit vector; the resulting state is x_i(t) = e^{At}B_i, where B_i is the i-th column of B. In this case we collect all such state trajectories X(t) = [x_1(t) ... x_m(t)] and compute the gramian

  ∫_0^T X(t)X^*(t) dt = ∫_0^T e^{At}BB^*e^{A^*t} dt = P(T).

We recognize this quantity as the reachability gramian of the system, discussed in chapter 4. If A happens to have its eigenvalues in the left half of the complex plane (i.e., the system is stable), we can let T approach infinity, in which case we recover the infinite reachability gramian P introduced in (4.28) on page 60.

POD now proceeds by computing an eigenvalue decomposition of this gramian:

  P = UΛU^* = [U_1 U_2] diag(Λ_1, Λ_2) [U_1 U_2]^* = U_1Λ_1U_1^* + U_2Λ_2U_2^*,

where it is assumed that Λ_2 contains the small or negligible eigenvalues of P. The Galerkin projection is thus defined by the leading eigenvectors of this gramian, namely U_1, and the reduced system is described by:

  Â = U_1^* A U_1,  B̂ = U_1^* B,  Ĉ = C U_1,  D̂ = D.

Recall that by (4.55) on page 71 the energy required to steer the system to a given state (starting from the zero state) is related to the inverse of P; therefore discarding the small eigenvalues corresponds to discarding the states which are most difficult to reach. According to the analysis presented earlier, this is the subsystem of the original system which retains those states which are easiest to reach while discarding the states which are the most difficult to reach. In conclusion, for linear systems, the Galerkin projection consists of the leading k left singular vectors of the snapshot matrix X: assuming that the singular values of X decay rapidly and only k of them are significant for the application in question, we take V = W = U_k ∈ R^{n×k}, so that Vx̂ = VV^*x is the projection of x onto span col U_k. Thus POD approximation using the impulse response is related to balanced truncation, where in POD approximation only the degree of reachability of the states is considered.

Balanced truncation, however, suggests that if in addition to the degree of reachability of the states we also consider their degree of observability, a global approximation error bound can be derived (see (7.5) on page 177). Therefore let Q be the (infinite) observability gramian, which satisfies as usual the Lyapunov equation A^*Q + QA + C^*C = 0. In order to obtain error bounds we need to project onto the dominant eigenspace of the product of the two gramians PQ. It readily follows from (7.16) on page 184 that if we partition Σ = diag(Σ_1, Σ_2) so that Σ_2 contains the (square roots of the) negligible eigenvalues of PQ, then according to theorem 7.7 on page 177 the projection defined by

  T_1 = Σ_1^{−1/2} V_1^* L^* ∈ R^{k×n},  T_i1 = U W_1 Σ_1^{−1/2} ∈ R^{n×k},  Π = T_i1 T_1,

leads to the following reduced system:

  Â = T_1 A T_i1,  B̂ = T_1 B,  Ĉ = C T_i1,  D̂ = D.
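The identity between the gramian of the impulse-response snapshots and the reachability gramian can be checked numerically; the following sketch (toy system; horizon, step size, and quadrature choices are mine) approximates ∫ e^{At}BB^*e^{A^*t} dt by trapezoidal quadrature of the snapshots and compares it with the Lyapunov solution.

```python
import numpy as np
from scipy.linalg import expm, solve_continuous_lyapunov

# Build the reachability gramian from impulse-response snapshots X(t) = e^{At} B
# and compare with the infinite gramian (toy data; horizon/step illustrative).
A = np.array([[-1.0, 0.5],
              [0.0, -2.0]])
B = np.array([[1.0],
              [1.0]])

T, dt = 30.0, 0.01
steps = int(round(T / dt))
Phi = expm(A * dt)                 # exact one-step propagator on the grid
snaps = [B]
for _ in range(steps):
    snaps.append(Phi @ snaps[-1])  # snapshots e^{A k dt} B
outer = np.array([s @ s.T for s in snaps])
P_snap = dt * (outer.sum(axis=0) - 0.5 * (outer[0] + outer[-1]))  # trapezoid

P = solve_continuous_lyapunov(A, -B @ B.T)   # A P + P A^T + B B^T = 0
rel_err = np.linalg.norm(P_snap - P) / np.linalg.norm(P)
print(rel_err)
```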
The projected system is balanced. Indeed, P T_1^* = T_i1 Σ_1 and Q T_i1 = T_1^* Σ_1 imply

  0 = T_1 (AP + PA^* + BB^*) T_1^* = (T_1 A T_i1) Σ_1 + Σ_1 (T_i1^* A^* T_1^*) + (T_1 B)(B^* T_1^*),

and also

  0 = T_i1^* (A^*Q + QA + C^*C) T_i1 = (T_i1^* A^* T_1^*) Σ_1 + Σ_1 (T_1 A T_i1) + (T_i1^* C^*)(C T_i1).

The projected state is x̂ = T_1 x, and the output or reconstructed state is x̃ = T_i1 x̂ = T_i1 T_1 x. The error bound then asserts that

  ||x − x̃||_2 = ||(I_n − T_i1 T_1) x||_2 ≤ 2(σ_{k+1} + ... + σ_n)

for all input functions of unit energy. If we denote by ŷ the output of the reduced system, the error bound obtained tells us that the L2 norm of the difference y − ŷ is upper bounded by twice the sum of the neglected singular values, for all inputs u of unit norm:

  ||y − ŷ||_2 ≤ 2(σ_{k+1} + ... + σ_n).

It should be stressed at this point that even if the output of the system is the state, y = x — in other words C = I_n — the observability gramian has to be considered in order to obtain an error bound on the behavior of the approximate system; in this case the corresponding Lyapunov equation becomes A^*Q + QA + I = 0. Therefore we would like to propose that, instead of an orthogonal Galerkin projection, one: use an oblique (Petrov–Galerkin) projection onto the dominant eigenspace of the product of two gramians.

Remark 9.1.1.

(a) While the projection onto the dominant eigenspace of one gramian (either P or Q) is orthogonal (Galerkin projection), the projection onto the dominant eigenspace of their product PQ is an oblique (Petrov–Galerkin) projection.

(b) If the dynamical system in question is autonomous, ẋ(t) = Ax(t), the snapshots result from a preassigned initial condition x(0). In this case the snapshots of the state of the autonomous system are the same as the snapshots of the impulse response of the dynamical system ẋ = Ax + Bu, where B = x(0).

(c) If the snapshots result from an input other than the impulse, weighted gramians become relevant (see section 7.6 on page 195 for details).

(d) The snapshots needed to construct the observability gramian can be interpreted using the adjoint system. According to theorem 4.21 on page 67 the following holds: the reachability gramian of the adjoint (dual) system is the same as the observability gramian of the original. If the system in question is linear, the adjoint system running backward in time (see (5.15) on page 111) is

  Σ^*:  d/dt p(t) = −A^* p(t) − C^* y(t),  u(t) = B^* p(t) + D^* y(t),

where p is the state of the adjoint system, and y, u are the input and output of Σ^*, respectively. Consequently, collecting snapshots of the adjoint system is straightforward: these gramians can be constructed from snapshots of the original system and snapshots of the adjoint system. Given its conceptual simplicity, this procedure can be implemented even if the dynamics are not linear. In numerous optimal fluid control problems the adjoint equation enters the formulation of the problem in a natural way through the optimization.

(e) Some historical remarks on POD. POD is related to the Karhunen–Loève expansion introduced in the theory of stochastic processes in the late 1940s. This method, also known as the method of empirical eigenfunctions for dynamical systems, was introduced by Lorenz in 1956 for the study of weather prediction [231]. Subsequently Lumley, Sirovich, and Holmes made important contributions; see the three papers [298], …, and the book [62]. POD is in wide use for model reduction of systems described in general by nonlinear PDEs (partial differential equations) or by nonlinear ODEs (ordinary differential equations).
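Remark (d) can be illustrated numerically: the observability gramian is the reachability gramian of the adjoint dynamics, so it can be assembled from adjoint snapshots e^{A^*t}C^*. The sketch below (toy matrices; horizon and step size are mine) compares the snapshot-built gramian with the Lyapunov solution.

```python
import numpy as np
from scipy.linalg import expm, solve_continuous_lyapunov

# Observability gramian Q = int_0^inf e^{A^T t} C^T C e^{A t} dt built from
# adjoint-system snapshots e^{A^T t} C^T (toy data; step/horizon illustrative).
A = np.array([[-1.0, 0.3],
              [0.0, -3.0]])
C = np.array([[1.0, 2.0]])

T, dt = 30.0, 0.01
steps = int(round(T / dt))
Phi = expm(A.T * dt)               # adjoint propagator over one step
snaps = [C.T]
for _ in range(steps):
    snaps.append(Phi @ snaps[-1])
outer = np.array([s @ s.T for s in snaps])
Q_snap = dt * (outer.sum(axis=0) - 0.5 * (outer[0] + outer[-1]))  # trapezoid

Q = solve_continuous_lyapunov(A.T, -C.T @ C)   # A^T Q + Q A + C^T C = 0
rel_err = np.linalg.norm(Q_snap - Q) / np.linalg.norm(Q)
print(rel_err)
```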
There is an extensive literature on POD methods; the papers mentioned below are a few rather randomly chosen contributions. The original work was geared towards capturing the open-loop dynamics with differential equations having few degrees of freedom (low dimension); in this regard we mention the work of Kevrekidis and coworkers as exemplified in [97]. POD methods have also been applied to control problems. We mention here the work of Burns, King and coworkers [80], [81], [199], [200], [203], [204], [212], [213], of Banks, Tran and coworkers [48], [29], [30], as well as Hinze [3], Kunisch and Volkwein [349], and Kevrekidis and coworkers [4], [162], [163], [174], [216], [273], [296]. Model reduction of nonlinear systems using empirical gramians has been studied in [215]; for more recent work in this direction see [277] and [247]. The connection between POD and balancing has also been noted in [356]. Also worth mentioning are the work of Fujimoto and Scherpen on balancing for nonlinear systems [123], the work of Bamieh [40], [41] on a system theoretic approach to the study of fluids, and [160].

9.2 Modal approximation

One widespread application of the truncation approach to model reduction, besides balanced truncation, is modal truncation. It is not derived from balancing: it is obtained by looking at the EVD (eigenvalue decomposition) not of the product of gramians PQ, but simply of A. Assuming that A is diagonalizable, we can transform the state space representation of Σ to the basis composed of the eigenvectors of A:

  A = diag(λ_1, ..., λ_n),  B = [ B_1,: ; ... ; B_n,: ],  C = [ C_:,1 ... C_:,n ],

where B_i,: ∈ R^{1×m} denotes the i-th row of B and C_:,j ∈ R^{p×1} denotes the j-th column of C. If the system is stable, the eigenvalues of A can be ordered so that Re(λ_{i+1}) ≤ Re(λ_i) < 0; the reduced system Σ_r obtained by truncation then preserves the k dominant poles (i.e., the k eigenvalues with largest real part). In particular, the transfer function is decomposed in terms of partial fractions, and only those partial fractions are retained which have their poles closest to the imaginary axis. It can then be shown that

  ||Σ − Σ_r||_{H∞} ≤ Σ_{i=k+1}^{n} ||C_:,i|| ||B_i,:|| / |Re(λ_i)|.

In some cases, e.g. when the poles have physical significance, this can be a meaningful model reduction method. However, the principle of choosing a subsystem, namely the one consisting of the dominant poles, may not be appropriate for model reduction; moreover, this method may fail if the eigenvalue decomposition contains Jordan blocks. For details we refer to Varga [341]. In the sequel we present a simple example which shows that the nearness of the poles to the imaginary axis need bear no relationship to the dominant behavior of the frequency response (i.e., to the location of the resonances of the system).
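Before turning to the example, the truncation rule and the error bound above can be sketched directly (the diagonal toy system below is mine, not the book's example): keep the k poles closest to the imaginary axis and check, on a frequency grid, that the H∞ error stays below the sum of the neglected residue norms divided by |Re(λ_i)|.

```python
import numpy as np

# Modal truncation for a diagonalized SISO system: keep the k dominant poles
# and check the bound sum_{i>k} |c_i b_i| / |Re(lambda_i)| on a frequency grid
# (toy values, illustrative only).
poles = np.array([-0.5, -1.0, -4.0, -9.0])   # ordered: dominant first
b = np.array([1.0, 1.0, 2.0, 3.0])
c = np.array([1.0, -1.0, 1.0, 2.0])
k = 2

def H(s, idx):
    # transfer function restricted to the modes in idx
    return sum(c[i] * b[i] / (s - poles[i]) for i in idx)

bound = sum(abs(c[i] * b[i]) / abs(poles[i].real) for i in range(k, 4))

w = np.logspace(-2, 2, 500)
err = max(abs(H(1j * wi, range(4)) - H(1j * wi, range(k))) for wi in w)
print(err, bound)
```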
It is well known that polynomial-exponential functions of the form t^k e^{−λt}, k = 0, 1, 2, ..., form a basis for L2[0, ∞). Recall the Laplace transform pair

  L{ (t^n/n!) e^{−λt} } = 1/(s+λ)^{n+1}.

If we orthonormalize this family of functions we obtain the Laguerre basis. The (unnormalized) Laguerre functions, expressed in the Laplace domain, are

  E_0(s) = 1/(s+λ),  E_1(s) = (1/(s+λ)) (s−λ)/(s+λ),  ...,  E_n(s) = (1/(s+λ)) ((s−λ)/(s+λ))^n, ...

It can be shown that the above functions all have the same norm, 1/√(2λ). We now wish to expand e^{−µt} in terms of these functions; in the frequency domain,

  1/(s+µ) = γ_0 (1/(s+λ)) + γ_1 (1/(s+λ)) (s−λ)/(s+λ) + ... + γ_n (1/(s+λ)) ((s−λ)/(s+λ))^n + ...

Taking inner products in the time domain we obtain the following values:

  γ_k(µ) = (2λ/(µ+λ)) ((µ−λ)/(µ+λ))^k.

Notice that, up to normalization, this is the transfer function of the k-th basis function evaluated at s = µ. We thus obtain the expansion

  1/(s+µ) = (2λ/((µ+λ)(s+λ))) [ 1 + ((µ−λ)/(µ+λ))((s−λ)/(s+λ)) + ((µ−λ)²/(µ+λ)²)((s−λ)²/(s+λ)²) + ... ]
          = (2λ/((µ+λ)(s+λ))) · 1/( 1 − ((µ−λ)(s−λ))/((µ+λ)(s+λ)) )
          = 2λ / ( (µ+λ)(s+λ) − (µ−λ)(s−λ) ) = 1/(s+µ).

As an example we consider the expansion of the second-order transfer function having a pair of complex conjugate poles µ and µ̄:

  H(s) = (1/2i) [ 1/(s+µ̄) − 1/(s+µ) ] = Im(µ) / ( s² + 2Re(µ) s + |µ|² ).

The coefficients of the expansion H = Σ_{i≥0} α_i E_i in terms of the above orthonormal basis are α_k = Im γ_k(µ̄). In particular we take µ = σ + iω, where σ = 1/4 and ω = 1. The frequency response has a maximum, of magnitude 2, at frequency ω ≈ 1, and is shown in figure 9.1; the magnitude of the coefficients of this expansion (with λ = 2) decays exponentially. Let the to-be-approximated system be

  G_N(s) = Σ_{k=1}^{N} α_k E_k(s);

as argued above, for large N its frequency response comes close to that of the second-order system H(s). Following the above considerations, approximation of G_N by modal truncation gives slow convergence, although the magnitude of the coefficients α_k is exponentially decreasing: the H∞ norm of the error system between the original and a 4th order approximation is 1.498, while that with a 10th order approximation drops to 1.126.

[Figure 9.1. Left pane: absolute value of the first 100 coefficients of the expansion of H(s) (log scale). Right pane: Bode plots (singular values, in dB) of the original system and of the 2nd, 4th and 10th order approximants.]

Conclusion. Any behavior can be approximated arbitrarily well by means of a single pole, given high enough multiplicity. Therefore modal truncation has difficulty capturing the behavior due to multiple poles.
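The geometric-series identity underlying the example can be verified numerically. In the sketch below (λ = 2 and µ = 1/4 + i follow the example; the test point s is an arbitrary choice of mine), the partial sums of Σ_k γ_k E_k(s) converge to 1/(s+µ), and |γ_k| decays geometrically with ratio |µ−λ|/|µ+λ|.

```python
import numpy as np

# Check of the Laguerre-basis expansion: with the unnormalized functions
# E_k(s) = ((s-lam)/(s+lam))^k / (s+lam) and coefficients
# gamma_k = (2*lam/(mu+lam)) * ((mu-lam)/(mu+lam))^k,
# the partial sums of sum_k gamma_k E_k(s) converge to 1/(s+mu).
lam = 2.0
mu = 0.25 + 1.0j

def E(k, s):
    return ((s - lam) / (s + lam)) ** k / (s + lam)

def gamma(k):
    return (2.0 * lam / (mu + lam)) * ((mu - lam) / (mu + lam)) ** k

s = 0.7 + 0.3j                      # arbitrary test point in the right half plane
partial = sum(gamma(k) * E(k, s) for k in range(200))
target = 1.0 / (s + mu)
decay_ratio = abs((mu - lam) / (mu + lam))   # geometric decay of |gamma_k|
print(abs(partial - target), decay_ratio)    # decay_ratio is about 0.82 here
```

The slow geometric ratio (about 0.82 per coefficient for these values) is exactly why the single-pole expansion needs on the order of a hundred terms, and why a method that keeps only a few distinct poles converges slowly.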
9.3 Truncation and residualization

Given Σ = (A, B; C, D), let us consider a partitioning of the state, x = (x_1; x_2), together with a conformal partitioning of the state space representation:

  A = [ A_11  A_12 ; A_21  A_22 ],  B = [ B_1 ; B_2 ],  C = ( C_1  C_2 ).

A reduced order model can be obtained by eliminating x_2, that is, by truncating the state. The resulting reduced system is

  Σ_r = [ A_11  B_1 ; C_1  D ].

We showed earlier (see theorem 7.7) that if Σ_r is constructed by balanced truncation, it enjoys stability and an error bound. In general, however, the only property of the reduced system Σ_r is that its transfer function at infinity is equal to the transfer function of the original system Σ at infinity:

  H(∞) = H_r(∞) = D.

An alternative to state truncation is state residualization: if A_22 is nonsingular, we can define the following reduced order system:

  Σ_resid = [ A_11 − A_12 A_22^{−1} A_21 ,  B_1 − A_12 A_22^{−1} B_2 ; C_1 − C_2 A_22^{−1} A_21 ,  D − C_2 A_22^{−1} B_2 ].

This reduced order model is obtained by residualizing x_2, namely, by setting ẋ_2 = 0; residualization is equivalent to singular perturbation approximation. An important attribute of this model reduction method is that it preserves the steady state gain of the system:

  H(0) = H_resid(0) = D − C A^{−1} B.

This fact is not surprising: in [230] it is shown that reduction by truncation and reduction by residualization are related through the bilinear transformation s ↦ 1/s. Thus, while the former provides a reduced order system which approximates the original well at high frequencies, the latter provides a good approximation at low frequencies.

It should be mentioned that residualization can be applied to any realization of the original system Σ; for instance, it can be applied to the one obtained by balancing, as discussed in chapter 7. In the above mentioned reference it was shown that in this case stability is preserved, and the same error bound exists as for the reduced order system obtained by truncating the balanced state.
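The steady-state-gain property of residualization can be checked in a few lines (random stable toy system below; the partition size n1 and all values are mine): the residualized model matches H(0) = D − CA^{−1}B exactly, while plain truncation only matches the original at s = ∞, where both transfer functions equal D.

```python
import numpy as np

# Truncation keeps (A11, B1, C1, D); residualization (singular perturbation)
# eliminates x2 via x2' = 0 and preserves the DC gain (toy system, n1 = 2).
rng = np.random.default_rng(3)
n, n1 = 6, 2
M = rng.standard_normal((n, n))
A = M - M.T - 2.0 * np.eye(n)        # stable: symmetric part is -2 I
B = rng.standard_normal((n, 1))
C = rng.standard_normal((1, n))
D = np.zeros((1, 1))

A11, A12, A21, A22 = A[:n1, :n1], A[:n1, n1:], A[n1:, :n1], A[n1:, n1:]
B1, B2, C1, C2 = B[:n1], B[n1:], C[:, :n1], C[:, n1:]

A22i = np.linalg.inv(A22)
Ar = A11 - A12 @ A22i @ A21
Br = B1 - A12 @ A22i @ B2
Cr = C1 - C2 @ A22i @ A21
Dr = D - C2 @ A22i @ B2

def tf(A_, B_, C_, D_, s):
    return D_ + C_ @ np.linalg.solve(s * np.eye(A_.shape[0]) - A_, B_)

H0 = tf(A, B, C, D, 0.0)
Hresid0 = tf(Ar, Br, Cr, Dr, 0.0)
print(H0, Hresid0)                   # equal: the steady-state gain is preserved
```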
9.4 A study of the decay rates of the Hankel singular values*

This section follows [21]. The issue of the decay of the Hankel singular values is of interest in model reduction by means, for instance, of balanced truncation, since the sum of the neglected singular values provides an upper bound for an appropriate norm of the approximation error. These considerations are equivalent to studying the decay rate of the eigenvalues of the product of the solutions of two Lyapunov equations. The related problem of determining the decay rate of the eigenvalues of the solution to one Lyapunov equation is also addressed. Very often these eigenvalues, like the Hankel singular values, decay rapidly; for an overview see [208], and for a more recent account see [50]. This fact has motivated the development of several algorithms for computing low rank approximate solutions to Lyapunov equations. However, until now, conditions assuring rapid decay have not been well understood. Such conditions are derived here by relating the solution to a numerically low rank Cauchy matrix determined by the poles of the system; bounds explaining rapid decay rates are obtained under some mild conditions. The decay rate involves a new set of invariants associated with a linear system, which are obtained by evaluating a modified transfer function at the poles of the system.

9.4.1 Preliminary remarks on decay rates*

In the theory of function approximation by means of truncated Fourier series expansions or truncated wavelet expansions, there are explicit results relating the approximation error to the decay rate of the Fourier or wavelet coefficients. For example, the following result can be found in [99]. Consider functions f defined on the interval [0, 1] which are possibly discontinuous but have bounded variation. If we approximate f by means of a truncated n-term Fourier series expansion or by means of a truncated n-term wavelet expansion, the approximation error decays asymptotically as n^{−1/2}; if additional smoothness assumptions on f are made, the decay is faster.

Our purpose here is to explore the issue of decay of the approximation error as applied to linear dynamical systems. Recall that the Hankel singular values are the square roots of the eigenvalues of the product of the two gramians P, Q, which are positive definite. If the reduced system is obtained by balanced truncation, the error bound is

  ||Σ − Σ̂||_∞ ≤ 2 Σ_{i=k+1}^{n} σ_i.

Thus, the smaller the sum of the tail of the Hankel singular values, the better the approximation. It has been observed that in many cases these quantities decay very fast, and therefore the corresponding systems are easy to approximate. Due to their importance in model reduction, in particular balanced model reduction of large-scale systems, there has been some activity recently on the issue of the decay rate of these singular values. More precisely, our purpose is, first, to investigate the decay rate of the eigenvalues of one gramian, and second, to investigate the decay rate of the eigenvalues of the product of the two gramians, that is, the decay rate of the Hankel singular values.

We begin by considering only one of the two Lyapunov equations, namely

  AP + PA^* + BB^* = 0.  (4.45)

In general, and especially when n is large, it is unwise or even impossible to solve for P directly, since this requires O(n³) flops and O(n²) storage. Many have observed that the eigenvalues of P generally decay very fast [154], [267], and because of this, P may be approximated by a low rank matrix (see chapter 12, page 308). Several iterative methods for computing a low rank approximation to P have been proposed [154], [176], [187], [267], [284]; see [20] for a recent survey of such methods. Two recent approaches are [45] and [47]. Instead of solving for P directly, we obtain an outer product representation of the solution of the form

  P = Σ_{j=1}^{n} δ_j z_j z_j^*,  with δ_1 ≥ δ_2 ≥ ... ≥ δ_n > 0.

When A has an eigenvector basis that is not too ill-conditioned, the norms of the vectors z_j are uniformly bounded by a modest constant, and hence the ratio δ_{k+1}/δ_1 gives an order of magnitude relative error estimate for a rank k approximation to P. In other words, decay rates are derived that are direct estimates of the error of the best rank k approximation to one gramian P. These results lead directly to an explanation of why it is often possible to approximate P with a very low rank matrix.

There are some results on eigenvalue bounds for the solution of Lyapunov equations [214], but these do not explain why Lyapunov equations permit the very low rank approximate solutions observed in practice: the eigenvalue bounds surveyed in [214] focus mainly on AP + PA^* + S = 0 with S ∈ R^{n×n} positive semidefinite, the special low rank structure of S = BB^* is not fully exploited, and the lower bounds for the small eigenvalues of P are trivially zeros. Penzl [266] took into account the low rank structure of BB^*: he established upper bounds on the ratios λ_k(P)/λ_1(P) for symmetric A. These results are stated in terms of the condition number of A, which is assumed symmetric; this approach depends heavily on symmetry and is not easily generalized to the nonsymmetric case. Our bounds are closely related to the eigenvalue decay rates of Penzl [266], [107]. In contrast to the Penzl estimates, our bounds are functions of the eigenvalues of A and make no symmetry assumptions; on the other hand, our results do not establish explicit bounds for the eigenvalues of P. We give some numerical results and compare the two; these results show that the bounds given here seem to give a significantly better indication of the actual decay rate of the eigenvalues of P.

Next we turn our attention to the Hankel singular values of Σ, that is, to the problem of determining their decay rate. The analysis is based on a new set of system invariants: if the transfer function of the system in question is H = p/q, the invariants are the magnitudes of p/q^* evaluated at the poles of H. The main result states that these invariants and the Hankel singular values are related by means of multiplicative majorization relations.
9.4.2 Cauchy matrices*

Our results depend heavily on properties of Cauchy matrices, which appear fundamentally in direct formulas for solutions of Lyapunov equations. In particular, there are closed form expressions for the elements of the Cholesky factors and of the inverses of Cauchy matrices that lend themselves to the derivation of the decay estimates we seek.

Given two vectors x, y ∈ C^n, let ξ_i, η_i denote their i-th entries. The Cauchy matrix C(x, y) ∈ C^{n×n} is defined as follows:

  C(x, y)_{ij} = 1/(ξ_i + η_j).  (9.2)

It readily follows that C(x, y)^* = C(y^*, x^*), where the superscript * denotes complex conjugation followed by transposition. Define the vector d(x, y) ∈ C^n as follows:

  d(x, y)_i = Π_k (ξ_i + η_k) / Π_{k≠i} (ξ_i − ξ_k),

and let D(x, y) = diag(d(x, y)).

Lemma 9.1. If the components of x and y are such that no entry of d(x, y) takes the value 0 or ∞, then C(x, y) is nonsingular and the following holds:

  C(x, y) [D(y^*, x^*)]^{−1} C(y^*, x^*) [D(x, y)]^{−1} = I_n.  (9.3)

The proof of this result follows from the closed form expression for the inverse of a Cauchy matrix, due to Gohberg and Koltracht [142].

Remark 9.4.1. Actually this result is quoted for real x and y in section 26.1 of [171]; a slight extension is necessary to obtain the correct formula for complex x and y.

Moreover, a diagonally scaled version of C(x, y) is J-unitary. Let d(x, y)_i = ρ_i e^{iθ_i} and d(y, x)_i = ρ̂_i e^{iθ̂_i}, and let J_1, J_2 be the diagonal matrices with (J_1)_ii = e^{iθ_i} and (J_2)_ii = e^{iθ̂_i}; the square root √· applied to vectors or matrices is meant to apply elementwise.

Lemma 9.2. With the notation established above, there holds

  Γ J_1 Γ^* = J_2,  where Γ = D(x, y)^{−1/2} C(x, y) D(y, x)^{−1/2},  (9.4)

that is, Γ is J_1, J_2-unitary. Formula (9.4) implies that if x and y are real, then J_1, J_2 are diagonal matrices of signs (i.e., ±1).

The special case x = ℓ, y = ℓ^*, where ℓ = (λ_1, λ_2, ..., λ_n)^*, is of interest here. If Re(ℓ) < 0, then −C(ℓ, ℓ^*) is positive definite. Moreover, if −C(ℓ, ℓ^*) = L∆L^* is the Cholesky factorization, with L unit lower triangular and ∆ = diag(δ_1, ..., δ_n), then

  δ_k = ( −1/(2Re(λ_k)) ) Π_{j=1}^{k−1} | (λ_k − λ_j) / (λ_k^* + λ_j) |².  (9.5)
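The closed-form pivot expression (the δ_k formula just stated) can be checked numerically: compute the unpivoted LDL* pivots of the positive definite Cauchy kernel by successive Schur complements and compare them with the product formula (the eigenvalue set below is an illustrative choice of mine).

```python
import numpy as np

# For stable, distinct lambda_i, the matrix K with
# K_ij = -1/(lambda_i + conj(lambda_j)) is Hermitian positive definite, and the
# k-th pivot of its (unpivoted) LDL* factorization equals
# delta_k = (-1/(2 Re lam_k)) prod_{j<k} |lam_k-lam_j|^2 / |conj(lam_k)+lam_j|^2.
lam = np.array([-0.5 + 1.0j, -0.5 - 1.0j, -2.0 + 0.0j, -3.5 + 0.5j, -3.5 - 0.5j])
n = len(lam)
K = np.array([[-1.0 / (lam[i] + np.conj(lam[j])) for j in range(n)]
              for i in range(n)])

# pivots via successive Schur complements (LDL* without pivoting)
M = K.copy()
pivots = []
for k in range(n):
    pivots.append(M[k, k].real)
    if k + 1 < n:
        M[k+1:, k+1:] -= np.outer(M[k+1:, k], M[k, k+1:]) / M[k, k]

formula = []
for k in range(n):
    d = -1.0 / (2.0 * lam[k].real)
    for j in range(k):
        d *= abs(lam[k] - lam[j]) ** 2 / abs(np.conj(lam[k]) + lam[j]) ** 2
    formula.append(d)

print(pivots)
print(formula)
```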
XΛ2 b ˆ . −1 . R = [b. Since C is positive deﬁnite. We immediately have the following lemma. AXb = Xb Λ and Xb b = e. b) is reachable. Moreover. Lemma 9. 1). where e = (1.4. A2 b. λj . The vector g∗ = (α0 . λ2 . one has YAc = ΛY. one ﬁnds that C is a Cauchy matrix with Cij = positive deﬁnite. Let Ac be the companion matrix for A. if. λj and Λ = diag (λ1 . if diagonal pivoting is included to bring the maximum diagonal element to the pivot i i ACA : April 14. This result provides a direct relationship with the Krylov or reachability matrix R. A2 b. b) is reachable and let R = [b. αn−1 ) n deﬁnes the characteristic polynomial of A with q(s) = det (sI−A) = i=0 αi si . The j th row of Y is n−1 2 e∗ ]. By inspection. When A is diagonalizable. where Xb = X diag (b) with b = X Proof. j Y = [1. First observe that ˆ . Ac = J − ge∗ n. It follows easily from the Cayley Hamilton theorem that AR = RAc . Special topics in SVDbased approximation methods i “oa” 2004/4/14 page 236 i We assume that A is a stable matrix (all eigenvalues in the open lefthalf plane). This kernel provides the decay rates we seek due to the following result. Since the pair (A. 1 −∗ ∗ ∗ ∗ ∗ 0 = X− b (AP + P A + bb )Xb = ΛC + C Λ + ee . We shall deﬁne the Cauchy kernel to be the matrix C = YGY∗ . XΛn−1 b ˆ ] = Xb Y . Proof. where J is a left shift matrix with ones on the ﬁrst subdiagonal and zeros elsewhere. P solves AP + P A∗ + bb∗ = 0. These rates are a function of the eigenvalues λj of the matrix A. Since (A. it has a Cholesky factorization. An−1 b] be the Krylov or reachability matrix. R is nonsingular and Re1 = b. · · · . no component of b −1 ∗ nonsingular. λn ). Let b be any vector such that (A. Ab. This concludes the proof. · · · . XΛ b ˆ . P = RGR∗ . λj . Moreover. αn = 1. · · · . but further analysis is needed to derive decay rates. b . An−1 b] = [Xb From Lemma 9. Therefore. · · · . 2004 i i . Lemma 9. λi + λ∗ j −1 ˆ ˆ is Hermitian positive deﬁnite and P = Xb C X∗ b. · · · . Hence. · · · . 
Λ is stable and C must be We are now prepared to derive a decay rate based on the eigenvalues of A. · · · . b) is reachable.i i 236 Chapter 9. and only if. 1. Let X be a matrix of right eigenvectors for A so that AX = XΛ and assume the columns of X each have unit norm. Then −1 Cij = . This is easily seen by multiplying on the left and right by R and R∗ to get ∗ ∗ ∗ ∗ ∗ ∗ 0 = R(Ac G + GA∗ c + e1 e1 )R = ARGR + RGR A + bb Since A is stable.3 we have P = RGR∗ = Xb YG(Xb Y)∗ = Xb C X∗ b . Ab. α1 . this solution is unique and the lemma is proved. where Y is the Vandermonde matrix of powers of the eigenvalues. λi +λ∗ j Since A is stable. b cannot ˆ = X−1 b is zero and the matrix diag (b ˆ ) is be orthogonal to any left eigenvector of A. Deﬁne G to be the solution of the canonical Lyapunov equation ∗ Ac G + GA∗ c + e1 e1 = 0.3.
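The closed form of the kernel is easy to verify numerically. The following short Python sketch (the eigenvalues are arbitrary stable values chosen for illustration, not taken from any example in this chapter) checks, entry by entry, that C_ij = −1/(λ_i + λ_j*) solves the diagonal Lyapunov equation ΛC + CΛ* + ee* = 0 and has positive diagonal:

```python
# Entrywise check: C_ij = -1/(lam_i + conj(lam_j)) satisfies
#   Lambda C + C Lambda^* + e e^* = 0,
# where Lambda = diag(lam) is stable and e = (1, ..., 1)^*.
lam = [-1.0 + 2.0j, -0.5 - 1.0j, -3.0 + 0.0j]   # arbitrary stable eigenvalues
n = len(lam)

C = [[-1.0 / (lam[i] + lam[j].conjugate()) for j in range(n)] for i in range(n)]

# residual of the (i,j) entry of  Lambda C + C Lambda^* + e e^*
residual = max(
    abs(lam[i] * C[i][j] + C[i][j] * lam[j].conjugate() + 1.0)
    for i in range(n) for j in range(n)
)
print(residual)   # ~ 0, up to rounding
```

The diagonal entries C_ii = −1/(2 Re(λ_i)) are real and positive precisely because the λ_i lie in the open left-half plane.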
Since C is positive definite, it has a Cholesky factorization. Moreover, if diagonal pivoting is included to bring the maximum diagonal element to the pivot position at each stage of the factorization, we may assume an ordering of the eigenvalues of A, and hence a symmetric permutation of the rows and columns of C, such that

C = L∆L*, with ∆ = diag(δ_1, δ_2, ..., δ_n), δ_1 ≥ δ_2 ≥ ... ≥ δ_n > 0,

with L unit lower triangular and such that each column L e_j satisfies ||L e_j||_∞ = 1; due to ||L e_j||_∞ = 1, we have ||L e_j||_2 ≤ (n − j + 1)^{1/2}. Formula (9.5) of Lemma 9.2 gives the δ_k in terms of the eigenvalues of A, and one may think of the diagonal pivoting in the Cholesky factorization as a means to order (i.e., index) these eigenvalues. Let λ(A) denote the spectrum of A. Given a fixed λ in the open left-half plane, the function

φ(ζ) = (λ − ζ)/(λ* + ζ)

is a linear fractional transformation that maps the open left-half plane onto the open unit disk; thus |φ(λ_j)| < 1 for every λ_j ∈ λ(A). From this, we may conclude that the sequence {δ_j} is decreasing, with δ_1 = −(2 max_i Re(λ_i))^{−1} = max_i C_ii.

Decay rate estimate. If the first k − 1 eigenvalues S_{k−1} ≡ {λ_j : 1 ≤ j ≤ k − 1} ⊂ λ(A) have been selected and indexed, then the k-th eigenvalue λ_k is selected according to

λ_k = argmax_{λ ∈ λ(A) \ S_{k−1}} (−1/(2 Re(λ))) Π_{j=1}^{k−1} |(λ − λ_j)/(λ* + λ_j)|^2.   (9.6)

We shall call this selection the Cholesky ordering.

Theorem 9.5. Let P solve the Lyapunov equation AP + PA* + bb* = 0, and let P_k = Σ_{j=1}^{k} δ_j z_j z_j*, where z_j = X_b L e_j and the δ_j are as defined above. Then

||P − P_k||_2 ≤ δ_{k+1} (n − k) max_{j>k} ||z_j||_2^2.

Proof. The previous discussion has established P = X_b C X_b* = Σ_{j=1}^{n} δ_j z_j z_j*. Hence

||P − P_k||_2 = || Σ_{j=k+1}^{n} δ_j z_j z_j* ||_2 ≤ δ_{k+1} (n − k) max_{j>k} ||z_j z_j*||_2,

since the δ_j are decreasing and ||z_j z_j*||_2 = ||z_j||_2^2. This completes the proof.

Now, for j > k,

||z_j||_2 = ||X_b L e_j||_2 ≤ ||X||_2 ||X^{−1} b||_2 ||L e_j||_2 ≤ κ_2(X) ||b||_2 (n − j + 1)^{1/2} ≤ κ_2(X) ||b||_2 (n − k)^{1/2}.

These results may be summarized as follows.

Corollary 9.6. Let P solve the Lyapunov equation AP + PA* + bb* = 0. Then

||P − P_k||_2 ≤ (n − k)^2 δ_{k+1} (κ_2(X) ||b||_2)^2.

When A has an eigenvector basis that is well conditioned, the norms of the vectors z_j are uniformly bounded by a modest constant, and hence the following estimate is obtained:

λ_{k+1}(P)/λ_1(P) ≈ δ_{k+1}/δ_1 = (Re(λ_1)/Re(λ_{k+1})) Π_{j=1}^{k} |(λ_{k+1} − λ_j)/(λ_{k+1}* + λ_j)|^2,   (9.7)

where the λ_j are indexed in the Cholesky ordering. This gives an order of magnitude relative error estimate for a rank k approximation to P. On the other hand, one might see the low rank phenomenon accentuated through tiny components of b̂, or the components of b̂ may be magnified in a way that cancels the effects of the rapidly decaying δ_j. Moreover, departure from normality increases the condition number of X and renders this bound useless.
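The Cholesky ordering (9.6) and the resulting δ_k can be computed directly from the spectrum, without forming C or P. The following Python sketch implements the greedy selection; the helper name `cholesky_ordering` and the test spectrum are illustrative choices, not from the book:

```python
# Greedy Cholesky ordering (9.6) of a stable spectrum, and the pivots (9.5):
#   delta(lam) = -1/(2 Re lam) * prod_j |(lam - mu_j)/(conj(lam) + mu_j)|^2
# over the previously selected mu_j.  The ratios delta_k/delta_1 give the
# order-of-magnitude decay estimate (9.7) for lambda_k(P)/lambda_1(P).
def cholesky_ordering(spectrum):
    remaining = list(spectrum)
    selected, deltas = [], []
    while remaining:
        def delta(lam):
            val = -1.0 / (2.0 * lam.real)
            for mu in selected:
                val *= abs((lam - mu) / (lam.conjugate() + mu)) ** 2
            return val
        best = max(remaining, key=delta)   # argmax in (9.6)
        deltas.append(delta(best))
        selected.append(best)
        remaining.remove(best)
    return selected, deltas

spectrum = [complex(-k, 3 * k) for k in range(1, 9)]   # arbitrary stable eigenvalues
order, deltas = cholesky_ordering(spectrum)
ratios = [d / deltas[0] for d in deltas]               # estimate (9.7)
print(ratios)
```

Because each linear fractional factor has modulus less than one on the open left-half plane, the computed δ_k are automatically nonincreasing, which is the property the pivoted Cholesky factorization enforces.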
Generalization to MIMO*. The bounds of the previous section only apply to the case where B = b is a single vector. However, this result still has implications for the general case. Let B = [b_1, b_2, ..., b_m] ∈ R^{n×m}, with b_i ∈ R^n, i = 1, ..., m, and note that the Lyapunov equation may be written as

AP + PA* + Σ_{i=1}^{m} b_i b_i* = 0.

Let P_i be the solution to A P_i + P_i A* + b_i b_i* = 0, i = 1, ..., m. From the discussion above, P_i = X_i L∆L* X_i*, where X_i = X diag(X^{−1} b_i). Let

Z_j = [X_1 L e_j, X_2 L e_j, ..., X_m L e_j] = [z_{1j}, z_{2j}, ..., z_{mj}], where z_{ij} = X diag(X^{−1} b_i) L e_j.

Linearity of the Lyapunov equation yields

P = Σ_{i=1}^{m} P_i = Σ_{j=1}^{n} δ_j Z_j Z_j*.

Since Z_j Z_j* = Σ_{i=1}^{m} z_{ij} z_{ij}*, it follows that, for j > k,

||Z_j Z_j*||_2 ≤ Σ_{i=1}^{m} ||z_{ij} z_{ij}*||_2 ≤ m max_i ||z_{ij} z_{ij}*||_2 ≤ m (n − k) (κ_2(X) ||B||_2)^2,

where the estimates of Theorem 9.5 are applied to each z_{ij} z_{ij}*. With this notation we may establish the following result; the proof follows from an argument analogous to that of Theorem 9.5.

Theorem 9.7. Let P̂_km = Σ_{j=1}^{k} δ_j Z_j Z_j*. If δ_{k+1}/δ_1 < ε, then P̂_km is an approximation to P of rank at most km which satisfies, with δ_1 ≈ ||P||_2,

||P − P̂_km|| ≤ ε δ_1 m (n − k)^2 (κ_2(X) ||B||_2)^2.

There is a close connection between these bounds and the Krylov (reachability) sequence B, AB, A^2 B, ..., A^{k−1} B. In particular, rank(P) = rank(R_n), where

R_n = [B, AB, A^2 B, ..., A^{n−1} B]

is the reachability matrix. This fact is numerically reflected in the decay rates.

A bound for the decay rate. Formula (9.7) provides an estimate for the decay rate of the eigenvalues of the solution of the Lyapunov equation, but further analysis is needed to derive decay rate bounds. In chapter 4 of [373] an upper bound for the decay rate was derived, which we state next. Given the eigenvalues λ_i(A), i = 1, ..., n, of the stable matrix A ∈ R^{n×n}, we first define the complex constants ν̂_i ∈ C, i = 1, ..., k, as the solution of the following min-max problem:

{ν̂_1, ..., ν̂_k} = arg min_{ν_1, ..., ν_k} max_{1≤ℓ≤n} Π_{i=1}^{k} |(ν_i − λ_ℓ)/(ν_i* + λ_ℓ)|.
The following result holds:

λ_{mk+1}(P)/λ_1(P) ≤ κ_2(X)^2 max_{1≤ℓ≤n} Π_{i=1}^{k} |(ν̂_i − λ_ℓ)/(ν̂_i* + λ_ℓ)|^2,

where κ(X) denotes the condition number of X. The min-max problem mentioned above remains in general unsolved. A straightforward choice of the parameters is ν_i = λ_i(A), where the eigenvalues are ordered in the Cholesky ordering. In this case we obtain the following computable upper bound.

Corollary 9.8 (Computable decay rate bound). Let A be stable and diagonalizable, with X the right eigenvector matrix, and let B have m columns. With the choice ν_i = λ_i(A), the following bound holds:

λ_{mk+1}(P)/λ_1(P) ≤ κ_2(X)^2 max_{1≤ℓ≤n} Π_{i=1}^{k} |(λ_i − λ_ℓ)/(λ_i* + λ_ℓ)|^2,   (9.8)

where the eigenvalues λ_i(A) are ordered in the Cholesky ordering. This expression is similar to (9.7), although the method used to obtain it is different.

9.4.4 Decay rate of the Hankel singular values*

In this section we present a bound for the decay rate of the Hankel singular values, based on a new set of system invariants. Once again we consider the SISO case, and we assume for convenience that the eigenvalues λ_i of A are distinct and that the system is stable. The usual assumptions of minimality (reachability and observability) are also made. The theory requires us to consider the transfer function of the system:

H(s) = C(sI − A)^{−1} B + D = p(s)/q(s), q(s) = det(sI − A) = Σ_{i=0}^{n} α_i s^i, α_n = 1, n = deg q.

Due to the minimality of the above realization, the eigenvalues of A are the same as the poles of H(s), that is, the n roots of the denominator polynomial q(s).

Recall that the Hankel singular values of the system Σ are defined as the square roots of the eigenvalues of the product of the reachability gramian P and the observability gramian Q:

σ_i(Σ) = sqrt(λ_i(PQ)).

These singular values are invariant under state space transformations, and are assumed ordered in decreasing order: σ_i ≥ σ_{i+1}, i = 1, ..., n − 1. Recall also that the Hankel singular values of all-pass (unitary) systems satisfy σ_1 = ... = σ_n.

The invariants we introduce are the magnitudes of p evaluated at the poles of H. We make use of the standard definition

q(s)* = q*(−s) = Σ_{k=0}^{n} α_k* (−s)^k,

and define the following quantities:

γ_i = p(s)/q(s)* |_{s=λ_i}, i = 1, ..., n.   (9.9)

These quantities are assumed ordered in decreasing order of magnitude, |γ_i| ≥ |γ_{i+1}|.

Lemma 9.9. A system is all-pass (unitary) if, and only if, the γ_i are all equal: γ_1 = ... = γ_n.

A consequence of the above definition is that, for all-pass systems, the same holds for the γ's as for the Hankel singular values; more precisely:

Lemma 9.10. If the system is all-pass, the Hankel singular values of the system and the |γ_i|, i = 1, ..., n, are all equal.

The proof of the above lemma was given in [373]. The main result of this section states that these invariants and the Hankel singular values are related by means of multiplicative majorization relations.
Multiplicative majorization

To state the main result of this section, we need the notion of multiplicative majorization, a partial ordering relation between vectors; additive and multiplicative majorization are discussed in chapter 3. For convenience, we repeat the definition which will be needed in the sequel. Let γ, σ ∈ R^n be the vectors whose i-th entries are equal to γ_i and σ_i, respectively, with |γ_i| ≥ |γ_{i+1}| and σ_i ≥ σ_{i+1}. We say that σ majorizes |γ| multiplicatively, and write |γ| ≺_µ σ, if the following relations hold:

Π_{i=1}^{k} |γ_i| ≤ Π_{i=1}^{k} σ_i, k = 1, ..., n − 1, and Π_{i=1}^{n} |γ_i| = Π_{i=1}^{n} σ_i.   (9.10)

The result quoted next is due to Sherman and Thompson [295].

Lemma 9.11. Let the matrices Γ and P satisfy the relationship Γ = UP, where P > 0 is positive definite and U is unitary, and let γ_i = λ_i(Γ) be the eigenvalues of Γ, with |γ_i| ≥ |γ_{i+1}| and σ_i = λ_i(P) ≥ σ_{i+1}. Then |γ| ≺_µ σ. The converse is also true: given P and γ_i satisfying the above multiplicative majorization relations, there exist a matrix Γ whose eigenvalues are the γ_i, and a unitary matrix U, such that Γ = UP.

The main result of this section is the following.

Theorem 9.12 (Decay rate of the Hankel singular values). The vector of Hankel singular values σ majorizes the absolute value of the vector of new invariants γ multiplicatively:

|γ| ≺_µ σ.   (9.11)

The proof of this result is given below.

Remark. (a) The last relationship of the main result, namely that the product of the |γ_i| is equal to the product of the Hankel singular values, was first reported, in the discrete-time case, by Mullis and Roberts [245].
(b) It follows from the majorization inequalities that Π_{i=1}^{k} |γ_{n−i+1}| ≥ Π_{i=1}^{k} σ_{n−i+1}, for k = 1, ..., n − 1. This implies

Σ_{i=1}^{k} log |γ_{n−i+1}| ≥ Σ_{i=1}^{k} log σ_{n−i+1};

stated differently, the logarithmic sum of the tail of the Hankel singular values can be bounded above by the logarithmic sum of the tail of the γ_i.

Interpretation of Theorem 9.12. The issue in balanced model reduction is how fast the Hankel singular values decay to zero. In many cases the poles (natural frequencies) of the system are known, together with a state space realization Σ = (A, B; C, D); thus, in principle, one can compute the γ_i with relatively small computational effort. The main result asserts that, given the (discrete) curve whose k-th value is the product Π_{i=1}^{k} |γ_i|, the corresponding curve for the Hankel singular values must remain above it, while the two curves have to converge at the last point: Σ_{i=1}^{n} log σ_i = Σ_{i=1}^{n} log |γ_i|. It is actually best to plot the logarithm of this curve, namely Σ_{i=1}^{k} log |γ_i|, because the γ_i tend to decrease rapidly and their product even more so. Furthermore, the curves are monotone decreasing. This is depicted for a 5th order system on the left-hand side of figure 9.2: the intermediate sums Σ_{i=1}^{k} log σ_i, k < n, must lie above the γ curve, while the last two points coincide.
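The majorization can be checked by hand on a small example. The Python sketch below uses the illustrative second-order system H(s) = 1/(s+1) + 1/(s+2), so p(s) = 2s + 3, q(s) = (s+1)(s+2), and q(s)* = (s−1)(s−2); the system is an arbitrary choice, not one of the book's examples:

```python
# Check |gamma| ≺_mu sigma on a 2nd-order example:
#   H(s) = 1/(s+1) + 1/(s+2) = p/q,  p(s) = 2s+3,  q(s)* = (s-1)(s-2).
poles = [-1.0, -2.0]
res = [1.0, 1.0]                    # residues b_i

p = lambda s: 2 * s + 3
qstar = lambda s: (s - 1) * (s - 2)
gamma = [p(l) / qstar(l) for l in poles]    # invariants (9.9): 1/6 and -1/12

# Diagonal realization A = diag(poles), B = (1,1)^T, C = (1,1):
# both gramians have entries -b_i b_j/(lam_i + lam_j), so P = Q here.
n = 2
P = [[-res[i] * res[j] / (poles[i] + poles[j]) for j in range(n)] for i in range(n)]

# Hankel singular values = sqrt(eig(P Q)) = eigenvalues of P (P = Q > 0),
# computed for 2 x 2 via trace and determinant.
tr = P[0][0] + P[1][1]
det = P[0][0] * P[1][1] - P[0][1] * P[1][0]
disc = (tr * tr - 4 * det) ** 0.5
sigma = [(tr + disc) / 2, (tr - disc) / 2]  # sigma_1 >= sigma_2 > 0
g = sorted((abs(x) for x in gamma), reverse=True)

print(g, sigma)
# Majorization: g[0] <= sigma[0], and g[0]*g[1] = sigma[0]*sigma[1] (= det P),
# up to rounding.
```

The equality of the full products (here both equal det P = 1/72) is exactly the last relation in (9.10), while the partial product inequality g[0] ≤ σ_1 is the k = 1 case.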
On the right-hand side of figure 9.2, five pairs of γ/σ curves for all-pole systems of order 19 are depicted. The curves are, from upper to lower: the first system has poles with real part equal to −1; the next has poles which are on a 45 degree angle with respect to the negative real axis; the third one has real poles only; finally, the last two pairs have their poles spread apart by a factor of 10 with respect to the previous two pairs. The γ curve is always the lower one of each pair. The units on the y-axis are orders of magnitude.

Figure 9.2. Left pane: Σ_{i=1}^{k} log |γ_i| (lower curve) and Σ_{i=1}^{k} log σ_i (upper curve) for a 5th order system. Right pane: five pairs of γ/σ curves for all-pole systems of order 19 (1st pair: poles parallel to the imaginary axis; 2nd pair: poles on a 45 degree line; 3rd pair: real poles; 4th pair: same as the 2nd pair stretched by a factor of 10; 5th pair: same as the 3rd pair stretched by a factor of 10).

Remark. A consequence of the main result is that nonminimum-phase systems are harder to approximate than minimum-phase ones. Recall that minimum-phase systems are systems with stable zeros, i.e., zeros which lie in the left half of the complex plane. Assuming that the poles are the same, each |γ_i| for a minimum-phase system is smaller than the corresponding |γ_i| for a system with all zeros in the right half of the complex plane; this follows from the definition (9.9).

The proof of Theorem 9.12. Consider again the system Σ, where the previous assumptions hold. Due to the fact that the poles are distinct, the transfer function can be decomposed in a partial fraction expansion of the type

H(s) = p(s)/q(s) = p(s) / Π_{i=1}^{n} (s − λ_i) = D + Σ_{i=1}^{n} b_i/(s − λ_i).   (9.12)

It follows that p(s) = D q(s) + Σ_i b_i Π_{j≠i} (s − λ_j); therefore p(λ_i) = b_i Π_{j≠i} (λ_i − λ_j). Notice also that q(s)* |_{s=λ_i} = (−1)^n Π_{j=1}^{n} (λ_i + λ_j*), so that

γ_i = b_i Π_{j≠i} (λ_i − λ_j) / [ (−1)^n Π_{j=1}^{n} (λ_i + λ_j*) ].

From the partial fraction decomposition follows the state space realization

Σ = (A, B; C, D), A = diag([λ_1, ..., λ_n]), B = (1, ..., 1)*, C = [b_1 ... b_n].

Then the reachability gramian is equal to (the negative of) the Cauchy matrix with x = y = ℓ (the vector of eigenvalues λ_j), while the observability gramian is a diagonally scaled version of the Cauchy matrix with x = y = ℓ̄ (the vector of complex conjugate eigenvalues).
Indeed, for this realization the reachability gramian satisfies Λ P + P Λ* + ee* = 0, so that

P = −C(ℓ, ℓ), [C(ℓ, ℓ)]_ij = 1/(λ_i + λ_j*) ∈ C^{n×n},

while the observability gramian is Q = −B C(ℓ*, ℓ*) B*, where B denotes the diagonal matrix whose entries on the diagonal are the b_i. From here we proceed as follows. Let T be a balancing transformation, i.e.,

T P T* = T^{−*} Q T^{−1} = Σ, Σ = diag(σ_1, ..., σ_n),

where the σ_i are the Hankel singular values of Σ. Substituting the Cauchy expressions for P and Q above, and letting D̂ denote the diagonal matrix whose diagonal entries are the d(ℓ, ℓ)_i of (9.3), the balancing relations reduce to the statement that U = √Σ D̂^{−1} √Σ satisfies U U* = U* U = I_n, i.e., that U is unitary. Since λ_i(√Σ U √Σ) = λ_i(UΣ), Lemma 9.11 applies and, together with (9.2) and (9.3), yields the desired result (9.11). This completes the proof of Theorem 9.12.

A sharpening of the main result

Recall (9.2) and (9.3). It should also be noticed that, as mentioned earlier [142], the LDU factorization of the Cauchy matrix can be written down explicitly: C = L∆U, where L is lower triangular with ones on the diagonal, U is upper triangular with ones on the diagonal, and ∆ = diag(δ↑). With the Cauchy parameters ξ_i, η_i and the weights b_i, we associate the following vectors δ↑, δ↓ ∈ R^n:

δ↑_k = |b_k| (ξ_k + η_k*)^{−1} Π_{i=1}^{k−1} |(ξ_k − ξ_i)/(ξ_k + η_i*)|^2,
δ↓_k = |b_k| (ξ_k + η_k*)^{−1} Π_{i=k+1}^{n} |(ξ_k − ξ_i)/(ξ_k + η_i*)|^2.

From our earlier discussion it follows that, if the system is given by (9.12), the γ_i are

γ_k = b_k (ξ_k + η_k*)^{−1} Π_{i≠k} (ξ_k − ξ_i)/(ξ_k + η_i*),

so that δ↑ ∘ δ↓ = γ ∘ γ̄ = |γ|^2, where ∘ denotes pointwise multiplication. It is assumed in the sequel that the ξ_i are arranged so that the entries of δ↑ turn out to be in decreasing (Cholesky) ordering: δ↑_k ≥ δ↑_{k+1}. Because of the above-mentioned fact on the LDU factorization, we also have the following result, due to Alfred Horn [180]: the vector of eigenvalues of a positive definite matrix majorizes multiplicatively the vector whose i-th entry is the quotient of the determinants of the i-th over the (i−1)-st principal minors of the matrix. This gives:

Lemma 9.13. Let P be the solution to the Lyapunov equation, with eigenvalues λ_i(P). The vector δ↑ is majorized multiplicatively by the vector of eigenvalues λ(P); furthermore, the former multiplicatively majorizes the vector |γ|:

|γ| ≺_µ δ↑ ≺_µ λ(P).   (9.13)
Remark. (a) Connection with Fuhrmann's results. In chapter 8, the signed Hankel singular values µ = εσ, ε = ±1 (i.e., the eigenvalues of the Hankel operator), and the corresponding eigenvectors, are characterized by means of a polynomial equation; using the notation of this section it becomes

p r = µ q* r* + q π,

where σ is a Hankel singular value, r, π are unknown polynomials of degree less than n = deg q, and (·)* is as defined earlier. We notice that this equation can be rewritten as follows:

p/q* = µ r*/r + q π/(q* r) ⇒ γ_i = p/q* |_{s=λ_i} = µ (r*/r) |_{s=λ_i},

since the λ_i are the roots of q (the poles of Σ), so that the term containing q drops out. To stress the dependency on the j-th singular value, we write r_j instead of r:

γ_i = p(λ_i)/q*(λ_i) = ε_j σ_j (r_j*/r_j)|_{s=λ_i}.   (9.14)

Thus the ratio of γ_i and σ_j is a quantity depending on the all-pass function defined by the polynomial r_j, which in turn defines the eigenvectors of the Hankel operator; it follows that |γ_i| is equal to some singular value times the magnitude of the corresponding all-pass function evaluated at s = λ_i.

We are now ready to prove Lemma 9.9. Necessity: if the system is all-pass, H = κ q*/q for some κ ∈ R, and by definition (9.9) of the gammas we have γ_i = κ for all i. Sufficiency: if all the γ_i, i = 1, ..., n, are equal, then, since the degree of r is at most n − 1, (9.14) implies r = r*; therefore p = κ q*, for some κ ∈ R, and the system is all-pass.

Furthermore, the polynomial r can be computed as follows. Let a, b be polynomials satisfying the Bezout equation a q + b q* = 1, and let A be any matrix with characteristic polynomial equal to q. Then the coefficients of the polynomial r and the eigenvalues µ of the Hankel operator are determined from the eigenvalue decomposition of the matrix

M = K b(A) p(A), where K = diag(1, −1, 1, −1, ...).

(b) The inequalities (9.13) are a refined version of the result concerning the decay rate of the eigenvalues of a single gramian.

Recall also that the optimal conditioning of Cauchy matrices by means of diagonal scaling is achieved with the weights

β_k = |ξ_k + η_k*| Π_{i≠k} |(ξ_k + η_i*)/(ξ_k − ξ_i)|.

This implies that for optimally conditioned Lyapunov equations we must have δ↑ ∘ δ↓ = e* = [1 1 ... 1].

9.4.5 Numerical examples and discussion*

In this section, we illustrate the effectiveness of the Cholesky estimates for decay rates of the eigenvalues of the system gramians derived in section 9.4.3 (see Theorem 9.5). As the two are intimately related, we only report on the results for the eigenvalues of one gramian here; we have observed essentially the same quality of results for the decay rate estimates for the Hankel singular values. Even though the results of section 9.4.3 do not establish direct bounds on the eigenvalues of P, they seem to predict the behavior of these eigenvalues quite well. We illustrate this with some computational results, and compare our estimates to those derived by Penzl in [266]. Penzl's results are only valid for symmetric A. When m = 1 (the SISO case), these bounds are

λ_{k+1}(P)/λ_1(P) ≤ Π_{j=0}^{k−1} [ (κ(A)^{(2j+1)/(2k)} − 1) / (κ(A)^{(2j+1)/(2k)} + 1) ]^2.   (9.15)

To our knowledge, these were the only eigendecay estimates available prior to the results given here.
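The right-hand side of (9.15) depends only on the condition number κ(A). A short Python sketch evaluating it (the κ values below are chosen for illustration):

```python
# Evaluate Penzl's bound (9.15):
#   lambda_{k+1}(P)/lambda_1(P)
#     <= prod_{j=0}^{k-1} ((kappa^{(2j+1)/(2k)} - 1)/(kappa^{(2j+1)/(2k)} + 1))^2
def penzl_bound(kappa, k):
    out = 1.0
    for j in range(k):
        r = kappa ** ((2 * j + 1) / (2.0 * k))
        out *= ((r - 1.0) / (r + 1.0)) ** 2
    return out

for kappa in (10.0, 1e3, 1e6):
    print(kappa, [penzl_bound(kappa, k) for k in range(1, 6)])
```

Each factor lies strictly between 0 and 1, so the bound itself always lies in (0, 1) and tightens as k grows; but since it involves only κ(A), it cannot see clustering or the actual distribution of the spectrum, which is the point of the comparison that follows.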
Our estimates are derived from the diagonal elements of the Cholesky factor of the Cauchy matrix C defined by (9.2). They are of the form

δ_k = (−1/(2 Re(λ_k))) Π_{j=1}^{k−1} |(λ_k − λ_j)/(λ̄_k + λ_j)|^2,   (9.16)

where the λ_j's have been indexed according to the Cholesky ordering imposed by diagonal pivoting. Even though (9.15) and (9.16) are derived in very different ways, they are closely related: Penzl derives his bounds from expressions involving the same linear fractional transformations that led to our results; in his case, they arose from ADI approximations to the solution of the Lyapunov equation. The Cholesky estimates, in contrast, do not require symmetry of A.

In the left pane of figure 9.3 we show the results of our estimates versus Penzl's, in comparison to the eigenvalues of P, where A is the standard finite difference approximation to the 1D Laplacian of order 100. The figure gives a semilog graph of the Cholesky estimates δ_k/δ_1, the Penzl estimates, and the actual eigenvalue ratios λ_k(P)/λ_1(P) for 1 ≤ k ≤ 60; the horizontal dotted line indicates where these ratios fall below machine precision eps ≈ 10^−16. In the right pane of figure 9.3, we show the same comparison for a random nonsymmetric A of order 200 with a few eigenvalues near the imaginary axis (distance about .01). However, in that case the Penzl estimates do not apply.

Figure 9.3. Left pane: Comparison of estimates for the discrete Laplacian (A = 1d Laplacian, n = 100, b = rand(n,1)). Right pane: Comparison of estimates for a nonsymmetric A (R = rand(n); A = R − 1.01*max(eig(R))*I; b = rand(n,1); n = 200).

These results indicate that the condition number κ(A) alone may not be enough to determine the decay rate effectively: the right-hand side of (9.15) fails to predict the rapid eigendecay rate, and it seems that the decay rate really depends on the full spectrum of A, as indicated by the Cholesky estimates. Our computations indicate that a great deal may be lost in replacing estimates involving all of the eigenvalues λ_j of A with a condition number.

In figure 9.4 we compare Cholesky estimates to actual eigendecay on LTI examples that are more closely related to engineering applications. Two of the examples are simple model problems: a finite element model of heat conduction in a plate with boundary controls, and a finite element model of a clamped beam with a control force applied at the free end; these are labelled "Heat model" and "Struct model", respectively. A third example, labelled "CD player model", is a simplified simulation of a CD player tracking mechanism that has been described in detail in [152]. The fourth example, labelled "ISS 1r-c04 model", is an actual finite element discretization of the flex modes of the Zvezda Service Module of the International Space Station (ISS); this example was provided to us by Draper Labs. There are 3 inputs, namely the roll, pitch, yaw jets, and 3 outputs, namely the roll, pitch, yaw rate gyros readings. Since A is nonsymmetric in all of these systems, the Penzl estimates do not apply. Nevertheless, the Cholesky ratios give very good approximations to the actual eigendecay rates for all of these examples.

Figure 9.4. Eigendecay rate vs Cholesky estimates for some real LTI systems (Heat model: n = 197; Struct model: n = 348; CD player model: n = 120; ISS 1r-c04 model: n = 270).

In fact, one can easily see that clustered eigenvalues of A can make the Cauchy kernel have very low numerical rank (hence P has very low numerical rank), while κ(A) can be arbitrarily large at the same time: the eigendecay rate can be extremely fast even when κ(A) is huge. This is illustrated by the example shown in figure 9.5, where A has been constructed to have two clustered eigenvalues as follows:
for iter = 1:4
    A = -10^(6+iter)*eye(n) + diag(rand(n,1))*100;  A(n,n) = -1e-5;
end

Figure 9.5. Fast eigendecay rate for large cond(A). Eigenvalue decay of P solving AP + PA' + bb' = 0, for κ(A) = 1e12, 1e13, 1e14, 1e15 (n = 200, b = rand(n,1)).

For the nonsymmetric case, Penzl [266] constructed a nice example to show that, even in the SISO case, the eigendecay rate can be arbitrarily slow. His example is: put n = 2d + 1, m = 1,

A = diag(−1, A_1, A_2, ..., A_d) ∈ R^{n×n}, A_j = A_j(t) = [ −1, jt/d ; −jt/d, −1 ], j = 1, 2, ..., d,

and b = [1, ..., 1]* ∈ R^n; note that κ(A) = ||A||_2 ||A^{−1}||_2 ≈ t. Penzl considered d = 50 and t = 10, 100, 1000; he observed that the eigenvalue ratios are almost constant 1 for large t. We tried this example with d = 300 and t = 10, 100, 10^3, 10^4. As can be seen from the left pane of figure 9.6, the eigendecay rate of P slows down when A has eigenvalues with increasingly dominant imaginary parts; in the extreme case the ratios remain almost constant at 1.

In [266] Penzl suggested that the dominant imaginary parts of the eigenvalues cause the decay rate to slow down. This observation is relevant, but further analysis seems to indicate that the relative dominance of the imaginary parts over the real parts, together with the absence of clustering in the spectrum of A, are the very important factors. We illustrate this in the right pane of figure 9.6, by constructing A with eigenvalues having huge imaginary parts but with all eigenvalues clustered about 3 points: {−1, −1 + ti, −1 − ti}.

Figure 9.6. Left pane: Decay rates for Penzl's example, t = 10, 100, 1e3, 1e4 (n = 601, Re(eig(A)) = −1, b = ones(n,1)). Right pane: Fast eigendecay rate for A with clustered eigenvalues having dominant imaginary parts (n = 201, Re(eig(A)) = −1, b = ones(n,1)).
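The slow decay in Penzl's example is already visible in the Cholesky estimates (9.16), which need only the spectrum {−1} ∪ {−1 ± i jt/d}. The Python sketch below (with Penzl's d = 50; the helper name is an illustrative choice) computes the δ ratios for a small and a large t:

```python
# delta_k ratios (9.16) for the spectrum of Penzl's example,
# {-1} U {-1 +/- i*j*t/d : j = 1..d}: as t grows the ratio delta_10/delta_1
# approaches 1 (slow eigendecay), even though kappa(A) ~ t in both cases.
def cholesky_deltas(spectrum, kmax):
    selected, deltas = [], []
    remaining = list(spectrum)
    for _ in range(kmax):
        def delta(lam):
            v = -1.0 / (2.0 * lam.real)
            for mu in selected:
                v *= abs((lam - mu) / (lam.conjugate() + mu)) ** 2
            return v
        best = max(remaining, key=delta)
        deltas.append(delta(best))
        selected.append(best)
        remaining.remove(best)
    return deltas

d = 50
def penzl_spectrum(t):
    spec = [complex(-1.0, 0.0)]
    for j in range(1, d + 1):
        spec += [complex(-1.0, j * t / d), complex(-1.0, -j * t / d)]
    return spec

ratio = {}
for t in (10.0, 1e4):
    deltas = cholesky_deltas(penzl_spectrum(t), 10)
    ratio[t] = deltas[-1] / deltas[0]
print(ratio)   # the ratio is much closer to 1 for the larger t
```

For t = 10^4 the linear fractional factors |(λ − µ)/(λ̄ + µ)| are all nearly 1, so the estimated decay over the first ten eigenvalues is negligible, matching the near-flat curves in the left pane of figure 9.6.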
Remark 9.4.1. (a) The Cholesky ratio (9.16) only involves the eigenvalues of A, while P depends on both A and b, so there may be considerable discrepancy between the Cholesky ratio and the actual eigendecay rate for some b. (The vector b is taken into account in theorem 9.12 and in the refined bounds (9.13).) Keeping in mind that the δi are meant to provide an order-of-magnitude relative error estimate for the numerical rank of P, (9.16) usually gives an accurate prediction of the numerical rank of P. (b) On the other hand, the worst case is possible: even in the SISO case, it is possible for the solution to have no eigendecay.

Cases of non-decay or slow decay. Penzl's example is one slow-decay case. We also did extensive numerical studies on these non-decay cases; the most relevant reasons we found are: (i) most eigenvalues of A have tiny real parts with not so tiny imaginary parts, that is, most eigenvalues of A have dominant imaginary parts and the spectrum of A is not clustered relative to maxj |Re(λj)|; (ii) these cases are related to all-pass systems. In fact, if T = −T∗, choose A = T − µ e1 e1∗ for some µ > 0 and b = √(2µ) e1. Then AI + IA∗ + bb∗ = 0, so the Lyapunov equation has the solution P = I, where I is the identity matrix; there is no decay whatsoever. In all of these cases the Cholesky ratios were computed and again they approximate the actual eigendecay rates well. In engineering applications, it seems to be uncommon for the conditions for slow decay to arise.

As an example, consider A = diag(−1, A1, A2, · · · , Ad) ∈ Rn×n with n = 400 and b = [1, · · · , 1]∗ ∈ Rn, where we modified the blocks Aj of Penzl's example as follows:

    Aj = Aj(t) = [ −1      t + j/d ]
                 [ −t − j/d    −1  ],   j = 1, · · · , d.

We take t = 10, 10², 10³, 10⁴ for comparison. As t grows, Re(λ) remains constant and the increasingly dominant Im(λ) leads to |λk − λj| / |λ̄k + λj| → 1; hence {δk} becomes nearly a constant sequence and there is very little decay of the eigenvalues. Actually, (9.15) gives an explanation of the non-decay in Penzl's example.

In contrast, when A has a moderately well conditioned eigenvector basis and the spectrum of A contains relatively clustered points with dominant real parts, a fast eigendecay rate is implied: for each t in this example A has only 3 clustered eigenvalues, and the eigendecay rate of P continues to be fast regardless of the magnitude of t, as demonstrated in figure 9.6 (right pane), despite the presence of eigenvalues with increasingly dominant imaginary parts; the eigendecay rate of P does not deteriorate because of the clustered eigenvalues. We also see that κ(A) is irrelevant to the decay rate in this example, since here κ(A) grows with t. In this case the Cauchy kernel (9.2) has low numerical rank for each t, and (9.16) still approximates the actual decay rate well.

Figure 9.7 shows that even in the slow-decay or non-decay case the Cholesky estimates track the actual eigendecay, while figure 9.8 gives one example of the effect of increasing m (the number of columns of B); it suggests that the rank of P increases with the rank of B. The Cholesky estimates are not shown in figure 9.6, to avoid the clutter of many overlaid plots. All computations were done in MATLAB 5.3.0 on a Sun Sparc Ultra60 under SunOS 5; machine epsilon is approximately 10⁻¹⁶.

Further numerical experiments. The plots in this section are due to Marc Embree [104]. Three examples are investigated. The goal in each case is to pick some region in the complex plane filling a sharp eigenvalue inclusion region, generate optimal interpolation points, and then compare the δi obtained from those points to the δi obtained from the eigenvalues.

Discrete Laplacian in 1D. In the first plot, we illustrate the values based on the true eigenvalues of A (blue curve), and compare these values to the values generated from 200 points (red curve) and 2000 points (black curve) distributed like the Chebyshev points in [λmin, λmax].
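The all-pass construction above is easy to verify numerically. The following Python sketch (illustrative only, with arbitrary choices n = 8 and µ = 0.5) checks that P = I solves the Lyapunov equation exactly, and contrasts this with a well-damped diagonal A whose gramian eigenvalues decay rapidly:

```python
import numpy as np

rng = np.random.default_rng(0)
n, mu = 8, 0.5

# All-pass construction: T skew-symmetric, A = T - mu*e1*e1', b = sqrt(2*mu)*e1.
T = rng.standard_normal((n, n))
T = T - T.T                                  # enforces T = -T'
e1 = np.zeros(n); e1[0] = 1.0
A = T - mu * np.outer(e1, e1)
b = np.sqrt(2 * mu) * e1
# A*I + I*A' + b*b' = 0, so P = I: the eigenvalues of P show no decay at all.
residual = A + A.T + np.outer(b, b)
print(np.abs(residual).max())                # 0 (up to rounding)

# Contrast: a stable A with real, well-separated spectrum gives fast eigendecay.
# Solve A2 P + P A2' + b2 b2' = 0 by vectorization (small n only).
A2 = -np.diag(np.arange(1.0, n + 1))
M = np.kron(np.eye(n), A2) + np.kron(A2, np.eye(n))
P2 = np.linalg.solve(M, -np.outer(np.ones(n), np.ones(n)).flatten()).reshape(n, n)
ev = np.sort(np.linalg.eigvalsh(P2))[::-1]
print(ev / ev[0])                            # ratios fall off rapidly
```

For the diagonal case one can check by hand that P2 has the Cauchy form (P2)ij = 1/(i + j), which explains the rapid eigendecay.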
[Figure placeholder. The panes of figure 9.7 show the eigenvalue plot of A (n = 100, cond(A) = 64.311) together with the eigendecay of P versus the Cholesky estimate for B = rand(n,1), B = sqrt(2*mu)*e1 (the all-pass case A = T − µ e1 e1∗), and B = ones(n,1).]

Figure 9.7. Eigendecay rate and Cholesky estimates for the slow-decay cases
[Figure placeholder. Figure 9.8 plots the eigenvalue decay of P, where AP + PA^T + BB^T = 0, for p = 1, 15, 30, 45, with n = 500, A = rand(n) − n*eye(n), B = rand(n,p).]

Figure 9.8. MIMO cases: the effect of different p

Shifted random matrix, n = 1000. Let A be generated as follows: A = randn(n)/sqrt(n) − 1.1*eye(n). The entries of A are normally distributed with variance σ² = n⁻¹, and the diagonal shift places the spectrum around µ = −1.1. With this scaling, the spectrum of A will fill the disk of radius 1 centered at −1.1 in the large-n limit. We approximate the spectrum by this disk, bounded by the red line in the plot below. Notice that some of the eigenvalues of A (blue dots) are slightly outside this region. We use 100 uniformly-spaced points on the boundary of the disk (red dots) to get approximate values. This classic example from random matrix theory exhibits only mild nonnormality. The second figure compares the decay of the true δi values (blue curve) with the δi values obtained from the 100 points (red) and 1000 points (black) on the disk boundary. Notice the good agreement for the first 25 or so values of δi, at which point discrete effects begin to take over.

Matrix with a V-shaped spectrum, n = 1000. Here, we let A have eigenvalues uniformly distributed on two
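The limiting-disk claim for the shifted random matrix can be probed numerically. A small sketch (with a smaller n = 400 than in the text and a fixed seed, so the result is reproducible):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400
# Shifted random matrix: entries ~ N(0, 1/n), diagonal shifted by -1.1.
A = rng.standard_normal((n, n)) / np.sqrt(n) - 1.1 * np.eye(n)
lam = np.linalg.eigvals(A)
r = np.abs(lam + 1.1)        # distance of each eigenvalue to the disk center -1.1
print(r.max())               # close to 1: the spectrum roughly fills the unit disk about -1.1
```

As the text notes, a few eigenvalues typically land slightly outside the disk; for larger n these excursions shrink.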
[Figure placeholder. Figure 9.9 compares δi/δ1 computed from the true eigenvalues with the values obtained from 200 and 2000 Chebyshev-distributed points.]

Figure 9.9. Eigenvalue decay for the discrete Laplacian
[Figure placeholder. Figure 9.10 compares δi/δ1 computed from the true eigenvalues with the values obtained from 100 and 1000 points on the disk boundary.]

Figure 9.10. Eigenvalue decay for shifted random matrices

sides of a triangle, with the rightmost eigenvalue λ = 0.01 at the vertex, having multiplicity 2. For our comparison, we use 200 points derived from a conformal map of the convex hull of the eigenvalues of A, generated using Driscoll's Schwarz-Christoffel Toolbox for MATLAB. The true eigenvalues are shown in blue in the left plot below; the approximation points are shown in red on the right.
[Figure placeholder. Figure 9.11 shows the V-shaped eigenvalue distribution and the decay of δi/δ1 computed from the true eigenvalues, 200 points, and 1000 points.]

Figure 9.11. Eigenvalue decay for V-shaped spectrum

Lastly, we modify the previous example by supposing that we don't have such an accurate inclusion region for the eigenvalues. We bound the spectrum with a pentagon having a vertex at 0.005. The first plot below shows the eigenvalues (blue dots) and the set of 200 approximation points (red dots). We now see that this new region generally
leads to an overestimate of the decay behavior. If in practice one only requires an order of magnitude estimate (and, indeed, is more willing to accept an overestimate of the numerical rank rather than an underestimate), this may still be acceptable.
[Figure placeholder. Figure 9.12 shows the pentagon-shaped inclusion region with 200 approximation points (eigenvalues as blue dots), and the decay of δi/δ1 computed from the true eigenvalues, 200 points, and 1000 points.]

Figure 9.12. Eigenvalue decay for a spectrum included in a pentagon-shaped region
9.5 Assignment of eigenvalues and Hankel singular values*
We will now briefly address the issue of the conditioning of A and of the Hankel singular values of a linear system. Two important invariants are associated with a system Σ, to wit:

• The natural frequencies or poles λi(Σ) of Σ.
• The Hankel singular values σi(Σ) of Σ.

The former quantities are defined as the eigenvalues of A: λi(Σ) = λi(A), i = 1, · · · , n. The latter are defined as the singular values of the Hankel operator HΣ associated with Σ. Next, we investigate the relationship between these two quantities. This section follows [18].

As pointed out in earlier chapters, the nonzero singular values of Σ can be computed as σi(Σ) = √λi(PQ), i = 1, · · · , n, where P, Q are the reachability and observability gramians, respectively. We use the notation Σ = diag(σ1, · · · , σn) ∈ Rn×n. The eigenvalues λi(Σ) describe the dynamics of the system, while the Hankel singular values σi(Σ), just as the singular values in the case of constant matrices, describe how well Σ can be approximated by a system of lower dimension using balanced truncation or Hankel norm approximation. Approximation works in this case by truncating the small Hankel singular values, and these quantities provide a priori computable information on how well a dynamical system can be approximated by a system of prespecified order.

A system representation is balanced if the solutions of both Lyapunov equations are equal and diagonal, i.e. P = Q = Σ. As discussed earlier, every system Σ which is stable, reachable and observable has a balanced representation. We can thus assume without loss of generality that such a system Σ is in balanced form. In this case the matrices have a special form which is referred to as balanced canonical form; see section 7.4 for details. In case the singular values are distinct, this form is

    Bi = γi > 0,   Ci = si γi,   Aij = − γi γj / (si sj σi + σj),   where si = ±1, i = 1, · · · , n.
In other words the si are signs associated with the singular values σi; the quantities si σi, i = 1, · · · , n, turn out to be the eigenvalues of the Hankel operator H. Finally, notice that due to the reachability of (A, B) and the observability of (C, A), the γi must be different from zero, and thus without loss of generality they can be chosen positive: γi > 0. From the above relationships it follows that A can be written as

    A = Bd A0 Bd,   (A0)ij = −1 / (si sj σi + σj),   Bd = diag(γ1, · · · , γn),        (9.17)
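The balanced canonical form can be verified directly: with Bi = γi, Ci = si γi and Aij as above, P = Q = diag(σ1, · · · , σn) satisfies both Lyapunov equations. A numpy sketch with arbitrary illustrative values of the σi, si and γi:

```python
import numpy as np

# Illustrative data: distinct Hankel singular values, signs, and any positive gammas.
sigma = np.array([3.0, 1.5, 0.8, 0.3])
s = np.array([1.0, -1.0, 1.0, -1.0])
gamma = np.array([1.0, 0.7, 1.3, 0.5])
n = len(sigma)

# Balanced canonical form: B_i = gamma_i, C_i = s_i*gamma_i,
# A_ij = -gamma_i*gamma_j / (s_i*s_j*sigma_i + sigma_j).
denom = np.outer(s, s) * sigma[:, None] + sigma[None, :]
A = -np.outer(gamma, gamma) / denom
B = gamma[:, None]
C = (s * gamma)[None, :]
S = np.diag(sigma)

# Both Lyapunov equations hold with P = Q = diag(sigma).
res_P = A @ S + S @ A.T + B @ B.T
res_Q = A.T @ S + S @ A + C.T @ C
print(np.abs(res_P).max(), np.abs(res_Q).max())  # both at machine-precision level
```

This is the representation-level check; it holds for any choice of positive γi, which is exactly the freedom exploited by the diagonal scaling in (9.17).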
where γi, σi > 0, and si = ±1. The problem which arises is the relationship between λi(Σ) and σi(Σ).

Numerical experiments. (I) If the eigenvalues λi(Σ) of Σ are fixed but A, B, C are otherwise randomly chosen, the condition number of the singular values turns out to be, on the average, of the order of 10ⁿ, where n is the system dimension. (II) If the singular values σi(Σ) of Σ are fixed but the system is otherwise randomly chosen, the condition number of the resulting A is, on the average, of the order of 10^(n/2).

Ensuing problems. The above numerical experiments suggest two specific problems: to what extent is it possible to reduce (I) the condition number of the singular values in the first case above, and (II) the condition number of A in the second case, by appropriate choice of the quantities which have been chosen randomly? From the numerical experiments outlined earlier we conclude that the Hankel singular values of generic linear systems fall off rapidly, and hence such systems can be approximated well by systems of low dimension (complexity). The issue raised here is thus concerned with determining the extent to which non-generic systems can be approximated well by low-dimensional systems.

Problem (I) is addressed as follows. Given a pair of matrices (C, A), with A stable, following corollary 5.18 on page 131 one can always find B = −Q⁻¹C∗ such that the system with realization (A, B, C, I) is all-pass; in this case the Hankel singular values are all equal to 1 (i.e. there is no decay). Thus the condition number of the Hankel singular values can be minimized optimally by appropriate choice of B.

Problem (II), i.e. the distribution of the eigenvalues of A for preassigned Hankel singular values, can be addressed using the balanced canonical form. Using results of Golub and Varah [144] the following result can be proved.

Theorem 9.14. Given n distinct positive real numbers σi, together with n signs si, the condition number of A defined by (9.17) is minimized for the following choice of the entries γi > 0 of the diagonal scaling matrix Bd:

    γi² = |λi + λi| Π_{j≠i} |λi + λj| / |λi − λj|,   where λk = sk σk, k = 1, · · · , n.
Remark 9.5.1. (a) The improvement of the condition number of A in (9.17) for optimal conditioning, compared with a random choice of the γi, is between 10³ and 10⁴. This improvement is not very pronounced, especially for higher n. (b) Matrices of the type defined by A0 are Cauchy matrices, while in the special case that all signs si are positive we obtain Hilbert matrices. Thus the theorem above provides the optimal diagonal scaling of Cauchy and Hilbert matrices.
9.6 Chapter summary
POD (Proper Orthogonal Decomposition) is a popular model reduction method in the PDE community. After a brief overview, it was argued that if one can afford to consider a second set of snapshots coming from the adjoint system, then, at least in the linear case, a global bound for the approximation error results. The next two sections discussed model reduction by modal approximation and by residualization. The former works in some cases but may run into trouble if the system has poles of high multiplicity; the purpose of approximation by residualization (obtained by means of the Schur complement) is to preserve the steady-state gain. A considerable part of the chapter was then devoted to an extensive discussion of the decay rates of the eigenvalues of a single gramian satisfying a Lyapunov equation, as well as of the decay rate of the Hankel singular values (which are the square roots of the eigenvalues of the product of two gramians). Three results were presented. The first is an order-of-magnitude decay rate of the eigenvalues of a single gramian given by (9.7); the key is the Cholesky ordering of the eigenvalues of A, which in turn is closely related to the Cauchy matrix. The second is an upper bound on the decay rate given by (9.8); the main ingredient is a min-max problem which is unsolvable, but it turns out that this can be bypassed, leading to a computable upper bound. Finally, it was shown that a new set of invariants given by (9.9) and the Hankel singular values satisfy the multiplicative majorization relation (9.11). This leads to an average decay rate for the Hankel singular values. This approach also provides an explanation for the fact that non-minimum phase systems (i.e. systems having unstable zeros) are more difficult to approximate than minimum phase ones. The chapter concluded with a brief account of the assignment of eigenvalues and Hankel singular values.
Part IV
Krylov-based approximation methods
Chapter 10
A brief overview of eigenvalue computations
The 2-norm approach to system approximation outlined above requires dense computations of order n³ and storage of order n², where n is the dimension of the original system. Consequently, it can only be used for systems of moderate complexity. A set of iterative algorithms known as Krylov methods provide an alternative.

The basic Krylov iteration

Given a real n × n matrix A and an n-vector b, let v1 = b/‖b‖. At the k-th step we have

    AVk = Vk Hk + fk ek∗,   Vk ∈ Rn×k, Hk ∈ Rk×k, fk ∈ Rn, ek ∈ Rk,        (10.1)
where ek is the k-th canonical unit vector in Rk, Vk = [v1 · · · vk] consists of k orthonormal column vectors, Vk∗Vk = Ik, and A projected onto the subspace spanned by the columns of Vk is Hk = Vk∗AVk; these conditions imply that vj+1 = fj/‖fj‖, j = 1, · · · , n − 1. The computational complexity required for k steps of this iteration is O(n²k) and the storage is O(nk); often the iteration is applied to a sparse matrix A, in which case the complexity reduces to O(nk²). Two algorithms fall under this umbrella, namely the Lanczos algorithm [221, 222] and the Arnoldi algorithm [28]. For an overview of the Krylov iteration we refer the reader to [285], [149], [332] and [71].

This iteration has rich structure involving the subspaces Ki spanned by the sequence of vectors b, Ab, A²b, · · ·, Ak−1b, which are known as Krylov subspaces in the numerical linear algebra community and as reachability or controllability subspaces in the control systems community. For arbitrary A, (10.1) is known as the Arnoldi iteration; in this case Hk is upper Hessenberg. For symmetric A = A∗, (10.1) is known as the symmetric or one-sided Lanczos iteration, in which case Hk is tridiagonal and symmetric. A variant of (10.1) involving two starting vectors can be applied to nonsymmetric matrices A and is known as the two-sided or nonsymmetric Lanczos iteration. In this case the projected matrix Hk is tridiagonal (but not symmetric).

Three uses of the Krylov iteration

The iteration described above has three main uses. (a) Iterative solution of Ax = b. In this case we seek to approximate the solution x in an iterative fashion. The Krylov methods are based on the fact that successive approximants belong to the subspaces Ki mentioned above. Both the Arnoldi and the one-sided Lanczos algorithms construct iteratively orthonormal bases for these subspaces. (b) Iterative approximation of the eigenvalues of A. In this case b is not a priori fixed.
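For concreteness, the basic iteration (10.1) can be sketched as follows — a plain dense implementation using modified Gram–Schmidt (production codes would add reorthogonalization and breakdown handling):

```python
import numpy as np

def arnoldi(A, b, k):
    """Basic Arnoldi iteration: returns V_k with orthonormal columns, upper
    Hessenberg H_k, and the residual f_k, so that A V_k = V_k H_k + f_k e_k^*."""
    n = len(b)
    V = np.zeros((n, k))
    H = np.zeros((k, k))
    V[:, 0] = b / np.linalg.norm(b)
    f = None
    for j in range(k):
        w = A @ V[:, j]
        # Orthogonalize A v_j against the current basis (modified Gram-Schmidt).
        for i in range(j + 1):
            H[i, j] = V[:, i] @ w
            w = w - H[i, j] * V[:, i]
        if j + 1 < k:
            H[j + 1, j] = np.linalg.norm(w)
            V[:, j + 1] = w / H[j + 1, j]       # v_{j+1} = f_j / ||f_j||
        else:
            f = w                               # final residual f_k
    return V, H, f

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 50))
b = rng.standard_normal(50)
V, H, f = arnoldi(A, b, 8)
# Check the Arnoldi relation (10.1): A V_k - V_k H_k = f_k e_k^*.
R = A @ V - V @ H
print(np.abs(R[:, :-1]).max(), np.linalg.norm(R[:, -1] - f))
```

The columns of V span the Krylov subspace spanned by b, Ab, · · ·, A⁷b, and H is the projection of A onto that subspace.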
The goal is to use the eigenvalues of the projected matrix Hk as approximants of the dominant eigenvalues of A. The most simple-minded approach to the approximation of eigenvalues is the power method, where, given b, successive terms Ak−1b are computed. To overcome the slow convergence of this method, Krylov methods are used, where at the k-th step
one makes use of the information contained in the whole sequence b, Ab, · · ·, Ak−1b. Krylov methods have their origins in eigenvalue computations, and in particular in eigenvalue estimation. It should be noted that in the case of symmetric matrices there exists a theory for the convergence of the eigenvalues of Hk to the eigenvalues of A, which has been worked out by Saad [285]; it is known that convergence of the eigenvalues on the boundary precedes convergence of the rest of the eigenvalues. In the general case no complete theory exists; a theory however is emerging in [46].

An important advance is the Implicitly Restarted Arnoldi Method (IRAM) developed by Sorensen; this turns out to be a truncated form of the implicitly shifted QR algorithm. This approach was introduced to overcome the often intractable storage and computational requirements of the original Lanczos/Arnoldi method. It has been implemented; the software package is called ARPACK. For details we refer to the ARPACK manual [227].

(c) Approximation of linear systems by moment matching. This problem, which is of interest in the present context, will be discussed in the next chapter, chapter 11.

The present chapter is dedicated to an exposition of Krylov methods as they apply to eigenvalue computations. For a summary of results on eigenvalue computations we refer to the work of Sorensen [307] and [306].

10.1 Introduction

Given a matrix or operator A : X → X, the eigenvalue problem consists in finding all complex numbers λ and nonzero vectors x ∈ X such that

    Ax = λx.        (10.2)

If (λ, x) is a solution of (10.2), λ is called an eigenvalue and x is called the corresponding eigenvector of A; (λ, x) is called an eigenpair of A. The set of all eigenvalues of A is called the spectrum of A and is denoted by Λ(A).

The problem defined by equation (10.2) is nonlinear in λ and the entries of x. Nevertheless it can be solved in two steps as follows. First, notice that for a fixed λ, (10.2) has a solution x if, and only if, A − λI has a nontrivial null space, that is, x is an eigenvector provided that x ∈ ker(A − λI). If we are working in a finite-dimensional space X ≅ Rn, this is equivalent to the determinant of A − λI being zero; hence λ is an eigenvalue if, and only if, λ is a root of the characteristic polynomial: χA(λ) = 0. In the general case, let

    χA(ξ) = det(ξI − A) = Π_{i=1}^{k} (ξ − λi)^{mi},   Σ_{i=1}^{k} mi = n,   λi ≠ λj for i ≠ j,

be the characteristic polynomial of A. If λi ∈ Λ(A), mi is its algebraic multiplicity, and the dimension of ker(A − λiI) is its geometric multiplicity. The matrix A is diagonalizable if the algebraic multiplicity of each eigenvalue is equal to its geometric multiplicity. If this condition is not satisfied, the Jordan canonical form – with nontrivial Jordan blocks – comes into play (for details see e.g. [181]).

Two square matrices A and B are called similar if there exists a nonsingular matrix T such that AT = TB, or equivalently A = TBT−1. It readily follows that two matrices are similar if, and only if, they have the same spectrum, Λ(A) = Λ(B), and the same Jordan structure.

A subspace V ⊂ X is A-invariant if AV ⊂ V. Let the columns of V ∈ Rn×k form a basis for V. Then there exists a matrix H ∈ Rk×k such that

    AV = VH  ⇒  Λ(H) ⊂ Λ(A),

that is, the eigenvalues of H are a subset of the eigenvalues of A. Our goal in this section is to discuss methods for obtaining approximate invariant subspaces, where the residual R is orthogonal to some k-dimensional subspace W = span col W:

    AV = VH + R  where  R ⊥ W ⇔ W∗R = 0.        (10.3)

This orthogonality property is also known as a Petrov–Galerkin condition. The resulting approximate eigenvalues are the eigenvalues of H, obtained by appropriately projecting A:

    H = W∗AV  where  W∗V = Ik.        (10.4)
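The inclusion Λ(H) ⊂ Λ(A) for an exactly invariant subspace is easy to confirm numerically. In the sketch below, V spans the invariant subspace belonging to the three largest eigenvalues of a symmetric test matrix, so the residual R in (10.3) vanishes and the projected eigenvalues match exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 10, 3
A = rng.standard_normal((n, n))
A = A + A.T                          # symmetric, so eigenvectors are orthonormal
lam, U = np.linalg.eigh(A)           # eigenvalues in ascending order

# V spans the invariant subspace of the k largest eigenvalues.
V = U[:, -k:]
H = V.T @ A @ V                      # projection (10.4) with W = V
print(np.linalg.norm(A @ V - V @ H))              # zero residual: V is invariant
print(np.sort(np.linalg.eigvalsh(H)), lam[-k:])   # Lambda(H) is a subset of Lambda(A)
```

For an approximate (non-invariant) subspace the residual is nonzero, and the eigenvalues of H are only approximations; this is exactly the situation handled by the Petrov–Galerkin condition (10.3).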
Furthermore. and a matrix norm · . in which case H = V∗ AV. Given a matrix A ∈ Rn×n . Instead.1. the reciprocal condition number is the normwise distance to singularity (see e. 2004 i i i i . For every eigenvalue λ ˆ A = A + E. this is also equivalent to the condition number of the eigenvector matrix being inﬁnite κ(V) = ∞. since U−1 = U∗ . A− 1 is called the condition number of A.e. the condition number becomes very big. A consequence of this fact is that the Jordan form is not stably computable. for details we refer to the work of K˚ agstr¨ om et al. λ where V is the matrix whose columns are the eigenvectors of A. i. is equivalent to the nondiagonalizability of A. the same holds true for Proof. what is stably computable is the distance of the given matrix to various Jordan forms. Often we take W = V . By assumption. to the existence of nontrivial Jordan blocks.1. In the limit.i i 10. assuming λ / Λ(A). [314]): A min {A + E is singular} = . (BauerFike) Let A be diagonalizable and let E be a perturbation. Introduction 257 i “oa” 2004/4/14 page 257 i In this case the projection is called biorthogonal. the existence of parallel eigenvectors. if A has almost parallel eigenvectors. ([143]) Let A = VΛV−1 . singularity of this latter expression −1 −1 ˆ implies that (λI − Λ) V EV has an eigenvalue equal to one. The following result holds for pnorms. which implies κ(U) = 1. Consider A =A+ E ACA V E V−1 = 1 κ(V) E .1. The distance of the eigenvalues of A and A + E is bounded from above by the norm of the perturbation and by the condition number of the eigenvector matrix of A. Thus ˆ I − Λ)−1 VEV−1 ≤ (λ ˆ I − Λ)−1 1 ≤ (λ The result thus follows. in the 2induced norm. we consider in more detail what happens to A under the inﬂuence of a perturbation. ˆ of Proposition 10. there exists an eigenvalue λ of A such that ˆ − λ ≤ κ(V) E . Orthogonal (unitary) matrices U are perfectly conditioned. ˆ I − A − E)V is singular. 
It is well know that the following relationship holds: κ ( A) = A A−1 . We discuss here only a simple case. ˆ − λ) minλ (λ April 14. κ(A) E Notice that the condition number is bounded from below by one: κ(A) ≥ 1. [190]. the quantity κ(A) = lim + →0 sup E ≤ A (A + E)−1 − A−1 .1 Condition numbers and perturbation results Deﬁnition 10.g. det A = 0. Next. so its norm must be at least that large. 10.2. and the approximate eigenvalues are the eigenvalues of H. the matrix V−1 (λ −1 −1 −1 ˆ ˆ ˆ∈ the matrices λI − Λ − V EV and I − (λI − Λ) V EV. for induced matrix norms.
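The Bauer–Fike bound is easy to test numerically; the following sketch (with an arbitrary random test matrix and a perturbation of norm 10⁻⁶) compares the worst displacement of the perturbed eigenvalues with κ(V)‖E‖:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
A = rng.standard_normal((n, n))
lam, V = np.linalg.eig(A)
kappa_V = np.linalg.cond(V)            # condition number of the eigenvector matrix

E = rng.standard_normal((n, n))
E = 1e-6 * E / np.linalg.norm(E, 2)    # perturbation with ||E|| = 1e-6
mu = np.linalg.eigvals(A + E)

# Each perturbed eigenvalue lies within kappa(V)*||E|| of some eigenvalue of A.
dist = np.array([np.abs(lam - m).min() for m in mu])
print(dist.max(), kappa_V * 1e-6)      # the left value never exceeds the right
```

When the eigenvector basis is well conditioned (κ(V) close to 1), the bound is nearly sharp; for highly nonnormal A it can overestimate the displacement considerably.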
wj vi = δj. combining these equalities with ∗˙ wi v = 0. k = i. respectively: ∗ ∗ ∗ Avi = λi vi .. where A = A + E. W. vk ) ACA : i i April 14. wi i dv d k=i =0 = ∗ wk Evi k=i λk −λi vk =0 =0 1 ∗v wi i ∗ ∗ (wi Evk ) (wk Evi ) . For details we refer to [220]. For a sufﬁciently small > 0. E is ﬁxed.3. λi −λk (10. (a) The condition number κλk of the eigenvalue λk is now deﬁned as the maximum of w∗ vk . wk ( ) = 1. Since A0 = A it follows that λ0 = λi . The ﬁrst and the second derivatives of the quantities above are: dλ d 1 d2 λ 2 d 2 1 d2 v 2 d 2 =0 = = = ∗ wi Evi ∗v . yields the desired second part of (10. The second summand above vanishes and hence the ﬁrst one yields the expression for λ ˙ . wk (A − λ k ∗˙ ∗ Taking the limit → 0. k ∗ ∗ over all E satisfying E = 1. the expression becomes (λk − λi )wk vi = w k Evi . In the sequel we only prove the formulae for the ﬁrst derivatives. let vi . Moreover.5). same holds true for the right and left eigenvectors: v0 = vi .5) ∗ ∗ (wk Evj ) (wj Evi ) k. and is a varying parameter.i where δj. the perturbed eigenvalue chosen. and left eigenvector of A . then V−1 A = ΛV−1 . for simplicity the derivative of f with respect to be denoted by f ˙ I)v + (A − λ I)v ˙ −λ ˙ = 0.5). (A The ﬁrst formula in (10. w0 = wi . 2004 i i . wi A = λi wi . the ∗ v = 0 and w∗ vi = 0. and w be an eigenvalue. the rows of W∗ = V−1 are its left eigenvectors. Proposition 10. which is small enough so that A has simple eigenvalues as well. k k Remark 10. [143]. we calculate the inner product of the expression above with a left eigenvector wk To prove the formula for v of A that corresponds to the eigenvalue λk = λi : ∗ ˙ ˙ I)v + w∗ (A − λ I)v ˙ = 0. furthermore this upper bound is attained for ∗ the following choice of the perturbation matrix E = wk 1 vk wk vk . Thus if the columns of V are the right eigenvectors of A. we also assume that wj for all i = j and for all permissible values of the parameter . w∗ (A ˙  =0 given in (10. 
Under the above assumptions the Taylor series expansions of λ and v are λ = λi + dλ d + =0 d2 λ d 2 2 =0 2 + O( 3 ). Thus w∗ Ev κλk = wk vk 1 = . w . right. Let AV = VΛ.i is the Kronecker symbol (equal to one if the subscripts are equal and zero otherwise). v = 1. Eigenvalue computations i “oa” 2004/4/14 page 258 i It is assumed that A has simple eigenvalues (mi = 1).5) is obtained by taking the inner product of the above expression by w : ˙ I)v + w∗ (A − λ I)v ˙ −λ ˙ = 0. is equal to an eigenvalue of A. evaluated for = 0.1.i i 258 Chapter 10. vk ( ) = 1. wi denote the ith column of V. v .j =i (λi −λk )(λj −λi ) vj Proof. respectively. the corresponding formulae for the second derivatives follow similarly. w∗ (A − λ I) = 0. v = vi + dv d + =0 d2 v d 2 2 =0 2 + O( 3 ). there holds (A − λ I)v = 0. For details see the book by Wilkinson [355]. ∗v wk cos ( w k k .1. i. let λ . First notice that wk Evk ≤ wk vk . The derivative of the above expression with respect ˙: to is equal to the following expression.e.
while those in the two panes on the right are due to Karow [196]. For details on this topic we refer to the work of Trefethen [326]. vi = 0. [327]. Pseudospectra* 259 i “oa” 2004/4/14 page 259 i The condition number is large if the left and right eigenvectors are close to being orthogonal wi .5) gives an expression for the rate of change of the right eigenvector of A as a function of the perturbation E. is the concept of pseudospectra. 2004 i i i i . The singular values of the original Jordan block are σk = 1. In our ﬁrst example 10. we revisit the matrix discussed in example 3.3). of J are perturbed. Thus. Consider a perturbed matrix J which differs from J in that J (n.6 0. Therefore. For example if n = 103 and = 10−18 .2) can be expressed as Λ (A) ⊆ Λ (A) + ∆κ(V) . we consider the perturbed matrix A + E. the right eigenvector is vi = e1 while the left one is wi = en .5 1 1. where is a small positive parameter.4. (b) To study the sensitivity of the eigenvectors.2. Notice that if A is in Jordan form and λi corresponds to a nontrivial Jordan block with ones above the diagonal.1. It is interesting to note that while the eigenvalues are highly perturbed with small perturbations of the Jordan block. 11. a simple calculation shows 1 that the eigenvalues of J lie on a circle of radius n .2.1. which can be downloaded from http://www. therefore wi .2.1))). the eigenvector vi is most sensitive in the direction of eigenvectors corresponding to eigenvalues that are close to λi . Next we consider the following matrices: 1 0.5 2 Figure 10. there is an orthonormal set of eigenvectors and hence the condition number of every eigenvalue is one.1)). or all entries.2 Pseudospectra* An important tool for analyzing matrices whose basis of eigenvectors is not well conditioned.8 −1 −1 −0. and A=[0 1 6.comlab. Figure 10. A=triu(ones(50. Example 10.ac.uk/pseudospectra/psagui Deﬁnition 10. Pseudospectra. and σ12 = 0.2. · · · .i i 10.6 −0. 
and E is arbitrary subject to the norm constraint E = 1. ACA April 14. It readily follows that whether just one entry.2 −0. 10−3 and 10−2 pseudospectra of the 2 × 2 matrix of example 3. Perturbations of the unitary eigenproblem are studied in [66]. eigenvectors corresponding to clustered eigenvalues are difﬁcult to compute. k = 1.ox. These examples have been computed using the Pseudospectra GUI.4 −0. the radius of this circle is 0. Consider a Jordan block J of size n × n with ones above the diagonal and zeros everywhere else. page 35. the singular values are not.9594 (see ﬁgure 10.2. vi ≈ 0. The pseudospectrum of A ∈ Rn×n is Λ (A) = {Λ (A + E) : E ≤ }= z∈C: (z I − A)−1 ≥ −1 . 1) = . The plots in the two panes to the left of this ﬁgure are due to Embree. See also the paper by Embree and Trefethen [105] and references therein.5 0 0. (c) If A is symmetric (Hermitian).0 0 0]. Below we just give the deﬁnition and present a few illustrative examples. With the above deﬁnition the BauerFike result (proposition 10.2.8 0. A=compan(poly(ones(10.1.2 0 −0.4 0. 10. the second formula (10. Here we take = 100.2 shows the pseudospectra of these matrices.0 0 1. where ∆κ (V) is the disk of radius κ(V).
0. It should also be mentioned that if the perturbations of J are allowed to be complex.4 0.i i 260 4 i “oa” 2004/4/14 page 260 i Chapter 10.2 −0. 10−2 .00. right most pane: real pseudospectrum of the same matrix for the same . In more detail. i i ACA : April 14. 10−2 . The (12.4 0.1 0 0. respectively. 10−1 .2. Leftmost pane: pseudospectra of a 10 × 10 companion matrix for = 10−9 .1 −1 −1 −0.3 240 270 300 −0.8 Eigenvalues of perturbed Jordan block 0.3 −0.1 0 0. 1) element being 10−6 and 10−9 .3 is the pseudospectrum of J. Right ﬁgure.1 0 0.1 0. disks covering the starlike ﬁgures result.3 illustrates the perturbation of a 12 × 12 Jordan block J with ones above the diagonal and zeros everywhere else.2 0. · · · .3 0.2 2 2 1 1 0. 1.1 180 0 0 −0.3 −0.1 −0.5 Figure 10.3 0. 10−6 (right). The superimposed dots are the eigenvalues obtained as in the lefthandside ﬁgure with the (12. Eigenvalues of perturbed Jordan block 120 90 1 60 0. 1.3 0.3 −0. 1) element of J is perturbed from zero to = 10−3 .82.4 0.4 −0. Left ﬁgure. The resulting shape in ﬁgure 10.4 0.4 −0.1 210 330 −0.2 0. J is perturbed by adding a matrix with random entries which are normally distributed. 10−2 .5 −0.2 −0. respectively. respectively. namely: 0. Perturbation of the eigenvalues of a 12 × 12 Jordan block (left) and its pseudospectra for real perturbations with = 10−9 .2 0.1 0. second pane from left: pseudospectra of a 50 × 50 upper triangular matrix for = 10−14 .4 30 0.4 Figure 10.1 0.5 −0.5 −0.1 0 −0. 10−12 .56. the singular values are the same as those of the unperturbed J to within the norm of the perturbation.2 0.2 0.1 0 0 0 −0. · · · . The corresponding eigenvalues lie on a circle of radius equal to the 12th root of the perturbation.4 −0. Eigenvalue computations 4 0.5 0.4 0.025.5 3 3 0. 10−8 . ﬁgure 10.2 0.3 0.2 −0.6 150 0. second pane from right: complex pseudospectrum of the 3 × 3 matrix above for = 0. with variance 10−6 and 10−9 .3 0.2 −2 −2 −0.3 −0. 
2004 i i .5 0.3 −0.3 0.3.5 −3 −3 −4 −1 −4 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 8 −0.68. 5000 experiments are performed and the resulting eigenvalues plotted.2 −0.4 −0. 0.4 −0.2 −0.4 −0.
10.3 Iterative methods for eigenvalue estimation

The following are well known facts.

Spectral theorem for Hermitian matrices. Given the n × n matrix A = A∗, there exists an orthogonal (unitary) matrix U = [u1 · · · un] and real numbers λi such that

    A = UΛU∗ = Σ_{i=1}^{n} λi ui ui∗.

The columns/rows of U are the right/left eigenvectors of A and form an orthonormal basis for Rn (Cn).

Schur triangular form. Every square matrix is unitarily equivalent to a triangular matrix: A = QTQ∗, where Q is orthogonal (unitary) and T is upper triangular.

The QR factorization. Given a matrix A ∈ Rn×k, k ≤ n, there exists a matrix Q ∈ Rn×k whose columns are orthonormal, Q∗Q = Ik, and an upper triangular matrix R ∈ Rk×k such that A = QR.

Definition. The Rayleigh quotient of a square symmetric matrix A = A∗ ∈ Rn×n at x ∈ Rn is:

    ρ(x) = x∗Ax / x∗x.

If we order the eigenvalues of A as λ1 ≥ · · · ≥ λn, it readily follows that for all nonzero x: λ1 ≥ ρ(x) ≥ λn; the extremal values are attained by choosing x to be the (right) eigenvector corresponding to the largest/smallest eigenvalue.

Proposition. Given α ∈ R and v ∈ Rn, ‖v‖ = 1, let β = ‖Av − αv‖. Then A has an eigenvalue in the closed interval [α − β, α + β]. The interval width is minimized if α is the Rayleigh quotient of v.

Proof. It readily follows that β² = Σ_i (λi − α)² |ui∗v|², and the latter expression is greater than or equal to mini (λi − α)².

10.3.1 The Rayleigh–Ritz procedure

Given A = A∗ ∈ Rn×n, let the columns of V = [v1 · · · vr] ∈ Rn×r, r ≤ n, be orthonormal, V∗V = Ir. Consider the eigenvalue problem projected onto the subspace spanned by the columns of V; with x = Vy:

    Ax = µx + w with w ⊥ V  ⇒  AVy = µVy + w  ⇒  V∗AVy = µy.

The matrix Â = V∗AV is a symmetric matrix; let its eigenvalues be µ1 ≥ · · · ≥ µr. By the Cauchy interlacing theorem (see proposition 3.17 on page 50) we have λi ≥ µi ≥ λn−r+i, i = 1, · · · , r. Let (λ̂, x̂) be an eigenpair of Â. Then λ̂ is called a Ritz value and Vx̂ a Ritz vector of A.

10.3.2 The simple vector iteration (Power method)

Given is the diagonalizable matrix A ∈ Rn×n. Let v(0) be any vector with ‖v(0)‖ = 1. Repeat the following steps:

    x = Av(k)  and  v(k+1) = x/‖x‖.
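A minimal sketch of the simple vector iteration, applied to a symmetric test matrix with known spectrum 1, · · · , n so that convergence can be verified:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30
# Symmetric test matrix with known eigenvalues 1, 2, ..., n.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag(np.arange(1.0, n + 1)) @ Q.T

v = rng.standard_normal(n)
v = v / np.linalg.norm(v)
for _ in range(1000):          # simple vector iteration
    x = A @ v
    v = x / np.linalg.norm(x)
rho = v @ A @ v                # Rayleigh quotient of the final iterate
print(rho)                     # converges to the dominant eigenvalue, here n = 30
```

The convergence factor is |λ2/λ1| = 29/30 per step, which is why so many iterations are taken here; this slow linear convergence is precisely what motivates the Krylov methods of this chapter.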
Notice that an approximate eigenvalue can be obtained at each step by means of the Rayleigh quotient. Let the eigenvalue decomposition of A be A = VΛV⁻¹, where the eigenvalues are ordered in decreasing magnitude, |λi| ≥ |λi+1|. The following result holds.

Proposition. Assume that A has a simple largest eigenvalue λ1, and let v1 be the corresponding eigenvector. Let v(0) be any vector that has a nonzero component in the direction of v1. The simple iteration described above converges towards the dominant eigenvector, and the angle between the k-th iterate v(k) and v1 is of the order O(|λ2/λ1|^k).

Remark. (a) The smaller the ratio |λ2|/|λ1|, the faster the convergence. (b) The result holds even if the matrix is not diagonalizable, provided λ1 ≠ λ2. (c) If the initial vector v(0) does not have a component in the direction of v1, convergence is towards v2, assuming that |λ2| ≠ |λ3|. (d) The algorithm does not converge if |λ1| = |λ2|; a counterexample is given by A = [0 1; 1 0].

10.3.3 The inverse vector iteration

Often a good approximation of an eigenvalue is given: λ ≈ λk. Then, if A in the simple vector iteration is replaced by B = (A − λI)⁻¹, the convergence is in general fast, because the largest eigenvalue 1/(λk − λ) of B is (much) bigger than the other eigenvalues 1/(λj − λ), j ≠ k. As before, let v(0) be any vector with ‖v(0)‖ = 1. The following steps are repeated:

    solve (A − λI)x = v(k)  and set  v(k+1) = x/‖x‖.

Remark. (a) The solution x of (A − λI)x = v(k) can be determined by means of the LU factorization or iteratively. (b) λ is called the spectral shift; this is also called the shift-and-invert method. (c) In general this inverse iteration is more expensive than the simple iteration discussed earlier, since a system of equations has to be solved at each step. In any particular case there is a trade-off between the increased amount of work and the increased speed of convergence.

10.3.4 Rayleigh quotient iteration

As before, let v(0) be any vector with ‖v(0)‖ = 1. Repeat the following steps: σk = ρ(v(k)); solve (A − σkI)x = v(k) and set v(k+1) = x/‖x‖. This method is obtained by changing the shift in the inverse iteration method at each step
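The inverse iteration can be sketched analogously; here the shift 7.2 is an assumed rough approximation of the eigenvalue 7 (in practice A − λI would be factored once by LU, rather than solved from scratch at each step):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag(np.arange(1.0, n + 1)) @ Q.T   # symmetric, eigenvalues 1..n

shift = 7.2                                     # rough guess for the eigenvalue 7
v = rng.standard_normal(n)
v = v / np.linalg.norm(v)
for _ in range(50):
    x = np.linalg.solve(A - shift * np.eye(n), v)   # (A - lambda I) x = v(k)
    v = x / np.linalg.norm(x)
rho = v @ A @ v
print(rho)     # approximates the eigenvalue of A closest to the shift, i.e. 7
```

The convergence factor here is |7 − 7.2|/|8 − 7.2| = 0.25 per step, dramatically faster than the unshifted iteration; this is the pay-off for the linear solve at each step.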
to become the Rayleigh quotient of the most recent eigenvector estimate.3. This is accomplished by orthogonalizing them at each step. (a) The solution x of (A − λI)x = v(k) can be determined by means of the LU factorization or iteratively. having full column rank. but λ1 = λ2 . λ1 k λ2  Remark 10.1.3. Assume that A has a simple largest eigenvalue λ1 and let v1 be the corresponding eigenvector.4 Rayleigh quotient iteration This method is obtained by changing the shift in the inverse iteration method at each step. The simple iteration described above conλ2 k verges towards the dominant eigenvector and the angle between the k th iterate v(k) and v1 .8. assuming that λ2  = λ3 . The following steps are repeated: solve (A − σ I)X = V(k) and perform a QR factorization V(k+1) R(k+1) = X. if A in the simple vector iteration is replaced by B = (A − λI)−1 . Therefore.2.i i 262 Chapter 10. 10. Proposition 10.3. the faster the convergence. convergence is towards v2 . j = k . Given is V(0) ∈ Rn×q .
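The single-vector iterations of sections 10.3.3 and 10.3.4 can be sketched in a few lines. The following is a minimal illustration, not the book's implementation; the function name, the dense solve in place of an LU factorization, and the small test matrix are my own choices.

```python
import numpy as np

def inverse_iteration(A, shift, tol=1e-10, max_iter=100):
    """Shift-and-invert iteration: converges to the eigenvalue of A closest to `shift`."""
    n = A.shape[0]
    v = np.ones(n) / np.sqrt(n)          # v(0) with ||v(0)|| = 1
    B = A - shift * np.eye(n)
    for _ in range(max_iter):
        x = np.linalg.solve(B, v)        # solve (A - shift*I) x = v(k)
        v_new = x / np.linalg.norm(x)    # v(k+1) = x / ||x||
        if np.linalg.norm(v_new - np.sign(v_new @ v) * v) < tol:
            v = v_new
            break
        v = v_new
    return v @ A @ v, v                  # Rayleigh quotient and eigenvector estimate

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
lam, v = inverse_iteration(A, shift=4.5)
```

For the Rayleigh quotient iteration one would replace the fixed `shift` by `v @ A @ v` recomputed inside the loop, at the cost of refactoring the shifted matrix every step.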
The following result holds for the iteration (10.6).

Proposition 10.9. Consider the partial (block) Schur decomposition of A,

    AQ = QΔ = [Q1 Q2] [Δ11 Δ12; 0 Δ22],  Δ11 ∈ Rk×k,

where Q = [q1 ··· qk], and the corresponding eigenvalues are ordered so that |λ1| ≥ ··· ≥ |λk| > |λk+1| ≥ ··· ≥ |λn|. Let V(0) ∈ Rn×k be such that Q1*V(0) ∈ Rk×k is nonsingular. Then the simple subspace iteration (10.6) converges, and the angle between the subspaces spanned by the columns of V(k) and Q1 at the mth step is of the order O(|λk+1/λk|^m).

Remark 10.3.3.
(a) The angle between two subspaces is the maximum over all angles between two vectors, each lying in one subspace. Given two subspaces spanned by the columns of A, B respectively, let A = QA RA, B = QB RB be the corresponding QR factorizations. The angle between the two subspaces is the largest singular value of QA* QB (the singular values of this product are the principal angles between the two subspaces).
(b) To speed up convergence, the Rayleigh–Ritz procedure can be incorporated in the subspace iteration algorithm: the approximate eigenvalues are eigenvalues of the projected A given by (10.3). In this procedure a few p ≤ k vectors are sought that converge much faster to eigenvectors of the original matrix.
(c) The advantages of subspace iteration are: it is stable, it is simple to implement, it has low storage requirements, and knowledge of good starting vectors can be exploited. The disadvantage is its slow convergence.

10.4 Krylov methods

In the iteration methods described above, a sequence of vectors v(k) is constructed that converges (under mild assumptions) to the eigenvector corresponding to the largest (in modulus) eigenvalue. Notice, however, that at each step only the last vector of the sequence v(k) is used, and the convergence rate can be slow. The approach described in this section attempts to keep the information provided by the whole sequence and make good use of it. As stated earlier, our goal is to determine approximate invariant subspaces that satisfy (10.4). In the sequel we first discuss the case where W = V and motivate the particular choice of V.

10.4.1 A motivation for Krylov methods

First we attempt to provide a motivation for the use of Krylov methods by concentrating on symmetric matrices A = A* ∈ Rn×n. We follow the reasoning of Golub and Van Loan [143]. Consider the Rayleigh quotient ρ(x) = x*Ax/x*x. The partial derivative of ρ with respect to xi is

    ∂ρ/∂xi = (ei*Ax + x*Aei)/(x*x) − (x*Ax)(ei*x + x*ei)/(x*x)² = (2/x*x)(ei*Ax − ρ(x) ei*x).

Thus the gradient ∇ρ = [∂ρ/∂x1 ··· ∂ρ/∂xn]* is

    ∇ρ(x) = (2/x*x)(Ax − ρ(x)x),

which shows that the stationary points of the Rayleigh quotient are the eigenvectors of A. Consider a matrix V whose columns are orthonormal, and let Vj denote the submatrix of V consisting of its first j columns. It follows from the Cauchy interlacing theorem that the following set of inequalities holds:

    λmax(V1*AV1) ≤ λmax(V2*AV2) ≤ ··· ≤ λmax(Vk*AVk) ≤ λmax(A).
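Returning briefly to the subspace iteration of section 10.3.5, the orthogonalized iteration together with a Rayleigh–Ritz projection can be sketched as follows. This is an illustrative sketch, with the solve in (10.6) replaced by a plain multiplication by A (the shift-free case, which converges to the dominant invariant subspace); the function name and test matrix are mine.

```python
import numpy as np

def subspace_iteration(A, q=2, iters=200, seed=0):
    """Iterate with q vectors; orthogonalize A V(k) at every step via QR."""
    rng = np.random.default_rng(seed)
    V, _ = np.linalg.qr(rng.standard_normal((A.shape[0], q)))  # V(0), full column rank
    for _ in range(iters):
        V, _ = np.linalg.qr(A @ V)       # V(k+1) R(k+1) = A V(k)
    H = V.T @ A @ V                      # Rayleigh-Ritz projection (A symmetric here)
    return np.sort(np.linalg.eigvalsh(H))[::-1], V

A = np.diag([5.0, 3.0, 1.0, 0.5])
ritz, V = subspace_iteration(A, q=2)     # Ritz values approximate the two dominant eigenvalues
```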
Let Vk*Vk = Ik. Given the largest eigenvalue of Vj*AVj, the (j+1)st column vj+1 should be chosen so as to MAXIMIZE λmax(Vj+1*AVj+1). Let z ∈ Rj be a unit vector such that Vj*AVj z = λmax(Vj*AVj) z, and let rj = Vj z ∈ Rn, so that ρ(rj) = z*Vj*AVj z = λmax(Vj*AVj); this shows that rj ∈ span{v1, ..., vj}. The direction of largest increase of the Rayleigh quotient evaluated at rj is given by its gradient, namely ∇ρ(rj) = (2/rj*rj)(Arj − ρ(rj)rj). The condition that has to be imposed on the choice of the (j+1)st column vj+1 is therefore that the span of the columns of the matrix Vj+1 contain the gradient of the Rayleigh quotient evaluated at rj:

    ∇ρ(rj) = (2/rj*rj)(Arj − ρ(rj)rj) ∈ span col Vj+1 = span col [v1 ··· vj vj+1].

Solving these constraints successively for j = 1, 2, ..., k, we obtain

    span col [v1, ..., vk] = span col [v1, Av1, ..., A^(k−1) v1] = Rk(A, v1).    (10.7)

Furthermore, a similar argument shows that this choice of vj+1 minimizes the smallest eigenvalue of Vj+1*AVj+1. This leads to the problem of computing orthonormal bases for reachability/Krylov subspaces; see (4.26) for the definition of these subspaces. It should be mentioned that in the numerical linear algebra community these are referred to as Krylov subspaces and denoted by Kk(A, v1). Conditions (10.3) and (10.7) form the basis for Krylov methods.

10.4.2 The Lanczos method

Consider the symmetric n × n matrix A together with a sequence of vectors v1, v2, ..., vk that form an orthonormal basis for the reachability (Krylov) subspace Rk(A, v1). The Lanczos algorithm can be derived by imposing the requirement that, given v1, ..., vk, the next vector in the sequence, vk+1, be orthonormal to its predecessors. The quantity at our disposal for achieving this is Avk; we need to orthogonalize Avk with respect to the vi, i = 1, ..., k. This can be done by applying the Gram–Schmidt procedure to the new direction. The component rk of Avk orthogonal to the span of Rk(A, v1) is given by

    rk = Avk − Σ_{i=1}^{k} (vi*Avk) vi = Avk − Vk [Vk*Avk] = [I − Vk Vk*] Avk.

Thus the new vector in the sequence is vk+1 = rk/‖rk‖, and therefore

    Avk = Σ_{i=1}^{k+1} αi,k vi,  where αi,k = vi*Avk ∈ R.

Since vj depends only on v1, ..., vj and the matrix A, we have αi,j = 0 for i > j + 1; moreover, since A is symmetric, αi,j = αj,i, and consequently the coefficients are zero for j > i + 1 as well. In other words, this procedure has the following property.

Proposition 10.10. For any choice of A = A* ∈ Rn×n, the coefficients satisfy αi,j = 0 for |i − j| > 1.
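The Gram–Schmidt construction above reduces, for symmetric A, to the three-term Lanczos recurrence. The following is a minimal sketch under my own naming, with an extra full-reorthogonalization line added as a floating-point safeguard (in exact arithmetic it subtracts nothing); the diagonal test matrix is chosen for illustration only.

```python
import numpy as np

def lanczos(A, v1, k):
    """Build Vk with orthonormal columns spanning R_k(A, v1) and the
    tridiagonal projection Hk = Vk* A Vk, for symmetric A."""
    n = A.shape[0]
    V = np.zeros((n, k))
    alpha, beta = np.zeros(k), np.zeros(k - 1)
    V[:, 0] = v1 / np.linalg.norm(v1)
    for j in range(k):
        w = A @ V[:, j]
        alpha[j] = V[:, j] @ w
        w -= alpha[j] * V[:, j]
        if j > 0:
            w -= beta[j - 1] * V[:, j - 1]
        w -= V[:, :j + 1] @ (V[:, :j + 1].T @ w)   # safeguard: reorthogonalize
        if j < k - 1:
            beta[j] = np.linalg.norm(w)
            V[:, j + 1] = w / beta[j]
    H = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
    return V, H

A = np.diag(np.concatenate([np.arange(1.0, 20.0), [25.0]]))  # eigenvalues 1..19 and 25
V, H = lanczos(A, np.ones(20), k=8)
ritz = np.linalg.eigvalsh(H)    # Ritz values: extremes of the spectrum converge first
```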
The above equations can also be written compactly as follows:

    AVk = Vk Hk + rk ek*,  where Hk = Vk*AVk.    (10.8)

Several conclusions can be drawn from these relations. First, the matrix Hk, which is the projection of A onto span col Vk, is tridiagonal: letting, for simplicity of notation, αi = αi,i and βi+1 = αi,i+1, we have

    Hk = [α1 β2        ;
          β2 α2 β3     ;
             β3 α3 .   ;
                .  . βk;
                  βk αk].    (10.9)

This matrix shows that the vectors in the Lanczos procedure satisfy a three-term recurrence relationship:

    Avi = βi+1 vi+1 + αi vi + βi vi−1.

If the remainder rk = 0, then rk+1 cannot be constructed and the Lanczos procedure terminates; in this case, if (λ, x) is an eigenpair of Hk, then (λ, Vk x) is an eigenpair of A (since Hk x = λx implies AVk x = Vk Hk x = λVk x). However, if rk ≠ 0, we can apply the Rayleigh–Ritz procedure: if (λ, x) is an eigenpair of Hk, the Rayleigh quotient satisfies ρ(Vk x) = x*Vk*AVk x/x*x = λ, where Vk x is the corresponding Ritz vector. Moreover, with µ = ‖A(Vk x) − λ(Vk x)‖, A has an eigenvalue in the interval [λ − µ, λ + µ]; by interlacing, we also have λk(A) ≤ λ ≤ λ1(A).

10.4.3 Convergence of the Lanczos method

How close is the estimate of the largest eigenvalue after k steps of the Lanczos procedure? There are several results in this direction. We quote here one of them, due to Kaniel and Paige; for details we refer the reader to [143].

Proposition 10.11. Let (λi, wi) be the ith eigenpair of the symmetric matrix A ∈ Rn×n, where, for simplicity of notation, we assume that λi ≥ λi+1 and that the eigenvectors are orthonormal. Let µj, j = 1, ..., k, be the eigenvalues of the tridiagonal matrix Hk obtained after k steps of the Lanczos algorithm with starting vector v1. Then

    λ1 ≥ µ1 ≥ λ1 − (λ1 − λn) ν1²,  where ν1 = tan φ1 / c_{k−1}(1 + 2ρ1),  cos φ1 = |v1*w1|,  ρ1 = (λ1 − λ2)/(λ2 − λn),

    λn ≤ µk ≤ λn + (λ1 − λn) νn²,  where νn = tan φn / c_{k−1}(1 + 2ρn),  cos φn = |v1*wn|,  ρn = (λn−1 − λn)/(λ1 − λn−1),

and ci denotes the Chebyshev polynomial of degree i.

Recall that the Chebyshev polynomials are generated recursively using the formula cj(ξ) = 2ξ cj−1(ξ) − cj−2(ξ), where c0(ξ) = 1 and c1(ξ) = ξ. These polynomials (shown in figure 10.4) lie within the square bounded by ±1 in both axes for ξ ∈ [−1, 1], and increase rapidly outside this interval. Since the denominators of ν1 and νn are the values of such a polynomial outside this interval, they are likely to be large. Thus, if the angle between the starting vector and the corresponding eigenvectors of A is not large, ν1 and νn are small, and convergence is fast.

Figure 10.5 shows pictorially the convergence of the Lanczos procedure. Given is a 20 × 20 symmetric matrix whose eigenvalues lie in the interval [−15, 15]. The abscissa shows the values of the various eigenvalues and their approximants, while the rows show the approximants after a certain number of steps. For instance, the row labelled 20 shows the eigenvalue obtained after one step of Lanczos, the one labelled 10 shows the eigenvalues obtained at the 11th step, and the one labelled 1 the eigenvalues obtained after 20 steps, which coincide with the exact eigenvalues.
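The growth of the Chebyshev polynomials outside [−1, 1], which drives the Kaniel–Paige bound, is easy to check numerically from the three-term recurrence quoted above. A small self-contained sketch (the function name is mine):

```python
def chebyshev(j, xi):
    """Evaluate the Chebyshev polynomial c_j at xi via the recurrence
    c_j = 2*xi*c_{j-1} - c_{j-2}, with c_0 = 1 and c_1 = xi."""
    c_prev, c = 1.0, xi
    if j == 0:
        return c_prev
    for _ in range(j - 1):
        c_prev, c = c, 2 * xi * c - c_prev
    return c

inside = chebyshev(10, 0.9)     # bounded by 1 in modulus on [-1, 1]
outside = chebyshev(10, 1.1)    # grows rapidly just outside the interval
```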
These exact eigenvalues are shown on the row labelled 0. Notice that the eigenvalue obtained at the first step is approximately in the middle of the range of values, while after 6 steps the two extreme (largest and smallest) eigenvalues are approximated well, and it takes another 5 steps for the second biggest and second smallest eigenvalues to be well approximated. The conclusion is that the spectrum is approximated starting with the extremes.

Figure 10.4. The first 10 Chebyshev polynomials.

Figure 10.5 was generated by means of a MATLAB GUI due to Dan Sorensen, which can be downloaded from: http://www.caam.rice.edu/~caam551/MatlabCode/matlabcode.html

10.4.4 The Arnoldi method

The procedure described above can also be applied to nonsymmetric matrices A. The difference is that in Proposition 10.10 only the condition αi,j = 0 for i > j + 1 survives: the symmetry relation αi,j = αj,i no longer holds, and therefore the projected matrix Hk loses its symmetry and becomes a Hessenberg matrix. In this case we denote its entries by hi,j:

    Hk = [h1,1 h1,2 h1,3 ···   h1,k−1   h1,k;
          h2,1 h2,2 h2,3 ···   h2,k−1   h2,k;
               h3,2 h3,3 ···   h3,k−1   h3,k;
                         ···             ··· ;
                               hk,k−1   hk,k].    (10.10)

A key consequence of this lack of tridiagonal structure is that long recurrences are now needed to construct vk+1, in contrast to the Lanczos procedure, where only three-term recurrences are necessary. The resulting process is known as the Arnoldi method. The stage is thus set for restarting methods for the Arnoldi procedure, which are briefly discussed in section 10.4.11.

10.4.5 Properties of the Arnoldi method

Given A ∈ Rn×n and b ∈ Rn, let Rk(A, b) ∈ Rn×k be the reachability or Krylov matrix defined by (4.26) on page 60. It is assumed that Rk has full column rank equal to k, which is true, for example, if (A, b) is reachable.

Problem 10.4.1. Devise a process which is iterative and at the kth step gives

    AVk = Vk Hk + Rk,  k = 1, 2, ..., n,

where Hk ∈ Rk×k and Vk, Rk ∈ Rn×k. These quantities have to satisfy the following conditions at each step.
1. The columns of Vk are orthonormal: Vk*Vk = Ik, k = 1, 2, ..., n.
2. Vk satisfies the Galerkin condition Vk*Rk = 0, that is, the residual Rk is orthogonal to the columns of Vk; moreover, span col Vk = span col Rk(A, b), k = 1, 2, ..., n.
3. Only the last column of the residual Rk is nonzero, k = 1, 2, ..., n.

In other words, the Arnoldi process is this: given A ∈ Rn×n, construct V ∈ Rn×k with orthonormal columns, such that H = V*AV ∈ Rk×k is an upper Hessenberg matrix and only the last column of the residual R ∈ Rn×k is nonzero.

Figure 10.5. Plot of the eigenvalues of a symmetric 20 × 20 matrix using the symmetric Lanczos procedure.

Figure 10.6. Pictorial representation of the Arnoldi process: AV = VH + R.

This problem leads to the Arnoldi procedure. The solution has the following structure.

Theorem 10.12.
1. The remainder Rk has rank one and can be written as Rk = rk ek*, where ek is the kth unit vector; furthermore rk ⊥ Rk(A, b).
2. Given the fact that Vk*Vk = Ik, Hk is obtained by projecting A onto the span of the columns of Vk: Hk = Vk*AVk.
3. Hk is an upper Hessenberg matrix.
4. Let pk(λ) = det(λIk − Hk) be the characteristic polynomial of Hk. This monic polynomial is the solution of the following minimization problem:

    pk = arg min ‖p(A)b‖2,

where the minimum is taken over all monic polynomials p of degree k. Since pk(A)b = A^k b + Rk(A, b) p for an appropriate coefficient vector p, it also follows that the coefficients of pk provide the least squares fit between A^k b and the columns of Rk(A, b) = [b, Ab, ..., A^(k−1)b].
5. There holds

    rk = pk(A)b / ‖pk−1(A)b‖,

and vk+1 = rk/‖rk‖ is the (k+1)st column of V.

The above theorem is based on the following fundamental lemma of the Arnoldi procedure; its proof is left to the reader (see problem 42, page 353).

Lemma 10.13. Let AV = VH + f ek*, with A ∈ Rn×n, V ∈ Rn×k, V*V = Ik, Ve1 = v1, and H ∈ Rk×k upper Hessenberg. There holds

    A^j v1 = V H^j e1  for 0 ≤ j < k,

that is, for any polynomial φ of degree less than k, φ(A)v1 = Vφ(H)e1. For j = k we have A^k v1 = V H^k e1 + ν f, where ν = Π_{i=1}^{k−1} hi+1,i = ek* H^(k−1) e1 is the product of the entries of the subdiagonal of H. Consequently, for any polynomial φ of degree k, φ(A)v1 = Vφ(H)e1 + ν αk f, where αk is the coefficient of the highest power ξ^k of φ(ξ).

There are certain iterative methods for solving the set of linear equations Ax = b, A ∈ Rn×n, b ∈ Rn, that are closely related to the Arnoldi method: FOM, GMRES, CG, and MINRES. All these methods seek approximate solutions at the kth step, denoted by x(k), that belong to the Krylov (reachability) subspace spanned by the columns of Rk(A, b); thus we must have x(k) = Vk y for some y ∈ Rk. (Actually x(k) ∈ x(0) + Rk(A, b); often x(0) is taken to be zero.)
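The process posed in Problem 10.4.1 can be sketched directly from the long-recurrence orthogonalization it describes. The following is an illustrative implementation under my own naming, not the book's code; it returns Vk and the Hessenberg projection Hk.

```python
import numpy as np

def arnoldi(A, b, k):
    """k-step Arnoldi: A V_k = V_k H_k + f e_k*, with V_k* V_k = I and H_k upper Hessenberg."""
    n = A.shape[0]
    V = np.zeros((n, k + 1))
    H = np.zeros((k + 1, k))
    V[:, 0] = b / np.linalg.norm(b)
    for j in range(k):
        w = A @ V[:, j]
        H[:j + 1, j] = V[:, :j + 1].T @ w       # long recurrence: orthogonalize against all v_i
        w -= V[:, :j + 1] @ H[:j + 1, j]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < 1e-14:                 # invariant subspace found; terminate early
            return V[:, :j + 1], H[:j + 1, :j + 1]
        V[:, j + 1] = w / H[j + 1, j]
    return V[:, :k], H[:k, :k]

rng = np.random.default_rng(1)
A = rng.standard_normal((30, 30))
V, H = arnoldi(A, rng.standard_normal(30), k=6)
```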
The FOM (Full Orthogonalization Method) requires a Ritz–Galerkin condition, namely that the residual be orthogonal to Rk(A, b):

    Vk*(b − Ax(k)) = 0.

The GMRES (Generalized Minimal Residual Method) requires instead that the residual be minimized:

    ‖b − Ax(k)‖2 minimal, where x(k) ∈ Rk(A, b).

If A = A* > 0 is positive definite, the CG (Conjugate Gradient) method results; if A = A* is symmetric, the method reduces to MINRES (Minimal Residual Method). The residual in both cases (FOM and GMRES) can be expressed in terms of a polynomial of degree k, ψ(s) = αk s^k + ··· + α1 s + α0, as r = ψ(A)b. In the former case the polynomial should be such that the norm of the residual is minimized subject to αk = 1; in the second, we seek to minimize the norm of the same residual under the assumption α0 = 1.

For GMRES, since AVk = Vk+1 Hk+1,k with Hk+1,k ∈ R(k+1)×k, there holds

    ‖b − Ax(k)‖2 = ‖b − AVk y‖2 = ‖Vk+1 (‖b‖ e1 − Hk+1,k y)‖2 = ‖ ‖b‖ e1 − Hk+1,k y ‖2.

Thus y is the least squares solution of the above problem,

    y = ‖b‖ (Hk+1,k* Hk+1,k)⁻¹ Hk+1,k* e1,  and  x(k) = Vk y.

Finally, for FOM, y is obtained by solving Hk,k y = ‖b‖ e1.

10.4.6 An alternative way of looking at Arnoldi

Consider a matrix A ∈ Rn×n, a starting vector b ∈ Rn, and the corresponding reachability matrix Rn = [b, Ab, ···, A^(n−1)b]. The following relationship holds true: ARn = Rn F, where

    F = [0 0 ··· 0 −α0   ;
         1 0 ··· 0 −α1   ;
         0 1 ··· 0 −α2   ;
             ···          ;
         0 0 ··· 1 −αn−1],

and χA(s) = s^n + αn−1 s^(n−1) + ··· + α1 s + α0 is the characteristic polynomial of A; notice that F is upper Hessenberg. Compute the QR factorization of Rn: Rn = VU, where V is orthogonal and U is upper triangular. It follows that

    AVU = VUF  ⇒  AV = V (UFU⁻¹)  ⇒  AV = VĀ,  where Ā = UFU⁻¹.

Since U is upper triangular, so is U⁻¹. Therefore Ā, being the product of an upper triangular times an upper Hessenberg times an upper triangular matrix, is still upper Hessenberg. Moreover, the columns of [V]k provide an orthonormal basis for the space spanned by the first k columns of the reachability matrix Rn. The k-step Arnoldi factorization can now be obtained by considering the first k columns of the above relationship, to wit:

    [AV]k = [VĀ]k  ⇒  A[V]k = [V]k [Ā]kk + f ek*,

where [Ā]kk contains the first k rows and columns of Ā, and f is a multiple of the (k+1)st column of V.
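The GMRES least squares step described above is a one-liner once the Arnoldi quantities Vk+1 and Hk+1,k are available. The following is a hedged sketch, not a production GMRES (no restarting, no breakdown handling, no Givens updates); names and the test problem are my own.

```python
import numpy as np

def gmres_step(A, b, k):
    """One GMRES projection: minimize ||b - A x|| over x in the Krylov subspace R_k(A, b)."""
    n = A.shape[0]
    V = np.zeros((n, k + 1))
    Hbar = np.zeros((k + 1, k))
    beta = np.linalg.norm(b)
    V[:, 0] = b / beta
    for j in range(k):                       # Arnoldi with the (k+1) x k Hessenberg Hbar
        w = A @ V[:, j]
        Hbar[:j + 1, j] = V[:, :j + 1].T @ w
        w -= V[:, :j + 1] @ Hbar[:j + 1, j]
        Hbar[j + 1, j] = np.linalg.norm(w)
        V[:, j + 1] = w / Hbar[j + 1, j]
    rhs = np.zeros(k + 1)
    rhs[0] = beta                            # ||b|| e1
    y, *_ = np.linalg.lstsq(Hbar, rhs, rcond=None)   # least squares: Hbar y ~ ||b|| e1
    return V[:, :k] @ y                      # x(k) = V_k y

rng = np.random.default_rng(2)
A = np.eye(40) + 0.05 * rng.standard_normal((40, 40))   # mild perturbation of the identity
b = rng.standard_normal(40)
x10 = gmres_step(A, b, 10)
x30 = gmres_step(A, b, 30)
```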
10.4.7 Properties of the Lanczos method

If A is symmetric, the Arnoldi process becomes the Lanczos process, and the corresponding characteristic polynomials of the Hk are the Lanczos polynomials. In this case:

• Hk is symmetric and tridiagonal, with entries αi = Hi,i, 1 ≤ i ≤ n, and βj = Hj,j+1 = Hj+1,j, 1 ≤ j ≤ n − 1 (see also (10.9)).
• Since A = A*, we can define an inner product on the space of polynomials as follows:

    ⟨p(λ), q(λ)⟩ = ⟨p(A)b, q(A)b⟩ = b* p(A*) q(A) b = b* [p(A) · q(A)] b,

where the latter is the usual inner product in Rn.

Lemma 10.14. The Lanczos polynomials pk(λ) = det(λIk − Hk), k = 1, 2, ..., n, with p0 = 1, are orthogonal with respect to the above inner product: ⟨pi, pj⟩ = 0 for i ≠ j.

• The normalized polynomials are p̃j = (β1 β2 ··· βj)⁻¹ pj.
• The columns of Vk and the Lanczos polynomials satisfy the following three-term recurrences:

    βk+1 vk+1 = (A − αk I) vk − βk vk−1,
    βk+1 p̃k+1(λ) = (λ − αk) p̃k(λ) − βk p̃k−1(λ),  i.e.,  pk+1(λ) = (λ − αk) pk(λ) − βk² pk−1(λ).

Consequently, the Lanczos process applied to a symmetric matrix produces 2n − 1 numbers: αi, 1 ≤ i ≤ n, and βi, 1 ≤ i ≤ n − 1.

10.4.8 Two-sided Lanczos

If a matrix A has to be transformed to tridiagonal form but is not symmetric, a modified Lanczos procedure can be applied. This can be seen as an alternative to the Arnoldi method: it provides short recurrences but no orthogonality, versus long recurrences with orthogonality.

Problem 10.4.2. Given A ∈ Rn×n and two starting vectors b, c* ∈ Rn, devise an iterative process that gives at the kth step

    AVk = Vk Tk + Rk,  A*Wk = Wk Tk* + Sk,  k = 1, 2, ..., n,

with Tk ∈ Rk×k, satisfying:
• Biorthogonality: Wk*Vk = Ik, k = 1, ..., n.
• Galerkin conditions: Vk*Sk = 0 and Wk*Rk = 0, k = 1, ..., n.
• span col Vk = span col Rk(A, b) and span col Wk = span col Rk(A*, c*), k = 1, ..., n. Notice that the second of these conditions can also be expressed as span rows Wk* = span rows Ok(c, A), where Ok(c, A) is the observability matrix of the pair (c, A) defined by (4.38).

The assumption for the solvability of this problem is that det(Ok(c, A) Rk(A, b)) ≠ 0 for all k = 1, ..., n. The problem is solved by the two-sided Lanczos procedure. The following additional results hold.

• Tk is obtained by projecting A as follows: Tk = Wk*AVk; it is a tridiagonal matrix.
• The remainders Rk, Sk have rank one and can be written as Rk = rk ek*, Sk = qk ek*. This further implies that vk+1 and wk+1 are scaled versions of rk and qk, respectively.
• The associated (generalized) Lanczos polynomials are defined as before, namely pk(λ) = det(λIk − Tk), k = 0, 1, ..., n − 1, p0 = 1. In this case, however, the inner product is defined in a different way, namely

    ⟨p(λ), q(λ)⟩ = ⟨p(A*)c*, q(A)b⟩ = c [p(A) · q(A)] b,

and the polynomials pk are orthogonal with respect to it: ⟨pi, pj⟩ = 0 for i ≠ j.
• The columns of Vk, Wk and the Lanczos polynomials satisfy the following three-term recurrences:

    γk vk+1 = (A − αk I) vk − βk−1 vk−1,
    βk wk+1 = (A* − αk I) wk − γk−1 wk−1,
    γk pk+1(λ) = (λ − αk) pk(λ) − βk−1 pk−1(λ),
    βk qk+1(λ) = (λ − αk) qk(λ) − γk−1 qk−1(λ).

In compact form, the process produces at the kth step

    AVk = Vk Tk + fk ek*,  A*Wk = Wk Tk* + gk ek*,  Vk*Wk = Ik,  Wk*fk = 0,  Vk*gk = 0,    (10.11)

where fk, gk ∈ Rn, ek denotes the kth unit vector, and Tk ∈ Rk×k is tridiagonal.    (10.12)
Thus the process is parametrized by 3n − 2 numbers: αi, 1 ≤ i ≤ n, and βi, γi, 1 ≤ i ≤ n − 1.

Conceptually, the Arnoldi procedure, given A ∈ Rn×n and the starting vector b ∈ Rn, first sets V1 = b/‖b‖ and H1 = V1*AV1; this implies f1 = AV1 − V1H1 ⊥ V1. The second step sets V2 = [V1 f1/‖f1‖] and H2 = V2*AV2; the new remainder F2 = AV2 − V2H2 turns out to be rank one, F2 = [0 f2] = f2 e2*. At the third step the third vector of V3 is f2/‖f2‖, so that V3 = [V2 f2/‖f2‖] and H3 = V3*AV3, and so on. Computationally, however, one would not compute Vk*AVk anew at each iteration, since forming this product is expensive; instead, the fact that only the last column changes from step to step is exploited. Thus at the kth step, the kth column of the projector Vk is constructed, and simultaneously the entries of the kth row and column of the projected A-matrix, namely Hk, follow.

The two-sided Lanczos algorithm is executed in a similar way. Now, in addition to A ∈ Rn×n, we are given two starting vectors b, c* ∈ Rn. First the inner product c b is computed, giving β1 = √|b*c*| and γ1 = sign(b*c*) β1 (so the plus/minus sign depends on whether this inner product is positive or negative); then V1 = b/β1 and W1 = c*/γ1, and it follows that T1 = W1*AV1. The second step involves the computation of the remainders f1 = AV1 − V1T1 and r1 = A*W1 − W1T1*; the third columns of the two projectors are then obtained by appropriate normalization of the (again rank-one) remainders F2 = AV2 − V2T2 = f2 e2* and R2 = A*W2 − W2T2* = r2 e2*, and so on.

The Lanczos algorithm: recursive implementation
Given: the triple A ∈ Rn×n, b, c* ∈ Rn.
Find: Vk, Wk ∈ Rn×k and tridiagonal Tk ∈ Rk×k such that AVk = Vk Tk + fk ek*, A*Wk = Wk Tk* + gk ek*, Vk*Wk = Ik.

Two-sided Lanczos algorithm
1. β1 = √|b*c*|, γ1 = sign(b*c*) β1, v1 = b/β1, w1 = c*/γ1.
2. For j = 1, 2, ..., k:
   (a) αj = wj* A vj,
   (b) rj = Avj − αj vj − γj vj−1,  qj = A*wj − αj wj − βj wj−1,
   (c) βj+1 = √|rj* qj|,  γj+1 = sign(rj* qj) βj+1,
   (d) vj+1 = rj/βj+1,  wj+1 = qj/γj+1.
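The two-sided Lanczos recursion above can be sketched compactly. This is an illustrative transcription under my own naming; it does not handle the breakdown rj*qj = 0, and the nearly symmetric test matrix is chosen to keep the recursion well behaved.

```python
import numpy as np

def two_sided_lanczos(A, b, c, k):
    """Two-sided Lanczos sketch: A V_k = V_k T_k + f e_k*, A^T W_k = W_k T_k* + g e_k*,
    with W_k^T V_k = I. Breakdown (r_j^T q_j = 0) is not handled."""
    n = A.shape[0]
    V, W = np.zeros((n, k)), np.zeros((n, k))
    beta, gamma = np.zeros(k + 1), np.zeros(k + 1)
    beta[0] = np.sqrt(abs(b @ c))
    gamma[0] = np.sign(b @ c) * beta[0]
    V[:, 0], W[:, 0] = b / beta[0], c / gamma[0]
    for j in range(k):
        alpha = W[:, j] @ (A @ V[:, j])
        r = A @ V[:, j] - alpha * V[:, j] - (gamma[j] * V[:, j - 1] if j > 0 else 0)
        q = A.T @ W[:, j] - alpha * W[:, j] - (beta[j] * W[:, j - 1] if j > 0 else 0)
        if j < k - 1:
            beta[j + 1] = np.sqrt(abs(r @ q))
            gamma[j + 1] = np.sign(r @ q) * beta[j + 1]
            V[:, j + 1] = r / beta[j + 1]
            W[:, j + 1] = q / gamma[j + 1]
    T = W.T @ A @ V                 # tridiagonal projection in exact arithmetic
    return V, W, T

rng = np.random.default_rng(3)
A = np.diag(np.arange(1.0, 9.0)) + 0.1 * rng.standard_normal((8, 8))  # nonsymmetric
V, W, T = two_sided_lanczos(A, np.ones(8), np.ones(8), k=4)
```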
The following relationships hold true: Vk = (v1 v2 ··· vk), Wk = (w1 w2 ··· wk), and

    Tk = [α1 γ2       ;
          β2 α2 γ3    ;
             .  .  .  ;
               βk  αk],

satisfying AVk = Vk Tk + βk+1 vk+1 ek* and A*Wk = Wk Tk* + γk+1 wk+1 ek*; moreover rk ∈ Rk+1(A, b) and qk* ∈ Ok+1(c, A).

The Arnoldi algorithm
Given: the pair A ∈ Rn×n, b ∈ Rn.
Find: V ∈ Rn×k, f ∈ Rn, and H ∈ Rk×k in upper Hessenberg form, such that AV = VH + f ek*, with V*V = Ik and V*f = 0 (as before, ek denotes the kth unit vector).

The Arnoldi algorithm: recursive implementation
1. v1 = b/‖b‖, w = Av1, α1 = v1*w, f1 = w − v1 α1, V1 = (v1), H1 = (α1).    (10.13)
2. For j = 1, 2, ..., k − 1:
   βj = ‖fj‖, vj+1 = fj/βj, Vj+1 = (Vj vj+1), Ĥj = [Hj; βj ej*],
   w = Avj+1, h = Vj+1* w, Hj+1 = [Ĥj h], fj+1 = w − Vj+1 h.    (10.14)

Remark 10.4.2.
(a) The residual is fj = Avj − Vj hj, where hj = Vj*Avj; that is, fj = (I − Vj Vj*)Avj. It turns out that Vj*fj = 0, and hj is chosen so that the norm ‖fj‖ is minimized.
(b) If A is symmetric, Hj is tridiagonal, and the Arnoldi algorithm coincides with the Lanczos algorithm.

10.4.10 Rational Krylov methods

In order to accelerate convergence of the Krylov methods, one can apply a shift-invert strategy, as shown in section 10.3 for the single vector iteration. Therein A is replaced by (A − λI)⁻¹, where λ is a shift close to the eigenvalues of interest. As pointed out by Ruhe [281], a further improvement can be obtained by using several shifts λ1, ..., λk. This leads to the family of rational Krylov methods, which will be discussed in more detail in the next chapter. The implementation of rational Krylov methods requires the computation of (direct) LU factorizations of A − λi I.
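The single-shift case, Arnoldi applied to (A − µI)⁻¹, can be sketched as follows. This is an illustrative sketch: the explicit inverse stands in for an LU factorization, and the function name and test matrix are mine. A Ritz value θ of the projected matrix corresponds to the eigenvalue estimate µ + 1/θ of A.

```python
import numpy as np

def shift_invert_arnoldi(A, b, mu, k):
    """Arnoldi on (A - mu*I)^{-1}; returns eigenvalue estimates mu + 1/theta,
    which favor the eigenvalues of A closest to the shift mu."""
    n = A.shape[0]
    Binv = np.linalg.inv(A - mu * np.eye(n))   # in practice: LU factorization + solves
    V = np.zeros((n, k + 1))
    H = np.zeros((k + 1, k))
    V[:, 0] = b / np.linalg.norm(b)
    for j in range(k):
        w = Binv @ V[:, j]                     # apply (A - mu*I)^{-1}
        H[:j + 1, j] = V[:, :j + 1].T @ w
        w -= V[:, :j + 1] @ H[:j + 1, j]
        H[j + 1, j] = np.linalg.norm(w)
        V[:, j + 1] = w / H[j + 1, j]
    theta = np.linalg.eigvals(H[:k, :k])
    return mu + 1.0 / theta                    # back-transformed eigenvalue estimates

A = np.diag([1.0, 2.0, 3.0, 4.0, 10.0])
est = shift_invert_arnoldi(A, np.ones(5), mu=3.9, k=4)   # should resolve the eigenvalue 4
```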
The rational Arnoldi algorithm
1. Choose a shift µ. Solve (A − µI)v1 = b and normalize v1 ← v1/‖v1‖; set V1 ← (v1).
   Solve (A − µI)w = v1; α1 = v1*w, f1 ← w − v1 α1, H1 ← (α1).
2. For j = 1, 2, ..., k − 1:
   βj = ‖fj‖, vj+1 ← fj/βj, Vj+1 ← (Vj, vj+1), Ĥj ← [Hj; βj ej*].
   Solve (A − µI)w = vj+1; h ← Vj+1* w, fj+1 ← w − Vj+1 h, Hj+1 ← (Ĥj h).

Algorithm: The k-step Arnoldi factorization.
Data: A ∈ Rn×n, starting vector v ∈ Rn.
1. v1 = v/‖v‖, w = Av1, α1 = v1*w, f1 ← w − α1 v1, V1 ← (v1), H1 ← (α1).
2. For j = 1, 2, ..., k − 1:
   βj = ‖fj‖, vj+1 ← fj/βj, Vj+1 ← (Vj, vj+1), Ĥj ← [Hj; βj ej*],
   w ← Avj+1, h ← Vj+1* w, fj+1 ← w − Vj+1 h, Hj+1 ← (Ĥj h).

Algorithm: The k-step two-sided Lanczos process.
Data: A ∈ Rn×n, starting vectors v, w ∈ Rn.
1. v1 = v/‖v‖, w1 = w/‖w‖; f = Av1, g = A*w1; α1 = w1*f; f1 ← f − α1 v1, g1 ← g − α1 w1.
2. For j = 1, 2, ..., k − 1:
   βj = √|gj* fj|, γj = sign(gj* fj) βj; vj+1 ← fj/βj, wj+1 ← gj/γj;
   f ← Avj+1 − γj vj, g ← A*wj+1 − βj wj;
   αj+1 ← wj+1* f; fj+1 ← f − αj+1 vj+1, gj+1 ← g − αj+1 wj+1.

Figure 10.7. The Arnoldi and the two-sided Lanczos algorithms.

10.4.11 Implicitly restarted Arnoldi and Lanczos methods

The goal of restarting the Lanczos and Arnoldi factorizations is to get a better approximation to some desired set of preferred eigenvalues, for example, those eigenvalues that have
• largest modulus,
• largest real part, or
• positive or negative real part.

In this case we hope for the eigenvalues of the projected matrix to capture some salient subset of the spectrum of A, for example, the rightmost eigenvalues of A. Since the Arnoldi recurrence gets increasingly expensive as the iteration number m gets large, one hopes to obtain accurate estimates of the desired eigenvalues of A when m = dim Km ≪ dim A. One way to encourage such convergence is to enhance the starting vector b in the direction of the eigenvectors (or invariant subspace) associated with the desired eigenvalues via a polynomial transformation b ← p(A)b.
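The effect of such a polynomial transformation can be seen directly: placing the roots of p at the unwanted eigenvalues annihilates the corresponding eigendirections in the starting vector. A toy demonstration (the diagonal matrix is mine, and the unwanted eigenvalues are assumed known here purely for illustration; in practice they are estimated from the projected matrix):

```python
import numpy as np

# Enhance b toward the eigenvector of the desired (largest) eigenvalue 5
# by applying p(A) whose roots are the unwanted eigenvalues.
A = np.diag([5.0, 2.0, 1.0, -1.0])
b = np.ones(4) / 2.0                   # unit vector with equal eigencomponents
unwanted = [2.0, 1.0, -1.0]

pb = b.copy()
for root in unwanted:
    pb = A @ pb - root * pb            # multiply by the factor (A - root*I) of p(A)
pb /= np.linalg.norm(pb)

# component of the (normalized) vectors along the desired eigenvector e1
before = abs(b[0]) / np.linalg.norm(b)
after = abs(pb[0])
```

Restarting with `pb` instead of `b` thus concentrates the Krylov subspace on the desired part of the spectrum.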
Here p is a polynomial with roots near unwanted eigenvalues of A. One gets some idea for these roots by performing a basic Arnoldi iteration and computing the spectrum of the projected matrix Km. An excellent choice for the roots of p is the set of those eigenvalues of Km most unlike the eigenvalues of A that are sought (if one wants the rightmost eigenvalues of A, take as roots of p the leftmost eigenvalues of Km). Then one begins the Arnoldi iteration anew with starting vector p(A)b. This can be accomplished in a numerically stable way via implicit restarting, that is, without explicitly forming the new starting vector. This has been implemented in ARPACK [227]; the combination has proved especially effective.

Suppose, for example, that while A has its eigenvalues in the left-half plane, the approximant Hm obtained through one of the two methods has an eigenvalue µ with positive real part: Vm*AVm x = µx, for some x. To eliminate this unwanted eigenvalue, the reduced-order system obtained at the mth step is projected onto an (m−1)st order system. This is done as follows: first, compute the QR factorization Hm − µIm = Qm Rm. Given AVm = Vm Hm + fm em*, it follows that

    AV̄m = V̄m H̄m + fm em* Qm,  where V̄m = Vm Qm and H̄m = Qm* Hm Qm.

We now truncate the above relationship to contain m − 1 columns, and let H̄m−1 denote the principal submatrix of H̄m containing the leading m − 1 rows and columns.

Theorem 10.16. H̄m−1 can be obtained through an (m−1)-step Arnoldi process with A and starting vector b̄ = (µIn − A)b:

    AV̄m−1 = V̄m−1 H̄m−1 + f̄ em−1*.

This process can be repeated to eliminate other unwanted eigenvalues (poles) from the reduced-order system. However, it should be noted that stability is achieved at the expense of no longer matching the Markov parameters.

Remark 10.4.3.
(a) Implicit restarting was introduced by Sorensen [305]. This paper offers two new ideas, namely, implicit restarting (applying the restart polynomial without explicitly forming the new starting vector) and the use of unwanted eigenvalues of Km as the shifts (roots of p); see also [307].
(b) Krylov methods and implicit restarting have been worked out for special classes of matrices; for the Hamiltonian case see [61], and for restarting through the QR decomposition see [110].
(c) The convergence of Implicitly Restarted Arnoldi Methods (IRAM) has been studied in some detail; see [46]. For a recent study on restarted Lanczos algorithms, see [258]. For additional eigenvalue algorithms, see [35].

We will now present some examples which illustrate the Krylov methods.

Example 10.4.1. Consider the following symmetric matrix and starting vector:

    A = [2 1 2 1;
         1 2 0 1;
         2 0 2 1;
         1 1 1 0],   b = [1; 0; 0; 0].

The symmetric Lanczos (or Arnoldi) procedure for steps k = 1, 2 yields:

    V1 = [1; 0; 0; 0],  H1 = [2],  R1 = [0; 1; 2; 1],

    V2 = [1  0    ;      H2 = [2   √6 ;      R2 = [0  0     ;
          0  1/√6 ;            √6  8/3],           0  1/√54 ;
          0  2/√6 ;                                0 −1/√54 ;
          0  1/√6],                                0  1/√54].
Algorithm: Implicitly Restarted Arnoldi Method (IRAM).
1. Compute an m-step Arnoldi factorization AVm = Vm Hm + fm em*, with m = p + k.
2. Repeat until convergence:
   Compute Λ(Hm) and select p shifts µ1, ..., µp; set q* ← em*.
   For j = 1, 2, ..., p:
     Factor [Q, R] = qr(Hm − µj I);
     Hm ← Q*Hm Q; Vm ← Vm Q; q* ← q*Q.
   end
   fk ← vk+1 β̂k + fm σk, where β̂k = Hm(k+1, k) and σk is the kth entry of q;
   Vk ← Vm(1:n, 1:k); Hk ← Hm(1:k, 1:k).
3. Beginning with the k-step Arnoldi factorization AVk = Vk Hk + fk ek*, apply p additional steps to obtain a new m-step Arnoldi factorization AVm = Vm Hm + fm em*.

Figure 10.8. The Implicitly Restarted Arnoldi algorithm.

Continuing the example, steps k = 3, 4 yield

    V3 = [1  0     0    ;      H3 = [2    √6    0    ;      R3 = [0 0  0    ;
          0  1/√6  1/√3 ;            √6   8/3   1/√18;            0 0  √3/2 ;
          0  2/√6 −1/√3 ;            0    1/√18 4/3  ],           0 0  0    ;
          0  1/√6  1/√3],                                         0 0 −√3/2],

    V4 = [1  0     0     0    ;    H4 = [2    √6    0      0     ;    R4 = 0.
          0  1/√6  1/√3  1/√2 ;          √6   8/3   1/√18  0     ;
          0  2/√6 −1/√3  0    ;          0    1/√18 4/3    √(3/2);
          0  1/√6  1/√3 −1/√2],          0    0     √(3/2) 0     ],

These matrices satisfy AVk = Vk Hk + Rk, k = 1, 2, 3, 4, which, due to the orthogonality of the columns of Vk, implies Hk = Vk*AVk. The associated orthogonal polynomials are the characteristic polynomials of Hk:

    p0(x) = 1,
    p1(x) = det(xI1 − H1) = x − 2,
    p2(x) = det(xI2 − H2) = x² − 14x/3 − 2/3,
    p3(x) = det(xI3 − H3) = x³ − 6x² + 11x/2 + 1,
    p4(x) = det(xI4 − H4) = x⁴ − 6x³ + 4x² + 8x + 1.

Finally, p4, the characteristic polynomial at the 4th step, is the same as the characteristic polynomial of A; since R4 = 0, in the associated inner product it corresponds to the zero polynomial. In order to normalize these polynomials, we compute β1 = H4(1, 2), β2 = H4(2, 3), β3 = H4(3, 4);
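The heart of step 2 of the IRAM algorithm, one shifted QR similarity applied to Hm, preserves both the spectrum and the upper Hessenberg form (since Q*HmQ = RQ + µI, an upper triangular matrix times a Hessenberg matrix plus a shift). A minimal check, with my own function name and a small Hessenberg test matrix:

```python
import numpy as np

def shifted_qr_step(H, mu):
    """One implicit-restart step: factor H - mu*I = QR and form Q* H Q,
    which is similar to H and remains upper Hessenberg."""
    Q, R = np.linalg.qr(H - mu * np.eye(H.shape[0]))
    return Q.T @ H @ Q, Q

H = np.array([[3.0, 1.0, 2.0],
              [0.5, 2.0, 1.0],
              [0.0, 0.7, 1.0]])   # upper Hessenberg
H1, Q = shifted_qr_step(H, mu=0.9)
```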
Thus the orthonormal polynomials are:

  p̃0(x) = 1,
  p̃1(x) = p1(x)/β1 = (1/√6)(x − 2),
  p̃2(x) = p2(x)/(β1 β2), proportional to 3x² − 14x − 2,
  p̃3(x) = p3(x)/(β1 β2 β3), proportional to 2x³ − 12x² + 11x + 2.

The weights γi² that define the Lanczos inner product are determined by the eigen-decomposition of A and correspond to its eigenvalues λ1, ..., λ4. It is readily checked that

  Σi=1..4 γi² p̃k(λi) p̃ℓ(λi) = δk,ℓ,

where δk,ℓ is the Kronecker delta symbol.

Example 10.2. Consider the Arnoldi procedure A Vk = Vk Hk + fk e*k, k = 1, 2, 3, 4, applied to a 4 × 4 matrix A with a given starting vector b. The eigenvalues of the successive approximants H1, H2, H3, H4 approach the (real) eigenvalues of A: the dominant eigenvalue is approximated successively by 4.8054, 4.8145, 4.8154, while intermediate approximants may produce complex Ritz values, such as the pair 0.8933 ± 1.6594i, which disappear as convergence to the spectrum of A takes place.
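The orthonormality identity above can be reproduced numerically. In the sketch below (our own illustration; the matrix, starting vector, and function names are not from the text) the characteristic polynomials of the leading tridiagonal Lanczos sections, normalized by the subdiagonal products β1···βk, are verified to be orthonormal with respect to the discrete measure whose weights are γi² = |⟨q1, ui⟩|², where the ui are the unit eigenvectors of a symmetric A.

```python
import numpy as np

def lanczos(A, q1, m):
    """Symmetric Lanczos: diagonal alpha and subdiagonal beta of T_m."""
    n = len(q1)
    Q = np.zeros((n, m + 1))
    alpha, beta = np.zeros(m), np.zeros(m)
    Q[:, 0] = q1 / np.linalg.norm(q1)
    for j in range(m):
        w = A @ Q[:, j] - (beta[j - 1] * Q[:, j - 1] if j > 0 else 0.0)
        alpha[j] = Q[:, j] @ w
        w = w - alpha[j] * Q[:, j]
        w = w - Q[:, :j + 1] @ (Q[:, :j + 1].T @ w)   # full reorthogonalization
        beta[j] = np.linalg.norm(w)
        Q[:, j + 1] = w / beta[j]
    return alpha, beta[:-1]

def p(alpha, beta, k, x):
    """p_k(x) = det(x I_k - T_k) via the three-term recurrence."""
    p0, p1 = 1.0, x - alpha[0]
    for j in range(1, k):
        p0, p1 = p1, (x - alpha[j]) * p1 - beta[j - 1] ** 2 * p0
    return p1 if k >= 1 else p0
```

The normalized polynomials p̃k = pk/(β1···βk) then satisfy the Kronecker-delta identity of the text with respect to these weights.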
Example 10.3. In this case A is a 4 × 4 matrix with zero/one entries and the starting vector b is actually equal to e1; A can be brought to upper Hessenberg form by a simple permutation of its columns and rows. The corresponding V4 is a zero/one matrix and H4 = V4* A V4 is in upper Hessenberg form. The eigenvalues of the projected matrices Hk, k = 1, 2, 3, 4, converge to the eigenvalues {2.3247, 0.3376 ± 0.5623i, ...} of A; the first approximant yields {1}.

If the starting vector is chosen as b = [1 1 0 0]*, the Arnoldi procedure terminates after 3 steps, as the starting vector belongs to the eigenspace of A spanned by the eigenvectors corresponding to the eigenvalues {2.3247, 0.3376 ± 0.5623i}; in this case the eigenvalues of H3 = V3* A V3 are precisely these three numbers.

Finally, for the starting vector b = [0 0 1 1]* a full four-step factorization is obtained, with V4 orthogonal (its entries involve square roots such as 1/√2, √22, √55) and H4 = V4* A V4 in upper Hessenberg form; the eigenvalues of the projected matrices are {1}, {2, 0}, and so on, again converging to the eigenvalues of A.

10.5 Chapter summary

The SVD-based approximation methods presented in the preceding three chapters require dense computations of the order O(n³) and storage of the order O(n²), where n is the dimension of the system to be approximated. Thus, their compelling properties notwithstanding, they cannot be applied to large problems. What is needed in such cases are iterative methods which can be implemented by means of vector-matrix multiplications exclusively. This leads to a different class of approximation methods, the origin of which lies in eigenvalue estimation, that is, estimation of a few eigenvalues of a given operator (matrix). Eigenvalue estimation methods are divided into two parts: the single-vector iteration methods, and the subspace iteration methods. The latter category includes Krylov subspace methods, which consist of the Arnoldi and the Lanczos procedures. The preceding chapter was dedicated to the presentation of these iterative methods, known collectively as Krylov methods.

It is worth noting at this stage that the Krylov subspaces are, in system theoretic language, reachability and observability subspaces. This connection will be exploited in the next chapter, where we will show that Krylov subspace iterations are intimately connected with model reduction.
Chapter 11

Model reduction using Krylov methods

In this chapter we turn our attention to the third use of the Krylov iteration as described at the beginning of the previous chapter: approximation by moment matching.

A linear discrete- or continuous-time system defined by a convolution sum or integral is uniquely determined by its impulse response h. Often the Laplace transform of the impulse response H, called the transfer function of the system, is used; for the systems considered here, H is a rational function. Thus, one way of approximating a system is to approximate its transfer function H by a rational function of lower degree. This can be done by matching some terms of the Laurent series expansion of H at various points of the complex plane. Often the expansion around infinity is used:

  H(s) = Σi≥0 hi s^{−i}.

The coefficients hi are known as Markov parameters of the system. We now seek a rational function

  Ĥ(s) = Σi≥0 ĥi s^{−i},

which matches the first ℓ < n Markov parameters of the original system:

  ĥi = hi, i = 1, ..., ℓ.

This is the partial realization problem discussed in section 4.4.4, page 83; in this case the complexity of the system turns out to be the rank of the Hankel operator H defined on page 255. If the to-be-approximated system Σ = (A, B, C), where B and C* are column vectors, is described in state space terms (4.13), this problem can be solved iteratively and in a numerically efficient manner by means of the Arnoldi and Lanczos factorizations. There are two approaches which fall in this category: the Lanczos and the Arnoldi methods. There is an extended literature on this topic; see [131], [37], as well as [153], the work of Gutknecht [161], and the book of Komzsik [209], to mention but a few.
First we would like to point to the paper by Gragg and Lindquist [145], which discusses many connections between system theory and the realization problem on the one hand, and numerical analysis, Padé approximation, and Lanczos methods on the other. See also the course notes of van Dooren [333], and [237].

The Lanczos and Arnoldi methods

In case the system is given in terms of state space data, the Lanczos and Arnoldi factorizations can be used. In contrast to the case of eigenvalue approximation, the starting vectors are not arbitrary: they are determined by B and C.
The k-step Arnoldi iteration applied to the pair (A, B) produces a matrix Vk with orthonormal columns, that is, Vk* Vk = Ik. A reduced order system

  Σ̂ = (Â, B̂, Ĉ): Â = Vk* A Vk, B̂ = Vk* B, Ĉ = C Vk,

can be constructed by applying the Petrov-Galerkin projection Π = Vk Vk* to Σ. The main property of the reduced system Σ̂ is that it matches k Markov parameters of Σ:

  ĥj = Ĉ Â^{j−1} B̂ = C A^{j−1} B = hj, j = 1, ..., k.

Similarly, the two-sided Lanczos iteration uses A together with two starting vectors, namely B and C*, to iteratively produce two matrices Vk and Wk with k columns each, which are no longer orthonormal but are biorthogonal: Wk* Vk = Ik. A reduced order system is now constructed by applying the Galerkin projection Π = Vk Wk* to Σ:

  Σ̂ = (Â, B̂, Ĉ): Â = Wk* A Vk, B̂ = Wk* B, Ĉ = C Vk.

This reduced system has the property that it matches 2k Markov parameters of Σ:

  ĥj = Ĉ Â^{j−1} B̂ = C A^{j−1} B = hj, j = 1, ..., 2k.

Put in a different way, at the nth step both the Lanczos and the Arnoldi methods correspond to the computation of a canonical form of the triple (C, A, B); the reduced order system is obtained by truncating the state x = (x1 x2 ··· xn)* to x̂ = (x1 x2 ··· xk)*, where k < n. In the generic case these two canonical forms are as follows. The canonical form in the Lanczos method corresponds to A being in tridiagonal form, while B and C* are multiples of the first unit vector e1. In the Arnoldi method, A is in upper Hessenberg form, B is a multiple of the unit vector e1, and C has arbitrary entries. Notice that the resulting Â, B̂, Ĉ retain the same structure as the full order A, B, C, and can be computed without computing the corresponding canonical forms of A, B, C first and then truncating. As a consequence, the Lanczos and Arnoldi procedures have reliable numerical implementations which involve matrix-vector multiplications exclusively. Finally, it should be stressed that these methods are useful because they can be implemented iteratively; their range of applicability is large n, of the order O(10²) or higher, depending on the problem. Furthermore, by applying rational Krylov methods, reduced order systems are obtained which match moments at different points in the complex plane.

11.1 The moments of a function

Given a matrix-valued function of time h : R → R^{p×m}, its kth moment is defined as

  ηk = ∫0^∞ t^k h(t) dt ∈ R^{p×m}, k ≥ 0.   (11.1)

If this function has a Laplace transform defined by H(s) = L(h) = ∫0^∞ h(t) e^{−st} dt, the kth moment of h is the kth derivative of H evaluated at s = 0:

  ηk = (−1)^k d^k H(s)/ds^k |s=0 ∈ R^{p×m}, k ≥ 0.   (11.2)

For our purposes we also make use of a generalized notion of moments, namely the moments of h around the (arbitrary) point s0 ∈ C:

  ηk(s0) = ∫0^∞ t^k h(t) e^{−s0 t} dt, k ≥ 0.   (11.3)

These generalized moments turn out to be the derivatives of H(s) evaluated at s = s0:

  ηk(s0) = (−1)^k d^k H(s)/ds^k |s=s0 ∈ R^{p×m}.   (11.4)
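For a stable state-space system with impulse response h(t) = C e^{At} B, the integral (11.3) has the closed form ηk(s0) = k! · C (s0 I − A)^{−(k+1)} B, which, up to the factorial factor, is the expression used in the next section. The sketch below is our own check, with an arbitrary random stable matrix and quadrature grid; it compares the closed form against a direct trapezoidal-rule evaluation of (11.3).

```python
import math
import numpy as np

rng = np.random.default_rng(4)
n, s0 = 4, 0.5
M = rng.standard_normal((n, n))
A = M - (np.linalg.eigvals(M).real.max() + 1.0) * np.eye(n)  # spectral abscissa -1
b = rng.standard_normal(n)
c = rng.standard_normal(n)

# impulse response h(t) = c e^{At} b via the eigendecomposition of A
lam, P = np.linalg.eig(A)
coef = np.linalg.solve(P, b)
t = np.linspace(0.0, 80.0, 400001)
h = ((c @ P) * np.exp(np.outer(t, lam))) @ coef              # imaginary part ~ 0

def eta_quadrature(k):
    """eta_k(s0): trapezoidal rule applied to the integral (11.3)."""
    f = (t ** k) * h * np.exp(-s0 * t)
    dt = t[1] - t[0]
    return (dt * (f.sum() - 0.5 * (f[0] + f[-1]))).real

def eta_closed_form(k):
    """k! * c (s0 I - A)^{-(k+1)} b."""
    E = s0 * np.eye(n) - A
    return math.factorial(k) * (c @ np.linalg.solve(np.linalg.matrix_power(E, k + 1), b))
```

The agreement of the two evaluations also makes the sign conventions of (11.2)-(11.4) concrete.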
11.2 Approximation by moment matching

In this context h is the impulse response of a linear system Σ = (A, B, C, D), that is,

  h(t) = C e^{At} B + δ(t) D, t ≥ 0.

Since H(s) = C(sI − A)^{−1} B + D, the moments at s0 = 0 are

  η0(0) = D − C A^{−1} B, ηk(0) = (−1)^{k+1} C A^{−(k+1)} B, k > 0,

and those at s0 ∈ C are

  η0(s0) = D + C(s0 I − A)^{−1} B, ηk(s0) = C(s0 I − A)^{−(k+1)} B, k > 0.

In a similar way, we may expand H(s) in a Laurent series around infinity:

  H(s) = D + CB s^{−1} + CAB s^{−2} + ··· + C A^k B s^{−(k+1)} + ···.

The quantities

  η0(∞) = D, ηk(∞) = C A^{k−1} B, k > 0,   (11.6)

are the Markov parameters hk of Σ defined in chapter 4 (page 55). It should be noted that the moments determine the coefficients of the Laurent series expansion of the transfer function H(s) in the neighborhood of s0 ∈ C; in particular,

  H(s) = H(s0) + H^{(1)}(s0) (s − s0)/1! + ··· + H^{(k)}(s0) (s − s0)^k/k! + ···
       = η0(s0) + η1(s0) (s − s0)/1! + ··· + ηk(s0) (s − s0)^k/k! + ···.   (11.5)

The approximation problem consists in finding Σ̂ = (Â, B̂, Ĉ, D̂), where

  Ĥ(s) = η̂0 + η̂1 (s − s0)/1! + η̂2 (s − s0)²/2! + η̂3 (s − s0)³/3! + ···,

such that for appropriate ℓ:

  ηj = η̂j, j = 1, 2, ..., ℓ.

If the moments are chosen at s0 = 0, the corresponding problem is known as Padé approximation. If the moments are chosen at infinity, they are called Markov parameters, and the resulting problem is known as partial realization. In the general case the problem is known as rational interpolation. The algorithms that achieve moment matching are called Krylov methods; in the general case they are known as rational Krylov methods.

Given (C, A, B), the following are key properties of the ensuing algorithms:

• moment matching is achieved without computation of moments, and
• the procedure is implemented iteratively.

Their consequence is numerical reliability. To see why explicit computation of moments is to be avoided, consider a system where A is stable and the eigenvalue of largest modulus is λ. The Markov parameters of this system behave as λ^k, while the moments at s0 behave as maxλ |s0 − λ|^{−k}. Thus, in many cases the computation of the moments is numerically problematic.

11.2.1 Two-sided Lanczos and moment matching

Given is the scalar system Σ = (A, b, c), where A ∈ R^{n×n} and b, c* ∈ R^n. The goal is to find Σ̂ = (Â, b̂, ĉ), with Â ∈ R^{k×k} and b̂, ĉ* ∈ R^k, k < n, so that the first 2k Markov parameters hi = c A^{i−1} b of the two systems are matched:

  hi = ĥi, i = 1, ..., 2k.   (11.8)
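The growth of the Markov parameters noted above makes the Hankel matrices built from them, which appear in the construction that follows, severely ill-conditioned. A small self-contained check (our own numbers; A = diag(1, 2, 3, 4) and b = c* = [1 1 1 1]* are arbitrary choices):

```python
import numpy as np

A = np.diag([1.0, 2.0, 3.0, 4.0])
b = np.ones(4)
c = np.ones(4)
# Markov parameters h_j = c A^(j-1) b, j = 1, ..., 7
h = [c @ np.linalg.matrix_power(A, j) @ b for j in range(7)]
# 4x4 Hankel matrix with (i, j) entry h_(i+j+1)
H = np.array([[h[i + j] for j in range(4)] for i in range(4)])
growth = h[6] / h[5]            # ratio tends to the largest eigenvalue, 4
condition = np.linalg.cond(H)   # grows rapidly with the matrix size
```

Even for this tiny example the consecutive Markov parameters grow by a factor close to the dominant eigenvalue, and the Hankel matrix is already badly conditioned; this is why the algorithms below match moments without ever forming them.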
Define the following four quantities associated with Σ: (i) the k × n observability matrix Ok, (ii) the n × k reachability matrix Rk:

  Ok = [c; cA; ···; cA^{k−1}] ∈ R^{k×n}, Rk = [b  Ab  ···  A^{k−1}b] ∈ R^{n×k},

(iii) the k × k Hankel matrix Hk, and (iv) its shift σHk:

  Hk = [ h1 h2 ··· hk ; h2 h3 ··· hk+1 ; ··· ; hk hk+1 ··· h2k−1 ],
  σHk = [ h2 h3 ··· hk+1 ; h3 h4 ··· hk+2 ; ··· ; hk+1 hk+2 ··· h2k ].

Therefore Hk = Ok Rk and σHk = Ok A Rk. Next comes the key assumption of this procedure, namely that

  det Hi ≠ 0, i = 1, ..., k.

This allows the computation of the LU factorization of Hk:

  Hk = L U, where L(i, j) = 0 for i < j and U(i, j) = 0 for i > j,

and L, U are chosen so that the absolute values of their diagonal entries are equal: |L(i, i)| = |U(i, i)|. Define the maps

  ΠL = L^{−1} Ok and ΠU = Rk U^{−1}.

It follows trivially that these maps satisfy (a) ΠL ΠU = Ik, and thus (b) ΠU ΠL is an oblique projection. We are now ready to define a reduced order system:

  Σ̄ = (Ā, b̄, c̄): Ā = ΠL A ΠU, b̄ = ΠL b, c̄ = c ΠU.

Theorem 11.1. Σ̄ as defined above satisfies the equality (11.8) of the Markov parameters. Furthermore, Ā is tridiagonal, and b̄, c̄* are multiples of the unit vector e1.

Proof. First notice that we can write

  ΠU ΠL = Rk U^{−1} L^{−1} Ok = Rk (Ok Rk)^{−1} Ok.

Therefore Ok ΠU ΠL = Ok and ΠU ΠL Rk = Rk,
which shows that ΠU ΠL is a projection along the rows of Ok and the columns of Rk. Using the first projection relationship, we obtain

  c̄ = c ΠU, c̄ Ā = c ΠU ΠL A ΠU = c A ΠU, ..., c̄ Ā^{k−1} = c ΠU ΠL A ΠU ··· ΠL A ΠU = c A^{k−1} ΠU,

so that Ōk = Ok ΠU, where Ōk = [c̄; c̄Ā; ···; c̄Ā^{k−1}] ∈ R^{k×k}. Similarly, R̄k = ΠL Rk, where R̄k = [b̄  Āb̄  ···  Ā^{k−1}b̄]. Combining the last two equalities, we get

  Ōk R̄k = Ok ΠU ΠL Rk = Ok Rk = Hk.

Furthermore,

  Ōk Ā R̄k = Ok ΠU ΠL A ΠU ΠL Rk = Ok A Rk = σHk.

Finally, it readily follows that R̄k = U and Ōk = L. Since L is lower triangular, Ōk = L implies c̄ = L(1, 1) e1*, and since U is upper triangular, R̄k = U implies b̄ = U(1, 1) e1. It follows that, in the basis formed by the columns of ΠU, the matrix Ā is tridiagonal. This completes the proof of the theorem.

Corollary 11.2. The maps ΠL and ΠU defined above are equal to Wk* and Vk, respectively, of the two-sided Lanczos procedure defined in section 10.4.

The Lanczos procedure leads to the following canonical form for single-input single-output linear systems: A is tridiagonal, with diagonal entries α1, ..., αn, superdiagonal entries β2, ..., βn, and subdiagonal entries γ2, ..., γn, while b = β1 e1 and c = γ1 e1*:

  [ α1 β2           | β1 ]
  [ γ2 α2 β3        | 0  ]
  [    γ3 α3 ·      | ·  ]
  [       ·  · βn   | ·  ]
  [         γn αn   | 0  ]
  [ γ1 0  ···  0    |    ]
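The construction of Theorem 11.1 can be traced numerically. The sketch below is our own illustration with random data; it uses a plain Doolittle LU factorization (unit diagonal in L) rather than the balanced diagonal scaling of the text, which affects neither the matched Markov parameters nor the tridiagonal pattern of Ā, since the two factorizations differ only by a diagonal scaling.

```python
import numpy as np

def lu_nopivot(H):
    """Doolittle LU factorization H = L U without pivoting
    (requires all leading minors of H to be nonsingular)."""
    k = H.shape[0]
    L, U = np.eye(k), H.astype(float).copy()
    for j in range(k - 1):
        for i in range(j + 1, k):
            L[i, j] = U[i, j] / U[j, j]
            U[i, :] -= L[i, j] * U[j, :]
    return L, U

rng = np.random.default_rng(3)
n, k = 6, 3
A = rng.standard_normal((n, n))
b = rng.standard_normal(n); c = rng.standard_normal(n)
R = np.column_stack([np.linalg.matrix_power(A, j) @ b for j in range(k)])  # R_k
O = np.vstack([c @ np.linalg.matrix_power(A, j) for j in range(k)])        # O_k
L, U = lu_nopivot(O @ R)                   # Hankel matrix H_k = O_k R_k
PiL = np.linalg.solve(L, O)                # Pi_L = L^{-1} O_k
PiU = R @ np.linalg.inv(U)                 # Pi_U = R_k U^{-1}
Abar, bbar, cbar = PiL @ A @ PiU, PiL @ b, c @ PiU
```

A run of this sketch confirms both claims of the theorem: the first 2k Markov parameters of (Ā, b̄, c̄) agree with those of (A, b, c), and the corner entries of Ā vanish.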
11.2.2 Arnoldi and moment matching

The Arnoldi factorization can be used for model reduction as follows. Recall the QR factorization of the reachability matrix Rk ∈ R^{n×k} (recall also that Hk = Ok Rk):

  Rk = V U ⇒ Π = Rk U^{−1} = V,

where V ∈ R^{n×k} satisfies V* V = Ik and U is upper triangular; an orthogonal projection Π Π* can then be attached to this factorization. We are now ready to define a reduced order system:

  Σ̂ = (Â, b̂, ĉ): Â = Π* A Π, b̂ = Π* b, ĉ = c Π.   (11.9)

Theorem 11.3. Σ̂ as defined above satisfies the equality of the Markov parameters ĥi = hi, i = 1, ..., k. Furthermore, Â is in Hessenberg form, and b̂ is a multiple of the unit vector e1.

Proof. First notice that VV* is an orthogonal projection onto the column span of Rk, so that VV* Rk = Rk. Moreover v1 = b/∥b∥, hence VV* b = b and b̂ = V* b = ∥b∥ e1. Next, R̂k = V* Rk = U, and the upper triangularity of U implies that Â = V* A V is in Hessenberg form. Finally,

  (ĥ1 ··· ĥk) = ĉ R̂k = c V V* Rk = c Rk = (h1 ··· hk).

Corollary 11.4. The map Π defined above is given by V of the Arnoldi procedure as defined in section 10.4.

The Arnoldi procedure leads to a canonical form for the triple (c, A, b), in which A is in upper Hessenberg form, b is a multiple of the unit vector e1, and c has, in general, arbitrary entries:

  [ h11 h12 h13 ··· h1,n−1 h1n | β1 ]
  [ h21 h22 h23 ··· h2,n−1 h2n | 0  ]
  [  0  h32 h33 ··· h3,n−1 h3n | 0  ]
  [          ···               | ·  ]
  [  0  ···   0  hn,n−1   hnn  | 0  ]
  [ γ1  γ2  γ3  ···  γn−1  γn  |    ]

An error expression for the Lanczos model reduction procedure

Starting from the norm expression of chapter 5 (page 122), we can derive an error expression for approximation by means of the Lanczos procedure. Let H and Ĥ be the transfer functions of the original and the reduced systems. Assuming diagonalizability, let (ci, λi) and (ĉi, λ̂i) be the residues and poles of the original and reduced systems, respectively. There holds

  ∥Σ − Σ̂∥²H2 = Σi=1..n ci [ H(−λi*) − Ĥ(−λi*) ] + Σi=1..k ĉi [ Ĥ(−λ̂i*) − H(−λ̂i*) ].   (11.10)

The above is a global error expression; for a local one see [32]. This expression points to the following rule of thumb for choosing interpolation points for model reduction: interpolate at the mirror images −λi* of the poles λi of the original system.

Properties of Krylov methods

1. The number of operations needed to compute a reduced system of order k, given an order n system, using Lanczos or Arnoldi factorizations, is O(kn²) for dense systems and O(k²n) for sparse systems, versus the O(n³) operations needed for the SVD-based methods. The requirement for memory is O(kn). A more precise operation count can be found in chapter 14.

2. Only matrix-vector multiplications are required: no matrix factorizations and/or inversions. There is no need to compute the transformed nth order model and then truncate; this reduces the ill-conditioning that arises in SVD methods.
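The one-sided matching of Theorem 11.3 can be confirmed in a few lines. The sketch below is our own code (the names and random test data are illustrative): it builds an orthonormal Krylov basis V for span{b, Ab, ..., A^{k−1}b} and checks that the projected triple matches the first k Markov parameters, that Â is in Hessenberg form, and that b̂ is a multiple of e1.

```python
import numpy as np

def arnoldi_reduce(A, b, c, k):
    """One-sided (Arnoldi) reduction of order k, as in (11.9)."""
    n = len(b)
    V = np.zeros((n, k))
    V[:, 0] = b / np.linalg.norm(b)
    for j in range(1, k):
        w = A @ V[:, j - 1]
        w = w - V[:, :j] @ (V[:, :j].T @ w)   # Gram-Schmidt step
        V[:, j] = w / np.linalg.norm(w)
    return V.T @ A @ V, V.T @ b, c @ V
```

Note that, in line with property 2 above, only matrix-vector products with A are ever required.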
3. Krylov methods often have simple derivations and algorithms and, when used for eigenvalue estimation, a high convergence rate.

4. Krylov methods can also be applied to multi-input multi-output (MIMO) systems, where B ∈ R^{n×m}, C ∈ R^{p×n}, and m and/or p are bigger than one.

5. Knowledge of good starting vectors cannot be exploited in model reduction, since in this case the starting vectors are fixed.

Drawbacks

• Numerical issue: Arnoldi/Lanczos methods lose orthogonality. This comes from the instability of the classical Gram-Schmidt procedure. The remedy consists in reorthogonalization, at the expense of increased bookkeeping.

• The Lanczos method breaks down if det Hi = 0 for some 1 ≤ i ≤ n. The remedy in this case is provided by look-ahead methods. The Arnoldi method breaks down if Ri does not have full rank; this is a consequence of the lack of reachability of the pair (A, b).

• The resulting reduced order system tends to approximate the high frequency poles of the original system. Hence the steady-state error may be significant. The remedy is to match the coefficients of the Laurent expansion of the transfer function at frequencies other than infinity; this leads to the rational Krylov approach.

• The reduced order system may not be stable, even if the original system is stable. The remedy is implicit restart of Lanczos and Arnoldi, described in the previous chapter.

• A drawback of non-symmetric Lanczos is the need to work with A*, which can be difficult to use, e.g., on a distributed memory parallel computer.

A remark on the stability/instability of reduced models

With regard to the instability of reduced order models obtained using Krylov methods, a remark is in order. For this we need to introduce the numerical range of the map A ∈ C^{n×n}:

  Γ(A) = {x* A x : ∥x∥ = 1}.

If A is normal (A A* = A* A), the numerical range is equal to the convex hull of the spectrum of A. If A is non-normal, Γ(A) may extend into the right half plane C+, even if A is stable. A consequence of this is that, for stable A with Γ(A) contained in the left half plane, the projection Vm* A Vm can never be unstable; otherwise, depending on the starting vector, the projection may turn out to be unstable. This explains the origin of instability in Krylov methods. For details we refer to [156] and the references therein.

It can also be argued that the lack of stability is not necessarily a bad thing, because it may help capture, for example, the initial slope of the impulse or step response of the system; this is based on the fact that the slope of the norm of the matrix exponential e^{At} at the origin is max{x*(A + A*)x}, which is one of the extremal points of the numerical range of A.

Lanczos and continued fractions

The Lanczos canonical form of a linear system is related to a continued fraction expansion of its transfer function. The continued fraction expansion of a rational function, obtained by repeatedly dividing the denominator by the numerator, is always defined. However, the Lanczos canonical form, and the associated tridiagonal form of A, does not always exist. It can be shown that Lanczos breaks down precisely when one or more remainders in the above division process have degree greater than one; this happens less frequently than the singularity of some principal minor of the Hankel matrix (even though Hn is non-singular). A connection between the Lanczos procedure and system theory described in [145] is that the coefficients of the Lanczos canonical form can be read off from the continued fraction expansion of the transfer function. These issues are treated, in greater generality, in the section on recursive interpolation on page 96. For developments related to this continued fraction expansion and generalizations we refer, besides the section on recursive interpolation mentioned above, to [9]; for a discussion of the connection between tridiagonalization of A and minimal realizations, see [260], as well as [15] and [12].
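Returning to the numerical range remark above, both halves of the claim can be checked in a few lines. The sketch below is our own construction (the matrices are arbitrary illustrations): if A + A* is negative definite, then Γ(A) lies in the left half plane, and since the eigenvalues of any orthogonal projection V*AV lie in its numerical range, which is contained in Γ(A), the projection is necessarily stable. A non-normal stable 2 × 2 matrix, by contrast, can have Γ(A) reaching well into C+.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 8, 3
M = rng.standard_normal((n, n))
A = (M - M.T) - (M @ M.T + np.eye(n))      # A + A^T = -2(M M^T + I) < 0
V, _ = np.linalg.qr(rng.standard_normal((n, k)))
Ak = V.T @ A @ V                            # an arbitrary orthogonal projection
proj_abscissa = np.linalg.eigvals(Ak).real.max()   # provably negative

B = np.array([[-0.1, 10.0],
              [ 0.0, -0.1]])                # stable but highly non-normal
range_reach = np.linalg.eigvalsh((B + B.T) / 2).max()   # rightmost point of Re(Gamma(B))
```

For B above, the eigenvalues are both −0.1 and yet the numerical range reaches out to real part 4.9, so an unlucky projection of such a matrix can indeed be unstable.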
Example 11.1. Consider the system described by the following transfer function, given in continued fraction form:

  H(s) = (s³ + s² + 6s − 12)/(s⁴ + 3s³ + 12s² + 8s + 24)
       = 1/( s + 2 + 4/( s − 1 − 4/( s + 2 + 12/s ) ) ).

The three approximants obtained by truncating this continued fraction expansion are:

  H1(s) = 1/(s + 2),
  H2(s) = 1/( s + 2 + 4/(s − 1) ) = (s − 1)/(s² + s + 2),
  H3(s) = 1/( s + 2 + 4/( s − 1 − 4/(s + 2) ) ) = (s² + s − 6)/( (s − 1)(s + 2)² ).

A realization of H(s) = c (s I4 − A)^{−1} b, with A in companion form, is given by:

  A = [ 0 1 0 0 ; 0 0 1 0 ; 0 0 0 1 ; −24 −8 −12 −3 ], b = [0; 0; 0; 1], c = [−12 6 1 1].

The continued fraction expansion of H induces a different realization, namely

  Ā = [ −2 2 0 0 ; −2 1 2 0 ; 0 2 −2 √12 ; 0 0 −√12 0 ], b̄ = e1, c̄ = e1*,

obtained from the companion realization by means of the biorthogonal Lanczos transformation matrices V and W satisfying W* V = I: Ā = W* A V, b̄ = W* b, c̄ = c V. The latter realization is in Lanczos form, since Ā is tridiagonal and b̄, c̄* are multiples of e1.

The Arnoldi factorization applied to A with starting vector b yields

  Â = [ −3 −12 −8 −24 ; 1 0 0 0 ; 0 1 0 0 ; 0 0 1 0 ], b̂ = e1, ĉ = [1 1 6 −12].

Finally, the Arnoldi factorization applied to A* with starting vector c* yields an upper Hessenberg matrix Ã with non-integer entries, together with the corresponding b̃ and c̃.
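The numbers in Example 11.1 can be checked directly. In the sketch below (our own verification code), the tridiagonal realization is rebuilt from the partial quotients of the continued fraction — diagonal entries −2, 1, −2, 0 and off-diagonal products −4, 4, −12 — and its transfer function is compared with the polynomial description at sample points, together with the second truncated approximant.

```python
import numpy as np

num = np.array([1.0, 1.0, 6.0, -12.0])        # s^3 + s^2 + 6 s - 12
den = np.array([1.0, 3.0, 12.0, 8.0, 24.0])   # s^4 + 3 s^3 + 12 s^2 + 8 s + 24
r = np.sqrt(12.0)
Abar = np.array([[-2.0,  2.0,  0.0, 0.0],
                 [-2.0,  1.0,  2.0, 0.0],
                 [ 0.0,  2.0, -2.0,   r],
                 [ 0.0,  0.0,  -r,  0.0]])
e1 = np.eye(4)[0]

def H(s):
    """Transfer function from the polynomial description."""
    return np.polyval(num, s) / np.polyval(den, s)

def Hbar(s):
    """Transfer function of the tridiagonal (Lanczos-form) realization."""
    return e1 @ np.linalg.solve(s * np.eye(4) - Abar, e1)
```

Only the products of the sub- and superdiagonal entries enter the continued fraction, so the split of signs between β's and γ's is a matter of convention.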
Note, in the last computation, that even though the starting matrix A* is itself in upper Hessenberg form, the projected matrix Ã, while upper Hessenberg, is not equal to A*; this is due to the fact that the starting vector c* is not a multiple of e1.

11.3 Partial realization and rational interpolation by projection

In the preceding sections we saw that rational Krylov methods achieve model reduction by moment matching, which leads to realization and interpolation problems. In this section we will revisit this problem from a more general point of view. The issue of partial realization, and more generally of rational interpolation by projection, has been treated in the work of Skelton and co-workers; see, e.g., [92], [366], [367]. Grimme, in his thesis [151], showed how to accomplish this by means of Krylov methods. Recently, Gallivan and co-workers have revisited this area and provided connections with the Sylvester equation [127], [128].

Suppose that we are given a system Σ = (A, B, C), where A ∈ R^{n×n} and B, C* ∈ R^n. We wish to find lower dimensional models

  Σ̂ = (Â, B̂, Ĉ), Â ∈ R^{k×k}, B̂, Ĉ* ∈ R^k, k < n,

such that Σ̂ preserves some properties of the original system, like stability, passivity, norm-boundedness, etc. In other words, we seek V ∈ R^{n×k} and W ∈ R^{n×k} such that W* V = Ik and, following (1.8) from the introduction (page 8), the reduced system is given by

  Σ̂ = ( Â B̂ ; Ĉ D ) = ( W* A V  W* B ; C V  D ).   (1.8)

We begin with the following proposition.

Proposition 11.5. Let V = [B, AB, ···, A^{k−1}B] = Rk(A, B), and let W be any left inverse of V. Then Σ̂ defined by (1.8) is a partial realization of Σ and matches k Markov parameters.

Proof. Since A^j B = Rk(A, B) ej+1 for j = 0, 1, ..., k − 1, we have W* A^j B = ej+1. It then follows by induction that Â^j B̂ = (W* A V)^j W* B = ej+1 for j = 0, 1, ..., k − 1, and therefore

  Ĉ Â^j B̂ = C Rk(A, B) ej+1 = C A^j B, j = 0, 1, ..., k − 1,

that is, the first k Markov parameters match.

From a numerical point of view, one would not use V as defined above to construct Σ̂, since usually the columns of V are almost linearly dependent. As it turns out, any matrix whose column span is the same as that of V can be used.
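Proposition 11.5 holds for any left inverse W, a point worth seeing concretely. The sketch below is our own illustration with random data: it uses the raw Krylov matrix as V and the Moore-Penrose pseudoinverse to supply a left inverse, and confirms the k matched Markov parameters.

```python
import numpy as np

rng = np.random.default_rng(8)
n, k = 6, 3
A = rng.standard_normal((n, n))
B = rng.standard_normal(n); C = rng.standard_normal(n)
V = np.column_stack([np.linalg.matrix_power(A, j) @ B for j in range(k)])
W = np.linalg.pinv(V).T          # W^T V = I_k, i.e. W^T is a left inverse of V
Ar, Br, Cr = W.T @ A @ V, W.T @ B, C @ V
```

In practice one would replace the raw Krylov matrix with an orthonormal basis of the same column span, as the proposition permits.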
Suppose now that we are given k distinct points sj ∈ C, and let V be the generalized reachability matrix (4.85), page 87:

  V = [ (s1 In − A)^{−1} B, ···, (sk In − A)^{−1} B ],

and let W be any left inverse of V.

Proposition 11.6. Σ̂ defined by (1.8) interpolates the transfer function of Σ at the points sj, that is,

  H(sj) = C (sj In − A)^{−1} B = Ĉ (sj Ik − Â)^{−1} B̂ = Ĥ(sj), j = 1, ..., k.

Proof. The following string of equalities leads to the desired result:

  Ĉ (sj Ik − Â)^{−1} B̂ = C V ( sj Ik − W* A V )^{−1} W* B
   = C [ (s1 In − A)^{−1} B ··· (sk In − A)^{−1} B ] ( W* (sj In − A) V )^{−1} W* B
   = C [ (s1 In − A)^{−1} B ··· (sk In − A)^{−1} B ] ej
   = C (sj In − A)^{−1} B.

The third equality holds because the jth column of (sj In − A) V equals B, so the jth column of W* (sj In − A) V equals W* B, whence ( W* (sj In − A) V )^{−1} W* B = ej. This completes the proof.

The next result concerns matching the value of the transfer function, together with k − 1 derivatives, at a given point s0 ∈ C. For this purpose we define the generalized reachability matrix

  V = [ (s0 In − A)^{−1} B, (s0 In − A)^{−2} B, ···, (s0 In − A)^{−k} B ].   (11.11)

Proposition 11.7. Let V be as defined in (11.11), and let W be such that W* V = Ik. Then Σ̂ defined by (1.8) interpolates the transfer function of Σ at s0, together with k − 1 derivatives at the same point:

  (−1)^j/j! · d^j H(s)/ds^j |s=s0 = C (s0 In − A)^{−(j+1)} B
   = Ĉ (s0 Ik − Â)^{−(j+1)} B̂ = (−1)^j/j! · d^j Ĥ(s)/ds^j |s=s0,

where j = 0, 1, ..., k − 1.

Proof. It readily follows that the projected matrix s0 Ik − Â is in companion form:

  s0 Ik − Â = W* (s0 In − A) V = [ W* B, e1, ···, ek−1 ],

and therefore its powers are obtained by shifting its columns to the right:

  (s0 Ik − Â)^ℓ = [ ∗ ··· ∗, W* B, e1, ···, ek−ℓ ].

Consequently, [ W* (s0 In − A) V ]^{−ℓ} W* B = eℓ, which finally implies

  Ĉ (s0 Ik − Â)^{−ℓ} B̂ = C V [ W* (s0 In − A) V ]^{−ℓ} W* B = C V eℓ = C (s0 In − A)^{−ℓ} B, ℓ = 1, ..., k.

For what follows we also use the partial reachability matrix

  Rk(A, B) = [ B, AB, ···, A^{k−1} B ],

and the generalized reachability matrix at σ ∈ C,

  Rm(A, B, σ) = [ (σ In − A)^{−1} B, (σ In − A)^{−2} B, ···, (σ In − A)^{−m} B ].

Corollary 11.8. (a) If V, as defined in any of the above three cases, is replaced by V̄ = V R, R ∈ R^{k×k}, det R ≠ 0, and W* is replaced by W̄* = R^{−1} W*, the same matching results hold true. A moment's reflection shows that any V̄ that spans the same column space as V, together with any left inverse W̄* thereof, achieves the same objective.

(b) Let V be such that

  span col V = span col [ Rk(A, B)  Rm1(A, B, σ1) ··· Rmℓ(A, B, σℓ) ],

together with any left inverse W* of V. The reduced system (1.8) then matches k Markov parameters and mi moments at σi ∈ C, i = 1, ..., ℓ. Finally, any projector which is composed of any combination of the above three cases achieves matching of an appropriate number of Markov parameters and moments.

Rational Arnoldi. The above corollary can be interpreted as rational Arnoldi; its numerical implementation is obtained by combining regular Arnoldi with shift-invert Arnoldi for the shifts σ1, ···, σℓ.
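Proposition 11.6 is easy to confirm numerically. The sketch below is our own illustration: the interpolation points and random data are arbitrary choices (the points are placed away from the spectrum of A so that the shifted solves are well conditioned). V is built from the columns (sj I − A)^{−1}B and a pseudoinverse supplies the left inverse.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 6
pts = [4.0, 5.0, 6.0]                  # interpolation points, away from eig(A)
A = rng.standard_normal((n, n))
B = rng.standard_normal(n); C = rng.standard_normal(n)
V = np.column_stack([np.linalg.solve(s * np.eye(n) - A, B) for s in pts])
W = np.linalg.pinv(V).T                # left inverse: W^T V = I
Ar, Br, Cr = W.T @ A @ V, W.T @ B, C @ V

def H(s):
    return C @ np.linalg.solve(s * np.eye(n) - A, B)

def Hr(s):
    return Cr @ np.linalg.solve(s * np.eye(len(pts)) - Ar, Br)
```

A reduced system of order 3 thus reproduces the full transfer function exactly at the three chosen points.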
11.3.1 Two-sided projections

The above results can be strengthened if the row span of the left matrix W* is chosen to match the row span of an observability or generalized observability matrix. In such a case, twice as many moments can be matched with a reduced system of the same dimension as in the previous section. Certain nonsingularity conditions have to be satisfied for this to happen. We will denote by Ok(C, A) ∈ R^{k×n} the partial observability matrix consisting of the first k rows of On(C, A).

The first case is

  V = Rk(A, B) ∈ R^{n×k}, W* = ( Ok(C, A) Rk(A, B) )^{−1} Ok(C, A) ∈ R^{k×n}.

Proposition 11.9. Assuming that det Hk ≠ 0, the projected system Σ̂ defined by (1.8) is a partial realization of Σ and matches 2k Markov parameters.

Notice that the projection used above is more general than the one used for the two-sided Lanczos procedure, and therefore it fails less often (only one determinant has to be nonsingular for it not to fail). The key reason for this is that the two-sided Lanczos process forces the resulting matrix Â to be tridiagonal; this in turn is equivalent to the existence of an LU factorization of Hk, which is a rather stringent condition.

Given 2k distinct points s1, ···, s2k, we will make use of the following generalized reachability and observability matrices (recall (4.85) and (4.86)):

  Ṽ = [ (s1 In − A)^{−1} B ··· (sk In − A)^{−1} B ],
  W̃ = [ (sk+1 In − A*)^{−1} C* ··· (s2k In − A*)^{−1} C* ].

Proposition 11.10. Assuming that det W̃* Ṽ ≠ 0, the projected system Σ̂ defined by (1.8), where V = Ṽ and W = W̃ (Ṽ* W̃)^{−1}, interpolates the transfer function of Σ at the points si, i = 1, ..., 2k.

Proof. The string of equalities that follows proves the desired result for i = 1, ..., k:

  Ĉ (si Ik − Â)^{−1} B̂ = C Ṽ [ si Ik − (W̃* Ṽ)^{−1} W̃* A Ṽ ]^{−1} (W̃* Ṽ)^{−1} W̃* B
   = C Ṽ [ si W̃* Ṽ − W̃* A Ṽ ]^{−1} W̃* B
   = C Ṽ [ W̃* (si In − A) Ṽ ]^{−1} W̃* B
   = C Ṽ ei = C (si In − A)^{−1} B,

where the last step uses the fact that the ith column of (si In − A) Ṽ equals B. For i = k + 1, ..., 2k, one proceeds dually: the (i − k)th row of W̃* (si In − A) equals C, so that

  C Ṽ [ W̃* (si In − A) Ṽ ]^{−1} W̃* B = e*i−k W̃* B = C (si In − A)^{−1} B.

These relationships are valid for i = 1, ..., k and i = k + 1, ..., 2k, respectively, which completes the proof.

Remark 11.3.1. (a) The same procedure as above can be used to approximate implicit systems, that is, systems given in generalized form

  E ẋ(t) = A x(t) + B u(t), y(t) = C x(t),

where E may be singular. The reduced system, given in generalized form, is

  Ê = W* E V, Â = W* A V, B̂ = W* B, Ĉ = C V.
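Proposition 11.10 doubles the interpolation count without increasing the reduced order. The sketch below is our own illustration (the 2k points and random data are arbitrary choices): a reduced system of order k = 2 interpolates at all four points, k of them enforced through Ṽ and k through W̃.

```python
import numpy as np

rng = np.random.default_rng(11)
n, k = 6, 2
s = [4.0, 5.0, 6.0, 7.0]                      # 2k distinct points
A = rng.standard_normal((n, n))
B = rng.standard_normal(n); C = rng.standard_normal(n)
Vt = np.column_stack([np.linalg.solve(si * np.eye(n) - A, B) for si in s[:k]])
Wt = np.column_stack([np.linalg.solve(si * np.eye(n) - A.T, C) for si in s[k:]])
W = Wt @ np.linalg.inv(Wt.T @ Vt).T           # so that W^T Vt = I_k
Ar, Br, Cr = W.T @ A @ Vt, W.T @ B, C @ Vt
```

The single nonsingularity requirement det(W̃* Ṽ) ≠ 0 is all that is needed, illustrating why this projector fails less often than two-sided Lanczos.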
i i ACA : April 14. We refer to the work of Genin. For k = n − 1. Chapter 11. . Model reduction using Krylov methods i “oa” 2004/4/14 page 290 i 11. that is systems described by nonscalar proper rational transfer functions. See also [310]. 2004 i i . provides a projector that interpolates the original system C. we will assume that D = D ˆ = 0. (b) The general case of model reduction of MIMO (multiinput.2 How general is model reduction by rational Krylov? Consider a singleinput singleoutput system Σ = ˆ = that the reduced system Σ ˆ A ˆ C ˆ B ˆ D A C B D of order n. W are obtained by means of the rational Krylov procedure. ˆ have a common factor of degree. Assume . [128].3.i i 290 where the projection VW∗ is given by C(sk+1 E − A)−1 . while for k < n − 1 there are several choices of the interpolation points. . Remark 11. Vandendorpe. Let Σ be given by 0 1 0 0 0 1 0 0 0 A= 0 0 0 −1 −5 −10 0 0 1 0 −10 0 0 0 1 −5 . C ˆ = CV. W∗ = ∈R . is also singleinput singleoutput of order k and described by the rational ˆ .15). all µi have to be used. the degree of the numerator is less than the degree of the denominator). To see this.12) The situation however is not as simple for multiinput multioutput systems.3. i = 1. The Sylvester equation and projectors. say. An important connection between rational interpolation and the Sylvester equation follows from theorem 6. that is when d and d ˆ the degree of H which can be obtained by interpolation. Its transfer function is a scalar propoer rational function H = n/d (i.1. Example 11. multioutput) systems by means of tangential interpolation is analyzed in [130]. Gallivan. B at minus the eigenvalues of H. n + k . The answer to this question is Since D. such that ˆ = W∗ AV. B = 0 0 0 0 1 . The question is whether there exists a projection Π = VW∗ (in general oblique) where ˆ =n ˆ /d transfer function H V. A. C = [1 8 26 40 5]. Therefore the projectors discussed in the previous section can be obtained by solving Sylvester equations. 
This result was ﬁrst observed in [127].5 and in particular formula (6. This shows that the solution of an appropriate Sylvester equation AX + XH + BG = 0. [129]. Here is a simple example which illustrates the SISO case. A ˆ are not involved in the above projection. Therefore H interpolating H at 2k + 1 of these zeros. for details. D afﬁrmative in the generic case.e. namely k ≤ n − − 1. B ˆ = W ∗ B. D = 0. there is a restriction on In the nongeneric case. consider the difference of the transfer functions ˆ = H−H ˆ −n ˆ ˆd n n nd − = ˆ ˆ d d dd ˆ can be constructed by This difference has n + k zeros (including those at inﬁnity): µi . (11. −1 C(s2k E − A) V = (s1 E − A)−1 B · · · (sk E − A)−1 B ∈ Rn×k . k×n . · · · .2. and van Dooren [137].3.
Its transfer function is

  H(s) = (5s^4 + 40s^3 + 26s^2 + 8s + 1)/(s + 1)^5,

and the reduced transfer function to be matched is Ĥ(s) = (5s^2 − 1)/(s − 1)^3. It readily follows that

  H(s) − Ĥ(s) = −128 s^5 / [(s + 1)^5 (s − 1)^3],

which implies that this expression has five zeros at 0 and three at infinity. As interpolation points we pick all five zeros at 0 and one at infinity. Interpolation at 0 is achieved by means of

  V = [A^{-1}B  A^{-2}B  A^{-3}B],

together with a generalized observability matrix W̄* whose rows encode the remaining interpolation conditions; normalizing W* = (W̄*V)^{-1}W̄* gives W*V = I_3. The resulting reduced system is, up to state-space equivalence,

  Â = W*AV = [0 1 0; 0 0 1; 1 −3 3], B̂ = W*B = [0; 0; 1], Ĉ = CV = [−1 0 5], D̂ = 0.

It is readily checked that the triples obtained from different normalizations of the projector are equivalent, and that this realization indeed satisfies Ĉ(sI − Â)^{-1}B̂ = (5s^2 − 1)/(s − 1)^3 = Ĥ(s).

11.3.3 Model reduction with preservation of stability and passivity

Recall the definitions of passivity and positive realness discussed in section 5.9. Given is a passive system described by Σ = (A, B, C, D); this means that its transfer function H(s) = C(sI − A)^{-1}B + D is positive real. Let λ_i be such that

  H*(−λ_i) + H(λ_i) = 0;

these λ_i are the zeros of the spectral factors (the spectral zeros). The following result is shown in [27]: if the frequencies (interpolation points) s_j in proposition 11.9 are chosen to satisfy this property, i.e., as zeros of the spectral factors, the projected system is both stable and passive. For a result derived from the one presented below, see [309]; for different approaches to the problem see [33], [116], [117], [205], [234], [352].

Recall that the zeros of the square system (A, B, C, D), where D is nonsingular, are the eigenvalues of the matrix A − BD^{-1}C. If D is singular, the zeros are the finite eigenvalues of a generalized eigenvalue problem, namely all finite λ for which

  [A B; C D] − λ [I 0; 0 0]

loses rank. Thus, when D + D* is nonsingular, the zeros of the spectral factors are the eigenvalues of the following Hamiltonian matrix:

  [A 0; 0 −A*] − [B; −C*] (D + D*)^{-1} [C  B*].
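The spectral zero characterization above can be computed directly from the Hamiltonian matrix; a minimal sketch (numpy, SISO, assuming D + D* nonsingular; the small test system is ours, not from the text):

```python
import numpy as np

def spectral_zeros(A, B, C, D):
    """Eigenvalues of  [A 0; 0 -A*] - [B; -C*] (D + D*)^{-1} [C  B*]."""
    n = A.shape[0]
    R = D + D.conj().T
    Ae = np.block([[A, np.zeros((n, n))],
                   [np.zeros((n, n)), -A.conj().T]])
    Be = np.vstack([B, -C.conj().T])
    Ce = np.hstack([C, B.conj().T])
    return np.linalg.eigvals(Ae - Be @ np.linalg.solve(R, Ce))

# A small stable SISO test system (ours):
A = np.diag([-1.0, -2.0]); B = np.array([[1.0], [1.0]])
C = np.array([[1.0, 1.0]]); D = np.array([[1.0]])

def H(s):  # transfer function H(s) = C (sI - A)^{-1} B + D
    return (C @ np.linalg.solve(s * np.eye(2) - A, B) + D).item()

zeros = spectral_zeros(A, B, C, D)
for lam in zeros:
    assert abs(H(lam) + H(-lam)) < 1e-8   # each is a zero of H(s) + H(-s)
```

The computed eigenvalues come in pairs symmetric about the imaginary axis; the stable half are the interpolation candidates of the passivity preserving method.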
Example 11.3.2. We consider the following RLC ladder network: a voltage source u in series with the resistor R_1 drives a ladder consisting of the inductors L_1, L_2, the capacitors C_1, C_2, C_3, and a load resistor R_2; the output is the current y. The state variables x_1, ..., x_5 are the capacitor voltages and the inductor currents. We assume that all the capacitors and inductors have unit value, while R_1 = 1/2 Ω. This yields

  A = [−2 1 −1 0 0; −1 0 0 0 0; 0 1 0 −1 0; 0 0 1 0 −1; 0 0 0 1 −5],
  B = [0; 0; 0; 0; 2], C = [0 0 0 0 −2], D = 1,

and hence

  H(s) = (s^5 + 3s^4 + 6s^3 + 9s^2 + 7s + 3)/(s^5 + 7s^4 + 14s^3 + 21s^2 + 23s + 7),

  H(s) + H(−s) = 2(s^{10} − s^8 − 12s^6 + 5s^4 + 35s^2 − 21) / [(s^5 + 7s^4 + 14s^3 + 21s^2 + 23s + 7)(s^5 − 7s^4 + 14s^3 − 21s^2 + 23s − 7)].

The zeros of the stable spectral factor are −1.8355, −1.3018, −0.7943, −0.1833 ± 1.5430i. In particular, if the reduced order system is to have dimension k, we need to compute k eigenvalues λ_i ∈ C^−.

If the nonsingularity condition on D + D* is not satisfied, we end up with a generalized eigenvalue problem for the pair (A, E), where

  A = [A 0 B; 0 −A* −C*; C B* D + D*], E = [I 0 0; 0 I 0; 0 0 0].

Therefore the problem of model reduction with preservation of passivity has been transformed into a structured eigenvalue problem, and can be solved using the ARPACK software. A further result in passivity preserving model reduction was developed by Sorensen [309], who showed that the invariant subspaces of the pair (A, E) provide the desired passivity preserving projector VW*.

For a reduced order k = 2, according to lemma 11.11 the interpolation points are s_1 = 1.8355, s_2 = −1.8355, s_3 = 1.3018, s_4 = −1.3018, and the projection matrices are

  V = [(s_1 I_5 − A)^{-1}B  (s_2 I_5 − A)^{-1}B],
  W̄* = [C(s_3 I_5 − A)^{-1}; C(s_4 I_5 − A)^{-1}].
Normalizing by means of the LU factorization W̄*V = LU, that is, replacing V by VU^{-1} and W̄* by L^{-1}W̄*, we obtain W*V = I_2. The reduced system of order 2 is then obtained by projection: Â = W*AV, B̂ = W*B, Ĉ = CV, D̂ = D. Its transfer function Ĥ(s) = D + Ĉ(sI_2 − Â)^{-1}B̂ interpolates H at the spectral zeros ±1.8355 and ±1.3018, and the reduced system is again stable and passive; it can be synthesized as an RLC circuit (with resistors R̂_1, R̃_1, an inductor L̂_1, a capacitor Ĉ_1, and the load R_2 = 1 Ω).

11.4 Chapter summary

In this chapter we applied Krylov methods to model reduction. First we showed that the Lanczos and Arnoldi procedures yield reduced order models which match a certain number of Markov parameters (i.e., moments at infinity) of the original model. Thus, while Arnoldi matches a number of moments equal to the order of the reduced model, Lanczos matches twice as many moments. In the second part we examined model reduction by moment matching from a more general point of view. It was shown that rational Krylov methods match moments at preassigned points in the complex plane; that is, they provide an iterative solution of the rational interpolation problem. Furthermore, the question of the generality of the rational Krylov procedure was briefly investigated. Finally, reduction with preservation of passivity was shown to be solvable using rational Krylov methods, by choosing the interpolation points as a subset of the spectral zeros of the system; the solution of this problem thus reduces to a structured eigenvalue problem.
Part V

SVD–Krylov methods and case studies
Chapter 12
SVD–Krylov methods
The basic ingredients of SVD-based methods are certain gramians P, Q, usually referred to as the reachability and observability gramians, respectively. These are positive (semi-)definite, can be simultaneously diagonalized, and are used to construct a projector onto the dominant eigenspace of the product PQ. These gramians are solutions of Lyapunov equations. Gramians are also involved in Krylov-based methods, but there they are given in factored form P = RR*, Q = O*O, where R, O are reachability and observability (or generalized reachability and generalized observability) matrices; these matrices are used to project the original model so that interpolation (moment matching) conditions are satisfied. Consequently there is no need to solve Lyapunov equations in this case. A further aspect of Krylov methods is that they can be implemented iteratively, using well-known procedures.
12.1 Connection between SVD and Krylov methods
This chapter is dedicated to a discussion of various connections between these two approximation methods. Four such connections are listed next.

1. Reachability and generalized reachability matrices can be obtained as solutions of Sylvester equations, the latter being a general form of Lyapunov equations. The same holds for observability and generalized observability matrices.

2. By appropriate choice of the weightings, weighted balanced truncation methods can be reduced to Krylov methods.

3. There are methods which combine attributes of both approaches, for instance model reduction by least squares. In this case, instead of two, only one gramian (the observability gramian Q) is computed, while at the same time moment matching takes place. The former is the SVD part, the latter the Krylov part of this method.

4. The bottleneck in applying SVD-based methods (balanced truncation) to large-scale systems is the fact that the solution of Lyapunov equations requires O(n^3) operations, where n is the dimension of the original system. One remedy to this situation consists in using iterative methods for solving these Lyapunov equations approximately. This leads to approximately balancing transformations and to reduced-order models obtained by approximately balanced truncation.

The first three items above will be discussed in the three sections that follow. The fourth item will be discussed in section 12.2.
12.1.1 Krylov methods and the Sylvester equation
Recall from chapter 6 that the solution to the Sylvester equation AX + XB = C, with right-hand side C = c_1c_2* of rank one, can be written according to (6.14) on page 149 as

  X = [(μ_1 I + A)^{-1}c_1 ··· (μ_k I + A)^{-1}c_1] · diag(c_2*ŵ_1, ..., c_2*ŵ_k) · W*,

where W*B = MW*, M = diag(μ_1, ..., μ_k), is the eigenvalue decomposition of B, with ŵ_i the corresponding right eigenvectors; the first factor is the generalized reachability matrix R(A, c_1), and the remaining factors together constitute the matrix W̃*. We recognize the matrix R(A, c_1) as a generalized reachability matrix (see (4.85)), and according to proposition 11.6, if we choose any left inverse W* thereof, the two matrices define a projected system which interpolates at −μ_i*, i.e., at the mirror images of the eigenvalues of B. Thus, if we want to interpolate at given points λ_i, we choose a matrix B whose eigenvalues are μ_i = −λ_i*, choose c_1 = b and c_2 such that the pair (B, c_2*) is reachable, and solve the resulting Sylvester equation. The solution provides one part of the required projector.
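This construction can be checked directly: with B = diag(μ_i), μ_i = −λ_i, and C = c_1c_2*, the i-th column of the solution X of AX + XB = C is a multiple of (λ_i I − A)^{-1}c_1. A sketch (numpy only; the dense Kronecker solve stands in for a large-scale Sylvester solver, and all data are random test values):

```python
import numpy as np

def solve_sylvester_dense(A, B, C):
    """Solve A X + X B = C via the Kronecker form (dense, small problems only)."""
    n, m = C.shape
    K = np.kron(np.eye(m), A) + np.kron(B.T, np.eye(n))
    return np.linalg.solve(K, C.flatten(order="F")).reshape((n, m), order="F")

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))
b = rng.standard_normal((n, 1))
lams = np.array([2.0, 3.0, 4.0])       # desired (real) interpolation points
B = np.diag(-lams)                     # eigenvalues mu_i = -lambda_i
C = b @ np.ones((1, lams.size))        # rank-one right-hand side c1 c2^*
X = solve_sylvester_dense(A, B, C)

# Each column of X equals -(lambda_i I - A)^{-1} b, a generalized reachability vector:
for i, lam in enumerate(lams):
    v = np.linalg.solve(lam * np.eye(n) - A, b).ravel()
    assert np.allclose(X[:, i], -v)
```

The columns of X thus span the generalized reachability space, so any left inverse of X yields an interpolating projector at the λ_i.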
12.1.2 From weighted balancing to Krylov
Recall the weighted reachability gramian

  P = (1/2π) ∫_{−∞}^{∞} (iωI − A)^{-1} B W(iω)W*(−iω) B* (−iωI − A*)^{-1} dω,

defined by (7.34) on page 196. Therein W is the input weight of the system, which will be assumed for simplicity to be a scalar function. Let I denote the unit step function (I(t) = 1, t ≥ 0, and I(t) = 0 otherwise). We define the weight

  W(iω)W(−iω) = (1/2ε)[I(ω − (ω_0 − ε)) − I(ω − (ω_0 + ε))] + (1/2ε)[I(−ω − (ω_0 − ε)) − I(−ω − (ω_0 + ε))].

As ε → 0 the above expression tends to the sum of two impulses, δ(ω − ω_0) + δ(−ω − ω_0). In this case the gramian becomes

  P = (iω_0I − A)^{-1}BB*(−iω_0I − A*)^{-1} + (−iω_0I − A)^{-1}BB*(iω_0I − A*)^{-1}.

Therefore P = VV*, where

  V = [(iω_0I − A)^{-1}B  (−iω_0I − A)^{-1}B].

Thus projection with VW*, where W*V = I_2, yields a reduced system which interpolates the original system at ±iω_0. Hence, with appropriate choice of the weights, weighted balanced truncation reduces to a Krylov method (rational interpolation) and matches the original transfer function at points on the imaginary axis.

Example 12.1.1. The purpose of this example is to show yet another way of combining SVD and Krylov methods. Recall example 11.2.1 on page 285. Given the reachability and observability matrices R_4(A, B), O_4(C, A), the gramians P = R_4R_4* and Q = O_4*O_4 can be assigned. The balancing method would now provide transformations which simultaneously diagonalize P and Q. In this new basis, model reduction amounts to truncation of the states which correspond to small eigenvalues of the product PQ. For this choice of gramians, notice that the square roots of these eigenvalues are the absolute values of the eigenvalues of the 4 × 4 Hankel matrix associated with the triple (C, A, B). We will now compute a balancing transformation which diagonalizes the gramians simultaneously. For this purpose, let

  R_4*(A, B) O_4*(C, A) = H_4 = USV*.
Then

  T = S^{-1/2}V*O_4(C, A) and T^{-1} = R_4(A, B)US^{-1/2}.

It follows (as a consequence of the symmetry of H_4) that V = U diag(1, −1, 1, −1), and here

  S = diag(71.7459, 64.0506, 2.4129, 1.1082).

The resulting balancing transformation T satisfies TPT* = T^{-*}QT^{-1} = S, and the balanced realization is F = TAT^{-1}, G = TB, H = CT^{-1} = G* diag(1, −1, 1, −1). Model reduction now proceeds by retaining the leading parts of F, G, H. Thus, given the gramians P = RR* and Q = O*O, different reduced-order models are obtained depending on how we proceed: if we apply an SVD-type method, the gramians are simultaneously diagonalized; if we choose a Krylov method, the square-root factors R and O are used directly to construct the projector. Notice that the former methods lose their interpolation properties. To complete the example, we also quote the result obtained by applying the Arnoldi method (11.9), page 289:

  F̃_1 = [−2], G̃_1 = [1], H̃_1 = [1];
  F̃_2 = [0 −2; 1 −1], G̃_2 = [1; 0], H̃_2 = [1 −2];
  F̃_3 = [0 0 4; 1 0 0; 0 1 −3], G̃_3 = [1; 0; 0], H̃_3 = [1 −2 0];
  F̃_4 = [0 0 0 −24; 1 0 0 −8; 0 1 0 −12; 0 0 1 −3], G̃_4 = [1; 0; 0; 0], H̃_4 = [1 −2 0 4].

In this case, according to proposition 11.9, the truncated model of order k matches 2k Markov parameters.
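The square-root balancing computation of this example can be reproduced in a few lines. In the sketch below (numpy; a random, well-conditioned test system stands in for the data of example 11.2.1), the SVD R_4*O_4* = USV* yields T = S^{-1/2}V*O_4 and T^{-1} = R_4US^{-1/2}, and both assigned gramians are diagonalized:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4
Qm, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = 0.9 * Qm                                   # well-conditioned test matrix
b = rng.standard_normal((n, 1))
c = rng.standard_normal((1, n))

R = np.column_stack([(np.linalg.matrix_power(A, i) @ b).ravel() for i in range(n)])
O = np.vstack([c @ np.linalg.matrix_power(A, i) for i in range(n)])
P, Q = R @ R.T, O.T @ O                        # gramians assigned from the factors

U, s, Vt = np.linalg.svd(R.T @ O.T)            # H4 = R* O* = U S V*
T = np.diag(s ** -0.5) @ Vt @ O                # T     = S^{-1/2} V* O
Ti = R @ U @ np.diag(s ** -0.5)                # T^{-1} = R U S^{-1/2}

assert np.allclose(T @ Ti, np.eye(n))
assert np.allclose(T @ P @ T.T, np.diag(s))    # T P T*      = S
assert np.allclose(Ti.T @ Q @ Ti, np.diag(s))  # T^{-*} Q T^{-1} = S
```

Truncating T and T^{-1} to their leading rows and columns then gives the reduced model, exactly as in the text.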
12.1.3 Approximation by least squares
In this section we discuss the approximation of a stable discrete-time system in the least squares sense. It turns out that this model reduction method matches moments, i.e., Markov parameters, and guarantees stability of the reduced-order system. This section follows [158]; other contributions along these lines are [92] as well as [178].

Consider, as usual, the SISO system Σ = (A, b, c) with |λ_i(A)| < 1. Recall the Hankel matrix (4.63). For this section we use the notation

  H_k = [h_1 h_2 ··· h_k; h_2 h_3 ··· h_{k+1}; ··· ; h_k h_{k+1} ··· h_{2k−1}; ···],
  h_{k+1} = [h_{k+1}; h_{k+2}; ··· ; h_{2k}; ···],

where H_k consists of the first k columns of the (infinite) Hankel matrix of Σ and h_{k+1} is its (k + 1)st column.
We denote by O the infinite observability matrix and by R_k the reachability matrix containing k terms. Thus H_k = OR_k and h_{k+1} = OA^k b. Furthermore, Q = O*O is the observability gramian of the system. The solution is obtained by computing the least squares fit of the (k+1)st column h_{k+1} of H onto the preceding k columns of H, i.e., the columns of H_k:

  H_k x_LS = h_{k+1} + e_LS,

where e_LS is the least squares error vector. From the standard theory of least squares it follows that

  x_LS = (H_k*H_k)^{-1}H_k*h_{k+1} = (R_k*QR_k)^{-1}R_k*QA^k b.

The characteristic polynomial of the resulting system is

  χ_LS(σ) = (1 σ ··· σ^k) [−x_LS; 1].

The corresponding matrices are A_LS = Π_L A Π_R, b_LS = Π_L b, c_LS = c Π_R, where the projection Π_R Π_L is defined by

  Π_L = (R_k*QR_k)^{-1}R_k*Q, Π_R = R_k ⇒ Π_L Π_R = I_k.

If we let x_LS* = (α_0 ··· α_{k−1}), the reduced triple has the following form:

  Σ_LS: A_LS = [0 0 ··· 0 α_0; 1 0 ··· 0 α_1; ··· ; 0 0 ··· 1 α_{k−1}],
        b_LS = [1; 0; ··· ; 0], c_LS = [h_1 h_2 ··· h_k].   (12.1)
k ×k ∗ ∗ R∗ , det W = 0 k R RRk = W W, W ∈ R
be the Cholesky factorization of R∗ k QRk . We deﬁne T = RRk W−1 ∈ Rn×k ⇒ T∗ T = Ik It follows that ¯ LS = WALS W−1 = T∗ AT. A
¯ LS have the same characteristic polynomial. We now show that A ¯ LS has eigenvalues Thus by similarity, ALS and A inside the unit circle. ¯ LS ) ≤ 1. Proposition 12.2. If λ(A) < 1, it follows that λ(A Proof. Recall that A∗ QA + c∗ c = Q. Assuming without loss of generality that Q = I, it follows that A∗ A ≤ In ; multiplying this relationship by T∗ and T on the left and right respectively, we have T∗ A∗ AT ≤ Ik . Since I n L∗ > 0, is equivalent with In − L∗ L > 0 or Im − LL∗ , for L ∈ Rm×n , the latter inequality is equivalent to L Im
ATT*A* ≤ I_n, which in turn implies Ā_LS Ā_LS* = T*ATT*A*T ≤ I_k; this concludes the proof of the proposition.
Proof (of lemma 12.1). The above proposition implies that the LS approximant may have poles on the unit circle. To show that this cannot happen, we complete the columns of T to obtain a unitary matrix U = [T S], UU* = U*U = I_n. Then I_n − A*A = c*c implies I_n − U*A*UU*AU = U*c*cU, which written explicitly gives

  [I_k 0; 0 I_{n−k}] − [T*A*T T*A*S; S*A*T S*A*S][T*AT T*AS; S*AT S*AS] = [T*c*cT T*c*cS; S*c*cT S*c*cS].

The (1,1) block of this equation is

  I_k − T*A*TT*AT − T*A*SS*AT = T*c*cT.

Assume now that the reduced system has a pole of magnitude one, i.e., that there exists x ∈ C^k such that T*ATx = λx with |λ| = 1. Multiplying the above equation by x* on the left and x on the right, we obtain

  x*T*A*SS*ATx + x*T*c*cTx = ||S*ATx||^2 + ||cTx||^2 = 0.

This implies cTx = 0 and S*ATx = 0, which means that the reduced system is not observable if it has a pole on the unit circle. Now let y = ATx; then U*y = [T*ATx; S*ATx] = [λx; 0], so that y = U[λx; 0] = λTx. Hence Tx is an eigenvector of A with eigenvalue λ of unit modulus which lies in the null space of c; this contradicts the fact that (c, A) is observable. This concludes the proof.

From (12.1) it follows that the least squares approximant matches the first k Markov parameters of the original system. The above considerations are summarized next.

Theorem 12.3. Given the discrete-time stable system Σ, let Σ_LS be the kth order approximant obtained by means of the least squares fit of the (k+1)st column of the Hankel matrix of Σ to the preceding k columns. Σ_LS is given by (12.1) and enjoys the following properties:
• Σ_LS is stable.
• k Markov parameters are matched: cA^{i−1}b = c̄Ā^{i−1}b̄, i = 1, ···, k.
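Construction (12.1) and the two properties just stated can be verified numerically. In the sketch below (numpy only; random stable test data, with the exact observability gramian obtained from the discrete Lyapunov equation by a dense Kronecker solve), we form x_LS = (R_k*QR_k)^{-1}R_k*QA^k b, assemble Σ_LS, and check Markov parameter matching and stability:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 5, 3
M = rng.standard_normal((n, n))
A = M / (1.2 * max(abs(np.linalg.eigvals(M))))   # |lambda_i(A)| < 1
b = rng.standard_normal((n, 1))
c = rng.standard_normal((1, n))

# Observability gramian: Q - A^T Q A = c^T c (discrete Lyapunov, dense solve)
K = np.eye(n * n) - np.kron(A.T, A.T)
Q = np.linalg.solve(K, (c.T @ c).flatten(order="F")).reshape((n, n), order="F")

Rk = np.column_stack([(np.linalg.matrix_power(A, i) @ b).ravel() for i in range(k)])
x_ls = np.linalg.solve(Rk.T @ Q @ Rk, Rk.T @ Q @ np.linalg.matrix_power(A, k) @ b)

# Reduced triple (12.1): companion A_LS, b_LS = e_1, c_LS = [h_1 ... h_k]
A_ls = np.zeros((k, k)); A_ls[1:, :-1] = np.eye(k - 1); A_ls[:, -1] = x_ls.ravel()
b_ls = np.zeros((k, 1)); b_ls[0, 0] = 1.0
h = [(c @ np.linalg.matrix_power(A, i) @ b).item() for i in range(2 * k)]
c_ls = np.array([h[:k]])

for i in range(k):   # k Markov parameters are matched
    assert abs((c_ls @ np.linalg.matrix_power(A_ls, i) @ b_ls).item() - h[i]) < 1e-10
assert max(abs(np.linalg.eigvals(A_ls))) < 1.0   # the approximant is stable
```

Note that using the exact gramian Q (rather than a finitely truncated Hankel matrix) is what makes the stability guarantee of the lemma apply.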
Remark 12.1.1. Connection with Prony's method. The least squares method just presented is related to Prony's method. For details on this and other issues, e.g., the generalization to continuous-time systems, MIMO systems, and an error bound, we refer to [158]. A different way to obtain stable reduced-order systems by projection is described in [207].
12.2 Introduction to iterative methods
Recently, there has been renewed interest in iterative projection methods for model reduction. Three leading efforts in this area are Padé via Lanczos (PVL) [113], multipoint rational interpolation [151], and implicitly restarted dual Arnoldi [188]. The PVL approach exploits the deep connection between the (nonsymmetric) Lanczos process and classical moment matching techniques; for details see the work of van Dooren as summarized in [333, 336]. The multipoint rational interpolation approach utilizes the rational Krylov method of Ruhe [281] to provide moment matching of the transfer function at selected frequencies, and hence to obtain enhanced approximation of the transfer function over a broad frequency range. These techniques have proven to be very effective. PVL has enjoyed considerable success in
circuit simulation applications. Rational interpolation achieves remarkable approximation of the transfer function with very low order models. Nevertheless, there are shortcomings to both approaches. In particular, since the methods are local in nature, it is difficult to establish rigorous error bounds. Heuristics have been developed that appear to work, but no global results exist. Secondly, the rational interpolation method requires the selection of interpolation points; at present this is not an automated process and relies on ad hoc specification by the user. The dual Arnoldi method runs two separate Arnoldi processes, one for the reachability subspace and the other for the observability subspace, and then constructs an oblique projection from the two orthogonal Arnoldi basis sets. The basis sets and the reduced model are updated using a generalized notion of implicit restarting. The updating process is designed to iteratively improve the approximation properties of the model: essentially, the reduced model is reduced further, keeping the best features, and then expanded via the dual Arnoldi processes to include new information. The goal is to achieve approximation properties related to balanced realizations. Other related approaches [67, 228, 194, 267] work directly with projected forms of the two Lyapunov equations to obtain low rank approximations to the system gramians. An overview of similar model reduction methods can be found in [336]; see also [91] and [210]. In the sequel we describe two approaches to iterative projection methods. The first uses the cross gramian introduced in section 4.3.2 on page 74; the second is based on Smith-related iterative methods. These two sections follow closely [310] and [157], respectively.
12.3 An iterative method for approximate balanced reduction
Computational techniques are well known for producing a balancing transformation T for small- to medium-scale problems [167, 224]. Such methods rely upon an initial Schur decomposition of A followed by additional factorization schemes of dense linear algebra. The computational complexity involves O(n^3) arithmetic operations and the storage of several dense matrices of order n, i.e., O(n^2) storage. For large state space systems this approach to obtaining a reduced model is clearly intractable. Yet computational experiments indicate that such systems are representable by very low order models. This provides the primary motivation for seeking methods to construct projections of low order. The cross gramian X of Σ is defined for square systems (m = p) by (4.59); it is the solution of the Sylvester equation

  AX + XA + BC = 0.   (4.59)
Our approach to model reduction, as reported in section 12.3.1, consists in constructing rank k approximate solutions of this matrix equation by setting X = VX̂W*, with W*V = I_k, and then projecting using V together with W. An implicit restart mechanism is presented which allows the computation of an approximation to the best rank k approximant of X. Furthermore, a reduced basis constructed from this procedure has an error estimate in the SISO case.

As mentioned earlier, the construction of a balancing transformation has typically relied upon solving for the reachability and observability gramians P, Q and developing a balancing transformation from the EVD of the product PQ. However, it turns out that in the SISO case, and in the case of symmetric MIMO systems, a balancing transformation can be obtained directly from an eigenvector basis of the cross gramian. Recall lemma 5.4, which states that the eigenvalues of the Hankel operator for square systems are given by the eigenvalues of the cross gramian. In addition, the following two results hold.

Lemma 12.4. Let Σ = (A, B, C) be a stable SISO system that is reachable and observable. There is a nonsingular symmetric matrix J such that

  AJ = JA* and CJ = B*.

It follows that the three gramians are related as follows: P = XJ and Q = J^{-1}X.

Proof. Let R = [B, AB, ···, A^{n−1}B], O = [C*, A*C*, ···, (A*)^{n−1}C*]*, and define H_k = OA^kR. The hypothesis implies that both R and O are nonsingular, and it is easily shown that each Hankel matrix H_k is symmetric. Define J = RO^{-*}. Note that J = O^{-1}H_0O^{-*}, so that J = J*. Moreover, using the symmetry H_0 = OR = R*O*,

  CJ = e_1*ORO^{-*} = e_1*H_0O^{-*} = e_1*R*O*O^{-*} = e_1*R* = B*.
To complete the proof, we note that AJ = ARO^{-*} = O^{-1}H_1O^{-*}, and hence AJ = (AJ)* = JA*. Multiplying the cross gramian equation on the right by J, and using XAJ = XJA* and BCJ = BB*, it follows that A(XJ) + (XJ)A* + BB* = 0, which implies P = XJ. Similarly, multiplying the cross gramian equation on the left by J^{-1} we obtain A*J^{-1}X + J^{-1}XA + C*C = 0, which in turn implies the desired Q = J^{-1}X.
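Lemma 12.4 can be verified numerically for a random stable SISO realization. In the sketch below (numpy only; the dense Kronecker solves stand in for the large-scale methods of this chapter), the three gramians are computed from their Lyapunov and Sylvester equations and the stated relations are checked:

```python
import numpy as np

def lin_matrix_solve(A, B, C):
    """Solve A X + X B + C = 0 via the dense Kronecker form."""
    n, m = C.shape
    K = np.kron(np.eye(m), A) + np.kron(B.T, np.eye(n))
    return np.linalg.solve(K, -C.flatten(order="F")).reshape((n, m), order="F")

rng = np.random.default_rng(2)
n = 4
A = -np.eye(n) + 0.2 * rng.standard_normal((n, n))   # stable test matrix
B = rng.standard_normal((n, 1))
C = rng.standard_normal((1, n))

P = lin_matrix_solve(A, A.T, B @ B.T)    # A P + P A* + B B* = 0
Q = lin_matrix_solve(A.T, A, C.T @ C)    # A* Q + Q A + C* C = 0
X = lin_matrix_solve(A, A, B @ C)        # A X + X A + B C  = 0

R = np.column_stack([(np.linalg.matrix_power(A, i) @ B).ravel() for i in range(n)])
O = np.vstack([C @ np.linalg.matrix_power(A, i) for i in range(n)])
J = R @ np.linalg.inv(O.T)               # J = R O^{-*} (real data)

assert np.allclose(A @ J, J @ A.T)            # A J = J A*
assert np.allclose(C @ J, B.T)                # C J = B*
assert np.allclose(P, X @ J)                  # P = X J
assert np.allclose(Q, np.linalg.solve(J, X))  # Q = J^{-1} X
assert np.allclose(X @ X, P @ Q)              # hence X^2 = P Q
```

The last identity, X^2 = PQ, is the crucial SISO property used throughout this section.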
Lemma 12.5. Let Σ = (A, B, C) be a stable SISO system that is reachable and observable, and suppose that (4.59) is satisfied. Then X is diagonalizable, X Z = ZD, where Z is nonsingular and D is real and diagonal. Moreover, up to a diagonal scaling of its columns, Z is a balancing transformation for the system.

Proof. Since P = XJ is symmetric positive definite, it has a Cholesky factorization XJ = LL* with L lower triangular and nonsingular. This implies

  L^{-1}XL = L*J^{-1}L = QDQ*,

where Q is orthogonal and D is real and diagonal, since L*J^{-1}L is symmetric. Thus X = ZDZ^{-1} and J = ZD^{-1}Z*, with Z = LQ. Replacing Z by Z|D|^{-1/2} (a diagonal scaling, which commutes with D), we obtain

  J = ZD_J Z* and X = ZDZ^{-1}, with D_J = |D|D^{-1} = diag(±1).

It follows that XJ = Z(DD_J)Z* and J^{-1}X = Z^{-*}(D_J D)Z^{-1}, since D_J = diag(±1) implies D_J = D_J^{-1}. If we let S = DD_J = |D|, we note that S is a diagonal matrix with positive diagonal elements, and the above discussion together with equation (4.59) gives

  (Z^{-1}AZ)S + S(Z*A*Z^{-*}) + Z^{-1}BB*Z^{-*} = 0.

After some straightforward manipulations, we find that

  (Z^{-1}AZ)*S + S(Z^{-1}AZ) + Z*C*CZ = 0.

To conclude the proof, we note that CJ = B* implies CZD_J = B*Z^{-*}, and hence the system transformed by Z is indeed balanced.
A corollary of this result is that if the cross gramian is diagonal, then the system is essentially balanced: Z can then be taken to be a diagonal matrix such that CZe_j = ±B*Z^{-*}e_j, and thus a diagonal linear transformation balances the system. This diagonal transformation is constructed trivially from the entries of B and C. As mentioned earlier, in the SISO case the absolute values of the diagonal entries of D are the Hankel singular values of the system. If we assume that the diagonal entries of D have been ordered in decreasing order of magnitude, then the n × k matrix Z_k consisting of the leading k columns of Z provides a truncated balanced realization with all of the desired stability and error properties. The question then is how to compute a reasonable approximation to Z_k directly: we wish to avoid computing all of Z first, followed by truncation, especially in the large-scale setting.
12.3.1 Approximate balancing through low rank approximation of the cross gramian
We now consider the problem of computing the best rank k approximation X_k to the cross gramian X. We seek a restarting mechanism, analogous to implicit restarting for eigenvalue computations, that enables us to compute X_k directly instead of computing all of X and truncating.
Motivation. Suppose X = USV* is the SVD of X. Let X = U_1S_1V_1* + U_2S_2V_2*, where U = [U_1, U_2] and V = [V_1, V_2]. Projecting the cross gramian equation on the right with V_1, we get

  (AX + XA + BC)V_1 = 0 ⇒ AU_1S_1 + U_1S_1(V_1*AV_1) + BCV_1 = E,

where E = −U_2S_2V_2*AV_1. Observe that ||E||_2 = O(σ_{k+1})||A||, where σ_{k+1} is the first neglected singular value, that is, the (k+1)st singular value of X.
Procedure. This suggests the following type of iteration. First obtain a projection of A,

  AV + VH = F with V*V = I, V*F = 0,

where V ∈ R^{n×m} and H ∈ R^{m×m}, with k < m ≪ n. We require that this projection yield a stable matrix H. If not, V must be modified: using the technique of implicit restarting as developed in [152], we can cast out the unstable eigenvalues. This results in a V and an H of smaller dimension. As long as this dimension remains greater than k, we can execute the remaining steps described below; should the dimension fall below k, some method must be used to expand the basis set, for example a block form of Arnoldi with the remaining V as a starting block of vectors. Once V and H have been computed, solve the Sylvester equation

  AW + WH + BCV = 0   (12.2)

for W ∈ R^{n×m}, and compute the SVD of W:

  W = QSY* = [Q_1 Q_2] [S_1 0; 0 S_2] [Y_1*; Y_2*], Q_1 ∈ R^{n×k}, S_1 ∈ R^{k×k}, Y_1 ∈ R^{m×k},

where S_1 contains the k most significant singular values of W. If we now project (12.2) on the right with Y_1, we obtain

  AQ_1S_1 + Q_1S_1H_1 + BCV_1 = −Q_2S_2Y_2*HY_1, where H_1 = Y_1*HY_1 and V_1 = VY_1.

At this point the projection has been updated from V ∈ R^{n×m} to V_1 = VY_1 ∈ R^{n×k}. We must now have a mechanism to expand the subspace generated by V_1, so that the above process can be repeated. One possibility is as follows. First notice that the expression X = Q_1S_1Y_1*V* ∈ R^{n×n} can be considered an approximate solution of the equation AX + XA + BC = 0; we therefore project this equation on the left with Q_1* and call the result E:

  E = (Q_1*AQ_1)(S_1Y_1*V*) + (S_1Y_1*V*)A + Q_1*BC.

We then solve the equation

  HZ + ZA + E = 0

for Z ∈ R^{k×n} (here H denotes the contracted, k × k projected matrix). This represents a residual correction to the solution of the Sylvester equation AX + XA + BC = 0, projected on the left. However, instead of adding this correction to the basis set V_1, we simply adjoin the columns of Z* to the subspace spanned by the columns of V_1 and project A onto this space. Let

  [VY_1 Z*] = [V_1 V_2] [S_1 0; 0 S_2] [U_1*; U_2*], V_1 ∈ R^{n×k}, S_1 ∈ R^{k×k}, U_1 ∈ R^{2k×k},

be the corresponding SVD. Thus the updated projector at this stage is V_1, the updated projection of A is H = V_1*AV_1, and the procedure returns to equation (12.2). The latter portion of the iteration is analogous to the "Davidson part" of the Jacobi-Davidson algorithm for eigenvalue calculation proposed by Sleijpen and van der Vorst [300]. These ideas are summarized in the algorithm sketched in figure 12.1 for the complete iteration.

There have been many ideas for the numerical solution of Lyapunov and Sylvester equations [176, 350, 184, 284, 297, 267, 228, 194]. This approach nearly gives the best rank k approximation directly; however, the best rank k approximation is not a fixed point of this iteration, and a slightly modified correction equation would be needed to achieve this. Note that the iteration of Hodel et al. [176] for the Lyapunov equation suffers from the same difficulty. A final remark on this method: the equation AW + WH + BCV = 0 introduces into the projector directions which lie in the reachability (Krylov) space R(A, B), while the correction equation HZ + ZA + E = 0 introduces directions from the observability (Krylov) space O(C, A).

A special Sylvester equation. Efficient solution of a special Sylvester equation provides the key to steps 2.1 and 2.3 in Algorithm 12.1. Both of these steps result in an equation of the form AZ + ZH + M = 0, where H is a k × k stable matrix and M is an n × k matrix, with k ≪ n.
(Implicit) Restarting Algorithm

1. AV + VH = F, with V*V = I and V*F = 0.
2. while (not converged),
   2.1 Solve the Sylvester equation projected in R(A, B): solve AW + WH + BCV = 0; [Q, S, Y] = svd(W).
   2.2 Contract the space (keep the largest singular values): S1 = S(1:k, 1:k), Q1 = Q(:, 1:k), Y1 = Y(:, 1:k); V <- V Y1, H <- Y1* H Y1.
   2.3 Correct the Sylvester equation projected in O(C, A): form E = (Q1* A Q1)(S1 V*) + (S1 V*) A + Q1* BC; solve HZ + ZA + E = 0.
   2.4 Expand the space (adjoin the correction and project): [V, S, U] = svd([V, Z]); new projector: V <- V(:, 1:k); new projected A: H <- V* A V.

Figure 12.1. An implicitly restarted method for the cross gramian.

The special feature of the Sylvester equation AZ + ZH + M = 0 arising in steps 2.1 and 2.3 is that A is much larger in dimension than H (k much smaller than n). One possibility is to use a single Cayley transformation (costing one sparse direct factorization) to map the eigenvalues of A to the interior and the eigenvalues of H to the exterior of the unit disk. An alternative would be to construct a Schur decomposition H = URU*, where R is (quasi-) upper triangular, transform the equation to A(ZU) + (ZU)R + (MU) = 0 with Z^ = ZU and M^ = MU, and then solve for the columns of Z^ from left to right via

   (A + rho_jj I) z^_j = -m^_j - sum_{i=1}^{j-1} z^_i rho_ij, for j = 1, ..., k,

where the rho_ij are the elements of the upper triangular matrix R and z^_j, m^_j are the columns of Z^, M^. Since the eigenvalues of A are assumed to be in the open left half-plane and the eigenvalues of -H* are in the open right half-plane, each shifted system is nonsingular. However, this would require a separate sparse direct factorization of a large n x n complex matrix at each step j; that would be k such factorizations, and each of these would have a potentially different sparsity pattern for L and U due to pivoting for stability. Staying in real arithmetic would require working with quadratic factors involving A and hence would destroy sparsity.

Within this framework, exact knowledge of the desired eigenvalues provides several opportunities to enhance convergence. The first method we propose here is the one described in section 6.3.5 on page 153. When k is small, it is possible to compute the eigenvalues of H in advance of the computation of the partial Schur decomposition in (6.21): the k eigenvalues of largest real part for the block upper triangular matrix in (6.21) are the desired eigenvalues. If there is a reasonable gap between the eigenvalues of H and the imaginary axis, the implicitly restarted Arnoldi method [305] (implemented in ARPACK [227]) can be used effectively to compute this partial Schur decomposition; IRA is then successful in computing the k eigenvalues of largest real part using only matrix-vector products.
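The column-by-column substitution described above can be sketched as follows. This is an illustrative sketch, not the book's code; for simplicity H is generated directly in upper triangular form, playing the role of the Schur factor R, so that each column costs one shifted solve with A.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 8, 3

# Stable A (eigenvalues in the open left half-plane) and a stable,
# upper-triangular H (standing in for the Schur factor of a small H).
A = rng.standard_normal((n, n))
A = A - (np.abs(np.linalg.eigvals(A)).max() + 1.0) * np.eye(n)
H = np.triu(rng.standard_normal((k, k))) - 3.0 * np.eye(k)
M = rng.standard_normal((n, k))

# Solve A Z + Z H + M = 0 column by column:
# column j satisfies (A + h_jj I) z_j = -m_j - sum_{i<j} z_i h_ij
Z = np.zeros((n, k))
for j in range(k):
    rhs = -M[:, j] - Z[:, :j] @ H[:j, j]
    Z[:, j] = np.linalg.solve(A + H[j, j] * np.eye(n), rhs)

residual = np.linalg.norm(A @ Z + Z @ H + M)
assert residual < 1e-8
```

Because A and H are both stable, every shifted matrix A + h_jj I is nonsingular, so the recursion never breaks down.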
Approximate balancing transformation

Let the SVD of the cross gramian be X = USV* = U1 S1 V1* + U2 S2 V2*, where S1 contains the significant singular values. It follows that U1 is an orthonormal basis for a subspace of the reachability subspace R(A, B), while V1 provides an orthonormal basis for a subspace of the observability subspace O(C, A). Of course, one cannot be certain that these two subspaces are the same as the ones that would have been achieved through computing the full gramians and then truncating. In addition to the fact that only one Sylvester equation need be solved, there is a clear advantage to working with the cross gramian instead of working with the two gramians related to reachability and observability: since two separate projections must be made in the latter case, there is a question of compatibility that arises when working with the pair of gramians.

Our goal in this section is to obtain an approximate balancing transformation through the eigenvectors of X1 = U1 S1 V1*. In the SISO case we know that X has real eigenvalues. If X1 were obtained as X1 = Z1 D1 W1*, where the diagonal elements of D1 are the eigenvalues of largest magnitude (i.e., the dominant Hankel singular values) and W1* Z1 = I, then Z1 would provide a balancing transformation. In X1 we have the best rank k approximation to X; but, as discussed previously, the eigen-decomposition is not directly available. Instead, we shall attempt to approximate the relevant eigenvector basis for X with an eigenvector basis for X1. It is easily seen that any eigenvector of X1 corresponding to a nonzero eigenvalue must be in the range of U1. Indeed,

   X1 U1 S1^{1/2} = U1 S1^{1/2} G, where G = S1^{1/2} V1* U1 S1^{1/2}.

If GZ = ZD1 with D1 real and diagonal, then taking Z1 = U1 S1^{1/2} Z D1^{-1/2} and W1 = V1 S1^{1/2} Z^{-*} D1^{-1/2} provides X1 = Z1 D1 W1* with W1* Z1 = I. Note also that

   X Z1 = X1 Z1 + (X - X1) Z1 = Z1 D1 + O(sigma_{k+1}),

and thus we have an approximate balancing transformation represented by this Z1. The resulting reduced model is

   A1 = W1* A Z1, B1 = W1* B, C1 = C Z1,

and this projected system is approximately balanced.

Remark 12.1. Stability of the reduced model. Our methods pertain to stable systems, and it is important in many applications that the reduced model be stable as well; that is, the eigenvalues of the reduced matrix A1 must lie in the left half-plane. The reduced model obtained through the algorithm shown in figure 12.1 is almost always stable in practice, but occasionally it might be unstable. One approach to achieving a stable reduced model is to apply the techniques developed in Grimme, Van Dooren, and Sorensen [152] to the projected quantities. That would amount to applying the implicit restarting mechanism to rid the projected matrix of unstable modes.

Extension of theory and computation to MIMO systems

The preceding discussion was primarily concerned with the SISO case. The crucial property of the three gramians in the SISO case is X^2 = PQ. It is easy to see that this relationship holds true for MIMO systems which are symmetric, i.e., whose transfer function is symmetric. In order to make use of this property in the large-scale setting, we propose to embed the given system into a symmetric system with more inputs and outputs but the same number of state variables; this is explored next. Given the m-input, p-output system Sigma = (A, B, C), we seek B~ in R^{n x p} and C~ in R^{m x n} such that the augmented system

   Sigma^ = [ A  B~  B ; C ; C~ ] = (A^, B^, C^) in R^{(n+m+p) x (n+p+m)},

with A^ = A, B^ = [B~ B], and C^ = [C ; C~], has a symmetric Hankel operator.
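The SISO identity X^2 = PQ is easy to verify numerically. For a diagonal stable A all three gramians have entrywise closed forms; the following is an illustrative sketch (data chosen arbitrarily, not taken from the book).

```python
import numpy as np

# Diagonal stable A: the gramians solve their equations entrywise,
# P_ij = b_i b_j / -(l_i + l_j), Q_ij = c_i c_j / -(l_i + l_j),
# and the cross gramian X_ij = -b_i c_j / (l_i + l_j).
l = np.array([-1.0, -2.0, -5.0])
b = np.array([1.0, 2.0, -1.0])
c = np.array([3.0, -1.0, 2.0])
A = np.diag(l)
denom = -(l[:, None] + l[None, :])
P = np.outer(b, b) / denom
Q = np.outer(c, c) / denom
X = np.outer(b, c) / denom

# Each gramian solves its defining equation ...
assert np.allclose(A @ P + P @ A.T + np.outer(b, b), 0)
assert np.allclose(A.T @ Q + Q @ A + np.outer(c, c), 0)
assert np.allclose(A @ X + X @ A + np.outer(b, c), 0)
# ... and the SISO cross-gramian identity holds:
assert np.allclose(X @ X, P @ Q)
```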
The augmented system Sigma^ is square, with m + p inputs and outputs, and has the property that its Hankel operator is symmetric: the Markov parameters C^ A^ell B^ are symmetric for all ell >= 0. That this can be done follows readily from properties of system realizations. The important aspect of this embedding is that the complexity (McMillan degree, i.e., the number of states) of the system has not increased. Therefore the norm (H2 or Hinf) of the original system is bounded from above by that of the augmented system.

MIMO systems and the symmetrizer

Let J = J* be a symmetrizer for A; the symmetrizer is any symmetric matrix which satisfies AJ = JA*. The following quantities are defined:

   B~ = JC* and C~ = B* J^{-1}  =>  B^ = [B~ B], C^ = [C ; C~],  so that B^ = J C^*.

A straightforward calculation shows that the gramians of the augmented and the original systems are related as follows:

   P^ = P + JQJ,  Q^ = Q + J^{-1} P J^{-1}.

Using the same tools as above we can show that X^ = J Q^ = P^ J^{-1}, whence X^2 = P^ Q^.

The choice of symmetrizer

At this stage we wish to compute an appropriate symmetrizer. The criterion which is chosen is that the Hankel operator of the augmented system be close to that of the original system. In order to address this issue we make use of the variational characterization of balancing as developed in a series of papers by Helmke and Moore [268, 244]. Let T be a basis change in the state space; then, as already mentioned, the gramians are transformed as follows: P -> T P T*, Q -> T^{-*} Q T^{-1}. Consider the following criterion:

   J(T) = trace(T P T*) + trace(T^{-*} Q T^{-1}).

For a fixed state space basis, the above quantity is equal to the sum of the eigenvalues of the reachability and of the observability gramians. The question which arises is to find the minimum of J(T) as a function of all nonsingular transformations T. First notice that J can be expressed in terms of the positive definite matrix Phi = T*T: J = trace(P Phi + Q Phi^{-1}), Phi = T*T > 0. The following result is due to Helmke and Moore.

Proposition 12.1. The minimum of J is J* = min_{Phi>0} J = 2 sum_{k=1}^{n} sigma_k, and the minimizer is Phi* = P^{-1/2} (P^{1/2} Q P^{1/2})^{1/2} P^{-1/2}.

It readily follows that with the eigenvalue decomposition P^{1/2} Q P^{1/2} = U Sigma^2 U*, a resulting balancing transformation is Tb = Sigma^{1/2} U* P^{-1/2}; in other words, Tb P Tb* = Tb^{-*} Q Tb^{-1} = Sigma. The transformation Tb is unique up to orthogonal similarity (P need not be diagonal). We should remark that the first part of the above proposition can be proved using elementary means as in section 7.

In our case, we wish to choose the symmetrizer J so that the Hankel operator changes as little as possible. For simplicity, let us assume that A is diagonalizable and has been transformed to the basis where A is diagonal. In this case the symmetrizer J is an arbitrary diagonal matrix, and the question arises of how to best choose its diagonal entries. The criterion (the sum of the traces of the two gramians) for the augmented system is as follows:

   J(J) = trace(P + JQJ) + trace(Q + J^{-1} P J^{-1}) = trace(P + Q) + trace(JQJ + J^{-1} P J^{-1}),

where the second summand is denoted J1(J).
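The gramian relations for this embedding can be checked numerically. With a diagonal A the Lyapunov equations have entrywise closed-form solutions; the data below are illustrative, so this is a sketch rather than the book's construction.

```python
import numpy as np

rng = np.random.default_rng(5)
n, m, p = 4, 2, 3
lam = np.array([1.0, 2.0, 3.0, 5.0])
A = np.diag(-lam)                         # stable diagonal A
B = rng.standard_normal((n, m))
C = rng.standard_normal((p, n))
J = np.diag(rng.uniform(0.5, 2.0, n))     # any diagonal J satisfies AJ = JA*
Ji = np.linalg.inv(J)

def lyap_diag(M):
    # closed-form solution X of A X + X A* + M = 0 for diagonal A
    return M / (lam[:, None] + lam[None, :])

Bh = np.hstack([J @ C.T, B])              # B^ = [B~, B],  B~ = J C*
Ch = np.vstack([C, B.T @ Ji])             # C^ = [C; C~],  C~ = B* J^{-1}
assert np.allclose(Bh, J @ Ch.T)          # B^ = J C^*: the embedding is symmetric

P,  Q  = lyap_diag(B @ B.T),   lyap_diag(C.T @ C)
Ph, Qh = lyap_diag(Bh @ Bh.T), lyap_diag(Ch.T @ Ch)
Xh     = lyap_diag(Bh @ Ch)               # cross gramian of the augmented system

assert np.allclose(Ph, P + J @ Q @ J)
assert np.allclose(Qh, Q + Ji @ P @ Ji)
assert np.allclose(Xh, J @ Qh) and np.allclose(Xh, Ph @ Ji)
assert np.allclose(Xh @ Xh, Ph @ Qh)      # X^2 = PQ holds for the embedded system
```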
We compute the diagonal J = diag(j1, ..., jn) so that the above trace is minimized. Let in particular

   A = diag(-lambda_1, ..., -lambda_n), B = [b1 ... bn]*, C = [c1 ... cn], where b_i in R^m, c_i in R^p.

In this representation the diagonal entries of the two gramians are

   p_ii = b_i* b_i / (lambda_i + lambda_i*),  q_ii = c_i* c_i / (lambda_i + lambda_i*).

The first summand trace(P + Q) does not depend on J. The second is

   J1(J) = sum_{i=1}^{n} ( q_ii j_i^2 + p_ii / j_i^2 ).

The minimum of J1 is achieved for j_i = (p_ii / q_ii)^{1/4}:

   min J1 = 2 sum_i sqrt(p_ii q_ii)  =>  min J = sum_i ( sqrt(p_ii) + sqrt(q_ii) )^2.

This should be compared with twice the sum of the traces of the two gramians, namely 2 sum_i (p_ii + q_ii); the difference of the two traces is sum_i ( sqrt(p_ii) - sqrt(q_ii) )^2. Furthermore, by applying a diagonal state space transformation T with entries t_ii = (q_ii / p_ii)^{1/4}, we can assure that p_ii = q_ii. Therefore, if A is diagonalizable, there exists a symmetrizer which guarantees that the sum of the singular values of the augmented system is twice that of the original. The above computation was carried through under the assumption that A is diagonal.

Computational efficiency

The method we have proposed appears to be computationally expensive: the proposed implicit restarting approach involves a great deal of work associated with solving the required special Sylvester equations. However, it seems to converge quite rapidly; we observe very rapid convergence in practice, and usually three major iterations are sufficient. Our test examples indicate that for medium scale problems (n approximately 400) it is already competitive with existing dense methods. Moreover, in many cases it can provide balanced realizations where most other methods fail. However, there is no proof of convergence, and this seems to be a general difficulty with projection approaches of this type (see, e.g., [176]). Numerical experiments with iterative methods for solving the Lyapunov equation can be found in [72]. Finally, we would also like to point (i) to a preconditioned subspace method for solving the Lyapunov equation [175], and (ii) to a recursive way of computing dominant singular subspaces [82].

12.4 Iterative Smith-type methods for model reduction

Let Sigma = (A, B, C, D) in R^{(n+p) x (n+m)} be the model to be reduced. Closely related to this system are the continuous-time Lyapunov equations (6.1) on page 145,

   A P + P A* + B B* = 0,  A* Q + Q A + C* C = 0,

whose solutions P, Q in R^{n x n} are called the reachability and observability gramians, respectively. Under the assumptions of stability, reachability, and observability, the above equations have unique symmetric positive definite solutions P, Q. The square roots of the eigenvalues of the product PQ are the so-called Hankel singular values sigma_i(Sigma) of the system, which are basis independent. Recall from section 7.3 on page 183 the factorizations P = UU* and Q = LL*, where U and L are the square roots of the gramians. Given the singular value decomposition (SVD) U*L = W Sigma V*, according to (7.16) the maps T1 and Ti1 satisfy T1 Ti1 = I_k, and hence Ti1 T1 is an oblique projector. As discussed in chapter 9, in many cases the eigenvalues of P, Q as well as the Hankel singular values sigma_i(Sigma) decay very rapidly. This low-rank phenomenon leads to the idea of approximating the gramians with low-rank solutions; it motivates the development of low-rank approximate solution schemes and leads to model reduction by truncation.
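The diagonal symmetrizer computation above reduces to n independent scalar minimizations and can be checked with plain arithmetic. The numbers below are illustrative, not from the book.

```python
import math

# Illustrative diagonal gramian entries p_ii, q_ii
p = [4.0, 1.0, 0.25]
q = [1.0, 9.0, 4.0]

def J1(j):
    # J1(J) = trace(J Q J) + trace(J^{-1} P J^{-1}) for diagonal J, P, Q
    return sum(qi * ji**2 + pi / ji**2 for pi, qi, ji in zip(p, q, j))

# optimal entries j_i = (p_ii / q_ii)^(1/4)
j_opt = [(pi / qi) ** 0.25 for pi, qi in zip(p, q)]
best = J1(j_opt)
assert abs(best - 2 * sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))) < 1e-12

# perturbing the optimal entries can only increase J1
worse = J1([1.1 * ji for ji in j_opt])
assert worse > best

# total criterion: trace(P+Q) + min J1 = sum_i (sqrt(p_ii) + sqrt(q_ii))^2
total = sum(p) + sum(q) + best
assert abs(total - sum((math.sqrt(pi) + math.sqrt(qi))**2
                       for pi, qi in zip(p, q))) < 1e-12
```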
A reduced model is obtained by letting A1 = T1 A Ti1, B1 = T1 B, C1 = C Ti1, where, according to (7.17), the reduced model Sigma_1 is balanced, asymptotically stable, and the Hinf norm of the error system is bounded from above by twice the sum of the neglected singular values.

As already mentioned, the above formulation for balanced truncation requires the knowledge of the square roots U and L. The Bartels-Stewart [43] method, as modified by Hammarling [164], is the standard direct method for the solution of Lyapunov equations of small to moderate size. Since this method requires the computation of a Schur decomposition, it is not appropriate for large-scale problems. Moreover, P and Q are often found to have numerically low rank compared to n. Next we briefly summarize some approaches which aim at obtaining low-rank approximants for the gramians P and Q; the development follows [157]. In section 12.4.1 we mention the Alternating Direction Implicit (ADI), Smith, and cyclic Smith(l) iterations, which lead to an approximate solution of P and hence have a storage requirement of O(n^2). Section 12.4.2 summarizes the low-rank versions of the ADI and cyclic Smith(l) iterations, which compute low-rank approximants to U and L and hence reduce the memory complexity to O(nk). Below we introduce and analyze a modified low-rank Smith method that essentially retains the convergence properties but overcomes the memory requirements. In the following discussion we focus on the approximate solution of a single Lyapunov equation A P + P A* + B B* = 0, where A in R^{n x n} is stable and diagonalizable and B in R^{n x m}; this applies equally to the computation of low-rank approximants to the observability gramian.

12.4.1 ADI, Smith, and cyclic Smith(l) iterations

In all of these methods the idea is to transform, using spectral mappings of the type omega(lambda) = (mu* - lambda)/(mu + lambda), a continuous-time Lyapunov equation into a discrete-time Lyapunov or Stein equation, for which the solution is obtained by an infinite summation.

The ADI iteration. The alternating direction implicit (ADI) iteration was first introduced by Peaceman and Rachford [263] for solving linear systems of equations arising from the discretization of elliptic boundary value problems. In general, the ADI iteration is applied to linear systems My = b, where M is symmetric positive definite and can be split into the sum of two symmetric positive definite matrices, M = M1 + M2, for which the following iteration is efficient:

   y_0 = 0, (M1 + mu_j I) y_{j-1/2} = b - (M2 - mu_j I) y_{j-1}, (M2 + eta_j I) y_j = b - (M1 - eta_j I) y_{j-1/2}, for j = 1, 2, ....

The ADI shift parameters mu_j and eta_j are determined from spectral bounds on M1 and M2 to increase the convergence rate. Application to the Lyapunov equation is obtained through the following iteration step:

   P_i = [(A - mu_i* I)(A + mu_i I)^{-1}] P_{i-1} [(A - mu_i* I)(A + mu_i I)^{-1}]* - 2 rho_i (A + mu_i I)^{-1} B B* (A + mu_i I)^{-*},

where P_0 = 0, rho_i = Re(mu_i), and the shift parameters {mu_1, mu_2, mu_3, ...} are elements of C^-. The spectral radius

   rho_ADI = rho( prod_{i=1}^{l} (A - mu_i* I)(A + mu_i I)^{-1} )

determines the rate of convergence, where l is the number of shifts used. The minimization of rho_ADI with respect to the shift parameters mu_i is called the ADI minimax problem:

   {mu_1, ..., mu_l} = arg min_{mu_i in C^-} max_{lambda in sigma(A)} | (lambda - mu_1*) ... (lambda - mu_l*) | / | (lambda + mu_1) ... (lambda + mu_l) |.   (12.3)

Smith's method uses a single shift parameter and hence is a special case of the ADI iteration; the Smith(l) method is also a special case of the ADI iteration, where l shifts are used in a cyclic manner.
We refer the reader to [103], [313], and [350] for contributions to the solution of the ADI minimax problem.

Smith's method. For every real scalar mu < 0, the continuous-time Lyapunov equation A P + P A* + B B* = 0 is equivalent to the Stein equation

   P - A_mu P A_mu* = B_mu B_mu*,   (12.4)

where A_mu = (A - mu I)(A + mu I)^{-1} and B_mu = sqrt(-2 mu) (A + mu I)^{-1} B; the parameter mu is the shift. Hence, using the bilinear transformation omega = (mu* - lambda)/(mu + lambda), the continuous-time problem has been transformed into a discrete-time one. Since A is stable, rho(A_mu) < 1, and the sequence {P_i^S}, generated by the iteration

   P_0^S = 0,  P_{i+1}^S = B_mu B_mu* + A_mu P_i^S A_mu*,

converges to the solution P. Thus the Smith iterates can be written as P_i^S = sum_{j=1}^{i} A_mu^{j-1} B_mu B_mu* (A_mu*)^{j-1}, and the solution of the Stein equation is P = sum_{j=0}^{infinity} A_mu^j B_mu B_mu* (A_mu*)^j. Notice that the Smith method is the same as the ADI method when all the shifts are equal. It can be shown that if A is diagonalizable, the l-th ADI iterate satisfies the inequality

   || P - P_l^A ||_F <= ||X||_2^2 ||X^{-1}||_2^2 rho_ADI^2 ||P||_F,

where X is the matrix of eigenvectors of A. Formulas for multiple shifts mu_j and for complex shifts are fairly straightforward to derive [157].

The Smith(l) iteration. Penzl has observed in [267] that in many cases the ADI method converges very slowly when only a single shift (Smith iteration) is used, and that a moderate increase of the number of shifts l accelerates the convergence; but he has also observed that the speed of convergence is hardly improved by a further increase of l. These observations lead to the idea of the cyclic Smith(l) iteration, which is a special case of ADI where l different shifts are used in a cyclic manner, i.e., mu_{i+jl} = mu_i for j = 1, 2, .... It is easy to show that the Smith(l) iterates are generated as follows:

   P_i^{Sl} = sum_{j=1}^{i} A_d^{j-1} T (A_d^{j-1})*, where A_d = prod_{i=1}^{l} (A - mu_i I)(A + mu_i I)^{-1} and T = P_l^A.

According to proposition 4.29 on page 76, the Stein equation P - A_d P A_d* = T is equivalent to an appropriately defined continuous-time Lyapunov equation. As in Smith's method, the storage requirement is O(n^2).

12.4.2 LR-ADI and LR-Smith(l) iterations

Since the ADI, Smith, and Smith(l) iterations outlined in section 12.4.1 compute the solution P explicitly, the storage requirement is O(n^2). For large n, and especially for slowly converging iterations, this can easily exceed manageable memory capacity. The remedy is the low-rank methods: instead of explicitly forming the solution P, they compute and store low-rank approximate square root factors, reducing the storage requirement to O(nr), where r is the numerical rank of P. One should notice that in many cases the storage requirement is the limiting factor rather than the amount of computation. The key idea in the low-rank versions of the Smith(l) and ADI methods is to write P_i^{Sl} = Z_i^{Sl} (Z_i^{Sl})* and P_i^A = Z_i^A (Z_i^A)*. The low-rank Smith method amounts to truncating the series at an appropriate step, while the modified low-rank Smith method updates a low-rank SVD approximation to P, updating and truncating the SVD as each term A_mu^j B_mu B_mu* (A_mu*)^j is added in.
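Smith's iteration is short enough to sketch directly. The following is an illustrative sketch (not from the book) with a single real shift mu = -1; the fixed point solves both the Stein equation and the original Lyapunov equation.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 6, 2
A = rng.standard_normal((n, n))
A = A - (np.abs(np.linalg.eigvals(A)).max() + 0.5) * np.eye(n)  # force stability
B = rng.standard_normal((n, m))

mu = -1.0                                  # single negative real shift
I = np.eye(n)
Am = (A - mu * I) @ np.linalg.inv(A + mu * I)         # A_mu, rho(A_mu) < 1
Bm = np.sqrt(-2 * mu) * np.linalg.solve(A + mu * I, B)  # B_mu

# Smith iteration: P_{i+1} = B_mu B_mu* + A_mu P_i A_mu*, P_0 = 0
P = np.zeros((n, n))
for _ in range(300):
    P = Bm @ Bm.T + Am @ P @ Am.T

# the limit solves the Stein equation and the Lyapunov equation
assert np.linalg.norm(P - Am @ P @ Am.T - Bm @ Bm.T) < 1e-10
assert np.linalg.norm(A @ P + P @ A.T + B @ B.T) < 1e-8
```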
The low-rank ADI (LR-ADI) iteration is based on the ADI step of section 12.4.1, which can be rewritten in terms of the factors Z_i^A as

   Z_1^A = sqrt(-2 mu_1) (A + mu_1 I)^{-1} B,
   Z_i^A = [ (A - mu_i* I)(A + mu_i I)^{-1} Z_{i-1}^A, sqrt(-2 mu_i) (A + mu_i I)^{-1} B ].   (12.5)

We note that at the i-th step, Z_i^A and Z_i^{Sl} have m*i and m*l*i columns, respectively. Hence, in case of large m and slow convergence, the number of columns of Z_k^A and Z_k^{Sl} easily reaches an unmanageable level of memory requirements. When the number of shift parameters is limited, i.e., rho(A_d) is close to 1, convergence is slow and these two methods fail to remain practical.

The cyclic low-rank Smith method (LR-Smith(l)) is a more efficient alternative to LR-ADI. The method consists of two steps. First, the iterate Z_1^{Sl} is obtained by an l-step low-rank ADI iteration, i.e., the LR-Smith(l) iteration is initialized by Z_1^{Sl} = B_d = Z_l^A. It then follows that the k-th step LR-Smith(l) iterate is given by

   Z_k^{Sl} = [ B_d, A_d B_d, A_d^2 B_d, ..., A_d^{k-1} B_d ].   (12.6)

One should notice that while a k-step LR-ADI iteration requires k matrix factorizations, a k-step LR-Smith(l) iteration computes only l matrix factorizations. Moreover, if the shifts {mu_1, ..., mu_l} are used in a cyclic manner, the cyclic LR-Smith(l) iteration is equivalent to the LR-ADI iteration.

Next we present some convergence results for the LR-Smith(l) iteration without proofs; the proofs of these results can be found in [157]. Let sigma_i and sigma^_i denote the Hankel singular values resulting from the full-rank exact gramians and the low-rank approximate gramians, respectively: sigma_i^2 = lambda_i(PQ) and sigma^_i^2 = lambda_i(P_k Q_k). Define E_kp = P - P_k, E_kq = Q - Q_k, and n^ = k l min(m, p).

Proposition 12.7. Let A = X Lambda X^{-1} be the eigenvalue decomposition of A. The k-step LR-Smith(l) iterates satisfy

   0 <= trace(E_kp) = trace(P - P_k) <= K m l (rho(A_d))^{2k} trace(P),
   0 <= trace(E_kq) = trace(Q - Q_k) <= K p l (rho(A_d))^{2k} trace(Q),   (12.7)

where m is the number of inputs, l is the number of cyclic shifts applied, and K = kappa(X)^2, kappa(X) denoting the condition number of X.

Combining the above relationships, the following holds true.

Corollary 12.8. Let sigma_i and sigma^_i be defined as above. Then

   0 <= sum_{i=1}^{n} sigma_i^2 - sum_{i=1}^{n^} sigma^_i^2
      <= K l (rho(A_d))^{2k} [ K min(m, p) (rho(A_d))^{2k} trace(P) trace(Q)
         + m trace(P) sum_{i=0}^{k-1} ||C_d A_d^i||_2^2 + p trace(Q) sum_{i=0}^{k-1} ||A_d^i B_d||_2^2 ],

where K is the square of the condition number of X.

However, the LR-Smith(l) iteration has the drawback that, when applied to the Lyapunov equation A P + P A* + B B* = 0, the number of columns of the approximate square root is increased by m*l at each step; when m is large and/or the convergence is slow, this causes storage problems. In section 12.4.3 we introduce a modified LR-Smith(l) iteration to overcome this problem and to retain the low-rank structure. The key idea of the proposed method is to compute the singular value decomposition of the iterate at each step and, given a tolerance tau, to replace the iterate with its best low-rank approximation; we do not recompute the SVD, but instead update it after each step to include the new information and then truncate given the tolerance tau.
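For a normal A (so that K = kappa(X)^2 = 1) and a single shift (l = 1), the first trace bound of the proposition can be checked directly. This is an illustrative sketch, not from the book.

```python
import numpy as np

lam = np.array([0.5, 1.0, 4.0, 10.0])
n, m, l = 4, 1, 1
A = -np.diag(lam)                 # normal A, so K = cond(X)^2 = 1
B = np.ones((n, m))
P = (B @ B.T) / (lam[:, None] + lam[None, :])   # exact gramian (diagonal A)

mu = -2.0
I = np.eye(n)
Ad = (A - mu * I) @ np.linalg.inv(A + mu * I)
Bd = np.sqrt(-2 * mu) * np.linalg.solve(A + mu * I, B)
rho = np.abs(np.linalg.eigvals(Ad)).max()

Z = Bd.copy()
for k in range(1, 8):
    gap = np.trace(P) - np.trace(Z @ Z.T)       # trace(P - P_k), P_k = Z Z*
    # 0 <= trace(P - P_k) <= K m l rho(A_d)^(2k) trace(P) with K = 1
    assert 0 <= gap <= m * l * rho ** (2 * k) * np.trace(P) + 1e-12
    Z = np.hstack([Z, Ad @ Z[:, -m:]])          # append the block A_d^k B_d
```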
12.4.3 The modified LR-Smith(l) iteration

The proposed algorithm. In what follows we propose a modified LR-Smith(l) iteration so that the number of columns in the low-rank square root factor does not increase unnecessarily at each step. As stated in section 12.4.2, let Z_k^{Sl} be the k-th LR-Smith(l) iterate as defined in (12.6), where A_d and B_d are defined in (12.5) and (12.6), respectively; the continuous-time Lyapunov equation is then equivalent to the Stein equation

   P - A_d P A_d* = B_d B_d*.   (12.8)

The approximate low-rank gramian at the k-th step is

   P_k^{Sl} = Z_k^{Sl} (Z_k^{Sl})* = sum_{i=0}^{k-1} A_d^i B_d B_d* (A_d^i)*.   (12.9)

Let the short singular value decomposition (SSVD, hereafter) of Z_k^{Sl} be Z_k^{Sl} = V Sigma W*, where V in R^{n x (mlk)}, Sigma in R^{(mlk) x (mlk)}, and W in R^{(mlk) x (mlk)}. Consequently, the SSVD of P_k^{Sl} is given by P_k^{Sl} = V Sigma^2 V*. We note that Z~_k = V Sigma is also a low-rank square root factor for P_k^{Sl}, and hence it is enough to store V and Sigma only.

Let tau > 0 be a prespecified tolerance value, and assume that until the k-th step of the algorithm all the iterates Z_i^{Sl} satisfy sigma_min(Z_i^{Sl}) / sigma_max(Z_i^{Sl}) > tau, for i = 1, ..., k. At the (k+1)-st step, the approximants Z_{k+1}^{Sl} and P_{k+1}^{Sl} are readily computed as

   Z_{k+1}^{Sl} = [ Z_k^{Sl}, A_d^k B_d ] and P_{k+1}^{Sl} = P_k^{Sl} + A_d^k B_d B_d* (A_d^k)*.

Define B(k) = A_d^k B_d and decompose it as B(k) = V Gamma + V^ Theta, where Gamma in R^{(mlk) x (ml)}, Theta in R^{(ml) x (ml)}, V* V^ = 0 and V^* V^ = I_{ml}. In view of this decomposition, define the matrix

   S^ = [ Sigma  Gamma ; 0  Theta ] in R^{(k+1)ml x (k+1)ml}.   (12.10)

It readily follows that Z~_{k+1} = [V V^] S^. Since S^ is only of size (k+1)ml, taking its SVD S^ = T Sigma^ Y* is inexpensive. Setting V~ = [V V^] T, partition Sigma^ = diag(Sigma^_1, Sigma^_2) and V~ = [V~_1 V~_2] conformally, where Sigma^_2(1,1) / Sigma^_1(1,1) < tau. The (k+1)-st low-rank square root factor is then approximated by

   Z~_{k+1} approx V~_1 Sigma^_1,   (12.11)

and the (k+1)-st step low-rank gramian is computed as P~_{k+1} = Z~_{k+1} (Z~_{k+1})*. In other words, we simply ignore the singular values which are less than the given tolerance. Hence, from the k-th to the (k+1)-st step, the number of columns of Z~_{k+1} generally does not increase; an increase occurs only if more than r singular values of Z~_{k+1} fall above the tolerance tau sigma_1. In any case, at most ml additional columns are added at any step, which is the same as in the original LR-Smith(l) iteration discussed in section 12.4.2.

Convergence properties of the modified LR-Smith(l) iteration. We now present convergence results for the k-step approximate solutions P~_k and Q~_k, which are, respectively, the modified LR-Smith(l) solutions to the two Lyapunov equations

   A P + P A* + B B* = 0,  A* Q + Q A + C* C = 0,

where A in R^{n x n}, B in R^{n x m}, and C in R^{p x n}.
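The update-and-truncate idea can be sketched as follows. This is an illustrative simplification, not the book's algorithm: it recomputes a small SVD of the augmented factor at each step rather than performing the cheaper S^-update described above.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 30, 1
A = -np.diag(np.linspace(0.5, 50.0, n))        # stable, diagonalizable A
B = rng.standard_normal((n, m))

mu = -3.0                                      # one cyclic shift, l = 1
I = np.eye(n)
Ad = (A - mu * I) @ np.linalg.inv(A + mu * I)
Bd = np.sqrt(-2 * mu) * np.linalg.solve(A + mu * I, B)

tau = 1e-8                 # relative truncation tolerance
Z = Bd.copy()              # low-rank square-root factor, P ~ Z Z*
term = Bd.copy()
for _ in range(150):
    term = Ad @ term                           # next block A_d^k B_d
    U, s, _ = np.linalg.svd(np.hstack([Z, term]), full_matrices=False)
    r = int(np.sum(s > tau * s[0]))            # drop singular values below tau*sigma_1
    Z = U[:, :r] * s[:r]                       # truncated factor V~_1 Sigma^_1

P = Z @ Z.T
assert Z.shape[1] < 151 * m                    # far fewer columns than the plain factor
assert np.linalg.norm(A @ P + P @ A.T + B @ B.T) < 1e-5
```

Because the singular values of the factor decay rapidly, the column count of Z stalls at the numerical rank of P instead of growing by ml per step.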
Let Z~_k be the k-th step modified LR-Smith(l) iterate corresponding to (12.8), and let P~_k = Z~_k (Z~_k)*. Define Delta_kp = P_k^{Sl} - P~_k. We now introduce the set

   I_P = { i : some columns have been eliminated from the i-th approximant, i.e., Sigma^_2 is nonzero in (12.10) };

for each i in I_P we denote by n_i^P the number of neglected singular values. I_Q and n_i^Q are defined similarly.

Proposition 12.9. P~_k and Delta_kp satisfy

   0 <= trace(Delta_kp) = trace(P_k^{Sl} - P~_k) <= tau^2 sum_{i in I_P} n_i^P (sigma_max(Z~_i))^2.   (12.12)

Proposition 12.10. Let A = X Lambda X^{-1} be the eigenvalue decomposition of A, and define E_kp = P - P~_k. Then the k-step modified LR-Smith(l) iterates satisfy

   0 <= trace(E_kp) = trace(P - P~_k) <= K m l (rho(A_d))^{2k} trace(P) + tau^2 sum_{i in I_P} n_i^P (sigma_max(Z~_i))^2,   (12.13)

where tau is the tolerance value of the modified LR-Smith(l) algorithm and K = kappa(X)^2 is the square of the condition number of X.

We note that the bounds for the traces of the errors in proposition 12.7 and proposition 12.10 differ only by the summation of terms which are O(tau^2). The next result introduces a convergence result for the computed Hankel singular values similar to the one in corollary 12.8.

Corollary 12.11. Let sigma_i and sigma~_i denote the Hankel singular values resulting from the full-rank exact gramians P, Q and from the modified LR-Smith(l) approximants P~_k, Q~_k, respectively: sigma_i^2 = lambda_i(PQ) and sigma~_i^2 = lambda_i(P~_k Q~_k). Define n^ = k l min(m, p), and let Z~_i and Y~_i be the modified LR-Smith(l) iterates for the two Lyapunov equations, with tolerances tau_P and tau_Q. Then

   0 <= sum_{i=1}^{n} sigma_i^2 - sum_{i=1}^{n^} sigma~_i^2
      <= K l (rho(A_d))^{2k} [ K min(m, p) (rho(A_d))^{2k} trace(P) trace(Q)
         + m trace(P) sum_{i=0}^{k-1} ||C_d A_d^i||_2^2 + p trace(Q) sum_{i=0}^{k-1} ||A_d^i B_d||_2^2 ]
         + tau_P^2 ||Q_k^{Sl}|| sum_{i in I_P} n_i^P (sigma_max(Z~_i))^2
         + tau_Q^2 ||P_k^{Sl}|| sum_{i in I_Q} n_i^Q (sigma_max(Y~_i))^2
         + tau_P^2 tau_Q^2 ( sum_{i in I_P} n_i^P (sigma_max(Z~_i))^2 ) ( sum_{i in I_Q} n_i^Q (sigma_max(Y~_i))^2 ),

where tau_P, tau_Q are the given tolerance values. Once again, the bounds for the error in the computed Hankel singular values in corollary 12.8 and corollary 12.11 differ only by the summation of terms which are O(tau_P^2), O(tau_Q^2), and O(tau_P^2 tau_Q^2). For the proofs we refer the reader to [157].
12.4.4 A discussion on the approximately balanced reduced system

Let Sigma_k = (A_k, B_k, C_k, D) = (W_k* A V_k, W_k* B, C V_k, D) be the k-th order reduced system obtained by exact balanced truncation. Similarly, let Sigma^_k = (A^_k, B^_k, C^_k, D^_k) = (W^_k* A V^_k, W^_k* B, C V^_k, D) be the k-th order reduced system obtained by approximate balanced truncation, where the approximate low-rank square roots Z_r^{Sl} and Y_r^{Sl} are used instead of the exact square roots in computing W^_k and V^_k. Next we examine how close Sigma_k is to Sigma^_k. Towards this goal we assume that V^_k, W^_k are close to V_k, W_k: define Delta_V = V_k - V^_k and Delta_W = W_k - W^_k, and let ||Delta_V|| <= tau and ||Delta_W|| <= tau, where tau is a small number. It can be shown that Delta_A = A_k - A^_k, Delta_B = B_k - B^_k, and Delta_C = C_k - C^_k satisfy

   ||Delta_A|| <= tau ||A|| ( ||W_k|| + ||V_k|| ) + tau^2 ||A||,  ||Delta_B|| <= tau ||B||,  ||Delta_C|| <= tau ||C||;

that is, the smaller tau, i.e., the closer V^_k, W^_k are to V_k, W_k, the closer Sigma^_k will be to Sigma_k. Stability of the reduced system is not always guaranteed, but if the approximate singular values are close enough to the original ones, then it is. The following equation is easily derived:

   A^_k S^_k + S^_k A^_k* + B^_k B^_k* = W^_k* ( A Delta + Delta A* ) W^_k,

where Delta = P - P_r^{Sl} is the error in the low-rank gramian and S^_k is the diagonal matrix with the approximate Hankel singular values as diagonal entries.

12.4.5 Relation of the Smith method to the trapezoidal rule

There is a direct relation, due to Sorensen, between a Smith method and the well-known trapezoidal rule (Crank-Nicolson method) for integrating the differential equation x' = Ax + Bu. We consider the system x' = Ax + Bu, y = Cx, and recall that for u(tau) = delta(tau) and x(0) = 0, x(t) = e^{At} B. Thus the solution of the Lyapunov equation A P + P A* + B B* = 0 is

   P = int_0^infinity x(tau) x(tau)* d tau = int_0^infinity e^{A tau} B B* e^{A* tau} d tau.

It is interesting to note that quadrature rules have been proposed previously [154] in this context for directly approximating this integral formula. For each interval [jh, (j+1)h), j = 0, 1, 2, ...,

   x((j+1)h) - x(jh) = int_{jh}^{(j+1)h} x'(tau) d tau = int_{jh}^{(j+1)h} A x(tau) d tau + B int_{jh}^{(j+1)h} delta(tau) d tau.

We consider approximations x_j to x(jh) computed by the trapezoidal rule. A straightforward application of the trapezoidal rule, together with the conditions x(0) = 0 and int_0^h delta(tau) d tau = 1, gives

   x_1 = (I - (h/2) A)^{-1} B, and x_{j+1} = (I - (h/2) A)^{-1} (I + (h/2) A) x_j for the remaining intervals.

For any positive step size h, putting mu = 2/h gives

   P = int_0^infinity x(tau) x(tau)* d tau = h sum_{j=0}^{infinity} x_{j+1} x_{j+1}* = 2 mu sum_{j=0}^{infinity} A_mu^j B_mu B_mu* (A_mu*)^j,

since h mu^2 = 2 mu, where A_mu = (A - mu I)^{-1} (A + mu I) and B_mu = (A - mu I)^{-1} B. This establishes the direct connection between the low-rank Smith method (in its simplest form) and the technique of constructing an approximate trajectory {x_j} via the trapezoidal rule and then approximating P directly by the simplest quadrature rule, P = h sum_{j=0}^{infinity} x_{j+1} x_{j+1}*. The result is easily generalized to multiple inputs by linearity and superposition.
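This connection is easy to confirm numerically: accumulating h x_{j+1} x_{j+1}* over the Crank-Nicolson impulse-response samples reproduces the reachability gramian. The data below are illustrative (a sketch, not from the book).

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
A = rng.standard_normal((n, n))
A = A - (np.abs(np.linalg.eigvals(A)).max() + 0.5) * np.eye(n)  # stable A
B = rng.standard_normal((n, 1))

h = 0.05                               # step size; mu = 2/h
I = np.eye(n)
E = np.linalg.inv(I - (h / 2) * A)

# Crank-Nicolson samples of the impulse response, x_0 = 0
x = E @ B                              # x_1 = (I - hA/2)^{-1} B
P = np.zeros((n, n))
for _ in range(5000):
    P += h * (x @ x.T)                 # P = h * sum_j x_{j+1} x_{j+1}*
    x = E @ (I + (h / 2) * A) @ x

# the quadrature sum is a Smith series, so it solves the Lyapunov equation
assert np.linalg.norm(A @ P + P @ A.T + B @ B.T) < 1e-8
```

Note that the residual is limited only by the truncation of the infinite sum, not by the step size h: the bilinear (Cayley) transformation makes the discrete sum exact.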
12.5 Chapter summary

The two basic families of approximation methods are the SVD-based and the Krylov-based ones. The former preserve important properties like stability and have a global error bound; in return, they can be applied to systems of low to moderate dimension. The latter are moment matching methods which have iterative implementations; therefore they can be applied to large-scale systems. But since moment matching methods are local in frequency, stability is not guaranteed. As mentioned, each family has its own set of advantages and disadvantages. In the preceding chapter we established connections between these two families of approximation methods, leading to approximation methods which combine the best attributes of SVD-based and Krylov-based methods. First, it was shown that reachability and generalized reachability matrices can be obtained as solutions of Sylvester equations, which are a general form of Lyapunov equations; the same holds for observability and generalized observability matrices. Second, it was argued that by appropriate choice of weightings, weighted balanced truncation methods can be reduced to Krylov methods. Third, the least squares approximation method was introduced and was shown to combine some of the attributes of each of the basic methods. Finally, two iterative methods for solving Lyapunov equations have been presented.
Chapter 13

Case studies

The results presented in earlier chapters will now be illustrated by means of several case studies. The first section deals with the approximation of three images, which can be considered as static systems. The next three sections deal with dynamical systems. Section 13.2 examines the reduction of five systems of relatively low order (n = 48 to n = 348); the reduced models obtained using a variety of methods are compared with each other. Section 13.3 investigates the reduction of models of two modules of the ISS (International Space Station), namely module 1R with n = 270 states and module 12A with n = 1412 states; a variety of model reduction methods are compared with the Modal Gain Factor method, which is a modified modal approximation method commonly used for the reduction of structural systems. Finally, in section 13.4, iterative Smith-type methods are applied to the CD player model used in section 13.2, and to three further models with dimension n = 1006, n = 3468, and n = 20736 states.

13.1 Approximation of images

Here we apply the SVD (Singular Value Decomposition), discussed in section 3.2, page 30, to the approximation of three black-and-white images: a clown from MATLAB, and the pictures of galaxies NGC6782 and NGC6822 taken by the Hubble telescope. A picture can be described as a rectangular matrix X ∈ R^{n×m}, obtained by a two-dimensional sampling and quantization of the intensity at each of the pixels. Coding of the picture consists in reducing the number of bits which are necessary for each pixel, without however compromising the quality of the picture reconstructed from the coded data.

The compression is as follows. The original picture of size n × m requires storage of size n · m; the SVD-compressed picture of rank r requires storage of size (n + m + 1) · r. Thus the compression ratio in storage is (n + m + 1) · r / (n · m), which for m ≈ n becomes approximately 2r/n. If k bits of information are required per pixel, the reduction ratio is still the same. The SDD (semi-discrete decomposition), which is an approximate SVD method discussed briefly in section 3.2.6 on page 38, achieves a compression ratio of r · (3n + 3m + k) / (n · m · k).

13.2 Model reduction algorithms applied to low-order systems

In this section we apply both SVD-based and Krylov-based model reduction algorithms to five different dynamical systems: a building model, a heat transfer model, a model of a CD player, a model of a clamped beam, and a low-pass Butterworth filter. The order k of the reduced models is determined by a given tolerance: given the Hankel singular values σ_i, i = 1, ..., n, of each system, k is the smallest positive integer such that σ_{k+1} < τ σ_1, where here τ = 1 × 10^{-3}.
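The truncated-SVD compression and the tolerance-based choice of rank can be sketched as follows (a minimal illustration of ours; a synthetic low-rank matrix stands in for the clown or galaxy images):

```python
import numpy as np

def svd_compress(X, tol):
    """Keep the smallest rank r with sigma_{r+1} < tol * sigma_1,
    i.e. r = number of singular values >= tol * sigma_1."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    r = int(np.sum(s >= tol * s[0]))
    Xr = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]
    n, m = X.shape
    ratio = (n + m + 1) * r / (n * m)   # storage ratio from the text
    return Xr, r, ratio

# synthetic stand-in for an image: a rank-5 matrix plus tiny noise
rng = np.random.default_rng(0)
n, m = 200, 300
X = rng.standard_normal((n, 5)) @ rng.standard_normal((5, m))
X = X + 1e-6 * rng.standard_normal((n, m))
Xr, r, ratio = svd_compress(X, tol=1e-3)
```

With these dimensions the storage ratio is (n + m + 1) · r / (n · m), i.e. roughly 2r/n, consistent with the estimate in the text.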
The normalized singular values of the clown image and of the galaxies NGC6822 and NGC6782 are shown in figure 13.1; the original images together with their approximants are shown in figure 13.2.

[Figure 13.1. Normalized singular values of the clown image and of the two Hubble images.]

[Figure 13.2. The original images together with the 10% (tol = 1/10) and 5% (tol = 5/100) approximants.]
The table that follows shows the order n of each original system, the number of inputs m and outputs p, as well as the order k of the reduced system.

                         n     m    p    k
  Building model         48    1    1    31
  Heat model             197   2    2    5
  CD player model        120   1    1    12
  Clamped beam model     348   1    1    13
  Butterworth filter     100   1    1    35

The Hankel singular values of each model are depicted in figure 13.3, left pane; for comparison, the largest Hankel singular value of each system is normalized to 1. The right pane of figure 13.3 shows the relative degree reduction k/n versus the error tolerance τ = σ_k/σ_1. For any fixed tolerance, this shows how much the order can be reduced: the lower the curve, the easier it is to approximate the system. It thus follows that, for any fixed tolerance smaller than 1.0 × 10^{-1}, the building model is the hardest to approximate. It should be stressed that the tolerance is a user-specified quantity that determines the trade-off between accuracy and complexity. Once specified, it completely determines the reduced-order models for SVD-based methods; the eigenvalues (poles) of the reduced system, for instance, are placed automatically. On the other hand, to apply the Krylov-based methods one has to choose the interpolation points and their multiplicities.

[Figure 13.3. Left pane: normalized Hankel singular values of the five systems. Right pane: relative degree reduction k/n vs error tolerance σ_k/σ_1.]

The model reduction approaches used are three SVD-based methods, namely balanced truncation, optimal Hankel norm approximation, and singular perturbation approximation; three Krylov-based methods, namely the Arnoldi procedure, the Lanczos procedure, and the rational Krylov procedure; as well as two SVD-Krylov-based methods, namely the least squares method described in section 12.3, and the iterative method which will be referred to as approximate balanced reduction. Since the approximants obtained by balanced reduction and approximate balanced reduction are (almost) indistinguishable for all but the heat model, we plot and tabulate results for one of these approximants only. In the subsections that follow, each system is briefly described, and the amplitude Bode plots (more precisely, the largest singular value of the frequency response) of the full and reduced-order models, as well as of the corresponding error systems, are plotted. Moreover, the relative H∞ and H2 norms of the error systems are tabulated.

13.2.1 Building model

The full-order model is that of a building (Los Angeles University Hospital) with 8 floors, each of which has 3 degrees of freedom, namely displacements in the x and y directions, and rotation.
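For reference, the first of the SVD-based methods, balanced truncation, can be realized by the standard square-root algorithm; the sketch below (ours, on a small stable SISO example rather than any of the five case-study models) also checks the global error bound against a frequency-response sample:

```python
import numpy as np
from scipy import linalg

def balanced_truncation(A, B, C, k):
    """Square-root balanced truncation to order k (illustrative sketch)."""
    P = linalg.solve_continuous_lyapunov(A, -B @ B.T)     # reachability gramian
    Q = linalg.solve_continuous_lyapunov(A.T, -C.T @ C)   # observability gramian
    S = np.linalg.cholesky(P)                             # P = S S^T
    R = np.linalg.cholesky(Q)                             # Q = R R^T
    U, hsv, Vt = np.linalg.svd(R.T @ S)                   # Hankel singular values
    S1 = np.diag(hsv[:k] ** -0.5)
    W = R @ U[:, :k] @ S1                                 # oblique projection, W^T V = I
    V = S @ Vt[:k, :].T @ S1
    return W.T @ A @ V, W.T @ B, C @ V, hsv

# small SISO example; distinct stable poles => gramians positive definite
A = np.diag([-1.0, -2.0, -3.0, -4.0])
B = np.ones((4, 1)); C = np.ones((1, 4))
Ak, Bk, Ck, hsv = balanced_truncation(A, B, C, k=2)

# sampled error vs the global bound ||G - G_k||_Hinf <= 2*(sigma_3 + sigma_4)
w = np.logspace(-2, 2, 400)
G  = np.array([(C  @ np.linalg.solve(1j*x*np.eye(4) - A,  B )).item() for x in w])
Gk = np.array([(Ck @ np.linalg.solve(1j*x*np.eye(2) - Ak, Bk)).item() for x in w])
err = np.max(np.abs(G - Gk))
```

The sampled error `err` lower-bounds the true H-infinity error, which in turn is bounded by twice the sum of the neglected Hankel singular values; the reduced system inherits stability.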
The 24 resulting variables q_i, i = 1, ..., 24, and their derivatives \(\dot{q}_i\) satisfy a vector second-order differential equation of the form

\[ M \ddot{\mathbf{q}}(t) + D \dot{\mathbf{q}}(t) + K \mathbf{q}(t) = \mathbf{v}\, u(t), \]

where u(t) is the input. This equation can be written in state-space form by defining the state \(\mathbf{x}^* = [\, \mathbf{q}^* \; \dot{\mathbf{q}}^* \,]^* \in \mathbb{R}^{48}\):

\[ \dot{\mathbf{x}}(t) = \begin{bmatrix} 0 & I \\ -M^{-1}K & -M^{-1}D \end{bmatrix} \mathbf{x}(t) + \begin{bmatrix} 0 \\ M^{-1}\mathbf{v} \end{bmatrix} u(t). \]

We will assume that the input affects the first coordinate q_1(t), that is \(\mathbf{v} = [1\; 0\; \cdots\; 0]^*\), and that the output is \(y(t) = \dot{q}_1(t) = x_{25}(t)\). The state-space model thus has order 48 and is single input single output. We approximate the system with a model of order 31. For this example, the pole closest to the imaginary axis has real part equal to −2.62 × 10^{-1}.

The largest singular value σ_max of the frequency response of the reduced-order systems and of the error systems are shown in figure 13.4, upper and lower panes respectively. Since the expansion of the transfer function H(s) around ∞ results in unstable reduced systems for Arnoldi and Lanczos, the shifted versions of these two methods with s_0 = 1 were used instead. The same holds for the rational Krylov method, since the interpolation points were chosen in the 1−100 rad/sec region.

When compared to the SVD-based methods, the moment-matching methods are better for low frequencies, as Arnoldi and Lanczos yield good approximants for the low-frequency range; the effect of choosing s_0 as a low-frequency point is observed in the right panes of the figure. When we consider the whole frequency range, however, the SVD-based methods are superior. Among the SVD-based methods, balancing and singular perturbation are closer to the original model for low and high frequencies, respectively; but in terms of the relative H∞ error norm, Hankel norm approximation is the best.

[Figure 13.4. Building model. Upper panes: Bode plot of original and reduced systems. Lower panes: error Bode plots.]
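The conversion from second-order to first-order form used above can be sketched as follows (a toy 3-degree-of-freedom stand-in with proportional damping; M, D, K here are not the actual building matrices):

```python
import numpy as np

def second_order_to_ss(M, D, K, v):
    """State space for M q'' + D q' + K q = v u, with state x = [q; q']."""
    n = M.shape[0]
    Minv = np.linalg.inv(M)
    A = np.block([[np.zeros((n, n)), np.eye(n)],
                  [-Minv @ K,        -Minv @ D]])
    B = np.vstack([np.zeros((n, 1)), Minv @ v.reshape(n, 1)])
    return A, B

n = 3
M = np.eye(n)
K = np.diag([4.0, 5.0, 6.0])      # stiffness
D = 0.1 * K                       # proportional (Rayleigh-type) damping
v = np.zeros((n, 1)); v[0] = 1.0  # force enters at the first coordinate
A, B = second_order_to_ss(M, D, K, v)
```

Positive-definite M, K and positive-semidefinite damping make the first-order realization stable, mirroring the structure of the building model.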
[Table: relative H∞ and H2 norms of the error systems for the building model (balanced, Hankel-norm, singular perturbation, rational Krylov, Lanczos, Arnoldi, and least-squares approximants).]

13.2.2 Heat diffusion model

The original system is a plate with two heat sources and temperature measurements at two locations; it is described by the heat equation. A model of order 197, with two inputs and two outputs, is obtained by spatial discretization. We approximate it with models of order 5. It is observed from figure 13.3 that this system is easy to approximate, since its Hankel singular values decay rapidly. Indeed, due to the low tolerance value, all methods generate satisfactory approximants matching the full-order model throughout the frequency range (see figure 13.5); only the rational Krylov method is inaccurate around the frequency 1 rad/sec. Lanczos and Arnoldi have not been used in this case, due to the non-automated choice of interpolation points.

[Table: relative H∞ and H2 norms of the error systems for the heat diffusion model (balanced, Hankel-norm, singular perturbation, and rational Krylov approximants).]

[Figure 13.5. Left pane: Bode plot of original and reduced systems for the heat diffusion model. Right pane: error Bode plots.]

13.2.3 The CD player

This model describes the dynamics between the lens actuator and the radial arm position of a portable CD player. The model has 120 states, a single input, and a single output; we approximate it with models of order 12. The first Markov parameter of the system is zero; hence, instead of expanding the transfer function around ∞, we expand it around s_0 = 200, which also overcomes the breakdown in the Lanczos procedure.
We also use rational Arnoldi with s_0 = 200. Figure 13.6, left pane, shows the largest singular values of the frequency response of the reduced-order models together with that of the full-order model. It should be noticed that only the rational Krylov method approximates well the peaks in the frequency range 10^4 − 10^5 rad/sec; this is due to the choice of interpolation points in this frequency region. The largest singular values of the frequency response of the error systems, figure 13.6, right pane, reveal that the SVD-based methods are better when we consider the whole frequency range. Among the SVD-based ones, Hankel norm approximation is the worst for both low and high frequencies. Despite doing a good job at low and high frequencies, rational Krylov has the highest relative H∞ and H2 error norms, as listed in the table below; notice, however, that rational Krylov is better than Arnoldi and Lanczos except for the frequency range 10^2 − 10^3 rad/sec. As expected, Arnoldi and Lanczos result in high relative errors, since they are local in nature.

[Figure 13.6. Upper panes: Bode plot of original and reduced models for the CD player model. Lower panes: error Bode plots.]

[Table: relative H∞ and H2 norms of the error systems for the CD player model (balanced, approximate balanced, Hankel-norm, singular perturbation, rational Krylov, Arnoldi, Lanczos, and least-squares approximants).]
13.2.4 Clamped beam model

This is the model of a clamped beam with proportional (Rayleigh) damping, obtained by spatial discretization of an appropriate partial differential equation. It has 348 states and is single input single output: the input is the force applied to the free end of the beam, while the output is the resulting displacement. We approximate the system with models of order 13. In this case, the real part of the pole closest to the imaginary axis is −5.05 × 10^{-3}.

Since the first Markov parameter is zero, rational Lanczos with s_0 = 1 is used to circumvent the breakdown of Lanczos; we also use rational Arnoldi with the same shift, namely s_0 = 1. For rational Krylov the interpolation points s_0 = 1 and s_0 = 3 were used; the ensuing approximant is one of the best. The plots of the largest singular value of the frequency response of the approximants and of the error systems are shown in figure 13.7, left and right panes respectively. The Lanczos and Arnoldi procedures lead to very good approximants for the frequency range 0 − 1 rad/sec, due to the choice of the interpolation point. Balanced model reduction is the best among the SVD methods after 1 rad/sec, but the differences are not as pronounced as for the previous examples.

[Figure 13.7. σ_max of the frequency response of the reduced systems (upper plot) and of the error systems (lower plot) of the clamped beam.]
In terms of error norms, the SVD-based methods are better than the moment-matching methods.
[Table: relative H∞ and H2 norms of the error systems for the clamped beam model (balanced, Hankel-norm, singular perturbation, rational Krylov, Arnoldi, and Lanczos approximants).]

13.2.5 Low-pass Butterworth filter

The full-order model in this case is a low-pass Butterworth filter of order 100 with cutoff frequency 1 rad/sec and gain 1 in the passband. The normalized Hankel singular values of this system are shown in figure 13.3. It should be noticed that the first 25 Hankel singular values are (almost) equal, and consequently this system cannot be reduced to order less than 25 using SVD-based methods. Given the tolerance τ = 1 × 10^{-3}, the order of the approximants is 35.

Since the transfer function in this case has no finite zeros, the rational Arnoldi and Lanczos procedures are used with interpolation point s_0 = 0.1. Rational Krylov with interpolation points s_1 = 1 × 10^{-5}, s_2 = 0.1, s_3 = 0.15, and s_4 = 0.18 was also used.

As figure 13.8, left pane, shows, the SVD-based methods produce good approximants for the whole frequency range. On the other hand, the approximation around the cutoff frequency provided by the moment-matching methods is not good. Among the SVD-based methods, Hankel norm approximation is the best in terms of the H∞ norm, while balanced reduction is best in terms of the H2 norm of the error system.

[Figure 13.8. Upper panes: Bode plot of original and reduced systems for the Butterworth filter. Lower panes: error Bode plots.]
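The plateau in the leading Hankel singular values can be reproduced numerically; the sketch below (ours) uses an order-10 Butterworth filter instead of order 100, since the naive transfer-function-to-state-space conversion is numerically ill-conditioned at order 100:

```python
import numpy as np
from scipy import linalg, signal

# analog low-pass Butterworth: cutoff 1 rad/s, unit passband gain
b, a = signal.butter(10, 1.0, analog=True)
A, B, C, D = signal.tf2ss(b, a)

# gramians and Hankel singular values
P = linalg.solve_continuous_lyapunov(A, -B @ B.T)
Q = linalg.solve_continuous_lyapunov(A.T, -C.T @ C)
ev = np.linalg.eigvals(P @ Q).real
hsv = np.sort(np.sqrt(np.maximum(ev, 0.0)))[::-1]  # clip rounding noise

# tolerance-based order selection, as in the text
tau = 1e-3
k = int(np.sum(hsv >= tau * hsv[0]))
```

Since the Hankel norm is bounded by the H-infinity norm, the largest Hankel singular value cannot exceed the unit passband gain; the leading values cluster near each other, which is what limits how far SVD-based methods can reduce this model.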
[Table: relative H∞ and H2 norms of the error systems for the Butterworth filter (balanced, approximate balanced, Hankel-norm, singular perturbation, rational Krylov, Arnoldi, and Lanczos approximants).]

Conclusions. These algorithms have been applied to five dynamical systems. The results show that balanced reduction and approximate balanced reduction are the best when we consider the whole frequency range. Among the SVD-based methods, approximate balancing has the advantage that it computes a reduced system iteratively, avoiding the need to obtain a balanced realization of the full-order system first; hence the computational cost and storage requirements are reduced. Hankel norm approximation has the lowest H∞ error norm in most cases, but for low frequencies it gives the worst approximation. The Krylov-based methods require the selection of interpolation points; this selection, however, is not an automated process and has to be specified by the user. In return, they reduce the computational cost and storage requirements.

13.3 Approximation of the ISS 1R and 12A flex models

The model reduction problem for two structural systems will be examined next. The structural flexibility during assembly was illustrated in chapter 2; the figure there shows particular stages in the space station assembly sequence, together with a frequency response plot from thruster commands (roll, ...). The integrated control system flex-structure assessment and flight readiness certification process must identify the potential for dynamic interaction between the flexible structure and the control systems, i.e., perform controller flex-structure dynamic interaction assessment. For this purpose it is assumed that the flex model is in modal form, that