Thomas Finley, tomf@cs.cornell.edu

Linear Algebra

A subspace is a set S ⊆ R^n such that 0 ∈ S and for all x, y ∈ S and α, β ∈ R, αx + βy ∈ S.
The span of {v1, ..., vk} is the set of all vectors in R^n that are linear combinations of v1, ..., vk.
A basis B of a subspace S, B = {v1, ..., vk} ⊂ S, has Span(B) = S with all vi linearly independent. The dimension of S is |B| for a basis B of S. For subspaces S, T with S ⊆ T, dim(S) ≤ dim(T), and further if dim(S) = dim(T), then S = T.
A linear transformation T : R^n → R^m has, for all x, y ∈ R^n and α, β ∈ R, T(αx + βy) = αT(x) + βT(y). There exists A ∈ R^{m×n} such that T(x) ≡ Ax for all x.
For two linear transformations T : R^n → R^m and S : R^m → R^p, S ∘ T ≡ S(T(x)) is a linear transformation; (T(x) ≡ Ax) ∧ (S(y) ≡ By) ⇒ (S ∘ T)(x) ≡ BAx.
A matrix's row space is the span of its rows, its column space or range is the span of its columns, and its rank is the dimension of either of these spaces. For A ∈ R^{m×n}, rank(A) ≤ min(m, n); A has full row (or column) rank if rank(A) = m (or n).
A diagonal matrix D ∈ R^{n×n} has d_{j,k} = 0 for j ≠ k. The identity matrix I has i_{j,j} = 1 and zeros elsewhere.
The upper (or lower) bandwidth of A is max |i − j| among i, j with i ≥ j (or i ≤ j) such that A_{i,j} ≠ 0. A matrix with lower bandwidth 1 is upper Hessenberg.
For A, B ∈ R^{n×n}, B is A's inverse if AB = BA = I. If such a B exists, A is invertible or nonsingular, and B = A^{-1}. The inverse of A is A^{-1} = [x1, ..., xn] where Axi = ei.
For A ∈ R^{n×n} the following are equivalent: A is nonsingular; rank(A) = n; Ax = b has a solution x for any b; Ax = 0 implies x = 0.
The nullspace of A ∈ R^{m×n} is {x ∈ R^n : Ax = 0}. For A ∈ R^{m×n}, Range(A) and Nullspace(A^T) are orthogonal complements, i.e., x ∈ Range(A), y ∈ Nullspace(A^T) ⇒ x^T y = 0, and every p ∈ R^m equals x + y for unique such x and y.
For a permutation matrix P ∈ R^{n×n}, PA permutes the rows of A and AP the columns of A. P^{-1} = P^T.

Norms

A vector norm ||·|| : R^n → R satisfies:
1. ||x|| ≥ 0, and ||x|| = 0 ⇔ x = 0.
2. ||γx|| = |γ| ||x|| for all γ ∈ R and all x ∈ R^n.
3. ||x + y|| ≤ ||x|| + ||y|| for all x, y ∈ R^n.
Common norms include:
1. ||x||_1 = |x1| + |x2| + ··· + |xn|
2. ||x||_2 = sqrt(x1^2 + x2^2 + ··· + xn^2)
3. ||x||_∞ = lim_{p→∞} (|x1|^p + ··· + |xn|^p)^{1/p} = max_{i=1..n} |xi|
An induced matrix norm is ||A|| = sup_{x≠0} ||Ax||/||x||. It satisfies the three properties of norms. For all x ∈ R^n and A ∈ R^{m×n}, ||Ax|| ≤ ||A|| ||x||. Further, ||AB|| ≤ ||A|| ||B||, called submultiplicativity, and |a^T b| ≤ ||a||_2 ||b||_2, the Cauchy-Schwarz inequality.
1. ||A||_∞ = max_{i=1..m} Σ_{j=1..n} |a_{i,j}| (max row sum).
2. ||A||_1 = max_{j=1..n} Σ_{i=1..m} |a_{i,j}| (max column sum).
3. ||A||_2 is hard: it takes O(n^3), not O(n^2), operations.
4. ||A||_F = sqrt(Σ_{i=1..m} Σ_{j=1..n} a_{i,j}^2). ||·||_F often replaces ||·||_2.

Determinant

The determinant det : R^{n×n} → R satisfies:
1. det(AB) = det(A) det(B).
2. det(A) = 0 iff A is singular.
3. det(L) = ℓ_{1,1} ℓ_{2,2} ··· ℓ_{n,n} for triangular L.
4. det(A) = det(A^T).
To compute det(A), factor A = P^T LU. det(P) = (−1)^s where s is the number of row swaps, and det(L) = 1. When computing det(U) watch out for overflow!

Numerical Stability

Six sources of error in scientific computing: modeling errors, measurement or data errors, blunders, discretization or truncation errors, convergence tolerance, and rounding errors.
A floating point number has the form ±d1.d2d3···dt × β^e (sign, mantissa, base β, exponent e). For single precision, t = 24 and e ∈ {−126, ..., 127}; for double precision, t = 53 and e ∈ {−1022, ..., 1023}.
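As a concrete illustration of the overflow warning, a minimal NumPy/SciPy sketch (logdet_via_lu is an ad hoc name, and scipy.linalg.lu_factor is assumed to be available) that accumulates log |u_ii| instead of the raw diagonal product:

import numpy as np
from scipy.linalg import lu_factor   # assumed available

def logdet_via_lu(A):
    lu, piv = lu_factor(A)                        # LU factors with partial pivoting
    swaps = np.sum(piv != np.arange(A.shape[0]))  # number of row interchanges
    sign = (-1.0) ** swaps * np.prod(np.sign(np.diag(lu)))
    logabs = np.sum(np.log(np.abs(np.diag(lu))))  # log|det(U)|, safe from overflow
    return sign, logabs                           # det(A) = sign * exp(logabs)

A = np.diag(np.full(400, 10.0))                   # det(A) = 10^400 overflows a double
print(np.prod(np.diag(A)))                        # inf: the naive product overflows
print(logdet_via_lu(A))                           # (1.0, ~921.0): no overflow
print(np.linalg.slogdet(A))                       # library routine using the same idea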



Orthogonal Matrices
For Q ∈ R^{n×n}, these statements are equivalent:
1. Q^T Q = QQ^T = I (i.e., Q is orthogonal).
2. The 2-norm of each row and each column of Q is 1, and the inner product of any row (or column) with another is 0.
3. For all x ∈ R^n, ||Qx||_2 = ||x||_2.
A matrix Q ∈ R^{m×n} with m > n has orthonormal columns if the columns are orthonormal, in which case Q^T Q = I. The product of orthogonal matrices is orthogonal. For orthogonal Q, ||QA||_2 = ||A||_2 and ||AQ||_2 = ||A||_2.
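A quick numerical check of these properties, sketched with NumPy (the test matrix is arbitrary):

import numpy as np

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))           # an orthogonal Q
x = rng.standard_normal(5)

print(np.allclose(Q.T @ Q, np.eye(5)))                     # Q^T Q = I
print(np.allclose(np.linalg.norm(Q @ x), np.linalg.norm(x)))  # ||Qx||_2 = ||x||_2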

Basic Linear Algebra Subroutines

0. Scalar ops, like x^2 + y^2. O(1) flops, O(1) data.
1. Vector ops, like y = ax + y. O(n) flops, O(n) data.
2. Matrix-vector ops, like the rank-one update A = A + xy^T. O(n^2) flops, O(n^2) data.
3. Matrix-matrix ops, like C = C + AB. O(n^2) data, O(n^3) flops.
Use the highest BLAS level possible. Operations are architecture tuned, e.g., data is processed in cache-sized chunks.
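A sketch of the "use the highest BLAS level possible" advice in NumPy (whose @ dispatches to a tuned BLAS): the same update done as n level-2 matrix-vector products versus one level-3 matrix-matrix product:

import numpy as np

rng = np.random.default_rng(1)
n = 500
A, B, C = (rng.standard_normal((n, n)) for _ in range(3))

C2 = C.copy()
for j in range(n):                 # n BLAS level-2 matrix-vector products
    C2[:, j] += A @ B[:, j]

C3 = C + A @ B                     # one BLAS level-3 matrix-matrix product
print(np.allclose(C2, C3))         # same result; the level-3 form is much faster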

Linear Least Squares
Suppose we have points (u1, v1), ..., (u5, v5) through which we want to fit a quadratic curve au^2 + bu + c. We want to solve

    [ u1^2  u1  1 ] [a]   [v1]
    [  ...  ..  . ] [b] = [..]
    [ u5^2  u5  1 ] [c]   [v5]

This is overdetermined, so an exact solution is out. Instead, find the least squares solution x that minimizes ||Ax − b||_2.
For the method of normal equations, solve A^T Ax = A^T b for x using Cholesky factorization. This takes mn^2 + n^3/3 + O(mn) flops. It is conditionally but not backwards stable: forming A^T A squares the condition number.
Alternatively, factor A = QR. Let c = [c1 c2]^T = Q^T b. The least squares solution is x = R1^{-1} c1.
If rank(A) = r and r < n (rank deficient), factor A = UΣV^T and let y = V^T x and c = U^T b. Then min ||Ax − b||_2^2 = min Σ_{i=1..r} (σi yi − ci)^2 + Σ_{i=r+1..m} ci^2, so yi = ci/σi for i = 1 : r. For i = r + 1 : n, yi is arbitrary.
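A NumPy sketch of both approaches on a small quadratic fit (the data values below are made up for illustration):

import numpy as np

u = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
v = np.array([1.1, 2.9, 9.2, 18.8, 33.1])
A = np.column_stack([u**2, u, np.ones_like(u)])

# Normal equations: A^T A x = A^T b via Cholesky (fast, but squares kappa(A)).
G = np.linalg.cholesky(A.T @ A)                  # A^T A = G G^T
w = np.linalg.solve(G, A.T @ v)                  # solve G w = A^T b
x_ne = np.linalg.solve(G.T, w)                   # solve G^T x = w

# QR: x = R1^{-1} Q1^T b (backwards stable).
Q1, R1 = np.linalg.qr(A)                         # skinny/reduced QR
x_qr = np.linalg.solve(R1, Q1.T @ v)

print(x_ne, x_qr)                                # nearly identical coefficients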

QR-factorization
For any A ∈ R^{m×n} with m ≥ n, we can factor A = QR, where Q ∈ R^{m×m} is orthogonal and R = [R1; 0] ∈ R^{m×n} with R1 upper triangular. rank(A) = n iff R1 is invertible. Q's first n (or last m − n) columns form an orthonormal basis for range(A) (or nullspace(A^T)).
A Householder reflection is H = I − 2vv^T/(v^T v). H is symmetric and orthogonal. Explicit Householder QR-factorization is:
1: for k = 1 : n do
2:   v = A(k:m, k) ± ||A(k:m, k)||_2 e1
3:   A(k:m, k:n) = (I − 2vv^T/(v^T v)) A(k:m, k:n)
4: end for
We get Hn Hn−1 ··· H1 A = R, so Q = H1 H2 ··· Hn. This takes 2mn^2 − (2/3)n^3 + O(mn) flops.
Givens rotations require 50% more flops, but are preferable for sparse A.
Gram-Schmidt produces a skinny/reduced QR-factorization A = Q1 R1, where Q1 ∈ R^{m×n} has orthonormal columns. The Gram-Schmidt algorithm, in right looking and left looking form:
Right looking:
1: Q = A
2: for k = 1 : n do
3:   R(k, k) = ||qk||_2
4:   qk = qk / R(k, k)
5:   for j = k + 1 : n do
6:     R(k, j) = qk^T qj
7:     qj = qj − R(k, j) qk
8:   end for
9: end for
Left looking:
1: for k = 1 : n do
2:   qk = ak
3:   for j = 1 : k − 1 do
4:     R(j, k) = qj^T ak
5:     qk = qk − R(j, k) qj
6:   end for
7:   R(k, k) = ||qk||_2
8:   qk = qk / R(k, k)
9: end for
In left looking, computing R(j, k) = qj^T qk (with the partially updated qk) instead of qj^T ak gives modified Gram-Schmidt, which makes it backwards stable.
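A minimal NumPy sketch of the modified (left looking) Gram-Schmidt variant above; mgs_qr is an ad hoc name:

import numpy as np

def mgs_qr(A):
    A = np.array(A, dtype=float)
    m, n = A.shape
    Q = A.copy()
    R = np.zeros((n, n))
    for k in range(n):
        for j in range(k):
            R[j, k] = Q[:, j] @ Q[:, k]      # q_j^T q_k (modified G.S.)
            Q[:, k] -= R[j, k] * Q[:, j]
        R[k, k] = np.linalg.norm(Q[:, k])
        Q[:, k] /= R[k, k]
    return Q, R

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 4))
Q1, R1 = mgs_qr(A)
print(np.allclose(Q1 @ R1, A), np.allclose(Q1.T @ Q1, np.eye(4)))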

The relative error in x̂ approximating x is |x̂ − x|/|x|.
Unit roundoff or machine epsilon is ε_mach = β^{−t+1}. Arithmetic operations have relative error bounded by ε_mach. E.g., consider z = x − y with inputs x, y. This program has three roundoff errors: ẑ = ((1 + δ1)x − (1 + δ2)y)(1 + δ3), where δ1, δ2, δ3 ∈ [−ε_mach, ε_mach]. Then
|z − ẑ|/|z| = |(δ1 + δ3)x − (δ2 + δ3)y + O(ε_mach^2)| / |x − y|.
The bad case is δ1 = ε_mach, δ2 = −ε_mach, δ3 = 0, giving |z − ẑ|/|z| = ε_mach |x + y|/|x − y|. The inaccuracy when |x + y| ≫ |x − y| is called catastrophic cancellation.
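A small demonstration of catastrophic cancellation in double precision (the particular numbers are only illustrative):

import numpy as np

eps = np.finfo(np.float64).eps           # unit roundoff, about 2.2e-16
x = 1.0 + 1e-15
y = 1.0
z_exact = 1e-15                          # intended difference
z_float = x - y                          # computed difference
print(abs(z_float - z_exact) / abs(z_exact))   # relative error near 0.1,
                                               # enormous compared with eps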


Singular Value Decomposition
For any A ∈ R^{m×n}, we can express A = UΣV^T such that U ∈ R^{m×m} and V ∈ R^{n×n} are orthogonal, and Σ = diag(σ1, ..., σp) ∈ R^{m×n} where p = min(m, n) and σ1 ≥ σ2 ≥ ··· ≥ σp ≥ 0. The σi are the singular values.
1. Matrix 2-norm: ||A||_2 = σ1.
2. The condition number κ2(A) = ||A||_2 ||A^{-1}||_2 = σ1/σn, or the rectangular condition number κ2(A) = σ1/σ_{min(m,n)}. Note that κ2(A^T A) = κ2(A)^2.
3. For a rank-k approximation to A, let Σk = diag(σ1, ..., σk, 0, ..., 0). Then Ak = UΣkV^T, rank(Ak) ≤ k, and rank(Ak) = k iff σk > 0. Among matrices of rank k or lower, Ak minimizes ||A − Ak||_2 = σ_{k+1}.
4. Rank determination: rank(A) = r equals the number of nonzero σi, or in machine arithmetic, perhaps the number of σi ≥ ε_mach σ1.
Partition

    A = UΣV^T = [U1 U2] [ Σ(1:r,1:r)  0 ] [V1^T]
                        [ 0           0 ] [V2^T]

Then range(U1) = range(A). The SVD gives an orthonormal basis for the range and nullspace of A and A^T. Compute the SVD by applying shifted QR to A^T A.
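A short NumPy sketch of items 3 and 4 (rank-k approximation and rank determination); the test matrix is arbitrary:

import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((8, 5)) @ rng.standard_normal((5, 6))   # rank <= 5

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
Ak = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]        # best rank-k approximation
print(np.linalg.norm(A - Ak, 2), s[k])            # ||A - Ak||_2 equals sigma_{k+1}

eps = np.finfo(float).eps
print(np.sum(s >= eps * s[0]))                    # rank determination rule from item 4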

Conditioning & Backwards Stability

A problem instance is ill conditioned if the solution is sensitive to perturbations of the data. For example, sin 1 is well conditioned, but sin 12392193 is ill conditioned.
Suppose we perturb Ax = b to (A + E)x̂ = b + e with ||E||/||A|| ≤ δ and ||e||/||b|| ≤ δ. Then ||x̂ − x||/||x|| ≤ 2δκ(A) + O(δ^2), where κ(A) = ||A|| ||A^{-1}|| is the condition number of A.
1. For all A ∈ R^{n×n}, κ(A) ≥ 1.
2. κ(I) = 1.
3. For γ ≠ 0, κ(γA) = κ(A).
4. For diagonal D and all p, ||D||_p = max_{i=1..n} |d_{ii}|, so κ(D) = max_i |d_{ii}| / min_i |d_{ii}|.
If κ(A) ≥ 1/ε_mach, A may as well be singular.
An algorithm is backwards stable if in the presence of roundoff error it returns the exact solution to a nearby problem instance.

Gaussian Elimination

GE produces a factorization A = LU; GEPP produces PA = LU.
Plain GE:
1: for k = 1 to n − 1 do
2:   if a_kk = 0 then stop
3:   ℓ_{k+1:n,k} = a_{k+1:n,k} / a_kk
4:   a_{k+1:n,k:n} = a_{k+1:n,k:n} − ℓ_{k+1:n,k} a_{k,k:n}
5: end for
GEPP:
1: for k = 1 to n − 1 do
2:   γ = argmax_{i ∈ {k,...,n}} |a_{ik}|
3:   a_{[γ,k],k:n} = a_{[k,γ],k:n}   (swap rows k and γ)
4:   ℓ_{[γ,k],1:k−1} = ℓ_{[k,γ],1:k−1}
5:   p_k = γ
6:   ℓ_{k:n,k} = a_{k:n,k} / a_kk
7:   a_{k+1:n,k:n} = a_{k+1:n,k:n} − ℓ_{k+1:n,k} a_{k,k:n}
8: end for
Backward substitution:
1: x = zeros(n, 1)
2: for j = n to 1 do
3:   x_j = (w_j − u_{j,j+1:n} x_{j+1:n}) / u_{j,j}
4: end for
To solve Ax = b, factor A = LU (or A = P^T LU), solve Lw = b (or Lw = b̂ where b̂ = Pb) for w using forward substitution, then solve Ux = w for x using backward substitution. The complexity of GE and GEPP is (2/3)n^3 + O(n^2). GEPP encounters an exact 0 pivot iff A is singular. For banded A, L + U has the same bandwidths as A.
GEPP solves Ax = b by returning x̂ where (A + E)x̂ = b; it is backwards stable if ||E||_∞/||A||_∞ ≤ O(ε_mach). With GEPP, ||E||_∞/||A||_∞ ≤ c_n ε_mach + O(ε_mach^2), where c_n is worst-case exponential in n, but in practice almost always a low-order polynomial. Combining the stability and conditioning analyses yields ||x̂ − x||/||x|| ≤ c_n κ(A) ε_mach + O(ε_mach^2).
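An illustrative (not production) NumPy sketch of GEPP followed by forward and backward substitution, mirroring the algorithms above; gepp_solve is an ad hoc name:

import numpy as np

def gepp_solve(A, b):
    A = np.array(A, dtype=float)
    b = np.array(b, dtype=float)
    n = len(b)
    L, U, p = np.eye(n), A.copy(), np.arange(n)
    for k in range(n - 1):
        g = k + np.argmax(np.abs(U[k:, k]))          # pivot row
        U[[k, g], k:] = U[[g, k], k:]                 # swap rows of U
        L[[k, g], :k] = L[[g, k], :k]                 # and of L's computed part
        p[[k, g]] = p[[g, k]]
        L[k+1:, k] = U[k+1:, k] / U[k, k]
        U[k+1:, k:] -= np.outer(L[k+1:, k], U[k, k:])
    bp = b[p]                                         # apply the permutation to b
    w = np.zeros(n)
    for j in range(n):                                # forward substitution
        w[j] = (bp[j] - L[j, :j] @ w[:j]) / L[j, j]
    x = np.zeros(n)
    for j in range(n - 1, -1, -1):                    # backward substitution
        x[j] = (w[j] - U[j, j+1:] @ x[j+1:]) / U[j, j]
    return x

A = np.array([[2., 1., 1.], [4., 3., 3.], [8., 7., 9.]])
b = np.array([1., 2., 3.])
print(np.allclose(gepp_solve(A, b), np.linalg.solve(A, b)))   # True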

Information Retrieval & LSI

In the bag of words model, w_d ∈ R^m, where w_d(i) is the (perhaps weighted) frequency of term i in document d. The corpus matrix is A = [w1, ..., wn] ∈ R^{m×n}. For a query q ∈ R^m, rank documents according to the score q^T w_d / ||w_d||_2.
In latent semantic indexing, you do the same, but in a k-dimensional subspace. Factor A = UΣV^T, then define A* = Σ(1:k,1:k) V(:,1:k)^T ∈ R^{k×n}. Each w_d* = A*(:,d) = U(:,1:k)^T w_d, and q* = U(:,1:k)^T q.
In the Ando-Lee analysis of LSI, for a corpus with k topics, R_{t,d} ≥ 0 is document d's relevance to topic t (t ∈ 1:k, d ∈ 1:n), with ||R_{:,d}||_2 = 1. True document similarity is given by the inner products of the columns of R; if A contains this information, then (A*)^T A* will approximate it well. LSI depends on an even distribution of topics, measured by the ratio ρ of the most- to least-represented topic: it works well when ρ is near 1, but if ρ ≫ 1, LSI does worse.

Positive Definite, A = LDL^T

A ∈ R^{n×n} is positive definite (PD) (or positive semidefinite (PSD)) if x^T Ax > 0 (or x^T Ax ≥ 0) for all x ≠ 0. For A ∈ R^{m×n}, if rank(A) = n, then A^T A is SPD.
When LU-factorizing symmetric A, the result is A = LDL^T; L is unit lower triangular, D is diagonal. A is SPD iff D has all positive entries. The Cholesky factorization is A = LDL^T = LD^{1/2}D^{1/2}L^T = GG^T; it can be computed directly in n^3/3 + O(n^2) flops. If G has an all-positive diagonal, A is SPD.
To solve Ax = b for SPD A, factor A = GG^T, solve Gw = b by forward substitution, then solve G^T x = w by backward substitution; in total this takes n^3/3 + O(n^2) flops.
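A minimal sketch of the SPD solve via Cholesky (scipy.linalg.solve_triangular is assumed available for the two triangular solves):

import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(4)
M = rng.standard_normal((5, 5))
A = M.T @ M + 5 * np.eye(5)            # SPD test matrix
b = rng.standard_normal(5)

G = np.linalg.cholesky(A)              # lower triangular, A = G G^T
w = solve_triangular(G, b, lower=True)       # forward: G w = b
x = solve_triangular(G.T, w, lower=False)    # backward: G^T x = w
print(np.allclose(A @ x, b))           # True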


Complex Numbers

Complex numbers are written z = x + iy ∈ C for i = √−1. The real part is x = ℜ(z); the imaginary part is y = ℑ(z). The conjugate of z is z̄ = x − iy, and the absolute value of z is |z| = √(x^2 + y^2). Conjugation distributes over products: Āx̄ = (Ax)‾ and ĀB̄ = (AB)‾. The conjugate transpose of x is x^H = (x̄)^T. A ∈ C^{n×n} is Hermitian or self-adjoint if A = A^H. If Q^H Q = I, Q is unitary.

Eigenvalues & Eigenvectors

For A ∈ C^{n×n}, x is an eigenvector of A and λ is the corresponding eigenvalue if Ax = λx where x ≠ 0. A − λI is singular iff λ is an eigenvalue of A; det(A − λI) is A's characteristic polynomial. In this sheet λ|max| denotes max_{λ∈{λ1,...,λn}} |λ|.
For nonsingular T ∈ C^{n×n}, T^{-1}AT (the similarity transformation) is similar to A. Similar matrices have the same characteristic polynomial and hence the same eigenvalues (though probably different eigenvectors). The similarity relation is reflexive, transitive, and symmetric.
A is diagonalizable if A is similar to a diagonal matrix, A = TDT^{-1}: A's eigenvalues are D's diagonal entries and the eigenvectors are the columns of T, since AT_{:,i} = D_{i,i} T_{:,i}. A is diagonalizable iff it has n linearly independent eigenvectors.
For any A ∈ C^{n×n}, the Schur form of A is A = QTQ^H with unitary Q ∈ C^{n×n} and upper triangular T ∈ C^{n×n}. A is non-normal when T's superdiagonal portion is large relative to the diagonal.
Further, for symmetric A: 1. A has all real eigenvalues. 2. A is diagonalizable, and the eigenvectors may be chosen as the columns of an orthogonal matrix Q, so A = QDQ^T is the eigendecomposition of A. 3. A is SPD (or SPSD) iff its eigenvalues are > 0 (or ≥ 0). 4. The singular values of A are the absolute values of its eigenvalues. (In general, the singular values of B are the square roots of B^T B's eigenvalues.)
If A has eigenpairs (λ1, u1), ..., (λn, un), then (A − σI)^{-1} has eigenpairs (1/(λ1 − σ), u1), ..., (1/(λn − σ), un). For B ∈ C^{n×n}, lim_{k→∞} B^k = 0 if λ|max|(B) < 1.

Power Methods for Eigenvalues

x(k+1) = Ax(k) (with normalization at each step) converges to the eigenvector of λ|max|(A). Once you find an eigenvector x, find the associated eigenvalue λ through the Rayleigh quotient λ = x^T Ax / x^T x. The inverse shifted power method is x(k+1) = (A − σI)^{-1} x(k); it converges to the eigenvector whose eigenvalue is closest to the shift σ.
Factor A = QHQ^T where H is upper Hessenberg: find successive Householder reflections H1, H2, ... that zero out rows 3 and lower of column 1, rows 4 and lower of column 2, etc.; then Q = H1 ··· H_{n−2}. The shifted QR iteration is:
1: A(0) = A
2: for k = 0, 1, 2, ... do
3:   Set A(k) − σ(k)I = Q(k)R(k)
4:   A(k+1) = R(k)Q(k) + σ(k)I
5: end for
Each A(k) is similar to A (via the product of the Q(k)'s). Choose σ(k) as eigenvalues of submatrices of A(k).

Arnoldi and Lanczos

Given A ∈ R^{n×n} and unit-length q1 ∈ R^n:
Arnoldi:
1: for k = 1, 2, ... do
2:   q̃_{k+1} = Aq_k
3:   for ℓ = 1 : k do
4:     H(ℓ, k) = q_ℓ^T q̃_{k+1}
5:     q̃_{k+1} = q̃_{k+1} − H(ℓ, k) q_ℓ
6:   end for
7:   H(k+1, k) = ||q̃_{k+1}||_2
8:   q_{k+1} = q̃_{k+1} / H(k+1, k)
9: end for
Lanczos (for symmetric A):
1: w_0 = q_1, β_0 = ||w_0||_2
2: for k = 1, 2, ... do
3:   q_k = w_{k−1} / β_{k−1}
4:   u_k = Aq_k
5:   v_k = u_k − β_{k−1} q_{k−1}
6:   α_k = q_k^T v_k
7:   w_k = v_k − α_k q_k
8:   β_k = ||w_k||_2
9: end for
The α_k and β_k are the diagonal and subdiagonal entries of the tridiagonal T_k; if A is SPD then T_k is SPD. After very few iterations of either method, the eigenvalues of T_k and of H are excellent approximations to the "extreme" eigenvalues of A. For k iterations, Arnoldi is O(nk^2) time and O(nk) space; Lanczos is O(nk) + k·M time (M is the time for a matrix-vector product with A) and O(nk) space, or O(n + k) space if old q_k's are discarded. Use Lanczos for symmetric A.

Iterative Methods for Ax = b

Useful for sparse A where GE would cause fill-in. In the splitting method, write A = M − N where Mv = c is easily solvable, and iterate x(k+1) = M^{-1}(Nx(k) + b). The error is e(k) = (M^{-1}N)^k e(0), so splitting methods converge if λ|max|(M^{-1}N) < 1; this indicates a linear rate of convergence. In the Jacobi method, M is the diagonal of A; this fails if A has any zero diagonal entries. Time per iteration is O(n) plus a matrix-vector product.

Conjugate Gradient

Conjugate gradient iteratively solves Ax = b for SPD A; it can be derived from Lanczos.
1: x(0) = arbitrary (0 is okay)
2: r_0 = b − Ax(0)
3: p_0 = r_0
4: for k = 0, 1, 2, ... do
5:   α_k = (r_k^T r_k)/(p_k^T A p_k)
6:   x(k+1) = x(k) + α_k p_k
7:   r_{k+1} = r_k − α_k A p_k
8:   β_{k+1} = (r_{k+1}^T r_{k+1})/(r_k^T r_k)
9:   p_{k+1} = r_{k+1} + β_{k+1} p_k
10: end for
In exact arithmetic it produces the exact solution after n iterations. The error is reduced by roughly a factor of (√κ(A) − 1)/(√κ(A) + 1) per iteration, so CG is great when κ(A) is near 1. To speed up CG, use a preconditioner M such that κ(MA) ≪ κ(A) and solve MAx = Mb instead.

Multivariate Calculus

Provided f : R^n → R, the gradient and Hessian are ∇f = [∂f/∂x_1, ..., ∂f/∂x_n]^T and ∇²f = [∂²f/∂x_i∂x_j]. If f is C² (all second partials are continuous), ∇²f is symmetric. f's Taylor expansion is f(x + h) = f(x) + h^T ∇f(x) + (1/2) h^T ∇²f(x) h + O(||h||^3).
Provided f : R^n → R^m, the Jacobian is ∇f = [∂f_i/∂x_j] ∈ R^{m×n} and f(x + h) = f(x) + ∇f(x)h + O(||h||^2).
A linear (or quadratic) model approximates a function f by the first two (or three) terms of f's Taylor expansion. A ball is a set B(x, r) = {y ∈ R^n : ||x − y|| < r}.

Nonlinear Equation Solving

Given f : R^n → R^n, we want x such that f(x) = 0.
In fixed point iteration, we choose g : R^n → R^n such that x(k+1) = g(x(k)). If it converges to x*, then g(x*) = x*. Since g(x(k)) = g(x*) + ∇g(x*)(x(k) − x*) + O(||x(k) − x*||^2), for small e(k) = x(k) − x* the error is multiplied by roughly ∇g(x*) at each step. If ∇g(x*) has λ|max| < 1, then e(k) ≤ c^k e(0) for large k: a linear rate of convergence. Semi-conversely, if λ|max|(∇g(x*)) > 1, this may not converge, as (∇g(x*))^k initially grows!
In Newton's method, x(k+1) = x(k) − (∇f(x(k)))^{-1} f(x(k)). This converges quadratically: e(k+1) ≤ c ||e(k)||^2.

Optimization

In continuous optimization, f : R^n → R, min f(x), is the objective function; g : R^n → R^m s.t. g(x) = 0 holds equality constraints; h : R^n → R^p, h(x) ≥ 0, holds inequality constraints. We did unconstrained optimization min f(x) in the course.
Local minimizers x* are the best in a region: ∃r > 0 such that f(x*) ≤ f(x) for all x ∈ B(x*, r). A global minimizer is the best local minimizer. If x* is a local minimizer, then ∇f(x*) = 0 and ∇²f(x*) is PSD. Semi-conversely, if ∇f(x*) = 0 and ∇²f(x*) is PD, then x* is a local minimizer.

Newton's Method for Unconstrained Min.

Assume f is C². Apply Newton's method to ∇f(x) = 0 (equivalently, minimize the quadratic model): iterate x(k+1) = x(k) − (∇²f(x(k)))^{-1} ∇f(x(k)). If ∇²f(x(k)) is PD, the step is a descent direction, and near a minimizer convergence is quadratic.
What if the Hessian isn't PD? Use (a) a secant method with an always-PD approximate Hessian, (b) a direction of negative curvature h with h^T ∇²f(x(k)) h < 0, taking h or −h (doesn't work well in practice), (c) the trust region idea, h = −(∇²f(x(k)) + tI)^{-1} ∇f(x(k)) (an interpolation of NMUM and SD), or (d) factor ∇²f(x(k)) by Cholesky, and when a nonpositive pivot is detected, modify that diagonal entry and keep going (unjustified by theory, but works in practice).

Line Search

Given x(k) and a descent direction h (perhaps from SD or NMUM), find α > 0 for x(k+1) = x(k) + αh. Exact line search, min_α f(x(k) + αh), is frowned upon because it is computationally expensive. In Armijo or backtracking line search: initialize α; while f(x(k) + αh) > f(x(k)) + 0.1 α ∇f(x(k))^T h, halve α.

Steepest Descent

Go where the function (locally) decreases most rapidly: x(k+1) = x(k) − α_k ∇f(x(k)), with α_k from a line search. SD is stateless: each step depends only on the current point. It can be very slow.
Secant/quasi-Newton methods use an approximate, always-PD Hessian B_k. In Broyden-Fletcher-Goldfarb-Shanno (BFGS):
1: B_0 = initial approximate Hessian {OK to use I}
2: for k = 0, 1, 2, ... do
3:   s_k = −B_k^{-1} ∇f(x(k))
4:   x(k+1) = x(k) + α_k s_k  {Use a special line search for α_k!}
5:   y_k = ∇f(x(k+1)) − ∇f(x(k))
6:   B_{k+1} = B_k + (y_k y_k^T)/(α_k y_k^T s_k) − (B_k s_k s_k^T B_k)/(s_k^T B_k s_k)
7: end for
B_{k+1} is SPD provided s_k^T y_k > 0 (use the line search to increase α_k if needed). The secant condition α_k B_{k+1} s_k = y_k holds. If BFGS converges, it converges superlinearly. By maintaining B_k in factored form, one can iterate in O(n^2) flops.

Non-linear Least Squares

For g : R^n → R^m, m ≥ n, we want the x that minimizes ||g(x)||_2. In the Gauss-Newton method, x(k+1) = x(k) − h where h = (∇g(x)^T ∇g(x))^{-1} ∇g(x)^T g(x). Note that h is the solution to the linear least squares problem min ||∇g(x(k)) h − g(x(k))||! GN is derived by applying NMUM to g(x)^T g(x) and dropping a resulting tensor term (the derivative of the Jacobian); it keeps quadratic convergence when g(x*) = 0.

Ordinary Differential Equations

An ODE (or PDE) has one (or multiple) independent variables. In initial value problems, given dy/dt = f(y, t) and y(0) = y0, we want y(t) ∈ R^n for t > 0. Examples include:
1. Exponential growth/decay with dy/dt = ay and y(0) = y0, with closed form y(t) = y0 e^{at}; growth if a > 0, decay if a < 0. E.g., transients in electrical circuits, chemical reaction rates varying with temperature.
2. Ecological models: y_i is the population size of species i, with dy_i/dt = f_i(y_1, ..., y_n, t) for i = 1, ..., n; f_i encodes species relationships.
3. Mechanics, e.g., wall-spring-block models with F = ma (a = d²x/dt²) and F = −kx, so m d²x/dt² = −kx. This yields d[x, v]/dt = [v, −kx/m], with y0 as the initial position and velocity.
For stability of an ODE, let dy/dt = Ay for A ∈ C^{n×n}. The stable, neutrally stable, or unstable case is where max_i ℜ(λ_i(A)) < 0, = 0, or > 0, respectively.
In finite difference methods, approximate y(t) by discrete points, so y_k ≈ y(t_k) for increasing t_k. In Euler's method, y_{k+1} = y_k + f(y_k, t_k) h_k where h_k = t_{k+1} − t_k is the step size. Explicit! Backward Euler has y_{k+1} = y_k + h_k f(y_{k+1}, t_{k+1}); it is implicit (y_{k+1} is on the RHS), so each step requires an equation solve.
If the local truncation error (error at each step) is O(h^{p+1}), the global truncation error (error overall) is O(h^p). Call p the order of accuracy; for Euler and BE, p = 1. To find p, substitute the exact solution into the FDM formula, insert a remainder term +R on the RHS, expand in a Taylor series, and keep only the leading term.
A stiff problem has widely ranging time scales in the solution, e.g., a transient initial velocity that in the true solution disappears immediately. An explicit method requires h_k to be on the smallest scale; implicit methods like BE can take larger steps.

Miscellaneous

For ax^2 + bx + c = 0, the roots are r_1, r_2 = (−b ± √(b^2 − 4ac))/(2a), and r_1 r_2 = c/a. When b^2 ≫ 4ac, computing the smaller-magnitude root from the formula suffers catastrophic cancellation; compute the larger-magnitude root first and recover the other from r_1 r_2 = c/a.
Exact arithmetic is slow, so numerical analysis relies on approximate algorithms. Automatic differentiation takes advantage of the notion that a computer program is nothing but arithmetic operations; one can apply the chain rule to get the derivative. This may be used to compute Jacobians.
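A minimal NumPy sketch of the power method with a Rayleigh quotient estimate, on an arbitrary symmetric test matrix (assuming a gap between the two largest eigenvalue magnitudes):

import numpy as np

rng = np.random.default_rng(5)
M = rng.standard_normal((6, 6))
A = (M + M.T) / 2                       # symmetric test matrix

x = rng.standard_normal(6)
for _ in range(200):
    x = A @ x
    x /= np.linalg.norm(x)              # normalize to avoid overflow
lam = x @ A @ x / (x @ x)               # Rayleigh quotient

print(abs(lam), np.max(np.abs(np.linalg.eigvalsh(A))))   # should agree closely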