Determinant

The determinant det : R^{n×n} → R satisfies:
1. det(AB) = det(A) det(B).
2. det(A) = 0 iff A is singular.
3. det(L) = l_{1,1} l_{2,2} ··· l_{n,n} for triangular L.
4. det(A) = det(A^T).
To compute det(A), factor A = P^T LU. det(P) = (−1)^s where s is the number of row swaps, and det(L) = 1, so det(A) = (−1)^s u_{1,1} ··· u_{n,n}. When computing det(U), watch out for overflow!
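A minimal NumPy/SciPy sketch of this recipe (assuming SciPy's `scipy.linalg.lu`, which returns P, L, U with A = P L U), including the overflow-safe variant via `slogdet`:

```python
import numpy as np
from scipy.linalg import lu

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))

# A = P @ L @ U; det(L) = 1, so det(A) = det(P) * prod(diag(U)).
P, L, U = lu(A)
det_A = np.linalg.det(P) * np.prod(np.diag(U))
assert np.isclose(det_A, np.linalg.det(A))

# For large n, prod(diag(U)) can overflow; slogdet returns the sign and
# log|det| separately, which stays representable.
sign, logabsdet = np.linalg.slogdet(A)
assert np.isclose(sign * np.exp(logabsdet), det_A)
```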

Linear Algebra

A subspace is a set S ⊆ R^n such that 0 ∈ S and for all x, y ∈ S and α, β ∈ R, αx + βy ∈ S.
The span of {v1, ..., vk} is the set of all vectors in R^n that are linear combinations of v1, ..., vk.
A basis B = {v1, ..., vk} ⊂ S of a subspace S has Span(B) = S with all vi linearly independent. The dimension of S is |B| for any basis B of S.
For subspaces S, T with S ⊆ T, dim(S) ≤ dim(T); if further dim(S) = dim(T), then S = T.
A linear transformation T : R^n → R^m has T(αx + βy) = αT(x) + βT(y) for all x, y ∈ R^n and α, β ∈ R. There exists A ∈ R^{m×n} such that T(x) ≡ Ax.
For two linear transformations T : R^n → R^m and S : R^m → R^p, S ∘ T ≡ S(T(x)) is a linear transformation. (T(x) ≡ Ax) and (S(y) ≡ By) imply (S ∘ T)(x) ≡ BAx.
A matrix's row space is the span of its rows, its column space or range is the span of its columns, and its rank is the dimension of either of these spaces.
For A ∈ R^{m×n}, rank(A) ≤ min(m, n). A has full row (or column) rank if rank(A) = m (or n).
A diagonal matrix D ∈ R^{n×n} has d_{j,k} = 0 for j ≠ k. The identity matrix I is diagonal with i_{j,j} = 1.
The upper (or lower) bandwidth of A is max |i − j| among i, j with i ≤ j (or i ≥ j) such that A_{i,j} ≠ 0. A matrix with lower bandwidth 1 is upper Hessenberg.
For A, B ∈ R^{n×n}, B is A's inverse if AB = BA = I. If such a B exists, A is invertible or nonsingular, and B = A^{-1}. The inverse of A is A^{-1} = [x1, ..., xn] where Axi = ei.
For A ∈ R^{n×n} the following are equivalent: A is nonsingular; rank(A) = n; Ax = b has a solution x for any b; if Ax = 0 then x = 0.
The nullspace of A ∈ R^{m×n} is {x ∈ R^n : Ax = 0}.
For A ∈ R^{m×n}, Range(A) and Nullspace(A^T) are orthogonal complements, i.e., x ∈ Range(A), y ∈ Nullspace(A^T) imply x^T y = 0, and every p ∈ R^m is p = x + y for unique such x and y.
For a permutation matrix P ∈ R^{n×n}, PA permutes the rows of A, AP the columns of A. P^{-1} = P^T.

Norms

A vector norm ||·|| : R^n → R satisfies:
1. ||x|| ≥ 0, and ||x|| = 0 iff x = 0.
2. ||γx|| = |γ| ||x|| for all γ ∈ R and all x ∈ R^n.
3. ||x + y|| ≤ ||x|| + ||y|| for all x, y ∈ R^n.
Common norms include:
1. ||x||_1 = |x1| + |x2| + ··· + |xn|
2. ||x||_2 = sqrt(x1^2 + x2^2 + ··· + xn^2)
3. ||x||_∞ = lim_{p→∞} (|x1|^p + ··· + |xn|^p)^{1/p} = max_{i=1..n} |xi|
An induced matrix norm is ||A|| = sup_{x≠0} ||Ax|| / ||x||. It satisfies the three properties of norms. For all x ∈ R^n and A ∈ R^{m×n}, ||Ax|| ≤ ||A|| ||x||. Further, ||AB|| ≤ ||A|| ||B||, called submultiplicativity, and |a^T b| ≤ ||a||_2 ||b||_2, the Cauchy-Schwarz inequality.
1. ||A||_∞ = max_{i=1..m} Σ_{j=1..n} |a_{i,j}| (max row sum).
2. ||A||_1 = max_{j=1..n} Σ_{i=1..m} |a_{i,j}| (max column sum).
3. ||A||_2 is hard: it takes O(n^3), not O(n^2), operations.
4. ||A||_F = sqrt(Σ_{i=1..m} Σ_{j=1..n} a_{i,j}^2); ||·||_F often replaces ||·||_2.

Numerical Stability

Six sources of error in scientific computing: modeling errors, measurement or data errors, blunders, discretization or truncation errors, convergence tolerance, and rounding errors.
Floating point numbers have the form ±d1.d2d3···dt × β^e (sign, mantissa, base, exponent). For single precision, t = 24 and e ∈ {−126, ..., 127}; for double precision, t = 53 and e ∈ {−1022, ..., 1023}.
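The double-precision format and the closed-form induced norms above can be checked directly; a short NumPy sketch:

```python
import numpy as np

# Double precision: t = 53 bits, so the gap between 1 and the next float
# is 2^-52.
assert np.finfo(np.float64).eps == 2.0 ** -52

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))
x = rng.standard_normal(3)

# Induced norms have closed forms: max row sum (inf-norm), max column sum (1-norm).
assert np.isclose(np.linalg.norm(A, np.inf), np.abs(A).sum(axis=1).max())
assert np.isclose(np.linalg.norm(A, 1), np.abs(A).sum(axis=0).max())

# ||Ax|| <= ||A|| ||x|| for any induced norm, here the 2-norm.
assert np.linalg.norm(A @ x) <= np.linalg.norm(A, 2) * np.linalg.norm(x) + 1e-12
```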

Basic Linear Algebra Subroutines

0. Scalar ops, like sqrt(x^2 + y^2): O(1) flops, O(1) data.
1. Vector ops, like y = ax + y: O(n) flops, O(n) data.
2. Matrix-vector ops, like the rank-one update A = A + xy^T: O(n^2) flops, O(n^2) data.
3. Matrix-matrix ops, like C = C + AB: O(n^3) flops, O(n^2) data.
Use the highest BLAS level possible. Operations are architecture-tuned, e.g., data is processed in cache-sized bites.

Orthogonal Matrices

For Q ∈ R^{n×n}, these statements are equivalent:
1. Q^T Q = QQ^T = I (i.e., Q is orthogonal).
2. The 2-norm of each row and column of Q is 1, and the inner product of any row (or column) with another is 0.
3. For all x ∈ R^n, ||Qx||_2 = ||x||_2.
A matrix Q ∈ R^{m×n} with m > n has orthonormal columns if Q^T Q = I.
The product of orthogonal matrices is orthogonal. For orthogonal Q, ||QA||_2 = ||A||_2 and ||AQ||_2 = ||A||_2.
For A ∈ R^{m×n}, if rank(A) = n, then A^T A is SPD.
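These orthogonality facts are easy to verify numerically; a sketch using the Q factor from NumPy's QR (a random Gaussian matrix is full rank with probability 1):

```python
import numpy as np

rng = np.random.default_rng(2)
# The Q factor of a square matrix's QR factorization is orthogonal.
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))

assert np.allclose(Q.T @ Q, np.eye(5))
assert np.allclose(Q @ Q.T, np.eye(5))

# Orthogonal matrices preserve the 2-norm.
x = rng.standard_normal(5)
assert np.isclose(np.linalg.norm(Q @ x), np.linalg.norm(x))

# A^T A is SPD when A has full column rank: Cholesky succeeds.
A = rng.standard_normal((6, 4))
np.linalg.cholesky(A.T @ A)  # would raise LinAlgError if not SPD
```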

Linear Least Squares

Suppose we have points (u1, v1), ..., (u5, v5) that we want to fit a quadratic curve au^2 + bu + c through. We want to solve

  [ u1^2  u1  1 ]   [a]   [ v1 ]
  [  ...  ..  . ] × [b] = [ ...]
  [ u5^2  u5  1 ]   [c]   [ v5 ]

This is overdetermined, so an exact solution is out. Instead, find the least squares solution x that minimizes ||Ax − b||_2.
For the method of normal equations, solve A^T Ax = A^T b for x using Cholesky factorization. This takes mn^2 + n^3/3 + O(mn) flops. It is conditionally but not backwards stable: forming A^T A squares the condition number.
Alternatively, factor A = QR. Let c = [c1; c2] = Q^T b. The least squares solution is x = R1^{-1} c1.
If rank(A) = r and r < n (rank deficient), factor A = UΣV^T, and let y = V^T x and c = U^T b. Then min ||Ax − b||_2^2 = min Σ_{i=1..r} (σi yi − ci)^2 + Σ_{i=r+1..m} ci^2, so yi = ci/σi for i = 1 : r; for i = r+1 : n, yi is arbitrary.

QR-factorization

For any A ∈ R^{m×n} with m ≥ n, we can factor A = QR, where Q ∈ R^{m×m} is orthogonal and R = [R1; 0] ∈ R^{m×n} with R1 upper triangular. rank(A) = n iff R1 is invertible. Q's first n (or last m − n) columns form an orthonormal basis for range(A) (or nullspace(A^T)).
A Householder reflection is H = I − 2vv^T/(v^T v). H is symmetric and orthogonal. Explicit Householder QR-factorization is:
1: for k = 1 : n do
2:   v = A(k:m, k) ± ||A(k:m, k)||_2 e1
3:   A(k:m, k:n) = (I − 2vv^T/(v^T v)) A(k:m, k:n)
4: end for
We get Hn Hn−1 ··· H1 A = R, so Q = H1 H2 ··· Hn. This takes 2mn^2 − (2/3)n^3 + O(mn) flops.
Givens rotations require 50% more flops; preferable for sparse A.
Gram-Schmidt produces a skinny/reduced QR-factorization A = Q1 R1, where Q1 ∈ R^{m×n} has orthonormal columns. The right looking Gram-Schmidt algorithm is:
1: Q = A
2: for k = 1 : n do
3:   R(k, k) = ||qk||_2
4:   qk = qk / R(k, k)
5:   for j = k+1 : n do
6:     R(k, j) = qk^T qj
7:     qj = qj − R(k, j) qk
8:   end for
9: end for
The left looking variant is:
1: for k = 1 : n do
2:   qk = ak
3:   for j = 1 : k−1 do
4:     R(j, k) = qj^T ak
5:     qk = qk − R(j, k) qj
6:   end for
7:   R(k, k) = ||qk||_2
8:   qk = qk / R(k, k)
9: end for

The relative error in x̂ approximating x is |x̂ − x| / |x|. Unit roundoff or machine epsilon is eps_mach = β^{−t+1}. Arithmetic operations have relative error bounded by eps_mach. E.g., consider z = x − y with input x, y. This program has three roundoff errors: ẑ = ((1 + δ1)x − (1 + δ2)y)(1 + δ3), where δ1, δ2, δ3 ∈ [−eps_mach, eps_mach].
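A compact sketch of the explicit Householder QR above, with the least squares solve x = R1^{-1}(Q^T b)_{1:n}, checked against NumPy's `lstsq` (function names here are my own):

```python
import numpy as np

def householder_qr(A):
    """Factor A (m x n, m >= n) as A = Q R, Q orthogonal, R upper triangular."""
    A = np.array(A, dtype=float)
    m, n = A.shape
    Q = np.eye(m)
    for k in range(n):
        x = A[k:, k]
        v = x.copy()
        # Sign chosen to avoid cancellation in v[0].
        v[0] += np.copysign(np.linalg.norm(x), x[0])
        vtv = v @ v
        if vtv == 0.0:
            continue
        A[k:, k:] -= (2.0 / vtv) * np.outer(v, v @ A[k:, k:])   # H_k A
        Q[:, k:] -= (2.0 / vtv) * np.outer(Q[:, k:] @ v, v)     # Q <- Q H_k
    return Q, np.triu(A)

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 4))
Q, R = householder_qr(A)
assert np.allclose(Q @ R, A)
assert np.allclose(Q.T @ Q, np.eye(6))

# Least squares via QR: solve R1 x = c1 where c1 = (Q^T b)[:n].
b = rng.standard_normal(6)
x = np.linalg.solve(R[:4, :4], (Q.T @ b)[:4])
assert np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0])
```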

Continuing the example ẑ = ((1 + δ1)x − (1 + δ2)y)(1 + δ3) for z = x − y:

  |ẑ − z| / |z| = |(δ1 + δ3)x − (δ2 + δ3)y + O(eps_mach^2)| / |x − y|.

The bad case is δ1 = eps_mach, δ2 = −eps_mach, δ3 = 0, giving |ẑ − z| / |z| = eps_mach |x + y| / |x − y|. Inaccuracy when |x + y| ≫ |x − y| is called catastrophic cancellation.

Singular Value Decomposition

For any A ∈ R^{m×n}, we can express A = UΣV^T such that U ∈ R^{m×m} and V ∈ R^{n×n} are orthogonal, and Σ = diag(σ1, ..., σp) ∈ R^{m×n} where p = min(m, n) and σ1 ≥ σ2 ≥ ··· ≥ σp ≥ 0. The σi are singular values.
1. Matrix 2-norm: ||A||_2 = σ1.
2. The condition number is κ2(A) = ||A||_2 ||A^{-1}||_2 = σ1/σn, or the rectangular condition number κ2(A) = σ1/σ_{min(m,n)}.
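Catastrophic cancellation is easy to provoke; a small demonstration with sqrt(x+1) − sqrt(x), where the naive subtraction loses everything but an algebraic rewrite keeps full accuracy:

```python
import numpy as np

x = 1e16
# Naively: x + 1 rounds to x in double precision (the float spacing near
# 1e16 is 2.0), so the subtraction cancels to exactly 0.
naive = np.sqrt(x + 1.0) - np.sqrt(x)

# Rewriting to avoid subtracting nearby numbers:
# sqrt(x+1) - sqrt(x) = 1 / (sqrt(x+1) + sqrt(x)).
stable = 1.0 / (np.sqrt(x + 1.0) + np.sqrt(x))

assert naive == 0.0
assert np.isclose(stable, 5e-9)   # true value is ~= 1 / (2e8)
```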

Conditioning & Backwards Stability

A problem instance is ill conditioned if the solution is sensitive to perturbations of the data. For example, sin 1 is well conditioned, but sin 12392193 is ill conditioned.
Suppose we perturb Ax = b to (A + E)x̂ = b + e where ||E||/||A|| ≤ δ and ||e||/||b|| ≤ δ. Then ||x̂ − x||/||x|| ≤ 2δκ(A) + O(δ^2), where κ(A) = ||A|| ||A^{-1}|| is the condition number of A.
1. For all A ∈ R^{n×n}, κ(A) ≥ 1.
2. κ(I) = 1.
3. For γ ≠ 0, κ(γA) = κ(A).
4. For diagonal D and all p, ||D||_p = max_{i=1..n} |dii|, so κ(D) = max_{i=1..n} |dii| / min_{i=1..n} |dii|.
If κ(A) ≥ 1/eps_mach, A may as well be singular.
An algorithm is backwards stable if in the presence of roundoff error it returns the exact solution to a nearby problem instance.
GEPP solves Ax = b by returning x̂ where (A + E)x̂ = b. It is backwards stable if ||E||_∞ / ||A||_∞ ≤ O(eps_mach). With GEPP, ||E||_∞ / ||A||_∞ ≤ cn · eps_mach + O(eps_mach^2), where cn is worst-case exponential in n, but in practice almost always a low-order polynomial. Combining the stability and conditioning analyses yields ||x̂ − x||/||x|| ≤ cn · κ(A) · eps_mach + O(eps_mach^2).

Gaussian Elimination

GE produces a factorization A = LU; GEPP produces PA = LU. Plain GE is:
1: for k = 1 to n − 1 do
2:   if akk = 0 then stop
3:   l_{k+1:n,k} = a_{k+1:n,k} / a_{kk}
4:   a_{k+1:n,k:n} = a_{k+1:n,k:n} − l_{k+1:n,k} a_{k,k:n}
5: end for
GEPP is:
1: for k = 1 to n − 1 do
2:   γ = argmax_{i ∈ {k,...,n}} |a_{ik}|
3:   a_{[γ,k],k:n} = a_{[k,γ],k:n}   (swap rows γ and k)
4:   l_{[γ,k],1:k−1} = l_{[k,γ],1:k−1}
5:   pk = γ
6:   l_{k+1:n,k} = a_{k+1:n,k} / a_{kk}
7:   a_{k+1:n,k:n} = a_{k+1:n,k:n} − l_{k+1:n,k} a_{k,k:n}
8: end for
Backward substitution is:
1: x = zeros(n, 1)
2: for j = n to 1 do
3:   xj = (wj − u_{j,j+1:n} x_{j+1:n}) / u_{j,j}
4: end for
To solve Ax = b, factor A = LU (or A = P^T LU), solve Lw = b (or Lw = b̂ where b̂ = Pb) for w using forward substitution, then solve Ux = w for x using backward substitution. The complexity of GE and GEPP is (2/3)n^3 + O(n^2).
GEPP encounters an exact 0 pivot iff A is singular.
For banded A, L + U has the same bandwidths as A.
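A sketch of GEPP following the pseudocode above (the function name is my own; for brevity the triangular solves use `np.linalg.solve` rather than hand-written substitution):

```python
import numpy as np

def gepp(A):
    """LU with partial pivoting: returns P, L, U with P @ A = L @ U."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    L, perm = np.eye(n), np.arange(n)
    for k in range(n - 1):
        g = k + np.argmax(np.abs(A[k:, k]))      # pivot row
        if g != k:
            A[[k, g], k:] = A[[g, k], k:]        # swap remaining row parts
            L[[k, g], :k] = L[[g, k], :k]        # swap computed multipliers
            perm[[k, g]] = perm[[g, k]]
        L[k+1:, k] = A[k+1:, k] / A[k, k]
        A[k+1:, k:] -= np.outer(L[k+1:, k], A[k, k:])
    return np.eye(n)[perm], L, np.triu(A)

rng = np.random.default_rng(4)
A = rng.standard_normal((5, 5))
P, L, U = gepp(A)
assert np.allclose(P @ A, L @ U)

# Solve Ax = b: Lw = Pb (forward), then Ux = w (backward).
b = rng.standard_normal(5)
x = np.linalg.solve(U, np.linalg.solve(L, P @ b))
assert np.allclose(x, np.linalg.solve(A, b))
```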

For modified Gram-Schmidt, change the left looking algorithm's projection coefficient to R(j, k) = qj^T qk, using the partially orthogonalized qk rather than ak; this makes it backwards stable.

Positive Definite, A = LDL^T

A ∈ R^{n×n} is positive definite (PD) (or semidefinite (PSD)) if x^T Ax > 0 (or x^T Ax ≥ 0) for all x ≠ 0.
When LU-factorizing symmetric A, the result is A = LDL^T; L is unit lower triangular, D is diagonal. A is SPD iff D has all positive entries.
The Cholesky factorization is A = LDL^T = LD^{1/2} D^{1/2} L^T = GG^T. It can be computed directly in n^3/3 + O(n^2) flops. If G has an all-positive diagonal, A is SPD.
To solve Ax = b for SPD A, factor A = GG^T, solve Gw = b by forward substitution, then solve G^T x = w by backward substitution, which takes n^3/3 + O(n^2) flops in total.

Information Retrieval & LSI

In the bag of words model, wd ∈ R^m, where wd(i) is the (perhaps weighted) frequency of term i in document d. The corpus matrix is A = [w1, ..., wn] ∈ R^{m×n}. For a query q ∈ R^m, rank documents according to a q^T wd / (||q||_2 ||wd||_2) score.
In latent semantic indexing, you do the same, but in a k dimensional subspace. Factor A = UΣV^T, then define A* = Σ_{1:k,1:k} V_{:,1:k}^T ∈ R^{k×n}. Each wd* = U_{:,1:k}^T wd = A*_{:,d}, and q* = U_{:,1:k}^T q.

Writing A = UΣV^T in block form with U = [U1 U2] partitioned at r = rank(A), see that range(U1) = range(A): the SVD gives an orthonormal basis for the range and nullspace of A and of A^T. Compute the SVD by using shifted QR on A^T A.
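The Cholesky solve for an SPD system can be sketched directly (assuming SciPy's `solve_triangular` for the two substitutions):

```python
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(5)
B = rng.standard_normal((5, 5))
A = B @ B.T + 5 * np.eye(5)       # SPD by construction
b = rng.standard_normal(5)

G = np.linalg.cholesky(A)         # A = G @ G.T, G lower triangular
w = solve_triangular(G, b, lower=True)       # forward substitution: G w = b
x = solve_triangular(G.T, w, lower=False)    # backward substitution: G^T x = w
assert np.allclose(A @ x, b)
```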

that κ2 (AT A) = κ2 (A)2 . 3. For a rank k approximation to A, let Σk = diag(σ1 , · · · , σk , 0T ). Then Ak = U Σk V T . rank(Ak ) ≤ k and rank(Ak ) = k iﬀ σk > 0. Among rank k or lower matrices, Ak minimizes A − Ak 2 = σk+1 . 4. Rank determination, since rank(A) = r equals the number of nonzero σ, or in machine arithmetic, perhaps the number of σ ≥ ǫmach × σ1 . V1T Σ(1 : r, 1 : r) 0 A = U ΣV T = U1 U2 V2T 0 0

Eigenvalues & Eigenvectors

For A ∈ R^{n×n}, if Ax = λx where x ≠ 0, then x is an eigenvector of A and λ is the corresponding eigenvalue. A − λI is singular iff det(A − λI) = 0; det(A − λI) is A's characteristic polynomial.
For nonsingular T ∈ C^{n×n}, T^{-1}AT (the similarity transformation) is similar to A. Similar matrices have the same characteristic polynomial and hence the same eigenvalues (though probably different eigenvectors).
A is diagonalizable if A is similar to a diagonal matrix D = T^{-1}AT; this holds iff A has n linearly independent eigenvectors. A's eigenvalues are D's diagonals, and the eigenvectors are the columns of T, since AT_{:,i} = D_{i,i} T_{:,i}.
Further, for symmetric A:
1. A has all real eigenvalues, and the eigenvectors may be chosen as the columns of an orthogonal matrix Q. A = QDQ^T is the eigendecomposition of A.
2. A is SPD (or SPSD) iff its eigenvalues are > 0 (or ≥ 0).
3. The singular values are the absolute values of the eigenvalues; for SPD A, singular values equal eigenvalues.
For B ∈ R^{m×n}, the singular values of B are the square roots of B^T B's eigenvalues.

Power Methods for Eigenvalues

In this sheet I denote λ|max| = max_{λ ∈ {λ1,...,λn}} |λ|.
The iteration x^{(k+1)} = Ax^{(k)} converges to the eigenvector of λ|max|(A). Once you find an eigenvector x, find the associated eigenvalue λ through the Rayleigh quotient λ = x^T Ax / x^T x.
If A has eigenpairs (λ1, u1), ..., (λn, un), then (A − σI)^{-1} has eigenpairs (1/(λ1 − σ), u1), ..., (1/(λn − σ), un). The inverse shifted power method is x^{(k+1)} = (A − σI)^{-1} x^{(k)}; great when σ is near an eigenvalue.
QR iteration:
1: A^{(0)} = A
2: for k = 0, 1, ... do
3:   Set A^{(k)} − σ^{(k)}I = Q^{(k)} R^{(k)}
4:   A^{(k+1)} = R^{(k)} Q^{(k)} + σ^{(k)}I
5: end for
A^{(k+1)} is similar to A via Q^{(0)} ··· Q^{(k)}. Choose the shifts σ^{(k)} as eigenvalues of submatrices of A^{(k)}.
To make each iteration cheap, first factor A = QHQ^T where H is upper Hessenberg: find successive Householder reflections H1, H2, ... that zero out rows 3 and lower of column 1, rows 4 and lower of column 2, etc. Then Q = H1 ··· H_{n−2}. If A is symmetric, H is tridiagonal.

Arnoldi & Lanczos

Given A ∈ R^{n×n} and unit length q1 ∈ R^n, Arnoldi is:
1: for k = 1, 2, ... do
2:   q̃_{k+1} = Aqk
3:   for l = 1 : k do
4:     H(l, k) = ql^T q̃_{k+1}
5:     q̃_{k+1} = q̃_{k+1} − H(l, k) ql
6:   end for
7:   H(k+1, k) = ||q̃_{k+1}||_2
8:   q_{k+1} = q̃_{k+1} / H(k+1, k)
9: end for
Lanczos (for symmetric A; take w0 = q1, q0 = 0) is:
1: β0 = ||w0||_2
2: for k = 1, 2, ... do
3:   qk = w_{k−1} / β_{k−1}
4:   uk = Aqk
5:   vk = uk − β_{k−1} q_{k−1}
6:   αk = qk^T vk
7:   wk = vk − αk qk
8:   βk = ||wk||_2
9: end for
For Lanczos, the αk and βk are the diagonal and subdiagonal entries of the symmetric tridiagonal Tk.
For k iterations, Arnoldi is O(nk^2) time and O(nk) space. Lanczos is O(nk) + k·M time (M is the time for a matrix-vector multiply) and O(nk) space, or O(n + k) space if old qk's are discarded.
After very few iterations of either method, the eigenvalues of Tk and H will be excellent approximations to the "extreme" eigenvalues of A. Use Lanczos for symmetric A.

Iterative Methods for Ax = b

Useful for sparse A where GE would cause fill-in. In the splitting method, write A = M − N where Mv = c is easily solvable, and iterate x^{(k+1)} = M^{-1}(Nx^{(k)} + b). For B ∈ C^{n×n}, lim_{k→∞} B^k = 0 iff λ|max|(B) < 1, so splitting methods converge iff λ|max|(M^{-1}N) < 1. The error is e^{(k)} = (M^{-1}N)^k e^{(0)}, indicating a linear rate of convergence. If it converges, the limit point x* is a solution to Ax = b.
In the Jacobi method, take M as the diagonal of A. This will fail if A has any zero diagonals.

Conjugate Gradient

Conjugate gradient iteratively solves Ax = b for SPD A:
1: x^{(0)} = arbitrary (0 is okay)
2: r0 = b − Ax^{(0)}
3: p0 = r0
4: for k = 0, 1, 2, ... do
5:   αk = (rk^T rk) / (pk^T Apk)
6:   x^{(k+1)} = x^{(k)} + αk pk
7:   r_{k+1} = rk − αk Apk
8:   β_{k+1} = (r_{k+1}^T r_{k+1}) / (rk^T rk)
9:   p_{k+1} = r_{k+1} + β_{k+1} pk
10: end for
Time per iteration is O(n) + M. It produces the exact solution after n iterations (in exact arithmetic). The error satisfies ||e^{(k)}||_A ≤ 2 ((sqrt(κ(A)) − 1)/(sqrt(κ(A)) + 1))^k ||e^{(0)}||_A. To speed up CG, use a preconditioner M such that κ(MA) ≪ κ(A) and solve MAx = Mb instead.

Complex Numbers

Complex numbers are written z = x + iy ∈ C for i = sqrt(−1). The conjugate of z is z̄ = x − iy. The real part is x = Re(z); the imaginary part is y = Im(z). The absolute value of z is |z| = sqrt(x^2 + y^2). The conjugate transpose of x is x^H = (x̄)^T. Conjugation distributes over products: Ā x̄ = conj(Ax) and Ā B̄ = conj(AB).
A ∈ C^{n×n} is Hermitian or self-adjoint if A = A^H. Q is unitary if Q^H Q = I.
For any A ∈ C^{n×n}, the Schur form of A is A = QTQ^H with unitary Q ∈ C^{n×n} and upper triangular T ∈ C^{n×n}.
For ax^2 + bx + c = 0, the roots are r1, r2 = (−b ± sqrt(b^2 − 4ac)) / (2a).

Multivariate Calculus

Provided f : R^n → R, the gradient ∇f is the vector of partials δf/δxi, and the Hessian ∇2f is the matrix of second partials δ2f/δxiδxj. If f is c2 (all second partials continuous), ∇2f is symmetric. f's Taylor expansion is f(x + h) = f(x) + h^T ∇f(x) + (1/2) h^T ∇2f(x) h + O(||h||^3).
Provided f : R^n → R^m, the Jacobian is the m×n matrix ∇f with entries δfi/δxj, and f(x + h) = f(x) + ∇f(x)h + O(||h||^2).
A linear (or quadratic) model approximates a function f by the first two (or three) terms of f's Taylor expansion.

Nonlinear Equation Solving

Given f : R^n → R^n, we want x such that f(x) = 0. Exact arithmetic is slow, futile for inexact observations, and NA relies on approximate algorithms.
In fixed point iteration, we choose g : R^n → R^n such that x^{(k+1)} = g(x^{(k)}), with a solution x* satisfying g(x*) − x* = 0. For small e^{(k)} = x^{(k)} − x*, g(x^{(k)}) = g(x*) + ∇g(x*)(x^{(k)} − x*) + O(||x^{(k)} − x*||^2). If ∇g(x*) has λ|max| < 1, then x^{(k)} → x* with ||e^{(k)}|| ≤ c^k ||e^{(0)}|| for large k, where c = λ|max| + ε and ε is the influence of the ignored higher-order term. This indicates a linear rate of convergence. Beware: if ∇g(x*) is non-normal, this may not converge in practice, as ||(∇g(x*))^k|| can initially grow!
In Newton's method, use a Taylor series expansion: ignore the last term and solve for where f(x^{(k)}) + ∇f(x^{(k)})h = 0, giving x^{(k+1)} = x^{(k)} − (∇f(x^{(k)}))^{-1} f(x^{(k)}). If it converges, e^{(k+1)} ≤ c ||e^{(k)}||^2: quadratic convergence.
Secant/quasi-Newton methods use an approximate Jacobian, perhaps computed by finite differences. They converge superlinearly.

Optimization

In continuous optimization, we optimize min f(x); f : R^n → R is the objective function; g : R^n → R^m with g(x) = 0 holds equality constraints; h : R^n → R^p with h(x) ≥ 0 holds inequality constraints. We did unconstrained optimization min f(x) in the course.
A ball is a set B(x, r) = {y ∈ R^n : ||x − y|| < r}. x* is a local minimizer if there exists r > 0 such that f(x*) ≤ f(x) for all x ∈ B(x*, r). A global minimizer is the best local minimizer.
If x* is a local minimizer, then ∇f(x*) = 0 and ∇2f(x*) is PSD. Semi-conversely, if ∇f(x*) = 0 and ∇2f(x*) is PD, then x* is a local minimizer.
In steepest descent (SD), go where the function (locally) decreases most rapidly: x^{(k+1)} = x^{(k)} − α∇f(x^{(k)}), with α from a line search (explained below). SD is stateless: it depends only on the current point.
In Newton's method for unconstrained minimization (NMUM), iterate x^{(k+1)} = x^{(k)} − (∇2f(x^{(k)}))^{-1} ∇f(x^{(k)}). If ∇2f(x^{(k)}) is PD, the step is a descent direction. Near the minimizer, convergence is quadratic.
What if the Hessian isn't PD? Use (a) a secant method, (b) a direction of negative curvature where h^T ∇2f(x^{(k)}) h < 0, (c) a trust region idea, so h = −(∇2f(x^{(k)}) + tI)^{-1} ∇f(x^{(k)}) (an interpolation of NMUM and SD), or (d) factor ∇2f(x^{(k)}) by Cholesky when checking for PD, detect 0 pivots, modify that diagonal in ∇2f(x^{(k)}) and keep going (unjustified by theory, but works in practice).

Line Search

Given x^{(k)} and a step h (perhaps derived from SD or NMUM), optimize min_α f(x^{(k)} + αh). Exact line search is frowned upon because it's computationally expensive. In Armijo or backtrack line search, initialize α; while f(x^{(k)} + αh) > f(x^{(k)}) + 0.1 α ∇f(x^{(k)})^T h, halve α.

In Broyden-Fletcher-Goldfarb-Shanno (BFGS):
1: B0 = initial approximate Hessian {OK to use I}
2: for k = 0, 1, ... do
3:   sk = −Bk^{-1} ∇f(x^{(k)})
4:   x^{(k+1)} = x^{(k)} + αk sk {Use special line search for αk!}
5:   yk = ∇f(x^{(k+1)}) − ∇f(x^{(k)})
6:   B_{k+1} = Bk + (yk yk^T)/(αk yk^T sk) − (Bk sk sk^T Bk)/(sk^T Bk sk)
7: end for
The secant condition αk B_{k+1} sk = yk holds. If BFGS converges, it converges superlinearly. B_{k+1} is SPD provided sk^T yk > 0 (use the line search to increase αk if needed). By maintaining Bk in factored form, one can iterate in O(n^2) flops.

Non-linear Least Squares

For g : R^n → R^m, m ≥ n, we want the x for min ||g(x)||_2. In the Gauss-Newton method, x^{(k+1)} = x^{(k)} − h where h = (∇g(x)^T ∇g(x))^{-1} ∇g(x)^T g(x). Note that h is the solution to the linear least squares problem min ||∇g(x^{(k)})h − g(x^{(k)})||. GN is derived by applying NMUM to g(x)^T g(x) and dropping a resulting tensor (the derivative of the Jacobian). You keep the quadratic convergence when g(x*) = 0.

Ordinary Differential Equations

An ODE (or PDE) has one (or multiple) independent variables. In initial value problems, given dy/dt = f(y, t) and y(0) = y0 with y(t) ∈ R^n, we want y(t) for t > 0. Examples include:
1. Exponential growth/decay: dy/dt = ay, with closed form y(t) = y0 e^{at}. Growth if a > 0, decay if a < 0. Also transients in electrical circuits and chemical reaction rates varying with temperature.
2. Ecological models: dyi/dt = fi(y1, ..., yn, t) for species i = 1, ..., n, where yi is population size and fi encodes species relationships.
3. Mechanics: wall-spring-block models with F = ma (a = d^2x/dt^2) and F = −kx, so m d^2x/dt^2 = −kx. With vector y = [x, v], this yields d[x, v]/dt = [v, −kx/m], with y0 the initial position and velocity.
For stability of an ODE, let dy/dt = Ay for A ∈ C^{n×n}. The stable, neutrally stable, or unstable case is where maxi Re(λi(A)) < 0, = 0, or > 0 respectively.
In finite difference methods (FDMs), approximate y(t) by discrete points yk ≈ y(tk) for increasing tk. In Euler's method, y_{k+1} = yk + f(yk, tk)hk, where hk = t_{k+1} − tk is the step size. Explicit! Backward Euler (BE) has y_{k+1} = yk + f(y_{k+1}, t_{k+1})hk and is implicit (y_{k+1} is on the RHS), so each step requires equation solving; but BE is stable where Euler may not be.
To analyze accuracy, substitute the exact solution into the FDM formula, insert a remainder term +R on the RHS, and expand in a Taylor series. If the local truncation error (error at each step) is O(h^{p+1}), the global truncation error (error overall) is O(h^p). Call p the order of accuracy; for Euler and BE, p = 1.
A stiff problem has widely ranging time scales in the solution, e.g., a transient initial velocity that disappears immediately in the true solution. An explicit method requires hk to be on the smallest scale; implicit methods need not.

Ando-Lee Analysis of LSI

In the Ando-Lee analysis, for a corpus with k topics, let R_{t,d} ≥ 0 be document d's relevance to topic t, for t ∈ 1 : k and d ∈ 1 : n, with ||R_{:,d}||_2 = 1. True document similarity is R^T R ∈ R^{n×n}, where entry (i, j) is the relevance of document i to document j. LSI depends on an even distribution of topics, measured by ρ = max_t ||R_{t,:}||_2 / min_t ||R_{t,:}||_2: (A*)^T A* approximates R^T R well when ρ is near 1, but if ρ ≫ 1, LSI does worse.

Miscellaneous

Σ_{k=1}^n k^p = n^{p+1}/(p+1) + O(n^p).
Automatic differentiation takes advantage of the notion that a computer program is nothing but arithmetic operations, and one can apply the chain rule to get the derivative. This may be used to compute Jacobians and determinants.
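The power method plus Rayleigh quotient can be sketched in a few lines; here on a matrix built with a known spectrum (so the dominant eigenvalue is 5 by construction), checked against NumPy:

```python
import numpy as np

def power_method(A, iters=500, seed=0):
    """Power iteration: x <- Ax / ||Ax|| converges to the eigenvector of the
    eigenvalue largest in magnitude (given a strict magnitude gap)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(A.shape[0])
    for _ in range(iters):
        x = A @ x
        x /= np.linalg.norm(x)
    lam = x @ A @ x / (x @ x)     # Rayleigh quotient
    return lam, x

# Symmetric test matrix A = Q D Q^T with eigenvalues 5, 2, 1, -1, -2, -3.
rng = np.random.default_rng(7)
Q, _ = np.linalg.qr(rng.standard_normal((6, 6)))
A = Q @ np.diag([5.0, 2.0, 1.0, -1.0, -2.0, -3.0]) @ Q.T

lam, x = power_method(A)
assert np.isclose(lam, 5.0)
assert np.allclose(A @ x, lam * x, atol=1e-8)
```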

