
Numerical Analysis Class Notes

Sarah Constantin April 19, 2012

Grading consists of homework and a final exam. This class is conducted in C and Fortran. No, not C++. The text is Dahlquist and Björck's Numerical Methods.

Lecture 1

If you implement a mathematical statement on the computer, you will not get a correct answer. Which mathematical statements can be implemented on the computer and produce a correct answer? That is the subject of this class. The second part of numerical analysis deals with a more subtle question. It is desirable for code to run in finite time. A large proportion of algorithms will not do any such thing. A computer can do half a billion operations per second... but this isn't actually a lot. Inversion of a matrix costs $n^3$ operations. A 1000x1000 matrix takes a second, and this is not a big matrix! Unless you do things correctly, a computer is a very small and slow device. Java is not a numerical language. Neither is C++. We are not going to do things with object-oriented programming. A simple problem should be written concisely. (Of course, codes have to run!)

Suppose you want to calculate a derivative numerically.

$$f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$$

It's tempting to take this with a reasonably small $h$ and make an approximation that way. This is not a good idea. Suppose $f(x) \approx 1$. Choose $h = 10^{-20}$. Then $f(x+h) - f(x)$ will be of the order of $10^{-20}$, and with double-precision arithmetic we get 0. We have to do something different.

This also brings us to the question of the representation of numbers in the computer. A computer allocates a certain amount of space for a number. First bit: plus or minus. Then it represents the number in the form $0.1234567 \times 10^{11}$. In binary, but we don't need to consider that. The important part is that it's in scientific notation. This is called floating point. The exponent in double precision goes up to about 300. You never want to use single-precision calculations in numerics. Why? In normal human behavior, a million is still a number that we can compute; the precision is too low in single. You constantly run into trouble. With double precision, you only run into trouble if you don't know what you're doing.

How do we add two numbers, 0.1245 and 0.3953E-2? The smaller one is shifted to the right until the exponents match; by the time we have shifted it two places, we have lost two of its digits. By the time you shift to the right four times, the sum of these two numbers is unchanged. So if $a$ is much smaller than $b$,

$$(a + b) - b = 0, \qquad a + (b - b) = a.$$

Addition is NOT associative! What about multiplication? As a practical matter, you rarely run out of exponents. $10^{100}$ doesn't come up too often. Multiplication and division do not cause the same loss of accuracy as addition and subtraction.

So what is the right way to analyze and calculate a derivative? If $h = 10^{-7}$ then the difference $f(x+h) - f(x)$ is left with about 8 of its 15 digits. What is the relative error in the derivative?

$$f(x+h) = f(x) + f'(x)\,h + \frac{f''(x)}{2} h^2 + \dots$$

Subtracting $f(x)$ and dividing by $h$,

$$\frac{f(x+h) - f(x)}{h} - f'(x) = \frac{f''(x)}{2} h + \dots$$

The error of the formula is of the order $h$. The best possible error is the square root of the machine precision. The best accuracy you can get in single precision is about 3 digits. In double, the best you can get is about 7 digits. So this formula is never used. There is a trivial modification that improves accuracy dramatically:

$$\frac{f(x+h) - f(x-h)}{2h}$$

Try decomposing it into Taylor series. Observe that the $f(x)$ term cancels, as do all the even derivatives, $f''(x)$ and so on.

$$f(x+h) - f(x-h) = 2 f'(x)\,h + \frac{f'''(x)}{3} h^3 + \dots$$

Set $h = 10^{-5}$, so $h^2 = 10^{-10}$; this will turn out to be the optimum $h$, and the accuracy will be $10^{-10}$. The difference $f(x+h) - f(x-h)$ is about $10^{-5}$. Its error will be roughly $10^{-15}$ because $f(x+h)$ and $f(x-h)$ are each given to a precision of 15 digits. Then we divide by $h$, which is $10^{-5}$. So the overall error is $10^{-10}$. Just simple division: $10^{-15}/10^{-5} = 10^{-(15-5)} = 10^{-10}$. If $h = 10^{-5}$ then the derivative will be evaluated with error $10^{-10}$, not $10^{-15}$.

The roundoff error grows like $1/h$ in the numerical computation. The Taylor error decays as $h^2$. We want to minimize the sum. The minimum happens to be at $h = 10^{-5}$. In the bad formulation the sum is like $1/h + h$ (with the $1/h$ term multiplied by the machine precision), which is minimized when $h$ is the square root of the precision. In the good formulation it's the cube root. This is the crudest version of the origins of numerical analysis. Most people doing calculations ignore these issues completely. This is especially true in research. Implementation matters.

Now what happens when you do integrals? Riemann integration: approximate with rectangles and take the limit of upper or lower sums. Immediately observe that the problem we encountered with derivatives does not occur: we are not dividing by small numbers. It's okay to integrate functions, but it is not okay to differentiate them. The error we make on any rectangle is a triangle of area about $\frac{1}{2}h^2$. The total number of such triangles is about $1/h$. So we make a mistake of order $h$. It turns out that if instead of the Riemann sum we subtract 1/2 of the first and last rectangles, the error goes to order $h^2$. In order to save two operations, you're taking the square root of your error. For two more operations, you can replace $h$ with $h^4$. If you take $h$ that is too small, though, you just take more time. In differentiation, if you take $h$ too small, you lose digits. So integration is easier than differentiation.

Suppose we have a mapping $F$ from $\mathbb{R}^2 \to \mathbb{R}^1$. We have $\partial F/\partial x$ and $\partial F/\partial y$. Calculate them the way you would think: fix $x$ and differentiate with respect to $y$, and vice versa.

$$F(x + \delta, y + \epsilon) = F(x, y) + \frac{\partial F}{\partial x}\,\delta + \frac{\partial F}{\partial y}\,\epsilon + \dots$$

The error is of the order $\delta^2$ plus something of the order $\epsilon^2$. It's all perfectly mechanical. If a function has three continuous derivatives, then all mixed derivatives of order two are the same.

Lecture 2

Inverse Function Theorem. Suppose $f(x_0) \approx 0$ and $f'(x_0) \neq 0$. Then there is a root nearby, approximately at

$$x = x_0 - \frac{f(x_0)}{f'(x_0)}.$$

$$f(x + t) = f(x) + f'(x)\,t + O(t^2)$$

Newton's Method consists of taking this statement seriously. If we have an approximation $X_k$, we define

$$X_{k+1} = X_k - \frac{f(X_k)}{f'(X_k)}.$$

We take the approximation, follow the tangent, and get closer to the zero. If your error started off at $t$, then after the next step it will be of order $t^2$, and after the step after that, $t^4$. This is quite fast. When Newton can be used, it SHOULD be used! If Newton only divides the error by two at each step, or if it stalls at $10^{-6}$, something is wrong: it's a bug. Any results produced by a code with a bug make no sense: they are not slightly wrong, they're all wrong.
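A minimal sketch of the Newton iteration above, in C. The name `newton`, the stopping rule on the step size, and the example problem $f(x) = x^2 - 2$ are assumptions made for illustration.

```c
#include <math.h>

/* Newton iteration x_{k+1} = x_k - f(x_k)/f'(x_k).
   Stops when the step is smaller than tol, or after max_iter steps
   (the loop has a count, so a failure to converge cannot hang). */
double newton(double (*f)(double), double (*df)(double),
              double x, double tol, int max_iter)
{
    int k;
    double step;
    for (k = 0; k < max_iter; k++) {
        step = f(x) / df(x);
        x -= step;
        if (fabs(step) < tol)
            break;
    }
    return x;
}

/* Illustrative problem: the positive root of x^2 - 2. */
double f_example(double x)  { return x * x - 2.0; }
double df_example(double x) { return 2.0 * x; }
```

Printing `step` at every iteration is a cheap check of the claim in the notes: once the iteration is close, the number of correct digits should roughly double per step, and if it does not, the derivative is probably wrong.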


Linear Algebra

A linear operation is of the form $\sum_i \alpha_i X_i$. We can rotate, compress, and expand spaces. A linear mapping of $\mathbb{R}^n$ into $\mathbb{R}^n$ is naturally written as an $n \times n$ matrix. Notational convention: multiply rows by columns. NOT vice versa. Multiplying matrices is composing linear operators.

$$C_{ij} = \sum_k a_{ik} b_{kj}$$
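The rows-by-columns convention can be written out as a triple loop. This is a sketch under the assumption that matrices are stored row by row in flat arrays; the name `matmul` is illustrative.

```c
/* C = A * B for n x n matrices stored row-major in flat arrays:
   c[i][j] = sum over k of a[i][k] * b[k][j]. Costs n^3 multiply-adds. */
void matmul(int n, const double *a, const double *b, double *c)
{
    int i, j, k;
    double sum;
    for (i = 0; i < n; i++) {
        for (j = 0; j < n; j++) {
            sum = 0.0;
            for (k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}
```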

An orthogonal matrix is defined by the following characteristic: the inner product of every row and every column with itself is ONE, and the inner product of every row and every column with a different one is ZERO. Orthogonal matrices do not change distances. Orthogonal matrices have nice properties: they're invertible. An orthogonal matrix only rotates and reflects; it multiplies every length by one. It cannot have determinant zero. The inverse of an orthogonal matrix is its adjoint. Let's prove this! Take an orthogonal matrix and its adjoint. One is the other flipped. So multiplying row by column in their product is the same as taking the inner product of two rows (or two columns) of the original matrix. So you get one on the diagonal, and zero on the off-diagonal. There's the proof!

Now, $A = U D V^*$ where $U$ and $V$ are orthogonal and $D$ is diagonal. Multiply by the adjoint of $U$:

$$U^* A = D V^*$$

and then multiply on the right by $V$ (the adjoint of $V^*$):

$$U^* A V = D$$

This says you can diagonalize matrices! Or

$$D^{-1} U^* A = V^*, \qquad V D^{-1} U^* A = I.$$

Now we can even INVERT matrices! Once you have singular-value decomposed the matrix it's completely defenseless. The maximum value on the diagonal of $D$ divided by the minimum value is called the condition number. This is an extremely important concept. The number of digits you lose is the log of the condition number, base ten. We'll come back to this. If the matrix is symmetric, the decomposition is simplified: $A = U D U^*$; the other orthogonal matrix can simply be the adjoint of $U$. This is called diagonal form. The values along the diagonal of $D$ are called eigenvalues.

Back to calculus and the inverse function theorem in one dimension. It says that if you have a function, and its linearization

$$f(x + t) = f(x) + f'(x)\,t + O(t^2)$$

is not singular, then in the vicinity there is a root of $f$ and Newton will find it. The same is true in higher dimensions, say for $f: \mathbb{R}^n \to \mathbb{R}^n$. The proper definition of derivative in this environment is a matrix. Now if we have a function at a point $x$, and nearby there is a root, and the derivative is nonsingular as a matrix, then a Newton-style process of the form

$$x_{k+1} = x_k - A^{-1} f(x_k)$$

will get you to that zero. Here $A$ is the matrix of derivatives. It's basically indistinguishable from the one-dimensional version if you don't look at it too closely. This is ALL FALSE if the derivatives aren't continuous everywhere. And even then, the proofs stink. Two continuous derivatives, and the proofs are trivial. We will not cover this in this class.

Lecture 3

What is the most important characteristic of your programming style? A general program of any length cannot be debugged. It costs more than $2^n$ operations to debug a program of length $n$. In practice the situation is even worse. So structured programming limits the number of things that you can do. It's a set of rules that will eliminate the unreasonable cost of debugging code.

Rule 1: No global variables. They can influence anything anywhere. This makes debugging expensive. In a function, all declarations should be at the top.
Rule 2: No GOTOs backwards. This gives you spaghetti code: if something goes wrong at a given line, you don't know where it came from. Sometimes GOTOs forward are ok, to get out of a loop, for instance.
Rule 3: Do not export concepts from one part of the code to another. Suppose you have a code that designs a building. One part deals with windows. You name a variable WindowsillArea. Then you want to do calculations with timber. There, you should name your variables A, B, and C. Write both parts separately and translate between them. Don't do linear algebra with variables named WindowsillArea.
Rule 4: Loops should have counts. If something goes wrong you can print the counter!
Rule 5: If you're going to draw a flowchart, it should go on one page and be sequential.
Rule 6: Code should be documented. Any group of seven lines should be identified with its meaning.
Rule 7: Always have working code in front of you. Start with very short code and immediately check. And keep many backups. (This is homemade version control...)
Rule 8: Print everything. And the printer should be polite. Always echo-print to a file. Make yourself a function that does this for you, to save time.
Rule 9: Do not stare at the code. Run something!

Lecture 4

Code must compile in gcc. No C++. bool is bad. cout is bad. No classes, no function overloading, all bad bad bad. Testing: they will take out our main routine and compile it.

Don't use structures. If you think you need to, you don't. Just use arrays: that's the most complicated thing you'll have to do for this class. Start early. Most assignments shouldn't take that long.


Diagonalization of Matrices

A matrix is normal if it is diagonalizable by a unitary transformation: $A = U D U^*$. Theorem: A complex square matrix $A$ is normal if and only if $A^* A = A A^*$.

Any matrix has a Jordan canonical form: a diagonal sum of Jordan blocks, where each block has a single eigenvalue repeated on the diagonal and a superdiagonal of 1s. If the matrix is diagonalizable, then all the Jordan blocks are single elements. You can't diagonalize everything, though.

Theorem: Every $A$ has a singular value decomposition $A = U \Sigma V^*$, and the $\sigma_l$ are determined uniquely; if $A$ is square and the $\sigma_l$ are distinct, then $u_l$ and $v_l$ are also uniquely determined up to a complex sign.

$$A = \sum_l \sigma_l u_l v_l^*$$

The rank of $A$ is the number of nonzero $\sigma_l$.

Jacobi Eigenvalue Algorithm. For a 2x2 matrix $A = [a, b; c, d]$ with $A^* A = I$:

$$a^2 + c^2 = 1, \qquad b^2 + d^2 = 1, \qquad ab + cd = 0.$$

This is a rotation matrix, $[\cos\theta, \sin\theta; -\sin\theta, \cos\theta]$. Orthogonal transformations preserve the sum of squares of all elements of the matrix, so a rotation that zeroes an off-diagonal element moves its weight onto the diagonal. ...lost track here... More in Dahlquist.

Lecture 5

Orthogonalizing a matrix: I orthogonalize the second row to the first, then normalize; then orthogonalize the third row to the first and second rows, and so on. (Orthogonalize = project onto the orthogonal subspace.) This is called Gram-Schmidt. Unfortunately it's unstable. But it's probably the simplest way to solve linear systems. And it's only a little slower than Gaussian elimination. Suppose you have a 7x7 linear system, to 15 digits, and it's symmetric. The condition number is 100. How do you solve it? It's not complicated; the condition number doesn't matter here. Use Gram-Schmidt or something. At that size, you don't lose that many digits.

You construct an identity matrix, and every time you multiply a row by a number and subtract it, you apply that to the identity matrix; this is the product of the elementary matrices we applied. This just proved that the classical Gram-Schmidt is the product of a triangular and an orthogonal matrix. A pivoted Gram-Schmidt is actually a product of three matrices: triangular, permutation, and orthogonal.

But there are other ways to do this, such as QR factorization. This decomposes the matrix into an orthogonal matrix and a triangular matrix. Take a 2x2 orthogonal matrix and apply it to the first two rows. An orthogonal matrix, 2 by 2, has the form

$$\begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}$$

so we rotate until the last element in one of the two rows is 0. We keep doing this until all of the last column except the bottom right is 0. Then we repeat the process until we make the whole upper triangle 0. This scheme has a conceptual advantage over Gram-Schmidt. You never divide by small numbers, or by any numbers. You apply orthogonal transformations. It's stable by definition. It's also twice as slow as Gaussian elimination. But it's widely used. And it provably works.

At this point we should be ready to implement these methods. The solution of linear systems costs $n^3$. Every one of these methods is $n^3$. There are $n$ passes, each with $n^2$ operations. There is no theorem that prohibits faster methods, but the algorithms used in practice are this slow. But there are many special cases. If you have a matrix that is tridiagonal (only three nonzero diagonals) then a child can solve such a system in order $n$ operations. They don't come up very often, but they're available.

How about 5-diagonal matrices? Still pretty fast: two eliminations in each direction. Banded matrices of bandwidth $m$ and size $n$ cost only $m^2 n$ operations. If $m$ is small you're ok.
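The order-$n$ solve for a tridiagonal system is short enough to write out. This is a sketch of plain elimination without pivoting; the name `tridiag_solve` and the array layout are assumptions, and the code assumes no zero pivot arises (for example, a diagonally dominant matrix).

```c
/* Solve a tridiagonal system in O(n) operations.
   sub[i]  = entry below the diagonal in row i (sub[0] is unused),
   diag[i] = diagonal entry,
   sup[i]  = entry above the diagonal in row i (sup[n-1] is unused),
   rhs[i]  = right-hand side, overwritten with the solution.
   diag is also overwritten. No pivoting: assumes nonzero pivots. */
void tridiag_solve(int n, const double *sub, double *diag,
                   const double *sup, double *rhs)
{
    int i;
    double m;
    /* forward elimination of the subdiagonal */
    for (i = 1; i < n; i++) {
        m = sub[i] / diag[i - 1];
        diag[i] -= m * sup[i - 1];
        rhs[i]  -= m * rhs[i - 1];
    }
    /* back substitution */
    rhs[n - 1] /= diag[n - 1];
    for (i = n - 2; i >= 0; i--)
        rhs[i] = (rhs[i] - sup[i] * rhs[i + 1]) / diag[i];
}
```

Each row is touched a fixed number of times in each sweep, which is where the order-$n$ cost comes from.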

Lecture 6

Inverse power method and the determination of eigenvalues of matrices. Why would anybody want to do that? Suppose we have a matrix $A$. If we apply $A$ to a vector many times and normalize, we will eventually converge to the top eigenvector. Nothing prevents us, however, from replacing $A$ with $(A - \mu I)^{-1}$. If $\mu$ is close to an eigenvalue $\lambda$, $1/(\lambda - \mu)$ will be a very big number, so the projection on that eigenvector will be multiplied by something very large. We will get a convergence that can be fast. $\mu$ is known as the shift.

Can we base a diagonalization of $A$ on some kind of approach like this? This is $n^4$, rather than inversion, which is $n^3$. So this seems insane. Why would you do this? Now suppose you have a tridiagonal matrix. These we know how to solve linear systems with. But if we try to use the normal Jacobi method here, we lose that advantage. We don't know any fast way to diagonalize a dense matrix. Let

$$x_{k+1} = A^{-1} x_k, \qquad M = \frac{\|A x_k\|}{\|x_k\|};$$

this approaches the smallest eigenvalue (in absolute value). If you want to apply the inverse of the same tridiagonal matrix, it's only $n$ operations, so you can apply it a few times. Then shift by $\mu$, and every time you check how close $A x_k$ is to a multiple of $x_k$, that is, how close $x_k$ is to the eigenvector. And if you're sufficiently close to the solution, you get cube-of-error convergence. This is very fast. You run out of precision before you have time to notice.

That only gives us one eigenvector, though. What about the rest? You can't just subtract it from the matrix or you'll get a dense matrix. You already have one eigenvector; orthogonalize the iterate to that eigenvector. At every step. Before you compute the inverse and after. This is order $n$ for each eigenvector already found, and we have to do it at least 2 or 3 times: about $n(n/2)$ operations per iteration. And again there are $n$ eigenvectors, so you'll expend $n^3$ operations as usual.

But. There's something else you can do that works. If you have found an eigenvalue, it's closer to $\mu$ than any other eigenvalue; there will be no tendency to find eigenvalues far away from $\mu$. In practice, if you use a random number generator for your starting vector, the probability of getting the wrong eigenvalue is very low. Anything within 0.01n of $\mu$ I accept; anything farther away I ignore. Rule of thumb. This is a viable method for diagonalizing tridiagonal matrices. This does not tell us how to diagonalize non-tridiagonal matrices.

Consider a matrix where you can't do Gaussian elimination on the next-to-last bottom right rows, and suppose there's stuff on the diagonal. Eliminate the second column from the right; but you'll have to stop at the third-to-last row. We're rapidly approaching the appearance of a tridiagonal matrix. We'll have eliminated all elements except for a tridiagonal matrix. How long does this take? Order $n$ to kill each element. There are about $n^2$ elements to kill, and each costs $n$ to destroy, so $n^3$. The cost is not really that different from the cost of solving a linear system using QR in the first place. This is called the classical Jacobi tridiagonalization of a matrix. It's sort of a small $n^3$. 500x500 matrix: fine; 1000x1000 matrix: not so much.

BUT when you diagonalize a tridiagonal matrix, you get the eigenvalues right along the diagonal. You do not accumulate them as the result of multiplication of some matrices. With this tridiagonalization, you get there with 2x2 rotations. In order to get the eigenvectors of the global matrix, you have to apply these rotations to the matrix of eigenvectors that you computed for the tridiagonal matrix: an orthogonal matrix obtained by the inverse power method, and then these $n^2$ 2x2 rotations. Each operation costs you $n$. The $n^3$ components of this are two: one, tridiagonalizing the matrix; two, applying the rotations to the matrix of eigenvectors. This is a few times more expensive than solving a linear system.

A different method: apply the inverse power method. In order to prevent yourself from converging to eigenvalues you've already found, orthogonalize to the old eigenvectors. If you get a matrix with a big cluster of eigenvalues, all close to each other, you'll get screwed.

Lecture 7

Take a tridiagonal matrix and try to diagonalize it. One of the elements on the diagonal will be zero if the determinant is zero. First discuss the case where it is zero on the diagonal in the original tridiagonal matrix. Then applying an orthogonal transformation on the right doesn't change it. Once the first row consists of zeros, then the first column consists of zeros, so we have the first eigenvalue; the first row of the orthogonal transformation matrix will be the eigenvector corresponding to the first eigenvalue. Now we have a new matrix, to which you can apply the same procedure. How much is this going to cost? Well, it's a tridiagonal matrix; we rotate (orthogonal transformations) but we don't have to rotate zeros. It takes $n$ operations for the whole thing, but since we have to do this $n$ times, that's $n^2$ operations. This is the QR scheme. It gives all eigenvalues. Convert the tridiagonal matrix into upper triangular times orthogonal. (Get zeros all along one of the band edges.)

QR method: factor $A_i = Q_i R_i$, then let $A_{i+1} = R_i Q_i = Q_i^T A_i Q_i$. This has the same eigenvalues (since $Q_i^T Q_i = I$). Repeat until convergence. If it converges to a diagonal matrix, that gives the eigenvalues. It converges if the eigenvalues all have distinct absolute values. The orthogonal transformation, at every step, is the rotation that brings a pair of rows closest to upper triangular.

You don't lose precision when eigenvalues are close. But if the last two eigenvalues are tiny, if you do an implicit QR, you'll get extra precision on the smallest eigenvalues; if you use explicit QR, you'll lose enough precision that they seem identical. Because you took a tiny number, and subtracted 1 and then added it back, which gives you 0. Because we replace $A$ with $A - \mu I$ and run QR on it. Tridiagonal matrices: you cannot get the smallest eigenvalue, if it's very small, except to observe that it's close to zero. Tridiagonal matrices, it turns out, can avoid this issue. In a diagonal matrix, you obviously can just read off the bottom eigenvalues. Eigenvectors are basically localized in different parts of the matrix.

Now we look at approximating functions. You're given a function and some values, and you want to approximate the function. Obvious approximation: polynomial. Slightly more complicated: rational function.

$$\prod_{i \neq k} (x - x_i)$$

Look at this polynomial. LOOK AT IT. It's equal to zero at all points except one. But if we divide it by $\prod_{i \neq k} (x_k - x_i)$, we have a polynomial that's equal to zero at all points except one, where it's equal to one. Suppose $f(x_i) = f_i$. Then consider

$$\sum_k f_k \, \frac{\prod_{i \neq k} (x - x_i)}{\prod_{i \neq k} (x_k - x_i)}.$$

This is the only polynomial of order $n$ going through all $n$ points $f_k$. Suppose we have $\sum_k p_k x^k$ with $\sum_k p_k x_i^k = f_i$. If $A_{ij} = \phi_i(x_j)$ is a nonsingular matrix, the functions $\phi_i$ are referred to as a Chebyshev system. The polynomials of the above form are a Chebyshev system.

You can use the Euclidean algorithm to divide polynomials: $P_n = S_{n-m} Q_m + R_{m-1}$, where the subscripts are the degrees and $R_{m-1}$ is the remainder.

Lecture 8
$$\sum_i f_i \prod_{j \neq i} \frac{x - x_j}{x_i - x_j}$$

This will recover a polynomial through the points $f_i$. This is fine in exact arithmetic. But there is potential for dividing by a small number. There is instability. Some of these products are big. Your polynomial is equal to the function at the points; but the derivatives may not be the same. You have a function which you approximate with a polynomial, and then discover with horror that it does weird things between the points. Very large or very small between the points. This is called Runge's phenomenon.

It turns out that Runge's phenomenon is easily defeated by a different choice of points. Chebyshev points: take points equally spaced on the circle and their images on the unit interval. If you use these as interpolation points, you get error that is uniformly distributed. As long as all the polynomials are small, you will not have big error. A polynomial of the above form, with $x_j$ the Chebyshev nodes, is the smallest polynomial of order $n$ passing through the $f_i$. About $f$ we require nothing: only continuity. Let's approximate it with a polynomial of order $n$ in two ways. Way one: take Chebyshev nodes, approximate as above.

$$x_i = \cos\left(\frac{2i - 1}{2n}\,\pi\right)$$

There is a unique polynomial $P_{n-1}$ which has value $f(x_i)$ at all the points $x_i$. Interpolation error at $x$, if $f$ is $n$ times differentiable:

$$f(x) - P_{n-1}(x) = \frac{f^{(n)}(\xi)}{n!} \prod_i (x - x_i)$$

so you try to minimize $\max \left|\prod_i (x - x_i)\right|$, the maximum of a monic polynomial of degree $n$. The maximum is bounded below by $2^{1-n}$, which is attained by the Chebyshev polynomial $2^{1-n} T_n$. So

$$|f(x) - P_{n-1}(x)| \le \frac{1}{2^{n-1}\, n!} \max_{\xi \in [-1,1]} |f^{(n)}(\xi)|$$

This error is denoted $E_n^c(f)$. Now, suppose God gives you the best approximation of order $n$. Its error will be denoted $E_n(f)$. Error refers to maximum error. Consider the ratio

$$\frac{E_n^c(f)}{E_n(f)}.$$


This tells us how bad the Chebyshev error is compared to the best achievable. This is less than 4 so long as $n \le 20$. Less than 5 for $n \le 100$. It generally grows as $\log n$.

Note: $\sum_k \alpha_k x^k$ are NOT good ways to represent polynomials. You get a set of linear equations, one equation for each point, $n$ equations and $n$ unknowns, and you can solve the linear system numerically for the coefficients $\alpha_k$. All good? There's a unique solution. But polynomials are not well represented as functions by their coefficients. Ill-conditioned for large $n$.

Now we look at orthogonal polynomials. Vectors can be orthogonalized: this is Gram-Schmidt. If you have a bunch of functions $f_1, \dots, f_n$, nothing prevents you from normalizing them (in $L^2$) and performing Gram-Schmidt as per usual. For functions $f_i$ we can take the monomials $1, x, x^2, \dots$. The result is known as orthogonal polynomials. The procedure just described will not work on the computer. For all practical purposes, these functions are linearly dependent. When you orthogonalize $x$ to 1, and normalize, continue... by the time you get to the 8th polynomial, the denominator is small. You'll lose a bunch of digits. By the time the remainder is of order $10^{-15}$ you'll get exactly zero digits. Instead, orthogonalize each to the preceding polynomial. This is a perfectly stable process. Take some other polynomial (not $x^n$) and orthogonalize it to the previous polynomial. $P_n$ is defined to be orthogonal to the previous ones; consider

$$\int P_n \, x P_k \, dx.$$

But $P_n$ is orthogonal to all polynomials of order less than $n$. So long as $k + 1 < n$ this is already zero. Orthogonal polynomials satisfy a three-term recurrence:

$$P_{k+1} = a_k \, x P_k + b_k \, P_{k-1}$$

for suitable constants $a_k$, $b_k$. Legendre polynomials are orthogonal polynomials on $[-1, 1]$. $n$ wiggles if the polynomial is of order $n$. They represent an orthonormal basis of $L^2$. Good way to represent functions. If we represent a function in the form $\sum_k \alpha_k P_k(x)$, this is widely used and stable. The Chebyshev nodes and the roots of the Legendre polynomial are almost the same for large $n$. The same construction with the weight $1/\sqrt{1 - x^2}$ gives the Chebyshev polynomials; the Chebyshev nodes are their roots. Chebyshev polynomials are special in one characteristic: they have a simple elementary formula,

$$T_n(x) = \cos(n \arccos x).$$

Hermite polynomials are orthogonal on the real line; the weight function is $e^{-x^2}$.
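As a concrete instance of a three-term recurrence, the Chebyshev polynomials satisfy $T_{k+1} = 2x T_k - T_{k-1}$ with $T_0 = 1$, $T_1 = x$ (a standard identity, stated here as background rather than taken from the lecture). A minimal sketch in C, with the illustrative name `chebyshev_T`:

```c
#include <math.h>

/* T_n(x) by the three-term recurrence T_{k+1} = 2 x T_k - T_{k-1},
   starting from T_0 = 1 and T_1 = x. */
double chebyshev_T(int n, double x)
{
    int k;
    double t_prev = 1.0, t_curr = x, t_next;
    if (n == 0)
        return 1.0;
    for (k = 1; k < n; k++) {
        t_next = 2.0 * x * t_curr - t_prev;
        t_prev = t_curr;
        t_curr = t_next;
    }
    return t_curr;
}
```

On $[-1, 1]$ the result can be checked against the closed form `cos(n * acos(x))`; the recurrence also works outside that interval, where `acos` does not.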


Lecture 9

There is another class of algorithms related to orthogonal polynomials, and that is quadratures. Suppose you want to calculate

$$\int_a^b f(x)\,dx.$$

Riemann approximation is bad; error of order $1/n$. $\sum_i f_i h$: this is the rectangle rule. $\sum_i f(x_i) w_i$ is a generalized rule. If the $x_i$, $w_i$ are properly chosen, it may be a better approximation. Trapezoidal rule: alternative to the rectangular rule. But the rectangular rule is the same as the trapezoidal rule except at the endpoints, where the values have to be divided by two. As we saw before, convergence of order $h^2$. Simpson's rule:

$$\frac{b - a}{6}\left[f(a) + 4 f\left(\frac{a + b}{2}\right) + f(b)\right]$$

This is a quadratic interpolation. Book recommendations: Gradshteyn and Ryzhik; Abramowitz and Stegun. Newton-Cotes formulas: use the endpoints of an interval. (The ones that don't use the ends are called Newton-Cotes formulas of open type.) Unbelievably General Fact: Euler-Maclaurin Formula.

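$$\int_a^b f(x)\,dx - T_n(f) \sim -\sum_{k \ge 1} \frac{B_{2k}}{(2k)!} \left(\frac{b - a}{n}\right)^{2k} \left[f^{(2k-1)}(b) - f^{(2k-1)}(a)\right]$$

Here $T_n(f)$ is the trapezoidal rule with $n$ panels and the $B_{2k}$ are the Bernoulli numbers.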

Richardson Extrapolation. Suppose you have a quantity $Q$ that you want to calculate: an integral, or a measure. By expending an amount of work proportional to $n$, you can calculate an approximation $T_n$ to that quantity, and the difference will behave like

$$Q = T_n + \frac{\alpha}{n} + \frac{\beta}{n^2} + \frac{\gamma}{n^3} + \dots$$

Also observe that

$$Q = T_{2n} + \frac{\alpha}{2n} + \frac{\beta}{4n^2} + \dots$$

by the same argument, applied to $2n$. Nothing prevents you from taking the second identity, multiplying by two, and subtracting the first identity from it:

$$Q = 2T_{2n} - T_n + \frac{C}{n^2}$$

We took a process that had convergence rate $1/n$, doubled the cost, and converted it into a process that converges like $1/n^2$. It is very useful. In fact the convergence doesn't even have to be of integer power; Richardson Extrapolation can be used in a variety of cases. The drawback is that you double the number of points each time. You can't do it too many times: you're multiplying $n$ by $2^k$, where $k$ is the number of times you do Richardson. Could you, instead of doubling, multiply by a smaller number? The denominator then starts to go down... which means you're dividing by a small number, and you start to have problems when the factor goes down below 2.

Gaussian Quadrature. You have an integral,

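$$\int_{-1}^{1} f(x)\,dx$$

A trivial way to do it is to take a bunch of points, approximate with a polynomial, and integrate the polynomial. If you approximate with a polynomial of order $n - 1$, you'll need $n$ points, and the error will be of order $h^{n-1}$. You have $n$ points and you choose $n$ weights:

$$\sum_i w_i f(x_i)$$

It's reasonable, but it's not too reasonable. Because we have $2n$ parameters: the $w_i$ and the $x_i$.

$$f_{2n-1} = p_n q_{n-1} + r_{n-1}$$

where the $p_n$ are the orthogonal polynomials, the $x_i$ (the roots of $p_n$) are the Gaussian nodes, and $r_{n-1}$ is the remainder. Since $p_n$ is orthogonal to $q_{n-1}$ and vanishes at the nodes, a rule that is exact for $r_{n-1}$ is exact for $f_{2n-1}$. The polynomial

$$\left(\prod_{i \neq m} (x - x_i)\right)^2$$

has integral equal to its value at $x_m$ times the weight $w_m$; the $m$th weight of a Gaussian quadrature is the ratio of two positive numbers, and hence positive. How does this help?

$$\int_{-1}^{1} 1\,dx = 2$$

The sum of the weights equals two; they're all positive. So all of them are less than 2.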



Lecture 10

Consider two orthogonal polynomials (of whatever form). Say they're of order less than $n$. And take their product $P_i(x) P_j(x)$. Because they're orthogonal, its integral is 0 unless $i = j$. The product has degree less than $2n$, so the Gaussian quadrature integrates it exactly:

$$\sum_k P_i(x_k) P_j(x_k)\, w_k = \int P_i(x) P_j(x)\,dx$$

If we construct a matrix of $P_i(x_j) \sqrt{w_j}$, then this matrix is orthogonal; the product of two rows is simply the above integral. That is a very interesting statement. We just proved that if we take a function and decompose it into a linear combination of orthogonal polynomials, and if we're given that function at the nodes, it is a stable process. We need to apply some matrix, and the matrix doesn't have large elements. It's an orthogonal matrix. All of its singular values are one. So no error will be amplified. Decomposition into polynomials, provided you can evaluate the function wherever you like, is a stable process. This gives you a way to create interpolation schemes; they consist of applying the matrix $P_i(x_j) \sqrt{w_j}$. This also illustrates the Runge phenomenon from taking equally spaced nodes instead of Gaussian nodes.

$$\sum_i \left(P_i(x_j) \sqrt{w_j}\right)^2 = w_j \sum_i P_i(x_j)^2 = 1$$

because it's an orthogonal matrix.

Finite Difference Newton: calculate the derivative approximately, as

$$\frac{f(x+h) - f(x-h)}{2h},$$

and use this in the model $f(x) = f(x_0) + h f'(x_0) + O(h^2)$.



Lecture 11

If you dont have access to a derivative, you might want to use Finite Dierence Newton, where you approximate the derivative with a nite dierence. Theres also something called the Secant Method. Have two points on either side of the presumed zero. Approximate the derivative between these two points by f (xk ) f (xk1 ) xk xk1 You do nite dierence Newton, but you use the points in previous iterations. One function evaluation instead of two, at the expense of less accuracy of the derivative. Every iteration 16

raises the error to power 1.5 or so. Generally secant method should not be used. At some point xk xk1 becomes eectively zero and you get complete noise. If you have a poor termination criterion, you can be in big trouble. Moreover, when youre far away from the solution, Newton is very robust. For various method, secant method loves to go haywire. Third method: Miller Method. There is no rule against using higher derivatives of higher order Taylor series. Approximate the function by a quadratic function. f (xk ) + f (xk ) t + f (xk )/2t2 Find the root of this quadratic polynomial and choose the one nearest to xk . This is a cubic order convergence scheme. If you are doing Newton, you have to square the error every step. If it doesnt, you have a bug. If it doesnt square the error, its not Newton. But I know its Newton, I implemented it! No you didnt. If it doesnt square the error youre out of luck. Additional step in optimization that is extremely useful. Make a Newton step. Go halfway. Evaluate the function to see if youve reduced your function. Half again, half a few times, etc. Makes the whole thing more forgiving. Youre not getting quadratic convergence, but it gives you an ad-hoc idea of whats going on. Recall that we can calculate orthogonal polynomials due to the existence of a three-term recurrence. Suppose we want to nd the roots of an orthogonal polynomial of order 100? Near the ends, distance between roots will be roughly 1/n2 . Then do bisection on each interval. Its slow, but it works for low orders. If its a Chebyshev polynomial, then we can nd the roots directly, because theres a formula. Otherwise, we can do Newton. For which we need the derivative. But three-term recurrences allow us to calculate the derivative. Pk+1 (x) = xPk (x) + Pk1 (x) Dierentiate! Pk+1 (x) = Pk (x) + xPk (x) + Pk1 (x) So we know the rst two derivatives, and then we can just do recurrence to get all the derivatives. We have a recurrence on the derivatives! 
This is the simplest way to find the roots. Golub-Welsch is the standard way to implement root finding for orthogonal polynomials, which is somewhat slower and less accurate. Gaussian quadratures are a good way to integrate orthogonal polynomials. Good clean fun. But polynomials are not the only tool for representing functions. The other is complex exponentials.


Functions on the interval $[0, 2\pi]$.

$$(f, g) = \int_0^{2\pi} f(x)\,\overline{g(x)}\,dx$$

This is the inner product.


Lecture 12
$$\sum_k \alpha_k e^{ikx} = f(x)$$

The error is of order $1/m^p$, where $m$ is the number of Fourier coefficients. The error is big until you exhaust the frequency content, and then it dies. The approximation with a Fourier series is either exact, or it's not good at all.

Up to normalization,

$$\int_0^{2\pi} f(x)\,e^{-imx}\,dx = \alpha_m.$$

Integrate by parts (the boundary terms cancel by periodicity):

$$\alpha_m = \frac{1}{im}\int_0^{2\pi} f'(x)\,e^{-imx}\,dx,$$

$$|\alpha_m| \le \frac{1}{m}\int_0^{2\pi} |f'(x)|\,dx \le \frac{2\pi \max|f'(x)|}{m}.$$

If you take more derivatives, instead we have the maxima of $f''$ and so on, as long as the derivatives of $f$ stay continuous. If a function has $p$ continuous derivatives, then the convergence rate of the error is $1/m^p$.

When a function is periodic on an interval, it had better be tabulated at evenly spaced points, so as not to bias in favor of any particular wiggles. More technical explanation: take a function of the form $e^{imx}$ which you're trying to integrate on an interval. The trapezoidal rule gives

$$h\sum_k e^{imx_k}.$$

This is a geometric progression:

$$h\,\frac{1 - (e^{imh})^n}{1 - e^{imh}}.$$

Any function of this form, when integrated by a trapezoidal rule with enough points, is integrated exactly. The error of the projection using a trapezoidal rule decays like $1/n^p$, where $n$ is the number of points and $p$ is the number of continuous derivatives.
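The geometric-progression argument is easy to check numerically. The following is a small sketch, assuming the interval $[0, 2\pi]$ with $n$ equispaced points and the endpoints identified by periodicity; the function name is made up for illustration.

```python
import cmath
import math

def trapezoid_of_mode(m, n):
    """Trapezoidal approximation of the integral of exp(i*m*x) over [0, 2*pi]
    using n equispaced points (periodic, so both endpoints share one node)."""
    h = 2.0 * math.pi / n
    return h * sum(cmath.exp(1j * m * k * h) for k in range(n))

n = 16
# For 0 < m < n the exact integral is 0 and the rule reproduces it to rounding error.
errors = [abs(trapezoid_of_mode(m, n)) for m in range(1, n)]
# For m = n the mode is indistinguishable from a constant on the grid (aliasing):
# the exact integral is still 0, but the rule returns 2*pi.
aliased = trapezoid_of_mode(n, n)
```

The second case is what "enough points" means: the rule is exact only as long as the frequency is below the number of points.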


Lecture 12

The Fourier Transform diagonalizes convolution.

$$(f * g)(x) = \int f(t)\,g(x - t)\,dt$$

$$f(t) = \int e^{ikt}\,\hat f(k)\,dk, \qquad g(t) = \int \hat g(k)\,e^{ikt}\,dk$$

$$(f * g)(x) = \iiint \hat f(k)\,e^{ikt}\,\hat g(l)\,e^{i(x-t)l}\,dk\,dl\,dt$$

Change the order of integration. Change of variables: $t - x = \tau$, $t = \tau + x$. Then $e^{ik(\tau + x)} = e^{ik\tau}e^{ikx}$. When we convolve two functions, we multiply their Fourier transforms. If you could magically get the Fourier transform of every function, you could convolve any two functions instantly. The FFT is the fast way to do Fourier transforms: $n \log n$ time. If you try this with a finite interval and sums instead of integrals, you can get nonsense. What you get will be periodic. You have sequences on a circle, not at infinity.

Related application: filtering of functions. You have a function that's either a sum or an integral.

$$\varphi(x) = \int e^{ikx}\,\sigma(k)\,dk$$

You need to filter it. You Fourier transform it, remove high frequencies, and then pass it through whatever your filter is. Let's take a constant $\sigma$. See what happens if you truncate it at some finite frequency $a$.

$$\int_{-a}^{a} e^{ikx}\,dk = \frac{2\sin(ax)}{x} \sim \frac{e^{iax}}{x}$$

You get ringing at the frequency at which you truncate. A function that's constant in the frequency domain is a blip. But truncating it makes it decay as $1/x$: instead of a beep, it lasts forever. Instead of a brick-wall filter, you design a filter that's one at the frequencies you want to reproduce exactly, and bells downward outside. Filter design is a whole subject.

Common homework mistakes: 1) forgetting to pivot. 2) misunderstanding the Double Gram-Schmidt. We have $(q_1, q_2, \dots, q_{i-1}, v_i, \dots, v_n)$ and we're at step $i$. First: pivot (look through the rest, find the vector with largest magnitude, call that $v_i$). Then: reorthogonalize. Set $v_i$ to be orthogonal to $q_1, \dots, q_{i-1}$. Then normalize $v_i$ and call it $q_i$. Then orthogonalize $v_{i+1}, \dots, v_n$ to $q_i$.

Next homework: the homework asks you to find the matrix $B$ such that $UAU^T = B$, where $B$ is Hessenberg. Why bother with Hessenberg when we can get a triangular matrix with just $UA$? What we want is to apply Givens rotations on both sides to get a similarity transformation. At every step apply a left transformation and a right transformation. Hessenberg, not upper triangular, because you have to stop early enough that the transformations don't mess anything up.
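The pivot / reorthogonalize / normalize / orthogonalize-the-rest steps listed under the homework mistakes can be sketched as follows. This is a minimal sketch on plain Python lists, not the homework solution; the function names and the rank tolerance `tol` are assumptions.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def pivoted_double_gram_schmidt(vectors, tol=1e-12):
    """Return an orthonormal list q spanning the input vectors."""
    v = [[float(a) for a in w] for w in vectors]
    q = []
    for i in range(len(v)):
        # Pivot: move the largest remaining vector into position i.
        p = max(range(i, len(v)), key=lambda j: dot(v[j], v[j]))
        v[i], v[p] = v[p], v[i]
        # Reorthogonalize v_i against q_1 .. q_{i-1} (the second, "double" pass).
        for qj in q:
            c = dot(qj, v[i])
            v[i] = [a - c * b for a, b in zip(v[i], qj)]
        norm = math.sqrt(dot(v[i], v[i]))
        if norm < tol:
            break  # everything that remains is numerically in the span of q
        qi = [a / norm for a in v[i]]
        q.append(qi)
        # Orthogonalize the remaining vectors v_{i+1} .. v_n to q_i (the first pass).
        for j in range(i + 1, len(v)):
            c = dot(qi, v[j])
            v[j] = [a - c * b for a, b in zip(v[j], qi)]
    return q
```

Because of the pivoting, the loop can stop as soon as the largest remaining vector is below the tolerance, which also gives the numerical rank for free.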


Lecture 13

Last time: numerical solution for differential equations. The Euler method is sort of a viable approach. The convergence is $1/n$, where $n$ is the number of points. And then you apply Richardson extrapolation to speed up convergence. Adaptive version of Euler? Make smaller steps when the function changes faster? If you do that, you will mess up Richardson. There is no more step $h$.

What is the next step? 2nd order Runge-Kutta. Suppose we have a differential equation $\varphi' = f(\varphi, t)$ and an initial value. Suppose we start making Euler steps. The Euler step says that $\varphi_{k+1} = \varphi_k + h f(\varphi_k, t_k)$. How could we make this better? We integrated approximately over the interval of length $h$. We replaced the integral with a rectangle. This is stupid; we could use the trapezoidal rule. The rectangle rule gives us error $h^2$ but the trapezoidal rule gives us error $h^3$. But we'd need the second value, which is not available; only an approximation obtained from Euler.

$$\varphi_{k+1} = \frac{h}{2}\left(\varphi'_k + \varphi'_{k+1}\right) + \varphi_k \approx \varphi_k + \frac{h}{2}\Big(f(\varphi_k) + f\big(\varphi_k + h f(\varphi_k)\big)\Big)$$

This has error of order $h^3$ per step. Euler predictor and trapezoidal corrector. The total error behaves like $1/n^2$, where $n$ is the number of points of the discretization. This is a good initial method; it is too easy to mess up more sophisticated methods. Runge-Kutta methods are trapezoidal rules and higher order. You rarely use order higher than 6. Integrating on an interval: the first step is the predictor, which gives $\varphi$ via Runge-Kutta or the Euler method. Then $f(\varphi)$ gives you $\varphi'$, so you can integrate approximately. This is the corrector. A predictor-corrector of reasonably high order is a good solution. You don't want to do it over 12th order.

New concept: stiff system of differential equations. There is more than one scale, and the faster scale contributes nothing or little to the solution, but makes all the above methods impossible. Suppose you have a linear differential equation, a system of equations, $\varphi' = A\varphi$. $A$ is diagonal. Say one of the terms is $-10^6$. Then that component satisfies

$$\varphi' = -10^6\varphi, \qquad \varphi(t) = \varphi(0)\,e^{-10^6 t}.$$

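Any projection onto the first coordinate is killed immediately. The second coordinate is more mild. In order to discretize this system of differential equations, how many points do you need? 100 points should be enough for the $\varphi' = \varphi$ solution. But what will happen if we try to use Euler? For the mild component,

$$\varphi_{k+1} = \varphi_k + h\varphi_k = (1 + h)\varphi_k, \qquad \varphi_k = (1 + h)^k\varphi(0),$$

which is fine. For the fast component,

$$\varphi_{k+1} = \varphi_k - 10^6 h\,\varphi_k = (1 - 10^6 h)\varphi_k,$$

and this will explode unless $|1 - 10^6 h| \le 1$. This component of the solution is basically 0, but Euler requires a step of order $10^{-6}$, that is, on the order of a million points, just to keep it from blowing up. If the equations do not decouple (as they do in this lucky example) then we are in trouble.

How do you deal with stiff differential equations? The Euler method says $\varphi_{k+1} = \varphi_k + h\varphi'_k$. The result is $(1 + h\lambda)\varphi_k$ if the differential equation is $\varphi' = \lambda\varphi$. But

$$\varphi_{k+1} = \varphi_k + h\varphi'_{k+1} = \varphi_k + h\lambda\varphi_{k+1}$$

is true too; we just don't have $\varphi_{k+1}$. Solve for it:

$$\varphi_{k+1}(1 - h\lambda) = \varphi_k, \qquad \varphi_{k+1} = \frac{\varphi_k}{1 - h\lambda}.$$

In general,

$$\varphi_{k+1} = \varphi_k + h f(\varphi_{k+1}), \qquad \varphi_{k+1} - h f(\varphi_{k+1}) = \varphi_k.$$

Draw the curve or curves that separate, in $h\lambda$-space, the region where the amplification factor is greater than 1 in absolute value from the region where it is not. Determine the curves numerically, and then compute inside and outside.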
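One rough way to "determine the curves numerically" is to sample the amplification factor on a grid of complex $z = h\lambda$ and mark where its absolute value is at most 1. The sketch below does this for Euler, for the implicit (backward) Euler step, and for the Euler predictor / trapezoidal corrector; the function names, the grid ranges and the character-based picture are assumptions chosen for illustration.

```python
def amplification_forward_euler(z):
    # phi_{k+1} = (1 + z) phi_k with z = h * lambda
    return abs(1 + z)

def amplification_backward_euler(z):
    # phi_{k+1} = phi_k / (1 - z)
    d = abs(1 - z)
    return float("inf") if d == 0 else 1 / d

def amplification_predictor_corrector(z):
    # Euler predictor followed by the trapezoidal corrector: 1 + z + z^2 / 2
    return abs(1 + z + z * z / 2)

def stability_picture(amplification, re_min=-4.0, re_max=2.0, im_max=3.0, cols=49, rows=25):
    """Return rows of text: '#' where |amplification| <= 1, '.' elsewhere."""
    lines = []
    for r in range(rows):
        im = im_max - 2 * im_max * r / (rows - 1)
        line = ""
        for c in range(cols):
            re = re_min + (re_max - re_min) * c / (cols - 1)
            line += "#" if amplification(complex(re, im)) <= 1 else "."
        lines.append(line)
    return lines
```

Printing `"\n".join(stability_picture(amplification_forward_euler))` shows a small disk around $z = -1$, while the backward Euler picture is marked everywhere except a disk around $z = +1$; that difference is the whole story of why the implicit step tolerates $\lambda = -10^6$ with a large $h$.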



Lecture 14

Storing numbers on a computer... $a = 1.2345$, $b = 1.2333$, $a - b = 0.0012$: five digits go in and two come out. Or, for series:

$$\sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \dots$$

If you attempt to compute $\sin(100)$, it's some number of absolute value less than 1, but the series is $100 - 10^6/6 + 10^{10}/120 - \dots$: too many big numbers added and subtracted, and you'll have problems implementing this. What can you do about this?

Recall the Fibonacci sequence.

$$F_{n+2} = F_{n+1} + F_n, \qquad J_{n+1} = \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix} J_n$$

Diagonalize this and find the eigenvalues:

$$\lambda_{1,2} = \frac{1 \pm \sqrt{5}}{2}, \qquad F_n = \alpha\lambda_1^n + \beta\lambda_2^n.$$

Save $\alpha$ and $\beta$ and that's an efficient way to find $F_n$. What if $\alpha = 0$? Numerically, $\alpha \approx 10^{-16}$. If you try to plot it, you'll get

$$F_n \approx \alpha_{\text{num}}\lambda_1^n.$$

For these initial conditions, the operation is unstable. The operator has two eigenvectors; its behavior is governed by two eigenvectors, one large, one small. If you follow the small one, you'll lose everything. This is an alternative formulation of a stiff problem.

For differential equations: if you replace the Euler method with a scheme called Backward Euler, then instead of $\varphi_{k+1} = \varphi_k + h f(\varphi_k)$ as in Euler, you can do

$$\varphi_{k+1} = \varphi_k + h f(\varphi_{k+1}).$$

Then if you have one mode that decays very rapidly, the corresponding component does decay. It doesn't decay in a way that constitutes a solution; it just goes down where it belongs.
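The difference between the two Euler variants on a rapidly decaying mode can be seen in a few lines. This is a minimal sketch for the linear test equation $\varphi' = \lambda\varphi$ only, where the implicit equation can be solved by hand; the function names and the step sizes in the usage lines are assumptions.

```python
def forward_euler(lam, phi0, h, steps):
    """phi_{k+1} = phi_k + h * lam * phi_k."""
    phi = phi0
    for _ in range(steps):
        phi = (1.0 + h * lam) * phi
    return phi

def backward_euler(lam, phi0, h, steps):
    """phi_{k+1} = phi_k + h * lam * phi_{k+1}, solved for phi_{k+1}."""
    phi = phi0
    for _ in range(steps):
        phi = phi / (1.0 - h * lam)
    return phi

lam = -1.0e6
exploded = forward_euler(lam, 1.0, 0.01, 20)    # factor (1 - 10^4) per step: blows up
decayed = backward_euler(lam, 1.0, 0.01, 20)    # factor 1 / (1 + 10^4) per step: dies
```

The backward Euler values are not an accurate picture of $e^{-10^6 t}$ at this step size; they simply go to zero, which is all that is needed from a mode that contributes nothing to the solution.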

You can, however, do better than this. Euler gives first-order convergence; a more accurate approximation can give you 2nd-order or higher convergence. But this doesn't work for stiff problems. The only other method that works for stiff problems is called Crank-Nicolson.

$$\varphi_{k+1} = \varphi_k + \frac{h}{2}\big(f(\varphi_k) + f(\varphi_{k+1})\big)$$

You get an equation which you can solve for $\varphi_{k+1}$. What is the stability behavior? Apply it to the equation $\varphi' = \lambda\varphi$. For which $\lambda$ is this stable?

$$\varphi_{k+1} - \frac{h\lambda}{2}\varphi_{k+1} = \varphi_k + \frac{h\lambda}{2}\varphi_k, \qquad \varphi_{k+1} = \frac{2 + h\lambda}{2 - h\lambda}\,\varphi_k$$

For really large $\lambda$, the factor approaches 1 in absolute value, so the mode will be barely stable. It turns out there are no schemes like this of order higher than two where the modes are stable.

A different class of ODEs: boundary value problems. Suppose $\varphi$ has two dimensions, and you have a differential equation $\varphi' = f(\varphi, t)$. You need two conditions. Say you have $\varphi_1(0)$ and $\varphi_2(1)$. What's the most obvious way to deal with such a problem? On the left side, you know $\varphi_1(0)$. If we knew the second coordinate we could just march to the right. Let's take some value $\varphi_2^0$ for it, just an arbitrary guess. Then we solve a Cauchy problem. We don't get the correct value at the right-hand side, but we get some value. Now we try to find the correct value of $\varphi_2(0)$ so that the corresponding value at the right endpoint would be correct. This is known as the shooting method. In reality you wouldn't shoot truly at random. Shoot above from the left and below on the right... if the function is monotone, at least it will converge. But... sometimes we need the differential equation to be solved simultaneously over the whole range. Let's say you have a rocket, you start on earth, and you want to hit Jupiter.
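The shooting method can be sketched on a toy problem. Everything specific here is an assumption made for illustration: the system $\varphi_1' = \varphi_2$, $\varphi_2' = \varphi_1$ with $\varphi_1(0) = 0$ and $\varphi_2(1) = \cosh 1$ (exact solution $\sinh t$, $\cosh t$), the Euler predictor / trapezoidal corrector as the marching scheme, and bisection on the unknown $\varphi_2(0)$.

```python
import math

def f(phi, t):
    # Assumed toy right-hand side: phi_1' = phi_2, phi_2' = phi_1.
    return (phi[1], phi[0])

def march(phi0, t0, t1, n):
    """Solve the Cauchy problem with an Euler predictor and trapezoidal corrector."""
    h = (t1 - t0) / n
    phi, t = tuple(phi0), t0
    for _ in range(n):
        k1 = f(phi, t)
        predicted = tuple(p + h * k for p, k in zip(phi, k1))
        k2 = f(predicted, t + h)
        phi = tuple(p + 0.5 * h * (a + b) for p, a, b in zip(phi, k1, k2))
        t += h
    return phi

def shoot(s, n=1000):
    """Value of phi_2 at t = 1 when we start from phi_1(0) = 0, phi_2(0) = s."""
    return march((0.0, s), 0.0, 1.0, n)[1]

def solve_by_shooting(target, lo, hi, tol=1e-10):
    """Bisection on the initial value, assuming shoot(lo) and shoot(hi) bracket the target."""
    miss_lo = shoot(lo) - target
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        miss_mid = shoot(mid) - target
        if (miss_mid > 0) == (miss_lo > 0):
            lo, miss_lo = mid, miss_mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)
```

`solve_by_shooting(math.cosh(1.0), 0.0, 5.0)` shoots below from 0 and above from 5 and homes in on $\varphi_2(0) \approx 1$; the remaining error is the discretization error of the marching scheme, not of the bisection.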


Lecture 15

Example for today: a second-order differential equation.

$$\varphi'' + a\varphi' + b\varphi = f, \qquad \varphi(0) = \alpha, \quad \varphi(1) = \beta$$

We can replace $\varphi$ with $\varphi$ plus a linear function and the equation will still hold (with a modified right-hand side). We can choose the linear function $\psi$ such that $\psi(0) = \alpha$ and $\psi(1) = \beta$. So we can reduce this problem to one where it's 0 on both sides. Approximate the interval with points $x_i$ and let $\varphi(x_i) = \varphi_i$. Then

$$\varphi''(x_i) \approx \frac{\varphi_{i+1} + \varphi_{i-1} - 2\varphi_i}{h^2}, \qquad \varphi'(x_i) \approx \frac{\varphi_{i+1} - \varphi_{i-1}}{2h}.$$

We actually know $\varphi_1$ and $\varphi_n$: they're 0. So we have the power to just solve these equations, moving in from the boundary. It is a tridiagonal linear system: every point talks to itself, its predecessor, and its successor. We have a second-order error, $O(h^2)$. This is a finite difference scheme. You're given a differential equation; you replace the function with a table of values; you replace the derivatives with finite differences; and you approximate. This works fine.

You can also use Richardson extrapolation to improve the accuracy. You can also replace derivatives with finite differences of higher order, but this isn't worthwhile after order 4. When you calculate the derivative near the end, you have to use a non-symmetric differentiation formula, which is problematic.

What is the condition number of the resulting linear system? Let's say you have $e^{ix}$ on this interval. The size of the derivative will be roughly 1:

$$\frac{e^{i(x+h)} - e^{i(x-h)}}{2h} \approx e^{ix}\,\frac{(1 + ih) - (1 - ih)}{2h} = i\,e^{ix}.$$

But let's say we have something worse, like $+1, -1$, etc., alternating on all the points. The condition number is $n^2$ if $n$ is the number of points in the interval, which reduces the effective number of digits you can get. If you have double precision with machine accuracy $\varepsilon$, the best accuracy you can get: the first source of error is $1/n^2$ from the finite differences; the other is $\varepsilon n^2$ from the condition number of solving the matrix. Minimize $1/n^2 + \varepsilon n^2$:

$$-\frac{1}{n^3} + \varepsilon n = 0, \qquad n \approx \varepsilon^{-1/4}, \qquad \text{error} \approx \sqrt{\varepsilon}.$$

We just made a terrifying discovery: if you're using finite differences with double precision and second-order differentiation, then you lose half the digits. You're never going to get better than 8 digits. Richardson won't really help with this, because the problem is in machine accuracy, not in the finite difference approximation.
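The finite difference scheme for the boundary value problem can be sketched as a tridiagonal solve. This is a sketch under assumptions made for illustration: constant coefficients $a = 1$, $b = -1$, zero boundary values on $[0, 1]$, a manufactured exact solution $\sin(\pi x)$ to measure the error against, and elimination without pivoting (adequate here because the matrix is diagonally dominant).

```python
import math

def solve_tridiagonal(lower, diag, upper, rhs):
    """Forward elimination and back substitution; lower[0] and upper[-1] are unused."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def finite_difference_error(n, a=1.0, b=-1.0):
    """Solve phi'' + a phi' + b phi = f with phi(0) = phi(1) = 0 on n intervals
    and return the maximum error against the manufactured solution sin(pi x)."""
    h = 1.0 / n
    xs = [i * h for i in range(1, n)]                    # interior points only
    lower = [1.0 / h**2 - a / (2.0 * h)] * (n - 1)       # coefficient of phi_{i-1}
    diag = [-2.0 / h**2 + b] * (n - 1)                   # coefficient of phi_i
    upper = [1.0 / h**2 + a / (2.0 * h)] * (n - 1)       # coefficient of phi_{i+1}
    rhs = [(-math.pi**2 + b) * math.sin(math.pi * x) + a * math.pi * math.cos(math.pi * x)
           for x in xs]
    phi = solve_tridiagonal(lower, diag, upper, rhs)
    return max(abs(p - math.sin(math.pi * x)) for p, x in zip(phi, xs))
```

Doubling `n` should cut the returned error by about a factor of 4, which is the $O(h^2)$ behaviour; the roundoff floor described above only shows up for much larger `n`.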


A compact operator, at least in $L^2$, is a limit of a sequence of finite-dimensional operators. It has a rapidly decaying spectrum. You can represent the identity plus a compact operator on the computer; then you can also represent its inverse, which is also identity plus a compact operator, since outside a finite-dimensional subspace this is dominated by the identity. Suppose you have something of the form

$$\sum_{|m| \le N} \alpha_m e^{imx}$$

and $N \sim 10^6$. The small terms are roughly $10^{-15}$. When we differentiate this, it gets multiplied by $10^6$, thereby losing you 6 digits. Non-compactness of operators makes badness. The farther we go out, the bigger the thing becomes. The second derivative gives us $N^2$, so you lose 12 digits, which means if you start with double precision, you end with only 3 digits. Solving equations of the form $(I + C)x = y$ is okay; inverting a compact operator is not okay.

Let's say we have some basis $\varphi_i$, say, orthogonal polynomials, and write

$$\varphi(x) = \sum_i \alpha_i\varphi_i(x).$$

We want to solve the equation

$$\varphi''(x) + a\varphi' + b\varphi = P(\varphi) = f.$$

We can solve this; the inverse should be compact.

$$P(\varphi) = \sum_i \alpha_i P(\varphi_i) = \sum_i \beta_i\varphi_i,$$

where the right-hand side is the expansion of $f$, which we know. From this, we want to determine $\alpha_i$. But we can represent $P(\varphi_i) = \sum_j \gamma_{ij}\varphi_j$. So we have

$$\sum_i \alpha_i \sum_j \gamma_{ij}\varphi_j = \sum_i \beta_i\varphi_i.$$

Take the operator we're trying to invert, project it onto our basis, getting a matrix; project the right-hand side onto our basis, getting a vector; and invert the matrix and multiply by the vector! Happy times. This is generally called the Galerkin method. Any time you want to solve a differential equation, it's tempting to use basis elements that are localized. These are called finite elements. For now we observe subversively that if you have a finite element scheme for any set of differential equations, then there always exists a finite difference scheme that produces identical numerical results.


Lecture 16

Finite element methods. Take basis elements, project the operator on them, left and right, and get a linear system; and you solve the linear system. Pointwise convergence of finite element calculations is not important. You're doing this in a weak sense. That is, if you have a boundary value problem $\varphi'' + a\varphi' + b\varphi = f$, the equation is only required to hold after projection onto the basis elements. You never use finite elements in one dimension. In higher dimensions they're used frequently but they shouldn't be. Why? Legendre polynomials evaluated at Gaussian nodes are orthogonal. Indeed, for $m \ne k$,

$$\int P_m(x)P_k(x)\,dx = 0 = \sum_i P_m(x_i)P_k(x_i)\,w_i,$$

because the Gaussian quadrature is exact for the product, so the sum is equal to the integral. The matrix $P_k(x_i)\sqrt{w_i}$ (with normalized polynomials) is an orthogonal matrix. So any function's representation as a Legendre series is exactly equivalent to a table of values at Gaussian nodes. Also: equispaced nodes are the Gaussian nodes for trigonometric functions! Theorem: if you have $N$ functions, bounded, then there exists a collection of $N$ points such that, if used as a set of interpolation nodes, you do not lose digits. You don't even need continuity. So if you have a collection of finite elements, you can get the equivalent solution with interpolation nodes. Integral equations of the first kind:

$$\int K(x,t)\,\varphi(t)\,dt = f(x)$$

Unlike differential equations, these are insensitive to the way you formulate them.

$$K(x,y) = \sum_{n,m} \alpha_{nm}P_n(x)P_m(y)$$

is the Legendre expansion of $K$. $K$ is compact, so the coefficients decay fast. If not, you can't always solve the equation. Second kind of integral equation:

$$\varphi(x) + \int K(x,t)\,\varphi(t)\,dt = f(x)$$

Outside of a finite-dimensional subspace, this is basically the identity. So these are easy to solve. What is the proper way to start analyzing this? Take a simple case. Let's assume $K$ is small.

$$\varphi(x) + \int K(x,t)\,\varphi(t)\,dt = f(x)$$

Say that

$$\varphi(x) = -\int K(x,t)\,\varphi(t)\,dt + f(x).$$

The norm of this whole operator is like $|K|$. This is a contractive mapping:

$$\varphi_{i+1}(x) = -\int K(x,t)\,\varphi_i(t)\,dt + f(x).$$

Starting from $\varphi_0 = f$, one step gives

$$\varphi(x) \approx f(x) - \int K(x,t)\,f(t)\,dt,$$

just the fixed point iteration. Take the finite-dimensional subspace, where it's a finite-dimensional linear system, and outside it's just the identity. The first kind integral equation is an ill-posed problem. There isn't really anything you can do. For the second kind, discretize the integral with a quadrature rule with nodes $x_j$ and weights $w_j$:

$$\varphi(x_i) + \sum_j K(x_i, x_j)\,\varphi(x_j)\,w_j = f(x_i)$$

This is an $n$ by $n$ system of equations, usually dense. It can be solved. This is called the Nyström method.
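A second-kind equation with a small kernel can be discretized and solved in a few lines. The following is a minimal sketch, assuming the interval $[0, 1]$, trapezoidal weights (a Gaussian rule would be the better choice in practice), and the fixed point iteration in place of a dense linear solve; the function name and the test kernel are made up for illustration.

```python
def nystrom_fixed_point(kernel, f, n=101, iterations=30):
    """Discretize phi(x) + int_0^1 K(x,t) phi(t) dt = f(x) at n trapezoidal nodes
    and iterate phi <- f - K phi, which converges when the kernel is small."""
    h = 1.0 / (n - 1)
    x = [i * h for i in range(n)]
    w = [h] * n
    w[0] = w[-1] = h / 2.0                      # trapezoidal end weights
    fx = [f(xi) for xi in x]
    phi = fx[:]                                 # starting guess phi_0 = f
    for _ in range(iterations):
        phi = [fx[i] - sum(kernel(x[i], x[j]) * phi[j] * w[j] for j in range(n))
               for i in range(n)]
    return x, phi

# Assumed test case: K(x,t) = x t / 2 and f(x) = 7 x / 6, whose exact solution is phi(x) = x.
nodes, values = nystrom_fixed_point(lambda s, t: 0.5 * s * t, lambda s: 7.0 * s / 6.0)
```

For kernels that are not small the iteration diverges, and the same discretized system would instead be handed to a dense solver, at the $n^3$ cost the notes mention.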


Lecture 17

$$f(x) = \int K(x,t)\,\varphi(t)\,dt$$

$$\sum_{n,m} \alpha_{nm}P_n(x)P_m(y) = K(x,y)$$

You're solving for $\varphi$. The only chance to solve such a system is if, say, $K$ is just barely $L^2$, so the sum converges slowly. Smooth kernels are bad in this context. Singular kernels are good. The extreme form of a singular kernel is a delta function on the diagonal. Then you have a compact operator plus 1, so the diagonal is essentially 1 outside a finite-dimensional subspace.

Fredholm alternative: second kind integral equations behave like finite-dimensional linear systems. Outside a finite-dimensional subspace, it's essentially 1. Solve a finite linear system inside a compact region, and it's easy outside. Apply Cramer's rule to this system. If the determinant is zero there, you can have no solutions or infinitely many solutions, depending. People discretize integral equations without noticing which kind they are. They miss the fact that accuracy can decay. You often get a dense system of linear equations and get $n^3$ operations. And it's generally wrong.

Next topic: iterative solutions of linear systems of equations. You have a linear system $Ax = y$ and you would like to solve it. An iterative method views $A$ as an operator; you may not necessarily know what $A$ is. For example you might be measuring something.

$$x \approx \sum_k \alpha_k A^k y$$

This is the general form of an iterative algorithm. (Gauss-Seidel is not a good method.) We can immediately write down a least-squares problem:

$$\min \Big\| A\sum_k \alpha_k A^k y - y \Big\|^2$$

You can solve for the coefficients $\alpha_k$ and get $x$ that way. This is not a very viable scheme if you need higher order powers of $A$. It's just a starting point for iterative methods. Matlab's implementation is not correct; watch out!

Substitution method: if you have an equation $F(x) = y$ then you can always rewrite it as $G(x) + x = y$, where $G = F - \mathrm{Id}$.

$$x = y - G(x), \qquad x_{k+1} = y - G(x_k)$$

A mapping $T$ is contractive if $\rho(T(u), T(v))/\rho(u, v) < a < 1$, and then a sequence of repeated contractive mappings converges. All you need is to live in a complete metric space. So when $x \mapsto y - G(x)$ is contractive, the sequence $x_{k+1} = y - G(x_k)$ will converge to the solution of the original equation. If $F$ is linear then $G$ is also linear, and the iteration converges if all eigenvalues of $G$ are less than 1 in absolute value.
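The substitution iteration for a linear system can be sketched in a few lines. This is a sketch under assumptions: a small dense matrix stored as nested lists, $G = A - I$, and a test matrix chosen so that the eigenvalues of $G$ (here $0.2$ and $-0.3$) are below 1 in absolute value.

```python
def substitution_solve(A, y, iterations=200):
    """Iterate x_{k+1} = y - G x_k with G = A - I, starting from x_0 = 0."""
    n = len(y)
    x = [0.0] * n
    for _ in range(iterations):
        # G x = A x - x, so A only ever has to be applied to a vector.
        Gx = [sum(A[i][j] * x[j] for j in range(n)) - x[i] for i in range(n)]
        x = [y[i] - Gx[i] for i in range(n)]
    return x

A = [[1.0, 0.3],
     [0.2, 0.9]]
y = [1.6, 2.0]            # equals A applied to (1, 2)
x = substitution_solve(A, y)
```

Unrolling the loop shows that the iterates are partial sums of $y - Gy + G^2y - \dots$, so this is the same thing as summing a Neumann series.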

Neumann series: decompose

$$\frac{1}{1 - x}$$

into Taylor series: $1 + x + x^2 + x^3 + \dots$. Same for matrices.

$I + pA + qA^2$ and $B = U^{-1}AU$. Then $B^2 = U^{-1}AU\,U^{-1}AU = U^{-1}A^2U$. (m = k k ym Ym )2 m

To nd an eigendecomposition, you can do a polynomial approximation.


Lecture 18

Last time: if you have a big linear system, it's tempting to try to solve it iteratively, and there is an ancient algorithm known as substitution. All iterative algorithms have to operate on an operator rather than a matrix. You don't have to get inside; you just apply the matrix repeatedly. You're trying to approximate, in $Ax = y$,

$$y \approx \sum_k \alpha_k A^k y.$$

That would mean

$$x \approx \sum_k \alpha_k A^{k-1} y.$$

This suggests some kind of least squares:

$$\Big\| y - \sum_k \alpha_k A^k y \Big\|;$$

choose $\alpha_k$ to minimize this. This is a stupid idea because when you take a matrix and apply it to a vector repeatedly, the vector will be converging to the top eigenvector, and all these guys will be approximately the same, so you'll get noise. Expand $y$ in the eigenvectors $u_i$ of $A$, with eigenvalues $\lambda_i$: $y = \sum_i \beta_i u_i$, so take

$$\Big\| \sum_i \beta_i u_i - \sum_k \alpha_k \sum_i \lambda_i^k \beta_i u_i \Big\|^2.$$

All the eigenvalues. All of them. If we have a matrix $A$ and the spectrum is clustered around one point, then $1/\lambda$ is roughly a constant. Then just taking a Chebyshev approximation to $1/\lambda$ and applying that to $A$, you'll get a vector to high accuracy.


The problem with iterative schemes is that the number of iterations you need depends on the spectrum. So for certain classes of problems it works well and for others it doesn't. Conjugate residual algorithm: for $Ax = y$, construct a sequence of vectors $e_i$ with $Ae_i = \psi_i$, where the $\psi_i$ are some basis that we want to be orthonormal. Suppose we could; suppose the $Ae_i$ were all orthonormal; then we could take $y$ and project onto them:

$$y = \sum_i \alpha_i Ae_i, \qquad x = \sum_i \alpha_i e_i.$$

What do we do? Take $e_1 = y$, $Ae_1 = Ay$, normalized; then $Ae_2 = A(Ae_1) - (A(Ae_1), Ae_1)\,Ae_1$, and then we normalize the whole thing. Geometric interpretation: take $A\psi_k$, and calculate the inner product $\langle A\psi_k, \psi_j\rangle$, $j < k - 1$, in order to orthogonalize it. That's $\langle \psi_k, A\psi_j\rangle$ if your matrix is symmetric, and this vanishes because $A\psi_j$ only involves $\psi_1, \dots, \psi_{j+1}$. Otherwise you can't do the iteration. Problem: suppose you want to approximate $1/\lambda \approx P_k(\lambda)$ around the circle. But all polynomials integrate to 0 around the circle, and $1/\lambda$ does not! So this is a problem. This approximation can never work.

For symmetric positive definite $A$, solving $Ax = y$ is the same as minimizing

$$f(x) = \tfrac{1}{2}(Ax, x) - (x, y).$$

$A$ has to be positive definite and symmetric. The convergence rate of the above, with $\kappa$ the condition number of $A$, is

$$\left(\frac{1 - 1/\sqrt{\kappa}}{1 + 1/\sqrt{\kappa}}\right)^k.$$

Start by taking the gradient of $f$ at some $x_0$; this is $Ax_0 - y$. Taking $x_0 = 0$, start with $e_1 = y$. Then the $k$th residual $r_k$ is $r_k = y - Ax_k$, and

$$e_{k+1} = r_k - \sum_{i \le k} \frac{e_i^T A r_k}{e_i^T A e_i}\,e_i,$$

and then

$$x_{k+1} = x_k + \alpha_{k+1}e_{k+1}, \qquad \alpha_{k+1} = \frac{e_{k+1}^T r_k}{e_{k+1}^T A e_{k+1}}.$$
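The formulas for $e_{k+1}$, $\alpha_{k+1}$ and $x_{k+1}$ translate almost line by line into code. The following is a minimal sketch on nested lists, assuming a symmetric positive definite $A$ and keeping the full sum over all previous directions exactly as written above (practical conjugate gradient codes use a short recurrence instead); the function names are made up.

```python
def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def conjugate_directions_solve(A, y, tol=1e-12):
    """Solve A x = y for symmetric positive definite A, starting from x_0 = 0."""
    n = len(y)
    x = [0.0] * n
    directions = []                       # stored as (e_i, A e_i, e_i^T A e_i)
    for _ in range(n):
        r = [yi - ai for yi, ai in zip(y, matvec(A, x))]   # residual r_k = y - A x_k
        if dot(r, r) ** 0.5 < tol:
            break
        e = r[:]
        for ei, Aei, eAe in directions:
            # e_i^T A r_k equals (A e_i)^T r_k because A is symmetric.
            c = dot(Aei, r) / eAe
            e = [a - c * b for a, b in zip(e, ei)]
        Ae = matvec(A, e)
        eAe = dot(e, Ae)
        alpha = dot(e, r) / eAe
        x = [xi + alpha * a for xi, a in zip(x, e)]
        directions.append((e, Ae, eAe))
    return x
```

In exact arithmetic the loop finishes in at most `n` steps, since `n` mutually conjugate directions span the whole space.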


Lecture 19

Recall: the conjugate gradient method involves taking a sequence of estimates for $x$ in the equation $Ax = b$. Let $r_k$ be the residual at the $k$th step: $r_k = b - Ax_k$. This is also the negative gradient of $f$ at $x = x_k$ if

$$f(x) = \tfrac{1}{2}x^TAx - x^Tb,$$

the function whose minimum is the true value of $x$. Gradient descent would be to move in the direction $r_k$. But we require the directions to be conjugate to each other ($u^TAv = 0$), so instead you take

$$p_{k+1} = r_k - \sum_{i \le k} \frac{p_i^T A r_k}{p_i^T A p_i}\,p_i, \qquad x_{k+1} = x_k + \alpha_{k+1}p_{k+1}.$$

A sequence of $n$ mutually conjugate directions forms a basis of $\mathbb{R}^n$ and so we can expand $x$ in the conjugate directions. This is a good method for large matrices.

Take a 7x7 matrix with condition number 100. How do you solve $Ax = b$? The correct answer is that you can do anything: Gram-Schmidt, anything you want. There is no way of solving linear systems of equations with high condition number accurately. With a condition number of 100 you'll lose 2 digits; given 7 digits to start with, there's no issue.

A second-order partial differential equation in two variables is of the form

$$f = a_{11}\frac{\partial^2\varphi}{\partial x^2} + 2a_{12}\frac{\partial^2\varphi}{\partial x\,\partial y} + a_{22}\frac{\partial^2\varphi}{\partial y^2}.$$

Ordinary differential equations are your friend. Existence and uniqueness of solutions. All these are incorrect about partial differential equations in general. The only thing you can say is that if

$$\frac{\partial\varphi}{\partial t} = f,$$

then if you're given an initial value at 0, then in a neighborhood of zero, if all coefficients are analytic, you have a solution. This is the Cauchy-Kovalevski theorem. It is not useless. Most things are PDEs of second order. Those are actually solvable.

Suppose we take $\sum p_{nm}e^{imx}e^{iny}$ and apply a partial differential operator to this. If you differentiate with respect to $x$ twice you multiply by $-m^2$, if you differentiate once you multiply by $im$, and so on. The action of any first-order part is completely dominated by the second-order part.

One more fact (which will not be derived): suppose we apply a matrix $B$ to the coordinates $x$ and $y$, $[u, v] = B[x, y]$. What happens to the coefficients? It's like conjugating: $A' = B^TAB$. It behaves like the matrix of a quadratic form. Quadratic forms are of the form $\sum a_{ij}x_ix_j$. What happens to a quadratic form when the variables $x_i$ are transformed linearly by $V$? This has the property (checkable!) that $B = V^TAV$. Matrix as an operator and matrix as a quadratic form are equivalent, so long as the transformation is orthogonal. What can we transform our quadratic form to? If $V$ is orthogonal, any symmetric matrix can be transformed to a diagonal form. Any quadratic form can be brought to a diagonal form with only 1s, $-1$s and 0s on the diagonal; but the numbers of each are not changeable. This fact is called the Law of Inertia. If it's positive definite, all the nonzero diagonal elements are 1. This is NOT diagonalizing matrices.

Back to PDEs. Suppose we can write $A' = B^TAB$. What does that mean? When you transform the independent variables by an operator, the matrix of coefficients of the PDE transforms as if it were a quadratic form. Types of PDEs:

$$\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}, \qquad \frac{\partial^2}{\partial x^2} - \frac{\partial^2}{\partial y^2}, \qquad \frac{\partial^2}{\partial x^2},$$

by Sylvester's law of inertia. These are described as elliptic, hyperbolic, and parabolic. Now, can an equation be elliptic in some regions and parabolic in others and hyperbolic in others? Yes, and this means you can't really solve the equation in that case.

Now, the Fourier transform diagonalizes differential operators. Look at

$$\frac{\partial^2\varphi}{\partial x^2} - \frac{\partial^2\varphi}{\partial y^2} = 0, \qquad \varphi = \sum \alpha_{mn}e^{imx}e^{iny}.$$

This lives on a torus. Substituting,

$$\sum \alpha_{mn}(n^2 - m^2)\,e^{imx}e^{iny} = 0.$$

If $m^2 = n^2$ then we solve this equation. These are in fact the only solutions. Our solution will look like

$$\sum_m \alpha_m e^{imx}e^{imy} + \beta_m e^{imx}e^{-imy},$$

oscillating with the same frequency in both directions.


Lecture 20

Initial value problems for elliptic PDEs cannot be solved; there are modes that grow. The way to suppress them is to have a boundary value problem, which does make sense for elliptic PDEs. What happens when you have a set of equations that change their type? This is hard, and outside the scope of any course.

$$\frac{\partial\varphi}{\partial t} = \frac{\partial^2\varphi}{\partial x^2}$$

How do we do this? Discretize.

$$\frac{\partial^2\varphi}{\partial x^2} \approx \frac{\varphi(x - h) - 2\varphi(x) + \varphi(x + h)}{h^2}, \qquad h_t \sim h_x^2.$$

This is a tridiagonal matrix so you can solve it.

Imagine that the problem is

$$\frac{\partial^2\varphi}{\partial t^2} = \frac{\partial^2\varphi}{\partial x^2} + c(y)\,\frac{\partial^2\varphi}{\partial y^2}.$$

In $x$ it's constant but in $y$ it changes. This happens a lot in the oil industry: from what you record from an explosion you try to reconstruct the underground structure. To a first approximation, it's a bunch of layers, horizontally uniform. But in the vertical direction it changes. Constant with respect to two variables, varies with respect to the third. How would we solve such a thing? Separation of variables. Suppose you have a function that is of the form

$$\varphi(x, y, t) = \psi(y)\,e^{imx}e^{i\omega t}.$$

What happens if we substitute it into the equation? The $t$-derivative just multiplies by $i\omega$, each $x$-derivative multiplies by $im$ (so two of them give $-m^2\psi(y)e^{imx}e^{i\omega t}$), and the $y$-derivatives fall on $\psi(y)$ alone. Now you have something like

$$\frac{\partial^2\psi}{\partial y^2} + \lambda\psi = 0.$$

This is solvable. Whenever you can separate variables, do it. The Fourier transform can be used for separation of variables. You can separate variables on a sphere: rotation around longitude can be separated from azimuthal rotations. There is no Fourier transform on the sphere; there is expansion into spherical harmonics. If you can separate variables, ALWAYS do it.

One example: the fast Poisson solver. It quickly solves Poisson equations on a square. Decompose as $\sum \alpha_{nm}e^{inx}e^{imy}$ and solve the equation

$$\frac{\partial^2\varphi}{\partial x^2} + \frac{\partial^2\varphi}{\partial y^2} = f(x, y)$$

with boundary conditions $\varphi|_{\partial} = 0$. We have a function that is periodic, but we have no guarantee it is smooth! How well did we approximate by a Fourier transform? In our case, the convergence rate is $1/n^2$. What if the boundary value $g$ isn't zero? Suppose it's linear in $y$, zero on the bottom boundary of the square. Applying the inverse Fourier transform in $x$ to a function multiplied by a linear function in $y$ is easy. If $g$ is not zero at the left and right endpoints of the boundary, we have worse convergence. So we subtract a function from $g$ so that we can apply an inverse Fourier transform.



Lecture 21

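Suppose you have a separable differential equation; you should separate variables. Always. With one exception: if you have a hyperbolic equation

$$a(x, y)\frac{\partial^2\varphi}{\partial x^2} + b(x, y)\frac{\partial^2\varphi}{\partial y^2}$$

The Courant condition says that the time step must be bounded by the space step (for the wave equation with unit speed, $h_t \le h_x$) for this to be numerically stable. The requirement that the time step go like the square of the space step is the analogous condition for explicit schemes for the heat equation. For simplicity consider the system

$$\frac{\partial^2\varphi}{\partial t^2} = \frac{\partial^2\varphi}{\partial x^2}.$$

Let $\psi = \partial\varphi/\partial t$. Then

$$\frac{\partial\psi}{\partial t} = \frac{\partial^2\varphi}{\partial x^2}, \qquad \frac{\partial\varphi}{\partial t} = \psi.$$

The stability condition prevents this from being a stiff problem.

$$\varphi''(t_i) \approx \frac{\varphi(t_{i-1}) - 2\varphi(t_i) + \varphi(t_{i+1})}{h_t^2}$$

Here $h_t$ is the timestep and $h_x$ is the space step. What's the problem with this scheme? It's a second-order scheme, but the constant is HORRIBLE. For 10 wavelengths you need 10,000 points. The error is in the speed of propagation. Waves are out of synch.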
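The stability condition can be watched directly. The following is a sketch under assumptions not taken from the lecture: the wave equation with unit speed on a periodic interval, second differences in both time and space (the leapfrog scheme), a hat-shaped initial profile with zero initial velocity, and the standard stability bound $h_t \le h_x$ for this scheme.

```python
def leapfrog_wave_max(ratio, n=100, steps=200):
    """Run phi_tt = phi_xx on a periodic grid of n points with h_t / h_x = ratio
    and return the largest absolute value seen."""
    hx = 1.0 / n
    # Hat function centred at x = 0.5 with half-width 0.1.
    u_prev = [max(0.0, 1.0 - abs(i * hx - 0.5) / 0.1) for i in range(n)]
    r2 = ratio * ratio

    def second_difference(u):
        # u[i - 1] wraps around at i = 0, which gives the periodic boundary.
        return [u[(i + 1) % n] - 2.0 * u[i] + u[i - 1] for i in range(n)]

    # First step for zero initial velocity.
    d = second_difference(u_prev)
    u = [u_prev[i] + 0.5 * r2 * d[i] for i in range(n)]
    biggest = max(abs(v) for v in u)
    for _ in range(steps):
        d = second_difference(u)
        u_next = [2.0 * u[i] - u_prev[i] + r2 * d[i] for i in range(n)]
        u_prev, u = u, u_next
        biggest = max(biggest, max(abs(v) for v in u))
    return biggest

stable = leapfrog_wave_max(0.9)      # time step below the space step: stays bounded
unstable = leapfrog_wave_max(1.1)    # time step above the space step: blows up
```

Even in the stable run the computed pulses develop wiggles and drift relative to the exact travelling waves, which is the propagation-speed error described above.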