You are on page 1of 42

# Modeling and Simulation, 5FY095, Fall 2009 Lecture notes for ordinary dierential equations First Revision: 2009/09/8

Claude Lacoursi` ere Deptartment of Computing Science Room C240, MIT Huset email: claude@hpc2n.umu.se September 8, 2009
File: notes.tex Revision: 1.3 Date: 2009/09/08

Part I
1 Overall plan
In these notes weby which I mean you, the reader, and me, the authorwill look at numerical solutions ordinary dierential equations (ODEs) in some depth. Just to get going here, ODEs describe time rate of ch-age for variables describing the state of a system and are thus essential components of continuous time models as discussed previously. By numerical solution here, I mean a series of system congurations corresponding to discrete moments in time. The positions of the planets in the sky at midnight every day of the year for instance. We will focus on the mathematical theory necessary to understand how to construct numerical methods and which ones are applicable to a given problem. This means in particular that we will try to understand what makes a method ecient, stable, and accurate. The general theory will be applied to mechanical problems which Newtons

File: notes.tex

## Rev : 1.3 2009/09/08

2 second law as much as possible to give concrete examples and to prepare for what follows in the course. The theory will be matched to experiments you can perform with the Matlab ODE toolbox and software you can write yourself when this is more appropriate. We will use Matlab for the sake of continuity between dierent sections of this course but the same numerical techniques are also available in software libraries and as part of Octave for instance. The theory will be used to derive simple low order numerical methods of the type you can implement yourself. After all this eort, you should be able to understand how accurate methods are derived and which is the best one to use for a given problem. This type of question will be relevant in the next sections of the course in which you will compute solutions to stochastic ODEs. In order to read these notes, you must have a good understanding of vector calculus. Extensive use is made of Taylor series, polynomial approximations, and integration of simple polynomial functions. Some notions of linear algebra are invoked in a few places, mostly things about eigenvalues and eigenvector, as well as references to the Jordan canonical form. Some basic mechanics is used as well in the examples. As a complement to this material, you will also be required to complete a set of lab exercises in matlab where you will get to see how library software performs on certain dierential equations that are famous in the literature. That will make you appreciate some of the theory.

## 2 Background, motivations, and scope

The notion of a model was introduced in previous parts of this course. Continuous time models lead to ordinary dierential equations (ODEs) which relate the rate of change of the state of a given system to its present state. To be concrete, if the state is a vector function of time, x(t) : R Rn , an ODE is of the form x (t) = f (x(t), t). (1) What is under investigation here are functions f (x, t) such that Eqn. (1) does not have an analytic solution in terms of elementary algebraic or trigonometric functions. This said, linear equations which do have simple solutions will be used extensively for the purpose of analysis. The reason is that if a method fails on a linear system, you cannot expect it to work anywhere else. A numerical integrator produces discrete time samples of the trajectory of an ODE, say x(t1 ), x(t2 ), . . . x(tn ), tk+1 > tk . We will often abbreviate this notation with x(tk ) = xk , k = 1, 2, . . . , n. (2)

To do that, the analytic model Eqn. (1) is rst transformed to produce a discrete time dynamical system. If the numerical integrator proceeds at xed time step the discrete dynamical system has the form x k+1 = ( xk , tk ). (3)

File: notes.tex

## Rev : 1.3 2009/09/08

3 For one-step methods such as the entire Runge-Kutta family, and in particular, all the integration schemes derived in 8, x k = xk is just the present system state. The simple case of the explicit Euler integrator for instance yields xk+1 = xk + hf (xk , tk ), Explicit Euler. (4)

This is a prediction of the future state of the system and you expect that to be accurate provided that f is continuous and that h is small. For linear multistep methods, the value of x is approximated using an interpolation over the r last steps and thus, the discrete state being forwarded is in fact x k = (xk , xk1 , . . . , xkr ). Linear multistep methods will not be investigated further here but they are commonly used in some application domains and they are available in the matlab ODE suite. An integration method might automatically vary the integration step during computation. The integrators found in Matlab for instance will adjust the time step so the resulting discrete data is indistinguishable from the exact solution to machine precision, at least for nice enough functions f (x). The theory says that in the limit where the time step h becomes small enough, any discrete approximation converges to the exact solution. The problem here is that small enough might be too small for a bad method and the real question in numerical analysis is how to compute x(t) eciently, doing as few computations as possible for the most accuracy possible. If you have a dierential equations and you want to see what solutions look like, just use one of the stock methods found in the Matlab library. However, time step adjustment comes at a cost and if speed is critical, it becomes important to understand the qualitative nature of the discrete dynamics. If the dynamical system you are simulating has special symmetry, it is a good idea to look for a method that preserves these symmetries globally, irrespective of the size of the time step h. That might make it possible to make good approximations which are stable over long period of time despite being of low local accuracy. If you are going to add noise to the system any, as you will do later in this course, that might be the right thing for you. Going back to the modeling and simulation pipeline 1 discussed previously in this course, the continuous time dierential equation serves the purpose of theoretical analysis and the discrete-time samples serve to compare theory with experiment as well as to make predictions. Indeed, experiments are performed in discrete time and simulated data must be compared directly.. The double arrow is a crucial point in the analysis. It relates to whether or not the symmetries or the underlying theory are preserved by the numerical method. This can be addressed by using high accuracy methods, which can be inecient, or by using methods which can be proved to have the right properties. To give an example of this and to x the ideas concerning preserved quantities, consider the Taylor series expansion of a function of time, x(t) about 0 so that x(h) = x + hx + h3 ... h4 ... h2 x+ x + O(h5 ), x + 2 6 24 (5)

where all the terms on the right hand side are evaluated at 0, and h is small. In basic

File: notes.tex

## Continuoustime Models: ODEs

Mathematical Analysis

Discretetime Predictions

## Numerical Integration: Simulation

Figure 1: The basic simulation and validation process dierential calculus, the derivative x is dened as the limit x(h) x(0) = x. h0 h lim (6)

But in numerical work, we keep h nite. This is something that was very clear to Newton for instance. When calculus was invented, it helped demonstrate that nite time calculations were good approximations as long as the time step h was small enough because it was always possible to make it smaller in principle. Small but not 0 is the point, indeed. But then if you look in more detail, you nd the following approximations x(h) x(0) =x + O(h), h x(h) x(h) =x + O(h3 ), 2h x(2h) + 8x(h) 8x(h) + x(2h) =x + O(h5 ). 30h

(7)

Which means that with more data and more operations, we can get better and better approximations for a nite h. For this case, one can keep h constant and use more data to get a much better approximation. Thats nice if you are studying planetary motion where h is xed since you can only observe certain planets at certain given times. Observe also that the second line in Eqn. (7) is symmetric in time which means that if we integrate using xk+1 = xk1 + 2hf (xk ), (8) we can retrace our steps backward in time. That is not the case with the rst approximation Eqn. (4).

File: notes.tex

## Rev : 1.3 2009/09/08

5 Another way to look at the Taylor series expansion Eqn. (5) is to use the relation x = f (x), we drop the time dependence momentarily, so that x(h) = x + hf + h2 h2 fx + O(h3 ).x + hf + f f + O(h3 ). 2 2 (9)

The chain rule was used for the second order term here. Using the idea of Eqn. (4), we could write h2 (10) xk+1 = hf + f f + O(h2 ), 2 which would give a better approximation. But the problem here is that in general, x Rn is a high dimensional vector and so f is a vector function of a vector argument and therefore f is a matrix with elements [f ]ij = [f ]ij = Gij = fi . xj (11)

## For large systems with millions of variables, f is prohibitively expensive to compute.

3 Computational considerations
The standard notation x = f (t, x) is deceptively simple. In molecular dynamics computations for instance, the vector x would contain the aggregate coordinates and velocities of several million particles. In addition, the function f (t, x) for this case would compute pairwise forces between all particles that are close enough to interact. If N is the number of particles, the complexity of evaluating the function f (t, x) is then of O(N 2 ), i.e., the number of distinct pairs. Clever tricks have to be used to reduce that to O(N ). In addition, if f (t, x) varies quickly, the Cauchy-Picard existence theory covered below in Sec. 6 suggests that the time step should be very very small. Typically however, f (t, x) varies very quickly in some but not all directions. For such cases, we have to do a lot of work to recover a nice solution. Much too much work in fact. General theory and a wealth of good information on numerical integration technique is found in the books by Hairer, Nrsett, Wanner and Lubich[3, 4]. These books are advanced but contain the nal word. More recent work on numerical solution of dierential equations with specic attention to symmetry preserving methods, as is needed in physics simulations in particular, is found in the books by Hairer and Lubich  and Leimkuhler and Reich . A simpler treatment of general methods is also found in the book by Ascher and Petzold .

4 A concrete example
In this section, a complete story is traced from discrete-time experimental data to simulated discrete-time data and validation. Think of astronomical tables collected over the last four millennia looking at the sky every night and marking the position x(t) of given planets. One cannot look continuously

File: notes.tex

## Rev : 1.3 2009/09/08

6 into the sky so discrete time samples of positions, typically once per day for a fraction of the year for any heavenly body were recorded. When these results were plotted on paper by the time of Tycho de Brahe, Kepler and Newton, interesting things appeared which led ultimately to Newtons universal law of gravitation. The same type of tables were computed also in South America, Africa, and Asia. In Europe though, these observations ultimately lead to calculus and dierential equations. Lets write this down here to get a concrete example of a dierential equation. Suppose the position of the earth relative to the sun is given by x : R R3 , i.e., x is a three dimensional vector. Think that the sun is xed for the moment, which is not far from the truth. The combination of Newtons second law of mechanics with the law of universal gravitation leads to the following equations for the motion M x = GM M x , x 3 (12)

where G is the universal gravitation constant, and M , M are the masses of the earth and the sun, respectively. Also, x is the standard Euclidean norm. This is telling us that if we know where the earth is at a given time, we can compute its acceleration. With this information, it should be possible to predict where the earth will be relative to the sun at any time for several years for instance, provided there is nothing else aecting the motion. What equation (12) is not telling is the absolute position of the earth but rather, it relates a rate of change to a position. That is fundamental to dierential equations: they tell a story of how things change given how they are now. In addition, it gives a taste for the topic since if we know how things change, we can predict where they will be in the near future. The point to make now is that (12) was written down in this form because an exact mathematical solution was possible: the resulting motion is a simple ellipse. To do that, it was necessary to dene the derivative as the instantaneous rate of change in time as x = lim x(t + h) x(t) . h (13)

h0

Newton did this because he gured out that in dening the approximate velocity as v (t) = x x(t + h) x(t) , h (14)

where h would be the time corresponding to a full day in the case of Tychos tables for instance, it did not matter much how what time step h was used, provided it was small enough, i.e., things did not change much if h/2 was used instead. It is instructive to think that Newton always thought of x as the nite dierence (x(t + h) x(t))/h and that is precisely what you need to keep in mind when working on dierential equations. Basically, if you can write (x(t + h) x(t))/h = f (x(t)), well, you can predict the future! Thats what we are trying to do here. Numerical integration is the exact inverse of this limit process. To see this, lets look at gravitation again. First, we need to dene the acceleration x in terms of discrete

File: notes.tex

## Rev : 1.3 2009/09/08

7 samples so, lets consider the following approximation, x 1 x(t + h) x(t) x(t) x(t h) h h h 1 = 2 x(t + h) 2x(t) + x(t h) h

(15)

i.e., we compute how fast the approximate velocity is changing. Well, lets use that in the equation of motion (12) so now x(t + h) = 2x(t) x(t h) where we dened kg = GM M . (17) Look at that! This gives a numerical procedure to plot the orbit using a very simple minded procedure. Also, we can predict the position of tomorrow, x(t + h), using only the known positions of yesterday and today, x(t h) and x(t), respectively. Now we are in business. The simple technique of (16) works very well, in fact. Today, it is known as the Verlet formula and that is used extensively in Molecular dynamics simulations for instance, where millions of point particles are simulated attracting and repelling each others, simulating the collision of galaxies, or computing physical properties of gasses and liquids. Interestingly, Newton himself came up with this numerical scheme and it is in fact one of the rst examples in his book Principia Mathematica. h2 kg x(t), x(t) 3 (16)

File: notes.tex

## Rev : 1.3 2009/09/08

Orbit of the Earth around the Sun: symplectic Euler, h = 10.00 2 orbit center

-1

-2 -3 -2 -1 0 1 2

Figure 2: A trajectory computed using stepper (16), using the correct units for the earthsun system integrated over twenty years. Note how the orbit has the correct geometry. However, consider that we write instead x =v v = kg x(t)
3

x(t),

(18)

and use Eulers explicit method for that so xk+1 = xk + hvk kg vk+1 = vk + h xk . xk 3

(19)

It is easy to show that this is not time symmetric. A very small time step h will produce good results for a short while but you can see what happens for long integration periods in Fig. 3. Here, time-reversal symmetry was lost in the discretization and obviously, the simulated data is not very good. Neither does it corresponds to the experimental data, it is not even faithful to the analytic model.

File: notes.tex

## Rev : 1.3 2009/09/08

Orbit of the Earth around the Sun: explicit Euler, h = 0.5 2 orbit center

-1

-2 -3 -2 -1 0 1 2

Figure 3: A trajectory computed using the explicit Euler method, using the correct units for the earth-sun system integrated over twenty years. Note how the orbit spirals out of control with time. So, to answer the questions posed in the rst paragraph, given a function x(t) for which there is discrete time data, x(ti ), i = 1, 2, 3, . . . , the derivative can be generally computed using some sort of nite dierence scheme. The reason to do that is because we are usually interested in building a model that relates the derivative to the state. Conversely, given a models ODE, a nite dierence computation of the derivative leads to a time stepping scheme whereby it is possible to compute a future value of x(t + h), h > 0, given present and past data. Thats simulation. Other questions that come to mind are the following. When is a method numerical stable? Which techniques give the best results for the least amount of work? How do we choose the integration time step h? When does numerical integration preserve the geometry of the problem? When are invariants, e.g., energy, total mass, etc., preserved by the numerical method?

File: notes.tex

10

## 5 Notation and denitions

Dierential equations are relations between a function and its derivatives. They describe how things change given the current state. In most cases, the function is the unknown quantity which we are trying to compute. This unknown function could be the trajectory of an orbiting satellite, for instance. There are two alternative notations used in the literature and both will be used in what follows. In both cases, there is a single scalar independent variable which is labeled either t for time, or x for the general case. The dependent variable is then a n-dimensional vector function, x : R Rn when x(t) is the unknown, or y : R Rn , when y (x) is ... the unknown. The derivatives are denoted as x, x , x , . . ., when x(t) is the unknown, and y , y , y , . . ., when y (x) is the unknown. For the case where y (x) is the unknown, a relation of the form F (x, y, y , y , y , . . . , y (m) ) = 0 (20)

is a general dierential equation. Such equations are usually associated with boundary conditions, i.e., prescribed values of the function y (x) or certain of its derivatives at given values of x. In what follows, we only consider initial value problems for which the values of y (x0 ), y (x0 ), . . . , y (n) (x0 ) are given at the initial point x0 . Denition 5.1 (Order of a dierential equation). The largest integer m for which y (m) appears non-trivially in a dierential equation of the form (20) is the order of the differential equation. Denition 5.2 (Autonomous and non-autonomous ODEs). For a dierential equation of the form (20), if F/x = 0 everywhere, then, the system is autonomous. The relation (20) is extremely general and therefore, we rst reduce it in scope by concentrating on autonomous ordinary dierential equations (ODEs) which can be written as y = f (y ). (21) This is not an oversimplication however. In general, it is possible to reduce any ODE to the form ( F y (x), y (x)) = 0, (22) where is a real vector valued function of two vector arguments. To do this, dene the augmented variable 1 x y y (x) y y (x) (24) (x) = y y (x) = y (x) , and so y . . . . . . . ( n 1) y (n) (x) y (x)
File: notes.tex

(23)

10

## Rev : 1.3 2009/09/08

11 Basically, we add the trivial equation x = 1 to the system and concatenate all the derivatives y (j ) into one large vector. Once this is done, we can compute the Jacobians (r, s), of the function F F F J (1) = , and J (2) = , (25) r s and J (i) , i = 1, 2 are (mn + 1) (mn + 1) square matrices. When J (2) has full row rank, i.e., as long as det(J (2) ) = 0, the implicit function theorem guarantees that we can so that always nd f (y (x)). y (x) = f (26) This fails when J (2) has zero determinant, and the reduction to an ODE of (20) is not possible. For that case, we might have a dierential algebraic equation (DAE). We will not study these in the following lectures. Example 5.1. Consider the approximate equation of motion for the rotation of the earth about the Sun given in Eqn. (12). Converting to the y (x) notation, the form (20) reads: F (x, y, y , y ) = GM M y + M y = 0, y 3 (27)

and the usual initial conditions for this problem is a given position y (x0 ) and velocity y (x0 ) for a specic time x0 . The problem is then to compute the trajectory for future discrete times, y (xk ), xk > x0 , k = 1, 2, 3, . . . Note rst that this is a second order dierential equation so m = 2. Also, if we can that y (x) is a two dimensional vector so n = 2. In addition, since the time does not appear explicitly in (27), the system is autonomous. Explicitly reducing (27) to a rst order autonomous system, we write x 1 (28) y = y , and y = y . y y The form (22) is then 0 0 1 0 0 F ( y, y )= 0 + kg 0 M 0 y 3 where 0 1 0 0 x x 1 y + 0 1 0 y = 0, 0 0 0 M y y

(29)

kg = GM The system is quasi-linear and so, we can read o the Jacobian J (2) : 1 0 0 J (2) = 0 1 0 . 0 0 M
File: notes.tex

(30)

(31)

11

## Rev : 1.3 2009/09/08

12 Therefore, det(J (2) ) = M = 0 and so, the dierential equation can be reduced to an autonomous ODE. To do that, we introduce the new variable v = y (x) and write y v

0
kg y 3

1 0

y . v

(32)

Note that this is a block matrix equation since y (x) and v (x) are two dimensional vectors. Reducing complicated models to standard ODEs is necessary when using software libraries. Indeed, the entire matlab ODE suite is based on the assumption that you have a non-autonomous ODE of the form y = f (x, y ), and you have to write a script which computes that. For instance, the gravitational problem could be implemented as follows. function yprime = myderivative(x, y) global k; % the gravitational constant must come as % a global parameter defined from the driver script yprime = 0*y ; % make sure that dimensions of yprime and y match. yprime(1:2) = y(3:4); kgr3 = k/norm(y(1:2))^3; yprime(3:4) = - kgr3*y(1:2); in which you compute y given the current state y and current independent coordinate x, which is often considered as the time.

## 6 Basic theory of autonomous systems

Now, consider the standard autonomous ODE y = f (y ), and lets say that we are dealing with an initial value problem so that y (0) is given. The question is to compute y (x) for x > 0. What rst comes to mind is whether there is a solution to this problem or not. Then, the question is how to compute it to a given degree of precision. The proof of existence and uniqueness was constructed by Cauchy and Picard and it relies on the fundamental theorem of calculus. The reason to look at this in some details is that numerical methods are constructed from the very same principles. Consider the function
x

(y (x)) = y (0) +
0

dsf (y (s)),

(33)

i.e., for a given function y (x), the map (x) is a value that depends on the integral of f (y (x)). Now, consider the derivative (x): (y (x)) = d y (0) + dx = f (y (s)),
x

dsf (y (s))
0

(34)

File: notes.tex

12

## Rev : 1.3 2009/09/08

13 using the fundamental theorem of calculus. Now, if (y (x)) y (x) = 0 for all x [0, h], then, (y (x)) y (x) = f (y (x)) y (x) = 0, (35)

and so, y (x) is then a solution. To see how to get to that function y (x), consider a small interval [0, h] and a test function y1 (x). Then, consider what can happen to the sequence yn+1 = (yn ). If this sequence converges to y , then, we have y = (y ) and thus, y is a solution to y = f (y ). What we need here is to understand whether yn+1 yn 0 as n . The convergence criterion for such a sequence is that limn yn+1 yn 0, and this means we need to dene a norm for the function y (x). We can use maxx[0,h] y (x) . Next, look at what happens to the dierence
x

## yn+1 yn = max max max

0<x<h

0 x

ds(f (yn (s)) f (yn1 (s)) ds f (yn (s)) f (yn1 (s)) (36) yn (s) yn1 (s) .

0<x<h 0 x

0<x<h 0

ds

x

## f (yn (s)) f (yn1 (s)) yn (s) yn1 (s)

(37)

as long as u = v , and as long as we keep to bounded u, v , i.e., u y (0) < 1 and v y (0) < 1, for instance. So then, the bound of (36) can be reduced to yn+1 yn max
0<x<h 0

## ds yn (s) yn1 (s))

(38)

What this means now is that for small enough h, the sequence yn (x) is contractive and so, it will converge. Provided f (y ) is nicely bounded on the integration interval and provided it has a nice bounded derivative, the solution y (x) exists and is unique. Example 6.1. Consider the ODE y (x) = y (x), with y (0) = 0. A solution to this is easily veried to be y (x) = (C/2)x2 . However, C is not uniquely determined by the initial value. The reason for this is that f (y ) = y , and so f (y ) = (1/2)y 1/2 . This is not bounded near y = 0 so we cannot get a bound for (37) and the proof fails for this case. Example 6.2. Consider the ODE y (x) = y 2 (x) with initial condition y (0) = 1. This has the general solution y (x) = 1/(x x0 ) since y = 1/(x x0 )2 = y 2 . For the given initial conditions, x0 = 1 and so, the solution diverges at x = 1. What happened here is that it is not possible to nd a solution among the test functions with y < r < . There is no global solution for this case. To make use of the existence proof in constructing numerical methods, we need some basic ideas from integration theory.

File: notes.tex

13

14

## 7 Basic numerical integration

Using the ideas in the Cauchy-Picard existence theory, we can construct good approxh imations if we can evaluate integrals of the from 0 dsf (y (s)). So, lets take a few moments to recall basic ideas of integration theory. In the literature, this is often called quadrature. Consider a function of time, g : R Rn . It does not really matter in what follows if you think of g as a scalar or vector valued function. Consider the denite integral
h

I [g] =
0

dsg(s).

(39)

Obviously, if h is small enough, we have the rst order approximations I [g] = hg(0) + O(h2 ) = hg(h) + O(h2 ). To see this, expand g(t) in a Taylor series about g(0) 1 g(t) = g(0) + g t + g t2 + O(h3 ), for t [0, h], 2 and integrating this produces I [g] = hg(0) + h2 g + O(h3 ) = hg(0) + O(h2 ). 2 (42) (41) (40)

Since you may not be able to compute g , this is considered to be a rst order method. The same works if we expand from the other end g(t) = g(h) g (h)t + O(h2 ). To get a second order approximation, we can expand about h/2 as follows g(t) = g(h/2) + (t h/2)g (h/2) + (t h/2)2 g (h/2) + O(h3 ). 2 (43)

Integrating this, the second term vanishes and the third term is an O(h3 ) correction so we have h3 I [g] = hg(h/2) + O(h3 ) = hg(h/2) + g (h/2) + O(h5 ). (44) 8 Note that the O(h4 ) term vanishes as well and so, if you know g (h/2), you can get a fth order approximation but otherwise, the correction term is of O(h3 ). Usually, when doing numerical integration, you cannot compute g or g because g(t) is usually a very complicated function. So, in order to construct an even higher order approximation, you try to t a polynomial to the function g(t), and then, estimate the integral on that. So, let us rst t a rst order polynomial to the function g(t) as follows P1 (t) = t (h t) g(0) + g(h), h h (45)

File: notes.tex

14

## Rev : 1.3 2009/09/08

15 so that P1 (0) = g(0), and P1 (h) = g(h). This is called a Lagrange interpolation polynomial. A Taylor series expansion shows that it is a faithful approximation of g(t) up to O(h2 ). Integrating that produces
h

I [P1 ] = g(0)
0

ds

hs + g(h) h

ds
0

s h

h (g(0) + g(h)). 2

(46)

A simple check with the Taylor series shows that the error is of O(h3 ), which means that this type of trapezoidal or midpoint rule is accurate to O(h2 ). We can produce a quadratic interpolation polynomial as well using the midpoint value as follows P2 = 2g(0) t(h t) t(t h/2) (h/2 t)(h t) + 4g(h/2) + 2g(h) . h2 h2 h2 (47)

It is easy to see that P2 (0) = g(0), P2 (h/2) = g(h/2), and P2 (h) = g(h). So, P2 (t) is an interpolating polynomial of second degree and it approximates g(t) up to terms of O(h3 ). Integrating this, we nd that I [P2 ] = h g(0) + 4g(h/2) + g(h) 6 (48)

Using Taylor series, it is possible to show that the error is I [g] = I [P2 ] + h5 (5) g ( ), for some [0, h], 2880 (49)

meaning that this approximation is good up to O(h4 ). The trick to constructing higher order integration methods for ODEs is to combine results from integration theory such as those shown above, with the Cauchy-Picard sequence to construct approximations of g(t) = f (y (t)). Remember that since we do not know y (t), we cannot use the quadrature formulae directly. The simpler formulae are derived in the next section.

8 Basic approximations
Let us now construct some basic approximations with what we know. Use the simplest possible test function as y0 (x) = y (0), a constant. Then, the rst Cauchy-Picard iterate is
h

y1 (h) = y (0) +
0

## dsf (y0 (s)) = y (0) + hf (y (0)),

(50)

and the integral is trivial. So now, if we take y1 (h) as the approximation of y (h), we have the stepping formula yk+1 = yk + hk f (yk ). (51)

File: notes.tex

15

## Rev : 1.3 2009/09/08

16 Remark. Note that we usually write yk = y (tk ) for the numerical approximation of k 1 function y at the discrete time tk = j =0 hj . For xed time step h, tk = kh. This is not to be confused with yj used above to denote the j th iteration of the Cauchy-Picard map of (33). The procedure of (51) is called the explicit Euler scheme, and is the most unstable method in the book. The time step hk could change as we go along, or we could try this at xed step. We will go over the stability results later and time step control later in these lectures. From integration theory shown above, we have several choices for a rst order approximation and the explicit Euler scheme derived above is only one of these. Consider for instance using the value at the end of the integration interval so y0 (x) = y (h), Nota: we dont yet know the value of y (h)!
h

y1 (h) = y (0) +
0

## dsf (y0 (x)) = y (0) + hf (y (h)).

(52)

This last equation is implicit in y (h) since it appears on both sides of the equality sign. Therefore, rst order implicit Euler is the solution of the system of equations yk+1 hk f (yk+1 ) yk = 0, (53)

which produces yk+1 given yk . This can be solved provided f (y ) is nice enough. In general, the equations (53) are nonlinear in the unknown variable y (h). More will be said about the solution of linear and nonlinear problems in the next lecture. Also from the integration theory, we expect this method to be good to rst order in h only. If we now go back to the integration theory and consider the midpoint approximation, we get the explicit midpoint rule using the second Cauchy-Picard iterate y0 (x) = y (0)
x

y1 (x) = y (0) +
0 x

## dsf (y0 (s)) = y (0) + xf (y (0)) dsf (y1 (s))

0

(54)

y2 (x) = y (0) +

From 7, y2 (h) we can approximated as y (0) + hf (y1 (h/2)) up to second order. This leads to the explicit midpoint rule k1 = f (y (0)), compute from given initial data h h k2 = f (y0 + k1 ), explicit computation 2 2 y (h) = y (0) + k2 .

(55)

The same procedure can be repeated to get y (2h) from y (h), and so on and so forth. As we will show later, this approximation is accurate to O(h2 ), i.e., the dierence between trajectories computed with (55) and the real solution is of O(h3 ).

File: notes.tex

16

## Rev : 1.3 2009/09/08

17 Yet another method is to estimate the integrand with the midpoint value so that 1 y (0) + y (h) , we still dont know y (h) 2 h 1 y (0) + y (h) dsf (y0 (x)) = y (0) + hf y1 (h) = y (0) + 2 0 y0 (x) =

(56)

To compute y1 (h), the approximate numerical solution at the end of the interval, we need to solve the nonlinear system of equations y (h) hf (1/2)(y (0) + y (h)) y (0) = 0, (57)

and that is similar to what happens with the implicit Euler method. This new stepping scheme is called the implicit midpoint rule or Gauss method. h Finally, if we follow the idea of the second order polynomial approximation for 0 dsf (y (s)) in the previous section, we have to evaluate y (h) y (0) = h (f (y (0)) + 4f (y (h/2)) + f (y (h))). 6 (58)

Here, the terms appearing on the left hand side, y (h) and y (0), are the computed numerical values, and y (0) is given. To approximate what appears on the right hand side, particularly the f (y (h/2)) and f (y (h)) terms, we make use of the Cauchy-Picard idea. As previously, our rst guess for y (h/2) is as follows y1 (h/2) y (0) + This suggests an improvement in the form of h f (y1 (h/2)). (60) 2 Finally, iterating on that last approximation, we can construct an approximation for the last term: y3 (h) y (0) + hf (y2 (h)). (61) y2 (h/2) y (0) + Now, we have two dierent approximations of y (h/2) so we can average them, i.e., 4f (y (h/2)) 2f (y1 (h/2)) + 2f (y2 (h/2)). (62) h f (y (0)). 2 (59)

Putting everything together produces the explicit Runge-Kutta method of order 4 of the rst kind k1 = f (yk ) h k1 ) 2 h k3 = f (yk + k2 ) 2 k4 = f (yk + hk3 ) h yk+1 = yk + (k1 + 2k2 + 2k3 + k4 ) , 6 k2 = f (yk +

(63)

File: notes.tex

17

## Rev : 1.3 2009/09/08

18 where yk is the value at the beginning of the time step, yk+1 is the value at the end of the time step, and h is the time step for step k. To prove that this is indeed a method of fourth order requires the error analysis presented in the third of these lectures. The construction of higher order methods is possible but that quickly becomes more complicated since so many choice are possible. The interested reader can consult the standard references [2, 3, 1].

9 Introduction
For this lecture, we look deeply into stability issues related to numerical integrator. In other words, given an ODE of the form y = f (y ), we analyze what happens to the numerical solution, yk , k = 1, 2, . . ., and investigate the sucient conditions for producing bounded values. Though we proved previously that it is sucient to study equations of the form y = f (y ), or x = f (x), depending on the notation of choice, it is useful to also consider the type of equations that arise in classical mechanics, i.e., x =v v = f (x, v ), Newtons second law. (64)

As shown below, there are special techniques designed specically for Newtons law which are very advantageous from the stability point of view because of the special structure of the equations of motion.

10 Linear stability
Consider the general explicit ODE (21), or x (t) = f (x(t)) in the present notation. As a starting point, we study the behavior of linear systems which are dened as x = f (x) = Ax, (65)

where A is a n n matrix with complex entries in general. For our purposes, we consider the scalar equation x = x, C (66) for general ODEs, and x v 0 1 x v (67)

for physical systems. The reason to use complex numbers will be made clear shortly. The scalar equation (66) has solution x(t) = et x(0) = e()t cos(()t) + i sin(()t) x(0). (68)

File: notes.tex

18

19

10.1

## Stability of explicit Euler

This means that linear dynamics in one dimension is determined by a combination of exponential growth or decay, for () > 0 or () < 0, respectively, and sinusoidal oscillations at frequency (). The analytic dynamical system is stable whenever () 0 and the stability analysis of the discrete system will help understand when that property is preserved by a numerical method.

00000000000 11111111111 ( ) 00000000000 11111111111 00000000000 11111111111 00000000000 11111111111 Oscillations 00000000000 11111111111 00000000000 11111111111 Stable reagion () < 0 00000000000 11111111111 00000000000 11111111111 00000000000 11111111111 00000000000 11111111111 00000000000 11111111111 00000000000 11111111111 ( ) 00000000000 11111111111 00000000000 11111111111 00000000000 11111111111 Exponential decay Exponential increase 00000000000 11111111111 00000000000 11111111111 00000000000 11111111111 00000000000 11111111111 00000000000 11111111111 00000000000 11111111111 00000000000 11111111111 00000000000 11111111111 Oscillations 00000000000 11111111111 00000000000 11111111111 00000000000 11111111111 00000000000 11111111111 00000000000 11111111111 00000000000 11111111111
Figure 4: The plane For reference, the linear rst order equation x = x is sometimes called the Dalquist test equation. It plays a central role in stability theory.

## 10.1 Stability of explicit Euler

The rst order explicit Euler method is not a reliable integration technique. However, it is useful in understanding the theory of numerical integration and a good starting point to x some ideas. Applying this method to the rst order linear equation x = x, the result is x (tk ) xk+1 xk = xk , and so h xk+1 = (1 + h)xk = R(z )xk R(z ) = 1 + z, z = h.

(69)

The reason to dene z = h is that since x is a time rate of change, the coecient must have units of inverse time and given that the time step h has dimension of time, z is dimensionless. Dimensionless parameters are always more useful during analysis, at the very least.

File: notes.tex

19

20

10.2

## Stability of implicit Euler

which means that xk+1 0 as k if and only if |R(z )| < 1. In fact, for xk+1 to be nite as k , we need |R(z )| 1. Computing the isoclines of R(z ), i.e., the curves z (t) in the complex planes such that |R(z (t))| = c, we get Fig 5 shown below.

The function R(z ) is called the stability function. It is a measure of the amplication of xk done by the numerical method. Notice that the asymptotic behavior here is easily computed to be xk+1 = R(z )xk = = R(z )k x0 , (70)

Stability function for explicit Euler 1.5 1 imaginary component 0.5 0 -0.5 -1 -1.5 -2 -1.5 -1 -0.5 0 0.5 1 real component
Figure 5: Isoclines of stability function for Explicit Euler All we get here for stability region is a disk of unit radius within the half plane with (z ) 0. In particular, there is no stability for pure oscillatory system where z = ih, R. Even more curious is the case of z = 1 , > 1, in which case the recurrence reads xk+1 = ( )k x0 = ()k k x0 , (71) as this oscillates wildly with increasing amplitude, despite the fact that the analytic solution should just decay as e(1+ )t/h .

## 10.2 Stability of implicit Euler

The implicit Euler method has somewhat more applicability than its explicit brother but still, do keep in mind that this is never recommended for real applications unless you

File: notes.tex

20

21

10.2

## Stability of implicit Euler

have a specic reason for ignoring this advice. As explained in Sec. 10.1, the derivation is included here as an illustrative example. The discretization now reads x k and after rearranging, we have xk+1 = R(z )xk 1 , R(z ) = 1z (73) xk+1 xk = xk+1 h (72)

and z = h.

The stability function satises |R(z )| 1 whenever |1 z | 1. This means that as long as we avoid putting z within the unit circle centered at (1, 0), we are safe and we have |R(z )| 1. The stability region is shown in Fig 6. Note that the pattern is inverted here and that we have stability outside of the isocline with |R(z )| = 1. In particular, for pure 1+i imaginary z = i, R, we have R(i ) = 1+ . This can be rewritten as 2 1 1 + i 1 + i = 2 2 1+ 1+ 1 + 2 1 = ei 1 + 2 . tan = , or sin = 1 + 2 R(z ) =

(74)

This means that pure oscillations are actually damped. The amplitude is decimated by 1/ 1 + 2 per step. Also, the oscillation frequency should be close to /h but here, unless is very small, the eective oscillation frequency is much smaller than /h. The interesting thing to observe here is that implicit Euler actually ltering out all high frequencies.

File: notes.tex

21

22

10.3

## Stability function for implicit Euler 3 2 imaginary component 1 0 -1 -2 -3 -2 -1 0 1 2 3 4 real component

Figure 6: Isoclines of stability function for implicit Euler

## 10.3 Stability of the implicit midpoint rule

Gauss came up with a very good correction to the implicit scheme of the previous section. This is often called the implicit midpoint rule. Like most good methods, it bears several dierent names. To understand the method, rst recall that x(t + h) + x(t h) = 2x(t)+ hx (t) hx (t)+ O(h2 ) = x(t)+ O(h2 ). Then, x(t + h/2) x(t + h/2) = hx + O(h2 ). Therefore, a higher order approximation to the previous stepping formulae and the corresponding stability function become xk+1 xk xk+1 + xk = ( ) h 2 xk+1 = R(z )xk 1 + z/2 R(z ) = , and z = h. 1 z/2 x k

(75)

A bit of analysis on R(z ) shows that it has modulus |R(z )| 1 on the left half plane where (z ) 0, and that |R(z )| = 1 whenever z = i, R. This is interesting since it means that for purely oscillatory systems, the midpoint rule is conservative, i.e., it preserves the magnitude of xk so that |xk | = |x0 |. That is why it is so useful in the simulation of conservative physical systems.

File: notes.tex

22

## Rev : 1.3 2009/09/08

23

Stability function for implicit midpoint 10 0.20 0.40 0.60 0.80 1.00

imaginary component

-5

-10 -10 -8 -6 -4 -2

real component
Figure 7: Isoclines of stability function for implicit midpoint

In general, given any numerical integrator, it is possible to write the discrete dynamical system as x k+1 = R(z ) xk . For one-step methods, x k = xk but for multistep methods, the state vector may contain several iterates x k = (xk , xk1 , . . . , xkr ). Correspondingly, the stability function R(z ) is either a simple scalar function or a matrixthis later case is not covered here. For the important case of Runge-Kutta methodsand all examples above are instances of low order Runge-Kuttathe stability function has the following form. For explicit Runge-Kutta methods of order r , the stability function can be computed to be 1 1 1 1 (76) Rr (z ) = 1 + z 2 + z 3 + z 4 + . . . + z r + O(z r+1 ). 2! 3! 4! r! For implicit Runge-Kutta methods of order r like the implicit Euler (r = 1) or the implicit midpoint method (r = 2) abovethe stability function is the rational function Rr (z ) = P (z ) Q(z ) (77)

File: notes.tex

23

## Rev : 1.3 2009/09/08

24 For example Fig. 8 illustrates the stability region of the fourth order explicit RungeKutta method. This has signicantly more stability than the case of explicit Euler for instance, and does include a part of the imaginary axis. Note however that we still have |z | < 3 everywhere within the stability domain so this cannot handle very high frequencies like the implicit midpoint for example.

Stability function for explicit Runge Kutta order 4 3 2 imaginary component 1 0 -1 -2 -3 -5 -4 -3 -2 -1 0 1 real component
Figure 8: Isoclines of stability function for explicit Runge Kutta order 4 Stability domain is not everything, however. If is large and imaginary, it might be because there are important high frequencies in the dynamical system and fundamentally, it is not possible to resolve these correctly unless the time step is suciently small. However, in a nonlinear system, there might be non-essential frequencies which we might want to ignore. This brings two dierent concepts of stability. Denition 11.1. An integrator is called A-stable if the stability function satises |R(z )| < 1 whenever (z ) < 0. For instance, the implicit midpoint method is A-stable. This notion of stability has a slight problem however since it does not say what is going on as |z | . For the midpoint method for instance, we can follow z to innity along the imaginary axis and have |R(z )| = 1 all the way. That is not necessarily a desirable feature. The implicit Euler method on the other hand has |R(z )| 0 as z and this justies the following denition.

File: notes.tex

24

## Rev : 1.3 2009/09/08

25 Denition 11.2. An integrator is called L-stable if it is A-stable and if its stability function satises |R(z )| 0 as z . If you are trying to integrate a dynamical system which has high frequencies that are not essential to the model, a good L-stable method will be able to damp out these frequencies and save time by integrating the relevant dynamics with a bigger time step. Note however that all L-stable methods are implicit to some degree and that makes each computational step more expensive, as shown below. The computation of the stability domain of a given integration method can be very complicated, especially when the method include automatic time-step control, various internal stages, and implicit components. This is the topic of a more specialized course in dierential equations. For a good survey, consult the book by Hairer and Wanner .

## 12 Linear stability of physical systems.

For a physical system, we have to look at the matrix equation x =v v = Bx + Cv, (78)

where B and C are square, real matrices. The general analysis that was done in 10 will not be done here but instead, we just look at the case where B and C are real scalar. To build a theory for this system, we look at two dierent discretization methods, namely, the explicit Euler method of rst order, and the symplectic Euler method of rst order. The latter will be explained shortly. The archetypal linear physical system is the damped simple harmonic oscillator. The equation of motion for that is x + x + 2 x = 0, (79) where is the damping constant and is the oscillation frequency. The physical system is stable for 0, and is chosen nonnegative by convention. This means that we have B = 2 and C = in (78).

## 12.1 Linear stability of explicit Euler

After discretization, the stepping equations become 1 xk+1 xk 0 1 = 2 h vk+1 vk and rearranging this yields xk+1 1 h = 2 vk+1 h 1 h xk . vk (81) xk vk (80)

File: notes.tex

25

26

12.1

## Linear stability of explicit Euler

This is now a matrix recurrence relation of the form uk+1 = Duk = = D k u0 . If the vector uk+1 does not escape to innity, its norm should be nite and so uk+1 = D k u0 (D )k u0 , (82)

where (D ) is the spectral radius of matrix D , i.e., the modulus of the largest eigenvalue. Here, (D ) plays the same role as R(z ) did previously. Computing the eigenvalues of the matrix in (81), one nds h 0, 2 (1 )(1 2 ) + 2 = 0 = = 1 =1 = h 0 (83) (1 )2 1 + 2 2 2 2

2 2(1 ) + 1 2 + 2 = 0

The isoclines of max(|+ |, | |) are shown in Fig. 9. As seen from the distorted, narrow region of stability, it is an art to nd a damping coecient that stabilizes this stepper for a given frequency .

Stability of Euler rst order on SHO 4 0.20 0.40 3.5 0.60 0.80 3 1.00 damping 2.5 2 1.5 1 0.5 0 -1 -0.5 0 0.5 1 1.5 2 angular frequency
Figure 9: Isoclines of stability function for explicit Euler method

File: notes.tex

26

27

12.2

## 12.2 Linear stability of the symplectic Euler method

Because we are working with a physical system, there are two dierential equations to work with, namely, one for x and one for v . These equations can in fact be discretized dierently. If we pick the implicit Euler method on the x equation and the explicit version on the v equation, we get the Leapfrog formula x xk+1 xk = vk+1 h vk+1 vk v = 2 xk vk . h

(84)

After reorganizing the terms so that xk+1 and vk+1 are on the left hand side and xk , vk are on the right hand side, we can write this as the recurrence 1 h 0 1 xk+1 1 0 = vk+1 h 2 h xk . vk (85)

Now, the matrix on the right hand side is easily inverted 1 0 1 h = 0 1 0 1 1 h = 0 1 Therefore, the stepping equation is now xk+1 1 h = vk+1 0 1 = 1 0 2 h h xk vk xk vk (87) 1 h 0 1 1 h 0 1
1

(86)

1 2 h(1 ) 2 /h 1 and = h.

with = h,

The eigenvalues for this matrix are readily computed (1 2 )(1 ) + 2 (1 ) = 2 2(1 = (1 + 2 ) 2

+ 2 )+1 =0 2 1 2 + ( + 2 )2 4

(88)

The stability domain for this is illustrated in Fig. 10. This is a sizable wedge of stability in comparison with the explicit rst order Euler method, which requires the same amount of computational work per step. In fact, this compares favorably with the Runge-Kutta

File: notes.tex

27

## Rev : 1.3 2009/09/08

28 fourth order method, especially along the = 0 axis. Note that whenever = 0, we have | | = 1. Unlike in the case of the midpoint method though, energy, dened here 2 2 2 as E = 1 2 (v + x ) is not strictly preserved.

Stability of symplectic Euler rst order on SHO 2 0.20 0.40 0.60 0.80 1.00

imaginary component

1.5

0.5

## 0 0 0.5 1 real component

Figure 10: Isoclines of stability function for symplectic Euler method

1.5

## 13 Sti equations and implicit methods

So far, weve only looked at single equations, decoupled from others. When we put the system back together, we might nd that some of the frequencies are high whilst others are low. Using a time step small enough to resolve all the high frequencies can be inecient. In addition, for a physical problem with nite energy, high frequency modes are expected to have small amplitude. Going back to the linear stability equation (65), we choose A to be a 2 2 matrix of the form 1 A= , , , , C, ( ) < 0. (89) 0 This is called a Jordan block in the theory of canonical forms of matrices. This leads to

File: notes.tex

28

## 29 the two weakly coupled equations x = x + y y = y. (90)

This can be solved easily for y rst and then for x. We consider the case where both (), ( ) < 0, and set | | = with > 0 very large. We expect y 0 as t and so, in the long run, only x matters. But things are very dierent at the numerical level. Using the explicit Euler method the discrete stepping equation is then xk+1 1 + ih h = yk+1 0 h and the eigenvalues of the iteration matrix are then 1 = 1 + h, 2 = 1 + h. (92) xk , yk (91)

Obviously, we need |i | < 1 for convergence and thus, if is very large, we need a very small time step, even though the rst equation could be integrated with a much larger time step. The second equation which is a very slow mode stops everything. This is almost paradoxical since y 0 very quickly and should not really count at all. The story is dierent if we use the implicit Euler method in which case we have 1 h h 0 1 h xk+1 x = k . yk+1 yk (93)

After some simple algebra, we have the explicit form xk+1 yk+1 x (1 h)1 h(1 h)1 (1 h )1 = k . 1 0 (1 h ) yk (94)

## As , this system becomes xk+1 = (1 h)xk yk+1 = 0, (95)

as required. This observation motivates the more general but less useful denition of sti systems, namely, sti ODEs are those best solved with implicit methods. As observed before, all sti methods require the computation of the matrix grad f as well as the solution of linear systems of equations of the form (grad f )x = b during the solution of nonlinear equations. This involves a lot of work and the trade-o is between the cost of building and factoring matrices and that of stepping with very small steps. Interestingly enough, the framework and overhead necessary for the use of implicit method is reputed to be so high that they are seldom used even for cases where the should. There is much scientic work published where incredibly small time-steps are used in simulations.

File: notes.tex

29

30

## 14 A concrete example of a sti system

We look here at a simple physical example which has two degrees of freedom, with slow and fast dynamics. Consider a pendulum in two dimensions. This is a bob of mass m with position x(t) R2 such that x l0 . Normalize the length to l0 = 1 and assume that the string that holds the pendulum is in fact a linear spring so that the restoring force is 1 x = kr (x), (96) fs = k( x 1) x where k > 0 is a spring constant. We could add a damping term but as we see below, most integration strategies that work for this problem do not need additional damping in the physical model. Observe right away that this force can be written as a gradient 1 U = ( x 1)2 2 ( x 1) U T = ( x 1) x xT U 1 x, and sofs = k T . = ( x 1) x x

(97)

The reason for the xT is that we use the convention that U/x is a row vector and U/xT is a column vector. The other force that acts on the pendulum is the constant pull from gravity and that is just 0 fg = mag = mag u (98) 1 where ag 9.8 10 is a constant, and u is a constant unit vector pointing downward. Write 2 = k/m and so the equations of motion for this are x =v v = ag u + 2 r (x). (99)

Now, we anticipate oscillatory motion of the bob with frequency 0 ag /l0 = ag 3. However, if the string is strong enough to make elongation invisible, say less than 1% when the bob is exactly downward, constant must satisfy force balance: mag = kl mag l 1 divide by l0 : =k k l l 100 k 1 2 2 2 . rearrange : 1000 0 m 100

(100)

divide by m :

If we use an explicit method, we have to adjust the time step so that h < 1 or so. This means that we have to use 10 times more steps to compute the relevant motion, the

File: notes.tex

30

## Rev : 1.3 2009/09/08

31 pendulum oscillations, because we have to process the irrelevant motion, the oscillations along the string. This seems to be a perfect candidate for the implicit Euler method. To use that, we need to expand r (xk+1 ) in terms of known terms xk , up to some order. This is done as follows: r (xk+1 ) = r (xk + hvk+1 ) = r (xk ) + h = r (xk ) + hKvk+1 . The exact form of matrix K is not entirely relevant but here it is K= r (x) = x x =I ( x 1) 1 xxT . x 2 1 x x r vk+1 + O(h2 ) x (101)

(102)

The expression xxT might look funny but it is the matrix xxT = x1 x1 x1 x2 . x2 x1 x2 x2 (103)

In any case, the system that we need to solve now is (I h2 2 K )vk+1 = hag u + h 2 rk , xk+1 hvk+1 = xk (104)

and this means that we must invert a 2 2 matrix at each step. However, for that cost, we can crank the time step very high since the large frequencies are not a threat to the stability of implicit Euler. Observe however that as , the stepping equations become essentially xk+1 hvk+1 = xk Kvk+1 = (1/h) 2 rk , (105)

and this is our good friend, the steepest descent. In other words, the pendulum stops falling if the spring constant of the string becomes high enough (this will be tested in the lab work). Something went wrong here obviously. We can do things dierently though by reusing the types of ideas that went into the construction of the symplectic Euler method. Introduce the variable so that g(x) = ( x 1) = kg(x) g G= x fs = GT (106)

File: notes.tex

31

## Rev : 1.3 2009/09/08

32 and rewrite the rst of these equations to read h2 + g(x) = 0, and so with h2 = 1/k (107)

## + g(xk+1 ) = + g(xk ) + hGk vk+1 + O(h2 ).

Using the symplectic Euler discretization and collecting terms, we can construct a stepper as follows xk+1 = xk + hvk+1 vk+1 = vk + hag + hGT k h + hGk vk+1 = gk . After absorbing a factor of h in and dividing the last equation by h, we get the following system of equations xk+1 hvk+1 = xk
2

(108)

## vk+1 GT k = vk + hag 1 Gk vk+1 + = gk . h

(109)

This has a nice limit as 0 and preserves the physics. It still introduces a small dissipation but that is bearable. This example will be analyzed in the lab and more will be said about that in the next lectures.

Part II
15 Introduction
The present lecture provides some theory to understand several practical aspects of typical integration software. The rst is the time step control that is used to control the quality of the numerical approximation. The second addresses the overall cost of using a given integration method on an ODE. Finally, several issues connected to implicit integration methods are covered. What can possibly go wrong when numerically integrating an ODE? The short of it is bad scaling, i.e., when some derivatives grow large with respect to others, and especially when some derivatives grow large for a very short time, or in a very short region of phase space. Why is that a problem? Well, since we use velocity information x to nd intermediate stages xk+1 xk + hx k , we can miss all the variation if it happens too quickly.

File: notes.tex

32

## Rev : 1.3 2009/09/08

33 As was seen in the last lecture, linear problems can be diagonalized and in doing that, they become completely decoupled. Therefore, it is nonlinear problems that generally cause trouble. This is why the lab exercises are almost all based on nonlinear problems. This lecture and the associated lab will build some practical experience with the numerical integrators found in matlab. The combination of matlab and simulink is often regarded as the standard tool for simulating complicated systems with many interconnected components. Having a good look at these tools and understanding where they go wrongor where the take too much time to compute adequate answersis a valuable experience.

## 16 Truncation error, local error, global error

When considering a numerical method, it is important to understand the sources and sizes of all errors. Considering a general dierential equation, we can formally write the discrete form as (h) (110) Nh xk = 0, where xk is the complete state vector for the integrator at step k. Do not read (110) as a multiplication. Instead, you should think of the statement in (110) as operator (h) (h) Nh applied to state vector xk produces the zero vector. The state vector xk might include several past, present and future values of the vector xk . For multistep methods for instance, the state vector contains a history over the past r steps. As a concrete example, consider the implicit midpoint rule. For this case, the operator Nh applied to an explicit ODE x = f (x) reads N h xk =
(h) (h)

xk+1 xk f h

xk+1 + xk 2

(111)

The operator Nh is obviously nonlinear in xk . Error analysis starts by considering the truncation error, i.e., the value of the discrete dierential operator applied to the exact solution x(t). Denition 16.1. The local error dk of a discrete integrator dened with Nh xk = 0 is dk = Nh x(tk ), for a given time step h. In general, we dont know x(t) and all we can do is estimate the residual by expanding everything in power series. Doing this on the midpoint rule given in (111), we need the following approximations 1 1 (x(t + h) x(t)) = h h =x + x x + hx + h2 h2 h3 ... x + ... x + 2! 3! (112)
(h)

(113)

h ... x + ... x + 2! 3!

File: notes.tex

33

## Rev : 1.3 2009/09/08

34 ... Observe now that since x = f (x), then x = f x = f f . Similarly, x = f f 2 + (f )2 f , and so on. Collecting terms, this produces 1 h h2 (x(t + h) x(t)) = f + f f + (f f 2 + (f )2 f ) + O(h3 ). h 2! 3! Then, looking at the last term in (113) h h (x + x + O(h2 )) 2 2! h h2 h2 = f + f x + f x + O(h3 ) + f x + O(h3 ) 2 4 8 h2 h2 h = f + f f + f f f + f f + O(h3 ). 2 4 8 The residual error then evaluates to f =f x(t) + Nh x = h2 h2 h2 2 (f f + (f )2 f ) f f f f f + O(h3 ) 3! 4 8 2 = O(h ). x(t + h) + x(t) 2 (114)

(115)

(116)

The general result here is that the local residual error for a method of order p is Nh x(t) = O(hp ). Note carefully that this has absolutely nothing to do with oating point arithmetic. All we did now is to compare terms in Taylor series expansions of the analytic solution (presumed to exist) and the computed solution. The next thing to look at is the local error, i.e., the dierence between the exact solution x (t) with x (0) = xk and the result of the numerical method at xk+1 . Denition 16.2. The local error of a numerical integrator, lk is the dierence between = f (x) starting at x the exact solution x (h), with x (0) = xk , and the numerical solution xk+1 . More precisely = f ( x x), x (0) = xk lk = x (h) xk+1 . (117)

By performing a Taylor series expansion of x (h) in powers of h around x (0), and considering the fact that xk+1 is also a truncated power series in h about xk that is designed to match that of x (h) up to order p, it is possible to show that the local error lk is of the form (h) xk+1 chp+1 , (118) lk = x

where c > 0 is some constant. In other words, the local error has to be of the order of hNh . For implicit methods, the constant c might be quite large, however. For RungeKutta methods, the value of the factor c can be expressed in terms of polynomials in the p rst derivatives of x(t), which can in turn be expanded in monomials involving factors of f , f , etc. The theory behind this is too involved for the present course but is a nice example of application of combinatorial trees . The nal form of c is not so practical unless the function f is well understood analytically. This error estimate is not useful on its own since we do not know x (0). The way to turn the local error estimate into a practical error control device is explained next.

File: notes.tex

34

35

## 17 Time step control

Suppose we have two integration methods of dierent order, p and p + r , say. Label the numerical solutions xk and x k , respectively. Then, the corresponding local errors are lk = c hp+r+1 + O(hp+r+2 ) lk = chp+1 + O(hp+2 ). This means in particular that the dierence between xk and x k is x k xk = ( xk x (h)) (xk (x)(h) hp+1 ( chr + c). (120) (119)

So, for a given tolerance , our local step with time step h is good and accepted whenever x k xk < . What to do if that is not the case? Well, assuming that the constants c will change the local error and c do not vary much, performing a new step with size h to become p+1 h ) lk (h lk (h) , (121) h assuming that h < 1 and that c hr can be neglected. Therefore, if the local error estimate < h so that the is less than unity, x k xk < 1, it is possible in principle to nd h error tolerance is met. Likewise, if the local error estimate is very small, it is possible to to still meet the tolerance with a bigger stride. This helps eciency. increase h , we If we decide to scratch that step and perform and restart with the new step size h = h/2. have to redo all the work, unless perhaps we only allow binary division, i.e., h This is the principle behind extrapolation methods. The fundamental idea of adaptive step control in the context of explicit Runge-Kutta methods in which f (x) is evaluated at several intermediate internal stages, xk1 , xk2 , . . . , xks , is to nd embedded pairs of methods of dierent orders, usually p and p + 1, so that both method use the same internal stages. This can save considerably on computational work. The best known methods for these are the Dormand-Prince pairs . A very crude picture of what is involved here is to consider the denitions of the second and fourth order explicit Runge-Kutta methods of (123) and (124), respectively. Note that k1 and k2 are identical for both methods so the ideas above can be used. Higher order embedded methods are dicult to construct in general. Similar ideas can be used in the context of linear multistep methods and the predictorcorrector family as well. There is still discussion about what is the best method to use in general.

## 18 Sti and non-sti systems

An explicit ODE can be written as x = f (x), (122)

File: notes.tex

35

## Rev : 1.3 2009/09/08

36 and from this, it seems reasonable that an integrator should be able to advance the solution xk to xk+1 by computing the value of the function f (x). If you are simulating a galaxy with a few million stars, each of which attract each other, f (x) is hard enough to compute, involving O(1012 ) pairwise force computations! Likewise, for a complex ecology system with generalized predator prey, it can happen that each species considered inuences nearly all the others, leading to a densely coupled system. Nevertheless, for N variables x, this type of computation grows at most as O(N 2 ), the number of pairs. In addition, if a numerical integrator as order p, it tends to require O(p) evaluations of f (x) at each step. If the function f (x) is expensive to evaluate, a higher order method can be prohibitive. For example, the second order explicit Runge-Kutta involves the following computations k1 = f (xk ) k2 = f (xk + xk+1 h k1 ) 2 = xk + hk2 , (123)

and the most common fourth order Runge-Kutta method involves the following computations k1 = f (xk ) h k1 ) 2 h k3 = f (xk + k2 ) 2 k4 = f (xk + hk3 ) h xk+1 = xk + (k1 + 2k2 + 2k3 + k4 ). 6 k2 = f (xk +

(124)

For all explicit Runge-Kutta methods, one must evaluate s stages, computing f (x) each time. For order p > 6, the number of stages s is always higher than the order of the method s > p. There are Runge-Kutta methods of very high order which are useful in some applications. To understand the computations involved in implicit methods, consider the implicit midpoint rule 1 (x(t + h) + x(t h)) = x(t) + O(h2 ) 2 1 1 (x(t + h) x(t h)) = x (t) + O(h3 ) = f ( (x(t + h) + x(t h))). 2h 2

(125)

In the last equation, xk+1 = x(t + h) appears on both sides and for a nonlinear function f (x), this means that we need to approximate the answer with successive linear approximations.

File: notes.tex

36

## Rev : 1.3 2009/09/08

37 Recall that for any small scalar and any vector v , f (x + v ) = f (x) + f (x) v + O(2 ). x (126)

1 x1 f2 x1

1 x2 f2 x2

..

## ... ... .. . ...

fn x1

fn x2

f1 xn f2 xn fn xn

If x is a n-dimensional vector, then, so if f and thus, J is an n n matrix. Using this, the stepping rule (125) can be further expanded as xk+1 xk = hf (xk + h (xk+1 xk ) + O(h3 )) 2

. . .

(127)

h2 = hfk + Jk (xk+1 xk ) + O(h3 )), 2 and this is then reorganized as I h2 Jk 2 vk+1 = hfk , where vk+1 = 1 (xk+1 xk ). h

(128)

(129)

This is now a linear system. If x is an n-dimensional vector and f : Rn Rn , the matrix Jk is n n and, in general, it takes O(n2 ) operations to compute Jk itself, and O(n3 ) operations to solve the system (129). This only makes sense in cases where the cost of doing that allows an enormous increase in time step. But that does happens, as shown below. In particular, if the matrix Jk is sparse or if it can be well approximated by a sparse matrix, the savings in computations can be worth it. Also note that the simple minded rst order expansion of (129) is in fact very common. In the Runge-Kutta family, methods based on this form of linear implicitness are called Rosenbrock methods. You can see that the matlab documentation for ode23s is a Rosenbrock method. Essentially, the dierence between sti and nonsti integrators is based on whether it is necessary to solve nonlinear systems of equations and consequently, whether one needs the Jacobian.

## 19 Solving nonlinear equations

Given a function f : Rn Rn with Jacobian J (x), solving the nonlinear system of equations f (x0 + v ) = 0 can be done using Newton iterations. Note however that this is not guaranteed to converge if the solution vector x0 + v is far from x0 , or if f (x) has multiple roots in the neighborhood.

File: notes.tex

37

## Rev : 1.3 2009/09/08

38 Just the same, assume that we are given x0 . Expand the function f (x0 + v ) = f (x0 ) + Jv 0. So now, just solve this for v and this gives 1 Jv = f (x0 ), (131) (130)

which is a standard linear system for the vector v , as described in the previous section. A good nonlinear solver will solve for = 1 rst and then evaluate f (x0 + v ) for dierent values of and pick the best choice. In addition, if the solver decides that it needs a dierent vector v , it might reuse the Jacobian several times. In fact, one often uses the recursion x( ) = x( 1) + v ( ) Jv ( +1) = f (x( ) ), (132)

keeping matrix J constant. This is called the simplied Newton method. Note that one might still perform a line search along x( 1) + v ( ) , for (0, 1] to avoid missing zeros of f (x). In fact, if you use the function fsolve from matlab and monitor the number of function and Jacobian evaluations, you will nd that in many cases, there are several function evaluations f (x) (between 45 is my own experience on well conditioned problems) for each Jacobian evaluation. Remember that solving a linear system of n equations in n variables can take up to O(n3 ) operations. Any form of gain one might make on large time steps in the integrator might be canceled by having solve the nonlinear problems. Things get worse with integrators that numerically estimate the Jacobian matrix directly, though that is sometimes unavoidable. Of course, if the Jacobian is sparse, the cost of solving the linear system is usually much less than O(N 2 ) and in addition, if one reuses the matrix J for several steps, the matrix only has to be factored once and each subsequent step involves O(n2 ) in the worst case. For instance, using f (x) = |x| yields the iterates x0 , x0 , . . .. For smooth functions, other problems may arise. However, Newtons method for nonlinear equations, especially with line search, is the best tool in the box.

File: notes.tex

38

## Rev : 1.3 2009/09/08

39

Newtons method on a nonsmooth function 1 function value 0.8 0.6 0.4 0.2 0 -1 -0.5 0 iterate
Figure 11: The Newton method going bad on a nonsmooth function.

curve iterates

0.5

Newtons method on a smooth function 40000 35000 30000 25000 20000 15000 10000 5000 0 -8 -7 -6 -5 function value curve iterates

-4 iterate

-3

-2

-1

Figure 12: The Newton method doing a good job on f (x) = 10x4 .

20 A string model
To give some esh to the notion that implicit techniques are a very good idea in some cases, consider a string of length L, mass M , under tension T . There are many dierent ways to derive the string equations with varying degrees of accuracy. The one chosen here is denitely simple minded and therefore not the most convincing. A good textbook on physics will provide you with a better derivation if you need that. Working forward, consider a set of point masses uniformly distributed along a line. Write the transversal displacement of each mass as x (i) (t), i = 1, 2, . . . , n. The mass of each particle is set to m = M/n which is held constant, and this corresponds to l where is the mass density of the string and l is the longitudinal spacing between the

File: notes.tex

39

## Rev : 1.3 2009/09/08

40 point masses so that L = nl. The question now is how to construct a simple linear force model which is at rest whenever x (1) = x (2) = = x (n) = 0. This is easily done by putting a force on each x (i) which is proportional to the local curvature. The . Do do this with discrete displacements, we proportionality constant will be the scalar K use the dierence between the local average value of the x (j ) in some neighborhood, and the value at x (i) . If you use the nearest neighbors to compute a local weighted average, this construction becomes curvature = (x (i) ) average value local value 1 = (x (i+1) + 2x (i) + x (i1) ) x (i1) 4 1 (i+1) = (x 2x (i) + x (i1) ). 4

(133)

Note in passing that if we write x (i) (t) = x(t, il) = x(t, z ), then, the expression on the last line of (133) is proportional to an approximation of the second derivative of x(t, z ) with respect to z , namely 2 x(t, z ) 1 1 2 (x (i+1) 2x (i) + x (i1) ). 2 z l 4 x (n) (134)

Finally, write Newtons second law for each particle (well settle the case of x (1) and further below) K lx (i) = (x (i+1) 2x (i) + x (i1) ). (135) 4 In the limit where l 0, the classical string equation 2y 2y = T , t2 z 2 (136)

= T /l = T n/L. Here, y (t, z ) is the transversal disis recovered provided we use K placement as a function of time t and longitudinal coordinate z . From basic physics, it is known that the fundamental vibration frequency of the string is given by = 1 2L T , (137)

and this means that for our discrete model, we can write x (i) = T n2 (i+1) (x 2x (i) + x (i1) ) L2
2 (i+1)

(138) ).

= 4(n ) (x

2x

(i)

+x

(i1)

Note how the local oscillation frequency increases as n . For the end points x (1) and x (n) , we put a restoring force that is biased to bring these points back to 0 over time so x (n) = (2n )2 (2x (n) + x (n1) ) x (1) = (2n )2 (2x (1) + x (2) ) (139)

File: notes.tex

40

## 41 Dening the matrix 2 1 0 0 1 2 1 1 .. .. .. .. D= . . . . 0 , 4 0 . . . 1 2 1 0 ... 0 1 2 x =v v = (4n )2 Dx.

(140)

which is a discrete form of the Laplace operator 2 in one dimension, the equations of motion can now be written as (141)

This is in fact a nite dierence scheme to integrate a partial dierential equation. The sad news here which is clearly visible from (141) is that as we rene the spatial discretization, increasing n, we will have to decrease the time step by the same amount so that we maintain hn < 1, the basic criteria for both stability and adequate resolution of the oscillations. Because the local oscillation frequency increases with n, this problem is in fact becoming rapidly sti with increasing number of lumped elements. However, the string still has fundamental frequency due to the coupling between the parts. If one followed a disturbance x (1) in time when integrating with an explicit numerical method, one would see that it propagates by one neighbor for each time step. This gives a longitudinal wave velocity of v l = l/h. However, wave theory predicts that this should be in fact vl = T /. The two quantities have no relation to each other! Observe now that a string has no upper limit on its oscillation frequencies. In theory, there can be waves with frequencies j = j, j = 1, 2, 3, . . . , and this is why the nite dierence model becomes more complicated as we increase the number of samples. For n sample points, we can simulate waves with frequency up to j = n . The drawback is that we have to decrease the time step just in case these modes appear. In practice, what would be preferable is to adjust the model so that it can only simulate frequencies up to some nite value, say 10 . To do this, you need nite element methods where each element represents one type of oscillation in space. That is for another course, however. If we integrate the linear system (141) with the implicit midpoint rule, we nd that the following system of equations has to be solved at each step I h 2I h 2 2 (4n ) D I xk+1 = vk+1 I h 2 2 (4n ) D
h 2I

xk . vk

(142)

Damping can also be added in this system by changing the lower right corner of the matrix on the right hand side to (1 h )I , and keeping h moderately small. An easier way to simulate this system is to rst observe that 1 1 (xk+1 + xk ) = xk + (xk+1 xk ) 2 2 (143) h = xk + (vk+1 + vk ). 4

File: notes.tex

41

## Rev : 1.3 2009/09/08

42 After substituting this into (141), the result is found to be I (4hn )2 D vk+1 = h(4n )2 Dxk + 4 xk+1 I+ (4hn )2 D vk 4

References

(144)

h = xk + (vk+1 + vk ). 2

Such an integrator is called a Crank-Nicholson method. For the case at hand, that is easy to implement in matlab and that is part of the ODE lab. The idea here is that D is just a tridiagonal matrix and is very, very sparse. Likewise, if the rst line of the stepping equations (144) is rewritten as Avk+1 = Bxk + Cvk , all three matrices A, B and C are extremely sparse, in addition to be constant. One can easily construct these matrices in matlab using the spdiags routine and factorize them using the lu function. After that, one needs linear solves which take O(n) to compute, i.e., just about as fast as the fully explicit method. This will be done in the lab. In comparison to an explicit solver applied directly on (141), the stepper dened by (144) is far superior because it actually preserves the vibration modes of the string. Explicit solvers cannot simulate the propagation of waves properly. To simulate a string properly though, you do need to have recourse to a suitable nite element method.

References
 U. M. Ascher and L. R. Petzold, Computer Methods for Ordinary Dierential Equations and Dierential-Algebraic Equations, SIAM Publ., Philadelphia, 1998.  E. Hairer, C. Lubich, and G. Wanner, Geometric Numerical Integration, vol. 31 of Spring Series in Computational Mathematics, Springer-Verlag, Berlin, Heidelberg, New York, London, Paris, Tokyo, Hong Kong, 2001.  E. Hairer, S. P. Nrsett, and G. Wanner, Solving Ordinary Dierential Equations I: Nonsti Problems, vol. 8 of Springer Series in Computational Mathematics, Springer-Verlag, Berlin, Heidelberg, New York, London, Paris, Tokyo, Hong Kong, second revised edition ed., 1991.  E. Hairer and G. Wanner, Solving Ordinary Dierential Equations II: Sti and Dierential Algebraic Problems, vol. 14 of Springer Series in Computational Mathematics, Springer-Verlag, Berlin, Heidelberg, New York, London, Paris, Tokyo, Hong Kong, second revised edition ed., 1996.  B. Leimkuhler and S. Reich, Simulating Hamiltonian Dynamics, vol. 14 of Cambridge Monographs on Applied and Computational Mathematics, Cambridge University Press, Cambridge, 2004.

File: notes.tex

42