
Original Edition 1969
Reprint Edition 1990

Printed and Published by
ROBERT E. KRIEGER PUBLISHING COMPANY, INC.
MALABAR, FLORIDA 32950
Copyright © 1969 by John Wiley and Sons, Inc.

Reprinted by Arrangement

All rights reserved. No part of this book may be reproduced in any form or by any means, electronic or mechanical, including information storage and retrieval systems, without permission in writing from the publisher.
No liability is assumed with respect to the use of the information contained herein.
Printed in the United States of America.

Library of Congress Cataloging-in-Publication Data

Carnahan, Brice.
Applied numerical methods / Brice Carnahan, H.A. Luther, James O. Wilkes.
p. cm.
Reprint. Originally published: New York : Wiley, 1969.
Includes bibliographical references and index.
ISBN 0-89464-486-6 (alk. paper)
1. Numerical analysis. 2. Algorithms. I. Luther, H.A.
II. Wilkes, James O. III. Title.
QA297.C34 1990
519.4-dc20 90-36060
CIP
to
DONALD L. KATZ
A. H. White University Professor of Chemical Engineering
The University of Michigan
Preface

This book is intended to be an intermediate treatment of the theory and applications of numerical methods. Much of the material has been presented at the University of Michigan in a course for senior and graduate engineering students. The main feature of this volume is that the various numerical methods are not only discussed in the text but are also illustrated by completely documented computer programs. Many of the programs relate to problems in engineering and applied mathematics. The reader should gain an appreciation of what to expect during the implementation of particular numerical techniques on a digital computer.

Although the emphasis here is on numerical methods (in contrast to numerical analysis), short proofs or their outlines are given throughout the text. The more important numerical methods are illustrated by worked computer examples. The appendix explains the general manner in which the computer examples are presented, and also describes the flow-diagram convention that is adopted. In addition to the computer examples, which are numbered, there are several much shorter examples appearing throughout the text. These shorter examples are not numbered, and usually illustrate a particular point by means of a short hand-calculation. The computer programs are written in the FORTRAN-IV language and have been run on an IBM 360/67 computer. We assume that the reader is already moderately familiar with the FORTRAN-IV language.

There is a substantial set of unworked problems at the end of each chapter. Some of these involve the derivation of formulas or proofs; others involve hand calculations; and the rest are concerned with the computer solution of a variety of problems, many of which are drawn from various branches of engineering and applied mathematics.

Brice Carnahan
H. A. Luther
James O. Wilkes

Contents

COMPUTER EXAMPLES

CHAPTER 1
Interpolation and Approximation

1.1 Introduction
1.2 Approximating Functions
1.3 Polynomial Approximation-A Survey
The Interpolating Polynomial
The Least-Squares Polynomial
The Minimax Polynomial
Power Series
1.4 Evaluation of Polynomials and Their Derivatives
1.5 The Interpolating Polynomial
1.6 Newton's Divided-Difference Interpolating Polynomial
1.7 Lagrange's Interpolating Polynomial
1.8 Polynomial Interpolation with Equally Spaced Base Points
Forward Differences
Backward Differences
Central Differences
1.9 Concluding Remarks on Polynomial Interpolation
1.10 Chebyshev Polynomials
1.11 Minimizing the Maximum Error
1.12 Chebyshev Economization-Telescoping a Power Series
Problems
Bibliography

CHAPTER 2
Numerical Integration
2.1 Introduction
2.2 Numerical Integration with Equally Spaced Base Points
2.3 Newton-Cotes Closed Integration Formulas
2.4 Newton-Cotes Open Integration Formulas
2.5 Integration Error Using the Newton-Cotes Formulas
2.6 Composite Integration Formulas
2.7 Repeated Interval-Halving and Romberg Integration
2.8 Numerical Integration with Unequally Spaced Base Points
2.9 Orthogonal Polynomials
Legendre Polynomials: Pn(x)
Laguerre Polynomials: Ln(x)
Chebyshev Polynomials: Tn(x)
Hermite Polynomials: Hn(x)
General Comments on Orthogonal Polynomials
2.10 Gaussian Quadrature
Gauss-Legendre Quadrature
Gauss-Laguerre Quadrature
Gauss-Chebyshev Quadrature
Gauss-Hermite Quadrature
Other Gaussian Quadrature Formulas
2.11 Numerical Differentiation
Problems
Bibliography

CHAPTER 3
Solution of Equations

Introduction
Graeffe's Method
Bernoulli's Method
Iterative Factorization of Polynomials
Method of Successive Substitutions
Ward's Method
Newton's Method
Regula Falsi and Related Methods
Rutishauser's QD Algorithm
Problems
Bibliography

CHAPTER 4
Matrices and Related Topics

Notation and Preliminary Concepts


Vectors
Linear Transformations and Subspaces
Similar Matrices and Polynomials in a Matrix
Symmetric and Hermitian Matrices
The Power Method of Mises
Method of Rutishauser
Jacobi's Method for Symmetric Matrices
Method of Danilevski
Problems
Bibliography

CHAPTER 5
Systems of Equations

Introduction
Elementary Transformations of Matrices
Gaussian Elimination
Gauss-Jordan Elimination
A Finite Form of the Method of Kaczmarz
Jacobi Iterative Method
Gauss-Seidel Iterative Method
Iterative Methods for Nonlinear Equations
Newton-Raphson Iteration for Nonlinear Equations
Problems
Bibliography

CHAPTER 6
The Approximation of the Solution of Ordinary Differential Equations

6.1 Introduction
6.2 Solution of First-Order Ordinary Differential Equations
Taylor's Expansion Approach
6.3 Euler's Method
6.4 Error Propagation in Euler's Method
6.5 Runge-Kutta Methods
6.6 Truncation Error, Stability, and Step-Size Control in
the Runge-Kutta Algorithms
6.7 Simultaneous Ordinary Differential Equations
6.8 Multistep Methods
6.9 Open Integration Formulas
6.10 Closed Integration Formulas
6.11 Predictor-Corrector Methods
6.12 Truncation Error, Stability, and Step-Size Control in
the Multistep Algorithms
6.13 Other Integration Formulas
6.14 Boundary-Value Problems
Problems
Bibliography

CHAPTER 7
Approximation of the Solution of Partial Differential Equations

7.1 Introduction
7.2 Examples of Partial Differential Equations
7.3 The Approximation of Derivatives by Finite Differences
7.4 A Simple Parabolic Differential Equation
7.5 The Explicit Form of the Difference Equation
7.6 Convergence of the Explicit Form
7.7 The Implicit Form of the Difference Equation
7.8 Convergence of the Implicit Form
7.9 Solution of Equations Resulting from the Implicit Method
7.10 Stability
7.11 Consistency
7.12 The Crank-Nicolson Method
7.13 Unconditionally Stable Explicit Procedures
DuFort-Frankel Method
Saul'yev Method
Barakat and Clark Method
7.14 The Implicit Alternating-Direction Method
7.15 Additional Methods for Two and Three Space Dimensions
7.16 Simultaneous First- and Second-Order Space Derivatives
7.17 Types of Boundary Condition
7.18 Finite-Difference Approximations at the Interface between
Two Different Media
7.19 Irregular Boundaries
7.20 The Solution of Nonlinear Partial Differential Equations
7.21 Derivation of the Elliptic Difference Equation
7.22 Laplace's Equation in a Rectangle
7.23 Alternative Treatment of the Boundary Points
7.24 Iterative Methods of Solution
7.25 Successive Overrelaxation and Alternating-Direction Methods
7.26 Characteristic-Value Problems
Problems
Bibliography

CHAPTER 8
Statistical Methods

8.1 The Use of Statistical Methods
8.2 Definitions and Notation
8.3 Laws of Probability
8.4 Permutations and Combinations
8.5 Population Statistics
8.6 Sample Statistics
8.7 Moment-Generating Functions
8.8 The Binomial Distribution
8.9 The Multinomial Distribution
8.10 The Poisson Distribution
8.11 The Normal Distribution
8.12 Derivation of the Normal Distribution Frequency Function
8.13 The χ² Distribution
8.14 χ² as a Measure of Goodness-of-Fit
8.15 Contingency Tables
8.16 The Sample Variance
8.17 Student's t Distribution
8.18 The F Distribution
8.19 Linear Regression and Method of Least Squares
8.20 Multiple and Polynomial Regression
8.21 Alternative Formulation of Regression Equations
8.22 Regression in Terms of Orthogonal Polynomials
8.23 Introduction to the Analysis of Variance
Problems
Bibliography

APPENDIX
Presentation of Computer Examples
Flow-Diagram Convention

INDEX
Computer Examples

CHAPTER 1
1.1 Interpolation with Finite Divided Differences

1.2 Lagrangian Interpolation

1.3 Chebyshev Economization

CHAPTER 2
2.1 Radiant Interchange between Parallel Plates-Composite
Simpson's Rule

2.2 Fourier Coefficients Using Romberg Integration

2.3 Gauss-Legendre Quadrature

2.4 Velocity Distribution Using Gaussian Quadrature

CHAPTER 3
3.1 Graeffe's Root-Squaring Method-Mechanical Vibration
Frequencies

3.2 Iterative Factorization of Polynomials

3.3 Solution of an Equation of State Using Newton's Method

3.4 Gauss-Legendre Base Points and Weight Factors by the
Half-Interval Method

3.5 Displacement of a Cam Follower Using the Regula Falsi Method

CHAPTER 4
4.1 Matrix Operations

4.2 The Power Method

4.3 Rutishauser's Method

4.4 Jacobi's Method


CHAPTER 5
5.1 Gauss-Jordan Reduction- Voltages and Currents in an
Electrical Network

5.2 Calculation of the Inverse Matrix Using the Maximum


Pivot Strategy-Member Forces in a Plane Truss

5.3 Gauss-Seidel Method

5.4 Flow in a Pipe Network-Successive-Substitution Method

5.5 Chemical Equilibrium-Newton-Raphson Method

CHAPTER 6
6.1 Euler's Method

6.2 Ethane Pyrolysis in a Tubular Reactor

6.3 Fourth-Order Runge-Kutta Method-Transient Behavior of a
Resonant Circuit

6.4 Hamming's Method

6.5 A Boundary-Value Problem in Fluid Mechanics

CHAPTER 7
7.1 Unsteady-State Heat Conduction in an Infinite, Parallel-Sided
Slab (Explicit Method)

7.2 Unsteady-State Heat Conduction in an Infinite, Parallel-Sided


Slab (Implicit Method)

7.3 Unsteady-State Heat Conduction in a Long Bar of Square


Cross Section (Implicit Alternating-Direction Method)

7.4 Unsteady-State Heat Conduction in a Solidifying Alloy

7.5 Natural Convection at a Heated Vertical Plate

7.6 Steady-State Heat Conduction in a Square Plate

7.7 Deflection of a Loaded Plate

7.8 Torsion with Curved Boundary

7.9 Unsteady Conduction between Cylinders (Characteristic-Value


Problem)
CHAPTER 8
8.1 Distribution of Points in a Bridge Hand

8.2 Poisson Distribution Random Number Generator

8.3 Tabulation of the Standardized Normal Distribution

8.4 χ² Test for Goodness-of-Fit

8.5 Polynomial Regression with Plotting


Applied Numerical Methods
CHAPTER 1

Interpolation and Approximation

1.1 Introduction

This text is concerned with the practical solution of problems in engineering, science, and applied mathematics. Special emphasis is given to those aspects of problem formulation and mathematical analysis which lead to the construction of a solution algorithm or procedure suitable for execution on a digital computer. The identification and analysis of computational errors resulting from mathematical approximations present in the algorithms will be emphasized throughout.

To the question, "Why approximate?", we can only answer, "Because we must!" Mathematical models of physical or natural processes inevitably contain some inherent errors. These errors result from incomplete understanding of natural phenomena, the stochastic or random nature of many processes, and uncertainties in experimental measurements. Often, a model includes only the most pertinent features of the physical process and is deliberately stripped of superfluous detail related to second-level effects.

Even if an error-free mathematical model could be developed, it could not, in general, be solved exactly on a digital computer. A digital computer can perform only a limited number of simple arithmetic operations (principally addition, subtraction, multiplication, and division) on finite, rational numbers. Fundamentally important mathematical operations such as differentiation, integration, and evaluation of infinite series cannot be implemented directly on a digital computer. All such computers have finite memories and computational registers; only a discrete subset of the real, rational numbers may be generated, manipulated, and stored. Thus, it is impossible to represent infinitesimally small or infinitely large quantities, or even a continuum of the real numbers on a finite interval.

Algorithms that use only arithmetic operations and certain logical operations such as algebraic comparison are called numerical methods. The error introduced in approximating the solution of a mathematical problem by a numerical method is usually termed the truncation error of the method. We shall devote considerable attention to the truncation errors associated with the numerical approximations developed in this text.

When a numerical method is actually run on a digital computer after transcription to computer program form, another kind of error, termed round-off error, is introduced. Round-off errors are caused by the rounding of results from individual arithmetic operations because only a finite number of digits can be retained after each operation, and will differ from computer to computer, even when the same numerical method is being used.

We begin with the important problem of approximating one function f(x) by another "suitable" function g(x). This may be written

f(x) ≅ g(x).

There are two principal reasons for developing such approximations. The first is to replace a function f(x) which is difficult to evaluate or manipulate (for example, differentiate or integrate) by a simpler, more amenable function g(x). Transcendental functions given in closed form, such as ln x, sin x, and erf x, are examples of functions which cannot be evaluated by strictly arithmetic operations without first finding approximating functions such as finite power series. The second reason is for interpolating in tables of functional values. The function f(x) is known quantitatively for a finite (usually small) number of arguments called base points; the sampled functional values may then be tabulated at the n + 1 base points x0, x1, ..., xn, as follows:

x:     x0      x1      ...    xn
f(x):  f(x0)   f(x1)   ...    f(xn)

We wish to generate an approximating function that will allow an estimation of the value of f(x) for x ≠ xi, i = 0, 1, ..., n. In some cases, f(x) is known analytically but is difficult to evaluate. We have tables of functional values for the trigonometric functions, Bessel functions, etc. In others, we may know the general class of functions to which f(x) belongs, without knowing the values of specific functional parameters.
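The two kinds of error just described are easy to exhibit on any modern machine. A short sketch in Python (a stand-in here for the book's FORTRAN IV; the particular numbers are purely illustrative):

```python
import math

# Round-off: each arithmetic result is rounded to a finite number of
# binary digits, so algebraically equal expressions can differ slightly.
a = 0.1 + 0.2
print(a == 0.3)        # False on binary floating-point hardware
print(abs(a - 0.3))    # a tiny residue, on the order of 1e-17

# Truncation: an infinite process replaced by a finite one, e.g. summing
# only the first six terms of the series e = 1/0! + 1/1! + 1/2! + ...
e_approx = sum(1.0 / math.factorial(k) for k in range(6))
print(math.e - e_approx)  # truncation error, about 1.6e-3
```

Note that the truncation error here comes from the method (stopping the series), while the round-off residue comes from the machine; the two are present simultaneously in any real computation.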
In the general case, however, only the base-point functional information is given and little is known about f(x) for other arguments, except perhaps that it is continuous in some interval of interest, a ≤ x ≤ b. The tabulated functional values f(x0), ..., f(xn), or even the base-point values x0, ..., xn, may themselves be approximations to true values, particularly when the table entries are the results of experimental measurements.

The synthesis of a new analytical function g(x) that approximates the original function f(x) depends upon many factors such as knowledge of the function, the source and accuracy of the tabulated functional values, the intended use of the approximating function g(x), and accuracy requirements for the approximation. It is intuitively obvious that the more we know about f(x), the greater is the likelihood of finding a suitable function g(x). For example, if a theoretical model suggests that f(x) should behave as a cubic in x, then we would probably begin by attempting to fit the tabulated information with a third-degree polynomial. If f(x) is a measure of activity in a process involving radioactive decay, then it is quite probable that g(x) will be exponential in character.

Information about the reliability of the f(xi) values (i = 0, 1, ..., n) is essential. It would be unrealistic to expect an approximation g(x) to produce estimates of f(x) accurate to four significant figures, if the values f(x0), f(x1), ..., f(xn) used to generate g(x) were accurate to no more than two figures. Even tabulated values of "known" functions are normally only approximate values because of rounding, that is, the representation of any real number by a finite number of digits. Numerical mathematics is not immune to the laws of thermodynamics. We never get something for nothing (although it is often possible to get more than is apparent at first glance).

In some cases, the synthesized function will not be used directly for functional estimation, but instead will be manipulated further. For example, suppose that the integral

∫ab g(x) dx

is required. Assuming that alternative formulations for g(x) are possible, the natural choice would be one that is easy to integrate.

If the function f(x) is not known precisely, then there is certainly no way of evaluating the error committed in replacing f(x) by g(x). Fortunately, it is often possible to find some order-of-magnitude estimate for the error by making reasonable assumptions about f(x) (for example, that f(x) is smooth, that it is monotonic, that its high-order derivatives are small, etc.). If the function f(x) is known precisely or analytically, then it is often possible to establish an upper bound for the error.

1.2 Approximating Functions

The most common approximating functions g(x) are those involving linear combinations of simple functions drawn from a class of functions {gi(x)} of the form

g(x) = a0 g0(x) + a1 g1(x) + ... + an gn(x).

The classes of functions most often encountered are the monomials {x^i}, i = 0, 1, ..., n, the Fourier functions {sin kx, cos kx}, k = 0, 1, ..., n, and the exponentials {e^(bi x)}, i = 0, 1, ..., n. Linear combinations of the monomials lead to polynomials of degree n, pn(x)*:

f(x) ≅ pn(x) = a0 + a1 x + a2 x^2 + ... + an x^n.

Linear combination of the Fourier functions leads to approximations of the form

f(x) ≅ g(x) = a0 + a1 cos x + a2 cos 2x + ... + an cos nx
            + b1 sin x + b2 sin 2x + ... + bn sin nx,

or, more compactly,

g(x) = a0 + Σ(k=1,n) ak cos kx + Σ(k=1,n) bk sin kx.

Approximations employing exponentials are usually of the form

g(x) = a0 e^(b0 x) + a1 e^(b1 x) + ... + an e^(bn x).

Rational approximations,

f(x) ≅ g(x) = (a0 + a1 x + a2 x^2 + ... + an x^n) / (b0 + b1 x + b2 x^2 + ... + bm x^m) = pn(x)/pm(x),

are also used, although less frequently than are the polynomials and Fourier functions.

The algebraic polynomials pn(x) are by far the most important and popular approximating functions. The case for their use is strong, although not overwhelming. The theory of polynomial approximation is well developed and fairly simple. Polynomials are easy to evaluate and their sums, products, and differences are also polynomials. Polynomials can be differentiated and integrated with little difficulty, yielding other polynomials in both cases. In addition, if the origin of the coordinate system is shifted or if the scale of the independent variable is changed, the transformed polynomials remain polynomials, that is, if pn(x) is a polynomial, so are pn(x + a) and pn(ax). Some, but not all, of these favorable properties

* The conventional compact polynomial notation will be used throughout, it being understood that pn(0) = a0.
are possessed by the Fourier approximations as well. As we shall see later, most of the other functions considered as potential candidates for approximating functions (sines, cosines, exponentials, etc.) must themselves be evaluated by using approximations; almost invariably, these approximations are given in terms of polynomials or ratios of polynomials.

All these obvious advantages of the polynomials would be of little value if there were no analytical justification for believing that polynomials can, in fact, yield good approximations for a given function f(x). Here, "good" implies that the discrepancy between an approximating polynomial pn(x) and f(x), that is, the error in the approximation, can be made arbitrarily small. Fortunately, this theoretical justification does exist. Any continuous function f(x) can be approximated to any desired degree of accuracy on a specified closed interval by some polynomial pn(x). This follows from the Weierstrass approximation theorem, stated here without proof [6]:

If f(x) is continuous in the closed interval [a, b] (that is, a ≤ x ≤ b) then, given any ε > 0, there is some polynomial pn(x) of degree n = n(ε) such that |f(x) - pn(x)| < ε for all x in [a, b].

Unfortunately, although it is reassuring to know that some polynomial will approximate f(x) to a specified accuracy, the usual criteria for generating approximating polynomials in no way guarantee that the polynomial found is the one which the Weierstrass theorem shows must exist. If f(x) is in fact unknown except for a few sampled values, then the theorem is of little relevance. (It is comforting nonetheless!)

The case for polynomials as approximating functions is not so strong that other possibilities should be ruled out completely. Periodic functions can often be approximated very efficiently with Fourier functions; functions with an obvious exponential character will be described more compactly with a sum of exponentials, etc. Nevertheless, for the general approximation problem, polynomial approximations are usually adequate and reasonably easy to generate.

The remainder of this chapter will be devoted to polynomial approximations of the form

f(x) ≅ pn(x) = Σ(j=0,n) aj x^j.   (1.1)

For a thorough discussion of several other approximating functions, see Hamming [2].

1.3 Polynomial Approximation-A Survey

After selection of an nth-degree polynomial (1.1) as the approximating function, we must choose the criterion for "fitting the data." This is equivalent to establishing the procedure for computing the values of the coefficients a0, a1, ..., an.

The Interpolating Polynomial. Given the paired values (xi, f(xi)), i = 0, 1, ..., n, perhaps the most obvious criterion for determining the coefficients of pn(x) is to require that

pn(xi) = f(xi),   i = 0, 1, ..., n.   (1.2)

Thus the nth-degree polynomial pn(x) must reproduce f(x) exactly for the n + 1 arguments x = xi. This criterion seems especially pertinent since (from a fundamental theorem of algebra) there is one and only one polynomial of degree n or less which assumes specified values for n + 1 distinct arguments. This polynomial, called the nth-degree interpolating polynomial, is illustrated schematically for n = 3 in Fig. 1.1. Note that requirement (1.2) establishes the value of pn(x) for all x, but in no way guarantees accurate approximation of f(x) for x ≠ xi, that is, for arguments other than the given base points. If f(x) should be a polynomial of degree n or less, agreement is of course exact for all x.

Figure 1.1 The interpolating polynomial.

The interpolating polynomial will be developed in considerable detail in Sections 1.5 to 1.9.

The Least-Squares Polynomial. If there is some question as to the accuracy of the individual values f(xi), i = 0, 1, ..., n (often the case with experimental data), then it may be unreasonable to require that a polynomial fit the f(xi) exactly. In addition, it often happens that the desired polynomial is of low degree, say m, but that there are many data values available, so that n > m. Since the exact matching criterion of (1.2) for n + 1 functional values can be satisfied only by one polynomial of degree n or less, it is generally impossible to find an interpolating polynomial of degree m using all n + 1 of the sampled functional values.

Some other measure of goodness-of-fit is needed. Instead of requiring that the approximating polynomial reproduce the given functional values exactly, we ask only that it fit the data as closely as possible.
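Requirement (1.2) and the exactness property are easy to verify numerically. The following Python sketch (a preview of the Lagrange form developed in Section 1.7, not the book's FORTRAN program; the data values are invented for illustration) evaluates the unique interpolating polynomial through n + 1 points:

```python
def interp(xs, fs, x):
    """Evaluate the unique interpolating polynomial of degree <= n
    through the n + 1 points (xs[i], fs[i]) at the argument x."""
    total = 0.0
    for i, (xi, fi) in enumerate(zip(xs, fs)):
        term = fi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

xs = [0.0, 1.0, 2.0, 3.0]                  # n = 3, as in Fig. 1.1
fs = [1.0, 2.0, 0.0, 5.0]
print([interp(xs, fs, xi) for xi in xs])   # reproduces fs at every base point
# If f(x) is itself a polynomial of degree <= n, agreement is exact for
# all x: interpolate f(x) = x^2 through the same base points.
print(interp(xs, [x * x for x in xs], 1.5))  # 2.25
```

Between the base points, of course, nothing is guaranteed for a general f(x); only the second call, where f(x) is a quadratic, is exact everywhere.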
meanings which might be ascribed to "as closely as possible," the most popular involves application of the least-squares principle. We fit the given n + 1 functional values with pm(x), a polynomial of degree m, requiring that the sum of the squares of the discrepancies between the f(xi) and pm(xi) be a minimum. If the discrepancy at the ith base point xi is given by di = pm(xi) - f(xi), the least-squares criterion requires that the aj, j = 0, 1, ..., m, be chosen so that the aggregate squared error

E = Σ(i=0,n) di^2 = Σ(i=0,n) [pm(xi) - f(xi)]^2   (1.3)

be as small as possible. If m should equal n, the minimum error E is exactly zero, and the least-squares polynomial is identical with the interpolating polynomial. Figure 1.2 illustrates the fitting of five functional values (n = 4) with a least-squares polynomial of degree one (m = 1), that is, a straight line.

Figure 1.2 The least-squares polynomial.

When the values f(xi) are thought to be of unequal reliability or precision, the least-squares criterion is sometimes modified to require that the squared error at xi be multiplied by a nonnegative weight factor w(xi) before the aggregate squared error is calculated, that is, (1.3) assumes the form

E = Σ(i=0,n) w(xi)[pm(xi) - f(xi)]^2.

The weight w(xi) is thus a measure of the degree of precision or relative importance of the value f(xi) in determining the coefficients of the weighted least-squares polynomial pm(x).

The least-squares principle may also be used to find an approximating polynomial pm(x) for a known continuous function f(x) on the interval [a, b]. In this case the object is to choose the coefficients of pm(x) which minimize

E = ∫ab w(x)[pm(x) - f(x)]^2 dx.

Here, w(x) is a nonnegative weighting function; in many cases, w(x) = 1.

Since the motivation for the least-squares criterion is essentially statistical in nature, further description of the least-squares polynomial will be delayed until Chapter 8.

The Minimax Polynomial. Another popular criterion, termed the minimax principle, requires that the coefficients of the approximating polynomial pm(x) be chosen so that the maximum magnitude of the differences f(xi) - pm(xi), i = 0, 1, ..., n (m < n), be as small as possible. Then the minimax polynomial of degree m must satisfy the condition

max(over i) |f(xi) - pm(xi)| = minimum,   (1.4)

that is, pm(x) must minimize the maximum error. In more general form, this condition may be written

max(a ≤ x ≤ b) |f(x) - pm(x)| = minimum.

The minimax polynomial is often called the optimal polynomial approximation.

The minimax principle is attributed to Chebyshev, and the minimax polynomials are closely related to the Chebyshev polynomials described in Section 1.10.

Power Series. If a function f(x) is continuous and suitably differentiable, it can be written in terms of a Taylor's series. We assume that the reader is familiar with the Taylor's (power) series expansion of a function; the development may be found in any elementary calculus text. One of the most useful and easily generated polynomial approximations pn(x) of a function (provided that the required derivative terms can be evaluated) results from truncation of its power series expansion after the nth-degree term. In order to establish a bound for the error introduced by the truncation process, we make use of Taylor's formula with remainder [8]:

If a continuous function f(x) possesses a continuous (n + 1)th derivative everywhere on the interval [x0, x], it can be represented by a finite power series

f(x) = f(x0) + (x - x0) f'(x0) + ((x - x0)^2/2!) f''(x0) + ... + ((x - x0)^n/n!) f^(n)(x0) + R(x)
     = pn(x) + R(x),   (1.5)

where R(x), the remainder, is given by

R(x) = ((x - x0)^(n+1)/(n+1)!) f^(n+1)(ξ).   (1.6)
Here x0 < ξ < x or, if x < x0, x < ξ < x0. Henceforth this will be written more succinctly as ξ in (x,x0).†

The parameter ξ in (1.6) is an unknown function of x. Hence, except in very special cases (for example, when f^(n+1)(ξ) is a constant), it is impossible to evaluate the error or remainder term R(x) exactly. Nevertheless, we shall see that (1.6) can prove useful in establishing an upper bound for the error incurred when pn(x) is used to approximate f(x). Provided that f(x) meets the continuity and differentiability requirements of the formula, the polynomial pn(x) given by (1.5) may be viewed as fitting exactly the n + 1 paired values (x0, f(x0)), (x0, f'(x0)), (x0, f''(x0)), ..., (x0, f^(n)(x0)), since pn^(i)(x0) = f^(i)(x0).

For the most commonly encountered polynomial approximation problem, in which only the functional values f(xi) are known at n + 1 distinct base points xi, i = 0, 1, ..., n, the Taylor's expansion of (1.5) is of little use, since it is usually not possible to evaluate the required derivatives. However, if the function f(x) is known analytically and is simply differentiable, then useful approximations pn(x) can be found easily. By establishing an upper bound for the magnitude of the remainder R(x), and hence of the error in pn(x), we can use the approximating polynomial with complete confidence in the region of interest.

Example. Expand the function f(x) = cos x in a Taylor's series. Use Taylor's formula with remainder to find a third-degree polynomial approximation p3(x) ≈ cos x. For x0 = 0 and x0 = π/4, estimate cos(π/2) from p3(x); establish bounds on the errors in the estimated values, using (1.6).

Cos x and its first four derivatives are:

f(x) = cos x,   f'(x) = -sin x,   f''(x) = -cos x,   f'''(x) = sin x,   f^(4)(x) = cos x.

Substitution into (1.5) and (1.6) with n = 3 yields:

cos x ≈ p3(x) = cos x0 - (x - x0) sin x0 - ((x - x0)^2/2!) cos x0 + ((x - x0)^3/3!) sin x0,    (1.7)

R(x) = ((x - x0)^4/4!) cos ξ,   ξ in (x,x0).

For x0 = 0: sin(0) = 0, cos(0) = 1, so that

cos x ≈ 1 - x^2/2,    (1.8)

R(x) = (x^4/4!) cos ξ,   ξ in (x,0).

For x0 = π/4:

cos x ≈ (√2/2)[1 - (x - π/4) - (x - π/4)^2/2! + (x - π/4)^3/3!],    (1.9)

R(x) = ((x - π/4)^4/4!) cos ξ,   ξ in (x,π/4).

We now use (1.8) and (1.9) to estimate cos(π/2). From (1.8), cos(π/2) ≈ 1 - π^2/8 = -0.2337. Since ξ is unknown, but 0 < ξ < π/2, an estimate of the error bound is given by

|R(π/2)| ≤ ((π/2)^4/4!)(1) = 0.2537.

In this case, since the sign of the error is known (because cos ξ is positive for all ξ, 0 < ξ < π/2), we could also write:

-0.2337 < cos(π/2) < -0.2337 + 0.2537 = 0.0200.

Note that although the error is sizeable [cos(π/2) = 0.0], it is smaller than the predicted upper bound.

From (1.9),

cos(π/2) ≈ (√2/2)[1 - π/4 - (π/4)^2/2! + (π/4)^3/3!] = -0.0092.

Since cos ξ has a maximum value 0.7071 on the interval π/4 ≤ ξ ≤ π/2,

|R(π/2)| ≤ ((π/4)^4/4!)(0.7071) = 0.0112.

As before, the sign of R is known to be positive, and the approximation could be written:

-0.0092 < cos(π/2) < -0.0092 + 0.0112 = 0.0020.

The error is again smaller than the upper bound. The approximation is much better than in the previous case.

The preceding example illustrates the influence of the polynomial (x - x0)^(n+1) on the remainder term R(x) of (1.6). Normally, to approximate best a function f(x) on the interval [a,b], x0 should be chosen near the middle of the interval; we can show that the choice x0 = (a + b)/2 minimizes the maximum contribution of the term (x - x0)^(n+1) to the remainder for a ≤ x ≤ b. For a fixed value of n (that is, for a fixed number of terms retained in the series), this is about the only practical way of reducing the magnitude of the error in the approximating polynomial pn(x). The value of f^(n+1)(ξ) for a given x in [a,b] cannot be computed, in general, since ξ is an unknown function of x. Consequently, in estimating the error R(x), we can only be certain that |f^(n+1)(ξ)| can be no greater than the maximum value of |f^(n+1)(w)| for w in [a,b]. Of course, if each successive term in the Taylor's expansion is smaller in magnitude than the previous ones (often but not always the case), another way to lower the upper bound on the magnitude of R(x) is to include additional terms in the approximation, that is, to increase the degree of the approximating polynomial.

† In general, the notation ξ in (x, x0, x1, ..., xn) will indicate that ξ is in the open interval determined by the smallest and largest of the enclosed arguments.
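The arithmetic in the example above can be checked numerically. The following Python sketch (a modern illustration; the book's own programs are written in FORTRAN IV) evaluates the third-degree expansion (1.7) and the simplest bound from (1.6), obtained by taking |cos ξ| ≤ 1:

```python
import math

def taylor_cos_p3(x, x0):
    """Third-degree Taylor polynomial for cos x about x0, per (1.7)."""
    d = x - x0
    return (math.cos(x0) - d * math.sin(x0)
            - d**2 / 2.0 * math.cos(x0) + d**3 / 6.0 * math.sin(x0))

def remainder_bound(x, x0):
    """Bound on |R(x)| from (1.6), using |cos xi| <= 1: |x - x0|**4 / 4!."""
    return abs(x - x0) ** 4 / 24.0

# Estimate cos(pi/2) from expansions about x0 = 0 and x0 = pi/4;
# the actual error must lie within the predicted bound in each case.
for x0 in (0.0, math.pi / 4):
    approx = taylor_cos_p3(math.pi / 2, x0)
    error = abs(math.cos(math.pi / 2) - approx)
    assert error <= remainder_bound(math.pi / 2, x0)
```

As in the worked example, the expansion about π/4 (near the middle of the interval [0, π/2]) gives a far smaller error than the expansion about 0.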
Interpolation and Approximation
1.4 Evaluation of Polynomials and Their Derivatives

Efficient evaluation of a polynomial

pn(x) = a0 + a1 x + a2 x^2 + ... + an x^n

may be important if pn(x) is to be computed many times for different values of x. Straightforward term-by-term evaluation is inefficient, particularly for large values of n. If each factor x^k is computed by k - 1 repeated multiplications of x, then n(n + 1)/2 multiplications and n additions are required to evaluate pn(x). If each factor x^k is calculated by successive multiplications, x·x^(k-1), then 2n - 1 multiplications and n additions are required. However, if the nested form of pn(x),

pn(x) = a0 + x(a1 + x(a2 + ... + x(a_{n-1} + x an) ... )),    (1.10)

is used, then only n multiplications and n additions are required per evaluation. The procedure described by (1.10) is called Horner's rule. It has been shown [5] that for n ≤ 4, the nested evaluation requires the minimum possible number of arithmetic operations. For polynomials of higher degree, other schemes which require fewer than 2n operations (on the average) are known, particularly when pn(x) is to be evaluated many times. The procedure is generally different for each n [4,19]. For most applications, the nested computing scheme of (1.10) is adequate.

Since the calculations in the innermost parentheses must be performed first, the computing procedure for evaluating pn(x) can be seen more easily if the terms on the right-hand side of (1.10) are written in reversed order as follows:

pn(x) = (( ... (an x + a_{n-1})x + a_{n-2})x + ... + a1)x + a0.

For example, for n = 4,

p4(x) = (((a4 x + a3)x + a2)x + a1)x + a0.

Using the flow diagram conventions outlined in the appendix, two algorithms for implementing the computation of (1.10) are shown in Fig. 1.3. In each case, the value of pn(x) is assigned to the symbol p.

Transcription of these two flow diagrams into two FORTRAN IV statement sequences is complicated somewhat by the following restrictions in the FORTRAN IV language as implemented on most of the digital computers now in use [10,11]:

1. Zero is not a permitted subscript for subscripted variables.
2. Subscripts are limited to simple arithmetic expressions of the form c1*v ± c2, where c1 and c2 are nonnegative integer constants and v is a nonsubscripted integer variable.
3. The initial value, increment, and terminal value of the iteration variable in the iteration (DO) statement must be positive integer constants or variables.
4. The iteration (DO) loop is executed at least once, even when the upper limit of the iteration variable is smaller than the initial value of the iteration variable.

Some of these difficulties can be avoided if a different subscription convention is used in (1.10). However, given the subscription convention of (1.10), assume that FORTRAN variables N and X have been assigned the values of n and x, respectively; let the values of the coefficients a0, ..., an be assigned in sequence to the elements of the FORTRAN subscripted variable A from A(1) to A(N + 1). Thus coefficient ai is assigned to array element A(I + 1), where problem variable i is equivalent to program variable I. If J is a counting variable and NP1 is a variable assigned the value n + 1, FORTRAN IV statement sequences which describe algorithms 1 and 2 and assign the computed value of pn(x) to the FORTRAN variable P are:

Algorithm 1        Algorithm 2

In both FORTRAN sequences a few integer operations could be eliminated, but not without destroying the equivalence of program variable I and problem variable i.

McCracken and Dorn [9] show that in addition to saving computing time, the nested scheme leads to a lower bound for the total round-off error than straightforward term-by-term evaluation when the coefficients ai become smaller with increasing i. The term round-off error refers to those errors resulting from the rounding and/or truncation of the results of individual arithmetic operations on a computing machine. They are generated because the memories of all real machines are finite, and only a fixed, usually small, number of digits can be retained after each arithmetic operation.
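The nested scheme (1.10) can be sketched compactly in a modern language (Python used here for illustration; the book's algorithms are FORTRAN IV sequences):

```python
def horner(a, x):
    """Evaluate p_n(x) = a[0] + a[1]*x + ... + a[n]*x**n by the
    nested form (1.10): one multiplication and one addition per step."""
    p = a[-1]                      # innermost parentheses: a_n
    for coeff in reversed(a[:-1]):
        p = p * x + coeff          # p <- p*x + a_i
    return p

# p4(x) = 3x^4 + 2x^3 - x^2 + 2x - 5, evaluated at x = 2
# (the polynomial used in the Section 1.4 example):
print(horner([-5, 2, -1, 2, 3], 2.0))   # 59.0
```

Note that the loop runs exactly n times, so n multiplications and n additions are performed, as claimed for Horner's rule.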
Figure 1.3 Evaluation of a polynomial pn(x) using Horner's rule (Algorithm 1 and Algorithm 2).

The extent of round-off error associated with any algorithm depends upon the computing machine used (the number of digits retained, whether numbers are rounded or truncated, etc.), the particular sequence of machine operations used, and the values of the various numbers involved in these machine operations.

Analysis of the round-off error present in the final result of a numerical computation, usually termed the accumulated round-off error, is difficult, particularly when the algorithm used is of some complexity. Except in very simple cases, the accumulated error is not simply the sum of the local round-off errors, that is, errors resulting from individual rounding or truncation operations. The local error at any stage of the calculation is propagated (either magnified or diminished) throughout the remaining part of the computation. In order to establish a round-off error bound, one must assume the worst possible outcome for the result of each arithmetic operation and follow the propagation of all such errors throughout the remaining calculations. In cases where it is possible to do this, the resulting bounds are almost invariably very conservative; the observed errors are usually much smaller than the calculated bounds, although they can be very considerable in some problems (we shall attempt to point out those algorithms in which round-off error may be a serious problem). In recent years, numerical analysts have attempted to create statistical models of the propagation of rounding errors [12], wherein local round-off errors are treated as if they were random variables. Such models produce much smaller and usually more realistic estimates of the round-off error actually observed than do the round-off error bounds.

The sequence of factors computed during evaluation of pn(x) for a particular argument x = t is closely related to that involved in removing the factor (x - t) from pn(x) by synthetic division. Using the notation of (1.10), let

bn = an,
bi = b_{i+1} t + ai,   i = n - 1, ..., 0.    (1.11)

The sequence of values of the bi in (1.11) is exactly the sequence of values assumed by the FORTRAN variable P in the two FORTRAN program segments above, and b0 is equivalent to the right-hand side of (1.10) with x = t, that is, b0 = pn(t). For example, for n = 4,

b4 = a4,   b3 = b4 t + a3,   b2 = b3 t + a2,   b1 = b2 t + a1,   b0 = b1 t + a0 = p4(t).

Now let us divide pn(x) by the factor (x - t) to yield

pn(x) = (x - t) q_{n-1}(x) + R0.

Here q_{n-1}(x) is a polynomial of degree n - 1 and R0 is the constant remainder. The coefficients of q_{n-1}(x) and R0 depend on t. By carrying out the indicated division in longhand or, more simply, by equating coefficients of like powers of x on both sides of the last equation, it can be seen that

q_{n-1}(x) = bn x^(n-1) + b_{n-1} x^(n-2) + ... + b2 x + b1,   R0 = b0,

where the bi are given by (1.11).

If we divide q_{n-1}(x) by the factor (x - t) in similar fashion we get a polynomial q_{n-2}(x) of degree n - 2 with constant remainder R1 such that

q_{n-1}(x) = (x - t) q_{n-2}(x) + R1.

Let q_{n-2}(x) be written as

q_{n-2}(x) = cn x^(n-2) + c_{n-1} x^(n-3) + ... + c3 x + c2.

By analogy with (1.11), we may write

cn = bn,
ci = c_{i+1} t + bi,   i = n - 1, ..., 1,

where c1 = R1 = q_{n-1}(t). The original polynomial becomes

pn(x) = (x - t)^2 q_{n-2}(x) + (x - t) R1 + R0
and its first derivative becomes

pn'(x) = 2(x - t) q_{n-2}(x) + (x - t)^2 q_{n-2}'(x) + R1,

so that pn'(t) = R1 = c1. The process of using the nested form of (1.10) for q_{n-1}(x), followed by division by a factor (x - t) to generate a new polynomial q_{n-2}(x) with remainder R2, is continued until the original polynomial is written in the form

pn(x) = Rn(x - t)^n + R_{n-1}(x - t)^(n-1) + ... + R2(x - t)^2 + R1(x - t) + R0.

Then pn(t) = R0, pn'(t) = R1, pn''(t) = 2R2, etc. In general, pn^(i)(t) = i! Ri. This procedure is best illustrated by preparing a table of the ai, bi, ci, and coefficients of the other intermediate polynomials q_{n-3}(x), q_{n-4}(x), etc. If we let Ti,j be the entry in the ith row and jth column of the table, then the procedure for calculating the nonzero table elements is given by the algorithm of (1.12).

Example. Use the algorithm of (1.12) to evaluate the polynomial

p4(x) = 3x^4 + 2x^3 - x^2 + 2x - 5

and its derivatives of order 1 through 4 for x = 2. The elements of the T table are computed using the first three steps of (1.12). Using step 4 of (1.12), the values of the polynomial and its first four derivatives for x = 2 are:

p4(2) = R0 = 59,
p4'(2) = R1 = 118,
p4''(2) = 2R2 = 166,
p4'''(2) = 6R3 = 156,
p4^(4)(2) = 24R4 = 72.

1.5 The Interpolating Polynomial

It is important to reiterate that there is one and only one polynomial of degree n or less which assumes the exact values f(x0), f(x1), ..., f(xn) at n + 1 distinct base points x0, x1, ..., xn, that is, satisfies (1.2). Therefore, although the many polynomial interpolation formulas to be found in the literature appear to be different, those which use the same base-point information and the same criterion (1.2) for computing the coefficients a0, a1, ..., an must be fundamentally the same. The interpolating polynomial has coefficients ai which are the solutions of the set of n + 1 simultaneous linear equations:

a0 + a1 x0 + a2 x0^2 + ... + an x0^n = f(x0)
a0 + a1 x1 + a2 x1^2 + ... + an x1^n = f(x1)
...
a0 + a1 xn + a2 xn^2 + ... + an xn^n = f(xn).

The determinant of the matrix of coefficients for these equations,

| 1  x0  x0^2 ... x0^n |
| 1  x1  x1^2 ... x1^n |
| .   .    .        .  |
| 1  xn  xn^2 ... xn^n |,

is known as the Vandermonde determinant and is nonzero if xi ≠ xj, i ≠ j. Thus (see Chapter 4) there is a unique solution for the ai, that is, there is a unique polynomial pn(x) which exactly reproduces f(x) at the sample points.

Since the coefficients of the desired polynomial may be computed by solving this set of simultaneous linear equations, there is some question as to the need for interpolation formulas. One reason for not using the simultaneous-equations approach is that solving a linear system of any size is not an easy task (see Chapter 5), particularly if hand methods are being used. More important, perhaps, the development of an interpolating formula often produces an error term as a by-product. While it is not possible to evaluate this error term exactly, one can frequently find an upper bound for the error or, barring that, an order-of-magnitude estimate of the error. The importance of such information can hardly be overstated.
Obviously, the smaller the error bound, the greater confidence one has in using the derived polynomial.

Most numerical analysis textbooks cover in great detail the subject of interpolation by formulas derived from the criterion of (1.2). All can be classified into one of two groups: those applicable for arbitrarily spaced base points and those for evenly spaced base points, that is, for base points

x1 = x0 + h,   x2 = x0 + 2h,   ...,   xn = x0 + nh,

where h is the constant spacing or stepsize between adjacent ordered xi values.

In what follows, the two most common forms of the interpolating polynomial for arbitrarily spaced base points, Newton's divided-difference interpolating polynomial and Lagrange's interpolating polynomial, will be developed in some detail. Both permit arbitrary ordering of the base points x0, x1, x2, ..., xn as well. Duplicate base-point values are not permitted, since the Vandermonde determinant would vanish and the system of equations in the unknown coefficients would have no unique solution.

The equal-interval formulas have been developed to simplify interpolation in tables of functions with evenly spaced arguments. Some have been developed specifically for interpolation near the beginning, middle, or end of a table. All can be derived from either Lagrange's or Newton's divided-difference interpolating polynomial. With the widespread use of high-speed computers, tabular interpolation of this sort has lost much of its importance. Almost without exception, functional values needed for digital computation are generated directly by subroutines from the computer's program library. As a consequence, only a few of the most important equal-interval formulas will be shown, primarily because of their use in the development of numerical integration formulas (see Chapter 2).

1.6 Newton's Divided-Difference Interpolating Polynomial

Consider the definition of the derivative:

f'(x) = lim_{x1 → x} (f(x1) - f(x))/(x1 - x).

For the finite or discrete mathematics, it is useful to define an approximation to the derivative,

f[x,x1] = (f(x1) - f(x))/(x1 - x),    (1.13)

where f[x,x1] is termed the first finite divided difference, or the finite divided difference of order one, relative to arguments x, x1.

The relationship of the first finite divided difference and the first derivative is clearly indicated by the differential mean-value theorem from the elementary calculus:

Let f(x) be continuous for a ≤ x ≤ b and differentiable for a < x < b; then there exists at least one ξ, a < ξ < b, for which

f'(ξ) = (f(b) - f(a))/(b - a).

Graphically (see Fig. 1.4), this simply means that if f(x) is continuous and suitably differentiable on some interval in x, then there is at least one point x = ξ on the interval for which the slope of the line tangent to f(x) [that is, the derivative, f'(ξ)] is the same as the slope of the line joining the functional values at the ends of the interval.

Figure 1.4 Illustration of the differential mean-value theorem.

Thus the first finite divided difference of (1.13) is related to the first derivative (provided that the continuity and differentiability restrictions of the theorem are met) as follows:

f[x,x1] = f'(ξ),   ξ in (x,x1).    (1.14)

In order to permit similar approximations for higher-order derivatives as well, the concept of the finite divided difference is extended as shown in Table 1.1, where it is assumed that tabulated sample values f(x0), f(x1), ..., f(xn) are available at n + 1 discrete base points x0, x1, ..., xn.
Table 1.1 The Finite Divided Differences

Order   Notation            Definition
1       f[x1,x0]            (f(x1) - f(x0))/(x1 - x0)
2       f[x2,x1,x0]         (f[x2,x1] - f[x1,x0])/(x2 - x0)
...
n       f[xn,...,x0]        (f[xn,...,x1] - f[x_{n-1},...,x0])/(xn - x0)

The general relationship between higher-order finite divided differences and the derivatives of corresponding orders is shown later in (1.41). Note that the divisor in each divided difference involves the difference of the two arguments which are not common to the two divided differences in the numerator.

It is apparent from the definition of the divided difference of order one that

f[x1,x0] = f[x0,x1],    (1.15)

that is, the order of the two arguments is immaterial. With a little algebraic manipulation it can be shown that

f[x2,x1,x0] = f[x_{α2},x_{α1},x_{α0}],

where α2, α1, α0 is any permutation of the integers 2, 1, 0. In general it follows by induction that

f[xn,x_{n-1},...,x0] = f[x_{αn},x_{α(n-1)},...,x_{α0}],    (1.16)

where the sequence of integers αn, α_{n-1}, α_{n-2}, ..., α0 is any permutation of n, n - 1, n - 2, ..., 0.

It is also apparent from the definition that

f[x1,x0] = f(x0)/(x0 - x1) + f(x1)/(x1 - x0).

Algebraic manipulation of differences of increasing order leads by induction to a similar symmetric form for the nth divided difference in terms of the tabulated arguments and functional values. This symmetric form can be written more compactly as

f[xn,x_{n-1},...,x0] = Σ_{i=0}^{n} [ f(xi) / Π_{j=0, j≠i}^{n} (xi - xj) ].    (1.17)

Consider the problem of linear interpolation to evaluate the linear function f(x) at argument x, x0 < x < x1, given only the base-point information x0, f(x0) and x1, f(x1). This situation is shown schematically in Fig. 1.5. From geometric considerations alone, it is apparent that, for this case,

f[x,x0] = f[x1,x0].    (1.18)
Figure 1.5 Linear interpolation—f(x) linear.

Replacing f[x,x0] by its definition from Table 1.1 yields

(f(x) - f(x0))/(x - x0) = f[x1,x0],

or

f(x) = f(x0) + (x - x0) f[x1,x0]
     = f[x0] + (x - x0) f[x1,x0]
     = p1(x).    (1.19)

Since all the elements on the right-hand side of (1.19), which defines a straight line or first-degree interpolating polynomial p1(x), can be evaluated from known quantities, the linear interpolant f(x) can be computed directly. The procedure or algorithm of (1.19) is, of course, the one commonly used for linear interpolation, disguised somewhat by the presence of the divided-difference notation.

Consider linear interpolation between the tabulated points (1,1) and (3,5), shown schematically (Fig. 1.6a) and in tabular form (Fig. 1.6b).

Figure 1.6 Numerical example for linear interpolation.

Substituting the tabular values into (1.19) yields

p1(x) = f(x) = 1 + (x - 1)(2) = 2x - 1,

which is the straight line (first-degree polynomial) passing through the points (1,1) and (3,5).

Now let us reexamine the case of linear interpolation shown above. Suppose that f(x) is not linear, that is, that equation (1.18) is only an approximation,

f[x,x0] ≈ f[x1,x0].    (1.20)

Equation (1.19) is then, of course, also only an approximation. This situation is shown schematically in Fig. 1.7.

Figure 1.7 Linear interpolation—f(x) not linear.

To account for any discrepancy and to restore the desired equality, an error or remainder term R1(x) can be appended to (1.19). Then

f(x) = f[x0] + (x - x0) f[x1,x0] + R1(x).    (1.21)

Solving (1.21) for R1(x) and collecting factors in terms of the finite divided differences yields

R1(x) = (x - x0)(f[x,x0] - f[x1,x0])
      = (x - x0)(x - x1) f[x,x1,x0].    (1.22)

Then (1.21) has the form

f(x) = f[x0] + (x - x0) f[x1,x0] + (x - x0)(x - x1) f[x,x1,x0]
     = p1(x) + R1(x),    (1.23)

where p1(x) [see (1.19)] is, as before, the first-degree interpolating polynomial passing through the sample points (x0,f(x0)) and (x1,f(x1)). R1(x) is the discrepancy between f(x) and p1(x) and will be termed the remainder or error term for the first-degree polynomial approximation of f(x).

It is, of course, impossible to compute f[x,x1,x0] exactly, since f(x), required for its evaluation, is unknown (otherwise there would be no need for interpolation in the first place). However, if an additional value of f(x) is
known, say f(x2) at x = x2, then, on the assumption that f[x,x1,x0] is not a rapidly changing function on the interval containing x0, x1, and x2, that is,

f[x,x1,x0] ≈ f[x2,x1,x0],    (1.24)

R1(x) may be estimated as

R1(x) ≈ (x - x0)(x - x1) f[x2,x1,x0].

Now consider the functional values and divided differences of order one, two, and three, shown in Table 1.2 (a divided-difference table).ᵃ The generating function for this table is f(x) = x^3 - 2x^2 + 7x - 5.

Table 1.2 Divided-Difference Table (base points x0 = 0, x1 = 1, x2 = 3; the tabulated differences include f[x0] = -5, f1[x1,x0] = 6, f2[x2,x1,x0] = 2, and f3[x3,x2,x1,x0] = 1)

ᵃ Here f1[ ] is an abbreviation indicating the first divided difference, f2[ ] the second divided difference, etc.

A linear interpolation for an estimated value of f(x) at x = 0.5, using the base points x0, x1, is described by (1.19):

p1(x) = f[x0] + (x - x0) f[x1,x0],

or

p1(x) = -5 + (x - 0)(6) = 6x - 5.

Thus

p1(0.5) = -5 + (0.5 - 0)(6) = -5 + 3 = -2.

On the assumption that (1.24) applies, an estimate of the error is given by

R1(0.5) ≈ (0.5 - 0)(0.5 - 1)(2) = -0.5.

Suppose that this linear interpolation is thought to be inadequate. To introduce some curvature into the approximating function, assume for the moment that the second divided difference f[x,x1,x0] is constant and given by

f[x,x1,x0] = f[x2,x1,x0].    (1.25)

Then the remainder term given by (1.24) can be incorporated into (1.23) to yield a second-degree polynomial

p2(x) = f[x0] + (x - x0) f[x1,x0] + (x - x0)(x - x1) f[x2,x1,x0].    (1.26)

Evaluation of (1.26) for arguments x = x0, x = x1, and x = x2 shows that it satisfies criterion (1.2) for n = 2; hence p2(x) is the second-degree interpolating polynomial passing through (x0,f(x0)), (x1,f(x1)), and (x2,f(x2)). Substituting values from Table 1.2 into equation (1.26) yields

p2(x) = -5 + (x - 0)(6) + (x - 0)(x - 1)(2) = 2x^2 + 4x - 5,

and

p2(0.5) = -2.5.

Without information that f(x) is in fact a second-degree polynomial, equation (1.26) must be modified to include a remainder term to account for any discrepancy between f[x,x1,x0] and f[x2,x1,x0]:

f(x) = p2(x) + R2(x).    (1.27)

In a manner completely analogous to that followed in developing (1.22), it can be shown that the remainder for the second-degree approximation is given by

R2(x) = (x - x0)(x - x1)(x - x2) f[x,x2,x1,x0].    (1.28)

f[x,x2,x1,x0] cannot be determined, just as f[x,x1,x0] could not be computed for the linear case. But f[x,x2,x1,x0] can be estimated if an additional data point, say (x3, f(x3)), is available, by assuming that f[x,x2,x1,x0] is approximately equal to f[x3,x2,x1,x0]. For the differences of Table 1.2, then,

f[x,x2,x1,x0] ≈ f[x3,x2,x1,x0] = 1.

By assuming that f[x,x2,x1,x0] is constant and equal to f[x3,x2,x1,x0], (1.28) can be incorporated into (1.27) to yield the third-degree polynomial

p3(x) = f[x0] + (x - x0) f[x1,x0] + (x - x0)(x - x1) f[x2,x1,x0] + (x - x0)(x - x1)(x - x2) f[x3,x2,x1,x0].    (1.29)

Evaluation of (1.29) for arguments x = x0, x = x1, x = x2, and x = x3 shows that it satisfies criterion (1.2) for n = 3; hence p3(x) given by (1.29) is the third-degree interpolating polynomial passing through the points (x0,f(x0)), (x1,f(x1)), (x2,f(x2)), and (x3,f(x3)). For the functional values and finite divided differences of Table 1.2,

p3(x) = -5 + 6x + 2x(x - 1) + x(x - 1)(x - 3) = x^3 - 2x^2 + 7x - 5,

and

p3(0.5) = -1.875.
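The construction of p1(x), p2(x), and the higher-degree forms above can be sketched in Python (an illustrative modern rendering; the base points x0 = 0, x1 = 1, x2 = 3 reproduce the differences 6 and 2 quoted from Table 1.2):

```python
def newton_coefficients(xs, fs):
    """Top edge of the divided-difference table, computed in place:
    returns [f[x0], f[x1,x0], f[x2,x1,x0], ...]."""
    coef = list(fs)
    for order in range(1, len(xs)):
        for i in range(len(xs) - 1, order - 1, -1):
            coef[i] = (coef[i] - coef[i - 1]) / (xs[i] - xs[i - order])
    return coef

def newton_eval(xs, coef, x):
    """Evaluate the Newton form (1.31) by nested multiplication."""
    p = coef[-1]
    for c, xi in zip(reversed(coef[:-1]), reversed(xs[:-1])):
        p = p * (x - xi) + c
    return p

# f(x) = x^3 - 2x^2 + 7x - 5 sampled at x = 0, 1, 3:
xs = [0.0, 1.0, 3.0]
fs = [-5.0, 1.0, 25.0]
coef = newton_coefficients(xs, fs)    # [-5.0, 6.0, 2.0]
print(newton_eval(xs, coef, 0.5))     # p2(0.5) = -2.5
```

The computed coefficients are exactly f[x0] = -5, f[x1,x0] = 6, and f[x2,x1,x0] = 2, and the interpolant reproduces the second-degree estimate p2(0.5) = -2.5 found in the text.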
Following the procedure used to develop (1.22) and (1.28), the error term for the third-degree polynomial (1.29) can be shown to have the form

R3(x) = (x - x0)(x - x1)(x - x2)(x - x3) f[x,x3,x2,x1,x0].

Since only four sample values are given in Table 1.2, there is no information available to compute an estimate for this fourth divided difference. In general, for x ≠ xi, i = 0, 1, 2, 3, we can say that R3(x) = 0 only if f(x) is known to be a polynomial of degree three or less (the case for the generating function for Table 1.2).

It is interesting to note that for an argument x = 0.5, the first-degree polynomial produced a result nearer the correct value than did the second-degree polynomial. This example illustrates the important point that a higher-degree interpolating polynomial does not necessarily guarantee better interpolant values. The error terms are, as expected, only order-of-magnitude estimates.

The procedure already employed to generate the first, then the second, then the third-degree interpolating polynomial leads by induction to the general form for nth-degree interpolation, Newton's fundamental formula:

f(x) = pn(x) + Rn(x).    (1.30)

Here the nth-degree divided-difference interpolating polynomial pn(x) has the form

pn(x) = f[x0] + (x - x0) f[x1,x0] + (x - x0)(x - x1) f[x2,x1,x0] + ... + (x - x0)(x - x1) ... (x - x_{n-1}) f[xn,x_{n-1},...,x0].    (1.31)

The corresponding error or remainder term Rn(x) is given by

Rn(x) = (x - x0)(x - x1)(x - x2) ... (x - x_{n-1})(x - xn) f[x,xn,x_{n-1},...,x0],    (1.32a)

or, in more compact form,

Rn(x) = [Π_{i=0}^{n} (x - xi)] f[x,xn,x_{n-1},...,x0].    (1.32b)

If an additional sample point (x_{n+1}, f(x_{n+1})) is available, it is possible to estimate Rn(x) as

Rn(x) ≈ (x - x0)(x - x1) ... (x - xn) f[x_{n+1},xn,...,x0],    (1.33a)

or, in more compact form, as

Rn(x) ≈ [Π_{i=0}^{n} (x - xi)] f[x_{n+1},xn,...,x0].    (1.33b)

Note that this estimate of Rn(x) is simply the highest-order term of the (n + 1)th-degree divided-difference interpolating polynomial p_{n+1}(x) and may also be viewed as the term truncated from p_{n+1}(x) to produce pn(x).

Unfortunately, (1.33) yields only an estimate, not an upper limit, for the error. To establish a bound for Rn(x), we need Rolle's theorem from the elementary calculus:

Let f(x) be continuous for a ≤ x ≤ b and differentiable for a < x < b; if f(a) = f(b), then f'(ξ) = 0 for at least one ξ, where a < ξ < b.

Now consider Newton's fundamental formula written in the form

f(x) = pn(x) + [Π_{i=0}^{n} (x - xi)] G(x),    (1.34)

where pn(x) is the nth-degree interpolating polynomial (1.31), Rn(x) is the remainder term for the nth-degree interpolating polynomial (1.32), and G(x) is the unknown (n + 1)th divided difference f[x,xn,x_{n-1},...,x0]. At the base points x0, x1, ..., xn, Rn(x) = 0, but in general Rn(x) does not vanish for any other point.

Now consider a new function Q(t) such that

Q(t) = f(t) - pn(t) - [Π_{i=0}^{n} (t - xi)] G(x).    (1.36)

When t = xi, i = 0, 1, ..., n, Q(t) = 0; when t = x, Q(t) = 0 also, since the right-hand side of (1.36) vanishes [see (1.34)]. Thus Q(t) vanishes n + 2 times (that is, possesses n + 2 roots) on the smallest interval containing x and the n + 1 base points x0, x1, ..., xn. If f(t) is continuous and suitably differentiable, Rolle's theorem requires that Q'(t) vanish at least n + 1 times on the interval. Applying the theorem repeatedly to successively higher derivatives shows that Q''(t) must have n roots, Q'''(t), n - 1 roots, etc., and that Q^(n+1)(t) must vanish at least once on the interval containing the base points. Let this point be t = ξ.

Upon differentiating (1.36) n + 1 times, we get

Q^(n+1)(t) = f^(n+1)(t) - pn^(n+1)(t) - (n + 1)! G(x).    (1.37)

But pn(t) is an nth-degree polynomial, so that pn^(n+1)(t) = 0. At t = ξ, then,

G(x) = f^(n+1)(ξ)/(n + 1)!.    (1.38)

We now have a new form for the error Rn(x) in terms of the (n + 1)th derivative of f(x), and Newton's fundamental formula can be written as
f(x) = pn(x) + Rn(x),    (1.39)

where

Rn(x) = [Π_{i=0}^{n} (x - xi)] f[x,xn,x_{n-1},...,x0]    (1.39a)
      = [Π_{i=0}^{n} (x - xi)] f^(n+1)(ξ)/(n + 1)!,   ξ in (x,xn,...,x0).    (1.39b)

The value of ξ is unknown, except that it is somewhere on the interval containing x and the base-point values x0, x1, ..., xn. If the function f(x) is given only in tabular form, then the second form (1.39b) of Rn(x) is of little value, since f^(n+1)(ξ) cannot be determined. However, if f(x) should be known analytically, then (1.39b) is useful in establishing an upper bound for the error.

Example. Several values of the cosine function and the corresponding divided differences of orders one through five are shown in Table 1.3. The differences have been computed by retaining only significant digits (equal to the number of digits in the numerator for each divided difference) at each stage of the calculation. Notice that each differencing operation tends to reduce the number of meaningful figures in a divided-difference table (see Example 1.1). Evaluate the second-degree interpolating polynomial passing through the functional values at θ1, θ2, and θ3, for interpolation argument θ = 0.25; that is, estimate the value cos(0.25).

Table 1.3 Divided-Difference Table for cos θ

i   θi (radians)   f(θi) = cos θi   f1[ ]          f2[ ]         f3[ ]        f4[ ]        f5[ ]
0   0.0            1.00000000
                                    -0.09966715
1   0.2            0.98006657                      -0.4921125
                                    -0.2473009                   0.0371062
2   0.3            0.95533648                      -0.4772700                 0.0396700
                                    -0.3427549                   0.0609082                -0.0029662
3   0.4            0.92106099                      -0.4529067                 0.0375936
                                    -0.4786269                   0.0797050
4   0.6            0.82533561                      -0.4210247
                                    -0.6049343
5   0.7            0.76484218

In terms of θ and the three base points θ1, θ2, and θ3, (1.31) becomes

cos θ ≈ p2(θ) = cos θ1 + (θ - θ1) cos[θ2,θ1] + (θ - θ1)(θ - θ2) cos[θ3,θ2,θ1].

Then

cos(0.25) ≈ 0.98006657 + (0.05)(-0.2473009) + (0.05)(-0.05)(-0.4772700),

or

cos(0.25) ≈ 0.9688947.

The error given by (1.39a) is

R2(θ) = (θ - θ1)(θ - θ2)(θ - θ3) cos[θ,θ3,θ2,θ1].

If we estimate cos[θ,θ3,θ2,θ1] as the average of the three calculated third-order differences, an estimate of the error in the calculated cos(0.25) is

R2(0.25) ≈ (0.05)(-0.05)(-0.15)(0.0592) ≈ 2.2 × 10^-5.

In this case we know that the functional values are for the function cos θ. Since we know that the third derivative is the function sin θ and can establish its upper bound on the interval [θ1,θ3] = [0.2,0.4], an upper bound for the interpolation error is given by (1.39b) as

|R2(0.25)| = |(θ - θ3)(θ - θ2)(θ - θ1)| |sin ξ|/3! ≤ (0.05)(0.05)(0.15) sin(0.4)/3! = 2.44 × 10^-5.

Since the true value of cos(0.25) to seven significant figures is 0.9689124, the actual error is 1.77 × 10^-5.

Note that (1.39b) shows that if f(x) is a polynomial of degree n or less, then Rn(x) vanishes for all x. This is evidenced by the appearance of zeros in a table of divided differences. For example, consider the divided-difference table (Table 1.4). The constant third differences and zero higher-order differences indicate that all six points can be fitted exactly with a third-degree polynomial. Without other information, however, it is not certain that f(x) itself is a third-degree polynomial. In this case (1.33) would yield an estimated error of zero, which is the true error only if f(x) is in fact a polynomial of degree three.

Some comments concerning the relationship of derivatives and the finite divided differences are in order.
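The numbers in this example can be verified directly from the tabulated differences (a Python check for illustration; the difference values below are taken from Table 1.3):

```python
import math

# Entries from Table 1.3 for base points theta1 = 0.2, theta2 = 0.3, theta3 = 0.4
f1  = 0.98006657     # cos(0.2)
fd1 = -0.2473009     # cos[theta2, theta1]
fd2 = -0.4772700     # cos[theta3, theta2, theta1]

theta = 0.25
# Second-degree Newton form (1.31) on theta1, theta2, theta3:
p2 = f1 + (theta - 0.2) * fd1 + (theta - 0.2) * (theta - 0.3) * fd2

# Upper bound from (1.39b), with |sin xi| <= sin(0.4) on [0.2, 0.4]:
bound = abs((theta - 0.2) * (theta - 0.3) * (theta - 0.4)) * math.sin(0.4) / 6.0

print(round(p2, 7))                        # 0.9688947
print(abs(math.cos(theta) - p2) < bound)   # True: ~1.77e-5 < ~2.44e-5
```

The actual error is indeed smaller than the bound predicted by (1.39b), as the text asserts.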
Table 1.4 Divided-Difference Table

Consider equation (1.30),

f(x) = pn(x) + Rn(x).

Since f(x) agrees exactly with pn(x) at the n + 1 base points x0, x1, ..., xn, Rn(x) must vanish at least n + 1 times on the interval determined by the base points. Rolle's theorem, applied repeatedly, shows that Rn'(x) must vanish at least n times, Rn''(x) at least n - 1 times, etc., and that Rn^(n)(x) must vanish at least once, say at x = ξ. Then differentiation of (1.30) n times and evaluation at x = ξ yields

f^(n)(ξ) = pn^(n)(ξ) + Rn^(n)(ξ) = pn^(n)(ξ).    (1.40)

But

pn^(n)(ξ) = n! f[xn,x_{n-1},...,x0],

so that

f[xn,x_{n-1},...,x0] = f^(n)(ξ)/n!,   ξ in (xn,x_{n-1},...,x0).    (1.41)

In general, a finite divided difference of any order is similarly related to the derivative of corresponding order evaluated at some point ξ on the interval containing all its arguments.

The order of the base-point indices is immaterial. For example, consider the divided-difference table of Table 1.2 with the base-point indices interchanged (Table 1.5). As before, the third-degree polynomial generated by the information in the table is given by (1.29) or (following the indicated path through the table) by the Newton form built on the reordered differences. With additional interchanges of the base-point indices, it becomes apparent that any path through the divided-difference table which terminates with the same higher-order difference yields the same polynomial. Figure 1.8 shows the eight different paths which could be taken across this table; all terminate with the third-order difference f[x3,x2,x1,x0] = 1.

Figure 1.8 Equivalent paths across the divided-difference table.

That the same polynomial is produced for all possible paths across the table is a natural consequence of the fact that any particular high-order difference is a function of all the sample values in the triangular segment subtended to its left in the table (all four sample values are used to compute f[x3,x2,x1,x0] = 1). Therefore, all p3(x) which terminate with the difference f[x3,x2,x1,x0] must pass through the points x0, x1, x2, and x3. Since there is only one such polynomial, all third-degree polynomials generated must be the same, regardless of the path taken.

Table 1.2 can be modified as well by arbitrarily ordering the base points. Consider the same sample values in a different order (Table 1.6).

Table 1.5 Divided-Difference Table—Base-point    Table 1.6 Divided-Difference Table—Functional


Indices Interchanged Values in Arbitrary Order
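The two properties just discussed can be checked numerically. The short sketch below (in Python; the book's own programs are in FORTRAN) computes divided differences recursively for the cubic used in Tables 1.5 and 1.6 and confirms that the third divided difference equals f'''(ξ)/3! = 1 for every ordering of the base points, as (1.41) and the path argument require.

```python
# Numerical check of (1.41) and of the order-invariance of divided differences.
from itertools import permutations

def divided_difference(xs, ys):
    """Recursively computed divided difference f[x0, ..., xn]."""
    if len(xs) == 1:
        return ys[0]
    return (divided_difference(xs[1:], ys[1:]) -
            divided_difference(xs[:-1], ys[:-1])) / (xs[-1] - xs[0])

f = lambda x: x**3 - 2*x**2 + 7*x - 5   # the cubic used in Tables 1.5 and 1.6

pts = [3.0, 0.0, 4.0, 1.0]              # four base points, arbitrary order
vals = [divided_difference(list(p), [f(x) for x in p])
        for p in permutations(pts)]

# Third divided difference = f'''(xi)/3! = 6/6 = 1 for every ordering.
print(all(abs(v - 1.0) < 1e-9 for v in vals))   # prints True
```

Because a divided difference is a symmetric function of its arguments, all 24 orderings of the four points give the same value.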
16 Interpolation and Approximation

The third-degree interpolating polynomial is given (taking the upper diagonal path) by (1.29) as

   p3(x) = 25 + (x - 3)10 + (x - 3)(x - 0)5 + (x - 3)(x - 0)(x - 4)1 = x³ - 2x² + 7x - 5.

When a divided-difference table for n + 1 base points has been prepared for all differences of order n or less, interpolating polynomials of any degree m, where m ≤ n, may be found. If the full difference table is not used (m < n), there is a choice of base points. It is intuitively obvious that one should use base-point information near the desired interpolation argument. That this is the case may be seen by examining the form of the error term (1.39b). In general, since f(n+1)(ξ) is unknown, the best that can be done to reduce the error is to make that part of the error term given by the polynomial

   (x - x0)(x - x1) ... (x - xn)   (1.42)

as small as possible in magnitude. Clearly the interpolation argument should be centered as nearly as possible on the interval containing the base points used in the interpolation; that is, the base points should be distributed about the interpolation argument as "evenly" as possible. In general, (1.42) shows that when the interpolation argument is near the endpoints x0 or xn, the error is likely to be larger than when it is near the middle of the interval, assuming that the base points are in algebraic order. It also indicates that when x is outside the interval [x0, ..., xn], that is, when the interpolating polynomial is used for extrapolation, the error may be very large.

The divided-difference form of the interpolating polynomial is especially well suited for computing interpolant values for polynomials of different degree. Altering the degree of the polynomial from j to j + 1 requires the evaluation of just one additional term, and no previous calculations need be repeated or modified. This feature is important when the degree of the desired polynomial is not known at the beginning of the calculation. New terms can be added one at a time (equivalent to increasing the polynomial degree by one) until the bound on the remainder term or the error estimate given by the first term dropped is small enough, without invalidating the calculations already done.

In addition, when many interpolations with the same data set but different interpolation arguments must be carried out, considerable computing time is saved (over the Lagrange formulation of the next section), since a large part of the computation, involved in setting up the divided-difference table, need be done just once.
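The incremental property just described can be sketched briefly in Python (the book's implementation, in FORTRAN, follows in Example 1.1). The table is built once; raising the degree of Newton's polynomial then adds a single term, with no recomputation.

```python
# Divided-difference table for the cosine data of Table 1.3, and Newton
# interpolants of successively higher degree built from the same table.
import math

def divided_difference_table(xs, ys):
    """table[j][i] = f[x_i, ..., x_{i+j}]; column j holds the jth-order differences."""
    table = [list(ys)]
    for j in range(1, len(xs)):
        prev = table[-1]
        table.append([(prev[i + 1] - prev[i]) / (xs[i + j] - xs[i])
                      for i in range(len(prev) - 1)])
    return table

def newton_eval(xs, table, xbar, degree):
    """Evaluate the Newton polynomial of the given degree at xbar, starting at x_0."""
    result, product = table[0][0], 1.0
    for j in range(1, degree + 1):
        product *= (xbar - xs[j - 1])
        result += product * table[j][0]
    return result

xs = [0.0, 0.2, 0.3, 0.4, 0.6, 0.7]              # base points of Table 1.3
table = divided_difference_table(xs, [math.cos(x) for x in xs])

# Raising the degree adds one term; the table itself is never recomputed.
estimates = [newton_eval(xs, table, 0.25, d) for d in range(1, 6)]
```

The degree-5 estimate agrees with cos(0.25) to well within single-precision accuracy, while the low-degree estimates (which use base points to one side of the argument) are noticeably poorer.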
EXAMPLE 1.1
INTERPOLATION WITH FINITE DIVIDED DIFFERENCES

Problem Statement

Write a subroutine named DTABLE that will calculate all divided differences of order m or less for the paired values (xi, yi), i = 1, 2, ..., n, where m < n. Elements of the divided-difference table should be assigned to the first m columns of the first n - 1 rows of the lower-triangular matrix† T, so that

   T(i,j) = f[x(i+1), xi, ..., x(i+1-j)],   i = 1, 2, ..., n - 1,   j = 1, 2, ..., m,   (1.1.1)

where f(xi) = yi. Assume that the values xi are in ascending algebraic order but of arbitrary spacing.

Write a function named FNEWT that evaluates a divided-difference interpolating polynomial of degree d for interpolation argument x̄, using the appropriate divided differences from the matrix T of (1.1.1). The function should search the x vector† to establish which d + 1 of the n possible base points xi should be used to determine the interpolating polynomial, so that x̄ is as nearly central to the selected points as possible. This will make (1.42) as small as possible and, hopefully, make the error term for the interpolation acceptably small. Should x̄ be smaller than x1 or larger than xn, the base points should be the first or last d + 1 of the xi, respectively; for these cases, the interpolating polynomial will be used to extrapolate, since x̄ lies outside the range of available base points.

To test the two programs, write a main program that reads values for m, n, and xi, i = 1, 2, ..., n, computes the elements of the y vector as yi = cos xi, and calls on DTABLE to calculate entries in the divided-difference table T as given by (1.1.1). The program should then read values for x̄ and d, should call on FNEWT to evaluate the appropriate divided-difference interpolating polynomial, and then should compare the interpolant value, ȳ(x̄), with the value of cos x̄ computed by the library function COS.

† The term vector will be used throughout to describe a linear array (a column) of numbers identified by a common name. Individual elements of the vector are identified by a single subscript attached to the name. In this case, the x and y vectors consist of the elements x1, x2, ..., xn and y1, y2, ..., yn, respectively. The term matrix will describe a rectangular array (rows and columns) of numbers identified by a common name. Individual elements of the matrix are identified by two subscripts, the first to indicate the row index and the second to indicate the column index. Thus Ti,j is the element appearing in the ith row and jth column of the matrix T (see Chapter 4).

Method of Solution

Using (1.1.1), the divided-difference table will have the following appearance:

   x1      y1      T1,1 = f[x2,x1]
   x2      y2      T2,1 = f[x3,x2]         T2,2 = f[x3,x2,x1]
   x3      y3      T3,1 = f[x4,x3]         T3,2 = f[x4,x3,x2]        T3,3 = f[x4,x3,x2,x1]
   ...
   xm      ym      Tm,1 = f[x(m+1),xm]     Tm,2 = f[x(m+1),xm,x(m-1)]   ...   Tm,m = f[x(m+1),...,x1]
   ...
   x(n-1)  y(n-1)  T(n-1),1 = f[xn,x(n-1)] T(n-1),2 = f[xn,x(n-1),x(n-2)]   ...   T(n-1),m = f[xn,...,x(n-m)]
   xn      yn

Note that, unlike divided-difference tables such as Table 1.2, there is no base point (x0, y0 = f(x0)). In order to facilitate the simple subscripting scheme of (1.1.1), the divided-difference table is no longer symmetric about the base points near the middle of the table. All elements of the T matrix above the diagonal, Ti,j with j > i, are unused. The subroutine DTABLE assumes that T is a k by k matrix, and checks for argument consistency to insure that m < n.

The function FNEWT searches the vector of base points to find the smallest value of i for which x̄ ≤ xi; if x̄ > xn, i is assigned the value n. The base points used to determine the interpolating polynomial are normally x(max-d), ..., x(max), where max = i + d/2 for d even and max = i + (d - 1)/2 for d odd. Should max be smaller than d + 1 or larger than n, then max is reassigned the value d + 1 or n, respectively. This insures that only given base points are used to fit the polynomial.

In terms of the divided differences, the interpolant value, ȳ(x̄), is described by the polynomial of degree d:

   ȳ(x̄) = y(max-d) + (x̄ - x(max-d)) f[x(max-d+1), x(max-d)]
        + (x̄ - x(max-d))(x̄ - x(max-d+1)) f[x(max-d+2), x(max-d+1), x(max-d)]
        + ...
        + (x̄ - x(max-d)) ... (x̄ - x(max-1)) f[x(max), ..., x(max-d)].   (1.1.2)

The corresponding error term, from (1.32) and (1.39), is

   Rd(x̄) = (x̄ - x(max-d))(x̄ - x(max-d+1)) ... (x̄ - x(max)) f(d+1)(ξ)/(d + 1)!,   (1.1.3)

   ξ in (x̄, x(max-d), ..., x(max)).   (1.1.4)

Rewritten in the nested form of (1.10), (1.1.2) becomes

   ȳ(x̄) = {{ ... {f[x(max), ..., x(max-d)](x̄ - x(max-1)) + f[x(max-1), ..., x(max-d)]}(x̄ - x(max-2))
         + f[x(max-2), ..., x(max-d)]}(x̄ - x(max-3)) + ... + f[x(max-d+1), ..., x(max-d)]}
         × (x̄ - x(max-d)) + y(max-d),   (1.1.5)

or, from (1.1.1),

   ȳ(x̄) = {{ ... {T(max-1,d)(x̄ - x(max-1)) + T(max-2,d-1)}(x̄ - x(max-2))
         + T(max-3,d-2)}(x̄ - x(max-3)) + ... + T(max-d,1)}(x̄ - x(max-d)) + y(max-d).   (1.1.6)

FNEWT uses this nested form to evaluate ȳ(x̄). Should there be an inconsistency in the function arguments, that is, if d > m, the value assigned to FNEWT is zero; otherwise, the value is ȳ(x̄).

In both DTABLE and FNEWT, a computational switch, trubl, is set to 1 when argument inconsistency is found; otherwise, trubl is set to 0. FNEWT does not check to insure that the elements xi, i = 1, 2, ..., n, are in ascending order, although such a test could be incorporated easily.
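The base-point centering rule described above can be sketched compactly. The helper below is hypothetical (not one of the book's programs); it mirrors FNEWT's selection of the subscript max, using 1-based subscripts as in the text. Note that i + d/2 for d even and i + (d - 1)/2 for d odd both reduce to i + d // 2 under integer division.

```python
# Sketch of FNEWT's base-point selection: find the first x_i >= xbar, shift up
# by about half the degree, then clamp so that x_{max-d}, ..., x_{max} all exist.
def centered_max(xs, xbar, d):
    n = len(xs)                        # xs assumed in ascending order
    i = next((k + 1 for k, x in enumerate(xs) if xbar <= x), n)  # 1-based i
    mx = i + d // 2                    # i + d/2 (d even) or i + (d-1)/2 (d odd)
    mx = max(mx, d + 1)                # reassign if too small ...
    return min(mx, n)                  # ... or too large

xs = [0.0, 0.2, 0.3, 0.4, 0.6, 0.7]
print(centered_max(xs, 0.25, 2))   # 4: base points x2, x3, x4 bracket 0.25
print(centered_max(xs, 0.05, 4))   # 5: clamped up to d + 1
print(centered_max(xs, 0.95, 2))   # 6: clamped down to n (extrapolation)
```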

Flow Diagram

Main Program
   Read n, m, and x1, x2, ..., xn.
   yi ← cos xi, i = 1, 2, ..., n.
   Compute the finite divided-difference table for differences of order m or
   less, Ti,j ← f[x(i+1), xi, ..., x(i+1-j)] (Subroutine DTABLE).
   Read x̄ and d. Compute the interpolant value ȳ(x̄) using a "well-centered"
   Newton's divided-difference interpolating polynomial of degree d
   (Function FNEWT). Print the results and return to read more data.

Subroutine DTABLE (Arguments: x, y, T, n, m, trubl, k)
   [Flowchart not legible in this copy. In outline: if m ≥ n, set trubl ← 1 and
   return; otherwise compute the first-order and then the higher-order
   differences, set trubl ← 0, and return.]

Function FNEWT (Arguments: x, y, T, n, m, d, x̄, trubl, k)
   [Flowchart not legible in this copy. In outline: if d > m, set trubl ← 1 and
   return zero; otherwise locate the subscript max, then evaluate Newton's
   divided-difference polynomial using the nested algorithm
      y ← y(x̄ - x(max-i)) + T(max-i-1, d-i),   i = 1, ..., d - 1,
   finishing with y ← y(x̄ - x(max-d)) + y(max-d); set trubl ← 0 and return.]

FORTRAN Implementation

List of Principal Variables

Program Symbol   Definition

(Main)
I, J, L          Subscripts i, j, l.
IDEG             Degree, d, of the interpolating polynomial.
M                m, highest-order divided difference to be computed by DTABLE.
N                n, the number of paired values (xi, yi = f(xi)).
NM1              n - 1.
TABLE            Matrix of divided differences, Ti,j.
TRUBL            Computational switch: set to 1 if argument inconsistency is encountered, otherwise set to 0.
TRUVAL           Value of cos x̄ computed by the library function COS.
X                Vector of base points, xi.
XARG             Interpolation argument, x̄.
Y                Vector of functional values, yi = f(xi).
YINTER           Interpolant value, ȳ(x̄).

(Subroutine DTABLE)
SUB              Subscript, i + 1 - j.
K                Row and column dimensions, k, of matrix T.

(Function FNEWT)
IDEGM1           d - 1.
ISUB1, ISUB2     Subscripts, max - i and d - i.
MAX              Subscript of largest base point used to determine the interpolating polynomial, max.
YEST             Variable used in nested evaluation of interpolating polynomial, (1.1.6).
Program Listing

Main Program

C     APPLIED NUMERICAL METHODS, EXAMPLE 1.1
C     NEWTON'S DIVIDED-DIFFERENCE INTERPOLATING POLYNOMIAL
C
C     TEST PROGRAM FOR THE SUBROUTINE DTABLE AND THE FUNCTION
C     FNEWT.  THIS PROGRAM READS A SET OF N VALUES X(1)...X(N),
C     COMPUTES A CORRESPONDING SET OF VALUES Y(1)...Y(N) WHERE
C     Y(I) = COS(X(I)), AND THEN CALLS ON SUBROUTINE DTABLE
C     TO CALCULATE ALL FINITE DIVIDED DIFFERENCES OF ORDER M OR
C     LESS.  WITH THE DIVIDED DIFFERENCES STORED IN MATRIX TABLE,
C     THE PROGRAM READS VALUES FOR XARG, THE INTERPOLATION
C     ARGUMENT, AND IDEG, THE DEGREE OF THE INTERPOLATING
C     POLYNOMIAL TO BE EVALUATED BY THE FUNCTION FNEWT.
C     FNEWT COMPUTES THE INTERPOLANT VALUE, YINTER, WHICH IS
C     COMPARED WITH THE TRUE VALUE, TRUVAL = COS(XARG).
C
      DIMENSION X(20), Y(20), TABLE(20,20)
C
C     ..... READ DATA, COMPUTE Y VALUES, AND PRINT .....
      READ (5,100) N, M, (X(I), I=1,N)
      WRITE (6,200)
      DO 1 I=1,N
      Y(I) = COS(X(I))
    1 WRITE (6,201) I, X(I), Y(I)
C
C     ..... COMPUTE AND PRINT DIVIDED DIFFERENCES .....
      CALL DTABLE( X,Y,TABLE,N,M,TRUBL,20 )
      IF (TRUBL.NE.0.0) CALL EXIT
      WRITE (6,202) M
      NM1 = N - 1
      DO 6 I=1,NM1
      L = I
      IF (I.GT.M) L = M
    6 WRITE (6,203) (TABLE(I,J), J=1,L)
C
C     ..... READ XARG AND IDEG, CALL ON FNEWT TO INTERPOLATE .....
      WRITE (6,204)
    7 READ (5,101) XARG, IDEG
      YINTER = FNEWT( X,Y,TABLE,N,M,IDEG,XARG,TRUBL,20 )
C
C     ..... COMPUTE TRUE VALUE OF COS(XARG) AND PRINT RESULTS .....
      TRUVAL = COS(XARG)
      WRITE (6,205) XARG, IDEG, YINTER, TRUVAL, TRUBL
      GO TO 7
C
C     ..... FORMATS FOR INPUT AND OUTPUT STATEMENTS .....
  100 FORMAT ( 4X, I3, 10X, I3 / (15X, 5F10.4) )
  101 FORMAT ( 7X, F8.4, 13X, I2 )
  200 FORMAT ( 33H1THE SAMPLE FUNCTIONAL VALUES ARE / 6H0    I, 8X,
     1   4HX(I), 9X, 4HY(I) / 1H  )
  201 FORMAT ( 1H , I4, 2F13.6 )
  202 FORMAT ( 9H1FOR M = , I2, 29H, THE DIVIDED DIFFERENCES ARE )
  203 FORMAT ( 1H  / (1H , 8E16.7) )
  204 FORMAT ( 25H1THE DATA AND RESULTS ARE / 1H0, 5X, 4HXARG, 5X,
     1   4HIDEG, 5X, 6HYINTER, 6X, 6HTRUVAL, 3X, 5HTRUBL / 1H  )
  205 FORMAT ( 1H , F9.4, I8, 2F12.6, F7.1 )
C
      END

Subroutine DTABLE

      SUBROUTINE DTABLE ( X,Y,TABLE,N,M,TRUBL,K )
C
C     DTABLE COMPUTES THE FINITE DIVIDED DIFFERENCES OF
C     Y(1)...Y(N) FOR ALL ORDERS M OR LESS AND STORES THEM IN
C     THE LOWER TRIANGULAR PORTION OF THE FIRST M COLUMNS OF THE FIRST
C     N - 1 ROWS OF THE MATRIX TABLE.  FOR INCONSISTENT ARGUMENTS,
C     TRUBL = 1.0 ON EXIT.  OTHERWISE, TRUBL = 0.0 ON EXIT.
C
      DIMENSION X(N), Y(N), TABLE(K,K)
C
C     ..... CHECK FOR ARGUMENT CONSISTENCY .....
      IF (M.LT.N) GO TO 2
      TRUBL = 1.0
      RETURN
C
C     ..... CALCULATE FIRST-ORDER DIFFERENCES .....
    2 NM1 = N - 1
      DO 3 I=1,NM1
    3 TABLE(I,1) = (Y(I+1) - Y(I))/(X(I+1) - X(I))
      IF (M.LE.1) GO TO 6
C
C     ..... CALCULATE HIGHER-ORDER DIFFERENCES .....
      DO 5 J=2,M
      DO 5 I=J,NM1
    5 TABLE(I,J) = (TABLE(I,J-1) - TABLE(I-1,J-1))/(X(I+1) - X(I+1-J))
C
    6 TRUBL = 0.0
      RETURN
C
      END

Function FNEWT

      FUNCTION FNEWT ( X,Y,TABLE,N,M,IDEG,XARG,TRUBL,K )
C
C     FNEWT ASSUMES THAT X(1)...X(N) ARE IN ASCENDING ORDER AND
C     FIRST SCANS THE X VECTOR TO DETERMINE WHICH ELEMENT IS
C     NEAREST (.GE.) THE INTERPOLATION ARGUMENT, XARG.
C     THE IDEG+1 BASE POINTS NEEDED FOR THE EVALUATION OF THE
C     DIVIDED-DIFFERENCE POLYNOMIAL OF DEGREE IDEG ARE THEN
C     CENTERED ABOUT THE CHOSEN ELEMENT WITH THE LARGEST HAVING
C     THE SUBSCRIPT MAX.  IT IS ASSUMED THAT THE FIRST M DIVIDED
C     DIFFERENCES HAVE BEEN COMPUTED BY THE SUBROUTINE
C     DTABLE AND ARE ALREADY PRESENT IN THE MATRIX TABLE.
C     MAX IS CHECKED TO INSURE THAT ALL REQUIRED BASE POINTS ARE
C     AVAILABLE, AND THE INTERPOLANT VALUE IS COMPUTED USING NESTED
C     POLYNOMIAL EVALUATION.  THE INTERPOLANT IS RETURNED AS
C     THE VALUE OF THE FUNCTION.  FOR INCONSISTENT ARGUMENTS,
C     TRUBL = 1.0 ON EXIT.  OTHERWISE, TRUBL = 0.0 ON EXIT.
C
      DIMENSION X(N), Y(N), TABLE(K,K)
C
C     ..... CHECK FOR ARGUMENT INCONSISTENCY .....
      IF (IDEG.LE.M) GO TO 2
      TRUBL = 1.0
      FNEWT = 0.0
      RETURN
C
C     ..... SEARCH X VECTOR FOR ELEMENT .GE. XARG .....
    2 DO 4 I=1,N
      IF (I.EQ.N .OR. XARG.LE.X(I)) GO TO 5
    4 CONTINUE
    5 MAX = I + IDEG/2
C
C     ..... INSURE THAT ALL REQUIRED DIFFERENCES ARE IN TABLE .....
      IF (MAX.LE.IDEG) MAX = IDEG + 1
      IF (MAX.GT.N) MAX = N
C
C     ..... COMPUTE INTERPOLANT VALUE .....
      YEST = TABLE(MAX-1,IDEG)
      IF (IDEG.LE.1) GO TO 13
      IDEGM1 = IDEG - 1
      DO 12 I=1,IDEGM1
      ISUB1 = MAX - I
      ISUB2 = IDEG - I
   12 YEST = YEST*(XARG - X(ISUB1)) + TABLE(ISUB1-1,ISUB2)
   13 ISUB1 = MAX - IDEG
      TRUBL = 0.0
      FNEWT = YEST*(XARG - X(ISUB1)) + Y(ISUB1)
      RETURN
C
      END

Data

N = 8    M = 6
X(1)...X(5) = 0.0000  [remaining entries not legible in this copy]
X(6)...X(8) = 0.7000  [remaining entries not legible in this copy]
XARG =  0.2500   IDEG =
XARG =  0.2500   IDEG =
XARG =  0.2500   IDEG =
XARG =  0.2500   IDEG =
XARG =  0.2500   IDEG =
XARG =  0.2500   IDEG =
XARG =  0.4500   IDEG =
XARG =  0.4500   IDEG =
XARG =  0.4500   IDEG =
XARG =  0.0500   IDEG =
XARG =  0.0500   IDEG =
XARG =  0.8500   IDEG =
XARG =  0.9500   IDEG =
XARG =  0.9500   IDEG =
XARG =  0.9500   IDEG =
XARG =  0.1000   IDEG =
XARG = -0.1000   IDEG =
XARG =  0.5500   IDEG =
XARG =  1.1000   IDEG =
XARG =  2.0000   IDEG =
XARG =  2.0000   IDEG =
XARG =  2.0000   IDEG =
XARG =  2.0000   IDEG =
XARG =  2.0000   IDEG =
XARG =  2.0000   IDEG =
[The IDEG values are not legible in this copy.]

Computer Output

THE SAMPLE FUNCTIONAL VALUES ARE
[tabulated output not reproduced]

FOR M = 6, THE DIVIDED DIFFERENCES ARE
[tabulated output not reproduced]

THE DATA AND RESULTS ARE

   XARG   IDEG   YINTER   TRUVAL   TRUBL
[tabulated output not reproduced]

Discussion of Results

The programs have been written to allow calculation of the desired table of divided differences just once (by calling DTABLE once). Subsequently, an interpolating polynomial of any degree d ≤ m can be evaluated for any argument, without recomputing the divided-difference table, by calling only on FNEWT.

The programs were compiled and executed on an IBM 360/67 computer using single-precision arithmetic, equivalent to a computer word-size of approximately seven decimal digits. Note that the calculated higher-order divided differences differ considerably from those of Table 1.3, which were computed using significant-digit arithmetic. Yet different values are found using double-precision arithmetic, equivalent to a word-size of approximately 16 decimal digits for computers of the IBM 360 type. The results for calculation of the divided differences using double-precision arithmetic are:

FOR M = 6, THE DIVIDED DIFFERENCES ARE
[tabulated output not reproduced]

In the computer output, the letters E and D should be interpreted as "ten raised to the power that follows." These examples illustrate the important point that computer characteristics may affect calculated results significantly, even for the same algorithm.

Comparison of the interpolated estimates of the cosine function with those computed by the library function COS (accurate to the number of figures shown) shows that interpolation of degree three or more in general yields results accurate to five or more figures. Interpolant values resulting from double-precision calculations (not shown) do not differ from the single-precision results in the first six digits, indicating that the higher-order divided differences contribute little to the significant part of the computed interpolant values. On the other hand, the extrapolated estimates of the cosine function for the 17th and 19th-25th data sets are in considerable error, with at best two digits of accuracy, even for evaluation of the interpolating polynomial of degree six.

Although the function FNEWT assumes that the base-point values are ordered in ascending sequence, this is not an essential feature of Newton's divided-difference interpolating polynomial. Modification of FNEWT to permit arbitrary ordering of the elements of the x vector would require only minor changes. In particular, it would be necessary to specify which base points are to be used to determine the interpolating polynomial, probably by addition of another dummy argument.
1.7 Lagrange's Interpolating Polynomial

Lagrange's form of the interpolating polynomial [3] is given by

   pn(x) = Σ (i = 0 to n) Li(x) f(xi),   (1.43)

where

   Li(x) = Π (j = 0, j ≠ i, to n) (x - xj)/(xi - xj),   i = 0, 1, ..., n.   (1.44)

Each functional value f(xi) included in the polynomial fit is multiplied by Li(x), an nth-degree polynomial in x [since there are n factors (x - xj)]. As before [see (1.30) and (1.39)],

   f(x) = pn(x) + Rn(x),

where

   Rn(x) = (x - x0)(x - x1) ... (x - xn) f(n+1)(ξ)/(n + 1)!.

The Lagrange form for pn(x) can be derived directly from Newton's divided-difference polynomial of equivalent degree by first writing the divided differences in the symmetric form of (1.17). For example, consider the second-degree divided-difference polynomial,

   p2(x) = f[x0] + (x - x0) f[x1,x0] + (x - x0)(x - x1) f[x2,x1,x0].

Substituting the equivalent symmetric forms for the divided differences yields the Lagrange form of the same polynomial.

The higher-degree Lagrange formulations can be derived in an analogous way from the corresponding divided-difference polynomial, although the algebra is quite tedious. A somewhat simpler alternative development is as follows. Assume that the interpolating polynomial has the form

   pn(x) = a0(x - x1)(x - x2)(x - x3) ... (x - xn)
        + a1(x - x0)(x - x2)(x - x3) ... (x - xn)
        + a2(x - x0)(x - x1)(x - x3) ... (x - xn)
        + ...
        + ai(x - x0)(x - x1) ... (x - x(i-1))(x - x(i+1)) ... (x - xn)
        + ...
        + an(x - x0)(x - x1) ... (x - x(n-1)),   (1.47)

where the coefficients a0, a1, ..., an are determined by requiring, as before, that pn(xi) = f(xi), i = 0, 1, ..., n. Examination of (1.47) shows that this can be the case only if

   a0 = f(x0)/[(x0 - x1)(x0 - x2) ... (x0 - xn)],

or, in general,

   ai = f(xi)/[(xi - x0)(xi - x1) ... (xi - x(i-1))(xi - x(i+1)) ... (xi - xn)],   i = 0, 1, ..., n.   (1.48)

In condensed form, (1.47), with the coefficients of (1.48) substituted, yields the Lagrange form of (1.43).

Note that the Lagrange form involves only the base points xi and the corresponding functional values f(xi). The divided differences of Newton's fundamental formula need not be computed at all. When only one interpolation is to be carried out, the amount of computation required by the divided-difference and Lagrange formulas is roughly equivalent. Less computer storage is required for the Lagrange formula, of course, since there is no need to save the divided-difference table.

The Lagrange form has two significant disadvantages aside from the excessive amount of calculation required when many interpolations are to be done using the same data set. Since the divided differences are not computed, the error estimate of (1.33) has little relevance. Normally, no estimate of the error can be made, since the error bound of (1.39b) is not applicable unless high-order derivatives can be evaluated (not the usual case). Also, the addition of a new term (that is, increasing the degree of the polynomial by one), which is a simple matter when using Newton's form, requires complete recomputation of all the Li(x) values for Lagrange's form. Thus the Lagrange form is not well suited for cases in which the desired polynomial degree is not known a priori.

Example. Use the Lagrange form of (1.43) and (1.44) to find the second-degree interpolating polynomial passing through the three points in Table 1.7 (see Table 1.2).

Table 1.7 Functional Values

   x     f(x)
   0      -5
   1       1
   3      25

For this case,

   L0(x) = (x - 1)(x - 3)/[(0 - 1)(0 - 3)] = (x² - 4x + 3)/3,
   L1(x) = (x - 0)(x - 3)/[(1 - 0)(1 - 3)] = -(x² - 3x)/2,
   L2(x) = (x - 0)(x - 1)/[(3 - 0)(3 - 1)] = (x² - x)/6,

and

   p2(x) = -5 L0(x) + 1 L1(x) + 25 L2(x) = 2x² + 4x - 5,

which is identical with the polynomial found previously for the same data using the divided-difference form of the interpolating polynomial.
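The worked example above can be checked with a short Python sketch of (1.43)-(1.44) (an illustration only; the book's FORTRAN implementation, FLAGR, appears in Example 1.2).

```python
# Direct evaluation of Lagrange's interpolating polynomial (1.43)-(1.44).
def lagrange(xs, ys, xbar):
    """Evaluate the Lagrange interpolating polynomial through (xs, ys) at xbar."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        li = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                li *= (xbar - xj) / (xi - xj)   # one factor of L_i(xbar)
        total += li * yi
    return total

xs, ys = [0.0, 1.0, 3.0], [-5.0, 1.0, 25.0]     # the points of Table 1.7
# p2(x) = 2x^2 + 4x - 5, so p2(2) should be 11 (to within roundoff):
print(lagrange(xs, ys, 2.0))
```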
EXAMPLE 1.2
LAGRANGIAN INTERPOLATION

Problem Statement

Write a function named FLAGR that evaluates, for interpolation argument x̄, the Lagrangian interpolating polynomial of degree d passing through the points (x(min), y(min)), (x(min+1), y(min+1)), ..., (x(min+d), y(min+d)). In addition, write a main program that reads data values n, x1, x2, ..., xn, y1, y2, ..., yn, x̄, d, and min, and then calls upon FLAGR to evaluate the appropriate interpolating polynomial and return the interpolant value, ȳ(x̄). As test data, use information from Table 1.2.1, relating observed voltage and temperature for the Platinum to Platinum-10 percent Rhodium thermocouple with cold junctions at 32°F.

Table 1.2.1 Reference Table for the Pt-10% Rh Thermocouple [21]

   emf (microvolts)    Temperature (°F)
[tabulated values not reproduced; the thirteen selected base-point pairs appear in the Data section below]

Read tabulated values for the 13 selected base points x1 = 0, x2 = 500, x3 = 1000, ..., x13 = 6000, and the corresponding functional values y1, y2, ..., y13, where yi = f(xi). Then call on FLAGR to evaluate ȳ(x̄) for arguments x̄ = 300, 1700, 2500, 3300, 5300, and 5900, with various values for d and min. Compare the results with the experimentally observed values from Table 1.2.1.

Method of Solution

In terms of the problem parameters, Lagrange's form of the interpolating polynomial (1.43) becomes

   ȳ(x̄) = Σ (i = min to min+d) Li(x̄) yi,   (1.2.1)

where

   Li(x̄) = Π (j = min, j ≠ i, to min+d) (x̄ - xj)/(xi - xj),   i = min, min + 1, ..., min + d.   (1.2.2)

The program that follows is a straightforward implementation of (1.2.1) and (1.2.2), except that some calculations (about d² multiplications and d² subtractions, at the expense of d + 1 divisions) are saved by writing (1.2.2) in the form

   Li(x̄) = c/[(x̄ - xi) Π (j = min, j ≠ i, to min+d) (xi - xj)],
           i = min, min + 1, ..., min + d,   x̄ ≠ xi,   (1.2.3)

where

   c = Π (j = min to min+d) (x̄ - xj).   (1.2.4)

The restriction in (1.2.3), x̄ ≠ xi, causes no difficulty, since, if x̄ = xi, the interpolant ȳ(x̄) is known to be yi; no additional computation is required.
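The rewriting in (1.2.3)-(1.2.4) can be sketched as follows. This Python function is a hypothetical mirror of the FORTRAN function FLAGR (using a 0-based index mn in place of the text's 1-based MIN); the common product c over (x̄ - xj) is formed once, so each term costs divisions rather than a fresh product.

```python
# Lagrange interpolation via the common factor c of (1.2.4).
def flagr(xs, ys, xbar, d, mn):
    """Degree-d Lagrange interpolant using base points xs[mn:mn+d+1] (0-based mn)."""
    xb, yb = xs[mn:mn + d + 1], ys[mn:mn + d + 1]
    c = 1.0
    for k, xj in enumerate(xb):
        if xbar == xj:                 # restriction xbar != x_i in (1.2.3)
            return yb[k]
        c *= (xbar - xj)               # the factor c of (1.2.4)
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xb, yb)):
        term = yi * c / (xbar - xi)    # numerator of L_i(xbar) * y_i
        for j, xj in enumerate(xb):
            if j != i:
                term /= (xi - xj)      # denominator product of (1.2.3)
        total += term
    return total

# First seven thermocouple pairs from the Data section of this example:
xs = [0., 500., 1000., 1500., 2000., 2500., 3000.]
ys = [32.0, 176.0, 296.4, 405.7, 509.0, 608.4, 704.7]
print(flagr(xs, ys, 1700., 2, 2))    # quadratic through x3..x5 (1-based MIN = 3)
print(flagr(xs, ys, 1500., 3, 1))    # argument hits a base point: returns 405.7
```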
Flow Diagram

Main Program
   Read n, x1, ..., xn, and y1, ..., yn.
   Read x̄, d, and min. Compute the interpolant value ȳ(x̄) using Lagrange's
   interpolating polynomial of degree d with base points x(min), ..., x(min+d)
   (Function FLAGR). Print the results and return to read more data.

Function FLAGR (Arguments: x, y, x̄, d, min, n)
   [Flowchart not legible in this copy. In outline: c ← 1; for j = min,
   min + 1, ..., min + d, if x̄ = xj return yj as the function value, otherwise
   c ← c(x̄ - xj); then accumulate the terms of (1.2.3) for i = min, min + 1,
   ..., min + d and return the sum.]
FORTRAN Implementation

List of Principal Variables

Program Symbol   Definition

(Main)
I                Subscript, i.
IDEG             Degree, d, of the interpolating polynomial.
MIN              Smallest subscript for base points used to determine the interpolating polynomial, min.
N                n, the number of paired values (xi, yi = f(xi)).
X                Vector of base points, xi.
XARG             Interpolation argument, x̄.
Y                Vector of functional values, yi = f(xi).
YINTER           Interpolant value, ȳ(x̄).

(Function FLAGR)
FACTOR           The factor c [see (1.2.4)].
J                Subscript, j.
MAX              Largest subscript for base points used to determine the interpolating polynomial, min + d.
TERM             A variable that assumes successively the values Li(x̄)yi in (1.2.1).
YEST             Interpolant value, ȳ(x̄).

Program Listing

Main Program

C     APPLIED NUMERICAL METHODS, EXAMPLE 1.2
C     LAGRANGE'S INTERPOLATING POLYNOMIAL
C
C     TEST PROGRAM FOR THE FUNCTION FLAGR.  THIS PROGRAM READS A
C     SET OF N VALUES X(1)...X(N) AND A CORRESPONDING SET OF
C     FUNCTIONAL VALUES Y(1)...Y(N) WHERE Y(I) = F(X(I)).  THE
C     PROGRAM THEN READS VALUES FOR XARG, IDEG, AND MIN (SEE FLAGR
C     FOR MEANINGS) AND CALLS ON FLAGR TO PRODUCE THE INTERPOLANT
C     VALUE, YINTER.
C
      IMPLICIT REAL*8(A-H, O-Z)
      DIMENSION X(100), Y(100)
C
C     ..... READ N, X AND Y VALUES, AND PRINT .....
      READ (5,100) N, (X(I), I=1,N)
      READ (5,101) (Y(I), I=1,N)
      WRITE (6,200)
      DO 1 I=1,N
    1 WRITE (6,201) I, X(I), Y(I)
C
C     ..... READ INTERPOLATION ARGUMENTS, CALL ON FLAGR, AND PRINT .....
      WRITE (6,202)
    2 READ (5,102) XARG, IDEG, MIN
      YINTER = FLAGR ( X,Y,XARG,IDEG,MIN,N )
      WRITE (6,203) XARG, IDEG, MIN, YINTER
      GO TO 2
C
C     ..... FORMATS FOR INPUT AND OUTPUT STATEMENTS .....
  100 FORMAT ( 4X, I3 / (15X, 5F10.4) )
  101 FORMAT ( 15X, 5F10.4 )
  102 FORMAT ( 7X, F10.4, 13X, I2, 12X, I2 )
  200 FORMAT ( 33H1THE SAMPLE FUNCTIONAL VALUES ARE / 6H0    I, 8X,
     1   4HX(I), 9X, 4HY(I) / 1H  )
  201 FORMAT ( 1H , I4, 2F13.4 )
  202 FORMAT ( 25H1THE DATA AND RESULTS ARE / 1H0, 5X, 4HXARG, 5X,
     1   4HIDEG, 5X, 3HMIN, 5X, 6HYINTER / 1H  )
  203 FORMAT ( 1H , F9.4, I8, I8, F12.4 )
C
      END

Function FLAGR

      FUNCTION FLAGR ( X,Y,XARG,IDEG,MIN,N )
C
C     FLAGR USES THE LAGRANGE FORMULA TO EVALUATE THE INTERPOLATING
C     POLYNOMIAL OF DEGREE IDEG FOR ARGUMENT XARG USING THE DATA
C     VALUES X(MIN)...X(MAX) AND Y(MIN)...Y(MAX) WHERE
C     MAX = MIN + IDEG.  NO ASSUMPTION IS MADE REGARDING ORDER OF
C     THE X(I), AND NO ARGUMENT CHECKING IS DONE.  TERM IS
C     A VARIABLE WHICH CONTAINS SUCCESSIVELY EACH TERM OF THE
C     LAGRANGE FORMULA.  THE FINAL VALUE OF YEST IS THE INTERPOLATED
C     VALUE.  SEE TEXT FOR A DESCRIPTION OF FACTOR.
C
      IMPLICIT REAL*8(A-H, O-Z)
      REAL*8 X, Y, XARG, FLAGR
      DIMENSION X(N), Y(N)
C
C     ..... COMPUTE VALUE OF FACTOR .....
      FACTOR = 1.0
      MAX = MIN + IDEG
      DO 2 J=MIN,MAX
      IF (XARG.NE.X(J)) GO TO 2
      FLAGR = Y(J)
      RETURN
    2 FACTOR = FACTOR*(XARG - X(J))
C
C     ..... EVALUATE INTERPOLATING POLYNOMIAL .....
      YEST = 0.
      DO 5 I=MIN,MAX
      TERM = Y(I)*FACTOR/(XARG - X(I))
      DO 4 J=MIN,MAX
    4 IF (I.NE.J) TERM = TERM/(X(I) - X(J))
    5 YEST = YEST + TERM
      FLAGR = YEST
      RETURN
C
      END

Data

N = 13
X(1)...X(5)   =    0.   500.  1000.  1500.  2000.
X(6)...X(10)  = 2500.  3000.  3500.  4000.  4500.
X(11)...X(13) = 5000.  5500.  6000.
Y(1)...Y(5)   =   32.0  176.0  296.4  405.7  509.0
Y(6)...Y(10)  =  608.4  704.7  799.0  891.9  983.0
Y(11)...Y(13) = 1072.6 1160.8 1247.5
XARG =  300.   IDEG = 1   MIN = 1
XARG =  300.   IDEG = 2   MIN = 1
XARG =  300.   IDEG = 3   MIN = 1
XARG =  300.   IDEG = 4   MIN = 1
XARG = 1700.   IDEG = 1   MIN = 4
XARG = 1700.   IDEG = 2   MIN = 3
XARG = 1700.   IDEG = 2   MIN = 4
XARG = 1700.   IDEG = 3   MIN = 3
XARG = 1700.   IDEG = 4   MIN = 2
XARG = 1700.   IDEG = 4   MIN = 3
XARG = 2500.   IDEG = 1   MIN = 5
XARG = 2500.   IDEG = 1   MIN = 6
XARG = 2500.   IDEG = 2   MIN = 5
XARG = 2500.   IDEG = 3   MIN = 4
XARG = 2500.   IDEG = 3   MIN = 5
XARG = 2500.   IDEG = 4   MIN = 4
XARG = 3300.   IDEG = 1   MIN = 7
XARG = 3300.   IDEG = 2   MIN = 6
XARG = 3300.   IDEG = 2   MIN = 7
XARG = 3300.   IDEG = 3   MIN = 6
XARG = 3300.   IDEG = 4   MIN = 5
XARG = 3300.   IDEG = 4   MIN = 6
XARG = 3300.   IDEG = 5   MIN = 5
XARG = 3300.   IDEG = 6   MIN = 4
XARG = 3300.   IDEG = 6   MIN = 5
XARG = 3300.   IDEG = 7   MIN = 4
XARG = 3300.   IDEG = 8   MIN = 3
XARG = 3300.   IDEG = 8   MIN = 4
XARG = 3300.   IDEG = 9   MIN = 3
XARG = 5300.   IDEG = 1   MIN = 11
XARG = 5300.   IDEG = 2   MIN = 10
XARG = 5300.   IDEG = 2   MIN = 11
XARG = 5300.   IDEG = 3   MIN = 10
XARG = 5300.   IDEG = 4   MIN = 9
XARG = 5900.   IDEG = 1   MIN = 12
XARG = 5900.   IDEG = 2   MIN = 11
XARG = 5900.   IDEG = 3   MIN = 10
XARG = 5900.   IDEG = 4   MIN = 9

Computer Output

THE SAMPLE FUNCTIONAL VALUES ARE
[tabulated output not reproduced]

THE DATA AND RESULTS ARE

   XARG   IDEG   MIN   YINTER
[tabulated output not reproduced]

Discussion of Results

All computations have been carried out using double-precision arithmetic (8-byte REAL operands); for the test data used, however, single-precision arithmetic would have yielded results of comparable accuracy. Since the true function f(x), for which yi = f(xi), is unknown, it is not possible to determine an upper bound for the interpolation error from (1.39b). However, comparison of the interpolant values with known functional values from Table 1.2.1 for the arguments used as test data shows that interpolation of degree 2 or more, when the argument is well-centered with respect to the base points, generally yields results comparable in accuracy to the experimentally measured values. The results for x̄ = 300 are less satisfactory than for the other arguments, possibly because the argument is near the beginning of the table, where the function appears to have considerable curvature, and because centering of the argument among the base points is not practicable for large d.

While the base points used as test data were ordered in ascending sequence and equally spaced, the program is not limited with respect to either ordering or spacing of the base points. For greater accuracy, however, one would normally order the base points in either ascending or descending sequence and attempt to center the interpolation argument among those base points used to determine the interpolating polynomial. This tends to keep the polynomial factor

   (x̄ - x(min))(x̄ - x(min+1)) ... (x̄ - x(min+d)),

and hence the error term comparable to (1.39b), as small as possible. This practice will usually lead to more satisfactory low-order interpolant values as well. The program does not test to insure that the required base points are in fact available, although a check to insure that min > 0 and max ≤ n could be inserted easily.
1.8 Polynomial Interpolation with Equally Spaced Base Points

When the base-point values are equally spaced, so that x1 - x0 = x2 - x1 = x3 - x2 = ... = xn - x(n-1) = h, then, as we would expect, some simplification of the divided-difference or Lagrange formulation is possible. The interpolating polynomial will be developed in terms of the forward (Δ), backward (∇), and central (δ) difference operators. These are linear operators, that is,

   Δ[c1 f(x) + c2 g(x)] = c1 Δf(x) + c2 Δg(x),

with corresponding relations for ∇ and δ; the operators will be defined individually in subsequent sections. The principal reason for using finite differences is notational compactness. In addition, use of these operators permits a considerable reduction in computation compared with the Newton or Lagrange forms.

Forward Differences. The forward-difference operator (Δ) is defined as follows:

   Δf(x) = f(x + h) - f(x).

Δf(x) is termed the first forward difference, Δ²f(x) the second forward difference, etc. The forward finite differences can be computed and saved in a forward-difference table in much the same manner as shown earlier for the divided differences. As an example, consider the polynomial (x³ - 2x² + 7x - 5), used earlier, tabulated at five base points with h = 1.0 (Table 1.8).

Table 1.8 Forward-Difference Table

    x     f(x)     Δf     Δ²f    Δ³f    Δ⁴f
   0.0     -5
                    6
   1.0      1              2
                    8             6
   2.0      9              8             0
                   16             6
   3.0     25             14
                   30
   4.0     55

Notice that for this third-degree polynomial, the third forward differences are constant and the fourth forward difference is zero.

The relationship between a given forward finite difference and the corresponding finite divided difference of Table 1.1 is a simple one:

   f[x1,x0] = [f(x1) - f(x0)]/(x1 - x0) = Δf(x0)/h.

Similarly,

   f[x2,x1,x0] = Δ²f(x0)/(2! h²).

In general,

   f[xn, x(n-1), ..., x0] = Δⁿf(x0)/(n! hⁿ).

In order to rewrite the divided-difference polynomial in terms of the forward finite differences in compact form, we introduce a parameter α, such that

   x = x0 + αh,

where 0 ≤ α ≤ n for values of x in the interval x0 ≤ x ≤ xn. Since the base-point values x0, x1, x2, ..., xn are evenly spaced at intervals h,

   x - xi = h(α - i),   i = 0, 1, ..., n,

so that, in particular, x - xn = h(α - n).

The finite divided-difference formulation (Newton's fundamental formula) is rewritten here for convenience [see (1.30), (1.31), and (1.39)]:

   f(x) = pn(x) + Rn(x),

where

   pn(x) = f[x0] + (x - x0) f[x1,x0] + (x - x0)(x - x1) f[x2,x1,x0] + ...
        + (x - x0)(x - x1) ... (x - x(n-1)) f[xn, x(n-1), ..., x0],

and

   Rn(x) = (x - x0)(x - x1) ... (x - x(n-1))(x - xn) f(n+1)(ξ)/(n + 1)!.
(n + I)! ' 5 in (x,x,,x,-, , , .., xo).
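The construction of Table 1.8 can be sketched in a few lines. This is a modern illustration, not the book's FORTRAN program; the function and base points are those of Table 1.8:

```python
def forward_difference_table(y):
    """Return [y, Δy, Δ²y, ...]: column k holds the kth forward differences."""
    columns = [list(y)]
    while len(columns[-1]) > 1:
        prev = columns[-1]
        columns.append([prev[i + 1] - prev[i] for i in range(len(prev) - 1)])
    return columns

# f(x) = x³ - 2x² + 7x - 5 tabulated at x = 0, 1, 2, 3, 4 (h = 1.0)
f = lambda x: x**3 - 2*x**2 + 7*x - 5
table = forward_difference_table([f(x) for x in range(5)])
for k, column in enumerate(table):
    print(k, column)
```

For this third-degree polynomial the third-difference column prints as [6, 6] and the fourth as [0], in agreement with the remark about Table 1.8.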
In terms of α and the forward-difference operator, Newton's fundamental formula assumes the form

f(x_0 + αh) = p_n(x_0 + αh) + R_n(x_0 + αh),    (1.54)

where

p_n(x_0 + αh) = f(x_0) + α Δf(x_0) + [α(α - 1)/2!] Δ^2 f(x_0) + ... + [α(α - 1) ... (α - n + 1)/n!] Δ^n f(x_0),

and

R_n(x_0 + αh) = h^{n+1} α(α - 1)(α - 2) ... (α - n) f[x, x_n, x_{n-1}, ..., x_0].    (1.55b)

As before, an estimate of the error can be made if an additional base point x_{n+1} is given, by assuming that

f[x, x_n, ..., x_0] ≈ f[x_{n+1}, x_n, ..., x_0].

Since

f[x_{n+1}, x_n, ..., x_0] = Δ^{n+1} f(x_0)/[(n + 1)! h^{n+1}],

an estimate of R_n(x_0 + αh) is given by

R_n(x_0 + αh) ≈ α(α - 1)(α - 2) ... (α - n) Δ^{n+1} f(x_0)/(n + 1)!.    (1.56)

Equation (1.54) is known as Newton's forward formula (NFF). Notice that NFF uses only the differences along the upper diagonal of the difference table (marked with solid lines in Table 1.8). Consequently, it is most useful for interpolation near the beginning of an equal-interval table. Of course, the same formula can be applied in other parts of a table by a suitable translation of the zero subscript. The necessary forward differences, starting with the shifted origin x_0 = 1.0, y_0 = 1.0, would be those along the dotted diagonal of Table 1.8.

Example. Apply Newton's fundamental formula to the data of Table 1.8 and evaluate the second-degree polynomial for interpolation argument x = 1.5.

For this case, h = 1 and α = (x - x_0)/h = (1.5 - 1.0)/1 = 0.5. Substitution into (1.54) yields

p_2(1.5) = 1 + (0.5)(8) + (0.5)(-0.5)(8)/2! = 4.

As a check, the second-degree Lagrange formula (1.46) reproduces the same value. The error estimate from the first difference dropped would be given by (1.56):

R_2(x_0 + αh) ≈ α(α - 1)(α - 2) Δ^3 f(x_0)/3! = (0.5)(-0.5)(-1.5)(6)/3! = 3/8.

In this case, 3/8 happens to be exactly the error, since f(x) is a third-degree polynomial and the higher-order differences vanish.

Backward Differences. The backward-difference operator (∇) is defined as follows:

∇f(x) = f(x) - f(x - h),
∇^2 f(x) = ∇f(x) - ∇f(x - h),
∇^3 f(x) = ∇^2 f(x) - ∇^2 f(x - h),    (1.57)
...
∇^n f(x) = ∇^{n-1} f(x) - ∇^{n-1} f(x - h).

∇f(x) is termed the first backward difference, ∇^2 f(x) the second backward difference, etc. Define a parameter α, this time by using as the origin the base point x_n, so that

x = x_n + αh.    (1.58)

α is zero or negative (-n ≤ α ≤ 0) for values of x in the table, x_0 ≤ x ≤ x_n. Table 1.9 indicates the backward differences for the data of Table 1.8 with the base-point indices renumbered.

Table 1.9 Backward-Difference Table

Notice that the backward differences of Table 1.9 are identical with the forward differences of Table 1.8; only the indices of the base points differ.
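The NFF evaluation in the example above can be checked with a short sketch (again a modern illustration, not the book's program; the differences 1, 8, 8, 6 are those along the dotted diagonal of Table 1.8, starting from the shifted origin x_0 = 1):

```python
def newton_forward(diffs, alpha):
    """Newton's forward formula (1.54): diffs[k] is Δᵏf(x0) along the upper
    diagonal of the forward-difference table; alpha = (x - x0)/h."""
    total, coeff = 0.0, 1.0
    for k, dk in enumerate(diffs):
        if k > 0:
            coeff *= (alpha - (k - 1)) / k   # builds α(α-1)...(α-k+1)/k!
        total += coeff * dk
    return total

# second-degree interpolation at x = 1.5 from the shifted origin x0 = 1 (h = 1)
p2 = newton_forward([1.0, 8.0, 8.0], 0.5)       # -> 4.0
# error estimate (1.56) from the first difference dropped, Δ³f(x0) = 6
r2 = 0.5 * (0.5 - 1) * (0.5 - 2) * 6.0 / 6      # -> 0.375
```

Adding the estimate to the interpolant recovers f(1.5) = 4.375 exactly, as the example claims.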
If Newton's fundamental formula is written in terms of the divided differences along the lower diagonal path of a divided-difference table, (1.30) has the form

f(x) = p_n(x) + R_n(x),

where

p_n(x) = f[x_n] + (x - x_n) f[x_n, x_{n-1}] + (x - x_n)(x - x_{n-1}) f[x_n, x_{n-1}, x_{n-2}] + ... + (x - x_n)(x - x_{n-1})(x - x_{n-2}) ... (x - x_1) f[x_n, x_{n-1}, ..., x_0]    (1.59)

and

R_n(x) = (x - x_n)(x - x_{n-1}) ... (x - x_0) f^{(n+1)}(ξ)/(n + 1)!,    x_0 < ξ < x_n.    (1.60)

In terms of α and the backward differences, (1.59) and (1.60) become

f(x_n + αh) = f(x_n) + α ∇f(x_n) + [α(α + 1)/2!] ∇^2 f(x_n) + ... + [α(α + 1) ... (α + n - 1)/n!] ∇^n f(x_n) + R_n(x_n + αh),    (1.61)

where

R_n(x_n + αh) = h^{n+1} α(α + 1)(α + 2) ... (α + n) f^{(n+1)}(ξ)/(n + 1)!,    x_0 < ξ < x_n.

If an additional base point x_{-1} = x_0 - h should be available, then the (n + 1)th divided difference may be approximated by

f^{(n+1)}(ξ)/(n + 1)! ≈ f[x_n, x_{n-1}, ..., x_0, x_{-1}] = ∇^{n+1} f(x_n)/[(n + 1)! h^{n+1}],

leading to an estimate for R_n(x_n + αh) of

R_n(x_n + αh) ≈ α(α + 1)(α + 2) ... (α + n) ∇^{n+1} f(x_n)/(n + 1)!.    (1.63)

Equation (1.61) is known as Newton's backward formula (NBF). Since NBF uses differences along the lower diagonal of the difference table, it is most useful for interpolation near the end of a set of tabulated values. As before, a shifting of subscripts, say the labeling of the point (3,25) in Table 1.9 as (x_0, f(x_0)), allows the formula to be used elsewhere in the table. In Chapter 6, we shall show that this form of the interpolating polynomial is quite useful in creating algorithms for stepwise solution of ordinary differential equations.

Example. By using the differences of Table 1.9, find the interpolant value predicted by the third-degree NBF formula (1.61) for the argument x = 3.5.

For this case, α = (x - x_n)/h = (3.5 - 4)/1 = -0.5. Then NBF yields

p_3(3.5) = 55 + (-0.5)(30) + (-0.5)(0.5)(14)/2! + (-0.5)(0.5)(1.5)(6)/3! = 37.875.

Since f(x) = x^3 - 2x^2 + 7x - 5 is a third-degree polynomial, this predicted value is the exact value; that is, the higher-order differences vanish as before.

Central Differences. In Section 1.6 it was shown that the interpolation error tends to be smallest when we fit the interpolating polynomial with base points on both sides of the interpolation argument. This can be effected by choosing a path through the divided-difference table which zigzags about some base point near the interpolation argument. Stated another way, this means that if a central path is taken across the difference table, more of the functional value is represented by leading terms in the difference formulation (the sequence converges faster). Consequently, a low-order central interpolation may produce answers with remainder terms no larger than a higher-order fit using the forward or backward paths. Of course, if the same base points are used to produce polynomials of equivalent degree, then all paths are equivalent.

A simple notation which describes differences along a zigzag path near the center of the difference table is needed. The central-difference operator (δ) is defined as follows:

δf(x) = f(x + h/2) - f(x - h/2),
δ^n f(x) = δ^{n-1} f(x + h/2) - δ^{n-1} f(x - h/2).

There is a notational difficulty for the odd differences, since there are no tabulated values at the half-interval
values x_i + h/2, x_i - h/2, etc. However, these "fictional" values are simply ignored and instead the odd differences are evaluated at mid-interval. Thus the even and odd differences are evaluated separately as follows:

For 2r = n (even):

δ^n f(x_i) = n! h^n f[x_{i+r}, x_{i+r-1}, ..., x_{i-r}].

For 2r + 1 = n (odd):

δ^n f(x_i + h/2) = n! h^n f[x_{i+r+1}, x_{i+r}, ..., x_{i-r}].

The central-difference table (entries are identical with the forward- and backward-difference tables, except for the subscripts of the base points) then takes the form of Table 1.10.

Table 1.10 Central-Difference Table

The differences of Table 1.8, with subscripts adjusted as shown in Table 1.10, are illustrated in Table 1.11.

Table 1.11 Central-Difference Table

If the zigzag path along the solid lines is taken, the formulation is termed the Gauss forward formula and is given by

f(x_0 + αh) = f(x_0) + α δf(x_0 + h/2) + α(α - 1) δ^2 f(x_0)/2! + α(α - 1)(α + 1) δ^3 f(x_0 + h/2)/3! + ... + R_n(x_0 + αh),    (1.65)

where α = (x - x_0)/h. When the dotted path is used, the Gauss backward formula is generated instead:

f(x_0 + αh) = f(x_0) + α δf(x_0 - h/2) + α(α + 1) δ^2 f(x_0)/2! + α(α - 1)(α + 1) δ^3 f(x_0 - h/2)/3! + ... + R_n(x_0 + αh).    (1.66)

An estimate of R_n(x_0 + αh) can be made, as before, by evaluating the first term dropped. If the last retained term is of even order n = 2r, then the two formulas are equivalent (since the same base points are involved in the fit) and yield identical results for all α. The polynomial agrees with the data at x_{-r}, x_{-r+1}, ..., x_0, ..., x_{r-1}, x_r. If n = 2r + 1 is odd, then the forward formula agrees exactly with the data at x_{-r}, x_{-r+1}, ..., x_0, ..., x_{r-1}, x_r, x_{r+1}, and the backward formula agrees at x_{-r-1}, x_{-r}, x_{-r+1}, ..., x_0, ..., x_{r-1}, x_r. Stirling's formula, occasionally mentioned in the literature, is simply the average of these two Gauss central-difference formulas.

Example. Use the Gauss forward formula of (1.65) and the central differences of Table 1.11 to compute the interpolant value for interpolation argument x = 2.5 with n = 3.

Following the zigzag path across the table,

α = (x - x_0)/h = (2.5 - 2)/1 = 0.5,

p_3(2.5) = 9 + (0.5)(16) + (0.5)(-0.5)(8)/2! + (0.5)(-0.5)(1.5)(6)/3! = 15.625.
Evaluation of the generating function

f(x) = p_3(x) = x^3 - 2x^2 + 7x - 5

for x = 2.5 yields the same value, as expected.

1.9 Concluding Remarks on Polynomial Interpolation

Approximating polynomials which use information about derivatives as well as functional values may also be constructed. For example, a third-degree polynomial could be found which reproduces functional values f(x_0) and f(x_1) and derivative values f'(x_0) and f'(x_1) at x_0 and x_1 respectively. The simultaneous equations to be solved in this case would be (for p_3(x) = Σ_{i=0}^{3} a_i x^i):

a_0 + a_1 x_0 + a_2 x_0^2 + a_3 x_0^3 = f(x_0)
a_0 + a_1 x_1 + a_2 x_1^2 + a_3 x_1^3 = f(x_1)
a_1 + 2a_2 x_0 + 3a_3 x_0^2 = f'(x_0)
a_1 + 2a_2 x_1 + 3a_3 x_1^2 = f'(x_1).

This system has the determinant

| 1   x_0   x_0^2   x_0^3 |
| 1   x_1   x_1^2   x_1^3 |
| 0   1     2x_0    3x_0^2 |
| 0   1     2x_1    3x_1^2 |

Higher-order derivatives may be used as well, subject to the restriction that the determinant of the system of equations may not vanish. The logical limit to this process, when f(x_0), f'(x_0), f''(x_0), ..., f^{(n)}(x_0) are employed, yields the nth-degree polynomial produced by Taylor's expansion, truncated after the term in x^n. The generation of appropriate interpolation formulas for these special cases is somewhat more tedious, but fundamentally no more difficult than cases for which only the f(x_i) are specified.

Unfortunately, there are no simple ground rules for deciding what degree of interpolation will yield the best results. When it is possible to evaluate higher-order derivatives, then, of course, an error bound can be computed using (1.39b). In most situations, however, it is not possible to compute such a bound, and the error estimate of (1.33) is the only information available. As the degree n of the interpolating polynomial increases, the interval containing the points x_0, x_1, ..., x_n also increases in size, tending, for a given x, to increase the magnitude of the polynomial term ∏_{i=0}^{n}(x - x_i) in the error (1.39) or error estimate (1.33). And, of course, the derivatives and divided differences do not necessarily become smaller as n increases; in fact, for many functions [13] the derivatives at first tend to decrease in magnitude with increasing n and then eventually increase without bound as n becomes larger and larger. Therefore the error may well increase rather than decrease as additional terms are retained in the approximation, that is, as the degree of the interpolating polynomial is increased.

One final word of caution. The functional values f(x_i) are usually known to a few significant figures at best. Successive differencing operations on these data, which are normally of comparable magnitude, inevitably lead to loss of significance in the computed results; in some cases, calculated high-order differences may be completely meaningless.

On the reassuring side, low-degree interpolating polynomials usually have very good convergence properties; that is, most of the functional value can be represented by low-order terms. In practice, we can almost always achieve the desired degree of accuracy with low-degree polynomial approximations, provided that base-point functional values are available on the interval of interest.

1.10 Chebyshev Polynomials

The only approximating functions employed thus far have been the polynomials, that is, linear combinations of the monomials 1, x, x^2, ..., x^n. An examination of the monomials on the interval [-1,1] shows that each achieves its maximum magnitude (1) at x = ±1 and its minimum magnitude (0) at x = 0. If a function f(x) is approximated by a polynomial

f(x) ≈ p_n(x) = Σ_{i=0}^{n} a_i x^i,

where p_n(x) is presumably a good approximation, the dropping of high-order terms or modification of the coefficients a_0, ..., a_n will produce little error for small x (near zero), but probably substantial error near the ends of the interval (x near ±1).

Unfortunately it is in general true that polynomial approximations (for example, those following from Taylor's series expansions) for arbitrary functions f(z) exhibit this same uneven error distribution over arbitrary intervals a ≤ z ≤ b. Since any such arbitrary finite interval can be transformed to the interval -1 ≤ x ≤ 1 by the change of variable

x = (2z - b - a)/(b - a),    z in [a,b], x in [-1,1],    (1.68)

it is sufficient to examine the behavior of functions potentially better than 1, x, x^2, ..., x^n on the interval [-1,1]. In particular, it seems reasonable to look for other sets of simple, related functions that have their extreme values well distributed on the interval [-1,1]. We hope that if we approximate an arbitrary function using a linear combination of such functions, the error in the approximation will be distributed more evenly over the interval. In particular, we want to find approximations which are fairly easy to generate and which reduce the maximum error to the minimum (or near-minimum) value.
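The change of variable (1.68) and its inverse can be written directly (a small sketch; the function names are ours):

```python
def to_unit_interval(z, a, b):
    """(1.68): map z in [a, b] onto x in [-1, 1]."""
    return (2*z - b - a) / (b - a)

def from_unit_interval(x, a, b):
    """Inverse of (1.68): map x in [-1, 1] back onto z in [a, b]."""
    return ((b - a)*x + a + b) / 2
```

For [a,b] = [1,3] the endpoints map to -1 and 1, and the midpoint z = 2 maps to x = 0, which is the substitution x = z - 2 used later in Section 1.12.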
The cosine functions, cos θ, cos 2θ, ..., cos nθ, appear to be good candidates. Each of the functions has identical maximum and minimum values distributed regularly over an arbitrary interval, say 0 ≤ θ ≤ π; in addition, the extreme values for two functions cos jθ, cos kθ, j ≠ k, do not, in general, occur at the same values of x (see Fig. 1.9).

Figure 1.9 Cos nθ, n = 1, 2, 3, 4, on the interval 0 ≤ θ ≤ π.

As shown earlier (1.7), the cosine function requires an approximation for its numerical evaluation. A simpler and more useful form results from the transformation of cos nθ on the interval 0 ≤ θ ≤ π into an nth-degree polynomial in x on the interval -1 ≤ x ≤ 1. The set of polynomials T_n(x) = cos nθ, n = 0, 1, ..., generated from the sequence of cosine functions using the transformation

θ = cos^{-1} x    (1.69)

is known as the Chebyshev polynomials. Clearly T_0(x) = cos(0) = 1. On introducing (1.69), the first cosine function is transformed into the Chebyshev polynomial of degree one,

T_1(x) = cos θ = cos(cos^{-1} x) = x.    (1.70)

To find T_2(x), the Chebyshev polynomial of degree two, apply the trigonometric identity cos 2θ = 2 cos^2 θ - 1 to yield

cos 2θ = cos(2 cos^{-1} x) = 2 cos^2(cos^{-1} x) - 1

or

T_2(x) = 2x^2 - 1.    (1.71)

In general, repeated application of the trigonometric identity,

cos nθ = 2 cos θ cos(n - 1)θ - cos(n - 2)θ,

can be used to compute higher-order Chebyshev polynomials, yielding the recursion relation

T_n(x) = 2x T_{n-1}(x) - T_{n-2}(x).    (1.72)

Table 1.12 The Chebyshev Polynomials

The monomials 1, x, x^2, x^3, ..., x^9 can be expressed as functions of the Chebyshev polynomials by simple algebraic manipulation and are shown in Table 1.13. Notice that the Chebyshev polynomials (because of their cos nθ origin) have n + 1 extreme values of equal magnitude (1), alternately positive and negative, on the interval [-1,1]. The T_i(x) of degrees 0 to 3 are shown in Fig. 1.10. The n roots of T_n(x) are real, occur on the interval (-1,1), and are given by

x_i = cos[(2i + 1)π/(2n)],    i = 0, 1, ..., n - 1.

A very useful property of these polynomials is that of all the possible monic polynomials p_n(x), that is, of all polynomials of degree n with the coefficient of the nth-power term equal to 1, the polynomial

φ_n(x) = T_n(x)/2^{n-1}    (1.74)

has the smallest upper bound for its absolute value on the interval [-1,1]. Examination of the recursion relation or of Table 1.12 shows that the leading coefficient of the nth Chebyshev polynomial is 2^{n-1}, so that φ_n(x) is simply the normalized nth Chebyshev polynomial, that is, T_n(x) divided by its high-order coefficient.
Table 1.13 Powers of x in Terms of the Chebyshev Polynomials

Since the maximum magnitude of T_n(x) is 1 for -1 ≤ x ≤ 1, the maximum magnitude of φ_n(x) is 1/2^{n-1} for -1 ≤ x ≤ 1. A study of φ_n(x) on this interval shows that all the extreme values have the same magnitude, namely 1/2^{n-1}, and that there are exactly n + 1 of them, alternately relative maxima and minima. The proof that φ_n(x) has the property of minimum upper bound follows.

Assume that there is some other monic polynomial S_n(x), which has a smaller maximum magnitude on the indicated interval, and determine the difference

D(x) = φ_n(x) - S_n(x)

at the local extrema of φ_n(x). φ_n(x) has exactly n + 1 extreme values and crosses the x axis exactly n times (has n roots) in the interval -1 ≤ x ≤ 1. At each of these extreme values (let the abscissas be x_0, x_1, ..., x_n), S_n(x) must be smaller in magnitude than φ_n(x) because of the initial assumption. This requires that the differences evaluated at the x_i change sign for each successive value of i; that is, if D(x_0) < 0, then D(x_1) > 0, D(x_2) < 0, etc., and if D(x_0) > 0, then D(x_1) < 0, D(x_2) > 0, etc. Thus D(x) must change sign n times, or equivalently, have n roots in the interval [-1,1]. But D(x) is a polynomial of degree n - 1, because both φ_n(x) and S_n(x) have leading coefficient unity. Since an (n - 1)th-degree polynomial has only n - 1 roots, there is no such polynomial S_n(x). The proposition that φ_n(x) is the monic polynomial of degree n that deviates least from zero on [-1,1] is proved by contradiction.

Consider the illustration of the proof for φ_2(x) = T_2(x)/2 shown in Fig. 1.11. The solid curve is φ_2(x), that is, T_2(x)/2 = x^2 - 1/2, which has three extreme values at x_0 = -1, x_1 = 0, and x_2 = 1. The dotted curve shows a proposed S_2(x) which has a smaller maximum magnitude on the interval than φ_2(x). The differences in the ordinates of φ_2(x) and S_2(x) at x_0, x_1, and x_2 are shown as D(x_0), D(x_1), and D(x_2). As indicated by the direction of the arrows, D(x) must change sign twice on the interval, an impossibility since φ_2(x) - S_2(x) is only a first-degree polynomial.

Figure 1.10 Chebyshev polynomials of degree 0 to 3.

1.11 Minimizing the Maximum Error

Since the nth-degree polynomial φ_n(x) = T_n(x)/2^{n-1} has the smallest maximum magnitude of all possible nth-degree monic polynomials on the interval [-1,1], any error that can be expressed as an nth-degree polynomial can be minimized for the interval [-1,1] by equating it with φ_n(x). For example, the error term for the interpolating polynomial has been shown to be of the form (see 1.39b)

R_n(x) = ∏_{i=0}^{n}(x - x_i) f^{(n+1)}(ξ)/(n + 1)!.

We can do very little about f^{(n+1)}(ξ). The only effective way of minimizing R_n(x) is to minimize the maximum magnitude of the (n + 1)th-degree polynomial ∏_{i=0}^{n}(x - x_i). Treat f^{(n+1)}(ξ) as though it were constant. Now equate ∏_{i=0}^{n}(x - x_i) with φ_{n+1}(x), and notice that the (x - x_i) terms are simply the n + 1 factors of φ_{n+1}(x); the x_i are therefore roots of φ_{n+1}(x), or equivalently, the roots of the corresponding Chebyshev polynomial T_{n+1}(x), given by

x_i = cos[(2i + 1)π/(2(n + 1))],    i = 0, 1, ..., n.    (1.77)
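The Chebyshev roots, and their transformation to an arbitrary interval [a,b] by the inverse of (1.68), can be sketched as follows (illustrative code; the function names and the sorted node ordering are ours):

```python
import math

def chebyshev_roots(n):
    """The n real roots of T_n(x) on (-1, 1): cos[(2i + 1)π/(2n)]."""
    return [math.cos((2*i + 1) * math.pi / (2*n)) for i in range(n)]

def chebyshev_base_points(n, a, b):
    """Roots of T_{n+1}(x) mapped from [-1, 1] onto [a, b]: near-optimal
    base points for nth-degree interpolation (minimax principle)."""
    return sorted(((b - a)*x + a + b) / 2 for x in chebyshev_roots(n + 1))

# T_n(x) = cos(n·arccos x) vanishes at each root
roots4 = chebyshev_roots(4)
```

For example, chebyshev_base_points(2, 0.0, 10.0) gives three sampling points clustered toward the ends of [0, 10], with the middle one at 5, illustrating the experiment-sampling use discussed below.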

Figure 1.11 φ_2(x), the second-degree monic polynomial that deviates least from zero on [-1,1].

For an arbitrary interval, a ≤ z ≤ b, the appropriate z_i values may be computed by transforming these root values from [-1,1] to [a,b], using the inverse of (1.68),

z_i = [(b - a)x_i + a + b]/2.

This minimization of maximum error is sometimes termed the minimax principle. The application of the minimax principle is possible only when there is freedom of choice in selecting the base points x_i. Fortunately, this is often the case in experimental work, where one has control over the independent variable values which may be used subsequently as base points for polynomial interpolation. For example, the sampling time for an experiment may be the independent variable of interest. Then, for a fixed total time interval during which a fixed number of samples are to be taken, the best times for taking sample measurements are given by the transformed roots of the appropriate Chebyshev polynomial.

A more general but related problem is to find that approximating polynomial p_n*(x) of degree n which satisfies

max |f(x) - p_n*(x)| ≤ max |f(x) - p_n(x)| for every polynomial p_n(x) of degree n,

that is, minimizes the maximum deviation from f(x) on [-1,1] (or, in general, on [a,b]). One can show [14] that p_n*(x) is unique and that the deviation, d*(x) = f(x) - p_n*(x), must attain its maximum magnitude at not less than n + 2 distinct points in [-1,1] (or [a,b]) and be alternately positive and negative at these points. The argument is straightforward. Suppose that p_n(x) = f(x) - d(x) is a better approximation to f(x) than is a given polynomial p_n*(x) that exhibits the equal-magnitude oscillatory behavior for its error (such an error function is said to possess the equal ripple property). Then d*(x) - d(x) = p_n(x) - p_n*(x) is evidently a polynomial of degree n, alternately positive and negative at the n + 2 or more points where d*(x) passes through its extreme values, since at these points, by hypothesis, |d(x)| < |d*(x)|. Since no polynomial of degree n possesses n + 1 roots (changes in sign), the contention that p_n*(x) is the polynomial of degree n that minimizes the maximum deviation from f(x) on the interval of interest is proved by contradiction. The polynomial p_n*(x) is called the minimax polynomial of degree n.

We assume that f(x) is a single-valued continuous function on the closed interval [a,b]. Then d*(x) is continuous and p_n*(x) must be equal to f(x) at n + 1 distinct points x_0, x_1, ..., x_n, where each x_i lies between two points at which the error achieves its extreme value. Thus the minimax polynomial p_n*(x) is an nth-degree interpolating polynomial. When f(x) possesses an (n + 1)th-order derivative, the error term (1.39b) applies, and the problem reverts to that of the early paragraphs of this section. If f^{(n+1)}(x) is a constant, then the interpolating polynomial using the base points given by (1.77) is precisely p_n*(x); if not, then the interpolating polynomial so generated is only an approximation to p_n*(x).

The problem of finding p_n*(x) when f^{(n+1)}(ξ) in (1.39b) is unknown or variable in [-1,1] becomes an iterative one. The procedures commonly used (see, for example, [16]) follow from the work of Remes [15] and are rather complex; they will not be described here.

Clenshaw [17] shows that if the function f(x) can be expanded in terms of the Chebyshev polynomials as

f(x) = Σ_{i=0}^{∞} a_i T_i(x),    (1.80)
1.12 Chebyshev Economizar'ion-Telescoping a Power Series 43

then the partial sum

p_n(x) = Σ_{i=0}^{n} a_i T_i(x)    (1.81)

will usually be a very good approximation to p_n*(x); that is, p_n(x) given by (1.81) will be near-minimax. Note that (1.80) is just the Fourier cosine series expansion of the function f(cos θ). Clenshaw suggests that the improvement one can get in the approximation using p_n*(x) is seldom worth the effort required to find it, and that the addition of one more term in (1.81), that is, increasing the degree of the polynomial by one, will usually produce a greater improvement in accuracy. Unfortunately, the procedure for generating the coefficients in (1.80), described in detail in [17], is rather tedious for arbitrary functions. Clenshaw lists extensive tabulations of the a_i for the common trigonometric functions, inverse trigonometric functions, exponential and hyperbolic functions, logarithmic and inverse hyperbolic functions, the gamma function, the error function, and several of the Bessel functions.

Snyder [18] describes procedures for generating minimax or near-minimax rational approximations (see Section 1.2) to arbitrary functions. His book contains a substantial bibliography of recent work in the general area of minimax approximations and Chebyshev polynomials.

1.12 Chebyshev Economization-Telescoping a Power Series

Coefficients of the Chebyshev polynomials in (1.81) for arbitrary functions f(x) can be computed with some effort using methods similar to those outlined in Example 2.2 in Chapter 2. If f(x) is a polynomial of degree n, then it is a simple matter to determine the coefficients in (1.81) using Table 1.13; we simply replace each term x^i by its expansion in terms of the Chebyshev polynomials and collect coefficients of like polynomials T_i(x). It often happens that when polynomials, particularly those resulting from truncated Taylor's series expansions, are expressed using Chebyshev polynomials, some of the high-order Chebyshev polynomials can be dropped with the knowledge that the error involved is small (since the upper bound for |T_i(x)| on [-1,1] is known to be 1). The truncated series can then be retransformed to a polynomial in x with fewer terms than the original and, of course, modified coefficients. This procedure, termed Chebyshev economization or telescoping a power series, is probably best illustrated with some examples.

Suppose we want to find a three-term approximation for the cosine on the interval [-1,1], of the form

cos x ≈ c_0 + c_1 x^2 + c_2 x^4,

which is in error by no more than 5 × 10^-5 (see the results of Example 1.3). The logical starting place is with the power series expansion

cos x = 1 - x^2/2! + x^4/4! - x^6/6! + x^8/8! - x^10/10! + ...,

with n even. Examination of the remainder term for truncation after the nth-degree term, x^{n+2}/(n + 2)! cos ξ, |ξ| < 1, shows that truncation of the series results in a total error smaller than the first term dropped. Truncation after the term in x^8 will produce an error of magnitude less than 1/10! ≈ 2.76 × 10^-7 for x in [-1,1].

Rewriting the remaining five-term series in terms of the Chebyshev polynomials by using Table 1.13, and collecting coefficients of the T_i, yields

cos x ≈ 0.76519775 T_0 - 0.22980686 T_2 + 0.0049533419 T_4 - 4.186 × 10^-5 T_6 + 1.938 × 10^-7 T_8.

Since the maximum magnitude of the Chebyshev polynomials on [-1,1] is 1.0, the terms in T_6 and T_8 can be dropped without incurring an additional error larger than 4.186 × 10^-5 + 1.938 × 10^-7 = 4.206 × 10^-5. When added to the maximum Taylor's series error, 2.76 × 10^-7, the maximum total error must be smaller than 4.234 × 10^-5. Since this is less than the prescribed error, a suitable approximation for the cosine is given by

cos x ≈ 0.76519775 T_0 - 0.22980686 T_2 + 0.0049533419 T_4.    (1.86)

Table 1.12 can be used to retransform the series into a polynomial in x, yielding

cos x ≈ 0.99995795 - 0.49924045 x^2 + 0.03962674 x^4.    (1.87)

If the original power series had been truncated after the term in x^4, the error bound would have been 1/6! ≈ 1.39 × 10^-3, some 28 times the specified value. While it would appear at first glance that the original series could be truncated after the term in x^6 (since 1/8! ≈ 2.5 × 10^-5 < 5 × 10^-5) before expansion in the Chebyshev polynomials, this is not the case. The coefficient of T_6 in the expanded form would have been 4.34 × 10^-5, and this plus 2.5 × 10^-5 would have exceeded the specified bound.
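The arithmetic of this cosine example can be reproduced with a short sketch. This is modern illustrative code, not the book's program; the monomial-to-Chebyshev conversion encodes the relations underlying Table 1.13:

```python
import math

def power_to_chebyshev(n):
    """Chebyshev coefficients c[k] with x^n = Σ c[k]·T_k(x) (cf. Table 1.13)."""
    if n == 0:
        return [1.0]
    c = [0.0] * (n + 1)
    for j in range(n // 2 + 1):
        k = n - 2 * j
        coef = math.comb(n, j) / 2 ** (n - 1)
        if k == 0:
            coef /= 2          # the T0 term is halved
        c[k] += coef
    return c

# cos x ≈ 1 - x²/2! + x⁴/4! - x⁶/6! + x⁸/8!, re-expressed in the T_i(x)
b = [0.0] * 9
for n in range(0, 9, 2):
    a_n = (-1) ** (n // 2) / math.factorial(n)
    for k, ck in enumerate(power_to_chebyshev(n)):
        b[k] += a_n * ck

# dropping T6 and T8 adds at most |b6| + |b8| to the error, since |T_i| ≤ 1
added_error = abs(b[6]) + abs(b[8])
```

The computed coefficients of T_0, T_2, and T_4 agree with (1.86), and the added error comes out near 4.2 × 10^-5, matching the bound quoted above.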
Consider another similar problem: using the economization procedure, find a linear approximation for the function

F(z) = z^3    (1.88)

on the interval 1 ≤ z ≤ 3.

First, use (1.68) to transform the given function F(z) on the interval 1 ≤ z ≤ 3 to the new function f(x) on the interval -1 ≤ x ≤ 1, where x = z - 2. Then

f(x) = (x + 2)^3 = x^3 + 6x^2 + 12x + 8.    (1.89)

Using Table 1.13, f(x), in terms of the Chebyshev polynomials, becomes

f(x) = 11 T_0 + 12¾ T_1 + 3 T_2 + ¼ T_3.    (1.90)

Dropping the Chebyshev polynomials of degree greater than 1 (with a maximum error of 3¼) and using Table 1.12 to express T_0 and T_1 in terms of x, we obtain

f(x) ≈ 11 + 12¾ x.    (1.91)

Retransforming to the interval [1,3],

F̃(z) = 12¾ z - 14½.    (1.92)

Figure 1.12 shows the original polynomial F(z) = z^3 and the economized polynomial of degree one, F̃(z) = 12¾ z - 14½.

Figure 1.12 First-degree economized polynomial approximation of F(z) = z^3 on the interval [1,3].

A tabulation of the values of the two functions (and of the deviations) at several points on the interval is shown in Table 1.14.

Table 1.14 Values of F(z), F̃(z), and δ(z) from Figure 1.12

The deviation, given by

δ(z) = F(z) - F̃(z) = z^3 - 12¾ z + 14½,    (1.93)

is shown graphically in Fig. 1.13. The local minimum (found by equating the derivative to zero) is at z = 2.06 and equals -3.02. The local maximum error, at z = 3, equals 3¼. In this case, the true maximum error is the same as the known error bound, because T_2(x) and T_3(x) both achieve their maximum values (1.0) at x = 1 (or z = 3).

Figure 1.13 Error in the first-degree economized polynomial approximation of F(z) = z^3.

Notice that the economized polynomial of degree one, F̃(z), has an error which assumes n + 2 = 3 extreme values of alternating sign (2¾, -3.02, and 3¼ for z = 1, 2.06, and 3, respectively), as the minimax polynomial approximation must. Recall, however, that the minimax polynomial approximation must possess the equal ripple property; that is, the extreme errors must be equal in magnitude. Hence F̃(z) cannot be the minimax polynomial of degree one for the function F(z) = z^3. However, in view of the nearly equal extreme values for the magnitude of the error, F̃(z) is probably a near-minimax approximation.

Suppose that the expansion (1.90) is truncated after the term in T_2(x); the only term dropped is ¼ T_3(x). Then the economized polynomial of degree two,

f(x) ≈ p_2(x) = 11 T_0 + 12¾ T_1 + 3 T_2,    (1.94)

will have a maximum error of magnitude ¼. Transforming (1.94) back to the interval 1 ≤ z ≤ 3 leads to the second-degree economized polynomial approximation of F(z):

F̃(z) = 6z^2 - 11¼ z + 6½.    (1.95)
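The deviation (1.93) of the first-degree economized polynomial (1.92) can be checked numerically (a small sketch; the function names are ours):

```python
def F(z):
    return z**3                      # (1.88)

def F_tilde(z):
    return 12.75*z - 14.5            # economized first-degree polynomial (1.92)

def deviation(z):
    return F(z) - F_tilde(z)         # (1.93): z³ - 12.75 z + 14.5

# extreme deviations at z = 1, z = sqrt(4.25) ≈ 2.06 (where δ'(z) = 0), and z = 3
extremes = [deviation(1.0), deviation(4.25 ** 0.5), deviation(3.0)]
```

The three extreme deviations come out as 2.75, about -3.02, and 3.25, alternating in sign but not equal in magnitude, so F̃(z) is near-minimax rather than minimax.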
Observe that the term dropped from (1.90) to get (1.94), namely ¼ T_3(x), is precisely φ_3(x) of (1.74). We have already shown that φ_n(x) is the monic polynomial of degree n which deviates least from zero on the interval [-1,1]. Hence (1.94) must be the polynomial of degree two which deviates least from the polynomial of degree three given by (1.90). Then the transformed second-degree economized polynomial approximation (1.95) of the original third-degree polynomial (1.88) must be the second-degree minimax polynomial approximation of F(z) on the interval [1,3]. The reader should verify that the deviation of (1.95) from (1.88) does indeed have n + 2 = 4 extreme values on [1,3] of equal magnitude and alternating sign.

Given a starting polynomial of degree n on an arbitrary finite interval [a,b], the Chebyshev economization process always leads to the minimax polynomial approximation of degree n - 1. Approximations of lower degree generated by the process will not, in general, be minimax polynomial approximations; however, they will usually be near-minimax polynomial approximations. Since the maximum error in such approximations can be bounded as the sum of the magnitudes of the coefficients of the T_i(x) dropped from the expansion in terms of Chebyshev polynomials, the economization process often leads to very useful polynomial approximations.
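The whole economization procedure of this section can be sketched end to end. This is an illustrative modern rendering, not the book's FORTRAN program of Example 1.3 (which uses the subroutine TRANS and the matrices X and T); the helper names are ours:

```python
import math

def shift(coeffs, scale, offset):
    """Rewrite p(t) = Σ c_i t^i as a polynomial in u, where t = scale·u + offset."""
    n = len(coeffs) - 1
    out = [0.0] * (n + 1)
    for i, ci in enumerate(coeffs):
        for j in range(i + 1):   # expand (scale·u + offset)^i by the binomial theorem
            out[j] += ci * math.comb(i, j) * scale**j * offset**(i - j)
    return out

def power_to_chebyshev_basis(coeffs):
    """Chebyshev coefficients b with Σ c_i x^i = Σ b_k T_k(x) (cf. Table 1.13)."""
    b = [0.0] * len(coeffs)
    for i, ci in enumerate(coeffs):
        if i == 0:
            b[0] += ci
            continue
        for j in range(i // 2 + 1):
            k = i - 2 * j
            coef = math.comb(i, j) / 2 ** (i - 1)
            b[k] += ci * (coef / 2 if k == 0 else coef)
    return b

def chebyshev_to_power_basis(b):
    """Inverse conversion using T_n = 2x T_{n-1} - T_{n-2} (cf. Table 1.12)."""
    T = [[1.0], [0.0, 1.0]]
    for n in range(2, len(b)):
        new = [0.0] + [2*c for c in T[n - 1]]
        for i, c in enumerate(T[n - 2]):
            new[i] -= c
        T.append(new)
    out = [0.0] * len(b)
    for k, bk in enumerate(b):
        for i, c in enumerate(T[k]):
            out[i] += bk * c
    return out

def economize(d, a, b_right, eps):
    """Chebyshev economization of Σ d_i z^i on [a, b_right] to tolerance eps;
    returns the coefficients of the economized polynomial in z."""
    half, mid = (b_right - a) / 2, (a + b_right) / 2
    cheb = power_to_chebyshev_basis(shift(d, half, mid))   # z = half·x + mid
    m, budget = len(cheb) - 1, eps
    while m > 0 and abs(cheb[m]) <= budget:                # drop while within budget
        budget -= abs(cheb[m])
        m -= 1
    power_x = chebyshev_to_power_basis(cheb[:m + 1])
    return shift(power_x, 1/half, -mid/half)               # x = (z - mid)/half

# F(z) = z³ on [1, 3], dropping terms worth at most 3.5 in total:
print(economize([0.0, 0.0, 0.0, 1.0], 1.0, 3.0, 3.5))     # ≈ [-14.5, 12.75]
```

With a tolerance of 3.5 the routine reproduces the first-degree result (1.92); tightening the tolerance so that only ¼T_3 is dropped reproduces the second-degree minimax result (1.95).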
EXAMPLE 1.3
CHEBYSHEV ECONOMIZATION

Problem Statement

Write a program that reads as data the following: n, the coefficients a₀, a₁, ..., aₙ, the interval limits L and R, and E, the maximum allowable error. Here, the aᵢ are the coefficients of the nth-degree polynomial

    P_n(z) = sum_{i=0}^{n} a_i z^i,    (1.3.1)

where L ≤ z ≤ R. The program should transform (1.3.1) to the nth-degree polynomial in a new variable x having coefficients aᵢ*,

    p_n(x) = sum_{i=0}^{n} a_i* x^i,    (1.3.2)

where −1 ≤ x ≤ 1, and x and z are related (see (1.68)) by

    x = (2z − R − L)/(R − L).    (1.3.3)

Next, the program should expand (1.3.2) in terms of the Chebyshev polynomials of Tables 1.12 and 1.13 as

    p_n(x) = sum_{i=0}^{n} b_i T_i(x),    (1.3.4)

and then carry out the Chebyshev economization procedure, dropping all those high-order polynomials Tᵢ(x) which can, in sum, contribute no more than an amount E to the value of p_n(x) on [−1,1]. Let m be the degree of the truncated polynomial, so that

    p_m(x) = sum_{i=0}^{m} b_i T_i(x),    (1.3.5)

where the dropped terms are

    E(x) = sum_{i=m+1}^{n} b_i T_i(x),    (1.3.6)

and

    E_max = sum_{i=m+1}^{n} |b_i| ≤ E.    (1.3.7)

The economized polynomial p_m(x) should be rewritten as

    p_m(x) = sum_{i=0}^{m} c_i* x^i,    (1.3.8)

using Table 1.12. Finally, (1.3.8) should be transformed to the original variable z on the interval [L,R], using (1.3.3), to yield

    p̄_m(z) = sum_{i=0}^{m} c_i z^i.    (1.3.9)

The program should print the input data, the coefficients of the various polynomials as they are computed, and the value of E_max.

To simplify the transformation from (1.3.1) to (1.3.2) and from (1.3.8) to (1.3.9), write a general-purpose subroutine named TRANS that converts an nth-degree polynomial in any variable (z, for example) on an arbitrary interval [L₁,R₁],

    p_n(z) = sum_{i=0}^{n} d_i z^i,    (1.3.10)

to an nth-degree polynomial in any other variable (x, for example) on another arbitrary interval [L₂,R₂],

    p_n(x) = sum_{i=0}^{n} d_i* x^i,    (1.3.11)

using the transformation

    z = k₁x + k₂,  where  k₁ = (R₁ − L₁)/(R₂ − L₂),  k₂ = (L₁R₂ − R₁L₂)/(R₂ − L₂).    (1.3.12)

Note that (1.3.3) is a special case of (1.3.12), for which L₁ = L, R₁ = R, L₂ = −1, and R₂ = 1.

Method of Solution

Substituting (1.3.12) into (1.3.10), collecting terms in like powers of x, and equating them with the coefficients of (1.3.11) leads, after some tedious algebra, to the following relationship between the dᵢ and the dᵢ*:

    d_i* = k₁^i sum_{j=i}^{n} C(j,i) k₂^(j−i) d_j,  i = 0, 1, ..., n.    (1.3.13)

Here,

    C(j,i) = j!/(i!(j − i)!)

is the binomial coefficient.

The subroutine TRANS computes values for the dᵢ*, given the dᵢ, n, L₁, R₁, L₂, and R₂. A function called NOMIAL evaluates the binomial coefficient of (1.3.13), when needed by TRANS.

To implement the transformations from (1.3.2) to (1.3.4) and from the economized polynomial p_m(x) in terms of the Chebyshev polynomials to (1.3.8), it is convenient to set up two 10 x 10 matrices X and T, containing the coefficients from Tables 1.12 and 1.13. The matrix X contains the coefficients from Table 1.13, arranged so that element X_{i,j} contains the coefficient of T_j for the expansion of x^i in terms of the Chebyshev polynomials, as illustrated in Table 1.3.1. The matrix T contains the coefficients from Table 1.12, arranged so that T_{i,j} is equal to the coefficient of the jth power of x for the ith Chebyshev polynomial, T_i, and is illustrated by Table 1.3.2. This organization of the matrices X and T leads to rather simple subscripting schemes for carrying out the economization process, but is not essential. To economize on computer memory
Table 1.3.1 The Matrix X

(Entries X_{i,j}: the coefficient of T_j in the expansion of x^i; i is the row subscript and j the column subscript. Values not reproduced.)

Table 1.3.2 The Matrix T

(Entries T_{i,j}: the coefficient of x^j in the Chebyshev polynomial T_i; i is the row subscript and j the column subscript. Values not reproduced.)
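As an aside, the entries of Tables 1.3.1 and 1.3.2 need not be keyed in at all; both matrices can be generated from the Chebyshev recursions. A Python sketch of this idea (function and variable names are ours, not the book's):

```python
# Sketch of how Tables 1.3.1 and 1.3.2 can be generated instead of keyed
# in: the rows of T come from T_{n+1} = 2x*T_n - T_{n-1}, and X (powers of
# x in terms of T_j) from the inverse relation x*T_j = (T_{j-1}+T_{j+1})/2.
def build_tables(nmax):
    # T[i][j] = coefficient of x**j in T_i(x)
    T = [[0.0] * (nmax + 1) for _ in range(nmax + 1)]
    T[0][0] = 1.0
    if nmax > 0:
        T[1][1] = 1.0
    for i in range(2, nmax + 1):
        for j in range(1, i + 1):
            T[i][j] = 2.0 * T[i - 1][j - 1]
        for j in range(i - 1):
            T[i][j] -= T[i - 2][j]
    # X[i][j] = coefficient of T_j in the expansion of x**i
    X = [[0.0] * (nmax + 1) for _ in range(nmax + 1)]
    X[0][0] = 1.0
    for i in range(1, nmax + 1):
        for j in range(i):            # multiply x**(i-1) by x, term by term
            c = X[i - 1][j]
            if c == 0.0:
                continue
            if j == 0:
                X[i][1] += c          # x*T_0 = T_1
            else:
                X[i][j - 1] += c / 2.0
                X[i][j + 1] += c / 2.0
    return X, T
```

For instance, the row for x^4 comes out as 0.375 T₀ + 0.5 T₂ + 0.125 T₄, which agrees with the x⁴ entry in the DATA statement for XTOT in the program listing.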


48 Interpolation and Approximation

requirements, the tables could be packed tightly to eliminate all zero entries; this would, of course, lead to a different subscripting scheme.

Having found the aᵢ* in (1.3.2), the coefficients of (1.3.4) can be computed from

    b_j = sum_{i=j, Δi=2}^{n} a_i* X_{i,j},  j = 0, 1, ..., n,    (1.3.14)

where Δi = 2 indicates that i is to be incremented by 2; this allows the zero entries in X to be ignored. Once the economized polynomial p_m(x) has been found, using criterion (1.3.7), the coefficients of (1.3.8) can be computed from

    c_j* = sum_{i=j, Δi=2}^{m} b_i T_{i,j},  j = 0, 1, ..., m.    (1.3.15)
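A compact modern rendering of (1.3.13) may make the interval transformation easier to follow. The Python sketch below mirrors what the subroutine TRANS computes (the function and variable names are ours):

```python
from math import comb

# A sketch of the general interval transformation (1.3.13), using the
# mapping z = k1*x + k2 with k1 = (R1-L1)/(R2-L2) and
# k2 = (L1*R2 - R1*L2)/(R2-L2).
def trans(d, L1, R1, L2, R2):
    """Map coefficients d (ascending powers) of a polynomial on [L1,R1]
    to the coefficients of the equivalent polynomial on [L2,R2]."""
    n = len(d) - 1
    k1 = (R1 - L1) / (R2 - L2)
    k2 = (L1 * R2 - R1 * L2) / (R2 - L2)
    dstar = [0.0] * (n + 1)
    for i in range(n + 1):
        # d*_i = k1^i * sum_{j=i}^{n} C(j,i) * k2^(j-i) * d_j
        s = sum(comb(j, i) * k2 ** (j - i) * d[j] for j in range(i, n + 1))
        dstar[i] = (k1 ** i) * s
    return dstar
```

For example, mapping z³ on [1,3] to [−1,1] yields the coefficients 8, 12, 6, 1 of (x+2)³, and mapping the economized 11 + 12.75x back to [1,3] reproduces the printed coefficients −14.5 and 12.75.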

Flow Diagram

Main Program: Transform the polynomial a₀, ..., aₙ on the interval L ≤ z ≤ R to the polynomial a₀*, ..., aₙ* on −1 ≤ x ≤ 1 (subroutine TRANS); expand the transformed polynomial in Chebyshev polynomials, carry out the economization, and convert the truncated expansion back to powers of x; then transform the economized polynomial on −1 ≤ x ≤ 1 to the polynomial c₀, ..., c_m on the interval L ≤ z ≤ R (subroutine TRANS). (The detailed flow diagram is not reproduced here.)

Subroutine TRANS (dummy arguments: n, d, d*, L₁, R₁, L₂, R₂; calling arguments: n, a, a*, L, R, −1, 1 and m, c*, c, −1, 1, L, R). The central step accumulates d_i* <- d_i* + C(j,i) k₂^(j−i) d_j (function NOMIAL), after which each d_i* is multiplied by k₁^i.

Function NOMIAL (dummy arguments: k, l; calling arguments: j, i)

FORTRAN Implementation

List of Principal Variables

Program Symbol    Definition

(Main)
A, ASTAR, B, C, CSTAR†    Vectors of polynomial coefficients a_i, a_i*, b_i, c_i, and c_i*.
AL, AR    Lower bound, L, and upper bound, R, for the interval of interest in the variable z.
EMAX    E_max, maximum possible error introduced by the economization procedure.
EPS    E, maximum allowable error to be introduced by the economization process.
I, ISUB, J, NP1, MP1†    Subscripts, i, n + 2 − i, j, n + 1, and m + 1.
M    m, degree of the economized polynomial.
N    n, degree of the starting polynomial.
ONE, ONEM    1.0, −1.0 (double-precision).
TTOX†    Matrix of coefficients for converting from a polynomial in terms of T_i(x) to a polynomial in terms of x^i, T_{i,j} (see Table 1.3.2).
XTOT†    Matrix of coefficients for converting from a polynomial in terms of x^i to a polynomial in terms of T_i(x), X_{i,j} (see Table 1.3.1).

(Subroutine TRANS)
BINOM    Binomial coefficient calculated by function NOMIAL.
COEFFI, COEFFT†    Vectors of coefficients for the initial and transformed polynomials, d_i and d_i*, respectively.
CON1, CON2    Constants k₁ = (R₁ − L₁)/(R₂ − L₂) and k₂ = (L₁R₂ − R₁L₂)/(R₂ − L₂).
ENDLI, ENDLT    Lower bounds for the initial and terminal intervals, respectively.
ENDRI, ENDRT    Upper bounds for the initial and terminal intervals, respectively.

(Function NOMIAL)
ICOUNT    Counting variable.
K, L    Arguments for the binomial coefficient, k, l.
NOM    Binomial coefficient C(k,l).

† Because of FORTRAN limitations (subscripts smaller than one are not allowed), all subscripts in the text and flow diagrams are advanced by one when they appear in the programs; for example, a₀, aₙ, T_{0,0}, and T_{n,n} become A(1), A(N+1), TTOX(1,1), and TTOX(N+1,N+1), respectively.

Program Listing

Main Program

C     APPLIED NUMERICAL METHODS, EXAMPLE 1.3
C     CHEBYSHEV ECONOMIZATION
C
C     THIS PROGRAM READS THE N + 1 COEFFICIENTS A(1)...A(N+1) OF AN
C     N-TH DEGREE POLYNOMIAL IN Z ON THE INTERVAL (AL,AR) AND
C     EPS, THE MAXIMUM TOLERABLE ERROR PERMITTED IN THE
C     ECONOMIZED POLYNOMIAL.  THE COEFFICIENTS ARE ASSUMED TO
C     BE IN ORDER OF ASCENDING POWERS OF Z.  THE PROGRAM
C     CALLS ON THE SUBROUTINE TRANS TO COMPUTE THE COEFFICIENTS
C     ASTAR(1)...ASTAR(N+1) OF THE TRANSFORMED POLYNOMIAL IN X
C     ON THE INTERVAL (-1,1).  THE TRANSFORMED POLYNOMIAL IS
C     EXPANDED IN TERMS OF THE CHEBYSHEV POLYNOMIALS OF DEGREE N
C     OR LESS, WHERE B(J+1) IS THE COEFFICIENT OF THE J-TH CHEBYSHEV
C     POLYNOMIAL.  THIS POLYNOMIAL IS THEN TRUNCATED TO ONE
C     OF DEGREE M BY DROPPING HIGH ORDER TERMS OF THE POLYNOMIAL.
C     TERMS ARE DROPPED IN REVERSE ORDER SO LONG AS THE MAGNITUDE OF
C     THE POSSIBLE ERROR, EMAX, IS NOT GREATER THAN EPS.  AFTER THE
C     ECONOMIZATION PROCESS IS COMPLETED, THE TRUNCATED POLYNOMIAL
C     IS CONVERTED TO A POLYNOMIAL IN THE VARIABLE X WITH THE
C     M+1 COEFFICIENTS IN CSTAR(1)...CSTAR(M+1).  THE SUBROUTINE
C     TRANS IS CALLED TO TRANSFORM THE ECONOMIZED POLYNOMIAL
C     IN X ON THE INTERVAL (-1,1) TO A POLYNOMIAL IN THE ORIGINAL
C     VARIABLE Z ON THE INTERVAL (AL,AR).  THE COEFFICIENTS OF THE
C     ECONOMIZED POLYNOMIAL ARE IN C(1)...C(M+1).  ELEMENTS OF THE
C     CONVERSION MATRICES XTOT AND TTOX ARE PRESET WITH VALUES
C     FROM TABLES 1.3.1 AND 1.3.2 IN THE TEXT.  AS WRITTEN, THE PROGRAM
C     CAN HANDLE STARTING POLYNOMIALS OF DEGREE NINE AT MOST.
C
      IMPLICIT REAL*8(A-H, O-Z)
      DIMENSION A(10), ASTAR(10), B(10), CSTAR(10), C(10), XTOT(10,10),
     1  TTOX(10,10)

C     ..... PRESET CONVERSION MATRICES XTOT AND TTOX .....
      DATA XTOT / 1.0, 0.0, 0.5, 0.0, 0.375, 0.0, 0.3125, 0.0,
     1 0.2734375, 2*0.0, 1.0, 0.0, 0.75, 0.0, 0.625, 0.0, 0.546875, 0.0,
     2 0.4921875, 2*0.0, 0.5, 0.0, 0.5, 0.0, 0.46875, 0.0, 0.4375,
     3 4*0.0, 0.25, 0.0, 0.3125, 0.0, 0.328125, 0.0, 0.328125, 4*0.0,
     4 0.125, 0.0, 0.1875, 0.0, 0.21875, 6*0.0, 0.0625, 0.0, 0.109375,
     5 0.0, 0.140625, 6*0.0, 0.03125, 0.0, 0.0625, 8*0.0, 0.015625, 0.0,
     6 0.03515625, 8*0.0, 0.0078125, 10*0.0, 0.00390625 /
C
      DATA TTOX / 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0, 0.0, 1.0,
     1 2*0.0, 1.0, 0.0, -3.0, 0.0, 5.0, 0.0, -7.0, 0.0, 9.0, 2*0.0,
     2 2.0, 0.0, -8.0, 0.0, 18.0, 0.0, -32.0, 4*0.0, 4.0, 0.0, -20.0,
     3 0.0, 56.0, 0.0, -120.0, 4*0.0, 8.0, 0.0, -48.0, 0.0, 160.0,
     4 6*0.0, 16.0, 0.0, -112.0, 0.0, 432.0, 6*0.0, 32.0, 0.0, -256.0,
     5 8*0.0, 64.0, 0.0, -576.0, 8*0.0, 128.0, 10*0.0, 256.0 /

C
      DATA ONE, ONEM / 1.0, -1.0 /
C
C     ..... READ AND PRINT INPUT DATA .....
    1 READ (5,100) N, EPS, AL, AR
      IF ( N.GT.9 )  CALL EXIT
      NP1 = N + 1
      READ (5,101) ( A(I), I=1,NP1 )
      WRITE (6,200) N, EPS, AL, AR, NP1, ( A(I), I=1,NP1 )
C
C     ..... TRANSFORM POLYNOMIAL IN Z ON INTERVAL (AL,AR) TO
C           A POLYNOMIAL IN X ON THE INTERVAL (-1,1) .....
      CALL TRANS( N,A,ASTAR,AL,AR,ONEM,ONE )
      WRITE (6,201) NP1, ( ASTAR(I), I=1,NP1 )
C
C     ..... EXPAND TRANSFORMED POLY. IN TERMS OF CHEBY. POLYS. .....
      DO 3 J=1,NP1
      B(J) = 0.0
      DO 3 I=J,NP1,2
    3 B(J) = B(J) + ASTAR(I)*XTOT(I,J)
      WRITE (6,202) NP1, ( B(I), I=1,NP1 )

C
C     ..... CARRY OUT THE ECONOMIZATION PROCEDURE .....
      EMAX = 0.0
      MP1 = NP1
      DO 5 I=1,N
      ISUB = NP1 + 1 - I
      IF ( EMAX + DABS(B(ISUB)).GT.EPS )  GO TO 6
      MP1 = NP1 - I
    5 EMAX = EMAX + DABS(B(ISUB))
C
C     ..... CONVERT ECONOMIZED POLYNOMIAL TO POWERS OF X .....
    6 M = MP1 - 1
      DO 7 J=1,MP1
      CSTAR(J) = 0.0
      DO 7 I=J,MP1,2
    7 CSTAR(J) = CSTAR(J) + B(I)*TTOX(I,J)
      WRITE (6,203) MP1, ( CSTAR(I), I=1,MP1 )


C
C     ..... TRANSFORM ECONOMIZED POLYNOMIAL IN X ON INTERVAL
C           (-1,1) TO A POLYNOMIAL IN Z ON INTERVAL (AL,AR) .....
      CALL TRANS( M,CSTAR,C,ONEM,ONE,AL,AR )
      WRITE (6,204) AL, AR, M, EMAX, MP1, ( C(I), I=1,MP1 )
      GO TO 1
C
C     ..... FORMATS FOR THE INPUT AND OUTPUT STATEMENTS .....
  100 FORMAT ( 6X, I2, 12X, E12.5, 2(10X, F10.5) )
  101 FORMAT ( 16X, 4E14.6 )
  200 FORMAT ( 6H1N   =, I5/ 6H EPS =, E15.5/ 6H AL  =, F10.4/
     1  6H AR  =, F10.4/ 1H0/ 27H THE COEFFICIENTS A(1)...A(, I1,
     2  5H) ARE// (1H , 1P5E16.6) )
  201 FORMAT ( 1H0/ 1H0/ 35H THE COEFFICIENTS ASTAR(1)...ASTAR(, I1,
     1  5H) ARE/ 1H / (1H , 1P5E16.6) )
  202 FORMAT ( 1H0/ 1H0/ 27H THE COEFFICIENTS B(1)...B(, I1, 5H) ARE/
     1  1H / (1H , 1P5E16.6) )
  203 FORMAT ( 1H0/ 1H0/ 35H THE COEFFICIENTS CSTAR(1)...CSTAR(, I1,
     1  5H) ARE/ 1H / (1H , 1P5E16.6) )
  204 FORMAT ( 1H0/ 1H0/ 49H THE ECONOMIZED POLYNOMIAL ON THE INTERVAL (
     1 AL =, F10.4, 2H, , 5H AR =, F10.4, 8H ) IS OF/ 11H DEGREE M =,
     2  I2, 2H. ,
     3  53H THE MAXIMUM ERROR ON THIS INTERVAL IS NO LARGER THAN,
     4  1PE15.7, 1H./ 27H0THE COEFFICIENTS C(1)...C(, I1, 5H) ARE/
     5  1H / (1H , 1P5E16.6) )
C
      END
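The term-dropping logic of the DO 5 loop above can be summarized in a few lines of Python (a sketch of the same procedure, not the book's code):

```python
# The dropping loop of the main program, sketched in Python: starting from
# the highest Chebyshev coefficient, terms are discarded as long as the
# accumulated error bound emax stays within the allowed error eps.
def economize(b, eps):
    emax = 0.0
    m = len(b) - 1                      # degree of the truncated polynomial
    for i in range(len(b) - 1, 0, -1):  # never drop the T0 term
        if emax + abs(b[i]) > eps:
            break
        emax += abs(b[i])
        m = i - 1
    return b[: m + 1], emax
```

With the second data set's coefficients b = 11, 12.75, 3, 0.25 and EPS = 3.26, this keeps only 11 and 12.75 with EMAX = 3.25, exactly as in the computer output.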

Subroutine TRANS

      SUBROUTINE TRANS( N, COEFFI, COEFFT, ENDLI, ENDRI, ENDLT, ENDRT )
C
C     TRANS CONVERTS AN N-TH DEGREE POLYNOMIAL IN ONE VARIABLE
C     (SAY Z) ON THE INTERVAL (ENDLI,ENDRI) HAVING COEFFICIENTS
C     COEFFI(1)...COEFFI(N+1) INTO AN N-TH DEGREE POLYNOMIAL
C     IN A SECOND VARIABLE (SAY X) ON THE INTERVAL (ENDLT,ENDRT)
C     WITH COEFFICIENTS COEFFT(1)...COEFFT(N+1), WHERE THE TWO
C     VARIABLES X AND Z ARE RELATED BY THE TRANSFORMATION
C     Z = CON1*X + CON2.
C
      IMPLICIT REAL*8(A-H, O-Z)
      REAL*8 COEFFI, COEFFT, ENDLI, ENDRI, ENDLT, ENDRT
      DIMENSION COEFFI(10), COEFFT(10)
C
C     ..... COMPUTE CONSTANT PARAMETERS .....
      CON1 = (ENDRI-ENDLI)/(ENDRT-ENDLT)
      CON2 = (ENDLI*ENDRT-ENDRI*ENDLT)/(ENDRT-ENDLT)
      NP1 = N + 1
C
C     ..... CHECK FOR CON2 = 0 TO AVOID COMPUTING 0.**0 .....
C     (THE BODY OF THIS SUBROUTINE IS PARTLY ILLEGIBLE IN THE
C     REPRODUCTION AND IS RECONSTRUCTED FROM THE FLOW DIAGRAM)
      IF ( CON2.NE.0.0 )  GO TO 2
      DO 1 I=1,NP1
    1 COEFFT(I) = COEFFI(I)
      GO TO 4
C
C     ..... COMPUTE TRANSFORMED COEFFICIENTS .....
    2 DO 3 I=1,NP1
      COEFFT(I) = 0.0
      DO 3 J=I,NP1
      BINOM = NOMIAL(J-1,I-1)
    3 COEFFT(I) = COEFFT(I) + BINOM*CON2**(J-I)*COEFFI(J)
C
C     ..... MULTIPLY BY POWERS OF CON1 .....
    4 DO 5 I=1,NP1
    5 COEFFT(I) = COEFFT(I)*CON1**(I-1)
      RETURN
C
      END

Function NOMIAL

      FUNCTION NOMIAL (K, L)
C
C     NOMIAL COMPUTES THE BINOMIAL COEFFICIENT (K,L).
C
      NOM = 1
      IF ( K.LE.L .OR. L.EQ.0 )  GO TO 4
      DO 3 ICOUNT=1,L
    3 NOM = NOM*(K-ICOUNT+1)/ICOUNT
    4 NOMIAL = NOM
      RETURN
C
      END
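NOMIAL relies on the fact that each partial product in C(k,l) = product of (k−i+1)/i for i = 1, ..., l is itself a binomial coefficient, so the integer division never loses a remainder. The same recurrence in Python (a transliteration, including the original's convention of returning 1 when k ≤ l or l = 0):

```python
# A Python rendering of the NOMIAL recurrence: C(k,l) accumulated as
# nom = nom*(k-i+1)//i for i = 1..l.  Each partial product equals
# C(k,i)*..., so the integer division is exact at every step.
def nomial(k, l):
    if k <= l or l == 0:
        return 1          # mirrors the FORTRAN early exit (NOM = 1)
    nom = 1
    for icount in range(1, l + 1):
        nom = nom * (k - icount + 1) // icount
    return nom
```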

Data

(Twelve data sets were run; the numerical values are not legible in this reproduction. Sets 1-5 use N = 3, sets 6-9 use N = 8, and sets 10-12 use N = 2; each set supplies N, EPS, AL, AR, and the coefficients A(1)...A(N+1).)

Computer Output

Results for the 2nd Data Set

N   =     3
EPS =    0.32600D 01
AL  =    1.0000
AR  =    3.0000

THE COEFFICIENTS A(1)...A(4) ARE

  0.0             0.0             0.0             1.000000D 00

THE COEFFICIENTS ASTAR(1)...ASTAR(4) ARE

  8.000000D 00    1.200000D 01    6.000000D 00    1.000000D 00

THE COEFFICIENTS B(1)...B(4) ARE

  1.100000D 01    1.275000D 01    3.000000D 00    2.500000D-01

THE COEFFICIENTS CSTAR(1)...CSTAR(2) ARE

  1.100000D 01    1.275000D 01

THE ECONOMIZED POLYNOMIAL ON THE INTERVAL ( AL = 1.0000,  AR = 3.0000 ) IS OF
DEGREE M = 1.  THE MAXIMUM ERROR ON THIS INTERVAL IS NO LARGER THAN 3.2500000D 00.

THE COEFFICIENTS C(1)...C(2) ARE

 -1.450000D 01    1.275000D 01

Results for the 7th Data Set

N   =     8
EPS =    0.50000D-04
AL  =    0.0
AR  =    1.5708

THE COEFFICIENTS A(1)...A(9) ARE

  1.000000D 00    0.0            -5.000000D-01    0.0             4.166667D-02
  0.0            -1.388888D-03    0.0             2.480159D-05

THE COEFFICIENTS ASTAR(1)...ASTAR(9) ARE

THE COEFFICIENTS B(1)...B(9) ARE

THE COEFFICIENTS CSTAR(1)...CSTAR(6) ARE

THE ECONOMIZED POLYNOMIAL ON THE INTERVAL ( AL = 0.0,  AR = 1.5708 ) IS OF
DEGREE M = 5.  THE MAXIMUM ERROR ON THIS INTERVAL IS NO LARGER THAN 7.2978080D-06.

THE COEFFICIENTS C(1)...C(6) ARE
Results for the 8th Data Set

THE COEFFICIENTS A(1)...A(9) ARE

  1.000000D 00    0.0            -5.000000D-01    0.0             4.166667D-02
  0.0            -1.388888D-03    0.0             2.480159D-05

THE COEFFICIENTS ASTAR(1)...ASTAR(9) ARE

THE COEFFICIENTS B(1)...B(9) ARE

THE COEFFICIENTS CSTAR(1)...CSTAR(9) ARE

THE ECONOMIZED POLYNOMIAL ON THE INTERVAL ( AL = -1.0000,  AR = 1.0000 ) IS OF
DEGREE M = 8.  THE MAXIMUM ERROR ON THIS INTERVAL IS NO LARGER THAN 0.0

THE COEFFICIENTS C(1)...C(9) ARE

  1.000000D 00    0.0            -5.000000D-01    0.0             4.166667D-02
  0.0            -1.388888D-03    0.0             2.480159D-05
Results for the 9th Data Set

N   =     8
EPS =    0.50000D-04
AL  =   -1.0000
AR  =    1.0000

THE COEFFICIENTS A(1)...A(9) ARE

  1.000000D 00    0.0            -5.000000D-01    0.0             4.166667D-02
  0.0            -1.388888D-03    0.0             2.480159D-05

THE COEFFICIENTS ASTAR(1)...ASTAR(9) ARE

THE COEFFICIENTS B(1)...B(9) ARE

THE COEFFICIENTS CSTAR(1)...CSTAR(5) ARE

  9.999580D-01    0.0            -4.992405D-01    0.0             3.962674D-02

THE ECONOMIZED POLYNOMIAL ON THE INTERVAL ( AL = -1.0000,  AR = 1.0000 ) IS OF
DEGREE M = 4.  THE MAXIMUM ERROR ON THIS INTERVAL IS NO LARGER THAN 4.2046412D-05.

THE COEFFICIENTS C(1)...C(5) ARE

  9.999580D-01    0.0            -4.992405D-01    0.0             3.962674D-02
Results for the 12th Data Set

N   =     2
EPS =    0.50000D-01
AL  = 1000.0000
AR  = 2000.0000

THE COEFFICIENTS A(1)...A(3) ARE

  6.300000D 00    1.820000D-03   -3.450000D-07

THE COEFFICIENTS ASTAR(1)...ASTAR(3) ARE

  8.253750D 00    3.925000D-01   -8.625000D-02

THE COEFFICIENTS B(1)...B(3) ARE

  8.210625D 00    3.925000D-01   -4.312500D-02

THE COEFFICIENTS CSTAR(1)...CSTAR(2) ARE

  8.210625D 00    3.925000D-01

THE ECONOMIZED POLYNOMIAL ON THE INTERVAL ( AL = 1000.0000,  AR = 2000.0000 ) IS OF
DEGREE M = 1.  THE MAXIMUM ERROR ON THIS INTERVAL IS NO LARGER THAN 4.3125000D-02.

THE COEFFICIENTS C(1)...C(2) ARE

  7.033125D 00    7.850000D-04

Discussion of Results

Five different polynomials were used in the twelve test data sets as follows:

(Summary table: data set, interval [L,R], maximum allowable error E, and polynomial; entries not reproduced.)

Data sets 1, 3, 6, 8, and 11 allow no error to be introduced by the economization process. Hence, the economized polynomial for these cases must be equivalent to the original polynomials; significant discrepancies could be accounted for only as errors in one or more elements of the X or T matrices, assuming that the executable portion of the program is free of error. Results for the 8th data set, included in the computer output, illustrate these cases. Results for data set 2, shown in the computer output, are those outlined in (1.88) to (1.92). Results for data sets 4 and 5, not shown, are, respectively: ... and ...

The starting polynomial for data sets 6-9 is the power series for cos z, expanded about z₀ = 0, and truncated after the term in z⁸; it has been used as an example in Section 1.12. The results for data set 9, shown in the computer output, correspond to those of (1.82) to (1.87).

The results for data set 7, shown in the computer output, are similar to those for data set 9, except that the interval is 0 ≤ z ≤ π/2. The economized polynomial is the fifth-degree polynomial (1.3.16), with the coefficients C(1)...C(6) given in the computer output. In this case, E_max = 7.298 x 10⁻⁶. The total possible error in the approximation is given by E_max plus the maximum possible error introduced in truncating the power series expansion after the term in z⁸, that is, by

    E ≤ E_max + max over z in [0, π/2] of |z¹⁰/10!|.    (1.3.17)

Thus, the maximum possible magnitude of the error in (1.3.16) for the interval [0, π/2] is approximately 3.3 x 10⁻⁵.    (1.3.18)

By taking advantage of the fact that cos z is periodic with period 2π, that cos(π/2 + α) = −cos(π/2 − α), and that cos(π + β) = cos(π − β), (1.3.16) may, after suitable adjustment of the argument, be used to find the cosine of any angle within the accuracy of (1.3.18). In fact, since cos z = sin(z + π/2), (1.3.16), with an appropriate transformation of variable, could be used to calculate the sine of any angle as well, with the same bound for the error.

Results of the economization process for the 10th data set, not shown, are: ...

Results for the 12th data set (see the computer output) show the first-order minimax polynomial approximation to a second-degree polynomial representation of the molar heat capacity for gaseous nitrogen at low pressures in the temperature range 1000-2000 K. See Problem 1.46 at the end of the chapter for more details.

Double-precision arithmetic has been used for all calculations. In order to generate accurate coefficients for the economized polynomial, particularly when only small errors, E, are allowed, it is important to carry as many digits as possible throughout the calculations; double-precision arithmetic should be used, if available. As written, the program can handle only starting
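The claim about the truncated cosine series can be verified numerically. The Python sketch below (ours; the starting polynomial is assumed, per the data listing, to be the cos z series truncated after the z⁸ term) estimates that series' own maximum error on [0, π/2], which should fall under the bound (π/2)¹⁰/10!, about 2.5 x 10⁻⁵:

```python
import math

# Estimate the maximum error on [0, pi/2] of the cos z power series
# truncated after the z**8 term, by scanning a fine grid of points.
def max_series_error(npts=2001):
    err = 0.0
    for k in range(npts):
        z = (math.pi / 2) * k / (npts - 1)
        s = 1 - z**2/2 + z**4/24 - z**6/720 + z**8/40320
        err = max(err, abs(s - math.cos(z)))
    return err
```

The observed maximum (at z = π/2, where the alternating-series remainder is largest) is indeed below the analytic bound used in (1.3.17).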
polynomials of degree nine or less. The matrices T and X could be expanded to allow higher-degree starting polynomials, although the storage requirements for T and X could become prohibitive for large n, if the subscripting scheme outlined earlier were used. An alternative approach would be to pack the coefficients from Tables 1.12 and 1.13 using a more efficient assignment of memory.

Yet another, more elegant, approach to the economization process is to use the recursion relation of (1.72) to generate the needed coefficients. This avoids the need for saving tabulated information. In addition, since truncation always starts with the highest-order term of (1.3.4), aₙ*xⁿ can be expanded to yield bₙ directly. If the term bₙTₙ can be dropped without exceeding the maximum allowable error, the aᵢ*, i = 0, 1, ..., n − 1, can be modified appropriately, to āᵢ* for example, using the recursion relation. Next, āₙ₋₁*xⁿ⁻¹ can be expanded to yield bₙ₋₁ directly. If bₙ₋₁Tₙ₋₁ can be dropped, the āᵢ*, i = 0, 1, ..., n − 2, can be modified appropriately, again using the recursion relation. This process of expanding only the highest-order untruncated power of x in terms of the Chebyshev polynomials, followed by proper adjustment of the coefficients of lower powers of x, leads directly to the economized polynomial (1.3.8), without ever evaluating b₀, ..., bₘ₋₁. Arden [1] and Hamming [2] suggest some other approaches to the economization process which use only the recursion relation of (1.72).
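A sketch of this recursion-based scheme in Python (our own illustration; cheb_row generates the rows of Table 1.12 on the fly from T_{n+1} = 2xT_n − T_{n−1}, so no tables need be stored):

```python
# Sketch of the "more elegant" scheme: expand only the leading power
# a_n* x^n to get b_n = a_n*/2^(n-1) and, if |b_n| fits the remaining
# error budget, fold -b_n*(lower-order terms of T_n) into the other
# coefficients.  Repeat until a term cannot be dropped.
def economize_by_recursion(a, eps):
    a = list(a)
    budget = eps
    while len(a) > 1:
        n = len(a) - 1
        t = cheb_row(n)              # coefficients of T_n in powers of x
        bn = a[n] / t[n]             # t[n] = 2^(n-1) for n >= 1
        if abs(bn) > budget:
            break
        budget -= abs(bn)
        # a_n x^n - b_n T_n removes the x^n term and perturbs lower ones
        for j in range(n):
            a[j] -= bn * t[j]
        a.pop()
    return a

def cheb_row(n):
    """Coefficients of T_n(x) in ascending powers of x, by recursion."""
    tm, t = [1], [0, 1]              # T_0 and T_1
    if n == 0:
        return tm
    for _ in range(n - 1):
        nxt = [0] + [2 * c for c in t]
        for j, c in enumerate(tm):
            nxt[j] -= c
        tm, t = t, nxt
    return t
```

Applied to (x+2)³, i.e. the coefficients 8, 12, 6, 1, with an error budget of 3.26, this yields 11 and 12.75, the same economized coefficients CSTAR printed for the second data set.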
Problems

1.1 In the expansion e^x ≈ sum_{i=0}^{n} x^i/i!, how large must n be to yield an approximation for e^{3.1} that is accurate within ...?

1.2 For small values of x, the approximations

    e^x ≈ 1 + x,    sin x ≈ x

are sometimes employed. In each case, use the error term from Taylor's expansion to estimate how large a value of x (to the nearest 0.001) may be employed with the assurance that the error in the approximation is smaller than 0.01. Check your conclusions against tables of exponentials and sines.

1.3 Let M be the maximum magnitude of f″(x) on the interval (x₀, x₁). Show that the error for linear interpolation for f(x), using the functional values at x₀ and x₁, is bounded by M(x₁ − x₀)²/8 for x₀ ≤ x ≤ x₁. Does this same error bound apply for linear extrapolation, that is, for x < x₀ or x > x₁?

1.4 Use the algorithm of (1.12) to evaluate ... and each of its derivatives at x = 2.5.

1.5 Write a function, named POLY, that implements the algorithm of (1.12) to evaluate an nth-degree polynomial p_n(x) = sum_{i=0}^{n} a_i x^i, and each of its derivatives of order 1 through n, at x = x̄. The function should have the dummy argument list

    (N, A, XBAR, DVAL)

where N is the degree of the polynomial, n, A is a vector (one-dimensional array) containing the coefficients a₀, a₁, ..., aₙ in elements A(1), A(2), ..., A(N+1), XBAR is the independent variable value, x̄, and DVAL is a vector containing p_n′(x̄), p_n″(x̄), ..., p_n^(n)(x̄) in elements DVAL(1), DVAL(2), ..., DVAL(N) upon return from the function. The value of POLY should be p_n(x̄).

Write a short main program that reads values for n, a₀, a₁, ..., aₙ, and x̄, calls on POLY, prints the values returned for p_n(x̄), p_n′(x̄), ..., p_n^(n)(x̄), and then reads another data set. Test POLY with several different polynomials, including that of Problem 1.4.

1.6 (a) Show that the nth divided difference of y = x^n is unity, no matter which base points x₀, x₁, ..., xₙ are chosen.
(b) Show that the nth divided difference of any polynomial p_n(x) = sum_{i=0}^{n} a_i x^i is given by aₙ, regardless of the choice of base points.

1.7 Investigate the relation, if any, between the number of significant figures to which tabulated values of f(x) vs x are given, and the highest order for which finite divided differences are likely to be meaningful.

1.8 Consider the divided differences in Table P1.8.

Table P1.8 (entries not reproduced)

(a) What is the significance of the zero at the top of the seventh column?
(b) Without renumbering the abscissas, write down the divided-difference polynomial that uses the elements denoted by asterisks.
(c) What is the likely error bound for a fourth-degree interpolation with x₀ < x < x₅?
(d) Find the value of f(x) predicted by the polynomial of (b) for x = 1.0.
(e) Find the value of f(x) predicted by Lagrange's second-degree interpolating polynomial at x = 1.0, using the three base points x₁, x₂, and x₃.

1.9 Form the divided-difference table for the data in Table P1.9.

Table P1.9 (columns x and f(x); entries not reproduced)

(a) Estimate the magnitude of the error terms for linear interpolation using successive ordinates.
(b) Given the additional information that the tabulated function is f(x) = cos(ln x) + x + 2, show that the maximum error for any interpolant value approximating f(x) using linear interpolation with successive base points is smaller than 0.00004.

1.10 Form the finite-difference table for the data of Table P1.10.
(a) Write Newton's forward formula (1.54) for the interpolating polynomial that reproduces exactly the first four functional values in Table P1.10.
(b) Write Newton's backward formula (1.61) that uses the last three entries in the table.
(c) Write the Gauss forward formula (1.65) with base points x₋₁ = 2.5, x₀ = 3.0, x₁ = 3.5, x₂ = 4.0.
(d) Estimate the error when the polynomial of part (a) is used for interpolation (or extrapolation) with arguments x = 1.75, 2.3, and 4.0.
Table P1.10 (entries not reproduced)

(e) Given the information that the tabulated function of Table P1.10 is f(x) = 6x + 2 cos x² − 4x² sin x² − 5x⁻¹, find the error bound for quadratic interpolation (interpolating polynomials of degree 2) using three successive base points xᵢ, xᵢ₊₁, xᵢ₊₂, with xᵢ ≤ x ≤ xᵢ₊₂. Does the bound change appreciably over the range of the table?

1.11 Let yᵢ = f(xᵢ), i = 0, 1, ..., n. Show that Newton's fundamental formula (1.30) for the polynomial of degree one passing through the points (xᵢ, yᵢ) and (xₖ, yₖ) may be written in the form ... where the vertical bars indicate the determinant (see Chapter 4). Let y₀,₁,...,ₘ(x) be the interpolating polynomial of degree m passing through the points p₀, p₁, ..., pₘ. Show that ...

Note that each of the second-degree interpolating polynomials can be described in terms of linear interpolation on two linear interpolating polynomials, and that the third-degree interpolating polynomial can be described in terms of linear interpolation on two second-degree interpolating polynomials.

1.12 (a) Using the definitions of Problem 1.11, consider Table P1.12 (entries not reproduced). Show that if each of the quantities (xᵢ − x), i = 0, 1, ..., n, is computed first, and that if entries in the table following the first two columns are calculated one column at a time, then each entry may be evaluated using just two multiplications, two subtractions, and two divisions.
(b) Generate a table comparable to that of part (a) for the functional values of Tables 1.2 and 1.4 with interpolant value x = 0.5, and compare with the results for various degrees of interpolation shown in Section 1.6.

1.13 The interpolation method suggested in Problems 1.11 and 1.12 is known as iterated linear interpolation or Aitken's method. Implement the method by writing a subroutine, named AITKEN, that could be called from another program with the statement

    CALL AITKEN (X, Y, NMAX, N, ATABLE, XARG)

where the data points (xᵢ, yᵢ) are available in the arrays X and Y with subscripts 1, 2, ..., NMAX, and the X array is arranged in ascending sequence. The subroutine should scan the X array to find the element nearest the argument, x = XARG, and designate it as x₀ for Aitken's interpolation. AITKEN should then develop the appropriate entries for columns 3 through n + 2 of Aitken's table in columns 1 through N of the matrix ATABLE, and return to the calling program.

Write a short main program that reads values for NMAX, X(1), ..., X(NMAX), and Y(1), ..., Y(NMAX) just once, and then repeatedly reads values for N and XARG, calls upon AITKEN to compute the elements of ATABLE, and prints the matrix ATABLE.

1.14 When f(x) is a single-valued function of x, and a value of x is required for which the dependent variable f(x) assumes a specified value, the roles of independent and dependent variable may be interchanged, and any of the appropriate interpolation formulas may be used. The process is usually termed inverse interpolation.

Choose the thermocouple emf and temperature data from Table 1.2.1 corresponding to x₀ = 0, x₁ = 500, x₂ = 1000, etc., and use inverse interpolation to find approximations of the emf corresponding to temperatures of 122.4, 118.4, 121.932, 447.6, and 447.02 °F for interpolation of degrees 1, 2, and 3. Compare your results with Table 1.2.1 and the computer output for Example 1.2.

1.15 Suppose that ȳ is the value of the nth-degree interpolating polynomial passing through the points (xₖ, yₖ), (xₖ₊₁, yₖ₊₁), ..., (xₖ₊ₙ, yₖ₊ₙ), evaluated at the argument x̄, where yᵢ = f(xᵢ) and f(x) is a single-valued function of x. Carry out an inverse interpolation using the same points with ȳ as the argument, and let the interpolated result be denoted by x*. Comment on the following statements:
(a) x* will be identical to x̄.
(b) The smaller the value of |x* − x̄|, the better ȳ will approximate f(x̄).

1.16 Show that Lagrange's interpolating polynomial (1.43) can be written in the form

    y_n(x) = sum_{i=0}^{n} pi(x) y_i / [(x − x_i) pi′(x_i)],

where

    pi(x) = product_{i=0}^{n} (x − x_i)   and   pi′(x_i) = d pi(x)/dx evaluated at x = x_i.
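For readers wanting to see the mechanics behind Problems 1.11-1.13, here is a minimal sketch of Aitken's iterated linear interpolation in Python (our own illustration, not the FORTRAN subroutine the problem asks for):

```python
# Aitken's iterated linear interpolation: each new column combines two
# entries of the previous column by the linear-interpolation determinant
# P_{i..i+k}(x) = [(x - x_{i+k})*P_{i..i+k-1} - (x - x_i)*P_{i+1..i+k}]
#                / (x_i - x_{i+k}),
# and the surviving entry is the full nth-degree interpolated value.
def aitken(x, y, xarg):
    n = len(x)
    p = list(y)                       # current column of interpolants
    for col in range(1, n):
        for i in range(n - col):
            p[i] = ((xarg - x[i + col]) * p[i] - (xarg - x[i]) * p[i + 1]) \
                   / (x[i] - x[i + col])
    return p[0]
```

For example, with the quadratic data (0,1), (1,3), (2,7) (values of x² + x + 1), interpolating at x = 1.5 returns 4.75, the exact value.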
1.17 In Example 1.1, how would you modify the main program, the subroutine DTABLE, and the function PNEWT, so that the matrix TABLE could be eliminated and replaced by a vector (one-dimensional array) TABLE?

Hint: Only the elements on the appropriate diagonal of TABLE are actually used in the evaluation of Newton's fundamental formula (1.16).

1.18 How could you modify the function PLAGR of Example 1.2 so that the function would choose the appropriate base points, and the argument MIN would be unnecessary?

1.19 The values shown in Table P1.19 are available [24] for the thermal conductivity, k (BTU/hr ft °F), of carbon dioxide gas and for the viscosity, μ (lb/ft hr), of liquid ethylene glycol, at various temperatures T (°F).

Table P1.19 (entries not reproduced)

In each case, determine the simplest interpolating polynomial that is likely to predict k and μ within 1% over the specified ranges of temperature. These polynomials will be needed in Problem 2.15.

Hint: ln μ is more nearly a simple function of T than is μ itself.

1.20 Suppose we wish to prepare a table of functional values of e^x sin x for subsequent quadratic interpolation using Newton's forward formula (1.54) on the interval 0 ≤ x ≤ 2. What base-point spacing should be used to insure that interpolation will be accurate to four decimal places for any argument in the indicated range?

1.21 In a table of values of f(x) vs x, a few entries may be in error. By forming a divided-difference table, investigate the feasibility of locating the errors and, if possible, correcting them. Assume that f(x) is a smooth function of x. Before checking against published values, attempt to locate the errors in Table P1.21, allegedly of sin x vs x.

Table P1.21 (columns x and sin x; entries not reproduced)

1.22 If the method of Problem 1.21 appears feasible and can be applied generally, write a subroutine, named CHECK, that will implement it. Also write a short main program to handle input and output and call on the subroutine as follows:

    CALL CHECK (X, F, N, ANSWER, TRUEF)

Here, X and F are the vectors that contain the tabulated values of x and f(x), and N is the subscript corresponding to the last entry in the table. ANSWER is a logical vector; according to whether an entry F(I) is found to be in error or not, the subroutine should store the logical value .FALSE. or .TRUE. in the corresponding element ANSWER(I). The vector TRUEF should ultimately contain the corrected table of values for f(x). Since it is generally impossible to distinguish between an error in x and one in f(x), attempt only to correct the latter if a discrepancy is found.

1.23 Write a function, named STIRL, that uses Stirling's formula, equation (1.67), to implement an mth-degree central-difference interpolation on the data values (xᵢ, f(xᵢ)), i = 1, 2, ..., n. Assume that the xᵢ are evenly spaced and are arranged in ascending order. The calling sequence should be of the form

    FVAL = STIRL (X, F, M, N, TEMP, XVAL, NOGOOD)

where X and F are vectors containing the data values, M and N have obvious meanings, and XVAL is the interpolation argument. TEMP is a vector, dimensioned in the calling program to have maximum subscript 2 x N, that will be used for working storage. NOGOOD is an integer, returned as 0 in case of computational difficulties (see below), and as 1 otherwise.

The routine should center the central-difference paths across the table as closely as possible about the interpolation argument. Only those differences that are actually needed for the evaluation should be computed, and no working vectors other than TEMP should be used in the function. The interpolated value should be returned as the value of the function. The function should check for any argument inconsistencies and should forbid extrapolation beyond the limits of the table.

To test the function STIRL, also write a main program that reads data sets of the form: N, X(1)...X(N), F(1)...F(N), M, XVAL.

Suggested Test Data

(a) The values shown in Table P1.23a are available [24] for the density of water, ρ (g/ml), at various temperatures T (°C).

Table P1.23a (entries not reproduced)

Interpolate using several different values for T and m (degree of the polynomial).
(b) The values shown in Table P1.23b are available [20] for the refractive index, n, of aqueous sucrose solutions at 20°C, containing various percentages, P, of water.

Table P1.23b (entries not reproduced)

Interpolate using several different values for P and m.

1.24 Suppose we wish to approximate a continuous and differentiable function f(x) on the interval [a,b] in a piecewise fashion, using low-degree interpolating polynomials over nonoverlapping subintervals of [a,b]. Let the base points be a = x₀ < x₁ < ... < xₙ₋₁ < xₙ = b, the corresponding functional values be yᵢ = f(xᵢ), i = 0, 1, ..., n, and the approximating function for [a,b] be S(ȳ, x), where x̄ = [x₀, x₁, ..., xₙ]ᵀ and ȳ = [y₀, y₁, ..., yₙ]ᵀ. We shall require that S(ȳ, x) be continuous on [a,b], possess continuous first and second derivatives for all x in [a,b], and satisfy the n + 1 conditions

    S(ȳ, xᵢ) = yᵢ,  i = 0, 1, ..., n.

Let S(ȳ, x) coincide with a third-degree polynomial p₃,ᵢ(x) on each subinterval [xᵢ, xᵢ₊₁], i = 0, 1, ..., n − 1. Then each of the n functions p₃,ᵢ(x), i = 0, 1, ..., n − 1, can be written as ...

Next, show that the continuity conditions for the derivative of S(ȳ, x), that is,

    p₃,ᵢ′(xᵢ) = p₃,ᵢ₋₁′(xᵢ),  i = 1, 2, ..., n − 1,

lead to a system of n − 1 linear difference equations in the n + 1 unknown derivatives S′(ȳ, xᵢ), i = 0, 1, ..., n. If two additional conditions are specified, for example, S′(ȳ, x₀) and S′(ȳ, xₙ), then the simultaneous linear equations can be solved for the p₃,ᵢ′(xᵢ), i = 1, 2, ..., n − 1, using methods described in Chapter 5, and the necessary interpolating functions p₃,ᵢ(x), i = 0, 1, ..., n − 1, are then known.

1.25 The approximating function S(ȳ, x) of Problem 1.24 is known as the cubic spline function [22]. Show that if p₃,₀″(x₀) = 0 and if p₃,ₙ₋₁″(xₙ) = 0, then the generated spline function possesses the property of least mean-squared curvature; that is, for all other twice-differentiable interpolating functions g(x), the property

    integral from a to b of [S″(ȳ, x)]² dx ≤ integral from a to b of [g″(x)]² dx

holds. Thus, in the sense of the integral, the cubic spline may be viewed as the "smoothest" of all possible interpolating functions.

1.26 Show that the linear difference equations of Problem 1.24 may be written in the form
of the n functions p;,i(x), i = 0, 1, ...,n - 1 , can be written as
a linear combination of p;.,(xi) and p;.t(xr+l), since the
p;.i(x) are linear functions:

when the base points x i , i = 0, 1, . . ., n, are equally spaced


with spacing h.
1.27 Write a function, named SPLINE, that evaluates the
cubic spline function of Problem 1.24. The dummy argument
list should be

Here, hi = x i + , - x i . (N, X , Y, XARG)


Show that integrating the functions p;,,(x) twice and
imposing the conditions where the N + 1 base points for the spline fit are avaiIable in
X(1), ..., X(N + 1 ), the corresponding functional values are
available in Y(1), .. . ,Y(N + I ) , and XARG is the interpolation
*
argument. The value of Sji(y,XARG) should be returned as the
value of SPLINE. Test the function with data from Table P1.19.
leads to the following interpcllating functions: 1.28 This problem deals with two-dimensional interpola-
tion, in which we consider the approximation of a function
f = f(x,y). Suppose that a total of mn functional values
f(xi,y,) are available, for all possible combinations of m
levels of x , ( i = 1,2, . . .,m), and n levels of y , ( j = 1,2,. . .,n).
For convenience, arrange these values f,, = f (x,,y,) in a two-
dimensional table, so that row i corresponds to x = X I ,and
column j corresponds t o y = y,. Then, given arbitrary x and y,
the problem is to interpolate in the table to find an approxi-
mation to f(x,y).
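The linear system of Problems 1.24 and 1.26 is tridiagonal, which is what makes the spline fit cheap. Below is a minimal Python sketch (not the book's FORTRAN SPLINE routine; the name `natural_spline_second_derivs` is illustrative) that solves the equally spaced difference equations of Problem 1.26 with the natural end conditions S''(x_0) = S''(x_n) = 0 of Problem 1.25, using the Thomas algorithm for tridiagonal systems.

```python
def natural_spline_second_derivs(y, h):
    """Solve the equally spaced spline difference equations of Problem 1.26,
        S''(x_{i-1}) + 4 S''(x_i) + S''(x_{i+1}) = (6/h^2)(y_{i-1} - 2 y_i + y_{i+1}),
    i = 1, ..., n-1, with the natural end conditions S''(x_0) = S''(x_n) = 0
    of Problem 1.25.  Returns [S''(x_0), ..., S''(x_n)]."""
    n = len(y) - 1
    d = [6.0 * (y[i - 1] - 2.0 * y[i] + y[i + 1]) / h ** 2 for i in range(1, n)]
    # Thomas algorithm for the tridiagonal system (sub/super-diagonals 1, diagonal 4).
    cp, dp = [0.0] * len(d), [0.0] * len(d)
    for i in range(len(d)):
        m = 4.0 - (cp[i - 1] if i else 0.0)   # eliminated diagonal
        cp[i] = 1.0 / m
        dp[i] = (d[i] - (dp[i - 1] if i else 0.0)) / m
    s = [0.0] * (n + 1)                        # natural conditions: s[0] = s[n] = 0
    for i in range(len(d) - 1, -1, -1):        # back substitution
        s[i + 1] = dp[i] - cp[i] * s[i + 2]
    return s
```

Once the S''(x_i) are known, the cubic pieces p_{3,i}(x) of Problem 1.24 follow directly. As quick checks, data from a straight line give identically zero second derivatives, and the symmetric data (0, 1, 0) with h = 1 reduce the system to the single equation 4 S''(x_1) = -12.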
64 Interpolation and Approximation

Here, we consider linear interpolation. Let x_I ≤ x ≤ x_{I+1} and y_J ≤ y ≤ y_{J+1}, as shown in Fig. P1.28. The symbol • indicates points at which functional values are available.

Figure P1.28  (grid of base points about the interpolation point (x,y); the intermediate points A and B lie on the lines y = y_J and y = y_{J+1})

First, interpolate linearly through f_{I,J} and f_{I+1,J}, and through f_{I,J+1} and f_{I+1,J+1}, to obtain approximations f_A and f_B to f(x,y) at the points A and B. Then interpolate linearly through f_A and f_B to obtain the final approximation to f(x,y). If α = (x - x_I)/(x_{I+1} - x_I) and β = (y - y_J)/(y_{J+1} - y_J), show that the result is

    f(x,y) ≈ (1 - α)(1 - β) f_{I,J} + α(1 - β) f_{I+1,J} + (1 - α)β f_{I,J+1} + αβ f_{I+1,J+1}.

What would be the corresponding formula if the first interpolation were in the y direction, to give f_C and f_D, followed by interpolation in the x direction? Give a simple graphical interpretation, based on relative areas, to the weight factors assigned in the above formula to f_{I,J}, f_{I,J+1}, f_{I+1,J}, and f_{I+1,J+1}.

1.29 Write a function, named LIN2D, that will perform the two-dimensional linear interpolation discussed in Problem 1.28. A typical call will be

    FXY = LIN2D (F, X, Y, M, N, XVAL, YVAL)

Here, F, X, and Y are the arrays that have been preset with the tabulated values f_ij, x_i, and y_j, M and N have obvious meanings, and XVAL and YVAL correspond to the values x and y for which f(x,y) is to be estimated. If x and/or y lie outside the range of the table, the function should print a message to this effect and return the value zero.
Also, write a main program that will read values for M, N, the matrix F, the vectors X and Y, and XVAL and YVAL. The main program should call on LIN2D as indicated above, then print the interpolated value FXY, and finally return to read additional pairs of values for XVAL and YVAL.

Suggested Test Data
(a) The values shown in Table P1.29a are given by Perry [24] for the specific volume, v (cu ft/lb), of superheated methane, at various temperatures and pressures. Estimate the specific volume of methane at (56.4°F, 12.7 psia), (56.4, 22.7), (56.4, 100), (411.2, 12.7), (411.2, 30.1), (-200, 10), and (0, 84.3). To test LIN2D completely, also try a few temperatures and pressures beyond the scope of the table.

Table P1.29a  Specific Volume of Superheated Methane, v (cu ft/lb)

    Temp.                    Pressure, psia
    °F         10      20      30      40      60      80     100
    -200    17.15    8.47    5.57    4.12   2.678   1.954   1.518
    -100    23.97   11.94    7.91    5.91    3.91   2.903   2.301
       0    30.72   15.32   10.19    7.63    5.06    3.78   3.014
     100    37.44   18.70   12.44    9.33    6.21    4.65    3.71
     200    44.13   22.07   14.70   11.03    7.34    5.50    4.40
     300    50.83   25.42   16.94   12.71    8.46    6.35    5.07
     400    57.51   28.76   19.17   14.38    9.58    7.19    5.75
     500    64.20   32.10   21.40   16.05   10.70    8.03    6.42

(b) The values shown in Table P1.29b are given by Perry [24] for the total vapor pressures, p (psia), of aqueous solutions of ammonia, at various temperatures and molal concentrations of ammonia. Estimate the total vapor pressure at (126.5°F, 28.8 mole %), (126.5, 6.7), (126.5, 25.0), (60, 0), (237.5, 17.6), and (237.5, 35.0).

Table P1.29b  Total Vapor Pressure of Aqueous Ammonia Solutions, p (psia)

    Temp.      Percentage Molal Concentration of Ammonia
    °F         0      10      20      25      30      35
     60     0.26    1.42    3.51    5.55    8.65   13.22
     80     0.51    2.43    5.85    9.06   13.86   20.61
    100     0.95    4.05    9.34   14.22   21.32   31.16
    140     2.89    9.98   21.49   31.54   45.73   64.78
    180     7.51   21.65   44.02   62.68   88.17  121.68
    220    17.19   42.47   81.91  113.81  156.41  211.24
    250    29.83   66.67  124.08  169.48  229.62  305.60

1.30 Extend the scheme outlined in Problem 1.28 to two-dimensional cubic interpolation. The known functional values that will be involved are denoted by the symbol • in Fig. P1.30. The successive values x_i and y_j need not be equally spaced.

Figure P1.30  (4 × 4 array of base points about the interpolation point)

First, interpolate four times in the x direction to give four third-degree polynomials that will yield estimates for f_A, f_B, f_C, and f_D. Then interpolate these four values in the y direction for the final approximation to f(x,y).
Establish a computational algorithm for the above procedure. Would the final result be different if the interpolation proceeded in the y direction first?
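The two-pass construction of Problem 1.28 collapses into the single weighted sum derived there. A minimal Python sketch follows (the book's LIN2D of Problem 1.29 is FORTRAN; the name `lin2d` here is illustrative, and range checking, which Problem 1.29 requires, is omitted).

```python
def lin2d(f, x, y, xval, yval):
    """Two-dimensional linear interpolation of Problem 1.28.
    f[i][j] = f(x[i], y[j]); x and y must be ascending, and (xval, yval)
    is assumed to lie inside the table (the real LIN2D of Problem 1.29
    must trap out-of-range arguments instead)."""
    i = max(k for k in range(len(x) - 1) if x[k] <= xval)   # bracketing cell in x
    j = max(k for k in range(len(y) - 1) if y[k] <= yval)   # bracketing cell in y
    a = (xval - x[i]) / (x[i + 1] - x[i])                   # alpha of Problem 1.28
    b = (yval - y[j]) / (y[j + 1] - y[j])                   # beta
    return ((1 - a) * (1 - b) * f[i][j] + a * (1 - b) * f[i + 1][j]
            + (1 - a) * b * f[i][j + 1] + a * b * f[i + 1][j + 1])
```

Because the four weights sum to one and are linear in each direction, the formula reproduces any function of the form c_0 + c_1 x + c_2 y + c_3 xy exactly, which makes a convenient check before trying the physical tables P1.29a and P1.29b.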

1.31 Write a function, named CUB2D, that will perform the two-dimensional cubic interpolation discussed in Problem 1.30. A typical call will be

    FXY = CUB2D (F, X, Y, M, N, XVAL, YVAL)

in which the arguments have the same meanings as in Problem 1.29. Test the function with a main program and data similar to those used in Problem 1.29.

1.32 Following in the same vein as Problems 1.28 and 1.30, investigate the following alternative procedure for two-dimensional approximation. Referring to Fig. P1.32, consider the nine points within which the point of approximation (x,y) is most nearly centered.

Figure P1.32  (3 × 3 array of base points centered about the point (x,y))

Taylor's expansion in two dimensions (see, for example, equation (7.1)) gives

    f(x,y) ≈ f(x_i,y_j) + (x - x_i) f_x + (y - y_j) f_y
           + [(x - x_i)^2/2] f_xx + (x - x_i)(y - y_j) f_xy + [(y - y_j)^2/2] f_yy,

in which the derivatives f_x (= ∂f/∂x), f_y, etc., are to be evaluated at the point (x_i,y_j). These five partial derivatives may be estimated from the nine functional values by using equations (7.6), (7.7), and (7.8), if the x_i and y_j are spaced uniformly, or by somewhat different forms if they are spaced unevenly.
Develop a computational algorithm for this type of approximation. Does the final approximation value differ from that predicted by a quadratic-type interpolation similar to the linear and cubic interpolation procedures discussed in Problems 1.28 and 1.30?

1.33 Write a function, named TAYLOR, that implements the two-dimensional approximation procedure developed in Problem 1.32. A typical call will be

    FXY = TAYLOR (F, X, Y, M, N, XVAL, YVAL)

in which the arguments have the same meanings as in Problem 1.29. Test the function with a main program and data similar to those used in Problem 1.29.

1.34 Following lines similar to those of Problems 1.28 through 1.31, develop an algorithm for interpolating in three dimensions at the point (x,y,z). Write a function, such as LIN3D (if linear interpolation is used), that implements the procedure, for which a typical call will be

    ANS = LIN3D (F, X, Y, Z, M, N, P, XVAL, YVAL, ZVAL)

The arguments are the same as before, with the addition of Z, the vector containing the several levels z_k, P, the number of such levels, and ZVAL, the particular value of z.

1.35 Suppose that (m + 1)(n + 1) functional values f(x_i,y_j) are available for all combinations of m + 1 levels of x_i, i = 0, 1, ..., m, and n + 1 levels of y_j, j = 0, 1, ..., n. Define Lagrangian interpolation coefficients comparable to (1.44), X_i(x) formed from the x_i and Y_j(y) formed from the y_j. Show that

    p_{m,n}(x,y) = Σ_{i=0}^{m} Σ_{j=0}^{n} X_i(x) Y_j(y) f(x_i,y_j)

is a two-dimensional polynomial of degree m in x and degree n in y of the form

    p_{m,n}(x,y) = Σ_{i=0}^{m} Σ_{j=0}^{n} a_ij x^i y^j,

and satisfies the (m + 1)(n + 1) conditions

    p_{m,n}(x_i,y_j) = f(x_i,y_j),   i = 0, 1, ..., m,  j = 0, 1, ..., n,

and therefore that p_{m,n}(x,y) may be viewed as a two-dimensional interpolating polynomial passing through the (m + 1)(n + 1) points (x_i, y_j, f(x_i,y_j)), i = 0, 1, ..., m, j = 0, 1, ..., n.

1.36 Use the two-dimensional interpolating polynomial of Problem 1.35 with n = m = 1 and n = m = 2 to find estimates of the specific volume of methane at (100°F, 40 psia) using the information from Table P1.29a at temperatures of -100, 0, 200, and 300°F and pressures of 20, 30, 60, and 80 psia. Compare the results with the reported value of 9.33 cu ft/lb.

1.37 Find a three-dimensional interpolating polynomial of degree m in x, n in y, and q in z of the form

    p_{m,n,q}(x,y,z) = Σ_{i=0}^{m} Σ_{j=0}^{n} Σ_{k=0}^{q} a_ijk x^i y^j z^k

that satisfies the (m + 1)(n + 1)(q + 1) conditions

    p_{m,n,q}(x_i,y_j,z_k) = f(x_i,y_j,z_k),   for all i, j, k.

Develop a comparable interpolating polynomial for any number of independent variables.

1.38 Show that the remainder term for the two-dimensional interpolating polynomial of Problem 1.35, analogous to (1.39b) for the one-dimensional case, is given by

    R_{m,n}(x,y) = [Π_{i=0}^{m} (x - x_i)]/(m + 1)! · ∂^{m+1} f(ξ,y)/∂x^{m+1}
                 + [Π_{j=0}^{n} (y - y_j)]/(n + 1)! · ∂^{n+1} f(x,η)/∂y^{n+1}
                 - [Π_{i=0}^{m} (x - x_i)][Π_{j=0}^{n} (y - y_j)]/[(m + 1)!(n + 1)!] · ∂^{m+n+2} f(ξ',η')/∂x^{m+1}∂y^{n+1},

where ξ and ξ' are in (x, x_0, x_1, ..., x_m) and η and η' are in (y, y_0, y_1, ..., y_n).
66 InterpoIation and Approximation

1.39 Write a function, named POLY2D, that evaluates the two-dimensional interpolating polynomial p_{m,n}(x,y) of Problem 1.35 for arguments x = x̄ and y = ȳ. The function should have the dummy argument list

    (IMAX, JMAX, X, Y, F, M, N, XBAR, YBAR)

Here, F is a matrix having IMAX rows and JMAX columns of functional values f(x_i,y_j), X is a vector containing the corresponding values of x_i in ascending sequence in X(1), ..., X(IMAX), and Y is a vector containing the y_j in ascending sequence in Y(1), ..., Y(JMAX); F is arranged so that F(I,J) is the functional value corresponding to X(I) and Y(J). The function should scan the vectors X and Y to determine which (M + 1)(N + 1) of the functional values F(I,J) should be selected to make the point (XBAR, YBAR) (that is, the point (x̄,ȳ)) as nearly central as possible to the base points for evaluating the interpolating polynomial p_{M,N}(x̄,ȳ) of degree M in x and of degree N in y. The value p_{M,N}(x̄,ȳ) should be returned as the value of POLY2D. Should XBAR or YBAR fall outside the range of base points, the interpolating polynomial should be used for extrapolation.
Write a short main program that reads values for IMAX, JMAX, X(1), ..., X(IMAX), Y(1), ..., Y(JMAX), F(1,1), ..., F(IMAX,JMAX) once, then reads values for M, N, XBAR, and YBAR, calls on POLY2D to evaluate the appropriate interpolating polynomial, prints the result, and returns to read another set of values for M, N, XBAR, and YBAR. Test the function with the data of Problem 1.29 using different combinations of M and N. Compare the interpolated results with those found using the functions LIN2D and CUB2D of Problems 1.29 and 1.31, respectively.

1.40 Using the properties of the cubic spline function discussed in Problem 1.24, develop a two-dimensional doubly-cubic spline function (see [22]) for interpolation in rectangular regions with base points and functional values arranged as described in Problem 1.35.

1.41 Write a function, named SPLIN2, with dummy argument list (IMAX, JMAX, X, Y, F, XBAR, YBAR) that evaluates the doubly-cubic spline function developed in Problem 1.40. The arguments have the meanings of like arguments for the function POLY2D described in Problem 1.39. Write a short main program similar to that outlined in Problem 1.39, and test the function SPLIN2 with the data of Problem 1.29.

1.42 The characteristics in Table P1.42 are available for the 6J5 triode electronic vacuum tube [23] (v_a = anode voltage, v_g = grid voltage, i_a = anode current, mA). Write a function with three different entries, such that given any two of v_g, v_a, and i_a, the function will compute the third, using an appropriate two-dimensional interpolation procedure. Let the three function names be VGRID, VANODE, IANODE (real). If VG, VA, and IA (real) are program variables equivalent to v_g, v_a, and i_a, respectively, then typical statements referencing the function might be

    VG = VGRID (VA, IA)
    VA = VANODE (VG, IA)
    IA = IANODE (VA, VG)

Test the function thoroughly with a short calling program. The functions can be used in Problem 2.48 and Problem 3.20.

Table P1.42  (6J5 triode characteristics; the tabulated entries are not recoverable in this copy)

1.43 Suppose that you wish to minimize the maximum magnitude of the error-term factor

    (x - x_0)(x - x_1) ... (x - x_n)

in the Lagrange or divided-difference interpolation formulas, to be used over the interval [a,b]. You are free to choose the base points x_0, x_1, ..., x_n. If a = 2, b = 6, and n = 3, what values would you select for x_0, x_1, x_2, and x_3?

1.44 Find a near-minimax quadratic, and the minimax cubic, polynomial approximations to x^4 on the interval [2,8].

1.45 Use the Chebyshev economization procedure to yield an approximation of the form [equation not recoverable in this copy] which is in error by 0.0095, at most, on the interval -1 ≤ x ≤ 1.

1.46 Hougen and Watson [7] suggest that the following empirical equation describes the molal heat capacity of nitrogen accurately (within 1.2%) between 300 and 2100°K: [equation not recoverable in this copy]. Here, c_p has units of cal/g mole °K and T is in °K (= °C + 273.15).
(a) Find the linear approximation to c_p that minimizes the maximum additional error in c_p between 1000 and 2000°K.
(b) What is the upper limit on the total percentage error for the approximation of part (a)?

1.47 The coefficient of expansion, k, for aluminum between 0 and 100°C is given by [equation not recoverable in this copy]
where T is in °C and the reference temperature is 0°C (that is, k is zero at 0°C).
(a) Rewrite the expression for k in terms of the Chebyshev polynomials of Table 1.12, and truncate to the constant term.
(b) Calculate the integrated average value of k between 0 and 100°C, and the arithmetic average of the values of k(0) and k(100), and compare with the results of part (a).

1.48 Write a function, named CHEB, that implements the Chebyshev economization procedure outlined in Section 1.12, by making direct use of the recursion relation (1.72). The routine should not require tabular information such as that of Tables 1.3.1 and 1.3.2. The function should have the arguments

    (N, A, AL, AR, EPS, M, C)

where all variables have the meanings ascribed to the program variables of Example 1.3. The value of CHEB should be the maximum possible error introduced by the economization process (comparable to program variable E of Example 1.3).

1.49 Show that equation (1.3.13) follows from (1.3.10), (1.3.11), and (1.3.12).

1.50 Show that the coefficients a_i in (1.80) and (1.81) are given by

    a_0 = (1/π) ∫_{-1}^{1} f(x)/√(1 - x^2) dx = (1/π) ∫_0^π f(cos θ) dθ,

    a_i = (2/π) ∫_{-1}^{1} f(x) T_i(x)/√(1 - x^2) dx = (2/π) ∫_0^π f(cos θ) cos iθ dθ,   i ≥ 1.

Using these definitions, find an approximation of the form of (1.81) for n = 4 and f(x) = cos x, -1 ≤ x ≤ 1. Compare the result with the economized power-series approximation of (1.87).

1.51 Snyder [18] shows that when the integrals of Problem 1.50 are difficult to evaluate, they may be approximated by

    a_0 ≈ [1/(n + 1)] Σ_{j=0}^{n} f(x_j),
    a_i ≈ [2/(n + 1)] Σ_{j=0}^{n} f(x_j) T_i(x_j),   i ≥ 1,

where the x_j, j = 0, 1, ..., n, are the n + 1 roots of T_{n+1}(x) (see (1.77)). Use these relationships to estimate the coefficients a_i, i = 0, 1, ..., 4, for the function f(x) = cos x, -1 ≤ x ≤ 1, and compare with the results of Problem 1.50 and with the economized power-series approximation of (1.87).
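The discrete sums of Problem 1.51 are easy to evaluate because T_i(cos θ) = cos iθ: sampling at the roots of T_{n+1} reduces each coefficient to a short cosine sum. A minimal Python sketch (illustrative only; the function name `chebyshev_coeffs` is not from the book) under the standard root formula x_j = cos[(2j + 1)π/(2n + 2)]:

```python
import math

def chebyshev_coeffs(f, n):
    """Estimate the Chebyshev-series coefficients a_0, ..., a_n of Problem 1.50
    by the discrete sums of Problem 1.51, taken over the n+1 roots
    x_j = cos(theta_j), theta_j = (2j + 1) pi / (2n + 2), of T_{n+1}(x).
    Since T_i(cos theta) = cos(i theta), no explicit T_i evaluation is needed."""
    thetas = [(2 * j + 1) * math.pi / (2 * (n + 1)) for j in range(n + 1)]
    coeffs = []
    for i in range(n + 1):
        scale = (1.0 if i == 0 else 2.0) / (n + 1)
        coeffs.append(scale * sum(f(math.cos(t)) * math.cos(i * t) for t in thetas))
    return coeffs
```

A convenient check is f(x) = x^2, whose exact expansion is x^2 = [T_0(x) + T_2(x)]/2: with n = 4 the discrete sums reproduce a_0 = a_2 = 1/2 and a_1 = a_3 = a_4 = 0 to rounding error, since the sums are exact for polynomials of such low degree.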

Bibliography

1. B. W. Arden, An Introduction to Digital Computing, Addison-Wesley, Reading, Massachusetts, 1963.
2. R. W. Hamming, Numerical Methods for Scientists and Engineers, McGraw-Hill, New York, 1962.
3. F. B. Hildebrand, Introduction to Numerical Analysis, McGraw-Hill, New York, 1956.
4. D. E. Knuth, "Evaluation of Polynomials by Computer," Communications of the A.C.M., 5, 595-599 (1962).
5. A. M. Ostrowski, Studies in Mathematics and Mechanics Presented to R. von Mises, pp. 40-48, Academic Press, New York, 1954.
6. J. Todd, A Survey of Numerical Analysis, McGraw-Hill, New York, 1962.
7. O. A. Hougen and K. M. Watson, Chemical Process Principles, Part One, Material and Energy Balances, Wiley, New York, 1953.
8. W. Kaplan, Advanced Calculus, Addison-Wesley, Cambridge, Massachusetts, 1953.
9. D. D. McCracken and W. S. Dorn, Numerical Methods and FORTRAN Programming, Wiley, New York, 1964.
10. E. I. Organick, A FORTRAN IV Primer, Addison-Wesley, Reading, Massachusetts, 1966.
11. D. D. McCracken, A Guide to FORTRAN IV Programming, Wiley, New York, 1965.
12. P. H. Henrici, Elements of Numerical Analysis, Wiley, New York, 1964.
13. A. Ralston, A First Course in Numerical Analysis, McGraw-Hill, New York, 1965.
14. E. Isaacson and H. B. Keller, Analysis of Numerical Methods, Wiley, New York, 1966.
15. E. Remes, "Sur le calcul effectif des polynomes d'approximation de Tchebichef," C.R. Acad. Sci. Paris, 199, 337-340 (1934).
16. F. D. Murnaghan and J. W. Wrench, Jr., "The Determination of the Chebyshev Approximating Polynomial for a Differentiable Function," Math. Tables Aids Comput., 13, 185-193 (1959).
17. C. W. Clenshaw, "Chebyshev Series for Mathematical Functions," Math. Tables, Vol. 5, Nat. Phys. Lab., G. Britain, 1962.
18. M. A. Snyder, Chebyshev Methods in Numerical Approximation, Prentice-Hall, Englewood Cliffs, New Jersey, 1966.
19. C. T. Fike, "Methods of Evaluating Polynomial Approximations in Function Evaluation Routines," Communications of the A.C.M., 10, 175-178 (1967).
20. International Critical Tables, Vol. II, p. 337, McGraw-Hill, New York, 1927.
21. C. D. Hodgman, ed., Handbook of Chemistry and Physics, 33rd ed., Chemical Rubber, Cleveland, Ohio, 1951.
22. J. H. Ahlberg, E. N. Nilson, and J. L. Walsh, The Theory of Splines and their Applications, Academic Press, New York, 1967.
23. R. J. Smith, Circuits, Devices, and Systems, Wiley, New York, 1966.
24. J. H. Perry, ed., Chemical Engineers' Handbook, 3rd ed., McGraw-Hill, New York, 1950.
CHAPTER 2

Numerical Integration

2.1 Introduction

The evaluation of a definite integral

    I = ∫_a^b f(x) dx    (2.1)

by formal methods is often difficult or impossible, even when f(x) is of a relatively simple analytical form. For these intractable cases, and for the more general integration problem in which only a few sample values of f(x) are available at distinct base-point arguments x_i, i = 0, 1, ..., n, some other approach is necessary. An obvious alternative is to find a function g(x) that is both a suitable approximation of f(x) and simple to integrate formally. Then (2.1) can be estimated as

    I ≈ ∫_a^b g(x) dx.    (2.2)

Fortunately, the interpolating polynomials p_n(x), already developed in Chapter 1, often produce adequate approximations and possess the desired property of simple integrability. In fact, this combination of characteristics is a principal reason for the great emphasis given to polynomials throughout much of numerical mathematics.
Figure 2.1 illustrates the approximation of the function f(x) by the polynomial p_n(x) that exactly reproduces the value of f(x) at the indicated base points x_0, x_1, ..., x_n. The true value of ∫_a^b f(x) dx is given by the area under the solid curve f(x), whereas the approximation ∫_a^b p_n(x) dx is given by the area under the dotted curve p_n(x). Note that if the difference between f(x) and p_n(x),

    e(x) = f(x) - p_n(x),

differs in sign on various segments of the integration interval (the usual case), then the overall integration error

    E = ∫_a^b [f(x) - p_n(x)] dx

may be small, even when p_n(x) is not a particularly good approximation of f(x). Positive errors in one segment tend to cancel negative errors in others. For this reason, integration is often termed a smoothing process.

Figure 2.1 Numerical integration.

Texts on numerical mathematics abound with formulas for numerical integration (sometimes called quadrature or mechanical quadrature). This is not surprising, since there are so many possibilities for selecting the base-point spacing, the degree of the interpolating polynomial, and


70 Numerical Integration

the location of base points with respect to the interval of integration. The commonly used integration methods can be classified into two groups: the Newton-Cotes formulas, which employ functional values at equally spaced base points, and the Gaussian quadrature formulas, which employ unequally spaced base points determined by certain properties of orthogonal polynomials.

2.2 Numerical Integration with Equally Spaced Base Points

Consider the four parts of Fig. 2.2, in which f(x) is known only at the indicated set of equally spaced base points. In each case, the same integral,

    ∫_a^b f(x) dx,

is to be evaluated by the approximation

    ∫_a^b f(x) dx ≈ ∫_a^b p_n(x) dx.    (2.3)

Since a different interpolating polynomial p_n(x) is used in each of the four cases, (2.3) yields four different approximations (the shaded areas) of the true integral. In Fig. 2.2a, the polynomial is fitted by using only functional values on the interval [a,b], with base points at both a and b. Figure 2.2b illustrates a similar situation, except that functional values at base points outside the integration interval are also used to determine the approximating polynomial. In Fig. 2.2c, the polynomial is determined by functional values between the integration limits, with no base points at the ends of the interval; Fig. 2.2d is similar, except that there is a base point at one end of the interval, x = a. Obviously, many other combinations are also possible. For example, a and b could be arbitrarily located with respect to the base points, that is, a and/or b need not coincide with base-point values.

Figure 2.2 Numerical integration: four different polynomial approximations of f(x).

The two most frequently used classes of equal-interval integration formulas are called closed and open. In both classes, the integration limits a and b are either coincident
with base points or are displaced from base points by integral multiples of the base-point spacing h. The closed integration formulas use information about f(x), that is, they have base points, at both limits of integration, and are illustrated in Fig. 2.2a. The open integration formulas do not require information about f(x) at the limits of integration (see Fig. 2.2c).
All of the closed and open formulas can be generated by integrating one of the general interpolating polynomials p_n(x) from Chapter 1 with appropriate base points and integration limits. Since it is assumed here that f(x) can be computed or is known only at the base points x_0, x_1, x_2, ..., x_n, equally spaced by stepsize h, the logical choice for the polynomial representation is one of the finite-difference (forward, backward, or central) forms. Let the polynomial be given in terms of the forward finite differences by Newton's forward formula of (1.54),

    f(x) = f(x_0 + αh) = p_n(x_0 + αh) + R_n(x_0 + αh).

Here, as before, α = (x - x_0)/h, p_n(x_0 + αh) is the nth-degree interpolating polynomial, and R_n(x_0 + αh) is the remainder or error term (1.55),

    R_n(x_0 + αh) = h^{n+1} α(α - 1)(α - 2)...(α - n) f^{(n+1)}(ξ)/(n + 1)!,   ξ in (x, x_0, ..., x_n).

2.3 Newton-Cotes Closed Integration Formulas

The simplest case of closed integration is shown schematically in Fig. 2.3. Here, the two base points x_0 = a and x_1 = b are used to determine a first-degree polynomial, p_1(x) = p_1(x_0 + αh), or straight-line approximation of f(x). The appropriate form of (1.54) is given by

    f(x) = f(x_0 + αh) = f(x_0) + α Δf(x_0) + R_1(x_0 + αh)
         = p_1(x_0 + αh) + R_1(x_0 + αh),    (2.4)

where

    R_1(x_0 + αh) = h^2 α(α - 1) f''(ξ)/2! = h^2 α(α - 1) f[x, x_0, x_1].    (2.5)

Figure 2.3 The trapezoidal rule.

Using this polynomial approximation for f(x) and transforming the integration variable from x to α (α = (x - x_0)/h), we have

    ∫_{x_0}^{x_1} f(x) dx = h ∫_0^1 p_1(x_0 + αh) dα + h ∫_0^1 R_1(x_0 + αh) dα,    (2.6)

where the first integral on the right is given, from (2.4), by

    h ∫_0^1 [f(x_0) + α Δf(x_0)] dα = h [f(x_0) + Δf(x_0)/2].    (2.7)

From the definition of the first forward difference, Δf(x_0) = f(x_0 + h) - f(x_0), (2.7) assumes the final form

    ∫_{x_0}^{x_1} f(x) dx ≈ (h/2)[f(x_0) + f(x_1)],    (2.8)

the familiar trapezoidal rule. The required area under the solid curve of Fig. 2.3 is approximated by the area under the dotted straight line (the shaded trapezoid).
The error involved in using the trapezoidal approximation is given by the integral of the remainder term (2.5),

    E = h^3 ∫_0^1 α(α - 1) [f''(ξ)/2!] dα.    (2.9)
If f(x) is a continuous function of x, then f''(ξ) or its equivalent, 2! f[x, x_0, x_1] [see (2.5)], is a continuous, but unknown, function of x, so direct evaluation of (2.9) is impossible. Since α is simply a transformed value of x, f''(ξ) is a continuous function of the integration variable α; this property simplifies the estimation of (2.9). The factor f''(ξ) can be taken outside the integral sign by applying the integral mean-value theorem from the calculus:

    If two functions, q(x) and g(x), are continuous for a ≤ x ≤ b and g(x) is of constant sign for a ≤ x ≤ b, then

        ∫_a^b q(x) g(x) dx = q(ξ̄) ∫_a^b g(x) dx,

    where a < ξ̄ < b.

Since the factor α(α - 1) is negative for all α in the interval 0 < α < 1, the integral mean-value theorem allows (2.9) to be rewritten as

    h^3 ∫_0^1 α(α - 1) [f''(ξ)/2!] dα = h^3 [f''(ξ̄)/2!] ∫_0^1 (α^2 - α) dα
                                      = -(h^3/12) f''(ξ̄),   ξ̄ in (x_0, x_1).    (2.10)

The trapezoidal rule with error term is then given by

    ∫_{x_0}^{x_1} f(x) dx = (h/2)[f(x_0) + f(x_1)] - (h^3/12) f''(ξ̄).    (2.11)

Thus the error term from the trapezoidal rule is zero only if f''(ξ̄) vanishes. If f(x) is linear, that is, a first-degree polynomial, the trapezoidal rule yields ∫_a^b f(x) dx exactly, as expected (see Fig. 2.3).
A more general problem is illustrated in Fig. 2.4. Here, the interpolating polynomial is of degree n; the n + 1 evenly spaced base points are x_0, x_1, ..., x_n. Let a, the lower limit of integration, coincide with the base point x_0. Let b, the upper limit of integration, be arbitrary for the moment. Then the approximation is given by the integral

    ∫_{x_0}^{b} f(x) dx ≈ h ∫_0^ᾱ p_n(x_0 + αh) dα,

where ᾱ = (b - x_0)/h. Carrying out the indicated integration for the first few terms, and noting that all terms vanish when evaluated at the lower integration limit, yields

    ∫_{x_0}^{b} f(x) dx ≈ h [ᾱ f(x_0) + (ᾱ^2/2) Δf(x_0) + (ᾱ^3/6 - ᾱ^2/4) Δ^2 f(x_0)
                          + (ᾱ^4/24 - ᾱ^3/6 + ᾱ^2/6) Δ^3 f(x_0) + ...].    (2.14)

Figure 2.4 General case for closed integration.

The corresponding error term [see (1.55a)] is given by

    h ∫_0^ᾱ R_n(x_0 + αh) dα = h^{n+2} ∫_0^ᾱ [α(α - 1)(α - 2)...(α - n)] [f^{(n+1)}(ξ)/(n + 1)!] dα,    (2.15)

with ξ in (x_0, x_1, ..., b).
Equations (2.14) and (2.15) describe a family of related integration formulas. If the upper limit b is chosen to coincide with one of the base points so that the integration is across m intervals, each of width h (that is, the integration is between a = x_0 and b = x_m), then ᾱ in (2.14) and (2.15) assumes the integral value m.
Note that the trapezoidal rule already developed follows from (2.14) and (2.15) when ᾱ = m = n = 1. To find similar formulas for integration across m = 2, 3, 4 or more intervals, let ᾱ = 2, 3, 4, etc. in (2.14). The choice of
n is still open, and, although n = ᾱ seems the most natural, there is no reason why points outside the integration interval could not also be used to determine the interpolating polynomial.
When ᾱ is an even integer, that is, when the overall integration interval contains an even number of steps of width h, (2.14) yields an unexpected dividend, which is illustrated by the choice ᾱ = 2:

    ∫_{x_0}^{x_2} f(x) dx ≈ h ∫_0^2 p_n(x_0 + αh) dα
                         = h [2 f(x_0) + 2 Δf(x_0) + (1/3) Δ^2 f(x_0) + 0 Δ^3 f(x_0) - (1/90) Δ^4 f(x_0) + ...].    (2.16)

Note that the coefficient of Δ^3 f(x_0) is zero. On substitution of the appropriate ordinate values from the forward-difference definitions of (1.50) and retention of the first three terms, that is, for the choice n = 2, (2.16) becomes

    ∫_{x_0}^{x_2} f(x) dx ≈ (h/3)[f(x_0) + 4 f(x_1) + f(x_2)].    (2.17)

Equation (2.17) is the well-known Simpson's rule, probably the most frequently used of all the numerical integration formulas. Because of the appearance of the zero coefficient in (2.16), the error term for Simpson's rule is not given by (2.15) with n = 2 as might be expected, but rather with n = 3, that is,

    h ∫_0^2 R_3(x_0 + αh) dα = h^5 ∫_0^2 [α(α - 1)(α - 2)(α - 3)] [f^{(4)}(ξ)/4!] dα.    (2.18)

Note that the error of (2.18) cannot be evaluated directly by simple application of the integral mean-value theorem as was done with the trapezoidal rule error (2.10), since the factor α(α - 1)(α - 2)(α - 3) does not have a constant sign over the interval of integration. However, Steffensen [5] has shown that the error can be written in analogous fashion, that is, as

    -(h^5/90) f^{(4)}(ξ̄).    (2.19)

Henceforth the bar on ξ̄ in (2.11) and (2.19) will be dropped, since both ξ and ξ̄ are simply unknown values of the independent variable on the integration interval. Simpson's rule with error term is then given by

    ∫_{x_0}^{x_2} f(x) dx = (h/3)[f(x_0) + 4 f(x_1) + f(x_2)] - (h^5/90) f^{(4)}(ξ).    (2.20)

Figure 2.5 Simpson's rule.

Only three points are used to determine the polynomial (see Fig. 2.5). Hence one would expect the integration to be exact for f(x) a polynomial of degree two or less. In fact, (2.20) shows that Simpson's rule is exact when f(x) is a polynomial of degree three or less. The set of closed formulas generated by (2.14) and (2.15) are known as the Newton-Cotes closed integration formulas. A list for the cases ᾱ = 1, 2, ..., 6 follows.

ᾱ = 1 (trapezoidal rule):

    ∫_{x_0}^{x_1} f(x) dx = (h/2)[f(x_0) + f(x_1)] - (h^3/12) f''(ξ).    (2.21a)
ᾱ = 2 (Simpson's rule):

    ∫_{x_0}^{x_2} f(x) dx = (h/3)[f(x_0) + 4 f(x_1) + f(x_2)] - (h^5/90) f^{(4)}(ξ).    (2.21b)

ᾱ = 3 (Simpson's second rule):

    ∫_{x_0}^{x_3} f(x) dx = (3h/8)[f(x_0) + 3 f(x_1) + 3 f(x_2) + f(x_3)] - (3h^5/80) f^{(4)}(ξ).    (2.21c)

ᾱ = 4:

    ∫_{x_0}^{x_4} f(x) dx = (2h/45)[7 f(x_0) + 32 f(x_1) + 12 f(x_2) + 32 f(x_3) + 7 f(x_4)] - (8h^7/945) f^{(6)}(ξ).    (2.21d)

ᾱ = 5:

    ∫_{x_0}^{x_5} f(x) dx = (5h/288)[19 f(x_0) + 75 f(x_1) + 50 f(x_2) + 50 f(x_3) + 75 f(x_4) + 19 f(x_5)] - (275h^7/12096) f^{(6)}(ξ).    (2.21e)

ᾱ = 6:

    ∫_{x_0}^{x_6} f(x) dx = (h/140)[41 f(x_0) + 216 f(x_1) + 27 f(x_2) + 272 f(x_3) + 27 f(x_4) + 216 f(x_5) + 41 f(x_6)] - (9h^9/1400) f^{(8)}(ξ).    (2.21f)

Note that when ᾱ is even (that is, when there is an even number of intervals or an odd number of base points) the formulas are exact for f(x) a polynomial of degree ᾱ + 1 or less; when ᾱ is odd, the formulas are exact for f(x) a polynomial of degree ᾱ or less. For all even values of ᾱ, the coefficient of Δ^{ᾱ+1} f(x_0) in (2.14) assumes a zero value. Hence in each such case, the error term involves a derivative of order ᾱ + 2 rather than of order ᾱ + 1 as might be expected. For this reason the odd-point formulas are more frequently used than the even-point formulas. For example, consider the error terms in the formulas for ᾱ = 2 and ᾱ = 3:

    ᾱ = 2 (Simpson's rule):           -(h^5/90) f^{(4)}(ξ),
    ᾱ = 3 (Simpson's second rule):    -(3h^5/80) f^{(4)}(ξ).

On first glance, it would appear that the error for ᾱ = 2 is actually smaller than for ᾱ = 3. However, allowance must be made for the fact that for integration over the same interval [a,b], the stepsize (b - a)/ᾱ for the second case is only two-thirds the stepsize for the first. In terms of a and b, remembering that the values of ξ in the two cases will not normally be the same, the error terms become:

    ᾱ = 2:   -[(b - a)^5/2880] f^{(4)}(ξ_1),
    ᾱ = 3:   -[(b - a)^5/6480] f^{(4)}(ξ_2).

The error bound for the second case is smaller (assuming that f^{(4)}(ξ_1) is not appreciably different from f^{(4)}(ξ_2)), but only moderately so.
None of the formulas of (2.21) requires the computation of differences, or the coefficients of the interpolating polynomial. Each involves only the calculation of a weighted sum of the base-point functional values, that is,

    ∫_a^b f(x) dx ≈ Σ_{i=0}^{n} w_i f(x_i),    (2.22)

where the w_i are the weights assigned to the functional values f(x_i), i = 0, 1, ..., n.

Example. Use the trapezoidal and Simpson's rules to estimate the integral

    ∫_1^3 f(x) dx = ∫_1^3 (x^3 - 2x^2 + 7x - 5) dx = [x^4/4 - 2x^3/3 + 7x^2/2 - 5x] from 1 to 3 = 20 2/3.    (2.23)

Functional values from (2.23) for several values of x on the integration interval are:

    x       1      1.5      2      2.5      3
    f(x)    1    4.375      9   15.625     25

For the trapezoidal rule (see Fig. 2.6a), h = 2, x_0 = 1, x_1 = 3, f(x_0) = 1, and f(x_1) = 25. From (2.21a),

    ∫_1^3 f(x) dx ≈ (2/2)(1 + 25) = 26.
2.4 Newton-Cotes Open Integration Formu/as

Figure 2.6 Numer,ical approximatiom of


113 -
(x3 2x2 + 7x - 5) dx. (a) Trapezoidal rule. (b) Simpson's rule.

Since f''(x) = 6x − 4, the error is given by

−(h^3/12) f''(ξ) = −(2/3)(6ξ − 4),  1 < ξ < 3,

which has an extreme value −9 1/3. The true error is 20 2/3 − 26 = −5 1/3, smaller than the bound as expected, but still quite large.

For Simpson's rule (see Fig. 2.6b), h = 1, x_0 = 1, x_1 = 2, x_2 = 3, f(x_0) = 1, f(x_1) = 9, and f(x_2) = 25. From (2.20),

∫_1^3 f(x) dx = (h/3)[f(x_0) + 4f(x_1) + f(x_2)] − (h^5/90) f^(4)(ξ) = (1/3)(1 + 36 + 25) − (1/90) f^(4)(ξ) = 20 2/3 − (1/90) f^(4)(ξ).

Since the fourth derivative of f(x) vanishes for all x, the error term vanishes and the result is exact.
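The arithmetic of this example is easy to verify by machine. The sketch below (in modern Python, rather than the era's FORTRAN) applies the one-application trapezoidal rule (2.21a) and Simpson's rule to the same cubic:

```python
def trapezoid(f, a, b):
    # One application of the trapezoidal rule (2.21a): h = b - a.
    h = b - a
    return h * (f(a) + f(b)) / 2.0

def simpson(f, a, b):
    # One application of Simpson's rule: h = (b - a)/2.
    h = (b - a) / 2.0
    return h * (f(a) + 4.0 * f(a + h) + f(b)) / 3.0

f = lambda x: x**3 - 2.0*x**2 + 7.0*x - 5.0

print(trapezoid(f, 1.0, 3.0))   # 26.0, in error by -5 1/3
print(simpson(f, 1.0, 3.0))     # 20.666..., exact for this cubic
```

As in the text, Simpson's rule is exact here because the integrand is a cubic.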

2.4 Newton-Cotes Open Integration Formulas

We can derive integration formulas which employ equally spaced base points but which do not require base-point functional values at one or both of the integration limits (see Figs. 2.2c and 2.2d). The general open integration problem is illustrated in Fig. 2.7. Here the interpolating polynomial is of degree n − 2; the n − 1 evenly spaced base points are x_1, ..., x_{n−1}. Let a, the lower integration limit, coincide with x_0 = x_1 − h, where h, as before, is the base-point spacing. Let b, the upper integration limit, be arbitrary for the moment. Then the approximation is given by

∫_a^b f(x) dx ≐ ∫_a^b p_{n−2}(x) dx.

A simple representation of p_{n−2}(x) is given by the forward finite-difference polynomial, Newton's forward formula (1.54), where the forward differences of f(x_1) rather than of f(x_0) are used, giving

∫_a^b f(x) dx ≐ h ∫_0^ᾱ p_{n−2}(x_0 + αh) dα.  (2.25)

Here, α = (x − x_0)/h and ᾱ = (b − x_0)/h. In terms of the forward differences of f(x_1), p_{n−2}(x_0 + αh) is given by

p_{n−2}(x_0 + αh) = f(x_1) + (α − 1)Δf(x_1) + [(α − 1)(α − 2)/2!]Δ^2 f(x_1) + [(α − 1)(α − 2)(α − 3)/3!]Δ^3 f(x_1) + ... + [(α − 1)(α − 2)···(α − n + 2)/(n − 2)!]Δ^{n−2} f(x_1).  (2.26)

Then the integral of (2.25) becomes

∫_a^b f(x) dx ≐ h ∫_0^ᾱ [f(x_1) + (α − 1)Δf(x_1) + ...] dα.  (2.27)

All terms vanish at the lower limit. Hence

∫_{x_0}^{b} f(x) dx ≐ h[ᾱ f(x_1) + (ᾱ^2/2 − ᾱ)Δf(x_1) + ···].  (2.28)

The corresponding error term is given by

h ∫_0^ᾱ R_{n−2}(x_0 + αh) dα = h^n ∫_0^ᾱ [(α − 1)(α − 2)···(α − n + 1)/(n − 1)!] f^(n−1)(ξ) dα,  (2.29)

where x_0 < ξ < b.

Figure 2.7 General case for open integration.

Equations (2.28) and (2.29) describe a family of related integration formulas. If the upper limit b is chosen to coincide with one of the base points so that integration is across m intervals, each of width h (that is, the integral is evaluated between a = x_0 and b = x_m), then ᾱ in (2.28) and (2.29) assumes the integral value m. The choice of n is still open; the usual choice is n = m, that is, the integral corresponds to that shown in Fig. 2.2c. Evaluation of (2.28) and (2.29) for integral values of m with n = m leads to the family of Newton-Cotes open integration formulas:

m = 2:

∫_{x_0}^{x_2} f(x) dx = 2h f(x_1) + (h^3/3) f''(ξ),  (2.30a)

m = 3:

∫_{x_0}^{x_3} f(x) dx = (3h/2)[f(x_1) + f(x_2)] + (3h^3/4) f''(ξ),  (2.30b)

m = 4:

∫_{x_0}^{x_4} f(x) dx = (4h/3)[2f(x_1) − f(x_2) + 2f(x_3)] + (14h^5/45) f^(4)(ξ),  (2.30c)

m = 5:

∫_{x_0}^{x_5} f(x) dx = (5h/24)[11f(x_1) + f(x_2) + f(x_3) + 11f(x_4)] + (95h^5/144) f^(4)(ξ),  (2.30d)

m = 6:

∫_{x_0}^{x_6} f(x) dx = (3h/10)[11f(x_1) − 14f(x_2) + 26f(x_3) − 14f(x_4) + 11f(x_5)] + (41h^7/140) f^(6)(ξ).  (2.30e)

When m is even, that is, when the formulas involve an even number of intervals or an odd number of base points, they are exact for f(x) a polynomial of degree m − 1 or less. When m is odd, the formulas are exact for f(x) a polynomial of degree m − 2 or less. For even values of m, the coefficient of Δ^{m−1}f(x_1) in (2.28) becomes zero. Then the error term of (2.29) involves a derivative of order m rather than of m − 1, as would be expected. For this reason the odd-point formulas are more frequently used than the even-point formulas.

2.5 Integration Error Using the Newton-Cotes Formulas

Provided that f(x) is continuous and has derivatives of suitably high order, the error terms for the closed and open formulas of (2.21) and (2.30) apply. Error terms for the Newton-Cotes formulas are of the form c h^k f^(k−1)(ξ), ξ in (a,b), where c is some constant, different for each formula. Comparison of closed and open formulas requiring the same number of functional values shows the open formulas to be slightly better when two or three points are used; for more than three points the closed formulas are considerably more accurate than the open ones. This conclusion requires the assumption that the derivative terms f^(k−1)(ξ) are roughly the same for the two formulas. Therefore, where applicable, closed rather than open formulas should be used.

The m-point formulas for odd m are of the same order of accuracy as the (m + 1)-point formulas; both are said to have degree of precision m. A formula with degree of precision m will integrate exactly all polynomials of degree m or less. Polynomials of higher degree will not be integrated exactly. Thus Simpson's rule possesses degree of precision three since it will produce exactly the integral
of all polynomials of degree three or less. Except for the trapezoidal rule, which is often used because of its simplicity, formulas with an odd number of base-point functional values are usually preferred over those with an even number of points.

Because the degree of precision of the Newton-Cotes formulas increases with the number of points, we might suspect that a very high-order formula would be more accurate than a low-order formula. Unfortunately, the m-point formulas for large m have some very undesirable properties from the computational standpoint. The weight factors tend to be large with alternating signs, which can lead to serious rounding errors, that is, errors introduced because only a fixed, usually small, number of digits can be retained after each computer operation. In addition, there exist many functions for which the magnitude of the derivative increases without bound as the order of differentiation increases. Therefore, a high-order formula may produce a larger error than a low-order one. Formulas employing more than eight points are almost never used.

If f(x) is known analytically, then it may be possible to determine the appropriate high-order derivative and examine its behavior on the interval [a,b] to establish an error bound for the particular formula chosen. In many practical situations, however, the error formula is of little direct value, since an expression for the derivative may not be available. In some cases, it might be possible to estimate the required derivative value from high-order finite divided differences of the functional values.

Even when one has no information about higher-order derivatives, it may be possible to estimate the error if the integral is computed using two different integration formulas with comparable degrees of precision. For example, consider the evaluation of

I* = ∫_a^b f(x) dx

by using the three- and four-point closed formulas of (2.21b) and (2.21c), that is, Simpson's first and second rules, each of degree of precision three. Let I_1 and E_1 be the estimate of I* and the error, respectively, resulting from the use of Simpson's first rule, and let I_2 and E_2 be like quantities, resulting from the use of Simpson's second rule. Then

I* = I_1 + E_1 = I_2 + E_2.  (2.31)

In terms of the integration limits a and b, we can write

E_1 = −[(b − a)^5/2880] f^(4)(ξ_1),    E_2 = −[(b − a)^5/6480] f^(4)(ξ_2),  (2.32)

where ξ_1 and ξ_2 are different values of ξ in (a,b). If we assume that f^(4)(ξ_1) approximately equals f^(4)(ξ_2), then (2.32) reduces to

E_1 = (9/4)E_2,  (2.33)

and (2.31) becomes

I* ≐ (9/5)I_2 − (4/5)I_1.  (2.34)

The validity of (2.34) hinges completely upon the assumption that f^(4)(ξ_1) and f^(4)(ξ_2) are equal.

Example. Evaluate

I* = ∫_1^3 f(x) dx = ∫_1^3 (x^3 − 2x^2 + 7x − 5) dx = 20 2/3  (2.35)

by using the trapezoidal rule (2.21a) and the one-point open formula of (2.30a), each with degree of precision one. Then estimate I* by using the technique outlined above. For this polynomial function [see (2.23)], the estimates of I* computed from (2.21a) and (2.30a) are, respectively, I_1 = 26 and I_2 = 18. The ratio of error terms is

E_1/E_2 = [−((b − a)^3/12) f''(ξ_1)] / [((b − a)^3/24) f''(ξ_2)].  (2.36)

Assuming that f''(ξ_1) is equal to f''(ξ_2) leads to

E_1 = −2E_2.  (2.37)

Note that for this case, the open formula is apparently more accurate than the closed formula of identical degree of precision (not the usual case). Substitution of (2.37) into (2.31) leads to

I* ≐ (1/3)I_1 + (2/3)I_2 = (1/3)(26) + (2/3)(18) = 20 2/3,  (2.38)

which is the true value of the integral in this case. Upon closer examination, (2.38) is seen to reduce to

I* ≐ (h/3)[f(x_0) + 4f(x_1) + f(x_2)]

[Simpson's rule (2.21b)]. Thus the expression of the error in terms of two formulas with degree of precision one has resulted in a compound formula with degree of precision three.

2.6 Composite Integration Formulas

One way to reduce the error associated with a low-order integration formula is to subdivide the interval of integration [a,b] into smaller intervals and then to use the formula separately on each subinterval. Repeated application of a low-order formula is usually preferred to the single application of a high-order formula, partly because of the simplicity of the low-order formulas and partly because of computational difficulties, already mentioned in the previous section, associated with the high-order formulas. Integration formulas resulting from interval subdivision and repeated application of a low-order formula are called composite integration formulas.
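The two-formula error-estimation device is short enough to sketch in Python (an illustration, not the book's code); it combines the trapezoidal estimate I_1 and the one-point open (midpoint) estimate I_2 through the relation E_1 = −2E_2:

```python
def trapezoid(f, a, b):
    # Closed two-point formula (2.21a), degree of precision one.
    return (b - a) * (f(a) + f(b)) / 2.0

def midpoint(f, a, b):
    # One-point open formula (2.30a), also degree of precision one.
    return (b - a) * f((a + b) / 2.0)

f = lambda x: x**3 - 2.0*x**2 + 7.0*x - 5.0
i1 = trapezoid(f, 1.0, 3.0)       # 26
i2 = midpoint(f, 1.0, 3.0)        # 18
i_star = (i1 + 2.0 * i2) / 3.0    # E1 = -2*E2 gives I* = (I1 + 2*I2)/3
print(i_star)                     # 20.666..., the true value 20 2/3
```

The combined value is exact here because, as the text shows, the compound formula is algebraically Simpson's rule.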

Although any of the Newton-Cotes or other simple (one-application) formulas can be written in composite form, the closed formulas are especially attractive since, except for the base points x = a and x = b, base points at the ends of each subinterval are also base points for adjacent subintervals. Thus, although one might suspect that n repeated applications of an m-point formula would require nm functional evaluations, in fact only n(m − 1) + 1 such evaluations are needed, a considerable saving, especially when m is small.

The simplest composite formula is generated by repeated application of the trapezoidal rule of (2.21a). For n applications of the rule, each subinterval is of length h = (b − a)/n. Let x_i = x_0 + ih, i = 0, 1, ..., n. Then

∫_a^b f(x) dx = Σ_{i=1}^{n} ∫_{x_{i−1}}^{x_i} f(x) dx  (2.39)

= (h/2)[f(x_0) + 2f(x_1) + 2f(x_2) + ··· + 2f(x_{n−1}) + f(x_n)] − (h^3/12) Σ_{i=1}^{n} f''(ξ_i),  x_{i−1} < ξ_i < x_i.  (2.40)

Collecting common terms in (2.40) leads to

∫_a^b f(x) dx = h[(1/2)f(x_0) + Σ_{i=1}^{n−1} f(x_i) + (1/2)f(x_n)] − (nh^3/12) f''(ξ),  ξ in (a,b).  (2.41)

In terms of n, the number of applications of the rule, and a and b, the integration limits, the composite formula is

∫_a^b f(x) dx = ((b − a)/n)[(1/2)f(x_0) + Σ_{i=1}^{n−1} f(x_i) + (1/2)f(x_n)] − ((b − a)^3/12n^2) f''(ξ),  ξ in (a,b).  (2.42)

The error term of (2.41) simplifies to that of (2.42) because the continuous function f''(x) assumes all values between its extreme values. Therefore, there must be some ξ in (a,b) for which

f''(ξ) = (1/n) Σ_{i=1}^{n} f''(ξ_i).

The error for the composite trapezoidal formula is proportional to 1/n^2. Therefore, if we double the number of applications, the error will decrease roughly by a factor of four [f''(ξ) will usually be different for two different values of n]. This form of the error term suggests a technique, similar to that outlined in (2.31) to (2.34), for estimating the error. In this case, the same composite formula is used with two different values of n; in (2.31) to (2.34) two different simple (one-application) formulas are used.

Let I_n and E_n be, respectively, the estimate of the integral and the associated error for n applications of the composite trapezoidal formula. Then the true value of the integral is

I* = I_{n_1} + E_{n_1} = I_{n_2} + E_{n_2},  (2.43)

where n_1 and n_2 are two different values of n. From (2.42),

E_{n_1}/E_{n_2} = [−((b − a)^3/12n_1^2) f''(ξ_1)] / [−((b − a)^3/12n_2^2) f''(ξ_2)],  ξ_1, ξ_2 in (a,b).  (2.44)

Assuming that f''(ξ_1) and f''(ξ_2) are equal, (2.44) reduces to

E_{n_1}/E_{n_2} = n_2^2/n_1^2.  (2.45)

Substitution of (2.45) into (2.43) leads to

I* ≐ I_{n_2} + (I_{n_2} − I_{n_1}) / [(n_2/n_1)^2 − 1].  (2.46)

For n_2 = 2n_1, (2.46) becomes

I* ≐ (4/3)I_{n_2} − (1/3)I_{n_1}.  (2.47)

This kind of approach, in which two approximations to an integral are used to get a third (hopefully better) approximation, is called Richardson's deferred approach to the limit or Richardson's extrapolation.

For n applications of Simpson's rule, functional values are required at 2n + 1 base points x_0, x_1, ..., x_{2n}. The composite Simpson's rule formula is

∫_a^b f(x) dx = (h/3)[f(x_0) + 4f(x_1) + 2f(x_2) + 4f(x_3) + 2f(x_4) + ··· + 2f(x_{2n−2}) + 4f(x_{2n−1}) + f(x_{2n})] − (h^5/90) Σ_{i=1}^{n} f^(4)(ξ_i),  (2.48)

where x_{2i−2} < ξ_i < x_{2i}, h = (b − a)/2n, and x_i = x_0 + ih, i = 0, 1, ..., 2n.

In terms of a, b, and n, (2.48) is given by

∫_a^b f(x) dx = ((b − a)/6n)[f(x_0) + 4 Σ_{i=1,3,...,2n−1} f(x_i) + 2 Σ_{i=2,4,...,2n−2} f(x_i) + f(x_{2n})] − ((b − a)^5/2880n^4) f^(4)(ξ),  ξ in (a,b);  (2.49)

in the first summation the index i should assume only odd values, and in the second only even values.

If the error-estimation technique of (2.43) to (2.46) is applied to (2.49), the relationship corresponding to equation (2.46) for the composite trapezoidal rule is

I* ≐ I_{n_2} + (I_{n_2} − I_{n_1}) / [(n_2/n_1)^4 − 1].  (2.50)

For n_2 = 2n_1, (2.50) becomes

I* ≐ (16/15)I_{n_2} − (1/15)I_{n_1}.  (2.51)

Example. Evaluate the integral of (2.23), using two and then four applications of the composite trapezoidal rule. Then perform Richardson's extrapolation, given by (2.47), to find a third estimate of the integral.

Let n_1 = 2 and n_2 = 4. Then (2.42) yields (see Fig. 2.8)

I_{n_1} = ((3 − 1)/2)[(1/2)(1) + 9 + (1/2)(25)] = 22,

I_{n_2} = ((3 − 1)/4)[(1/2)(1) + 4.375 + 9 + 15.625 + (1/2)(25)] = 21,

and substitution into (2.47) gives a third estimate,

I* ≐ (4/3)(21) − (1/3)(22) = 20 2/3.

The true value of the integral is 20 2/3 and the actual errors for these three estimates of the integral are 4/3, 1/3, and 0. On closer examination, (2.47) is seen to be equivalent to Simpson's rule; hence I* should be free of error in this case, since f(x) is a polynomial of degree three.

Expressions analogous to those of (2.42) and (2.49) can be generated for any of the low-order integration formulas.

Figure 2.8 Repeated application of the trapezoidal rule. (a) Two applications. (b) Four applications.
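A Python sketch of the composite rules and of Richardson's extrapolation (again an illustration, not the original FORTRAN) reproduces the three estimates 22, 21, and 20 2/3 of the example:

```python
def comp_trapezoid(f, a, b, n):
    # Composite trapezoidal rule (2.42): n applications, h = (b - a)/n.
    h = (b - a) / n
    s = (f(a) + f(b)) / 2.0 + sum(f(a + i * h) for i in range(1, n))
    return h * s

def comp_simpson(f, a, b, n):
    # Composite Simpson's rule (2.49): n applications, 2n + 1 base points.
    h = (b - a) / (2 * n)
    s = f(a) + f(b)
    for i in range(1, 2 * n):
        s += (4.0 if i % 2 == 1 else 2.0) * f(a + i * h)
    return h * s / 3.0

f = lambda x: x**3 - 2.0*x**2 + 7.0*x - 5.0
i2 = comp_trapezoid(f, 1.0, 3.0, 2)        # 22.0
i4 = comp_trapezoid(f, 1.0, 3.0, 4)        # 21.0
i_star = (4.0 * i4 - i2) / 3.0             # Richardson extrapolation, (2.47)
print(i_star, comp_simpson(f, 1.0, 3.0, 2))   # both 20.666...
```

As the text observes, the extrapolated value coincides with composite Simpson's rule, so the two final numbers agree.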
EXAMPLE 2.1
RADIANT INTERCHANGE BETWEEN PARALLEL PLATES
COMPOSITE SIMPSON'S RULE

Introduction

Two infinitely long parallel plates of width w are separated by a distance d, as shown in Fig. 2.1.1. Their surfaces are gray, isothermal, diffusely radiating and reflecting, and have temperatures T_1 and T_2 and emissivities ε_1 and ε_2.

Figure 2.1.1 Geometry of parallel plates.

It can be shown that the local radiosities B_1 and B_2 (defined as the rates at which emitted plus reflected radiation leave unit area of each surface) are the solution of the simultaneous integral equations:

B_1(x_1) = ε_1 σ T_1^4 + (1 − ε_1) ∫_{−w/2}^{w/2} B_2(x_2) f(x_1, x_2, d) dx_2,  (2.1.1)

B_2(x_2) = ε_2 σ T_2^4 + (1 − ε_2) ∫_{−w/2}^{w/2} B_1(x_1) f(x_1, x_2, d) dx_1,  (2.1.2)

in which σ is the Stefan-Boltzmann constant, and

f(x_1, x_2, d) = (d^2/2) / [d^2 + (x_1 − x_2)^2]^{3/2}.  (2.1.3)

The integral in equation (2.1.1) is the irradiosity, I_1 = I_1(x_1), being the rate per unit area of incident radiation at a point x_1 on the lower plate. The integral arises because radiation emitted from all points along the upper plate has been taken into account. Similarly, the integral in (2.1.2) is I_2, the irradiosity at the upper plate. The net rates at which heat must be supplied to unit length of the lower and upper surfaces, to keep them at steady temperatures, are

Q_1 = ∫_{−w/2}^{w/2} [B_1(x_1) − I_1(x_1)] dx_1,  (2.1.4)

Q_2 = ∫_{−w/2}^{w/2} [B_2(x_2) − I_2(x_2)] dx_2.  (2.1.5)

Problem Statement

Write a computer program that will accept values for T_1, T_2, ε_1, ε_2, d, w, σ, and n (see below), and that will proceed to compute:

(a) The emissive powers, E_1 = ε_1 σ T_1^4 and E_2 = ε_2 σ T_2^4, for the two plates.
(b) The local radiosities B_1 and B_2 at the discrete points having x_1 and x_2 equal to 0, Δx, 2Δx, ..., nΔx, where Δx = w/2n.
(c) The local irradiosities I_1 and I_2 at the same points as in (b).
(d) The net heat inputs Q_1 and Q_2 at the two plates.

Use Simpson's rule for approximating the integrals in the above equations. Assume symmetry of B_1 and I_1 about the centerline x_1 = 0, and of B_2 and I_2 about x_2 = 0.

Method of Solution

We first write a function, named SIMPS, that will evaluate the general integral ∫_a^b f(x) dx by n repeated applications of Simpson's rule, as discussed in Section 2.6. The appropriate composite integration formula is readily derived from equation (2.49):

∫_a^b f(x) dx ≐ (h/3)[f(x_0) + f(x_{2n}) + 2 Σ_{i=2,4,...,2n−2} f(x_i) + 4 Σ_{i=1,3,...,2n−1} f(x_i)],  (2.1.6)

where h = (b − a)/2n and x_i = a + ih. An appropriate call for the function could be

AREA = SIMPS (A, B, N, F)

in which A, B, and N have obvious interpretations, and F is the name of the function to be integrated.

Figure 2.1.2, corresponding to n = 2, emphasizes that we compute B_1, for example, at only n + 1 points, to which we can assign subscripts i = 0, 1, ..., n. From symmetry, B_1(−i) = B_1(i).

Figure 2.1.2 B_1 and B_2 are computed at discrete points only.

The basic approach is straightforward. As a starting guess, assume B_1 = E_1 and B_2 = E_2 everywhere. The right-hand sides of (2.1.1) and (2.1.2) can then be evaluated, enabling new estimates to be made of the distributions of B_1 and B_2. These values are cycled back into the two integrals, giving revised estimates, and so on. Iterations are discontinued either when itmax iterations have been made or when B_1 and B_2 at all points do not change by more than a fractional amount δ. This approach to the solution is an example of a successive substitution method for solution of simultaneous nonlinear equations. Successive substitution methods are discussed in Chapter 5.

Note that the number of applications of Simpson's rule is n, which is also the number of increments Δx along each half of the plate. For example, if n = 2, an estimation of B_{10} from (2.1.1) will involve five values of B_2: B_2(−2), B_2(−1), B_{20}, B_{21}, and B_{22}, which are just enough for two applications of Simpson's rule.

Since the function SIMPS evaluates the integrand at definite values of the continuous variables x_1 and x_2, a translation must be made (achieved in the program by the function ISUB) between these continuous variables and the appropriate subscripts for the point values such as B_{1i}. For this purpose we use, for example, i = |x_1/Δx| + 0.001, rounded down to the next lower integer. Here, the absolute value sign accounts for symmetry, and the small increment of 0.001 avoids any complications of round-off error when x_1/Δx is evaluated. There are also other complications in the programming, mainly arising from the fact that SIMPS needs a single function name for the integrand, whereas the integrands of (2.1.1), (2.1.2), (2.1.4), and (2.1.5) involve either the product or difference of two functions. This difficulty is overcome by defining additional functions, such as FB1, which stands for the product of B_1(x_1) and f(x_1, x_2, d).

Flow Diagram

Main Program

[Flow diagram not reproduced. In outline: read the data and compute E_1 and E_2; set B_{1i} = E_1 and B_{2i} = E_2 for i = 0, 1, ..., n; then, on each iteration, evaluate I_{1i} = ∫_{−w/2}^{w/2} B_2(x_2) f(x_1, x_2, d) dx_2 using SIMPS and update B_{1i} = E_1 + (1 − ε_1)I_{1i}; evaluate I_{2i} = ∫_{−w/2}^{w/2} B_1(x_1) f(x_1, x_2, d) dx_1 using SIMPS and update B_{2i} = E_2 + (1 − ε_2)I_{2i}; set conv false if any radiosity changes by more than the tolerance; when conv is true, or after itmax iterations, evaluate Q_1 and Q_2 using SIMPS and print the results.]

Function SIMPS (Arguments: a, b, n, f)

[Flow diagram not reproduced. In outline: h ← (b − a)/2n; accumulate the end-point sum S_e and the mid-point sum S_m over the n applications; return the composite Simpson estimate.]

FORTRAN Implementation

List of Principal Variables

Program Symbol     Definition

(Main)
B1, B2†            Vectors, containing the radiosities B_{1i} and B_{2i} at each point i (BTU/hr sq ft).
CONV               Logical variable used in testing for convergence.
D                  Separation between the plates, d (ft).
DIF1, DIF2         Functions, giving the differences (B_1(x_1) − I_1(x_1)) and (B_2(x_2) − I_2(x_2)), respectively.
DX                 Spacing, Δx, between points at which radiosities are computed (ft).
E1, E2             Emissive powers, E_1 = ε_1 σ T_1^4 and E_2 = ε_2 σ T_2^4 (BTU/hr sq ft).
EPS1, EPS2         Emissivities, ε_1 and ε_2, of the lower and upper plates, respectively.
FB1, FB2           Functions, giving the products B_1(x_1) f(x_1, x_2, d) and B_2(x_2) f(x_1, x_2, d), respectively.
I                  Subscript, indicating point at which radiosity is being computed.
INT1, INT2         Vectors, containing the irradiosities I_{1i} and I_{2i} at each point i (BTU/hr sq ft).
ITER               Counter on the number of iterations, iter.
ITMAX              Maximum number of iterations allowed, itmax.
N                  Number of increments, n, into which one half of each plate is subdivided.
Q1, Q2             Heat supplied to the lower and upper plates, Q_1 and Q_2, respectively (BTU/hr ft).
SAVE               Temporary storage for radiosities, B.
SIGMA              Stefan-Boltzmann constant, σ (1.712 × 10⁻⁹ BTU/hr sq ft °R⁴).
SIMPS              Function for implementing Simpson's rule.
T1, T2             Temperatures, T_1 and T_2 (°R), of the lower and upper plates, respectively.
TOL                Tolerance, δ, used in convergence testing.
W                  Plate width, w (ft).
X                  Distance from centerline (ft), either x_1 or x_2.
XHIGH, XLOW        Upper and lower limits on X.

(Function FB1)
ISUB               Function for converting distance to corresponding subscript.
F                  Function for evaluating equation (2.1.3).

(Function SIMPS)
H                  Stepsize, h.
SUMEND, SUMMID     The first and second summations of (2.1.6), S_e and S_m, respectively.

† Because of FORTRAN limitations, we have 1 ≤ I ≤ N + 1, corresponding to 0 ≤ i ≤ n; thus, B_{10} through B_{1n} in the text correspond to B1(1) through B1(N+1) in the program.

Program Listing
Main Program
C     APPLIED NUMERICAL METHODS, EXAMPLE 2.1
C     SIMULTANEOUS INTEGRAL EQUATIONS - COMPOSITE SIMPSON'S RULE
C
C     THIS PROGRAM COMPUTES THE RADIANT HEAT TRANSFER BETWEEN TWO
C     INFINITE, ISOTHERMAL, OPPOSED PARALLEL PLATES OF WIDTH W,
C     SEPARATED BY A DISTANCE D.  THE PLATES HAVE TEMPERATURES
C     T1 AND T2, AND EMISSIVITIES EPS1 AND EPS2.  SIGMA IS THE
C     STEFAN-BOLTZMANN CONSTANT.  THE ITERATIVE SOLUTION OF TWO
C     SIMULTANEOUS INTEGRAL EQUATIONS GIVES VALUES FOR THE RADIO-
C     SITIES, B1 AND B2, AND THE IRRADIOSITIES, I1 AND I2, AS FUNC-
C     TIONS OF POSITION.  THESE VALUES ARE OBTAINED AT A SERIES OF
C     N+1 POINTS EQUALLY SPACED BETWEEN THE CENTER (B1(1), FOR
C     EXAMPLE) AND THE EDGE (B1(N+1)) OF EACH PLATE.  THE FUNCTION
C     SIMPS IS USED TO EVALUATE INTEGRALS BY N REPEATED APPLICATIONS
C     OF SIMPSON'S RULE.  Q1 AND Q2 ARE THE NET RATES OF HEAT TRANS-
C     FER TO THE PLATES TO MAINTAIN THEM AT CONSTANT TEMPERATURES.
C     THE ITERATIONS STOP EITHER WHEN ITMAX ITERATIONS HAVE BEEN
C     PERFORMED OR WHEN NO COMPUTED RADIOSITY CHANGES BY MORE THAN
C     A FRACTIONAL AMOUNT TOL FROM ONE ITERATION TO THE NEXT.
C
      REAL INT1, INT2
      LOGICAL CONV
      COMMON B1, B2, W, N, X, DSQ, INT1, INT2
      DIMENSION B1(1025), B2(1025), INT1(1025), INT2(1025)
      EXTERNAL FB1, FB2, DIF1, DIF2
C
C     ..... READ INPUT DATA AND COMPUTE EMISSIVE POWERS .....
    1 READ (5,100) T1, T2, EPS1, EPS2, D, W, SIGMA, TOL, N, ITMAX
      E1 = EPS1*SIGMA*T1**4
      E2 = EPS2*SIGMA*T2**4
      XHIGH = W/2.
      XLOW = -XHIGH
      WRITE (6,200) T1, T2, EPS1, EPS2, D, W, SIGMA, TOL, N, ITMAX,
     1   E1, E2, XLOW, XHIGH
C
C     ..... COMPUTE CONSTANTS, GUESS INITIAL RADIOSITIES .....
      DSQ = D*D
      DX = W/FLOAT(2*N)
      NP1 = N + 1
      DO 2 I=1,NP1
      B1(I) = E1
    2 B2(I) = E2
C
C     ..... PERFORM SUCCESSIVE ITERATIONS .....
      DO 5 ITER=1,ITMAX
      CONV = .TRUE.
C
C     ..... COMPUTE RADIOSITIES ACROSS STRIP ONE .....
      DO 3 I=1,NP1
      X = FLOAT(I-1)*DX
      INT1(I) = SIMPS(XLOW,XHIGH,N,FB2)
      SAVE = B1(I)
      B1(I) = E1 + (1.0-EPS1)*INT1(I)
    3 IF ( ABS((SAVE-B1(I))/B1(I)) .GT. TOL )  CONV = .FALSE.
C
C     ..... COMPUTE RADIOSITIES ACROSS STRIP TWO .....
      DO 4 I=1,NP1
      X = FLOAT(I-1)*DX
      INT2(I) = SIMPS(XLOW,XHIGH,N,FB1)
      SAVE = B2(I)
      B2(I) = E2 + (1.0-EPS2)*INT2(I)
    4 IF ( ABS((SAVE-B2(I))/B2(I)) .GT. TOL )  CONV = .FALSE.
C
C     ..... TEST FOR CONVERGENCE OF ITERATION SCHEME .....
      IF ( CONV )  GO TO 6
    5 WRITE (6,202) ITER, (I,B1(I),B2(I),INT1(I),INT2(I), I=1,NP1)
      WRITE (6,201)
C
C     ..... PRINT OUTPUT .....
    6 WRITE (6,202) ITER, (I,B1(I),B2(I),INT1(I),INT2(I), I=1,NP1)
C
C     ..... COMPUTE HEAT TRANSFER RATE PER UNIT PLATE LENGTH .....
      Q1 = SIMPS(XLOW,XHIGH,N,DIF1)
      Q2 = SIMPS(XLOW,XHIGH,N,DIF2)
      WRITE (6,203) Q1, Q2
C
      GO TO 1
C
C     ..... FORMATS FOR INPUT AND OUTPUT STATEMENTS .....
  100 FORMAT( 4X, F8.2, 12X, F8.2, 2(14X, F6.2)/ 4X, F8.2, 12X, F8.2,
     1   14X, E10.3, 10X, E6.1/ 5X, I5, 20X, I5 )
  200 FORMAT(55H1RADIANT HEAT TRANSFER BETWEEN INFINITE PARALLEL PLATES/
     1   10H0T1     = ,F12.2/ 10H T2     = ,F12.2/ 10H EPS1   = ,F12.2/
     2   10H EPS2   = ,F12.2/ 10H D      = ,F12.2/ 10H W      = ,F12.2/
     3   10H SIGMA  = ,E18.4/ 10H TOL    = ,E18.4/ 10H N      = ,I9/
     4   10H ITMAX  = ,I9/ 10H E1     = ,E18.4/ 10H E2     = ,E18.4/
     5   10H XLOW   = ,F12.2/ 10H XHIGH  = ,F12.2 )
  201 FORMAT( 36H0CONVERGENCE CRITERION NOT SATISFIED )
  202 FORMAT( 1H0/ 10H ITER   = , I9/ 6H0    I, 5X, 5HB1(I), 10X,
     1   5HB2(I), 10X, 7HINT1(I), 8X, 7HINT2(I)/ (1H , I5, 4E15.6) )
  203 FORMAT( 1H0/ 10H0Q1     = , E15.6/ 10H Q2     = , E15.6 )
C
      END

Functions FB1, FB2, DIF1, DIF2

      FUNCTION FB1(Y)
C
      REAL INT1, INT2
      COMMON B1, B2, W, N, X, DSQ, INT1, INT2
      DIMENSION B1(1025), B2(1025), INT1(1025), INT2(1025)
C
C     ..... STATEMENT FUNCTION DEFINITIONS .....
      ISUB(P) = INT(ABS(2.0*P/W)*FLOAT(N) + 0.001) + 1
      F(P) = 0.5*DSQ/(DSQ + (X-P)**2)**1.5
C
      I = ISUB(Y)
      FB1 = B1(I)*F(Y)
      RETURN
C
      ENTRY FB2(Y)
      I = ISUB(Y)
      FB2 = B2(I)*F(Y)
      RETURN
C
      ENTRY DIF1(Y)
      I = ISUB(Y)
      DIF1 = B1(I) - INT1(I)
      RETURN
C
      ENTRY DIF2(Y)
      I = ISUB(Y)
      DIF2 = B2(I) - INT2(I)
      RETURN
C
      END

Function SIMPS

      FUNCTION SIMPS( A, B, N, F )
C
C        THE FUNCTION SIMPS USES N APPLICATIONS OF SIMPSON'S RULE
C        TO CALCULATE NUMERICALLY THE INTEGRAL OF F(X)*DX BETWEEN
C        INTEGRATION LIMITS A AND B.  SUMEND IS THE SUM OF ALL F(X(I))
C        FOR EVEN I (EXCEPT FOR F(X(2*N))) WHILE SUMMID IS THE SUM OF
C        ALL F(X(I)) FOR I ODD.  H IS THE STEPSIZE BETWEEN ADJACENT
C        X(I) AND TWOH IS THE LENGTH OF THE INTERVAL OF INTEGRATION
C        FOR EACH INDIVIDUAL APPLICATION OF SIMPSON'S RULE.  K IS THE
C        ITERATION COUNTER.
C
C     ..... INITIALIZE PARAMETERS .....
      TWOH = (B - A)/N
      H = TWOH/2.
      SUMEND = 0.
      SUMMID = 0.
C
C     ..... EVALUATE SUMEND AND SUMMID .....
      DO 1 K=1,N
      X = A + FLOAT(K-1)*TWOH
      SUMEND = SUMEND + F(X)
    1 SUMMID = SUMMID + F(X+H)
C
C     ..... RETURN ESTIMATED VALUE OF THE INTEGRAL .....
      SIMPS = (2.0*SUMEND + 4.0*SUMMID - F(A) + F(B))*H/3.
      RETURN
C
      END
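For readers following along in a modern language, the same end-point/mid-point bookkeeping can be expressed in Python (an illustrative translation, not part of the original listing):

```python
def simps(f, a, b, n):
    # n applications of Simpson's rule; twoh is the width of one application.
    twoh = (b - a) / n
    h = twoh / 2.0
    sumend = sum(f(a + k * twoh) for k in range(n))      # left end of each panel
    summid = sum(f(a + k * twoh + h) for k in range(n))  # midpoint of each panel
    # 2*sumend counts f(a) twice; subtracting one f(a) and adding f(b)
    # recovers the composite pattern f(a) + 2(...) + 4(...) + f(b).
    return (2.0 * sumend + 4.0 * summid - f(a) + f(b)) * h / 3.0

print(simps(lambda x: x**3 - 2.0*x**2 + 7.0*x - 5.0, 1.0, 3.0, 2))  # 20.666...
```

The cubic test integrand of (2.23) gives 62/3, as it must, since Simpson's rule is exact for polynomials of degree three.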

Data

T2    =   500.00     EPS1  = 0.80      EPS2 = 0.60
W     =     1.00     SIGMA = 1.712E-9  TOL  = 1.0E-6
ITMAX =       25
T2    =   500.00     EPS1  = 0.80      EPS2 = 0.60
W     =     1.00     SIGMA = 1.712E-9  TOL  = 1.0E-6
ITMAX =       25
T2    =   800.00     EPS1  = 0.20      EPS2 = 0.60
W     =     1.00     SIGMA = 1.712E-9  TOL  = 1.0E-6
ITMAX =       25

[The T1, D, and N entries of each data set are not legible in this reproduction.]

Computer Output
Results for the 1st Data Set

RADIANT HEAT TRANSFER BETWEEN INFINITE PARALLEL PLATES

[Echo of the input values T1, T2, EPS1, EPS2, D, W, SIGMA, TOL, N, ITMAX and the computed E1, E2, XLOW, XHIGH; numerical values omitted.]

Computer Output (Continued)

[Tables of I, B1(I), B2(I), INT1(I), INT2(I) for ITER = 3, 4, and 5 omitted.]

Partial Results for the 2nd Data Set (Same as 1st Set, with N = 8)

[Table for ITER = 5 omitted.]

Partial Results for the 3rd Data Set

RADIANT HEAT TRANSFER BETWEEN INFINITE PARALLEL PLATES

[Echo of the input values and iteration tables omitted.]
Discussion of Results

The first two sets of results are for T_1 = 1000°R, T_2 = 500°R, ε_1 = 0.8, and ε_2 = 0.6. Convergence within the specified tolerance is rapid, occurring after five iterations. The results for n = 2 are almost identical with those for n = 8, indicating that good accuracy can be obtained with just a few subdivisions. The radiant fluxes are most intense at the center of the plate (I = 1), since end leakage is least important at this point. Q_2 is negative, since the upper plate receives more energy from the lower plate than it can radiate and reflect to its surroundings, and so must be cooled to maintain its temperature constant.

The third set of results is for T_1 = 1000, T_2 = 800, ε_1 = 0.2, and ε_2 = 0.6. These conditions are such that heat must be supplied to both plates at approximately equal rates.

2.7 Repeated Interval-Halving and Romberg Integration assuming that no round-off error enters into the calcula-
tions.
Let T,,, be the computed estimate of an integral
The Richardson extrapolation technique of (2.47) can
now be applied to each pair of adjacent elements in the
[f ( 4 dx
sequence To,,, TI,,, . . . to produce a third (hopefully
by using the composite trapezoidal rule of (2.42) with improved) estimate of the integral. Let I* in (2.47), cor-
n = 2N. Then To,, is the estimate of the integral using the responding to the pair of estimates TN,,,TN+l,l,be de-
simple trapezoidal rule, TI,, the estimate for two appli- noted TNVZ SO that

cations, T,,, the estimate for four applications, etc. TN,,


involves twice as many subintervals as TN- . Hence N ,,
can be viewed as the number of times the initial integra-
tion interval [a,b] has been halved to produce subintervals For example, from (2.52a) and (2.52b),
-
of length h = (b a)/2N. From (2.42):

To.1 =-I-
(b-a) 1
1 2 [ f ( a )+ f ( b ) l )
(2.52a)
J

= (b-a)(/(a) + 4 j (a + q))
+ I /(b), ,
(b - a ) 1
Tt.1 = - 2 (5 [ f ( a ) f(b11 + +f (a + v))
1 6

which is just the integral predicted by one applica-


(2.56)

~ ~ . ~ + ( b - (a+?))]
o)f (2.52b) tion of Simpson's rule. Investigation of the sequence
To,,, .. ., TNV2leads, by induction, to the conclusion
Ta.1 =Y'(:U(~)+
f f(b)l+ = If(a+?)i)] that T,,, is the estimate of the integral which would be
computed by using the composite Simpson's rule of (2.49)
1 with n = 2N. Thus TN,2is the value computed for the
& (a+ V ' i ) )
+?I 1-1
= z ( ~ l , l (2.52~) integral by Simpson's rule after halving the initial interval,
A l r ~ [a,b], N times.
- From (2.49), the error in T ~ , is
, given by

Provided that f(x) has a continuous and bounded fourth


derivative on the interval (a,b), (2.57) assures that the
sequence To, . . . TN,, converges to the true integral,
assuming no round-off error.
The Richardson extrapolation technique of (2.51) can
now be applied to each pair of adjacent elements in the
sequence To,,, T, ,,, . .. to produce yet another sequence
By induction, the general recursion relation for TN., in of estimates,
terms of TN-,,, is

Investigation of the sequence To,,, T , , , . . . shows that


T,,, is the estimate of the integral that would be com-
The recursion relation of (2.53) can be used to compute puted by using the composite version of the five-point
the sequence TI,,,T,,,, . .., TNSIonce To,, has been cal- Newton-Cotes closed formula of (2.21d) with 2Nrepeated
culated. The function f (x) need be evaluated just 2N+ 1 applications or, alternatively, after halving the original
times to compute the entire sequence. integration interval, [a,b], N times. The error term for
Corresponding to T,,,, the error term given by (2.42) T,,, has the form

- (b - a)'
f(
1935360(2)6N
) t in (a,b). (2.59)
., Provided that f(x) has a continuous and bounded sixth
Provided that f ( x ) has a continuous and bounded second derivative on the interval (a,b), (2.59) assures that the
derivative on the interval (a,b), (2.54) assures that the sequence To,, . . . TN,, converges to the true integral,
sequence To,, .. . TN,, converges to the true integral, assuming no round-off error.
2.7 Repeated Znrerval-Halving and Romberg Integration 91
The Richardson extrapolation technique can be applied to each pair of adjacent elements in the sequence T_{0,3}, T_{1,3}, ... to produce another sequence of estimates,

    T_{N,4} = [64 T_{N+1,3} - T_{N,3}]/63,    (2.60)

which can be shown (see Bauer et al. [6]) to converge to the true integral.

The relationships of (2.55), (2.58), and (2.60) are special cases of the general extrapolation formula,

    T_{N,j} = [4^{j-1} T_{N+1,j-1} - T_{N,j-1}]/[4^{j-1} - 1],    (2.61)

credited to Romberg and described in detail by Bauer et al. [6], who show that each of the sequences T_{N,j}, for j = 1, 2, ..., converges to the true integral with increasing N. In addition, the sequence T_{0,1}, T_{0,2}, ..., T_{0,j} also converges to the true integral for increasing j. The sequences T_{N,j} for j > 3 do not correspond to composite rules for Newton-Cotes closed integration as do those for j ≤ 3.

These Romberg sequences can be arranged in simple tabular form as follows:

    T_{0,1}   T_{0,2}   T_{0,3}   ...   T_{0,jmax}
    T_{1,1}   T_{1,2}   T_{1,3}   ...
    T_{2,1}   T_{2,2}   T_{2,3}
      .         .
      .         .
    T_{N,1}                                            (2.62)

To use this technique, we simply compute the elements of the first column using (2.52a) and (2.53), fill out the remaining elements of the triangular array using (2.61), and then examine the number sequences down each column and across each row. Each of the sequences should converge to the true integral.

The error corresponding to T_{N,j}, as defined in (2.61), can be shown [6] to be equal to

    k(j) f^(2j)(ξ)/2^{2jN},    ξ in (a,b),

where k(j) is a constant that depends on a, b, and j, but is independent of N.

Example. Use the Romberg integration scheme to estimate the value of ln 137.2 from the integral

    ln 137.2 = ∫_1^{137.2} (1/x) dx.

Table 2.1 shows the results in the tabular form of (2.62). The first column contains the results for evaluation of the integral using the composite trapezoidal rule of (2.53), after computing the first entry with the simple trapezoidal rule of (2.52a). The remaining entries in the Romberg tableau are the results of repeated extrapolation using (2.61).

The true integral to six figures is 4.92144. Clearly, each column sequence is converging to this value. The sequence across the top row is also converging to this value. The apparent divergence of the last entry or two in each column results from round-off errors in the calculations (recall that the last entry in the first column involves 2^13 = 8192 repeated applications of the trapezoidal rule).
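The whole tableau-building procedure of (2.52a), (2.53), and (2.61) can be reproduced in a few lines. The Python sketch below (illustrative only; it is not one of this chapter's FORTRAN programs) applies the scheme to the example above and recovers ln 137.2 = 4.92144 to six figures:

```python
import math

def romberg(f, a, b, n_max, j_max):
    """Romberg tableau as a list of rows, T[N][j-1].
    Column 1 is the interval-halving trapezoidal sequence (2.52a), (2.53);
    the remaining columns follow from the extrapolation formula (2.61)."""
    h = b - a
    T = [[(f(a) + f(b)) * h / 2.0]]
    for n in range(1, n_max + 1):
        step = h / 2.0**n
        new = sum(f(a + i * step) for i in range(1, 2**n, 2))
        T.append([T[-1][0] / 2.0 + step * new])
    for j in range(2, j_max + 1):
        c = 4.0**(j - 1)
        for N in range(n_max - j + 2):
            T[N].append((c * T[N + 1][j - 2] - T[N][j - 2]) / (c - 1.0))
    return T

T = romberg(lambda x: 1.0 / x, 1.0, 137.2, 13, 7)
print(round(T[7][6], 5))   # T_{7,7}, about 4.92144
```

The triangular shape of the tableau appears automatically: row N acquires only the columns j for which N ≤ N_max - j + 1, just as in (2.62).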

Table 2.1 Romberg Tableau for Evaluation of ln 137.2


EXAMPLE 2.2
FOURIER COEFFICIENTS USING ROMBERG INTEGRATION

Problem Statement

Write a general-purpose subroutine named TROMB that uses the Romberg integration algorithm outlined in Section 2.7 to evaluate numerically the integral

    ∫_a^b f(x) dx,    (2.2.1)

where f(x) is any single-valued function and a and b are finite. The program should first use the trapezoidal rule with repeated interval halving to determine T_{0,1}, T_{1,1}, ..., T_{Nmax,1} from (2.52a) and the recursion relation (2.53). Then the Romberg sequences {T_{N,j}} should be computed from the general extrapolation formula (2.61) for all j ≤ j_max. The Romberg tableau should be organized as illustrated in Table 2.1.

To test the subroutine, write a general-purpose program that calls on TROMB to evaluate the coefficients of the Fourier expansion for any arbitrary function g(x), periodic with period 2π, such that g(x) = g(x + 2kπ) for integral k. The Fourier expansion may be written [14]

    g(x) = Σ_{m=0}^∞ c_m cos mx + Σ_{m=0}^∞ d_m sin mx,    (2.2.2)

where

    c_m = (1/π) ∫_{-π}^{π} g(x) cos mx dx,    (2.2.3)

    d_m = (1/π) ∫_{-π}^{π} g(x) sin mx dx.    (2.2.4)

From (2.2.4), it is clear that for all g(x), d_0 = 0. Write the program so that the coefficients (c_m, d_m) are calculated in pairs, for m = 0, 1, ..., m_max.

As a test periodic function, g(x), use the sawtooth function of Fig. 2.2.1.

Figure 2.2.1 A periodic sawtooth function.

Method of Solution

The subroutine TROMB is a straightforward implementation of the trapezoidal rule of (2.52a),

    T_{0,1} = (b - a)[f(a) + f(b)]/2,

followed by repeated interval halving using the recursion relation of (2.53), for N = 1, 2, ..., N_max. The Romberg extrapolation formula of (2.61) is then employed for j = 2, 3, ..., j_max, with N = 0, 1, ..., N_max - j + 1, to fill out the remaining elements in the first j_max columns of the matrix T.

The integrands for the integrals of (2.2.3) and (2.2.4), g(x) cos mx/π and g(x) sin mx/π, are evaluated by the functions FUNCTC and FUNCTD, respectively, defined in one multiple-entry function. The periodic function g(x), which for the suggested sawtooth function of Fig. 2.2.1 is given by

    g(x) = x,    -π < x ≤ π,

is also defined in the multiple-entry function. For this periodic function, the coefficients c_m and d_m of (2.2.3) and (2.2.4) may be found analytically: c_m = 0 for all m, and d_m = 2(-1)^{m+1}/m for m = 1, 2, ..., so that

    g(x) = 2(sin x - (sin 2x)/2 + (sin 3x)/3 - ···).    (2.2.10)

In the programs that follow, all c_m and d_m are evaluated for m = 0, 1, ..., m_max. The Romberg tableaus for c_m and d_m are stored in the matrices C and D, respectively.
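Before turning to the FORTRAN, the whole computation can be previewed compactly. The Python sketch below is only an illustration (the helper name romberg_integral is ours): it integrates g(x) sin mx/π by the same trapezoid-plus-extrapolation scheme and reproduces the analytical coefficients d_m = 2(-1)^{m+1}/m for the sawtooth.

```python
import math

def romberg_integral(f, a, b, n_max=12, j_max=7):
    """One Romberg estimate: trapezoidal halving (2.53) followed by
    repeated extrapolation (2.61); returns the most-refined entry."""
    h = b - a
    col = [(f(a) + f(b)) * h / 2.0]
    for n in range(1, n_max + 1):
        step = h / 2.0**n
        col.append(col[-1] / 2.0 +
                   step * sum(f(a + i * step) for i in range(1, 2**n, 2)))
    for j in range(2, j_max + 1):
        c = 4.0**(j - 1)
        col = [(c * col[N + 1] - col[N]) / (c - 1.0) for N in range(len(col) - 1)]
    return col[-1]

g = lambda x: x                                    # the sawtooth on (-pi, pi]
d = [romberg_integral(lambda x, m=m: g(x) * math.sin(m * x) / math.pi,
                      -math.pi, math.pi)
     for m in range(1, 6)]
print([round(v, 9) for v in d])   # close to [2, -1, 2/3, -1/2, 2/5]
```

The default-argument trick (m=m) freezes the coefficient index inside each integrand, playing the role that the COMMON block plays in the FORTRAN version.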
Flow Diagram

Main Program. Read m_max, N_max, and j_max. For m = 0, 1, ..., m_max, call the subroutine TROMB twice to compute the elements of the Romberg tableaus C_{N,j} and D_{N,j}, j = 1, 2, ..., j_max, N = 0, 1, ..., N_max - j + 1, where

    C_{N,j} → (1/π) ∫_{-π}^{π} g(x) cos mx dx = c_m,
    D_{N,j} → (1/π) ∫_{-π}^{π} g(x) sin mx dx = d_m,

then print both tableaus and return to read new data.

Functions FUNCTC, FUNCTD, G (argument: x). FUNCTC returns g(x) cos mx/π and FUNCTD returns g(x) sin mx/π; each evaluates g(x) by reference to the function G.

Subroutine TROMB (dummy arguments: N_max, a, b, f, T, j_max, n; calling arguments: N_max, -π, π, FUNCTC or FUNCTD, C or D, j_max, n). Compute T_{0,1} ← (b - a)[f(a) + f(b)]/2; generate T_{N,1}, N = 1, ..., N_max, by repeated interval halving (2.53); fill out the remaining tableau columns j = 2, ..., j_max from the extrapolation formula (2.61); return.
94 Numerical Integration

FORTRAN Implementation
List of Principal Variables
Program Symbol      Definition

(Main)
C, D†               Matrices C and D, containing the Romberg tableaus for c_m and d_m, respectively.
J                   Column subscript for tableaus, j.
JM                  Maximum column subscript in Nth row of tableau.
JMAX                j_max, number of columns in tableau.
M                   m, index on Fourier coefficients c_m and d_m.
MMAX                m_max, maximum value of m.
N†                  Row subscript for tableaus, N.
NMAX                N_max, maximum value of N.
NMAXP1              N_max + 1.
PI                  π.
MMAXP1              m_max + 1.
MPLUS1              m + 1.

(Functions FUNCTC, FUNCTD, G)
X                   The variable of integration, x.

(Subroutine TROMB)
A, B                Lower and upper limits of integration, a and b.
F                   The integrand function, f.
FR                  (b - a)/2^N.
FORJM1              4^{j-1}.
H                   b - a.
I                   i, index on repeated sum of (2.53).
IMAX                2^N - 1.
NRC                 n, number of rows and columns in tableau T.
NXMJP2              N_max - j + 2.
T†                  Matrix containing the Romberg tableau, T.

† Because of FORTRAN limitations, the row subscripts of the text and flow diagrams are advanced by one when they appear in the program. For example, N assumes values 1, 2, ..., N_max + 1, so that T_{0,1} = T(1,1), T_{Nmax,1} = T(N_max + 1, 1), etc.

Program Listing
Main Program
C     APPLIED NUMERICAL METHODS, EXAMPLE 2.2
C     FOURIER COEFFICIENTS USING ROMBERG INTEGRATION
C
C     THIS TEST PROGRAM CALLS ON THE SUBROUTINE TROMB TO COMPUTE
C     THE INTEGRALS NECESSARY TO DETERMINE THE COEFFICIENTS
C     OF THE FOURIER EXPANSION FOR A FUNCTION G(X) ON THE
C     INTERVAL (-PI,PI) WHERE THE FUNCTION IS PERIODIC FOR ALL
C     X SUCH THAT G(X) = G(X + 2*K*PI), K BEING AN
C     INTEGER.  THE FIRST MMAX COEFFICIENTS OF THE COSINE AND
C     SINE TERMS (THE C(M) AND D(M) OF THE TEXT) ARE COMPUTED
C     USING THE TRAPEZOIDAL RULE WITH REPEATED INTERVAL HALVING
C     FOLLOWED BY THE ROMBERG EXTRAPOLATION PROCEDURE.  THE ROMBERG
C     TABLEAUS FOR C(M) AND D(M) ARE STORED IN THE UPPER TRIANGULAR
C     PORTIONS OF THE FIRST NMAX+1 ROWS OF THE FIRST JMAX COLUMNS OF
C     THE C AND D MATRICES RESPECTIVELY.  FOURIER COEFFICIENTS
C     FOR ANY ARBITRARY PERIODIC FUNCTION CAN BE FOUND BY DEFINING
C     G(X) APPROPRIATELY (SEE THE FUNCTIONS FUNCTC AND FUNCTD).
C
      IMPLICIT REAL*8(A-H, O-Z)
      DIMENSION C(20,20), D(20,20)
      EXTERNAL FUNCTC, FUNCTD
      COMMON M
      DATA PI / 3.1415926535898 /
C
C     ..... READ DATA, CALL TROMB TO COMPUTE INTEGRALS .....
    1 READ (5,100) MMAX, NMAX, JMAX
      WRITE (6,200) MMAX, NMAX, JMAX
      MMAXP1 = MMAX + 1
      DO 3 MPLUS1=1,MMAXP1
      M = MPLUS1 - 1
      CALL TROMB( NMAX, -PI, PI, FUNCTC, C, JMAX, 20 )
      CALL TROMB( NMAX, -PI, PI, FUNCTD, D, JMAX, 20 )
C
C     ..... PRINT OUT ROMBERG TABLEAUS .....
      WRITE (6,201) M
      NMAXP1 = NMAX + 1
      DO 2 N=1,NMAXP1
      JM = JMAX
      IF ( N.GT.NMAXP1+1-JMAX ) JM = NMAXP1 + 1 - N
    2 WRITE (6,202) (C(N,J), J=1,JM)
      WRITE (6,203) M
      DO 3 N=1,NMAXP1
      JM = JMAX
      IF ( N.GT.NMAXP1+1-JMAX ) JM = NMAXP1 + 1 - N
    3 WRITE (6,202) (D(N,J), J=1,JM)
      GO TO 1
C
C     ..... FORMATS FOR INPUT AND OUTPUT STATEMENTS .....
  100 FORMAT ( 7X, I3, 2(12X, I3) )
  200 FORMAT ( 8H1MMAX = , I2/ 8H NMAX = , I2/ 8H JMAX = , I2 )
  201 FORMAT ( 1H0/ 1H0, 9X, 2HC(, I2, 1H)/ 1H  )
  202 FORMAT ( 1H , 1P7E17.8 )
  203 FORMAT ( 1H0/ 1H0, 9X, 2HD(, I2, 1H)/ 1H  )
C
      END

Functions FUNCTC, FUNCTD, G


      FUNCTION FUNCTC( X )
C
C     THE FUNCTIONS FUNCTC AND FUNCTD COMPUTE RESPECTIVELY THE
C     INTEGRAND FOR THE M(TH) COEFFICIENT OF THE COSINE AND SINE
C     TERMS OF THE FOURIER EXPANSION OF THE PERIODIC FUNCTION
C     G(X) = X.
C
      IMPLICIT REAL*8(A-H, O-Z)
      REAL*8 X, FUNCTC, FUNCTD
      COMMON M
      DATA PI / 3.1415926535898 /
C
C     ..... DEFINE PERIODIC FUNCTION .....
      G(X) = X
C
      FUNCTC = G(X)*DCOS(FLOAT(M)*X)/PI
      RETURN
C
      ENTRY FUNCTD( X )
      FUNCTD = G(X)*DSIN(FLOAT(M)*X)/PI
      RETURN
C
      END

Subroutine TROMB

      SUBROUTINE TROMB( NMAX, A, B, F, T, JMAX, NRC )
C
C     THE SUBROUTINE TROMB FIRST APPROXIMATES THE INTEGRAL OF
C     F(X)*DX ON THE INTERVAL (A,B) USING THE TRAPEZOIDAL
C     RULE WITH REPEATED INTERVAL HALVING.  T(N+1,1) IS THE VALUE
C     OF THE INTEGRAL COMPUTED AFTER THE N(TH) INTERVAL-HALVING
C     OPERATION.  ALL T(N+1,1) VALUES ARE COMPUTED FOR N = 0 TO
C     N = NMAX.  H IS THE LENGTH OF THE STARTING INTERVAL (A,B).
C     REMAINING ELEMENTS OF THE ROMBERG TABLEAU ARE THEN ENTERED
C     INTO THE FIRST JMAX COLUMNS OF THE FIRST NMAX+1 ROWS OF THE
C     MATRIX T.
C
      IMPLICIT REAL*8(A-H, O-Z)
      REAL*8 A, B, F, T
      DIMENSION T(NRC,NRC)
C
C     ..... COMPUTE H AND FIRST INTEGRAL APPROXIMATION .....
      H = B - A
      T(1,1) = (F(A) + F(B))*H/2.0
C
C     ..... HALVE INTERVAL REPEATEDLY, COMPUTE T(N+1,1) .....
      DO 2 N=1,NMAX
      T(N+1,1) = 0.0
      FR = H/2.0**N
      IMAX = 2**N - 1
      DO 1 I=1,IMAX,2
    1 T(N+1,1) = T(N+1,1) + F(FLOAT(I)*FR + A)
    2 T(N+1,1) = T(N,1)/2.0 + H*T(N+1,1)/2.0**N
C
C     ..... COMPUTE ROMBERG TABLEAU .....
      DO 3 J=2,JMAX
      NXMJP2 = NMAX - J + 2
      FORJM1 = 4.0**(J-1)
      DO 3 N=1,NXMJP2
    3 T(N,J) = (FORJM1*T(N+1,J-1) - T(N,J-1))/(FORJM1 - 1.0)
C
      RETURN
C
      END

Data
MMAX = 10 NMAX = 13 JMAX = 7
Computer Output

[Printed Romberg tableaus for the selected coefficients c_m and d_m; the numerical pages are not legibly reproducible.]

Discussion of Results

Romberg tableaus are shown only for the coefficients c_0, d_0, d_1, d_5, d_9, and d_10, to conserve space. All calculations were performed in double-precision arithmetic. The results shown for c_0 are typical of those found for all the c_m, m = 0, 1, ..., 10. All entries in the C matrices are in the range 10^{-16} to 5 × 10^{-14}, giving exceptionally good agreement with the true values, c_m = 0. Assuming that D(8,7), corresponding to T_{7,7}, is the most accurate approximation to d_m, the best estimates found by the program are shown in Table 2.2.1. In every case, there is at least nine-figure agreement (the number of figures printed) between the results of the Romberg integration and the exact value of the integral.

Table 2.2.1 Calculated Estimates of d_m

    m     d_m (Calculated)     d_m (Exact)
    0      0.00000000            0
    1      2.00000000            2
    2     -1.00000000           -1
    3      0.666666667           2/3
    4     -0.500000000          -1/2
    5      0.400000000           2/5
    6     -0.333333333          -1/3
    7      0.285714286           2/7
    8     -0.250000000          -1/4
    9      0.222222222           2/9
    10    -0.200000000          -1/5

2.8 Numerical Integration with Unequally Spaced Base Points

All the integration formulas developed in the preceding sections are of the form given by (2.22),

    ∫_a^b f(x) dx ≅ Σ_{i=0}^{n} w_i f(x_i),

where the n + 1 values w_i are the weights to be given to the n + 1 functional values f(x_i). The x_i have been specified to be equally spaced, so there is no choice in the selection of the base points. If the x_i are not so fixed and if we place no other restrictions on them, it follows that there are 2n + 2 undetermined parameters (the w_i and x_i), which apparently might suffice to define a polynomial of degree 2n + 1. The Gaussian quadrature formulas to be developed in Section 2.10 have a form identical with (2.22), that is, they involve the weighted sum of n + 1 functional values. The x_i values to be used are not evenly spaced, however, but are chosen so that the sum of the n + 1 appropriately weighted functional values in (2.22) yields the integral exactly when f(x) is a polynomial of degree 2n + 1 or less. Before proceeding with the development, some background material on orthogonal polynomials is required.

2.9 Orthogonal Polynomials

Two functions g_m(x) and g_n(x) selected from a family of related functions g_k(x) are said to be orthogonal with respect to a weighting function w(x) on the interval [a,b] if

    ∫_a^b w(x) g_m(x) g_n(x) dx = 0,    m ≠ n,
                                = c,    m = n.    (2.63)

In general, c depends on n. If these relationships hold for all n, the family of functions {g_k(x)} constitutes a set of orthogonal functions. Some common families of orthogonal functions are the sets {sin kx} and {cos kx}. Orthogonality can be viewed as a generalization of the perpendicularity property for two vectors in n-dimensional space, where n becomes very large and the elements (coordinates) of the vectors can be represented as continuous functions of some independent variable (see [1] for an interesting discussion and geometric interpretation). For our purposes, the definition (2.63) is adequate.

The functions 1, x, x^2, x^3, ..., x^n are not orthogonal. However, several families of well-known polynomials do possess a property of orthogonality. Four such sets are the Legendre, Laguerre, Chebyshev, and Hermite polynomials.

Legendre Polynomials: P_n(x). The Legendre polynomials are orthogonal on the interval [-1,1] with respect to the weighting function w(x) = 1, that is,

    ∫_{-1}^{1} P_m(x) P_n(x) dx = 0,    m ≠ n.    (2.64)

The first few Legendre polynomials are:

    P_0(x) = 1,
    P_1(x) = x,
    P_2(x) = (3x^2 - 1)/2,    (2.65)
    P_3(x) = (5x^3 - 3x)/2.

The general recursion relation is

    P_{n+1}(x) = [(2n + 1)x P_n(x) - n P_{n-1}(x)]/(n + 1).    (2.66)

Laguerre Polynomials: ℒ_n(x). The Laguerre polynomials are orthogonal on the interval [0,∞] with respect to the weighting function w(x) = e^{-x}, that is,

    ∫_0^∞ e^{-x} ℒ_m(x) ℒ_n(x) dx = 0,    m ≠ n.    (2.67)

The first few Laguerre polynomials are:

    ℒ_0(x) = 1,
    ℒ_1(x) = -x + 1,    (2.68)
    ℒ_2(x) = x^2 - 4x + 2,
    ℒ_3(x) = -x^3 + 9x^2 - 18x + 6.

The general recursion relation is

    ℒ_{n+1}(x) = (2n + 1 - x) ℒ_n(x) - n^2 ℒ_{n-1}(x).    (2.69)

Chebyshev Polynomials: T_n(x). The Chebyshev polynomials, already described in some detail in Chapter 1, are orthogonal on the interval [-1,1] with respect to the weighting function w(x) = 1/√(1 - x^2), that is,

    ∫_{-1}^{1} T_m(x) T_n(x)/√(1 - x^2) dx = 0,    m ≠ n.    (2.70)

The first few polynomials (see Table 1.12 for a more complete list) are:

    T_0(x) = 1,
    T_1(x) = x,    (2.71)
    T_2(x) = 2x^2 - 1.
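These recursion relations give a convenient computational route to the polynomial values themselves. The short Python sketch below (illustrative only; it is not part of the book's programs) evaluates P_n(x) from the Legendre recursion and spot-checks the orthogonality property (2.64) for m = 2, n = 3 with a fine trapezoidal grid:

```python
def legendre(n, x):
    """P_n(x) from the recursion P_{k+1} = ((2k+1)x P_k - k P_{k-1})/(k+1)."""
    if n == 0:
        return 1.0
    p_prev, p = 1.0, x
    for k in range(1, n):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p

# crude numerical check of the orthogonality property: composite trapezoid rule
npts = 4000
h = 2.0 / npts
vals = [legendre(2, -1.0 + i * h) * legendre(3, -1.0 + i * h)
        for i in range(npts + 1)]
integral = h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
print(integral)   # essentially zero, since P_2 and P_3 are orthogonal
```

The same three-term pattern works for the Laguerre, Chebyshev, and Hermite families; only the coefficients in the recursion change.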

The general recursion relation is Collecting the coefficients of like powers of x,

Hermite Polynomials: I;l,(x). The Hermite polynomials


are orthogonal on the interval [- m,m] with respect to
the weighting function e-"', that is,
Thus

The first few Hermite polynomials are:

The general recursion relation is


The reader should verify that the polynomial p4(x)=
+ - + -
x4 3x3 2x2 2x 1 is equivalent to
General Comments on Orthogonal Polynomials. The
sequences of polynomials {Pn(x)), {9,,(x)), {T,,(x)), and
{Hn(x)),respectively satisfying relationships (2.64), (2.67),
(2.70), and (2.73), are unique. Each of the polynomials
P,,(x), 9,,(x), Tn(x), and H,,(x) is an nth-degree polyno-
mial in x with real coefficients and n distinct real roots 2.10 Gaussian Quadrature
interior to the appropriate interval of integration; for
example, all n roots of Pn(x) lie on the open interval Gauss-Legendre Quadrature. As before, we estimate
(- 1,l). Stroud and Secrest [7] discuss these and other
properties of several families of orthogonal polynomials
the value of the integral I f (x) dx by approximating the
a
b

in detail. function f (x) with an nth-degree interpolating poly-


An arbitrary nth-degree polynomial pn(x) = &, aixi nomial p,(~),and integrate as follows:
may be represented by a linear function of any of the
above families of orthogonal polynomials. Thus
Here, R,(x) is the error term for the nth-degree inter-
polating polynomial. Since the xi are not yet specified,
the Lagrangian form of the interpolating polynohial,
(1.43), which permits arbitrarily spaced base points, will
where Zi(x) is the ith-degree polynomial of one of the be used with its error term, (1.39).
families of orthogonal polynomials. Expansions for the
monomials xi in terms of the Chebyshev polynomials are
given in Table 1.13.
Example. Expand the fourth-degree polynomial p&) in
terms of the Legendre polynomials. a < S < b, (2.78)
Substitution of the polynomials P,(x) of (2.65) into (2.76)
leads to where

% simplify the development somewhat, but without


removing any generality of the result, the intcrval of

integration will be changed from [a,b] to [-1,1] by a Note that (2.84) is of the desired form (2.22) and that the
suitable transformation of variable. Assume that all the second integral on the right-hand side of (2.83),
base points are in the interval of integration, that is,
a < xo, x,, . ..,xn,< b. Let the new variable, z, where
- 1 < z < 1, be defined by
is then the error term for the integration or quadrature
2x - (a + b) formula of (2.84). The object is to select the zi in such a
z= (2.79)
b-a way that the error term (2.86) vanishes. The orthogonality
Define also a new function F(z) so that property of the Legendre polynomials of (2.64) will be
used to establish such values zi.
First, expand the two polynomials qn(z) and
n L 0 [ z - zi) in terms of the Legendre polynomials,
as illustrated in Section 2.9:
Then (2.78) becomes

where

Li(z)= n4=0 -(z-zj)


(zi - zj)
and -l<e<l.
and
J +'

Here, zi is simply the base-point value xi transformed by


(2.79). Now if f(x) is assumed to be a polynomial of
degree 2n + 1 as suggested earlier, then the term
F("+')(R/(n + 1) ! must be a polynomial of degree n, since
The product qn(z) n7=o(z - zi) is, from (2.87) and (2.88),
xLoLi(z)F(zi) is a polynomial of degree n at most, and
nYeo (z - zi) is a polynomial of degree n + 1. Let

The integral of (2.86) then assumes the form

where qn(z) is a polynomial of degree n. Then

Because of the orthogonality properties of the Legendre


Now integration of both sides of (2.82) between the polynomials, all terms of this integral that are of the
integration limits - 1 and 1 gives form

will vanish [see (2.64)]. Thus the error term (2.86) for
the quadrature formula of (2.84) may be written
Since the F(zi) are fixed values, the summation operator
can be taken outside the integral sign. Dropping the
right-most integral, (2.83) becomes
= ibici1 [pi(z)I2 dz.
i=O
1

-1
(2.92)
One way to make this expression vanish is to specify that
+
the first n 1 of the bi, i = 0,1,. . .,lt, are zero. The
coefficient b,+l of Pn+,(z) is still unspecified, but from
where
(2.87) it must be given by
For example, for n = 3, the coefficient of the high-order (z⁴) term of P_{n+1}(z) = P_4(z) is 35/8 [see (2.65)]. Since the high-order coefficient of Π_{i=0}^{n} (z - z_i) is 1, b_4 = 8/35.

The important feature of (2.93) is that the polynomial Π_{i=0}^{n} (z - z_i) is already in factored form, that is, it has the n + 1 roots z_i, i = 0, 1, ..., n. Since b_{n+1}P_{n+1}(z) is the same polynomial, the z_i must be the roots of b_{n+1}P_{n+1}(z) as well, or equivalently, of P_{n+1}(z). Thus the n + 1 base points to be used in the integration formula of (2.84) are the n + 1 roots of the appropriate [(n + 1)th-degree] Legendre polynomial. The relative weight assigned each functional value F(z_i) is given by (2.85). Values of the appropriate base points (roots) and weight factors for n = 1, 2, 3, 4, 5, 9, and 14 (corresponding to the 2, 3, 4, 5, 6, 10, and 15-point formulas respectively) are shown in Table 2.2. The integration formulas of (2.84) with base

Table 2.2 Roots of the Legendre Polynomials P_{n+1}(z) and the Weight Factors for the Gauss-Legendre Quadrature [4]

    ∫_{-1}^{1} F(z) dz ≅ Σ_{i=0}^{n} w_i F(z_i)

    Roots (z_i)          Weight Factors (w_i)

    Two-Point Formula, n = 1
    ±0.577350269         1.000000000

    Three-Point Formula, n = 2
     0.0                 0.888888889
    ±0.774596669         0.555555556

    Four-Point Formula, n = 3
    ±0.339981044         0.652145155
    ±0.861136312         0.347854845

    Five-Point Formula, n = 4
     0.0                 0.568888889
    ±0.538469310         0.478628671
    ±0.906179846         0.236926885

    Six-Point Formula, n = 5
    ±0.238619186         0.467913935
    ±0.661209386         0.360761573
    ±0.932469514         0.171324492

    Ten-Point Formula, n = 9
    ±0.148874339         0.295524225
    ±0.433395394         0.269266719
    ±0.679409568         0.219086363
    ±0.865063367         0.149451349
    ±0.973906529         0.066671344

    Fifteen-Point Formula, n = 14
     0.0                 0.202578242
    ±0.201194094         0.198431485
    ±0.394151347         0.186161000
    ±0.570972173         0.166269206
    ±0.724417731         0.139570678
    ±0.848206583         0.107159221
    ±0.937273392         0.070366047
    ±0.987992518         0.030753242
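The entries of Table 2.2 need not be copied by hand: because the base points are the roots of P_{n+1}(z), they can be regenerated numerically. The sketch below (Python, illustrative only; Example 3.4 of the text treats the root computation differently) locates the roots by Newton's method and evaluates the weights from the standard closed form w_i = 2/[(1 - z_i²)P'_{n+1}(z_i)²], which is equivalent to the integrals (2.85):

```python
import math

def leg_and_deriv(n, x):
    """P_n(x) via the three-term recursion, plus P_n'(x) from the
    derivative identity P_n'(x) = n(x P_n - P_{n-1})/(x^2 - 1)."""
    p_prev, p = 1.0, x
    for k in range(1, n):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    dp = n * (x * p - p_prev) / (x * x - 1.0)
    return p, dp

def gauss_legendre(npts):
    """Roots of P_{npts} and the Gauss-Legendre weights on [-1, 1]."""
    roots, weights = [], []
    for i in range(npts):
        x = math.cos(math.pi * (i + 0.75) / (npts + 0.5))  # standard initial guess
        for _ in range(60):                                # Newton iteration
            p, dp = leg_and_deriv(npts, x)
            x -= p / dp
        p, dp = leg_and_deriv(npts, x)
        roots.append(x)
        weights.append(2.0 / ((1.0 - x * x) * dp * dp))
    return roots, weights

z, w = gauss_legendre(2)
print(sorted(z), w)   # roots near ±0.577350269, both weights 1
```

A quick sanity check on any such table is that the weights must sum to 2, the length of the standard interval.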

points given by the roots ziand weight factors w ilisted and


in Table 2.2 are called the Gauss-Legendre quadrature
formulas. Computation of the roots ziand weight factors
w ifor several values of n is illustrated in Example 3.4.
The five-point quadrature is given by
Example. Use the two-point Gauss-Legendre quadrature
formula to evaluate

The evaluation of the five-point formula for the given


F(z) is shown in Table 2.3. The computed estimate,
From Table 2.2, the two-point formula is In 2 = 0.69314712, is accurate to six figures.
Normally the limits of integration will not be - 1 and 1,
as required by the Gauss-Legendre quadrature formulas.
One approach to the evaluation of
We have (2.94)
F(-0.57735.. .) = 0.56353297
F(S0.57735.. .) = 2.10313369. where a and b are arbitrary but finite, is to transform the
function f(x), a < x < b, to the interval - 1 ,< z ,< 1 ,
+
-
Since both weights are 1, F(z) dz = F(zo) F ( z l ) , so that
using- the transformation of (2.79) as illustrated in the
last example. An aiternative and usually much simpler
-I
(z3 + z 2 + z + 1) dz 2.66666666.
approach is to transform the Gauss-Legendre quadrature
To the given number of figures, this result is exact (as would formula from the standard interval - 1 < z < 1 to the
be expected), since the two-point formula (n = 1) is exact desired interval a < x < b, using the inverse of (2.79),
when F(z) is a polynomial of degree three (2n 1) or less. + z(b - a ) + b + a
X = (2.95)
Example. Use the five-point Gauss-Legendre quadrature to 2
compute an estimate of In 2, that is, evaluate
Then (2.94) becomes
JI2$ - ln x
1: = In 2 = 0.69314718.
2 2
Transforming the variable from 1
using (2.79),
Gx G3 to - 1 < z G 1, .
(2.96),
Since the standard Gauss-Legendre quadrature is given
2xP(b+a) -2x-2-1
Z = - =2x-3. by
b-a 2-1 1 n

Then
J- 1 ~ (d z~ =)
i=O
wi~(zi), (2.97)

the integral of (2.96) can be approximated by

Table 2.3 Five-Point Gauss-Legendre Quadrature


2.10 Gaussian Quadrature 105

The general formulation of the Gauss-Legendre quadra- past. The weight factors and base points for the compu-
ture given by (2.98) is particularly suitable for machine tation are inconvenient numbers for hand calculation. On
computation because it does not require symbolic trans- a digital computer, however, the presence of such numbers
formation offix); instead, the base points zi are trans- makes no difference in calculations, provided that the
formed and the weight factors wj are modified by the function can be 'evaluated at the necessary arguments (if
-
constant (b a)/2. the function is available only in tabulated form, then it
will probably be necessary to interpolate in the table).
Example. Use the two-point Gauss-Legendre quadrature In the derivation of the quadrature formulas, great
formula to evaluate
emphasis has been placed on accurate integration of poly-
J:(x3+x2+x+l)dx=341.
nomial functions, so that (2.84) yields exactly the integral
+
when F(z) is a polynomial of degree 2n 1 or less. In
Using (2.98) with a = 1, b == 3, and the w , and z, from Table most real situations, of course, F(z) is not a polynomiai
2.2 for n = 1, at all; the question of formula accuracy is, as always, of
prime importance. Unfortunately the derivation of the
error term corresponding to (2.84) is quite tedious. The
development is not included here, but a complete des-
cription may be found on pages 314-325 of [2]. The error
term for the Gauss-Legendre quadrature formula of
(2.84) is

To the number of figures retained in the calculation, the Provided that the magnitudes of high-order derivatives
results are exact, as expectled, since f(x) is a polynomial of decrease or do not increase substantially with increasing
+
degree 2n 1.
~ l , the Gauss-Legenhe formulas are signifjcanily mDJE
The preceding example illustrates why the Gauss- accurate than the equal-interval formulas of-the preceding
Legendre quadrature formulas have been little used in the sections for comparable values of n.
EXAMPLE 2.3
GAUSS-LEGENDRE QUADRATURE

Problem Statement

Write a general-purpose function named GAUSS that uses the m-point Gauss-Legendre quadrature formula to evaluate numerically the integral

    ∫_a^b f(x) dx,    (2.3.1)

where f(x) is any single-valued function and a and b are finite. The function should incorporate the necessary Legendre polynomial roots, z_i, and the corresponding weight factors, w_i, from Table 2.2 for the 2, 3, 4, 5, 6, 10, and 15-point quadrature formulas.

Method of Solution

The integral of (2.3.1) can be evaluated numerically by implementing the algorithm of (2.98), provided the appropriate w_i and z_i values are available. An examination of Table 2.2 shows that the roots z_i for the m-point formula are placed symmetrically about the origin. For m odd, there is one root z_i = 0 and (m - 1)/2 root pairs z_i = -z_{i+1}, i = 1, 3, ..., m - 2. The weights w_i and w_{i+1} associated with the root pairs are identical. For m even, there are m/2 root pairs, and no zero root.

Because nearly half of the z_i in Table 2.2 appear in root pairs, only the magnitudes of the z_i and the corresponding weight factors need be available to the function GAUSS. Let the magnitudes of the roots and the corresponding weight factors from Table 2.2 be assigned to elements of two vectors, z and w, as shown in Table 2.3.1.

Table 2.3.1 Assignment of the Roots and Weight Factors to the z and w Vectors

In order to locate the roots and weight factors for the m-point formula, it is convenient to assign the following values to two vectors, p and k:

    p = [2, 3, 4, 5, 6, 10, 15],
    k = [1, 2, 4, 6, 9, 12, 17, 25].

To find the elements of the z and w vectors for the m-point formula, the p array is searched for an element equal to m, i.e., for p_i = m. The desired roots and weight factors are z_{k_i}, ..., z_{k_{i+1}-1} and w_{k_i}, ..., w_{k_{i+1}-1}, respectively. For example, the proper constants for the 5-point formula can be found by scanning the p vector and noting that p_4 = 5. Then k_4 = 6 and k_5 = 9. The desired elements of the z and w vectors are z_6, z_7, z_8 and w_6, w_7, w_8, respectively.

Let c = (b - a)/2 and d = (b + a)/2. Then (2.98) may be rewritten as

    ∫_a^b f(x) dx ≅ c Σ_{j=k_i}^{k_{i+1}-1} w_j [f(cz_j + d) + f(-cz_j + d)]    (2.3.2)

for even values of m. For odd values of m, (2.3.2) also applies except when j = k_i. In this case z_j has a zero value and does not occur in a root pair; the factor w_j f(d) should be added just once to the accumulated sum.

In the program that follows, the function GAUSS checks to insure that m has been assigned one of the legitimate values 2, 3, 4, 5, 6, 10, or 15. If not, GAUSS returns a true zero as its value.

A short calling program is included to test GAUSS. It reads values for a, b, and m, calls on GAUSS to compute the required integral, and prints the results. The function f(x) is defined as a function named FUNCTN. In this example, f(x) = 1/x, i.e., the integral to be evaluated is

    ∫_a^b dx/x = ln b - ln a.

When a = 1, the results can be compared directly with tabulated values of ln b.
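The lookup-and-pairing scheme of (2.3.2) carries over directly to other languages. The Python rendering below is only an illustration (it stores just the 2- and 3-point constants, with 0-based offsets in place of the FORTRAN KEY values), but it mirrors GAUSS's control flow, including the zero return for an invalid m:

```python
# magnitudes of roots and weights, as in Table 2.3.1 (2- and 3-point sets only)
NPOINT = [2, 3]
KEY    = [0, 1, 3]                 # 0-based offsets into Z/WEIGHT
Z      = [0.577350269, 0.0, 0.774596669]
WEIGHT = [1.0, 0.888888889, 0.555555556]

def gauss(f, a, b, m):
    """m-point Gauss-Legendre quadrature via (2.3.2); 0.0 for invalid m."""
    if m not in NPOINT:
        return 0.0                                 # invalid m used
    i = NPOINT.index(m)
    c, d = (b - a) / 2.0, (b + a) / 2.0
    s = 0.0
    for j in range(KEY[i], KEY[i + 1]):
        if Z[j] == 0.0:
            s += WEIGHT[j] * f(d)                  # lone zero root (m odd)
        else:
            s += WEIGHT[j] * (f(c * Z[j] + d) + f(-c * Z[j] + d))
    return c * s

print(gauss(lambda x: 1.0 / x, 1.0, 2.0, 2))   # cf. 0.692308 in the computer output
print(gauss(lambda x: 1.0 / x, 1.0, 2.0, 99))  # 0.0 (invalid m)
```

Extending the three constant tables to all seven point counts reproduces the FORTRAN version exactly.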
Flow Diagram

Main Program. Read a, b, and m; compute the integral using the m-point Gauss-Legendre quadrature formula (function GAUSS); print a, b, m, and the result; return to read the next data set.

Function FUNCTN (dummy argument: x). Returns f(x) = 1/x.

Function GAUSS* (dummy arguments: a, b, m, f; calling arguments: a, b, m, FUNCTN). Search for m = p_i, i = 1, 2, ..., 7; if no match is found, return zero. Otherwise set c ← (b - a)/2, d ← (b + a)/2, s ← 0; then, for j = k_i, ..., k_{i+1} - 1, add w_j f(d) to s when z_j = 0, and w_j[f(cz_j + d) + f(-cz_j + d)] otherwise (function FUNCTN); finally return cs.

*The vectors p, k, w, and z are assumed to have appropriate values upon entry (see text).

FORTRAN Implementation
List of Principal Variables
Program Symbol      Definition

(Main)
A, B                Integration limits, a and b.
AREA                Computed value of the integral (2.3.1).
M                   Number of points in the Gauss-Legendre quadrature formula, m.

(Function GAUSS)
C                   c = (b - a)/2.
D                   d = (b + a)/2.
I                   Subscript for vectors p and k, i.
J                   Index on the repeated sum of (2.3.2).
JFIRST              Initial value of j, k_i.
JLAST               Final value of j, k_{i+1} - 1.
KEY                 Vector k.
NPOINT              Vector p.
SUM                 Repeated sum of (2.3.2).
WEIGHT              Vector of weight factors, w_i.
Z                   Vector of Legendre polynomial roots, z_i.

(Function FUNCTN)
X                   Integration variable, x.
Example 2.3 Gauss-Legendre Quadrature

Program Listing
Main Program
C     APPLIED NUMERICAL METHODS, EXAMPLE 2.3
C     GAUSS-LEGENDRE QUADRATURE
C
C     THIS CALLING PROGRAM READS VALUES FOR A, B, AND M, CALLS
C     ON THE FUNCTION GAUSS TO COMPUTE THE NUMERICAL APPROXIMATION
C     OF THE INTEGRAL OF FUNCTN(X)*DX BETWEEN INTEGRATION LIMITS
C     A AND B USING THE M POINT GAUSS-LEGENDRE QUADRATURE FORMULA,
C     PRINTS THE RESULTS, AND RETURNS TO READ A NEW SET OF
C     DATA VALUES.
C
      IMPLICIT REAL*8(A-H, O-Z)
      EXTERNAL FUNCTN
C
      WRITE (6,200)
    1 READ (5,100) A, B, M
      AREA = GAUSS( A, B, M, FUNCTN )
      WRITE (6,201) A, B, M, AREA
      GO TO 1
C
C     ..... FORMATS FOR INPUT AND OUTPUT STATEMENTS .....
  100 FORMAT ( 4X, F8.4, 10X, F8.4, 10X, I3 )
  200 FORMAT ( 1H1, 10X, 1HA, 14X, 1HB, 10X, 1HM, 9X, 4HAREA / 1H )
  201 FORMAT ( 1H , 2F15.6, I7, F15.6 )
C
      END

Function FUNCTN
      FUNCTION FUNCTN( X )
C
C     ..... THIS FUNCTION RETURNS 1/X AS ITS VALUE .....
      REAL*8 X, FUNCTN
      FUNCTN = 1.0/X
      RETURN
C
      END

Function GAUSS
FUNCTION GAUSS( A, 8, M, FUNCTN

THE FUNCTION GAUSS USES THE M-POINT GAUSS-LEGENDRE QUADRATURE


FORMULA TO COMPUTE THE INTEGRAL OF FUNCTN(X)*DX BETWEEN
INTEGRATION L I M I T S A AND B. THE ROOTS OF SEVEN LEGENDRE
POLYNOMIIALS AND THE WEIGHT FACTORS FOR THE CORRESPONDING
QUADRATORES ARE STORED I N THE Z AND WEIGHT ARRAYS
RESPECTIVELY. M MAY ASSUME VALUES 2,3,4,5,6,10, AND 1 5
ONLY. THE APPROPRIATE VALUES FOR THE M-POINT FORMULA - ARE-
LOCATED I N ELEMENTS Z(KEY(l))...Z(KEY(I+l)-1) AND
WEIGHT(KEY(I))...WEIGHT(KEY(I+l)-1) WHERE THE PROPER
VALUE OF I I S DETERMINED BY F I N D I N G THE SUBSCRIPT OF THE
ELEMENT OF THE ARRAY NPOINT WHICH HAS THE VALUE M. I F AN
I N V A L I D VALUE- OF M I S USED, A TRUE ZERO I S RETURNED AS THE
VALUE OF GAUSS.
I M P L l C l T REAL*8(A-H, 0-Z)
REAL*,% GAIJSS, A, B, FUNCTN
DIMENSION NPOINT(7), KEY(B), Z(24), WEIGHT(24)
110 Numerical Integration

Program Listing (Continued)


C
C        ..... PRESET NPOINT, KEY, Z, AND WEIGHT ARRAYS .....
      DATA NPOINT / 2, 3, 4, 5, 6, 10, 15 /
C
      DATA KEY / 1, 2, 4, 6, 9, 12, 17, 25 /
C
      DATA Z / 0.577350269,0.0        ,0.774596669,
     1  0.339981044,0.861136312,0.0        ,0.538469310,
     2  0.906179846,0.238619186,0.661209387,0.932469514,
     3  0.148874339,0.433395394,0.679409568,0.865063367,
     4  0.973906529,0.0        ,0.201194094,0.394151347,
     5  0.570972173,0.724417731,0.848206583,0.937273392,
     6  0.987992518 /
C
      DATA WEIGHT / 1.0        ,0.888888889,0.555555556,
     1  0.652145155,0.347854845,0.568888889,0.478628671,
     2  0.236926885,0.467913935,0.360761573,0.171324493,
     3  0.295524225,0.269266719,0.219086363,0.149451349,
     4  0.066671344,0.202578242,0.198431485,0.186161000,
     5  0.166269206,0.139570678,0.107159221,0.070366047,
     6  0.030753242 /
C
C        ..... FIND SUBSCRIPT OF FIRST Z AND WEIGHT VALUE .....
      DO 1  I=1,7
      IF (M.EQ.NPOINT(I))  GO TO 2
 1    CONTINUE
C
C        ..... INVALID M USED .....
      GAUSS = 0.0
      RETURN
C
C        ..... SET UP INITIAL PARAMETERS .....
 2    JFIRST = KEY(I)
      JLAST  = KEY(I+1) - 1
      C = (B-A)/2.0
      D = (B+A)/2.0
C
C        ..... ACCUMULATE THE SUM IN THE M-POINT FORMULA .....
      SUM = 0.0
      DO 5  J=JFIRST,JLAST
      IF ( Z(J).EQ.0.0 )  SUM = SUM + WEIGHT(J)*FUNCTN(D)
 5    IF ( Z(J).NE.0.0 )  SUM = SUM + WEIGHT(J)*(FUNCTN(Z(J)*C + D)
     1    + FUNCTN(-Z(J)*C + D))
C
C        ..... MAKE INTERVAL CORRECTION AND RETURN .....
      GAUSS = C*SUM
      RETURN
C
      END
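For readers following in a modern language, the table-driven logic of GAUSS translates almost line for line. The Python sketch below is an illustrative translation, not part of the original program; it keeps only the 2-, 3-, and 4-point entries of the Z and WEIGHT tables, stores just the nonnegative roots, and, like GAUSS, evaluates each nonzero root z at both d + zc and d − zc.

```python
import math

# Nonnegative roots and weights for the 2-, 3-, and 4-point
# Gauss-Legendre formulas (a subset of the Z and WEIGHT tables).
TABLES = {
    2: [(0.577350269, 1.0)],
    3: [(0.0, 0.888888889), (0.774596669, 0.555555556)],
    4: [(0.339981044, 0.652145155), (0.861136312, 0.347854845)],
}

def gauss(a, b, m, functn):
    """m-point Gauss-Legendre estimate of the integral of functn on [a, b].

    Returns 0.0 for an unsupported m, as the FORTRAN GAUSS does."""
    if m not in TABLES:
        return 0.0
    c = (b - a) / 2.0          # half-length of the interval
    d = (b + a) / 2.0          # midpoint of the interval
    total = 0.0
    for z, w in TABLES[m]:
        if z == 0.0:           # the midpoint root appears once
            total += w * functn(d)
        else:                  # each nonzero root appears as +z and -z
            total += w * (functn(z * c + d) + functn(-z * c + d))
    return c * total

# Reproduce the first column of the computer output: integral of 1/x on [1, 2].
print(gauss(1.0, 2.0, 4, lambda x: 1.0 / x))   # close to ln 2 = 0.693147
```

As in the FORTRAN version, an unsupported point count falls through to a "true zero" return rather than raising an error.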

Data

Example 2.3 Gauss-Legendre Quadrature
Computer Output
AREA
0.692308
0.693122
0.693146
0.693147
0.693147
0.693147
0.693147
1.565217
1.602694
1.608430
1.609289
1.609416
1.609438
1.609438
2.106383
2.246610
2.286970
2.298283
2.301808
2.302579
2.302585
2.488565
2.779872
2.905192
2.958221
2.980324
2.995311
2.995728
0.0
0.0
0.0

Discussion of Results

The program has been tested for the integrand function f(x) = 1/x with integration limits a = 1 and b = 2, 5, 10, 20, leading to the true integrals ln 2, ln 5, ln 10, and ln 20, respectively (see Table 2.3.2). Each integral has been evaluated numerically using the 2, 3, 4, 5, 6, 10, and 15-point Gauss-Legendre quadrature formulas.

In terms of the transformed variable z, −1 ≤ z ≤ 1 (see (2.95)), the integrand function is given by

    F(z) = (b − 1)/[(b − 1)z + b + 1].

Then, from (2.99), the error for the m-point quadrature is

    E_m = {2^(2m+1)(m!)^4/[(2m + 1)[(2m)!]^3]} F^(2m)(ξ),   −1 < ξ < 1,

where

    F^(2m)(ξ) = (2m)!(b − 1)^(2m+1)/[(b − 1)ξ + b + 1]^(2m+1).

Since (b − 1)ξ + b + 1 has a minimum value of 2 for b ≥ 1, the maximum truncation errors for the computed logarithms are given by

    |E_m| ≤ (m!)^4 (b − 1)^(2m+1)/{(2m + 1)[(2m)!]^2}.

The error bound increases with increasing b, i.e., as the length of the integration interval increases, and decreases with increasing m, the number of points in the quadrature formula. The computed approximations for ln 2, ln 5, ln 10, and ln 20 show similar trends for the actual error.

Table 2.3.2 True Integral Values

    Data Sets     True Integral
    1-7           0.693147
    8-14          1.609438
    15-21         2.302585
    22-28         2.995732
    29-31         Illegal data values

An alternative, though computationally less efficient, approach to integration by Gauss-Legendre quadrature is illustrated in Example 3.4. Recursion relation (2.66) is used to generate the coefficients of the appropriate Legendre polynomial, and the base points z_i are found by using the half-interval root-finding method of Section 3.8. The corresponding weight factors w_i are generated by evaluating the integral of (2.85), which can be determined analytically.

With only minor modifications, GAUSS could be changed to allow evaluation of Gauss-Legendre quadrature formulas of the composite type.
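Both error trends are easy to confirm numerically. The Python sketch below (illustrative only, not part of the original example) measures the actual error of the m-point rule for ∫ from 1 to b of dx/x, using the 3- and 6-point roots and weights taken from the Z and WEIGHT tables of the program listing.

```python
import math

# Nonnegative roots and weights from the program's Z and WEIGHT tables.
RULES = {
    3: [(0.0, 0.888888889), (0.774596669, 0.555555556)],
    6: [(0.238619186, 0.467913935), (0.661209387, 0.360761573),
        (0.932469514, 0.171324493)],
}

def gl_error(b, m):
    """Absolute error of the m-point Gauss-Legendre estimate of ln b."""
    c, d = (b - 1.0) / 2.0, (b + 1.0) / 2.0   # transformation (2.95), a = 1
    s = 0.0
    for z, w in RULES[m]:
        if z == 0.0:
            s += w / d
        else:
            s += w * (1.0 / (z * c + d) + 1.0 / (-z * c + d))
    return abs(c * s - math.log(b))

print(gl_error(2.0, 3), gl_error(2.0, 6))    # error falls as m grows
print(gl_error(2.0, 3), gl_error(20.0, 3))   # error rises as b grows
```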
2.10 Gaussian Quadrature

Gauss-Laguerre Quadrature. The Laguerre polynomials of (2.68) can be used to generate a Gaussian quadrature formula to evaluate integrals of the form

    ∫₀^∞ e^(−z) F(z) dz ≈ Σ(i=0 to n) w_i F(z_i).   (2.100)

The derivation of the integration formula of (2.100), which is known as the Gauss-Laguerre quadrature, is very similar to that for the Gauss-Legendre quadrature of the preceding section. As before, the Lagrange form of the interpolating polynomial (1.43) with its error term (1.39) is used to approximate the function F(z), since the z_i are as yet unspecified. Thus

    F(z) = Σ(i=0 to n) L_i(z)F(z_i) + [Π(i=0 to n)(z − z_i)] F^(n+1)(ξ)/(n + 1)!,

where

    L_i(z) = Π(j=0 to n, j≠i) (z − z_j)/(z_i − z_j).

Assume that F(z) is a polynomial of degree 2n + 1; then F^(n+1)(ξ)/(n + 1)! must be a polynomial of degree n. Let this polynomial be q_n(z), so that

    F(z) = Σ(i=0 to n) L_i(z)F(z_i) + q_n(z) Π(i=0 to n)(z − z_i).   (2.103)

To evaluate ∫₀^∞ e^(−z)F(z) dz, multiply each term of (2.103) by e^(−z) and integrate both sides:

    ∫₀^∞ e^(−z)F(z) dz = Σ(i=0 to n) F(z_i) ∫₀^∞ e^(−z)L_i(z) dz
                         + ∫₀^∞ e^(−z) q_n(z) Π(i=0 to n)(z − z_i) dz.   (2.104)

By the orthogonality property of the Laguerre polynomials, the remainder term on the right can be made to vanish if Π(i=0 to n)(z − z_i) is a constant multiple of the Laguerre polynomial ℒ_(n+1)(z). The argument used here is identical with that used previously [see (2.86)-(2.92)] for the Gauss-Legendre quadrature. Expand q_n(z) in terms of the Laguerre polynomials of degree n or less. Then if Π(i=0 to n)(z − z_i) is (−1)^(n+1) ℒ_(n+1)(z), the integral on the right of (2.104) vanishes. Thus the base points z_i to be used for the n + 1 = m-point Gauss-Laguerre quadrature are simply the roots of the (n + 1)th-degree Laguerre polynomial ℒ_(n+1)(z). The corresponding weight factors are given by the coefficients of the F(z_i) in (2.104), that is,

    w_i = ∫₀^∞ e^(−z) L_i(z) dz = ∫₀^∞ e^(−z) Π(j=0 to n, j≠i) [(z − z_j)/(z_i − z_j)] dz.   (2.105)

The appropriate base points z_i and weight factors w_i for n = 1, 2, 3, 4, 5, 9, and 14 (for the m = 2, 3, 4, 5, 6, 10, and 15-point formulas, respectively) are shown in Table 2.4.

Table 2.4 Roots of the Laguerre Polynomials ℒ_(n+1)(z) and the Weight Factors^a for the Gauss-Laguerre Quadrature [4]

    Roots (z_i)            Weight Factors (w_i)

    Two-Point Formula, n = 1
     .58578 64376 27        .85355 33905 93
    3.41421 35623 73        .14644 66094 07

    Three-Point Formula, n = 2
     .41577 45568           .71109 30099 29
    2.29428 03603           .27851 77335 69
    6.28994 50829           .(1) 10389 25650 16

    Four-Point Formula, n = 3
     .32254 76896           .60315 41043 42
    1.74576 11012           .35741 86924 38
    4.53662 02969           .(1) 38887 90851 50
    9.39507 09123           .(3) 53929 47055 61

    Five-Point Formula, n = 4

    Six-Point Formula, n = 5

    Ten-Point Formula, n = 9

    Fifteen-Point Formula, n = 14

^a The numbers in parentheses indicate the number of zeros between the decimal point and the first significant digit.
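The four-point entries of Table 2.4 can be exercised directly. The Python sketch below is illustrative only; the base points are the standard tabulated four-point Gauss-Laguerre values (quoted to ten figures), paired with the weight factors above. It verifies the identities Σw_i = 1, Σw_i z_i = 1, and Σw_i z_i² = 2 that underlie the worked examples of this section.

```python
import math

# Four-point Gauss-Laguerre roots and weights (n = 3); weights as in
# Table 2.4, roots from the standard tables (assumed values).
Z4 = (0.3225476896, 1.7457611012, 4.5366202969, 9.3950709123)
W4 = (0.6031541043, 0.3574186924, 0.0388879085, 0.0005392947)

def laguerre4(F):
    """4-point estimate of the integral of exp(-z)*F(z) over [0, infinity)."""
    return sum(w * F(z) for z, w in zip(Z4, W4))

# Gamma(2) = 1 exactly, since F(z) = z has degree < 2n + 1.
print(laguerre4(lambda z: z))

# Gamma(1.8) is inexact, since z**0.8 is not a polynomial (true value 0.931384).
print(laguerre4(lambda z: z ** 0.8))

# Shifted integral: exp(-1) * sum of w*(z+1)**2 estimates the integral of
# exp(-x)*x**2 on [1, infinity), whose analytical value is 5/e.
print(math.exp(-1.0) * laguerre4(lambda z: (z + 1.0) ** 2))
```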
The error for the Gauss-Laguerre quadrature is given by

    E = {[(n + 1)!]²/(2n + 2)!} F^(2n+2)(ξ),   0 < ξ < ∞.   (2.106)

The Gauss-Laguerre quadrature of (2.100) can be used to evaluate integrals of the form

    ∫_a^∞ e^(−x) f(x) dx,   (2.107)

where a is arbitrary and finite, by means of the linear transformation x = z + a. Then (2.107) becomes

    ∫_a^∞ e^(−x) f(x) dx = ∫₀^∞ e^(−(z+a)) f(z + a) dz,   (2.108)

and the general formulation of the Gauss-Laguerre quadrature for arbitrary lower limit of integration a is

    ∫_a^∞ e^(−x) f(x) dx ≈ e^(−a) Σ(i=0 to n) w_i f(z_i + a),   (2.109)

where the w_i and z_i are those tabulated in Table 2.4.

Example. Use the Gauss-Laguerre quadrature formula to estimate Γ(2.0) and Γ(1.8), where

    Γ(a) = ∫₀^∞ e^(−z) z^(a−1) dz.

For integral arguments a, Γ(a) = (a − 1)!. From (2.100),

    Γ(a) ≈ Σ(i=0 to n) w_i z_i^(a−1).

For a = 2, examination of Table 2.4 shows that

    Σ(i=0 to n) w_i z_i = 1   for all n.

Therefore, Γ(2) = 1! = 1. In this case, the solution is exact, since F(z) = z is a polynomial of degree less than 2n + 1 for all n.

For a = 1.8,

    Γ(1.8) ≈ Σ(i=0 to n) w_i z_i^0.8.

Since F(z) = z^0.8 is not a polynomial, the quadrature will be inexact. The true value is Γ(1.8) = 0.931384. Results for the quadrature with n = 1, 2, 3, 4, 5, 9, and 14 are listed in Table 2.5.

Table 2.5 Γ(1.8) by Gauss-Laguerre Quadrature

Example. Use (2.109) to evaluate

    ∫₁^∞ e^(−x) x² dx,

for which the analytical value is 5/e. From (2.109), the integral is approximated by

    e^(−1) Σ(i=0 to n) w_i (z_i + 1)².

Examination of Table 2.4 shows that

    Σ w_i z_i² = 2,   Σ w_i z_i = 1,   and   Σ w_i = 1,

for all n. The solution is exact, as expected, since f(x) is a polynomial of degree less than 2n + 1 for all n.

Gauss-Chebyshev Quadrature. Yet another Gaussian quadrature formula can be developed by using the orthogonality property of the Chebyshev polynomials. The development is completely analogous to that followed in producing the Gauss-Legendre and Gauss-Laguerre quadratures. In this case, the pertinent integral and corresponding Gauss-Chebyshev quadrature formula are given by

    ∫₋₁¹ [F(z)/√(1 − z²)] dz ≈ Σ(i=0 to n) w_i F(z_i).   (2.111)

The integration is exact if F(z) is a polynomial of degree 2n + 1 or less. Here, the n + 1 values z_i are the roots of the (n + 1)th-degree Chebyshev polynomial T_(n+1)(z) [see (1.77)], so that

    z_i = cos[(2i + 1)π/(2n + 2)],   i = 0, 1, ..., n.   (2.112)

The w_i in this case are equal and have the value π/(n + 1). Then (2.111) simplifies to

    ∫₋₁¹ [F(z)/√(1 − z²)] dz ≈ [π/(n + 1)] Σ(i=0 to n) F(z_i).   (2.113)
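A quick numerical check of (2.112) and (2.113), as an illustrative Python sketch (not part of the text): with F(z) = z², a polynomial of degree 2 ≤ 2n + 1, the two-point rule (n = 1) should already reproduce the exact value of the integral of z²/√(1 − z²) over [−1, 1], which is π/2.

```python
import math

def gauss_chebyshev(F, n):
    """(n+1)-point Gauss-Chebyshev estimate of the integral of
    F(z)/sqrt(1 - z**2) over [-1, 1], per (2.112) and (2.113)."""
    base_points = (math.cos((2 * i + 1) * math.pi / (2 * n + 2))
                   for i in range(n + 1))
    return math.pi / (n + 1) * sum(F(z) for z in base_points)

# Exact for polynomials of degree 2n + 1 or less: with n = 1,
# F(z) = z**2 gives pi/2 to machine accuracy.
print(gauss_chebyshev(lambda z: z * z, 1), math.pi / 2)
```

Note that no weight array is needed at all; both the base points and the (uniform) weights are available in closed form, which is what makes this formula so convenient to program.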

The corresponding error term [8] is

    E = {π/[2^(2n+1)(2n + 2)!]} F^(2n+2)(ξ),   −1 < ξ < 1.   (2.114)

Any integral of the form

    ∫_a^b f(x) dx,

where a and b are finite, can be expressed in the form of (2.111) by a suitable transformation.

Gauss-Hermite Quadrature. Based upon the orthogonality property of the Hermite polynomials of (2.74), one can develop a useful Gaussian formula,

    ∫₋∞^∞ e^(−x²) f(x) dx ≈ Σ(i=0 to n) w_i f(x_i),   (2.115)

which is known as the Gauss-Hermite quadrature. Here, the x_i are the roots of the Hermite polynomial of degree n + 1. The weight factors and appropriate roots for the first few quadrature formulas are listed in Table 2.6. The corresponding error term [8] is

    E = {(n + 1)!√π/[2^(n+1)(2n + 2)!]} f^(2n+2)(ξ).   (2.116)

Table 2.6 Roots of the Hermite Polynomials H_(n+1)(x) and Weight Factors for the Gauss-Hermite Quadrature [3]

    Roots (x_i)            Weight Factors (w_i)

    Two-Point Formula, n = 1
    ±0.70710 67811          0.88622 69255

    Three-Point Formula, n = 2
    ±1.22474 48714          0.29540 89752
     0.00000 00000          1.18163 59006

    Four-Point Formula, n = 3
    ±1.65068 01239          0.08131 28354
    ±0.52464 76233          0.80491 40900

    Five-Point Formula, n = 4
    ±2.02018 28705          0.01995 32421
    ±0.95857 24646          0.39361 93232
     0.00000 00000          0.94530 87205

Other Gaussian Quadrature Formulas. By a suitable transformation of the variable of integration or of the function to be integrated, the four Gaussian quadrature formulas developed in this section allow the numerical evaluation of many well-behaved integrals over finite, semi-infinite, or infinite intervals of integration. For example, we might write

    ∫_a^b f(x) dx = ∫_a^b w(x)[f(x)/w(x)] dx,   (2.117)

and then use the quadrature appropriate for integrals of the form on the right of (2.117). In some cases, it may be possible to evaluate integrals in which the integrand has a singularity on the integration interval by relegating the singular term to the weighting function. For example, integrands containing the factor 1/√(1 − x²) over the integration interval [−1, 1] can be handled by using the Gauss-Chebyshev quadrature.

A variety of other quadrature formulas of the Gaussian type can be generated for particular weighting functions, limits of integration, and sets of orthogonal polynomials. Two well-known quadrature formulas of the Gaussian type have been developed that include one or both end points of the integration interval as base points. When x_0 = a and x_n = b, the remaining base points x_1, ..., x_(n−1) are found to be the roots of P'_n(x), the derivative of the Legendre polynomial of degree n. The formula is called the Lobatto quadrature and produces exactly the integral if f(x) is a polynomial of degree 2n − 1 or less. When only one integration limit is specified to be a base point, the analogous formula is called the Radau quadrature and is exact when f(x) is a polynomial of degree 2n or less.

The reader interested in the development of many other Gaussian formulas is referred to [7], which also contains extensive tabulations of base points and weight factors for a variety of quadratures. The tabulated values are accurate to 30 significant figures in all cases. Tables for the Legendre, Laguerre, and Hermite formulas are particularly complete (for example, base points and weight factors for the Gauss-Legendre quadrature for over 80 values of n from 1 to 511).

The Gaussian quadrature formulas can also be used repeatedly over subintervals of the integration interval to create composite formulas similar to those of Section 2.6. Since most of the Gaussian formulas do not have base points at the ends of the integration interval, there is usually no saving in the number of functional evaluations per subinterval, as occurs when composite formulas are constructed from the low-order Newton-Cotes closed formulas. On the other hand, the inconvenient form of the values of the weight factors and base points makes use of the high-order formulas virtually impossible for hand calculation; even the preparation of computer programs to implement a quadrature for several values of n can be tedious, because of the large amount of tabular information required (alternatively, we might compute the essential polynomial roots by using methods described in Chapter 3, and then evaluate the necessary weight factors, but this is probably too wasteful of computing time for a frequently used integration program).
EXAMPLE 2.4

VELOCITY DISTRIBUTION USING GAUSSIAN QUADRATURE


Introduction

Gill and Scher [11] have derived an expression for the velocity distribution for a fluid flowing in a smooth circular pipe by modifying Prandtl's mixing-length expression. The dimensionless velocity in the axial direction, u+, is given as a function of the dimensionless distance from the inside wall of the tube, y+, by the following integral:

    u+ = ∫ from 0 to y+ of {[−1 + √(1 + 4cd)]/(2c)} dy+,   (2.4.1)

where

    c = (0.36 y+)² (1 − e^(−φy+/ȳ+))²,   (2.4.2)

and

    d = 1 − y+/ȳ+.   (2.4.3)

Here, ȳ+ is the dimensionless distance corresponding to the centerline of the tube, and is given by

    ȳ+ = (N_Re/2)√(f/2),   (2.4.4)

and φ, a function of ȳ+ and some empirically determined constants, is given by

    φ = (ȳ+ − 60)/22.   (2.4.5)

N_Re, the Reynolds number, and f, the Fanning friction factor, are dimensionless parameters that are functions of the physical properties and the bulk velocity of the fluid and also of the physical dimensions and surface roughness of the tube. The Reynolds number is given by

    N_Re = Dvρ/μ,

where v is the mean fluid velocity, D is the inside diameter of the tube, and ρ and μ are the density and viscosity of the fluid, respectively, all in compatible units. For smooth circular tubes in the range of Reynolds number from 2000 to 50000, f is adequately represented [12] by the Blasius equation,

    f = 0.079/N_Re^(1/4).

Problem Statement

Using an approach suggested by (2.117), derive a variant of the Gauss-Chebyshev quadrature (2.111) for estimating the value of the integral

    ∫_a^b f(x) dx,   (2.4.8)

where f(x) is any single-valued function and a and b are finite. Write a general-purpose function named CHEBY that evaluates (2.4.8) using the m-point version of the derived quadrature formula.

Write a main program that reads the appropriate data and then calls on both CHEBY and GAUSS (the function developed in Example 2.3) to implement the Gauss-Chebyshev and Gauss-Legendre quadratures, respectively, for evaluating the integral of (2.4.1). Compare the two resulting estimates of (2.4.1) for N_Re = 5000, 10000, 25000, and 50000, with y+ = 1, 2, 5, 10, 20, 50, 100, 200, 500, and 1000, using the 2, 4, 6, 10, and 15-point quadratures.

Method of Solution

The integral of (2.4.8), where a ≤ x ≤ b, can be transformed to the standard interval for the Gauss-Chebyshev quadrature, −1 ≤ z ≤ 1, by the transformation (2.95),

    x = [(b − a)z + b + a]/2,

to yield

    ∫_a^b f(x) dx = [(b − a)/2] ∫₋₁¹ f([(b − a)z + b + a]/2) dz.   (2.4.9)

Rewriting (2.4.9) in the form of (2.117), with the weighting factor 1/√(1 − z²) for the Gauss-Chebyshev quadrature, yields

    ∫_a^b f(x) dx = [(b − a)/2] ∫₋₁¹ [1/√(1 − z²)] F(z) dz,   (2.4.11)

where

    F(z) = √(1 − z²) f([(b − a)z + b + a]/2).   (2.4.12)

The Gauss-Chebyshev quadrature formula of (2.113) then becomes

    ∫_a^b f(x) dx ≈ [(b − a)π/2m] Σ(i=1 to m) √(1 − z_i²) f(x_i),   (2.4.13)

where

    z_i = cos{[2(i − 1) + 1]π/2m},   i = 1, 2, ..., m,   (2.4.14)

and m is the number of points in the quadrature formula. Here, x_i is simply the root z_i of the mth-degree Chebyshev polynomial transformed to the interval [a, b] using (2.95).

The integrand of (2.4.1), as written, is indeterminate at the lower limit of integration. An equivalent form which avoids this difficulty is

    f(y+) = 2d/[1 + √(1 + 4cd)].   (2.4.15)

The integrand of (2.4.15) is evaluated by a function named FUNCTN, which is called upon by both CHEBY and GAUSS in the programs that follow. To improve the accuracy of the numerical integrations of (2.4.15), the integral is rewritten as

    u_i+ = u_(i−1)+ + ∫ from y_(i−1)+ to y_i+ of f(y+) dy+,   i = 2, 3, ..., N,   (2.4.16)

where u_1+ = y_1+ = 0. The program that follows reads, as data, values for N and y_2+, y_3+, ..., y_N+ just once. Thereafter, a series of values are read for N_Re and m; for each data set, CHEBY and GAUSS are called to evaluate the N − 1 integrals of (2.4.16). Since a value y+ that is larger than ȳ+ is physically meaningless, fewer than N − 1 integrals in (2.4.16) may be evaluated for some data sets.
Flow Diagram

Main Program

Function FUNCTN  (Dummy argument: y+; calling argument: x_i)

Function CHEBY  (Dummy arguments: a, b, m, f; calling arguments: y+_(i−1), y+_i, m, FUNCTN; calls Function FUNCTN)
FORTRAN Implementation

List of Principal Variables

Program Symbol    Definition

(Main)
DUPLUS      Δu+, the integral of (2.4.16).
FF          Fanning friction factor, f.
I           Subscript, i.
M           Number of points to be used in the quadrature formulas, m.
NYPL        N, the number of values of y_i+.
PHI         φ, the function of equation (2.4.5).
RE          The Reynolds number, N_Re.
UPLUS       The dimensionless axial velocity u_i+ corresponding to dimensionless distance y_i+.
YPLUS       The dimensionless distance from the tube wall, y_i+.
YPLUSM      The dimensionless distance corresponding to the centerline of the tube, ȳ+.

(Function FUNCTN)
C, D        Parameters c and d given by (2.4.2) and (2.4.3).
YPL         y+, dimensionless distance from the tube wall.

(Function CHEBY)
A, B        Lower and upper limits of integration, a and b, respectively (see (2.4.13)).
F           Integrand, f (see (2.4.13)).
SUM         S, the repeated sum of (2.4.13).
XI          x_i, the root z_i transformed to the interval [a, b].
ZI          z_i, a root of the mth-degree Chebyshev polynomial.

Program Listing

Main Program

C        APPLIED NUMERICAL METHODS, EXAMPLE 2.4
C        VELOCITY DISTRIBUTION USING GAUSSIAN QUADRATURE
C
C        THIS PROGRAM COMPUTES DIMENSIONLESS VELOCITY DISTRIBUTIONS
C        FOR A FLUID FLOWING INSIDE A SMOOTH CIRCULAR TUBE BY
C        INTEGRATION OF A RELATIONSHIP DEVELOPED BY GILL AND SCHER (SEE
C        AICHE JOURNAL, VOL 7, NO 1, P 61 (1961)).  RE IS THE REYNOLDS
C        NUMBER, YPLUS THE DIMENSIONLESS DISTANCE FROM THE TUBE WALL,
C        AND UPLUS THE DIMENSIONLESS VELOCITY IN THE AXIAL DIRECTION AT
C        DISTANCE YPLUS.  YPLUSM IS THE VALUE OF YPLUS WHICH CORRESPONDS
C        TO THE CENTERLINE DISTANCE.  FF, THE FANNING FRICTION FACTOR,
C        IS COMPUTED AS A FUNCTION OF RE USING THE BLASIUS EQUATION.
C        PHI IS DESCRIBED IN THE PROBLEM STATEMENT.  THE INTEGRAL
C        IS EVALUATED USING BOTH THE GAUSS-LEGENDRE AND GAUSS-
C        CHEBYSHEV QUADRATURES, PERFORMED BY THE FUNCTIONS GAUSS (SEE
C        EXAMPLE 2.3) AND CHEBY RESPECTIVELY.  THE FUNCTIONAL VALUES FOR
C        THE QUADRATURES ARE COMPUTED BY THE FUNCTION FUNCTN.  FOR A
C        GIVEN RE AND A SET OF NYPL VALUES OF YPLUS, THE INTEGRATIONS
C        ARE CARRIED OUT USING SUCCESSIVE YPLUS VALUES AS THE LIMITS
C        OF INTEGRATION WITH THE M POINT QUADRATURE FORMULA.  PHI AND
C        YPLUSM ARE ASSIGNED TO COMMON STORAGE TO SIMPLIFY COMMUNICATION
C        AMONG THE PROGRAMS.
C
      IMPLICIT REAL*8(A-H, O-Z)
      DIMENSION YPLUS(100)
      COMMON PHI, YPLUSM
      EXTERNAL FUNCTN
C
C        ..... READ DATA, COMPUTE CONSTANT PARAMETERS .....
      READ (5,100)  NYPL, (YPLUS(I), I=2,NYPL)
 1    READ (5,101)  RE, M
      FF = 0.079/RE**0.25
      YPLUSM = RE*DSQRT(FF/2.)/2.
      PHI = (YPLUSM - 60.)/22.
C
C        ..... EVALUATE INTEGRAL USING GAUSS-LEGENDRE QUADRATURE .....
      YPLUS(1) = 0.
      UPLUS = 0.
      WRITE (6,200)  RE, YPLUSM, M, YPLUS(1), UPLUS
      DO 3  I = 2, NYPL
      IF (YPLUS(I).GT.YPLUSM)  GO TO 4
      DUPLUS = GAUSS( YPLUS(I-1), YPLUS(I), M, FUNCTN )
      UPLUS = UPLUS + DUPLUS
 3    WRITE (6,201)  YPLUS(I), UPLUS, DUPLUS
C
C        ..... EVALUATE INTEGRAL USING GAUSS-CHEBYSHEV QUADRATURE .....
 4    UPLUS = 0.
      WRITE (6,202)  YPLUS(1), UPLUS
      DO 6  I = 2, NYPL
      IF (YPLUS(I).GT.YPLUSM)  GO TO 1
      DUPLUS = CHEBY( YPLUS(I-1), YPLUS(I), M, FUNCTN )
      UPLUS = UPLUS + DUPLUS
 6    WRITE (6,201)  YPLUS(I), UPLUS, DUPLUS
      GO TO 1
C
C        ..... FORMATS FOR INPUT AND OUTPUT STATEMENTS .....
 100  FORMAT ( 10X, I2 / (20X, 5F10.3) )
 101  FORMAT ( 10X, F10.3, 20X, I2 )
 200  FORMAT ( 10H1RE     = , F10.1/ 10H YPLUSM = , F11.2/ 10H M      = ,
     1   I8 / 54H0VELOCITY DISTRIBUTION USING GAUSS-LEGENDRE QUADRATURE
     2   / 1H0,5X, 5HYPLUS,8X, 5HUPLUS,10X,6HDUPLUS/ 1H0,F10.2,F15.7 )
 201  FORMAT ( 1H , F10.2, 2F15.7 )
 202  FORMAT ( 55H0VELOCITY DISTRIBUTION USING GAUSS-CHEBYSHEV QUADRATUR
     1  E / 1H0,5X, 5HYPLUS,8X, 5HUPLUS,10X, 6HDUPLUS/ 1H0,F10.2,F15.7 )
C
      END

Program Listing (Continued)


Function FUNCTN

      FUNCTION FUNCTN( YPL )
C
C        THE FUNCTION FUNCTN COMPUTES THE INTEGRAND FOR THE QUADRATURE
C        FUNCTIONS GAUSS AND CHEBY.  C, D, YPLUSM AND PHI ARE DESCRIBED
C        IN THE PROBLEM STATEMENT.  YPL IS THE DIMENSIONLESS DISTANCE
C        FROM THE TUBE WALL.
C
      IMPLICIT REAL*8(A-H, O-Z)
      REAL*8 FUNCTN, YPL
      COMMON PHI, YPLUSM
C
      D = 1. - YPL/YPLUSM
      C = 0.36*0.36*YPL*YPL*(1. - DEXP(-PHI*YPL/YPLUSM))**2
      FUNCTN = 2.*D/(1. + DSQRT(1. + 4.*C*D))
      RETURN
C
      END
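A direct transliteration of FUNCTN into Python (illustrative only, not part of the original program; the grouping c = (0.36 y+)²(1 − e^(−φy+/ȳ+))² is assumed from the listing, and is the one consistent with the limiting value f(y+) → 1 at the wall quoted in the Discussion of Results):

```python
import math

def functn(ypl, phi, yplusm):
    """Integrand of (2.4.15): f(y+) = 2d / (1 + sqrt(1 + 4cd)).

    phi and yplusm play the roles of the COMMON variables PHI and YPLUSM."""
    d = 1.0 - ypl / yplusm
    c = 0.36 * 0.36 * ypl * ypl * (1.0 - math.exp(-phi * ypl / yplusm)) ** 2
    return 2.0 * d / (1.0 + math.sqrt(1.0 + 4.0 * c * d))

# Near the wall c -> 0 and d -> 1, so f(y+) -> 1; far from the wall the
# mixing-length term dominates and f falls well below 1.
print(functn(1.0e-6, 29.1, 700.59))   # close to 1
print(functn(350.0, 29.1, 700.59))    # small positive value
```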

Function CHEBY

      FUNCTION CHEBY( A, B, M, F )
C
C        THE FUNCTION CHEBY COMPUTES THE VALUE OF THE INTEGRAL OF
C        F(X)*DX BETWEEN THE INTEGRATION LIMITS A AND B USING THE
C        M-POINT GAUSS-CHEBYSHEV QUADRATURE FORMULA.  ZI IS THE I-TH
C        ROOT OF THE CHEBYSHEV POLYNOMIAL OF DEGREE M ON THE INTERVAL
C        (-1,1) AND XI IS THE VALUE OF ZI TRANSFORMED TO THE INTERVAL
C        (A,B).  SUM IS THE SUM OF THE FUNCTIONAL VALUES AT THE M
C        VALUES OF XI CORRECTED FOR THE WEIGHTING FUNCTION.  THE
C        UNIFORM WEIGHT FACTOR IS THEN APPLIED AND THE APPROXIMATED
C        VALUE OF THE INTEGRAL IS RETURNED AS THE VALUE OF THE FUNCTION.
C
      IMPLICIT REAL*8(A-H, O-Z)
      REAL*8 CHEBY, A, B, F
C
      SUM = 0.
      DO 1  I=1,M
      ZI = DCOS(FLOAT(2*(I-1) + 1)*3.1415927/FLOAT(2*M))
      XI = (ZI*(B-A) + B + A)/2.
 1    SUM = SUM + F(XI)*DSQRT(1. - ZI*ZI)
      CHEBY = (B-A)*3.1415927*SUM/FLOAT(2*M)
      RETURN
C
      END
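The same derived quadrature (2.4.13) can be sketched in Python as follows (illustrative only; math.pi replaces the truncated constant 3.1415927 of the listing):

```python
import math

def cheby(a, b, m, f):
    """m-point Gauss-Chebyshev estimate of the integral of f(x) on [a, b],
    per (2.4.13): each Chebyshev root z_i is mapped to [a, b], the factor
    sqrt(1 - z_i**2) undoes the implicit weighting function, and the
    uniform weight (b - a)*pi/(2m) is applied at the end."""
    total = 0.0
    for i in range(1, m + 1):
        zi = math.cos((2 * (i - 1) + 1) * math.pi / (2 * m))
        xi = (zi * (b - a) + b + a) / 2.0
        total += f(xi) * math.sqrt(1.0 - zi * zi)
    return (b - a) * math.pi * total / (2 * m)

print(cheby(0.0, 1.0, 200, lambda x: x * x))   # near 1/3
```

Unlike the Gauss-Legendre rule, this variant is not exact for polynomials in x, because the factor √(1 − z²) folded into F(z) is not itself a polynomial; the error decays only algebraically with m, which is consistent with the slower convergence of the Gauss-Chebyshev results seen in the computer output below.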

Data

NYPL   =  11
YPLUS(2)...YPLUS(6)
RE, M   (one data set per card)

Computer Output
Results for the 11th Data Set
RE     =    25000.0
YPLUSM =     700.59
M      =          2
VELOCITY DISTRIBUTION USING GAUSS-LEGENDRE QUADRATURE

YPLUS UPLUS DUPLUS

0.0 0.0
1.00 0.9992443 0.9992443
2.00 1.9958208 0.9965765
5.00 4.8776166 2.8817958
10.00 8.4823031 3.6046865
20.00 11.8944744 3.4121713
50.00 15.1074984 3.2130241
100.00 16.9996247 1.8921263
200.00 18.7004495 1.7008247
500.00 20.5245821 1.8241326
VELOCITY DISTRIBUTION USING GAUSS-CHEBYSHEV QUADRATURE

YPLUS UPLUS DUPLUS

0.0 0.0
1.00 1.1098645 1.1098645
2.00 2.2166574 1.1067930
5.00 5.4059707 3.1893133
10.00 9.4129422 4.0069715
20.00 13.3053378 3.8923955
50.00 17.0892826 3.7839448
100.00 19.2499871 2.1607046
200.00 21.1825355 1.9325484
500.00 23.2999976 2.1174622

Results for the 12th Data Set


RE = 25000.0
YPLUSM = 700.59
M = 4
VELOCITY DISTRIBUTION USING GAUSS-LEGENDRE QUADRATURE

YPLUS UPLUS DUPLUS



Computer Output (Continued)


VELOCITY DISTRIBUTION USING GAUSS-CHEBYSHEV QUADRATURE

YPLUS UPLUS DUPLUS

Results for the 13th Data Set


RE = 25000.0
YPLUSM = 700.59
M = 6

VELOCITY DISTRIBUTION USING GAUSS-LEGENDRE QUADRATURE

YPLUS UPLUS DUPLUS

VELOCITY DISTRIBUTION USING GAUSS-CHEBYSHEV QUADRATURE

YPLUS UPLUS DUPLUS

Results for the 14th Data Set


RE = 25000.0
YPLUSM = 700.59
M      =         10

Computer Output (Continued)


VELOCITY DISTRIBUTION USING GAUSS-LEGENDRE QUADRATURE

     YPLUS          UPLUS         DUPLUS

VELOCITY DISTRIBUTION USING GAUSS-CHEBYSHEV QUADRATURE

     YPLUS          UPLUS         DUPLUS

Results for the 5th Data Set


RE     =     5000.0
YPLUSM = 171.34
M = 15

VELOCITY DISTRIBUTION USING GAUSS-LEGENDRE QUADRATURE

     YPLUS          UPLUS         DUPLUS

VELOCITY DISTRIBUTION USING GAUSS-CHEBYSHEV QUADRATURE

     YPLUS          UPLUS         DUPLUS



Computer Output (Continued)


Results for the 10th Data Set
RE = 10000.0
YPLUSM = 314.25
M      =         15
VELOCITY DISTRIBUTION USING GAUSS-LEGENDRE QUADRATURE

YPLUS UPLUS DUPLUS

VELOCITY DISTRIBUTION USING GAUSS-CHEBYSHEV QUADRATURE

YPLUS UPLUS DUPLUS

Results for the 15th Data Set


RE = 25000.0
YPLUSM =     700.59
M      =         15

VELOCITY DISTRIBUTION USING GAUSS-LEGENDRE QUADRATURE

YPLUS UPLUS DUPLUS

VELOCITY DISTRIBUTION USING GAUSS-CHEBYSHEV QUADRATURE

     YPLUS          UPLUS         DUPLUS



Computer Output (Continued)


Results for the 20th Data Set
RE = 50000.0
YPLUSM =    1284.89
M      =         15

VELOCITY DISTRIBUTION USING GAUSS-LEGENDRE QUADRATURE

YPLUS UPLUS DUPLUS

VELOCITY DISTRIBUTION USING GAUSS-CHEBYSHEV QUADRATURE

     YPLUS          UPLUS         DUPLUS


Discussion of Results

Complete computer output is shown for data sets 11, 12, 13, 14, and 15 to illustrate the influence of the number of points in the quadrature formula on the results for N_Re = 25000, and for data sets 10, 15, 20, and 25 to show results for the 15-point quadrature formulas (probably the most accurate of those used) for N_Re = 5000, 10000, 25000, and 50000. All computations were done using double-precision arithmetic.

The values of u+ predicted by the various quadratures for y+ = 500 and N_Re = 25000 are shown in Table 2.4.1.

Table 2.4.1 Predicted u+ for y+ = 500 and N_Re = 25000

    m     u+ (Gauss-Legendre     u+ (Gauss-Chebyshev
          Quadrature)            Quadrature)

    2     20.5245821             23.2999976
    4     20.5672100             21.1692723
    6     20.5672992             20.8286561
    10    20.5672993             20.6604113
    15    20.5672993             20.6085537

The results for the Gauss-Legendre quadratures are nearly constant, indicating that the integrand function of (2.4.1) can probably be represented adequately by polynomials of low degree, at least for the short intervals of integration used. Since

    lim (y+ → 0) f(y+) = lim (y+ → 0) 2d/[1 + √(1 + 4cd)] = 1,

and since f(y+) < 1 for y+ > 0, any result that gives u+ > y+ is clearly in error (see, for example, the results for small y+ values for the 2-point Gauss-Chebyshev quadrature). It appears that the Gauss-Legendre quadratures yield better values for the integrals, although the results for all N_Re using the 15-point quadrature formulas are quite consistent (in no case differing by more than 0.25 percent).

The results for the 15-point quadratures with N_Re = 50000 are shown in Fig. 2.4.1. They agree in every case with the plotted values reported by Gill and Scher. Unfortunately, the authors failed to indicate how they evaluated the integral, and did not report the results in tabular form. Thus it was possible to check only the most significant digits of the results.

Figure 2.4.1 Dimensionless velocity distribution in smooth circular tubes, N_Re = 50000. (Abscissa: dimensionless distance from tube wall, 1 to 1000.)

2.11 Numerical Differentiation

Having described numerical integration in some detail, a few words about numerical differentiation are in order. The differentiation problem involves the evaluation of df(x)/dx at some arbitrary x, given only a few sample values of f(x) at the base points x_0, x_1, ..., x_n. The problem seems intuitively no more difficult than the integration problem of (2.1). The obvious solution is to find a suitable approximation of f(x), say g(x), which is simple to differentiate and evaluate, that is,

    df(x)/dx ≈ dg(x)/dx.   (2.118)

The usual choice for the approximating function is p_n(x), the nth-degree interpolating polynomial which passes through the points (x_0, f(x_0)), (x_1, f(x_1)), ..., (x_n, f(x_n)). Then (2.118) is given by

    df(x)/dx ≈ dp_n(x)/dx.   (2.119)

Provided that f(x) is m times differentiable, higher-order derivatives may be approximated by evaluating the higher-order derivatives of p_n(x), n ≤ m,

    d^m f(x)/dx^m ≈ d^m p_n(x)/dx^m.   (2.120)

Unfortunately, considerable care is required if serious errors are to be avoided. The inherent difficulty with this approach is that differentiation tends to magnify small discrepancies or errors in the approximating function (just as the integration process tends to damp or smooth them out). Figure 2.9 shows the situation. Note that if the polynomial approximation p_n(x) is reasonably good, the integral ∫_a^b p_n(x) dx may be an excellent approximation to ∫_a^b f(x) dx. On the other hand, dp_n(x)/dx, which is simply the slope of the line tangent to p_n(x), may vary significantly in magnitude from df(x)/dx, even at the base points, where p_n(x) and f(x) agree exactly; the sign of the derivative might even be in error. Higher-order differentiation tends to magnify these discrepancies still further. Admittedly, the deviations may be exaggerated in the figure, but the point is clear. Numerical differentiation is an inherently less accurate process than numerical integration.

Because of this tendency toward error, numerical differentiation should be avoided wherever possible. This is particularly true when the f(x_i) values are themselves subject to some error, as they would probably be if determined experimentally (engineers and scientists, in fact, often use differentiation tests on laboratory data as an indication of experimental precision). If derivative values must be computed in such cases, particularly when the results are to be used in subsequent calculations, it is usually better to use one of the least-squares polynomials (see Chapter 8) to smooth the data before differentiating them.

In spite of the problems associated with the differentiation process, the approximation of (2.120) is often used. Then

    f(x) = p_n(x) + R_n(x),   (2.121)

where R_n(x) is the error associated with the nth-degree interpolating polynomial for which p_n(x_i) = f(x_i). We can then write

    d^m f(x)/dx^m = d^m p_n(x)/dx^m + d^m R_n(x)/dx^m.   (2.122)

Any of the several formulations of the interpolating polynomial from Chapter 1 may be used for p_n(x). To illustrate, assume that the base points x_0, x_1, ..., x_n are evenly spaced by intervals of length h, so that p_n(x) may be written as

    p_n(x) = f(x_0) + (x − x_0) Δf(x_0)/h + [(x − x_0)(x − x_1)/2!h²] Δ²f(x_0) + ···.

Then

    dp_n(x)/dx = Δf(x_0)/h + [(2x − x_0 − x_1)/2h²] Δ²f(x_0) + ···.   (2.123)

Figure 2.9 Differentiation of the interpolating polynomial.
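The contrast between the damping of integration and the magnification of differentiation can be made concrete with a small deterministic experiment (an illustrative Python sketch, not part of the text): a single erroneous sample perturbs a central-difference derivative by ε/2h, but a trapezoidal integral by only εh.

```python
# Perturb one sample of f(x) = x**2 by eps and compare the effect on a
# central-difference derivative and on a trapezoidal integral.
eps, h = 1.0e-3, 0.1
x = [i * h for i in range(11)]            # evenly spaced base points on [0, 1]
f = [xi * xi for xi in x]
g = list(f)
g[6] = f[6] + eps                         # a single erroneous sample

def deriv(v):                             # central difference at x = 0.5
    return (v[6] - v[4]) / (2 * h)

def integ(v):                             # trapezoidal rule over [0, 1]
    return h * (v[0] / 2 + sum(v[1:10]) + v[10] / 2)

print(abs(deriv(g) - deriv(f)))   # eps/(2h) = 5e-3: the error is amplified
print(abs(integ(g) - integ(f)))   # eps*h = 1e-4: the error is damped
```

Shrinking h makes the derivative perturbation ε/2h worse, not better, which is why smoothing the data first (as suggested above) is usually preferable to differentiating raw measurements.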


Now we must choose an appropriate value for n. For example, for n = 1 and n = 2, (2.123) becomes, respectively,

    dp_1(x)/dx = (1/h)[f(x_1) − f(x_0)],   (2.124a)

    dp_2(x)/dx = (1/2h²)[(2x − x_1 − x_2) f(x_0) + (2x_0 − 4x + 2x_1 + 2h) f(x_1)
                 + (2x − 2x_0 − h) f(x_2)].   (2.124b)

Second derivatives can be computed similarly; for example, for n = 2,

    d²p_2(x)/dx² = (1/h²)[f(x_0) − 2f(x_1) + f(x_2)].   (2.125)

Other derivatives follow in obvious fashion, although the formulas quickly become rather cumbersome when written in general form.

Error terms for the differentiation formulas are given by the appropriate derivative of the remainder term for the nth-degree interpolating polynomial R_n(x), given by (1.39b) as

    R_n(x) = [Π(i=0 to n)(x − x_i)] f^(n+1)(ξ)/(n + 1)!,

where ξ = ξ(x) is an unknown point on the interval (x, x_0, ..., x_n). Unfortunately, ξ(x) may not be single-valued or differentiable, although f^(n+1)(ξ) is, provided f(x) itself possesses derivatives of higher order. If we let π(x) = Π(i=0 to n)(x − x_i) and use the finite divided-difference equivalent of f^(n+1)(ξ) [see (1.39a)], then R_n(x) is given by

    R_n(x) = π(x) f[x_0, x_1, ..., x_n, x],

and its first derivative is

    dR_n(x)/dx = π'(x) f[x_0, ..., x_n, x] + π(x) (d/dx) f[x_0, ..., x_n, x].   (2.128)

We can show (see [2]) that

    (d/dx) f[x_0, ..., x_n, x] = f[x_0, ..., x_n, x, x] = f^(n+2)(ξ)/(n + 2)!,

with ξ in (x, x_0, ..., x_n). Then (2.128) can be written as

    dR_n(x)/dx = π'(x) f^(n+1)(ξ_1)/(n + 1)! + π(x) f^(n+2)(ξ_2)/(n + 2)!.

For x = x_j, that is, when x is one of the base points, π(x_j) = 0, so that

    dR_n(x_j)/dx = π'(x_j) f^(n+1)(ξ)/(n + 1)!.

For example, (2.124b) can be written with its error term as:

    df(x_0)/dx = (1/2h)[−3f(x_0) + 4f(x_1) − f(x_2)] + (h²/3) f^(3)(ξ),   (2.132a)

    df(x_1)/dx = (1/2h)[f(x_2) − f(x_0)] − (h²/6) f^(3)(ξ).   (2.132b)

Notice that the leading factor in the error term for the derivative at x_1 is just half that for the error term at x_0 and at x_2. We would expect that the computed derivative estimate at the midpoint would be more accurate than those computed at the end points, since functional values on both sides of the midpoint are used in the midpoint formula (2.132b) but not in the end-point formulas. Comparison of the first derivative approximations, using higher-degree interpolating polynomials (see, for example, the extensive tables in [9]), shows that the magnitudes of the coefficients of f^(n+1)(ξ) at the base points decrease monotonically as x approaches the midpoint of the interval. In addition, the midpoint formula with n even is invariably simpler [for example, (2.132b)] than formulas for other base points. For these reasons, the odd-point formulas (n even) are usually favored over the even-point ones, and the base-point values should, if possible, lie on both sides of the argument at which df(x)/dx is to be approximated.

To determine d^m f(x)/dx^m for m ≤ n, we use (2.120). The corresponding error term, assuming that f(x) possesses derivatives of order n + m + 1, is given [9] by the rather complicated expression (2.133),
130 Numerical Integration
where the ξ_j are all in the interval (x, x_0, ..., x_n). At the base points x = x_j, j = 0, 1, ..., n, (2.133) simplifies somewhat, since the term in f^(n+m+1)(ξ_m) vanishes. This formulation is particularly useful when one can establish upper bounds for the derivatives on the interval containing the base points.

In general, we can show that p_n(x), the nth-degree interpolating polynomial, has error of order h^(n+1), written as O(h^(n+1)) [see (1.62)], and that each successive differentiation reduces by one the order of the associated error. Thus

    R_n(x) = O(h^(n+1)),   R_n'(x) = O(h^n),   R_n''(x) = O(h^(n-1)),   ...,

clearly showing the deterioration in accuracy resulting from successive differentiation operations. To decrease the error, we can reduce the step size h, that is, choose the base points on a smaller total interval. In practical situations where we have a given set of tabular values for f(x), this may be impossible. Unfortunately, small values of h tend to magnify the round-off error in the evaluation of a differentiation formula, since h appears in the denominator with positive powers. This suggests that for a given function, a given differentiation formula, and a given argument x, there is an optimal choice for the value of h [8]. Unfortunately, one would rarely have available the information required to determine the optimal h.

As an example of numerical differentiation, consider a study involving simultaneous fluid flow and heat transfer. Values of temperature T_0, T_1, T_2, T_3, ... are determined at a series of fixed points x_0, x_1, x_2, x_3, ... extending normally from a solid wall into the fluid. The point x_0 coincides with the wall, and the points are equally spaced by a distance h. The problem is to estimate the temperature gradient dT/dx at the wall, since a knowledge of the fluid thermal conductivity then enables the rate of heat transfer between the wall and fluid to be determined.

The method attempted here is that suggested by equation (2.122), namely, to differentiate the interpolating polynomial passing through two or more of the points (T_i, x_i), i = 0, 1, 2, .... Without going into the details of constructing the interpolating polynomial, the resulting approximations for dT/dx at x = x_0 are given in Table 2.7, for polynomial orders n = 1, 2, 3, and 4.

As an illustration, the temperature gradient at the wall is evaluated for a set of temperatures computed during a recent finite-difference investigation [10] into a problem involving natural convection; the appropriate values (here limited to five points) are T_0 = -1.000, T_1 = -0.588, T_2 = -0.295, T_3 = -0.259, and T_4 = -0.305, with h = 0.1.

Clearly, the value of dT/dx at x = x_0 is in considerable doubt. The trouble stems mainly from the inherent inaccuracy of numerically differentiating any function that is not closely approximated by a polynomial of low order; it is further aggravated by the fact that information is being sought at the extreme end of an interval. Barring some special knowledge about the form of T(x), the only satisfactory way in which the temperature gradient at the wall can be estimated more precisely is to have temperatures available at points spaced more closely together, that is, for a smaller value of h. The use of the higher-order formulas in conjunction with a relatively large point spacing can easily introduce undesirable oscillations into the approximation for a function which is essentially smooth. The full advantage of the higher-order formulas, as evidenced by their error terms, may only be realized as h becomes progressively smaller.

Table 2.7 Estimation of Temperature Gradient

Order of Approximation to Error Numerical


Polynomial dT/dx at x = xo Term Value
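The scatter among the Table 2.7 entries can be reproduced by fitting and differentiating interpolating polynomials of increasing degree through the five quoted temperatures (a sketch; NumPy's polynomial fit stands in for the hand-derived formulas):

```python
import numpy as np

h = 0.1
T = np.array([-1.000, -0.588, -0.295, -0.259, -0.305])
x = h * np.arange(5)

grad = {}
for n in (1, 2, 3, 4):
    # Interpolating polynomial of degree n through the first n + 1
    # points, differentiated and evaluated at the wall, x = x0 = 0.
    p = np.polyfit(x[:n + 1], T[:n + 1], n)
    grad[n] = np.polyval(np.polyder(p), 0.0)
# The four estimates differ widely, illustrating why dT/dx at the
# wall is "in considerable doubt" for this point spacing.
```

The degree-1 estimate is simply (T1 - T0)/h = 4.12, while the degree-4 estimate matches the standard five-point end-point formula, (-25*T0 + 48*T1 - 36*T2 + 16*T3 - 3*T4)/(12h).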
2.1 Show that if the curve f(x) has no inflection point on (x0, x1), that is, f''(x) ≠ 0 on (x0, x1), then the error E in the trapezoidal rule (2.11),

    E = -[(x1 - x0)^3/12] f''(ξ),

where x0 < ξ < x1, is bounded in magnitude between (x1 - x0)^3/12 times the minimum and maximum values of |f''(x)| on (x0, x1).

2.2 Show that if f'(x0) and f'(x1) are available, then the improved trapezoidal rule (see (2.8)), given by

    ∫ from x0 to x1 of f(x) dx ≈ (h/2)[f(x0) + f(x1)] + (h^2/12)[f'(x0) - f'(x1)],   h = x1 - x0,

has degree of precision three; that is, it is exact if f(x) is a polynomial of degree three or less.

2.3 Find the error term for the improved trapezoidal rule of Problem 2.2.

2.4 The integral I = ∫ from a to b of f(x) dx is to be estimated by a trapezoidal-type rule, using two base points, x1 and x2, that do not necessarily coincide with the integration limits a and b. Show that the required approximation is

    I ≈ (b - a) { f(x1) + [(a + b)/2 - x1] [f(x2) - f(x1)]/(x2 - x1) }.

What is the corresponding error term?

2.5 The integral I = ∫ from a to b of f(x) dx is to be estimated by a quadratic-type rule, using three base points, x1, x2, and x3, that are not necessarily equally spaced, and none of which necessarily coincides with a or b. Derive the appropriate integration formula and its associated error term. Check that the formula reduces to Simpson's rule when x1 = a, x3 = b, and (x2 - x1) = (x3 - x2) = h.

2.6 Functional values f(x1), f(x2), ..., f(xn) are available at the base points x1, x2, ..., xn (arranged in ascending order, but not necessarily equally spaced). The integral

    I = ∫ from a to b of f(x) dx

is to be estimated by repeated application of the trapezoidal-type rule discussed in Problem 2.4. Write a function, named AREA1, to perform the required integration. A typical call will be

    ANS = AREA1 (F, X, N, A, B)

in which F and X are the vectors used to store the functional values and base points, N is the number of points, and A and B are the integration limits. If possible, always arrange for the subintervals of integration to coincide with the known base points. In general this will not be the case for the extreme subintervals involving the integration limits a and b. Avoid extrapolation if possible; that is, use a base less than a and another greater than b, if available. The function AREA1 may be used in Problem 2.16.

2.7 Write a function, named AREA2, similar to AREA1 of Problem 2.6, but now using repeated application of the quadratic-type integration formula developed in Problem 2.5. A typical call will be

    ANS = AREA2 (F, X, N, A, B)

If possible, again completely embrace the limits of integration by known base points. The function AREA2 may be used in Problem 2.16.

2.8 The Newton-Cotes closed integration formula for n = 6, given by equation (2.21f), is inconvenient for hand calculations because of the rather unusual coefficients. An alternative approximation that is sometimes used is

    ∫ from x0 to x6 of f(x) dx ≈ (3h/10)[f(x0) + 5f(x1) + f(x2) + 6f(x3) + f(x4) + 5f(x5) + f(x6)].

What is the error term for this approximation, and for what degrees of polynomial will the formula be exact?

2.9 What is the formulation of the quadrature given in Problem 2.6 when using n applications of the rule in the evaluation of

    I = ∫ from a to b of f(x) dx ?

2.10 Consider the computation of the integral

    I = ∫ from a to b of f(x) dx,

using Simpson's rule (2.21b) repeatedly, each time halving the interval h. This is equivalent to the composite Simpson's rule of (2.49) for n = 1, 2, 4, 8, 16, .... Let j be the number of interval-halving operations. Then n and j are related by n = 2^j. Let I_j be the estimate of the integral for j repeated interval halvings, and I*_j be the improved estimate

    I*_j = I_j + (I_j - I_{j-1})/15,

using Richardson's extrapolation technique (see (2.51)).

Write a function, named SIMPRH, with argument list

    (A, B, F, EPS, JMAX, J)

to implement the method outlined in the preceding paragraph, where A and B are the lower and upper integration limits, respectively, and F is the name of another function that evaluates the integrand, f(x). SIMPRH should calculate, in order, the I*_j, j = 0, 1, 2, ..., until either

    j > JMAX   or   |I*_j - I*_{j-1}| < EPS.

Thus EPS may be considered to be a tolerance on the estimated error. The number of interval-halving steps carried out should be stored in J upon exit. The final I*_j should be returned as the value of the function.

2.11 The truncated Chebyshev expansion of f(x) on [-1,1] is given by (1.81),
where the coefficients a_i (see Problem 1.50) are

    a_0 = (1/π) ∫ from 0 to π of f(cos θ) dθ,   a_i = (2/π) ∫ from 0 to π of f(cos θ) cos iθ dθ,   i = 1, 2, ..., n.

Write a subroutine, named CHEBCO, that finds the coefficients a_i and b_i, i = 0, 1, ..., n. Essential values of f(x) should be supplied by another function, with dummy argument name F. The integrals should be evaluated by the function SIMPRH of Problem 2.10 with the error estimate tolerance EPS (see Problem 2.10). The coefficients a_0, ..., a_n and b_0, ..., b_n should be stored in A(1), ..., A(N + 1) and B(1), ..., B(N + 1), respectively, upon return.

Test CHEBCO with an appropriate calling program to evaluate the coefficients a_i and b_i, i = 0, 1, ..., 6, for f(x) = tan^-1 x. Compare your results with values reported by Snyder [20].

Note that the value selected for EPS and the number of digits retained in the computations will be limited by the word size for the particular computer being used. Double-precision arithmetic should be used, if available.

2.12 Let A, B, and C be points on the x axis such that AB = BC. Raise ordinates from these points to intersect the curve y = y(x) at D, E, and F, respectively. Let P be the intersection of DF and BE, and let Q be the point between E and P such that 2EQ = PQ. Show that Simpson's rule amounts to saying that the area under the curve approximately equals AC x BQ.

2.13 The distribution with wavelength λ of the intensity q of radiation leaving unit area of the surface of a black body is given in Problem 3.43. Write a program that will compute the total rate Q at which radiation is being emitted between wavelengths λ1 and λ2, that is,

    Q = ∫ from λ1 to λ2 of q dλ.

The integration routine used can be checked by noting that the total rate over all wavelengths, ∫ from 0 to ∞ of q dλ, has the exact value σT^4, where σ is the Stefan-Boltzmann constant, and T, k, c, and h are defined in Problem 3.43. The program should also compute the fractional emission, Q/σT^4.

Suggested Test Data

T = 2000°K, 6000°K; λ1 = 3933.666 Å (Ca+ K line); λ2 = 5895.923 Å (Na D2 line). Note that 1 Å (angstrom unit) = 10^-8 cm.

2.14 Consider the condensation of a pure vapor on the outside of a single cooled horizontal tube. According to the simple Nusselt theory of film condensation, the mean heat transfer coefficient h is given by a relation involving the group φ defined below and the integral

    I = ∫ from 0 to π of (sin β)^(1/3) dβ.

Here, k, ρ, and ν are the thermal conductivity, density, and kinematic viscosity of the condensed liquid film, r is the tube radius, λ is the latent heat of condensing vapor, g is the gravitational acceleration, ΔT is the difference between the vapor saturation temperature (T_s) and the tube wall temperature (T_w), and all these quantities are in consistent units.

For water, the group φ = k^3 ρ g λ/ν, in BTU^4/hr^4 °F^3 ft^7, varies with temperature T (°F). When used in the above formula for h, φ should be evaluated at the mean film temperature T = (T_s + T_w)/2.

Write a program that uses the above equations to compute h. The input data should include values for T_s, T_w (both in °F), and d (tube diameter in inches); these values should also appear in the printed output. The program should then compute and print values of the integral I, the mean film temperature T, the corresponding value of φ, and the resulting heat transfer coefficient h. The four sets of test data in Table P2.14 are suggested.

Table P2.14

2.15 The shell-and-tube heat exchanger shown in Fig. P2.15 is employed for heating a steady stream of m lb/hr of a fluid from an inlet temperature T1 to an exit temperature T2. This is achieved by continuously condensing a saturated vapor in the shell, maintaining its temperature at Ts.

Figure P2.15 [shell-and-tube exchanger: saturated vapor in; cold fluid in (m lb/hr); condensate out; hot fluid out]

By integrating a heat balance on a differential element of length, the required exchanger length is found to be
Problems 133
where D is the tube diameter. The local heat-transfer coefficient h is given by the correlation

    h = (0.023 k/D) (4m/πDμ)^0.8 (c_p μ/k)^0.4,

where c_p, μ, and k (the specific heat, viscosity, and thermal conductivity of the fluid, respectively) are functions of temperature T. All quantities in the above formulas must be in consistent units.

Write a program that will read values for m, T1, T2, Ts, D, and information concerning the temperature dependency of c_p, μ, and k, as data. The program should then compute the required exchanger length, L.

In making approximate heat-exchanger calculations, one often assumes mean values for c_p, μ, and k, evaluating them just once, at the mean fluid temperature (T1 + T2)/2. Let the program estimate the error involved in making this assumption.

Suggested Test Data

                  Case A                  Case B
    Fluid         Carbon dioxide gas      Ethylene glycol liquid
    m, lb/hr      22.5                    45,000
    T1, °F        100                     0
    T2, °F        280 and 500             90 and 180
    Ts, °F        550                     250
    D, in.        0.495                   1.032
    k                                     0.153 (constant)

The above physical properties have been derived from Perry [15]. See Problem 1.19, concerning the interpolating polynomials resulting from the tabulated values of k for carbon dioxide and μ for ethylene glycol.

2.16 The fugacity f (atm) of a gas at a pressure P (atm) and a specified temperature T is given by Denbigh [16] as

    ln (f/P) = ∫ from 0 to P of [(C - 1)/p] dp,

where C = Pv/RT is the experimentally determined compressibility factor at the same temperature T, R is the gas constant, and v is the molal volume. For a perfect gas, C = 1, and the fugacity is identical with the pressure.

Suppose that n values are available for C, namely, C1, C2, ..., Cn, for which the corresponding pressures are p1, p2, ..., pn (not necessarily equally spaced, but arranged in ascending order of magnitude). Assume that the value of p1 (typically, 1 atm) is sufficiently low for the gas to be perfect in the range 0 < p ≤ p1, so that

    ∫ from 0 to p1 of [(C - 1)/p] dp ≈ 0.

Write a program that will read data values for n, Ci and pi (i = 1, 2, ..., n), and P, and that will proceed to use one of the unequal-interval integration functions developed in Problems 2.6 and 2.7 to evaluate f.

Suggested Test Data

P = 50, 100, 150, 200, 250, 500, 750, 1000, 1500, and 2000 atm, with compressibility factors, derived from Perry [15], given in Table P2.16.

Table P2.16 Compressibility Factor, C

    p (atm)    Methane gas (-70°C)    Ammonia liquid (200°C)

2.17 If T_n(x) is the nth-degree Chebyshev polynomial, use the orthogonality property of equation (2.70) to show that:

2.18 As an engineer with a background in numerical methods, you have been asked to obtain the best average value for the percentage of impurity in a product stream over a certain time period during which this percentage is likely to fluctuate, although not very erratically. You may take a maximum of four samples and conduct, at most, two precise chemical analyses. If desired, samples may be mixed before analysis. What strategy do you recommend, and why?
2.19 The error function, erf x, is defined by

    erf x = (2/√π) ∫ from 0 to x of e^(-t^2) dt = 1 - (2/√π) ∫ from x to ∞ of e^(-t^2) dt.

By using appropriate four-point Gaussian quadrature formulas, evaluate both of these integrals for x = 0.5, and compare with the tabulated value, erf(0.5) = 0.520500.

2.20 Since Γ(n + 1) = n!, the Gauss-Laguerre three-point formula could be used to approximate n!:

    n! = ∫ from 0 to ∞ of e^(-x) x^n dx ≈ Σ for i = 0 to 2 of a_i x_i^n.

What is the largest integer n for which this formula would be exact?

2.21 Modify the function GAUSS of Example 2.3 so that integrals of the form

    ∫ from a to b of f(x) dx

are evaluated using a composite formula, equivalent to repeated application of the m-point Gauss-Legendre quadrature over nonoverlapping subintervals of [a,b] of length (b - a)/n.

2.22 Write a function, named LAGUER, that employs the m-point Gauss-Laguerre quadrature formula (2.109) to evaluate numerically an integral of the form

    ∫ from a to ∞ of e^(-x) f(x) dx,

where f(x) is an arbitrary function and a is finite. The function should incorporate the necessary Laguerre polynomial roots and the corresponding weight factors (see Table 2.4) for the m-point quadrature where m may be any of 2, 3, 4, 5, 6, 10, or 15. Let the argument list be (A, M, F), where A and M have obvious interpretations, and F is a function that evaluates f(x) for any x.

To test the routine, write a short main program and appropriate functions to compute:

(a) the gamma function

    Γ(a) = ∫ from 0 to ∞ of x^(a-1) e^(-x) dx,

for a = 1.0, 1.2, 1.4, 1.6, 1.8, 2.0.

(b) the exponential integral,

    E1(a) = ∫ from a to ∞ of (e^(-x)/x) dx,

for a = 0.5, 1.0, 1.5, 2.0. In each case the integral should be evaluated using the 2, 3, 4, 5, 6, 10, and 15-point quadratures. Compare the results with tabulated values.

2.23 Write and test a function, named HERMIT, that implements the Gauss-Hermite quadrature of (2.115) for n = 1, 2, 3, 4, 9, and 19 (see Table 2.6 and reference [3]). Let the argument list be (N, F), where N and F correspond to n and f in (2.115). Use HERMIT to evaluate the integral and compare with the true value, π/[2 sin(π/4)].

2.24 In a study by Carnahan [17], the fraction f_Th of certain fission neutrons having energies above a threshold energy E_Th (MeV) was found to be

    f_Th = 1 - 0.484 ∫ from 0 to E_Th of sinh(√(2E)) e^(-E) dE.

Evaluate f_Th within ±0.001 for E_Th = 0.5, 2.9, 5.3, and 8.6 MeV.

2.25 The following relation is available [3] for P'_n(x), the derivative of the nth-degree Legendre polynomial:

    (x^2 - 1) P'_n(x) = nxP_n(x) - nP_{n-1}(x).

Show that the family of polynomials P'_n(x) is orthogonal on the interval [-1,1] with respect to the weighting function (x^2 - 1). Show also that a Gaussian quadrature of the form

    ∫ from -1 to 1 of f(x) dx ≈ w_0 f(-1) + Σ for i = 1 to n-1 of w_i f(x_i) + w_n f(1)

can be developed for the case in which two base points are preassigned (x_0 = -1, x_n = 1), that the quadrature is exact when f(x) is a polynomial of degree 2n - 1 or less, that x_1, x_2, ..., x_{n-1} are the zeros of P'_n(x), and that the weight factors are

    w_i = 2/[n(n + 1)(P_n(x_i))^2],   i = 0, 1, ..., n.

The above is known as Lobatto quadrature.

2.26 Show that a Gaussian quadrature,

    ∫ from -1 to 1 of f(x) dx ≈ Σ for i = 0 to n of w_i f(x_i),

for which x_0 = -1 and the remaining base points are the n roots of

    [P_n(x) + P_{n+1}(x)]/(1 + x) = 0,

is exact when f(x) is a polynomial of degree 2n or less. Find the weight factors w_i, i = 0, 1, ..., n, for n = 1, 2, 3.

2.27 A useful quadrature formula, attributed to Chebyshev, is given by
    ∫ from -1 to 1 of f(x) dx ≈ [2/(n + 1)] Σ for i = 0 to n of f(x_i),

where the x_i are the n + 1 roots of an (n + 1)th-degree polynomial C_{n+1}(x). The first few of these polynomials are:

    C_1(x) = x,   C_2(x) = x^2 - 1/3,   C_3(x) = x^3 - x/2,   ....

It can be shown [26] that the roots of these polynomials all lie in (-1,1) and are real, whereas some of the roots of C_n(x), for n = 8 and n > 10, are complex. Note that this quadrature has the attractive feature that all the weight factors are equal, and the weighting function for the integral is w(x) = 1.

(a) Show that the polynomials C_n(x) are not orthogonal with respect to the weighting function w(x) = 1 on the interval [-1,1].
(b) Find the error term for the quadrature.
(c) What is the degree of precision of the formula for n even and n odd?
(d) Show that the two-point formula is identical to the two-point Gauss-Legendre quadrature of (2.84).
(e) Modify the quadrature to allow estimation of integrals of the form

    ∫ from a to b of f(x) dx,

where a and b are finite.

2.28 Using the roots of the polynomials C_n(x) of Problem 2.27, as found by the program developed in Problem 3.32, write a function, named CHEB2, with argument list (A, B, F, M, R), that implements the composite version of the quadrature developed in part (e) of Problem 2.27. Here, A and B are the lower and upper integration limits, M is the number of points in the quadrature formula, and R (integer) is the number of repeated applications of the quadrature. Thus the M-point quadrature is to be applied to R nonoverlapping subintervals of [A, B], each of length (B - A)/R. F is another function that evaluates f(x) for any x.

Write a main program and accompanying function F to allow computation of the complete elliptic integral of the second kind,

    E(α) = ∫ from 0 to π/2 of (1 - α sin^2 x)^(1/2) dx.

The main program should read values for ALPHA (α), M, and R, call upon CHEB2 to return the estimated value of the integral, print the results, and return to read another data set. Since CHEB2 should be a general routine, F should have only one argument (say X). ALPHA should be available to the function F through a COMMON (or equivalent) declaration.

For a variety of values for M and R, calculate values of E(α), for α = 0., 0.1, 0.25, and 0.5. Compare your results with the tabulated values [3]:

    E(0.00) = 1.570796327
    E(0.10) = 1.530757637
    E(0.25) = 1.467462209
    E(0.50) = 1.350643881

2.29 A very light spring of length L has Young's modulus E and cross-sectional moment of inertia I; it is rigidly clamped at its lower end B and is initially vertical (Fig. P2.29). A downward force P at the free end A causes the spring to bend over. If θ is the angle of slope at any point and s is the distance along the spring measured from A, then integration of the exact governing equation EI(dθ/ds) = -Py, noting that dy/ds = sin θ, leads to [21]:

    ds = -dθ/[(2P/EI)(cos θ - cos α)]^(1/2),

and

    L = ∫ from 0 to α of dθ/[(2P/EI)(cos θ - cos α)]^(1/2),

where α is the value of θ at A.

Figure P2.29 [initially vertical spring, clamped at its lower end B, loaded by a downward force P at the free end A]

Show from the above that the Euler load P_c, for which the spring just begins to bend, is given by

    P_c = π^2 EI/4L^2.

Let x_a and y_b denote the vertical distance of A above the datum plane and the horizontal distance of B from A, respectively. Compute the values of P/P_c for which x_a/L = 0.99, 0.95, 0.9, 0.5, and 0. What are the corresponding values of y_b/L and α? (Note that the above expression for L and a related expression for x_a/L involve elliptic integrals.)

2.30 A semiinfinite medium (x ≥ 0) has a thermal diffusivity α and a zero initial temperature at time t = 0. For t > 0, the surface at x = 0 is maintained at a temperature T_s = T_s(t). By using Duhamel's theorem (see p. 62 of Carslaw and Jaeger
[22], for example), the subsequent temperature T(x,t) inside the medium can be shown to be given by

    T(x,t) = [x/(2√(πα))] ∫ from 0 to t of T_s(λ) (t - λ)^(-3/2) exp[-x^2/(4α(t - λ))] dλ.

(An alternative form of the integral can be obtained by introducing a new variable: μ = x/[2√(α(t - λ))].)

Let T_s represent the periodic temperature in °F at a point on the earth's surface. For example, the mean monthly air temperatures in Table P2.30 have been reported [23] at the locations indicated.

Table P2.30

                 Nagpur   Cape Royds   Yakutsk
    January       66.9       26.1       -45.0
    February      73.9       20.4       -35.4
    March         82.5        4.9       -11.4
    April         89.9      -10.9        14.0
    May           92.4       -5.5        40.0
    June          86.4       -7.1        58.0
    July          80.4      -17.0        65.7
    August        80.3      -15.7        59.0
    September     80.2       -5.7        42.0
    October       78.3        4.5        16.5
    November      72.3       17.0       -22.1
    December      67.7       30.0       -41.2

Compute the likely mean monthly ground temperatures at 5, 10, 20, and 50 feet below the earth's surface at each of the above locations. Plot these computed temperatures to show their relation to the corresponding surface temperatures. In each case, assume: (a) dry ground with α = 0.0926 sq ft/hr, (b) the mean monthly ground and air temperatures at the surface are approximately equal, and (c) the pattern of air temperatures repeats itself indefinitely from one year to the next.

2.31 Suppose that (m + 1)(n + 1) functional values f(x_i, y_j) are available for all combinations of m + 1 levels of x_i, i = 0, 1, ..., m, and n + 1 levels of y_j, j = 0, 1, ..., n. Define Lagrangian interpolation coefficients X_{m,i}(x), i = 0, 1, ..., m, and Y_{n,j}(y), j = 0, 1, ..., n, as in Problem 1.35. Let the integral over the rectangular domain a ≤ x ≤ b, c ≤ y ≤ d,

    I = ∫ from a to b ∫ from c to d of f(x,y) dy dx,

be approximated by the integral of the two-dimensional interpolating polynomial of degree m in x and of degree n in y, that is, let

    I ≈ ∫ from a to b ∫ from c to d of [Σ over i Σ over j of X_{m,i}(x) Y_{n,j}(y) f(x_i, y_j)] dy dx.

Show that the integral may be rewritten as

    I ≈ Σ for i = 0 to m Σ for j = 0 to n of A_{i,j} f(x_i, y_j),

where

    A_{i,j} = α_i β_j,   α_i = ∫ from a to b of X_{m,i}(x) dx,   β_j = ∫ from c to d of Y_{n,j}(y) dy.

2.32 Show that if the x_i are equally spaced on the interval [a,b], and the y_j are equally spaced on the interval [c,d], then the coefficients α_i and β_j of Problem 2.31 are the coefficients of the (m + 1)- and (n + 1)-point Newton-Cotes closed integration formulas of (2.21).

2.33 Find the error term for the quadrature formula developed in Problem 2.32.

2.34 Write a function, named SIMPS2, that evaluates the double integral

    I = ∫ from a to b ∫ from c to d of f(x,y) dy dx,

using the two-dimensional quadrature formula developed in Problem 2.31 with m = n = 2, x_0 = a, x_1 = (a + b)/2, x_2 = b, y_0 = c, y_1 = (c + d)/2, and y_2 = d. The function should have arguments

    (A, B, C, D, F),

where A, B, C, and D have obvious interpretations, and F is the name of another function with arguments (X, Y) that calculates f(x,y) when necessary. What is the error term for this two-dimensional implementation of Simpson's rule?

2.35 Develop a composite quadrature formula for the two-dimensional Simpson's rule described in Problem 2.34. In this case, the integral over the rectangular domain a ≤ x ≤ b, c ≤ y ≤ d, is computed as the sum of integrals evaluated for nonoverlapping rectangular subdomains. Let the individual rectangles have dimensions (b - a)/N_x and (d - c)/N_y, so that the simple rule is repeated N_x N_y times.

2.36 Write a function, named SIMPC2, that implements the composite two-dimensional Simpson's rule developed in Problem 2.35. The argument list should include NX and NY (N_x and N_y, respectively), together with the arguments described in Problem 2.34. Although each application of the two-dimensional Simpson's rule requires the evaluation of nine functional values, eight of these functional values are, in general, on the "boundaries" of other rectangular domains as well. Show that, while straightforward repeated applications of the rule might require as many as 9 N_x N_y evaluations of the function, the minimum number of functional evaluations required is only (2N_x + 1)(2N_y + 1). Your function should require as few functional evaluations as possible. The estimate of the integral should be returned as the value of the function.

2.37 Develop an algorithm, based on the one-dimensional Gauss-Legendre quadrature formula (2.96), for numerical evaluation of integrals of the form

    I = ∫ from a to b ∫ from c to d of f(x,y) dy dx.

2.38 Impien~entthe algorithm developed in Problem 2.37 the view factor becomes
as a function named INTZD. Use the function to evaluate the
integral

Write a program that will read values for L, W 1 and


, WZ,
and that will use numerical integration to estimate F12.
and compare wiih the exact value, 2n/3.
2.39 As shown in Fig. P2.39, a differential element of area,
dAl(a plane source) is situated at a distanceL from the center
0 of a circular disc (surface "2") of radius a. The linejoining 0
to dAl is normal to dA, and also makes an angle a with the
plane of the disc. From Problem 2.41, the fraction of radiation
leaving dAl that impinges on the disc is given by

where the integral is taken over the area A z of the disc. Polar
coordinates should be used to span A 2 ,that is, let dA, = R dR
dB, where X is the radial measure (0 < R & a) and 8 is the
measure (0 < 8 < %n)in Az. , A

Figure P2.41
Suggested Tesi Data
L = 1, with Wl = W zhaving the values 0.5, 1, and 2 in
sequence.
2.42 Repeat Problem 2.41, now with an absorbing (but
not reradiating or scattering) gas intervening between the two
surfaces. :n this case, the integrand will have to be multiplied
by an additional factor e l B ' , where /3 is an attenuation coeffi-
Compute F d A , - 4 2 for all combinat~onsof L/a = 1,2, 3, and cient. Suggested values are PL = 0.5, 1, and 2.
5, a = JOO,60°, and 90°, using the fsnction SIMPC2 from 2.43 Two parallel squares of side L are opposite each other,
Problem 2.36. For a = 90", check the results against the a distance d apart. Starting from the definition given in
known analytical value: Problem 2.41, evaluate FlZ for radiant inte~hangebetween the
two squares for d/k = 0.5, 1, and 2.
2.44 A beam of radiation of intensity rF is incident at an
angle 80 to the normal of the surface of a semiinfinite medium
3.40 Express the integral of Problem 2.39 in terms of Car- that scatters isotropically with an albedo w o (fraction of a
tesian coordinates and use the function INTZD deveioped in pencil of radiation that is scattered at any point). Because of
Problem 2.38 to evaluate the integral for the data of Problem scattering within the medium, some of the radiation will
2.39. Compare the results vvith the analytical solution. reemerge from the surface. It can be shown (see, for example,
2.41 Consider radiant heat transfer between surface 1 and Chandrasekhar [25]) that the intensity of radiation emerging
surface 2, of total areas A, and A2,respectively. The view at an angle 0 to the normal is given by
factor F1l is defined as the fraction of the total radiation
leaving surface 1 that impinges on surface 2 directly. If surface
1is diffuse, radiating uniformly in all directions, it can be shown
that where p = cos B and p o = cos Bo. The function H satisfies the

~1 = -J- j-+,F
following integral equation, which is of general importance in
cos COS
~ ~ nAl A, A2
dAr dAz,
$2
theories of radiation scattering:

where r is the distance between two elements dA, and dA2 on


the respective surfaces, and 4, and +z are the angles between
the line joining dA, and dA, and their respective normals. In the present application, the characteristic function is simply
Next consider two adjacent rectangles, mutually inclined at 'Y(p)= &wo(constant).
90°, illustrated in Fig. P2.41. For the elements shown, Write a program that will compute and tabulate the function
rZ = (W - y)' i- +
x2 zZ, cos 4, = x / r , and cos 2 - z/r, and
#J M for all combinations of p and w o , both ranging from 0 to 1

in increments of 0.1. Then, evaluate I/Ffor several test values (Eh- Ec), cr = a(X, L, p) = (TI - Ec)I(Eh- Ec), and 4 =
of 6, 60, and w,. $(L) = q/(Eh- Ed.)
2.45 As shown in Fig. P2.45, two infinite black parallel 2.46 Referring to Table 2.7, verify the algebraic approxima-
plates at absolute temperatures Thand T, are separated by an tions to the derivative and their associated error terms.
optical thickness T = L of a gray gas that absorbs and emits 2.47 Jenkins and White [24] give values, reproduced here
(but does not scatter) radiant energy. (Optical thickness in Table P2.47, for the refractive index, n, and the dispersion,
measures the ability of a gas path of length z to attenuate dn/dX, of barium flint glass. The wavelength, A, is in angstrom
radiation, and is defined as T = f', ~p dz, where p = density units.
and K = mass absorption coefficient.)
Table P2.47

7 =0 7=L
Figure P2.45

The intensities of radiation in the outward (Tincreasing) and


inward (T decreasing) directions depend both on T and on the Make a critical evaluation of these tabulated values of dis-
angular direction 6; from the basic transfer equation (see, for persion, under each of the following circumstances:
example, Chandrasekhar [25]), it may be shown that these (a) If the only additional information is that n is a mono-
intensities I are given by the solution of the simultaneous integral equations

I(τ, μ) = E_h e^(-τ/μ) + (1/μ) ∫_0^τ E(t) e^(-(τ-t)/μ) dt     (outward, 0 < μ ≤ 1),

I(τ, μ) = E_c e^((L-τ)/μ) - (1/μ) ∫_τ^L E(t) e^((t-τ)/μ) dt   (inward, -1 ≤ μ < 0).

In both cases, μ = cos θ. Also, E_h = σT_h^4 and E_c = σT_c^4 are the emissive powers of the hot and cold plates, where σ = 1.355 × 10^-12 cal/sec cm² °K⁴ is the Stefan-Boltzmann constant. E is the emissive power of the gas and is given as a function of τ by

E(τ) = (1/2) ∫_-1^1 I(τ, μ) dμ,

in which the right-hand side is actually the average intensity of radiation at any point.

Write a program that will solve the above equations to give the outward and inward intensities as functions of τ and μ, and E as a function of τ. The accuracy of the computations should be checked by computing the net radiant flux density in the +τ direction,

F(τ) = 2 ∫_-1^1 I(τ, μ) μ dμ,

which should be constant for all τ. Solutions should be obtained at points that subdivide L into n equal increments Δτ = L/n, and for angular directions whose cosines are equally spaced by an amount Δμ in the range -1 ≤ μ ≤ 1.

Suggested Test Data

T_h = 1500°K, T_c = 1000°K, n = 10, Δμ = 0.1, with L = 0.1, 0.5, and 2.0 in turn.

(The above equations can be written in dimensionless form, if required, by introducing X = τ/L, ξ = t/L, and ε(X, L) = (E - E_c)/(E_h - E_c).)

...tonic function of λ. (b) If n is known to vary approximately with λ according to the simplified Cauchy equation, n = A + B/λ², where A and B are constants.

2.48 The following quantities find application in the study of certain electronic vacuum tubes: (a) dynamic plate resistance, r_p = (∂v_a/∂i_a)_{v_g}, (b) amplification factor, μ = -(∂v_a/∂v_g)_{i_a}, and (c) transconductance, g_m = (∂i_a/∂v_g)_{v_a}. Here, v_a = anode voltage, v_g = grid voltage, and i_a = anode current; representative values for the triode 6J5 can be found in Problem 1.42.

Write a function with three entries, RP6J5, MU6J5, and GM6J5, that will compute r_p, μ, and g_m, respectively, when supplied with any two of the following variables: (1) v_g, (2) v_a, and (3) i_a. The first calling argument should be an integer, having the value 1, 2, or 3, indicating which of these three variables is omitted. For example, we might have:

RP = RP6J5(1, VA, IA)
RP = RP6J5(2, VG, IA)
GM = GM6J5(3, VG, VA)

The a-c signal voltage amplification A for a triode with a load resistance R in the anode circuit is readily shown (see Smith [19], for example) to be

A = μR/(r_p + R).

To test the above functions, write a short calling program that will accept values for v_g, v_a, and R, and then compute and print the corresponding values of r_p, μ, g_m, A, and the product r_p g_m (which theoretically should equal μ, thus affording a check on the accuracy of the calculations). Suggested test data: R = 1.5, 8.2, and 22.0 kΩ for each of the following combinations: v_g = -2.7, v_a = 130 volts; v_g = -10.0, v_a = 287; v_g = -20.9, v_a = 405.
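The product check r_p·g_m = μ called for in Problem 2.48 is easy to exercise numerically. The sketch below (Python, for illustration only) uses an assumed Child's-law triode model rather than the tabulated 6J5 data of Problem 1.42; the constants `k` and `mu_true` and all function names are hypothetical. Central differences approximate the partial derivatives, and the standard small-signal result A = μR/(r_p + R) gives the voltage amplification.

```python
def ia(va, vg, k=7e-4, mu_true=20.0):
    """Assumed anode-current law i_a = k*(v_a + mu*v_g)**1.5 (Child's-law form)."""
    drive = va + mu_true * vg
    return k * drive ** 1.5 if drive > 0.0 else 0.0

def gm(va, vg, h=1e-4):
    """Transconductance g_m = (d i_a / d v_g) at constant v_a, by central difference."""
    return (ia(va, vg + h) - ia(va, vg - h)) / (2.0 * h)

def rp(va, vg, h=1e-4):
    """Dynamic plate resistance r_p = (d v_a / d i_a) at constant v_g."""
    di_dva = (ia(va + h, vg) - ia(va - h, vg)) / (2.0 * h)
    return 1.0 / di_dva

def amplification(mu, r_p, load_r):
    """Small-signal voltage amplification A = mu*R/(r_p + R)."""
    return mu * load_r / (r_p + load_r)
```

For this model the product r_p·g_m recovers μ almost exactly, which is the same internal-consistency check the problem asks the calling program to print.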
Problems 139
The case of specified high-tension supply voltage (instead of the actual anode voltage) is discussed in Problem 3.20.

2.49 Show that for m = 1, n = 4, and x_i = x_0 + ih, (2.120) leads to the following formulations for f'(x_i):

f'(x_1) = (1/12h)[-3f(x_0) - 10f(x_1) + 18f(x_2) - 6f(x_3) + f(x_4)] - (h^4/20) f^(5)(ξ),

f'(x_2) = (1/12h)[f(x_0) - 8f(x_1) + 8f(x_3) - f(x_4)] + (h^4/30) f^(5)(ξ),

.................................................

f'(x_4) = (1/12h)[3f(x_0) - 16f(x_1) + 36f(x_2) - 48f(x_3) + 25f(x_4)] + (h^4/5) f^(5)(ξ).

2.50 Show that for m = 2, n = 4, and x_i = x_0 + ih, (2.120) leads to the following formulations for f''(x_i):

f''(x_0) = (1/24h²)[70f(x_0) - 208f(x_1) + 228f(x_2) - 112f(x_3) + 22f(x_4)] - (5/6)h³ f^(5)(ξ_1) + ⋯,

f''(x_1) = (1/24h²)[22f(x_0) - 40f(x_1) + 12f(x_2) + 8f(x_3) - 2f(x_4)] + (1/12)h³ f^(5)(ξ_1) - ⋯,

f''(x_2) = (1/24h²)[-2f(x_0) + 32f(x_1) - 60f(x_2) + 32f(x_3) - 2f(x_4)] + (h^4/90) f^(6)(ξ),

f''(x_3) = (1/24h²)[-2f(x_0) + 8f(x_1) + 12f(x_2) - 40f(x_3) + 22f(x_4)] - (1/12)h³ f^(5)(ξ_1) + ⋯.

2.51 Show that for x = x_i, (2.120) may be written in the form

f^(m)(x_i) = Σ_{j=0}^{n} a_j f(x_j),

where the a_j are solutions of the simultaneous linear equations

a_0 + a_1 + ⋯ + a_n = 0,
.................................................
x_0^n a_0 + x_1^n a_1 + x_2^n a_2 + ⋯ + x_n^n a_n = [n!/(n - m)!] x_i^(n-m),

the kth equation having right-hand side [k!/(k - m)!] x_i^(k-m) for k ≥ m and zero for k < m.

Hint: Equation (2.120) must be satisfied exactly if f(x) is a polynomial of degree n or less, in particular, if f(x) = 1, x, x², ..., x^n. This approach is known as the method of undetermined coefficients. Note that the formulation does not assume equal spacing for the base points, x_i, i = 0, 1, ..., n. (See also Problem 5.9.)

2.52 Write a function, named DSPLIN, with argument list

(N, X, Y, XARG)

that will evaluate the derivative of the cubic spline function (see Problems 1.24, 1.25, and 1.26) at x = XARG, where the N + 1 base points for the spline fit are available in X(1), ..., X(N + 1), and the corresponding functional values are available in Y(1), ..., Y(N + 1). The value of the spline derivative at XARG should be returned as the value of DSPLIN.
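The method of undetermined coefficients described in Problem 2.51 is small enough to try directly. The sketch below (Python, for illustration; the node choice x_j = 0, 1, 2, 3, 4, the target point x_i = 2, and m = 1 are assumptions made here only for the test) builds the Vandermonde-type equations and solves them by Gaussian elimination. For equally spaced points it should reproduce the familiar weights [1, -8, 0, 8, -1]/12h of Problem 2.49.

```python
from math import factorial

def derivative_weights(nodes, xi, m):
    """Weights a_j with sum_j a_j*f(x_j) ~ f^(m)(xi), forced exact for 1, x, ..., x^n."""
    n = len(nodes) - 1
    # Row k of the system:  sum_j nodes[j]**k * a_j = d^m/dx^m [x**k] at x = xi.
    A = [[x ** k for x in nodes] for k in range(n + 1)]
    rhs = [0.0 if k < m else factorial(k) / factorial(k - m) * xi ** (k - m)
           for k in range(n + 1)]
    # Naive Gaussian elimination with partial pivoting (fine for tiny systems).
    for col in range(n + 1):
        piv = max(range(col, n + 1), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        rhs[col], rhs[piv] = rhs[piv], rhs[col]
        for r in range(col + 1, n + 1):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
            rhs[r] -= f * rhs[col]
    a = [0.0] * (n + 1)
    for r in range(n, -1, -1):
        s = sum(A[r][c] * a[c] for c in range(r + 1, n + 1))
        a[r] = (rhs[r] - s) / A[r][r]
    return a
```

Note that, as the problem states, nothing in the construction requires the base points to be equally spaced; unequal nodes simply change the Vandermonde rows.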
140 Numerical Integration

Bibliography

1. B. W. Arden, An Introduction to Digital Computing, Addison-Wesley, Reading, Massachusetts, 1963.
2. F. B. Hildebrand, Introduction to Numerical Analysis, McGraw-Hill, New York, 1956.
3. "Handbook of Mathematical Functions," Natl. Bur. Standards Appl. Math. Series, 55, Washington, D.C., 1964.
4. "Tables of Functions and Zeros of Functions," Natl. Bur. Standards Appl. Math. Series, 37, Washington, D.C., 1954.
5. J. F. Steffensen, Interpolation, Williams and Wilkins, Baltimore, 1927; 2nd ed., Chelsea, New York, 1950.
6. F. L. Bauer, H. Rutishauser, and E. Stiefel, "New Aspects in Numerical Quadrature," Proceedings of Symposia in Applied Mathematics, XV, pp. 199-218, American Mathematical Society, Providence, R.I., 1963.
7. A. H. Stroud and D. Secrest, Gaussian Quadrature Formulas, Prentice-Hall, Englewood Cliffs, New Jersey, 1966.
8. A. Ralston, A First Course in Numerical Analysis, McGraw-Hill, New York, 1965.
9. I. S. Berezin and N. P. Zhidkov, Computing Methods, Vol. I (English translation), Pergamon Press, London, 1965.
10. J. O. Wilkes, The Finite-Difference Computation of Natural Convection in an Enclosed Rectangular Cavity, Ph.D. Thesis, University of Michigan, 1963.
11. W. N. Gill and M. Scher, "A Modification of the Momentum Transport Hypothesis," A.I.Ch.E. Journal, 7, 61-65 (1961).
12. J. G. Knudsen and D. L. Katz, Fluid Dynamics and Heat Transfer, McGraw-Hill, New York, 1954.
13. P. J. Davis and P. Rabinowitz, Numerical Integration, Blaisdell, Waltham, Massachusetts, 1967.
14. C. W. Clenshaw, "Chebyshev Series for Mathematical Functions," Math. Tables, Vol. 5, Nat. Phys. Lab., G. Britain, 1962.
15. J. H. Perry, ed., Chemical Engineers' Handbook, 3rd ed., McGraw-Hill, New York, 1950.
16. K. G. Denbigh, The Principles of Chemical Equilibrium, Cambridge University Press, London, 1957.
17. B. Carnahan, Radiation Induced Cracking of Pentanes and Dimethylbutanes, Ph.D. Thesis, University of Michigan, 1964.
18. I. S. and E. S. Sokolnikoff, Higher Mathematics for Engineers and Physicists, 2nd ed., McGraw-Hill, New York, 1941.
19. R. J. Smith, Circuits, Devices, and Systems, Wiley, New York, 1966.
20. M. A. Snyder, Chebyshev Methods in Numerical Approximation, Prentice-Hall, Englewood Cliffs, New Jersey, 1966.
21. S. Timoshenko, Theory of Elastic Stability, McGraw-Hill, New York, 1936.
22. H. S. Carslaw and J. C. Jaeger, Conduction of Heat in Solids, 2nd ed., Oxford University Press, London, 1959.
23. "Polar Regions" and "Climate and Climatology," Encyclopaedia Britannica, 11th ed., Cambridge University Press, London, 1910.
24. F. A. Jenkins and H. E. White, Fundamentals of Optics, McGraw-Hill, New York, 1951.
25. S. Chandrasekhar, Radiative Transfer, Dover, New York, 1960.
26. V. I. Krylov, Approximate Calculation of Integrals, Macmillan, New York, 1962.
CHAPTER 3

Solution of Equations

3.1 Introduction

This chapter discusses methods for finding the roots of equations such as x³ - 3x + 1 = 0, or cos x - x = 0. While the emphasis will be on polynomial equations, several of the techniques to be discussed will also be applicable to transcendental equations, such as the second example above.

The problems to be discussed will range from the finding of all roots of the equation, with no prior information about their locations, to the more precise evaluation of a single real root already known to be the only one lying within some fairly short interval. Sometimes, real roots are the only ones of interest; in other cases, complex roots are also sought. The coefficients of the equations will usually be regarded as real in the present discussion, although parts of it will be valid without this restriction.

For the remainder of the chapter, the equation to be solved, if not specifically designated, will be written

f(x) = 0,   (3.1)

and f will consistently denote the function appearing on the left-hand side of (3.1). Since some of the methods of solution to be discussed are applicable only to algebraic equations, we shall regard f(x) as a polynomial throughout the discussion. The extent to which this restriction may be lifted at various points will be apparent from the context. With this understanding, f(x) may be expressed in any of the forms

f(x) = x^n + a_1 x^(n-1) + ⋯ + a_(n-1) x + a_n
     = (x - α_1)(x - α_2)⋯(x - α_n).   (3.2)

Here n is the degree of the equation, a_1, a_2, ..., a_n are the coefficients, and α_1, α_2, ..., α_n are the roots. It is understood throughout that |α_1| ≥ |α_2| ≥ ⋯ ≥ |α_n| and a_n ≠ 0.

3.2 Graeffe's Method

Of the numerous methods of solution that have been suggested, the most completely "global" ones, in the sense of yielding simultaneous approximations to all roots, are probably Graeffe's root-squaring technique [11] and the QD method of Section 3.9.

Consider the function φ defined by

φ(x) = (-1)^n f(x) f(-x) = (x² - α_1²)(x² - α_2²)⋯(x² - α_n²).   (3.3)

Since φ(x) is a polynomial containing only even powers, we may define the polynomial

f_2(x) = (x - α_1²)(x - α_2²)⋯(x - α_n²),

which has the property that the roots of f_2(x) = 0 are the squares of the roots of (3.2). Repeating this operation, we obtain a sequence of polynomials f_2, f_4, f_8, f_16, ... such that the equation

f_m(x) = (x - α_1^m)(x - α_2^m)⋯(x - α_n^m) = 0,   (3.4)

where m is a positive integral power of 2, has the roots α_1^m, α_2^m, ..., α_n^m. The coefficients of f_m are determined from the coefficients of the preceding polynomial by an algorithm that is described later.

The object of this procedure is to produce an equation having roots differing greatly in magnitude, since the roots of such an equation can be approximated by simple functions of its coefficients. If the roots of (3.2) are real and |α_1| > |α_2| > ⋯ > |α_n|, then the ratios

|α_2^m/α_1^m|, |α_3^m/α_2^m|, ..., |α_n^m/α_(n-1)^m|

can all be made as small as desired by making m large enough. Expanding (3.4) leads to

f_m(x) = x^n - (α_1^m + ⋯)x^(n-1) + (α_1^m α_2^m + ⋯)x^(n-2) - (α_1^m α_2^m α_3^m + ⋯)x^(n-3) + ⋯ + (-1)^n α_1^m α_2^m ⋯ α_n^m.   (3.5)

Writing the right-hand side of (3.5) in the form

f_m(x) = x^n - A_1 x^(n-1) + A_2 x^(n-2) - ⋯ + (-1)^n A_n,   (3.6)

we derive the approximations

α_1^m ≐ A_1,  α_2^m ≐ A_2/A_1,  ...,  α_n^m ≐ A_n/A_(n-1).   (3.7)

From these, by taking mth roots, we may approximate the values of the roots α_1, α_2, ..., α_n of (3.2).

Since the signs of the roots are not determined, they must be checked by substitution or otherwise. If the roots do not all have different absolute values, as will be true if multiple or complex roots are present, the situation is more complicated. If |α_1| = |α_2|, while the other roots

are of lesser magnitude, the first terms on the right of (3.5) are approximated by x^n - (α_1^m + α_2^m)x^(n-1) + (α_1^m α_2^m)x^(n-2). It is seen from the above that in this case α_1^m and α_2^m are approximated by the roots of the quadratic equation x² - A_1 x + A_2 = 0, while if α_s and α_(s+1) are the roots of equal magnitude, the equation

A_(s-1) x² - A_s x + A_(s+1) = 0   (3.8)

yields approximations to α_s^m and α_(s+1)^m. Of course, this leaves m possible values from which α_s must be calculated by reference to the original equation. Should α_s be real, this offers no computational difficulty.

After a sufficient number of steps have been taken, it is clear from (3.5) that if the roots are real and distinct, the coefficients A_1, A_2, ..., A_n will be approximately squared at each iteration. However, if |α_s| = |α_(s+1)|, this will not be true of the coefficient A_s.

In spite of its advantage of furnishing approximations to all roots simultaneously, Graeffe's method does not seem to have aroused great enthusiasm among users of automatic equipment. One drawback is the need for making decisions not readily mechanized. The difficulties in locating complex roots have already been mentioned. Moreover, errors introduced at any stage have the effect of replacing the original problem with a new one, thus affecting the correctness of the roots arrived at rather than merely the rate of convergence. A technique for obviating this last difficulty will be explained in Section 3.4. Another feature of concern is that coefficients developed can quickly leave the usual floating-point range.

The coefficients in (3.6) can be found by performing the polynomial multiplication (-1)^n f(x) f(-x) and then compressing the result by ignoring the zero contents of alternate locations, starting with that corresponding to x^(2n-1), and storing the contents of the remaining locations in some suitable sequence of locations.

If desired, the coefficients may be found iteratively by the relations

(j+1)A_i = (-1)^i [ (jA_i)² + 2 Σ_(L=1)^(i) (-1)^L (jA_(i-L))(jA_(i+L)) ],   i = 1, 2, ..., n,

where 0A_i = a_i and jA_0 = 1. In the above formula, the presubscripts on A refer to the iteration counter; that is, jA_i is the value of A_i found on the jth pass of the iterative scheme, 0A_i is the initial value of A_i, etc. Indices greater than n mean that the number to be used is zero.

One application of value lies in the use of the root-squaring technique over a few iterations. When roots are distinct but nearly equal, the result may be a useful equation in which the roots have better separation. Then the method of Section 3.3 can be applied.

Example. Consider f(x) = x⁴ - 2x³ + 1.25x² - 0.25x - 0.75, which has the factors (x - 1.5), (x + 0.5), and (x² - x + 1). Using the algorithm above, it is found that

f_2(x) = x⁴ - 1.5x³ - 0.9375x² - 1.9375x + 0.5625,
f_4(x) = x⁴ - 4.125x³ - 3.80859375x² - 4.80859375x + 0.31640625,
f_8(x) = x⁴ - 24.6328125x³ - 24.53269958x² - 25.5327025x + 0.10011292.

Thus the predicted values of α_1^8, α_2^8, α_3^8, and α_4^8 are, respectively, 24.6, -0.498 + 0.888i, -0.498 - 0.888i, and 0.00392. These compare fairly well with the truncated true values 25.63, -0.5 + 0.866i, -0.5 - 0.866i, and 0.00391.

3.3 Bernoulli's Method

Let u_k, 0 ≤ k ≤ n - 1, be real numbers that we can choose somewhat arbitrarily (freedom of choice is discussed below). For k ≥ n, define

u_k = -(a_1 u_(k-1) + a_2 u_(k-2) + ⋯ + a_n u_(k-n)).   (3.9)

Bernoulli's method for finding a zero of

f(x) = Σ_(i=0)^(n) a_i x^(n-i)

is to observe relations involving the sequence {u_k} and certain related sequences, of which we use only those given by

v_k = u_k² - u_(k+1) u_(k-1),   t_k = u_(k-1) u_k - u_(k-2) u_(k+1).   (3.10)

The simplest theorem states that if |α_1| > |α_2|, then

α_1 = lim_(k→∞) u_k / u_(k-1).   (3.11)

An important advantage of Bernoulli's method over Graeffe's method is that (3.11) remains true, if suitable initial values are used, for the case α_1 = α_2 = ⋯ = α_i, provided |α_i| > |α_(i+1)|. In addition, if two or more roots are of equal modulus and one is of greater multiplicity than the others, the limit in (3.11) exists and is that root of greater multiplicity. Moreover, the technique for finding a conjugate pair of complex roots can be considerably simpler than when using the root-squaring method. The price paid is lack of knowledge of other roots.

If there are no multiple roots, it can be shown [1] that the unique solution of (3.9) for k ≥ 0 is of the form

u_k = c_1 α_1^k + c_2 α_2^k + ⋯ + c_n α_n^k,   (3.12)

where the character of the c_i is controlled by the initial choice of the u_k, 0 ≤ k ≤ n - 1. This being so, it is clear that the conclusion of (3.11) is warranted provided that c_1 is not zero. Should α_1 = α_2 and |α_2| > |α_3|, then the character of the solution is

u_k = (c_1 + c_2 k)α_1^k + c_3 α_3^k + ⋯ + c_n α_n^k.
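The root-squaring cycle just described can be sketched in a few lines. The following fragment (Python, for illustration only; the chapter's own implementation is the FORTRAN program of Example 3.1, and the cubic test polynomial with roots 1, 2, 3 is an assumed example) forms (-1)^n f(x) f(-x), keeps only the even-power positions, and after j iterations estimates the root magnitudes from coefficient ratios as in (3.7).

```python
def graeffe_step(c):
    """One root-squaring step: coefficients (highest power first, monic) of the
    polynomial whose roots are the squares of the roots of c."""
    n = len(c) - 1
    alt = [c[j] * (-1.0) ** (n - j) for j in range(n + 1)]  # coefficients of p(-x)
    prod = [0.0] * (2 * n + 1)
    for i in range(n + 1):                                   # p(x) * p(-x)
        for j in range(n + 1):
            prod[i + j] += c[i] * alt[j]
    sign = (-1.0) ** n
    # "Compress" by keeping only the coefficients of even powers of x.
    return [sign * prod[k] for k in range(0, 2 * n + 1, 2)]

def graeffe_magnitudes(coeffs, iterations=8):
    """Estimate |alpha_i| from coefficient ratios after repeated squarings (3.7)."""
    c = [float(x) / coeffs[0] for x in coeffs]               # normalize to monic
    for _ in range(iterations):
        c = graeffe_step(c)
    m = 2.0 ** iterations
    return [abs(c[i] / c[i - 1]) ** (1.0 / m) for i in range(1, len(c))]
```

As the text warns, the signs are not determined (they must be fixed by substitution into the original polynomial), and the squared coefficients grow doubly exponentially, so only a few iterations are possible in fixed-precision arithmetic.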
Similar relations hold for the general case, and the truth of (3.11) for the conditions described is then clear, provided that a properly chosen coefficient does not vanish.

Suppose next that |α_1| = |α_2| > |α_3|. It follows that

lim_(k→∞) v_k / v_(k-1) = α_1 α_2,   lim_(k→∞) t_(k+1) / v_k = α_1 + α_2.   (3.13)

The discussion is given only for α_1 ≠ α_2, although the proposition is valid as stated. (Indeed, it is true for α_1 = α_2 = ⋯ = α_i; α_(i+1) = α_(i+2) = ⋯ = α_2i; |α_1| = |α_2i| > |α_(2i+1)|, as well as when α_1 = α_2 = ⋯ = α_i; α_(i+1) = α_(i+2) = ⋯; |α_1| = |α_2i| = |α_(2i+1)|, provided that the following roots of equal modulus have multiplicity less than i.) Note that the limit of v_k/v_(k-1) is v'_k/v'_(k-1), where

v'_k = c_1 c_2 α_1^(k-1) α_2^(k-1) (2α_1 α_2 - α_1² - α_2²).

Also, the limit of t_(k+1)/v_k is t'_(k+1)/v'_k, where

t'_k = (c_1 α_1^k + c_2 α_2^k)(c_1 α_1^(k-1) + c_2 α_2^(k-1)) - (c_1 α_1^(k+1) + c_2 α_2^(k+1))(c_1 α_1^(k-2) + c_2 α_2^(k-2))
     = c_1 c_2 α_1^(k-2) α_2^(k-2) (α_1 + α_2)(2α_1 α_2 - α_1² - α_2²).

Consider next the problem of choosing u_0, u_1, ..., u_(n-1) so that none of the coefficients of dominant powers vanish. Since the relations of (3.12) (or similar ones in the case of multiple roots) hold for 0 ≤ k ≤ n - 1, and since the determinants involved are non-vanishing Vandermonde determinants, it suffices to choose u_(n-1) to be nonzero and u_k = 0 for k < n - 1. In that event, for each root, the coefficient indicative of the multiplicity of the root cannot vanish.

It should be borne in mind that a limit of the form (3.11) can correspond to a multiple root. When limits of the form (3.13) occur, α_1 and α_2 are found by solving x² - (α_1 + α_2)x + α_1 α_2 = 0. Here, too, the roots may be multiple.

Note that the numbers formed often exceed the usual floating-point range. The difficulty can be remedied by dividing the last n values of the u_k by some constant and then proceeding.

As previously indicated, Graeffe's method may be used prior to applying Bernoulli's method in order to increase the root-separation. Since convergence may be slow, and since multiple roots must be considered, we suggest that as soon as convergence seems to be established, the procedures of Section 3.4 on the iterative factorization of polynomials be employed.

A few further remarks concerning both Bernoulli's and Graeffe's methods will be found in the next chapter when companion matrices are discussed. Additional references to Bernoulli's method are given on page 161 of Henrici [1].

Example. Now consider the application of Bernoulli's method to the example of the previous section, namely, f(x) = x⁴ - 2x³ + 1.25x² - 0.25x - 0.75. The formula for u_k is thus

u_k = 2u_(k-1) - 1.25u_(k-2) + 0.25u_(k-3) + 0.75u_(k-4).

Table 3.1 gives u_k and u_k/u_(k-1) for the initial approximation u_0 = u_1 = u_2 = 0, u_3 = 1. The factors for the polynomial are (x - 1.5), (x + 0.5), and (x² - x + 1).

Table 3.1 Bernoulli's Method — First Example
[Tabulated values omitted.]

As a second example, take f(x) = x⁴ - x³ + 0.75x² + 0.25x - 0.25 = (x² - 0.25)(x² - x + 1). Again, let u_0 = u_1 = u_2 = 0, u_3 = 1. Table 3.2 shows u_k, v_k, t_k, v_k/v_(k-1), and t_(k+1)/v_k for values of k = 3, 4, ..., 16.

Table 3.2 Bernoulli's Method — Second Example
[Tabulated values omitted.]

Thus the dominant roots are found as the solution of x² - 0.99993x + 1.0002 = 0, as compared with the true factor x² - x + 1.
EXAMPLE 3.1

GRAEFFE'S ROOT-SQUARING METHOD

MECHANICAL VIBRATION FREQUENCIES
Problem Statement

Write a program that implements Graeffe's root-squaring method for finding real and distinct roots of the nth-degree polynomial

p_n(x) = a_0 x^n + a_1 x^(n-1) + ⋯ + a_(n-1) x + a_n,   (3.1.1)

where the a_i are real and a_0 ≠ 0.

Test the program with several different polynomials, in particular, the polynomial whose roots are related to the frequencies of vibration for the mechanical system illustrated in Figure 3.1.1 (see Problem 4.23 for details). Consider the horizontal motion of eight equal masses, that is, let m_1 = m_2 = ⋯ = m_8 = m. The spring stiffness coefficient, k_i, i = 1, 2, ..., 9, is the force required to extend or compress the ith spring by unit length. Find the frequencies of vibration for two cases:

A. m = 1 lb_m, and k_1 = k_2 = ⋯ = k_9 = 1 lb_m/sec².
B. m = 4 lb_m, and k_i = i/4 lb_m/sec², i = 1, 2, ..., 9.

Figure 3.1.1 Mass-spring system.

Method of Solution

Graeffe's method. In the program that follows, Graeffe's method as described by (3.3) and (3.4) is implemented. Data for the program are: n, itmax, T, ε, iprint, a_0, ..., a_n. Here, itmax is the maximum number of iterative squaring operations permitted. T is the maximum coefficient magnitude allowed, while 1/T is the minimum coefficient magnitude allowed (aside from zero); these limits on magnitude insure that all coefficients remain in the floating-point (REAL) number range. ε is a small positive number used in the root test (3.1.9), and iprint is a printing control switch.

The program first normalizes the coefficients a_i, i = 0, 1, ..., n, by dividing each by a_0, that is,

a_i ← a_i/a_0,   i = n, n - 1, ..., 1, 0,   (3.1.2)

where the left arrow is intended to mean assignment or replacement, rather than algebraic equality. Since Graeffe's method yields only the magnitudes of possible roots, one way of establishing the proper sign for a potential root is to evaluate the original polynomial (or its normalized equivalent) using both the positive and negative arguments. Hence, the values of the a_i should not be destroyed during the root-squaring process.

Let coefficients jA_i and (j+1)A_i, described in Section 3.2, be rewritten as

C_i = jA_i,   B_i = (j+1)A_i.   (3.1.3)

Then, given the initial values for the C_i, where

C_i = 0A_i = a_i,   (3.1.4)

the B_i for one iteration may be evaluated from

B_i = (-1)^i [ C_i² + 2 Σ_(L=1)^(i) (-1)^L C_(i-L) C_(i+L) ],   i = 1, 2, ..., n.   (3.1.5)

In (3.1.5), any elements C_k, k < 0 or k > n, are arbitrarily set to zero. After the computation of the B_i, i = 1, 2, ..., n, each B_i is tested to insure that its value is well within the floating-point number range. The permitted number range is [-T, -1/T], 0, [1/T, T]. The nonzero B_i are subjected to two magnitude tests,

|B_i| < T,   |B_i| > 1/T,   B_i ≠ 0,   i = 1, 2, ..., n.   (3.1.6)

Should all B_i pass both tests, the root-squaring process is continued. The C_i are assigned the newly computed values of the B_i, that is,

C_i ← B_i,   i = 1, 2, ..., n.   (3.1.7)

Then, the B_i for the next iteration are computed using (3.1.5).

The sequence (3.1.5), (3.1.6), and (3.1.7) is repeated cyclically until at least one B_i fails one of the magnitude tests (3.1.6), at which time the real and distinct roots, α_i, of (3.1.1) should be well separated and such that

ᾱ_i = |B_i / B_(i-1)|^(1/2^j),   i = 1, 2, ..., n   (with B_0 = 1),   (3.1.8)

where j is the number of iterations.

The original polynomial (3.1.1) is then evaluated using both +ᾱ_i and -ᾱ_i to determine which produces the smallest magnitude, and hence which sign should be assigned to ᾱ_i. Of course, the ᾱ_i found may not be roots at all (p_n(x) may have multiple or complex roots). A value ᾱ_i with proper sign is assumed to be a root if

|p_n(ᾱ_i)| < ε,   (3.1.9)

that is, if the magnitude of the polynomial evaluated at ᾱ_i is smaller than some small number ε.

Mechanical vibration system. Following the method outlined in Problem 4.23, the natural circular frequencies of vibration ω of the above system are given by ω² = λ, where λ may be any one of the eigenvalues of the following matrix:

⎡ k_1+k_2   -k_2      0        0     ⋯     0     ⎤
⎢  -k_2    k_2+k_3   -k_3      0     ⋯     0     ⎥
⎢   0       -k_3    k_3+k_4   -k_4   ⋯     0     ⎥   (3.1.10)
⎢   ⋮                                ⋱     ⋮     ⎥
⎣   0        0        0        0    -k_8  k_8+k_9⎦

Several methods for obtaining the eigenvalues of matrices are described in Chapter 4. One such method is to solve the characteristic equation of the matrix, described on page 220. Using the method of Danilevski, outlined in Section 4.9, the characteristic equations for matrix (3.1.10) for the two situations described in the problem statement are:

Case A. λ⁸ - 16λ⁷ + 105λ⁶ - 364λ⁵ + 715λ⁴ - 792λ³ + 462λ² - 120λ + 9 = 0   (3.1.11)

Case B. λ⁸ - 20λ⁷ + 157.0625λ⁶ - 623.4374λ⁵ + 1341.4450λ⁴ - 1557.6560λ³ + 912.2167λ² - 227.9003λ + 15.6643 = 0   (3.1.12)
Flow Diagram

FORTRAN Implementation
List of Principal Variables
Program Symbol    Definition

A†                Vector of polynomial coefficients, a_i.
B†                Vector of coefficients, B_i, of the squared polynomial after the current iteration (3.1.3).
C†                Vector of coefficients, C_i, of the squared polynomial prior to the current iteration (3.1.3).
EPS               Small positive number, ε, used in test (3.1.9) to determine if ᾱ_i is to be considered a root.
I, IM1, IML, IPL  i, i - 1, i - L, i + L, respectively.
IPRINT            Print control variable. If nonzero, coefficients B_i are printed after each iteration.
ITER              Iteration counter, j.
ITMAX             Maximum number of iterations allowed.
J                 Subscript (not to be confused with the iteration counter).
L                 L, offset subscript in (3.1.5).
N                 n, degree of starting polynomial.
NP1               n + 1.
PMINUS            p₋, value of p_n(-ᾱ_i).
PPLUS             p₊, value of p_n(ᾱ_i).
PVAL              p_n(ᾱ_i).
ROOT              ᾱ_i, possibly a root of p_n(x).
TOP               T, upper limit on the magnitudes of the coefficients B_i produced by the root-squaring process.

† Because of FORTRAN limitations (subscripts smaller than one are not allowed), all subscripts on the vectors of coefficients A, B, and C that appear in the text and flow diagram are advanced by one when they appear in the program; for example, a_0 becomes A(1), B_n becomes B(N + 1), etc.
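The sign-determination and acceptance tests (3.1.8)-(3.1.9) amount to two Horner evaluations and a comparison. A minimal sketch (Python, for illustration; the chapter's actual implementation is the FORTRAN listing that follows):

```python
def horner(coeffs, x):
    """Evaluate a polynomial given coefficients [1, a1, ..., an], highest power first."""
    p = 0.0
    for c in coeffs:
        p = p * x + c
    return p

def signed_root(coeffs, magnitude, eps=0.1):
    """Pick the sign of a Graeffe root-magnitude estimate and apply test (3.1.9).
    Returns (signed root, p_n at that root, True if |p_n| < eps)."""
    p_plus = horner(coeffs, magnitude)
    p_minus = horner(coeffs, -magnitude)
    if abs(p_plus) <= abs(p_minus):
        root, pval = magnitude, p_plus
    else:
        root, pval = -magnitude, p_minus
    return root, pval, abs(pval) < eps
```

The final boolean corresponds to the program's "PROBABLY A ROOT" / "PROBABLY NOT A ROOT" comment.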

Program Listing
APPLIED NUMERICAL METHODS, EXAMPLE 3.1
GRAEFFE'S ROOT-SQUARING METHOD

THIS PROGRAM USES GRAEFFE'S ROOT SQUARING TECHNIQUE TO FIND
THE REAL AND DISTINCT ROOTS OF THE NTH DEGREE POLYNOMIAL WHOSE
COEFFICIENTS ARE READ INTO A(1)...A(N+1) IN DECREASING
POWERS OF THE VARIABLE WITH THE CONSTANT TERM IN A(N+1). THE
COEFFICIENTS ARE FIRST NORMALIZED BY DIVIDING EACH BY A(1),
YIELDING A(1) = 1. THE ITERATIVE ROOT SQUARING PROCESS USES
TWO TEMPORARY VECTORS, C AND B, IN ALTERNATE SUCCESSION AND
CONTINUES UNTIL ONE OF THE COEFFICIENTS EXCEEDS TOP IN
MAGNITUDE OR UNTIL THE MAXIMUM NUMBER OF ITERATIONS ITMAX
IS EXCEEDED. AT THIS POINT THE MAGNITUDES OF THE POSSIBLE
ROOTS ARE COMPUTED (ROOT). IN ORDER TO DETERMINE THE PROPER
SIGN FOR ROOT, THE NORMALIZED POLYNOMIAL IS EVALUATED FOR
POSITIVE (PPLUS) AND NEGATIVE (PMINUS) ARGUMENTS. IF THE
MAGNITUDE OF PPLUS OR PMINUS IS SMALLER THAN EPS, IT IS
ASSUMED THAT A ROOT HAS BEEN FOUND. AN APPROPRIATE COMMENT
IS PRINTED ALONG WITH THE VALUE OF ROOT (WITH PROPER SIGN)
AND THE CORRESPONDING VALUE OF THE POLYNOMIAL, PVAL.
IF IPRINT IS NONZERO, COEFFICIENTS OF THE SQUARED POLYNOMIAL
ARE PRINTED AFTER EACH ITERATION.

DIMENSION A(1001, 8(100), C(100)

c c i j = 1,
READ ( 5 , 1 0 0 ) N, ITMAX,TOP,EPS, IPRINT
NP1 = N- + 1
READ ( 5 , 1 0 1 ) (A(l),I=l,NPl)

.....
DO 2
NORMALIZE A C O E F F I C I E N T S , I N I T I A L I Z E C VECTOR
1 = 2, N P 1
.....
A(!) = A(I)/A(l)
C(1) = A(1)
A ( 1 ) = 1.
WRITE ( 6 , 2 0 0 ) N,ITMAX,TOP,EPS,IPRINT,NPl8(A(I~,I=1,NP1)

...,. B E GI TI EN RGRAE.FFE8S I T E R A T I O N .....


DO 1 0 = 1, I T M A X
. COMPUTE C O E F F I C I E N T S OF F ( X ) * F ( - X ) ( W I T H APPROPRIATE
s IGN), IGNORING A L T E R N A T ~ Z E R O COEFFICIENTS .. ...
1 = 2, NPP
B(I) = C(I)*C(I)
IM1 * I
DO 3
1 -
L = 1, I M 1
IPL = I * L

B ( I ) = B ( I ) + (-l.)**L*Z.*C(IPL)*C(IML)
B(I) = (-l.)+*lMl*B(I
I F ( 1PRIlNT.EQ.O GO TO 6
WRITE ( 6 , 2 0 1 ) ITER,NPl,(B(I),I*l,NPl)

.....
DO 9
HAVE ANY C O E F F I C I E N T S EXCEEDED S I Z E L I M I T S
1 = 2 # NP1
....,
I F ( ABS(M(I)).GT.TOP .OR. A B S ( B ( I ) ) . L T . l , / T O P .AND. 8(I).NE.0.0
1 GOT011
CONTI NUE e
.....
DO 1 0
S H I F T C O E F F I C I E N T S FROM B TO C FOR NEXT I T E R A T I O N
1 = 2, N P l
.....
C(1) = B(U)

WRITE (6,202)
I T E R = ITLdAX

..... THE FOLLOWING STATEMENTS COMPUTE THE MAGNITUDES OF THE


P O S S I B L E ROOTS AND EVALUATE THE O R I G I N A L POLYNOMIAL FOR
BOTH P O S I T I V E AND N E G A T I V E VALUES OF THESE ROOTS .....

Program Listing (Continued)


C
11 WRITE (6,203)
DO 2 0 1 = 2, NPl
'"
a

C
C
..... COMPUTE
POLYNOMIAL .....
ESTIMATE OF ROOT FROM COEFFICIENTS OF SQUARED

ROOT= A B S ( B ( ' I ) / B ( I - 1 ) ) * * ( 1 . / 2 . * * l T E R )

..... EVALUATE POLYNOMIAL AT ROOT AND -ROOT


PPLUS = 1.
.....
PMINUS = 1.
DO 1 4 J = 2, NP1
PPLUS = PPLUS+ROOT + A ( J )
PMINUS = PMINUS+(-ROOT) + A ( J )

..... CHOOSE L I K E L Y ROOT AND MINIMUM POLYNOMIAL


I F ( ABS(PPLUS).GT.ABS(PMINUS) GO TO 1 6 )
VALUE .....
PVAL = PPLUS
GO TO 1 7
PVAL = PMINUS
ROOT = -ROOT

..... I S PVAL SMALLER THAN EPS


I F ( ABS(PVAL).GE.EPS ) GO TO 1 9
.....
WRITE (6,201) ROOT, PVAL, ITER
GO TO 2 0
WRITE (6,205) ROOT, PVAL, iTER
CONTI NUE
GO TO 1

..... FORMATS FOR INPUT AND OUTPUT STATEMENTS


FORMAT ( 1 0 X , 1 2 , 1 8 X , 1 2 , 1 8 X , E 6 . 1 / lOX,E6.1,14X,ll
.....
FORMAT (20X, 5F10.b )
FORMAT ( 1 0 H l N = , 1 8 / 10H ITMAX = , 18 / 1 0 H T O P = ,
1 l P E 1 4 . 1 / 1 0 H EPS = ,
lPE14.1/10H { P R I N T = , t 8 /
, 12, l H ) , / / ( 1 H , OP5F13.6)
2 20H0
FORMAT (10HOITER
1 (1H , lP5E15.6)
A(l)...A(
= .18 /20HO B(l)..,B(, 12, l H ) / 1H /

202 FORMAT (43HOITER EXCEEDS ITMAX -


CALCULATION CONTINUES )
CO
203 FORHAT ( 1 H O f 55H ROOT POtYNOMlAL VALUE ITER
lMMENT / 1H
204 FORMAT ( 1 H , F10.6, 1PE17.6, 18, 7X, 15HPROBABLY A ROOT )
205 FORMAT ( 1 H , F10.6, 1PE17.6, 18, 7X, 19HPROBABLY NOT A ROOT 1
C
END

Data
N = 2       ITMAX = 25      TOP = 1.0E30
EPS = 1.0E-1                IPRINT = 1
A(1)...A(3) =    1.0000   -3.0000    2.0000

N = 2       ITMAX = 3       TOP = 1.0E30
EPS = 1.0E-1                IPRINT = 0
A(1)...A(3) =    1.0000   -3.0000    2.0000

N = 3       ITMAX = 25      TOP = 1.0E30
EPS = 1.0E-1                IPRINT = 0
A(1)...A(4) =    2.0000  -12.0000   22.0000  -12.0000

N = 4       ITMAX = 25      TOP = 1.0E30
EPS = 1.0E-1                IPRINT = 0
A(1)...A(5) =    1.0000  -10.0000   35.0000  -50.0000   24.0000

N = 3       ITMAX = 25      TOP = 1.0E30
EPS = 1.0E-1                IPRINT = 0
A(1)...A(4) =    0.5000    4.0000   10.5000    9.0000

N = 3       ITMAX = 25      TOP = 1.0E30
EPS = 1.0E-1                IPRINT = 0
A(1)...A(4) =    1.0000  -19.0000   55.0000   75.0000

N = 3       ITMAX = 25      TOP = 1.0E30
EPS = 1.0E-1                IPRINT = 1
A(1)...A(4) =    0.2500   -1.2500    2.0000   -1.0000

Program Listing (Continued)


N = 4       ITMAX = 25      TOP = 1.0E30
EPS = 1.0E-1                IPRINT = 0
A(1)...A(5) =    1.0000  -35.0000  146.0000 -100.0000    1.0000

N = 4       ITMAX = 25      TOP = 1.0E30
EPS = 1.0E-1                IPRINT = 0
A(1)...A(5) =    1.0000  -16.0000   78.0000 -412.0000  624.0000

N = 3       ITMAX = 25      TOP = 1.0E30
EPS = 1.0E-1                IPRINT = 0
A(1)...A(4) =    1.0000    0.0000   -3.0000    1.0000

N = 4       ITMAX = 25      TOP = 1.0E30
EPS = 1.0E-1                IPRINT = 0
A(1)...A(5) =    1.0000  -24.0000  150.0000 -200.0000

N = 4       ITMAX = 25      TOP = 1.0E30
EPS = 1.0E-1                IPRINT = 0
A(1)...A(5) =    1.0000   -1.0000    0.7500    0.2500   -0.2500

N = 4       ITMAX = 25      TOP = 1.0E30
EPS = 1.0E-1                IPRINT = 0
A(1)...A(5) =    1.0000   -2.0000    1.2500   -0.2500   -0.7500

N = 8       ITMAX = 25      TOP = 1.0E30
EPS = 1.0E-1                IPRINT = 0
A(1)...A(5) =    1.0000  -16.0000  105.0000 -364.0000  715.0000
A(6)...A(9) = -792.0000  462.0000 -120.0000    9.0000

N = 8       ITMAX = 25      TOP = 1.0E30
EPS = 1.0E-1                IPRINT = 0
A(1)...A(5) =    1.0000  -20.0000  157.0625 -623.4374 1341.4450
A(6)...A(9) = -1557.6560 912.2167 -227.9003   15.6643

Computer Output
Results for the 1st Data Set
N      = 2
ITMAX  = 25
TOP    = 1.0E 30
EPS    = 1.0E-01
IPRINT = 1

ITER = 1

ITER = 2

ITER = 3

ITER = 4

Computer Output (Continued)


ITER = 5

B(1)...B(3)
1.000000E 00  -4.294967E 09   4.294967E 09

ITER = 6

B(1)...B(3)
1.000000E 00  -1.844674E 19   1.844674E 19

ITER = 7

B(1)...B(3)
ROOT POLYNOMIAL VALUE ITER COMMENT

PROBABLY A ROOT
PROBABLY A ROOT

Results for the 2nd Data Set


N      = 2
ITMAX  = 3
TOP    = 1.0E 30
EPS    = 1.0E-01
IPRINT = 0

ITER EXCEEDS ITMAX - CALCULATION CONTINUES


ROOT POLYNOMIAL VALUE ITER COMMENT

2.000975 9.756088E-04 3 PROBABLY A ROOT


0.999513 4.882812E-04 3 PROBABLY A ROOT

Results for the 7th Data Set


N      = 3
ITMAX  = 25
TOP    = 1.0E 30
EPS    = 1.0E-01
IPRINT = 1

ITER = 1

ITER = 2

Computer Output (Continued)


ITER = 3

B(1)...B(4)
1.000000E 00  -5.130000E 02   6.604800E 04  -6.553600E 04

ITER = 4

B(1)...B(4)
1.000000E 00  -1.310730E 05   4.295098E 09  -4.294967E 09

ITER = 5

B(1)...B(4)
1.000000E 00  -8.589935E 09   1.844674E 19  -1.844674E 19

ITER = 6

ROOT        POLYNOMIAL VALUE   ITER   COMMENT

2.021778     4.854202E-04       6     PROBABLY A ROOT
1.978456     4.539490E-04       6     PROBABLY A ROOT
1.000000     0.0                6     PROBABLY A ROOT

Results for the 9th Data Set


N      = 4
ITMAX  = 25
TOP    = 1.0E 30
EPS    = 1.0E-01
IPRINT = 0

ROOT        POLYNOMIAL VALUE   ITER   COMMENT

11.999994   -8.300781E-03       4     PROBABLY A ROOT
5.324727    -9.699263E 02       4     PROBABLY NOT A ROOT
4.882880    -8.222861E 02       4     PROBABLY NOT A ROOT
2.000000     0.0                4     PROBABLY A ROOT

Computer Output (Continued)


Results for the 14th Data Set
N      = 8
ITMAX  = 25
TOP    = 1.0E 30
EPS    = 1.0E-01
IPRINT = 0

ROOT        POLYNOMIAL VALUE   ITER   COMMENT

                                      PROBABLY NOT A ROOT
                                      PROBABLY NOT A ROOT
                                      PROBABLY A ROOT
                                      PROBABLY A ROOT
                                      PROBABLY A ROOT
                                      PROBABLY A ROOT
                                      PROBABLY A ROOT
                                      PROBABLY A ROOT

Results for the 15th Data Set

N      = 8
ITMAX  = 25
TOP    = 1.0E 30
EPS    = 1.0E-01
IPRINT = 0

ROOT POLYNOMIAL VALUE ITER COMMENT

PROBABLY N O T A ROOT
PROBABLY N O T A ROOT
PROBABLY A ROOT
PROBABLY A ROOT
PROBABLY A ROOT
PROBABLY A ROOT
PROBABLY A ROOT
PROBABLY A ROOT
Discussion of Results

Single-precision arithmetic was used for all calculations. Results are shown for data sets 1, 2, 7, 9, 14, and 15. In most cases, the printing of intermediate coefficients was not requested. Final results for each data set are listed below, along with the polynomial pn(x) and the known roots.

Data Set 1 (results shown)
pn(x) = x^2 - 3x + 2.  Number of iterations, ITER = 7.
Calculated roots: 2.000000, 1.000000.  True roots: 2.000000, 1.000000.

Data Set 2 (results shown)
pn(x) = x^2 - 3x + 2.
Calculated roots: 2.000000, 1.000000.
In this case, the maximum number of iterations permitted was 3. Hence the roots of the squared polynomial have not been separated far enough to attain the accuracy shown for the same polynomial in data set 1.

Data Set 3
pn(x) = 2x^3 - 12x^2 + 22x - 12.  ITER = 6.
Calculated roots: 2.999999, 2.000000, 1.000000.  True roots: 3.000000, 2.000000, 1.000000.

Data Set 4
pn(x) = x^4 - 10x^3 + 35x^2 - 50x + 24.  ITER = 5.
Calculated roots: 4.000011, 2.999990, 1.999999, 1.000000.  True roots: 4.000000, 3.000000, 2.000000, 1.000000.

Data Set 5
pn(x) = 0.5x^3 + 4x^2 + 10.5x + 9.  ITER = 5.
Calculated roots: -3.065690, -2.935715, -1.999999.  True roots: -3.000000, -3.000000, -2.000000.
In this case the dominant root has multiplicity 2. The program indicated that -3.065690 and -2.935715 were probably roots, since the polynomial values were -0.00593 and -0.00386, respectively, smaller than the data value read for EPS (0.1). Typically, Graeffe's method estimates of multiple real roots will span the true value, that is, (-3.066 - 2.936)/2 ≈ -3.0.

Data Set 6
pn(x) = x^3 - 19x^2 + 55x + 75.  ITER = 4.
Calculated roots: 14.999993, 4.999995, -1.000000.  True roots: 15.000000, 5.000000, -1.000000.
Here the roots are well separated, and good accuracy is attained in just four iterations.

Data Set 7 (results shown)
pn(x) = 0.25x^3 - 1.25x^2 + 2x - 1.  ITER = 6.
Calculated roots: 2.021778, 1.978456, 1.000000.  True roots: 2.000000, 2.000000, 1.000000.
Again, as in data set 5, there are multiple real roots. If a multiple root were known to exist, then (3.8) could be applied to find better root values from the solution of a quadratic equation. The predicted roots satisfy a1^64 ≈ a2^64 ≈ 1.8446745 x 10^19, which to seven significant figures yields a1 ≈ a2 ≈ 2.000000.

Data Set 8
pn(x) = x^4 - 35x^3 + 146x^2 - 100x + 1.  ITER = 4.
Calculated roots: 30.288681, 3.858056, 0.843107, 0.010150.
This polynomial is the characteristic function found by the method of Danilevski (see Chapter 4) for a symmetric matrix. Calculation of the roots of pn(x) (the eigenvalues of the matrix) using the Power, Rutishauser, and Jacobi methods (see Examples 4.2, 4.3, and 4.4, respectively) yielded the following results:

Power Method      Rutishauser Method      Jacobi Method
30.288685         30.288685               30.288685
 3.858057          3.858057                3.858057
 0.843107          0.843107                0.843107
 0.010150          0.010150                0.010150

Data Set 9 (results shown)
pn(x) = x^4 - 16x^3 + 78x^2 - 412x + 624.  ITER = 4.
Calculated roots: 11.999994, 5.324727 (NR), 4.882880 (NR), 2.000000.  True roots: 12.000000, 1.0 + 5i, 1.0 - 5i, 2.000000.
The program indicated correctly that 5.32 and 4.88 were probably not roots (NR). The two real roots were found,

but not the pair of complex conjugate roots. The starting polynomial in this case is the characteristic function of a matrix.

Data Set 10
pn(x) = x^3 - 3x + 1.  ITER = 7.
Calculated roots: -1.879385, 1.532089, 0.347296.  True roots: -1.879385, 1.532089, 0.347296.
The computed roots agree exactly with the roots calculated for pn(x) by the method of successive substitutions (see Table 3.5).

Data Set 11
pn(x) = x^4 - 24x^3 + 150x^2 - 200x - 375.  ITER = 4.
Calculated roots: 14.999993, 5.221369 (NR), 4.788013 (NR), -1.000000.  True roots: 15.000000, 5.000000, 5.000000, -1.000000.
This is the characteristic function of a matrix.

Data Set 12
pn(x) = x^4 - x^3 + 0.75x^2 + 0.25x - 0.25.  ITER = 6.
Calculated roots: 1.000000 (NR), 1.000000 (NR), 0.505445, -0.494614.  True roots: (1 + i√3)/2, (1 - i√3)/2, 0.500000, -0.500000.
In this case there are two pairs of roots, one complex conjugate, the other real. As before, the routine fails to find accurately any root from a root pair.

Data Set 13 (results shown)
pn(x) = x^4 - 2x^3 + 1.25x^2 - 0.25x - 0.75.  ITER = 8.
Calculated roots: 1.500000, 1.000000 (NR), 1.000000 (NR).  True roots: 1.500000, (1 + i√3)/2, (1 - i√3)/2, -0.500000.
This polynomial is used in the numerical example of Section 3.2. The two real and distinct roots were found. The pair of complex conjugate roots was not found, as before. Equation 3.8 could be used to find them, however.

Data Set 14 (results shown)
pn(x) = x^8 - 16x^7 + 105x^6 - 364x^5 + 715x^4 - 792x^3 + 465x^2 - 120x + 9.  ITER = 4.
Calculated roots: 3.931828 (NR), 3.504101 (NR), 2.987610, 2.344709, 1.652354, 0.999980, 0.467911, 0.120615.
This polynomial is the characteristic function for the eighth-order striped symmetric matrix (3.1.10) for Case A. It is also used as an example data set for calculation of the roots (eigenvalues) by the Power, Rutishauser, and Jacobi methods (see Examples 4.2, 4.3, 4.4), all of which yielded:

3.879385, 3.532089, 3.000000, 2.347296, 1.652704, 1.000000, 0.467911, 0.120615.

The vibrational frequencies predicted by Graeffe's method for the mechanical system of Fig. 3.1.1 with the parameter values for Case A follow from these roots. The two or three highest frequencies computed using Graeffe's method may not be of adequate accuracy. However, the method has produced good approximations to these roots. These may be used as first estimates in one of the iterative root-finding procedures to be discussed later in this chapter.

Data Set 15 (results shown)
pn(x) = x^8 - 20x^7 + 157.0625x^6 - 623.4374x^5 + 1341.4450x^4 - 1557.6560x^3 + 912.2167x^2 - 227.9003x + 15.6643.  ITER = 4.
Calculated roots: 6.595325 (NR), 4.708691 (NR), 3.365894, 2.342498, 1.544105, 0.912852, 0.425053, 0.106776.
This is the characteristic function for matrix (3.1.10) for Case B. Again, the largest root estimates may not be of adequate accuracy, but can serve as very good starting values for some other root-finding method. Based on the Graeffe's method results, the natural frequencies for the mechanical system of Fig. 3.1.1 with the parameter values of Case B can then be computed.
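The root-squaring iteration that produced the results above can be sketched in a few lines of Python (an illustrative reimplementation written for this discussion, not the book's FORTRAN program; the function names are our own):

```python
# Hedged sketch of one Graeffe root-squaring pass and the magnitude estimates;
# a Python illustration, not the textbook's program.
def graeffe_step(a):
    """Given coefficients a[0]..a[n] (highest power first), return the
    coefficients of a polynomial whose roots are the squares of the roots."""
    n = len(a) - 1
    b = []
    for k in range(n + 1):
        s = a[k] * a[k]
        for j in range(1, min(k, n - k) + 1):
            s += 2.0 * (-1) ** j * a[k - j] * a[k + j]
        b.append((-1) ** k * s)
    return b

def graeffe_roots(a, iters):
    """Magnitudes of well-separated real roots after `iters` squarings."""
    for _ in range(iters):
        a = graeffe_step(a)
    m = 2.0 ** iters
    return [abs(a[k] / a[k - 1]) ** (1.0 / m) for k in range(1, len(a))]

# Data set 1: pn(x) = x^2 - 3x + 2 with ITER = 7 squarings.
estimates = graeffe_roots([1.0, -3.0, 2.0], 7)
```

Only root magnitudes are produced; as in the program above, each candidate (with either sign) must still be substituted into pn(x) to decide whether it is "probably a root."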

3.4 Iterative Factorization of Polynomials

The techniques discussed thus far are usually useful in finding approximate values of one or more roots of f(x). Should a root be located which is not multiple-valued, a procedure such as Newton's method (p. 171) can be employed to determine the root more accurately. In such event, a polynomial of reduced degree can be formed for further investigation by ordinary synthetic division (p. 7). Proceeding in this manner, if no multiple roots are present, the solution can be completed. However, this procedure can be expected to make approximations to successive roots more and more inaccurate, and the presence of multiple roots causes difficulty. In this section (see also Luther [6]), a method is presented for improving the accuracy for single roots, and for maintaining a balance of accuracy among the various roots as solution progresses.

Bairstow's method [4] is an algorithm for finding quadratic factors iteratively for the polynomial of (3.2). The iterative technique which follows produces factors of arbitrary degree m for this same polynomial. Programming in terms of m as a parameter is simple: when m is unity, the method is Newton's method; for m equal to two, the method is closely related to Bairstow's method, but is somewhat simpler.

Consider first the problem, useful in its own right, of dividing f(x) by a trial factor

    g(x) = p0 x^m + p1 x^(m-1) + ... + pm,    p0 = 1,

it being understood that m < n. For 0 <= k <= n, define numbers bk by

    p0 bk + p1 b(k-1) + ... + pm b(k-m) = ak,    bj = 0 for j < 0.    (3.15)

Then it is a straightforward matter to verify that g(x) is an exact factor of f(x) precisely when the trailing coefficients b(n-m+1), b(n-m+2), ..., bn all vanish, in which case the quotient is b0 x^(n-m) + b1 x^(n-m-1) + ... + b(n-m).

Now apply the Newton-Raphson recursion relations (p. 319) to the quantities b(n-m+1), b(n-m+2), ..., bn, considered as functions of p1, p2, ..., pm. The results are as follows. Let f(x) have the factor

    x^m + β1 x^(m-1) + ... + βm,

with β0 = 1 and β = (β1, β2, ..., βm), and let the Jacobian involved in the Newton-Raphson technique be different from zero at the point β. Let P(1) = (p1(1), p2(1), ..., pm(1)) and let the point P(1) be "near enough" to β. Then the sequence of points {P(j)}, defined below, converges to β:

    β = lim (j → ∞) (p1(j), p2(j), ..., pm(j)).

Here the numbers bs(j) are related to the numbers pi(j) [see (3.15)] by

    p0 bs(j) + p1(j) b(s-1)(j) + ... + pm(j) b(s-m)(j) = as,    bs(j) = 0 for s < 0.    (3.18)

To find the numbers pi(j+1) from the numbers pi(j) (1 <= i <= m), first solve, for the numbers cs(j), the equations (0 <= k <= n):

    p0 ck(j) + p1(j) c(k-1)(j) + ... + pm(j) c(k-m)(j) = -bk(j),    cs(j) = 0 for s < 0.    (3.19)

Then, having thus determined the cs(j) (0 <= s <= n), solve, for the pi(j+1), the simultaneous linear equations (3.20), whose detailed form is given in (3.2.7) of Example 3.2.

This method of finding factors, and eventually zeros, can be employed in its own right, and seems especially successful when the degree m of the factor is about half of n. However, in contrast to the Bernoulli method, convergence cannot be guaranteed for any systematic choice of P(1) not based on a knowledge of a zero β. Convergence is almost assured when used in conjunction with some method of finding approximately the coefficients of a factor. Thus, let Bernoulli's method be used to approximate the zero a1 of f(x). Then, with allowance for possible multiple roots, a factor (x - a1)^k can be removed from f(x). This process also delivers the equation of reduced degree, which can then be studied. Successive factors can be combined so as to maintain a better accuracy balance for successive zeros.

Example. Consider the polynomial of the previous section, namely,

    f(x) = x^4 - x^3 + 0.75x^2 + 0.25x - 0.25,

and use as an approximate factor the result of the tenth step of the Bernoulli process, or x^2 - 1.0010x + 0.9961. Using primed letters for the (k + 1)th step and unprimed letters for the kth step, the appropriate formulas are:

    b1 = -1 - p1,                  c1 = -b1 + p1,
    b2 = 0.75 - b1 p1 - p2,        c2 = -b2 - c1 p1 + p2,
    b3 = 0.25 - b1 p2 - b2 p1,     c3 = -b3 - c1 p2 - c2 p1,
    b4 = -0.25 - b2 p2 - b3 p1,    c4 = -b4 - c2 p2 - c3 p1,

    c2 p1' + c1 p2' = -2 b3 - c3,
    c3 p1' + c2 p2' = -2 b4 - c4.

Table 3.3 illustrates the resulting sequences. Thus, in terms of quadratic factors, the polynomial is

    f(x) = (x^2 - x + 1)(x^2 - 0.25).

Table 3.3  Iterative Factorization - Starting Factor x^2 - 1.0010x + 0.9961

For the same polynomial, the sequence using a starting factor (x^2 + 0x + 0) is shown in Table 3.4. Thus, the same factors,

    f(x) = (x^2 - 0.25)(x^2 - x + 1),

have been found, but this time in reverse order.

Table 3.4  Iterative Factorization - Starting Factor x^2
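The recursions (3.15) and (3.19), together with the update system (3.20), translate directly into a short program. The following Python version is our own illustrative sketch (a small Gauss-Jordan routine stands in for the book's function SIMUL), applied to the quadratic-factor example above:

```python
# Hedged sketch of the iterative (generalized Bairstow) factorization of
# Section 3.4; names are ours, and gauss_solve stands in for SIMUL.
def gauss_solve(X):
    """Solve an m x (m+1) augmented system by Gauss-Jordan elimination
    with partial pivoting."""
    m = len(X)
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(X[r][col]))
        X[col], X[piv] = X[piv], X[col]
        for r in range(m):
            if r != col:
                t = X[r][col] / X[col][col]
                X[r] = [xr - t * xc for xr, xc in zip(X[r], X[col])]
    return [X[i][m] / X[i][i] for i in range(m)]

def factor(a, p, itmax=30, chek=1e-10):
    """a: a0..an (a0 != 0); p: guesses p1..pm for the trial factor
    x^m + p1 x^(m-1) + ... + pm.  Returns (p, b)."""
    a = [ai / a[0] for ai in a]                  # normalize so a0 = 1
    n, m = len(a) - 1, len(p)
    p = [1.0] + [float(pi) for pi in p]          # prepend p0 = 1
    for _ in range(itmax):
        b, c = [1.0], [-1.0]                     # b0 = 1, c0 = -b0
        for i in range(1, n + 1):                # recursions (3.2.4), (3.2.5)
            k = min(i, m)
            b.append(a[i] - sum(p[j] * b[i - j] for j in range(1, k + 1)))
            c.append(-b[i] - sum(p[j] * c[i - j] for j in range(1, k + 1)))
        if sum(abs(b[i]) for i in range(n - m + 1, n + 1)) < chek:
            break                                # residual coefficients vanish
        # System (3.2.7): sum_j c[n-m-j+i] p'_j = -(2 b[n-m+i] + c[n-m+i])
        X = [[c[n - m - j + i] if n - m - j + i >= 0 else 0.0
              for j in range(1, m + 1)] + [-(2.0 * b[n - m + i] + c[n - m + i])]
             for i in range(1, m + 1)]
        p[1:] = gauss_solve(X)
    return p, b

# f(x) = x^4 - x^3 + 0.75x^2 + 0.25x - 0.25 with the Bernoulli starting
# factor x^2 - 1.0010x + 0.9961 used in the text.
p, b = factor([1.0, -1.0, 0.75, 0.25, -0.25], [-1.0010, 0.9961])
```

On convergence p holds the coefficients of x^2 - x + 1 and the leading b's hold the quotient x^2 - 0.25, matching Table 3.3.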


EXAMPLE 3.2

ITERATIVE FACTORIZATION OF POLYNOMIALS

Problem Statement

Write a program that employs the method of iterative factorization outlined in Section 3.4 to find an mth-degree factor,

    g(x) = p0 x^m + p1 x^(m-1) + ... + pm,    p0 = 1,    (3.2.1)

of the nth-degree polynomial

    φn(x) = a0 x^n + a1 x^(n-1) + ... + an.    (3.2.2)

Method of Solution

The program that follows reads data values for n, m, itmax, ε1, ε, a0, ..., an, p1, ..., pm, where p1, ..., pm are the initial estimates of the coefficients of the mth-degree factor g(x); itmax is the maximum number of factorization cycles permitted, ε is the minimum allowable pivot magnitude in the solution of the simultaneous equations (3.20) by Gauss-Jordan reduction (see Example 5.2), and ε1 is described below.

The program first normalizes the coefficients a0, ..., an by dividing each by the initial a0, that is,

    ai ← ai/a0,    i = 1, 2, ..., n,
    a0 ← 1,    (3.2.3)

and assigns the value one to the coefficient p0 of the trial factor. Treating the pi as a trial solution and solving (3.18) and (3.19) for the bi and ci, i = 0, 1, ..., n, leads to:

    b0 = a0 = 1,
    bi = ai - (p1 b(i-1) + p2 b(i-2) + ... + pi b0),    i <= m,
    bi = ai - (p1 b(i-1) + p2 b(i-2) + ... + pm b(i-m)),    i > m,    (3.2.4)

    c0 = -b0 = -1,
    ci = -bi - (p1 c(i-1) + p2 c(i-2) + ... + pi c0),    i <= m,
    ci = -bi - (p1 c(i-1) + p2 c(i-2) + ... + pm c(i-m)),    i > m.    (3.2.5)

Note that in (3.2.4) and (3.2.5), j is simply a summation index, and is not the iteration counter of (3.18) and (3.19).

One measure of convergence is the magnitude of the coefficients b(n-m+1), b(n-m+2), ..., bn, which will vanish when a perfect factor g(x) has been found. After each iteration, the convergence test,

    |b(n-m+1)| + |b(n-m+2)| + ... + |bn| < ε1,    (3.2.6)

is made; if the test is passed, computation is discontinued and the pi, i = 0, 1, ..., m, are taken to be the coefficients of g(x).

If convergence test (3.2.6) is failed, then the m simultaneous linear equations of (3.20) are solved for the new pi, i = 1, 2, ..., m, here denoted pi':

    c(n-m) p1'   + c(n-m-1) p2' + ... + c(n-2m+1) pm' = -(2 b(n-m+1) + c(n-m+1)),
    c(n-m+1) p1' + c(n-m) p2'   + ... + c(n-2m+2) pm' = -(2 b(n-m+2) + c(n-m+2)),
      ...          ...              ...                  ...                        (3.2.7)
    c(n-1) p1'   + c(n-2) p2'   + ... + c(n-m) pm'    = -(2 bn + cn).

Any ck for which k < 0 should be assigned the value zero. The coefficients of the linear equations, with the right-hand side vector appended as an additional column, are stored in the m x (m + 1) matrix X, such that:

    Xi,j ← c(n-m-j+i),    n - m - j + i >= 0,
    Xi,j ← 0,             n - m - j + i < 0,    j = 1, 2, ..., m,  i = 1, 2, ..., m,    (3.2.8)
    Xi,m+1 ← -(2 b(n-m+i) + c(n-m+i)).

The function SIMUL is called to solve the equations for the pi', which are then overstored in the p vector,

    pi ← pi',    i = 1, 2, ..., m.    (3.2.9)

The iterative factorization process is continued until the convergence test is passed, or until the maximum number of factorization cycles has been completed.
Example 3.2 Iterative Factorization of Polynomials

Flow Diagram

(The flow diagram outlines the computation: for iter = 1, 2, ..., itmax, compute p0, ..., pm, b0, ..., bn, and c0, ..., cn; apply the convergence test; if it fails, set up the augmented coefficient matrix X, with Xi,m+1 ← -(2b(n-m+i) + c(n-m+i)), and solve the system of linear equations for the unknown pi, i = 1, 2, ..., m (function SIMUL), replacing pi ← pi'. A "Bad data" branch returns to read the next data set when m and n are inconsistent.)

FORTRAN Implementation
List of Principal Variables

Program Symbol    Definition

A†                Vector of coefficients, ai, of the starting polynomial, φn(x).
B, C†             Vectors of coefficients, bi (see (3.2.4)) and ci (see (3.2.5)), respectively.
CHEK              ε1, small positive number used in the termination test (3.2.6).
EPS               Minimum pivot magnitude allowed during Gauss-Jordan solution of linear equations (see SIMUL).
I, IMJ, J, K      i, i - j, j, and k, respectively.
ITER              Iteration counter for the factorization algorithm, iter.
ITMAX             Maximum number of iterative cycles permitted, itmax.
M                 m, degree of the desired factor, g(x) (see (3.2.1)).
MP1, MP1MI        m + 1 and m + 1 - i, respectively.
N                 n, degree of starting polynomial, φn(x).
NMI1, NMJI, NP1   n - m + i + 1, n - m - j + i, and n + 1, respectively.
P†                Vector of coefficients, pi, of the polynomial factor, g(x) (see (3.2.1)).
SIMUL             SIMUL implements the Gauss-Jordan reduction scheme with the maximum pivot strategy to solve systems of linear equations (see Example 5.2). For argument list (m, X, p, ε), SIMUL calculates the solutions p1, p2, ..., pm for the m simultaneous linear equations whose augmented coefficient matrix is in the m x (m + 1) matrix X. ε is the minimum pivot magnitude which will be used by function SIMUL.
SUM               The repeated sum in convergence test (3.2.6).
X                 X, the m x (m + 1) augmented coefficient matrix for the linear equations of (3.2.7) (see (3.2.8)).

† Because of FORTRAN limitations (subscripts smaller than one are not allowed), all subscripts on the vectors of coefficients a, b, c, and p that appear in the text and flow diagram are advanced by one when they appear in the program; for example, a0 becomes A(1), pm becomes P(M + 1), etc. Limitations on the form of subscript expressions and the absence of a zero subscript in most FORTRAN implementations result in the introduction of a number of variables that do not appear in the flow diagram.

Program Listing

C     APPLIED NUMERICAL METHODS, EXAMPLE 3.2
C     ITERATIVE FACTORIZATION OF POLYNOMIALS
C
C     THIS PROGRAM USES AN ITERATIVE PROCEDURE TO FACTOR A POLYNOMIAL
C     OF DEGREE M WITH COEFFICIENTS P(1)...P(M+1) FROM A POLYNOMIAL
C     OF DEGREE N WITH COEFFICIENTS A(1)...A(N+1).  SUBSCRIPTS
C     INCREASE WITH DECREASING POWERS OF THE VARIABLE.  COEFFICIENTS
C     OF THE N TH DEGREE POLYNOMIAL ARE FIRST NORMALIZED BY
C     DIVIDING EACH BY A(1).  INITIAL ESTIMATES FOR THE COEFFICIENTS
C     P(2)...P(M+1) ARE READ AS DATA.  THE COEFFICIENTS OF THE
C     RESIDUAL POLYNOMIAL ARE SAVED IN B(1)...B(N+1).  THE PROCEDURE
C     INVOLVES ITERATION TO IMPROVE CURRENT VALUES OF THE COEFFI-
C     CIENTS P(2)...P(M+1) IN SUCH A WAY THAT THE COEFFICIENTS OF THE
C     RESIDUAL POLYNOMIAL B(N-M+2)...B(N+1) VANISH.  THE COEFFICIENTS
C     C(1)...C(N+1) (SEE TEXT) ARE SAVED IN THE C VECTOR.  CONVERGENCE
C     IS ESTABLISHED WHEN THE ACCUMULATED MAGNITUDE (SUM) OF
C     B(N-M+2)...B(N+1) IS LESS THAN CHEK.  WHEN THE CONVERGENCE TEST
C     FAILS, THE FUNCTION SIMUL (SEE EXAMPLE 5.2) IS CALLED TO
C     SOLVE THE SET OF LINEAR EQUATIONS FOR THE UPDATED P VALUES,
C     WITH THE AUGMENTED COEFFICIENT MATRIX IN THE ARRAY X.
C     EPS IS THE MINIMUM PIVOT MAGNITUDE ALLOWED BY SIMUL.
C     ITERATION PROCEEDS UNTIL THE CONVERGENCE TEST IS PASSED OR
C     UNTIL ITER, THE NUMBER OF ITERATIONS, EXCEEDS ITMAX.
C
      IMPLICIT REAL*8(A-H, O-Z)
      DIMENSION A(20), B(20), C(20), P(20), X(21,21)
C
C     ..... READ DATA, NORMALIZE A'S, PRINT .....
    1 READ (5,100) N, M, ITMAX, CHEK, EPS, (A(I), I=1,N), A(N+1)
      NP1 = N + 1
      MP1 = M + 1
      READ (5,101) (P(I), I=2,MP1)
      DO 2  I = 2, NP1
    2 A(I) = A(I)/A(1)
      A(1) = 1.
      B(1) = 1.
      C(1) = -1.
      P(1) = 1.
      WRITE (6,200) N, M, ITMAX, CHEK, EPS, NP1, (A(I), I=1,NP1)
      WRITE (6,201) MP1, (P(I), I=1,MP1)
C
C     ..... CHECK FOR ARGUMENT CONSISTENCY .....
      IF ( M.GT.0 .AND. M.LT.N )  GO TO 4
      WRITE (6,202)
      GO TO 1
C
C     ..... BEGIN ITERATIVE FACTORIZATION .....
    4 DO 19  ITER = 1, ITMAX
      SUM = 0.
C
C     ..... COMPUTE COEFFICIENTS IN B AND C ARRAYS AND SUM .....
      K = 1
      DO 11  I = 2, NP1
      B(I) = A(I)
      IF ( K.LT.MP1 )  K = K + 1
    6 DO 7  J = 2, K
      IMJ = I - J
    7 B(I) = B(I) - P(J)*B(IMJ+1)
      IF ( I.GT.NP1-M )  SUM = SUM + DABS(B(I))
      C(I) = -B(I)
      DO 11  J = 2, K
      IMJ = I - J
   11 C(I) = C(I) - P(J)*C(IMJ+1)
C
C     ..... CHECK FOR CONVERGENCE .....
      IF ( SUM.GT.CHEK )  GO TO 13
      WRITE (6,203) ITER, MP1, (P(I), I=1,MP1)
      WRITE (6,204) NP1, (B(I), I=1,NP1)
      GO TO 1
C
C     ..... NO CONVERGENCE YET, SET UP COEFFICIENT MATRIX X .....
   13 DO 17  I = 1, M
      DO 16  J = 1, M
      NMJI = N - M - J + I
      X(I,J) = 0.
   16 IF ( NMJI.GE.0 )  X(I,J) = C(NMJI+1)
      NMI1 = N - M + I + 1
   17 X(I,MP1) = -(2.0*B(NMI1) + C(NMI1))
C
C     ..... SOLVE LINEAR EQUATIONS FOR NEW P'S - SHIFT SUBSCRIPTS .....
      CALL SIMUL( M, X, P, EPS, 1, 21 )
      DO 18  I = 1, M
      MP1MI = MP1 - I
   18 P(MP1MI+1) = P(MP1MI)
   19 P(1) = 1.
C
      WRITE (6,205)
      GO TO 1
C
C     ..... FORMATS FOR INPUT AND OUTPUT STATEMENTS .....
  100 FORMAT ( 3(10X, I2, 8X) / 10X, E6.1, 14X, E7.1 / (20X, 5F10.3) )
  101 FORMAT ( 20X, 5F10.3 )
  200 FORMAT ( 10H1N      = , I8 / 10H M      = , I8 / 10H ITMAX  = ,
     1  I8 / 10H CHEK   = , F14.5 / 10H EPS    = , E14.1 /
     2  16H0      A(1)...A(, I2, 1H) // (1H , 5F15.6) )
  201 FORMAT ( 16H0      P(1)...P(, I2, 1H) / (1H , 5F15.6) )
  202 FORMAT ( 33H BAD DATA ENCOUNTERED AND IGNORED )
  203 FORMAT ( 35H0CONVERGENCE CRITERION HAS BEEN MET / 10H0ITER   = ,
     1  I8 / 16H0      P(1)...P(, I2, 1H) / 1H  / (1H , 5F15.6) )
  204 FORMAT ( 16H0      B(1)...B(, I2, 1H) / (1H , 5E15.6) )
  205 FORMAT ( 15H0NO CONVERGENCE )
C
      END

Data

N     = 4      M = 2      ITMAX = 30
CHEK  = 1.0E-2            EPS = 1.0E-20
A(1)...A(5) =      1.000   -10.000    35.000   -50.000    24.000
P(2)...P(3) =     10.000    10.000
N     = 4      M = 2      ITMAX = 30
CHEK  = 1.0E-4            EPS = 1.0E-20
A(1)...A(5) =      1.000   -10.000    35.000   -50.000    24.000
P(2)...P(3) =     10.000    10.000
N     = 4      M = 2      ITMAX = 30
CHEK  = 1.0E-4            EPS = 1.0E-20
A(1)...A(5) =      1.000   -10.000    35.000   -50.000    24.000
P(2)...P(3) =      0.000     0.000
N     = 4      M = 3      ITMAX = 30
CHEK  = 1.0E-4            EPS = 1.0E-20
A(1)...A(5) =      1.000   -10.000    35.000   -50.000    24.000
P(2)...P(4) =     10.000    10.000    10.000
N     = 4      M = 3      ITMAX = 30
CHEK  = 1.0E-4            EPS = 1.0E-20
A(1)...A(5) =      1.000   -10.000    35.000   -50.000    24.000
P(2)...P(4) =      0.000     0.000     0.000
N     = 4      M = 1      ITMAX = 30
CHEK  = 1.0E-4            EPS = 1.0E-20
A(1)...A(5) =      1.000   -10.000    35.000   -50.000    24.000
P(2)        =     10.000
N     = 4      M = 1      ITMAX = 30
CHEK  = 1.0E-4            EPS = 1.0E-20
A(1)...A(5) =      1.000   -10.000    35.000   -50.000    24.000
P(2)        =      0.000
N     = 4      M = 4      ITMAX = 30
CHEK  = 1.0E-4            EPS = 1.0E-20
A(1)...A(5) =      1.000   -10.000    35.000   -50.000    24.000
P(2)...P(5) =     10.000    10.000    10.000    10.000

Data (Continued)


N     = 4      M = 4      ITMAX = 30
CHEK  = 1.0E-4            EPS = 1.0E-20
A(1)...A(5) =      2.000   -20.000    70.000  -100.000    48.000
P(2)...P(5) =     10.000    10.000    10.000    10.000
N     = 4      M = 0      ITMAX = 30
CHEK  = 1.0E-4            EPS = 1.0E-20
A(1)...A(5) =      1.000   -10.000    35.000   -50.000    24.000
NO P'S
N     = 4      M = 1      ITMAX = 30
CHEK  = 1.0E-4            EPS = 1.0E-20
A(1)...A(5) =      1.000   -24.000   150.000  -200.000  -275.000
P(2)        =      0.000
N     = 4      M = 2      ITMAX = 30
CHEK  = 1.0E-5            EPS = 1.0E-20
A(1)...A(5) =      1.000    -1.000     0.750     0.250    -0.250
P(2)...P(3) =     -2.000     2.000
N     = 4      M = 2      ITMAX = 30
CHEK  = 1.0E-5            EPS = 1.0E-20
A(1)...A(5) =      1.000    -1.000     0.750     0.250    -0.250
P(2)...P(3) =      0.000     0.000

Computer Output
Results for the 1st Data Set
N     = 4
M     = 2
ITMAX = 30
CHEK  = 0.01000
EPS   = 0.1D-19

CONVERGENCE C R I T E R I O N HAS BEEN MET


Solution of Equations
Computer Output (Continued)
Results for the 2nd Data Set
N     = 4
M     = 2
ITMAX = 30
CHEK  = 0.00010
EPS   = 0.1D-19

CONVERGENCE CRITERION HAS BEEN MET

ITER = 12

Results for the 9th Data Set


N     = 4
M     = 4
ITMAX = 30
CHEK  = 0.00010
EPS   = 0.1D-19

P(1)...P( 5)
1.000000 10.000000 10.000000 10.000000 10.000000
BAD DATA ENCOUNTERED AND IGNORED

Results for the 11th Data Set


N     = 4
M     = 1
ITMAX = 30
CHEK  = 0.00010
EPS   = 0.1D-19

CONVERGENCE CRITERION HAS BEEN MET

ITER = 6
Results for the 13th Data Set

CONVERGENCE CRITERION HAS BEEN MET

ITER  = 5

P(1)...P( 3)
       1.000000      -0.000000      -0.250000

B(1)...B( 5)
       1.000000      -1.000000       1.000000       0.000000       0.000000

166 Solution of Equations

Discussion of Results

All calculations have been made using double-precision arithmetic. Results for the 1st, 2nd, 9th, 11th, and 13th data sets are shown in the computer output. The first ten data sets are for the starting polynomial of fourth degree,

    φ4(x) = x^4 - 10x^3 + 35x^2 - 50x + 24,

which in factored form is:

    φ4(x) = (x - 4)(x - 3)(x - 2)(x - 1)
          = (x^3 - 9x^2 + 26x - 24)(x - 1)
          = (x^3 - 8x^2 + 19x - 12)(x - 2)
          = (x^3 - 7x^2 + 14x - 8)(x - 3)
          = (x^3 - 6x^2 + 11x - 6)(x - 4)
          = (x^2 - 7x + 12)(x^2 - 3x + 2)
          = (x^2 - 6x + 8)(x^2 - 4x + 3)
          = (x^2 - 5x + 4)(x^2 - 5x + 6).

A summary of the results for the first ten data sets follows.

Data Set 1 (results shown)
Starting factor: x^2 + 10x + 10
Iterations required: 11
Factors found: x^2 - 4.999663x + 3.995807
               x^2 - 5.000337x + 6.004194

Data Set 2 (results shown)
Starting factor: x^2 + 10x + 10
Iterations required: 12
Factors found: x^2 - 4.999999x + 3.999991
               x^2 - 5.000001x + 6.000009

Data Set 3
Starting factor: x^2
Iterations required: 11
Factors found: x^2 - 5.000000x + 4.000000
               x^2 - 5.000000x + 6.000000

Data Set 4
Starting factor: x^3 + 10x^2 + 10x + 10
Iterations required: 12
Factors found: x^3 - 9.000000x^2 + 25.999999x - 23.999998
               x - 1.000009

Data Set 5
Starting factor: x^3
Iterations required: 9
Factors found: x^3 - 7.000009x^2 + 14.000000x - 8.000000
               x - 3.000000

Data Set 6
Starting factor: x + 10
Iterations required: 13
Factors found: x - 1.000000
               x^3 - 9.000000x^2 + 26.000000x - 24.000000

Data Set 7
Starting factor: x
Iterations required: 7
Factors found: x - 1.000000
               x^3 - 9.000000x^2 + 26.000000x - 24.000000

Data Set 8: Bad data encountered (m = n).

Data Set 9 (results shown): Bad data encountered (m = n).

Data Set 10: Bad data encountered (m = 0).

Note that although the starting factors for the first and second data sets are identical, an additional iteration is required for the second set because of the more stringent convergence requirement. The results for the fourth and fifth data sets show that different starting factors may lead to different combinations of final factors. The results for data sets 8, 9, and 10 indicate that the program is checking properly for inconsistent values of m and n, while the printed results for the ninth data set show that the coefficients ai are being normalized as specified in (3.2.3).

The starting polynomial for the eleventh data set is

    φ4(x) = x^4 - 24x^3 + 150x^2 - 200x - 275.

The results, shown in the computer output, are:

Starting factor: x
Iterations required: 6
Factors found: x + 0.812840
               x^3 - 24.812840x^2 + 170.168865x - 338.320032

To six-figure accuracy, one root of this polynomial is -0.812840.

The polynomial for the twelfth and thirteenth data sets is

    f(x) = x^4 - x^3 + 0.75x^2 + 0.25x - 0.25.

This is the polynomial used in the examples of Tables 3.3 and 3.4. Results for the two cases are:

Data Set 12
Starting factor: x^2 - 2x + 2
Iterations required: 15
Factors found: x^2 - 0.000006x - 0.250000
               x^2 - 1.000000x + 1.000900

Data Set 13 (results shown)
Starting factor: x^2
Iterations required: 5
Factors found: x^2 - 0.000000x - 0.250000
               x^2 - 1.000000x + 1.000000

The results for these two data sets clearly show the importance of the starting factor on the number of iterations required to converge to the same factors with equivalent accuracy.

3.5 Method of Successive Substitutions

The following discussion is not confined to polynomial equations. We rewrite (3.1) in the form

    x = F(x),    (3.21)

such that if f(a) = 0, then a = F(a). If an initial approximation x1 to a root a is provided, a sequence x2, x3, ... may be defined by the recursion relation

    x(j+1) = F(xj),    (3.22)

with the hope that the sequence will converge to a. The successive iterations are interpreted graphically in Fig. 3.1a.

(a)    (b)
Figure 3.1 Graphical interpretation of method of successive substitutions.

Convergence will certainly occur if, for some constant p where 0 < p < 1, the inequality

    |F(x) - F(a)| <= p|x - a|    (3.23)

holds true whenever |x - a| <= |x1 - a|. For, if (3.23) holds, we find that

    |x2 - a| = |F(x1) - a| = |F(x1) - F(a)| <= p|x1 - a|,

since a = F(a). Proceeding,

    |x3 - a| = |F(x2) - F(a)| <= p|x2 - a| <= p^2 |x1 - a|.

Continuing in this manner, we conclude that |xj - a| <= p^(j-1) |x1 - a|, and thus that lim (j → ∞) xj = a.

Condition (3.23) is clearly satisfied if F possesses a derivative F' such that |F'(x)| <= p < 1 for |x - a| <= |x1 - a|. Figure 3.1b shows how the method of successive substitutions fails to converge for a case in which |F'(x)| > 1 in the region of interest. Generally, when xj is close to a, the approximate relation

    x(j+1) - a ≈ F'(a)(xj - a)    (3.24)

holds true; F'(a) is called the asymptotic convergence factor.

An example is furnished by the equation cos x - x = 0, previously mentioned. If we set x(j+1) = cos(xj), then F'(x) = -sin x. The reader should establish graphically that |F'(a)| < 1 for the unique solution a.

Note that the equation x = F(x) can be formed from the original equation, f(x) = 0, in an unlimited number of ways. While the choice of F will depend on the particular situation, one suggestion is offered. If a = F(a), then also a = (1 - k)a + kF(a); therefore, instead of (3.22), we can consider

    x(j+1) = (1 - k)xj + kF(xj),    (3.25)

to see whether a suitable choice of k will affect convergence favorably.

Further discussion of this and other iterative methods can be found in Traub [14].

Example. Consider the equation

    f(x) = x^3 - 3x + 1 = 0.

Figure 3.2 Roots of f(x) = x^3 - 3x + 1 = 0.
The function f(x) is illustrated in Fig. 3.2; by inspection, 2 > a1 > 1, 1 > a2 > 0, and -1 > a3 > -2. Consider first the version

    x(j+1) = (xj^3 + 1)/3.    (3.26a)

Corresponding to (3.26a), F'(x) = x^2, and |F'(x)| < 1 if |x| < 1. Consequently, (3.26a), if used to define the iterative process, can be expected to yield a2, but not to yield a1 or a3. If, however, in (3.25) we use k = -1/2, so that

    x(j+1) = (3/2)xj - (xj^3 + 1)/6,    (3.26b)

then F(x) becomes (3/2)x - (x^3 + 1)/6 and F'(x) = 3/2 - x^2/2. In particular, for 1 < |x| < √5, which includes 1 < |x| < 2, |F'(x)| < 1. Thus, properly started, the approach using (3.26b) yields a1 and a3. Complete details of the iterative processes are given in Table 3.5.

Table 3.5 Illustration of Method of Successive Substitutions

Iterative Formula:      (3.26a)    (3.26b)    (3.26b)
Starting Value, x1:     0.5        1.5        -1.5
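The three iterations of Table 3.5 can be reproduced with a few lines of Python (an illustrative sketch; the names are ours, not the book's):

```python
# Hedged sketch of iterations (3.26a) and (3.26b) for x^3 - 3x + 1 = 0.
def iterate(x, k=1.0, steps=60):
    """x_{j+1} = (1 - k) x_j + k F(x_j), with F(x) = (x^3 + 1)/3;
    k = 1 gives (3.26a), k = -1/2 gives (3.26b)."""
    F = lambda t: (t**3 + 1.0) / 3.0
    for _ in range(steps):
        x = (1.0 - k) * x + k * F(x)
    return x

a2 = iterate(0.5)              # (3.26a) from x1 = 0.5 approaches a2
a1 = iterate(1.5, k=-0.5)      # (3.26b) from x1 = 1.5 approaches a1
a3 = iterate(-1.5, k=-0.5)     # (3.26b) from x1 = -1.5 approaches a3
```

As the text predicts, the unrelaxed form finds only the middle root, while the k = -1/2 form recovers the two outer roots.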

3.6 Ward's Method

Let the polynomial f(z) be a function of the complex variable z = x + iy, so that

    f(z) = u(x,y) + iv(x,y).

Clearly, f(z) has a zero α = β + iγ if and only if u(β,γ) = v(β,γ) = 0. This pair of simultaneous equations in u(x,y) and v(x,y) may be used in various ways. One may, for example, seek to minimize u^2(x,y) + v^2(x,y). Ward's method [3] seeks to minimize

    w(x,y) = |u(x,y)| + |v(x,y)|.

Since no derivatives are employed, the method seems well suited for finding multiple roots. The technique is based on the knowledge (subsequently verified) that in every neighborhood of a point (x,y) such that w(x,y) ≠ 0, there exists a point (x',y') such that w(x',y') < w(x,y). The proof assumes that f(z) has certain derivatives, and this is valid for polynomials in particular.

At a given step in the process, which is iterative, the current value w(x,y) is compared with the values w(x + h, y), w(x - h, y), w(x, y + h), and w(x, y - h) until a smaller value is found (if possible). The coordinates yielding this smaller value are taken as the values of x and y in the next iterative step. Should these four points yield no decrease in w = |u| + |v|, h is replaced by h/2 and the process is continued.

There is no guarantee of convergence. For example, polynomials of the form

    Σj c(4j) z^(4j) + K,

where K and the c(4j) are positive, cannot be solved by the process just described if z = 0 is the point of departure, since z^(4j) is real and positive along each of the four trial directions and w therefore never decreases. One can, of course, introduce variations of the search process described above.

To show that if w(x,y) ≠ 0, there does exist in every neighborhood of (x,y) a point (x',y') such that w(x',y') < w(x,y), proceed as follows, writing z = x + iy and z' = x' + iy'. We have

    f(z') = f(z) + ((z' - z)^m / m!)[f^(m)(z) + s(z')],

where f^(m)(z)/m! is the second non-zero coefficient in Taylor's expansion of f(z') in powers of z' - z, and lim (z' → z) s(z') = 0, since s(z') = (z' - z)[f^(m+1)(z) + r(z')]/(m + 1).

Let f^(m)(z)/m! = a e^(ib), let z' - z = r e^(it), and write s(z') = k e^(ic). Choose r1 so small that |s(z')| = k < a/3 for r <= r1. Then, with φ = b + mt and ψ = c + mt,

    w(x',y') = |u + r^m a cos φ + r^m k cos ψ| + |v + r^m a sin φ + r^m k sin ψ|.

If u ≠ 0, choose r2 so that r2 <= r1 and r2^m (a + k) < |u|. If u > 0, choose φ = π (that is, t = (π - b)/m) and r <= r2. Then

    |u + r^m a cos φ + r^m k cos ψ| = u - r^m a + r^m k cos ψ <= u - r^m a + r^m k < u - (2/3) a r^m.

Also, since sin φ = 0,

    |v + r^m a sin φ + r^m k sin ψ| <= |v| + (a/3) r^m.

Therefore, w(x',y') <= |u| + |v| - (a/3) r^m < w(x,y).

Suppose next that u < 0. Choose φ = 0. Then, for r <= r2,

    |u + r^m a cos φ + r^m k cos ψ| = |u| - r^m (a + k cos ψ) <= |u| - (2/3) a r^m,

and, as before,

    |v + r^m a sin φ + r^m k sin ψ| <= |v| + (a/3) r^m.

Then, in this case as well, w(x',y') < w(x,y).

Finally, if u = 0 and, of course, v ≠ 0, choose r3 so that r3 <= r1 and r3^m (a + k) < |v|. If v > 0, choose φ = -π/2; if v < 0, choose φ = π/2. Then for r <= r3 we have, as before, w(x',y') < w(x,y).

Example. Starting with z = 1 + i, and with an initial step h = 0.1, find a zero of the following function:

    f(z) = z^4 - 2z^3 + 1.25z^2 - 0.25z - 0.75
         = (z - 1.5)(z + 0.5)(z^2 - z + 1)
         = (z - 1.5)(z + 0.5)(z - 0.5(1 + i√3))(z - 0.5(1 - i√3)).

Since z = x + iy, f(z) = u + iv, where

    u = x^4 - 6x^2 y^2 + y^4 - 2x^3 + 6xy^2 + 1.25x^2 - 1.25y^2 - 0.25x - 0.75,
    v = 4x^3 y - 4xy^3 - 6x^2 y + 2y^3 + 2.5xy - 0.25y.

Table 3.6 lists the successive values of x, y, |u|, |v|, and w (= |u| + |v|). Unnecessary calculations are, of course, omitted. The required root is approximately 0.500 + 0.860i.

Table 3.6 Illustration of Ward's Method
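The search described above is easy to sketch. The following Python version is our own illustration of the idea; the order in which the four trial points are examined is an assumption, so the path taken may differ from the one tabulated in Table 3.6:

```python
# Hedged sketch of Ward's axis-direction search (our code, not the book's).
def ward_search(f, z, h=0.1, tol=1e-6, max_steps=10000):
    """Minimize w(z) = |Re f(z)| + |Im f(z)| by stepping to whichever of the
    four axis neighbors z + h, z - h, z + ih, z - ih first lowers w;
    when none does, halve h, as described in the text."""
    w = lambda t: abs(f(t).real) + abs(f(t).imag)
    best = w(z)
    for _ in range(max_steps):
        if h < tol:
            break
        for step in (h, -h, 1j * h, -1j * h):
            if w(z + step) < best:
                z = z + step
                best = w(z)
                break
        else:
            h = h / 2.0        # no trial point improved w; refine the step
    return z

# The example: f(z) = z^4 - 2z^3 + 1.25z^2 - 0.25z - 0.75 from z = 1 + i.
f = lambda z: z**4 - 2*z**3 + 1.25*z**2 - 0.25*z - 0.75
root = ward_search(f, 1 + 1j)
```

No derivatives are needed, which is the method's chief appeal; the cost is slow, step-halving convergence near the root.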



3.7 Newton's Method It can be shown that if ci is a simple zero of f(x) and
Newton's method for finding the zeros of f(x) is the if lirn,,, xk = a, then
most widely known, and it is not limited to polynomial
functions. It will be presented here for the complex case.
Consider, then, z = x + iy and f(z) =: u(x,y) iu(x,y). +
For an iterative process, the complex expression This means that once xk is near a, the error in the next
step is proportional to the square of the error in xk; the
resulting quadratic convergence is then rapid in com-
parison with the linear convergence of several other
is equivalent to the two expressions methods.
To understand (3.31), recall, by Taylor's theorem. that

where { lies between x and a. Then the Newton's method


algorithm, modified by subtracting a from both sides and
Here, u, and u,, mean dulax and aulay, respectively. The noting that f (a) = 0,
expressions in parentheses are to be evaluated for x = xk,
y = y,. The manipulation establishing the equivalence of
(3.29) and (3.30) requires only knowledge of the Cauchy-
Riemann equations, stating that may be written as

The expressions (3.30) can alternately be found by


considering the simultaneous solution of
u(x,y) = 01, v(x,y) = 0,
using the Newton-Raphson technique (p. 319). This may
be consulted for an indication of proof of convergence
when z, is near a zero of,f(z).
As previously remarked, Newton's method, when
applied to polynomials, (can be considered as the case
m = 1 of the iterative factorization technique.
Yet another approach is to view it in terms of the Dividing by (xk - a)2 and noting that lim xk = a, (3.31)
k-. a,
method of successive substitutions. In (3.21) let follows:

lim x-k+l-a -- 1
[/"(a)
1
- if "(a)] =
f "(a)
-
(x, - aI2 f '(a)
k-. a 2f ' ( 4
'

Then the asymptotic convergence factor for a zero, α, of f(x) becomes, for x real, F′(α). This means that convergence is guaranteed (for f′(α) ≠ 0) if the initial value x₀ is near enough to α.

Example. Starting at the point (1,1), use Newton's method to find a zero of the following function, in which z = x + iy:

f(z) = z⁴ − 2z³ + 1.25z² − 0.25z − 0.75
     = (z − 1.5)(z + 0.5)(z² − z + 1)
     = (z − 1.5)(z + 0.5)(z − 0.5{1 + i√3})(z − 0.5{1 − i√3}).
If f(x) is the polynomial (3.2), observe that f(r) may be written in the nested form

f(r) = (···((a₀r + a₁)r + a₂)r + ···)r + aₙ.

This is really synthetic division. It may be phrased iteratively as f(x) = (x − r)(b₀xⁿ⁻¹ + b₁xⁿ⁻² + ··· + bₙ₋₁) + bₙ, with b_{k+1} = b_k r + a_{k+1} [see equation (3.15), using m = 1 and p₁ = r]. Observe that f′(r) = b₀rⁿ⁻¹ + b₁rⁿ⁻² + ··· + bₙ₋₁; therefore, f(r) and f′(r) can be calculated simultaneously.

For the function of the example,

u(x,y) = x⁴ − 6x²y² + y⁴ − 2x³ + 6xy² + 1.25x² − 1.25y² − 0.25x − 0.75,
v(x,y) = 4x³y − 4xy³ − 6x²y + 2y³ + 2.5xy − 0.25y,
ux(x,y) = 4x³ − 12xy² − 6x² + 6y² + 2.5x − 0.25,
uy(x,y) = −12x²y + 4y³ + 12xy − 2.5y.

Table 3.7 lists x′ and y′ as the iterative successors of x and y.
172 Solution of Equations

Table 3.7 Illustration of Newton's Method

Therefore, a root is 0.5 + 0.866025i.
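The complex Newton iteration of this example can be sketched in modern Python (the book's programs are in FORTRAN). The function and starting point are those of the example; the tolerance, iteration cap, and function names are our own choices. Python's complex arithmetic makes the separate u, v, ux, uy bookkeeping unnecessary.

```python
# Newton's method for a complex zero; f and the starting point (1, 1) come
# from the example above, the stopping rule is an assumption of this sketch.
def newton_complex(f, fprime, z0, tol=1.0e-10, itmax=50):
    z = z0
    for _ in range(itmax):
        dz = -f(z) / fprime(z)
        z = z + dz
        if abs(dz) <= tol * max(1.0, abs(z)):   # relative step criterion
            return z
    raise RuntimeError("no convergence in %d iterations" % itmax)

f  = lambda z: z**4 - 2*z**3 + 1.25*z**2 - 0.25*z - 0.75
fp = lambda z: 4*z**3 - 6*z**2 + 2.5*z - 0.25
root = newton_complex(f, fp, 1.0 + 1.0j)   # converges to 0.5 + 0.866025i
```

Because the zero is simple, the error is squared at each step once the iterate is close, in line with the quadratic-convergence result (3.31) above.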


EXAMPLE 3.3

SOLlJTION OF AN EQUATION OF STATE USING NEWTON'S METHOD

Introduction Method of Solution


Many equations of state have been developed to describe the P-V-T (pressure, volume, temperature) relationships of gases. One of the better known equations of state is the Beattie-Bridgeman equation,

P = RT/V + β/V² + γ/V³ + δ/V⁴,   (3.3.1)

where P is the pressure, V is the molar volume, T is the temperature, β, γ, and δ are temperature-dependent parameters characteristic of the gas, and R is the universal gas constant in compatible units. The second, third, and fourth terms on the right-hand side of (3.3.1) may be viewed as corrections of the ideal gas law ascribable to "non-ideal" behavior.

Method of Solution

Rewriting (3.3.1) in the form

f(V) = RT/V + β/V² + γ/V³ + δ/V⁴ − P = 0,   (3.3.7)

and differentiating with respect to V at constant T and P yields

f′(V) = −RT/V² − 2β/V³ − 3γ/V⁴ − 4δ/V⁵.   (3.3.8)

The Newton's method algorithm from (3.29) is then

V_{k+1} = V_k − f(V_k)/f′(V_k)
        = V_k + (−PV⁴ + RTV³ + βV² + γV + δ)V / (RTV³ + 2βV² + 3γV + 4δ),   (3.3.10)


where for simplicity the subscripts k have been omitted from the volume terms on the right-hand side.

The parameters β, γ, and δ are defined by

β = RTB₀ − A₀ − Rc/T²,
γ = −RTB₀b + A₀a − RcB₀/T²,
δ = RB₀bc/T².

A₀, B₀, a, b, and c are widely tabulated constants, determined empirically from experimental data, and are different for each gas.

Using units of atmospheres (1 atm = 14.7 lbf/in²) for P, liters/g mole (1 g mole of methane (CH₄) is approximately 16 grams) for V, and °K (°K = °C + 273.15) for T, the gas constant R is equal to 0.08205 liter·atm/°K·g mole. For this set of selected units, the appropriate constants for methane are [15]:
Equation (3.3.1) is explicit in pressure P but implicit in
temperature T and volume V. Hence some iterative root-finding procedure is required to find the volume which corresponds to given values of pressure and temperature. The ideal gas law should give a reasonable first estimate for the molar volume V:

V₁ = RT/P.
Probkm Statement
Write a program that uses Newton's method to solve
(3.3.1) for the molar volume of any gas, given the pressure, P, temperature, T, and the constants R, A₀, B₀, a, b, and c. After computing V, calculate the compressibility factor z, where

z = PV/RT.

The compressibility factor is a useful index of the departure of real gas behavior from that predicted by the ideal gas law (z = 1 for an ideal gas).

As test cases, compute the compressibility factors for gaseous methane (natural gas) at temperatures of 0°C and 200°C for the following pressures (in atmospheres): 1, 2, 5, 10, 20, 40, 60, 80, 100, 120, 140, 160, 180, 200. Compare the calculated results with experimental values.

A criterion for terminating the iterative procedure of (3.3.10) is

|(V_{k+1} − V_k)/V_{k+1}| ≤ ε.

Here ε is a small positive number. For ε = 10⁻ᴺ, the final value of V_{k+1} should be accurate to approximately N significant figures. This does not imply that the calculated volume will agree with experimental measurement to N significant figures, but rather that the equation has been solved this accurately given the set of constants a, A₀, b, β, B₀, c, δ, and γ.

Newton's method may fail to converge to a root, if given a bad starting value. The number of iterations should be limited to a small integer, itmax.
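The iteration just described can be sketched in Python (the book's implementation, below, is in FORTRAN). The methane constants are those of the data listing; the function name and return convention are our own.

```python
# Sketch of the Newton iteration (3.3.10) for the Beattie-Bridgeman molar
# volume, using the methane constants from the data listing.  Variable
# names mirror the FORTRAN program; methane_z is a name of our own.
R = 0.08205                                   # liter*atm / (deg K * g mole)
A, AZERO, B, BZERO, C = 0.01855, 2.2769, -0.01587, 0.05587, 128300.0

def methane_z(tc, p, eps=1.0e-6, itmax=20):
    """Return (V, z, iterations) at tc deg C and p atm."""
    t = tc + 273.15
    beta = R*t*BZERO - AZERO - R*C/(t*t)
    gamma = -R*t*BZERO*B + AZERO*A - R*C*BZERO/(t*t)
    delta = R*BZERO*B*C/(t*t)
    v = R*t/p                                 # ideal-gas first estimate
    for k in range(1, itmax + 1):
        dv = ((((((-p*v + R*t)*v + beta)*v + gamma)*v + delta)*v) /
              (((R*t*v + 2.0*beta)*v + 3.0*gamma)*v + 4.0*delta))
        v += dv
        if abs(dv/v) <= eps:
            return v, p*v/(R*t), k            # converged: also return z
    raise RuntimeError("no convergence in %d iterations" % itmax)

v, z, iters = methane_z(0.0, 1.0)             # first test case: 0 C, 1 atm
```

For the first test case this gives z of about 0.998, i.e., nearly ideal behavior, in only a few iterations.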

Flow Diagram

(The flow diagram reads a, A₀, b, B₀, c, R, ε, itmax, T, and P; computes the temperature-dependent parameters β, γ, and δ; takes the ideal-gas estimate V ← RT/P; and applies the Newton correction of (3.3.10) repeatedly until convergence or until itmax iterations have been taken.)
FORTRAN Implementation
List of Principal Variables

Program Symbol    Definition


A, AZERO, B,      Material-dependent constants, a, A₀, b, B₀, c.
BZERO, C
BETA, DELTA,      Temperature-dependent parameters, β, δ, and γ.
GAMMA
DELTAV            Incremental change in molar volume for the kth iteration, V_{k+1} − V_k, liter/g mole.
EPS               Tolerance for convergence criterion, ε.
ITER              Iteration counter, k.
ITMAX             Maximum number of iterations permitted, itmax.
P                 Pressure, P, atm.
R                 Universal gas constant, R, liter·atm/°K·g mole.
T                 Temperature, T, °K.
TC                Temperature, T, °C.
V                 Molar volume, V_k, liter/g mole.
Z                 Compressibility factor, z.

Program Listing
C     APPLIED NUMERICAL METHODS, EXAMPLE 3.3
C     SOLUTION OF AN EQUATION OF STATE USING NEWTON'S METHOD
C
C     GIVEN A TEMPERATURE T AND PRESSURE P, THIS PROGRAM USES
C     NEWTON'S METHOD TO COMPUTE THE MOLAR VOLUME V OF A GAS WHOSE
C     PRESSURE-VOLUME-TEMPERATURE BEHAVIOR IS DESCRIBED BY THE
C     BEATTIE-BRIDGEMAN EQUATION OF STATE.  R IS THE UNIVERSAL GAS
C     CONSTANT.  A, AZERO, B, BZERO AND C ARE EMPIRICAL CONSTANTS,
C     DIFFERENT FOR EACH GAS.  BETA, GAMMA, AND DELTA ARE TEMPERATURE-
C     DEPENDENT PARAMETERS DESCRIBED IN THE PROBLEM STATEMENT.
C     ITER IS THE ITERATION COUNTER AND DELTAV THE CHANGE IN V
C     PRODUCED BY ONE APPLICATION OF NEWTON'S ALGORITHM.  FOR
C     CONVERGENCE, THE MAGNITUDE OF DELTAV/V IS REQUIRED TO BE
C     SMALLER THAN SOME SMALL POSITIVE NUMBER EPS.  AT MOST ITMAX
C     ITERATIONS ARE ALLOWED.  IF THE CONVERGENCE TEST IS PASSED,
C     THE COMPRESSIBILITY FACTOR Z IS ALSO COMPUTED.  THE IDEAL
C     GAS LAW IS USED TO GET A FIRST ESTIMATE OF V.  IT IS ASSUMED
C     THAT TC, THE TEMPERATURE READ IN AS DATA, HAS UNITS OF
C     DEGREES CENTIGRADE.  T HAS UNITS OF DEGREES KELVIN.  UNITS
C     FOR ALL OTHER PARAMETERS MUST BE DIMENSIONALLY CONSISTENT.
C
    1 READ (5,100) A,AZERO,B,BZERO,C,R,EPS,ITMAX
      WRITE (6,200) A,AZERO,B,BZERO,C,R,EPS,ITMAX
C
    2 READ (5,101) TC,P
C
C     .....COMPUTE TEMPERATURE-DEPENDENT PARAMETERS FOR GAS.....
      T = TC + 273.15
      BETA = R*T*BZERO - AZERO - R*C/(T*T)
      GAMMA = -R*T*BZERO*B + AZERO*A - R*C*BZERO/(T*T)
      DELTA = R*BZERO*B*C/(T*T)
C
C     .....USE IDEAL GAS LAW FOR FIRST VOLUME ESTIMATE.....
      V = R*T/P
C
C     .....BEGIN NEWTON METHOD ITERATION.....
      DO 4 ITER = 1, ITMAX
      DELTAV = (((((-P*V+R*T)*V+BETA)*V+GAMMA)*V+DELTA)*V)/
     1         (((R*T*V+2.*BETA)*V+3.*GAMMA)*V+4.*DELTA)
      V = V + DELTAV

C
C     .....CHECK FOR CONVERGENCE.....
      IF ( ABS(DELTAV/V).GT.EPS ) GO TO 4
      Z = P*V/(R*T)
      WRITE (6,201) TC,P,Z,ITER
      GO TO 2
    4 CONTINUE
C
      WRITE (6,202)
      GO TO 2

C
C     .....FORMATS FOR INPUT AND OUTPUT STATEMENTS.....
  100 FORMAT (3(10X,F8.5,2X)/10X,F8.5,12X,F8.0,12X,F8.5/10X,E6.1,
     1        14X,I2)
  101 FORMAT (10X,F6.2,14X,F6.2)
  200 FORMAT (10H1A      = ,F14.5/ 10H AZERO  = ,F14.5/ 10H B      = ,
     1   F14.5/ 10H BZERO  = ,F14.5/ 10H C      = ,F14.5/ 10H R      = ,
     2   F14.5/ 10H EPS    = ,1PE14.1/ 10H ITMAX  = ,I8/ 51H0       TC
     3              P              Z           ITER / )
  201 FORMAT (F10.3, F15.3, F15.6, I10)
  202 FORMAT (16H0 NO CONVERGENCE)
C
END

Data
A = 0.01855 AZERO = 2.27690 B = -0.01587 ,
BZERO = 0.05587 C = 128300. R = 0.08205
EPS = 1.0E-6  ITMAX = 20
TC = 0.00 P = 1.00

Data (Continued)

(TC = 0.00 and TC = 200.00 are each paired in turn with the fourteen pressures P = 1, 2, 5, 10, 20, 40, 60, 80, 100, 120, 140, 160, 180, and 200 atm.)

Computer Output

A      =        0.01855
AZERO  =        2.27690
B      =       -0.01587
BZERO  =        0.05587
C      =   128300.00000
R      =        0.08205
EPS    =        1.0E-06
ITMAX  =             20

(The table of TC, P, Z, and ITER follows; each case converged in 2 to 7 Newton iterations, and the computed compressibility factors Z are plotted in Fig. 3.3.1.)
Discussion of Results
The calculated results shown in the computer output
are plotted in Fig. 3.3.1 along with experimental values
reported by Brown, Katz, Oberfell, and Alden [16] (the
solid lines). In the pressure range 0-200 atm, agreement is
quite good at 200°C (maximum deviation is approximately 0.3 percent). Predicted values for 0°C are quite good at pressures below 100 atm, but at higher pressures are in considerable error; the Beattie-Bridgeman equation would have to be used with some caution at higher pressures.

Figure 3.3.1 Compressibility factor for methane (CH₄). Solid lines: experimental values; points: values predicted by the Beattie-Bridgeman equation of state with the given set of constants.

3.8 Regula Falsi and Related Methods

For the real case of Newton's method, the expression [see (3.29)]

x_{k+1} = x_k − f(x_k)/f′(x_k)

has the interpretation illustrated in Fig. 3.3. We draw a tangent to the curve y = f(x) at the point (x_k, f(x_k)). This tangent meets the x-axis at the point (x_{k+1}, 0). If, then, the curve crosses the x-axis at a point (α, 0) sufficiently near (x_k, f(x_k)), and it is concave up or down in a region including these two points, it may easily be seen that the number x_{k+1} is nearer α than was x_k.

Figure 3.3 Newton's method for the real root x = α.

This kind of pictorial approach suggests other methods. One method, sometimes called the rule of false position, may be constructed as follows. Referring to Fig. 3.4, let (c, f(c)) be a fixed point on y = f(x). Draw a chord through this point and the point (x_k, f(x_k)) so that it intersects the x-axis in a point (x_{k+1}, 0). Thus,

x_{k+1} = x_k − f(x_k)(x_k − c)/[f(x_k) − f(c)].   (3.32)

This new point may well yield a better approximation than x_k to α.

The procedure may be justified by the method of successive substitutions. Let

F(x) = x − f(x)(x − c)/[f(x) − f(c)],

where f(c) ≠ 0. It is clear that f(α) = 0 implies F(α) = α, and that α = F(α) implies f(α) = 0. Since F′ can be evaluated directly, it follows that in the neighborhood of a zero, α, for f(x), the asymptotic convergence factor is F′(α). Applying the mean-value theorem, together with f(α) = 0, we see that f(c) − f(α) = f(c) = (c − α)f′(ξ), in which ξ lies between α and c. Thus, |F′(α)| is smaller than one for suitable c, so that convergence can be expected for proper values of c.

Since only functional values are involved (no derivative values are required), the resulting iterative formula (3.32) involves little computational effort. Also, in common with the other procedures given in this section, the method of false position is not confined to roots of polynomial equations.
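As a sketch of this fixed-point false position rule, the following Python fragment (our own, not the book's) applies (3.32) to f(x) = x³ − 3x + 1 with the fixed point taken at c = 2; the step-size stopping test and iteration cap are assumptions of the sketch.

```python
# Fixed-point false position (3.32): the point (c, f(c)) stays fixed while
# the chord through (x_k, f(x_k)) supplies the next iterate.
def false_position(f, x0, c, tol=1.0e-10, itmax=200):
    fc = f(c)
    x = x0
    for _ in range(itmax):
        x_new = x - f(x) * (x - c) / (f(x) - fc)
        if abs(x_new - x) <= tol:     # step-size test (our choice)
            return x_new
        x = x_new
    raise RuntimeError("no convergence")

root_fp = false_position(lambda x: x**3 - 3*x + 1, 1.0, 2.0)
```

Convergence is linear here, with asymptotic factor F′(α) of roughly 0.37 for this choice of c, so about one decimal digit is gained every two to three iterations.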
Another technique with a simple graphical explanation is illustrated in Fig. 3.5. It gives a root, if values x_{L1} and x_{R1} are known, such that f(x_{L1}) and f(x_{R1}) are opposite in sign. For continuous functions, the number f((x_{L1} + x_{R1})/2), being the value of the function at the halfway point, will be either zero or have the sign of f(x_{L1}) or the sign of f(x_{R1}). If the value is not zero, a second pair x_{L2} and x_{R2} can be chosen from the three numbers x_{L1}, x_{R1}, and (x_{L1} + x_{R1})/2 so that f(x_{L2}) and f(x_{R2}) are opposite in sign, while

x_{R2} − x_{L2} = (x_{R1} − x_{L1})/2.

Continuing in this manner, there is always a point α in the interval [x_{Lk}, x_{Rk}] for which f(α) = 0; α is uniquely determined by the process even though the interval may contain more than one zero for f(x).
Figure 3.4 Method of false position.

Because each new application of the iterative scheme reduces by half the length of the interval in x known to contain α, this procedure is called the half-interval method. Note that since the interval of uncertainty is always known, we can specify, a priori, the number of iterations required to locate the root within a prescribed tolerance. If Δ₀ is the length of the starting interval, then the number n of interval-halving operations required to reduce the interval of uncertainty to Δₙ is given by

n ≥ log₂(Δ₀/Δₙ).   (3.33)

Figure 3.5 Half-interval method.

A technique which in some senses combines the features of the two preceding ones may also be constructed. Referring to Fig. 3.6, let x_{L1} and x_{R1} be numbers such that f(x_{L1}) and f(x_{R1}) are opposite in sign. Let x_s be the abscissa of the point of intersection of the x-axis and the chord joining the points (x_{L1}, f(x_{L1})), (x_{R1}, f(x_{R1})); that is,

x_s = [x_{L1}f(x_{R1}) − x_{R1}f(x_{L1})]/[f(x_{R1}) − f(x_{L1})].   (3.34)

If f(x_s) = 0, the process terminates with a zero of f(x). If f(x_s) has the same sign as f(x_{R1}), choose x_{L2} = x_{L1} and x_{R2} = x_s. If f(x_s) has the same sign as f(x_{L1}), choose x_{L2} = x_s and x_{R2} = x_{R1}. The process is then continued to create the sequence of pairs (x_{Lk}, x_{Rk}).

Figure 3.6 Regula falsi method.

The process just described is linear inverse interpolation and sometimes bears the name regula falsi.

Example. Find the root of f(x) = x³ − 3x + 1 = 0 that is known to lie between x_{L1} = 1 and x_{R1} = 2.

The results for ten iterations of the regula falsi method are given in Table 3.8. Because of the nature of the function and the points chosen, this is also an example of the method of false position; note that x_{Rk} remains unchanged throughout the course of the iteration and is equivalent to c in equation (3.32). Hence, the required root is approximately 1.532; it is computed more accurately to be 1.532089.

Table 3.8 Illustration of Regula Falsi Method

k   x_{Lk}   x_{Rk}   f(x_{Lk})   f(x_{Rk})
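A compact half-interval sketch in Python (ours, not the book's FORTRAN), applied to the f(x) = x³ − 3x + 1 example above; the interval tolerance Δₙ = 10⁻⁶ is a choice made here, and the count of halvings comes directly from (3.33).

```python
import math

# Half-interval (bisection) method for f(x) = x**3 - 3*x + 1 on [1, 2],
# the function of the Table 3.8 example.
def half_interval(f, xl, xr, delta=1.0e-6):
    n = math.ceil(math.log2((xr - xl) / delta))   # iterations from (3.33)
    fl = f(xl)
    for _ in range(n):
        xm = 0.5 * (xl + xr)
        fm = f(xm)
        if fl * fm <= 0.0:      # sign change: root in the left subinterval
            xr = xm
        else:                   # otherwise root in the right subinterval
            xl, fl = xm, fm
    return 0.5 * (xl + xr)

root_bis = half_interval(lambda x: x**3 - 3*x + 1, 1.0, 2.0)
```

With Δ₀ = 1, (3.33) gives n = 20 halvings, after which the interval midpoint is within about 5 × 10⁻⁷ of the root 1.532089.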
EXAMPLE 3.4
GAUSS-LEGENDRE BASE POINTS AND WEIGHT FACTORS BY THE HALF-INTERVAL METHOD

Problem Statement

Write a program that computes the base-point values and weight factors for use in the (n + 1)-point (n ≤ 9) Gauss-Legendre quadrature (see Section 2.10 and Example 2.3):

∫₋₁¹ f(z) dz ≅ Σᵢ₌₀ⁿ wᵢ f(zᵢ).   (3.4.1)

Use the half-interval method to compute the base-point values, which are the roots zᵢ of the (n + 1)th degree Legendre polynomial Pₙ₊₁(z). Find the corresponding weight factors wᵢ by evaluating the integral of equation (2.85). As a check on the calculations, use (3.4.1) to evaluate a few simple test integrals.

Method of Solution

The coefficients of all the Legendre polynomials Pₙ(z), 0 ≤ n ≤ 10, are first generated and stored in successive rows of the lower triangular portion of matrix A. The elements of A, which are such that aᵢⱼ is the coefficient of zʲ in Pᵢ(z), are obtained recursively from (2.66). The actual formulas used are shown in the flow diagram.

For a given n, the roots of Pₙ₊₁(z) are obtained by first noting that they lie between −1 and 1. Starting from z = −1, we proceed in increments of 0.05 (assumed to be smaller than the separation of the roots) until Pₙ₊₁(z) changes sign, that is, until we reach a point z such that Pₙ₊₁(z)Pₙ₊₁(z + 0.05) is negative. By setting a left-hand limit z_L = z and a right-hand limit z_R = z + 0.05, the half-interval method of Section 3.8 can then be implemented. By arbitrarily deciding to locate each root within a small interval of width 10⁻⁶, the required number of iterations is obtained from (3.33) by substituting Δ₀ = 0.05 and Δₙ = 10⁻⁶ and rounding up to the next integer. When the first root z₀ has been found, we again step to the right in increments of 0.05, until Pₙ₊₁(z) again changes sign. The procedure is repeated until all n + 1 roots z₀, z₁, ..., zₙ have been located within the required degree of accuracy.

The corresponding weight factors w₀, w₁, ..., wₙ are given from equation (2.85). Since the integrand is an nth-degree polynomial, (3.4.2) can be rewritten as

wᵢ = 2[c₁ + c₃/3 + c₅/5 + ··· + cₙ₊₁/(n + 1)]   (3.4.3)

for n even, with a similar form, terminating in cₙ/n, for n odd. The method for obtaining the c's by expanding the repeated product is detailed in the flow diagram.

Finally, the (n + 1)-point Gauss-Legendre quadrature of (3.4.1) is used to approximate the following test integrals:

I₁ = ∫₋₁¹ eᶻ dz,   I₂ = ∫₋₁¹ cos(πz/2) dz,   I₃ = ∫₋₁¹ z⁵ dz,   I₄ = ∫₋₁¹ z⁶ dz,

which have the exact values e − e⁻¹, 4/π, 0, and 2/7, respectively.
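The same scan-and-bisect strategy can be sketched in Python (the book's version, below, is in FORTRAN). One deliberate substitution: the weights here use the standard closed form wᵢ = 2/[(1 − zᵢ²)P′ₙ₊₁(zᵢ)²] instead of the polynomial integration of (2.85), so the weight computation is not the book's method.

```python
import math

def legendre(n, z):
    """Return (P_n(z), P_{n-1}(z)) from the three-term recurrence (2.66)."""
    if n == 0:
        return 1.0, 0.0
    p_prev, p = 1.0, z
    for i in range(2, n + 1):
        p_prev, p = p, ((2*i - 1)*z*p - (i - 1)*p_prev) / i
    return p, p_prev

def legendre_deriv(n, z):
    """P_n'(z) for -1 < z < 1, from P_n and P_{n-1}."""
    p, p_prev = legendre(n, z)
    return n * (z*p - p_prev) / (z*z - 1.0)

def gauss_legendre(npts, step=0.05, bisections=60):
    """Base points and weights of the npts-point Gauss-Legendre rule."""
    roots, weights = [], []
    z = -1.0
    for _ in range(npts):
        zl = z                              # scan right for a sign change
        while legendre(npts, zl)[0] * legendre(npts, zl + step)[0] > 0.0:
            zl += step
        zr = zl + step
        fl = legendre(npts, zl)[0]
        for _ in range(bisections):         # half-interval refinement
            zm = 0.5 * (zl + zr)
            fm = legendre(npts, zm)[0]
            if fl * fm > 0.0:
                zl, fl = zm, fm
            else:
                zr = zm
        r = 0.5 * (zl + zr)
        roots.append(r)
        weights.append(2.0 / ((1.0 - r*r) * legendre_deriv(npts, r)**2))
        z = r + 0.5 * step                  # resume the scan past this root
    return roots, weights

roots3, weights3 = gauss_legendre(3)        # three-point rule (n = 2)
```

For the three-point rule (n = 2) this reproduces the tabulated roots ±0.774597, 0 and weights 5/9, 8/9, 5/9.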

Flow Diagram

(The main program zeros the coefficient array, sets a₀₀ = a₁₁ = 1, and builds the Legendre coefficients from the recursion aᵢⱼ = [(2i − 1)/i]aᵢ₋₁,ⱼ₋₁ − [(i − 1)/i]aᵢ₋₂,ⱼ; subroutine ROOTS then locates the roots z₀, ..., zₙ of Pₙ₊₁(z) by the half-interval method, and subroutine WEIGHT finds the weight factors w₀, ..., wₙ.)
Exaqple 3.4 Gauss-Legendre Base Points and Weight Factors by the Half-Interval Method

FORTRAN Implementation
List of Principal Variables
Program Symbol Definition
(Main)
A†                Matrix whose rows contain coefficients of the successive Legendre polynomials.
EXACT1, EXACT2,   Exact values of the four integrals, I₁, I₂, I₃, and I₄.
EXACT3, EXACT4
FI1, FI2,         Approximations to the integrals I₁, I₂, I₃, and I₄.
FI3, FI4
N                 n.
ROOTS             Subroutine for determining the roots of Pₙ₊₁(z).
W†                Vector of weight factors wᵢ.
WEIGHT            Subroutine for finding the weight factors wᵢ.
Z†                Vector of roots zᵢ.
(Subroutine
ROOTS)
FZL, FZHALF       Values of Pₙ₊₁(z) at the left end and midpoint of the current interval, f_L and f_half, respectively.
ITER              Number of half-interval iterations, k.
POLY              Function for evaluating Pₙ₊₁(z).
ZL, ZR, ZHALF     Values of z at the left end, right end, and midpoint of the current interval, z_L, z_R, and z_half, respectively.
(Subroutine
WEIGHT)
C†                Vector of coefficients c in (3.4.3).

† Because of FORTRAN limitations, all subscripts in the text are advanced by one when they appear in the program; e.g., the roots z₀ through zₙ become Z(1) through Z(N+1).

Program Listing
Main Program
APPLIED NUMERICAL METHODS, EXAMPLE 3.4
GAUSS-LEGENDRE BASE POINTS AND WEIGHTS BY HALF-INTERVAL METHOD.

C     THIS PROGRAM AND ITS ASSOCIATED SUBROUTINES ROOTS AND
C     WEIGHT REPRESENT A COMPLETE PACKAGE FOR GAUSS-LEGENDRE
C     QUADRATURE.  FIRST, THE COEFFICIENTS OF THE LEGENDRE
C     POLYNOMIALS OF ORDERS ZERO THROUGH TEN ARE COMPUTED AND
C     STORED IN THE ARRAY A.  A VALUE FOR N (NOT BIGGER THAN 9)
C     IS THEN READ AND THE ROOTS OF THE (N+1)TH ORDER LEGENDRE
C     POLYNOMIAL ARE COMPUTED AND STORED IN Z(1)...Z(N+1) BY THE
C     SUBROUTINE ROOTS, WHICH EMPLOYS THE HALF-INTERVAL METHOD.
C     THE CORRESPONDING WEIGHT FACTORS W(1)...W(N+1) ARE COMPUTED
C     BY THE SUBROUTINE WEIGHT.  FINALLY, THE INTEGRALS OF FOUR
C     COMMON FUNCTIONS ARE ESTIMATED BY AN (N+1)-POINT GAUSS-
C     LEGENDRE QUADRATURE AND ARE COMPARED WITH THEIR ANALYTICAL
C     VALUES.
      DIMENSION W(10), Z(10), A(11,11)
      PI = 3.1415926

C     .....ESTABLISH COEFFICIENTS OF LEGENDRE POLYNOMIALS
C          UP TO ORDER N = 10.....

      DO 1 I = 1, 11
      DO 1 J = 1, 11
    1 A(I,J) = 0.0
      A(1,1) = 1.0
      A(2,2) = 1.0
      DO 2 I = 2, 10
      FI = I
      C1 = (2.0*FI - 1.0)/FI
      C2 = (FI - 1.0)/FI
      A(I+1,1) = -C2*A(I-1,1)
      IP1 = I + 1
      A(IP1,IP1) = A(I,I)*C1
      DO 2 JP1 = 2, I
    2 A(IP1,JP1) = C1*A(I,JP1-1) - C2*A(I-1,JP1)

      WRITE (6,200)
      DO 3 I = 1, 11
    3 WRITE (6,201) (A(I,J), J = 1, I)

C
C     .....READ VALUE OF N AND FIND ROOTS Z(I).....
      READ (5,100) N
      WRITE (6,202) N
      CALL ROOTS (N, A, Z)
      NP1 = N + 1
      WRITE (6,203) (Z(I), I = 1, NP1)
C
C     .....FIND WEIGHT FACTORS W(I).....
      CALL WEIGHT (N, Z, W)
      WRITE (6,204) (W(I), I = 1, NP1)

C
C     .....PERFORM INTEGRATIONS.....
      FI1 = 0.0
      FI2 = 0.0
      FI3 = 0.0

Program Listing (Continued)

C
C     .....FORMATS FOR INPUT AND OUTPUT STATEMENTS.....
  100 FORMAT (6X, I4)
  200 FORMAT (81H1 THE COEFFICIENTS FOR THE LEGENDRE POLYNOMIALS OF
     1ORDER ZERO THROUGH TEN ARE)
  201 FORMAT (1H0, F7.5, 10F10.5)
  202 FORMAT (37H1 GAUSS-LEGENDRE QUADRATURE, WITH/
     1        1H0, 9X, 8HN      = , I6)
  203 FORMAT (51H0 THE ROOTS OF THE (N+1)TH ORDER POLYNOMIAL ARE/
     1        (1H0, 5X, 10F10.6))
  204 FORMAT (42H0 THE CORRESPONDING WEIGHT FACTORS ARE/
     1        (1H0, 5X, 10F10.6))
  205 FORMAT (39H0 ESTIMATED VALUES OF INTEGRALS ARE/
     1        1H0, 5X, 14H  FI1    = , F10.6, 14H  FI2    = , F10.6,
     2        14H  FI3    = , F10.6, 14H  FI4    = , F10.6/
     3        35H0 EXACT VALUES OF INTEGRALS ARE/
     4        1H0, 5X, 14H  EXACT1 = , F10.6, 14H  EXACT2 = , F10.6,
     5        14H  EXACT3 = , F10.6, 14H  EXACT4 = , F10.6)
C
      END

Subroutine ROOTS
c SUBROUTINE TO F I N D ROOTS Z ( l I . . . Z ( N + l ) OF THE ( N + l ) T H OKDEK
C LEGENDRE POLYNOMIAL, US l NG THE HALF- INTERVAL METHOD.
C
SUBROUTINE ROOTS (N, A, Z )
DIMENSION A(11,11), Z(l1)
NP1 = N + 1
lTER = ALOG(O.OS/l.OE-6)/ALOG(2.0) + 1.0
WRITE (6,200) ITER
C
C ..,.,-
2R =
E S T A B L I S H INTERVAL W I T H I N WHICH ROOT L I E S
1.0
.....
0 0 7 1 a 1, N P 1
Z L = ZR
1 I F (POLYCZL, A, N)*POLY(ZL+6.05, A, N) .LT. 0.0) GO TO 3
Z L = Z L + 0.05
GO TO 1
3 ZR = Z L + 0.05
F Z L = POLY (ZL, A, N)
C
C ..... B E G I N HALF-INTERVAL I T E R A T I O N
DO 6 J a 1, I T E R
.....
ZHALF = ( Z L + Z R ) / 2 . 0
FZHALF = POLY (ZHALF, A, N)
C
C ..... CHOOSE THE SUB-INTERVAL CONTAINING THE ROOT
I F ( F Z H A L F * F Z L .LE. 0 . 0 ) GO TO 5
.....
Z L = ZHALF
F Z L = FZHALF
GO TO 6
5 ZR = ZHALF
6 CONTINUE -

7 Z ( I ) = ( Z L + ZR)/2.0
RETURN
C
C
200
..... FOR,MAT FOR OUTPUT STATEMENT
FORMAT (IHO, 9X, 8 H l T E R 16) ,
.....
C
END
Program Listing (Continued)
Subroutine WEIGHT
C     SUBROUTINE TO ESTABLISH WEIGHT FACTORS W(1)...W(N+1) FOR
C     (N+1)-POINT GAUSS-LEGENDRE QUADRATURE.
C
      SUBROUTINE WEIGHT (N, Z, W)
      DIMENSION C(10), Z(11), W(11)
      NP1 = N + 1
C
C     .....FIND COEFFICIENTS OF POWERS OF Z IN INTEGRAND.....
      DO 8 I = 1, NP1
      C(1) = 1.0
      K = 1
      DO 5 J = 1, NP1
      IF (J .EQ. I) GO TO 5
      K = K + 1
      DENOM = Z(I) - Z(J)
      C(K) = C(K-1)/DENOM
      L = K - 1
    2 IF (L .LT. 2) GO TO 4
      C(L) = (C(L-1) - Z(J)*C(L))/DENOM
      L = L - 1
      GO TO 2
    4 C(1) = -Z(J)*C(1)/DENOM
    5 CONTINUE
C
C     .....EVALUATE W(I) AS THE INTEGRAL.....
      W(I) = C(1)
      IF (NP1 .LT. 3) GO TO 8
      DO 7 J = 3, NP1, 2
      FJ = J
    7 W(I) = W(I) + C(J)/FJ
    8 W(I) = 2.0*W(I)
      RETURN
C
      END

Function POLY
C     FUNCTION FOR EVALUATING (N+1)TH ORDER LEGENDRE POLYNOMIAL.
C
      FUNCTION POLY (X, A, N)
      DIMENSION A(11,11)
      VAL = A(N+2,1)
      NP1 = N + 1
      DO 1 K = 1, NP1
      KP1 = K + 1
    1 VAL = VAL + A(N+2,KP1)*X**K
      POLY = VAL
      RETURN
C
      END

Data
Computer Output
THE COEFFICIENTS FOR THE LEGENDRE POLYNOMIALS OF ORDER ZERO THROUGH TEN ARE

1.00000

0.0 1.00000

-0.50000 -0.0 1.50000

0.0 -1.50000 -0.0 2.50000

0.37500 -0.0 -3.75000 -0.0 4.37500

0.0 1.87500 -0.0 -8.74999 -0.0 7.87499

-0.31250 -0.0 6.56249 -0.0 -19.68747 -0.0 14.43748

0.0 -2.18750 -0.0 19.68747 -0.0 -43.31241 -0.0 26.81245

0.27344 -0.0 -9.8437b -0.0 54.14053 -0.0 -93.84355 -0.0 50.27335

0.0 2.46093 -0.0 -36.09369 -0.0 140.76532 -0.0 -201.09326 -0.0 94.96072

-0.24609 -0.0 13.53513 -0.0 -117.30446 -0.0 351.91284 -0.0 -427.32275 -0.0 180.42534

Results for the 2nd Data Set


GAUSS-LEGENDRE QUADRATURE, WITH

N      =      2

ITER = 16

THE ROOTS OF THE ( N + l ) T H ORDER POLYNOMIAL ARE

-0.774597 0.000000 0.774596

THE CORRESPONDING WEIGHT FACTORS ARE

0.555555 0.888889 0.555556

ESTIMATED VALUES OF INTEGRALS ARE

FI1 = 2.350336    FI2 = 1.274123    FI3 = -0.000001    FI4 = 0.240000

EXACT VALUES OF INTEGRALS ARE

EXACT1 = 2.350402    EXACT2 = 1.273239    EXACT3 = 0.0    EXACT4 = 0.285714


Computer Output (Continued)
Results for the 4th Data Set
GAUSS-LEGENDRE QUADRATURE, WITH

N      =      4

ITER = 16

THE ROOTS OF THE ( N + l ) T H ORDER POLYNOMIAL ARE

-0.906180 -0.538469 -0.000000 0.538469 0.906180

THE CORRESPONDING WEIGHT FACTORS ARE

0.236927 0.478628 0.568891 0.478628 0.236927

ESTIMATED VALUES OF INTEGRALS ARE

FI1 = 2.350401    FI2 = 1.273240    FI3 = 0.000000    FI4 = 0.285714


EXACT VALUES OF INTEGRALS ARE

EXACT1 = 2.350402 EXACT2 = 1.273239 EXACT3 = 0.0 EXACT4 = 0.285714

Resultsfor the 9th Data Set


GAUSS-LEGENDRE QUADRATURE, WITH

N      =      9

ITER = 16

THE ROOTS OF THE ( N + l ) T H ORDER POLYNOMIAL ARE

-0.973907 -0.865058 -0.679113 -0.433394 -0.148875 0.148874 0.433394 0.679413 0.865058 0.973909

THE CORRESPONDING WEIGHT FACTORS ARE

0.066670 0.149284 0.219296 0.269215 0.295650 0.295694 0.269552 0.219099 0.149464 0.066672

ESTIMATED VALUES OF INTEGRALS ARE

FI1 = 2.351203    FI2 = 1.273784    FI3 = 0.000066    FI4 = 0.285669

EXACT VALUES OF INTEGRALS ARE

EXACT1 = 2.350402    EXACT2 = 1.273239    EXACT3 = 0.0    EXACT4 = 0.285714


Discussion of Results

The printout is reproduced above only for the three-, five-, and ten-point formulas (n = 2, 4, and 9, respectively). Within the specified tolerance, the roots zᵢ agree with those given in Table 2.2.

The five-point Gauss-Legendre quadrature is highly accurate for all four test integrals. Even the three-point formula fails seriously only for I₄, in which the integrand is z⁶; however, as predicted in Section 2.10, it handles I₃ exactly, since the integrand, z⁵, is now a polynomial only of degree 2n + 1. The ten-point formula shows obvious signs of accumulated round-off error; this difficulty could be overcome by working in double-precision arithmetic. Note that the actual application of the Gauss-Legendre quadrature is treated much more thoroughly in Examples 2.3 and 2.4.
EXAMPLE 3.5
DISPLACEMENT OF A CAM FOLLOWER USING THE REGULA FALSI METHOD

Problem Statement

Consider the rotating cam with follower shown in Fig. 3.5.1. Let d (inches) be the displacement of the follower tip measured from the center of rotation of the cam. The radius of the cam, r (inches), measured from the center of rotation, is a function of the rotation angle, x (radians):

r(x) = 0.5 + 0.5e^(−x/2π) sin x,  0 ≤ x ≤ 2π.   (3.5.1)

Figure 3.5.1 (a) Cam and follower. (b) Rotation angle, x, and radius, r.

Figure 3.5.2 shows the displacement of the follower, d(x), corresponding to one complete rotation of the cam. Write a program that reads values for D, x_L, x_R, ε, and itmax, and then uses the regula falsi algorithm of (3.34) to find a rotation angle x on the angular interval [x_L, x_R] for which the follower displacement d(x) is equal to D. The convergence criterion for stopping the iterative computation should be

|d(x) − D| ≤ ε.   (3.5.2)

If the criterion of (3.5.2) is not satisfied after itmax iterations, computation should be discontinued.

Method of Solution

For a given angle of rotation x, the displacement of the follower, d(x), is equal to the radius of the cam, r(x). Then the angle x that produces the desired displacement D is the solution of the equation

f(x) = r(x) − D = 0.5 + 0.5e^(−x/2π) sin x − D = 0.   (3.5.3)

The convergence criterion of (3.5.2) is given by

|f(x)| ≤ ε.   (3.5.4)

After establishing that f(x_L) and f(x_R) are of opposite sign, to insure that a root of (3.5.3) exists on the interval [x_L, x_R], the regula falsi algorithm of (3.34) can be implemented:

x_{k+1} = [x_{L,k} f(x_{R,k}) − x_{R,k} f(x_{L,k})]/[f(x_{R,k}) − f(x_{L,k})].   (3.5.5)

Here, k is the iteration counter. For the first iteration, x_{L,1} and x_{R,1} are given by x_L and x_R, respectively. Thereafter, x_{L,k} and x_{R,k} are chosen so that f(x_{L,k}) and f(x_{R,k}) are of opposite sign.

Figure 3.5.2 Follower displacement, d, as a function of angle of rotation, x.
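Before the FORTRAN program, here is a Python sketch of the same regula falsi computation (our own translation); camf and the data values follow the problem statement, and the first data set has the exact answer x = π.

```python
import math

# Regula falsi (3.5.5) applied to the cam equation (3.5.3).
def camf(x, d):
    return 0.5 + 0.5 * math.exp(-x / (2.0 * math.pi)) * math.sin(x) - d

def regula_falsi(f, xl, xr, eps=1.0e-5, itmax=50):
    fl, fr = f(xl), f(xr)
    if fl * fr > 0.0:
        raise ValueError("possibly no root on the starting interval")
    for k in range(1, itmax + 1):
        x2 = (xl * fr - xr * fl) / (fr - fl)
        f2 = f(x2)
        if abs(f2) <= eps:
            return x2, k
        if f2 * fl < 0.0:       # root in [xl, x2]: keep left subinterval
            xr, fr = x2, f2
        else:                   # root in [x2, xr]: keep right subinterval
            xl, fl = x2, f2
    raise RuntimeError("no convergence")

x_root, iters = regula_falsi(lambda t: camf(t, 0.5), 0.7, 3.7)
```

This converges in about five iterations to x ≈ 3.141594, matching the program output below.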

Flow Diagram

(The flow diagram reads D, x_L, x_R, ε, and itmax; prints "Possibly no root" if f(x_L)·f(x_R) > 0; otherwise applies (3.5.5) repeatedly, keeping the subinterval whose endpoint function values are of opposite sign, until |f(x_{k+1})| ≤ ε or itmax iterations are exceeded.)

FORTRAN Implementation
List of Principal Variables
Program Symbol Definition
CAMF              Function that computes the value of f(x), in. (see equation 3.5.3).
D                 Desired follower displacement, D, in.
EPS               Tolerance for convergence criterion, ε, in.
FX2, FXL, FXR     f(x_{k+1}), f(x_{L,k}), and f(x_{R,k}), in.
ITER              Iteration counter, k.
ITMAX             Maximum number of iterations permitted, itmax.
X2, XL, XR        x_{k+1}, x_{L,k}, and x_{R,k}, radians.

Program Listing
C     APPLIED NUMERICAL METHODS, EXAMPLE 3.5
C     DISPLACEMENT OF A CAM FOLLOWER USING A METHOD OF FALSE POSITION
C
C     THIS PROGRAM FINDS X2, THE ANGULAR DISPLACEMENT (IN RADIANS)
C     OF A CAM ON THE ANGULAR INTERVAL (XL,XR) WHICH CORRESPONDS
C     TO A GIVEN FOLLOWER DISPLACEMENT D, USING THE REGULA FALSI
C     ALGORITHM.  THE CAM DISPLACEMENT EQUATION IS DEFINED BY THE
C     STATEMENT FUNCTION CAMF.  FXL, FXR, AND FX2 ARE THE VALUES
C     OF CAMF AT XL, XR, AND X2 RESPECTIVELY.  ITERATION CONTINUES
C     UNTIL THE ITERATION COUNTER ITER EXCEEDS ITMAX OR UNTIL THE
C     MAGNITUDE OF FX2 IS LESS THAN OR EQUAL TO EPS.
C
C     .....DEFINE CAM FUNCTION.....
      CAMF(X) = 0.5 + 0.5*EXP(-X/6.283185)*SIN(X) - D
C
C     .....READ AND PRINT DATA.....
    1 READ (5,100) D, XL, XR, EPS, ITMAX
      WRITE (6,200) D, XL, XR, EPS, ITMAX
C
C     .....EVALUATE FUNCTION AT ENDS OF INTERVAL.....
      FXL = CAMF(XL)
      FXR = CAMF(XR)
C
C     .....CHECK FOR PRESENCE OF A ROOT.....
      IF ( FXL*FXR ) 5, 3, 2
    2 WRITE (6,201)
      GO TO 1
    3 ITER = 1
C
C     .....BEGIN REGULA FALSI ITERATION.....
    5 DO 7 ITER = 1, ITMAX
      X2 = (XL*FXR - XR*FXL)/(FXR - FXL)
      FX2 = CAMF(X2)
C
C     .....CHECK FOR CONVERGENCE.....
      IF ( ABS(FX2).LE.EPS ) GO TO 8
C
C     .....KEEP RIGHT OR LEFT SUBINTERVAL.....
      IF ( FX2*FXL.LT.0. ) GO TO 6
      XL = X2
      FXL = FX2
      GO TO 7
    6 XR = X2
      FXR = FX2
    7 CONTINUE
      WRITE (6,202) ITMAX
      GO TO 1
C
    8 WRITE (6,203) ITER, X2, FX2
      GO TO 1
C
C     .....FORMATS FOR INPUT AND OUTPUT STATEMENTS.....
  100 FORMAT( 5X, F10.6, 2(10X,F10.6) / 5X, F10.6, 15X, I3 )
  200 FORMAT( 1H0/ 10H0D      = , F10.6/ 10H XL     = , F10.6/
     1        10H XR     = , F10.6/ 10H EPS    = , F10.6/
     2        10H ITMAX  = , I3)
  201 FORMAT( 42H0POSSIBLY NO ROOT ON THE STARTING INTERVAL )
  202 FORMAT( 21H0NO CONVERGENCE AFTER, I3, 11H ITERATIONS )
  203 FORMAT( 10H0ITER   = , I3/ 10H X2     = , F10.6/ 10H FX2    = ,
     1        F10.6 )
C
      END
Data
D   =   0.500000     XL  =   0.700000     XR  =   3.700000
EPS =   0.000010     ITMAX  =        50
D   =   0.750000     XL  =   0.250000     XR  =   1.500000
EPS =   0.000050     ITMAX  =        50
D   =   0.700000     XL  =   3.140000     XR  =   5.000000
EPS =   0.000001     ITMAX  =       100
D   =   1.000000     XL  =   0.000000     XR  =   3.140000
EPS =   0.000100     ITMAX  =       100

Computer Output

D      =   0.500000
XL     =   0.700000
XR     =   3.700000
EPS    =   0.000010
ITMAX  =         50

ITER   =          5
X2     =   3.141594
FX2    =  -0.000000

D      =   0.750000
XL     =   0.250000
XR     =   1.500000
EPS    =   0.000050
ITMAX  =         50

ITER   =          5
X2     =   0.580566
FX2    =   0.000043

D      =   0.700000
XL     =   3.139999
XR     =   5.000000
EPS    =   0.000001
ITMAX  =        100

POSSIBLY NO ROOT ON THE STARTING INTERVAL

D      =   1.000000
XL     =   0.0
XR     =   3.139999
EPS    =   0.000100
ITMAX  =        100

POSSIBLY NO ROOT ON THE STARTING INTERVAL
Discussion of Results

The computer output shows results for two data sets that yield solutions and two that do not. (The exact solution for the first data set is x = π.) The program can be used to find the angular displacement corresponding to a given follower displacement for any cam; only the function CAMF need be modified to include the appropriate radius function r(x).

3.9 Rutishauser's QD Algorithm

A modernization of the classical method of Bernoulli is afforded by the QD (quotient-difference) algorithm. The scheme starts in the manner described in Section 3.3. Thus let u_k, 0 ≤ k ≤ n − 1, be given numbers (in practice, chosen by techniques described later). For k ≥ n, define u_k by

a₀u_k + a₁u_{k−1} + ··· + aₙu_{k−n} = 0.

The coefficients aⱼ are, of course, those of the polynomial whose zeros we again seek. As in Bernoulli's method, we form the sequence {q_k^(1)}, where

q_k^(1) = u_{k+1}/u_k.   (3.36)

Under suitable circumstances, if α₁ is a uniquely determined dominant zero, lim_{k→∞} q_k^(1) = α₁. Rutishauser's extension of Bernoulli's method builds additional sequences, {q_k^(2)}, {q_k^(3)}, ..., {q_k^(n)}, which can converge to α₂, α₃, ..., αₙ, respectively. As with Bernoulli's method, it is also possible to use subsidiary sequences when complex conjugate roots occur, and so on.

To define the new sequences {q_k^(m)}, m = 2, 3, ..., n, it is convenient to construct sequences {e_k^(m)}, m = 0, 1, 2, ..., n − 1. We then have

e_k^(m) = [q_{k+1}^(m) − q_k^(m)] + e_{k+1}^(m−1),   (3.37a)

q_k^(m+1) = [e_{k+1}^(m)/e_k^(m)] q_{k+1}^(m).   (3.37b)

Clearly, nothing has been defined unless e_k^(0) is known. We have, always,

e_k^(0) = 0.

For n = 4, the relations are shown schematically in Table 3.9. For obvious reasons, relations (3.37) are sometimes called rhombus rules. If a rhombus is centered on a q-column, pairs are added, as indicated; if a rhombus is centered on an e-column, pairs are multiplied.

As might be expected for a scheme involving division, it is difficult to guarantee feasibility for every starting procedure u₀, u₁, ..., u_{n−1}, and for every solution set α₁, α₂, ..., αₙ. Some results are known ([1], p. 166). Consider first the condition

|α₁| > |α₂| > ··· > |αₙ|.

It is then known that if the QD sequences exist, then lim_{k→∞} q_k^(m) = α_m and lim_{k→∞} e_k^(m) = 0. Consider next the condition

|α₁| ≥ |α₂| ≥ ··· ≥ |αₙ|,

again requiring that the QD sequences exist. Then, for every m such that |α_{m−1}| > |α_m| > |α_{m+1}|,

lim_{k→∞} q_k^(m) = α_m.

Also, for every m such that |α_m| > |α_{m+1}|,

lim_{k→∞} e_k^(m) = 0.

Here, |α₀| is interpreted as ∞ and |α_{n+1}| as 0.

We thus see that the columns of a QD table can be divided into subtables by those e^(m) columns that approach zero. Then all the q^(m) columns contained in a subtable pertain to α values having the same modulus. Thus if a subtable has a single q^(m) column, α_m is its limit.

One necessary and sufficient condition is known for the existence of the elements of the QD scheme. It is the non-vanishing of the determinants

H_k^(m) = | u_k         u_{k+1}     ···   u_{k+m−1}  |
          | u_{k+1}     u_{k+2}     ···   u_{k+m}    |
          |   ···         ···       ···     ···      |
          | u_{k+m−1}   u_{k+m}     ···   u_{k+2m−2} |.   (3.40)
Table 3.9 Column Generation of the QD Algorithm


Sufficient conditions for the nonvanishing of the determinants of (3.40) are that

    α_i > 0,  1 <= i <= n,    (3.41)

and, for 0 <= k <= n - 1,

    u_{k+1} = S_{k+1}.    (3.42)

The S_j above are Newton's symmetric functions, S_j = α_1^j + α_2^j + ... + α_n^j, and are given by Chrystal [5] in terms of the polynomial coefficients by a recursion relation. Thus if the zeros α_i are distinct, we have, by (3.12), for all values of k,

    u_k = Σ_{i=1}^{n} t_i α_i^k.    (3.43)

If the values of u_k are given as described in (3.42), it develops that the relation (3.43) is valid for all choices of α_i. Let the distinct zeros of p_n(x) be β_1, β_2, ..., β_s, and let their respective multiplicities be t_1, t_2, ..., t_s. This means that if the u_k are chosen as in (3.42), then, for all k >= 0,

    u_k = Σ_{i=1}^{s} t_i β_i^k.

Another starting procedure is essentially that employed in Section 3.3 for Bernoulli's method. Let u_0 = 1 and, for 1 <= k <= n - 1,

    u_k = -Σ_{j=1}^{k} a_j u_{k-j}.    (3.46)

While this does not guarantee the existence of all elements in the QD system, it does guarantee their existence for iterative indices sufficiently large ([1], p. 165).

The question of stability remains. The method described above is numerically unstable if the columns of numbers are generated in the order e^(1), q^(2), e^(2), q^(3), .... Fortunately, this can be avoided if the numbers generated are built row by row, rather than column by column. This has been implemented for the starting procedure of (3.46). The relationships (3.37a) and (3.37b) are first rephrased as

    q_{k+1}^(m) = q_k^(m) + e_k^(m) - e_{k+1}^(m-1),    (3.47a)

    e_{k+1}^(m) = [q_k^(m+1)/q_{k+1}^(m)] e_k^(m).    (3.47b)

Thus, if a row of q's and the succeeding row of e's are known, we may build the next row of q's and then the next row of e's. To do this, we need proper starting values. For the case presented in (3.46), these are given by

    q_0^(1) = -a_1,   q_0^(m) = 0,  m = 2, 3, ..., n,    (3.48a)

    e_0^(m) = a_{m+1}/a_m,  m = 1, 2, ..., n - 1.    (3.48b)

In addition, we require that

    e_k^(0) = 0  and  e_k^(n) = 0  for all k.    (3.49)

It is known ([1], p. 171) that if the numbers generated by (3.47), (3.48), and (3.49) exist, then they are the same as those generated by (3.37) and (3.46), together with e_k^(0) = 0 for all k. Table 3.10 shows the beginning rows for n = 4 (compare with Table 3.9).

Table 3.10 Row Generation of the QD Algorithm


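A row-wise QD table is compact to generate. The sketch below is our own illustration, not a program from the text: the function name, the sample cubic, and the iteration count are assumptions, and the recurrences are written in one common row-wise arrangement of the rhombus rules, starting from the polynomial coefficients as in (3.48). For a monic cubic with zeros of distinct moduli, each q-column converges to one zero, largest first.

```python
def qd_roots(a, iterations=300):
    """Quotient-difference (QD) scheme generated row by row.
    a = [a0, a1, ..., an] are real polynomial coefficients, a0 != 0,
    zeros assumed real with distinct moduli.  Returns the final row
    of q-values, which approximate the zeros in decreasing modulus."""
    n = len(a) - 1
    q = [-a[1] / a[0]] + [0.0] * (n - 1)             # first q row
    e = [0.0] + [a[m + 1] / a[m] for m in range(1, n)] + [0.0]
    for _ in range(iterations):
        # Rhombus rule centered on a q-column: additions.
        q = [q[m] + e[m + 1] - e[m] for m in range(n)]
        # Rhombus rule centered on an e-column: multiplications.
        e = [0.0] + [e[m] * q[m] / q[m - 1] for m in range(1, n)] + [0.0]
    return q

# (x - 8)(x - 4)(x - 2) = x^3 - 14x^2 + 56x - 64
approx = qd_roots([1.0, -14.0, 56.0, -64.0])
```

Convergence is only linear, with ratio governed by the moduli ratios of adjacent zeros, which is why the QD result is often used merely as a starting value for a faster iteration.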

Problems

3.1 Modify the program of Example 3.1 so that Graeffe's root-squaring method is implemented as a subroutine, named GRAEFF, with argument list

    (N, A, ITMAX, TOP, EPS, IPRINT, ROOTR, ROOTI, PVAL, RTRUE)

where the first six arguments are defined in Example 3.1. Develop a criterion for establishing that roots of equal magnitude (real or complex conjugate) may be present, so that equation (3.8) may be applied to find the real and imaginary parts of such roots. Upon return from the subroutine, ROOTR(I) and ROOTI(I) should contain the real and imaginary parts of the Ith root, PVAL(I) should contain the value of the polynomial at the Ith root, and RTRUE(I) should contain .TRUE. or .FALSE. according as the magnitude of PVAL(I) is smaller or not smaller than the value of EPS. There should be N entries in each of the four vectors.
Write a main program that reads values for N, ITMAX, TOP, EPS, IPRINT, and A(1), ..., A(N + 1), as in Example 3.1, calls upon GRAEFF to find estimates of the N roots of the polynomial whose coefficients are in A(1), ..., A(N + 1), prints the values returned by GRAEFF, and returns to read another data set. Test the program with the data used for Example 3.1.

3.2 Starting with a polynomial of degree n, p_n(x) = Σ_{i=0}^{n} a_i x^i, it was shown in Section 1.4 that a factor (x - x_k) can be removed from p_n(x) by synthetic division, leading to

    p_n(x) = (x - x_k) q_{n-1}(x) + R_n,   where   q_{n-1}(x) = Σ_{i=1}^{n} b_i x^{i-1}.

A second synthetic division of q_{n-1}(x) by (x - x_k) yields a quotient q_{n-2}(x) = Σ_{i=2}^{n} c_i x^{i-2}, where the b_i and c_i are generated by the usual synthetic-division recursions. Show that introduction of these relationships into Newton's algorithm of (3.29) leads to an iterative method for finding a root, α = lim_{k→∞} x_k, of p_n(x). This root-finding procedure is known as iterated synthetic division.
If the polynomial p_n(x) has a root near zero, what would be a good guess for x_1? Generalize the algorithm, so that up to n real roots of p_n(x) may be extracted, one at a time.

3.3 Write a function, named PROOT, that implements the iterated synthetic division algorithm described in Problem 3.2. Let the argument list be

    (N, A, X1, EPS, ITMAX, ITER)

where N is the degree of the polynomial p_n(x), whose real coefficients a_0, a_1, ..., a_n are available in A(1), ..., A(N + 1). X1 is the starting point, x_1, for Newton's algorithm, and EPS is a small positive number used in the tests for termination of the iteration:

    |x_{k+1} - x_k| < EPS |x_{k+1}|,  if |x_{k+1}| >= EPS,
    |p_n(x_{k+1})| < EPS,             if |x_{k+1}| < EPS.

ITMAX is the maximum allowable number of iterations, and ITER should, on return, contain the number of iterations actually performed. If neither of the tests is satisfied after ITMAX iterations, computation should cease, and ITER should be assigned the value ITMAX + 1.
Write a main program that reads and prints values for N, A(1), ..., A(N + 1), X1, EPS, and ITMAX, calls upon PROOT to return an estimate of a root as its value, prints the value of PROOT(N, ..., ITER) and ITER, and returns to read another data set. Test the function with several of the polynomials used as test data for Example 3.1, and compare the roots with those found by Graeffe's method.
How would you modify PROOT to allow the extraction of up to n real roots of p_n(x)?

3.4 Show that a straightforward application of the method of successive substitutions (rather than Newton's method) to find a root of p_n(x) by iterated synthetic division (see Problem 3.2) leads to an alternative algorithm (Lin's iteration [29]). Find the asymptotic convergence factor for this iterative scheme, and determine a necessary condition for convergence to a root.

3.5 Since complex roots of a polynomial with real coefficients occur in conjugate pairs (if at all), isolation of quadratic factors from polynomials of degree greater than two is an important approach to the solution of polynomial equations. Once a quadratic factor has been found, two real or complex roots may be found directly with the quadratic formula; no further iteration is required. One suitable approach is to use the method of iterative factorization of Section 3.4. A simpler, though less efficient, iterative procedure for finding quadratic factors can be developed, based on synthetic division by a quadratic factor, x^2 + px + q.
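The iterated-synthetic-division idea of Problems 3.2 and 3.3 carries over directly to a modern language: one Horner pass yields p_n(x_k) as the remainder, and a second pass on the deflated coefficients yields p_n'(x_k), so the Newton step needs no explicit derivative formula. The Python sketch below is our own analogue of PROOT; the function name, termination test, and test polynomial are illustrative assumptions, not from the text.

```python
def proot(a, x1, eps=1.0e-10, itmax=50):
    """Newton's method on p(x) = a[0]*x**n + ... + a[n], with p and p'
    evaluated by two synthetic divisions (Horner's rule).
    Returns (root_estimate, iterations)."""
    x = x1
    for it in range(1, itmax + 1):
        b = a[0]                 # first division: remainder b = p(x)
        c = a[0]                 # second division: remainder c = p'(x)
        for coef in a[1:-1]:
            b = b * x + coef
            c = c * x + b
        b = b * x + a[-1]
        if abs(c) < 1.0e-30:     # horizontal tangent; give up
            break
        x_new = x - b / c        # Newton step
        if abs(x_new - x) < eps * max(abs(x_new), 1.0):
            return x_new, it
        x = x_new
    return x, itmax + 1

root, iters = proot([1.0, -14.0, 56.0, -64.0], x1=10.0)
```

Deflating by the factor (x - root) and repeating on the quotient coefficients extracts further real roots one at a time, as the problem suggests.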

Show that (x^2 + px + q) will be a perfect factor of p_n(x) if and only if the two remainder coefficients of the synthetic division vanish simultaneously, and that application of the method of successive substitutions (see Section 5.8) to solution of these simultaneous equations leads to "improved" estimates p̄ and q̄ of p and q. This procedure is known as Lin's method (see also Problem 3.4).
Use Lin's method to find quadratic factors, starting with p = 0, q = 0.

3.6 Show that application of the Newton-Raphson algorithm (see Section 5.9) to solution of the simultaneous equations of Problem 3.5 leads to a pair of linear equations in the corrections Δp = p̄ - p and Δq = q̄ - q. Introduce the recursion relations for auxiliary coefficients c_i, where the b_i are those of Problem 3.5, and show that the equations developed above can be rewritten in terms of the b_i and c_i. This iterative algorithm for finding ever-improving estimates of p and q is known as Bairstow's method.
Find quadratic factors for the polynomial given in Problem 3.5, starting with p = 0, q = 0, and compare the results with those found using Lin's method.

3.7 Write a program that implements Bernoulli's method of Section 3.3 for finding the dominant root, or two roots of largest moduli, of the nth-degree polynomial p_n(x), where the a_i are real and a_0 ≠ 0. The program should compute the sequences {u_k}, {v_k}, and {t_k} of (3.9) and (3.10). Satisfactory values for the starting sequence are u_i = 0, for 0 <= i <= n - 2, and u_{n-1} = 1. The elements u_n, u_{n+1}, and u_{n+2} should be computed first, followed by the computation of v_k, k = 1, 2, ..., n + 1, and t_k, k = 2, 3, ..., n + 1. The ratio tests of (3.11) and (3.13) should be applied to establish convergence to a root (or roots). In the suggested versions of these tests, ε is a small positive number, and, in each case, i assumes the values n, n - 1, ..., n - m + 1, where m is the number of such ratios which are to be tested in the sequence. If the first of these tests is passed, then the dominant root can be found from (3.11); if one of the latter two is passed, then two roots (possibly multiple, real or complex) can be found as discussed in Section 3.3.
Since convergence may be slow, storage for the sequences {u_k}, {v_k}, and {t_k} should be limited to length n + 3. This can be accomplished if, after each iteration, the three most recently computed values of u_k (u_n, u_{n+1}, and u_{n+2} for the first iteration) are transferred to the locations originally occupied by u_0, u_1, and u_2. This overstoring procedure can be continued indefinitely until one of the ratio tests is passed or until the maximum allowable number of iterations has been exceeded. Also, since the numbers u_k may stray from the floating-point (real) number range, the n most recently computed elements of the sequence {u_k} should be "normalized" after each iteration, by dividing each element by the average magnitude of the n most recently computed elements in the sequence {u_k}.
The program should read, as data, values for n, a_0, a_1, ..., a_n, m, ε, and the maximum number of iterations permitted. Test the program with a variety of polynomials, including some of those used to test the Graeffe's method program of Example 3.1.

3.8 It is stated on page 220 that the eigenvalues λ of an n × n matrix are the roots of its characteristic equation, Σ_{i=0}^{n} a_i λ^i = 0. Write a program that will read values for n, a_0, a_1, ..., a_n, and ε (a tolerance), and proceed to compute (possibly with the aid of one of the functions already developed) and print values for the eigenvalues λ_1, λ_2, ..., λ_n, known to be real, each within ±ε. Test your program with the following characteristic equations, mentioned on pages 220, 224, and 225:

    (a) λ^4 + 20λ^3 - 700λ^2 - 8000λ + 120000 = 0,
    (b) -λ^3 + 42λ^2 - 539λ + 2058 = 0,
    (c) -λ^4 + 16λ^3 - 93λ^2 + 232λ - 209 = 0.

3.9 As a continuation of Problem 3.8, write a program that will handle the possibility of complex eigenvalues. Test your program with the following characteristic equation,

    λ^4 - 16λ^3 + 78λ^2 - 412λ + 624 = 0,

which corresponds to one of the matrices in Example 4.3, and has roots λ = 12, 1 ± 5i, and 2.
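Bairstow's iteration of Problem 3.6 can be prototyped before the analytic recursion relations are derived, by applying a two-variable Newton step to the two remainder coefficients of the quadratic synthetic division, with the Jacobian approximated by finite differences. The Python sketch below is our own illustration: the factored test polynomial, starting values, and step size h are assumptions, and a finished Bairstow program would use the b- and c-recurrences of Problems 3.5 and 3.6 instead of differencing.

```python
def bairstow_fd(a, p, q, tol=1.0e-12, itmax=100, h=1.0e-7):
    """Find (p, q) so that x^2 + p x + q divides the polynomial with
    coefficients a (a[0] != 0), by Newton's method on the remainder
    coefficients (b[n-1], b[n]); the Jacobian is finite-differenced."""

    def remainder(p, q):
        b0, b1 = 0.0, 0.0                        # b[k-2], b[k-1]
        for coef in a:                           # b[k] = a[k] - p*b[k-1] - q*b[k-2]
            b0, b1 = b1, coef - p * b1 - q * b0
        return b0, b1                            # (b[n-1], b[n])

    for _ in range(itmax):
        f1, f2 = remainder(p, q)
        if abs(f1) + abs(f2) < tol:
            break
        # Finite-difference Jacobian of (f1, f2) w.r.t. (p, q).
        g1, g2 = remainder(p + h, q)
        r1, r2 = remainder(p, q + h)
        j11, j21 = (g1 - f1) / h, (g2 - f2) / h
        j12, j22 = (r1 - f1) / h, (r2 - f2) / h
        det = j11 * j22 - j12 * j21
        p -= (f1 * j22 - f2 * j12) / det         # Cramer's rule
        q -= (f2 * j11 - f1 * j21) / det
    return p, q

# (x^2 + 3x + 2)(x^2 + 5x + 7) = x^4 + 8x^3 + 24x^2 + 31x + 14
p, q = bairstow_fd([1.0, 8.0, 24.0, 31.0, 14.0], p=2.8, q=1.8)
```

Once a quadratic factor is isolated, its two roots (real or complex conjugate) follow from the quadratic formula, as noted in Problem 3.5.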
3.10 When using the technique of Laplace transformation for solving problems, we are frequently confronted with the task of splitting the ratio of two polynomials,

    F(s) = p_m(s)/q_n(s),

into partial fractions. The degree n of the polynomial in the denominator exceeds the degree m in the numerator. F(s) is the Laplace transform of the function, f(t) for example, that is being sought. A key step is to find the zeros α_1, α_2, ..., α_n (possibly complex) of q_n(s). If there are no repeated zeros, let

    F(s) = Σ_{i=1}^{n} a_i/(s - α_i).

By multiplying through by s - α_i, letting s approach α_i, and using Taylor's expansion, we obtain

    a_i = p_m(α_i)/q_n'(α_i).

If there is a repeated root, for example, α_1 = α_2 = ... = α_k = β, let the corresponding terms of the expansion be

    b_1/(s - β) + b_2/(s - β)^2 + ... + b_k/(s - β)^k.

The reader should then be able to discover formulas giving the a_i and b_i for this case.
Once in partial fraction form, the transform F(s) may be inverted by referring to tables (Spiegel [21], for example) to give the function f(t) that is the required solution to the problem.
Discuss the possibility of automating the above procedure by computer, that is, of writing a program that will:
(a) accept coefficients for the polynomials p_m(s) and q_n(s),
(b) determine the zeros α_1, α_2, ..., α_n of q_n(s),
(c) decompose F(s) into partial fractions of appropriate form, and
(d) invert F(s) to give f(t), which is then suitably displayed (possibly being plotted against t).
If the scheme seems feasible, write a program to implement it, using polynomials generated in Problems 3.11 and 3.12 as test data.

3.11 At time t = 0, the switch is closed in the circuit of Fig. P3.11, which is intended to act as a band-pass filter for frequencies between roughly 3000 and 6000 cps.

Figure P3.11

Following the method given in Skilling [20], the Laplace transform of the current leaving the generator is given by

    I(s) = V_m ω/{(s^2 + ω^2)[R + 1/(sC_1) + sL_1 + ...]}.

Express I(s) as the ratio of two polynomials, p_m(s)/q_n(s) (see Problem 3.10). Write a program that will find the zeros of q_n(s). Then invert I(s) with the aid of tables ([21], for example) to give the actual current as a function of time.

Suggested Test Data
V_m = 10 volts, R = 100 ohms, C_2 = 0.25 × 10^-6 farads, C_1 = 1 × 10^-6 farads, L_1 = 0.005 henrys, L_2 = 0.001 henrys, with ω = 10^4, 3 × 10^4, and 10^5 sec^-1.

3.12 At time t = 0, the switch in the circuit of Fig. P3.12 is closed. Following the method in Skilling [20], the Laplace transform of the current through the resistor is given by

    I(s) = V/(s{sL_1[1 + sC(sL_2 + R)] + sL_2 + R}).

Figure P3.12

If V = 10 volts, L_1 = 0.02 henrys, L_2 = 0.1 henrys, C = 0.5 × 10^-6 farads, and R = 50 ohms, express I(s) in terms of partial fractions (see Problem 3.10). With the aid of a table of Laplace transforms (Spiegel [21], for example), obtain the current through the resistor as a function of time.

3.13 Consider the algorithm of (3.22), x_{i+1} = F(x_i). Let the discrepancy of the nth iterate, x_n, from the true solution, α, be ε_n, so that we may write x_n = α + ε_n. Assuming that F(x) is suitably differentiable, expand F(x) in a Taylor's series about α and show that, for small ε_n,

    ε_{n+1} ≈ F'(α) ε_n.

3.14 If the order of an iteration (3.22) is defined to be the order of the lowest-order nonzero derivative of F(x) at the solution α, show that for simple roots, Newton's method is a second-order (quadratic) process, while for multiple roots Newton's method is a first-order process.

3.15 (a) Show that an alternative formulation of Newton's method (see Section 3.7 and Fig. 3.3) can be developed by expansion of f(x) in a Taylor's series about x_0, followed by truncation of the series after the first-derivative term.
(b) Show graphically that for f(x) real, Newton's method exhibits monotonic convergence to α (that is, convergence from one side only) if f(x_0)f''(x_0) > 0 and f'(x) and f''(x) do not change sign on (x_0, α).
(c) Show that Newton's method exhibits oscillatory convergence (that is, successive iterates alternate from one side of α to the other) if f(x_0)f''(x_0) < 0 on the interval (x_0, x_1), where x_0 < α < x_1.
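For the simple-pole case of Problem 3.10, the coefficient of 1/(s - α_i) is just p_m(α_i)/q_n'(α_i), so once the zeros of the denominator are known the expansion takes only a few lines of code. The Python sketch below is our own illustration; the example transform 1/[(s + 1)(s + 2)(s + 3)] and all function names are assumptions, not taken from the text.

```python
def polyval(c, x):
    """Evaluate c[0]*x^n + ... + c[n] by Horner's rule."""
    v = 0.0
    for coef in c:
        v = v * x + coef
    return v

def polyder(c):
    """Coefficients of the derivative of c[0]*x^n + ... + c[n]."""
    n = len(c) - 1
    return [coef * (n - i) for i, coef in enumerate(c[:-1])]

def residues(p, q, zeros):
    """Simple-pole partial fractions: p(s)/q(s) = sum a_i/(s - z_i),
    with a_i = p(z_i)/q'(z_i)."""
    dq = polyder(q)
    return [polyval(p, z) / polyval(dq, z) for z in zeros]

# F(s) = 1/[(s+1)(s+2)(s+3)] = 1/(s^3 + 6s^2 + 11s + 6)
a = residues([1.0], [1.0, 6.0, 11.0, 6.0], [-1.0, -2.0, -3.0])
```

With the residues in hand, table lookup inverts each term: here f(t) = 0.5e^{-t} - e^{-2t} + 0.5e^{-3t}.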

3.16 Based upon the notions of inverse interpolation (see Problems 1.14 and 1.15) and of iterated linear interpolation (see Problems 1.11, 1.12, and 1.13), develop a root-finding method for a function f(x), single-valued on an interval [x_0, x_n] known to contain a zero. Use your method to find the root of the function f(x) = x^3 - 3x + 1 with x_0 = 0, x_1 = 0.1, ..., x_5 = 0.5. Compare your results for interpolation of degrees 1 through 5 with the true solution 0.347296 (see Section 3.5).
Suppose that f(x) can be calculated for any x. How would you modify the method just developed to achieve greater accuracy with improved computational efficiency? What conditions must be imposed on f(x) in the interval of interest to insure convergence to a root?

3.17 This problem does not involve numerical methods directly; however, it establishes two functions that will be needed in Problems 3.18, 5.27, 5.30, and 6.34.
Consider two infinitely long surfaces, 1 and 2, that are generated by moving the curves AB and CD in Fig. P3.17a normal to the plane of the paper. Suppose we wish to evaluate F_12, the fraction of thermal radiation emitted by surface 1 that is directly intercepted by surface 2. In the string method (see Hottel and Sarofim [23], for example), four threads are stretched tightly between A and D, B and C, A and C, and B and D. The geometric view factor F_12 (see Problem 2.41) is then given in terms of the lengths of the threads by

    F_12 = [(sum of lengths of the crossed threads) - (sum of lengths of the uncrossed threads)]/2L_1,

where L_1 is the distance between A and B along the surface AB.

Figure P3.17a

Two infinitely long parallel cylinders of diameter d have their axes a distance w apart, as shown in Fig. P3.17b. (The points A, B, C, and D between which the threads may be stretched are shown in Fig. P3.17b.) Show by the string method the corresponding expression for F_12 in terms of d and w.

Figure P3.17b

Write a function named CYLCYL that will compute F_12. Anticipate that a typical reference will be

    F12 = CYLCYL (D, W)

Also write a function named CYLPLN that will compute F_12 for the infinitely long cylinder and plane shown in Fig. P3.17c. Anticipate that a typical reference will be

    F12 = CYLPLN (D, H, L, W)

where the arguments have obvious counterparts in Fig. P3.17c.

Figure P3.17c

3.18 (a) An experiment is to be performed in which the two cylinders of Problem 3.17 are to be spaced so that F_12 has a specified value θ. Write a program that will use the function CYLCYL to compute the appropriate value of w/d, if such exists, for θ = 0.5, 0.2, 0.1, 0.05, 0.02, and 0.01.
(b) The cylinder and plane of Problem 3.17 are situated with h = d and l = w/2. Write a program that will use the function CYLPLN to compute those values of w/d, if any, for which F_12 equals 0.01, 0.02, 0.05, 0.1, 0.2, 0.49, and 0.5.

3.19 Consider the following approach to the problem of finding a solution to the equation f(x) = 0. Start with three points, (x_0, f(x_0)), (x_1, f(x_1)), and (x_2, f(x_2)). Find the second-degree interpolating polynomial p_2(x) (use Lagrange's form (1.43)) passing through the three points. Now, solve for the two roots of p_2(x) using the quadratic formula. Let the two roots be r_1 and r_2, and evaluate the corresponding functional values, f(r_1) and f(r_2). Let x_3 be r_1 or r_2, depending on which of |f(r_1)| or |f(r_2)| is smaller. Next, fit the points (x_1, f(x_1)), (x_2, f(x_2)), and (x_3, f(x_3)) with a second-degree interpolating polynomial, solve for the roots, choose x_4 to be that root which yields the smaller magnitude for the functional value, etc. Continue this process of fitting successive second-degree polynomials until, if convergence occurs, the functional value assumes some arbitrarily small value. Note that this procedure, known as Muller's method [26], allows the isolation of complex as well as real roots of f(x).
Implement Muller's method as a function, named MULLER, with argument list

    (X, F, EPS, ITMAX, ITER)

where X is a vector of length three containing x_1, x_2, and x_3 in X(1), X(2), and X(3), respectively, F is another function that computes f(x) when required, EPS is a small positive number used in the convergence test for terminating the iteration, ITMAX is a maximum allowable number of iterations, should convergence not take place, and ITER is the number of iterations actually performed (set to ITMAX + 1 for failure to converge). The final estimate of the root should be returned as the value of MULLER. Assume that all the roots of f(x) are real; if the algorithm yields complex roots at some stage, the function should set ITER to zero and return to the calling program. Test the program with some trial functions whose roots are known.
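The fitting-and-root-choosing cycle of Problem 3.19 can be sketched directly from the description. The Python fragment below is our own illustration of Muller's method restricted to real arithmetic; the test function and starting points are assumptions. It writes the interpolating parabola in Newton form about the newest point and keeps whichever quadratic root gives the smaller |f|.

```python
import math

def muller_real(f, x0, x1, x2, eps=1.0e-12, itmax=100):
    """Muller's method in real arithmetic.  Fits a parabola through the
    three newest points, moves to the parabola root with the smaller
    |f|, and stops when |f| < eps.  Returns None if the parabola has
    no real roots (the case for which Problem 3.19 sets ITER = 0)."""
    for _ in range(itmax):
        f0, f1, f2 = f(x0), f(x1), f(x2)
        if abs(f2) < eps:
            return x2
        d01 = (f1 - f0) / (x1 - x0)
        d12 = (f2 - f1) / (x2 - x1)
        a = (d12 - d01) / (x2 - x0)          # second divided difference
        b = d12 + a * (x2 - x1)              # slope of parabola at x2
        c = f2
        if abs(a) < 1.0e-30:                 # degenerate: secant step
            roots = [x2 - c / b]
        else:
            disc = b * b - 4.0 * a * c
            if disc < 0.0:
                return None                  # complex roots: give up here
            s = math.sqrt(disc)
            roots = [x2 + (-b + s) / (2.0 * a), x2 + (-b - s) / (2.0 * a)]
        x3 = min(roots, key=lambda r: abs(f(r)))
        x0, x1, x2 = x1, x2, x3
    return x2

root = muller_real(lambda x: x**3 - 3.0*x + 1.0, 0.0, 0.2, 0.4)
```

A production version would switch to complex arithmetic instead of giving up when the discriminant is negative, which is what lets Muller's method isolate complex roots.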

How would you modify the function so that it could isolate complex conjugate roots of a polynomial with real coefficients?

3.20 The 6J5 triode shown in Fig. P3.20 is used for amplifying an a-c input signal of small amplitude. The amplifier gain A (ratio of rms output a-c voltage developed across the load resistor R to the rms input voltage) is

    A = g_m r_p R/(r_p + R),

where r_p is the dynamic plate resistance and g_m is the transconductance (see Problem 2.48). If the high-tension supply voltage is constant at v_s, we have the following additional relation involving the anode voltage v_a and the anode current i_a as unknowns:

    R i_a = v_s - v_a.

Figure P3.20

If v_s, v_g, and R are specified, devise a procedure for computing v_a and the amplifier gain A. Write a program to implement the method, making use of the functions already developed in Problems 1.42 and 2.48 for the 6J5 triode. Suggested test data (v_s and v_g in V, R in kΩ): v_s = 300, v_g = -6.5, R = 10; v_s = 300, v_g = -16.2, R = 22; v_s = 300, v_g = -2.3, R = 22.

3.21 The first-order irreversible chemical reaction A → B has a reaction rate constant k hr^-1 at a temperature T. A stirred batch reactor is charged with V cu ft of reactant solution of initial concentration a_0 lb moles/cu ft and is operated isothermally at the temperature T for t_0 hours. The reaction products are then removed for product separation, and the reactor vessel is cleaned for subsequent reloading with fresh reactant.
The reactor is operated cyclically; that is, the process of loading fresh reactant, allowing the reaction to proceed, dumping the product, and cleaning the reactor is repeated indefinitely. If the down time between reaction cycles is t_c hours, show that the reaction time t_0 required to maximize the overall yield of product B per hour is given by the solution of the equation

    e^{k t_0} = 1 + k(t_0 + t_c).

Solve this equation for the following test data:

    V (cu ft)             10    10    10    10
    a_0 (lb moles/cu ft)  0.1   0.1   0.1   0.1
    k (hr^-1)             2.5   2.5   1.0   1.0
    t_c (hr)              0.5   1.0   1.0   2.0

3.22 The isothermal irreversible second-order constant-volume reaction A + B → C + D has a velocity constant k. A volumetric flow rate v of a solution containing equal inlet concentrations a_0 each of A and B is fed to two CSTRs (continuous stirred-tank reactors) in series, each of volume V. Denoting the exit concentrations of A from the first and second CSTRs by a_1 and a_2, respectively, rate balances give:

    v(a_0 - a_1) = k a_1^2 V,
    v(a_1 - a_2) = k a_2^2 V.

If k = 0.075 liter/g mole min, v = 30 liter/min, a_0 = 1.6 g moles/liter, and the final conversion is 80% (that is, a_2/a_0 = 0.2), determine the necessary volume V (liter) of each reactor. For an extension to n CSTRs in series, see Problem 5.31.

3.23 Assume that three successive iterates, x_k, x_{k+1}, and x_{k+2}, have been obtained using (3.22), x_{j+1} = F(x_j), and that α is the required solution of f(x) = 0. Using the mean-value theorem (see Section 1.6), show that

    x_{k+1} - α = F'(ξ_1)(x_k - α),
    x_{k+2} - α = F'(ξ_2)(x_{k+1} - α),

where ξ_1 is in (x_k, α) and ξ_2 is in (x_{k+1}, α).
If ξ_1 = ξ_2, show that the solution α is given by

    α = x_{k+2} - (Δx_{k+1})^2/Δ^2 x_k,

where Δx_{k+1} = x_{k+2} - x_{k+1} and Δ^2 x_k = x_{k+2} - 2x_{k+1} + x_k.

3.24 In general, ξ_1 and ξ_2 of Problem 3.23 will not be equal. However, if x_k, x_{k+1}, and x_{k+2} are near α, then ξ_1 ≈ ξ_2, so that the next iterate may be taken as

    x_{k+3} = x_{k+2} - (Δx_{k+1})^2/Δ^2 x_k.

If x_{j+1} = F(x_j) is a first-order process (see Problem 3.14), this extrapolation technique usually accelerates convergence to α, that is, |x_{k+3} - α| < |x̄_{k+3} - α|, where x̄_{k+3} = F(x_{k+2}).
The iterative process that employs the sequence of calculations

    x_2 = F(x_1),
    x_3 = F(x_2),
    x_4 = x_3 - (Δx_2)^2/Δ^2 x_1,
    x_5 = F(x_4),
    x_6 = F(x_5),
    x_7 = x_6 - (Δx_5)^2/Δ^2 x_4,
    x_8 = F(x_7),
    etc.,

is known as Aitken's Δ^2 process.
Use Aitken's Δ^2 process to find the three roots of f(x) = x^3 - 3x + 1 = 0, with appropriate rearrangements x = F(x) and starting values. In each case, compare the sequence of iterates with the results of Table 3.5.
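The cycle in Problem 3.24 alternates two substitutions with one extrapolation. The Python sketch below is our own; the fixed-point form F(x) = (x^3 + 1)/3 for f(x) = x^3 - 3x + 1 and the starting value 0.5 are assumptions chosen so that the plain iteration converges to the root near 0.347, letting the Δ^2 acceleration be seen clearly.

```python
def aitken(F, x1, tol=1.0e-12, itmax=100):
    """Aitken's delta-squared process: two substitutions x2 = F(x1),
    x3 = F(x2), then the extrapolation x4 = x3 - (dx2)^2 / d2x1,
    with x4 taken as the new starting value."""
    x = x1
    for _ in range(itmax):
        x2 = F(x)
        x3 = F(x2)
        denom = x3 - 2.0 * x2 + x
        if abs(denom) < 1.0e-30:          # already converged
            return x3
        x_new = x3 - (x3 - x2) ** 2 / denom
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# f(x) = x^3 - 3x + 1 = 0 rewritten as x = F(x) = (x^3 + 1)/3
root = aitken(lambda x: (x**3 + 1.0) / 3.0, 0.5)
```

Since F'(α) ≈ 0.12 at this root, the plain substitution is first order; the extrapolated sequence reaches twelve-figure accuracy in a handful of cycles.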

3.25 Devise a procedure, based on one of the standard equation-solving techniques, for evaluating x^{1/n}. Assume that x is a real positive number, that n is a positive integer, and that the real positive nth root is required.

3.26 Devise a procedure for evaluating the n complex roots, x^{1/n}, of a real positive number x. Assume that n is a positive integer.

3.27 Determine the first ten positive roots of the equation that is important in the theory of vibrating reeds (see Wylie [18], for example). These roots may be compared with similar values found in Problem 4.32.

3.28 Compute to at least six significant figures the first 50 positive roots of the equation β tan β = h, for h = 0.1, 1, and 10. These roots will be needed later in Problem 7.24.

3.29 Write a subroutine, named WARD, that implements Ward's method (see Section 3.6) for finding a root, α = β + iγ, of a function f(z) of the complex variable z = x + iy, where f(z) = u(x, y) + iv(x, y). Let a typical call on the subroutine be

    CALL WARD (X, Y, FR, FI, HX, HY, EPSHX, EPSHY, ITMAX, W)

Here, FR and FI are functions of two variables, X and Y, that return the real part of f(z), that is, u(X, Y), and the imaginary part of f(z), that is, v(X, Y), respectively, when needed. Upon entering the routine, the first estimates of β and γ should be available in X and Y. The searching strategy should employ HX and HY as the initial step sizes in the x and y directions, respectively. The searching step sizes should be halved when appropriate (without modifying the values of HX and HY in the calling program) until the step size in the x direction is smaller than EPSHX and the step size in the y direction is smaller than EPSHY, or until the total number of calls upon FR(X, Y) and FI(X, Y) exceeds ITMAX. In either event, upon return, X and Y should contain the final estimates of β and γ, respectively; W should be assigned the value of w(x, y) for the final iteration, where w(x, y) = |u(x, y)| + |v(x, y)|.
Write a short calling program, and appropriate functions FR and FI, to find a root of the function treated in Tables 3.6 and 3.7, with various starting points, including z = 1 + i. Compare your results with those of Tables 3.6 and 3.7.

3.30 Van der Waals's equation of state for an imperfect gas is

    (P + a/v^2)(v - b) = RT,

where P is the pressure (atm), v is the molal volume (liters/mole), T is the absolute temperature (°K), R is the gas constant (0.082054 liter atm/mole °K), and a and b are constants that depend on the particular gas.
Write a program that will read values for P, T, a, and b as data, and that will compute the corresponding value(s) of v that satisfies the van der Waals equation. The test values in Table P3.30 for a (liter^2 atm/mole^2) and b (liter/mole) are given by Keller [27].

Table P3.30

    Gas                a         b
    Carbon dioxide     3.592     0.04267
    Dimethylaniline    37.49     0.1970
    Helium             0.03412   0.02370
    Nitric oxide       1.340     0.02789

Note: If T >= T_c, where T_c = 8a/27Rb is the critical temperature (above which a gas cannot be liquefied), there will be only one real root for v; otherwise, there will be either one or three real roots, depending on the particular values of P and T. If there are three roots, v_1 < v_2 < v_3, then v_1 and v_3 correspond to the liquid and vapor molal volumes, respectively, and v_2 has no physical significance. In any event, investigate the results for a wide range of values for P and T.

3.31 In a hydraulic jump, a liquid stream of depth D_1, flowing with velocity u_1, suddenly increases its depth to D_2, with a corresponding reduction in velocity. On the basis of mass and momentum balances, it can be shown that

    u_1^2 = g D_2 (D_1 + D_2)/(2 D_1),

in which g is the gravitational acceleration. A hydraulic jump is possible only if u_1 > sqrt(g D_1).
If values of u_1 and D_2 are known, devise one or more schemes for determining: (a) if a hydraulic jump is possible, and (b), if so, the corresponding value of D_1. Write a program to implement the method; the input data will consist of pairs of values for u_1 and D_2.

3.32 Write a function, named CPOLY, with argument list (N, X), that evaluates the nth-degree polynomial C_n(x) (see Problem 2.27) at argument x = X, where n = N may be any of 2, 3, 4, 5, 6, 7, 9. The coefficients of the polynomials should be preset in a suitable arrangement using a DATA (or equivalent) statement. The value of C_n(x) should be returned as the value of CPOLY.
Next, write a main program that uses the half-interval method (see Fig. 3.5) to find the roots of each of the seven polynomials evaluated by CPOLY (see Problem 2.28 for a further use for the roots). The program should read a value for DELTA, an error tolerance for the roots, equivalent to Δ in (3.33), then calculate the n roots of C_n(x), for n = 2, 3, 4, 5, 6, 7, 9, and print the results in tabular form.
Calculational effort can be reduced considerably by noting the following:
(a) All n roots of C_n(x) lie on (-1, 1).
(b) For n even, C_n(x) is an even function of x. Hence, the n roots occur in n/2 root-pairs, symmetrically arranged about the origin. Attention can be confined to (-1, 0) or (0, 1).
(c) For n odd, C_n(x) is an odd function of x. Hence, there is a root at the origin, and the remaining n - 1 roots occur in (n - 1)/2 root-pairs, symmetrically arranged about the origin. Again, attention can be confined to (-1, 0) or (0, 1).
(d) No two roots for any of the polynomials are more closely spaced than 0.05.
A strategy similar to that used in Example 3.4 may be employed for solving the problem. Note that, because the above comments also hold for the Legendre polynomials, the program of Example 3.4 could be modified to halve approximately the number of calculations required to isolate the roots.
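A molal volume satisfying van der Waals's equation (Problem 3.30) is a natural one-unknown Newton problem, with the ideal-gas volume RT/P as the obvious starting guess. The sketch below is our own; the carbon dioxide conditions (350 K, 10 atm) are illustrative test values using the a and b of Table P3.30, not conditions prescribed by the text.

```python
def vdw_volume(P, T, a, b, R=0.082054, tol=1.0e-10, itmax=50):
    """Newton's method for the molal volume v (liters/mole) in
    (P + a/v^2)(v - b) = R*T, started from the ideal-gas value."""
    v = R * T / P                         # ideal-gas starting guess
    for _ in range(itmax):
        f = (P + a / v**2) * (v - b) - R * T
        # d/dv [(P + a/v^2)(v - b)] = (P + a/v^2) - 2a(v - b)/v^3
        df = (P + a / v**2) - 2.0 * a * (v - b) / v**3
        v_new = v - f / df
        if abs(v_new - v) < tol * v_new:
            return v_new
        v = v_new
    return v

# Carbon dioxide (a = 3.592, b = 0.04267) at 350 K and 10 atm:
v = vdw_volume(10.0, 350.0, 3.592, 0.04267)
```

Below the critical temperature, where three real roots exist, the single Newton root found depends on the starting guess; the ideal-gas start favors the vapor root v_3, and a start near b would be needed to pick up the liquid root v_1.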
3.33 Devise two or more practical schemes for evaluating,
within a prescribed tolerance E , all the roots of the nth- Vapor
degree Laguerre polynomial. Write a program that imple- --
ments one of these methods. The input data should consist of --
- Liquid 1
values for n and E . Your program should automatically
generate the necessary coefficients of the appropriate Laguerre
polynomial, according to equation (2.69). Check your compu-
ted values with those in Table 2.4. Figure P3.36
The problem may be repeated for the Hermite polynomials,
Here, Kt is the equilibrium constant for the ith component at
summarized in equation (2.75). The computed roots may be
the prevailing temperature and pressure in the tank. From
checked against those given in Table 2.6.
3.34 For the isentropic flow of a perfect gas from a reservoir through a converging-diverging nozzle, operating with sonic velocity at the throat, it may be shown that

[equation illegible in this copy]

Here, P is the pressure at a point where the cross-sectional area of the nozzle is A, P1 is the reservoir pressure, At is the throat area, and γ is the ratio of the specific heat at constant pressure to that at constant volume.
If At, γ, P1, and A (>At) are known, devise a scheme for computing the two possible pressures P that satisfy the above equation. Implement your method on the computer.

Suggested Test Data
At = 0.1 sq ft, γ = 1.41, P1 = 100 psia, and A = 0.12 sq ft.

3.35 A spherical pocket of high-pressure gas, initially of radius r0 and pressure p0, expands radially outwards in an adiabatic submarine explosion. For the special case of a gas with γ = 4/3 (ratio of specific heat at constant pressure to that at constant volume), the radius r at any subsequent time t is given by [22]:

[equation illegible in this copy]

in which α = (r/r0) − 1, ρ is the density of water, and consistent units are assumed. During the adiabatic expansion, the gas pressure is given by p/p0 = (r0/r)^(3γ).
Develop a procedure for computing the pressure and radius of the gas at any time.

Suggested Test Data (Not in Consistent Units)
p0 = 10^4 lbf/sq in., ρ = 64 lbm/cu ft, r0 = 1 ft, t = 0.5, 1, 2, 3, 5, and 10 milliseconds.

3.36 F moles/hr of an n-component natural gas stream are introduced as feed to the flash-vaporization tank shown in Fig. P3.36. The resulting vapor and liquid streams are withdrawn at the rates of V and L moles/hr, respectively. The mole fractions of the components in the feed, vapor, and liquid streams are designated by zi, yi, and xi, respectively (i = 1, 2, ..., n).
Assuming vapor/liquid equilibrium and steady-state operation, we have:

Overall balance:        F = L + V,
Individual balances:    zi F = xi L + yi V,   i = 1, 2, ..., n,
Equilibrium relations:  Ki = yi/xi.

Using these equations and the fact that Σ xi = Σ yi = 1, show that

[equation illegible in this copy]

Write a program that reads values for F, the zi, and the Ki as data, and then uses Newton's method to solve the last equation above for V. The program should also compute the values of L, the xi, and the yi by using the first three equations given above.
The test data [16], shown in Table P3.36, relate to the flashing of a natural gas stream at 1600 psia and 120°F. Assume that F = 1000 moles/hr. A small tolerance ε and an upper limit on the number of iterations should also be read as data. What would be a good value V1 for starting the iteration?

Table P3.36
Component        zi    Ki
Carbon dioxide
Methane
Ethane
Propane
Isobutane
n-Butane
Pentanes
Hexanes
Heptanes+
(numerical entries illegible in this copy)

3.37 For turbulent flow of a fluid in a smooth pipe, the following relation exists between the friction factor cf and the Reynolds number Re (see, for example, Kay [17]):

[equation illegible in this copy]

Compute cf for Re = 10^4, 10^5, and 10^6.

3.38 For steady flow of an incompressible fluid in a rough pipe of length L ft and inside diameter D in., the pressure drop ΔP lbf/sq in. is given by

[equation illegible in this copy]

where ρ is the fluid density (lbm/cu ft), um is the mean fluid velocity (ft/sec), and fM is the Moody friction factor (dimensionless). For the indicated units, the conversion factor gc
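Problem 3.36's working equation is illegible in this copy, but eliminating L via L = F − V puts it in the standard Rachford-Rice form f(V) = Σ zi(Ki − 1)/[1 + (V/F)(Ki − 1)] = 0. A minimal Python sketch of the Newton iteration the problem asks for follows; the zi and Ki values are hypothetical stand-ins, since the entries of Table P3.36 are also illegible here.

```python
# Sketch of the Newton iteration Problem 3.36 calls for.  The flash equation is
# written in the standard Rachford-Rice form; the z and K values below are
# illustrative stand-ins, NOT the missing Table P3.36 data.
F = 1000.0                      # feed rate, moles/hr
z = [0.25, 0.40, 0.35]          # feed mole fractions (hypothetical)
K = [3.0, 1.2, 0.4]             # equilibrium constants (hypothetical)

def f(V):
    # f(V) = sum_i z_i (K_i - 1) / (1 + (V/F)(K_i - 1)); its root is the vapor rate
    return sum(zi * (Ki - 1.0) / (1.0 + (V / F) * (Ki - 1.0)) for zi, Ki in zip(z, K))

def fprime(V):
    return sum(-zi * (Ki - 1.0) ** 2 / (F * (1.0 + (V / F) * (Ki - 1.0)) ** 2)
               for zi, Ki in zip(z, K))

V = 0.5 * F                     # F/2 is a reasonable starting value V1
for _ in range(50):
    step = f(V) / fprime(V)
    V -= step
    if abs(step) < 1.0e-9:
        break

L = F - V                                            # overall balance
x = [zi * F / (L + Ki * V) for zi, Ki in zip(z, K)]  # liquid mole fractions
y = [Ki * xi for Ki, xi in zip(K, x)]                # vapor mole fractions
print(V, L, sum(x), sum(y))     # both sums equal 1 at the converged root
```

Since f(V) is monotone decreasing in V, starting from V1 = F/2 is safe whenever a root exists in (0, F), i.e., whenever f(0) > 0 and f(F) < 0.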
Problems 205
equals 32.2 lbm ft/lbf sec². The friction factor fM is a function of the pipe roughness ε (in.) and the Reynolds number,

[equation illegible in this copy]

where μ is the fluid viscosity in lbm/ft sec. For Re < 2000,

[equation illegible in this copy]

while for Re > 2000, fM is given by the solution of the Colebrook equation,

[equation illegible in this copy]

A good starting guess for the iterative solution of this equation may be found from the Blasius equation,

fM = 0.316 Re^(-0.25),

appropriate for turbulent flow in smooth (ε = 0) pipes.
Write a function, named PDROP, that could be called with the statement

DELTAP = PDROP (Q, D, L, RHO, MU, E)

where the value of PDROP is the pressure drop in lbf/sq in. for a flow rate of Q gal/min of a fluid with density RHO and viscosity MU through a pipe of length L, inside diameter D, and roughness E. Note that MU and L must be of type REAL.
To test PDROP, write a program that reads values for Q, D, L, RHO, MU, and E, calls upon PDROP to compute the pressure drop, prints the data and result, and returns to read another data set.

Suggested Test Data

Q (gal/min)       170      4
D (in.)           3.068    0.622
L (ft)            10000    100
RHO (lbm/cu ft)   62.4     80.2
MU (lbm/ft sec)   0.0007   0.05
E (in.)           0.002    0.0005

3.39 As shown in Fig. P3.39, a centrifugal pump is used to transfer a liquid from one tank to another, with both tanks at the same level.

Figure P3.39

The pump raises the pressure of the liquid from p1 (atmospheric pressure) to p2, but this pressure is gradually lost because of friction inside the long pipe and, at the exit, p3 is back down to atmospheric pressure.
The pressure rise in psig across the pump is given approximately by the empirical relation

p2 − p1 = a − b Q^1.5,

where a and b are constants that depend on the particular pump being used, and Q is the flow rate in gpm. Also, from equations (5.4.2) and (5.4.3), the pressure drop in a horizontal pipe of length L feet and internal diameter D in. is given by

[equation illegible in this copy]

where ρ is the density (lbm/cu ft) of the liquid being pumped. For the present purposes, the Moody friction factor fM is treated as a constant although, as discussed in Problem 3.38, it really depends on the pipe roughness and on the Reynolds number in the pipe.
Write a program whose input will include values for a, b, ρ, L, D, fM, ε (a tolerance used in convergence testing), and n (maximum number of iterations), and that will proceed to compute the flow rate Q. The program output should consist of printed values for the specified input data, the solution Q, the intermediate pressure p2, and the actual number of iterations used. If the method fails to converge, a message should be printed to that effect. Use the two sets of test data shown in Table P3.39. Select values of ε and n that seem appropriate.

Table P3.39
                           Set 1    Set 2
D, in.                     1.049    2.469
L, ft                      50.0     210.6
ρ, lbm/cu ft (kerosene)    51.4     --
             (water)       --       62.4
fM, dimensionless          0.032    0.026
a, psi                     16.7     38.5
b, psi/(gpm)^1.5           0.052    0.0296

3.40 Rework Problem 3.39, now allowing for a variation of the Moody friction factor fM with pipe roughness and Reynolds number, as in Problem 3.38. Use the same two sets of test data, and assume in both cases that the pipe roughness is 0.0005 ft, corresponding to galvanized iron pipe. Appropriate viscosities (at 68°F, in centipoise) are μ = 2.46 (kerosene) and 1.005 (water). For each data set, the program should print values for fM and the Reynolds number.

3.41 A semi-infinite medium is at a uniform initial temperature T0. For t > 0, a constant heat flux density q is maintained into the medium at the surface x = 0. If the thermal conductivity and thermal diffusivity of the medium are k and α, respectively, it can be shown that the resulting temperature T = T(x,t) is given by

[equation illegible in this copy]

If all other values are given, devise a scheme for finding the time t taken for the temperature at a distance x to reach a preassigned value T*. Implement your method on the computer.

Suggested Test Data
T0 = 70°F, q = 300 BTU/hr sq ft, k = 1.0 BTU/hr ft °F, α = 0.04 sq ft/hr, x = 1.0 ft, and T* = 120°F.
206 Solution of Equations

3.42 A bare vertical wall of a combustion chamber containing hot gases is exposed to the surrounding air. Heat is lost at a rate q BTU/hr sq ft by conduction through the wall and by subsequent radiation and convection to the surroundings, assumed to behave as a black-body radiator. Let Tg, Tw, and Ta denote the temperatures of the gases, the exposed surface of the wall, and the air, respectively. If σ is the Stefan-Boltzmann constant, 0.1714 × 10^-8 BTU/hr sq ft °R⁴, and ε, t, and k denote the emissivity, thickness, and thermal conductivity, respectively, of the wall, we have

[equations illegible in this copy]

The extra subscript R emphasizes that the absolute or Rankine temperature must be used in the radiation term (°R = °F + 460). The convection heat transfer coefficient h, BTU/hr sq ft °F, is given by the correlation h = 0.21(Tw − Ta)^(1/3), suggested by Rohsenow and Choi [24].
Assuming that Tg, Ta, ε, k, and t are specified, rearrange the above relations to give a single equation in the outside wall temperature Tw. Compute Tw for the following test data: Ta = 100°F, t = 0.0625 ft, with (a) Tg = 2100°F, k = 1.8 (fused alumina) BTU/hr ft °F, ε = 0.39; (b) Tg = 1100, k = 25.9 (steel), ε = 0.14 (freshly polished) and ε = 0.79 (oxidized). In each case, also compute q and the relative importance of radiation and convection as mechanisms for transferring heat from the hot wall to the air.

3.43 The rate q dλ at which radiant energy leaves unit area of the surface of a black body within the wavelength range λ to λ + dλ is given by Planck's law:

q dλ = 2πhc² dλ / [λ⁵ (e^(hc/kTλ) − 1)]   ergs/cm² sec,

where
c = speed of light, 2.997925 × 10^10 cm/sec,
h = Planck's constant, 6.6256 × 10^-27 erg sec,
k = Boltzmann's constant, 1.38054 × 10^-16 erg/°K,
T = absolute temperature, °K,
λ = wavelength, cm.

For a given surface temperature T, devise a scheme for determining the wavelength λmax for which the radiant energy is the most intense, that is, λ corresponding to dq/dλ = 0. Write a program that implements the scheme. The input data should consist of values for T, such as 1000, 2000, 3000, 4000°K; the output should consist of printed values for λmax and the corresponding value of q. Verify that Wien's displacement law, λmax T = constant, is obeyed.

3.44 Write a function, named INVERF, that computes the inverse error function, x, where the error function is given by

erf x = (2/√π) ∫ from 0 to x of e^(-t²) dt.

The function should have arguments (ERFX, TOL), where ERFX is the specified value of the error function (0 ≤ ERFX < 1) for which the inverse error function, x, is desired, and TOL is the maximum error allowed in the calculated estimate of x. The final estimated value of the inverse error function should be returned as the value of INVERF (real).
The regula falsi method of (3.34) should be used to find the x corresponding to ERFX, and a composite Gauss-Legendre quadrature (see Problem 2.21) should be used to evaluate the error function when needed. Since, to ten significant figures, erf x = 1.000000000 for x ≥ 10, xL1 = 0 and xR1 = 10 are satisfactory starting values for (3.34).
Write a short main program to test INVERF. Suggested values for ERFX and the corresponding tabulated values for x are:

ERFX            x
0.0000000000    0.00
0.1124629161    0.10
0.7111556337    0.75
0.8427007929    1.00
0.9661051465    1.50
0.9953222650    2.00
1.0000000000    10.00

3.45 Repeat Problem 2.15, with the following variation. Assume that the exchanger length L is a known quantity, to be included in the data, and that the resulting exit temperature T2 is to be computed by the program. The problem is now to find the root of

[equation illegible in this copy]

for which the regula falsi, half-interval, or false-position method could be used.
An alternative method for finding T2 is discussed in Problem 6.19.

3.46 A vertical mast of length L has Young's modulus E and a weight w per unit length; its second moment of area is I. Timoshenko [25] shows that the mast will just begin to buckle under its own weight when β = 4wL³/9EI is the smallest root of

[series equation illegible in this copy]

The first coefficient is c1 = −3/8, and the subsequent ones are given by the recursion relation

[equation illegible in this copy]

Determine the appropriate value of β.

3.47 The stream function

[equation illegible in this copy]

represents inviscid fluid flow past a circular cylinder of radius a with two effects superimposed [22]: (a) a stream whose velocity is uniformly U in the negative x direction far away from the cylinder, and (b) an anticlockwise circulation of strength K round the cylinder. Here, r = (x² + y²)^(1/2) is the radial distance from the center of the cylinder, whose cross section is shown in Fig. P3.47. Note that the streamline ψ = 0 includes the surface of the cylinder.
Write a program that will produce graphical output (in the style of Examples 6.3, 6.5, 7.1 and 8.5) showing points lying on selected streamlines within the dimensionless interval
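A sketch of the INVERF logic of Problem 3.44 in Python. The false-position (regula falsi) update of (3.34) is applied on the bracket [0, 10]; Python's built-in math.erf stands in for the composite Gauss-Legendre quadrature the problem asks you to supply, and the tolerance here is applied to the residual erf(x) − ERFX rather than to x itself, a simplification of the problem's TOL.

```python
import math

# Sketch of INVERF: false-position iteration on erf(x) - erfx = 0.
# math.erf replaces the Gauss-Legendre quadrature requested by the problem.

def inverf(erfx, tol=1.0e-10, max_iter=5000):
    """Return x such that erf(x) = erfx, for 0 <= erfx < 1."""
    xl, xr = 0.0, 10.0                  # erf(10) = 1 to ten significant figures
    fl = math.erf(xl) - erfx
    fr = math.erf(xr) - erfx
    x = xl
    for _ in range(max_iter):
        x = xl - fl * (xr - xl) / (fr - fl)   # false-position estimate
        fx = math.erf(x) - erfx
        if abs(fx) < tol:
            break
        if fl * fx < 0.0:               # root lies in [xl, x]
            xr, fr = x, fx
        else:                           # root lies in [x, xr]
            xl, fl = x, fx
    return x

for target in (0.1124629161, 0.7111556337, 0.8427007929):
    print(inverf(target))               # tabulated x: 0.10, 0.75, 1.00
```

Pure regula falsi stagnates one endpoint and converges only linearly here, which is why the generous iteration limit is used; the Illinois modification would remove that slowness.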
x/a = ±5. For a given ψ, increment x/a in steps of Δ between these limits, and use the above equation to compute the corresponding dimensionless ordinates y/a.

Suggested Test Data
Δ = 0.5, with ψ/aU (dimensionless) varying from −2.5 to 2.5 in ten equal steps, for K/aU = 1, 2, and 3 in turn.

This problem could be extended to compute the streamlines for flow past an aerofoil [22].

3.48 The analytical solution of Problem 7.31 is given on page 285 of Carslaw and Jaeger [28]. In slightly rearranged form, we have

[equations involving erf and erfc illegible in this copy]

where η is the solution of

[equation illegible in this copy]

See Problem 7.31 for the definitions and units of all quantities.
In this connection, write a program that will read values for T0 (suggested test value = 37), TB (20), TI (32), kA (1.30), kB (0.343), ρA (57.3), ρB (62.4), cpA (0.502), cpB (1.01), L (144), and ε, and that will then proceed to solve the last equation above for η, within the specified tolerance ε. The value of η thus computed can then be used to evaluate the analytical solution for the purpose mentioned in Problem 7.31.

3.49 Methane and excess moist air are fed continuously to a torch. Write a program that will compute, within 5°F, the adiabatic flame temperature T* for complete combustion according to the reaction

CH4 + 2O2 → CO2 + 2H2O.

The data for the program should include the table of thermal properties given below, together with values for Tm and Ta (the incoming methane and air temperatures, respectively, °F), p (the percentage of excess air over that theoretically required), and w (lb moles water vapor per lb mole incoming air on a dry basis). Assume that dry air contains 79 mole % nitrogen and 21 mole % oxygen.
The heat capacities for the five components present can be computed as a function of temperature from the general relation

cpi = ai + bi Tk + ci Tk² + di/Tk²   cal/g mole °K,

where i = 1, 2, 3, 4, and 5 for CH4, O2, N2, H2O (vapor), and CO2, respectively, and Tk is in °K. Table P3.49 [19] also shows the standard heat of formation at 298°K, ΔH°f,298 cal/g mole, for each component.

Table P3.49
(entries illegible in this copy)

Suggested Test Data
Tm = Ta = 60°F, p = 0 to 100% in steps of 10%, w = 0 and 0.015.

3.50 Producer gas consisting of 35 mole % carbon monoxide and 65 mole % nitrogen is to be burned with an oxidant (either dry air or pure oxygen) in an adiabatic reactor; all gases enter at a temperature T0 °F. The pressure is uniformly P atm throughout the system. For the reaction

CO + (1/2)O2 = CO2,

which does not necessarily go to completion, the standard free energy and enthalpy changes at 298°K are [19] ΔG°298 = −61,452 and ΔH°298 = −67,636 cal/g mole, respectively. The standard states for all components are pure gases at one atmosphere, and ideal gas behavior may be assumed.
The constant-pressure heat capacities for the four components present can be computed as functions of temperature Tk °K from the general relation

cpi = ai + bi Tk + ci Tk²   cal/g mole °K.

For the gases involved, the constants are shown in Table P3.50.

Table P3.50
Gas i    ai    bi × 10²    ci × 10⁶
(entries illegible in this copy)

Write a program that will compute, within 5°F, the adiabatic flame temperature T* and the fractional conversion z of carbon monoxide, investigating both oxidants in turn. Assume that air is 79 mole % nitrogen and 21 mole % oxygen.

Suggested Test Data
T0 = 200°F, P = 1 and 10 atm, f (oxidant supplied, as a fraction of the stoichiometric amount required) varying from 0.6 to 1.8 in steps of 0.1.
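Problems 3.49 and 3.50 both ask for a temperature "within 5°F," which suggests interval halving on the energy balance. The residual function below is a hypothetical stand-in (the real balance would difference the heat of reaction against the sensible heat needed to bring the product gases to T); only the halving logic is the point of the sketch.

```python
# Interval-halving skeleton for the "within 5 F" adiabatic flame temperature
# of Problems 3.49/3.50.  The residual is a HYPOTHETICAL stand-in chosen to
# cross zero near 3400 F; a real program would evaluate the energy balance.

def residual(T):
    return 3400.0 - T            # stand-in energy-balance residual

def flame_temperature(f, T_low, T_high, tol=5.0):
    """Halve [T_low, T_high] until its width is below tol (degrees F)."""
    r_low = f(T_low)
    while (T_high - T_low) > tol:
        T_mid = 0.5 * (T_low + T_high)
        if r_low * f(T_mid) > 0.0:     # root is in the upper half
            T_low, r_low = T_mid, f(T_mid)
        else:                          # root is in the lower half
            T_high = T_mid
    return 0.5 * (T_low + T_high)

print(flame_temperature(residual, 200.0, 6000.0))   # within 5 F of 3400
```

Halving guarantees the tolerance in a fixed number of steps (here about ten for a 5800°F starting span), regardless of how nonlinear the balance is.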

How much of each type of oxidant should be used to obtain the highest possible flame temperature?

3.51 Write a general-purpose subroutine, named QD, that implements the quotient-difference algorithm of Section 3.9, for estimating the roots of a polynomial function of a complex variable,

pn(z) = a0 z^n + a1 z^(n-1) + ... + an,

where the coefficients may be complex. As an embellishment, consider the possibility of having QD improve each of the root estimates, using Newton's method.
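The Newton polishing step suggested in Problem 3.51 needs p(z) and p′(z) at a complex point; both come from a single Horner pass over the coefficients. A sketch follows (the test polynomial z² + 1 is illustrative only, not from the text):

```python
# Newton refinement of a polynomial root estimate, as the embellishment to
# Problem 3.51 suggests.  Horner's rule accumulates p(z) and p'(z) together,
# and Python's complex type handles complex coefficients directly.

def polish(coeffs, z, tol=1.0e-12, max_iter=50):
    """Refine z toward a root of p(z) = coeffs[0] z^n + ... + coeffs[n]."""
    for _ in range(max_iter):
        p, dp = 0j, 0j
        for a in coeffs:            # Horner pass: dp accumulates p'(z)
            dp = dp * z + p
            p = p * z + a
        if dp == 0:                 # stationary point; give up
            break
        step = p / dp
        z -= step
        if abs(step) < tol:
            break
    return z

# p(z) = z^2 + 1: crude estimates near +i and -i are driven onto the roots.
print(polish([1, 0, 1], 0.2 + 0.9j), polish([1, 0, 1], -0.1 - 1.2j))
```

Because Newton's method is quadratically convergent near a simple root, a rough QD estimate is typically polished to machine accuracy in a handful of iterations.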
Bibliography

1. P. Henrici, Elements of Numerical Analysis, Wiley, New York, 1964.
2. H. A. Luther, "A Class of Iterative Techniques for the Factorization of Polynomials," Communications of the A.C.M., 7, 177-179 (1964).
3. J. A. Ward, "The Down-Hill Method of Solving f(z) = 0," Journal of the A.C.M., 4, 148-150 (1957).
4. F. B. Hildebrand, Introduction to Numerical Analysis, McGraw-Hill, New York, 1956.
5. G. Chrystal, Textbook of Algebra, I, Chelsea, New York, 1952.
6. H. A. Luther, "An Iterative Factorization Technique for Polynomials," Communications of the A.C.M., 6, 108-110 (1963).
7. H. S. Wilf, Mathematics for the Physical Sciences, Wiley, New York, 1962.
8. A. M. Ostrowski, Solution of Equations and Systems of Equations, Academic Press, New York, 1966.
9. C. Froberg, Introduction to Numerical Analysis, Addison-Wesley, Reading, Massachusetts, 1965.
10. Modern Computing Methods, Notes on Applied Science, No. 16, Her Majesty's Stationery Office, London, 1961.
11. A. S. Householder, Principles of Numerical Analysis, McGraw-Hill, New York, 1953.
12. D. R. Hartree, Numerical Analysis, Oxford University Press, London, 1958.
13. E. L. Stiefel, An Introduction to Numerical Mathematics, Academic Press, New York, 1963.
14. J. F. Traub, Iterative Methods for the Solution of Equations, Prentice-Hall, Englewood Cliffs, New Jersey, 1964.
15. C. F. Prutton and S. H. Maron, Fundamental Principles of Physical Chemistry, Macmillan, New York, 1953.
16. D. L. Katz et al., Handbook of Natural Gas Engineering, McGraw-Hill, New York, 1959.
17. J. M. Kay, An Introduction to Fluid Mechanics and Heat Transfer, 2nd ed., Cambridge University Press, London, 1963.
18. C. R. Wylie, Advanced Engineering Mathematics, McGraw-Hill, New York, 1951.
19. J. H. Perry, ed., Chemical Engineers' Handbook, 3rd ed., McGraw-Hill, New York, 1950.
20. H. H. Skilling, Electrical Engineering Circuits, 2nd ed., Wiley, 1965.
21. M. R. Spiegel, Theory and Problems of Laplace Transforms, Schaum Publishing Co., New York, 1965.
22. L. M. Milne-Thomson, Theoretical Hydrodynamics, 4th ed., Macmillan, London, 1960.
23. H. C. Hottel and A. F. Sarofim, Radiative Transfer, McGraw-Hill, New York, 1967.
24. W. M. Rohsenow and H. Y. Choi, Heat, Mass, and Momentum Transfer, Prentice-Hall, Englewood Cliffs, New Jersey, 1961.
25. S. Timoshenko, Theory of Elastic Stability, McGraw-Hill, New York, 1936.
26. D. E. Muller, "A Method for Solving Algebraic Equations Using an Automatic Computer," Math. Tables Aids Comput., 10, 205-208 (1956).
27. R. Keller, Basic Tables in Chemistry, McGraw-Hill, New York, 1967.
28. H. S. Carslaw and J. C. Jaeger, Conduction of Heat in Solids, 2nd ed., Oxford University Press, London, 1959.
29. S. Lin, "A Method for Finding Roots of Algebraic Equations," J. Math. and Phys., 22, 60-77 (1943).
CHAPTER 4

Matrices and Related Topics

4.1 Notation and Preliminary Concepts

This section serves to summarize the knowledge assumed concerning determinants, matrices, and simultaneous equations. The elements are real or complex numbers. At the same time, the notational pattern is established. Let A, B, C, and D be m × n, n × p, n × p, and p × q matrices, respectively.
By the matrix sum E = B + C, we mean the n × p matrix E = (eij) in which eij = bij + cij for permissible values of the indices. Moreover, B + C = C + B. By the matrix product F = AB, we mean the m × p matrix F = (fij) in which

fij = Σ(k=1 to n) aik bkj.

It is always true that A(B + C) = AB + AC and (B + C)D = BD + CD, so long as the dimensions are compatible; it may not be true that AB = BA, even though m = n = p. Should AB = BA, A and B are said to commute. The n × p matrix, all of whose entries are zero, is called the null matrix; the context serves to specify the number of rows and columns. The null matrix is commonly denoted 0. Thus, if for all indices i and j, bij = −cij, then B + C = 0.

Example. [the matrices and products of this worked example are illegible in this copy] Notice that (A − B)(A + B) ≠ A² − B². It will be found that (A − B)(A + B) = A² − BA + AB − B².

Let āij denote the complex conjugate of the complex number aij, and let Ā denote the matrix derived from A by replacing aij by āij. Because the conjugate of the sum of two numbers is the sum of their conjugates, and the conjugate of the product is the product of the conjugates, it follows that the conjugate of AB is Ā B̄.
By the transpose A′ of the m × n matrix A we mean the n × m matrix (gij) such that gij = aji. It is true that (B + C)′ = B′ + C′ and that (AB)′ = B′A′ (more generally, that (ABD)′ = D′B′A′, etc.).
By the conjugate transpose A* of matrix A, we mean (Ā)′. Clearly, (Ā)′ is the conjugate of (A′), and (AB)* = B*A*. A matrix A is Hermitian if and only if A* = A (this requires, of course, that m = n). A matrix A is symmetric if and only if A = A′. A real matrix is Hermitian if and only if it is symmetric.

Example. Choose [matrix illegible in this copy]. Then [result illegible]. Note that AA* is Hermitian.

H is a diagonal matrix if and only if H is a square (n × n) matrix (hij) and hij = 0 if i ≠ j. If we let hi = hii, a diagonal matrix is often denoted [h1, h2, ..., hn] (with or without the commas). An alternate nomenclature is diag (h1 h2 ... hn). Observe that AH is the m × n matrix (hj aij) and that HB is the n × p matrix (hi bij). These features of multiplication by diagonal matrices are of special interest in the study of eigenvectors and eigenvalues described later.
H is a scalar matrix if and only if H is a diagonal matrix all of whose elements hii have the common value h. We write hA = Ah = AH and hB = Bh = HB. The real or complex numbers used as the elements of the matrices are sometimes called scalars, and formations such as hA and Ah are referred to as scalar multiplications. If the common value h is the number one, the resulting matrix is called the identity matrix and denoted I (the context serves to describe the order, namely the number of rows n in an n × n matrix). Thus IA = AI = A.
If m = n, so that A is square, and if K is a matrix such that AK = I, then K is unique, is called the inverse of A, and is denoted A⁻¹. It is then true that A⁻¹A = AA⁻¹ = I. The square matrix A is called nonsingular if and only if A⁻¹ exists. If m = n = p and both A⁻¹ and B⁻¹ exist, then (AB)⁻¹ = B⁻¹A⁻¹. A square matrix A is unitary if and only if A* = A⁻¹, and orthogonal if and only if A* = A⁻¹ = A′.

Example. If, for the matrix A, we take [matrix illegible in this copy], it will be found that A*A = I, whence A is unitary.

The (main) diagonal of a square matrix A of order n consists of the elements aii, 1 ≤ i ≤ n. If all the elements below the diagonal are zero, A is called upper triangular. If, in addition, all elements on the diagonal are zero, A is called strictly upper triangular. Lower triangular and strictly lower triangular matrices are similarly defined.
It is often convenient to consider matrices whose elements are also matrices, in the sense of partitioning. Thus two matrices A and B may be partitioned by broken lines into submatrices, with, for example, A11 = [a11 a12 a13] and A12 = [a14 a15], and the product AB may then be formed blockwise. [the partitioned displays of this passage are illegible in this copy] This effect (that is, the analogy with ordinary matrix multiplication) is valid for any partitioning that gives meaning to the various constituents.

If A is an n × n square matrix, there is an associated number called the determinant of A and denoted det(A), det A, or |A|. It is defined as

det(A) = Σ sgn(p1 p2 ... pn) a1p1 a2p2 ... anpn,

where each product a1p1 a2p2 ... anpn consists of one and only one number from each row and column of A, and where the sum extends over all possible permutations p1 p2 ... pn of the numbers 1 to n. Also, sgn(p1 p2 ... pn) is one or minus one accordingly as the permutation is even or odd. The permutation p1 p2 ... pn is odd if an odd number of interchanges is required to return the numbers to natural order. Thus 1, 3, 2 is odd, while 1, 3, 4, 2 is even.
It should be emphasized that this definition is not of immediate value in finding the number det(A). For example, if n = 10, the number of products required is 10! = 3,628,800. The definition is, of course, requisite to an understanding of determinants and, on occasion, is needed for special purposes in automatic computation. Useful properties are

[list of properties illegible in this copy]

If A is a triangular matrix, then det(A) is the product of the diagonal entries. Diagonal matrices are an important special case.
A determinant, called the minor of aij, is associated with each element aij of a square matrix A of order n, and is defined as the determinant of the submatrix formed by deleting from A the elements of the row and column in which aij lies. When this minor is multiplied by (−1)^(i+j), the resulting number is called the cofactor of aij. Let this cofactor be denoted Aij. Then, as special cases of Laplace's expansion, we have, for each row r and each column s,

det(A) = Σ(j=1 to n) arj Arj = Σ(i=1 to n) ais Ais.

Suppose a matrix has two identical rows. Interchanging these rows cannot alter the value of the determinant, yet by the definition the sign must change. Therefore, its determinantal value is zero. From this we see that, if r ≠ t,

Σ(j=1 to n) arj Atj = 0.

Because of the properties of transposes, if s ≠ t,

Σ(i=1 to n) ais Ait = 0.

With each square matrix A = (aij) we associate a matrix K = (kij) where kij = Aji. This matrix is called the adjoint of A and is denoted adj(A). Because of the properties described above, it is seen that

A adj(A) = adj(A) A = det(A) I.

If det(A) ≠ 0, then A⁻¹ may be written as (bij), where bij = Aji/det(A). It is also seen, since det(det(A) I) = [det(A)]^n, that det(adj(A)) = [det(A)]^(n−1). If det(A) = 0, A⁻¹ does not exist, for in that event we would need det(A⁻¹) det(A) = det(I) = 1.

Example. Let the determinant for the unitary matrix A of the earlier example be found by expanding in terms of the elements of the first column. The cofactor of 2/√15 is (8 + 6i)/(5√15), that for (1 − i)/√5 is (3 + 21i)/(5√45), and that for 5i/(5√3) is (3 − 4i)/√75. [the resulting values are illegible in this copy]

Consider now all possible square submatrices formed from an m × n matrix A by deleting rows and columns, and consider in particular those whose determinants are not zero. Then by the rank of A we mean the order of such a nonsingular submatrix of highest possible order. By a previous paragraph, a square matrix A of order n is nonsingular if and only if its rank is n. Also, the adjoint of A is nonsingular if and only if A is nonsingular.
Concerning simultaneous linear equations, note that a solution exists if and only if the matrix of the coefficients has the same rank as the augmented matrix (that formed by placing the column of constants next to the matrix of coefficients). We also need to know that m homogeneous equations in n unknowns have a nontrivial solution (one not all zeros) if and only if the rank of the matrix of the coefficients is less than n.

Example. It will be found that the equations [illegible in this copy] have the unique solution x1 = 1, x2 = 2, x3 = 3; also, it will be found that the determinant of the matrix of coefficients has the value [illegible].
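The minor/cofactor definitions and the adjoint identity A adj(A) = det(A) I can be exercised directly. A sketch with an arbitrary 3 × 3 matrix follows (Laplace expansion along the first row; as the text warns, this is not an efficient way to compute determinants):

```python
# Laplace-expansion determinant, cofactors, and the identity A adj(A) = det(A) I,
# demonstrated on an arbitrary small matrix.

def minor(A, i, j):
    # submatrix with row i and column j deleted
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A):
    if len(A) == 1:
        return A[0][0]
    # expansion along the first row: det = sum_j (-1)^j a_0j M_0j
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(len(A)))

def adj(A):
    n = len(A)
    # adj(A)[i][j] = cofactor of a_ji (note the transpose, k_ij = A_ji)
    return [[(-1) ** (i + j) * det(minor(A, j, i)) for j in range(n)]
            for i in range(n)]

A = [[2, 1, 0], [1, 3, 1], [0, 1, 2]]
d = det(A)
K = adj(A)
P = [[sum(A[i][k] * K[k][j] for k in range(3)) for j in range(3)] for i in range(3)]
print(d, P == [[d, 0, 0], [0, d, 0], [0, 0, d]])   # 8 True
```

When det(A) ≠ 0, dividing adj(A) by det(A) gives A⁻¹, exactly as the text's formula bij = Aji/det(A) states.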
Thus the rank of the matrix of the coefficients, namely [matrix illegible in this copy], is three, as is that of the augmented matrix [illegible].
In contrast, the equations

[equations illegible in this copy]

have no solution. This is allied to the fact that

| 5  3 -1 |        | 5  3  8 |
| 1 -1  2 | = 0,   | 1 -1  5 | ≠ 0.
| 6  2  1 |        | 6  2  1 |

The same coefficient matrices provide examples of homogeneous equations. The equations

[equations illegible in this copy]

have only the solution x1 = x2 = x3 = 0. However, the equations

5x1 + 3x2 − x3 = 0,
x1 − x2 + 2x3 = 0,
6x1 + 2x2 + x3 = 0,

have the nontrivial solution x1 = −5t, x2 = 11t, x3 = 8t, where t is arbitrary.

4.2 Vectors

In the treatment of vectors, several approaches are possible; we choose to define a vector v as an n × 1 (column) matrix whose elements are real or complex, as desired. Using the scalar multiplication for matrices previously described and the properties of matrices previously developed, it is a routine matter to show that the postulates for a vector space are satisfied by the set Vn of all such n × 1 matrices. The verification should be made [2].
For this vector space Vn, an inner product (dot product) of two vectors u and v is defined as

(u, v) = ū′v,

where, on the right, matrix multiplication is intended. Thus if n = 3, u = [2 + 3i, 3 − i, 4 + i]′ and v = [1 + i, 1 − i, 2]′, then (u, v) is (2 − 3i)(1 + i) + (3 + i)(1 − i) + (4 − i)(2), or 17 − 5i. Since the fundamental postulates satisfied (designed for a more general treatment) will be an expediting feature, they are listed and should be verified. For a scalar c and vectors u, v, and w, the inner product is a complex number such that

(u, v) = conjugate of (v, u),
(u, cv) = c(u, v),
(u + v, w) = (u, w) + (v, w),                 (4.2)
(u, v + w) = (u, v) + (u, w),
(u, u) ≥ 0; (u, u) = 0 if and only if u = 0.

It is possible to show [from (4.2) directly, if desired] that

(cu, v) = c̄(u, v).                           (4.3)

The vectors u and v are orthogonal if and only if (u, v) = 0. The length of a vector u, denoted ||u||, is √(u, u). A normalized or unit vector is one whose length is unity. A (finite) set of vectors u1, u2, ..., um is an orthogonal set if and only if (ui, uj) = 0 for i ≠ j, and is orthonormal if each vector is also of unit length. If u1, u2, ..., un form an orthonormal set of n vectors for Vn, it is seen by direct multiplication that [u1 u2 ... un]*[u1 u2 ... un] = I. Thus such a matrix is unitary; from any such matrix, a set of n orthonormal vectors can be extracted.
The p × n matrices A1, A2, ..., Am are linearly dependent if and only if there exist scalars c1, c2, ..., cm, not all zero, such that

c1 A1 + c2 A2 + ... + cm Am = 0.

If such numbers cannot be found, the set is a linearly independent set. The definitions are applicable, in particular, to the n × 1 matrices u1, u2, ..., um.

Example. Let vectors be defined as follows: [illegible in this copy]. It will be found that u1 − 2u2 + 3u3 − u4 is the zero vector; thus the vectors are linearly dependent. The vectors [illegible] are orthonormal, hence also linearly independent.

A vector space is of dimension q if and only if q linearly independent vectors belong to the space and any set of q + 1 (or more) vectors from the space is linearly dependent. Should such a set of linearly independent vectors exist, every vector of the space can be expressed
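Two of this section's worked results can be verified mechanically: the nontrivial solution x1 = −5t, x2 = 11t, x3 = 8t of the homogeneous set above, and the inner-product value 17 − 5i. A sketch:

```python
# Checks of two worked results: the homogeneous-system solution family and
# the inner product (u, v) = conj(u)' v = 17 - 5i.

A = [[5, 3, -1], [1, -1, 2], [6, 2, 1]]   # homogeneous coefficient matrix

def residuals(t):
    x = [-5 * t, 11 * t, 8 * t]           # the quoted solution family
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def inner(u, v):
    # (u, v) = sum_i conj(u_i) v_i, matching the text's definition
    return sum(ui.conjugate() * vi for ui, vi in zip(u, v))

u = [2 + 3j, 3 - 1j, 4 + 1j]
v = [1 + 1j, 1 - 1j, 2 + 0j]

print(residuals(1), inner(u, v))   # [0, 0, 0] (17-5j)
```

Note that (u, u) is always real and nonnegative under this definition, which is what makes ||u|| = √(u, u) a legitimate length.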

as a linear combination of these vectors. For this reason, such a set is called a basis for the space. In the case of the vector space Vn of all n × 1 matrices, let e1 = [1, 0, 0, ..., 0]′, e2 = [0, 1, 0, ..., 0]′, and in general let ei be the column matrix whose first i − 1 rows and whose last n − i rows all are zero, while the element in the ith row is one. This set is clearly linearly independent. If u = [u1, u2, ..., un]′ is any vector of Vn, then

u = u1 e1 + u2 e2 + ... + un en.

Now consider a set u1, u2, ..., u(n+1) of vectors from Vn. Let uj = Σ(i=1 to n) uji ei. Then constants c1, c2, ..., c(n+1), not all zero, exist such that

c1 u1 + c2 u2 + ... + c(n+1) u(n+1) = 0,

for any n linear homogeneous equations (in this instance Σ(j=1 to n+1) cj uji = 0, i = 1, 2, ..., n) in more than n unknowns (here c1, c2, ..., c(n+1)) always have a nontrivial solution. Thus the dimension of Vn is n. The vectors ei, 1 ≤ i ≤ n, constitute a particular (orthogonal) basis, often called the initial basis.
Indeed, any n nonzero orthogonal vectors of Vn form a basis for Vn. Let v1, v2, ..., vn be such a set. If we assume the existence of scalars c1, c2, ..., cn such that

c1 v1 + c2 v2 + ... + cn vn = 0,

it follows that

(vj, c1 v1 + c2 v2 + ... + cn vn) = cj (vj, vj) = 0,

whence cj = 0 and the set is linearly independent. In similar fashion, if u is any vector of Vn, then

u = Σ(i=1 to n) [(vi, u)/(vi, vi)] vi.

Example. The orthogonal vectors [illegible in this copy] are a basis for the set of all vectors v′ = [x y z]. Indeed, following the technique described above, [the expansion is illegible in this copy] for arbitrary values of x, y, and z.

If m vectors u1, u2, ..., um are given, then the n × m matrix U = [u1 u2 ... um] is defined in partitioned form. Conversely, any n × m matrix U determines uniquely m column matrices which may be construed as vectors. For any n × m matrix U, the maximum number of linearly independent columns is called the column rank; the maximum number of linearly independent rows is called the row rank. It can be shown that these numbers are the same, and are the same number as the (determinant) rank of U previously defined. If we know that row rank is determinant rank, a knowledge of transposed matrices and determinants convinces us that column rank is the same number. No attempt is made here to show that row rank is determinant rank. It is worth observing that U, U*, and U′ all have the same rank. If U⁻¹ exists, it has the same rank as U and, in such event, the rank is the order.
A feature of value, concerning matrices of order n whose rank r is less than n, is that they can always be written as the product of an n × r matrix and an r × n matrix, each of rank r. Formal proof is not difficult; however, we merely indicate the proof for a matrix of order 4 and rank 2. Let the first two columns be linearly independent; then the matrix may be written (and rephrased) as [display illegible in this copy]. Conversely, if two such matrices, each of rank 2, are multiplied together, we can always find, in the product matrix, a minor that is the product of two nonvanishing determinants, one from each factor. This is readily seen by partitioning, so that the submatrices of rank two (one from each factor) can be seen to yield a submatrix in the product, also of rank two. Moreover, three or more columns must be linearly dependent.
Another factorization theorem applies to nonsingular matrices. The leading submatrices of a square matrix A = (aij) are the matrices

A1 = [a11],  A2 = the upper-left 2 × 2 submatrix,  ...,  An = A.

The corresponding determinants are commonly denoted Δ1, Δ2, ..., Δn. If all the leading submatrices of a square matrix are nonsingular, it may be written (and indeed in more than one way) as the product of a lower triangular matrix L and an upper triangular matrix U. Proof is by induction and is sketched (for convenience), using always a one for the diagonal elements of L. (Any nonzero element may be used.) Clearly, if the rank r is one, [a11] = [1][u11], where u11 = a11. Now assume that, for all matrices of order r − 1 (having the requisite properties), the proposition is true. Let u′ = [ar1 ar2 ... a(r,r−1)]
and v' = [a,, a,, ... a,, ,,,.1. Then, using partitioning, any restrictive. However, it will be seen in the section on
suitable matrix A = (aij)leads to symmetric and Hermitian matrices that an extremely
important class, the definite matrices, do have this
property.
In the section on the method of Rutishauser (see
where A,-, i s the (r - 1)l.h leading submatrix of A. A,-, equations 4.46). an approach more suitable for auto-
has the form Lr-,Up-I , where the diagonal elements of matic calculation will be found.
,
the lower triangular matrix L,, are all ones. Since the
determinant of a triangular matrix is the product of its Example. An example is'furnished by the following matrix:
diagonal entries, and the determinant of the product is the
product of the determinants, we are assured that L,-l
and U,-, are nonsingular. Let st = utU;2, and let
w = L,-,',v. Then
Here A, 5= 3, As = 12, A, = 36, and

where 6 , = a, - s'w.
The requirement that Ai # 0, 1 C i < n, seems very
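The induction in the proof is itself a workable algorithm: the factorization can be built up one row and column at a time, as in the Doolittle scheme. A minimal sketch in Python follows; the 3 x 3 test matrix is chosen so that Δ1 = 3, Δ2 = 12, Δ3 = 36, the values quoted in the example, but it is not necessarily the text's own matrix.

```python
def lu_doolittle(a):
    """Factor a as L U with unit diagonal in L (Doolittle form).

    Assumes every leading submatrix of a is nonsingular, as the text
    requires; no pivoting is performed.
    """
    n = len(a)
    lo = [[0.0] * n for _ in range(n)]
    up = [[0.0] * n for _ in range(n)]
    for r in range(n):
        lo[r][r] = 1.0
        # Row r of U:  u_rj = a_rj - sum_k l_rk * u_kj.
        for j in range(r, n):
            up[r][j] = a[r][j] - sum(lo[r][k] * up[k][j] for k in range(r))
        # Column r of L:  l_ir = (a_ir - sum_k l_ik * u_kr) / u_rr.
        for i in range(r + 1, n):
            lo[i][r] = (a[i][r] - sum(lo[i][k] * up[k][r] for k in range(r))) / up[r][r]
    return lo, up

def matmul(x, y):
    """Product of two square matrices, for checking L U against A."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[3.0, 1.0, 2.0],
     [6.0, 6.0, 5.0],
     [3.0, 9.0, 7.0]]
L, U = lu_doolittle(A)
```

Because L has a unit diagonal, each leading determinant is the product of the corresponding leading diagonal entries of U; here the diagonal of U is 3, 4, 3, giving Δ1 = 3, Δ2 = 12, Δ3 = 36.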
EXAMPLE 4.1
MATRIX OPERATIONS

Problem Statement

Write subroutines that will use double-precision arithmetic to perform the commonly used matrix and vector operations, such as addition, multiplication, etc.

Method of Solution

The operations are embodied in a single subroutine with multiple entries, and are listed in Table 4.1.1; this table also serves in place of a flow diagram. The following notation is used:

Matrices: A, B, C (all m x n); T (m x p); U (n x p); V (n x m).
Column vectors: x, y (both n x 1); z (m x 1).
Scalar: s.

Although the above subroutines are not used here, several of them will be employed extensively in Example 4.2.

Table 4.1.1  Subroutines for Manipulating Matrices and Vectors

Typical Call                      Operation    Formula Used
CALL MATMLT (A, U, T, M, N, P)    T <- AU      tij = Σk=1..n aik ukj
CALL MATADD (A, B, C, M, N)       C <- A + B   cij = aij + bij
CALL MATSUB (A, B, C, M, N)       C <- A - B   cij = aij - bij
CALL MATVEC (A, X, Z, M, N)       z <- Ax      zi = Σj=1..n aij xj
CALL SCAVEC (S, X, Y, N)          y <- sx      yi = s xi
CALL SCAMAT (S, A, B, M, N)       B <- sA      bij = s aij
CALL VECVEC (X, Y, S, N)          s <- x'y     s = Σi=1..n xi yi
CALL VECMAT (Z, A, X, M, N)       x' <- z'A    xj = Σk=1..m zk akj
CALL MATEQ (A, B, M, N)           B <- A       bij = aij
CALL VECLEN (X, S, N)             s <- ||x||   s = (Σi=1..n xi²)^(1/2)
CALL VECEQ (X, Y, N)              y <- x       yi = xi
CALL TRNSPZ (A, V, M, N)          V <- A'      vji = aij

FORTRAN Implementation

The subroutine calls are listed in Table 4.1.1; the argument names are self-explanatory.
Example 4.1 Matrix Operations

Program Listing
C       APPLIED NUMERICAL METHODS, EXAMPLE 4.1
C       SUBROUTINES FOR MANIPULATIONS OF MATRICES AND VECTORS.
C
C       ..... T(M,P) = A(M,N) * U(N,P) .....
        SUBROUTINE MATMLT (A, U, T, M, N, P)
        REAL*8 A, B, C, T, U, V, X, Y, Z, S, SUMSQX
        INTEGER P
        DIMENSION A(10,10), B(10,10), C(10,10), T(10,10), U(10,10),
       1          V(10,10), X(10), Y(10), Z(10)
        DO 1 I = 1, M
        DO 1 J = 1, P
   1    T(I,J) = 0.
        DO 2 I = 1, M
        DO 2 J = 1, P
        DO 2 K = 1, N
   2    T(I,J) = A(I,K)*U(K,J) + T(I,J)
        RETURN
C
C       ..... C(M,N) = A(M,N) + B(M,N) .....
        ENTRY MATADD (A, B, C, M, N)
        DO 3 I = 1, M
        DO 3 J = 1, N
   3    C(I,J) = A(I,J) + B(I,J)
        RETURN
C
C       ..... C(M,N) = A(M,N) - B(M,N) .....
        ENTRY MATSUB (A, B, C, M, N)
        DO 4 I = 1, M
        DO 4 J = 1, N
   4    C(I,J) = A(I,J) - B(I,J)
        RETURN
C
C       ..... Z(M) = A(M,N) * X(N) .....
        ENTRY MATVEC (A, X, Z, M, N)
        DO 5 I = 1, M
   5    Z(I) = 0.
        DO 6 I = 1, M
        DO 6 J = 1, N
   6    Z(I) = A(I,J)*X(J) + Z(I)
        RETURN
C
C       ..... Y(N) = S * X(N) .....
        ENTRY SCAVEC (S, X, Y, N)
        DO 7 I = 1, N
   7    Y(I) = S*X(I)
        RETURN
C
C       ..... B(M,N) = S * A(M,N) .....
        ENTRY SCAMAT (S, A, B, M, N)
        DO 8 I = 1, M
        DO 8 J = 1, N
   8    B(I,J) = S*A(I,J)
        RETURN
C
C       ..... S = X(N) * Y(N) .....
        ENTRY VECVEC (X, Y, S, N)
        S = 0.
        DO 9 I = 1, N
   9    S = S + X(I)*Y(I)
        RETURN
C
C       ..... X(N) = Z(M) * A(M,N) .....
        ENTRY VECMAT (Z, A, X, M, N)
        DO 10 J = 1, N
        X(J) = 0.
        DO 10 K = 1, M
  10    X(J) = X(J) + Z(K)*A(K,J)
        RETURN
C
C       ..... B(M,N) = A(M,N) .....
        ENTRY MATEQ (A, B, M, N)
        DO 11 I = 1, M
        DO 11 J = 1, N
  11    B(I,J) = A(I,J)
        RETURN
C
C       ..... VECTOR LENGTH, S, OF X(N) .....
        ENTRY VECLEN (X, S, N)
        SUMSQX = 0.
        DO 12 I = 1, N
  12    SUMSQX = SUMSQX + X(I)*X(I)
        S = DSQRT(SUMSQX)
        RETURN
C
C       ..... Y(N) = X(N) .....
        ENTRY VECEQ (X, Y, N)
        DO 13 I = 1, N
  13    Y(I) = X(I)
        RETURN
C
C       ..... V(N,M) = A(M,N) TRANSPOSED .....
        ENTRY TRNSPZ (A, V, M, N)
        DO 14 I = 1, M
        DO 14 J = 1, N
  14    V(J,I) = A(I,J)
        RETURN
        END
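For comparison with a modern language, a hypothetical Python rendering of three of the table's entries (MATMLT, VECVEC, and VECLEN) might read as follows; these are illustrative translations, not part of the original program.

```python
import math

def matmlt(a, u):
    """T = A U: t_ij = sum over k of a_ik * u_kj (cf. MATMLT)."""
    return [[sum(a[i][k] * u[k][j] for k in range(len(u)))
             for j in range(len(u[0]))] for i in range(len(a))]

def vecvec(x, y):
    """s = x'y: inner product of two vectors (cf. VECVEC)."""
    return sum(xi * yi for xi, yi in zip(x, y))

def veclen(x):
    """s = ||x|| = (sum of x_i squared)^(1/2) (cf. VECLEN)."""
    return math.sqrt(vecvec(x, x))
```

The remaining entries of Table 4.1.1 reduce to similar one-line comprehensions.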

4.3 Linear Transformations and Subspaces

A set of vectors W from a vector space V may be said to form a (vector) subspace of V if and only if, for every scalar c and every pair of vectors u and v in W, it is true that both u + v and cu are vectors of W. Let A be any (fixed) n x n matrix. Then the set of all vectors of the form Au, where u is any (variable) vector of Vn, constitutes a subspace of Vn. For Au is an n x 1 matrix; therefore, it is an element of Vn. Moreover, it is readily verified that

Au1 + Av1 = A(u1 + v1),
cAu1 = A(cu1).  (4.6)

That the set of all vectors Au is a subspace of Vn means, in part, that a set of linearly independent vectors of the form Au will serve as a basis for the space of all vectors of that form. Note that Aε1 is the first column of A, Aε2 is the second column of A, and so on. Since u = Σj=1..n uj εj, by an obvious extension of equation (4.6), it is seen that

Au = u1Aε1 + u2Aε2 + ... + unAεn.

Then for a basis of the space, we may choose any maximum number of linearly independent columns of A. It is seen that the dimension of the subspace W of vectors Au is the rank of A.

A transformation of a space V into a subspace W of V may be viewed as a device for associating with each element of V a uniquely defined image in W (more precisely, such a transformation is a many-one correspondence from V to a subset of V). Expressed in operator form, Tu = v, meaning that the transform or image of u is v. The transformation is linear if and only if, for all u and v in V and any scalar c,

T(u + v) = Tu + Tv,
T(cu) = cTu.

Observe immediately, reading equation (4.6) in reverse, that square matrices A of order n may be viewed as linear operators for Vn.

Let A be an n x n matrix and u a vector of Vn such that Au = 0. If A is nonsingular, its determinant is not zero, and the set of simultaneous linear equations represented by Au = 0 has only the trivial solution u = 0. If, however, A is singular [that is, det(A) = 0], then u exists such that u ≠ 0 and Au = 0, for in such cases the simultaneous equations involved have a nontrivial solution.

The set of all vectors whose image (using a fixed matrix A) is the null-vector is a subspace of Vn. This is readily verified in the manner by which equation (4.6) was established. This set is called the null-space of A. Its rank is called the nullity of A, and it will be shown that the rank of A plus the nullity of A is the dimension of Vn. (The trivial space consisting of the null-vector only has rank zero.)

To demonstrate the proposition above, let e1, e2, ..., em be a basis for the null-space of A. Let e(m+1), ..., en be additional vectors such that e1, e2, ..., en in toto constitute a basis for Vn. Since every vector u = Σi=1..n ui ei transforms as Σi=m+1..n ui Aei, it follows that every vector of the space W of all transforms is a linear combination of the vectors Aei, m + 1 ≤ i ≤ n. But these are linearly independent, for if it were not so, some nontrivial linear combination Σi=m+1..n bi Aei would be 0, whence A Σi=m+1..n bi ei would be zero. In that event Σi=m+1..n bi ei would lie in the null-space, which contradicts the assumption that the vectors e1, e2, ..., en are a basis for Vn.

An important theorem involving nullity is Sylvester's law of nullity, which states that the nullity of the product of two matrices equals or exceeds the nullity of either factor, and that the nullity of the product cannot exceed the sum of the nullities of the factors.

It is intuitively obvious that the set of all vectors of Vn such that the nth row is zero constitutes a subspace of Vn. It can be phrased formally if for the matrix A the diagonal matrix [1, 1, ..., 1, 0] is used, where every diagonal entry but the last is unity.

In future consideration of the transformation of vectors by square matrices A, an extremely useful relation is the transfer rule:

(u, Av) = (A*u, v).  (4.7)

Proof is straightforward: u*(Av) = (A*u)*v since (A*u)* = u*A.

Example. As an example of a null-space, consider a matrix A for which, with x' = [x1 x2 x3 x4 x5], the simultaneous equations Ax = 0 have the solution x = rv1 + sv2, where r and s are arbitrary and v1' = [-5, 2, 5, 7, 0], v2' = [-13, 1, 6, 0, 7]. Thus the nullity of A is at least two. Since the third-order determinant in the upper-left corner has the value -35, the rank of A is at least three. Hence the nullity is two and the rank is three.
Associated with every square matrix A is a special set of vectors, called eigenvectors, and a related set of scalars, called eigenvalues. Formally, the vector u is an eigenvector of A if and only if u is a nonzero vector and λ is a scalar (which may be zero), such that

Au = λu.  (4.8)

The scalar λ is an eigenvalue of A if and only if there exists a nonzero vector u such that (4.8) holds. It is seen that the eigenvectors pertaining to A are those nonzero vectors

whose images are scalar multiples of the original. It will +


Direct expansion gives the characteristicfunction h4 20h3 -
shortly appear that the number of eigenvalues is finite; 700h2 - 8000X + 120000, so that the eigenvalues are -30,
however, it is clear that any nonzero scalar multiple of an - 20, 10, and 20. The adjoint of A - 101 is
eigenvector is also an eigenvector. 3600 - 1620 2340 lg801
From (4.8), observe for eigenvectors and eigenvalues -7200 3240 -4680 -3960
that (A - 1I)u = 0 must be satisfied. Let -1200 540 -780 -660 '
+(A) = det(A - AI). 10800 -4860 7020 5940
(4.9)
thus an eigenvector for = 10 is [3, -6, - 1,9]'. For eigen-
&A) is the characteristic function of matrix A. Since n vectors corresponding to -30, -20, and 20, respectively, it is
homogeneous equations in n unknowns have a nontrivial found that [87, 546,91, 181]', [- 1,2, -3, 71' and [7, 6, 1,
solution if and only if the rank of the coefficient matrix is -93' may be used.
less than n, it follows that 1is an eigenvalue for A if and
Reverting to consideration of the characteristic func-
only if 1is any scalar such that
tion of (4.10), after due reflection, we see that

Equation (4.10) is the clzaracteristiceq,uation for matrix A. where the ai (in addition to being the elementary sym-
If A = (aij), then metric functions of the roots A,, A,, ..., An) are certain
combinations of the minors of A. One can show that the
sum of the diagonal entries of A is the sum of the eigen-
values of A. This sum is called the spur or trace of A.
Thus,

Observe that the characteristic function of A' is also the


characteristic function for A. This follows since (A - 11)' Also, the product of the eigenvalues is the determinant
= A' - 11' = A' - 21. But det(Br) = det(B); thus we see of A. Thus,
n
that det(A - 11) = det(At - 11). Moreover, if A is real
and A,, 1, are two eigenvalues for .$ such that 1, # A,,
then an eigenvector v, such that A'v = A,v, is perpendicu- More generally, each oi is the sum of the ith order
lar to an eigenvector u such that Au = A,u. For by principal minors of A, it being understood that the
(4.7), (v, Au) = (v, 1,u) = A,(v, u) and (v, Au) = (A'v, u) = principal minors of a square matrix are those minors
(A2v, U)= X2(v, u). Thus (v, u) = 0. whose diagonal entries are diagonal entries of A.
Now let Ai be an eigenvalue for A and let A - liI be of Consider next a theorem of.interest involving square
rank n - 1. Then adj(A - AiI) is not the null-matrix, matrices of order n and to the effect that if
although, since det(A - AiI) = 0, (A - Ail) adj(A - &I)=
0. If this adjoint be written in partitioned form as
.
[u, u, . . u,], it is seen that (A - AiI)uk = 0 or Auk =
A,u,. Thus any nonzero column may be used as an eigen-
vector corresponding to Ai. Starting with adj(A - AiI) x
(A - &I) = 0, it is found that the transpose of a nonzero
row of the adjoint will serve as an eigenvector for A'. It
laiil > x
j= 1
j#i
IajiI, 1<i < n, (4.15)

is interesting to observe that, under these circumstances, then A is nonsingular. Proof is sketched for the second
adj(A - AiI) = uv' where u is an eigenvector for A cor- version. Suppose A is singular. Then so is A', and there
responding to Ai and v is an eigenvector for A', also cor- exists a nonzero vector u such that Atu = 0. Let urnbe the
responding to Ai. For by Sylvester's law of nullity, if r component of u of maximum absolute value. Then
be the rank of the adjoint, n - 0 < n - (n - 1) + n - C;= aj,,,uj = 0 and
r o r r < 1. Since r # O , r = 1. Then adj(A-LiI)=uvt,
as explained on page 214.
Example. An example of some of the foregoing concepts
can be found by using the matrix
It follows that
17 - 1 -27 -6
6 -14 -54 -24 IammI< x
n

j= 1
j+m
IajmI < IammI.

-9 -19 Since this cannot be, A is not singular.
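The example above can be reproduced numerically. The entries of A used below are reconstructed from the fragments of the printed example (they reproduce the quoted eigenvalues, eigenvectors, and adjoint, but should be checked against an original copy of the text):

```python
import numpy as np

# The 4 x 4 example matrix (reconstructed; verify against the text).
A = np.array([[ 17.,  -1., -27.,  -6.],
              [  6., -14., -54., -24.],
              [  1.,   1., -29.,  -4.],
              [ -9., -19.,  51.,   6.]])

lam = np.sort(np.linalg.eigvals(A).real)      # expected: -30, -20, 10, 20

# Build adj(A - 10I) from cofactors; A - 10I is singular, so the usual
# det(B) * inv(B) shortcut is unavailable.
B = A - 10.0 * np.eye(4)
adjB = np.zeros((4, 4))
for i in range(4):
    for j in range(4):
        minor = np.delete(np.delete(B, j, axis=0), i, axis=1)
        adjB[i, j] = (-1) ** (i + j) * np.linalg.det(minor)

u = adjB[:, 0]                                # a nonzero column of the adjoint
```

Any nonzero column of adjB satisfies Au = 10u, and the trace and determinant of A equal the sum and product of the eigenvalues, exactly as stated above.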



When A - λI is used in the role of matrix A in the relations of (4.14) or (4.15), an immediate consequence is that every eigenvalue of a square matrix A lies (in the complex plane) in at least one of the circles

|λ - aii| ≤ Σj≠i |aij|, 1 ≤ i ≤ n,  (4.16)

as well as in at least one of the circles

|λ - aii| ≤ Σj≠i |aji|, 1 ≤ i ≤ n.  (4.17)

For by the paragraph above, if

|λ - aii| > Σj≠i |aij|

for all permissible values of the indices, then A - λI is nonsingular. This establishes (4.16); (4.17) follows in like fashion.

Example. This section closes with an example of a use for eigenvalues, for finding a general solution for the system

dyi/dx = Σj=1..n aij yj, 1 ≤ i ≤ n,

where the aij are constants. It is seen by substitution that yi = ui e^(λx) is a solution if and only if, for A = (aij), det(A - λI) = 0 and [u1 u2 ... un]' is an eigenvector associated with λ. Should there be n distinct eigenvalues λ1, λ2, ..., λn and corresponding eigenvectors u1, u2, ..., un, then by superposition the solution is

yi = c1u1i e^(λ1x) + c2u2i e^(λ2x) + ... + cnuni e^(λnx),

where uj' = [uj1, uj2, ..., ujn] and c1, c2, ..., cn are arbitrary. If there are multiple roots for the characteristic equation, the solution is more elaborate.

4.4 Similar Matrices and Polynomials in a Matrix

Consider three matrices A, B, and C, all having the characteristic function (1 - λ)². A has two linearly independent eigenvectors, say [1, 2]' and [2, 1]'. B and C have, except for a scalar multiplier, only one eigenvector, which may be taken as [0, 1]'. Clearly, eigenvalues alone do not describe a matrix, nor do eigenvalues and eigenvectors completely determine it. Further insight can be gained by considering eigenvalues and eigenvectors for a matrix involving parameters a, b, and c, under various assumptions such as a = b = c, a = b, or a ≠ c. There are certain unifying principles, and we proceed with a partial description.

If P is a nonsingular square matrix of order n, then for arbitrary vectors u, the relation

u = Pv  (4.18)

can be construed as a change in coordinates. Under these circumstances, given v, we can find u, or, given u, we can find v as P⁻¹u. Consider now the eigenvector-eigenvalue relation Aui = λiui under such a change of variable. Letting B = P⁻¹AP, from Bvi = P⁻¹APvi = λivi, and

vi = P⁻¹ui,  (4.19)

we see that the matrix B has the same eigenvalues as A and eigenvectors related by the matrix P. The matrix B is said to be similar to the matrix A if and only if there exists a matrix P such that B = P⁻¹AP.

Assume now that the eigenvalues of matrix A are all distinct, and denote them by λ1, λ2, ..., λn. Let the associated eigenvectors be u1, u2, ..., un. It will soon appear that the eigenvectors are linearly independent. Hence, if U is the matrix whose ith column is ui, it follows from AU = U diag(λ1, λ2, ..., λn) that

U⁻¹AU = diag(λ1, λ2, ..., λn),  (4.20)

and A is similar to a diagonal matrix whose diagonal elements are the eigenvalues of A. To see that the eigenvectors are linearly independent, suppose that the first m are linearly independent, but that up = Σi=1..m ci ui. Then

Aup = λpup = Σi=1..m ci λi ui.

It would be also true that

λpup = Σi=1..m ci λp ui.

Thus Σi=1..m ci(λp - λi)ui = 0. This contradicts the assumed independence of the first m eigenvectors.
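Both the circle theorem (4.16) and the invariance of eigenvalues under a similarity B = P⁻¹AP can be checked in a few lines; the matrices below are illustrative, not taken from the text:

```python
import numpy as np

A = np.array([[10., 1.,  2.],
              [ 1., 5.,  1.],
              [ 2., 1., -3.]])           # symmetric, so the eigenvalues are real

lam = np.linalg.eigvalsh(A)

# (4.16): each eigenvalue lies in at least one circle centered at a_ii
# with radius equal to the off-diagonal absolute row sum.
radii = np.abs(A).sum(axis=1) - np.abs(np.diag(A))
in_circles = [any(abs(l - A[i, i]) <= radii[i] + 1e-12 for i in range(3))
              for l in lam]

# A similar matrix B = P^-1 A P has exactly the same eigenvalues.
P = np.array([[1., 1., 0.],
              [0., 1., 1.],
              [1., 0., 2.]])             # nonsingular (det = 3)
B = np.linalg.inv(P) @ A @ P
mu = np.sort(np.linalg.eigvals(B).real)
```

The eigenvectors of B are the vectors P⁻¹ui, in agreement with (4.19).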



together with λ1 = 10, λ2 = 20, λ3 = -30, λ4 = -20. It is readily verified that, with U formed column by column from the eigenvectors found earlier, AU = U diag(10, 20, -30, -20), and, in fact, U⁻¹AU = diag(10, 20, -30, -20).

It can be shown that every matrix is similar to a matrix of specifiable structure called the Jordan canonical form. Matrices of the type

[λ],  [λ 0; 1 λ],  [λ 0 0; 1 λ 0; 0 1 λ],  ...

are called canonical boxes. Every matrix is similar to a quasidiagonal matrix built from these canonical boxes, where the λi are any numbers, repeated or not. It is easy to show that one and only one eigenvector corresponds to each canonical box (except for a scalar multiplier).

Now let a0, a1, ..., ap be any scalars and let A be a square matrix. Then by a polynomial in matrix A we mean an expression of the form

g(A) = a0I + a1A + a2A² + ... + apA^p,  (4.21)

where, by A⁰, we understand the identity matrix I. Let B be any nonsingular matrix, and observe that B⁻¹A²B = (B⁻¹AB)(B⁻¹AB). Proceeding by induction, it becomes clear that, for nonsingular matrices B,

B⁻¹g(A)B = g(B⁻¹AB).

There is a simple connection between the eigenvalues of the matrix g(A) and those of the matrix A. Indeed, if λ1, λ2, ..., λn be the eigenvalues of A, then g(λ1), g(λ2), ..., g(λn) are the eigenvalues of g(A). We give a demonstration only for the case where the eigenvalues of A are distinct. Let P be a matrix whose columns are eigenvectors for A, so that AP = P diag(λ1, λ2, ..., λn) and P⁻¹AP = diag(λ1, λ2, ..., λn).

It is not difficult to see that, if we denote diag(λ1, λ2, ..., λn) by Λ, then Λ² = diag(λ1², λ2², ..., λn²) and, in general, Λ^k = diag(λ1^k, λ2^k, ..., λn^k). Also, for arbitrary scalars c1 and c2, and diagonal matrices D1 = diag(e1, e2, ..., en) and D2 = diag(f1, f2, ..., fn),

c1D1 + c2D2 = diag(c1e1 + c2f1, c1e2 + c2f2, ..., c1en + c2fn).

Applying these concepts to (4.21), the net result is that

P⁻¹g(A)P = diag[g(λ1), g(λ2), ..., g(λn)],

or

g(A) = P diag[g(λ1), g(λ2), ..., g(λn)]P⁻¹.  (4.22)

For distinct eigenvalues, then, it is clear that the eigenvalues of g(A) are the g(λi) and, in addition, that the eigenvectors of g(A) are the same as those of A.

Example. As a case in point, consider g(x) = x² - 4 and the matrix

A =   11    6   -2
      -2   18    1
     -12   24   13

The eigenvalues of A are found to be 7, 14, and 21, with the eigenvectors [1, 0, 2]', [2, 1, 0]', and [0, 1, 3]', respectively. It develops that g(A) is

  129  126  -42
  -70  332   35
 -336  672  213

with eigenvectors as above and eigenvalues 45, 192, and 437, respectively.

The use of similar matrices, and certain analogies between polynomials f(x), in an indeterminate x, and matrix polynomials f(A), combine to produce powerful results. We proceed with an example. First of all, it develops that a necessary and sufficient condition for lim(k→∞) A^k to be zero (A being a square matrix) is that all the eigenvalues of A shall be, in absolute value, less than one. If A = P⁻¹ΛP, where Λ = diag[λ1, λ2, ..., λn], this is obvious, since A^k = P⁻¹Λ^kP and Λ^k = diag[λ1^k, λ2^k, ..., λn^k]. Proof can also be accomplished if A is not similar to a diagonal matrix. On the strength of the above statement, we now show that a necessary and sufficient condition that the matric series

I + A + A² + ... + A^k + ...

converge is that all eigenvalues of A be less than unity in modulus. In such event,

(I - A)⁻¹ = I + A + A² + ... + A^k + ....  (4.23)

Here, the meaning of the equal sign is that each element of the matrix on the left is the sum of an infinite series built from corresponding elements on the right. It is necessary that lim(k→∞) A^k be zero; therefore, by the above,
each eigenvalue must be less than unity in modulus. Now let all eigenvalues have an absolute value less than unity, and consider sufficiency. Since det(I - A) = 0 implies 1 is an eigenvalue, it follows that det(I - A) ≠ 0, and that (I - A)⁻¹ exists. The identity

(I + A + A² + ... + A^k)(I - A) = I - A^(k+1)

is easy to establish. Then postmultiplication by (I - A)⁻¹ gives

I + A + A² + ... + A^k = (I - A)⁻¹ - A^(k+1)(I - A)⁻¹.

Since lim(k→∞) A^(k+1) = 0, (4.23) follows.

At this juncture, note that if, for a square matrix B, an upper bound can be found for the moduli of the eigenvalues, then a closely related matrix A can be found whose eigenvalues are less than unity in absolute value and whose eigenvectors are those of B. Let such an upper bound be p. Then let A = (1/p)B; it follows that if Bu = λu, then

Au = (λ/p)u,  |λ/p| < 1.

It is not necessary to know the eigenvalues to find such a bound p. Two simple tests may be derived from (4.16) and (4.17). Let λk denote any eigenvalue of A = (aij). The first of the relations mentioned above yields the inequality

|λk| ≤ max(i) Σj=1..n |aij|;  (4.25)

the second leads to

|λk| ≤ max(j) Σi=1..n |aij|.  (4.26)

Let the eigenvalue λk lie in the circle with center at amm and with radius equal to Σj≠m |amj|. Then, since |λk| - |amm| ≤ |λk - amm|, it is seen that |λk| ≤ Σj=1..n |amj|. From this, (4.25) follows. Proof of (4.26) is similar.

The matrix A of page 222 will serve to illustrate the above. By (4.25), |λk| ≤ 49. This is verified, since λ1 = 7, λ2 = 14, λ3 = 21. By (4.26), |λk| ≤ 48, a slightly sharper result.

The ideas of the paragraph above will find application, for example, in the following chapter in conjunction with the iterative solution of simultaneous linear equations. A related concept involves the inequality

Sp[(AB)*(AB)] ≤ Sp(A*A)Sp(B*B).  (4.27)

This important inequality is an extension of the Cauchy-Schwarz inequality, which in vector form may be written

(v, w)(w, v) ≤ (v, v)(w, w).  (4.28)

In (4.28) let v = [ai1 ai2 ... ain]' and w = [b1j b2j ... bnj]'. Then (4.28) becomes

Σm=1..n āim bmj Σm=1..n b̄mj aim ≤ Σm=1..n āim aim Σm=1..n b̄mj bmj.

Now sum both sides from i = 1 to i = p and from j = 1 to j = q (assuming A is p x n and B is n x q). The result is (4.27). For a square matrix A, since Sp(A*A) = Σi Σk āik aik, it follows that if Sp(A*A) ≤ u < 1, then Sp(A^m*A^m) ≤ u^m; whence lim(m→∞) A^m = 0 and (4.23) is true.

If we apply (4.27) to an eigenvalue-eigenvector relation Au = λu, we get

Sp(u*A*Au) ≤ Sp(A*A)Sp(u*u).

But Sp(u*A*Au) = Sp(λ̄λu*u) = |λ|²Sp(u*u). Thus, for any eigenvalue,

|λ|² ≤ Σi=1..n Σk=1..n |aik|².

Still another relation, for real square matrices, is Schur's inequality, which states that

λ1² + λ2² + ... + λn² = Sp(A²) ≤ Sp(A*A) = Σi,j=1..n aij².

Proof is accomplished directly. Sp(A²) is Σi=1..n aii² + Σi=1..n Σj≠i aij aji. But for any two real numbers a and b, 2ab ≤ a² + b². Therefore, aij aji ≤ (1/2)(aij² + aji²), and

Sp(A²) ≤ Σi=1..n aii² + Σj,i=1..n; j≠i (1/2)(aij² + aji²) = Sp(A*A).

We conclude this section with the famous Cayley-Hamilton theorem, which states that if φ(λ) is the characteristic function for the matrix A, then φ(A) is the null matrix. To understand this, note that for positive integral values of k,

A^k - λ^kI = (A - λI)(A^(k-1) + λA^(k-2) + ... + λ^(k-1)I).

This means that

φ(A) - φ(λ)I = (A - λI)Ψ,

where Ψ is a polynomial in λ and A. Since adj(A - λI) is composed from the cofactors of A - λI, it also is a polynomial in λ with matric coefficients. Moreover, φ(λ)I = (A - λI) adj(A - λI). Adding this to the above, we obtain

φ(A) = (A - λI)C,

where C is a polynomial in λ with matric coefficients. Since the right side will involve λ nontrivially if C is not zero, and since φ(A) is independent of λ, it follows that φ(A) is zero.

Example. As an illustration, consider again the matrix A of page 222. A² and A³ are found to be

A² =   133  126  -42        A³ =   1715   2058  -686
       -70  336   35              -1862   6468   931
      -336  672  217              -7644  15288  4165

The characteristic function is -λ³ + 42λ² - 539λ + 2058, and it is found that -A³ + 42A² - 539A + 2058I = 0.
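Both results of the last two examples can be reproduced in a few lines. The entries of A below are reconstructed from the printed eigenvectors and powers (they reproduce A², A³, and the eigenvalues 7, 14, 21 quoted above, but should be checked against an original copy of the text):

```python
import numpy as np

A = np.array([[ 11.,  6., -2.],
              [ -2., 18.,  1.],
              [-12., 24., 13.]])

lam = np.sort(np.linalg.eigvals(A).real)        # expected: 7, 14, 21

# Eigenvalues of g(A) are g(lambda_i); here g(x) = x^2 - 4 as in the example.
gA = A @ A - 4.0 * np.eye(3)
mu = np.sort(np.linalg.eigvals(gA).real)        # expected: 45, 192, 437

# Cayley-Hamilton: phi(A) = -A^3 + 42 A^2 - 539 A + 2058 I must vanish.
A2 = A @ A
A3 = A2 @ A
phiA = -A3 + 42.0 * A2 - 539.0 * A + 2058.0 * np.eye(3)
```

Since all the intermediate products are integer-valued, phiA comes out as the exact null matrix, just as the theorem requires.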
4.5 Symmetric and Hermitian Matrices

Recall that a matrix A is Hermitian if and only if A = A* = Ā'. A is symmetric if and only if A = A'. Thus A is symmetric and Hermitian if and only if A is real and self-transpose.

An Hermitian form is an expression of the type

Σi=1..n Σj=1..n aij ūi uj,

where the ui are construed as complex variables and the aij as constants. It is apparent that such a form may always be written

(u, Au), where A = A*.  (4.29)

A quadratic form is an expression of the type

Σi=1..n Σj=1..n aij ui uj,

where the aij are construed as constants and the ui as variables. We are here concerned only with real quadratic forms. In that event, the quadratic form may be written as

(u, Au), where A = Ā = A', and u is real.  (4.30)

If u and A satisfy either (4.29) or (4.30), then:
(a) For all u, (u, Au) is real.
(b) Every eigenvalue of A is real.
(c) Eigenvectors corresponding to distinct eigenvalues are orthogonal.

To prove (a), note that (u, Au) = (A*u, u) = (Au, u), the conjugate of (u, Au) (see equations (4.2) and (4.7)). To show (b), (u, Au) = (u, λu) = λ(u, u); since (u, Au) and (u, u) are real, so is λ. In case (c), (u1, Au2) = λ2(u1, u2) and (u1, Au2) = (Au1, u2) = λ̄1(u1, u2) = λ1(u1, u2). Then λ1 ≠ λ2 and λ1(u1, u2) = λ2(u1, u2) imply (u1, u2) = 0.

An example of a quadratic form is given by 4x1² - 2x1x2 + 4x2² - 2x2x3 + 4x3² - 2x3x4 + 4x4². Since this may be written as 3x1² + (x1 - x2)² + 2x2² + (x2 - x3)² + 2x3² + (x3 - x4)² + 3x4², it is clear that the form is positive definite (see below).

A matrix A is unitary if and only if A* = A⁻¹. A real matrix A is orthogonal if and only if A' = A⁻¹. A matrix A is isometric if and only if, for all n-dimensional vectors u, v, it is true that (u, v) = (Au, Av). If w = Au be interpreted as a change of variable, it is seen that for an isometric transformation, (u, u) = (Au, Au) = (w, w); therefore, length is preserved. Clearly a real unitary matrix is an orthogonal matrix. It develops that a matrix A is isometric if and only if it is unitary. For suppose A is unitary; then (Au, Av) = (A*Au, v) = (u, v). Next consider (Au, Av) = (u, v) for all u and v. It is true, in particular, if v = (A*A - I)u. Then (Au, A(A*A - I)u) = (u, (A*A - I)u) implies (A*Au, (A*A - I)u) = (u, (A*A - I)u), or ((A*A - I)u, (A*A - I)u) = 0 for all u. This means A*A - I = 0, or A* = A⁻¹.

It is next indicated that if A is any Hermitian matrix, then there exists a unitary matrix P such that

P*AP = Λ,  (4.31)

where Λ = diag[λ1, λ2, ..., λn] and λ1, λ2, ..., λn are the (real) eigenvalues of A. Should the eigenvalues be distinct, there are n orthogonal eigenvectors, thus n linearly independent eigenvectors. Then it follows, as in establishing (4.20), that (4.31) is true. To guarantee that P* = P⁻¹, we need only require that each eigenvector be normalized in forming P. We then see by direct multiplication that P*P = I, hence P* = P⁻¹. No proof is attempted here for the case of multiple eigenvalues, though the theorem remains true (see Jacobi's method, page 250).

An Hermitian matrix A is positive definite if and only if, for all u ≠ 0, (u, Au) > 0. Such a matrix is positive semidefinite if and only if, for u ≠ 0, (u, Au) ≥ 0. Corresponding statements define negative definite and negative semidefinite Hermitian matrices. Hermitian matrices thus far not characterized are called indefinite.

It is a simple matter to verify that for the real Hermitian matrix

  -4   1   0
   1  -4   1
   0   1  -4

we may use Λ = diag[-4 - √2, -4, -4 + √2] and a matrix P whose columns are the corresponding normalized eigenvectors.

Recall that the leading submatrices of a square matrix A are those square matrices of order m, 1 ≤ m ≤ n, formed from the first m rows and columns of A, and that the corresponding determinants have been denoted Δ1, Δ2, ..., Δm, ..., Δn. Let the submatrices be denoted A(1), A(2), ..., A(m), ..., A(n). A theorem of consequence concerning Hermitian matrices states that between two distinct eigenvalues of A(k+1) there lies at least one eigenvalue of A(k), which may coincide with either of these two eigenvalues. Moreover, if λ is a multiple eigenvalue of order p for A(k+1), then it is an eigenvalue of order at least p - 1 for A(k). For brevity in presentation, let

C = A(k+1) = [B w; w* β],  B = A(k) = UΛU*,

where U is unitary and Λ is the matrix diag(λ1, λ2, ..., λk). Let λ be an eigenvalue of C and [v' α]' the corresponding eigenvector, v being a k x 1 matrix. Thus

[B w; w* β][v; α] = λ[v; α],

whence

Bv + αw = λv,  w*v + αβ = λα.  (4.32)
Let v = Ux. Then the first of these equations (4.32) matrix itself has the characteristic function 209 +232A
yields UAx + ctw = 3.Ux; then Ax ctU*w = Lx and + +
+93A2 16A3+ A4 and the eigenvalues (-9- J5)/2,
(A1 - A)x = ctU*w. When the latter is substituted in the (-7 - J5)/2, (-9 + J5)/2, and (-7 + JS)/2.
second equation of (4.32), there results
If the matrix A is Hermitian, each of the following
w*U(2I - A)-'otU*w +
crp = Aa. statements give a necessary and sufficient condition for
This is, of course, valid only if ;l is not an eigenvalue A to be positive definite:
of B. (a) All the eigenvalues of A are positive.
The factor a may be removed, since we may view the (b) The coefficients of the characteristic equation for
components of the matrix. A as variables. Since 11- A is A alternate in sign.
diagonal, its inverse is diag (p, p2 - - . p,) where pi = (c) Each leading submatrix of A is positive definite.
1/(A - Ai). Using this leads to the equation (d) A, > 0, 1 < rn < n.
First consider (a). If A is positive definite, then (u, Au)
> 0, (u, u) > 0 and Au = Au imply 1> 0, because
(u, Au) = A(u, u). Now suppose A,, A,, . . ., 2, positive.
where U = [u,, u,, . . ., ukl. Should (w, uj) = 0, it is easy
to verify that [:] = [z] is an eigenvector and 2, an
Since A is Hermitian, it has n corresponding eigenvectors
x,, x2, . . ., x, that are orthonormal. Then for any u,
eigenvalue for A',+ ''. Should ilj be a multiple root such
that for 1 different indices>/,,j,, . . .,j,, A, is the eigenvalue whence
for uj, and (w, uj,) # 0, then ibj is an eigenvalue of multi-
plicity at least I - 1 for A(kf". This is illustrated for U = (x,, u)xl f (x,, U)X, + .. + (xn,U>X,.
'

1 = 2, and for convenience we suppose (w, u,) and (w, u,) Moreover,
are not zero, while both u, and u, have the eigenvalue
= A,(x,, u)x, + A,(x,, u)x, + . . . + ;l,(xn, u)x,,
A,. Then investigation shows that the vector
so that
with proper choice of y and 6, will serve as an eigen-
vector of A⁽ᵏ⁺¹⁾ corresponding to the eigenvalue λ₁. We need only require that γw*u₁ + δw*u₂ = 0, with γ and δ not both zero. Thus under all circumstances (4.33) leads to a polynomial equation of requisite degree for determining the eigenvalues of C distinct from those of B. (If all the eigenvalues of B are eigenvalues of C, and (w, uⱼ) = 0 for all values of j, then λ = μ is the new eigenvalue and may equal a previous eigenvalue.) The remaining details are omitted, with the remark that a sketch of a simple situation involving real values of λᵢ is informative. This sketch may take the form of plotting z = λ − μ and

    z = Σⱼ₌₁ⁿ |(w, uⱼ)|²/(λ − λⱼ)

on the same coordinate plane.

To establish (b), observe that if the coefficients of the characteristic equation alternate in sign, there can be no negative root. Since the matrix is Hermitian, all roots are real, hence all are positive. Then, by (a), A is positive definite. Conversely, if A is positive definite, then all eigenvalues are positive and the coefficients of the characteristic equation alternate in sign.

Consider next (c) and suppose A is positive definite. Let u = [u₁, u₂, . . ., uₘ, 0, . . ., 0]' be nonzero, yet such that the last n − m entries are zeros. The inner product (u, Au) > 0; and it is also the inner product for an arbitrary vector in m-dimensional space, one which uses as its matrix the leading submatrix of order m. Hence this submatrix is positive definite. The condition of part (c) is obviously sufficient, since A itself is then positive definite.

Turning to part (d), suppose A is positive definite. Since the determinant of a matrix is the product of its eigenvalues, from part (c) it follows that each Δₘ > 0. To see that the condition of (d) is sufficient, proceed by induction. The proposition is obviously true for n = 1. Now assume the proposition to be true for all Hermitian matrices of order k and consider one, say A⁽ᵏ⁺¹⁾, of order k + 1. Let λ be (if possible) a negative eigenvalue for A⁽ᵏ⁺¹⁾. Then since Δₖ₊₁ > 0, and Δₖ₊₁ is the product of all eigenvalues for A⁽ᵏ⁺¹⁾, A⁽ᵏ⁺¹⁾ has a second negative eigenvalue. By the theorem just presented, A⁽ᵏ⁾ has an eigenvalue which is negative. This cannot be; thus the eigenvalues of A are positive and A is positive definite.

Example. An example is furnished by the matrix

    A = [ -4   1   0 ]
        [  1  -4   1 ]
        [  0   1  -4 ]

The leading submatrix [−4] has the characteristic function −4 − λ and the eigenvalue −4. The leading submatrix of order two has the characteristic function 15 + 8λ + λ² and the eigenvalues −3 and −5. The leading submatrix of order three has the characteristic function −56 − 46λ − 12λ² − λ³ and the eigenvalues −4 − √2, −4, and −4 + √2.
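Criterion (d) is easy to apply numerically. A minimal sketch in modern Python (not part of the book's programs; the function names are ours) checks whether every leading principal minor is positive:

```python
def det(m):
    # Determinant by cofactor expansion along the first row
    # (adequate for the small matrices of this chapter).
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0.0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total

def is_positive_definite(a):
    # Part (d): a real symmetric matrix is positive definite if and
    # only if every leading principal minor is positive.
    n = len(a)
    return all(det([row[:m] for row in a[:m]]) > 0 for m in range(1, n + 1))

# A sample symmetric matrix whose leading minors are 13, 101, and 5400:
A = [[13.0, 4.0, 4.0],
     [4.0, 9.0, -3.0],
     [4.0, -3.0, 57.0]]
print(is_positive_definite(A))   # True: minors are 13, 101, 5400
```

For the 3 × 3 example above, the same test reports the minors −4, 15, −56, so the matrix is not positive definite.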
226 Matrices and Related Topics

Example. An example of a positive definite quadratic form is 13y₁² + 9y₂² + 57y₃² + 8y₁y₂ + 8y₁y₃ − 6y₂y₃. That this is positive definite may be seen by writing it as 5y₁² + (2y₁ + 2y₂)² + 4y₂² + (2y₁ + 2y₃)² + 44y₃² + (y₂ − 3y₃)². When written in the form (y, Ay), where y' = [y₁ y₂ y₃], the matrix is

    A = [ 13   4   4 ]
        [  4   9  -3 ]
        [  4  -3  57 ]

The first two leading submatrices have the corresponding quadratic forms 13y₁² and 13y₁² + 8y₁y₂ + 9y₂². These are seen to be positive definite. The characteristic function is 5400 − 1330λ + 79λ² − λ³ and the eigenvalues are positive, since the functional values for λ = 0, 10, 20, and 60 are 5400, −1000, 2400, and −6400, respectively.

For positive definite Hermitian matrices A, it is true that

    λₙ ≤ (u, Au)/(u, u) ≤ (Au, Au)/(u, Au) ≤ λ₁,   (4.34)

where λₙ is the least and λ₁ the greatest of the eigenvalues of A, and u is any vector of proper dimension. That

    (u, Au)/(u, u) ≤ (Au, Au)/(u, Au)

follows quickly from (4.28) if for v we use u and for w we use Au [note too that (u, Au) = (Au, u)]. Now let Λ = diag (λ₁, λ₂, . . ., λₙ), and A = P*ΛP. Then

    (u, Au)/(u, u) = (Pu, ΛPu)/(Pu, Pu) = (x, Λx),

where x = Pu/‖Pu‖; thus (x, x) = 1. But

    λₙ = λₙ Σᵢ xᵢx̄ᵢ ≤ Σᵢ λᵢxᵢx̄ᵢ = (x, Λx) ≤ λ₁ Σᵢ xᵢx̄ᵢ = λ₁.

Now let u = A⁻¹v. Then

    (Au, Au)/(u, Au) = (v, v)/(A⁻¹v, v) = (v, v)/(v, A⁻¹v),

and (v, A⁻¹v)/(v, v) = (y, Λ⁻¹y), where y = Pv/‖Pv‖ and (y, y) = 1. Since

    (y, Λ⁻¹y) = Σᵢ yᵢȳᵢ/λᵢ ≥ (1/λ₁) Σᵢ yᵢȳᵢ = 1/λ₁,

it follows that

    (Au, Au)/(u, Au) = (v, v)/(v, A⁻¹v) ≤ λ₁.

Since the guarantee for convergence using the power method (Section 4.6) is based on the existence of n linearly independent eigenvectors, and since Hermitian matrices fulfill this requirement, the method is adapted to them. Moreover, once an eigenvalue and its corresponding eigenvector have been found, there is in this case a particularly simple theory for continuing. Let the eigenvalues and corresponding eigenvectors be λ₁, λ₂, . . ., λₙ and u₁, u₂, . . ., uₙ (ordering by magnitude is not demanded). The eigenvectors must be orthogonal and we may, in addition, require that they be of unit length. Let A₁ = A, and define recursively

    Aₖ₊₁ = Aₖ − λₖuₖuₖ*.

For k = 1, it is a simple matter to see that A₂u₁ = A₁u₁ − λ₁u₁(u₁, u₁) = λ₁u₁ − λ₁u₁ = 0, and that, for i ≠ 1, A₂uᵢ = A₁uᵢ − λ₁u₁(u₁, uᵢ) = A₁uᵢ = λᵢuᵢ. Thus A₂ has the same eigenvalues and eigenvectors as A₁, except for λ₁, which has been replaced by 0. It is seen that the process continues: the eigenvectors are retained, and the eigenvalues are successively replaced by zero.

If, in addition, the Hermitian matrix is positive definite, all eigenvalues are real and positive. It is still necessary to consider multiple roots for the characteristic equation.

4.6 The Power Method of Mises

In Sections 4.6-4.9, we consider various methods for determining eigenvalues and eigenvectors. For additional information concerning such methods, see Bodewig [1] and Faddeev and Faddeeva [5].

When the eigenvalues of a matrix A are so ordered that |λ₁| ≥ |λ₂| ≥ ··· ≥ |λₙ|, and when |λ₁| > |λ₂| or λ₁ = λ₂ = ··· = λᵣ and |λᵣ| > |λᵣ₊₁|, it is customary to call λ₁ the dominant eigenvalue. If the matrix is of order n, has n linearly independent eigenvectors u₁, u₂, . . ., uₙ, and a dominant eigenvalue λ₁, then λ₁ and an associated eigenvector, w₁, of unit length, can be approximated as follows. Suppose first that |λ₁| > |λ₂|. Let v₀ be an (almost) arbitrary vector to be described more exactly later. Define the sequence vₘ by

    vₘ = Avₘ₋₁/‖Avₘ₋₁‖,   m ≥ 1.   (4.35)

Then

    limₘ→∞ vₘ = w₁,   limₘ→∞ ‖Avₘ₋₁‖ = |λ₁|.   (4.36)

To see this, let v₀ be any vector:

    v₀ = c₁u₁ + c₂u₂ + ··· + cₙuₙ,

provided only that c₁ ≠ 0. (Since u₁ is not known, there is an element of hazard in choosing v₀. However, even though the v₀ chosen should have a zero component involving u₁, a round-off error might eventually provide such a component.) Then

    Av₀ = c₁λ₁u₁ + c₂λ₂u₂ + ··· + cₙλₙuₙ,

and, in general,

    vₘ = βₘ(c₁λ₁ᵐu₁ + c₂λ₂ᵐu₂ + ··· + cₙλₙᵐuₙ),

where βₘ is merely a normalizing factor. For large values of m, this is substantially βₘc₁λ₁ᵐu₁, which in turn is ±u₁/‖u₁‖.
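A minimal sketch of the iteration (4.35)-(4.36) in modern Python (the function name is ours, not part of the book's programs):

```python
def power_method(a, v0, tol=1e-9, itmax=200):
    # Iterates v_m = A v_{m-1} / ||A v_{m-1}||, equation (4.35);
    # by (4.36) the norms ||A v_{m-1}|| converge to |lambda_1|.
    n = len(a)
    v = v0[:]
    length_old = 0.0
    for _ in range(itmax):
        y = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
        length = sum(yi * yi for yi in y) ** 0.5
        v = [yi / length for yi in y]
        if length_old and abs((length - length_old) / length_old) < tol:
            break
        length_old = length
    return length, v   # |lambda_1| and an approximate unit eigenvector

# Dominant eigenvalue of a 2 x 2 example (exact eigenvalues 3 and 1):
lam, w = power_method([[2.0, 1.0], [1.0, 2.0]], [1.0, 0.0])
print(round(lam, 6))   # 3.0
```

The convergence test on successive vector lengths anticipates the one used in Example 4.2 below.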
Should the dominant eigenvalue have multiplicity r, it is seen that vₘ is substantially

    βₘλ₁ᵐ(c₁u₁ + c₂u₂ + ··· + cᵣuᵣ),

and (4.36) is still valid if w₁ be interpreted as one of a family of unit eigenvectors associated with λ₁. It is possible to find the remaining eigenvectors associated with λ₁ by repeating the process using a different vector v₀, the procedure being most effective if the multiplicity is known. An alternate procedure is to solve the system of linear equations Au = λ₁u.

When an eigenvalue λ₁ (not necessarily dominant) and an associated eigenvector w₁ have been found for matrix A, it is often possible to proceed with the process. Suppose, for example, that a unit vector h₁ can be found such that A*h₁ = λ̄₁h₁. (Note that λ̄₁ is an eigenvalue for A*.) Then let B be defined by

    B = A − λ₁h₁h₁*.

Observe that if Aw = λw and λ ≠ λ₁, then h₁*w = 0. For h₁*Aw = λh₁*w and also h₁*Aw = (A*h₁)*w = (λ̄₁h₁)*w = λ₁h₁*w. Then from λ₁h₁*w = λh₁*w, we find h₁*w = 0. This means that Bw = Aw − λ₁h₁ × 0 = Aw = λw. Thus B has all the eigenvalues of A that are different from λ₁. Also the trace of B is λ₁ less than the trace of A. For Sp(h₁h₁*) = (h₁, h₁) = 1; hence Sp(λ₁h₁h₁*) = λ₁, and Sp(B) = Sp(A) − λ₁. Therefore, if λ₁ is not a multiple root of the characteristic equation for A, the matrix B will yield all the remaining eigenvalues and eigenvectors of A. Compare this procedure, when A = A*, with the method described at the end of Section 4.5.

In some physical problems, one is primarily concerned with the dominant eigenvalue. In others, the eigenvalue of concern is the one of least absolute value. It is now shown that the eigenvalues of A⁻¹ are the reciprocals of those for A. For A nonsingular, det(A − (1/μ)I) = 0 implies det((1/μ)(Aμ − I)) = 0, hence det(Aμ − I) = 0. Then since det(A⁻¹(Aμ − I)) = det(A⁻¹) det(Aμ − I), it follows that det(μI − A⁻¹) = 0. Procedures for finding A⁻¹ are covered in Chapter 5.

At this point, it is worth noting that every polynomial of the form Σᵢ₌₀ⁿ aᵢxⁿ⁻ⁱ, where a₀ = 1, determines a companion matrix

    A = [ -a₁  -a₂  ...  -aₙ₋₁  -aₙ ]
        [  1    0   ...    0     0  ]
        [  0    1   ...    0     0  ]
        [  .    .          .     .  ]
        [  0    0   ...    1     0  ]

and that its characteristic function is precisely

    (−1)ⁿ(xⁿ + a₁xⁿ⁻¹ + ··· + aₙ).

This can be verified by induction or by multiplying each column in A − xI in succession by x (starting with the first) and adding to the next. When the resulting determinant is evaluated in terms of the last column, the statement becomes evident. Now use this companion matrix in connection with Mises' process, and for the initial vector v₀ use

    v₀ = [uₙ₋₁, uₙ₋₂, . . ., u₀]',

where the meaning of the components of v₀ is that given in Bernoulli's method. For this particular matrix, the two methods are seen to coincide. Also, note that methods of finding eigenvalues, which do not depend on direct solution of the characteristic equation, might prove useful in finding the zeros of polynomial functions.
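As a small illustration of this remark (a modern Python sketch, with our own helper names, not the book's code): the dominant zero of x² − 5x + 6, whose zeros are 3 and 2, is obtained by applying the power method to the companion matrix.

```python
def companion(coeffs):
    # coeffs = [a1, a2, ..., an] for x^n + a1 x^(n-1) + ... + an;
    # the first row holds -a1 ... -an, with 1's on the subdiagonal.
    n = len(coeffs)
    c = [[0.0] * n for _ in range(n)]
    c[0] = [-a for a in coeffs]
    for i in range(1, n):
        c[i][i - 1] = 1.0
    return c

def dominant_root(coeffs, steps=100):
    # Power iteration on the companion matrix; the Rayleigh quotient
    # (v, Av)/(v, v) of the converged unit vector recovers the sign.
    a = companion(coeffs)
    n = len(a)
    v = [1.0] + [0.0] * (n - 1)
    for _ in range(steps):
        y = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
        length = sum(yi * yi for yi in y) ** 0.5
        v = [yi / length for yi in y]
    y = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(v[i] * y[i] for i in range(n))

print(round(dominant_root([-5.0, 6.0]), 6))   # 3.0
```

The same call with coefficients [1.0, -6.0] (that is, x² + x − 6) returns the negative dominant zero −3, since the Rayleigh quotient carries the sign that the norm alone loses.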
EXAMPLE 4.2
THE POWER METHOD

Problem Statement

Write a program, based on the method of Section 4.6, that determines the eigenvalues and eigenvectors of an n × n real matrix A.

Method of Solution

The method for determining λ₁ and u₁ is expressed in equations (4.35) and (4.36). Note that after the mth iteration, the old eigenvector estimate vₘ₋₁ is discarded in favor of the new vₘ. Hence, we need only use a single vector, v, for containing successive such estimates, apart from the starting vector v₀, which is reserved for an additional purpose indicated below.

An alternative procedure is used here for finding the remaining eigenvalues and eigenvectors, assuming that there are no repeated eigenvalues. To obtain λ₂, first observe that the vector

    (A − λ₁I)v₀ = (λ₂ − λ₁)c₂u₂ + (λ₃ − λ₁)c₃u₃ + ··· + (λₙ − λ₁)cₙuₙ

does not contain a component u₁. Thus, a repetition of the procedure with a starting vector Bv₀, where B = A − λ₁I, will yield λ₂ and u₂. Similarly, once λ₁ and λ₂ are known, a starting vector Bv₀, where B = (A − λ₁I)(A − λ₂I), will generate λ₃ and u₃, etc.

Because round-off error is likely to introduce unwanted components involving u₁, u₂, etc., we must periodically eliminate such quantities. This is achieved by replacing the current approximation v with Bv/‖Bv‖ after every m_freq iterations.

Computations for each eigenvalue are discontinued when, from one iteration to the next, there is little further fractional change in the length of v. Thus, if l = ‖Avₘ‖ and l₀ = ‖Avₘ₋₁‖, convergence is assumed if

    |(l − l₀)/l₀| < ε,

where ε is a preassigned small quantity. If convergence does not occur within m_max iterations, the calculations are discontinued.

From equation (4.36), note that the power method yields only the magnitude of the eigenvalues. To append their correct signs, we must compare the first nonzero elements of v on two successive iterations. If these elements are of the same sign, then λᵢ = l; otherwise, λᵢ = −l.

Flow Diagram

(The flow diagram for this program is not reproduced here.)

FORTRAN Implementation
List of Principal Variables
Program Symbol   Definition
A                Matrix A, whose eigenvalues are required.
B                Repeated product matrix, B = (A − λ₁I)(A − λ₂I) ···.
C, D             Matrices used for intermediate storage.
EPS              Tolerance, ε, used in convergence test.
IDENT            Identity matrix, I.
L                Length, l, of v.
LZERO            l₀, used for temporary storage of l.
LAMBDA           Vector containing eigenvalues, λᵢ.
M                Iteration counter, m.
N                Number of rows, n, of matrix A.
MFREQ            Number of iterations, m_freq, between periodic reorthogonalization of v.
MMAX             Maximum number of iterations permitted, m_max.
U                Matrix U whose columns contain the eigenvectors of A.
V                Current approximation, v, to eigenvector.
VZERO            Starting guess, v₀, for eigenvector.
Y                Vector for temporary storage, y.

Program Listing
C     APPLIED NUMERICAL METHODS, EXAMPLE 4.2
C     POWER METHOD FOR DETERMINING EIGENVALUES AND EIGENVECTORS.
C
C     THE FOLLOWING PROGRAM MAKES EXTENSIVE USE OF THE MATRIX
C     OPERATIONS DEFINED IN THE SUBROUTINES OF EXAMPLE 4.1
C
      IMPLICIT REAL*8(A-H, O-Z)
      REAL*8 L, LZERO, LAMBDA, IDENT
      DIMENSION A(10,10), B(10,10), C(10,10), D(10,10), IDENT(10,10),
     1 U(10,10), LAMBDA(10), V(10), VZERO(10), Y(10)
C
C     ..... READ AND CHECK INPUT DATA .....
      READ (5,100) N, MMAX, MFREQ, EPS, (VZERO(I), I = 1, N)
      WRITE (6,200) N, MMAX, MFREQ, EPS
      WRITE (6,207) (VZERO(I), I = 1, N)
      READ (5,101) ((A(I,J), J = 1, N), I = 1, N)
      WRITE (6,201)
      DO 15 I = 1, N
   15 WRITE (6,207) (A(I,J), J = 1, N)

C
C     ..... INITIALLY EQUATE B TO THE IDENTITY MATRIX .....
      DO 2 I = 1, N
      DO 2 J = 1, N
    2 IDENT(I,J) = 0.
      DO 3 I = 1, N
    3 IDENT(I,I) = 1.
      CALL MATEQ (IDENT, B, N, N)

C
C     ..... PERFORM POWER METHOD FOR ALL N EIGENVALUES .....
      DO 11 I = 1, N
C
C     ..... MODIFY STARTING VECTOR SO THAT IT IS ORTHO-
C     GONAL TO ALL PREVIOUSLY COMPUTED EIGENVECTORS .....
      CALL MATVEC (B, VZERO, V, N, N)
C
C     ..... PERFORM SUCCESSIVE POWER METHOD ITERATIONS .....
      DO 5 M = 1, MMAX
C
C     ..... PERIODICALLY RE-ORTHOGONALIZE THE VECTOR V .....
      IF ((M/MFREQ)*MFREQ .NE. M) GO TO 4
      CALL MATVEC (B, V, Y, N, N)
      CALL VECLEN (Y, L, N)
      CALL SCAVEC (1.0/L, Y, V, N)

C
C     ..... COMPUTE NEW VECTOR V AND ITS LENGTH .....
    4 CALL MATVEC (A, V, Y, N, N)
      CALL VECLEN (Y, L, N)
      CALL SCAVEC (1.0/L, Y, V, N)
C
C     ..... CHECK FOR CONVERGENCE .....
      IF (DABS((L - LZERO)/LZERO) .LT. EPS) GO TO 7
    5 LZERO = L

C
C     ..... SALVAGE PARTIAL RESULTS IF METHOD DID NOT CONVERGE .....
      IM1 = I - 1
      WRITE (6,202) I, M, (LAMBDA(K), K = 1, IM1)
      WRITE (6,203)
      DO 6 K = 1, N
    6 WRITE (6,207) (U(K,J), J = 1, IM1)
      GO TO 1

C
C     ..... ESTABLISH THE SIGN OF THE EIGENVALUE .....
    7 CALL MATVEC (A, V, Y, N, N)
      DO 8 K = 1, N
      IF (DABS(V(K)) .LT. 1.0D-3) GO TO 8
      IF (V(K)*Y(K) .LT. 0.0) L = -L
      GO TO 9
    8 CONTINUE

Program Listing (Continued)


C
C     ..... STORE CURRENT EIGENVALUE AND EIGENVECTOR .....
    9 LAMBDA(I) = L
      DO 10 K = 1, N
   10 U(K,I) = V(K)
      WRITE (6,204) I, M, L
C
C ..... MODIFY MATRIX B .....
I F ( I .GE. N ) GO TO 11
CALL SCAMAT (L, IDENT, C, N, N)
      CALL MATSUB (A, C, D, N, N)
CALL MATMLT (D, B, C, N, N, N )
CALL MATEQ (C, B, N, N )
11 CONTINUE
C
C     ..... PRINT EIGENVALUES AND EIGENVECTORS .....
      WRITE (6,205) (LAMBDA(I), I = 1, N)
      WRITE (6,206)
      DO 12 I = 1, N
   12 WRITE (6,207) (U(I,J), J = 1, N)
      GO TO 1
C
C     ..... FORMATS FOR INPUT AND OUTPUT STATEMENTS .....
  100 FORMAT (3(12X, I3), 12X, E8.2/ (10X, 8F5.1))
  101 FORMAT (10X, 8F5.1)
  200 FORMAT (1H1, 4X, 47H POWER METHOD FOR DETERMINING EIGENVALUES, WIT
     1H/ 1H0, 6X, 10H N      = , I4/ 7X, 10H MMAX   = , I4/
     2 7X, 10H MFREQ  = , I4/ 7X, 10H EPS    = , E12.3/ 1H0,
     3 4X, 39H STARTING VECTOR VZERO(1)...VZERO(N) IS)
  201 FORMAT (1H0, 4X, 39H THE STARTING MATRIX A(1,1)...A(N,N) IS)
  202 FORMAT (1H0, 4X, 37H NO CONVERGENCE.  PARTIAL RESULTS ARE/ 1H0,
     1 6X, 6H I = , I2, 10X, 9H M = , I3/ 1H0,
     2 6X, 27H LAMBDA(1)...LAMBDA(I-1) = / (7X, 10F11.6))
  203 FORMAT (1H0, 4X, 27H FIRST I-1 EIGENVECTORS ARE)
  204 FORMAT (1H0, 6X, 6H I = , I2, 5X, 9H M = , I3,
     1 5X, 6H L = , F11.6)
  205 FORMAT (1H0, 4X, 38H EIGENVALUES LAMBDA(1)...LAMBDA(N) ARE/
     1 (7X, 10F11.6))
  206 FORMAT (1H0, 4X, 39H EIGENVECTORS ARE SUCCESSIVE COLUMNS OF)
  207 FORMAT (7X, 10F11.6)
C
      END

Data

N     = 3, MMAX = 100, MFREQ = 15, EPS = 1.0E-9
VZERO =  1.   1.   1.
A     = 11.   6.  -2.
        -2.  18.   1.
       -12.  24.  13.
N = 4, MMAX = 50, MFREQ = 1, EPS = 1.OE-9
VZERO = 1. 0. 0. 0.
A = 10. 9. 7. 5. 9. 10. 8. 6.
7. 8. 10. 7. 5. 6. 7. 5.
N = 8, MMAX = 300, MFREQ = 10, EPS = 1.OE-9
VZERO = 1. 0. 0. 0. 0. 0. 0. 0.
A = 2. -1. 0. 0. 0. 0. 0. 0.
-1. 2. -1. 0. 0. 0. 0. 0.
0. -1. 2. -1. 0. 0. 0. 0.
0. 0. -1. 2. -1. 0. 0. 0.
0. 0. 0. -1. 2. -1. 0. 0.
0. 0. 0. 0. -1. 2. -1. 0.
0. 0. 0. 0. 0. -1. 2. -1.
0. 0. 0. 0. 0. 0. -1. 2.
Computer Output
Results for the 1st Data Set
POWER METHOD FOR DETERMINING EIGENVALUES, WITH
N = 3
MMAX = 100
MFREQ = 15
EPS = 0.100D-08
STARTING VECTOR VZERO(1)...VZERO(N) IS
1.000000 1.000000 1.000000
THE STARTING MATRIX A(1,1)...A(N,N) IS
 11.000000   6.000000  -2.000000
 -2.000000  18.000000   1.000000
-12.000000  24.000000  13.000000

EIGENVALUES LAMBDA(1)...LAMBDA(N) ARE
 21.000000  14.000000   7.000000

EIGENVECTORS ARE SUCCESSIVE COLUMNS OF
  0.000000   0.894427  -0.447214
  0.316228   0.447214   0.000000
  0.948683  -0.000000  -0.894427

Results for the 2nd Data Set


POWER METHOD FOR DETERMINING EIGENVALUES, WITH
N     =    4
MMAX  =   50
MFREQ =    1
EPS   =    0.100D-08
STARTING VECTOR VZERO(l)...VZERO(N) IS
1.000000 0.0 0.0 0.0
THE STARTING MATRIX A(1,1)...A(N,N) IS
10.000000   9.000000   7.000000   5.000000
 9.000000  10.000000   8.000000   6.000000
 7.000000   8.000000  10.000000   7.000000
 5.000000   6.000000   7.000000   5.000000

EIGENVALUES LAMBDA(1)...LAMBDA(N) ARE
30.288685   3.858057   0.843107   0.010150
EIGENVECTORS A R E S U C C E S S I V E COLUMNS OF
0.520925 -0.625397 0.567641 -0.123697

Computer Output (Continued)


Results for the 3rd Data Set
POWER METHOD FOR DETERMINING EIGENVALUES, WITH

N = 8
MMAX = 300
MFREQ = 10
EPS = 0.100D-08

STARTING VECTOR VZERO(1)...VZERO(N) IS

1.000000   0.0   0.0   0.0   0.0   0.0   0.0   0.0

THE STARTING MATRIX A(1,1)...A(N,N) IS

 2.000000  -1.000000   0.0        0.0        0.0        0.0        0.0        0.0
-1.000000   2.000000  -1.000000   0.0        0.0        0.0        0.0        0.0
 0.0       -1.000000   2.000000  -1.000000   0.0        0.0        0.0        0.0
 0.0        0.0       -1.000000   2.000000  -1.000000   0.0        0.0        0.0
 0.0        0.0        0.0       -1.000000   2.000000  -1.000000   0.0        0.0
 0.0        0.0        0.0        0.0       -1.000000   2.000000  -1.000000   0.0
 0.0        0.0        0.0        0.0        0.0       -1.000000   2.000000  -1.000000
 0.0        0.0        0.0        0.0        0.0        0.0       -1.000000   2.000000

I = 1   M =  97   L = 3.879385

I = 2   M = 137   L = 3.532089

I = 3   M =  95   L = 3.000000

I = 4   M =  56   L = 2.347296

I = 5   M =  34   L = 1.652704

I = 6   M =  19   L = 1.000000

I = 7   M =  11   L = 0.467911

EIGENVALUES LAMBDA(l)...LAMBDA(N) ARE


3.879385 3.532089 3.000000 2.347296 1.652704

EIGENVECTORS ARE SUCCESS l VE COLUMNS


0.161294 0.303060 0.408279
-0.303111 -0.464290 -0.408259
0.408334 0.408248 -0.000027
-0.464277 -0.161183 0.408268
0.464209 -0.161277 -0.408228
-0.408162 0.408248 -0.000027
0.302915 -0.164196 0.408238
-0.161166 0.302966 -0.408218
Discussion of Results

The power method has been applied to three matrices. The results for the first matrix agree completely with the exact values given on page 222. The second matrix has extremely well-spaced eigenvalues, and convergence rapidly occurs to values that are virtually identical with those computed in Examples 4.3 and 4.4. The third matrix arises typically when treating unsteady-state heat conduction in a slab as a characteristic-value problem (see Section 7.26). Although the eigenvalues again agree, to six significant figures, with those computed in Examples 4.3 and 4.4, the eigenvectors are accurate only to about three or four significant figures. The eigenvectors could be obtained more accurately by increasing MMAX and reducing EPS. However, the total number of iterations is already fairly large, and we have chosen to leave the results as they are. Note that the eigenvectors of the third matrix are alternately symmetric and antisymmetric; a starting vector v₀ = [1, 1, 1, 1, 1, 1, 1, 1]' would have been a particularly unfortunate choice. Note that the method fails to generate the smallest eigenvalue; instead, the largest eigenvalue is duplicated. This difficulty could probably be overcome by searching for more suitable values of m_max, m_freq, and ε.

For complete success, the method demands double-precision arithmetic, as used here. Otherwise, the successive starting vectors are insufficiently orthogonal to the earlier eigenvectors; consequently, convergence may not occur to the smaller eigenvalues. The periodic reorthogonalization of the current vector v is particularly important for the second test matrix, because of its well-spaced eigenvalues.

4.7 Method of Rutishauser [4]

Throughout this section, for i ≥ 1, L', Lᵢ, L'ᵢ will be used to denote lower triangular matrices whose diagonal elements are all ones, while R, R', Rᵢ, R'ᵢ will be used to denote upper triangular matrices (here R', for instance, is not the transpose of R).

Let A = A₁ be a real square matrix such that A₁ = L₁R₁ (this can always be accomplished if, as previously shown, A is positive definite). Define A₂ as R₁L₁. It may be that A₂ = L₂R₂. If so, define A₃ as R₂L₂. In general, assume that

    Aₖ = LₖRₖ,   Aₖ₊₁ = RₖLₖ.   (4.39)

Note that in such event

    Aₖ₊₁ = Lₖ⁻¹AₖLₖ = RₖAₖRₖ⁻¹.   (4.40)

Since the product of lower (upper) triangular matrices is also lower (upper) triangular, and since the inverse of a lower (upper) triangular matrix is also of the same kind, it is readily verified that if

    L'ₖ = L₁L₂···Lₖ;   R'ₖ = RₖRₖ₋₁···R₁;   (4.41)

then

    Aₖ₊₁ = L'ₖ⁻¹AL'ₖ = R'ₖAR'ₖ⁻¹.   (4.42)

This shows that Aₖ₊₁ is similar to A; therefore, as previously shown, it has the same eigenvalues and related eigenvectors. Suppose now that

    limₖ→∞ L'ₖ = L'.   (4.43)

This means that

    limₖ→∞ Lₖ = I,   limₖ→∞ Aₖ = limₖ→∞ Rₖ = R,   (4.44)

where R is upper triangular. There results from (4.42)

    AL' = L'R,   (4.45)

where R not only has the same eigenvalues as A, but has them as its diagonal entries (if A is to be real, this means A must have only real eigenvalues). If Rv = λv, then AL'v = λL'v. Thus we see that when convergence takes place, a knowledge of L' and R will yield all eigenvalues and eigenvectors for A, provided that we can find the eigenvectors for R. This is particularly straightforward if there are no multiple eigenvalues. Consider, for example,

    R = [ r₁₁  r₁₂  r₁₃  r₁₄ ]
        [  0   r₂₂  r₂₃  r₂₄ ]
        [  0    0   r₃₃  r₃₄ ]
        [  0    0    0   r₄₄ ]

Solve the system Rv = r₁₁v, where v' = [1, 0, 0, 0], to obtain the eigenvector for r₁₁. Solve the system Rv = r₂₂v, where v' = [a, 1, 0, 0], to obtain the eigenvector for r₂₂. Solve the system Rv = r₃₃v, where v' = [a, b, 1, 0], to obtain the eigenvector for r₃₃, etc. The technique for individual eigenvectors is that of back substitution in Gaussian elimination (see page 270).

In application, we can adopt the philosophy of using the results if convergence occurs. We emphasize that convergence does take place for positive definite matrices. A technique is still needed for writing a suitable matrix B as CD, where C is lower triangular (with ones on the diagonal) and D is upper triangular. Let C = (cᵢⱼ), D = (dᵢⱼ), it being understood that

    cᵢⱼ = 0 for i < j ≤ n,   cᵢᵢ = 1;
    dᵢⱼ = 0 for i > j.

Since bᵢⱼ = Σₖ₌₁ⁿ cᵢₖdₖⱼ, the elements of C and D may be computed successively from the bᵢⱼ, by the above.

In connection with a possibility for convergence, it is assumed that A is real and that |λ₁| > |λ₂| > ··· > |λₙ|, so that, in addition, the eigenvalues are real. Note first that from (4.39), it is possible to see that

    Aᵏ = L'ₖR'ₖ,   (4.46)

because, by (4.42), AₖR'ₖ₋₁ = R'ₖ₋₁A, so that L'ₖR'ₖ = L'ₖ₋₁(LₖRₖ)R'ₖ₋₁ = L'ₖ₋₁AₖR'ₖ₋₁ = (L'ₖ₋₁R'ₖ₋₁)A, and induction gives Aᵏ.

It is possible to show [4] that instead of the formula listed in (4.46), we can write corresponding expressions, for all indices i and j, for the individual elements of L'ₖ and R'ₖ.

Let U = (uᵢⱼ) be a matrix such that AU = U diag (λ₁, λ₂, . . ., λₙ) and let V = (vᵢⱼ) = (U⁻¹)'. Thus

    Aᵏ = U diag (λ₁ᵏ, λ₂ᵏ, . . ., λₙᵏ) V'.

From this it follows, for B = Aᵏ = CD, that cᵢⱼ = det(Dⱼᵢ)/det(Dⱼⱼ), where Dⱼᵢ denotes the appropriate minor determinant formed from Aᵏ = U diag (λ₁ᵏ, . . ., λₙᵏ)V'. As k increases, because the λᵢ are in order of decreasing magnitude, the value of det(Dⱼᵢ) approaches that of its single dominant term. To aid in seeing this, observe that det(Dⱼᵢ) is the sum of similar terms, all other terms having a middle factor involving smaller eigenvalues raised to the kth power, which for k large is small compared to the one shown. Use partitioning to see that det(Dⱼᵢ) contains all terms of the type described. Thus, for k large, cᵢⱼ is approximated by the corresponding ratio of determinants formed from U and V alone, independent of k. This means that L' = limₖ→∞ L'ₖ = L_U, provided det(Uⱼ) det(Vⱼ) ≠ 0 for every leading submatrix Uⱼ of U and Vⱼ of V. Here, U = L_U R_U expresses U as the product of a lower triangular matrix (with ones on the diagonal) by an upper triangular matrix. Therefore also, R = R_U diag (λ₁, λ₂, . . ., λₙ) R_U⁻¹.

There is a modification which can be used in conjunction with real matrices A whose eigenvalues may be complex.
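Before turning to the programmed version in Example 4.3, the bare LR iteration of equations (4.39)-(4.44) can be sketched in modern Python (the function name is ours, not the book's code):

```python
def lr_step(a):
    # One step of (4.39): factor A_k = L_k R_k (Doolittle, no pivoting),
    # then form A_{k+1} = R_k L_k.
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    r = [[0.0] * n for _ in range(n)]
    for i in range(n):
        l[i][i] = 1.0
        for j in range(i, n):       # row i of R
            r[i][j] = a[i][j] - sum(l[i][k] * r[k][j] for k in range(i))
        for j in range(i + 1, n):   # column i of L
            l[j][i] = (a[j][i] - sum(l[j][k] * r[k][i] for k in range(i))) / r[i][i]
    return [[sum(r[i][k] * l[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Iterating drives A_k toward an upper triangular matrix R whose
# diagonal holds the eigenvalues (here 3 and 1):
a = [[2.0, 1.0], [1.0, 2.0]]
for _ in range(40):
    a = lr_step(a)
print(round(a[0][0], 6), round(a[1][1], 6))   # 3.0 1.0
```

Since each step is a similarity transformation, the trace (here 4) is preserved throughout, which is a convenient check on any implementation.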
EXAMPLE 4.3
RUTISHAUSER'S METHOD

Problem Statement

Write a subroutine, called RUTIS, that will apply Rutishauser's LR transformation to a given matrix, and that will also find the eigenvectors as a byproduct. The subroutine should incorporate the following features: (a) economy of storage, (b) special handling of tridiagonal matrices, to take advantage of their high proportion of zeros, (c) acceleration of convergence, when appropriate, and (d) double-precision arithmetic. Check the subroutine for several test matrices by writing a main program that handles input and output and calls on RUTIS.

Method of Solution and FORTRAN Implementation

This example elaborates considerably on the basic LR method. The resulting program is long and involved, but because certain extra features are included, it is computationally quite efficient. The complete method is most easily discussed by referring partly to program symbols from the outset.

The calling statement for the subroutine will be

CALL RUTIS (N, A, ANEW, U, FREQ, ITMAX, EPS1, EPS2, EPS3, EPS4, EIGVEC, STRIPD, SWEEP, TAG1, TAG2, ITER)

The various arguments are defined as follows:

Program Symbol   Definition
A            Array containing the n × n starting matrix, A.
ANEW         Array that is to contain the final transformed matrix.
EIGVEC       Logical variable, having the value T/F (true/false) if the eigenvectors are/are not required.
EPS1         Tolerance used in convergence testing. If the sum, SUBSUM, of the absolute values of the subdiagonal elements of the transformed matrix falls below EPS1, the LR transformations will be discontinued.
EPS2, EPS3   Tolerances. The eigenvectors of A will be computed if and only if: (a) EIGVEC = T, (b) no two eigenvalues lie within a small amount EPS2 of each other (if they do, TAG2 will be returned as T), and (c) SUBSUM is not greater than EPS3 (if it is, TAG1 will be returned as T).
EPS4         Tolerance for the sweeping procedure (see below), which will occur only if: (a) SWEEP = T and (b) SUBSUM < EPS4.
FREQ         Number of LR steps elapsing between successive "sweeps," if any.
ITER         Returned as the number of LR steps actually performed by the subroutine.
ITMAX        Maximum number of LR steps to be performed.
N            Dimension of starting matrix, n.
STRIPD       Logical variable, having the value T/F if A is/is not tridiagonal. Used to avoid unnecessary multiplication of zeros.
SWEEP        Logical variable, having the value T/F if the sweeping procedure (see below) is/is not to be applied.
U            n × n matrix whose columns contain the eigenvectors of A.

The subroutine employs two additional n × n working matrices, B and X. As soon as RUTIS is entered, B is equated to the starting matrix A, which is left untouched, should it be required further in the main program. Matrix B is then employed for storing, in turn, all subsequent transformed matrices A₂, A₃, . . ., Aₖ, and their lower and upper triangular decomposition matrices L and R (with the exception of the diagonal of L, which consists of 1's, and need not be stored). This arrangement results in considerable economy of storage. The upper triangular portion of matrix X is reserved for the eigenvectors of Aₖ; its lower triangular portion stores the accumulated product L'ₖ = L₁L₂···Lₖ. The diagonal elements of X are all 1's and may be regarded as common to both the eigenvector and accumulated product portions. Matrix U is employed only for storing the normalized eigenvectors of A.

The following algorithms are used in the subroutine:

(1) Decomposition of matrix Aₖ into the product of lower and upper triangular matrices Lₖ and Rₖ (Aₖ ← LₖRₖ):

    rᵢⱼ = aᵢⱼ − Σₖ₌₁ⁱ⁻¹ lᵢₖrₖⱼ,          i ≤ j,
    lᵢⱼ = (aᵢⱼ − Σₖ₌₁ʲ⁻¹ lᵢₖrₖⱼ)/rⱼⱼ,    i > j.

Note: (a) Also lⱼⱼ = 1, j = 1, 2, . . ., n, but need not be stored. (b) Any element aᵢⱼ is used only once, namely, to compute the corresponding rᵢⱼ or lᵢⱼ. Hence the elements of R and L can replace those of A as soon as they are calculated.
(c) In the program, A, L, and R occupy the same array (B) and are therefore referred to by the common symbol B.

(d) For the special case of a tridiagonal starting matrix, all subsequent LR transformations also yield tridiagonal matrices, and the above algorithm would result in an unnecessarily high proportion of multiplications involving zero. We therefore introduce two integer vectors, BEGIN and FINISH, such that for any column J, BEGIN(J) and FINISH(J) contain the row subscript of the first and last nonzero elements, respectively. For full matrices, BEGIN(J) and FINISH(J) are always 1 and N, respectively, but they will be suitably modified for tridiagonal matrices. In the program, the lower and upper limits on I in the above algorithm will appear as BEGIN(J) and FINISH(J), respectively, and the lower limit on K will be BEGIN(J).

(2) Recombination of the lower and upper triangular matrices, in reverse order, to give the transformed matrix Aₖ₊₁ (Aₖ₊₁ ← RₖLₖ):

    aᵢⱼ ← rᵢⱼ + Σₖ₌ⱼ₊₁ⁿ rᵢₖlₖⱼ,   j ≥ i,
    aᵢⱼ ← Σₖ₌ᵢⁿ rᵢₖlₖⱼ,           j < i.

Note: (a) As soon as an element aᵢⱼ has been computed, the corresponding element rᵢⱼ or lᵢⱼ is no longer needed. Hence, the elements of A can replace those of R or L as soon as they are calculated. (b) In the program, for reasons previously stated, the lower and upper limits on J are BEGIN(I) and FINISH(I), respectively. The upper limit on K is FINISH(I) for J < I and FINISH(J) for J ≥ I.

(3) Determination of the accumulated lower triangular product matrix: L'ₖ₊₁ ← L'ₖLₖ₊₁, where the newest matrix L'ₖ₊₁ has elements given by

    l'ᵢⱼ ← l'ᵢⱼ + Σₖ₌ⱼ₊₁ⁱ⁻¹ l'ᵢₖlₖⱼ + lᵢⱼ,   i > j.

Note: (a) The substitution operator ← merely emphasizes that a newly computed element of L'ₖ₊₁ occupies the same storage location as the corresponding element of L'ₖ. (b) In the program, L' is stored in the matrix X.

(4) Determination of the eigenvectors (to be placed in the upper triangular portion of matrix X) of the final transformed matrix (occupying the upper triangular portion of matrix B, since the strictly lower triangular portion should now consist of numbers very close to zero). For the eigenvalue rⱼⱼ, take xⱼⱼ = 1 and, by back substitution,

    xᵢⱼ = −(Σₖ₌ᵢ₊₁ʲ rᵢₖxₖⱼ)/(rᵢᵢ − rⱼⱼ),   i = j − 1, j − 2, . . ., 1.

Note that the eigenvectors can only be determined using this algorithm if no two eigenvalues are very close together.

(5) Determination of the starting matrix eigenvectors, as successive columns of matrix U: each eigenvector of A is L' times the corresponding eigenvector of the final upper triangular matrix, subsequently normalized to unit length.
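The in-place decomposition of algorithm (1), with R overwriting the upper triangle and L the strict lower triangle of the same array, exactly as notes (1b)-(1c) describe, can be sketched in modern Python (the function name is ours, not the book's FORTRAN):

```python
def lu_in_place(b):
    # Factor B into L (unit lower triangular) times R (upper
    # triangular), overwriting B: r_ij lands on and above the
    # diagonal, l_ij below it; the unit diagonal of L is not stored.
    n = len(b)
    for j in range(n):
        for i in range(j + 1):        # r_ij, i <= j
            b[i][j] -= sum(b[i][k] * b[k][j] for k in range(i))
        for i in range(j + 1, n):     # l_ij, i > j
            b[i][j] = (b[i][j] - sum(b[i][k] * b[k][j] for k in range(j))) / b[j][j]
    return b

m = [[4.0, 2.0], [2.0, 3.0]]
lu_in_place(m)
print(m)   # [[4.0, 2.0], [0.5, 2.0]]
```

Here L = [[1, 0], [0.5, 1]] and R = [[4, 2], [0, 2]], whose product restores the original matrix.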

(6) The algorithm given below relates to the acceleration of convergence by "sweeping." Note first that the subdiagonal elements of Aₖ tend to zero most quickly when A has eigenvalues whose moduli are well-spaced. For less favorable cases, convergence may not be especially rapid. Since the LR algorithm is only a means to the ultimate end of producing an upper triangular matrix, an alternative transformation could be used at any stage should a better one be available. The convergence can indeed be accelerated by the following technique, once a matrix Aₘ has been obtained whose subdiagonal elements have been made moderately small by the conventional LR transformation.

Let

    Aₘ₊₁ = Lₘ⁻¹AₘLₘ,

where

    Lₘ = [ 1               ]
         [ v₂   1           ]
         [ v₃        1      ]
         [ .            .   ]
         [ vₙ             1 ]

By multiplying out, the v's could be chosen so as to bring all the subdiagonal elements of the first column of Aₘ₊₁ to zero; however, finding these values of v would require an excessive amount of computation. But, by neglecting quadratic terms in the v's and product terms of the v's with the subdiagonal elements of columns 2, 3, 4, . . ., n − 1 of Aₘ, it can be shown readily that a choice of v's determined by the following equations will, in most cases, effect a substantial reduction in the magnitude of the subdiagonal elements of column 1:

    vᵢ = (aᵢ₁ + Σₖ₌ᵢ₊₁ⁿ aᵢₖvₖ)/(a₁₁ − aᵢᵢ),   i = n, n − 1, . . ., 2.

Further transformations

    Aₘ₊₂ = Lₘ₊₁⁻¹Aₘ₊₁Lₘ₊₁,   Aₘ₊₃ = Lₘ₊₂⁻¹Aₘ₊₂Lₘ₊₂,   etc.,

where Lₘ₊₁ carries the elements v₃, v₄, . . ., vₙ in its second column, Lₘ₊₂ carries v₄, . . ., vₙ in its third column, and so on (and the v's are chosen properly, but will vary from one column to the next), will also reduce the subdiagonal elements of the remaining columns substantially.

In the subroutine, the elements vᵢ, which are stored in the vector V, are determined for a particular column j from the relation

    vᵢ = (aᵢⱼ + Σₖ₌ᵢ₊₁ⁿ aᵢₖvₖ)/(aⱼⱼ − aᵢᵢ),   i = n, n − 1, . . ., j + 1;   j = 1, 2, . . ., n − 1.

This process of "sweeping" through the columns may be repeated with ever-increasing effect. Finally, note that if the sweeping procedure is applied to a tridiagonal matrix, it will introduce small nonzero values into elements that were previously zero. The technique of (1d) above is then no longer advantageous, but this is of little consequence, since the very introduction of the sweeping procedure means that the required degree of convergence will soon occur.

We have not attempted to write a concise flow diagram for this example. Instead, the overall approach can be understood best by reading the program comment cards in conjunction with the above algorithms.
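A single sweep, as described by the relation above, can be sketched in modern Python (the function name is ours; each column's similarity transformation A ← L⁻¹AL is applied in place):

```python
def sweep(a):
    # For each column j, compute the v_i of the text by the back-
    # recurrence i = n, n-1, ..., j+1, then apply A <- L^{-1} A L
    # with L = I plus the v's below the diagonal in column j.
    n = len(a)
    for j in range(n - 1):
        v = [0.0] * n
        for i in range(n - 1, j, -1):
            v[i] = (a[i][j] + sum(a[i][k] * v[k] for k in range(i + 1, n))) \
                   / (a[j][j] - a[i][i])
        for row in a:                      # A <- A L: column j updated
            row[j] += sum(row[k] * v[k] for k in range(j + 1, n))
        for i in range(j + 1, n):          # A <- L^{-1} A: rows below j
            for col in range(n):
                a[i][col] -= v[i] * a[j][col]
    return a

a = sweep([[3.0, 1.0], [0.1, 1.0]])
print(abs(a[1][0]) < 0.01)   # True: subdiagonal reduced from 0.1
```

For this 2 × 2 case the single subdiagonal element drops from 0.1 to −0.0025 in one sweep, while the trace (and hence the eigenvalue sum) is preserved.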

Program Listing
Main Program
C     APPLIED NUMERICAL METHODS, EXAMPLE 4.3
C     EIGENVALUES BY RUTISHAUSER'S LEFT-RIGHT TRANSFORMATION.
C
C     THE FOLLOWING MAIN PROGRAM HANDLES INPUT AND OUTPUT ONLY, AND
C     CALLS ON THE SUBROUTINE RUTIS TO IMPLEMENT THE ALGORITHM.
C
      IMPLICIT REAL*8 (A-H, O-Z)
      INTEGER FREQ
      LOGICAL EIGVEC, STRIPD, SWEEP, TAG1, TAG2
      DIMENSION A(11,11), ANEW(11,11), U(11,11)
      ICOUNT = 1
      WRITE (6,200)
      READ (5,100) EPS1, EPS2, EPS3, EPS4, FREQ, SWEEP
      WRITE (6,201) EPS1, EPS2, EPS3, EPS4, FREQ, SWEEP
C
C     ..... READ STARTING MATRIX A AND OTHER PARAMETERS .....
    1 WRITE (6,202) ICOUNT
      ICOUNT = ICOUNT + 1
      READ (5,101) N, ITMAX, EIGVEC, STRIPD
      WRITE (6,203) N, ITMAX, EIGVEC, STRIPD
      DO 2 I = 1, N
    2 READ (5,102) (A(I,J), J = 1, N)
      WRITE (6,204)
      DO 3 I = 1, N
    3 WRITE (6,205) (A(I,J), J = 1, N)
C
C     ..... CALL ON RUTIS TO FIND EIGENVALUES AND EIGENVECTORS .....
      CALL RUTIS (N, A, ANEW, U, FREQ, ITMAX, EPS1, EPS2, EPS3,
     1 EPS4, EIGVEC, STRIPD, SWEEP, TAG1, TAG2, ITER)
C
C     ..... PRINT VARIOUS RESULTS, AS APPROPRIATE .....
      WRITE (6,206) ITER, TAG1, TAG2
      WRITE (6,207)
      DO 4 I = 1, N
    4 WRITE (6,205) (ANEW(I,J), J = 1, N)
      IF (EIGVEC) GO TO 5
      WRITE (6,208)
      GO TO 1
    5 IF (.NOT. TAG1) GO TO 6
      WRITE (6,209)
      GO TO 1
    6 IF (.NOT. TAG2) GO TO 7
      WRITE (6,210)
      GO TO 1
    7 WRITE (6,211)
      DO 8 I = 1, N
    8 WRITE (6,205) (U(I,J), J = 1, N)
      GO TO 1
C
C
C     ..... FORMATS FOR INPUT AND OUTPUT STATEMENTS .....
  100 FORMAT (6X, E7.1, 10X, E7.1, 10X, E7.1, 10X, F4.1/
     1 6X, I3, 11X, L2)
  101 FORMAT (3X, I3, 11X, I4, 2(12X, L2))
  102 FORMAT (10X, 14F5.1)
  200 FORMAT (62H1 DETERMINATION OF EIGENVALUES OF A MATRIX BY RUTISH
     1AUSER'S/ 47H LEFT-RIGHT TRANSFORMATION, WITH PARAMETERS/1H )
  201 FORMAT (7X, 10H EPS1   = , E12.3/ 7X, 10H EPS2   = , E12.3/
     1 7X, 10H EPS3   = , E12.3/ 7X, 10H EPS4   = , F6.1/
     2 7X, 10H FREQ   = , I4/ 7X, 10H SWEEP  = , L4)
  202 FORMAT (1H1, 4X, 8H EXAMPLE, I3, 17H, WITH PARAMETERS/1H )
  203 FORMAT (7X, 10H N      = , I4/ 7X, 10H ITMAX  = , I4/
     1 7X, 10H EIGVEC = , L4/ 7X, 10H STRIPD = , L4)
  204 FORMAT (29H0 THE STARTING MATRIX A IS)
  205 FORMAT (7X, 10F10.6)
  206 FORMAT (1H0, 6X, 10H ITER   = , I4/ 7X, 10H TAG1   = , L4/
     1 7X, 10H TAG2   = , L4)
  207 FORMAT (35H0 THE TRANSFORMED MATRIX ANEW IS)
  208 FORMAT (30H0 EIGENVECTORS NOT REQUIRED)
  209 FORMAT (45H0 EIGENVECTORS NOT COMPUTED BECAUSE ONE OR/
     1 41H MORE SUB-DIAGONAL ELEMENTS TOO LARGE)
242 Matrices and Related Topics

Program Listing (Continued)


210 FORMAT ( 4 6 H O EIGENVECTORS NOT COMPUTED BECAUSE P A I R O F /
1 26H E I GENVALUES TOO CLOSE)
2 1 1 FORMAT ( 6 2 H O THE FOLLOWING M A T R I X CONTAINS THE NORMALIZED E l G E N
1VECTORS
C
END

Subroutine RUTIS

C     DETERMINATION OF THE EIGENVALUES OF AN NXN REAL MATRIX A,
C     BY RUTISHAUSER'S LEFT-RIGHT TRANSFORMATION.  NOTE
C     (1)  THE UPPER AND LOWER TRIANGULAR MATRICES (WITH THE
C     EXCEPTION OF THE UNIT DIAGONAL OF THE LATTER) INTO WHICH A
C     IS FACTORIZED, AND ALL SUCCESSIVE TRANSFORMATIONS OF A,
C     OCCUPY ARRAY B.
C     (2)  THE ACCUMULATED PRODUCT OF THE LOWER TRIANGULAR
C     DECOMPOSITION MATRICES IS STORED IN ARRAY X.
C     (3)  THE PROGRAM STOPS EITHER WHEN ITMAX SUCCESSIVE
C     DECOMPOSITIONS AND MULTIPLICATIONS HAVE BEEN MADE, OR WHEN
C     THE SUM (SUBSUM) OF THE ABSOLUTE VALUES OF THE SUB-
C     DIAGONAL ELEMENTS OF THE TRANSFORMED MATRIX FALLS BELOW A
C     SMALL VALUE EPS1.
C     (4)  FOR THE SPECIAL CASE OF AN INITIAL TRIDIAGONAL MATRIX,
C     SETTING THE LOGICAL VARIABLE STRIPD = .TRUE. WILL
C     REDUCE THE COMPUTATION TIME CONSIDERABLY.
C     (5)  IF THE LOGICAL VARIABLE SWEEP = .TRUE., A SPECIAL ROUTINE
C     IS INTRODUCED TO ACCELERATE CONVERGENCE, PROVIDED THAT
C     SUBSUM IS LESS THAN EPS4.
C     (6)  THE EIGENVECTORS OF A WILL BE COMPUTED AND STORED IN
C     THE ARRAY U IF THE PARAMETER EIGVEC = .TRUE., PROVIDED
C     THAT NO TWO EIGENVALUES LIE WITHIN A SMALL VALUE EPS2 OF
C     EACH OTHER, AND THAT SUBSUM IS NOT GREATER THAN EPS3.
C
      SUBROUTINE RUTIS (N, A, B, U, FREQ, ITMAX, EPS1, EPS2,
     1            EPS3, EPS4, EIGVEC, STRIPD, SWEEP, TAG1, TAG2, ITER)
      IMPLICIT REAL*8 (A-H, O-Z)
      REAL*8 A, B, U, EPS1, EPS2, EPS3, EPS4, LENGTH
      DIMENSION A(11,11), B(11,11), U(11,11), V(11), X(11,11)
      INTEGER BEGIN(11), FINISH(11), FREQ
      LOGICAL EIGVEC, STRIPD, SWEEP, TAG1, TAG2
      NM1 = N - 1
      DO 1 I = 1, N
      DO 1 J = 1, N
      X(I,J) = 0.
    1 B(I,J) = 0.
      L = 0
      TAG1 = .FALSE.
      TAG2 = .FALSE.

..... DETERMINE THE VECTORS


I F (.NOT. S T R I P D ) GO TO 3
B E G I N AND F I N I S H .....

DO4 J = l , N
BEGIN(J) = 1
FlNISH(J) = N
CONTINUE
DO6 1 = l , N
JLOW = B E G I N ( I
JHlGH = F I N I S H ( I )
DO 6 J = JLOW, J H l G H
B(I,J) = A(I,J)

Program Listing (Continued)


C
C
C
..... START
CONVERGENCE SATISFACTORY OR I T E R A T I O N S EXCEED I T M A X .....
L E F T - R I G H T TRANSFORMATION, I T E R A T I N G U N T I L

DO 5 1 I T E R = 1, I T M A X
C
C
C
..... UPPER AND LOWER TRIANGULAR FACTORS .....
THE M A T R I X B I S DECOMPOSED I N T O

DO 1 0 J = 1, N
ILOW = B E G I N ( J 1
DO 8 1 = I LOW, J
SUM = 0.
IMl = I - .L
0

KLOW = B E G I N ( I 1
I F (KLOW .GT. I M 1 ) GO TO 8
DO 7 K = KLOW, I H 1
7 SUM = SUM + B ( I , K ) * B ( K , J )
8 B(I,J)
JP1 = J + 1
= B(I,J) -
SUM

IHIGH = FlNISH(J)
I F ( J P 1 .GT. I H I G H ) GO TO 1 5
DO 1 0 I = JP1, I H l G H
SUM = 0.
KLOW = B E G I N ( 1 )
JM1 = J - I!
I F (KLOW .GT. J M 1 ) GO TO 1 0
DO 9 K = KLOW, J M 1
9 SUM = SUM + B ( I , K ) * B ( K , J )
10 B(I,J) = (O(I,J) -
SUM)/B(J,J)
C
C
C
..... THE ACCUMULATED PRODUCT OF THE SUCCESSIVE LOWER
TRIANGULAR DECOMPOSITION MATRICES I S COMPUTED .....
15 I F (.NOT. E I G V E C ) GO TO 1 8
DO 1 7 1 2, N
IM1 = I
DO 1 7 J
-
1
1, IM1
X(I,J) B(I,J) + X(I,J)

-
KLOW J + 1
I F (KLOW .GT. I M 1 ) GO TO 1 7
00 1 6 K KLOW, I M 1
16 X(I,J) = X(I,J) + X(I,K)*B(K,J)
17 CON~INUE
C
C
18
..... AND THE FACTORS ARE COMBINED I N REVERSE ORDER
DO 2b I = 1, N
.....
JLOW = B E G I N ( I )
IM1 = I - 1
I F (JLOW .ET.
DO 2 0 J
B(I,J)
-
I M 1 ) GO r0 2 1
JLOW, I M 1
= B(l,l)*B(I,J)
IP1 = l + 1
KHIGH = F I N I S H ( I 1
I F ( I P 1 .GT. K H I G H ) GO TO 2 0
DO 1 9 K = IPl, K H I G H
19 B(I,J) = B(1,J) + B(I,K)*B(K,J)
20 CONTl NUE
21
DO 23 J
JPl = J + 1
-
JHlGH * F I N I S H ( I )
I, J H l G H

KHIGH FINISH(J)
I F ( J P l .CT. K H I G H ) GO TO 2 3
DO 2 2 K = JP1, K H I G H
22 B(I,J) = B ( I , J ) + B(I,K)*B(K,J)
23 CONTINUE
2S CONTINUE
DO 2 5 1 = 1, N
JHlGH
DO 2 5 J
- -
JLOW = B E G I N ( I )
FiNISH(I)
JLOW, J H l G H
25 I F (DABS(B(1,J)) .LT. 1.OD-10) B(I,J) = 0.
L = L + l

Program Listing (Continued)


C
C
C
..... THE SUM OF THE ABSOLUTE VALUES OF THE
SUB-DIAGONAL ELEMENTS I S COMPUTED .....
SUBSUM = 0.
DO 2 6 1 = 2, N
26 SUBSUM = SUBSUM + D A B S ( B ( 1 , I - 1 ) )
C
C ..... DETERMINE COLUMN VECTORS FOR SWEEPING PROCEDURE
I F (.NOT.(L.EQ.FREQ .AND. SUBSUM.LT.EPS4 .AND. SWEEP))
.....
GO TO 42
DO 3 7 J = 1, NM1

..... .....
A

REJECT CASES FOR WHICH DIAG. ELEMENTS TOO CLOSE


0 0 3 0 1 = 1, N
I F (DABS(R(J.J)
JP1 = J + 1
-
B(I.I)).LT.EPS2 .AND. J.NE.1) GO TO 3 7

DO 3 2 I T = J P l , N
1 = N + JP1
V ( I ) = B(I,J)
- IT

IP1 = I + 1
I F ( 1 .EQ. N) GO TO 3 2
DO 3 1 K = I P 1 , N
V ( I ) = V ( I ) + B(I,K)*V(K)
V ( I ) = V(I)/(B(J,J) -B(l,l))

..... MODIFY LOWER TRIANGULAR PRODUCT MATRIX


DO 3 4 I T = JP1, N
.....
I 3 N + JPl
X(I,J) = X(I,J)
-IT
+ V(I)
IM1 = I 1
I F ( J P 1 .GT.
- I M 1 ) 6 0 TO 3 4
DO 3 3 K = JP1, I M 1
X(I,J) = X(I,J) + X(I,K)*V(K)
CONTl NUE

.....
DO 3 5
POSTMULTIPLY B W I T H SWEEPING MATRIX
1 = 1, N
.....

CONTINUE
I F (.NOT. S T R I P D ) GO TO 4 1
DO 4 0 J = 1, N
BEGIN(J) = 1
FINISH(J1 N
CONTl NUE
CONTl NUE
L
C ..... CHECK FOR CONVERGENCE .....
I F (.NOT.(L.EQ.FREQ .OR. ITER.EQ.ITMAX .OR. SUBSUM.LT.EPS1))
1 GO TO 5 0
L a o
50 CONTINUE
51 I F (SUBSUM .LT. EPS1) GO TO 5 2
5 2 DO 5 3 1 = 1, N
53 X(I,I) = 0.
C
C
C
..... ANY TWO EIGENVALUES ARE CLOSER TOGETHER THAN EPS2 .....
CHECK TO SEE I F EIGENVECTORS ARE REQUIRED OR I F

I F (.NOT. EIGVEC) GO TO 72
I F (SUBSUM .LE. EPS3) GO TO 5 6
TAG1 = .TRUE.
GO TO 7 2
54
IP1 -
DO 5 5 1 = 1, NM1
1 t 1
DO 5 5 J = I P 1 , N
I F (DABS(B(1,I)
TAG2 = .TRUE.
B(J,J))- .GE. EPS2) GO TO 5 5

GO TO 7 2
55 CONTINUE

Program Listing (Continued)


L
C .....
DO 6 2 J
COMPUTE EIGENVECTORS OF TRANSFORMED M A T R I X
1, N
.....
X(J,J) = 1.
I F ( J .EQ. 1) GO TO 6 2
JM1 = J 1 -
DO 6 1 I T = 1, J M 1
1 = J - I T
SUM = B ( I , J )
IP1 = 1 + 1
I F ( I P 1 .GT. J M 1 ) GO TO 6 1
DO 6 0 K = I P 1 , J M 1
60 SUM = SUM + B ( I , K ) * X ( K , J )
61 X(I,J)
6 2 CONTINUE
= SUM/iB(J,J) -
B(l,l))

C
C ..... COMPUTE EIGENVECTORS OF O R I G I N A L M A T R I X
DO 6 7 1 = 1, N
.....
IMl = I -
1
I F ( I .EQ. 1) GO TO 6 5
DO 6 4 J = 1, I M 1
U(I,J) = X(I,J)
JM1 = J
.-
1
-
-.
- > - .
DO 6 3 K = 1, J M I
63 U(I,J) = U(I,J) + X(I,K)*X(K,J)
64 CONTl NUE
DO 6 7 J = 1, N
U(I,J) X(I,J)
IM1 = I
5

-1
I F ( 1 .EQ. 1) GO TO 6 7
DO 6 6 K = 1, 1 M 1
U(I,J) = U(l,J) + X(I,K)*X(K,J)
CONTl NUE

..... NORMAL1 ZE THE E l GENVECTORS


DO 7 1 J = 1, N
.....
SUMSQ = 0.
DO 7 0 1 = 1, N
SUMSQ = SIUMSQ + U ( I , J ) * * 2
LENGTH = DSQRT (SUMSQ)

CONTl NUE
RETURN

END

Data
EPS1 = 1.0E-7    EPS2 = 1.0E-3    EPS3 = 1.0E-4    EPS4 = 0.1
FREQ = 5         SWEEP = T
N = 4    ITMAX = 25    EIGVEC = T    STRIPD = F
A(1,l) = 10. 9. 7. 5.
9.
7.
5.
l TMAX
= 6.
4.
4.
1. 6.
l TMAX EIGVEC = F STRIPD = F
= 4. 3.
0.
5.
3.

Program Listing (Continued)


Data (Continued)
EIGVEC = T
0. 0, 0.
STRl PD
0.
-
0.
T

0. 0. 0, 0. 0.
-1. 0. 0. 0. 0,
2. -1. 0. 0, 0.
-1. 2. -1. 0. 0.
0. -1. 2. -1. 0.

-
0. 0. -3. 2. -1.
0. 0. 0. -1. 2.
EIGVEC = T STRl PD T
0. 0, 0. 0. 0. 0.
0. 0. 0. 0.
0. 0. 0. 0.
0. 0. 0. 0.
-1. 0. 0. 0.
2. -1. 0. 0.
-1. 2. -1. 0.
0. -1. 2. -1.
0. 0. -2. 2.5

Computer Output
DETERMINATION OF EIGENVALUES OF A MATRIX BY RUTISHAUSER'S
LEFT-RIGHT TRANSFORMATION, WITH PARAMETERS

EPS1   =   0.100D-06
EPS2   =   0.100D-02
EPS3   =   0.100D-03
EPS4   =   0.1
FREQ   =   5
SWEEP  =   T

EXAMPLE 1, WITH PARAMETERS

N      =   4
ITMAX  =   25
EIGVEC =   T
STRIPD =   F

THE STARTING MATRIX A I S


10.000000 9.000000 7.000000 5.000000
9.000000 10.000000 8.000000 6.000000
7.000000 8.000000 10.000000 7.000000
5.000000 6.000000 7.000000 5.000000

ITER = 11
TAG1 = F
TAG2 = F
THE TRANSFORMED MATRIX ANEW I S
30.288685 42.270007 10.019862 5.000000
0.0 3.858057 1.007116 0.702164
0.0 0.0 0.843107 -0.316835
0.0 0.0 0.0 0.010150
THE FOLLOWING MATRIX CONTAINS THE NORMALIZED EIGENVECTORS
0.520925 -0.625396 0.567651 0.123697
0.551955 - 0 . 2 7 1 6 0 1 -0,760318 -0.208554
,0.528568 0.614861 0.301652 -0.501565
0.380262 0.396306 -0.093305 0.830444

Computer Output (Continued)


EXAMPLE 2. W l T H PARAMETERS EXAMPLE 3, W l T H PARAMETERS

N = 4 N = 4
ITMAX = 25 lTMAX = 25
EIGVEC = T EIGVEC = F
STRlPD= F STRlPD= F

THE S T A R T I N G M A T R I X A I S THE START1 NG M A T R l X A I S


6.000000 4.000000 4.000000 1.000000 4.000000 -5.000000 0.0 3.000000
4.000000 6.000000 1.000000 4.000000 0.0 4.000000 -3.000000 -5.000000
4.000000 1.000000 6.000000 4.000000 5.000000 -3.000000 4.000000 0.0
1.000000 4.000000 4.000000 6.000000 3.000000 0.0 5.000000 4.000000

ITER = 12 ITER = 25
TAGl = F TAGl = F
TAG2 = T TAG2 = F

THE TRANSFORMED M A T R I X ANEW I S THE TRANSFORMED M A T R l X ANEW I S


15.000000 4.999915 5.000000 1.000000 12.000000 -8.678738 3.000000 3.000000
-0.000000 5 . 0 0 0 0 0 0 -0.000000 3.000000 0.000000 2.131230 -5.000000 -2.000000
-0.000000 -0.000000 5.000000 3.000051 -0.000000 5.255936 -0.131230 -3.452492
0.0 0.000000 0.000000 -1.000000 0.0 -0.000000 -0.000000 2.000000

EIGENVECTORS NOT COMPUTED BECAUSE PAIR OF       EIGENVECTORS NOT REQUIRED
EIGENVALUES TOO CLOSE

EXAMPLE 4, W I T V PARAMETERS

N = 8
ITMAX = 100
EIGVEC = T
STRlPD= T

THE S T A R T I N G M A T R I X A I S
2.000000 -1.000000 0.0 0.0
-1.000000 2.000000 -1.000000 0.0
0.0 -1.000000 2.000000 -1.000000
0.0 0.0 -1.000000 2.000000
0.0 0.0 0.0 -1.000000
0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0

ITER = 41
TAGl = F
TAG2 = IF

THE TRANSFORMED M A T R l X ANEW I S


3.879385 - 1 . 0 0 0 0 0 0 0.0
0.0
0.0
0.0
3 . 5 3 2 0 8 9 -1.000000
0.000000
0.0
3.000000
0.0
-
0.0 0.0 0.0
0.0 0.0 0.0
0.0 0.0 0.0
0.0 0.0 0.0
THE FOLLOWING M A T R I X CONTAINS THE NORMALIZED EIGENVECTORS
0.161230 0.303013 0.408248 0.464243 0.464243 0.408248 0.303013
-0.303013 - 0 , 4 5 4 2 4 3 -0.408248 -0.161230 0.161230 0.408248 0.464243
0.408248 0,408248 - 0 . 0 0 0 0 0 0 -0.408248 -0.408248 0.000000 0.408248
-0.464243 -0.161250 0.408248 0.303013 -0.303013 -0.408248 0.161230
0.464243 - 0 . 1 6 1 2 3 0 -0.408248 0.303013 0.303013 -0.408248 -0.161230
-0.408248 0 . 4 0 8 2 4 8 -0.000000 -0.408248 0.408248 -0.000000 -0.408248
0.303013 - 0 , 4 6 4 2 4 3 0.408248 -0.161230 -0.161230 0.408248 -0.464243
-0.161230 0.303013 -0.408248 0.464243 -0.464243 0.408248 -0.303013

Computer Output (Continued)


EXAMPLE 5, W I T H PARAMETERS

THE S T A R T I N G M A T R l X A I S
2.500000 -2.000000 0.0
-1.000000 2.000000 -1.000000
0.0 -1.000000 2.000000
0.0 0.0 -1.000000
0.0 0.0 0.0
0.0 0.0 0.0
0.0 0.0 0.0
0.0 0.0 0.0
0.0 0.0 0.0

THE TRANSFORMED M A T R l X ANEW I S


4.000000 -2.000000 0.0
0.000000 4.088793 -1.000000
0.0 0.0 3.539881
0.0 0.0 0.0
0.0 0.0 0.0
0.0 0.0 0.0
0.0 0.0 0.0
0.0 0.0 0.0
0.0 0.0 0.0
THE FOLLOW1 NG M A T R l X C O N T A I N S THE N O R M A L I Z E D E I G E N V E C T O R S
0.516398 -0.441843 0.441028 0.439670 0.437037 0.431576
-0.387298 0.350999 -0.229308 -0.085566 0.082461 0.246988
0.258199 -0.291320 70.087920 -0.363582 -0.447150 -0.272372
-0.129099 0.257509 0.364695 0.408873 -0.027626 -0.422554
-0.000000 -0.246562 -0.473667 -0.000000 0.450538 -0.000000
0.129099 0.257509 0.364695 -0.408873 -0.027626 0.422555
-0.258199 -0.291320 -0.087920 0.363582 -0.447150 0.272372
0.387298 0.350999 -0.229308 0.085566 0.082462 -0.246988
-0.516398 -0.441843 0.441028 -0.439670 0.437037 -0.431576
Discussion of Results (Examples 1-5)

(1) The eigenvalues are 30.288685, 3.858057, 0.843107, and 0.010150; they are remarkably well-spaced. The condition SUBSUM < EPS1 (= 10^-7) stopped the iterations.

(2) The eigenvalues, 15.0, 5.0, 5.0, and -1.0, include a coincident pair. The eigenvectors were requested but were not computed, because a pair of eigenvalues was found to be closer together than EPS2 = 0.001.

(3) The eigenvalues, 12.0, 1.0 + 5.0i, 1.0 - 5.0i, and 2.0, include a complex conjugate pair. In such cases, it is typical that a two-row minor of the transformed matrix does not converge, but that the eigenvalues of that minor do converge (here, to 1.0 + 5.0i and 1.0 - 5.0i).

(4) This tridiagonal matrix arises when treating unsteady-state heat conduction in a slab as a characteristic-value problem (see Section 7.26). Note that the eigenvectors are alternately antisymmetric and symmetric.

(5) This unsymmetrical tridiagonal matrix arises when treating flow of a reacting fluid between two parallel plates as a characteristic-value problem (see Problem 7.24).
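The core left-right (LR) iteration that Example 4.3 implements, factoring the current matrix into a unit lower triangular L and an upper triangular R and then forming the similar matrix RL, can be sketched compactly. The following is a Python/NumPy sketch of the basic iteration only, not the book's FORTRAN program: the name `lr_iteration` is ours, the factorization is assumed to exist without pivoting at every step, and the STRIPD banded logic, the SWEEP acceleration, and the eigenvector accumulation are omitted.

```python
import numpy as np

def lr_iteration(A, itmax=200, tol=1e-10):
    """Rutishauser's left-right transformation (bare sketch): repeatedly
    factor A = L R (L unit lower triangular, R upper triangular) and form
    the similar matrix A' = R L.  When the process converges, the iterates
    approach upper triangular form with the eigenvalues on the diagonal."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    for _ in range(itmax):
        # Doolittle LU factorization without pivoting (assumed to exist)
        L = np.eye(n)
        R = np.zeros((n, n))
        for i in range(n):
            for j in range(i, n):
                R[i, j] = A[i, j] - L[i, :i] @ R[:i, j]
            for j in range(i + 1, n):
                L[j, i] = (A[j, i] - L[j, :i] @ R[:i, i]) / R[i, i]
        A = R @ L                                 # similar to the previous A
        if np.sum(np.abs(np.tril(A, -1))) < tol:  # SUBSUM test of the text
            break
    return np.diag(A)

A = [[10., 9, 7, 5],
     [9, 10, 8, 6],
     [7, 8, 10, 7],
     [5, 6, 7, 5]]                      # Example 1 of the text
print(np.sort(lr_iteration(A))[::-1])   # ~ 30.288685, 3.858057, 0.843107, 0.010150
```

As in the program, the iteration stops when the sum of the absolute values of the subdiagonal elements falls below a small tolerance.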

4.8 Jacobi's Method for Symmetric Matrices

We consider here an iterative method, credited to Jacobi, for transforming a real symmetric matrix A into diagonal form. The method consists of applying to A a succession of plane rotations designed to reduce the off-diagonal elements to zero. Let A_0 = A and let the U_i, i >= 1, be orthogonal matrices. Define A_1 = U_1^{-1} A_0 U_1, A_2 = U_2^{-1} A_1 U_2, and, in general, A_{k+1} = U_{k+1}^{-1} A_k U_{k+1}. Then if U_k' = U_1 U_2 ... U_k,

    A_{k+1} = (U_{k+1}')^{-1} A U_{k+1}',

and A_{k+1} has the same eigenvalues as A. Moreover, for all indices k, A_k is real and symmetric.

Recall that Sp (A*A) = sum_i sum_j a_ij a̅_ij, or, for A real and symmetric, Sp (A²) = sum_i sum_j a_ij². If T is any nonsingular transformation, T^{-1}A²T has the same eigenvalues as A², and thus Sp (T^{-1}A²T) = Sp (A²), since it is the sum of the eigenvalues. Thus, if A_k = (a_ij^(k)), we have, for all k,

    sum_i sum_j [a_ij^(k)]² = Sp (A²).                            (4.48)

If, now, the sequence (U_k) can be so chosen that the sequence sum_i [a_ii^(k)]² increases monotonically, it will be convergent, since it is bounded above by Sp (A²). If, in addition, the sum of the squares of the off-diagonal elements approaches zero, then

    lim_{k→∞} A_k = diag (λ_1, λ_2, ..., λ_n);                    (4.50)

and, if lim_{k→∞} U_k' = U,

    AU = U diag (λ_1, λ_2, ..., λ_n).                             (4.51)

The further character of U_{k+1} is now described. Let a_ij^(k) (i ≠ j) be a nonzero element of A_k. Let the matrix U_{k+1} be the n × n identity matrix I, except that the ith row has been replaced by a row of zeros, other than cos α in column i and sin α in column j, while the jth row has been replaced by a row of zeros, other than sin α in column i and -cos α in column j. The method of choosing α will be detailed shortly. Note that U_{k+1} is self-inverse. The result of both premultiplying and postmultiplying A_k by U_{k+1} is to leave the elements of A_k unaltered, except those in rows i and j and columns i and j. At the moment, we concern ourselves only with a_ii^(k+1), a_jj^(k+1), and a_ij^(k+1).

When A_k is premultiplied by U_{k+1}, the new entry in the ii position is a_ii^(k) cos α + a_ij^(k) sin α, that in the ij position is a_ij^(k) cos α + a_jj^(k) sin α, that in the jj position is a_ij^(k) sin α - a_jj^(k) cos α, and that in the ji position is -a_ij^(k) cos α + a_ii^(k) sin α. When the result this far is postmultiplied by U_{k+1}, we have

    a_ii^(k+1) = a_ii^(k) cos²α + a_jj^(k) sin²α + 2a_ij^(k) sin α cos α;
    a_jj^(k+1) = a_ii^(k) sin²α + a_jj^(k) cos²α - 2a_ij^(k) sin α cos α;
    a_ij^(k+1) = a_ji^(k+1) = (a_ii^(k) - a_jj^(k)) sin α cos α - a_ij^(k) (cos²α - sin²α).

Now choose α so that a_ij^(k+1) = 0. Then, in terms of the double angle,

    a_ii^(k+1) = a_ii^(k) (1 + cos 2α)/2 + a_jj^(k) (1 - cos 2α)/2 + a_ij^(k) sin 2α;
    a_jj^(k+1) = a_ii^(k) (1 - cos 2α)/2 + a_jj^(k) (1 + cos 2α)/2 - a_ij^(k) sin 2α;

and, unless a_ii^(k) = a_jj^(k),

    tan 2α = 2a_ij^(k)/(a_ii^(k) - a_jj^(k)),                     (4.52)

where -π/4 ≤ α ≤ π/4. Direct verification shows that

    [a_ii^(k+1)]² + [a_jj^(k+1)]² = [a_ii^(k)]² + [a_jj^(k)]² + 2[a_ij^(k)]².

Thus, since the sum of the squares of all entries is constant, the diagonal terms have gained in dominance. It can also be verified that, if S_k denotes the sum of the squares of the off-diagonal elements of A_k,

    S_{k+1} = S_k - 2[a_ij^(k)]².                                 (4.53)

Using (4.53), it is relatively easy to prove that the process actually does have the desired result if at each stage the off-diagonal element of largest magnitude is used in the role of a_ij^(k) above. The number of nonzero off-diagonal elements contributing to S_k does not exceed r = n² - n - 2 (since at least two elements are zeroed at each iteration). Hence, the largest of the quantities [a_ij^(k)]² equals or exceeds S_k/r, and when the next step is performed we have

    S_{k+1} ≤ μ S_k,

where μ = 1 - 2/r, so that 0 < μ < 1. Thus S_k ≤ μ^{k-1} S_1, and it is clear that S_k has the limit zero.

If the process is carried out on a computer, the time taken to search for the largest current |a_ij^(k)| may be considerable. It is therefore natural to inquire whether it suffices to choose the position (i,j) of the element to be zeroed at step k + 1 in some definite sequence, say in the order (1,2), (1,3), ..., (1,n), (2,3), (2,4), ..., (2,n), ..., (n - 1,n), then return to (1,2) and sweep through the same sequence again. That this leads to the desired result has been established by Forsythe and Henrici [3]. In the limit we have

    diag (λ_1, λ_2, ..., λ_n) = U'AU,                             (4.55)

where U = lim_{k→∞} U_1 U_2 ... U_k. Thus the columns of U are the eigenvectors of A. Observe that we have proved, as a by-product, that a real Hermitian matrix of order n has n linearly independent eigenvectors, mutually orthogonal.

Since the matrices of the sequence A_k formed by the process described are symmetric, we can economize on storage by storing about half the original matrix, and overwriting all subsequent matrices in the same storage locations. Schematically, for n = 5, we have

    a11  a12  a13  a14  a15
         a22  a23  a24  a25
              a33  a34  a35
                   a44  a45
                        a55.

The number of storage locations for this arrangement is (n² + n)/2. If we need column 2, [a12 a22 a32 a42 a52]', of the original matrix, we use the "bent" arrangement a12, a22, a23, a24, a25, since by symmetry a_i2 = a_2i.

The program for determining the quantities sin α and cos α from tan 2α needs care if accuracy is to be maintained. If we have tan 2α = a/b, we write this in the form tan 2α = p/q, where q = |b| and p = a (sign b). Then sec² 2α = 1 + p²/q² and cos² 2α = q²/(p² + q²), whence

    cos 2α = q/(p² + q²)^{1/2},

and, as required, -π/4 ≤ α ≤ π/4. (This assumes a_ii^(k) ≠ a_jj^(k); when a_ii^(k) = a_jj^(k), we take α = π/4.) Thus, since cos 2α = 2 cos²α - 1,

    cos α = {[1 + q/(p² + q²)^{1/2}]/2}^{1/2}.                    (4.56)

Since q > 0, there is no loss of accuracy. Note that (4.56) is valid for determining cos α even though q should be zero. To obtain sin α from tan 2α = p/q, we use sin 2α = p/(p² + q²)^{1/2} (even if q = 0) and

    sin α = p/[2 cos α (p² + q²)^{1/2}].                          (4.57)

A refinement of the procedure is as follows. Suppose that at an early stage of the iteration a_ij^(k) is already small. There will be little virtue in making this zero, since doing so will not decrease the sum of the squares of the off-diagonal elements appreciably. Therefore, during the kth systematic sweep of the off-diagonal elements, a rotation is omitted if |a_ij^(k-1)| < ε_k. The set of values ε_k might be a decreasing set, and ε_k should be zero after a small number of sweeps have been made.

If no eigenvectors are required, then the rotation matrices U_k can be discarded as used. If they are required, the product I U_1 U_2 ... U_k may be formed in n² additional locations, wherein the identity matrix I is stored initially.
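The cyclic-sweep version of the process, with cos α and sin α obtained from p and q as in (4.56) and (4.57), can be sketched as follows. This is a Python/NumPy sketch, not the book's FORTRAN program of Example 4.4; the name `jacobi_eigen` and the default tolerance values (the arguments mirror EPS1-EPS3) are ours, and for clarity it rotates the full matrix rather than using the "bent" half-storage scheme.

```python
import numpy as np

def jacobi_eigen(A, itmax=50, eps1=1e-10, eps2=1e-10, eps3=1e-10):
    """Cyclic Jacobi sweeps for a real symmetric matrix.
    Returns (diagonal of the transformed matrix, T), with the
    eigenvectors in the columns of the rotation product T."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    T = np.eye(n)
    for _ in range(itmax):
        sigma1 = np.sum(np.diag(A) ** 2)       # before the sweep
        for i in range(n - 1):
            for j in range(i + 1, n):
                if abs(A[i, j]) <= eps2:       # rotation omitted: already small
                    continue
                q = abs(A[i, i] - A[j, j])
                if q > eps1:
                    p = 2.0 * A[i, j] * q / (A[i, i] - A[j, j])  # p = a (sign b)
                    spq = np.hypot(p, q)                         # (p^2 + q^2)^(1/2)
                    cs = np.sqrt((1.0 + q / spq) / 2.0)          # equation (4.56)
                    sn = p / (2.0 * cs * spq)                    # equation (4.57)
                else:                          # a_ii ~ a_jj, so alpha = pi/4
                    cs = sn = 1.0 / np.sqrt(2.0)
                # self-inverse rotation; only rows/columns i and j change
                U = np.eye(n)
                U[i, i], U[i, j] = cs, sn
                U[j, i], U[j, j] = sn, -cs
                A = U @ A @ U
                T = T @ U
        sigma2 = np.sum(np.diag(A) ** 2)       # after the sweep
        if abs(1.0 - sigma1 / sigma2) < eps3:  # criterion of Example 4.4
            break
    return np.diag(A), T

lam, T = jacobi_eigen([[10., 9, 7, 5],
                       [9, 10, 8, 6],
                       [7, 8, 10, 7],
                       [5, 6, 7, 5]])
print(np.sort(lam)[::-1])  # ~ 30.288685, 3.858057, 0.843107, 0.010150
```

The production code of Example 4.4 avoids forming U explicitly and updates only the affected rows and columns; the effect is the same.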
EXAMPLE 4.4
JACOBI'S METHOD

Problem Statement

Write a program that implements Jacobi's method for finding the eigenvalues and eigenvectors of an n × n real symmetric matrix A.

Method of Solution

The program given below follows in detail the procedure described in Section 4.8. However, we can dispense with the iteration index k, so that the starting matrix and all its subsequent transformations will be denoted by the common symbol A. The product of the successive orthogonal annihilation matrices is the matrix T = I U1 U2 U3 ...

The program takes account of the symmetry of the starting matrix and its subsequent transformations, so that the calculations do not involve elements in the strictly lower triangular portion of A. That is, the "bent" vector arrangement of Section 4.8 is used. Unfortunately, because of FORTRAN restrictions, we must also reserve storage for the lower triangular portion of A, even though it is not used. In some other programming languages, such as MAD (Michigan Algorithm Decoder), that permit the definition of unique subscription functions, it would be simple to compact A into triangular rather than square shape.

For each Jacobi iteration, each element in the strictly upper triangular portion of A is annihilated row by row, in the order: a12, a13, ..., a1n; a23, ..., a2n; ...; a_{n-1,n}. If an element aij is already smaller in magnitude than some value ε2, it is simply bypassed in the iterations.

Before the first iteration, the sum of squares S of all elements in the full matrix A is computed. The sum of the squares of the diagonal elements of A before and after each complete Jacobi iteration is computed and saved in σ1 and σ2, respectively. The criterion for ending the procedure is normally

    1 - σ1/σ2 < ε3.

At this point, both σ1 and σ2 should almost equal S; in fact, this correspondence could alternatively be used as the criterion for termination. An upper limit, kmax, is also placed on the total number of iterations. The eigenvalues λ1, λ2, ..., λn are the diagonal elements of the final transformed matrix A. The elements of the corresponding eigenvectors are in successive columns of T.

The following flow diagram is intended to give an overall picture of the method. However, because of the special nature of the rotation matrix, U^{-1}AU does not have to be expanded fully in the program; in fact, only elements in the ith and jth rows and in the ith and jth columns of A will be modified at each step. It is also unnecessary for U to appear specifically, since it is simply I modified by sin α and cos α in a few (known) positions. The product matrix T, however, is updated at each step.
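Because each rotation is orthogonal, Sp (A²), the sum of squares of all elements, is invariant, which is why σ1 and σ2 must both approach S. This can be checked numerically for the first test matrix (a Python/NumPy check using NumPy's own symmetric eigensolver, not the book's program):

```python
import numpy as np

A = np.array([[10., 9, 7, 5],
              [9, 10, 8, 6],
              [7, 8, 10, 7],
              [5, 6, 7, 5]])           # first data set of Example 4.4

S = np.sum(A**2)                       # sum of squares of all elements: 933
lam = np.linalg.eigvalsh(A)            # eigenvalues of the symmetric matrix

# Sp(A^2) equals the sum of squared eigenvalues, the limiting value of sigma2
print(S, np.sum(lam**2))               # both equal 933.0, up to roundoff
```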
Example 4.4 Jacobi's Method

Flow Diagram

(The flow diagram, showing the sweep over the off-diagonal elements, the rotation applied to rows and columns i and j, the update of T, and the test for convergence, is not reproduced here.)

FORTRAN Implementation
List of Principal Variables

Program Symbol      Definition

A                   Upper triangular matrix, A.
AIK                 Vector used for temporary storage.
CSA, SNA            cos α and sin α (see equations (4.56) and (4.57)).
EIGEN               Vector of the eigenvalues λ1, λ2, ..., λn.
EPS1                Tolerance, ε1; for q < ε1, α = π/4.
EPS2                Tolerance, ε2; aij is bypassed if |aij| < ε2.
EPS3                Tolerance, ε3, used in termination criterion.
ITER                Counter, k, for the number of complete Jacobi iterations.
ITMAX               Maximum number of iterations permitted, kmax.
N                   Dimension, n, of the matrix A.
OFFDSQ              Sum of squares of off-diagonal elements of A.
P, Q                p and q (see equation (4.56)).
S                   Sum of squares of all elements in the full starting matrix.
SIGMA1, SIGMA2      Sums, σ1 and σ2, of the squares of the diagonal elements of A
                    before and after an iteration, respectively.
SPQ                 (p² + q²)^(1/2).
T                   Product of rotation matrices; ultimately contains eigenvectors.

Program Listing
C     APPLIED NUMERICAL METHODS, EXAMPLE 4.4
C     EIGENVALUES AND EIGENVECTORS BY THE JACOBI METHOD.
C
C     ONLY THE UPPER TRIANGULAR PART OF THE REAL SYMMETRIC STARTING
C     MATRIX A IS READ.  T IS AN N BY N ORTHOGONAL MATRIX, BEING THE
C     PRODUCT OF THE SEQUENCE OF TRANSFORMATION MATRICES USED TO ANN-
C     IHILATE SUCCESSIVELY THE OFF-DIAGONAL ELEMENTS OF A, AND CON-
C     SEQUENTLY TO REDUCE A ITERATIVELY TO NEAR-DIAGONAL FORM.
C     SIGMA1 AND SIGMA2 ARE THE VALUES OF SIGMA A(I,I)**2 BEFORE AND
C     AFTER ONE COMPLETE ITERATION.  SINCE ANNIHILATION OF AN ELEMENT
C     MAY CAUSE ANOTHER ALREADY-ANNIHILATED ELEMENT TO ASSUME A
C     NON-ZERO VALUE, THE ITERATIVE PROCESS IS REPEATED UNTIL
C     THE CONVERGENCE TEST IS PASSED (1 - SIGMA1/SIGMA2 LESS THAN
C     EPS3) OR ITMAX COMPLETE ITERATIONS HAVE FAILED TO PRODUCE CON-
C     VERGENCE.  AN OFF-DIAGONAL ELEMENT SMALLER IN MAGNITUDE THAN
C     EPS2 IS IGNORED IN THE ANNIHILATION PROCESS.  EPS1 IS
C     COMPARED WITH (A(I,I) - A(J,J)) TO DETERMINE THE
C     METHOD OF COMPUTING CSA AND SNA, THE COSINE AND SINE OF
C     THE ROTATION ANGLE, RESPECTIVELY.  WHEN A HAS
C     BEEN REDUCED TO DIAGONAL FORM, THE EIGENVALUES WILL BE
C     FOUND IN THE DIAGONAL POSITIONS AND ARE SAVED IN EIGEN(1)...
C     EIGEN(N).  THE ASSOCIATED EIGENVECTORS ARE IN CORRESPONDING
C     COLUMNS OF THE FINAL TRANSFORMATION MATRIX T.  S IS THE SUM OF
C     SQUARES OF ALL ELEMENTS IN THE ORIGINAL MATRIX A, AND
C     SHOULD THEORETICALLY EQUAL THE FINAL VALUE OF SIGMA2.
C
      IMPLICIT REAL*8 (A-H, O-Z)
      DIMENSION A(20,20), T(20,20), AIK(20), EIGEN(20)

C     ..... READ PARAMETERS AND ESTABLISH STARTING MATRIX A .....
      DO 2 I = 1, 20
      DO 2 J = 1, 20
      T(I,J) = 0.0
    2 A(I,J) = 0.0
    1 READ (5,101) N, ITMAX, EPS1, EPS2, EPS3, ((A(I,J), J = I, N),
     1             I = 1, N)
      NM1 = N - 1
      WRITE (6,200) N, ITMAX, EPS1, EPS2, EPS3
      DO 3 I = 1, N
    3 WRITE (6,201) (A(I,J), J = 1, N)
C
C     ..... SET UP INITIAL MATRIX T, COMPUTE SIGMA1 AND S .....
      SIGMA1 = 0.0
      OFFDSQ = 0.0
      DO 5 I = 1, N
      SIGMA1 = SIGMA1 + A(I,I)**2
      T(I,I) = 1.0
      IP1 = I + 1
      IF (I .GE. N) GO TO 6
      DO 5 J = IP1, N
    5 OFFDSQ = OFFDSQ + A(I,J)**2
    6 S = 2.0*OFFDSQ + SIGMA1
C
C     ..... BEGIN JACOBI ITERATION .....
      DO 26 ITER = 1, ITMAX
      DO 20 I = 1, NM1
      IP1 = I + 1
      DO 20 J = IP1, N
      IF (DABS(A(I,J)) .LE. EPS2) GO TO 20
C
C     ..... COMPUTE SINE AND COSINE OF ROTATION ANGLE .....
      Q = DABS(A(I,I) - A(J,J))
      IF (Q .LE. EPS1) GO TO 9
      P = 2.0*A(I,J)*Q/(A(I,I) - A(J,J))
      SPQ = DSQRT(P*P + Q*Q)
      CSA = DSQRT((1.0 + Q/SPQ)/2.0)
      SNA = P/(2.0*CSA*SPQ)
      GO TO 10
    9 CSA = 1.0/DSQRT(2.0D0)
      SNA = CSA
   10 CONTINUE

Program Listing (Continued)

..... UPDATE COLUMNS I AND J OF f - E Q U I V A L E N T TO


M U L T I P L I C A T I O N BY THE A N N I H I L A T I O N M A T R I X .....
DO 11 K = I,N
HOLDKI = T(K,I)
T(K,I)
T(K,J) = HOLDKl*SNA -
= H O L D K I * C S A + T(K,J)*SNA
f(K,J)*CSA

..... COMPUTE
DO16
NEW ELEMENTS
K = I, N
OF A I N ROWS I AND J .....

IF i~.NE. J) GO TO 1 4
A(J,K)
GO TO 1 6
= SNA*AIK(K) -
CSA*A(J,K)

HOLDIK = A(I,K)
A ( I ,K) = C S A * H O L D I K + SNA*A(J,K)
A(J,K)
CONT l NU E
= SNA*HOLDIK -
CSA*A(J,K)

.....
AIK(J)
COMPUTE NEW ELEMENTS OF A I N COLUMNS I AND J
= SNA*AIK(I) -
CSA*AIK(J)
.....
..... WHEN K I S LARGER THAN I
DO 1 9 K = 1, J
.....
I F ( K .LE. I ) GO TO 1 8
A(K,J)
GO TO 1 9
= SNA*AIK(K) -
CSA*A(K,J)

HOLDKI = A ( K , I )
A(K, I ) = CSAeHOLDKl + SNA*A(K,J)
A(K;J) -
= S N A ~ H O L D K I CSA*A(K,J)
CONTl NUE
A(I,J) = 0.0

..... F I N D S I G M A 2 FOR TRANSFORMED A AND TEST FOR CONVERGENCE


S I G M A 2 = 0.0
.....
DO 2 1 1 = 1, N
EIGEN(I) = A(I,I)
SIGMA2 = SIGMA2 + E I G E N ( l ) * * 2
I F (1.0 - S I G M A l l S I G M A 2 .GE. E P S 3 ) GO TO 2 5
WRITE ( 6 , 2 0 4 ) ITER, SIGMAZ, S, N
WRlTE (6,201) (EIGEN(I1, I = 1, N )
WRITE (6,205)
DO 2 3 1 = 1, N
WRlTE (6,2011 (A(I,J), J = 1, N )
WRITE (6,206)
DO 2 4 1 = 1, N
WRlTE (6,201) (T(I,J),, J = 1, N)
GO TO 1
WRlTE (6,202) ITER, SIGMA1, SI'GRA2
SIGMA1 = SIGMA2
L
C .....
WRlTE
I F ITER
(6,203)
EXCEEDS ITMAX, NO CONVERGENCE
ITER, S, SIGMA1, S I G M A 2
.....
DO 2 7 1 = 1, N
27 WRITE (6,201) (A(I,J), J = 1, N )
WRITE (6,207)
DO 2 8 I = 1, N
28 WRlTE (6,201) (T(I,J), J = 1, N)
GO TO 1
C
C     ..... FORMATS FOR INPUT AND OUTPUT STATEMENTS .....
  101 FORMAT (2(12X, I3), 3(11X, E9.2), (10X, 6F10.4))
200 FORMAT ( 1 H 1 , 4X, 5 4 H D E T E R M I N A T I O N OF EIGENVALUES BY J A C O B I ' S METH
10D, W I T H I l H O , 6X, 1 0 H N = ., 1 4 1
2 7X, 1 0 H I T M A X = , 1 4 1 7X, 1 0 H E P S l = , E12.31
3 7X, 1 0 H EPS2 = , E 1 2 . 3 1 7X, 1 0 H EPS3 = , E12.311H0,
4 4X, 3 9 H THE S T A R T I N G M A T R I X A(l,l)...A(N,N) IS/lH
2 0 1 FORMAT ( 7 X , 1 0 F 1 1 . 7 )

Program Listing (Continued)


202 FORMAT (1H0, 6X, 1 0 H I T E R = , 15, 10X, 1 0 H S l G M A l = , F10.5,
1 l o x , 1 0 H SIGMA2 = F10.5)
203 FORMAT (1H0, 4X, 21; NO CONVERGENCE, W I T H / / 7X, 1 0 H I T E R =, 15,
1 5X, 1 0 H S = , F10.5, 5X, 1 0 H S I G M A 1 = , F10.5,
2 5X, 1 0 H SIGMA2 = , F 1 0 . 5 / / 5X, 2 4 H THE CURRENT A MATRIX I S / l H 1
204 FORMAT (1H0, 4X, 3 1 H CONVERGENCE HAS OCCURRED, WITH/lHO,
1 6X, 1 0 H I T E R = ,
15, 5 X , 1 0 H SIGMA2 m , F10.5,
2 SX, 1 0 H S = , F10.5, SX, 1 0 H N = IS/lHO,
3 4x, 3 6 EIGENVALUES
~ .
EIGEN( 1). . E I G E N ( N ) ARE~IH
205 FORMAT (1H0, 4X, 34H THE F I N A L TRANSFORMED MATRIX A I S / l H
206 FORMAT (1H0, 4X, 6 8 H EIGENVECTORS ARE I N CORRESPONDING COLUMNS OF
1THE FOLLOWING T M A T R I X / l H )
207 FORMAT (1H0, 4X, 24H THE CURRENT T MATRIX I S I I H 1
L
END

Data
N = 4, ITMAX = 50,
E P S l = 1.00E-10, EPSZ x
A(1,l) = 10.0 9,O
6.0 10.0
N = 4, ITMAX 50,
E P S l = 1.00E-10, EPS2
A(1,l) = 6.0 4.0
4.0 6.0
N = 8, ITMAX = 50,
E P S l = 1.00E-10, EPSZ =
A(1,l) = 2.0
01 0
01 o
.. -1.0
0.0
0.0
030
CI 0
290
. 0.0
0,o
-1.0

Computer Output
Resultsfor the 1st Data Set
DETERMINATION OF EIGENVALUES BY JACOBI'S METHOD, WITH

N      =   4
ITMAX  =   50
EPS1   =   0.100D-09
EPS2   =   0.100D-09
EPS3   =   0.100D-04

THE STARTING MATRIX ~ ( 1 ~ 1 1 .,A(N,N)


. IS

ITER   =   1    SIGMA1 = 325.00000    SIGMA2 = 930.00845

ITER   =   2    SIGMA1 = 930.00845    SIGMA2 = 932.99695

CONVERGENCE HAS OCCURRED, WITH

ITER   =   3    SIGMA2 = 933.00000    S = 933.00000    N = 4

EIGENVALUES EIGEN(1)...EIGEN(N) ARE



Computer Output (Continued)


THE F l NAL TRANSFORMED MATRIX A I S

EIGENVECTORS ARE I N CORRESPONDING COLUMNS OF THE FOLLOWING T MATRIX

Results for the 2nd Data Set


DETERMINATION OF EIGENVALUES BY JACOBI'S METHOD, WITH

N      =   4
ITMAX  =   50
EPS1   =   0.100D-09
EPS2   =   0.100D-09
EPS3   =   0.100D-04
THE STARTING MATRIX A(l,l)...A(N,N) IS
6.0000000 4.0000000 b.0000000 1.0000000
0.0 6.0000000 1.0000000 4.0000000
0.0 0.0 6.0000000 4.0000000
0.0 0.0 0.0 6.0000000
ITER = 1 SIGMA1 = 144.00000 SIGMA2 = 274.60799

ITER = 2 SIGMA1 = 274.60799 SIGMA2 = 275.99999


CONVERGENCE HAS OCCURRED, WITH
ITER   =   3    SIGMA2 = 276.00000    S = 276.00000    N = 4

EIGENVALUES EIGEN(1)...EIGEN(N) ARE

15.0000000   -1.0000000    5.0000000    5.0000000

THE FINAL TRANSFORMED MATRIX A I S


15.0000000 -0.0000000 -0.0000000 -0.0000000
0.0 -1.0000Q00 -0.0000000 -0.0000000
0.0 0.0 5.0000000 0.0
0.0 0.0 0.0 5.0000000
EIGENVECTORS ARE I N CORRESPONDING COLUMNS OF THE FOLLOWING T MATRl X
0.5000000 0.5000000 -0.2699908 0.6547610
0.5000000 -0.5000000 -0.6547640 -0.2669908
0.5000000 -0.5000000 0.6547640 0.2669908 .
0.5000000 0.5000000 0.2669908 -0.6547640

Computer Output (Continued)


Results for the 3rd Data S e r
DETERMINATION OF EIGENVALUES BY JACOBI'S METHOD, WITH

N      =   8
ITMAX  =   50
EPS1   =   0.100D-09
EPS2   =   0.100D-09
EPS3   =   0.100D-04

THE STARTING MATRIX A(l,l)...A(N,N) IS

g,ofJoooo0 -1.0000000 0.0 0.0 0.0 0.0 0.0 0.0


ova 2.0000000 -1.0000000 0.0 0.0 0.0 0.0 0.0
0.0 0.0 2.0000000 -1.0000000 0.0 0.0 0.0 0.0
0.D 0.0 0.0 2.0000000 -1.0000000 0.0 0.0 0.0
0.0 0.0 0.0 0.0 2.0000000 -1.0000000 0.0 0.0
01 0 0.0 0.0 0.0 0.0 2.0000000 -1.0000000 0.0
0.0 0.0 0.0 0.0 0.0 0.0 2.0000000 -1.0000000
0, a 0.0 0.0 0.0 0.0 0.0 0.0 2.0000000
ITER   =   1    SIGMA1 = 32.00000    SIGMA2 = 44.11429

ITER   =   2    SIGMA1 = 44.11429    SIGMA2 = 45.91432

ITER   =   3    SIGMA1 = 45.91432    SIGMA2 = 45.99973

CONVERGENCE HAS OCCURRED, WITH

ITER   =   4    SIGMA2 = 46.00000    S = 46.00000    N = 8

EIGENVALUES EIGEN(1)...EIGEN(N) ARE

0.1206148   3.8793852   3.5320889   0.4679111   1.0000000   3.0000000   2.3472964   1.6527036

THE F I N A L TRANSFORMED MATRIX A I S

0.1206148 0.0000279 0.0000036 -0.0000060 -0.0000001 0.0000000 0.0000000 0.0000000


0.0 3.8793852 0.0000043 -0.0000064 -0.0000001 0.0000000 -0.0000000 -0.0000000
0.0 0.0 3.5320889 -0.0000014 -0.0000000 -0.0000000 -0.0000000 -0.0000000
0.0 0.0 0.0 0.4679111 -0.0000000 -0.0000000 0.0000000 -0.0000000
0.0 0.0 0.0 0.0 1.0000000 -0.0000000 0.0000000 -0.0000000
0.0 0.0 0.0 0.0 0.0 3.0000000 0.0000000 0.0000000
0.0 O,.O 0.0 0.0 0.0 0.0 2.3472964 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.6527036
EIGENVECTORS ARE I N CORRESPONDING COLUMNS OF THE FOLLOWING T MATRIX
260 Matrices and Related Topics

Discussion of Results

The eigenvalues and eigenvectors have been computed for three test matrices. The results are virtually identical with those produced by Rutishauser's method in Example 4.3. The sums of the squares of the diagonal elements of A after each iteration are as follows:

    Iteration    Matrix 1     Matrix 2     Matrix 3
    0 (start)    325.00000    144.00000    32.00000
    1            930.00845    274.60799    44.11429
    2            932.99695    275.99999    45.91432
    3            933.00000    276.00000    45.99973
    4              -            -          46.00000

Since the sums of the squares of all the elements in the starting matrices were 933, 276, and 46, respectively, all off-diagonal elements have been reduced to the vanishing point. Convergence is rapid in all cases; even the 8 x 8 matrix in the third data set requires only four Jacobi sweeps to meet the convergence criterion.
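The cyclic Jacobi sweep used in this example can be sketched compactly. The following is an illustrative Python/NumPy version rather than the program that produced the output above; the convergence test on the sum of squared off-diagonal elements mirrors the SIGMA criterion printed in the output.

```python
import numpy as np

def jacobi_eigen(a, tol=1.0e-10, max_sweeps=50):
    """Eigenvalues/eigenvectors of a real symmetric matrix by
    repeated Jacobi plane rotations (classical cyclic sweeps)."""
    a = np.array(a, dtype=float)
    n = a.shape[0]
    t = np.eye(n)                     # accumulated rotations; columns -> eigenvectors
    for _ in range(max_sweeps):
        # sum of squares of the off-diagonal elements: driven toward zero
        if np.sum(np.tril(a, -1) ** 2) < tol:
            break
        for p in range(n - 1):        # one full sweep over the off-diagonal positions
            for q in range(p + 1, n):
                if abs(a[p, q]) < 1.0e-30:
                    continue
                # rotation angle chosen to annihilate a[p, q]
                theta = 0.5 * np.arctan2(2.0 * a[p, q], a[q, q] - a[p, p])
                c, s = np.cos(theta), np.sin(theta)
                r = np.eye(n)
                r[p, p] = r[q, q] = c
                r[p, q], r[q, p] = s, -s
                a = r.T @ a @ r       # orthogonal similarity transformation
                t = t @ r
    return np.diag(a), t
```

Applied to the 8 x 8 tridiagonal starting matrix of the third data set (2 on the diagonal, -1 on the off-diagonals), this reproduces the eigenvalues listed above to the printed precision.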
4.9 Method of Danilevski

Let A = (a_ij) be a square matrix of order n and suppose that a_12 ≠ 0. Let M(1) = (m_ij(1)) be a matrix such that m_ij(1) = δ_ij (the Kronecker delta, that is, δ_ij = 0 for i ≠ j and δ_ij = 1 for i = j) unless i = 2. Let m_22(1) = 1/a_12 and let m_2j(1) = -a_1j/a_12 if j ≠ 2. Let the inverse of M(1) be P(1). Then if P(1) = (p_ij(1)), it is easy to verify that p_ij(1) = δ_ij if i ≠ 2, while p_2j(1) = a_1j. It is also seen that if M(1) is used as a postmultiplier for A, and P(1) as a premultiplier, the result is a matrix A(2) = (a_ij(2)) such that a_12(2) = 1 while a_1j(2) = 0 if j ≠ 2.

If a_12 is zero, or if it be desired to base the effect on some other nonzero element of the first row, any element of that row, other than the first, can be brought to this location by an interchange of two columns. If this is followed by an interchange of the corresponding rows, the resulting matrix will always be similar to A.

Now let M(2) = (m_ij(2)) be a matrix such that m_ij(2) = δ_ij unless i = 3. Assuming a_23(2) ≠ 0, let m_33(2) = 1/a_23(2) and m_3j(2) = -a_2j(2)/a_23(2) if j ≠ 3. Let the inverse of M(2) be P(2) = (p_ij(2)). Then p_ij(2) = δ_ij if i ≠ 3, while p_3j(2) = a_2j(2). It is also seen that if M(2) be used as a postmultiplier for A(2) and P(2) as a premultiplier, the result is a matrix A(3) = (a_ij(3)) such that a_12(3) = 1, a_1j(3) = 0 if j ≠ 2, a_23(3) = 1, a_2j(3) = 0 if j ≠ 3.

If a_23(2) is zero, or if it be desired to base the effect on some other nonzero element of the second row, any element of this row, other than the first two, can be brought to the proper location by an interchange of two columns. If this is followed by an interchange of the corresponding rows, the resulting matrix is similar to A(2) and the first row is unaltered.

Continuing in this manner we arrive at a matrix B, similar to A and having one of three possible forms. If the process can continue unhindered, then we find

    B = | 0    1    0    0    ...  0        0        0    |
        | 0    0    1    0    ...  0        0        0    |
        | .............................................. |
        | 0    0    0    0    ...  0        1        0    |     (4.57)
        | 0    0    0    0    ...  0        0        1    |
        | b_n1 b_n2 b_n3 b_n4 ...  b_n,n-2  b_n,n-1  b_nn |

It may be that all elements of the first row, other than perhaps a_11, are zero. In such event B may be represented in partitioned form as

    B = | B_11  0    |
        | B_21  B_22 |     (4.58)

where B_11 = b_11 = a_11 and 0 denotes the 1 x (n - 1) null matrix.

Finally, the process may be continued for a time, stopping when all choices for the pivot element lead to a zero. In such event B has the form of (4.58), where now

    B_11 = | 0    1    0    0    ...  0        0        0    |
           | 0    0    1    0    ...  0        0        0    |
           | .............................................. |
           | 0    0    0    0    ...  0        1        0    |     (4.59)
           | 0    0    0    0    ...  0        0        1    |
           | b_m1 b_m2 b_m3 b_m4 ...  b_m,m-2  b_m,m-1  b_mm |

while 0 denotes the m x (n - m) null matrix, and B_21, B_22 have no special character.

The procedure is continued, treating B_22 as A itself was treated (except that it is not considered as detached from B). The final result may be written C = M^(-1)AM, where a typical form for C might be

    C = | C_11  0     0     0    |
        | C_21  C_22  0     0    |     (4.60)
        | C_31  C_32  C_33  0    |
        | C_41  C_42  C_43  C_44 |

Here the "diagonal" matrices C_11, C_22, C_33, and C_44 have the form (4.59), or are 1 x 1 matrices, zero or otherwise.

The result of first importance is that from the matrices C_11, C_22, C_33, and C_44 we may quickly find a factored form for the characteristic function of A. This is because the characteristic function for A is that for C, and the characteristic function for C is the product of those for C_11, C_22, C_33, and C_44.

When the characteristic function is known, it may be that its zeros can be found. With this information, the matrix C and the matrix M may be used to find the eigenvectors of A, for the eigenvectors of A are M times the eigenvectors of C.

The formation of the characteristic vectors of C will be illustrated only for the case where C has the form

    C = | C_11  0     0     0    |
        | 0     C_22  0     0    |
        | 0     0     C_33  0    |
        | 0     0     0     C_44 |

and each C_ii has the form of (4.59). Suppose C_11 is p x p, C_22 is q x q, C_33 is r x r, and C_44 is s x s. Let the last row of C_11 be [e_1 e_2 ... e_p], those for C_22, C_33, and C_44 being [f_1 f_2 ... f_q], [g_1 g_2 ... g_r], and [h_1 h_2 ... h_s], respectively. Just as with the companion matrix (4.32), the characteristic function of C_11 is

    λ^p - e_p λ^(p-1) - e_(p-1) λ^(p-2) - ... - e_2 λ - e_1,

and similarly for C_22, C_33, and C_44.

Let α be an eigenvalue of C and let x, y, z, t be arbitrary for the moment. Then from the nature of C_11, etc., we see that an eigenvector u corresponding to α must be such that u' = [x, αx, ..., α^(p-1)x, y, αy, ..., α^(q-1)y, z, αz, ..., α^(r-1)z, t, αt, ..., α^(s-1)t]. In addition, we must have

    x Σ(i=1 to p) e_i α^(i-1) = α^p x,     y Σ(i=1 to q) f_i α^(i-1) = α^q y,     z Σ(i=1 to r) g_i α^(i-1) = α^r z,

and

    t Σ(i=1 to s) h_i α^(i-1) = α^s t.

Thus we see that if α is a zero for the characteristic function of C_11 only, then x is arbitrary and y = z = t = 0. Moreover, even if α is a multiple eigenvalue, there is still only (essentially) one eigenvector. If, say, α is an eigenvalue for both C_11 and C_33 but not C_22 and C_44, then x and z are arbitrary, while y and t are zero. In such event, there are two linearly independent eigenvectors for α. In general, all eigenvectors can be found from the eigenvectors for the component matrices by using zeros in the rows corresponding to other component matrices. For the case examined last, the eigenvectors would be u_1' = [1, α, ..., α^(p-1), 0, 0, ..., 0, 0, 0, ..., 0, 0, 0, ..., 0] and u_2' = [0, 0, ..., 0, 0, 0, ..., 0, 1, α, ..., α^(r-1), 0, 0, ..., 0].
Problems

4.1 Show that if B1 is a close approximation to A^(-1), then an even better approximation B2 is given by

    B2 = B1(2I - AB1).

4.2 If A, B, and C are nonsingular square matrices, prove that (ABC)^(-1) = C^(-1)B^(-1)A^(-1).

4.3 Prove that (AB)' = B'A'.

4.4 Show that the given matrix is unitary, and find its adjoint in two ways.

4.5 A point P has position vector x relative to an origin O in rectangular coordinates. If the axes are rotated, but still remain orthogonal, the new position vector of P may be expressed by the linear transformation Ax. By noting that the distance OP is unchanged, show that A is an orthogonal matrix, that is, one for which A' = A^(-1).

4.6 Find all solutions of the simultaneous equations

    -x1 + 4x2 - 2x3 + 3x4 + x5 = 1,
    ....................................
    -9x1 - 6x2 + 6x3 + 17x4 + 17x5 = 21.

4.7 It is shown (page 113) for orthonormal vectors u1, u2, ..., un, that [u1 u2 ... un]*[u1 u2 ... un] = I. Prove that also [u1 u2 ... un][u1 u2 ... un]* = I.

4.8 Express the given singular matrix as the product of two matrices.

4.9 Matrices such as X3 are called circulants. Find all eigenvalues and corresponding eigenvectors for X3. Use the results to build a matrix U such that U^(-1)X3U is diagonal (see equation (4.20)).

4.10 Find one eigenvalue and a corresponding eigenvector for the general circulant Xn.

4.11 Show that if

    A = |  cos θ   sin θ |
        | -sin θ   cos θ |,

then

    A^n = |  cos nθ   sin nθ |
          | -sin nθ   cos nθ |.

4.12 Devise a problem, similar to the previous problem, using cosh θ and sinh θ.

4.13 Prove that the set of all nonsingular diagonal matrices of order n is a commutative group under matrix multiplication.

4.14 Verify Sylvester's law of nullity for the two given matrices.

4.15 Find the general solution of the given simultaneous equations. Caution: the matrix of coefficients has a repeated eigenvalue.

4.16 Write a program for finding, by the method of Leverrier, the characteristic polynomial for a square matrix A of order n. (A related method due to Faddeev is to be preferred if the eigenvectors, or the inverse, or the adjoint is desired as well; see Problem 4.17.)
The method is described for the polynomial as displayed in equation (4.11). Let s1 = Σ(i=1 to n) λi be the sum of the zeros of the above polynomial and, more generally, let sk = Σ(i=1 to n) λi^k be the sum of the kth powers of these zeros. Then, by Newton's formulas [13], for k = 1, 2, ..., n,

    k αk = α(k-1) s1 - α(k-2) s2 + α(k-3) s3 - ... + (-1)^(k-1) sk,   (α0 = 1),

so that α1, α2, ..., αn can be calculated if s1, s2, ..., sn are known. But the spur of any matrix M is the sum of its eigenvalues, so that (see page 220) s1 = Sp(A) and, in general, sk = Sp(A^k).

4.17 Write a program for finding, by the method of Faddeev, the characteristic polynomial φ(λ) = det(A - λI) = (-1)^n Δ(λ), where Δ(λ) is written as in equation (4.11). Since this method involves also the computation of B1, B2, ..., B(n-1), where

    B(λ) = adj(λI - A),
the program can be written so that we also find:

(a) adj(A) = (-1)^(n-1) adj(-A) = B(n-1),
(b) A^(-1) = αn^(-1) adj A = αn^(-1) B(n-1), provided that A is nonsingular,
(c) for any known eigenvalue λi, a corresponding eigenvector, pi, by taking any nonzero column (see page 220) of B(λi).

The process, which follows, is established in Gantmacher [11]:

    A1 = A,         α1 = Sp(A1),          B1 = α1 I - A1,
    A2 = A B1,      α2 = (1/2) Sp(A2),    B2 = α2 I - A2,
    ......................................................
    An = A B(n-1),  αn = (1/n) Sp(An),    Bn = αn I - An.

4.18 Read the first two paragraphs of Problem 5.13 and assume the following:
(a) The total strain energy of the frame is given by U = (1/2) s'Fs.
(b) The elements of the vector r = [r1 r2 ... ri ... rn]' of loads in the n selected redundant members adjust themselves so that the strain energy is minimized. That is, ∂U/∂ri = 0, i = 1, 2, ..., n.
(c) The m corresponding elements δj and pj of the displacement and applied load vectors δ and p are related by δj = ∂U/∂pj, j = 1, 2, ..., m.
Prove the validity of the relations for r and s given in Problem 5.13.

4.19 A matrix occurring frequently in the numerical solution of partial differential equations is the n x n tridiagonal matrix [9]:

    Tn = | -2   1   0  ...  0   0 |
         |  1  -2   1  ...  0   0 |
         |  0   1  -2  ...  0   0 |
         | ....................... |
         |  0   0   0  ...  1  -2 |

wherein all entries are zero except the diagonal entries, always -2, and the infra- and super-diagonal entries, always 1. Show that the eigenvalues and corresponding eigenvectors are given by

    λi = -2 + 2 cos(iπ/(n+1)),

    vi = [sin(iπ/(n+1)), sin(2iπ/(n+1)), ..., sin(niπ/(n+1))]',

where i = 1, 2, ..., n.

4.20 The explicit method for the numerical solution of u_t = u_xx is discussed in Section 7.5 (see also p. 61 of Smith [9]). If vn = [v1,n, v2,n, ..., v(M-1),n]' is the vector of solutions at time-level n, and the boundary values g0(tn) and g1(tn) are zero, show that

    v(n+1) = A vn,

in which A = I + r T(M-1), where r = Δt/(Δx)² and T(M-1) is the matrix of Problem 4.19 (now with M - 1 rows and columns).
The eigenvalue approach for discussing stability of the method involves proving whether the eigenvalues of A are less than one in magnitude (see page 222). Show that these eigenvalues are

    λi = 1 - 4r sin²(iπ/2M),

and that stability occurs if r ≤ 1/2.

4.21 In the Crank-Nicolson scheme for the numerical solution of u_t = u_xx (page 451; also [9], p. 64), the eigenvalue approach for discussing stability involves proving that the eigenvalues of A = (2I - rT(M-1))^(-1)(2I + rT(M-1)) are less than one in magnitude. Use Problem 4.19 to show that the eigenvalues are

    λi = [1 - 2r sin²(iπ/2M)] / [1 + 2r sin²(iπ/2M)],

and that the procedure is always stable.

4.22 In item (6) under "Method of Solution" in Example 4.3 (Rutishauser's method), verify:
(a) the matrix labeled L^(-1) as the inverse of that labeled L,
(b) the simultaneous equations given for v1, v2, ..., vn,
(c) the general formula given for ci.

4.23 Consider the horizontal motion of the mass-spring system shown in Fig. P4.23. The horizontal deflections x1 and x2 are measured relative to the position of static equilibrium. The spring stiffnesses k1, k2, and k3 are the forces required to extend or compress each spring by unit length.

Figure P4.23

(a) Show that the equations of motion are

    m1 d²x1/dt² = -k1 x1 + k2 (x2 - x1),
    m2 d²x2/dt² = -k3 x2 - k2 (x2 - x1).

(b) If the deflection vector is x = [x1 x2]', rewrite the equations of motion in the form ẍ = Ax.
(c) Show that the substitution x = b e^(jωt), where j = √-1, leads to the eigenvalue problem

    Ab = λb,

where λ = -ω². The possible values that ω may assume are the natural circular frequencies of vibration of the system.
(d) If k1 = k2 = k3 = 1 lbm/sec², and m1 = m2 = 1 lbm, find the eigenvalues and eigenvectors of A.
(e) If the initial conditions at t = 0 are x1 = 1, x2 = ẋ1 = ẋ2 = 0, show that . . .

4.24 Generalize Problem 4.23 to the situation involving n masses, m1, m2, ..., mn, connected by n + 1 springs of stiffnesses k1, k2, ..., k(n+1).
Write a program that will read values for these masses and spring stiffnesses as data, and that will proceed to compute the resulting natural frequencies ω and the associated amplitude vectors b.
Test the program for a variety of mass-spring systems, and investigate, as special cases, the effect of making a particular mass or spring constant either very large or very small.

4.25 Figure P4.25 is a simplified representation of the main three rotating masses in a two-cylinder engine driving a flywheel on the same shaft.
I1, I2, and I3 are the moments of inertia (lbf in. sec²) of the rotating masses. The shaft stiffnesses k1 and k2 are the torques (lbf in.) required to twist each section through unit angle. Let θ1, θ2, and θ3 be the angular displacements of the rotating masses.
(a) By equating torque to the product of moment of inertia and angular acceleration, show that . . .
(b) Following the general pattern of Problem 4.23, compute the natural frequencies and the associated vectors of relative displacements for each mode of torsional oscillations if I1 = I2 = 4, I3 = 20 lbf in. sec², k1 = 8 x 10⁶, and k2 = 10⁷ lbf in./radian.

4.26 Write a program that will generalize the situation in Problem 4.25 to that of n rotating masses on a shaft.
Test the program with the data of Problem 4.25, and also for the following four-cylinder engine and flywheel: I1 = I2 = I3 = I4 = 4, I5 = 20 lbf in. sec², k1 = k2 = k3 = 8 x 10⁶, k4 = 10⁷ lbf in./radian. Also investigate the effect of adding a generator (I6 = 8) connected by a shaft (k5 = 2), (a) next to the flywheel, and (b) next to the first cylinder.

4.27 The transverse deflection y of a beam obeys the differential equation

    d²y/dx² = -M/(EI),

where x is the axial distance along the beam, M is the local bending moment (considered positive if it tends to make the beam convex in the direction y, as shown), E is Young's modulus, and I is the appropriate cross-sectional moment of inertia.

Figure P4.27

A column of length L, shown in Fig. P4.27, is subjected to an axial load P at its pin-jointed ends, which are free to move vertically. By taking moments, we find that M = Py, giving

    d²y/dx² = -Py/(EI).

Show that a nontrivial solution of this differential equation occurs only when P has one of the characteristic values

    P = n²π²EI/L²,   n = 1, 2, 3, ...,

and that the solution is then y = A sin(nπx/L), where A is an arbitrary constant. Note that for n = 1, we have P = π²EI/L² (known as the Euler load), under which the column will just begin to buckle.
What will happen if P does not equal one of these characteristic values?

4.28 Approximate the solution to Problem 4.27 by the following finite-difference method. Divide the beam into n + 1 segments, each of length Δx = L/(n + 1), by introducing points labelled 0, 1, 2, ..., i, ..., n, n + 1, as shown in Fig. P4.28.

Figure P4.28

From equation (7.7), the approximation to the differential equation at the ith point is

    (y(i-1) - 2yi + y(i+1))/(Δx)² = -P yi/(EI).

(a) By considering points i = 1, 2, ..., n in turn, show that the vector of deflections, y = [y1 y2 ... yn]', obeys

    Ay = λy,

where A is the n x n tridiagonal matrix with diagonal entries 2 and off-diagonal entries -1 (the negative of the matrix of Problem 4.19), and λ = P(Δx)²/(EI).
(b) How are the characteristic loads P and the corresponding deflections y related to the eigenvalues and eigenvectors of A?
(c) The eigenvalues of A are computed, for n = 8, in Example 4.3. Compare the resulting characteristic loads and relative deflections with the exact solution given in Problem 4.27.

4.29 A vertical column of length L, clamped at the bottom, is subjected to a vertically downward load P at its free end, as shown in Fig. P4.29. The column does not necessarily have a uniform cross-sectional moment of inertia I along its height.

Figure P4.29

Show, by developing the situation into a matrix eigenvalue problem, how the axial buckling load could be estimated.

4.30 A vertical column, pin-jointed at both ends, is tapered so that its minimum cross-sectional moment of inertia varies linearly from I0 at the bottom (x = 0) to IL at the top (x = L).
By following a finite-difference procedure similar to that of Problem 4.28, determine the least axial load under which the column would start to buckle. At this point, what would be the axial shape of the column?
The following test data are suggested: E = 30 x 10⁶ lbf/sq in., L = 48 in., I0 = 0.002 in.⁴, IL = 0.001 in.⁴, n = 4 and 9 (that is, investigate the results first using five, and then ten subdivisions).

4.31 A beam of length L is pin-jointed at both ends. Transverse deflections y of the beam vary with time t and distance x along the beam according to the partial differential equation

    EI ∂⁴y/∂x⁴ = -Aρ ∂²y/∂t²,

where E = Young's modulus, I = second moment of area, A = cross-sectional area, and ρ = density. Substitution of a solution of the form y = X(x)e^(jωt) (where j = √-1 and ω is a circular frequency of natural vibrations) into the above equation leads to the following ordinary differential equation for the function X:

    d⁴X/dx⁴ = (ω²Aρ/(EI)) X.

At both ends we have X = 0 and d²X/dx² = 0.
By subdividing the beam into n + 1 equal segments and following the general finite-difference approach of Problem 4.28, show that the determination of the frequency and relative displacements for each natural mode of vibration amounts to finding the eigenvalues and eigenvectors of a matrix. If required, additional information concerning finite-difference approximations and the representation of boundary conditions can be found in Sections 7.3 and 7.17, and in Example 7.9.
For a certain small light-alloy beam or reed, E = 10⁷ lbf/in.², L = 10 in., ρ = 170 lbm/ft³, and b (breadth) = 0.5 in. The thickness t varies linearly from one end to the other according to t = t0 + (tL - t0)(x/L). At any section, A = bt and I = bt³/12. Using n = 5 and n = 10 in turn, compute the various modes of vibration for (a) t0 = tL = 0.02 in. and (b) t0 = 0.02 in., tL = 0.025 in. The conversion 1 lbf = 32.2 lbm ft/sec² will be needed. Also note that, analytically, for the case of constant thickness, ω = (EI/Aρ)^(1/2) p², where p is a root of sin pL = 0 [14].

4.32 Repeat Problem 4.31 for the case of a small cantilever that is clamped at x = 0 (X = 0, dX/dx = 0) and is completely free at x = L (d²X/dx² = 0, d³X/dx³ = 0). Investigate the cases of (a) t0 = tL = 0.02 in., (b) t0 = 0.02 in., tL = 0.025 in., and (c) t0 = 0.025 in., tL = 0.02 in. Analytically, for the case of constant thickness, ω = (EI/Aρ)^(1/2) p², where p is a root of cosh pL cos pL = -1 (see Problem 3.27).

4.33 The transverse deflections w of a two-dimensional vibrating membrane with fixed perimeter are governed by

    T (∂²w/∂x² + ∂²w/∂y²) = m ∂²w/∂t²,

where T is the tension per unit length throughout the membrane and m is the mass per unit area; consistent units are assumed. Substitution of a solution of the form w = U(x, y)e^(jωt) (where j = √-1 and ω is a circular frequency of natural vibrations) leads to the following equation for the function U:

    ∂²U/∂x² + ∂²U/∂y² + (mω²/T) U = 0.

By following a finite-difference approach similar to that in Problem 4.28, show that the determination of the frequency and relative displacements for each natural mode of vibration amounts to finding the eigenvalues and eigenvectors of a matrix.
Determine the natural frequencies and modes of vibration for the trapezoidal-shaped membrane of Fig. P4.33, with √(T/m) = 2000 in./sec. Suggested grid size: Δx = 1 in., Δy = 2 in., using symmetry to reduce the number of points to be considered.
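The finite-difference eigenvalue formulation of Problems 4.27 and 4.28 is easy to exercise numerically. The following is an illustrative Python/NumPy fragment (our own sketch, using a uniform moment of inertia rather than the tapered columns of Problems 4.30-4.32); it recovers the Euler load to within a few per cent.

```python
import numpy as np

# Buckled column (Problems 4.27/4.28): d2y/dx2 = -Py/(EI) becomes
# Ay = lam*y, with A tridiagonal (2 on the diagonal, -1 off it) and
# lam = P*(dx)^2/(EI), so each eigenvalue gives a characteristic load.
E, I, L = 30.0e6, 0.002, 48.0          # uniform-column variant of Problem 4.30's data
n = 9                                  # ten subdivisions
dx = L / (n + 1)
A = (np.diag([2.0] * n)
     + np.diag([-1.0] * (n - 1), 1)
     + np.diag([-1.0] * (n - 1), -1))
P = np.linalg.eigvalsh(A) * E * I / dx**2   # candidate buckling loads
print(min(P))                          # finite-difference estimate of the least load
print(np.pi**2 * E * I / L**2)         # exact Euler load, about 257 lbf
```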
4.34 In many practical problems involving vibrations, one is often primarily concerned only with λmin and λmax, the eigenvalues of least and greatest magnitude.
In this connection, write a program that will use the power method to determine λmax and λmin and the associated eigenvectors for an n x n real matrix A, which is read as data.

4.35 Write a general-purpose program that, given a real matrix A of order n, computes the coefficients of the characteristic equation (or factors of the characteristic equation) using the method of Danilevski, described in Section 4.9. The program should read the n² elements of A = (aij) into the first n rows of the matrix A, and read values for a small tolerance ε and any desired printing control variables. A should be transformed in place by the sequence of matrices M(1), M(2), ... described in the text. The matrices M(i) need not appear explicitly, but the composite similarity matrix M should be constructed in the second n rows of A during the course of the iteration. The matrix C of (4.60) may be built up in the first n rows of A (the original matrix A will be destroyed in the process).
Before the Danilevski procedure is applied to the ith row of A, all elements to the right of the diagonal should be examined to find the element of greatest magnitude, in column k for example. Exchange columns k and i + 1 and rows k and i + 1 to maintain the similarity of A as described in the text. If the largest element is smaller in magnitude than ε, simply clear the row to zero and proceed to the next row.
The coefficients of the characteristic equation (or coefficients of factors of the characteristic equation) should be stored in the vector B and printed when found.
As an indication of the errors resulting from the procedure, the starting matrix A can be regenerated as

    A = MCM^(-1).

After recovering all the coefficients of the characteristic equation:
(a) Evaluate T = MC, temporarily saving the results in a matrix T.
(b) Invert M in place in the second n rows of A, using the function SIMUL (see Example 5.2).
(c) Compute A = TM^(-1), where the reconstructed elements of A are overstored in the first n rows of A.
(d) Print out the reconstructed matrix and compare it with the original matrix A.
As test data for the program, use the first three matrices investigated in Example 4.3 and the matrix of (3.1.10) for Cases A and B of Example 3.1. The roots of the characteristic equations for these matrices (that is, the eigenvalues of the matrices) have been found in data sets 8, 11, 9, 14, and 15, respectively, of Example 3.1.

4.36 Consider the following transformation of a symmetrical n x n starting matrix A1 = A = (aij), to give a second matrix A2:

    A2 = P2 A1 P2',

where P2 = I - 2 v2 v2' and the column vector v2 is such that v2'v2 = 1. Show that P2 is both orthogonal and symmetric and hence that the eigenvalues of A2 are the same as those of A1. Now define the elements of v2, in which either both plus signs or both minus signs are taken. Prove that if v2 = [0, x2, x3, ..., xn]', A2 will have zeros in the third through last elements of its first row and first column. Demonstrate that n - 3 analogous further transformations, A3 = P3 A2 P3', A4 = P4 A3 P4', etc., can be made that will ultimately yield a symmetric matrix that is tridiagonal.
Let α1, α2, ..., αn be the diagonal elements of A(n-2) and let its subdiagonal (and superdiagonal) elements be b1, b2, ..., b(n-1). Show that pn(λ), the characteristic polynomial of A(n-2), is given by the following recursion relations:

    p0(λ) = 1,
    p1(λ) = α1 - λ,
    pi(λ) = (αi - λ) p(i-1)(λ) - b(i-1)² p(i-2)(λ),   i = 2, 3, ..., n.

The above forms the basis for the Givens-Householder method for determining the eigenvalues (and eigenvectors) of a symmetric matrix. For a comprehensive discussion, see Ralston and Wilf [15].

Bibliography

1. E. Bodewig, Matrix Calculus, North-Holland Publishing Co., Amsterdam, 1959.
2. N. H. McCoy, Introduction to Modern Algebra, Allyn and Bacon, New York, 1960.
3. G. E. Forsythe and P. Henrici, "The Cyclic Jacobi Method for Computing the Principal Values of a Complex Matrix," Trans. Amer. Math. Soc., 94, 1-23 (1960).
4. H. Rutishauser, "Solution of Eigenvalue Problems with the LR-Transformation," Natl. Bureau of Standards Appl. Math. Series, 49, 47-81 (1958).
5. D. K. Faddeev and V. N. Faddeeva, Computational Methods of Linear Algebra, Freeman, San Francisco, 1963.
6. H. W. Turnbull and A. C. Aitken, An Introduction to the Theory of Canonical Matrices, Dover, New York, 1961.
7. L. A. Pipes, Matrix Methods for Engineering, Prentice-Hall, Englewood Cliffs, New Jersey, 1963.
8. F. B. Hildebrand, Methods of Applied Mathematics, Prentice-Hall, Englewood Cliffs, New Jersey, 1965.
9. G. D. Smith, Numerical Solution of Partial Differential Equations, Oxford University Press, London, 1965.
10. E. D. Nering, Linear Algebra and Matrix Theory, Wiley, New York, 1963.
11. F. R. Gantmacher, The Theory of Matrices, Chelsea, New York, 1960.
12. K. Hoffman and R. Kunze, Linear Algebra, Prentice-Hall, Englewood Cliffs, New Jersey, 1961.
13. G. Chrystal, Textbook of Algebra, Vol. 1, Chelsea, New York, 1952.
14. A. H. Church, Mechanical Vibrations, 2nd ed., Wiley, New York, 1963.
15. A. Ralston and H. S. Wilf, ed., Mathematical Methods for Digital Computers, Vol. 2, Wiley, New York, 1967.
CHAPTER 5

Systems of Equations

5.1 Introduction

This chapter is concerned with methods for solving the following system of n simultaneous equations in the n unknowns x1, x2, ..., xn:

    f1(x1, x2, ..., xn) = 0,
    f2(x1, x2, ..., xn) = 0,
    ...........................     (5.1)
    fn(x1, x2, ..., xn) = 0.

The general case, in which the functions f1, f2, ..., fn do not admit of any particular simplification, is treated in Sections 5.8 and 5.9. However, if these functions are linear in the x's, (5.1) can be rewritten as:

    b11 x1 + b12 x2 + ... + b1n xn = u1,
    b21 x1 + b22 x2 + ... + b2n xn = u2,
    ....................................     (5.2)
    bn1 x1 + bn2 x2 + ... + bnn xn = un.

More concisely, we have

    Bx = u,

in which B is the matrix of coefficients, u = [u1, u2, ..., un]' is the right-hand side vector, and x = [x1, x2, ..., xn]' is the solution vector. Assuming negligible computational round-off error, direct methods for solving (5.2) exactly, in a finite number of operations, are discussed in Sections 5.3, 5.4, and 5.5. These direct techniques are useful when the number of equations involved is not too large (typically of the order of 40 or fewer equations). Iterative methods for solving (5.2) approximately are described in Sections 5.6 and 5.7. These iterative techniques are more appropriate when dealing with a large number of simultaneous equations (typically of the order of 100 equations or more), which will often possess certain other special characteristics.

5.2 Elementary Transformations of Matrices

Before studying systems of equations, it is useful to consider the three types of elementary matrices:

1. An elementary matrix of the first kind is an n x n diagonal matrix Q, formed by taking the identity matrix I and replacing the ith diagonal element with a nonzero constant q. For example, with n = 4 and i = 3,

    Q = | 1  0  0  0 |
        | 0  1  0  0 |
        | 0  0  q  0 |
        | 0  0  0  1 |

Note that det Q = q, and that the inverse matrix Q^(-1) = diag(1, 1, 1/q, 1) is again like I, this time with 1/q in the ith diagonal position.

2. An elementary matrix of the second kind is an n x n matrix R, formed by interchanging any two rows i and j of I. For example, with n = 4, i = 1, and j = 3,

    R = | 0  0  1  0 |
        | 0  1  0  0 |
        | 1  0  0  0 |
        | 0  0  0  1 |

Note that det R = -1, and that R is self-inverse, that is, RR = I.

3. An elementary matrix of the third kind is an n x n matrix S, formed by inserting a nonzero constant s into the i, j (i ≠ j) element of I. (This may also be construed as taking I and adding a multiple s of each element in row j to the corresponding element in row i.) For example, with n = 4, i = 3, and j = 1,

    S = | 1  0  0  0 |
        | 0  1  0  0 |
        | s  0  1  0 |
        | 0  0  0  1 |

Note that det S = 1.

Premultiplication of an arbitrary n x p matrix A by one of these elementary matrices produces an elementary transformation of A, also termed an elementary row operation, on A. As examples, we form the products QA, RA, and SA, with n = 3, i = 2, j = 3, and p = 4.
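These three products can be verified numerically. The following is an illustrative Python/NumPy fragment; the matrix A and the values q = 5 and s = 7 are arbitrary choices of ours, with n = 3, i = 2, j = 3, and p = 4 as in the text.

```python
import numpy as np

# Elementary row operations expressed as premultiplications.
A = np.arange(1.0, 13.0).reshape(3, 4)             # an arbitrary 3 x 4 matrix

Q = np.diag([1.0, 5.0, 1.0])                       # first kind: q = 5 in position (2, 2)
R = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])                    # second kind: rows 2 and 3 of I interchanged
S = np.eye(3)
S[1, 2] = 7.0                                      # third kind: s = 7 in element (2, 3)

print(Q @ A)   # row 2 of A multiplied by 5
print(R @ A)   # rows 2 and 3 of A interchanged
print(S @ A)   # 7 times row 3 of A added to row 2
```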
It is apparent that premultiplication by the elementary matrices produces the following transformations of A:

1. QA: Multiplication of all elements of one row by a scalar.
2. RA: Interchange of two rows.
3. SA: Addition of a scalar multiple of elements of one row to the corresponding elements of another row.

Observe that in each case the original elementary matrix can be formed from the identity matrix I by manipulating it exactly as we wish to have A manipulated.
Postmultiplication of an arbitrary p x n matrix A by one of the elementary matrices is called an elementary column operation. The three types of operations produce the following results:

1. AQ: Multiplication of all elements of one column by a scalar.
2. AR: Interchange of two columns.
3. AS: Addition of a scalar multiple of elements of one column to the corresponding elements of another column.

If A is any matrix and T is the matrix resulting from elementary row or column operations on A, T and A are termed equivalent matrices. For the examples given above, if A is a square matrix,

    det(QA) = det Q x det A = q det A;
    det(RA) = det R x det A = -det A;
    det(SA) = det S x det A = det A.

Thus, multiplication of all the elements of one row of a square matrix by a scalar also multiplies the determinant of the matrix by that scalar. Interchange of two rows changes the sign of the determinant (but not its magnitude), and addition of a scalar multiple of elements of one row to the corresponding elements of another row has no effect on the determinant.
Clearly, the product of elementary matrices is nonsingular, for each component has an inverse. It is also true that every nonsingular matrix can be written as a product of elementary matrices.

5.3 Gaussian Elimination

The direct methods of solving equations (5.2) are based on manipulations using the techniques expressed by the elementary matrices of Section 5.2. We now describe one such method, known as Gaussian elimination.
Consider a general system of three linear equations:

    b11 x1 + b12 x2 + b13 x3 = u1,
    b21 x1 + b22 x2 + b23 x3 = u2,     (5.4)
    b31 x1 + b32 x2 + b33 x3 = u3.

As a first step, replace the second equation by the result of adding to it the first equation multiplied by -b21/b11. Similarly, replace the third equation by the result of adding to it the first equation multiplied by -b31/b11. The result is the system

    b11 x1 + b12 x2 + b13 x3 = u1,
             b'22 x2 + b'23 x3 = u'2,    (5.5)
             b'32 x2 + b'33 x3 = u'3,

in which the b' and u' are the new coefficients resulting from the above manipulations. Now multiply the second equation of (5.5) by -b'32/b'22, and add the result to the third equation of (5.5). The result is the triangular system

    b11 x1 + b12 x2 + b13 x3 = u1,
             b'22 x2 + b'23 x3 = u'2,    (5.6)
                      b"33 x3 = u"3,

in which b"33 and u"3 result from the arithmetic operations. The system (5.6) is readily solved by the process of back-substitution, in which x3 is obtained from the last equation; this allows x2 to be obtained from the second equation, and then x1 can be found from the first equation.
The above method seems primitive at a first glance, but by the time it has been made suitable for implementation by automatic machines, it furnishes a powerful tool not only for solving equations (5.2), but also for finding the inverse of the related matrix of coefficients B, the determinant of B, the adjoint of B, etc.
Insofar as reaching (5.6) is concerned, all can be explained in terms of elementary matrices of the third kind. Note that matrices alone suffice, the presence of x1, x2, and x3 being superfluous. Define an augmented matrix C consisting of the original coefficient matrix B with the right-hand side vector u appended to it. That is,

    C = [B ¦ u],

in which the broken line denotes matrix partitioning.
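The reduction to the triangular form (5.6) can be mimicked directly with matrices of the third kind. The following is an illustrative Python/NumPy fragment (our own sketch, using the example system solved later in this section):

```python
import numpy as np

def third_kind(i, j, s, n=3):
    """Elementary matrix of the third kind: I with s in element (i, j)."""
    m = np.eye(n)
    m[i, j] = s
    return m

B = np.array([[2.0, -7.0, 4.0],
              [1.0, 9.0, -6.0],
              [-3.0, 8.0, 5.0]])
u = np.array([9.0, 1.0, 6.0])
C = np.hstack([B, u[:, None]])               # augmented matrix [B | u]

S1 = third_kind(1, 0, -B[1, 0] / B[0, 0])    # clears b21
S2 = third_kind(2, 0, -B[2, 0] / B[0, 0])    # clears b31
C2 = S2 @ S1 @ C
S3 = third_kind(2, 1, -C2[2, 1] / C2[1, 1])  # clears b'32
C3 = S3 @ C2                                 # the triangular system (5.6)
print(C3)
```

Back-substitution on C3 then yields x3 = 2, x2 = 1, and x1 = 4.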
5.3 Gaussian Elimination 271

Also define three elementary matrices of the third kind, S1, S2, and S3, chosen to annihilate the subdiagonal elements. The operations producing (5.6) from (5.4) can then be expressed as

    S3 S2 S1 C = [ b11  b12   b13  |  u1  ]
                 [  0   b'22  b'23 |  u'2 ]
                 [  0    0    b''33|  u''3],

the triangular form corresponding to (5.6). The back-substitution is expressed in terms of premultiplication by elementary matrices of the first and third kinds. Let Q1, Q2, and Q3 denote the three matrices of the first kind which are needed to normalize the three rows. Then, with three more matrices of the third kind, which we call S4, S5, and S6, the complete sequence of operations results in

    Q3 S6 S5 Q2 S4 Q1 S3 S2 S1 C = [ I | x ].

Let E denote the product of these nine elementary matrices. Then EC = E[B | u] = [I | x], whence EB = I and E = B^-1. Hence, as a byproduct of solving equations such as (5.4) by elimination, we see that proper planning can produce B^-1. Clearly, we need not solve equations at all if only the inverse is needed, for in that event the column u is superfluous.

Since EB = I, det(E) det(B) = det(I) = 1. From Section 5.2, the determinant of an S or third-kind elementary matrix is unity, whereas the determinant of a Q or first-kind matrix equals the value of that diagonal element which is usually not unity. Hence det(E) = det(Q3) x det(Q2) x det(Q1). That is, det(E) is the product of the diagonal elements (such as 1/b''33) of the matrices Q1, Q2, and Q3 used in the elimination process. This means that det(B) is the product of their reciprocals.

The above arithmetic operations can be separated into two types: (a) normalization steps, in which the diagonal elements are converted to unity, and (b) reduction steps, in which the off-diagonal elements are converted to zero.

Note that by augmenting the coefficient matrix with several right-hand side vectors, we can solve several sets of simultaneous equations, each having the same coefficient matrix, at little extra computational cost.

Example. Consider the system of equations

    2x1 - 7x2 + 4x3 = 9,
     x1 + 9x2 - 6x3 = 1,
   -3x1 + 8x2 + 5x3 = 6,

for which the solution is x1 = 4, x2 = 1, and x3 = 2. The augmented matrix [B | u | I] will be formed, and the Gaussian elimination procedure just described will be carried out, except that the normalization steps will be introduced in a somewhat different order. Starting with the matrix

    [  2  -7   4 | 9 | 1  0  0 ]
    [  1   9  -6 | 1 | 0  1  0 ]
    [ -3   8   5 | 6 | 0  0  1 ],

we multiply the top row by 1/2, add -1 times the new first row to the second row, and 3 times the new first row to the third row. The result is

    [ 1  -7/2   2  |  9/2 |  1/2  0  0 ]
    [ 0  25/2  -8  | -7/2 | -1/2  1  0 ]
    [ 0  -5/2  11  | 39/2 |  3/2  0  1 ].          (5.7)

This is equivalent to having formed the equations

    x1 - (7/2)x2 +  2x3 =  9/2,
        (25/2)x2 -  8x3 = -7/2,
        -(5/2)x2 + 11x3 = 39/2.

Note that the operations performed are equivalent to premultiplying the augmented matrix by the corresponding elementary matrices, which yields (5.7) as a result.
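The forward course and back-substitution just described can be transliterated directly. The sketch below is our addition (the book's own programs are in FORTRAN); exact rational arithmetic makes the pivots 2, 25/2, and 47/5, the solution, and det B = 235 appear exactly as in the text.

```python
from fractions import Fraction as F

B = [[2, -7, 4], [1, 9, -6], [-3, 8, 5]]
u = [9, 1, 6]
n = 3
# augmented matrix C = [B | u | I]
C = [[F(B[i][j]) for j in range(n)] + [F(u[i])]
     + [F(1) if j == i else F(0) for j in range(n)] for i in range(n)]

det_B = F(1)
for k in range(n):                       # forward course, with normalization
    pivot = C[k][k]
    det_B *= pivot                       # det B = product of the pivots
    C[k] = [e / pivot for e in C[k]]
    for i in range(k + 1, n):
        f = C[i][k]
        C[i] = [a - f * b for a, b in zip(C[i], C[k])]

for k in range(n - 1, -1, -1):           # back-substitution
    for i in range(k):
        f = C[i][k]
        C[i] = [a - f * b for a, b in zip(C[i], C[k])]

x = [row[n] for row in C]
B_inv = [row[n + 1:] for row in C]
assert x == [4, 1, 2]
assert det_B == 235
assert B_inv[2] == [F(7, 47), F(1, 47), F(5, 47)]
```

Because the three appended columns start as the identity, they finish as the columns of B^-1, exactly as the text's byproduct argument (E = B^-1) predicts.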
272 Systems of Equations

Returning to (5.7), multiply the second row by 2/25, and then add 5/2 times the new second row to the third row. The result is:

    [ 1  -7/2     2   |  9/2  |  1/2    0    0 ]
    [ 0    1   -16/25 | -7/25 | -1/25  2/25  0 ]
    [ 0    0    47/5  |  94/5 |  7/5   1/5   1 ].

The forward course has now been completed and, corresponding to (5.6), we may write

    x1 - (7/2)x2 +       2x3 =  9/2,
              x2 - (16/25)x3 = -7/25,
                    (47/5)x3 =  94/5.

To carry out the back-substitution, start by multiplying the last row by 5/47. Then multiply the new last row by 16/25 and add to the second row. Multiply this same last row by -2 and add to the first row. The result is

    [ 1  -7/2  0 | 1/2 | 19/94   -2/47  -10/47 ]
    [ 0    1   0 |  1  | 13/235  22/235  16/235 ]
    [ 0    0   1 |  2  |  7/47    1/47    5/47  ].

Finally, multiply the second row by 7/2 and add to the first. The result is

    [ 1  0  0 | 4 | 93/235  67/235   6/235 ]
    [ 0  1  0 | 1 | 13/235  22/235  16/235 ]
    [ 0  0  1 | 2 |  7/47    1/47    5/47  ].

This means, of course, that x1 = 4, x2 = 1, x3 = 2, and the inverse of the matrix of coefficients is

    B^-1 = [ 93/235  67/235   6/235 ]
           [ 13/235  22/235  16/235 ]
           [  7/47    1/47    5/47  ].

The determinant of the coefficient matrix B equals the product of the reciprocals of the diagonal elements appearing in the Q-type matrices involved in the above transformation. Inspection shows that the relevant diagonal elements are simply the multiplying factors used in the normalization steps, so that

    det B = 2 x (25/2) x (47/5) = 235.

5.4 Gauss-Jordan Elimination

A variation that accomplishes the effect of back-substitution simultaneously with the reduction of the subdiagonal elements will now be illustrated, again for the system

    2x1 - 7x2 + 4x3 = 9,
     x1 + 9x2 - 6x3 = 1,
   -3x1 + 8x2 + 5x3 = 6.

Suppose that B^-1 is required, and form the augmented matrix [B | u | I]:

    [  2  -7   4 | 9 | 1  0  0 ]
    [  1   9  -6 | 1 | 0  1  0 ]
    [ -3   8   5 | 6 | 0  0  1 ].

As before, normalize the first row by dividing by the pivot element 2; then reduce the remaining elements of the first column to zero by subtracting the new first row from the second row, and also by subtracting -3 times the new first row from the third row. The result is

    [ 1  -7/2   2  |  9/2 |  1/2  0  0 ]
    [ 0  25/2  -8  | -7/2 | -1/2  1  0 ]
    [ 0  -5/2  11  | 39/2 |  3/2  0  1 ].

Next, normalize the second row by dividing by the pivot element 25/2; then reduce the remaining elements of the second column to zero by subtracting -(7/2) times the new second row from the first row, and -(5/2) times the new second row from the third row. Note that the reduction process now involves both the subdiagonal and the superdiagonal elements. The result is

    [ 1  0   -6/25 | 88/25 |  9/25  7/25  0 ]
    [ 0  1  -16/25 | -7/25 | -1/25  2/25  0 ]
    [ 0  0   47/5  |  94/5 |  7/5   1/5   1 ].

Finally, normalize the last row by dividing by the pivot element 47/5; then reduce the remaining elements of the third column to zero by subtracting -(6/25) and -(16/25) times the new third row from the first and second rows, respectively. The resulting matrix is [I | x | B^-1], where x is the solution vector, and B^-1 is the inverse of the original matrix of coefficients:

    [ 1  0  0 | 4 | 93/235  67/235   6/235 ]
    [ 0  1  0 | 1 | 13/235  22/235  16/235 ]
    [ 0  0  1 | 2 |  7/47    1/47    5/47  ].

The determinant of the original coefficient matrix is again the product of the pivot elements and thus equals 2 x 25/2 x 47/5, or 235.

We conclude this section by developing an algorithm for the above procedure, which is called Gauss-Jordan elimination. Let the starting array be the n x (n + m) augmented matrix A, consisting of an n x n coefficient matrix with m appended columns:

    A = [ a11  ...  a1n | a1,n+1  ...  a1,n+m ]
        [ ...       ... | ...          ...    ]
        [ an1  ...  ann | an,n+1  ...  an,n+m ].

Let k = 1, 2, ..., n be the pivot counter, so that akk is the pivot element for the kth pass of the reduction. It is understood that the values of the elements of A will be modified during computation. The algorithm is

    Normalization
        akj <- akj/akk,         j = n + m, n + m - 1, ..., k,
                                                                 (5.8)
    Reduction
        aij <- aij - aik*akj,   j = n + m, n + m - 1, ..., k,
                                i = 1, 2, ..., n,  i /= k,

for k = 1, 2, ..., n. Note: (a) since no nonzero elements appear to the left of akk in the kth row at the beginning of the kth pass, it is unnecessary to normalize akj for j < k; (b) in order to avoid premature modification of elements in the pivot column, the column counter j is always decremented from its highest value (n + m) until the pivot column is reached.

Thus far, elementary matrices of the second kind have not been used; neither has mention been made of the fact that at some stage, say the first, a potential divisor or pivot, such as b11, may be zero. In this event, we can think of interchanging rows, which is expressible, of course, in terms of elementary row operations of the second kind. A related problem is that of maintaining sufficient accuracy during intermediate calculations in order to achieve specified accuracy in the final results. This might be expected for a nearly singular system; it can also happen when the magnitude of one of the pivot elements is relatively small. Consider, for instance, the system

    0.0003 x1 + 3.0000 x2 = 2.0001,
    1.0000 x1 + 1.0000 x2 = 1.0000,

which has the exact solution x1 = 1/3, x2 = 2/3. If the equations are solved using pivots on the matrix diagonal, as indicated in the previous examples, there results

    1.0000 x1 + 10000 x2 = 6667,
                      x2 = 6666/9999.

If x2 from the second equation is taken to be 0.6667, then from the first equation x1 = 0.0000; for x2 = 0.66667, x1 = 0.30000; for x2 = 0.666667, x1 = 0.330000, etc. The solution depends highly on the number of figures retained. If the equations are solved in reverse order, that is, by interchanging the two rows and proceeding as before, then x2 is found to be 1.9998/2.9997 = 0.66667 while x1 = 0.33333. This example indicates the advisability of choosing as the pivot the coefficient of largest absolute value in a column, rather than merely the first in line. The handling of the situation is developed in more detail in Example 5.2.
EXAMPLE 5.1
GAUSS-JORDAN REDUCTION
VOLTAGES AND CURRENTS IN AN ELECTRICAL NETWORK

Problem Statement

Write a program that implements the Gauss-Jordan elimination algorithm outlined in (5.8) to solve n simultaneous linear equations. The program should allow concurrent solution for m solution vectors.

As one of the data sets for the program, use the linear equations whose solutions are the potentials at nodes (junctions) 1 through 6 in the electrical network of Fig. 5.1.1. The values of the resistances are shown in ohms, and the potential applied between A and B is 100 volts.

Figure 5.1.1 Electrical network.

Method of Solution

The reduction algorithm: The starting matrix A = (aij) is assumed to be the n x (n + m) augmented matrix of coefficients of page 273, where the column vectors aj = [a1j, a2j, ..., anj]', for j = 1, 2, ..., n, correspond to the matrix B of (5.3). The vectors aj, for j = n + 1, n + 2, ..., n + m, are m right-hand side vectors for which solutions are desired.

The diagonal pivot strategy of (5.8), that uses, as the n pivot elements, the elements along the main diagonal of the matrix of coefficients (the first n columns of A), leads to the transformation

    aj <- B^-1 aj,   j = n + 1, ..., n + m.             (5.1.1)

Here, as in (5.8), the left arrow is intended to mean substitution in place. The first n columns of A are transformed to the identity matrix I (the n unit vectors, in order), while the column vectors aj, j = n + 1, ..., n + m, are transformed to the solution vectors B^-1 aj. Note that if any of the appended columns aj, j = n + 1, ..., n + m, are unit vectors initially, then the transformation (5.1.1) leads to the corresponding columns of the inverse of the matrix of coefficients, B^-1, as illustrated by the example of Section 5.4.

If a variable d is initialized with the value 1 before the elimination algorithm is begun, then it may be updated with each cycle of the algorithm, to yield the determinant of B upon completion of the nth cycle. The complete algorithm is

    Initialization
        d <- 1

    Updating the determinant
        d <- d*akk

    Normalization
        akj <- akj/akk,         j = n + m, n + m - 1, ..., k

    Reduction
        aij <- aij - aik*akj,   j = n + m, n + m - 1, ..., k,
                                i = 1, 2, ..., n,  i /= k,

for k = 1, 2, ..., n. Should aik already be equal to zero, the reduction step for the ith row may be ignored.

A zero or very small pivot element may be encountered in the elimination process (this may or may not indicate that the coefficient matrix in the first n columns of A is singular or nearly singular). A test for pivot magnitude,

    |akk| > epsilon,                                    (5.1.2)

is made before the normalization step for the kth cycle, k = 1, 2, ..., n, to insure against extremely small divisors in normalization. If test (5.1.2) is failed, computation is stopped for the offending set of equations.

Electrical network equations: From Ohm's law, Ipq, the current flowing from node p to node q in leg pq of the network, is given by

    Ipq = (vp - vq)/Rpq,                                (5.1.3)

where vp and vq are the voltages at nodes p and q, respectively, and Rpq is the resistance of leg pq. For Rpq in units of ohms, the current is given in amperes. The equations relating the voltages at the nodes of the network may be found by applying Kirchhoff's current law: the sum of the currents arriving at each node must be zero. This is simply a conservation law for charge, and indicates that current may not be accumulated or generated at any node of the network. Application of these two
laws at node 1 leads to

    IA1 + I21 + I61 = (100 - v1)/3 + (v2 - v1)/3 + (v6 - v1)/15 = 0,

or

    11v1 - 5v2 - v6 = 500.

Similar equations may be written for each of the nodes in the network, leading to the system of six simultaneous linear equations:

    Node    Equation
     1       11v1 -  5v2                             -   v6  =  500
     2      -20v1 + 41v2 - 15v3          -  6v5              =    0
     3             - 3v2 +  7v3 -  4v4                       =    0
     4                    -  v3 +  2v4 -    v5               =    0
     5             - 3v2        - 10v4 + 28v5 - 15v6         =    0
     6      - 2v1                      - 15v5 + 47v6         =    0
                                                          (5.1.4)
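As a check on the node equations, the sketch below (our addition, not the book's FORTRAN) solves (5.1.4) by Gauss-Jordan reduction with diagonal pivots in exact rational arithmetic; the voltages and determinant it produces match the computer output quoted later in this example.

```python
from fractions import Fraction as F

# Rows of [coefficients | right-hand side] from equations (5.1.4):
A = [[ 11,  -5,   0,   0,   0,  -1, 500],
     [-20,  41, -15,   0,  -6,   0,   0],
     [  0,  -3,   7,  -4,   0,   0,   0],
     [  0,   0,  -1,   2,  -1,   0,   0],
     [  0,  -3,   0, -10,  28, -15,   0],
     [ -2,   0,   0,   0, -15,  47,   0]]

n = 6
M = [[F(v) for v in row] for row in A]
d = F(1)
for k in range(n):              # Gauss-Jordan with the diagonal pivot strategy
    d *= M[k][k]                # running determinant, as in the algorithm above
    piv = M[k][k]
    M[k] = [v / piv for v in M[k]]
    for i in range(n):
        if i != k and M[i][k] != 0:
            f = M[i][k]
            M[i] = [a - f * b for a, b in zip(M[i], M[k])]

v = [row[n] for row in M]
assert v == [70, 52, 40, 31, 22, 10]    # node voltages, in volts
assert d == 1500000                     # DETER printed for this data set
```

Every diagonal pivot of this system is nonzero, so no interchanges are needed, and the product of the pivots gives the determinant directly.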

Flow Diagram

FORTRAN Implementation
List of Principal Variables

Program Symbol    Definition
A                 Augmented matrix of coefficients, A = (aij).
DETER             d, determinant of the original coefficient matrix (the first n columns of A).
EPS               Minimum allowable magnitude, epsilon, for a pivot element.
I, J              Row and column subscripts, i and j.
K                 Cycle counter and pivot element subscript, k.
KP1               k + 1.
M                 Number of solution vectors, m.
N                 Number of equations, n.
NPLUSM            n + m.

Program Listing

C       APPLIED NUMERICAL METHODS, EXAMPLE 5.1
C       GAUSS-JORDAN REDUCTION
C
C       THIS PROGRAM FINDS M SOLUTION VECTORS CORRESPONDING
C       TO A SET OF N SIMULTANEOUS LINEAR EQUATIONS USING THE GAUSS-
C       JORDAN REDUCTION ALGORITHM WITH THE DIAGONAL PIVOT STRATEGY.
C       THE N BY N MATRIX OF COEFFICIENTS APPEARS IN THE FIRST N
C       COLUMNS OF THE MATRIX A.  M RIGHT-HAND SIDE VECTORS ARE
C       APPENDED AS THE (N+1)TH TO (N+M)TH COLUMNS OF A.  ON THE
C       K(TH) PASS OF THE ELIMINATION SCHEME, K(TH) ROW ELEMENTS ARE
C       NORMALIZED BY DIVIDING BY THE PIVOT ELEMENT A(K,K).  DETER,
C       THE DETERMINANT OF THE COEFFICIENT MATRIX, IS UPDATED PRIOR
C       TO ELIMINATION OF ALL NONZERO ELEMENTS (EXCEPT FOR THE PIVOT
C       ELEMENT) IN THE K(TH) COLUMN.  SHOULD A PIVOT ELEMENT BE
C       ENCOUNTERED WHICH IS SMALLER IN MAGNITUDE THAN EPS, COMPUTATION
C       IS DISCONTINUED AND AN APPROPRIATE COMMENT IS PRINTED.
C       THE SOLUTION VECTORS CORRESPONDING TO THE ORIGINAL RIGHT-
C       HAND SIDE VECTORS ARE STORED IN COLUMNS N+1 TO N+M OF A.
C
        IMPLICIT REAL*8(A-H, O-Z)
        DIMENSION A(50,51)
C
C       ..... READ AND PRINT INPUT DATA .....
  1     READ (5,100) N, M, EPS
        NPLUSM = N + M
        WRITE (6,200) N, M, EPS, N, NPLUSM
        DO 2 I = 1, N
        READ (5,101) (A(I,J), J = 1, NPLUSM)
  2     WRITE (6,201) (A(I,J), J = 1, NPLUSM)
C
C       ..... BEGIN ELIMINATION PROCEDURE .....
        DETER = 1.
        DO 9 K = 1, N
C
C       ..... UPDATE THE DETERMINANT VALUE .....
        DETER = DETER*A(K,K)
C
C       ..... CHECK FOR PIVOT ELEMENT TOO SMALL .....
        IF (DABS(A(K,K)).GT.EPS) GO TO 5
        WRITE (6,202)
        GO TO 1
C
C       ..... NORMALIZE THE PIVOT ROW .....
  5     KP1 = K + 1
        DO 6 J = KP1, NPLUSM
  6     A(K,J) = A(K,J)/A(K,K)
        A(K,K) = 1.
C
C       ..... ELIMINATE K(TH) COLUMN ELEMENTS EXCEPT FOR PIVOT .....
        DO 9 I = 1, N
        IF (I.EQ.K .OR. A(I,K).EQ.0.) GO TO 9
        DO 8 J = KP1, NPLUSM
  8     A(I,J) = A(I,J) - A(I,K)*A(K,J)
        A(I,K) = 0.
  9     CONTINUE
C
C       ..... PRINT THE RESULTS .....
        WRITE (6,203) DETER, N, NPLUSM
        DO 10 I = 1, N
 10     WRITE (6,201) (A(I,J), J = 1, NPLUSM)
        GO TO 1
C
C       ..... FORMATS FOR INPUT AND OUTPUT STATEMENTS .....
100     FORMAT ( 2(10X, I2, 8X), 10X, E7.1 )
101     FORMAT ( 20X, 5F10.5 )
200     FORMAT ( 10H1N      = , I8/ 10H M      = , I8/ 10H EPS    = ,
       1   1PE16.1/ 22H0     A(1,1)...A(, I2, 1H,, I2, 1H)/ 1H  )
201     FORMAT ( 1H , 7F13.7 )
202     FORMAT ( 37H0SMALL PIVOT - MATRIX MAY BE SINGULAR )
203     FORMAT ( 10H0DETER  = , E14.6/ 22H0     A(1,1)...A(, I2, 1H,,
       1   I2, 1H)/ 1H  )
C
        END



Data

N =  3   M =  4   EPS = 1.0E-10
    2.00000  -7.00000   4.00000   9.00000   1.00000
    0.00000   0.00000
    1.00000   9.00000  -6.00000   1.00000   0.00000
    1.00000   0.00000
   -3.00000   8.00000   5.00000   6.00000   0.00000
    0.00000   1.00000
N =  3   M =  4   EPS = 1.0E-10
   -7.00000   4.00000   9.00000   1.00000
    0.00000
    8.00000   5.00000   6.00000   0.00000
    0.00000
    9.00000  -6.00000   1.00000   0.00000
    1.00000
N =  3   M =  4   EPS = 1.0E-10
    9.00000  -6.00000   1.00000   1.00000
    0.00000
   -7.00000   4.00000   9.00000   0.00000
    0.00000
   -7.00000   4.00000   9.00000   0.00000
    1.00000
N =  3   M =  4   EPS = 1.0E-20
   -7.00000   4.00000   1.00000   1.00000
    0.00000
    9.00000  -6.00000   1.00000   0.00000
    0.00000
    8.00000   5.00000   6.00000   0.00000
    1.00000
N =  3   M =  4   EPS = 1.0E-20
    9.00000  -6.00000   1.00000   1.00000
    0.00000
   -3.00000   8.00000   5.00000   6.00000
    0.00000   1.00000
N =  6   M =  1   EPS = 1.0E-20
   11.00000  -5.00000   0.00000   0.00000
   -1.00000 500.00000
  -20.00000  41.00000 -15.00000   0.00000
    0.00000   0.00000
    0.00000  -3.00000   7.00000  -4.00000
    0.00000   0.00000
    0.00000   0.00000  -1.00000   2.00000
    0.00000   0.00000
   -3.00000   0.00000 -10.00000
    0.00000
    0.00000   0.00000   0.00000
    0.00000
N =  2   M =  3   EPS = 1.0E-20
    1.00000   1.00000   1.00000
    3.00000   2.900001  0.00000
N =  2   M =  3   EPS = 1.0E-20
    3.00000   2.900001  1.00000
    1.00000   1.00000   0.00000

Computer Output
Results for the 1st Data Set
N   = 3
M   = 4
EPS = 1.0D-10

Computer Output (Continued)

DETER = 0.235000D 03

Results for the 2nd Data Set


N = 3
M = 4
EPS = 1.0D-10

DETER = -0.235000D 03

Results for the 3rd Data Set


N   = 3
M   = 4
EPS = 1.0D-10

SMALL PIVOT - MATRIX MAY BE SINGULAR

Results for the 4th Data Set


N   = 3
M   = 4
EPS = 1.0D-20

SMALL PIVOT - MATRIX MAY BE SINGULAR
Computer Output (Continued)
Results for the 5th Data Set
N   = 3
M   = 4
EPS = 1.0D-20

DETER = -0.490000D 02

Results for the 6th Data Set


N   = 6
M   = 1
EPS = 1.0D-20

DETER =   0.150000D 07

Results for the 7th Data Set (double precision)


N   = 2
M   = 3
EPS = 1.0D-20

DETER =   0.299997D 01

Computer Output (Continued)


Results for the 8th Data Set (double precision)
N   = 2
M   = 3
EPS = 1.0D-20

DETER = -0.299997D 01

Results for the 7th Data Set (single precision)


N   = 2
M   = 3
EPS = 1.0E-20

DETER = 0.299997E 01

Results for the 8th Data Set (single precision)


N   = 2
M   = 3
EPS = 1.0E-20

DETER = -0.299997E 01

Discussion of Results

Except for the last two sets of results shown in the computer output, all calculations were done using double-precision arithmetic. The first data set consists of the three equations used as the illustrative example in Section 5.4. The starting matrix A consists of the coefficient matrix B and four right-hand side vectors, including the three unit vectors in order. After the transformations resulting from the Gauss-Jordan algorithm using the diagonal pivot strategy, the final A matrix contains the solution vector and the inverse B^-1.

The second data set is for the same set of equations, except that the first two are interchanged. The solutions, x, are identical with those for the first data set, but, of course, the inverse matrix B^-1 is different.

The third data set contains two identical equations, so that the coefficient matrix, B, is singular, and B^-1 does not exist. This singularity will cause a zero pivot element (or, allowing for some round-off error, a pivot element of very small magnitude) to be encountered at some stage of the algorithm, in this case, during the third cycle.

The fourth data set consists of three equations; x1 does not appear in the first equation. Since a11 = 0, the potential pivot for the first cycle fails the pivot magnitude test (5.1.2), and the comment regarding possible singularity of the coefficient matrix is printed. In fact, the coefficient matrix is not singular, as evidenced by the results for the fifth data set, in which the first two of the equations for the fourth data set have been interchanged. This illustrates one of the important weaknesses of the diagonal pivot strategy: the appearance of a very small pivot element on the diagonal does not necessarily mean that the coefficient matrix is singular. By reordering the rows (equations) or columns (variables) of the matrix, it may be possible to find n pivot elements of significant magnitude; if this is possible, the matrix is not singular and the equations have a unique solution, provided the right-hand side vector is not the null vector (the equations would be homogeneous in that case and the trivial solution x = 0 would be generated).

The sixth data set contains the equations of (5.1.4) describing the unknown potentials at the six nodes of the electrical network shown in Fig. 5.1.1. The solutions in volts are

    v1 = 70,  v2 = 52,  v3 = 40,  v4 = 31,  v5 = 22,  v6 = 10.

Results for the seventh data set are shown for calculations done in both double- and single-precision arithmetic. For the IBM 360/67, single- and double-precision word sizes are the equivalent of 6-7 and 15-16 decimal digits, respectively. The equations are

    1.00000 x1 + 1.00000 x2 = 1.00000,
    0.00003 x1 + 3.00000 x2 = 2.900001.

The solutions are

    x1 = 0.0333333,   x2 = 0.9666667.                   (5.1.6)

The results for the seventh data set using double-precision arithmetic are accurate to the seven decimal places shown in the computer output. Results for single-precision computations on the same data set are somewhat less accurate, with five or six figure agreement in most cases. Note that the number 2.900001 has been approximated as 2.9000006; the seventh decimal digit cannot be entered accurately because of the word-size limitation.

The eighth data set is identical with the seventh, except that the two equations are interchanged, leading to the system:

    0.00003 x1 + 3.00000 x2 = 2.900001,
    1.00000 x1 + 1.00000 x2 = 1.00000.

The solutions are again given by (5.1.6). Results for the double-precision calculations are again exact to the number of figures shown in the output, that is,

    x1 = 0.0333333,
    x2 = 0.9666667.

For the single-precision calculations, however, the computed results are

    x1 = 0.0625000,
    x2 = 0.9666665.

The almost meaningless value computed for x1 results from the use of the very tiny (compared with other coefficients) pivot on the first cycle, a11 = 0.00003, as discussed in Section 5.4. This unfortunate proneness to error of the diagonal pivot strategy for some systems of equations can be overcome often, though not always, by arranging the equations to produce diagonal dominance in the coefficient matrix. This simply means that the magnitudes of the coefficients on the main diagonal of the coefficient matrix should be as large as possible, relative to the off-diagonal elements. An alternative (and preferred) strategy is to modify the Gauss-Jordan reduction scheme to allow in effect row and/or column interchanges to insure that selected pivots are those available elements of greatest magnitude. This approach, known as the maximum pivot strategy, will be developed in Example 5.2.
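The single-precision failure described above can be imitated on any machine. The sketch below is our addition: it rounds every intermediate result to IEEE single precision via struct. The IBM 360's hexadecimal arithmetic differs in detail, so the garbage value of x1 is not exactly the 0.0625000 shown in the output, but the tiny pivot 0.00003 poisons x1 in the same way, while interchanging the rows restores it.

```python
import struct

def f32(v):
    """Round v to IEEE single precision by a pack/unpack round trip."""
    return struct.unpack('f', struct.pack('f', v))[0]

def solve2_f32(row1, row2):
    """Gaussian elimination on a 2 x 2 system, diagonal pivot strategy,
    with every intermediate result held in single precision."""
    a, b, c = row1
    d, e, f = row2
    b, c = f32(b / a), f32(c / a)          # normalize the pivot row
    e = f32(e - f32(d * b))                # eliminate x1 from row 2
    f = f32(f - f32(d * c))
    x2 = f32(f / e)
    x1 = f32(c - f32(b * x2))              # back-substitute
    return x1, x2

tiny_first = ((0.00003, 3.0, 2.900001), (1.0, 1.0, 1.0))   # 8th data set
good_first = ((1.0, 1.0, 1.0), (0.00003, 3.0, 2.900001))   # 7th data set

x1, x2 = solve2_f32(*tiny_first)
assert abs(x2 - 29/30) < 1e-5      # x2 survives the tiny pivot
assert abs(x1 - 1/30) > 1e-3       # x1 is almost meaningless

x1, x2 = solve2_f32(*good_first)
assert abs(x1 - 1/30) < 1e-4       # both solutions now correct
assert abs(x2 - 29/30) < 1e-5
```

The mechanism is the same cancellation seen in Section 5.4: after dividing by 0.00003, x1 is recovered as the difference of two numbers near 96666.7, and single precision cannot resolve that difference.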
EXAMPLE 5.2
CALCULATION OF THE INVERSE MATRIX USING THE MAXIMUM PIVOT STRATEGY
MEMBER FORCES IN A PLANE TRUSS

Problem Statement

Write a function named SIMUL that solves a set of n simultaneous linear equations,

    Bx = u,                                             (5.2.1)

using Gauss-Jordan reduction with the maximum pivot strategy, described in the next section. The subroutine should have three alternative modes of operation, controlled by one of its arguments:

Mode 1. The equations of (5.2.1) should be solved and the solutions saved, in order, in a vector x.

Mode 2. The equations should be solved as in mode 1. In addition, the inverse of the matrix of coefficients should be calculated "in place," that is, the inverse should be overstored in the first n columns of the matrix A, without use of any auxiliary matrices (see next section).

Mode 3. In this case, the matrix A contains only n columns. The inverse should be computed "in place" as in mode 2.

In each case, the value of SIMUL should be the determinant of the matrix of coefficients.

Write a main program to test SIMUL. As one of the data sets for the program, use the linear equations whose solutions are the member forces in the statically determinate, simply supported, eleven-member plane truss shown in Fig. 5.2.1.

Figure 5.2.1 Statically determinate, simply supported, eleven-member plane truss.

The Maximum Pivot Strategy

In the Gauss-Jordan reduction with the diagonal pivot strategy, discussed in Section 5.4, and implemented in computer program form in Example 5.1, the pivot for the kth cycle is chosen to be the element in the akk position, that is, an element on the diagonal. As discussed in Example 5.1, this strategy can lead to computational difficulty for some systems of equations. Should akk be equal to zero, then the normalization steps of the algorithm must not be carried out. When akk is of relatively small magnitude when compared with other elements in the first n columns of A, the normalization and reduction operations can often be carried out; but computed results may be inaccurate because of round-off error (see results for the eighth data set in Example 5.1). In such cases, provided that the matrix of coefficients is not near-singular, it is usually possible to overcome the problem by rearranging the rows or columns of the coefficient matrix; this is equivalent to reordering the equations or renumbering the variables, respectively. An alternative possibility is to choose any element of significant magnitude that is still available for pivoting (not in a row or column already containing a pivot) as the pivot for the kth pass. Usually the largest such element is preferred, as its use is likely to introduce the least round-off error in the reduction pass; the technique is termed the Gauss-Jordan reduction with the maximum pivot strategy. It is only necessary to keep lists of the row and column subscripts of successive pivot elements. Let rk and ck be, respectively, the row and column subscripts of the kth pivot element, pk.

The algorithm, closely related to that of Example 5.1, is

    Initialization
        d <- 1

    Choosing the pivot
        pk <- the available element of largest magnitude
              (its subscripts are saved as rk and ck)

    Updating the determinant
        d <- d*pk

    Normalization
        arkj <- arkj/pk,   j = 1, 2, ..., n + 1  (j /= ck),
        arkck <- 1,

    Reduction
        aij <- aij - aick*arkj,   j /= ck,
        aick <- 0,                i /= rk,

for k = 1, 2, ..., n.
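The bookkeeping in the algorithm above, that is, the pivot search over unused rows and columns, the r and c lists, the unscrambling of the solution vector, and the sign correction of the determinant described in the following paragraphs, can be sketched compactly. The function below is our Python paraphrase of SIMUL's solve-only mode, not the book's FORTRAN; it reproduces the numerical example worked out shortly (x = [4, 1, 2]', det = 235).

```python
from fractions import Fraction as F

def simul_max_pivot(A):
    """Gauss-Jordan reduction with the maximum pivot strategy on the
    n x (n+1) augmented matrix A; returns (x, det)."""
    n = len(A)
    A = [[F(v) for v in row] for row in A]
    rows, cols = [], []                 # the r and c lists of pivot subscripts
    d = F(1)
    for k in range(n):
        # choose the largest-magnitude element not in a used row or column
        r, c = max(((i, j) for i in range(n) if i not in rows
                           for j in range(n) if j not in cols),
                   key=lambda rc: abs(A[rc[0]][rc[1]]))
        p = A[r][c]
        rows.append(r)
        cols.append(c)
        d *= p                          # running determinant (sign fixed below)
        A[r] = [v / p for v in A[r]]    # normalization
        for i in range(n):              # reduction
            if i != r and A[i][c] != 0:
                f = A[i][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[r])]
    x = [None] * n
    for r, c in zip(rows, cols):        # unscramble: x[c_k] = a[r_k, n+1]
        x[c] = A[r][n]
    j = [None] * n                      # auxiliary vector: j[r_k] = c_k
    for r, c in zip(rows, cols):
        j[r] = c
    swaps = 0                           # count pairwise interchanges to sort j
    for i in range(n):
        while j[i] != i:
            t = j[i]
            j[i], j[t] = j[t], j[i]
            swaps += 1
    if swaps % 2:                       # odd permutation: flip the sign of d
        d = -d
    return x, d

A = [[-3, 8, 5, 6],    # the system of Section 5.4, rows reordered
     [2, -7, 4, 9],
     [1, 9, -6, 1]]
x, d = simul_max_pivot(A)
assert x == [4, 1, 2] and d == 235
```

For this ordering the pivots are 9, 31/3, and 235/93, and two pairwise interchanges sort the auxiliary vector, so the sign of d is already correct, matching the worked example.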

The elements rk and ck must be assigned the appropriate values once the kth pivot, pk, has been selected. Here, as before, d is the determinant (possibly with incorrect sign) of the matrix of coefficients. After reduction is completed, the first n rows of A will contain a (possibly) permuted identity matrix. The (n + 1)th column will contain the solution vector x = [x1, x2, ..., xn]', although the elements will usually be out of order. The solutions can be unscrambled by using the information saved in the vectors r and c, as follows:

    xck = ark,n+1,   k = 1, 2, ..., n.                  (5.2.2)

The resulting transformed coefficient matrix has a determinant equal to +1 or -1 (it is either an elementary matrix of the second kind or is the identity matrix itself). Therefore the value computed for the determinant, d, may be of incorrect sign. It is necessary to establish whether an even or odd number of row interchanges is required to reorder the final permuted matrix as the identity matrix. The problem can be solved by forming an auxiliary vector, j, such that

    jrk = ck,   k = 1, 2, ..., n,                       (5.2.3)

and then counting the number of pairwise interchanges required to put the vector j into ascending sequence. If the number of interchanges is even, then the sign of d is correct; if not, the sign of d is incorrect.

The method is probably best illustrated with a numerical example. Consider the set of equations used in Section 5.4, but taken in a different order to illustrate the procedure better. The solutions, as before, are x = [4, 1, 2]'. The augmented coefficient matrix for this system is

    [ -3   8   5 | 6 ]
    [  2  -7   4 | 9 ]
    [  1   9  -6 | 1 ].

Then, for k = 1, that is, for the first pass, element a32 has the greatest magnitude of the nine possible coefficients, and p1 = 9, r1 = 3, c1 = 2. The determinant value, d, is updated from its initial value of 1 to 9. After the normalization steps for the first cycle, the transformed matrix is

    [ -3    8    5  |  6  ]
    [  2   -7    4  |  9  ]
    [ 1/9   1  -2/3 | 1/9 ].

The reduction steps lead to

    [ -35/9  0  31/3 | 46/9 ]
    [  25/9  0  -2/3 | 88/9 ]
    [  1/9   1  -2/3 |  1/9 ].

For the second cycle, only a11, a13, a21, and a23 are available as potential pivots. Since a13 has the greatest magnitude of the four candidates, p2 = 31/3, r2 = 1, c2 = 3. The determinant value is updated to 9 x 31/3 = 93. After the normalization and reduction steps, the matrix becomes

    [ -35/93  0  1 |  46/93 ]
    [ 235/93  0  0 | 940/93 ]
    [ -13/93  1  0 |  41/93 ].

For the third and last cycle, only a21 is available for pivoting. Then p3 = 235/93, r3 = 2, c3 = 1. The determinant value is updated to 93 x 235/93 = 235, and after normalization and reduction, the matrix is

    [ 0  0  1 | 2 ]
    [ 1  0  0 | 4 ]
    [ 0  1  0 | 1 ].

Note that the first three columns contain a permuted identity matrix, and that the fourth contains the solutions, although not in the natural order. The r and c vectors contain the values:

    r1 = 3,  r2 = 1,  r3 = 2;    c1 = 2,  c2 = 3,  c3 = 1.

From (5.2.2), the solutions are

    x2 = a34 = 1,
    x3 = a14 = 2,
    x1 = a24 = 4.

The auxiliary vector j, formed according to (5.2.3), is

    j1 = 3,
    j2 = 1,
    j3 = 2.

Since two pairwise interchanges are required to put j into ascending sequence, the sign of d is correct, and the determinant of the original matrix of coefficients is 235.

The Inverse Matrix

The inverse matrix could be developed by appending an n x n identity matrix to the augmented matrix of coefficients, and carrying out the maximum pivot strategy as described in the preceding paragraphs. In that case, the normalization and reduction steps would

be applied to all 2n + 1 columns of the matrix, rather than just the first n + 1 columns. The initial matrix for the preceding numerical example would be

    [ -3   8   5 | 6 | 1  0  0 ]
    [  2  -7   4 | 9 | 0  1  0 ]
    [  1   9  -6 | 1 | 0  0  1 ],

and the transformed matrices resulting from the first, second, and third cycles of the reduction algorithm would be

    Cycle 1.
    [ -35/9  0  31/3 | 46/9 | 1  0  -8/9 ]
    [  25/9  0  -2/3 | 88/9 | 0  1   7/9 ]
    [  1/9   1  -2/3 |  1/9 | 0  0   1/9 ]

    Cycle 2.
    [ -35/93  0  1 |  46/93 | 3/31  0  -8/93 ]
    [ 235/93  0  0 | 940/93 | 2/31  1  67/93 ]
    [ -13/93  1  0 |  41/93 | 2/31  0   5/93 ]

    Cycle 3.
    [ 0  0  1 | 2 |  5/47    7/47    1/47  ]
    [ 1  0  0 | 4 |  6/235  93/235  67/235 ]
    [ 0  1  0 | 1 | 16/235  13/235  22/235 ]

Note that after the first cycle only the last column of the original identity matrix has been changed. Since the contents of the column containing the pivot element (column 2 in this case) are known to be a one in the pivot position and zeros elsewhere, there is no need to retain them. The elements of the developing inverse (in this case in column 7) can be overstored in place in the pivot column. Similar behavior can be observed for the results of the second and third cycle reduction calculations.

The maximum pivot strategy already developed may be modified slightly to implement the inverse calculation without the necessity of appending n additional columns to the coefficient matrix or of using other auxiliary matrices. The procedures for calculating the determinant and selecting the pivot element are unchanged. All elements of the row containing the pivot element, row rk, are normalized as before, except the pivot element itself, which is replaced by its reciprocal. Thus the normalization step is given by

    arkj  <- arkj/pk,   j = 1, 2, ..., n + 1  (j /= ck),
    arkck <- 1/pk.                                      (5.2.4)

The reduction calculations are carried out as before, except for the elements in the column containing the pivot element. The reduction calculations are described by

    aij  <- aij - aick*arkj,   j /= ck,
    aick <- -aick/pk,          i /= rk.                 (5.2.5)

For the numerical example, the introduction of (5.2.4) and (5.2.5) into the maximum pivot strategy leads to the following three matrices after the first, second, and third cycles of the algorithm, respectively:

    [ -35/9  -8/9  31/3 | 46/9 ]
    [  25/9   7/9  -2/3 | 88/9 ]
    [  1/9    1/9  -2/3 |  1/9 ]

    [ -35/93  -8/93  3/31 |  46/93 ]
    [ 235/93  67/93  2/31 | 940/93 ]
    [ -13/93   5/93  2/31 |  41/93 ]

    [  7/47    1/47    5/47  | 2 ]
    [ 93/235  67/235   6/235 | 4 ]
    [ 13/235  22/235  16/235 | 1 ]

The solution vector is unscrambled as before, and the determination of the appropriate sign for the determinant is unchanged. The elements of the inverse matrix are, of course, not in natural order unless all pivot elements are on the main diagonal of the coefficient matrix. The unscrambling process involves both row and column interchanges. Let y be an n element vector. Then the inverse can be properly ordered using the following scheme in sequence:

    yck <- arkj,  k = 1, 2, ..., n;   aij <- yi,  i = 1, 2, ..., n;
                                      for each column j = 1, 2, ..., n;
    then                                                (5.2.6)
    yrk <- aick,  k = 1, 2, ..., n;   aij <- yj,  j = 1, 2, ..., n;
                                      for each row i = 1, 2, ..., n.

For the numerical example, (5.2.6) leads to the unscrambled inverse:

    B^-1 = [  6/235  93/235  67/235 ]
           [ 16/235  13/235  22/235 ]
           [  5/47    7/47    1/47  ].
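A compact rendition of the in-place inversion, that is, normalization with the pivot replaced by its reciprocal, the modified reduction in the pivot column, and the final row-and-column unscrambling, is sketched below. It is our addition (SIMUL itself is FORTRAN), uses exact fractions, and collapses the two-pass y-vector shuffle of (5.2.6) into one direct index permutation; it reproduces the unscrambled inverse shown above.

```python
from fractions import Fraction as F

def invert_in_place(B):
    """Invert B with the maximum pivot strategy, overstoring the developing
    inverse in B's own n x n storage, then unscrambling rows and columns."""
    n = len(B)
    M = [[F(v) for v in row] for row in B]
    rows, cols = [], []
    for _ in range(n):
        r, c = max(((i, j) for i in range(n) if i not in rows
                           for j in range(n) if j not in cols),
                   key=lambda rc: abs(M[rc[0]][rc[1]]))
        rows.append(r)
        cols.append(c)
        p = M[r][c]
        M[r] = [v / p for v in M[r]]    # normalization, as in (5.2.4) ...
        M[r][c] = 1 / p                 # ... with the pivot replaced by 1/p
        for i in range(n):              # modified reduction, as in (5.2.5)
            if i == r:
                continue
            f = M[i][c]
            for j in range(n):
                if j != c:
                    M[i][j] -= f * M[r][j]
            M[i][c] = -f * M[r][c]      # i.e., -f/p
    inv = [[None] * n for _ in range(n)]
    for a in range(n):                  # unscramble rows and columns together
        for b in range(n):
            inv[cols[a]][rows[b]] = M[rows[a]][cols[b]]
    return inv

B = [[-3, 8, 5], [2, -7, 4], [1, 9, -6]]
inv = invert_in_place(B)
ident = [[F(int(i == j)) for j in range(3)] for i in range(3)]
prod = [[sum(B[i][k] * inv[k][j] for k in range(3)) for j in range(3)]
        for i in range(3)]
assert prod == ident                    # B * B^-1 = I
assert inv[2] == [F(5, 47), F(7, 47), F(1, 47)]
```

No storage beyond the n x n array and the two subscript lists is needed, which is the whole point of the in-place variant.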
Method of Solution

The function SIMUL implements the algorithm described in the preceding sections. The alternative modes of operation are controlled by a computational switch s; modes 1, 2, and 3 (see Problem Statement) are activated for s > 0, s = 0, and s < 0, respectively. Before a potential pivot, arkck, is used as pk in the normalization steps, a test for pivot magnitude,

    |arkck| < epsilon,                                  (5.2.7)

is made to insure against pivots of very small magnitude. If test (5.2.7) is passed, the matrix is singular or near-singular; computation is stopped, and SIMUL returns to the calling program with zero as the determinant value. The main program reads values for n, s, epsilon, and appropriate elements of the n x (n + 1) or n x n matrix A, calls on SIMUL to carry out the desired Gauss-Jordan calculations, and prints the results.

Next, consider the simply supported, eleven-member plane truss of Fig. 5.2.1. Since the number of joints, Nj, and the number of members, Nm, are related by 2Nj - 3 = Nm, the truss is statically determinate, and the eleven member forces may be found by solving eleven force balances at appropriate joints in the structure. Let hi be the applied force in the horizontal direction (positive if directed toward the right) at the ith joint, and vi be the applied force in the vertical direction (positive if directed downward) at the ith joint. Make two force balances (horizontal and vertical) at joints 2, 3, 4, 5, and 6, and one force balance (horizontal only) at the roller-supported joint 7. Let c = cos 30 deg and s = sin 30 deg, and let fi be the tensile force in member i (positive if in tension, negative if in compression). Then the eleven force balances are

    joint 2:   h2 + f4 + s f3 - s f1         = 0
               v2 + c f1 + c f3              = 0
    joint 3:   h3 - f2 - s f3 + s f5 + f6    = 0
               v3 - c f3 - c f5              = 0
    joint 4:   h4 - f4 - s f5 + s f7 + f8    = 0
               v4 + c f5 + c f7              = 0
    joint 5:   h5 - f6 - s f7 + s f9 + f10   = 0
               v5 - c f7 - c f9              = 0
    joint 6:   h6 - f8 - s f9 + s f11        = 0
               v6 + c f9 + c f11             = 0
    joint 7:   h7 - f10 - s f11              = 0.

Rearranging these equations into the form of (5.2.1), with the eleven member forces f1, f2, ..., f11 as the unknowns and the load terms h2, v2, ..., h7 transposed to the right-hand side, gives an 11 x 11 coefficient matrix whose entries are 0, 1, -1, s, -s, c, and -c, augmented by the corresponding right-hand side vector. Here, s = 0.5 and c = 0.86603. For the truss of Fig. 5.2.1, h2 = -5, v2 = 10, h3 = 0, v3 = 10, h4 = 0, v4 = 10, h5 = 0, v5 = 10, h6 = 5, v6 = 10, h7 = 0.

Flow Diagram

Main Program
    [Read n, s, epsilon, and the matrix A; call SIMUL; print the determinant,
    d, the solutions x1, ..., xn, and, for the modes that compute the inverse
    in place, the inverse matrix.]

Function SIMUL (Arguments: n, A, x, epsilon, s, Nrc)
    [Gauss-Jordan reduction with the maximum pivot strategy, as described in
    the text; the value returned is the determinant of the coefficient matrix.]

FORTRAN Implementation
List of Principal Variables

Program Symbol    Definition

(Main)
A                 Augmented matrix of coefficients, A = (aij).
DETER             d, determinant of the original coefficient matrix (the first n columns of A).
EPS               Minimum allowable magnitude, epsilon, for a pivot element.
I, J              Row and column subscripts, i and j.
INDIC             Computational switch, s.
MAX               Number of columns in A, either n or n + 1.
N                 Number of rows in A, n.
X                 Vector of solutions, xi.

(Function SIMUL)
AIJCK
INTCH             Number of pairwise interchanges required to order elements of vector j.
IP1, KM1, NM1     i + 1, k - 1, and n - 1, respectively.
IROW              Vector of pivot element row subscripts, rk.
JCOL              Vector of pivot element column subscripts, ck.
IROWI, IROWJ,     ri, rj, rk, ci, cj, and ck, respectively.
IROWK, JCOLI,
JCOLJ, JCOLK
ISCAN, JSCAN      Indices q and t, used in scan of vectors c and r during pivot element search.
JORD              The vector j.
JTEMP             Temporary variable used in ordering the j vector.
K                 Cycle counter and pivot element subscript, k.
NRC               Nrc, row and column dimensions of storage for the matrix A (assumed to be square).
PIVOT             Pivot element, pk.
Y                 Vector y, used in unscrambling the inverse matrix.

Program Listing
Main Program
C     APPLIED NUMERICAL METHODS, EXAMPLE 5.2
C     CALCULATION OF THE INVERSE MATRIX IN PLACE
C
C     THIS PROGRAM READS VALUES OF N, INDIC, AND EPS.  IF INDIC IS
C     NEGATIVE, ELEMENTS OF AN N BY N MATRIX A ARE READ, AND THEN THE
C     FUNCTION SIMUL IS CALLED TO COMPUTE THE INVERSE OF A IN PLACE.
C     IF INDIC IS ZERO, ELEMENTS OF AN N BY N+1 MATRIX A ARE READ,
C     AND THEN THE FUNCTION SIMUL IS CALLED TO SOLVE THE SYSTEM OF
C     LINEAR EQUATIONS WHOSE COEFFICIENT MATRIX IS IN THE FIRST N
C     COLUMNS OF A AND VECTOR OF RIGHT HAND SIDES IS APPENDED IN
C     COLUMN N+1.  THE N SOLUTIONS APPEAR IN X(1)...X(N) AND THE
C     INVERSE OF THE COEFFICIENT MATRIX APPEARS IN THE FIRST N
C     COLUMNS OF A.  IF INDIC IS POSITIVE, THE LATTER PROCEDURE IS
C     FOLLOWED, EXCEPT THAT THE INVERSE IS NOT COMPUTED IN PLACE.
C     EPS IS THE MINIMUM PIVOT MAGNITUDE PERMITTED BY SIMUL.
C     SHOULD NO ACCEPTABLE PIVOT BE FOUND, SIMUL RETURNS A TRUE
C     ZERO AS ITS VALUE.  OTHERWISE, DETER, THE DETERMINANT OF THE
C     MATRIX OF COEFFICIENTS, IS RETURNED AS THE VALUE OF SIMUL.
C
      IMPLICIT REAL*8(A-H, O-Z)
      DIMENSION X(50), A(51,51)
C
C     ..... READ AND PRINT MATRIX A .....
    1 READ (5,100) N, INDIC, EPS
      WRITE (6,200) N, INDIC, EPS
      MAX = N
      IF ( INDIC.GE.0 ) MAX = N + 1
      DO 4 I = 1, N
      READ (5,101) (A(I,J), J = 1, MAX)
    4 WRITE (6,201) (A(I,J), J = 1, MAX)
C
C     ..... CALL SIMUL .....
      DETER = SIMUL( N, A, X, EPS, INDIC, 51 )
C
C     ..... PRINT SOLUTIONS .....
      IF ( INDIC.GE.0 ) GO TO 8
      WRITE (6,202) DETER
      DO 7 I = 1, N
    7 WRITE (6,201) (A(I,J), J = 1, N)
      GO TO 1
    8 WRITE (6,203) DETER, N, (X(I), I = 1, N)
      IF ( INDIC.NE.0 ) GO TO 1
      WRITE (6,204)
      DO 10 I = 1, N
   10 WRITE (6,201) (A(I,J), J = 1, N)
      GO TO 1
C
C     ..... FORMATS FOR INPUT AND OUTPUT STATEMENTS .....
  100 FORMAT ( 10X, I2, 18X, I2, 18X, E7.1 )
  101 FORMAT ( 20X, 5F10.5 )
  200 FORMAT ( 10H1N      = , I2 / 10H INDIC  = , I4 / 10H EPS    = ,
     1   E10.1 / 23H0THE STARTING MATRIX IS / 1H )
  201 FORMAT ( 1H , 7F13.6 )
  202 FORMAT ( 10H0DETER  = , F12.6 / 22H0THE INVERSE MATRIX IS / 1H )
  203 FORMAT ( 10H0DETER  = , F12.6 / 24H0THE SOLUTIONS X(1)...X(, I2,
     1   5H) ARE / 1H  / ( 1H , 7F13.6 ) )
  204 FORMAT ( 22H0THE INVERSE MATRIX IS / 1H )
C
      END

Program Listing (Continued)


      FUNCTION SIMUL( N, A, X, EPS, INDIC, NRC )
C
C     WHEN INDIC IS NEGATIVE, SIMUL COMPUTES THE INVERSE OF THE N BY
C     N MATRIX A IN PLACE.  WHEN INDIC IS ZERO, SIMUL COMPUTES THE
C     N SOLUTIONS X(1)...X(N) CORRESPONDING TO THE SET OF LINEAR
C     EQUATIONS WITH AUGMENTED MATRIX OF COEFFICIENTS IN THE N BY
C     N+1 ARRAY A AND IN ADDITION COMPUTES THE INVERSE OF THE
C     COEFFICIENT MATRIX IN PLACE AS ABOVE.  IF INDIC IS POSITIVE,
C     THE SET OF LINEAR EQUATIONS IS SOLVED BUT THE INVERSE IS NOT
C     COMPUTED IN PLACE.  THE GAUSS-JORDAN COMPLETE ELIMINATION METHOD
C     IS EMPLOYED WITH THE MAXIMUM PIVOT STRATEGY.  ROW AND COLUMN
C     SUBSCRIPTS OF SUCCESSIVE PIVOT ELEMENTS ARE SAVED IN ORDER IN
C     THE IROW AND JCOL ARRAYS RESPECTIVELY.  K IS THE PIVOT COUNTER,
C     PIVOT THE ALGEBRAIC VALUE OF THE PIVOT ELEMENT, MAX
C     THE NUMBER OF COLUMNS IN A AND DETER THE DETERMINANT OF THE
C     COEFFICIENT MATRIX.  THE SOLUTIONS ARE COMPUTED IN THE (N+1)TH
C     COLUMN OF A AND THEN UNSCRAMBLED AND PUT IN PROPER ORDER IN
C     X(1)...X(N) USING THE PIVOT SUBSCRIPT INFORMATION AVAILABLE
C     IN THE IROW AND JCOL ARRAYS.  THE SIGN OF THE DETERMINANT IS
C     ADJUSTED, IF NECESSARY, BY DETERMINING IF AN ODD OR EVEN NUMBER
C     OF PAIRWISE INTERCHANGES IS REQUIRED TO PUT THE ELEMENTS OF THE
C     JORD ARRAY IN ASCENDING SEQUENCE WHERE JORD(IROW(I)) = JCOL(I).
C     IF THE INVERSE IS REQUIRED, IT IS UNSCRAMBLED IN PLACE USING
C     Y(1)...Y(N) AS TEMPORARY STORAGE.  THE VALUE OF THE DETERMINANT
C     IS RETURNED AS THE VALUE OF THE FUNCTION.  SHOULD THE POTENTIAL
C     PIVOT OF LARGEST MAGNITUDE BE SMALLER IN MAGNITUDE THAN EPS,
C     THE MATRIX IS CONSIDERED TO BE SINGULAR AND A TRUE ZERO IS
C     RETURNED AS THE VALUE OF THE FUNCTION.
C
      IMPLICIT REAL*8(A-H, O-Z)
      REAL*8 A, X, EPS, SIMUL
      DIMENSION IROW(50), JCOL(50), JORD(50), Y(50), A(NRC,NRC), X(N)
C
      MAX = N
      IF ( INDIC.GE.0 ) MAX = N + 1
C
C     ..... IS N LARGER THAN 50 .....
      IF ( N.LE.50 ) GO TO 5
      WRITE (6,200)
      SIMUL = 0.
      RETURN
C
C     ..... BEGIN ELIMINATION PROCEDURE .....
    5 DETER = 1.
      DO 18 K = 1, N
      KM1 = K - 1
C
C     ..... SEARCH FOR THE PIVOT ELEMENT .....
      PIVOT = 0.
      DO 11 I = 1, N
      DO 11 J = 1, N
C
C     ..... SCAN IROW AND JCOL ARRAYS FOR INVALID PIVOT SUBSCRIPTS .....
      IF ( K.EQ.1 ) GO TO 9
      DO 8 ISCAN = 1, KM1
      DO 8 JSCAN = 1, KM1
      IF ( I.EQ.IROW(ISCAN) ) GO TO 11
      IF ( J.EQ.JCOL(JSCAN) ) GO TO 11
    8 CONTINUE
    9 IF ( DABS(A(I,J)).LE.DABS(PIVOT) ) GO TO 11
      PIVOT = A(I,J)
      IROW(K) = I
      JCOL(K) = J
   11 CONTINUE
C
C     ..... INSURE THAT SELECTED PIVOT IS LARGER THAN EPS .....
      IF ( DABS(PIVOT).GT.EPS ) GO TO 13
      SIMUL = 0.
      RETURN
C
C     ..... UPDATE THE DETERMINANT VALUE .....
   13 IROWK = IROW(K)
      JCOLK = JCOL(K)
      DETER = DETER*PIVOT
C
C     ..... NORMALIZE PIVOT ROW ELEMENTS .....
      DO 14 J = 1, MAX
   14 A(IROWK,J) = A(IROWK,J)/PIVOT
C
C     ..... CARRY OUT ELIMINATION AND DEVELOP INVERSE .....
      A(IROWK,JCOLK) = 1./PIVOT
      DO 18 I = 1, N
      AIJCK = A(I,JCOLK)
      IF ( I.EQ.IROWK ) GO TO 18
      A(I,JCOLK) = - AIJCK/PIVOT
      DO 17 J = 1, MAX
   17 IF ( J.NE.JCOLK ) A(I,J) = A(I,J) - AIJCK*A(IROWK,J)
   18 CONTINUE
C
C     ..... ORDER SOLUTION VALUES (IF ANY) AND CREATE JORD ARRAY .....
      DO 20 I = 1, N
      IROWI = IROW(I)
      JCOLI = JCOL(I)
      JORD(IROWI) = JCOLI
   20 IF ( INDIC.GE.0 ) X(JCOLI) = A(IROWI,MAX)
C
C     ..... ADJUST SIGN OF DETERMINANT .....
      INTCH = 0
      NM1 = N - 1
      DO 22 I = 1, NM1
      IP1 = I + 1
      DO 22 J = IP1, N
      IF ( JORD(J).GE.JORD(I) ) GO TO 22
      JTEMP = JORD(J)
      JORD(J) = JORD(I)
      JORD(I) = JTEMP
      INTCH = INTCH + 1
   22 CONTINUE
      IF ( INTCH/2*2.NE.INTCH ) DETER = - DETER
C
C     ..... IF INDIC IS POSITIVE RETURN WITH RESULTS .....
      IF ( INDIC.LE.0 ) GO TO 26
      SIMUL = DETER
      RETURN
C
C     ..... IF INDIC IS NEGATIVE OR ZERO, UNSCRAMBLE THE INVERSE
C           FIRST BY ROWS .....
   26 DO 28 J = 1, N
      DO 27 I = 1, N
      IROWI = IROW(I)
      JCOLI = JCOL(I)
   27 Y(JCOLI) = A(IROWI,J)
      DO 28 I = 1, N
   28 A(I,J) = Y(I)
C
C     ..... THEN BY COLUMNS .....
      DO 30 I = 1, N
      DO 29 J = 1, N
      IROWJ = IROW(J)
      JCOLJ = JCOL(J)
   29 Y(IROWJ) = A(I,JCOLJ)
      DO 30 J = 1, N
   30 A(I,J) = Y(J)
C
C     ..... RETURN FOR INDIC NEGATIVE OR ZERO .....
      SIMUL = DETER
      RETURN
C
C     ..... FORMAT FOR OUTPUT STATEMENT .....
  200 FORMAT ( 10H0N TOO BIG )
C
      END

Data
-1       EPS = 1.0E-20
8.00000 5.00000
-7.00000 4.00000
9.00000 -6.00000
0        EPS = 1.0E-20
8.00000 5.00000 6.00000
-7.00000 4.00000 9.00000
9.00000 -6.00000 1.00000
1        EPS = 1.0E-20
8.00000 5.00000 6.00000
-7.00000 4.00000 9.00000
9.00000 -6.00000 1.00000
0        EPS = 1.0E-10
-7.00000 4.00000 9.00000
9.00000 -6.00000 1.00000
8.00000 5.00000 6.00000
0        EPS = 1.0E-10
9.00000 -6.00000 1.00000
-7.00000 4.00000 9.00000
-7.00000 4.00000 9.00000
0        EPS = 1.0E-10
-7.00000 4.00000 1.00000
9.00000 -6.00000 1.00000
8.00000 5.00000 6.00000
0        EPS = 1.0E-20
1.00000 1.00000
3.00000 2.900001
0        EPS = 1.0E-20
3.00000 2.900001
1.00000 1.00000
0        EPS = 1.0E-10
0.00000 -0.50000 -1.00000
0.00000 0.00000 0.00000
-5.00000
0.00000 -0.86603 0.00000
0.00000 0.00000 0.00000
10.00000
1.00000 0.50000 0.00000
0.00000 0.00000 0.00000
0.00000
0.00000 0.86603 0.00000
0.00000 0.00000 0.00000
10.00000
0.00000 0.00000 1.00000
-0.50000 -1.00000 0.00000
0.00000
0.00000 0.00000 0.00000
Computer Output

[The printed results for the first eight data sets (determinants, solution vectors, and inverse matrices) are not legibly reproduced here.]
Results for the 9th Data Set

N      =  11
INDIC  =   0
EPS    =  0.1D-09

THE STARTING MATRIX IS

[11 × 12 array not legibly reproduced]

DETER  =    1.461457

THE SOLUTIONS X(1)...X(11) ARE

[values not legibly reproduced]

THE INVERSE MATRIX IS

[11 × 11 array not legibly reproduced]

Discussion of Results

All calculations were carried out using double-precision arithmetic. Results for the first three data sets show the output for the three modes of operation of SIMUL. The equations are those used as a numerical example in the discussion of the maximum pivot strategy. Equations for the fourth data set are those used as an illustrative example in Section 5.4.

The coefficient matrix for the fifth data set is singular (two of the equations are identical); this is indicated by the zero determinant value. In this case, the entries in the x vector are meaningless (left over from the previous problem), and the numbers printed under the inverse matrix label are the contents of A when the singularity was discovered.

The equations of the sixth data set are the same as those for the fourth data set in Example 5.1. Note that the appearance of a zero on the diagonal in the a11 position causes no difficulty when the maximum pivot strategy is employed (compare with the results in Example 5.1).

Results for the seventh and eighth data sets show that the order of the equations is immaterial when the maximum pivot strategy is employed.

Results for the ninth data set include the member forces in the plane truss of Fig. 5.2.1; the forces are listed in Table 5.2.1. Note that for the symmetrically distributed load, the member forces are also symmetric. Once the inverse matrix has been computed, the member forces for the given structure may be calculated for any combination of vertical and horizontal forces applied at the joints (any other applied forces can be decomposed into their horizontal and vertical components) by evaluating the matrix product of the inverse matrix and the new right-hand-side vector.

Table 5.2.1  Member Forces in the Plane Truss

Member      Force (Tons)       Tension or Compression
  1         [not legible]      Compression
  2         [not legible]      Tension
  3         [not legible]      Tension
  4         [not legible]      Compression
  5         [not legible]      Compression
  6         [not legible]      Tension
  7         [not legible]      Compression
  8         [not legible]      Compression
  9         [not legible]      Tension
 10         [not legible]      Tension
 11         [not legible]      Compression

[The force magnitudes were not legibly reproduced; the member numbering is inferred from the order of the printed rows.]
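The strategy that SIMUL implements, Gauss-Jordan elimination with the maximum (complete) pivot strategy, with the determinant accumulated as the product of pivots and its sign corrected by counting pairwise interchanges, can be sketched as follows. This is an illustrative modern translation, not the FORTRAN routine itself; for brevity it returns only the determinant and solution vector, and does not develop the inverse in place.

```python
def gauss_jordan_max_pivot(A, b, eps=1e-20):
    """Solve A x = b by Gauss-Jordan elimination with complete pivoting.

    Returns (det, x).  A and b are not modified.  Raises ValueError if
    every candidate pivot has magnitude <= eps (matrix singular or
    near-singular), mirroring SIMUL's zero-determinant return.
    """
    n = len(A)
    # augmented working copy, [A | b]
    M = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(A)]
    det = 1.0
    used_rows, used_cols = [], []
    for _ in range(n):
        # search the untouched submatrix for the element of largest magnitude
        pivot, pr, pc = 0.0, -1, -1
        for i in range(n):
            if i in used_rows:
                continue
            for j in range(n):
                if j in used_cols:
                    continue
                if abs(M[i][j]) > abs(pivot):
                    pivot, pr, pc = M[i][j], i, j
        if abs(pivot) <= eps:
            raise ValueError("matrix is singular or near-singular")
        used_rows.append(pr)
        used_cols.append(pc)
        det *= pivot
        # normalize the pivot row, then eliminate the pivot column elsewhere
        M[pr] = [v / pivot for v in M[pr]]
        for i in range(n):
            if i != pr:
                f = M[i][pc]
                M[i] = [v - f * w for v, w in zip(M[i], M[pr])]
    # unscramble the solution; fix the determinant's sign by counting the
    # pairwise interchanges needed to sort the pivot (row, column) order
    x = [0.0] * n
    for r, c in zip(used_rows, used_cols):
        x[c] = M[r][n]
    order = [used_cols[used_rows.index(r)] for r in sorted(used_rows)]
    swaps = sum(1 for i in range(n)
                for j in range(i + 1, n) if order[i] > order[j])
    if swaps % 2 == 1:
        det = -det
    return det, x
```

Applied, for example, to the 3 × 3 system used in Sections 5.5 through 5.7, the sketch returns det = 35 and x = [1, 2, 3].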
5.5 A Finite Form of the Method of Kaczmarz

We consider the solution of (5.2) under the assumption that B is nonsingular. The procedure consists in first converting the system of (5.2) into an equivalent system,

Ax = v,   A* = A⁻¹,   (5.9)

such that (5.9) and (5.2) have the same solution vector x. Then, using an arbitrary initial vector, r0, we define

rj = rj−1 − [(aj, rj−1) − vj]aj,   1 ≤ j ≤ n,   (5.10)

where A = (aij) and aj = [āj1, āj2, ..., ājn]'. Then

Brn = u.   (5.11)

We show that (5.10) defines a vector rn such that Arn = v. Notice first that the equations of (5.9) may be written

(ai, x) = vi,   1 ≤ i ≤ n.

Notice next that, multiplying (5.10) on the left by aj,

(aj, rj) = vj,

so that the jth equation of (5.9) is satisfied by rj. However,

(ai, rj) = (ai, rj−1),   i ≠ j,

since (ai, aj) = 0 for i ≠ j. Thus, if i < j, (ai, rj) = (ai, rj−1) = ··· = (ai, ri) = vi. We see inductively that Arn = v, and the unique solution of (5.9) has been found.

Turn now to the solution of (5.2). Let B* = [β1, β2, ..., βn], where βi = [b̄i1, b̄i2, ..., b̄in]'. A system equivalent to (5.2) and having the properties of (5.9) is built in orthodox manner from the linearly independent vectors βi by using the Gram-Schmidt orthogonalization procedure, as follows. Let

[relations (5.12), defining the orthonormal vectors a1, a2, ..., an, are not legibly reproduced here.]

Then it is readily found that (ai, ai) = 1, while (ai, aj) = 0 if i ≠ j is shown to be true inductively. Thus, if A* = [a1, a2, ..., an], then A* = A⁻¹. This is verified by direct multiplication. Finally, let

[relations (5.13), defining v1, v2, ..., vn, are not legibly reproduced here.]

With these definitions, a solution of (5.9) is a solution of (5.2). For, if (aj, x) − vj = 0, 1 ≤ j ≤ n, then by (5.12) and (5.13), Bx = u follows.

In application, all can be accomplished by using the array [B | u] and forming in the same locations the array [A | v]. If it is desired to vary the vector u after A has been built, it will be necessary to record the n normalizing factors and the (n² − n)/2 inner products (βj, ai), 1 ≤ i < j ≤ n. The building of the matrix [A | v] from [B | u] can be visualized best by writing the conjugates of relations (5.12) [but not of (5.13)]. Then observe that the first row of [A | v] is formed from the first row of [B | u], the second row of [A | v] from the second row of [B | u] and the just established row of [A | v], etc. Each operation involved can be viewed as tantamount to premultiplication by an elementary matrix of the first or third kind. Thus there exists a nonsingular matrix such that

[relation (5.14) is not legibly reproduced here.]

This knowledge can be useful in case the matrix B is ill-conditioned, that is, has rows or columns so nearly dependent on each other that rounding or truncation errors can cause the calculated determinantal value to deviate markedly from its true value. Now recall that |det(A)| = 1 to realize that (5.14) can accomplish the purpose cited. Note also that the sequences {aj}, {vj}, and {rj} can progress together, so that the method can properly be called an n-step method. After orthogonalization, it is also possible to find the solution vector x as A*v.

Example. As a simple illustration of the Kaczmarz method, we consider the following problem, also discussed in Sections 5.6 and 5.7:

4x1 + 2x2 + x3 = 11,
−x1 + 2x2 = 3,
2x1 + x2 + 4x3 = 16.

The matrix [B | u] is

[  4   2   1 | 11 ]
[ −1   2   0 |  3 ]
[  2   1   4 | 16 ].

The first row of [A | v] is that of [B | u] divided by √(4² + 2² + 1²) = √21, or

(1/√21)[4, 2, 1 | 11].

Prior to normalizing, the second row is

[−1, 2, 0 | 3].

The normalizing factor is √(1² + 2²) = √5, so that the second row of [A | v] is

(1/√5)[−1, 2, 0 | 3].

Prior to normalizing, the third row is

[−2/3, −1/3, 10/3 | 26/3].

Thus for [A | v], we have

[  4/√21     2/√21     1/√21  | 11/√21  ]
[ −1/√5      2/√5      0      |  3/√5   ]
[ −2/√105   −1/√105   10/√105 | 26/√105 ].

Using r0 = [1, 1, 1]', we find

r1 = [37/21, 29/21, 25/21]'.

Then

r2 = [143/105, 229/105, 25/21]'.

There results

r3 = [1, 2, 3]'.
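The n-step procedure just illustrated can be sketched as follows for the real case. The function name is illustrative; the Gram-Schmidt sweep builds [A | v] from [B | u], after which a single pass of the projections (5.10) reproduces the exact solution, whatever the starting vector r0.

```python
import math

def kaczmarz_finite(B, u, r0):
    """n-step Kaczmarz solution of B x = u (real, nonsingular B).

    The rows of [B | u] are first orthonormalized by Gram-Schmidt,
    giving an equivalent system A x = v with orthonormal rows; the
    projections r_j = r_{j-1} - [(a_j, r_{j-1}) - v_j] a_j then reach
    the solution in exactly n steps.
    """
    n = len(B)
    A, v = [], []
    for i in range(n):
        a, w = list(map(float, B[i])), float(u[i])
        for aj, vj in zip(A, v):     # remove components along earlier rows
            dot = sum(x * y for x, y in zip(a, aj))
            a = [x - dot * y for x, y in zip(a, aj)]
            w -= dot * vj
        norm = math.sqrt(sum(x * x for x in a))
        A.append([x / norm for x in a])
        v.append(w / norm)
    r = list(map(float, r0))
    for aj, vj in zip(A, v):         # one sweep of (5.10): r_n solves A x = v
        c = sum(x * y for x, y in zip(aj, r)) - vj
        r = [x - c * y for x, y in zip(r, aj)]
    return r
```

On the worked example above, starting from r0 = [1, 1, 1]', the sweep reproduces r3 = [1, 2, 3]' exactly (up to rounding).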
5.6 Jacobi Iterative Method

Consider again the solution of the linear system Bx = u:

b11x1 + b12x2 + ··· + b1nxn = u1,
b21x1 + b22x2 + ··· + b2nxn = u2,
. . . . . . . . . . . . . . . . .        (5.2)
bn1x1 + bn2x2 + ··· + bnnxn = un.

We now formulate the Jacobi iterative method for approximating the solution of (5.2). The degree of approximation, however, can normally be improved by expending more computational effort, that is, by performing an increased number of iterations.

First, solve for the xi, giving:

x1 = (u1 − b12x2 − b13x3 − ··· − b1nxn)/b11,
x2 = (u2 − b21x1 − b23x3 − ··· − b2nxn)/b22,
. . . . . . . . . . . . . . . . . . . . .      (5.15)
xn = (un − bn1x1 − bn2x2 − ··· − bn,n−1xn−1)/bnn.

The system (5.15) can be written more concisely as

xi = (ui − Σ(j≠i) bij xj)/bii,   i = 1, 2, ..., n.

Note that the above rearrangement is predicated on bii ≠ 0. Usually, we try to reorder the equations and the unknowns so that diagonal dominance is obtained, that is, so that each diagonal element bii is larger, in absolute value, than the magnitudes of the other entries in row i and column i. In this connection, also see relations (5.21).

Next, make starting guesses for the x's and insert these values into the right-hand sides of (5.15). The resulting new approximations for the x's are resubstituted into the right-hand sides of (5.15), and the process is repeated. Hopefully, the x's thus computed will show little further change after several such iterations have been made.

Example. Consider the equations

4x1 + 2x2 + x3 = 11,
−x1 + 2x2 = 3,
2x1 + x2 + 4x3 = 16,

which have the solution vector x = [1, 2, 3]', that is, x1 = 1, x2 = 2, and x3 = 3. Rewrite the equations as

x1 = 11/4 − (1/2)x2 − (1/4)x3,
x2 = 3/2 + (1/2)x1,
x3 = 4 − (1/2)x1 − (1/4)x2,

and arbitrarily choose a starting vector x0 = [1, 1, 1]', in which the subscript denotes the zeroth stage of iteration. Using a second subscript to denote the iteration number, the first iteration gives

x1,1 = 11/4 − 1/2 − 1/4 = 2,
x2,1 = 3/2 + 1/2 = 2,
x3,1 = 4 − 1/2 − 1/4 = 13/4.

That is, x1 = [2, 2, 13/4]'. Similarly, the next four iterations yield

[x2 through x5 are not legibly reproduced here.]

The approximation computed at the fifth iteration is roughly within 1% of the exact solution. The accuracy could be improved by performing more iterations. Observe that a whole new solution vector is computed before it is used in the next iteration.

In order to establish a criterion for the convergence of the Jacobi method, regard the rearranged equations (5.15) as the system

x = Ax + v,   (5.17)

in which

      [ 0          −b12/b11   ···  −b1n/b11 ]         [ u1/b11 ]
A =   [ −b21/b22   0          ···  −b2n/b22 ] ,  v =  [ u2/b22 ]   (5.18)
      [ . . . . . . . . . . . . . . . . . . ]         [  ...   ]
      [ −bn1/bnn   −bn2/bnn   ···  0        ]         [ un/bnn ].

If the starting vector x0 is near the solution vector x, convergence will be faster. In any event, define

xk = Axk−1 + v,   (5.19)

in which the subscript k is the iteration number. This means that

xk = A^k x0 + [I + A + A² + ··· + A^(k−1)]v.

From this, we see that convergence normally requires that

lim(k→∞) A^k = 0.   (5.20)

From (4.23), it is also a necessary and sufficient condition that

lim(k→∞) [I + A + A² + ··· + A^k] = (I − A)⁻¹.

Thus, when (5.20) is satisfied, x = lim(k→∞) xk exists and x = 0 + (I − A)⁻¹v; that is, (I − A)x = v, or x = Ax + v.

Thus, convergence hinges on the truth of (5.20). From page 222, (5.20) is true if and only if all eigenvalues of A are in modulus less than unity. For this to be so, from (4.25), (4.26), and the subsequent development, we have the sufficient conditions

Σ(i=1 to n) |aij| ≤ λj < 1,   1 ≤ j ≤ n,
[together with the companion row-sum condition, not legibly reproduced.]   (5.21)

By using (5.18), these sufficiency conditions can also be translated into an equivalent set of conditions on the elements of the original coefficient matrix B. For example, the second condition of (5.21) becomes

[relation (5.22), not legibly reproduced.]

If, as frequently occurs, matrix B is irreducible (that is, a matrix of the form

[ B1  B2 ]
[ 0   B3 ] ,

where B1 is square and 0 is the null matrix, cannot be found by permuting rows and columns of B), the sufficiency condition can be relaxed (for example, see Ralston and Wilf [1]) to

Σ(j=1 to n, j≠i) |bij| ≤ |bii|,   1 ≤ i ≤ n,   (5.23)

with strict inequality holding for at least one value of i.

5.7 Gauss-Seidel Iterative Method

The linear system considered is again that of (5.2), rephrased in the form (5.15) or (5.17). In the iterations, however, the newly-computed components of the x vector are always used in the right-hand sides as soon as they are obtained. This contrasts with the Jacobi method, in which the new components are not used until all n components have been found.

Example. The Gauss-Seidel method is applied to the short example considered under the Jacobi method. The form used is

x1 = 11/4 − (1/2)x2 − (1/4)x3,
x2 = 3/2 + (1/2)x1,
x3 = 4 − (1/2)x1 − (1/4)x2,

with the understanding that the most recently available x's are always used in the right-hand sides. Again x0 is chosen as [1, 1, 1]'. The first iteration gives

x1,1 = 11/4 − (1/2)(1) − (1/4)(1) = 2,
x2,1 = 3/2 + (1/2)(2) = 5/2,
x3,1 = 4 − (1/2)(2) − (1/4)(5/2) = 19/8.

That is, x1 = [2, 5/2, 19/8]'. Similarly, the next two iterations yield

x2 = [29/32, 125/64, 783/256]',
x3 = [not legibly reproduced].

Observe that in this example the rate of convergence is much faster than that in the Jacobi method.

In order to investigate the conditions for the convergence of the Gauss-Seidel method, we first phrase the iteration in terms of the individual components. Let xik denote the kth approximation to the ith component of the solution vector x = [x1, x2, ..., xn]'. Let [x10, x20, ..., xn0]' be an arbitrary initial approximation (though, as with the Jacobi method, if a good estimate is known, it should be used for efficiency). Let A and v be the same as given in (5.18), and define

xik = Σ(j=1 to i−1) aij xjk + Σ(j=i+1 to n) aij xj,k−1 + vi,   (5.24)

for 1 ≤ i ≤ n and 1 ≤ k. When i = 1, Σ(j=1 to i−1) aij xjk is interpreted as zero, and when i = n, Σ(j=i+1 to n) aij xj,k−1 is likewise interpreted as zero.

Write A = AL + AR, where

       [ 0    0    ···  0 ]            [ 0   a12  ···  a1n ]
AL =   [ a21  0    ···  0 ] ,   AR =   [ 0   0    ···  a2n ] .
       [ . . . . . . . . .]            [ . . . . . . . . . ]
       [ an1  an2  ···  0 ]            [ 0   0    ···  0   ]

Thus AL is a strictly lower-triangular matrix whose subdiagonal entries are the elements of A in their natural positions. A similar description applies to AR. It is seen that, if xk = [x1k, x2k, ..., xnk]',

xk = AL xk + AR xk−1 + v.

This can be paraphrased as

xk = (I − AL)⁻¹AR xk−1 + (I − AL)⁻¹v,   (5.25)

which is then of the Jacobi form. This means that a necessary and sufficient condition for the convergence of (5.24) is that the eigenvalues of (I − AL)⁻¹AR be less than unity in modulus. The eigenvalues of (I − AL)⁻¹AR are found by solving det((I − AL)⁻¹AR − λI) = 0, or det([I − AL]⁻¹[AR − λ(I − AL)]) = 0, or det(AR − λI + λAL) = 0. Thus the Gauss-Seidel process converges if and only if the zeros of the determinant of

[ −λ     a12    a13   ···  a1n ]
[ λa21   −λ     a23   ···  a2n ]
[ λa31   λa32   −λ    ···  a3n ]   (5.26)
[ . . . . . . . . . . . . . . .]
[ λan1   λan2   ···        −λ  ]

are less than one in absolute value. Since aii = 0, 1 ≤ i ≤ n, while aij = −bij/bii for i ≠ j, the determinant of (5.26) has the same zeros as the determinant of

[relation (5.27), not legibly reproduced.]

It develops that conditions analogous to the first two of (5.21) prove sufficient to guarantee convergence, namely,

[conditions (5.28), not legibly reproduced.]

The first of these may be demonstrated as follows. We have already seen in (4.14) that since

|bii| > Σ(j=1 to n, j≠i) |bij|,

B is nonsingular. Thus a solution vector x exists such that x = Ax + v, whence

xi = Σ(j=1 to n, j≠i) aij xj + vi,

in which aij = −bij/bii. Subtracting this from (5.24) yields

|xik − xi| ≤ Σ(j=1 to i−1) |aij| |xjk − xj| + Σ(j=i+1 to n) |aij| |xj,k−1 − xj|.   (5.29)
Let ek denote the maximum of the numbers |xik − xi| as i varies. Then

|x1k − x1| ≤ Σ(j=2 to n) |a1j| ek−1 ≤ μ ek−1 < ek−1.

Substituting this in (5.29) yields [the corresponding bound for the second component, not legibly reproduced]. Continuing as indicated gives |xik − xi| ≤ μ ek−1, 1 ≤ i ≤ n. This means, of course, that |xik − xi| ≤ μ^k e0, whence, since 0 < μ < 1, lim(k→∞) xik = xi.

More interesting still than the sufficiency conditions of (5.28) is the fact that convergence always takes place if the matrix B of (5.2) is positive definite. To demonstrate this, let B = D + L + L*, where D is the matrix diag(b11, b22, ..., bnn), and L is the strictly lower-triangular matrix formed from the elements of B below the diagonal. Starting from (5.25), it is seen that a necessary and sufficient condition for convergence is that all eigenvalues of (I − AL)⁻¹AR be of modulus less than unity. But AL = −D⁻¹L and AR = −D⁻¹L*. Thus (I − AL)⁻¹AR = −(D + L)⁻¹L*. The eigenvalues of this matrix, except for sign, are those of (D + L)⁻¹L*, which we consider instead. Let λi be an eigenvalue of this matrix, and let wi be the corresponding eigenvector. Since B is positive definite,

(wi, Dwi) + (wi, Lwi) + (wi, L*wi) > 0.   (5.30)

But (D + L)⁻¹L*wi = λi wi, so that L*wi = λi Dwi + λi Lwi; then

(wi, L*wi) = λi[(wi, Dwi) + (wi, Lwi)].   (5.31)

Taking the conjugate of both sides, (L*wi, wi) = (wi, Lwi) = λ̄i[(Dwi, wi) + (Lwi, wi)], or

(wi, Lwi) = λ̄i[(wi, Dwi) + (wi, L*wi)].   (5.32)

Combining (5.31) and (5.32) gives [a relation not legibly reproduced]. Substituting in (1 − λiλ̄i)[(wi, Dwi) + (wi, Lwi) + (wi, L*wi)] yields [an expression not legibly reproduced]. Since D is itself positive definite, this expression is positive. Then, by (5.30), (1 − λiλ̄i) > 0, or |λi| < 1.

Thus, sufficiency has been shown. It is also possible to prove that if the matrix B is Hermitian and all diagonal elements are positive, then convergence requires that B be positive definite.

The solution of systems of equations by iterative procedures such as the Jacobi and Gauss-Seidel methods is sometimes termed relaxation (the errors in the initial estimate of the solution vector are decreased or relaxed as calculation continues). The Gauss-Seidel and related methods are used extensively in the solution of large systems of linear equations, generated as the result of the finite-difference approximation of partial differential equations (see Chapter 7).
EXAMPLE 5.3
GAUSS-SEIDEL METHOD

Problem Statement

Write a program that implements the Gauss-Seidel method, described in Section 5.7, for solving the following system of n simultaneous linear equations:

a11x1 + a12x2 + ··· + a1nxn = a1,n+1,
a21x1 + a22x2 + ··· + a2nxn = a2,n+1,
. . . . . . . . . . . . . . . . . . .      (5.3.1)
an1x1 + an2x2 + ··· + annxn = an,n+1,

in which the aij are constants.

Method of Solution

In order to reduce the number of divisions required in the calculations, the coefficients of (5.3.1) are first normalized by dividing all elements in row i by aii, i = 1, 2, ..., n, to produce an augmented coefficient matrix of the form

[ 1      a'12   a'13   ···  a'1n  |  a'1,n+1 ]
[ a'21   1      a'23   ···  a'2n  |  a'2,n+1 ]   (5.3.2)
[ . . . . . . . . . . . . . . . . . . . . . .]
[ a'n1   a'n2   ···         1     |  a'n,n+1 ],

where a'ij = aij/aii.

In terms of this notation, the approximation to the solution vector after the kth iteration,

xk = [x1,k, x2,k, ..., xn,k]',

is modified by the algorithm

xi,k+1 = a'i,n+1 − Σ(j=1 to i−1) a'ij xj,k+1 − Σ(j=i+1 to n) a'ij xj,k,   i = 1, 2, ..., n,   (5.3.3)

to produce the next approximation

xk+1 = [x1,k+1, x2,k+1, ..., xn,k+1]'.

Since, in the Gauss-Seidel method, the new values xi,k+1 replace the old values xi,k as soon as computed, the iteration subscript k can be omitted, and (5.3.3) becomes

xi = a'i,n+1 − Σ(j=1 to n, j≠i) a'ij xj,   i = 1, 2, ..., n,   (5.3.4)

in which the most recently available xj values are always used on the right-hand side. Hopefully, the xi values computed by iterating with (5.3.4) will converge to the solution of (5.3.1).

The convergence criterion is

|xi,k+1 − xi,k| ≤ ε,   i = 1, 2, ..., n,

that is, no element of the solution vector may have its magnitude changed by an amount greater than ε as a result of one Gauss-Seidel iteration. Since convergence may not occur, an upper limit on the number of iterations, kmax, is also specified.
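The normalization (5.3.2), the iteration (5.3.4), and the convergence test can be restated compactly as follows. This sketch mirrors the algorithm the FORTRAN program below implements; the function name and return convention are illustrative.

```python
def gauss_seidel(aug, x0, eps=1e-5, itmax=50):
    """Gauss-Seidel iteration on an n x (n+1) augmented matrix `aug`.

    Rows are first normalized by their diagonal elements, per (5.3.2);
    then x_i = a'_{i,n+1} - sum_{j != i} a'_{ij} x_j, per (5.3.4), always
    using the most recent x_j.  Iteration stops when no component changes
    by more than eps.  Returns (x, iterations, converged).
    """
    n = len(aug)
    a = [[v / row[i] for v in row] for i, row in enumerate(aug)]  # (5.3.2)
    x = list(map(float, x0))
    for k in range(1, itmax + 1):
        converged = True
        for i in range(n):
            xold = x[i]
            x[i] = a[i][n] - sum(a[i][j] * x[j] for j in range(n) if j != i)
            if abs(x[i] - xold) > eps:
                converged = False
        if converged:
            return x, k, True
    return x, itmax, False
```

A hypothetical usage, on a diagonally dominant 4 × 4 system with solution [1, 2, 3, -1]', returns the converged flag set and the iteration count used.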

Flow Diagram

[Flowchart not legibly reproduced here.]
FORTRAN Implementation
List of Principal Variables
Program Symbol Definition
A                 n × (n + 1) augmented coefficient matrix, containing elements aij.
ASTAR, XSTAR      Temporary storage locations for elements of A and X, respectively.
EPS               Tolerance used in convergence test, ε.
FLAG              A flag used in convergence testing; it has the value 1 for successful convergence, and the
                  value 0 otherwise.
ITER              Iteration counter, k.
ITMAX             The maximum number of iterations allowed, kmax.
N                 Number of simultaneous equations, n.
X                 Vector containing the elements of the current approximation to the solution vector, xk.
Program Listing
C     APPLIED NUMERICAL METHODS, EXAMPLE 5.3
C     GAUSS-SEIDEL ITERATION FOR N SIMULTANEOUS LINEAR EQUATIONS.
C
C     THE ARRAY A CONTAINS THE N X N+1 AUGMENTED COEFFICIENT MATRIX.
C     THE VECTOR X CONTAINS THE LATEST APPROXIMATION TO THE SOLUTION.
C     THE COEFFICIENT MATRIX SHOULD BE DIAGONALLY DOMINANT AND
C     PREFERABLY POSITIVE DEFINITE.  ITMAX IS THE MAXIMUM NUMBER OF
C     ITERATIONS ALLOWED.  EPS IS USED IN CONVERGENCE TESTING.  IN
C     TERMINATING THE ITERATIONS, NO ELEMENT OF X MAY UNDERGO A MAG-
C     NITUDE CHANGE GREATER THAN EPS FROM ONE ITERATION TO THE NEXT.
C
      INTEGER FLAG
      DIMENSION A(20,21), X(20)
C
C     ..... READ AND CHECK INPUT PARAMETERS,
C           COEFFICIENT MATRIX, AND STARTING VECTOR .....
    1 READ (5,100) N, ITMAX, EPS
      WRITE (6,200) N, ITMAX, EPS
      NP1 = N + 1
      READ (5,101) ((A(I,J), J = 1, NP1), I = 1, N)
      READ (5,101) (X(I), I = 1, N)
      DO 2 I = 1, N
    2 WRITE (6,201) (A(I,J), J = 1, NP1)
      WRITE (6,202) (X(I), I = 1, N)
C
C     ..... NORMALIZE DIAGONAL ELEMENTS IN EACH ROW .....
      DO 3 I = 1, N
      ASTAR = A(I,I)
      DO 3 J = 1, NP1
    3 A(I,J) = A(I,J)/ASTAR
C
C     ..... BEGIN GAUSS-SEIDEL ITERATIONS .....
      DO 9 ITER = 1, ITMAX
      FLAG = 1
      DO 7 I = 1, N
      XSTAR = X(I)
      X(I) = A(I,NP1)
C
C     ..... FIND NEW SOLUTION VALUE, X(I) .....
      DO 5 J = 1, N
      IF ( I.EQ.J ) GO TO 5
      X(I) = X(I) - A(I,J)*X(J)
    5 CONTINUE
C
C     ..... TEST X(I) FOR CONVERGENCE .....
      IF ( ABS(XSTAR - X(I)).LE.EPS ) GO TO 7
      FLAG = 0
    7 CONTINUE
      IF ( FLAG.NE.1 ) GO TO 9
      WRITE (6,203) ITER, (X(I), I = 1, N)
      GO TO 1
    9 CONTINUE
C
C     ..... REMARK IF METHOD DID NOT CONVERGE .....
      WRITE (6,204) ITMAX, (X(I), I = 1, N)
      GO TO 1
C
C     ..... FORMATS FOR INPUT AND OUTPUT STATEMENTS .....
  100 FORMAT ( 6X, I4, 16X, I4, 14X, F10.6 )
  101 FORMAT ( 10X, 6F10.5 )
  200 FORMAT ( 72H1 SOLUTION OF SIMULTANEOUS LINEAR EQUATIONS BY GAUSS-SE
     1IDEL METHOD, WITH / 1H0, 5X, 9HN      = , I4 /
     2   6X, 9HITMAX  = , I4 / 6X, 9HEPS    = , F10.5 /
     3   44H0THE COEFFICIENT MATRIX A(1,1)...A(N,N+1) IS )
  201 FORMAT ( 1H0, 11F10.5 )
  202 FORMAT ( 36H0 THE STARTING VECTOR X(1)...X(N) IS / (1H0, 10F10.5) )
  203 FORMAT ( 35H0 PROCEDURE CONVERGED, WITH ITER = , I4 /
     1   32H0 SOLUTION VECTOR X(1)...X(N) IS / (1H0, 10F10.5) )
  204 FORMAT ( 16H0 NO CONVERGENCE / 10H0 ITER  = , I4 /
     1   31H0 CURRENT VECTOR X(1)...X(N) IS / (1H0, 10F10.5) )
C
      END
Data

[Data listings not legibly reproduced here.]
Computer Output

Results for the 1st Data Set

SOLUTION OF SIMULTANEOUS LINEAR EQUATIONS BY GAUSS-SEIDEL METHOD, WITH

N      =      4
ITMAX  =     15
EPS    =  0.00020

THE COEFFICIENT MATRIX A(1,1)...A(N,N+1) IS

[not legibly reproduced]

THE STARTING VECTOR X(1)...X(N) IS

[not legibly reproduced]

PROCEDURE CONVERGED, WITH ITER = 12

SOLUTION VECTOR X(1)...X(N) IS

[not legibly reproduced]

Partial Results for the 2nd Data Set (Same Equations as 1st Set)

THE STARTING VECTOR X(1)...X(N) IS

-50.00000  50.00000  50.00000  50.00000

PROCEDURE CONVERGED, WITH ITER = 13

SOLUTION VECTOR X(1)...X(N) IS

  1.00002   2.00000   3.00001  -1.00001

Computer Output (Continued)

Results for the 3rd Data Set

SOLUTION OF SIMULTANEOUS LINEAR EQUATIONS BY GAUSS-SEIDEL METHOD, WITH

N      =      6
ITMAX  =     50
EPS    =  0.00010

THE COEFFICIENT MATRIX A(1,1)...A(N,N+1) IS

[only three of the six rows are legible:]
-1.00000   0.0        0.0        1.00000   -1.00000   0.0      100.00000
 0.0      -1.00000    0.0       -1.00000    1.00000  -1.00000    0.0
 0.0       0.0       -3.00000    0.0       -1.00000   1.00000    0.0

THE STARTING VECTOR X(1)...X(N) IS

 0.0       0.0        0.0        0.0        0.0       0.0

PROCEDURE CONVERGED, WITH ITER = 13

SOLUTION VECTOR X(1)...X(N) IS

38.09517  11.28566   1.76188   38.09518   10.28568   1.76189
Discussion of Results

The first two data sets relate to the simultaneous equations

5x1 + x2 + 3x3 = 16,
x1 + 4x2 + x3 + x4 = 11,
−x1 + 2x2 + 6x3 − 2x4 = 23,
x1 − x2 + x3 + 4x4 = −2,

which have the solution x = [1, 2, 3, −1]'. Even though the second choice of starting vector is much poorer than the first, there is only a marginal difference in the number of iterations required for the specified convergence. The third data set arises from the finite-difference solution of an elliptic partial differential equation; it is, in fact, the system (7.82) with g15 = g16 = 100 and g1 through g14 = 0.

Finally, note that convergence is guaranteed in all three cases, since each coefficient matrix satisfies conditions (5.28).

5.8 Iterative Methods for Nonlinear Equations

Sections 5.8 and 5.9 are concerned with finding the solution, or solutions, of the system

    fi(x) = 0,  1 ≤ i ≤ n,                                             (5.33)

involving n real functions of the n real variables x1, x2, ..., xn. Following the previous notation, x = [x1, x2, ..., xn]', we shall write fi(x) = fi(x1, x2, ..., xn). Here, and in the subsequent development, 1 ≤ i ≤ n. Then let α = [α1, α2, ..., αn]' be a solution of (5.33), that is, let fi(α) = 0.

Let the n functions Fi(x) be such that

    xj = Fj(x)                                                          (5.34)

implies fj(x) = 0, 1 ≤ j ≤ n. Basically, the n equations (5.34) will constitute a suitable rearrangement of the original system (5.33). In particular, let

    αi = Fi(α).                                                         (5.35)

Let the starting vector x0 = [x10, x20, ..., xn0]' be an approximation to α. Define successive new estimates of the solution vector, xk = [x1k, x2k, ..., xnk]', k = 1, 2, ..., by computing the individual elements from the recursion relations

    xik = Fi(x1,k−1, x2,k−1, ..., xn,k−1) = Fi(xk−1).                   (5.36)

Suppose there is a region R describable as |xj − αj| ≤ h, 1 ≤ j ≤ n, and for x in R there is a positive number p, less than one, such that

    Σ (j = 1 to n) |∂Fi(x)/∂xj| ≤ p < 1.                                (5.37)

Then, if the starting vector x0 lies in R, we show that the iterative method expressed by (5.36) converges to a solution of the system (5.33), that is,

    lim (k → ∞) xk = α.                                                 (5.38)

Using the mean-value theorem, the truth of (5.38) is established by first noting from (5.35) and (5.36), that

    xik − αi = Fi(xk−1) − Fi(α) = Σ (j = 1 to n) [∂Fi(ξik)/∂xj](xj,k−1 − αj),   (5.39)

where ξik = α + tik(xk−1 − α), in which 0 < tik < 1. That is,

    |xik − αi| ≤ p max j |xj,k−1 − αj| ≤ ph ≤ h,

showing that the points xk lie in R. Also, by induction, from (5.37) and (5.39),

    |xik − αi| ≤ p max j (|xj,k−1 − αj|) ≤ pᵏh.

Therefore, (5.38) is true, and the procedure converges to a solution of (5.33). Note that if the Fi(x) are linear, we have the Jacobi method, and the sufficient conditions of (5.37) are the same as the second set of sufficient conditions in (5.21).

For the nonlinear equations, there is also a counterpart to the Gauss-Seidel method, previously used in Sec. 5.7 for the linear case. We proceed as before, except that (5.36) is replaced by

    xik = Fi(x1k, x2k, ..., xi−1,k, xi,k−1, ..., xn,k−1).

That is, the most recently computed elements of the solution vector are always used in evaluating the Fi. The proof of convergence according to (5.38) is much the same as for the Jacobi-type iteration. We have

    x1k − α1 = Σ (j = 1 to n) [∂F1(ξ1k)/∂xj](xj,k−1 − αj),

where

    ξ1k = [α1 + t1k(x1,k−1 − α1), ..., αn + t1k(xn,k−1 − αn)]'.

It will appear inductively that the above is true, because the various points concerned remain in R. If ek−1 is the largest of the numbers |xi,k−1 − αi|, then |x1k − α1| ≤ p ek−1 ≤ ek−1 ≤ h. It follows that the same argument applies to the next element, where

    ξ2k = [α1 + t2k(x1k − α1), α2 + t2k(x2,k−1 − α2), ..., αn + t2k(xn,k−1 − αn)]'.

That is, |x2k − α2| ≤ p ek−1 ≤ ek−1 ≤ h, and, in general, |xik − αi| ≤ p ek−1 ≤ ek−1 ≤ h. Therefore, |xik − αi| ≤ pᵏh, and convergence according to (5.38) is again established. Observe that the first of the sufficiency conditions of (5.28) has been reaffirmed under slightly more general circumstances.

Example. To illustrate the above techniques, choose the equations

    f1(x1, x2) = (1/2) sin(x1x2) − x2/(4π) − x1/2 = 0,
    f2(x1, x2) = (1 − 1/(4π))(e^(2x1) − e) + (e/π)x2 − 2e x1 = 0.

5.8 Iterative Methods for Nonlinear Equations 309
Rewrite these equations in a form which is consistent with (5.34),

    x1 = F1(x1, x2),     x2 = F2(x1, x2),

and choose the starting values x10 = 0.4, x20 = 3.0. Within slide-rule accuracy, the Jacobi-type iteration gives, after the first few steps: x13 = 0.505, x23 = 3.14; x14 = 0.500, x24 = 3.14; x15 = 0.500, x25 = π.

Using the same arrangement of the equations in conjunction with the same starting values, iteration of the Gauss-Seidel type gives, similarly, x13 = 0.500, x23 = π; x14 = 0.500, x24 = π, etc.

There is less risk involved in using an approximate slide-rule approach in these iterative calculations than might be supposed. Unlike the exact methods, such as Gaussian elimination for linear equations, there is no inherited round-off error from one step to the next. This follows, since the results at each stage of the iteration can be viewed as a new guess or initial approximation to the solution vector. Substantial error can be tolerated, provided that there is no gross error in the final stages of calculation. These remarks apply to the iterative solution of linear equations as well.
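Both iteration types can be sketched in Python (the book works by slide rule). The book's own displayed rearrangement did not survive this reproduction; the rearrangement below, which solves f1 for x1 and f2 for x2, is one possible choice consistent with (5.34).

```python
from math import sin, exp, e, pi

# One possible rearrangement x_i = F_i(x) of the example system:
#   x1 = sin(x1*x2) - x2/(2*pi)                          (f1 = 0 solved for x1)
#   x2 = pi*(2*x1 - (1 - 1/(4*pi))*(exp(2*x1) - e)/e)    (f2 = 0 solved for x2)
def F1(x1, x2):
    return sin(x1 * x2) - x2 / (2.0 * pi)

def F2(x1, x2):
    return pi * (2.0 * x1 - (1.0 - 1.0 / (4.0 * pi)) * (exp(2.0 * x1) - e) / e)

def jacobi_type(x1, x2, iterations=50):
    # All F_i are evaluated at the previous iterate, as in (5.36).
    for _ in range(iterations):
        x1, x2 = F1(x1, x2), F2(x1, x2)
    return x1, x2

def gauss_seidel_type(x1, x2, iterations=50):
    # Most recently computed elements are used at once (Gauss-Seidel type).
    for _ in range(iterations):
        x1 = F1(x1, x2)
        x2 = F2(x1, x2)   # uses the new x1 immediately
    return x1, x2

x1, x2 = gauss_seidel_type(0.4, 3.0)
```

From the starting values x10 = 0.4, x20 = 3.0 used in the text, both variants settle on x1 = 0.5, x2 = π.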
EXAMPLE 5.4
FLOW IN A PIPE NETWORK
SUCCESSIVE-SUBSTITUTION METHOD

Problem Statement

A network consists of a number of horizontal pipes, of specified diameters and lengths, that are joined at n nodes, numbered i = 1, 2, ..., n. The pressure is specified at some of these nodes. There is at most a single pipe connected directly between any two nodes.

Write a program that will accept information concerning the above, and that will proceed to compute: (a) the pressures at all remaining nodes, and (b) the flow rate (and direction of flow) in each pipe.

Method of Solution

For flow of a liquid from point i to point j in a horizontal pipe, the pressure drop is given by

    pi − pj = fM ρ um² L / (2 gc D).                                    (5.4.1)

Here, fM is the dimensionless Moody friction factor, ρ is the liquid density, um is the mean velocity, and L and D are the length and diameter of the pipe, respectively. Since the volumetric flow rate is Q = (πD²/4)um, equation (5.4.1) becomes

    pi − pj = 8 fM ρ L Q² / (π² gc D⁵).                                 (5.4.2)

Here, all quantities are in consistent units. However, if pi and pj are expressed in psi (lbf/sq in.), ρ in lbm/cu ft, Q in gpm (gallons/min), L in ft, and D in inches, we obtain

    pi − pj = C L Q² / D⁵,                                              (5.4.3)

where

    C = 8(12)⁵ fM ρ / [π² gc (60 × 7.48)² (144)].

Let cij = C Lij / Dij⁵, where the subscripts ij now emphasize that we are concerned with the pipe joining nodes i and j. The flow rate Qij between nodes i and j is then given by (5.4.3), in which Qij is plus or minus for flow from i to j or vice versa, respectively. In the following version, Qij will automatically have the correct sign:

    Qij = (pi − pj) / (cij |pi − pj|)^(1/2).                            (5.4.4)

At any free node j, where the pressure is not specified, the sum of the flows from neighboring nodes i must be zero:

    Σi Qij = Σi (pi − pj) / (cij |pi − pj|)^(1/2) = 0.                  (5.4.5)

When applied at all the free nodes, equation (5.4.5) yields a system of nonlinear simultaneous equations in the unknown pressures. We shall solve this system by the successive-substitution type of method described in Section 5.8. First, note that (pi − pj) is more sensitive than (|pi − pj|)^(1/2) to variations in pj. Thus an appropriate version, analogous to equation (5.34), is

    pj = Σi aij pi / Σi aij,                                            (5.4.6)

in which

    aij = (cij |pi − pj|)^(−1/2).                                       (5.4.7)

Equation (5.4.6) is applied repeatedly at all free nodes until either each computed pressure pj does not change by more than a small amount ε from one iteration to the next, or a preassigned number of iterations, itmax, has been exceeded. The most recently estimated values of pi will always be used in the right-hand side of equation (5.4.6).

In order to implement the above, we also introduce the following:

(a) A vector of logical values, J1, J2, ..., Jn (PGIVEN in the program), such that Jj is true (T) if the pressure is specified at node j, and false (F) if it is not.

(b) A matrix of logical values, I11, ..., Inn (the incidence matrix INCID in the program), such that Iij is true if there is a pipe directly joining nodes i and j, and false if not.

Since the incidence, diameter, and length matrices are symmetric (for example, Dji = Dij), we need supply only the lower triangular portions of such matrices as data. The input data will also include a complete set of pressures, p1, p2, ..., pn; some of these will be the known pressures, and the remainder will be the starting guesses at the free nodes.
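The whole procedure can be sketched in Python rather than the book's FORTRAN; the data are those of the five-node network of Fig. 5.4.1 (nodes 0-indexed here), and the guard against momentarily coincident pressures is an addition not in the book's program.

```python
from math import sqrt, pi

def solve_network(n, incid, c, p, fixed, eps=1e-6, itmax=1000):
    """Repeated application of (5.4.6): at each free node j,
    p_j <- sum(a_ij*p_i)/sum(a_ij), a_ij = (c_ij*|p_i - p_j|)**-0.5,
    always using the most recent pressure estimates."""
    p = list(p)
    for it in range(1, itmax + 1):
        converged = True
        for j in range(n):
            if fixed[j]:
                continue
            num = den = 0.0
            for i in range(n):
                if not incid[i][j]:
                    continue
                dp = abs(p[i] - p[j])
                if dp == 0.0:
                    continue          # guard: a_ij is undefined for equal pressures
                a = 1.0 / sqrt(c[i][j] * dp)
                num += a * p[i]
                den += a
            new = num / den
            if abs(new - p[j]) >= eps:
                converged = False
            p[j] = new
        if converged:
            return p, it
    return p, itmax

# Network of Fig. 5.4.1: f_M = 0.056, rho = 50 lbm/cu ft,
# p1 = 50 psi and p3 = 0 psi fixed (nodes 0 and 2 here).
factor = 8.0 * 12.0**5 * 50.0 * 0.056 / (pi**2 * 32.2 * (60.0 * 7.48)**2 * 144.0)
pipes = {(0, 1): (3.0, 150.0), (1, 2): (3.0, 150.0),    # (D in., L ft)
         (0, 3): (4.0, 100.0), (3, 4): (4.0, 100.0), (4, 1): (4.0, 100.0)}
n = 5
incid = [[False] * n for _ in range(n)]
c = [[0.0] * n for _ in range(n)]
for (i, j), (D, L) in pipes.items():
    incid[i][j] = incid[j][i] = True
    c[i][j] = c[j][i] = factor * L / D**5

p, iters = solve_network(n, incid, c,
                         p=[50.0, 20.0, 0.0, 40.0, 30.0],
                         fixed=[True, False, True, False, False])
```

With the tighter tolerance used here the pressures settle slightly beyond the values reached at ε = 10⁻⁴ in the book's run (p2 → 42.87 psi).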
Example 5.4 Flow in a Pipe Network (Successive-Substitution Method)

Flow Diagram

(Flowchart: read the network description; symmetrize the lower-triangular D, L, and incidence matrices (Dji ← Dij, Iji ← Iij, etc.) and compute the cij from equation (5.4.3); then, for iter = 1, 2, ..., itmax, sweep the free nodes j = 1, 2, ..., n, accumulating num ← num + aij pi and den ← den + aij over the connected nodes i and replacing pj ← num/den; when no pressure changes by more than ε, compute the flow rates Qij and print iter, the pi, and the Qij, i = 1, 2, ..., n.)
312 Systems of Equations

FORTRAN Implementation

List of Principal Variables

Program Symbol    Definition

A                 Matrix, whose elements aij are defined by (5.4.7).
C                 Matrix, whose elements cij relate the flow rate to the pressure drop in the pipe joining nodes i and j.
CONV              Logical variable used in testing for convergence, conv.
D, L              Matrices, whose elements Dij and Lij give the diameter (in.) and length (ft) of the pipe joining nodes i and j.
DENOM, NUMER      Storage for the denominator and numerator of (5.4.6), den and num, respectively.
EPS               Tolerance, ε, used in testing for convergence.
F                 Moody friction factor, fM (assumed constant).
FACTOR            The constant, C, in equation (5.4.3).
I, J              Indices for representing the nodes i and j.
INCID             Matrix of logical values, I; if Iij is T, there is a pipe joining nodes i and j; if F, there is not.
ITER              Counter on the number of iterations, iter.
IPRINT            Logical variable, which must have the value T if intermediate approximations to the pressures are to be printed.
ITMAX             Upper limit on the number of iterations, itmax.
N                 Total number of nodes, n.
P                 Vector of pressures, pi (psi), at each node.
PGIVEN            Vector of logical values, Ji, at each node. If Ji is T, the pressure is specified at node i; if F, it is not.
Q                 Matrix, whose elements Qij give the flow rate (gpm) from node i to node j.
RHO               Density of the liquid in the pipes, ρ (lbm/cu ft).
SAVEP             Temporary storage for old pressure pj during convergence testing.

Program Listing

C       APPLIED NUMERICAL METHODS, EXAMPLE 5.4
C       FLOW IN A PIPE NETWORK - SUCCESSIVE SUBSTITUTION METHOD
C
C       THIS PROGRAM READS A DESCRIPTION OF THE TOPOLOGY OF AN
C       ARBITRARY N NODE PIPE NETWORK WITH PRESSURES SPECIFIED AT
C       PARTICULAR NODES, AND THEN COMPUTES THE PRESSURES AT THE
C       REMAINING NODES AND THE INTER-NODAL FLOW RATES USING A METHOD
C       OF SUCCESSIVE SUBSTITUTIONS.  IF INCID(I,J) IS TRUE, THEN NODES
C       I AND J ARE CONNECTED BY A PIPE SEGMENT OF DIAMETER D(I,J)
C       INCHES AND LENGTH L(I,J) FEET.  IF PGIVEN(I) IS TRUE, THE
C       PRESSURE AT NODE I, P(I) PSI, IS FIXED.  OTHERWISE, P(I) ASSUMES
C       SUCCESSIVE ESTIMATES OF THE PRESSURE AT NODE I.  RHO IS THE
C       FLUID DENSITY IN LB/CU FT AND F THE PIPE FRICTION FACTOR,
C       ASSUMED IDENTICAL FOR ALL PIPE SEGMENTS.  ITER IS THE ITERATION
C       COUNTER.  ITERATION IS STOPPED WHEN ITER EXCEEDS ITMAX OR WHEN
C       NO NODAL PRESSURE CHANGES BY AN AMOUNT GREATER THAN EPS PSI
C       BETWEEN TWO SUCCESSIVE ITERATIONS.  Q(I,J) IS THE FLOW RATE IN
C       GAL/MIN BETWEEN NODES I AND J.  WHEN Q(I,J) IS POSITIVE, FLUID
C       FLOWS FROM NODE I TO NODE J.  THE MATRICES A AND C ARE
C       DESCRIBED IN THE TEXT.  WHEN IPRINT IS TRUE, INTERMEDIATE
C       APPROXIMATIONS OF THE NODAL PRESSURES ARE PRINTED.
C
        LOGICAL IPRINT, PGIVEN, INCID, CONV
        REAL L, NUMER
        DIMENSION P(10), PGIVEN(10), A(10,10), C(10,10), D(10,10),
       1   L(10,10), INCID(10,10), Q(10,10)
C
C       ..... READ DATA .....
   1    READ (5,100) N, ITMAX, RHO, EPS, F, IPRINT, (PGIVEN(I), I=1,N)
        READ (5,101) (P(I), I=1,N)
        DO 2 I=1,N
        READ (5,102) (INCID(I,J), J=1,I)
        READ (5,101) (D(I,J), J=1,I)
   2    READ (5,101) (L(I,J), J=1,I)
C
C       ..... SET UP UPPER TRIANGULAR PARTS OF SYMMETRIC MATRICES D, L,
C       AND INCID AND COMPUTE ELEMENTS OF C MATRIX .....
        FACTOR = 8.*12.**5*RHO*F/(3.1415926**2*32.2*(60.*7.48)**2*144.)
        DO 3 I=1,N
        DO 3 J=1,I
        C(I,J) = 0.
        IF ( I.EQ.J ) GO TO 3
        IF ( INCID(I,J) ) C(I,J) = FACTOR*L(I,J)/D(I,J)**5
        D(J,I) = D(I,J)
        L(J,I) = L(I,J)
        INCID(J,I) = INCID(I,J)
        C(J,I) = C(I,J)
   3    CONTINUE
C
C       ..... PRINT OUT INITIAL INFORMATION ABOUT NETWORK .....
        WRITE (6,200) N,ITMAX,RHO,EPS,F,IPRINT,(I,P(I),PGIVEN(I),I=1,N)
        WRITE (6,201)
        DO 4 I=1,N
   4    WRITE (6,202) I, I, N, (INCID(I,J), J=1,N)
        WRITE (6,201)
        DO 5 I=1,N
   5    WRITE (6,203) I, I, N, (D(I,J), J=1,N)
        WRITE (6,201)
        DO 6 I=1,N
   6    WRITE (6,204) I, I, N, (L(I,J), J=1,N)

Program Listing (Continued)

C
C       ..... COMPUTE SUCCESSIVE ESTIMATES OF PRESSURES AT NODES .....
        IF ( IPRINT ) WRITE (6,205) (I, I=1,N)
        DO 9 ITER = 1, ITMAX
        CONV = .TRUE.
        DO 8 J=1,N
        IF ( PGIVEN(J) ) GO TO 8
        NUMER = 0.
        DENOM = 0.
        DO 7 I=1,N
        IF ( .NOT.INCID(I,J) ) GO TO 7
        A(I,J) = 1.0/SQRT(C(I,J)*ABS(P(I)-P(J)))
        NUMER = NUMER + A(I,J)*P(I)
        DENOM = DENOM + A(I,J)
   7    CONTINUE
        SAVEP = P(J)
        P(J) = NUMER/DENOM
        IF ( ABS(SAVEP-P(J)).GE.EPS ) CONV = .FALSE.
   8    CONTINUE
        IF ( IPRINT ) WRITE (6,206) ITER, (P(I), I=1,N)
        IF ( CONV ) GO TO 10
   9    CONTINUE
        WRITE (6,207) ITMAX
C
C       ..... COMPUTE FLOWS IN INDIVIDUAL NETWORK BRANCHES .....
  10    DO 11 I=1,N
        DO 11 J=1,I
        Q(I,J) = 0.
        Q(J,I) = 0.
        IF ( I.EQ.J .OR. .NOT.INCID(I,J) ) GO TO 11
        Q(I,J) = (P(I)-P(J))/SQRT(C(I,J)*ABS(P(I)-P(J)))
        Q(J,I) = - Q(I,J)
  11    CONTINUE
C
C       ..... PRINT FINAL PRESSURES AND FLOWS .....
        WRITE (6,208) ITER, N
        DO 12 I=1,N
  12    WRITE (6,209) I, P(I), (Q(I,J), J=1,N)
        GO TO 1
C
C       ..... FORMATS FOR INPUT AND OUTPUT STATEMENTS .....
 100    FORMAT( 3X,I2,17X,I3,15X,F5.1,15X,E5.0/
       1   6X,F6.3,14X,L1 / (30X, 20(L1,1X)) )
 101    FORMAT( 30X, 5F8.3 )
 102    FORMAT( 30X, 20(L1,1X) )
 200    FORMAT( 23H1FLOW IN A PIPE NETWORK/ 10H0N       = , I3/
       1   10H ITMAX   = , I3/ 10H RHO     = , F7.3/ 10H EPS     = ,
       2   E10.2/ 10H F       = , F7.3/ 10H IPRINT  = , 2X, L1/
       3   3H0 I, 6X, 4HP(I), 4X, 9HPGIVEN(I)/ (1H , I2, F10.3, 6X, L1) )
 201    FORMAT( 1H0/1H0 )
 202    FORMAT( 7H0INCID(, I2, 13H, 1)...INCID(, I2,1H,,I2, 3H) = ,
       1   40(L1,1X)/ (1H , 29X, 40(L1,1X)) )
 203    FORMAT( 3H0D(, I2, 9H, 1)...D(, I2,1H,,I2, 1H), 9X, 1H=, 8F10.3 /
       1   (1H , 29X, 8F10.3) )
 204    FORMAT( 3H0L(, I2, 9H, 1)...L(, I2,1H,,I2, 1H), 9X, 1H=, 8F10.3 /
       1   (1H , 29X, 8F10.3) )
 205    FORMAT( 1H0/ 5H0ITER, 7X, 16HPRESSURE AT NODE/ (1H ,11X, 8(I1,9X)) )
 206    FORMAT( 1H , I3, 3X, 8F10.4 / (1H , 6X, 8F10.4) )
 207    FORMAT( 35H0SOLUTIONS FAILED TO CONVERGE AFTER, I3, 11H ITERATIONS )
 208    FORMAT( 1H0/26H0PRESSURES AND FLOWS AFTER, I3, 15H ITERATIONS ARE/
       1   3H0 I,5X,4HP(I),7X,16HQ(I,1)...Q(I,, I2,1H) / 1H , 7X, 3HPSI,
       2   14X, 7HGAL/MIN// )
 209    FORMAT( 1H , I2, F10.4, 5X, 8F10.3/ (1H , 17X, 8F10.3) )
C
        END

Program Listing (Continued)


Data

N      =  5        ITMAX  = 100      RHO = 50.0      EPS = 1.E-4
F      =  0.056    IPRINT = T
PGIVEN(1)...PGIVEN(5)    = T F T F F
P(1)...P(5)              =  50.000   20.000    0.000   40.000   30.000
INCID(1,1)               = F
D(1,1)                   =   0.000
L(1,1)                   =   0.000
INCID(2,1)...INCID(2,2)  = T F
D(2,1)...D(2,2)          =   3.000    0.000
L(2,1)...L(2,2)          = 150.000    0.000
INCID(3,1)...INCID(3,3)  = F T F
D(3,1)...D(3,3)          =   0.000    3.000    0.000
L(3,1)...L(3,3)          =   0.000  150.000    0.000
INCID(4,1)...INCID(4,4)  = T F F F
D(4,1)...D(4,4)          =   4.000    0.000    0.000    0.000
L(4,1)...L(4,4)          = 100.000    0.000    0.000    0.000
INCID(5,1)...INCID(5,5)  = F T F T F
D(5,1)...D(5,5)          =   0.000    4.000    0.000    4.000    0.000
L(5,1)...L(5,5)          =   0.000  100.000    0.000  100.000    0.000

Computer Output

FLOW IN A PIPE NETWORK

N      =  5
ITMAX  =  100
RHO    =  50.000
EPS    =  0.10E-03
F      =  0.056
IPRINT =  T

Computer Output (Continued)


L( 1, l)...L( 1, 5 )
L( 2, l ) . . . L ( 2, 5)
L( 3, l ) . . . L ( 3, 5 )
L( 4, l)...L( 4, 5)
L( 5, l ) . . . L ( 5, 5)

ITER        PRESSURE AT NODE


2 3
27.4553 0.0
30.3218 0.0

Computer Output (Continued)


56    50.0000    42.8598
57    50.0000    42.8608
58    50.0000    42.8616
59    50.0000    42.8624
60    50.0000    42.8630
61    50.0000    42.8636
62    50.0000    42.8640
63    50.0000    42.8644
64    50.0000    42.8648
65    50.0000    42.8651
66    50.0000    42.8654
67    50.0000    42.8656
68    50.0000    42.8658
69    50.0000    42.8660
70    50.0000    42.8662
71    50.0000    42.8663
72    50.0000    42.8664
73    50.0000    42.8665
74    50.0000    42.8666

PRESSURES AND FLOWS AFTER 74 ITERATIONS ARE



Discussion of Results

The data used above relate to the network shown in Fig. 5.4.1, with fM = 0.056, ρ = 50 lbm/cu ft, and two pressures fixed: p1 = 50, p3 = 0 psi.

    Figure 5.4.1  Pipe network for example calculation.
    (L12 = L23 = 150 ft; L14 = L45 = L25 = 100 ft; D12 = D23 = 3 in.; D14 = D45 = D25 = 4 in.)

Although the method is computationally straightforward, it needs many iterations to give a reasonable degree of convergence. Also, referring to equation (5.4.7), we can see that a starting guess of pi = pj for any two nodes that are directly connected would be unfortunate.

Note that the bulk of the pressure drop occurs in the pipe 2-3, and that the flow in the branch 1-4-5-2 is appreciably greater than that in the pipe 1-2, even though the latter is much shorter. Both these observations can be reconciled by noting that pressure drop is proportional to Q²/D⁵, and that pipe 2-3 must take the combined flows along 1-4-5-2 and 1-2.

The method can be extended to more complex situations, in which we could allow for (a) fM being a function of Reynolds number and pipe roughness, instead of being treated as a constant, and (b) pumps and valves in some of the branches, etc. Also, the logical arrays used above could find a similar application in solving for the currents in a network of resistors, with known voltages applied at some of the nodes (although this would lead to a set of simultaneous linear equations).
5.9 Newton-Raphson Iteration for Nonlinear Equations

The equations to be solved are again those of (5.33), and we retain the nomenclature of the previous section. The Newton-Raphson process, to be described, is once more iterative in character. We first define

    fij(x) = ∂fi(x)/∂xj.                                                (5.40)

Next define the matrix φ(x) as

    φ(x) = (fij(x)),  1 ≤ i ≤ n,  1 ≤ j ≤ n.                            (5.41)

Thus det(φ(x)) is the Jacobian of the system (5.33) evaluated for the vector x = [x1, x2, ..., xn]'. Now define the vector f(x) as

    f(x) = [f1(x), f2(x), ..., fn(x)]'.                                 (5.42)

With these definitions in mind, and with the starting vector x0 = [x10, x20, ..., xn0]', let

    xk+1 = xk + δk,                                                     (5.43)

where δk is the solution vector for the set of simultaneous linear equations

    φ(xk) δk = −f(xk).                                                  (5.44)

The fundamental theorem concerning convergence is much less restrictive than those of the previous sections. We have the result that if the components of φ(x) are continuous in a neighborhood of a point α such that f(α) = 0, if det(φ(α)) ≠ 0, and if x0 is "near" α, then lim (k → ∞) xk = α.

An outline for a method of proof follows. By (5.42) and (5.44), since fi(α) = 0,

    δk = φ⁻¹(xk)[f(α) − f(xk)].                                         (5.45)

By the mean-value theorem,

    fi(xk) − fi(α) = Σ (j = 1 to n) fij(α + ξik(xk − α))(xjk − αj),

where 0 < ξik < 1. For the ith row of a matrix ψ use [fi1(α + ξik(xk − α)), ..., fin(α + ξik(xk − α))]. Then

    xk+1 − α = xk − α + δk = φ⁻¹(xk)[φ(xk) − ψ](xk − α).

Since the entries in the matrix φ(xk) − ψ are differences of the type fij(xk) − fij(α + ξik(xk − α)), they can be kept uniformly small if the starting vector x0 lies in an initially chosen region R describable as |xi − αi| ≤ h, 1 ≤ i ≤ n. Concurrent with this is the fact that since det(φ(α)) ≠ 0, det(φ(xk)) can be bounded from zero. The net result is that, for 0 < p < 1, |xik − αi| ≤ h pᵏ, 1 ≤ i ≤ n. Thus the sequence {xk} converges to α.

Example. To illustrate the procedure, the equations of the previous section are used, namely

    f1(x1, x2) = (1/2) sin(x1x2) − x2/(4π) − x1/2 = 0,
    f2(x1, x2) = (1 − 1/(4π))(e^(2x1) − e) + (e/π)x2 − 2e x1 = 0.

It is readily seen that

    ∂f1/∂x1 = (x2/2) cos(x1x2) − 1/2,        ∂f1/∂x2 = (x1/2) cos(x1x2) − 1/(4π),
    ∂f2/∂x1 = 2(1 − 1/(4π)) e^(2x1) − 2e,    ∂f2/∂x2 = e/π.

The increments Δx1 and Δx2 in x1 and x2 are determined by

    (∂f1/∂x1) Δx1 + (∂f1/∂x2) Δx2 = −f1,
    (∂f2/∂x1) Δx1 + (∂f2/∂x2) Δx2 = −f2.

Or, writing the determinant D of the coefficient matrix (the Jacobian),

    D = (∂f1/∂x1)(∂f2/∂x2) − (∂f1/∂x2)(∂f2/∂x1),

then

    Δx1 = [−f1 (∂f2/∂x2) + f2 (∂f1/∂x2)] / D,
    Δx2 = [−f2 (∂f1/∂x1) + f1 (∂f2/∂x1)] / D.

For ease in verification, detailed results are tabulated in Table 5.1. Once again, calculations have been carried out by slide rule. The entries −0.0000 designate tiny negative values.

Table 5.1  Newton-Raphson Solution for x0 = [0.4, 3.0]'

Note that despite using the same initial value, this solution differs from that obtained in Section 5.8. However, the starting values x10 = 0.6, x20 = 3.0 do lead to the alternative solution x1 = 0.5, x2 = π. Values are given in Table 5.2.

Table 5.2  Newton-Raphson Solution for x0 = [0.6, 3.0]'
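A Python sketch of the process (the book works by slide rule), using the two equations, their partial derivatives, and the Cramer's-rule increments of this section; the tolerance and iteration limit are assumptions.

```python
from math import sin, cos, exp, e, pi

def f(x1, x2):
    return ((1/2) * sin(x1 * x2) - x2 / (4 * pi) - x1 / 2,
            (1 - 1 / (4 * pi)) * (exp(2 * x1) - e) + e * x2 / pi - 2 * e * x1)

def jac(x1, x2):
    # Entries f_ij = d f_i / d x_j of the matrix phi(x) in (5.41).
    return ((x2 / 2 * cos(x1 * x2) - 1 / 2, x1 / 2 * cos(x1 * x2) - 1 / (4 * pi)),
            (2 * (1 - 1 / (4 * pi)) * exp(2 * x1) - 2 * e, e / pi))

def newton(x1, x2, tol=1e-12, itmax=50):
    for _ in range(itmax):
        f1, f2 = f(x1, x2)
        (a, b), (c, d) = jac(x1, x2)
        det = a * d - b * c                # the Jacobian determinant D
        dx1 = (-f1 * d + f2 * b) / det    # Cramer's rule for phi * delta = -f
        dx2 = (-f2 * a + f1 * c) / det
        x1, x2 = x1 + dx1, x2 + dx2
        if abs(dx1) < tol and abs(dx2) < tol:
            break
    return x1, x2

root_a = newton(0.4, 3.0)   # the solution tabulated in Table 5.1
root_b = newton(0.6, 3.0)   # converges to x1 = 0.5, x2 = pi (Table 5.2)
```

As in the text, the two nearby starting vectors lead to different solutions of the same system.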
EXAMPLE 5.5
CHEMICAL EQUILIBRIUM
NEWTON-RAPHSON METHOD

Problem Statement

The principal reactions in the production of synthesis gas by partial oxidation of methane with oxygen are:

    CH4 + (1/2) O2 ⇌ CO + 2H2,                                          (5.5.1)
    CH4 + H2O ⇌ CO + 3H2,                                               (5.5.2)
    CO2 + H2 ⇌ CO + H2O.                                                (5.5.3)

Write a program that finds the O2/CH4 reactant ratio that will produce an adiabatic equilibrium temperature of 2200°F at an operating pressure of 20 atmospheres, when the reactant gases are preheated to an entering temperature of 1000°F.

Assuming that the gases behave ideally, so that the component activities are identical with component partial pressures, the equilibrium constants at 2200°F for the three equations are respectively:

    K1 = pCO pH2² / (pCH4 pO2^(1/2))   (very large),                    (5.5.4)
    K2 = pCO pH2³ / (pCH4 pH2O) = 1.7837 × 10⁵,                         (5.5.5)
    K3 = pCO pH2O / (pCO2 pH2) = 2.6058.                                (5.5.6)

Here pCO, pCO2, pH2O, pH2, pCH4, and pO2 are the partial pressures of CO (carbon monoxide), CO2 (carbon dioxide), H2O (water vapor), H2 (hydrogen), CH4 (methane), and O2 (oxygen), respectively.

Enthalpies of the various components at 1000°F and 2200°F are listed in Table 5.5.1.

    Table 5.5.1  Component Enthalpies in BTU/lb mole
    Component        1000°F        2200°F

A fourth reaction may also occur at high temperatures:

    C + CO2 ⇌ 2CO.                                                      (5.5.7)

At 2200°F, any carbon formed would be deposited as a solid; the equilibrium constant is given by

    Kc = pCO² / (ac pCO2) = 1329.5,                                     (5.5.8)

where ac is the activity of carbon in the solid state. Do not include reaction (5.5.7) in the equilibrium analysis. After establishing the equilibrium composition, considering only the homogeneous gaseous reactions given by (5.5.1), (5.5.2), and (5.5.3), determine the thermodynamic likelihood that solid carbon would appear as a result of reaction (5.5.7). Assume that the activity of solid carbon is unaffected by pressure and equals unity.

Use the Newton-Raphson method to solve the system of simultaneous nonlinear equations developed as the result of the equilibrium analysis.

Method of Solution

Because of the magnitude of K1, the equilibrium constant for reaction (5.5.1), the first reaction can be assumed to go to completion at 2200°F, that is, virtually no unreacted oxygen will remain in the product gases at equilibrium.

Let the following nomenclature be used:

    x1   mole fraction of CO in the equilibrium mixture
    x2   mole fraction of CO2 in the equilibrium mixture
    x3   mole fraction of H2O in the equilibrium mixture
    x4   mole fraction of H2 in the equilibrium mixture
    x5   mole fraction of CH4 in the equilibrium mixture
    x6   number of moles of O2 per mole of CH4 in the feed gases
    x7   number of moles of product gases in the equilibrium mixture per mole of CH4 in the feed gases.

Then a system of seven simultaneous equations may be generated from three atom balances, an energy balance, a mole fraction constraint, and two equilibrium relations.

Atom conservation balances. The number of atoms of each element entering equals the number of atoms of each element in the equilibrium mixture.

    Oxygen:    x6 = ((1/2)x1 + x2 + (1/2)x3) x7.                        (5.5.9)
    Hydrogen:  4 = (2x3 + 2x4 + 4x5) x7.                                (5.5.10)
    Carbon:    1 = (x1 + x2 + x5) x7.                                   (5.5.11)

Energy (enthalpy) balance. Since the reaction is to be conducted adiabatically, that is, no energy is added to

or removed from the reacting gases, the enthalpy (H) of the reactants must equal the enthalpy of the products:

    H(reactants, 1000°F) = H(products, 2200°F).                         (5.5.12)

Mole fraction constraint.

    x1 + x2 + x3 + x4 + x5 = 1.                                         (5.5.13)

Equilibrium relations.

    P² x1 x4³ / (x3 x5) = 1.7837 × 10⁵,                                 (5.5.14)
    x1 x3 / (x2 x4) = 2.6058.                                           (5.5.15)

The relationships (5.5.14) and (5.5.15) follow directly from (5.5.5) and (5.5.6), respectively, where P is the total pressure and pCO = Px1, etc. In addition, there are five side conditions,

    xi > 0,  i = 1, 2, ..., 5.                                          (5.5.16)

These conditions insure that all mole fractions in the equilibrium mixture are nonnegative, that is, any solution of equations (5.5.9) to (5.5.15) that contains negative mole fractions is physically meaningless. From physical-chemical principles, there is one and only one solution of the equations that satisfies conditions (5.5.16). Any irrelevant solutions may be detected easily.

The seven equations may be rewritten in the form

    fi(x) = 0,                                                          (5.5.17)

where x = [x1 x2 x3 x4 x5 x6 x7]', as follows:

    f1(x) = (1/2)x1 + x2 + (1/2)x3 − x6/x7 = 0                          (5.5.18a)
    f2(x) = 2x3 + 2x4 + 4x5 − 4/x7 = 0                                  (5.5.18b)
    f3(x) = x1 + x2 + x5 − 1/x7 = 0                                     (5.5.18c)
    f4(x) = [energy balance, from (5.5.12) and Table 5.5.1] = 0         (5.5.18d)
    f5(x) = x1 + x2 + x3 + x4 + x5 − 1 = 0                              (5.5.18e)
    f6(x) = P²x1x4³ − 1.7837 × 10⁵ x3x5 = 0                             (5.5.18f)
    f7(x) = x1x3 − 2.6058 x2x4 = 0.                                     (5.5.18g)

The system of simultaneous nonlinear equations has the form of (5.33), and will be solved using the Newton-Raphson method, described in Section 5.9. The partial derivatives of (5.40) may be found by partial differentiation of the seven functions fi(x) with respect to each of the seven variables. For example,

    ∂f1/∂x1 = 1/2,     ∂f1/∂x6 = −1/x7,     ∂f1/∂x7 = x6/x7².

The Newton-Raphson method may be summarized as follows:

1. Choose a starting vector xk = x0 = [x10, x20, ..., xn0]', where x0 is hopefully near a solution α.

2. Solve the system of linear equations (5.44),

    φ(xk) δk = −f(xk),

where φ(xk) = (fij(xk)) = (∂fi(xk)/∂xj) and f(xk) = [f1(xk), f2(xk), ..., fn(xk)]', for the increment vector δk = [δ1k, δ2k, ..., δnk]'.

3. Update the approximation to the root for the next iteration:

    xk+1 = xk + δk.

4. Check for possible convergence to a root α. One such test might be

    |δik| < ε2,  i = 1, 2, ..., n.                                      (5.5.23)

If (5.5.23) is true for all i, then xk+1 is taken to be the root. If test (5.5.23) is failed for any i, then the process is repeated starting with step 2. The iterative process is continued until test (5.5.23) is passed for some k, or when k exceeds some specified upper limit.

In the programs that follow, the elements of the augmented matrix

    A = [φ(xk) | −f(xk)]                                                (5.5.24)

are evaluated by a subroutine named CALCN. The system of linear equations (5.44) is solved by calling on the function SIMUL, described in detail in Example 5.2.

Example 5.5 Chemical Equilibrium (Newton-Raphson Method) 323

The main program is a general one, in that it is not specifically written to solve only the seven equations of interest. By properly defining the subroutine CALCN, the main program could be used to solve any system of n simultaneous nonlinear equations. The main program reads data values for itmax, iprint, n, ε1, ε2, and x10, x20, ..., xn0. Here, itmax is the maximum number of Newton-Raphson iterations, iprint is a variable that controls printing of intermediate output, n is the number of nonlinear equations, ε1 is the minimum pivot magnitude allowed in the Gauss-Jordan reduction algorithm, ε2 is a small positive number used in test (5.5.23), and x10, x20, ..., xn0 are the initial estimates, that is, the elements of x0.
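The energy balance f4 depends on the enthalpy data of Table 5.5.1, which did not survive this reproduction, so a complete re-solve is not possible here. As a sketch, the fragment below instead fixes the feed ratio x6 at the book's answer, 0.5767, and applies Newton-Raphson (with a finite-difference Jacobian in place of the book's analytic CALCN entries, and Gauss-Jordan elimination in place of SIMUL) to the remaining six equations for x1, ..., x5 and x7.

```python
P, X6 = 20.0, 0.5767    # pressure (atm) and O2/CH4 feed ratio (book's answer)

def residuals(v):
    x1, x2, x3, x4, x5, x7 = v
    return [0.5 * x1 + x2 + 0.5 * x3 - X6 / x7,      # oxygen balance (5.5.9)
            2 * x3 + 2 * x4 + 4 * x5 - 4.0 / x7,     # hydrogen balance (5.5.10)
            x1 + x2 + x5 - 1.0 / x7,                 # carbon balance (5.5.11)
            x1 + x2 + x3 + x4 + x5 - 1.0,            # mole fractions (5.5.13)
            P**2 * x1 * x4**3 - 1.7837e5 * x3 * x5,  # equilibrium (5.5.18f)
            x1 * x3 - 2.6058 * x2 * x4]              # equilibrium (5.5.18g)

def newton(v, tol=1e-9, itmax=50, h=1e-7):
    n = len(v)
    for _ in range(itmax):
        F = residuals(v)
        # Finite-difference Jacobian: J[j][i] approximates dF_i/dx_j.
        J = []
        for j in range(n):
            vp = list(v)
            vp[j] += h
            Fp = residuals(vp)
            J.append([(Fp[i] - F[i]) / h for i in range(n)])
        # Augmented matrix [phi | -F], as in (5.5.24).
        A = [[J[j][i] for j in range(n)] + [-F[i]] for i in range(n)]
        # Gauss-Jordan elimination with partial pivoting for the increments.
        for col in range(n):
            piv = max(range(col, n), key=lambda r: abs(A[r][col]))
            A[col], A[piv] = A[piv], A[col]
            d = A[col][col]
            A[col] = [a / d for a in A[col]]
            for r in range(n):
                if r != col and A[r][col] != 0.0:
                    m = A[r][col]
                    A[r] = [a - m * b for a, b in zip(A[r], A[col])]
        delta = [A[i][n] for i in range(n)]
        v = [vi + di for vi, di in zip(v, delta)]
        if max(abs(d) for d in delta) < tol:
            break
    return v

x = newton([0.32, 0.009, 0.05, 0.62, 0.004, 3.0])
```

The converged values reproduce the surviving columns of the first data set's output (x1 near 0.3229, x2 near 0.00922, x5 near 0.00372), with all mole fractions positive as required by (5.5.16).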

Flow Diagram

Main Program

(Flowchart: read itmax, iprint, n, ε1, ε2, and the starting vector x0. For k = 1, 2, ..., itmax: evaluate the elements aij of the augmented matrix A (see (5.5.24)) in subroutine CALCN; solve the system of linear equations for the increments δ1k, δ2k, ..., δnk and the determinant d in function SIMUL; update xk+1 = xk + δk; if |δik| < ε2 for all i, print the converged solution, otherwise continue iterating.)

Subroutine CALCN (Arguments: x, A, n)

(Calculates the elements aij, i = 1, 2, ..., n, j = 1, 2, ..., n + 1, of the matrix A (see (5.5.24)).)

FORTRAN Implementation

List of Principal Variables

Program Symbol    Definition

(Main)
A                 Augmented matrix of coefficients, A (see (5.5.24)).
DETER             d, determinant of the matrix φ (the Jacobian).
EPS1              ε1, minimum pivot magnitude permitted in subroutine SIMUL.
EPS2              ε2, small positive number, used in convergence test (5.5.23).
I                 Subscript, i.
IPRINT            Print control variable, iprint. If iprint = 1, intermediate solutions are printed after each iteration.
ITCON             Used in convergence test (5.5.23). ITCON = 1 if (5.5.23) is passed for all i, i = 1, 2, ..., n; otherwise ITCON = 0.
ITER              Iteration counter, k.
ITMAX             Maximum number of iterations permitted, itmax.
N                 Number of nonlinear equations, n.
XINC              Vector of increments, δk.
XOLD              Vector of approximations to the solution, xk.
SIMUL             Function developed in Example 5.2. Solves the system of n linear equations (5.44) for the increments, δik, i = 1, 2, ..., n.

(Subroutine CALCN)
DXOLD             Same as XOLD. Used to avoid an excessive number of references to subroutine arguments in CALCN.
I, J              i and j, row and column subscripts, respectively.
NRC               N, dimension of the matrix A in the calling program. A is assumed to have the same number of rows and columns.
P                 Pressure, P, atm.

Program Listing

Main Program

C       APPLIED NUMERICAL METHODS, EXAMPLE 5.5
C       CHEMICAL EQUILIBRIUM - NEWTON-RAPHSON METHOD
C
C       THIS PROGRAM SOLVES N SIMULTANEOUS NON-LINEAR EQUATIONS
C       IN N UNKNOWNS BY THE NEWTON-RAPHSON ITERATIVE PROCEDURE.
C       INITIAL GUESSES FOR VALUES OF THE UNKNOWNS ARE READ INTO
C       XOLD(1)...XOLD(N).  THE PROGRAM FIRST CALLS ON THE SUBROUTINE
C       CALCN TO COMPUTE THE ELEMENTS OF A, THE AUGMENTED MATRIX OF
C       PARTIAL DERIVATIVES, THEN ON FUNCTION SIMUL (SEE EXAMPLE 5.2)
C       TO SOLVE THE GENERATED SET OF LINEAR EQUATIONS FOR THE CHANGES
C       IN THE SOLUTION VALUES XINC(1)...XINC(N).  DETER IS THE
C       JACOBIAN COMPUTED BY SIMUL.  THE SOLUTIONS ARE UPDATED AND THE
C       PROCESS CONTINUED UNTIL ITER, THE NUMBER OF ITERATIONS,
C       EXCEEDS ITMAX OR UNTIL THE CHANGE IN EACH OF THE N VARIABLES
C       IS SMALLER IN MAGNITUDE THAN EPS2 (ITCON = 1 UNDER THESE
C       CONDITIONS).  EPS1 IS THE MINIMUM PIVOT MAGNITUDE PERMITTED
C       IN SIMUL.  WHEN IPRINT = 1, INTERMEDIATE SOLUTION VALUES ARE
C       PRINTED AFTER EACH ITERATION.
C
        DIMENSION XOLD(21), XINC(21), A(21,21)
C
C       ..... READ AND PRINT DATA .....
   1    READ (5,100) ITMAX,IPRINT,N,EPS1,EPS2,(XOLD(I), I=1,N)
        WRITE (6,200) ITMAX,IPRINT,N,EPS1,EPS2,N,(XOLD(I), I=1,N)
C
C       ..... NEWTON-RAPHSON ITERATION .....
        DO 9 ITER = 1, ITMAX
C
C       ..... CALL ON CALCN TO SET UP THE A MATRIX .....
        CALL CALCN( XOLD, A, 21 )
C
C       ..... CALL SIMUL TO COMPUTE JACOBIAN AND CORRECTIONS IN XINC .....
        DETER = SIMUL( N, A, XINC, EPS1, 1, 21 )
        IF ( DETER.NE.0. ) GO TO 3
        WRITE (6,201)
        GO TO 1
C
C       ..... CHECK FOR CONVERGENCE AND UPDATE XOLD VALUES .....
   3    ITCON = 1
        DO 5 I = 1,N
        IF ( ABS(XINC(I)).GT.EPS2 ) ITCON = 0
   5    XOLD(I) = XOLD(I) + XINC(I)
        IF ( IPRINT.EQ.1 ) WRITE (6,202) ITER,DETER,N,(XOLD(I), I=1,N)
        IF ( ITCON.EQ.0 ) GO TO 9
        WRITE (6,203) ITER,N,(XOLD(I),I=1,N)
        GO TO 1
   9    CONTINUE
C
        WRITE (6,204)
        GO TO 1
C
C       ..... FORMATS FOR INPUT AND OUTPUT STATEMENTS .....
 100    FORMAT ( 10X, I3,17X, I1,19X, I3/ 10X,E7.1,13X,E7.1/
       1         (20X, 5F10.3) )
 200    FORMAT ( 10H1ITMAX  = , I3/ 10H IPRINT = , I8/ 10H N      = ,
       1         I8/ 10H EPS1   = , 1PE14.1/ 10H EPS2   = , 1PE14.1/
       2         26H0    XOLD(1)...XOLD(, I2, 1H)/ 1H / (1H , 1P4E16.6) )
 201    FORMAT ( 38H0MATRIX IS ILL-CONDITIONED OR SINGULAR )
 202    FORMAT ( 10H0ITER   = , I3/ 10H DETER  = , E18.5/
       2         26H     XOLD(1)...XOLD(, I2, 1H)/ (1H ,1P4E16.6) )
 203    FORMAT ( 24H0SUCCESSFUL CONVERGENCE / 10H0ITER   = , I3/
       2         26H0    XOLD(1)...XOLD(, I2, 1H)/ 1H / (1H , 1P4E16.6) )
 204    FORMAT ( 15H NO CONVERGENCE )
C
        END

Program Listing (Continued)

Subroutine CALCN

        SUBROUTINE CALCN( DXOLD, A, NRC )
C
C       THIS SUBROUTINE SETS UP THE AUGMENTED MATRIX OF PARTIAL
C       DERIVATIVES REQUIRED FOR THE SOLUTION OF THE NON-LINEAR
C       EQUATIONS WHICH DESCRIBE THE EQUILIBRIUM CONCENTRATIONS
C       OF CHEMICAL CONSTITUENTS RESULTING FROM PARTIAL OXIDATION
C       OF METHANE WITH OXYGEN TO PRODUCE SYNTHESIS GAS.  THE PRESSURE
C       IS 20 ATMOSPHERES.  SEE TEXT FOR MEANINGS OF XOLD(1)...XOLD(N)
C       AND A LISTING OF THE EQUATIONS.  DXOLD HAS BEEN USED AS THE
C       DUMMY ARGUMENT FOR XOLD TO AVOID AN EXCESSIVE NUMBER OF
C       REFERENCES TO ELEMENTS IN THE ARGUMENT LIST.
C
        DIMENSION XOLD(20), DXOLD(NRC), A(NRC,NRC)
C
        DATA P / 20. /
C
C       ..... SHIFT ELEMENTS OF DXOLD TO XOLD AND CLEAR A ARRAY .....
        DO 1 I = 1,7
        XOLD(I) = DXOLD(I)
        DO 1 J = 1,8
   1    A(I,J) = 0.
C
C       ..... COMPUTE NON-ZERO ELEMENTS OF A .....
        A(1,1) = 0.5
        A(5,5)
        A(5,8)
        A(6,1)
        A(6,3)
        A(6,4)
        A(6,5)
        A(6,8)
        A(7,1)
        A(7,2)
        A(7,3)
        A(7,4)
        A(7,8)
        RETURN
C
        END

Program Listing (Continued)


Data

ITMAX   =  50        IPRINT = 1       N = 7
EPS1    =  1.0E-10   EPS2   = 1.0E-05
XOLD(1)...XOLD(5)  =  0.500   0.000   0.000   0.500   0.000
XOLD(6)...XOLD(7)  =  0.500   2.000

ITMAX   =  50        IPRINT = 0       N = 7
EPS1    =  1.0E-10   EPS2   = 1.0E-05
XOLD(1)...XOLD(5)  =  0.200   0.200   0.200   0.200   0.200
XOLD(6)...XOLD(7)  =  0.500   2.000

ITMAX   =  50        IPRINT = 1       N = 7
EPS1    =  1.0E-10   EPS2   = 1.0E-05
XOLD(1)...XOLD(5)  =  0.220   0.075   0.001   0.580   0.125
XOLD(6)...XOLD(7)  =  0.436   2.350

Computer Output

Results for the 1st Data Set

ITMAX  = 50
IPRINT = 1
N      = 7
EPS1   = 1.0E-10
EPS2   = 1.0E-05

ITER   = 1
DETER  = -0.97077E 07
XOLD(1)...XOLD( 7)
  2.210175E-01    2.592762E-02
  2.591652E-01    3.343250E-01

ITER   = 2
DETER  = -0.10221E 10
XOLD(1)...XOLD( 7)
  3.101482E-01    7.142063E-03
  4.812878E-02    4.681466E-01

ITER   = 3
DETER  = -0.41151E 09
XOLD(1)...XOLD( 7)
  3.202849E-01    9.554777E-03
  1.048106E-02    5.533223E-01

ITER   = 4
DETER  = -0.22807E 09

ITER   = 5
DETER  = -0.20218E 09
XOLD(1)...XOLD( 7)
  3.228708E-01    9.223551E-03
  3.716873E-03    5.767141E-01

ITER   = 6
DETER  = -0.20134E 09
XOLD(1)...XOLD( 7)
  3.228708E-01    9.223547E-03
  3.716847E-03    5.767153E-01
Systems of Equations

Computer Output (Continued)


SUCCESSFUL CONVERGENCE

ITER = 6

XOLD(1)...XOLD( 7)

Results for the 3rd Data Set

ITMAX  = 50
IPRINT = 1
N      = 7
EPS1   = 1.0E-10
EPS2   = 1.0E-05

ITER = 1
DETER = -0.61808E 08
XOLD(1)...XOLD( 7)
 6.951495E-01  -8.022028E-02   1.272939E-02   1.217132E 00
-8.447912E-01   1.314754E 00   5.969404E 00

ITER = 2
DETER =  0.12576E 09

ITER = 3
DETER =  0.77199E 07
XOLD(1)...XOLD( 7)
 4.559822E-01  -9.799302E-04  -7.583648E-04   9.107630E-01
-3.650070E-01   2.509821E 00   1.107038E 01

ITER = 4
DETER =  0.53378E 07
XOLD(1)...XOLD( 7)
 4.569673E-01  -4.071472E-04  -2.142648E-03   9.152630E-01
-3.696806E-01   2.608933E 00   1.149338E 01

ITER = 5
DETER =  0.49739E 07
XOLD(1)...XOLD( 7)
 4.569306E-01  -4.071994E-04  -2.125205E-03   9.151721E-01
-3.695704E-01   2.610552E 00   1.150046E 01

ITER = 6
DETER =  0.49611E 07
XOLD(1)...XOLD( 7)
 4.569306E-01  -4.071984E-04  -2.125199E-03   9.151720E-01
-3.695703E-01   2.610549E 00   1.150045E 01

SUCCESSFUL CONVERGENCE

ITER = 6
Discussion of Results

Results are shown for the first and third data sets only. For the first two data sets, the Newton-Raphson iteration converged to the same solution, one that satisfies the side conditions (5.5.16). Results for the third data set cannot be physically meaningful, because the solution has negative mole fractions for CO2, H2O, and CH4. The equilibrium compositions, reactant ratio O2/CH4 in the feed gases, and the total number of moles of product per mole of CH4 in the feed are tabulated in Table 5.5.2. Thus the required feed ratio is 0.5767 moles of oxygen per mole of methane in the feed gases.

Table 5.5.2  Equilibrium Gas Mixture

x1, mole fraction CO          0.322871
x2, mole fraction CO2         0.009224
x3, mole fraction H2O         0.046017
x4, mole fraction H2          0.618172
x5, mole fraction CH4         0.003717
x6, feed ratio O2/CH4         0.576715
x7, total moles of product    2.977863

To establish if carbon is likely to be formed according to reaction (5.5.7) at 2200°F for a gas of the computed composition, it is necessary to calculate the magnitude of K. If K is larger than K4 from (5.5.8), then there will be a tendency for reaction (5.5.7) to shift toward the left; carbon will be formed. Assuming that aC = 1, the computed K does not exceed K4. Therefore there will be no tendency for carbon to form.
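The overall iteration of Example 5.5 (solve J dx = -f for the corrections, update the trial vector, test the corrections against a tolerance such as EPS2, and give up after ITMAX passes) can be paraphrased as follows for a two-equation toy system. The system, names, and starting point are illustrative only, not the equilibrium equations:

```python
def newton_raphson(f, jac, x, itmax=50, eps=1e-10):
    """Minimal Newton-Raphson loop for a 2x2 system (a toy sketch,
    not the seven equilibrium equations of Example 5.5)."""
    for it in range(1, itmax + 1):
        f1, f2 = f(x)
        (a, b), (c, d) = jac(x)
        deter = a * d - b * c               # plays the role of DETER
        dx1 = (-f1 * d + f2 * b) / deter    # Cramer's rule on J dx = -f
        dx2 = (-f2 * a + f1 * c) / deter
        x = [x[0] + dx1, x[1] + dx2]
        if abs(dx1) < eps and abs(dx2) < eps:
            return x, it                    # successful convergence
    return x, itmax

# Toy system: x1**2 + x2 = 3,  x1 + x2**2 = 5  (a root at x1 = 1, x2 = 2)
f = lambda x: (x[0]**2 + x[1] - 3.0, x[0] + x[1]**2 - 5.0)
jac = lambda x: ((2.0 * x[0], 1.0), (1.0, 2.0 * x[1]))
x, it = newton_raphson(f, jac, [1.5, 1.5])
```

As in the computer output above, the corrections shrink quadratically once the iterate is close to a root, so convergence to the tolerance takes only a handful of iterations.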

Problems

5.1  Write the following simultaneous equations in matrix form:

By means of a hand calculation, implement the Gauss-Jordan reduction scheme with the diagonal pivot criterion to compute:
(a) x1 and x2.
(b) The inverse of the coefficient matrix.
(c) The determinant of the coefficient matrix.

5.2  Solve the following three equations using the Gauss-Jordan reduction scheme with the maximum pivot criterion:

Do not interchange rows or columns. Show the status of the augmented matrix after each complete pass of the elimination procedure.

5.3  Define the norm of an n x n real matrix A by

    ||A|| = max ||Ax||/||x||,    taken over all ||x|| ≠ 0.

Alternatively, this may be written as

    ||A|| = max ||Ax||,    taken over all ||x|| = 1.

Thus geometrically, ||A|| may be viewed as the length of the longest vector in the image set {Ax} of all possible unit vectors {x} (the unit sphere) under the transformation x → Ax. As a consequence of the definition,

    ||Ax|| ≤ ||A|| ||x||.

Now consider the problem of solving the equations (5.2),

    Bx = u,

when the elements of B are known exactly, but the elements of u are not known with certainty. In particular, suppose that a vector u + δu is used as the right-hand side vector (presumably the elements of δu would be small, so that u + δu is fairly close to u itself). If x + δx is the solution of the system

    B(x + δx) = u + δu,

show that the relative uncertainty in x introduced by the uncertainty in u is given by

    ||δx||/||x|| ≤ cond(B) ||δu||/||u||.

Here, cond(B), the condition number [14], is given by

    cond(B) = ||B|| ||B⁻¹||.

If cond(B) is small (of order unity), then B is said to be well conditioned, while if cond(B) is large, then B is said to be poorly conditioned or ill conditioned.

5.4  As a continuation of Problem 5.3, assume that the elements of the right-hand side vector u are known exactly, but that the elements of B are not known with certainty. Let the approximation of B be denoted as B + δB, with the remaining notation the same as before. Show that

    ||δx||/||x + δx|| ≤ cond(B) ||δB||/||B||.

5.5  Let x1 be the solution to the system of linear equations

    Bx = u

found by one of the elimination algorithms. Assuming that some round-off errors are present in x1, let the error vector be defined by

    ε1 = x - x1,

and the residual vector r1 be defined by

    r1 = u - Bx1.

Show that if the system of equations

    Bδ1 = r1

could be solved exactly, then the solution

    x2 = x1 + δ1

would solve the original system Bx = u, exactly.

5.6  In practice, the solution vector δ1 of Problem 5.5 will not be computed exactly, and x2 will be taken as the next iterate in the solution scheme

    εk = x - xk,
    rk = u - Bxk,
    Bδk = rk,
    xk+1 = xk + δk.

Forsythe and Moler [14] point out that the residuals rk must generally be computed with a higher precision than that used for the rest of the calculations. The iterative improvement algorithm may be terminated when ||δk|| is sufficiently small (typically when the single-precision results for xk stabilize).

Consider the equations

Assuming that at some stage in the calculations

carry out one iteration of the correction scheme to find xk+1, retaining six significant digits in the computation of rk, and three significant digits elsewhere.

5.7  How would you modify the programs of Examples 5.1 and 5.2 to accommodate the iterative improvement of the solution vector outlined in Problems 5.5 and 5.6?
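The correction scheme of Problems 5.5 and 5.6 can be sketched as follows for a 2 x 2 system. Exact rational arithmetic stands in for the higher-precision residual computation that Forsythe and Moler recommend; the matrix and starting vector are illustrative assumptions, not the text's data:

```python
from fractions import Fraction

def iterative_improvement(B, u, x, sweeps=3):
    """Iterative improvement for a 2x2 system: r = u - B*x is formed
    in exact rational arithmetic (playing the role of the double-
    precision residual), B*d = r is solved by Cramer's rule, and the
    iterate is updated as x <- x + d."""
    (a, b), (c, d) = [[Fraction(v) for v in row] for row in B]
    det = a * d - b * c
    x = [Fraction(v) for v in x]
    for _ in range(sweeps):
        r1 = Fraction(u[0]) - a * x[0] - b * x[1]   # residual r_k
        r2 = Fraction(u[1]) - c * x[0] - d * x[1]
        d1 = (r1 * d - b * r2) / det                # correction d_k
        d2 = (a * r2 - c * r1) / det
        x = [x[0] + d1, x[1] + d2]
    return [float(v) for v in x]
```

Because the inner solve here is exact, a single sweep already recovers the true solution from any rounded starting vector, which is the point of Problem 5.5.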
5.8  Assume that an approximate inverse, Qk, of matrix B is available. Using techniques analogous to those outlined in Problems 5.5 and 5.6, show that if a residual matrix Rk is defined to be

    Rk = I - BQk,

then the next iterate should be

    Qk+1 = Qk(I + Rk),

and that a general recursion formula can be used to calculate it.

How would the function SIMUL of Example 5.2 have to be modified to employ this iterative improvement algorithm for evaluation of B⁻¹? Make the changes, and test the program thoroughly. Among the test matrices, include the Hilbert matrix, with elements hij = 1/(i + j - 1), for n = 4, 5, and 6. The Hilbert matrix is very ill conditioned, even for small n [15], and is often used as a test matrix for demonstrating the effectiveness of inversion routines.

5.9  Write a function, named DPOLY, that returns as its value the derivative defined in Problem 2.51. The routine should have the argument list

    (N, X, Y, M, I, A, Q, NQROW, NQCOL, TRUBL),

where N and M are equivalent to n and m, respectively, X is an array containing x0, ..., xn in X(1), ..., X(N+1), Y is an array containing the functional values f(x0), ..., f(xn) in Y(1), ..., Y(N+1), and I is the subscript of the desired argument, X(I) (note that I = i + 1). Q is a matrix dimensioned to have NQROW rows and NQCOL columns in the calling program (to insure a compatible storage arrangement). DPOLY should set up the augmented matrix of coefficients of the simultaneous linear equations whose solutions are the aj, j = 0, ..., n, in the first N+1 rows and N+2 columns of the matrix Q (see Problem 2.51) and call on an appropriate linear equation solver (such as the function SIMUL of Example 5.2) to evaluate the aj. DPOLY should store a0, ..., an in A(1), ..., A(N+1), evaluate the appropriate derivative at the specified argument, and return to the calling program. TRUBL (logical) should be set to .FALSE. if no computational problems are encountered and to .TRUE. otherwise.

Write a short main program that reads and prints essential data, calls upon DPOLY to evaluate the required derivative and coefficients A(1), ..., A(N+1), prints the results, and returns to read more data. As trial problems, generate the formulas developed in Problems 2.49 and 2.50.

5.10  Jenkins and White [8] give precisely determined values, shown in Table P5.10, for the refractive index, n, at various wavelengths, λ (angstrom units), for borosilicate crown glass.

Table P5.10

Use the 2nd, 4th, and 7th entries in Table P5.10 to determine the constants A, B, and C in Cauchy's equation for refractive index:

    n = A + B/λ² + C/λ⁴.

Investigate the merits of Cauchy's equation for predicting the remaining entries in the table.

5.11  When a pure sample gas is bombarded by low-energy electrons in a mass spectrometer, the galvanometer shows peak heights that correspond to individual m/e (mass-to-charge) ratios for the resulting mixture of ions. For the ith peak produced by a pure sample j, one can then assign a sensitivity sij (peak height per micron of Hg sample pressure). These coefficients are unique for each type of gas.

A distribution of peak heights may also be obtained for an n-component gas mixture that is to be analyzed for the partial pressures p1, p2, ..., pn of each of its constituents. The height hi of a certain peak is a linear combination of the products of the individual sensitivities and partial pressures:

    Σ (j = 1 to n)  sij pj = hi.

In general, more than n peaks may be available. However, if the n most distinct ones are chosen, we have i = 1, 2, ..., n, so that the individual partial pressures are given by the solution of n simultaneous linear equations.

Write a program that will accept values for the number of components N, the sensitivities S(1,1) ... S(N,N), and the peak heights H(1) ... H(N). The program should then compute and print values for the individual partial pressures P(1) ... P(N). An elimination procedure should be used in the computations.

Suggested Test Data

The sensitivities given in Table P5.11 were reported by Carnahan [9], in connection with the analysis of a hydrocarbon gas mixture. A particular gas mixture (sample #39G) produced the following peak heights: h1 = 17.1, h2 = 65.1, h3 = 186.0, h4 = 82.7, h5 = 84.2, h6 = 63.7, and h7 = 119.7. The measured total pressure of the mixture was 39.9 microns of Hg, which can be compared with the sum of the computed partial pressures.

Table P5.11

                                   Component Index, j
Peak
Index   m/e      1         2          3         4          5          6          7
  i           Hydrogen  Methane  Ethylene   Ethane  Propylene   Propane   n-Pentane
5.12  Could Problem 5.11 be solved by the Gauss-Seidel method?

5.13  Here and in Problems 5.14 and 5.15 we consider the loading of statically indeterminate pin-jointed frames. All supports are rigid, and the frames fit perfectly when unloaded. The following notation is used consistently for members: E = Young's modulus, L = length, A = cross-sectional area, and I = second moment of area about the neutral axis.

Let p = vector (column) of applied loads, δ = corresponding displacement vector of deflections (one for each applied load), s = vector of resulting internal loads in members (including redundants; see below). Tensions and extensions are considered positive; compressions and contractions negative. Let F be a diagonal matrix with entries L/AE for individual members (F is the unassembled flexibility matrix), and let Γ be the flexibility matrix for the assembled structure, such that δ = Γp. For the frame, choose arbitrarily a sufficient number of redundant members (whose internal loads are given by the vector r, which will also be contained as part of s), such that without them the structure would be statically determinate. Let B0 be a matrix such that s = B0p would give the internal loads in the absence of the redundant members, and let B1 be a matrix such that s = B1r would give the internal loads in the absence of any applied loads; combined, the effect is s = B0p + B1r. It may then be shown (see McMinn [10], Robinson [11], or Problem 4.18, for example) that

    s = Bp,

where

    B = B0 - B1(B1'FB1)⁻¹B1'FB0.

Consider the frame shown in Fig. P5.13, with AE = 50 x 10⁶ lbf for all members. Number the members 1 through 5; these numbers will then identify the rows of the F, B0, and B1 matrices as corresponding to the individual members. Choosing AC as the redundant member, we have

Figure P5.13

Note that the successive columns of B0 and B1 can be found easily by considering from statics alone the internal forces that result from unit loads applied in turn (a) horizontally at B, (b) vertically at C, and (c) along the redundant member. The number of columns in B0 equals the number of applied loads; the number in B1 equals the number of redundant members (here, just one). The ordering of rows and columns in the above matrices is arbitrary but must be self-consistent. For example, if the applied load vector were taken as p = [1000, 500]' instead of [500, 1000]', the columns of B0 would have to be interchanged.

Determine the displacement and internal load vectors δ and s for the frame.

5.14  Following in the style of Problem 5.13, choose an appropriate set of redundant members and evaluate p, F, B0, and B1 for the pin-jointed frame shown in Fig. P5.14. Assume AE = 50 x 10⁶ lbf throughout.
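The matrix relation B = B0 - B1(B1'FB1)⁻¹B1'FB0 of Problem 5.13 can be sketched directly for the common case of a single redundant member, where B1 is one column and (B1'FB1) reduces to a scalar. All numbers below are hypothetical, not the frame of Fig. P5.13:

```python
def force_method_B(F, B0, B1):
    """Compute B = B0 - B1 (B1' F B1)^{-1} B1' F B0 for one redundant
    member.  F holds the diagonal of the unassembled flexibility matrix,
    B0 has one column per applied load, B1 is a single column."""
    m = len(F)
    nloads = len(B0[0])
    s = sum(B1[i] * F[i] * B1[i] for i in range(m))       # B1' F B1 (scalar)
    t = [sum(B1[i] * F[i] * B0[i][j] for i in range(m))   # B1' F B0 (row)
         for j in range(nloads)]
    return [[B0[i][j] - B1[i] * t[j] / s for j in range(nloads)]
            for i in range(m)]

# Hypothetical 3-member, 2-load illustration with unit flexibilities.
B = force_method_B([1.0, 1.0, 1.0],
                   [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]],
                   [1.0, 1.0, 1.0])
```

The internal loads then follow as s = Bp, and the assembled flexibility matrix needed for the deflections can be built from B and F in the same spirit.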
Y2 = 500 lbf    Y3 = 500 lbf
Figure P5.14

5.15  Write a computer program that will simulate the loading of a statically indeterminate pin-jointed frame, as discussed in Problem 5.13. The input should include p, F, B0, and B1. The output should include B, Γ, s, and δ. Test the program with the data of Problems 5.13 and 5.14.

5.16  Horizontal and vertical loads X and Y, and a moment M, are applied to the cantilever of length L shown in Fig. P5.16.

Figure P5.16

At the free end, the horizontal extension, the downward deflection, and the slope φ (approximately equal to the angle with the horizontal for small φ) are related to the loads as follows:

Here, E = Young's modulus, A = cross-sectional area, and I = second moment of area for the cantilever. The above 3 x 3 matrix is the flexibility matrix for the cantilever. Evaluate K = Γ⁻¹, the stiffness matrix for the cantilever.

5.17  In the pin-jointed frames of Problems 5.13 and 5.14 the members were subjected to axial loads only. However, the method outlined in Problem 5.13 may be applied to more general structures, in which shears and bending moments also exist. The following modifications are necessary: (a) the applied and internal load vectors must now include shears and bending moments in addition to axial loads, and (b) the individual diagonal elements of the F matrix (previously L/AE for each member) must be replaced by the appropriate flexibility matrix for each member (such as the 3 x 3 matrix of Problem 5.16).

As a simple example, consider the beam of Fig. P5.17a, clamped at A and pin-jointed at C. Let s = [s1 s2 s3 s4 s5]', where the internal loads (one, s3, is a moment) are indicated in Fig. P5.17b. The unassembled flexibility matrix is

Figure P5.17a

Figure P5.17b

Choosing s4 and s5 as the redundant loads, write down p, B0, and B1, and use the computer program developed in Problem 5.15 to solve for the resulting internal loads and the deflection at B. Assume E = 30 x 10⁶ lbf/in.², A = 3.53 in.², and I = 20.99 in.⁴ throughout (appropriate for a 6 x 3 steel I beam), with L1 = 10 ft, L2 = 5 ft, and Y = 1000 lbf. Comment on the computed values of s4 and s5.

5.18  The portal frame shown in Fig. P5.18 is clamped rigidly at A and E, and B and D are rigid corners. E = 30 x 10⁶ lbf/in.² throughout, A = 5.3 in.² and I = 55.63 in.⁴ for AB and DE, and A = 11.77 in.² and I = 204.8 in.⁴ for BCD. Compute all the internal loads and the displacements of B and C.

5.19  Consider the frame of Problem 5.14. By removing the diagonal members that slant up to the right, and by substituting a horizontal roller joint for the right-hand pin-jointed support, the problem becomes statically determinate.
In this event (with the same loads) compute the new internal loads in the members and the reactions at the two supports.

5.20  This problem is "open-ended," in that the degree of complexity possible depends on the skill and imagination of the programmer, and his familiarity with structural problems. The ultimate goal is to have the computer design a structure so that no member is oversized. Ideally, input to the program would consist of basic information concerning the configuration of the structure and the loads to be imposed on it, together with an inventory of the available members and the various working stresses for the materials involved. The output would consist of specifications for all members together with the corresponding internal loads and deflections. Economic factors might also be considered.

Establish a feasible scheme for accomplishing all or part of the above goal, and write and test a computer program that will implement the scheme.

5.21  Let A be an n by n square matrix whose leading submatrices have nonzero determinants, so that A may be factored (in more than one way) as

    A = LU,

where (see Section 4.2) L and U are lower and upper triangular matrices, respectively. Show that:
(a) Under the same hypotheses, A may be factored uniquely as

    A = LU2,

where U2 is an upper triangular matrix and L is a lower triangular matrix whose diagonal elements are all unity.
(b) The matrix U2 of part (a) may be factored uniquely as

    U2 = DU,

where U is an upper triangular matrix whose diagonal elements are all unity, and D is a diagonal matrix whose elements are the corresponding diagonal elements of U2.

5.22  Since the leading submatrices of a symmetric, positive definite matrix A have nonzero determinants (see Section 4.5), A satisfies the hypotheses of Problem 5.21. Show that:
(a) A can be written in the form

    A = LDU = (LD^(1/2))(D^(1/2)U) = S'S,

where D = diag(d1, d2, ..., dn), D^(1/2) = diag(d1^(1/2), d2^(1/2), ..., dn^(1/2)), and S = D^(1/2)U.
(b) The matrix S = (sij) has the following elements:

This algorithm is known as Cholesky's method or the square-root method for factoring a positive definite matrix [14].

5.23  Show that the inverse of an n x n symmetric, positive definite matrix A is given by

    A⁻¹ = S⁻¹(S⁻¹)',

where S is the matrix factor found by Cholesky's method (see Problem 5.22b).

Write a subroutine, named CHOLSK, with argument list (N, A) that inverts, in place, an N by N symmetric, positive definite matrix with elements A(1,1), ..., A(N,N). Organize the computations so that the elements of S are overstored in the appropriate elements of A. Then the inverse of the triangular matrix S should be developed in A using the diagonal-pivot strategy (why not the maximum-pivot strategy?). Finally, A⁻¹ should be computed, in place, in A. No auxiliary matrices should be used in the subroutine.

5.24  A particular type of electrical network consists of several known resistors that are joined at n nodes, numbered i = 1, 2, ..., n. The voltage V is specified at two or more of these nodes. There is at most a single resistor, Rpq, connected between any two nodes p and q.

Write a program that will accept information concerning the above (in the manner of Example 5.4) and that will proceed to compute: (a) the voltages at all remaining nodes, and (b) the current Ipq (and direction of flow) in each resistor.

Suggested Test Data

Use the network of Example 5.1 and also that shown in Fig. P5.24, in which the resistances are in ohms.

Figure P5.24

5.25  Consider the radiant heat transfer to and from the two surfaces i and j shown in Fig. P5.25. Define T = absolute temperature, ε = emissivity, α = absorptivity, ρ = 1 - α = reflectance, σ = Stefan-Boltzmann constant, A = area, E = εAσT⁴, G = total rate of radiation arriving at a surface (BTU/hr, for example), and J = total rate of radiation leaving a surface (emitted plus reflected), with subscripts indicating the appropriate surface. Also let the geometric view factor Fij be the fraction of the radiation leaving surface i that is directly intercepted by surface j.

Figure P5.25
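The square-root factorization A = S'S described in Problem 5.22, with S upper triangular, can be sketched as follows (a minimal dense version without the in-place storage that Problem 5.23 asks for):

```python
import math

def cholesky_upper(A):
    """Square-root (Cholesky) factorization A = S'S of a symmetric
    positive definite matrix, with S upper triangular.  A straight
    dense sketch of the method of Problem 5.22; no pivoting needed."""
    n = len(A)
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # subtract contributions of rows already factored
            t = A[i][j] - sum(S[k][i] * S[k][j] for k in range(i))
            if i == j:
                S[i][j] = math.sqrt(t)       # diagonal element
            else:
                S[i][j] = t / S[i][i]        # off-diagonal element
    return S

S = cholesky_upper([[4.0, 2.0], [2.0, 5.0]])
```

Overstoring S in the upper triangle of A, and then inverting S in place, leads to the storage-free subroutine CHOLSK requested in Problem 5.23.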
Now consider an enclosure consisting of n surfaces (one or more of which will comprise the enclosing surface). At steady state, show that

If ε, α, T, and hence E are known for every surface, show that the Ji are given by the solution of the n simultaneous linear equations:

Give an expression for Qi, the rate at which heat must be supplied (or removed, if negative) internally to each surface in order to maintain its temperature constant.

5.26  Extend Problem 5.25 to the case in which m of the n surfaces are refractory or insulating. A refractory surface i radiates and reflects at the same rate that energy falls on it; that is, Ji = Gi.

If the surfaces i are numbered so that the m refractories are i = n-m+1, n-m+2, ..., n, show that the Ji are again the solution of the simultaneous equations given in Problem 5.25, except that the coefficients of the last m equations are now

    aij = Fji,    i ≠ j,
    aii = Fii - 1,        i = n-m+1, n-m+2, ..., n.
    bi = 0,

How can the individual refractory temperatures be determined?

5.27  Figure P5.27 shows the cross section of a long experimental hydrocarbon-cracking furnace; heat is radiated steadily from the cylindrical electrical heating elements 1 and 2 to the pipes 3 and 4, through which a fluid is circulating. The elements and pipes have a common diameter d, and are located centrally in their respective quadrants. Assume that T1, T2, T3, T4, h, w, and d have been specified, and that ε and α are known for all surfaces. All view factors can be obtained by using (a) the string method (see Problem 3.17), (b) the reciprocity relation AiFij = AjFji, and (c) the fact that Σ (j = 1 to n) Fij = 1 for all surfaces i.

Figure P5.27

Based on the discussion in Problems 5.25 and 5.26, write a program that will compute (a) all the view factors needed (the functions developed in Problem 3.17 should help), (b) Ei, Gi, and Ji for each surface, (c) Qi, the rate of heat supply or removal for each heating element and pipe, and (d) Tr, the refractory temperature (assumed uniform over all four walls). Assume that the problem is essentially two-dimensional; that is, neglect longitudinal temperature variations along the furnace. Base all calculations for areas and heat fluxes on unit length of the furnace. Investigate both the arrangement shown in Fig. P5.27 and also that in which heater 2 and pipe 4 are interchanged.

Suggested Test Data

d = 1 in., T3 = T4 = 950°F, ε1 = ε2 = 0.75, ε3 = ε4 = 0.79, ε5 = 0.63, with gray surfaces (α = ε) throughout; investigate (a) T1 = T2 = 1600°F, h = 4, w = 6 in., (b) T1 = T2 = 1600°F, h = 6, w = 9 in., and (c) T1 = 1700, T2 = 1500°F, h = 6, w = 9 in. Note that in all radiation calculations, the temperature must be converted to °R (= °F + 460°). The Stefan-Boltzmann constant is σ = 0.171 x 10⁻⁸ BTU/hr sq ft °R⁴.

5.28  In Problem 5.27, the entire refractory wall was assumed to be at a uniform temperature. Check the validity of this assumption by subdividing the refractory into four separate refractory surfaces, each with its own individual temperature, T5 through T8, for example. (The subdivisions need not coincide with the four individual walls.) Repeat the calculations with the previous test data.

Investigate the possibility of writing a general program that will handle n subdivisions of the refractory, where n can be read as data. If this appears feasible, implement it on the computer in preference to writing a program that can only handle exactly four subdivisions.

5.29  Solve Problem 5.27 (or 5.28) with either of the following modifications:
(a) A third pipe (also at 950°F, for example), running along the middle of the furnace.
(b) With both T3 and T4 fixed (at 950°F, for example), determine the necessary heater temperature that would deliver a specified heat flux (1000 BTU/hr per foot length, for example) to each of the pipes.

5.30  Figure P5.30 shows two parallel horizontal pipes, each of outside diameter d, and separated by a given distance, that are located centrally in a horizontal thin metal shield whose cross section is a rectangle of height H and width w. The pipes carry a hot fluid, and their surfaces are maintained at temperatures T1 and T2. The pipes lose heat by radiation to the metal shield, with subsequent radiation and convection to the

A    Metal shield    B
Figure P5.30
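The linear system of Problem 5.25 for the radiosities can be assembled directly from the emissivities and view factors: since Ji = Ei + (1 - εi)Gi for a gray surface, collecting the Ji terms gives one equation per surface. The sketch below merely builds the coefficient matrix and right-hand side for a hypothetical two-surface enclosure; any of the elimination routines of this chapter (e.g., SIMUL of Example 5.2) can then solve it:

```python
def radiosity_system(emis, F, E):
    """Assemble the n x n system  J_i - (1 - eps_i) * sum_j F_ij J_j = E_i
    of Problem 5.25.  emis[i] = emissivity, F[i][j] = view factor,
    E[i] = emitted radiation; all numbers here are illustrative."""
    n = len(E)
    A = [[(1.0 if i == j else 0.0) - (1.0 - emis[i]) * F[i][j]
          for j in range(n)] for i in range(n)]
    return A, E[:]

# Two facing surfaces that see only each other (F12 = F21 = 1).
A, b = radiosity_system([1.0, 0.5], [[0.0, 1.0], [1.0, 0.0]], [10.0, 5.0])
```

For the refractory surfaces of Problem 5.26, the corresponding rows are replaced by the coefficients aij = Fji, aii = Fii - 1, bi = 0 given there.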
atmosphere outside, which behaves as a black-body enclosure at a temperature Ta. There is negligible temperature drop across each wall of the shield, but the shield temperature is not necessarily uniform all the way round. The local rate of heat loss q from unit area of the outside of the shield to the atmosphere is given by

    q = εs σ(Ts⁴ - Ta⁴) + h(Ts - Ta),

where εs is the emissivity of the shield and σ is the Stefan-Boltzmann constant. The convective heat transfer coefficient h, BTU/hr sq ft °F, depends on the particular surface, the temperature difference ΔT = Ts - Ta, °F, and the height H, feet, according to the following dimensional correlations (Perry [13]):

    AB:           h = 0.38(ΔT)^0.25,
    CD:           h = 0.20(ΔT)^0.25,
    AD and BC:    h = 0.28(ΔT/H)^0.25.

Assume that T1, T2, Ta, H, w, and d have been specified, and that ε and σ are known for all surfaces. View factors inside the shield can be obtained by using (a) the string method (see Problem 3.17), (b) the reciprocity relation AiFij = AjFji, and (c) the fact that Σ (j = 1 to n) Fij = 1 for all surfaces i. Assume that the problem is essentially two-dimensional; that is, neglect longitudinal temperature variations. Base all calculations for areas and heat fluxes on unit length of the pipes.

Write a program that will extend the method discussed in Problems 5.25 and 5.26 to compute (a) all the relevant view factors (the functions developed in Problem 3.17 should help), (b) Ei, Gi, and Ji for each surface, (c) the net rate of heat loss for each pipe, and (d) the temperature or temperatures of the shield. Subdivide the shield surface into as many individual sections as seem desirable, and neglect conduction between such sections.

Suggested Test Data

Ta = 60°F, d = 3 in., ε1 = ε2 = 0.79, εs = 0.66 (both sides), with gray surfaces (α = ε) throughout; investigate the four possible combinations of (a) T1 = T2 = 900°F, and (b) T1 = 900, T2 = 600°F, in conjunction with (a) H = 6, w = 12 in., and (b) H = 10, w = 20 in. Note that in all radiation calculations, the temperature must be converted to °R (= °F + 460°). The Stefan-Boltzmann constant is σ = 0.171 x 10⁻⁸ BTU/hr sq ft °R⁴.

For a modification that allows for conduction around the perimeter of the shield, see Problem 6.34.

5.31  Extend the situation of Problem 3.22 to n similar CSTRs in series. The relevant equations will now be

Using the same parameters as in Problem 3.22, find V when n = 2, 3, 4, 5, and 10.

5.32  Referring to Problem 3.20, we can dispense with the grid-bias battery by replacing it with a short circuit and inserting instead a resistor Rc between the cathode and ground. The grid voltage then becomes

    vg = -ia Rc,

where ia is the anode current (unknown as yet). If vs, R, and Rc are specified, devise a numerical procedure for computing va, ia, vg, and A (defined in Problem 3.20). Write a program that implements the method, making use of the functions already developed in Problems 1.42 and 2.48 for the 6J5 triode.

Suggested Test Data (vs in V, R and Rc in kΩ)

vs = 300, R = 10, Rc = 0.47;  vs = 300, R = 22, Rc = 0.47;  vs = 300, R = 22, Rc = 1.5.

5.33  A horizontal pipe of internal and external diameters d1 and d2, respectively, is covered with a uniform thickness t of insulating material, which has an external diameter d3. The thermal conductivities of the pipe wall and insulation are kp and kf, respectively. The inside surface of the pipe is maintained at a high temperature T1 by a rapid stream of hot gases. Heat is lost at a rate Q BTU/hr per foot length of pipe by conduction through the pipe wall and insulation, and by subsequent convection and radiation to the surrounding air.

If ε = emissivity of the outer surface of the insulation, T2 and T3 = temperatures of the outer surfaces of the pipe and insulation, respectively, σ = the Stefan-Boltzmann constant, 0.171 x 10⁻⁸ BTU/hr sq ft °R⁴, and Ta = temperature of the surroundings, assumed to behave as a black-body radiator, we have:

The extra subscript R emphasizes that in the radiation term, the absolute or Rankine temperature must be used (°R = °F + 460°). The heat transfer coefficient h, BTU/hr sq ft °F, for convection between the insulation and air is given by the dimensional correlation (Rohsenow and Choi [12]):

in which ΔT = T3 - Ta, °F, and d3 is in feet.

If T1, Ta, d1, d2, d3 or t, kp, kf, and ε are specified, devise a scheme for solving the above equations for the unknowns ΔT, T2, T3, h, and Q. Test the method for T1 = 1000, Ta = 60°F, kp = 25.9 (steel), kf = 0.120 (asbestos) BTU/hr ft °F, with (a) d1 = 1.049, d2 = 1.315 in. (convert to feet for consistency) and (b) d1 = 6.065, d2 = 6.625 in. For both pipes, consider all combinations of t = 0 (no insulation), 0.1, 0.5, 1, 2, and 6 in. (again convert to feet), with ε = 0.93 (untreated surface) and ε = 0.52 (coat of aluminum paint). Plot the rate of heat loss and the outer temperature of the insulation against the insulation thickness for the four combinations of pipe and emissivity.

5.34  The following equations can be shown to relate the temperatures and pressures on either side of a detonation wave that is moving into a zone of unburned gas:
Here, T = absolute temperature, P = absolute pressure, 5.37 Write the system of equations (5.2)in matrix form as
y =ratio of specific heat at constant pressure to that at con-
stant volume, rn = mean n~olecular weight, AHR = heat of
Bx = u.
reaction, c, =specific heat, and the subscripts 1 and 2 refer (a) Show that the Jacobi iteration of (5.15) may be written
to the unburned and burned gas, respectively. in the form
Write a program that will accept values for rnl, m 2 , y 2 ,
AH,,, c,,, T I ,and PI as data, and that will proceed to com- Xr+l =D-'u-- D-I(L+ U ) X ~ ,
pute and print values for T 2 and P 2 . Run the program with the where xk and x k + ,are the estimated solution vectors before and
following data, which apply to the detonation of a mixture of after the kth iteration, respectively, U is a strictly upper-
hydrogen and oxygen: ml = 12 g/g mole, rn2 = 18 g/g mole, triangular matrix containing the superdiagonal elements of B,
y 2 = 1.31, AHRl = -58,300 cal/g mole, c p 2= 9.86 cal/g L is a strictly lower-triangular matrix containing the sub-
mole O K , T I = 300"K, and P, = 1 atm. diagonal elements of B, and D is a diagonal matrix containing
5.35 Read Sections 8.19 and 8.20 as an introduction to this the elements on the principal diagonal of B.
problem. Then consider a regression plane whose equation is (b) Let
C =-0-'(L +U),
but now suppose that the rn observed points (x,,y,,z,),i = 1, 2, so that the iteration may be rewritten as
. . ., m, contain random errors in all the variables. As an alter-
native to the usual least-squares procedure, it is plausible to X ~ + =
I D-'U +C X ~ .
choose cc, ,i3, and y so as to minimize the sum of the squares of Let a be the solution of (5.2) and E~ be a vector of displace-
the normal distances of each point from the regression plane. ments
Let P be the point in the plane that is closest to the origin 0,
and denote the spherical coordinates of P as (r,6,+). The equa- Ek = u - Xk.
tion of the regression plane can then be rewritten as Show that if xo is the starting vector, then the error vector for
xk + is given by
e k f l= C k + ' ~ , , ,
Show that the problem now amounts to minimizing
so that convergence normally requires that lim C k = 0, and
k-.P

that the argument is equivalent to that of (5.1 7) through (5.20).


-
and that setting aS/aO aS/ 8 4 = aS/ar = 0 yields three simul-
taneous nonlinear equations in 0, 4, and r.
5.38 Using the nomenclature of Problem 5.37, show that
the Gauss-Seidel iteration of Section 5.7 may be written as

Write a program that will accept the coordinates of the rn x k + ,= ( L + D ) - ' u - ( L + D ) - l u x k ,


data points and that will proceed to determine a , p, and y and that the error vector for xk + , is given by
according to the above criterion. For a sample set of data,
compare these regression coefficients with those determined by ε_{k+1} = G^{k+1} ε_0,
the usual method of least squares, using z = α + βx + γy, where G = −(L + D)⁻¹U.
x = α1 + β1 y + γ1 z, and y = α2 + β2 x + γ2 z in turn.
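The Jacobi and Gauss-Seidel iterations of Problems 5.37-5.39 are easy to compare numerically. A Python sketch (the diagonally dominant 3×3 test system and the sweep count are illustrative assumptions):

```python
def jacobi_step(A, b, x):
    # Jacobi: x_{k+1} = D^{-1}u + C x_k with C = -D^{-1}(L + U);
    # every component is updated from the OLD iterate x_k.
    n = len(b)
    return [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)]

def gauss_seidel_step(A, b, x):
    # Gauss-Seidel: x_{k+1} = (L + D)^{-1}u - (L + D)^{-1}U x_k;
    # each new component is used as soon as it is available.
    n = len(b)
    x = x[:]
    for i in range(n):
        x[i] = (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
    return x

A = [[4.0, 1.0, 1.0], [1.0, 5.0, 2.0], [1.0, 2.0, 6.0]]  # diagonally dominant
b = [6.0, 8.0, 9.0]                                       # solution (1, 1, 1)
xj = xg = [0.0, 0.0, 0.0]
for _ in range(50):
    xj = jacobi_step(A, b, xj)
    xg = gauss_seidel_step(A, b, xg)
```

With a diagonally dominant matrix (criteria (5.21)) both sequences converge; without rearrangement, as part (a) of Problem 5.39 illustrates, they generally diverge.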
5.39 Consider the system of three simultaneous linear
5.36 Write a program that solves a system of n simultane- equations:
ous linear equations,

(a) Without rearranging the equations, try to find the


solutions iteratively using either the Jacobi or Gauss-Seidel
using the method of Kaczmarz, described in Section 5.5. The methods with starting values (0, 0, 0) and (1.01, 2.01, 2.99) for
program should read and print values for n, a11, . . ., a_{n,n+1}, (x1, x2, x3).
initial values for x1, . . ., xn (the starting trial vector), and a (b) Rearrange the equations to conform to criteria (5.21)
small positive number, E , to be used in testing for singularity and repeat part (a).
or near singularity. Next, the solution should be generated 5.40 Consider Q gpm of water flowing from point 1 to
using the method of Kaczmarz. Should point 2 in an inclined pipe of length L ft and diameter D in.
Starting from the equation in part (a) of Problem 5.46 and
assuming a constant value of the Moody friction factor fM, the
print a comment to the effect that the coefficient matrix is the following equation can be shown to hold approximately
singular or ill-conditioned, and stop computation for that data for turbulent flow in pipes of average roughness:
set. If a solution can be found, print it. In any event, return to
read another data set. As test data, select some of the equations
from the data for programs in Examples 5.1 and 5.2. Here, p is the pressure in psig and z is the elevation in feet.
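The row-projection iteration of Kaczmarz, which Problem 5.36 asks for, can be sketched briefly in Python (the 2×2 test system and the sweep count are illustrative assumptions; a production program would read n, the augmented matrix, and ε as the problem specifies):

```python
def kaczmarz(A, b, x, sweeps=200, eps=1e-12):
    """Cyclic Kaczmarz iteration: x is projected in turn onto the
    hyperplane of each equation a_i . x = b_i, via
        x <- x + (b_i - a_i . x) / ||a_i||^2 * a_i.
    A vanishing row norm (||a_i||^2 < eps) is used here as a crude flag
    for a singular or ill-conditioned system."""
    n = len(b)
    for _ in range(sweeps):
        for i in range(n):
            norm2 = sum(a * a for a in A[i])
            if norm2 < eps:
                raise ValueError("coefficient matrix singular or ill-conditioned")
            r = (b[i] - sum(a * xj for a, xj in zip(A[i], x))) / norm2
            x = [xj + r * a for xj, a in zip(x, A[i])]
    return x

# 2x + y = 3, x + 3y = 4 has the solution x = 1, y = 1.
x = kaczmarz([[2.0, 1.0], [1.0, 3.0]], [3.0, 4.0], [0.0, 0.0])
```

For a consistent system each projection can only decrease the distance to the solution, which is why the method converges from any starting vector.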
338 Systems of Equations

Also, a typical head-discharge curve for a centrifugal pump


can be represented by

in which Ap is the pressure increase in psig across the pump, Q


where α = [α1, α2, . . ., αn]'. Let α be a solution of (5.33), so that
is the flow rate in gpm, and α and β are constants depending on
f_i(α) = 0, i = 1, 2, . . ., n. Drop the second-order terms in the
the particular pump.
Taylor's series, and replace α_j by the jth element of the
(k + 1)th iterate, x_{k+1,j} (hopefully a good approximation of
α_j), to yield

Show that this system of equations is equivalent to (5.44) for


the Newton-Raphson iteration. Note that this is the n-dimen-
sional equivalent of the development of Newton's method for a
function of one variable suggested in Problem 3.15.
5.43 In the Newton-Raphson method of Section 5.9 (see
also Example 5.9), the matrix φ(x_k) of equations (5.44) is
evaluated and inverted once for each pass of the iterative
algorithm. Since nearly all the computing time required for
Figure P5.40
solution to the system of nonlinear equations by this tech-
Consider the piping system shown in Fig. P5.40. The pres- nique is consumed in the evaluation and inversion of the
sures p1 and p5 are both essentially atmospheric (0 psig); there successive φ_k, investigate the possibility of using a calculated
is an increase in elevation between points 4 and 5, but pipes C φ⁻¹(x_k) for more than one iteration. For example, consider
and D are horizontal. Based on the above, the governing equa- the sequence
tions are: δ_k = −φ⁻¹(x_k) f(x_k),
Q_E = Q_C + Q_D, x_{k+1} = x_k + δ_k,
p2 = α_A − β_A Q_C², δ_{k+1} = −φ⁻¹(x_k) f(x_{k+1}),
p3 = α_B − β_B Q_D², x_{k+2} = x_{k+1} + δ_{k+1},
2.31(p4 − p2) + 8.69 × 10⁻⁴ Q_C² f_M L_C / D_C⁵ = 0, δ_{k+2} = −φ⁻¹(x_k) f(x_{k+2}),
2.31(p4 − p3) + 8.69 × 10⁻⁴ Q_D² f_M L_D / D_D⁵ = 0, x_{k+3} = x_{k+2} + δ_{k+2},
etc.
Write a program that will accept values for α_A, β_A, α_B, β_B, Develop criteria for deciding when to recompute the elements
z5 − z4, D_C, L_C, D_D, L_D, D_E, and L_E, and that will solve the of φ(x) and its inverse.
above equations for the unknowns Q_C, Q_D, Q_E, p2, p3, and p4. 5.44 The calling program in Example 5.5 is well suited
One suggested set of test data is z5 − z4 = 70 ft, with: for the general problem of solving n simultaneous nonlinear
equations by the Newton-Raphson method. since all informa-
tion about the specific equation set (except for the initial
guess for the solution vector) is contained in the subroutine
CALCN. In most practical problems, however, the equations
contain parameters (such as the pressure P in CALCN) which
the user would also like to read in as data. How would you
modify the calling program and the essential structure of
Pipe D, in. L, ft
CALCN to allow this, so that further modifications of the
calling program would be unnecessary for other systems of
equations with different parameters?
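The "stale Jacobian" strategy suggested in Problem 5.43 can be sketched for a two-equation system. In this Python sketch (the test system x² + y² = 4, xy = 1, the refresh period, and all names are illustrative assumptions; the criterion for recomputation is left to the problem) the Jacobian is re-evaluated only every few iterations instead of every pass:

```python
def solve2(J, r):
    # Solve the 2x2 linear system J d = -r by Cramer's rule.
    det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
    return [(-r[0] * J[1][1] + r[1] * J[0][1]) / det,
            (-r[1] * J[0][0] + r[0] * J[1][0]) / det]

def modified_newton(f, jac, x, refresh=3, sweeps=20):
    """Newton-Raphson in which the Jacobian is recomputed (and, in effect,
    re-inverted) only every `refresh` iterations; refresh=1 recovers the
    standard method of Section 5.9."""
    J = jac(x)
    for k in range(sweeps):
        if k % refresh == 0:      # a simple periodic recomputation rule
            J = jac(x)
        d = solve2(J, f(x))       # delta_k = -J^{-1} f(x_k), J possibly stale
        x = [x[0] + d[0], x[1] + d[1]]
    return x

f = lambda v: [v[0] ** 2 + v[1] ** 2 - 4.0, v[0] * v[1] - 1.0]
jac = lambda v: [[2.0 * v[0], 2.0 * v[1]], [v[1], v[0]]]
root = modified_newton(f, jac, [2.0, 0.5])
```

With a stale Jacobian the convergence is only linear between refreshes, but each intermediate iteration is much cheaper, which is exactly the trade-off the problem asks you to quantify.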
5.45 The principal chemical reactions occurring in the
production of synthesis gas are listed in Example 5.5 (see
(5.5.1), (5.5.2), and (5.5.3)). Write a program (modelled closely
Assume that the above pipe lengths have already included the
on the programs of Example 5.5) to find the O2/CH4 ratio
equivalent lengths of all fittings and valves.
that will produce an adiabatic equilibrium temperature of
5.41 Rework Example 5.4 (flow in a pipe network, solved
T_e °F at an operating pressure of P atm if the inlet CH4 gas is
by the successive-substitution method) using the Newton-
preheated to a temperature of T_m °F and the oxygen, introduced
Raphson technique instead. Is one method decidedly better
in the form of an oxygen-nitrogen mixture (possibly air) con-
than the other in this case from the viewpoints of (a) simpli-
taining x_0 mole percent oxygen, is preheated to T_p °F.
city, and (b) computational efficiency?
The heat capacities for the seven possible gaseous compo-
5.42 Expand each of the f_i, i = 1, 2, . . ., n, of (5.33) in a
nents can be computed as a function of temperature from the
Taylor's series in n variables, x1, x2, . . ., xn, about the kth
iterate, x_k = [x_{k,1}, x_{k,2}, . . ., x_{k,n}]', of the Newton-Raphson general relation
iteration to yield c_pi(T_k) = a_i + b_i T_k + c_i T_k² + d_i/T_k² cal/g mole °K,
where i = 1, 2, 3, 4, 5, 6, and 7 for CO, CO2, H2O, H2, CH4, 5.46 Extend Example 5.4 (flow in a pipe network) so that
O2, and N2, respectively, and T_k is in °K (°K = °C + 273.15). the program can handle one or more of the following addi-
Table P5.45a [13] also shows the standard heat of formation at tional features:
298°K, ΔH°_f,298 cal/g mole, for each component.
(a) Elevation change between nodes. If z_i and z_j denote the
Table P5.45a

i    a_i     b_i        c_i           d_i         ΔH°_f,298
1    6.60    0.00120    0.0           0.0         −26416.
2   10.34    0.00274    0.0        −195500.       −94052.
3    8.22    0.00015    1.34 × 10⁻⁶   0.0         −57798.
4    6.62    0.00081    0.0           0.0         0.0
5    5.34    0.0115     0.0           0.0         −17889.
6    8.27    0.000258   0.0        −187700.       0.0
7    6.50    0.00100    0.0           0.0         0.0

elevations in feet of nodes i and j, the Fanning equation, (5.4.1), will become
(b) Moody friction factor f_M now considered a function of Reynolds number and pipe roughness (see Problem 3.38).
(c) Specified steady flow rate Q_i gpm injected into node i from outside the network. This would be an alternative to specifying the pressure at certain nodes. For a specified withdrawal, Q_i would be negative.
The program should read in and print out the essential data, (d) Centrifugal pump connected between nodes i and j
solve the nonlinear equations using the Newton-Raphson (the possibility of a direct connection pipe between these
method, print the computed results, and return to read nodes is then ignored). Assume for simplicity that the head-
another data set. If the composition of the incoming oxygen/ discharge curve for such a pump can be expressed using two
nitrogen stream makes it impossible to achieve the specified constants, m i , and pi), in the form
temperature T,, the program should write an appropriate
comment and continue to the next data set (Table P5.45b).
The pump is assumed to operate only for flow in the direction
Suggested Test Data i to j ; however, the possibility that Qrj might be forced to be
Table P5.45b
negative because of other factors should not be neglected.
(e) Fittings and valves located in certain pipes. This is not
T_e (°F)   T_m (°F)   T_p (°F)   P (atm)   x_0 (%)
fitting is normally reckoned as being that across an equivalent
length of pipe, which can then be lumped in with the actual
length.
Consider using the Newton-Raphson procedure for solving
the resulting simultaneous nonlinear equations, especially if
Problem 5.41 has already been attempted. Depending on
which of the above features have been accounted for, devise
a few appropriate piping networks for testing your program
(a simple case could be based on Fig. P5.40).

Bibliography

1. A. Ralston and H. S. Wilf, Mathematical Methods for Digital Computers, Wiley, New York, 1960.
2. J. F. Traub, Iterative Methods for the Solution of Equations, Prentice-Hall, Englewood Cliffs, New Jersey, 1964.
3. L. Fox, An Introduction to Numerical Linear Algebra, Oxford University Press, New York, 1965.
4. K. S. Kunz, Numerical Analysis, McGraw-Hill, New York, 1957.
5. A. M. Ostrowski, Solution of Equations and Systems of Equations, Academic Press, New York, 1966.
6. D. K. Faddeev and V. N. Faddeeva, Computational Methods of Linear Algebra, Freeman, San Francisco, 1963.
7. B. Wendroff, Theoretical Numerical Analysis, Academic Press, New York, 1966.
8. F. A. Jenkins and H. E. White, Fundamentals of Optics, 2nd ed., McGraw-Hill, New York, 1951.
9. B. Carnahan, Radiation Induced Cracking of Pentanes and Dimethylbutanes, Ph.D. Thesis, University of Michigan, 1964.
10. S. J. McMinn, Matrices for Structural Analysis, E. & F. N. Spon Ltd., London, 1962.
11. J. Robinson, Structural Matrix Analysis for the Engineer, Wiley, New York, 1966.
12. W. M. Rohsenow and H. Y. Choi, Heat, Mass, and Momentum Transfer, Prentice-Hall, Englewood Cliffs, New Jersey, 1961.
13. J. H. Perry, ed., Chemical Engineers' Handbook, 3rd ed., McGraw-Hill, New York, 1950.
14. G. Forsythe and C. B. Moler, Computer Solution of Linear Algebraic Systems, Prentice-Hall, Englewood Cliffs, New Jersey, 1967.
15. J. Todd, "Computational Problems Concerning the Hilbert Matrix," J. Research Natl. Bur. Standards, Series B, 65, 19-22 (1961).
CHAPTER 6

The Approximation of the Solution of Ordinary


Differential Equations

6.1 Introduction where p is a constant. By defining one new variable,


The behavior of many physical processes, particularly z = dy/dx, the second-order equation can be rewritten
those in systems undergoing time-dependent changes as a pair of first-order equations:
(transients), can be described by ordinary differential
equations. Thus, methods of solution for these equations
are of great importance to engineers and scientists.
Although many important differential equations can be
solved by well-known analytical techniques, a greater
number of physically significant differential equations
cannot be so solved. Fortunately, the solutions of these Since most higher-order equations (or a system of such
equations can usually be generated numerically. This equations) can be rewritten in similar fashion, the numeri-
chapter will describe the more important of these numeri- cal solution of first-order equations only will be described.
cal procedures.
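The reduction of a higher-order equation to a first-order system, described above, is mechanical. A minimal Python sketch (a modern substitute for the book's FORTRAN; the test equation y'' = −y and the simple Euler stepping are illustrative assumptions, chosen only to show the mechanics):

```python
def integrate_second_order(g, x0, y0, z0, h, nsteps):
    """Rewrite y'' = g(x, y, y') as the pair of first-order equations
        dy/dx = z,   dz/dx = g(x, y, z),
    with z = dy/dx, and advance both with Euler steps of size h."""
    x, y, z = x0, y0, z0
    for _ in range(nsteps):
        # Both components are updated from the values at the start of
        # the step (a simultaneous update).
        y, z = y + h * z, z + h * g(x, y, z)
        x += h
    return y, z

# y'' = -y with y(0) = 0, y'(0) = 1 has the exact solution y = sin(x),
# so at x = 1 we expect y ~ 0.84147 and z = y' ~ 0.54030.
y, z = integrate_second_order(lambda x, y, z: -y, 0.0, 0.0, 1.0, 1.0e-4, 10000)
```

Exactly the same device turns an nth-order equation into n coupled first-order equations, which is why the rest of the chapter treats only first-order systems.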
Nth Order Ordinary Differential Equations. Consider 6.2 Solution of First-Order Ordinary Differential
the solution of nth-order ordinary differential equations Equations
of the form
A first-order equation is, by definition, of the form

An equation of type (6.1) is termed nth-order because the


highest derivative is of order n, and ordinary because only
total derivatives appear (no partial derivatives are present,
or alternatively, there is only one independent variable,
x). A function y(x) that satisfies this equation, implying
that it is at least n times differentiable, is said to be a We desire a solution y(x) which satisfies both (6.4) and
solution of the equation. To obtain a unique solution [in one specified initial condition. In general, it is impossible
general, there are many functions y(x) that satisfy (6.1)], to determine y(x) in functional (analytical) form. Instead,
it is necessary to supply some additional information, the interval in the independent variable x over which a
namely, values of y(x) and/or of its derivatives at some solution is desired. [a,b], is divided into subintervals or
specific values of x. For an nth-order equation, n such steps. The value of the true solution y(x) is approximated
conditions are normally sufficient to determine a unique
at n + 1 evenly spaced values of x, (x_0, x_1, . . ., x_n), so
solution y(x). If all n conditions are specified at the same that h, the step size, is given by
value of x (x_0, for example), then the problem is termed
an initial-value problem. When more than one value of x
is involved, the problem is termed a boundary-value
problem.
and
An nth-order ordinary differential equation may be
written as a system of n first-order equations by defining x_i = x_0 + ih, i = 0, 1, . . ., n. (6.5)
n − 1 new variables. For example, consider the second-
order equation (Bessel's equation), Thus the solution is given in tabular form for n + 1
discrete values of x only (see Fig. 6.1). This table of values
contains sampled values of one particular approximation
of the solution of the equation.

Figure 6.1 Numerical solution of a first-order differential equation.

Let the true solution y(x) at the indicated base points which does the calculations, the order of the machine
be denoted by y(x,), and the computed approximation of operations used to implement the algorithm, etc. Some
y(x) at these same points be denoted by y,, so that upper bound can usually be found for the discretization
error for a particular method; on the other hand, round-
y_i ≈ y(x_i). (6.6)
The true derivative dyldx at the base points will be able. However, because of this very unpredictability,
approximated by f(xi,yi), abbreviated as fi, so that numerical analysts have been fairly successful in develop-
ing a probabilistic theory of round-off error, on the
assumption that local round-off error, that is, the error
When the requisite numerical calculations are done caused by round-off in integrating a differential equation
exactly, that is, without round-off error (see below), the across one step, is a random variable (see [2] for an excel-
difference between the computed value yi and the true lent description of this work). The only errors which will
value y(xi) is termed the discretization or truncation be examined in any detail here are those related to dis-
error, ε_i: cretization, that is, those inherent in the numerical
procedures.
The common numerical algorithms for solving a first-
The discretization error encountered in integrating a dif- order ordinary differential equation with initial condition
ferential equation across one step is sometimes called the y(x,) are based on one of two approaches:
local truncation error. The discretization error is deter- 1. Direct or indirect use of Taylor's expansion of the
mined solely by the particular numerical solution pro- object or solution function ~ ( x ) .
cedure selected, that is, by the nature of the approxima- 2. Use of open or closed integration formulas similar
tions present in the method; this type of error is to those already developed in Chapter 2.
independent of computing equipment characteristics.
An inherently different kind of error results from The various procedures can be classified roughly into
computing-machine design. In practice, computers have two groups, the so-called one-step and multistep methods.
only a finite memory and therefore a finite number size One-step methods permit calculation of y_{i+1}, given the
(scientific computers usually have a fixed word-length, differential equation and information at x_i only, that is,
that is, the number of digits retained for any computed a value y_i. Multistep methods require, in addition, values
result is fixed, usually 7-12 significant digits). Thus any of y_j and/or f_j at other (usually several) values x_j outside
irrational number, or indeed any number with more the integration interval under consideration, [x_i, x_{i+1}].
significant digits than can be retained, that occurs in a One disadvantage of the multistep methods is that
sequence of calculations must be approximated by more information is required to start the procedure than
"rounded" values. The error involved is termed round-off is normally directly available. Usually an initial con-
error, and for any particular numerical method is deter- dition, say y(x_0), is given; subsequent values, y(x_1),
mined by the computing characteristics of the machine y(x_2), etc., are not known. Some other method (usually a
6.2 Solution of First-Order Ordinary Differential Equations 343
one-step method) must be used to get started. Another Analytic integration of (6.11) after separation of variables
difficulty encountered in the multistep methods is that gives
it is rather difficult to change the step-size h once the
calculation is under way. On the other hand, since each
new application of a one-step method is equivalent to
restarting the procedure, such a change of step-size
causes no trouble. The multistep methods require con-
siderably less computation, compared with the one-step which is identical to (6.13). There is no truncation error for
methods, to produce results of comparable accuracy. the algorithm of (6.13) because all high-order derivatives
The advantages and disadvantages of each group of f^(n)(x,y), n ≥ 3, vanish.
methods will become more apparent with the develop-
ment of the numerical procedures. Example. Consider a case for which f(x,y) is a function of
Taylor's Expansion Approach. One method of approxi- y alone:
mating the solution of (6.4) numerically is to express the
solution y ( x ) about some starting point xo by using a
Taylor's expansion :
subject to the initial condition y(xo) = y o . Differentiating
(6.15) by using the chain rule (6.10) yields:

f (")(x,y)= 2" By, f '"'(~0,)'o)


= 2" 'yo.
Here, f'(x, y(x)) denotes (d/dx) f(x, y(x)), f''(x, y(x)) denotes (d²/dx²) f(x, y(x)), etc. If y(x_0) is specified as the initial condition, f(x_0, y(x_0)) can be computed directly from the differential equation (6.4).
The expansion of y(x) about x_0, using (6.16) in the Taylor's series of (6.9), gives

y(x_0 + h) = y(x_0) [1 + 2h + (2h)²/2! + (2h)³/3! + (2h)⁴/4! + · · ·].   (6.17)
To evaluate the higher-order derivatives of (6.9), we must differentiate f(x,y) by using the chain rule, since f is a function of both x and y:

After separation of variables, direct integration of (6.15)


df/dx = ∂f/∂x + (∂f/∂y)(dy/dx).   (6.10)
Example. Consider a case for which f(x,y) is a function of
x alone:
dy/dx = f(x,y) = x²,   (6.11)
and produces the solution
y(x_0 + h) = y(x_0) e^{2h}.   (6.18)

subject to the initial condition y(x_0) = y_0. From (6.10),
f'(x,y) = 2x,    f'(x_0, y_0) = 2x_0;
f''(x,y) = 2,    f''(x_0, y_0) = 2;
f'''(x,y) = 0,   f'''(x_0, y_0) = 0;   (6.12)
Since {1 + 2h + [(2h)²/2!] + · · ·} is the Taylor's expansion of e^{2h} about h = 0, the two solutions (6.17) and (6.18) agree with an accuracy determined by the number of terms retained in the series. If terms up to that including f^{(n−1)} are retained, then the error is given by
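The agreement between the truncated series and e^{2h} is easy to check numerically. A short Python sketch (a modern substitute for the book's FORTRAN; the step h = 0.1 and the six-term truncation are illustrative assumptions):

```python
import math

def taylor_step(y0, h, nterms):
    """One step of the Taylor's-expansion method for dy/dx = 2y.
    Since every derivative is f^(n)(x, y) = 2^n y, the series (6.17) is
        y(x0 + h) ~ y0 * sum_{n=0}^{nterms-1} (2h)^n / n!"""
    s, term = 0.0, 1.0
    for n in range(nterms):
        s += term
        term *= 2.0 * h / (n + 1)   # (2h)^{n+1}/(n+1)! from (2h)^n/n!
    return y0 * s

approx = taylor_step(1.0, 0.1, 6)   # six terms of the series
exact = math.exp(0.2)               # y(x0 + h) = y0 * e^{2h}, as in (6.18)
```

With six terms the discrepancy is already on the order of (2h)⁶/6!, illustrating how the truncation point controls the error term above.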

Expansion of y(x) about x_0 by substitution of (6.12) into the Example. Consider a more general, but still simple, example
Taylor's series of (6.9) yields for which f(x,y) is a function of both variables,

subject to the initial condition y(x_0) = y_0. Differentiation of x_{i+1} = x_i + h can be based upon the Taylor's expansion
(6.19) yields : of y(x) about x i :

Expansion of y ( x ) in a Taylor's series about x = xo yields

ξ in (x_i, x_{i+1}).   (6.25)


The algorithms, formed by dropping the last term on the right-hand side of (6.25) and replacing y(x_{i+1}) on the left-hand side by y_{i+1}, are said to be of order h^n; the error is of order h^{n+1}. The local truncation error ε_l introduced by one application is therefore bounded as follows:
By adding and subtracting (x_0 + h + 1) on the right-hand side,

where

Unfortunately, in the general case, the differentiation


of f(x,y) becomes enormously complicated. Except for
The differential equation is linear and can be solved analyt- the simplest case,
ically by using the integrating factor e - " . Multiplying (6.19)
by e - " , integrating both sides, and solving for y yields
the direct Taylor's expansion of (6.25) is not often used to
where Cis the integration constant. Evaluation of the integra- solve first-order differential equations. Here "O( )"
tion constant at (xo,y(xo)) gives means "terms of order ( )." (in view of the recent
development of computer programs which formally dif-
ferentiate arbitrary symbolic expressions [I], this argu-
so that the solution of the initial-value problem is given by ment against the direct Taylor's expansion, with high-
order terms included, may well vanish at some future
time.)
For y = y(xo + h), (6.23) becomes Since y(xo) is usually the only value of y(xi) that is
known exactly (assuming that the initial condition is free
of error), y(x,) in (6.27) must in general be replaced by
Since [1 + h + (h²/2!) + · · ·] of (6.21) is Taylor's expansion of of error), y(x_i) in (6.27) must in general be replaced by
e^h, the analytical solution again agrees with that found by y_i. The algorithm then assumes the form
the Taylor's expansion approach. The error caused by trunca-
tion after the term containing f^{(n−1)} is given by

which is called Euler's method.

6.3 Euler's Method


A procedure for stepping from one value of x to another,
that is, from x, to x, + h, follows from expansion of y(x) Because it is the simplest method and the most amen-
about x_0. Similarly, algorithms for stepping* from x_i to able to an analysis of error propagation, Euler's one-step
method of (6.28) will be discussed in some detail, even
*Throughout the remainder of this chapter, we shall consider though accuracy limitations preclude its use for most
the step-size h to be a positive number, that is, integration will be practical problems. There is a simple geometric interpreta-
carried out for increasing values of the independent variable x . tion for (6.28a). The solution across the interval [xo, x, ]
The algorithms to be developed, however, apply for negative h as
well. is assumed to follow the line tangent to y(x) at x_0 (see
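Euler's algorithm (6.28) can be sketched in a few lines of Python (a modern substitute for the FORTRAN of Example 6.1; the test equation dy/dx = x + y with y(0) = 0 and the step size h = 0.1 are taken from the text's own example):

```python
import math

def euler(f, x0, y0, h, nsteps):
    """Euler's one-step method, y_{i+1} = y_i + h*f(x_i, y_i), as in (6.28)."""
    x, y = x0, y0
    for i in range(nsteps):
        y = y + h * f(x, y)         # advance y along the tangent line
        x = x0 + (i + 1) * h        # x_i = x_0 + i*h, as in (6.5)
    return y

# dy/dx = x + y, y(0) = 0; exact solution y = e^x - x - 1 from (6.23).
y_num = euler(lambda x, y: x + y, 0.0, 0.0, 0.1, 10)   # integrate to x = 1
y_true = math.exp(1.0) - 2.0
```

At x = 1 this reproduces the figures of Table 6.1: y ≈ 0.5937 against the true 0.7183, an overall truncation error of about 0.1245.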
Fig. 6.2). When Euler's method is applied repeatedly derivatives, again with all figures retained. Column 5
across several intervals in sequence, the numerical solu- shows the true solutions rounded to four figures. Column
tion traces out a polygon segment with sides of slopes 6 contains the overall truncation error (rounded to four
f_i, i = 0, 1, 2, . . ., n − 1.
results which are computed when only four figures are
retained at each stage of the calculations. In column 7,
the four figures retained are truncated, that is, less sig-
nificant figures are simply dropped. In column 8, the
four figures retained are rounded values. Column 9
shows the values which would be computed by using true
y values, that is, y(xi) rather than y,, at the beginning
of each new step [see (6.27)]. Column 10 shows the
local truncation error for each interval when y(xi) is
used to begin each step, Column 11 shows the value of
the maximum local truncation error computed from the
Taylor's expansion error term with n = 1 (see (6.25)),

Figure 6.2 Euler's method.

As an example, consider the differential equation of For the given equation, f(x,y) = x + y, the derivative in
(6.191, (6.29) is given by
dy/dx = x + y,    f'(x,y) = d²y/dx² = e^x.
subject to the initial condition x_0 = 0, y(x_0) = y_0 = 0
Since the maximum value of e^x, x_i ≤ x ≤ x_{i+1}, occurs at
(see Example 6.1). The solution already developed in
x = x_{i+1}, the maximum value of the local truncation
(6.23) for this initial condition is
error (6.29) is given by
y = e^x − x − 1.
|ε_l|_max = (h²/2!) e^{x_{i+1}}.   (6.30)
The Euler solution is shown in Table 6.1, using a
step size h = 0.1 and an upper limit of integration Unfortunately, this error bound is valid only for the
x_10 = 1.0. Column 3 contains the values of y_i computed algorithm of (6.28a) and that obtained from (6.27),
from (6.28) with all figures retained in the calculation
(no round-off error). Column 4 contains the computed y_{i+1} = y(x_i) + h f(x_i, y(x_i)).   (6.31)

Table 6.1 Solution of Equation (6.19) by Euler's Method

y(0) = 0    h = 0.1

Columns: (1) i; (2) x_i; (3) y_i, all figures retained; (4) f(x_i, y_i); (5) true y(x_i), rounded; (6) overall truncation error ε_i = y_i − y(x_i); (7) y_i with four figures retained, truncated; (8) y_i with four figures retained, rounded; (9) y_i computed using y(x_{i−1}); (10) local truncation error using y(x_{i−1}); (11) maximum local truncation error bound, e^{x_{i+1}} h²/2!.

For the general integration between x_i and x_{i+1}, that is, Then, from (6.33) and (6.34),
the results of column 3, the actual algorithm is given by
(6.28b) as
y_{i+1} = y_i + h f(x_i, y_i).
The value of y_i used for every interval except the first,
where y_0 = y(x_0), is inexact, being the result of previous
calculations which involved earlier truncation errors.
6.4 Error Propagation in Euler's Method Since lei+ll < lei+, - si(+ lsil, (6.39) can be rewritten as
Let us examine the propagation of local discretization
errors in Euler's method of (6.28) applied to the integra-
tion of the initial-value problem
To determine Isil, (6.40) can be applied i times with
the starting value so = 0. However, (6.40) has a general
solution of the form
Assume that f(x,y) and its first-order partial derivatives
are continuous and bounded in the region a ≤ x ≤ b,
−∞ < y < ∞, and that a ≤ x_0 < x_n ≤ b. Assume also
that a solution y(x) exists. Then there must exist constants That (6.41) is a solution of (6.40) is not immediately
M and K such that apparent, but this can be proved by induction as follows.
Consider a general inequality of the form

and where A and B are independent of i. The proposed solu-


tion of (6.42), for A > 0, B ≥ 0, A ≠ 1, is

where y* < α < y for (x,y) and (x,y*) in the region.


Relationship (6.34) follows directly from the differential For i = 1, (6.43) is identical with (6.42). For a general
mean-value theorem of page 9. value of i, substitution of (6.43) into (6.42) yields
As in (6.8), denote the error between the approximate
and exact solutions by ε_i, that is, let

Then the additional error, Δε_i, generated in traversing
the ith step, is
Δε_i = Δy_i − Δy(x_i),   (6.35)
or
ε_{i+1} − ε_i = y_{i+1} − y_i − [y(x_{i+1}) − y(x_i)],   (6.36)
induction, (6.43) is a solution of (6.42). The error for
subject to the condition ε_0 = 0. From Euler's algorithm
Euler's method (6.40) is a special form of (6.43), since
of (6.28b),
ε_0 = 0. In this case, A = (1 + hK), B = Mh²/2, so that
y_{i+1} − y_i = h f(x_i, y_i),
whereas, from Taylor's expansion (6.25),

x_i < ξ < x_{i+1},   (6.37)


|ε_i| ≤ [(1 + hK)^i − 1] (Mh²/2) / [(1 + hK) − 1] = [(1 + hK)^i − 1] Mh/(2K),
Thus (6.35) is equivalent to

which is the given solution (6.41).


The solution of (6.41) can be simplified further, since Since M can be taken as an upper bound of the magnitude
of f'(x,y) on the interval [x_0, x_i], let
(1 + hK) ≤ e^{hK},   (6.44)
which follows from Taylor's expansion of e^{hK}. Substitu- M = |f'(x,y)|_max = |e^x|_max,   (6.47)
tion of (6.44) into (6.41) gives
Also, K can be taken as an upper bound on the partial
|ε_i| ≤ (Mh/2K)(e^{ihK} − 1) < (Mh/2K) e^{ihK},   (6.45)
derivative of f(x,y) with respect to y, that is,
K = |f_y(x,y)|_max = 1.   (6.48)
For nh = x_n − x_0 = L, the constant total interval of
integration, (6.45) can be written as Using these values for M and K, and with h = 0.1 as
before, an upper bound on the truncation error |ε_i| is,
|ε_n| ≤ (Mh/2K) e^{LK}. from (6.41),
Then, as h approaches zero, the error approaches zero,
because
lim_{h→0} |ε_n| ≤ lim_{h→0} (Mh/2K) e^{LK} = 0.
Values of |ε_i|_max computed from the right-hand side of (6.49) are listed below. As expected, all these error bounds (Table 6.2) are greater than the true truncation error (from Table 6.1, column 6).
Notice that this is true despite the fact that n → ∞ as h → 0. A numerical procedure for which, when 0 ≤ i ≤ n,
lim_{h→0} |ε_i| = 0, Table 6.2 Total Truncation Error for
Euler's Method Solution of
Equation (6.19)
is said to be convergent. Thus, Euler's method converges
with an overall truncation error for which
|ε_i| = |y_i − y(x_i)| = O(h).   (6.46)
Note that although the local truncation error (6.29) for
Euler's method is of order h2, (6.46) shows that the total
truncation error is of order h.
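Both the bound (6.45) and the O(h) behavior of (6.46) can be verified numerically. A Python sketch for the text's own test equation (the particular step sizes checked are an illustrative assumption):

```python
import math

def euler_error_at_1(h):
    """Integrate dy/dx = x + y, y(0) = 0, to x = 1 with Euler's method
    and return the magnitude of the overall truncation error there."""
    n = round(1.0 / h)
    x, y = 0.0, 0.0
    for i in range(n):
        y = y + h * (x + y)
        x = (i + 1) * h
    return abs(y - (math.e - 2.0))   # true y(1) = e - 2

# Bounds (6.47) and (6.48) for this equation: M = |e^x|_max = e, K = 1;
# L = x_n - x_0 = 1, so (6.45) gives |eps_n| <= (M h / 2K) e^{LK}.
M, K, L = math.e, 1.0, 1.0
for h in (0.1, 0.05, 0.025):
    assert euler_error_at_1(h) < (M * h / (2.0 * K)) * math.exp(L * K)
```

Halving h roughly halves the actual error, the O(h) behavior of (6.46), while the computed errors always stay below the (much looser) bound, just as Table 6.2 shows.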
For the equation of (6.19),
f(x,y) = x + y,
subject to the initial condition y(x_0) = y(0) = 0, the
analytical solution is
EXAMPLE 6.1
EULER'S METHOD

Problem Statement the true solution (6.23):


Write a program that uses Euler's method to solve the y(x_i) = e^{x_i} − x_i − 1, i = 0, k, 2k, 3k, . . ..
first-order equation (6.19),
Method of Solution
f(x,y) = x + y,
with the initial condition y(x_0) = y(0) = 0. Integrate the y_i = y_{i−1} + h f(x_{i−1}, y_{i−1})
equation on the interval 0 ≤ x ≤ x_max using several = y_{i−1} + h(x_{i−1} + y_{i−1}),
different step sizes, h. Print the results after every k steps
(i.e., for x = x_0, x_k, x_2k, . . .) and compare the results with is used to implement the solution of (6.19).

Flow Diagram

FORTRAN Implementation
List of Principal Variables
Program Symbol Definition
H Step size, h.
I Step counter, i.
IPRINT Number of steps between printout, k.
NSTEPS Total number of steps, xmax/h.
TRUEY True solution, y(xi).
X Independent variable, xi.
XMAX Maximum value of the independent variable, x_max.
Y Computed solution, yi.

Program Listing
C     APPLIED NUMERICAL METHODS, EXAMPLE 6.1
C     EULER'S METHOD
C
C     THIS PROGRAM USES EULER'S METHOD TO COMPUTE THE SOLUTION OF THE
C     ORDINARY DIFFERENTIAL EQUATION DY/DX = X+Y ON THE INTERVAL
C     (0,XMAX) WITH STEP SIZE H AND INITIAL CONDITION Y(0) = 0.
C     I IS THE COUNTER FOR THE NUMBER OF APPLICATIONS OF THE METHOD.
C     NSTEPS IS THE TOTAL NUMBER OF STEPS FOR EULER'S METHOD.
C     TRUEY IS THE ANALYTICAL SOLUTION Y(X) = EXP(X) - X - 1.
C     SOLUTIONS ARE PRINTED AFTER EVERY IPRINT STEPS.
C
      IMPLICIT REAL*8(A-H, O-Z)
C
C     ..... READ DATA AND SET INITIAL CONDITIONS .....
    1 READ (5,100) XMAX, H, IPRINT
      WRITE (6,200) XMAX, H, IPRINT
      X = 0.
      Y = 0.
      TRUEY = 0.
      WRITE (6,201) X, Y, TRUEY
C
C     ..... EULER'S METHOD INTEGRATION .....
      NSTEPS = (XMAX + H/2.)/H
      DO 3 I = 1, NSTEPS
      Y = Y + H*(X + Y)
      X = FLOAT(I)*H
      TRUEY = DEXP(X) - X - 1.
    3 IF (I/IPRINT*IPRINT.EQ.I) WRITE (6,201) X, Y, TRUEY
      GO TO 1
C
C     ..... FORMATS FOR INPUT AND OUTPUT STATEMENTS .....
  100 FORMAT ( 10X,F10.6,17X,F10.6,20X,I5 )
  200 FORMAT ( 10H1XMAX   = , F12.6/ 10H H      = , F12.6/
     1  10H IPRINT = , I5/ 1H0, 6X, 1HX, 15X, 1HY, 13X, 5HTRUEY/ 1H )
  201 FORMAT ( 1H , F10.6, 2F16.6 )
C
      END

Data
(Eight data cards follow, each giving XMAX, H, and IPRINT; in these runs XMAX = 1.0 throughout, with H = 1.0, 0.5, 0.25, 0.1, 0.01, 0.001, 0.0001, and 0.00001.)

Computer Output
Results for the 1st Data Set
XMAX = 1.000000
H = 1.000000
IPRINT = 1
TRUEY

0.0
0.718282
Computer Output (Continued)
Results for the 2nd Data Set
XMAX = 1.000000
H = 0.500000
IPRINT = 1
TRUEY

Results for the 3rd Data Set


XMAX = 1.000000
H = 0.250000
IPRINT = 1
TRUEY

Results for the 4th Data Sei


XMAX = 1.000000
H = 0.100000
IPRINT = 1
TRUEY

Results for the 5th Data Set

XMAX = 1.000000
H = 0.010000
IPRINT = 10
TRUEY

Computer Output (Continued)


Results for the 6th Data Set
XMAX = 1.000000
H = 0.001000
IPRINT = 100
X Y TRUEY

0.0 0.0
0.100000 0.005116
0.200000 0.021281
0.300000 0.049656
0.400000 0.091527
0.500000 0.148309
0.600000 0.221573
0.700000 0.313048
0.800000 0.424651
0.900000 0.558497
1.000000 0.716924

Results for the 7th Data Set


XMAX = 1.000000
H a 0.000100
IPRtNT = 1000

TRUEY

Results for the 8th Data Set


XMAX = 1.000000
H P 0.000010
I PRINT = 10000

X Y TRUEY

0.0 0.0
0.100000 0.005170
0.200000 0.021402
0.300000 0.049857
0.400000 0.091822
0.500000 0.148717
0.600000 0.222113
0.700000 0.313746
0.800000 0.425532
0.900000 0.559592
1.000000 0.718268

Discussion of Results

Equation (6.19) has been solved on the interval [0,1] using Euler's method
with several different step sizes, h = 1.0, 0.5, 0.25, 0.1, 0.01, 0.001,
0.0001, and 0.00001.  The errors in the computed solution at x = 1 as a
function of the step size are shown in Table 6.1.1 and Fig. 6.1.1.  As
expected from the error analysis of Section 6.4, the error decreases linearly
with h for small h (see (6.46)).

Table 6.1.1  Error in Euler's Method Solution at x = 1

     Step size, h     Error at x = 1

     1.0              0.718282
     0.5              0.468282
     0.25             0.276876
     0.1              0.124540
     0.01             0.013468
     0.001            0.001358
     0.0001           0.000136
     0.00001          0.000014

Figure 6.1.1  Error in Euler's method solution at x = 1 (log-log plot of the
error against the step size h, for 0.00001 <= h <= 1.0).
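The entries of Table 6.1.1 are easy to verify with a short program.  The
following sketch (in Python, for a modern reader, rather than the FORTRAN of
the listings; the function name is ours) repeats the Euler integration of
dy/dx = x + y with y(0) = 0 and reports the error at x = 1.

```python
import math

def euler_error_at_1(h):
    """Integrate dy/dx = x + y, y(0) = 0, from x = 0 to x = 1 with
    Euler's method and return the absolute error at x = 1."""
    n = round(1.0 / h)            # number of steps on [0, 1]
    x, y = 0.0, 0.0
    for i in range(1, n + 1):
        y = y + h * (x + y)       # Euler step: y(i) = y(i-1) + h*f(x, y)
        x = i * h                 # advance x as in the FORTRAN listing
    truey = math.exp(1.0) - 2.0   # analytical solution exp(1) - 1 - 1
    return abs(truey - y)

for h in (1.0, 0.5, 0.25, 0.1, 0.01):
    print(f"h = {h:<8} error at x = 1: {euler_error_at_1(h):.6f}")
```

The printed errors for h = 1.0, 0.5, 0.25, 0.1, and 0.01 agree with
Table 6.1.1 (0.718282, 0.468282, 0.276876, 0.124540, and 0.013468), and
halving h roughly halves the error, as the first-order error analysis
predicts.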
EXAMPLE 6.2
ETHANE PYROLYSIS IN A TUBULAR REACTOR

Problem Statement

The pyrolysis of ethane in the temperature range 1200 to 1700°F is represented
essentially by the irreversible first-order chemical reaction

     ethane → ethylene + hydrogen.                                   (6.2.1)

Pure ethane is fed at a rate of 1800 lb/hr at 1200°F to a 4.026 in. I.D. steel
tube contained in an ethane pyrolysis furnace.  Heat is supplied by the
furnace to the tube at a rate of 5000 BTU/hr sq ft (of inside tube area).  The
tube contains no internal obstructions (e.g., catalyst) and any pressure drop
along the length of the tube can be ignored; the mean pressure for the gases
in the tube may be assumed equal to 30 psia.  Assuming plug flow in the tube,
write a program that will calculate the length of tube required to produce 75
percent decomposition of the ethane to ethylene and hydrogen.  The program
should include provision for reading important parameters (ethane feed rate,
inlet temperature, tube diameter, mean pressure, etc.) and printing
temperature (°F) and conversion profiles along the length of the tube (ft) at
desired intervals.

Required thermodynamic properties (standard heats of formation, ΔHf,
temperature-dependent specific heat capacities, cp), kinetic information (the
temperature-dependent rate constant for reaction (6.2.1)), and physical
constants are shown in Table 6.2.1.

Table 6.2.1  Data for Ethane Pyrolysis Reaction*

     Reaction:                     C2H6 → C2H4 + H2
     ΔHf at 298°K (cal/g mole):    C2H6: -20236;  C2H4: 12496;  H2: 0
     cp (cal/g mole °K):           (temperature-dependent polynomials in T;
                                   the tabulated coefficients were not
                                   recovered here)
     Reaction rate constant:       k = 5.764 x 10^16 e^(-41310/T) (sec)^-1
     Atomic Weights:               C = 12, H = 1.
     Gas Constant:                 R = 10.73 psia cu ft/lb mole °R.

     * T is temperature in °K (Kelvin).

Method of Solution

The problem is stated in mixed units, but will be solved using the BTU, lb
mole, °R (Rankine), ft, and hr system.  Adopt the following notation:

     A      = Cross-sectional area of tube, sq ft.
     c      = Concentration of ethane, lb moles/cu ft.
     cp     = Specific heat capacity, BTU/lb mole °R.
     k      = Reaction rate constant, sec^-1.
     L      = Length measured from reactor inlet, ft.
     n0     = Inlet molal feed rate of ethane, lb moles/hr.
     nC2H4  = Molal flow rate of ethylene at any point, lb moles/hr.
     nC2H6  = Molal flow rate of ethane at any point, lb moles/hr.
     nH2    = Molal flow rate of hydrogen at any point, lb moles/hr.
     P      = Total pressure, psia.
     q      = Heat input from furnace, BTU/hr ft (of tube length).
     r      = Specific reaction rate, lb moles ethane/cu ft hr.
     T      = Absolute temperature, °R.
     TK     = Absolute temperature, °K.
     V      = Reactor volume, cu ft.
     x      = Mole fraction of ethane.
     z      = Fraction of ethane converted to ethylene and hydrogen.
     ΔHf    = Heat of formation, cal/g mole.
     ΔHR    = Heat of reaction, BTU/lb mole.

The following conversion constants will be useful in generating a consistent
set of units for the solution of the problem.  The Kelvin, Rankine, and
Fahrenheit temperature scales are related by

     T(°R) = T(°F) + 460,     T(°K) = T(°R)/1.8.

The value of the specific heat capacity in cal/g mole °K is numerically equal
to the specific heat capacity in BTU/lb mole °R.  The heat of reaction ΔHR in
BTU/lb mole is equal to 1.8 times its numerical value in cal/g mole.

The molal flow rates of the three constituents are:

                                             C2H6        C2H4    H2

     Inlet molal flow rate:                  n0          0       0
     Molal flow rate when the
       conversion is z:                      n0(1 - z)   n0 z    n0 z

Thus the total number of moles flowing at any point in the tubular reactor is
n0(1 + z) and the corresponding mole fraction of ethane is
x = (1 - z)/(1 + z).

We now establish steady-state material and energy balances over a differential
element dL of reactor length, shown in Fig. 6.2.1.

Figure 6.2.1  Differential element of tubular reactor.
354 Z h Approximation of the Solution of Ordinary Differential Equations

Material balance.  The material balance for ethane over the differential
volume element dV = A dL is:

     Flowing In + From Reaction = Flowing Out

     nC2H6 + r dV = nC2H6 + dnC2H6.                                 (6.2.2)

Rearranging (6.2.2), and noting that nC2H6 = n0(1 - z), so that
dnC2H6 = -n0 dz and dV = A dL,

     dz/dL = -rA/n0.                                                (6.2.3)

But for the given first-order irreversible reaction, the rate, r, is

     r = -3600 k c,                                                 (6.2.4)

where the conversion constant 3600 sec/hr is introduced for dimensional
consistency.  At the low pressure of the reacting gas mixture, the ideal gas
law holds, so that (6.2.4) becomes

     r = -3600 k Px/(RT).                                           (6.2.5)

From (6.2.3), (6.2.5), and Table 6.2.1, the conversion z as a function of
reactor length, L, is described by the solution of the first-order ordinary
differential equation

     dz/dL = (2.075 x 10^20 AP/n0 R) e^(-41310/TK) (1 - z)/[T(1 + z)].   (6.2.6)

For a given tube I.D., reactor pressure, and inlet ethane flow rate, the first
factor on the right-hand side of (6.2.6) is constant; the second factor is a
function of temperature only.

Energy balance.  The energy balance over the differential volume element dV
must account for heat liberated due to reaction, heat introduced from the
furnace through the tube wall, and sensible heat effects (because of
temperature change) in the flowing gas stream.  For a change dz in conversion
in the differential element, the heat liberated by the reaction is
n0 dz(-ΔHR).  The energy introduced into the differential element through the
tube wall is q dL.  The corresponding gain in enthalpy of the flowing gases is
given by

     n0[(1 - z) cp,C2H6 + z cp,C2H4 + z cp,H2] dT,                  (6.2.7)

where dT is the temperature change across the differential element.  The
energy balance is

     n0(-ΔHR) dz + q dL = n0[(1 - z) cp,C2H6 + z cp,C2H4 + z cp,H2] dT.   (6.2.8)

Then the change in temperature as a function of length is described by the
first-order ordinary differential equation

     dT/dL = [q/n0 - ΔHR (dz/dL)] /
             [(1 - z) cp,C2H6 + z cp,C2H4 + z cp,H2].               (6.2.9)

Provided that the temperature TK is computed from temperature T, the heat
capacity relationships from Table 6.2.1 can be introduced directly into the
denominator of (6.2.9).

The heat of reaction, ΔHR, varies with temperature according to

     ΔHR(T) = ΔHR(298°K) + integral from 298 to TK of
              [cp,C2H4 + cp,H2 - cp,C2H6] dT.                       (6.2.10)

The heat of reaction at 298°K can be calculated from the heats of formation of
Table 6.2.1 as 12496 + 20236 = 32732 cal/g mole.  Introducing the heat
capacity relationships of Table 6.2.1 into (6.2.10), and integrating, yields
the heat of reaction at any temperature T (°R); the leading terms of the
integrated polynomial were not recovered here, but it closes with

     ... + 1.28 x 10^-6 (TK^3 - 298^3)] BTU/lb mole.                (6.2.11)

Algorithm.  Equations (6.2.6) and (6.2.9) are two coupled nonlinear
first-order ordinary differential equations which must be solved
simultaneously.  The length L is the independent variable, and z and T are the
dependent (solution) variables.  The initial conditions are

     z(L0) = z0 = z(0) = 0     (no conversion at inlet),
     T(L0) = T0 = T(0) = TF    (inlet feed temperature).

The two differential equations can be solved in parallel using Euler's method
of (6.28).  Let the step size be denoted by ΔL.  Then (6.28) becomes

     z(i) = z(i-1) + ΔL (dz/dL) at i-1,                             (6.2.12)
     T(i) = T(i-1) + ΔL (dT/dL) at i-1.                             (6.2.13)

Here, z(i) and T(i) are, respectively, the conversion and temperature at
L(i) = iΔL.  At the beginning of the ith step the values of z(i-1) and T(i-1)
have already been calculated.  The values of dz/dL and dT/dL for equations
(6.2.12) and (6.2.13) can be computed from (6.2.6) and (6.2.9).  Then z(i) and
T(i) can be computed from (6.2.12) and (6.2.13).  The process is repeated for
subsequent steps until z(i) exceeds some desired upper limit zmax,
0 < zmax <= 1.  The program contains provision for printing values for L(i),
z(i), and T(i) (converted to °F) at specified intervals in i.

Flow Diagram  (The diagram reads T0, P, n0, D, ΔL, zmax, and the print
frequency; sets L0 = 0 and z0 = 0; and repeatedly applies (6.2.12) and
(6.2.13), evaluating ΔHR from (6.2.11) and the derivatives from (6.2.6) and
(6.2.9), until z(i) >= zmax.)
FORTRAN Implementation
List of Principal Variables

Program Symbol     Definition

AL                 Length, L, ft.
CONV               Desired conversion fraction for ethane, zmax.
CP                 Denominator of equation (6.2.9), BTU/lb mole °R.
DH                 ΔHR, BTU/lb mole (see equation (6.2.11)).
DI                 Internal diameter of reactor tube, D, in.
DL                 Length increment, ΔL, ft.
DTDL               (dT/dL) at i-1, °R/ft.
DZDL               (dz/dL) at i-1, ft^-1.
FACTOR             2.075 x 10^20 AP/n0 R (see equation (6.2.6)).
I                  Step counter, i.
IPRINT             Intermediate print control.  Results are printed after
                   every IPRINT steps.
P                  Pressure in the reactor, P, psia.
Q                  q/n0, BTU/lb mole ft.
QPERSF             Heat input from furnace, BTU/hr sq ft (of inside tube
                   surface area).
RATEMS             Inlet mass feed rate of ethane, lb/hr.
T, TF, TK          Temperature, °R, °F, and °K, respectively.
Z                  Fraction of ethane converted, z.

Program Listing
C     APPLIED NUMERICAL METHODS, EXAMPLE 6.2
C     PYROLYSIS OF ETHANE IN A TUBULAR REACTOR
C
C     THIS PROGRAM USES EULER'S METHOD TO SOLVE A SYSTEM OF TWO
C     FIRST ORDER ORDINARY DIFFERENTIAL EQUATIONS DESCRIBING THE
C     AXIAL TEMPERATURE (T) AND CONVERSION (Z) PROFILES FOR
C     PYROLYSIS OF ETHANE IN A TUBULAR REACTOR.  THE DISTANCE FROM
C     THE TUBE ENTRANCE, AL, IS THE INDEPENDENT VARIABLE AND DL IS
C     THE STEP-SIZE USED IN THE INTEGRATION PROCEDURE.  DTDL AND
C     DZDL ARE THE DERIVATIVES OF TEMPERATURE AND CONVERSION WITH
C     RESPECT TO LENGTH AT THE BEGINNING OF EACH STEP.  TF IS THE
C     FAHRENHEIT AND TK THE KELVIN EQUIVALENT OF THE TEMPERATURE
C     T WHICH HAS UNITS OF DEGREES RANKINE.  INPUT DATA INCLUDE
C     THE INLET TEMPERATURE TF, INLET PRESSURE P, MASS RATE OF
C     ETHANE FEED RATEMS, INTERNAL DIAMETER OF THE TUBE DI,
C     HEAT INPUT RATE FROM THE FURNACE QPERSF, THE STEP-SIZE DL,
C     THE MAXIMUM CONVERSION DESIRED CONV, AND THE FREQUENCY OF
C     PRINTOUT IPRINT.  DH IS THE HEAT OF REACTION AND CP A HEAT
C     CAPACITY TERM.  Q, FACTOR AND CP ARE DESCRIBED IN THE TEXT.
C     INTEGRATION PROCEEDS DOWN THE TUBE UNTIL THE INDICATED
C     CONVERSION FRACTION CONV IS REACHED.  I IS THE STEP COUNTER.
C
C     ..... READ AND PRINT DATA .....
    1 READ (5,100)  TF, P, RATEMS, DI, QPERSF, CONV, DL, IPRINT
      WRITE (6,200)  TF, P, RATEMS, DI, QPERSF, CONV, DL, IPRINT
C
C     ..... COMPUTE CONSTANT PARAMETERS .....
      Q = QPERSF*30./RATEMS*3.14159*DI/12.
      FACTOR = 2.07504E20*3.14159*DI*DI*P*30./(144.*4.*10.73*RATEMS)
C
C     ..... SET INITIAL CONDITIONS .....
      AL = 0.
      Z = 0.
      T = TF + 460.
      WRITE (6,201)  AL, TF, Z
C
C     ..... INTEGRATE EQUATIONS ACROSS IPRINT STEPS .....
    2 DO 3  I = 1, IPRINT
C
C     ..... DERIVATIVES FROM MATERIAL AND ENERGY BALANCES .....
      TK = T/1.8
C        (THE ASSIGNMENTS OF DH, FROM EQN. (6.2.11), AND CP, FROM THE
C        DENOMINATOR OF EQN. (6.2.9), ARE ONLY PARTLY LEGIBLE IN THE
C        SOURCE; THE SURVIVING CONTINUATION FRAGMENT READS
C    1   *TK-6.28E-6*TK*TK)                                          )
      DZDL = FACTOR*EXP(-41310./TK)*(1.-Z)/(T*(1.+Z))
      DTDL = (Q - DH*DZDL)/CP
C
C     ..... APPLY EULER'S ALGORITHM .....
      T = T + DTDL*DL
      Z = Z + DZDL*DL
      AL = AL + DL
      IF ( Z.GE.CONV )  GO TO 4
    3 CONTINUE
C
C     ..... PRINT SOLUTIONS, CONTINUE INTEGRATION IF NOT DONE .....
    4 TF = T - 460.
      WRITE (6,201)  AL, TF, Z
      IF ( Z.LT.CONV )  GO TO 2
      GO TO 1
C
C     ..... FORMATS FOR INPUT AND OUTPUT STATEMENTS .....
  100 FORMAT ( 9X, F8.3, 19X, F8.3, 19X, F8.3/
     1         9X, F8.3, 19X, F8.3, 19X, F8.3/
     2         9X, F8.3, 19X, I4 )
  200 FORMAT ( 10H1TF     = , F10.4/ 10H P      = , F10.4/
     1         10H RATEMS = , F10.4/ 10H DI     = , F10.4/
     2         10H QPERSF = , F10.4/ 10H CONV   = , F10.4/
     3         10H DL     = , F10.4/ 10H IPRINT = , I5/
     4         1H0, 3X, 1HL, 12X, 2HTF, 13X, 1HZ / 1H  )
  201 FORMAT ( 1H , F7.2, F14.2, F14.6 )
C
      END
Data
TF = 1200.000     P      =   30.000     RATEMS = 1800.000
DI =    4.026     QPERSF = 5000.000     CONV   =    0.750
DL =    0.100     IPRINT =  500
TF = 1200.000     P      =   30.000     RATEMS = 1800.000
DI =    4.026     QPERSF = 5000.000     CONV   =    0.750
DL =    0.500     IPRINT =  100
TF = 1200.000     P      =   30.000     RATEMS = 1800.000
DI =    4.026     QPERSF = 5000.000     CONV   =    0.750
DL =    1.000     IPRINT =
TF = 1200.000     P      =   30.000     RATEMS = 1800.000
DI =    4.026     QPERSF = 5000.000     CONV   =    0.750
DL =    5.000     IPRINT =   10
TF = 1200.000     P      =   30.000     RATEMS = 1800.000
DI =    4.026     QPERSF = 5000.000     CONV   =    0.750
DL =   10.000     IPRINT =    5
TF = 1200.000     P      =   30.000     RATEMS = 1800.000
DI =    4.026     QPERSF = 5000.000     CONV   =    0.750
DL =   25.000     IPRINT =
TF = 1200.000     P      =   30.000     RATEMS = 1800.000
DI =    4.026     QPERSF = 5000.000     CONV   =    0.750
DL =   50.000     IPRINT =    1
TF = 1200.000     P      =   30.000     RATEMS = 1800.000
DI =    4.026     QPERSF = 5000.000     CONV   =    0.750
DL =  100.000     IPRINT =
TF = 1200.000     P      =   30.000     RATEMS = 1800.000
DI =    4.026     QPERSF = 5000.000     CONV   =    0.750
DL =  200.000     IPRINT =    1

Computer Output

Results for the 1st Data Set          Results for the 2nd Data Set

TF     = 1200.0000                    TF     = 1200.0000
P      =   30.0000                    P      =   30.0000
RATEMS = 1800.0000                    RATEMS = 1800.0000
DI     =    4.0260                    DI     =    4.0260
QPERSF = 5000.0000                    QPERSF = 5000.0000
CONV   =    0.7500                    CONV   =    0.7500
DL     =    0.1000                    DL     =    0.5000
IPRINT =      500                     IPRINT =      100

Computer Output (Continued)

Results for the 3rd Data Set          Results for the 4th Data Set

TF     = 1200.0000                    TF     = 1200.0000
P      =   30.0000                    P      =   30.0000
RATEMS = 1800.0000                    RATEMS = 1800.0000
DI     =    4.0260                    DI     =    4.0260
QPERSF = 5000.0000                    QPERSF = 5000.0000
CONV   =    0.7500                    CONV   =    0.7500
DL     =    1.0000                    DL     =    5.0000
IPRINT =                              IPRINT =       10

Results for the 5th Data Set          Results for the 6th Data Set

TF     = 1200.0000                    TF     = 1200.0000
P      =   30.0000                    P      =   30.0000
RATEMS = 1800.0000                    RATEMS = 1800.0000
DI     =    4.0260                    DI     =    4.0260
QPERSF = 5000.0000                    QPERSF = 5000.0000
CONV   =    0.7500                    CONV   =    0.7500
DL     =   10.0000                    DL     =   25.0000
IPRINT =        5                     IPRINT =
Computer Output (Continued)
Results for the 7th Data Set          Results for the 8th Data Set

TF     = 1200.0000                    TF     = 1200.0000
P      =   30.0000                    P      =   30.0000
RATEMS = 1800.0000                    RATEMS = 1800.0000
DI     =    4.0260                    DI     =    4.0260
QPERSF = 5000.0000                    QPERSF = 5000.0000
CONV   =    0.7500                    CONV   =    0.7500
DL     =   50.0000                    DL     =  100.0000
IPRINT =        1                     IPRINT =

Results for the 9th Data Set

TF     = 1200.0000
P      =   30.0000
RATEMS = 1800.0000
DI     =    4.0260
QPERSF = 5000.0000
CONV   =    0.7500
DL     =  200.0000
IPRINT =        1

Discussion of Results

The program has been run using the parameters given in the problem statement
with several different values for the length increment, ΔL.  All calculations
have been made using single-precision arithmetic (note the influence of
round-off error in the tabulated values for L in the results for the first
data set).  The total reactor length and the temperature at the reactor
outlet corresponding to 75 percent conversion of ethane, determined by linear
interpolation on the computed results, are shown in Table 6.2.2.

Table 6.2.2  Reactor Length and Outlet Temperature
for 75 Percent Conversion of Ethane

     Length Increment (ft)     Reactor Length (ft)     Outlet Temperature (°F)

     (tabulated values not recovered)

The results are almost identical for length increments of 0.1 ft to 25 ft,
suggesting that the truncation errors are small and the results accurate.
There is very little conversion of ethane in the first 50 ft of tube length;
the gas temperature is rising (because of heat input from the furnace through
the tube wall) but the reaction velocity constant is quite small, and only a
small amount of decomposition takes place.  Once the temperature is high
enough, the rate of reaction becomes significant.  The temperature rises only
very slowly thereafter; the heat introduced from the furnace is roughly
balanced by the heat of reaction for the endothermic (energy-consuming)
pyrolysis reaction.  Euler's method produces rather accurate results (not
generally the case) for substantial length increments, because the
derivatives of (6.2.6) and (6.2.9) are small and virtually constant over a
substantial portion of the reactor length.

Significant errors begin to show up in the results for a length increment of
50 ft.  Notice the curious oscillation of temperature down the length of the
reactor; this behavior would be quite unreasonable from physical and chemical
considerations.  Euler's method produces these oscillations because the
temperature at the beginning of a length increment is assumed to hold
throughout the entire increment.  When the temperature is rather low, the
rate of reaction will be low, and heat from the furnace will be absorbed
primarily as sensible heat, raising the temperature at the beginning of the
next segment.  This higher temperature will lead to a high reaction rate and
to the consumption of all of the heat from the furnace (and some of the
sensible heat of the reacting gases) by the decomposition reaction, leading
to a lower gas temperature at the beginning of the next increment.  As the
length increment is made larger, these oscillations become more serious.  For
length increments of 100 and 200 ft, the results become completely
meaningless.

This problem has also been solved by using the integration function of
Example 6.3 named RUNGE, which employs the fourth-order Runge-Kutta method
described in the next section (results not shown).  Results were virtually
identical with those for the Euler method solution for the smaller step
sizes; the Runge-Kutta solutions were unstable for ΔL > 25 ft.

It should be mentioned that the method of solution outlined earlier assumes
that the pressure is constant throughout the length of the reactor.  This
assumption would be unwarranted for certain combinations of feed rate and
tube diameter (see Problem 6.20 at the end of the chapter).

6.5  Runge-Kutta Methods

The solution of a differential equation by direct Taylor's expansion of the
object function is generally not practical if derivatives of higher order
than the first are retained.  For all but the simplest equations, the
necessary higher-order derivatives tend to become quite complicated.  In
addition, as shown by the preceding examples, each problem results in a
specific series for its solution.  Thus, when higher-order error terms are
desired, no simple algorithm analogous to Euler's method can be developed
directly from the Taylor's expansion.

Fortunately, it is possible to develop one-step procedures which involve only
first-order derivative evaluations, but which also produce results equivalent
in accuracy to the higher-order Taylor formulas.  These algorithms are called
the Runge-Kutta methods.  Approximations of the second, third, and fourth
orders (that is, approximations with accuracies equivalent to Taylor's
expansions of y(x) retaining terms in h^2, h^3, and h^4, respectively)
require the estimation of f(x,y) at two, three, and four values,
respectively, of x on the interval x_i <= x <= x_{i+1}.  Methods of order m,
where m is greater than four, require derivative evaluations at more than m
nearby points [14].

All the Runge-Kutta methods have algorithms of the form

     y_{i+1} = y_i + h φ(x_i, y_i, h).                                (6.50)

Here, φ, termed the increment function by Henrici [2], is simply a suitably
chosen approximation to f(x,y) on the interval x_i <= x <= x_{i+1}.  Because
there is a considerable amount of algebra involved in the development of the
higher-order Runge-Kutta formulas, only the simplest of these procedures (the
second-order algorithm) will be developed in detail.  The derivation of other
Runge-Kutta formulas is analogous to the one which follows.

Let φ be a weighted average of two derivative evaluations k_1 and k_2 on the
interval x_i <= x <= x_{i+1}, that is,

     φ = a k_1 + b k_2,                                               (6.51)

which leads to the Runge-Kutta algorithm

     y_{i+1} = y_i + h(a k_1 + b k_2).                                (6.52)

Let

     k_1 = f(x_i, y_i),
     k_2 = f(x_i + ph, y_i + qh f(x_i, y_i)) = f(x_i + ph, y_i + qh k_1),  (6.53)

where p and q are constants to be established later.  The quantities hk_1 and
hk_2 have a simple geometric interpretation, which will become apparent once
p and q are determined.

First, expand k_2 in a Taylor's series for a function of two variables*, and
drop all terms in which the exponent of h is greater than one:

     k_2 = f(x_i + ph, y_i + qh f(x_i, y_i))
         = f(x_i, y_i) + ph f_x(x_i, y_i)
           + qh f(x_i, y_i) f_y(x_i, y_i) + O(h^2).                   (6.54)

From (6.53) and (6.54), (6.52) becomes

     y_{i+1} = y_i + h[a f(x_i, y_i) + b f(x_i, y_i)]
               + h^2[bp f_x(x_i, y_i) + bq f(x_i, y_i) f_y(x_i, y_i)]
               + O(h^3).                                              (6.55)

Next, expand the object function y(x) about x_i by using Taylor's series
(6.25), as before:

     y(x_i + h) = y(x_{i+1}) = y(x_i) + h f(x_i, y(x_i))
                  + (h^2/2) f'(x_i, y(x_i)) + (h^3/6) f''(ξ, y(ξ)).   (6.56)

By chain-rule differentiation (6.10), f'(x_i, y(x_i)) is given by

     f'(x_i, y(x_i)) = f_x(x_i, y(x_i)) + f_y(x_i, y(x_i)) f(x_i, y(x_i)).

Finally, equate terms in like powers of h in (6.55) and (6.56):

     Power of h     Expansion of y(x)                Runge-Kutta Algorithm

     h              f(x_i, y_i)                      (a + b) f(x_i, y_i)
     h^2            (1/2)[f_x(x_i, y_i)              bp f_x(x_i, y_i)
                    + f(x_i, y_i) f_y(x_i, y_i)]     + bq f(x_i, y_i) f_y(x_i, y_i)

Assuming that y_i = y(x_i) and that we want equality of the coefficients of
h^2 for all suitably differentiable functions f(x,y), we find that

     a + b = 1,
     bp = 1/2,                                                        (6.57)
     bq = 1/2.

Since the three equations of (6.57) contain four unknowns, the system is
underdetermined; that is, there is one variable, say b, which may be chosen
arbitrarily.  The two common choices are b = 1/2 and b = 1.

* The first few terms of the two-variable Taylor's series are:

     f(x + r, y + s) = f(x,y) + r f_x(x,y) + s f_y(x,y) + r^2 f_xx(x,y)/2
                       + rs f_xy(x,y) + s^2 f_yy(x,y)/2 + O[(|r| + |s|)^3].

For b = 1/2:  a = 1/2, p = 1, q = 1.  Then (6.52) becomes

     y_{i+1} = y_i + (h/2)[f(x_i, y_i) + f(x_i + h, y_i + h f(x_i, y_i))],  (6.58)

which can also be written

     y_{i+1} = y_i + (h/2)[f(x_i, y_i) + f(x_{i+1}, ȳ_{i+1})],        (6.59)

where

     ȳ_{i+1} = y_i + h f(x_i, y_i).                                   (6.60)

The one-step algorithm of (6.59) and (6.60) is known as the improved Euler's
method or Heun's method, and has the geometric interpretation shown in
Fig. 6.3a.

Figure 6.3  Second-order Runge-Kutta procedures.

Essentially, Euler's method is employed twice in sequence.  First, equation
(6.60) is used to predict ȳ_{i+1}, a preliminary estimate of y_{i+1}.  That
is, ȳ_{i+1} is the ordinate, at x = x_{i+1}, of the straight line (1),
passing through (x_i, y_i) with slope f(x_i, y_i) = k_1.  Second, an improved
estimate y_{i+1} is obtained from equation (6.59); the slope of line (2),
used for this purpose, is the weighted average of approximations to f at the
two ends of the interval.  Note that although the true derivative at x_{i+1}
is f(x_{i+1}, y(x_{i+1})), it is approximated by f(x_{i+1}, ȳ_{i+1}), since
y(x_{i+1}) is unknown.

Euler's algorithm of (6.60) may be viewed as a predicting equation for
ȳ(1)_{i+1} (the first approximation to y_{i+1}), whereas (6.59) may be
considered a correcting equation to produce an improved estimate of y_{i+1}.
Equation (6.59) may be used iteratively to produce a sequence of corrected
y_{i+1} values, ȳ(2)_{i+1}, ȳ(3)_{i+1}, ..., ȳ(n)_{i+1}.  In this case, the
pair of equations, (6.60) and (6.59), leads to the simplest of the so-called
predictor-corrector methods, which are described in more detail in
Section 6.11.

For b = 1:  a = 0, p = 1/2, q = 1/2.  Then (6.52) becomes

     y_{i+1} = y_i + h f(x_i + h/2, ȳ_{i+1/2}),                       (6.61)

where

     ȳ_{i+1/2} = y_i + (h/2) f(x_i, y_i).                             (6.62)

The one-step algorithm of (6.61) and (6.62) is called the improved polygon
method or the modified Euler's method, illustrated in Fig. 6.3b.  Again,
Euler's method is employed twice in sequence.  First, from (6.62), an
approximation ȳ_{i+1/2} is obtained at the halfway point x_i + h/2.  Second,
(6.61) evaluates f(x,y) for x = x_i + h/2, y = ȳ_{i+1/2}, and uses this as
the average derivative for proceeding over the whole interval.

The higher-order Runge-Kutta methods are developed analogously.  For example,
the increment function for the third-order method is

     φ = a k_1 + b k_2 + c k_3,

where k_1, k_2, and k_3 approximate the derivative at various points on the
integration interval [x_i, x_{i+1}].  In this case,

     k_1 = f(x_i, y_i),
     k_2 = f(x_i + ph, y_i + ph k_1),
     k_3 = f(x_i + rh, y_i + (r - s)h k_1 + sh k_2).
The third-order Runge-Kutta algorithms are given by

     y_{i+1} = y_i + h(a k_1 + b k_2 + c k_3).                        (6.63)

To determine the constants a, b, c, p, r, and s, we first expand k_2 and k_3
about (x_i, y_i) in Taylor's series as functions of two variables.  The
object function y(x) is expanded in a Taylor's series as before, (6.25).
Coefficients of like powers of h through the h^3 terms in (6.63) and (6.25)
are equated to produce a formula with a local truncation error of order h^4.
Details of the argument are essentially the same as in the development of the
second-order methods.

Again, there are fewer equations than unknowns:

     a + b + c = 1,
     bp + cr = 1/2,
     bp^2 + cr^2 = 1/3,
     cps = 1/6.

Two of the constants a, b, c, p, r, and s are arbitrary.  For one set of
constants, selected by Kutta, the third-order method is:

     y_{i+1} = y_i + (h/6)(k_1 + 4 k_2 + k_3),
     k_1 = f(x_i, y_i),
     k_2 = f(x_i + h/2, y_i + (h/2) k_1),                             (6.64)
     k_3 = f(x_i + h, y_i + 2h k_2 - h k_1).

Note that if f(x,y) is a function of x only, then (6.64) reduces to Simpson's
rule (2.21b).

All the fourth-order formulas are of the form

     y_{i+1} = y_i + h(a k_1 + b k_2 + c k_3 + d k_4),                (6.65)

where k_1, k_2, k_3, and k_4 are approximate derivative values computed on
the interval x_i <= x <= x_{i+1}.  Several fourth-order algorithms are used.
The following is attributed to Kutta:

     y_{i+1} = y_i + (h/6)(k_1 + 2 k_2 + 2 k_3 + k_4),
     k_1 = f(x_i, y_i),
     k_2 = f(x_i + h/2, y_i + (h/2) k_1),                             (6.66)
     k_3 = f(x_i + h/2, y_i + (h/2) k_2),
     k_4 = f(x_i + h, y_i + h k_3).

Again, note that (6.66) reduces to Simpson's rule if f(x,y) is a function of
x only.  Another fourth-order method, also ascribed to Kutta, is:

     y_{i+1} = y_i + (h/8)(k_1 + 3 k_2 + 3 k_3 + k_4),
     k_1 = f(x_i, y_i),
     k_2 = f(x_i + h/3, y_i + (h/3) k_1),                             (6.67)
     k_3 = f(x_i + 2h/3, y_i - (h/3) k_1 + h k_2),
     k_4 = f(x_i + h, y_i + h k_1 - h k_2 + h k_3).

This reduces to Simpson's second rule (2.21c) if f(x,y) is a function of x
only.

The most widely used fourth-order method (and very likely the most widely
used single-step method for solving ordinary differential equations as well)
is the one credited to Gill [6]:

     y_{i+1} = y_i + (h/6)[k_1 + (2 - √2) k_2 + (2 + √2) k_3 + k_4],
     k_1 = f(x_i, y_i),
     k_2 = f(x_i + h/2, y_i + (h/2) k_1),                             (6.68)
     k_3 = f(x_i + h/2, y_i + ((√2 - 1)/2)h k_1 + (1 - √2/2)h k_2),
     k_4 = f(x_i + h, y_i - (√2/2)h k_2 + (1 + √2/2)h k_3).

Originally, the constants were chosen to reduce the amount of temporary
storage required in the solution of sizable systems of first-order equations.
With the advent of machines having large memories, the necessity for saving a
few memory locations has largely disappeared, but the Runge-Kutta subroutines
in most computer-program libraries still employ the Gill constants.

Runge-Kutta formulas of higher order can be developed by extending the
procedures outlined in this section (families of fifth-order formulas can be
found in [25, 40, 41]).

6.6  Truncation Error, Stability, and Step-Size Control in the Runge-Kutta
Algorithms

Since the mth-order Runge-Kutta algorithms of Section 6.5 were generated by
requiring that (6.50) agree with the Taylor's expansion of the solution
function y(x) through terms of order h^m, the local truncation error e_t is
of the form

     e_t = K h^(m+1) + O(h^(m+2)),                                    (6.69)

where K depends (in a complicated way usually) upon f(x,y) and its
higher-order partial derivatives.  If one assumes that h is sufficiently
small, so that the error is dominated by the first term in (6.69), it is
possible, though not at all simple, to find bounds for K [7, 8].  In general,
such bounds depend upon bounds for f(x,y) and its various partial
derivatives, and upon the particular Runge-Kutta method used.  Ralston [8]
shows that particular choices of the free parameters in the underdetermined
equations for the constants in the algorithm [see P6.15] will tend to
minimize the upper bound on K.
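The classical fourth-order formulas of (6.66) translate directly into code.
The Python sketch below (function name ours) applies them to dy/dx = x + y
from Example 6.1 and shows the dramatic gain in accuracy over the lower-order
methods.

```python
import math

def rk4_step(f, x, y, h):
    """One step of the fourth-order Runge-Kutta method of eqn (6.66)."""
    k1 = f(x, y)
    k2 = f(x + h / 2.0, y + h / 2.0 * k1)
    k3 = f(x + h / 2.0, y + h / 2.0 * k2)
    k4 = f(x + h, y + h * k3)
    return y + h / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

f = lambda x, y: x + y            # dy/dx = x + y, y(0) = 0 (Example 6.1)
x, y, h = 0.0, 0.0, 0.1
for i in range(10):               # integrate from x = 0 to x = 1
    y = rk4_step(f, x, y, h)
    x += h

error = abs((math.exp(1.0) - 2.0) - y)
print(f"y(1) = {y:.8f}, error = {error:.2e}")
```

With h = 0.1 the error at x = 1 is on the order of 10^-6, compared with about
0.12 for Euler's method (Table 6.1.1) and about 0.004 for the second-order
methods, reflecting the O(h^4) accumulated error of the fourth-order formula.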

In order to choose a reasonable step size, one needs some estimate of the
error being committed in integrating across one step.  On the one hand, the
step size should be small enough to achieve required accuracy (if possible);
on the other, it should be as large as possible in order to keep rounding
errors (a function of the number of arithmetic operations performed) under
control and to avoid an excessive number of derivative evaluations.  This
latter consideration is very important, particularly when the differential
equation is complicated and each derivative evaluation requires substantial
computing time.  For the mth-order methods of the previous section, the
derivative must be evaluated m times for each integration step.

One approach to this problem is to assume that the local truncation errors
have the form K h^(m+1) with K constant, and that the local truncation error,
committed in traversing one step, dominates the change in total error for the
step.  Then an estimate of the local truncation error can be found by
integrating between two points, say x_n and x_{n+1}, using two different step
sizes h_1 and h_2 to evaluate y_{n+1}; let the corresponding solutions be
y_{n+1,h_1} and y_{n+1,h_2}.  Then if y*_{n+1} is the "true" solution, we can
employ Richardson's extrapolation technique described on page 78 as follows*:

     y*_{n+1} - y_{n+1,h_1} ≈ K (x_{n+1} - x_n) h_1^m,
     y*_{n+1} - y_{n+1,h_2} ≈ K (x_{n+1} - x_n) h_2^m.                (6.70)

Dividing the first of these equations by the second, and solving for
y*_{n+1}, yields

     y*_{n+1} ≈ y_{n+1,h_2} + (y_{n+1,h_2} - y_{n+1,h_1})/[(h_1/h_2)^m - 1].   (6.71)

If we choose h_2 = h_1/2, (6.71) becomes

     y*_{n+1} ≈ y_{n+1,h_2} + (y_{n+1,h_2} - y_{n+1,h_1})/(2^m - 1),  (6.72)

and an estimate of the local truncation error for the solution y_{n+1,h_2},
assuming that (x_{n+1} - x_n) = h_1, is given by (6.70) and (6.72) as

     e_t ≈ (y_{n+1,h_2} - y_{n+1,h_1})/(2^m - 1).                     (6.73)

For the fourth-order Runge-Kutta method, m = 4 and (6.73) becomes

     e_t ≈ (y_{n+1,h_2} - y_{n+1,h_1})/15.                            (6.74)

Unfortunately, if we use (6.73) as a monitoring procedure for the integration
step size on every integration interval, the total number of calculations is
approximately trebled over the number required for integration using just one
step size, h_1.  As a compromise, we could use the monitoring procedure less
frequently, for instance, for every kth step.

Another criterion, suggested by Collatz [9] for the Runge-Kutta method of
(6.66), is to calculate |(k_3 - k_2)/(k_2 - k_1)| after each integration
step.  If the ratio becomes large (more than a few hundredths [10]), then the
step size should be decreased.  This is only a very qualitative guideline,
but has the virtue that the added computation is negligible.

If m is odd, Call and Reeves [11] suggest integrating in the reverse
direction, from x_{n+1} to x_n with h replaced by -h, after having integrated
across the step in the forward direction.  The truncation error is estimated
as half the difference between y_n and y_n*, where y_n* is the solution found
as a result of the reverse integration.  Unfortunately, when m is even, the
method fails, since the truncation errors in one direction exactly cancel
those in the other and, aside from rounding errors, y_n = y_n*.

Determining a bound for the accumulated or propagated error for the
Runge-Kutta algorithms is difficult [12, 13].  Reported bounds are usually
very conservative [7]; in addition, the parameters essential for their
computation are only rarely available.  In general, if the local truncation
error for a one-step method is of O(h^(m+1)), then the accumulated error will
be of O(h^m) [3, 4]; that is, the reduction in the order of the error is
similar to that observed for Euler's method.

All the Runge-Kutta methods can be shown [2] to be convergent, that is,
lim as h→0 of (y_i - y(x_i)) = 0 (see page 347).  Another criterion for
selecting an algorithm for the solution of a differential equation with given
initial conditions is stability.  Stability is a somewhat ambiguous term and
appears in the literature with a variety of qualifying adjectives (inherent,
partial, relative, weak, strong, absolute, etc.).  In general, a solution is
said to be unstable if errors introduced at some stage in the calculations
(for example, from erroneous initial conditions or local truncation or
round-off errors) are propagated without bound throughout subsequent
calculations.

Certain equations with specified initial conditions cannot be solved by any
step-by-step integration procedure without exhibiting instability, and are
said to be inherently unstable.  For example, consider the equation of
(6.19),

     dy/dx = x + y,

for which the analytical solution is given by (6.23) as

     y(x) = -x - 1 + [1 + x_0 + y(x_0)] e^(-x_0) e^x.

With the initial condition y(0) = -1, the analytical solution is

     y(x) = -x - 1.

* Throughout this section, n should be considered as a general subscript, and
not the subscript of the final base point as in (6.9).
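The step-halving estimate of (6.73) is simple to exercise numerically.  In
the Python sketch below (helper names are ours) the interval from x_n to
x_{n+1} = x_n + h_1 is traversed once with step h_1 and twice with
h_2 = h_1/2 using the fourth-order method of (6.66), and the estimate of
(6.74) is compared with the actual error of the half-step solution.

```python
import math

def rk4_step(f, x, y, h):
    # Fourth-order Runge-Kutta step, eqn (6.66)
    k1 = f(x, y)
    k2 = f(x + h / 2.0, y + h / 2.0 * k1)
    k3 = f(x + h / 2.0, y + h / 2.0 * k2)
    k4 = f(x + h, y + h * k3)
    return y + h / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

f = lambda x, y: x + y          # dy/dx = x + y, y(0) = 0 (Example 6.1)
x, y, h1 = 0.0, 0.0, 1.0

y_h1 = rk4_step(f, x, y, h1)                  # one step of size h1
y_mid = rk4_step(f, x, y, h1 / 2.0)           # two steps of size h2 = h1/2
y_h2 = rk4_step(f, x + h1 / 2.0, y_mid, h1 / 2.0)

estimate = (y_h2 - y_h1) / 15.0               # eqn (6.74), m = 4
actual = (math.exp(x + h1) - (x + h1) - 1.0) - y_h2
print(f"estimated error {estimate:.2e}, actual error {actual:.2e}")
```

For this smooth problem the single computation of (6.74) agrees with the true
error of the half-step solution in sign and order of magnitude, which is all
that the monitoring procedure requires of it.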
Thus the exponential term in the general solution vanishes because of the
particular choice of the initial condition. Even a very tiny change in the
initial condition (for example, y(0) = −0.99999) will eventually cause a
drastic change in the magnitude (even the sign in this case) of the
solution for large values of x. Therefore, even though the coefficient of
the exponential term is quite small, the contribution of the exponential
term will eventually swamp the contribution of the linear terms in the
solution. When such an equation is solved by using one of the one-step
methods, each new step may be viewed as the solution of a new
initial-value problem. Even if the initial condition is error-free for the
first step, the initial conditions for subsequent steps will inevitably
contain errors introduced by truncation and round-off in preceding steps;
the calculated solution for large x will bear no resemblance to the true
solution.

Inherent instability is associated with the equation being solved and the
initial conditions specified, but does not depend on the particular
algorithm being used. Depending on the equation being solved, its initial
conditions, and the particular one-step method being used, another form of
instability, partial instability [15], may be observed, even when the
equation is not inherently unstable. This phenomenon is related to the
step size chosen, and is perhaps seen most easily by examining the Euler's
method algorithm of (6.28b). From (6.38), the total error at x_{i+1} is
related to the total error at x_i by

    ε_{i+1} = ε_i + h[f(x_i, y_i) − f(x_i, y(x_i))] − (h²/2) y″(ξ),   (6.75)

where x_i < ξ < x_{i+1}. From the differential mean-value theorem of page
9, we may write

    f(x_i, y_i) − f(x_i, y(x_i)) = (∂f(x_i, u)/∂y) [y_i − y(x_i)],

with u in (y_i, y(x_i)). Since [y_i − y(x_i)] is just ε_i, (6.75) may be
written

    ε_{i+1} = [1 + h(∂f(x_i, u)/∂y)] ε_i − (h²/2) y″(ξ).   (6.76)

The first term on the right-hand side of (6.76) is the contribution of the
propagated error to the error at x_{i+1}, while the second term is the
local truncation error. Clearly, if ∂f/∂y is negative, then a value of h
can be found which will make |1 + h(∂f/∂y)| < 1, and the error will tend
to diminish or die out: the solution will be stable. If |1 + h(∂f/∂y)| > 1,
that is, for ∂f/∂y positive, the error at x_i will be amplified in
traversing the ith step, and the solution will tend toward instability.
Even in these cases, however, it may be possible to keep the propagation
error under control, especially during the early course of the
integration, by choosing a sufficiently small h, that is, by keeping the
propagation factor [1 + h(∂f/∂y)] close to 1.

Suppose that ∂f/∂y is positive and constant, so that the propagation
factor is greater than one for all h and the error does increase without
bound for increasing x as shown by (6.76). For example, consider equation
(6.15), for which the solution is given by (6.18). Will the unbounded
growth of the error invalidate the computed solution? Not necessarily,
since the solution itself is unbounded for increasing x. The most
important criterion is not that the absolute error ε_i be bounded, but
that the relative error ε_i/y_i not grow appreciably.

Similar, though more complicated, propagation factors can be developed for
higher-order one-step methods [3]. The quantity h(∂f/∂y), sometimes called
the step factor, contributes to these propagation factors in a manner
comparable to that for Euler's method. Collatz [9] suggests that the step
factor be kept essentially constant during the course of the integration,
leading to another method of controlling the step size.

6.7 Simultaneous Ordinary Differential Equations

Consider the solution of the following system of n simultaneous
first-order ordinary differential equations in the dependent variables
y_1, y_2, …, y_n:

    dy_j/dx = f_j(x, y_1, y_2, …, y_n),   j = 1, 2, …, n,   (6.77)

with initial conditions given at a common point (x_0), that is,

    y_1(x_0) = y_{1,0},
    y_2(x_0) = y_{2,0},
      ⋮
    y_n(x_0) = y_{n,0}.

The solution of such a system is, at least in principle, no more difficult
than the solution of a single first-order equation. The algorithm selected
is applied to each of the n equations in parallel at each step.
366 The Approximation of the Solution of Ordinary Differential Equations

Since a single high-order equation

    d^n y/dx^n = f(x, y, dy/dx, …, d^{n−1}y/dx^{n−1}),

with appropriate initial conditions can always be rewritten as a system of
first-order equations of the form of (6.77) [see, for example, (6.2),
(6.3)], the numerical methods developed in this chapter may be applied
indirectly to solve higher-order initial-value problems as well.
Initial-value problems involving systems of equations of mixed order may
also be reduced to the form of (6.77) in most cases. Depending on how the
differential equations are coupled, some of the derivatives in (6.77) may
be functions of other derivatives. If the equations can be ordered so
that, for all j,

    dy_j/dx = f_j(x, y_1, y_2, …, y_n, f_1, f_2, …, f_{j−1}),

then the integration schemes of the preceding sections can always be
applied. If such an ordering is impossible, then one may compute the
particular (dy_j/dx) which cannot be so ordered from a relation in which
the starred derivatives must be known or assumed at x_0. Thereafter, one
can usually use the most recently computed values for them.

Error analyses comparable to those of Section 6.4 are virtually impossible
to implement for the higher-order Runge-Kutta schemes for systems of
differential equations. The step-size control mechanisms and stability
considerations outlined in the preceding section carry over to the
multiple-equation case without appreciable change. In practice, we often
solve the equations using different step sizes and observe the behavior of
the solutions with regard to apparent convergence and stability.
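To make the reduction concrete, the following sketch (in Python, illustrative only; not from the original text) rewrites y″ = −y, y(0) = 1, y′(0) = 0 as the system y₁′ = y₂, y₂′ = −y₁ and advances it with the fourth-order Runge-Kutta method, applying each stage to all equations in parallel:

```python
# Reducing y'' = -y (exact solution: cos x) to a first-order system and
# advancing all equations in parallel with RK4, every stage evaluated
# componentwise for all j before the next stage is begun.
import math

def f(x, y):
    y1, y2 = y
    return [y2, -y1]                 # [dy1/dx, dy2/dx]

def rk4_system_step(f, x, y, h):
    """One RK4 step for a system of equations."""
    k1 = f(x, y)
    k2 = f(x + h/2, [yj + h/2 * kj for yj, kj in zip(y, k1)])
    k3 = f(x + h/2, [yj + h/2 * kj for yj, kj in zip(y, k2)])
    k4 = f(x + h,   [yj + h * kj   for yj, kj in zip(y, k3)])
    return [yj + h/6 * (a + 2*b + 2*c + d)
            for yj, a, b, c, d in zip(y, k1, k2, k3, k4)]

x, y, h = 0.0, [1.0, 0.0], 0.1
for _ in range(10):                  # integrate to x = 1
    y = rk4_system_step(f, x, y, h)
    x += h

print(y[0], math.cos(1.0))           # y1 approximates cos(1)
```

This is exactly the reduction used for the circuit equation in Example 6.3 below, where y₁ = V and y₂ = dV/dt.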
EXAMPLE 6.3
FOURTH-ORDER RUNGE-KUTTA METHOD
TRANSIENT BEHAVIOR OF A RESONANT CIRCUIT

Problem Statement

Write a general-purpose function named RUNGE that solves a system of n
first-order ordinary differential equations

    dy_j/dx = f_j(x, y_1, y_2, …, y_n),   j = 1, 2, …, n,   (6.3.1)

using the fourth-order Runge-Kutta method with Kutta's constants (6.66).

Consider the circuit of Fig. 6.3.1 containing a capacitor of C farads, a
resistor of R ohms, and an inductance of L henries. Assume that the
capacitor is initially charged to a voltage V_0, and that suddenly the
switch is closed at time t = 0. Show that the ordinary differential
equation describing V, the damped oscillation of voltage across the
capacitor, is given as a function of time by

    d²V/dt² + (R/L)(dV/dt) + V/(LC) = 0,   (6.3.2)

subject to the initial conditions

    V(0) = V_0,   dV(0)/dt = 0.   (6.3.3)

Figure 6.3.1 Electrical circuit.

Let

    α² = 1/(CL) − R²/(4L²),   α = √|α²|,   (6.3.4)

and show that the following analytical solutions satisfy (6.3.2) with
conditions (6.3.3).

For α² > 0 (the under-damped or oscillatory case):

    V = V_0 exp(−Rt/2L) cos(αt − tan⁻¹(R/2αL)) / (α√(CL)).   (6.3.5)

For α² = 0 (the critically-damped case):

    V = V_0 exp(−Rt/2L)(1 + Rt/2L).   (6.3.6)

For α² < 0 (the over-damped case):

    V = (V_0/2) exp(−Rt/2L)[(1 + R/2αL) e^{αt} + (1 − R/2αL) e^{−αt}].   (6.3.7)

Write a test program that calls on RUNGE to solve the differential
equation (6.3.2) with initial conditions (6.3.3), and then compares the
numerical solution with the value of the appropriate analytical solution.
For test purposes, consider the following cases:

    V_0 = 100 volts,
    C = 2 × 10⁻⁶ farads,
    L = 0.5 henries,
    R = 0, 100, 1000, and 1500 ohms.

Use step-size h = 0.0001 sec and tabulate the solutions at reasonable
intervals in time during the first t_max = 0.02 sec. Observe the effects
of step-size variation by finding numerical solutions for the cases:

    V_0 = 100 volts,
    C = 2 × 10⁻⁶ farads,
    L = 0.5 henries,
    R = 100 ohms,
    h = 0.00001, 0.0001, 0.001, 0.002, 0.005, and 0.01 sec.

Method of Solution

The fourth-order Runge-Kutta algorithm of (6.66) for the one-step
integration of a single first-order equation (6.4) with one appropriate
initial condition, y_i ≅ y(x_i), may be written for a system of n
first-order equations (6.3.1) with n initial conditions. Here, y_{ji} is
the solution of the jth equation in (6.3.1) at x_i. The initial conditions
for the zeroth step, y_{j0}, j = 1, 2, …, n, will usually be known
exactly. Thereafter, the initial conditions for the ith step will be
approximations to the true initial conditions, y_j(x_i), j = 1, 2, …, n,
since

they will result from applications of the Runge-Kutta method on the
(i − 1)th interval. For a system of n equations, the one-step integration
across the ith interval may be described by:

    y_{j,i+1} = y_{ji} + (h/6)(k_{j1} + 2k_{j2} + 2k_{j3} + k_{j4}),   (6.3.8a)
    k_{j1} = f_j(x_i, y_{1i}, y_{2i}, …, y_{ni}),   (6.3.8b)
    y*_{ji} = y_{ji} + (h/2) k_{j1},   (6.3.8c)
    k_{j2} = f_j(x_i + h/2, y*_{1i}, y*_{2i}, …, y*_{ni}),   (6.3.8d)
    ȳ_{ji} = y_{ji} + (h/2) k_{j2},   (6.3.8e)
    k_{j3} = f_j(x_i + h/2, ȳ_{1i}, ȳ_{2i}, …, ȳ_{ni}),   (6.3.8f)
    ȳ*_{ji} = y_{ji} + h k_{j3},   (6.3.8g)
    k_{j4} = f_j(x_i + h, ȳ*_{1i}, ȳ*_{2i}, …, ȳ*_{ni}).   (6.3.8h)

The relationships in (6.3.8) are applied in parallel at each point in the
algorithm for all n equations, that is, for j = 1, 2, …, n.

Because the Runge-Kutta method is a one-step method, the subscript i
throughout (6.3.8) is, in a sense, superfluous. In addition, because the
k_{j1} are not needed after the y*_{ji} are computed, and the y*_{ji} in
turn are not needed after the k_{j2} are computed, etc., it is possible to
write a general-purpose Runge-Kutta function and calling program which
require relatively few memory locations for retention of the many
variables appearing in (6.3.8). The entire algorithm of (6.3.8) will be
implemented in program form as outlined below.

Let four vectors of length at least n be denoted by the names Y, F, Ȳ,
and φ. Before carrying out the Runge-Kutta integration for the ith step,
the following variables must be initialized:

    x ← x_i, value of the independent variable.
    h ← h_i, step size for integration across the ith step, x_{i+1} − x_i.
    n ← n, number of first-order differential equations.
    Y_j ← y_{ji}, j = 1, 2, …, n, solution values for the n equations at x_i.

Then (6.3.8) may be described by a five-pass procedure.

Pass 1

1. Calculate the values f_j, j = 1, 2, …, n, using the current x and Y_j
values. These are equivalent to the values k_{j1}, j = 1, 2, …, n, of
(6.3.8b):

    F_j ← f_j(x, Y_1, Y_2, …, Y_n) = k_{j1}
        = f_j(x_i, y_{1i}, y_{2i}, …, y_{ni}),   j = 1, 2, …, n.   (6.3.9)

Pass 2

2. Save the current values Y_j, j = 1, 2, …, n, in another vector of equal
length, Ȳ. This assigns the solution values at the beginning of the ith
step to the vector Ȳ:

    Ȳ_j ← Y_j = y_{ji},   j = 1, 2, …, n.   (6.3.10)

3. Begin accumulation of the values φ_j, j = 1, 2, …, n, in (6.3.8a):

    φ_j ← F_j = k_{j1},   j = 1, 2, …, n.   (6.3.11)

4. Compute the values y*_{ji}, j = 1, 2, …, n, in (6.3.8c) and assign them
to elements of the vector Y:

    Y_j ← Ȳ_j + (h/2) F_j = y*_{ji},   j = 1, 2, …, n.   (6.3.12)

5. Increment x to the value needed in (6.3.8d):

    x ← x + h/2.   (6.3.13)

6. Calculate the values f_j, j = 1, 2, …, n, using the current x and Y_j
values. These are equivalent to the values k_{j2}, j = 1, 2, …, n, of
(6.3.8d):

    F_j ← f_j(x, Y_1, Y_2, …, Y_n) = k_{j2}
        = f_j(x_i + h/2, y*_{1i}, y*_{2i}, …, y*_{ni}),   j = 1, 2, …, n.   (6.3.14)

Pass 3

7. Add the contribution of k_{j2} to φ_j, j = 1, 2, …, n, in (6.3.8a):

    φ_j ← φ_j + 2F_j = k_{j1} + 2k_{j2}.   (6.3.15)

8. Compute the values ȳ_{ji}, j = 1, 2, …, n, in (6.3.8e) and assign them
to elements of the vector Y:

    Y_j ← Ȳ_j + (h/2) F_j = ȳ_{ji},   j = 1, 2, …, n.   (6.3.16)

9. Calculate the values f_j, j = 1, 2, …, n, using the current x and Y_j
values. These are equivalent to the values k_{j3}, j = 1, 2, …, n, of
(6.3.8f):

    F_j ← f_j(x, Y_1, Y_2, …, Y_n) = k_{j3}
        = f_j(x_i + h/2, ȳ_{1i}, ȳ_{2i}, …, ȳ_{ni}),   j = 1, 2, …, n.   (6.3.17)

Pass 4

10. Add the contribution of k_{j3} to φ_j, j = 1, 2, …, n, in (6.3.8a):

    φ_j ← φ_j + 2F_j = k_{j1} + 2k_{j2} + 2k_{j3}.   (6.3.18)

11. Compute the values ȳ*_{ji}, j = 1, 2, …, n, in (6.3.8g) and assign
them to elements of the vector Y:

    Y_j ← Ȳ_j + h F_j = ȳ*_{ji},   j = 1, 2, …, n.   (6.3.19)
12. Increment x to the value needed in (6.3.8h):

    x ← x + h/2.   (6.3.20)

13. Calculate the values f_j, j = 1, 2, …, n, using the current x and Y_j
values. These are equivalent to the values k_{j4}, j = 1, 2, …, n, of
(6.3.8h):

    F_j ← f_j(x, Y_1, Y_2, …, Y_n) = k_{j4}
        = f_j(x_i + h, ȳ*_{1i}, ȳ*_{2i}, …, ȳ*_{ni}),   j = 1, 2, …, n.   (6.3.21)

Pass 5

14. Complete the evaluation of φ_j, j = 1, 2, …, n, in (6.3.8a):

    φ_j ← φ_j + F_j = k_{j1} + 2k_{j2} + 2k_{j3} + k_{j4}.   (6.3.22)

15. Compute the values y_{j,i+1}, j = 1, 2, …, n, in (6.3.8a) and assign
them to elements of the Y vector:

    Y_j ← Ȳ_j + (h/6) φ_j = y_{j,i+1},   j = 1, 2, …, n.   (6.3.23)

As a consequence of the procedure described by (6.3.9) to (6.3.23), h and
n remain unchanged, and x and the vector Y contain:

    x = x_{i+1}, value of the independent variable.
    Y_j = y_{j,i+1}, j = 1, 2, …, n, solution values for the n equations
    at x_{i+1}.

All initial values have been assigned for the next integration step, from
x_{i+1} to x_{i+2}. The step size, h, may be changed if desired (see
Section 6.6). The five-pass procedure of (6.3.9) to (6.3.23) may be
repeated for the next step.

Note that the parts of the procedure given by (6.3.9), (6.3.14), (6.3.17),
and (6.3.21) are identical,

    F_j ← f_j(x, Y_1, Y_2, …, Y_n),   j = 1, 2, …, n,   (6.3.24)

and are the only ones that refer directly to the n differential equations,
f_j, j = 1, 2, …, n, of (6.3.1). Therefore, it is possible to write a
general-purpose function called RUNGE that implements all parts of the
procedure outlined, except for initialization of n, x, h, and Y_j,
j = 1, 2, …, n, and steps (6.3.9), (6.3.14), (6.3.17), and (6.3.21). The
calling program will then contain all information about the specific
system of differential equations, and be responsible for printing results,
changing the step size, terminating the integration process, and
evaluating the F_j when needed.

The five passes of the algorithm can be handled by five different calls
upon RUNGE, as shown schematically in Fig. 6.3.2. Let a step counter, m,
preset to 0, be maintained by the function RUNGE, and let the value
returned by RUNGE signal the calling program to indicate whether the F_j,
j = 1, 2, …, n, of (6.3.24) are to be computed (following the first four
passes) or not (when integration across one step is completed following
the fifth pass). Let the value returned be 1 when the F_j are to be
evaluated and 0 when one complete integration step is completed.

The main program used to test the function RUNGE solves the second-order
ordinary differential equation (6.3.2) subject to initial conditions
(6.3.3). Since the charge, q, on a capacitor is related to the
capacitance, C, and voltage, V, by

    q = CV,   (6.3.25)

the current, i, into the capacitor is given by

    i = dq/dt = C(dV/dt).   (6.3.26)

The voltage, V, across a resistor and inductance in series is given by

    V = L(di/dt) + Ri.   (6.3.27)

Then, from (6.3.26) and (6.3.27), the voltage across the capacitor as a
function of time is given by (6.3.2),

    d²V/dt² + (R/L)(dV/dt) + V/(LC) = 0.   (6.3.28)

The initial conditions are

    V(0) = V_0,   dV(0)/dt = 0.   (6.3.29)

Differentiation of the three proposed solutions, (6.3.5), (6.3.6), and
(6.3.7), and substitution into (6.3.28), shows that, for the given value
of α², each satisfies (6.3.28) and initial conditions (6.3.29).

The second-order equation (6.3.28) must be rewritten as a system of two
first-order equations. Let

    y_1 = V,   y_2 = dV/dt,   x = t.   (6.3.30)

Then

    dy_1/dx = dV/dt = y_2,
    dy_2/dx = d²V/dt² = −(R/L)(dV/dt) − V/(LC)
            = −(R/L) y_2 − y_1/(LC).   (6.3.31)

The initial conditions of (6.3.29) are

    y_1(0) = V_0,   y_2(0) = 0.   (6.3.32)

Calling Program Function RUNG E

Initialize
x, h, n,
Y j ,j = 1, . . ., 11

Figure 6.3.2 Communicationbetween calling program and function RUNGE.



Flow Diagram
[The main-program flow diagram initializes t ← 0, Y_1 ← V_0, Y_2 ← 0, and
α² ← 1/(CL) − R²/(4L²), then repeatedly calls RUNGE. The flow diagram for
the function RUNGE (dummy arguments: n, Y, F, x, h; calling arguments: 2,
V, F, t, h) shows the five passes of (6.3.9) to (6.3.23): pass 2 sets
Ȳ_j ← Y_j, φ_j ← F_j, Y_j ← Ȳ_j + (h/2)F_j, x ← x + h/2; pass 3 sets
φ_j ← φ_j + 2F_j, Y_j ← Ȳ_j + (h/2)F_j; pass 4 sets φ_j ← φ_j + 2F_j,
Y_j ← Ȳ_j + hF_j, x ← x + h/2; pass 5 sets Y_j ← Ȳ_j + (φ_j + F_j)h/6 and
returns 0.]
m is assumed to be preset to zero before RUNGE is called for the first time.

FORTRAN Implementation

List of Principal Variables

Program Symbol   Definition

(Main)
ALPHA, ALPHSQ    α and α², respectively.
C                Capacitance, C, farads.
F                Vector of derivative approximations, F_j (see (6.3.24)).
H                Step size, h.
HL               Inductance, L, henries.
ICOUNT           Step counter.
IFREQ            Frequency of intermediate printout. Solutions are printed
                 at t = 0 and after every IFREQ steps thereafter.
IFPLOT           Variable to control plotting of solutions on printer. If
                 IFPLOT = 1, a plot is prepared by calling on PLOT1,
                 PLOT2, etc.; if IFPLOT ≠ 1, no plot is prepared.
IMAGE            Large vector used to store image of plot before printing.
K                Value returned by RUNGE. If K = 1, the elements of F are
                 to be calculated; if K ≠ 1, one step of the integration
                 is completed.
PLOT1, PLOT2,    Subroutines used for preparing on-line graph of V against
PLOT3, PLOT4     t. See Examples 7.1 and 8.5 for further details.
R                Resistance, R, ohms.
R02L             R/(2L).
T                Time, t, sec.
TMAX             Upper limit of integration, t_max, sec.
TRUV             True voltage at time t, V_t, calculated from analytical
                 solutions of (6.3.2) subject to initial conditions (6.3.3).
V                Vector of solutions of (6.3.2) computed by the function
                 RUNGE. V(1) = V, volts. V(2) = dV/dt, volts/sec.
VZERO            Initial voltage across capacitor, V_0, volts.

(Function RUNGE)
J                Subscript, j.
M                Pass counter for the Runge-Kutta algorithm, m. Must be
                 preset to 0 before first call upon RUNGE.
N                Number of equations, n.
PHI              Vector of values, φ_j.
SAVEY            Vector of initial conditions, Ȳ_j (see (6.3.10)).
X                Independent variable, x.
Y                Vector of dependent variable (solution) values, Y_j.

Program Listing
Main Program
C     APPLIED NUMERICAL METHODS, EXAMPLE 6.3
C     ELECTRICAL TRANSIENTS USING THE RUNGE-KUTTA METHOD
C
C     THIS TEST PROGRAM CALLS ON THE FUNCTION RUNGE TO SOLVE
C     THE DIFFERENTIAL EQUATION V'' = -R*V'/HL - V/(HL*C) SUBJECT
C     TO THE INITIAL CONDITIONS V(0.0) = VZERO AND V'(0.0) = 0.0.
C     V(1) AND V(2) ARE THE VALUES OF V AND V' RESPECTIVELY.  F(1)
C     AND F(2) ARE THE DERIVATIVES OF V(1) AND V(2) RESPECTIVELY.
C     R, HL, AND C ARE THE RESISTANCE, INDUCTANCE, AND CAPACITANCE
C     (IN OHMS, HENRIES, AND FARADS) OF A CIRCUIT (SEE FIGURE)
C     IN WHICH THE CAPACITOR IS CHARGED TO A VOLTAGE VZERO AT TIME
C     ZERO.  T (TIME IN SECONDS) IS THE INDEPENDENT VARIABLE, H IS
C     THE STEPSIZE, AND TMAX THE UPPER INTEGRATION LIMIT USED IN
C     THE RUNGE-KUTTA INTEGRATION.  THE EQUATION WITH THE GIVEN
C     INITIAL CONDITIONS DESCRIBES THE DAMPED OSCILLATION OF VOLTAGE
C     ACROSS THE CAPACITOR UPON CLOSURE OF THE CIRCUIT.  TRUV IS
C     THE ANALYTICAL SOLUTION WHICH APPLIES FOR THE UNDER-DAMPED,
C     CRITICALLY DAMPED OR OVER-DAMPED CASE DEPENDING ON THE VALUE
C     OF ALPHSQ = 1/(C*HL) - R*R/(4*HL*HL) (POSITIVE, ZERO, AND
C     NEGATIVE VALUE RESPECTIVELY).  K IS THE VALUE RETURNED BY THE
C     FUNCTION RUNGE.  IT EQUALS 1 WHEN THE DERIVATIVES ARE TO BE
C     CALCULATED AND 0 WHEN INTEGRATION ACROSS ONE STEP IS
C     COMPLETE.  ICOUNT IS A STEP COUNTER.  RESULTS ARE PRINTED
C     AFTER EVERY IFREQ INTEGRATION STEPS.  IF IFPLOT = 1, THE
C     RESULTS ARE ALSO PLOTTED USING THE LIBRARY SUBROUTINES
C     PLOT1, PLOT2, PLOT3, AND PLOT4.
C
      IMPLICIT REAL*8(A-H, O-Z)
      INTEGER RUNGE
      DIMENSION F(2), V(2), IMAGE(1500)
C
C     ..... READ AND PRINT DATA .....
    1 READ (5,100)  R, HL, C, VZERO, H, TMAX, IFREQ, IFPLOT
      WRITE (6,200)  R, HL, C, VZERO, H, TMAX, IFREQ, IFPLOT
C
C     ..... INITIALIZE T, V(1), V(2), AND ICOUNT .....
      T = 0.0
      V(1) = VZERO
      V(2) = 0.0
      ICOUNT = 0
C
C     ..... COMPUTE ALPHSQ AND ALPHA, PRINT HEADINGS .....
      ALPHSQ = 1./(C*HL) - R*R/(4.0*HL*HL)
      ALPHA = DSQRT(DABS(ALPHSQ))
      R02L = R/(2.*HL)
      IF ( IFPLOT.NE.1 )  GO TO 3
      CALL PLOT1( 0, 5, 11, 6, 19 )
      CALL PLOT2( IMAGE, TMAX, 0., DABS(VZERO), -DABS(VZERO) )
    3 WRITE (6,201)
C
C     ..... IS CIRCUIT OVER-, CRITICALLY-, OR UNDER-DAMPED .....
C     ..... COMPUTE ANALYTICAL SOLUTION, PRINT AND PLOT .....
    4 IF (ALPHSQ)  5, 6, 7
    5 TRUV = VZERO*DEXP(-R02L*T)*( (1.+R02L/ALPHA)*DEXP(ALPHA*T)
     1       + (1.-R02L/ALPHA)*DEXP(-ALPHA*T) )/2.0
      GO TO 8
    6 TRUV = VZERO*DEXP(-R02L*T)*(1. + R02L*T)
      GO TO 8
    7 TRUV = VZERO*DEXP(-R02L*T)*DCOS(ALPHA*T - DATAN(R02L/ALPHA))
     1       /(ALPHA*DSQRT(C*HL))
    8 IF ( IFPLOT.NE.1 )  GO TO 10
      CALL PLOT3( 1H+, T, V(1), 1, 8 )
   10 WRITE (6,202)  T, V(1), TRUV, V(2)
C
C     ..... CALL ON THE FOURTH-ORDER RUNGE-KUTTA FUNCTION .....
   11 K = RUNGE( 2, V, F, T, H )

Program Listing (Continued)


C
C     ..... WHENEVER K=1, COMPUTE DERIVATIVE VALUES .....
      IF ( K.NE.1 )  GO TO 13
      F(1) = V(2)
      F(2) = -R*V(2)/HL - V(1)/(HL*C)
      GO TO 11
C
C     ..... IF T EXCEEDS TMAX, TERMINATE INTEGRATION .....
   13 IF ( T.LE.TMAX )  GO TO 16
      IF ( IFPLOT.NE.1 )  GO TO 1
      WRITE (6,203)
      CALL PLOT4( 7, 7HVOLTAGE )
      WRITE (6,204)
      GO TO 1
   16 ICOUNT = ICOUNT + 1
C
C     ..... PRINT RESULTS OR CALL DIRECTLY ON RUNGE .....
      IF ( ICOUNT.NE.IFREQ )  GO TO 11
      ICOUNT = 0
      GO TO 4
C
C     ..... FORMATS FOR INPUT AND OUTPUT STATEMENTS .....
  100 FORMAT ( 9X,F7.2,14X,F6.2,14X,E6.1/ 10X,F6.2,14X,F7.5,13X,F6.3/
     1         10X,I4,16X,I1 )
  200 FORMAT ( 10H1R      = ,F14.5/ 10H HL     = ,F14.5/ 10H C      = ,
     1         E14.1/ 10H VZERO  = ,F14.5/ 10H H      = ,F14.5/
     2         10H TMAX   = ,F14.5/ 10H IFREQ  = ,I8/ 10H IFPLOT = ,I8 )
  201 FORMAT ( 64H0      T             CALC. V            TRUE V
     1      CALC. V'/ 1H  )
  202 FORMAT ( 1H , F10.5, 2F18.8, F18.5 )
  203 FORMAT ( 1H1 )
  204 FORMAT ( 1H0, 54X, 16HTIME   (SECONDS) )
C
      END

Function RUNGE

      FUNCTION RUNGE( N, Y, F, X, H )
C
C     THE FUNCTION RUNGE EMPLOYS THE FOURTH-ORDER RUNGE-KUTTA METHOD
C     WITH KUTTA'S COEFFICIENTS TO INTEGRATE A SYSTEM OF N SIMULTAN-
C     EOUS FIRST ORDER ORDINARY DIFFERENTIAL EQUATIONS F(J)=DY(J)/DX,
C     (J=1,2,...,N), ACROSS ONE STEP OF LENGTH H IN THE INDEPENDENT
C     VARIABLE X, SUBJECT TO INITIAL CONDITIONS Y(J), (J=1,2,...,N).
C     EACH F(J), THE DERIVATIVE OF Y(J), MUST BE COMPUTED FOUR TIMES
C     PER INTEGRATION STEP BY THE CALLING PROGRAM.  THE FUNCTION MUST
C     BE CALLED FIVE TIMES PER STEP (PASS(1)...PASS(5)) SO THAT THE
C     INDEPENDENT VARIABLE VALUE (X) AND THE SOLUTION VALUES
C     (Y(1)...Y(N)) CAN BE UPDATED USING THE RUNGE-KUTTA ALGORITHM.
C     M IS THE PASS COUNTER.  RUNGE RETURNS AS ITS VALUE 1 TO
C     SIGNAL THAT ALL DERIVATIVES (THE F(J)) BE EVALUATED OR 0 TO
C     SIGNAL THAT THE INTEGRATION PROCESS FOR THE CURRENT STEP IS
C     FINISHED.  SAVEY(J) IS USED TO SAVE THE INITIAL VALUE OF Y(J)
C     AND PHI(J) IS THE INCREMENT FUNCTION FOR THE J(TH) EQUATION.
C     AS WRITTEN, N MAY BE NO LARGER THAN 50.
C
      IMPLICIT REAL*8(A-H, O-Z)
      REAL*8 Y, F, X, H
      INTEGER RUNGE
      DIMENSION PHI(50), SAVEY(50), Y(N), F(N)
      DATA M/0/
C
      M = M + 1
      GO TO (1,2,3,4,5), M
C
C     ..... PASS 1 .....
    1 RUNGE = 1
      RETURN

Program Listing (Continued)

C
C     ..... PASS 2 .....
    2 DO 22 J = 1, N
      SAVEY(J) = Y(J)
      PHI(J) = F(J)
   22 Y(J) = SAVEY(J) + 0.5*H*F(J)
      X = X + 0.5*H
      RUNGE = 1
      RETURN
C
C     ..... PASS 3 .....
    3 DO 33 J = 1, N
      PHI(J) = PHI(J) + 2.0*F(J)
   33 Y(J) = SAVEY(J) + 0.5*H*F(J)
      RUNGE = 1
      RETURN
C
C     ..... PASS 4 .....
    4 DO 44 J = 1, N
      PHI(J) = PHI(J) + 2.0*F(J)
   44 Y(J) = SAVEY(J) + H*F(J)
      X = X + 0.5*H
      RUNGE = 1
      RETURN
C
C     ..... PASS 5 .....
    5 DO 55 J = 1, N
   55 Y(J) = SAVEY(J) + (PHI(J) + F(J))*H/6.0
      M = 0
      RUNGE = 0
      RETURN
      END

Data

[Ten data cards follow, one per data set. Each card lists R, HL, C, VZERO,
H, TMAX, IFREQ, and IFPLOT for the ten test cases described in the
problem statement; the individual values are not legible in this
reproduction.]

Computer Output

[Printed results and printer plot for the 1st through 5th data sets; the
tabulated solution values are not legible in this reproduction.]
Computer Output (Continued)

Results for the 6th Data Set
R      =     100.00000
HL     =       0.50000
C      =       0.2D-05
VZERO  =      10.00000
H      =       0.00010
TMAX   =       0.02000
IFREQ  =            10
IFPLOT =             0

Results for the 7th Data Set
R      =     100.00000
HL     =       0.50000
C      =       0.2D-05
VZERO  =      10.00000
H      =       0.00100
TMAX   =       0.02000
IFREQ  =             1
IFPLOT =             0

Results for the 8th Data Set
R      =     100.00000
HL     =       0.50000
C      =       0.2D-05
VZERO  =      10.00000
H      =       0.00200
TMAX   =       0.02000
IFREQ  =             1
IFPLOT =             0

Results for the 9th Data Set
[parameter values not legible in this reproduction]

    T       CALC. V       TRUE V       CALC. V'
[the tabulated values for the 6th through 9th data sets are not legible in
this reproduction]

Computer Output (Continued)

Results for the 10th Data Set
R      =     100.00000
HL     =       0.50000
C      =       0.2D-05
VZERO  =      10.00000
H      =       0.01000
TMAX   =       0.02000
IFREQ  =             1
IFPLOT =             0

    T       CALC. V       TRUE V       CALC. V'
[the tabulated values are not legible in this reproduction]

Discussion of Results

All calculations have been made using double-precision arithmetic. Only
the plotted output is shown for the under-damped solution resulting from
the input parameters of the 1st data set. Results for the 2nd, 3rd, and
4th data sets correspond to an undamped (R = 0), critically-damped
(R = 1000), and over-damped solution, respectively. For the step size
used, h = 0.0001, the numerical solutions agree with the appropriate
analytical solution of (6.3.5), (6.3.6), and (6.3.7) to five or six
significant figures.

Results for data sets 5 through 10 show the influence of the step size, h,
on the numerical solution for an under-damped case (see plotted output for
data set 1). Table 6.3.1 shows the calculated solution, V, the true
solution, V_t, and the error, V − V_t, at time t = 0.02 sec. Because
double-precision arithmetic has been used throughout, and the total number
of integration steps is no larger than 2000 for any case, the error may be
attributed solely to truncation effects; any round-off error is negligible
in comparison.

Table 6.3.1 Comparison of Numerical and Analytical Solutions

Step size, h    Numerical        Analytical       Error
(sec)           Solution, V      Solution, V_t    V − V_t
                (Volts)          (Volts)          (Volts)
[table entries not legible in this reproduction]

For small step sizes, the solutions computed by the Runge-Kutta
fourth-order method are extremely accurate. The truncation error increases
with increasing step size, until the solution eventually "blows up" and
bears no resemblance to the true solution. This is not surprising, in view
of the fact that the period of oscillation for the solution is
2π/α = 0.00628 sec. As the step length approaches the period of
oscillation, virtually all local information about the curvature of the
solution function, V(t), is lost.

6.8 Multistep Methods

The one-step methods of previous sections have approximated the solution
of the first-order equation

    dy/dx = f(x, y),

with initial condition y_0 = y(x_0). This has been accomplished for
Euler's method (6.28) by integration across a series of single intervals
to yield, successively,

    y_{i+1} = y_i + ∫ from x_i to x_{i+1} of ψ_i(x) dx,

where ψ_i(x) is a step-function with ordinates f_j = f(x_j, y_j) on the
half-open intervals [x_j, x_{j+1}), j = i − k, …, i. For an integration
involving k + 1 intervals terminating at x_{i+1}, this can be written as

    y_{i+1} = y_{i−k} + ∫ from x_{i−k} to x_{i+1} of ψ_i(x) dx.   (6.81)

The integral in (6.81) can be viewed as the area between the limits
x_{i−k} and x_{i+1} shown in Fig. 6.4.

Figure 6.4 Multistep integration.

It seems clear that replacement of the function ψ_i(x) in (6.81) by an
interpolating polynomial φ(x) that passes through the points (x_j, f_j)
for j near i might well lead to more accurate algorithms. The derivations
are similar to those used in generating the open and closed integration
formulas of Chapter 2, except that the interpolating polynomial is
determined by derivative values f(x_j, y_j) instead of functional values
y_j. The number of values of f_j required depends on: (a) the desired
degree of the interpolating polynomial, and (b) the availability of the
values f_j. Assuming that the integration has already proceeded to the
base point x_i, the values of all the f_j, 0 ≤ j ≤ i, are known. In
general, f_{i+1} will be unknown, since f is a function of both x and y,
and y_{i+1} will not yet have been computed. All the common multistep
methods are described by

    y_{i+1} = y_{i−k} + ∫ from x_{i−k} to x_{i+1} of φ(x) dx,   (6.82)

where φ(x) = Σ_{j=0}^{r} a_j x^j. Here, r is the degree of the
interpolating polynomial.

6.9 Open Integration Formulas

Since the x_i are evenly spaced, and the values f_i, f_{i−1}, f_{i−2}, …,
f_{i−r} are known, φ(x) is given in simplest form by Newton's
interpolation formula based on the backward finite differences of x_i,
already developed in (1.61):

    φ(x_i + αh) = f_i + α∇f_i + α(α + 1) ∇²f_i/2! + ⋯
                  + α(α + 1)⋯(α + r − 1) ∇^r f_i/r!,   (6.83)

where f_i = f(x_i, y_i), and x = x_i + αh, that is, α = (x − x_i)/h. For
the base-point values x_{i−p}, α assumes the integral values −p. Since
dx = h dα, the integral of (6.82) is, in terms of the variable α,

    y_{i+1} = y_{i−k} + h ∫ from −k to 1 of φ(x_i + αh) dα.   (6.84)

With a suitable renumbering of subscripts, (6.84) can be shown to be
equivalent to the open Newton-Cotes integration formulas of (2.30). For
k = 0, 1, 2, and 3, equation (6.84) leads to the algorithms (6.85a)
through (6.85d), written in terms of the backward differences of f_i.

As illustrated by the shaded area of Fig. 6.5, equation (6.85a) involves
approximate integration for any suitably differentiable function, ψ(x),
that passes through the points (x_{i−r}, f_{i−r}), (x_{i−r+1}, f_{i−r+1}),
…, (x_{i−1}, f_{i−1}), (x_i, f_i). The interval of integration is, of
course, [x_i, x_{i+1}]. Equation (6.85b) uses the same points, but the
interval of integration is [x_{i−1}, x_{i+1}]. Similar statements hold for
the other members of (6.85). All the information required, that is, y_i,
f(x_i, y_i), y_{i−1}, f(x_{i−1}, y_{i−1}), etc., is known, since the
integration is presumed to have proceeded to x_i already.

Figure 6.5 Open multistep integration.

Note that for the odd values of k, the coefficient of the kth backward
difference is zero. For this reason, the most commonly used formulas of
this type are for k odd with just k + 1 terms retained (that is, r = k).
The (k + 1)th term is zero in these cases, so that the interpolating
polynomial is actually of degree r − 1 rather than of degree r as would be
expected. Of course, any number of terms could be retained. When the
formula is terminated with the (r + 1)th term, that is, when the last
included term involves ∇^r f_i/r!, the error term is

    (h^{r+2}/(r + 1)!) ∫ from −k to 1 of α(α + 1)(α + 2)⋯(α + r) ψ^{(r+1)}(ξ) dα,
        x_{i−k} < ξ < x_{i+1}.   (6.86)

From (6.85), the most important open formulas are those of (6.87),
corresponding to k = 0, r = 3; k = 1, r = 1; k = 3, r = 3; and k = 5,
r = 5, respectively. In each case, ξ lies somewhere between the smallest
and largest of the x_j involved in the formula. The error terms R of
equations (6.87), computed from (6.86), are the local truncation errors
for one application of the formulas, assuming that ψ(x) = f(x, y), and
that y_j = y(x_j) and f_j = f(x_j, y(x_j)) for those j involved in the
formulas. However, since the y_j and f_j, except for y_0 and f_0, are
computed numerically and thus are usually inexact, the error terms are
approximations only.
In terms of the derivative estimates f_j, instead of their backward
differences, these open formulas become:

k = 0, r = 3:

    y_{i+1} = y_i + (h/24)(55f_i − 59f_{i−1} + 37f_{i−2} − 9f_{i−3}),
        R = O(h⁵).   (6.88a)

k = 1, r = 1:

    y_{i+1} = y_{i−1} + 2hf_i,   R = O(h³).   (6.88b)

k = 3, r = 3:

    y_{i+1} = y_{i−3} + (4h/3)(2f_i − f_{i−1} + 2f_{i−2}),
        R = O(h⁵).   (6.88c)

k = 5, r = 5:

    y_{i+1} = y_{i−5} + (3h/10)(11f_i − 14f_{i−1} + 26f_{i−2} − 14f_{i−3} + 11f_{i−4}),
        R = O(h⁷).   (6.88d)

All of these formulas involve use of the interpolating polynomial passing
through known points (x_j, f_j), (j = i, i − 1, i − 2, …, i − r). Thus,
since the integration covers the whole interval [x_{i−k}, x_{i+1}], the
polynomial effectively extrapolates over the interval [x_i, x_{i+1}]. The
situation for the case k = 3, r = 3 is shown in Fig. 6.6.

Figure 6.6 Open integration for the case k = 3, r = 3 (eqn. 6.88c); the
polynomial extrapolates on the interval [x_i, x_{i+1}].

6.10 Closed Integration Formulas

A set of closed integration formulas of the multistep form (6.82),

    y_{i+1} = y_{i−k} + ∫ from x_{i−k} to x_{i+1} of φ(x) dx,

can be generated in analogous fashion, where φ(x) is the interpolating
polynomial which fits not only the previously calculated values f_i,
f_{i−1}, f_{i−2}, but the unknown f_{i+1} as well. As before, the
transformation of variable x = x_i + αh yields

    y_{i+1} = y_{i−k} + h ∫ from −k to 1 of φ(x_i + αh) dα.   (6.89)

However, φ(x_i + αh) is now expressed in terms of the backward-difference
formula based at x_{i+1} instead of x_i [see (6.83) for comparison]:

    φ(x_i + αh) = f_{i+1} + (α − 1)∇f_{i+1} + (α − 1)α ∇²f_{i+1}/2! + ⋯
                  + (α − 1)α(α + 1)⋯(α + r − 2) ∇^r f_{i+1}/r!.   (6.90)

Equation (6.91) is simply the analogue of the general Newton-Cotes closed
integration formula (2.14), with the interpolating polynomial written in
terms of the backward differences of f(x_{i+1}, y_{i+1}). For k = 0, 1, 3,
and 5, equation (6.91) leads to the following:
k = 1:

    y_{i+1} = y_{i-1} + h \left( 2f_{i+1} - 2\nabla f_{i+1} + \frac{1}{3}\nabla^2 f_{i+1} + 0 \cdot \nabla^3 f_{i+1} - \frac{1}{90}\nabla^4 f_{i+1} + \cdots \right).    (6.92b)

For these formulas, the error term for truncation after the term involving the rth difference is

    R = \frac{h^{r+2}}{(r+1)!} \int_{-k}^{1} (\alpha - 1)(\alpha)(\alpha + 1)(\alpha + 2)\cdots(\alpha + r - 1)\, \psi^{(r+1)}(\xi)\, d\alpha, \qquad x_{i-k} < \xi < x_{i+1}.    (6.93)

For k odd, the coefficient of \nabla^{k+2} f_{i+1} vanishes. Therefore, the most frequently used formulas of this type are for k odd, with r = k + 2. Three of the more important closed formulas, (6.94a), (6.94b), and (6.94c), are those for (k, r) = (0, 3), (1, 3), and (3, 5), respectively.

The error terms R computed by (6.93) are the local truncation errors for one application of the integration formulas, provided that \psi(x) = f(x, y) and that y_j = y(x_j) and f_j = f(x_j, y(x_j)) for those j involved in the formulas. In each case, \xi lies somewhere between the smallest and largest of the x_j in the formula.

In terms of the derivatives f_j instead of the backward differences, these formulas become:

k = 0, r = 3:

    y_{i+1} = y_i + \frac{h}{24} (9f_{i+1} + 19f_i - 5f_{i-1} + f_{i-2}), \qquad R = O(h^5).    (6.95a)

k = 1, r = 3:

    y_{i+1} = y_{i-1} + \frac{h}{3} (f_{i+1} + 4f_i + f_{i-1}), \qquad R = O(h^5).    (6.95b)

k = 3, r = 5:

    y_{i+1} = y_{i-3} + \frac{2h}{45} (7f_{i+1} + 32f_i + 12f_{i-1} + 32f_{i-2} + 7f_{i-3}), \qquad R = O(h^7).    (6.95c)

The situation for the case k = 1, r = 3 is shown in Fig. 6.7.

Figure 6.7 Closed integration for the case k = 1, r = 3 (eqn. 6.95b).

6.11 Predictor-Corrector Methods

Assuming that we have available the necessary starting values (usually a one-step Runge-Kutta method is used to generate the approximate solutions y_1, y_2, etc., near the starting point), the open integration formulas of (6.88) are very straightforward algorithms for step-by-step solution of first-order ordinary differential equations. For example, to solve the equation

    \frac{dy}{dx} = f(x, y),
with initial condition y(x_0) given, one of the fourth-order Runge-Kutta methods of Section 6.5 could be used to compute y_1, y_2, y_3 (and f_1, f_2, f_3). Starting with i = 3, equation (6.88c) could be used to generate explicitly the successive values y_4, y_5, \ldots, y_n. A fourth-order Runge-Kutta method would be preferred for producing starting values for (6.88c), since its local truncation error is of order h^5, as it is for (6.88c).

A closed integration formula with accuracy of order h^5 is given by (6.95b). In this case, a fourth-order Runge-Kutta method could be used to compute y_1 (and f_1). Thereafter, starting with i = 1, equation (6.95b) could be used to produce successively y_2, y_3, \ldots, y_n. Unfortunately, (6.95b) cannot be used to solve for y_{i+1} directly, since f_{i+1} = f(x_{i+1}, y_{i+1}) is generally unknown. Except for the rather special case for which f(x, y) is a function of x only, (6.95b) is an implicit equation in y_{i+1} which must be solved iteratively by one of the methods of Chapter 3.

One approach is to solve (6.95b) by the method of successive substitutions described in Section 3.5. If y_{i+1,j} is the jth successive approximation to the solution of (6.95b), then the recursion relation comparable to (3.22) is given by

    y_{i+1,j+1} = y_{i-1} + \frac{h}{3} [f(x_{i+1}, y_{i+1,j}) + 4f_i + f_{i-1}].    (6.96)

Thus the procedure for solving (6.95b) is to assume a value for y_{i+1,0}, compute f(x_{i+1}, y_{i+1,0}), find y_{i+1,1} from (6.96), then calculate f(x_{i+1}, y_{i+1,1}), find y_{i+1,2} from (6.96), etc. The process continues until the sequence y_{i+1,1}, y_{i+1,2}, \ldots appears to converge, that is, until y_{i+1,j+1} differs from y_{i+1,j} by some acceptably small amount.

Under what conditions will this iterative procedure converge? Since y_{i-1}, f_i, f_{i-1}, h, and x_{i+1} are constant for a given i, (6.96) may be written

    y_{i+1,j+1} = F(y_{i+1,j}) = \frac{h}{3} f(x_{i+1}, y_{i+1,j}) + C,    (6.97)

where C is a constant. From the discussion on page 168, convergence will take place if |F'(y_{i+1})| is less than one. Thus (6.96) converges, provided that the asymptotic convergence factor satisfies

    |F'(y_{i+1})| = \frac{h}{3} \left| \frac{\partial f(x_{i+1}, y_{i+1})}{\partial y} \right| < 1.    (6.98)

Then the step-size h must satisfy the condition

    h < 3 \left/ \left| \frac{\partial f}{\partial y} \right| \right..    (6.99)

The closed multistep method of (6.95b) is more complicated than the open multistep method of (6.88c) because of the iterative calculation (6.96) required for each step of the algorithm. Nevertheless, the open formula is not often used by itself to generate a sequence of solution values y_4, y_5, \ldots, y_n. The closed method is preferred because its local truncation error is considerably smaller than that for the open method, even though the two formulas are of the same order. This can be seen by comparing the error terms of (6.87c) and (6.94b):

    Open formula (6.88c) error:    \frac{14}{45} h^5 f^{(4)}(\xi),    (6.100)

    Closed formula (6.95b) error:    -\frac{1}{90} h^5 f^{(4)}(\bar{\xi}).    (6.101)

These truncation error terms apply strictly only when y_i = y(x_i), y_{i-1} = y(x_{i-1}), \ldots, and f_i = f(x_i, y(x_i)), f_{i-1} = f(x_{i-1}, y(x_{i-1})), \ldots; in most real situations these quantities will have been computed numerically. Therefore, (6.100) and (6.101) should be viewed as approximations only. Assuming that f^{(4)}(\xi) is effectively equal to f^{(4)}(\bar{\xi}), a reasonable assumption for sufficiently small h, (6.95b) is significantly more accurate than (6.88c).

Comparison of other open and closed multistep methods of comparable order, for example, (6.88a) and (6.95a), or (6.88d) and (6.95c), indicates that the closed formulas are, in general, considerably more accurate than the open ones. This is not surprising, since the open formulas involve extrapolation of the polynomial \psi(x) of equation (6.82) on the interval [x_i, x_{i+1}], while the closed formulas involve interpolation.

The closed methods can be used alone to generate approximations to the solution of differential equations, provided that the necessary starting values are available (or can be found by using a self-starting one-step method) and that the step size chosen satisfies the appropriate convergence criterion [for example, (6.99)]. Since each iteration of a closed formula [for example, (6.96)] requires one evaluation of the derivative f(x_{i+1}, y_{i+1,j}), it is important that convergence be very rapid. The number of iterations required for convergence depends on many factors; the two factors over which the programmer has most control are the initial guess, y_{i+1,0}, and the step size h.

In practice it has been found that if more than one or two iterations are required for convergence, the multistep methods cannot compete effectively with the one-step methods of preceding sections. For example, the fourth-order Runge-Kutta methods require four derivative evaluations for each step, with the advantages that
starting values are needed at just one point, and that the step size can be adjusted from step to step with little difficulty. On the other hand, the fourth-order method of (6.96) has the disadvantage that it is not self-starting, since values of the solution are needed at both x_0 and x_1 before the method can be applied for the first time; in addition, as described in the next section, a change in the step size, while possible, is rather cumbersome. The principal advantage of (6.96) is that it can be used to produce solutions to differential equations (with accuracy comparable to that for a fourth-order Runge-Kutta method) with as few as two derivative evaluations per step, f(x_{i+1}, y_{i+1,0}) and f(x_{i+1}, y_{i+1,1}).

Fortunately, we already have very good algorithms for estimating y_{i+1,0}, namely, the open formulas of Section 6.9. The usual practice is to select an open and closed formula with comparable order for the local truncation error. The open formula is used to predict the value y_{i+1,0}, and the closed formula is then used to correct the estimate by an iterative computation similar to (6.96). The open formula is usually called the predictor and the closed formula is called the corrector; algorithms which use the predictor and corrector as outlined above are called predictor-corrector methods (see also page 362). Three of the commonly used predictor-corrector methods are the fourth-order and sixth-order Milne methods [5] and the fourth-order modified Adams or Adams-Moulton method, which use predictors and correctors already developed in Sections 6.9 and 6.10.

Fourth-order Milne method:

Predictor:

    y_{i+1,0} = y_{i-3} + \frac{4h}{3} (2f_i - f_{i-1} + 2f_{i-2}).    (6.102)

Corrector:

    y_{i+1,j+1} = y_{i-1} + \frac{h}{3} [f(x_{i+1}, y_{i+1,j}) + 4f_i + f_{i-1}].    (6.96)

Sixth-order Milne method:

Predictor:

    y_{i+1,0} = y_{i-5} + \frac{3h}{10} (11f_i - 14f_{i-1} + 26f_{i-2} - 14f_{i-3} + 11f_{i-4}).

Corrector:

    y_{i+1,j+1} = y_{i-3} + \frac{2h}{45} [7f(x_{i+1}, y_{i+1,j}) + 32f_i + 12f_{i-1} + 32f_{i-2} + 7f_{i-3}].

Modified Adams or Adams-Moulton method:

Predictor:

    y_{i+1,0} = y_i + \frac{h}{24} (55f_i - 59f_{i-1} + 37f_{i-2} - 9f_{i-3}).

Corrector:

    y_{i+1,j+1} = y_i + \frac{h}{24} [9f(x_{i+1}, y_{i+1,j}) + 19f_i - 5f_{i-1} + f_{i-2}].

Many other predictor-corrector methods are also used (see Section 6.13 for another important method). The general naming conventions for the integration methods that employ the open and closed integration formulas are as follows:

    Name of Method                Mode of Operation

    Adams or Adams-Bashforth      Use of predictor equations only.

    Moulton                       Iterative use of corrector equations only.

    Modified Adams                Use of predictor followed by iterative use of corrector, k = 0 for both equations.

    Milne                         Use of predictor followed by iterative use of corrector, k ≠ 0 for either equation.

6.12 Truncation Error, Stability, and Step-Size Control in the Multistep Algorithms

One of the virtues of the predictor-corrector methods of the preceding section is that we can estimate and then correct for local truncation errors, by assuming that the error terms [for example, (6.100) and (6.101)] apply. The approach is probably best illustrated by an example. Consider the fourth-order Milne method of (6.88c) and (6.95b). Given the differential equation dy/dx = f(x, y), a specified step-size h, and starting values for y_0, y_1, y_2, y_3, f_0, f_1, f_2, and f_3 (usually y_0 is given, and y_1, y_2, and y_3 are computed by using a fourth-order Runge-Kutta method), the algorithm proceeds step-by-step for i = 3, 4, 5, \ldots (here x_i = x_0 + ih) as follows:

1. Calculate y_{i+1,0} using the predictor (6.88c).

2. Compute f(x_{i+1}, y_{i+1,0}), and then solve the corrector (6.95b) iteratively using (6.96).
The iteration on j is continued until a specified convergence criterion is met, for example,

    |y_{i+1,j+1} - y_{i+1,j}| < \epsilon,    (6.103)

where \epsilon is some small positive number. If (6.103) fails to converge, then criterion (6.98) is being violated, and computation must be stopped; action must be taken to decrease the step size and try again, as described below. Since computational efficiency requires that (6.96) converge very rapidly, h should be such that the asymptotic convergence factor |F'(y_{i+1})| \ll 1 in (6.98). In fact, h should be small enough so that the convergence test (6.103) is passed after just one or at most two iterations. Note that the derivative f(x_{i+1}, y_{i+1,j}) must be evaluated once for each value of j.

3. Let k be the number of iterations required to solve (6.96). Then, for the standard predictor-corrector algorithm, y_{i+1} is equated to y_{i+1,k}, and f_{i+1} is computed by using the new y_{i+1}. In general, k + 1 evaluations of f(x_{i+1}, y_{i+1}) are required for each step: k to satisfy the convergence criterion (6.103), and one to compute f_{i+1} = f(x_{i+1}, y_{i+1,k}) for use in subsequent calculations. Next, i is incremented by one, and steps 1, 2, and 3 are repeated for the next integration step.

4. The local truncation error in using the Milne method to integrate across the interval [x_i, x_{i+1}] can be estimated by assuming that the error terms of (6.100) and (6.101) apply. In fact, they apply strictly only when y_{i-3}, y_{i-1}, f_{i-2}, f_{i-1}, and f_i in (6.102) and (6.96) are exact, which they will not be in most cases. Nevertheless, if these quantities were known exactly and no round-off errors were introduced, we could then write

    y(x_{i+1}) - y_{i+1,0} = \frac{14}{45} h^5 f^{(4)}(\xi),    (6.104)

    y(x_{i+1}) - y_{i+1,k} = -\frac{1}{90} h^5 f^{(4)}(\bar{\xi}).    (6.105)

If we assume that f^{(4)}(\xi) is approximately equal to f^{(4)}(\bar{\xi}), a reasonable assumption since h is presumed to be small, then division of (6.104) by (6.105) leads to

    \frac{y(x_{i+1}) - y_{i+1,0}}{y(x_{i+1}) - y_{i+1,k}} = -28.    (6.106)

The estimate of the local truncation error e_c for the corrector equation then follows as

    e_c = -\frac{1}{90} h^5 f^{(4)}(\bar{\xi}) \approx -\frac{y_{i+1,k} - y_{i+1,0}}{29}.    (6.107)

5. In a modification of the standard fourth-order Milne method, (6.106) is used to modify the solution y_{i+1,k} of the corrector equation; the approximate solution at x_{i+1} is computed as

    y_{i+1} = y_{i+1,k} - \frac{y_{i+1,k} - y_{i+1,0}}{29}.    (6.108)

Equation (6.108) may be rewritten as

    y_{i+1} = \frac{28 y_{i+1,k} + y_{i+1,0}}{29}.    (6.109)

If we assume that the local truncation error varies only slowly from step to step, so that e_c for the interval [x_i, x_{i+1}] is approximately equal to that for the interval [x_{i-1}, x_i], then we can modify y_{i+1,0} as calculated from the predictor equation (6.102) to find an improved starting value for the iterative solution of (6.96). Let the improved y_{i+1,0} be denoted y^*_{i+1,0}. Since e_c for the interval [x_{i-1}, x_i] is, from (6.107), approximately -(y_{i,k} - y_{i,0})/29, we can estimate y^*_{i+1,0} from (6.109) as

    y^*_{i+1,0} = y_{i+1,0} + \frac{28}{29} (y_{i,k} - y_{i,0}).    (6.110)

Using y^*_{i+1,0} rather than y_{i+1,0} as the starting value for (6.96) does not change the converged value of y_{i+1,k}, and hence the preceding analysis remains unchanged. Because y^*_{i+1,0} is, for most cases, a better value than the y_{i+1,0} computed from (6.102), the only effect of (6.110) is to speed convergence of the iterative solution of (6.96) and possibly to reduce the number of derivative evaluations required.

A similar approach may be used for the other predictor-corrector algorithms of Section 6.11. Clearly, it is much simpler to compute an estimate of the local truncation error for the multistep methods than for the Runge-Kutta methods (see page 364).

Step-size control is another matter entirely. The step size must be small enough to satisfy the convergence criterion for the corrector equation [for example, (6.99)], preferably small enough to insure convergence in just one or two iterations, and it must be small enough to satisfy any restrictions on the magnitude of the local truncation error [for example, (6.107)]. On the other hand, the step size should preferably be large enough so that round-off errors and the number of derivative evaluations will be minimized. The latter consideration is especially important when the derivative function is complicated, and each evaluation requires substantial computing time. The principal advantage of the multistep methods, namely that they require fewer derivative evaluations per step than do the one-step methods of comparable accuracy, will be lost if the step size is chosen to be smaller than necessary.
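The algorithm of steps 1 through 5 can be collected into a short program. The following Python sketch is ours, not the book's (the book's programs are in Fortran); it uses a fixed step size, omits step-size control, and combines the Runge-Kutta starting procedure with the predictor (6.102), the modifier (6.110), the iterated corrector (6.96), the error estimate (6.107), and the modified final value (6.108):

```python
import math

def milne_pc(f, x0, y0, h, n, eps=1e-10):
    """Fourth-order Milne predictor-corrector sketch for dy/dx = f(x, y).
    Returns the grid points and the approximate solution values."""
    xs = [x0 + i * h for i in range(n + 1)]
    ys = [y0]
    for i in range(3):                      # classical RK4 starting values
        x, y = xs[i], ys[i]
        k1 = f(x, y)
        k2 = f(x + h/2, y + h*k1/2)
        k3 = f(x + h/2, y + h*k2/2)
        k4 = f(x + h, y + h*k3)
        ys.append(y + h*(k1 + 2*k2 + 2*k3 + k4)/6)
    fs = [f(x, y) for x, y in zip(xs[:4], ys)]
    prev_pred = prev_corr = None
    for i in range(3, n):
        # predictor (6.102)
        y_pred = ys[i-3] + 4*h/3 * (2*fs[i] - fs[i-1] + 2*fs[i-2])
        # modifier (6.110): improved starting value, bypassed on the first step
        y_corr = y_pred
        if prev_pred is not None:
            y_corr = y_pred + 28.0/29.0 * (prev_corr - prev_pred)
        # corrector (6.96), iterated to convergence by successive substitution
        while True:
            y_new = ys[i-1] + h/3 * (f(xs[i+1], y_corr) + 4*fs[i] + fs[i-1])
            if abs(y_new - y_corr) < eps:
                y_corr = y_new
                break
            y_corr = y_new
        # error estimate (6.107) and modified final value (6.108)
        e_c = -(y_corr - y_pred) / 29.0
        y_final = y_corr + e_c
        prev_pred, prev_corr = y_pred, y_corr
        ys.append(y_final)
        fs.append(f(xs[i+1], y_final))
    return xs, ys
```

For dy/dx = -y with y(0) = 1 and h = 0.1, ten steps of this sketch agree closely with the analytical value e^{-1} at x = 1; note that ah = -0.1 easily satisfies (6.99), so the corrector iteration converges in a few passes.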
Fortunately, relationships such as (6.103) and (6.107) provide sufficient information to determine when the step size should be increased or decreased. Unfortunately, the mechanism for implementing such changes is not very straightforward. The difficulty is that when the step size is changed, the necessary starting values for the predictor and corrector will not usually be available. Assume that integration has proceeded without difficulty to base point x_i using step-size h_1, and that integration across the interval [x_i, x_{i+1}] leads to convergence problems for the corrector (for example, if more than one or two iterations are required) or to a truncation error larger than desirable. Then one approach would be to decrease the step size by some amount and restart the integration procedure at x_i using the new step-size h_2. For an arbitrary h_2, about the only choice available is to use a one-step method to generate the necessary starting values y_{i+1}, y_{i+2}, etc., required to reintroduce the multistep method for subsequent integration. A similar procedure can be followed if the estimated truncation error is much smaller than actually required and the corrector equation requires only one or two iterations for convergence, except that the new step-size h_2 is now larger than h_1.

If the new step-size h_2 is chosen to be 2h_1 and a sufficient number of "old" values, y_{i-1}, y_{i-2}, \ldots, and f_{i-1}, f_{i-2}, \ldots, have been retained, then the change is relatively simple. One need only reassign subscripts so that the "old" y_{i-2} becomes the "new" y_{i-1}, the "old" y_{i-4} becomes the "new" y_{i-2}, etc.

If the new step-size h_2 is chosen to be h_1/2, then some but not all of the solution and derivative values for the predictor and corrector will be available. The old y_{i-1} becomes the new y_{i-2}, the old y_{i-2} becomes the new y_{i-4}, etc.; certain values, y_{i-1}, y_{i-3}, etc., will be missing. One approach is to use a one-step method to compute y_{i-1} starting with the new y_{i-2}, and to compute y_{i-3} starting with the new y_{i-4}; another approach is to interpolate on the available new values y_i, y_{i-2}, y_{i-4}, \ldots, to estimate the missing new values y_{i-1}, y_{i-3}, \ldots.

In practice, the step-size h is usually changed by doubling or halving it as described above. Clearly, too-frequent changes in the step size will vitiate the principal advantage of the multistep methods, namely their computational speed.

In the early 1950's, when computers were first widely used for solving ordinary differential equations, investigators discovered that for some equations certain of the multistep methods led to computational errors far larger than expected from local truncation errors (equations with known analytical solutions were used in these studies). It was also discovered that a decrease in step size often resulted in an increase in the observed error, even when round-off errors were known to be insignificant. In some cases, the numerical solution showed little, if any, relationship to the true solution of the equation being solved.

Subsequent analysis has shown that, under certain conditions, some of the multistep methods exhibit catastrophic instabilities which render the numerical solution meaningless. Such instabilities develop even though the equations are inherently stable (see page 364), and cannot in general be removed by step-size adjustment (hence the instability cannot be a partial instability as described on page 365).

The nature of the stability problem is easily illustrated by an analysis of the solution of the simple linear ordinary differential equation with constant coefficient a,

    \frac{dy}{dx} = ay,    (6.111)

using a multistep algorithm. Many analyses reported in the literature begin with an arbitrary equation, but assumptions to retain only linear terms and to require that \partial f/\partial y be constant often follow, and the equation actually being studied is (6.111).

If the corrector equation in a predictor-corrector method is iterated to convergence for each integration step, then any error in the observed solution can be attributed solely to the corrector. Hence, it is sufficient to study the propagation of error by the corrector equation alone. When the corrector is not iterated to convergence, but is employed just once (often the case in practice), then the stability analysis must include errors generated by both the predictor and corrector equations [7]. To illustrate, we shall assume that the corrector is iterated to convergence.

Since the corrector equation (6.95b) for the fourth-order Milne method is observed to produce catastrophic instabilities for solution of some differential equations (but excellent results for others), a stability analysis will illustrate the general approach. Other multistep methods can be analyzed similarly.

The Milne corrector is

    y_{i+1} = y_{i-1} + \frac{h}{3} (f_{i+1} + 4f_i + f_{i-1}).

Substitution of (6.111) leads to

    \left(1 - \frac{ah}{3}\right) y_{i+1} - \frac{4ah}{3}\, y_i - \left(1 + \frac{ah}{3}\right) y_{i-1} = 0,    (6.112)

which is a linear homogeneous difference equation of order two. Note that for a fixed h, the coefficients are constants, and that the convergence criterion of (6.98) requires that

    |ah| < 3.    (6.113)

The theory of the solution of linear difference equations such as (6.112) is closely related to the theory for the solution of linear ordinary differential equations. We
shall give only a brief outline of the theory here; those unfamiliar with it are referred to the excellent introduction to difference equations by Conte [30].

If we let y_k be a function defined for a set of integers k, then a linear difference equation of order n is a linear relation involving y_k, y_{k+1}, \ldots, y_{k+n}, that is, it is of the form

    a_0 y_{k+n} + a_1 y_{k+n-1} + \cdots + a_n y_k = b,    (6.114)

where the coefficients a_0, \ldots, a_n and b may be functions of k but not of y. If b is zero, the equation is considered to be homogeneous; otherwise it is called nonhomogeneous.

The theory for solution of homogeneous linear difference equations with constant coefficients is similar to that for homogeneous linear differential equations with constant coefficients. In this case, we look for a solution of the form y_k = \gamma^k. Substitution of this solution into (6.114), with b = 0 and a_0, \ldots, a_n constant, yields

    a_0 \gamma^{k+n} + a_1 \gamma^{k+n-1} + \cdots + a_n \gamma^k = 0,

which, upon division by \gamma^k, becomes

    a_0 \gamma^n + a_1 \gamma^{n-1} + \cdots + a_n = 0.    (6.115)

Equation (6.115) is called the characteristic equation and is a polynomial of degree n in the variable \gamma. If the n roots, \gamma_1, \ldots, \gamma_n, of (6.115) are distinct, then \gamma_1^k, \gamma_2^k, \ldots, \gamma_n^k are all solutions of (6.114).

The complete solution of the homogeneous form of (6.114) is then given by the linear combination

    y_k = c_1 \gamma_1^k + c_2 \gamma_2^k + \cdots + c_n \gamma_n^k,    (6.116)

where c_1, \ldots, c_n are constants. Each term \gamma_j^k is a particular solution of the homogeneous equation. Complex and multiple roots of (6.115) may be handled in a fashion similar to that used for linear ordinary differential equations.

For the homogeneous linear difference equation of (6.112), the characteristic equation is

    \left(1 - \frac{ah}{3}\right) \gamma^2 - \frac{4ah}{3}\, \gamma - \left(1 + \frac{ah}{3}\right) = 0.    (6.117)

This is a quadratic equation in \gamma which, from the quadratic formula, has the two roots

    \gamma_1 = \left[ \frac{2ah}{3} + \sqrt{1 + \frac{(ah)^2}{3}} \right] \bigg/ \left(1 - \frac{ah}{3}\right),    (6.118)

    \gamma_2 = \left[ \frac{2ah}{3} - \sqrt{1 + \frac{(ah)^2}{3}} \right] \bigg/ \left(1 - \frac{ah}{3}\right).    (6.119)

Milne and Reynolds [31] have tabulated the values of these two roots for values of ah between -1 and 1. Root \gamma_1 is positive for all ah in this range and is monotone increasing from 0.366 at ah = -1 to 2.732 at ah = 1. Root \gamma_2 is negative for all ah in this range and is monotone increasing from -1.366 at ah = -1 to -0.732 at ah = 1. A critical feature of \gamma_2 is that for negative ah it is smaller than -1, and for positive ah it is larger than -1. If we set h equal to zero, the roots are \gamma_1 = 1 and \gamma_2 = -1. Since the convergence criterion (6.113) requires that |ah| be smaller than 3, and for rapid convergence preferably much less than 3, values of ah between -1 and 1 are those which would be encountered in practice.

Now, if \gamma_1 and \gamma_2 are expanded in a power series [31], they are, respectively,

    \gamma_1 = 1 + ah + \frac{(ah)^2}{2!} + \frac{(ah)^3}{3!} + \frac{(ah)^4}{4!} + \frac{(ah)^5}{72} + \cdots,    (6.120)

    \gamma_2 = -1 + \frac{ah}{3} - \frac{(ah)^2}{18} - \frac{(ah)^3}{54} + \frac{5(ah)^4}{648} + \frac{5(ah)^5}{1944} + \cdots.    (6.121)

The solution of the original difference equation (6.112) assumes the form (6.116),

    y_i = c_1 \gamma_1^i + c_2 \gamma_2^i,    (6.122)

or, from (6.120),

    y_i \approx c_1 e^{iah} + c_2 \gamma_2^i.    (6.123)

Since x_i = x_0 + ih, we may write (6.123) as

    y_i \approx c_1 e^{a(x_i - x_0)} + c_2 \gamma_2^i.    (6.124)

The analytical solution of the original differential equation (6.111) is given by

    y = c e^{a(x - x_0)},    (6.125)

where c is a constant determined by the initial conditions. Clearly, the particular solution of the homogeneous difference equation (6.112) which corresponds to the root \gamma_1 is the desired solution of the differential equation (6.125). Note that the difference equation has a second particular solution, c_2 \gamma_2^i, which has no relationship whatsoever to the exact solution of the differential equation. Such solutions are called parasitic or extraneous solutions, and will always be present when a difference equation of order greater than one is used to generate the solution of a first-order ordinary differential equation. We must investigate the nature of the parasitic solution to see if it will in fact be of no significance, or might possibly overwhelm the true solution and render the numerical solution meaningless.
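The two roots (6.118) and (6.119) are easy to examine numerically. A small Python helper (ours, not the book's) evaluates them for any ah:

```python
import math

def milne_roots(ah):
    """Roots (6.118) and (6.119) of the characteristic equation (6.117),
    (1 - ah/3)*g**2 - (4ah/3)*g - (1 + ah/3) = 0,
    for the Milne corrector applied to dy/dx = a*y."""
    disc = math.sqrt(1.0 + ah * ah / 3.0)
    denom = 1.0 - ah / 3.0
    g1 = (2.0 * ah / 3.0 + disc) / denom   # principal root, approximates e^{ah}
    g2 = (2.0 * ah / 3.0 - disc) / denom   # parasitic root
    return g1, g2
```

For ah = 0, milne_roots returns (1.0, -1.0), as stated above; for ah = -0.5 the parasitic root has magnitude about 1.18, while for ah = 0.5 its magnitude is about 0.85, which anticipates the behavior for a < 0 and a > 0 examined next.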
First consider the case a > 0. Since h is positive, ah will be positive, and the solution of the difference equation associated with \gamma_1 will grow exponentially as c_1 e^{a(x_i - x_0)}, approximating a true solution to the differential equation. For positive ah, the root \gamma_2 is a negative fraction greater than -1. Therefore, the particular solution \gamma_2^i will decay with increasing i, and will have only minor influence on the complete solution. Thus errors introduced at various stages during computation (by local truncation errors or round-off errors) will not be amplified in succeeding steps because of the presence of the parasitic solution.

Next, consider the case a < 0. Since h is positive, ah will be negative, and the solution of the difference equation associated with \gamma_1 will decay exponentially as c_1 e^{a(x_i - x_0)}, approximating a true solution to the differential equation. For negative ah, the root \gamma_2 is a negative number less than -1. Hence the particular solution \gamma_2^i will grow in magnitude with increasing i. In fact, because \gamma_2 is negative, the solution will oscillate in sign from step to step. In this case any errors introduced during the computation will be amplified during succeeding steps with alternating signs. Eventually, this parasitic solution will swamp the true solution and the numerical results will be meaningless.

Methods which exhibit stable behavior for the solution of equation (6.111) for some values of ah and unstable behavior for other values of ah are said to be weakly stable, weakly unstable, or marginally stable. Hence the fourth-order Milne corrector is a marginally stable method. This does not mean that the method should never be used, but only that it should not be used when the true solution of the equation is a nonincreasing (decaying) function. Under other circumstances, the Milne corrector is perfectly suitable and, because of its small truncation error, even desirable.

Similar analyses can be carried out for any of the multistep algorithms. In each case, the appropriate homogeneous linear difference equation with constant coefficients is found for equation (6.111). The characteristic equation is then solved for the roots \gamma_1, \gamma_2, \ldots, \gamma_n, where n, the order of the equation, is equal to the difference of the largest and smallest subscript which appears in the multistep equation. For example, for the Milne corrector, the largest subscript is i + 1 and the smallest is i - 1; the order of the difference equation, and thus the number of solutions, is (i + 1) - (i - 1), or 2.

Let \gamma_1 be the root of the characteristic equation which is associated with the true solution of (6.111); then \gamma_1 is called the principal zero or root. If

    |\gamma_j| < |\gamma_1|, \qquad j = 2, 3, \ldots, n,    (6.126)

the method is called relatively stable. In this case, the approximation to the true solution will dominate the parasitic solutions associated with the roots \gamma_j, j = 2, 3, \ldots, n. If the true solution of (6.111) decays with increasing i, then the parasitic solutions must decay even more rapidly. If the true solution of (6.111) grows with increasing i, then the parasitic solutions must either decay or grow less rapidly than the true solution.

A method is called strongly stable if

    |\gamma_j| < 1, \qquad j = 2, 3, \ldots, n.    (6.127)

Thus, for a method to be strongly stable, all parasitic solutions must decay with increasing i.

Dahlquist [32] has shown that if the truncation error for a multistep method is of O(h^{p+1}), and n is the order of the difference equation, then to achieve either strong or marginal stability, p must satisfy the relationship

    p \leq n + 2.    (6.128)

Moreover, p = n + 2 is possible only when n is even and the method is marginally stable (for example, Milne's fourth-order corrector, for which n = 2 and p = 4). Thus one can increase the accuracy of a method (increase the order of the truncation error) only by increasing the value of n and, consequently, the number of parasitic solutions associated with the difference equation. Thus, diminished stability is the price one must pay for increased accuracy in a multistep method. Stability analyses for several new predictor-corrector algorithms have been published in recent years [33, 34, 35, 36, 37, 39].

The preceding stability analysis has dealt with the solution of the very simple linear equation of (6.111). The question of applicability of the conclusions of the simple stability analyses to more general nonlinear equations remains unanswered. Henrici [2] believes that the variability of \partial f/\partial y may make a significant difference in the stability of a method, for example, changing a strongly stable method into a marginally stable one for certain equations. Computational experiments have shown, however, that conclusions about stability, following from stability analyses for equation (6.111), correlate rather well with observed behavior of the multistep methods when used for other equations as well.

Simultaneous ordinary differential equations (see Section 6.7) may be solved by multistep methods as well as by one-step methods. The appropriate algorithm is implemented for each equation in parallel at each step. The criteria for solution of simultaneous corrector equations by the usual method of successive substitutions are those outlined in Section 5.8. Henrici has devoted an entire book [28] to a study of the solution of systems of equations by multistep methods.

6.13 Other Integration Formulas

Hamming [21, 22] has investigated a general class of corrector formulas of the form

    y_{i+1} = a_0 y_i + a_1 y_{i-1} + a_2 y_{i-2} + h (b_{-1} f_{i+1} + b_0 f_i + b_1 f_{i-1} + b_2 f_{i-2}),    (6.129)
which includes the correctors for the fourth-order Milne 2. The predicted solution y i + ,,, is computed using the
and Adams-Moulton predictor-corrector methods. Five fourth-order Milne predictor as in (6.102),
of the seven constants in (6.129) can be eliminated by
requiring that the formula be exact for polynomials of Y ~ + I , O= Yi-3 + gh(2fi -fi-I + 2fi-2).
degree four or less, that is, exact for the functions 3. The predicted value is modified, by assuming that
~ ( x=) xjx2, x4. This process is the method the local truncation error on successive intervals does not
of undetermined coefficients, and leads to the following relationships among the constants, taking a_1 and a_2 as the parameters:

a_0 = 1 - a_1 - a_2,

b_0 = (1/24)(19 + 13a_1 + 8a_2),

b_1 = (1/24)(-5 + 13a_1 + 32a_2),

b_2 = (1/24)(1 - a_1 + 8a_2),

b_-1 = (1/24)(9 - a_1).     (6.130)

The derivation of the error term for (6.129) for y(x) not a polynomial of degree 4 or less is rather complicated [7, 22] and will not be discussed here.

Hamming [22] has studied the stability characteristics of (6.129) and selected the remaining parameters a_1 and a_2 to achieve much stronger stability than is exhibited by the Milne method corrector; the price paid is some increase in the magnitude of the truncation error. The form of (6.129) selected by Hamming, as representing the best compromise between stability and accuracy, is

y_{i+1} = (1/8)[9y_i - y_{i-2} + 3h(f_{i+1} + 2f_i - f_{i-1})].     (6.131)

Assuming that y_i, y_{i-2}, f_i, and f_{i-1} are known exactly, the local truncation error for (6.131) is

e_{i+1} = -(1/40)h^5 y^(5)(ξ).     (6.132)

For practical step sizes, (6.131) is stable in the sense of (6.126).

The Hamming corrector can be used with any suitable predictor. Hamming's predictor-corrector method employs the fourth-order Milne method predictor; the technique of modifying both the predicted and corrected values of the solution, already outlined for the fourth-order Milne method on pages 386 and 387, is usually followed. The complete algorithm is:

1. Starting values for y_0, y_1, y_2, y_3, f_0, f_1, f_2, and f_3 must be available. Usually y_0 will be the only known condition, and a fourth-order Runge-Kutta method will be used to compute y_1, y_2, and y_3. The derivatives f_0, f_1, f_2, and f_3 can be computed from the differential equation dy/dx = f(x,y). Steps 2 through 6 are then repeated for i = 3, 4, ....

2. The predicted solution y_{i+1,0} is computed using the Milne predictor (6.102),

y_{i+1,0} = y_{i-3} + (4h/3)(2f_i - f_{i-1} + 2f_{i-2}).

3. The predicted solution is modified, assuming that the local truncation error estimate will not change appreciably, to yield

y*_{i+1,0} = y_{i+1,0} + (112/121)(y_{i,k} - y_{i,0}).     (6.133)

This modification is analogous to that of (6.110) for Milne's method. For the case i = 3, this step is bypassed, since there will be no values for the elements on the right-hand side of (6.133). Here, y_{i,k} is the solution of the corrector equation (6.134) from the preceding step.

4. Hamming's corrector equation of (6.131) is solved iteratively using the successive substitution algorithm

y_{i+1,j+1} = (1/8)[9y_i - y_{i-2} + 3h(f(x_{i+1}, y_{i+1,j}) + 2f_i - f_{i-1})],  j = 0, 1, 2, ....     (6.134)

The initial value for y_{i+1,j} on the right-hand side is y*_{i+1,0} from step 3. In theory, iteration is continued until some convergence test has been satisfied [for example, see (6.103)]. In practice, the corrector is usually applied just once; the step size is chosen to insure that convergence effectively takes place in just one iteration.

5. The values y_{i+1,1} from step 4 and y_{i+1,0} from step 2 are next used to estimate the truncation error e_{i+1} for the corrector equation from

e_{i+1} = (9/121)(y_{i+1,1} - y_{i+1,0}),     (6.135)

which follows directly from the error terms for the predictor (6.104) and corrector (6.132), respectively, using an analysis identical with that outlined on page 387 for Milne's method [see (6.107)].

6. The final value for the solution y_{i+1} is then computed from

y_{i+1} = y_{i+1,1} - e_{i+1},     (6.136)

which is analogous to (6.108) for Milne's method. The derivative is computed at this final value and called f_{i+1}.

If the local truncation error estimate is within allowable limits, the counter i is advanced by one and steps 2 through 6 are repeated for the next integration interval. If the local truncation error estimate from (6.135) is outside allowable limits, the step size is adjusted in some fashion (see page 388), and then the integration process is continued.

This fourth-order Hamming's method is now among the most popular of the multistep methods.
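For readers who prefer to see the six steps as executable arithmetic, the following Python sketch carries out one predictor-corrector step for a single equation dy/dx = f(x, y). The function name and argument layout are ours, not part of the book's FORTRAN programs; the constants are those of (6.102) and (6.131)-(6.136).

```python
import math

def hamming_step(f, x, h, y, ydot, e_prev):
    """One step of Hamming's method for dy/dx = f(x, y).

    y[0..3]    hold y_i, y_{i-1}, y_{i-2}, y_{i-3};
    ydot[0..2] hold f_i, f_{i-1}, f_{i-2};
    e_prev is the truncation-error estimate from the previous step.
    Returns (y_{i+1}, f_{i+1}, e_{i+1}).
    """
    # Step 2: Milne predictor (6.102).
    y_pred = y[3] + 4.0 * h / 3.0 * (2.0 * ydot[0] - ydot[1] + 2.0 * ydot[2])
    # Step 3: modifier (6.133), written in terms of the stored estimate e_prev.
    y_mod = y_pred + 112.0 * e_prev / 9.0
    # Step 4: Hamming corrector (6.131)/(6.134), applied just once.
    y_corr = (9.0 * y[0] - y[2] +
              3.0 * h * (f(x + h, y_mod) + 2.0 * ydot[0] - ydot[1])) / 8.0
    # Step 5: truncation-error estimate (6.135).
    e_new = 9.0 / 121.0 * (y_corr - y_pred)
    # Step 6: final modification (6.136).
    y_new = y_corr - e_new
    return y_new, f(x + h, y_new), e_new

# One step for dy/dx = -y (true solution e^-x) from x = 0.3 with h = 0.1,
# starting from exact values:
f = lambda x, y: -y
ys = [math.exp(-0.3), math.exp(-0.2), math.exp(-0.1), 1.0]
fs = [-ys[0], -ys[1], -ys[2]]
y4, f4, e4 = hamming_step(f, 0.3, 0.1, ys, fs, 0.0)
print(y4, math.exp(-0.4))  # the two values agree closely
```

Because the starting values are exact and e_prev = 0, the error in y4 here is essentially the single-step truncation error of the method.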
392 The Approximation of the Solution of Ordinary Differential Equations

Many other one-step and multistep algorithms are available for solution of first-order ordinary differential equations. Milne [5] and Henrici [2] give particularly complete coverage of the multistep methods. Recently, Butcher [23, 24, 25] has developed several one-step methods of orders five to seven. Gear [26] and Butcher [27, 28] have developed some modified multistep methods of orders four to seven that have some of the characteristics of both one-step methods (derivative evaluations are made in the interior of the integration step) and multistep methods (information from previous steps is used). Waters [29] has compared the performance of several of these methods with the commonly used algorithms (fourth-order Runge-Kutta method, etc.) from the standpoint of accuracy and computation time. He solves a system of ordinary differential equations for which the evaluation of derivatives requires relatively little computing time.

There is also a variety of one-step and multistep methods suitable for direct integration of equations of order two or more; the higher-order equation need not be reduced to a system of first-order equations before carrying out the numerical solution. Henrici [2] has pointed out that solutions generated by these direct methods are no more accurate than those found by reducing the equation to a system of first-order equations and then solving the system using methods for first-order equations. However, for equations of special form (for example, second-order differential equations that do not contain a term in the first derivative), the direct methods may save some computation.
EXAMPLE 6.4
HAMMING'S METHOD

Problem Statement

Write a general-purpose function named HAMING that solves a system of n first-order ordinary differential equations

dy_1/dx = f_1(x, y_1, y_2, ..., y_n),

dy_2/dx = f_2(x, y_1, y_2, ..., y_n),     (6.4.1)

...

dy_n/dx = f_n(x, y_1, y_2, ..., y_n),

using Hamming's predictor-corrector method outlined in Section 6.13.

Write a main program that solves the second-order ordinary differential equation

d²y/dx² = -y,     (6.4.2)

subject to the initial conditions

y(0) = 0,  dy(0)/dx = 1.     (6.4.3)

The program should call on the fourth-order Runge-Kutta function RUNGE, developed in Example 6.3, to find the essential starting values for Hamming's algorithm and, thereafter, should call on HAMING to calculate estimates of y and dy/dx on the interval [0, x_max]. Evaluate the numerical solutions for several different step sizes, h, and compare the numerical results with the true solutions

y(x) = sin x,

dy/dx = cos x.

Method of Solution

Let y_{j,i} be the final modification of the estimated solution for the jth dependent variable, y_j, at x_i, resulting from Hamming's method (see (6.136)), and let f_{j,i} be the calculated estimate of f_j at x_i, that is,

f_{j,i} = f_j(x_i, y_{1,i}, y_{2,i}, ..., y_{n,i}).     (6.4.5)

Assume that y_{j,i}, y_{j,i-1}, y_{j,i-2}, y_{j,i-3}, f_{j,i}, f_{j,i-1}, and f_{j,i-2} have already been found and are available for j = 1, 2, ..., n. Let an estimate of the local truncation error (6.135) for the corrector equation (6.131) for the jth dependent variable be denoted e_{j,i}. Then, assuming that the e_{j,i}, j = 1, 2, ..., n, are available, the procedure outlined for Hamming's method in Section 6.13 may be modified to handle a system of n simultaneous first-order ordinary differential equations by simply appending a leading subscript, j, to all y and f terms in (6.131) to (6.136). In terms of the new nomenclature, steps 2 through 6 of the outline in Section 6.13 are:

2. The predicted solutions y_{j,i+1,0} are computed using the Milne predictor (6.102):

y_{j,i+1,0} = y_{j,i-3} + (4h/3)(2f_{j,i} - f_{j,i-1} + 2f_{j,i-2}),  j = 1, 2, ..., n.     (6.4.6)

3. The predicted solutions y_{j,i+1,0} are modified (assuming that the local truncation error estimates, e_{j,i+1}, j = 1, 2, ..., n, will not be significantly different from the estimates e_{j,i}, j = 1, 2, ..., n) as in (6.133):

y*_{j,i+1,0} = y_{j,i+1,0} + (112/9)e_{j,i},  j = 1, 2, ..., n.     (6.4.7)

4. The jth corrector equation corresponding to (6.134) is applied for each dependent variable:

y_{j,i+1,1} = (1/8)[9y_{j,i} - y_{j,i-2} + 3h(f*_{j,i+1,0} + 2f_{j,i} - f_{j,i-1})],  j = 1, 2, ..., n,     (6.4.8)

where

f*_{j,i+1,0} = f_j(x_{i+1}, y*_{1,i+1,0}, y*_{2,i+1,0}, ..., y*_{n,i+1,0}).     (6.4.9)

The corrector equations in (6.4.8) are being applied just once for each variable, the customary practice. The corrector equations could, however, be applied more than once, as in (6.134). Note that the subscript j in (6.134) is an iteration counter, and is not the index, j, on the dependent variables used throughout this example.

5. Estimate the local truncation error for each of the corrector equations on the current interval as in (6.135):

e_{j,i+1} = (9/121)(y_{j,i+1,1} - y_{j,i+1,0}),  j = 1, 2, ..., n.     (6.4.10)

6. Make the final modifications of the solutions of the corrector equations as in (6.136):

y_{j,i+1} = y_{j,i+1,1} - e_{j,i+1},  j = 1, 2, ..., n.     (6.4.11)

After evaluating the y_{j,i+1}, the n values f_{j,i+1} may be computed from (6.4.5) as

f_{j,i+1} = f_j(x_{i+1}, y_{1,i+1}, y_{2,i+1}, ..., y_{n,i+1}),  j = 1, 2, ..., n.     (6.4.12)

If desired, the entire process may be repeated for the next interval by starting again at step 2.

If HAMING is to be a general-purpose function, suitable for integrating any set of n first-order equations, then the evaluation of the derivative estimates in (6.4.9) and (6.4.12) must be done outside the function HAMING.

One way to implement the sequence (6.4.6) to (6.4.12) is to separate HAMING into two parts, one to handle the prediction portion of the algorithm (equations (6.4.6) and (6.4.7)), and the other to handle the correction portion of the algorithm (equations (6.4.8), (6.4.10), and (6.4.11)). The function can then return to the calling program after each portion, requesting that the derivative estimates of (6.4.9) and (6.4.12) be calculated in the main program or elsewhere, when needed.

A simplified diagram of the calling procedure for HAMING is shown in Fig. 6.4.1. Note that all direct references to the first-order equations are confined to the calling program. Therefore, the function HAMING can be written to solve an arbitrary system of n first-order equations, provided that the 5n + 2 essential numbers,

x_i, h;  y_{j,i-3}, f_{j,i}, f_{j,i-1}, f_{j,i-2}, e_{j,i},  j = 1, 2, ..., n,     (6.4.13)

are available for the function to use upon entry to the predictor section, and the 6n + 1 essential numbers,

h;  y_{j,i}, y_{j,i-2}, y_{j,i+1,0}, f*_{j,i+1,0}, f_{j,i}, f_{j,i-1},  j = 1, 2, ..., n,     (6.4.14)

are available for the function to use upon entry to the corrector section.

The simplest way to communicate the necessary numbers in (6.4.13) and (6.4.14) is to maintain tables available to both the calling program and the function HAMING. Most of the bookkeeping tasks of updating the tables, etc., can be assigned to the function, so that the calling program has little to do except to set up the necessary tables prior to the first call upon HAMING, and to evaluate the derivative estimates of (6.4.9) and (6.4.12). Let a 4 by n matrix Y, a 3 by n matrix F, and an n-element vector E

[Figure 6.4.1 shows the flow of control. The calling program first calls on the predictor section of HAMING, which evaluates y_{j,i+1,0}, j = 1, 2, ..., n, and y*_{j,i+1,0}, j = 1, 2, ..., n, from (6.4.6) and (6.4.7), respectively, and then updates the independent variable, x_{i+1} = x_i + h. The calling program next evaluates f*_{j,i+1,0}, j = 1, 2, ..., n, from (6.4.9), and then calls on the corrector section of HAMING, which evaluates y_{j,i+1,1}, e_{j,i+1}, and y_{j,i+1}, j = 1, 2, ..., n, from (6.4.8), (6.4.10), and (6.4.11), respectively. Finally, the calling program evaluates f_{j,i+1}, j = 1, 2, ..., n, from (6.4.12).]

Figure 6.4.1 Communication between a calling program and the function HAMING.
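The two-entry calling convention of Fig. 6.4.1 can be mimicked in a few lines of Python. The class below is a sketch under our own naming (it is not the FORTRAN function HAMING of this example): the table pushdown and the return codes 1 and 2 play the roles described in the text, and the caller is responsible for refreshing the first row of F between and after the two calls.

```python
class Haming:
    """Sketch of the two-entry Hamming integrator of Fig. 6.4.1; the
    caller supplies all derivative values between the two calls."""

    def __init__(self, h, Y, F, E):
        self.h = h        # step size
        self.Y = Y        # 4 rows: y_{j,i}, y_{j,i-1}, y_{j,i-2}, y_{j,i-3}
        self.F = F        # 3 rows: f_{j,i}, f_{j,i-1}, f_{j,i-2}
        self.E = E        # truncation-error estimates e_{j,i}
        self.pred = True  # next call enters the predictor section

    def step(self, x):
        n = len(self.Y[0])
        if self.pred:
            # Predictor section: Milne predictor (6.4.6) ...
            ypred = [self.Y[3][j] + 4.0 * self.h / 3.0 *
                     (2.0 * self.F[0][j] - self.F[1][j] + 2.0 * self.F[2][j])
                     for j in range(n)]
            # ... push both tables down one row, placing the modified
            # predictions (6.4.7) in the first row of Y.
            self.Y = ([[ypred[j] + 112.0 * self.E[j] / 9.0 for j in range(n)]]
                      + self.Y[:3])
            self.F = [self.F[0][:]] + self.F[:2]  # row 1 to be overwritten
            self.ypred = ypred
            self.pred = False
            return 1, x + self.h   # caller must now fill F[0] from (6.4.9)
        # Corrector section: (6.4.8), (6.4.10), (6.4.11), applied once.
        for j in range(n):
            yc = (9.0 * self.Y[1][j] - self.Y[3][j] + 3.0 * self.h *
                  (self.F[0][j] + 2.0 * self.F[1][j] - self.F[2][j])) / 8.0
            self.E[j] = 9.0 * (yc - self.ypred[j]) / 121.0
            self.Y[0][j] = yc - self.E[j]
        self.pred = True
        return 2, x   # full step complete; caller refreshes F[0] via (6.4.12)
```

A caller drives one full step with two calls: the first returns (1, x + h) and requests derivatives at the modified predictions; the second returns (2, x) with the final solutions in the first row of Y.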

be assigned to the calling program, and available to the function HAMING through the argument list. Then write the calling program and function HAMING, so that the matrices Y and F, and the vector E, have the following contents at the indicated points in the algorithm.

Before entry into the predictor section of HAMING: Y contains, in its four rows, the solution values

y_{j,i}, y_{j,i-1}, y_{j,i-2}, y_{j,i-3},  j = 1, 2, ..., n,     (6.4.15)

F contains, in its three rows, the derivative values

f_{j,i}, f_{j,i-1}, f_{j,i-2},  j = 1, 2, ..., n,     (6.4.16)

and E contains the truncation error estimates

e_{j,i},  j = 1, 2, ..., n.     (6.4.17)

After return from the predictor section of HAMING: Y contains, in its four rows,

y*_{j,i+1,0}, y_{j,i}, y_{j,i-1}, y_{j,i-2},  j = 1, 2, ..., n,     (6.4.18)

F contains, in its three rows,

f_{j,i}, f_{j,i}, f_{j,i-1},  j = 1, 2, ..., n,     (6.4.19)

(the first row is simply a leftover copy, to be replaced by the calling program), and E is unchanged from (6.4.17).

Before entry into the corrector section of HAMING: Y is unchanged from (6.4.18); the calling program has used (6.4.9) to replace the elements in the first row of F, so that F contains

f*_{1,i+1,0}, ..., f*_{n,i+1,0};  f_{j,i}, f_{j,i-1},  j = 1, 2, ..., n,     (6.4.20)

and E is unchanged from (6.4.17).

After return from the corrector section of HAMING: Y contains, in its four rows,

y_{j,i+1}, y_{j,i}, y_{j,i-1}, y_{j,i-2},  j = 1, 2, ..., n,     (6.4.21)

F is unchanged from (6.4.20), and E contains the new estimates

e_{j,i+1},  j = 1, 2, ..., n.     (6.4.22)

If the calling program uses (6.4.12) to replace the elements in the first row of F after the return from the corrector section of HAMING, so that F contains

f_{j,i+1}, f_{j,i}, f_{j,i-1},  j = 1, 2, ..., n,     (6.4.23)

then the matrices Y and F and vector E are ready for a call on the predictor section of HAMING for the next integration step. The independent variable is incremented in the predictor section, that is,

x_{i+1} = x_i + h,     (6.4.24)

so that the calling program will automatically have the proper value to calculate the f*_{j,i+1,0} and f_{j,i+1} from (6.4.9) and (6.4.12), respectively.

Assuming that the integration process begins at x = x_0 and that the only known conditions on (6.4.1) are

y_{j,0},  j = 1, 2, ..., n,     (6.4.25)

there is not enough information available to calculate the elements of (6.4.15), (6.4.16), and (6.4.17). Therefore, HAMING cannot be called directly when x_i = x_0. The usual procedure is to use a one-step method to integrate across the first three steps to evaluate

y_{j,1}, y_{j,2}, y_{j,3},  j = 1, 2, ..., n.     (6.4.26)

With (6.4.25) and (6.4.26), the matrix Y of (6.4.15) is known for i = 3. The matrix F of (6.4.16) may be evaluated from (6.4.26) and (6.4.1) for i = 3. The vector of local truncation error estimates (6.4.17) for i = 3 is normally unknown, and should be set to zero unless better values are available from other sources. HAMING may then be called for the first time.

In the calling program that follows, the function RUNGE, already developed in Example 6.3, is used to generate the solutions of (6.4.26). Since RUNGE implements the fourth-order Runge-Kutta method, the solutions of (6.4.26) should be comparable in accuracy with the solutions generated by Hamming's predictor-corrector algorithm, also a fourth-order method.
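The startup just described (a one-step method across the first three steps, truncation-error estimates set to zero) can be sketched as follows. Here rk4_step is a plain classical fourth-order Runge-Kutta step standing in for RUNGE, and all names are ours, not the book's.

```python
def rk4_step(f, x, y, h):
    """One classical fourth-order Runge-Kutta step for y' = f(x, y),
    where y and f(x, y) are lists of length n."""
    n = len(y)
    k1 = f(x, y)
    k2 = f(x + h/2, [y[j] + h/2 * k1[j] for j in range(n)])
    k3 = f(x + h/2, [y[j] + h/2 * k2[j] for j in range(n)])
    k4 = f(x + h,   [y[j] + h   * k3[j] for j in range(n)])
    return [y[j] + h/6 * (k1[j] + 2*k2[j] + 2*k3[j] + k4[j]) for j in range(n)]

def hamming_tables(f, x0, y0, h):
    """Integrate across the first three steps and return the Y, F, and E
    tables of (6.4.15)-(6.4.17) for i = 3, newest row first."""
    ys = [y0]
    for i in range(3):
        ys.append(rk4_step(f, x0 + i * h, ys[-1], h))
    Y = [ys[3], ys[2], ys[1], ys[0]]                      # y_3, y_2, y_1, y_0
    F = [f(x0 + 3*h, ys[3]), f(x0 + 2*h, ys[2]), f(x0 + h, ys[1])]
    E = [0.0] * len(y0)          # no better estimates available at startup
    return Y, F, E

# Test problem (6.4.2): y1' = y2, y2' = -y1, with y1(0) = 0, y2(0) = 1.
f = lambda x, y: [y[1], -y[0]]
Y, F, E = hamming_tables(f, 0.0, [0.0, 1.0], 0.1)
```

After this call, Y, F, and E are exactly in the state required for the first entry into the predictor section.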

The main program reads data values for n, x_0, h, x_max, int, y_{1,0}, y_{2,0}, ..., y_{n,0}. Here, int is the number of integration steps between the printing of solution values. This program is a reasonably general one. However, the defining statements for computation of the derivative estimates (6.4.9) and (6.4.12) would be different for each system of differential equations.

For test purposes, the differential equation solved is (6.4.2), subject to the initial conditions of (6.4.3). First, the second-order equation must be rewritten as a system of two first-order equations. Let

y_1 = y,  y_2 = dy/dx.

Then

dy_1/dx = y_2 = f_1(x, y_1, y_2),

dy_2/dx = d²y/dx² = -y_1 = f_2(x, y_1, y_2),

so that

f_1 = y_2,  f_2 = -y_1.

The initial conditions of (6.4.3) are

y_1(0) = 0,  y_2(0) = 1.

Flow Diagram
Main Program

[The flow diagram shows the main program reading the data, printing the initial conditions, and setting the initial truncation-error estimates to zero; calling RUNGE to integrate across the first three steps and storing the starting values in the Y and F tables; then, for each subsequent step: implement the predictor calculations (6.4.6) and (6.4.7) to yield Y(1,j) <- y*_{j,i+1,0}, j = 1, 2, ..., n (function HAMING); evaluate F(1,j) <- f*_{j,i+1,0}, j = 1, 2, ..., n (equation (6.4.9)); implement the corrector calculations (6.4.8), (6.4.10), and (6.4.11) to yield Y(1,j) <- y_{j,i+1} and e_j <- e_{j,i+1}, j = 1, 2, ..., n (function HAMING); and evaluate F(1,j) <- f_{j,i+1}, j = 1, 2, ..., n (equation (6.4.12)). The solutions are printed every int steps until x exceeds x_max.]

Function HAMING (Arguments: n, Y, F, x, h, E)

[The flow diagram shows the two sections of HAMING. Predictor section: compute the predicted values from the predictor equation (6.4.6); push the Y and F tables down one row; apply the modifier equation (6.4.7); set x <- x + h; and return with the value 1. Corrector section: compute the corrected values from the corrector equation (6.4.8); estimate the truncation errors e_j from (6.4.10); apply the modifier equation (6.4.11), Y(1,j) <- y_{j,i+1,1} - e_j = y_{j,i+1}; and return with the value 2.]

FORTRAN Implementation

List of Principal Variables

Program Symbol   Definition

(Main)

COUNT            Step counter, i.
F                F, a 3 by n matrix of derivative estimates (see (6.4.16), (6.4.19), (6.4.20), and (6.4.23)).
FR               Vector of values f_{j,i}, j = 1, 2, ..., n. Used by RUNGE.
H                h, step size.
ISUB             4 - i, a subscript.
INT              Print control parameter, int. Solutions are printed at x_0 and every int steps thereafter.
J                j, index on the dependent variables.
M                Value returned by the function HAMING.
N                n, number of first-order ordinary differential equations.
RUNGE            Function that implements the fourth-order Runge-Kutta method to solve a system of n first-order ordinary differential equations. See Example 6.3 for details.
TE               Vector of local truncation error estimates, e_{j,i}, j = 1, 2, ..., n.
X                Independent variable, x_i.
XMAX             Upper integration limit, x_max.
Y                Y, a 4 by n matrix of solution values (see (6.4.15), (6.4.18), and (6.4.21)).
YR               Vector of values y_{j,i}, j = 1, 2, ..., n. Used by RUNGE.

(Function HAMING)

K                k, a subscript.
K5               5 - k.
PRED             Logical variable, p. If true, the predictor section of HAMING is executed; if false, the corrector section of HAMING is executed.
YPRED            Vector of predicted solutions, y_{j,i+1,0}, j = 1, 2, ..., n (see (6.4.6)).

Program Listing

Main Program

C        APPLIED NUMERICAL METHODS, EXAMPLE 6.4
C            HAMMING'S PREDICTOR-CORRECTOR METHOD
C
C        THIS TEST PROGRAM SOLVES A SYSTEM OF N FIRST-ORDER ORDINARY
C        DIFFERENTIAL EQUATIONS USING THE HAMMING PREDICTOR-CORRECTOR
C        METHOD.  THE PROGRAM READS A STARTING VALUE FOR THE INDEPENDENT
C        VARIABLE, X, THE INTEGRATION STEPSIZE, H, THE UPPER LIMIT OF
C        INTEGRATION, XMAX, AND N INITIAL CONDITIONS YR(1)...YR(N).  Y
C        AND F ARE MATRICES CONTAINING SOLUTION AND DERIVATIVE VALUES
C        AND ARE DESCRIBED IN THE TEXT.  TE(J) IS THE TRUNCATION ERROR
C        ESTIMATE FOR THE JTH CORRECTOR EQUATION.  COUNT IS THE STEP
C        COUNTER.  THE FUNCTION RUNGE (SEE EXAMPLE 6.3) IS CALLED TO
C        INTEGRATE ACROSS THE FIRST THREE STEPS AND YIELDS THE STARTING
C        VALUES NEEDED FOR HAMMING'S METHOD.  THEREAFTER, THE EQUATIONS
C        ARE INTEGRATED USING ALTERNATELY THE PREDICTOR AND CORRECTOR
C        PORTIONS OF THE FUNCTION HAMING.  THE EQUATIONS BEING SOLVED
C        AS AN EXAMPLE ARE THOSE OF (6.4.2).  TO SOLVE ANOTHER SYSTEM OF
C        EQUATIONS, ALL STATEMENTS REFERENCING THE SYMBOLS F AND FR
C        (DERIVATIVES FOR USE BY HAMING AND RUNGE, RESPECTIVELY) MUST BE
C        REPLACED BY STATEMENTS DEFINING THE DERIVATIVES FOR THE NEW
C        SYSTEM.  THE SOLUTIONS ARE PRINTED AT THE INITIAL X AND AFTER
C        EVERY INT INTEGRATION STEPS THEREAFTER.
C
      IMPLICIT REAL*8(A-H, O-Z)
      INTEGER COUNT, RUNGE, HAMING
      DIMENSION TE(20), YR(20), FR(20), Y(4,20), F(3,20)
C
C     ..... READ IN PARAMETERS AND INITIAL CONDITIONS .....
    1 READ (5,100) X, H, XMAX, INT, N, (YR(J), J=1,N)
C
C     ..... PRINT PARAMETERS, HEADING AND INITIAL CONDITIONS .....
      WRITE (6,200) H, XMAX, INT, N, (J, J=1,N)
      WRITE (6,201) X, (YR(J), J=1,N)
C
C     ..... SET INITIAL TRUNCATION ERRORS TO ZERO AND STORE THE
C     ..... INITIAL CONDITIONS IN THE LAST ROW OF THE Y MATRIX .....
      COUNT = 0
      DO 2 J=1,N
      TE(J) = 0.0
    2 Y(4,J) = YR(J)
C
C     ..... CALL RUNGE TO INTEGRATE ACROSS FIRST THREE STEPS .....
    3 IF (RUNGE(N,YR,FR,X,H) .NE. 1) GO TO 4
      FR(1) = YR(2)
      FR(2) = -YR(1)
      GO TO 3
C
C     ..... PUT APPROPRIATE INITIAL VALUES IN THE Y AND F MATRICES .....
    4 COUNT = COUNT + 1
      ISUB = 4 - COUNT
      DO 5 J=1,N
    5 Y(ISUB,J) = YR(J)
      F(ISUB,1) = YR(2)
      F(ISUB,2) = -YR(1)
C
C     ..... PRINT SOLUTIONS AFTER EVERY INT STEPS .....
    6 IF ( COUNT/INT*INT .NE. COUNT ) GO TO 7
      IF ( COUNT.LE.3 ) WRITE (6,202) X, (Y(ISUB,J), J=1,N)
      IF ( COUNT.GT.3 ) WRITE (6,202) X, (Y(1,J), J=1,N)
C
C     ..... IF X EXCEEDS XMAX, TERMINATE INTEGRATION .....
    7 IF ( X.GT.XMAX-H/2. ) GO TO 1
C
C     ..... CALL RUNGE OR HAMING TO INTEGRATE ACROSS NEXT STEP .....
      IF ( COUNT.LT.3 ) GO TO 3
C
C     ..... CALL HAMING (PREDICTION OR CORRECTION) .....
    8 M = HAMING( N,Y,F,X,H,TE )
      F(1,1) = Y(1,2)
      F(1,2) = -Y(1,1)
      IF ( M.EQ.1 ) GO TO 8
C
C     ..... INCREMENT STEP COUNTER AND CONTINUE INTEGRATION .....
      COUNT = COUNT + 1
      GO TO 6
C
C     ..... FORMATS FOR INPUT AND OUTPUT STATEMENTS .....
  100 FORMAT( 5X, F10.4, 10X, F10.6, 12X, F10.4/ 5X, I5, 15X, I2/
     1  (20X, 4F12.5) )
  200 FORMAT( 10H1H      = , E15.3/ 10H XMAX   = , F12.4/ 10H INT    = ,
     1  I7/ 10H N      = , I7/ 1H0/ 1H0, 5X, 1HX, 10X,
     2  4(2HY(, I2, 1H), 10X)/ (1H , 16X, 4(2HY(, I2, 1H), 10X)) )
  201 FORMAT( 1H0, F10.4, 4E15.7/ (1H , 10X, 4E15.7) )
  202 FORMAT( 1H , F10.4, 4E15.7/ (1H , 10X, 4E15.7) )
C
      END

Function HAMING

      FUNCTION HAMING( N, Y, F, X, H, TE )
C
C        HAMING IMPLEMENTS HAMMING'S PREDICTOR-CORRECTOR ALGORITHM TO
C        SOLVE N SIMULTANEOUS FIRST-ORDER ORDINARY DIFFERENTIAL
C        EQUATIONS.  X IS THE INDEPENDENT VARIABLE AND H IS THE
C        INTEGRATION STEPSIZE.  THE ROUTINE MUST BE CALLED TWICE FOR
C        INTEGRATION ACROSS EACH STEP.  ON THE FIRST CALL, IT IS ASSUMED
C        THAT THE SOLUTION VALUES AND DERIVATIVE VALUES FOR THE N
C        EQUATIONS ARE STORED IN THE FIRST N COLUMNS OF THE FIRST
C        FOUR ROWS OF THE Y MATRIX AND THE FIRST THREE ROWS OF THE F
C        MATRIX RESPECTIVELY.  THE ROUTINE COMPUTES THE N PREDICTED
C        SOLUTIONS YPRED(J), INCREMENTS X BY H, AND PUSHES ALL
C        VALUES IN THE Y AND F MATRICES DOWN ONE ROW.  THE PREDICTED
C        SOLUTIONS YPRED(J) ARE MODIFIED, USING THE TRUNCATION ERROR
C        ESTIMATES TE(J) FROM THE PREVIOUS STEP, AND SAVED IN THE FIRST
C        ROW OF THE Y MATRIX.  HAMING RETURNS TO THE CALLING PROGRAM WITH
C        THE VALUE 1 TO INDICATE THAT ALL DERIVATIVES SHOULD BE COMPUTED
C        AND STORED IN THE FIRST ROW OF THE F ARRAY BEFORE THE SECOND
C        CALL IS MADE ON HAMING.  ON THE SECOND ENTRY TO THE FUNCTION
C        (DETERMINED BY THE LOGICAL VARIABLE PRED), HAMING USES THE
C        HAMMING CORRECTOR TO COMPUTE NEW SOLUTION ESTIMATES, ESTIMATES
C        THE TRUNCATION ERRORS TE(J) FOR THE CURRENT STEP, IMPROVES
C        THE CORRECTED SOLUTIONS USING THE NEW TRUNCATION ERROR
C        ESTIMATES, SAVES THE IMPROVED SOLUTIONS IN THE FIRST ROW OF THE
C        Y MATRIX, AND RETURNS TO THE CALLING PROGRAM WITH A VALUE 2 TO
C        INDICATE COMPLETION OF ONE FULL INTEGRATION STEP.
C
      IMPLICIT REAL*8(A-H, O-Z)
      REAL*8 Y, F, X, H, TE
      INTEGER HAMING
      LOGICAL PRED
      DIMENSION YPRED(20), TE(N), Y(4,N), F(3,N)
      DATA PRED / .TRUE. /
C
C     ..... IS CALL FOR PREDICTOR OR CORRECTOR SECTION .....
      IF (.NOT.PRED) GO TO 4
C
C     ..........  PREDICTOR SECTION OF HAMING  ..........
C     ..... COMPUTE PREDICTED Y(J) VALUES AT NEXT POINT .....
      DO 1 J=1,N
    1 YPRED(J) = Y(4,J) + 4.*H*(2.*F(1,J) - F(2,J) + 2.*F(3,J))/3.
C
C     ..... UPDATE THE Y AND F TABLES .....
      DO 2 J=1,N
      DO 2 K5=1,3
      K = 5 - K5
      Y(K,J) = Y(K-1,J)
    2 IF (K.LT.4) F(K,J) = F(K-1,J)
C
C     ..... MODIFY PREDICTED Y(J) VALUES USING THE TRUNCATION ERROR
C     ..... ESTIMATES FROM THE PREVIOUS STEP, INCREMENT X VALUE .....
      DO 3 J=1,N
    3 Y(1,J) = YPRED(J) + 112.*TE(J)/9.
      X = X + H
C
C     ..... SET PRED AND REQUEST UPDATED DERIVATIVE VALUES .....
      PRED = .FALSE.
      HAMING = 1
      RETURN
C
C     ..........  CORRECTOR SECTION OF HAMING  ..........
C     ..... COMPUTE CORRECTED AND IMPROVED VALUES OF THE Y(J) AND SAVE
C     ..... TRUNCATION ERROR ESTIMATES FOR THE CURRENT STEP .....
    4 DO 5 J=1,N
      Y(1,J) = (9.*Y(2,J) - Y(4,J) + 3.*H*(F(1,J) + 2.*F(2,J)
     1  - F(3,J)))/8.
      TE(J) = 9.*(Y(1,J) - YPRED(J))/121.
    5 Y(1,J) = Y(1,J) - TE(J)
C
C     ..... SET PRED AND RETURN WITH SOLUTIONS FOR CURRENT STEP .....
      PRED = .TRUE.
      HAMING = 2
      RETURN
C
      END

Data

X = 0.0000   H = 1.000000   XMAX = 5.0000   INT =  1   N = 2   YR(1)...YR(2) = 0.00000   1.00000
X = 0.0000   H = 0.500000   XMAX = 5.0000   INT =  1   N = 2   YR(1)...YR(2) = 0.00000   1.00000
X = 0.0000   H = 0.250000   XMAX = 5.0000   INT =  2   N = 2   YR(1)...YR(2) = 0.00000   1.00000
X = 0.0000   H = 0.125000   XMAX = 5.0000   INT =  4   N = 2   YR(1)...YR(2) = 0.00000   1.00000
X = 0.0000   H = 0.062500   XMAX = 5.0000   INT =  8   N = 2   YR(1)...YR(2) = 0.00000   1.00000
X = 0.0000   H = 0.031250   XMAX = 5.0000   INT = 16   N = 2   YR(1)...YR(2) = 0.00000   1.00000
X = 0.0000   H = 0.015625   XMAX = 5.0000   INT = 32   N = 2   YR(1)...YR(2) = 0.00000   1.00000

Computer Output

Results for the 1st Data Set

H      =    0.100D 01
XMAX   =    5.0000
INT    =          1
N      =          2

[The tabulated solution values X, Y(1), and Y(2) printed for each data set are not reproduced here. The headings for the remaining data sets are: 2nd, H = 0.500D 00, INT = 1; 3rd, H = 0.250D 00, INT = 2; 4th, H = 0.125D 00, INT = 4; 5th, H = 0.625D-01, INT = 8; 6th, H = 0.312D-01, INT = 16; 7th, H = 0.156D-01, INT = 32; in every case XMAX = 5.0000 and N = 2.]

Discussion of Results

Double-precision arithmetic has been used for all calculations. Differential equation (6.4.2) with initial conditions given by (6.4.3) has been solved on the interval [0,5] with step sizes h = 1.0, 0.5, 0.25, 0.125, 0.0625, 0.03125, and 0.015625. To seven-place accuracy, the true solutions are listed in Table 6.4.1.

Table 6.4.1 True Solutions, y(x) = sin x and dy/dx = cos x
[The tabulated values of x, sin x, and cos x are not reproduced here.]

Table 6.4.2 shows the computed results at x = 5 for the various step sizes used.

Table 6.4.2 Calculated Solutions at x = 5
[Rows for step sizes h = 1.0, 0.5, 0.25, 0.125, 0.0625, 0.03125, and 0.015625 give y_1 and y_2 at x = 5; the computed columns are not reproduced here. The true values are y_1 = sin 5 = -0.9589243 and y_2 = cos 5 = 0.2836622.]

Results for step sizes 0.03125 and 0.015625 (data sets 6 and 7) agree with the true values to seven figures. Results for the larger step sizes are not of acceptable accuracy. The program has been run with even larger values of h (results not shown) as well. For h large enough, the solutions "blow up" in a fashion similar to that already observed for the Runge-Kutta method in Example 6.3. In view of the periodic nature of the solution functions, it is not surprising that as the step size approaches the length of the functional period, the solutions become meaningless. Clearly, for step sizes larger than the period, virtually all local information about the curvature of the function is lost; it would be unreasonable to expect accurate solutions in such cases.

6.14 Boundary-Value Problems

The numerical methods described so far have assumed the availability of initial conditions to start the solution procedure. In many problems, conditions on the equation are given as boundary rather than initial values. Consider, for example, a second-order equation of the form

d²y/dx² = f(x, y, dy/dx),     (6.137)

subject to the boundary conditions y(a) = α, y(b) = β. The numerical solution of this problem is generally much more complicated than the solution for the corresponding initial-value problem.

The usual algorithms for the solution of boundary-value problems fall into two general classes: the finite-difference methods and the shooting methods.

The finite-difference methods involve approximation of the differential equation at the n + 2 base points x_0, x_1, ..., x_{n+1}; each derivative is replaced by a finite-difference representation as described in detail in Chapter 7 (see, in particular, Section 7.3). For the two-point boundary-value problem of (6.137), the initial condition at x_0 = a, that is, y_0 = α, and the boundary condition at x_{n+1} = b, that is, y_{n+1} = β, can be introduced directly into the first and the nth finite-difference equations, respectively. If y_0 and y_{n+1} are specified, then the finite-difference procedure leads to a system of n simultaneous equations in the n unknowns y_1, y_2, ..., y_n, the approximate solutions at the interior base points x_1, x_2, ..., x_n. Usually, though not necessarily, the base points are equally spaced on the interval [x_0, x_{n+1}]. For more complicated initial or boundary conditions, the methods of Section 7.17 may be used.

If the original differential equation is linear (that is, if the differential equation is linear in the dependent variable y and all its derivatives), then the set of equations generated is also linear. The system of simultaneous equations can be solved with the iterative methods of Chapter 5 (for example, the Gauss-Seidel method) if n is large or by the elimination methods of Chapter 5 if n is small (say n < 40). If, as usually happens for equations of order 2, the system of linear equations has a tridiagonal coefficient matrix, the special form of the Gaussian elimination method described in Section 7.9 may be used, even for n rather large (say n up to 500).

Since the finite-difference methods for solution of linear ordinary differential equations are identical with those described in detail in Chapter 7 for solution of linear partial differential equations (but simpler, since the ordinary differential equation has only one independent variable), they will not be pursued further here. The reader is referred to [16] for detailed numerical solutions of several simple linear boundary-value problems by finite-difference methods.

Unfortunately, when the differential equation is nonlinear, the system of finite-difference equations is also nonlinear. In addition to the problems of uniqueness associated with the solution of nonlinear equations, the generation of any solution may be very difficult, especially when many base points are used. In some cases [17], one can linearize the equations, solve the equations iteratively, then relinearize about the new solution, find a new solution iteratively, etc. In effect, a complex problem has been replaced by another problem which is somewhat less complex.

The shooting methods reduce the solution of a boundary-value problem to the iterative solution of an initial-value problem [18]. The usual approaches involve a trial-and-error procedure. That boundary point having the most known conditions is selected as the initial point. Any other missing initial conditions are assumed, and the initial-value problem is solved using one of the step-by-step procedures from the previous sections. Unless the computed solution agrees with the known boundary conditions (unlikely on the first try), the initial conditions are adjusted and the problem is solved again. The process is repeated until the assumed initial conditions yield, within specified tolerances, a solution that agrees with the known boundary conditions.

[Figure 6.8 A trial-and-error solution of a boundary-value problem.]

In (6.137), for example, the missing initial value dy(a)/dx is assumed to be c_1, for instance, and the integration from a to b is carried out by using one of the standard procedures for initial-value problems. Equation (6.137) would normally be rewritten as a system of two first-order equations. Let y_1 be the computed solution at x = b (see Fig. 6.8). The procedure is repeated by using another assumed value for the missing initial condition, for instance, dy(a)/dx = c_2. For this case, let y_2 be the computed solution at x = b. If y_1 and y_2 encompass the desired y(b) = β as shown in Fig. 6.8, and if (6.137) is well behaved, then an estimate of a new trial value, c_3,

for the initial condition can be found by the linear interpolation

c_3 = c_1 + (c_2 - c_1)(β - y_1)/(y_2 - y_1).     (6.138)

This procedure can be repeated by using another linear or perhaps a quadratic or higher-order interpolation to produce a sequence of new values c_4, c_5, c_6, ..., until a selected assumed value of the initial condition produces the boundary-value solution as accurately as desired. This interpolation procedure will be most successful when applied to linear or nearly linear equations; for highly nonlinear equations the approach may be completely unsatisfactory.

Obviously, any other trial-and-error procedure which results in a sequence of ever-improving initial-condition estimates is also acceptable. For example, one could use a root-finding procedure, such as the half-interval method, on the function

F(c) = y(b, c) - β,     (6.139)

where y(b, c) is the computed value of y(b) with initial condition c. This technique is used in the solution of the boundary-value problem of Example 6.5. Yet another approach is to define a nonnegative objective function, such as

F(c) = [y(b, c) - β]²,     (6.140)

and to use one of the standard one-dimensional minimization strategies, such as the "golden-section" or Fibonacci search [19]. When several initial conditions must be assumed, one can frequently define a multidimensional objective function similar to (6.140) and then use one of the multidimensional optimization algorithms, such as the direct search of Hooke and Jeeves [20], to find the missing conditions.

When the boundary-value problem involves a system of many differential equations, or when the specified conditions apply to several different values of the independent variable (the so-called multiple-point boundary-value problem), the solution procedure increases greatly in complexity. In some cases, it may be possible to restate the problem in simpler form by a suitable transformation of variables; in others, a rather drastic change in form, for example, rewriting the problem in terms of integral rather than differential equations, may prove fruitful. In any event, as should be apparent from the preceding paragraphs, there are no known algorithms which a priori guarantee successful solution of an arbitrary boundary-value problem.
EXAMPLE 6.5
A BOUNDARY-VALUE PROBLEM IN FLUID MECHANICS

Introduction

Consider the Blasius problem [42], in which a thin flat plate is placed in, and parallel to, a fluid stream of velocity u∞ and kinematic viscosity ν, as shown in Fig. 6.5.1. We wish to find how u and v, the velocity components in the x and y directions, vary in the region close to the plate, known as the boundary layer.

Figure 6.5.1 The boundary layer on a flat plate.

The governing equations of motion and continuity are

    u ∂u/∂x + v ∂u/∂y = ν ∂²u/∂y²,                    (6.5.1)
    ∂u/∂x + ∂v/∂y = 0,                                (6.5.2)

subject to the boundary conditions

    y = 0:   u = v = 0,                               (6.5.3)
    y → ∞:   u = u∞.                                  (6.5.4)

By introducing the dimensionless coordinate η = y√(u∞/νx) and a stream function ψ such that

    ψ = √(ν u∞ x) f(η),

equations (6.5.1) and (6.5.2) reduce to a single ordinary differential equation in the dimensionless stream function f(η):

    f''' + (1/2) f f'' = 0.                           (6.5.5)

The boundary conditions now become

    η = 0:   f = f' = 0,                              (6.5.6)
    η = ∞:   f' = 1.                                  (6.5.7)

From the above equations involving the stream function, we can derive the following relations for the dimensionless velocity components:

    U = u/u∞ = f',                                    (6.5.8)
    V = v √(x/(ν u∞)) = (η f' − f)/2.                 (6.5.9)

Thus, once (6.5.5) has been solved for f, we can predict the velocity field from (6.5.8) and (6.5.9).

Problem Statement

Perform a fourth-order Runge-Kutta solution of (6.5.5) and its associated boundary conditions. The program should print tabulated values of η, f, f' (= U), f'', and V (= (η f' − f)/2), and should produce a graph of U and V versus η on the on-line printer.

Method of Solution

By defining g1 = f, g2 = f', and g3 = f'', we can replace equation (6.5.5) by an equivalent set of three first-order equations:

    dg1/dη = g2,
    dg2/dη = g3,                                      (6.5.10)
    dg3/dη = −g1 g3/2,

subject to

    g1 = g2 = 0 at η = 0,                             (6.5.11)
    g2 = 1 at η = ∞.                                  (6.5.12)

Starting at η = 0, the integration of (6.5.10) over successive steps Δη is achieved by the fourth-order Runge-Kutta function RUNGE; this is described fully in Example 6.3, and the details will not be repeated here. Since we have initial conditions (at η = 0) for g1 and g2 only, we search for a value of g3 at η = 0 that will generate a solution that yields g2 = 1 at η = ∞. This is accomplished by the half-interval method, discussed previously in Section 3.8 and Example 3.4.

In practice, condition (6.5.12) must be replaced by the approximate condition

    g2(ηmax) = 1,

where ηmax is arbitrary, as long as it is chosen large enough so that the solution shows little further change for η larger than ηmax. This corresponds to the physical fact that the mainstream velocity is effectively equal to u∞ far from the plate.
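The shooting calculation just described is compact enough to sketch in a modern language. The following Python version is a hypothetical transcription of the method (fourth-order Runge-Kutta integration plus the half-interval search), not the book's FORTRAN program; the bracketing values 0.1 and 0.5 and the step Δη = 0.1 are taken from the data set used below.

```python
# Half-interval (bisection) shooting for the Blasius equation
#   f''' + (1/2) f f'' = 0,  f(0) = f'(0) = 0,  f'(inf) = 1,
# written as g1' = g2, g2' = g3, g3' = -g1*g3/2.

def deriv(eta, g):
    g1, g2, g3 = g
    return [g2, g3, -g1 * g3 / 2.0]

def rk4_g2_at(eta_max, g3_zero, h=0.1):
    """Integrate from eta = 0 with g3(0) = g3_zero; return g2(eta_max)."""
    g = [0.0, 0.0, g3_zero]
    eta = 0.0
    while eta < eta_max - h / 2.0:
        k1 = deriv(eta, g)
        k2 = deriv(eta + h/2, [gi + h/2 * ki for gi, ki in zip(g, k1)])
        k3 = deriv(eta + h/2, [gi + h/2 * ki for gi, ki in zip(g, k2)])
        k4 = deriv(eta + h,   [gi + h   * ki for gi, ki in zip(g, k3)])
        g = [gi + h/6 * (a + 2*b + 2*c + d)
             for gi, a, b, c, d in zip(g, k1, k2, k3, k4)]
        eta += h
    return g[1]

def shoot(g3_left=0.1, g3_right=0.5, n=30, eta_max=10.0):
    """Half-interval search for the missing initial condition g3(0)."""
    for _ in range(n):
        g3_zero = (g3_left + g3_right) / 2.0
        if rk4_g2_at(eta_max, g3_zero) > 1.0:
            g3_right = g3_zero      # overshot the mainstream velocity
        else:
            g3_left = g3_zero       # undershot
    return (g3_left + g3_right) / 2.0
```

With these data the search converges to f''(0) ≈ 0.33206, the familiar Blasius wall-shear value for this scaling of the equation.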
408 The Approximation of the Solution of Ordinary Differential Equations

We start with two limits, g3L and g3R, between which the missing initial condition, g30, is thought to lie. The solution of (6.5.10) is performed iteratively n times, improving at each iteration the value chosen for g30. At each iteration, we set g30 = (g3L + g3R)/2, in which one of the limits g3L or g3R of the current interval has been adjusted according to the half-interval method. The criterion for the adjustment is whether or not the computed g2(ηmax) exceeds 1. If it does, g3R at the next iteration is equated to the current midpoint g30, and g3L is not changed. However, if g2(ηmax) < 1, we equate the next g3L to the current g30 and leave g3R unchanged.

The following flow diagram illustrates the essential basic steps. In the actual program, however, there are further elaborations involving arrangements for periodic printing and plotting of graphs.

Flow Diagram

[The flow diagram shows the following steps: for iter = 1, 2, ..., n, set g30 ← (g3L + g3R)/2 and the initial values η ← 0, g1 ← 0, g2 ← 0, g3 ← g30; call on RUNGE repeatedly to advance η, g1, g2, and g3 to the end of each step, until η reaches ηmax; then, if g2 > 1, set g3R ← g30, otherwise set g3L ← g30, and begin the next iteration.]
Example 6.5 A Boundary-Value Problem in Fluid Mechanics

FORTRAN Implementation
List of Principal Variables

Program Symbol    Definition

DETA              Step size, Δη.
DG                Vector containing values of the three derivatives, dg1/dη, dg2/dη, and dg3/dη.
ETA               Dimensionless distance, η.
ETAMAX            Maximum value of η, ηmax, for which solutions are required.
G                 Vector containing values of g1, g2, and g3.
G3LEFT, G3RITE    Lower and upper limits, g3L and g3R, for the current interval containing the initial value of g3.
G3ZERO            Initial value of g3, g30.
IFPLOT            Integer, having the value 1 if a graph is required.
IMAGE             Storage vector used by the plotting subroutines.
ITER              Iteration counter for the half-interval method, iter.
N                 Upper limit on the number of half-interval iterations, n.
NCOPY             Number of copies of the graph that are required.
NIBETP            Number of half-interval iterations between printouts.
NSBETP            Number of Runge-Kutta steps, within any one half-interval iteration, between printouts.
NSTEPS            Counter on the number of steps Δη.
ORD               Value of dimensionless velocity, V, to be plotted as the ordinate of the graph.
PLOT1, PLOT2,     Subroutines used for preparing on-line graph of U and V against η. See Examples 7.1 and 8.5
PLOT3, PLOT4      for further details.
RUNGE             Function for implementing the fourth-order Runge-Kutta procedure (see Example 6.3).
SIGNL             Has the value -1, if g2(ηmax) < 1 when g3(0) = g3L; otherwise, the value is +1.

Program Listing
Main Program
C        APPLIED NUMERICAL METHODS, EXAMPLE 6.5
C        A BOUNDARY VALUE PROBLEM IN FLUID MECHANICS
C
C        THIS PROGRAM CALLS ON THE FOURTH ORDER RUNGE KUTTA METHOD SUB-
C        ROUTINE RUNGE TO SOLVE THE 'BLASIUS' PROBLEM WHICH DESCRIBES
C        BOUNDARY LAYER FLOW ALONG A THIN FLAT PLATE IMMERSED IN A FLUID
C        OF UNIFORM VELOCITY AT ZERO INCIDENCE ANGLE.  G(1), G(2) AND
C        G(3) ARE THE DEPENDENT VARIABLES, ETA THE INDEPENDENT VARIABLE,
C        DETA THE STEPSIZE, AND DG(1), DG(2) AND DG(3) THE DERIVATIVES.
C        THE BOUNDARY CONDITIONS ARE G(1)=0., G(2)=0. AT ETA=0. AND
C        G(2)=1. AT ETA=INFINITY (ETAMAX).  THE MISSING INITIAL
C        CONDITION, G(3) AT ETA=0., IS FOUND BY A HALF INTERVAL
C        ITERATIVE PROCEDURE.  ITER IS THE ITERATION COUNTER AND THE
C        INTERVAL OF UNCERTAINTY AFTER ITER PASSES IS (G3LEFT, G3RITE).
C        G3ZERO IS THE MID-INTERVAL VALUE USED FOR THE INITIAL VALUE OF
C        G(3).  SIGNL = -1 IF G(2) AT ETAMAX IS SMALLER THAN 1.0
C        WHEN G(3) AT ETA = 0 IS EQUAL TO G3LEFT.  OTHERWISE, SIGNL = 1.
C        N IS THE TOTAL NUMBER OF INTERVAL HALVING OPERATIONS, SO THAT
C        THE INITIAL INTERVAL OF UNCERTAINTY IN G(3) AT ETA=0. IS
C        REDUCED BY A FACTOR OF 2.**N.  SOLUTION VALUES ARE PRINTED
C        ON THE FIRST, NTH AND INTEGRAL MULTIPLES OF NIBETP ITERATIONS.
C        SOLUTIONS ARE PRINTED AFTER EVERY NSBETP STEPS (NSTEPS IS THE
C        STEP COUNTER).  WHEN IFPLOT = 1, VALUES OF G(2) AND
C        ORD=0.5*(ETA*G(2)-G(1)) VERSUS ETA ARE PLOTTED ON THE LAST
C        ITERATION.  NCOPY COPIES OF THE GRAPH ARE PRINTED.
C
      IMPLICIT REAL*8(A-H, O-Z)
      INTEGER RUNGE
      DIMENSION G(3), DG(3), IMAGE(1500)
C
    1 READ (5,100) G3LEFT, G3RITE, SIGNL, DETA, ETAMAX, NIBETP, NSBETP,
     1             N, IFPLOT, NCOPY
      WRITE (6,200) G3LEFT, G3RITE, SIGNL, DETA, ETAMAX, NIBETP, NSBETP,
     1             N, IFPLOT, NCOPY

C
C        ..... HALF INTERVAL ITERATION FOR INITIAL G(3) VALUE .....
      DO 21 ITER = 1, N
C
C        ..... SET, PRINT AND PLOT INITIAL CONDITIONS .....
      G3ZERO = (G3LEFT + G3RITE)/2.
      NSTEPS = 0
      ETA = 0.
      G(1) = 0.
      G(2) = 0.
      G(3) = G3ZERO
      ORD = 0.
      WRITE (6,201) ITER, G3LEFT, G3ZERO, G3RITE, ETA, G(1), G(2), G(3),
     1              ORD
      IF (ITER.NE.N .OR. IFPLOT.NE.1) GO TO 8
      CALL PLOT1( 0, 6, 9, 6, 19 )
      CALL PLOT2( IMAGE, ETAMAX, 0., 1.25, 0. )
      CALL PLOT3( 1H*, ETA, G(2), 1, 8 )

C
C        ..... CALL ON RUNGE-KUTTA SUBROUTINE .....
    8 IF ( RUNGE( 3, G, DG, ETA, DETA ) .NE. 1 ) GO TO 10
      DG(1) = G(2)
      DG(2) = G(3)
      DG(3) = -G(1)*G(3)/2.
      GO TO 8
C
C        ..... PRINT SOLUTIONS, PLOT G(2) AND ORD VALUES .....
   10 IF (.NOT.(ITER/NIBETP*NIBETP.EQ.ITER.OR.ITER.EQ.1.OR.ITER.EQ.N))
     1   GO TO 17
      NSTEPS = NSTEPS + 1
      IF (NSTEPS.NE.NSBETP) GO TO 17
      NSTEPS = 0
      ORD = 0.5*(ETA*G(2) - G(1))
      WRITE (6,202) ETA, G(1), G(2), G(3), ORD
      IF (ITER.NE.N .OR. IFPLOT.NE.1) GO TO 17
      CALL PLOT3( 1H*, ETA, G(2), 1, 8 )
      CALL PLOT3( 1HX, ETA, ORD, 1, 8 )

Program Listing (Continued)

C
C        ..... INTEGRATE ACROSS ANOTHER STEP IF REQUIRED .....
   17 IF (ETA .LT. ETAMAX - DETA/2.) GO TO 8
C
C        ..... FIND INTERVAL HALF WITH THE SIGN CHANGE .....
      IF ((G(2) - 1.)*SIGNL .GT. 0.) GO TO 20
      G3RITE = G3ZERO
      GO TO 21
   20 G3LEFT = G3ZERO
   21 CONTINUE

C
C        ..... WRITE NCOPY COPIES OF THE GRAPH IF REQUESTED .....
      IF (IFPLOT.NE.1) GO TO 1
      DO 23 I = 1, NCOPY
      WRITE (6,203)
      CALL PLOT4( 22, 22HG2 AND ORD             )
   23 WRITE (6,204)
      GO TO 1

C
C        ..... FORMATS FOR INPUT AND OUTPUT STATEMENTS .....
  100 FORMAT ( 10X, F10.7, 20X, F10.7, 19X, F3.0/ 10X, F10.7, 20X,
     1   F10.7, 19X, I3/ 10X, I2, 28X, I2, 28X, I2/ 10X, I2 )
  200 FORMAT ( 10H1G3LEFT = , F10.6/ 10H G3RITE = , F10.6/ 10H SIGNL  = ,
     1   F3.0/ 10H DETA   = , F10.6/ 10H ETAMAX = , F10.6/ 10H NIBETP = ,
     2   I3/ 10H NSBETP = , I3/ 10H N      = , I3/ 10H IFPLOT = , I3/
     3   10H NCOPY  = , I3 )
  201 FORMAT ( 10H1ITER   = , I3/ 10H G3LEFT = , F10.6/ 10H G3ZERO = ,
     1   F10.6/ 10H G3RITE = , F10.6/ 7H0   ETA, 11X, 4HG(1), 12X,
     2   4HG(2), 12X, 4HG(3), 12X, 3HORD/ 1H0, F7.4, 2F16.7, 2F16.8 )
  202 FORMAT ( 1H , F7.4, 2F16.7, 2F16.8 )
  203 FORMAT ( 1H1, 52X, 19HTHE BLASIUS PROBLEM/ 1H0 )
  204 FORMAT ( 1H0, 60X, 3HETA/ 1H , 11X, 29HPLOTTING CHARACTER (*) = G(2)
     1   / 1H , 11X, 28HPLOTTING CHARACTER (X) = ORD )
C
      END
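The control pattern in the listing is worth noting: the function RUNGE returns the value 1 whenever the calling program must fill in the derivative vector DG, and a different value when a full Runge-Kutta step is complete. This "reverse communication" keeps the derivative definitions in the main program. A hypothetical Python analogue, using a generator in place of the FORTRAN function (not the book's code, just an illustration of the same interface), might read:

```python
def runge_step(n, g, eta, deta):
    """Generator analogue of RUNGE's reverse communication: each yield
    hands back (eta, g) so the caller can evaluate the derivatives and
    supply them via send(); when the generator finishes, g holds the
    solution advanced one fourth-order Runge-Kutta step."""
    g0 = list(g)
    k = []
    for off in (0.0, 0.5, 0.5, 1.0):
        if k:                                  # reposition g for stages 2-4
            for i in range(n):
                g[i] = g0[i] + off * deta * k[-1][i]
        dg = yield (eta + off * deta, g)       # caller computes dg here
        k.append(list(dg))
    for i in range(n):                         # combine the four slopes
        g[i] = g0[i] + deta / 6.0 * (k[0][i] + 2.0 * k[1][i]
                                     + 2.0 * k[2][i] + k[3][i])

def advance(deriv, g, eta, deta):
    """Drive runge_step across one step, in the spirit of the listing's
    'IF ( RUNGE(...) .NE. 1 ) GO TO 10' loop."""
    step = runge_step(len(g), g, eta, deta)
    x, y = step.send(None)
    try:
        while True:
            x, y = step.send(deriv(x, y))
    except StopIteration:
        return eta + deta
```

For example, `advance(lambda x, y: [y[0]], [1.0], 0.0, 0.1)` advances dy/dx = y one step, leaving the list close to e^0.1.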

Data
G3LEFT = 0.1000000     G3RITE = 0.5000000      SIGNL  = -1.
DETA   = 0.1000000     ETAMAX = 10.0000000     NIBETP = 10
NSBETP = 2             N      = 20             IFPLOT = 1
NCOPY  = 1

Computer Output
G3LEFT =   0.100000
G3RITE =   0.500000
SIGNL  =  -1.
DETA   =   0.100000
ETAMAX =  10.000000
NIBETP =  10
NSBETP =   2
N      =  20
IFPLOT =   1
NCOPY  =   1

Computer Output (Continued)


Results for the 1st Half-Interval Iteration

   ETA        G(1)           G(2)           G(3)           ORD

[Tabulated solution values not reproduced.]

Computer Output (Continued)


Results for the 20th (final) Half-Interval Iteration

   ETA        G(1)           G(2)           G(3)           ORD

[Tabulated solution values not reproduced.]

[Printer plot for the final iteration: G(2) (plotting character *) and ORD (plotting character X) versus ETA.]
Discussion of Results

The results of the computations, performed in double-precision arithmetic, are displayed above only for the first and 20th (final) half-interval iterations. The success of the method can be seen by examining g2(ηmax), which differs only marginally from the required value of 1. The corresponding values for g2(ηmax) at the end of the 1st and 10th iterations were 0.9345562 and 1.0007317, respectively. The results for the final iteration agree to at least six significant digits with those reported in Reference 42. Observe that U essentially attains its mainstream value for η = 6 or greater. Since ∂u/∂x is everywhere negative (at a given distance from the plate, the u velocity is retarded in the direction of flow), a positive V velocity occurs in order to satisfy the continuity equation (6.5.2).
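The two intermediate values quoted above can be checked without repeating the whole integration, by using the similarity scaling of the Blasius equation: if f solves f''' + f f''/2 = 0 with f(0) = f'(0) = 0 and f''(0) = a, then f'(∞) = (a/a*)^(2/3), where a* ≈ 0.332057 is the standard wall value that gives f'(∞) = 1. The following Python sketch replays the half-interval midpoints on this closed form; the value of a* is assumed from the standard literature, not taken from the text.

```python
A_STAR = 0.332057          # standard Blasius value of f''(0) in this scaling

def g2_limit(a):
    """Limiting f' when integration starts from f''(0) = a, from the
    scaling f(eta) -> lam*f(lam*eta) of the Blasius equation."""
    return (a / A_STAR) ** (2.0 / 3.0)

lo, hi, history = 0.1, 0.5, []
for _ in range(10):
    mid = (lo + hi) / 2.0      # the program's G3ZERO
    g2 = g2_limit(mid)
    history.append(g2)
    if g2 > 1.0:
        hi = mid               # replace G3RITE
    else:
        lo = mid               # replace G3LEFT
```

The first and tenth entries of `history` reproduce the quoted 0.9345562 and 1.0007317 to about five significant figures.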

Problems

6.1 Rewrite the following system of ordinary differential equations as a system of first-order ordinary differential equations:

    [equations not reproduced]

6.2 Solve the ordinary differential equation

    [equation not reproduced]

with initial conditions

    [conditions not reproduced]

using the Taylor's expansion approach described in Section 6.2, and compare with the analytical solution.

6.3 Write a program, similar to that of Example 6.1, that employs either the improved Euler method of (6.59) and (6.60) or the modified Euler method of (6.61) and (6.62) to solve the first-order equation (6.19)

    [equation not reproduced]

with initial condition y(x0) = y(0) = 0. Integrate the equation on the interval 0 ≤ x ≤ 2 using several different step sizes, h = 1., 0.5, 0.25, 0.1, 0.05, 0.025, 0.01, 0.005, 0.0025, 0.001, 0.0005, 0.00025, 0.0001. Initially, and after every k steps of the procedure, print the current results (xi, yi, f(xi, yi)), the true solution, y(xi), and the discretization error, εi.

Plot the discretization error at x = 2 against h, h², and h³, and determine the apparent order of the error for small step sizes.

6.4 Consider the first-order equation,

    [equation not reproduced]

with the initial condition,

    [condition not reproduced]

Ignoring round-off error, show that the discretization error for the solution of this equation with the given condition by Euler's method is

    [expression not reproduced]

Consider the solution of the equation on the interval 0 ≤ x ≤ 1. Investigate the stability of Euler's method in the solution of the given equation for an initial error ε0 = 0.0001, and various step-sizes h = 0.01, 0.001, ..., etc.

6.5 Show that the algorithm

    [equations not reproduced]

is a third-order Runge-Kutta method with accuracy comparable to that of (6.64).

6.6 Develop a general formulation of the fourth-order Runge-Kutta methods, comparable to that described in detail for the second-order methods, and outlined for the third-order methods, in Section 6.5. Starting with (6.65), expand each of the ki, i = 2, 3, 4, in a Taylor's series as a function of two variables, and equate the coefficient of each power of h in (6.65) to that of the same power of h in (6.25). What is the system of equations describing the fourth-order parameters, comparable to (6.57) for the second-order methods? How many parameters must be specified to determine a particular fourth-order method? Show that (6.66), (6.67), and (6.68) satisfy the system of equations developed above.

6.7 Choose, arbitrarily, a sufficient number of parameters to specify a particular fourth-order Runge-Kutta method (see Problem 6.6), and develop a new Runge-Kutta formula of your own. Write a function, named MYRK and modelled after the function RUNGE of Example 6.3, that could be used in place of RUNGE to solve systems of first-order ordinary differential equations. Write a main program that calls on MYRK to solve equation (6.4.2) with initial conditions (6.4.3), for a variety of step sizes. Compare the results with those of Example 6.4, for comparable step sizes.

6.8 The differential equation

    [equation not reproduced]

with

    [condition not reproduced]

is to be integrated from x0 = 0 to x10 = 5, using the third-order Runge-Kutta algorithm of (6.64), with a step-size h = 1/2. Without performing the actual integration, compute an upper bound for the total error in the computed value of y at x10. Assume negligible round-off error; that is, the total error is given by the total truncation error.

6.9 Consider the second-order ordinary differential equation

    [equation not reproduced]

in which the first derivative, y'(x), does not appear explicitly. The following Runge-Kutta method has been developed for integrating equations of this type without requiring the second-

order equation to be rewritten as a system of two first-order equations:

    [equations not reproduced]

The initial conditions y(x0) and y'(x0) must be specified. Derive this algorithm, and show that it is a third-order method.

6.10 Write a general-purpose function, named RKGILL, that solves a system of n simultaneous first-order ordinary differential equations (6.77)

    dyi/dx = fi(x, y1, y2, ..., yn),   i = 1, 2, ..., n,

using the fourth-order Runge-Kutta method with Gill's coefficients (6.68). The function should be more general than the function RUNGE of Example 6.3, in that RKGILL should automatically adjust the step-size h, based on upper "bounds" for the local truncation errors to be supplied by the user in a vector argument. Use the Richardson extrapolation technique outlined in (6.70) to (6.74) to estimate the truncation errors. Develop appropriate criteria for determining when to change the step-size h. Halving or doubling the step size should be acceptable in most cases. Keep in mind that unnecessary halving of h or delayed doubling of h will greatly increase the number of calculations required for a given problem solution, as will too frequent estimation of the truncation errors. RKGILL should keep "historical" information to relieve the user of unnecessary bookkeeping. Arguments for the routine should include both the lower and upper integration limits. In addition, it will be convenient to have all derivative evaluations (equivalent to computing the Fi of (6.3.9), (6.3.14), etc.) handled by a separate subroutine with argument list (X, Y, F), where X, Y, and F have the meanings of like symbols in Example 6.3. The name of this subroutine should, itself, be an argument of RKGILL, so that RKGILL can call directly for calculation of derivatives without returning to the calling program.

The user will normally want to print results in tabular form, for independent variable values that are integral multiples of the original step size. Therefore, the routine RKGILL should have as one of its arguments an integer frequency for printing control purposes. This means that all independent variable values associated with the user-specified printout of solutions must occur as the right-hand side (for positive step sizes) of some integration step. It may be necessary to reduce drastically the step size (always by some factor of two) temporarily to accommodate proper printing. Note that if the printing frequency is one, then RKGILL should never increase the step size beyond its original value, even when the error criteria would indicate that this should be done.

To avoid the necessity of storing tabular information, or of having RKGILL implement the printing of solutions, the function should return to the calling program whenever solutions are to be printed. The calling program should then print any results desired according to any appropriate format. The function RKGILL should return a logical value, .FALSE., to signal that integration is not yet complete, or .TRUE., to indicate that the upper integration limit has been reached. If RKGILL returns the value .FALSE., then the calling program should return to RKGILL immediately after printing.

Test the program thoroughly with a variety of functions, to insure that the step-size adjustment and printing control features are functioning properly. Once checked out, RKGILL can be used to solve many of the equations in the problems that follow.

6.11 The corrector equation, (6.95c), for Milne's sixth-order method, can be used iteratively to produce a sequence of corrected values,

    [equation not reproduced]

where

    [equation not reproduced]

If the differential equation being solved is dy/dx = f(x,y), under what conditions will this iterative procedure converge to a limiting value y(n+1, ∞)?

6.12 Conduct a stability analysis of the sixth-order Milne corrector (6.95c) for the solution of (6.111).
(a) Find the linear homogeneous difference equation corresponding to the corrector.
(b) Find the characteristic equation of the difference equation of part (a).
(c) How many roots are there for (b)?
(d) For what range of αh should the behavior of (c) be examined?
(e) Investigate the behavior of the roots on the interval selected in part (d), using one of the root-finding techniques discussed in Chapter 3.
(f) If possible, expand the roots of (b) as power series in the parameter αh, as in (6.120) and (6.121).
(g) What is the complete solution to the difference equation?
(h) Identify the solution associated with the principal root and the parasitic solutions (from part (g)).
(i) Show that the sixth-order Milne corrector is marginally stable.

6.13 Conduct a stability analysis, similar to that of Problem 6.12, for the corrector (6.95a) of the fourth-order modified Adams method. Does this corrector give rise to parasitic solutions? Compare the stability of (6.95a) with the stability of (6.95c).

6.14 Develop an algorithm, based on the modified Adams predictor-corrector method ((6.88a) and (6.95a)) that will allow the solution of a single nth-order ordinary differential equation,

    [equation not reproduced]
with n initial conditions, y, dy/dx, ..., d^(n-1)y/dx^(n-1), all specified at x = x0, without requiring the decomposition of the nth-order equation into a system of n first-order equations. How would you implement your algorithm as a general-purpose subroutine?

6.15 Write a general-purpose function, named AUTOHM, that solves a system of n simultaneous first-order ordinary differential equations (6.77),

    dyi/dx = fi(x, y1, y2, ..., yn),   i = 1, 2, ..., n,

using the fourth-order Hamming corrector (6.131) with the fourth-order Milne predictor (6.102). The function should be more general than the function HAMING of Example 6.4, in that AUTOHM should automatically adjust the step-size h, based on upper "bounds" for the local truncation errors to be supplied by the user in a vector argument. Develop appropriate criteria for determining when to change the step size by either halving it or doubling it. Keep in mind that too frequent modification of the step size may cause an excessive number of calculations to be done.

In the manner of the function RKGILL described in Problem 6.10, evaluations of derivatives should be performed by a separate subroutine. Arguments for AUTOHM should include the upper and lower integration limits, the name of the function that evaluates the derivatives, a starting step size, an integer frequency for printing control (see Problem 6.10), a vector of error bounds, all tables necessary for communication of information, etc. AUTOHM should return the value .FALSE. when the calling program is to print results and then call immediately upon AUTOHM to continue the integration, or .TRUE. when integration to the upper limit is complete.

Starting values for the solution should be generated internally by AUTOHM using a fourth-order Runge-Kutta method. The user should be required to supply the n solution values only at x0, and call upon AUTOHM directly. The function RKGILL of Problem 6.10 could be called to do this; however, a preferred way of starting would be to use the following Runge-Kutta method (here written for a single equation):

    y(i+1) = y(i) + 0.17476028226269037 h k1
                  - 0.55148066287873294 h k2
                  + 1.2055355993965235  h k3
                  + 0.17118478121951903 h k4,

    k1 = f(xi, yi),
    k2 = f(xi + 0.4 h, yi + 0.4 h k1),
    k3 = f(xi + 0.45573725421878943 h, yi + 0.29697760924775360 h k1
                                          + 0.15875964497103583 h k2),
    k4 = f(xi + h, yi + 0.21810038822592047 h k1
                      - 3.0509651486929308 h k2
                      + 3.8328647604670103 h k3).

Ralston [8] shows that of all fourth-order Runge-Kutta methods, this one has the smallest bound for the local truncation error. It has rather poor stability to rounding errors, but this is not important, since only three steps of the algorithm are required to get the starting values for Hamming's method.

Adequate "old" information regarding the yj and fj, j = 1, 2, ..., n, should be retained at all times to allow doubling of the step size, without recomputing any functional values before continuing the integration with the new step size. When the step size is halved, functional information at two new base points will be required. Use the Runge-Kutta method outlined above to find the missing values.

Test the program thoroughly with the equations used to test RKGILL of Problem 6.10. Once checked out, AUTOHM can be used to solve many of the equations in the problems that follow.

6.16 Investigate the following procedure, related to the principle of spline function approximation (see Problem 1.24), for approximating the solution of

    dy/dx = f(x, y)

subject to the initial condition y(x0) = y0.

Introduce a series of equally spaced base points, xi = x0 + ih, i = 0, 1, ..., n, where the spacing is h = (b - a)/n. Assume that over each interval [xi, xi+1], i = 0, 1, ..., n - 1, the solution is approximated by the second-degree polynomial,

    [equation not reproduced]

Also, assume that successive such polynomials and their first derivatives match each other in value at intermediate base points, so that

    [equations not reproduced]

In addition, to insure that the initial condition is obeyed, and that the differential equation is satisfied at the end points, let

    [equations not reproduced]

(a) Show that the complete solution to the problem amounts to solving 3n simultaneous equations in the coefficients ai, bi, ci, i = 0, 1, ..., n - 1.
(b) What is the most efficient way of solving these simultaneous equations?
(c) Apply the method to the solution of

    [equation not reproduced]

on the interval [1, 2] with y0 = 2. Use several different step sizes such as 0.5, 0.25, 0.1, 0.05, 0.025, 0.01, 0.005, 0.001. If possible, compare the computational work performed and the accuracy obtained with those for solution by the Euler and Runge-Kutta methods for the same problem. In any event, compare the results with the analytical solution.

6.17 Extend the method of Problem 6.16 to the solution of equations of the form

    [equation not reproduced]

Note that relations such as

    [equation not reproduced]

will now be involved, so that initial guesses must be made for the unknown yi, i = 1, 2, ..., n, in order to solve the system of equations for the ai, bi, ci, i = 0, 1, ..., n - 1. Devise an iterative scheme and convergence criteria for the iterative improvement of the solutions yi, i = 1, 2, ..., n.

Apply the method you develop to the solution of

    [equation not reproduced]

with initial condition y(0) = y0 = 0. Try the approach with various step sizes and compare the solutions with the analytical solution and solutions found by other integration procedures.

6.18 Extend the methods of Problems 6.16 and 6.17 by approximating the solution y(x) on each interval [xi, xi+1], i = 0, 1, ..., n - 1, with the third-degree polynomial

    [equation not reproduced]

6.19 Repeat Problem 2.15, with the following variation. Assume that the exchanger length L is a known quantity, to be included in the data, and that the resulting exit temperature T2 is to be computed by the program. First note that the differential form of the heat balance given in Problem 2.15 is

    [equation not reproduced]

where x is the distance from the heat-exchanger inlet. The problem then amounts to a step-by-step solution of a first-order differential equation, with the initial condition that T = T1 at x = 0; the solution is terminated when the specified length L is reached, at which stage T is the required temperature T2.

6.20 Modify Example 6.2, concerning the pyrolysis of ethane in a tubular reactor, now removing the earlier simplifying assumption of constant pressure throughout the reactor. Assume that the pressure drop along the reactor is almost entirely due to wall friction (and not to the acceleration of the gases), so that a momentum balance yields

    [equation not reproduced]

Here, D is the internal diameter of the tube, u is the mean velocity, ρ is the density, and fM is the dimensionless Moody friction factor (see Example 5.4 and Problem 3.38).

The above differential equation will now augment those involving dz/dL and dT/dL. Note that u and ρ may be expressed in terms of known quantities by making the substitutions m = ρuA and ρ = MP/RT, where m is the total mass flow rate and M = 30/(1 + z) is the mean molecular weight at any point. Other symbols and suggested test data are basically the same as given in Example 6.2. However, include a value for fM (such as 0.022) in the data; also investigate the consequences of specifying different pipe diameters (such as D = 5.047, 4.026, 3.548, and 3.068 in.). As usual, use consistent units when substituting actual values into the above differential equation for dP/dL.

6.21 Example 6.2 assumed for simplicity that the thermal cracking of ethane yields two products only, ethylene and hydrogen. The situation is really more complicated, as evidenced by Shah [50], who discusses several additional reactions that may occur during the thermal decomposition of ethane (and also of propane).

Shah's paper includes all of the relevant reaction rates, specific heats, and heats of reaction. Armed with this information, which is too lengthy to be summarized here, the reader may wish to devise and solve further problems in the area of thermal cracking.

6.22 A periodic voltage V = Vmax sin ωt is applied to terminal A of the rectifying and ripple-filter circuit shown in Fig. P6.22. The particular diode rectifier used has essentially negligible resistance for current flowing from A to B, but has infinite resistance in the reverse direction. Terminal C is grounded.

Figure P6.22

After many cycles of continuous operation, determine the variation with time of: (a) the voltage across capacitor C1, (b) the voltage across the load resistor R, and (c) the current through the inductance. For the load, also determine the ripple factor (root mean square deviation of voltage from its mean value, divided by that mean value).

Suggested Test Data

Vmax = 160 volts, ω = 120π sec⁻¹, C1 = 4, C2 = 8, then C1 = 8, C2 = 4 microfarads, L = 5 henries, with R = 220 and 2200 ohms for each combination of capacitances.

6.23 A periodic square-wave voltage of amplitude V0 and period T is applied to terminal A in Fig. P6.23; terminal C is grounded. The rectifier is such that for va ≥ vb, the current iab flowing from A to B obeys the relation

    iab = k(va - vb)^(3/2),

in which k is a constant; however, when va < vb, no current flows between A and B in either direction.

Write a program that will compute the variation with time t of the voltages across (a) the capacitor and (b) the resistor. Discontinue the calculations when either t exceeds an upper limit tmax, or the fractional voltage change from one cycle to the next (compared at corresponding times) falls below a small value ε.

Suggested Test Data

V0 = 100 V, T = 0.1, 0.01, and 0.001 sec, C = 10 μF, L = 2.5 H, R = 500 Ω, with k = 0.003 and 0.0003 A/V^(3/2) for each value of T; ε = 0.001 and tmax = 100T.

6.24 Problem 5.24 was concerned with writing a general program that could simulate any electrical network consisting entirely of resistors, with known voltages applied at certain points.

Discuss the possibility of extending this idea to the digital simulation of a completely general network, in which not only resistors, but capacitors, inductances, switches, electronic vacuum tubes, a-c sources, etc., may be present.

6.25 Figure P6.25 shows the cross section of an electron lens that is discussed in Problem 7.26 (q.v.).

Figure P6.25

An electron of charge -e and mass m enters the small hole at A with velocity c at an angle θ to the axis of the lens. Inside the hollow cylinders, the electric field is E = -∇V, and the force on the electron is F = -eE. Hence the subsequent radial and axial velocity components u and v are modified according to:

    m du/dt = e ∂V/∂r,
    m dv/dt = e ∂V/∂z.

Hopefully, the anode voltage Va and the distances R and Z will be arranged so that the electron will be travelling along the z axis by the time it escapes through the small hole at B.

Making use of the potential distributions computed in Problem 7.26, write another program that will investigate the effectiveness of the lens for several different values of the parameters c, θ, Va, R, and Z. Note that e = 1.602 × 10⁻¹⁹ coulombs and m = 9.108 × 10⁻³¹ kg.

6.26 The pin-ended strut shown in Fig. P6.26 is subjected to an axial load P. The lateral deflection y is governed by [43]:

    [equation not reproduced]

where E = Young's modulus, and I is the appropriate cross-sectional moment of inertia of the strut. (For small deflections, not the situation here, the slope dy/dx is frequently neglected.)

Figure P6.26

If the length of the strut is L, assumed constant, at what value of PL²/EI will the central deflection yc be such that yc/L = 0.3?

6.27 A mast of length L, clamped vertically at its base B, has Young's modulus E and second cross-sectional moment of area I. If the mast is long enough, it will bend over due to its self-weight, w per unit length. As shown in Fig. P6.27, let s denote distance measured along the mast from B, y = y(s) the horizontal deflection at any point, x = x(s) the vertical height at any point, and θ = θ(s) the angle between the mast and the vertical. Note that dy/ds = sin θ. The governing equation is

    [equation not reproduced]

which may be differentiated with respect to s to yield a second-order ordinary differential equation. The boundary conditions are θ = 0 at s = 0, and dθ/ds = 0 at s = L.

Figure P6.27

Compute those values of the dimensionless group γ = wL³/EI for which xA/L equals 0.99, 0.95, 0.9, 0.5, and 0. What will be the corresponding values of yA/L and α (the value of θ at the free end A)? Determine the value of γ for which the mast just begins to lean over. What does this mean in practical terms for a light-alloy pole of O.D. 2 in. and wall thickness 0.1 in., for which w = 0.705 lbf/ft, E = 10⁷ lbf/sq in., and I = 0.270 in.⁴?

Timoshenko [44] gives a simplified treatment of this problem, in which the curvature dθ/ds is approximated by d²y/dx² (see Problem 3.46).
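For the mast of Problem 6.27, the onset of leaning is a linearized eigenvalue problem. Assuming that differentiating the governing equation and linearizing for small θ leads to EI d²θ/ds² + w(L - s)θ = 0 with θ(0) = 0 and dθ/ds = 0 at s = L (a plausible reduction consistent with the problem statement, not quoted from the book), the following Python sketch locates the smallest γ = wL³/EI admitting a nontrivial solution by shooting and the half-interval method, with L = EI = 1 so that γ = w.

```python
def theta_slope_at_tip(gamma, n=2000):
    """Integrate theta'' = -gamma*(1 - s)*theta with theta(0) = 0,
    theta'(0) = 1 by fourth-order Runge-Kutta over 0 <= s <= 1;
    return theta'(1)."""
    h = 1.0 / n
    s, th, dth = 0.0, 0.0, 1.0

    def deriv(s, th, dth):
        return dth, -gamma * (1.0 - s) * th

    for _ in range(n):
        k1 = deriv(s, th, dth)
        k2 = deriv(s + h/2, th + h/2 * k1[0], dth + h/2 * k1[1])
        k3 = deriv(s + h/2, th + h/2 * k2[0], dth + h/2 * k2[1])
        k4 = deriv(s + h,   th + h   * k3[0], dth + h   * k3[1])
        th  += h/6 * (k1[0] + 2*k2[0] + 2*k3[0] + k4[0])
        dth += h/6 * (k1[1] + 2*k2[1] + 2*k3[1] + k4[1])
        s += h
    return dth

def critical_gamma(lo=5.0, hi=10.0, iters=50):
    """Half-interval search: theta'(1) changes sign at the buckling value."""
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if theta_slope_at_tip(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

Under these assumptions the search converges to γ ≈ 7.84, the classical value for a uniform column buckling under its own weight.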
6.28 The installation shown in Fig. P6.28 delivers water
from a reservoir of constant elevation H to a turbine. The
surge tank of diameter D is intended to prevent excessive
where E = Young's modulus, and I is the appropriate cross- pressure rises in the pipe whenever the valve is closed quickly
sectional moment of inertia of the strut. (For small deflec- during an emergency.
The drag coefficient CD varies with the Mach number
M = V/c, where c =velocity of sound in air. Representative
values (see Streeter [45], for example) are:
M: 0 0.5 0.75 1 1.25 1.5 2 2.5 3
CD: 1.6 1.8 2.1 4.8 5.8 5.4 4.8 4.2 3.9

'X +
I
H The range on horizontal terrain is to be R. Write a program
that will determine if this range can be achieved and, if so, the
__________------- two possible angles of fire, the total times of flight, and the
Turbine velocities on impact. In each case, also determine the error in
the range that would result from a i10 minute of arc error in
FSgure P6.28 the firing angle.

Assuming constant density, and neglecting the effect of Suggested Test Data
acceleration of water in the surge tank, a momentum balance m = 100 lb,, g = 32.2 ft/sec2, d = 6 in., V, = 2154 ft/sec,
on the water in the pipe leads to p = 0.0742 Ib,/cuft, c = 1137 ft/sec, with R = 0.5, 1 , 2, 4,6, 8,
and 10 thousand yards.
6.30 Let the suspension of a vehicle of mass M be repre-
sented by the lumped spring-dashpot system shown in Fig.
Here, h =height of water in the surge tank (ho under steady P6.30. At time t = 0 the vehicle is travelling steadily with its
conditions), fM = Moody friction factor, u = mean velocity in
the pipe, g =gravitational acceleration, and t = time. Also, Mass M
Direction of motion
continuity requires that
Equilibrium positim

-T 7--,---
The flow rate Q, through the valve in the period 0 < t C t,
during which it is being closed is approximated by

Figure P6.30
where the constant k depends on the particular valve and the
downstream head h* depends on the particular emergency at
the turbine. center of gravity at an equilibrium distance ho above the
During and after an emergency shutdown of the valve, ground, and it has no vertical velocity. For subsequent times
compute the variation with time of the level in the surge tank. (t > 0) the contour of the road (the displacement of the road
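The time integration requested here can be carried out with any one-step method. The sketch below is illustrative only: the printed forms of the momentum, continuity, and valve equations are illegible in this reproduction, so the code assumes the standard surge-tank balances dv/dt = g(H − h)/L − fM v|v|/(2d) and (πD²/4) dh/dt = (πd²/4)v − Qv, with an assumed valve law Qv = k(1 − t/tc)√(h − h*) during closure; all symbol and function names are the sketch's own, not the book's.

```python
import math

g, H, h0, fM, L, d, tc, k = 32.2, 100.0, 88.0, 0.024, 2000.0, 2.0, 6.0, 21.4

def derivs(t, v, h, D, hstar):
    """Assumed balances (the book's printed equations are illegible here):
       pipe momentum and surge-tank continuity with a linearly closing valve."""
    frac = max(0.0, 1.0 - t/tc)                     # valve open fraction
    Qv = k*frac*math.sqrt(max(h - hstar, 0.0))      # valve flow, cu ft/sec
    dv = g*(H - h)/L - fM*v*abs(v)/(2.0*d)
    dh = ((math.pi*d**2/4)*v - Qv)/(math.pi*D**2/4)
    return dv, dh

def rk4_surge(D, hstar, v0, dt=0.05, tmax=60.0):
    """Fourth-order Runge-Kutta integration of (v, h) from steady conditions."""
    t, v, h = 0.0, v0, h0
    levels = [(t, h)]
    while t < tmax:
        k1 = derivs(t, v, h, D, hstar)
        k2 = derivs(t + dt/2, v + dt/2*k1[0], h + dt/2*k1[1], D, hstar)
        k3 = derivs(t + dt/2, v + dt/2*k2[0], h + dt/2*k2[1], D, hstar)
        k4 = derivs(t + dt, v + dt*k3[0], h + dt*k3[1], D, hstar)
        v += dt*(k1[0] + 2*k2[0] + 2*k3[0] + k4[0])/6
        h += dt*(k1[1] + 2*k2[1] + 2*k3[1] + k4[1])/6
        t += dt
        levels.append((t, h))
    return levels

# Steady state before closure: friction loss balances the head difference.
v0 = math.sqrt(2.0*d*g*(H - h0)/(fM*L))    # about 5.7 ft/sec
Qv0 = (math.pi*d**2/4)*v0                  # original steady valve flow
levels = rk4_surge(D=4.0, hstar=h0 - (Qv0/k)**2, v0=v0)
```

Repeating the call for D = 6, 10, and 15 ft, and for h* = 0, covers the suggested cases; the maximum of the returned levels gives the surge amplitude.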
surface from the reference level at time zero) is given by an
Suggested Test Data arbitrary function x o = xo(t).
g = 32.2 ft/sec², H = 100 ft, h₀ = 88 ft, fM = 0.024,
L = 2000 ft, d = 2 ft, tc = 6 sec, k = 21.4 ft^2.5/sec, and D = 4,
6, 10, and 15 ft. Take two extreme values for h*: (a) its original
steady value, h₀ − Q²v0/k², where Qv0 is the original steady
flow rate, and (b) zero.

and the damping coefficient of the dashpot is r, then show
that x(t), the displacement of the center of gravity of the
vehicle as a function of time, is the solution of the second-
order ordinary differential equation
6.29 A projectile of mass m and maximum diameter d is
fired with muzzle velocity V₀ at an angle of fire θ₀ above the
horizontal. The subsequent velocity components u and v in
the horizontal and vertical directions x and y, respectively,
obey the ordinary differential equations

m du/dt = −FD cos θ,

Here, θ = angle of projectile path above the horizontal, and
g = gravitational acceleration. The drag force FD is given by
FD = CD A ρV²/2, where V = speed of projectile, A = πd²/4 =
projected area, and ρ = density of air. Consistent units are
assumed in the above equations.

with the initial conditions

Let the road contour function be given by

x₀(t) = A(1 − cos ωt),

where 2A is the full displacement of the road surface from the
reference level.
Note that the under-, critically-, and over-damped cases
correspond to r/(2√(kM)) being less than, equal to, and greater
than unity, respectively.
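A sketch of the trajectory calculation for Problem 6.29, interpolating CD linearly in the tabulated Mach numbers and advancing (x, y, u, v) with a fourth-order Runge-Kutta step. The vertical equation m dv/dt = −FD sin θ − mg is a completion of the pair (only the horizontal equation survives legibly above), and the search for the two firing angles that give range R is left as the exercise intends:

```python
import math

# CD versus Mach number, from the table in the problem statement
MACH = [0, 0.5, 0.75, 1, 1.25, 1.5, 2, 2.5, 3]
CDTAB = [1.6, 1.8, 2.1, 4.8, 5.8, 5.4, 4.8, 4.2, 3.9]

def cd(mach):
    """Piecewise-linear interpolation in the tabulated drag coefficient."""
    if mach <= MACH[0]:
        return CDTAB[0]
    if mach >= MACH[-1]:
        return CDTAB[-1]
    i = next(j for j in range(len(MACH) - 1) if mach <= MACH[j + 1])
    f = (mach - MACH[i])/(MACH[i + 1] - MACH[i])
    return CDTAB[i] + f*(CDTAB[i + 1] - CDTAB[i])

m, grav, rho, c = 100.0, 32.2, 0.0742, 1137.0   # lbm, ft/sec^2, lbm/cu ft, ft/sec
A = math.pi*0.5**2/4                            # projected area for d = 6 in.

def accel(u, v):
    V = math.hypot(u, v)
    FD = cd(V/c)*A*rho*V*V/2.0          # lbm ft/sec^2, consistent with m in lbm
    # cos(theta) = u/V, sin(theta) = v/V; vertical equation is an assumed completion
    return -FD*u/(V*m), -FD*v/(V*m) - grav

def trajectory(theta_deg, V0=2154.0, dt=0.005):
    """Integrate (x, y, u, v) until impact; returns range, flight time, impact speed."""
    def f(s):
        ax, ay = accel(s[2], s[3])
        return (s[2], s[3], ax, ay)
    th = math.radians(theta_deg)
    s, t = (0.0, 0.0, V0*math.cos(th), V0*math.sin(th)), 0.0
    while s[1] >= 0.0:
        k1 = f(s)
        k2 = f(tuple(a + dt/2*b for a, b in zip(s, k1)))
        k3 = f(tuple(a + dt/2*b for a, b in zip(s, k2)))
        k4 = f(tuple(a + dt*b for a, b in zip(s, k3)))
        prev = s
        s = tuple(a + dt*(p + 2*q + 2*r + w)/6
                  for a, p, q, r, w in zip(s, k1, k2, k3, k4))
        t += dt
    frac = prev[1]/(prev[1] - s[1])     # interpolate back to y = 0
    rng = prev[0] + frac*(s[0] - prev[0])
    return rng, t, math.hypot(s[2], s[3])
```

Bisecting on theta_deg in trajectory() then yields the low and high firing angles for a prescribed range R.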
422 The Approximation of the Solution of Ordinary Differential Equations

Write a program that uses the model outlined above to 6.33 A hot vertical wall, maintained at a temperature θw,
simulate the behavior of a specified suspension system, over the is situated in a fluid of infinite extent. Far away from the wall,
time period 0 ≤ t ≤ tmax. The program should read values for the fluid has temperature θ∞ and zero velocity. Because of the
M , R, K, A, W, TMAX, DT, and FREQ, where W corresponds to buoyant effect on heating, natural convection occurs in the
w , DT is the step size to be used in integrating the differential fluid. Let u = u(x,y) and v = v(x,y) denote the resulting fluid
equation, and FREQ is an integer used to control printing veIocities in the x and y directions (horizontal and vertical
frequency, such that the values of t, x, dx/dt, d²x/dt², and distances from the bottom of the wall, respectively). Also use
x₀(t) are printed initially and every FREQ time steps thereafter. θ = θ(x,y) to denote the fluid temperature, and let Pr, β, and ν
The remaining program symbols have obvious interpretations. be its Prandtl number, volume coefficient of expansion, and
The equation should be solved using a one-step method, and kinematic viscosity, respectively.
the program should allow for the processing of several sets of The steady motion is governed by the suitably simplified
data. equations of continuity, momentum, and energy (see Schlich-
ting [42],for example). By using the similarity transformation
Suggested Test Data
M = 3680 lbm, K = 640 lbf/in., A = 2 in., W = 7 rad/sec,
and TMAX = 10 sec. Investigate R = 80, 160, and 240 lbf sec/in.
in turn. The values of DT and FREQ will depend on the integra- it may be shown that dimensionless velocities and temperature
tion method chosen. In any event, each of the data sets listed are given by
above should be processed with more than one step size. For
each situation, plot x, x₀, dx/dt, and d²x/dt² as functions of
time.
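Since the printed equation of motion for Problem 6.30 is not legible here, the sketch below assumes the standard base-excited form M x″ = −gc[R(x′ − x₀′) + K(x − x₀)], where gc converts lbf to lbm·in/sec² for the mixed units of the test data; parameter names follow the program symbols above, and the reconstruction itself is an assumption.

```python
import math

GC = 386.088   # lbm·in/(lbf·sec^2): converts lbf to lbm·in/sec^2

def simulate(M=3680.0, K=640.0, R=160.0, A=2.0, W=7.0, dt=0.001, tmax=10.0):
    """RK4 integration of the assumed suspension model
       M x'' = -GC*[R*(x' - x0') + K*(x - x0)],  x(0) = x'(0) = 0,
       with road contour x0(t) = A*(1 - cos(W*t))."""
    def x0(t):  return A*(1.0 - math.cos(W*t))
    def x0d(t): return A*W*math.sin(W*t)
    def acc(t, x, xd):
        return -GC*(R*(xd - x0d(t)) + K*(x - x0(t)))/M
    t, x, xd, out = 0.0, 0.0, 0.0, []
    while t <= tmax:
        out.append((t, x, xd))
        k1x, k1v = xd, acc(t, x, xd)
        k2x, k2v = xd + dt/2*k1v, acc(t + dt/2, x + dt/2*k1x, xd + dt/2*k1v)
        k3x, k3v = xd + dt/2*k2v, acc(t + dt/2, x + dt/2*k2x, xd + dt/2*k2v)
        k4x, k4v = xd + dt*k3v,   acc(t + dt,   x + dt*k3x,   xd + dt*k3v)
        x  += dt*(k1x + 2*k2x + 2*k3x + k4x)/6
        xd += dt*(k1v + 2*k2v + 2*k3v + k4v)/6
        t  += dt
    return out

# Damping criterion r/(2*sqrt(kM)) in consistent (lbm, in., sec) units:
zeta = lambda R, K, M: R*GC/(2.0*math.sqrt(K*GC*M))
```

With the test data, zeta evaluates to about 0.51, 1.02, and 1.54 for R = 80, 160, and 240 lbf sec/in., so the three cases bracket critical damping as the problem note indicates.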
6.31 Let the damping coefficient, r , of Problem 6.30 be a
nonlinear function of the relative velocities of the two parts of
the dashpot:

Rerun the simulation of Problem 6.30 with this definition of where the functions ζ = ζ(η) and T = T(η) are solutions of the
the damping coefficient. Investigate all nine combinations of simultaneous ordinary differential equations
r₀ = 80, 160, and 240 lbf sec/in., with c = 0.1, 1, and 10
(sec/in.)^1/2.
6.32 Extend Example 6.5 to include heat transfer for a
plate at a temperature θw and a fluid whose temperature is θ∞
far away from the plate. Neglecting viscous dissipation and
conduction in the x direction, the partial differential equation
governing the variation of temperature 8 is Show that the boundary conditions on 6 and T are:

subject to the boundary conditions

Note that in a finite-difference procedure, η = ∞ would be
approximated by η = ηmax, where ηmax is large enough so that
its exact value has negligible influence on the solution.

Here, k, ρ, and cp are the thermal conductivity, density, and
specific heat of the fluid. Solve the above differential equations, and construct plots
Show that the above relations can be reexpressed as of U, V, and T against η. What is the value of the dimensionless
temperature gradient at the wall, (dT/dη)η=0?
6.34 Extend Problem 5.30 by allowing for heat conduction
around the perimeter of the shield. Let t and k be the thickness
subject to

η = 0: T = 1,
η = ∞: T = 0,

where f and η are defined in Example 6.5, Pr = νρcp/k is the
Prandtl number of the fluid, and T = (θ − θ∞)/(θw − θ∞) is a
dimensionless temperature. Obtain solutions that will enable T
to be plotted against η for Pr = 0.1, 1, 10, 100, and 1000.

and thermal conductivity of the shield, x be distance measured
around the perimeter, and Ts = Ts(x) be the shield temperature
at any point. If qs denotes the local rate of radiant heat absorption
by unit area of the inside of the shield, show that Ts obeys
the differential equation

kt d²Ts/dx² + qs − q = 0,

where q is defined in Problem 5.30.
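For Problem 6.32 the governing equation itself does not survive legibly above; the standard flat-plate energy equation T″ + (Pr/2) f T′ = 0, coupled to the Blasius equation f‴ + ½ f f″ = 0 of Example 6.5, is assumed in the sketch below. Because the energy equation is linear in T once f is known, T follows from a quadrature instead of a second shooting loop:

```python
import math

def blasius(eta_max=10.0, n=2000):
    """Solve f''' + f*f''/2 = 0, f(0)=f'(0)=0, f'(inf)=1 by shooting + bisection."""
    h = eta_max/n
    def integrate(fpp0):
        def d(s):                       # s = (f, f', f'')
            return (s[1], s[2], -0.5*s[0]*s[2])
        s, fs = (0.0, 0.0, fpp0), [0.0]
        for _ in range(n):
            k1 = d(s)
            k2 = d(tuple(x + h/2*y for x, y in zip(s, k1)))
            k3 = d(tuple(x + h/2*y for x, y in zip(s, k2)))
            k4 = d(tuple(x + h*y for x, y in zip(s, k3)))
            s = tuple(x + h*(a + 2*b + 2*cc + e)/6
                      for x, a, b, cc, e in zip(s, k1, k2, k3, k4))
            fs.append(s[0])
        return s[1], fs                 # f'(eta_max) and the f table
    lo, hi = 0.1, 1.0                   # bracket for f''(0)
    for _ in range(60):
        mid = (lo + hi)/2
        fp_end, _ = integrate(mid)
        lo, hi = (mid, hi) if fp_end < 1.0 else (lo, mid)
    _, fs = integrate((lo + hi)/2)
    return h, fs

def temperature_profile(Pr, h, fs):
    """T'' + (Pr/2) f T' = 0, T(0)=1, T(inf)=0: closed-form quadrature,
       T = 1 - cum(g)/total(g) with g = exp(-(Pr/2) * integral of f)."""
    I, g, cumG = 0.0, [1.0], [0.0]
    for i in range(1, len(fs)):
        I += 0.5*h*(fs[i-1] + fs[i])
        g.append(math.exp(-0.5*Pr*I))
        cumG.append(cumG[-1] + 0.5*h*(g[-2] + g[-1]))
    total = cumG[-1]
    return [1.0 - cg/total for cg in cumG]
```

Looping temperature_profile over Pr = 0.1, 1, 10, 100, and 1000 with one Blasius table gives the requested family of curves; for Pr = 1 the wall gradient reproduces −T′(0) ≈ f″(0) ≈ 0.332.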
Attempt to devise a method for computing Ts as a function Tube internal diameter, ft.
of position; if the scheme appears feasible, implement it on the Catalyst particle diameter, ft.
computer. Use the data of Problem 5.30, with k = 25.9 Gas superficial mass flow rate, lbm/sq ft hr.
BTU/hr ft °F and t = 0.1 in. Moles oxygen per mole ethylene in feed.
6.35 Ethylene oxide is to be produced by passing a mixture Heat transfer coefficients between catalyst
of ethylene and oxygen through a reactor consisting of a and gas, and between gas and cooling water,
bundle of parallel vertical tubes that are packed with catalyst BTU/hr sq ft O F .
particles consisting of silver on an alumina base, as described Mean gas thermal conductivity, BTU/hr ft O F .
by Wan [46]. The reaction is Length along tube, ft; L.,., =value at exit.
Mean molecular weight of gas at any point,
Ib,/lb mole.
The large heat of reaction serves to generate steam from boiling Moles nitrogen per mole ethylene in feed.
water surrounding the tubes. Nitrogen may also be added to Lb moles ethylene/hr sq ft entering the reactor.
the feed gases as an inert diluent in order to moderate the rate Absolute pressure, atm. P₀ = inlet value.
of reaction. Absolute pressure, Ib,,/ft hr2. Conversion
By making certain simplifying assumptions and using factors 32.2 Ib., ftllbi sec2, 144 sq in./sq ft, and
standard design correlations (see Perry [47] and Rohsenow 14.7 psialatm will be needed to convert to
and Choi [48]), the f o l l o ~ ~ i nequations
g can be shown to practical units.
govern the variations of pressure, conversion, and gas and Prandtl number of the gas, ~ c , / k .
catalyst temperatures along the tubes (mass transfer is not a Specific reaction rate, Ib mole ethylene oxide
limiting factor): formed per Ib,, catalyst per hour.
G a s constant, 0.7302 atm cu ft/lb mole OR.
Pressure Along Tubes Particle Reynolds number, d,G/p.
Gas, catalyst, and water temperatures, O R ,
Extra subscript F denotes "F, and 0 denotes
inlet value.
Conversion Along Tubes Temperature above which there is a danger of
explosion, °R.
dz/dL = rρB/N₀
Mole fractions of ethylene and oxygen, respectively.
Total yield of ethylene oxide, lb moles/hr sq ft.
Gas Temperature Along Tubes
Conversion, moles ethylene oxide formed per
mole of ethylene in the feed.
Mean heat of reaction, BTU/lb mole of ethy-
lene oxide formed.
where H follows the dimensional correlation Void fraction in packed tube.
Bulk catalyst density, lbm/cu ft.
Gas density, lbm/cu ft.
Reaction Rate Mean gas viscosity, lbm/ft hr.

On the basis of material balances, first establish the rela-


Catalyst/Gas Temperature Difference tions:
ac h(Tc − Tg) = r(−ΔHR)ρB,
with
Equation of State
ρg = MP/(RTg).
The following notation is used here:
ac Surface area of catalyst per unit packed
volume, ft⁻¹.
cp Mean specific heat of gas, BTU/lbm °F.
Assume that cp = (0.585 + 0.260n + 0.269g)/
(1 + n + 1.143g), an average value for the A preliminary design has suggested the following
feed gas over the temperature range of most values (units as above): ρB = 83.5, TwF = 300, d₁ = 1/12,
interest, holds throughout the reactor. dp = 0.0033, Tmax = 1010, ε = 0.57, μ = 0.0595, k = 0.028,
ac = 782, P₀ = 15, but these may need modification.
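The mean specific heat correlation, with the plus signs and the denominator grouping reconstructed (an assumption, the printed formula being damaged), can be tabulated over the tentative design ranges directly:

```python
def cp_feed(n, g):
    """Mean feed-gas specific heat, BTU/lbm deg F, as reconstructed
       (the operators were lost in reproduction, so this is an assumed form):
       cp = (0.585 + 0.260 n + 0.269 g) / (1 + n + 1.143 g)."""
    return (0.585 + 0.260*n + 0.269*g)/(1.0 + n + 1.143*g)

# Tentative ranges from the problem statement: g = 0.2 to 0.6, n = 0 to 4
table = {(n, g): round(cp_feed(n, g), 4) for n in (0, 2, 4) for g in (0.2, 0.6)}
```

For pure ethylene feed (n = g = 0) the expression reduces to 0.585 BTU/lbm °F, and added diluent lowers the mixture value toward that of nitrogen.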
424 The Approximation of the Solution of Orditiary Differential Eqriatiotls

Suitable values for G, g, n, TgF0, and Lmax are still to be to the wall for its entire height. Each fin has width W, thick-
chosen. Investigate the performance of the reactor for several ness t, thermal conductivity k, and surface emissivity εf. The
different combinations of these variables, bearing in mind the wall and fins transfer heat by combined convection and radia-
following desirable features: (a) high overall yield Y, (b) Tc tion to the surrounding air, which has a mean temperature T₀
must never exceed Tmax, (c) dTg/dL must not be negative and also behaves as a black-body enclosure at that tempera-
at the inlet (reaction self-extinguishing), (d) P must always be ture.
positive, and (e) for control purposes, a change of 1°F in the gas
temperature at inlet should not produce too large a change
in its value at the exit (no more than 10°F, for example).
Tentative ranges for investigation are: G = 5,000 to 40,000;
g = 0.2 to 0.6; n = 0 to 4; TgF0 = 350 to 500; Lmax = 2 to 6.
6.36 Figure P6.36 shows a cross section of a long cooling
fin of width W, thickness t, and thermal conductivity k, that is

Figure P6.37 (wall with cooling fins; surrounding air at T₀)

Assume that (a) both wall and fins have gray surfaces, and
Figure P6.36 (b) the heat transfer coefficient for convection between the wall
bonded to a hot wall, maintaining its base (at x = 0) at a and fins to the surrounding air approximately obeys the corre-
temperature T₀. Heat is conducted steadily through the fin and lation h = 0.21(ΔT)^1/3 BTU/hr sq ft °F, in which the tempera-
temperature T,. Heat is conducted steadily through the fin and ture difference AT between the hot surface and air is in " F
is lost from the sides of the fin by convection to the surrounding (see Rohsenow and Choi [48]). Also, neglect temperature
air with a heat transfer coefficient h. (Radiation is neglected at variations across the thickness of the fins; the local fin tem-
sufficiently low temperatures.) perature T, then obeys the differential equation
Show that the fin temperature T, assumed here to depend
only on the distance x along the fin, obeys the differential
equation
where q, is the local rate of heat loss per unit area from one
side of a fin and x is distance along a fin.
Write a program that will accept values for the parameters
where Ta is the mean temperature of the surrounding air. Tw, T₀, εw, εf, W, t, d, and k, and then compute the average net
(Fourier's law gives the heat flux density as q = −k dT/dx for rate of heat loss from the wall and fins, based on unit area of
steady conduction in one dimension.) If the surface of the fin is the wall surface. Find q̄ for several different representative
vertical, h obeys the dimensional correlation h = 0.21(T − Ta)^1/3 values of the parameters. Discuss the possibility of designing
BTU/hr sq ft °F, with both temperatures in °F (Rohsenow and fins that would maximize q̄ for a specified weight of metal in the
Choi [48]). fins.
What are the boundary conditions on T? Write a program 6.38 Consider two wildlife animal species: A, which has an
that will compute: (a) the temperature distribution along the abundant supply of natural food, and B, which depends entirely
fin, and (b) the total rate of heat loss from the fin to the air per for its food supply on preying on A. At a given time t, let NA
foot length of fin. and NB be the populations of A and B, respectively, in a given
natural area.
Suggested Test Data By making certain simple assumptions, show that the
following differential equations could describe the growth or
T₀ = 200°F, Ta = 70°F, t = 0.25 in.; investigate all eight decay of the two populations:
combinations of k = 25.9 (steel) and 220 (copper) BTU/hr ft
°F, W = 0.5, 1, 2, and 5 in.
If available, compare the results with those obtained from
Problem 7.42, in which temperature variations across the fin
are taken into account.
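A shooting solution for Problem 6.36, assuming the fin equation k t T″ = 2h(T − Ta) with the stated correlation h = 0.21(T − Ta)^(1/3), the base held at the given temperature, and an insulated tip, T′(W) = 0 (one reasonable answer to the boundary-condition question; tip loss is neglected). The parameter names below are the sketch's own:

```python
def fin_profile(kcond, W, Tbase=200.0, Ta=70.0, thick=0.25/12.0, n=400):
    """Shooting solution of  kcond*thick*T'' = 2*h*(T - Ta),
       h = 0.21*(T - Ta)**(1/3), with T(0) = Tbase and an assumed
       insulated tip T'(W) = 0.  Units: ft, hr, BTU, deg F."""
    hstep = W/n
    def rhs(T):
        dT = max(T - Ta, 0.0)
        return 2.0*0.21*dT**(4.0/3.0)/(kcond*thick)
    def shoot(slope0):
        T, S, Ts = Tbase, slope0, [Tbase]
        for _ in range(n):
            k1 = (S, rhs(T))
            k2 = (S + hstep/2*k1[1], rhs(T + hstep/2*k1[0]))
            k3 = (S + hstep/2*k2[1], rhs(T + hstep/2*k2[0]))
            k4 = (S + hstep*k3[1],  rhs(T + hstep*k3[0]))
            T += hstep*(k1[0] + 2*k2[0] + 2*k3[0] + k4[0])/6
            S += hstep*(k1[1] + 2*k2[1] + 2*k3[1] + k4[1])/6
            Ts.append(T)
        return S, Ts
    lo, hi = -3000.0, 0.0                    # bisect on the base slope T'(0)
    for _ in range(60):
        mid = (lo + hi)/2
        endslope, _ = shoot(mid)
        lo, hi = (mid, hi) if endslope < 0.0 else (lo, mid)
    _, Ts = shoot((lo + hi)/2)
    # heat loss per foot of fin length, both sides, by the trapezoidal rule
    q = 0.0
    for i in range(n):
        qa = 0.21*max(Ts[i] - Ta, 0.0)**(4.0/3.0)
        qb = 0.21*max(Ts[i + 1] - Ta, 0.0)**(4.0/3.0)
        q += hstep*(qa + qb)                 # = 2 sides * trapezoid average
    return Ts, q
```

Calling fin_profile for k = 25.9 and 220 and W = 0.5, 1, 2, and 5 in. (converted to feet) reproduces the eight suggested cases.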
6.37 Note. Before attempting this problem, the reader
should become familiar with Example 2.1 and the discussion What significance do you attach to the positive constants α, β,
in Problems 3.17 and 5.25. γ, and δ?
Figure P6.37 shows a horizontal cross section of a very tall Write a program that will solve the above equations for a
vertical wall that is maintained at a uniformly high tempera- wide variety of initial conditions and values for α, β, γ, and δ.
ture Tw and whose surface has an emissivity εw. A large number If possible, arrange for the program to plot NA and NB against
of vertical metallic cooling fins, spaced at a pitch d, are bonded t.
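The "certain simple assumptions" of Problem 6.38 lead to the classical Lotka-Volterra pair, assumed here since the printed equations are illegible: prey growth αNA, predation loss βNANB, predator decay −γNB, and predator growth δNANB. A fourth-order Runge-Kutta sketch with illustrative constants:

```python
def predator_prey(NA0, NB0, alpha, beta, gamma, delta, dt=0.001, tmax=20.0):
    """RK4 for the Lotka-Volterra system implied by the problem:
       dNA/dt = alpha*NA - beta*NA*NB,  dNB/dt = -gamma*NB + delta*NA*NB."""
    def d(NA, NB):
        return alpha*NA - beta*NA*NB, -gamma*NB + delta*NA*NB
    t, NA, NB, hist = 0.0, NA0, NB0, []
    while t <= tmax:
        hist.append((t, NA, NB))
        k1 = d(NA, NB)
        k2 = d(NA + dt/2*k1[0], NB + dt/2*k1[1])
        k3 = d(NA + dt/2*k2[0], NB + dt/2*k2[1])
        k4 = d(NA + dt*k3[0], NB + dt*k3[1])
        NA += dt*(k1[0] + 2*k2[0] + 2*k3[0] + k4[0])/6
        NB += dt*(k1[1] + 2*k2[1] + 2*k3[1] + k4[1])/6
        t += dt
    return hist

# Illustrative run (the constants are not from the text):
hist = predator_prey(NA0=10.0, NB0=5.0, alpha=1.0, beta=0.1, gamma=1.0, delta=0.05)
```

The quantity δNA − γ ln NA + βNB − α ln NB is conserved along exact solutions, which gives a convenient accuracy check on the step size.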

6.39 Before attempting this problem, the reader should where N is the number of particles collected per unit time,
have solved Problem 7.27, in the suggested dimensionless form, rs,0 is the upstream radius of the stream tube that impinges on
for several dimensionless velocity ratios uB/uA = 0.5, 1.0, 2.0, the probe circumference and rp,0 is the upstream radius of the
3.0. The dimensionless stream function Ψ should be available limiting particle-trajectory envelope. In both cases, "upstream"
in tabular form for all the grid points of the finite-difference means far enough removed from the probe to ignore radial
solution for each of these dimensionless velocity ratios. velocity effects.
Vitols [49] has studied the motion of particles in flow
regimes described by the solutions of Problem 7.27, where the
inner cylinder represents an aspirated particle-sampling probe
immersed in a flowing fluid containing solid particles (for
example, sampling for fly ash in an industrial stack). The outer
cylinder in Fig. P7.27 need not be taken as the radius of an
outer container (stack, for example), but simply as an imagin-
ary stream tube, far enough away from the probe so that there
is no radial velocity component of any significance.
Because of inertial and drag forces, a particle flowing at
free-stream velocity far upstream of the probe will not neces-
sarily follow a streamline in the vicinity of the probe, where
radial and axial velocity components of the fluid may be
changing markedly. This effect is illustrated in Fig. 6.39a and
6.39b for uB/uA less than one and greater than one, respectively.

K = σd²uA/(18μrs) = particle inertia parameter,
Re₀ = ρduA/μ = free-stream Reynolds number.
Here,
CD = dimensionless drag coefficient for spheres
d = particle diameter, cm
uA = free-stream velocity, cm/sec
σ = particle density, gm/cm³
μ = absolute viscosity of fluid, poise
rs = probe radius, cm
ρ = fluid density, gm/cm³
u = local fluid velocity, cm/sec
v = local particle velocity, cm/sec
t = time, seconds
r = radial coordinate of particle position, cm

Figure P6.39 (a, b: probe opening, streamlines, and particle paths for uB/uA less than and greater than one)
z = axial coordinate (origin at probe) of particle position,
cm.
The following are dimensionless:
ū = local fluid velocity, u/uA
v̄ = local particle velocity, v/uA
τ = time, tuA/rs
r̄ = radial coordinate, r/rs
z̄ = axial coordinate, z/rs
v̄r = dr̄/dτ, radial component of particle velocity
v̄z = dz̄/dτ, axial component of particle velocity
ūr = radial component of fluid velocity
ūz = axial component of fluid velocity

Thus if the estimated density of particles in the free stream
is taken as the number of particles collected by the sampler
divided by the volume of gas passing through the probe, the
calculated values may differ markedly from the true situation.
The total volume of gas passing through the probe will be that
enclosed by the stream tube that impinges on the outer circum-
ference of the probe opening. For uB/uA < 1, particles from
outside the limiting stream tube will enter the sampler while,
for uB/uA > 1, particles originally inside the limiting stream
tube will pass outside the probe.
Now, consider a particle that just impinges on the outer
circumference of the probe. All particles of the same diameter
within a cylindrical envelope generated by this limiting
particle trajectory will be collected by the probe. Thus the
actual density or number of particles per unit volume in the
sampled gas is given by

Several assumptions are inherent in the development of this
equation, including (a) uniform particle distribution, (b) no
gravitational or electrostatic forces of consequence, (c) mono-
disperse spherical particles with diameter very small in relation
to diameter of probe opening, and (d) free-stream flow that is
steady, incompressible, and irrotational.

The drag coefficient CD is a function of Reynolds number


and is available in tabular form. For industrial sampling
applications, the Reynolds number is normally in the range
1.0 to 100. Some values for spherical particles are shown in the
Table P6.39 [47]:

Table P6.39

Figure P6.41

sure throughout, momentum in the z direction is conserved, so
that

2πρ ∫₀^∞ u²r dr
Write a program that accepts, as data, values for the dimen-
sionless velocity ratio uB/uA (possible values correspond to is constant. Also assume a solution of the form u(r,z) =
those for which the stream functions are available), K (indus- φ(z)f(η), where η = r/z, and let
trial cases encountered usually have values of K from 1 to
100), and Re₀ (normally between 1 and 100 for practical appli-
loo), and Reo (normally between 1 and 100 for practical appli-
cations). The program should then integrate the two second-
order nonlinear ordinary differential equations to establish the Using the continuity equation,
position (r̄,z̄) of a given particle in the flow field at any time τ,
given an initial position (r̄₀,z̄₀) at time τ = 0. The initial posi-
tion should be situated far enough upstream (several dimen-
sionless probe radii) so that the initial particle velocity in the show that
radial direction is zero and in the axial direction is the free
stream velocity.
Note that as the particle moves in time, the velocity com-
ponents for the fluid, ūr and ūz, must be computed as functions
of location by numerical differentiation of the dimensionless
stream function (available in tabular form from Problem 7.27). where A is a constant.
In addition, the Reynolds number and the corresponding drag
If momentum is transported only by turbulent shear, and
coefficient CD must be evaluated for local conditions.
turbulent normal stresses are ignored, a simplified Reynolds
The program should automatically adjust the initial loca-
equation results. Tollmien [51] has studied this problem;
tion of the particle (r̄₀,z̄₀) until it finds the radius r̄p,0 =
assuming a Prandtl mixing length l of the form l = cz, it can
rp,0/rs = r̄₀. Thus the problem may be viewed as a boundary
be shown that
value problem for which three conditions are initial conditions
(z̄₀, v̄r,0, v̄z,0) and one is a boundary condition, namely that at some
time τ > 0, the particle will hit the "target" (the lip of the
probe) at location r̄ = 1, z̄ = 0. The desired solution of the By substituting the above relationships into this equation
problem is r̄p,0. and integrating once, show that F obeys the ordinary differen-
As trial values, solve the problem for the following [49]: tial equation

Eliminate c by introducing a new independent variable


5= Arbitrarily assuming that uz/A = 1 on the center-
6.40 Discuss the possibility of extending Problem 6.39 to line, what are the boundary conditions on F? Solve the final
account for spherical particles whose diameters conform to a equation with the selected boundary conditions and construct
specified statistical distribution. a graph showing how uz/A and vz/A vary with ξ.
6.41 A jet of fluid of constant density ρ originates from a 6.42 A pile AB of total length L has Young's modulus E
point source in a fluid that is otherwise stagnant, as shown in and cross-sectional moment of inertia I. It is driven vertically
Fig. P6.41. Let u and v be the fluid velocities in the r (radial) down for a depth D into the ground. A horizontal force P is
and z (axial) directions, respectively. Assuming a uniform pres- applied to the pile through a pin joint at its exposed end A.
The following differential equation [52] governs y, the hori- approximation, as a black-body radiator at a temperature
zontal deflection of the pile from its undisturbed position, as a T,. Considering radiation from the furnace to the tube, con-
function of x, the vertical distance measured down from A : duction through the tube wall, and heat transfer by convection
from the inside tube wall to the reacting gases, we have:
EI d⁴y/dx⁴ = w.

q = πd₂εσ(TF⁴ − T₂⁴) = 2πkₜ(T₂ − T₁)/ln(d₂/d₁) = πd₁h(T₁ − Tg).

Here, w is the load in the y direction on unit length of the pile
which, due to the ground resistance, is given by a known
function, w = f(x,y). Here, q is the rate of heat transfer to the gas per unit length
What are the boundary conditions on y? Discuss methods of tube, Tg, T₁, and T₂ are the temperatures of the gas, inside
for computing y = y(x) in the following three cases:
tube wall, and outside tube wall, respectively, and σ is the
(a) f(x,y) = −ky, where k is a constant that is independent Stefan-Boltzmann constant; consistent units are assumed and
of depth. absolute temperatures must be used in the radiant-transfer
(b) f(x,y) = -k(x)y, where k is a known function of depth. term.
(c) f(x,y) = −k(x)g(y), where both k and g are known The following dimensional correlation is available [47] for
functions. the heat transfer coefficient h (BTU/hr sq ft OF) between the
inside tube wall and gas:
Would an algorithm for solving a pentadiagonal system of
linear equations be applicable to any of the above cases? Can
such an algorithm be developed? (See Section 7.9 for the
solution of a tridiagonal system of equations.) where c̄p is the local mean specific heat of the gas (BTU/lb °F),
6.43 Sulfuryl chloride vapor decomposes according to the G is the gas mass velocity (lb/sec sq ft), and d₁ the inside tube
irreversible first-order endothermic reaction diameter (in.).
Pressure p varies with length L along the tube according to
the differential equation given in Problem 6.20. Differential
Thermodynamic and kinetic information is available (Table equations describing dTg/dL and dz/dL, where z is the frac-
P6.43) with temperature Tk in degrees Kelvin. tional conversion, may be obtained by methods similar to
those presented in Example 6.2.
Write a program that computes and prints values for Tg, T₁,
and T₂ (all in °F), p (psia), and z at convenient intervals along
the reactor tube until either z exceeds a value zmax, L exceeds a
value Lmax, or p falls below a value pmin.
cp (cal/g mole °K)
SO₂Cl₂ (gas)  −82,040  13.00 + 24.0 × 10⁻³Tk − 14.4 × 10⁻⁶Tk²
SO₂ (gas)  −70,940  8.12 + 6.825 × 10⁻³Tk − 2.103 × 10⁻⁶Tk²
Cl₂ (gas)  0  7.575 + 2.424 × 10⁻³Tk − 0.965 × 10⁻⁶Tk²

Reaction rate constant: k = 6.427 × 10¹⁵ e^(−50,610/RTk) sec⁻¹
Atomic weights: S = 32.07, O = 16.0, Cl = 35.46
Gas constant: R = 1.987 cal/g mole °K

In a proposed experiment, m pounds per hour of sulfuryl
chloride vapor are to be fed at an inlet pressure p₀ and an inlet
temperature T₀ to a long coiled tube of internal and external
diameters d₁ and d₂, respectively. The tube, which has a
thermal conductivity kₜ, and whose surface is gray with
emissivity ε, is situated inside a furnace that acts, to a first

Suggested Test Data (Not in Consistent Units)
d₁ = 0.5 in.
d₂ = 0.625 in.
T₀ = 250°F
fM (Moody friction factor) = 0.026
zmax = 0.90
ε = 0.92
σ = 1.355 × 10⁻¹² cal/sec sq cm °K⁴
kₜ = 12.8 BTU/hr ft °F
pmin = 5 psia
Lmax = 200 ft.
Investigate the behavior of the reactor for all combinations of:
m = 10, 30, 100 lb/hr
p₀ = 14.7, 50 psia
TF = 600, 900, 1200°K.
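The rate-constant expression from the data table can be evaluated directly at the three suggested furnace temperatures:

```python
import math

R = 1.987                                # cal/(g mole deg K)

def k_rate(Tk):
    """Specific rate constant from the data table:
       k = 6.427e15 * exp(-50,610/(R*Tk))  sec^-1, Tk in deg K."""
    return 6.427e15*math.exp(-50610.0/(R*Tk))

rates = {Tk: k_rate(Tk) for Tk in (600.0, 900.0, 1200.0)}
```

The steep increase (roughly six orders of magnitude between 600 and 900°K) is what makes the conversion profile so sensitive to the furnace temperature TF.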

Bibliography 27. J. C. Butcher, "A Modified Multi-Step Method for the Nu-
merical Integration of Ordinary Differential Equations," Journal
1. B. W. Arden, An Introduction to Digital Computing, Addison- of the A.C.M., 12, 124-135 (1965).
Wesley, Reading, Massachusetts, 1963. 28. P. Henrici, Error Propagation for Difference Methods, Wiley,
2. P. Henrici, Discrete Variable Merhods in Ordinary Differential New York, 1963.
Equations, Wiley, New York, 1962. 29. J. Waters, "Methods of Numerical Integration Applied to a
3. F. B. Hildebrand, Introduction to Numerical Analysis, McGraw- System Having Trivial Function Evaluations," Communications
Hill, New York, 1956. of the A.C.M., 9, 293-296 (1966).
4. L. Lapidus, Digital Computation for Chemical Engineers, 30. S. D. Conte, Elementary Numerical Analysis, McGraw-Hill,
McGraw-Hill, New York, 1962. New York, 1965.
5. W. E. Milne, Numerical Solution of Differential Equations, 31. W. E. Milne and R. R. Reynolds, "Stability of a Numerical
Wiley, New York, 1953. Solution of Differential Equations," Journal of the A.C.M., 6,
6. S. Gill, "A Process for the Step-by-step Integration of Dif- 196-203 (1959).
ferential Equations in an Automatic Computing Machine," 32. G. Dahlquist, "Convergence and Stability in the Numerical
Proc. Cambridge Phil. Soc., 47, 96-108 (1951). Integration of Ordinary Differential Equations," Math. Scand.,
7. A. Ralston, A First Course in Numerical Analysis, McGraw- 4, 33-53 (1956).
Hill, New York, 1965. 33. W. E. Milne and R. R. Reynolds, "Fifth-Order Methods for
8. A. Ralston, "Runge-Kutta Methods with Minimum Error the Numerical Solution of Ordinary Differential Equations,"
Bounds," Math. Comp., 16, 431-437 (1962). Journal of the A.C.M., 9, 64-70 (1962).
9. L. Collatz, The Numerical Treatment of Differential Equations, 34. P. E. Chase, "Stability Properties of Predictor-Corrector
Third Ed., Springer-Verlag, Berlin, 1960. Methods for Ordinary Differential Equations," Journal of the
10. D. D. McCracken and W. S. Dorn, Numerical Methods and A.C.M., 9,457-468 (1962).
FORTRAN Programming, Wiley, New York, 1964. 35. W. B. Gragg and H. J. Stetter, "Generalized Multistep Pre-
11. D. H. Call and R. F. Reeves, "Error Estimation in Runge- dictor-Corrector Methods," Journal of the A.C.M., 11, 188-209
Kutta Procedures," Communications of the A.C.M., 1, 7-8 (1964).
(1958). 36. R. L. Crane and R. W. Klopfenstein, "A Predictor-Corrector
12. J. W. Carr, III, "Error Bounds for the Runge-Kutta Single- Algorithm with an Increased Range of Absolute Stability,"
Step Integration Process," Journal of the A.C.M., 5, 39-44 Journal of the A.C.M., 12, 227-241 (1965).
(1958). 37. F. T. Krogh, "Predictor-Corrector Methods of High Order with
13. B. A. Galler and D. P. Rosenberg, "A Generalization of a Improved Stability," Journal of the A.C.M., 13, 374-385 (1966).
Theorem of Carr on Error Bounds for Runge-Kutta Proce- 38. J. C. Butcher, "A Multistep Generalization of Runge-Kutta
dures," Journal of the A.C.M., 7, 57-60 (1960). Methods with Four or Five Stages," Journal of the A.C.M., 14,
14. P. Henrici, "The Propagation of Error in the Digital Integra- 84-99 (1967).
tion of Ordinary Differential Equations," Error in Digital 39. J. J. Kohfeld and G. T. Thompson, "Multistep Methods with
Computation-I (L. B. Rall, editor), 185-205, Wiley, New York, Modified Predictors and Correctors," Journal of the A.C.M., 14,
1965. 155-166 (1967).
15. L. Fox (ed.), Numerical Solution of Ordinary and Partial 40. H. A. Luther and H. P. Konen, "Some Fifth-Order Classical
Differential Equations, Addison-Wesley, Reading, Massa- Runge-Kutta Formulas," S.I.A.M. Review, 7, 551-558 (1965).
chusetts, 1962. 41. H. A. Luther, "Further Explicit Fifth-Order Runge-Kutta
16. R. Beckett and J. Hurt, Numerical Calculations and Algorithms, Formulas," S.I.A.M. Review, 8, 374-380 (1966).
McGraw-Hill, New York, 1967. 42. H. Schlichting, Boundary Layer Theory, 4th ed., McGraw-Hill,
17. J. F. Holt, "Numerical Solution of Nonlinear Two-Point New York, 1960.
Boundary Problems by Finite Difference Methods," Com- 43. M. R. Horne and W. Merchant, The Stability of Frames,
munications of the A.C.M., 7, 366-373 (1964). Pergamon Press, New York, 1965.
18. D. D. Morrison, J. D. Riley, and J. F. Zancanaro, "Multiple 44. S. Timoshenko, Theory of Elastic Stability, McGraw-Hill,
Shooting Method for Two-Point Boundary Value Problems," New York, 1936.
Communications of the A.C.M., 5, 613-614 (1962). 45. V. L. Streeter, Fluid Mechanics, 4th ed., McGraw-Hill, New
19. D. Wilde, Optimum Seeking Methods, Prentice-Hall, Engle- York, 1966.
wood Cliffs, New Jersey, 1964. 46. Shen-Wu Wan, "Oxidation of Ethylene to Ethylene Oxide,"
20. R. Hooke and T. A. Jeeves, "Direct Search Solution of Numeri- Ind. Eng. Chem., 45,234-238 (1953).
cal and Statistical Problems," Journal of the A.C.M., 8,212-229 47. J. H. Perry, ed., Chemical Engineers' Handbook, 3rd ed.,
(1961). McGraw-Hill, New York, 1950.
21. R. W. Hamming, "Stable Predictor-Corrector Methods for 48. W. M. Rohsenow and H. Y. Choi, Heat, Mass, and Momentum
Ordinary Differential Equations," Journal of the A.C.M., 6, Transfer, Prentice-Hall, Englewood Cliffs, New Jersey, 1961.
37-47 (1959). 49. V. Vitols, Determination of Theoretical Collection Eficiencies of
22. R. W. Hamming, Numerical Methods for Scientists and En- Aspirated Particulate Matter Sampling Probes Under Aniso-
gineers, McGraw-Hill, New York, 1962. kinetic Flow, Ph.D. Thesis, The University of Michigan, 1964.
23. J. C. Butcher, "Integration of Processes based on Radau 50. M. J. Shah, "Computer Control of Ethylene Production,"
Quadrature Formulas," Math. Comp., 18, 233-244 (1964). Ind. Eng. Chem., 59, 70-85 (1967).
24. J. C. Butcher, "Implicit Runge-Kutta Processes," Math. Comp., 51. W. Tollmien, "Calculation of Turbulent Expansion Processes,"
18, 50-64 (1964).
NACA TM 1085 (translation) (1945).
25. J. C. Butcher, "On Runge-Kutta Processes of Higher Order," 52. W. F. Hughes and E. W. Gaylord, Basic Equations of Engineering
J. Austral. Math. Soc., 4, 179-194 (1964).
Science, Schaum Publishing Company, New York, 1964.
Science, Schaum Publishing Company, New York, 1964.
26. C. W. Gear, "Hybrid Methods for Initial Value Problems in
Ordinary Differential Equations," Journal of the S.I.A.M., Series B,
2, 69-86 (1965).
CHAPTER 7

Approximation of the Solution of Partial Differential Equations

7.1 Introduction

Linear partial differential equations (PDEs) of the second order are frequently referred to as being of the elliptic, hyperbolic, or parabolic type. Such a classification is possible if the equation has been reduced, by a suitable transformation of the independent variables, to the form

    Σ_{i=1}^{n} A_i ∂²u/∂x_i² + Σ_{i=1}^{n} B_i ∂u/∂x_i + Cu + D = 0,

in which the coefficients A_i, evaluated at the point (x_1, x_2, ..., x_n), may be 1, -1, or zero. Here, u is the dependent variable, and the x_i are the independent variables. Note the absence of mixed derivatives ∂²u/∂x_i ∂x_j (i ≠ j) in this form. The following are the main possibilities of interest:

1. If all the A_i are nonzero and have the same sign, the PDE is of elliptic type.
2. If all the A_i are nonzero and have, with one exception, the same sign, the PDE is of hyperbolic type.
3. If one A_i is zero (A_k, for instance) and the remaining A_i are nonzero and of the same sign, and if the coefficient B_k of ∂u/∂x_k is nonzero, the PDE is of parabolic type.

Since the coefficients A_i, B_i, C, and D are functions of the independent variables x_1, x_2, ..., x_n, the classification of a PDE may vary according to the particular point being considered in the (x_1, x_2, ..., x_n) space. Very frequently, one of the independent variables will be time t and the remainder (for our purposes up to three in number) will be distance coordinates x, y, and z. The reader should verify that the PDEs

    ∂²u/∂x² + ∂²u/∂y² = 0,    ∂²u/∂x² - ∂²u/∂t² = 0,    ∂²u/∂x² - ∂u/∂t = 0,

are of elliptic, hyperbolic, and parabolic type, respectively. Further details of the classification of PDEs are given by Petrovsky [20].

This chapter is devoted largely to parabolic and elliptic PDEs. The former are frequently regarded here as equations of a time-dependent nature, requiring both an initial condition and subsequent time-dependent boundary conditions for their solutions, whereas the latter are boundary-value problems having time-independent solutions. The various computer examples deal with fairly simple problems, often concerning heat conduction. However, the same basic principles can be extended to a wide variety of more complex situations, several of which are outlined in Section 7.2.

When using a finite-difference technique to solve a PDE (plus associated boundary and initial conditions), a network of grid points is first established throughout the region of interest occupied by the independent variables. Suppose, for example, we have two distance coordinates x and y, and time t as independent variables, and that the respective grid spacings are Δx, Δy, and Δt. Subscripts i, j, and n may then be used to denote the space point having coordinates iΔx, jΔy, nΔt, also called the grid-point (i,j,n). Let the exact solution to the PDE be u = u(x,y,t), and let its approximation, to be determined at each grid point by the method of finite differences, be v_{i,j,n}. We also use u_{i,j,n} to denote the exact solution u(iΔx, jΔy, nΔt) at a particular grid-point (i,j,n).

The partial derivatives of the original PDE are then approximated by suitable finite-difference expressions involving Δx, Δy, Δt, and the v_{i,j,n}. This procedure leads to a set of algebraic equations in the v_{i,j,n}, whose values may then be determined. By making the grid spacings sufficiently small, it is hoped that v_{i,j,n} will become a sufficiently close approximation to u_{i,j,n} at any grid-point (i,j,n).

7.2 Examples of Partial Differential Equations

The following PDEs, several of which bear obvious similarities to the types mentioned in Section 7.1, are typical of those of practical importance to the engineer. The symbol ∇² denotes the Laplacian operator.

Unsteady Heat-Conduction Equation. One-dimensional unsteady heat conduction in a rod is governed by

    ∂/∂x (k ∂T/∂x) = ρc_p ∂T/∂t,

where T denotes temperature, and k, ρ, and c_p are the thermal conductivity, density, and specific heat of the rod. If k is constant, this equation may be rewritten as
430 Approximation of the Solution of Partial Differential Equations

α ∂²T/∂x² = ∂T/∂t, in which α = k/ρc_p is the thermal diffusivity. The introduction of new variables X = x/L and τ = αt/L², where L is a characteristic dimension, leads to ∂²T/∂X² = ∂T/∂τ. A similar equation governs the interdiffusion of two substances.

Vorticity Transport Equation. The vorticity ζ of an incompressible fluid in two-dimensional motion varies according to

    ∂ζ/∂t + u ∂ζ/∂x + v ∂ζ/∂y = ν(∂²ζ/∂x² + ∂²ζ/∂y²),

where u and v are the x and y velocity components, and ν is the kinematic viscosity.

Poisson's Equation. Poisson's equation is ∇²ψ = -σ. Three important applications occur: (a) in fluid dynamics, with ψ = stream function and σ = vorticity, (b) in electrostatics, with ψ = electric potential and σ = ratio of charge density to dielectric constant, and (c) in elasticity, with σ = 2, in which case ψ is a function from which the angle of twist of a cylinder under torsion can be calculated.

Laminar Flow Heat-Exchanger Equation. The following equation governs variations of temperature T with radial and axial distances r and z for steady, laminar flow in a cylindrical heat exchanger:

    ρc_p u(r) ∂T/∂z = k [(1/r) ∂/∂r (r ∂T/∂r) + ∂²T/∂z²].

Here, ρ, c_p, and k are the density, specific heat, and thermal conductivity of the fluid, and the axial velocity u is a known function of r. Note how the character of the equation changes if the term ∂²T/∂z², corresponding to axial conduction, can be neglected.

Telephone Equation. The following equation can be used to predict variations of voltage V along a transmission cable:

    ∂²V/∂x² = LC ∂²V/∂t² + (RC + LG) ∂V/∂t + RGV.

Here, R and L denote the resistance and inductance per unit length of the cable, and C and G denote the capacitance and conductance to ground per unit length of the cable.

Wave Equation. The angle of twist φ at any section of a circular shaft undergoing torsional vibrations is governed by

    ∂²φ/∂t² = (G/ρ) ∂²φ/∂x²,

where G is the rigidity modulus of the shaft and ρ is its density.

Biharmonic Equation. The transverse deflection w of a thin plate of flexural rigidity D subject to a normal load q per unit area is governed by ∇⁴w = q/D.

Vibrating Beam Equation. The transverse deflection y of a vibrating beam obeys

    EI ∂⁴y/∂x⁴ + ρA ∂²y/∂t² = 0,

in which E = modulus of elasticity, I = cross-sectional moment of inertia, A = cross-sectional area, and ρ = density.

Ion-Exchange Equation. For flow of a solution through a packed column containing an ion-exchange resin,

    u ∂c/∂z + ε ∂c/∂t + r = 0.

Here, c = concentration of a particular ion in solution, r = rate at which that ion is adsorbed per unit volume of column, ε = bed void fraction, and u = superficial liquid velocity. A similar type of equation governs regenerative heat transfer.

7.3 The Approximation of Derivatives by Finite Differences

Here, suppose for simplicity that u = u(x,y). Assuming that u possesses a sufficient number of partial derivatives, the values of u at the two points (x,y) and (x + h, y + k) are related by the Taylor's expansion:

    u(x + h, y + k) = u(x,y) + (h ∂/∂x + k ∂/∂y)u + (1/2!)(h ∂/∂x + k ∂/∂y)²u + ... + (1/(n-1)!)(h ∂/∂x + k ∂/∂y)^{n-1}u + R_n,

where the remainder term is given by

    R_n = (1/n!)(h ∂/∂x + k ∂/∂y)^n u(x + θh, y + θk),  0 < θ < 1.

That is,

    R_n = O[(|h| + |k|)^n].   (7.3)

By (7.3), we mean there exists a positive constant M such that |R_n| ≤ M(|h| + |k|)^n as both h and k tend to zero.

The space point (iΔx, jΔy), also called the grid-point (i,j), is surrounded by the neighboring grid points shown in Fig. 7.1. Expanding in Taylor's series for u_{i-1,j} and u_{i+1,j} about the central value u_{i,j}, we obtain

[Figure 7.1 Arrangement of grid points: the grid-point (i,j) and its eight neighbors (i-1,j+1), (i,j+1), (i+1,j+1), (i-1,j), (i+1,j), (i-1,j-1), (i,j-1), (i+1,j-1).]

    u_{i+1,j} = u_{i,j} + Δx u_x + ((Δx)²/2!) u_xx + ((Δx)³/3!) u_xxx + ((Δx)⁴/4!) u_xxxx + ...,
    u_{i-1,j} = u_{i,j} - Δx u_x + ((Δx)²/2!) u_xx - ((Δx)³/3!) u_xxx + ((Δx)⁴/4!) u_xxxx - ....

Here, u_x ≡ ∂u/∂x, u_xx ≡ ∂²u/∂x², etc., and all derivatives are evaluated at the grid-point (i,j). By taking these equations singly, and by adding or subtracting one from the other, we obtain the following finite-difference formulas for the first- and second-order derivatives at (i,j):

    ∂u/∂x = (u_{i+1,j} - u_{i,j})/Δx + O(Δx),   (7.4)
    ∂u/∂x = (u_{i,j} - u_{i-1,j})/Δx + O(Δx),   (7.5)
    ∂u/∂x = (u_{i+1,j} - u_{i-1,j})/(2Δx) + O[(Δx)²],   (7.6)
    ∂²u/∂x² = (u_{i+1,j} - 2u_{i,j} + u_{i-1,j})/(Δx)² + O[(Δx)²].   (7.7)

Formulas (7.4), (7.5), and (7.6) are known as the forward, backward, and central difference forms, respectively. Similar forms exist for ∂u/∂y and ∂²u/∂y². It may also be shown that

    ∂²u/∂x∂y = (u_{i+1,j+1} - u_{i-1,j+1} - u_{i+1,j-1} + u_{i-1,j-1})/(4 Δx Δy) + O[(Δx + Δy)²].   (7.8)

For a square grid (Δx = Δy), the following nine-point approximation is available for the Laplacian in two dimensions and will have the specified truncation error, provided that u_xx + u_yy = 0 is being solved:

    ∂²u/∂x² + ∂²u/∂y² = [u_{i-1,j+1} + 4u_{i,j+1} + u_{i+1,j+1} + 4u_{i-1,j} - 20u_{i,j} + 4u_{i+1,j} + u_{i-1,j-1} + 4u_{i,j-1} + u_{i+1,j-1}]/[6(Δx)²] + O[(Δx)⁴].   (7.9)

By taking more and more neighboring points, an unlimited number of other approximations can be obtained, but the above forms are the most compact.

For convenience, the central-difference operator δ_x will be used occasionally. It is defined by

    δ_x u_{i,j} = u_{i+1/2,j} - u_{i-1/2,j},

whence

    δ_x² u_{i,j} = u_{i+1,j} - 2u_{i,j} + u_{i-1,j}.

7.4 A Simple Parabolic Differential Equation

Consider an insulated bar with an initial temperature distribution at t = 0, having ends that are subsequently maintained at temperatures which may be functions of time. The temperature distribution u(x,t) in the bar at any t > 0 may be found by defining suitable dimensionless variables and by assuming that the physical properties of the bar are constant. The problem can then be described by the following differential equation and initial and boundary conditions, also illustrated in Fig. 7.2.

    ∂u/∂t = ∂²u/∂x²,  for 0 < x < 1,  0 < t ≤ T,   (7.12)

    u(x,0) = f(x),  0 ≤ x ≤ 1,
    u(0,t) = g_0(t),  0 < t ≤ T,   (7.13)
    u(1,t) = g_1(t),  0 < t ≤ T.

[Figure 7.2 The differential problem: the solution u(x,t) in the strip 0 ≤ x ≤ 1, with boundary conditions u(0,t) = g_0(t) and u(1,t) = g_1(t), and initial condition u(x,0) = f(x).]

Here, f(x) is the initial condition, and g_0(t) and g_1(t) are the boundary conditions. The latter are of a particularly simple type, since they specify the temperature itself at the ends of the bar. On page 462, we consider boundary conditions of a more general nature which involve also the derivatives of the dependent variable.
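The truncation orders quoted for the difference formulas above can be checked numerically. The following sketch (modern Python for illustration only; the book's own programs are in FORTRAN) tests the forward form (7.4), the central form (7.6), and the nine-point Laplacian (7.9); the harmonic test function u = Re[(x + iy)⁶] is my choice, not the book's.

```python
import math

# Forward (7.4) and central (7.6) approximations to du/dx for u = sin x.
def forward(f, x, h):
    return (f(x + h) - f(x)) / h

def central(f, x, h):
    return (f(x + h) - f(x - h)) / (2.0 * h)

x0, exact = 0.7, math.cos(0.7)
err_f = {h: abs(forward(math.sin, x0, h) - exact) for h in (0.1, 0.05)}
err_c = {h: abs(central(math.sin, x0, h) - exact) for h in (0.1, 0.05)}
# Halving h should roughly halve the O(dx) error and quarter the O(dx^2) error:
print(round(err_f[0.1] / err_f[0.05], 2))
print(round(err_c[0.1] / err_c[0.05], 2))

# Nine-point Laplacian (7.9) on the harmonic polynomial u = Re[(x+iy)^6],
# for which u_xx + u_yy = 0 exactly; the five-point stencil then has error
# (h^2/12)(u_xxxx + u_yyyy), while the nine-point stencil's error vanishes
# to much higher order.
def u(x, y):
    return x**6 - 15*x**4*y**2 + 15*x**2*y**4 - y**6

def five_point(x, y, h):
    return (u(x+h, y) + u(x-h, y) + u(x, y+h) + u(x, y-h) - 4*u(x, y)) / h**2

def nine_point(x, y, h):
    corners = u(x-h, y+h) + u(x+h, y+h) + u(x-h, y-h) + u(x+h, y-h)
    edges = u(x, y+h) + u(x, y-h) + u(x-h, y) + u(x+h, y)
    return (corners + 4*edges - 20*u(x, y)) / (6.0 * h**2)

h = 0.1
print(abs(five_point(0.3, 0.2, h)))   # noticeably nonzero, about 3h^2
print(abs(nine_point(0.3, 0.2, h)))   # essentially zero
```

For this particular polynomial the nine-point value is exact up to rounding, which is consistent with the O[(Δx)⁴] claim in (7.9) when the function solves Laplace's equation.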

7.5 The Explicit Form of the Difference Equation

In order to approximate the solution of (7.12) and (7.13), a network of grid points is first established throughout the region 0 ≤ x ≤ 1, 0 ≤ t ≤ T, as shown in Fig. 7.3, with grid spacings Δx = 1/M, Δt = T/N, where M and N are arbitrary integers. In this problem, it is easy to ensure that grid points lie on the boundaries of x and t, although, as we shall see later, this correspondence is seldom possible in two-dimensional problems when the boundaries are irregularly shaped. For any grid-point (i,n) that does not have i = 0, i = M, or n = 0, the derivatives of (7.12) are now replaced by the finite-difference forms suggested by (7.4) and (7.7):

    (v_{i,n+1} - v_{i,n})/Δt = (v_{i+1,n} - 2v_{i,n} + v_{i-1,n})/(Δx)²,

or, defining

    λ = Δt/(Δx)²,   (7.14)

then

    v_{i,n+1} = λv_{i-1,n} + (1 - 2λ)v_{i,n} + λv_{i+1,n}.   (7.15)

[Figure 7.3 The difference problem: computed approximation v_{i,n}; boundary conditions v_{0,n} = g_0(t_n) and v_{M,n} = g_1(t_n); initial condition v_{i,0} = f(x_i).]

[Figure 7.4 The explicit form.]

In Fig. 7.4, the crosses and circles indicate those grid points involved in the time and space differences respectively.

If all the v_{i,n} are known at any time level t_n, equation (7.15) enables v_{i,n+1} to be calculated directly (that is, explicitly) at the time level t_{n+1}, for 1 ≤ i ≤ M - 1. For the boundary points i = 0 and i = M, we also have

    v_{0,n+1} = g_0(t_{n+1}),
    v_{M,n+1} = g_1(t_{n+1}).   (7.16)

Since the initial values of v are prescribed at t = 0 by

    v_{i,0} = f(x_i),   (7.17)

the values of v can evidently be obtained at all the grid points by repeated application of (7.15) and (7.16); we must calculate all values of v at any one time level before advancing to the next time-step.

If the initial and boundary conditions do not match at (0,0) and (1,0), u(x,t) will be discontinuous at these corners, and the question arises as to what values should be assigned to, for example, v_{0,0}. It appears reasonable in such a case to use the arithmetic average of the values given by f(x) as x → 0 and g_0(t) as t → 0; in programming, it is often simpler to use either one value or the other and to recognize that a small error is thereby introduced.

Example. Consider the heat-conduction problem of (7.12) and (7.13), with the simple conditions f(x) = 0 and g_0(t) = g_1(t) = 100. Arbitrarily choose Δx = 0.2 and Δt = 0.01, corresponding to λ = 1/4, so that (7.15) becomes

    v_{i,n+1} = (v_{i-1,n} + 2v_{i,n} + v_{i+1,n})/4.

We may then verify the tabulated values of v_{i,n} in Table 7.1, computed to two decimal places.

Table 7.1 Illustration of the Explicit Method

                          Space Subscript, i
Time Sub-
script, n      0        1        2        3        4        5
    0          0        0        0        0        0        0
    1        100        0        0        0        0      100
    2        100       25        0        0       25      100
    3        100     37.5     6.25     6.25     37.5      100
    4        100    45.31    14.06    14.06    45.31      100
    5        100    51.17    21.87    21.87    51.17      100
    6        100    56.05    29.19    29.19    56.05      100
  etc.

A gradual diffusion of heat into the bar is evidenced by the general rise in temperature. Clearly, other values of Δx and Δt could be chosen (subject to a restriction mentioned below), each producing slightly different approximations to the true solution u(x,t).

7.6 Convergence of the Explicit Form

Having constructed a plausible procedure for obtaining the values of v, we must now consider the important question of whether these values actually represent a good approximation to the solution of the original PDE (7.12) at the grid points.

The departure of the finite-difference approximation from the solution of the PDE at any grid point is known as the local discretization error w_{i,n}. That is,

    w_{i,n} = u_{i,n} - v_{i,n}.   (7.18)

The finite-difference method is said to converge if w → 0 as the grid spacings Δx and Δt tend to zero. We now show that a sufficient condition for convergence of the explicit method is that 0 < λ ≤ 1/2. The computational procedure is assumed to be capable of an exact representation of the solution of the finite-difference equation. This is not quite true in practice, since only a finite number of digits can be retained by the computer and the phenomenon of round-off error is introduced.

From Taylor's expansion, supposing as usual that u possesses a sufficient number of partial derivatives,

    u_{i,n+1} = u_{i,n} + Δt u_t + ((Δt)²/2!) u_tt + ...,   (7.19)
    u_{i±1,n} = u_{i,n} ± Δx u_x + ((Δx)²/2!) u_xx ± ((Δx)³/3!) u_xxx + ((Δx)⁴/4!) u_xxxx ± ....   (7.20)

In (7.19) and (7.20), the derivatives are evaluated at (iΔx, nΔt). Employing equations (7.12), u_t = u_xx, and (7.14), λ = Δt/(Δx)², in conjunction with (7.19) and (7.20), it is readily seen that

    u_{i,n+1} = λu_{i-1,n} + (1 - 2λ)u_{i,n} + λu_{i+1,n} + z_{i,n},   (7.21)

where

    z_{i,n} = Δt{(Δt/2)u_tt - ((Δx)²/12)u_xxxx + O[(Δt)²] + O[(Δx)⁴]}.   (7.22)

Subtracting (7.15) from (7.21) and applying (7.18),

    w_{i,n+1} = λw_{i-1,n} + (1 - 2λ)w_{i,n} + λw_{i+1,n} + z_{i,n}.   (7.23)

Now suppose that 0 < λ ≤ 1/2. The coefficients λ and (1 - 2λ) are nonnegative and we therefore have the inequality

    |w_{i,n+1}| ≤ λ|w_{i-1,n}| + (1 - 2λ)|w_{i,n}| + λ|w_{i+1,n}| + |z_{i,n}|.

Let w_max(n) be the upper bound of |w_{i,j}|, 0 ≤ j ≤ n, 1 ≤ i ≤ M - 1, and let z_max(n) be the upper bound of |z_{i,j}|, 0 ≤ j ≤ n, 1 ≤ i ≤ M - 1. Then

    w_max(n+1) ≤ w_max(n) + z_max(n).

Therefore, over the whole region 0 ≤ t ≤ T, since w_max(0) = 0 and since z_max(n) does not decrease with time,

    w_max(N) ≤ N z_max(N - 1);

that is, since N Δt = T,

    w_max(N) ≤ T max |(Δt/2)u_tt - ((Δx)²/12)u_xxxx + O[(Δt)²] + O[(Δx)⁴]|.   (7.24)

From (7.24), we conclude that, provided 0 < λ ≤ 1/2, the discretization error is O[(Δx)²] and thus the explicit finite-difference representation converges as Δx → 0. Finally, since u_t = u_xx, it follows that u_tt = u_xxt = u_txx = u_xxxx; therefore, for the special choice of λ = 1/6, the discretization error is actually O[(Δx)⁴].
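Before turning to the computer example, the explicit scheme (7.15) and the restriction λ ≤ 1/2 can be illustrated with a short sketch (written here in modern Python purely as an illustration; the book's own programs are in FORTRAN). It reproduces the entries of Table 7.1 and then shows that the same march diverges when λ exceeds 1/2.

```python
def explicit_march(M, lam, steps, g=100.0):
    """March (7.15): v[i] <- lam*v[i-1] + (1-2*lam)*v[i] + lam*v[i+1],
    with v = 0 initially and boundary values g imposed after each step."""
    v = [0.0] * (M + 1)
    history = [v[:]]
    for n in range(steps):
        new = [lam * v[i-1] + (1.0 - 2.0 * lam) * v[i] + lam * v[i+1]
               for i in range(1, M)]
        v = [g] + new + [g]          # boundary conditions (7.16)
        history.append(v[:])
    return history

# Reproduce Table 7.1 (dx = 0.2, dt = 0.01, so lam = 1/4):
rows = explicit_march(M=5, lam=0.25, steps=6)
print([round(t, 2) for t in rows[3]])

# With lam = 0.25 the solution stays bounded by the boundary value, but
# for lam > 1/2 the coefficient (1 - 2*lam) is negative and the march
# is unstable -- the computed values oscillate and blow up:
stable = explicit_march(M=10, lam=0.25, steps=200)[-1]
unstable = explicit_march(M=10, lam=0.6, steps=200)[-1]
print(max(stable))
print(max(abs(t) for t in unstable) > 1e6)
```

Row 3 of the computed history matches the n = 3 line of Table 7.1 (100, 37.5, 6.25, 6.25, 37.5, 100), and the λ = 0.6 run illustrates why the restriction matters in practice.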
EXAMPLE 7.1
UNSTEADY-STATE HEAT CONDUCTION IN AN INFINITE, PARALLEL-SIDED
SLAB (EXPLICIT METHOD)

Problem Statement

An infinite parallel-sided slab (0 ≤ x ≤ L) of thermal diffusivity α is initially (at time t = 0) at a uniform temperature θ₀. Its two faces are subsequently maintained at a constant temperature θ₁. Find how the temperature θ inside the slab varies with time and position.

Method of Solution

The heat conduction equation is α ∂²θ/∂x² = ∂θ/∂t. By defining the dimensionless variables

    T = (θ - θ₀)/(θ₁ - θ₀),  X = x/L,  τ = αt/L²,

the problem can be rewritten as that of solving

    ∂²T/∂X² = ∂T/∂τ,

subject to the associated conditions

    T(X,0) = 0,  T(0,τ) = T(1,τ) = 1.

The problem is thus a special case of that stated in equations (7.12) and (7.13); the initial and boundary conditions now have the simple forms f(X) = 0, g₀(τ) = g₁(τ) = 1. The following program, however, obtains its initial and boundary conditions by calling on appropriate predefined functions (F, G0, G1). Thus, by modifying these functions, the problem could be solved for arbitrary initial and boundary conditions.

The computations follow the explicit procedure summarized by equations (7.14) through (7.17). Note that the subscripts denoting time (n and n + 1 in equation (7.15)) can be discarded by using Tᵢ (TOLD(I) in the program) to denote the known temperature at point i, and Tᵢ′ (TNEW(I)) to denote the temperature about to be computed at the end of the time-step. After all the Tᵢ′ have been computed, they are referred to as Tᵢ, and the process is repeated over subsequent time-steps; since we thereby avoid the storage of temperatures that are no longer wanted, the memory requirements are considerably reduced.

The input data to the program include Δτ (the time-step), M (the number of increments ΔX), and T_max (the maximum temperature of interest at the center of the slab). We avoid excessive output by printing the results only periodically, after a certain number of time-steps have elapsed. This periodic printing is achieved by incrementing a counter once every time-step, and testing to see if it is an integer multiple of the specified number of steps. If so, the temperatures computed for that time are printed. A similar procedure is also incorporated into several of the subsequent examples in this chapter. We also use certain plotting subroutines, available at the University of Michigan computing center, for displaying graphically the computed temperatures on the printout.

Flow Diagram (excluding details of plotting points)

[Flow diagram: Begin → ΔX ← 1/M; λ ← Δτ/(ΔX)²; Tᵢ ← f(iΔX); then the explicit time-stepping loop over equations (7.15) and (7.16).]

FORTRAN Implementation
List of Principal Variables
Program Symbol Definition

CONST            1 - 2λ.
DX               Space increment, ΔX.
DTAU             Time-step, Δτ.
F                Function giving the initial condition, f(X).
G0, G1           Functions giving the boundary conditions, g₀(τ) and g₁(τ).
I                Subscript denoting the ith grid point. Due to FORTRAN limitations, we have 1 ≤ I ≤ M + 1, corresponding to 0 ≤ i ≤ M in the text.
ICOUNT           Counter on the number of time-steps.
IFREQ, IFRPLT    Number of time-steps elapsing between successive printings and plots of temperatures, respectively.
M                Number of intervals ΔX into which the slab is divided by the grid points.
PLOT1, PLOT2,    Subroutines used for preparing on-line graph of computed temperatures (see below).
PLOT3, PLOT4
RATIO            λ = Δτ/(ΔX)².
TAU              Time, τ.
TMAX             Maximum center temperature to be computed.
TOLD, TNEW       Vectors containing temperatures at the beginning and end of a time-step, respectively.
X                Vector containing the X-coordinate of each grid point.

PLOT1 and PLOT2 reserve a region of storage, known as the graph image, which will eventually contain all characters, blanks, etc., constituting the final graph; these subroutines also place symbols appropriate to the horizontal and vertical grid lines, and to the numbers along the axes, into the image. PLOT3 places the plotting character (*) for the series of points (X(I), TOLD(I)), I = 1, 2, ..., M + 1, into locations in the image governed by the coordinates of these points. PLOT4 controls the printing of the graph image and inserts an appropriate label for the ordinate.

Program Listing
C     APPLIED NUMERICAL METHODS, EXAMPLE 7.1
C     UNSTEADY-STATE HEAT CONDUCTION IN A PARALLEL-SIDED SLAB,
C     SOLVED BY THE EXPLICIT METHOD.
C
C     THE INITIAL TEMPERATURE DISTRIBUTION IS GIVEN BY THE FUNCTION
C     F(X).  THE BOUNDARY TEMPERATURES AT X = 0 AND X = 1 ARE
C     GIVEN BY THE FUNCTIONS G0(TAU) AND G1(TAU), RESPECTIVELY.
C     THE TEMPERATURES ARE PRINTED EVERY IFREQ TIME-STEPS, UNTIL THE
C     CENTER TEMPERATURE EXCEEDS TMAX.
C
C     ..... DEFINITIONS OF FUNCTIONS FOR COMPUTING
C           INITIAL AND BOUNDARY CONDITIONS .....
      F(DIST) = 0.0
      G0(TIME) = 1.0
      G1(TIME) = 1.0
C
      DIMENSION TOLD(21), TNEW(21), X(21), ARRAY(2500)
C
C     ..... READ AND CHECK DATA, AND COMPUTE CONSTANTS .....
    1 READ (5,100) DTAU, TMAX, M, IFREQ, IFRPLT
      INT = M/10
      FLOATM = M
      DX = 1.0/FLOATM
      RATIO = DTAU/(DX*DX)
      WRITE (6,200) DX, DTAU, TMAX, M, IFREQ, RATIO, IFRPLT
      CONST = 1.0 - 2.0*RATIO
C
C     ..... ESTABLISH GRAPH SIZE FOR PLOTTING ROUTINE .....
      CALL PLOT1 (0, 6, 9, 5, 19)
      CALL PLOT2 (ARRAY(1), 1.0, 0.0, 1.0, 0.0)
C
C     ..... SET AND PRINT INITIAL TEMPERATURES .....
      MP1 = M + 1
      DO 2 I = 1, MP1
      FLOATI = I
      X(I) = (FLOATI - 1.0)/FLOATM
    2 TOLD(I) = F(X(I))
      TAU = 0.0
      WRITE (6,201)
      WRITE (6,202) TAU, (TOLD(I), I = 1, MP1, INT)
      ICOUNT = 0
C
C     ..... COMPUTE TEMPERATURES FOR SUCCESSIVE TIME-STEPS .....
    3 TAU = TAU + DTAU
      ICOUNT = ICOUNT + 1
      DO 4 I = 2, M
    4 TNEW(I) = RATIO*(TOLD(I-1) + TOLD(I+1)) + CONST*TOLD(I)
C
C     ..... SUBSTITUTE NEW TEMPERATURES IN TOLD .....
      DO 5 I = 2, M
    5 TOLD(I) = TNEW(I)
C
C     ..... SET BOUNDARY CONDITIONS .....
      TOLD(1) = G0(TAU)
      TOLD(MP1) = G1(TAU)
C
C     ..... PRINT T'S AND STORE PLOTTING POINTS WHEN APPROPRIATE .....
      IF ((ICOUNT/IFREQ)*IFREQ .NE. ICOUNT) GO TO 3
      MOVER2 = M/2
      WRITE (6,202) TAU, (TOLD(I), I = 1, MP1, INT)
      IF ((ICOUNT/IFRPLT)*IFRPLT .NE. ICOUNT) GO TO 8
      CALL PLOT3 (1H*, X(1), TOLD(1), MP1, 4)
    8 IF (TNEW(MOVER2) .LE. TMAX) GO TO 3
C
C     ..... PRINT GRAPH OF TEMPERATURE T VS. DISTANCE X .....
      WRITE (6,203)
      CALL PLOT4 (27, 27HDIMENSIONLESS TEMPERATURE T)
      WRITE (6,204)
      GO TO 1
C
C     ..... FORMATS FOR INPUT AND OUTPUT STATEMENTS .....
  100 FORMAT (6X, F7.4, 10X, F5.2, 6X, I3, 10X, I3, 11X, I3)
  200 FORMAT (82H1 UNSTEADY-STATE HEAT CONDUCTION IN A SLAB, EXPLICIT
     1METHOD, WITH THE PARAMETERS/ 12H0 DX     = , F10.5/
     212H  DTAU   = , F10.5/ 12H  TMAX   = , F10.5/ 12H  M      = ,
     3I4/ 12H  IFREQ  = , I4/ 12H  RATIO  = , F10.5/
     412H  IFRPLT = , I4)
  201 FORMAT (8H0  TIME, 18X, 39HVALUES OF TEMPERATURE AT ALL GRIDPOINT
     1S)
  202 FORMAT (1H , F7.3/ (1H , 7X, 11F8.5))
  203 FORMAT (1H1)
  204 FORMAT (1H0, 38X, 25H DIMENSIONLESS DISTANCE X)
C
      END

Data
DTAU = 0.005 , TMAX = 0.95, M = 10, IFREQ = 5, IFRPLT = 10
DTAU = 0.0025, TMAX = 0.95, M = 10, IFREQ = 10, IFRPLT = 20
DTAU = 0.001 , TMAX = 0.95, M = 10, IFREQ = 25, IFRPLT = 50
DTAU = 0.001 , TMAX = 0.95, M = 20, IFREQ = 25, IFRPLT = 50

Computer Output (for 4th Data Set)


UNSTEADY-STATE HEAT CONDUCTION IN A SLAB, EXPLICIT METHOD, WITH THE PARAMETERS

DX = 0.05000
DTAU = 0.00100
TMAX = 0.95000
M = 20
IFREQ = 25
RATIO = 0.40000
IFRPLT = 50

TIME            VALUES OF TEMPERATURE AT ALL GRIDPOINTS


0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.025

Computer Output (Continued)

[Printer plot of dimensionless temperature T against dimensionless distance X.]
Discussion of Results

The computer output is reproduced above for the 4th data set only. Each row of numbers represents the temperatures computed for the indicated value of time, as in Table 7.1. The lines joining the points on the graph have been added by hand. The accuracy of the computations for all data sets may be checked from Table 7.1.1, which compares values of the dimensionless center temperature T_c with those given by Olson and Schultz [17] corresponding to the known analytical solution:

    T(X,τ) = 1 - (4/π) Σ_{m=1,3,5,...} (1/m) e^{-m²π²τ} sin(mπX).

Table 7.1.1 Values of Center Temperature, T_c

Time, τ                          0.05    0.1     0.15    0.2     0.25
T_c (exact)                      0.228   0.526   0.710   0.823   0.892
T_c (Set 1, M = 10, λ = 0.5)     0.219   0.526   0.713   0.826   0.895
T_c (Set 2, M = 10, λ = 0.25)    0.216   0.520   0.707   0.822   0.891
T_c (Set 3, M = 10, λ = 0.1)     0.226   0.523   0.708   0.822   0.891
T_c (Set 4, M = 20, λ = 0.4)     0.222   0.523   0.709   0.823   0.892

Considering the relatively small number of grid points used, the agreement is satisfactory. Better accuracy could be achieved by taking more grid points.
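The "exact" center temperatures can be reproduced from the standard separation-of-variables series for these conditions, T(X,τ) = 1 - (4/π) Σ_{m odd} (1/m) e^{-m²π²τ} sin(mπX); stating the series here is my addition for self-containedness, and the check below is a modern Python sketch, not part of the book's program.

```python
import math

def center_temperature(tau, terms=50):
    """Evaluate T(1/2, tau) from the separation-of-variables series for a
    slab initially at T = 0 with both faces held at T = 1."""
    s = 0.0
    for m in range(1, 2 * terms, 2):          # odd m only
        s += (1.0 / m) * math.exp(-m * m * math.pi**2 * tau) \
             * math.sin(m * math.pi / 2.0)
    return 1.0 - (4.0 / math.pi) * s

for tau in (0.05, 0.1, 0.15, 0.2, 0.25):
    print(tau, round(center_temperature(tau), 3))
```

The computed values agree with the "exact" row of Table 7.1.1 to within about ±0.001, which is consistent with those entries having been read from the Olson and Schultz charts.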

7.7 The Implicit Form of the Difference Equation

In the explicit method previously described, v_{i,n} depends only on v_{i-1,n-1}, v_{i,n-1}, and v_{i+1,n-1}. Referring to Fig. 7.5, only those values of v within the pyramid-shaped area A can have any influence on the value of v_{i,n}, whereas it is known that the solution u(x,t) of the PDE depends also on the values of u both in A and in B for times earlier than t_n.

[Figure 7.5 Limitation of the explicit method.]

Furthermore, the requirement 0 < Δt/(Δx)² ≤ 1/2 places an undesirable restriction on the time increment which can be used. For problems extending over large values of time, this could result in excessive amounts of computation.

The implicit method, now to be described, overcomes both these difficulties at the expense of a somewhat more complicated calculational procedure. It consists of representing u_xx by a finite-difference form evaluated at the advanced point of time t_{n+1}, instead of at t_n as in the explicit method. Referring again to the problem of equations (7.12) and (7.13), the difference equation becomes

    (v_{i,n+1} - v_{i,n})/Δt = (v_{i+1,n+1} - 2v_{i,n+1} + v_{i-1,n+1})/(Δx)².   (7.25)

That is, the following relation exists between the values of v at the four points shown in the space-time grid of Fig. 7.6:

    v_{i,n+1} - v_{i,n} = λv_{i-1,n+1} - 2λv_{i,n+1} + λv_{i+1,n+1}.   (7.26)

[Figure 7.6 The implicit form.]

The boundary and initial conditions of the explicit method still hold:

    v_{0,n+1} = g_0(t_{n+1}),
    v_{M,n+1} = g_1(t_{n+1}),   (7.27)
    v_{i,0} = f(x_i).

At any one time level, equation (7.26) will be written once for each point 1 ≤ i ≤ M - 1, resulting in a system of M - 1 simultaneous equations in the M - 1 unknowns v_{i,n+1}. The methods of solution for such a system will be discussed after we consider the convergence of the implicit method.

7.8 Convergence of the Implicit Form

Following a procedure similar to that for the explicit case, it is easily shown with the aid of Taylor's expansion that

    u_{i,n+1} - u_{i,n} = λu_{i-1,n+1} - 2λu_{i,n+1} + λu_{i+1,n+1} + z_{i,n},   (7.28)

where z_{i,n} involves terms in u_tt and u_xxxx evaluated at the grid-point (i, n+1). More simply,

    z_{i,n} = Δt O[Δt + (Δx)²].   (7.29)

Thus, from (7.18), (7.26), and (7.28), the discretization error satisfies the equation

    w_{i,n+1} - w_{i,n} = λw_{i-1,n+1} - 2λw_{i,n+1} + λw_{i+1,n+1} + z_{i,n}.   (7.30)

Now, at any time level t_{n+1}, w is zero at the boundary points (0, n+1) and (M, n+1). In between, it will have a maximum and/or a minimum (if it is not zero everywhere), as shown in Fig. 7.7.

[Figure 7.7 The discretization error.]

From Fig. 7.7, it can be seen that the expression (λw_{i-1,n+1} - 2λw_{i,n+1} + λw_{i+1,n+1}) of (7.30) will be nonpositive where w_{i,n+1} has a maximum and nonnegative where w_{i,n+1} has a minimum. (We are concerned here with absolute and not local or relative maxima and minima.)

Let w⁺_{n+1} and w⁻_{n+1} be the respective maximum and minimum values of w_{i,n+1} for 1 ≤ i ≤ M - 1. Let z⁺ and z⁻ be the values of z_{i,n} at those points that have the respective maximum and minimum values of w_{i,n+1}. Then, from (7.30),

    w⁺_{n+1} ≤ w⁺_n + z⁺,
    w⁻_{n+1} ≥ w⁻_n + z⁻.

Let w_max(n) be the upper bound of |w_{i,j}| for 0 ≤ j ≤ n, 1 ≤ i ≤ M - 1, and let z_max(n) be the upper bound of |z_{i,j}| for 0 ≤ j ≤ n, 1 ≤ i ≤ M - 1. Then w_max(n+1) ≤ w_max(n) + z_max(n). Thus, over the whole region 0 ≤ t ≤ T, since w_max(0) = 0 and since z_max(n) does not decrease with time,

    w_max(N) ≤ N z_max(N - 1);

that is, from (7.29),

    w_max(N) ≤ T[O(Δt + (Δx)²)].   (7.31)

Therefore, we conclude that the implicit method converges to the solution of the PDE as Δt → 0 and Δx → 0, regardless of the value of the ratio Δt/(Δx)².

7.9 Solution of Equations Resulting from the Implicit Method

Having established the convergence of the implicit scheme, we now return to study the solution of the M - 1 linear equations which result at each time-step, namely,

    -λv_{i-1,n+1} + (1 + 2λ)v_{i,n+1} - λv_{i+1,n+1} = v_{i,n},  for 2 ≤ i ≤ M - 2,   (7.32)

together with corresponding equations for i = 1 and i = M - 1, in which the known boundary values λg_0(t_{n+1}) and λg_1(t_{n+1}) appear on the right-hand side.

Expressed more clearly, equations (7.32) are a special form of the system

    b_1 v_1 + c_1 v_2 = d_1,
    a_i v_{i-1} + b_i v_i + c_i v_{i+1} = d_i,  i = 2, 3, ..., N - 1,
    a_N v_{N-1} + b_N v_N = d_N.   (7.33)

In going from (7.32) to (7.33), the subscripts (n + 1) on the v's have been dropped, and the right-hand sides of (7.32), each of which is a known quantity, are called d_1, d_2, ..., d_N for simplicity, with N = M - 1. The matrix of coefficients a, b, and c alone is called a tridiagonal matrix. The system (7.33) is readily solved by a Gaussian elimination method (pp. 270 and 272); with a maximum of three variables per equation, the solution can be expressed very concisely.

We first demonstrate the validity of a recursion solution of the form

    v_i = γ_i - (c_i/β_i) v_{i+1},

in which the constants β_i and γ_i are to be determined. Substitution into the ith equation of (7.33) gives

    a_i [γ_{i-1} - (c_{i-1}/β_{i-1}) v_i] + b_i v_i + c_i v_{i+1} = d_i.

That is,

    v_i = (d_i - a_i γ_{i-1})/(b_i - a_i c_{i-1}/β_{i-1}) - c_i v_{i+1}/(b_i - a_i c_{i-1}/β_{i-1}),

which verifies the above form, subject to the following recursion relations:

    β_i = b_i - a_i c_{i-1}/β_{i-1},  γ_i = (d_i - a_i γ_{i-1})/β_i.

Also, from the first equation of (7.33),

    v_1 = d_1/b_1 - (c_1/b_1) v_2,

whence β_1 = b_1 and γ_1 = d_1/β_1. Finally, substitution of the recursion solution into the last equation of (7.33) yields

    v_N = (d_N - a_N γ_{N-1})/(b_N - a_N c_{N-1}/β_{N-1}),

whence v_N = γ_N.

To summarize, the complete algorithm for the solution of the tridiagonal system is

    v_N = γ_N,
    v_i = γ_i - (c_i/β_i) v_{i+1},  i = N - 1, N - 2, ..., 1,   (7.34)

where the β's and γ's are determined from the recursion formulas

    β_1 = b_1,  γ_1 = d_1/β_1,
    β_i = b_i - a_i c_{i-1}/β_{i-1},  γ_i = (d_i - a_i γ_{i-1})/β_i,  i = 2, 3, ..., N.   (7.35)

One of the disadvantages of a Gaussian elimination method is that round-off error may accumulate seriously. However, Douglas [7] has conducted an analysis of the scheme of (7.34) and (7.35) and expects the round-off error to be small in comparison with the discretization error for usual choices of Δx and Δt.

We can now compare the amounts of computation required by the explicit and implicit methods. In making a rough estimate, we consider here only the number of multiplication and division steps. For M - 1 points at each time level, the explicit scheme of (7.15) requires 2M - 2 multiplication steps. Now if, as is the case with (7.32), the coefficients a, b, and c of (7.33) remain constant, the β_i of (7.35) can be predetermined. In this case, the implicit scheme referred to requires some 3M - 3 steps. The absence of a restriction on the size of Δt/(Δx)² in the implicit method generally outweighs this moderate increase in computational effort.

Finally, note that (7.33) might also be solved by the Gauss-Seidel iteration scheme, discussed in Section 7.24. However, each iteration consumes 2M - 2 steps, and since several iterations will generally be required for satisfactory convergence, such a procedure is not recommended.
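The algorithm (7.34)-(7.35) translates almost line for line into other languages. The following sketch (modern Python with 0-based subscripts, offered as an illustration of the book's TRIDAG rather than a transcription of it) performs the forward sweep for the β's and γ's and then back-substitutes.

```python
def tridag(a, b, c, d):
    """Solve the tridiagonal system (7.33) by the recursion (7.34)-(7.35).
    a[0] and c[-1] are never referenced, since the first and last
    equations have no sub- or super-diagonal term."""
    n = len(d)
    beta = [0.0] * n
    gamma = [0.0] * n
    beta[0] = b[0]                        # beta_1 = b_1
    gamma[0] = d[0] / beta[0]             # gamma_1 = d_1 / beta_1
    for i in range(1, n):                 # forward sweep, eq. (7.35)
        beta[i] = b[i] - a[i] * c[i - 1] / beta[i - 1]
        gamma[i] = (d[i] - a[i] * gamma[i - 1]) / beta[i]
    v = [0.0] * n
    v[n - 1] = gamma[n - 1]               # v_N = gamma_N
    for i in range(n - 2, -1, -1):        # back-substitution, eq. (7.34)
        v[i] = gamma[i] - c[i] * v[i + 1] / beta[i]
    return v

# A small check: diagonal 2, off-diagonals -1, with the right-hand side
# chosen so that the solution is v = (1, 2, 3, 4).
v = tridag([0, -1, -1, -1], [2, 2, 2, 2], [-1, -1, -1, 0], [0, 0, 0, 5])
print([round(x, 10) for x in v])   # [1.0, 2.0, 3.0, 4.0]
```

Note that only one pass in each direction is needed, which is the source of the 3M - 3 operation count quoted above when the β's are precomputed.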
EXAMPLE 7.2
UNSTEADY-STATE HEAT CONDUCTION IN AN INFINITE PARALLEL-SIDED
SLAB (IMPLICIT METHOD)

Problem Statement

Use the implicit method to solve the same problem that has already been stated in Example 7.1.

Method of Solution

The implicit technique employs the algorithm expressed in equations (7.34) and (7.35) for the solution of the M - 1 algebraic equations (7.32) in temperature that result at each time-step. For convenience in later examples, this algorithm is written as a subroutine named TRIDAG, for which a typical call might be

    CALL TRIDAG (IF, L, A, B, C, D, V)

Here, the last five argument names are vectors that are used to store the corresponding elements appearing in equation (7.33). In order to make the subroutine more general, two additional integers, IF and L, have been introduced into the argument list. These correspond to the first and last subscripts, f and l, of the unknowns; that is, the unknowns are assumed to be v_f, v_{f+1}, ..., v_{l-1}, v_l. In the present application of solving equation (7.32), f = 1 and l = M - 1.
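Each implicit time-step thus amounts to assembling the system (7.32) and handing it to a tridiagonal solver. The sketch below (modern Python for illustration, not the FORTRAN that follows) carries out the same computation as the program, and shows that λ may greatly exceed the explicit limit of 1/2 without instability.

```python
def solve_tridiagonal(a, b, c, d):
    # Forward sweep and back-substitution as in (7.34)-(7.35).
    n = len(d)
    beta, gamma = [b[0]], [d[0] / b[0]]
    for i in range(1, n):
        beta.append(b[i] - a[i] * c[i - 1] / beta[i - 1])
        gamma.append((d[i] - a[i] * gamma[i - 1]) / beta[i])
    v = [0.0] * n
    v[-1] = gamma[-1]
    for i in range(n - 2, -1, -1):
        v[i] = gamma[i] - c[i] * v[i + 1] / beta[i]
    return v

def implicit_step(v, lam, g0=1.0, g1=1.0):
    """One step of (7.32): -lam*v'[i-1] + (1+2lam)*v'[i] - lam*v'[i+1] = v[i],
    with the known boundary values moved to the right-hand side."""
    M = len(v) - 1
    n = M - 1
    a = [-lam] * n
    b = [1.0 + 2.0 * lam] * n
    c = [-lam] * n
    d = [v[i] for i in range(1, M)]
    d[0] += lam * g0
    d[-1] += lam * g1
    return [g0] + solve_tridiagonal(a, b, c, d) + [g1]

# lam = 2 (four times the explicit limit) remains stable:
M, lam = 10, 2.0                 # dX = 0.1, dtau = lam*dX*dX = 0.02
v = [0.0] * (M + 1)
for _ in range(50):              # march to tau = 1.0
    v = implicit_step(v, lam)
print(min(v) >= 0.0 and max(v) <= 1.0)   # bounded
print(v[M // 2])                          # near the steady-state value 1
```

The computed profile stays within the physical bounds 0 ≤ T ≤ 1 for every step, and by τ = 1 the center temperature is close to its steady-state value, despite the large time-step.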

Flow Diagram
Main Program

[Flow diagram: Begin → ΔX ← 1/M; λ ← Δτ/(ΔX)²; a_i ← -λ, b_i ← 1 + 2λ, c_i ← -λ. Solve the system of tridiagonal equations having coefficients a_i, b_i, c_i, d_i, i = 1, 2, ..., M - 1, and put the solution into T_1, T_2, ..., T_{M-1} (Subroutine TRIDAG).]
Subroutine TRIDAG (Dummy arguments: f, l, a, b, c, d, v; calling arguments: 1, M - 1, a, b, c, d, T)

[Flow diagram: compute the β_i and γ_i from the recursion formulas (7.35); then back-substitute v_i ← γ_i - (c_i/β_i)v_{i+1}; Return.]
FORTRAN Implementation
List o f Principal Variables
Program Symbol      Definition

The variable names are the same as those listed in Example 7.1, except that TOLD, TNEW, IFRPLT, and the plotting routines have been deleted, and the following have been added:

(Main)
A, B, C, D          Coefficient vectors defined in equations (7.33).
T                   Vector of temperatures at each grid point.
TRIDAG              Subroutine for solving a system of simultaneous equations having a tridiagonal coefficient matrix.

(Subroutine TRIDAG)
BETA, GAMMA         Vectors of intermediate coefficients β_i and γ_i, defined in equations (7.35).
IF, L               Subscripts f and l, corresponding to the first and last equations to be solved.
V                   Vector containing the computed solution.

Because of FORTRAN limitations, the subscripts in the text (i = 0, 1, ..., M) are advanced by one when they appear in the program (I = 1, 2, ..., M + 1).
Example 7.2 Unsteady-State Heat Conduction in an Infinite Parallel-Sided Slab (Implicit Method)

Program Listing
Main Program
C        APPLIED NUMERICAL METHODS, EXAMPLE 7.2
C        UNSTEADY-STATE HEAT CONDUCTION IN A PARALLEL-SIDED SLAB,
C        SOLVED BY THE IMPLICIT METHOD.

C        THE INITIAL TEMPERATURE DISTRIBUTION IS GIVEN BY THE FUNCTION
C        F(X).  THE BOUNDARY TEMPERATURES AT X = 0 AND X = 1 ARE
C        GIVEN BY THE FUNCTIONS GO(TAU) AND G1(TAU), RESPECTIVELY.
C        THE TEMPERATURES ARE PRINTED EVERY IFREQ TIME-STEPS, UNTIL THE
C        CENTER TEMPERATURE EXCEEDS TMAX.  THE TRIDIAGONAL SYSTEM OF
C        EQUATIONS RESULTING AT EACH TIME-STEP IS SOLVED BY THE
C        SUBROUTINE TRIDAG.
C
C        ..... DEFINITIONS OF FUNCTIONS FOR COMPUTING INITIAL AND
C              BOUNDARY CONDITIONS .....
      F(DIST) = 0.0
      GO(TIME) = 1.0
      G1(TIME) = 1.0
C
      DIMENSION A(21), B(21), C(21), D(21), T(21)
C
C
C        ..... CHECK INPUT PARAMETERS AND SET ARRAYS A, B, AND C .....
    1 READ (5,100) DTAU, TMAX, M, IFREQ
      FLOATM = M
      DX = 1.0/FLOATM
      RATIO = DTAU/(DX*DX)
      WRITE (6,200) DX, DTAU, TMAX, M, IFREQ, RATIO
      DO 2 I = 2, M
      A(I) = -RATIO
      B(I) = 1.0 + 2.0*RATIO
    2 C(I) = -RATIO

C
C        ..... SET AND PRINT INITIAL TEMPERATURES .....
      MP1 = M + 1
      DO 3 I = 1, MP1
      FLOATI = I - 1
    3 T(I) = F(FLOATI*DX)
      TAU = 0.0
      WRITE (6,201)
      WRITE (6,202) TAU, (T(I), I = 1, MP1)
      ICOUNT = 0

C
C        ..... PERFORM CALCULATIONS OVER SUCCESSIVE TIME-STEPS .....
    4 TAU = TAU + DTAU
      ICOUNT = ICOUNT + 1
C
C        ..... SET BOUNDARY VALUES .....
      T(1) = GO(TAU)
      T(MP1) = G1(TAU)
C
C        ..... COMPUTE RIGHT-HAND SIDE VECTOR D .....
      DO 5 I = 2, M
    5 D(I) = T(I)
      D(2) = D(2) + RATIO*T(1)
      D(M) = D(M) + RATIO*T(MP1)
C
C        ..... COMPUTE NEW TEMPERATURES .....
      CALL TRIDAG (2, M, A, B, C, D, T)
C
C        ..... PRINT TEMPERATURES WHEN APPROPRIATE .....
      IF ((ICOUNT/IFREQ)*IFREQ .NE. ICOUNT) GO TO 4
      MOVER2 = M/2
      WRITE (6,202) TAU, (T(I), I = 1, MP1)
      IF (T(MOVER2) .LE. TMAX) GO TO 4
      GO TO 1

Program Listing (Continued)


C
C        ..... FORMATS FOR INPUT AND OUTPUT STATEMENTS .....
  100 FORMAT (6X, F7.4, 10X, F5.2, 6X, I3, 10X, I3)
  200 FORMAT (82H1 UNSTEADY-STATE HEAT CONDUCTION IN A SLAB, IMPLICIT
     1 METHOD, WITH THE PARAMETERS/ 12H0 DX     = , F10.5/
     2 12H  DTAU   = , F10.5/ 12H  TMAX   = , F10.5/
     3 12H  M      = , I4/ 12H  IFREQ  = , I4/ 12H  RATIO  = , F10.5)
  201 FORMAT (8H0   TIME, 18X, 39HVALUES OF TEMPERATURE AT ALL GRIDPOINT
     1S)
  202 FORMAT (1H , F7.3/(1H , 7X, 11F8.5))
C
      END

Subroutine TRIDAG


C        SUBROUTINE FOR SOLVING A SYSTEM OF LINEAR SIMULTANEOUS
C        EQUATIONS HAVING A TRIDIAGONAL COEFFICIENT MATRIX.
C        THE EQUATIONS ARE NUMBERED FROM IF THROUGH L, AND THEIR
C        SUB-DIAGONAL, DIAGONAL, AND SUPER-DIAGONAL COEFFICIENTS
C        ARE STORED IN THE ARRAYS A, B, AND C.  THE COMPUTED
C        SOLUTION VECTOR V(IF)...V(L) IS STORED IN THE ARRAY V.
C
      SUBROUTINE TRIDAG (IF, L, A, B, C, D, V)
      DIMENSION A(1), B(1), C(1), D(1), V(1), BETA(101), GAMMA(101)
C
C        ..... COMPUTE INTERMEDIATE ARRAYS BETA AND GAMMA .....
      BETA(IF) = B(IF)
      GAMMA(IF) = D(IF)/BETA(IF)
      IFP1 = IF + 1
      DO 1 I = IFP1, L
      BETA(I) = B(I) - A(I)*C(I-1)/BETA(I-1)
    1 GAMMA(I) = (D(I) - A(I)*GAMMA(I-1))/BETA(I)
C
C        ..... COMPUTE FINAL SOLUTION VECTOR V .....
      V(L) = GAMMA(L)
      LAST = L - IF
      DO 2 K = 1, LAST
      I = L - K
    2 V(I) = GAMMA(I) - C(I)*V(I+1)/BETA(I)
      RETURN
C
      END
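For readers working outside FORTRAN, the recurrences (7.34)-(7.35) implemented by TRIDAG translate directly into a modern language. The following is a minimal Python sketch (the function name `tridag` and the list-based interface are ours, not the book's):

```python
def tridag(a, b, c, d):
    """Solve a tridiagonal system by the recurrences (7.34)-(7.35).

    a = sub-diagonal, b = diagonal, c = super-diagonal, d = right-hand side.
    a[0] and c[-1] are unused.  Returns the solution vector v.
    """
    n = len(b)
    beta = [0.0] * n
    gamma = [0.0] * n
    beta[0] = b[0]
    gamma[0] = d[0] / beta[0]
    for i in range(1, n):                     # forward elimination
        beta[i] = b[i] - a[i] * c[i - 1] / beta[i - 1]
        gamma[i] = (d[i] - a[i] * gamma[i - 1]) / beta[i]
    v = [0.0] * n
    v[-1] = gamma[-1]
    for i in range(n - 2, -1, -1):            # back-substitution
        v[i] = gamma[i] - c[i] * v[i + 1] / beta[i]
    return v


# Example: the system  2x1 - x2 = 1,  -x1 + 2x2 - x3 = 0,  -x2 + 2x3 = 1
# has the solution x1 = x2 = x3 = 1 (to within rounding):
# tridag([0, -1, -1], [2, 2, 2], [-1, -1, 0], [1, 0, 1])
```

As in TRIDAG itself, the work is proportional to the number of equations, in contrast with ordinary Gaussian elimination on the full matrix.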

Data
DTAU = 0.0025, TMAX = 0.95, M = 10, IFREQ = 10
DTAU = 0.0125, TMAX = 0.95, M = 10, IFREQ = 2

Computer Output (for 2nd Data Set)


UNSTEADY-STATE HEAT CONDUCTION IN A SLAB, IMPLICIT METHOD, WITH THE PARAMETERS

DX     =    0.10000
DTAU   =    0.01250
TMAX   =    0.95000
M      =   10
IFREQ  =    2
RATIO  =    1.25000

TIME VALUES OF TEMPERATURE AT ALL GRIDPOINTS


0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Computer Output (Continued)

Discussion of Results

The printed output, which is given here only for the second data set, shows values of dimensionless temperature T for all grid points at every second time-step, i.e., for values of τ spaced by 0.025. Table 7.2.1 lists representative values of the dimensionless center temperature, T_c, for both data sets, in comparison with the known exact solution of Olson and Schultz [17].

Considering the fairly small number of grid points used, Δτ = 0.0025 gives reasonably accurate results. Even Δτ = 0.0125 would be adequate for rough engineering calculations, particularly if machine time were at a premium; note that the corresponding value of λ is 1.25, appreciably above the upper limit of λ = 0.5 in an explicit procedure.

Table 7.2.1 Values of Center Temperature, T_c

Time, τ                          0.05    0.10    0.15    0.20    0.25
Computed T_c (Δτ = 0.0025)       0.231   0.520   0.704   0.817   0.887
Computed T_c (Δτ = 0.0125)       0.226   0.500   0.684   0.801   0.874
Exact T_c                        0.228   0.526   0.710   0.823   0.892
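The calculation behind Table 7.2.1 is easy to reproduce. The sketch below (Python, for illustration; the function names are ours, not the book's) marches the implicit scheme (7.32)-(7.33) with ΔX = 0.1 and Δτ = 0.0125, solving the tridiagonal system at each step by the recurrences (7.34)-(7.35):

```python
def thomas(a, b, c, d):
    """Tridiagonal solve via the recurrences (7.34)-(7.35); a[0], c[-1] unused."""
    n = len(b)
    beta = [b[0]] + [0.0] * (n - 1)
    gamma = [d[0] / b[0]] + [0.0] * (n - 1)
    for i in range(1, n):
        beta[i] = b[i] - a[i] * c[i - 1] / beta[i - 1]
        gamma[i] = (d[i] - a[i] * gamma[i - 1]) / beta[i]
    v = [0.0] * n
    v[-1] = gamma[-1]
    for i in range(n - 2, -1, -1):
        v[i] = gamma[i] - c[i] * v[i + 1] / beta[i]
    return v


def implicit_slab(m=10, dtau=0.0125, nsteps=20):
    """March T for u_t = u_xx with T(x,0) = 0, T(0,t) = T(1,t) = 1.

    Returns the center temperature after each time-step.
    """
    dx = 1.0 / m
    lam = dtau / dx**2                    # lambda = 1.25 for these data
    v = [0.0] * (m + 1)
    centers = []
    for n in range(nsteps):
        v[0] = v[m] = 1.0                 # boundary temperatures at new time
        a = [-lam] * (m - 1)
        b = [1.0 + 2.0 * lam] * (m - 1)
        c = [-lam] * (m - 1)
        d = [v[i] for i in range(1, m)]
        d[0] += lam * v[0]                # fold known boundary values into d
        d[-1] += lam * v[m]
        v[1:m] = thomas(a, b, c, d)
        centers.append(v[m // 2])
    return centers
```

With these data, the eighth entry of `centers` (τ = 0.10) and the twentieth (τ = 0.25) should land close to the tabulated values 0.500 and 0.874.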
7.10 Stability

It has been shown in previous sections that, under certain conditions, the explicit and implicit finite-difference forms are convergent. The term convergent is understood to mean that the exact solution of the finite-difference problem (in the absence of round-off error) tends to the solution of the PDE as the grid spacings in time and distance tend to zero. There are two important concepts closely associated with the convergence of a particular finite-difference procedure, namely, those of consistency and stability. Indeed, Richtmyer [23] presents a theorem attributed to Lax [13] which demonstrates for linear PDEs that, provided a consistency criterion (to be discussed in the next section) is satisfied, stability is both a necessary and sufficient condition for convergence. A similar conclusion is also reached by Douglas [5].

Referring to a certain computational procedure, the term stability denotes a property of the particular finite-difference equation(s) used as the time increment is made vanishingly small. It means that there is an upper limit (as Δt → 0) to the extent to which any piece of information, whether present in the initial conditions, brought in via the boundary conditions, or arising from any sort of error in the calculations, can be amplified in the computations. The following is a brief outline of the treatment, due to von Neumann, for determining the stability of finite-difference procedures, first given by O'Brien, Hyman, and Kaplan [16] and also discussed in References 4, 5, 9, 12, 19, and 23. Stability alone does not necessarily mean that the deviation between the true solution of a certain PDE and its finite-difference approximation will be in any sense small. Rather, it implies the boundedness of the finite-difference solution, at a given time t, as Δt tends to zero.

Assume that at any stage, referred to here as t = 0, a Fourier expansion (either as an integral or as a finite series) can be made of some initial function f(x), and that a typical term in the expansion, neglecting a constant coefficient, is e^{jβx}, where β is a positive constant, and j here denotes √−1. Assume further that a separation of time and space variables can be made, and that at a time t, this term has become ψ(t)e^{jβx}. By substituting in the original difference equation, the form of ψ can be found, and a criterion thereby established as to whether or not it remains bounded as t becomes large. To illustrate the method, three examples are now considered.

Example. The explicit finite-difference form of the PDE u_t = u_xx is

    (v_{i,n+1} − v_{i,n})/Δt = (v_{i+1,n} − 2v_{i,n} + v_{i−1,n})/(Δx)².

Substituting v_{i,n} = ψ(t)e^{jβx}, we have

    ψ(t + Δt) − ψ(t) = λψ(t)[e^{jβΔx} − 2 + e^{−jβΔx}] = −4λψ(t) sin²(βΔx/2),

whence

    ψ(t + Δt) = [1 − 4λ sin²(βΔx/2)]ψ(t).

Since ψ(0) = 1, this has the solution

    ψ(t) = [1 − 4λ sin²(βΔx/2)]^{t/Δt}.

For stability, ψ(t) must remain bounded as Δt, and thus Δx, approaches zero. Clearly, this requires

    |1 − 4λ sin²(βΔx/2)| ≤ 1.                                    (7.36)

An equivalent viewpoint is to define an amplification factor ξ as

    ξ = ψ(t + Δt)/ψ(t),

and to require that |ξ| ≤ 1, which again leads to condition (7.36).

In general, components of all frequencies β may be present; if they are not present in the initial conditions, or not brought in by the boundary conditions, then they are likely to be introduced by round-off error. That is, we must guard against unbounded amplification of ψ(t) for all β. Since sin²(βΔx/2) is unity for some β, inspection shows that λ must be at most 1/2 if condition (7.36) is to be satisfied. That is, for the explicit representation of u_t = u_xx, λ = Δt/(Δx)² ≤ 1/2 is a necessary (and sufficient) condition for stability. This is not, perhaps, a surprising result, since equation (7.15) is

    v_{i,n+1} = λv_{i−1,n} + (1 − 2λ)v_{i,n} + λv_{i+1,n}.

Intuitively, we might expect v_{i,n} to contribute towards v_{i,n+1} in a "nonnegative" manner; this can occur only for λ ≤ 1/2.

The phenomenon of instability is easily demonstrated by reworking the example of page 432, with Δx = 0.2 and Δt = 0.04, so that λ = 1. In this case, (7.15) gives

    v_{i,n+1} = v_{i−1,n} − v_{i,n} + v_{i+1,n}.

Table 7.2 shows the resulting computed values, which exhibit an obvious instability, even in the absence of round-off error.

Table 7.2 Instability of Explicit Method for λ = 1

                              Space Subscript, i
Time Subscript, n      0      1      2      3      4      5
        0              0      0      0      0      0      0
        1            100      0      0      0      0    100
        2            100    100      0      0    100    100
        3            100      0    100    100      0    100
        4            100    200      0      0    200    100
        5            100   −100    200    200   −100    100
        6            100    400   −100   −100    400    100
      etc.
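The λ = 1 calculation of Table 7.2 can be checked in a few lines. Below is a small Python rendering of the explicit step at λ = 1, i.e. v_{i,n+1} = v_{i−1,n} − v_{i,n} + v_{i+1,n} (the function name is ours); it reproduces the growing oscillations of the table:

```python
def explicit_lambda_one(nsteps=6, m=5, boundary=100.0):
    """March the explicit scheme for u_t = u_xx with lambda = dt/dx^2 = 1.

    Interior values start at zero; the ends are held at `boundary` from the
    first step onward, as in Table 7.2.  Returns one row per time level.
    """
    v = [0.0] * (m + 1)
    rows = [v[:]]                            # row n = 0
    for n in range(nsteps):
        v = ([boundary]
             + [v[i - 1] - v[i] + v[i + 1] for i in range(1, m)]
             + [boundary])
        rows.append(v[:])
    return rows
```

Row 6 already contains the values ±400 of the table, confirming the unbounded growth; at λ = 1/2 the same loop, with the update (v_{i−1,n} + v_{i+1,n})/2, stays bounded, in line with condition (7.36).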
Example. The implicit finite-difference form of the PDE u_t = u_xx is

    (v_{i,n+1} − v_{i,n})/Δt = (v_{i+1,n+1} − 2v_{i,n+1} + v_{i−1,n+1})/(Δx)².

The substitution v_{i,n} = ψ(t)e^{jβx} results in an amplification factor

    ξ = 1/[1 + 4λ sin²(βΔx/2)].

Since |ξ| ≤ 1 for all λ, the implicit procedure is unconditionally stable.

Example. The finite-difference equation proposed by Richardson [22] for the PDE u_t = u_xx represents the time derivative by the formula (7.6), in which the truncation error is O[(Δt)²]:

    (v_{i,n+1} − v_{i,n−1})/(2Δt) = (v_{i+1,n} − 2v_{i,n} + v_{i−1,n})/(Δx)².

Following the usual procedure, we arrive at a quadratic equation in the amplification factor:

    ξ² + 8λ sin²(βΔx/2) ξ − 1 = 0.                               (7.38)

By inspection, if the roots are ξ₁ and ξ₂, then ξ₁ = −1/ξ₂. The only possible solution satisfying |ξ| ≤ 1 is ξ₁ = 1, ξ₂ = −1; but this solution contradicts the fact that, for a general β, the sum of the roots of (7.38) is negative, irrespective of the value of λ. Therefore, we conclude that Richardson's method is unconditionally unstable. Leutert [14], however, has shown that if the initial conditions for the finite-difference solution are properly chosen (by a method in which they are closely related to, but not necessarily identical with, those in the differential problem), then Richardson's method converges for all λ. Unfortunately, considerations of round-off error would probably prevent this fact from being fully exploited in practice.

The corresponding method for investigating the stability of the finite-difference representation of a system of simultaneous partial differential equations is discussed under Example 7.5.

Richtmyer [23] indicates that, for stability, the amplification factor should really obey the condition

    |ξ| ≤ 1 + O(Δt),                                             (7.39)

rather than the stricter condition |ξ| ≤ 1 used here. Inequality (7.39) allows for the possibility that the true solution might have the form e^{λt}, which is incompatible with the restriction |ξ| ≤ 1. However, for problems in which it is thought that an increasing exponential solution is unlikely or even impossible, there seems to be a reasonable case in practice for ensuring that the difference procedure has an amplification factor |ξ| ≤ 1, in order to prevent the amplification of any round-off error. Certainly, in the three examples considered, the condition of either |ξ| ≤ 1 or |ξ| ≤ 1 + O(Δt) leads to the same conclusion.

7.11 Consistency

The term consistency, applied to a certain finite-difference procedure, means that the procedure may in fact approximate the solution of the PDE under study, and not the solution of some other PDE. For example, consider the explicit approximation,

    (v_{i,n+1} − v_{i,n})/Δt = (v_{i+1,n} − 2v_{i,n} + v_{i−1,n})/(Δx)²,    (7.40)

of the one-dimensional heat-conduction equation

    ∂u/∂t = ∂²u/∂x².                                             (7.12)

The truncation error of the approximation is defined as the difference between the finite differences and the derivatives they are intended to represent. With the aid of Taylor's expansion, it is readily shown that the truncation error corresponding to (7.40) and (7.12) is

    (Δt/2)(∂²u/∂t²) − [(Δx)²/12](∂⁴u/∂x⁴) + ··· .

Since the truncation error tends to zero as Δt, Δx → 0, the explicit representation is consistent with the original PDE.

Consistency is often taken for granted. However, the following example shows that a certain amount of caution must be observed. Consider the explicit approximation to equation (7.12) proposed by DuFort and Frankel [8], namely,

    (v_{i,n+1} − v_{i,n−1})/(2Δt) = [v_{i+1,n} − v_{i,n+1} − v_{i,n−1} + v_{i−1,n}]/(Δx)².

It may be shown that this scheme is unconditionally stable. From Taylor's expansion, the truncation error is found to contain, in addition to terms of O[(Δt)²] and O[(Δx)²], the term

    (Δt/Δx)²(∂²u/∂t²).

The consistency of the DuFort-Frankel method depends on the way in which Δt, Δx tend to zero. If Δt/(Δx)² is held constant as Δt and Δx → 0, then the scheme is consistent with equation (7.12). On the other hand, if Δt/Δx = c, a constant, as Δt and Δx → 0, then the term c²(∂²u/∂t²) remains, and the difference procedure is consistent instead with an entirely different PDE, namely,

    ∂u/∂t + c²(∂²u/∂t²) = ∂²u/∂x².

7.12 The Crank-Nicolson Method

Both the explicit and implicit methods just described lead to discretization errors of O[Δt + (Δx)²]. We will now develop a finite-difference method which reduces the dependency on the time increment from O(Δt) to O[(Δt)²].

Recall that (7.6) gives an approximation to ∂u/∂t within O[(Δt)²]. We can, therefore, write the following equation for the derivative ∂u/∂t at the half-way point (i, n + 1/2) of Fig. 7.8a:

    ∂u/∂t = (u_{i,n+1} − u_{i,n})/Δt + O[(Δt)²].                 (7.41)

(Notice that this is true even if u_t ≠ u_xx.) Also, by performing a Taylor expansion, the following approximation can be derived for ∂²u/∂x², again at the half-way point (i, n + 1/2):

    ∂²u/∂x² = [θ δx²u_{i,n+1} + (1 − θ) δx²u_{i,n}]/(Δx)² + ··· ,    (7.42)

where the value of θ is anywhere in the range 0 ≤ θ ≤ 1. The central-difference operator δx is used for conciseness; it is defined by (7.10).

In the Crank-Nicolson method [3], θ is chosen to be 1/2 and the finite-difference equation, corresponding to (7.41) and (7.42) for the PDE u_t = u_xx, is

    (v_{i,n+1} − v_{i,n})/Δt = [δx²v_{i,n+1} + δx²v_{i,n}]/[2(Δx)²].    (7.43)

It may be shown that the Crank-Nicolson method is stable for all values of the ratio λ = Δt/(Δx)², and that it converges with discretization error O[(Δt)² + (Δx)²]. Although this is a distinct improvement over the previous methods, the computation is only slightly more complicated than for the implicit method. Indeed, equation (7.43), written out in full, becomes

    −λv_{i−1,n+1} + (2 + 2λ)v_{i,n+1} − λv_{i+1,n+1} = λv_{i−1,n} + (2 − 2λ)v_{i,n} + λv_{i+1,n},

which is very similar to (7.26) for the implicit procedure.

Forsythe and Wasow [9] mention that if, instead of using the Crank-Nicolson value of θ = 1/2 in (7.42), the choice θ = (6λ − 1)/12λ is made, then the discretization error becomes O[(Δx)⁴], and that if λ is also chosen to be 1/√20, the error is further reduced to O[(Δx)⁶]. However, these authors also caution against expecting such improvements for an equation not of the simple form u_t = u_xx.

Figure 7.8 (a) Crank-Nicolson method. (b) DuFort-Frankel method.

7.13 Unconditionally Stable Explicit Procedures

Although the familiar explicit method of Section 7.5 is stable only for λ ≤ 1/2, there are several other simple explicit procedures which are free from this restriction. Examples of such methods are illustrated below for the solution of u_t = u_xx, but all can be extended to more than one space dimension.

DuFort-Frankel Method [8].

    (v_{i,n+1} − v_{i,n−1})/(2Δt) = [v_{i−1,n} − v_{i,n+1} − v_{i,n−1} + v_{i+1,n}]/(Δx)².    (7.45)

Note that three time levels are involved, as shown in Fig. 7.8b.

Saul'yev Method. The following is a special case of the method given on page 29 et seq. of [31]. Advance to time level n + 1 by proceeding in the positive x-direction with

    (v_{i,n+1} − v_{i,n})/Δt = [v_{i−1,n+1} − v_{i,n+1} − v_{i,n} + v_{i+1,n}]/(Δx)²,    (7.46a)

and then advance to time level n + 2 by proceeding in the negative x-direction (over the same region) with

    (v_{i,n+2} − v_{i,n+1})/Δt = [v_{i−1,n+1} − v_{i,n+1} − v_{i,n+2} + v_{i+1,n+2}]/(Δx)².    (7.46b)
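The unconditional stability of the DuFort-Frankel formula (7.45) is easy to observe numerically. The sketch below (Python, ours, for illustration) runs it at λ = 1, exactly where the ordinary explicit method of Table 7.2 diverges; solving (7.45) for v_{i,n+1} gives v_{i,n+1} = [(1 − 2λ)v_{i,n−1} + 2λ(v_{i−1,n} + v_{i+1,n})]/(1 + 2λ):

```python
def dufort_frankel(lam=1.0, m=5, nsteps=300, boundary=100.0):
    """March the three-level DuFort-Frankel scheme (7.45).

    Both starting levels are taken from the initial data (interior zero,
    ends held at `boundary`).  Returns the final level and the largest
    magnitude encountered on the way, to show that nothing blows up.
    """
    old = [boundary] + [0.0] * (m - 1) + [boundary]   # level n - 1
    cur = old[:]                                      # level n
    peak = boundary
    for n in range(nsteps):
        new = cur[:]
        for i in range(1, m):
            new[i] = ((1.0 - 2.0 * lam) * old[i]
                      + 2.0 * lam * (cur[i - 1] + cur[i + 1])) / (1.0 + 2.0 * lam)
        old, cur = cur, new
        peak = max(peak, max(abs(x) for x in cur))
    return cur, peak
```

At λ = 1 the values remain bounded throughout and settle toward the steady state 100, in sharp contrast with the unbounded growth of the ordinary explicit scheme at the same λ.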

Note that v_{i−1,n+1} and v_{i+1,n+2} in the first and second of these formulas will be known either from computations at the preceding grid point or from a boundary condition. This explicit alternating-direction procedure is then repeated for successive pairs of time-steps.

Larkin [30] also presents a modification of the Saul'yev method, in which the following two formulas are first used to compute two intermediate values p and q, both at time level n + 1:

    (p_{i,n+1} − v_{i,n})/Δt = [p_{i−1,n+1} − p_{i,n+1} − v_{i,n} + v_{i+1,n}]/(Δx)²,    (7.47a)

    (q_{i,n+1} − v_{i,n})/Δt = [v_{i−1,n} − v_{i,n} − q_{i,n+1} + q_{i+1,n+1}]/(Δx)².    (7.47b)

The final approximation is then

    v_{i,n+1} = (p_{i,n+1} + q_{i,n+1})/2.                       (7.47c)

Barakat and Clark Method [26]. Both p_{i,n} and q_{i,n} are retained from the computations of the previous time-step. The formulas, for use in the forward (+x) and backward (−x) directions respectively, are

    (p_{i,n+1} − p_{i,n})/Δt = [p_{i−1,n+1} − p_{i,n+1} − p_{i,n} + p_{i+1,n}]/(Δx)²,    (7.48a)

    (q_{i,n+1} − q_{i,n})/Δt = [q_{i−1,n} − q_{i,n} − q_{i,n+1} + q_{i+1,n+1}]/(Δx)²,    (7.48b)

followed by

    v_{i,n+1} = (p_{i,n+1} + q_{i,n+1})/2.                       (7.48c)

7.14 The Implicit Alternating-Direction Method

It is now logical to progress to a parabolic PDE having two space coordinates. That is, let u = u(x,y,t) and v = v(i,j,n), where y = jΔy. A simple example, arising from, say, unsteady-state heat conduction in a flat plate, is:

    ∂u/∂t = ∂²u/∂x² + ∂²u/∂y².                                   (7.49)

The explicit method of solution, in which the space derivatives are approximated at the time level t_n, leads to the difference equation

    (v_{i,j,n+1} − v_{i,j,n})/Δt = δx²v_{i,j,n}/(Δx)² + δy²v_{i,j,n}/(Δy)².    (7.50)

The solution of (7.50) presents no difficulties and will not be discussed further, except to say that the following restriction between the time and space increments must be observed to ensure stability:

    Δt[1/(Δx)² + 1/(Δy)²] ≤ 1/2.                                 (7.51)

On the other hand, the implicit method leads to the difference equation

    (v_{i,j,n+1} − v_{i,j,n})/Δt = δx²v_{i,j,n+1}/(Δx)² + δy²v_{i,j,n+1}/(Δy)²,

which, when written out in full for the simple case of a square grid with Δx = Δy, contains five unknown values of v, all at time level n + 1, in each equation. Just as for the one-dimensional case, it may be shown that this scheme is stable, independent of λ. With five unknowns per equation, however, the simple version of the Gaussian elimination method given by the algorithm (7.34) for the special tridiagonal system of (7.33) is no longer applicable. Gaussian elimination may still be used, but only at the expense of a considerable amount of computation. An alternative is to use the Gauss-Seidel iterative method discussed in Section 7.24, although this method may need a fair number of iterations for adequate convergence.

The implicit alternating-direction method, discussed by Peaceman and Rachford [19] and Douglas [4,6], avoids these disadvantages and yet still manages to use a system of equations with a tridiagonal coefficient matrix, for which the algorithm of (7.34) affords a straightforward solution. Essentially, the principle is to employ two difference equations which are used in turn over successive time-steps, each of duration Δt/2. The first equation is implicit only in the x-direction and the second is implicit only in the y-direction. Thus, if v*_{i,j} is an intermediate value at the end of the first time-step, we have

    (v*_{i,j} − v_{i,j,n})/(Δt/2) = δx²v*_{i,j}/(Δx)² + δy²v_{i,j,n}/(Δy)²,    (7.52a)

followed by

    (v_{i,j,n+1} − v*_{i,j})/(Δt/2) = δx²v*_{i,j}/(Δx)² + δy²v_{i,j,n+1}/(Δy)².    (7.52b)

Written out in full and rearranged, with Δx = Δy for simplicity, these equations become

    −v*_{i−1,j} + 2(1/λ + 1)v*_{i,j} − v*_{i+1,j} = v_{i,j−1,n} + 2(1/λ − 1)v_{i,j,n} + v_{i,j+1,n},    (7.53a)

    −v_{i,j−1,n+1} + 2(1/λ + 1)v_{i,j,n+1} − v_{i,j+1,n+1} = v*_{i−1,j} + 2(1/λ − 1)v*_{i,j} + v*_{i+1,j},    (7.53b)

where λ = Δt/(Δx)².
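The stability claim for the Peaceman-Rachford pair (7.52a)-(7.52b) can be checked numerically. In Fourier space, each half-step multiplies the amplitude of the (α, β) mode by a rational factor, and the product over a whole step is the amplification factor ξ. The small Python check below (illustrative only; for a square grid with λ = Δt/(Δx)²) scans |ξ| over a range of λ and wave numbers:

```python
import math


def adi_amplification(lam, ax, by):
    """Von Neumann amplification factor of the Peaceman-Rachford scheme
    for u_t = u_xx + u_yy on a square grid; ax = alpha*dx, by = beta*dy."""
    sx = math.sin(0.5 * ax) ** 2
    sy = math.sin(0.5 * by) ** 2
    return (((1.0 - 2.0 * lam * sx) * (1.0 - 2.0 * lam * sy))
            / ((1.0 + 2.0 * lam * sx) * (1.0 + 2.0 * lam * sy)))


# Scan lambda and both wave numbers; the modulus never exceeds one.
worst = max(abs(adi_amplification(lam, a, b))
            for lam in (0.1, 0.5, 1.0, 5.0, 100.0)
            for a in (k * math.pi / 12 for k in range(13))
            for b in (k * math.pi / 12 for k in range(13)))
# worst evaluates to 1.0, attained only at zero wave number
```

Since each factor (1 − 2λs)/(1 + 2λs) has modulus at most one for s ≥ 0 and any λ > 0, the product can never grow, no matter how large the time-step.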
,~.

The first equation is solved for the intermediate values v*, which are then used in the second equation, thus leading to the solution v_{i,j,n+1} at the end of the whole time interval Δt.

The stability of this procedure is investigated by the von Neumann method, previously outlined in Section 7.10 for one space variable. Substitution of the term ψ(t)e^{jαx}e^{jβy} (j here denotes √−1) into the difference equations (7.52a) and (7.52b), and elimination of the intermediate function ψ(t + Δt/2), yields the following expression for the amplification factor across a whole time-step:

    ξ = [1 − 2λ sin²(αΔx/2)][1 − 2λ sin²(βΔy/2)] / {[1 + 2λ sin²(αΔx/2)][1 + 2λ sin²(βΔy/2)]}.

Clearly, |ξ| ≤ 1 for any value of Δt, and the procedure is unconditionally stable. Convergence occurs with a discretization error O[(Δt)² + (Δx)²]. We assume here, and in Section 7.15, that ratios such as Δx/Δy are held constant.

The implicit alternating-direction procedure also finds an important application in the solution of elliptic PDEs (p. 508).

7.15 Additional Methods for Two and Three Space Dimensions

The implicit alternating-direction method can be extended to three space dimensions by using two intermediate values, but an examination of the corresponding amplification factor shows that it is unstable for λ > 3/2. Instead, the following alternating-direction modification of the Crank-Nicolson method is suggested by Douglas [24]:

    (v* − v_n)/Δt = (1/2)δx²(v* + v_n) + δy²v_n + δz²v_n,

    (v** − v_n)/Δt = (1/2)δx²(v* + v_n) + (1/2)δy²(v** + v_n) + δz²v_n,

    (v_{n+1} − v_n)/Δt = (1/2)δx²(v* + v_n) + (1/2)δy²(v** + v_n) + (1/2)δz²(v_{n+1} + v_n).

Here, v* and v** denote the two intermediate values; the space subscripts are omitted for clarity, and each operator δ² is understood to include division by the square of the corresponding grid spacing. A general discussion of alternating-direction methods is also given by Douglas and Gunn [32].

Another procedure, given by Brian [27], is also available. The third intermediate value v*** appearing in it can actually be eliminated between the three defining equations to produce an equivalent system in which v* and v** can be regarded as successive approximations to v at the half time-step.

Both of the above three-dimensional procedures are unconditionally stable and converge with discretization error O[(Δt)² + (Δx)²]. Also, both procedures have obvious two-dimensional counterparts, although Brian's method then becomes identical with the implicit alternating-direction form.
EXAMPLE 7.3

UNSTEADY-STATE HEAT CONDUCTION IN A LONG BAR OF SQUARE CROSS SECTION
(IMPLICIT ALTERNATING-DIRECTION METHOD)
Problem Statement

An infinitely long bar of thermal diffusivity α has a square cross section of side 2a. It is initially at a uniform temperature θ₀ and then suddenly has its surface maintained at a temperature θ₁. Compute the subsequent temperatures θ(x,y,t) inside the bar.

Method of Solution

If dimensionless distances, time, and temperature are defined by

    X = x/a,   Y = y/a,   τ = αt/a²,   and   T = (θ − θ₀)/(θ₁ − θ₀),

it may be shown that the unsteady-state conduction is governed by

    ∂²T/∂X² + ∂²T/∂Y² = ∂T/∂τ.                                   (7.3.1)

Because of symmetry, it suffices to solve the problem in one quadrant only, such as that shown in Fig. 7.3.1. The center of the bar (X = 0, Y = 0) and one of its corners (X = 1, Y = 1) are regarded as the grid points (0,0) and (n,n), respectively. From symmetry, there is no heat flux across the X and Y axes, which behave, in effect, as perfectly insulating boundaries across which the normal temperature gradient is zero. The initial and boundary conditions are:

    τ = 0:  T = 0 throughout the region,
    τ > 0:  T = 1 along the sides X = 1 and Y = 1,
            ∂T/∂X = 0 and ∂T/∂Y = 0 along the sides X = 0 and Y = 0, respectively.

Figure 7.3.1 Lower right-hand quadrant of cross section of bar.

The solution to the problem is by the implicit alternating-direction method described in the text and summarized by equations (7.53a) and (7.53b), with the first half time-step implicit in the X direction. Let T and T* refer to temperatures at the beginning and end of a half time-step Δτ/2. Equation (7.53a) is applied to each point i = 1, 2, ..., n − 1 in the jth column; also, the method of Section 7.17 is used in conjunction with the effective boundary condition ∂T/∂X = 0 at X = 0 to yield a finite-difference approximation of equation (7.3.1) at the boundary point (0, j). We then have the following tridiagonal system for the jth column:

    2(1/λ + 1)T*_{0,j} − 2T*_{1,j} = d_0,
    −T*_{i−1,j} + 2(1/λ + 1)T*_{i,j} − T*_{i+1,j} = d_i,   i = 1, 2, ..., n − 1,    (7.3.2)

with

    d_i = T_{i,j−1} + f T_{i,j} + T_{i,j+1},   i = 0, 1, ..., n − 2
    d_{n−1} = T_{n−1,j−1} + f T_{n−1,j} + T_{n−1,j+1} + T_{n,j}          } for j ≠ 0,

    d_i = 2T_{i,1} + f T_{i,0},   i = 0, 1, ..., n − 2
    d_{n−1} = 2T_{n−1,1} + f T_{n−1,0} + T_{n,0}                         } for j = 0,

where

    f = 2(1/λ − 1)   and   λ = Δτ/(ΔX)².
Note: (a) the second form for the d_i, when j = 0, arises from consideration of the boundary condition ∂T/∂Y = 0 at Y = 0, and (b) in the present problem, the boundary values T_{n,j} and T*_{n,j} are synonymous (both equal 1). Equation (7.3.2) is the same as (7.33) in form, and may be solved for the T*_{i,j} (i = 0, 1, ..., n − 1) by the algorithm of (7.34) and (7.35). The procedure is repeated for successive columns, j = 0, 1, ..., n − 1, until all the T*_{i,j} are found at the end of the first half time-step.

The temperatures at the end of the second half time-step are found similarly, by applying equation (7.53b), or its equivalent at the boundary Y = 0, to each point in a row (j = 0, 1, ..., n − 1), for successive rows (i = 0, 1, ..., n − 1).

Flow Diagram

(The flow diagram for this example is only partially legible in this reproduction. In outline: at each time-step, for each column j = 0, 1, ..., n − 1, form the coefficients a_i, b_i, c_i, d_i and solve the tridiagonal system, i = 0, 1, ..., n − 1, placing the solution in T*_{0,j}, ..., T*_{n−1,j} (Subroutine TRIDAG); then, for each row i = 0, 1, ..., n − 1, solve the tridiagonal system having coefficients a_j, b_j, c_j, d_j, j = 0, 1, ..., n − 1, placing the solution in T_{i,0}, ..., T_{i,n−1} (Subroutine TRIDAG).)

FORTRAN Implementation
List of Principal Variables
Program Symbol      Definition

A, B, C, D          Coefficient vectors defined in equations (7.33).
DTAU                Time-step, Δτ.
DX                  Space increment, ΔX.
F                   f = 2(1/λ − 1).
I, J                Row and column subscripts, i, j.
ICOUNT              Counter on the number of time-steps.
IFREQ               Number of time-steps between successive printings of temperatures.
N                   Number of space increments between X = 0 and X = 1.
RATIO               λ = Δτ/(ΔX)².
T                   Matrix of temperatures T at each grid point.
TAU                 Time, τ.
TMAX                Maximum center temperature to be computed, T_max.
TPRIME              Vector for temporary storage of temperatures computed by TRIDAG. These values T′ are then placed in the appropriate row of T or column of T*.
TRIDAG              Subroutine for solving a tridiagonal system of simultaneous equations (see Example 7.2).
TSTAR               Matrix of temperatures T* at the end of the first half time-step.

Because of FORTRAN limitations, all subscripts in the text are advanced by one when they appear in the program; e.g., T_{0,0} and T_{n,n} become T(1,1) and T(N+1,N+1), respectively.

Program Listing
C        APPLIED NUMERICAL METHODS, EXAMPLE 7.3
C        TWO-DIMENSIONAL UNSTEADY-STATE HEAT CONDUCTION IN AN INFINITE
C        BAR OF SQUARE CROSS-SECTION, SOLVED BY THE I.A.D. METHOD.
C
C        AT TIME TAU = 0, THE TEMPERATURE IS ZERO EVERYWHERE, AND THE
C        FACES OF THE BAR ARE SUBSEQUENTLY MAINTAINED AT T = 1.
C        THE TEMPERATURES IN THE LOWER RIGHT-HAND QUADRANT ARE PRINTED
C        EVERY IFREQ TIME-STEPS, UNTIL THE CENTER TEMPERATURE EXCEEDS
C        TMAX.
C
      DIMENSION T(21,21), TSTAR(21,21), A(21), B(21), C(21), D(21),
     1   TPRIME(21)
C
C        ..... READ AND CHECK INPUT PARAMETERS .....
    1 READ (5,100) DTAU, TMAX, N, IFREQ
      NP1 = N + 1
      FLOATN = N
      DX = 1.0/FLOATN
      RATIO = DTAU/(DX*DX)
      WRITE (6,200) DTAU, DX, RATIO, TMAX, N, IFREQ
C
C        ..... SET INITIAL AND BOUNDARY VALUES .....
      DO 2 I = 1, N
      T(I,NP1) = 1.0
      T(NP1,I) = 1.0
      TSTAR(I,NP1) = 1.0
      TSTAR(NP1,I) = 1.0
      DO 2 J = 1, N
    2 T(I,J) = 0.0
      T(NP1,NP1) = 1.0
      TSTAR(NP1,NP1) = 1.0
C
C        ..... SET COEFFICIENT ARRAYS A, B, AND C .....
      F = 2.0*(1.0/RATIO - 1.0)
      B(1) = 2.0*(1.0/RATIO + 1.0)
      C(1) = -2.0
      DO 3 I = 2, N
      A(I) = -1.0
      B(I) = B(1)
    3 C(I) = -1.0
      ICOUNT = 0
      TAU = 0.0

C
C        ..... PERFORM CALCULATIONS OVER SUCCESSIVE TIME-STEPS .....
    4 TAU = TAU + DTAU
      ICOUNT = ICOUNT + 1
C
C        ..... COMPUTE TEMPERATURES AT END OF HALF
C              TIME INCREMENT (IMPLICIT BY COLUMNS) .....
      DO 8 J = 1, N
      DO 7 I = 1, N
      IF (J .EQ. 1)  D(I) = 2.0*T(I,2) + F*T(I,1)
    7 IF (J .NE. 1)  D(I) = T(I,J-1) + F*T(I,J) + T(I,J+1)
      D(N) = D(N) + TSTAR(NP1,J)
      CALL TRIDAG (1, N, A, B, C, D, TPRIME)
      DO 8 I = 1, N
    8 TSTAR(I,J) = TPRIME(I)
C
C        ..... COMPUTE TEMPERATURES AT END OF WHOLE
C              TIME INCREMENT (IMPLICIT BY ROWS) .....
      DO 12 I = 1, N
      DO 11 J = 1, N
      IF (I .EQ. 1)  D(J) = 2.0*TSTAR(2,J) + F*TSTAR(1,J)
   11 IF (I .NE. 1)  D(J) = TSTAR(I-1,J) + F*TSTAR(I,J) + TSTAR(I+1,J)
      D(N) = D(N) + T(I,NP1)
      CALL TRIDAG (1, N, A, B, C, D, TPRIME)
      DO 12 J = 1, N
   12 T(I,J) = TPRIME(J)

Program Listing (Continued)


C
C        ..... PRINT TEMPERATURES THROUGHOUT THE QUADRANT .....
      IF (ICOUNT .NE. IFREQ) GO TO 15
      ICOUNT = 0
      WRITE (6,201) TAU
      DO 14 I = 1, NP1
   14 WRITE (6,202) (T(I,J), J = 1, NP1)
   15 IF (T(1,1) - TMAX) 4, 4, 1
C
C        ..... FORMATS FOR INPUT AND OUTPUT STATEMENTS .....
  100 FORMAT (6X, F6.3, 10X, F5.2, 7X, I3, 11X, I2)
  200 FORMAT (82H1 UNSTEADY STATE HEAT CONDUCTION IN A SQUARE BAR, I.A
     1.D. METHOD, WITH PARAMETERS/
     2 12H0 DTAU   = , F10.5/ 12H  DX     = , F10.5/
     3 12H  RATIO  = , F10.5/ 12H  TMAX   = , F10.5/
     4 12H  N      = , I4/ 12H  IFREQ  = , I4)
  201 FORMAT (20H0   AT A TIME TAU = , F8.5/ 32H0   TEMPERATURES IN QUAD
     1RANT ARE/ 1H )
  202 FORMAT (4H    , 11F8.5)
C
      END

Data
DTAU = 0.05, TMAX = 0.95, N = 10, IFREQ = 1

Computer Output
UNSTEADY STATE HEAT CONDUCTION IN A SQUARE BAR, I.A.D. METHOD, WITH PARAMETERS

DTAU   =    0.05000
DX     =    0.10000
RATIO  =    5.00000
TMAX   =    0.95000
N      =   10
IFREQ  =    1

AT A TIME TAU =  0.05000

TEMPERATURES IN QUADRANT ARE

AT A TIME TAU =  0.10000

TEMPERATURES IN QUADRANT ARE

Computer Output (Continued)


AT A TIME TAU =  0.20000

TEMPERATURES IN QUADRANT ARE

AT A TIME TAU =  0.40000

TEMPERATURES IN QUADRANT ARE

AT A TIME TAU =  0.75000

TEMPERATURES IN QUADRANT ARE
Example 7.5 Natural Convection at a Heated Vertical Plate

FORTRAN Implementation
List o f Principal Variables
Program Symbol      Definition

DTAU                Time-step, Δτ.
DX, DY              Space increments, ΔX and ΔY, respectively.
I, J                Grid-point subscripts, i, j.
ICOUNT              Counter on the number of time-steps.
IFREQ               Number of time-steps between successive printings of temperatures.
JMAX                Largest column subscript, j_max, for which printout is requested, in case n is too large for the width of the page.
M, N                Number of grid spacings m, n, in the X and Y directions, respectively.
PR                  Prandtl number, Pr.
T, TNEW             Matrices of temperatures, T, T′, at the beginning and end of a time-step, respectively.
TAU                 Time, τ.
TAUMAX              Largest value of τ to be considered.
U, V                Matrices of velocities U and V in the X and Y directions, respectively.
UNEW                Matrix of velocities U′ at the end of a time-step.
XMAX                Height of the plate, X_max.
YMAX                Maximum distance from the plate, Y_max.

Because of FORTRAN limitations, all subscripts in the text have been advanced by one in the program; e.g., T_{m,n} becomes T(M+1, N+1).

Program Listing
C        APPLIED NUMERICAL METHODS, EXAMPLE 7.5
C        FREE CONVECTION AT A HEATED VERTICAL PLATE.
C
C        THE EQUATIONS OF MOTION (SIMPLIFIED BOUNDARY-LAYER TYPE),
C        ENERGY, AND CONTINUITY ARE SOLVED USING AN EXPLICIT PROCEDURE.
C
      DIMENSION U(41,41), UNEW(41,41), V(41,41), T(41,41), TNEW(41,41)
C
C        ..... READ AND CHECK INPUT PARAMETERS, AND COMPUTE CONSTANTS .....
    1 READ (5,100) M, N, IFREQ, JMAX, XMAX, YMAX, PR, DTAU, TAUMAX
      JMAXP1 = JMAX + 1
      MP1 = M + 1
      NP1 = N + 1
      FLOATM = M
      FLOATN = N
      DX = XMAX/FLOATM
      DY = YMAX/FLOATN
      DYSQ = DY*DY
      DYSQPR = DYSQ*PR
      YOVERX = DY/DX
      WRITE (6,200) M, N, IFREQ, JMAX, XMAX, YMAX, PR, DTAU, TAUMAX
C
C        ..... INITIALIZE VELOCITIES AND TEMPERATURES .....
      ICOUNT = 0
      DO 2 I = 1, MP1
      DO 2 J = 1, NP1
      U(I,J) = 0.0
      UNEW(I,J) = 0.0
      V(I,J) = 0.0
      T(I,J) = 0.0
    2 TNEW(I,J) = 0.0
      DO 3 I = 1, MP1
    3 T(I,1) = 1.0
C
C        ..... PERFORM CALCULATIONS OVER SUCCESSIVE TIME-STEPS .....
      TAU = 0.0
    4 TAU = TAU + DTAU
      ICOUNT = ICOUNT + 1
C
C .....
DO 5
CALCULATE NEW TEMPERATURES
1 a 2, M P 1
.....
DO5 J - 2 , N
5 TNEW(I,J)
1
2
= T(I,J) + DTAU*
(((T(I,J+l)
--
(U(I,J)*(T(I,J)
-
2.0*T(I,J)
-
+ T(I,J-l))/DYSQPR)
T(I-l,J))/DX)
3 (V(I,J)*(T(I,J+l) -
T(I,J))/DY))

NEW U V E L O C I T I E S .. ...
C
C ..... CALCULATE
. -

DOE J = 2 , N
6 UNEW(I,J) = U(I,J) + DTAU*
1
2
(((U(I,J+l)
--
(U(I,J)*(U(I,J)
-
2.0*U(10J)
--
+ U(I,J-l))/DYSQ)
U(I-l,J))/DX)
3 (V(I,J)*(U(I,J+l) U(I,J))/DY) + TNEW(I,J))

C
C     ..... CALCULATE NEW V VELOCITIES .....
      DO 7 I = 2, MP1
      DO 7 J = 2, NP1
    7 V(I,J) = V(I,J-1) + YOVERX*(UNEW(I-1,J) - UNEW(I,J))
C
C     ..... SUBSTITUTE UNEW INTO U, AND TNEW INTO T .....
      DO 8 I = 2, MP1
      DO 8 J = 2, NP1
      U(I,J) = UNEW(I,J)
    8 T(I,J) = TNEW(I,J)
Example 7.5 Natural Convection at a Heated Vertical Plate

Program Listing (Continued)


C
C     ..... PRINT U, V AND T FIELDS WHEN APPROPRIATE .....
      IF (ICOUNT .NE. IFREQ) GO TO 13
      ICOUNT = 0
      WRITE (6,201) TAU
      WRITE (6,202)
      DO 10 K = 1, MP1
      I = MP1 - K + 1
   10 WRITE (6,205) (U(I,J), J = 1, JMAXP1)
      WRITE (6,203)
      DO 11 K = 1, MP1
      I = MP1 - K + 1
   11 WRITE (6,205) (V(I,J), J = 1, JMAXP1)
      WRITE (6,204)
      DO 12 K = 1, MP1
      I = MP1 - K + 1
   12 WRITE (6,205) (T(I,J), J = 1, JMAXP1)
   13 IF (TAU - TAUMAX) 4, 4, 1
C
C     ..... FORMATS FOR INPUT AND OUTPUT STATEMENTS .....
  100 FORMAT (3(10X, I3), 13X, I5, 15X, F8.2/ 4(12X, F8.3))
  200 FORMAT (56H1 FREE CONVECTION AT A VERTICAL PLATE, WITH PARAMETER
     1S/ 12H0 M      = , I4/ 12H  N      = , I4/
     2 12H  IFREQ  = , I4/ 12H  JMAX   = , I4/
     3 12H  XMAX   = , F10.5/ 12H  YMAX   = , F10.5/
     4 12H  PR     = , F10.5/ 12H  DTAU   = , F10.5/
     5 13H  TAUMAX = , F9.5)
  201 FORMAT (25H0 AT A TIME TAU       = , F8.3)
  202 FORMAT (45H0 THE FIELD OF VELOCITY COMPONENT U IS/ 1H )
  203 FORMAT (45H0 THE FIELD OF VELOCITY COMPONENT V IS/ 1H )
  204 FORMAT (33H0 THE TEMPERATURE FIELD IS/ 1H )
  205 FORMAT (4H    , 11F8.3)
C
      END

Data
M      = 10        N      = 10        IFREQ  = 20
JMAX   = 10        XMAX   = 100.0     YMAX   = 25.0
PR     = 0.733     DTAU   = 0.5       TAUMAX = 80.05

Computer Output
FREE CONVECTION AT A VERTICAL PLATE, WITH PARAMETERS

M      =   10
N      =   10
IFREQ  =   20
JMAX   =   10
XMAX   =   100.00000
YMAX   =   25.00000
PR     =   0.73300
DTAU   =   0.50000
TAUMAX =   80.04999

AT A TIME TAU = 80.000

THE FIELD OF VELOCITY COMPONENT U IS

THE FIELD OF VELOCITY COMPONENT V IS

THE TEMPERATURE FIELD IS
Discussion of Results
The velocity and temperature fields are reproduced above only for τ = 80. An examination of the complete printout, for τ = 10, 20, ..., 80, shows little change in U, V, and T after τ = 40 has been reached. Thus the results for τ = 80 are essentially steady-state values. The computations are for a 10 × 10 grid; the value Pr = 0.733 is that for air, chosen by several previous investigators. In each case, the first column of figures corresponds to the vertical plate, and the extreme right-hand column corresponds to Y = 25. All the values of V are negative, indicating that the horizontal velocity component is always directed towards the plate. The vertical velocity component U reaches a maximum a little way out from the plate, except near the bottom of the plate, and then decreases. The temperature T decreases monotonically away from the plate.

We can compare these results with those computed by Hellums [11] on a 39 × 39 grid with Δτ = 0.1; his values agree almost perfectly with the analytical solution to the problem given by Ostrach [15]. For X = 100, representative steady-state values are given in Table 7.5.1; Hellums' results have been interpolated to correspond to the present grid.

The accuracy of the present results may be described as good, fair, or poor, depending on whether one is interested in the values of T, U, or V. However, note that the number of computations involved here (10 × 10 grid, Δτ = 0.5) is roughly only one seventieth of the number required for a 39 × 39 grid with Δτ = 0.1.

Note: it may easily be verified by substitution that the new dependent variables U/X^(1/2) and V X^(1/4), together with T, are functions of just two new independent variables, namely, Y/X^(1/4) and τ/X^(1/2). Thus the above results may be condensed very considerably, if required.
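The explicit scheme used in the program can also be sketched in a modern language. The following Python rendering (variable names and the step count are ours, not the book's) applies the same three update formulas — energy, X-momentum with the buoyancy term, and continuity — on the 10 × 10 grid with Pr = 0.733 and Δτ = 0.5 from the data set:

```python
# Python sketch of the explicit finite-difference scheme of Example 7.5.
M, N = 10, 10                 # grid intervals in the X and Y directions
XMAX, YMAX = 100.0, 25.0
PR, DTAU = 0.733, 0.5         # Prandtl number and time-step, from the data
DX, DY = XMAX / M, YMAX / N

def zeros():
    return [[0.0] * (N + 1) for _ in range(M + 1)]

U, V, T = zeros(), zeros(), zeros()
for i in range(M + 1):
    T[i][0] = 1.0             # the plate (j = 0) is held at T = 1

def step(U, V, T):
    """One explicit time-step: energy, then X-momentum, then continuity."""
    TN, UN, VN = zeros(), zeros(), zeros()
    for i in range(M + 1):
        TN[i][0] = 1.0        # carry the plate boundary value forward
    for i in range(1, M + 1):
        for j in range(1, N):
            TN[i][j] = T[i][j] + DTAU * (
                (T[i][j + 1] - 2.0 * T[i][j] + T[i][j - 1]) / (DY * DY * PR)
                - U[i][j] * (T[i][j] - T[i - 1][j]) / DX
                - V[i][j] * (T[i][j + 1] - T[i][j]) / DY)
    for i in range(1, M + 1):
        for j in range(1, N):
            UN[i][j] = U[i][j] + DTAU * (
                (U[i][j + 1] - 2.0 * U[i][j] + U[i][j - 1]) / (DY * DY)
                - U[i][j] * (U[i][j] - U[i - 1][j]) / DX
                - V[i][j] * (U[i][j + 1] - U[i][j]) / DY
                + TN[i][j])                        # buoyancy term
    for i in range(1, M + 1):
        for j in range(1, N + 1):                  # continuity, integrated from the wall
            VN[i][j] = VN[i][j - 1] + (DY / DX) * (UN[i - 1][j] - UN[i][j])
    return UN, VN, TN

for _ in range(20):           # advance to tau = 10
    U, V, T = step(U, V, T)
```

Heat diffuses outward from the plate, so after a number of steps the temperature one mesh point from the plate exceeds that farther out, as in the printed fields.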

Table 7.5.1 Comparison of Steady-State Results for 10 x 10 Grid with Analytical Solution

                                  Y     0     2.5     5.0     7.5     10.0    12.5    15.0

Analytical solution (same         U     0     4.65    5.29    4.15    2.78    1.65    0.92
as 39 x 39 grid)                  V     0    -0.030  -0.097  -0.177  -0.251  -0.305  -0.342
                                  T     1     0.713   0.461   0.270   0.149   0.077   0.040

Results computed for              U     0     4.47    5.03    4.08    2.87    1.86    1.13
10 x 10 grid                      V     0    -0.051  -0.130  -0.210  -0.276  -0.325  -0.358
                                  T     1     0.705   0.451   0.270   0.156   0.088   0.049

7.21 Derivation of the Elliptic Difference Equation

Let R be a finite connected region in the xy plane, and let S denote the boundary of R (see Fig. 7.12).

Figure 7.12 Region and boundary.

We seek to determine a numerical approximation v of the solution u(x,y) of the boundary-value problem (7.70), (7.71), where k(x,y) > 0 and C(x,y) ≥ 0 in R. An equation such as (7.70) would arise for steady-state heat conduction in a flat plate, whose thermal conductivity k depends on position, with an internal heat source (represented by f) and a heat-transfer effect at the surface of the plate (represented by C). Observe that the elliptic problem is time-independent.

We introduce a rectangular grid on the region R, so that every point (i,j) has four neighbors, as shown in Fig. 7.13.

Figure 7.13 Rectangular grid.

It is a matter of convenience, but not of necessity, to make the grid spacings equal in the x and y directions. In the development that follows, we will assume that Δx = Δy, unless it is stated otherwise. A point (i,j) of the grid is said to be an interior point if its four neighbors lie within R or on S, but not outside S. Other points lying within R or on S, but which are not interior points, are called boundary points (see Fig. 7.14).

Figure 7.14 Interior (●) and boundary (○) points.

We then approximate the PDE of (7.70) at each interior point of the grid by the difference equation (7.72), where k_{i,j} = k(i Δx, j Δy) and C_{i,j} = C(i Δx, j Δy). At each boundary point of the grid, we set v equal to a prescribed boundary value g, equation (7.73), taken at any point on S within a distance Δx of the boundary point (i,j).

In the general case, we order the points on the grid by assigning to each in succession an ordinal number k, and indicating it by P_k; we redesignate the value of v at this point by v_k. If the total number of interior points of the grid in R is n, it is convenient to number the points so that P_k is an interior point if and only if 1 ≤ k ≤ n. We then eliminate from the finite-difference equations the values of v at the boundary points by using the values prescribed in (7.73). We are then left with a system of n linear equations in the n values v_k, k = 1, 2, ..., n, for the approximate function at the interior points. By defining the vector (7.74), we can write this system of equations in the form (7.75), Av = b, where A is an n × n matrix whose entries are determined by the coefficients of the difference equation, and where the vector b is determined by the prescribed boundary values and the known terms in the differential equation. We are then left with the problem of solving (7.75) for v.
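To make the assembly of the system Av = b concrete, here is a small sketch (all names ours) that builds A and b for the 5-point Laplace difference equation on a grid with M_x = 4, M_y = 3 — the six interior points of Fig. 7.15 in the next section — and solves the system directly by Gaussian elimination. The boundary data are taken from x² − y², which the 5-point stencil reproduces exactly:

```python
# Assemble A v = b for the 5-point Laplacian and solve it directly.
MX, MY = 4, 3

def g(i, j):
    return float(i * i - j * j)      # x^2 - y^2: discretely harmonic test data

interior = [(i, j) for j in range(1, MY) for i in range(1, MX)]
index = {p: k for k, p in enumerate(interior)}
n = len(interior)                    # six interior points, as in Fig. 7.15

A = [[0.0] * n for _ in range(n)]
b = [0.0] * n
for (i, j), k in index.items():
    A[k][k] = 4.0                    # central coefficient
    for nb in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
        if nb in index:
            A[k][index[nb]] = -1.0   # unknown neighbour stays on the left
        else:
            b[k] += g(*nb)           # known boundary value moves to b

def gauss_solve(A, b):
    """Gaussian elimination with partial pivoting (on copies)."""
    m = len(b)
    A = [row[:] for row in A]
    b = b[:]
    for c in range(m):
        p = max(range(c, m), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        b[c], b[p] = b[p], b[c]
        for r in range(c + 1, m):
            f = A[r][c] / A[c][c]
            for cc in range(c, m):
                A[r][cc] -= f * A[c][cc]
            b[r] -= f * b[c]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        s = sum(A[r][c] * x[c] for c in range(r + 1, m))
        x[r] = (b[r] - s) / A[r][r]
    return x

v = gauss_solve(A, b)
```

Note that A comes out symmetric, and at most four of the n entries in any row are nonzero off the diagonal — the properties discussed in the following sections.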

7.22 Laplace's Equation in a Rectangle Or, written in the form Av = b,


As a special case, consider the problem of (7.70) and
(7.71) in the rectangular region R: 0 < x < a, 0 < y <P
(for simplicity, a and /? are assumed to be in the ratio of
two moderately small integers). We have:

Observe that the coefficient matrix A is symmetric; note


Then, with integers M, and M y chosen so that
also that in any row of A, the sum of the absolute values
of the nondiagonal elements is less than the absolute
value of the diagonal element. With larger choices of
M , and M,,, we would have certain interior points (in the
we have the finite-difference problem rectangle of Fig. 7.15) none of whose neighbors were
boundary points. In such cases, certain rows of the matrix
A would have the sum of the absolute values of the non-
diagonal elements equal to the absolute value of the
diagonal element; however, the former inequality would
still be preserved for certain other rows. The latter is an
important property which, as we shall see shortly, helps
to ensure the convergence of various iterative methods
of solution.
The preceding discussion dealt with the theory of the
with method. In actual computation, to facilitate program-
O0.j 'g 0 , j s V M , , ~= g ~ , , j , ming, it is probable that both subscripts ( i , j ) would be
(7.80)
retained on v. Also, the complete matrix A, consisting as
Vi.0 = g i , ~ , v i . =
~ Q~I , M , .
it does of n2 elements, is most unlikely to be stored in
Suppose that M, = 4 and M y= 3, corresponding to the its entirety in the computer. We shall soon see why this
grid of Fig. 7.15. The points numbered 1 through 6 are is so. Finally, note that a matrix, formed as A is formed,
is always positive definite.
7.23 Alternative Treatment of the Boundary Points
Let us now look more closely at the boundary con-
dition (7.73). It presents no complications for the simple
case shown in Fig. 7.1 5, in which certain grid points are
arranged to lie on the boundary. Fig. 7.16, however, shows
a typical situation involving a curved boundary S, on
which u is assumed already specified. Point 1is a boundary
point, and condition (7.73) states that the value of v ,
shall be equated to some value of g anywhere on S between
C and D (that is, within a distance Ax of point 1).
Figure 7.15 Grid points for u rectangle. Although simple, this is clearly not the most satisfactory
of arrangemefits, and it is termed interpolation of degree
interior points; those numbered 7 through 16 are boundary zero.
points, at which the values of v are already known from If better accuracy is required, the proccdurc rccom-
equation (7.80). The corresponding system of equations, mended is that of interpolation of degree two, considered
multiplied by a factor of four to remove fractions, is earlier for parabolic equations on page 463.
The Laplacian at point 1 is approximated by

Thus, for the problem of solving Laplace's equation in a region with an irregular boundary, a finite-difference equation of this kind holds at point 1 (which may be regarded as a typical boundary point). The values of v (such as v₁) at the boundary points are unknown; therefore, they must be incorporated into the matrix equation Av = b as if they were values of v at the interior points. In this case, the symmetry of the coefficient matrix will almost certainly be destroyed, but the methods of solution will, in general, not be invalidated.

Example. As a very simple illustration, it can be shown that equation (7.84) gives the finite-difference approximation of u_xx + u_yy = 0 in the region of Fig. 7.17.

Figure 7.17 System of grid points leading to equation (7.84).

7.24 Iterative Methods of Solution

So far, we have formulated the numerical approximation of the boundary-value problem with the elliptic differential equation as the solution v of the linear algebraic system (7.75). The solution of this system exists and is unique if and only if A is nonsingular, in which case v = A⁻¹b. Unfortunately, even though we may know that the matrix is nonsingular, the size of the system is usually so large that the derivation of the inverse by any direct method requires an excessive amount of calculation, quite apart from considerations of round-off error. It is, therefore, more convenient, in this case, to employ a method of iteration or relaxation. Two such methods were discussed in Chapter 5: the Jacobi (total-step or simultaneous-displacement) and Gauss-Seidel (single-step or successive-displacement) iteration procedures. In the solution of elliptic difference equations, the Jacobi method is a special case of the Richardson method [9], and the Gauss-Seidel method is also known as the Liebmann method.

Basically, each method involves transferring all the off-diagonal terms to the right-hand side, so that the ith equation is solved for v_i in terms of the remaining unknowns. If approximations v to the exact solution of (7.75) are known, then new approximations v' can be computed from the resulting formula (7.86). This procedure is repeated until the computed values show little further change. In the Jacobi method, the old values v are always used in the right-hand side of (7.86) until a complete set of new values v' has been computed. In the Gauss-Seidel method, however, the most recently available values of v are always used in the right-hand side of (7.86).

Sufficient conditions for the convergence of both methods are given in Sections 5.6 and 5.7; see also Ralston and Wilf [21]. Convergence is guaranteed if the coefficient matrix A of (7.75) satisfies:

1. A is an irreducible matrix, as explained on p. 299.

2. In each row i of A,

    Σ_{j=1, j≠i}^{n} |a_{ij}| ≤ |a_{ii}|,

with strict inequality holding for at least one value of i.

It can be shown that the coefficient matrices that arise in the finite-difference approximations (7.72), (7.73) and (7.79), (7.80) for the boundary-value problem with the elliptic PDE always satisfy these conditions. The final solution is independent of the choice of starting values, although a poor initial estimate will require more iterations for convergence. In the solution of Laplace's equation, the Gauss-Seidel method converges more rapidly than the Jacobi method and will be preferred. In Section 7.25, we also consider other methods for accelerating the rate of convergence to the solution of (7.75).

In practice, we shall probably be solving upwards of one hundred simultaneous equations, but since five terms are the most that will appear on the left-hand side of each equation, the coefficient matrix A will contain a very high proportion of zero elements. Consequently, attempts to work entirely in terms of matrices and, in particular, to store all their elements are extremely wasteful of computer storage. Rather, the Jacobi and Gauss-Seidel iterative methods are effected in practice by the repeated application of a single algebraic equation throughout the whole system of grid points. Reverting to double space-subscripts, this amounts to computing a new approximation v_{i,j} at every interior grid point from a simple local formula; for Laplace's equation it reads

    v_{i,j} = (1/4)(v_{i+1,j} + v_{i-1,j} + v_{i,j+1} + v_{i,j-1}).     (7.87)
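The relative behavior of the two iterations is easy to demonstrate. The sketch below (names and stopping rule ours) applies both Jacobi and Gauss-Seidel sweeps of formula (7.87) to the heated-plate problem of the next example, and counts the sweeps needed to reach a fixed tolerance:

```python
# Jacobi vs. Gauss-Seidel sweeps of (7.87) for Laplace's equation on a
# square plate: one side held at 100, the other three at 0.
N = 8               # grid intervals per side, as in Example 7.6
TOL = 1.0e-4

def new_field():
    t = [[0.0] * (N + 1) for _ in range(N + 1)]
    for i in range(N + 1):
        t[i][0] = 100.0          # the heated side
    return t

def sweep_gauss_seidel(t):
    """One sweep of (7.87), reusing new values immediately; returns eps."""
    eps = 0.0
    for i in range(1, N):
        for j in range(1, N):
            old = t[i][j]
            t[i][j] = 0.25 * (t[i][j + 1] + t[i][j - 1] + t[i + 1][j] + t[i - 1][j])
            eps += abs(t[i][j] - old)
    return eps

def sweep_jacobi(t):
    """One sweep of (7.87) using only the previous sweep's values."""
    new = [row[:] for row in t]
    eps = 0.0
    for i in range(1, N):
        for j in range(1, N):
            new[i][j] = 0.25 * (t[i][j + 1] + t[i][j - 1] + t[i + 1][j] + t[i - 1][j])
            eps += abs(new[i][j] - t[i][j])
    t[:] = new
    return eps

def sweeps_to_converge(sweep):
    t = new_field()
    for k in range(1, 2001):
        if sweep(t) <= TOL:
            return k, t
    return 2000, t

gs_sweeps, t_gs = sweeps_to_converge(sweep_gauss_seidel)
jac_sweeps, _ = sweeps_to_converge(sweep_jacobi)
```

For this grid and tolerance, Gauss-Seidel finishes in on the order of the 88 sweeps reported in Example 7.6's output, while Jacobi needs roughly twice as many; by symmetry the converged center value is very close to 25.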
EXAMPLE 7.6
STEADY-STATE HEAT CONDUCTION IN A SQUARE PLATE

Problem Statement
Find the steady-state temperature distribution in a square plate, one side of which is maintained at 100°, with the other three sides maintained at 0°, as shown in Fig. 7.6.1.

Figure 7.6.1 Heat conduction in a square plate.

Method of Solution
The problem is that of solving Laplace's equation in a rectangle (see Section 7.22). If each side of the square is divided into n increments Δx (= Δy), the problem involves the solution of a system of (n − 1)² simultaneous equations (similar to (7.82)), with the temperatures at the interior grid points as the unknowns. The method of solution is to iterate through all the grid points, calculating a better approximation to the temperature at each point (i, j) in turn from equation (7.87). As soon as a new value of T is calculated at a point, its previous value is discarded. This is the Gauss-Seidel method of iteration (see Section 7.24). To start, a temperature of 0° is assumed everywhere within the plate. The process of iteration through all grid points is repeated until further iterations would produce, it is hoped, very little change in the computed temperatures. The following program stops if the sum ε, over all grid points, of the absolute values of the deviations of the temperatures from their previously computed values falls below a small quantity ε_max. Computations will also be discontinued if the number of complete iterations, k, exceeds an upper limit, k_max.
Example 7.6 Steady-State Heat Conduction in a Square Plate

Flow Diagram
(Flowchart not reproduced. In outline: set T_{i,0} = 100, i = 0, 1, ..., n, and T_{i,j} = 0 elsewhere; sweep the Gauss-Seidel formula (7.87) over all interior points, accumulating ε; if ε ≤ ε_max, print the temperature field T_{i,j}, i, j = 0, 1, ..., n; if instead the iteration count k exceeds k_max, print a "No Convergence" message and the current field.)
FORTRAN Implementation
List of Principal Variables
Program Symbol   Definition
EPS        Sum, ε, of absolute values of deviations of temperatures from their previously computed values.
EPSMAX     The largest tolerable value, ε_max, of ε.
HOLDT      Temporary storage, T, for temperature while new value is being computed.
I, J       Grid-point subscripts, i, j.
ITER       Iteration counter, k.
ITMAX      Upper limit, k_max, on number of iterations.
N          Number of increments, n, into which the side of the square is divided.
T          Matrix of temperatures, T.

Because of FORTRAN limitations, the subscripts in the text are advanced by one in the program; e.g., T_{0,0} through T_{n,n} become T(1,1) through T(N+1,N+1).

Program Listing
C     APPLIED NUMERICAL METHODS, EXAMPLE 7.6
C     STEADY STATE HEAT CONDUCTION IN A SQUARE PLATE.
C
C     THE LEFT-HAND SIDE IS AT 100 DEGREES AND THE OTHER THREE SIDES ARE
C     AT 0 DEGREES.  THE PLATE HAS N GRID SPACINGS ALONG EACH SIDE.
C     A GAUSS-SEIDEL ITERATION IS EMPLOYED UNTIL EITHER THE SUM OF
C     THE ABSOLUTE VALUES OF THE DEVIATIONS OF THE TEMPERATURES FROM
C     THEIR LAST COMPUTED VALUES FALLS BELOW A SMALL QUANTITY EPSMAX,
C     OR, THE NUMBER OF ITERATIONS EXCEEDS ITMAX.
C
      DIMENSION T(11,11)
C
C     ..... READ AND CHECK INPUT PARAMETERS .....
    1 READ (5,100) N, ITMAX, EPSMAX
      WRITE (6,200) N, ITMAX, EPSMAX
C
C     ..... ESTABLISH INITIAL GUESSES FOR TEMPERATURE .....
      NP1 = N + 1
      DO 2 I = 1, NP1
      T(I,1) = 100.0
      DO 2 J = 2, NP1
    2 T(I,J) = 0.0
C
C     ..... CALCULATE SUCCESSIVELY BETTER APPROXIMATIONS FOR
C     TEMPERATURES AT ALL GRID POINTS, ITERATING UNTIL
C     SATISFACTORY CONVERGENCE IS ACHIEVED .....
      ITER = 0
    3 EPS = 0.0
      ITER = ITER + 1
      DO 4 I = 2, N
      DO 4 J = 2, N
      HOLDT = T(I,J)
      T(I,J) = (T(I,J+1) + T(I,J-1) + T(I+1,J) + T(I-1,J))/4.0
    4 EPS = EPS + ABS(T(I,J) - HOLDT)
C
C     ..... STOP ITERATIONS IF COMPUTED VALUES SHOW LITTLE FURTHER
C     CHANGE, OR IF NUMBER OF ITERATIONS IS TOO LARGE .....
      IF (EPS .LE. EPSMAX) GO TO 6
      IF (ITER - ITMAX) 3, 3, 8
C
C     ..... PRINT VALUES OF THE ITERATION COUNTER AND
C     THE FINAL TEMPERATURE FIELD .....
    6 WRITE (6,201) ITER
      DO 7 I = 1, NP1
    7 WRITE (6,202) (T(I,J), J = 1, NP1)
      GO TO 1
C
C     ..... COMMENT IN CASE ITER EXCEEDS ITMAX .....
    8 WRITE (6,203)
      DO 9 I = 1, NP1
    9 WRITE (6,202) (T(I,J), J = 1, NP1)
      GO TO 1
C
C     ..... FORMATS FOR INPUT AND OUTPUT STATEMENTS .....
  100 FORMAT (11X, I4, 11X, I4, 12X, E8.1)
  200 FORMAT (62H1STEADY-STATE HEAT CONDUCTION IN A FLAT PLATE, WITH PAR
     1AMETERS/ 10H0N      = , I5/ 10H ITMAX  = , I5/
     2 10H EPSMAX = , E12.2)
  201 FORMAT (46H0CONVERGENCE CONDITION HAS BEEN REACHED AFTER ,
     1 I3, 11H ITERATIONS/ 34H0THE TEMPERATURE FIELD IS GIVEN BY)
  202 FORMAT (1H0, 11F9.4)
  203 FORMAT (42H0NO CONVERGENCE.  CURRENT VALUES OF T ARE)
C
      END

Data
N = 8, ITMAX = 100, EPSMAX = 1.0E-4
N = 4, ITMAX = 50,  EPSMAX = 1.0E-5

Computer Output
Results for the 1st Data Set
STEADY-STATE HEAT CONDUCTION IN A FLAT PLATE, WITH PARAMETERS
N      =   8
ITMAX  =   100
EPSMAX =   0.10E-03

CONVERGENCE CONDITION HAS BEEN REACHED AFTER 88 ITERATIONS

THE TEMPERATURE FIELD IS GIVEN BY

Results for the 2nd Data Set

STEADY-STATE HEAT CONDUCTION IN A FLAT PLATE, WITH PARAMETERS
N      =   4
ITMAX  =   50
EPSMAX =   0.10E-04

CONVERGENCE CONDITION HAS BEEN REACHED AFTER 22 ITERATIONS

THE TEMPERATURE FIELD IS GIVEN BY

Discussion of Results
The printout shows the distribution of temperature in the plate for both 8 × 8 and 4 × 4 grids. Even with such coarse grids, there is fairly good agreement between the two cases.

The computed temperatures may be compared with the analytical solution given by Carslaw and Jaeger [2]:

    T(x, y) = (4 T₀/π) Σ_{n=0}^{∞} [1/(2n+1)] sin[(2n+1)πx/a] sinh[(2n+1)π(a − y)/a] cosech[(2n+1)π],

where a is the side of the square, T₀ (= 100) is the temperature of the side y = 0, and the coordinate system is shown in Fig. 7.6.1.
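This series converges very rapidly. A short sketch (names ours) that evaluates it — for instance at the center of the square, where symmetry requires T = 25 exactly:

```python
import math

def t_series(x, y, a=1.0, t0=100.0, terms=40):
    """Carslaw-and-Jaeger series for the square plate, with the side
    y = 0 held at t0 and the other three sides at 0."""
    s = 0.0
    for k in range(terms):
        m = 2 * k + 1
        s += (math.sin(m * math.pi * x / a) / m
              * math.sinh(m * math.pi * (a - y) / a) / math.sinh(m * math.pi))
    return 4.0 * t0 / math.pi * s

t_center = t_series(0.5, 0.5)   # symmetry requires exactly 25 here
```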
EXAMPLE 7.7
DEFLECTION OF A LOADED PLATE

Problem Statement
Consider a simply supported square plate of side L that is subjected to a load q per unit area, as shown in Fig. 7.7.1. The deflection w in the z-direction is the solution of the biharmonic equation

    ∇²(∇²w) = q/D,     (7.7.1)

subject to the boundary conditions w = 0 and ∂²w/∂η² = 0 along its four edges, where η denotes the normal to the boundary. D is the flexural rigidity of the plate, given by

    D = Et³/[12(1 − σ²)],

where E = Young's modulus, t = plate thickness, and σ = Poisson's ratio. For simplicity, suppose the loading is uniform, so that q is constant.

Write a program for computing the deflections w at a grid of points with n intervals along each side of the square.

Figure 7.7.1 Loaded square plate (uniform downward load q; simply supported edges).

Method of Solution
By introducing the variable u = ∇²w, the problem amounts to solving Poisson's equation twice in succession:

    ∇²u = q/D, with u = 0 along the four edges,     (7.7.3)
    ∇²w = u,   with w = 0 along the four edges.     (7.7.4)

For this purpose, we employ a subroutine, named POISON, that uses the Gauss-Seidel method to approximate the solution of Poisson's equation:

    ∂²φ/∂x² + ∂²φ/∂y² = ψ(x, y).     (7.7.5)

The finite-difference approximation of equation (7.7.5) leads, for Δx = Δy, to repeated application of the Gauss-Seidel formula

    φ_{i,j} = (1/4)[φ_{i−1,j} + φ_{i+1,j} + φ_{i,j−1} + φ_{i,j+1} − (Δx)² ψ_{i,j}]     (7.7.6)

at every interior grid point. The subroutine is written for k_max applications of (7.7.6) through all interior grid points. Applied to equation (7.7.3), the call would be of the form

    CALL POISON (N, L, PHI, PSI, KMAX)

Here, the matrix PSI would contain the known right-hand side values ψ, and the subroutine would compute and store the values of φ in the matrix PHI. By using appropriate arguments, the subroutine will solve equations (7.7.3) and (7.7.4) in two successive calls.

Flow Diagram
Main Program
(Flowchart not reproduced. In outline: read the data; set qoverd_{i,j} = q/D and u_{i,j} = w_{i,j} = 0 at all grid points; perform k_max Gauss-Seidel iterations for u on the n × n grid (subroutine POISON, first call), then k_max iterations for w (second call); print u and w.)

Subroutine POISON (Dummy arguments: n, L, φ, ψ, k_max;
1st calling arguments: n, L, u, qoverd, k_max;
2nd calling arguments: n, L, w, u, k_max)
(Flowchart not reproduced. In outline: for each of k_max iterations, sweep i = 1, 2, ..., n − 1, j = 1, 2, ..., n − 1, applying

    φ_{i,j} ← (1/4)[φ_{i−1,j} + φ_{i+1,j} + φ_{i,j−1} + φ_{i,j+1} − (L/n)² ψ_{i,j}].)
Example 7.7 Deflection of a Loaded Plate

FORTRAN Implementation
List of Principal Variables
Program Symbol   Definition
(Main)
D          Flexural rigidity, D, lbf in.
E          Young's modulus, E, lbf/sq in.
I, J       Grid-point subscripts, i, j. Because of FORTRAN limitations, we have 1 ≤ I ≤ N + 1, corresponding to 0 ≤ i ≤ n, and similarly for J.
ITMAX      Number of Gauss-Seidel iterations, k_max.
L          Length of side of square, L, in.
N          Number, n, of grid spacings along a side of the square.
POISON     Subroutine for solving Poisson's equation by the Gauss-Seidel method.
Q          Load per unit area of the plate, q, lbf/sq in.
QOVERD     Matrix, with values of qoverd = q/D at each grid point.
SIGMA      Poisson's ratio, σ.
T          Plate thickness, t, in.
U          Matrix of intermediate variable, u = ∇²w, at each grid point.
W          Matrix of downwards deflection, w, in., at each grid point.
(Subroutine POISON)
ITER       Iteration counter, k.
PHI, PSI   Matrices of functions φ and ψ, occurring in Poisson's equation, ∇²φ = ψ.

Program Listing
Main Program
C     APPLIED NUMERICAL METHODS, EXAMPLE 7.7
C     DEFLECTION OF A UNIFORMLY-LOADED SQUARE PLATE.
C
C     THE BIHARMONIC EQUATION, DELSQ(DELSQ(W)) = Q/D,
C     IS SOLVED IN ORDER TO FIND THE DOWNWARDS DEFLECTION W
C     THROUGHOUT A SIMPLY-SUPPORTED SQUARE PLATE OF SIDE L
C     AND FLEXURAL RIGIDITY D SUBJECTED TO A UNIFORM LOADING Q
C     PER UNIT AREA.
C
      REAL L
      DIMENSION U(11,11), W(11,11), QOVERD(11,11)
C
C     ..... READ AND PRINT DATA AND SET INITIAL ESTIMATES .....
    1 WRITE (6,200)
      READ (5,100) N, ITMAX, Q, T, SIGMA, L, E
      WRITE (6,201) N, ITMAX, Q, T, SIGMA, L, E
      NP1 = N + 1
      D = E*T**3/(12.0*(1.0 - SIGMA**2))
      RHS = Q/D
      DO 2 I = 1, NP1
      DO 2 J = 1, NP1
      W(I,J) = 0.0
      U(I,J) = 0.0
    2 QOVERD(I,J) = RHS
C
C     ..... SOLVE DELSQ(U) = Q/D .....
      CALL POISON (N, L, U, QOVERD, ITMAX)
C
C     ..... SOLVE DELSQ(W) = U .....
      CALL POISON (N, L, W, U, ITMAX)
C
C     ..... PRINT INTERMEDIATE U'S AND THE DEFLECTIONS W .....
      WRITE (6,202)
      DO 3 I = 1, NP1
    3 WRITE (6,203) (U(I,J), J = 1, NP1)
      WRITE (6,204)
      DO 4 I = 1, NP1
    4 WRITE (6,203) (W(I,J), J = 1, NP1)
      GO TO 1
C
C     ..... FORMATS FOR INPUT AND OUTPUT STATEMENTS .....
  100 FORMAT (3X, I2, 9X, I4, 2(6X, F4.1), 10X, F4.1, 6X, F5.1, 6X,
     1 E7.1)
  200 FORMAT (87H1 SOLUTION OF BIHARMONIC EQUATION FOR DEFLECTION O
     1F A UNIFORMLY-LOADED SQUARE PLATE)
  201 FORMAT (1H0, 10X, 8HN      = , I4/ 11X, 8HITMAX  = , I4/
     1 11X, 8HQ      = , F10.5/ 11X, 8HT      = , F10.5/
     2 11X, 8HSIGMA  = , F10.5/ 11X, 8HL      = , F10.5/
     3 11X, 8HE      = , E11.2)
  202 FORMAT (35H0 THE INTERMEDIATE VALUES U ARE)
  203 FORMAT (1H0, 4X, 9F10.6)
  204 FORMAT (33H0 THE FINAL DEFLECTIONS W ARE)
C
      END

Subroutine POISON
C     SUBROUTINE FOR SOLVING POISSON'S EQUATION,
C     DELSQ(PHI) = PSI(X,Y),
C     IN A SQUARE OF SIDE L, ALONG WHICH THERE ARE N INCREMENTS.
C     ITMAX GAUSS-SEIDEL ITERATIONS ARE PERFORMED AT ALL GRID POINTS.
C
      SUBROUTINE POISON (N, L, PHI, PSI, ITMAX)
      REAL L
      DIMENSION PHI(11,11), PSI(11,11)
      FN = N
      DXSQ = (L/FN)**2
      DO 1 ITER = 1, ITMAX
      DO 1 I = 2, N
      DO 1 J = 2, N
    1 PHI(I,J) = 0.25*(PHI(I-1,J) + PHI(I+1,J) + PHI(I,J-1)
     1  + PHI(I,J+1) - DXSQ*PSI(I,J))
      RETURN
C
      END

Data
N = 4,  ITMAX =   5,  Q = 1.0,  T = 0.2,  SIGMA = 0.3,  L = 40.0,  E = 30.0E6
N = 4,  ITMAX =  25,  Q = 1.0,  T = 0.2,  SIGMA = 0.3,  L = 40.0,  E = 30.0E6
N = 8,  ITMAX =  25,  Q = 1.0,  T = 0.2,  SIGMA = 0.3,  L = 40.0,  E = 30.0E6
N = 8,  ITMAX = 100,  Q = 1.0,  T = 0.2,  SIGMA = 0.3,  L = 40.0,  E = 30.0E6
Computer Output
Results for the 2nd Data Set
SOLUTION OF BIHARMONIC EQUATION FOR DEFLECTION OF A UNIFORMLY-LOADED SQUARE PLATE

N      =  4
ITMAX  =  25
Q      =  1.00000
T      =  0.20000
SIGMA  =  0.30000
L      =  40.00000
E      =  0.30E 08

THE INTERMEDIATE VALUES U ARE

THE FINAL DEFLECTIONS W ARE

Computer Output (Continued)


Results for the 4th Data Set
SOLUTION OF BIHARMONIC EQUATION FOR DEFLECTION OF A UNIFORMLY-LOADED SQUARE PLATE

N = 8
ITMAX = 100
Q = 1.00000
T = 0.20000
SIGMA = 0.30000
L = 40.00000
E = 0.30E 08
THE INTERMEDIATE VALUES U ARE
0.0   0.0        0.0        0.0        0.0        0.0        0.0        0.0        0.0
0.0  -0.001294  -0.002020  -0.002396  -0.002513  -0.002396  -0.002020  -0.001294   0.0
0.0  -0.002020  -0.003251  -0.003914  -0.004123  -0.003914  -0.003251  -0.002020   0.0
0.0  -0.002396  -0.003914  -0.004749  -0.005014  -0.004749  -0.003914  -0.002396   0.0
0.0  -0.002513  -0.004123  -0.005014  -0.005299  -0.005014  -0.004123  -0.002513   0.0
0.0  -0.002396  -0.003914  -0.004749  -0.005014  -0.004749  -0.003914  -0.002396   0.0
0.0  -0.002020  -0.003251  -0.003914  -0.004123  -0.003914  -0.003251  -0.002020   0.0
0.0  -0.001294  -0.002020  -0.002396  -0.002513  -0.002396  -0.002020  -0.001294   0.0
0.0   0.0        0.0        0.0        0.0        0.0        0.0        0.0        0.0
THE FINAL DEFLECTIONS W ARE

0.0   0.0        0.0        0.0        0.0        0.0        0.0        0.0        0.0
0.0   0.077190   0.138200   0.176497   0.189490   0.176497   0.138200   0.077190   0.0
0.0   0.138200   0.248618   0.318392   0.342133   0.318392   0.248618   0.138200   0.0
0.0   0.176497   0.318392   0.408462   0.439175   0.408461   0.318391   0.176497   0.0
0.0   0.189490   0.342133   0.439175   0.472290   0.439175   0.342133   0.189490   0.0
0.0   0.176497   0.318392   0.408462   0.439175   0.408461   0.318391   0.176497   0.0
0.0   0.138200   0.248618   0.318392   0.342133   0.318392   0.248618   0.138200   0.0
0.0   0.077190   0.138200   0.176497   0.189490   0.176497   0.138200   0.077190   0.0
0.0   0.0        0.0        0.0        0.0        0.0        0.0        0.0        0.0
Discussion of Results
The printout is shown only for the 2nd and 4th data sets, with n = 4 and n = 8. Judging by the symmetry of the computed values, convergence is probably complete to the number of significant figures printed (after 25 and 100 iterations, respectively). There is also little change in passing from n = 4 to n = 8, which suggests that the computed values are good approximations to the exact solution. The 1st and 3rd data sets, again for n = 4 and n = 8, but with only 5 and 25 iterations, respectively, did not give symmetrical values, indicating only partial convergence.

The above method could be extended to the case of a nonuniformly loaded plate, by inserting the appropriate local values in the matrix qoverd.
EXAMPLE 7.8
TORSION WITH CURVED BOUNDARY

Introduction
A steady torque T is applied to a cylindrical shaft that is clamped at one end. The cross section of the shaft is an ellipse with major and minor semiaxes α and β, shown in Fig. 7.8.1. If the modulus of rigidity of the shaft is G, then the angle of twist θ, per unit length, is given by Hughes and Gaylord [33] as equation (7.8.1), where the integral is taken over the whole cross-sectional area, and ψ is the solution of

    ∇²ψ = −2,     (7.8.2)

with

    ψ = 0 on the surface of the ellipse.

Since the analytical solution of (7.8.2) is given by (7.8.3), it can be shown from (7.8.1) that the angle of twist per unit length is given by (7.8.4).

Figure 7.8.1 Torsion of an elliptical cylinder.

Problem Statement
Write a program that will perform a finite-difference solution of (7.8.2) and compare the computed results with the analytical solution given by (7.8.3). The purpose of this problem is primarily to illustrate the technique of interpolation of degree two at a curved boundary (see Section 7.23). Use a square grid, and assume that Δx = α/m = β/n, where m and n are integers.

Method of Solution
The general approach is to perform successive Gauss-Seidel iterations over all grid points. Because of symmetry, we shall consider only one quadrant of the ellipse, shown in Fig. 7.8.2.

Figure 7.8.2 Quadrant of ellipse.

The grid points within the ellipse are subdivided into five main categories:

1. Interior points. The appropriate finite-difference approximation of (7.8.2) at the interior points is

    ψ_{i,j} = (1/4)[ψ_{i−1,j} + ψ_{i+1,j} + ψ_{i,j−1} + ψ_{i,j+1} + 2(Δx)²].

2. Lower boundary points (y = 0). Because of symmetry about the x axis, ψ_{i,−1} = ψ_{i,1}.

3. Left-hand boundary points (x = 0). Because of symmetry about the y axis, ψ_{−1,j} = ψ_{1,j}.

4. Corner point. At the center of the ellipse,

    ψ_{0,0} = (1/4)[2ψ_{1,0} + 2ψ_{0,1} + 2(Δx)²].

5. Points near the curved boundary. These boundary points, indicated in Fig. 7.8.2 by the symbols ○, □, and △, require individual treatment, as described below.

The key to a detailed examination of the boundary points mentioned in (5) above is to find jmax_i, the column subscript of the rightmost grid point in each row i (= 0, 1, ..., m). Since the equation of the ellipse is

    x²/α² + y²/β² = 1,
Example 7.8 Torsion with Curved Boundary

Type I Type I1 Type I11 Type IV


c<a<l t <a<1 a= 1 o<lal St
t<b<l b= l e<b<l 0slbl<t
Figure 7.8.3 Types of boundary point.

As illustrated in Fig. 7.8.3, each boundary point in
row i can now be placed into one of the following four
categories:

Type I: Point (i, jmax_i), if this is not of type III or IV.
Type II: Points (i,j), if any, for which jmax_{i+1} < j < jmax_i.
Type III: Point (i, jmax_i), only if it is not of type IV and
if jmax_{i+1} = jmax_i.
Type IV: Occurs if the curved boundary produces an
intercept within ±ε of a grid point, where ε ≪ 1.

The symbols ○, □, and △, in Figs. 7.8.2 and 7.8.3,
indicate points of types I, II, and III, respectively. Note
that for any row i, points of types I, III, and IV are
mutually exclusive.

Consider a point (i,j), possibly having unequal "arms"
to the right and above, as shown in Fig. 7.8.4: the nearest
boundary points are A, at a distance aΔx above, and B, at
a distance bΔx to the right, while (i-1,j) and (i,j-1) are
ordinary grid points.

Figure 7.8.4  Point (i,j) and neighbors.

It follows that (7.69), the finite-difference approximation
of (7.8.2) at such a point, is

    (2/(Δx)²) [ψ_{i,j-1}/(b+1) + ψ_{i-1,j}/(a+1) + ψ_A/(a(a+1))
               + ψ_B/(b(b+1)) - ((a+b)/(ab)) ψ_{i,j}] = -2.

That is,

    ψ_{i,j} = c[d ψ_{i,j-1} + e ψ_{i-1,j} + f ψ_A + g ψ_B + (Δx)²],    (7.8.5)

where

    c = ab/(a+b),   d = 1/(b+1),   e = 1/(a+1),
    f = 1/(a(a+1)),   g = 1/(b(b+1)).                                  (7.8.6)

Since ψ = 0 on the surface, we have the following special
cases of (7.8.5) at the various types of point:

    Type of Point (i,j)    ψ_A           ψ_B           a    b
    I                      0             0             a    b
    II                     0             ψ_{i,j+1}     a    1
    III                    ψ_{i+1,j}     0             1    b

For a point (i,j) of type IV, ψ_{i,j} = 0. In particular,

    ψ_{m,0} = ψ_{0,n} = 0.

For computing purposes, we must also note that the
vertical and horizontal arms extending from any point
(i,j) to the curved boundary are such that

    a = m(1 - j²/n²)^(1/2) - i,    b = n(1 - i²/m²)^(1/2) - j.
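The coefficients (7.8.6) can be checked numerically. The following Python sketch (illustrative; not part of the original FORTRAN) evaluates c, d, e, f, and g from the fractional arms a and b, and confirms that the formula collapses to the ordinary five-point star when a = b = 1, and that the type II (b = 1) and type III (a = 1) special cases give the values 0.5 used in subroutine BNDPTS:

```python
def irregular_star_coeffs(a, b):
    """Coefficients of equation (7.8.6) for a point whose vertical arm is
    a*dx (to boundary point A) and horizontal arm is b*dx (to point B)."""
    c = a * b / (a + b)
    d = 1.0 / (b + 1.0)          # multiplies psi(i, j-1)
    e = 1.0 / (a + 1.0)          # multiplies psi(i-1, j)
    f = 1.0 / (a * (a + 1.0))    # multiplies psi_A
    g = 1.0 / (b * (b + 1.0))    # multiplies psi_B
    return c, d, e, f, g

# a = b = 1 reduces (7.8.5) to psi = (sum of 4 neighbors + 2*dx**2)/4:
assert irregular_star_coeffs(1.0, 1.0) == (0.5, 0.5, 0.5, 0.5, 0.5)
```
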


Once the boundary points have been classified (in the
program, this is achieved by the subroutine BNDPTS), the
Gauss-Seidel iteration technique is employed over all
grid points, using the particular formula appropriate
500 Approximation of the Solution of Partial Differential Equations

to each point. The iterations are discontinued when the
sum ε, over all grid points, of the absolute values of the
deviations of ψ from its previous values falls below some
small quantity ε_max. An upper limit is also placed on the
number of iterations.

Finally, the function COMPAR is called to determine
the RMS (root mean square) deviation of ψ from its known
analytical solution, (7.8.3).

We have not attempted to write a concise flow diagram
for this example. Instead, the overall approach can be
understood best by reading the above description in
conjunction with the program comment cards.
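The iteration-and-stopping logic can be illustrated on a simpler region. The Python sketch below (a plain square with ψ = 0 on all four sides, rather than the quarter ellipse of this example) applies Gauss-Seidel to ∇²ψ = -2 and stops when the sum ε of the absolute changes falls below ε_max, mirroring the EPS/EPSMAX test in the program:

```python
def gauss_seidel_poisson(npts, dx, eps_max=1.0e-6, itmax=5000):
    """Gauss-Seidel for psi_xx + psi_yy = -2 on a square grid of
    npts x npts points with psi = 0 on the boundary."""
    psi = [[0.0] * npts for _ in range(npts)]
    for it in range(1, itmax + 1):
        eps = 0.0
        for i in range(1, npts - 1):
            for j in range(1, npts - 1):
                new = 0.25 * (psi[i - 1][j] + psi[i + 1][j]
                              + psi[i][j - 1] + psi[i][j + 1]
                              + 2.0 * dx * dx)
                eps += abs(new - psi[i][j])
                psi[i][j] = new
        if eps < eps_max:          # convergence criterion, as in the text
            return psi, it
    return psi, itmax

psi, iters = gauss_seidel_poisson(11, 0.1)   # unit square, dx = 0.1
```

For the unit square the converged center value is near 0.147, the known maximum of the torsion function for a square cross section.
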

FORTRAN Implementation
List of Principal Variables
Program Symbol Definition

(Main)
ALPHA, BETA    Semiaxes of the ellipse, α and β.
BNDPTS         Subroutine for locating the boundary points and their associated intercepts.
C, D, E, F, G  Expressions defined in equation (7.8.6).
COMPAR         Function for determining the RMS deviation, RMSDEV, of the computed values ψ_ij from
               their known exact values.
DX             Space increment, Δx = Δy.
EPS            Sum ε of the absolute values of the deviations of the ψ_ij from their previously computed values.
I, J           Grid-point subscripts, i, j. Because of FORTRAN limitations, we have 1 ≤ I ≤ M + 1 and
               1 ≤ J ≤ N + 1 in the program, corresponding to 0 ≤ i ≤ m and 0 ≤ j ≤ n in the text.
ITER           Counter on the number of Gauss-Seidel iterations.
ITMAX          Maximum allowable value of ITER.
ITYPE          Matrix giving the type of each boundary point (see Figs. 7.8.2 and 7.8.3).
JLOW, JHIGH    Vectors such that JLOW(I) and JHIGH(I) contain the column subscripts J of the first and last
               boundary points (of types I-IV) in row I.
JMAX Vector such that JMAX(I) contains the column subscript J of the rightmost grid point in row I.
JR             Vector such that JR(I) contains the highest column subscript of a completely interior point in
               row I.
M, N           Numbers of intervals, m and n, into which the minor and major semiaxes of the ellipse are
               divided.
PSI            Matrix containing the values of ψ at each grid point.
SAVPSI         Temporary storage for the current value of ψ_ij.
(Subroutine
BNDPTS)
EPS            Small distance ε (see Fig. 7.8.3).
(Function
COMPAR)
C              α²β²/(α² + β²).
DEVSQ          The sum of the squares of the deviations.
IPTS           The total number of grid points.
TRUPSI         The true value of ψ at each grid point.

Program Listing
Main Program
C        APPLIED NUMERICAL METHODS, EXAMPLE 7.8
C        TORSION OF A CYLINDER OF ELLIPTICAL CROSS-SECTION.
C
C        INVOLVES GAUSS-SEIDEL TYPE SOLUTION OF POISSON'S EQUATION,
C        DELSQ(PSI) = -2,
C        THROUGHOUT THE UPPER RIGHT-HAND QUADRANT OF AN ELLIPSE.
C        THERE ARE N AND M GRID SPACINGS RESPECTIVELY ALONG THE
C        MAJOR AND MINOR AXES OF LENGTHS ALPHA AND BETA.  PSI IS
C        ZERO ALONG THE CURVED PORTION OF THE BOUNDARY.  INTER-
C        POLATION OF DEGREE TWO IS EMPLOYED AT THE BOUNDARY POINTS.
C
      DIMENSION JR(21), JMAX(21), PSI(21,21), C(21,21), D(21,21),
     1  E(21,21), F(21,21), G(21,21), ITYPE(21,21)

C
C        .....  READ AND CHECK DATA, AND SET CONSTANTS  .....
    1 READ (5,100)  M, N, ITMAX, ALPHA, BETA, EPSMAX
      WRITE (6,200)  M, N, ITMAX, ALPHA, BETA, EPSMAX
      MP1 = M + 1
      NP1 = N + 1
      FN = N
      DX = ALPHA/FN
      DXSQ = DX*DX
      TUDXSQ = 2.0*DX*DX
      DO 2  I = 1, MP1
      DO 2  J = 1, NP1
    2 PSI(I,J) = 0.0
WRITE (6,201)

C
C        .....  CALL ON SUBROUTINE BNDPTS TO DETERMINE THE
C               BOUNDARY POINTS AND THEIR ASSOCIATED INTERCEPTS  .....
      CALL BNDPTS (M, N, JR, JMAX, ITYPE, C, D, E, F, G)
      WRITE (6,202)  (JR(I), I = 2, M)
      WRITE (6,203)  (JMAX(I), I = 1, MP1)
C
C ..... COMPUTE SUCCESSIVELY BETTER APPROXIMATIONS FOR
C
C
P S I AT A L L G R I D POINTS, I T E R A T I N G BY GAUSS-SEIDEL
METHOD U N T I L CONVERGENCE C R I T E R I O N I S S A T I S F I E D
ITER = 0
.....
3 ITER = ITER + 1
EPS = 0.0
I:
..... BOUNDARY,
POINTS ON STRAIGHT PORTIONS OF
F I R S T ALONG THE Y - A X I S .....
DO4 1 = 2 , M
SAVPSI = P S I ( I , l )
PSI(I,l) = (PSI(I+l,l) + TUDXSQ)/L.O
EPS = EPS + A B S ( P S I ( I , l )
+
-P SSI A( IV- P1 ,Sl I)) + 2.0*PS1( 1,2)

C
C        .....  AND THEN ALONG THE X-AXIS  .....
      DO 5  J = 2, N
      SAVPSI = PSI(1,J)
      PSI(1,J) = (PSI(1,J-1) + PSI(1,J+1) + 2.0*PSI(2,J) + TUDXSQ)/4.0
    5 EPS = EPS + ABS(PSI(1,J) - SAVPSI)

C
C        .....  CENTER POINT  .....
      SAVPSI = PSI(1,1)
      PSI(1,1) = (2.0*PSI(2,1) + 2.0*PSI(1,2) + TUDXSQ)/4.0
      EPS = EPS + ABS(PSI(1,1) - SAVPSI)

C
C        .....  INTERIOR POINTS  .....
      DO 21  I = 2, M
      JHIGH = JR(I)
      IF (JHIGH .LT. 2)  GO TO 21
      DO 6  J = 2, JHIGH
      SAVPSI = PSI(I,J)
      PSI(I,J) = (PSI(I-1,J) + PSI(I+1,J) + PSI(I,J-1)
     1  + PSI(I,J+1) + TUDXSQ)/4.0
    6 EPS = EPS + ABS(PSI(I,J) - SAVPSI)
   21 CONTINUE

C
C        .....  BOUNDARY POINTS (INTERPOLATION REQUIRED)  .....
      DO 12  I = 2, M
      JLOW = JR(I) + 1
      JHIGH = JMAX(I)
      DO 12  J = JLOW, JHIGH
      IF (ITYPE(I,J) .EQ. 4)  GO TO 12
      SUM = D(I,J)*PSI(I,J-1) + E(I,J)*PSI(I-1,J) + DXSQ
      IF (ITYPE(I,J) .NE. 2)  GO TO 9
      SUM = SUM + G(I,J)*PSI(I,J+1)
      GO TO 11
    9 IF (ITYPE(I,J) .NE. 3)  GO TO 11
      SUM = SUM + F(I,J)*PSI(I+1,J)
   11 SAVPSI = PSI(I,J)
      PSI(I,J) = C(I,J)*SUM
      EPS = EPS + ABS(PSI(I,J) - SAVPSI)
   12 CONTINUE

C
C        .....  CHECK TO SEE IF CONVERGENCE CRITERION IS SATIS-
C               FIED AND, IF SO, PRINT FINAL FIELD OF PSI VALUES  .....
      IF (ITER .GE. ITMAX)  GO TO 14
      IF (EPS - EPSMAX)  14, 3, 3
   14 WRITE (6,204)  ITER, EPS
      DO 15  I = 1, MP1
      ISUB = M + 2 - I
      JHIGH = JMAX(ISUB)
   15 WRITE (6,205)  (PSI(ISUB,J), J = 1, JHIGH)
C
C        .....  COMPUTE THE ROOT MEAN SQUARE DEVIATION
C               OF PSI FROM ITS KNOWN ANALYTICAL SOLUTION  .....
      RMSDEV = COMPAR (M, N, JMAX, PSI, ALPHA, BETA)
      WRITE (6,206)  RMSDEV
      GO TO 1

C
C        .....  FORMATS FOR INPUT AND OUTPUT STATEMENTS  .....
  100 FORMAT (3(11X, I4)/ 2(12X, F8.1), 12X, F8.3)
  200 FORMAT (70H1 TORSION OF CYLINDER OF ELLIPTICAL CROSS-SECTION,
     1WITH PARAMETERS/ 12H0 M      = , I4/ 12H  N      = , I4/
     2 12H  ITMAX  = , I4/ 12H  ALPHA  = , F8.3/
     3 12H  BETA   = , F8.3/ 12H  EPSMAX = , F8.3)
  201 FORMAT (78H0 RESULTS ARE FOR INTERPOLATION OF DEGREE TWO, WITH
     1 GAUSS-SEIDEL ITERATION/ 80H0 THE FOLLOWING VALUES ENABLE THE
     2BOUNDARY AND INTERIOR POINTS TO BE LOCATED)
  202 FORMAT (33H0 VALUES OF JR(2)...JR(M) ARE/ (1H0, 5X, 11I5))
  203 FORMAT (39H0 VALUES OF JMAX(1)...JMAX(M+1) ARE/ (1H0, 5X,
     1 11I5))
  204 FORMAT (29H1 ITERATIONS STOPPED WITH/ 12H0 ITER   = , I4/
     1 12H0 EPS    = , E12.4/ 32H0 THE FIELD OF PSI VALUES IS)
  205 FORMAT (1H0, 11F9.4)
  206 FORMAT (60H0 RMS DEVIATION BETWEEN COMPUTED AND EXACT PSI VALU
     1ES IS/ 15H0 RMSDEV = , F10.5)
C
      END

Subroutine BNDPTS

C        SUBROUTINE FOR DETERMINING BOUNDARY POINTS AND INTERCEPTS.
C        FOR A GIVEN ROW I, (I,JR(I)) IS THE RIGHTMOST INTERIOR
C        POINT AND (I,JMAX(I)) IS THE RIGHTMOST BOUNDARY POINT.
C        A AND B ARE THE FRACTIONAL INTERCEPTS IN THE VERTICAL (Y)
C        AND HORIZONTAL (X) DIRECTIONS, RESPECTIVELY.
C
      SUBROUTINE BNDPTS (M, N, JR, JMAX, ITYPE, C, D, E, F, G)
      DIMENSION JR(21), JMAX(21), ITYPE(21,21), C(21,21), D(21,21),
     1  E(21,21), F(21,21), G(21,21)
      EPS = 1.0E-6
      FM = M
      FN = N
      FMSQ = M*M
      FNSQ = N*N
      MP1 = M + 1
C
C        .....  LOCATE EXTREME RIGHT-HAND POINT
C               (I,JMAX(I)) AND DETERMINE ITS TYPE  .....
      JMAX(1) = N + 1
      DO 7  I = 2, MP1
      FIM1 = I - 1
      XOVRDX = FN*SQRT(1.0 - FIM1*FIM1/FMSQ)
      JM1 = XOVRDX + EPS
      FJM1 = JM1
      J = JM1 + 1
      JMAX(I) = J
      YOVRDX = FM*SQRT(1.0 - FJM1*FJM1/FNSQ)
      A = YOVRDX - FIM1
      B = XOVRDX - FJM1
      IF (ABS(A) .LT. EPS)  GO TO 2
      IF (ABS(B) .GE. EPS)  GO TO 3
    2 ITYPE(I,J) = 4
      GO TO 7
    3 IF (A .LT. 1.0 - EPS)  GO TO 5
      ITYPE(I,J) = 3
      A = 1.0
      F(I,J) = 0.5
      GO TO 6
    5 ITYPE(I,J) = 1
    6 CONTINUE
      C(I,J) = A*B/(A + B)
      D(I,J) = 1.0/(B + 1.0)
      E(I,J) = 1.0/(A + 1.0)
    7 CONTINUE
C
C        .....  LOCATE BOUNDARY POINTS OF TYPE 2  .....
      DO 11  I = 2, M
      FIM1 = I - 1
      J = JMAX(I)
    8 J = J - 1
      IF (JMAX(I+1) .GE. J)  GO TO 10
      FJM1 = J - 1
      A = FM*SQRT(1.0 - FJM1*FJM1/FNSQ) - FIM1
      ITYPE(I,J) = 2
      C(I,J) = A/(A + 1.0)
      D(I,J) = 0.5
      E(I,J) = 1.0/(A + 1.0)
      G(I,J) = 0.5
      GO TO 8
   10 CONTINUE
   11 JR(I) = J
      RETURN
C
      END
Function COMPAR
C        FUNCTION FOR COMPARING THE COMPUTED SOLUTION WITH ITS
C        KNOWN ANALYTICAL VALUE.
C
      FUNCTION COMPAR (M, N, JMAX, PSI, ALPHA, BETA)
      DIMENSION JMAX(21), PSI(21,21)
C
C        .....  COMPUTE THE SUMS OF THE DEVIATIONS SQUARED  .....
      IPTS = 0
      DEVSQ = 0.0
      FMSQ = M*M
      FNSQ = N*N
      C = ALPHA*ALPHA*BETA*BETA/(ALPHA*ALPHA + BETA*BETA)
      MP1 = M + 1
      DO 1  I = 1, MP1
      FIM1 = I - 1
      JHIGH = JMAX(I)
      DO 1  J = 1, JHIGH
      FJM1 = J - 1
      IPTS = IPTS + 1
      TRUPSI = (1.0 - FIM1*FIM1/FMSQ - FJM1*FJM1/FNSQ)*C
    1 DEVSQ = DEVSQ + (PSI(I,J) - TRUPSI)**2
C
C        .....  COMPUTE THE ROOT MEAN SQUARE DEVIATION  .....
      PTS = IPTS
      COMPAR = SQRT(DEVSQ/PTS)
      RETURN
C
      END

Data
M      = 4,     N      = 5,     ITMAX  = 100,
ALPHA  = 10.0,  BETA   = 8.0,   EPSMAX =
M      = 4,     N      = 5,     ITMAX  = 100,
ALPHA  = 10.0,  BETA   = 8.0,   EPSMAX =
M      = 4,     N      = 5,     ITMAX  = 100,
ALPHA  = 10.0,  BETA   = 8.0,   EPSMAX =
M      = 4,     N      = 5,     ITMAX  = 200,
ALPHA  = 10.0,  BETA   = 8.0,   EPSMAX =
M      = 4,     N      = 5,     ITMAX  = 200,
ALPHA  = 10.0,  BETA   = 8.0,   EPSMAX = 0.001
M      = 8,     N      = 10,    ITMAX  = 500,
ALPHA  = 10.0,  BETA   = 8.0,   EPSMAX =
M      = 8,     N      = 10,    ITMAX  = 300,
ALPHA  = 10.0,  BETA   = 8.0,   EPSMAX =
M      = 8,     N      = 10,    ITMAX  = 500,
ALPHA  = 10.0,  BETA   = 8.0,   EPSMAX = 0.010
Example 7.8 Torsion w i ~ hCurved Boundary

Computer Output
Results for the 5th Data Set
TORSION OF CYLINDER OF ELLIPTICAL CROSS-SECTION, WITH PARAMETERS

M      =       4
N      =       5
ITMAX  =     200
ALPHA  =  10.000
BETA   =   8.000
EPSMAX =   0.001
RESULTS ARE FOR INTERPOLATION OF DEGREE TWO, WITH GAUSS-SEIDEL ITERATION

THE FOLLOWING VALUES ENABLE THE BOUNDARY AND INTERIOR POINTS TO BE LOCATED

VALUES OF JR(2)...JR(M) ARE

VALUES OF JMAX(1)...JMAX(M+1) ARE

ITERATIONS STOPPED WITH

ITER   =      75
EPS    =  0.9260E-03

THE FIELD OF PSI VALUES IS

0.0
17.0729 15.5120 10.8291 3.0244
29.2678 27.7069 23.0240 15.2193 4.2926
36.5846 35.0237 30.3410 22.5363 11.6096

39.0236 37.4627 32.7799 24.9752 14.0486 0.0

RMS D E V I A T I O N BETWEEN COMPUTED.AND EXACT P S I VALUES I S

RMSDEV = 0.00042

Computer Output (Continued)


Results for the 8th Data Set
TORSION OF CYLINDER OF ELLIPTICAL CROSS-SECTION, WITH PARAMETERS
M      =       8
N      =      10
ITMAX  =     500
ALPHA  =  10.000
BETA   =   8.000
EPSMAX =   0.010

RESULTS ARE FOR INTERPOLATION OF DEGREE TWO, WITH GAUSS-SEIDEL ITERATION

THE FOLLOWING VALUES ENABLE THE BOUNDARY AND INTERIOR POINTS TO BE LOCATED

VALUES OF JR(2)...JR(M) ARE

VALUES OF JMAX(1)...JMAX(M+1) ARE

ITERATIONS STOPPED WITH

ITER   =     232

EPS    =  0.9897E-02

THE FIELD OF PSI VALUES IS

0.0

9.1449 8.7547 7.5842 5.6333 2.9020

17.0702 16.6801 15.5096 13.5588 10.8277 7.3161 3.0240

23.7760 23.3859 22.2155 20.2648 17.5338 14.0223 9.7303 4.6579

29.2624 28.8723 27.7020 25.7514 23.0204 19.5090 15.2172 10.1450 4.2922


33.5294 33.1394 31.9691 30.0186 27.2877 23.7764 19.4847 14.4125 8.5599 1.9266

36.5772 36.1872 35.0170 33.0665 30.3356 26.8254 22.5328 17.4608 11.6082 4.9750
38.4057 38.0158 36.8456 34.8951 32.1654 28.6532 24.3617 19.2896 13.1371 6.8050

39.0155 38.6252 37.4550 35.5046 32.7738 29.2627 24.9712 19.8992 14.0567 7.4137 0.0

RMS D E V I A T I O N BETWEEN COMPUTED AND EXACT P S I VALUES I S


RMSDEV = 0.00471
Discussion of Results

Two sets of computed results are displayed above; in
both cases, the value of RMSDEV is very low, indicating
excellent accuracy in the computations. This is not alto-
gether surprising, because the analytical solution for ψ,
given by (7.8.3), has a special form; in particular, it is
such that ψ_xxxx, ψ_yyyy, and higher-order derivatives are
zero. This means that the finite-difference approximation
to ∇²ψ at any grid point is exact, even for a very coarse
grid spacing. The values of RMSDEV for all runs are
given in Table 7.8.1 (Summary of Results; columns ε_max,
ITER, and RMSDEV for each run).

For the reason stated above, observe that in this par-
ticular example there is essentially no improvement in
accuracy when employing the finer grid. Note also from
Table 7.8.1 that the number of iterations for a given
degree of convergence (indicated by the value of RMSDEV)
is roughly proportional to the total number of grid points
involved.

7.25 Successive Overrelaxation and Alternating-Direction Methods

We now consider two additional methods which acceler-
ate convergence to the solution of the finite-difference
equations arising from the elliptic problem u_xx + u_yy = 0.

The first, known as the method of successive over-
relaxation (S.O.R.), is again an iterative technique in
which an improved estimate u_ij is computed by applying
the following formula at every grid point:

    u_ij = (1 - ω)ū_ij + (ω/4)(u_{i-1,j} + ū_{i+1,j} + u_{i,j-1} + ū_{i,j+1}),

where ū denotes the previous estimate and ω is an arbitrary
parameter. For ω = 1, the procedure is identical to the
Gauss-Seidel method. However, for a choice of ω in the
range 1 < ω < 2, the convergence is more rapid. The
location of the optimum value of the parameter, ω_opt,
presents some difficulty and is discussed in detail by
Forsythe and Wasow [9]. For the simple case of Laplace's
equation in a square with Dirichlet boundary conditions,
Young [25] and Frankel [10] show that ω_opt equals the
smaller root of

    t²ω² - 16ω + 16 = 0,

with t = 2 cos(π/n), where n is the total number of
increments into which the side of the square is divided.
The number of iterations required for a given degree of
convergence falls very rapidly when the parameter is in
the immediate vicinity of ω_opt, and it is generally better
to overestimate ω_opt than to underestimate it.

The second method is introduced by noting that the
solution of the elliptic problem may be regarded as the
limiting solution (for large values of time) of the corres-
ponding time-dependent initial-value problem. Thus to
solve u_xx + u_yy = 0 in a region R, some initial condition
is first assigned throughout R, and the problem u_xx +
u_yy = u_t is solved instead. Successive time-steps of the
solution of this parabolic problem may be viewed as
successive steps of iteration in the elliptic problem. The
boundary conditions will, of course, be identical for both
cases. In fact, it can readily be shown that the Jacobi
scheme for u_xx + u_yy = 0 is equivalent to the ordinary
explicit method for u_xx + u_yy = u_t with λ = Δt/(Δx)² =
1/4.

With these observations in mind, the implicit alterna-
ting-direction method (I.A.D.), discussed by Peaceman
and Rachford [19], now finds an obvious application to
the solution of elliptic PDEs. Indeed, for a rectangular
region, it is superior even to the method of successive
overrelaxation. A single stage of iteration, in which new
estimates v'_ij are obtained from previous approximations
v_ij via intermediate values v*_ij, is achieved by applying
the following two difference equations in the manner
discussed in Section 7.14:

    -v*_{i-1,j} + (2 + ρ)v*_{i,j} - v*_{i+1,j} = v_{i,j-1} + (ρ - 2)v_{i,j} + v_{i,j+1},      (7.90a)
    -v'_{i,j-1} + (2 + ρ)v'_{i,j} - v'_{i,j+1} = v*_{i-1,j} + (ρ - 2)v*_{i,j} + v*_{i+1,j}.   (7.90b)

Equations (7.90a) and (7.90b) are implicit in the x and y
directions, respectively. When applied to successive rows
or columns of points, each involves the solution of a system
of equations having a tridiagonal coefficient matrix. The
iteration parameter ρ is analogous to 2/λ for the parabolic
case, and may be varied from one application of (7.90a)
and (7.90b) to the next. For Laplace's equation in a
square, the optimum convergence occurs when ρ is
given the sequence of values

    ρ_k = 4 sin²[(2k - 1)π/4n],    k = 1, 2, ..., n,

where n is the number of increments into which the side
of the square is divided. Birkhoff and Varga [1] show
that the alternating-direction method is most advanta-
geous for rectangular regions.

Certain problems in unsteady two-dimensional fluid
flow (for example, see [15], [28], and [29]) involve the
solution of an elliptic PDE at each time-step. In such
cases, there is a strong motivation for finding the optimum
values of either ω or ρ for use in the S.O.R. and I.A.D.
methods, respectively. Frequently, this investigation has
to be performed on a semiempirical basis.
7.26 Characteristic-Value Problems

Boundary-value problems require the solution of a
differential equation of order two or greater to satisfy
specified conditions at both ends of an interval. A very
important class is that in which both the differential
equation and the boundary conditions are homogeneous.
The term homogeneous means that if a differential
equation and/or the boundary conditions are satisfied by
a function y(x), for instance, then they are also satisfied
by Ay(x), where A is an arbitrary constant. Thus the
following boundary condition is homogeneous:

    y + c(dy/dx) = 0    (at x = a, for instance).    (7.92)

The subsequent discussion applies only to cases in which
both the differential equation and the associated boundary
conditions are homogeneous. In the general problem of
this type, an arbitrary parameter λ is involved, and we
must determine those values of λ for which the problem
admits of nontrivial solutions.

It is convenient to mention such problems here because
PDEs are frequently solved by the technique of separation
of variables, which decomposes the original PDE into
two or more ordinary differential equations, each of
which involves an arbitrary parameter λ. Consideration
of the boundary values then determines those possible
values of λ that lead to a meaningful solution. A class of
problems of this type is of the Sturm-Liouville form:

    d/dx [p(x) dy/dx] + [q(x) + λr(x)]y = 0,    a < x < b,
    y + c₁ dy/dx = 0    at x = a,                              (7.93)
    y + c₂ dy/dx = 0    at x = b,

where p(x), q(x), and r(x) are known functions that are
continuous in the interval (a,b), and in general, r(x) > 0
within the interval. These problems do not in general
admit of nontrivial solutions for any λ. Nontrivial
solutions exist only for discrete values of λ that are
known as the characteristic values of the problem; the
corresponding solutions are called the characteristic
functions. We study here a method for the approximate
treatment of such problems.

Consider the following slightly simplified version of
the general Sturm-Liouville problem:

    d²y/dx² + [q(x) + λr(x)]y = 0,    a < x < b,
    y(a) = y(b) = 0.                                           (7.94)

Subdivide the interval (a,b) into equal parts by inserting
the points

    a = x₀;  x₁, x₂, ..., xₙ;  xₙ₊₁ = b,
where                                                          (7.95)
    x_k = x₀ + hk,    h = (b - a)/(n + 1).

We then make the following substitutions in the boundary-
value problem:

    y(x_k) ≅ v_k;    d²y(x_k)/dx² ≅ (v_{k-1} - 2v_k + v_{k+1})/h²;
    q(x_k) = q_k;    r(x_k) = r_k.                             (7.96)

Thus the differential equation and boundary conditions
(7.94) are replaced by the following system of difference
equations involving the values of the approximate
solution v_k at the points x_k, k = 0, 1, ..., n + 1:

    -v_{k-1} + (2 - q_k h²)v_k - v_{k+1} = λh²r_k v_k,    k = 1, 2, ..., n,
    v₀ = v_{n+1} = 0.                                          (7.97)

This set of homogeneous linear equations will admit of
a nontrivial solution v₁, v₂, ..., vₙ if and only if the
determinant of the array of coefficients of these quantities
vanishes. Let the vector v be defined by

    v = [v₁ v₂ ... vₙ]'.                                       (7.98)

Then the system of equations (7.97) can be written in the
form

    Qv = λh²Rv,                                                (7.99)

where Q is the tridiagonal matrix

    | (2 - q₁h²)     -1           0         ...        0      |
    |     -1      (2 - q₂h²)     -1         ...        0      |
    |      0         -1       (2 - q₃h²)    ...        0      |
    |      .          .           .                    .      |
    |      0          0           0    ...  -1    (2 - qₙh²)  |

and R is the diagonal matrix with elements r₁, r₂, ..., rₙ
along the main diagonal. For convenience, since r(x) > 0,
we can set

    x = R^(1/2)v,                                              (7.101)

and replace (7.99) by the equivalent system

    Ax = λx.                                                   (7.102)

Here, A = (1/h²)R^(-1/2)QR^(-1/2) is the symmetric tri-
diagonal matrix in which

    a_kk = (2 - q_k h²)/(h²r_k),
    a_{k,k+1} = a_{k+1,k} = -1/(h²(r_k r_{k+1})^(1/2)).

The problem of finding the characteristic values of
(7.94) is, therefore, very similar to that of finding the
eigenvalues and eigenvectors of a square matrix. This
problem is discussed more fully in Chapter 4.
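For the special case q(x) = 0 and r(x) = 1, system (7.97) becomes the classical tridiagonal eigenproblem, whose characteristic values are known in closed form; this gives a convenient check on the discretization. A small illustrative Python sketch:

```python
import math

def residual(n, k):
    """Largest residual of (7.97), with q = 0 and r = 1, for the k-th
    known characteristic pair: lambda*h**2 = 2 - 2*cos(k*pi/(n+1)),
    v_i = sin(i*k*pi/(n+1)), so that v_0 = v_{n+1} = 0."""
    theta = k * math.pi / (n + 1)
    lam_h2 = 2.0 - 2.0 * math.cos(theta)
    v = [math.sin(i * theta) for i in range(n + 2)]
    worst = 0.0
    for i in range(1, n + 1):
        res = -v[i - 1] + (2.0 - lam_h2) * v[i] - v[i + 1]
        worst = max(worst, abs(res))
    return worst

# All n characteristic pairs satisfy the difference equations:
assert all(residual(10, k) < 1.0e-12 for k in range(1, 11))
```
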
EXAMPLE 7.9
UNSTEADY CONDUCTION BETWEEN CYLINDERS
(CHARACTERISTIC-VALUE PROBLEM)

Problem Statement

Find the temperature history θ(r,t) in the solid bounded
by two infinite cylindrical surfaces of radii r₁ and r₂, as
shown in Fig. 7.9.1. Initially (at t = 0), θ = 0 everywhere,
and for t > 0 the outer surface r = r₂ is maintained at
θ = 1. The inner surface r = r₁ behaves as a perfect
insulator.

Figure 7.9.1  Initial and boundary conditions for the hollow cylinder.

Method of Solution

The dimensionless heat conduction equation in radial
coordinates is

    ∂²θ/∂r² + (1/r)(∂θ/∂r) = ∂θ/∂t.

To reduce the situation to a characteristic-value problem,
we must render the boundary conditions homogeneous,
that is, of the form ∂θ/∂r + aθ = 0.

By changing the variable to φ = 1 - θ, the governing
differential equation becomes

    ∂²φ/∂r² + (1/r)(∂φ/∂r) = ∂φ/∂t,    (7.9.1)

subject to
(a) the initial condition: φ = 1 at t = 0,
(b) the boundary conditions:
    ∂φ/∂r = 0 at r = r₁,
    φ = 0 at r = r₂.                   (7.9.2)

Assuming that a separation of variables can be made,

    φ(r,t) = R(r)T(t),                 (7.9.3)

where R and T are functions of r and t only, respectively.
Substituting in (7.9.1), we obtain

    R''/R + (1/r)(R'/R) = T'/T = -λ.   (7.9.4)

Since the parameter λ may assume a variety of values, the
solution to the problem is written as

    φ(r,t) = Σ_j c_j R_j(r) e^(-λ_j t),    (7.9.5)

where the λ_j, c_j, and R_j are yet to be determined.
Consider that part of equation (7.9.4) involving R:

    R'' + (1/r)R' + λR = 0.            (7.9.6)

We wish to determine those values that λ may assume,
and the corresponding values of the function R. Divide
the region between the cylindrical surfaces into n equal
increments Δr, with grid points i = 1, 2, ..., n + 1
allocated as follows:

    Inner boundary (i = 1):      r = r₁ (insulated)
    Outer boundary (i = n + 1):  r = r₂ (φ = 0)

The finite-difference approximation to equation (7.9.6)
is now considered at the various points.

(a) General point (i = 2, 3, ..., n):

    (R_{i-1} - 2R_i + R_{i+1})/(Δr)² + (R_{i+1} - R_{i-1})/(2r_iΔr) + λR_i = 0,

or, defining

    D = 2/(Δr)²,
    E_i = (1/(Δr)²)(1 - Δr/(2r_i)),
    F_i = (1/(Δr)²)(1 + Δr/(2r_i)),

we have

    -E_i R_{i-1} + (D - λ)R_i - F_i R_{i+1} = 0.

(b) Inner boundary point (i = 1): here R'₁ = 0 and
R''₁ ≅ 2(R₂ - R₁)/(Δr)². Hence,

    (D - λ)R₁ - DR₂ = 0.

(c) Outer boundary point (i = n + 1): since φ is always
zero at r = r₂, we set R_{n+1} = 0.
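In a modern setting the matrix of the resulting eigenproblem (7.9.7) would be assembled directly from D, E_i, and F_i above. The Python sketch below (illustrative; the book's program fills A with the same coefficients using the variables DUM and D) is worth comparing with the FORTRAN statements later:

```python
def radial_matrix(r1, r2, n):
    """n-by-n matrix A of equation (7.9.7): row 1 is the insulated inner
    boundary, (D - lam)R1 - D*R2 = 0; rows 2..n use D, E_i, F_i."""
    dr = (r2 - r1) / n
    dum = 1.0 / (dr * dr)
    d = 2.0 * dum
    a = [[0.0] * n for _ in range(n)]
    a[0][0] = d
    a[0][1] = -d
    for i in range(1, n):              # 0-based row i is grid point i+1
        ri = r1 + i * dr               # radius at that grid point
        a[i][i] = d
        a[i][i - 1] = -dum * (1.0 - 0.5 * dr / ri)      # -E_i
        if i < n - 1:
            a[i][i + 1] = -dum * (1.0 + 0.5 * dr / ri)  # -F_i; R_{n+1} = 0
    return a
```
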
We then have the following system of equations in the
unknowns R_i, i = 1, 2, ..., n:

    DR₁ - DR₂ = λR₁
    -E₂R₁ + DR₂ - F₂R₃ = λR₂
    -E₃R₂ + DR₃ - F₃R₄ = λR₃
    . . . . . . . . . . . . .
    -E_iR_{i-1} + DR_i - F_iR_{i+1} = λR_i
    . . . . . . . . . . . . .
    -EₙRₙ₋₁ + DRₙ = λRₙ.

Or, denoting the coefficient matrix by A, and the column
vector whose elements are R₁, R₂, ..., Rₙ by r,

    Ar = λr.    (7.9.7)

In general, there will be n eigenvalues λ and n associated
eigenvectors r; let the jth such quantities be λ_j and r_j.
Recognizing by subscript i that φ is required at some
particular grid point and that there are n eigenvectors,
equation (7.9.5) is

    φ_i(t) = Σ_{j=1}^{n} c_j r_{ij} e^(-λ_j t),    (7.9.8)

where r_{ij} is the ith element of the jth eigenvector. Again,
the c_j will be determined by applying a property of
orthogonality. Since A is unsymmetric, its eigenvectors
are not orthogonal. This difficulty, however, may be
overcome by denoting the transpose of A as B = A', and
by denoting the eigenvectors of B by s; the eigenvalues of
matrix B are actually the same as those of matrix A. Then,
by a principle stated on page 220, if λ₁ and λ₂ are two
eigenvalues of A (or B) such that λ₁ ≠ λ₂, then an eigen-
vector s such that Bs = λ₂s is orthogonal to an eigenvec-
tor r such that Ar = λ₁r. In this event, r's = 0.

At time t = 0, by multiplying equation (7.9.8) through
by s_{ip} (the ith element of the pth eigenvector of B)
and summing over all grid points i, we have

    Σ_i s_{ip} φ_i(0) = Σ_j c_j Σ_i r_{ij} s_{ip}.

But φ_i(0) = 1 everywhere, from the initial condition
(7.9.2). Also, by the orthogonality property of the eigen-
vectors, as long as there are no repeated eigenvalues,

    Σ_i r_{ij} s_{ip} = 0,    j ≠ p.    (7.9.9)

Hence,

    c_p = (Σ_i s_{ip}) / (Σ_i r_{ip} s_{ip}).    (7.9.10)

In the following program, the matrices A and B
are first established, and their eigenvalues and eigen-
vectors are then found by calling on the subroutine
RUTIS. The eigenvectors of A and B are stored column
by column in the matrices R and S, respectively. The
program checks to determine if the eigenvectors of A
and B are indeed orthogonal by constructing a new
matrix P = R'S, whose elements are such that

    p_{ij} = Σ_k r_{ki} s_{kj}.    (7.9.11)

The off-diagonal elements (i ≠ j) of P should be close to
zero. Next, the coefficients c_j are evaluated from (7.9.10).
This allows φ_i(t) to be computed from (7.9.8) for any
time and grid point; hence the corresponding tempera-
tures θ_i(t) = 1 - φ_i(t) can be computed and tabulated.
When evaluating (7.9.8), terms are ignored if the expo-
nent λ_j t is greater than 40.
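The biorthogonality property r'_j s_p = 0 (for λ_j ≠ λ_p) that underlies (7.9.10) is easily demonstrated on a hypothetical 2 × 2 unsymmetric matrix with the same sign pattern as A; in this Python sketch the eigenpairs are written down from the characteristic polynomial rather than obtained from RUTIS:

```python
import math

s2 = math.sqrt(2.0)
# Hypothetical matrix A = [[2, -2], [-1, 2]]; eigenvalues are 2 +/- sqrt(2).
lam = [2.0 + s2, 2.0 - s2]
r = [[2.0, -s2], [2.0, s2]]     # eigenvectors of A       (rows r_1, r_2)
s = [[1.0, -s2], [1.0, s2]]     # eigenvectors of B = A'  (rows s_1, s_2)

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

# Elements of P = R'S: the off-diagonal entries vanish, the diagonal do not.
p = [[dot(r[i], s[j]) for j in range(2)] for i in range(2)]
```
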

Flow Diagram (in outline)

Read r₁, r₂, Δt, t_max, n, and the control parameters for
RUTIS (see text). Form the matrix A of (7.9.7) and its
transpose B. Apply the LR transformation to A and B
(two calls on subroutine RUTIS); the transformed matrices
are A_new and B_new, and the eigenvectors are stored in
R and S. If RUTIS worked satisfactorily (TAG1 = F and
TAG2 = F for both transformations), set λ_i = (A_new)_ii,
i = 1, ..., n, form P = R'S, compute the coefficients c_j,
and evaluate and print the temperatures θ_i(t) at successive
times.

FORTRAN Implementation
List of Principal Variables
Program Symbol    Definition

A, B              Matrix A appearing in equation (7.9.7), and its transpose B.
ANEW, BNEW        Transformed matrices returned by RUTIS.
C                 Vector of coefficients c_j, defined in (7.9.10).
DR                Radial grid spacing, Δr.
DT                Time interval between successive printings of temperatures.
FREQ, ITMAX, EPS1,
EPS2, EPS3, EPS4,
EIGVEC, STRIPD,
SWEEP, TAG1, TAG2,
ITER              Parameters for use in subroutine RUTIS (see Example 4.3).
LAMBDA            Vector of eigenvalues λ_j.
N                 Number of grid points, n, at which temperatures are to be computed.
P                 Matrix P = R'S, with elements given by (7.9.11).
R, S              Matrices whose columns contain eigenvectors r and s of A and B, respectively.
R1, R2            Radii of inner and outer cylinders, r₁ and r₂, respectively.
RUTIS             Subroutine for finding the eigenvalues and eigenvectors of a matrix (see Example 4.3).
T                 Time, t.
THETA             Vector of temperatures, θ_i.
TMAX              Maximum value of time to be considered, t_max.

Program Listing
C        APPLIED NUMERICAL METHODS, EXAMPLE 7.9
C        UNSTEADY CONDUCTION BETWEEN CYLINDERS (EIGENVALUE PROBLEM).
C
      IMPLICIT REAL*8 (A-H, O-Z)
      REAL*8 LAMBDA
      INTEGER FREQ
      LOGICAL EIGVEC, STRIPD, SWEEP, TAG1, TAG2
      DIMENSION A(11,11), ANEW(11,11), R(11,11), P(11,11), C(11),
     1  THETA(11), LAMBDA(11), B(11,11), BNEW(11,11), S(11,11)

C
C        .....  CHECK PARAMETERS, ESTABLISH MATRIX A AND TRANSPOSE  .....
    1 WRITE (6,200)
      READ (5,100)  R1, R2, DT, TMAX, N, FREQ, ITMAX, EPS1, EPS2, EPS3,
     1  EPS4, EIGVEC, STRIPD, SWEEP
      WRITE (6,201)  R1, R2, DT, TMAX, N, FREQ, ITMAX, EPS1, EPS2, EPS3,
     1  EPS4, EIGVEC, STRIPD, SWEEP
      FN = N
      DR = (R2 - R1)/FN
      DUM = 1./(DR*DR)
      D = 2.*DUM
      DO 2  I = 1, N
      THETA(I) = 0.0
      DO 2  J = 1, N
    2 A(I,J) = 0.0
      NM1 = N - 1
      NP1 = N + 1
      DO 3  I = 1, NM1
      A(I,I) = D
      FI = I
      A(I,I+1) = -DUM*(1. + 0.5/(R1/DR + FI - 1.))
    3 A(I+1,I) = -DUM*(1. - 0.5/(R1/DR + FI))
      A(N,N) = D
      A(1,2) = -D
      DO 4  I = 1, N
      DO 4  J = 1, N
    4 B(I,J) = A(J,I)
      WRITE (6,203)
      DO 5  I = 1, N
    5 WRITE (6,204)  (A(I,J), J = 1, N)

C
C        .....  CALL ON SUBROUTINE RUTIS, TO FIND THE
C               EIGENVALUES AND EIGENVECTORS OF A AND B  .....
      WRITE (6,207)
      DO 8  I = 1, N
    8 WRITE (6,204)  (B(I,J), J = 1, N)
      CALL RUTIS (N, A, ANEW, R, FREQ, ITMAX, EPS1, EPS2,
     1  EPS3, EPS4, EIGVEC, STRIPD, SWEEP, TAG1, TAG2, ITER)
      WRITE (6,202)  TAG1, TAG2, ITER
      IF (TAG1 .OR. TAG2)  GO TO 1
      WRITE (6,205)
      DO 6  I = 1, N
    6 WRITE (6,204)  (ANEW(I,J), J = 1, N)
      WRITE (6,206)
      DO 7  I = 1, N
    7 WRITE (6,204)  (R(I,J), J = 1, N)
      CALL RUTIS (N, B, BNEW, S, FREQ, ITMAX, EPS1, EPS2,
     1  EPS3, EPS4, EIGVEC, STRIPD, SWEEP, TAG1, TAG2, ITER)
      WRITE (6,208)
      DO 9  I = 1, N
    9 WRITE (6,204)  (BNEW(I,J), J = 1, N)
      WRITE (6,209)
      DO 10  I = 1, N
   10 WRITE (6,204)  (S(I,J), J = 1, N)
      DO 11  I = 1, N
   11 LAMBDA(I) = ANEW(I,I)
Example 7.9 Unsteady Conduction Between Cylinders (Characteristic-Value Problem)

Program Listing (Continued)

C
C        .....  CHECK ON ORTHOGONALITY OF EIGENVECTORS  .....
      DO 12  I = 1, N
      DO 12  J = 1, N
      P(I,J) = 0.
      DO 12  K = 1, N
   12 P(I,J) = P(I,J) + R(K,I)*S(K,J)
      WRITE (6,210)
      DO 13  I = 1, N
   13 WRITE (6,204)  (P(I,J), J = 1, N)
C
C        .....  DETERMINE COEFFICIENTS C(J) FROM INITIAL CONDITION  .....
      DO 14  J = 1, N
      C(J) = 0.
      DO 14  I = 1, N
   14 C(J) = C(J) + S(I,J)/P(J,J)
      WRITE (6,211)
      WRITE (6,204)  (C(J), J = 1, N)

C
C        .....  COMPUTE VALUES OF TEMPERATURE  .....
      WRITE (6,212)
      THETA(NP1) = 1.
      MIN = 1
      T = 0.
   16 IF (T.GT.TMAX .OR. THETA(1).GT.0.95)  GO TO 1
      IF (DABS(LAMBDA(MIN)*T) .LE. 40.)  GO TO 17
      MIN = MIN + 1
      GO TO 16
   17 DO 18  I = 1, N
      THETA(I) = 1.
      DO 18  J = MIN, N
   18 THETA(I) = THETA(I) - C(J)*R(I,J)*DEXP(-LAMBDA(J)*T)
      WRITE (6,213)  T, (THETA(J), J = 1, NP1)
      T = T + DT
      GO TO 16

C
C        .....  FORMATS FOR INPUT AND OUTPUT STATEMENTS  .....
  100 FORMAT (4(12X, F6.1)/ 3(12X, I6)/ 4(12X, E6.0)/ 3(12X, L6))
  200 FORMAT (69H1 UNSTEADY STATE HEAT CONDUCTION BETWEEN CONCENTRIC C
     1YLINDERS, WITH/ 1H0)
  201 FORMAT (8X, 10H R1     = , F10.4/ 8X, 10H R2     = , F10.4/
     1  8X, 10H DT     = , F10.4/ 8X, 10H TMAX   = , F10.4/
     2  8X, 10H N      = , I5  / 8X, 10H FREQ   = , I5  /
     3  8X, 10H ITMAX  = , I5  / 8X, 10H EPS1   = , E10.1/
     4  8X, 10H EPS2   = , E10.1/ 8X, 10H EPS3   = , E10.1/
     5  8X, 10H EPS4   = , E10.1/ 8X, 10H EIGVEC = , L5  /
     6  8X, 10H STRIPD = , L5  / 8X, 10H SWEEP  = , L5)
  202 FORMAT (1H0, 7X, 10H TAG1   = , L5/ 8X, 10H TAG2   = , L5/
     1  8X, 10H ITER   = , I5)
  203 FORMAT (24H0 STARTING MATRIX A IS / 1H )
  204 FORMAT (5X, 10F10.6)
  205 FORMAT (30H0 TRANSFORMED MATRIX ANEW IS / 1H )
  206 FORMAT (46H0 MATRIX R, CONTAINING EIGENVECTORS OF A, IS / 1H )
  207 FORMAT (26H0 TRANSPOSED MATRIX B IS / 1H )
  208 FORMAT (30H0 TRANSFORMED MATRIX BNEW IS / 1H )
  209 FORMAT (46H0 MATRIX S, CONTAINING EIGENVECTORS OF B, IS / 1H )
  210 FORMAT (79H0 ORTHOGONALITY REQUIRES OFF-DIAGONAL ELEMENTS OF FOL
     1LOWING MATRIX TO BE ZERO/ 1H )
  211 FORMAT (36H0 THE COEFFICIENTS C(1)...C(N) ARE/ 1H )
  212 FORMAT (85H1     TIME         TEMPERATURES AT GRID POINTS (INN
     1ER SURFACE TO OUTER SURFACE)/ 1H )
  213 FORMAT (1H , F7.3/ (8X, 11F8.5))
C
      END


Data
R1     = 10.0,   R2     = 19.0,   DT    = 5.0,    TMAX = 100.0,
N      = 9,      FREQ   = 5,      ITMAX = 75,
EPS1   = 1.E-7,  EPS2   = 1.E-3,  EPS3  = 1.E-4,  EPS4 = 1.E-1,
EIGVEC = T,      STRIPD = T,      SWEEP = T.

Computer Output
UNSTEADY STATE HEAT CONDUCTION BETWEEN CONCENTRIC CYLINDERS, WITH

R1
R2
DT
TMAX
N
FREQ
ITMAX
EPS1
EPS2
EPS3
EPS4
EIGVEC
STRIPD
SWEEP

STARTING MATRIX A IS

TRANSPOSED MATRIX B IS

TAG1   = F
TAG2   = F
ITER   = 41

TRANSFORMED MATRIX ANEW IS



Computer Output (Continued)

TRANSFORMED MATRIX BNEW IS

MATRIX S, CONTAINING EIGENVECTORS OF B, IS

ORTHOGONALITY REQUIRES OFF-DIAGONAL ELEMENTS OF FOLLOWING MATRIX TO BE ZERO

THE COEFFICIENTS C(1)...C(N) ARE

TIME         TEMPERATURES AT GRID POINTS (INNER SURFACE TO OUTER SURFACE)


Computer Output (Continued)


Discussion of Results
By examining the diagonal elements of the transformed matrices ANEW and BNEW, observe that matrices A and B have identical eigenvalues. The subdiagonal elements of ANEW and BNEW are zero, to six decimal places, indicating that the eigenvalue routine has performed a sufficient number of iterations (actually ITER = 41 in both cases). Note also that the off-diagonal elements of P (the last matrix printed) are virtually zero, which checks the orthogonality of the eigenvectors as expressed in equation (7.9.9).
Judging from the values of the coefficients c_j, the eigenvectors that are weighted the most heavily are those corresponding to the smallest eigenvalues, i.e., to the values of e^(-λ_j t) that decay least rapidly with time. An additional check on the accuracy of the calculations is afforded by the values of temperature at t = 0; the first of these should be zero, in the absence of round-off error.
Finally, we mention that, once the eigenvalues and eigenvectors have been found, the effect of any other initial condition can be examined merely by recomputing the coefficients c_j. Also, the computed values of temperature do not depend on values of temperature at any previous time. These are two advantages which are not afforded by the conventional finite-difference solution of the parabolic equation (7.9.1).
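The modal construction just discussed can be sketched in a few lines. The 2 x 2 matrix, its eigenpairs, and the initial vector below are a toy stand-in for illustration only, not the A, B, or data of Example 7.9:

```python
# Sketch of the characteristic-value (modal) solution idea of Example 7.9: once the
# eigenvalues lam[j] and orthonormal eigenvectors v[j] of the spatial operator are
# known, T(t) = sum_j c_j * exp(-lam_j * t) * v_j, with c_j fixed by the initial state.
import math

A = [[2.0, -1.0], [-1.0, 2.0]]            # toy symmetric model operator
lam = [1.0, 3.0]                          # its eigenvalues
v = [[1 / math.sqrt(2), 1 / math.sqrt(2)],    # orthonormal eigenvectors
     [1 / math.sqrt(2), -1 / math.sqrt(2)]]

T0 = [1.0, 0.0]                           # initial temperatures
c = [sum(v[j][i] * T0[i] for i in range(2)) for j in range(2)]  # c_j = v_j . T0

def T(t):
    # direct evaluation at any t; no time-stepping and no dependence on earlier times
    return [sum(c[j] * math.exp(-lam[j] * t) * v[j][i] for j in range(2))
            for i in range(2)]

print(T(0.0))   # approximately the initial vector [1.0, 0.0]
```

Because T(t) is evaluated directly from the eigenpairs, a different initial condition is handled by recomputing c alone, exactly as noted above.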

Problems
7.1 Verify the following finite-difference approximations for use in two dimensions at the point (i,j):

(a) ∂⁴u/∂x⁴ = [u_{i-2,j} − 4u_{i-1,j} + 6u_{i,j} − 4u_{i+1,j} + u_{i+2,j}]/(Δx)⁴ + O[(Δx)²].

(b) ∂u/∂x = [−3u_{i,j} + 4u_{i+1,j} − u_{i+2,j}]/(2Δx) + O[(Δx)²].

Suggest possible uses of formulas (b) and (c).
7.2 Verify equation (7.8) for the finite-difference approximation of ∂²u/∂x∂y.
7.3 Verify equation (7.9) for the finite-difference approximation of the Laplacian during the solution of u_xx + u_yy = 0.
7.4 Rework the numerical example in Section 7.5, using the explicit method with Δx = 0.2 and Δt = 0.02, that is, λ = 1/2.
7.5 Rework the numerical example in Section 7.5, performing four time steps, using the implicit method with Δx = 0.2 and Δt = 0.04, that is, λ = 1.
7.6 The following finite-difference approximation has been suggested for the PDE u_t = u_xx + u_yy:
Investigate the stability of this procedure.
7.7 Verify the formula given on page 453 for the amplification factor during the solution of u_xx + u_yy = u_t by the implicit alternating-direction method.
7.8 Set up the analog of (7.52) for solving u_xx + u_yy + u_zz = u_t by a simple extension of the implicit alternating-direction method. If Δx = Δy = Δz, show that the method will be unstable at least for λ > 3/2.
7.9 Investigate the stability of the following two procedures for solving u_xx + u_yy + u_zz = u_t: (a) the alternating-direction modification of the Crank-Nicolson method, (7.54), and (b) Brian's method, (7.55).
7.10 Show that Brian's method, (7.55), for solving u_xx + u_yy + u_zz = u_t, can be reexpressed in the form of (7.56).
7.11 The following nonlinear PDE governs unsteady heat conduction in a bar:
Here, the thermal conductivity k, density ρ, and specific heat c_p are known functions of temperature T. What finite-difference approximation would you recommend for a Crank-Nicolson type solution?
7.12 In Section 7.17, the boundary condition −u_x + αu = g was coupled with the PDE u_t = u_xx + u_yy to give equation (7.60). As an alternative approach, the finite-difference approximation of the boundary condition alone could be employed at the boundary, giving, for example,
depending on which approximation is used for u_x (= ∂u/∂x). Although these formulas are implicit and relate to a two-dimensional problem, counterparts could easily be derived for explicit and/or one-dimensional problems. Also, the value of α and/or g might be zero in certain applications.
From the points of view of accuracy and computational convenience, investigate the desirability of this alternative way of accounting for boundary conditions.
7.13 One-dimensional unsteady heat conduction in a medium is governed by the equation α(∂²T/∂x²) = ∂T/∂t, in which T is temperature (°F), x is distance (ft), t is time (hr), and α is the thermal diffusivity (sq ft/hr) of the medium.

Figure P7.13

A heat flux density q (BTU/hr sq ft) is imposed on the boundary x = 0 of the medium shown in Fig. P7.13. If the thermal conductivity of the medium is k (BTU/hr ft °F), what is the explicit finite-difference approximation of the heat conduction equation at grid-point 1?
Note. Fourier's law of heat conduction is: q = −k ∂T/∂x.
7.14 The equation u_xx + u_yy = u_t is being solved explicitly for the region shown in Fig. P7.14. The boundary conditions are of the Dirichlet type. What formula should be used to compute u_{i,j} at the new time level? Assume that Δx = Δy.

Figure P7.14

7.15 Compare the relative merits of (a) the Jacobi, (b) the Gauss-Seidel, and (c) the successive overrelaxation methods for solving the problem stated in Example 7.6 (solution of Laplace's equation in a square). To do this, perform a few hand calculations with four grid spacings along each side of
the square, and see how quickly each method converges to the corresponding values computed in Example 7.6.
7.16 The cross section of a hollow square duct is shown in Fig. P7.16. The sides of the two squares are in the ratio 2:1. The interior and exterior surfaces of the duct are maintained at temperatures of 1000°F and 100°F, respectively.

Figure P7.16

Assuming steady-state heat conduction, estimate the temperature at point P, located halfway between the midpoints of adjacent sides of the two squares.
7.17 Figure P7.17 shows a cross section of one quadrant of a circular duct that passes through a square block of refractory material of thermal conductivity k = 0.26 BTU/hr ft °F. The inner curved surface is maintained steadily at 1400°F by a hot gas, and the outer surface is constant at 100°F. If r = 6 in., estimate (a) the steady temperatures at the indicated grid points (Δx = Δy = 3 in.) and (b) the heat leakage per foot length of the duct. In this connection, what is the finite-difference approximation of ∇²T = 0 at point P?

Figure P7.17

What would the corresponding answers be for r = 12 in., and Δx = Δy = 6 in.?
7.18 Show that the Jacobi iteration scheme for solving u_xx + u_yy = 0 is equivalent to the ordinary explicit method for solving u_xx + u_yy = u_t, with λ = Δt/(Δx)² = 1/4.
7.19 Set up a successive overrelaxation scheme for solving ∇²ψ = −ω in two space dimensions, with Δx ≠ Δy. Assume that ω = ω(x,y) is a known function of position (see Section 7.2(c)).
7.20 Consider the implicit alternating-direction solution of Laplace's equation in a square, for which the optimum sequence of iteration parameters ρ is given by equation (7.91). Investigate a possible connection between this sequence and the expression for the amplification factor given in Section 7.14 for the implicit alternating-direction solution of the corresponding parabolic problem.
7.21 Write a computer program to find the temperature history θ(r,t) inside an infinitely long cylinder of radius a that is initially at a uniform temperature θ₀ and is suddenly immersed in a bath of hot fluid maintained at a temperature θ₁. The heat-transfer coefficient between the bath and the cylindrical surface is h, and the thermal diffusivity and conductivity of the material of the cylinder are α and k, respectively.
To facilitate the solution, first define a dimensionless time, radius, and temperature by
Noting that there is axial symmetry, the governing equations can be shown to be
Conduction equation
Initial condition
Boundary conditions
τ > 0, R = 1: d₁ ∂T/∂R = 1 − T, where d₁ = k/(ha).
Then solve the problem by an implicit finite-difference technique, in which the radius from R = 0 to R = 1 is divided into n equal increments, giving n + 1 unknown temperatures at each time step. The grid points at R = 0 and R = 1 will need special approximations. A necessary first step at the center is to note, from L'Hospital's rule, that since ∂T/∂R → 0 as R → 0, the conduction equation becomes ∂T/∂τ = 2(∂²T/∂R²) at R = 0.
The input data for the computer program should include values for n, Δτ (the time step), d₁, and τ_max (the maximum value of τ for which the solution is of interest). Some control will also be needed for printing the computed temperatures at definite time intervals. Suggested values are: n = 10 and 20, Δτ = 0.005, 0.01, 0.02, 0.05, and 0.1, d₁ = 0, 0.5, and 2, τ_max = 1.5.
Comment briefly on how the computed results depend on the particular choice of grid size and time increment.
7.22 Consider the problem illustrated in Fig. P7.22, of heat transfer to a viscous fluid of thermal diffusivity α flowing with mean velocity u_m through a heated tube of radius a. The temperature θ of the inlet fluid is θ₀, and the tube wall is at a constant temperature θ₁.
Figure P7.22. (Tube-wall temperature θ₁; inlet temperature θ₀; parabolic velocity profile.)

Let r and z be radial and axial coordinates. Define corresponding dimensionless coordinates by R = r/a and Z = αz/(2u_m a²). Assuming steady laminar flow with constant viscosity and negligible axial conduction (see, for example, page 616 of Goldstein [35]), the following PDE governs the dimensionless temperature T = (θ − θ₀)/(θ₁ − θ₀):
subject to the inlet condition T = 0 at Z = 0, 0 < R < 1, and the boundary conditions T = 1 at R = 1, Z > 0, and ∂T/∂R = 0 at R = 0, Z > 0.
Write a program that uses a finite-difference procedure for computing T as a function of R and Z. The input data should include values for: m (number of radial increments ΔR), ΔZ (axial step), and Z_max (maximum value of Z for which T is to be computed).
7.23 A problem closely related to (7.22) is that in which the tube wall is electrically heated at a known rate q (energy per unit area per unit time). See, for example, page 293 of Bird, Stewart, and Lightfoot [34]. In this case, if a new dimensionless temperature is defined by T = (θ − θ₀)/(aq/k), where k is the thermal conductivity of the fluid, the PDE and inlet condition remain unchanged, but the boundary conditions become ∂T/∂R = 1 at R = 1, Z > 0, still with ∂T/∂R = 0 at R = 0, Z > 0.
Write a program that solves this new problem.
7.24 A liquid containing an inlet concentration c₀ of a dissolved reactant enters the region between two wide parallel plates, separated by a distance 2a, as shown in Fig. P7.24.

Figure P7.24

The liquid velocity is assumed to have the uniform value v. A first-order reaction occurs with velocity constant k at the plates, which are coated with a catalyst. Transport in the y direction, normal to the plates, is assumed to be by diffusion only, with diffusion coefficient D; in the x or axial direction, convection is assumed to predominate over diffusion. The problem is to find how the reactant concentration c varies with x and y.
By defining a dimensionless concentration C = c/c₀ and dimensionless distances X = xD/a²v and Y = y/a, the governing equations can be expressed as shown below.
Transport Equation
For X > 0, −1 < Y < 1: ∂²C/∂Y² = ∂C/∂X.
Inlet Condition
Boundary Conditions
in which λ = ka/D.
For values of λ = 0.1, 1, and 10, obtain numerical values for C as a function of X and Y, until the centerline concentration falls to 5% of the inlet value, by each of the following methods. Take advantage of symmetry and solve the problem in the region for which 0 < Y < 1 only.
Method I
Following an argument similar to that used by Carslaw and Jaeger (pages 114-120 of [2]), it may be shown that
in which the p_n are the positive roots of p tan p = λ, with p₁ < p₂ < p₃ ... (see Problem 3.28). Adequate convergence should be ensured by taking enough terms of the series into the summation.
Method II
Use a finite-difference procedure, such as the Crank-Nicolson, DuFort-Frankel, or Saul'yev method. The solution should be performed for several different values of ΔX and ΔY, in order to investigate the effect of grid spacing on convergence.
Method III
Treat the problem as a characteristic-value problem, in the manner of Example 7.9. A procedure, such as that given in Example 4.2 or 4.3, will be needed for finding the eigenvalues and eigenvectors of a matrix.
Comment as to which of the above methods, if any, would lend itself to the solution of a similar problem in which D and/or v were known functions of y.
7.25 A cylindrical cavity, 10 ft in diameter and 10 ft high, is situated deep in ground that is initially at a uniform temperature of 60°F. At a time t = 0, the cavity is filled with liquefied natural gas, maintaining its walls at −260°F. The liquefied gas that evaporates is recondensed and returned to the cavity. The temperature T in the surrounding ground obeys the equation
in which r and z are radial and vertical coordinates. The ground is dry and has the following properties: thermal conductivity k = 2.2 BTU/hr ft °F, density ρ = 125 lb/cu ft, and specific heat c_p = 0.19 BTU/lb °F.
Use a finite-difference method to solve for the transient temperatures in the ground. Hence, plot against time the total rate Q of heat conduction into the cylinder from the
ground, in BTU/hr, for t up to 5 years. What method of solution would you use if the ground did contain an appreciable moisture content?
A more detailed discussion of this problem is given, for example, by Duffy et al. [37].
7.26 Figure P7.26 shows the cross section of two hollow cylindrical anodes that are used for focusing an electron beam passing along their axis. The anodes are separated by a small distance and are maintained at potentials −V₀ and +V₀.

Figure P7.26

Assume here that the anodes are closed at their ends, although in practice there would be a circular opening in each to allow passage of the electron beam. The distribution of potential is governed by Laplace's equation in cylindrical coordinates:
Write a program that will use a finite-difference method to approximate the distribution of potential in the electron "lens" inside the anodes. Use symmetry to simplify the problem as much as possible. The input data should include values for: V₀, R, Z, m (number of axial grid spacings between z = 0 and z = Z), n (number of radial grid spacings between r = 0 and r = R), N (upper limit on the number of iterations to be performed), and ε (a tolerance used in testing for convergence, as in Example 7.6).
For a further description of this problem, see page 332 of Ramey [36]. Also, the motion of an electron inside the lens is discussed in Problem 6.25.
7.27 Figure P7.27 shows a diametral plane section of a pipe of radius r_A and length L with a coaxial tube of radius r_B inserted a distance D into the downstream end. An inviscid fluid enters steadily at A and separates into a central stream B and an annular stream C. At A, B, and C, the axial velocities u have the uniform values u_A, u_B, and u_C, and there is no radial flow at these sections (v = 0).

Figure P7.27

Write a program that accepts values for r_A, r_B, L, D, u_A, and u_B, and that uses a finite-difference procedure to estimate values of the stream function ψ at a network of grid points inside the pipe. Arrange, if possible, for the computer printout to display points on several streamlines (lines of constant ψ) so that the flow pattern is represented directly.
Suggested Test Values
r_A = 10, r_B = 4, L = 20, D = 8 cm, u_A = 20, u_B = 60 cm/sec.
To achieve the above, first note that continuity gives u_C:
Also, the axially-symmetric stream function ψ(r,z) is such that u = (∂ψ/∂r)/r, v = −(∂ψ/∂z)/r, and for irrotational flow (assumed here) it satisfies
For the uniform velocity profiles, the stream function is of the form ψ = ½ur², so by using u_A, u_B, and u_C in turn, the boundary conditions are specified at A, B, and C. Also, ψ has the values 0 on the center line, ½u_A r_A² on the outer wall, and ½u_B r_B² on the wall of the inner tube.
To allow greater generality, we suggest that the reader rephrase the problem in terms of the dimensionless variables R = r/r_A, Z = z/r_A, Ψ = ψ/(½u_A r_A²), with α = r_B/r_A, β = L/r_A, and γ = D/r_A as parameters.
For a particular application of the results computed here, see Problem 6.39.
7.28 A horizontal rectangular plate of uniform thickness t has length l and breadth b. The flexural rigidity of the plate is given by D = Et³/12(1 − υ²), where E = Young's modulus and υ = Poisson's ratio. The edges of the plate are clamped rigidly, and the entire plate is subject to a uniform load q per unit area. Write a program that will determine the downwards deflection w as a function of position.
This problem parallels that of Example 7.7 and involves the solution of the biharmonic equation ∇⁴w = q/D. However, the finite-difference scheme should be set up without anticipating that the grid spacings Δx and Δy are necessarily equal. Also, the boundary conditions along the edges will now be (a) w = 0, and (b) ∂w/∂η = 0, where η is the direction normal to an edge.
The input data for the program should include values for t, l, b, E, υ, q, m, and n (number of grid spacings along the length and breadth, respectively), N (upper limit on the number of iterations), and ε (tolerance used in testing for convergence, as in Example 7.6). Actual values can be patterned after those used in Example 7.7.
The above problem may be modified to include (a) effect of arbitrary loading over the surface, (b) effect of one or more free edges, without support, and (c) deflection of a circular plate. The relevant equations for the latter two cases are given by Hughes and Gaylord on pages 83 and 149 of [33].
7.29 A steel shaft of a hexagonal cross section has a maximum diameter of 2 inches. If the modulus of rigidity of steel is G = 11.5 x 10⁶ lb_f/sq in., compute the torque needed to twist the shaft through an angle of θ = 0.002 radians per foot.

Following Hughes and Gaylord [33], the required torque M is given by
where the integral is over the cross section, and ψ is the solution of
∇²ψ = −2
throughout the cross section, subject to ψ = 0 on the surface.
7.30 (a) Figure P7.30 shows the cross section of an oil pipe-line that is half buried in relatively cool ground. The top half of the pipe is exposed to the sun, and the surface temperature of the pipe may be approximated by T_s = A + B sin θ, where A and B are constants.

Figure P7.30

Assuming that the thermal conductivity of the oil is constant and that heat transfer across the section is by conduction only, the local oil temperature T = T(r,θ) is the solution of the simplified heat-conduction equation in cylindrical coordinates:
Write a program that will perform a finite-difference solution to find T as a function of r and θ. An iterative technique, such as the Gauss-Seidel method, or successive overrelaxation, is suggested. Observe that symmetry makes it necessary to consider only the region −π/2 ≤ θ ≤ π/2, at most. The input data should include values for A, B, k (the number of radial spacings Δr), l (the number of angular increments Δθ between θ = −π/2 and θ = π/2), n (the upper limit on the number of iterations to be performed), and ε (a tolerance used in testing for convergence, as in Example 7.6). Suggested values are A = 125°F, B = 50°F, k = l = 8, n = 150, and ε =
(b) Assuming steady flow along the pipe, the axial velocity v = v(r,θ) of the oil obeys the following PDE (see, for example, page 22 of Hughes and Gaylord [33]):
where dp/dz is the pressure gradient along the pipe, and μ is the viscosity of the oil, which varies with temperature according to the empirical law μ = μ₀e^(c/(T+d)), where μ₀, c, and d are constants.
Write an extension to the program in part (a) that will read values for dp/dz, μ₀, c, d, and a (the pipe radius), and proceed to compute the distribution of velocity v over the cross section, based on the temperatures already computed in part (a). The following values are suggested: dp/dz = −0.05 lb_f/cu ft, μ₀ = 5 x lb_f sec/sq ft, c = 8000°, d = 460°, and a = 2/3 ft.
(c) Finally, write an additional extension to the program that will estimate the total flow rate of oil, Q cu ft/sec, by evaluating numerically the integral
Compare this value of Q with that predicted by assuming a constant viscosity evaluated at the mean temperature A°F:
Note. The elliptic equations in parts (a) and (b) may also be solved by the implicit alternating-direction method (page 452).
7.31 A deep region of liquid is at a uniform initial temperature T₀. At a time t = 0, the temperature at the surface is lowered to T_s, at which value it is held constant. The liquid freezes at a temperature T_f, where T_s < T_f < T₀, so that a frozen region grows into the liquid, as shown in Fig. P7.31.

Figure P7.31

Neglecting possible convection currents and the small upwards motion due to contraction upon freezing, heat transfer is by conduction, and we have to solve
α_A ∂²T_A/∂x² = ∂T_A/∂t  (Region A),
α_B ∂²T_B/∂x² = ∂T_B/∂t  (Region B),
in which α = k/ρc_p is the thermal diffusivity, with k = thermal conductivity, ρ = density, and c_p = specific heat.
A heat balance at the interface results in the equation
which governs the velocity v of movement of the interface. Here, L is the latent heat of fusion, and d is the instantaneous x coordinate of the interface.
Write a program that uses an unconditionally stable finite-difference method to compute the location of the interface and the temperatures in regions A and B as functions of time. Pay particular attention to the problem of locating the interface which, in general, will lie between two consecutive grid points. The input data should consist of values for T₀, T_s, T_f, k_A, ρ_A, c_pA, k_B, ρ_B, c_pB, L, Δx, Δt, and t_max (the maximum time for which solutions are required). Suggested test data, which are relevant to the freezing of water, are: T₀ = 37°F, T_s = 20°F,
T_f = 32°F, L = 144 BTU/lb, t_max = 20 days, with other physical properties given in Table P7.31.

Table P7.31
Region A (Ice) | Region B (Water) | Units

Note that since the density of water falls with decreasing temperature in the range under study, convection currents are likely to be absent. As usual, investigate the dependency of the computed results on particular choices of Δx and Δt. Check your results against those predicted by the analytical solution, given by Carslaw and Jaeger [2] and also discussed in Problem 3.48.
The above problem can be extended to a variety of more complicated situations, such as the freezing of the earth around: (a) an underground cavity containing liquefied natural gas at a very low temperature, (b) a row of buried pipes through which a cold refrigerant is circulating. Also, the contraction effect on freezing could be taken into consideration.
7.32 This problem considers an easier, but less rigorous, method for solving Problem 7.31. Basically, the technique is to treat the frozen and liquid regions as a single continuous phase, in which the physical properties depend on the local temperature.
As shown in Fig. P7.32, which is not to scale, the physical properties are assumed to change linearly in the region T_f ± δ. The peak in c_p now represents the latent-heat effect, and should be adjusted so that the integral of c_p dT from T_f − δ to T_f + δ equals L.

Figure P7.32

The problem is now to solve the single equation, α ∂²T/∂x² = ∂T/∂t, which will be nonlinear, because α depends on the particular temperature at each point. The situation is somewhat similar to that in Example 7.4.
Write a program that implements the above method. The input data will be similar to those in Problem 7.31, with the addition of a value for δ. Different values for δ should be tried. If δ is too large, the results may be rather inaccurate; if too small, the resulting equations may be so highly nonlinear as to cause instabilities.
7.33 An ion-exchange column consists of a cylindrical tube of length L that is packed with beads of ion-exchange resin. A solution containing sodium ions (Na⁺) flows through the column with a superficial velocity v. The resin adsorbs Na⁺ from the solution and returns hydrogen ions (H⁺) in exchange. If the concentrations of Na⁺ are c in the solution and q (based on the total packed volume) in the resin, material balances give the following equations:
Here, ε is the void fraction and r is the rate of adsorption, which varies with the particular conditions. For the Na⁺/H⁺ exchange process on Dowex-50 resin, the following expression is given by Gilliland and Baddour [38]:
Here, c₀ is the concentration of Na⁺ in the feed to the column (x = 0), q₀ is the total exchange capacity of the resin, and k and K are experimentally determined constants.
Devise a finite-difference scheme for predicting the variation of c and q with time t and distance x in the column. Write a computer program that implements the method.
Suggested Test Data
ε = 0.45, c₀ = 0.1 meq/cc, q₀ = 0.86 meq/cc, k = 0.433 cc/meq sec, K = 1.50, L = 50 cm, and v = 0.50 cm/sec. Stop the calculations when c at the column exit (x = L) reaches 95% of its inlet value.
7.34 This problem simulates the performance of a horizontal underground stratum of porous rock that is to be used seasonally for the storage and withdrawal of dry natural gas. Figure P7.34a shows a square grid (Δx = Δy) that approximates a plan of the formation, which has a uniform vertical thickness h, and which is surrounded on all sides by impermeable rock. A cylindrical well for gas injection and withdrawal is located in block A.

Figure P7.34a

At the end of the summer, when injection is complete, the reservoir pressure is uniformly at a high value p₀. Gas is then to be withdrawn from A at a well pressure p_wA that must at least equal a specified delivery pressure p_d. Over the subsequent period 0 ≤ t ≤ t_max, the gas is scheduled to be withdrawn at a volumetric flow rate Q (referred to standard pressure and temperature p_s and T_s) that conforms to the pattern of anticipated demand shown in Fig. P7.34b. Following the guidelines below, write a program that will estimate the maximum feasible peak supply rate Q_max; the output from the program should also include the corresponding well pressure as a function of time.

Figure P7.34b

The equations governing the variation of pressure throughout the reservoir are:
Continuity: ∂(ρu)/∂x + ∂(ρv)/∂y + m = −ε ∂ρ/∂t.
Darcy's law: u = −(k/μ) ∂p/∂x, v = −(k/μ) ∂p/∂y.
Ideal gas law: ρ = Mp/RT.
Here, t = time, ε = rock porosity, k = rock permeability, ρ = gas density, μ = gas viscosity, p = pressure, u and v = superficial velocities in the x and y directions, m = mass rate of withdrawal of gas per unit volume of rock, M = molecular weight, R = gas constant, T = absolute temperature, and consistent units are assumed; ε, k, μ, M, and T are assumed constant.
The general practice for solving such problems is to introduce grid points, such as (i,j), at the center of each block; p_ij is then viewed as the average pressure for the block. Although the well is located in block A, the well pressure p_wA will be different from the average block pressure p_A. According to a simplified treatment (see Henderson, Dempsey, and Tyler [39], for example), in which the exact location of the well within the block is immaterial, the two pressures are related by
in which m_A = QMp_s/RT_s is the total mass rate of gas withdrawal from the well, r_w = radius of the well bore, and πr_e² = ΔxΔy. For a given p_A and m_A, the corresponding p_wA can be found from equation (4). If p_wA < p_d, then of course the scheme becomes unworkable within the framework of minimum delivery pressure as specified above.
Note that when developing a finite-difference formulation of equation (1), m will be zero for all blocks except A, in which we then have the relation m(Δx)²h = m_A.
Suggested Test Data (Not in Consistent Units)
Δx = Δy = 400 ft, h = 27 ft, ε = 0.148, k = 168 md (millidarcies; 1000 md = 1 darcy ≅ 1 centipoise sq cm/sec atm), μ = 0.0128 centipoise, M = 16 lb_m/lb mole (that for methane), R = 10.73 psia cu ft/lb mole °R, T = 545°R, r_w = 4 in., p₀ = 1185 psia, p_d = 520 psia, t_max = 100 days, T_s = 520°R, and p_s = 14.65 psia.
7.35 Perform more elaborate calculations on the lines of Problem 7.34, but now allowing for one or more of the following modifications:
(a) Permeability and porosity no longer uniform, but varying from block to block.
(b) Multiple wells, all coming on stream simultaneously, with identical well pressures.
(c) Multiple wells, scheduled to come on stream in succession, so that the peak withdrawal rate is maximized.
(d) Nonideal gas behavior.
(e) Operation over several yearly cycles, with the reservoir pressure not necessarily uniform at the end of the injection period.
(f) Rectangular grid (Δx ≠ Δy), often useful if the rock formation extends appreciably further in one horizontal direction than in the other.
7.36 An underground rock formation of porosity ε and permeability k is initially saturated with water of viscosity μ₀ and density ρ₀ at a pressure p₀. Part of the formation is to be sealed by injecting into it a grout solution at a constant pressure p₁ from a vertical well. The well is fairly deep so that the flow is predominantly radially outward from the well bore, whose radius is r_w. The grout solution is made up in one batch, and its viscosity μ₁ increases everywhere with time t according to the relation
where a, b, and c are experimentally determined constants.
Assuming that both the grout solution and the water are slightly compressible and obey the same equation of state, Saville [40] presents the following governing equations:
Continuity: (1/r) ∂(rρu)/∂r = −ε ∂ρ/∂t.
Darcy's law: u = −(k/μ) ∂p/∂r.
Equation of state: ln(ρ/ρ₀) = C(p − p₀).
Here, r = radial distance, p = pressure, ρ = density, C = composite compressibility of liquid and rock, u = superficial velocity, and μ = local viscosity (of grout solution or water, as appropriate).

The radius R of the grout/water interface progresses according to
where the subscript R emphasizes that the pressure gradient is evaluated on the water side of the interface. Pressure and velocity are continuous across the interface.
Write a program that uses a finite-difference scheme for predicting the location of the interface as a function of time until it virtually ceases to advance.
Suggested Test Data (Not in Consistent Units)
r_w = 3 in., ε = 0.15, μ₀ = 1 cp, ρ₀ = 62.4 lb/cu ft, C = 7.2 x 10⁻⁶ psia⁻¹, k = 235 md (millidarcies; 1000 md ≅ 1 darcy ≅ 1 cp sq cm/sec atm), p₀ = 480 psia, and p₁ = 850 psia. Investigate two different grout solutions reported in [40]: AM-9, with a = 1.0 cp, b = 3.5 cp/min, c = 74 min; and Siroc, with a = 5.3 cp, b = 4.9 cp/hr, and c = 6.7 hr.
7.37 A circular hole of radius a is drilled through the center of a rectangular plate of length L and width w. A uniform tensile stress T is exerted at both ends of the plate, as shown in Fig. P7.37a.

Figure P7.37a

The internal stresses σ_x, σ_y, and τ_xy (considered positive as shown in Fig. P7.37b) are given [33] by
where the potential function Ω may be treated as zero in the present problem. The Airy stress function φ is the solution of the biharmonic equation

Figure P7.37b

The boundary conditions are: σ_x = T and τ_xy = 0 along the ends AD and BC; σ_y = τ_xy = 0 along the sides AB and CD; on the surface of the hole, both the normal and shear stresses are zero.
Write a computer program that will use a finite-difference method to approximate the distribution of stresses inside the plate.
Suggested Input Data
w = 5 in., L = 5 and 10 in., a = 0.5, 1, and 1.5 in., T = 1000 lb_f/sq in., together with appropriate values for m and n (the number of grid spacings Δx and Δy to be used along AB and DA, respectively), N (upper limit on number of iterations to be performed), and ε (a tolerance used in testing for convergence, as in Example 7.6).
For comparison, an approximate solution to the problem is given on page 80 of Timoshenko and Goodier [41].
7.38 Investigate the possibility of extending Problem 7.37 to the cases of elliptical holes with major axes oriented (a) along the plate, and (b) across the plate. (For large eccentricities, the solution is of particular interest because it approximates the effect of a crack in the plate.)
7.39 Figure P7.39 is a simplified view of the cross section of a wide concrete dam. The internal stresses σ_x, σ_y, and τ_xy obey the PDE given in Problem 7.37, except that the potential function is now Ω = −ρ_c gx, where ρ_c is the density of concrete. The boundary conditions are zero normal and tangential stresses along ABCD, with τ_xy = 0 and σ_x = −ρ_w gx along AE, where ρ_w is the density of the water behind the dam. Along DE, assume as an approximation that τ_xy is constant and that σ_x at any point is proportional to the height of concrete above that point.

Figure P7.39

Write a program that employs a finite-difference method to approximate the distribution of stresses inside the dam.
Suggested Input Data
w₁ = 50, w₂ = 200, h₁ = 250, h₂ = 50 ft, ρ_w = 62.4, ρ_c = 144 lb_m/cu ft, g = 32.2 ft/sec², together with suitable values for m and n (the number of grid spacings Δx and Δy to be used along AE and ED, respectively), N and ε (defined in Problem 7.37).
528 Approximation of the Solution of Partial Differential Equations

The conversion factor 32.2 lbm ft/lbf sec² will also be needed in order to express the stresses in lbf/sq in.

7.40 Write a computer program that uses either the Crank-Nicolson method or one of the unconditionally stable explicit methods of Section 7.13 to solve the problem of unsteady heat conduction across a slab, stated in Problem 7.1. Compare your results with those of Examples 7.1 and 7.2.

7.41 Solve Problem 1.22, concerning a laminar-flow heat exchanger, by treating it as a characteristic-value problem. A procedure, such as that given in Example 4.2 or 4.3, will be needed for finding the eigenvalues and eigenvectors of a matrix.

7.42 Figure P7.42 shows a cross section of a long cooling fin of width W, thickness t, and thermal conductivity k, that is bonded to a hot wall, maintaining its base (at x = 0) at a temperature Tw. Heat is conducted steadily through the fin in the plane of Fig. P7.42, so that the fin temperature T obeys Laplace's equation, ∂²T/∂x² + ∂²T/∂y² = 0. (Temperature variations along the length of the fin, in the z direction, are ignored.)

Figure P7.42

Heat is lost from the sides and tip of the fin by convection to the surrounding air (radiation is neglected at sufficiently low temperatures) at a local rate q = h(Ts − Ta) BTU/hr sq ft. Here, Ts and Ta (°F) are the temperatures at a point on the fin surface and of the air, respectively. If the surface of the fin is vertical, the heat transfer coefficient h obeys the dimensional correlation h = 0.21(Ts − Ta)^… given by Rohsenow and Choi [42].

Write a program that will compute: (a) the temperature distribution inside the fin, and (b) the total rate of heat loss from the fin to the air per foot length of fin.

Suggested Test Data

Tw = 200°F, Ta = 70°F, t = 0.25 in.; investigate all eight combinations of k = 25.9 (steel) and 220 (copper) BTU/hr ft °F, and W = 0.5, 1, 2, and 5 in.

If available, compare the results with those obtained from Problem 6.36, in which temperature variations across the fin are ignored.

7.43 An infinitely long bar of thermal diffusivity α has a square cross section of side 2a. It is initially at a uniform temperature θ0 when it is placed quickly inside a furnace that behaves as a black-body enclosure at a temperature θb. If σ is the Stefan-Boltzmann constant and ε is the emissivity of the surface of the bar, the heat flux density (BTU/hr sq ft, for example) into the surface of the bar at any point is given by

k ∂θ/∂n = σε(θb⁴ − θs⁴).

Here, k is the thermal conductivity of the bar, n is distance measured normally inwards to the bar from the surface, θs is the current local surface temperature, and all temperatures are absolute.

Write down in detail the PDE and the initial and boundary conditions that govern θ = θ(x,y,t), the temperature inside the bar as a function of position coordinates x and y, and time t. Rewrite these governing equations in terms of the dimensionless variables

X = x/a,   Y = y/a,   τ = αt/a²,   and   T = (θ − θ0)/(θb − θ0).

Hence show that the general solution is of the form T = T(X, Y, τ, β, γ), where the dimensionless parameters β and γ are given by

β = k/(σεaθb³),   γ = θ0/θb.

Let τc be the dimensionless time taken for the dimensionless center temperature Tc to rise to a specified fractional value f. Write a computer program that will enable plots to be made of τc against β, with γ as a parameter.

A steel bar has a = 0.1 ft, α = 0.322 sq ft/hr, θ0 = 520°R, k = 25.9 BTU/hr ft °F, and ε = 0.79. It is placed inside a black-body enclosure with θb = 2520°R; σ equals 0.171 × 10⁻⁸ BTU/hr sq ft °R⁴. How long will it take for the center temperature to rise to 1520°R (Tc = 0.5)? How long, if a = 0.2 ft, with all other quantities unchanged?

7.44 In an experiment for simulating the effect of a hot refractory wall close to a tube in a pipestill, a long metal pipe is placed parallel to a very large wall that behaves as a black-body radiator at an absolute temperature Tw. The pipe, which has inside and outside radii r1 and r2, is separated by a distance D from the wall. The pipe wall has a thermal conductivity k and its outer surface has an emissivity ε. A well-mixed fluid at a temperature Tf flows rapidly through the pipe; the local heat flux density (BTU/hr sq ft, for example) from the inner pipe wall to the fluid is

q = h(T1 − Tf),

where h is the relevant heat transfer coefficient and T1 = T1(φ) is the local inner-wall temperature. The polar coordinates (r,φ) are shown in Fig. P7.44.

Figure P7.44

The corresponding radiant flux density to a point on the outer surface, where the temperature is T2 = T2(φ), depends on the geometric view factor F2w = F2w(φ), the fraction of radiation leaving a small area on the outer pipe surface that is directly intercepted by the wall; the remaining fraction, 1 − F2w, is absorbed by the remainder of the surroundings. F2w can be evaluated by the method discussed in Problem 3.17, modified so that one surface is a differential element of area. Ignore heat transfer by convection outside the tube. Assume that the problem is essentially two-dimensional, in the plane of Fig. P7.44.

Write a program that will compute the steady temperature distribution T = T(r,φ) throughout the pipe wall and the net rate of heat transfer to the fluid, per unit length of pipe.

Suggested Test Data

Tw = 1500°R, Tf = 760°R, Ts = 540°R, ε = 0.79, k = 12.8 BTU/hr ft °R, r2 = 1 in. Investigate all eight combinations of r1 = 0.5 and 0.9 in., D = 0.5 and 2.5 in., and h = 20 and 200 BTU/hr sq ft °R. The Stefan-Boltzmann constant is σ = 0.171 × 10⁻⁸ BTU/hr sq ft °R⁴.
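Several of these problems (7.37, 7.39, 7.42, 7.44) reduce to iterating a finite-difference approximation of an elliptic equation until a convergence tolerance ε is met, with N as an iteration limit, as in Example 7.6. As a point of departure, here is a minimal modern sketch of that inner Gauss-Seidel loop (in Python, with an illustrative grid of our own choosing, not from the text) for Laplace's equation with fixed boundary temperatures:

```python
# Gauss-Seidel iteration for Laplace's equation on a rectangular grid.
# Boundary values are held fixed; each interior point is repeatedly
# replaced by the average of its four neighbors until the largest
# change in one sweep falls below the tolerance eps (cf. Example 7.6).
def solve_laplace(T, eps=1e-6, max_iters=10000):
    m, n = len(T), len(T[0])
    for it in range(max_iters):
        biggest = 0.0
        for i in range(1, m - 1):
            for j in range(1, n - 1):
                new = 0.25 * (T[i-1][j] + T[i+1][j] + T[i][j-1] + T[i][j+1])
                biggest = max(biggest, abs(new - T[i][j]))
                T[i][j] = new
        if biggest < eps:
            return T, it + 1     # converged within tolerance
    return T, max_iters          # iteration limit N reached

# Hot base (x = 0) at 200 deg F, other three edges at 70 deg F.
grid = [[70.0] * 6 for _ in range(6)]
for j in range(6):
    grid[0][j] = 200.0
grid, iters = solve_laplace(grid)
```

The convective edges of Problem 7.42 (and the radiation boundary of Problem 7.44) would replace the fixed boundary rows with finite-difference forms of the appropriate flux conditions; the iteration skeleton is unchanged.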

Bibliography

1. G. Birkhoff and R. S. Varga, "Implicit Alternating Direction Methods," Trans. Amer. Math. Soc., 92, 13-24 (1959).
2. H. S. Carslaw and J. C. Jaeger, Conduction of Heat in Solids, 2nd ed., Oxford University Press, New York, 1959.
3. J. Crank and P. Nicolson, "A Practical Method for Numerical Evaluation of Solutions of Partial Differential Equations of the Heat Conduction Type," Proc. Camb. Phil. Soc., 43, 50-67 (1947).
4. J. Douglas, Jr., "On the Numerical Integration of ∂²u/∂x² + ∂²u/∂y² = ∂u/∂t by Implicit Methods," J. Soc. Indust. Appl. Math., 3, 42-65 (1955).
5. J. Douglas, Jr., "On the Relation between Stability and Convergence in the Numerical Solution of Linear Parabolic and Hyperbolic Differential Equations," J. Soc. Indust. Appl. Math., 4, 20-37 (1956).
6. J. Douglas, Jr. and H. H. Rachford, Jr., "On the Numerical Solution of Heat Conduction Problems in Two and Three Space Variables," Trans. Amer. Math. Soc., 82, 421-439 (1956).
7. J. Douglas, Jr., "The Effect of Round-off Error in the Numerical Solution of the Heat Equation," Journal of the A.C.M., 6, 48-58 (1959).
8. E. C. DuFort and S. P. Frankel, "Stability Conditions in the Numerical Treatment of Parabolic Differential Equations," Math. Tables Aids Comput., 7, 135-152 (1953).
9. G. E. Forsythe and W. R. Wasow, Finite-Difference Methods for Partial Differential Equations, Wiley, New York, 1960.
10. S. P. Frankel, "Convergence Rates of Iterative Treatments of Partial Differential Equations," Math. Tables Aids Comput., 4, 65-75 (1950).
11. J. D. Hellums and S. W. Churchill, "Transient and Steady State, Free and Natural Convection, Numerical Solutions: Part 1, The Isothermal, Vertical Plate," A.I.Ch.E. Journal, 8, 690-692 (1962).
12. L. Lapidus, Digital Computation for Chemical Engineers, McGraw-Hill, New York, 1962.
13. P. D. Lax and R. D. Richtmyer, "Survey of the Stability of Linear Finite Difference Equations," Comm. Pure Appl. Math., 9, 267-293 (1956).
14. W. Leutert, "On the Convergence of Approximate Solutions of the Heat Equation to the Exact Solution," Proc. Amer. Math. Soc., 2, 433-439 (1951).
15. J. O. Wilkes and S. W. Churchill, "The Finite-Difference Computation of Natural Convection in a Rectangular Enclosure," A.I.Ch.E. Journal, 12, 161-166 (1966).
16. G. O'Brien, M. Hyman, and S. Kaplan, "A Study of the Numerical Solution of Partial Differential Equations," J. Math. Phys., 29, 233-251 (1951).
17. F. C. W. Olson and O. T. Schultz, "Temperatures in Solids during Heating or Cooling," Ind. Eng. Chem., 34, 874-877 (1942).
18. S. Ostrach, An Analysis of Laminar Free-Convection Flow and Heat Transfer about a Flat Plate Parallel to the Direction of the Generating Body Force, Natl. Advisory Comm. Aeronaut. Tech. Report 1111, 1953.
19. D. W. Peaceman and H. H. Rachford, Jr., "The Numerical Solution of Parabolic and Elliptic Differential Equations," J. Soc. Indust. Appl. Math., 3, 28-41 (1955).
20. I. G. Petrovsky, Lectures on Partial Differential Equations, Interscience, New York, 1954.
21. A. Ralston and H. S. Wilf, Mathematical Methods for Digital Computers, Wiley, New York, 1960.
22. L. F. Richardson, "The Approximate Arithmetical Solution by Finite Differences of Physical Problems Involving Differential Equations with an Application to the Stresses in a Masonry Dam," Philos. Trans. Roy. Soc. London, Series A, 210, 307-357 (1911).
23. R. D. Richtmyer, Difference Methods for Initial Value Problems, Interscience, New York, 1957.
24. J. Douglas, "Alternating Direction Methods for Three Space Variables," Numerische Mathematik, 4, 41-63 (1962).
25. D. Young, "Iterative Methods for Solving Partial Difference Equations of Elliptic Type," Trans. Amer. Math. Soc., 76, 92-111 (1954).
26. H. Z. Barakat and J. A. Clark, "On the Solution of the Diffusion Equation by Numerical Methods," Journal of Heat Transfer, Trans. A.S.M.E., Series C, 88, 421-427 (1966).
27. P. L. T. Brian, "A Finite-Difference Method of High-Order Accuracy for the Solution of Three-Dimensional Transient Heat Conduction Problems," A.I.Ch.E. Journal, 7, 367-370 (1961).
28. J. Fromm, "The Time Dependent Flow of an Incompressible Viscous Fluid," Methods in Computational Physics, Vol. 3, pp. 345-382, Academic Press, New York, 1964.
29. J. E. Welch, F. H. Harlow, J. P. Shannon, and B. J. Daly, The MAC Method; A Computing Technique for Solving Viscous, Incompressible, Transient Fluid-Flow Problems Involving Free Surfaces, Los Alamos Scientific Laboratory of the Univ. of California, Los Alamos, 1966.
30. B. K. Larkin, "Some Stable Explicit Difference Approximations to the Diffusion Equation," Math. of Comp., 18, 196-202 (1964).
31. V. K. Saul'yev, Integration of Equations of Parabolic Type by the Method of Nets, Macmillan, New York, 1964.
32. J. Douglas and J. E. Gunn, "A General Formulation of Alternating Direction Methods, Part I, Parabolic and Hyperbolic Problems," Numerische Mathematik, 6, 428-453 (1964).
33. W. F. Hughes and E. W. Gaylord, Basic Equations of Engineering Science, Schaum Publishing Company, New York, 1964.
34. R. B. Bird, W. E. Stewart, and E. N. Lightfoot, Transport Phenomena, Wiley, New York, 1960.
35. S. Goldstein, ed., Modern Developments in Fluid Dynamics, Oxford University Press, London, 1938.
36. R. L. Ramey, Physical Electronics, Wadsworth Publishing Company, 1961.
37. A. R. Duffy, J. E. Sorenson, R. E. Mesloh, and E. L. Smith, "Heat Transfer Characteristics of Belowground LNG Storage," Chem. Eng. Progr., 63, 55-61 (1967).
38. E. R. Gilliland and R. F. Baddour, "The Rate of Ion Exchange," Ind. Eng. Chem., 45, 330-337 (1953).
39. J. H. Henderson, J. R. Dempsey, and J. C. Tyler, "Use of Numerical Reservoir Models in the Development and Operation of Gas Storage Reservoirs," Paper No. SPE 2009, Symposium on Numerical Simulation of Reservoir Performance, Society of Petroleum Engineers, Dallas, 1968.
40. M. R. Tek, J. O. Wilkes, D. A. Saville, et al., New Concepts in Underground Storage of Natural Gas, The American Gas Association, New York, 1966.
41. S. Timoshenko and J. N. Goodier, Theory of Elasticity, 2nd ed., McGraw-Hill, New York, 1951.
42. W. M. Rohsenow and H. Y. Choi, Heat, Mass, and Momentum Transfer, Prentice-Hall, Englewood Cliffs, New Jersey, 1961.
CHAPTER 8

Statistical Methods

8.1 Introduction: The Use of Statistical Methods

In most branches of science and industry, methods of experimental measurement may be inexact, and measurements themselves may be restricted in quantity. For example, even simple linear measurement is subject to inaccuracies, and the results of a complicated experimental procedure may be influenced by many disturbing factors; also, the testing of mechanical failure of working parts is necessarily destructive, so usually only a small fraction is tested. In such cases, statistical methods may be used to advantage to obtain the best interpretation of the data. A statistical approach to a particular problem may enable the experimenter to design a method of solution that minimizes the effect of experimental error; it will also enable him to estimate the reliability of the results.

Frequently, in statistical problems, the technique is to examine relatively small amounts of experimental data and thus to generalize about larger amounts of data. In a typical statistical investigation which permits certain inferences to be drawn, we are usually confronted with the following sequence of steps:

1. Specifying a mathematical model which represents the situation. This involves:

(a) Assuming a distribution function for the data, based either on experience or on a distribution which might reasonably be expected to fit the data.
(b) Identifying certain parameters of this distribution with what we wish to know about the actual situation.

Note here, especially, the danger of the misapplication of statistical methods. For example, the most elegant statistical analysis of a series of measurements made with a foot rule will be hopelessly inadequate if the rule is only 11½ inches long. Or, the deduction that effect A is due to a cause B may be incorrect if in fact both A and B depend on some common cause C.

2. Deciding on the procedure for taking experimental data or samples, if they are not already available in suitable form.

3. Deciding on which statistics, computed from the data or samples, will be used for making the required inferences. After we have chosen the statistics, we shall probably have to consider their distribution in small samples, as opposed to the statistics in a much larger population.

4. Finally, making the required inferences. For example:

(a) An estimate of the unknown parameters of the assumed population distribution, together with confidence limits for these parameters.
(b) A statement as to our confidence that the original mathematical model was a reasonable one.

The most frequently encountered statistical problems are those which involve one or more of the following features:

1. Reduction of data. Much numerical information may often be condensed into a simple relationship, together with a statement as to the confidence we may place in the relationship.

2. Estimates and tests of significance. From experimental data, certain population parameters can be estimated. It is usually possible to determine whether these estimates differ significantly from preconceived values.

3. Reliability of inferences depending on one or more variables. For example, the total sulfur content in a bulk shipment is predicted from samples taken throughout the shipment, each of which is analyzed for sulfur, possibly by different methods and by different analysts. We then ask, what reliability may be attached to the prediction of total sulfur content in the shipment?

4. Relationships between two or more variables. Suppose that some measurable quantity depends on one or more separate factors. Then, if it is possible by experimental design to control the separate factors at a series of fixed levels and to observe the measurable quantity at each level, the technique known as the analysis of variance is used to evaluate the dependency. A less preferable method statistically, but often more practicable, is to observe the fluctuations in the measurable quantity and the separate factors, which arise naturally during the course of a particular operation, without attempting to control the separate factors at predetermined levels. In this case, methods of regression analysis are employed to evaluate the dependency.

8.2 Definitions and Notation

Statistics is the branch of scientific method concerned with collecting, arranging, and using numerical facts or data arising from natural phenomena or experiment. A statistic is an item of information deduced from the application of statistical methods. A population is a collection of objects having in common some observable or measurable characteristic known as a variate. An individual is a single member of a population. A sample is a group of individuals drawn from a population, usually at random, so that each individual is equally likely to be selected. A random variable is a numerical quantity associated with the variate; its value for a given individual is determined by the value or nature of the variate for that particular individual. A continuous variable is one which can assume any value within some continuous range. A discontinuous or discrete variable is one which can assume only certain discrete values.

The probability, denoted by Pr, that a random variable x, belonging to an individual drawn from a population, shall have some particular value A equals the fraction of individuals in the population having that value A associated with them. The frequency function, f(x), also known as the probability density function, of the random variable x is defined by:

1. f(x0) dx = probability that the continuous random variable x shall have any value between x0 and x0 + dx.
2. f(x0) = probability that a discontinuous random variable x shall have the value x0.

For a continuous distribution, a typical dependency of f(x) on x, known as the frequency distribution, is shown in Fig. 8.1. Although special cases usually have some restriction on the range of x (for example, x ≥ 0), we consider here the general case of −∞ < x < ∞. Note that

Pr(x1 < x < x2) = ∫ from x1 to x2 of f(x) dx = shaded area of Fig. 8.1,

and that ∫ from −∞ to ∞ of f(x) dx = 1, since all values of x must lie somewhere between the extreme limits.

Figure 8.1 Continuous frequency distribution.

The cumulative frequency function, F(x0), shown in Fig. 8.2, and also known as the distribution function, is defined as the probability that x shall not exceed a certain value x0. Note that F(x0) = Pr(x ≤ x0) = ∫ from −∞ to x0 of f(x) dx, F(−∞) = 0, and F(∞) = 1.

Figure 8.2 Cumulative continuous frequency distribution.

A discontinuous frequency distribution is shown in Fig. 8.3. Note that Σx f(x) = 1, the sum being taken over all possible values of x. The corresponding cumulative frequency function is given by F(x0) = Σ f(x), summed over x ≤ x0.

The term ave(y), where y = y(x), means the arithmetic average of y, weighted according to the frequency distribution, so that

ave(y) = ∫ from −∞ to ∞ of y(x) f(x) dx.

The integral sign would be replaced by a summation sign for a discontinuous distribution. The notation E(y), indicating the expected value, is also used for ave(y).

Referring to a particular frequency distribution: (a) the mode is the most frequently occurring value of x, (b) the mean μ is the arithmetic average of x, defined by

μ = ave(x) = ∫ from −∞ to ∞ of x f(x) dx,

and (c) the median μe is that value of x such that half the population lies below it and half above it [that is, F(μe) = 1/2]. These three statistics coincide for symmetrical distributions. For discontinuous distributions, the mean and median may lie between two possible values of x.

For a population, it is also convenient to define the rth moment coefficients, μ'r and μr, about the origin and mean, respectively. They are: (a) about the origin, μ'r = ave(x^r), and (b) about the mean, μr = ave[(x − μ)^r]. Note that the mean μ = μ'1. The moment μ2 is an important statistic known as the variance of x; its square root is called the standard deviation σ. We have

σ² = var(x) = μ2 = ave[(x − μ)²].
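For a discrete distribution, the definitions of this section reduce to finite sums, which are easy to compute directly. A brief illustrative sketch (ours, in Python, not part of the original text):

```python
# Statistics of a discrete frequency function f(x), stored as a dict
# mapping each possible value x to its probability f(x).
def mean(f):
    return sum(x * p for x, p in f.items())              # mu = ave(x)

def variance(f):
    mu = mean(f)
    return sum((x - mu) ** 2 * p for x, p in f.items())  # mu_2 = ave[(x - mu)^2]

def mode(f):
    return max(f, key=f.get)                             # most frequent value

def median(f):
    total = 0.0
    for x in sorted(f):          # smallest x with F(x) >= 1/2
        total += f[x]
        if total >= 0.5:
            return x

# A fair six-sided die: mean 3.5, variance 35/12
die = {x: 1 / 6 for x in range(1, 7)}
```

For the symmetrical die distribution the mean (3.5) sits midway between the two central values, while the median, as defined above, is the smallest value at which the cumulative frequency reaches 1/2.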
Figure 8.3 Discontinuous frequency distribution.

The mean of a particular distribution serves to locate it as a whole, whereas its variance indicates the magnitude of its spread about the mean.

8.3 Laws of Probability

Let A1, A2, A3 represent either single numerical values or sets of numerical values; if the latter, let x = A1 mean that x has some value within the set A1. Then:

1. If A1, A2, and A3 are mutually exclusive,

Pr(x = A1 or A2 or A3) = Pr(x = A1) + Pr(x = A2) + Pr(x = A3).   (8.3)

2. If x and y are two independent random variables that can have values A'1, A'2, A'3, ..., and A''1, A''2, A''3, ..., respectively, then

Pr(x = A'i and y = A''j) = Pr(x = A'i) × Pr(y = A''j).   (8.4)

8.4 Permutations and Combinations

1. The number of different arrangements, or permutations, of x items chosen from n distinct items is

P(n,x) = n!/(n − x)!   (also written nPx).   (8.5)

Note that 0! = 1.

2. The number of ways, or combinations, in which x items may be chosen from n distinct items, regardless of their order of arrangement, is

C(n,x) = n!/[(n − x)! x!]   (also written (n x) or nCx).   (8.6)

3. If n items are made up of x1, x2, x3, ... items of types 1, 2, 3, ..., respectively, then the number of different arrangements of all the n items is

n!/(x1! x2! x3! ⋯).

8.5 Population Statistics

A few important relations concerning means and variances of populations will now be developed. These relations do not depend on any particular frequency distribution.

1. For one random variable x, by definition,

var(x) = ave[(x − μ)²] = ave(x²) − 2μ ave(x) + μ²,

that is,

var(x) = ave(x²) − μ².   (8.7)

2. For two random variables x and y, consider for the sake of argument a discontinuous distribution, and let

f(xi,yj) = frequency function for the pair of values xi, yj,
f(xi) = frequency function for the value xi, regardless of the value of y,
f(yj) = frequency function for the value yj, regardless of the value of x.

Note that Σj f(xi,yj) = f(xi), Σi f(xi,yj) = f(yj), and Σi Σj f(xi,yj) = 1. Therefore,

ave(x + y) = Σij (xi + yj) f(xi,yj) = Σi xi f(xi) + Σj yj f(yj) = μx + μy.   (8.8)

The covariance, σxy, of x and y is defined by

cov(x,y) = σxy = ave[(x − μx)(y − μy)] = ave(xy − μx y − μy x + μx μy),

that is,

σxy = ave(xy) − μx μy.   (8.9)

The covariance is a measure of the dependency of x and y on each other. In particular, if x and y are independent, f(xi,yj) = f(xi) f(yj), that is,

ave(xy) = Σij xi yj f(xi,yj) = Σi xi f(xi) Σj yj f(yj) = μx μy.

Thus from (8.9), σxy = 0.

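The counting formulas (8.5) and (8.6) are convenient to check numerically; the sketch below (a modern Python illustration, not from the text) uses the standard-library functions math.perm and math.comb, which implement exactly these definitions:

```python
import math

def P(n, x):
    return math.perm(n, x)      # n!/(n - x)!  -- ordered arrangements

def C(n, x):
    return math.comb(n, x)      # n!/((n - x)! x!)  -- unordered choices

def arrangements(counts):
    """Arrangements of n items of which counts = [x1, x2, ...] are alike,
    i.e. the multinomial count n!/(x1! x2! ...)."""
    n = sum(counts)
    result = math.factorial(n)
    for x in counts:
        result //= math.factorial(x)
    return result
```

For instance, C(52, 13) reproduces the 635,013,559,600 possible bridge hands quoted in Example 8.1 below.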
The above results can be reproduced for a continuous distribution by replacing summations with integrals.

3. A linear combination z of the random variables x1, x2, ..., xn is defined by

z = Σ ai xi,   (8.10)

where the a's are constants, and the summation is understood to be over i = 1, 2, ..., n unless otherwise stated.* Let μi = ave(xi), σi² = var(xi), σij = cov(xi,xj). Then, by an extension of (8.8),

μz = ave(z) = ave(Σ ai xi) = Σ ai ave(xi) = Σ ai μi.   (8.11)

By using (8.7) and (8.9), the reader can verify that this leads to

σz² = var(z) = Σ ai² σi² + Σ (i ≠ j) ai aj σij.   (8.12)

In particular, if all the xi are mutually independent, σij = 0 and σz² = Σ ai² σi².

4. The mean and variance of an arbitrary function z(x,y) of the two random variables x and y may be obtained approximately as follows. Taylor's expansion about z(μx,μy), neglecting second-order terms, gives

z ≈ z(μx,μy) + zx(x − μx) + zy(y − μy),

where the partial derivatives zx, zy are evaluated at μx, μy. Then, since ave(x − μx) = ave(y − μy) = 0:

ave(z) ≈ z(μx,μy),
var(z) ≈ zx² σx² + zy² σy² + 2 zx zy σxy.

* Note. Throughout this chapter, the summation parameter and its range will often be omitted for brevity, if the intended meaning is unambiguous. For example, the right-hand side of equation (8.12) is, more fully, the sum of ai² σi² for i = 1 to n, plus the double sum of ai aj σij over i = 1 to n and j = 1 to n with j ≠ i. Also, a summation such as the sum of xi for i = 1 to n may be written as Σ xi or Σ x.

Example. As an illustration of (1), (2), and (3) listed under Section 8.5, consider a population consisting of five cards numbered 0, 1, 2, 3, and 4 placed in a hat. We have:

1. One random variable x. Let x = number on a card which is drawn at random from the hat and then replaced. Clearly,

μ = ave(x) = (0 + 1 + 2 + 3 + 4)/5 = 2,
σ² = ave(x²) − μ² = (0 + 1 + 4 + 9 + 16)/5 − 2² = 2.

2. Two random variables. Let x and y be the numbers from the first and second draws. As for (1), μx = μy = 2, σx² = σy² = 2. Also, cov(x,y) = σxy = ave(xy) − μx μy. Suppose the second draw is made before the first card is replaced; then

σxy = [0(1 + 2 + 3 + 4) + 1(0 + 2 + 3 + 4) + ⋯]/20 − 2² = −0.5.

However, if the first card is replaced before the second card is drawn,

σxy = [0(0 + 1 + 2 + 3 + 4) + 1(0 + 1 + 2 + 3 + 4) + ⋯]/25 − 2² = 0,

confirming that for this case x and y are independent of each other.

3. Linear combination. Suppose z = (x + y), without replacement. Then

ave(z) = 2 + 2 = 4,
var(z) = 2 + 2 + 2(−0.5) = 3.
EXAMPLE 8.1
DISTRIBUTION OF POINTS IN A BRIDGE HAND

Problem Statement

A bridge deck consists of 52 cards divided into four suits (spades, hearts, diamonds, and clubs), each of which contains four honor cards (one ace, one king, one queen, and one jack) and nine other cards. A bridge hand contains 13 cards which are assumed to be drawn at random from the deck. The honor point count of a hand is reckoned by assigning 4 points to each ace it contains, 3 to each king, 2 to each queen, 1 to each jack, and zero to the other cards (zero also for any other properties of the hand), and computing the total number of points thus assigned. The problem is to find the probability of obtaining any given point count from 0 (the minimum) through 37 (the maximum). These probabilities will then constitute the frequency function f(p) for the distribution of a random variable p which equals the point count of a bridge hand.

Method of Solution

We first note that the total number of different possible hands is large, being C(52,13) = 635,013,559,600, and it is out of the question to generate and examine each hand for its point count. However, by focusing attention on the honor cards (the only cards that contribute to the point count), the problem can be simplified considerably.

The first step is to establish all the possible honor combinations that could occur in a hand. Let the number of jacks, queens, kings, and aces in any hand be j, q, k, and a, respectively. Now j, q, k, and a may independently assume any of the values 0, 1, 2, 3, or 4. Thus there are 5⁴ = 625 possible combinations, of which a few will eventually be rejected as containing more than 13 honors. The whole pattern can be generated by cyclically varying j, q, k, and a every 125th, 25th, 5th, and each hand, respectively, as shown in Table 8.1.1.

Second, the probability of obtaining a hand having N honor cards comprising j jacks, q queens, k kings, and a aces is found as follows. The probability of one particular sequence of N particular honor cards and any 13 − N nonhonor cards is

36! 39! / [(23 + N)! 52!].

But the number of different possible sequences involving the N particular honor cards is

P(13, j) × P(13 − j, q) × P(13 − j − q, k) × P(13 − j − q − k, a) = 13!/(13 − N)!.

Hence, the probability rN of being dealt N particular honor cards in any sequence within one hand is

rN = [13!/(13 − N)!] × [36!/(23 + N)!] × [39!/52!].   (8.1.1)

Table 8.1.1 (column headings): Hand Number (i); Number of Jacks (j), Queens (q), Kings (k), and Aces (a); Point Count p (= j + 2q + 3k + 4a); Number of Honors, N (= j + q + k + a).

The cyclical pattern is easily generated by using integer division, in which the quotient is rounded down to the next lower integer. For example, we use the formula q = i/25 − 5((i/25)/5) for the number of queens in hand i; this gives the pattern q = 0, 1, 2, 3, 4, 0, 1, 2, etc., in which the changes occur every 25th hand.
Note that the rN may be generated from the recursion relation

rN = rN−1 (14 − N)/(23 + N),   N = 1, 2, ..., 13,

with

r0 = (36! 39!)/(23! 52!) = (24 × 25 × ⋯ × 36)/(40 × 41 × ⋯ × 52).

But the N honor cards may be any j jacks, any q queens, any k kings, and any a aces, and hence may be selected in any of C(4,j) × C(4,q) × C(4,k) × C(4,a) ways. Note that since C(n,x) = n!/[x!(n − x)!], we have C(4,0) = C(4,4) = 1, C(4,1) = C(4,3) = 4, and C(4,2) = 6. Hence the probability of obtaining N honor cards composed of any j jacks, q queens, k kings, and a aces is

C(4,j) C(4,q) C(4,k) C(4,a) rN,

which can be computed fairly easily.

Finally, it is simple for the computer to work through the 625 possible combinations described above, summing the probabilities of all possibilities leading to a given point count p, and hence to obtain the frequency function f(p) for the distribution of point counts. The program given below also produces the cumulative frequency function F(p), the mean μ, and the variance σ² of the point-count distribution.
Flow Diagram

[Flow diagram: compute ri = 13!/(13 − i)! × 36!/(23 + i)! × 39!/52! for i = 0, 1, ..., 13; then, for each of the 625 hands (generating j, q, k, and a by integer division), accumulate the point-count probabilities.]

FORTRAN Implementation

List of Principal Variables

Program Symbol      Definition

AVG, VAR            Mean μ, and variance σ², of the point-count distribution.
C†                  Vector containing the values ci = C(4,i), i = 0, 1, 2, 3, and 4.
CUMPRB†             Vector containing the probabilities Fp of obtaining any point count from 0 to p in any one hand.
NJ, NQ, NK, and NA  Number of jacks, queens, kings, and aces (j, q, k, and a), respectively.
NPTS                Point count, p.
NHONS               Number of honor cards, N.
PROB†               Vector containing the probabilities fp of obtaining exactly p points in any one hand.
R†                  Vector containing the values rN given by equation (8.1.1).

† Due to FORTRAN limitations, all subscripts in the program are advanced by one; e.g., r0 through r13 in the text become R(1) through R(14) in the program.

Program Listing
C     APPLIED NUMERICAL METHODS, EXAMPLE 8.1
C     DISTRIBUTION OF POINTS IN A BRIDGE HAND.
C
C     THE TOTAL POINT COUNT OF A HAND IS COMPUTED WITH ACE = 4,
C     KING = 3, QUEEN = 2, AND JACK = 1 (POINTS ARE NOT COUNTED
C     HERE FOR DISTRIBUTIONAL OR OTHER FEATURES).  THE PROB-
C     ABILITY OF EACH POINT COUNT FROM 0 THROUGH 37 IS COMPUTED,
C     TAKING INTO ACCOUNT ALL THE POSSIBLE HONOR DISTRIBUTIONS
C     LEADING TO EACH PARTICULAR POINT COUNT.  THE PROBABILI-
C     TIES THEN REPRESENT THE FREQUENCY FUNCTION, FROM WHICH THE
C     CUMULATIVE FREQUENCY FUNCTION AND THE MEAN AND VARIANCE OF
C     THE POINT-COUNT DISTRIBUTION ARE ALSO OBTAINED.
C
      IMPLICIT REAL*8 (A-H, O-Z)
      DIMENSION C(5), R(14), PROB(38), CUMPRB(38)
C
C     ..... SET THE C ARRAY AND INITIALIZE PROB ARRAY .....
      DATA C/ 1.0, 4.0, 6.0, 4.0, 1.0/
      DO 1 I = 1, 38
 1    PROB(I) = 0.0
C
C     ..... COMPUTE COEFFICIENTS R(1) THROUGH R(14) .....
      R(1) = 24.0/40.0
      DO 2 I = 1, 12
      FI = I
 2    R(1) = R(1)*(24.0 + FI)/(40.0 + FI)
      DO 3 IP1 = 1, 13
      FIP1 = IP1
 3    R(IP1 + 1) = R(IP1)*(13.0 - FIP1 + 1.0)/(24.0 + FIP1 - 1.0)
C
C     ..... ESTABLISH HANDS CONTAINING ALL POSSIBLE HONOR
C           CARD DISTRIBUTIONS, AND COMPUTE THE CORRESPOND-
C           ING TOTAL POINTS AND HONOR CARDS FOR EACH CASE .....
      DO 5 IP1 = 1, 625
      I = IP1 - 1
      NJ = I/125
      NQ = I/25 - 5*((I/25)/5)
      NK = I/5 - 5*((I/5)/5)
      NA = I - 5*(I/5)
      NPTS = NJ + 2*NQ + 3*NK + 4*NA
      NHONS = NJ + NQ + NK + NA
C
C     ..... COMPUTE PROBABILITIES OF ALL POINT COUNTS .....
      IF (NHONS .GT. 13) GO TO 5
      PROB(NPTS + 1) = PROB(NPTS + 1) + C(NJ + 1)*C(NQ + 1)*C(NK + 1)
     1   *C(NA + 1)*R(NHONS + 1)
 5    CONTINUE
C
C     ..... COMPUTE CUMULATIVE PROBABILITIES .....
      CUMPRB(1) = PROB(1)
      DO 6 I = 2, 38
 6    CUMPRB(I) = CUMPRB(I - 1) + PROB(I)
C
C     ..... PRINT RESULTS IN TABULATED FORM .....
      WRITE (6,200)
      I = 0
      WRITE (6,201) I, PROB(1), CUMPRB(1)
      DO 7 I = 1, 37
 7    WRITE (6,202) I, PROB(I + 1), CUMPRB(I + 1)
C
C     ..... COMPUTE MEAN AND VARIANCE .....
      AVG = 0.0
      DO 9 I = 1, 37
      FI = I
 9    AVG = AVG + FI*PROB(I + 1)
      VAR = AVG*AVG*PROB(1)
      DO 10 I = 1, 37
      FI = I
 10   VAR = VAR + (FI - AVG)**2*PROB(I + 1)
      WRITE (6,203) AVG, VAR
C
      CALL EXIT
C
C     ..... FORMATS FOR INPUT AND OUTPUT STATEMENTS .....
 200  FORMAT (1H1, 9X, 39HDISTRIBUTION OF POINTS IN A BRIDGE HAND/1H0,
     1   9X, 11HPOINT COUNT, 3X, 11HPROBABILITY, 3X, 12HCUMUL. PROB./1H )
 201  FORMAT (10X, I6, 9X, F10.8, 4X, F10.8)
 202  FORMAT (10X, I6, 8X, E11.4, 4X, F10.8)
 203  FORMAT (1H0, 9X, 7HMEAN = , F10.7/1H0, 9X, 11HVARIANCE = ,
     1   F10.7)
C
      END

Computer Output
DISTRIBUTION OF POINTS IN A BRIDGE HAND

POINT COUNT    PROBABILITY    CUMUL. PROB.

MEAN     =  9.9999994

VARIANCE = 17.0588225
540 Statistical Methods

Discussion of Results
The computed probabilities of obtaining any specified number of points in a bridge hand are shown as the vertical lines of Fig. 8.1.1. Together, they constitute the frequency function f(p) for the random variable p, the point count. The computed values for the mean (10.000) and variance (17.059) can actually be deduced theoretically as follows, without having to consider the exact form of the frequency function.

Regard the point-count p as the linear combination

    p = y₁ + y₂ + ⋯ + y₁₃,

where y₁, y₂, ..., y₁₃ are the points associated with each card in the hand. Then, if E denotes expected or average value,

    E(y₁) = E(y₂) = ⋯ = E(y₁₃) = 40/52.

Hence, μ = E(p) = 13 × 40/52 = 10. Also, var(p) = E(p²) − E²(p). Now,

    E(p²) = E[(y₁ + y₂ + ⋯ + y₁₃)²]
          = E(y₁² + y₂² + ⋯ + y₁₃² + Σ_{i≠j} yᵢyⱼ)
          = 13 E(y₁²) + 13 × 12 × E(y₁y₂).

But E(y₁y₂) may be obtained by considering all the possible combinations of two honor cards in positions 1 and 2, such as: (a) two queens, giving y₁y₂ = 4, which may occur in C(4,2) = 6 ways, or (b) one ace, one king, giving y₁y₂ = 12, in 4 × 4 = 16 ways, etc. The total contribution to y₁y₂ is 8 × (4 + 3 + 2 + 1)² − 2 × (4² + 3² + 2² + 1²) = 740, and there are C(52,2) = 52 × 51/2 possible pairs of cards occupying places 1 and 2. Hence,

    E(y₁y₂) = 740/(52 × 51/2) = 740/1326.

The variance of p is finally obtained by noting that E(y₁²) = 4 × (4² + 3² + 2² + 1²)/52 = 120/52, so that

    E(p²) = 13 × 120/52 + 156 × 740/1326 = 117.059;

i.e., σ² = var(p) = 117.059 − 10² = 17.059. Q.E.D.

The frequency function f(p) is almost symmetrical about the mean μ = 10; the symmetry is not perfect, since negative point counts are impossible, whereas on the high side they may run up to 37, although with very small probability. The smooth curve drawn on Fig. 8.1.1 represents the frequency function, (8.26), for the normal distribution with mean μ = 10 and variance σ² = 17.059. The normal distribution will be discussed in more detail later; note here, however, that even though its frequency function is continuous, it does give a fairly good approximation to f(p) for the discontinuous point-count distribution.

Figure 8.1.1 Distribution of points in a bridge hand. The curve shows the frequency function for the normal distribution with mean = 10.0 and variance = 17.059 (standard deviation = 4.130).
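The enumeration used by the program, and the theoretical mean and variance above, can be checked with a short independent sketch (this is our own reconstruction in modern Python, not the book's program; the helper name `r` mirrors the program's R array, the probability that a hand holds a given set of k honors and no others):

```python
from math import comb

def r(k):
    # P(a 13-card hand holds a *given* set of k honor cards and no others)
    return comb(36, 13 - k) / comb(52, 13)

# Enumerate counts of jacks, queens, kings, aces held (0..4 each)
prob = [0.0] * 38
for nj in range(5):
    for nq in range(5):
        for nk in range(5):
            for na in range(5):
                h = nj + nq + nk + na
                if h > 13:          # more honors than cards: impossible
                    continue
                pts = nj + 2*nq + 3*nk + 4*na
                w = comb(4, nj) * comb(4, nq) * comb(4, nk) * comb(4, na)
                prob[pts] += w * r(h)

mean = sum(p * f for p, f in enumerate(prob))
var = sum((p - mean)**2 * f for p, f in enumerate(prob))
# mean = 10.0 and var ≈ 17.0588, as derived in the discussion
```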

8.6 Sample Statistics

Consider a sample which comprises n independent observations on the random variable x.

1. The sample mean x̄ is defined by

    x̄ = (1/n) Σ_{i=1}^{n} xᵢ.   (8.13)

Then

    ave(x̄) = (1/n) Σ_{i=1}^{n} ave(xᵢ) = (1/n)(nμ) = μ.

Thus, ave(x̄) = μ, so that the sample mean is an unbiased estimate of the population mean. The variance of the sample mean is

    σ_x̄² = ave[(x̄ − μ)²] = ave(x̄²) − μ² = (1/n²) ave[(Σ xᵢ)²] − μ².

But ave(xᵢ²) = σ² + μ² and, if the x's are independent, ave(xᵢxⱼ) = ave(xᵢ) ave(xⱼ) = μ² for i ≠ j; that is,

    σ_x̄² = (1/n²)[n(σ² + μ²) + n(n − 1)μ²] − μ² = σ²/n.

2. The sample variance s² is defined by

    s² = (1/(n − 1)) Σ_{i=1}^{n} (xᵢ − x̄)².   (8.15)

Thus,

    ave(s²) = (n/(n − 1))[ave(x²) − ave(x̄²)].

But ave(x²) = σ² + μ² and ave(x̄²) = σ²/n + μ², that is,

    ave(s²) = (n/(n − 1))(σ² − σ²/n) = σ².

Therefore, the sample variance is an unbiased estimate of the population variance, so that (n − 1) and not n should be used in defining s². This allows for the fact that Σ(xᵢ − m)² is a minimum when m is the sample mean, whereas, if it were available, m should preferably be taken as the population mean, μ, in which case n and not (n − 1) would be placed in the denominator of (8.15).

8.7 Moment-Generating Functions

The following population moments have already been defined:

about the origin,

    μᵣ′ = ave(xʳ) = ∫_{−∞}^{∞} xʳ f(x) dx;   (8.16)

about the mean,

    μᵣ = ave[(x − μ)ʳ] = ∫_{−∞}^{∞} (x − μ)ʳ f(x) dx.

By expanding the last integrand, the μᵣ may be obtained in terms of the μᵣ′. In particular, μ₂ = σ² = μ₂′ − μ².

If several moments are required, it is sometimes a little tedious to perform several integrations of the type (8.16). For these cases, it is often more convenient to work not with the original frequency function f(x) but with a derived function called the moment generating function (MGF). The MGF about the origin is defined, for an arbitrary parameter θ, by

    M_x(θ) = ave(e^{θx}) = ∫_{−∞}^{∞} e^{θx} f(x) dx.   (8.18)

Since e^{θx} = 1 + θx + θ²x²/2! + ⋯, we have

    M_x(θ) = ∫_{−∞}^{∞} f(x) dx + θ ∫_{−∞}^{∞} x f(x) dx + (θ²/2!) ∫_{−∞}^{∞} x² f(x) dx + ⋯
           = 1 + θμ₁′ + (θ²/2!)μ₂′ + ⋯.

Thus,

    μᵣ′ = (dʳM_x/dθʳ)_{θ=0}.   (8.19)

In certain situations, the integral of (8.18) may not exist. In this case, it is convenient to define instead the characteristic function, C_x(θ) = ave(e^{iθx}) (where i = √(−1)), which will converge, and for which very similar results follow. For the present purposes, however, we assume that M_x(θ) does exist. The MGF has the following properties:

1. There is a unique relation between M_x(θ) and f(x); that is, each is defined by the other.

2. Let x and y be two independent random variables whose frequency distributions are known. Then the MGF of the random variable (x + y) is given by

    M_{x+y}(θ) = ave[e^{θ(x+y)}] = ave(e^{θx}) ave(e^{θy}) = M_x(θ) M_y(θ).
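The (n − 1) claim in Section 8.6 is easy to verify numerically: averaging s² over many samples should recover the population variance, while division by n is biased low by the factor (n − 1)/n. The following sketch (our own illustration; seed and sample sizes are arbitrary) does exactly that:

```python
import random

random.seed(1)
n, trials = 5, 20000
pop_mu, pop_sigma = 10.0, 2.0          # population variance = 4.0

acc_nm1 = acc_n = 0.0
for _ in range(trials):
    xs = [random.gauss(pop_mu, pop_sigma) for _ in range(n)]
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)
    acc_nm1 += ss / (n - 1)            # unbiased estimator
    acc_n += ss / n                    # biased estimator

mean_nm1 = acc_nm1 / trials            # ≈ 4.0
mean_n = acc_n / trials                # ≈ (n-1)/n * 4.0 = 3.2
```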

Thus the statistics of the (x + y) distribution are known immediately.

3. It yields the moment coefficients from equation (8.19).

4. There also exists an MGF about the mean,

    M_{x−μ}(θ) = ave[e^{θ(x−μ)}] = e^{−μθ} M_x(θ),

for which similar results follow.

8.8 The Binomial Distribution

Consider a series of n independent observations on a random variable which can have only two values (for example, success or failure) with respective probabilities p and q = 1 − p. The probability of any one particular sequence of x successes and n − x failures is pˣqⁿ⁻ˣ. Also, the number of different such sequences, each containing x successes and n − x failures, equals the number of ways x objects can be chosen from a total of n objects, that is, C(n,x) = n!/[x!(n − x)!]. Hence, the number of successes x is a random variable whose frequency distribution is

    f(x) = C(n,x) pˣ qⁿ⁻ˣ,   (8.22)

where x may have the values 0, 1, 2, ..., n. Since (8.22) represents the general term of the expansion (p + q)ⁿ, such a distribution is called the binomial distribution. Note that Stirling's formula may be of assistance in evaluating the various terms of (8.22) for cases of large n, if factorial tables are not available.

Example. Based on previous experience, a certain line of manufactured items is thought to contain 5% defectives. What is the probability that a sample of 5 items will not contain more than one defective item?

    Probability = f(0) + f(1) = (0.95)⁵ + 5(0.95)⁴(0.05) = 0.977.

Thus there is a 97.7% chance of not more than one defective.

The mean and variance of the binomial distribution are easily obtained from the MGF, which is given by

    M_x(θ) = ave(e^{θx}) = Σ_x C(n,x)(pe^{θ})ˣ qⁿ⁻ˣ = (q + pe^{θ})ⁿ.   (8.23)

That is, μ₁′ = (dM/dθ)_{θ=0} = np, and μ₂′ = (d²M/dθ²)_{θ=0} = n(n − 1)p² + np. Thus,

    μ = ave(x) = μ₁′ = np,
    σ² = var(x) = μ₂′ − μ² = npq.

8.9 The Multinomial Distribution

Suppose we have a random variable which can assume not only two values (as in the binomial distribution) but can fall into any one of k different classes with respective probabilities f₁, f₂, ..., f_k. Then for a total of n observations, the probability that x₁, x₂, ..., x_k observations will fall into classes 1, 2, ..., k, respectively, is given by the frequency function of the multinomial distribution:

    f(x₁, x₂, ..., x_k) = [n!/(x₁! x₂! ⋯ x_k!)] f₁^{x₁} f₂^{x₂} ⋯ f_k^{x_k}.   (8.24)

With respect to the variable xᵢ, the multinomial distribution has mean = nfᵢ and variance = nfᵢ(1 − fᵢ).

Example. A bag contains a large number of balls of which 50% are red, 30% black, and 20% white. A sample of 5 balls is drawn. What is the probability that 3 will be red, none black, and 2 white? We have f₁ = 0.5, f₂ = 0.3, f₃ = 0.2; whence

    f(3, 0, 2) = [5!/(3! 0! 2!)](0.5)³(0.3)⁰(0.2)² = 0.05.

That is, there is a 1 in 20 probability of the specified drawing occurring. Note that the requirement of "a large number of balls" can be omitted if the balls are drawn one at a time and then replaced.

8.10 The Poisson Distribution

It may be shown fairly easily that as n → ∞ while np remains finite (and equal to λ, for instance), the binomial distribution tends to the limit

    f(x) = e^{−λ} λˣ/x!.   (8.25)

The frequency function of (8.25) is that of the Poisson distribution, and is applicable to discrete events occurring at random in a continuum of time or space, when we regard as a random variable x the frequency or number of events occurring in a relatively small interval of that time or space.

For example, incidents of radioactive decay may occur from second to second in a Geiger counter; over a minute, we could say that they occurred at an average rate of λ incidents per second. Then the probabilities of 0, 1, 2, ... incidents occurring in any one second would be expected to equal the successive terms of the Poisson distribution, namely, e^{−λ}, λe^{−λ}, λ²e^{−λ}/2!, ....

Note that in contrast to the binomial distribution, there is no upper limit to the number of incidents that might occur in a second, even though the chances of a large number would be very small. In other words, we cannot say how many incidents did not occur in a given second.

The MGF of the Poisson distribution is

    M_x(θ) = Σ_{x=0}^{∞} e^{θx} e^{−λ} λˣ/x! = exp[λ(e^{θ} − 1)];

that is,

    μ₁′ = (dM/dθ)_{θ=0} = λ, and μ₂′ = (d²M/dθ²)_{θ=0} = λ² + λ.

Thus

    μ = ave(x) = μ₁′ = λ, and σ² = var(x) = μ₂′ − μ² = λ.

Alternatively, if we take the MGF for the binomial distribution, equation (8.23), and let n → ∞, np → λ, we obtain

    M_x(θ) = [1 + p(e^{θ} − 1)]ⁿ → exp[λ(e^{θ} − 1)],

as before.
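The two worked examples above, and the Poisson limit of the binomial, can be checked with a few lines of arithmetic. The sketch below is our own illustration (variable names are arbitrary); the last block verifies numerically that a binomial term with large n and np = λ is close to the corresponding Poisson term:

```python
from math import comb, exp, factorial

# Binomial example: P(at most one defective in 5 items, 5% defective rate)
p, n = 0.05, 5
f = lambda x: comb(n, x) * p**x * (1 - p)**(n - x)
p_at_most_one = f(0) + f(1)                   # ≈ 0.977

# Multinomial example: 3 red, 0 black, 2 white with f = 0.5, 0.3, 0.2
coef = factorial(5) // (factorial(3) * factorial(0) * factorial(2))
p_draw = coef * 0.5**3 * 0.3**0 * 0.2**2      # = 0.05

# Poisson as the limit of the binomial: n large, np = lam fixed
lam, big_n = 2.0, 100_000
pb = comb(big_n, 3) * (lam/big_n)**3 * (1 - lam/big_n)**(big_n - 3)
pp = exp(-lam) * lam**3 / factorial(3)        # e^-2 * 2^3 / 3!
```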
EXAMPLE 8.2
POISSON DISTRIBUTION RANDOM NUMBER GENERATOR

Problem Statement

Write a program that will generate a sequence of n random numbers coming from a population conforming to the Poisson distribution with mean λ. Compare the observed frequency of appearance of each particular number with its theoretical frequency.

Method of Solution

We shall make use of a "flat" random number generator, described below, which returns a random variable z as a decimal fraction having the uniform frequency function h(z) = 1 for 0 < z < 1, and h(z) = 0 for all other values of z. Let x_k = z₀z₁z₂ ⋯ z_k be the product of a sequence of k + 1 such random variables. Then the lowest value of k which first causes x_k to be less than or equal to e^{−λ} will be a random variable having the Poisson distribution with mean λ. A sketch of the proof follows.

First note that if y and z are independent random variables, with frequency functions g(y) and h(z), respectively, it can be shown that x = yz has the frequency function

    f(x) = ∫ g(y) h(x/y) (1/y) dy.

In general, the integration limits will be y = −∞ and y = +∞, but in the present case the particular forms of g and h will lead to much more restricted limits.

Let f_k(x_k) be the frequency function for x_k. Since f₀(x₀) = h(z₀) = 1, we have, for x₁ = x₀z₁,

    f₁(x₁) = ∫_{x₁}^{1} (1/x₀) dx₀ = ln(1/x₁).

By induction, we can show that x_k = x_{k−1}z_k has the frequency function

    f_k(x_k) = [ln(1/x_k)]ᵏ/k!.

We also note the existence of the recursion relation I_k = x lnᵏx − kI_{k−1} for evaluating integrals of the form I_k = ∫ lnᵏx dx. Hence, the probability that x_k not be greater than e^{−λ} is

    Pr(x_k ≤ e^{−λ}) = ∫₀^{e^{−λ}} f_k(x_k) dx_k;

that is,

    Pr(x_k ≤ e^{−λ}) = λᵏe^{−λ}/k! + Pr(x_{k−1} ≤ e^{−λ}).

By induction, this means that

    Pr(x_k ≤ e^{−λ}) = Σ_{i=0}^{k} λⁱe^{−λ}/i!.

Thus, the probability that x_k does not fall above e^{−λ} corresponds to the cumulative frequency function for the first k + 1 members of the Poisson distribution. Hence, we conclude that if x_k ≤ e^{−λ}, but x_{k−1} > e^{−λ}, then k obeys the Poisson distribution with mean λ.

Generator for Uniformly Distributed Random Numbers

The multiplicative congruential method for generating a sequence of uniformly distributed pseudorandom integers xᵢ, i = 2, 3, ..., n, uses the recursion relation

    xᵢ = a x_{i−1} (mod m),

in which a and m are integer constants. A random floating-point number zᵢ, uniformly distributed between 0 and 1, is then given by zᵢ = xᵢ/m. The modulus m is a large integer, usually of the form 2ʳ or 10ʳ (for binary and decimal machines, respectively), its magnitude being restricted by the available word length. The exact choice for m (and a) is usually only critical if we are trying to write a very efficient program in basic machine instructions; for example, on a binary machine, multiplication by an integral power of two can be achieved by simply shifting digits the appropriate number of places, etc. The method is discussed more fully in [7], [8], and [9]. To summarize, we can say the following:

(a) For m = 2ʳ, appropriate for efficient machine-language programs with binary computers, the maximum period before the numbers start repeating is m/4. Assuming r > 2, this maximum period will be achieved, provided that the starting value x₁ is odd, and that a differs by 3 from the nearest multiple of 8. A good choice for a is close to 2^{r/2}.

(b) For m = 10ʳ (decimal machines), the maximum period is m/20, which will be achieved, for r > 2, provided that x₁ is odd and not a multiple of 5, and that a differs from the nearest multiple of 200 by one of the following numbers: 3, 11, 13, 19, 21, 27, 29, 37, 53, 59, 61, 67, 69, 77, 83, or 91. A good choice for a is close to 10^{r/2}.

The term pseudorandom is used to describe the successive values xᵢ; although they may have satisfactory randomness, a predetermined algebraic formula has been used for their generation. The fact that the period may be large does not guarantee that the particular sequence will necessarily possess acceptable random properties.

In the present example, we choose the values m = 2²⁰, a = 2¹⁰ + 3, and x₁ = 566,387. Thus, the conditions mentioned in (a) above are satisfied, and we are also sure that the product a x_{i−1} will not exceed 2³¹ − 1, which is the maximum allowable integer on the machine being used. For convenience, the above method is written as a subroutine named RANDOM. Note, finally, that once xᵢ has been computed, x_{i−1} is no longer needed. Thus, all the xᵢ can be stored successively in the same memory location.
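The whole method is compact enough to sketch in modern Python. This is our own rendering, not the book's FORTRAN: the same congruential constants (m = 2²⁰, a = 2¹⁰ + 3, x₁ = 566,387) feed the product-of-uniforms Poisson generator, and the function names are ours:

```python
from math import exp

M, A = 2**20, 2**10 + 3
state = 566387                     # x1, odd, as required for full period

def uniform():
    """Flat pseudorandom number z in (0, 1) by the congruential method."""
    global state
    state = (A * state) % M
    return state / M

def poisson(lam):
    """Smallest k whose running product of uniforms falls to exp(-lam)."""
    c, x, k = exp(-lam), 1.0, -1
    while x > c:
        x *= uniform()
        k += 1
    return k

# Generate 10000 numbers with mean 2 and tally them, as the program does
counts = {}
for _ in range(10000):
    k = poisson(2.0)
    counts[k] = counts.get(k, 0) + 1
# observed counts should track 10000 * exp(-2) * 2**k / k!
```

Note that 10,000 draws consume roughly 30,000 uniforms, comfortably inside the generator's period of m/4 = 262,144.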

Flow Diagram
Main Program (Excluding details of sorting and comparing numbers that are generated)

    x ← 1, k ← −1
    Repeat:
        z ← random variable uniformly distributed between 0 and 1
            (Subroutine RANDOM)
        x ← xz
        k ← k + 1
    until x ≤ c = e^{−λ}; the Poisson random number is k.

Subroutine RANDOM (Argument: z)

    Enter.  First call?  If so: a ← 2¹⁰ + 3, x ← 566,387.
    On every call: x ← ax (mod m), z ← x/m.  Return.

FORTRAN Implementation
List of Principal Variables

Program Symbol    Definition

(Main)
C          The constant c = e^{−λ}.
FACT       Vector whose kth element FACT(K) contains k!.
IPRINT     Number of individual random numbers to be printed.
K          Random variable k obeying the Poisson distribution.
LAMBDA     Mean value, λ.
N          Total number, n, of random numbers to be generated.
NOBS       Vector containing observed frequencies of particular random numbers.  NOBSZ is the observed frequency of zeros.
RANDOM     Subroutine that gives a random number z between 0 and 1 with uniform probability.
THEORN     Vector containing the theoretical frequencies of particular random numbers.  THEORZ is the theoretical frequency of zeros.
X          Repeated product, x_k = z₀z₁ ⋯ z_k.
Z          Random number z, uniformly distributed between 0 and 1.

(Subroutine RANDOM)
A          Multiplier, a.
I          An integer switch that is preset to 1 by using the DATA statement.  It identifies the first entry; thereafter, it has the value 0.
M          Modulus, m.
X          Current random integer, xᵢ.

Program Listing
Main Program
C        APPLIED NUMERICAL METHODS, EXAMPLE 8.2
C        POISSON DISTRIBUTION RANDOM NUMBER GENERATOR.
C
C        THE SUBROUTINE RANDOM PRODUCES A RANDOM NUMBER UNIFORMLY
C        DISTRIBUTED BETWEEN 0 AND 1.  THIS IS THEN TRANSLATED INTO
C        A RANDOM VARIABLE K FOLLOWING THE POISSON DISTRIBUTION WITH
C        MEAN LAMBDA.  N SUCH NUMBERS ARE GENERATED.  NOBS(K) AND
C        THEORN(K) ARE THE OBSERVED AND THEORETICAL COUNTS FOR A
C        PARTICULAR VALUE OF K.
C
      DIMENSION NOBS(50), THEORN(50), FACT(50)
      REAL LAMBDA
C
C        ..... READ AND CHECK DATA .....
 1    READ (5,100) N, IPRINT, LAMBDA
      WRITE (6,200) N, IPRINT, LAMBDA
      FN = N
      NOBSZ = 0
      KMAX = 0
      DO 2 I = 1, 50
 2    NOBS(I) = 0
      C = EXP(- LAMBDA)
C
C        ..... GENERATE N POISSON DISTRIBUTION RANDOM NUMBERS .....
      WRITE (6,204)
      DO 11 I = 1, N
      X = 1.0
      K = -1
 3    IF (X .LE. C) GO TO 5
C
C        ..... CALL ON FLAT R.N.G. SUBROUTINE RANDOM .....
      CALL RANDOM (Z)
      X = X*Z
      K = K + 1
      GO TO 3
C
C        ..... INCREASE KMAX TO LARGEST NUMBER YET GENERATED .....
 5    IF (K .LE. KMAX) GO TO 7
      KMAX = K
C
C        ..... PRINT THE FIRST IPRINT NUMBERS .....
 7    IF (I .GT. IPRINT) GO TO 9
      WRITE (6,201) I, K
C
C        ..... AUGMENT TOTAL COUNT OF NUMBER JUST GENERATED .....
 9    IF (K .NE. 0) GO TO 10
      NOBSZ = NOBSZ + 1
      GO TO 11
 10   NOBS(K) = NOBS(K) + 1
 11   CONTINUE
C
C        ..... COMPUTE THEORETICAL FREQUENCIES, PRINT NOBS AND THEORN .....
      WRITE (6,202)
      K = 0
      THEORZ = C*FN
      WRITE (6,203) K, NOBSZ, THEORZ
      K = 1
      FACT(1) = 1.0
      THEORN(1) = C*LAMBDA*FN
      WRITE (6,203) K, NOBS(1), THEORN(1)
      IF (KMAX .LT. 2) GO TO 1
      DO 12 K = 2, KMAX
      FK = K
      FACT(K) = FACT(K-1)*FK
      THEORN(K) = C*LAMBDA**K*FN/FACT(K)
 12   WRITE (6,203) K, NOBS(K), THEORN(K)
      GO TO 1

Program Listing (Continued)

C        ..... FORMATS FOR INPUT AND OUTPUT STATEMENTS .....
 100  FORMAT (10X, I5, 11X, I4, 13X, F7.3)
 200  FORMAT (1H1, 10X, 44HPOISSON DISTRIBUTION RANDOM NUMBER GENERATOR
     1   /1H0, 15X, 9HN       = , I4/ 16X, 9HIPRINT  = , I4/
     2   16X, 9HLAMBDA  = , F10.5)
 201  FORMAT (1H , 15X, 5HI =  , I5, 5X, 5HK =  , I5)
 202  FORMAT (1H0, 10X, 44HTHE OBSERVED AND THEORETICAL FREQUENCIES ARE
     1   /1H0, 15X, 25HK         NOBS     NTHEOR/1H )
 203  FORMAT (I17, I11, F13.4)
 204  FORMAT (1H0, 10X, 32HTHE FIRST FEW RANDOM NUMBERS ARE/1H )
C
      END

Subroutine RANDOM
C        SUBROUTINE FOR GENERATING RANDOM NUMBERS HAVING A UNIFORM
C        DISTRIBUTION, BY THE MIXED MULTIPLICATIVE CONGRUENTIAL METHOD.
C
      SUBROUTINE RANDOM (Z)
      DATA I /1/
      INTEGER A, X
      IF (I .EQ. 0) GO TO 1
      I = 0
      M = 2**20
      FM = M
      X = 566387
      A = 2**10 + 3
C
 1    X = MOD(A*X, M)
      FX = X
      Z = FX/FM
      RETURN
      END

Data
N = 1000,    IPRINT = 20,    LAMBDA = 2.0

N = 1000,    IPRINT = 20,    LAMBDA = 5.0

Computer Output

POISSON D I S T R I B U T I O N RANDOM NUMBER GENERATOR POISSON D I S T R I B U T I O N RANDOM NUMBER GENERATOR

N = 1000 N = 1000
IPRINT = 20 IPRINT = 20
LAMBDA = 2.00000                                   LAMBDA = 5.00000

THE F I R S T FEW RANDOM NUMBERS ARE THE F I R S T FEW RANDOM NUMBERS ARE

THE OBSERVED AND THEORETICAL FREQUENCIES ARE THE OBSERVED AND THEORETICAL FREQUENCIES ARE

K NOBS NTHEOR K NOBS NTHEOR



Discussion of Results
One thousand random numbers have been generated for each of the cases λ = 2 and λ = 5. The observed distributions seem to conform reasonably to those predicted theoretically; this conclusion is further substantiated by using a χ² test in Example 8.4.

8.11 The Normal Distribution

Many random variables are distributed approximately according to the normal, or Gaussian, law. Such variables are characterized by random deviations on either side of a fairly well-defined central value or mean. For example, successive observations on (a) the heights of different men belonging to a certain race, (b) a particular experimental measurement subject to several random errors, or (c) the time taken to travel to work along a given route, would be expected to conform to the normal distribution law. Other variables might be transformed to give an approximately normal distribution, for example, the logarithm of the diameter of particles in a powder.

The frequency function of the normal distribution is

    f(x) = e^{−(x−μ)²/2σ²}/(σ√(2π)),   (8.26)

which is shown in Fig. 8.4. The area under the curve is, of course, unity. It is shown below that μ and σ² are, in fact, the mean and variance of this distribution. It is shown in the next section that the frequency function (8.26) is in complete agreement with the concept of a large number of small random fluctuations, characterized by the parameter σ, about a central value, μ. The normal distribution can be used to represent a random variable, such as the height of men, not having negative values, provided that f(x) has already fallen to a negligibly small value for small positive values of x.

Figure 8.4 The normal distribution.

The statistics of the distribution are obtained from the MGF. Noting that

    ∫_{−∞}^{∞} e^{−u²/2σ²} du = σ√(2π),

then

    M_x(θ) = [1/(σ√(2π))] ∫_{−∞}^{∞} e^{θx} exp[−(x − μ)²/2σ²] dx = exp[μθ + σ²θ²/2].   (8.27)

Thus

    μ₁′ = (dM/dθ)_{θ=0} = μ, or ave(x) = μ₁′ = μ,

and

    μ₂′ = (d²M/dθ²)_{θ=0} = μ² + σ², or var(x) = μ₂′ − μ² = σ².

The standardized form is obtained by changing the variable to ξ = (x − μ)/σ, which is termed an N(0,1) variable because its distribution is normal with zero mean and unit variance. In fact, the corresponding frequency function is given by

    φ(ξ) = (1/√(2π)) e^{−ξ²/2}.   (8.28)

Representative values of the frequency function φ(ξ) and the corresponding cumulative distribution function Φ(ξ), both correct to four decimal places, are given in Table 8.1, together with values of ξ_P (correct to about ±0.002) which have a probability P of being exceeded. Table 8.1 is derived in Example 8.3. The levels P = 0.2, 0.1, etc., are frequently termed the 20%, 10% levels, etc.

Table 8.1 Values of the Standardized Normal Distribution

  ξ      φ       Φ          ξ      φ       Φ
 0.0   0.3989  0.5000      2.0   0.0540  0.9772
 0.1   0.3970  0.5398      2.1   0.0440  0.9821
 0.2   0.3910  0.5793      2.2   0.0355  0.9861
 0.3   0.3814  0.6179      2.3   0.0283  0.9893
 0.4   0.3683  0.6554      2.4   0.0224  0.9918
 0.5   0.3521  0.6915      2.5   0.0175  0.9938
 0.6   0.3332  0.7258      2.6   0.0136  0.9953
 0.7   0.3122  0.7580      2.7   0.0104  0.9965
 0.8   0.2897  0.7881      2.8   0.0079  0.9974
 0.9   0.2661  0.8159      2.9   0.0060  0.9981
 1.0   0.2420  0.8413      3.0   0.0044  0.9986
 1.1   0.2178  0.8643      3.1   0.0033  0.9990
 1.2   0.1942  0.8849      3.2   0.0024  0.9993
 1.3   0.1714  0.9032      3.3   0.0017  0.9995
 1.4   0.1497  0.9192      3.4   0.0012  0.9997
 1.5   0.1295  0.9332      3.5   0.0009  0.9998
 1.6   0.1109  0.9452      3.6   0.0006  0.9998
 1.7   0.0940  0.9554      3.7   0.0004  0.9999
 1.8   0.0790  0.9641      3.8   0.0003  0.9999
 1.9   0.0656  0.9713      3.9   0.0002  1.0000
 2.0   0.0540  0.9772      4.0   0.0001  1.0000

  P     0.20   0.10   0.05   0.02   0.01
 ξ_P   0.843  1.282  1.647  2.056  2.329

  P     0.005  0.002  0.001
 ξ_P   2.578  2.880  3.092

The normal distribution is probably the most important distribution in the whole of statistical theory. It is easily shown via the MGF that the linear combination z = Σ aᵢxᵢ of normally distributed variables xᵢ is itself normally distributed (with mean Σ aᵢμᵢ and variance Σ aᵢ²σᵢ²). In particular, for an N(μ,σ²) population, the sample mean x̄ is an N(μ,σ²/n) variable.

Example. A population is known to be distributed as N(40,9). What value would the mean of four items drawn at random from the population have a 1% probability of exceeding?

Assuming that the population is sufficiently large so that the items are independent of one another, then the mean of the four items is x̄ = (x₁ + x₂ + x₃ + x₄)/4, which is a linear combination of normally distributed variables and is itself normally distributed.

Figure 8.5 Population and sample (n = 4) distributions.

Then, σ_x̄² = σ²/n = 9/4, or σ_x̄ = 1.5. From Table 8.1, the value of ξ which has a P = 0.01 probability of being exceeded is ξ_P = 2.329 = (x̄ − μ)/σ_x̄. Thus the required value is x̄ = 40 + 2.329 × 1.5 = 43.49 (see Fig. 8.5). That is, in the long run, the sample mean will exceed 43.49 once in every 100 samples.

8.12 Derivation of the Normal Distribution Frequency Function

Assume that superimposed on the "true" value m for any measurement x, there is a large number n of very small errors δ, of which X are in the positive direction; that is, x = m + (2X − n)δ. Let the probability of any one error being in the positive direction be p. From the binomial distribution (8.22), f(X) = C(n,X) pˣ qⁿ⁻ˣ, so the MGF for the distribution of x is

    M_x(θ) = ave(e^{θx}) = e^{(m−nδ)θ} Σ_X C(n,X)(pe^{2δθ})ˣ qⁿ⁻ˣ = e^{(m−nδ)θ}(q + pe^{2δθ})ⁿ.

Now, if positive and negative errors are equally likely, p = 1/2, so that

    M_x(θ) = e^{(m−nδ)θ}[(1 + e^{2δθ})/2]ⁿ = e^{mθ}(cosh δθ)ⁿ = exp[mθ + nδ²θ²/2 + ⋯].

Now let n → ∞ and δ → 0, while nδ² (= c², for instance) remains finite, so that terms of higher order in δ tend to zero. The MGF then becomes

    M_x(θ) = exp[mθ + c²θ²/2].

But for m = μ and c² = σ², this is identical with the form (8.27) obtained by the integration of the normal frequency function (8.26).
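The N(40,9) worked example is easy to reproduce with the error function, since Φ(ξ) = [1 + erf(ξ/√2)]/2. The sketch below is our own check (function name `norm_cdf` is ours) that the value 43.49 is indeed exceeded by the sample mean about 1% of the time:

```python
from math import erf, sqrt

def norm_cdf(x, mu=0.0, sigma=1.0):
    """Cumulative distribution function of N(mu, sigma^2)."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

mu, sigma, n = 40.0, 3.0, 4        # N(40, 9): variance 9, so sigma = 3
se = sigma / sqrt(n)               # std. deviation of the sample mean: 1.5
x = mu + 2.329 * se                # 43.49, using the tabulated 1% point
tail = 1.0 - norm_cdf(x, mu, se)   # probability of exceeding it, ~0.01
```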
EXAMPLE 8.3
TABULATION OF THE STANDARDIZED NORMAL DISTRIBUTION
Problem Statement

For a standardized normal variable ξ, the frequency function φ, the cumulative frequency function Φ, and the probability P that a certain value ξ_P will be exceeded are illustrated in Fig. 8.3.1 and are defined by

    φ(ξ) = (1/√(2π)) e^{−ξ²/2},   Φ(ξ) = ∫_{−∞}^{ξ} φ(t) dt,   P = 1 − Φ(ξ_P).

The problem is to construct tables of: (a) φ and Φ versus ξ, for 0 ≤ ξ ≤ nh, with increments in ξ of h, and (b) ξ_P versus P, for certain values P₁, P₂, ..., P_q, with the knowledge that no value of ξ higher than nh will be involved.

Method of Solution

Note that owing to symmetry about the origin, only one-half of the distribution is considered. The computation is performed in three main steps. First, values of φⱼ are calculated corresponding to ξⱼ = jh, for j = 0, 1, 2, ..., n. Second, values of Φⱼ are obtained at the alternate points j = 0, 2, 4, ..., n, where n is even, by repeated applications of Simpson's rule:

    Φ_{j+2} = Φⱼ + (h/3)(φⱼ + 4φ_{j+1} + φ_{j+2}),

with Φ₀ = 0.5. Simpson's rule overestimates the integral by an amount (h⁵/90) d⁴φ(η)/dξ⁴, where ξⱼ < η < ξ_{j+2}. Now d⁴φ/dξ⁴ = e^{−ξ²/2}(ξ⁴ − 6ξ² + 3)/√(2π), which has a maximum value of 3/√(2π), at ξ = 0. In the following program, h = 0.05 and n = 80, so a conservative estimate of the maximum error in Φ, occurring at ξ = 4.0, is (0.05)⁵ × 3 × 40/(90√(2π)), which is less than 2 × 10⁻⁷, an inconsequential amount. Third, the value of ξ_P corresponding to P, i.e., to Φ = 1 − P, is found by a table look-up procedure followed by linear interpolation for ξ between two consecutive entries of Φ.

Figure 8.3.1 The standardized normal distribution.
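The three steps can be rendered compactly in modern Python (this is an independent sketch, not the book's FORTRAN; variable names are ours). With h = 0.05 and n = 80 it reproduces the tabulated ξ_P ≈ 2.33 for P = 0.01:

```python
from math import exp, pi, sqrt

h, n = 0.05, 80

# Step 1: frequency function at xi_j = j*h
phi = [exp(-(j * h)**2 / 2) / sqrt(2 * pi) for j in range(n + 1)]

# Step 2: cumulative distribution at alternate points by Simpson's rule
Phi = {0: 0.5}
for j in range(0, n, 2):
    Phi[j + 2] = Phi[j] + (h / 3) * (phi[j] + 4 * phi[j + 1] + phi[j + 2])

# Step 3: table look-up + linear interpolation for xi_P with P = 0.01
target = 1.0 - 0.01
js = sorted(Phi)
for a, b in zip(js, js[1:]):
    if Phi[b] > target:
        xi_p = a * h + 2 * h * (target - Phi[a]) / (Phi[b] - Phi[a])
        break
# xi_p ≈ 2.33
```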

Flow Diagram


FORTRAN Implementation
List of Principal Variables
Program Symbol    Definition

CAPPHI     Vector containing values of the cumulative frequency function, Φⱼ.
DXI        Increment, Δξ = h.
JP1        Subscript equivalent to j.  Due to FORTRAN limitations, we have 1 ≤ JP1 ≤ N + 1 in the program, corresponding to 0 ≤ j ≤ n in the text.
N          Total number n of increments Δξ; n must be even.
NPROBS     Total number q of probabilities P to be tabulated.  Only values from P = 0 to P = 0.5 are to occur.
PHI        Vector containing values of the frequency function, φⱼ.
PROB       Vector containing specified values of P.
R          1/√(2π).
X          Vector containing values of ξ_P.
XI         Vector containing values of the standardized normal variable, ξⱼ.
Y          Vector containing values of 1 − P.
Program Listing
C        APPLIED NUMERICAL METHODS, EXAMPLE 8.3
C        TABULATION OF VALUES OF THE STANDARDIZED NORMAL DISTRIBUTION.
C
C        THE FREQUENCY FUNCTION (F.F.) AND THE CUMULATIVE DISTRIBUTION
C        FUNCTION (C.D.F.) ARE TABULATED AGAINST THE STANDARDIZED NORMAL
C        VARIABLE (S.N.V.).  THE NUMBER OF ENTRIES AND THEIR SPACING
C        ARE CONTROLLED BY THE INPUT DATA.  THE INTEGRATION FOR THE
C        C.D.F. IS ACHIEVED USING SIMPSON'S RULE.  THOSE VALUES OF THE
C        S.N.V. WHICH HAVE SPECIFIED PROBABILITIES OF BEING EXCEEDED ARE
C        ALSO TABULATED.
C
      DIMENSION XI(100), PHI(100), CAPPHI(100), PROB(20), X(20), Y(20)
C
      R = 1.0/SQRT(2.0*3.14159)
      READ (5,100) N, NPROBS, DXI, (PROB(I), I = 1, NPROBS)
      NP1 = N + 1
C
C        ..... COMPUTE FREQUENCY FUNCTION .....
      DO 1 JP1 = 1, NP1
      FJ = JP1 - 1
      XI(JP1) = FJ*DXI
 1    PHI(JP1) = R*EXP(-XI(JP1)*XI(JP1)/2.0)
C
C        ..... COMPUTE CUMULATIVE DISTRIBUTION FUNCTION .....
      CAPPHI(1) = 0.5
      DO 2 JP1 = 3, NP1, 2
 2    CAPPHI(JP1) = CAPPHI(JP1-2) +
     1   (PHI(JP1-2) + 4.0*PHI(JP1-1) + PHI(JP1))*DXI/3.0
C
C        ..... COMPUTE VALUES OF S.N.V. WHICH HAVE
C              SPECIFIED PROBABILITIES OF BEING EXCEEDED .....
      DO 5 K = 1, NPROBS
      Y(K) = 1.0 - PROB(K)
      DO 3 JP1 = 1, NP1, 2
      IF (CAPPHI(JP1) .GT. Y(K)) GO TO 4
 3    CONTINUE
 4    X(K) = XI(JP1-2) + (2.0*DXI)*
     1   (Y(K) - CAPPHI(JP1-2))/(CAPPHI(JP1) - CAPPHI(JP1-2))
 5    CONTINUE
C
C        ..... ARRANGE THE HEADINGS PROPERLY AND PRINT THE RESULTS .....
      WRITE (6,200)
      NOVER2 = N/2
      NOV2P1 = N/2 + 1
      DO 6 JP1 = 1, NOV2P1, 2
      ISUB = JP1 + NOVER2
 6    WRITE (6,201) XI(JP1), PHI(JP1), CAPPHI(JP1), XI(ISUB), PHI(ISUB),
     1   CAPPHI(ISUB)
C
C        ..... NOT FORGETTING THE PROBABILITIES .....
      WRITE (6,202)
      DO 7 J = 1, NPROBS
 7    WRITE (6,203) PROB(J), X(J)
      CALL EXIT
C
C        ..... FORMATS FOR INPUT AND OUTPUT STATEMENTS .....
 100  FORMAT (4X, I4, 13X, I3, 10X, F6.2/ /(5F10.5))
 200  FORMAT (1H1/ 7X, 60HTABULATION OF VALUES OF THE STANDARDIZED NORM
     1AL DISTRIBUTION/1H0, 6X, 6HS.N.V., 2X, 9HFREQUENCY, 2X, 10HCUM.DIS
     2TR., 7X, 6HS.N.V., 2X, 9HFREQUENCY, 2X, 11HCUM. DISTR./8X, 4H(XI),
     3   3X, 8HFUNCTION, 4X, 8HFUNCTION, 10X, 4H(XI), 3X, 8HFUNCTION, 4X,
     4   8HFUNCTION/1H /)
 201  FORMAT (7X, F5.3, 4X, F7.5, 4X, F7.5, 10X, F5.3, 4X, F7.5,
     1   4X, F7.5)
 202  FORMAT (1H0, 10X, 4HPROB, 8X, 2HXI/1H0)
 203  FORMAT (7X, F8.4, 6X, F6.4)
C
      END

Program Listing (Continued)


Data
N = 80,     NPROBS = 8,     DXI = 0.05
VALUES OF PROB(1) ... PROB(NPROBS) ARE
0.2       0.1       0.05      0.02      0.01
0.005     0.002     0.001

Computer Output
TABULATION OF VALUES OF THE STANDARDIZED NORMAL DISTRIBUTION

S.N.V.   FREQUENCY   CUM. DISTR.        S.N.V.   FREQUENCY   CUM. DISTR.
 (XI)    FUNCTION    FUNCTION            (XI)    FUNCTION    FUNCTION
Discussion of Results
This example has illustrated the use of the computer in compiling tables of statistical functions. The values for the frequency function are correct to the fifth decimal place; those for the cumulative distribution function may be taken as correct within ±0.00001. The values ξ_P, however, being obtained from a linear interpolation procedure, will be more seriously in error. Comparison with other tables [1] shows that the ξ_P computed here are too high by an amount not exceeding 0.002; for most work this error is acceptable.
8.13 The χ² Distribution

Let ξ₁, ξ₂, ..., ξᵥ be N(0,1) variables, that is, v variables drawn from a normally distributed population having zero mean and unit variance. The variable χ² is defined for a given value of v by

    χ² = ξ₁² + ξ₂² + ⋯ + ξᵥ² = Σ_{i=1}^{v} ξᵢ².   (8.30)

The variable χ could have been defined as the square root of (8.30), but χ² is of greater practical utility. The frequency function for the distribution, which may be verified by showing that it has the same MGF as that of the sum of the squares of v N(0,1) variables, is

    f(χ²) = e^{−χ²/2}(χ²)^{v/2−1}/[2^{v/2} Γ(v/2)].   (8.31)

The gamma function is, by definition,

    Γ(α) = ∫₀^{∞} e^{−x} x^{α−1} dx, for α > 0.   (8.32)

For α an integer, it has the property that Γ(α + 1) = α!.

The frequency function is shown in Fig. 8.6 for various values of the parameter v, known as the number of degrees of freedom. From the viewpoint of tests of significance, which we shall be considering shortly, the information presented in Fig. 8.7 is more helpful. For any v, Fig. 8.7 gives that value χ²_P which has a probability P of being exceeded.

The MGF of the χ² distribution is

    M_{χ²}(θ) = ave(e^{θχ²}) = [1/(2^{v/2} Γ(v/2))] ∫₀^{∞} e^{θχ²} e^{−χ²/2}(χ²)^{v/2−1} d(χ²).

Substituting

    χ²/2 = x/(1 − 2θ)  and  d(χ²) = 2 dx/(1 − 2θ)

gives

    M_{χ²}(θ) = [1/Γ(v/2)] (1 − 2θ)^{−v/2} ∫₀^{∞} e^{−x} x^{v/2−1} dx = (1 − 2θ)^{−v/2}.

We then conclude:

1. μ₁′ = (dM/dθ)_{θ=0} = v, and μ₂′ = (d²M/dθ²)_{θ=0} = v(v + 2). Thus

    μ = ave(χ²) = v,
    σ² = var(χ²) = μ₂′ − μ² = 2v.

2. The MGF for the variable χ² = χ₁² + χ₂² is the product of the MGFs of χ₁² and χ₂², each of which is based on v₁ and v₂ degrees of freedom respectively, so that

    M_{χ²}(θ) = (1 − 2θ)^{−(v₁+v₂)/2}.

That is, χ² has the χ² distribution with (v₁ + v₂) degrees of freedom.

Finally, note that χ² also has the properties:

1. χ² tends toward the normal distribution N(v,2v) as v becomes large. This tendency may be seen in Fig. 8.6.

2. √(2χ²) tends toward N(√(2v − 1), 1) even more quickly than does (1) for large v.

Figure 8.6 χ² distribution frequency function.

Figure 8.7 Probability points of the χ² distribution.
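The stated moments μ = v and σ² = 2v can be checked directly against the frequency function (8.31) by crude numerical integration. The sketch below (our own illustration; the midpoint rule, grid, and names are arbitrary choices) does this for v = 5:

```python
from math import exp, gamma

def chi2_pdf(x, v):
    """Frequency function (8.31) of the chi-square distribution."""
    return exp(-x / 2) * x**(v / 2 - 1) / (2**(v / 2) * gamma(v / 2))

# Midpoint-rule moments on 0 <= x <= 100 (the tail beyond is negligible)
v, dx = 5, 0.001
moments = [0.0, 0.0, 0.0]
for i in range(100_000):
    x = dx * (i + 0.5)
    p = chi2_pdf(x, v) * dx
    moments[0] += p
    moments[1] += x * p
    moments[2] += x * x * p

total, mean = moments[0], moments[1]
var = moments[2] - mean**2
# total ≈ 1, mean ≈ v = 5, var ≈ 2v = 10
```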

8.14 χ² as a Measure of Goodness-of-Fit

Consider a random variable which can fall into any one of k different classes with respective probabilities f₁, f₂, ..., f_k. For a total of n observations, the expected number falling in the ith class is nfᵢ. Let the actual observed number in the ith class be xᵢ. The xᵢ would obey the multinomial distribution (8.24), and on this basis it can be shown [3] that the variable

    χ² = Σ (Observed − Expected)²/Expected = Σᵢ₌₁ᵏ (xᵢ − nfᵢ)²/(nfᵢ)    (8.33)

obeys the χ² distribution approximately. The approximation improves as the expected frequency nfᵢ in each class becomes greater. For every nfᵢ > 20, there is hardly any loss of accuracy. The relevant number of degrees of freedom is k − r, where r is the number of restrictions imposed on the fᵢ (see examples which follow).

The expected frequencies are usually based on the hypothesis of a particular distribution. Therefore, this hypothesis can be tested by evaluating χ² and checking to see if its value is reasonable in view of: (a) the number of degrees of freedom involved, and (b) the confidence level required, that is, how "sure" we wish to be.

Example. Suppose that 100 tosses of a coin produce 40 heads and 60 tails, and that we wish to know if the coin is biased in some way. On the assumption or null hypothesis that the coin is not biased, we should have expected 50 heads and 50 tails. The number of restrictions imposed on the fᵢ in this case is r = 1, since if the expected probability of obtaining a "head" (= f₁, for instance) is stated to be 1/2, then f₂ is automatically 1 − 1/2 = 1/2 since there are no other alternatives. Hence there is just ν = k − r = 2 − 1 = 1 degree of freedom. We have

    χ² = (|40 − 50| − ½)²/50 + (|60 − 50| − ½)²/50 = 3.61.

Notice that 1/2 has been added to the 40 and subtracted from the 60. This procedure, known as Yates's correction, generally adds 1/2 to those frequencies falling below expectation, and subtracts 1/2 from those falling above it. Yates's correction compensates for the fact that a discontinuous distribution (binomial in this case) is being represented by the continuous χ² distribution. Figure 8.7 or statistical tables give the following values:
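The arithmetic of the coin example, including Yates's correction, can be sketched in a few lines (Python here, purely illustrative; the text's programs are in FORTRAN):

```python
# 100 tosses: 40 heads, 60 tails; the null hypothesis expects 50/50
observed = [40, 60]
expected = [50.0, 50.0]

# Yates's correction: shrink each |O - E| by 1/2 before squaring
chi2 = sum((abs(o - e) - 0.5) ** 2 / e for o, e in zip(observed, expected))

print(round(chi2, 2))  # 3.61
```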
That is, the observed value of χ² = 3.61 (or greater) can be expected to occur, on the average, about 6% of the time. This is as far as statistics goes; it leaves the interpretation to us. Although we personally may believe that the coin is biased, we would want stronger evidence before making such a statement public. The levels P = 5%, 1%, and 0.1% are sometimes described as "probably significant," "significant," and "highly significant."

However, if 1000 tosses produced 400 heads and 600 tails, then

    χ² = (|400 − 500| − ½)²/500 + (|600 − 500| − ½)²/500 = 39.6,

a figure which so far exceeds even the "highly significant" level that we should have little hesitation in stating that the coin was biased in favor of tails. Even so, it should be recognized that there is a possibility, although an exceedingly remote one, of our being wrong.

It so happens that in the above example we could have predicted, directly from the binomial distribution, the probability of obtaining, for instance, 40 or fewer heads in 100 tosses. However, this would involve a fair amount of work in evaluating factorials and the χ² approach is preferred.

8.15 Contingency Tables

Suppose the success or failure of some experiment is classified as to whether treatment A or treatment B was used in the experiment. Table 8.2 shows such a cross classification, known as a contingency table.

Table 8.2 Contingency Table: Observed Results

                 Success   Failure   Totals
    Treatment A     88        7        95
    Treatment B     47       15        62
    Totals         135       22       157

Can treatment B be correlated with a significantly higher failure rate than treatment A?

We shall use the χ² test for goodness-of-fit. As a null hypothesis we assume A and B have equal effects and pool the results to give the chance of a failure as 22/157 = 0.140, whence the expected numbers of successes and failures for A and B can be calculated. These are shown in Table 8.3.

Table 8.3 Expected Frequencies on Basis of Null Hypothesis

                 Success   Failure   Totals
    Treatment A    81.7      13.3      95
    Treatment B    53.3       8.7      62
    Totals        135        22       157

There is only one degree of freedom, since, for example, having specified that we expect treatment A to yield 13.3 failures, all the remaining entries can be obtained by difference from the subtotals. Taking into account Yates's correction, we have

    χ² = (|88 − 81.7| − ½)² (1/81.7 + 1/13.3 + 1/53.3 + 1/8.7) = 7.44.

For ν = 1 d.f., χ² = 6.63 at the 1% level and 7.88 at the 0.5% level. That is, on the basis of our null hypothesis, there is a probability of about 1 in 150 of the observed χ² or higher occurring. Hence, we conclude that the null hypothesis is incorrect and that treatment B has a significantly higher failure rate than treatment A.

The above is the simplest possible example of a contingency table. For an m × n table in which m treatments are classified according to n results, the appropriate number of degrees of freedom is ν = (m − 1)(n − 1).
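For the 2 × 2 table above, the expected counts and the Yates-corrected χ² can be computed as follows (a Python sketch, not the book's program). Carrying unrounded expectations gives χ² ≈ 7.47, consistent with the roughly 1-in-150 tail probability quoted in the text:

```python
# Table 8.2: rows = treatments A and B, columns = success / failure
table = [[88, 7],
         [47, 15]]

row_tot = [sum(row) for row in table]        # [95, 62]
col_tot = [sum(col) for col in zip(*table)]  # [135, 22]
n = sum(row_tot)                             # 157

# expected counts under the null hypothesis of equal treatment effects
expected = [[r * c / n for c in col_tot] for r in row_tot]

# Yates-corrected chi-square over the four cells
chi2 = sum((abs(table[i][j] - expected[i][j]) - 0.5) ** 2 / expected[i][j]
           for i in range(2) for j in range(2))

print(round(expected[0][1], 1))  # 13.3 expected failures for treatment A
print(round(chi2, 2))            # 7.47
```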
EXAMPLE 8.4
χ² TEST FOR GOODNESS-OF-FIT

Problem Statement

Write a program to compute χ² from equation (8.33):

    χ² = Σᵢ₌₁ᵏ (xᵢ − nfᵢ)²/(nfᵢ),

in which the following are read as data: xᵢ = number of observations falling into each class i, n = total number of observations, fᵢ = frequency function for each class i, and k = total number of classes. The program should incorporate the following features: (a) automatic regrouping of data to ensure that each class has at least l expected observations (recall that the χ² distribution is obeyed more exactly as l increases), (b) Yates's correction, and (c) optional input of the theoretically expected frequencies xᵢ′ (= nfᵢ) for each class, instead of n and the fᵢ individually.

Method of Solution

The regrouping of data is achieved by examining each xᵢ′ = nfᵢ in turn; if xᵢ′ is found to be less than the prescribed lower limit l, the observations for class i and one or more subsequent classes (i + 1, i + 2, etc.) will be pooled in order to raise the combined expectation to at least l. The number of degrees of freedom ν will be (k_new − 1), where k_new is the number of regrouped classes. Let the regrouped observed and expected frequencies be denoted by yᵢ and yᵢ′, respectively. Allowing for Yates's correction, χ² becomes

    χ² = Σᵢ₌₁^(k_new) (|yᵢ − yᵢ′| − ½)²/yᵢ′.    (8.4.1)

Flow Diagram

FORTRAN Implementation

List of Principal Variables

Program Symbol    Definition

CHISQ    χ², defined by equation (8.4.1).
F        Vector containing values of fᵢ.
JUSTF    Switch, with value 0 (the xᵢ′ to be read) or 1 (the fᵢ to be read).
K        Number of classes k; becomes KNEW after a possible regrouping of data.
L        Least number l expected to fall into a class.
N        Number of observations, n.
NU       Degrees of freedom, ν.
XEXP     Vector containing theoretical frequencies xᵢ′; becomes YEXP (with the yᵢ′) after a possible regrouping.
XOBS     Vector containing observed frequencies xᵢ; becomes YOBS (with the yᵢ) after a possible regrouping.

Program Listing

C     APPLIED NUMERICAL METHODS, EXAMPLE 8.4
C     CHI-SQUARE TEST FOR GOODNESS-OF-FIT.
C
C     THE DATA CONSIST OF THE NUMBER OF OBSERVATIONS XOBS(I) FALLING
C     INTO EACH CLASS I OUT OF A TOTAL OF K CLASSES, AND EITHER THE
C     THEORETICALLY EXPECTED NUMBER XEXP(I) OR THE FREQUENCY
C     FUNCTION F(I) FOR EACH CLASS.  THERE ARE N TOTAL OBSERVATIONS.
C     THE VALUES OF XOBS ARE GROUPED BY THE PROGRAM SO THAT AT LEAST
C     A NUMBER L ARE EXPECTED IN EACH CLASS.  CHI-SQUARE IS THEN
C     COMPUTED OVER ALL THE CLASSES, USING YATES'S CORRECTION TO
C     ACCOUNT FOR THE FACT THAT CHI-SQUARE IS CONTINUOUS AND THAT THE
C     OBSERVATIONS ARE DISCRETE.
C
      DIMENSION XOBS(50), YOBS(50), YEXP(50), XEXP(50), F(50)
      INTEGER XOBS, YOBS
C
C     ..... READ AND CHECK DATA, COMPUTE XEXP IF NOT SUPPLIED .....
    1 READ (5,100)  N, K, L, JUSTF, (XOBS(I), I = 1, K)
      WRITE (6,200)  N, K, L, JUSTF
      FN = N
      IF (JUSTF .NE. 1)  GO TO 4
      READ (5,102)  (F(I), I = 1, K)
      DO 3  I = 1, K
    3 XEXP(I) = FN*F(I)
      GO TO 5
    4 READ (5,101)  (XEXP(I), I = 1, K)
C
C     ..... GROUP THE OBSERVATIONS INTO CLASSES, WITH AT
C           LEAST L OBSERVATIONS EXPECTED IN EACH CLASS .....
    5 KP1 = K + 1
      DO 6  I = 1, KP1
      YOBS(I) = 0
    6 YEXP(I) = 0.0
      FL = L
      J = 1
      DO 8  I = 1, K
      YOBS(J) = YOBS(J) + XOBS(I)
      YEXP(J) = YEXP(J) + XEXP(I)
      IF (YEXP(J) .LT. FL)  GO TO 8
      J = J + 1
    8 CONTINUE
C
C     ..... MERGE ANY UNDER-FILLED FINAL GROUP INTO ITS PREDECESSOR ...
      YOBS(J-1) = YOBS(J-1) + YOBS(J)
      YEXP(J-1) = YEXP(J-1) + YEXP(J)
      KNEW = J - 1
C
C     ..... PRINT VALUES OF YOBS AND YEXP .....
      WRITE (6,201)  KNEW, (YOBS(I), I = 1, KNEW)
      WRITE (6,202)  KNEW, (YEXP(I), I = 1, KNEW)
C
C     ..... COMPUTE CHI-SQUARE AND NUMBER OF DEGREES OF FREEDOM .....
      CHISQ = 0.0
      DO 9  I = 1, KNEW
      FYOBS = YOBS(I)
    9 CHISQ = CHISQ + (ABS(FYOBS - YEXP(I)) - 0.5)**2/YEXP(I)
      NU = KNEW - 1
      WRITE (6,203)  CHISQ, NU
      GO TO 1
C
C     ..... FORMATS FOR INPUT AND OUTPUT STATEMENTS .....
  100 FORMAT (4(11X, I4) / (14I5))
  101 FORMAT (/ (7F10.5))
  102 FORMAT (/ (5E16.6))
  200 FORMAT (35H1 DETERMINATION OF CHI-SQUARE, WITH, 1H0 /
     1  5X, 8HN      = , I4 / 6X, 8HK      = , I4 /
     2  6X, 8HL      = , I4 / 6X, 8HJUSTF  = , I4)
  201 FORMAT (49H0 THE INPUT DATA (POSSIBLY REARRANGED) NOW FOLLOW, 1H0,
     1  30H THE VALUES OF YOBS(1)...YOBS(, I2, 5H) ARE // (12I5))
  202 FORMAT (31H0 THE VALUES OF YEXP(1)...YEXP(, I2, 5H) ARE //(7F9.3))
  203 FORMAT (33H0 THE RESULTS ARE    CHISQ = , F8.3, 5X, 6HNU = ,
     1  I4)
C
      END



Data

N = 1000, K = 8, L = 20, JUSTF = 0
NUMBER OF OBSERVATIONS XOBS(I) IN SUCCESSIVE CLASSES ARE
   142  267  274  179   87   42    7    2
THEORETICALLY EXPECTED NUMBERS XEXP(I) FOR SUCCESSIVE CLASSES ARE
   135.3353  270.6704  270.6704  180.4469   90.2235   36.0894   12.0298
     3.4371

N = 1000, K = 13, L = 20, JUSTF = 0
NUMBER OF OBSERVATIONS XOBS(I) IN SUCCESSIVE CLASSES ARE
     5   54   88  159  179  161  135  100   54   38   14    8    5
THEORETICALLY EXPECTED NUMBERS XEXP(I) FOR SUCCESSIVE CLASSES ARE
     6.7379   33.6897   84.2243  140.3739  175.4673  175.4673  146.2227
   104.4448   65.2780   36.2656   18.1328    8.2422    3.4342

N = 5000, K = 38, L = 20, JUSTF = 1
NUMBER OF OBSERVATIONS XOBS(I) IN SUCCESSIVE CLASSES ARE

FREQUENCY FUNCTION F(I) FOR SUCCESSIVE CLASSES IS
   3.638960E-03  7.884410E-03  1.356119E-02

Computer Output

Results for the 1st Data Set

DETERMINATION OF CHI-SQUARE, WITH

   N     = 1000
   K     =    8
   L     =   20
   JUSTF =    0

THE INPUT DATA (POSSIBLY REARRANGED) NOW FOLLOW

THE VALUES OF YOBS(1)...YOBS( 6) ARE

    142   267   274   179    87    51

THE VALUES OF YEXP(1)...YEXP( 6) ARE

   135.335  270.670  270.670  180.447   90.223   51.556

THE RESULTS ARE    CHISQ =    0.435     NU =    5

Results for the 2nd Data Set

DETERMINATION OF CHI-SQUARE, WITH

   N     = 1000
   K     =   13
   L     =   20
   JUSTF =    0

THE INPUT DATA (POSSIBLY REARRANGED) NOW FOLLOW

THE VALUES OF YOBS(1)...YOBS(10) ARE

     59    88   159   179   161   135   100    54    38    27

Computer Output (Continued)

THE VALUES OF YEXP(1)...YEXP(10) ARE

    40.428   84.224  140.374  175.467  175.467  146.223  104.445
    65.278   36.266   29.809

THE RESULTS ARE    CHISQ =   14.647     NU =    9

Results for the 3rd Data Set

DETERMINATION OF CHI-SQUARE, WITH

   N     = 5000
   K     =   38
   L     =   20
   JUSTF =    1

THE INPUT DATA (POSSIBLY REARRANGED) NOW FOLLOW

THE VALUES OF YOBS(1)...YOBS(21) ARE

     56    72   125   189   282   337   423   433   457   425   423   414
    354   314   213   171   123    65    47    36    41

THE VALUES OF YEXP(1)...YEXP(21) ARE

    57.617   67.806  123.118  192.272  259.310  327.704  401.404
   444.609  467.821  470.255  447.233  401.343  345.715  284.666
   221.180  165.546  118.085   80.254   51.809   32.177   40.077

THE RESULTS ARE    CHISQ =   17.009     NU =   20


Discussion of Results

The first two data sets are taken from Example 8.2, where we generated two sequences of 1000 numbers thought to obey the Poisson distribution; the frequency of appearance of each number was observed, together with its expected frequency. We are now computing χ² to test one against the other. From Fig. 8.7 or tables, the probabilities P, that the actual χ² or higher would occur by chance, are P = 0.97 (Set 1) and P = 0.10 (Set 2). Since neither P is abnormally low, we conclude that the random number generator satisfactorily conforms to the Poisson distribution. The first P is actually somewhat high, indicating an unusually good fit.

The authors have written a procedure, not reproduced here (but see Problem 8.22), that uses a random number generator to simulate the dealing of bridge hands and then tabulates their point counts. As a result, "experimental" point-count distributions can be produced for any number of hands. In the third data set above, we test one of these experimental distributions, based on 5000 hands, against the theoretical frequency function predicted in Example 8.1. For χ² = 17.009 and ν = 20, we find P = 0.65, indicating an acceptable fit of the data by the theory. When tested against the normal distribution curve of Fig. 8.1.1, in a calculation not given here, the same data give χ² = 54.406 with ν = 20, corresponding to the unacceptably low value of P ≈ 0.00005.

8.16 The Sample Variance

The sample variance has been defined by (8.15) as

    s² = Σᵢ (xᵢ − x̄)²/(n − 1).

It has been shown that ave(s²) = σ², so that the sample variance is an unbiased estimate of the population variance.

It may be shown further [1] that for a population of normally distributed variables whose variance is σ², the variable

    χ² = (n − 1)s²/σ²    (8.34)

obeys the χ² distribution with ν = (n − 1) d.f.

Suppose we have k samples obtained from populations having a common variance σ² (but possibly having different means). Let sample i contain nᵢ items and have variance sᵢ². Then, if N = Σᵢ₌₁ᵏ nᵢ, an unbiased joint estimate of σ² is given by

    s² = [(n₁ − 1)s₁² + (n₂ − 1)s₂² + ... + (n_k − 1)s_k²] / (n₁ + n₂ + ... + n_k − k),    (8.35)

because

    ave(s²) = [1/(N − k)] ave Σᵢ₌₁ᵏ (nᵢ − 1)sᵢ² = σ².    (8.36)

If the populations are normally distributed, each term on the right-hand side of (8.36) behaves as χ², with (nᵢ − 1) d.f. Hence the whole of the right-hand side behaves as χ², with Σᵢ₌₁ᵏ (nᵢ − 1) = (N − k) d.f. That is, the variable

    χ² = (N − k)s²/σ²    (8.37)

also behaves as χ² with (N − k) degrees of freedom.

Example. In a certain experiment, s² = 100 with ν = 10 d.f. Establish an interval in which we can be 90% certain that the population variance σ² lies.

Since νs²/σ² behaves as χ², then σ² = νs²/χ². From Fig. 8.7 or tables [4], for ν = 10 d.f., χ² has a 5% chance of exceeding 18.31 and a 95% chance of exceeding 3.94. Thus, the lower limit of σ² is 10 × 100/18.31 = 54.6, and its upper limit is 10 × 100/3.94 = 254. (This is not the only 90% confidence interval; it is the one which is symmetrical with respect to the probabilities on χ².)

8.17 Student's t Distribution

If we wish to test the hypothesis that a sample whose mean is x̄ could come from a normal distribution of mean μ and known variance σ², the procedure is easy, for the variable (x̄ − μ)√n/σ is N(0,1), and can readily be compared with tabulated values. When σ² is unknown and has to be estimated from the sample variance s², the situation is somewhat different, and the following distribution is of assistance.

Assume two random variables: ξ, distributed as N(0,1), and χ², distributed as chi-square with ν d.f. Then, if ξ and χ² are independent, the random variable

    t = ξ / √(χ²/ν)

is called Student's t with ν d.f. It has the frequency function

    f(t) = [Γ((ν + 1)/2) / (√(νπ) Γ(ν/2))] (1 + t²/ν)^(−(ν+1)/2).

The distribution is very similar to N(0,1), which it approximates closely for all but small values of ν. Figure 8.8 shows values of t_P plotted against ν, with P as a parameter such that P is the probability that |t| > t_P.

To illustrate the case in which σ² is unknown, consider a population N(μ,σ²). The sample mean x̄ is N(μ,σ²/n). Hence, ξ = (x̄ − μ)√n/σ is N(0,1); also, χ² = (n − 1)s²/σ² is a chi-square variable with (n − 1) d.f. That is,

    t = ξ / √(χ²/(n − 1)) = (x̄ − μ)√n / s

has the t distribution with (n − 1) d.f., which provides a test of significance for the deviation of a sample mean from its expected value when the population variance is unknown and must be estimated from the sample variance.

A simple extension provides a test of significance between two sample means x̄₁ and x̄₂, which will determine whether x̄₁ and x̄₂ could have come from the same population.

Assume first that the samples come from populations having a common variance σ²; the validity of this assumption can be checked by the F test in Section 8.18. Then the linear combination (x̄₁ − x̄₂) has the normal distribution with mean (μ₁ − μ₂) and variance σ²(1/n₁ + 1/n₂). Thus
    [(x̄₁ − x̄₂) − (μ₁ − μ₂)] / [σ√(1/n₁ + 1/n₂)]

is N(0,1). Further, if the joint estimate of σ² from the samples is

    s² = [(n₁ − 1)s₁² + (n₂ − 1)s₂²] / (n₁ + n₂ − 2),

as indicated by (8.35), then (n₁ + n₂ − 2)s²/σ² has the chi-square distribution with (n₁ + n₂ − 2) d.f. Thus

    t = [(x̄₁ − x̄₂) − (μ₁ − μ₂)] / [s√(1/n₁ + 1/n₂)]

has the t distribution with ν = (n₁ + n₂ − 2) d.f.

Finally, as a null hypothesis, assume that the samples do come from the same population and hence that μ₁ and μ₂ are equal. That is,

    t = (x̄₁ − x̄₂) / [s√(1/n₁ + 1/n₂)]

and will, with specified probability, be expected to fall within the appropriate tabulated values, unless the null hypothesis is incorrect and μ₁ ≠ μ₂.

Example. Repeated analyses show that two batches of a manufactured chemical contain the following percentages of an impurity X. Is there a significant difference between the X-content of the two batches?

Table 8.4 Analyses for Percent X in Two Batches of Chemical

                 Percent X
    Analysis   Batch 1   Batch 2     x₁     x₂

In Table 8.4 the new variables x₁ and x₂ are given by x = 10 × (%X − 3.00). This makes the arithmetic a little easier, especially if the calculation is being done by hand, but does not alter the final conclusions. Various sums are next calculated as shown in Table 8.5.
Table 8.5 Calculation of Batch Means and Variances

              Batch 1   Batch 2

The joint estimate of σ² is s² = (4 × 1.578 + 4 × 0.707)/8 = 1.1425; that is, s = 1.069. On the basis of the null hypothesis that there is no difference between the X-content of the two batches, μ₁ = μ₂ and

    t = (x̄₁ − x̄₂) / [s√(1/n₁ + 1/n₂)] = (3.94 − 7.02) / [1.069√(1/5 + 1/5)] = −4.555, with 8 d.f.

But, from tables of the t distribution [4], |t| based on 8 d.f. has a 0.2% probability of being greater than 4.50. Thus we conclude, with very little doubt, that the null hypothesis is incorrect and that there is a definite difference between the X-content of the two batches. In fact, taking into account the method of deriving x₁ and x₂ from the percent X in the two batches, the best estimate we have is that the percent X is (0.702 − 0.394) = 0.308 higher in Batch 2 than in Batch 1. Finally, we can qualify this statement by giving a confidence interval within which, with 95% probability, for instance, the difference lies. Now t = 2.31 with 8 d.f. at the P = 5% level. Thus the percent X in Batch 2 exceeds that in Batch 1 by an amount that has a 95% probability of lying within

    0.308 ± 2.31 × (s/10)√(1/5 + 1/5) = 0.308 ± 0.156.

8.18 The F Distribution

The ratio F = s₁²/s₂² of two estimates of the variance of a normal population, based on ν₁ and ν₂ d.f. respectively, has the frequency function

    f(F) = [(ν₁/ν₂)^(ν₁/2) / B(ν₁/2, ν₂/2)] F^((ν₁/2)−1) [1 + (ν₁/ν₂)F]^(−(ν₁+ν₂)/2),    (8.43)

where

    B(m,n) = Γ(m)Γ(n)/Γ(m + n) = ∫₀¹ x^(m−1)(1 − x)^(n−1) dx

is the beta function.

Figure 8.9 P = 0.05 points of the F distribution.
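The pooled-variance t computation of the example above can be sketched as follows (plain Python; the coded means 3.94 and 7.02 and variances 1.578 and 0.707 come from Table 8.5):

```python
import math

n1 = n2 = 5
xbar1, xbar2 = 3.94, 7.02   # coded batch means, x = 10*(%X - 3.00)
s1sq, s2sq = 1.578, 0.707   # sample variances, 4 d.f. each

# joint (pooled) estimate of the common variance, as in equation (8.35)
s2 = ((n1 - 1) * s1sq + (n2 - 1) * s2sq) / (n1 + n2 - 2)
s = math.sqrt(s2)

# two-sample t statistic under the null hypothesis mu1 = mu2
t = (xbar1 - xbar2) / (s * math.sqrt(1.0 / n1 + 1.0 / n2))
nu = n1 + n2 - 2

print(round(s2, 4))  # 1.1425
print(round(t, 3))   # -4.556
```

Since |t| ≈ 4.556 exceeds the 0.2% point (4.50) for 8 d.f., the null hypothesis is rejected, as in the text.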


Figure 8.10 P = 0.01 points of the F distribution.

Figures 8.9 and 8.10 show F_P as a function of ν₁ and ν₂, such that F_P has a probability P of being exceeded. Evaluating F provides a test of significance between two estimates of a population variance. Note that a value F_(1−P,ν₁,ν₂) (such that there is a probability P of F falling below it) is given by F_(1−P,ν₁,ν₂) = 1/F_(P,ν₂,ν₁).

Example. In our example of the t distribution, the two sample variances s₁² = 1.578 and s₂² = 0.707, each having 4 d.f., were pooled to give a joint estimate s² of a population variance. Was this pooling justified?

F = s₁²/s₂² = 1.578/0.707 = 2.23. From Fig. 8.9 or tables [4], for ν₁ = 4, ν₂ = 4, there is a 5% probability that F will exceed 6.39, and a 5% probability that F will fall below 1/6.39 = 0.1565. Our results are well within these limits and there is, therefore, no reason why the sample variances should not be pooled.

8.19 Linear Regression and Method of Least Squares

Suppose that, in an experiment, certain values of a variable y have been observed, each corresponding to a particular value of x, as shown in Fig. 8.11. If we wish to draw a straight line on the graph to represent the data as well as possible, considering experimental scatter, we must first postulate a model representing the situation.

Assume that each yᵢ is an observation from a normal population with mean α + βxᵢ and constant variance σ², and that the corresponding values xᵢ are known precisely. For example, y might be the result of a chemical analysis or the estimate of a product yield, whereas x might be an accurately known time, an accurately weighed amount of catalyst, or a carefully controlled temperature. The situation is indicated in Fig. 8.12, in which y = α + βx is the equation of the regression line. Using the experimental data, we wish to obtain estimates a, b, and s² of the model parameters α, β, and σ². Assuming that the measurement of y locates it within a small interval of extent Δy, the probability of observing the value y₁ is, for instance, from the normal distribution,

    Pr(y₁) = k exp[−(y₁ − α − βx₁)²/2σ²],

where k is a constant. Similar expressions hold for Pr(y₂), ..., Pr(y_m), where m is the number of data points or observations. The probability P of all these values of y occurring simultaneously is

    P = Pr(y₁) Pr(y₂) ... Pr(y_m).

Figure 8.11 Experimental points.

The regression line is considered the best representation of the data points if P is a maximum, which will occur when α and β are chosen to minimize the sum of squares

    S = Σᵢ₌₁ᵐ (yᵢ − α − βxᵢ)².

Henceforth, the subscripts i will be dropped because the summations are understood to be over all m observations. Note that ∂S/∂α = −2Σ(y − α − βx) and ∂S/∂β = −2Σx(y − α − βx). For S to be a minimum, ∂S/∂α = ∂S/∂β = 0, and α and β will be replaced by their estimates a and b, giving:

    Σ(y − a − bx) = 0,    Σx(y − a − bx) = 0.

Rearranging, we obtain the simultaneous normal equations

    ma + bΣx = Σy,
    aΣx + bΣx² = Σxy,

whence

    b = (mΣxy − ΣxΣy)/(mΣx² − (Σx)²) = Σ(x − x̄)(y − ȳ)/Σ(x − x̄)² = cov(x,y)/var(x),    (8.45)

and

    a = (Σy − bΣx)/m = ȳ − bx̄.    (8.46)

Also, the variance σ² about the regression line is estimated by

    s² = Σ(y − a − bx)²/(m − 2) = (Σy² − aΣy − bΣxy)/(m − 2).    (8.47)

Note that the denominator of (8.47) has the value m − 2, because both a and b have been estimated from the data; s² is now said to be based on (m − 2) d.f. and (m − 2)s²/σ² has the χ² distribution with (m − 2) d.f.

We now show that a and b are unbiased estimates of α and β, and expressions will be derived for var(a) and var(b). From (8.45),

    b = Σ(x − x̄)(y − ȳ)/Σ(x − x̄)² = [Σ(x − x̄)y − Σ(x − x̄)ȳ]/Σ(x − x̄)² = Σ(x − x̄)y/Σ(x − x̄)².

That is, b is a linear combination of y₁, y₂, ..., y_m with coefficients (xᵢ − x̄)/Σ(x − x̄)² and is normally distributed. The average or expected value of each yᵢ is α + βxᵢ, so that

    ave(b) = Σ(x − x̄)(α + βx)/Σ(x − x̄)² = β,

since Σ(x − x̄) = 0 and Σx(x − x̄) = Σ(x − x̄)². Similarly, since each yᵢ has variance σ²,

    var(b) = σ²/Σ(x − x̄)².

With these results, it is easy to show that the random variable

    t = (b − β)[Σ(x − x̄)²]^(1/2)/s

has the Student t distribution with (m − 2) d.f., which may be used either as a test of significance to see if β is likely to have some preconceived value, or to establish a confidence interval for β.

Also, ave(a) and var(a) are obtained from (8.46), as follows. We have a = ȳ − bx̄, that is,

Figure 8.12 Model for linear regression.
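The normal-equation formulas (8.45) through (8.47) translate directly into code. The sketch below (plain Python, with made-up illustrative data scattered about y = 1 + 2x) computes a, b, and s²:

```python
def linear_regression(x, y):
    """Least-squares fit of y = a + b*x via the normal equations
    (8.45)-(8.47); returns a, b, and the residual variance s2."""
    m = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    syy = sum(yi * yi for yi in y)

    b = (m * sxy - sx * sy) / (m * sxx - sx * sx)   # (8.45)
    a = (sy - b * sx) / m                           # (8.46)
    s2 = (syy - a * sy - b * sxy) / (m - 2)         # (8.47)
    return a, b, s2

# illustrative data lying close to y = 1 + 2x
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.9, 5.2, 6.8, 9.0]

a, b, s2 = linear_regression(x, y)
print(round(a, 3), round(b, 3))  # 1.06 1.97
```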
Problems
random variable y that obeys the exponential distribution f(y) = ηe^(−ηy), with mean 1/η. Anticipate that the call for the function will be

    Y = REXPON (X, ETA)

in which X, ETA, and Y have obvious interpretations.

8.17 Discuss the possibility of generating an N(μ,σ) random variable based on the method given in Problem 8.15.

8.18 If x₁ and x₂ are a pair of independent random variables uniformly distributed between 0 and 1, Box and Muller [14] indicate that the pair of values

    y₁ = σ(−2 ln x₁)^(1/2) cos 2πx₂ + μ,
    y₂ = σ(−2 ln x₁)^(1/2) sin 2πx₂ + μ,

will be normally distributed according to N(μ,σ).

(a) Write a subroutine named RNORML that implements Box and Muller's procedure. The call will be of the form

    CALL RNORML (X1, X2, MU, SIGMA, Y1, Y2)

in which the arguments have obvious interpretations.
(b) Investigate the theoretical justification for the procedure.

8.19 Here, we investigate a statistical method for approximating the solution of Laplace's equation ∇²u = 0 in a rectangular region with Dirichlet boundary conditions.

Introduce a square grid as shown in Fig. P8.19, and let u_k be the known value of u at the boundary point k (k = 1, 2, ..., n). Consider a random walk, in which we start from any point P and proceed, at random, along a series of grid lines until a point on the boundary is eventually reached. The steps of the walk are such that at any point there is an equal probability of proceeding to each of the four neighboring points. Starting from P, let f_Pk be the probability of reaching point k at the end of the walk. Then

    u_P* = Σₖ₌₁ⁿ f_Pk u_k

is an approximation to u_P, the solution of ∇²u = 0 at P.

(a) Verify the approximation u_P* by showing that it satisfies: (i) the boundary condition u_k for P coincident with k, and (ii) equation (7.79), the finite-difference approximation of ∇²u = 0.
(b) Write a program that implements the above procedure. The values f_Pk should be estimated by making a reasonably large number N of random walks from each point P, keeping track of how many times the walk terminates at each point k. A random number generator with a flat distribution should be used to decide which direction to take at each step of the walk.
(c) What are the advantages and disadvantages of this method for solving Laplace's equation, compared with conventional relaxation methods?
(d) Could the method be extended to obtain solutions for: (i) Poisson's equation, or (ii) a region with a curved boundary?

8.20 A chemical company is faced with the problem of disposing of a potential waste product. One possibility is to convert it into x tons per annum of a saleable product. The cost of the operation would be y $/ton of product, all of which could then be sold at z $/ton. The annual profit would be P = x(z − y) dollars. Bearing in mind the uncertainties of production and marketing, the actual value of x is thought to be uniformly distributed between lower and upper limits x₁ and x₂, y is N(a + b/x, σ_y²), and z is N(c − xd, σ_z²), where estimated values are available for x₁, x₂, a, b, c, d, σ_y², and σ_z². Before proceeding with the project, the company wishes to estimate the frequency function f(P) for the annual profit. Since an analytical approach may be difficult, an alternative is to use a random number generator (see Example 8.2 and Problems 8.12, 8.13, and 8.18) to simulate the distributions of x, y, and z. An approximation to f(P) may then be constructed from the results of a large number of trials.

Write a program that will implement the above technique, also known as risk analysis. Investigate the effect on f(P) of different values for the various parameters. Also investigate the possibility of estimating the mean and variance of P by using a method similar to that in Section 8.5(4).

8.21 Write a program to simulate the formation of queues at the check-out stands of a supermarket, based on the following model.

The time-interval t, elapsing between the arrival of successive customers at the check-out stands, has the exponential distribution (see Prob. 8.4) f(t) = ηe^(−ηt), in which η is the average number of customers arriving per unit time. There are n stands open, and each customer chooses the stand that has the least number of people waiting to be served. The time τ taken to pass through the stand, after waiting to be served, is normally distributed with mean μ and variance σ².

The program should simulate the passage of N customers through the stands. Exponential and normal distribution random number generators, such as those developed in Problems 8.16 and 8.18, should be used to generate successive values for t and τ. The program should read values for n, η, μ, σ², and N as data. The printed output should consist of (a) the observed frequency function for the number of customers waiting at a stand at any time, and (b) the observed frequency function for the total time taken to wait and be served.

Note. Many variations of this problem are obviously possible, and might include: (a) customer chooses the queue that he estimates has the least total number of purchases to be processed, (b) periodic closing down or addition of stands, (c) τ uniformly distributed between τ_min and τ_max, etc.
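A sketch of the Box and Muller transform of Problem 8.18 follows (in Python rather than the FORTRAN subroutine the problem asks for; the function name rnorml is just an illustrative echo of the problem's RNORML):

```python
import math
import random

def rnorml(x1, x2, mu, sigma):
    """Box-Muller transform: two independent U(0,1) deviates x1, x2
    yield a pair of independent N(mu, sigma) deviates."""
    r = sigma * math.sqrt(-2.0 * math.log(x1))
    y1 = r * math.cos(2.0 * math.pi * x2) + mu
    y2 = r * math.sin(2.0 * math.pi * x2) + mu
    return y1, y2

# quick empirical check of the sample mean and standard deviation
random.seed(7)
mu, sigma = 5.0, 2.0
ys = []
for _ in range(20000):
    # 1 - random() lies in (0, 1], so the logarithm is always defined
    y1, y2 = rnorml(1.0 - random.random(), random.random(), mu, sigma)
    ys.extend((y1, y2))

mean = sum(ys) / len(ys)
sd = math.sqrt(sum((y - mean) ** 2 for y in ys) / len(ys))
print(round(mean, 3), round(sd, 3))  # close to mu = 5 and sigma = 2
```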

8.22 Write a program that simulates the random dealing of n successive bridge hands, based on the following method. First establish a code whereby the integers 1 through 52 represent the 52 cards in the deck; for example, let 1 ≡ 2 of clubs, 2 ≡ 3 of clubs, ..., 13 ≡ ace of clubs, 14 ≡ 2 of diamonds, etc. Let these integers occupy, in any order, the 52 elements of the integer vector DECK(1) ... DECK(52). Also, obtain a random number generator that gives a sequence of values x₁, x₂, ..., uniformly distributed between 0 and 1. The first card is "dealt" by computing i₁ = 1 + 52x₁, rounding down to the next lower integer, and observing the card that is in DECK(i₁). The card so dealt is then effectively removed from the deck by interchanging DECK(i₁) and DECK(52). To deal the second card, compute i₂ = 1 + 51x₂, again rounded down, and examine DECK(i₂); then interchange DECK(i₂) and DECK(51). The process is repeated, and eventually the coded form of the first hand will be in DECK(40) ... DECK(52).

In addition, at least, the program should keep track of the distribution of point counts in the successive hands. These point counts can then be compared with those predicted in Example 8.1. Also, depending on the imagination and ability of the programmer, successive hands may be printed in a format similar to those found in bridge articles, that is, a listing of the hands sorted by suit and value.

8.23 During the batch saponification reaction between equimolar amounts of sodium hydroxide and ethyl acetate, the concentration c (gm.moles/liter) varies with time t (min) according to the equation

    1/c = 1/c₀ + kt,

where c₀ is the initial concentration, and k (liter/gm.mole.min) is the reaction rate constant. The following results were obtained in the laboratory, at a temperature of 77°F:

Obtain a least-squares estimate of (a) the reaction rate constant, and (b) the initial concentration.

8.24 The data in Table P8.24 have been published in Time [18] for the tar and nicotine content (in milligrams) of several brands of king-size filter and regular cigarettes.

Table P8.24

          Filter              Regular
    Tar     Nicotine    Tar     Nicotine

(a) Perform the linear regression y = α + βx on each of these two sets of data, with y = nicotine content and x = tar content.
(b) Test the hypothesis that β is the same for both cases.
(c) For a given tar content, do the regular cigarettes contain significantly less nicotine than the filter variety?

8.25 In the linear regression y = α + βx, the method of least squares assumes that the values xᵢ, i = 1, 2, ..., m are known exactly, whereas the yᵢ are subject to random error. Using the computer, perform experiments to see what happens if the xᵢ are also subject to random error. To achieve this, assume a model with known α and β. Then use a random number generator (see Example 8.2, and Problems 8.12, 8.13, and 8.18) to provide m "exact" values xᵢ over some interval, together with corresponding values yᵢ, distributed with constant variance about a mean α + βxᵢ. Next, superimpose random errors on the xᵢ, and use these "inexact" values when computing the estimates a and b. Repeat the process many times, and finally compare the mean of the several values of a and b thus computed with the known parameters α and β.

8.26 Nedderman [6] used stereoscopic photography of tiny air bubbles to determine the velocity profile close to the wall for flow of water in a 1 in. I.D. tube. At a Reynolds number of 1200, he obtained the values given in Table P8.26.

Table P8.26

    y, Distance from Wall (cm)    u, Velocity (cm/sec)

Theoretically, the data should follow the law: u = py + qy², where p and q are constants.
(a) Use the method of least squares to estimate p and q, and find the corresponding standard deviation of the data about the regression line.
(b) Suppose that we are interested in predicting the value of the shear stress at the pipe wall, which is given by τ_w = μ(du/dy)_(y=0) = μp, where μ is the viscosity of water. Estimate the variance of p, so that we could then attempt to place confidence limits on the predicted value of τ_w.
(c) Would the data be fitted substantially better by the cubic formula: u = py + qy² + ry³?

8.27 Consider the family of p straight lines

    y = α_k + βx,  k = 1, 2, ..., p,

which have a common slope but different intercepts α_k. Suppose that n_k pairs of experimental values are available for each line:
Devise a method for obtaining least-squares estimates of the parameters α₁, α₂, ..., αₚ, and β, and write a program to implement the method. Note that this type of problem is also discussed by Ergun [15].

Test the program with the data (shown in Table P8.27), reported by Williams and Katz [16] in connection with a Wilson plot for evaluating the shell-side heat transfer coefficient in a heat exchanger.

Table P8.27

k:  1 2 3 4 5
nₖ: 4 4 4 4 4
[Remaining data not legible in this reproduction.]

8.28 For the matrix formulation of multiple regression in Section 8.21, verify that equations (8.63) can be rewritten as (8.67).

8.29 In Section 8.22, multiple regression was discussed in terms of orthogonal polynomials. In this connection, verify the following:

(a) That the polynomials of (8.77) satisfy the orthogonality condition (8.75). (Hint. Suppose that Pₖ is about to be computed, and that all lower-order polynomials have already been proved to be mutually orthogonal. Multiply the first of equations (8.77) through by Pⱼ, and sum over all the data points xᵢ, i = 1, 2, ..., m. Then consider the individual terms of the resulting equation for j ≤ k − 3, j = k − 2, and j = k − 1, in turn.)

(b) That equation (8.73), giving the regression coefficients, can be rewritten as (8.76).

8.30 Write a program that will perform the orthogonal polynomial regression discussed in Section 8.22. The program should accept the following input values: m (total number of data points), x₁, x₂, ..., xₘ, y₁, y₂, ..., yₘ, and n (order of regression).

The printed output should consist of values for:

(a) The estimated regression coefficients, b₀, b₁, ..., bₙ.
(b) The variance-covariance matrix V.
(c) The variance s² of the data about the regression line.
(d) The coefficients of the various powers of x appearing in all the orthogonal polynomials so that they can be reconstructed, if necessary.
(e) The values predicted for y by the regression equation at successive points x₁, x₂, ..., xₘ.

8.31 The experimental data for heat transfer between viscous fluids and the shell side of a heat exchanger can be correlated fairly successfully by the equation

Nu = α Re^β Pr^γ r^δ,

where Nu, Re, and Pr are the Nusselt, Reynolds, and Prandtl numbers, r is the ratio of the viscosity at the mean fluid temperature to that at the wall temperature, and α, β, γ, and δ are constants.

Table P8.31 represents part of the data reported by Katz and Williams [16], and reexamined by Briggs, Katz, and Young [19], for heat transfer outside 3/4 in. O.D. plain tubes:

Table P8.31

[Data not legible in this reproduction.]

Obtain least-squares estimates of α, β, γ, and δ, and indicate how one might determine the confidence that could be placed in each such estimate.
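The linearization asked for in Problem 8.23 can be sketched briefly: since 1/c is linear in t, ordinary straight-line least squares applied to the transformed points (tᵢ, 1/cᵢ) yields k as the slope and 1/c₀ as the intercept. The following sketch is in Python rather than the book's FORTRAN IV, and the (t, c) values are hypothetical, since the laboratory table is not legible here:

```python
# Least-squares sketch for Problem 8.23: 1/c = 1/c0 + k*t is linear in t,
# so a straight-line fit to (t, 1/c) gives slope k and intercept 1/c0.
def fit_rate_constant(t, c):
    """Return least-squares estimates (k, c0) from times t and concentrations c."""
    n = len(t)
    y = [1.0 / ci for ci in c]                      # transformed ordinate, 1/c
    st, sy = sum(t), sum(y)
    stt = sum(ti * ti for ti in t)
    sty = sum(ti * yi for ti, yi in zip(t, y))
    k = (n * sty - st * sy) / (n * stt - st * st)   # slope of regression line
    b = (sy - k * st) / n                           # intercept, equal to 1/c0
    return k, 1.0 / b

# Hypothetical check (not the book's data): generate exact values from
# c0 = 0.05 gm.mole/liter and k = 6, then recover the two parameters.
c0_true, k_true = 0.05, 6.0
t = [1.0, 2.0, 3.0, 5.0, 8.0]
c = [1.0 / (1.0 / c0_true + k_true * ti) for ti in t]
k_est, c0_est = fit_rate_constant(t, c)
print(k_est, c0_est)
```

The same device applies to Problem 8.31, where taking logarithms of Nu = α Re^β Pr^γ r^δ reduces the fit to a multiple linear regression in ln Re, ln Pr, and ln r.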

Bibliography

1. C. A. Bennett and N. L. Franklin, Statistical Analysis in Chemistry and the Chemical Industry, Wiley, New York, 1954.
2. O. L. Davies, ed., Statistical Methods in Research and Production, 3rd ed., Oliver and Boyd, London, 1957.
3. M. G. Kendall, The Advanced Theory of Statistics, Vol. I, Griffin, London, 1948.
4. D. V. Lindley and J. C. P. Miller, Cambridge Elementary Statistical Tables, Cambridge University Press, 1953.
5. R. W. Hamming, Numerical Methods for Scientists and Engineers, McGraw-Hill, New York, 1962.
6. R. M. Nedderman, Velocity Profiles in Thin Liquid Layers, Ph.D. Thesis, Dept. of Chemical Engineering, University of Cambridge, 1960.
7. A. Ralston and H. S. Wilf, editors, Mathematical Methods for Digital Computers, Vol. 2, Wiley, New York, 1966.
8. J. M. Hammersley and D. C. Handscomb, Monte Carlo Methods, Methuen, London, 1964.
9. T. H. Naylor, J. L. Balintfy, D. S. Burdick, and K. Chu, Computer Simulation Techniques, Wiley, New York, 1966.
10. N. R. Draper and H. Smith, Applied Regression Analysis, Wiley, New York, 1966.
11. G. E. Forsythe, "Generation and Use of Orthogonal Polynomials for Data Fitting with a Digital Computer," J. Soc. Indust. Appl. Math., 5, 74-88 (1957).
12. B. F. Green, J. B. K. Smith, and L. Klem, "Empirical Tests of an Additive Random Number Generator," Journal of the A.C.M., 6, 527-537 (1959).
13. The Rand Corporation, A Million Random Digits with 100,000 Normal Deviates, The Free Press, Glencoe, Illinois, 1955.
14. G. E. P. Box and M. E. Muller, "A Note on the Generation of Random Normal Deviates," Ann. Math. Stat., 29, 610-611 (1958).
15. S. Ergun, "Application of the Principle of Least Squares to Families of Straight Lines," Ind. Eng. Chem., 48, 2063-2068 (1956).
16. R. B. Williams and D. L. Katz, Performance of Finned Tubes in Shell-and-Tube Heat Exchangers, University of Michigan Engineering Research Institute, Ann Arbor, 1951. See also: Trans. ASME, 74, 1307-1320 (1952).
17. Weight, Height, and Selected Body Dimensions of Adults: United States, 1960-1962, National Center for Health Statistics, Series 11, No. 8, U.S. Dept. of Health, Education, and Welfare, Washington, D.C., 1965.
18. Time, 89 (12), 51 (1967).
19. D. E. Briggs, D. L. Katz, and E. H. Young, "How to Design Finned-Tube Heat Exchangers," Chem. Eng. Progr., 59, 49-59 (1963).
APPENDIX

Presentation of Computer Examples

The following is an explanation of the various subdivisions appearing in the documentation of each computer example:

1. Problem Statement. Contains a statement of the problem to be solved, and may refer to the main text if appropriate. Occasionally preceded by an introductory section.

2. Method of Solution. Refers to algorithms already developed in the text whenever possible. Also gives notes on any special problems which may occur, and how they will be overcome.

3. Flow Diagram. The authors have avoided a flow diagram that merely repeats the program, step by step. Rather, the flow diagram is intended to clarify the computational procedure by reproducing only those steps that are essential to the algorithm. Thus, in the final program, there may be several additional steps that are not given in the flow diagram. Such steps frequently include: (a) printout of data that have just been read, often for checking purposes, (b) manipulations, such as the floating of an integer when it is to be added to a floating-point number, that are caused by limitations of FORTRAN, (c) calculation of intermediate constants used only as a computational convenience, (d) return to the beginning of the program to process another set of data, (e) printing headings and comments, (f) arranging for printout at definite intervals, during an iterative type of calculation, by setting up a counter and checking to see if it is an integral multiple of the specified frequency of printout, etc. The flow-diagram convention is explained at the end of this appendix.

4. FORTRAN Implementation. This section serves as the main bridge between the algorithm of the text or flow diagram and the final FORTRAN program. It contains a list of the main variables used in the program, together with their definitions and, where applicable, their algebraic equivalents. These variable names, etc., are listed alphabetically under a heading describing the program in which they first appear, typically in the order: main program, first subroutine or function, second subroutine or function, etc. Variables in a subroutine or function whose meanings are identical with those already listed for an earlier program (for example, the main program or another subroutine or function) are not repeated.

In addition, we include here any particular programming points such as (a) descriptions of special subroutines, (b) shifting of subscripts by one in order to accommodate, for example, a₀ in the algebra as A(1) [and not as the illegal A(0)] in the program, etc.

All programs have been run on an IBM 360/67 computer using the IBM G or H level compilers. We have endeavored to write programs that follow the language rules for FORTRAN IV as outlined by the American Standards Association (ASA-FORTRAN),* and have not attempted to take advantage of special nonstandard features, such as mixed-mode expressions, of the G or H level FORTRAN IV languages.

5. Program Listing. Contains a printed list of the original FORTRAN IV program.

6. Data. Contains a printed list of the data cards, if any.

7. Computer Output. In this section, the printed computer output is reproduced. However, if the complete output is rather lengthy, only certain selected portions may be given. In this case, the remaining output is usually summarized somehow. Sometimes, clarifying notes, lines, etc., are added by hand to the printout. Occasionally, the printed output may be cut and reassembled if its clarity is thereby enhanced.

8. Discussion of Results. Contains a short critical discussion of the computed results, including comments on any particular programming difficulties and how they might be overcome.

* Subcommittee X3 of the American Standards Association, "FORTRAN vs. Basic FORTRAN," Communications of the ACM, Vol. 7, No. 10, 590-625, 1964.
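The subscript-shifting point under item 4 can be illustrated with a short sketch (in Python here, for brevity; the function name is ours, not the book's). A deliberately unused zeroth array element stands in for FORTRAN's 1-origin arrays, so that the algebraic coefficient aⱼ is stored in position j + 1:

```python
# Illustration of the subscript shift: a polynomial a0 + a1*x + ... + an*x**n,
# whose algebraic subscripts start at 0, stored FORTRAN-style with A(1) = a0.
def horner_one_origin(A, n, x):
    """Evaluate a0 + a1*x + ... + an*x**n with a_j stored in A[j + 1].

    A[0] is unused, mimicking a FORTRAN array dimensioned A(1..n+1)."""
    p = A[n + 1]                  # a_n, stored one slot "late"
    for j in range(n, 0, -1):     # FORTRAN-style loop, j = n, n-1, ..., 1
        p = p * x + A[j]          # A[j] holds a_{j-1}
    return p

A = [None, 5.0, -2.0, 3.0]        # a0 = 5, a1 = -2, a2 = 3 stored in A[1..3]
print(horner_one_origin(A, 2, 2.0))   # 5 - 2*2 + 3*4 = 13
```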
Flow-Diagram Convention

The following boxes, each of characteristic shape, are used for constructing the flow diagrams in this text.

Substitution

The value of the variable V is replaced by the value of the expression E.

Label

1. When a path leaves from it, the label serves merely as an identification point in the flow diagram; L is usually the same as the statement number in the corresponding program.

2. When a particular branch terminates in a label, control is transferred to the one (and only one) other point in the flow diagram where L occurs as in (1).

3. These special cases encompass the computations in a main program.

4. Indicate the start and finish of computations in a subroutine.

5. Indicate the start and finish of computations in a function. The value of the expression E is returned by the function.

Conditional Branching

If the Boolean expression B is true, the branch marked T is followed; otherwise, the branch marked F (false) is followed.

Iteration

The counter I is incremented uniformly (in steps of I₂ - I₁) between its initial and final limits I₁ and Iₙ, respectively. For each such value of I, the sequence of computations C is performed. The dotted line emphasizes the return for the next value of I, and the small circle, which is usually inscribed with the corresponding statement number in the program, serves as a junction box.

Input and Output

The values for the variables comprising the list L are read as input data.

The values for the variables or expressions comprising the list L are printed. A message to be printed is enclosed in quotation marks.

Occasionally, there are special cases (particularly, calls on subroutines and functions) which do not fit conveniently into the above arrangement. However, the intended meaning is conveyed readily by suitable modification of the flow diagram. The flow diagram for a subroutine or function is separate from that for the calling program, and is started and concluded with the words "Enter" and "Return," as explained above. The arguments for subroutines and functions are also indicated.
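As a rough modern analogue (not part of the original convention), the substitution, conditional-branching, and iteration boxes correspond to an assignment, an if/else, and a counted loop. The Python sketch below, with hypothetical limits and a hypothetical function name, shows the mapping:

```python
# Sketch (in Python rather than the text's FORTRAN IV) of how the flow-diagram
# boxes translate into code: a substitution box becomes an assignment, a
# conditional-branch box an if/else, and an iteration box a counted loop
# running from I1 to In in uniform steps of (I2 - I1).
def iterate(i1, i2, i_n):
    """Run an iteration box: counter from i1 to i_n in steps of (i2 - i1)."""
    step = i2 - i1
    total = 0                          # substitution box: total <- 0
    for i in range(i1, i_n + 1, step): # iteration box, with return for next I
        if i % 2 == 0:                 # conditional branch: T path taken below
            total = total + i          # substitution box inside the loop
    return total

print(iterate(1, 2, 10))   # sums the even counters 2+4+6+8+10 = 30
```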
Index
Numbers prefixed by E refer to Computer Examples; numbers prefixed by P refer to end-of-chapter Problems.
Absolute error, 365 Beam, bending of, P6.26 principal zero of, 390
Adams-Bashforth method, for solution of transverse vibrations of, 430, P3.27, P4.31, roots of, 389
ODE, 386 P4.32 Characteristic function, for statistical distri-
Adams method, for solution of ODE, 386 Beams and columns, buckling of, P4.27- tions, 542
modified, 386 P4.30 Characteristic function or polynomial of a
Adams-Moulton predictorcorrector method, Beattie-Bridgeman equation of state, E3.3 matrix, 220
for solution of ODE, 386 Bending, of a beam, P6.26 by Danilevski's method, 261, P4.35
Adiabatic flame temperature, P3.49, P3.50, of a loaded pile, P6.42 by Faddeev's method, P4.17
P5.45 of a mast, P6.27 by Leverrier's method, P4.16
Adjoint matrix, 212 of a spring, P2.29 Characteristic-value problems, 508, E7.9,
Airy stress function, P7.37 Bernoulli's method, 142, 196, P3.7 P7.24, P7.41
Aitken's delta-squared process, P3.24 Beta function, 570 Chebyshev economization, 43, E1.3, P1.44-
Aitken's method, P1.11-P1.13 Biharmonic equation, 430, E7.7, P7.28, P1.48
Algorithm, 1 P7.37, P7.39 Chebyshev polynomials, 4, 39, 100, E1.3,
Alternating-direction methods, for elliptic Binomial distribution, 543 P1.50, P1.51
PDE, 508, P7.20 Blasius equation, E2.4, P3.38 and Chebyshev quadrature, P2.27, P2.28
for parabolic PDE, 452, E7.3 Blasius problem, E6.5 and GaussChebyshev quadrature, 115, E2.4
Amplification factor, 449,450,453 Boundary conditions, for ODE, 341,405 expansion of functions as linear combina-
Analysis of variance, 531, 585 for parabolic PDE, 431, 462, P7.12, P7.13, monomials in terms of, 41
Anodes, distribution of potential between, P7.43 monomials in terms of, 4 1
P7.26 for PDE, at interface between different orthogonality property of, 100, P2.17
Approximation, of functions, 1 media, 462 recursion relation for, 40, 101
by Chebyshev polynomials, 39,43, P1.50, at irregularly shaped boundary, 463, E7.8 relationship to minimax polynomial, 43
P1.51 homogeneous, 508 roots of, 40,115
by interpolating polynomials, 3, 9, E l . l , types of, 462 table of, 40,100
27, E1.2, 35, 39, P1.3-P1.23 Boundary points, types of, 482,483,499 use in economization of polynomials, 43,
by least-squares polynomials, 3 Boundary-value problems, 341,405,508, E1.3, P1.44-P1.48
by minimax polynomials, 4,42, P1.44 E6.5, P6.32, P6.33 Chemical reaction, between oxygen and
by power series, 4, P1.1, P1.2 finite-difference methods, 405 methane, E5.5, P5.45
by rational functions, 2 shooting methods, 405 decomposition of sulfuryl chloride, P6.43
by spline functions, P1.24-P1.27 Brian's method, 453, P7.9, P7.10 in ion-exchange column, 430, P7.33
multidimensional, P1.28-P1.41 Bridge hand, chi-square test involving, 567 involving simultaneous equilibrium, E5.5,
Arithmetic average, 532 dealing, P8.2, P8.22 P5.45
Asymptotic convergence factor, Newton's distribution of points in, E8.1 oxidation of ethylene, P4.35
method, 171 Buckling, ofbeamsandcolumns, P4.27 -P4.30 pyrolysis of ethane, E6.2
successive-substitution method, 168, 385, of mast, P3.46 saponification of ethyl acetate, P8.23
388 with diffusion, P7.24
Augmented matrix, 212,270 Cam follower, displacement of, E3.5 Chemical reactor, for pyrolysis of hydrocar-
Canonical boxes, 222 bons, P6.21
Back substitution, 270 Cantilever, flexibility matrix for, P5.16 optimum cyclic operation of, P3.21
Backward differences, multistep algorithms transverse vibrations of. P4.32 parallel-plate,, P7.24
in terms of, 381, 383 CauchySchwarz inequality, 223 residence times in, P8.5
Backward finite difference, 36 Cauchy's equation for refractive index, P2.47, stirred tanks in series, P3.22, P5.31
operator, 35, 36 P5.10 tubular, far oxidation of ethylene, P6.35
table, 36, P1.10 Cayley-Hamilton theorem, 223 for pyrolysis of ethane, E6.2, P6.20
Bairstow's method, 156, P3.6 Centraldifference operator, 35, 37,431 for sulfuryl chloride decomposition, P6.43
Band-pass filter circuit, P3.11 Central finite difference, 37 Chi-square distribution, 559, P8.6, P8.7
Barakat and Clark method, 452 operator, 35, 37 tabulation of, P8.8
Base points, 1 table, 38, P1.10 test for goodness-of-fit, 560, 561, E8.4
in numerical integration, 70, 100, 103, Centrifugal pump, for transfer between tanks, Cholesky's methad, P5.22, P5.23
113,115,116, P2.4-P2.7 P3.39, P3.40 Cigarettes, tar and nicotine content of, P8.24
interpolation, with arbitrarily spaced, 9, 27 in piping network, P5.40, P5.46 Circulants, P4.9, P4.10
with equally spaced, 35 Characteristic equation, of a matrix, 145, Closed integration formulas, 70, 71, P2.8,
Basis of vectors, 214 220, P3.8, P3.9 383
Closed multistep methods, for solution of in a laminar-flow heat exchanger, 430, error terms for, 129
ODE, 383 97.22, P7.23, P7.41 higher-order derivatives, 129
as correctors in predictorcorrector meth- inside reactor tubes, P6.35, P6.43 of polynomials by synthetic division, 7,
ods, 386 natural, at a heated wall, E7.5, P6.33 P1.4, P1.5
relationship to NewtonCotes closed inte- to fluid inside furnace tubes, P6.43, P7.44 Dimension of vector space, 213
gration formulas, 383 Convergence, of methods for ODE, 347 Dirichlet boundary contition, 462
Coefficient matrix, 269 of methods for PDE, 433,499 Discontinuous distribution, 5 3 2
augmented, 270 explicit method, 432 Disaete distribution, 532
Coefficient of expansion of aluminum, P1.47 implicit method, 440 Discretization error, in solution of ODE, 342,
Cofactor, 212 Convergence factor, asymptotic, for Newton's P6.4
Colebrook equation, P3.38 method, 171 in solution of PDE, 433
Combinations, 533 for solution of corrector equations, 385, Distribution function, 532
Companion matrix, 227 388, 391, P6.11 Divided difference, 9, P1.6-P1.9
Complex roots, 141, 142, 169,171, 196, for successive substitution methods, 168 interpolating polynomial, 9, E l . l , P1.8,
P3.1, P3.5, P3.7 Cooling fin, radiation from, P6.37 P1.9
Composite integration formulas, 77, P2.6, temperature distribution inside, P7.42 relationship to finite difference, 35
P2.7, P2.9, P2.10 Corrector equations, in solution of ODE, 362, symmetric form for, 10
Gaussian quadrature, P2.21 386,390, P6.11 table, 12, 14, 15, El.l.Pl.8-P1.9, P1.17
multidimensional, P2.35, P2.36 bound for stepsize in, 385, P6.12 table of definitions, 1 0
repeated interval-halving and Romberg extraneous solutions for, 389, P6.12, Divideddifference table, 12, 14, 15, E l . l ,
integration, 90, E2.2 P6.13 P1.8, P1.9, P1.17
Simpson's rule, 79, E2.1, 90, P2.10, P2.35, in predictor-corrector methods, 386 alternate paths across, 15
P2.36 solution by method of successive substi- detecting errors in, P1.21, P1.22
trapezoidal rule, 78,90, P2.6 tutions, 385, 388, 391, P6.11 interchanging base-point indices in, 15
Compressibility factor for gases, 173, P2.16 Covariance, 5 3 3 interchanging base-point values in, 15
for ammonia, P2.16 between regression parameters, 573-575 Dot product of two vectors, 213
for methane, P2.16 Crank-Nicolson method, 45 1, P7.11, P7.40 DuFort-Frankel method, 45 1
Compressible flow of gas, in detonation Cumulative frequency function, 532 Duhamel's theorem, P2.30
wave, P5.34 Curved boundary, see Irregularly shaped
in porous medium, P7.34 boundary Economization of polynomials, 43, E l 13,
in submarine explosion, P3.35 Curvilinear regression, see Regression P1.44-P1.48
through converging-diverging nozzle, P3.34 Eigenvalues, 21 9
Computer examples, methods for present- Danilevski's method, 145,153,261, P4.35 complex conjugate pair, 249
ing, 593 Darcy's law, P7.34, P7.36 determination of, by Danilevski's method,
Condensation, of vapor on a cooled tube, Data, reduction of, 531 261, P4.35
P2.14 Deferred approach to the limit, Richardson's, by Givens-Householder method, P4.36
Condition number of a matrix, P5.3 78,90, 364, P2.10 by Jacobi's method, 250, E4.4
Conduction, steady-state, across a hollow Definite matrices, 224 by power method of Mises, 226, E4.2,
duct, P7.16, P7.17 Degree of precision, in numerical integration, P4.34
between cylinders, E7.9 76 by roots of characteristic equation, E3.1,
in a flat plate, E7.6 Degrees of freedom, 55 9 P3.8, P3.9
in cooling fin, P6.36, P7.42 in analysis of variance, 585 by Rutishauser's method, 236, E4.3
in radiation shield, P6.34 in contingency tables, 561 dominant, 226
through oil in a pipeline, P7.30 in regression, 572, 573 for solving differential equations, 221, P4.15
unsteady-state, equation for, 429, P7.11 Density of water, P1.23 in buckling problems, P4.28-P4.30
in a semi-infinite medium, P2.30, P3.41 Determinants, 212 in characteristic-value problems, 509
in a slab, 431, E7.1, E7.2 computation of, 271,273, E5.1, E5.2 in vibrational problems, 145, P4.23-P4.26,
in a solidifying alloy, E7.4 of elementary matrices, 269 P4.31-P4.33
in a square bar, E7.3 Vandermonde, 8 , 1 4 3 Eigenvectors, 2 19
inside a cylinder, P7.21 Detonation wave, P5.34 determination of, see Eigenvalues, determina.
in the Earth's surface, P2.30 Diagonal dominance, 298 tion of
in liquefied natural gas, P7.25 Diagonal matrix, 21 1 orthogonality of, 220,224,s 11
with change of phase, E7.4, P3.48, P7.31, Diagonal pivot strategy, 274, 281 Elasticity, situations involving, see Beam;
P7.32 Difference equations, linear, 388, P6.12 Bending; Buckling; Cantilever;
Conduction equation, eigenvalue approach to characteristic equation for, 389 Flexibility matrix; Frame; Torsional oscil-
stability of methods for solving Crank- homogeneous, 389 lations; and Vibrations
Nicolson method, P4.21 solutions for, 389 Electrical circuit, transient behavior of, E6.3,
explicit method, P4.20 Differential equations, see Ordinary (ODE) P3.11, P3.12, P6.22-P6.24
Confidence intervals, 568,570 differential equations; Partial (PDE) dif- Electrical network, E5.1, P5.24, P6.24
Consistency of methods for PDE, 449,450 ferential equations analysis by Laplace transforms, P3.11, P3.12
Contingency tables, 561 Differential mean-value theorem, 9, 346 Electronic vacuum tube, circuits involving,
Continuous distribution, 532 Differentiation, numerical, 128, P2.46-P2.52, P3.20, P5.32
Convection, at a flat plate, P6.31 P5.9 operating characteristics of, P1.42, P2.48
combined with radiation, P3.42, P5.30, and the interpolating polynomial, 128 Electron lens, distribution of potential inside,
P5.33, P6.34, P6.37 approximations for first derivatives, 129, P7.26
in heat exchangers, P2.15, P3.45, P6.19 130 motion of electron inside, P6.25
Elementary matrices, 269,271 table, 35,36,38, P1.10 error term for, 116
Elliptic PDE, 429,482 Finitedifference approximations for partial for arbitrary integration limits, E2.4
iterative methods for solving, 484,508, derivatives, see Partial derivatives welght factors for, 115
E7.6-E7.8 Finitedifference methods for boundary-value Gauss forward formula, 38, P1.10
Equal ripple property, 42 problems, 405,509, P4.28, P4.30, P4.31 error term for, 38
Equation of state, Beattie-Bridgeman, E3.3 Finite divided difference, see Divided differ- Gauss-Hermite quadrature, 115, P2.23
Van der Waals's, P3.30 ence error term for, 116
Equations, Solution of, Chapter 3 Fixed word length, for computers, 342 table of base points for, 116
Systems of, Chapter 5 Flash vaporization of a hydrocarbon mixture, table of weight factors for, 116
Equilibrium, of simultaneous chemical reac- ~3.36 Gaussian elimination, 270
tions, E5.5, P5.45 Flexibility matrix, P5.13, P5.16, P5.17 Gaussian quadrature, 100, 101, E2.3, 113,
Equivalent matrices, 270 Flowdiagram convention, 594 115, 116, E2.4, P2.19-P2.23, P2.25,
Error, inherent, 1 Cumulative frequency function, 532 Duhamel's theorem, P2.30
round-off, 1 , 6 pressible flow,P3.34-35, P5.34, P7.34 Gauss-Jordan' elimination, 272, E5.1
truncation, 1 flat plate, flow past, E6.5, P6.32 effect of small pivot, 273, 281
in solution of ODE, 342 friction factor, determination of, P3.37, with maximum-pivot strategy, 281, E5.2,
absolute, 365 P3.38 P5.2
discretization, 342, P6.4 hydraulic jump, P3.31 Gauss-Laguerre quadrature, 113, P2.20
local truncation, 342,345, 347,363, 365, jet, P6.41 error term for, 115
387,391 natural convection at heated wall, E7.5, for arbitrary lower integration limit, 115,
propagation of, 346,365 P6.33 P2.22
relative, 365 networks of pipes, E5.4, P5.40, P5.41, table of base points for, 113
round-off, 342,345,365 P5.46 table of weight factors for, 113
truncation, 342, 347,363,365, 387, 391, particle-sampling probe, P6.39, P6.40, weight factors for, 113
P6.15 P7.27 Gauss-Legendre quadrature, 101, E2.3, E2.4,
Error function, P2.19, P3.44 porous medium, flow in, P7.34, P7.36 P3.44
Ethane, pyrolysis in a tubular reactor, E6.2 pressure drop in pipes, E5.4, P3.38-P3.40, base points and weight factors, by half-inter-
Euler load, P2.29, P4.27 P5.40, P5.41, P5.46, P7.30 val method, E3.4
Euler's method, for solution of ODE, 344, pressure drop in tubular reactors, P6.20, composite formula for, P2.21
E6.1, E6.2, P6.4 P6.35, P6.43 error term for, 105
bound for propagated error in, 347 pumping and piping operations, E5.4, for arbitrary integration limits, 104
convergence for, 347 P3.39, P3.40, P5.40, P5.46 multidimensional, P2.37, P2.38
improved, 362, P6.3 stream function, E6.5, P3.47, P7.27 table of base points for, 103
local truncation error for, 345,347 submarine explosion, P3.35 table of weight factors for, 103
modified, 362, P6.3 surge tank, P6.28 weight factors for, 103
propagation of truncation error in, 346 velocity profiles, determination of, E2.4, GaussSeidel method, 299, E5.3, P5.38
Expected value, 532 E6.5,E7.5,P6.41,P7.30, P8.26 for PDE, 484, E7.6-E7.8
Explicit method for PDE, 432, E7.1, E7.5, vorticity transport equation, 430 Gill, coefficients for Runge-Kutta method, 363,
P7.4 FORTRAN, 6; see also individual Computer P6.10
consistency of, 450 Examples; Appendix; and Preface Givens-Householder method, P4.36
convergence of, 432 Forward finite difference, 35 Goodness-of-fit, chi-square test for, 560
in two space dimensions, 452 operator, 35 Graeffe's root-squaring method, 141, E3.1,
limitations of, 440 table, 35, P1.10 P3.1
stability of, 449 Fourier coefficients, using Romberg integra- Gram-Schmidt orthogonatization, 297
unconditionally stable methosds, 45 1 tion, E2.2 Grid point, 429,43 1
Explicit multistep methods, for solution of Fourier functions, 2,43, P1.50, P1.51 Grouting of underground rock formation, P7.36
ODE, 385 relationship to Chebyshev polynomials, 40
Exponential distribution, P8.4, P8.16,P8.21 Frames, minimization of strain energy in, Half-interval method, 178,406, E3.4, E6.5,
Exponential integral, P2.22 P4.18 P3.32
Extraneous solutions, for multistep meth- statically determinate, E5.2, P5.19 Hamming's method, 390, E6.4, P6.15
ods in the solution of ODE, 389, P6.12, statically indeterminate, P4.18, P5.13- stability of, 391
P6.13 P5.15, P5.18 stepsize for convergence of corrector, 391
Extrapolation, 16, 385, E1.1, P1.3 Freezing problem, P3.48, P7.31, P7.32 truncation error for, 391
Richardson's, 78,90, 364, P2.10 Frequencies of vibration, see Vibrations Heat exchanger, condensation of vapor in, P2.14
Frequency function, 532 laminar flow in, 430, P7.22, P7.23, P7.41
Factorization of matrices, 214, P5.21 Friction factor, for flow in a rough pipe, with temperature-dependent physical prop-
Faddeev's method for characteristic poly- P3.38 erties, P2.15, P3.45, P6.19
nomials, P4.17 for turbulent flow in a smooth pipe, P3.37 Heat loss, from insulated pipe, P5.33
False position method, 178, E3.5 Fugacity, P2.16 from wall of combustion chamber, P3.42
Fanning equation, E5.4, P5.46 Furnaces, see Radiation Heat transfer, see Conduction; Convection; and
F distribution, 570 Radiation
tabulation of, P8.11 Gamma function, 115, P2.20, P2.22,559 Heat transfer coefficients, correlation of, P8.27,
test for sample variances, 571, 586 Gauss backward formula, 38 P8.31
Finite difference, 35 error term for, 38 Hermite polynomials, 101
operators, 35, 36, 37 GaussChebyshev quadrature, 115, E2.4 and Gauss-Hermite quadrature, 116
relationship to fmite divided difference, 35 base points for, 115 orthogonality property of, 101
recursion relation for, 101 Gauss-Laguerre quadrature, 113, P2.20, Jacobi method, 298, P5.37
roots of, P3.33 P2.22 for simultaneous nonlinear equations, Newton-
table of, 101 Gauss-Legendre quadrature, 101, E2.3, E2.4, Raphson, 319, E5.5, P5.42, P5.43
table of roots of, 116 P2.21, P2.37, P2.38 successive substitution, 308, E5.4
Hermitian form, 224 improved trapezoidal rule, P2.2, P2.3
Hermitian matrix, 211, 224 Lobatto quadrature, 116, P2.25 Jacobian, 156, 319
Heun's method, 362 multidimensional quadrature, P2.3 1-P2.38 Jacobi's methods, for eigenvalues, 250, E4.4
Homogeneous boundary conditions, 50'8 NewtonCotes closed formulas, 71, P2.8, for simultaneous equations, 298,484, P5.37,
Homogeneous equations, 212 P2.9 P7.18
Honor-point count, E8.1 NewtonCotes open formulas, 75 Jet, velocity profile in, P6.41
Horner's rule, 6, P1.4, P1.5 open formulas, 70, 75 Jordan canonical form, 222
Hydraulic jump, P3.31 Radau quadrature, 116, P2.26
Hydrocarbon mixtures, cracking of ethane, Romberg integration, 90, E2.2 Kaczmarz's method, 297, P5.36
E6.2 Simpson's rule, 73,79,90, E2.1, E8.3, Kirchoffs current law, 274
flashing of, P3.36 P2.5, P2.7, P2.10, P2.12, P2.34-P2.36 Kutta, coefficients for Runge-Kutta methad,
furnace for cracking, P5.27-P5.29 stepsize for, 71, 78, 100 363
general cracking of, P6.20 trapezoidal rule, 7 1, 78, 90, E2.2, P2.1-
mass-spectrometric analysis of, P5.11 P2.4, P2.6 Lagrange interpolation coefficients, for multi-
Hyperbolic PDE, 429 with equally spaced base points, 70 dimensional quadrature, P2.31
Interpolating polynomial, 3, 8, 9, E l . l , 27, Lagrange's interpolating polynomial, 9, 27,
Ideal gas law, 173, P7.34 E1.2,35, 39 E1.2,P1.8, P1.16,Pl.lS
Identity matrix, 211 differentiation of, 128 error term for, 27
Ill-conditioned matrix, 297, P5.3 Gauss backward formula, 38 multidimensional, P1.35-P1.39
Implicit methods for PDE, 440, E7.2, E7.4, Gauss forward formula, 38, P1.10 Laguerre polynomials, 100
P7.5 in multistep methods for solution of ODE, and Gauss-Laguerre quadrature, 113
alternatingdirection, 452, E7.3, P7.6-P7.8 381 orthogonality property of, 100
convergence of, 440 integration of, 69, 71, 75 recursion relation for, 100
Crank-Nicolson, 45 1 Lagrange's,9,27,E1.2,P1.8,P1.16,P1.18 roots of, P3.33
in two dimensions, 452 multidimensional, P1.35-P1.39 table of, 100
solutions of tridiagonal equations resulting Newton's backward formula, 37, P1.10 table of roots of, 113
from, 441 Newton's divideddifference, 9, E l . l , P1.8, Laplace's equation, in a rectangle, 483, E7.6,
stability of, 450 PI .9 P7.15, P7.20
Implicit multistep methods, for solution of Newton's forward formula, 36, P1.10 solution by random walk, P8.19
ODE, 385 Newton's fundamental formula, 13, P1.ll Laplace transformation, P3.10-P3.12
Improved Euler's method, 362, P6.3 Stirling's formula, 38, P1.23 Laplacian, approximation of, at irregular bounc
Improved polygon method, 362 using derivatives at base points, 39 ary, 463,483, E7.8, P7.17
Improved trapezoidal rule, P2.2 with equally-spaced base points, 35 nine-point, 431, P7.3
Incidence matrix, 310 Interpolation, 3,8,9,27, 35, 39, 385, E l . l , nine-point, 431, P7.3
Increment function, in solution of ODE, 361 E1.2, P1.3-P1.23 Larkin's method, 452
Individual, in a population, 532 inverse, P1.14, P1.15, P3.16 Latent-heat effects, E7.4, P3.48, P7.31, P7.32
Inherent error, 1 iterated linear, P1.ll-P1.13 Least squares, method of, 3,571
Inherent instability, in solution of ODE, 364 linear, 10, P1.3 weighted, 4
Initial basis, 214 multidimensional, P1.28-P1.41 see also Regression
Initial condition, for parabolic PDE, 431,450 of degree two at a boundary, 483 Legendre polynomials, 100
Initial-value problem, 341 Interpolation and Approximation, Chapter 1 and Gauss-Legendre quadrature, 101
Inner product of two vectors, 213 Interval-halving and Romberg integration, 90, orthogonality property of, 100
Instability, see Stability EL.2 recursion relation for, 100
Insulated boundary, 462, E7.4 Inverse error function, P3.44 roots of, E4.3
Integral equations, in theory of radiation Inverse interpolation, P1.14, P1.15, P3.16 table of, 100
scattering, P2.44 Inverse matrix, 211, P4.2 table of roots of, 103
simultaneous, for radiation between parallel computation of, 271, 272, E5.1, E5.2 Length of a vector, 213
plates, E2.1, P2.45 improvement of estimate for, P4.1, P5.8 Leverrier's method for characteristic polynomi
Integral mean-value theorem, 72 Ion-exchange equation, 430, P7.33 als, P4.16
Integration, numerical, 69 Irreducible matrix, 299,484 L'Hospital's rule, P7.21
as a smoothing process, 69, 128 Irregularly shaped boundaries, 463,483, E7.8, Liebmann method, 484
Chebyshev quadrature, P2.27, P2.28 P7.17 Limit, Richardson's deferred approach to, 78,
closed formulas, 70, 71, P2.8 Irrotational flow, stream function in, P7.27 90, 364, P2.10
composite formulas, 77,90, E2.1, P2.6, Isometric matrix, 224 Linear combination, 534
P2.7, P2.9, P2.10, P2.21, P2.35, P2.36 Iterated linear interpolation, P1.11-P1.13 of approximating functions, 2
degree of precision in, 76 Iterated synthetic division, P3.2-P3.4 of normally distributed variables, 552
error estimation in, 77 Iterative factorization of polynomials, 156, of orthogonal polynomials, 101, P2.11
Gauss-Chebyshev quadrature, 115, E2.4 E3.2 of vectors, 214
Gauss-Hermite quadrature, 116, P2.23 Iterative methods, for elliptic PDE, 484, 508, Linear dependence, 213
Gaussian quadrature, 100, 101, 113, 115, P7.15 Linear interpolation, 10, P1.3
116, E2.3, E2.4, P2.19-P2.23, P2.25, for simultaneous linear equations, Gauss- iterated, P1.11-P1.13
P2.26, P2.37, P2.38 Seidel method, 299, E5.3, P5.38 Linear operator, 35
Linear transformations, 219 Muller's method, P3.19 Normalized vector, 213
Lin's iteration, P3.4 Multidimensional approximation, P1.28-P1.41 Normal regression equations, 572-574
Lin's method, P3.5 Multidimensional interpolating polynomial, Nozzle, flow in converging-diverging, P3.34
Loaded plate, deflection of, 430, E7.7, P7.28 P2.31-P2.36 Null hypothesis, 560
stress distribution in, P7.37 Multidimensional quadrature, P2.31-P2.35 Nullity, 219
Lobatto quadrature, 116, P2.25 Multinomial distribution, 543,560 Sylvester's law of, 219
Local truncation error, see Truncation error Multiple regression, see Regression Null matrix, 210
Multiple roots, 141, 142, 156, 169, P3.7 Null space, 219
Marginal stability, in solution of ODE by Multistep methods, for solution of ODE, 342, Numerical Integration, Chapter 2
multistep methods, 390 381,383,386,391 Numerical method, 1
Mass spectrometer, interpretation of data characteristic equation for, 389
from, P5.11 closed integration formulas, 383 Ohm's law, 274
Mass-spring systems, vibrations of, E3.1, extraneous or parasitic solutions for, 389, Oil, nonisothermal flow in a pipeline, P7.30
P4.23, P4.24 P6.12, P6.13 One-step methods, for solution of ODE,342,
Mast, bending of, P6.27 in predictor-corrector methods, 384 361,392
buckling of, P3.46 open integration formulas, 381 Open integration formulas, 70, 75, 382
Matrices and Related Topics, Chapter 4 relationship to Newton-Cotes integration Open multistep methods, for solution of ODE,
Matrices and vectors, operations involving, formulas, 382, 383 381
E4.1 truncation error in, 387,391 as predictors in predictor-corrector methods,
Matrix, augmented, 212, 270 386
basic definitions and operations concerning, Natural convection at a vertical plate, E7.5, relationship to Newton-Cotes open integra-
17, 210-213, E4.1 P6.33 tion formulas, 382
coefficient, 269 Natural gas, storage of liquefied, P7.25 Optimization methods, in solution of boundary
elementary, 269, 271 underground storage of, P7.34 value problems, 406
equivalent, 270 Nested evaluation of polynomials, 6 Ordinary Differential Equations (ODE), Chap-
factorization of, 214, P5.21 Network, electrical, E5.1, P3.11, P3.12, P5.24, ter 6
flexibility, P5.13, P5.16, P5.17 P6.24 numerical solution of, 341
Hilbert, P5.8 of pipes, E5.4, P5.41 Adams method, 386
incidence, 310 and pumps, P5.40, P5.46 Adams-Bashforth method, 386
inversion of, 271, 272, E5.1, E5.2 Neumann boundary condition, 462, P7.23 Adams-Moulton method, 386
partitioning of, 270 Newton-Cotes closed integration formulas, 71, boundary value problems, 341, 405, E6.5
Maximum-pivot strategy, 281, E5.2, P5.2 composite formulas for, 77, P2.9 closed multistep methods, 383
Maxwell-Boltzmann distribution, P8.6 error term for, 73 convergence, 347
Mean, 532 error term for, 7 3 corrector equations, 362, 386, 390
for a sample, 542 estimating the error for, 76 discretization error in, 342, P6.4
of arbitrary function, 534 table of, 73 eigenvalue approach to, 221, P4.15
of linear combination, 534 see also Simpson's rule; Trapezoidal rule errors in, 342,345,347,363,365,387,391
Mean-value theorem, differential, 9, 346 Newton-Cotes open integration formulas, 75, Euler's method, 344, E6.1, E6.2, P6.4
integral, 72 382 extraneous solutions for, 389, P6.12, P6.13
Mechanical quadrature, see Integration, error term for, 76 finite-difference methods for, 405
numerical estimating the error for, 76 Hamming's method, 390, E6.4, P6.15
Median, 532 table of, 76 Heun's method, 362
Membrane, transverse vibrations of, P4.33 Newton-Raphson method, 319, E5.5, P5.42 improved Euler's method, 362, P6.3
Milne predictor-corrector methods, for solu- efficient application of, P5.43 improved polygon method, 362
tion of ODE, 386 in factorization of polynomials, 156, P3.6 increment function in, 361
characteristic equation for fourth-order Newton's backward formula, 37, 381, 383, initial value problems, 341
corrector, 389 P1.10 local truncation error in, 342, 345, 347, 363,
extraneous or parasitic solutions for, 389, error term for, 37 387, 391
P6.12 Newton's divided-difference interpolating poly- Milne methods, 386, P6.11, P6.12
fourth-order method, 386 nomial, 9-13, P1.8, P1.9 modified Adams method, 386, P6.13, P6.14
sixth-order method, 386, P6.11, P6.12 error for, 1 3 , 1 4 modified Euler's method, 362, P6.3
stability of, 390 Newton's forward formula, 36, 7 1 , 7 5 , P1.lO, multistep methods, 342, 381,383, 386, 391
Minimax, polynomial, 4, 42, P1.44 P1.11 one-step methods, 342, 361, 392
principle, 4, 42, P1.43 error term for, 36 open multistep methods, 381
Minor, 212 Newton's fundamental formula, 13 order of, 341
Mode, 532 error term for, 13, 14 parasitic solutions for, 389, P6.12, P6.13
Modified Adams predictor-corrector method, Newton's method, 171, E3.3, P3.14, P3.15 predictor-corrector methods, 362, 384, 386,
for solution of ODE, 386, P6.13, P6.14 Nine-point approximation for Laplacian, 431, 390, E6.1, P6.11-P6.15
Modified Euler's method, for solution of ODE, P7.3 predictor equations, 362, 386
362, P6.3 Nonlinear PDE, 464, E7.4, E7.5, P7.11 propagation factor in, 365
stability of, 388, 391, P6.12, P6.13 Nonsingular matrix, 211 Runge-Kutta methods, 361, 363, E6.3, E6.4,
Moment coefficients, 532 Norm, P5.3 E6.5, P6.5-P6.10, P6.15
Moment-generating functions, 542 Normal distribution, 552 simultaneous first-order equations, 341, 365,
Monic polynomial, 40 derivation of frequency function for, 553 390, E6.3, E6.4
Monomials, in terms of Chebyshev polynomi- tabulation of standardized values, 552, E8.3 stability in, 364, 388, 391,P6.12,P6.13
als, 41 Normalization steps, 271, 273, 274, 282 step-factor in, 365
stepsize, 341,364,385,387,391,P6.10, Polynomial regression, see Regression stability of, 388,391, P6.12, P6.13
P6.15 Polynomials, 2 stepsize control in, 385,387, 391, P6.15
systems of first-order equations, 342, 347, as approximating functions, 3, 4, 27, 35, truncation error in, 387, 391
363,365,387,391 39,42, El.1, E1.2 Predictor equations, in solution of ODE, 362,
Taylor's expansion approach to, 343,361, Chebyshev, 4, 39,100,115, E1.3, P1.50, 386
P6.2, P6.6 P1.51, P2.11, P2.27, P2.28 in predictor-corrector methods, 386
truncation error in, 342, 346, 387, P6.4, derivatives of by synthetic division, 6, P1.4, Presentation of Computer Examples, 593
P6.15 P1.5 Pressure drop, in pipes, E5.4, P3.38-P3.40,
Orthogonality, of eigenvectors, 511, 519 differentiation of, 128, P5.9 P5.40, P5.41, P5.46, P7.30
property, for polynomials, 100, 121, P2.17 economization of, 43, E1.3, P1.44-P1.48 in tubular reactors, P6.20, P6.35, P6.43
Orthogonalization, Gram-Schmidt procedure, equations involving, 141 Probability, 532
297 evaluation of, by Horner's rule, 6, P1.4, laws of, 533
Orthogonal matrix, 211 P1.5 Probability density function, 532
Orthogonal polynomials, 100 Gauss backward formula, 38 Producer gas, combustion of, P3.50
Chebyshev, 100, P2.17 Gauss forward formula, 38, P1.10 Projectile, flight of, P6.29
expansion of monomials as linear combi- Hermite, 101,116 Propagation factor, in solution of ODE, 365
nations of, 101, P2.11 in a matrix, 221 Pseudorandom number generator, see Random
Hermite, 101 in multistep methods for solution of ODE, number generator
Laguerre, 100 381 Pump, see Centrifugal pump
Legendre, 100 integration of, 69, 71, 75
use in regression, 574 interpolating, 3, 8, 9, 27, 35, 39, E1.1, QD (quotient-difference) algorithm, 196, P3.51
Orthogonal vectors, 213 E1.2 Quadratic convergence, 171, P3.14
Orthonormal set of vectors, 213, P4.7 iterative factorization of, 156, 171, E3.2 Quadratic factors for polynomials, 156, E3.2,
Overrelaxation, successive, 508, P7.19 Lagrange's interpolating, 9,27, E1.2, P1.8, P3.5
Oxidation, of ethylene, P6.35 P1.16, P1.18 Quadratic form, 224
of methane, E5.5, P3.49, P5.45 Laguerre, 100, 113 Quadrature, see Integration, numerical
of producer gas, P3.50 least-squares, 3, 571,573,574 Quadratic formulas, determination of roots
Legendre, 100, 101 needed for, E3.4, P3.32, P3.33
Parabolic PDE, 429,431,452 minimax, 4,42, P1.44 Quasidiagonal matrix, 222
Crank-Nicolson method, 451 monic, 40 Queues, simulation of, P8.21
explicit method, 432, E7.1, E7.5, P7.4 multidimensional interpolating, P1.35 -
implicit alternating-direction methods, 452, P1.39 Radau quadrature, 116, P2.26
E7.3 near-minimax, 43, P1.44 Radiation, between multiple surfaces, P5.25,
implicit method, 440, E7.2, E7.4, P7.5 Newton's backward formula, 37, P1.10 P5.26
in two and three space dimensions, 453, E7.5 Newton's divided-difference interpolating, between parallel plates, E2.1, P2.43, P2.45
Richardson method, 450 9, E1.1, P1.8, P1.9 emission between wavelengths, P2.13
stable explicit methods, 451 Newton's forward formula, 36, P1.10, P1.11 from cooling fins, P6.37
Parasitic solutions, for multistep methods in Newton's fundamental formula, 13, P1.ll from insulated pipe wall, P5.33
solution of ODE, 389, P6.12, P6.13 orthogonal, see Orthogonal polynomials from wall of combustion chamber, P3.42
Partial derivatives, approximation of, 430, splitting ratio of into partial fractions, P3.10 involving refractories, P5.26-P5.29, P7.44
462, P7.1, P7.2 Stirling's formula, 38, P1.23 most intense, P3.43
Partial Differential Equations (PDE), Chap- transformation of, from one variable to scattering of, P2.44
ter 7 another, 44, E1.3, P1.49 shielded, P5.30, P6.34
Partial instability, in solution of ODE, 364 Populations, 532 to bar inside furnace, P7.43
Particle-sampling probe, P6.39, P6.40, P7.27 statistics concerning, 533 to reactor tube, P6.43
Partitioning of matrices, 211, 270 wildlife, P6.38 to tube near refractory wall, P7.44
PDE, classification of, 429 Portal frame, P5.18 to tubes in furnace, P5.27-P5.29
examples of, 429 Positive definite matrix, 224-226, 301,483 view factors, by the string method, P3.17,
for methods of solution, see Characteristic- Cholesky's method for factoring, P5.22, P3.18
value problems; Elliptic PDE; and Parabolic P5.23 computation as integrals, P2.39-P2.43
PDE Power method of Mises, 226, E4.2, P4.34 definition of, P2.39, P2.41, P5.25
Permutations, 533 Power series, 4, P1.1, P1.2 Random number generator, additive method,
Pile, bending of, P6.42 telescoping or economization of, 43, E1.3, Random number generator, additive method,
Pipe, see Friction factor; Network; Pressure P1.44-P1.48 P8.12
drop; and Velocity profiles Precision, degree of, 76 for arbitrary distribution, P8.15
Pivot element, 273 Predictor-corrector methods, for solution of for normal distribution, P8.17, P8.18
effect of small value, 273, 281 ODE, 362,384,386,390, E6.4 for Poisson distribution, E8.2
selection of, 282 Adams-Moulton method, 386 mixed multiplicative congruential method,
Planck's law, P3.43 extraneous or parasitic solutions for, 389, P8.13
Plotting subroutines, E6.3, E6.5, E7.1, E8.5 P6.12, P6.13 multiplicative congruential method, 545
Point count, of a bridge hand, E8.1, E8.4, Hamming's method, 390,
E6.4, P6.15 Random walk, P8.19
P8.22 Milne fourth-order method, 386 Random walk, P8.19
Poisson distribution, 543, P8.4 Milne sixth-order method, 386, P6.11, Rank of a matrix, 212, 214
random number generator for, E8.2 P6.12 Rational functions, 2
Poisson's equation, 430, E7.7, E7.8, P7.19, modified Adams method, 386, P6.13, Reaction, see Chemical reaction
P7.29 P6.14 Reactor, see Chemical reactor
Rectifier, circuits involving, P6.22, P6.23 propagated error for, 364 Stability of methods, for ODE, 364, 388
Reduction step, 271, 273,274,282 relationship to Taylor's expansion of the marginal, 390
Refractive index, of glass, P2.47, P5.10 solution function, 361, P6.6 multistep methods, 388, 391, P6.12, P6.13
of sucrose solutions, P1.23 Runge's coefficients for, 363 relative, 390
Refractory surfaces, P5.26, P5.28 second-order methods, 361 Runge-Kutta methods, 364
Regression, for families of straight lines, P8.27 stability of, 364 strong, 390
in terms of orthogonal polynomials, 574, step-factor in, 365 weak, 390
P8.29, P8.30 stepsize control in, 364, P6.10 for PDE, 449, P7.6-P7.9
involving normal distances, P5.35 third-order methods, 362, P6.5, P6.8, explicit method, 449
linear, 571, P8.23-P8.25 P6.9 implicit method, 450
matrix formulation of, 574, P8.28 truncation error in, 363, P6.15 Richardson's method, 450
multiple and polynomial, 573, E8.5, P8.26 Run test, for random digits, P8.14 simultaneous PDE, 475
situations involving, 531, P8.23-P8.31 Rutishauser's method, 236, E4.3, P4.22 Standard deviation, 532
Regula-falsi method, 179, P3.44 acceleration of convergence in, 239 see Frames
Relative error, 365 application of, 511 Statistic, 532
Relative stability, in solution of ODE by Rutishauser's QD algorithm, 196, P3.51 Statistical Methods, Chapter 8
multistep methods, 390 statistical Methods, Chapter 8
Relaxation, 301 Sample, 532 Statistics, definition of, 532
methods for elliptic PDE, 484,508 mean, test of significance for, 568 Stefan-Boltzmann constant, P2.13
Remainder, for Taylor's formula, 4, 430 statistics concerning, 542 Step-factor, in solution of ODE, 365
Residence times in a stirred tank reactor, P8.5 two variances, ratio of, 570 Stepsize, for numerical integration, 71, 78, 100
Rhombus rules, 196 variance, joint estimate of, 568,571 in solution of ODE, 341, 364,385, 387,391
Richardson's extrapolation, 78,90, 364, P2.10 Saul'yev method, 451 control in multistep methods, 385, 387, 391,
Richardson's method, 484 Scalars, 211 P6.15
Ripple-filter circuit, P6.22 Schur's inequality, 223 control in Runge-Kutta methods, 364, P6.10
Risk analysis, P8.20 Semi-infinite medium, heat conduction in, Stiffness matrix, P5.16
Rolle's theorem, 13 P2.30, P3.41 Stirling's formula, 38, P1.23
Romberg integration, 90, E2.2 scattering of radiation in, P2.44 Strain energy in frames, P4.18
error term for, 91 Serial association, of random digits, P8.14 Stream function, in flow past a flat plate, E6.5
tableau for, 91 Shaft, oscillation of rotating masses on, in irrotational flow between coaxial tubes,
Roots, of characteristic equation, for linear P4.25, P4.26 P7.27
difference equations, 389 Shooting methods, in solution of boundary- near cylinder, with circulation, P3.47
of Chebyshev polynomials, 40,115 value problems, 405 String method for radiation view factors, P3.17,
of equations, 141 Significance, tests of, 531 P3.18
of Hermite polynomials, 116, P3.33 between sample means, 569 Stresses, in a concrete dam, P7.39
of Laguerre polynomials, 100, 103, E3.4 for sample means, 568 in a loaded plate, P7.37, P7.38
of Legendre polynomials, 100, 103 for slope of regression line, 572 Strong stability, in solution of ODE by multi-
Roots of an equation, methods for finding, Similar matrices, 221 step methods, 390
Bernoulli, 142 Simpson's rule, 73, 363, E8.3, P2.5, P2.12 Structures, see Frames
false position, 178 composite formula, 79, 90, E2.1, P2.10 Structures, see Frames
Graeffe, 141, E3.1 error term for, 73, 79 for test in h e a r regression, 572
half-interval, 178, E3.4 multidimensional, P2.34-P2.36 for test in linear regression, 572
Muller, P3.19 Simultaneous linear algebraic equations, 269 Submarine explosion, P3.35
Newton, 171, E3.3, P3.14, P3.15 direct solution of, 270, 272,297, E5.1, Subspaces, 219
quotient-difference, 196, P3.51 E5.2 Successive overrelaxation, 508, P7.19
regula falsi, 179, P3.44 Gauss-Seidel method, 299,484, E5.3, E7.6- Successive substitution, method of, 168, P3.13,
successive substitution, 168, P3.13, P3.23, E7.8, P5.38 P3.32
P3.24 iterative solution of, 298, 299, E5.3 acceleration of convergence in, P3.24
Ward, 169, P3.29 Jacobi method, 298,484, P5.37, P7.18 asymptotic convergence factor for, 168
Root-squaring method, Graeffe's, 141, E3.1 Simultaneous nonlinear equations, 269, 308 in solution of corrector equations, 385, 388,
Round-off error, 1, 6, 342, 345, 365 iterative methods for solving, 308, 319, in solution of corrector equations, 385, 388,
accumulation of, 7 E5.4, E5.5 Successive substitution method, in solution of
bound for, 7 Simultaneous ODE, 341, 365,390, E6.3, corrector equations, 385. 388.391, P6.11
effect in solving simultaneous equations, E6.4 corrector equations, 385, 388, 391, P6.11
P5.5-P5.7 Simultaneous PDE, E7.5 asymptotic convergence factor for, 385,
Runge, coefficients for Runge-Kutta method, Smoothing process, integration as a, 69 Surge tank, P6.28
363 Solution of Equations, Chapter 3 Sylvester's law of nullity, 219, P4.14
Runge-Kutta methods, for solution of ODE, Specific heat capacity of nitrogen, P1.46 Symmetric matrix, 211, 224
361 Specific volume of superheated methane, Synthetic division, 7, P1.4, P1.5
coefficients, for minimum truncation error, P1.29 by quadratic factor, P3.5
P6.15 Spline functions, P1.24-P1.27 iterated, P3.2-P3.4
fourth-order methods, 363, E6.3, E6.4, differentiation of, P2.52 Systems, of first-order ODE, 341, 365, 390,
E6.5, P6.6, P6.7, P6.15 in the solution of ODE, P6.16-P6.18 E6.3, E6.4
Gill's coefficients for, 363, P6.10 multidimensional, P1.40, P1.41 Systems of Equations, Chapter 5
increment function for, 361 Spring, bending of, P2.29
Kutta's coefficients for, 363 Spur of a matrix, 220 Table look-up procedure, 466
Index

Taylor's expansion, approach to solution of Tridiagonal matrix, eigenvalues of, E4.3, Vehicle suspension, vibrations of, P6.30,
ODE, 343,361, P6.2 P4.19 P6.31
for boundary conditions for PDE, 462-464 Tridiagonal system, algorithm for solving, Velocity profiles, in a jet, P6.41
for approximating partial derivatives, 430 441 inside tubes, E2.4, P7.30, P8.26
for two independent variables, 361,430, subroutine for solving, E7.2 near flat plate, E6.5, E7.5
P1.32, P1.33, P6.6 Truncation error, 1 Vibrations, of a beam, transversely, 430,
Taylor's formula with remainder, 4, P1.1, in solution of ODE, 342 P3.27, P4.31, P4.32
P1.2 bound for, in Euler's method, 347 of a membrane, P4.33
t distribution, see Student's t distribution for Euler's method, 345, P6.4 of mass-spring systems, E3.1, P4.23,
Telephone equation, 430 for multistep methods, 387, 391 of rotating masses, P4.25, P4.26
Telescoping a power series, 43, E1.3, P1.44- for Runge-Kutta methods, 363, P6.15 of vehicle suspension, P6.30, P6.31
P1.48 for Taylor's series approach, 344 View factor, see Radiation
Thermal conductivity of carbon dioxide, P1.19 local, 342, 345, 347, 363, 387, 391 Viscosity of liquid ethylene glycol, P1
Thermocouple emf, interpolation in table of, propagation of, 346, 364 Von Neumann stability analysis, 449, 474-
E1.2, P1.14 Truss, forces in, E5.2 475, P7.7-P7.9
Third boundary condition, 462, P7.12 Vorticity transport equation, 430
Torsional oscillations of rotating masses, P4.25, Undetermined coefficients, method of, P2.51
P4.26 Unitary matrix, 211, P4.4 Ward's method, 169, P3.29
Torsion of a cylinder, E7.8, P7.29 Unit vector, 213 Wave equation, 430
Trace of a matrix, 220 Weak stability, in solution of ODE by multi-
Transfer rule, 219 Vandermonde determinant, 8,143 step methods, 390
Transformation of a vector space, 219 Vapor pressure of aqueous ammonia solu- Weierstrass approximation theorem, 3
Transpose of a matrix, 210, P4.3 tions, P1.29 Weight factors, for Gaussian quadrature
Trapezoidal rule, 71, 90, P2.1, P2.4, P2.6 Variable, random, 532 formulas, 102, 113, 115, 116
and Romberg integration, 90, E2.2 Variance, 532,533 Weighting function, and orthogonality
composite formula, 78, 90, P2.6 for a sample, 542, 568 property for polynomials, 100
error term for, 72, 78 of arbitrary function, 534 Weights of males in United States, distribu-
improved, P2.2, P2.3 of linear combination, 534 tion of, P8.9
Triangular decomposition of matrices, 236, of points about regression line, 572-575 Wien's displacement law, P3.43
E4.3 Variate, 532
Triangular matrix, 211 Vectors, basic definitions and operations con- Yates's correction, 560
cerning, 17,213-215, E4.1