Chapra and Canale - Numerical Methods For Engineers (4e) - 3 PDF
[Pseudocode listing for the Euler loop; garbled in extraction. It ends with a DISPLAY statement.]
As soon as we enter the loop, we use an IF/THEN structure to test whether adding t + dt will take us beyond the end of the interval. If it does not, which would usually be the case at first, we do nothing. If it does, we would need to shorten the interval by setting the variable step h to tf − t. By doing this, we guarantee that the next step falls exactly on tf. After we implement this final step, the loop will terminate because the condition t ≥ tf will test true.
Notice that before entering the loop, we assign the value of the time step, dt, to another variable, h. We create this dummy variable so that our routine does not change the given value of dt if and when we shorten the time step. We do this in anticipation that we might need to use the original value of dt somewhere else in the event that this code is integrated within a larger program.
It should be noted that the algorithm is still not foolproof. For example, the user could have mistakenly entered a step size greater than the calculation interval, for example, tf − ti = 5 and dt = 20. Thus, you might want to include error traps in your code to catch such errors and to then allow the user to correct the mistake.
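The loop logic and the error trap described above can be sketched in Python. This is only an illustrative sketch, not the book's listing: the function name `parachutist_velocity`, the default `g = 9.8`, and the use of `ValueError` as the error trap are assumptions, and the derivative g − (cd/m)v is taken from the parachutist model used throughout this chapter.

```python
def parachutist_velocity(ti, tf, vi, dt, m, cd, g=9.8):
    """Euler's method for dv/dt = g - (cd/m)*v from ti to tf (illustrative sketch)."""
    # Error trap (assumed design): reject a step that is nonpositive
    # or larger than the whole calculation interval.
    if dt <= 0:
        raise ValueError("dt must be positive")
    if dt > tf - ti:
        raise ValueError("dt is larger than the calculation interval tf - ti")

    t, v = ti, vi
    h = dt  # working copy of dt, mirroring the dummy variable h in the text
    while True:
        if t + dt > tf:      # a full step would overshoot the end of the interval
            h = tf - t       # shorten it so the step lands exactly on tf
        dvdt = g - (cd / m) * v
        v = v + dvdt * h
        t = t + h
        if t >= tf:          # exit test placed after the step
            break
    return v
```

With ti = 0, tf = 5, and dt = 2, the routine takes steps of 2, 2, and 1. With dt = 20 on the same interval, it raises the error instead of silently taking one oversized step.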
2.3 MODULAR PROGRAMMING
Imagine how difficult it would be to study a textbook that had no chapters, sections, or paragraphs. Breaking complicated tasks or subjects into more manageable parts is one way to make them easier to handle. In the same spirit, computer programs can be divided into small subprograms, or modules, that can be developed and tested separately. This approach is called modular programming.
The most important attribute of modules is that they be as independent and self-contained as possible. In addition, they are typically designed to perform a specific, well-defined function and have one entry and one exit point. As such, they are usually short (generally 50 to 100 instructions in length) and highly focused.
In standard high-level languages such as Fortran 90 or C, the primary programming element used to represent each module is the procedure. A procedure is a series of computer instructions that together perform a given task. Two types of procedures are commonly employed: functions and subroutines. The former usually returns a single result, whereas the latter returns several.
In addition, it should be mentioned that much of the programming related to software packages like Excel and MATLAB involves the development of subprograms. Hence, Excel macros and MATLAB functions are designed to receive some information, perform a calculation, and return results. Thus, modular thinking is also consistent with how programming is implemented in package environments.
Modular programming has a number of advantages. The use of small, self-contained units makes the underlying logic easier to devise and to understand for both the developer and the user. Development is facilitated because each module can be perfected in isolation. In fact, for large projects, different programmers can work on individual parts. Modular design also increases the ease with which a program can be debugged and tested because errors can be more easily isolated. Finally, program maintenance and modification are facilitated. This is primarily due to the fact that new modules can be developed to perform additional tasks and then easily incorporated into the already coherent and organized scheme.
While all these attributes are reason enough to use modules, the most important reason related to numerical engineering problem solving is that they allow you to maintain your own library of useful modules for later use in other programs. This will be the philosophy of this book: All the algorithms will be presented as modules.
This approach is illustrated in Fig. 2.7, which shows a function developed to implement Euler's method. Notice that this function application and the previous versions differ in how they handle input/output. In the former versions, input and output directly come from (via INPUT statements) and to (via DISPLAY statements) the user. In the function, the inputs are passed into the FUNCTION via its argument list
Function Euler(dt, ti, tf, yi)
and the output is returned via the assignment statement
y = Euler(dt, ti, tf, yi)
In addition, recognize how generic the routine has become. There are no references to the specifics of the parachutist problem. For example, rather than calling the dependent variable v for velocity, the more generic label, y, is used within the function. Further, notice that the derivative is not computed within the function by an explicit equation. Rather, another function, dy, must be invoked to compute it. This acknowledges the fact that we might want to use this function for many different problems beyond solving for the parachutist velocity.
FIGURE 2.7
Pseudocode for a function that solves a differential equation using Euler's method.
FUNCTION Euler(dt, ti, tf, yi)
  t = ti
  y = yi
  h = dt
  DO
    IF t + dt > tf THEN
      h = tf − t
    ENDIF
    dydt = dy(t, y)
    y = y + dydt * h
    t = t + h
    IF t ≥ tf EXIT
  ENDDO
  Euler = y
END
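The generic design described here, in which the integrator knows nothing about the problem and calls a separate derivative function, might look as follows in Python. This is a hedged sketch: passing the derivative as the argument `dy`, the helper name `parachutist_dy`, and its default parameter values are assumptions made for illustration, and no error trap is included for brevity.

```python
def euler(dt, ti, tf, yi, dy):
    """Integrate dy/dt = dy(t, y) from ti to tf with Euler's method (sketch)."""
    t, y = ti, yi
    h = dt
    while True:
        if t + dt > tf:          # shorten the last step to land on tf
            h = tf - t
        y = y + dy(t, y) * h     # the derivative comes from the caller's function
        t = t + h
        if t >= tf:
            break
    return y


def parachutist_dy(t, v, m=68.1, cd=12.5, g=9.8):
    """Derivative for the parachutist model; the defaults are illustrative."""
    return g - (cd / m) * v


# The same integrator serves a different problem by swapping the derivative.
v_end = euler(0.1, 0.0, 2.0, 0.0, parachutist_dy)
y_end = euler(0.5, 0.0, 2.0, 1.0, lambda t, y: -y)   # dy/dt = -y, y(0) = 1
```

Nothing inside `euler` refers to velocity, mass, or drag, which is the point of the modular version: only the derivative function changes from one problem to the next.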
2.4 EXCEL
Excel is the spreadsheet produced by Microsoft, Inc. Spreadsheets are a special type of mathematical software that allow the user to enter and perform calculations on rows and columns of data. As such, they are a computerized version of a large accounting worksheet on which large interconnected calculations can be implemented and displayed. Because the entire calculation is updated when any value on the sheet is changed, spreadsheets are ideal for "what if?" sorts of analysis.
Excel has some built-in numerical capabilities including equation solving, curve fitting, and optimization. It also includes VBA as a macro language that can be used to implement numerical calculations. Finally, it has several visualization tools, such as graphs and three-dimensional surface plots, that serve as valuable adjuncts for numerical analysis. In the present section, we will show how these capabilities can be used to solve the parachutist problem.
To do this, let us first set up a simple spreadsheet. As shown below, the first step involves entering labels and numbers into the spreadsheet cells.
      A      B             C
1     Parachutist Problem
2
3     m      68.1          kg
4     cd     12.5          kg/s
5     dt     0.1           s
6
7     t (s)  vnum (m/s)    vanal (m/s)
8     0      0
9     2
Before we write a macro program to calculate the numerical value, we can make our subsequent work easier by attaching names to the parameter values. To do this, select cells A3:B5 (the easiest way to do this is by moving the mouse to A3, holding down the left mouse button and dragging down to B5). Next, make the menu selection
Insert Name Create Left column OK
To verify that this has worked properly, select cell B3 and check that the label "m" appears in the name box (located on the left side of the sheet just below the menu bars).
Move to cell C8 and enter the analytical solution (Eq. 1.9),
=9.8*m/cd*(1-exp(-cd/m*A8))
When this formula is entered, the value 0 should appear in cell C8. Then copy the formula down to cell C9 to give a value of 16.405 m/s.
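As a cross-check on the two spreadsheet values, the same analytical formula can be evaluated outside Excel. The Python sketch below is illustrative only; the function name `v_analytical` is an assumption, and the parameter values are the ones entered in cells B3 and B4.

```python
import math


def v_analytical(t, m, cd, g=9.8):
    """Analytical parachutist velocity, v = (g*m/cd) * (1 - exp(-(cd/m)*t))."""
    return g * m / cd * (1.0 - math.exp(-cd / m * t))


m, cd = 68.1, 12.5               # kg and kg/s, as entered in the spreadsheet
for t in (0.0, 2.0):             # the times in cells A8 and A9
    print(f"t = {t:4.1f} s   v = {v_analytical(t, m, cd):7.3f} m/s")
```

The two printed velocities should agree with the 0 and 16.405 m/s shown in cells C8 and C9.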
All the above is typical of the standard use of Excel. For example, at this point you could change parameter values and see how the analytical solution changes.
Now, we will illustrate how VBA macros can be used to extend the standard capabilities. Figure 2.8 lists pseudocode alongside Excel VBA code for all the control structures described in the previous section (Figs. 2.2 through 2.6). Notice how, although the details differ, the structure of the pseudocode and the VBA code are identical.
We can now use some of the constructs from Fig. 2.8 to write a macro function to numerically compute velocity. Open VBA by selecting*
Tools Macro Visual Basic Editor
Once inside the Visual Basic Editor (VBE), select
Insert Module
and a new code window will open up. The following VBA function can be developed directly from the pseudocode in Fig. 2.7. Type it into the code window.
Option Explicit
Function Euler(dt, ti, tf, yi, m, cd)
Dim h As Single, ...
[Remainder of the listing lost in extraction.]
Compare this macro with the pseudocode from Fig. 2.7 and recognize how similar they are. Also, see how we have expanded the function's argument list to include the necessary parameters for the parachutist velocity model. The resulting velocity v is then passed back to the spreadsheet via the function name.
Also notice how we have included another function to compute the derivative. This can be entered in the same module by typing it directly below the Euler function,
Function dy(t, v, m, cd)
Const g As Single = 9.8
dy = g - (cd / m) * v
End Function
*The hot key combination Alt-F11 is even quicker!
FIGURE 2.8
The fundamental control structures in (a) pseudocode and (b) Excel VBA.
[Side-by-side listings for IF/THEN, IF/THEN/ELSE, IF/THEN/ELSEIF, DOEXIT, and count-controlled loops; garbled in extraction.]
The final step is to return to the spreadsheet and invoke the function by entering the following formula in cell B9
=Euler(dt,A8,A9,B8,m,cd)
The result of the numerical integration, 16.531, will appear in cell B9.
You should appreciate what has happened here. When you enter the function into the spreadsheet cell, the parameters are passed into the VBA program where the calculation is performed and the result is then passed back and displayed in the cell. In effect, the VBA macro language allows you to use Excel as your input/output mechanism. All sorts of benefits arise from this fact.
For example, now that you have set up the calculation, you can play with it. Suppose that the jumper was much heavier, say, m = 100 kg (about 220 pounds). Enter 100 into cell B3 and the spreadsheet will update immediately to show a value of 17.438 in cell B9. Change the mass back to 68.1 kg and the previous result, 16.531, automatically reappears in cell B9.
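The same "what if?" experiment can be mimicked in a few lines of Python. The sketch below is an assumption-laden stand-in for the spreadsheet, not the VBA macro itself: the argument order of `euler` follows the cell formula, the step size 0.1 s is the dt value entered on the sheet, and double precision is used where the VBA code declares Single.

```python
def dy(t, v, m, cd, g=9.8):
    """Derivative for the parachutist model: dv/dt = g - (cd/m)*v."""
    return g - (cd / m) * v


def euler(dt, ti, tf, yi, m, cd):
    """Euler's method with the model parameters passed through to dy (sketch)."""
    t, y, h = ti, yi, dt
    while True:
        if t + dt > tf:          # shorten the final step to land on tf
            h = tf - t
        y = y + dy(t, y, m, cd) * h
        t = t + h
        if t >= tf:
            break
    return y


# Change the mass, as one would by editing cell B3, and recompute v at t = 2 s.
for m in (68.1, 100.0):
    v = euler(0.1, 0.0, 2.0, 0.0, m, 12.5)
    print(f"m = {m:5.1f} kg   v(2 s) = {v:.3f} m/s")
```

To about three decimal places, the two results should match the 16.531 and 17.438 quoted above for cell B9.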
Now let us take the process one step further by filling in some additional numbers for the time. Enter the numbers 4, 6, . . . 16 in cells A10 through A16. Then copy the formulas from cells B9:C9 down to rows 10 through 16. Notice how the VBA program calculates the numerical result correctly for each new row. (To verify this, change dt to 2 and compare with the results previously computed by hand in Example 1.2.) An additional embellishment would be to develop an x-y plot of the results using the Excel Chart Wizard.
The final spreadsheet is shown below. We now have created a pretty nice problem-solving tool. You can perform sensitivity analyses by changing the values for each of the parameters. As each new value is entered, the computation and the graph would be automatically updated. It is this interactive nature that makes Excel so powerful. However, recognize that the ability to solve this problem hinges on being able to write the macro with VBA.
      A      B             C
1     Parachutist Problem
2
3     m      68.1          kg
4     cd     12.5          kg/s
5     dt     0.1           s
6
7     t (s)  vnum (m/s)    vanal (m/s)
8     0      0.000         0.000
9     2      16.531        16.405
10    4      27.943        27.769
11    6      35.822        35.642
12    8      41.262        41.095
13    10     45.017        44.873
14    12     [illegible]   47.490
15    14     [illegible]   49.303
16    16     50.635        50.559
[Chart: vnum (m/s) and vanal (m/s) plotted against t.]
It is the combination of the Excel environment with the VBA programming language that truly opens up a world of possibilities for engineering problem solving. In the coming chapters, we will illustrate how this is accomplished.
2.5 MATLAB
MATLAB is the flagship software product of MathWorks, Inc., which was cofounded by the numerical analysts Cleve Moler and John N. Little. As the name implies, MATLAB was originally developed as a matrix laboratory. To this day, the major element of MATLAB is still the matrix. Mathematical manipulations of matrices are very conveniently implemented in an easy-to-use, interactive environment. To these matrix manipulations, MATLAB has added a variety of numerical functions, symbolic computations, and visualization tools. As a consequence, the present version represents a fairly comprehensive technical computing environment.
MATLAB has a variety of functions and operators that allow convenient implementation of many of the numerical methods developed in this book. These will be described in detail in the individual chapters that follow. In addition, programs can be written as so-called m-files that can be used to implement numerical calculations. Let us explore how this is done.
First, you should recognize that normal MATLAB use is closely related to programming. For example, suppose that we wanted to determine the analytical solution to the parachutist problem. This could be done with the following series of MATLAB commands
[Preceding commands lost in extraction.]
v=g*m/cd*(1-exp(-cd/m*tf))
with the result being displayed as
[Output lost in extraction.]
Thus, the sequence of commands is just like the sequence of instructions in a typical programming language.
Now what if you want to deviate from the sequential structure? Although there are some neat ways to inject some nonsequential capabilities in the standard command mode, the inclusion of decisions and loops is best done by creating a MATLAB document called an m-file. To do this, click on
File New Mfile
and a new window will open with a heading "MATLAB Editor/Debugger." In this window, you can type and edit MATLAB programs. Type the following code there
[Listing lost in extraction.]
Notice how the commands are written in exactly the way as they would be written in the front end of MATLAB. Create a directory where you will keep your MATLAB m-files (for the present example we will use one called c:\MATLAB\mfiles\). Save the program on this directory with the name: analpara. MATLAB will automatically attach the extension .m to denote it as an m-file: analpara.m.
To run the program, you must go back to the command mode. The most direct way to do this is to click on the "MATLAB Command Window" button on the task bar (which is usually at the bottom of the screen). Next we have to tell MATLAB the location of the files. Click on
File Set path
A window called "Path Browser" will open. Change the current directory to c:\MATLAB\mfiles\ and press enter. Now exit the Path Browser by clicking on the x box in the upper right corner. Every time you start up MATLAB, you will have to set the path in this way.
The program can now be run by typing the name of the m-file, analpara, which should look like
[Command and output lost in extraction.]
Now one problem with the foregoing is that it is set up to compute one case only. You can make it more flexible by having the user input some of the variables. For example, suppose that you wanted to assess the impact of mass on the velocity at 2 s. The m-file could be rewritten as the following to accomplish this
g=9.8;
[Intervening lines lost in extraction.]
v=g*m/cd*(1-exp(-cd/m*tf))
Save this as analpara2.m. If you typed analpara2 while being in command mode, the prompt would show
mass (kg):
The user could then enter a value like 100, and the result will be displayed as
17.3420
Now it should be pretty clear how we can program a numerical solution with an m-file. In order to do this, we must first understand how MATLAB handles logical and looping structures. Figure 2.9 lists pseudocode alongside MATLAB code for all the control structures from the previous section. Notice how, although the details differ, the structures of the pseudocode and the MATLAB code are identical.
In particular, look at how we have represented the DOEXIT structure. In place of the DO, we use the statement WHILE(1). Because MATLAB interprets the number 1 as corresponding to "true," this statement will repeat infinitely in the same manner as the DO statement. The loop is terminated with a break command. This command transfers control to the statement following the end statement that terminates the loop.
FIGURE 2.9
The fundamental control structures in (a) pseudocode and (b) the MATLAB programming language.
[Side-by-side listings for IF/THEN, IF/THEN/ELSE, IF/THEN/ELSEIF, DOEXIT, and count-controlled loops; garbled in extraction.]
The following MATLAB m-file can now be developed directly from the pseudocode in Fig. 2.7. Type it into the MATLAB Editor/Debugger
g=9.8;
m=input('mass (kg):');
cd=12.5;
ti=0;
tf=2;
vi=0;
dt=0.1;
t=ti;
v=vi;
h=dt;
while (1)
  if t+dt > tf
    h=tf-t;
  end
  dvdt=g-(cd/m)*v;
  v=v+dvdt*h;
  t=t+h;
  if t >= tf, break, end
end
disp('velocity (m/s):')
disp(v)
Save this file as numpara.m and return to the command mode and run it by entering: numpara. The following output should result:
mass (kg): 100
velocity (m/s):
17.43
As a final step in this development, let us take the above m-file and convert it into a proper function. This can be done in the following m-file based on the pseudocode from Fig. 2.7
function yy = euler(dt,ti,tf,yi,m,cd)
[Remainder of the listing lost in extraction.]
Save this file as euler.m and then create another m-file to compute the derivative,
[Listing garbled in extraction.]
Save this file as dy.m and return to the command mode. In order to invoke the function and see the result, you can type in the following commands
[Commands lost in extraction.]
When the last command is entered, the answer will be displayed as
16.5309
It is the combination of the MATLAB environment with the m-file programming language that truly opens up a world of possibilities for engineering problem solving. In the coming chapters we will illustrate how this is accomplished.
2.6 OTHER LANGUAGES AND LIBRARIES
In the previous sections, we showed how Excel and MATLAB function procedures for Euler's method could be developed from an algorithm expressed as pseudocode. You should recognize that similar functions can be written in high-level languages like Fortran 90 and C++. For example, a Fortran 90 function for Euler's method is
Function Euler(dt, ti, tf, yi, m, cd)
REAL dt, ti, tf, yi, m, cd
Real h, t, y, dydt
...
dydt = dy(t, y, m, cd)
y = y + dydt * h
...
If (t >= tf) Exit
...
For C, the result would look quite similar to the MATLAB function. The point is that once a well-structured algorithm is developed in pseudocode form, it can be readily implemented in a variety of programming environments.
In this book, our approach will be to provide you with well-structured procedures written as pseudocode. This collection of algorithms then constitutes a numerical library that can be accessed to perform specific numerical tasks in a range of software tools and programming languages.
Beyond your own programs, you should be aware that commercial programming libraries contain many useful numerical procedures. For example, the Numerical Recipe library includes a large range of algorithms written in Fortran and C.* These procedures are described in both book (for example, Press et al., 1992) and electronic form. For Fortran, the IMSL (International Mathematical and Statistical Library) provides over 700 procedures spanning all the numerical areas covered in this text. Because of the widespread use of Fortran in engineering, we include IMSL applications throughout the book.
*[Start of footnote lost in extraction] MATLAB. Information on all the Numerical Recipe products can be found at [URL illegible].
PROBLEMS
2.1 Write pseudocode to implement the flowchart depicted in Fig. P2.1. Make sure that proper indentation is included to make the structure clear.
Figure P2.1
2.2 A value for the concentration of a pollutant in a lake is recorded on each card in a set of index cards. A card marked "end of data" is placed at the end of the set. Write an algorithm to determine the sum and the average of these values.
2.3 Write a structured flowchart for Prob. 2.2.
2.4 Develop, debug, and document a subprogram to determine the roots of a quadratic equation in either a high-level language or a macro language of your choice. Use a subroutine procedure to compute the roots (either real or complex). Perform test runs for the cases [partly illegible: ... a = 0, b = −3 ...].
2.5 The sine function can be evaluated by the following infinite series:
$\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots$
Write an algorithm to implement this formula so that it computes and prints out the values of sin x as each term in the series is added. In other words, compute and print in sequence the values for
$\sin x = x$
$\sin x = x - \frac{x^3}{3!}$
$\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!}$
up to the order term of your choosing. For each of the above, compute and print out the percent relative error as
$\%\ \text{error} = \frac{\text{true} - \text{series approximation}}{\text{true}} \times 100\%$
2.6 Develop a structured flowchart for Prob. 2.5, and write pseudocode for Prob. 2.5.
2.7 Develop, debug, and document a subprogram for Prob. 2.5 in either a high-level language or a macro language of your choice. Employ the library function for the sine in your computer to determine the true value. Have the program print out the series approximation and the error at each step. As a test case, employ the program to compute sin(1.5) for up to and including the term $x^{15}/15!$. Interpret your results.
2.8 The following algorithm is designed to determine a grade for a course that consists of quizzes, homework, and a final exam:
Step 1: Input course number and name.
Step 2: Input weighting factors for quizzes (WQ), homework (WH), and the final exam (WF).
Step 3: Input quiz grades and determine an average quiz grade (AQ).
Step 4: Input homework grades and determine an average homework grade (AH).
Step 5: If this course has a final grade, continue to step 6. If not, go to step 9.
Step 6: Input final exam grade (FE).
Step 7: Determine average grade AG according to
$AG = \frac{WQ \times AQ + WH \times AH + WF \times FE}{WQ + WH + WF}$
Step 8: Go to step 10.
Step 9: Determine average grade AG according to
$AG = \frac{WQ \times AQ + WH \times AH}{WQ + WH}$
Step 10: Print out course number, name, and average grade.
Step 11: Terminate computation.
Write, debug, and document a structured computer program based on this algorithm. Test it using the following data to calculate a grade without the final exam and a grade with the final exam: WQ = 30; WH = 30; WF = 40; quizzes = 98, 95, 90, 60, 99; homework = 95, 90, 86, 100, 100, 77; and final exam = 91.
2.9 An amount of money P is invested in an account where interest is compounded at the end of the period. The future worth F yielded at an interest rate i after n periods may be determined from the following formula:
$F = P(1 + i)^n$
Write a subprogram that will calculate the future worth of an investment in either a high-level language or a macro language of your choice. The input to the program should include the initial investment P, the interest rate i (as a decimal), and the number of years n for which the future worth is to be calculated. The output should also include these values. The output should also include, in a labeled table format, the future worth for each year up to and including the nth year. Run the program for P = $100,000, i = [value illegible], and n = 25 years.
2.10 The average daily temperature for an area can be approximated by the following function,
$T = T_{\text{mean}} + (T_{\text{peak}} - T_{\text{mean}}) \cos(\omega (t - t_{\text{peak}}))$
where $T_{\text{mean}}$ = the average annual temperature, $T_{\text{peak}}$ = the peak temperature, $\omega$ = the frequency of the annual variation (= 2π/365), and $t_{\text{peak}}$ = day of the peak temperature (≅ 205 d). Parameters for some U.S. towns are listed in Table P2.10.
Table P2.10 Mean daily air temperature parameters for some selected U.S. locations.
City            Tmean (°C)     Tpeak (°C)
[illegible]     [illegible]    [illegible]
[illegible]     [illegible]    [illegible]
[row missing]
Seattle, WA     10.6           [illegible]
Boston, MA      [illegible]    [illegible]
Develop a subprogram in either a high-level language or a macro language of your choice that computes the average temperature between two days of the year entered by the user. Test it for (a) January-February in Bismarck, North Dakota (t = 0 to 59) and (b) July-August in Yuma, Arizona (t = 180 to 242).
2.11 Economic formulas are available to compute annual payments for loans. Suppose that you borrow an amount of money P and agree to repay it in n annual payments at an interest rate of i. The formula to compute the annual payment A is
$A = P \frac{i(1 + i)^n}{(1 + i)^n - 1}$
Write a subprogram in either a high-level language or a macro language of your choice to compute A. Test it with P = $35,000 and an interest rate of 15 percent (i = 0.15). Set up the program so that you can evaluate as many values of n as you like. Compute results for n = 1, 2, 3, 4, and 5.
2.12 Develop, debug, and test a subprogram in either a high-level language or a macro language of your choice to compute the velocity of the falling parachutist as outlined in Example 1.2. Design the program so that it allows the user to input values for the drag coefficient and mass. Test the program by duplicating the results from Example 1.2. Repeat the computation but employ step sizes of 1 and 0.5 s. Compare your results with the analytical solution obtained previously in Example 1.1. Does a smaller step size make the results better or worse? Explain your results.
2.13 The bubble sort is an inefficient but easy-to-program sorting technique. The idea behind the sort is to move down through an array comparing adjacent pairs and swapping the values if they are out of order. For this method to sort the array completely, it may need to pass through it many times. As the passes proceed for an ascending-order sort, the smaller elements in the array appear to rise toward the top like bubbles. Eventually, there will be a pass through the array where no swaps are required. Then the array is sorted. [Remainder of problem lost in extraction.]
Figure P2.13
2.14 Figure P2.14 shows a cylindrical tank with a conical base. If the liquid level is quite low in the conical part, the volume is simply the conical volume of liquid. If the liquid level is midrange in the cylindrical part, the total volume of liquid includes the filled conical part and the partially filled cylindrical part.
Write a function procedure to compute the tank's volume as a function of given values of R and d. Use decisional control structures (like If/Then, ElseIf, Else, End If). Design the function so that it returns the volume for all cases where the depth is less than 3R. Return an error message ("Overtop") if you overtop the tank, that is, d > 3R. Test it with the following data:
[Test data lost in extraction.]
Figure P2.14
2.15 Two distances are required to specify the location of a point relative to an origin in two-dimensional space (Fig. P2.15):
• The horizontal and vertical distances (x, y) in Cartesian coordinates
• The radius and angle (r, θ) in radial coordinates
It is relatively straightforward to compute Cartesian coordinates (x, y) on the basis of polar coordinates (r, θ). The reverse process is not so simple. The radius can be computed by the following formula:
$r = \sqrt{x^2 + y^2}$
If the coordinates lie within the first and fourth coordinates (that is, x > 0), then a simple formula can be used to compute θ:
$\theta = \tan^{-1}\left(\frac{y}{x}\right)$
The difficulty arises for the other cases. The following table summarizes the possibilities:
x     y     θ
[Table rows lost in extraction.]
Figure P2.15
(a) Write a well-structured flowchart for a subroutine procedure to calculate r and θ as a function of x and y. Express the final results for θ in degrees.
(b) Write a well-structured function procedure based on your flowchart. Test your program by using it to fill out the following table:
[Table lost in extraction.]
CHAPTER 3
Approximations and
Round-Off Errors
Because so many of the methods in this book are straightforward in description and application, it would be very tempting at this point for us to proceed directly to the main body of the text and teach you how to use these techniques. However, understanding the concept of error is so important to the effective use of numerical methods that we have chosen to devote the next two chapters to this topic.
The importance of error was introduced in our discussion of the falling parachutist in Chap. 1. Recall that we determined the velocity of a falling parachutist by both analytical and numerical methods. Although the numerical technique yielded estimates that were close to the exact analytical solution, there was a discrepancy, or error, because the numerical method involved an approximation. Actually, we were fortunate in that case because the availability of an analytical solution allowed us to compute the error exactly. For many applied engineering problems, we cannot obtain analytical solutions. Therefore, we cannot compute exactly the errors associated with our numerical methods. In these cases, we must settle for approximations or estimates of the errors.
Such errors are characteristic of most of the techniques described in this book. This statement might at first seem contrary to what one normally conceives of as sound engineering. Students and practicing engineers constantly strive to limit errors in their work. When taking examinations or doing homework problems, you are penalized, not rewarded, for your errors. In professional practice, errors can be costly and sometimes catastrophic. If a structure or device fails, lives can be lost.
Although perfection is a laudable goal, it is rarely, if ever, attained. For example, despite the fact that the model developed from Newton's second law is an excellent approximation, it would never in practice exactly predict the parachutist's fall. A variety of factors such as winds and slight variations in air resistance would result in deviations from the prediction. If these deviations are systematically high or low, then we might need to develop a new model. However, if they are randomly distributed and tightly grouped around the prediction, then the deviations might be considered negligible and the model deemed adequate. Numerical approximations also introduce similar discrepancies into the analysis. Again, the question is: How much error is present in our calculations and is it tolerable?
This chapter and the next cover basic topics related to the identification, quantification, and minimization of these errors. In this chapter, general information concerned with the quantification of error is reviewed in the first sections. This is followed by a section on one of the two major forms of numerical error: round-off error. Round-off error is due to the fact that computers can represent only quantities with a finite number of digits. Then Chap. 4 deals with the other major form: truncation error. Truncation error is the discrepancy introduced by the fact that numerical methods may employ approximations to represent exact mathematical operations and quantities. Finally, we briefly discuss errors not directly connected with the numerical methods themselves. These include blunders, formulation or model errors, and data uncertainty.
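Round-off error is easy to observe directly. The short Python sketch below is an illustration added here, not an example from the text; it assumes the usual IEEE 754 double-precision arithmetic, in which the decimal value 0.1 cannot be stored exactly.

```python
# Add 0.1 ten times. In exact arithmetic the sum is 1; in binary floating
# point each stored 0.1 is slightly off, and the error accumulates.
total = 0.0
for _ in range(10):
    total += 0.1

print(total == 1.0)         # False under IEEE 754 double precision
print(abs(total - 1.0))     # a tiny but nonzero discrepancy
```

The discrepancy is far too small to matter in a single calculation, but it illustrates why a quantity represented with a finite number of digits is, in general, only an approximation.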
SIGNIFICANT FIGURES
This book deals extensively with approximations connected with the manipulation of numbers.
Consequently, before discussing the errors associated with numerical methods, it is
useful to review basic concepts related to approximate representation of the numbers
themselves.
Whenever we employ a number in a computation, we must have assurance that it
can be used with confidence. For example, Fig. 3.1 depicts a speedometer and odometer
from an automobile. Visual inspection of the speedometer indicates that the car is traveling
between 48 and 49 km/h. Because the indicator is higher than the midpoint between the
markers on the gauge, we can say with assurance that the car is traveling at approximately
49 km/h. We have confidence in this result because two or more reasonable individuals
reading this gauge would arrive at the same conclusion. However, let us say that we insist
that the speed be estimated to one decimal place. For this case, one person might say 48.8,
whereas another might say 48.9 km/h. Therefore, because of the limits of this instrument,
only the first two digits can be used with confidence. Estimates of the third digit (or higher)
must be viewed as approximations. It would be ludicrous to claim, on the basis of this
speedometer, that the automobile is traveling at 48.8642138 km/h. In contrast, the odometer
provides up to six certain digits. From Fig. 3.1, we can conclude that the car has traveled
slightly less than 87,324.5 km during its lifetime. In this case, the seventh digit (and
higher) is uncertain.
FIGURE 3.1
An automobile speedometer and odometer illustrating the concept of a significant figure.
The concept of a significant figure, or digit, has been developed to formally designate
the reliability of a numerical value. The significant digits of a number are those that can be
used with confidence. They correspond to the number of certain digits plus one estimated
digit. For example, the speedometer and the odometer in Fig. 3.1 yield readings of three
and seven significant figures, respectively. For the speedometer, the two certain digits are
48. It is conventional to set the estimated digit at one-half of the smallest scale division on
the measurement device. Thus the speedometer reading would consist of the three significant
figures: 48.5. In a similar fashion, the odometer would yield a seven-significant-figure
reading of 87,324.45.
Although it is usually a straightforward procedure to ascertain the significant figures
of a number, some cases can lead to confusion. For example, zeros are not always significant
figures because they may be necessary just to locate a decimal point. The numbers
0.00001845, 0.0001845, and 0.001845 all have four significant figures. Similarly, when
trailing zeros are used in large numbers, it is not clear how many, if any, of the zeros are
significant. For example, at face value the number 45,300 may have three, four, or five
significant digits, depending on whether the zeros are known with confidence. Such uncertainty
can be resolved by using scientific notation, where $4.53 \times 10^4$, $4.530 \times 10^4$, and
$4.5300 \times 10^4$ designate that the number is known to three, four, and five significant figures,
respectively.
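As a brief illustration of the scientific-notation convention above, the following Python sketch prints 45,300 with three, four, and five significant figures. The helper name `to_sig_figs` is an assumption introduced here for illustration; it simply relies on Python's built-in `e` format specifier, whose precision counts the digits after the decimal point.

```python
def to_sig_figs(value, n):
    """Format value in scientific notation with n significant figures.

    Hypothetical helper (not from the text). The 'e' format keeps one
    digit before the decimal point, so n - 1 digits follow it.
    """
    return f"{value:.{n - 1}e}"


# The same quantity written with three, four, and five significant figures.
for n in (3, 4, 5):
    print(n, to_sig_figs(45300, n))
```

The number of digits written after the decimal point, rather than the magnitude of the number, is what communicates how many figures are claimed to be reliable.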
The concept of significant figures has two important implications for our study of
numerical methods:
1. As introduced in the falling parachutist problem, numerical methods yield approximate
   results. We must, therefore, develop criteria to specify how confident we are in our
   approximate result. One way to do this is in terms of significant figures. For example,
   we might decide that our approximation is acceptable if it is correct to four significant
   figures.
2. Although quantities such as $\pi$, $e$, or $\sqrt{7}$ represent specific quantities, they cannot be
   expressed exactly by a limited number of digits. For example,
   $$\pi = 3.141592653589793238462643\ldots$$
   ad infinitum. Because computers retain only a finite number of significant figures, such
   numbers can never be represented exactly. The omission of the remaining significant
   figures is called round-off error.
Both round-off error and the use of significant figures to express our confidence in a
numerical result will be explored in detail in subsequent sections. In addition, the concept
of significant figures will have relevance to our definition of accuracy and precision in the
next section.
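The second implication can be checked on any machine. The short Python sketch below assumes a standard double-precision float (as used by CPython) and compares the stored value of π with the digits quoted above; the variable names are illustrative only.

```python
import math

# Digits of pi quoted in the text (the true expansion continues forever).
PI_DIGITS = "3.141592653589793238462643"

stored = repr(math.pi)   # shortest decimal string that round-trips the stored float
print("stored:", stored)
print("quoted:", PI_DIGITS)

# Count how many leading characters of the two strings agree.
matching = 0
for a, b in zip(stored, PI_DIGITS):
    if a != b:
        break
    matching += 1

# Subtract one so that the decimal point is not counted as a digit.
print("agreeing significant figures:", matching - 1)
```

Every digit beyond those retained by the float is simply absent from the stored value, which is the round-off error described in the text.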
3.2 ACCURACY AND PRECISION
The errors associated with both calculations and measurements can be characterized with
regard to their accuracy and precision. Accuracy refers to how closely a computed or measured
value agrees with the true value. Precision refers to how closely individual computed
or measured values agree with each other.
These concepts can be illustrated graphically using an analogy from target practice.
The bullet holes on each target in Fig. 3.2 can be thought of as the predictions of a numerical
technique, whereas the bull’s-eye represents the truth. Inaccuracy (also called bias) is
defined as systematic deviation from the truth. Thus, although the shots in Fig. 3.2c are
more tightly grouped than those in Fig. 3.2a, the two cases are equally biased because
they are both centered on the upper left quadrant of the target. Imprecision (also called
uncertainty), on the other hand, refers to the magnitude of the scatter. Therefore, although
Fig. 3.2b and d are equally accurate (that is, centered on the bull’s-eye), the latter is more
precise because the shots are tightly grouped.
Numerical methods should be sufficiently accurate or unbiased to meet the requirements
of a particular engineering problem. They also should be precise enough for adequate
engineering design. In this book, we will use the collective term error to represent both the
inaccuracy and the imprecision of our predictions. With these concepts as background, we
can now discuss the factors that contribute to the error of numerical computations.
FIGURE 3.2
An example from marksmanship illustrating the concepts of accuracy and precision.
(a) Inaccurate and imprecise; (b) accurate and imprecise; (c) inaccurate and precise;
(d) accurate and precise.
3.3 ERROR DEFINITIONS
Numerical errors arise from the use of approximations to represent exact mathematical
operations and quantities. These include truncation errors, which result when approximations
are used to represent exact mathematical procedures, and round-off errors, which result
when numbers having limited significant figures are used to represent exact numbers. For
both types, the relationship between the exact, or true, result and the approximation can be
formulated as
$$\text{True value} = \text{approximation} + \text{error} \qquad (3.1)$$
By rearranging Eq. (3.1), we find that the numerical error is equal to the discrepancy
between the truth and the approximation, as in
$$E_t = \text{true value} - \text{approximation} \qquad (3.2)$$
where $E_t$ is used to designate the exact value of the error. The subscript $t$ is included to
designate that this is the “true” error. This is in contrast to other cases, as described shortly,
where an “approximate” estimate of the error must be employed.
A shortcoming of this definition is that it takes no account of the order of magnitude
of the value under examination. For example, an error of a centimeter is much more
significant if we are measuring a rivet rather than a bridge. One way to account for the
magnitudes of the quantities being evaluated is to normalize the error to the true value, as in
$$\text{True fractional relative error} = \frac{\text{true error}}{\text{true value}}$$
where, as specified by Eq. (3.2), error = true value − approximation. The relative error can
also be multiplied by 100 percent to express it as
$$\varepsilon_t = \frac{\text{true error}}{\text{true value}}\,100\% \qquad (3.3)$$
where $\varepsilon_t$ designates the true percent relative error.
EXAMPLE 3.1 Calculation of Errors
Problem Statement. Suppose that you have the task of measuring the lengths of a bridge
and a rivet and come up with 9999 and 9 cm, respectively. If the true values are 10,000 and
10 cm, respectively, compute (a) the true error and (b) the true percent relative error for
each case.
Solution.
(a) The error for measuring the bridge is [Eq. (3.2)]
$$E_t = 10{,}000 - 9999 = 1 \text{ cm}$$
and for the rivet it is
$$E_t = 10 - 9 = 1 \text{ cm}$$
(b) The percent relative error for the bridge is [Eq. (3.3)]
$$\varepsilon_t = \frac{1}{10{,}000}\,100\% = 0.01\%$$
and for the rivet it is
$$\varepsilon_t = \frac{1}{10}\,100\% = 10\%$$
Thus, although both measurements have an error of 1 cm, the relative error for the rivet is
much greater. We would conclude that we have done an adequate job of measuring the
bridge, whereas our estimate for the rivet leaves something to be desired.
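The arithmetic of Example 3.1 can be reproduced with a few lines of Python. This is only a sketch: the function names `true_error` and `true_percent_relative_error` are assumptions chosen here to mirror Eqs. (3.2) and (3.3), not names used by the text.

```python
def true_error(true_value, approximation):
    """Eq. (3.2): E_t = true value - approximation."""
    return true_value - approximation


def true_percent_relative_error(true_value, approximation):
    """Eq. (3.3): eps_t = (true error / true value) * 100%."""
    return true_error(true_value, approximation) / true_value * 100.0


# Measurements from Example 3.1, in centimeters: (name, true value, measured value).
cases = (("bridge", 10_000, 9_999), ("rivet", 10, 9))

for name, true_value, measured in cases:
    e_t = true_error(true_value, measured)
    eps_t = true_percent_relative_error(true_value, measured)
    print(f"{name}: E_t = {e_t} cm, eps_t = {eps_t:.2f}%")
```

Both cases share the same true error, so only the normalized quantity distinguishes the quality of the two measurements.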
Notice that for Eqs. (3.2) and (3.3), $E$ and $\varepsilon$ are subscripted with a $t$ to signify that the
error is normalized to the true value. In Example 3.1, we were provided with this value. However,
in actual situations such information is rarely available. For numerical methods, the
true value will be known only when we deal with functions that can be solved analytically.
Such will typically be the case when we investigate the theoretical behavior of a particular
technique for simple systems. However, in real-world applications, we will obviously not
know the true answer a priori. For these situations, an alternative is to normalize the error
using the best available estimate of the true value, that is, to the approximation itself, as in
$$\varepsilon_a = \frac{\text{approximate error}}{\text{approximation}}\,100\% \qquad (3.4)$$
where the subscript $a$ signifies that the error is normalized to an approximate value. Note
also that for real-world applications, Eq. (3.2) cannot be used to calculate the error term for
Eq. (3.4). One of the challenges of numerical methods is to determine error estimates in the
absence of knowledge regarding the true value. For example, certain numerical methods
use an iterative approach to compute answers. In such an approach, a present approximation
is made on the basis of a previous approximation. This process is performed repeatedly,
or iteratively, to successively compute (we hope) better and better approximations.
For such cases, the error is often estimated as the difference between previous and current
approximations. Thus, percent relative error is determined according to
$$\varepsilon_a = \frac{\text{current approximation} - \text{previous approximation}}{\text{current approximation}}\,100\% \qquad (3.5)$$
This and other approaches for expressing errors will be elaborated on in subsequent
chapters.
The signs of Eqs. (3.2) through (3.5) may be either positive or negative. If the approximation
is greater than the true value (or the previous approximation is greater than the
current approximation), the error is negative; if the approximation is less than the true
value, the error is positive. Also, for Eqs. (3.3) to (3.5), the denominator may be less than
zero, which can also lead to a negative error. Often, when performing computations, we
may not be concerned with the sign of the error, but we are interested in whether the percent
absolute value is lower than a prespecified percent tolerance $\varepsilon_s$. Therefore, it is often
useful to employ the absolute value of Eqs. (3.2) through (3.5). For such cases, the computation
is repeated until
$$|\varepsilon_a| < \varepsilon_s \qquad (3.6)$$
If this relationship holds, our result is assumed to be within the prespecified acceptable
level $\varepsilon_s$. Note that for the remainder of this text, we will almost exclusively employ absolute
values when we use relative errors.
It is also convenient to relate these errors to the number of significant figures in the
approximation. It can be shown (Scarborough, 1966) that if the following criterion is met, we
can be assured that the result is correct to at least $n$ significant figures.
$$\varepsilon_s = (0.5 \times 10^{2-n})\% \qquad (3.7)$$
EXAMPLE 3.2 Error Estimates for Iterative Methods
Problem Statement. In mathematics, functions can often be represented by infinite
series. For example, the exponential function can be computed using
$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots + \frac{x^n}{n!} \qquad (E3.2.1)$$
Thus, as more terms are added in sequence, the approximation becomes a better and better
estimate of the true value of $e^x$. Equation (E3.2.1) is called a Maclaurin series expansion.
Starting with the simplest version, $e^x = 1$, add terms one at a time to estimate $e^{0.5}$.
After each new term is added, compute the true and approximate percent relative errors
with Eqs. (3.3) and (3.5), respectively. Note that the true value is $e^{0.5} = 1.648721\ldots$ Add
terms until the absolute value of the approximate error estimate $\varepsilon_a$ falls below a prespecified
error criterion $\varepsilon_s$ conforming to three significant figures.
Solution. First, Eq. (3.7) can be employed to determine the error criterion that ensures a
result is correct to at least three significant figures:
$$\varepsilon_s = (0.5 \times 10^{2-3})\% = 0.05\%$$
Thus, we will add terms to the series until $\varepsilon_a$ falls below this level.
The first estimate is simply equal to Eq. (E3.2.1) with a single term. Thus, the first estimate
is equal to 1. The second estimate is then generated by adding the second term, as in
$$e^x = 1 + x$$
or for $x = 0.5$,
$$e^{0.5} = 1 + 0.5 = 1.5$$
This represents a true percent relative error of [Eq. (3.3)]
$$\varepsilon_t = \frac{1.648721 - 1.5}{1.648721}\,100\% = 9.02\%$$
Equation (3.5) can be used to determine an approximate estimate of the error, as in
$$\varepsilon_a = \frac{1.5 - 1}{1.5}\,100\% = 33.3\%$$
Because $\varepsilon_a$ is not less than the required value of $\varepsilon_s$, we would continue the computation by
adding another term, $x^2/2!$, and repeating the error calculations. The process is continued
until $\varepsilon_a < \varepsilon_s$. The entire computation can be summarized as

Terms   Result         $\varepsilon_t$ (%)   $\varepsilon_a$ (%)
1       1              39.3
2       1.5            9.02       33.3
3       1.625          1.44       7.69
4       1.645833333    0.175      1.27
5       1.648437500    0.0172     0.158
6       1.648697917    0.00142    0.0158

Thus, after six terms are included, the approximate error falls below $\varepsilon_s = 0.05\%$ and the
computation is terminated. However, notice that, rather than three significant figures, the
result is accurate to five! This is because, for this case, both Eqs. (3.5) and (3.7) are
conservative. That is, they ensure that the result is at least as good as they specify. Although,
as discussed in Chap. 6, this is not always the case for Eq. (3.5), it is true most of the time.
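The iteration in Example 3.2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the text's own program: the function name `exp_maclaurin` and the row layout are introduced here, and `math.exp` is used only to supply the true value for the $\varepsilon_t$ column.

```python
import math


def exp_maclaurin(x, n_sig_figs):
    """Sum the Maclaurin series for e**x until |eps_a| < eps_s.

    Returns a list of rows (terms, result, eps_t, eps_a); eps_a is None
    for the first row because no previous approximation exists yet.
    """
    eps_s = 0.5 * 10.0 ** (2 - n_sig_figs)   # Eq. (3.7), in percent
    true_value = math.exp(x)                  # used only to report eps_t

    term = 1.0     # current series term, x**k / k!
    result = 1.0   # running sum; the one-term estimate is 1
    rows = [(1, result, abs(true_value - result) / true_value * 100.0, None)]

    k = 1
    while True:
        term *= x / k                         # next term from the previous one
        previous, result = result, result + term
        eps_t = abs(true_value - result) / true_value * 100.0   # Eq. (3.3)
        eps_a = abs((result - previous) / result) * 100.0       # Eq. (3.5)
        rows.append((k + 1, result, eps_t, eps_a))
        if eps_a < eps_s:                     # stopping test, Eq. (3.6)
            return rows
        k += 1


# Reproduce the example: x = 0.5, three significant figures.
for terms, result, eps_t, eps_a in exp_maclaurin(0.5, 3):
    eps_a_text = "" if eps_a is None else f"{eps_a:.3g}"
    print(f"{terms:>5d}  {result:.9f}  {eps_t:<8.3g}  {eps_a_text}")
```

Each new term is obtained from the previous one by multiplying by x/k, which avoids recomputing powers and factorials. Only `eps_a` drives the stopping test, since in practice the true value would not be available.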
With the preceding definitions as background, we can now proceed to the two types of
error connected directly with numerical methods: round-off errors and truncation errors.
3.4 ROUND-OFF ERRORS
As mentioned previously, round-off errors originate from the fact that computers retain
only a fixed number of significant figures during a calculation. Numbers such as $\pi$, $e$, or
$\sqrt{7}$ cannot be expressed by a fixed number of significant figures. Therefore, they cannot be
represented exactly by the computer. In addition, because computers use a base-2 representation,
they cannot precisely represent certain exact base-10 numbers. The discrepancy
introduced by this omission of significant figures is called round-off error.
3.4.1 Computer Representation of Numbers
Numerical round-off errors are directly related to the manner in which numbers are stored
in a computer. The fundamental unit whereby information is represented is called a word.
This is an entity that consists of a string of binary digits, or bits. Numbers are typically
stored in one or more words. To understand how this is accomplished, we must first review
some material related to number systems.
Number Systems. A number system is merely a convention for representing quantities.
Because we have 10 fingers and 10 toes, the number system that we are most familiar with
is the decimal, or base-10, number system. A base is the number used as the reference for
constructing the system. The base-10 system uses the 10 digits (0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
to represent numbers. By themselves, these digits are satisfactory for counting from 0 to 9.
For larger quantities, combinations of these basic digits are used, with the position or
place value specifying the magnitude. The right-most digit in a whole number represents a
number from 0 to 9. The second digit from the right represents a multiple of 10. The third
digit from the right represents a multiple of 100 and so on. For example, if we have the
number 86,409 then we have eight groups of 10,000, six groups of 1000, four groups of
100, zero groups of 10, and nine more units, or
$$(8 \times 10^4) + (6 \times 10^3) + (4 \times 10^2) + (0 \times 10^1) + (9 \times 10^0) = 86{,}409$$
Figure 3.3a provides a visual representation of how a number is formulated in the
base-10 system. This type of representation is called positional notation.
Because the decimal system is so familiar, it is not commonly realized that there are
alternatives. For example, if human beings happened to have had eight fingers and eight
toes, we would undoubtedly have developed an octal, or base-8, representation. In the
same sense, our friend the computer is like a two-fingered animal who is limited to two
states: either 0 or 1. This relates to the fact that the primary logic units of digital computers
are on/off electronic components. Hence, numbers on the computer are represented with
a binary, or base-2, system. Just as with the decimal system, quantities can be represented
using positional notation. For example, the binary number 11 is equivalent to
$(1 \times 2^1) + (1 \times 2^0) = 2 + 1 = 3$ in the decimal system. Figure 3.3b illustrates a more
complicated example.
FIGURE 3.3
How the (a) decimal (base 10) and the (b) binary (base 2) systems work. In (b), the binary
number 10101101 is equivalent to the decimal number 173.
Integer Representation. Now that we have reviewed how base-10 numbers can be
represented in binary form, it is simple to conceive of how integers are represented on a
computer. The most straightforward approach, called the signed magnitude method, employs
the first bit of a word to indicate the sign, with a 0 for positive and a 1 for negative. The
remaining bits are used to store the number. For example, the integer value of −173 would
be stored on a 16-bit computer, as in Fig. 3.4.
FIGURE 3.4
The representation of the decimal integer −173 on a 16-bit computer using the signed
magnitude method.
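Positional notation and the signed magnitude method can both be sketched in a few lines of Python. The function names `binary_to_decimal` and `signed_magnitude` are assumptions made for this illustration; Python's built-in `int(bits, 2)` performs the same binary conversion.

```python
def binary_to_decimal(bits):
    """Evaluate a string of 0s and 1s using positional notation."""
    total = 0
    for position, bit in enumerate(reversed(bits)):
        total += int(bit) * 2 ** position   # digit times its place value
    return total


def signed_magnitude(value, word_size=16):
    """Return the signed-magnitude bit pattern of an integer as a string.

    The first bit is the sign (0 positive, 1 negative); the remaining
    word_size - 1 bits hold the magnitude.
    """
    magnitude_bits = word_size - 1
    if abs(value) >= 2 ** magnitude_bits:
        raise OverflowError("magnitude does not fit in the word")
    sign = "1" if value < 0 else "0"
    return sign + format(abs(value), f"0{magnitude_bits}b")


print(binary_to_decimal("11"))         # the two-bit example from the text
print(binary_to_decimal("10101101"))   # the number shown in Fig. 3.3b
print(signed_magnitude(-173))          # the 16-bit word of Fig. 3.4
```

The magnitude check in `signed_magnitude` anticipates the limited range that a fixed word size imposes.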
EXAMPLE 3.3 Range of Integers
Problem Statement. Determine the range of integers in base-10 that can be represented
on a 16-bit computer.
Solution. Of the 16 bits, the first bit holds the sign. The remaining 15 bits can hold binary
numbers from 0 to 111111111111111. The upper limit can be converted to a decimal
integer, as in
$$(1 \times 2^{14}) + (1 \times 2^{13}) + \cdots + (1 \times 2^1) + (1 \times 2^0)$$
which equals 32,767 (note that this expression can be simply evaluated as $2^{15} - 1$). Thus, a
16-bit computer word can store decimal integers ranging from −32,767 to 32,767. In addition,
because zero is already defined as 0000000000000000, it is redundant to use the
number 1000000000000000 to define a “minus zero.” Therefore, it is usually employed to
represent an additional negative number: −32,768, and the range is from −32,768 to
32,767.
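The range found in Example 3.3 can be verified directly. The following Python sketch uses illustrative constant names (`WORD_SIZE`, `MAGNITUDE_BITS`) that do not appear in the text.

```python
WORD_SIZE = 16
MAGNITUDE_BITS = WORD_SIZE - 1   # one bit is reserved for the sign

# Upper limit: every magnitude bit set to 1.
largest = sum(2 ** k for k in range(MAGNITUDE_BITS))
assert largest == 2 ** MAGNITUDE_BITS - 1   # the shortcut noted in the example

# The otherwise redundant "minus zero" pattern supplies one extra negative value.
smallest = -(largest + 1)

print(f"range: {smallest} to {largest}")
```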
Note that the signed magnitude method described above is not used to represent integers
on conventional computers. A preferred approach called the 2’s complement technique
directly incorporates the sign into the number’s magnitude rather than providing a separate
bit to represent plus or minus (see Chapra and Canale 1994). However, Example 3.3 still
serves to illustrate how all digital computers are limited in their capability to represent
integers. That is, numbers above or below the range cannot be represented. A more serious
limitation is encountered in the storage and manipulation of fractional quantities as
described next.
FIGURE 3.5
The manner in which a floating-point number is stored in a word.
Floating-Point Representation. Fractional quantities are typically represented in computers
using floating-point form. In this approach, the number is expressed as a fractional
part, called a mantissa or significand, and an integer part, called an exponent or characteristic,
as in
$$m \cdot b^e$$
where $m$ = the mantissa, $b$ = the base of the number system being used, and $e$ = the exponent.
For instance, the number 156.78 could be represented as $0.15678 \times 10^3$ in a floating-point
base-10 system.
Figure 3.5 shows one way that a floating-point number could be stored in a word. The
first bit is reserved for the sign, the next series of bits for the signed exponent, and the last
bits for the mantissa.
Note that the mantissa is usually normalized if it has leading zero digits. For example,
suppose the quantity 1/34 = 0.029411765... was stored in a floating-point base-10 system
that allowed only four decimal places to be stored. Thus, 1/34 would be stored as
$$0.0294 \times 10^0$$
However, in the process of doing this, the inclusion of the useless zero to the right of the
decimal forces us to drop the digit 1 in the fifth decimal place. The number can be normalized
to remove the leading zero by multiplying the mantissa by 10 and lowering the exponent
by 1 to give
$$0.2941 \times 10^{-1}$$
Thus, we retain an additional significant figure when the number is stored.
The consequence of normalization is that the absolute value of $m$ is limited. That is,
$$\frac{1}{b} \le m < 1 \qquad (3.8)$$
where $b$ = the base. For example, for a base-10 system, $m$ would range between 0.1 and 1,
and for a base-2 system, between 0.5 and 1.
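The effect of normalization on the 1/34 example can be sketched as follows. The helpers `normalize` and `chop` are hypothetical names introduced for this illustration, and because Python's own floats are binary, the decimal digits shown are themselves approximations. For base 2, the standard-library function `math.frexp` returns a mantissa that satisfies Eq. (3.8) directly.

```python
import math


def normalize(value, base=10):
    """Return (m, e) with value = m * base**e and 1/base <= |m| < 1, Eq. (3.8)."""
    if value == 0:
        return 0.0, 0
    m, e = abs(value), 0
    while m >= 1:             # too large: shift the point left
        m /= base
        e += 1
    while m < 1 / base:       # leading zero: shift the point right
        m *= base
        e -= 1
    return math.copysign(m, value), e


def chop(mantissa, digits=4, base=10):
    """Keep only `digits` places of the mantissa (no rounding)."""
    scale = base ** digits
    return math.trunc(mantissa * scale) / scale


x = 1 / 34
print("unnormalized:", chop(x), "x 10^0")

m, e = normalize(x)
print("normalized:  ", chop(m), f"x 10^{e}")

# Base-2 normalization is available directly from the standard library.
print("math.frexp(0.0625) =", math.frexp(0.0625))
```

Comparing the two printed mantissas shows the extra significant figure gained by removing the leading zero before the digits are cut off.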
Floating-point representation allows both fractions and very large numbers to be expressed
on the computer. However, it has some disadvantages. For example, floating-point
numbers take up more room and take longer to process than integer numbers. More significantly,
however, their use introduces a source of error because the mantissa holds only a
finite number of significant figures. Thus, a round-off error is introduced.
EXAMPLE 3.4 Hypothetical Set of Floating-Point Numbers
Problem Statement. Create a hypothetical floating-point number set for a machine
that stores information using 7-bit words. Employ the first bit for the sign of the number,
the next three for the sign and the magnitude of the exponent, and the last three for the
magnitude of the mantissa (Fig. 3.6).
Solution. The smallest possible positive number is depicted in Fig. 3.6. The initial 0
indicates that the quantity is positive. The 1 in the second place designates that the exponent
has a negative sign. The 1’s in the third and fourth places give a maximum value to the
exponent of
$$1 \times 2^1 + 1 \times 2^0 = 3$$
Therefore, the exponent will be −3. Finally, the mantissa is specified by the 100 in the last
three places, which conforms to
$$1 \times 2^{-1} + 0 \times 2^{-2} + 0 \times 2^{-3} = 0.5$$
Although a smaller mantissa is possible (e.g., 000, 001, 010, 011), the value of 100 is used
because of the limit imposed by normalization [Eq. (3.8)]. Thus, the smallest possible positive
number for this system is $+0.5 \times 2^{-3}$, which is equal to 0.0625 in the base-10 system.
The next highest numbers are developed by increasing the mantissa, as in
$$0111101 = (1 \times 2^{-1} + 0 \times 2^{-2} + 1 \times 2^{-3}) \times 2^{-3} = (0.078125)_{10}$$
$$0111110 = (1 \times 2^{-1} + 1 \times 2^{-2} + 0 \times 2^{-3}) \times 2^{-3} = (0.093750)_{10}$$
$$0111111 = (1 \times 2^{-1} + 1 \times 2^{-2} + 1 \times 2^{-3}) \times 2^{-3} = (0.109375)_{10}$$
Notice that the base-10 equivalents are spaced evenly with an interval of 0.015625.
At this point, to continue increasing, we must decrease the magnitude of the exponent
to 10, which gives a value of
$$1 \times 2^1 + 0 \times 2^0 = 2$$
so that the exponent becomes −2. The mantissa is decreased back to its smallest value of
100. Therefore, the next number is
$$0110100 = (1 \times 2^{-1} + 0 \times 2^{-2} + 0 \times 2^{-3}) \times 2^{-2} = (0.125000)_{10}$$
This still represents a gap of 0.125000 − 0.109375 = 0.015625. However, now when
higher numbers are generated by increasing the mantissa, the gap is lengthened to 0.03125,
$$0110101 = (1 \times 2^{-1} + 0 \times 2^{-2} + 1 \times 2^{-3}) \times 2^{-2} = (0.156250)_{10}$$
$$0110110 = (1 \times 2^{-1} + 1 \times 2^{-2} + 0 \times 2^{-3}) \times 2^{-2} = (0.187500)_{10}$$
$$0110111 = (1 \times 2^{-1} + 1 \times 2^{-2} + 1 \times 2^{-3}) \times 2^{-2} = (0.218750)_{10}$$
This pattern is repeated as each larger quantity is formulated until a maximum number is
reached,
$$0011111 = (1 \times 2^{-1} + 1 \times 2^{-2} + 1 \times 2^{-3}) \times 2^{3} = (7)_{10}$$
The final number set is depicted graphically in Fig. 3.7.
FIGURE 3.6
The smallest possible positive floating-point number from Example 3.4.
FIGURE 3.7
The hypothetical number system developed in Example 3.4. Each value is indicated by a
tick mark. Only the positive numbers are shown. An identical set would also extend in the
negative direction.
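The complete positive number set of Example 3.4 can be enumerated with a short Python sketch. The function name `hypothetical_floats` is an assumption introduced here, and the bit layout is taken from the problem statement: exponents from −3 to 3 and a normalized three-bit mantissa whose leading bit is always 1.

```python
def hypothetical_floats():
    """Enumerate the positive values of the 7-bit format in Example 3.4.

    Layout assumed from the example: 1 sign bit, 1 exponent-sign bit,
    2 exponent-magnitude bits, and 3 mantissa bits, with the leading
    mantissa bit fixed at 1 by normalization.
    """
    values = []
    for exponent in range(-3, 4):                    # two magnitude bits give 0..3
        for mantissa_bits in range(0b100, 0b1000):   # patterns 100, 101, 110, 111
            mantissa = mantissa_bits / 8             # three bits after the binary point
            values.append(mantissa * 2.0 ** exponent)
    return sorted(values)


numbers = hypothetical_floats()
print(len(numbers), "positive values from", numbers[0], "to", numbers[-1])

# Show how the spacing changes across the first change of exponent.
for lower, upper in zip(numbers, numbers[1:8]):
    print(f"{lower:.6f} -> {upper:.6f}  gap = {upper - lower:.6f}")
```

All of these values are exactly representable in ordinary binary floating point, so the printed gaps are exact. The spacing doubles each time the exponent increases by one.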
Figure 3.7 manifests several aspects of floating-point representation that have significance
regarding computer round-off errors:
1. There Is a Limited Range of Quantities That May Be Represented. Just as for the integer
   case, there are large positive and negative numbers that cannot be represented.
   Attempts to employ numbers outside the acceptable range will result in what is called
   an overflow error. However, in addition to large quantities, the floating-point representation
   has the added limitation that very small numbers cannot be represented. This
   is illustrated by the underflow “hole” between zero and the first positive number in
   Fig. 3.7. It should be noted that this hole is enlarged because of the normalization
   constraint of Eq. (3.8).
2. There Are Only a Finite Number of Quantities That Can Be Represented within the
   Range. Thus, the degree of precision is limited. Obviously, irrational numbers cannot be
   represented exactly. Furthermore, rational numbers that do not exactly match one of the
   values in the set also cannot be represented precisely. The errors introduced by approximating
   both these cases are referred to as quantizing errors. The actual approximation
   is accomplished in either of two ways: chopping or rounding. For example, suppose that
   the value of $\pi = 3.14159265358\ldots$ is to be stored on a base-10 number system carrying
   seven significant figures. One method of approximation would be to merely omit,
   or “chop off,” the eighth and higher terms, as in $\pi = 3.141592$, with the introduction of
   an associated error of [Eq. (3.2)]
   $$E_t = 0.00000065\ldots$$
   This technique of retaining only the significant terms was originally dubbed “truncation”
   in computer jargon. We prefer to call it chopping to distinguish it from the
   truncation errors discussed in Chap. 4. Note that for the base-2 number system in
   Fig. 3.7, chopping means that any quantity falling within an interval of length $\Delta x$ will
   be stored as the quantity at the lower end of the interval. Thus, the upper error bound
   for chopping is $\Delta x$. Additionally, a bias is introduced because all errors are positive.
   The shortcomings of chopping are attributable to the fact that the higher terms in the
   complete decimal representation have no impact on the shortened version. For
   instance, in our example of $\pi$, the first discarded digit is 6. Thus, the last retained digit
   should be rounded up to yield 3.141593. Such rounding reduces the error to
   $$E_t = -0.00000035\ldots$$
   Consequently, rounding yields a lower absolute error than chopping. Note that for the
   base-2 number system in Fig. 3.7, rounding means that any quantity falling within an
   interval of length $\Delta x$ will be represented as the nearest allowable number. Thus, the
   upper error bound for rounding is $\Delta x/2$. Additionally, no bias is introduced because
   some errors are positive and some are negative. Some computers employ rounding.
   However, this adds to the computational overhead, and, consequently, many
   machines use simple chopping. This approach is justified under the supposition that the
   number of significant figures is large enough that resulting round-off error is usually
   negligible.
3. The Interval between Numbers, $\Delta x$, Increases as the Numbers Grow in Magnitude. It
   is this characteristic, of course, that allows floating-point representation to preserve
   significant digits. However, it also means that quantizing errors will be proportional to
   the magnitude of the number being represented. For normalized floating-point numbers,
   this proportionality can be expressed, for cases where chopping is employed, as
   $$\frac{|\Delta x|}{|x|} \le \mathscr{E} \qquad (3.9)$$
   and, for cases where rounding is employed, as
   $$\frac{|\Delta x|}{|x|} \le \frac{\mathscr{E}}{2} \qquad (3.10)$$