
Instructor: Dean Chou (周鼎贏), D.Phil. (Oxon)
Requirements and Policies
 Requirements
➢ Mathematical background
➢ Programming skill
 Policies
➢ Homework must be submitted on time
➢ The personal project must be done by yourself
➢ Adjusting your final report is prohibited
 Evaluation Methods
Points of the Report    Graded Contents
40 points               Assignments
30 points               Personal Project
30 points               Group Project
Course Program
Weeks/dates    Lecture Contents
01, 09/09      Introduction
02, 16/09      Preliminary Concepts and Error Analysis
03, 23/09      Solving One-Variable Equations
04, 30/09      Solving Systems of Algebraic Equations
05, 07/10      Curve Fitting and Interpolation
06, 14/10      Numerical Differentiation
07, 21/10      Numerical Integration
08, 28/10      ODEs for IVPs and BVPs
09, 04/11      Mid-term Exam (No Class)
10, 11/11      Personal Project Review/Presentation
11, 18/11      Personal Project Review/Presentation
12, 25/11      Personal Project Review/Presentation
13, 02/12      Personal Project Review/Presentation
14, 09/12      Group Project Review
15, 16/12      Group Project Review
16, 23/12      Group Project Review
17, 30/12      Group Project Review
18, 06/01      Final Exam (No Class)

Form Your Group
 Total students in this lecture: 16 (current)
 Target number of groups: 8 (current)
 Each group is assumed to have 2 members.
 Deadline to form your group: next Mon. (23/Sept)
Introduction
 The textbook for this lecture
➢ Name: Applied Numerical Methods for Engineers and Scientists
➢ Author: Singiresu S. Rao, University of Miami, Florida
 The reference book for this lecture
➢ Name: 數值方法與程式 (Numerical Methods and Programs)
➢ Author: 林聰悟、林佳慧
1.1 Importance of Numerical Methods in Engineering
 Experimental and analytical approaches are used to solve engineering/scientific problems.
 Analytical Approach: Most engineering analysis problems involve
(1) the development of a mathematical model to represent all the important characteristics of the physical system;
(2) the derivation of the governing equations of the model by applying physical laws, such as equilibrium equations, Newton's laws of motion, conservation of mass, and conservation of energy;
(3) the solution of the governing equations; and
(4) the interpretation of the solution.

Depending on the system being analyzed and the mathematical model used, the
governing equations may be a set of linear or nonlinear algebraic equations, a set of
transcendental equations, a set of ordinary or partial differential equations, a set of
homogeneous equations leading to an eigenvalue problem, or an equation involving
integrals or derivatives.
 We may or may not be able to find the solution of a governing equation analytically.
 If the solution can be represented in the form of a closed-form mathematical
expression, it is called an analytical solution.
 Analytical solutions denote exact solutions that can be used to study the behavior of
the system with varying parameters.
Unfortunately, very few practical systems lead to analytical solutions, and hence
analytical solutions are of limited use. (EQ-nonlinear, BC-complex geometry, and …)

 In certain special types of problems, a graphical solution can be found to study the behavior of the system. However, graphical solutions
 are usually less accurate and awkward to use,
 can only be applied if the dimensionality of the problem is at most three, and
 require more time.
Closed-form expression (from Wikipedia, the free encyclopedia)
In mathematics, an expression is said to be a closed-form expression if, and only if, it
can be expressed analytically in terms of a bounded number of certain "well-known"
functions. Typically, these well-known functions are defined to be elementary
functions – constants, one variable x, elementary operations of arithmetic (+ – × ÷), nth
roots, exponent and logarithm (which thus also include trigonometric functions and
inverse trigonometric functions).
By contrast, infinite series, integrals, limits, and continued fractions are not permitted.
Indeed, by the Stone–Weierstrass theorem, any continuous function on the unit interval
can be expressed as a limit of polynomials, so any class of functions containing the
polynomials and closed under limits will necessarily include all continuous functions.
Similarly, an equation or system of equations is said to have a closed-form solution if,
and only if, at least one solution can be expressed as a closed-form expression. There is a
subtle distinction between a "closed-form function" and a "closed-form number" in the
discussion of a "closed-form solution", discussed in (Chow 1999) and below.
An area of study in mathematics is proving that no closed-form expression exists, which
is referred to broadly as "Galois theory", based on the central example of closed-form
solutions to polynomials.
 Numerical solutions are those that cannot be expressed in the form of mathematical expressions. They can be found using a suitable calculation-intensive (+, −, ×, ÷) process, known as a numerical method.
 For example, consider the integral
I_1 = \int_a^b x e^{-x^2} dx.  (1.1)
The value of this integral can be expressed analytically as (closed-form solution)
I_1 = \left[ -\frac{1}{2} e^{-x^2} \right]_a^b = \frac{1}{2} e^{-a^2} - \frac{1}{2} e^{-b^2}.  (1.2)

 On the other hand, the integral
I_2 = \int_a^b f(x)\, dx = \int_a^b e^{-x^2} dx  (1.3)
does not have a closed-form (analytical) solution. This integral can only be evaluated numerically.

 Since the integral is the same as the area under the curve f(x), its value can be estimated by breaking the area under the curve into small rectangles and adding the areas of the rectangles (see Fig. 1.1): discretize the function f(x) into a data table.
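The rectangle idea above can be sketched in a few lines of Python (a sketch, not the text's program; the name rectangle_rule and the choice of midpoints are my own):

```python
import math

def rectangle_rule(f, a, b, n=1000):
    """Approximate the integral of f over [a, b] as a sum of n rectangle areas."""
    h = (b - a) / n                                   # width of each rectangle
    # evaluate f at the midpoint of each strip and add up the rectangle areas
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# I2 = integral of exp(-x^2) from 0 to 1 has no closed form, but it can be
# checked against math.erf: integral = (sqrt(pi)/2) * (erf(b) - erf(a))
approx = rectangle_rule(lambda x: math.exp(-x * x), 0.0, 1.0)
exact = math.sqrt(math.pi) / 2 * math.erf(1.0)
print(approx, exact)
```

With n = 1000 midpoint rectangles the two values agree to better than 1e-6, which is the whole point of Fig. 1.1.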

 Since numerical methods involve a large number of tedious/intensive REAL arithmetic (+, −, ×, ÷) calculations, their use and popularity have been increasing with the development and availability of powerful and inexpensive computers.
1.2 Computers
1.2.1 Brief History

The abacus, developed in ancient China and Egypt about 3000 years ago, represents one of the earliest computers.
The first systematic attempt to organize information processing resulted in the development of logarithmic and trigonometric tables in the 16th and 17th centuries. The slide rule, developed in 1654 by Robert Bissaker, was used for multiplication and division, as well as for the evaluation of square roots, logarithms, and trigonometric functions.
Stimulated by the industrial revolution, the French philosopher and mathematician Blaise Pascal developed the first mechanical adding machine in 1642. Later, Gottfried Wilhelm von Leibniz, a German philosopher and mathematician, built a mechanical calculator in 1694. In 1804, Joseph Jacquard, a French loom designer, developed an automatic pattern loom whose sequence of operations was controlled by punched cards. The loom was used to produce intricate patterns and paved the way for the development of mechanical computers. The British mathematician Charles Babbage designed an automatic digital computer around 1833, but the machine, called the analytical engine, was never built.
The basic ideas of Babbage were implemented in the electromechanical Automatic-Sequence-Controlled Calculator (ASCC), also known as MARK I, which was developed as a joint project between Harvard University and IBM. The first entirely electronic universal calculator was built in 1945 at the University of Pennsylvania with the support of the U.S. Army's Ballistic Research Laboratory. It used vacuum tubes (~1950) and was called the Electronic Numerical Integrator And Computer (ENIAC). In 1950, there were approximately 20 automatic calculators and computers in the United States, with a total value of nearly $1 million. The first-generation computers, developed between 1950 and 1959, included machines such as the UNIVAC I and 1103, IBM 701 and 704, and ERA 1101.
The second-generation computers, produced between 1959 and 1963, were based on ferrite-core memories and transistors (~1960) as circuit elements. The CDC 1604 and 3600, IBM 1401, 1620, 7040 and 7094, and PDP-1 represent some of the second-generation computers.
The most powerful computer systems, also known as supercomputers, represent the fifth-generation computers and were developed in the 1980s. Although mainframe computers were popular, they were too expensive for individual professionals to acquire. The development of integrated circuits (~1970, IC → LSI, VLSI) consisting of thousands of transistors on tiny silicon chips led to the invention of the personal computer (PC), or microcomputer (~1980). The PC has dominated the computer industry and slowly become part of everyday life.
As far back as January 1983, Time magazine selected the PC as its "Machine of the Year."

time.com/time/covers/0,16641,19830103,00.html
1.2.2 Hardware and Software

A digital computer functions through the


interaction of the hardware (the physical
component of the computer) and the software
(the program or instructions).

The hardware is composed of the central


processing unit (CPU) and the input/output
(I/O) units as shown in Fig. 1.2.
The CPU comprises a memory unit, an
arithmetic/logic unit (ALU), and a control unit.
The instructions or data are entered by the user
through the input device and stored in the
memory unit. The control unit decodes these
data and causes the ALU to process the data
and produce the output. The output is stored in
the memory and is sent to the output unit.
The computer software is made up of system software, consisting of programs built into the computer, and user software, consisting of programs written by the user. The system software, stored in the memory of the computer, is in binary form and is written in a low-level (machine) language. If instructions are written in a low-level language, the necessary commands (program) will be lengthy and vary from machine to machine. Thus, the programs written in a low-level language are machine dependent and are not portable.

A major set of built-in programs (system software), called an operating system,


controls the communications and operations with the system, schedules the tasks and
interprets the user’s instructions. Although the operating system used in a computer system
depends on the manufacturer, the operating system known as Unix and DOS (disk
operating system) have been popular, User software used by engineers and scientists
includes CAD and graphics tools, word processors, spread sheets, numerical analysis
packages, symbolic manipulation routines, and simulation tools.

 The user software is developed using high-level languages. High-level languages permit the programmer to write instructions for the computer in a form that is similar to ordinary English and algebra. If instructions are written in a high-level language, the resulting program will be brief and independent of the machine. Thus, programs written in high-level languages are portable.

 The user programs, also called source codes, are not executed directly. A compiler, which is part of the system software, translates the source code into a binary code (machine language), which is then loaded into computer memory and executed. If compilation of a source code is successful, it means that the translator understood the structure and syntax used in the source code. It does not imply that the instructions given are correct or can be executed. The binary code can be saved and used at a later time without a need to recompile. The binary code can be executed only on the specific type of computer on which it was compiled. On the other hand, source code written in a particular high-level language can be compiled and executed on any computer that has a compiler for that language.
1.3 Computer Programming Languages

 Once the particular numerical method to be used for the solution of a given problem is selected, the method is to be transformed into computer code, or a program, for implementation on a specific computer using a high-level language. Some of the high-level languages are
FORTRAN, COBOL, BASIC, Pascal, C, Ada, LISP, APL, and FORTH.

 FORTRAN (FORmula TRANslation) was the first high-level language developed primarily for use in engineering and science (i.e., computing). It was developed by a committee sponsored by International Business Machines (IBM) and headed by John Backus in 1957. FORTRAN uses English-like commands and facilitates the easy development of even complex programs. It has undergone several modifications and improvements over the past several decades and has become one of the preferred computer languages for solving engineering and scientific problems. It was the first programming language to be standardized by the American National Standards Institute (ANSI) in 1966. The 1966 standards were designated as FORTRAN IV. Later, the standards were revised in 1977, and the new standard was designated as FORTRAN 77 (since it was completed in 1977). The most recent version standardized by ANSI, designated as Fortran 90, has some features similar to those used in the languages C and Pascal. Fortran 90 was followed in 1997 by a minor update designated as Fortran 95.

 COBOL (COmmon Business Oriented Language, developed in 1959 by Grace Hopper) can be considered the business equivalent of FORTRAN and is not popular in engineering applications. Due to its excellent input/output characteristics and its ability to handle and process vast files of business information, it has been used widely in business applications.

 BASIC (Beginner's All-purpose Symbolic Instruction Code) was developed in 1964 by John Kemeny and Thomas Kurtz of Dartmouth College. It is a widely used language for personal (micro) computers. It is much simpler to use than FORTRAN and is particularly useful for developing small programs. It does not have the versatility of FORTRAN in developing large, complex programs. Since BASIC is widely used on personal computers, several dialects and improved versions of the language were developed, especially due to the lack of any standardization. Some of the dialects include BetterBASIC, QuickBASIC, and TrueBASIC. It is not commonly used for engineering and science applications.
 C is one of the most powerful and portable high-level languages and can be used to generate efficient code for a variety of computers. It was developed by Dennis Ritchie of Bell Laboratories in 1974 and originally implemented on the Unix operating system. Although it was not originally intended to be a general-purpose language, it has proved valuable for several applications such as systems programming, microprocessors, text processing, and a variety of application packages. In 1983, ANSI formed a committee to standardize the C language based on the industry's de facto standard and issued a standard document in 1990.

 Pascal was developed in 1971 by the Swiss computer scientist Niklaus Wirth. The language was named after the French mathematician Blaise Pascal (1623-1662), who in 1642 attempted to construct a mechanical device to perform simple calculations (the first mechanical adding machine). Pascal is a high-level language that is powerful and easy to learn, and its syntax and organization tend to lead programmers to develop good programs. The deficiencies of Pascal include a primitive I/O system and the nonexistence of built-in interfaces to control the computer. To encourage compatibility between compilers, Pascal was standardized by ANSI in cooperation with the International Standards Organization (ISO) and the Institute of Electrical and Electronics Engineers (IEEE).
 Ada is named in honor of Augusta Ada Byron (1815-1852), the Countess of Lovelace and the daughter of the English poet Lord Byron. Ada was the assistant of the mathematician Charles Babbage, who invented the calculating machine called the Analytical Engine. She wrote a program to compute Bernoulli numbers on the Analytical Engine. Because of this effort, Ada may be considered the world's first computer programmer. Ada was developed under the initiative of the United States Department of Defense with the aim of standardizing military software. It is based on Pascal and was developed by a team of computer scientists led by Jean Ichbiah of CII Honeywell Bull in 1980. Ada enforces rules that lead to the development of more readable, portable, reliable, modular, maintainable, and efficient programs. Its popularity is increasing due to the availability of a variety of software developed for the military.

 Appendices A and B present the basic concepts of the Fortran and C languages, respectively.
1.4 Data Representation
 The information to be stored in a computer
 may be in the form of numeric (integer, real, and complex) data or nonnumeric (character and logical) data, and
 can be used as constants and variables.
 In a digital computer, all information is stored in binary form.
 A bit is a binary digit (i.e., a zero or a one).
 A byte is a larger unit in which bits are organized. Usually, a byte consists of 8 bits.
 A word is a larger unit in which bytes are organized. For example, a 32-bit word consists of 4 bytes.

1.4.1 Numeric Data
1.4.2 Conversion of a Decimal (base 10) Number to a Binary (base 2) Number
1.4.3 Nonnumeric Data
1.4.4 Constants and Variables
1.4.4 Constants and Variables

 Data are represented as constants or variables in a high-level language such as Fortran.


 The value of a constant does not change during program execution. The constants
include integers, real numbers, double-precision numbers, complex numbers, logical
constants, and character strings as shown in Table 1.1.

 A variable, on the other hand, can be assigned different values (redefined) during
program execution. A variable name, to which a value is assigned, can be an integer,
real, double-precision, complex, logical variable or a character string. A variable
name in Fortran can consist of one to six alphabetic or numeric characters, and the
first character must be alphabetic.
1.4.3 Nonnumeric Data
 Nonnumeric, or character, data consist of one or more characters that include the letters 'A' through 'Z', the digits 0 through 9, symbols such as +, -, *, /, ., (, ), =, $, and the blank space.
There are two universal coding systems (each character is stored as an integer code):
 According to the Extended Binary Coded Decimal Interchange Code (EBCDIC) used by IBM, 8 bits (256 codes) are used to store a single character.
 In the American Standard Code for Information Interchange (ASCII) code used by most other computing systems (PCs and workstations), a single character is stored in 7 bits (128 codes).
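The character-to-integer mapping is easy to inspect directly (a sketch using Python's built-ins; the Fortran equivalents would be ICHAR/CHAR):

```python
# Every character is stored as an integer code; in ASCII that code fits in
# 7 bits (128 codes), so ord() of any ASCII character is below 128.
for ch in "AZ09+$ ":
    print(repr(ch), ord(ch), format(ord(ch), "07b"))

print(chr(65))   # the code 65 maps back to 'A'
```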
1.4.1 Numeric Data
 Numeric data, in the form of numbers, are used to compute, count, and label. The numbers can be
 Integers (fixed-point numbers): whole numbers with no fractional part. (a_1 a_2 \ldots a_n)
 Reals (floating-point numbers): numbers with a fractional part. (a_1 a_2 \ldots a_n . b_1 b_2 \ldots b_m)
 Complex: two reals, the real and imaginary parts.

 In the decimal system, an integer I (a base-10 number) is written as
I = (a_{m-1} a_{m-2} \ldots a_2 a_1 a_0)_{10},  (1.4)
which is equivalent to
(I)_{10} = a_{m-1}(10^{m-1}) + a_{m-2}(10^{m-2}) + \ldots + a_2(10^2) + a_1(10^1) + a_0(10^0).  (1.5)
For example,
I = (1234567)_{10} = 1(10^6) + 2(10^5) + 3(10^4) + 4(10^3) + 5(10^2) + 6(10^1) + 7(10^0).
 In general, an m-digit integer (I_m) can be expressed in base b as
I_m = (a_{m-1} a_{m-2} \ldots a_1 a_0)_b; \quad a_j = 0, 1, 2, \ldots, b-1.  (1.6)
 The subscript b denotes the number base, and the digits a_j can take any of the integer values 0, 1, 2, ..., b−1. For example, the digits a_j can take values from 0 to 1 for binary numbers (b = 2) and from 0 to 9 for decimal numbers (b = 10).
 The decimal value of the number (base b) given by Eq. (1.6) is
I_m = (a_{m-1} a_{m-2} \ldots a_1 a_0)_b = b^0 a_0 + b^1 a_1 + \ldots + b^{m-1} a_{m-1} = \sum_{j=0}^{m-1} b^j a_j.  (1.7)
For example,
I_m = (10010)_2 = 2^4(1) + 2^3(0) + 2^2(0) + 2^1(1) + 2^0(0) = 18,
and
I_m = (1234)_5 = 5^3(1) + 5^2(2) + 5^1(3) + 5^0(4) = 194.
1.4.2 Conversion of a Decimal (base 10) Number to a Binary (base 2) Number
Integer (Fixed-point Number: Whole Numbers with no Fractional Part)
 A (given) decimal number I can be represented in binary form,
(I)_{10} = (a_{m-1} a_{m-2} \ldots a_1 a_0)_2 = \sum_{j=0}^{m-1} a_j 2^j = 2^{m-1} a_{m-1} + 2^{m-2} a_{m-2} + \ldots + 2^1 a_1 + 2^0 a_0,  (1.8)
where the digits a_{m-1}, a_{m-2}, ..., a_1, a_0 are to be determined using the procedure shown next.
Dividing by 2 gives a_0, a_1, ..., a_{m-2}, a_{m-1}: (I)_{10} = 2(2^{m-2} a_{m-1} + \ldots + a_1) + a_0.

 First, the number I is expressed as the sum of twice another number P_0 and a constant a_0:
I = 2 P_0 + a_0 (a_0 is 0 or 1).  (1.9)
 The number P_0 is then written as the sum of twice another number P_1 and a constant a_1:
P_0 = 2 P_1 + a_1 (a_1 is 0 or 1).  (1.10)
 The process is continued until the new number P_{m-1} = 0 is obtained.
 Thus, the procedure yields the sequences of numbers P_0, P_1, P_2, ..., P_{m-1} and a_0, a_1, a_2, ..., a_{m-1}:
I = 2 P_0 + a_0;
P_0 = 2 P_1 + a_1;
P_1 = 2 P_2 + a_2;  (1.11)
...
P_{m-2} = 2 P_{m-1} + a_{m-1} (P_{m-1} = 0).
 The decimal number I can then be expressed in the binary form
(I)_{10} = (a_{m-1} a_{m-2} \ldots a_2 a_1 a_0)_2.  (1.12)
 In a computer, integers are stored as signed binary numbers as shown in Fig. 1.3, where the first bit denotes the sign (S = 1 if negative and S = 0 if positive).
 The range of an integer (no representation/computational errors inside the range):
 2^15 − 1 to −2^15 (32767 to −32768, ≈3×10^4) for a 16-bit number (< 8!)
 2^31 − 1 to −2^31 (2147483647 to −2147483648, ≈2×10^9) for a 32-bit number (< 13!)
 1-byte (8-bit): 127 to −128 (≈1×10^2, < 6!); 8-byte (64-bit): ≈9×10^18 (< 21!)
 Program INTEGER: to get the bit sequences for 1-, 2-, 4-, and 8-byte integers
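The quoted ranges follow from two's-complement storage and can be verified directly (a sketch; the factorial bounds correspond to the slide's "< 6!, < 8!, < 13!, < 21!" annotations):

```python
import math

# An n-bit signed (two's-complement) integer covers -2**(n-1) .. 2**(n-1) - 1.
for n in (8, 16, 32, 64):
    print(f"{n:2d}-bit: {-2**(n-1)} .. {2**(n-1) - 1}")

# Largest factorial that still fits in each size:
assert math.factorial(5) <= 2**7 - 1 < math.factorial(6)     # 1 byte
assert math.factorial(7) <= 2**15 - 1 < math.factorial(8)    # 2 bytes
assert math.factorial(12) <= 2**31 - 1 < math.factorial(13)  # 4 bytes
assert math.factorial(20) <= 2**63 - 1 < math.factorial(21)  # 8 bytes
```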
Real (Floating-point Number: Numbers with a Fractional Part)
 A real or floating-point number (R) contains an integer part and a fractional part:
a_1 a_2 \ldots a_n . b_1 b_2 \ldots b_m
The fractional part, in the decimal system, is given by a sum of negative powers of 10.
 Thus a fractional number R, 0 < R < 1 (zero integer part), can be expressed as
R = (0.b_1 b_2 \ldots b_m)_{10} = b_1 \times 10^{-1} + b_2 \times 10^{-2} + \ldots + b_m \times 10^{-m},  (1.13)
where b_1, b_2, ..., b_m are integers between 0 and 9, and m is the number of digits required to represent the number R.

 Similarly to Eq. (1.13), a binary fraction is given by a sum of negative powers of 2. Thus, a fractional number R, 0 < R < 1, can be expressed as
R = (0.b_1 b_2 \ldots b_m)_2 = b_1 \times 2^{-1} + b_2 \times 2^{-2} + \ldots + b_m \times 2^{-m},  (1.14)
where b_1, b_2, ..., b_m are 0 or 1, and m is the number of digits required to represent the number R.
 To convert a decimal fraction R, 0 < R < 1, into a binary fraction, we use Eq. (1.14):
(R)_{10} = (0.b_1 b_2 \ldots b_m)_2 = b_1 \times 2^{-1} + b_2 \times 2^{-2} + \ldots + b_m \times 2^{-m}; \quad b_1, b_2, \ldots, b_m = ?
 For this, the number R is doubled and separated into integer and fractional parts as
2R = b_1 + (b_2 \times 2^{-1} + \ldots + b_m \times 2^{-m+1}) = b_1 + f_1, \quad 0 \le f_1 < 1,  (1.15)
where b_1 = integer part of 2R = intg(2R), and f_1 = fractional part of 2R = frac(2R).

 The fractional part is again doubled, and the result is expressed as the sum of an integer b_2 = intg(2 f_1) and a fraction f_2 = frac(2 f_1):
2 f_1 = intg(2 f_1) + frac(2 f_1) = b_2 + f_2.  (1.16)
 This process is continued until the fractional part f_i becomes zero. The process can be summarized as follows:
2R = intg(2R) + frac(2R) = b_1 + f_1
2 f_1 = intg(2 f_1) + frac(2 f_1) = b_2 + f_2  (1.17)
...
2 f_{i-1} = intg(2 f_{i-1}) + frac(2 f_{i-1}) = b_i + f_i (f_i = 0)
 A 4-byte real number is stored internally as a binary number as shown in Fig. 1.4, where
 bit 31 (1 bit) is used to store the sign (S = 1 if negative and S = 0 if positive),
 bits 24 through 30 (7? bits) are used to store the exponent of 16? increased by 64?,
 bits 0 through 23 (24? bits) are used to denote the magnitude or the fractional part:
x = \pm (.d_1 d_2 d_3 \ldots d_p)_{16} \times 16^{e-64}, \quad 64 = (2^7)/2,
with sign, exponent, and fraction fields; "?" means system dependent.
Numeric Model in Languages
 For example, the real number indicated in Fig. 1.5 can be converted to its decimal
equivalent as follows:

(1) bit 31 gives the sign as ( 1)1  1.

(2) bits 24 through 30 (7 bits) give the exponent as


1  26  0  25  0  2 4  0  23  1  22  0  21  0  20  68.
(3) bits 0 through 23 (24 bits) give the mantissa (fractional part) as
1 2  0  2  0  2  0  2  0  2  1 2
1 2 3 4 5 6

 0  2  1 2  0  2  ...  0   2 


7 8 9 24
 0.51953125.

Thus, the decimal equivalent of the binary number is given by


(0.51953125)(166864 )  34048.0 . (7/24, and 16/64 are system dependent)
Bit Sequence for a 4-byte REAL: (7/24 bits, base/bias 16/64) vs. (8/23 bits, base/bias 2/126)
For a REAL x = 9.87654328E+32, the 32-bit sequence is
Bits: 0 | 11101100 | 10000101100011111001101
(sign: 1 bit; exponent: 8 bits; fraction part: 23 bits)
x = \pm f \times 2^{e-126}, \quad f = 0.5 + \sum_{k=2}^{24} f_k 2^{-k}
sign of x: positive
exponent of x: e = 236 [2^7 + 2^6 + 2^5 + 2^3 + 2^2 = 236]
fraction part of x: f = 0.760861218 (normalized)
  = (2^{-1}) + (2^{-2} + 2^{-7} + 2^{-9} + 2^{-10} + 2^{-14} + 2^{-15} + 2^{-16} + 2^{-17} + 2^{-18} + 2^{-21} + 2^{-22} + 2^{-24})
re-interpreted: x = 0.760861218 \times 2^{236-126} = +9.87654328E+32
The values 8/23 and 2/126 are commonly used on Workstations/PCs.
The same 4-byte (32-bit) physical representation can be read as an INTEGER or as a REAL:
01000000 10000000 00000000 00000000
As INTEGER(1082130432): 0 1000000100000000000000000000000 = 2^{30} + 2^{23} = 1073741824 + 8388608 = 1082130432
As REAL(4.0): 0 | 10000001 | 100000000000000000000000
(sign: 1 bit; exponent: 8 bits; fraction part: 23(+1) bits)
x = \pm f \times 2^{e-126}, \quad f = 0.5 + \sum_{k=2}^{24} f_k 2^{-k}
sign: + (bit 0)
exponent: e = 129 [2^7 + 2^0 = 129]
fraction part: f = 0.5 [(2^{-1}), for normalization, f_1 = 1] + 0
interpretation: +0.5 \times 2^{129-126} = +4.0
4-byte INTEGER(4):
(0 0000000000000000000000000000100) = (2^2) = 4
4-byte REAL(1082130432.0): (in general, a 10-digit integer cannot be represented exactly in the 24-bit precision of a 4-byte REAL; this particular value, 2^{30} + 2^{23}, can)
(0 | 10011101 | 00000010000000000000000)
(sign: 1 bit; exponent: 8 bits; fraction part: 23(+1) bits)
x = \pm f \times 2^{e-126}, \quad f = 0.5 + \sum_{k=2}^{24} f_k 2^{-k}
sign: + (bit 0)
exponent: e = 157 [2^7 + 2^4 + 2^3 + 2^2 + 2^0 = 157]
fraction part: f = 0.503906250 [(2^{-1}), normalized, f_1 = 1] + (2^{-8})
interpretation: +0.503906250 \times 2^{157-126} = +1082130432.0
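The bit patterns above can be checked with Python's struct module (a sketch; note the machine stores only 23 fraction bits, the leading f_1 = 1 being implicit, which is the "(+1)" in "23(+1) bits"):

```python
import struct

def float32_fields(x):
    """Split the 32-bit IEEE-754 pattern of x into sign, exponent, fraction."""
    (n,) = struct.unpack(">I", struct.pack(">f", x))   # raw 32-bit pattern
    s = f"{n:032b}"
    return s[0], s[1:9], s[9:]     # sign bit, 8-bit exponent, 23 stored fraction bits

sign, exp, frac = float32_fields(4.0)
print(sign, exp, frac, int(exp, 2))        # exponent 10000001 = 129, as above

sign, exp, frac = float32_fields(1082130432.0)
print(sign, exp, frac, int(exp, 2))        # exponent 10011101 = 157, as above
```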

 BITVIEW.ZIP: a utility (Copyright © 1995 Microsoft Corp.) that allows you to view and manipulate the values of basic computer data types: characters, integers, and real numbers (also called floating-point reals).
4-byte (32-bit) REAL: x = \pm f \times 2^{e-126}
b = 2; 1 \le e (8-bit, unsigned) \le 254, i.e., e_{min} = -125 \le e - 126 \le 128 = e_{max}
f = 0.5 + \sum_{k=2}^{p=24} f_k b^{-k}, with f_1 = 1 for normalization; b^{-1} \le f \le (1 - b^{-p})

Bit layout:
|-- the 4th byte --| |-- the 3rd byte --| |-- the 2nd byte --| |-- the 1st byte --|
bit 31: S (sign); bits 30-23: e7 e6 e5 e4 e3 e2 e1 e0 (8-bit exponent); bits 22-0: f2 f3 ... f24 (23 bits used to store the fraction part)

Sign bit (bit 31): value 0 for positive, value 1 for negative.
Exponent field (the values 255 and 0 are special):
11111111: x is Infinite (all fraction bits = 0); 11111110: 254 - 126 = +128
00000000: x is Zero (all fraction bits = 0); 00000001: 1 - 126 = -125
01111111: +1; 01111110: 0; 01111101: -1; 10101010: 44
4-byte, 32-bit (sign 1, precision 23, range 8) REAL: x = \pm f \times b^{e-126}
b = 2; 1 \le e (8-bit, unsigned) \le 254, i.e., e_{min} = -125 \le e - 126 \le 128 = e_{max}
f = 0.5 + \sum_{k=2}^{p=24} f_k b^{-k}, f_1 = 1 for normalization; b^{-1} \le f \le (1 - b^{-p})

8-byte, 64-bit (sign 1, precision 52, range 11) REAL: x = \pm f \times b^{e-1022}
1 \le e (11-bit, unsigned) \le 2046, e_{min} = -1021 \le e - 1022 \le 1024 = e_{max}
f = 0.5 + \sum_{k=2}^{p=53} f_k b^{-k}, f_1 = 1 for normalization; b^{-1} \le f \le (1 - b^{-p})

16-byte, 128-bit (sign 1, precision 112, range 15) REAL: x = \pm f \times b^{e-16382}
1 \le e (15-bit, unsigned) \le 32766, e_{min} = -16381 \le e - 16382 \le 16384 = e_{max}
f = 0.5 + \sum_{k=2}^{p=113} f_k b^{-k}, f_1 = 1 for normalization; b^{-1} \le f \le (1 - b^{-p})

 To get the bit sequences: programs REAL_4, REAL_8, and REAL_16
 to get the Bit-Sequence: programs REAL_4, REAL_8, and REAL_16


 Often, infinitely many digits are required to represent a fraction (rational number) in the decimal system. For example, the fraction 1/6 is expressed as 1/6 = 0.1666666\overline{6}, where the notation \overline{6} indicates that the digit 6 is to be repeated infinitely many times for an exact representation of the fraction 1/6. (Compare: 3.14159..., \sqrt{2}, 1/6 = 0.1\overline{6}; but 0.1, 0.2, 0.3, 0.4 terminate in decimal.)

 For some numbers R, infinitely many digits are required for representation as a binary fraction. For example, the fractions 1/6 and 2/3 can be expressed as
1/6 = (0.1\overline{6})_{10} = (0.0\overline{01})_2 = 2^{-3} + 2^{-5} + 2^{-7} + 2^{-9} + \ldots
2/3 = (0.\overline{6})_{10} = (0.\overline{10})_2 = 2^{-1} + 2^{-3} + 2^{-5} + 2^{-7} + \ldots
where the group of two digits 01 (or 10) is repeated forever.
 Note that nice terminating decimals such as 0.1 do NOT have a terminating binary representation. Example 1.2: (0.6)_{10} = (0.\overline{1001})_2, where the group of four digits 1001 is repeated forever. Similarly,
1/10 = (0.1)_{10} = (0.0\overline{0011})_2 = 2^{-4} + 2^{-5} + 2^{-8} + 2^{-9} + \ldots

1.5 Program Structure

 A programming structure denotes a scheme for processing data. A typical high-level


language uses the following types of statements to process the data:
(1) Assignment
(2) Input/Output
(3) Control or Decision
(4) Specification
(5) Subprogram

 An assignment statement is used to compute a quantity or assign a value to a variable. Examples of assignment statements are
(1) AREA = 4.15
(2) PI = 3.1416
    DIA = 2.5
    AREA = PI*DIA**2/4.0
 The I/O statements denote instructions whereby information is transferred to and
from the computer.

 Control statements are used to direct the logical sequence of instructions in a


computer program. Common types of control statements include “GO TO”,
“LOGICAL IF THEN ELSE”, and the “DO LOOP.”

 A specification statement is used to establish a data type or data structure and to


format an I/O record.

 A subprogram statement is used to invoke a predefined procedure, known as a subprogram. A subprogram is used to execute a set of one or more statements that is repeated several times during the program. Instead of repeating the set of statements many times, they are written only once in the form of a subprogram, which is then invoked with a single subprogram statement whenever it is needed.
 A computer program consists of different statements arranged in a logical sequence. A logical sequence implies an intelligent use of a programming structure.
1.6.1 (Absolute) Error and Relative Error
 If \bar{x} is an approximation to the true value x, (1036.52 \pm 0.01 vs. 0.005 \pm 0.01 ???)
 the difference between the true value and the approximate value is called the (absolute) error (E_x):
E_x = x - \bar{x}.  (1.18)
 the relative error, R_x, is defined as (for nonzero x)
R_x = \frac{x - \bar{x}}{x}, \quad x \ne 0.  (1.19)
 A number \bar{x} is considered to be an approximation to the true value x to d significant digits if d is the largest positive integer for which
R_x = \left|\frac{x - \bar{x}}{x}\right| \le 10^{-d}, \quad \text{or} \quad R_x = \left|\frac{x - \bar{x}}{x}\right| < \frac{1}{2} \times 10^{-d}.  (1.20)
Relative error \le \frac{1}{2} \times 10^{-d}: \bar{x} approximates x to d significant digits.
 Let the true value = 10/3 (3.3333...) and the approximate value = 3.333.
 The absolute error is 0.000333... = 1/3000.
 The relative error is (1/3000)/(10/3) = 1/10000 = 10^{-4}.
 The number of significant digits is 4.
True value: 3.33333333...
Approximate value: 3.333
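The worked example can be reproduced with exact rational arithmetic (a sketch; Fraction avoids the float rounding that would blur the d = 4 boundary case, where R equals 10^{-4} exactly):

```python
from fractions import Fraction

x = Fraction(10, 3)                # true value
x_bar = Fraction(3333, 1000)       # approximate value 3.333

E = x - x_bar                      # absolute error, Eq. (1.18)
R = abs(E) / abs(x)                # relative error,  Eq. (1.19)
print(E, R)                        # 1/3000 and 1/10000

d = 0                              # largest d with R <= 10**(-d), Eq. (1.20)
while R <= Fraction(1, 10 ** (d + 1)):
    d += 1
print(d)                           # 4 significant digits
```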
1.6.2 Sources of Error
NUM: Physics → Mathematics → Numerical Method → Program → Computer
In general, a numerical result is subject to the following types of errors:

(1) Error in mathematical modeling


The simplifying approximations and assumptions made in representing a physical
system by mathematical equations introduce error. For example, the finite element
analysis is subject to discretization error. In such a case, the results of the
mathematical model will be different from the measured or observed behavior of
the physical system.
(2) Truncation errors associated with the mathematical process
The mathematical process used in the computation sometimes introduces a
truncation error.
For example, an integral involving infinity as a limit of integration involves
computational error, as does the approximate evaluation of an infinite series
(truncated Taylor series).
(3) Blunders (the Program Errors)
The program errors, if undetected, introduce errors in the computed values. Thus,
when a large program is written, it is a good practice to divide it into smaller
subprograms and test each subprogram separately for accuracy.

Errors in program input


The input errors occur due to unavoidable reasons such as the errors in data
transfer and the uncertainties associated with measurements.

(4) Machine (Round-off) errors


The floating-point (REAL) representation of numbers involves rounding and
chopping errors, as well as underflow and overflow errors. These errors are
introduced at each arithmetic operation during the computations.
1.6.4 Truncation Error (Numerical method approximation)
1.6.5 Round-Off Error (Computer numeric representation) [static]
1.6.6 Computational Error (Computation) [dynamic, one-step]
1.6.3 Propagation Error (Computation) [propagation, multiple steps]
1.6.4 Truncation Error (Numerical Method)

 Truncation error is the discrepancy introduced by the use of an approximate


expression in place of an exact mathematical expression or formula.

 For example, consider the Taylor's series expansion of the function ln(1 + x) :

  y(x) = ln(1 + x) = Σ_{i=1}^{∞} (−1)^(i+1) x^i / i
       = x − x²/2 + x³/3 − x⁴/4 + x⁵/5 − x⁶/6 + … ;  |x| < 1.  (1.29)
In general,
  y(x) = y(0) + x y′(0) + (x²/2!) y″(0) + (x³/3!) y‴(0) + … + (xⁿ/n!) y⁽ⁿ⁾(0) + …
 For simplicity, let the function y(x) be approximated by the first four terms of
the Taylor's series expansion.
 The resulting discrepancy between the exact function y(x) and the approximate
function, ȳ(x) = x − x²/2 + x³/3 − x⁴/4 , is called the truncation error:
  Truncation error E_t = y(x) − ȳ(x).
1.6.5 Round-off (Rounding) Error (Computer: REAL Representation)
 Since only a finite number of digits are stored (static) in a computer, the actual
numbers may undergo chopping or rounding of the last digit. For example, let a number
in decimal form be given by
  x = 0.b₁b₂…bᵢbᵢ₊₁bᵢ₊₂… ,  where 0 ≤ bⱼ ≤ 9 for j ≥ 1.  (1.30)

 If the maximum number of decimal digits used in the floating-point computations is


i , then the chopped floating-point representation of x , x_chop , is given by
  x_chop = 0.b₁b₂…bᵢ ,  (1.31)
where the ith digit of x_chop is identical to the ith digit of x .

 On the other hand, the rounded floating-point representation of x , x_round , is given by


  x_round = 0.b₁b₂…bᵢ₋₁dᵢ ,  (1.32)
where dᵢ (0 ≤ dᵢ ≤ 9) is obtained by rounding the number bᵢ.bᵢ₊₁bᵢ₊₂… to the
nearest integer.
For example, the value of e is given by e = 2.718281828459045… . The seven-digit
representation of e by using chopping and rounding is given by
  e_chop = 0.2718281 × 10¹ ,  e_round = 0.2718282 × 10¹ .
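Chopping vs. rounding can be sketched directly on the mantissa digit string (our own helper names; the first discarded digit decides the rounding, and the carry case 999… → 1000… is not handled):

```python
def chop_digits(mantissa, i):
    # x = 0.b1 b2 ... * 10^e: keep the first i mantissa digits  (Eq. 1.31)
    return mantissa[:i]

def round_digits(mantissa, i):
    # Round b_i . b_{i+1} b_{i+2} ... to the nearest integer  (Eq. 1.32)
    kept = int(mantissa[:i])
    if int(mantissa[i]) >= 5:   # first discarded digit 5..9: round the last kept digit up
        kept += 1
    return str(kept).zfill(i)

e_mantissa = "27182818284590452"   # e = 0.2718281828... * 10^1
print(chop_digits(e_mantissa, 7))  # 2718281 -> e_chop  = 0.2718281 * 10^1
print(round_digits(e_mantissa, 7)) # 2718282 -> e_round = 0.2718282 * 10^1
```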
1.6.6 Computational Error (Computer: REAL Arithmetic)

 The numerical solution to an engineering problem is found using a suitable algorithm.


 An algorithm is a finite set of precise instructions to be carried out using the given initial
data in a specified sequence in order to find the desired output.
 All the local computational errors involved in the various steps of an algorithm will
accumulate to a computational error in the output.
 The local computational errors arise due to errors involved during arithmetic
operations such as subtraction of numbers of near-equal magnitude and also when
irrational numbers (such as √3 or π ) are replaced by machine numbers with a
finite number of digits.

Errors associated with arithmetic operations (dynamic, one step)


 When two numbers are used in an arithmetic operation, the numbers cannot be stored
exactly by the floating-point representation.
 Let x and y be the exact numbers and x̄ and ȳ their approximate values. Then
  x = x̄ + Δx ,  y = ȳ + Δy ,  (1.33)

where Δx and Δy denote the errors in x and y , respectively.

 For example, when a multiplication operation is used, the associated error ( E ) is


given by
  E = x·y − x̄·ȳ = xy − (x − Δx)(y − Δy) = x·Δy + y·Δx − Δx·Δy .  (1.34)

 The relative error ( R ) is given by


  R = E/(x·y) = Δx/x + Δy/y − (Δx/x)(Δy/y) = R_x + R_y − R_x·R_y ≈ R_x + R_y ,  (1.35)

where R_x ≪ 1 and R_y ≪ 1 , with R_x and R_y denoting the relative errors in x


and y , respectively, and the symbol ≪ representing "much less than."
 Proceeding in a similar manner, the relative error associated with the division operation,
x / y , can be represented as
  R ≈ R_x − R_y .  (1.36)
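Eq. (1.35) can be verified numerically with inputs whose relative errors are known (a sketch with made-up values):

```python
# Exact values x, y and approximations carrying known relative errors Rx, Ry
x, y = 3.0, 7.0
Rx, Ry = 1.0e-5, -2.0e-5
x_bar = x * (1.0 - Rx)   # x_bar = x - dx with dx = Rx * x
y_bar = y * (1.0 - Ry)

R = (x * y - x_bar * y_bar) / (x * y)   # relative error of the product
print(R)   # ~ Rx + Ry - Rx*Ry ~ Rx + Ry = -1e-5  (Eq. 1.35)
```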
1.6.7 General Guidelines for Rounding of Numbers

 The following procedure is used to round off numbers during numerical


computations:
(1) If the round-off is done by retaining i digits, the last retained digit (the ith one) is
increased by one if the first discarded digit is 6 or larger.
(2) If the last retained digit is odd and the first discarded digit is 5 or 5 followed by
zero, the last retained digit is increased by one.
(3) In all other cases, the last retained digit is unaltered.
Example 1.7

(4) During addition or subtraction, the rounding off of the final result is done such that
the position of the last retained digit is the same as that of the most significant last
retained digit in the original numbers that were added or subtracted.
Example 1.8
(5) During multiplication or division, the round-off of the final result is done such that
the number of significant digits is equal to the smallest number of significant digits
used in the original numbers.

Example 1.9

(6) During multiple arithmetic operations, the operations are performed one at a time
as indicated by the parentheses:
  (multiplication or division) ×/÷ (multiplication or division)
  (addition or subtraction) +/− (addition or subtraction)
In each step of the operation, the results are rounded as indicated in guidelines 4
and 5 before proceeding to the next operation, instead of only rounding the final
result.

Examples 1.10 and 1.11


1.6.3 Propagation Error (REAL Computations: dynamic, multiple steps)

 The propagation error is the error in the output of a procedure due to the error in the
input data.
 To find the propagation error, the output of a procedure ( f ) is considered as a
function of the input parameters ( x₁, x₂, …, xₙ ):
  f = f(x₁, x₂, …, xₙ) = f(X).  (1.21)
Here, X = {x₁, x₂, …, xₙ}ᵀ is the vector of input parameters.

 If approximate values of the input parameters are used in the numerical


computation, the value of f can be found using Taylor's series expansion about
the approximate values X̄ = {x̄₁, x̄₂, …, x̄ₙ}ᵀ as
  f(x₁, x₂, …, xₙ) = f(x̄₁, x̄₂, …, x̄ₙ) + (∂f/∂x₁)(X̄)(x₁ − x̄₁) + (∂f/∂x₂)(X̄)(x₂ − x̄₂)
    + … + (∂f/∂xₙ)(X̄)(xₙ − x̄ₙ) + higher order derivative terms.  (1.22)
 By neglecting the higher order derivative terms, the error in the output can be
expressed as
  Δf = f − f̄ = f(x₁, x₂, …, xₙ) − f(x̄₁, x̄₂, …, x̄ₙ).  (1.23)
Denoting the errors in the input parameters as
  Δxᵢ = xᵢ − x̄ᵢ ,  i = 1, 2, …, n,  (1.24)
we can estimate the propagation error ( Δf ) as
  Δf ≈ Σ_{i=1}^{n} (∂f/∂xᵢ)(X̄)(xᵢ − x̄ᵢ).  (1.25)

 If f(x̄₁, x̄₂, …, x̄ₙ) ≠ 0 and xᵢ ≠ 0 , the relative propagation error ( ε_f ) is given by

  ε_f = Δf/f ≈ Σ_{i=1}^{n} [ (xᵢ / f(X̄)) (∂f/∂xᵢ)(X̄) ] ε_xᵢ ,  (1.26)
where ε_xᵢ is the relative error in xᵢ :
  ε_xᵢ = (xᵢ − x̄ᵢ)/xᵢ ,  i = 1, 2, …, n.  (1.27)
 The quantity
  cᵢ = (xᵢ / f(X̄)) (∂f/∂xᵢ)(X̄)  (1.28)
is called the amplification factor or the condition number of the relative input error ε_xᵢ .

 The study of propagation error due to input errors is called error analysis. The
numerical procedure is said to be well conditioned if the condition numbers
( cᵢ ) are reasonably bounded.
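The condition numbers cᵢ of Eq. (1.28) can be estimated with finite-difference derivatives; subtraction of near-equal inputs immediately shows up as large cᵢ (a sketch; the function names are ours):

```python
def condition_numbers(f, xbar, h=1.0e-6):
    # c_i = (x_i / f(X)) * df/dx_i(X)  (Eq. 1.28), derivative by central difference
    f0 = f(xbar)
    cs = []
    for i, xi in enumerate(xbar):
        xp = list(xbar); xp[i] = xi + h
        xm = list(xbar); xm[i] = xi - h
        cs.append(xi / f0 * (f(xp) - f(xm)) / (2.0 * h))
    return cs

# f = x1 - x2 near x1 ~ x2: relative input errors are amplified ~10^4 times
cs = condition_numbers(lambda x: x[0] - x[1], [1.0001, 1.0])
print(cs)   # roughly [10001, -10000]: an ill-conditioned combination
```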
Round-Off (Machine) Errors: Representation/Computation/Propagation
 Real numbers like π = 3.141592653589… and 1/3 = 0.33333333… ,
  which do not have terminating decimal (human) representations,
  do not have terminating binary (computer) representations either, and thus
cannot be stored exactly using a finite number of bits.

 Even such a nice terminating decimal as 0.1 does not have a terminating binary
representation [ (0.1)₁₀ = (0.000110011 0011…)₂ ]. (program ROUND_1)

 In fact, of all (10) reals of the form 0.d₁ , where d₁ is a digit, only 0.0 ( d₁ = 0 )
and 0.5 ( d₁ = 5 ) can be represented exactly; 0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, and
0.9 cannot. (0.1)₁₀ = (0.000110011…)₂ , (0.6)₁₀ = (0.100110011…)₂ , (0.9)₁₀ = (0.111001100…)₂
 Only 4 of the two-digit (100) reals of the form 0.d₁d₂ can be represented exactly,
namely 0.00, 0.25, 0.50, and 0.75; the remaining 96 two-digit reals cannot.
(0.55)₁₀ = (0.100011 0011…)₂ , (0.65)₁₀ = (0.1010011 0011…)₂ , (0.95)₁₀ = (0.111100 1100…)₂

 In general, the only real numbers that can be represented exactly in the computer's
memory are those that can be written in the form m / 2^k .
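Python's `fractions.Fraction`, given a float, recovers the exact binary value that was stored, which makes the m/2^k claim visible (our own sketch of the point that program ROUND_1 presumably demonstrates):

```python
from fractions import Fraction

exact = Fraction(0.1)                 # exact value of the double nearest to 0.1
print(exact)                          # 3602879701896397/36028797018963968
print(exact == Fraction(1, 10))       # False: 0.1 is not stored exactly
print(exact.denominator == 2 ** 55)   # True: the stored value has the form m / 2**k

# 0.5 and 0.25 are already of the form m / 2**k, so they ARE exact:
print(Fraction(0.5) == Fraction(1, 2), Fraction(0.25) == Fraction(1, 4))
```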
Round-Off Errors
 The errors intrinsic to the nature of the computer itself happen because any
computer has a finite precision (finite bits to store the digits).
 Many floating-point (real) numbers cannot be represented exactly when the
representation uses a number base 2 on digital computers.
 As a result, these values must be approximated by one of the nearest
representable values;
 the difference is known as the machine (computer) round-off error.

 Unlike the real numbers in algebra, which form a continuum (infinite precision),


 a floating-point system in a computer has gaps (spacing; it is discontinuous)
between consecutive numbers (finite precision).
 Because the same number of bits is used to represent all normalized numbers (the
fraction part), the smaller the exponent part, the greater the density of
representable numbers and the smaller the spacing between two consecutive
numbers (unequal spacing/gap distribution).
 This implies the results of (0.2 − 0.1) and (1000.2 − 1000.1) are different in computer
arithmetic even though they are algebraically equal. (program ROUND_2)
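The point that program ROUND_2 makes can be reproduced directly (a sketch; `math.ulp`, Python ≥ 3.9, reports the local spacing between doubles):

```python
import math

a = 0.2 - 0.1         # happens to be exactly 0.1 in double precision
b = 1000.2 - 1000.1   # NOT 0.1: the operands live on a much coarser grid
print(a == b)                           # False
print(b)                                # 0.10000000000002274...
print(math.ulp(0.1), math.ulp(1000.1))  # spacing ~1.4e-17 vs ~1.1e-13
```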
Round-Off Errors
 Underflow
  REAL numbers (non-zero) occurring in calculations that have a magnitude
(absolute value) of less than 10⁻³⁸ and 10⁻³⁰⁸ result in what is called underflow
for 4-byte (single-precision) and 8-byte (double-precision) REALs, respectively.
  Underflow (only for reals) is a less serious problem because it just denotes a
loss of precision; the value is guaranteed to be closely approximated by zero.

 Overflow (4-/8-byte REALs carry 6-7/15-16 significant digits)


  REAL numbers occurring in calculations that have a magnitude (absolute
value) of greater than 10³⁸ and 10³⁰⁸ result in what is called overflow for 4-byte
and 8-byte REALs, respectively.
  Overflow means that values of reals have grown too large for the representation
and causes the computation to halt (program stop).
  In the same way, you can overflow INTEGERs, and the computation may
or may not halt (compiler, compiler-option, or run-time dependent).
Round-Off Errors
 Cancellation Error: (program ROUND_3)
  In the process of adding two numbers, the smaller number is modified so that its
exponent matches that of the larger number; this can move the significant
digits of the smaller number out of the range of significant digits carried by
the computer. As a result the contribution of the small number is lost (cancelled
out by the larger), for example, in a 4-digit precision calculation
  0.1234 × 10⁴ + 0.5678 × 10⁻⁴ = 0.1234 × 10⁴
  This kind of error is often encountered when summing a series of numbers with
terms in decreasing order
  s = Σ_{i=1}^{n} 1/i = 1/1 + 1/2 + 1/3 + … + 1/n  (algebraically equal)
it can be avoided by summing the series in reverse, i.e., in increasing order,
  s = Σ_{i=n}^{1} 1/i = 1/n + 1/(n−1) + … + 1/2 + 1/1  (different in computer)
so that the cumulative total grows with the magnitude of the terms.
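The order-of-summation effect is easiest to see with one huge term and many small ones (a sketch; near 1e16 the spacing between doubles is 2.0, so each added 1.0 is rounded away):

```python
forward = 1.0e16
for _ in range(100):
    forward += 1.0          # each 1.0 is below the local spacing: rounded away

backward = sum([1.0] * 100) + 1.0e16   # accumulate the small terms first
print(forward)              # 1e+16 -- the hundred ones are lost
print(backward)             # 1.00000000000001e+16
```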
Round-Off Errors
 Subtractive Cancellation Error: (program ROUND_4)
  When two very similar numbers (same sign with nearly equal magnitudes)
are subtracted, the difference can be lost.
  The loss of accuracy is a major source of error in floating-point operations.

    0.1234567??????? E+02   (7-digit precision)
  − 0.1234555??????? E+02
  = 0.0000012??????? E+02
  = 0.12?????        E−03   (normalized form)
  stored as 0.1200000 E−03  (2-digit precision)
  Five of the seven digits of precision in the result are lost.

 Does (1111113 − 1111111) + 7.511111 equal 1111113 − (1111111 − 7.511111) in computer arithmetic?

 Cancellation-prone expressions can often be rewritten, e.g.,
  1 − cos(x) = sin²(x) / (1 + cos(x)) for x ≈ 0.0 ;  −b + √(b² − 4ac) for b² ≫ 4ac .
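The 1 − cos(x) rewrite in action (a sketch):

```python
import math

x = 1.0e-8
naive = 1.0 - math.cos(x)                        # cos(x) rounds to exactly 1.0
stable = math.sin(x) ** 2 / (1.0 + math.cos(x))  # algebraically identical form
print(naive)    # 0.0 -- all significance cancelled
print(stable)   # ~5e-17, matching the Taylor term x^2/2
```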
Round-Off Errors
 Consider the quadratic equation x² − 2ax + ε = 0 , which has the two solutions
  x₁ = a + √(a² − ε)  and  x₂ = a − √(a² − ε) .
If a > 0 and ε is small compared with a ( ε ≪ a ), the root x₂ is expressed as
the difference between two almost equal numbers, and a considerable amount of
significance is lost. Instead, if we write
  x₂ = a − √(a² − ε) = [a − √(a² − ε)] · [a + √(a² − ε)] / [a + √(a² − ε)] = ε / [a + √(a² − ε)] ,
we obtain the root as approximately ε/(2a) without loss of significance.
 Suppose that, for a fairly large value of x , we know that cosh(x) = a , sinh(x) = b ,
and that we want to compute e⁻ˣ . Clearly
  e⁻ˣ = cosh(x) − sinh(x) = a − b
leads to a dangerous (subtractive) cancellation while, on the other hand,
  e⁻ˣ = 1 / (cosh(x) + sinh(x)) = 1 / (a + b)
gives a very accurate result.
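A sketch of the rationalized small-root formula (our own function name):

```python
import math

def small_root(a, eps):
    # Roots of x^2 - 2*a*x + eps = 0; two ways to get the small root x2
    s = math.sqrt(a * a - eps)
    naive = a - s             # subtractive cancellation when eps << a
    stable = eps / (a + s)    # same root, rationalized: no cancellation
    return naive, stable

naive, stable = small_root(1.0e7, 1.0)
print(naive, stable)   # naive keeps only a few correct digits; stable ~ 5.0e-8
```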
Round-Off Errors
 An ill-conditioned (sensitive to the round-off errors) linear system A·x = b
  Math: small changes in A or b may produce large changes in x
  Computation:
   difficult to solve accurately by a direct method
   the verification gives little information about the solution
Verify the computed solution (x = 0.524659, y = 0.524780):
  (+3902.0)x + (−3903.0)y = −1.0     Exact solution is
  (−3903.0)x + (+3904.0)y = +1.0     x = 1.0, and y = 1.0
  eq. 1: +(3902.0 × 0.524659) − (3903.0 × 0.524780) = 2047.219418 − 2048.216340 ≈ −1.00
  eq. 2: −(3903.0 × 0.524659) + (3904.0 × 0.524780) = −2047.744077 + 2048.741120 ≈ +1.00
  The residuals check out ( Yes??? ), yet the computed solution is far from the exact one.
Subtractive Cancellation Error is the major source of round-off errors
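The sensitivity can also be demonstrated by perturbing the right-hand side slightly (a sketch using Cramer's rule; det = 3902·3904 − 3903² = −1, tiny relative to the coefficients, so the system is nearly singular):

```python
def solve2x2(a11, a12, a21, a22, b1, b2):
    # Cramer's rule for a 2x2 linear system
    det = a11 * a22 - a12 * a21
    return (b1 * a22 - b2 * a12) / det, (a11 * b2 - a21 * b1) / det

x, y = solve2x2(3902.0, -3903.0, -3903.0, 3904.0, -1.0, 1.0)
xp, yp = solve2x2(3902.0, -3903.0, -3903.0, 3904.0, -1.01, 1.0)  # b1 off by 1%
print(x, y)     # 1.0 1.0
print(xp, yp)   # ~40.04 ~40.03 -- a tiny input change, a huge solution change
```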
Error Propagations
 Consider the evaluation of the integral I_n = (1/e) ∫₀¹ xⁿ eˣ dx . A little manipulation
(integration by parts) yields the recurrence formula (marching problem)
  I₀ = 1 − 1/e ,  and  I_n = 1 − n·I_{n−1} ,  n = 1, 2, …
Note that the integrand is always positive within the range of integration and that
the area under the curve, and hence the value of I_n , decreases monotonically with
n . Thus, for all n , we may deduce that I_n > 0 and I_n < I_{n−1} .

 The calculation is unstable if carried out as I₀ → I₁ → I₂ → … , i.e., in increasing order of n


 This instability can be avoided by reversing the calculation order, i.e.,
  I₂₀ → I₁₉ → I₁₈ → … → I₂ → I₁ → I₀
by using the alternative recurrence relation I_{n−1} = 1/n − I_n/n , with the initial guessed
value of I₂₀ = 0 (or, the approximation relation I₂₀ ≈ I₁₉ )
 Similarly, for the integral I_n = ∫₀¹ xⁿ/(x + 5) dx , with I₀ = ln(6/5) ,  I_n = 1/n − 5·I_{n−1} ,  n = 1, 2, …
Physical Problem → Ordinary Differential Equations (9, 10)
 ↓ Discretization (Numeric Data Points)
Curve-Fitting and Interpolation (5)
 ↓
Differentiation (7), Integration (8)
 ↓
Simultaneous Algebraic Equations (3)
 ↓
(Single) Nonlinear Equations (2)
 ↓
Tedious REAL Arithmetic ( +, −, ×, ÷ ) Computing (1) → Computer
1.7 Numerical Methods Considered

 The behavior of any physical or engineering system can be described by one or more
mathematical equation(s).
 If the mathematical equations are simple (linear, simple geometric shape), the exact
(analytic) solution can be found in closed form.
 Although closed-form solutions are most desirable, for most engineering problems
the equations are quite complex and the exact solution cannot be found.
 In such cases, numerical methods can be used to solve the mathematical equations
using arithmetic operations in order to understand the behavior of the system.
(solution: approximated in discrete form)

Discretization, rebuild function by Polynomial or Truncated Taylor Series


Numerical methods involve a large number of tedious/intensive REAL arithmetic
( +, −, ×, ÷ ) calculations; their use and popularity have been increasing with the
development and availability of powerful and inexpensive computers.

 The various types of numerical methods discussed in this book are summarized next.
1.7.1 Solution of Nonlinear Equations (Chapter 2)

 Many engineering problems involve the solution of one or more nonlinear equations. A
nonlinear equation may be in the form of an algebraic, transcendental, or polynomial
equation.
For example, the determination of natural frequencies of a vibrating system, the
temperature of a heated body from an energy balance, the friction factor corresponding
to a turbulent fluid flow, and the transient current in an electrical circuit lead to
different types of nonlinear equations.

 A simple nonlinear equation


involves the determination of
the root of the equation
f ( x)  0
The typical iterative process
used in the determination of the
root, x *, is shown in Fig. 1.7.
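A minimal sketch of one such iterative process (bisection; Chapter 2 covers several root-finding methods, and this is only an illustration, not necessarily the scheme of Fig. 1.7):

```python
def bisect(f, a, b, tol=1.0e-12):
    # Bisection: requires f(a) and f(b) to have opposite signs
    fa = f(a)
    while b - a > tol:
        m = 0.5 * (a + b)
        if fa * f(m) <= 0.0:
            b = m               # root lies in [a, m]
        else:
            a, fa = m, f(m)     # root lies in [m, b]
    return 0.5 * (a + b)

root = bisect(lambda x: x * x - 2.0, 1.0, 2.0)  # f(x) = x^2 - 2, root = sqrt(2)
print(root)   # 1.4142135623...
```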
1.7.2 Simultaneous Linear Algebraic Equations (Chapter 3)

 In engineering applications, a wide variety of mathematical problems are encountered.


In areas such as solid mechanics, heat transfer, fluid mechanics, electrostatics, and
combustion, the governing partial differential equations are usually solved using a finite
element or finite difference technique.

 This converts the problem into a system of


linear algebraic equations in terms of a set
of unknown variables.
A set of two linear equations can be stated
as
  a₁₁x₁ + a₁₂x₂ = b₁ ;
  a₂₁x₁ + a₂₂x₂ = b₂ .
The graphical interpretation of the
solution of the two equations is shown in
Fig. 1.8.
1.7.3 Solution of Matrix Eigenvalue Problem (Chapter 4)

 The analysis of many engineering system, such as the vibration of structures and
machine buckling of columns, and dynamic response of electrical systems, requires the
solution of a set of homogeneous linear algebraic equations.

 In these problems, if the number of equations is n , there will be n + 1 unknowns.


These problems are known as algebraic eigenvalue problems. For example,
an eigenvalue problem involving two homogeneous equations can be stated as
  (a₁₁ − λ)x₁ + a₁₂x₂ = 0
  a₂₁x₁ + (a₂₂ − λ)x₂ = 0
where λ , called the eigenvalue, and X = {x₁, x₂}ᵀ , known as the eigenvector, are the
unknowns.

 In matrix form,
  [ a₁₁ − λ    a₁₂   ] {x₁}
  [  a₂₁    a₂₂ − λ ] {x₂} = 0 ,
and a nontrivial solution requires
  det [ a₁₁ − λ    a₁₂   ]
      [  a₂₁    a₂₂ − λ ] = 0 .
 The interpretations of the eigenvalue and eigenvector are indicated in Fig. 1.9.
1.7.5 Statistical and Probability Methods (Chapter 6)

 In many physical and engineering problems, numeric data are collected to understand
physical phenomena. [ discrete samples (x₁, x₂, x₃, …), not a function of the type x(t) ]
Then the principles of statistics and probability are used to analyze the data, develop
models, and predict the behavior of the system.

 For example, if several values of an uncertain quantity, such as the wind load acting on
a building, are measured as (x₁, x₂, x₃, …) , they can be used to develop a histogram
and a probability distribution of wind load as shown in Fig. 1.11.
1.7.4 Curve Fitting and Interpolation (Chapter 5)

 In certain physical problems, the values of a function may be available at a certain


number of data points, [( x1 , y1 ),( x2 , y2 ),...,( xn , yn )] , and we may be required to
estimate the function value at a missing data point. In such a case,
 a weighted average of the known function values at neighboring points can be used
as an estimate of the missing functional value, ( xs , y s ) .
 Another approach is to first fit a curve, y  f ( x ) [Curve Fitting], using the
available functional values at the data points and then estimate, y s  f ( xs )
[Interpolation], the missing functional value from the fitted curve.
These approaches are known as interpolation and curve-fitting techniques.

 A typical curve-fitting problem can be stated as follows:


 Find a quadratic, f ( x )  a  bx  cx 2 , to fit the given data shown by dots in Fig.
1.10(a).
 Similarly, an interpolation problem involves finding an nth-degree polynomial that
passes through n  1 data points to estimate the value of the function in between the
data points (Fig. 10(b)).
 Two approaches for the “curve-fitting then interpolating” problems:
  f(x) → (x, y) points → Polynomial P(x) :  df(x)/dx ≈ dP(x)/dx ,  ∫f(x)dx ≈ ∫P(x)dx
Discretization, rebuild function by Polynomial or Truncated Taylor Series
1.7.6 Numerical Differentiation (Chapter 7)

 In certain physical problems, a function is to be differentiated without knowing the


expression.
In these cases, the values of the function are known only at a discrete set of points.
Then we can use numerical differentiation.

 In this procedure, first a polynomial passing


through all the data points is determined,
and the resulting polynomial is then
differentiated to find the approximate
derivative of the unknown function. For
example,
the numerical derivative df/dx can be found
using a central difference formula as
  f′(x_i) = (df/dx)_i ≈ (Δf/Δx)_i = (f_{i+1} − f_{i−1}) / (x_{i+1} − x_{i−1}) ,
and is shown graphically in Fig. 1.12.
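A sketch of the central difference formula applied to a function whose derivative is known:

```python
import math

def central_diff(f, x, h):
    # f'(x) ~ (f(x+h) - f(x-h)) / (2h): second-order accurate in h
    return (f(x + h) - f(x - h)) / (2.0 * h)

approx = central_diff(math.sin, 1.0, 1.0e-5)
print(approx, math.cos(1.0))   # both ~ 0.5403023058
```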
1.7.7 Numerical Integration (Chapter 8)

 The solution of many engineering problems


requires the evaluation of an integral.
If the function to be integrated is too complex or if the values of the function are known
only at discrete values of the independent variable, numerical integration techniques are
to be used.

 Basically, the variation of the function (to be


integrated) is assumed to be a simple
polynomial over a discrete interval, and then
the integral is evaluated as the sum of the
areas under the assumed polynomials over the
various discrete intervals.
For example, if the exact integral is indicated
as in Eq. (1.3), the numerical integral can be
evaluated as (Fig. 1.1)
  I ≈ A₁ + A₂ + A₃ + … + A₈ .  (Rectangular Rule)
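The rectangle-sum idea as a sketch (left-endpoint rectangles; Chapter 8 treats better rules):

```python
import math

def rect_rule(f, a, b, n):
    # I ~ A_1 + A_2 + ... + A_n with A_i = f(x_i) * dx (left-endpoint rectangles)
    dx = (b - a) / n
    return sum(f(a + i * dx) for i in range(n)) * dx

I = rect_rule(math.sin, 0.0, math.pi, 1000)
print(I)   # ~2.0; the exact integral of sin(x) on [0, pi] is 2
```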
1.7.8 Solution of Ordinary Differential Equations (ODE) (Chapters 9 and 10)

 Ordinary differential equations arise in the study of many physical phenomena such as
dynamics, heat and mass transfer, current flow in electrical circuits, and chemical
reactions. In some cases, partial differential equations can be transformed to ordinary
differential equations.
In all these cases, the solution of a set of
one or more ordinary differential
equations is required under specified
initial (9) or boundary (10) conditions.

 For example, the solution of the first


order differential equation
  dy/dx = f(x, y) ,  y(x₀) = y₀
can be found numerically by
approximating the derivative as the slope
of the function y ( x ) at different values
of x as indicated in Fig. 1.13.
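A sketch of the simplest such slope-following scheme (explicit Euler; Chapter 9 covers this and more accurate methods):

```python
import math

def euler(f, x0, y0, h, steps):
    # Explicit Euler: y_{k+1} = y_k + h * f(x_k, y_k), following the local slope
    x, y = x0, y0
    for _ in range(steps):
        y += h * f(x, y)
        x += h
    return y

# dy/dx = y with y(0) = 1 has the exact solution y(x) = e^x
y1 = euler(lambda x, y: y, 0.0, 1.0, 0.001, 1000)
print(y1, math.e)   # ~2.7169 vs 2.71828...: the error shrinks as h -> 0
```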
Chap. 10 Ordinary Differential Equation: Boundary-Value Problems (BVP)
 Curve Fitting: Assume a second-order
polynomial f(x) passing through three
equi-spaced ( h ) data points (x_{i−1}, f_{i−1}) ,
(x_i, f_i) , and (x_{i+1}, f_{i+1}) :
  f(x) = a₀ + a₁(x − x_{i−1}) + a₂(x − x_{i−1})²

The unknowns a₀ , a₁ , and a₂ can be found as


  a₀ = f_{i−1} ,
  a₁ = (−f_{i+1} + 4f_i − 3f_{i−1}) / (2h) ,
  a₂ = (f_{i+1} − 2f_i + f_{i−1}) / (2h²) .
 Interpolations: The first derivative of f(x) , f′(x) = a₁ + 2a₂(x − x_{i−1}) , at point x_i is
  f′(x_i) = f′_i ≈ a₁ + 2a₂(x_i − x_{i−1}) = a₁ + 2a₂h = (f_{i+1} − f_{i−1}) / (2h)
The second derivative of f(x) , f″(x) = 2a₂ , at point x_i is
  f″(x_i) = f″_i ≈ 2a₂ = (f_{i+1} − 2f_i + f_{i−1}) / h²
 Numerical methods can be used to find solutions of even complex engineering
problems. (Nonlinear Differential Equations, Arbitrary Geometric Shape, and …)
While analytical solutions usually require several simplifying assumptions of the
physical system, numerical solutions do not require such assumptions.

Discretization, rebuild function by Polynomial or Truncated Taylor Series


Numerical methods involve a large number of tedious/intensive REAL arithmetic
( +, −, ×, ÷ ) calculations; their use and popularity have been increasing with the
development and availability of powerful and inexpensive computers.

 Although numerical solutions cannot provide an immediate insight into the behavior of
the simplified physical system, they can be used to
study the behavior of the true physical system.
 Analytic solution: exact, continuous
mathematical function in closed form.
 Numerical solution: approximated, discrete
numeric data at specified points (what does it
mean???).
1.7.9 Solution of Partial Differential Equations (Chapter 11)

 The behavior or many physical systems are governed by differential equations


involving two or more independent variables, known as partial differential equations.
For example, the transient temperature distribution in a rod, the seepage (fluid) flow in
soil and the displacement of a plate under load require the solution of different types of
partial differential equations.

 For example, a partial differential equation involving the spatial coordinates x and y
as independent variables is given by Partial: two or more independent variables
  ∂²T/∂x² + ∂²T/∂y² = f(x, y) ,

where T(x, y) is the unknown function.


 This equation can be approximated at various grid points of the solution domain
using finite differences as (see Fig. 11.6)
 These equations represent a system of algebraic linear equations that can be solved
easily.
Ordinary Differential Equation: Curve

Two-Variable Partial Differential Equation: Surface


 This equation can be approximated at various grid points of the solution domain using
finite differences as (see Fig. 11.6)
  ∂²T/∂x² + ∂²T/∂y² = −6250  (P.D.E.)
   ↓ discretization, data points
   ↓ curve-fitting (polynomial)
   ↓ interpolation (differentiation)
  (T_{i+1,j} − 2T_{i,j} + T_{i−1,j}) / (Δx)² + (T_{i,j+1} − 2T_{i,j} + T_{i,j−1}) / (Δy)² = … ;
   ↓ simultaneous algebraic equations
  solved for i = 1, 2, 3, …, m ;  j = 1, 2, 3, …, n
 These equations represent a system of
algebraic linear equations that can be
solved easily.
1.7.10 Optimization (Chapter 12)
The analysis, design, and operation of many engineering systems involve the
determination of certain variables so as to minimize an objective (cost) function while
satisfying certain functional and economical constraints. The solution of such problems
requires the use of analytical or numerical optimization techniques.
If the equations for the objective and constraint
functions are available in closed form and are
simple, analytic methods of optimization can be
used for the solution of the problem. On the other
hand, if the objective and constraint equations
are complex or not available in closed form,
numerical methods of optimization can be used
for the solution of the problem.
For example, to find the minimum of a function

f ( X ) subject to the constraints g j ( X )  0 ,

j  1,2,..., m , first a starting vector X 1 is
assumed. Then  the vector is iteratively improved
to find X 2 , X 3 ,... , and ultimately, the optimum

vector, X * , as shown in Fig. 1.15
1.7.11 Finite-Element Method (Chapter 13)
In many engineering system, the governing
equations will be in the form of partial differential
equations, and the solution domain will not be
regular.
These problems can be solved conveniently by
replacing the solution domain by several regular
subdomains or geometric figures, known as finite
elements, and assuming a simple solution in each
finite element.
When equilibrium and compatibility conditions are
enforced, the process leads to a system of algebraic
(or a system of ordinary differential) equations that
can be solved easily.
For example, the stresses induced in a plate with a
hole (Fig. 1.16(a)) can be found by modeling the
plate by a number of triangular elements as shown
in Fig. 1.16(b).
1.8 Software for Numerical Analysis

 Several commercial, as well as public-domain, software packages are available for


the solution of numerical analysis problems.
Although most of the software was originally written for mainframe computers, many
of the programs have been modified for use on all types of computers in recent years.
Most of the numerical analysis software was developed in Fortran; however, some
software is available in C and Pascal.
Libraries
 The International Mathematical and Statistical Library (IMSL,
commercial) consists of more than 700 subroutines, originally written in Fortran, in the areas
of general applied mathematics, statistics, and special functions. Most of the programs
of IMSL are available in both single- and double-precision versions and can be used on
a variety of computers ranging from personal computers to supercomputers. Currently,
IMSL programs are available in C also. (www.vni.com/products/imsl/)
 LINPACK (www.netlib.org/linpack/) and EISPACK (www.netlib.org/eispack/) are
public-domain Fortran packages available from Argonne National Laboratory.
EISPACK is the first large-scale public-domain package made available for the solution
of algebraic eigenvalue problems. LINPACK can be used for the solution of systems of
linear equations and least square problems. (LAPACK: www.netlib.org/lapack/)
Linear Equations, Eigenvalue/Singular Problems, Linear Algebra
 Several software packages have been developed for the symbolic solution of
mathematical problems. Although
Macsyma (MAC’s SYmbolic MAnipulation system), Derive, Mathematica, and Maple
are popular for the symbolic solution,

 only Maple is considered in this book for demonstrating the symbolic or numerical
solution of different types of practical engineering problems. The basic concepts of
Maple are summarized in Appendix C. (www.maplesoft.com/)

 For the interactive solution of problems representing different types of engineering


and scientific applications, the software MatLab (Matrix Laboratory) can be used
very conveniently. MatLab is useful, especially for the problems involving vector
and matrix manipulation. The basic ideas and procedures of MatLab are indicated in
Appendix D. (www.mathworks.com/products/matlab/)

 MathCAD is another software that can be used for the solution of mathematical
problems, both numerically and symbolically. It also provides two- and
three-dimensional plots. A brief outline of the concepts of MathCAD is given in
Appendix E. (www.mathsoft.com/)
1.9 Use of Software Packages

 To illustrate the use of software packages MatLab, Maple, and MathCAD through a
simple example, the multiplication of the following matrices is considered
 8 0
2 3 4
A   , B   2 7 .
 1 5 6   
 1 4 
The result is
 18 37 
C  AB    .
 8 11

 1.9.1 MatLab (Example 1.12)

 1.9.2 Maple (Example 1.13)

 1.9.3 MathCAD (Example 1.14)


1.10 Computer Programs

 Fortran and C programs are given for the multiplication of the following matrices:
 8 0
2 3 4
A   , B   2 7 .
 1 5 6   
 1 4 
The result is
 18 37 
C  AB    .
 8 11

 1.10.1 Fortran Program

 1.10.2 C Program


Appendix F: Review of Matrix Algebra: Definitions, Determinant, Inverse, …
Appendix F Review of Matrix Algebra
Contents

1. Definitions
2. Determinant of a Matrix
3. Rank of a Matrix
4. Inverse Matrix
5. Basic Matrix Operations
1. Definitions

Matrix: A matrix is a rectangular array of integer, real, or complex numbers. An array
having m rows and n columns enclosed in brackets is called an m -by- n matrix. If
[A] is an m × n matrix, it is denoted as
        [ a₁₁  a₁₂  .  .  .  a₁ₙ ]
        [ a₂₁  a₂₂  .  .  .  a₂ₙ ]
  [A] = [  .    .   .  .  .   .  ]  (1)
        [  .    .   .  .  .   .  ]
        [ aₘ₁  aₘ₂  .  .  .  aₘₙ ]
where the numbers aᵢⱼ are called the elements of the matrix. The first subscript i denotes
the row and the second subscript j specifies the column in which the element aᵢⱼ
appears.
Square matrix: If the number of rows ( m ) is equal to the number of columns ( n ), the
matrix is called a square matrix of order n .

Column matrix: A matrix with m rows and 1 column is known as a column matrix or
simply a column vector. A column vector $\vec{a}$ with m elements is denoted as

$$\vec{a} = \begin{Bmatrix} a_1 \\ a_2 \\ \vdots \\ a_m \end{Bmatrix}. \tag{2}$$

Row matrix: A matrix with 1 row and n columns is known as a row matrix or simply a
row vector. A row vector $\vec{b}$ with n elements is denoted as

$$\vec{b} = \begin{bmatrix} b_1 & b_2 & \cdots & b_n \end{bmatrix}. \tag{3}$$

Diagonal matrix: A (square) matrix in which all the elements are zero, except those on
the main diagonal, is called a diagonal matrix. A diagonal matrix $[A]$ of order n is
denoted as

$$[A] = \begin{bmatrix} a_{11} & 0 & 0 & \cdots & 0 \\ 0 & a_{22} & 0 & \cdots & 0 \\ 0 & 0 & a_{33} & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & a_{nn} \end{bmatrix}. \tag{4}$$

Identity matrix: If all the diagonal elements of a diagonal matrix are equal to unity (1), the
matrix is known as an identity matrix or unit matrix and is denoted as $[I]$.

Zero matrix: If all the elements of a matrix are equal to zero, the matrix is called a
zero or null matrix and is denoted as $[0]$.
Transpose of a matrix: The transpose of an $m \times n$ matrix $[A]$ is defined as the $n \times m$
matrix obtained by interchanging the rows and columns of $[A]$, and is denoted as $[A]^T$.
For example, if $[A]$ is given by

$$[A] = \begin{bmatrix} 2 & 4 & 5 \\ 4 & 1 & 8 \end{bmatrix}, \tag{5}$$

its transpose, $[A]^T$, is given by

$$[A]^T = \begin{bmatrix} 2 & 4 \\ 4 & 1 \\ 5 & 8 \end{bmatrix}. \tag{6}$$
It can be noted that the transpose of a column matrix (vector) is a row matrix (vector), and
vice versa.

Trace: The trace of a square matrix is defined as the sum of the elements on the main
diagonal. For example, the trace of the $n \times n$ matrix $[A] = [a_{ij}]$ is given by

$$\operatorname{Trace}[A] = a_{11} + a_{22} + \cdots + a_{nn}. \tag{7}$$
Symmetric matrix: A square matrix for which the upper right half can be obtained by
flipping the matrix about the main diagonal is called a symmetric matrix. Thus, a
symmetric matrix $[A] = [a_{ij}]$ satisfies the relation $[A]^T = [A]$, i.e., $a_{ij} = a_{ji}$.
For example,

$$[A] = \begin{bmatrix} r & a & b \\ a & s & c \\ b & c & t \end{bmatrix}.$$

Antisymmetric matrix: An antisymmetric (skew-symmetric) matrix $[A] = [a_{ij}]$ is a
square matrix that satisfies the identity $[A]^T = -[A]$, i.e., $a_{ij} = -a_{ji}$ (so the
diagonal elements are all zero). For example,

$$[A] = \begin{bmatrix} 0 & a & b \\ -a & 0 & c \\ -b & -c & 0 \end{bmatrix}.$$
Matrix decomposition: Any square matrix can be expressed as the sum of a symmetric
and an antisymmetric part: write $A = B + C$ with $B = \frac{1}{2}(A + A^T)$ and $C = \frac{1}{2}(A - A^T)$.

For an arbitrary matrix $A = [a_{ij}]$ with transpose $A^T = [a_{ji}]$,

$$B = \tfrac{1}{2}(A + A^T) = \begin{bmatrix} a_{11} & \tfrac{1}{2}(a_{12}+a_{21}) & \cdots & \tfrac{1}{2}(a_{1n}+a_{n1}) \\ \tfrac{1}{2}(a_{12}+a_{21}) & a_{22} & \cdots & \tfrac{1}{2}(a_{2n}+a_{n2}) \\ \vdots & \vdots & \ddots & \vdots \\ \tfrac{1}{2}(a_{1n}+a_{n1}) & \tfrac{1}{2}(a_{2n}+a_{n2}) & \cdots & a_{nn} \end{bmatrix},$$

which is symmetric, and

$$C = \tfrac{1}{2}(A - A^T) = \begin{bmatrix} 0 & \tfrac{1}{2}(a_{12}-a_{21}) & \cdots & \tfrac{1}{2}(a_{1n}-a_{n1}) \\ -\tfrac{1}{2}(a_{12}-a_{21}) & 0 & \cdots & \tfrac{1}{2}(a_{2n}-a_{n2}) \\ \vdots & \vdots & \ddots & \vdots \\ -\tfrac{1}{2}(a_{1n}-a_{n1}) & -\tfrac{1}{2}(a_{2n}-a_{n2}) & \cdots & 0 \end{bmatrix},$$

which is antisymmetric.
2. Determinant of a Matrix

 If $[A]$ denotes a square matrix of order n, then the determinant of $[A]$ is denoted as

$$|A| = \begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{vmatrix}. \tag{8}$$

The value of a determinant can be computed in terms of its minors and cofactors.

 The minor of the element $a_{ij}$ of the determinant $|A|$ of order n is the determinant of
order $n-1$ obtained by deleting row i and column j of the original determinant.
The minor of $a_{ij}$ is denoted as $M_{ij}$.

 For example, the minor of the element $a_{32}$ of

$$|A| = \det[A] = \begin{vmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{vmatrix}$$

is denoted as $M_{32}$ and is given by

$$M_{32} = \begin{vmatrix} a_{11} & a_{13} & a_{14} \\ a_{21} & a_{23} & a_{24} \\ a_{41} & a_{43} & a_{44} \end{vmatrix}.$$
 The cofactor of the element $a_{ij}$ of the determinant $|A|$ of order n is the minor of
the element $a_{ij}$, with either a plus or a minus sign attached; it is defined as

$$\text{cofactor of } a_{ij} = \beta_{ij} = (-1)^{i+j} M_{ij}, \tag{9}$$

where $M_{ij}$ is the minor of $a_{ij}$.

 For example, the cofactor of the element $a_{32}$ of

$$|A| = \det[A] = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} \tag{10}$$

is given by

$$\beta_{32} = (-1)^5 M_{32} = -\begin{vmatrix} a_{11} & a_{13} \\ a_{21} & a_{23} \end{vmatrix}. \tag{11}$$
 The value of an nth-order determinant $|A|$ is defined as

Row expansion: $\displaystyle \det[A] = \sum_{j=1}^{n} a_{ij}\,\beta_{ij}$, for any specific row i;

Column expansion: $\displaystyle \det[A] = \sum_{i=1}^{n} a_{ij}\,\beta_{ij}$, for any specific column j. $\qquad(13)$

It can be seen from Eq. (13) that there are 2n different ways in which the determinant
of a matrix can be computed.

 The value of a second-order determinant $|A|$ is defined as

$$\det[A] = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11}\beta_{11} + a_{12}\beta_{12} = a_{11}(a_{22}) + a_{12}(-a_{21}) = a_{11}a_{22} - a_{12}a_{21}. \tag{12}$$

 It can be shown that the number of arithmetic operations ($+$, $-$, $\times$, or $\div$) required
to compute the determinant of an $n \times n$ matrix in this way is $O(n!)$ ($O$, "of the order of").
Thus, the number of arithmetic operations increases very rapidly with the size of the matrix:

$$\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11}a_{22} - a_{12}a_{21}: \quad 2 = 2! \text{ product terms, each with } (2-1) = 1 \text{ multiplication.}$$

$$\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = a_{11}\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} - a_{12}\begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix} + a_{13}\begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix}:$$

3 determinants of order 2, hence $3 \times 2! = 3!$ product terms
($a_{11}a_{22}a_{33} - a_{12}a_{21}a_{33} + a_{13}a_{21}a_{32} - a_{11}a_{23}a_{32} + a_{12}a_{23}a_{31} - a_{13}a_{22}a_{31}$),
each with $(3-1) = 2$ multiplications.

A determinant of order 4 has 4 determinants of order $(4-1) = 3$, hence $4 \times 3! = 4!$ product
terms, each with $(4-1) = 3$ multiplications.

In general, evaluating a determinant of order n by the usual expansion by minors
requires n! product terms with $(n-1)$ multiplications in each term, i.e., roughly
$(n-1) \cdot n!$ multiplications $[\,\sim (n+1)!\,]$.
Properties of determinants:
(1) The value of a determinant is not affected if rows (or columns) are written as
columns (or rows) in the same order, i.e., $|A| = |A^T|$.

(2) If all the elements of a row (or a column) are zero, the value of the determinant is
zero.

(3) If any two rows (or two columns) are interchanged, the value of the determinant is
multiplied by -1.

(4) If all the elements of one row (or one column) are multiplied by the same constant a ,
the value of the new determinant is a times the value of the original determinant.

(5) If the corresponding elements of two rows (or two columns) of a determinant are
proportional (linearly dependent), the value of the determinant is zero. For example,

$$\det[A] = \begin{vmatrix} 4 & 6 & 8 \\ 3 & 4 & 6 \\ 2 & 2 & 4 \end{vmatrix} = 0,$$

since column 3 is twice column 1.
 Operation count for solving the linear system $A\vec{x} = \vec{b}$ with $n = 81$ unknowns ($A$ is $81 \times 81$):

 By Cramer's rule [$\sim(n+1)!$ operations]: 82!, which is a very large number indeed
($\approx 4.75 \times 10^{122}$), dwarfing even the national debt.

 By LU decomposition (or Gauss elimination) [$\sim n^3/3$ operations]: for comparison, only about
$81^3/3 = 177{,}147$ ($1.77 \times 10^{5}$) operations.

 If we solve this problem using a machine capable of performing 100 million floating-point
operations per second (100 megaflops; a Pentium III 933 MHz, year 2000):

 Using Cramer's rule would require about $3.20 \times 10^{101}$ years. This is not worth
waiting for!

 Using LU decomposition (or Gauss elimination) would require only a fraction of
a second on the same machine.

(For comparison: ~3 gigaflops for a Pentium 4 3.4 GHz in 2004; ~48 gigaflops for a Core 2 Duo E6700 in 2008.)

 Cramer's rule should never be used for more than about three unknowns, since it
rapidly becomes very inefficient as the number of unknowns increases.

 The determinant of a triangular factor is the product of its diagonal elements:
$|U| = u_{11} \cdot u_{22} \cdots u_{nn}$; for the unit lower-triangular factor,
$|L| = l_{11} \cdot l_{22} \cdots l_{nn} = 1$.
3. Rank of a Matrix

For an $n \times n$ matrix $[A]$, consider all possible square submatrices that can be formed by
deleting rows and columns. The rank of $[A]$ is then defined as the order of the largest
nonsingular submatrix. This implies that a square matrix of order n is
nonsingular if and only if its rank is n.
4. Inverse Matrix

 The inverse of a square matrix $[A]$ is written as $[A]^{-1}$ and is defined by the
relationship

$$[A]^{-1}[A] = [A][A]^{-1} = [I], \tag{14}$$

where $[A]^{-1}[A]$, for example, denotes the product of the matrix $[A]^{-1}$ and $[A]$.

 The inverse matrix of $[A]$ can be determined from

$$[A]^{-1} = \frac{\text{adjoint}[A]}{\det[A]}, \tag{15}$$

where $\text{adjoint}[A]$ is the adjoint matrix of $[A]$, and $\det[A]$, the determinant of $[A]$,
is assumed to be nonzero.
Adjoint Matrix
 The adjoint matrix of a square matrix $[A] = [a_{ij}]$ is defined as the matrix obtained by
replacing each element $a_{ij}$ by its cofactor $\beta_{ij}$ and then transposing. Thus,

$$\text{Adjoint}[A] = \begin{bmatrix} \beta_{11} & \beta_{12} & \cdots & \beta_{1n} \\ \beta_{21} & \beta_{22} & \cdots & \beta_{2n} \\ \vdots & \vdots & & \vdots \\ \beta_{n1} & \beta_{n2} & \cdots & \beta_{nn} \end{bmatrix}^T = \begin{bmatrix} \beta_{11} & \beta_{21} & \cdots & \beta_{n1} \\ \beta_{12} & \beta_{22} & \cdots & \beta_{n2} \\ \vdots & \vdots & & \vdots \\ \beta_{1n} & \beta_{2n} & \cdots & \beta_{nn} \end{bmatrix}. \tag{16}$$
For a $3 \times 3$ matrix

$$[A] = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}, \qquad [A]^{-1} = \frac{\text{adjoint matrix of }[A]}{\text{determinant of }[A]},$$

so that, with $\beta_{ij}$ the cofactor of $a_{ij}$,

$$[A]^{-1} = \frac{\begin{bmatrix} \beta_{11} & \beta_{21} & \beta_{31} \\ \beta_{12} & \beta_{22} & \beta_{32} \\ \beta_{13} & \beta_{23} & \beta_{33} \end{bmatrix}}{\det(A)} = \frac{\begin{bmatrix} \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} & -\begin{vmatrix} a_{12} & a_{13} \\ a_{32} & a_{33} \end{vmatrix} & \begin{vmatrix} a_{12} & a_{13} \\ a_{22} & a_{23} \end{vmatrix} \\ -\begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix} & \begin{vmatrix} a_{11} & a_{13} \\ a_{31} & a_{33} \end{vmatrix} & -\begin{vmatrix} a_{11} & a_{13} \\ a_{21} & a_{23} \end{vmatrix} \\ \begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix} & -\begin{vmatrix} a_{11} & a_{12} \\ a_{31} & a_{32} \end{vmatrix} & \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} \end{bmatrix}}{\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}}.$$
 Numerical method to find the inverse $[A]^{-1}$ of a matrix $[A]$ (Chap. 3):

 By the definition $[A][A]^{-1} = [I]$ (identity matrix), let $[X] = [A]^{-1}$, i.e.,

$$\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix} \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nn} \end{bmatrix} = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix};$$

we want to find the matrix $[X]$ that satisfies $[A][X] = [I]$ for the given matrices $[A]$, $[I]$.

 The above procedure is equivalent to solving a set of linear equations with n
right-hand-side vectors, $[A]\vec{x}_i = \vec{b}_i$; the i-th right-hand-side vector being an
n-component vector with unity in the i-th position and zeros everywhere else:

$$[A]\begin{bmatrix} \vec{x}_1 & \vec{x}_2 & \cdots & \vec{x}_n \end{bmatrix} = \begin{bmatrix} \begin{Bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{Bmatrix} \begin{Bmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{Bmatrix} \cdots \begin{Bmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{Bmatrix} \end{bmatrix}.$$
5. Basic Matrix Operations

Equality of matrices: Two matrices $[A]$ and $[B]$, having the same order, are equal if
and only if $a_{ij} = b_{ij}$ for every i and j.

Addition and subtraction of matrices: The sum of the two matrices $[A]$ and $[B]$,
having the same order, is given by the sum of the corresponding elements. Thus, if
$[C] = [A] + [B] = [B] + [A]$, then $c_{ij} = a_{ij} + b_{ij}$ for every i and j.

Similarly, the difference of two matrices $[A]$ and $[B]$ of the same order is given by
$[D] = [A] - [B]$, with $d_{ij} = a_{ij} - b_{ij}$ for every i and j.
Multiplication of matrices: The product of two matrices $[A]$ and $[B]$ is defined only
if they are conformable (i.e., if the number of columns of $[A]$ is equal to the number of
rows of $[B]$). If $[A]$ is of order $m \times n$ and $[B]$ is of order $n \times p$, then the product
$[C] = [A][B]$ is of order $m \times p$ and is defined by $[C] = [c_{ij}]$, with

$$c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}; \quad i = 1, 2, \ldots, m; \; j = 1, 2, \ldots, p. \tag{17}$$

This means that $c_{ij}$ is the quantity obtained by multiplying the i-th row of $[A]$ and the j-th
column of $[B]$ element by element and summing these products.

Other operations: If the matrices are conformable, the multiplication process is
associative and distributive over addition:

$$([A][B])[C] = [A]([B][C]), \tag{18}$$

$$([A] + [B])[C] = [A][C] + [B][C]. \tag{19}$$


Notes:

(1) $[A][B]$ is the pre-multiplication of $[B]$ by $[A]$, or the post-multiplication of $[A]$
by $[B]$. Also, the product $[A][B]$ is not necessarily equal to $[B][A]$.

(2) The transpose of a matrix product is given by the product of the transposes of the
separate matrices in reverse order. Thus, if $[C] = [A][B]$, then

$$[C]^T = ([A][B])^T = [B]^T [A]^T. \tag{20}$$

(3) The inverse of a matrix product is given by the product of the inverses of the
separate matrices in reverse order. Thus, if $[C] = [A][B]$, then

$$[C]^{-1} = ([A][B])^{-1} = [B]^{-1} [A]^{-1}. \tag{21}$$


Homework I
• In Section 1.6.1, Absolute Error and Relative Error: in Equation (1.20), why can we
put the factor $\frac{1}{2}$ in the relative error bound?

• How many different kinds of error appear in numerical analysis?
Please list them and give an example to explain each of them in detail.

• On page 75, please explain the difference between $\sum_{k=100}^{1} \frac{1}{k}$ (summing from
$k = 100$ down to $k = 1$) and $\sum_{k=1}^{100} \frac{1}{k}$ (summing from $k = 1$ up to $k = 100$)
in their numerical results.
