
343

2008
c
2008,


. ,
.
. I
/ .

.

http://scgroup.hpclab.ceid.upatras.gr/class/sc.html.

( , , , ,
, on-line , , , .)
.

. .

. ( RISC SVD).
, .. cos 1 .

2 :
1 . , . , . . (. : 1992),
2
(. : ,
1976).
matrix

.
, , ,

( ) array
table. , polyval.m MATLAB
POLYVALM Matrix polynomial evaluation. If V is a vector whose elements are the coefficients of a polynomial, then POLYVALM(V,X) is the value of the polynomial evaluated with matrix argument X. See POLYVAL for polynomial evaluation in the regular or array sense.
1
, .
- Strang .
2

.
, Mathematica, ,
.
3 . , ,
. 4 .
,

/. ,
, : )
110 (2 .): , ) 240 (4 .):
. 261 (3
.) ( ) 205 ( ).
, , () ()
G. Strang, / (1996)
.. .. , /
(1997).
( ) MATLAB ( version 7).
, .
, ,
Scilab
MATLAB , ( http://www.scilab.org/)!
: 1) G. Golub and C. F. Van Loan. Matrix Computations. The Johns Hopkins University Press, Baltimore, third edition, 1996. 2) N.J. Higham. Accuracy and Stability of Numerical Algorithms. SIAM, Philadelphia, 2nd ed., 2002. 3) C.W. Ueberhuber. Numerical Computation, volumes 1 and 2. Springer, Berlin, 1997.
.
,
. .
.
. ,
. (
).

.

3
, matrix,
, 2 (. xxxiv).
4
. 1xvii 2 .

5
, , , , , .

.

().
.
. , , , , . , , , , , , ,
, .
, ,
, , , ,
.
LaTEX .
.

. ,
. , , ,
. , .
.
2008


1
1.1, 1.2, 1.3, 1.4 (1.4.1; 1.4.2), 1.5, 1.6, 1.7, 1.8

2
2.1 (2.1.1), 2.2 (2.2.1), 2.3, 2.4, 2.5

3
3.1, 3.2 (3.2.1; 3.2.2 bit; 3.2.3; 3.2.4), 3.3, 3.4 (3.4.1; 3.4.2; 3.4.3; 3.4.4 Fused Multiply and Add (FMA); 3.4.5 Java), 3.5, 3.6 (3.6.1; 3.6.2), 3.7, 3.8, 3.9, 3.10

4
4.1 (4.1.1; 4.1.2), 4.2 (4.2.1; 4.2.2; 4.2.3; 4.2.4; 4.2.5; 4.2.6 BLAS), 4.3, 4.4, 4.5, 4.6, 4.7

5 II
5.1, 5.2, 5.3, 5.4 (5.4.1; 5.4.2), 5.5, 5.6 (5.6.1), 5.7 (5.7.1), 5.8, 5.9 (5.9.1 Cholesky), 5.10, 5.11, 5.12, 5.13, 5.14

6 III
6.1 QR, 6.2 (6.2.1; 6.2.2 Gram-Schmidt; 6.2.3 GS; 6.2.4 Householder; 6.2.5), 6.3 QR: Householder (6.3.1 QR; 6.3.2 QR Householder; 6.3.3 QR; 6.3.4; 6.3.5), 6.4 Givens (6.4.1; 6.4.2 Givens; 6.4.3; 6.4.4 QR Givens; 6.4.5), 6.5, 6.6

7 IV
7.1, 7.2 (7.2.1; 7.2.2 Vandermonde; 7.2.3 Toeplitz; 7.2.4 Toeplitz; 7.2.5), 7.3, 7.4

8
8.1 (8.1.1), 8.2 (8.2.1; 8.2.2; 8.2.3 Euler; 8.2.4 Taylor, Runge-Kutta, Richardson; 8.2.5), 8.3, 8.4, 8.5

.1 (.1.1; .1.2), .2 Lipschitz, .3
1.1
,
1
.
:
1980
,
. Future Directions in Computational
Mathematics, Algorithms and Scientific Software. : The use of modern computers in scientific and engineering research
and development over the last three decades has led to the inescapable
conclusion that a third branch of scientific methodology has been created.
It is now widely acknowledged that, along with the traditional theoretical
and experimental methodologies, advanced work in all areas of science
and technology has come to rely critically on the computational approach. ( [33]).
It is becoming clear that dramatic increases in computing power are necessary but insufficient to making high-performance computing a reality.
Necessary is also the construction of a large body of applications capable
of using that computational power effectively ( [3] Alpern
Carter2 .)
It is essential to recognize the fact that computer experiments can both
be a two-way bridge between Physical Experiments and Mathematical
Models, as well as an independent source of physical understanding.
Such experiments have a mind-bending potential for future explorations
of nature's secrets, which is only vaguely recognized today. (
1

, . 2.
Bowen Alpern Larry Carter Computer Scientists IBM Yorktown Heights. Carter University of California, San Diego (UCSD).
2

Jackson3 [23]).


: , ,
, , . , . ,

, ,
.
. ,
:
.
( Michel Serres4 [35])

. ,
computational science and engineering . Computational Science and Engineering
[12]. Golub Ortega5
[14, . 2]:
Scientific computing is the collection of tools, techniques, and theories required to solve on a computer mathematical models of problems in science and engineering.
:
, ,
.
. Mathematical Modelling
[4, . 220].


(, , .)
.
.
3
Atlee Jackson Center for Complex Systems Research Beckmann Center University of Illinois at Urbana-Champaign. Santa Fe
Institute.
4
Michel Serres
5
Gene Golub Stanford James Ortega
University of Virginia.


1.1:
[26]
1 2 3 4 5 6 7 8 9 10

.
.
.
*
.
.
.
*
*
*

.
.
.
.
*
*
.
.
.
.

*
.
*
.
*
*
*
.
.
.

*
*
.
.
.
.
.
.
.
.

.
*
.
.
*
.
.
.
.
.

*
.
*
*
.
.
.
.
.
.
*
.
*
.
.
*
*
.
*
.

*
.
*
.
.
.
.
*
.
.

*
*
*
*
*
*
*
.
.
.
/
*
*
*
.
.
.
.
*
*
*

*
*
.
*
.
.
.
.
.
.
.
*
*
*
*
*
*
*
.
.
*
. .
*
.
.
*
.
.
.
.
.
.
1.
3.
5.
7.
9.



(FFT, )
(=multigrid)
Monte Carlo

2. . / . . .
4. .
6.
8.
10.

1.2
: 1) , 2) (, ), 3)
, 4)
.
1.1, [26],

.
.
[26] ( )

6
. ,
. 1) (restructuring compiler) 2)
..
.

1.
,
6

(= legacy) (= dusty-deck) .


2. .
1.1 .
,
(= computational kernels). ,
Fourier
.
Fourier ( FFT)
( Gauss ).
. (
)
.

.
1.2.1. () Fourier , -
-
Fourier .
7.

,

.

(
7 ) . [24, 34].

1.1 ,
. ,
.. ,
.
( ) .
- (=input-output tables).
8 .
(= derivatives) [6]. ,
7

Grand Challenges of Computational Science [22].


Wassily Leontief ( ) [28] () .
8


- -
[11] .

www.cs.sandia.gov/tech_reports/ripryor/Aspen.html.

1.3

:
1.
2.
3.

.

, . John Rice (Purdue University)9 : What
is an Answer? : , , .
,
10 . , .
, . ,
.
1) , 2) , 3)
(..) , . 4) , ..
, , . ,

. 5)
.
,
.
,

.
.
() ...
9

HERMIS, 1996.

(particle methods).
10



. ,
.. Gauss Hotelling
[19], n ,
4n .
John von Neumann
11 .
[5]:
In the elimination method a series of n compound operations is
performed each of which depends on the preceding. An error at
any stage affects all succeeding results and may become greatly
magnified; this explains roughly why instability should be expected.
It should be noticed that at each step a division is performed by
a number whose size cannot be estimated in advance and which
might be so small that any error in it would be greatly magnified by
division...
, John Wilkinson, almost every statement in it is either wrong or
misleading. ,
.
von Neumann, Herman Goldstine,
, von Neumann ( Turing), ,
Gauss [13] . [17].
( Oscar Wilde)

.
,
, ..
.
:
(
!)
,
,
.

As soon as an Analytical Engine exists, it will necessarily guide the


future course of science. Whenever any result is sought by its aid,
the question will then arise - By what course of calculation can these
results be arrived at by the machine in the shortest time? ....
Charles Babbage, [Passages from the Life of a Philosopher, 1864]
. (
11


13
(. + . + .)

(vectorizing compilers)
BLAS1 BLAS3

FFT

11

O(1)
pipe
(1)
(. .)
(n/ log n)

1.2:

)
. 12 1.2.
.

. , ,
. ,
(.. ,
) 14 .
: .. RAM PRAM
.

(benchmarks) . Linpack benchmark ,
.

. ,
.
, ,

.

.
, ..
ACM, LAPACK, .. .
12
The Federal High Performance Computing Program
1989.
13
Alpern Carter performance programming [3].
14
Beresford Parlett [30].


,
, ,
.

. ,
,
, :
1. .
2.
.
3. (.. RAM)
.
1.3.1. Fourier
( (n2 ) (n log n) )
: n,

,
.

1.3.2.
Strassen, (nlog 7 ) O(n3 ),
. :
, Strassen
!

.

1.4

.
.

1.4.1


.
. ,
. :


1. RISC : (register files), - LOAD-STORE,


pipelining) . [8]
RISC .
URL

http://www.ee.siue.edu/~mvinant/g_info/cpu_hist.htm#RISC.
2. (, , , /).

( ),
.
,
chip (single-chip processor), pins chip
single-chip
15 chip
. on-chip off-chip.
on-chip :
() (register files).
(instruction cache).
(data cache).
, on chip
. ,
.
( )
.
:
... the operation count is not necessarily an adequate figure
of merit in comparing theoretically the value of algorithms in
numerical analysis [ . . . ] Other factors, such as [ . . . ] the
pattern in which memory banks of the computer are referenced,
may be as important as the operation count in determining the
speed of a program... [18]
3. , .. (superscalar)
15

single-chip pin-bandwidth limited.

, (= clusters), (networks of workstations = NOW) (Grid)
.

. [16].
.
( ) ,
, ,
(..
RISC,
). , , .
, SIMD = Single Instruction Multiple Data) streaming Intel,
(GPU = graphics processing units).

1.4.2 , ,
. , , .
, , ( )
.


. ,
, , . ,
-
(= semantic gap)
. ,
,
. ( .. [7, 21])
() (= Problem Solving Environments) (. [21]).

ELLPACK [20] ,
, . ELLPACK
. ,


. , , Mathematica [36], Maple [2],
Matlab [1], Scilab [15]). -


scripting16 [29] ( Python [27])



[25].


;
.
. . ,

.
,
Fortran. C ( , .. C++), .
Fortran, . Fortran-90,
, ,
, , .
John Backus17
I don't know what the technical characteristics of the standard language for scientific and engineering computation in the year 2000
will be ... but I know it will be called Fortran.
, ,
. , .

.
. [10]
[31] The Influence of the Compiler on the
Cost of Mathematical Software - in Particular on the Cost of Triangular Factorization. ,
18 .
.

.
16

.
(1924-2007) Fortran
BNF.
18
.
. . [32, 9].
17


1.5
1.1 (-)
,

$$A \leftarrow A + x y^{\top} \qquad (1.1)$$

A n x, y n.
:
1. Fortran ( ),
2.

3. .
) (1.1)
Unix dtime
user system ) () ...
(Mflop/s) n = 30 n = 800.
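
A minimal Python sketch of the kind of measurement this exercise describes, given only as an illustration (the exercise itself asks for Fortran). The function names `rank1_update` and `mflops` are hypothetical, and the sketch assumes that update (1.1) costs 2n^2 flops, that is, one multiplication and one addition per matrix entry.

```python
import time

def rank1_update(A, x, y):
    """In-place A <- A + x*y^T for an n-by-n list-of-lists A; returns the flop count."""
    n = len(x)
    for i in range(n):
        xi = x[i]
        row = A[i]
        for j in range(n):
            row[j] += xi * y[j]      # one multiplication and one addition
    return 2 * n * n

def mflops(flops, seconds):
    """Convert a flop count and an elapsed time into Mflop/s."""
    return flops / (seconds * 1.0e6)

n = 200                              # assumed problem size, for illustration
A = [[0.0] * n for _ in range(n)]
x = [1.0] * n
y = [2.0] * n
t0 = time.perf_counter()
flops = rank1_update(A, x, y)
elapsed = max(time.perf_counter() - t0, 1.0e-9)   # guard against a zero reading
print(n, elapsed, mflops(flops, elapsed))
```

Repeating the measurement over a range of n, as the exercise requests, would give the data for a time plot and an Mflop/s plot; an interpreted loop like this one will of course report far lower rates than compiled Fortran.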

1.6
1.6.1.
;

. [, . 1.1] , ,
.
1.6.2. ; 3 .

. [, . 1.1] ( - - )
.
,

(.. , ..) : )
, ) Fourier, )
.
1.6.3.
;

. [, . 1.3]
) , )
, ) .
1.6.4. .


Figure 1.1: (a) time in sec versus n and (b) Mflop/s versus n, for n up to 800, on an SGI Indigo-2, R4400 @ 250 MHz, 2 MB cache.


. [, . 1.3]
1) , 2) , 3) (..) ,
. 4) , ..
, , .
,
. 5)
.
1.6.5.
.

. [, . 1.3]
LAPACK
n. n 1000.
1.6.6. /
.

. [, . 1.4] ) RISC, )
, ) , .

1.7
1.7.1. $p(x) = \sum_{j=0}^{n-1} \alpha_j x^j$ $\alpha_{n-1} \neq 0$. )


m 1 , ..., m
V a. , V
a. ) Horner, .. MATLAB ,
V a V . ) n = m,
(
) V a O(n log n) ...
; V a
(, j j ).

.
) Vandermonde :

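$$V = \begin{pmatrix}
1 & \zeta_1 & \zeta_1^2 & \cdots & \zeta_1^{n-1} \\
1 & \zeta_2 & \zeta_2^2 & \cdots & \zeta_2^{n-1} \\
1 & \zeta_3 & \zeta_3^2 & \cdots & \zeta_3^{n-1} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & \zeta_m & \zeta_m^2 & \cdots & \zeta_m^{n-1}
\end{pmatrix}$$

m $\zeta_1, \zeta_2, \ldots, \zeta_m$ Vandermonde :

$$a = (\alpha_0, \ldots, \alpha_{n-1})^{\top}$$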


) MATLAB
( MATLAB ). , a(i) i1 .

g = a(n)*ones(m, 1);
for i=n-1:-1:1,
g=g.*z + a(i)*ones(m, 1);
end
) Fourier x n

$$y(j) = \sum_{k=0}^{n-1} x(k)\, e^{-i 2\pi k j / n}, \qquad j = 0, \ldots, n-1.$$

Vandermonde :

=
=

1
1

ei2(1)

...

ei2(2)

>
. . . ei2(n1)

>

V a
Fourier a, O(n log n).
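
The equivalence used in this solution, namely that evaluating a polynomial at m points with Horner's rule gives the same vector as multiplying the m-by-n Vandermonde matrix by the coefficient vector, can be checked with a small Python sketch. The function names `horner_many` and `vandermonde_matvec` are hypothetical, and the coefficients are assumed to be stored in ascending order of power, as in the MATLAB fragment.

```python
def horner_many(a, z):
    """Evaluate p(x) = a[0] + a[1]*x + ... + a[n-1]*x**(n-1) at every point of z."""
    g = [a[-1]] * len(z)                  # start from the leading coefficient
    for coeff in reversed(a[:-1]):        # n-1 steps, each 2m flops
        g = [gi * zi + coeff for gi, zi in zip(g, z)]
    return g

def vandermonde_matvec(a, z):
    """Form the m-by-n Vandermonde matrix explicitly and multiply it by a."""
    n = len(a)
    V = [[zi ** j for j in range(n)] for zi in z]
    return [sum(V[i][j] * a[j] for j in range(n)) for i in range(len(z))]

a = [1.0, -3.0, 0.0, 2.0]                 # p(x) = 1 - 3x + 2x^3
z = [0.0, 1.0, 2.0, -1.0]
print(horner_many(a, z))
print(vandermonde_matvec(a, z))
```

Horner's rule never forms V, so it needs about 2mn flops and no extra storage for the matrix.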

Pn1

j
1.7.2. , , p(z) =
j=0 j x
n1 6= 0, V a
8n log n ..
lyes-mach, , lno-mach,
, . , lno-mach,
V a,
2n2 . (.. )
lno-mach.

1.8
1.8.1. Fourier x
n

$$y(j) = \sum_{k=0}^{n-1} x(k)\, e^{-i 2\pi k j / n}, \qquad j = 0, \ldots, n-1.$$


Vandermonde x. )
Vandermonde
Fourier Fourier


. )
fft MATLAB : flops n = 128 : 4 : 512.
tic, toc, etime, cputime. ;
$O(n^2)$ $O(n \log n)$.
,
. .

.
) MATLAB Vandermonde.

function V=vand_fft(n)
for j=1:n, v(j, 1) = exp(-2*pi*sqrt(-1)*(j-1)/n); end
V=zeros(n);V(:, 1)=ones(n, 1);
for i=2:n,
V(:, i)=V(:, i-1).*v;
end
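
As a hedged companion to `vand_fft`, the following Python sketch builds the same Vandermonde matrix on the points exp(-2*pi*i*j/n), applies it to a vector in O(n^2) operations, and compares the result with a textbook recursive radix-2 FFT. The names `dft_matrix`, `dft_by_matvec` and `fft` are hypothetical, and the FFT shown assumes that n is a power of two; it is not claimed to be the algorithm behind MATLAB's `fft`.

```python
import cmath

def dft_matrix(n):
    """n-by-n Vandermonde matrix with entries exp(-2*pi*i*row*col/n)."""
    return [[cmath.exp(-2j * cmath.pi * row * col / n) for col in range(n)]
            for row in range(n)]

def dft_by_matvec(x):
    """O(n^2) discrete Fourier transform: y = V x."""
    n = len(x)
    V = dft_matrix(n)
    return [sum(V[row][col] * x[col] for col in range(n)) for row in range(n)]

def fft(x):
    """Recursive radix-2 FFT, O(n log n); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    y = [0j] * n
    for q in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * q / n) * odd[q]   # twiddle factor
        y[q] = even[q] + t
        y[q + n // 2] = even[q] - t
    return y

x = [1.0, 2.0, 3.0, 4.0, 0.0, -1.0, -2.0, -3.0]
err = max(abs(u - v) for u, v in zip(dft_by_matvec(x), fft(x)))
print(err)   # the two results differ only at rounding-error level
```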

1.8.2. 1.3
( ).
.
MATLAB for-loops .
- .
.
profile
find.

n = 500 : 100 : 20000 m = 50 : 50 : 200,
n = 50 : 50 : 300 .
m, n.

.

.
. for-loops
( profile). 1.4.
- . 1.2
( ,

100 m, n).
MATLAB
.


% Code 1 (original, loop-based version)
tic;n=20000;rand('state',0);
figure;
for k=1:1:n,
  A(k)=k;
end
for k=1:1:n,
  B(k)=round(rand(1)*n);
end
for k=1:1:n,
  C(k)=A(k)+B(k);
end
for k=1:1:n,
  plot(B(k),C(k),'.r');
  hold on;
end
hold off;toc

% Code 2 (original, loop-based version)
tic;m=100;n=200;rand('state',0);
figure;
for j=1:1:n
  for i=1:1:m
    A(i,j)=rand(1);
  end
end
for i=1:1:m
  for j=1:1:n
    B(i,j)=rand(1);
  end
end
for i=1:1:m
  for j=1:1:n
    C(i,j) = A(i,j) + B(i,j);
  end
end
% Map the entries of C to the three plot categories
for i=1:1:m
  for j=1:1:n
    if C(i,j)>0.5,
      C(i,j)=1;
    elseif C(i,j)<0.5,
      C(i,j)=0;
    elseif C(i,j)==0,
      C(i,j)=-10;
    end
  end
end
% Plot one marker per entry, coloured by category
for i=1:1:m,
  for j=1:1:n,
    if C(i,j)==-10,
      plot(i,j,'.k');
      hold on;
    elseif C(i,j)==0,
      plot(i,j,'.y');
      hold on;
    elseif C(i,j)==1,
      plot(i,j,'.m');
      hold on;
    end
  end
end
title('Given');hold off;toc

1.3: 1.8.2


% Optimized code 1: vectorized
tic;n=20000;rand('state',0);
figure;
A=(1:n);
B=round(rand(1, n)*n);
C=A+B;
plot(B,C,'.r');
toc

% Optimized code 2: vectorized with find
tic;m=100;n=200;rand('state',0);
figure;
A=rand(m, n);
B=rand(m, n);   % same size as A, as in the loop version (i=1:m, j=1:n)
C=A+B;
C(find(C>0.5))=1;
C(find(C==0))=-10;
C(find(C<0.5 & C>0))=0;
[i, j]=find(C==-10);
plot(i,j,'.k');
hold on;
[i, j]=find(C==0);
plot(i,j,'.y');
hold on;
[i, j]=find(C==1);
plot(i,j,'.m');
hold on;
title('Given');hold off;toc

1.4: 1.8.2
Figure 1.2: Execution time (sec) for code 1 and optimized code 1 versus n, and for code 2 and optimized code 2 versus m and n (exercise 1.8.2).

[1] MATLAB: The Language of Technical Computing. In http://www.mathworks.com/products/matlab/.
[2] http://www.maplesoft.com/, 2007.
[3] B. Alpern and L. Carter. Performance programming: A science waiting to happen. In U. Vishkin, editor, Developing a Computer Science Agenda for High-Performance Computing. ACM Press, New York, 1994.
[4] R. Aris. Mathematical Modelling Techniques. Dover, Mineola, NY, 1994 (originally published in 1974).
[5] V. Bargmann, D. Montgomery, and J. von Neumann. Solution of linear systems of high order. In A.H. Taub, editor, John von Neumann Collected Works, volume V. Pergamon, Oxford, UK, 1963.
[6] E. Barucci, L. Landi, and U. Cherubini. Computational methods in finance: Option pricing. IEEE Computational Science & Engineering Mag., pages 66-80, Spring 1996.
[7] R.F. Boisvert and E.N. Houstis, editors. Computational Science, Mathematics and Software. Purdue University Press, 1999.
[8] L. Carter. RISC from a performance programmer's perspective. Invited talk at RISC in 1995 Symposium, 1995. Available from URL http://www-cse.ucsd.edu/users/carter/ppbib.html.
[9] L. DeRose, K. Gallivan, E. Gallopoulos, B. Marsolf, and D. Padua. FALCON: A MATLAB Interactive Restructuring Compiler. In C.-H. Huang, et al., editor, Lecture Notes in Computer Science: Languages and Compilers for Parallel Computing, pages 269-288. Springer-Verlag, New York, 1995.
[10] J. J. Dongarra, F. G. Gustavson, and A. Karp. Implementing linear algebra algorithms for dense matrices on a vector pipeline machine. SIAM Rev., 26(1):91-111, January 1984.
[11] From Quadnet. Economic modeling from the ground up. IEEE Parallel and Distributed Technology Mag., page 80, Summer 1996.
[12] E. Gallopoulos and A.H. Sameh. CSE: Content and product. IEEE Computational Science & Engineering Mag., 4(2):39-43, 1997.
[13] H.H. Goldstine. The Computer from Pascal to von Neumann. Princeton Univ. Press, Princeton, 5th edition, 1993.
[14] G. Golub and J.M. Ortega. Scientific Computing: An Introduction with Parallel Computing. Academic Press, Inc., San Diego, CA, 1993.
[15] Scilab Group. Scilab home page, 2007. Online at http://www.scilab.org/index.php.
[16] J.L. Hennessy and D.A. Patterson. Computer Architecture: A Quantitative Approach. Morgan Kaufmann, San Mateo, CA, first edition, 1990.
[17] N.J. Higham. Accuracy and Stability of Numerical Algorithms. SIAM, Philadelphia, 2nd edition, 2002.
[18] R. Hockney. Computers, compilers, and Poisson solvers. In U. Schumann, editor, Computers, Fast Elliptic Solvers, and Applications: Proc. GAMM Workshop, 1977.
[19] H. Hotelling. Some new methods in matrix calculation. Ann. Math. Statist., 14(1):1-34, 1943.
[20] E.N. Houstis, T.S. Papatheodorou, and J.R. Rice. Parallel ELLPACK: An expert system for the parallel processing of partial differential equations. In Intelligent Mathematical Software Systems, pages 63-73. North-Holland, Amsterdam, 1990.
[21] E.N. Houstis, J.R. Rice, E. Gallopoulos, and R. Bramley, editors. Enabling Technologies For Computational Science: Frameworks, Middleware, and Environments. Kluwer, 2000.
[22] Grand Challenges: High Performance Computing and Communications. A report by the committee on Physical, Mathematical, and Engineering Sciences. Office of Science and Technology Policy, 1991.
[23] E. A. Jackson. A first look at the second metamorphosis of science. Technical Report CCSR-95-1, Santa Fe Institute, 1995.
[24] W.J. Kaufmann III and L.L. Smarr. Supercomputing and the Transformation of Science. Scientific American Library, New York, 1993.
[25] G. Kollias and E. Gallopoulos. Jylab: A system for portable scientific computing over distributed platforms. In E-SCIENCE '06: Proceedings of the Second IEEE International Conference on e-Science and Grid Computing, page 97, Washington, DC, USA, 2006. IEEE Computer Society.
[26] D.J. Kuck, E. S. Davidson, D. L. Lawrie, and A.H. Sameh. Parallel supercomputing today and the Cedar approach. Science, 231:967-974, February 1986.
[27] H. P. Langtangen. Python Scripting for Computational Science. Springer, 2006.
[28] W. Leontief. The Structure of the American Economy. 1945.
[29] J. K. Ousterhout. Scripting: Higher-level programming for the 21st century. Computer, 31(3):23-30, 1998.
[30] B. N. Parlett. Progress in numerical analysis. SIAM Rev., 20(3):443-455, July 1978.
[31] B. N. Parlett and Y. Wang. The influence of the compiler on the cost of mathematical software - in particular on the cost of triangular factorization. ACM TOMS, 1(1):35-46, March 1975.
[32] C. Polychronopoulos, M. Girkar, M. Haghighat, C-L. Lee, B. Leung, and D. Schouten. Parafrase-2: An environment for parallelizing, synchronizing, and scheduling programs on multiprocessors. International J. High Speed Computing, 1(1), May 1989.
[33] W. C. Rheinboldt. Computational Modeling and Mathematics Applied to the Physical Sciences. Washington DC, 1984.
[34] J.R. Rice. Computational science and the future of computing research. IEEE Computational Science and Engineering Magazine, pages 35-41, Winter 1995.
[35] M. Serres. . Le Monde Diplomatique, (197), Nov. 2001. .
[36] S. Wolfram. Mathematica: A System for Doing Mathematics by Computer. Addison-Wesley, Boston, second edition, 1991.


model: . [
]
: ... 2. (.) , ... 6. ... 7. (.)
.... [. .]
model: I. Representation of structure. 2e. A simplified description
of a system, process, etc., put forward as a basis for theoretical or
empirical understanding; f. (Math) A set of entities that satisfies
all the formulae of a given formal or axiomatic system. .... [The
New Shorter Oxford English Dictionary]
What is a model? the term mathematical model ... will be used for any complete and consistent set of mathematical equations
which is thought to correspond to some other entity, its prototype.
The prototype may be a physical, biological, social, psychological or
conceptual entity... Being derived from modus (a measure) the
word model implies a change of scale in its representation ... In
so far as the prototype is a physical or natural object, the mathematical model represents a change on the scale of abstraction. Certain
particularities will have been removed and simplifications made in
obtaining the model. - Rutherford Aris [4, Chapter 1].
Models. ... It is customary nowadays, for example, to refer to a
computer model of the atmosphere, even though this consists of
nothing more than a programme for manipulating observed measurements of temperature, pressure, humidity, etc., according to
the dynamical equations of meteorology. The notion of a model
thus extends into a purely symbolic domain, where there is only
an abstract similarity between the original system and its model.
- John Ziman [17, Chapter 2.12].
, .
,
. ,
, ,
- Claude Levi-Strauss [11, .
38].

,
,
,
,
.
, -
.
. [15, . 263].

, ,
.
. ,
.

2.1

...

... .
[Pierre Simon de Laplace, Theorie
analytique des probabilites.]

One shouldn't always include all the effects in a mathematical model; a huge simulation of the exact equations (even if one knows
them) may be no more enlightening than the experiments that led
to these equations. There are virtues in simplicity, even in caricature (...) Solving is not the same as simulating. Our Models
are Our Metaphors: Princeton American Academy of Arts and Sciences, Philip
Holmes (SIAM News, June 2002.)
. : modus, . , modello.
Unesco ( , , 1972).
,
, .
, . ,
, ,
,
..

(.. -


.)
, , . (.. , , , )
(.. ).
- (.. ) , , .

, ,
.
,
, ,
,
.1
, . , ,
, , .

. 2
(, ) .
,
, ..
, ,
,
. ,
3 .
/. ,
4 .

, .
,
1

, Unesco ( , , 1972).
.
3
:
, , [ , . .]
4
Simulation as a source of new knowledge
The Sciences of the Artificial [12] - -
Carnegie Mellon ..., Herbert Simon ( ).
2


2.1: .

, , /, .
,
.
2.1.1. Evariste Galois
1830,
5
.

2.1.2. ,
Newton $x_{i+1} = x_i - f(x_i)/f'(x_i)$
. ( ) , f . ,
.. ,
.
, ,
$x_{i+1} = x_i - f(x_i)\,\dfrac{x_i - x_{i-1}}{f(x_i) - f(x_{i-1})}$.
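Taylor, $f(x_{i-1}) = f(x_i) + f'(x_i)(x_{i-1} - x_i) + \tfrac{1}{2} f''(\xi)(x_{i-1} - x_i)^2$, with $\xi$ between $x_{i-1}$ and $x_i$.

$$\frac{f(x_i) - f(x_{i-1})}{x_i - x_{i-1}} \approx f'(x_i). \qquad (2.1)$$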
(2.1) Newton .

,
(2.1) .
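
The relation between the two iterations can be illustrated with a short Python sketch: the secant step is the Newton step with the derivative replaced by the difference quotient of (2.1). The function names `newton` and `secant`, the test function f(x) = x^2 - 2 and the starting values are assumptions made for this example only.

```python
def newton(f, df, x0, steps):
    """Newton iteration x <- x - f(x)/f'(x); requires the derivative df."""
    x = x0
    for _ in range(steps):
        x = x - f(x) / df(x)
    return x

def secant(f, x0, x1, steps):
    """Secant iteration: f'(x_i) is replaced by the difference quotient (2.1)."""
    for _ in range(steps):
        f0, f1 = f(x0), f(x1)
        if f1 == f0:          # difference quotient undefined; stop iterating
            break
        x0, x1 = x1, x1 - f1 * (x1 - x0) / (f1 - f0)
    return x1

def f(x):
    return x * x - 2.0

def df(x):
    return 2.0 * x

print(newton(f, df, 1.0, 6))
print(secant(f, 1.0, 2.0, 8))
```

Both calls approximate the square root of 2; the secant version needs only values of f, at the price of a few more iterations.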

2.1.3.

. .


.
/ (
) , ,

.

.

/.
2.1.1.
, .. ,

.

.

2.1.2. , . ,
, MIT, Paul Krugman, .
, Krugman
. (. [9, . 47]):
. -
,
- . ,
,
:- , , ,
. ,
(
). ,
, , ,
,
.


2.1.1
The study of computational complexity requires that one agrees on
a model of computation, normally called a machine model, for effectuating algorithms. Unfortunately many different machine models
have been proposed in the past, ranging from theoretical devices like the Turing machine to more or less realistic models of the random
access machines and parallel computers... [16]
, , .. , , ,
. :
. .
. .
. 8 .
:
.

/ , . , ,

.

,
, ,
(artifacts) ( ).
2.1.3. .
David Deutsch 5 The Fabric of Reality - The Science of Parallel Universes and its Implications [5]:
... What makes the general theory of relativity so important is not
that it can predict planetary motions a shade more accurately than
Newton's theory can, but that it reveals and explains previously unsuspected aspects of reality, such as the curvature of space and time.
This is typical of scientific explanation.... But the ability of a theory
to explain what we experience is not its most valuable attribute. Its
most valuable attribute is that it explains the fabric of reality itself.
... Yet some philosophers - and even some scientists - disparage the
role of explanation in science. To them, the basic purpose of a scientific theory is not to explain anything, but to predict the outcomes of
experiments: its entire content lies in its predictive formulae. ... This
view is called instrumentalism because it says that a theory is no
more than an instrument for making predictions...

David Deutsch ,
. Dirac
.


2.2


.
(.. )
.
RAM
RAM (= Random Access Machine).
(. [3] ),

. , .

.
2.2.1. Horner
$p(x) = \alpha_0 + \alpha_1 x + \cdots + \alpha_n x^n$.
:

s = a_n
for i = n-1 : -1 : 0
  s = s*x + a_i
end
, T (n) =
2n .
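
The count of 2n arithmetic operations can be confirmed by instrumenting Horner's rule, as in the following Python sketch. The function name `horner_counted` is hypothetical; coefficients are assumed to be given in ascending order, a[0] through a[n].

```python
def horner_counted(a, x):
    """Return (p(x), flops) for p(x) = a[0] + a[1]*x + ... + a[n]*x**n."""
    s = a[-1]                       # s = a_n
    flops = 0
    for coeff in reversed(a[:-1]):  # i = n-1, ..., 0
        s = s * x + coeff           # one multiplication and one addition
        flops += 2
    return s, flops

value, flops = horner_counted([1.0, 2.0, 3.0], 2.0)   # degree n = 2
print(value, flops)
```

For a polynomial of degree n the loop body runs n times, so the counter ends at 2n regardless of the data.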

2.2.1. RASP (=
Random Access Stored Program) uniform cost criterion
(=straight-line), . branch
.


/ , 6
. ,

/. ,

. ,
.
,

6
, RAM
, .


/, :
,


7 .
,
1.1 (-)

( RAM )
!
8
. , , .

2.2.1


I gradually and slowly found out that there were two things to talk
about; the fact that knowledge is acquired, so to speak, by memory;
but that when you know anything, memory doesn't come in. At any
moment that you are conscious of knowing anything, memory plays
no part.... You have a sense of the immediate... - Gertrude Stein
[14, p. 152].
Finally, we can combine LOAD and STORE into the arithmetic operations by replacing sequences such as { LOAD a; ADD b; STORE c
} by c ← a + b... - A. Aho, J. Hopcroft and J. Ullman [3].

RAM
/ :
,
,
,
.
:

load/store.
7


.
8
RASP
!


K
M K .
, , / ...
.
load
,
.
load

(0)

.
,

.
RAM

. , RAM
.
:
(0)

= 0,
.
2.2.2. , Horner :
load x, a_n
s = a_n
for i = n-1 : -1 : 0
  load a_i
  s = s*x + a_i
end
store s

2.2.2.
. , K ,
k = 0, ..., K 2m(k) f (k),
m f 0 [2].


,
.
, .


: ... (flops). , , ...


... .

:
.

min :
.

. .
2.2.1. n
, min
n + 1.

. , ,
load n. , store.
, m , min =
n + m.
2.2.3. Horner,
2.2.2 = 2n = n + 3.
, min n + 3 = min .
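
The two counts quoted for Horner's rule, 2n arithmetic operations and n + 3 memory transfers, can be reproduced by tallying the load and store statements of the annotated algorithm. The Python sketch below does only that bookkeeping; the function name `horner_traffic` is hypothetical, and it assumes that x and the running sum s stay in registers throughout.

```python
def horner_traffic(n):
    """Return (memory transfers, flops) for Horner's rule of degree n."""
    loads = 2              # load x, a_n
    flops = 0
    for _ in range(n):     # i = n-1, ..., 0
        loads += 1         # load a_i
        flops += 2         # s = s*x + a_i
    stores = 1             # store s
    return loads + stores, flops

transfers, flops = horner_traffic(1000)
print(transfers, flops, transfers / flops)
```

For large n the ratio of transfers to flops tends to 1/2, which is what makes the memory system relevant to the attainable speed.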

/

Mflop/s: Million Floating Point Operations Per Second
. , Mflops. [8],
Million Floating Point OperationS. .
.
,
.
, , , min ,
.
, T

T + T

+ ,

1+

!
,


:= / . , min = min /
...

T = T

1+

1 + min


,

.
.
4.
(.

= 0), T = T . ,

RAM,
...
.
,
.
. ; , RAM.
.




(..
, )



(.. ) : IBM RS/6000
DEC Alpha 21064 ...
.


9 .
, .
2.2.3. ( , )
. min .
( bandwidth) ,
, .
Bmax Mbytes/sec. ,
8 bytes (. IEEE),
Mflop/sec

max :=

Bmax
.
8min

, max

Mflop/sec.
, ,
.
.. .
min
,
. ,

( ) .
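
A rough numerical reading of the bandwidth bound in this example: if the memory system delivers at most Bmax Mbytes/sec and each operand occupies 8 bytes, the attainable rate is at most Bmax divided by 8 times the ratio of memory transfers to flops. The Python sketch below evaluates that expression; the function name `bandwidth_bound_mflops` and the bandwidth figure of 800 Mbytes/sec are assumptions chosen for illustration, not measurements of any machine.

```python
def bandwidth_bound_mflops(b_max_mbytes_per_s, transfers, flops, bytes_per_word=8):
    """Upper bound on Mflop/s when every counted word must cross the memory interface."""
    ratio = transfers / flops                  # words moved per flop
    return b_max_mbytes_per_s / (bytes_per_word * ratio)

# Horner's rule of degree 1000: 1003 transfers, 2000 flops (assumed bandwidth: 800 Mbytes/sec)
print(bandwidth_bound_mflops(800.0, 1003, 2000))
```

With these assumed figures the bound is just under 200 Mflop/s, whatever the peak arithmetic speed of the processor.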

2.2.4. , ( = prefetching). ,

,
.
:
: . runtime system.
(=explicit) :
.
,
T ,
T ,
. :
1. P1 , P2 P2 P1 .
9
/ (timers, monitors), ..
.


2. P1 P2 P2 P1 .
3. -

. [1]. Todd
Mowry Stanford ( Carnegie Mellon)
Tolerating Latency Through Software-Controlled Data Prefetching 10 , :
This dissertation proposes and evaluates a new compiler algorithm
for inserting prefetches into code. (...) The algorithm can prefetch both
dense-matrix and sparse-matrix codes, thus covering a large fraction
of scientific applications (...) The results of our detailed architectural
simulations demonstrate that the speed of some applications can be
improved by as much as a factor of two, both on uniprocessor and
multiprocessor systems...
MATLAB
. , ,
.. 11 .

2.2.5. MFLOPS, . [13, 6]. ,


MFLOPS .
1970, ( Livermore loops),
Linpack benchmark, Perfect
SPEC.
. [7]
D. Kuck [10].
Linpack benchmark . URL www.netlib.org/benchmark/
[6].
on-line . SPEC ( = System
Performance Evaluation Cooperative) URL www.spec.org.

2.3
2.3.1. () $C \leftarrow C + AB$ $C \in \mathbb{R}^{n_1 \times n_2}$, $A \in \mathbb{R}^{n_1 \times n_3}$ $B \in \mathbb{R}^{n_3 \times n_2}$. , min =
10

www.cs.cmu.edu/~tcm/thesis/thesis_tech.html
MATLAB, - (predefinition)
.
11


min / ... min


.
1. min n1 = n2 = 1 n3 = n.
2. min n2 = n3 = 1 n1 = 1.
3. min n1 = n2 = n3 = n.
4. min

min .

.
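1. $\mu_{\min} = \Phi_{\min}/\Omega = \dfrac{2n+2}{2n}$

2. $\mu_{\min} = \Phi_{\min}/\Omega = \dfrac{4}{2} = 2$

3. $\mu_{\min} = \Phi_{\min}/\Omega = \dfrac{4n^2}{2n^3} = \dfrac{2}{n}$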

4. 2.2.3.
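
The three ratios in this solution follow from one formula, which the Python sketch below evaluates exactly. It assumes that the minimal traffic loads A, B and C once and stores C once, and that the update costs 2·n1·n2·n3 flops; the function name `mu_min_gemm` is hypothetical.

```python
from fractions import Fraction

def mu_min_gemm(n1, n2, n3):
    """Minimal memory transfers per flop for C <- C + A*B,
    with C n1-by-n2, A n1-by-n3 and B n3-by-n2."""
    transfers = 2 * n1 * n2 + n1 * n3 + n3 * n2   # load A, B, C once; store C once
    flops = 2 * n1 * n2 * n3
    return Fraction(transfers, flops)

n = 5
print(mu_min_gemm(1, 1, n))   # inner-product update: (2n+2)/(2n)
print(mu_min_gemm(1, 1, 1))   # scalar update: 4/2 = 2
print(mu_min_gemm(n, n, n))   # square matrices: 2/n
```

Only the third case has a ratio that decreases with n, which is why matrix-matrix kernels can tolerate slow memory much better than vector kernels.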
2.3.2 (, , 03-makeup). A Rnn , x Rn ,
R , I y = (A I)x.
, min , ... ( )
O(n)
( LOAD) ( STORE).

. :

y = (A I)x = Ax x
min = n2 +
2n + 1. O(n) ,
A.
:
LOAD , x
for i = 1 : n
LOAD A(i, :)

yi = A(i, :)x xi
end
STORE y
2n2 + n,
2
min = n +2n+1
.
2n2
2.3.3. for i = 1:n, y(i) = a*x(i)+y(i),
end. b = 3.

. rem r = rem(n, m) m, n
, . n = pm + r , 0 r m 1,
:


r = rem(n,b);
for i = 1:r, y(i) = a*x(i)+y(i); end;
for i = r+1:3:n
y(i) = a*x(i)+y(i);
y(i+1) = a*x(i+1)+y(i+1);
y(i+2) = a*x(i+2)+y(i+2);
end
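
The same unrolling by 3 can be written in Python to check that the clean-up loop plus the unrolled loop touch every index exactly once. This is a sketch with a hypothetical function name, `axpy_unrolled3`; it uses 0-based indexing, so the clean-up loop covers indices 0 to r-1.

```python
def axpy_unrolled3(a, x, y):
    """Return a*x + y, computed with a loop unrolled to depth 3."""
    n = len(x)
    y = list(y)                    # work on a copy
    r = n % 3
    for i in range(r):             # clean-up loop: the first r elements
        y[i] = a * x[i] + y[i]
    for i in range(r, n, 3):       # main loop: three updates per trip
        y[i] = a * x[i] + y[i]
        y[i + 1] = a * x[i + 1] + y[i + 1]
        y[i + 2] = a * x[i + 2] + y[i + 2]
    return y

print(axpy_unrolled3(2.0, [1.0, 2.0, 3.0, 4.0, 5.0], [1.0] * 5))
```

Because n - r is a multiple of 3, the main loop never reads past the end of the vectors.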

2.4
2.4.1. $p_n(x) = \prod_{j=1}^{n} (x - \rho_j)$. , MATLAB ,
$p_n(x)$.

. :

a(2) = 1; a(1) = -r(1); a(3:n+1) = 0


for j = 2:n
t(2:j+1) = a(1:j), t(1) = 0
a(1:j+1) = t(1:j+1)-r(j)*a(1:j+1)
end
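
The recurrence in this solution multiplies the current polynomial by one linear factor at a time: shift the coefficients up by one position (multiplication by x) and subtract the root times the old coefficients. A Python sketch of the same idea follows; the function name `poly_from_roots` is hypothetical and the coefficients are returned in ascending order of power.

```python
def poly_from_roots(r):
    """Coefficients a[0..n] (ascending powers) of the product of (x - r[j])."""
    a = [1.0]                                   # the constant polynomial 1
    for root in r:
        shifted = [0.0] + a                     # x * p(x)
        scaled = [root * c for c in a] + [0.0]  # root * p(x), padded to equal length
        a = [s - t for s, t in zip(shifted, scaled)]
    return a

print(poly_from_roots([1.0, 2.0]))   # (x - 1)(x - 2) = x^2 - 3x + 2
```

Each factor costs a number of operations proportional to the current degree, so n roots need on the order of n^2 operations in total.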

2.5
2.5.1. MATLAB , k ,

$$A_j = \mathrm{rand}(2^{k-j+1}, 2^{k-j}), \qquad j = 1, \ldots, k$$

$A_j$ Aj
eval, num2str, rand.
$n = 2^k$. ,
k = 1 : 10. $B_k = A_1 A_2 A_3 \cdots A_k$.
1. a $B_k$
n :

$$B_k = (\cdots((A_1 A_2)A_3)\cdots A_{k-1})A_k.$$

2. $B_k$
n :

$$B_k = A_1(A_2(A_3 \cdots (A_{k-1}A_k))\cdots).$$


3. /a n.


4. MATLAB B = A1 A2
A3 . . . Ak . .
MATLAB ;
5. , (3)

( ).

. k (k = 10 )
:

for j=1:k,
  eval(['A' num2str(j) '=rand(2^(k-j+1), 2^(k-j));']);
end
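1. The product of $C \in \mathbb{R}^{m \times n}$ and $D \in \mathbb{R}^{n \times k}$ costs $m(2n-1)k = 2mnk - mk$ flops. From left to right:

Step 1: $(A_1 A_2)$: $2n \cdot \frac{n}{2} \cdot \frac{n}{4} - n \cdot \frac{n}{4} = \frac{n^3}{4} - \frac{n^2}{4}$.

Step 2: $(A_1 A_2 A_3)$: $2n \cdot \frac{n}{4} \cdot \frac{n}{8} - n \cdot \frac{n}{8} = \frac{n^3}{16} - \frac{n^2}{8}$.

Step 3: $(A_1 A_2 \cdots A_4)$: $2n \cdot \frac{n}{8} \cdot \frac{n}{16} - n \cdot \frac{n}{16} = \frac{n^3}{64} - \frac{n^2}{16}$.

...

Step $k-1$: $(A_1 A_2 \cdots A_k)$: $2n \cdot \frac{n}{2^{k-1}} \cdot \frac{n}{2^{k}} - n \cdot \frac{n}{2^{k}} = \frac{n^3}{2^{2(k-1)}} - \frac{n^2}{2^{(k-1)+1}}$.

Step $j$ costs $\frac{n^3}{2^{2j}} - \frac{n^2}{2^{j+1}}$ flops, so that:

$$\Omega_a = \sum_{j=1}^{k-1} \left( \frac{n^3}{2^{2j}} - \frac{n^2}{2^{j+1}} \right) = \frac{n^3 - 4n}{3} - \frac{n^2 - 2n}{2} = \frac{n^3}{3} - \frac{n^2}{2} - \frac{n}{3}.$$

2. From right to left:

Step 1: $(A_{k-1} A_k)$: $\frac{n^2}{2^{2(k-2)}} - \frac{n}{2^{k-2}}$ $(= 12)$.

...

Step $k-3$: $(A_3 \cdots A_{k-1} A_k)$: $2 \cdot \frac{n}{4} \cdot \frac{n}{8} - \frac{n}{4} = \frac{n^2}{16} - \frac{n}{4}$.

Step $k-2$: $(A_2 \cdots A_{k-1} A_k)$: $2 \cdot \frac{n}{2} \cdot \frac{n}{4} - \frac{n}{2} = \frac{n^2}{4} - \frac{n}{2}$.

Step $k-1$: $(A_1 \cdots A_{k-1} A_k)$: $2n \cdot \frac{n}{2} - n = n^2 - n$.

Step $j$ costs $\frac{n^2}{2^{2(k-j-1)}} - \frac{n}{2^{k-j-1}}$ flops, so that:

$$\Omega_\delta = \sum_{j=1}^{k-1} \left( \frac{n^2}{2^{2(k-j-1)}} - \frac{n}{2^{k-j-1}} \right) = \frac{4n^2 - 16}{3} - (2n - 4) = \frac{4n^2}{3} - 2n - \frac{4}{3}.$$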
3. a /a
:


[Figure 2.2: the ratio $\Omega_\delta/\Omega_a$ as a function of $n$ (see 2.5.1).]

Omega_a=sym('(n^3/3)-(n^2/2)-(n/3)');
Omega_delta=sym('(4*n^2/3)-(2*n)-(4/3)');
k=(2:8);   % k = 1 gives a single factor, hence no multiplications
flops_a=zeros(length(k), 1);
flops_delta=zeros(length(k), 1);
for i=1:length(k),
  n=2^k(i);
  flops_a(i)=eval(Omega_a);
  flops_delta(i)=eval(Omega_delta);
end
plot(2.^k, flops_delta./flops_a, '-');
xlabel('n');
ylabel('\Omega_\delta/\Omega_a');
2.2 .
4. ,
. .
MATLAB
. 2.3.
MATLAB ,
!


k_max=10;
n=2.^(2:k_max);
t_left=zeros(k_max-1, 1);
t_right=zeros(k_max-1, 1);
t_mat=zeros(k_max-1, 1);
for k=2:k_max,
  an=sprintf('Matrix multiplication for n=%d', n(k-1));disp(an);
  % generate A1, ..., Ak with Aj of size 2^(k-j+1) by 2^(k-j)
  for j=1:k,
    eval(['A' num2str(j) '=rand(2^(k-j+1), 2^(k-j));']);
  end
  % left to right
  for m=1:100,
    tic;
    left_ex=A1;
    for j=2:k,
      left_ex=left_ex*eval(strcat('A', num2str(j)));
    end
    t_left(k-1)=t_left(k-1)+toc;
  end
  t_left(k-1)=t_left(k-1)/100;
  % right to left
  for m=1:100,
    tic;
    right_ex=eval(strcat('A', num2str(k)));
    for j=k-1:-1:1,
      right_ex=eval(strcat('A', num2str(j)))*right_ex;
    end
    t_right(k-1)=t_right(k-1)+toc;
  end
  t_right(k-1)=t_right(k-1)/100;
  % the expression A1*A2*...*Ak as MATLAB itself evaluates it
  for m=1:100,
    tic;
    mat_ex='A1';
    for j=2:k,
      mat_ex=strcat(mat_ex, '*', strcat('A', num2str(j)));
    end
    mat_ex=strcat(mat_ex, ';');
    eval(mat_ex);
    t_mat(k-1)=t_mat(k-1)+toc;
  end
  t_mat(k-1)=t_mat(k-1)/100;
end
5. 2.4 ,
.


[Figure 2.3: execution time in seconds as a function of $n$ for the three evaluation orders: left to right, right to left, and MATLAB's own (see 2.5.1).]

[Figure 2.4: the $\delta/a$ ratio (right to left over left to right) as a function of $n$ (see 2.5.1).]

[1] R.C. Agarwal, F.G. Gustavson, and M. Zubair. Improving performance of linear algebra algorithms for dense matrices, using algorithmic prefetch. IBM J. Res. Develop., 38(3):265-275, 1994.
[2] A. Aggarwal, B. Alpern, A. K. Chandra, and M. Snir. A model of hierarchical memory. In Proc. Nineteenth Annual ACM Symposium on Theory of Computing, pages 305-314, May 1987.
[3] A. Aho, J. E. Hopcroft, and J. D. Ullman. The Design and Analysis of Computer Algorithms. Addison-Wesley, 1974.
[4] R. Aris. Mathematical Modelling Techniques. Dover, Mineola, NY, 1994 (originally published in 1974).
[5] D. Deutsch. The Fabric of Reality. Penguin, 1997.
[6] R. Giladi. Evaluating the MFLOPS measure. IEEE Micro, pages 69-75, Aug. 1996.
[7] J.L. Hennessy and D.A. Patterson. Computer Architecture: A Quantitative Approach. Morgan Kaufmann, San Mateo, CA, first edition, 1990.
[8] R.W. Hockney. The Science of Computer Benchmarking. SIAM, Philadelphia, 1996.
[9] P. Krugman. . , 2000.
[10] D.J. Kuck. The Structure of Computers and Computations. Wiley, 1978.
[11] C. Lévi-Strauss. La pensée sauvage. Plon, Paris, 1962.
[12] H. A. Simon. The Sciences of the Artificial. MIT Press, Cambridge, Mass., second edition, 1981.
[13] J.E. Smith. Characterizing computer performance with a single number. Comm. ACM, pages 1202-1206, Oct. 1988.
[14] Gertrude Stein. How Writing is Written. Black Sparrow Press, Los Angeles, 1974.
[15] . . . , , 1974.
[16] P. van Emde Boas. Machine models and simulations. In J. van Leeuwen, editor, Handbook of Theoretical Computer Science. Volume A: Algorithms and Complexity, chapter 1, pages 1-66. The MIT Press, Cambridge, MA, 1990.
[17] J. Ziman. An Introduction to Science Studies. Cambridge University Press, Cambridge, 1984.



(...)
, (...) (...)
. Edgar Morin [29].

1.1, - - , ! ,
2.1,
, :

.
:
.. ,
, .
.
,
(..
), . , , .
, ,
1)
, 2)
.
,

, . .
1 . Nick Higham2 ([17])

.
,
.

3.1
x x
.

$$E_{abs}(\hat{x}) = |x - \hat{x}|, \qquad E_{rel}(\hat{x}) = \frac{|x - \hat{x}|}{|x|}.$$

|x| ,
.

. , (
) ; ,

, , .
, x x
(
). .
,
(. ).
,
, , ( ).

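When $x$ and $\hat{x}$ are vectors, the errors are measured in a norm, i.e. $E_{abs}(\hat{x}) = \|x - \hat{x}\|$ and $E_{rel}(\hat{x}) = \|x - \hat{x}\| / \|x\|$.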
,
, , ,
.
1
2

.
Manchester.

3.1.1. p1 = [1, 0, , 0] R1000
p2 = [103 , , 103 ] R1000
x x1 x2 . 1 = kp1 k1 =
kp2 k1 x1
2 : 1000 x.

, ,
. ,
Cn x Cn ,

y = |x| yi = |xi |, i = 1 : n.
. x, y
Rn

x y xi yi , i = 1 : n.
,
. Rmn Cmn . ,
x x
|x x
|
[|x1 x
1 |/|x1 |, . . . , |xn x
n |/|xn |.
() ( = relative
componentwise error)

$$\max_i \frac{|x_i - \hat{x}_i|}{|x_i|}.$$
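The difference between the normwise and the componentwise relative error can be seen on a small example. The sketch below uses made-up data in which only the small component of the vector is badly approximated; the names `E_norm` and `E_comp` are illustrative.

```matlab
x    = [1; 1e-6];            % exact vector (illustrative data)
xhat = [1; 2e-6];            % approximation with a poor small component
E_norm = norm(x - xhat, inf) / norm(x, inf);   % normwise relative error
E_comp = max(abs(x - xhat) ./ abs(x));         % componentwise relative error
fprintf('normwise: %g, componentwise: %g\n', E_norm, E_comp);
```

The normwise error is about 1e-6 and hides the fact that the second component has no correct digits, which the componentwise error (about 1) exposes.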


(normwise analysis)
(normwise analysis),
.

3.2
Floating point arithmetic is by nature inexact, and it is not difficult to misuse it so that the computed answers consist almost
entirely of noise. One of the principal components of numerical
analysis is to determine how accurate the results of certain numerical methods will be. Donald Knuth [22]
I would be afraid to fly in an airplane that was designed with floating point arithmetic. (1960) Alston Householder

bytes

(...). ...
/ 3
3
. L.N. Trefethen Predictions for Scientific Computing
50 years from now

. ,

/.
, ,
...,
.
... F R

$$y = \pm m \times \beta^{e-t}, \qquad (3.1)$$

t . ,
F . F ...
F . = 2 (.. = 16 IBM/360)
= 10. t m . , m
( y ), m/ t < 1
, y .
mantissa 4
( 2) y . Knuth mantissa : ... but it is
an abuse of terminology to call the fraction part a mantissa, since the concept
has quite a different meaning in connection with logarithms. Furthermore the
English word mantissa means a worthless addition. [22, page 199].
:
... (3.1) bytes, ..
4 8 /.
bit , ()
m,
e.
F
.
F M =
mmax emax t = mmin emin t . ,
, F . F
,
. 32 64 bits
[, M ].
4
.
mantis(s)a ,
, .
. ,
, mantissa ( ).
mantissa .
.

(3.1) ...
e
t. , t F .
, F , Wilkinson:

,
t m ,
(emin , emax ) e, . emin e emax ,
,
F(, t, emin , emax ).
:
.
.
.
.

3.2.1 ,
F F .
R F fl

fl : R F
fl(x) F F x,
() . fl
( continuum) .
G F

G := {x R : |x| M } {0} R ,
M ...
.
fl(x).
:

x F fl(x) = x, . x.
x G x 6 F , fl(x) 6= x. x fl(x) F x. fl(x) 6= x,
. ,

.


[Figure 3.1: the real numbers $x_1$, $x_3$, $x_4$, $x_2$ on the line $R$, relative to the elements of $F$.]

x 6 G , |x| > M |x| < x 6= 0


. fl(x)
/ F .
, F
5
3.1 .
3.2.1. = 10 t = 2
. x = 1.9, y = 0.66 z = x y = 1.254. fl(z) = 1.2 fl(z) = 1.3.
0.0431 = |1.254 1.2|/1.254
0.0367 = |1.254 1.3|/1.254.
u = 12 /2 = 0.05
. (. 3.4.4),
( ) /.

Cause when you are up you are up,


and when you are down you are down,
but when you are only half way up
you are neither up or down!
[The Grand Old Duke of York, Mother Goose Nursery
Rhymes.]
, . x G x 6 F .
, .
fl(x) ... y F x:

y = fl(x) y = arg min


|y x|.

y F

(y , y+ ), y , y+ F
x. fl(x),
5

R .

x
(y , y+ ). ,
, . y y+
.
3.2.2. ... F(10, 4, 9, 9) x1 =
0.10005, x2 = 0.10015. fl(x1 ) = 0.1000 fl(x2 ) = 0.1002.

3.2.3.
MATLAB SciLab, eps
1+ 1. , 2
.

expression                   displayed   hexadecimal
1 + eps/2                    1           3ff0000000000000
1 + eps                      1.0000      3ff0000000000001
1 + eps + eps/2              1.0000      3ff0000000000002
1 + 2 eps                    1.0000      3ff0000000000002
1 + 2 eps + eps/2 + eps/4    1.0000      3ff0000000000002
0.
. , , (y , y+ )
x y , y+ F , fl(x) = sign(x) max(|y , y+ |).

3.2.2 bit
...
,
.. y = 0.d1 0000 e y = 0.0d1 0000 e+1 .

. ,
m ( ) , . y R F d1 6= 0.
= 2, d1 6= 0 d1 = 1.
, 2 ( ),
bit m 1. , . , ,
(
) bit.
3.2.1.
bit (
). , ..,
bit
.

bit
/ , bit .

y F

y = e .d1 d2 dt

0 di 1, d1 6= 0.

3.2.3 ...
... F , y 6= 0 F .
:
y F y 6= 0

emin 1 |y| emax (1 t )


z G z , z+

, z+

z , . z [z , z+ ] ... z

z < z z z+ < z+ .
t

z }| {
z = . x x e
( )
z z = m et z+ = (m + 1) et .
z+ z = et . ,
z
z , ,
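Rounding to the nearer of the two neighbours therefore gives

$$|z - fl(z)| \le \frac{\beta^{e-t}}{2}.$$

Moreover $|z| \ge \beta^{t-1} \beta^{e-t} = \beta^{e-1}$, so for every $z \in G$

$$\left| \frac{z - fl(z)}{z} \right| \le \frac{\beta^{e-t}}{2} \Big/ \beta^{e-1} = \frac{\beta^{1-t}}{2} = u.$$

The quantity $u$ is the unit roundoff. Consequently, for every $z \in G$

$$fl(z) = z(1 + \delta), \quad |\delta| < u. \qquad (3.2)$$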
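Relation (3.2) can be checked numerically by letting IEEE single precision play the role of $F$ (so $\beta = 2$, $t = 24$, $u = 2^{-24}$) and treating double precision values as the "exact" numbers $z$. This is only an illustrative sketch; the sample values and variable names are assumptions made here.

```matlab
u_single = 2^(-24);                 % unit roundoff of IEEE single precision
z = [0.1, 1/3, pi, 1e10, 123.456];  % sample values (illustrative)
fl_z = double(single(z));           % round to single, then compare in double
delta = (fl_z - z) ./ z;            % fl(z) = z (1 + delta)
disp(abs(delta))
disp(all(abs(delta) <= u_single))   % the bound of (3.2) holds for every sample
```

Values that are exactly representable in single precision, such as 1e10, give $\delta = 0$.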

... ( )
...
(wobbling)

|m et (m + 1) et |
|m e+1t (m + 1) e+1t |

= et
= e+1t


e e+1 .

3.2.4


. =0
1+x
x6 ! ,
... ()
= 0.
... x .. 0 x < 1+x
,

, y = 1+5.5511
1017 , y ... 1.
3.2.4. MATLAB 6.5, , :

>> p1 = 5.5511e-017
>> 1+p1
ans = 1
>> (1+p1==1)
ans = 1

... - ... -
..., x, ..., x+ , . ...
x < x+
... . , ... x
, (x, x+ ) = x+ x, ... x+ .
.
, y1 y2 ,
, ,
[y1 , y2 ] .
... :
3.2.1. , M ,
1 ..., .

M := (1, 1+ ).

(3.3)

6
, !


,
:
M ...
1:
1+

M := arg min{f l(1 + ) > 1}


>0

(3.4)

,
(3.3).
, M ... ... t 1
, 1
1 + 21t , 3.2.1 M = 21t .
3.2.2. u = 2t .
, M = 2u.

3.2.3. ,
(.. MATLAB) ( : , ,
):

t=1.0
while (1.0 +t > 1.0)
t=t/2.0;
end
t=t*2.0;

... .

,
.
.

3.2.4. Fortran 90 M
, EPSILON.

3.2.5. M
. MATLAB eps. ,
Toshiba 320CDT Pentium II
:

>> eps
eps = 2.2204e-16
>> 1+eps >1
ans = 1
>> 1+eps/2 >1
ans = 0

3.2.1. 3.2.3 MATLAB eps M . ,


MATLAB.
Pentium MATLAB

> test = (eps/2*(1+eps) +1 >1)


test = 1
e1 = eps/2*(1+eps) 1+ e1 >1
e1 < eps. eps
MATLAB 1+eps > 1.
M ;

3.2.6. M
. MATLAB:

EPS Floating point relative accuracy.


EPS is a permanent variable whose value is initially the distance
from 1.0 to the next largest floating point number. EPS may be
reassigned any value. EPS is used as a default tolerance by PINV
and RANK.

3.2.7.
EPS MATLAB . , ...
. , 7 ,

>> floor(0.75/0.25)
ans = 3
>> floor(0.075/0.025)
ans = 2
, ...
(.. IEEE), .

>> 0.75/0.25
ans = 3
>> 0.075/0.025
ans = 3.0000
v = 3-0.075/0.025
v = 4.4409e-16
v 2 eps. v,
. , 16
.

>> format hex


>> 0.25
ans = 3fd0000000000000
>> 0.75/0.25
ans = 4008000000000000
7

. . .


>> 0.075
ans = 3fb3333333333333
>> 0.025
ans = 3f9999999999999a
>> 0.075/0.025
ans = 4007ffffffffffff
0.075

3, floor 2.
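One common way to guard against this effect is to allow a small tolerance before truncating, or to round when an integer result is expected. The sketch below is an illustration only: the tolerance of four spacings is an arbitrary choice made here, and `eps(q)` (the spacing of floating point numbers at `q`) is assumed to be available, as it is in MATLAB 7 and later.

```matlab
q = 0.075/0.025;               % mathematically 3, computed slightly below 3
disp(floor(q))                 % 2, as in the session above
tol = 4*eps(q);                % illustrative tolerance: a few spacings at q
disp(floor(q + tol))           % 3
disp(round(q))                 % 3
```

Whether a tolerance is appropriate depends on the application; it trades a wrong answer just below an integer for a possibly wrong answer just above one.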

3.3

... ( )
/ .
.
F
m et ,
{+, , , /}, x, y F
x y 6 F .
R .
3.3.1. ... t
2t
.

3.3.2. M F ,
,
F .


.
... /.

8
; ,
, ,
... . , (ALU)
. , Pentium bug Intel
, .
, (..
). , ,
8
... ! , .

. ,

.

,

x, y F
= fl(x y) F
xy

(3.5)

. R (. x y )
. , (=exact rounding).
,
.
,
z = xy
z
. ,
. ,
, ,
(guard digit).
3.3.3. ... = 2, t = 3 u =
213 /2 = 1/8.

=
=

21 0.100
20 0.111
21 0.100
21 0.011|1
21 0.0001
22 0.100

21 0.100
20 0.111
21 0.100
21 0.011
21 0.001
21 0.100

(. (3.5) ),

22 0.1 21 0.1
| = 1 = 8u
22 0.100

(3.5).

bits ,
(3.5) bits
( , (guard, round digits)
sticky bit) . .4 [16], [12]. [23]


IEEE.
3.3.1.
( Cray C-90)
.

3.3.2.
:
fl(x y) = (1 + )x (1 + )y, ||, || u

,
R +,
. , x, y, z R ,
.
0 x + y R .
1 : x + y = y + x.
2 x + (y + z) = (x + y) + z .
3 0 x + 0 = x x R .
4 x R x R
, . x + (x) = 0.
0 x y R .
1 : x y = y x.
2 : x (y z) = (x y) z .
3 1 x 1 = x x R .
4 x x1 R x ( x1 ) = 1.
: x (y + z) = x y + x z.
R
F ... (3.5).
. 1, 3,
4 1, 3. 0 0
F .
3.3.4. 2:

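$t_1 = fl(x + y)$, $s_1 = fl(y + z)$, $t_2 = fl(t_1 + z)$, $s_2 = fl(x + s_1)$. Then $t_1 = (x + y)(1 + \delta_1)$ and $t_2 = (t_1 + z)(1 + \delta_2)$, so

$$t_2 = ((x + y)(1 + \delta_1) + z)(1 + \delta_2), \quad |\delta_j| \le u,$$

$$s_2 = (x + (y + z)(1 + \epsilon_2))(1 + \epsilon_1), \quad |\epsilon_j| \le u.$$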

, t2 6= s2
... .
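A concrete instance in IEEE double precision, with values chosen here purely for illustration:

```matlab
x = 0.1; y = 0.2; z = 0.3;
t2 = (x + y) + z;              % left association
s2 = x + (y + z);              % right association
fprintf('%.17g\n%.17g\n', t2, s2);
disp(t2 == s2)                 % 0: the two orders give different results
disp(abs(t2 - s2) / abs(s2))   % the difference is of the order of eps
```

The two results differ in the last bit only, which is consistent with the expressions for $t_2$ and $s_2$ above.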

3.3.5. 4: , x F z =
fl( x1 ) = x1 (1 + 1 ) x z = x x1 (1 + 1 )(1 + 2 )
. MATLAB 1 y 200
( y1 ) y 6= 1:

index = [];
for i=1:200
  if ((1/i)*i ~= 1)
    index = [index i];
  end;
end;
NEC Versa SX Pentium 2
x = [49, 98, 103, 107, 161] (1/x)
x 6= 1.

F ,
R .
, .. , .
,
... x1 , x2 x3 , x4
x1 + x2 + x3 + x4
2 )+x
3 )+x
4 (x1 +x
2 )+(x
3 +x
4 ). ,
((x1 +x
. , , ,
, .
3.4.4 3.2.3.
... ,
. )
... )
.
(), x, y F , . x y G,

$$\frac{|fl(x \circ y) - (x \circ y)|}{|x \circ y|} \le u, \quad x \circ y \ne 0, \quad \circ \in \{+, -, \times, /\}. \qquad (3.6)$$

(3.5),

fl(x y) xy

.
:

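3.3.1. If $x, y \in F$ and $x \circ y \in G$, then

$$fl(x \circ y) = (x \circ y)(1 + \delta), \quad |\delta| \le u,$$

and also

$$fl(x \circ y) = \frac{x \circ y}{1 + \delta}, \quad |\delta| \le u.$$

If $x \circ y \in F$ then $\delta = 0$. In general, however, $F$ is not closed under $x \circ y$ for $x, y \in F$.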

3.3.3.

fl(x y) = (x y)(1 + ),
xy
(), ...
( R ) x
= x(1 + ), y = y(1 + ). ,
, . = = /,
...
( R ) x
= x, y = y(1 + ),
x
= x(1 + ), y = y , x y
(x y)(1 + ).
3.5.

3.3.2.
.

fl(ij ) = ij (1 + ij ), |ij | u,
|fl(A) A| |A|u
fl(A) = A + E, |E| u|A|,
fl(A + B) = (A + B) + E, |E| u|A + B|,


3.1.
, (3.6)
, .
(3.6)

/.
3.3.6. ... x, y F
x2 + y 2 G

0 < x F 9 fl( x) = x(1 + )


|| u.
z = 2x 2 . .
x +y


:
9

, fl( x) ...

... x.


...

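$t_1 = fl(x \times x) = x^2 (1 + \delta_1)$
$t_2 = fl(y \times y) = y^2 (1 + \delta_2)$
$t_3 = fl(t_1 + t_2) = (t_1 + t_2)(1 + \delta_3)$
$t_3 = fl(\sqrt{t_3}) = \sqrt{t_3}\,(1 + \delta_4)$
$t_1 = fl(x / t_3) = \frac{x}{t_3}(1 + \delta_5)$

$$t_1 = \frac{x}{t_3}(1 + \delta_5) = \frac{x}{\sqrt{t_3}\,(1 + \delta_4)}(1 + \delta_5) = \frac{x}{\sqrt{(t_1 + t_2)(1 + \delta_3)}\,(1 + \delta_4)}(1 + \delta_5) = \frac{x}{\sqrt{(x^2(1 + \delta_1) + y^2(1 + \delta_2))(1 + \delta_3)}\,(1 + \delta_4)}(1 + \delta_5),$$

with $|\delta_j| \le u$.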
,
z t1 . 3.5

.

(1+x)1
3.3.7. y =
.

fl(y)
=
((1
+
x)(1
+
x
1 ) 1)(1 + 2 )(1 + 3 )/x

fl(y)

(1 + 1 )(1 + 2 )(1 + 3 ) +

1 (1 + 2 )(1 + 3 )
x

, x > M ,

fl(y)

1 + (1 + 2 + 3 ) + 1 2 + 2 3 +
1 (1 + 2 )(1 + 3 )
+1 3 + 1 2 3 +
x
1 + O(u)

fl(y)y
|
y

O(u). 0 < x < M fl(y) = 0

fl(y) y

= 1.

3.4
The simplest and best, though harder to attain, solution to the problem of environmental parameters is to standardize floating-point
hardware, so that the values of the parameters become universal
constants... - Webb Miller [26]

... 1960
, . 1980, ...
. ..
. ,
Berkeley, Velvel Kahan, 1985
IEEE floating-point standard 754.

, ,
() .. 0,
, .


format ...,
, , formats.
,
.
formats.
formats F(2, 24, 125, 128)
, F(2, 53, 1021, 1024)
. Wilkinson
3.1. = 2, bit
. ,

m ... t bits 1 m < 1


.
m = d0 + d1 1 + + dt1 (t1) .
d0 6= 0 bit d0 .
3.4.1. :
.
format   sign   exponent                 fraction               hexadecimal        value
single   1      10000001 (8 bits)        0010...0 (23 bits)     c0900000           -4.5
double   1      10000000001 (11 bits)    0010...0 (52 bits)     c012000000000000   -4.5
bit .
, ,

$e = 2 = 1025 - 1023$ (stored exponent minus bias), $m = 1.125 = 2^{-3} + 1$ (the leading 1 is the hidden bit).
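The two bit patterns can be checked directly. The sketch below assumes the function `num2hex`, which returns the IEEE hexadecimal representation of single and double values, and recomputes the double precision fields by hand; the variable names are illustrative.

```matlab
h_double = num2hex(-4.5);            % 16 hex digits for a double
h_single = num2hex(single(-4.5));    % 8 hex digits for a single
disp(h_double)
disp(h_single)
% decode the double by hand: sign, biased exponent, fraction
e_stored = hex2dec(h_double(1:3)) - 2048;   % drop the sign bit (value 2^11)
m = 1 + hex2dec(h_double(4:16)) / 2^52;     % restore the hidden bit
disp(-m * 2^(e_stored - 1023))              % reproduces -4.5
```

The first three hexadecimal digits hold the sign bit and the 11 exponent bits, which is why subtracting 2048 removes the sign.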


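Table 3.1: IEEE-754 formats.

                   single                                double
size               32 bits                               64 bits
sign               1 bit                                 1 bit
mantissa           23+1 bits                             52+1 bits
exponent           8 bits (bias 127)                     11 bits (bias 1023)
unit roundoff u    $2^{-24} \approx 5.96 \times 10^{-8}$    $2^{-53} \approx 1.11 \times 10^{-16}$
range              $10^{\pm 38}$                         $10^{\pm 308}$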

, .. Intel:
*86, Pentium. DEC: Alpha, IBM: RS/6000, Motorola: 680*0, Sun: SPARC,
PowerPC, MIPS R10000, .

3.4.1
The rational number system is inadequate for many purposes, both as a field and as an ordered set... This leads to the ... irrational
numbers which are often written as decimal expansions and are
considered to be approximated by the corresponding finite decimals... Walter Rudin [33]
Formats: single, double, single extended, double extended.
: ) , )
, ) 0 ().
: bits . 10 .
x/0, 0/0

y y < 0. 0, ,
NaN, Not a Number
.

, (=subnormal) , . .
: inexact, invalid op., overflow, underflow, division
by 0. 3.2
IEEE.
3.4.1. MATLAB isieee
1 IEEE.
IEEE :
10
.
.


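Table 3.2: IEEE exceptions and default results [17].

exception        example                          default result
invalid op.      0/0, 0 × Inf, sqrt(-1)           NaN
overflow                                          ±Inf
divide by 0      finite number/0                  ±Inf
underflow                                         subnormal numbers
inexact          $fl(x \circ y) \ne x \circ y$    correctly rounded result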

>> 1/(1/0)
Warning: Divide by zero
ans = 0
>> 1/1/0
Warning: Divide by zero
ans = Inf
>> 0/0
Warning: Divide by zero
ans = NaN
>> 1/0
Warning: Divide by zero
ans = Inf
>> max(ans,4)
ans = Inf
>> min(ans,3)
ans = 3

3.4.2.

/ . Fortran machar.f ( Cody) [6], paranoia.f
( Kahan). ,

... IEEE . 3.4
.

...

3.4.2
IEEE ,
.
, .. emin t .

, .. 0 < a b < m.
3.4.2. Matlab 5.1.0 Windows Intel Pentium.

MATLAB realmin.

>> realmin
ans = 2.225073858507202e-308
>> format hex
>> realmin
ans = 0010000000000000

:

>> format hex
>> realmin/2^52
ans = 0000000000000001
>> ans/2
ans = 0000000000000000
>> format long e
>> realmin/2^52
ans = 4.940656458412465e-324

. , ,
m
. , , .
(=gradual underflow).
, .
3.4.3 (Kahan).

if (x > y),
...
... log(x-y) ...
end
x, y ()
|x y| .
, 0 log(0).
.
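The point of the example can be illustrated with two values near `realmin`, chosen here for illustration. With gradual underflow the difference of two distinct numbers is never rounded to zero, so the guard `x > y` is enough to protect the logarithm.

```matlab
x = 1.5*realmin;  y = realmin;     % two distinct, very small numbers
d = x - y;                         % exact difference realmin/2 is subnormal
disp(x > y)                        % 1
disp(d > 0)                        % 1: gradual underflow keeps x - y nonzero
disp(d == realmin/2)               % 1
disp(log(d))                       % finite, about -709.09
```

On a system that flushes subnormal results to zero, `d` would be 0 and `log(d)` would return `-Inf` even though `x > y` holds.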

3.4.4. [8] A = [1 2; 1 5/2]


... L = [1 0; 1 1], U =


[1 2; 0 1/2]. U (2, 2)
, , U !
1.
. Matlab 5.1.0 Windows Intel Pentium.
realmin .

> a = realmin*[1 2; 1 5/2];


> [l,u] = lu(a);
> u(2,2)
ans = 1.1125e-308
> u/realmin
ans =
1.0000
2.0000
0
0.5000

. Demmel [8].

3.4.3.
..., .
(. [8]).

3.4.3
IEEE (extended
format)). 79 bits (mantissa 63, exponent
15), u 5.42 1020 [104932 , 104932 ].
, Pentium, ... 80 bits
(= double rounding). ,
80 bits 64 32 bits.
3.4.4. . (. [16])
= 10 2 3. 1.9 0.66 = 1.254. round p (x) x p . round 2 (1.254) = 1.3
round 2 (round 3 (1.254)) = 1.2.

, (
). , Kahan
128 bits [17].

3.4.4

Fused Multiply and Add (FMA)

z + x y
x y x + y . -

Cray (
chaining), ( x y ) ( z + (x y))
(. !) , FMA
DOT(x(1 : n), y(1 : n)) n + O(1)
2n + O(1). , (.. IBM RS/6000) z + x y
.

.
, . ,

$$fl(z + x \times y) = (z + x \times y)(1 + \delta), \quad |\delta| \le u.$$

. ( )

$$fl(z + fl(xy)) = (z + xy(1 + \delta_1))(1 + \delta_2)$$


.

. Kahan Kahan
Higham:
3.4.5.

$$x = \det \begin{pmatrix} a & b \\ c & d \end{pmatrix}$$

$x = ad - bc$,
F , :

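$t_1 = a \times d$            $\hat{t}_1 = ad(1 + \delta_1)$
$t_2 = b \times c$            $\hat{t}_2 = bc(1 + \delta_2)$
$x = t_1 - t_2$               $\hat{x} = (\hat{t}_1 - \hat{t}_2)(1 + \delta_3)$

$$|\hat{x} - x| = |(\hat{t}_1 - \hat{t}_2)(1 + \delta_3) - (ad - bc)|$$
$$= |(ad(1 + \delta_1) - bc(1 + \delta_2))(1 + \delta_3) - (ad - bc)|$$
$$= |ad(\delta_1 + \delta_3) - bc(\delta_2 + \delta_3) + ad\,\delta_1\delta_3 - bc\,\delta_2\delta_3|$$
$$\le (|ad| + |bc|)2u + (|ad| + |bc|)u^2,$$

so

$$\frac{|\hat{x} - x|}{|x|} \le \frac{(|ad| + |bc|)2u + (|ad| + |bc|)u^2}{|x|}.$$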
|ad|, |bc| |x|, .. |x| ,
. Kahan
:


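$t_1 = b \times c$                           $\hat{t}_1 = bc(1 + \delta_1)$
$t_2 = t_1 - b \times c$ (with FMA)          $\hat{t}_2 = (\hat{t}_1 - bc)(1 + \delta_2)$
$t_3 = a \times d - t_1$ (with FMA)          $\hat{t}_3 = (ad - \hat{t}_1)(1 + \delta_3)$
$x = t_3 + t_2$                              $\hat{x}(1 + \delta_4) = \hat{t}_3 + \hat{t}_2$

$$|\hat{x} - x| = |(\hat{t}_3 + \hat{t}_2) - \hat{x}\delta_4 - (ad - bc)|$$
$$= |(ad - \hat{t}_1)(1 + \delta_3) + (\hat{t}_1 - bc)(1 + \delta_2) - \hat{x}\delta_4 - (ad - bc)|$$
$$= |(ad - bc(1 + \delta_1))(1 + \delta_3) + (bc(1 + \delta_1) - bc)(1 + \delta_2) - \hat{x}\delta_4 - (ad - bc)|$$
$$= |ad\,\delta_3 - bc(\delta_1 + \delta_3 + \delta_1\delta_3) + bc\,\delta_1(1 + \delta_2) - \hat{x}\delta_4|$$
$$= |x\delta_3 - \hat{x}\delta_4 + bc\,\delta_1(\delta_2 - \delta_3)|$$
$$\le (|x| + |\hat{x}|)u + |bc|\,2u^2.$$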

|x| , |x|, |
x|
,
(
|bc|u > |x|). .

3.4.1. FMA
.
.

3.4.2. FMA
.

3.4.5. ...
standard IEEE.
Goldberg [12] What every computer scientist should know about floating-point
arithmetic. [16, Appendix].
IEEE 754 floating-point standard
[20].
. [2]
... Intel Pentium.

3.4.6. Pentium bug Intel ,


. Alan Edelman MIT [10] [7].

3.4.5 Java
Network Computing.

Java. Java
. (..

interfaces, , ),
. ; Java
:
: , Java
complex types.
.
: Java linguistically enforced exact reproducibility of all floating point results, Java (
W. Kahan cruel delusion11 .)
, Java
. .. Java Linpack12 .

3.5
As every physicist knows, no equation is exact; therefore, we believe that finite precision computation can be closer to physical
reality than exact computation. Thus it appears possible to transform the limitations of the computer arithmetic into an asset. [5]
(...): ,
, .
, .
; (...) ,

. [9]
Its impossible to compute things which dont exist. Its difficult
to compute things which almost dont exist. [Cleve Moler]
,

(. R ), f : U Rm Rn .

:

x U m f
f (x) n x,

x , . x F . x
x, x = fl(x).
11
12


http://www.netlib.org/benchmark/linpackjava


f (x ) f (x ), (.
R .
fprog f ... F .
,

$$\|f_{prog}(x^*) - f(x)\| \quad \text{or} \quad \frac{\|f_{prog}(x^*) - f(x)\|}{\|f(x)\|}. \qquad (3.7)$$

(.. ) f
.
f (x), (3.7)
. ,
(3.7).
accuracy precision.
.
.
Mathematica.
:
,
accuracy
.

(3.7) . precision
.
...
3.5.1. Vel Kahan [21]: Precision concerns the tightness of specification. Accuracy concerns its correctness. An utterly inaccurate
statement ... can be uttered quite precisely... 3.177777777777 is a rather precise
(13 dec. digits) but rather inaccurate (2 significant decimal digits) approximation
to .... Although exp(10) = 0.00000454 has 3 decimal digits of precision, it is
accurate to almost 6. Precision is to accuracy what intent is to accomplishment.
A natural disinclination to distinguish them invites first shoddy science and ultimately the kinds of cynical abuses brought to mind by Peoples Democracy,
Correctional Facility and Free Enterprise .

,
f (x ) f (x) .
f
x
x x. (,
) x.
, .

x, .
.


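[Figure 3.2: two panels, a) and b), showing the inputs $x$, $x^*$ and $y$, $y^*$ in $U$ and their images $f(x)$, $f(x^*)$, $f(y)$, $f(y^*)$ in $f(U)$.]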
3.5.1. f : R R
y := f (x).
x = x + x,
f ,
y = f (x + x). |f (x + x) f (x)| = |
y y|.

$$\hat{y} - y = f(x + \Delta x) - f(x) = f^{(1)}(x)\Delta x + \frac{f^{(2)}(x + \theta\Delta x)}{2!}(\Delta x)^2, \quad \theta \in (0, 1)$$

f, f (1) , f (2)
x, x + x. f (1) (x) ,
.
,

$$\frac{\hat{y} - y}{y} = \frac{f'(x)\,x}{f(x)} \frac{\Delta x}{x} + O((\Delta x)^2) \qquad (3.8)$$
f 0 (x)x

|x| 1, (3.8) |x|, | f (x) |


y
x. ,
.
f 0 (x)x f (x).
,
.

,
f
. x1 +x2 +x3
( 0, 1, 2)
R ,
... ,
fprog . fprog
x f


[Figure 3.3: two panels, a) and b), relating $f(x)$, $f_{prog}(x)$ and $f(x^*)$ for an input $x$ and a nearby $x^*$.]
. f
.
fprog :
3.5.1. x U x x
fprog (x) f (x ).
.

,
- .
3.3.

, . ,
.
, fprog (x) f (x ) x
x. , :
3.5.2. , x
x fprog (x) = f (x ). ()
.

, .

(3.7) (
) (=forward error). ,

, kx xk
( )
...
. ,
f (x) .
, ..
x = (1 , . . . , N )

a1 = f1 (1 , . . . , N ), a2 = f2 (a1 , 1 , . . . , N ), , z = fn (an1 , , a1 , 1 , . . . , N ),
fn f1 f fj (aj1 , , a1 , x)

{aj1 , , a1 , 1 , . . . , N }.
...,

R , z
z .

fj . 3.3.6
3.4.5. ( )
(=forward error analysis).
, :
f
. 3.3.6 3.4.5
, .
f : Rn Rm .
3.7.

. ,
:

2
2
a
b

x+ x
cos 1
b b2 4c
f (x + ) f (x)

(a b)(a + b)

x++ x
2 sin2 2
|b| + b2

(1)

4c + Vieta
2
(x) + f (2) (x) 2 +

3.5.2.
, .
1962 Ramon Moore [28].

x ( interval)
(xL , xU ) , ( interval arithmetic).
(.. (, +).)
[17] [1].

(..
3.3.6)

$p_n = \prod_{i=1}^{n} (1 + \delta_i)$ with $|\delta_i| \le u$.

$(1 - u)^n \le p_n \le (1 + u)^n$.
$p_n = 1 + nu + O(u^2)$.
.
:
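3.5.1. If $|\delta_i| \le u$ and $\rho_i = \pm 1$ for $i = 1 : n$, and $nu < 1$, then

$$\prod_{i=1}^{n} (1 + \delta_i)^{\rho_i} = 1 + \theta_n, \qquad |\theta_n| \le \frac{nu}{1 - nu} =: \gamma_n.$$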

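Proof. By induction on $n$. For $n = 1$ the claim follows from $|\delta_1| \le u$ when $\rho_1 = 1$, and from $|\delta_1/(1 + \delta_1)| \le u/(1 - u)$ when $\rho_1 = -1$. Assume it holds for $n - 1$. If $\rho_n = 1$,

$$\prod_{i=1}^{n} (1 + \delta_i)^{\rho_i} = (1 + \theta_{n-1})(1 + \delta_n) = 1 + \theta_n, \qquad \theta_n = \theta_{n-1} + \delta_n + \theta_{n-1}\delta_n,$$

$$|\theta_n| \le \frac{(n-1)u}{1 - (n-1)u} + u + \frac{(n-1)u^2}{1 - (n-1)u} = \frac{(n-1)u + (n-1)u^2 + u - (n-1)u^2}{1 - (n-1)u} = \frac{nu}{1 - (n-1)u} \le \gamma_n.$$

If $\rho_n = -1$,

$$\prod_{i=1}^{n} (1 + \delta_i)^{\rho_i} = \frac{1 + \theta_{n-1}}{1 + \delta_n} = 1 + \theta_n, \qquad \theta_n = \frac{\theta_{n-1} - \delta_n}{1 + \delta_n},$$

$$|\theta_n| \le \frac{\gamma_{n-1} + u}{1 - u} = \frac{nu - (n-1)u^2}{1 - nu + (n-1)u^2} \le \gamma_n.$$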
[Figure 3.4: $\gamma_n$ as a function of $n$ on logarithmic axes, for u = 2e-16.]

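We write $\prod_{i=1}^{n} (1 + \delta_i)^{\rho_i} =: \langle n \rangle$, so that $\langle n \rangle \langle k \rangle = \langle n + k \rangle$ and $\langle n \rangle / \langle k \rangle = \langle n + k \rangle$. Moreover (for $nu < 1$):

$$\gamma_n = \frac{nu}{1 - nu} = nu(1 + nu + (nu)^2 + \cdots) = nu + O(u^2).$$

Also,

$$\prod_{i=1}^{n} (1 + \delta_i) \le (1 + u)^n < e^{nu}.$$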

3.4 n

u = 2 1016 .
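To get a feeling for the size of $\gamma_n$ in IEEE double precision, the following sketch tabulates it for a few values of $n$. The choice $u = 2^{-53}$ and the sample values of $n$ are assumptions made here for illustration.

```matlab
u = 2^(-53);                       % unit roundoff of IEEE double precision
n = 10.^(1:6);                     % sample values of n (illustrative)
gamma_n = (n*u) ./ (1 - n*u);      % meaningful only while n*u < 1
for i = 1:length(n)
  fprintf('n = %8d   gamma_n = %.3e   n*u = %.3e\n', n(i), gamma_n(i), n(i)*u);
end
```

For these values $nu \ll 1$, so $\gamma_n$ and $nu$ agree to many digits, in line with $\gamma_n = nu + O(u^2)$.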

3.5.2. ,
3.3.6
: ... x, y F
x2 + y 2 G

0 < x F fl( x) = x(1 + ) || u.


,

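$z = x/\sqrt{x^2 + y^2}$, and we want to bound $E_{rel}(t_1)$. As before,

$$t_1 = \frac{x}{t_3}(1 + \delta_5) = \frac{x}{\sqrt{t_3}\,(1 + \delta_4)}(1 + \delta_5) = \frac{x}{\sqrt{(t_1 + t_2)(1 + \delta_3)}\,(1 + \delta_4)}(1 + \delta_5) = \frac{x}{\sqrt{(x^2(1 + \delta_1) + y^2(1 + \delta_2))(1 + \delta_3)}\,(1 + \delta_4)}(1 + \delta_5),$$

where $|\delta_j| \le u$ for $j = 1 : 5$.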
. 3.5.1
.

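$$(x^2(1 + \delta_1) + y^2(1 + \delta_2))(1 + \delta_3) = x^2(1 + \theta_2') + y^2(1 + \theta_2'') = (x^2 + y^2)(1 + \theta_2'''), \quad |\theta_2'''| \le \gamma_2,$$

$$\sqrt{(x^2 + y^2)(1 + \theta_2''')} = \sqrt{x^2 + y^2}\,(1 + \theta_2''''), \quad |\theta_2''''| \le \gamma_2.$$

Applying 3.5.1 again,

$$t_1 = \frac{x}{\sqrt{(x^2(1 + \delta_1) + y^2(1 + \delta_2))(1 + \delta_3)}\,(1 + \delta_4)}(1 + \delta_5) = \frac{x}{\sqrt{(x^2 + y^2)(1 + \theta_2''')}}(1 + \theta_2) = \frac{x}{\sqrt{x^2 + y^2}\,(1 + \theta_2'''')}(1 + \theta_2) = \frac{x}{\sqrt{x^2 + y^2}}(1 + \theta_4),$$

so that

$$\left| \frac{t_1 - z}{z} \right| = |\theta_4| \le \gamma_4. \qquad (3.9)$$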

(3.9) z
t1 .


,

3.5.2.
x x
... z = fprog (x) z = f (x ).

kz zk

kfprog (x) f (x)k = kf (x ) f (x)k.

,
kf (x ) f (x)k. kx xk ,
( ) f (= perturbations) .
, ,
.

!
13 (=
backward error analysis) 14 James Hardy Wilkinson (1919-86),
. Rounding Errors in Algebraic Processes 1963,
(
[36]).
( ) .
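3.5.3. Consider $f(x_1, x_2, x_3) = (x_1 + x_2) + x_3$ evaluated in $F$. Then $f_{prog}(x_1, x_2, x_3) = ((x_1 + x_2)(1 + \delta_1) + x_3)(1 + \delta_2)$, that is

$$f_{prog}(x_1, x_2, x_3) = x_1(1 + \delta_1)(1 + \delta_2) + x_2(1 + \delta_1)(1 + \delta_2) + x_3(1 + \delta_2)$$

with $|\delta_j| \le u$ for $j = 1, 2$. Hence

$$f_{prog}(x_1, x_2, x_3) = f(\hat{x}_1, \hat{x}_2, \hat{x}_3) \qquad (3.10)$$

where $\hat{x}_1 = x_1(1 + \delta_1)(1 + \delta_2)$, $\hat{x}_2 = x_2(1 + \delta_1)(1 + \delta_2)$, $\hat{x}_3 = x_3(1 + \delta_2)$.
Therefore $|\hat{x}_j - x_j| = |x_j(\delta_1 + \delta_2 + \delta_1\delta_2)|$ for $j = 1, 2$ and $|\hat{x}_3 - x_3| = |x_3\delta_2|$,
so $|\hat{x}_j - x_j| \le 3u|x_j|$ for $j = 1, 2$ and $|\hat{x}_3 - x_3| \le u|x_3|$.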

|fprog (x1 , x2 , x3 ) f (x1 , x2 , x3 )|


|
xj xj |

|xj |

= |f (
x1 , x
2 , x
3 ) f (x1 , x2 , x3 )|
j u, j = 3 (j = 1, 2), 3 = 1.


. ( )
; . , f : R
R . ,

fprog (x ) f (x)

= f (xprog ) f (x)

x = x+x kxk .
3.5.1,

f (x + x) f (x)
13

= f (1) (x)x +

f (2) (x + x)
(x)2 , (0, 1)
2!

.
Wilkinson,
(1954) Wallace Givens von Neumann
Goldstine(1947) Turing (1948).
14

f, f (1) , f (2)
x, x + x. (3.8)
f 0 (x)x

, .. | f (x) |
y
x.
3.5.4. f (x1 , x2 , x3 ) = (x1 +
x2 ) + x3 . () f (x1 , x2 ) = x1 +
x2 . ,
fprog (x1 , x2 ) = f (x1 (1 + 1 ), x2 (1 + 1 )). f (x + h) f (x) + [1, 1]h
h = [x1 1 , x2 1 ]> ,

f (x + h) f (x)

f (x)

|x1 1 + x2 1 |
|x1 + x2 |
khk
|x1 + x2 |

(x1 , x2 )
x1 + x2 khk. , x1 , x2
, .. , khk = |x1 1 + x2 1 | u|x1 + x2 |

f (x + h) f (x)

u.

f (x)

3.5.5. Horner 2.2.1


x. , Horner
. 2.2.1
.

$s_n = \alpha_n$
for $k = n-1 : -1 : 0$
  $s_k = x s_{k+1} + \alpha_k$
end
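The recurrence translates directly into MATLAB. In the sketch below the coefficients are stored in ascending powers in a vector `alpha`, a convention chosen here for illustration (note that the built-in `polyval` expects descending powers); the sample polynomial and evaluation point are also illustrative.

```matlab
alpha = [5 -3 0 2];          % p(x) = 5 - 3x + 2x^3, ascending powers (illustrative)
x = 1.5;
n = length(alpha) - 1;
s = alpha(n+1);              % s_n = alpha_n
for k = n-1:-1:0
  s = x*s + alpha(k+1);      % s_k = x*s_{k+1} + alpha_k
end
disp(s)                              % 7.25
disp(polyval(fliplr(alpha), x))      % built-in evaluation, descending powers
```

Each pass of the loop costs one multiplication and one addition, so the evaluation needs $2n$ operations in total.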
... 3.5.1
:

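$$s_{n-1} = (x s_n \langle 1 \rangle + \alpha_{n-1}) \langle 1 \rangle = x\alpha_n \langle 2 \rangle + \alpha_{n-1} \langle 1 \rangle$$

$$s_{n-2} = (x s_{n-1} \langle 1 \rangle + \alpha_{n-2}) \langle 1 \rangle$$

...

$$s_0 = \alpha_0 \langle 1 \rangle + \alpha_1 x \langle 3 \rangle + \cdots + \alpha_{n-1} x^{n-1} \langle 2n-1 \rangle + \alpha_n x^n \langle 2n \rangle = (1 + \theta_1)\alpha_0 + \cdots + (1 + \theta_{2n})\alpha_n x^n$$

$$s_0 = f_{prog}(\alpha_0, \ldots, \alpha_n, x) = f(\alpha_0(1 + \theta_1), \ldots, \alpha_n(1 + \theta_{2n}), x)$$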

. 2n
,
|j
j | 2n |j | ,

... .
.

.

$$\frac{|p(x) - s_0|}{|p(x)|} \le \gamma_{2n} \frac{\sum_{k=0}^{n} |\alpha_k| |x|^k}{|p(x)|},$$


.
3.6 .


.

.
,
(3.7) :
1. .
.
2.
()
. .
, , (3.7).
, f
x . x 6= x
kx x k kx x k/kxk.

. , .
.
3.5.3. (= condition number) Alan Turing 1948 Rounding-off
errors in matrix processes [34]. John Rice [31].

3.5.4.

,

!


(. [36]
[31]). , (.2)
.

, x = (1 , . . . , m ), y = (1 , . . . , n ) y = f (x).
x y = f (x ).
fi
|x
x, mn ij =
j
f x. ,
ij f x
f .
, mn
. ,
Kj |j j | Kj kj j |.
n
x. ,
K ky yk2 Kkx xk2
|j j |. K
x.

.
. Kj

|j j |
kx xk2
Kj
.
j
kxk2

. , j
. Kj

|j j |
kx xk
Kj
.
kyk2
kxk2
K

ky yk2
kx xk
K
.
kyk2
kxk2
, :
3.5.3 (Rice [31]). X, Y
f : X Y , .
x y := f (x ). x , y
X, Y .
f x x
$$\mathrm{cond}(f; x^*) := \lim_{\epsilon \to 0} \sup_{\|h\| = \epsilon} \frac{\|f(x^* + h) - f(x^*)\| / \|f(x^*)\|}{\|h\| / \|x^*\|}$$
.

f (x )
x . ,
cond(f ; x ) khk
kx k
kf (x +h)f (x )k
.

kf (x )k


. :
15 x
$$\mathrm{cond}(f; x^*) = \frac{\|x^*\|}{\|y^*\|} \left\| \frac{\partial f}{\partial x} \Big|_{x^*} \right\|.$$

X = Rn Y = Rm , Frechet f .
f
,

sup
khk=

kf (x + h) f (x )k
kf (h)k
= sup
khk
khk= khk

x .
, f ( ).
:
cond(f ; x ) =

kx k
kf k
ky k


(3.7).
, :
3.5.1. f fprog
x, xprog

fprog (x) = f (xprog )

3.5.4. (3.5.1).
/ ()
cond(fprog )

kx xprog k
cond(fprog )u.
kxk

kfprog (x ) f (x)k
kf (x)k
15

kf (xprog ) f (x )k kf (x ) f (x)k
+
.
kf (x)k
kf (x)k

Frechet.

:

kf (xprog ) f (x )k
kf (x)k

kf (xprog ) f (x )k
kf (x )k
kx xprog k
cond(f ; x )
kx k

cond(f ; x )cond(fprog )u.

kf (x ) f (x)k
kf (x)k

kx xk
kxk
cond(f ; x)E.
cond(f ; x)

kfprog (x ) f (x)k
kf (x)k

cond(f ; x )cond(fprog )u + cond(f ; x)E. (3.11)

. , .

:
) , . cond(fprog )
.
.
) , . cond(f ; x)
) E , .
, E , :
<
3.5.5.
, .. f := f2 f1 .
cond(f ; x ) cond(f2 ; y )cond(f1 ; x ).
f , .. f1
f2 (
).

3.5.6. ,
John Rice [31].

3.5.7.
3.5.1, kzprog zk .

3.5.6. f (x) := log x, cond(f ; x) = | log


x|
x 1.

3.5.8. f ([A; x]) = Ax A .
x.

$$\sup_{\|h\| \ne 0} \frac{\|f(x + h) - f(x)\|}{\|h\|} = \sup_{\|h\| \ne 0} \frac{\|Ah\|}{\|h\|} = \|A\|$$

so that

$$\mathrm{cond}(f; x) = \frac{\|x\|}{\|Ax\|} \|A\|.$$

3.5.9. f ([A; y]) := A1 y A .


y .

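$$\sup_{\|h\| \ne 0} \frac{\|f(y + h) - f(y)\|}{\|h\|} = \sup_{\|h\| \ne 0} \frac{\|A^{-1}h\|}{\|h\|} = \|A^{-1}\|$$

so that

$$\mathrm{cond}(f; y) = \frac{\|y\|}{\|A^{-1}y\|} \|A^{-1}\| = \frac{\|Ax\|}{\|x\|} \|A^{-1}\| \le \|A\| \|A^{-1}\| := \kappa(A).$$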

A, y cond(f ; y) = (A),
(A) A.
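The bound can be observed on a small, nearly singular matrix. The matrix, the right-hand side and the perturbation below are made up for illustration; the perturbation is chosen close to the direction that $A^{-1}$ amplifies most, so the observed amplification comes close to $\kappa(A)$.

```matlab
A = [1 1; 1 1.0001];               % nearly singular 2 by 2 matrix (illustrative)
kappa = norm(A) * norm(inv(A));    % kappa_2(A) = ||A||_2 ||A^{-1}||_2
x = [1; 1];
y = A * x;
dy = 1e-8 * [1; -1];               % small perturbation of the right-hand side
dx = A \ (y + dy) - A \ y;         % resulting change in the solution
amplification = (norm(dx)/norm(x)) / (norm(dy)/norm(y));
fprintf('kappa = %.4e, observed amplification = %.4e\n', kappa, amplification);
```

For other perturbation directions the amplification can be far smaller, which is why $\kappa(A)$ is an upper bound rather than a prediction.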

3.5.10. ,
. , ,

, .
. ,
. ,
, 2- max /min ,
.

3.5.11. , (A)
n.
.

3.5.12.
, ..
Hilbert, Vandermonde .. Test Matrix Toolbox MATLAB [18]
W. Gautschi Vandermonde.

3.5.13. (A)
. , (A)
.

3.5.7.

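$$f([x; y]) := x^\top y, \quad x, y \in R^n.$$

With $X := [x; y] \in R^{2n}$,

$$\mathrm{cond}(f; X) = \frac{\|X\|}{\|x^\top y\|} \left\| \frac{\partial f}{\partial X} \Big|_{[x;y]} \right\|, \qquad \frac{\partial f}{\partial X} \Big|_{[x;y]} = [y; x]^\top \in R^{1 \times 2n},$$

so that

$$\mathrm{cond}(f; X) = \frac{\|[x; y]\|}{|x^\top y|} \|[y; x]\|.$$

Since $\|[x; y]\| = \|[y; x]\|$,

$$\mathrm{cond}(f; X) = \frac{\|[x; y]\|^2}{|x^\top y|}.$$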

|x> y|, . cos(x, y).


kxk, kyk > 0 cos(x, y) 0.

3.6
, ,
.
.
J. Wilkinson The perfidious polynomial16 [35].
, , ,
Horner
.
.
.

p(x) :=

n
X
k=1

16

k k (x),

{k }k=1:n Pn1



, .
p(x) k ;
p(x);

3.6.1
. ,
. ,

.

( ).
,
x
p(a; x) = 0, a (..
) :
a
p(a + a, x
) = 0,
x

= 0}
= inf{|kak kak, p(a + a; )

= 0 p(a; )
+ p(a; )
=0
p(a + a; )
..., n ]>
0 = p(a; x
) + [0 , . . . , n ][1, ,
kak kak,

|p(a; )|
..., n ]kD kak
k[1, ,

kkD
(. .1 ).

3.6.2
p(a; x) =
P
n
k
k=0 k x . , i + i , i = i ,
j :
[ ,0,i ,0, ,]

0 = p(a +

z}|{
a

; j + j ) =
0

p(a; xj + j ) + p(; j + j )
dp
i
p(a; j ) +j (a; j ) + i j(3.12)
dx
| {z }
0


j
|
j

|i ji |
dp
|j dx
(a; j )|

|i ji |
dp
|j dx
(a; j )|

|i |
|i |

(3.13)

i + i , i = 1 : n

0 = p(a + a; j + j )

= p(a; j + j ) + p(; j + j )
n
X
dp
0 p(a; j ) +j (a; j ) +
k ji
dx
| {z }
k=0

kakk[1, j , ..., jn ]kD kak


j
|
dp
j
kak
|j dx
(a; j )|

(3.14)

(3.13) (3.14)

.
, , .

:
3.6.1. z = z pn Pn
pn (z; ) := pn (z) + g(z) g Pn .
pn (z; ) z()

|z() z +

gn (z )
(1)

pn (z )

| = O(2 ).

z m, m pn (z; )

|z() z [

m!gn (z )
(m)
pn (z )

]m 1/m | = O(2/m ).

z :=

gn (z )
(1)
pn (z )

, pn (z; )
z() z , O().

pn (z; ). pn (z; )
pn (z; 0)
.
m ,
m!g (z )
| (m)n ]m |1/m . ,
pn

(z )


. ,
17 .
.
17

(= bifurcation).


.
J. Wilkinson The perfidious polynomial [35].
.
, .
, .. Lagrange Newton.

Walter Gautschi, .. [11].
3.6.1 (Wilkinson . [35]).

( ) J.H. Wilkinson.

n

pn (x) := x +

n1
X

j xj ,

j=0

j pn j =
j, (j = 1 : n). 1 = 1 ,
,

min condj = cond1 = O(n2 )


j

max condj (5.83)n .


j

.
condj =

(j + n)! j n j!
,
(j!)2 (n j)!

j = 1 : n.

3.6.1. Wilkinson 1 = 1 n = 2, 10
.

3.7

[17] (
James Demmel.)
z = f (a) f : Rn Rm p :

z = f (a), x1 = a Rn
x2 = g1 (x1 ) = [x1 ; 1 ]
1 x1 .

x3 = g2 (x2 ) = [x2 ; 2 ]


xp+1
p+1
z = Ix

gp (xp ) = [xp ; p ]

I Rm(n+p+1) () .

Rn g1 gp gp+1 I Rm
x1 := a xk+1 = gk (xk ), k = 1 : p

xk
k

k R

x
k+1 = gk (
xk ) + xk+1
xk+1 k .

x
2
x
3

x
4

.
=
.
=
=
=
.
=

g1 (a) + x2
g2 (
x2 ) + x3
g2 (g1 (a) + x2 ) + x3
g2 (g1 (a)) + Jg2 x2 + x3
g3 (g2 (g1 (a))) + Jg3 Jg2 x2 + Jg3 x3 + x4

z =

p ( ((g1 (a) ) + Jg Jg x2 +
I[g
p
2
+Jgp Jg3 x3 + + Jgp xp + xp+1 ]

I m 0 1 (
).

x2

.
g Jg , . . . , Jg , I]
z = f (a) + I[(J
..
p
2
p

xp+1

.
= f (a) + Jh

z f (a) = Jh.

z = f (a + a) = f (a) + Jf a = f (a) + Jh
a q =
pn + p(p + 1)/2,

Jf a = Jh, Jf Rmn , a Rn , J Rmq , h Rq .


m < n, .
kak.



, .
[4, 24, 25, 27, 26].
ADIFOR [14, 13, 3].

3.8
3.8.1. bit

. . 3.2.2.
3.8.2 (, , 03).
.
;

. . 3.3.
3.8.3. ;

. [, . 3.2]
1 ..., 1+ .
IEEE 1.0 01 1.0 0
. t 1
( ) (t1) =
1t .
3.8.4.
;

. [, . 3.2] , , ... ,
,
( )
...
1 ...
.
, M = 2u.
3.8.5 (Burden and Faires [32]). ( ).
p p 104
p ) , ) e, ) 71/3 .

. ) |
| 104
(1 104 ). pi
MATLAB ,

[3.14127849432443, 3.14190681285515].
) , exp(1) (1 104 )

[2.71801000027620, 2.71855365664189].
) , 71/3 (1 104 )

[1.91273988965411, 1.91312247589067].


3.8.6 (, , 02-makeup).
MATLAB
.
7.0. (.:
).

. . : , release 14 MATLAB ( 7.0)18


.
3.8.7 (, , 02-makeup). :

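for $x \in R^n$: $n^{-1}\|x\|_1 \le \|x\|_\infty \le \|x\|_1$.

Solution. By definition $\|x\|_\infty = \max_{i=1:n} \{|\xi_i|\}$, so $\|x\|_1 = \sum_i |\xi_i| \le n\|x\|_\infty$. Also $\|x\|_1 = \sum_i |\xi_i| \ge \|x\|_\infty$, that is $\|x\|_\infty \le \|x\|_1$.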
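3.8.8. For $x \in R^n$: $\frac{1}{\sqrt{n}}\|x\|_1 \le \|x\|_2 \le \|x\|_1$.

Solution. By the Cauchy-Schwarz inequality,

$$\|x\|_1 = \sum_j |\xi_j| \le \Big( \sum_j |\xi_j|^2 \Big)^{1/2} \Big( \sum_j 1 \Big)^{1/2} = \sqrt{n}\,\|x\|_2.$$

Also,

$$\|x\|_1^2 = \Big( \sum_j |\xi_j| \Big)^2 \ge \sum_j |\xi_j|^2 = \|x\|_2^2.$$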
.
3.8.9 (, , 03). Ax = b, x
, r := b A
x
krk
:= kAkkxk+kbk = 1.5 1013 (
).
2 (A) = 106 .
kx
xk

kxk .

. ( Rigal-Gaches),
.
, :

kx x
k
kxk

2(A)
1 (A)
3 107 /(1 1.5 107 ) 3 107 .

3.8.10 (, , 04). Ax = b 101 106 x



||b A
x|| = 1012 ||
x||2 = 10, ||A||2 = 10 ||b||2 = 1.
, .
||x x
||2 / ||x||.

18

http://www.mathworks.com/products/new_products/latest_features.html#ML.


3.8.11 (Burden and Faires [32]). ,


p p . , ,
.
p
p

0.300 101
0.300 103
0.300 104

e
e10

0.310 101
0.310 103
0.310 104
22/7
3.1416
2.718
22000

p
0.300 101
0.300 103
0.300 104

e
e10

p
0.310 101
0.310 103
0.310 104
22/7
3.1416
2.718
22000


0.1
4

0.1 10
0.1 103
0.001264
7.346 106
2.818 104
2.647 101

0.3333 101
0.3333 101
0.3333 101
4.025 104
2.338 106
1.037 104
1.202 103

,
.
3.8.12. ... IEEE ,
( precision) . ... IEEE .

. IEEE , , 24 =
23(+1) ( ) 53 =
52(+1). , m 2E
1 m < 2 1.bs . ... ,
24 . d
, 10d 224 . , d = 7 24 log10 2 = 7.2247198.
10d 253 . , d = 16 53 log10 2 =
15.954589.
3.8.13. : (.
k k2 ) .

. . : Rn R
, , .
J : Rn Rn

J := [

,...,
].
1
n

v
u n
X
u
t
2
j i=1 i

pPnj

2
i=1 i

() J := kxk2 =

x>
kxk2 .

1 : ,
Pn
kxk1 =
j=1 |j |
0. ,
.

3.8.14. A = [100, 1; 0, 10] B = [1, 0; 0, 0]


MATLAB IEEE.
( ) C = A./B
.

. C = [100, Inf; NaN, Inf].


3.8.15 (Heath [15]). IEEE .
.

. 1t /2. = 2,
IEEE 2t
. > 2 1t /2 =
(/2) t , ,
.
3.8.16. ...
1960 ( M. Overton [30]) 0,
, ..
.
.

. fmax ... ,
a/0b/0 = fmax fmax = 0 a, b 6= 0,
. ,
, program interrupt,
IEEE,
.
3.8.17. IEEE realmin . 1 0
boole.

temp2=realmin/2;
temp=2*temp2;
boole=(temp==realmin);

. MATLAB , realmin

realmin = 0010000000000000.

temp2 = realmin/2 = 0008000000000000


realmin, boole = 1.
0
.


3.8.18. f : Rm Rn
.
.

. , f (
, , ...),
()
() .

.
3.8.19. : G
x f (x) = y ( x, y ),
.
G

G x
G(y)
y
k
y yk ( 0) |G(
y)
x
|/|
x| . G
.

. , -
. x
= G(
y ) |G(
y)
x
|/|
x | = 0.
3.8.20. ,

.

. G x G(x) = y .
x
x
, G(x)
= G(
x). ,

kG(x)
G(x)k

kG(x)
G(x)k = kG(
x) G(x)k.
k
x xk/kxk,
.
G G1
x = G1 (y), kG(
x) G(x)k/kG(x)k = kG(
x x)k/kG(x)k

kG(
x x)k
kG(x)k

kG(
x x)k k
x xk
k
x xk kG(x)k
k
x xk kxk
kGk
kxk kG(x)k
k
x xk kG1 (y)k
kGk
kxk
kxk
k
x

xk
.
kGkkG1 k
kxk
=

, ,
k
xxk
, kxk
kGkkG1 k ( G).

3.8.21. x, y R2
, (. i i 0)
xT y .

. x, y ... x1 (1 + 1 ), x2 (1 +
2 ), y1 (1 + 3 ), y2 (1 + 4 ) |j | u.
fl(xT y) = (1 1 < 3 >
+2 2 < 3 >) < 1 > 1 1 < 4 > +2 2 < 4 >.

|x|T |y|
|fl(xT y) xT y|

= 4
4
|xT y|
|xT y|
|xT y| = |1 1 + 2 2 | = 1 y1 + 2 2 = |x|T |y|.
3.8.22. ... 32
b0 b1 b8 b9 b31 , , 8 23
. b9 b31
:

(1)b0 2E127 1.b9 b31


E = b1 b8
127 . bj = 0
j = 1 : 31 0
E 1 0. ) , ,
; ) , ,

;

. ) 0, 2126 .
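$f_{Nmin1} = 2^{-126} = 10^{-126 \log_{10} 2} \approx$ 1.1754943e-38. b) Allowing subnormal numbers as well, the smallest positive number is $f_{min1} = 2^{-126} \times 2^{-23} = 2^{-149} \approx$ 1.4013e-045. Note: in double precision,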
52 11 1024.
fN min2 21022 2.2250738e 308 realmin
fmin2 = 2102252 4.9406564e 324.
3.8.23. , ... IEEE
= y +x
.
, x+y

.
= fl(x + y) = fl(y + x) = y +x
.
... IEEE, x+y
=
x+y

(x + y)(1 + 1 ) y +x = (y + x)(1 + 2 ), 1 = 2 .
3.8.24 (Heath [15]). x y
log x log y m(x, y) := log(x) log(y)
. ,
log(x) log(y) = log(x/y),
M (x, y) := log(x/y) m(x, y).
.
; ( :
;)


. . ,

. , .. y = 1 log y = 0,
log(x/y) = log(x).
MATLAB . , 2.
x = 256+512*eps y = 256. MATLAB

log2 (x) - log2 (y) = 0

log2 (x/y) = 6.661338147750939e-016


2 1 . x = eps y = realmax.
m(eps, realmax) = 1076 M (eps, realmax)

Warning:

Log of zero.

3.8.25. ) fadd, fadd fmul ... (64 bits)


1 . ; ) fma fnma,
,
. (. fma
fnma) 3 , .
;

. )
, $(a + ib)(x + iy) = (ax - by) + i(bx + ay)$, 6 .
, , $s = (a + b)x$; $t = b(x + y)$; $p = y(a - b)$;
$(s - t) + i(t + p)$. 5
3 .
(..
, O(n2 )
O(n3 ).
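A minimal sketch of the two ways of forming the product of two complex numbers, given as pairs of real parts. The function names are hypothetical. The second variant trades one multiplication for three extra additions.

```python
def complex_mul_4(a, b, x, y):
    """(a + ib)(x + iy) with 4 multiplications and 2 additions."""
    return (a * x - b * y, b * x + a * y)

def complex_mul_3(a, b, x, y):
    """Same product with 3 multiplications and 5 additions."""
    s = (a + b) * x
    t = b * (x + y)
    p = y * (a - b)
    return (s - t, t + p)   # real part ax - by, imaginary part bx + ay

print(complex_mul_4(1, 2, 3, 4), complex_mul_3(1, 2, 3, 4))
```

In floating point the two variants need not round identically, so the check below uses integers.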
3.8.26. fma.
) fma
.
) fma
.
) fma
.
) .

. (). fma
. ,

z + x y
. fl(z +
xy) = (z +xy)(1+), || u fl(z +fl(xy)) = (z +xy(1+1 ))(1+2 ).

3.9
3.9.1. ,
x|
x
R |x
|x| , x
. - - x R
,
|x
x|
. , , |x| .

.

|x
x|
|x| .

|x x
|
|
x|

x
|
x| x x
+ |
x|
< 1 ( )
x, x
> 0.

x
(1 ) x x
(1 + ),

|x x
|
|x|

|x x
| |
x|
|
x| |x|

= + O(2 ).
1

,
( ) x
(1 ) x.
x, x
< 0.
3.9.2. ... 5
b0 b1 b2 b3 b4 , , 2 2 .
b3 b4

(1)b0 2E1 1.b3 b4


E = b1 b2 1
. ) , ,
. ) , ,
; ) , ,


; )

( NaN, ).

. -) : 21 [0.00, 0.25, 0.50, 0.75]


( , , ), 21 [1.00, 1.25, 1.50, 1.75], [1.00, 1.25, 1.50, 1.75], 2
[1.00, 1.25, 1.50, 1.75], 4 [1.00, 1.25, 1.50, 1.75]. :

{0, 0.125, 0.250, 0.375, 0.500, 0.625, 0.750, 0.875, 1.000, 1.250, 1.500,
1.750, 2, 2.50, 3.00, 3.50, 4, 5, 6, 7}.
19 0. 0.500 = 21
0.125.
) t = 2 + 1 , u =

213 /2 = 0.125
3.9.3. ()
... IEEE . ) 0.5, ) 0.1, ) 0.2, ) 23/4, ) 25/32100 ,
) 25/3 2100 , ) 2000.

. ,

.
x x = m 2E 1 m < 2. 2. ,
E m .
, . )

$0.5 = 2^{-1} \times 1.0$. IEEE $-1 + 127 = 126$ ,

0.5 = 0|011 1111 0|000 0000 0000 0000 0000 0000 .
,
.
(), .
$0.1 = 2^{-4} \times 1.6$. $1.6 = 1 + 0.6$
0.6. $2 \times 0.6 = 1.2$, $2 \times 0.2 = 0.4$, $2 \times 0.4 = 0.8$, $2 \times 0.8 = 1.6$,
$2 \times 0.6 = 1.2$. , ,
. , ...

$0.1 = 2^{-4} \times 1.1001$,
1001 -


. IEEE $-4 + 127 = 123$

0.1 = 0|011 1101 1|100 1100 1100 1100 1100 1101

) $0.2 = 2^{-3} \times 1.6$. IEEE $-3 + 127 = 124$ 01111100. $1.6 = 1 + 0.6$
0.6. $2 \times 0.6 = 1.2$, $2 \times 0.2 = 0.4$,
$2 \times 0.4 = 0.8$, $2 \times 0.8 = 1.6$, $2 \times 0.6 = 1.2$.
, , . ,
...

$0.2 = 2^{-3} \times 1.1001$.
,

0.2 = 0|011 1110 0|100 1100 1100 1100 1100 1101
)
\[ 23/4 = 2^{-2} \times 23 = 2^{-2}\,(2^{0} + 2^{1} + 2^{2} + 2^{4}). \]
\[ 23/4 = 2^{2}\,(2^{-4} + 2^{-3} + 2^{-2} + 2^{0}). \]
IEEE $2 + 127 = 129$ 10000001. ,

23/4 = 0|100 0000 1|011 1000 0000 0000 0000 0000
)
\[ 2000 = 2^{10}\left(1 + \frac{976}{1024}\right). \]
, $0.953125 = \frac{976}{1024}$
.
$2 \times 0.953125 = 1.90625$, $2 \times 0.90625 = 1.8125$, $2 \times 0.8125 = 1.625$, $2 \times 0.625 = 1.25$,
$2 \times 0.25 = 0.5$ $2 \times 0.5 = 1.0$ , IEEE
$10 + 127 = 137$ 10001001. ,

2000 = 0|100 0100 1|111 1010 0000 0000 0000 0000
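The bit patterns derived by hand above can be cross-checked with a small Python sketch. It rounds each value to IEEE single precision with `struct.pack` and splits the 32 bits into sign, exponent and fraction. The helper name `single_bits` is hypothetical.

```python
import struct

def single_bits(x):
    """Return (sign, exponent, fraction) bit strings of x rounded to IEEE single."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    bits = format(word, "032b")
    return bits[0], bits[1:9], bits[9:]

for value in (0.5, 0.1, 0.2, 23 / 4, 2000.0):
    print(value, *single_bits(value))
```

For 0.1 and 0.2 the last fraction bits end in ...1101 rather than ...1100 because the infinite binary expansion is rounded to nearest.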


3.9.4. p(x) = (x 1)n

p(x; ) = (x 1)n
p(x; )
n. p;

3.9.5. p(x) = xn 1

p(x; ) = xn 1
p(x; )
n. p;

3.9.6 (). :
... x, y 1/ x/y ,
...
, = 2 ( ) > 2.

3.9.7. x x

. y = ex y = ex .
y x x
.

3.9.8. 1) , 1% 3%.
. 2)
...
IEEE .
.

. 1) x
= x(1 + x ) y = x(1 + y ) |x | 0.01
|y | 0.02.

x
y = xy(1 + x )(1 + y ) = xy + xy(x + y + x y )

x
y xy
| = |x + y + x y | 3 102 + 2 104
xy

0.0302.
)
)
x y . 2)

x
y = xy(1 + x )(1 + y )(1 + ) =
= xy + xy(x + y + + x y + x + y + x y )
| | u ( ). , t = 23
u = 224 6 108

0.0302 + 6 108 + 6 1010 + 12 1010 + 18 1012 .


108
.
3.9.9.
.

1. f (x) = x

x2 c x2 > c.

2. f (x) = x1/n .
3. f (x1 , x2 ) = x21 + x22
4. f (x1 , x2 , x3 , x4 ) = (x1 + x2 )/(x3 x4 ).
5. f (x) = ln x,

x > 0.

6. f (x) = cos(n arccos x).

.
. ) f 0 (x) = 1 x(x2 c)1/2 . x2 c
. ) f 0 (x) = x1/n1 /n. )
f
f
2
x = 2[x1 , x2 ]. ) x = [1/(x3 x4 ), 1/(x3 x4 ), (x1 + x2 )/(x3 x4 ) , (x1 +

x2 )/(x3 x4 )2 ]. ) f 0 (x) = 1/x. ) f 0 (x) =

n sin(n arccos x)

.
1x2

3.9.10. )
D = ad bc [a, b; c, d] ( MATLAB )
D| Au + Bu2 , u
: |D
A B a, b, c, d. )
.

. , 3.4.4.
QN 3.9.11. ... P :=
i=1 pi , P
...
.

.

. ,
.
3.9.12 (C. Hoffmann [19]).
...
( ) Dobkin Silver. A
(0, 0), (0, 1), (1, 0), (1, 1 + p), (1 + p, 1) p .
A ,
C . A
B . gin(A)
A B gout(B) A C .
gin(gout(A)) = gout(gin(A)) = A,

ginm (goutm (A)) = A


1. / gin gout. ,
.


2. p
0.5, 0.1, 0.01, 0.001, 0.0001, 0.00001, 0.000001. ,
A

ginm (goutm (A)) = A, m = 1, ..., 10


3. .
4. (.. Symbolic Toolbox MATLAB) .

3.9.13.
sAXPY ) , )
.

3.9.14 (Overton [30]). ... IEEE


. ,
) )
: 64 + 220 , 64 + 220 , 32 + 220 , 16 + 220 , 8 + 220 .

3.9.15 ( Miller [26]). ...


IEEE . :

function [k, x] = bisect(a,b);


k = 0;
while (a ~= b)
x = (a+b)/2;
a = x;
k = k+1;
end
) a = 1, b = 2 bisect k =
53. ) a = 2000, b = 2001 bisect k = 43. )
a = 0, b = 1. k = 54.
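The iteration counts quoted in the exercise can be reproduced with a direct Python transcription of `bisect`, assuming Python floats are IEEE doubles with round-to-nearest-even. The loop stops once the midpoint `(a+b)/2` rounds to `b`.

```python
def bisect(a, b):
    """Repeatedly replace a by the midpoint of [a, b]; count the steps until a == b."""
    k = 0
    x = a
    while a != b:
        x = (a + b) / 2
        a = x
        k += 1
    return k, x

for a, b in ((1.0, 2.0), (2000.0, 2001.0), (0.0, 1.0)):
    print(a, b, bisect(a, b)[0])
```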

3.9.16. G : Q Y X, Y G . kxkX x X kykY y Y .

G(x) := y y = G(x)

G .

kG(
x) yk
kG(x)k

kGkkG1 k

k
x xk
,
kxk

. ( : kGk kGk :=
supx6=0 kG(x)kY /kxkX kG1 k := supy6=0 kG1 (y)kX /kykY ).

. G x G(x) = y x = G1 (y),


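\[
\frac{\|G(\tilde{x}) - G(x)\|_Y}{\|G(x)\|_Y}
= \frac{\|G(\tilde{x} - x)\|_Y}{\|G(x)\|_Y}
= \frac{\|G(\tilde{x} - x)\|_Y}{\|\tilde{x} - x\|_X}\,\frac{\|\tilde{x} - x\|_X}{\|G(x)\|_Y}
\le \|G\|\,\frac{\|\tilde{x} - x\|_X}{\|x\|_X}\,\frac{\|x\|_X}{\|G(x)\|_Y}
= \|G\|\,\frac{\|\tilde{x} - x\|_X}{\|x\|_X}\,\frac{\|G^{-1}(y)\|_X}{\|y\|_Y}
\le \|G\|\,\|G^{-1}\|\,\frac{\|\tilde{x} - x\|_X}{\|x\|_X}.
\]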

x G(x)
= G(
x). ,

kG(x)
G(x)k

kG(x)
G(x)kY = kG(
x) G(x)kY .
k
x xkX /kxkX , .
, , ,
k
xxk
, kxk kGkkG1 k
( G).
3.9.17.
1

y = 1;

forj = 1 : n
..., n > 0
y = y ( + j )

end

) . )

y(1 , ..., n , )
Pn
1
P = j=1 +
.

j
. )

fl(y) =

n
Y
j=1

( + j )

n
Y

(1 + j )

j=1

n1
Y

(1 + j )

j=1

j + j
0
j
y = fl(y) fl( + j ).
1. ,
u. ()
n
Y

(1 + j )

j=1

n1
Y
j=1

(1 + j ) = 1 + 2n1 , |2n1 | 2n1 =

(2n 1)u
1 (2n 1)u

2 .


i)
fl(y)

n
Y

(1 + 2n1 )

( + j )

j=1

=
=

n
Y

( + j )(1 + 2n1 )1/n

j=1
n
Y

(0 + j ),

j=1
0

0 = (1 + 2n1 )1/n j = 1 (1 + 2n1 )1/n . ,


0
0
0 , 1 , ..., n . .
|2n1 | < 1,

(1 + 2n1 )1/n = 1 +
|| < 1. 1 > 2n1 0. 0

1 + 2n1
0

(1 + )n = 1 + n + O( 2 )
n 2n1
2n1

2n1 .
n

1 < 2n1 < 0,

(1 2n1 )1/n = 1
1 > 2n1 , > 0.

1 1 1
1 1 1
1
1
2
3
2n1 +
( 1)2n1

( 1)( 2)2n1
+
n
2! n n
3! n n
n
1
1 n 1 2n 1 2
1 n1 2
2n1 +
2n1 +

+
n
2n n
2n n
3n 2n1
1
1
1
3
2n1 +
2n1
<
2n1
2n
2n
1 2n1
2n

ii)
j , j = 1, ..., n n , . . . , . ,
+ j , 2
, j
.
fl(y) =

(1 + 2n1 )

n
Y

( + j )

j=1

( + 1 )(1 + 2n1 )

n
Y
j=2

(0 + 1 )

n
Y

( + j ),

j=2

( + j )

0

0 = (1 + 2n1 )1/n 1 = 1 (1 + 2n1 )1/n . ,


0
0 , 1 , 2 , ..., n .

|0 |
= |2n1 | 2n1 = (2n 1)u/(1 (2n 1)u)
||
0

|1 1 |
= |2n1 | 2n1 = (2n 1)u/(1 (2n 1)u)
|1 |
.
)

(y) =

k[1 , ..., n , ]k
kJk
|y|

J J R1(n+1)

J =[

y
y y
, ,
,
].
1
n

Y
y
=
( + i ) := pj
j
i6=j

X
y
=
pj .
j=1
Pn

, kJk = max{|p1 |, ..., |pn |, | j=1 pj |}


Pn
, kJk = j=1 pj . k[1 , ..., n , ]k =
.
n

(y)

pj
j=1 ( + j ) j=1

Qn

n
X
j=1

1
+ j

3.10
3.10.1 (Burden and FairesP
[32]). ex ,
x
x
.
x, e = j=0 j!
e,
P5 1
P10 1
: ) e j=0 j!
. ) e j=0 j!
.


. Maclaurin,

|e

n
X
1
e
|=
j!
(n + 1)!
j=0

[0, 1]. ex ,

|e

5
X
e
1
|
3.775e 03,
j!
6!
j=0

|e

P5

1
j=0 j! |

e
1.388e 03
6!e

, = 1
1.390e 03

P5 1
e j=0 j!
, (.
3.9.1). , e
e
(n+1)! ,

.. e 2.72, .
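As a numerical illustration, the following Python sketch compares the actual truncation error of the Maclaurin partial sums for e with the remainder bound e/(n+1)!. The function name `maclaurin_e` is hypothetical and the arithmetic is ordinary double precision.

```python
import math

def maclaurin_e(n):
    """Partial sum sum_{j=0}^{n} 1/j! of the Maclaurin series of exp at x = 1."""
    return sum(1.0 / math.factorial(j) for j in range(n + 1))

for n in (5, 10):
    err = math.e - maclaurin_e(n)
    bound = math.e / math.factorial(n + 1)   # remainder bound, using exp(xi) <= e on [0, 1]
    print(n, err, bound)
```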
3.10.2. , n+1 Maclaurin , .. , exp(x) ( Maclaurin
Pn xn,
Mn (x)), .. ex Mn (x) = j=0 j!
exp(x).
, Maclaurin exp(x)
x ( ). 9- Maclaurin IEEE

Pn (5)j
e5 : ) e5 Mn (5) :=
j=0 j! . )
1
1
5
5
e = 1/e Mn (5) = Pn 5j . (.. MATLAB )
j=0 j!


n = 1, 2, ..., 15. ,
exp(x) exp(x) (
, , ,
exp(x)).
;

3.10.3. ( MATLAB)
... . machar.f:

a = 1.0; b = 1.0;
while ((a+1.0)-a)-1.0 == 0.0, a = 2.0*a; end
while ((a+b)-a)-b ~= 0.0, b = b+1.0; end

1. ( ) (
b).
2. .
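A Python transcription of the two loops, offered as a sketch for experimenting with the exercise (the function name `find_base` is hypothetical). On IEEE double arithmetic the first loop stops at the first power of two where adding 1 is no longer exact, and the second loop then recovers the base.

```python
def find_base():
    """Malcolm-style probe of the floating-point base."""
    a = 1.0
    while ((a + 1.0) - a) - 1.0 == 0.0:
        a = 2.0 * a            # grow a until a + 1 is no longer representable
    b = 1.0
    while ((a + b) - a) - b != 0.0:
        b = b + 1.0            # smallest b for which a + b is exact again
    return a, b

a, b = find_base()
print(a, b)
```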


3.10.4.
( mantissa.)
beta := ...

t = 0; b = 1.0;

while ((b*beta + 1.0) -b*beta)-1 == 0.0,


t = t+1; b = b*beta; end
. t , . t,
b b = betat .
1.0. 1.0 betat+1
: betat+1 + beta(t+1) betat+1 ).
beta(t+1) 0 t
(bbeta+1.0) = bbeta, (bbeta+1.0)bbeta)1 6= 0
t .
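The loop can be tried out directly in Python (a sketch; `count_digits` is a hypothetical name and `beta = 2.0` is assumed). With IEEE doubles it returns t = 52, the number of stored fraction bits, one less than the 53-bit significand that includes the hidden bit.

```python
def count_digits(beta=2.0):
    """Count how many times b can be multiplied by beta while b*beta + 1 stays exact."""
    t = 0
    b = 1.0
    while ((b * beta + 1.0) - b * beta) - 1.0 == 0.0:
        t += 1
        b *= beta
    return t, b

print(count_digits())
```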
3.10.5 (Miller [26]). ... = 2,
t = 24 , emin = 128.
:

function [k, x] = bisect(a,b);


k = 0;
while (a ~= b)
x = (a+b)/2;
if (rem(floor((rand*10)),2)==1)
b = x;
else
a=x;
end
k = k+1;
end
if (rem(floor((rand*10)),2)==1) 0 1. ) a = 1, b = 2
bisect k 53. ) a = 2000, b = 2001
bisect k 43. ) a = 0, b = 1.
54 k 1075.

[1] O. Aberth. Precise Numerical Methods Using C++. Academic Press, San
Diego, 1998.
[2] D. Alpert and D. Avnon. Architecture of the Pentium microprocessor. IEEE
Micro, pages 11–21, June 1993.
[3] C. Bischof, A. Carle, and A. Mauer. Adifor 2.0: Automatic differentiation
of Fortran 77 programs. IEEE Computational Science & Engineering Mag.,
3(3):18–32, 1996.
[4] B. Bliss, M.-C. Brunet, and E. Gallopoulos. Automatic program instrumentation with applications in performance and error analysis. In E.N. Houstis, J.R. Rice, and R. Vichnevetsky, editors, Expert Systems for Scientific
Computing, pages 235–260. Elsevier Science Pub. B. V. (North-Holland),
1992.


[5] F. Chaitin-Chatelin and V. Fraysse. Lectures on Finite Precision Computations. SIAM, Philadelphia, 1996.
[6] W.J. Cody. Algorithm 665: MACHAR: A subroutine to dynamically determine machine parameters. ACM Trans. Math. Softw., 14(4):303–311,
1988.
[7] T. Coe, T. Mathisen, C. Moler, and V. Pratt. Computational aspects of the
Pentium affair. IEEE Computational Science & Engineering Mag., 2(1):18–30, 1995.
[8] J. Demmel. Underflow and the reliability of numerical software. SIAM J.
Sci. Stat. Comput., 5(4), 1984.
[9] U. Eco. . . ,
, 2003. Dire quasi la stessa cosa.
Esperienze di traduzione, R.C.S. Libri S.p.A., Milano, Bompiani 2003.
[10] A. Edelman. The mathematics of the Pentium division bug. SIAM Review,
39:54–67, 1997.
[11] W. Gautschi. Questions of numerical condition related to polynomials.
In G.H. Golub, editor, Studies in Numerical Analysis, volume 24, pages
140–177. Mathematical Association of America, 1984.
[12] D. Goldberg. What every computer scientist should know about floating
point arithmetic. ACM Comput. Surveys, pages 5–48, 1991.
[13] A. Griewank. On automatic differentiation. In M. Iri and K. Tanabe, editors,
Mathematical Programming: Recent Developments and Applications, pages
83–108. Kluwer Academic Pub., 1989.
[14] A. Griewank and G. F. Corliss, editors. Automatic Differentiation of Algorithms: Theory, Implementation and Application. SIAM, Philadelphia, 1991.
[15] M.T. Heath. Scientific Computing: An Introductory Survey. McGraw Hill,
Boston, 2nd edition, 2001.
[16] J.L. Hennessy and D.A. Patterson. Computer Architecture: A Quantitative
Approach. Morgan Kaufmann, San Mateo, CA, first edition, 1990.
[17] N.J. Higham. Accuracy and Stability of Numerical Algorithms. SIAM, Philadelphia, 2nd edition, 2002.
[18] N.J. Higham. The Test Matrix Toolbox for MATLAB (version 3.0). Technical Report 276, Manchester Centre for Computational Mathematics, Sept.
1995.
[19] C.M. Hoffmann. The problems of accuracy and robustness in geometric
computation. IEEE Computer, 22(3):31–41, 1989.
[20] W. Kahan. Lecture notes on the status of IEEE standard for binary
floating-point arithmetic. URL http://http.cs.berkeley.edu/~wkahan/ieee754status/ieee754.ps, May 1996. Work in progress.


[21] W. Kahan and J.D. Darcy. How Java's floating-point hurts everyone everywhere. Originally presented at the 1998 ACM Workshop on Java for High-Performance Computing, 1998. Available electronically at
http://www.cs.berkeley.edu/~wkahan/JAVAhurt.pdf.
[22] D. E. Knuth. The Art of Computer Programming: Seminumerical Algorithms,
volume 2. Addison-Wesley, 1981.
[23] D.J. Kuck. The Structure of Computers and Computations. Wiley, 1978.
[24] J. L. Larson and A.H. Sameh. Efficient calculation of the effects of roundoff
errors. ACM Trans. Math. Softw., 4(3):228–236, September 1978.
[25] S. Linnainmaa. Error linearization as an effective tool for experimental
analysis of the numerical stability of algorithms. BIT, 23:346–359, 1983.
[26] W. Miller. The engineering of numerical software. Prentice-Hall, Inc., 1984.
[27] W. Miller and C. Wrathall. Software for roundoff analysis of matrix algorithms. Academic Press, New York, 1980.
[28] R. E. Moore. Interval Analysis. Prentice-Hall, Englewood Cliffs, N.J., 1966.
[29] E. Morin. . .
, , 2000. Les sept
savoirs necessaires
a l education
du futur . .

[30] M.L. Overton. Numerical Computing with IEEE Floating Point Arithmetic.
SIAM, Philadelphia, 2001.
[31] J. Rice. A theory of condition. SIAM J. Numer. Anal., 3(2):287–311, 1966.
[32] R.L. Burden and J.D. Faires. Numerical Analysis. Brooks Cole, Boston, 5th
edition, 1993.
[33] W. Rudin. Principles of Mathematical Analysis. McGraw Hill, 1976.
[34] A.M. Turing. Rounding-off errors in matrix processes. Quart. J. Mech.,
1:287–308, 1948.
[35] J.H. Wilkinson. The perfidious polynomial. In G.H. Golub, editor, Studies
in Numerical Analysis, volume 24, pages 1–28. Mathematical Association
of America, 1984.
[36] J.H. Wilkinson. Rounding Errors in Algebraic Processes. Dover, 1994 (First
published in 1963).



4.1
1 ( 1.1)

. :
.0: A Rn1 n3 , B Rn3 n2 , C
Rn1 n2 C + AB .
.1: A Rnn b
Rn . x Rn Ax = b.
.2: A Rmn
b Rm . x Rn
kb Axk2 .
.3: ( ) A Rnn
C x Cn Ax = x.
.4: A, B Rnn C x Cn Ax = Bx.
.5: (= Singular Value Decomposition). A Rmn Rmn
U Rmm , V Rnn A = U V > .
.6: A Rnn B
Rnm f f (A)B .
B ( f (A)
f (A)B .

. ,
.
4.1.1.
NASTRAN . URL www.noraeng.com.

( = normal modes).
Kx = f
K ( = stiffness matrix),
, x , f . x f .
2
Ku 2 M ddt2u = 0,

d2 u
dt2

.
NASTRAN
( wind-tunnel) Ames Center NASA
Moffett Field . ,
3.70 21 .

.
1940 40 , () () .
, ,
,
, .
73.5 PSI. , 28 106 .
217918 18
, 10 . [24]
(
1985 Cray Y-MP) 1837 1500
I/O Kx = f 4700
5400 I/O
10 .

4.1.1

, , , .
MATLAB. :
. .. imin , imax , iincr , imin : imax [imin , imin + 1, ..., imax ], imin : iincr : imax imin
c].
[imin , imin + iincr , imin + 2iincr , . . . , imin + iincr b imax
iincr
iincr (= stride). imin
imax , :
imin : imax . imin : iincr : imax
.
MATLAB Fortran 90 , . imin
c]
[imin , imin + iincr , imin + 2iincr , . . . , imin + iincr b imax
iincr


imin : imax : iincr .

4.1.2. A R42 A1:2:4,1


[A(1, 1), A(3, 1)]> .
, .. A(1, 1) A1,1 . , ,
, .. 1,1 A(1, 1) A1,1 .

4.1.3.
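[10 : 14] = [10 11 12 13 14],
[10 : 2 : 14] = [10 12 14]
\[
A(2:3,\; 2:2:4) =
\begin{pmatrix} A(2,2) & A(2,4) \\ A(3,2) & A(3,4) \end{pmatrix}
=
\begin{pmatrix} \alpha_{22} & \alpha_{24} \\ \alpha_{32} & \alpha_{34} \end{pmatrix}
\]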

4.1.2
,


. , , .
,
, .
.
, , 1 .

( )
( ).
4.1.1. A m n 1
i1 < i2 < ... < ik m 1 j1 < j2 < ... < jl n.
k l (, ) ai j
A. k = l i1 = j1 , ..., ik = jk ( principal).
i1 = 1, ..., ik = k ( leading).

A, .. (, , , .) .
4.1.2. A m n
( )

A=

A11
A21

A12
A22

..
.

..
.

..

Ak1

Ak2

A1l
A2l

..

.
Akl

1
blocks, 2, 1.

Aij mi nj A.

: (conformally partitioned),
, . ,
, :

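\[
\begin{pmatrix}
B_{11}C_{11} + B_{12}C_{21} & B_{11}C_{12} + B_{12}C_{22} \\
B_{21}C_{11} + B_{22}C_{21} & B_{21}C_{12} + B_{22}C_{22}
\end{pmatrix}
=
\begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix}
\begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix}.
\]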

4.2


, , .. , , . )
, ,
)
)

.

2 .
.0 :
.0: A Rn1 n3 , B Rn3 n2 , C Rn1 n2
C + AB .

4.2.1


:
_DOT
x, y Rn , ()
:= x> y . , x, y Cn := x y .
= 2n min = 2n + 2.
,
, = min
= min ,
3n + O(1).

. DOT
2

, .. [28].

, (=
reductions).
.

sAXPY
, x, y Rn , [s,d]AXPY
y y + x.
= 2n min = 3n + 1 . min 1
, . min = 32 + 2n
. sAXPY
single-precision a x plus y. (triad).
, sAXPY SAXPY
. )
( ).

,
DOT, sAXPY + x> y y y + x
x, y Rn {j , yj }n
j=1
3
.
DOT
for i = 1 : n
    s = s + x(i)*y(i)
end

sAXPY
for i = 1 : n
    y(i) = y(i) + a*x(i)
end

,
.
,
,
. Load/Store,
, :
DOT
LOAD x, y, s
for i = 1 : n
    s = s + x(i)*y(i)
end
STORE s

sAXPY
LOAD x, y, a
for i = 1 : n
    y(i) = y(i) + a*x(i)
end
STORE y

. , O(n),
.
, .
3
MATLAB , APL .
. .. MATLAB DOT x0 y sAXPY
a x + y .

DOT
LOAD s
for i = 1 : n
    LOAD x(i), y(i)
    s = s + x(i)*y(i)
end
STORE s

sAXPY
LOAD a
for i = 1 : n
    LOAD x(i), y(i)
    y(i) = y(i) + a*x(i)
    STORE y(i)
end

DOT = 2n + 2 = 2n
O(1) sAXPY
= 3n + 1 = 2n O(1)
.
min ,
.

DOT sAXPY

DOT sAXPY. DOT, . ,
, 3, ( )
, (
, )
)

.
,
DOT 3.5.7 , cond(f ; X)

k[x;y]k2
.
|x> y|

>

. sn = x y

s1
s2

= fl(x1 y1 ) = x1 y1 (1 + 1 )
= fl(
s1 + fl(x2 y2 ))
= (x1 y1 (1 + 1 ) + x2 y2 (1 + 2 ))(1 + 3 )
= x1 y1 (1 + 1 )(1 + 3 ) + x2 y2 (1 + 2 )(1 + 3 )

|i | u.

sn

n+1
Y

x1 y1

j=1
j 6= 2
...x3 y3

n+1
Y
j=3

(1 + j ) + x2 y2

n+1
Y

(1 + j ) +

j=2

(1 + j ) + + xn yn

n+1
Y

(1 + j ).

j=n


nu
1 nu
nu(1 + nu + (nu)2 + )
nu + O(u2 )

3.5.1 :

sn

= x1 y1 (1 + n ) + x2 y2 (1 + n ) +
...x3 y3 (1 + n1 ) + + xn yn (1 + 2 ).

:
DOT
x1 , ..., xn , y1 (1 + n ), y2 (1 + n ), ..., yn (1 + 2 ),

|j |

ju
.
1 ju

fl(x> y) = (x + x)> y = x> (y + y),


|x| < n |x|, |y| < n |y|.
.
4.2.1. . ,
( ).
.
, , , .
,

|x> y fl(x> y)|

n
X

|xi ||yi | = n |x|> |y|

i=1
>

nu|x| |y| + O(u2 )

|x> y| |x|> |y| , . .


, 0 xi yi , i = 1 : n.
. . ,
.

.
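The forward error bound for the recursively summed inner product, |x^T y - fl(x^T y)| <= gamma_n |x|^T |y| with gamma_n = nu/(1 - nu), can be checked experimentally. The sketch below assumes IEEE double arithmetic (u = 2^-53) and uses exact rational arithmetic from `fractions` as the reference. The function names are hypothetical.

```python
from fractions import Fraction
import random

def dot_float(x, y):
    """Inner product accumulated left to right in double precision."""
    s = 0.0
    for xi, yi in zip(x, y):
        s += xi * yi
    return s

def error_and_bound(x, y):
    """Return (exact forward error, gamma_n * |x|^T |y|) as Fractions."""
    n = len(x)
    u = Fraction(1, 2 ** 53)                      # unit roundoff of IEEE double
    gamma_n = n * u / (1 - n * u)
    exact = sum(Fraction(a) * Fraction(b) for a, b in zip(x, y))
    abs_sum = sum(abs(Fraction(a) * Fraction(b)) for a, b in zip(x, y))
    return abs(Fraction(dot_float(x, y)) - exact), gamma_n * abs_sum

random.seed(0)
x = [random.uniform(-1, 1) for _ in range(100)]
y = [random.uniform(-1, 1) for _ in range(100)]
err, bound = error_and_bound(x, y)
print(float(err), float(bound))
```

In practice the observed error is usually far below the bound, which is a worst-case estimate.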


4.2.2
1

C C + ab> , C Rn1 n2 , a Rn1 , b Rn2 ,


C
-1 ab> . = 2n1 n2 min = 2n1 n2 + n1 + n2 ,

rank-1
=1+
min

1
1
+
.
2n1
2n2

(4.1)

:
for j = 1 : n2
    LOAD b(j)
    for i = 1 : n1
        LOAD c(i,j), a(i)
        c(i,j) = c(i,j) + a(i)*b(j)
        STORE c(i,j)
    end
end
j
n2 (2n1 + n1 + 1) . = 3n1 n2 + n2 n2
sAXPY.
a,
2n1 n2 + n1 + n2 .
. , , .
:
LOAD a
for j = 1 : n2
    LOAD b(j)
    for i = 1 : n1
        LOAD c(i,j)
        c(i,j) = c(i,j) + a(i)*b(j)
        STORE c(i,j)
    end
end
, ,
n1 , n2
.
.
min
;
C, a, b :


C11
C21
..
.
..
.

...

C1,k2

..

..
.
..
.
..
.

CI,J
..

Ck1 ,1

...

Ck1 ,k2

C11
C21
..
.
..
.

Ck1 ,1

...

C1,k2

..

..
.
..
.
..
.

CI,J
..

...

Ck1 ,k2

a1
..
.

aI
..
.

bJ

ak1

, C, a, b,
for J = 1 : k2
for I = 1 : k1
(* CIJ = CIJ + aI b>
J *)

C((I 1)m1 + 1 : Im1 , (J 1)m2 + 1 : Jm2 ) =


C((I 1)m1 + 1 : Im1 , (J 1)m2 + 1 : Jm2 )+
a((I 1)m1 + 1 : Im1 ) (b((J 1)m2 + 1 : Jm2 ))>
end
end

n1 n2
m1 m2 , nj = mj kj .
.
m1
a, CIJ = CIJ + aI b>
J
, . rank-1 = 1 + 1 + 1 .
min

2m1

2m2

= ((2m1 m2 +m1 )k1 +m2 )k2 = 2n1 n2 +n1 k2 +n2

=1+

1
1
1
1
+
1+
+
= min .
2n1
2m2
2n1
2n2

.
1 m2 K,

min 1 +

1
1
+
2n1
2K

(4.2)

n2 < K
= 1 + 2n1 1 + 2n1 2 .
:


min
,
.

.
, :

(
, , .)
(. min ).
;
min , ,
1.
2.
3.

LOAD

STORE

(
), ,
.
. , :

min . .. DOT .
(. 2.2).

LOAD .


.
, . ..
.

A = xy > ,
A

fl(ij ) = ij (1 + ij ) = xi yj (1 + ij )
|ij | u.

fl(A) = xy > + E

|E| |x||y|> u, .
. , x, y

fl(A) = xy > + E = (x + x)(y + y)T .


E = (x +
x)(y + y)T xy > .

E = [x, x + x]

0 1
1 0

(y + y)>
y>

2 n
.
. ,

.
: :
1 B A + xy >
A, . fl(B) = (A + E) + xy > E = ij i j ,

|E| u|xy > |.

4.2.3
,
y y + Ax, GAXPY = General A x
plus y, MV ( Matrix Vector).
1
+ n12 .
: = 2n1 n2 , min = n1 n2 + 2n1 + n2 , min = 12 + 2n
1
min
DOT sAXPY.
DOT/ ij / . : y A x,

i i + a>
i,: x = i +

n2
X

ik k , i = 1, ..., n1 .

k=1

sAXPY/ ji / . y A:

yy+

n2
X

a:,k k .

k=1

:
DOT / ij
for i = 1 : n1
    for j = 1 : n2
        y(i) = y(i) + A(i,j)*x(j)
    end
end

sAXPY / ji
for j = 1 : n2
    for i = 1 : n1
        y(i) = y(i) + A(i,j)*x(j)
    end
end
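Both loop orderings can be sketched in Python on lists of lists. The function names are hypothetical. The ij version finishes one element of y at a time (a DOT with a row of A), while the ji version updates all of y with one column of A at a time (an sAXPY).

```python
def gaxpy_ij(A, x, y):
    """y + A*x, row oriented: the inner loop is a dot product with row i of A."""
    y = list(y)
    for i in range(len(A)):
        for j in range(len(x)):
            y[i] += A[i][j] * x[j]
    return y

def gaxpy_ji(A, x, y):
    """y + A*x, column oriented: the inner loop is an sAXPY with column j of A."""
    y = list(y)
    for j in range(len(x)):
        for i in range(len(A)):
            y[i] += A[i][j] * x[j]
    return y

A = [[1, 2], [3, 4], [5, 6]]
print(gaxpy_ij(A, [1, 1], [0, 0, 0]), gaxpy_ji(A, [1, 1], [0, 0, 0]))
```

The arithmetic is identical; only the order in which A is traversed in memory changes.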

.
, . DOT ij , sAXPY
ji. , .. A 4 .
4
:
.

, ... = 2n1 n2 ,
,
DOT = 2n1 n2 + 2n1 sAXPY = 3n1 n2 + n2 .
, ..
DOT n2
x. DOT = 2n1 + n1 n2 + n2 , = min = 21 + 2n1 1 + n12
.
sAXPY
y . DOT x
min
(read only). , sAXPY, y
. ,
.
.
,

MV. :

y1
..
.

yI
..
.

yk1

A11
A21
..
.
..
.

...
..

A1,k2
.

AI,J
..

Ak1 ,1

...

..
.
..
.
..
.

x1
..
.

xk2

Ak1 ,k2

y1
..
.

yI
..
.

yk1

:
for I = 1 : k1
(* yI = yI + AI1 x1 + ... + AI,k2 xk2 *)
for J = 1 : k2

y((I 1)m1 + 1 : Im1 ) = y((I 1)m1 + 1 : Im1 )+


A((I 1)m1 + 1 : Im1 , (J 1)m2 + 1 : Jm2 )x((J 1)m2 + 1 : Jm2 )
end
end

k1 , k2
yI = yI +AIJ xJ = m1 m2 +2m1 +
m2 yI
. , K m1 + (1)

=
=

k1 (m1 + k2 (m1 m2 + m1 + m2 ))

(4.3)

n1 + n1 n2 + n1 k2 + k1 n2 .

. K
n1 + n2 + (1), k1 = 1, k2 = 1 = n1 + n1 n2 + n1 k2 + n2 ,
= 2n1 + n1 n2 + n2 min GAXPY.


4.1: .
for ? = 1 : n?
for ? = 1 : n?
for ? = 1 : n?

c(i,j) = c(i,j) + a(i,k)*b(k,j)
end
end
end
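The six possible loop orderings can be generated mechanically. The Python sketch below (function name and the `order` argument are hypothetical) runs the update C = C + AB for any permutation of the indices i, j, k and confirms that all orderings give the same result. They differ only in how A, B and C are traversed.

```python
from itertools import permutations, product

def matmul_update(A, B, C, order="ijk"):
    """Return C + A*B, looping over i, j, k in the given nesting order (outermost first)."""
    limits = {"i": len(A), "j": len(B[0]), "k": len(B)}
    C = [row[:] for row in C]
    for triple in product(*(range(limits[name]) for name in order)):
        idx = dict(zip(order, triple))
        i, j, k = idx["i"], idx["j"], idx["k"]
        C[i][j] += A[i][k] * B[k][j]
    return C

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
Z = [[0, 0], [0, 0]]
results = {"".join(p): matmul_update(A, B, Z, "".join(p)) for p in permutations("ijk")}
print(results)
```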


MV
.

i = (ai,: + ai,: )> x, |ai,: | n |ai,: |.


y = (A + A)x, |A| n |A|


|y y| n |A||x|.

4.2.4 - :

C = C + AB,

C Rn1 n2 , A Rn1 n3 , B Rn3 n2 .

(4.4)

(4.4) := 2n1 n2 n3 ...


min := 2n1 n2 + n1 n3 + n2 n3 5 ,

min =

1
n3

1
2n1

1
2n2 .

nj n min = O( n1 ),
. 4.1
(4.4)
A, B, C .
?
(i, 1), (j, 2), (k, 3). , .
(= triply nested loop), 3! = 6 4.1,
4.2.
, .
5
,
.


x
=

x
=

x
=

4.1: [7].

4.2:
[16, page 14].

ijk
DOT
A . ., B . .
jik
DOT
B . ., A . .
ikj
sAXPY
GAXPY .
B .
jki
sAXPY
GAXPY .
A .
kij
sAXPY

B .
kji
sAXPY

A .


.
ikj ,

= n1 n3 + n1 n2 n3 (

1
2
+
),
m1
m3

m3 m1 + m3 K 1 mj nj .
mj . , C C + AB
r = n3 n1 , n2 , -r.
NB,
, .
CIJ = CIJ + AIK BKJ , on-chip.
, NB
3NB2 w < K.

,

. BLAS-3
RISC [5]. 6 , Fortran
.
performance portability ( )
, .
[5] )

Fortran, )
.
;

.

[5].
/ (..
1 )
BLAS.
[5]; , blocking (), loop
unrolling ( ), block copying ( ) .
4.4.

chip.
Kasparov Deep Blue
, ,
6

ftp://ftp.enseeiht.fr/pub/numerique/BLAS/RISC.

(block size) .
, [5], NB
3NB2 w < CS, CS w bytes ... .
.
, [14] .
:
1.
( ).
2.
.
3.
.
. ..
[15]. ,
, ,
, . , , .
, brute force :
(
NB),
.
. .

(
).
Automatically Tuned Linear Algebra Software (ATLAS)
BLAS pipelining. . URL
www.netlib.org/atlas/index.html/
:
, ATLAS
,
.
, , ,
[30, 19, 18, 20].


.

C = AB ci = Abi , i = 1 : n

ci = (A + Ai )bi , |Ai | n |A|,


. .

fl(AB) = (A + A)B,

|A| n |A||B||B 1 |

n |A||B|.
|C C|
,
|AB| |A||B|.

4.2.5
n (n3 ) : n2
n n 1 , .
2n3 n2 .
. O(nk ) k < 3.
, (
).
V. Strassen [29] .
B , C n = 2k .

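\[
\begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}
=
\begin{pmatrix}
B_{11}C_{11} + B_{12}C_{21} & B_{11}C_{12} + B_{12}C_{22} \\
B_{21}C_{11} + B_{22}C_{21} & B_{21}C_{12} + B_{22}C_{22}
\end{pmatrix}
=
\begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix}
\begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix}.
\]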

Bij , Cij n/2.


,
. 8 4 . 2 2
, :

\[ T(n) = 8\,T\!\left(\frac{n}{2}\right) + 4\left(\frac{n}{2}\right)^2 \]
$T(n) = \Theta(n^3)$, . .
1969 Volker Strassen [29] 3 (!!!)
Strassen n 4.7nlog2 7

. Strassen
. ,

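\[
\begin{aligned}
P_1 &= (B_{11} + B_{22})(C_{11} + C_{22}), & P_2 &= (B_{21} + B_{22})\,C_{11}, \\
P_3 &= B_{11}(C_{12} - C_{22}), & P_4 &= B_{22}(C_{21} - C_{11}), \\
P_5 &= (B_{11} + B_{12})\,C_{22}, & P_6 &= (B_{21} - B_{11})(C_{11} + C_{12}), \\
P_7 &= (B_{12} - B_{22})(C_{21} + C_{22}),
\end{aligned}
\]
\[
\begin{aligned}
A_{11} &= P_1 + P_4 - P_5 + P_7, & A_{12} &= P_3 + P_5, \\
A_{21} &= P_2 + P_4, & A_{22} &= P_1 + P_3 - P_2 + P_6.
\end{aligned}
\]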

\[ T(n) = 7\,T\!\left(\frac{n}{2}\right) + 18\left(\frac{n}{2}\right)^2 \qquad (4.5) \]

7 18 (
n = 2). 4.5

\[ T(n) = \Theta(n^{\log_2 7}) = \Theta(n^{2.81}). \]


:
4.2.1 (Strassen).
n = 2k (n2.81 ) .
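A compact recursive sketch of Strassen's scheme in Python, for square matrices stored as lists of lists whose order is a power of two. It recurses down to 1 by 1 blocks for clarity, and the helper names are hypothetical. A practical implementation would switch to the conventional algorithm below some crossover size.

```python
def add(X, Y):
    return [[a + b for a, b in zip(r, s)] for r, s in zip(X, Y)]

def sub(X, Y):
    return [[a - b for a, b in zip(r, s)] for r, s in zip(X, Y)]

def quadrants(M):
    """Split M into its four half-size blocks M11, M12, M21, M22."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])

def strassen(B, C):
    """Return A = B*C using 7 recursive block multiplications per level."""
    if len(B) == 1:
        return [[B[0][0] * C[0][0]]]
    B11, B12, B21, B22 = quadrants(B)
    C11, C12, C21, C22 = quadrants(C)
    P1 = strassen(add(B11, B22), add(C11, C22))
    P2 = strassen(add(B21, B22), C11)
    P3 = strassen(B11, sub(C12, C22))
    P4 = strassen(B22, sub(C21, C11))
    P5 = strassen(add(B11, B12), C22)
    P6 = strassen(sub(B21, B11), add(C11, C12))
    P7 = strassen(sub(B12, B22), add(C21, C22))
    A11 = add(sub(add(P1, P4), P5), P7)
    A12 = add(P3, P5)
    A21 = add(P2, P4)
    A22 = add(sub(add(P1, P3), P2), P6)
    top = [r1 + r2 for r1, r2 in zip(A11, A12)]
    bottom = [r1 + r2 for r1, r2 in zip(A21, A22)]
    return top + bottom

print(strassen([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```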

[4, . 740-743] Strassen


.
Strassen .

.
n 6= 2k , .
(= embed) m = 2r p n
Strassen r .
m = 2k n 2k1 ,
.. n = 2k1 + 1.
, n
Strassen,
. ,
Strassen, ...
(n2 ), 2.81..
Strassen7 .
Strassen . David Bailey [2].

.
,
,
. , Strassen . [16].
:
7

1987 Coppersmith Winograd 2.376 [3].

4.2.2 (Brent). A, B Rnn n = 2k C =
AB Strassen n0 = 2r+1
:
. C

n log2 12 2

kC Ck ( )
(n0 + 5n0 ) 5n ukAkkBk + O(u2 ).
n0
kAk := maxi,j |ij |.

4.2.2. n0 = 1 6n3,58 .
4.2.3. n0 = n/2 3n2 + 25n

Strassen .
[23].

nu|A||B| + O(u2 ).
|C C|

(4.6)

Webb Miller (4.6)


n3 .

4.2.6 BLAS

Basic Linear Algebra Subroutines, BLAS.
BLAS , . , ,
.
(.. ) ,
.
,
8 .
BLAS : .. (S =
, D = , C = , Z = BLAS-1, (.. SWAP =
, DOT, NRM2 = , ASUM =
), BLAS-2,3
(GE = , SY = , TR = , HE = ), (.. MV = , MM =
, SM = 9 , R = ).
.. DGEMM BLAS-3 ...
:

C op(A)op(B) + C, op(X) = X, X > , X


8

LAPACK , The LAPACK strategy for combining


efficiency with portability is to construct the software as much as possible out of calls to the BLAS;
the BLAS are used as building blocks. The efficiency of LAPACK software depends on the efficient
implementation of the BLAS being provided by computer vendors (or others) for their machines.
Thus the BLAS form a low-level interface between LAPACK software and different machine architectures. [1, page 50].
9
.


4.2: BLAS.

C m n.
. op, ..
B , .
. [1, Appendix C].
BLAS (4.4) ..
n1 = n2 = 1 DOT, .
n1 , n2 , n3 .
BLAS-1: nj
, . /. .. DOT (n3 > 1), sAXPY (n2 > 1
n3 > 1.) O(n)
min = O(1). .
BLAS-2: , . /.
n
O(n2 ),
BLAS-1.
BLAS-3: nj > 1 j = 1, 2, 3, . /, .. n1 =
n2 = n3 = n O(n2 ) O(n3 ) .
BLAS
. 4.210 11 .
BLAS
RISC 4.312 .

, [16, Section 1.4.9]. K (.
10

RGB.
Kyle Gallivan .
12
BLAS Dakota Scientific Software IBM.
11

c
4.2. 2008,
. 131

4.3: BLAS [32, page 63].


Sun Sparc 50 MHz IBM/6000-530 IBM/6000-590
12 MFLOPS
10 MFLOPS
60 MFLOPS
19
27
157
41
46
250

x + y
y + Ax
C + AB

).
C = C + AB n := n1 = n2 = n3 K 3n2 < M.
B, C N
, 13 n = N m m,
.

B = [B1 , ..., BN ],

C = [C1 , ..., CN ], Bj , Cj Rnm .

N ( ) Bj , Cj A, . n(2m + 1) K.
C = AB Cj = ABj = [A:,1 , ..., A:,n ]Bj .
( ):
for j = 1 : N
(* LOAD Bj , Cj *)
for k = 1 : n
(* LOAD A:,k . 1 Cj *)
end
(* STORE Cj . *)
end

N
X
3 n2
1 =
(n2 + 3n(n/N )) = n2 (N + 3) 2n2 ( +
)
2
K
j=1

K.
N N
m, 3m2 K DOT ijk
:
for I = 1 : N
for J = 1 : N
(* LOAD CIJ *)
for K = 1 : N
(* LOAD AIK , BKJ CIJ *)
end
(* STORE CIJ *)
end
end
2 =

PN PN
i=1

2 = 2n2 (1 + N )

j=1 (2m

PN
k=1

2m2 ), .

n3 2 3

,
K

13
.
.


1
1
n
=
.
2
2
3K

n K, .

4.3

,
. !
:
O(nk ) , k (
3) n ( ) .
,
(.. ) ,
N = O(n2 ) .
O(N k/2 ), . k = 2 (.. , ),
14 .
: n
: ..
n
.

4.4
, . DOT, DAXPY.
,
(. ) .
4.4 (BLAS-1),
Fortran.
D(DOT) .
DDOT Fortran function,
x, y dx(1), dx(1 + incx), dx(1 + 2incx), ..., dx(1 + (n 1)incx)
dy(1), dy(1 + incy), dy(1 + 2incy), ..., dy(1 + (n 1)incy).
, : (= label) 20
incx = incy = 1.
.
14

G.W. Stewart.


10 dtemp. incx
= incy = 1 :

(= loop unrolling) . ,
m := n mod nL nL
5. ,
0 m < nL 1 . , (50) .
, L, S,
M A.

(.. , )
B. (.. 10)
n LLMASB, . 2 . ,
3 , B, . 50
:
LLMALLMALLMALLMALLMALLMASB.
S B n b nnL c+nL 1.

15 .
nL , .. nL := n .
DDOT functions,
nL . nL = 5 . ,

, S.
4.5 sAXPY.
Fortran-77 BLAS Jack Dongarra .
,
.
Netlib16 . C C.
f2c
(Fortran to C translator ), (..
Fortran)
.
nL sAXPY
DOT 17 .
Fortran-77 DGEMV. BLAS-1.
15

loop overhead .

16
17

thrashing .


      SUBROUTINE DGEMV( TRANS,M,N,ALPHA,A,LDA,X,INCX,BETA,Y,INCY )
      .......
*     .. Array Arguments ..
      DOUBLE PRECISION   A( LDA, * ), X( * ), Y( * )
      ......
*  DGEMV performs one of the matrix-vector operations
*     y := alpha*A*x + beta*y,   or   y := alpha*A'*x + beta*y,
*  where alpha and beta are scalars, x and y are vectors and A is an
*  m by n matrix.
      ....
      ......
      IF( LSAME( TRANS, 'N' ) )THEN
*        Form  y := alpha*A*x + y.
         JX = KX
         IF( INCY.EQ.1 )THEN
            DO 60, J = 1, N
               IF( X( JX ).NE.ZERO )THEN
                  TEMP = ALPHA*X( JX )
                  CALL DAXPY(M, TEMP, A(1,J), 1, Y, 1)
               END IF
               JX = JX + INCX
   60       CONTINUE
         ELSE
      ........
[5]

DGEMM.

SUBROUTINE DGEMM (TRANSA,TRANSB,M, N, K, ALPHA, A, LDA, B, LDB,


$
BETA, C, LDC )
.....
.. Array Arguments ..
DOUBLE PRECISION A( LDA, * ), B( LDB, * ), C( LDC, * )
..

*
*
* Purpose
* =======
*
* DGEMM performs one of the matrix-matrix operations
*
C := alpha*op( A )*op( B ) + beta*C,
*
*
* where op( X ) is one of
*
op( X ) = X
or
op( X ) = Xt,
*
*
* alpha and beta are scalars, and A, B and C are matrices, with op( A )
* an m by k matrix, op( B ) a k by n matrix and C an m by n matrix.
.....
Form C := alpha*A*B + beta*C.
*
*


      DO 70, L = 1, K, NB
         LB = MIN( K - L + 1, NB )
         DO 60, I = 1, M, NB
            IB = MIN( M - I + 1, NB )
            DO 62 II = I, I + IB - 1
               DO 61 LL = L, L + LB - 1
                  AA(LL-L+1,II-I+1)=ALPHA*A(II,LL)
   61          CONTINUE
   62       CONTINUE
            DO 50, J = 1, N, NB
               JB = MIN( N - J + 1, NB )
*
*              Performs matrix-matrix multiplications
*
               IF ( ( MOD(IB,2).EQ.0 ) .AND. ( MOD(JB,2).EQ.0 ) ) THEN
                  CALL DGEMML2X2_NN( IB, JB, LB, AA, NB,
     $                               B( L, J ), LDB, C( I, J ), LDC )
               ELSE
                  CALL DGEMML_NN( IB, JB, LB, AA, NB,
     $                            B( L, J ), LDB, C( I, J ), LDC )
               END IF
   50       CONTINUE
   60    CONTINUE
   70 CONTINUE
      .....

DGEMML2X2_NN . ,
(.
a + b c). :

      SUBROUTINE DGEMML2X2_NN( M, N, K, A, LDA, B, LDB,
     $                         C, LDC )
*     .. Scalar Arguments ..
      INTEGER            M, N, K, LDA, LDB, LDC
*     .. Array Arguments ..
      DOUBLE PRECISION   A( LDA, * ), B( LDB, * ), C( LDC, * )
      ....
*
*     Form  C := alpha*A*B + C.
*
      DO 70, J = 1, N, 2
         DO 60, I = 1, M, 2
            ...
            T11 = C(I  ,J  )
            T21 = C(I+1,J  )
            T12 = C(I  ,J+1)
            T22 = C(I+1,J+1)
            DO 50, L = 1, K
               B1 = B(L,J  )
               B2 = B(L,J+1)
               A1 = A(L,I  )
               A2 = A(L,I+1)
               T11 = T11 + B1*A1
               T21 = T21 + B1*A2
               T12 = T12 + B2*A1
               T22 = T22 + B2*A2
   50       CONTINUE
            C(I  ,J  ) = T11
            C(I+1,J  ) = T21
            C(I  ,J+1) = T12
            C(I+1,J+1) = T22
   60    CONTINUE
   70 CONTINUE

daxpy
:
lib BLAS
.
ref BLAS .
hnd .
IBM T20
Intel Pentium III 700 MHz. Fortran 77 Digital Fortran Compaq18 .
BLAS. ref lib
4 .
:
0 .
1 Local (minimal) optimizations occur within the source program unit and include recognition of common subexpressions and the expansion of multiplication and division.
18

. http://www5.compaq.com/fortran/index.html.


[Figure: "DAXPY on 700 MHz Pentium III"; x-axis n (0 to 5000), y-axis mflops/s (0 to 350); curves lib3 (library), ref0 to ref3, hnd0 to hnd3.]

4.3: DAXPY 3
.

2 Global optimizations include such optimizations as data-flow analysis, code


motion, strength reduction, split-lifetime analysis, and instruction scheduling.
3 Additional global optimizations improve speed at the cost of extra code size.
These optimizations include loop unrolling, code replication to eliminate
branches, and padding certain power-of-two array sizes for more efficient
cache use.
4.3.
.
.
,
.


(hand0) 5 (lib3).

4.5

Kyle Gallivan (CSRD) Illinois [16].
BLAS-1 LINPACK EISPACK (.. [8]),
. [25].
,
(.. Cray-1), BLAS-2
[6].
BLAS-3 [9, 10]. BLAS-3
LAPACK [1],
LAPACK .
.

(= virtual memory) ,
[26, 27].
. [11, 21].
BLAS

.. . [13, 14, 12]. ,
(= workstations)
.. . [15] DGEMM DEC
3000/80019 15 MFLOPS 166 MFLOPS. BLAS-3 RISC [5]20 .
C. Ueberhuber
[31, . 262-271].
[22], , Strassen, , BLAS-3.

(.. ),

.
Fortran BLAS Internet
URL http://www.netlib.org/blas.
19
DEC 3000/800 21064 Alpha-AXP 200
MFLOPS.
20
ftp://ftp.enseeiht.fr/pub/numerique/BLAS/RISC.


4.6


4.6.1. ;

. [, . 4] : 1)
C C + AB A, B, C , , 2)
, 3) , 4-5)
, 6) 7)
.
4.6.2. : x, u, v Rn
I n. (I uv > )x O(n2 )
.

. : $(I - uv^\top)x = x - u(v^\top x)$


2n 1 := v > x, 2n x u,
4n 1 .
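The point of the exercise, never forming the n by n matrix, can be sketched in Python as follows. The function name and the intermediate scalar `sigma` are hypothetical. The cost is one inner product plus one vector update, i.e. O(n) operations.

```python
def apply_rank1_correction(u, v, x):
    """Return (I - u v^T) x without building the matrix: x - u * (v^T x)."""
    sigma = sum(vi * xi for vi, xi in zip(v, x))      # inner product v^T x
    return [xi - sigma * ui for xi, ui in zip(x, u)]  # vector update x - sigma*u

print(apply_rank1_correction([1, 2], [3, 4], [5, 6]))
```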
4.6.3. p, q, x Rn i , i i
A Rnn ij = ji = i j 1 i j n.
... Ax
x Rn ; .

. : A 6= pq T !

O(n log n) . ( : A
A, .
, . : A =
[A11 , A12 ; A12 , A22 ] A12 , A12 1 ,
A.)
4.6.4. : BLAS-3 n n O(n2+ )
... 0 < < 1.

4.6.5. : C = AB , A
Hessenberg B C Hessenberg.

. : j > i + 1, cij = 0.
cij j > i + 1.
:

A(i, :) B(:, j) = A(i, 1 : i + 1) B(1 : i + 1, j) = 0


B .
4.6.6. A Rmn , () p Rm , xi Rn
i = 1, ..., s for i = 1 : s, yi = p + Axi , end;
1. ...
2.
, (.. MATLAB 6 BLAS)
.

4.6.7. u, v Rn
.


D(u, v) := ku vk2 u, v ( ,
.. )
BLAS. .

q
D(u, v) =

(u v)> (u v) = kuk22 + kvk22 2u> v

, D
DOT.
4.6.8 (, , 03). :
( MATLAB)
C AB ,
[c1 , ..., cn ] = A[b1 , ..., bn ], cj , bj C, B BLAS-2 :

for j = 1:n, C(:,j) = A*B(:,j); end;

. , BLAS-3 BLAS-2.

4.7
4.7.1 ( 03). B = B + (xy > )p ,
B Rnn , x, y Rn p . min = min /,
... min
.
min n.

. min ) ,
)
) (. .
2 ). , min = 2n2 + 2n + 1 ( p.
, O(1) ). , (xy > )p
p1

z
}|
{
x (y > x) (y > x) y > : = y > x .
2n 1 , ( p1 x), . 1 + n , B = B + ( p1 x)y > ,
. 2n2 . = 2n2 + 3n (
). p
x , n2
. min = (2n2 + 3n)/(2n2 + 2n + 1).
O(n2 ) . O(n3 )
p.


4.7.2. A, B Rnn .
AB (. )
...

. MATLAB:

C=zeros(n);
for i=1:n,
for j=1:i,
C(i, j)=A(i, j:i)*B(j:i, j);
end
end
i, i C , i i 1 . (2i 1).
:

\[
\sum_{i=1}^{n}\sum_{j=1}^{i} \bigl(2(i - j + 1) - 1\bigr)
= \sum_{i=1}^{n}\sum_{k=1}^{i} (2k - 1)
= \sum_{i=1}^{n} \bigl((i + 1)\,i - i\bigr)
= \sum_{i=1}^{n} i^2
= \frac{n(n + 1)(2n + 1)}{6}
\]

.
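The operation count can be confirmed by instrumenting the algorithm. The Python sketch below (hypothetical function name) multiplies two lower triangular matrices, touching only the nonzero parts, and counts 2m - 1 operations for each inner product of length m.

```python
def lower_triangular_product(A, B):
    """Return (C, flops) for C = A*B with A, B lower triangular (lists of lists)."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    flops = 0
    for i in range(n):
        for j in range(i + 1):
            terms = [A[i][k] * B[k][j] for k in range(j, i + 1)]
            C[i][j] = sum(terms)
            flops += 2 * len(terms) - 1   # len(terms) multiplications, len(terms)-1 additions
    return C, flops

C, flops = lower_triangular_product([[1, 0], [2, 3]], [[4, 0], [5, 6]])
print(C, flops)
```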
4.7.3. A Rnn , x, u, v Rn y = (A + uv > )x.
, min , ...
( )
O(n)
( load) ( store).

. :

\[ y = (A + uv^\top)x = Ax + (v^\top x)\,u \]


n+1 DOT ( n Ax), sAXPY.
= (n + 1)(2n 1) + 2n = 2n2 + 3n 1.
min = n2 + 4n,

min =

min
n2 + 4n
= 2

2n + 3n 1

O(n) min :
LOAD x, u, v
$y = (v^\top x)\,u$
for i = 1 : n,
    LOAD A(i, :)
    $y_i = y_i + A(i, :)\,x$
end
STORE y
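A Python sketch of the same organization (hypothetical function name): the rank-1 term is folded into y first, and A is then read one row at a time, so apart from the three vectors only one row of A is needed at any moment.

```python
def rank1_perturbed_matvec(A, u, v, x):
    """Return (A + u v^T) x computed as A x + (v^T x) u, reading A row by row."""
    sigma = sum(vi * xi for vi, xi in zip(v, x))   # v^T x, one DOT
    y = [sigma * ui for ui in u]                   # y = (v^T x) u
    for i, row in enumerate(A):                    # one DOT per row of A
        y[i] += sum(aij * xj for aij, xj in zip(row, x))
    return y

print(rank1_perturbed_matvec([[1, 2], [3, 4]], [1, 1], [1, 0], [5, 6]))
```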
4.7.4. x, y Rn .

1. A := xy T .
2. .

. 4.2.2
.
4.7.5. 2 2
A, B R22 .

. A, B R22 :

A=

11
21

0
22

,B=

11
21

0
22

fl(AB) =

11 11 (1 + 1 )
0
(21 11 (1 + 2 ) + 22 22 (1 + 3 ))(1 + 4 ) 22 22 (1 + 5 )

B
, :
fl(AB) = A

A=

11 (1 + 1 )
0
21 (1 + 2 )(1 + 4 ) 22

,B=

11
21 (1 + 3 )(1 + 4 )

0
22 (1 + 5 )

ij ij
2u
| |2 | 2 =
ij
1 2u

ij ij
2u
| |2 | 2 =
ij
1 2u

4.7.6. 2 2

A R22 .

A=

:
fl(|A|)

= ((1 + 1 ) (1 + 2 ))(1 + 3)
= (1 + 1 )(1 + 3 )) (1 + 2 )(1 + 3 ))

, A =
fl(|A|) = |A|

(1 + 2

(1 + 20 )

, |2 |, |20 | 2 =

2u
1 2u


4.7.7. 2 2
|ij |.
nu
(n) := 1nu
(u ).

4.7.8. Hessenberg H Rnn x = [1 , r ] Rr


r < n. saxpy
B = (H 1 I)(H 2 I) (H r I)
.

.
Be1 . bj :=
Hbj1 j bj1 b0 = e1 . Hessenberg H
1 b1 = He1 1 e1 ,
2 b2 = Hb1 2 b1 3 ,
... j 1 n 1 bj j
. n 1 bj .
MATLAB .

x(2:n,1)=zeros(n-1,1);
for j=1:r
nn=min(r,n);
xt(1:n,1)=-l(j)*x(1:n,1);
for k=1:min(j,n)
xt(1:nn,1)=xt(1:nn,1)+x(k,1)*a(1:nn,k);
end
x(1:nn,1)=xt(1:nn,1);
end
4.7.9 (, , 02-makeup). A
s 1 , ..., s R .
Qs
n
j=1 (A j I)x, x R . A O(ns) .

. :

y=x;
for i=s:-1:1,
y=(A-t(i)*I)*y;
end
6(n 2) + 8 ,
yj , j = 2 : n 1 6 , y1
yn 4 .
A
(6(n 2) + 8)s = 6ns 4s = O(ns) .
4.7.10 (, , 02, 02-makeup).
1. ... DOT
sAXPY.
2. .

3. .

4.7.11 (, , 03). MATLAB


.
,
, , MATLAB .

for i = 1:n
for j = 1:n
if ((j==i-1)|(j==i+1)), A(i,j)=-1;
elseif (i==j), A(i,j)=2;
else A(i,j) =0;
end
end
end
for i = 1:n,
u(i,1) = sin(i*pi/n);
v(i,1) = cos((n-i+1)*pi/n);
x(i,1) = 1;
end
for k=1:s,
    x = (A+u*v')*x;
end

.
Toeplitz. :

A = toeplitz([2,-1,zeros(1,n-2)]);
u = sin([1:n]*pi/n)';
v = cos([n:-1:1]*pi/n)';
x = ones(n,1);
for k=1:s,
    x = A*x + u*(v'*x);
end

4.7.12 (Golub and van Loan [17]). U Rnn


(n, n)
. z 6= 0
U z = 0.

. n- , n , z . , n1,n1 n1 + n1,n n = 0
n1 = n1,n n /n1,n1 ,
n1,n1 6= 0.
.
4.7.13 (Golub and van Loan [17]). S, T Rnn
ST I . (ST I)x = b O(n2 )


. : :

S+ =

u>
Sc

, T+ =

v>
Tc

, b+ =

bc

S+ = S(k1 : n, k1 : n), T+ = T (k1 : n, k1 : n), b+ = b(k1 : n)


, , R . xc (Sc Tc I)xc = bc
wc = Tc xc ,

v > xc u> wc

.
x+ =
, =
xc

(S+ T+ I)x+ = b+ . x+
w+ = T+ x+ O(n k) .

. ,
ST , O(n3 )
. ,
, .
O(n2 ) .
.
, ..
.

S+ T+ I =

v > + u> Tc
Sc Tc I

(S+ T+ I)x+ = b+

v > + u> Tc
Sc Tc I

xc

bc

(Sc Tc I)xc = bc

( ) + (v > + u> Tc )xc =

= ( v > xc u> Tc xc )/( ).


xc wc = Tc xc , O(n k)
T+ x+ . v > xc
O(n k) . ,

T+ x+ =

+ v > xc
Tc xc

T+ x+ O(n k) .
, k = n, Sc =
nn , Tc = nn , bc = n n 1 O(n 1) + O(n
2) + O(1) = O(n2 ).


[1] E. Anderson, Z. Bai, C. Bischof, J. Demmel, J.J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, S. Ostrouchov, and D. Sorensen. LAPACK Users' Guide. SIAM, Philadelphia, 2nd edition, 1995.
[2] D.H. Bailey. Extra high speed matrix multiplication on the Cray-2. SIAM J. Sci. Statist. Comput., 9(3):603-607, May 1988.
[3] D. Coppersmith and S. Winograd. Matrix multiplication via arithmetic progressions. In Proc. 19th Annual Symp. Theory of Computing, pages 1-6, New York, 1987. ACM Press.
[4] T.H. Cormen, C.E. Leiserson, and R.L. Rivest. Introduction to Algorithms. McGraw-Hill, New York, 1990.
[5] M.J. Daydé and I.S. Duff. A blocked implementation of level 3 BLAS for RISC processors. Technical Report RT/APO/96/1, ENSEEIHT-IRIT, March 1996. Available from URL ftp://ftp.enseeiht.fr.
[6] J.J. Dongarra, J. Du Croz, S. Hammarling, and R.J. Hanson. An extended set of Fortran basic linear algebra subprograms. ACM Trans. Math. Softw., 14(1):1-17, March 1988.
[7] J.J. Dongarra, F.G. Gustavson, and A. Karp. Implementing linear algebra algorithms for dense matrices on a vector pipeline machine. SIAM Rev., 26(1):91-111, January 1984.
[8] J.J. Dongarra, J.R. Bunch, C.B. Moler, and G.W. Stewart. LINPACK Users' Guide. SIAM, Philadelphia, PA, 1979.
[9] J.J. Dongarra, J.J. Du Croz, I.S. Duff, and S.J. Hammarling. A set of level 3 basic linear algebra subprograms. ACM Trans. Math. Softw., 16:1-17, 1990.
[10] J.J. Dongarra, J.J. Du Croz, I.S. Duff, and S.J. Hammarling. Algorithm 679. A set of level 3 basic linear algebra subprograms: Model implementation and test programs. ACM Trans. Math. Softw., 16:18-28, 1990.
[11] J.J. Dongarra and A.R. Hinds. Unrolling loops in Fortran. Software: Practice and Experience, 9:219-229, 1979.
[12] J.J. Dongarra and D.W. Walker. Software libraries for linear algebra computations in high performance computers. SIAM Rev., 37(2):151-180, June 1995. Also in http://hpclab.ceid.upatras.gr/faculty/stratis/download/Dongarrawalker.zip.
[13] K. Gallivan, W. Jalby, and U. Meier. The use of BLAS3 in linear algebra on a parallel processor with a hierarchical memory. SIAM J. Sci. Statist. Comput., 8(6):1079-1084, Nov. 1987.
[14] K. Gallivan, W. Jalby, U. Meier, and A. Sameh. The impact of hierarchical memory systems on linear algebra algorithm design. International J. Supercomputer Applications, 2(1), 1988.
[15] E. Garcia, J.R. Herrero, and J.J. Navarro. Data prefetching for linear algebra on high performance workstations. Technical report, U. Pol. Catalunya, Barcelona, 1995.
[16] G. Golub and C.F. Van Loan. Matrix Computations. The Johns Hopkins University Press, Baltimore, 2nd edition, 1989.
[17] G.H. Golub and C.F. Van Loan. Matrix Computations. The Johns Hopkins University Press, Baltimore, 3rd edition, 1996.
[18] F. Gustavson, A. Henriksson, I. Jonsson, B. Kågström, and P. Ling. Recursive blocked data formats and BLAS's for dense linear algebra algorithms. In B. Kågström, J. Dongarra, E. Elmroth, and J. Wasniewski, editors, Applied Parallel Computing, 4th International Workshop (PARA'98), pages 195-206, Berlin, 1998. Springer-Verlag.
[19] F.G. Gustavson. Recursion leads to automatic variable blocking for dense linear-algebra algorithms. IBM J. Res. Develop., 41(6):737-756, 1997.
[20] F.G. Gustavson and I. Jonsson. Minimal-storage high-performance Cholesky factorization via blocking and recursion. IBM J. Res. Develop., 44(6):823-850, 2000.
[21] J.L. Hennessy and D.A. Patterson. Computer Architecture: A Quantitative Approach. Morgan Kaufmann, San Mateo, CA, first edition, 1990.
[22] N.J. Higham. Exploiting fast matrix multiplication within the level 3 BLAS. ACM Trans. Math. Softw., 16(4):352-368, 1990.
[23] N.J. Higham. Accuracy and Stability of Numerical Algorithms. SIAM, Philadelphia, 2nd edition, 2002.
[24] L. Komzsik and T. Rose. Substructuring in MSC/NASTRAN for large scale parallel applications. Computing Systems in Engineering, 2(2/3):167-173, 1991.
[25] C. Lawson, R.J. Hanson, D.R. Kincaid, and F.T. Krogh. Basic linear algebra subprograms for Fortran usage. ACM Trans. Math. Softw., 5(3):308-323, 1979.
[26] A.C. McKellar and E.C. Coffman, Jr. Organizing matrices and matrix operations for paged memory systems. Comm. ACM, 12(3):153-165, March 1969.
[27] C. Moler. Matrix computations with Fortran and paging. Commun. ACM, 15:268-270, 1972.
[28] W.H. Press, S.A. Teukolsky, W.T. Vetterling, and B.P. Flannery. Numerical Recipes in FORTRAN. The Art of Scientific Computing. Cambridge University Press, Cambridge, second edition, 1992.
[29] V. Strassen. Gaussian elimination is not optimal. Numer. Math., 13:354-356, 1969.
[30] S. Toledo. Locality of reference in LU decomposition with partial pivoting. SIAM J. Matrix Anal. Appl., 18(4):1065-1081, Oct. 1997.
[31] C.W. Ueberhuber. Numerical Computation, volume 1. Springer, Berlin, 1997.
[32] R. Wille. Advanced Scientific FORTRAN. John Wiley & Sons, Chichester, 1995.


4.4: Fortran-77 BLAS-1 DDOT.

      double precision function ddot(n,dx,incx,dy,incy)
c
c     forms the dot product of two vectors.
c     uses unrolled loops for increments equal to one.
c     jack dongarra, linpack, 3/11/78.
c
      double precision dx(1),dy(1),dtemp
      integer i,incx,incy,ix,iy,m,mp1,n
c
      ddot = 0.0d0
      dtemp = 0.0d0
      if(n.le.0)return
      if(incx.eq.1.and.incy.eq.1)go to 20
c
c        code for unequal increments or equal increments
c          not equal to 1
c
      ix = 1
      iy = 1
      if(incx.lt.0)ix = (-n+1)*incx + 1
      if(incy.lt.0)iy = (-n+1)*incy + 1
      do 10 i = 1,n
        dtemp = dtemp + dx(ix)*dy(iy)
        ix = ix + incx
        iy = iy + incy
   10 continue
      ddot = dtemp
      return
c
c        code for both increments equal to 1
c
c
c        clean-up loop
c
   20 m = mod(n,5)
      if( m .eq. 0 ) go to 40
      do 30 i = 1,m
        dtemp = dtemp + dx(i)*dy(i)
   30 continue
      if( n .lt. 5 ) go to 60
   40 mp1 = m + 1
      do 50 i = mp1,n,5
        dtemp = dtemp + dx(i)*dy(i) + dx(i + 1)*dy(i + 1) +
     *   dx(i + 2)*dy(i + 2) + dx(i + 3)*dy(i + 3) + dx(i + 4)*dy(i + 4)
   50 continue
   60 ddot = dtemp
      return
      end


4.5: Fortran-77 BLAS-1 DAXPY.

      subroutine daxpy(n,da,dx,incx,dy,incy)
c
c     constant times a vector plus a vector.
c     uses unrolled loops for increments equal to one.
c     jack dongarra, linpack, 3/11/78.
c
      double precision dx(1),dy(1),da
      integer i,incx,incy,ix,iy,m,mp1,n
c
      if(n.le.0)return
      if (da .eq. 0.0d0) return
      if(incx.eq.1.and.incy.eq.1)go to 20
c
c        code for unequal increments or equal increments
c          not equal to 1
c
      ix = 1
      iy = 1
      if(incx.lt.0)ix = (-n+1)*incx + 1
      if(incy.lt.0)iy = (-n+1)*incy + 1
      do 10 i = 1,n
        dy(iy) = dy(iy) + da*dx(ix)
        ix = ix + incx
        iy = iy + incy
   10 continue
      return
c
c        code for both increments equal to 1
c
c
c        clean-up loop
c
   20 m = mod(n,4)
      if( m .eq. 0 ) go to 40
      do 30 i = 1,m
        dy(i) = dy(i) + da*dx(i)
   30 continue
      if( n .lt. 4 ) return
   40 mp1 = m + 1
      do 50 i = mp1,n,4
        dy(i) = dy(i) + da*dx(i)
        dy(i + 1) = dy(i + 1) + da*dx(i + 1)
        dy(i + 2) = dy(i + 2) + da*dx(i + 2)
        dy(i + 3) = dy(i + 3) + da*dx(i + 3)
   50 continue
      return
      end


II
5.1
(
.1):
.1 A Rnn , b Rn . x Rn Ax = b.

.1 .
.1 Carl Friedrich Gauss, 1800
.
Gauss
([18]) G.W. Stewart1 .
.1; , 2
( !)
.
.1 . 1.1.
.1
..
.1 O(n3 )
O(n2 )

.. 1980
BLAS3 [12].
1
G.W. Stewart Maryland .
2
.



1.
.1
LINPACK. LAPACK LINPACK

.
2. [11, 8].
3.
MATLAB3 MATHWORKS, 7 : (
)
LINPACK.
4. 6 , MATLAB BLAS-3 LAPACK,
7 ... .
LINPACK benchmark


.
, 4 A


.. ,
O(n2 ) O(n3 ) , O(n)
.
Gauss.
.
Gauss, 20
( O(106 ) )
[17, 33].
.
[24, 4].
, .1 , .. 1.1.

3
4

MAtrix LABoratory.
5.3.


.. , , ,
A
( Cholesky .)
:
,
.
,

.
.1
.0 .
Golub
Van Loan ([22]) Higham ([24]). ,
.

5.2
There would be many things to say about this theory of matrices
which should, it seems to me, precede the theory of determinants.
[Arthur Cayley]

.1. ( ) ,
:
5.2.1. Ax = b A Rmn b Rm .
:

5 [A, b] A:
rank([A, b]) = rank(A)
y Rm y > A = 0 y > b = 0.
, x = x0 + x
x

x0 Ax = 0
n rank(A) .

5.2.1. A Rnn , Ax = b
.

5.2.1 . , (.. )
Cramer:
5

5.2.2 ( Cramer). A Rnn b Rn .
j (1 j n) x Rn Ax = b
:

\[ \xi_i = \frac{\det(A(i|b))}{\det(A)}, \qquad i = 1, \ldots, n. \]

A(i|b) i
A b.

Cramer .1, n (
det(A)), n2 n 1. . ,

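\[ \det(A) := \sum_{\sigma \in S_n} \mathrm{sgn}(\sigma)\, a_{1,\sigma(1)}\, a_{2,\sigma(2)} \cdots a_{n,\sigma(n)}, \]

where $\mathrm{sgn}(\sigma)$ is the sign of the permutation $\sigma$ and $S_n$ is the set of all permutations of $1, \ldots, n$. For example:

\[ S_2 = \{\{1, 2\}, \{2, 1\}\}, \qquad \det(A) = a_{11}a_{22} - a_{12}a_{21}, \]

\[ S_3 = \{\{1, 2, 3\}, \{2, 3, 1\}, \{3, 1, 2\}, \{1, 3, 2\}, \{2, 1, 3\}, \{3, 2, 1\}\}, \]
\[ \det(A) = a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} - a_{11}a_{23}a_{32} - a_{12}a_{21}a_{33} - a_{13}a_{22}a_{31}. \]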

n! (. Sn ).
n .
() n! :
..
10! = 3628800 n = 10,
and, by Stirling's formula, $n! \approx \sqrt{2\pi n}\,(n/e)^n$.
Cramer , 20
20! ..., .
7 100 Mflop/s, (1999),
.
Cramer [24].
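To make the n! growth concrete, the Python sketch below evaluates determinants directly from the permutation sum and uses them in Cramer's rule. It is meant only for tiny n; the function names are illustrative, and the implementation is a direct transcription of the definition rather than a practical method.

```python
from itertools import permutations
from math import e, factorial, pi, sqrt


def permutation_sign(perm):
    """Sign of a permutation, from the parity of its inversion count."""
    n = len(perm)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1


def det_leibniz(A):
    """Determinant as a sum over all n! permutations (only sensible for small n)."""
    n = len(A)
    total = 0.0
    for perm in permutations(range(n)):
        term = permutation_sign(perm)
        for i in range(n):
            term *= A[i][perm[i]]
        total += term
    return total


def cramer_solve(A, b):
    """Solve Ax = b by Cramer's rule: n + 1 determinants of order n."""
    d = det_leibniz(A)
    x = []
    for i in range(len(A)):
        Ai = [row[:i] + [b[r]] + row[i + 1:] for r, row in enumerate(A)]
        x.append(det_leibniz(Ai) / d)
    return x


def stirling(n):
    """Stirling's approximation sqrt(2 pi n) (n/e)^n of n!."""
    return sqrt(2 * pi * n) * (n / e) ** n


if __name__ == "__main__":
    print(cramer_solve([[2.0, 1.0], [1.0, 3.0]], [3.0, 5.0]))
    print(factorial(10), stirling(10))
```

Already for n = 10 each determinant visits 3628800 permutations, which is why the approach is abandoned in favor of elimination.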

. ,
.1.
.1
Cramer.

.1.
.1
B := A1 x = Bb.
.1. ,
:= A, := b , 1/


A, /.

(1/)j j .
. A1
A. , A
(.. ), A1
.
5.2.1.

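\[
A = \begin{pmatrix}
3 & -1/2 & 0 & 0 & 0 \\
-1 & 3 & -1/2 & 0 & 0 \\
0 & -1 & 3 & -1/2 & 0 \\
0 & 0 & -1 & 3 & -1/2 \\
0 & 0 & 0 & -1 & 3
\end{pmatrix}
\]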

() 3n2 = 13
.
(3n 2) 2 3n 2 () :

(1,1)    3.0000
(2,1)   -1.0000
(1,2)   -0.5000
(2,2)    3.0000
(3,2)   -1.0000
(2,3)   -0.5000
(3,3)    3.0000
(4,3)   -1.0000
(3,4)   -0.5000
(4,4)    3.0000
(5,4)   -1.0000
(4,5)   -0.5000
(5,5)    3.0000

A , ( Toeplitz),
:

Matrix type: Toeplitz


First_Row: [3, -0.5, 0, 0, 0]
First_Column: [3, -1, 0, 0, 0]

Matrix type: tridiagonal Toeplitz


First_Row: [3, -0.5]
First_Column: [3, -1]
MATLAB A :


A = toeplitz([3, -1, zeros(1,3)],[3, -0.5, zeros(1,3)])   % toeplitz(c,r): c is the first column, r the first row


2 (. 2n = 10 ). (.. MATLAB (.. inv(A)),

    0.3542    0.0627    0.0111    0.0020    0.0003
    0.1255    0.3765    0.0667    0.0118    0.0020
    0.0444    0.1333    0.3778    0.0667    0.0111
    0.0157    0.0471    0.1333    0.3765    0.0627
    0.0052    0.0157    0.0444    0.1255    0.3542

Toeplitz) n2 =
25 6 .

, A1 , A1 b. ,

.1.
:
(= direct) : (exact arithmetic)
.
Gauss.
.
. Gauss ,
.
(= iterative) :

, .
.
.
,
. , .

.
, :
: ,

6
, .
.
n2 /2 .


. .. Gauss O(n3 )
O(n2 ) .
: ,

. ..
Cholesky
Gauss.
,
. , (. ,
Hessenberg), Vandermonde Toeplitz.

5.3
, ,
,

.
5.3.1.
.

5.3.2. n(n + 1)/2


... n(n 1)/2 ...
5.3.3.
x> Ax > 0 x 6= 0 ...
. x> Ax > 0 .

.

.
: (dense) . .. ,
.
. , n2
...
: (structured dense)
n2 .
:
: A = A> , . ij = ji i, j = 1, ..., n.
n(n + 1)/2 .
: A = A , . ij =

ji i, j = 1, ..., n. .
: ij = 0 i > j ( ), ij = 0
i < j ( )
Hessenberg: () Hessenberg i > j + 1
ij = 0 (j > i + 1 ij = 0). n(n + 1)/2 + n 1
.

Toeplitz:
, . ij = |ij| .
2n 1
. ,
, .
Hankel: -
, . ij = hi+j .
, , .
2n 1 .
: (circulant)
: , .
, , , .
n .
Vandermonde: {0 , ..., n1 }. V (0 , ..., n1 )
i1
Cnn Vij := j1
, i, j = 1, ..., n, .

1
0

V (1 , ..., n ) = .
..
0n1

1
1

..
.

..

1n1

1
n1

..

.
n1
n1

Vandermonde. Vandermonde , .. .
: (sparse)
. nnz
.. nnz = O(n) . 5.1, 5.2(a).
nnz
n2 . , ..
, (linked lists). ,
.
,
. ..
, . [34, 32, 20, 13, 14]

. .
(=structured sparse) (= unstructured sparse) .
(= banded) , m n aij = 0 |i j| > m.
, ij = 0 |i j| > 2 .
5.2(). ,
aij = 0 i 6= j .
5.1 5.2. .


[Figure 5.1: sparsity pattern of a sparse matrix of order 4253 with nnz = 28831 nonzero elements, i.e. density about 0.0016. The matrix comes from NASA; the plot was produced with Matlab.]

: .. 5.3
.
(= block diagonal).
5.3 .
(= block tridiagonal).
:
, - .

.
: AA> = A> A = I .
: (unitary7 ) AA = A A = I .
: (normal) AA> = A> A A , AA =
A A.
: (diagonally dominant) ()

\[ |\alpha_{ii}| \ge \sum_{j=1,\, j \ne i}^{n} |\alpha_{ij}|, \qquad i = 1, \ldots, n. \]
, .
.
7

1.


[Figure 5.2: sparsity patterns. Left panel: n = 100, nnz = 200. Right panel: n = 12, nz = 34.]

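[Figure 5.3: sparsity patterns of two matrices of order 40. Left panel: block diagonal matrix, nz = 400. Right panel: block tridiagonal matrix with 10 diagonal blocks of order 4, nz = 172.]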


: (positive definite) A Rnn () x> Ax > 0, x 6= 0 Rn .


,
. ,
( ) .
) , )
(.. , , .),
) .
(.. ; ) .
.
.
.
5.3.4. A, B (A + B)1 b b.
A + B ( ,
n2 ...
O(n3 )).
A + B , A1 , AB .
, () .

: ,
8 .
5.3.1. 9 rank(A) = r X
mr
R
, Y Rrn C Rrr A = XCY .

, (.3.1(5) )
X, C, Y , . mr + nr + r 2
mn . .. r = 1, A m + n + 1
{, x, y}.
5.3.1. A Rmn , b Rn rank(A) = r A mr + nr + r 2 Ab
= 2r(n + m + r 1) m ...

r ,
(mr + nr + r 2 mn) (2r(n + m + r 1) m
m(2n1)) X, Y C A. ,
( ) X, Y C .
, . ,
.
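The operation count 2r(n + m + r - 1) - m comes from multiplying right to left, X(C(Yb)), without ever forming A. A small Python sketch under that assumption (function names are illustrative):

```python
def matvec(M, x):
    """Dense matrix-vector product; a p x q matrix costs p(2q - 1) flops."""
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in M]


def low_rank_matvec(X, C, Y, b):
    """Compute (X C Y) b as X (C (Y b)), never forming the m x n product."""
    return matvec(X, matvec(C, matvec(Y, b)))


def low_rank_flops(m, n, r):
    """r(2n-1) + r(2r-1) + m(2r-1) = 2r(n + m + r - 1) - m."""
    return 2 * r * (n + m + r - 1) - m


def dense_flops(m, n):
    """Cost of a dense m x n matrix-vector product."""
    return m * (2 * n - 1)


if __name__ == "__main__":
    X, C, Y = [[1.0], [2.0], [3.0]], [[2.0]], [[1.0, 1.0]]
    print(low_rank_matvec(X, C, Y, [1.0, 2.0]))
    print(low_rank_flops(1000, 1000, 10), dense_flops(1000, 1000))
```

For m = n = 1000 and r = 10 the factored form needs 39180 flops against 1999000 for the dense product, so the saving is roughly a factor of fifty.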
, , , . A Rmn m n
8
9

.3
.3.1(5) .3.

rank(A) = r < n. A
(.. )
(.. fl(A)) !
(= numerical rank)
.
:
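5.3.1. The numerical rank $r_\varepsilon$ of $A$ is

\[ r_\varepsilon = \min\{\operatorname{rank}(B) : \|A - B\|_2 \le \varepsilon\}, \qquad \varepsilon \ge 0. \]

If $r < n$, then

\[ \inf_{\operatorname{rank}(B) \le r} \|A - B\|_2 = \sigma_{r+1}, \]

where $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n$ are the singular values of $A$. The numerical rank is $r$ when

\[ \sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > \varepsilon \ge \sigma_{r+1} \ge \cdots \ge \sigma_n. \]

The inf is attained for

\[ B = \sum_{i=1}^{r} \sigma_i u_i v_i^\top. \]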

i=1


r
r+1 . , ( ) [2]
.

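5.3.2. Let $u, v \in \mathbb{R}^n$ and $\sigma \in \mathbb{R}$. The matrix $E(u, v; \sigma) := I - \sigma uv^\top$ is called an elementary matrix.

Elementary matrices have the following properties.
1. $E(u, v; \sigma)E(u, v; \tau) = E(u, v; \sigma + \tau - \sigma\tau\, v^\top u)$.
2. If $\sigma^{-1} + \tau^{-1} = v^\top u$, then $E(u, v; \sigma)E(u, v; \tau) = I$.
3. $\det E(u, v; \sigma) = 1 - \sigma\, v^\top u$.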
.
1 , rank(E) n 1.
5.3.2. rankE(u, v; ) n 1.

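Gauss transformations: let $u$ satisfy $e_j^\top u = 0$ for $1 \le j \le k$. The matrix

\[ L_k(u) := E(u, e_k; 1) \]

is unit lower triangular and differs from the identity only in positions $(k+1 : n, k)$:

\[
L_k(u) = \begin{pmatrix}
1 & \cdots & 0 & 0 & 0 & \cdots & 0 \\
\vdots & \ddots & \vdots & \vdots & \vdots & & \vdots \\
0 & \cdots & 1 & 0 & 0 & \cdots & 0 \\
0 & \cdots & 0 & 1 & 0 & \cdots & 0 \\
0 & \cdots & 0 & -\mu_{k+1} & 1 & \cdots & 0 \\
\vdots & & \vdots & \vdots & \vdots & \ddots & \vdots \\
0 & \cdots & 0 & -\mu_n & 0 & \cdots & 1
\end{pmatrix},
\]

where the entries $-\mu_{k+1}, \ldots, -\mu_n$ lie in column $k$.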

Lk (u) .
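5.3.3. $E(u, e_k; 1)E(u, e_k; -1) = I$, i.e. $(L_k(u))^{-1} = L_k(-u)$.

These matrices underlie the LU factorization and Gaussian elimination. Let $x = [\xi_1, \ldots, \xi_n]^\top$ with $\xi_k \ne 0$. Then

\[
L_k(u)x = x - ue_k^\top x = x - u\xi_k
        = [\xi_1, \ldots, \xi_k,\ \xi_{k+1} - \xi_k\mu_{k+1}, \ldots, \xi_n - \xi_k\mu_n]^\top.
\]

If $\mu_j = \xi_j/\xi_k$ ($k + 1 \le j \le n$), then $L_k(u)$ annihilates the elements of $x$ in positions $k + 1 : n$. In that case $u$ is called a Gauss vector.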
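Elementary permutation matrices: $P_{ij} := E(e_i - e_j, e_i - e_j; 1)$ exchanges positions $i$ and $j$; for $i < j$:

\[ P_{ij} = [e_1, \ldots, e_{i-1}, e_j, e_{i+1}, \ldots, e_{j-1}, e_i, e_{j+1}, \ldots, e_n] \]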


, .
:
(permutation matrices):
.

P = [e(1) , e(2) , ..., e(n) ],


P =

e>
(1)
..
.

e>
(n)

1
1

2
2

n
n

1, ..., n. P =
[e1 , e2 , ..., en ]. :

1. P A
A :

P A =

a(1)
..
.

a(n)
a(k) (k) A.
2. AP
A :

AP =

a(1)

a(n)

P
1. P P > = I , . P
2. P >
3. Q P Q QP .
: $H := E(u, u; 2/u^\top u)$. Then $H = H^\top$ and $H^\top H = H^2 = I$.
Householder,
(elementary reflectors)
. QR.
: .
1 . u, v, x Rn .
: E(u, v; ) 2n + 1
u, v,
: E(u, v; )x DOT
sAXPY, . = 4n ... :

\[ (I - \sigma uv^\top)x = \underbrace{x - \sigma u\, \underbrace{(v^\top x)}_{\text{DOT}}}_{\text{sAXPY}} \]

, A n2 A = n(2n 1) ...
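The DOT-then-sAXPY scheme, together with the inverse property of elementary matrices (if 1/sigma + 1/tau = v^T u then E(u,v;tau) undoes E(u,v;sigma)), can be sketched in Python as follows. The function names are illustrative assumptions.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def apply_elementary(u, v, sigma, x):
    """y = (I - sigma u v^T) x via one DOT and one sAXPY: about 4n flops."""
    alpha = sigma * dot(v, x)                        # DOT plus one scaling
    return [xi - alpha * ui for xi, ui in zip(x, u)]  # sAXPY


def inverse_parameter(u, v, sigma):
    """tau with 1/sigma + 1/tau = v^T u, so that E(u,v;tau) inverts E(u,v;sigma).

    Requires sigma * v^T u != 1, i.e. a nonzero determinant 1 - sigma v^T u.
    """
    return sigma / (sigma * dot(v, u) - 1.0)


if __name__ == "__main__":
    u, v, sigma, x = [1.0, 2.0, 0.0], [0.0, 1.0, 1.0], 2.0, [3.0, -1.0, 4.0]
    y = apply_elementary(u, v, sigma, x)
    z = apply_elementary(u, v, inverse_parameter(u, v, sigma), y)
    print(y, z)
```

Applying the matrix this way never stores the n x n array, so both the storage (2n + 1 numbers) and the work stay linear in n.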
5.3.4. C Rnn , u, v Rn , 6= 0.
B E(u, v; )C ...; BLAS
B .

Gauss:
Gauss .
u (= unit roundoff) ...
C Rnn , v Rn Gauss e>
j Lk (v)C(:, k) = 0, j = k+1 : n.
v v v(j) = C(j, k)/C(k, k) j = k + 1 : n

v(j)
=
=
v =

C(j, k)/C(k, k)(1 + j )


v(j) + v(j)j
v + e, |e| u|v|.

fl(Lk (
v )C).
:
5.3.1. C Rnn , x, y Rn .

fl(C + xy > ) = C + xy > + E, |E| u(|C| + 2|xy > |) + O(u2 ).

. (i, j) C + xy > C(i, j) +


x(i)y(j),
|fl(C(i, j) + x(i)y(j)) (C(i, j) + x(i)y(j))|

|C(i, j)2 + x(i)y(j)(1 + 2 )|


+O(u2 ), max(|1 |, |2 |)| u

|C(i, j)|u + 2|x(i)y(j)|u + O(u2 ).

(i, j) fl(Lk (
v )C) Lk (v)C O(u2 ) :

|C(i, j) + C(i, j)2 (v(i) + e(i))C(k, j)(1 + 1 + 2 ) (C(i, j) v(i)C(k, j))|


2 |C(i, j)| + |v(i)C(k, j)|(1 + 2 ) + |e(i)C(k, j)|
u|C(i, j)| + 2u|v(i)C(k, j)| + u|v(i)C(k, j)|
u|C(i, j)| + 3u|v(i)||C(k, j)|.

fl(Lk (
v )C) = Lk (v)C + E
|E| u(|C| + 3|v||C(k, :)> |) + O(u2 ).
v ,
C .
|C(k, k)| .
Gauss.
(. . 152-160
[39].)

5.4
.1
.

.
:
1. .
2. .
3. .
4. .

5.4.1
, , ,
. ,

.
5.4.1 ( ).

() n(n + 1)/2 n2
:

11 ,

12 , 22 ,

13 , 23 , 33 ,

1n , , nn

j(j 1)/2 + i
(i, j) (j, i) .
A .. y y + Ax
:
for i = 1 : n
    for j = 1 : i - 1
        y(i) = y(i) + A(i(i - 1)/2 + j) x(j)
    end
    for j = i : n
        y(i) = y(i) + A(j(j - 1)/2 + i) x(j)
    end
end
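The same packed scheme in Python, with 0-based indices, so that element (i, j) with i <= j of the upper triangle sits at position j(j + 1)/2 + i. This is a minimal sketch; the function names are illustrative and do not correspond to LAPACK routines.

```python
def pack_symmetric(A):
    """Store the upper triangle of a symmetric matrix column by column."""
    n = len(A)
    return [A[i][j] for j in range(n) for i in range(j + 1)]


def packed_index(i, j):
    """Position of element (i, j) (0-based) in the packed array."""
    if i > j:
        i, j = j, i          # symmetry: (i, j) and (j, i) share one slot
    return j * (j + 1) // 2 + i


def packed_symv(ap, x, y):
    """y <- y + A x using only the n(n+1)/2 packed entries."""
    n = len(x)
    out = list(y)
    for i in range(n):
        for j in range(n):
            out[i] += ap[packed_index(i, j)] * x[j]
    return out


if __name__ == "__main__":
    A = [[1.0, 2.0, 3.0], [2.0, 4.0, 5.0], [3.0, 5.0, 6.0]]
    ap = pack_symmetric(A)
    print(ap, packed_symv(ap, [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]))
```

The price of halving the storage is the index arithmetic in the inner loop and the loss of unit-stride access for half of the references.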

,
, .
LAPACK 5.10. [1, . 107-].
:
. A Rmn
ij (i, j) A. , , Hessenberg
,

.
(= packed) ,
.
. m n
l u ,
l, u min(m, n). ij , max(1, j u) i
min(m, j + l) (u + 1 + i j, j) A(1 :
l + u + 1, 1 : n).

n n 1
- ( -) .
.
5.4.1. x y + Ax, A Rmn .
.

. , ,
.
. m n
b0 + 1
. :
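Column-major order: the element (I, J) is stored at address b0 + (J - 1)m + I.
Row-major order: the element (I, J) is stored at address b0 + (I - 1)n + J.
Note: Fortran stores arrays in column-major order, whereas C (for multidimensional arrays) and Pascal use row-major order.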
. , (I, J)
(I + 1, J) 1, n. ,
(I, J) (I, J + 1) 1
m .
,
LOAD . ..
.
(access stride).
:
1.
(, , , ).
(.

(= cache miss),
, .
(= cache line)
. , .
,

(= page faults).
2.
(= memory banks). (..
)
(=bank conflict) . bank
.
.
.

.
5.4.2. Fortran.
A n 1
.

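! Row-oriented loop order: the inner loop runs along row I of A
REAL A(n,n), X(n), Y(n)
DO I = 1,n
   DO J = 1,n
      Y(I) = Y(I) + A(I,J)*X(J)
   ENDDO
ENDDO

! Column-oriented loop order: the inner loop runs down column J of A
REAL A(n,n), X(n), Y(n)
DO J = 1,n
   DO I = 1,n
      Y(I) = Y(I) + A(I,J)*X(J)
   ENDDO
ENDDO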

5.4.2.

.

,
.
(.. Fortran
Fortran 90) ,
(Fortran-90, MATLAB).
, .

5.4.2

.1. A

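i.e. $\alpha_{ij} = 0$ for $1 \le i < j \le n$ and $\alpha_{jj} \ne 0$ for $1 \le j \le n$. The system can then be written row by row as

\[ a_j^\top x = \beta_j, \qquad j = 1, \ldots, n, \]

where $a_j^\top$ is the $j$-th row of $A$. Hence

\[ \xi_j = \Bigl(\beta_j - \sum_{k=1}^{j-1} \alpha_{jk}\xi_k\Bigr)/\alpha_{jj} \tag{5.1} \]
\[ \phantom{\xi_j} = \bigl(\beta_j - \mathrm{DOT}(\alpha_{j,1:j-1}, \xi_{1:j-1})\bigr)/\alpha_{jj}, \qquad j = 1, \ldots, n. \tag{5.2} \]

Computing $\xi_j$ requires $\xi_1, \ldots, \xi_{j-1}$. When $A$ is upper triangular, the corresponding formulas are

\[ \xi_j = \Bigl(\beta_j - \sum_{k=j+1}^{n} \alpha_{jk}\xi_k\Bigr)/\alpha_{jj} \tag{5.3} \]
\[ \phantom{\xi_j} = \bigl(\beta_j - \mathrm{DOT}(\alpha_{j,j+1:n}, \xi_{j+1:n})\bigr)/\alpha_{jj}, \qquad j = n, n-1, \ldots, 1. \tag{5.4} \]

The cost is $\sum_{j=1}^{n} (1 + (2j - 2)) = n^2$ flops.

Alternatively, the system can be viewed in terms of the columns of $A$:

\[ b = \alpha_{:,1}\xi_1 + \alpha_{:,2}\xi_2 + \cdots + \alpha_{:,n}\xi_n. \]

Since $A$ is lower triangular,

\[
\begin{pmatrix} \beta_1 \\ \vdots \\ \beta_j \\ \vdots \\ \beta_n \end{pmatrix}
= \begin{pmatrix} \alpha_{11} \\ \vdots \\ \alpha_{j1} \\ \vdots \\ \alpha_{n1} \end{pmatrix}\xi_1
+ \begin{pmatrix} 0 \\ \vdots \\ \alpha_{j2} \\ \vdots \\ \alpha_{n2} \end{pmatrix}\xi_2
+ \cdots
+ \begin{pmatrix} 0 \\ \vdots \\ 0 \\ \vdots \\ \alpha_{nn} \end{pmatrix}\xi_n,
\]

or, in partitioned form,

\[
\begin{pmatrix} \alpha_{11} & 0_{1,n-1} \\ \alpha_{2:n,1} & A_{2:n,2:n} \end{pmatrix}
\begin{pmatrix} \xi_1 \\ \xi_{2:n} \end{pmatrix}
= \begin{pmatrix} \beta_1 \\ \beta_{2:n} \end{pmatrix}. \tag{5.5}
\]

The first row gives

\[ \alpha_{11}\xi_1 = \beta_1, \tag{5.6} \]

and the remaining rows of (5.5) give

\[ A_{2:n,2:n}\,\xi_{2:n} = \beta_{2:n} - \alpha_{2:n,1}\xi_1 \tag{5.7} \]

for $\xi_{2:n}$. Once $\xi_1$ is known, a triangular system of order $n - 1$ remains.
This leads to the following algorithm: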

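for j = 1 : n
    ξ_j = β_j / α_jj
    for i = j + 1 : n
        β_i = β_i - α_ij ξ_j
    end
end
The algorithm is organized around sAXPY operations and again costs $n^2$ flops.
The solution $x$ can overwrite $b$. Counting the loads of the triangle of $A$ and of $b$ and the stores of $x$,

\[ \Phi_{\min} = \frac{n(n+1)}{2} + 2n, \]

and since $\Omega = n^2$,

\[ \mu_{\min} = \frac{1}{2} + \frac{5}{2n}. \]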
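The row-oriented (DOT) and column-oriented (sAXPY) variants of forward substitution compute the same solution and differ only in the order in which the elements of the triangular matrix are visited. A minimal Python sketch of both follows; the function names are illustrative.

```python
def forward_subst_dot(L, b):
    """Row-oriented forward substitution: one inner product per unknown."""
    n = len(b)
    x = [0.0] * n
    for j in range(n):
        s = sum(L[j][k] * x[k] for k in range(j))   # DOT with the known part
        x[j] = (b[j] - s) / L[j][j]
    return x


def forward_subst_saxpy(L, b):
    """Column-oriented forward substitution: one sAXPY per unknown.

    The right-hand side is copied and then overwritten by the solution.
    """
    n = len(b)
    x = list(b)
    for j in range(n):
        x[j] = x[j] / L[j][j]
        for i in range(j + 1, n):                    # sAXPY with column j
            x[i] -= L[i][j] * x[j]
    return x


if __name__ == "__main__":
    L = [[2.0, 0.0, 0.0], [1.0, 3.0, 0.0], [4.0, 5.0, 6.0]]
    b = [2.0, 4.0, 15.0]
    print(forward_subst_dot(L, b), forward_subst_saxpy(L, b))
```

Which variant is faster depends on the storage order: with column-major storage the sAXPY form walks through memory with unit stride.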


BLAS2.
, AX = B A X, B
Rns , s > 1.
5.4.3.
,

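\[
\begin{pmatrix} L_{11} & 0 \\ L_{21} & L_{22} \end{pmatrix}
\begin{pmatrix} X_1 \\ X_2 \end{pmatrix}
= \begin{pmatrix} B_1 \\ B_2 \end{pmatrix},
\]

where $X_1, B_1 \in \mathbb{R}^{k \times s}$ and $X_2, B_2 \in \mathbb{R}^{(n-k) \times s}$. Then

\[ B_1 = L_{11}X_1, \qquad B_2 = L_{21}X_1 + L_{22}X_2, \]

so that, once $X_1$ has been computed, it remains to solve

\[ L_{22}X_2 = B_2 - L_{21}X_1. \]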
.
k s
, n k
s . s > 1, L21 X1
BLAS3.




P.
k1
5.4.1. [24] y = (c i=1 i i )/k ...
:


s=c
for i = 1 : k 1
s = s i i
end

y = s/k
y

k yk (1 + k ) = c

k1
X

i i (1 + i )

i=1

|i | i i , .
i = iu/(1 iu).

. .
5.4.1. Ax = b, A Rnn
, . x

(A + A)
x = b, |A| n |A|.

10 . ,
. [24].

5.5
.1 11 .
. P Rmn
P Ax = P b. P, Q ,
P AQx = P b x = Qy .1.
.1 :

P b b
P AQ A,
.

A1b x
.
Q
x x .
:
A;
P, Q;
10
J. Wilkinson, p. 251 [39]: ... it is worth stressing
that computed solutions of triangular matrices are, in general, much more accurate than can
be deduced from the bound obtained from the residual vector, and this high accuracy is often
important in practice.
11
;

,
P, Q;
5.5.1. Gauss

Q = I P P A = U
.

5.5.2. Gauss Q = I P P A = U .

.
5.5.3. P A = R P R ,
A = QR Q . P
( Householder).

5.5.4. A = ZP Z P
.

5.5.5. A = P HP > H Hessenberg P .

5.5.6. AA> = AT A A Rnn A = QQ> Q


j , j = 1, .., n
A. Q A.

5.5.7. A = U > V Rmn U Rmm , V Rnn


Rmn ()
1 min(m,n) 0 A.

A
O(n3 )
. LU .
5.5.1. A Rnn .
L U P LU = P A
L .

5.5.2. n A(1 : k, 1 : k), k = 1, ..., n


P = I A = LU . L ,
L, U .

L
U . ..
A = LDU L, U D
det(A(1 : k, 1 : k)) = det(D(1 : k, 1 : k)).
L, U, D .
5.5.2
.1 :
. [ LU ]
: L, U ,
: Lz = b z ,
: U x = z x.



. LU
.
5.4.2. x = [1 , ..., n ]> k 6= 0, j := j /k (k + 1
j n), u := [0, ..., 0, k+1 , ..., n ]> Rn ,
Gauss

Lk (u) := E(u, ek ; 1) = I ue>


k
x (k+1 : n) .
u Lk (u)
u
Lk (u). u
( Lk (u)) x:
. x Rn 1 6= 0 GAUSS u Rn1 y = L1 (u)x y(2 : n) = 0.
function u = GAUSS(x)
n = length(x); (* n x *)

u = x(2 : n)/x(1)
end
GAUSS
Gauss T
= n 1 ...

Gauss C Rnr :

.
C Rnr Gauss
Lk (u) C Lk (u)C .
function C = GAUSS.MUL(C, u)
    n = rows(C); (* n is the number of rows of C *)
    C(2 : n, :) = C(2 : n, :) - u C(1, :)
end

GAUSS.MUL
T
= 2(n 1)r ...

Gauss GAUSS.MUL [A, b] () .



b.
LU : GAUSS.MUL
A = LU
.
.

.
Gauss.

.
function A = GAUSS.ELM(A)
    k = 1
    while A(k, k) ≠ 0 and k ≤ n - 1
        t = GAUSS(A(k : n, k)) (* t ∈ R^{n-k} *)
        A(k + 1 : n, k) = t
        A(k : n, k + 1 : n) = GAUSS.MUL(A(k : n, k + 1 : n), t)
        k = k + 1
    end
end
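When GAUSS.ELM terminates normally, the n - 1 Gauss transformations (with Gauss vectors $u_k$) satisfy $L_{n-1} \cdots L_1 A = U$, with $U$ upper triangular. Therefore $A = L_1^{-1} \cdots L_{n-1}^{-1} U$.
Since $L_k^{-1} = I + u_k e_k^\top$, the matrix $L := L_1^{-1} \cdots L_{n-1}^{-1}$ is unit lower triangular. Moreover, because of the structure of $u_j$ and $e_j$, we have $u_j e_j^\top u_k e_k^\top = 0$ for $j < k$, so

\[
L = (I + u_1 e_1^\top) \cdots (I + u_{n-1} e_{n-1}^\top) = I + \sum_{j=1}^{n-1} u_j e_j^\top.
\]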

, A :
A R, u Gauss.

A=

11
(1)
2
.
.
(1)
n

12
22
(2)
2
.
(2)
n

.
.
.
.
.

1n
2n
.
.
n,n

The cost of Gaussian elimination with GAUSS.ELM is

\[ T = \sum_{k=1}^{n-1} \bigl(n - k + 2(n - k)^2\bigr) \approx \frac{2n^3}{3} \]

flops.
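The three routines GAUSS, GAUSS.MUL and GAUSS.ELM can be collapsed into one in-place loop. The Python sketch below does this without pivoting (so it assumes all pivots are nonzero) and stores the multipliers in the strictly lower triangle, as the text describes. The function names are illustrative.

```python
def lu_nopivot(A):
    """In-place LU factorization without pivoting.

    On return the strictly lower triangle holds the multipliers (L has an
    implicit unit diagonal) and the upper triangle holds U.
    """
    n = len(A)
    F = [row[:] for row in A]
    for k in range(n - 1):
        if F[k][k] == 0.0:
            raise ZeroDivisionError("zero pivot: factorization without pivoting breaks down")
        for i in range(k + 1, n):
            F[i][k] /= F[k][k]                      # Gauss vector entry
            for j in range(k + 1, n):
                F[i][j] -= F[i][k] * F[k][j]        # rank-one update of the trailing block
    return F


def unpack_lu(F):
    """Split the packed factorization into explicit L and U."""
    n = len(F)
    L = [[F[i][j] if j < i else (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    U = [[F[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]
    return L, U


if __name__ == "__main__":
    L, U = unpack_lu(lu_nopivot([[2.0, 1.0, 1.0], [4.0, 3.0, 3.0], [8.0, 7.0, 9.0]]))
    print(L)
    print(U)
```

Step k performs n - k divisions and 2(n - k)^2 flops in the update, which is where the sum in the cost estimate comes from.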
, . L U , O(n2 ), . .1
LU b. .. b
A . s
, . AX = B , X, B Rns , A (. 1 n3 + 2 n2 + O(n))


(. sn2 + O(n)).

1 n3 + (2 + s)n2 + O(n)

n2
n3
+ 2
+ n2 + O(n).
s
s


, s.

Gauss
12 . .. A Rmn m n
n Gauss U
Rmn , U (j, k) j > k ,
.
A = LU L Rmm .
A , L

L=

L11
L21

0
I

L11 Rnn .

5.6
Gauss GAUSS.ELM
:
for ? = ?:?
    for ? = ?:?
        for ? = ?:?
            α_ij = α_ij - (α_ik α_kj)/α_kk
        end
    end
end
Dongarra, Gustavson Karp [9]
, . ,
.
.
GAUSS.ELM.
GAUSS.MUL, GAUSS.ELM kij
( ) kji ( )13 .
12

.
GAUSS.MUL
.
13

Fortran kji kij . [29].
kji
SGEFA LINPACK ([10]).
A A . sAXPY,
.
.
j A

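\[ A_{:,j} = L\,U_{:,j} \]

or, in terms of the blocks of $L$ and $U$:

\[ A_{1:j-1,j} = L_{1:j-1,1:j-1}\,U_{1:j-1,j} \tag{5.8} \]
\[ A_{j,j} = \sum_{k=1}^{j-1} L_{j,k}U_{k,j} + L_{j,j}U_{j,j} \tag{5.9} \]
\[ A_{j+1:n,j} = \underbrace{\sum_{k=1}^{j-1} L_{j+1:n,k}U_{k,j}}_{\text{MXV}} + L_{j+1:n,j}U_{j,j}. \tag{5.10} \]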

j 1 L
U , j (
) : L1:j1,1:j1 ,
(5.8)
j 1 j U . ,
Uj,j (5.9) , (5.10) Lj+1:n,j . (5.10) ,
MXV 14 . ,
MXV
kji, kij . A (
) L U ,
jki:

14

[22] MXV GAXPY.


. [ LU jki]
, A
L,
A U .
for j = 1 : n
    (* (5.8) *)
    for k = 1 : j - 1
        A_{k+1:j-1,j} = A_{k+1:j-1,j} - A_{k+1:j-1,k} A_{kj}
    end
    (* (5.9) *)
    for k = 1 : j - 1
        A_{j:n,j} = A_{j:n,j} - A_{j:n,k} A_{kj}
    end
    (* (5.10) *)
    A_{j+1:n,j} = A_{j+1:n,j} / A_{jj}
end

jki .
ikj
[9].
, ijk, jik , DOT 15 .
i 1 L U , i L
U i L
U . i 1 i 1
i L, n i + 1 i
U . :
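\[ A_{i,j} = \overbrace{L_{i,1:j-1}U_{1:j-1,j}}^{\text{DOT}} + L_{i,j}U_{j,j}, \qquad (1 \le j \le i - 1) \tag{5.11} \]
\[ A_{ii} = \overbrace{L_{i,1:i-1}U_{1:i-1,i}}^{\text{DOT}} + U_{ii} \tag{5.12} \]
\[ A_{i,j} = \overbrace{L_{i,1:i-1}U_{1:i-1,j}}^{\text{DOT}} + U_{i,j}, \qquad (i + 1 \le j \le n) \tag{5.13} \]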

Li,1:i1 Ui,i Ui,i+1:n


DOT.
:

15

, Doolitle Crout.

. [ LU ijk ]
for i = 2 : n
    for j = 2 : i
        A_{i,j-1} = A_{i,j-1} / A_{j-1,j-1}
        for k = 1 : j - 1
            A_{ij} = A_{ij} - A_{ik} A_{kj}
        end
    end
    for j = i + 1 : n
        for k = 1 : i - 1
            A_{ij} = A_{ij} - A_{ik} A_{kj}
        end
    end
end
, .
DOT
(extended precision).
L, U . . [9, 35].

5.6.1

,
BLAS1 BLAS2. ,
, .
BLAS3 .

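\[
\begin{pmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \\ A_{31} & A_{32} & A_{33} \end{pmatrix}
= \begin{pmatrix} L_{11} & 0 & 0 \\ L_{21} & L_{22} & 0 \\ L_{31} & L_{32} & L_{33} \end{pmatrix}
\begin{pmatrix} U_{11} & U_{12} & U_{13} \\ 0 & U_{22} & U_{23} \\ 0 & 0 & U_{33} \end{pmatrix}, \tag{5.14}
\]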

A11 R(k1)(k1) , A22 R A33 R(nk)(nk) L, U


.

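\[
\begin{pmatrix}
L_{11}U_{11} & L_{11}U_{12} & L_{11}U_{13} \\
L_{21}U_{11} & L_{21}U_{12} + L_{22}U_{22} & L_{21}U_{13} + L_{22}U_{23} \\
L_{31}U_{11} & L_{31}U_{12} + L_{32}U_{22} & L_{31}U_{13} + L_{32}U_{23} + L_{33}U_{33}
\end{pmatrix},
\]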

BLAS-3, .
(=right-looking) (=left-looking).
:
L U . k
(k 1) L U , . L11 , L21 , L31 U11 .
L, U , .


L22 , L32 U12 , U22 . .


L11 , L21 , L31 U11 ,

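\[
\begin{pmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \\ A_{31} & A_{32} & A_{33} \end{pmatrix}
= \begin{pmatrix} L_{11} & 0 & 0 \\ L_{21} & I & 0 \\ L_{31} & 0 & I \end{pmatrix}
\begin{pmatrix} U_{11} & A_{12} & A_{13} \\ 0 & A_{22} & A_{23} \\ 0 & 0 & A_{33} \end{pmatrix}. \tag{5.15}
\]

At the end of step $k$:

\[
\begin{pmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \\ A_{31} & A_{32} & A_{33} \end{pmatrix}
= \begin{pmatrix} L_{11} & 0 & 0 \\ L_{21} & L_{22} & 0 \\ L_{31} & L_{32} & I \end{pmatrix}
\begin{pmatrix} U_{11} & U_{12} & A_{13} \\ 0 & U_{22} & A_{23} \\ 0 & 0 & A_{33} \end{pmatrix}. \tag{5.16}
\]

The blocks involved satisfy

\[ A_{12} = L_{11}U_{12}, \qquad A_{22} = L_{21}U_{12} + L_{22}U_{22}, \qquad A_{32} = L_{31}U_{12} + L_{32}U_{22}. \]

The computation proceeds as follows:
1) $A_{12} = L_{11}U_{12}$ is solved for $U_{12}$ with _TRSM, since $L_{11}$ is lower triangular. Then, with _GEMM,
\[
\begin{pmatrix} T_{22} \\ T_{32} \end{pmatrix}
= \begin{pmatrix} A_{22} \\ A_{32} \end{pmatrix}
- \begin{pmatrix} L_{21} \\ L_{31} \end{pmatrix} U_{12}.
\]
2) $T_{22} = L_{22}U_{22}$ is factored with _GETF2.
3) $T_{32} = L_{32}U_{22}$ is solved for $L_{32}$ with _TRSM.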
: L U .
L11 , L21 , L31 U11 , U12 , U13 ,

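\[
\begin{pmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \\ A_{31} & A_{32} & A_{33} \end{pmatrix}
= \begin{pmatrix} L_{11} & 0 & 0 \\ L_{21} & I & 0 \\ L_{31} & 0 & I \end{pmatrix}
\begin{pmatrix} U_{11} & U_{12} & U_{13} \\ 0 & \tilde{A}_{22} & \tilde{A}_{23} \\ 0 & \tilde{A}_{32} & \tilde{A}_{33} \end{pmatrix}. \tag{5.17}
\]

At the end of step $k$:

\[
\begin{pmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \\ A_{31} & A_{32} & A_{33} \end{pmatrix}
= \begin{pmatrix} L_{11} & 0 & 0 \\ L_{21} & L_{22} & 0 \\ L_{31} & L_{32} & I \end{pmatrix}
\begin{pmatrix} U_{11} & U_{12} & U_{13} \\ 0 & U_{22} & U_{23} \\ 0 & 0 & \hat{A}_{33} \end{pmatrix}, \tag{5.18}
\]

where

\[
\begin{pmatrix} \tilde{A}_{22} \\ \tilde{A}_{32} \end{pmatrix}
= \begin{pmatrix} L_{22} \\ L_{32} \end{pmatrix} U_{22}
\]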


5.4: : . :
.
.

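\[
\begin{pmatrix} \tilde{A}_{22} & \tilde{A}_{23} \\ \tilde{A}_{32} & \tilde{A}_{33} \end{pmatrix}
= \begin{pmatrix} L_{22} & 0 \\ L_{32} & I \end{pmatrix}
\begin{pmatrix} U_{22} & L_{22}^{-1}\tilde{A}_{23} \\ 0 & \tilde{A}_{33} - L_{32}L_{22}^{-1}\tilde{A}_{23} \end{pmatrix}
= \begin{pmatrix} L_{22} & 0 \\ L_{32} & I \end{pmatrix}
\begin{pmatrix} U_{22} & U_{23} \\ 0 & \hat{A}_{33} \end{pmatrix}
\]

The steps are:
1) $\begin{pmatrix} \tilde{A}_{22} \\ \tilde{A}_{32} \end{pmatrix} = \begin{pmatrix} L_{22} \\ L_{32} \end{pmatrix} U_{22}$ with _GETF2 (BLAS-2).
2) $U_{23} = L_{22}^{-1}\tilde{A}_{23}$ with _TRSM (BLAS-3).
3) $\hat{A}_{33} = \tilde{A}_{33} - L_{32}U_{23}$ with _GEMM or _GERK (BLAS-3).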
5.4 .
1 < k n,
BLAS3,
.
k , , , ...
, ,

k .


5.6.1.
. k
.

, A,
LU . Lij , Uij L, U .
A
. . L, U

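\[
\begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}
= \begin{pmatrix} I & 0 \\ L_{21} & I \end{pmatrix}
\begin{pmatrix} U_{11} & U_{12} \\ 0 & U_{22} \end{pmatrix} \tag{5.19}
\]

where

\[
L_{21} := A_{21}A_{11}^{-1}, \qquad
U_{11} := A_{11}, \qquad
U_{12} := A_{12}, \qquad
U_{22} := A_{22} - A_{21}A_{11}^{-1}A_{12}.
\]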

A LU

A11 .

5.6.1. The matrix $S := A_{22} - A_{21}A_{11}^{-1}A_{12}$ is called the Schur complement of $A_{11}$ in $A$.

5.6.2. .


:
5.6.1. Ax = b

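\[ A = \begin{pmatrix} D_1 & B \\ B^\top & D_2 \end{pmatrix} \]

where $D_1, D_2$ are diagonal. Then

\[
A = \begin{pmatrix} I & 0 \\ B^\top D_1^{-1} & I \end{pmatrix}
\begin{pmatrix} D_1 & B \\ 0 & D_2 - B^\top D_1^{-1} B \end{pmatrix}
\]

and the system can be solved through the Schur complement of $D_1$ in $A$.
The steps are:
1. $y_2 = b_2 - B^\top D_1^{-1} b_1$.
2. Solve $(D_2 - B^\top D_1^{-1} B)x_2 = y_2$.
3. $x_1 = D_1^{-1}(b_1 - Bx_2)$.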
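The three steps translate directly into code. In the Python sketch below the diagonal blocks are stored as lists `d1`, `d2`, and the small Schur complement system is solved by a helper `solve_dense`; both the storage choice and the helper are assumptions made for illustration.

```python
def solve_dense(M, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(M)
    A = [row[:] + [rhs[i]] for i, row in enumerate(M)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(A[i][k]))
        A[k], A[p] = A[p], A[k]
        for i in range(k + 1, n):
            f = A[i][k] / A[k][k]
            for j in range(k, n + 1):
                A[i][j] -= f * A[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = A[i][n] - sum(A[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / A[i][i]
    return x


def schur_solve(d1, d2, B, b1, b2):
    """Solve [D1 B; B^T D2][x1; x2] = [b1; b2] with D1, D2 diagonal."""
    n1, n2 = len(d1), len(d2)
    # Schur complement S = D2 - B^T D1^{-1} B (an n2 x n2 dense matrix)
    S = [[(d2[i] if i == j else 0.0)
          - sum(B[k][i] * B[k][j] / d1[k] for k in range(n1))
          for j in range(n2)] for i in range(n2)]
    # Step 1: y2 = b2 - B^T D1^{-1} b1
    y2 = [b2[i] - sum(B[k][i] * b1[k] / d1[k] for k in range(n1)) for i in range(n2)]
    # Step 2: S x2 = y2
    x2 = solve_dense(S, y2)
    # Step 3: x1 = D1^{-1} (b1 - B x2)
    x1 = [(b1[k] - sum(B[k][j] * x2[j] for j in range(n2))) / d1[k] for k in range(n1)]
    return x1, x2


if __name__ == "__main__":
    print(schur_solve([2.0, 4.0], [3.0, 5.0], [[1.0, 2.0], [0.0, 1.0]], [5.0, 5.0], [4.0, 8.0]))
```

Because D1 is diagonal, every application of its inverse is a componentwise division, so the only dense solve is with the Schur complement.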

5.6.3.
.

5.6.4. D1 , D2 .
.

5.6.5. A .
11 , U
11 L
11
11 U
LU A11 , . A11 = L
L11 , A(k + 1 : n, k + 1 : n)
Schur.

5.6.6. .1
.

I
L21

L= .
..
Lm1

Lm,m1

.. , U =
.

U11

U12

U1m

U22

..
.

..

Umm

5.5.2:
nn
5.6.1. A = (Aij )m
i,j=1 R
Aij . , m 1
.

LU [9, 22, 30]. [1, 22] LU


.

5.7
5.5.2,

.
5.7.1. A = [e2 , e1 ] R22 . , A2 = I .
A11 = 0.

, .
( )
,
.
Gauss.
5.5.2, ... L, U
( ).
.

(k1)

. kk ,


(, )
(k1)

ik

(k1)

kk

, i = k + 1 : n.

.
k A
, , ,
:
: k ,
, (k : n, k)
A. k k
k k.
.
: k ,
, Ak:n,k:n .
(1 , 2 ), k 1 k 2 .
,
k 2 ( ), . O(n3 )
k , . O(n2 )
. , , [22].
,
, : LU
Gauss .

. (rook pivoting), ,
, .
[24].

A. ,
:

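\[ L_{n-1}P_{n-1} \cdots L_1P_1 A = U, \]

where $P_k$ is the permutation that exchanges row $k$ with one of the rows $(k + 1 : n)$, chosen from column $k$ of $L_{k-1}P_{k-1} \cdots L_1P_1 A$. Then

\[ A = (L_{n-1}P_{n-1} \cdots L_1P_1)^{-1}U, \]

and therefore

\[ P_{n-1} \cdots P_1 A = P_{n-1} \cdots P_1 (L_{n-1}P_{n-1} \cdots L_1P_1)^{-1}U. \]

Setting $P := P_{n-1} \cdots P_1$, the matrix $P$ is a permutation matrix, and

\[ L := P_{n-1} \cdots P_1 (L_{n-1}P_{n-1} \cdots L_1P_1)^{-1} \]

is unit lower triangular with $|\lambda_{ij}| \le 1$, since at each step $k$ the pivot is the element of largest magnitude in column $k$ of the Gauss elimination. Therefore

\[ PA = LU. \]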

. [22].
LU
,
.
.
: P Ax = LU x = P b,
b L U .
.1 [22] :
: (=scaling) P1 , P2 P1 AP2 P21 x =
P1 b Ax = b
.. A, (P1 AP2 ) < (A).
(=preconditioning)
.
: (=iterative improvement) x(0) Az = r (j1) r (j1) :=
b Ax(j1) , x(j) := x(j1) + z j = 1, ....

,
- Gauss.
,
:

n :=

max{0 , 1 , ..., n1 }
,
0

(k)

k := maxi,j |ij |
(k + 1 : n, k + 1 : n) Lk Pk L1 P1 A.
,

"
A=

1
10000

, :

"
L1 =

"
L1 A =

10000 1

1
10000

9999

0 = 1, 1 = 9999 = 9999.

"
P1 =

0 1
1 0

"
L1 =

1
10000

"
L1 P1 A =

9999
10000

0 = 1 1 = 0.9999, .
. , .


5.7.2.

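\[
A = \begin{pmatrix}
1 & 0 & 0 & 1 \\
-1 & 1 & 0 & 1 \\
-1 & -1 & 1 & 1 \\
-1 & -1 & -1 & 1
\end{pmatrix}, \qquad
U = \begin{pmatrix}
1 & 0 & 0 & 1 \\
0 & 1 & 0 & 2 \\
0 & 0 & 1 & 4 \\
0 & 0 & 0 & 8
\end{pmatrix}, \qquad \rho = 8,
\]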

1
0
U =
0
0

1
2
0
0

0
1
2
0

0
0

1
2

= 2.

,
:

1
1
ij =

j = i, j = n,
j < i,


= 2n1 ,
n
:

\[ \rho_n < \sqrt{n \cdot 2^{1} \cdot 3^{1/2} \cdot 4^{1/3} \cdots n^{1/(n-1)}}. \]

n 2n1 .
[24].
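The growth factors quoted in this section can be reproduced numerically. The Python sketch below measures the growth as the largest magnitude met during elimination divided by the largest magnitude of the original matrix; it assumes that the 2 x 2 example is A = [10^-4 1; 1 1] (consistent with the values 9999 and 0.9999 above) and that the general example has 1 on the diagonal and in the last column and -1 below the diagonal. The function names are illustrative.

```python
def growth_factor(A, pivot=True):
    """Growth factor of Gaussian elimination, with or without partial pivoting."""
    n = len(A)
    U = [row[:] for row in A]
    a0 = max(abs(v) for row in U for v in row)
    amax = a0
    for k in range(n - 1):
        if pivot:
            # first row of maximal magnitude in column k, rows k..n-1
            p = max(range(k, n), key=lambda i: abs(U[i][k]))
            U[k], U[p] = U[p], U[k]
        for i in range(k + 1, n):
            m = U[i][k] / U[k][k]
            U[i][k] = 0.0
            for j in range(k + 1, n):
                U[i][j] -= m * U[k][j]
        amax = max(amax, max(abs(v) for row in U for v in row))
    return amax / a0


def worst_case_matrix(n):
    """1 on the diagonal and in the last column, -1 below the diagonal."""
    return [[1.0 if (j == i or j == n - 1) else (-1.0 if j < i else 0.0)
             for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    A = [[1e-4, 1.0], [1.0, 1.0]]
    print(growth_factor(A, pivot=False), growth_factor(A, pivot=True))
    print([growth_factor(worst_case_matrix(n)) for n in range(2, 8)])
```

For the worst-case family partial pivoting performs no interchanges and the last column doubles at every step, giving growth 2^(n-1).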
Gould 16
, . n. Gould 13 13
= 13.0205.
5.7.1.
Gauss :
2n1 .
n

, , .
5.7.3.

n.
Hessenberg
n

2.
n

16

Wilkinson [39, . 214], In fact no matrix has yet been discovered for
which nPO > n.


5.7.1
It is a matter for some surprise that one of the simplest methods of
solution generally leads to an error, the expected value of which is
precisely that resulting from random perturbations ... This means
that when the original elements are not exactly representable by
numbers of t binary digits the errors resulting from any initial rounding that may be necessary are as serious as those arising from all
the steps in the solution. James Wilkinson . 198 [39]
,
. ,
,
Wilkinson. . [38]. ,
, .
:
:
<

(= posedness) 17 .
. .1 A, b

(A + A)(x + x) = b + b,

x A, b. A() := A + E
A := A(0) 0. :
5.7.1. A() Rmn
ij (). ij ()
(1)

d
A(1) () ij () := d
ij ().

mn
ns
5.7.1. A() R
, B() R

.

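\[
\frac{d}{d\varepsilon}[A(\varepsilon)B(\varepsilon)]
= \frac{d}{d\varepsilon}[A(\varepsilon)]\,B(\varepsilon) + A(\varepsilon)\,\frac{d}{d\varepsilon}[B(\varepsilon)]
\]

\[
\frac{d}{d\varepsilon}[A(\varepsilon)^{-1}]
= -A(\varepsilon)^{-1}\,\frac{d}{d\varepsilon}[A(\varepsilon)]\,A(\varepsilon)^{-1}
\]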

. .

17
(= well-posed)

(= ill-posed).


(A + E)x() = b + e, x(0) = x.
5.7.1

x(1) (0) = A1 (e Ex)

x() = x + x(1) (0) + O(2 ).

kx() xk
kxk

kek
k
+ kEk + O(2 )
kbk

||kA

(A)(A + b ) + O(2 )

A := ||

(5.20)
(5.21)

kek
kEk
, b := ||
kAk
kbk

.
A, b x . (5.20) (A).
(A) . p
:

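\[ \frac{1}{\kappa_p(A)} = \min_{A + \Delta A \ \text{singular}} \frac{\|\Delta A\|_p}{\|A\|_p}. \tag{5.22} \]

Thus $1/\kappa_p(A)$ measures the relative distance, in the $p$-norm, of $A$ from the set of singular matrices.
For the 2-norm ($p = 2$),

\[ \kappa_2(A) = \frac{\sigma_1(A)}{\sigma_n(A)} \tag{5.23} \]

where $\sigma_1$ and $\sigma_n$ are the largest and smallest singular values of $A$.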
:
5.7.1. If $\|A\| < 1$, then $I - A$ is nonsingular and

\[ \|(I - A)^{-1}\| \le \frac{1}{1 - \|A\|}. \]

5.7.1. A kA1 Ek < 1 A + E


.

. .
(5.20):
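5.7.1. Let $\|A^{-1}\|\,\|\Delta A\| < 1$ and $(A + \Delta A)y = b + \Delta b$.
Then $A + \Delta A$ is nonsingular and

\[
\frac{\|y - x\|}{\|x\|} \le
\frac{\kappa(A)}{1 - \kappa(A)\frac{\|\Delta A\|}{\|A\|}}
\left( \frac{\|\Delta A\|}{\|A\|} + \frac{\|\Delta b\|}{\|b\|} \right).
\]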


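Proof. Since $\|A^{-1}\Delta A\| \le \|A^{-1}\|\,\|\Delta A\| < 1$, the matrix $I + A^{-1}\Delta A$ is nonsingular and, by Lemma 5.7.1,

\[ \|(I + A^{-1}\Delta A)^{-1}\| \le \frac{1}{1 - \|A^{-1}\|\,\|\Delta A\|}. \]

Moreover,

\[ y - x = (I + A^{-1}\Delta A)^{-1}A^{-1}(\Delta b - \Delta A\,x), \]

hence

\[
\frac{\|y - x\|}{\|x\|} \le
\frac{\|A^{-1}\|}{1 - \|A^{-1}\|\,\|\Delta A\|}
\left( \frac{\|\Delta b\|}{\|x\|} + \|\Delta A\| \right),
\]

and the result follows from $\|x\| \ge \|b\|/\|A\|$.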
5.7.2.

ky xk
2 (A)

kxk
1 (A)

. .
:
5.7.2. (A) < 1/2

ky xk
4 (A)
kxk

. .
Ax = b , A P A = LU , P Ax = P b .
.
U
:
L,

U
= A + E.
L
.
(A + A)y = b . b = 0,
A. 18 :

+ L)(z

(L
+ z) = b,

(U + U )(x + x) = z + z.

U
) + (L)
U
+ (L)(

).
A = E + L(
U

kAk
, L
, L,
U
(
E, U
) .
18


:
5.7.3 (Wilkinson). A Rnn
x
Gauss .

(A + A)
x = b, kAk cn3 MO
n kAk u
c = O(1).

5.7.3
Gauss n .
, ... u 10d
= O(10k ),
.1 Gauss
d k [22].
5.7.2. MATLAB Matrix Computation
Toolbox N. Higham

www.maths.manchester.ac.uk/ higham/mctoolbox
A .
, randsvd

.1.

randsvd(4,1e14)
}|
0.2260 0.1746
0.3329
0.2226 0.1719
0.3278
A=
0.3578
0.2764 0.5269
0.0014 0.0011
0.0021
z

rand(4,1)

}|
0.1763
0.8307

0.1736
, x0 = 0.0548

0.1896
0.2790
0.0011
0.9753

b Ax0 . 2 (A) = 9.9807 1013


Ax = b Gauss ,

kx0 xk2
= 0.0023
kx0 k2

5.7.4. [1, . 75]


ky xk
kxk
. :

SLAMCH: , epsmch.
SLANGE: kAk .
SGESV Ax = b L, U .
SGECON: , rcond.
errbd

errbd = epsmch/MAX(rcond, epsmch)


:
k.k ,
.. .
(A). .

(A) . :
1. (A) O(n3 )
.1 A1 .
2. (A) ,
.
:
(A)
.
LAPACK MATLAB . [22, 24].
(.. )
. , ,
.
(..
,
, .) (A).
. ,

.
b .
5.7.5. A = A> Rnn . Aui = i ui
(i , ui )n
i=1 . Q , Q> AQQ> x = Q> b
Q> x = Q> b

1
..

u>
1x

u>
1b

..
..

.
= .

>
u>
x
u
b
n
n


>
j u>
j x = uj b.

j = 0 u>
j b = 0.
Ax = b
. ,
j = 0 u>
j b = 0. ,
j u>
j b


(.. ). A
,

1 n1 n = 0.
b n = 0.

x=

n1
X
i=1

i
ui
i

. A .

(
)
. (=effective condition number) [5],
b.

. ,
[1, 10r ] 10r ,
.
. b = [1 , 2 ]> ,
2 /10r .
. ,
, S (A) := k|A||A1 |k
A . 20

.
5.7.6. MATLAB:

>> a=rand(100);
>> flops(0); c_exct = cond(a); f_exct = flops;
>> flops(0); c_est = condest(a); f_est = flops;
>> [c_exct,c_est]
   2.5938e+03   5.8855e+03
>> [f_exct, f_est]
   2953425   764356
cond MATLAB 2 condest Test Matrix Toolbox 1 Hager ([23])
Higham ([24]). f_est < f_exct
c_exct, c_est = O(103 ).

1
5.7.7. Vn = V (0 , ..., n1 ) j = j+1

2 (Vn ) > nn+1 . [19].


, , .

5.7.8.

S = A U B 1 V, A Rnn , U Rmn , V Rnm , B Rmm .


Schur [A, V ; U, B] A. :
1. Bhj = V (:, j), j = 1 : m H [h1 , , hm ].
2. S = A U H .

.1
A. ,
A1 .
n Axj = ej , j = 1 : n.
[24].

5.8

.1

O(n3 )
... Strassen
, n O(n3 ) ,
( .1).
: Gaussian elimination is not optimal.
Strassen.
[6, 37]. A
. (5.19) A1
:

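\[
A^{-1} =
\begin{pmatrix} A_{11}^{-1} & -A_{11}^{-1}A_{12}S^{-1} \\ 0 & S^{-1} \end{pmatrix}
\begin{pmatrix} I & 0 \\ -A_{21}A_{11}^{-1} & I \end{pmatrix}
=
\begin{pmatrix}
A_{11}^{-1} + A_{11}^{-1}A_{12}S^{-1}A_{21}A_{11}^{-1} & -A_{11}^{-1}A_{12}S^{-1} \\
-S^{-1}A_{21}A_{11}^{-1} & S^{-1}
\end{pmatrix}
\]

The blocks are computed in the following order:
1. $A_{11}^{-1}$,
2. $A_{11}^{-1}A_{12}$ and $A_{21}A_{11}^{-1}$,
3. $S^{-1} = (A_{22} - A_{21}A_{11}^{-1}A_{12})^{-1}$,
4. $S^{-1}(A_{21}A_{11}^{-1})$, $(A_{11}^{-1}A_{12})S^{-1}$,
5. $A_{11}^{-1} + (A_{11}^{-1}A_{12})S^{-1}(A_{21}A_{11}^{-1})$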

.
Schur. n = 2k
n/2.
:


1. 2 n/2,
2. 6 n/2,
3. 2 n/2, .

TINV (n) = 2TINV (n/2) + 6TMUL (n/2) + 2TADD (n/2).

(5.24)

:
5.8.1. A , 5.64nlog 7 .

. (5.24)
Strassen TMUL (n) 4.7nlog 7 ,

TINV (n)

log
Xn
j=1

2j1 TMUL (

n
n2
)
+
log
n
.
2j
2

(5.25)

.
( ) !
.
,
LU , .
,
, , ,
.. .
. ,
Schur,
. A
19
() , Ax = b : 1)
A>, 2) (A> A)1 A> b. Schonhage A ,
A> A . ,
Strassen .
A> A 2 (A> A) = 2 (A)
.
. , Bunch
Hopcroft Strassen O(nlog2 7 ).
[3] [24].
19

.


5.9
20
( O(n2 ) ).

,
. .
.
5.9.1, Cholesky
, LU .

5.9.1 Cholesky
, A Rnn (..)
x> Ax > 0 x Rn . ,
B , B > B ..
A A> ii
, A ..

A=

11
21

12
22

x A
..

|12 |, |21 |

11 + 22
.
2

: A ..
..
,
A .
A11 k , Schur
A11 , S = A22 A21 A1
11 A12 ..
:
A. .. x> = (x1 , x2 )

x> Ax =
=

>
>
>
x>
1 A11 x1 + x1 A12 x2 + x2 A21 x1 + x2 A22 x2
1
1
>
>
(x1 + A1
11 A12 x2 ) A11 (x1 + A11 A12 x2 ) + x2 (A22 A21 A11 A12 )x2 .

>
A .., x1 + A1
11 A12 x2 = 0, x Ax =
>
x2 Sx2 > 0 .. A, .
20
R,
.


..
5.5.2 L, U A = LU . ,
,
LU .
:
5.9.1. A Rnn ..
(k)
A = Lk L1 A,
k Gauss A, (1 k n 1). :
(k)

1. Ak:n,k:n , ..
(k)

(k)

2. ij Ak:n,k:n
(k)

(k1)

max |ij | max |ij


i,j

i,j

|, k = 1, ..., n 1.

LU
, .. , =
1. ..
.
:
5.9.2. A Rnn
, L, M
, D , A = LDM > .
L, D, M .

. A ,
L, U A = LU
L . D = diag([u11 , ..., unn ])
diag(u) u .
M > = D1 U A = LDM > . M
.
LU .
5.9.3. A Rnn .
R
A = R> R.

. .
n ( n = 1 ). ..

A =

A
a>


A = R> R

A =

R>
r>

R
0

R> r = a 2 = r > r . R>


,
r = RT a.

>
>
r r > 0 = r r > 0.


x> Ax
> 0 x 6= 0,
A
> T
>
x = [r R , 1]
0 <= r> RT AR1 2r> RT a + = r> r.

R Cholesky A ( ) Cholesky. LU , Cholesky .


5.9.3
Cholesky. j 1 R:,1:j1

A:,j = R:,1 R1j + + R:,j1 Rj1,j + R:,j Rjj


R :

R:,j Rjj

A:,j

j1
X

R:,k Rkj

k=1

A:,j

j1
X

R:,k Rjk

k=1

MXV GAXPY.
2
j Rjj
Rjj
R:,j .
:

. [ Cholesky MXV]
A
R.
for j = 1 : n
    if j > 1
        A_{j:n,j} = A_{j:n,j} - A_{j:n,1:j-1} A_{j,1:j-1}^T
    end
    A_{j:n,j} = A_{j:n,j} / sqrt(A_{jj})
end
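A minimal Python sketch of the same column-oriented scheme follows. It returns the lower triangular factor G with A = G G^T (the transpose of the upper triangular factor), and it signals failure when a pivot is not positive, which is how the factorization doubles as a test for positive definiteness. The function name is illustrative.

```python
from math import sqrt


def cholesky_columns(A):
    """Column-oriented Cholesky: returns lower triangular G with A = G G^T.

    Column j is first updated with the previously computed columns (the
    matrix-vector product in the algorithm above) and then scaled by the
    square root of its leading entry.
    """
    n = len(A)
    G = [[0.0] * n for _ in range(n)]
    for j in range(n):
        col = [A[i][j] - sum(G[i][k] * G[j][k] for k in range(j)) for i in range(j, n)]
        if col[0] <= 0.0:
            raise ValueError("matrix is not positive definite")
        d = sqrt(col[0])
        for i in range(j, n):
            G[i][j] = col[i - j] / d
    return G


if __name__ == "__main__":
    for row in cholesky_columns([[4.0, 2.0, 2.0], [2.0, 5.0, 3.0], [2.0, 3.0, 6.0]]):
        print(row)
```

Only the lower triangle of A is read, and no pivoting is needed, which accounts for the cost being about half that of LU.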

T n3 , LU .
... , Cholesky ,
y (A + E)y = b kEk2
cukAk2 c n.
A ...
Cholesky [22, .
147].
,
BLAS3.
k R, R11 , R22
>
k n k , A11 = R11
R11


:
\[ A_{12} = \overbrace{R_{11}^\top R_{12}}^{\text{xTRSM}} \]

\[ A_{22} = R_{22}^\top R_{22} + R_{12}^\top R_{12} \]

R12 .
>
R11
nk , . xTRSM BLAS3.
>
A22 R12
R12 xSYRK
BLAS3. Cholesky
, n k
Cholesky
BLAS2. xPOTF2.
Cholesky
LAPACK [1].
Cholesky, , 21 .
,
.1.
..
, .. A
.., .
, Laplace Poisson: (uxx + uyy ) = f (x, y).

. Cholesky .

5.10
,
.1, LAPACK. Fortran, C (CLAPACK),
C++ (LAPACK++) [1] Java (JLAPACK).
:
C (CLAPACK) Java (JLAPACK) ,
, Fortran 77
f2c f2j .
LAPACK C
C wrappers Fortran
LAPACK.
, LAPACK++ Template Numerical Toolkit (TNT),
ANSI C++. TNT
21
, , Cholesky.
,
..


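Table 5.1: LAPACK routine names have the form XYYZZZ.

X (data type)
  S    REAL
  D    DOUBLE PRECISION
  C    COMPLEX
  Z    COMPLEX*16

YY (matrix type)
  GE   general
  TR   triangular
  TB   triangular band
  TP   triangular, packed storage
  GB   general band
  GT   general tridiagonal
  HE   Hermitian
  PO   symmetric/Hermitian positive definite
  PP   positive definite, packed storage
  PB   positive definite band
  PT   positive definite tridiagonal
  SY   symmetric
  SP   symmetric, packed storage

ZZZ (operation)
  TRF  factorize
  TRS  solve using the factorization
  CON  estimate the condition number
  RFS  refine the solution and compute error bounds
  TRI  compute the inverse using the factorization
  EQU  equilibrate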

LAPACK++ MV++ SparseLib++ ( ,


,
pipelined RISC) IML++ .
[31].
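Routine names have the form XYYZZZ, where X denotes the data type,
YY the matrix type and ZZZ the operation performed.
The available values (footnote 22) are listed in Table 5.1.
5.10.1. DGETRF computes, in double precision (D = Double), the factorization (F = Factorization) of a general (GE = GEneral)
matrix into triangular (TR = TRiangular) factors.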

LAPACK : ,
23 . ,
, .. LU.
, .
functions
(.. ).
. LAPACK , , .
.
[1], .. Compaq Alpha Server DS-20 22
.1. .
. [1].
23

(= expert drivers.)


DGETRF 353 MFLOPS n = 100 440 MFLOPS


n = 1000. ,
r = 28. , SGI
Origin 2000 228 MFLOPS n = 100 452 n = 1000,
r = 64.
LAPACK ,
LAPACK .
6.2 [1].
Matlab functions . Matlab
, ,
, .. A \ b
.1. MATLAB
L U . ,
function lu L, U MATLAB 6.0.
[L,U] = LU(X) stores an upper triangular matrix in U and a "psychologically lower triangular matrix" (i.e. a product of lower
triangular and permutation matrices) in L, so that X = L*U. X
can be rectangular.
[L,U,P] = LU(X) returns lower triangular matrix L, upper triangular matrix U, and permutation matrix P so that P*X = L*U.
LU(X), with one output argument, returns the output from LAPACK'S DGETRF or ZGETRF routine.
Matlab LAPACK.
LINPACK 24 . ,
LAPACK, . ,
IMSL (25 Visual Numerics26 , LSLRG ,
:
... solves a system of linear algebraic equations having a real general
coefficient matrix. It first uses the routine LFCRG (page 15) to compute an LU factorization of the coefficient matrix based on Gauss
elimination with partial pivoting. Experiments were analyzed to determine efficient implementations on several different computers.
For some supercomputers, particularly those with efficient vendor-supplied BLAS, page 1046, versions that call Level 1, 2 and 3 BLAS
are used. The remaining computers use a factorization method provided to us by Dr. Leonard J. Harding of the University of Michigan.
Harding's work involves loop unrolling and jamming techniques
that achieve excellent performance on many computers. Using an
option, LSLRG will estimate the condition number of the matrix.
The solution of the linear system is then found using LFSRG.
24

, Matlab front-end LINPACK.


International Mathematical and Statistical Library.
26
http://www.vni.com/products/imsl/.
25


Table 5.2 ([24]).

Year      n         Machine                    Time
1991      55,296    Connection Machine CM-2    4.4
1992/3    75,264    Intel iPSC/860             2 2/3
1994      76,800    Connection Machine CM-5    4.1
1995      128,600   Intel Paragon              1

5.11
LINPACK LAPACK
(=benchmarks) . ,
/ , MFLOPS, LINPACK 100
LINPACK 1000. LINPACK LAPACK Jack Dongarra University of Tennessee,
, , 27
( PC )
LINPACK.
.
: ,
Alan Edelman MIT
.. . [15, 12].
n ( 103 )
.
,
.. (= computational electromagnetics). 5.2 [24]
( ) LU ...
.

5.12
MIT Gilbert Strang,
(1996). Gene Golub Charles van Loan [22]. ,
Horn Johnson [25, 26], Nick Higham [24],
. ,
,
. D.K. Faddeev
V.N. Faddeeva ( [16]),
Alston Householder [27], James H. Wilkinson
27
. http://www.netlib.org/benchmark/performance.ps Performance of
Various vector computers using standard linear equations software.


[39]. [7, 28, 35, 36].


Gene Golub James Ortega [21]
.

5.13
5.13.1 (, , 02-makeup). :
(A), LU .

5.13.2. :
A A = LU
. ( ,
.. ).

. : , A = [0, 1; 1, 0]
LU l11 u11 = 0 u11 = 0, U ,
LU .
5.13.3. , k = 1, ..., n 1
Gauss LU ,
(k, k)
(k + 1 : n, k).
;

. . U . , , 1
(I u1 e>
1 )M1,i A
u1 = 0. ,

(I u1 e>
1 )M1,i A

= M1,i A u1 e>
1 (M1,i A).

1 (I u1 e>
1 )M1,i A 1 U , ,
1 .
u1 , u1 e>
1 (M1,i A) 0,
>
>
U (1, :) = e>
1 (I u1 e1 )M1,i A = e1 M1,i A

U (1, :)e1 = e>


1,1
1 M1,i Ae1 =

1,1 (1,1) M1,i A, . ,
U .
5.13.4. ) ,
) .

5.13.5. ,
LU , x(1) Ax = b
x .
kx x(1) k/kxk
(A). . .
;

5.13.6. Ax = b LAPACK.

. (..
). LAPACK
. (..
Toeplitz).
5.13.7. A
Cholesky;

. [, . 5.9] .
5.13.8. :
.

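Solution. Let $s = [s_1, s_2, s_3]^\top$ and

\[ T = \begin{pmatrix} 1 & 0 & 0 \\ t_{21} & 1 & 0 \\ 0 & t_{32} & 1 \end{pmatrix}, \qquad t_{21}t_{32} \ne 0. \]

If $Ts = e_1$ then $s_3 \ne 0$. Indeed, the three equations are
$s_1 = 1$, $t_{21} + s_2 = 0$ and $t_{32}s_2 + s_3 = 0$.
Hence, if $s_3 = 0$ then $s_2 = 0$ and therefore $t_{21} = 0$, a contradiction.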
5.13.9.
Ax = b (
LAPACK) Robert Skeel

\[ \mathrm{cond}(A, x) = \frac{\| \, |A^{-1}| \, |A| \, |x| \, \|}{\|x\|} \]

A ,
b cond(A, x) = 1.
x; .

. x = e1 Ax = e1
1
1
x = [ 111 , 0, . . . , 0]> 11
e1 = |11
|
|A||x| = |A|e1 = e1 A .
|A1 ||A||x| = e1 . k|A1 | |A| |x|k = . |x|k =
.
5.13.10.
A = P LU L U ,
L
U . ( , LU
, ).

. L
, , . U
, , ,
j n j , ,
.


5.14


5.14.1. () u, v Rnn .
A = I + xy > det(A) = 1 + x> y . ( :
).

5.14.2 (Higham [24]). A


A1 . , ,
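$B = A + \alpha e_i e_j^\top$ (where $e_i$ denotes the column of the identity matrix with
1 in position i). Hint: use Exercise 5.14.1.

Solution. Write

\[ A + \alpha e_i e_j^\top = A\,(I + \alpha A^{-1} e_i e_j^\top), \]

so B is singular exactly when $I + \alpha A^{-1} e_i e_j^\top$ is singular.
By Exercise 5.14.1, $\det(I + \alpha A^{-1} e_i e_j^\top) = 1 + \alpha e_j^\top A^{-1} e_i$,
which vanishes (B singular) for $\alpha = -1/(A^{-1})_{ji}$,
where $(A^{-1})_{ji}$ is the (j, i) element of $A^{-1}$.
Therefore, if $(A^{-1})_{ji} \ne 0$, the matrix $A - \frac{1}{(A^{-1})_{ji}} e_i e_j^\top$
is singular.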
5.14.3 (Golub and van Loan [22]).
n
N (y, k) = I ye>
k y R Gauss-Jordan.
1. N (y, k)1 .
2. x Rn y
N (y, k)x = ek ;
3. Gauss-Jordan
A1 A.
4. A

.
1. B := N (y, k)1 , I = N (y, k)B = B ye>
k B = B ybk,: .
B = I + ybk,: B 1
. B = I + zu> .

N (y, k)B

>
>
>
>
(I ye>
k )(I + zu ) = I + zu yek yu k ,

k k z .
>
0 = zu> ye>
k yu k

(z yk )u> = ye>
k.
u = ek y = z yk k = k k k ,

k =

k
,
1 k

k k y . z = y(1+k )
k .

I+

ye>
k
1 k

2. x = Bek = ek +

x = ek +

ye>
k
1k ek

y
,
1 k

1 6= k y 0 x = ek .
5.14.4. Mi1 ,i2 ,
AMi1 ,i2 A i1 i2 .

> >
. : B = AMi1 ,i2 = (Mi>
A
)

1 ,i2
, B > = Mi1 ,i2 A>
A> i1 i2 . i1
i2 A.
5.14.5. U Rnn
U x = ej ej Rn
j .
U x = ej j 2 j + 1.

. , x j
U 1 . U 1 ,
x 1 j . ,
j . ,

U11
0

U12
U22

x1
x2

ej
0nj

U11 Rjj , U12 Rj(nj) , U22 R(nj)(nj) , x1 = [1 , ..., j ]> Rj ,


x2 Rnj , 0nj n j ej
j 1 .
, x2 = 0
U11 x1 = ej . j 2 j + 1,
:

j = 1/j,j
b = [1,n , ..., j1,n ]> j
U (1 : j 1, 1 : j 1)[1 , ..., j1 ]> =
b
j ,
j 1 (j
1)2 . (j 1)2 + j .
5.14.6. A Rnn . ,
LU 23 n3 + O(n2 ).
n3 + O(n2 ). : 1
.


: (
. 5.14.5). .

. , ,
LX = I , I L , (. 5.14.6)
5.14.7. :
.

. :
. ,
5.13.8 : , Cholesky A = LL> , L
. A , L . A1 = L> L1 . 5.13.8,
L1 . , L>
. , A1
, ,
.
5.14.8.
LU A R33 , .. lu(A),
A

\[ \begin{pmatrix} 1 & 2 & 3 \\ 0.5 & 1 & 2 \\ 1 & -1 & -1 \end{pmatrix} \]

1. A.
2. Ax = b
b = [11, 10, 4]> .

.
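1. The strictly lower triangular part holds the multipliers of L (which has unit diagonal) and the upper triangular part holds U:

\[ L = \begin{pmatrix} 1 & 0 & 0 \\ 0.5 & 1 & 0 \\ 1 & -1 & 1 \end{pmatrix}, \qquad U = \begin{pmatrix} 1 & 2 & 3 \\ 0 & 1 & 2 \\ 0 & 0 & -1 \end{pmatrix}, \qquad A = LU = \begin{pmatrix} 1 & 2 & 3 \\ 0.5 & 2 & 3.5 \\ 1 & 1 & 0 \end{pmatrix}. \]

2. Using L and U, forward and back substitution with
$b = [11, 10, 4]^\top$ give $x = [4.5, -0.5, 2.5]^\top$.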
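The computation in this exercise can be checked with a short sketch. It assumes that the packed output of lu(A) is [1 2 3; 0.5 1 2; 1 -1 -1]; the minus signs are an assumption, chosen because they are the ones that reproduce the printed A = LU and the magnitudes of the printed solution. The helper names are hypothetical.

```python
def unpack_lu(packed):
    """Split a packed LU matrix into unit lower triangular L and upper triangular U."""
    n = len(packed)
    L = [[packed[i][j] if j < i else (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    U = [[packed[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]
    return L, U

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def lu_solve(L, U, b):
    """Solve L U x = b by forward substitution followed by back substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = b[i] - sum(L[i][k] * y[k] for k in range(i))
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(U[i][k] * x[k] for k in range(i + 1, n))) / U[i][i]
    return x

packed = [[1.0, 2.0, 3.0],
          [0.5, 1.0, 2.0],
          [1.0, -1.0, -1.0]]   # assumed signs, see the note above
L, U = unpack_lu(packed)
A = matmul(L, U)
x = lu_solve(L, U, [11.0, 10.0, 4.0])
print(A)
print(x)
```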


5.14.9. A Rnn

A=

U
s

U R(n1)n Hessenberg s = [1 , , n ]. LU A.
A .
.

. (n, 1 : n1).
, .
n 1 Gauss (n, 1), (n, 2), , (n, n 1).
( L, U lij , uij .)
L = eye(n)
for k = 1 : n - 1
    l_{nk} = σ_k / u_{kk}
    for j = k + 1 : n
        σ_j = σ_j - l_{nk} u_{kj}
    end
end
, L U

L=

..
.

U
,
U
=

..
..
0, , 0, n
. 0
.
ln1 ln2 1
The cost is $\sum_{k=1}^{n-1}\bigl(1 + \sum_{j=k+1}^{n} 2\bigr)$, i.e. $n^2 + O(n)$.
5.14.10 (, , 04). A Rnn
A = U V > j,j = j
n,n = 0, U U > = I V V > = I . )
A . ) b = U z
z = [1, . . . , 1, 0]> . , , x
Ax = b ( : x
V .)

=
0
.

A
=
n
Pn1
>
>

u
v
,

U
,
U
V
x
=
U
z

j
j
j
j=1
V > x = z . x Rn V ,
x = V y y , y = z .
, j j = 1 j = 1 : n 1 n
Pn1
. , x = j=1 1j vj .
5.14.11. LU A
Rnn u, v Rn . ) B =
A1 (I uv > ). ) ( cnk )
B . (.
0

... !) )
Bz z Rn .


.
() A = LU .
:

\[ B = A^{-1}(I - uv^\top) = A^{-1} - A^{-1}uv^\top \]
The steps are:
1. Solve $Ax = u$,
2. Compute $A^{-1}$,
3. Form $C = xv^\top$,
4. Form $B = A^{-1} - C$.
()
:
1. 2n2 ( LU).
2. 2n3 n LX = I U B = X B = A1 .
, ,
LX = I , I L , (. 2
n(n+1)(2n+1)
5.14.6)
+ n3 = 34 n3 + n2 + O(n)
6
L, U .
3. n2 .
4. n2 .
= 2n3 + 4n2 + O(n) .
= 43 n3 + 72 n2 + O(n).
() Bz , :

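\[ Bz = A^{-1}(I - uv^\top)z = A^{-1}z - A^{-1}u\,(v^\top z) \]
The steps are:
1. Solve $Ax = z$
2. Solve $Ay = u$
3. Compute $\sigma = v^\top z$
4. Form $x - \sigma y$

Steps 1 and 2 dominate, for a total of about $4n^2$ operations.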
5.14.12. A Rnn .
1. u Rn B := A + uuT
.
2. uT A1 u > 0.

3. For $C := A^{-1} - A^{-1}u(1 + u^\top A^{-1}u)^{-1}u^\top A^{-1}$, $CB = I$.
4. Cholesky A.
Bx = y
. Cholesky, ... n2 + O(n)
.

.
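1. A is s.p.d., so $x^\top A x > 0$ for every $x \ne 0$. For $B = A + uu^\top$:

\[ x^\top Bx = x^\top(A + uu^\top)x = x^\top Ax + x^\top uu^\top x = x^\top Ax + (x^\top u)(x^\top u)^\top = x^\top Ax + (x^\top u)^2 > 0. \]

2. Let $A = R^\top R$ be the Cholesky factorization of A. For $u \ne 0$:

\[ u^\top A^{-1}u = u^\top(R^\top R)^{-1}u = u^\top R^{-1}R^{-\top}u = (u^\top R^{-1})(u^\top R^{-1})^\top = \|u^\top R^{-1}\|_2^2 > 0. \]

3. This is the Sherman-Morrison-Woodbury formula (5.0.119).

4. From part 3:

\[ Bx = y \;\Rightarrow\; CBx = Cy \;\Rightarrow\; x = Cy = A^{-1}y - A^{-1}u(1 + u^\top A^{-1}u)^{-1}u^\top A^{-1}y. \]

The steps for computing x are:
4.1 Solve $Az_1 = u$ ($2n^2$)
4.2 Solve $Az_2 = y$ ($2n^2$)
4.3 $\sigma_1 = u^\top z_2$ ($2n - 1$)
4.4 $\sigma_2 = (1 + u^\top A^{-1}u)^{-1}\sigma_1 = (1 + u^\top z_1)^{-1}\sigma_1$ ($2n + 2$)
4.5 $z_4 = \sigma_2 z_1$ ($n$)
4.6 $x = z_2 - z_4$ ($n$)
In total $4n^2 + 6n + 1 = 4n^2 + O(n)$ operations.
The systems in steps 4.1 and 4.2 are solved with the Cholesky factors, by forward and back substitution
(5.0.108).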


5.14.13. A Rnn , n = 2k
k . .
() A = LU :

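\[ \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} = \begin{pmatrix} I & 0 \\ X & I \end{pmatrix} \begin{pmatrix} Y_1 & Y_2 \\ 0 & Y_3 \end{pmatrix}, \qquad X, Y_1, Y_2, Y_3 \in \mathbb{R}^{\frac{n}{2}\times\frac{n}{2}}. \]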

1. X, Yj , j = 1 : 3 Aij
A1
ij .
2.
O(n ) ..., 2 < 3,
Ax = b O(n ) ... ( :
.)

.
1. $X = A_{21}A_{11}^{-1}$, $Y_1 = A_{11}$, $Y_2 = A_{12}$, $Y_3 = A_{22} - A_{21}A_{11}^{-1}A_{12}$ (the
Schur complement). A11 A .

2. ( 5,
(Strassen).
5.14.14. T Rnn A R(n+1)(n+1) ,

A=

T
vT

u, v Rn , 6= 0 R . T ,
K(n)
T y = z K(n) limn n3 = 0.
1. b Rn .
Ax = b A
.
2. ...
n K .

5.14.15. ) LU
U

1
0
1
1
A=
1 1
1 1

0
0
1
1

1
0
.
1
1

) , ,
A 2 (A) = 1.8.
Ax = b b A
; ) U
;

. , 5.7.
, , ,
LU ,
.
5.14.16. MATLAB
.
.
MATLAB .
for i = 1:n
    for j = 1:n
        A(i,j) = 1/(i+j-1)
    end
    b(i,1) = sin(2*pi*i/(n+1))
end
for i = 1:ceil(n/3)
    x = A\b
    b = x + b
end
. . b
b = sin(2*pi*[1:n]'/(n+1)); A
( Hilbert) :

J = 1:n;
J = J(ones(n,1),:);
I = J';
E = ones(n,n);
A = E./(I+J-1);

,
:

x = A\(A\ ... (A\b + b) + b) ... );
b = x + b;

, LU
. ,

b = sin(2*pi*[1:n]'/(n+1)); J = 1:n; J = J(ones(n,1),:); I = J';
E = ones(n,n); A = E./(I+J-1);
[L,U] = lu(A);
for i = 1:ceil(n/3)
    x = U \ (L \ b)
    b = x + b
end
5.14.17. MATLAB
(Y Rnm ) .
1 < m < n myvec (n) () n .
,


, ,
MATLAB .

function [Y] = vansolve(B);
[n,m] = size(B); x = myvec(n); V = [ ];
for i=1:n
    for j=1:n, V(i,j) = x(i)^(j-1); end
end
for k=1:m, Y(:,k) = V\B(:,k); end

. Vandermonde n x.
m (V > V )
B . , ) V , ) , )
. , ) ,
. (
)
x .

function [Y] = vansolve(B);
[n,m] = size(B); V = ones(n); x = myvec(n);
for j=2:n
    V(:,j) = V(:,j-1).*x;
end
[L,U] = lu(V);
Y = U\(L\B);
( Vandermonde)
. ,

x=myvec(n); o1 = ones(1,n); p = kron([0:n-1],o1');
X = kron(o1,x); V = X.^p
kron. ,

x=myvec(n); x=x'; o1 = ones(n,1);
X=x(o1,:); p = [0:n-1]'; p = p(:,o1); X = X.^p;
:

x.^[0:n-1];
) x [0:n-1] (. 1 n)

[1, x(1), x(1)^2, ..., x(1)^(n-1)]


)


??? Error using ==> .^


Matrix dimensions must agree.

5.14.18 (, , 04). MATLAB


.
,
, , MATLAB.
p m, n .

A=rand(m,n);B=rand(m,n);C=rand(m,n);X=rand(m+n,s);D=eye(m+n,m+n);
for i=1:m, for j=1:n, D(i,j)=A(i,j)*B(i,j)+C(i,j); end;end;
for i=1:m, for j=1:n, if (i==j), D(i,j)=p+D(i,j); end;end;end;
for k=1:s, Y(:,k)=D(1:m+n,1:m+n)\X(:,k); end;

. :

for i=1:m, for j=1:n, D(i,j)=A(i,j)*B(i,j)+C(i,j); end;end;


:

D(1:m,1:n)=A.*B+C;

for i=1:m, for j=1:n, if (i==j), D(i,j)=p+D(i,j); end;end;end;


:

k=min(m,n); D(1:k,1:k)=p*eye(k)+D(1:k,1:k);
s X(:, k), k = 1 : s
LU D :

[L,U]=lu(D); Y=U\(L\X);
5.14.19. A Rnn . ,
,
LU = P AQ L, U , L
, Q . , ,
n = 4.

. Lk (uk ) = I uk e>

j
,

i,j
k

Mi,j = [..., ei1 , ej , ei+1 , ..., ej1 , ei , ej+1 , ...]


n = 4. , ( ) ,
t, A, . A(0) = A,
(0)

t0 = arg max {|ij |}


1i,jn


t (i0 , j0 ) ( , t
(1,1) ). , ,
>
Mi,i0 Mi,i0 = I AMi,i
0 = AMi,i0 0
i i ,

M1,i0 AM1,j0
Gauss

A(1) = L1 M1,i0 AM1,j0 .


,
(1)

t1 = arg max {|ij |}


2i,jn

(i1 , j1 )
Gauss,

L2 M2,i1 A(1) M2,j1

= L2 M2,i1 L1 M1,i0 AM1,j0 M2,j1


= A(2) .

( , n = 4)

L3 M3,i2 A(2) M3,j2

L3 M3,i2 L2 M2,i1 L1 M1,i0 AM1,j0 M2,j1 M3,j2

A(3) = U.

, ,

L3 M3,i2 L2 M3,i2 M3,i2 M2,i1 L1 M2,i1 M3,i2 M3,i2 M2,i1 M1,i0 A M1,j0 M2,j1 M3,j2
|{z}
{z
}|
{z
}|
{z
} |
{z
}
|
3
L

2
L

1
L

k Lk , .. k = 2,
i1 2, i2 3, L
1
L

=
=

>
M3,i2 M2,i1 (I u1 e>
1 )M2,i1 M3,i2 = I M3,i2 M2,i1 u1 (M3,i2 M2,i1 e1 ) )
>
I u
1 e1

u
1 = M3,i2 M2,i1 u1 .
u1 ( 2 n.) , Q =
M1,j0 M2,j1 M3,j2 , .

P AQ

1 L
1 L
1 U
= L
1
2
3
= (I + u
1 e>
1 e>
1 e>
1 )(I + u
1 )(I + u
1 )U
= (I + u
1 + u
2 + u
3 )U = LU

5.14.20 ( 03).
Hessenberg , A Rnn , LU
(,
) A L, U


A. )
: k =
max{0 ,...,k1 }
,
0

(k)

k := maxi,j |ij |
(k + 1 : n, k + 1 : n)
Lk L1 A k LU .
, M ,

. , LU
A. MATLAB error(MSG)
MSG
max(X) X
X.

. ) :

1, ..., n 1

for j=1:n-1
    A(j+1,j) = A(j+1,j)/A(j,j)
    A(j+1,j+1:n) = A(j+1,j+1:n) - A(j+1,j)*A(j,j+1:n)
end
) :

a0 = max(max(abs(A))); r = 1;
for j=1:n-1
    if r < M
        A(j+1,j) = A(j+1,j)/A(j,j);
        A(j+1,j+1:n) = A(j+1,j+1:n) - A(j+1,j)*A(j,j+1:n);
        r = max(max(max(abs(A(j+1:n,j+1:n))))/a0, r);
    else
        error('Must use pivoting');
    end
end

[1] E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen. LAPACK Users' Guide. SIAM, Philadelphia, third edition, 1999.
[2] Å. Björck. Least squares methods. In P. G. Ciarlet and J. L. Lions, editors, Handbook of Numerical Analysis, volume 1. Elsevier/North Holland, 1987.
[3] J.R. Bunch and J.E. Hopcroft. Triangular factorization and inversion by fast matrix multiplication. Math. Comp., 28(125):231-236, Jan. 1974.
[4] F. Chaitin-Chatelin and V. Frayssé. Lectures on Finite Precision Computations. SIAM, Philadelphia, 1996.
[5] T. F. Chan and D. Foulser. Effectively well-conditioned linear systems. SIAM J. Sci. Stat. Comput., 9(6):963-969, Nov. 1988.
[6] T.H. Cormen, C.E. Leiserson, and R.L. Rivest. Introduction to Algorithms. McGraw-Hill, New York, 1990.
[7] G. Dahlquist and Å. Björck. Numerical Methods. Prentice-Hall, 1974.
[8] L. DeRose, K. Gallivan, E. Gallopoulos, B. Marsolf, and D. Padua. FALCON: A MATLAB Interactive Restructuring Compiler. In C.-H. Huang, et al., editor, Lecture Notes in Computer Science: Languages and Compilers for Parallel Computing, pages 269-288. Springer-Verlag, New York, 1995.
[9] J. J. Dongarra, F. G. Gustavson, and A. Karp. Implementing linear algebra algorithms for dense matrices on a vector pipeline machine. SIAM Rev., 26(1):91-111, January 1984.
[10] J.J. Dongarra, J.R. Bunch, C.B. Moler, and G.W. Stewart. LINPACK Users' Guide. SIAM, Philadelphia, PA, 1979.
[11] J.J. Dongarra, R. Pozo, and D. Walker. LAPACK++: A design overview of object-oriented extensions for high performance linear algebra. In Proc. Supercomputing '93, pages 162-171. IEEE Computer Soc. Press, 1993.
[12] J.J. Dongarra and D.W. Walker. Software libraries for linear algebra computations in high performance computers. SIAM Rev., 37(2):151-180, June 1995. Also in http://hpclab.ceid.upatras.gr/faculty/stratis/download/Dongarrawalker.zip.
[13] I. S. Duff, A. M. Erisman, and J. K. Reid. Direct Methods for Sparse Matrices. Clarendon Press, Oxford, 1989.
[14] I. S. Duff, R. G. Grimes, and J. G. Lewis. Sparse matrix test problems. ACM Trans. Math. Softw., 15:1-14, 1989.
[15] A. Edelman. The first annual large dense linear system survey. ACM SIGNUM Newsletter, 26(4), October 1991.
[16] D. K. Faddeev and V. N. Faddeeva. Computational Methods of Linear Algebra. W. H. Freeman and Co., San Francisco, 1963.
[17] R.W. Freund, G.H. Golub, and N.M. Nachtigal. Iterative solution of linear systems. In Acta Numerica, volume 1, pages 57-100. Cambridge University Press, 1992.
[18] C.F. Gauss. Theoria Combinationis Observationum Erroribus Minimis Obnoxiae (Theory of the Combination of Observations Least Subject to Errors). SIAM, Philadelphia, 1995. Translation and notes by G.W. Stewart.
[19] W. Gautschi. How (un)stable are Vandermonde systems? In R. Wong, editor, Asymptotic Analysis and Computational Analysis, pages 193-210. Marcel Dekker, Inc., New York, 1990.
[20] J. R. Gilbert, C. Moler, and R. Schreiber. Sparse matrices in MATLAB: Design and implementation. SIAM J. Matrix Anal. Appl., 13(1):333-356, 1992.
[21] G. Golub and J.M. Ortega. Scientific Computing: An Introduction with Parallel Computing. Academic Press, Inc., San Diego, CA, 1993.
[22] G.H. Golub and C.F. Van Loan. Matrix Computations. The Johns Hopkins University Press, Baltimore, 3rd edition, 1996.
[23] W.W. Hager. Condition estimates. SIAM J. Sci. Stat. Comput., 5(2):311-316, 1984.
[24] N.J. Higham. Accuracy and Stability of Numerical Algorithms. SIAM, Philadelphia, 2nd edition, 2002.
[25] R. A. Horn and C. R. Johnson. Matrix Analysis. Cambridge University Press, Cambridge, 1985.
[26] R. A. Horn and C. R. Johnson. Topics in Matrix Analysis. Cambridge University Press, Cambridge, 1991.
[27] A. S. Householder. The Theory of Matrices in Numerical Analysis. Dover Pub., New York, 1964.
[28] E. Isaacson and H. B. Keller. Analysis of Numerical Methods. John Wiley & Sons, New York, 1966.
[29] C. Moler. Matrix computations with Fortran and paging. Commun. ACM, 15:268-270, 1972.
[30] J. M. Ortega. Introduction to Parallel and Vector Solution of Linear Systems. Plenum Press, New York, 1988.
[31] R. Pozo. Template Numerical Toolkit: A numeric library for scientific computing in C++. At URL http://math.nist.gov/tnt/.
[32] Y. Saad. Numerical Methods for Large Eigenvalue Problems. Halstead Press, New York, 1992.
[33] Y. Saad. Iterative Methods for Sparse Linear Systems. PWS Pub., Boston, 1996.
[34] Y. Saad. SPARSKIT: A basic tool kit for sparse matrix computation. Technical Report 1029, Center for Supercomputing Research and Development, University of Illinois at Urbana-Champaign, August 1990.
[35] G.W. Stewart. Introduction to Matrix Computations. Academic Press, New York, 1973.
[36] J. Stoer and R. Bulirsch. Introduction to Numerical Analysis. Springer Verlag, New York, 2nd edition, 1993.
[37] V. Strassen. Gaussian elimination is not optimal. Numer. Math., 13:354-356, 1969.
[38] F. Stummel. Forward error analysis of Gaussian elimination I: Error and residual estimates. Numer. Math., 46:365-395, 1985.
[39] J. H. Wilkinson. The Algebraic Eigenvalue Problem. Oxford University Press, 1965.


III
:
.2 A m n, m- b. n x krk2
r = b Ax.
. :
.1 .2. m
n, A ,
x Ax = b. x
b Ax.
. x
x
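For example ([3]), let $A = [1, 1, 1]^\top$ and $b = [\beta_1, \beta_2, \beta_3]^\top$ with $\beta_1 \ge \beta_2 \ge \beta_3 \ge 0$. Minimizing $\|r\|_p$ gives
$x = \beta_2$ for $p = 1$, $x = (\beta_1 + \beta_2 + \beta_3)/3$ for $p = 2$, and
$x = (\beta_1 + \beta_3)/2$ for $p = \infty$. For p = 2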
kb Axk2
x,
.
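The three minimizers in the example above (median, mean and midrange of the entries of b) can be checked numerically. The following is a small sketch; the values chosen for b and the helper `residual_norm` are illustrative assumptions, not taken from the text.

```python
import math
from statistics import mean, median

def residual_norm(b, x, p):
    """p-norm of the residual b - A x for A = [1, 1, 1]^T and a scalar unknown x."""
    r = [abs(bi - x) for bi in b]
    if p == math.inf:
        return max(r)
    return sum(ri ** p for ri in r) ** (1.0 / p)

b = [5.0, 2.0, 1.0]            # beta_1 >= beta_2 >= beta_3 >= 0 (illustrative values)
minimizers = {
    1: median(b),                        # p = 1: the middle value beta_2
    2: mean(b),                          # p = 2: the arithmetic mean
    math.inf: (max(b) + min(b)) / 2.0,   # p = infinity: the midrange
}
for p, x in minimizers.items():
    print(p, x, residual_norm(b, x, p))
```

Moving x slightly away from each of these values increases the corresponding norm, which is what the test below verifies.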
, .
.2. ,
If $Q^\top Q = I$ then $\|b - Ax\|_2 = \|Q^\top b - Q^\top Ax\|_2$, so for any orthogonal Q the problem is equivalent to $\min_x \|Q^\top b - Q^\top Ax\|_2$.
, ,
k k2 .
2.

6.1 QR
A
. .
6.1.1. A Rmn m n, Q
mn
R
() R Rnn
A = QR. m = n Q A
R
. Q, R .

6.2
6.2.1. S Rn . S
S .

6.2.2. P : Rn Rn P 2 = P .
P ( ).

P
.
6.2.1. P , I P
.

6.2.2. P 2 = P S := Range(P ) T := Range(I P ).


x Rn , .
x = s + t s S , t T . s, t . s
x S T .

:
Rn (= span) S T Rn x Rn
s S , t T .
x Rn s S
S T .
6.1, S T
. P 1 .
s S t T (, ) Rn ,

\[ (s, t) = (Px, (I - P)y) = (x, P(I - P)y) = (x, Py - P^2y) = 0. \]


S T P S . ST T
S Rn .
, S T ,

\[ P : \mathbb{R}^n \to S, \qquad Px = s \ \text{ and } \ Py = 0 \ \ \forall y \in T. \]
1

, Rn Cn .


x
t
s

6.1: s () x S T .
S .
6.2.1. S Rn P
.
1. $x \in S \Rightarrow Px = x$.
2. $x = \underbrace{Px}_{s} + \underbrace{(I - P)x}_{t}$.
3. $P^2 = P$ (idempotent).
4. $P^\top = P$.

6.2.1 ( ). S, T
V x V
x = s + t s S , t T .

6.2.1. .
( = oblique projectors).

6.2.1
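For $u \in \mathbb{R}^m$ define

\[ P := \frac{uu^\top}{u^\top u} \in \mathbb{R}^{m\times m}. \]

For $x \in \mathbb{R}^m$,

\[ Px = \frac{uu^\top}{u^\top u}\,x = \frac{u}{\|u\|}\,\frac{u^\top x}{\|u\|} = v\,\|x\|\cos(x, u), \qquad v := \frac{u}{\|u\|}. \]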


[Figure 6.2: $P := uu^\top/u^\top u$ projects x onto $\langle u \rangle$.]
P x x
v , . span{v}.
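6.2.1. For $u = [1, 2, 3]^\top$ and $x = [\xi_1, \xi_2, \xi_3]^\top$,

\[ P = \frac{1}{14}\begin{pmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \\ 3 & 6 & 9 \end{pmatrix}, \qquad Px = \frac{1}{14}\begin{pmatrix} \xi_1 + 2\xi_2 + 3\xi_3 \\ 2\xi_1 + 4\xi_2 + 6\xi_3 \\ 3\xi_1 + 6\xi_2 + 9\xi_3 \end{pmatrix} = \frac{\xi_1 + 2\xi_2 + 3\xi_3}{14}\begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}. \]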
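The rank-one projector of this example can be reproduced in exact rational arithmetic. The sketch below is only an illustration: the vector x = [1, 0, -1] and the function names are assumptions made for the example.

```python
from fractions import Fraction

def rank_one_projector(u):
    """Return P = u u^T / (u^T u) with exact rational entries."""
    uu = sum(Fraction(ui) * ui for ui in u)
    return [[Fraction(ui * uj) / uu for uj in u] for ui in u]

def matvec(M, x):
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in M]

u = [1, 2, 3]
P = rank_one_projector(u)
x = [Fraction(1), Fraction(0), Fraction(-1)]   # illustrative vector
Px = matvec(P, x)        # equals (u^T x / u^T u) u = (-2/14) u
PPx = matvec(P, Px)      # projecting twice changes nothing (P^2 = P)
print(Px)
```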

6.2.2 Gram-Schmidt
Gram-Schmidt
6.1.1. A Rmn .
Gram-Schmidt [q1 , ..., qn ]
ha1 , ...., an i. . .
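Given $q_1, \dots, q_{k-1}$, and requiring $\mathrm{span}\{q_1, \dots, q_k\} = \mathrm{span}\{a_1, \dots, a_k\}$,
the vector $q_k$ is computed as:

\[ \tilde{q}_k := (I - P_1 - \cdots - P_{k-1})a_k, \qquad q_k := \tilde{q}_k/\|\tilde{q}_k\|, \]

where $P_j := q_jq_j^\top$ is the orthogonal projector onto $\mathrm{span}\{q_j\}$. Equivalently,

\[ \tilde{q}_k := a_k - (q_1^\top a_k)q_1 - \cdots - (q_{k-1}^\top a_k)q_{k-1}. \]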


q1 , ..., qk1 , qk . :
.
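Algorithm [Classical Gram-Schmidt]
function [Q, R] = CGS(A)
ρ_11 = ||a_1||, q_1 = a_1/ρ_11
for k = 1 : n
    for i = 1 : k - 1
        ρ_ik = q_i^T a_k
    end
    q̃_k = a_k - Σ_{i=1}^{k-1} ρ_ik q_i
    ρ_kk = ||q̃_k||_2
    q_k = q̃_k/ρ_kk
end

CGS costs T = 2mn^2 + O(mn) flops.
The algorithm is built from BLAS-1 operations, i.e. DOT and sAXPY.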
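A direct transcription of the classical Gram-Schmidt loop is sketched below in plain Python. It is an illustration under simple assumptions (dense lists, full column rank, no rank test); the function name `cgs` and the test matrix are hypothetical.

```python
import math

def cgs(A):
    """Classical Gram-Schmidt for a full column rank A (list of rows).

    Returns Q as a list of columns q_1..q_n and the n x n upper triangular R.
    """
    m, n = len(A), len(A[0])
    cols = [[A[i][k] for i in range(m)] for k in range(n)]
    Q = []
    R = [[0.0] * n for _ in range(n)]
    for k in range(n):
        a = cols[k]
        # every coefficient is computed from the ORIGINAL column a_k
        for i in range(k):
            R[i][k] = sum(qi * ai for qi, ai in zip(Q[i], a))
        q = [a[t] - sum(R[i][k] * Q[i][t] for i in range(k)) for t in range(m)]
        R[k][k] = math.sqrt(sum(v * v for v in q))
        Q.append([v / R[k][k] for v in q])
    return Q, R

A = [[3.0, 1.0],
     [4.0, 2.0],
     [0.0, 2.0]]
Q, R = cgs(A)
print(R)
```

For this well-conditioned matrix the computed factors reproduce A and the columns of Q are orthogonal to working precision.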
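CGS computes the coefficients $\rho_{jk}$ ($1 \le j \le k$) such that

\[ a_k = \sum_{j=1}^{k} \rho_{jk}q_j, \qquad 1 \le k \le n, \]

that is, $A = Q_1R_1$ with $Q_1 = [q_1, \dots, q_n]$ and

\[ R_1 = \begin{pmatrix} \rho_{11} & \cdots & \rho_{1n} \\ 0 & \ddots & \vdots \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \rho_{nn} \end{pmatrix}. \]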
, CGS, R1
11 , 12 , 22 , ..., . .
kk = 0,

ak span{q1 , ..., qk1 } = span{a1 , ..., ak1 }


A ,
. R1
R1 .
6.2.2. ,

A , kk
. ak a1 , ..., ak1 .
..., ak
a1 , .., ak1 ;
.
A n. ,
...
n.

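6.2.2. [2] Let

\[ A = \begin{pmatrix} 1 & 1 & 1 \\ \epsilon & 0 & 0 \\ 0 & \epsilon & 0 \\ 0 & 0 & \epsilon \end{pmatrix}, \]

where ε is small enough that, in floating point arithmetic, $1 + \epsilon^2 = 1$.
For instance, in MATLAB 4.2 on a Macintosh Powerbook 5300
(PowerPC CPU), CGS with ε = 1.4901e-08 returns

\[ Q = \begin{pmatrix} 1 & 0 & 0 \\ \epsilon & -\frac{\sqrt{2}}{2} & -\frac{\sqrt{2}}{2} \\ 0 & \frac{\sqrt{2}}{2} & 0 \\ 0 & 0 & \frac{\sqrt{2}}{2} \end{pmatrix}. \]

Here $q_2^\top q_3 \approx 0.5$, so the columns of Q are far from orthogonal:

\[ \|I - Q_{CGS}^\top Q_{CGS}\|_F \approx 0.7071. \]

On the other hand $\|A - Q_{CGS}R_{CGS}\| \approx 10^{-25}$: the computed Q and R
reproduce A accurately even though Q is not orthogonal.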
A, Q .

6.2.3 GS
, CGS Q.
GS . , q1 ,
a2 ,
A,
a3 , ..., an . Q

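\[ q_1 := \tilde{q}_1/\|\tilde{q}_1\|, \qquad \tilde{q}_j := \tilde{q}_j - P_1\tilde{q}_j = (I - P_1)\tilde{q}_j, \quad j = 2, \dots, n, \]

where $P_1\tilde{q}_j = (q_1^\top\tilde{q}_j)q_1$. This is the
modified Gram-Schmidt method (see footnote 2).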
2

Modified Gram-Schmidt


Algorithm [Modified Gram-Schmidt]
function [Q, R] = MGS(A)
Q = A
for k = 1 : n
    ρ_kk = ||q_k||
    q_k = q_k/ρ_kk
    for j = k + 1 : n
        ρ_kj = q_k^T q_j
        q_j = q_j - ρ_kj q_k
    end
end
CGS
. , .
qj CGS. .
6.2.3. MGS .

\[ \|I - Q_{MGS}^\top Q_{MGS}\|_F \approx 1.2 \times 10^{-8}, \]

. Q
CGS:

MGS
.
(
Vandermonde).
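The difference between the two variants can be reproduced on a 4 x 3 matrix with rows [1 1 1], [ε 0 0], [0 ε 0], [0 0 ε]. The sketch below is an illustration only: it uses ε = 1e-8 (an assumption, chosen so that 1 + ε^2 rounds to 1 in IEEE double precision) and processes one column at a time, which performs the same arithmetic as the column-sweeping form of MGS given above.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def gram_schmidt(columns, modified):
    """Orthonormalize the columns; return the list of vectors q_1..q_n.

    modified=False: classical GS, coefficients taken from the original column.
    modified=True:  modified GS, coefficients taken from the partially reduced vector.
    """
    Q = []
    for a in columns:
        v = list(a)
        for q in Q:
            rho = dot(q, v) if modified else dot(q, a)
            v = [vi - rho * qi for vi, qi in zip(v, q)]
        nrm = math.sqrt(dot(v, v))
        Q.append([vi / nrm for vi in v])
    return Q

eps = 1e-8                      # assumed value with fl(1 + eps^2) = 1
columns = [[1.0, eps, 0.0, 0.0],
           [1.0, 0.0, eps, 0.0],
           [1.0, 0.0, 0.0, eps]]
Q_cgs = gram_schmidt(columns, modified=False)
Q_mgs = gram_schmidt(columns, modified=True)
print("CGS q2.q3 =", dot(Q_cgs[1], Q_cgs[2]))
print("MGS q2.q3 =", dot(Q_mgs[1], Q_mgs[2]))
```

With CGS the inner product of the second and third computed vectors is about 0.5, while with MGS it is at roundoff level; the remaining loss of orthogonality in MGS is of the order of ε and shows up in the inner products with q_1.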

6.2.4 Householder
( Householder)

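\[ H := E(u, u; 2/u^\top u) = I - \frac{2}{u^\top u}uu^\top. \]

H is symmetric and orthogonal: $H^\top = H$ and $H^\top H = H^2 = I$. Moreover,

\[ Hx = \Bigl(I - \frac{2}{u^\top u}uu^\top\Bigr)x = x - Px - Px, \]

where $P := uu^\top/u^\top u$ is the orthogonal projector onto span{u}, so Hx is the reflection of x with respect to $[\mathrm{span}\{u\}]^\perp$; see Figure 6.3.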
:
6.2.2. H . H


[Figure 6.3: $Hx = (x - P_{\langle u \rangle}x) - P_{\langle u \rangle}x$, with $x := \vec{OA}$.]

H (
).

.

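For $x \in \mathbb{R}^n$,

\[ Hx = \Bigl(I - 2\frac{uu^\top}{u^\top u}\Bigr)x = x - 2\frac{u^\top x}{u^\top u}u. \]

Given x, we want $Hx \in \langle e_1 \rangle$, which requires $u \in \langle x, e_1 \rangle$; set
$u = x + \alpha e_1$. Then $u^\top x = x^\top x + \alpha\xi_1$ and $u^\top u = x^\top x + 2\alpha\xi_1 + \alpha^2$, so

\[ Hx = \Bigl(1 - 2\frac{x^\top x + \alpha\xi_1}{x^\top x + 2\alpha\xi_1 + \alpha^2}\Bigr)x - 2\alpha\frac{u^\top x}{u^\top u}e_1. \]

The coefficient of x vanishes when $\alpha^2 = x^\top x$, i.e. $\alpha = \pm\|x\|_2$:

\[ u = x \pm \|x\|_2e_1 \;\Rightarrow\; Hx = \mp\|x\|_2e_1. \]

To avoid cancellation, the sign is chosen as:

\[ u = x + \mathrm{sign}(\xi_1)\|x\|_2e_1. \]

Scaling u by any nonzero scalar γ does not change the Householder matrix H:

\[ H = I - \frac{2}{u^\top u}uu^\top = I - \frac{2}{(\gamma u)^\top(\gamma u)}(\gamma u)(\gamma u)^\top. \]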


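In practice, u is normalized so that u(1) = 1.
Algorithm. Given $x \ne 0$, REFL returns the Householder vector u, normalized so that u(1) = 1, for which $Hx = -\mathrm{sign}(x(1))\|x\|e_1$.
function u = REFL(x)
n = length(x); σ = norm(x, 2); u = x
if σ ≠ 0
    β = x(1) + sign(x(1)) σ
    u(2 : n) = u(2 : n)/β
end
u(1) = 1
end

REFL costs T = 3n flops.
Remark: the vector u suffices to represent the Householder matrix.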
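The construction of the Householder vector can be sketched as follows. This is a minimal plain-Python illustration with hypothetical function names; unlike a literal sign(x(1)), it uses `math.copysign`, so that x(1) = 0 is treated as positive (an assumption of this sketch that avoids a zero divisor).

```python
import math

def house(x):
    """Return u with u[0] == 1 such that H = I - 2 u u^T/(u^T u) maps x to a multiple of e_1."""
    sigma = math.sqrt(sum(v * v for v in x))
    u = [float(v) for v in x]
    if sigma != 0.0:
        beta = x[0] + math.copysign(sigma, x[0])
        u[1:] = [v / beta for v in u[1:]]
    u[0] = 1.0
    return u

def reflect(u, x):
    """Apply H = I - 2 u u^T/(u^T u) to x without forming H."""
    scale = 2.0 * sum(a * b for a, b in zip(u, x)) / sum(a * a for a in u)
    return [xi - scale * ui for xi, ui in zip(x, u)]

x = [3.0, 4.0, 0.0]
u = house(x)
Hx = reflect(u, x)
print(u, Hx)
```

For x = [3, 4, 0] the reflected vector is -||x|| e_1, in agreement with the sign choice u = x + sign(x(1)) ||x|| e_1.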
6.2.4. u =

x sign(1 )kxk2 e1 Householder.

6.2.5
H . A

\[ HA = A - \frac{2}{u^\top u}u(A^\top u)^\top, \qquad AH = A - \frac{2}{u^\top u}(Au)u^\top. \]

function B = REFL.ROW(A, u)
(* overwrite A with HA *)
β = -2/u^T u
w = A^T u
B = A + β u w^T

function B = REFL.COL(A, u)
(* overwrite A with AH *)
β = -2/u^T u
w = A u
B = A + β w u^T
AH HA 1 DOT, 1 MXV,
REFL.(COL/ROW)
1. T
= 4mn +

...


:

\[ H := H_1H_2 \cdots H_r, \qquad H_j = I - \frac{2}{(u^{(j)})^\top u^{(j)}}u^{(j)}(u^{(j)})^\top, \qquad u^{(j)} = [0, \dots, 0, 1, u^{(j)}_{j+1}, \dots, u^{(j)}_n]^\top. \]

, .. H > A:
for j = 1 : r
    (* A = H_j A *)
    A(j : m, :) = REFL.ROW(A(j : m, :), u^{(j)}(j : m))
end

BLAS3. 6.3.2
.
6.2.1. REFL.ROW
2 .

.

6.3 QR: Householder


QR Gram-Schmidt.
A A.
A R65 ,

A(2)

= H2 H1 A =

x x

x x

x x

.
x x

x x

x x

H3
A(2) ( Z ). Hj ( ) A.

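\[ H_nH_{n-1} \cdots H_1A = R, \]

and, since each $H_j$ is symmetric and orthogonal,

\[ A = \underbrace{H_1H_2 \cdots H_n}_{Q}R = QR, \]

where $Q \in \mathbb{R}^{m\times m}$ is orthogonal
and $R \in \mathbb{R}^{m\times n}$ is upper triangular.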
:

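Algorithm [QR via Householder]
Given $A \in \mathbb{R}^{m\times n}$ with $m \ge n$, the algorithm computes Householder matrices
$H_1, \dots, H_n$ such that, with $Q := H_1 \cdots H_n$, $Q^\top A = R$
is upper triangular. The upper triangular part of A is overwritten by
R, and components j + 1 : m of the j-th Householder vector are stored in
A(j + 1 : m, j), j < m.
for j = 1 : n
    u(j : m) = REFL(A(j : m, j))
    A(j : m, j : n) = REFL.ROW(A(j : m, j : n), u(j : m))
    if j < m
        A(j + 1 : m, j) = u(j + 1 : m)
    end
end

Cost: T = 2n^2(m - n/3) flops.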
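The triangularization loop can be sketched in plain Python as below. This is a simplified illustration and not the in-place storage scheme of the algorithm above: the Householder vectors are kept unnormalized in a separate list, and all names are hypothetical.

```python
import math

def householder_qr(A):
    """Return (R, vectors): R = Q^T A (m x n, upper triangular) and the Householder vectors."""
    m, n = len(A), len(A[0])
    R = [[float(v) for v in row] for row in A]
    vectors = []
    for j in range(n):
        x = [R[i][j] for i in range(j, m)]
        sigma = math.sqrt(sum(v * v for v in x))
        u = x[:]
        u[0] += math.copysign(sigma, x[0])     # u = x + sign(x_1) ||x|| e_1
        vectors.append(u)
        uu = sum(v * v for v in u)
        if uu == 0.0:
            continue                           # column is already zero: H_j = I
        for k in range(j, n):
            # column k of the trailing block becomes col - (2 u^T col / u^T u) u
            s = 2.0 * sum(u[i - j] * R[i][k] for i in range(j, m)) / uu
            for i in range(j, m):
                R[i][k] -= s * u[i - j]
    return R, vectors

A = [[3.0, 1.0],
     [4.0, 2.0],
     [0.0, 2.0]]
R, vectors = householder_qr(A)
for row in R:
    print(row)
```

Since Q is orthogonal, R^T R must equal A^T A; the test uses this identity to check the result without forming Q.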



A
Householder . A
:

11
(1)
2
.
.

.
.
.
(1)
m

12
22
(1)
2
.

.
.
.
.

1n
2n
.
n,n

(2)
m

n+1
(n)
m

(n)


Householder ,
.
, Q
. .

6.3.1 QR

, . ,
QR, . Q1 Rmn
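$R_1 \in \mathbb{R}^{n\times n}$ with $A = Q_1R_1$.
Then $A^\top A = R_1^\top Q_1^\top Q_1R_1 = R_1^\top R_1$, so $R_1$ is the Cholesky factor
of the (s.p.d.) matrix $M := A^\top A$.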

RT A> (RT A> )A =


R.

M (n) . n. , O(M (n)).
Cholesky. :
1. M := A> A m
n O(M (n)).
O(M (n)).

m
n

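2. Compute the Cholesky factor $R_1$ of M, in O(M(n)) operations.

3. From $A = Q_1R_1$ obtain Q: $R_1^\top Q_1^\top = A^\top$,
i.e. m triangular systems with $R_1^\top$.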
mn2 O(M (n)).

. O(M (n)).

6.3.2 QR Householder
Householder Q Q = H1 Hr . ,
Q
REFL.ROW REFL.COL. :

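\[ H_1 = I - \beta_1u_1u_1^\top, \qquad H_2 = I - \beta_2u_2u_2^\top, \]

so that $H_1H_2 = I - G$, where

\[ G := \beta_1u_1u_1^\top + \beta_2u_2u_2^\top - (\beta_1\beta_2u_1^\top u_2)u_1u_2^\top. \]

G has rank at most 2: for any x,

\[ Gx = (\beta_1u_1^\top x)u_1 + (\beta_2u_2^\top x)u_2 - (\beta_1\beta_2u_1^\top u_2\,u_2^\top x)u_1, \]

so Gx lies in the 2-dimensional subspace spanned by $u_1$ and $u_2$.
Hence G can be written as $G = -WY^\top$ (footnote 3) with $W, Y \in \mathbb{R}^{n\times 2}$, i.e.

\[ H_1H_2 = I + WY^\top, \qquad W, Y \in \mathbb{R}^{n\times 2}. \]

In general,

\[ Q := H_1 \cdots H_r = I + WY^\top, \qquad W, Y \in \mathbb{R}^{n\times r}. \]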


, 2r
W Y . [1] QR. .
3

, W, Y r 4 :
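6.3.1. Let $Q = I + WY^\top$ with $W, Y \in \mathbb{R}^{n\times r}$. If $P = I - 2uu^\top/u^\top u$ with $u \in \mathbb{R}^n$ and $z := -2Qu/u^\top u$, then

\[ Q_+ = QP = I + W_+Y_+^\top, \qquad W_+ = [W \;\; z], \quad Y_+ = [Y \;\; u]. \]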
Q W Y
, .

W, Y r :
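Let $Q := H_1 \cdots H_r$, where the $H_j$ are Householder matrices of order n. The following computes W, Y such that $Q = I + WY^\top$:

Y = u^{(1)}
W = -2u^{(1)} / (u^{(1)})^T u^{(1)}
for j = 2 : r
    z = -2(u^{(j)} + W Y^T u^{(j)}) / (u^{(j)})^T u^{(j)}
    W = [W z]
    Y = [Y u^{(j)}]
end

Exploiting the zero structure of the $u^{(j)}$, the cost is T = 2r^2 n - 2r^3/3 flops.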
Y Householder,
QR ,
Y .

1
(1)
2
Y =
.
(1)
n

0
1

(2)
n

.
.

0
.

0
1

sr
:

QA =
=

(Hsr H(s1)r+1 ) (H2r Hr+1 )(Hr H1 )A


(I + Ws Ys> ) (I + W1 Y1> )A,

W, Y Rnr .
(I + W Y > )B = B + W Y > B .
Wj , Yj , Wj Yj> A
BLAS3,
. ,
QR :
4

Q .

1.1: H1 , ..., Hr (j + 1 : m, j) A j = 1, .., r ,
r A:

A(1) Hr H1 A(:, 1 : r).


1.2: r I + Wr Yr> :

Pr := Hr H1 = I + Wr Yr>
1.3: Pr r + 1 : n A.

:
.
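Algorithm [Block QR via Householder]
(* r is the block size *) λ = 1; k = 0
while λ ≤ n
    τ = min(λ + r - 1, n); k = k + 1
    (* Using A(λ : m, λ : τ), upper triangularize it by computing H_λ, ..., H_τ. *)
    (* Compute W_k, Y_k such that I + W_k Y_k^T = H_λ ... H_τ *)
    A(λ : m, τ + 1 : n) = (I + W_k Y_k^T)^T A(λ : m, τ + 1 : n)
    λ = τ + 1
end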
. ...
QR. r . r
. [1].
Householder, . r = 1
Householder, r > 1 Householder
r .
,
,
(.. )
, .

6.3.3 .2 QR
QR , .2, A
.
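Let $A \in \mathbb{R}^{m\times n}$ with $m \ge n$, $b \in \mathbb{R}^m$, and let $Q \in \mathbb{R}^{m\times m}$ be orthogonal with

\[ Q^\top A = R = \begin{pmatrix} R_1 \\ 0 \end{pmatrix}, \quad R_1 \in \mathbb{R}^{n\times n}, \qquad Q^\top b = \begin{pmatrix} c \\ d \end{pmatrix}. \]

Then

\[ \|Ax - b\|_2^2 = \|Q^\top Ax - Q^\top b\|_2^2 = \|R_1x - c\|_2^2 + \|d\|_2^2. \]

If $R_1$ is nonsingular (see 6.1.1),

\[ x = R_1^{-1}c, \qquad \min_{x \in \mathbb{R}^n}\|Ax - b\|_2 = \|d\|_2. \]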

:
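Algorithm [Least squares via Householder QR]
1. Compute A = QR. A is overwritten by R and by the Householder vectors that define Q.
2. Apply Q^T to b:
for j = 1 : n
    (* u is the j-th Householder vector *)
    u(j) = 1; u(j + 1 : m) = A(j + 1 : m, j)
    b(j : m) = REFL.ROW(b(j : m), u(j : m))
end
3. Solve R(1 : n, 1 : n) x = b(1 : n) by back substitution.
Cost: T = 2n^2(m - n/3) + O(mn + n^2) flops.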

6.3.4
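6.3.1. Let $A \in \mathbb{R}^{m\times n}$ and $b \in \mathbb{R}^m$. If $A^\top(b - Ax) = 0$, then

\[ \|b - Ax\|_2 \le \|b - Ay\|_2 \quad \text{for all } y \in \mathbb{R}^n. \]

Proof. Let $r_x = b - Ax$
and $r_y = b - Ay$. Then $r_y = r_x + (Ax - Ay)$ and,
since $A^\top r_x = 0$,

\[ \|r_y\|_2^2 = \|r_x\|_2^2 + (x - y)^\top A^\top A(x - y) = \|r_x\|_2^2 + \|A(x - y)\|_2^2 \ge \|r_x\|_2^2. \]

Thus, at the least squares solution the residual $r_x$ is orthogonal to
Range(A), and Ax is the orthogonal projection of b onto Range(A);
see Figure 6.4. If P is the orthogonal projector onto Range(A),
then x satisfies $Pb = Ax$. Since $b - Pb \perp \mathrm{span}\{a_1, \dots, a_n\}$,
it follows that $A^\top(b - Pb) = 0$.
6.3.1. $P := A(A^\top A)^{-1}A^\top$ is the orthogonal projector onto the range of
A.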


r = b-Ax
Range(A)

Ax

6.4: x b Ax Range(A).
6.3.1. .

6.3.1. .

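The least squares solution x satisfies

\[ Ax = Pb = A(A^\top A)^{-1}A^\top b, \]

so that

\[ x := (A^\top A)^{-1}A^\top b, \]

i.e. x solves the normal equations

\[ A^\top Ax = A^\top b. \]

If A has full column rank, $A^\top A$ is s.p.d. and the system can be solved
with the Cholesky factorization. This gives the following algorithm
for the least squares problem.
Algorithm [Normal equations]
1. Compute $C = A^\top A$
and $d = A^\top b$.
2. Compute the Cholesky factorization $C = GG^\top$.
3. Solve $Gy = d$ and $G^\top x = y$.

T = mn^2 + n^3/3 + O(n^2)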
... :
A> A
A (.. ).
, .

5 ...,
. , Pentium
III
RealMax 1.7977 10308 .
IEEE ...
, .. PowerPC.
.

.2 . ,
$\kappa(A^\top A)$. $\kappa_2(A^\top A) = [\kappa_2(A)]^2$,
A
.
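6.3.2. Let

\[ A = \begin{pmatrix} 1 & 1 \\ \epsilon & 0 \\ 0 & \epsilon \end{pmatrix}. \]

For $\epsilon \ne 0$, rank(A) = 2 and the least squares problem for Ax = b has a unique solution. However,

\[ A^\top A = \begin{pmatrix} 1 + \epsilon^2 & 1 \\ 1 & 1 + \epsilon^2 \end{pmatrix}. \]

In floating point arithmetic,
$\mathrm{fl}(1 + \epsilon^2) = 1$ whenever $\epsilon^2 < \epsilon_M$. On a Macintosh Powerbook
5300 (PowerPC CPU), which follows the IEEE floating point standard, this happens for $\epsilon < 1.4901 \times 10^{-8}$. The computed $A^\top A$
is then singular, and the Cholesky factorization breaks down.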
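The loss of information when forming the normal equations matrix can be observed directly in IEEE double precision. The following sketch assumes ε = 1e-8 (an illustrative value with ε^2 below half the machine epsilon) and the 3 x 2 matrix with rows [1 1], [ε 0], [0 ε].

```python
eps = 1e-8                     # assumed value: eps**2 = 1e-16 is lost when added to 1
A = [[1.0, 1.0],
     [eps, 0.0],
     [0.0, eps]]

# C = A^T A, formed in floating point exactly as the normal equations method would
C = [[sum(A[i][r] * A[i][c] for i in range(3)) for c in range(2)] for r in range(2)]
det = C[0][0] * C[1][1] - C[0][1] * C[1][0]

print(C)       # every entry has been rounded to 1.0
print(det)     # the computed matrix is exactly singular
```

Although A has full column rank, the computed A^T A has identical rows, so a Cholesky factorization of it would encounter a zero pivot in the second step.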

.

6.3.5
A , .
LAPACK:
QR: A _GEQRF.
Q
.
5

.. IEEE Floating Point Standard

_ORGQR (_ORMQR)
() Q Q> A.
.2:
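Method                               Driver
QR or LQ factorization               _GELS
complete orthogonal factorization    _GELSX
SVD                                  _GELSS

_GELSX and _GELSS also handle rank-deficient problems.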

6.4 Givens
QR. Householder,
. .
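Identify $\mathbb{C}$ with $\mathbb{R}^2$. For $x = \xi_1 + i\xi_2 = |x|e^{i\phi} \in \mathbb{C}$,

\[ xe^{i\theta} = (\xi_1 + i\xi_2)(\cos\theta + i\sin\theta) = (\xi_1\cos\theta - \xi_2\sin\theta) + i(\xi_1\sin\theta + \xi_2\cos\theta) = |x|e^{i(\theta + \phi)}, \]

so multiplication by $e^{i\theta}$ rotates x by the angle θ. For $[\xi_1, \xi_2]^\top \in \mathbb{R}^2$,
with $c = \cos\theta$, $s = \sin\theta$,
the rotation is written as:

\[ \begin{pmatrix} c\xi_1 - s\xi_2 \\ s\xi_1 + c\xi_2 \end{pmatrix} = \begin{pmatrix} c & -s \\ s & c \end{pmatrix}\begin{pmatrix} \xi_1 \\ \xi_2 \end{pmatrix}. \]

Choosing

\[ s = \frac{-\xi_2}{\sqrt{\xi_1^2 + \xi_2^2}}, \qquad c = \frac{\xi_1}{\sqrt{\xi_1^2 + \xi_2^2}}, \qquad G = \begin{pmatrix} c & -s \\ s & c \end{pmatrix}, \]

gives

\[ \begin{pmatrix} \sqrt{\xi_1^2 + \xi_2^2} \\ 0 \end{pmatrix} = \begin{pmatrix} c & -s \\ s & c \end{pmatrix}\begin{pmatrix} \xi_1 \\ \xi_2 \end{pmatrix}, \]

i.e. Gx has no component along $e_2$; see Figure 6.5.
G is called a Givens rotation, after
Wallace J. Givens.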
6.4.1. .
2
2
G , kGxk = x> G> x = kxk
Gx kxk.

[Figure 6.5: a Givens rotation takes x to Gx, which has no component along $e_2$.]
n- .
x Rn hej , ek i
ek , . e>
k G(j, k, )x = 0.
G(j, k, ) Rnn
(i1 , i2 ) . j < k

[G(j, k, )]i1 ,i2

s
=
s

i1 ,i2

i1 = i2 = j i1 = i2 = k
i1 = k i2 = j
i1 = j i2 = k
.

i1 ,i2 Kronecker i1 = i2
. G(j, k, )> x x hej , ek i.

6.4.1
c, s 1 , 2

c s
s c

1
2

.
function [c, s] = GIVENS(ξ_1, ξ_2)
if ξ_2 = 0
    c = 1; s = 0
else
    if |ξ_2| > |ξ_1|
        τ = -ξ_1/ξ_2; s = 1/sqrt(1 + τ^2); c = sτ
    else
        τ = -ξ_2/ξ_1; c = 1/sqrt(1 + τ^2); s = cτ
    end
end
GIVENS costs T = 6 flops
(plus one square root).
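The function GIVENS can be sketched in Python as follows. The sketch assumes the convention that the rotation is applied as [c -s; s c] to the pair (ξ1, ξ2), so the returned values satisfy s ξ1 + c ξ2 = 0; the name `givens` is hypothetical.

```python
import math

def givens(a, b):
    """Return (c, s) with c^2 + s^2 = 1 such that [c -s; s c] [a; b] has zero second entry."""
    if b == 0.0:
        return 1.0, 0.0
    if abs(b) > abs(a):
        tau = -a / b
        s = 1.0 / math.sqrt(1.0 + tau * tau)
        return s * tau, s
    tau = -b / a
    c = 1.0 / math.sqrt(1.0 + tau * tau)
    return c, c * tau

c, s = givens(3.0, 4.0)
rotated = (c * 3.0 - s * 4.0, s * 3.0 + c * 4.0)
print(c, s, rotated)
```

Dividing the smaller component by the larger one keeps |τ| ≤ 1, which avoids overflow in 1 + τ^2. Note that the first rotated component may come out as -||x|| rather than +||x||, depending on the branch taken.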

Householder
,
G
.
G.
( ) .

G.W. Stewart .
if c = 0
    ρ = 1
elseif |s| < |c|
    ρ = sign(c) s/2
else
    ρ = 2 sign(s)/c
end
Recovering c, s from ρ:
if ρ = 1
    c = 0; s = 1
elseif |ρ| < 1
    s = 2ρ; c = sqrt(1 - s^2)
else
    c = 2/ρ; s = sqrt(1 - c^2)
end
min(|c|, |s|)
( ) 1 2

= max(|c|, |s|) 1.
Givens .
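The single-number storage scheme above can be sketched as an encode/decode pair. This is an illustration with hypothetical function names; note that decoding recovers the pair (c, s) only up to a common sign, i.e. it may return the rotation -G instead of G.

```python
import math

def encode(c, s):
    """Pack a rotation (c, s), with c^2 + s^2 = 1, into a single number rho."""
    if c == 0.0:
        return 1.0
    if abs(s) < abs(c):
        return math.copysign(1.0, c) * s / 2.0
    return 2.0 * math.copysign(1.0, s) / c

def decode(rho):
    """Recover (c, s), up to a common sign, from rho."""
    if rho == 1.0:
        return 0.0, 1.0
    if abs(rho) < 1.0:
        s = 2.0 * rho
        return math.sqrt(1.0 - s * s), s
    c = 2.0 / rho
    return c, math.sqrt(1.0 - c * c)

c1, s1 = decode(encode(0.8, 0.6))     # small sine: rho = s/2 is stored
c2, s2 = decode(encode(0.6, -0.8))    # small cosine: rho = 2/c is stored (sign flipped)
print((c1, s1), (c2, s2))
```

In both branches the smaller of |c|, |s| is the one stored, so the other is recovered from a square root of a quantity well away from zero.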

6.4.2 Givens
Let $A \in \mathbb{R}^{2\times n}$. The product $\begin{pmatrix} c & -s \\ s & c \end{pmatrix}A$ is computed by:

function A = ROT.ROW(A, c, s)
for j = 1 : n
    τ_1 = A(1, j); τ_2 = A(2, j)
    A(1, j) = c τ_1 - s τ_2
    A(2, j) = s τ_1 + c τ_2
end

ROT.ROW costs T = 6n flops.

ROT.COL:

AA

c
s

s
c

6.4.3
Givens BLAS1.
MATLAB.
BLAS: Givens BLAS1:
CALL _ROTG(A,B,C,S)
B , (C, S) (c, s)
CALL _ROT(N,X,INCX,Y,INCY,C,S) Givens N

$[X(1), Y(1)]^\top$, ..., $[X(1+(N-1)\cdot INCX), Y(1+(N-1)\cdot INCY)]^\top$
Givens _ROTMG, _ROTM
MATLAB: function PLANEROT:

PLANEROT Generate a Givens plane rotation. [G,X]


= PLANEROT(X): X is a 2 component col. vector,
returns a 2-by-2 orthogonal matrix G so that Y
= G*X has Y(2) = 0.
function [G,x] = planerot(x)
if x(2) ~= 0
    r = norm(x);
    G = [x'; -x(2) x(1)]/r;
    x = [r; 0];
else
    G = eye(2);
end

6.4.4 QR Givens
A. A
Gj A
.
, ..
.


[Figure: the nonzero pattern (x) of A as successive Givens rotations introduce zeros below the diagonal.]

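Algorithm [Givens QR]. Given $A \in \mathbb{R}^{m\times n}$ with $m \ge n$,
A is overwritten by $R = Q^\top A$, where Q is orthogonal.
for j = 1 : n
    for i = m : -1 : j + 1
        [c, s] = GIVENS(A(i - 1, j), A(i, j))
        A(i - 1 : i, j : n) = ROT.ROW(A(i - 1 : i, j : n), c, s)
    end
end

Cost: T = 3n^2(m - n/3) flops.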
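The Givens QR loop can be sketched in plain Python as follows. It is an illustration under the same sign convention as above (the rotation [c -s; s c] is applied to two adjacent rows); Q is not accumulated and all names are hypothetical.

```python
import math

def givens(a, b):
    """Return (c, s) such that [c -s; s c] [a; b] has zero second entry."""
    if b == 0.0:
        return 1.0, 0.0
    if abs(b) > abs(a):
        tau = -a / b
        s = 1.0 / math.sqrt(1.0 + tau * tau)
        return s * tau, s
    tau = -b / a
    c = 1.0 / math.sqrt(1.0 + tau * tau)
    return c, c * tau

def givens_qr(A):
    """Return R = Q^T A, obtained by zeroing each column from the bottom up."""
    m, n = len(A), len(A[0])
    R = [[float(v) for v in row] for row in A]
    for j in range(n):
        for i in range(m - 1, j, -1):       # rows m, m-1, ..., j+1 in 1-based terms
            c, s = givens(R[i - 1][j], R[i][j])
            for k in range(j, n):           # rotate rows i-1 and i
                t1, t2 = R[i - 1][k], R[i][k]
                R[i - 1][k] = c * t1 - s * t2
                R[i][k] = s * t1 + c * t2
    return R

A = [[3.0, 1.0],
     [4.0, 2.0],
     [0.0, 2.0]]
R = givens_qr(A)
for row in R:
    print(row)
```

As with Householder QR, the identity R^T R = A^T A gives a check of the result that does not require Q.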

Givens Householder
(. [7] 3.5.2):
Householder u
REFL ...
= I 2
Hk =
u. H
uu
> /
u> u
kH

O(u). fl(HA)
= H(A + E) kEk = O(ukAk).
c, s Givens
c = c(1 + c ) s = s(1 + s ) c , s = O(u). k, )> A = G(j, k, )> (A +
G(j,
E) kEk = O(ukAk).

Gauss.

6.4.5
2 .
Q> Q = I Q> = Q1 ,
Q . QR .
QR ,
.


: Krylov $\mathrm{span}\{r, Ar, \dots, A^{m-1}r\}$


Gram-Schmidt.
Hessenberg: A Rnn . Householder A Hessenberg,

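\[ Q^\top AQ = H, \qquad Q^\top Q = I, \]

where Q is a product of Householder matrices.
Reduction: for k = 1, ..., n - 2, suppose that
the entries (3 : n, 1), (4 : n, 2), ..., (k + 1 : n, k - 1) of A have been annihilated by $H_1, \dots, H_{k-1}$:

\[ A^{(k)} = H_{k-1} \cdots H_1AH_1 \cdots H_{k-1}. \]

At step k, $H_k$ is chosen to annihilate the entries
(k + 2 : n, k) of $A^{(k)}$. The product $(H_kA^{(k)})H_k$
preserves the zeros already introduced, so that
$A^{(n-2)}$ is upper Hessenberg.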
Householder Givens;

.
6.4.2. n 1
>
>
Givens G>
1 G2 ...Gn1 x = kxke1
Householder u H(u)x = kxke1 .

Givens
x, ..
x .
Givens QR Hessenberg .
6.4.3. A Rnn
Hessenberg .

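6.4.1. Compute the vectors $x_j$ such that

\[ (A - i\omega_jI)x_j = b_j, \qquad j = 1, \dots, s. \]

Let Q be orthogonal with $Q^\top AQ = H$ upper Hessenberg. Then

\[ Q^\top(A - i\omega_jI)Q = H - i\omega_jI \]

is also Hessenberg, and

\[ (A - i\omega_jI)x_j = b_j \;\Leftrightarrow\; \underbrace{Q^\top(A - i\omega_jI)Q}_{H - i\omega_jI}\,Q^\top x_j = Q^\top b_j. \]

So, having computed $H = Q^\top AQ$ and $\tilde{B} = Q^\top[b_1, \dots, b_s]$ once, solve

\[ (H - i\omega_jI)\tilde{x}_j = \tilde{b}_j, \qquad j = 1, \dots, s, \]

and recover $x_j = Q\tilde{x}_j$, j = 1, ..., s.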

H ij I
O(n2 ) O(n3 )
A ij I . ,

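\[ \dot{x}(t) = Ax(t) + Bu(t), \quad x(0) = x_0, \qquad y(t) = Cx(t), \quad t \ge 0, \]

whose solution is

\[ x(t) = e^{tA}x_0 + \int_0^t e^{(t-\tau)A}Bu(\tau)\,d\tau, \qquad t \ge 0. \]

Take $x_0 = 0$. The Laplace transform of y is:

\[ \hat{y}(\omega) := \int_0^\infty e^{-i\omega t}y(t)\,dt = \int_0^\infty e^{-i\omega t}C\int_0^t e^{(t-\tau)A}Bu(\tau)\,d\tau\,dt = C(i\omega I - A)^{-1}B\hat{u}(\omega), \]

so the frequency response requires

\[ G(i\omega_j) = C(i\omega_jI - A)^{-1}B, \qquad j = 1, \dots, s, \]

i.e. the solution of systems with $(i\omega_jI - A)$ [5, 6].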

6.5
6.5.1 (, , 04). :
QR LU
QR.

. LU 2n3 /3 + O(n2 )
QR 4n3 /3 + O(n2 ).
6.5.2.

\[ A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 2 \end{pmatrix}. \]

1. P

range(A) = span{A(:, 1), A(:, 2)}.


2. x =
[1, 2, 3]T range(A);

.
1. A
:

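\[ P = A(A^\top A)^{-1}A^\top = \begin{pmatrix} 5/6 & -1/3 & 1/6 \\ -1/3 & 1/3 & 1/3 \\ 1/6 & 1/3 & 5/6 \end{pmatrix} \]

2. The projection of x onto range(A) is:

\[ \hat{x} = Px = \begin{pmatrix} 2/3 \\ 4/3 \\ 10/3 \end{pmatrix} \]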
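The projector and the projection can be recomputed in exact rational arithmetic. The sketch below takes A with rows [1 0], [0 1], [1 2] and x = [1, 2, 3]^T as read from the statement (treating all entries as positive is an assumption), and uses the closed form of the inverse of the 2 x 2 matrix A^T A; the function names are hypothetical.

```python
from fractions import Fraction

def dot(x, y):
    return sum(Fraction(p) * Fraction(q) for p, q in zip(x, y))

def projector_two_columns(A):
    """Orthogonal projector A (A^T A)^{-1} A^T for a matrix with two independent columns."""
    a = [row[0] for row in A]
    b = [row[1] for row in A]
    aa, ab, bb = dot(a, a), dot(a, b), dot(b, b)
    det = aa * bb - ab * ab          # determinant of A^T A
    m = len(A)
    # (A^T A)^{-1} = (1/det) [[bb, -ab], [-ab, aa]]
    return [[(a[i] * (bb * a[j] - ab * b[j]) + b[i] * (aa * b[j] - ab * a[j])) / det
             for j in range(m)] for i in range(m)]

A = [[1, 0], [0, 1], [1, 2]]
P = projector_two_columns(A)
x = [1, 2, 3]
Px = [sum(P[i][j] * x[j] for j in range(3)) for i in range(3)]
residual = [x[i] - Px[i] for i in range(3)]
print(P)
print(Px)
```

Exact arithmetic gives Px = [2/3, 4/3, 10/3], and the residual x - Px is orthogonal to both columns of A, as required of an orthogonal projection.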

6.5.3 (Higham [4]). : x Rn P P x = ||x||2 . G1,2 , . . . , Gn1,n


: Qx = G1,2 . . . Gn1,n x = ||x||2 e1 .

. : det(P) = -1, det(G) = 1, det(Q) = 1. P ,


Q .

6.6
6.6.1. A Rnn

A=

R//S

R . A
.
. ,
Householder.

6.6.2.
QR A R33 , .. qr(A), A

1 2
1 1
1 1

3
2
1

1. )
A, ) A.
A.
2. Ax = b
b = [12, 5, 10]T .

.
1. R A = QR.
Householder A. u1 = [1, 1, 1]T
uj uT

u2 = [0, 1, 1]T . Hj = (I 2 uT ujj ). H2 H1 A = R


j

A = H1 H2 R .

1 1 1
u1 uT1
1
1 1 1
=
3
uT1 u1
1 1 1


0
u2 uT2
1
0
=
2
uT2 u2
0

0
1
1

0
1
1

1 4 9
1
5
6
A = H1 H2 R = 2
3
2
2
3

2. H2 H1 A = R Rx = H2 H1 b
x = [1, 2, 3]T .
6.6.3 (, , 04). QR
A, A [1, 4, 5; 1, 2, 6; 1, 2, 3].
Q, R A.

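6.6.4. Let $u_j \in \mathbb{R}^n$ with $\|u_j\|_2 = 1$, $j = 1, \dots, m$, and let
$H(u_j) = I - 2u_ju_j^\top$ be the corresponding Householder matrices.
1. Show that $P := H(u_m)H(u_{m-1}) \cdots H(u_1)$ can be written as
$I - UTU^\top$, where $U = [u_1, \dots, u_m]$ and
T is a triangular matrix whose elements depend on the $u_j$. (Hint:
start with m = 2.)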
2. A Rns .
P A
/.

.
1.
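Let $P_m = H(u_m) \cdots H(u_1)$. For m = 2:

\[ P_2 = (I - 2u_2u_2^\top)(I - 2u_1u_1^\top) = I - 2u_1u_1^\top - 2u_2u_2^\top + 4(u_2^\top u_1)u_2u_1^\top, \qquad T_2 = \begin{pmatrix} 2 & 0 \\ -4u_2^\top u_1 & 2 \end{pmatrix}. \]

For m = 3:

\[ P_3 = (I - 2u_3u_3^\top)\bigl(I - 2u_1u_1^\top - 2u_2u_2^\top + 4(u_2^\top u_1)u_2u_1^\top\bigr) \]
\[ = I - 2u_1u_1^\top - 2u_2u_2^\top - 2u_3u_3^\top + 4(u_2^\top u_1)u_2u_1^\top + 4(u_3^\top u_2)u_3u_2^\top + 4(u_3^\top u_1)u_3u_1^\top - 8(u_2^\top u_1)(u_3^\top u_2)u_3u_1^\top, \]

\[ T_3 = \begin{pmatrix} 2 & 0 & 0 \\ -4u_2^\top u_1 & 2 & 0 \\ 8(u_2^\top u_1)(u_3^\top u_2) - 4u_3^\top u_1 & -4u_3^\top u_2 & 2 \end{pmatrix}. \]

In general $T_m$ is lower triangular with diagonal elements equal to
2, $\tau_{2,1} = -4u_2^\top u_1$, and the elements $\tau_{i,j}$ with $i > j$
are given by:

\[ \tau_{i,j} = -2\sum_{k=1}^{i-1}(u_i^\top u_k)\tau_{k,j} \]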

2. A Rns .
P O(mn3 ) . n > m, s,
P A : (Hm (. . . (H1 A) . . .)), (
. 2.5.1). O(mn2 s) .
T O(m2 n) m(m 1)/2 DOT.
, ,
O(n2 + mns + m2 s) . n >> m, s
P A .
6.6.5. s uj Rn , j = 1 : s. )
BLAS-1 BLAS-2
Hs Hs1 H1 Hj
Rnn uj Rn .
. )
, n, s,
( flops) min .

. )

Hs (Hs1 (H1 e1 ) )
e1 .

x = H1 e1
for j = 2 : s
x = Hj x
end
(uT x)

Hj x = x 2 uTj u uj DOT
j

DAXPY. BLAS-1.
) (2(2n 1) + 1) + 2n, . 6n 1.
Hj e1 = e1 2

(uT
j e1 )
uj ,
uT
j uj

. 3n 2.

6ns3ns1. ,
, sn + n.
sn+n
.
min 6ns3ns
6.6.6. A Rnn . . ) A
Hessenberg, H , Gauss, .
W AW 1 = H W Gauss. )
T = 53 n3 + O(n2 ) ... )


Householder
.

6.6.7.

A=

A11
A21

A12
A22

, A11 Rkk , A22 R(nk)(nk) .

1 = I W1 Y T
A21 = Q1 R1 Q
1
(nk)k
W1 , Y1 R
Householder.
1 ]. ) A(1) := QT AQ1
Q1 := diag[Ik , Q
1
, . A(1) (p|n) p. )

A Hessenberg

H11

H21
H=

0
0

H12

H1N

H22

..
.
..

..
.
..
.

H32

HN,N 1

HN N

Hij Rkk H = U T AU U = Q1 QN 2 Qj
WY .

[1] C. Bischof and C. Van Loan. The WY representation for products of Householder matrices. SIAM J. Sci. Statist. Comput., 8(1):s2-s13, January 1987.
[2] Å. Björck. Least squares methods. In P. G. Ciarlet and J. L. Lions, editors, Handbook of Numerical Analysis, volume 1. Elsevier/North Holland, 1987.
[3] G. Golub and C. F. Van Loan. Matrix Computations. The Johns Hopkins University Press, Baltimore, 2nd edition, 1989.
[4] N.J. Higham. Accuracy and Stability of Numerical Algorithms. SIAM, Philadelphia, 2nd edition, 2002.
[5] A. J. Laub. Efficient multivariable frequency response computations. IEEE Trans. Aut. Contr., AC-26:407-408, 1981.
[6] A. J. Laub. Algorithm 640. Efficient calculation of frequency response matrices from state space methods. ACM Trans. Math. Softw., 12(1):26-33, March 1986.
[7] J. H. Wilkinson. The Algebraic Eigenvalue Problem. Oxford University Press, 1965.


IV
Problem 3: The solution of simultaneous linear equations. In this
problem we are likely to be limited by the storage capacity of the
machine. If the coefficients of the equations are essentially random
we shall need to be able to store the whole matrix of coefficients and
probably also at least one subsidiary matrix. If we have a storage
capacity of 6400 numbers we cannot expect to be able to solve
equations in more than about 50 unknowns. In practice, however,
the majority of problems have very degenerate matrices and we do
not need to store anything like as much. For instance problem (2)
above can be transformed into one requiring the solution of linear
simultaneous equations if we replace the continuum by a lattice. The
coefficients of these equations are very systematic and mostly zero.
In this problem we should be limited not by the storage required for
the matrix of coefficients, but by that required for the solution or for
the approximate solutions. - Alan Turing [20]
.1 ,

Gauss 23 n3 + O(n2 ).
..., ( Cholesky) LU.
,
n(n+1)
( (diag(A))1 L) LU .
2

. ,

.
(.. )
;
1. .
2. .

Figure 7.1: Banded matrix of bandwidth $p + q + 1$, where $p$ and $q$ are the lower and upper bandwidths.


(, , ..) .
O(n)
O(n2 ) (Vandermonde, Toeplitz, Hankel ).
:
Hessenberg O(n2 ) ...
O(n) ...
Vandermonde O(n2 )
...
O(n log n)
...
Toeplitz O(n2 ) ...

7.1 /

.
(= banded matrices) .
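7.1.1. A matrix with entries $\alpha_{ij}$ has lower bandwidth $p$ and upper bandwidth $q$ if $\alpha_{ij} = 0$ whenever $i > j + p$ or $j > i + q$. Its bandwidth is $p + q + 1$. When $p = q$ (e.g. for symmetric banded matrices) $p$ is called the semibandwidth.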

. 7.1.
7.1.1.


p = q = 1 :
, .
p = 1, q = 0 (. p = 0, q = 1) () .
Hessenberg
: .. p = 1 Hessenberg p = 0 .

A matrix $A$ with upper bandwidth $q$ and lower bandwidth $p$ has at most

\[ nnz := n + (p + q)n - \frac{p(p+1) + q(q+1)}{2} \]

nonzero entries.

p, q n O(n)
p + q + 1 () n.
. 3
[1].
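A matrix with upper bandwidth $q$ and lower bandwidth $p$ is denoted by $A(p|q)$. Thus $A(p|q)$ satisfies $e_i^\top A(p|q)e_j = 0$ whenever $i > j + p$ or $j > i + q$.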
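7.1.1. Let $A(p_1|q_1), B(p_2|q_2) \in \mathbb{R}^{n \times n}$.
1. $C = A(p_1|q_1) \pm B(p_2|q_2)$ has upper bandwidth $\max(q_1, q_2)$ and lower bandwidth $\max(p_1, p_2)$.
2. $C = A(p_1|q_1)B(p_2|q_2)$ has upper bandwidth $\min(q_1 + q_2, n)$ and lower bandwidth $\min(p_1 + p_2, n)$.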

. .
, ,
i+p1 +p2 ,i = i+p1 +p2 ,i+p2 i+p2 ,i + . A, B
i+p1 ,i+p2 i+p2 ,i 6= 0, i+p1 +p2 ,i 6= 0. i+p1 +p2 +1,i .
, i+p1 +p2 +1,k k,i
, . k i + p1 + p2 k + p1 k i p2 .
7.1.1. A(1|1). Ak =
[A(1|1)]k min(n, k).


O(n), .
,
. . A(1|1) Rnn An1
n 1.
7.1.1 . ,
.
7.1.2. A(n|n) R2n2n diag[D; D] D Rnn . 7.1.1,
[A(n|n)]2 2n. A ,
A(n|n)A(n|n) = diag[D2 ; D2 ],
n. 7.2.

7.1.2. A(p1 |q1 ), B(p2 |q2 ) Rnn


() n.

Figure 7.2: Sparsity patterns of $A$ (bandwidth 3) and of $A^2$; the panels are titled "A" and "A*A" and are annotated nz = 32, nz = 32, nz = 62 and nz = 44.

7.1.3.
A, B LAPACK.



. ,
, .

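Algorithm. [Banded forward substitution]
Input: $A$, $b$, where $A$ is lower triangular with lower bandwidth $p$.
Output: the solution $x$ of $Ax = b$, overwriting $b$. Cost: $T \approx 2np$ when $n \gg p$.
for j = 1 : n
    $\beta_j = \beta_j / \alpha_{jj}$
    for i = j + 1 : min(j + p, n)
        $\beta_i = \beta_i - \alpha_{ij}\beta_j$
    end
end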


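Algorithm. [Banded back substitution]
Input: $A$, $b$, where $A$ is upper triangular with upper bandwidth $q$. Output: the solution $x$ of $Ax = b$, overwriting $b$. Cost: $T \approx 2nq$ when $n \gg q$.
for j = n : -1 : 1
    $\beta_j = \beta_j / \alpha_{jj}$
    for i = max(1, j - q) : j - 1
        $\beta_i = \beta_i - \alpha_{ij}\beta_j$
    end
end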
n,
O(n2 ).
A(p|q) , .
.
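7.1.2. Let $A(p|q) \in \mathbb{R}^{n \times n}$ have the LU factorization $A = LU$. Then $L$ has lower bandwidth $p$ and $U$ has upper bandwidth $q$, i.e. $A(p|q) = L(p|0)U(0|q)$.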

. .
, L, U .
L, U A.

.
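Algorithm. [Banded LU factorization]
Input: a matrix $A = A(p|q)$.
Output: the factors $L(p|0)$, $U(0|q)$, overwriting $A$. Cost: $T \approx 2npq$ when $n \gg p, q$.
for k = 1 : n - 1
    for i = k + 1 : min(k + p, n)
        $\alpha_{ik} = \alpha_{ik} / \alpha_{kk}$
    end
    for j = k + 1 : min(k + q, n)
        for i = k + 1 : min(k + p, n)
            $\alpha_{ij} = \alpha_{ij} - \alpha_{ik}\alpha_{kj}$
        end
    end
end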
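The three banded algorithms above can be combined into a small solver. What follows is a minimal sketch in plain Python under stated assumptions: the matrix is kept in ordinary dense (list of rows) storage for readability rather than in a compact band format, no pivoting is done, and the function names `band_lu` and `band_solve` are hypothetical.

```python
def band_lu(A, p, q):
    # In-place LU factorization without pivoting of a matrix with lower
    # bandwidth p and upper bandwidth q. On return the strictly lower part
    # of A holds the multipliers of the unit lower triangular L and the
    # upper part holds U. Only entries inside the band are touched.
    n = len(A)
    for k in range(n - 1):
        last_row = min(k + p, n - 1)
        last_col = min(k + q, n - 1)
        for i in range(k + 1, last_row + 1):
            A[i][k] /= A[k][k]
        for j in range(k + 1, last_col + 1):
            for i in range(k + 1, last_row + 1):
                A[i][j] -= A[i][k] * A[k][j]
    return A

def band_solve(LU, b, p, q):
    # Solve L y = b (L unit lower, bandwidth p), then U x = y (bandwidth q),
    # both in column-oriented form as in the algorithms above.
    n = len(b)
    x = list(b)
    for j in range(n):
        for i in range(j + 1, min(j + p, n - 1) + 1):
            x[i] -= LU[i][j] * x[j]
    for j in range(n - 1, -1, -1):
        x[j] /= LU[j][j]
        for i in range(max(0, j - q), j):
            x[i] -= LU[i][j] * x[j]
    return x
```

Because every inner loop is limited to the band, the work grows like $npq$ for the factorization and like $n(p + q)$ for the two triangular solves, in line with the operation counts quoted in the algorithms.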


. A
( , ..


A ), A(1|1) = L(1|0)U (0|1).


L, U , U
A, . U (i, i + 1) = i,i+1 , i = 1 : n 1.
A, L U , A1 .
U
L:
A Rnn .

1 1
1
0
1 1

..
..
2 . . . . . .
2 . . . . . .
0

.
.

.
.
.
.
.
..
.
.
.
.
..

.
.
.
. n1
n1
n n
n 1
0
n
. [ LU , ,
]
: A = trid[i , i , i ]. :
L, U . : T = 3n 3

1 = 1
for i = 2 : n
i = ci /i1
i = i i i1
end
. [ Ly = b]
: L = trid[i , i , 0]. :
y = [1 , . . . , n ]> . : T = 3n 2

1 = 1 /1
for i = 2 : n
i = (i i1 i )/i
end
Solving $Ax = b$ then amounts to solving $Ly = b$ followed by $Ux = y$. The total cost is

\[ T = (3n - 3) + (3n - 2) + (2n - 1) = 8n - 6. \]
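The factor-then-solve procedure for a tridiagonal system is often written as a single routine. The sketch below, in plain Python, is one common arrangement (elimination without pivoting followed by back substitution); the function name `thomas` and the argument layout (`sub`, `diag`, `sup` for the three diagonals) are assumptions made for this illustration, not the notation of the text.

```python
def thomas(sub, diag, sup, rhs):
    # Solve a tridiagonal system without pivoting.
    # sub[i-1] = A[i][i-1], diag[i] = A[i][i], sup[i] = A[i][i+1].
    n = len(diag)
    d = list(diag)
    y = list(rhs)
    # Elimination: computes the pivots of U and applies L^{-1} to rhs.
    for i in range(1, n):
        m = sub[i - 1] / d[i - 1]
        d[i] -= m * sup[i - 1]
        y[i] -= m * y[i - 1]
    # Back substitution with the upper bidiagonal factor.
    x = [0.0] * n
    x[-1] = y[-1] / d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = (y[i] - sup[i] * x[i + 1]) / d[i]
    return x
```

The routine touches each of the three diagonals once, so both the work and the storage are $O(n)$. Without pivoting it is only safe when no zero (or tiny) pivot appears, e.g. for diagonally dominant or symmetric positive definite matrices.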

A LU
.
P A = LU , P
; , ,
.
7.1.2,
A(1|1) 11 21
21 .


1 2 A.
LU , :

A1 := P1 A = LU P1 = [e2 , e1 , e3 , , en ].
A1 = A1 (1|2), L = L(1|0)U (0|2),
7.1.2
L, U A. ( )
.
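7.1.3. [14] Let $A \in \mathbb{R}^{n \times n}$ with $A = A(p|q)$, and suppose that Gaussian elimination with partial pivoting is used to compute $PA = LU$. Then $U$ has upper bandwidth $p + q$ and $L$ has at most $p + 1$ nonzero entries per column.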

. [14]
,
Hessenberg
O(n2 ) . (
Hessenberg) , ,
, hk+1,k hkk ,
, > 1.
7.1.4. Hx = b H
Hessenberg.

7.2
, ,
Rnn , n2 . (, Vandermonde, Toeplitz,
Hankel) n 2n 1 . , O(n) O(n2 ) .
,
.
Vandermonde, Toeplitz .

7.2.1 :

.
Pn1
j
7.2.1. p(x) :=
j=0 j x -

1 , ..., n C b = [p(1 ), ..., p(n )]> .


(V (1 , . . . , n ))> a

1
1

V (1 , . . . , n ) = .
..
1n1

1
2
..
.

2n1

...
...

...

1
n
..
.

nn1

b =

Vandermonde 1 , ..., n .
Vandermonde n p n p(x)
1 , ..., n .

7.2.2. 1 , . . . , n
R 1 , , n R . Newton
n (j , j )

p(x) =

n1
X

j=0

j1
Y

(x k )

k=0

j .

b = N c,
c , b

1
1

N =
1

0
(2 1 )

(k 1 )

(n 1 )

Qk1

j=1 (k

j ) 0

( ) Newton-Vandermonde.
:

Qn1

j=1 (n

j )

Vandermonde
,
Newton-Vandermonde .
Vandermonde , , . .
Newton-Vandermonde ,
, .
.
[18].

.
7.2.3.
p(x) n 1:
:

{j }n1
j=0 .

Pn1
j=0

j xj n
Qn1

: p(x) = n1 j=1 (x j )
n 1 n1 .

Newton:

{j }nj=1

Pn1

Qj1

k=0 (xk )
{j }n1
j=0 .

j=0

253

: n {p(j )}n1
j=0 .

,
, n 1 n
.

..

, .
,
.
7.2.4. A Cnn
O(n2 ) . A Vandermonde n k = exp(2k/n), k = 0, ..., n 1,
A O(n log n)
... y = Az

y(k) =

n1
X

z(j) exp(2jk/n), k = 0 : n 1,

j=0

y Fourier z . Fourier ( FFT).


, y z ,
Fourier FFT.
, [21].

7.2.2 Vandermonde
V (1 , ..., n ) Vandermonde V w = a. Gauss, O(n3 ) ...
n . O(n2 ) ...

Vandermonode.
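7.2.1. The determinant of the Vandermonde matrix is

\[ \det[V(x_0, x_1, \dots, x_n)] = \prod_{j<i}(x_i - x_j). \]

Proof. Subtracting $x_0$ times row $i$ from row $i+1$, for $i = n, n-1, \dots, 1$, gives

\[
\det\begin{pmatrix}
1 & 1 & \cdots & 1 \\
x_0 & x_1 & \cdots & x_n \\
\vdots & \vdots & & \vdots \\
x_0^n & x_1^n & \cdots & x_n^n
\end{pmatrix}
= \det\begin{pmatrix}
1 & 1 & \cdots & 1 \\
0 & x_1 - x_0 & \cdots & x_n - x_0 \\
0 & x_1^2 - x_0x_1 & \cdots & x_n^2 - x_0x_n \\
\vdots & \vdots & & \vdots \\
0 & x_1^n - x_0x_1^{n-1} & \cdots & x_n^n - x_0x_n^{n-1}
\end{pmatrix}
\]

\[ = (x_1 - x_0)(x_2 - x_0) \cdots (x_n - x_0)\,\det[V(x_1, \dots, x_n)]. \]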


.
xj ,
det[V (x0 , ..., xn1 )] 6= 0.
Vandermonde
Vandermonde
Vandermonde Vandermonde.
: b = [0 , ..., n ]>
a = [0 , ..., n ]> . V > a = b

a := [0 , ..., n ]>

pn (x) = 0 + 1 x + + n xn

b = [0 , ..., n ]> .
V > a = b :
1. pn

{pn (j ) = j }j=0:n .
2. pn .
Newton ,

pn (x) = 0 + 1 (x 0 ) + + n (x 0 ) (x n1 )
j . j {pn (j ) = j }j=0:n . ,
T 32 n2 ...
. [ ]

j = j (j = 0, ..., n)
for k = 0, ..., n 1
for i = n, ..., k + 1

i = (i i1 )/(i ik1 )
end
end


pn Newton . Horner Newton.


:

qn (x) = n
for k = n 1 : 1 : 0
qk (x) = k + (x k )qk+1 (x)
end
q0 (x) = pn (x) .
(k)

(k)

qk (x) := k + k+1 x + + n(k) xnk


qk (0)

j := k

j = j
for k = n 1 : 1 : 0
for j = k : n 1
j = j k j+1
end
end
T n2 ...
Bjorck-Pereyra
V > a = b.
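Algorithm. [Solution of the dual Vandermonde system]
(Björck-Pereyra) Given the pairs $\{\xi_j, \beta_j\}_{j=0:n}$ with distinct $\xi_j$, the algorithm overwrites $b$ with the solution $a$ of

\[ [V(\xi_0, \dots, \xi_n)]^\top a = b. \]

for k = 0 : n - 1
    for j = n : -1 : k + 1
        $\beta_j := (\beta_j - \beta_{j-1}) / (\xi_j - \xi_{j-k-1})$
    end
end
for k = n - 1 : -1 : 0
    for j = k : n - 1
        $\beta_j := \beta_j - \beta_{j+1}\xi_k$
    end
end
Cost: $T = 5n^2/2$ operations.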
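The two stages of the algorithm (divided differences, then conversion from the Newton form to the monomial form) can be followed on a small example. This is a minimal sketch in plain Python; the function name `bjorck_pereyra_dual` and the argument names `nodes` and `values` are hypothetical, introduced only for the illustration.

```python
def bjorck_pereyra_dual(nodes, values):
    # Solves V^T a = values, i.e. finds the monomial coefficients
    # a_0, ..., a_n of the polynomial p with p(nodes[j]) = values[j].
    # The nodes must be pairwise distinct.
    n = len(nodes) - 1
    c = list(values)
    # Stage 1: divided differences (coefficients of the Newton form).
    for k in range(n):
        for j in range(n, k, -1):
            c[j] = (c[j] - c[j - 1]) / (nodes[j] - nodes[j - k - 1])
    # Stage 2: Horner-like sweep, Newton form -> monomial coefficients.
    for k in range(n - 1, -1, -1):
        for j in range(k, n):
            c[j] = c[j] - c[j + 1] * nodes[k]
    return c
```

For the nodes 0, 1, 2 and the values 1, 6, 17 of $p(x) = 1 + 2x + 3x^2$, the first stage produces the Newton coefficients 1, 5, 3 and the second stage returns 1, 2, 3. No matrix is ever formed, which is why the cost is quadratic rather than cubic in $n$.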

b ..
c :

c =
=

1
Dn1
Ln1 (1) D01 L0 (1)b

U >b


Lk () :=

Ik
0

0
Jn+1k


, Jk :=

..

..

..

1
Dk = diag[1k+1 , k+1 0 , ..., n nk1 ]
Newton
:

a =
=

L0 (0 )> Ln1 (n1 )> c,


L> c.

a = L> U > b, V T = L> U > V 1 = U L.

= V 1 b = U Lb
1
= [L0 (1)> D01 Ln1 (1)> Dn1
][Ln1 (n1 ) L0 (0 )b]

V a = b:
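Algorithm. [Solution of the Vandermonde system]
(Björck-Pereyra) Given the pairs $\{\xi_j, \beta_j\}_{j=0:n}$ with distinct $\xi_j$, the algorithm overwrites $b$ with the solution $a$ of the Vandermonde system

\[ [V(\xi_0, \dots, \xi_n)]\,a = b. \]

for k = 0 : n - 1
    for j = n : -1 : k + 1
        $\beta_j := \beta_j - \xi_k\beta_{j-1}$
    end
end
for k = n - 1 : -1 : 0
    for j = k + 1 : n
        $\beta_j := \beta_j / (\xi_j - \xi_{j-k-1})$
    end
    for j = k : n - 1
        $\beta_j := \beta_j - \beta_{j+1}$
    end
end
Cost: $T = 5n^2/2$ operations.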

. :
n + 1
n + 1
n + 1 Newton


7.1: .
b
. . a . Newton () c
b
V T b

. . a
N 1 V > a
>
Horner/V a
. Newton .
V T N c
Newt.Horner/N c
-

7.2: Vandermonde .
  n
  2     0.84    0.15    1.01
  4     3.30    0.91    4.00
  6     5.95    1.66    6.31
  8     8.67    2.41    7.28
 10    11.43    3.17   10.47
 12    14.20    3.93   11.97
 14    16.99    4.69   14.28
 16    19.26    5.45   16.46
 18    20.73    6.21   15.91
 20    21.00    6.98   17.89


. 7.1, N
Newton-Vandermonde .
Vandermonde1
V j .
7.2.5. I. [a, b]:

j = a + (j 1)

ba
.
n

7.2.6. II. Tn (x) := cos(n arccos x):

j = cos(

2j 1
), j = 1, ..., n.
2n

7.2.7. III. : (0, 1):


function Rand MATLAB.

log10 (kV k2 .kV 1 k2 )


, , :
:
.
Vandermonde
n.
1

. 2


7.3: (V (0 , ..., n )).

1
j+1

j 0
j [0, 1]
j [1, 1]
Chebyshev

(V ) > nn+1
(V ) > 2n1
(V ) 42 8n
n
(V ) (3.1)
e/4

3/4
(V ) 3 4 (1 + 2)n
2 (V ) = 1

Vandermonde .

:
n (, ),
. 7.3
N. Higham [16],
.
k2

V k = e n+1 .
V V = I
V Fourier, V
Fourier.
FFT O(n log n).


Vandermonde [2].
[14].
Vandermonde
O(n2 ) ... .

Gauss. 1980, N. Higham Vandermonde
, .. [15] . [16].
Vandermonde Walter Gautschi [7, 9, 12, 10, 13, 11, 3].
O(n2 ) Vandermonde .
Hao Lu,
O(n log2 n) Vandermonde, confluent Vandermonde Vandermonde. [17].
,
. , , .
.. ( ) .
.
, .. [6],


log n ,
[5, 4],
.. ()
Vandermonde

Pn
j
, .

x
.
j=0 j
(, ),
.
. John Rice [19] .
[8, 12, 10].

7.2.3

Toeplitz

(.. ,
, ..) Toeplitz. , A Rnn
Toeplitz 2n .
Toeplitz (= persymmetric)
:
7.2.1. A Rnn ij = nj+1,ni+1 .

7.2.1. E = [en , ..., e1 ] A . A = EA> E > .

.
MATLAB Toeplitz :

TOEPLITZ(C,R) is a non-symmetric Toeplitz matrix having C as its first column and R as its first row.
TOEPLITZ(C) is a symmetric (or Hermitian) Toeplitz matrix.

>> toeplitz([1 2 3],[4 5 6])
Column wins diagonal conflict.
ans =
     1     5     6
     2     1     5
     3     2     1

>> toeplitz(linspace(2,1,3))
ans =
    2.0000    1.5000    1.0000
    1.5000    2.0000    1.5000
    1.0000    1.5000    2.0000
7.2.1. T = TL + TU TL , TU
Toeplitz .

7.2.2. T Toeplitz T 1
Toeplitz.

7.2.3. T Toeplitz T, T 1 , ( T 1 Toeplitz).

7.2.4. T1 , T2 () Toeplitz,
T1 T2 .

7.2.1.
.
.

7.2.4 Toeplitz
Toeplitz T Rnn x Rn b = T x.

0
1
2
..
.

n1

0
1
2

0
0
1

0
0

..

..

..

..
.

n1

0
0
0

0
1
2

..
..
.
.
0
n1

n2 . , Toeplitz
.
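The entries of $b$ are

\begin{align*}
\beta_0 &= \alpha_0\xi_0 \\
\beta_1 &= \alpha_1\xi_0 + \alpha_0\xi_1 \\
\beta_2 &= \alpha_2\xi_0 + \alpha_1\xi_1 + \alpha_0\xi_2 \\
&\;\;\vdots \\
\beta_k &= \sum_{j=0}^{k}\alpha_j\xi_{k-j} \\
&\;\;\vdots \\
\beta_{n-1} &= \sum_{j=0}^{n-1}\alpha_j\xi_{n-1-j}
\end{align*}

Define the polynomials $a(z)$, $x(z)$ of degree $n - 1$ (with $\alpha_k = \xi_k = 0$ for $k \ge n$)

\[ a(z) := \sum_{j=0}^{n-1}\alpha_jz^j, \qquad x(z) := \sum_{j=0}^{n-1}\xi_jz^j \]

and the polynomial $b(z)$ of degree $2n - 2$

\[ b(z) = a(z)\,x(z), \qquad b(z) := \sum_{j=0}^{2n-2}\beta_jz^j, \qquad \beta_k = \sum_{j=0}^{k}\alpha_j\xi_{k-j} \quad (0 \le k \le 2n - 2). \]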

n k T x.
.
n . p(x), q(x) Pn1 2n 1
{j }2n1
j=1 . , r(x) = p(x)q(x) O(n) r(j ) = p(j )q(j ).
( ) r 2n2 r(j ), j = 0 : 2n 2, n
r .
,
j .

j = exp(2j/(2n 1)), := 1.
p(j ), q(j ) O(n log n) FFT
p q . 2n 1
() r(x) r(j ) =
p(j )q(j ), j = 1 : 2n 1, O(n log n) FFT
2n 1 r(j ).
) Toeplitz
) O(n log n) . ()
(= convolution theorem). :
a(z), x(z) 2n 1 j

b(j ) = a(j ) x(j ) 0 j 2n 2

b(j ) k , 0 j 2n 2.
j

j = 2n1 (0 j 2n 1)
FFT.

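\[ \hat a = \mathrm{fft}_{2n-1}(a), \quad \hat x = \mathrm{fft}_{2n-1}(x) \qquad T = O(n\log n) \]

\[ \hat b = \hat a \circ \hat x \qquad T = O(n) \]

\[ \tilde b = \mathrm{fft}^{-1}_{2n-1}(\hat b) \qquad T = O(n\log n) \]

The first $n$ entries of $\tilde b$ are the entries of $b$.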
Hadamard
. :
7.2.2. Toeplitz T = O(n log n)
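The transform, pointwise product, inverse transform sequence can be sketched for a lower triangular Toeplitz matrix as follows. This is an illustrative sketch in plain Python, not the notes' own code: the names `fft` and `lower_toeplitz_matvec` are hypothetical, and as a simplifying assumption the vectors are zero-padded to a power of two (at least $2n - 1$) so that a textbook radix-2 recursion can be used instead of a transform of length exactly $2n - 1$.

```python
import cmath

def fft(a, invert=False):
    # Recursive radix-2 FFT; len(a) must be a power of two.
    # With invert=True the conjugate transform is computed, unnormalized.
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out

def lower_toeplitz_matvec(col, x):
    # b = T x for the lower triangular Toeplitz matrix T whose first
    # column is col: b_k = sum_{j<=k} col[j] * x[k-j], computed as the
    # first n coefficients of the product of two polynomials.
    n = len(col)
    size = 1
    while size < 2 * n - 1:
        size *= 2
    fa = fft(list(col) + [0.0] * (size - n))
    fx = fft(list(x) + [0.0] * (size - n))
    prod = fft([p * q for p, q in zip(fa, fx)], invert=True)
    return [(v / size).real for v in prod[:n]]
```

For the first column (1, 2, 3) and the vector (4, 5, 6) the direct sums give 4, 13, 28, and the transform-based routine reproduces them up to rounding. Padding to a power of two changes only the constant in the $O(n \log n)$ cost.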

7.2.5. :

1. T Toeplitz x . T = TL +TU TL , TU
Toeplitz, T x T = O(n log n).
2. T, P Toeplitz. T P Toeplitz ,
. T P (:, 1), T =
O(n log n).
3. T, P Toeplitz.

T P = (TL + TU )[p1 , p2 , ..., pn ]


. T = O(n2 log n).

Toeplitz
T , Toeplitz. x

T x = b.

(7.1)

T , 0 6= 0,
. :

Tk =

..

..
.

..

..

k1

k1

k2

..

.
1

Tn = T . Tn x .
Tk yk =
rk , k = 1 : n rk .
yk , k = 1 : n x.
Durbin Yule-Walker:

Tn y = r = [1 , , n ]>
.

Tk y = r = [1 , ..., k ]> .
Tk+1 y = rk+1 :

Tk
r> Ek

Ek r
1

r
k+1

Tk1 ,

=
=

Tk1 (r Ek r) = y Tk1 Ek r
y + Ek y


= k+1 r> Ek z
= k+1 r> Ek (y + Ek y)
= (k+1 + r> Ek y)/(1 + r> y)

T ... 1 + r > y > 0.


Tk y = r , Tk+1 y = [r; k+1 ] O(k) .
:
.

y1 = 1
for k = 1 : n 1
k = 1 + rk> yk
k = (k+1 + rk> Ek yk )/k
zk = yk + k Ek yk
yk+1 = [zk> ; k ]
end
T = 3n2 ...

= 1 + rk> yk
>
= 1 + [rk1
k ]

yk1 + k1 Ek1 yk1


k1

= k1 + k1 (k1 k1 )
2
= (1 k1
)k1
:
Algorithm. [Levinson-Durbin]
Cost: $T = 2n^2$ operations.

$y(1) = -r(1)$; $\beta = 1$; $\alpha = -r(1)$
for k = 1 : n - 1
    $\beta = (1 - \alpha^2)\beta$
    $\alpha = -(r(k+1) + r(k:-1:1)^\top y(1:k)) / \beta$
    for i = 1 : k
        $z(i) = y(i) + \alpha\,y(k+1-i)$
    end
    $y(1:k) = z(1:k)$; $y(k+1) = \alpha$
end
MATLAB Signal Processing Toolbox :

A = Levinson(r,n);
n < 195 !
7.2.8.

r = linspace(2,1,400);
flops(0);
x = Levinson(r);
flops
    319999

i.e. approximately $2 \cdot 400^2$, whereas

flops(0);
x = toeplitz(r)\r
flops
    22213401

i.e. approximately $400^3/3$.


MATLAB Toeplitz
(7.1).

Tk
r> Ek

Ek r
1

b
k+1

Tk x = b = (1 , ..., k )> ; Tk y = r
v

1
= Tk1
(b Ek r) = x + Ek y;

= k+1 r> Ek v
= k+1 r> Ek x r> y
= (k+1 r> Ek x)/(1 + r> y)
Algorithm. [Levinson]
Cost: $T = 4n^2$ operations.

$y(1) = -r(1)$; $x(1) = b(1)$; $\beta = 1$; $\alpha = -r(1)$
for k = 1 : n - 1
    $\beta = (1 - \alpha^2)\beta$
    $\mu = (b(k+1) - r(1:k)^\top x(k:-1:1)) / \beta$
    $v(1:k) = x(1:k) + \mu\,y(k:-1:1)$
    $x(1:k) = v(1:k)$; $x(k+1) = \mu$
    if k < n - 1
        $\alpha = -(r(k+1) + r(1:k)^\top y(k:-1:1)) / \beta$
        $z(1:k) = y(1:k) + \alpha\,y(k:-1:1)$
        $y(1:k) = z(1:k)$; $y(k+1) = \alpha$
    end
end
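The recursion can be traced on a small system. The following plain Python sketch follows the structure of the algorithm above under the assumption that $T$ is symmetric positive definite Toeplitz with unit diagonal, so that it is determined by $r = (\rho_1, \dots, \rho_{n-1})$; the function name `levinson` and its argument layout are hypothetical.

```python
def levinson(r, b):
    # Solve T x = b where T is symmetric positive definite Toeplitz with
    # unit diagonal and first row (1, r[0], r[1], ..., r[n-2]).
    n = len(b)
    x = [b[0]]
    if n == 1:
        return x
    y = [-r[0]]          # solution of the order-1 Yule-Walker system
    alpha = -r[0]
    beta = 1.0
    for k in range(1, n):
        beta = (1.0 - alpha * alpha) * beta
        # mu = (b(k+1) - r(1:k)^T x(k:-1:1)) / beta, in 1-based notation
        mu = (b[k] - sum(r[i] * x[k - 1 - i] for i in range(k))) / beta
        x = [x[i] + mu * y[k - 1 - i] for i in range(k)] + [mu]
        if k < n - 1:
            alpha = -(r[k] + sum(r[i] * y[k - 1 - i] for i in range(k))) / beta
            y = [y[i] + alpha * y[k - 1 - i] for i in range(k)] + [alpha]
    return x
```

For $r = (0.5, 0.2)$ and $b = (2.6, 4.0, 4.2)$ the routine returns $(1, 2, 3)$ up to rounding. Each pass of the loop costs $O(k)$ operations, which gives the $O(n^2)$ total, compared with $O(n^3)$ for Gaussian elimination on the full matrix.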
Hankel

Hankel. .
H Rnn Hankel E = [en , . . . , e1 ], EH Toeplitz.

b = Hx Eb =

EH x
|{z}
Toeplitz

Hankel Toeplitz. MATLAB :

HANKEL Hankel matrix.
HANKEL(C) is a square Hankel matrix whose first column is C and whose elements are zero below the first anti-diagonal.
HANKEL(C,R) is a Hankel matrix whose first column is C and whose last row is R.
Hankel matrices are symmetric, constant across the anti-diagonals, and have elements H(i,j) = R(i+j-1).

>> hankel([1 2 3])
ans =
     1     2     3
     2     3     0
     3     0     0
>> A = hankel([1 2 3],[4 5 6])
Column wins anti-diagonal conflict.
A =
     1     2     3
     2     3     5
     3     5     6
>> E=eye(3); E = E(3:-1:1,:)
E =
     0     0     1
     0     1     0
     1     0     0
>> E*A
ans =
     3     5     6
     2     3     5
     1     2     3


7.2.5
(= circulant) .
Toeplitz Toeplitz. :

0
n1

A= .
..
1

1
0

...
...

..

..

n1

n1
n2

..

.
0

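7.2.1. Let $A$ be a circulant matrix, let

\[ C = [e_n, e_1, e_2, \dots, e_{n-1}], \]

and let $\omega_n := \exp(2\pi i/n)$. Then:
1. $A$ is a Toeplitz matrix.
2. $C^n = I$.
3. \[ A = \sum_{j=0}^{n-1}\alpha_jC^j. \]
4. If $Q$ is the matrix with entries

\[ Q_{k,j} = \frac{1}{\sqrt n}\,\omega_n^{(k-1)(j-1)}, \]

then $\Lambda = Q^*AQ$ is diagonal and the eigenvalues of $A$ are

\[ \lambda_k = \sum_{s=0}^{n-1}\alpha_s\,\omega_n^{(k-1)s}, \quad k = 1, \dots, n. \qquad (7.2) \]

5. $A^{-1}$ is circulant.
6. If $A$, $B$ are circulant, then $AB$ is circulant and $AB = BA$.
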
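7.2.9. Consider the circulant matrix and its inverse:

\[
A = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 4 & 1 & 2 & 3 \\ 3 & 4 & 1 & 2 \\ 2 & 3 & 4 & 1 \end{pmatrix},
\qquad
A^{-1} = \begin{pmatrix}
-0.2250 & 0.2750 & 0.0250 & 0.0250 \\
0.0250 & -0.2250 & 0.2750 & 0.0250 \\
0.0250 & 0.0250 & -0.2250 & 0.2750 \\
0.2750 & 0.0250 & 0.0250 & -0.2250
\end{pmatrix}.
\]

Here $\omega_4 = \exp(2\pi i/4) = i$, and (7.2) gives the eigenvalues $10$, $-2 + 2i$, $-2 - 2i$, $-2$.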

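The system $Ax = b$ can be solved using the factorization $A = Q\Lambda Q^*$:

\[ Ax = b \;\Rightarrow\; x = Q\Lambda^{-1}Q^*b. \]

Algorithm. [Solution of $Ax = b$ for circulant $A$]
1. Compute $\hat b = Q^*b$ with an FFT.
2. Compute the eigenvalues $\lambda_k = \sum_{s=0}^{n-1}\alpha_s\,\omega_n^{(k-1)s}$ with an FFT.
3. Compute $\tilde b = \Lambda^{-1}\hat b$. This costs $O(n)$ operations.
4. Compute $x = Q\tilde b$ with an FFT.
Each FFT of length $n$ costs $O(n \log n)$, so the whole solve costs $O(n \log n)$ operations.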
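The four steps can be written out directly. In the sketch below (plain Python, hypothetical names `dft` and `solve_circulant`) the transform is a naive $O(n^2)$ DFT so that the example stays self-contained; an FFT routine would replace `dft` to reach the $O(n \log n)$ cost. As a further assumption, the circulant matrix is described by its first column, with $(Ax)_i = \sum_j c_{(i-j) \bmod n}\,x_j$.

```python
import cmath

def dft(v, inverse=False):
    # Naive discrete Fourier transform; the inverse includes the 1/n factor.
    n = len(v)
    sign = 1 if inverse else -1
    out = [sum(v[s] * cmath.exp(sign * 2j * cmath.pi * k * s / n)
               for s in range(n)) for k in range(n)]
    return [z / n for z in out] if inverse else out

def solve_circulant(first_col, b):
    # A x = b with A circulant: in Fourier space the product becomes
    # elementwise, so x = IDFT( DFT(b) / DFT(first_col) ).
    lam = dft(first_col)      # eigenvalues of A
    b_hat = dft(b)
    x_hat = [p / q for p, q in zip(b_hat, lam)]
    return [z.real for z in dft(x_hat, inverse=True)]
```

For the matrix of Example 7.2.9 the first column is (1, 4, 3, 2); solving with the right-hand side $e_1$ returns the first column of $A^{-1}$, namely (-0.225, 0.025, 0.025, 0.275). The division in the third step fails exactly when some eigenvalue is zero, i.e. when $A$ is singular.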

7.3
7.3.1. : A A = LU L
U .

. .
(.
) (. . 7
).

7.4
7.4.1 (, , 02-makeup). )
LU A . ) ...
. )
...

7.4.2 ( 03). MATLAB


(Y Rnm ) .
1 < m < n myvec (n) () n
.
, , ,
MATLAB .

function [Y] = vansolve(B);


[n,m]= size(B); x=myvec(n); V=[ ];
for i=1:n
for j=1:n, V(i,j) = x(i)^(j-1); end
end
for k=1:m, Y(:,k) = V\B(:,k); end

. Vandermonde n x.

m V B .
, ) V , )
, ) . , )
,
. (
) x
.

function [Y] = vansolve(B);


[n,m]= size(B); V=ones(n); x=myvec(n);
for j=2:n
V(:,j) = V(:,j-1).*x;
end
[L,U]=lu(V);
Y = U\(L\B);
7.4.3.

1
1

An = .
..
1

0
1

0n
1n

.. .
.
nn

j R , j = 0 : n.
1. A
j A .
2. b Rn A
.
Gauss
Ax = b;

. 1)
n = 1 n = 2.
A1 = 1 0

detA2 = (1 22 12 2 ) (0 22 02 2 ) + (0 12 02 1 )
,

detA2 = (2 0 )(2 1 )(1 0 ).


. , Symbolic
Toolbox MATLAB :

>> syms x0 x1 x2
>> A=[1,x0,x0^2;1,x1,x1^2;1,x2,x2^2]
A =
[ 1, x0, x0^2]
[ 1, x1, x1^2]
[ 1, x2, x2^2]
>> det(A)
ans =
x1*x2^2-x1^2*x2-x0*x2^2+x0^2*x2+x0*x1^2-x0^2*x1
>> factor(ans)
ans =
(-x2+x0)*(-x2+x1)*(x1-x0)
, n, ()
.

detAn =

n
Y

(i j ).

i>j

,
( )
( ).
1 n 1 n- ,

0
0

detAn = det .
..
1

0 n
1 n

0n nn
1n nn

..

.
nn


(1 : n, 2 : n + 1), .

0 n

..
.
n1 n

0n nn

..
.

n
nn
n1

. n = 1, P0 = 1 0 .
n = 1. n 1, .

detAn1 = detPn1 =

n1
Y

(i j ).

i>j

detAn = (1)n (0 n ) (n1 n )detQn1

Qn1

0 + n

1 + n

..
.

..
.

n1 + n

n
0n n
0 n
n
1n n
1 n

n
n
n1
n
n1 n


Qn1

0 0 n1
0 1 n1

.
..

1 n1 + n

..
.

n1
0n1 n1
n1
1n1 n1

n
n
n1 n
n1 n

0 n1

Pn2 = ...
n2 n1

n1
0n1 n1

..
.

n1
n1
n2
n1

detPn2 =

n1
Y

(i j ).

i>j

detAn

(1)n (0 n ) (n1 n )

n1
Y

(i j )

i>j

(n 0 ) (n n1 )

n1
Y

(i j )

i>j

n
Y

(i j ).

i>j

.
:
, i 6= j
i 6= j .
2) , . , Vandermonde
n.
j .
Vandermonde
.
7.4.4. Toeplitz T Rnn J

0 0

1 0
J =

0 1
0
J = [e2 , e3 , , en , 0].

..

..

,
..
.
0


Pn1

k
1. T =
k , k =
k=0 k J
0, , n 1 T .

2. Toeplitz Toeplitz
.
3. T Toeplitz
, T 1 Toeplitz.

. 1)

J 2 = [e3 , e4 , , en , 0, 0]
, k n 1

J k = [ek+1 , , en , 0, , 0]
k
k n, JP
= 0. , Toeplitz
n1
T = k=0 k J k .
2) Toeplitz, A B .
C = AB J k , k = 0, ..., n 1.

AB

= (

n1
X

k J k )(

i J i )

i=0

k=0
n1
X

n1
X

k J k .

k=0

Pn1

k
3) (1) T =
k=0 k J . 0 = 1,
0 .

T =I P =I (

n1
X

(i )J i ).

i=1

P ( ), P .
Neumann,

T 1

(I P )1 = I + P + P 2 + .

P j J , J n1 ,
.
J .

[1] E. Anderson, Z. Bai, C. Bischof, J. Demmel, J.J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, S. Ostrouchov, and D. Sorensen. LAPACK Users' Guide. SIAM, Philadelphia, 2nd edition, 1995.
[2] Å. Björck and V. Pereyra. Solution of Vandermonde systems of equations. Math. Comp., pages 893-903, 1971.
[3] A. Córdova, W. Gautschi, and S. Ruscheweyh. Vandermonde matrices on the circle: spectral properties and conditioning. Numer. Math., 57:577-591, 1990.
[4] W. Eberly. Logarithmic depth circuits for Hermite interpolation. J. Algorithms, 16:335-360, 1994.
[5] Ö. Eğecioğlu, E. Gallopoulos, and Ç. Koç. Fast computation of divided differences and parallel Hermite interpolation. J. Complexity, 5(4):417-437, December 1989.
[6] Ö. Eğecioğlu, E. Gallopoulos, and Ç. Koç. A parallel method for fast and practical high-order Newton interpolation. BIT, 30:268-288, 1990.
[7] W. Gautschi. On inverses of Vandermonde and confluent Vandermonde matrices. Numer. Math., 4:117-123, 1962.
[8] W. Gautschi. On the condition of algebraic equations. Numer. Math., 21:405-424, 1973.
[9] W. Gautschi. Optimally conditioned Vandermonde matrices. Numer. Math., 24:1-12, 1975.
[10] W. Gautschi. Questions of numerical condition related to polynomials. In G.H. Golub, editor, Studies in Numerical Analysis, volume 24, pages 140-177. Mathematical Association of America, 1984.
[11] W. Gautschi. How (un)stable are Vandermonde systems? In R. Wong, editor, Asymptotic Analysis and Computational Analysis, pages 193-210. Marcel Dekker, Inc., New York, 1990.
[12] W. Gautschi. The condition of polynomials in power form. Math. Comp., 33(145):343-352, January 1979.
[13] W. Gautschi and G. Inglese. Lower bounds for the condition number of Vandermonde matrices. Numer. Math., 52:241-250, 1988.
[14] G. Golub and C. F. Van Loan. Matrix Computations. The Johns Hopkins University Press, Baltimore, 2nd edition, 1989.
[15] N. J. Higham. Fast solution of Vandermonde-like systems involving orthogonal polynomials. IMA J. Numer. Anal., 8:473-486, 1988.
[16] N.J. Higham. Accuracy and Stability of Numerical Algorithms. SIAM, Philadelphia, 1996.
[17] H. Lu. Solution of Vandermonde-like systems and confluent Vandermonde-like systems. SIAM J. Matrix Anal. Appl., 17:127-138, 1996.
[18] V. Pan. Complexity of computations with matrices and polynomials. SIAM Rev., 34(2):255-262, 1992.
[19] J. Rice. A theory of condition. SIAM J. Numer. Anal., 3(2):287-311, 1966.
[20] A.M. Turing. Proposed electronic calculator, 1946. Available electronically from www.emula3.com/docs/Turing_Report_on_ACE.pdf.
[21] C. Van Loan. Computational Frameworks for the Fast Fourier Transform. SIAM, Philadelphia, 1992.


8.1
A major task of mathematics is to harmonize the continuous and the
discrete, to include them in one comprehensive mathematics, and to
eliminate obscurity from both .... [E.T. Bell Men of Mathematics
(1937)]
,
(
) .
, . :
8.1.1.
Navier-Stokes. , R2 R3
:

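\[ -\nu\Delta u + u \cdot \operatorname{grad}u + \operatorname{grad}p = f \ \text{ in } \Omega, \qquad \operatorname{div}u = 0 \ \text{ in } \Omega, \qquad u = 0 \ \text{ on } \partial\Omega, \]

where $u$ is the velocity, $p$ the pressure, $f$ the forcing term, and $\nu := 1/Re$ with $Re$ the Reynolds number.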

8.1.2. Maxwell. :

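\[ \nabla \times E + \frac{\partial B}{\partial t} = 0, \qquad \operatorname{div}D = \rho, \qquad \nabla \times H = J + \frac{\partial D}{\partial t}, \qquad \operatorname{div}B = 0. \]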

H , E , D ,
B , J , .

( = constitutive relations)
, .. ,

\[ D = \varepsilon E, \qquad B = \mu H, \qquad J = \sigma E \]
, .

8.1.3.
.
1 . ( = option) ( = call option) ( = put option).

Black-Scholes:

\[ \frac{\partial V}{\partial t} + \frac{1}{2}\sigma^2S^2\frac{\partial^2 V}{\partial S^2} + rS\frac{\partial V}{\partial S} - rV = 0. \]
Black-Scholes V
t S .

,
. 2.1 2 .
, 8.1.
.
: ) (. 2). )
,
.
.

.

.
, .

. ,
...,
.

.
1


8.1: .


,

, . .
. , ,
.
.
, , .
,
. , ( )
.
,
.
, () ,
. :
1)
, 2) , 3) (..)
, .
4)


. , Newton
,

.

8.1.1

.
, ()

().

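\[ L\left(u(z), \frac{\partial u}{\partial z_1}(z), \dots, \frac{\partial u}{\partial z_s}(z), \frac{\partial^2 u}{\partial z_1^2}(z), \dots, \frac{\partial^p u}{\partial z_s^p}(z)\right) = 0, \qquad z \in \Omega \subset \mathbb{R}^s. \]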
p.
. u
Rk , k > 1 .
L
. , L
.

.
( ) . , u(t, z)
, ,
u(0, z) u
.

8.2


.
,
Rn ,
.
,
( )
.
.
:
;
;
;
;


.
.

. , , , , .. u0 (x) = limh0 (u(x + h) u(x))/h.
,

(u(x + h) u(x))/h. :
  u(x)      u'(x)      (u(x+h) - u(x))/h
  c         0          0
  x         1          1
  x^2       2x         2x + h
  sin x     cos x      (sin(x+h) - sin(x))/h

For $u(x) = x^2$, $(u(x+h) - u(x))/h = 2x + h = u'(x) + h$.


h.
.
By Taylor's theorem, $\sin(x+h) = \sin x + h\cos x - \sin x\,h^2/2 + O(h^3)$, so that $(\sin(x+h) - \sin x)/h = \cos x + O(h)$.
h ,
. ,
Taylor . :
,
O(h).
O(h2 ) h .
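The difference between an $O(h)$ and an $O(h^2)$ approximation is easy to observe numerically. The following plain Python sketch (hypothetical helper names `forward_diff`, `central_diff`, `errors`) compares the one-sided quotient with the centered quotient $(u(x+h) - u(x-h))/(2h)$ for $u = \sin$ at $x = 1$.

```python
import math

def forward_diff(u, x, h):
    # One-sided quotient: error O(h).
    return (u(x + h) - u(x)) / h

def central_diff(u, x, h):
    # Centered quotient: error O(h^2).
    return (u(x + h) - u(x - h)) / (2.0 * h)

def errors(h, x=1.0):
    # Absolute errors of both quotients for u = sin, whose derivative is cos.
    exact = math.cos(x)
    return (abs(forward_diff(math.sin, x, h) - exact),
            abs(central_diff(math.sin, x, h) - exact))
```

Halving $h$ roughly halves the first error and divides the second by about four, which is the practical meaning of first and second order accuracy. For very small $h$ rounding errors in the subtraction eventually dominate, so the step cannot be reduced indefinitely.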
.
. (finite differences).
.


.

8.2.1
u(x) XL x XU

\[ -\frac{d^2u}{dx^2}(x) + b(x)\frac{du}{dx}(x) + c(x)u(x) = d(x) \]

L(u, b, c, d, x) = 0. 2
2

Figure 8.2: Grid of $n + 2$ points $x_0, x_1, x_2, x_3, \dots, x_n, x_{n+1}$ with spacing $h$.

u.
u.
. ,

. , c(x) = 0 u(x) ,
u(x) + c c
. u(XL ), u(XU ).
b, c, d
u -
.
u(x) XL < x < XU . :

h ().
L Lh .
u U .

\[ \Omega_h := \{x_j \mid x_j = X_L + jh,\ j = 0, \dots, n+1\}, \qquad h = \frac{X_U - X_L}{n+1} \]
. x0 = XL , xn+1 = XU . , n + 2
[a, b] 8.2.
.
.
8.2.1.
.
.
:


, ..
, .

.
,
. ,
6 , 2-3 .
( )
.


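Taylor. If $u$ has 4 continuous derivatives, then

\[ u_{j\pm1} = u_j \pm hu_j^{(1)} + \frac{h^2}{2}u_j^{(2)} \pm \frac{h^3}{6}u_j^{(3)} + \frac{h^4}{24}u^{(4)}(x_j + \theta_j^{\pm}h) \]

where $-1 < \theta_j^- < 0 < \theta_j^+ < 1$. Adding and subtracting the two expansions (with the remainder taken one order earlier in the second case) gives

\[ u_{j-1} + u_{j+1} - 2u_j = h^2u_j^{(2)} + \frac{h^4}{24}\left[u^{(4)}(x_j + \theta_j^+h) + u^{(4)}(x_j + \theta_j^-h)\right] \]

\[ u_{j+1} - u_{j-1} = 2hu_j^{(1)} + \frac{h^3}{6}\left[u^{(3)}(x_j + \theta_j^+h) + u^{(3)}(x_j + \theta_j^-h)\right] \]

where $-1 < \theta_j^- < 0 < \theta_j^+ < 1$. By continuity of the derivatives,

\[ u^{(4)}(x_j + \theta_j^+h) + u^{(4)}(x_j + \theta_j^-h) = 2u^{(4)}(x_j + \theta_jh), \quad |\theta_j| \le \max\{\theta_j^+, |\theta_j^-|\} < 1, \]

\[ u^{(3)}(x_j + \theta_j^+h) + u^{(3)}(x_j + \theta_j^-h) = 2u^{(3)}(x_j + \theta_jh), \quad |\theta_j| \le \max\{\theta_j^+, |\theta_j^-|\} < 1, \]

and therefore

\[ \frac{u_{j+1} - u_{j-1}}{2h} = u_j^{(1)} + O(h^2), \qquad \frac{u_{j-1} + u_{j+1} - 2u_j}{h^2} = u_j^{(2)} + O(h^2). \]

Substituting into the differential equation at $x_j$:

\[ \frac{-u_{j-1} - u_{j+1} + 2u_j}{h^2} + b_j\frac{u_{j+1} - u_{j-1}}{2h} + c_ju_j = d_j + O(h^2), \quad j = 1, \dots, n. \]

Dropping the $O(h^2)$ terms amounts to the approximations

\[ \frac{u_{j+1} - u_{j-1}}{2h} \approx u_j^{(1)}, \qquad \frac{u_{j-1} + u_{j+1} - 2u_j}{h^2} \approx u_j^{(2)}. \]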


1. h,
2. u(3) , u(4) xj .
O(h2 ) , n

\[ \frac{-U_{j-1} - U_{j+1} + 2U_j}{h^2} + b_j\frac{U_{j+1} - U_{j-1}}{2h} + c_jU_j = d_j, \quad j = 1, \dots, n \]
Uj . n
U . O(h2 ), , U
.
O(h2 ).
1 h2 u(3) (x) 2
h2 u(4) (x), x xi . ,

u(3) , 2 M h2 |u(3) (x)| M h2
. , h 0,
1 ( ,
h2 ). ,
2 0 h

u(4) .
(.. 2 )
( 3 ),
(. )
...
.
:

.
n n .
U = [U1 , ..., Un ]
u . ..
kU uk, u u
.

kU uk .

2
,
, .

, .

.
( ):
, (.
), .
, ,
, ,
, . (
)
.
Norbert Wiener (1894-1964), ,
:
.... the exigencies of the Second World War thrust me into the
problem of shooting ahead of a flying airplane and the consequent
problem of predicting its course. I sought to develop predicting machines of various sorts ... and I found that what I was looking for
would have demanded me to have my cake and eat it, too. I was studying the way in which the past observation of the airplane
might lead to the computation of its future position. The accurate
following of an airplane pursuing a smooth course demanded sharp
and sensitive instruments; but these instruments, because of their
very sharpness and sensitivity, were seen to be thrown out of action
by every slight jar and by every corner of the course they were following. For very irregular paths, the instruments I had suggested
were inadequate, not in spite of their refinement, but because of
their refinement. It occurred to me that this impossibility of achieving the ideal instrument all along the line had a close relation to
the Heisenberg impossibility of observing at the same time where a
thing was going and how fast it was going. The more I studied the
problem, the more I realized that my difficulty was not a piece of
casual malice on the part of a mathematical devil, but lay in the
very nature of prediction itself. [4]

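\begin{align*}
\frac{-U_0 - U_2 + 2U_1}{h^2} + b_1\frac{U_2 - U_0}{2h} + c_1U_1 &= d_1 \\
\frac{-U_{j-1} - U_{j+1} + 2U_j}{h^2} + b_j\frac{U_{j+1} - U_{j-1}}{2h} + c_jU_j &= d_j, \quad j = 2, \dots, n-1 \\
\frac{-U_{n-1} + 2U_n - U_{n+1}}{h^2} + b_n\frac{U_{n+1} - U_{n-1}}{2h} + c_nU_n &= d_n
\end{align*}

The boundary values $U_0 = u_0$ and $U_{n+1} = u_{n+1}$ are known and are moved to the right-hand side:

\begin{align*}
\frac{-U_2 + 2U_1}{h^2} + b_1\frac{U_2}{2h} + c_1U_1 &= d_1 + \frac{u_0}{h^2} + b_1\frac{u_0}{2h} \\
\frac{-U_{j-1} - U_{j+1} + 2U_j}{h^2} + b_j\frac{U_{j+1} - U_{j-1}}{2h} + c_jU_j &= d_j, \quad j = 2, \dots, n-1 \\
\frac{-U_{n-1} + 2U_n}{h^2} - b_n\frac{U_{n-1}}{2h} + c_nU_n &= d_n + \frac{u_{n+1}}{h^2} - b_n\frac{u_{n+1}}{2h}
\end{align*}

Collecting the coefficients of each unknown:

\begin{align*}
\left(c_1 + \frac{2}{h^2}\right)U_1 + \left(-\frac{1}{h^2} + \frac{b_1}{2h}\right)U_2 &= d_1 + \frac{u_0}{h^2} + b_1\frac{u_0}{2h} \\
\left(-\frac{1}{h^2} - \frac{b_j}{2h}\right)U_{j-1} + \left(c_j + \frac{2}{h^2}\right)U_j + \left(-\frac{1}{h^2} + \frac{b_j}{2h}\right)U_{j+1} &= d_j, \quad j = 2, \dots, n-1 \\
\left(-\frac{1}{h^2} - \frac{b_n}{2h}\right)U_{n-1} + \left(c_n + \frac{2}{h^2}\right)U_n &= d_n + \frac{u_{n+1}}{h^2} - b_n\frac{u_{n+1}}{2h}
\end{align*}

This is a system of $n$ equations in the unknowns $[U_1, \dots, U_n]$. It can be written as $AU = F$, where $A$ is tridiagonal with entries $\beta_j, \alpha_j, \gamma_j$ in positions $j-1, j, j+1$ of row $j$, i.e.

\[ A = \mathrm{trid}_n[\beta_j, \alpha_j, \gamma_j], \qquad \alpha_j = \frac{2}{h^2} + c_j, \quad \beta_j = -\frac{1}{h^2} - \frac{b_j}{2h}, \quad \gamma_j = -\frac{1}{h^2} + \frac{b_j}{2h}. \]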

, :
1. A ;
2. , ;

h = f
3. AU
u;
A
. 2 .
. , (1) , u00 (x) + c(x)u(x) = d(x) c(x) 0.

Uj1 Uj+1 + (2 + h2 cj )Uj


h2

dj , j = 2, . . . , n 1

1 n. A ,
V AV = 0.
V j , . Vj > 0. 1 < j < n
AV = 0

0 =
=

Vj1 Vj+1 + (2 + h2 cj )Vj


, 0, j = 2, . . . , n 1
h2
1
1
(Vj Vj1 ) + 2 (Vj Vj+1 ) + cj Vj
h2
h

, V Vj > 0

cj = 0 Vj = 0. Vj = Vj1 .
V1 = Vn = 0
V = 0.
j = 1 j = n. A
.
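8.2.1. Consider $-\frac{d^2u}{dx^2}(x) + xu(x) = (9 + x)\sin 3x$ on $\Omega = [0, \pi/6]$ with $u(0) = 0$, $u(\pi/6) = 1$. The exact solution is $u(x) = \sin(3x)$.
Discretization:

\[ h = \frac{\pi}{6(n+1)}, \qquad \Omega_h = \{x_j = jh \mid j = 0, \dots, n+1\} \]

\begin{align*}
\left(x_1 + \frac{2}{h^2}\right)U_1 - \frac{1}{h^2}U_2 &= (9 + x_1)\sin 3x_1 \\
-\frac{1}{h^2}U_{j-1} + \left(x_j + \frac{2}{h^2}\right)U_j - \frac{1}{h^2}U_{j+1} &= (9 + x_j)\sin 3x_j, \quad j = 2, \dots, n-1 \\
-\frac{1}{h^2}U_{n-1} + \left(x_n + \frac{2}{h^2}\right)U_n &= (9 + x_n)\sin 3x_n + \frac{1}{h^2}
\end{align*}

Substituting $x_j = jh$, the system is $AU = F$ with

\[ A = \mathrm{trid}_n[\beta_j, \alpha_j, \gamma_j], \]

where $\beta_j, \alpha_j, \gamma_j$ are the entries in positions $j-1, j, j+1$ of row $j$:

\[ \alpha_j = \frac{2}{h^2} + jh, \qquad \beta_j = -\frac{1}{h^2}, \qquad \gamma_j = -\frac{1}{h^2}. \]

The entries of $F$ are $(9 + jh)\sin 3x_j$ for $j = 1, \dots, n-1$ and $(9 + nh)\sin 3x_n + \frac{1}{h^2}$ for $j = n$. The matrix $A$ is symmetric and tridiagonal, so $AU = F$ can be solved with the tridiagonal LU factorization (in $O(n)$ operations) to obtain the approximation $U$ of $u$.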
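Figure 8.3 shows the computed $U$ for $n = 20$ together with the relative error $|u_j - U_j| / |u_j|$ of $U$ with respect to $u$ for $n = 20, 40, 60, 80$. For $n = 20, 40, 80$ the maximum relative errors are $[0.1680, 0.0441, 0.0113] \times 10^{-3}$: each doubling of $n$ reduces the error by a factor of about 4, in agreement with the $O(h^2)$ estimate.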
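The experiment of Example 8.2.1 can be reproduced along the following lines. This is a minimal sketch in plain Python, not the code used for the figure: the function names `solve_bvp` and `max_rel_error` are hypothetical, and the tridiagonal system is solved by elimination without pivoting, which is assumed safe here because the matrix is diagonally dominant.

```python
import math

def solve_bvp(n):
    # Centered differences for -u'' + x u = (9 + x) sin(3x) on [0, pi/6],
    # u(0) = 0, u(pi/6) = 1, on a grid with n interior points.
    h = math.pi / (6 * (n + 1))
    x = [j * h for j in range(1, n + 1)]
    diag = [2.0 / h**2 + xj for xj in x]
    off = -1.0 / h**2
    rhs = [(9.0 + xj) * math.sin(3.0 * xj) for xj in x]
    rhs[-1] += 1.0 / h**2        # contribution of u(pi/6) = 1; u(0) = 0 adds nothing
    # Tridiagonal elimination followed by back substitution.
    for i in range(1, n):
        m = off / diag[i - 1]
        diag[i] -= m * off
        rhs[i] -= m * rhs[i - 1]
    U = [0.0] * n
    U[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):
        U[i] = (rhs[i] - off * U[i + 1]) / diag[i]
    return x, U

def max_rel_error(n):
    # Largest relative error against the exact solution sin(3x).
    x, U = solve_bvp(n)
    return max(abs(math.sin(3.0 * xj) - Uj) / abs(math.sin(3.0 * xj))
               for xj, Uj in zip(x, U))
```

Comparing `max_rel_error(20)` with `max_rel_error(41)` halves $h$ exactly, so the ratio of the two errors should be close to 4 if the scheme is second order accurate.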
-
.

- (..
) -
.

) ) .

Figure 8.3: Example 8.2.1. Computed solution and relative error ("Rel. Error") for n = 20 (maxReR = 1.68e-4), n = 40 (maxReR = 4.41e-5), n = 60 (maxReR = 1.99e-5) and n = 80 (maxReR = 1.13e-5).


, 2
. Poisson
(uxx +uyy ) = f (x, y) uxx , uyy u
x y .
. = [0, 1] [0, 1]
() .
, n n
n2 n2 U .
n2 n2

( LU
O((n2 )3 ) = O(n6 ).
n 1000,
.
A. , .
, 5 .
, . trid[Bj , Aj , Cj ]
Bj , Cj , Aj .
, u, ,
Aj , Bj Cj
. Gauss
.
n3
.

.

8.2.2

. , ,
( ) ,

( ).
( Cauchy),

\[ \frac{du}{dt} = f(t, u(t)), \quad t \in I, \qquad u(t_0) = c. \qquad (8.1) \]

I R , f : I Rn Rn c . , t I
.
3 :
8.2.1. I R Rn
f : I Rn ,
Lipschitz k k
Rn , I ,
t I u, v Rn ,

kf (t, u) f (t, v)k ku vk.


Cauchy (8.1) .

, ( )
Cauchy u,
[t0 , T ].
f . f t u
R f :
t
u(t) = u(t0 ) + t u( )d. .
0

. f u,
. , t u
.
, Cauchy ,
.
.
.
t = 0 u(0)
u(h). u t = 0 ( f (0, u(0))),

\[ u(h) \approx U_1 := u(0) + hf(0, u(0)). \]


3

Lipschitz.


u0 (t) ( t = 0)

u0 (t) = f (t, u(t))

u(t + h) u(t)
.
h

u(h)
u(2h).

\[ u(2h) \approx U_2 := u(h) + hf(h, u(h)). \]


, , u(h) ( u(0) )
.
u(h) , . U1 .

\[ u(2h) \approx U_2 := U_1 + hf(h, U_1). \]
, U2
, , u
, Cauchy, u
(1) (t) =
f (t, u
(t)), [h, 2h], , .
u
(h) = u(h), u
(h) = U1 .
, U3 , U4 , ...
u(3h), u(4h), ....
Euler. .
Cauchy
(. (t, u(t)) u : [0, T ] R ), u(0)
u0 (t) [0, T ].
Euler h u(0 + h)
(0, u(0)) (h, u(h)).
, (h, U1 ) ,
(h, U1 ) (2h, U2 ).
t = T ( T = N h N ,
).

:

Un+1 = Un + hn n (tn , Un )
hn ( ) n
u (
). t = t0 , P
n
Un u(t0 + j=1 hj ).

,
Un , (u(t0 ), U1 , ..., Un1 , Un ).
, Un+1
, .. k , . Unk+1 , ..., Un1 , Un , k > 1.
.

n
, (8.1) .
n
n . u : [0, T ] R
n n

u(n) = 0 (t)u(t) + + n1 (t)u(n1) (t) + (t).


u(0) = 1 , . . . , u(n1) (0) = n .


{u1 , ..., un } :

u1 := u, u2 = u(1) , ..., un := u(n1) .


d
u1 (t) =
dt
d
u2 (t) =
dt

d
un1 (t)
dt
d
un (t)
dt

u2 (t)
u3 (t)

un (t)

0 (t)u1 (t) + + n1 (t)un (t) + d(t).

d
u(t)
dt
u(t)
u(0)
d(t)

Au(t) + c(t),

= [u1 , ..., un ]>


= [1 , ..., n ]>
= [0, . . . , 0, (t)]>

0
0
0

1
0
0

0
1
0

..
.

..
.

..
.

0 (t)

0
0
1

..

0
0
0

1 (t) 2 (t) 3 (t)

n1 (t)

8.2.2.

d3 u
=
dt3
0
00
u(0) = 1 , u (0) = 2 , u (0) = 3 .

f (t, u, u0 , u00 )

1 = u,

2 = u0 ,

3 = u00

c
8. 2008,
.

290

:
0

1 = 2 ,
1 (0) = 1
0
2 = 3 ,
2 (0) = 2
0
3 = f (t, 1 , 2 , 3 ), 3 (0) = 3
f , ..

f (t, u, u0 , u00 ) = 2u 3u0 4u00


:

d
= A
dt

0
A= 0
2

1
0
3

1
0
1 , = 2
3
4

8.2.3 Euler
,
(8.1) : u t
:

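\[ \frac{d}{dt}u(t) = \frac{u(t + \Delta t) - u(t)}{\Delta t} + O(\Delta t) \]

Dropping the $O(\Delta t)$ term, (8.1) becomes

\[ \frac{U(t + \Delta t) - U(t)}{\Delta t} = f(t, U(t)), \qquad U(t_0) = c, \]

that is

\[ U(t + \Delta t) = U(t) + \Delta t\,f(t, U(t)), \qquad U(t_0) = c. \qquad (8.2) \]

Step by step:

\begin{align*}
U(t_0) &= c \\
U(t_1) &= U(t_0) + \Delta t_1f(t_0, U(t_0)) \\
U(t_2) &= U(t_1) + \Delta t_2f(t_1, U(t_1)) \\
&\;\;\vdots \\
U(t_s) &= U(t_{s-1}) + \Delta t_sf(t_{s-1}, U(t_{s-1}))
\end{align*}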

Table 8.1: Values $U(t_j)$ computed for $u'(t) = -0.5u(t)$, $u(0) = 1$ with $\Delta t = 1/N$; ReR $= |U(t_j) - u(t_j)| / u(t_j)$.

   t      u(t)      N = 5     N = 10    N = 20
  0.2    0.9048    0.9000    0.9025    0.9037
  0.6    0.7408    0.7290    0.7351    0.7380
  1.0    0.6065    0.5905    0.5987    0.6027


s = 0
repeat
    $F_s = f(t_s, U_s)$
    select $\Delta t_s$
    $t_{s+1} = t_s + \Delta t_s$
    $U_{s+1} = U_s + \Delta t_sF_s$
    s = s + 1
until ($t_s \ge t_{\max}$)
, U t .
Euler Euler.
U (ts ) .
,
t . , Fs := f (ts , Us )
Us + ts f (ts , Us ). ,
Fs (..
f ).
8.2.3. ( , )

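\[ \frac{d}{dt}u(t) = -u(t)/2, \qquad u(0) = 1. \]

Euler's method gives $U_{s+1} = U_s - 0.5\Delta t_sU_s$. We take $\Delta t_s = 1/N$ and compare with the exact solution $u(t) = e^{-t/2}$ for $N = 5, 10, 20$.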
t = 0.2, 0.6, 1 8.1. ,
t .
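The values of Table 8.1 can be recomputed with a few lines of code. The sketch below is plain Python with hypothetical function names (`euler`, `table_row`); it uses a fixed step $\Delta t = 1/N$, as in the example.

```python
def euler(f, t0, u0, dt, steps):
    # Explicit Euler: U_{s+1} = U_s + dt * f(t_s, U_s), with a fixed step dt.
    t, u = t0, u0
    for _ in range(steps):
        u = u + dt * f(t, u)
        t = t + dt
    return u

def table_row(t_end, Ns=(5, 10, 20)):
    # Euler approximations of u(t_end) for u' = -u/2, u(0) = 1,
    # with dt = 1/N, rounded to four decimals.
    f = lambda t, u: -0.5 * u
    return [round(euler(f, 0.0, 1.0, 1.0 / N, round(t_end * N)), 4)
            for N in Ns]
```

For example, `table_row(1.0)` gives 0.5905, 0.5987 and 0.6027, to be compared with the exact value $e^{-1/2} \approx 0.6065$: doubling $N$ roughly halves the error, as expected from a first order method.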

, ,
Cauchy [0, T ]

lim

max

h0 j=0,1,...,dT /he

ku(tj ) Uj (h)k = 0

Uj (h)
tj h.
h. ,


ku(tj ) Uj (h)k = O(hp )


p > 0, 4 p.
Euler
Cauchy [0, T ].
t = h

h = [0, h, 2h, ..., sh, (s + 1)h], T = (s + 1)h


u () U = [U0 , U1 , ..., Uk+1 ]>
ej (h) = u(tj ) Uj . limh0 maxj kej (h)k = 0. f (t, u)
u00 [0, T ].

u(tj+1 ) Uj+1

u(tj ) + hu0 (tj ) + h2 u00 (tj + j h) Uj hf (tj , Uj ), , j (0, 1)

[u(tj ) Uj ] + hf (tj , u(tj )) hf (tj , Uj ) + h2 u00 (tj + j h)

ej+1 = ej + h[f (tj , u(tj )) f (tj , Uj )] + h2 u00 (tj + j h)


kej+1 k kej k + hkej k + O(h2 ) = (1 + h)kej k + h2

(8.3)
(8.4)

u00 (
[0, T ].).

kej k

h[(1 + h)j 1],

j = 0, 1, ...

1 + h < eh

kej k

jh
h[e
1],
j = 0, 1, ...

(s+1)h

eT 1
h[e
1] = h[eT 1] h

kej k

h0

0,

Euler ,
h , 1. , ,
, , T h
. ,
M .
, (8.3):

ej+1

ej + h[f (tj , u(tj )) f (tj , Uj )] + h2 u00 (tj + j h) .


{z
} |
{z
}
|{z}
|
1)j

(2)

3)j

ej j . ,

j + 1 :
4

, .

j .
(
, ,
). limh0 = 0
.
(2) f Lipschitz,

hkf (tj , u(tj )) f (tj , Uj )k

hku(tj ) Uj k = hkej k

kej+1 k (1 + h)kej k + h2 |j |

. ,

,
.
Euler
Cauchy, u0 (t) = u(t)
u(0) = 1 C . u(t) = et ,
< < 0 limt u(t) = 0. ,
t U
( , Euler).
, ,
T . ,
h, T
. ,
h.
8.2.1.
Cauchy
limj kUj k = 0.
z = h C Cauchy,
limj kUj k = 0.

Euler.

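\[ U_{j+1} = (1 + h\lambda)U_j = \cdots = (1 + h\lambda)^{j+1}U_0 = (1 + h\lambda)^{j+1}. \]

Hence

\[ \lim_{j \to \infty}|1 + h\lambda|^{j+1} = 0 \]

if and only if $|1 + h\lambda| < 1$.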
Euler

c
8. 2008,
.

294

, (1, 0)
1.
h ( ).
, , h < 2/||.
,
. , , h
T .
, u0 (t) = Au(t), u(0) = u0
A.

k(I + hA)j+1 u0 k 0
u0 ,
k(I + hAj+1 k 0. 5
I + hA 1. (I + hA) =
1 + h(A),
|1 + h(A)| < 1. ,
Euler
h [0, 2/(A)]. ,
A, h
.
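8.2.4. Consider $u'(t) = Au(t)$ with $A = \mathrm{diag}[-100, -1, -0.5]$. The solution is $u(t) = [e^{-100t}u_1(0),\ e^{-t}u_2(0),\ e^{-t/2}u_3(0)]^\top$, so that

\[ \|u(0)\| > \|u(t_1)\| > \|u(t_2)\| > \cdots, \qquad \lim_{t \to \infty}\|u(t)\| = 0. \]

Euler's method gives

\[
U_{k+1} = (I + h\,\mathrm{diag}[-100, -1, -0.5])^{k+1}U_0
= \begin{pmatrix} (1 - 100h)^{k+1} & 0 & 0 \\ 0 & (1 - h)^{k+1} & 0 \\ 0 & 0 & (1 - 0.5h)^{k+1} \end{pmatrix}
\begin{pmatrix} U_{1,0} \\ U_{2,0} \\ U_{3,0} \end{pmatrix},
\]

where $U_{j,0} = u_j(t_0)$. For the computed values to decay, $h$ must satisfy

\[ |1 - 100h| < 1, \qquad |1 - h| < 1, \qquad |1 - 0.5h| < 1, \]

that is

\[ 0 < h < \min\{2/100,\ 2,\ 4\} = 0.02, \]

so reaching $T = 1$ takes at least 50 steps. If instead $u_1'(t) = -10^4u_1(t)$, then

\[ 0 < h < 2/10^4 = 0.0002. \]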
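The step size restriction of Example 8.2.4 can be observed directly. The following is a small sketch in plain Python (hypothetical names `euler_diag` and `max_stable_step`) that applies explicit Euler to the diagonal system and reports the largest admissible step for real negative eigenvalues.

```python
def euler_diag(lams, u0, h, steps):
    # Explicit Euler for u' = diag(lams) u: each component is multiplied
    # by the amplification factor (1 + h * lam) at every step.
    u = list(u0)
    for _ in range(steps):
        u = [(1.0 + h * lam) * ui for lam, ui in zip(lams, u)]
    return u

def max_stable_step(lams):
    # For real negative eigenvalues, |1 + h*lam| < 1 requires h < 2/|lam|;
    # the fastest decaying component dictates the bound.
    return min(2.0 / abs(lam) for lam in lams)
```

With the eigenvalues $-100, -1, -0.5$ the bound is $0.02$. Running 200 steps with $h = 0.019$ produces decaying values, while $h = 0.021$ makes the first component grow like $1.1^{k}$ even though the exact solution decays. The component that forces the small step is the one that contributes least to the solution after a short time.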


5

G. Strang, , . 590.

T = 1 5000 ,
... ,
A. , A = A> , Q
QT AQ = ,

Uk+1

=
=

QQT (I + tA)k+1 QQT U0


Q(I + t)k+1 QT U0

Qdiag[(1 + tj )k+1 ]QT U0

8.2.5 .
Euler

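\[ \frac{d}{dt}u(t) = \frac{u(t) - u(t - \Delta t)}{\Delta t} + O(\Delta t) \]

\[ \frac{U(t) - U(t - \Delta t)}{\Delta t} = f(t, U(t)), \qquad U(t_0) = c. \]

\[ U(t) - \Delta t\,f(t, U(t)) = U(t - \Delta t), \qquad U(t_0) = c \qquad (8.5) \]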

Euler. , U (t) (implicitly),


. , (implicit). , f
U f Rn , .. f (t, U (t)) = AU (t),
A(t) Rnn ,

(I tA(t))U (t) = U (t t).


Euler
Euler. , , . Euler Cauchy

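\[ U_{j+1} = (1 - h\lambda)^{-1}U_j = \cdots = (1 - h\lambda)^{-(j+1)}u_0, \]

which tends to zero when

\[ \frac{1}{|1 - h\lambda|} < 1 \iff 1 < |1 - h\lambda|. \]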
, < < 0, ( h > 0),
Euler , h.
, , (h = T )
0 T , .
T ,
( , h = T ).


,
t


t.
.
Euler . A(t)
, LU
(I tA(t)) . , . A(t)
,
.

. , ,
O(n).
, Euler Euler.
, , ,
Euler . ,

, Euler
Euler.
8.2.5. ,
(cos , sin ) 0 2 .
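$u(\theta) = (x(\theta), y(\theta))$ satisfies

\[ \frac{dx}{d\theta} = -\sin\theta = -y, \qquad \frac{dy}{d\theta} = \cos\theta = x, \]

that is

\[ \frac{d}{d\theta}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix}, \qquad \frac{du}{d\theta} = Au, \]

with $u(0) = [1, 0]^\top$. Euler's method gives

\[ U(t_{k+1}) = U(t_k) + hAU(t_k), \qquad U(0) = [1, 0]^\top. \]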

Figure 8.4: Example 8.2.5 (panels for N = 10, N = 50, N = 200 and N = 2000).
k U (tk+1 ) U (tk )
h A.
(temp = AU ) tk
saxpy U + htemp.
temp = AU , BLAS-2. , t = 2/N 8.4. ,
, N < 1000 . . [1, 0]
, .
, Euler. 8.5.
.
Euler .
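The behaviour of the two Euler variants on the circle problem can be quantified by tracking the distance of the computed point from the origin after one full revolution. The sketch below is plain Python with hypothetical function names; the backward Euler step uses the closed form of $(I - hA)^{-1}$ for the $2 \times 2$ matrix $A$ of the example, which is an implementation shortcut assumed here rather than something taken from the text.

```python
import math

def forward_euler_circle(N):
    # N explicit Euler steps of size h = 2*pi/N for x' = -y, y' = x,
    # starting from (1, 0); returns the final distance from the origin.
    h = 2.0 * math.pi / N
    x, y = 1.0, 0.0
    for _ in range(N):
        x, y = x - h * y, y + h * x
    return math.hypot(x, y)

def backward_euler_circle(N):
    # N implicit Euler steps: solve (I - hA) U_new = U_old in closed form,
    # (I - hA)^{-1} = [[1, -h], [h, 1]] / (1 + h^2).
    h = 2.0 * math.pi / N
    x, y = 1.0, 0.0
    for _ in range(N):
        x, y = (x - h * y) / (1.0 + h * h), (y + h * x) / (1.0 + h * h)
    return math.hypot(x, y)
```

Each explicit step multiplies the radius by $\sqrt{1 + h^2}$ and each implicit step divides it by the same factor, so the first method spirals outwards and the second inwards; with $N = 2000$ the drift over one revolution is still about one percent.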

Figure 8.5: Example 8.2.5 (panels for N = 10, N = 50, N = 200 and N = 2000).

8.2.4 Taylor, Runge-Kutta and Richardson methods


. , ,
.
, , .
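How? For a step $h$, if $u$ has 3 continuous derivatives in $[t, t + h]$, then for some $\theta \in (0, 1)$

$$u(t + h) = u(t) + hu'(t) + \frac{h^2}{2!}u''(t) + \frac{h^3}{3!}u'''(t + \theta h) = u(t) + hf(t, u) + \frac{h^2}{2!}u''(t) + \frac{h^3}{3!}u'''(t + \theta h).$$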

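Hence, if $u(t)$, $f(t, u)$ and $u''(t)$ were known, we could approximate $u(t + h)$ with error $O(h^3)$, i.e. with a method of order 2, whereas Euler's method has order 1. How can we compute $u''(t)$? One possibility is

$$u''(t) = \frac{u'(t) - u'(t - h)}{h} + O(h) \approx \frac{f(t, u(t)) - f(t - h, u(t - h))}{h},$$

so that $u(t + h)$ is computed from $u(t)$ and $u(t - h)$.

Alternatively, we can express $u''(t)$ in terms of $f(t, u)$ by differentiating $u'(t) = f(t, u)$ with the chain rule:

$$u''(t) = f_t(t, u) + f_u(t, u)f(t, u),$$

where $f_t$, $f_u$ are the partial derivatives of $f$ at $(t, u)$. This gives

$$U_{k+1} = U_k + hf(t_k, U_k) + \frac{h^2}{2!}\big(f_t(t_k, U_k) + f_u(t_k, U_k)f(t_k, U_k)\big),$$

which is a method of order 2. Its drawback is that the partial derivatives of $f$ must be available (they can be obtained, for example, with a symbolic tool such as Maple). This is the Taylor method.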
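As an illustration of the second-order Taylor method, the MATLAB sketch below compares it with Euler's method. The test problem $u' = tu$, $u(0) = 1$, with exact solution $e^{t^2/2}$, is an assumption chosen so that $f_t = u$ and $f_u = t$ are both nontrivial. It does not come from the text.

```matlab
% Second-order Taylor method versus Euler for u' = f(t,u) = t*u, u(0) = 1.
% The test problem (exact solution exp(t^2/2)) is an illustrative choice.
f  = @(t,u) t.*u;
ft = @(t,u) u;                        % partial derivative of f with respect to t
fu = @(t,u) t;                        % partial derivative of f with respect to u
T  = 1;  Ns = [20 40];
errE = zeros(1,2);  errT = zeros(1,2);
for i = 1:2
    N = Ns(i);  h = T/N;  UE = 1;  UT = 1;
    for k = 0:N-1
        t  = k*h;
        UE = UE + h*f(t,UE);                                        % Euler
        UT = UT + h*f(t,UT) + h^2/2*(ft(t,UT) + fu(t,UT)*f(t,UT));  % Taylor, order 2
    end
    errE(i) = abs(UE - exp(T^2/2));
    errT(i) = abs(UT - exp(T^2/2));
end
fprintf('Euler  error ratio when h is halved: %.2f\n', errE(1)/errE(2));
fprintf('Taylor error ratio when h is halved: %.2f\n', errT(1)/errT(2));
```

Halving $h$ should roughly halve the Euler error and divide the Taylor error by about four, in line with orders 1 and 2.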
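Runge-Kutta methods
Runge-Kutta (RK) methods are one-step methods of the form

$$U_{n+1} = U_n + h\Phi(t_n, U_n),$$

which do not require derivatives of $u$. Euler's method is the simplest RK method. To motivate the idea, consider one Euler step with $\Delta t = h$. It computes $U_1$ from $u(0)$ using the slope (i.e. the derivative) at $t = 0$: $U_1 = u(0) + hf(0, u(0))$. The weakness is that the single slope $f(0, u(0))$ is used over the whole interval $[0, h]$. A better choice is to average the slopes at the two endpoints $0$ and $h$, namely $f(0, u(0))$ and $f(h, u(h))$:

$$U_1 = U_0 + \frac{h}{2}\big(f(0, u(0)) + f(h, u(h))\big).$$

Since $u(h)$ is unknown, it is replaced by an approximation computed from the value of $u$ at 0,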


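for example by an Euler step:

$$K_1 = f(0, u(0)), \qquad K_2 = f(h, u(0) + hf(0, u(0))), \qquad U_1 = u(0) + \frac{h}{2}(K_1 + K_2).$$

For a general step (i.e. from $t_n$ to $t_{n+1}$) this reads:

$$K_1 = f(t_n, U_n), \qquad K_2 = f(t_{n+1}, U_n + hK_1), \qquad U_{n+1} = U_n + \frac{h}{2}(K_1 + K_2).$$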

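This is a two-stage RK method. Other choices are possible. For example,

$$K_1 = f(t_n, U_n), \qquad K_2 = f\Big(t_n + \frac{h}{2},\ U_n + \frac{h}{2}K_1\Big), \qquad U_{n+1} = U_n + hK_2$$

uses the slope at the midpoint of $[t_n, t_n + h]$ instead of the average of the slopes at $t_n$ and $t_{n+1}$.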
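The general form of a Runge-Kutta method is:

$$U_{n+1} = U_n + h\sum_{i=1}^{s} b_iK_i, \qquad K_i = f\Big(t_n + c_ih,\ U_n + h\sum_{j=1}^{s} a_{ij}K_j\Big),$$

where $s$ is the number of stages. The coefficients of an RK method are usually collected in a Butcher tableau:

$$\begin{array}{c|cccc} c_1 & a_{11} & a_{12} & \cdots & a_{1s}\\ c_2 & a_{21} & a_{22} & \cdots & a_{2s}\\ \vdots & \vdots & \vdots & & \vdots\\ c_s & a_{s1} & a_{s2} & \cdots & a_{ss}\\ \hline & b_1 & b_2 & \cdots & b_s \end{array} \;=\; \begin{array}{c|c} c & A\\ \hline & b^\top \end{array}$$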

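RK methods can be explicit or implicit. What order can an explicit method with $s$ stages attain? For $s \le 4$ there are explicit methods of order $s$. The order can never exceed $s$, and for orders above 4 more stages than the order are required: e.g. with $s = 5$ the maximum order is still 4, while order 10 needs $13 \le s \le 17$ stages.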
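Example 8.2.6. Runge-Kutta method of order 2 (Heun's method):

$$K_1 = f(t_n, U_n), \qquad K_2 = f(t_{n+1}, U_n + hK_1), \qquad U_{n+1} = U_n + \frac{h}{2}(K_1 + K_2).$$

Example 8.2.7. Runge-Kutta method of order 2 (midpoint method):

$$K_1 = f(t_n, U_n), \qquad K_2 = f\Big(t_n + \frac{h}{2},\ U_n + \frac{h}{2}K_1\Big), \qquad U_{n+1} = U_n + hK_2.$$

Example 8.2.8. Runge-Kutta method of order 4:

$$K_1 = f(t_n, U_n),$$
$$K_2 = f\Big(t_n + \frac{h}{2},\ U_n + \frac{h}{2}K_1\Big),$$
$$K_3 = f\Big(t_n + \frac{h}{2},\ U_n + \frac{h}{2}K_2\Big),$$
$$K_4 = f(t_{n+1}, U_n + hK_3),$$
$$U_{n+1} = U_n + \frac{h}{6}(K_1 + 2K_2 + 2K_3 + K_4).$$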
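The three methods of Examples 8.2.6 to 8.2.8 can be compared with a few lines of MATLAB. The sketch below estimates their observed orders by halving $h$. The test problem $u' = -u/2$, $u(0) = 1$ and the step counts are assumptions made for illustration. For this linear problem Heun and midpoint produce identical values.

```matlab
% Heun (8.2.6), midpoint (8.2.7) and RK4 (8.2.8) on u' = -u/2, u(0) = 1.
% The test problem and the step counts are illustrative choices.
f = @(t,u) -u/2;  T = 1;  uT = exp(-T/2);
Ns = [10 20];  err = zeros(3,2);             % rows: Heun, midpoint, RK4
for i = 1:2
    N = Ns(i);  h = T/N;  UH = 1;  UM = 1;  UR = 1;
    for n = 0:N-1
        t = n*h;
        K1 = f(t,UH);  K2 = f(t+h, UH + h*K1);
        UH = UH + h/2*(K1 + K2);                        % Heun
        K1 = f(t,UM);  K2 = f(t+h/2, UM + h/2*K1);
        UM = UM + h*K2;                                 % midpoint
        K1 = f(t,UR);                K2 = f(t+h/2, UR + h/2*K1);
        K3 = f(t+h/2, UR + h/2*K2);  K4 = f(t+h,   UR + h*K3);
        UR = UR + h/6*(K1 + 2*K2 + 2*K3 + K4);          % RK4
    end
    err(:,i) = abs([UH; UM; UR] - uT);
end
order = log2(err(:,1)./err(:,2));   % observed orders
disp(order');
```

The observed orders should be close to 2, 2 and 4 respectively.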

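Example 8.2.9. The RK pair with error estimate used in MATLAB (function ode23):

$$K_1 = f(t_n, U_n)$$
$$K_2 = f\Big(t_n + \frac{h}{2},\ U_n + \frac{h}{2}K_1\Big)$$
$$K_3 = f\Big(t_n + \frac{3h}{4},\ U_n + \frac{3h}{4}K_2\Big)$$
$$t_{n+1} = t_n + h$$
$$U_{n+1} = U_n + \frac{h}{9}(2K_1 + 3K_2 + 4K_3)$$
$$K_4 = f(t_{n+1}, U_{n+1})$$
$$E_{n+1} = \frac{h}{72}(-5K_1 + 6K_2 + 8K_3 - 9K_4)$$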
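A single step of the pair in Example 8.2.9 can be written out directly. The sketch below is not the ode23 source and has no step-size control: it only evaluates the formulas once. The test problem $u' = -u/2$, the step `h = 0.1` and the variable names are illustrative assumptions.

```matlab
% One step of the embedded pair of Example 8.2.9 for u' = -u/2, u(0) = 1.
% Problem and step size are illustrative; no step-size control is attempted.
f = @(t,u) -u/2;  tn = 0;  Un = 1;  h = 0.1;
K1 = f(tn, Un);
K2 = f(tn + h/2,   Un + h/2*K1);
K3 = f(tn + 3*h/4, Un + 3*h/4*K2);
tn1 = tn + h;
Un1 = Un + h/9*(2*K1 + 3*K2 + 4*K3);           % third-order update
K4  = f(tn1, Un1);                             % can serve as K1 of the next step
En1 = h/72*(-5*K1 + 6*K2 + 8*K3 - 9*K4);       % local error estimate
fprintf('U = %.10f, true error = %.2e, estimate = %.2e\n', ...
        Un1, abs(Un1 - exp(-tn1/2)), abs(En1));
```

The estimate $E_{n+1}$ is the difference between the third-order update and an embedded second-order one, so it is normally larger than the true error of $U_{n+1}$. An adaptive solver compares it with a tolerance to accept or reject the step.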


, .

, ,
. Runge-Kutta
.
h, .
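For explicit RK methods with $s \le 4$ stages and order $s$, applied to the test equation $u' = \lambda u$, the product $h\lambda$ must lie in the stability region

$$S := \{z := h\lambda \;:\; |p_s(z)| \le 1\}, \qquad p_s(z) = \sum_{j=0}^{s} \frac{z^j}{j!}$$

(see footnote 6).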
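For real negative $\lambda$ the stability condition reduces to an interval on the negative real axis, which can be located numerically. The MATLAB sketch below scans a grid for the first point where $|p_s(z)| > 1$. The grid $[-4, 0]$ with spacing $10^{-4}$ is an assumption made for illustration.

```matlab
% Real stability interval of explicit RK methods with s stages and order s (s <= 4).
% The grid on the negative real axis is an illustrative choice.
z = -linspace(0, 4, 40001);                % spacing 1e-4, starting at z = 0
zmin = zeros(1, 4);
for s = 1:4
    p = zeros(size(z));
    for j = 0:s
        p = p + z.^j / factorial(j);       % p_s(z) = sum_{j=0}^s z^j/j!
    end
    zmin(s) = z(find(abs(p) > 1, 1) - 1);  % last grid point before |p_s| exceeds 1
end
disp(zmin);    % approximately -2, -2, -2.51, -2.79
```

For Euler ($s = 1$) this recovers the condition $h < 2/|\lambda|$. The interval grows only modestly with $s$, so explicit RK methods remain conditionally stable.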
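Richardson extrapolation
Suppose that a quantity computed with step $h$ admits an expansion of the form:

$$W(h) = W(0) + \alpha_1h^{p_1} + \cdots + \alpha_sh^{p_s} + O(h^{p_{s+1}})$$

for small $h$, where the $\alpha_j$ do not depend on $h$ or on $W(h)$, $p_1 < \cdots < p_s$ and $p_{s+1} > p_s$. The aim is to approximate $W(0)$ using only values of $W(h)$. From the values $W(\theta_jh)$ for several $\theta_j$ we construct a combination $\widetilde{W}(h)$ with

$$\widetilde{W}(h) = W(0) + O(h^{p_{s+1}}),$$

that is, an approximation of $W(0)$ with error $O(h^{p_{s+1}})$ instead of $O(h^{p_1})$.

Choose $s$ values $\theta_j \in (0, 1)$, $j = 1, \dots, s$, and compute $W(\theta_jh)$. Then

$$W(h) = W(0) + \alpha_1h^{p_1} + \cdots + \alpha_sh^{p_s} + O(h^{p_{s+1}})$$
$$W(\theta_1h) = W(0) + \alpha_1(\theta_1h)^{p_1} + \cdots + \alpha_s(\theta_1h)^{p_s} + O(h^{p_{s+1}})$$
$$\vdots$$
$$W(\theta_sh) = W(0) + \alpha_1(\theta_sh)^{p_1} + \cdots + \alpha_s(\theta_sh)^{p_s} + O(h^{p_{s+1}})$$

Eliminating $W(0)$,

$$W(h) - W(\theta_1h) = \alpha_1h^{p_1}(1 - \theta_1^{p_1}) + \cdots + \alpha_sh^{p_s}(1 - \theta_1^{p_s}) + O(h^{p_{s+1}})$$
$$\vdots$$
$$W(h) - W(\theta_sh) = \alpha_1h^{p_1}(1 - \theta_s^{p_1}) + \cdots + \alpha_sh^{p_s}(1 - \theta_s^{p_s}) + O(h^{p_{s+1}})$$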

6

http://www.scholarpedia.org/article/Runge-Kutta_methods

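Neglecting the $O(h^{p_{s+1}})$ terms, and given the $\theta_j$, this linear system is solved for $\alpha_1, \dots, \alpha_s$, which yields the (improved) approximation of $W(0)$:

$$\widetilde{W}(h) := W(h) - \alpha_1h^{p_1} - \cdots - \alpha_sh^{p_s}, \qquad \widetilde{W}(h) = W(0) + O(h^{p_{s+1}}).$$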

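Example 8.2.10. We apply Richardson extrapolation to Euler's method. Let $U(t_n + h)$ and $U(t_n + h/2)$ denote the Euler approximations of the same exact value $u(t_n)$ computed with steps $h$ and $h/2$:

$$U(t_n + h) = u(t_n) + \alpha_1h + O(h^2), \qquad U(t_n + h/2) = u(t_n) + \alpha_1\frac{h}{2} + O(h^2).$$

Subtracting,

$$U(t_n + h) - U(t_n + h/2) = \alpha_1\frac{h}{2} + O(h^2),$$

so that, up to terms $O(h^2)$ after multiplication by $h$,

$$\alpha_1 \approx \frac{U(t_n + h) - U(t_n + h/2)}{h/2}.$$

Substituting into the first expansion,

$$U(t_n + h) = u(t_n) + \frac{U(t_n + h) - U(t_n + h/2)}{h/2}\,h + O(h^2),$$

and therefore

$$u(t_n) = \widetilde{U}(t_n + h) + O(h^2), \qquad \widetilde{U}(t_n + h) := 2U(t_n + h/2) - U(t_n + h).$$

The extra cost is the computation of $U(t_n + h/2)$.

The same idea gives an estimate of the (global) error of Euler's method. Since

$$U_h(t) - U_{h/2}(t) = \alpha_1\frac{h}{2} + O(h^2), \qquad \alpha_1 \approx \frac{2}{h}(U_h - U_{h/2}),$$

the error of $U_{h/2}$ is

$$E_{h/2}(t) := U_{h/2}(t) - u(t) = \alpha_1\frac{h}{2} + O(h^2) \approx U_h - U_{h/2}.$$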


, h .
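Example 8.2.11. Consider

$$u'(t) = -u/2, \qquad u(0) = 1,$$

with solution $u(t) = \exp(-t/2)u(0)$. We apply forward Euler with $h = 1/10$ and $h = 1/20$, and at the points $t = 0.1, 0.2, \dots, 1$ we form $\widetilde{U}(t) := 2U_{h/2}(t) - U_h(t)$. The figure for Example 8.2.11 shows the errors of Euler with $h = 0.1$, of Euler with $h = 0.05$, and of Richardson extrapolation. The (vertical) axis is logarithmic.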

[Figure for Example 8.2.11: error ("sfalma", logarithmic axis) versus $t_k$ for Richardson extrapolation, forward Euler with h = 1/10, and forward Euler with h = 1/20.]
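The computation of Example 8.2.11 can be sketched in MATLAB as follows. The sketch assumes the decaying test problem $u' = -u/2$, $u(0) = 1$, and forms the combination $2U_{h/2} - U_h$ at $t = 0.1, \dots, 1$. Variable names are illustrative, and the errors are printed rather than plotted.

```matlab
% Forward Euler with h = 1/10 and h/2 = 1/20 for u' = -u/2, u(0) = 1,
% and the Richardson combination 2*U_{h/2} - U_h at t = 0.1, 0.2, ..., 1.
f = @(t,u) -u/2;  h = 0.1;  N = 10;
Uh = zeros(1,N);  Uh2 = zeros(1,N);
U = 1;  V = 1;
for k = 1:N
    U = U + h*f((k-1)*h, U);                    % one step of size h
    for m = 1:2                                 % two steps of size h/2
        V = V + h/2*f((k-1)*h + (m-1)*h/2, V);
    end
    Uh(k) = U;  Uh2(k) = V;
end
t  = h*(1:N);  u = exp(-t/2);
UR = 2*Uh2 - Uh;                                % extrapolated values
errs = [abs(u - Uh); abs(u - Uh2); abs(u - UR)];
disp(errs');      % columns: Euler h, Euler h/2, Richardson
```

At every output point the extrapolated value is more accurate than either Euler approximation, at the cost of running the method twice.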

8.2.5

- :

(diffusion equation).
(
).

$$\frac{\partial u}{\partial t} = \frac{\partial^2u}{\partial x^2}, \qquad x \in \Omega \subset \mathbb{R},\ t \in [0, t_{\max}]$$

.
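Time runs from 0 to $t_{\max}$. To determine $u(t, x)$ we also need a) boundary conditions and b) initial conditions at $t = 0$. Here $\Omega = [0, 2\pi]$ with $u(t, 0) = 0$, $u(t, 2\pi) = 0$, and

$$u(0, x) = \sin(x)$$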


. x 2 .
( t = 0)
u(0, x).
.
0.
( ).
The exact solution is $u(t, x) = e^{-t}\sin x$.

. ,

.

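We first discretize in $x$, with

$$h = \frac{2\pi}{n + 1}, \qquad \Omega_h = \{x_j = jh \mid j = 0, \dots, n + 1\}.$$

At the points of the grid, $u_{xx}(t, x)$ is approximated by the centered difference

$$\frac{u(t, x_{j-1}) + u(t, x_{j+1}) - 2u(t, x_j)}{h^2} \approx u_{xx}(t, x_j).$$

This gives $n$ ordinary differential equations

$$\frac{\partial}{\partial t}U(t, x_j) = \frac{U(t, x_{j-1}) - 2U(t, x_j) + U(t, x_{j+1})}{h^2}, \qquad t \in [0, t_{\max}],\ j = 1:n.$$

With $U(t) = [U(t, x_1), \dots, U(t, x_n)]^\top$ they are written

$$\frac{\partial}{\partial t}U(t) = -\frac{1}{h^2}AU(t), \qquad t \in [0, t_{\max}],$$

where $U_j(t) \approx U(t, x_j)$ and $A = \mathrm{trid}_n[-1, 2, -1]$.
We apply explicit Euler in time. With $t_k = k\Delta t$ for $k = 0:N$,

$$\frac{W^{(k+1)} - W^{(k)}}{\Delta t} = -\frac{1}{h^2}AW^{(k)}, \qquad k = 0:N,\ t_{\max} = N\Delta t.$$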

Figure 8.6: Solution of the diffusion equation with explicit Euler on $\Omega = [0, 2\pi]$, with $h = 2\pi/(n + 1)$ and $\Delta t = 1/N$, at $t = 0, 1.25, 2.50$. ReR denotes the relative error $\|u - W\|_2/\|u\|_2$. Left: $n = 20$, $N = 40$ (dt = 0.025; ReR = 0.0063 at t = 1.25, ReR = 0.0216 at t = 2.50). Right: $n = 40$, $N = 100$ (dt = 0.01; ReR = 0.0038 at t = 1.25, ReR = 0.0076 at t = 2.50). Horizontal axis: x.
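where $W^{(k)} \approx U(t_k)$ is the approximation of $U$. Therefore

$$W^{(k+1)} = W^{(k)} - \frac{1}{h^2}AW^{(k)}\Delta t = \Big(I - \frac{\Delta t}{h^2}A\Big)W^{(k)}, \qquad k = 0:N. \tag{8.6}$$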

: .
(k+1)

Wj

u(tk , xj ). ,
.
. , t, :
Input: N, n; h = 2π/(n + 1)
g = Δt/h²
W^(0) = [u(0, x_1), ..., u(0, x_n)]^T
for k = 1, ..., N
    W^(k) = W^(k−1) − g A W^(k−1)
end
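Figure 8.6 shows the results for two pairs $(\Delta t, h)$. As expected, the relative error ReR is smaller for the finer discretization: for $h = 2\pi/21$, $\Delta t = 1/40$ we obtain ReR $= 0.0216$ at $t = 2.5$, whereas for $h = 2\pi/41$, $\Delta t = 1/100$ we obtain ReR $= 0.0076$. Next we keep $\Delta t = 1/40$ and refine only the spatial grid, e.g. $h = 2\pi/61$. Figure 8.7 shows the result at $t = 0.425 = 17\Delta t$.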

, , W
. , 8.6 !

. .
. .
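As we saw, $W^{(k+1)} = W^{(k)} - gAW^{(k)}$ with $g = \Delta t/h^2$. Hence

$$W^{(k+1)} = (I - gA)W^{(k)} = \cdots = (I - gA)^{k+1}W^{(0)}, \qquad t_0 = 0.$$

Thus the approximation at $t_{k+1}$ is the product of $(I - gA)^{k+1}$ with $W^{(0)}$. Since $A$ is symmetric, $Q^\top AQ = \Lambda$ with $Q$ orthogonal and $\Lambda$ diagonal, so

$$W^{(k+1)} = QQ^\top(I - gA)^{k+1}QQ^\top W^{(0)} = Q(I - g\Lambda)^{k+1}Q^\top W^{(0)},$$

and therefore $\widehat{W}^{(k+1)} = (I - g\Lambda)^{k+1}\widehat{W}^{(0)}$, where $\widehat{W}^{(j)} = Q^\top W^{(j)}$. Each component of $\widehat{W}^{(k+1)}$ is multiplied by a factor of magnitude $|1 - g\lambda|^{k+1}$. If $|1 - g\lambda| > 1$ for some eigenvalue, that component grows without bound. Consequently, any perturbation of $W^{(0)}$ (e.g. from rounding errors) is amplified by $(I - gA)^{k+1}$. This contradicts the behaviour of the exact solution, since $u(t, x) = e^{-t}\sin x$ satisfies $|u(t, x)| = |e^{-t}\sin x| \le 1$ for $t \ge 0$. The matrix $A$ is symmetric, $A = A^\top$, and (by Gerschgorin's theorem, see Theorem A.3.1 in the Appendix) the eigenvalues $\lambda(A)$ of $A$ satisfy $0 < \lambda(A) < 4$. Stability requires

$$-1 \le 1 - \frac{\Delta t}{h^2}\lambda \le 1 \iff 0 \le \frac{\Delta t}{h^2}\lambda \le 2,$$

that is, $\Delta t \le 2h^2/\lambda_{\max}$, which is guaranteed when

$$\frac{\Delta t}{h^2} \le \frac{1}{2}.$$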

Figure 8.7: Solution on $\Omega = [0, 2\pi]$ with $h = 2\pi/61$ and $\Delta t = 1/40$ at $t = 0.425 = 17\Delta t$, shown together with the initial condition (t = 0) and the exact solution at t = 0.425. ReR denotes the relative error $\|u - W\|_2/\|u\|_2$. Left: explicit Euler (8.6), ReR = 0.3745. Right: implicit Euler (8.7), ReR = 0.0056.

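The restriction ties the time step to the spatial step $h$: refining the grid in space forces a much smaller $\Delta t$. For example, if $h$ is replaced by $h/2$, $\Delta t$ must be divided by 4. In Figure 8.6, with $g = \Delta t/h^2$, we had $g = (1/40)/(2\pi/21)^2 \approx 0.3$ and $g = (1/100)/(2\pi/41)^2 \approx 0.4$, whereas in Figure 8.7 $g = (1/40)/(2\pi/61)^2 \approx 2.4$.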

. t
W (k+1) W (k) ,

0 t = tmax tmax /t
,
.
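Applying backward Euler (8.5) instead gives

$$\frac{W^{(k+1)} - W^{(k)}}{\Delta t} = -\frac{1}{h^2}AW^{(k+1)}, \qquad k = 0:N,$$

that is,

$$W^{(k+1)} + \frac{\Delta t}{h^2}AW^{(k+1)} = W^{(k)}, \qquad k = 0:N,$$
$$\Big(I + \frac{\Delta t}{h^2}A\Big)W^{(k+1)} = W^{(k)}, \qquad k = 0:N. \tag{8.7}$$

Here $W^{(k+1)}$ is defined implicitly, as the solution of a linear system. The algorithm is: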

Input: N, n; h = 2π/(n + 1)
g = Δt/h²
W^(0) = [u(0, x_1), ..., u(0, x_n)]^T
for k = 1, ..., N
    W^(k) = (I + gA)^(−1) W^(k−1)
end
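The results are shown in Figure 8.7. To study stability we proceed as before, now with $(I + gA)^{-1}$ in place of $(I - gA)$:

$$\widehat{W}^{(k+1)} = (I + g\Lambda)^{-(k+1)}\widehat{W}^{(0)}, \qquad \widehat{W}^{(j)} = Q^\top W^{(j)}.$$

Since $g > 0$ and $A$ is positive definite, $(1 + g\lambda)^{-1} < 1$, so the method is stable for every $\Delta t$.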
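The two time-stepping schemes (8.6) and (8.7) can be compared with the MATLAB sketch below, written for the setting described for Figure 8.7 ($h = 2\pi/61$, $\Delta t = 1/40$, 17 steps). The variable names and the use of sparse storage are assumptions. The explicit result is driven by amplified rounding errors, so its relative error may differ between machines.

```matlab
% Explicit (8.6) and implicit (8.7) Euler for the diffusion problem on [0, 2*pi],
% with n = 60 interior points, dt = 1/40 and 17 steps.
n = 60;  h = 2*pi/(n+1);  dt = 1/40;  K = 17;
x = (1:n)'*h;
e = ones(n,1);
A = spdiags([-e 2*e -e], -1:1, n, n);      % A = trid_n[-1, 2, -1]
g = dt/h^2;                                % here g > 1/2
We = sin(x);  Wi = sin(x);                 % W^(0) = [u(0,x_1), ..., u(0,x_n)]'
M = speye(n) + g*A;
for k = 1:K
    We = We - g*(A*We);                    % explicit step (8.6)
    Wi = M \ Wi;                           % implicit step (8.7): tridiagonal solve
end
u = exp(-K*dt)*sin(x);                     % exact solution at t = K*dt
ReR_explicit = norm(u - We)/norm(u);
ReR_implicit = norm(u - Wi)/norm(u);
fprintf('g = %.2f, ReR explicit = %.4f, ReR implicit = %.4f\n', ...
        g, ReR_explicit, ReR_implicit);
```

Because $M$ does not change with $k$, it could be factored once before the loop. The backslash solve is kept here for brevity.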

Euler t
Euler. .
.
, .. ut = (uxx + uyy ), R2 .
(.. n2 n2 )
.

.
[1]. [3] Euler [2]

.

. : MATLAB

Euler. ode23
.
.
:
(.. ),
, , , ,
.

http://www.cs.purdue.edu/research/cse/pses


. MATLAB
PDE Toolbox 2 .

8.3
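8.3.1 (04). Consider on $(0, 1)$ the boundary value problem

$$-\frac{d^2u}{dx^2} + b\frac{du}{dx} + cu = x, \quad b, c > 0, \qquad u(0) = 0,\ u(1) = 1,$$

and the grid $x_0 = 0$, $x_1 = 1/(n + 1)$, ..., $x_n = n/(n + 1)$, $x_{n+1} = 1$. Discretize it with finite differences, write the resulting linear system $AU = F$ with $A \in \mathbb{R}^{n\times n}$, and examine whether $A = A^\top$.

Solution. We use centered differences. At each $x_i$:

$$\frac{d^2}{dx^2}u(x_i) \approx \frac{u(x_{i-1}) - 2u(x_i) + u(x_{i+1})}{h^2}, \qquad \frac{d}{dx}u(x_i) \approx \frac{u(x_{i+1}) - u(x_{i-1})}{2h},$$

where $h = 1/(n + 1)$. With $U_i \approx u(x_i)$ the equations are:

$$\Big(-\frac{1}{h^2} - \frac{b}{2h}\Big)U_{i-1} + \Big(\frac{2}{h^2} + c\Big)U_i + \Big(-\frac{1}{h^2} + \frac{b}{2h}\Big)U_{i+1} = x_i.$$

Hence $AU = F$ with $A = \mathrm{trid}\big[-\frac{1}{h^2} - \frac{b}{2h},\ \frac{2}{h^2} + c,\ -\frac{1}{h^2} + \frac{b}{2h}\big]$, which is symmetric only if $b = 0$.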
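The system of Exercise 8.3.1 can be assembled and solved in a few lines of MATLAB. The sketch assumes the sign convention $-u'' + bu' + cu = x$. The values `n = 9`, `b = 1`, `c = 2` are illustrative. The boundary value $u(1) = 1$ is moved to the last entry of the right-hand side.

```matlab
% Centered-difference system A*U = F for -u'' + b*u' + c*u = x on (0,1),
% u(0) = 0, u(1) = 1. The values of n, b, c are illustrative.
n = 9;  b = 1;  c = 2;  h = 1/(n+1);
x = (1:n)'*h;
lo = -1/h^2 - b/(2*h);                 % coefficient of U_{i-1}
di =  2/h^2 + c;                       % coefficient of U_i
up = -1/h^2 + b/(2*h);                 % coefficient of U_{i+1}
A = diag(di*ones(n,1)) + diag(lo*ones(n-1,1), -1) + diag(up*ones(n-1,1), 1);
F = x;
F(n) = F(n) - up*1;                    % boundary value u(1) = 1 goes to the right-hand side
U = A \ F;                             % u(0) = 0 contributes nothing to F(1)
fprintf('A symmetric: %d\n', isequal(A, A'));   % 0 unless b = 0
```

For large $n$ one would store $A$ as a sparse tridiagonal matrix. A dense matrix is used here only to keep the sketch short.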
8.3.2 (, , 03). : t
du/dt = f (t, u) Euler U (s+1) =
U (s) + tf (ts+1 , U (s+1) ) .

. Euler
t.
8.3.3 (, , 03). t Euler, U (s+1) = U (s) + tf (ts , U (s) ),
du/dt = f (t, u) , ,

.

. : Euler
.

8.4
8.4.1. f : R R
4 , . f (j) (x), j = 1 : 4
x [a, b].


1.
.
2. .

. (. 8).
8.4.2. u : R R
4
[0, 1]

$$-\frac{d^2u}{dx^2}(x) + 10u(x) = \sin\Big(\frac{\pi}{2}x\Big), \quad x \in [0, 1], \qquad u(0) = 0,\ u(1) = 1.$$
1. n

O(h2 ), h = 1/(n + 1)
.
2.
Cholesky.

. (. 8).
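8.4.3. Let $u : \mathbb{R} \to \mathbb{R}$ have 4 continuous derivatives, i.e. $u^{(j)}(x)$, $j = 1:4$, are continuous for $x \in [-1, 1]$, and let $u$ satisfy

$$u^{(2)}(x) - u(x) = 2x, \quad x \in (-1, 1), \qquad u(-1) = 0,\ u(1) = 1.$$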
, .
1.
n = 8
(1, 1) ( ).
2.

.

.
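As in Section 8.2.1. 1) At the interior points $x_i \in (-1, 1)$ we use the approximation

$$u^{(2)}(x_i) \approx \frac{U_{i+1} - 2U_i + U_{i-1}}{h^2},$$

where $U_i$ approximates $u(x_i)$. The differential equation becomes

$$\frac{U_{i+1} - 2U_i + U_{i-1}}{h^2} - U_i = 2x_i,$$
$$\frac{U_{i+1} - (2 + h^2)U_i + U_{i-1}}{h^2} = 2x_i,$$
$$-U_{i+1} + (2 + h^2)U_i - U_{i-1} = -2h^2x_i,$$

with $x_i = -1 + \frac{2}{9}i$, $i = 1:8$, and $h = 2/9$. (The indices $i = 0, 9$ correspond to the endpoints of $[-1, 1]$, where the boundary values are known.) We obtain the system $TU = Y$, where $T \in \mathbb{R}^{8\times 8}$ is the Toeplitz matrix

$$T = \mathrm{trid}[-1,\ 166/81,\ -1],$$

$U = [U_1, \dots, U_8]^\top$ and

$$Y = \frac{1}{729}[56, 40, 24, 8, -8, -24, -40, 673]^\top.$$

2) We must solve $TU = Y$. Since $T$ is tridiagonal, the system can be solved economically with an LU factorization. By Gershgorin's theorem (see the Appendix) every eigenvalue $\lambda_i$, $i = 1:8$, satisfies

$$\Big|\lambda - \frac{166}{81}\Big| \le 2 \quad\text{or}\quad \Big|\lambda - \frac{166}{81}\Big| \le 1.$$

Since $T = T^\top$, the eigenvalues are real, so

$$\lambda \ge \frac{166}{81} - 2 = \frac{4}{81} \quad\text{or}\quad \lambda \ge \frac{166}{81} - 1,$$

and in every case $\lambda > 0$. Hence $T$ is symmetric positive definite, and we can use the Cholesky factorization, or LU without pivoting.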
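The system of Exercise 8.4.3 is small enough to check directly. The MATLAB sketch below builds $T$ and $Y$, confirms the integer numerators of $729\,Y$, and solves with a Cholesky factor. Variable names are illustrative.

```matlab
% System T*U = Y of Exercise 8.4.3 with n = 8, h = 2/9.
n = 8;  h = 2/9;
x = -1 + h*(1:n)';
T = diag((2 + h^2)*ones(n,1)) - diag(ones(n-1,1), 1) - diag(ones(n-1,1), -1);
Y = -2*h^2*x;
Y(n) = Y(n) + 1;                   % boundary value u(1) = 1; u(-1) = 0 adds nothing
R = chol(T);                       % succeeds because T is symmetric positive definite
U = R \ (R' \ Y);                  % solve R'*R*U = Y
disp(round(729*Y)');               % 56 40 24 8 -8 -24 -40 673
disp(min(eig(T)) > 0);
```

`chol` returns an upper triangular $R$ with $T = R^\top R$, so the solve consists of one forward and one backward substitution.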
8.4.4 (02-makeup). Let $u : \mathbb{R} \to \mathbb{R}^2$ satisfy

$$\frac{du}{dt}(t) = Au(t), \quad A = [-1, 0.5;\ 0.5, -1] \in \mathbb{R}^{2\times 2}, \qquad u(0) = [1, 2]^\top.$$


1. Euler t = 0.1
T = t.
2. Euler t = 0.1
T = t.
3. , 4 , u(t) = [0.7982, 1.0214].

E =

ku(t) U (t)k
,
ku(t)k

E =

ku(t) U (t)k
ku(t)k

T = t, U (t) U (t) Euler .


,
t = 0.1 t 0.

Euler
Euler.
4.
Euler .
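8.4.5. Consider $\frac{du}{dt}(t) = -Au(t)$ with $A = [2, -1;\ -1, 2]$ and $u = [u_1(t), u_2(t)]^\top$, where $u_1$, $u_2$ have initial values $u_1(0) = 2$, $u_2(0) = 1$. a) Apply explicit Euler with $\Delta t = 0.5$ up to $T = 4.0$. b) Repeat with $\Delta t = 0.8$ up to $T = 4.0$. c) Compare with the behaviour of the exact solution as $t \to \infty$ and explain the results of (a) and (b). (Hint: use the eigenvalues of $A$.)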

Solution. a-b) As in Section 8.2.2, $U(t + \Delta t) = (I - A\Delta t)U(t)$. Starting from $U(0) = [2, 1]^\top$, we obtain the following values (left: $\Delta t = 0.5$; right: $\Delta t = 0.8$):

t        U_1      U_2       |   t        U_1       U_2
0        2.0      1.0       |   0        2.0000    1.0000
0.5      0.5      1.0       |   0.8000   -0.4000   1.0000
1.00     0.50     0.25      |   1.6000   1.0400    -0.9200
1.500    0.125    0.250     |   2.4000   -1.3600   1.3840
2.0000   0.1250   0.0625    |   3.2000   1.9232    -1.9184
2.5000   0.0313   0.0625    |   4.0000   -2.6886   2.6896
3.0000   0.0313   0.0156    |
3.5000   0.0078   0.0156    |
4.0000   0.0078   0.0039    |

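c) For $\Delta t = 0.5$ the values $U_1$, $U_2$ tend to 0, whereas for $\Delta t = 0.8$ they oscillate with growing magnitude. As $t \to \infty$ the exact solution tends to 0, so only the first computation is qualitatively correct (the second is unstable). To explain this, write $U(t + \Delta t) = (I - A\Delta t)U(t)$ and use $Q^\top AQ = \mathrm{diag}[3, 1]$ with $Q^\top Q = I$. Then $U(t + \Delta t) = QQ^\top(I - A\Delta t)QQ^\top U(t) = Q(I - \mathrm{diag}[3\Delta t, \Delta t])Q^\top U(t)$, so $U(t + \Delta t) = Q\,\mathrm{diag}[1 - 3\Delta t,\ 1 - \Delta t]\,Q^\top U(t)$. For $\Delta t = 0.5$, $\max(|1 - 3\Delta t|, |1 - \Delta t|) = 0.5 < 1$, while for $\Delta t = 0.8$, $\max(|1 - 3\Delta t|, |1 - \Delta t|) = 1.4 > 1$, so in the second case the computed values are amplified at every step.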
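The table and the amplification factors of Exercise 8.4.5 can be reproduced with the following MATLAB sketch (variable names are illustrative):

```matlab
% Explicit Euler U(t+dt) = (I - A*dt) U(t) for Exercise 8.4.5.
A = [2 -1; -1 2];  U0 = [2; 1];  T = 4;
dts = [0.5 0.8];
Uend = zeros(2, 2);  amp = zeros(1, 2);
for i = 1:2
    dt = dts(i);  U = U0;
    for k = 1:round(T/dt)
        U = U - dt*(A*U);                   % one Euler step
    end
    Uend(:, i) = U;                         % computed values at t = 4
    amp(i) = max(abs(1 - dt*eig(A)));       % max(|1 - 3*dt|, |1 - dt|)
end
disp([dts; Uend; amp]);   % rows: dt, U_1(4), U_2(4), amplification factor
```

The first column matches the last row of the $\Delta t = 0.5$ table (0.0078, 0.0039) and the second column matches the last row of the $\Delta t = 0.8$ table (-2.6886, 2.6896).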

8.5
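8.5.1. The error function

$$\mathrm{erf}(t) = \frac{2}{\sqrt{\pi}}\int_0^t e^{-\tau^2}\,d\tau$$

is the solution of the initial value problem

$$\frac{du}{dt} = \frac{2}{\sqrt{\pi}}e^{-t^2}, \qquad u(0) = 0.$$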


, ode23 MATLAB, .

[T,Y]=mysolver(odefile,Tspan,Y0),
input.
ode23 MATLAB odefile, Tspan,
Y0.
) Euler
[0,3] : h = 1/10, 1/20, 1/40.
, (
) (.. caption) . MATLAB erf

Euler.

Euler .
) () Taylor
.
Euler.
) Euler
( )

t . = 104 . ;
) hmax , hmin
. Euler


.
|Us (erf )(ts )|
1) Euler hmin 2) hmax 3)
Euler
= 104 , 4) Euler h = 1/(n + 1)
n ().
)
Taylor ().

[1] A. Iserles. Introduction to Numerical Methods for Differential Equations. Cambridge University Press, Cambridge, 1996.
[2] . and . . . , , 1995.
[3] .. and .. . . , 1997.
[4] N. Wiener. Invention: The Care and Feeding of Ideas. MIT Press, Cambridge,
Massachusetts, 1954. Published by MIT Press in 1993.



.1
.1.1
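Definition A.1.1. A norm of a vector $x \in \mathbb{C}^n$, denoted $\|x\|$, is a function $\mathbb{C}^n \to \mathbb{R}$ with the following properties:
1. $\|x\| > 0$ for $x \ne 0$, and $\|x\| = 0$ for $x = 0$ (positivity)
2. $\|x + y\| \le \|x\| + \|y\|$ (triangle inequality)
3. $\|\alpha x\| = |\alpha|\,\|x\|$ for $\alpha \in \mathbb{R}$ (homogeneity).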

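The most common norms are:
$l_1$: $\|x\|_1 = \sum_{j=1}^{n}|x_j|$
$l_2$: $\|x\|_2 = \big(\sum_{j=1}^{n}|x_j|^2\big)^{1/2}$
$l_\infty$: $\|x\|_\infty = \max_{1\le j\le n}\{|x_j|\}$.

These are special cases of the p-norm ($l_p$)

$$\|x\|_p = \Big(\sum_{j=1}^{n}|x_j|^p\Big)^{1/p}.$$

The $l_2$ norm is connected with the inner product $x^*x$. Some useful relations:
$x^*x = \|x\|_2^2$.
$|x^*y| \le \|x\|_2\|y\|_2$.
$\|x - y\| \ge |\,\|x\| - \|y\|\,|$.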

.1:

1
2

1
1
1
1

1
1

n
1

,
. , ,
.
, .. Rn .
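Definition A.1.2. Let $U$ be a vector space and $\|\cdot\|$, $\|\cdot\|'$ two norms on it. The norms are equivalent if there are positive constants $\gamma_1$, $\gamma_2$ such that

$$\gamma_1\|u\| \le \|u\|' \le \gamma_2\|u\| \qquad\text{for every } u \in U.$$

Theorem A.1.1. All norms on $\mathbb{R}^n$ (or $\mathbb{C}^n$) are equivalent.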


, , ,
. , l2
un ,
. 1 , 2 .1. ..

2, 1, 1, n

kxk2 kxk1 , kxk2 kxk2 , kxk2

nkxk

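Let $\|\cdot\|$ be a vector norm. Its dual norm, $\|\cdot\|_D$, is defined by:

$$\|x\|_D = \max_{z \ne 0}\frac{|z^*x|}{\|z\|}.$$

The dual of the p-norm is the q-norm, where $p^{-1} + q^{-1} = 1$. It holds that $|x^*y| \le \|x\|\,\|y\|_D$. Moreover, for every $x$ there is a vector $x_D$, dual to $x$, such that

$$x_D^*x = \|x_D\|_D\|x\| = 1.$$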

.1.2
. :

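Definition A.1.3. A function $\|\cdot\| : \mathbb{R}^{n\times n} \to \mathbb{R}$ that satisfies the properties of Definition A.1.1 and, in addition, $\|AB\| \le \|A\|\,\|B\|$, is called a matrix norm.

Definition A.1.4. Given a vector norm $\|\cdot\|$, the function $\|\cdot\|_M : \mathbb{R}^{n\times n} \to \mathbb{R}$ defined by

$$\|A\|_M = \sup_{x \in \mathbb{R}^n,\ x \ne 0}\frac{\|Ax\|}{\|x\|}$$

is called the induced matrix norm. Equivalently,

$$\|A\|_M = \sup_{x \in \mathbb{R}^n,\ \|x\| = 1}\|Ax\|.$$

In particular, for $A \in \mathbb{R}^{n\times n}$:
$\|A\|_1 = \max_{j=1:n}\sum_{i=1}^{n}|\alpha_{i,j}|$ (maximum column sum).
$\|A\|_2 = \sqrt{\lambda_{\max}(A^*A)}$, where $\lambda_{\max}(\cdot)$ denotes the largest eigenvalue (spectral norm).
$\|A\|_\infty = \max_{i=1:n}\sum_{j=1}^{n}|\alpha_{i,j}|$ (maximum row sum).

The Frobenius norm (which is not an induced norm) of $A \in \mathbb{R}^{m\times n}$ is:

$$\|A\|_F = \Big(\sum_{i=1}^{m}\sum_{j=1}^{n}|\alpha_{ij}|^2\Big)^{1/2}.$$

The 2-norm and the Frobenius norm are invariant under orthogonal (unitary) transformations: if $X^*X = I$ and $Y^*Y = I$, then $\|XAY\| = \|A\|$.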
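These formulas can be compared with MATLAB's built-in `norm` on a small example. The test matrix and variable names below are arbitrary illustrative choices.

```matlab
% Matrix norms of a small example compared with their defining formulas.
A = [1 -2; 3 4];                            % arbitrary test matrix
n1   = max(sum(abs(A), 1));                 % maximum absolute column sum
ninf = max(sum(abs(A), 2));                 % maximum absolute row sum
n2   = sqrt(max(eig(A'*A)));                % square root of lambda_max(A'*A)
nF   = sqrt(sum(abs(A(:)).^2));             % Frobenius norm
disp([n1 norm(A,1); ninf norm(A,inf); n2 norm(A,2); nF norm(A,'fro')]);
[Q, ~] = qr(randn(2));                      % a random orthogonal matrix
disp(abs(norm(Q*A, 2) - norm(A, 2)));       % invariance of the 2-norm (rounding level)
```

Each row of the first display shows the value from the formula next to the value returned by `norm`. The two columns should agree to rounding error.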

.2 Lipschitz

.
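Definition A.2.1. A function $g : \mathbb{R}^n \to \mathbb{R}^m$ is differentiable at $x \in \mathbb{R}^n$ if each component $g_i$, $i = 1, \dots, m$, of $g(x) = [g_1(x), \dots, g_m(x)]^\top$ is differentiable at $x$. The derivative of $g$ at $x$ is formed from the partial derivatives of $g$ at $x$,

$$\frac{\partial g_i}{\partial x_j}, \qquad i = 1:m,\ j = 1:n,$$

that is, $g'(x) \in \mathbb{R}^{m\times n}$ with $g'(x)_{ij} = \frac{\partial g_i}{\partial x_j}(x)$. This matrix is the Jacobian, $g'(x) = J(x)$.

Remark. Under Definition A.2.1, $g(x + h) \approx g(x) + J(x)h$.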


:
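Definition A.2.2. Let $m, n > 0$, $g : \mathbb{R}^n \to \mathbb{R}^m$, $x \in \mathbb{R}^n$, and let $\|\cdot\|$ be a norm on $\mathbb{R}^n$ and $\|\cdot\|'$ a norm on $\mathbb{R}^m$. The function $g$ is called Lipschitz continuous at $x$ if there are an open set $D \subset \mathbb{R}^n$ with $x \in D$ and a constant $\gamma(g, D)$ such that for every $z \in D$

$$\|g(z) - g(x)\|' \le \gamma(g, D)\|z - x\|.$$

The constant $\gamma(g, D)$ is called a Lipschitz constant of $g$ at $x$, and we say that $g$ is Lipschitz at $x$ in $D$.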

x Rn , g Lipschitz
I . Lipschitz
.

.3

. ,
A Rmn rank(A) A.
A A.
.
5.2.1. , A Rmn ()
A ().
rank(A).
null(A) A, .
u Au = 0.
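Theorem A.3.1. Let $A \in \mathbb{R}^{m\times n}$, $B \in \mathbb{R}^{n\times k}$. Then:
1. $\mathrm{rank}(A) = \mathrm{rank}(A^\top)$.
2. $\mathrm{rank}(A) + \mathrm{null}(A) = n$.
3. $\mathrm{rank}(A) + \mathrm{rank}(B) - n \le \mathrm{rank}(AB) \le \min(\mathrm{rank}(A), \mathrm{rank}(B))$.
4. $\mathrm{rank}(A + B) \le \mathrm{rank}(A) + \mathrm{rank}(B)$ (for matrices of equal size).
5. $\mathrm{rank}(A) = r$ if and only if there are $X \in \mathbb{R}^{m\times r}$, $Y \in \mathbb{R}^{r\times n}$ and a nonsingular $C \in \mathbb{R}^{r\times r}$ such that $A = XCY$.

Trace. The trace of $A \in \mathbb{R}^{n\times n}$ is the sum of its diagonal elements, i.e. $\mathrm{trace}(A) = \sum_{i=1}^{n}\alpha_{i,i}$.
A symmetric matrix $A \in \mathbb{R}^{n\times n}$ is positive definite if $x^\top Ax > 0$ for every $x \ne 0$.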


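Gershgorin's theorem (discs).
Theorem A.3.1. The eigenvalues of $A \in \mathbb{C}^{n\times n}$ lie in the union of the $n$ discs

$$|z - \alpha_{ii}| \le \sum_{j=1,\,j\ne i}^{n}|\alpha_{ij}|, \qquad i = 1:n.$$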

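Hadamard and Kronecker products.
Definition A.3.1. Let $A, B \in \mathbb{R}^{m\times n}$. The Hadamard product is the matrix $C := A \circ B$ with elements $\gamma_{ij} = \alpha_{ij}\beta_{ij}$.

In MATLAB the Hadamard product is computed with C = A.* B.

Definition A.3.2. Let $A \in \mathbb{R}^{m_1\times n_1}$, $B \in \mathbb{R}^{m_2\times n_2}$. The Kronecker product $C := A \otimes B \in \mathbb{R}^{m_1m_2\times n_1n_2}$ is the block matrix

$$C = \begin{pmatrix}\alpha_{11}B & \alpha_{12}B & \cdots & \alpha_{1n_1}B\\ \alpha_{21}B & \ddots & & \vdots\\ \vdots & & \ddots & \vdots\\ \alpha_{m_11}B & \cdots & \cdots & \alpha_{m_1n_1}B\end{pmatrix}.$$

In MATLAB the Kronecker product is computed with C = kron(A,B).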
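A small MATLAB example of both products follows. The matrices are arbitrary illustrative choices.

```matlab
% Hadamard and Kronecker products of two small matrices.
A = [1 2; 3 4];  B = [0 1; 1 0];            % illustrative matrices
C = A .* B;                                 % Hadamard product: [0 2; 3 0]
K = kron(A, B);                             % Kronecker product, 4-by-4
disp(C);  disp(K);
disp(isequal(K(1:2,3:4), A(1,2)*B));        % block (1,2) of K equals alpha_12 * B
```

The Hadamard product requires matrices of equal size, whereas the Kronecker product is defined for any sizes and multiplies the dimensions.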



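$$Ax = \lambda x,$$

where $A \in \mathbb{C}^{n\times n}$, $x \in \mathbb{C}^n$, $x \ne 0$, $\lambda \in \mathbb{C}$, defines the eigenvalues $\lambda$ and eigenvectors $x$ of $A$. The relation $Ax = \lambda x$ has a nonzero solution when $\det(A - \lambda I) = 0$. The determinant $\det(A - \lambda I)$ is a polynomial of degree $n$ with $n$ roots:

$$p(\lambda) := \det(A - \lambda I) = (\lambda_1 - \lambda)\cdots(\lambda_n - \lambda).$$

The set of eigenvalues of $A$ is denoted $\lambda(A) := \{\lambda_1, \dots, \lambda_n\}$.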
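Singular value decomposition. Let $A \in \mathbb{R}^{m\times n}$. There are $U \in \mathbb{R}^{m\times m}$, a diagonal $\Sigma \in \mathbb{R}^{m\times n}$ and $V \in \mathbb{R}^{n\times n}$ such that

$$A = U\Sigma V^\top, \qquad U^\top U = I,\ V^\top V = I.$$

The diagonal elements $\sigma_j$ of $\Sigma$ are the singular values of $A$. The columns of $U$ ($V$) are the left (right) singular vectors. Some basic properties:
Theorem A.3.2.
1. The eigenvalues of $A^\top A$ are the squares of the singular values of $A$, i.e. $\lambda_j(A^\top A) = \sigma_j^2$.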
Figure A.1: The vectors $Ax$ for $x \in \mathbb{R}^2$ with $\|x\|_2 = 1$. Left: matrix of rank 2. Right: matrix of rank 1.
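2. $\|A\|_2 = \sigma_{\max}$. If $A$ is invertible, then $\|A^{-1}\|_2^{-1} = \sigma_{\min}$.
3. If $\sigma_1 \ge \cdots \ge \sigma_r > \sigma_{r+1} = \cdots = \sigma_n = 0$, then $A$ has rank $r$. The null space of $A$ ($\mathrm{null}(A)$) is spanned by $V(:, r + 1 : n)$. The range of $A$ ($\mathrm{range}(A)$) is spanned by $U(:, 1 : r)$.
4. $A = \sum_{j=1}^{n}\sigma_ju_jv_j^\top$.
5. Let $S^{n-1} = \{x \mid \|x\| = 1\}$. The set $AS^{n-1}$ is a hyperellipsoid centered at 0 with semi-axes $\sigma_ju_j$.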


. ,
in MATLAB

> A = rand(2); s = svd(A)

gives

s = 0.8703, 0.2764

With the MATLAB commands

> x = randn(2,1); x = x/norm(x); y = A*x; plot(y(1),y(2),'+');
> hold on
> for j=1:1000
>   x = randn(2,1); x = x/norm(x); y = A*x; plot(y(1),y(2),'+');
> end
> hold off

we obtain the left part of Figure A.1. Similarly,

> u = rand(2,1); v = rand(2,1); A = u*v'; s = svd(A)

gives

s = 0.2541, 0
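Some of the properties in Theorem A.3.2 can be verified numerically. The MATLAB sketch below uses random test matrices, so the printed numbers vary from run to run. The sizes and variable names are illustrative assumptions.

```matlab
% Numerical check of basic SVD properties on random test matrices.
A = rand(3, 2);
[U, S, V] = svd(A);
s = diag(S);                                % singular values, in decreasing order
disp(norm(A - U*S*V'));                     % reconstruction error (rounding level)
disp(abs(norm(A, 2) - s(1)));               % ||A||_2 equals sigma_max
disp(norm(sort(eig(A'*A), 'descend') - s.^2));   % eigenvalues of A'*A are sigma_j^2
u = rand(3,1);  v = rand(2,1);
s1 = svd(u*v');                             % rank-one matrix: one nonzero singular value
disp(s1');
```

For the rank-one matrix $uv^\top$ the single nonzero singular value equals $\|u\|_2\|v\|_2$, and the second is zero up to rounding.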


Figure A.2: The vectors $Ax$ for $x \in \mathbb{R}^2$ with $\|x\|_2 = 1$, for a matrix with singular values $\sigma_1 = 1$ and $\sigma_2 = 0.01$.
. , .
.1,
Ax , u1 , .
Ax
, b Ax = b. ,
, b
u1 . , .
. ,
A 1, 0.01
.2. 100
( ),
- - .
max
min


. ,

.
