Scientific Computations
2008
c
2008,
. ,
.
. I
/ .
.
http://scgroup.hpclab.ceid.upatras.gr/class/sc.html.
( , , , ,
, on-line , , , .)
.
. .
. ( RISC SVD).
, .. cos 1 .
2 :
1 . , . , . . (. : 1992),
2
(. : ,
1976).
matrix
.
, , ,
( ) array
table. , polyval.m MATLAB
POLYVALM Matrix polynomial evaluation. If V is a vector whose elements are the coefficients of a polynomial, then POLYVALM(V,X) is the
value of the polynomial evaluated with matrix argument X. See POLYVAL for
1
, .
- Strang .
2
.
4
polynomial evaluation in the regular or array sense.
, Mathematica, ,
.
3 . , ,
. 4 .
,
/. ,
, : )
110 (2 .): , ) 240 (4 .):
. 261 (3
.) ( ) 205 ( ).
, , () ()
G. Strang, / (1996)
.. .. , /
(1997).
( ) MATLAB ( version 7).
, .
, ,
Scilab
MATLAB , ( http://www.scilab.org/)!
: 1) G. Golub and C. F. Van Loan.
Matrix Computations. The Johns Hopkins University Press, Baltimore, third
edition, 1996. 2) N.J. Higham. Accuracy and Stability of Numerical Algorithms.
SIAM, Philadelphia, 2002, 2nd. ed. C.W. Ueberhuber.
Numerical Computation, volumes 1 and 2. Springer, Berlin, 1997.
.
,
. .
.
. ,
. (
).
.
3
, matrix,
, 2 (. xxxiv).
4
. 1xvii 2 .
5
, , , , , .
.
().
.
. , , , , . , , , , , , ,
, .
, ,
, , , ,
.
LaTEX .
.
. ,
. , , ,
. , .
.
2008
1
1.1 . . . . . . . . . . . . . . . . . . . . . . . .
1.2 . . . . . . . . . . . . . . . . . . . . . . . . .
1.3 . . . . . . . . . . . . . . . . . . . . . . . . .
1.4 . . . . . . . . . . . . . . . . . .
1.4.1 . . . . . . . . . . . . . .
1.4.2 , ,
. . . . . . . . . . . . . . . . . . . . . . . .
1.5 . . . . . . . . . . . . . . . . . . .
1.6 . . . . . . . . . . . . . . . .
1.7 . . . . . . . . . . . . . . . . . . . . . . . .
1.8 . . . . . . . . . . .
2
2.1
2.1.1 . . . .
2.2 . . . . . . . . . . . . . . . . . .
2.2.1 . . .
2.3 . . . . . . . . . . . . . . .
2.4 . . . . . . . . . . . . . . . . . . . . . . .
2.5 . . . . . . . . . .
3
3.1 . . . . . . . . . . . . . . . . . .
3.2 . . . . . . . . . . .
3.2.1 , . . . . . . .
3.2.2 bit . . . . . . . . . . . .
3.2.3 ... . . . . . . . . . . . . .
3.2.4 . . . . . . . . . . . . . . . . .
3.3 .
3.4 . . .
3.4.1 . . . . . . . . . .
3.4.2
3.4.3 . . . . . . . . .
3.4.4 Fused Multiply and Add (FMA) . . . . . . . .
3.4.5 Java . . . . . . . . . . . . . . . . . . .
3.5 . . . . . . . .
3.6 . .
3.6.1 . . . . .
3.6.2 . . .
3.7 . . . . . .
3.8 . . . . .
3.9 . . . . . . . . . . . . .
3.10
4
4.1 . . . . . . . . . . . . . . . . . . . . . . . . .
4.1.1 . . . . . . . . . . . . . . . . . . .
4.1.2
4.2
4.2.1 . . . . . . . . .
4.2.2 . . . . . . . . .
4.2.3 . . .
4.2.4 - : .
4.2.5 . . . . .
4.2.6 BLAS . . . . . . . . . . . . . . . . . . . . . .
4.3 . . . . . . . . .
4.4 . . . . . . . . . . . . . . . . .
4.5 . . . . . . . . . . . . . . . . . . . . . . . .
4.6 . . . . . . . . . . . . . . .
4.7 . . . . . . . . . . . . . . . . . . . . . . .
5 II
5.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5.2 . . . . . . . . . . . . . . . . . . . . . . .
5.3 . . . . . . . . . . . . . . . . . . . . . . . . . .
5.4 . . . . . . . . . . . .
5.4.1
5.4.2 . . . . . . . . . . . . .
5.5 . . . . . . . . . . . . . . . . . . . . . . . .
5.6 . . . . . . . . . . . . . . . . . . . . . . . .
5.6.1 . . . . . . . . . . . . .
5.7 . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5.7.1 . . . . . . .
5.8 .1 . . . . . . . . . . . . . . . . . . . .
5.9 . . . . . . . . . . . . . .
5.9.1 Cholesky . . . . . . . . . . . . . . .
5.10 . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5.11 . . . . . . . . . . . . . . . . . . . . . .
5.12 . . . . . . . . . . . . . . . . . .
5.13 . . . . . . . . . . . . . . . . . . .
5.14 . . . . . . . . . . . . . . . . . . . . . . . . . . .
6
6.1 QR . . . . . . .
6.2 . . . . . . . .
6.2.1 . . . . . . . . . . . .
6.2.2 Gram-Schmidt . .
III
7 IV
7.1 / . . . . . . . . . . . . . . . . .
7.2 . . . . . . . . . . . . . . . .
7.2.1 : . .
7.2.2 Vandermonde . . . . . . . . . . . . .
7.2.3 Toeplitz . . . . . . . . . . . . . . . .
7.2.4 Toeplitz . . . . .
7.2.5 . . . . . . . . . . . . . . .
7.3 .
7.4 . . . . . . . . . . . . . . . . . . . . . .
8
8.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
8.1.1 . . . . . . . . . . . . . . . . . . . . . . . . . . .
8.2 . . . . . . . . . . . .
8.2.1 . . . . . . . . . .
8.2.2 . . . . . . . . . . . .
8.2.3 Euler . . . . . . . . . . . . . . . . . . . . . . . . .
8.2.4 Taylor, Runge-Kutta Richardson .
8.2.5 - :
8.3 . . . . . . . . . . . . . . . . . . . . .
8.4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
8.5 . . . . . . . . . . . . . . . .
.1 . . . . . . . . . . . . . . . .
.1.1 . . . . . .
.1.2 . . . . . . . .
.2
.3 .
6.3
6.4
6.5
6.6
6.2.3 GS . . . . . . .
6.2.4 Householder
6.2.5 . . . . . .
QR: Householder . . .
6.3.1 QR .
6.3.2 QR Householder . . . . .
6.3.3 .2 QR . . . . . . . . . . . .
6.3.4 . . . . . .
6.3.5 . . . . . . . . . . . . . . . . . . .
Givens . . . . . . . . . . . . . . . . .
6.4.1 . . . . . . . . . .
6.4.2 Givens . . . . . . .
6.4.3 . . . . . . . . . . . . . . . . . . .
6.4.4 QR Givens . . . . . . . . . . . . . . . .
6.4.5 . . . . . . . . . . . . . . .
. . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . .
. . . . . .
. . . . . .
. . . . . .
Lipschitz
. . . . . .
1.1
,
1
.
:
1980
,
. Future Directions in Computational
Mathematics, Algorithms and Scientific Software. : The use of modern computers in scientific and engineering research
and development over the last three decades has led to the inescapable
conclusion that a third branch of scientific methodology has been created.
It is now widely acknowledged that, along with the traditional theoretical
and experimental methodologies, advanced work in all areas of science
and technology has come to rely critically on the computational approach. ( [33]).
It is becoming clear that dramatic increases in computing power are necessary but insufficient to making high-performance computing a reality.
Necessary is also the construction of a large body of applications capable
of using that computational power effectively ( [3] Alpern
Carter2 .)
It is essential to recognize the fact that computer experiments can both
be a two-way bridge between Physical Experiments and Mathematical
Models, as well as an independent source of physical understanding.
Such experiments have a mind-bending potential for future explorations
of nature's secrets, which is only vaguely recognized today. (
1
, . 2.
Bowen Alpern Larry Carter Computer Scientists IBM Yorktown Heights. Carter University of California, San Diego (UCSD).
2
Jackson3 [23]).
: , ,
, , . , . ,
, ,
.
. ,
:
.
( Michel Serres4 [35])
. ,
computational science and engineering . Computational Science and Engineering
[12]. Golub Ortega5
[14, . 2]:
Scientific computing is the collection of tools, techniques, and theories required to solve on a computer mathematical models of problems in science and engineering.
:
, ,
.
. Mathematical Modelling
[4, . 220].
(, , .)
.
.
3
Atlee Jackson Center for Complex Systems Research Beckmann Center University of Illinois at Urbana-Champaign. Santa Fe
Institute.
4
Michel Serres
5
Gene Golub Stanford James Ortega
University of Virginia.
Table 1.1: [26]
1 2 3 4 5 6 7 8 9 10
[table body: the row labels and the cell marks (*) are not recoverable from the extracted text]
1.
3.
5.
7.
9.
(FFT, )
(=multigrid)
Monte Carlo
2. . / . . .
4. .
6.
8.
10.
1.2
: 1) , 2) (, ), 3)
, 4)
.
1.1, [26],
.
.
[26] ( )
6
. ,
. 1) (restructuring compiler) 2)
..
.
1.
,
6
(= legacy) (= dusty-deck) .
2. .
1.1 .
,
(= computational kernels). ,
Fourier
.
Fourier ( FFT)
( Gauss ).
. (
)
.
.
1.2.1. () Fourier , -
-
Fourier .
7.
,
.
(
7 ) . [24, 34].
1.1 ,
. ,
.. ,
.
( ) .
- (=input-output tables).
8 .
(= derivatives) [6]. ,
7
- -
[11] .
www.cs.sandia.gov/tech_reports/ripryor/Aspen.html.
1.3
:
1.
2.
3.
.
, . John Rice (Purdue University)9 : What
is an Answer? : , , .
,
10 . , .
, . ,
.
1) , 2) , 3)
(..) , . 4) , ..
, , . ,
. 5)
.
,
.
,
.
.
() ...
9
HERMIS, 1996.
(particle methods).
10
. ,
.. Gauss Hotelling
[19], n ,
4n .
John von Neumann
11 .
[5]:
In the elimination method a series of n compound operations is
performed each of which depends on the preceding. An error at
any stage affects all succeeding results and may become greatly
magnified; this explains roughly why instability should be expected.
It should be noticed that at each step a division is performed by
a number whose size cannot be estimated in advance and which
might be so small that any error in it would be greatly magnified by
division...
, John Wilkinson, almost every statement in it is either wrong or
misleading. ,
.
von Neumann, Herman Goldstine,
, von Neumann ( Turing), ,
Gauss [13] . [17].
( Oscar Wilde)
.
,
, ..
.
:
(
!)
,
,
.
13
(. + . + .)
(vectorizing compilers)
BLAS1 BLAS3
FFT
11
O(1)
pipe
(1)
(. .)
(n/ log n)
1.2:
)
. 12 1.2.
.
. , ,
. ,
(.. ,
) 14 .
: .. RAM PRAM
.
(benchmarks) . Linpack benchmark ,
.
. ,
.
, ,
.
.
, ..
ACM, LAPACK, .. .
12
The Federal High Performance Computing Program
1989.
13
Alpern Carter performance programming [3].
14
Beresford Parlett [30].
,
, ,
.
. ,
,
, :
1. .
2.
.
3. (.. RAM)
.
1.3.1. Fourier
( $O(n^2)$ $O(n \log n)$ )
: n,
,
.
1.3.2.
Strassen, $O(n^{\log_2 7})$ $O(n^3)$,
. :
, Strassen
!
.
1.4
.
.
1.4.1
.
. ,
. :
http://www.ee.siue.edu/ mvinant/g_info/cpu_hist.htm#RISC.
2. (, , , /).
( ),
.
,
chip (single-chip processor), pins chip
single-chip
15 chip
. on-chip off-chip.
on-chip :
() (register files).
(instruction cache).
(data cache).
, on chip
. ,
.
( )
.
:
... the operation count is not necessarily an adequate figure
of merit in comparing theoretically the value of algorithms in
numerical analysis [ . . . ] Other factors, such as [ . . . ] the
pattern in which memory banks of the computer are referenced,
may be as important as the operation count in determining the
speed of a program... [18]
3. , .. (superscalar)
15
, (= clusters), (networks of workstations = NOW) (Grid)
.
. [16].
.
( ) ,
, ,
(..
RISC,
). , , .
, SIMD = Single Instruction Multiple Data) streaming Intel,
(GPU = graphics processing units).
1.4.2 , ,
. , , .
, , ( )
.
. ,
, , . ,
-
(= semantic gap)
. ,
,
. ( .. [7, 21])
() (= Problem Solving Environments) (. [21]).
ELLPACK [20] ,
, . ELLPACK
. ,
. , , Mathematica [36], Maple [2],
Matlab [1], Scilab [15]). -
.
(1924-2007) Fortran
BNF.
18
.
. . [32, 9].
17
1.5
1.1 (-)
,
$$A \leftarrow A + x y^{\top} \qquad (1.1)$$
A n x, y n.
:
1. Fortran ( ),
2.
3. .
) (1.1)
Unix dtime
user system ) () ...
(Mflop/s) n = 30 n = 800.
1.6
1.6.1.
;
. [, . 1.1] , ,
.
1.6.2. ; 3 .
. [, . 1.1] ( - - )
.
,
(.. , ..) : )
, ) Fourier, )
.
1.6.3.
;
. [, . 1.3]
) , )
, ) .
1.6.4. .
[Figure: two plots for n from 0 to 800. Top: time in sec versus n. Bottom: Mflop/s versus n.]
. [, . 1.3]
1) , 2) , 3) (..) ,
. 4) , ..
, , .
,
. 5)
.
1.6.5.
.
. [, . 1.3]
LAPACK
n. n 1000.
1.6.6. /
.
. [, . 1.4] ) RISC, )
, ) , .
1.7
Pn1
.
) Vandermonde :
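$$V = \begin{pmatrix}
1 & \zeta_1 & \zeta_1^2 & \cdots & \zeta_1^{n-1} \\
1 & \zeta_2 & \zeta_2^2 & \cdots & \zeta_2^{n-1} \\
1 & \zeta_3 & \zeta_3^2 & \cdots & \zeta_3^{n-1} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & \zeta_m & \zeta_m^2 & \cdots & \zeta_m^{n-1}
\end{pmatrix}$$
m $\zeta_1, \zeta_2, \dots, \zeta_m$ Vandermonde :
$$a = [\alpha_0, \alpha_1, \dots, \alpha_{n-1}]^{\top}$$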
) MATLAB
( MATLAB ). , a(i) i1 .
g = a(n)*ones(m, 1);
for i=n-1:-1:1,
g=g.*z + a(i)*ones(m, 1);
end
) Fourier x n
$$y(j) = \sum_{k=0}^{n-1} x(k)\, e^{-i 2\pi k j / n}, \qquad j = 0, \dots, n-1.$$
Vandermonde :
=
=
1
1
ei2(1)
...
ei2(2)
>
. . . ei2(n1)
>
V a
Fourier a, O(n log n).
Pn1
j
1.7.2. , , p(z) =
j=0 j x
n1 6= 0, V a
8n log n ..
lyes-mach, , lno-mach,
, . , lno-mach,
V a,
2n2 . (.. )
lno-mach.
1.8
1.8.1. Fourier x
n
$$y(j) = \sum_{k=0}^{n-1} x(k)\, e^{-i 2\pi k j / n}, \qquad j = 0, \dots, n-1.$$
Vandermonde x. )
Vandermonde
Fourier Fourier
. )
fft MATLAB : flops n = 128 : 4 : 512.
tic, toc, etime, cputime. ;
$O(n^2)$ $O(n \log n)$.
,
. .
.
) MATLAB Vandermonde.
function V=vand_fft(n)
for j=1:n, v(j, 1) = exp(-2*pi*sqrt(-1)*(j-1)/n); end
V=zeros(n);V(:, 1)=ones(n, 1);
for i=2:n,
V(:, i)=V(:, i-1).*v;
end
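The two ways of computing the discrete Fourier transform discussed here, the O(n^2) product with the Vandermonde matrix and an O(n log n) fast transform, can be compared in a short sketch. It assumes the sign convention of `vand_fft` (negative exponent) and a power-of-two length for the recursive version; the function names are illustrative, not part of the notes.

```python
import cmath


def dft_matrix(n):
    """Vandermonde matrix V with V[j][k] = w**(j*k), w = exp(-2*pi*i/n)."""
    w = cmath.exp(-2j * cmath.pi / n)
    return [[w ** (j * k) for k in range(n)] for j in range(n)]


def dft_by_matvec(x):
    """O(n^2) transform: y = V x."""
    V = dft_matrix(len(x))
    return [sum(v * xk for v, xk in zip(row, x)) for row in V]


def fft(x):
    """Recursive radix-2 FFT, O(n log n); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    y = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        y[k] = even[k] + t
        y[k + n // 2] = even[k] - t
    return y


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 0.0, -1.0, 2.5, 7.0]
    slow, fast = dft_by_matvec(x), fft(x)
    print(max(abs(a - b) for a, b in zip(slow, fast)))
```

Both routines return the same vector up to rounding; only the operation count differs, which is the point of the exercise.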
1.8.2. 1.3
( ).
.
MATLAB for-loops .
- .
.
profile
find.
n = 500 : 100 : 20000 m = 50 : 50 : 200,
n = 50 : 50 : 300 .
m, n.
.
.
. for-loops
( profile). 1.4.
- . 1.2
( ,
100 m, n).
MATLAB
.
tic;n=20000;rand('state',0);
figure;
for k=1:1:n,
  A(k)=k;
end
for k=1:1:n,
  B(k)=round(rand(1)*n);
end
for k=1:1:n,
  C(k)=A(k)+B(k);
end
for k=1:1:n,
  plot(B(k),C(k),'.r');
  hold on;
end
hold off;toc
tic;m=100;n=200;rand('state',0);
figure;
for j=1:1:n
  for i=1:1:m
    A(i,j)=rand(1);
  end
end
for i=1:1:m
  for j=1:1:n
    B(i,j)=rand(1);
  end
end
for i=1:1:m
  for j=1:1:n
    C(i,j) = A(i,j) + B(i,j);
  end
end
for i=1:1:m
  for j=1:1:n
    if C(i,j)>0.5,
      C(i,j)=1;
    elseif C(i,j)<0.5,
      C(i,j)=0;
    elseif C(i,j)==0,
      C(i,j)=-10;
    end
  end
end
for i=1:1:m,
  for j=1:1:n,
    if C(i,j)==-10,
      plot(i,j,'.k');
      hold on;
    elseif C(i,j)==0,
      plot(i,j,'.y');
      hold on;
    elseif C(i,j)==1,
      plot(i,j,'.m');
      hold on;
    end
  end
end
title('Given');hold off;toc
1.3: 1.8.2
tic;n=20000;rand('state',0);
figure;
A=(1:n);
B=round(rand(1, n)*n);
C=A+B;
plot(B,C,'.r');
toc

tic;m=100;n=200;rand('state',0);
figure;
A=rand(m, n);
B=rand(m, n);   % same size as A, so that A+B is defined
C=A+B;
C(find(C>0.5))=1;
C(find(C==0))=-10;
C(find(C<0.5 & C>0))=0;
[i, j]=find(C==-10);
plot(i,j,'.k');
hold on;
[i, j]=find(C==0);
plot(i,j,'.y');
hold on;
[i, j]=find(C==1);
plot(i,j,'.m');
hold on;
title('Given');hold off;toc
1.4: 1.8.2
[Plots: "Execution time for code 1", time (sec) versus n; execution time for code 2, time (sec) versus m and n.]
1.2:
( 1.8.2).
[1] MATLAB: The Language of Technical Computing. In http://www.mathworks.com/products/matlab/.
[2] http://www.maplesoft.com/, 2007.
[3] B. Alpern and L. Carter. Performance programming: A science waiting to
happen. In U. Vishkin, editor, Developing a Computer Science Agenda for
High-Performance Computing. ACM Press, New York, 1994.
[4] R. Aris. Mathematical Modelling Techniques. Dover, Mineola, NY, 1994
(originally published in 1974).
[5] V. Bargmann, D. Montgomery, and J. von Neumann. Solution of linear
systems of high order. In A.H. Taub, editor, John von Neumann Collected
Works, volume V. Pergamon, Oxford, UK, 1963.
[6] E. Barucci, L. Landi, and U. Cherubini. Computational methods in finance: Option pricing. IEEE Computational Science & Engineering Mag.,
pages 66–80, Spring 1996.
[7] R.F. Boisvert and E.N. Houstis, editors. Computational Science, Mathematics and Software. Purdue University Press, 1999.
[8] L. Carter. RISC from a performance programmer's perspective. Invited talk at RISC in 1995 Symposium, 1995. Available from URL
http://www-cse.ucsd.edu/users/carter/ppbib.html.
[9] L. DeRose, K. Gallivan, E. Gallopoulos, B. Marsolf, and D. Padua. FALCON:
A MATLAB Interactive Restructuring Compiler. In C.-H. Huang, et al.,
editor, Lecture Notes in Computer Science: Languages and Compilers for
Parallel Computing, pages 269–288. Springer-Verlag, New York, 1995.
[10] J. J. Dongarra, F. G. Gustavson, and A. Karp. Implementing linear algebra
algorithms for dense matrices on a vector pipeline machine. SIAM Rev.,
26(1):91–111, January 1984.
[11] From Quadnet. Economic modeling from the ground up. IEEE Parallel and
Distributed Technology Mag., page 80, Summer 1996.
[12] E. Gallopoulos and A.H. Sameh. CSE: Content and product. IEEE Computational Science & Engineering Mag., 4(2):39–43, 1997.
[13] H.H. Goldstine. The Computer from Pascal to von Neumann. Princeton
Univ. Press, Princeton, 5th edition, 1993.
[14] G. Golub and J.M. Ortega. Scientific Computing: An Introduction with
Parallel Computing. Academic Press, Inc., San Diego, CA, 1993.
[15] Scilab Group. Scilab home page, 2007. Online at http://www.scilab.org/index.php.
[17] N.J. Higham. Accuracy and Stability of Numerical Algorithms. SIAM, Philadelphia, 2nd edition, 2002.
[18] R. Hockney. Computers, compilers, and Poisson solvers. In U. Schumann,
editor, Computers, Fast Elliptic Solvers, and Applications: Proc. GAMM Workshop, 1977.
[19] H. Hotelling. Some new methods in matrix calculation. Ann. Math. Statist.,
14(1):1–34, 1943.
[20] E.N. Houstis, T.S. Papatheodorou, and J.R. Rice. Parallel ELLPACK: An
expert system for the parallel processing of partial differential equations.
In Intelligent Mathematical Software Systems, pages 63–73. North-Holland,
Amsterdam, 1990.
[21] E.N. Houstis, J.R. Rice, E. Gallopoulos, and R. Bramley, editors. Enabling
Technologies For Computational Science: Frameworks, Middleware, and
Environments. Kluwer, 2000.
[22] Grand Challenges: High Performance Computing and Communications. A
report by the committee on Physical, Mathematical, and Engineering Sciences. Office of Science and Technology Policy, 1991.
[23] E. A. Jackson. A first look at the second metamorphosis of science. Technical Report CCSR-95-1, Santa Fe Institute, 1995.
[24] W.J. Kaufmann III and L.L. Smarr. Supercomputing and the Transformation
of Science. Scientific American Library, New York, 1993.
[25] G. Kollias and E. Gallopoulos. Jylab: A system for portable scientific
computing over distributed platforms. In E-SCIENCE '06: Proceedings of
the Second IEEE International Conference on e-Science and Grid Computing,
page 97, Washington, DC, USA, 2006. IEEE Computer Society.
[26] D.J. Kuck, E. S. Davidson, D. L. Lawrie, and A.H. Sameh. Parallel supercomputing today and the Cedar approach. Science, 231:967–974, February 1986.
[27] H. P. Langtangen. Python Scripting for Computational Science. Springer,
2006.
[28] W. Leontief. The Structure of the American Economy. 1945.
[29] J. K. Ousterhout. Scripting: Higher-level programming for the 21st century. Computer, 31(3):23–30, 1998.
[30] B. N. Parlett. Progress in numerical analysis. SIAM Rev., 20(3):443–455,
July 1978.
[31] B. N. Parlett and Y. Wang. The influence of the compiler on the cost of mathematical software - in particular on the cost of triangular factorization.
ACM TOMS, 1(1):35–46, March 1975.
model: . [
]
: ... 2. (.) , ... 6. ... 7. (.)
.... [. .]
model: I. Representation of structure. 2e. A simplified description
of a system, process, etc., put forward as a basis for theoretical or
empirical understanding; f. (Math) A set of entities that satisfies
all the formulae of a given formal or axiomatic system. .... [The
New Shorter Oxford English Dictionary]
What is a model? the term mathematical model ... will be used for any complete and consistent set of mathematical equations
which is thought to correspond to some other entity, its prototype.
The prototype may be a physical, biological, social, psychological or
conceptual entity... Being derived from modus (a measure) the
word model implies a change of scale in its representation ... In
so far as the prototype is a physical or natural object, the mathematical model represents a change on the scale of abstraction. Certain
particularities will have been removed and simplifications made in
obtaining the model. - Rutherford Aris [4, Chapter 1].
Models. ... It is customary nowadays, for example, to refer to a
computer model of the atmosphere, even though this consists of
nothing more than a programme for manipulating observed measurements of temperature, pressure, humidity, etc., according to
the dynamical equations of meteorology. The notion of a model
thus extends into a purely symbolic domain, where there is only
an abstract similarity between the original system and its model.
- John Ziman [17, Chapter 2.12].
, .
,
. ,
, ,
- Claude Levi-Strauss [11, .
38].
,
,
,
,
.
, -
.
. [15, . 263].
, ,
.
. ,
.
2.1
...
... .
[Pierre Simon de Laplace, Theorie
analytique des probabilites.]
One shouldn't always include all the effects in a mathematical model; a huge simulation of the exact equations (even if one knows
them) may be no more enlightening than the experiments that led
to these equations. There are virtues in simplicity, even in caricature (...) Solving is not the same as simulating. Our Models
are Our Metaphors: Princeton American Academy of Arts and Sciences, Philip
Holmes (SIAM News, June 2002.)
. : modus, . , modello.
Unesco ( , , 1972).
,
, .
, . ,
, ,
,
..
(.. -
.)
, , . (.. , , , )
(.. ).
- (.. ) , , .
, ,
.
,
, ,
,
.1
, . , ,
, , .
. 2
(, ) .
,
, ..
, ,
,
. ,
3 .
/. ,
4 .
, .
,
1
, Unesco ( , , 1972).
.
3
:
, , [ , . .]
4
Simulation as a source of new knowledge
The Sciences of the Artificial [12] - -
Carnegie Mellon ..., Herbert Simon ( ).
2
2.1: .
, , /, .
,
.
2.1.1. Evariste Galois
1830,
5
.
2.1.2. ,
Newton $x_{i+1} = x_i - f(x_i)/f'(x_i)$
. ( ) , f . ,
.. ,
.
, ,
$$x_{i+1} = x_i - f(x_i)\,\frac{x_i - x_{i-1}}{f(x_i) - f(x_{i-1})}.$$
0
00
$$\frac{f(x_i) - f(x_{i-1})}{x_i - x_{i-1}} \approx f'(x_i). \qquad (2.1)$$
(2.1) Newton .
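The relation between the two iterations, Newton's method and the variant obtained by replacing the derivative with the difference quotient (2.1), can be sketched as below. This is an illustrative sketch only: the test function f(x) = x^2 - 2, the tolerances, and the function names are assumptions, not taken from the notes.

```python
def newton(f, df, x0, tol=1e-12, max_iter=50):
    """Newton iteration x_{i+1} = x_i - f(x_i)/f'(x_i)."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)
        x -= step
        if abs(step) <= tol * max(1.0, abs(x)):
            break
    return x


def secant(f, x0, x1, tol=1e-12, max_iter=50):
    """Same iteration with f'(x_i) replaced by the difference quotient."""
    for _ in range(max_iter):
        f0, f1 = f(x0), f(x1)
        if f1 == f0:               # difference quotient undefined; stop
            break
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)
        x0, x1 = x1, x2
        if abs(x1 - x0) <= tol * max(1.0, abs(x1)):
            break
    return x1


if __name__ == "__main__":
    f = lambda x: x * x - 2.0
    print(newton(f, lambda x: 2.0 * x, 1.0))
    print(secant(f, 1.0, 2.0))
```

The secant version needs no derivative but two starting values, which is the trade-off the model substitution introduces.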
,
(2.1) .
2.1.3.
. .
.
/ (
) , ,
.
.
/.
2.1.1.
, .. ,
.
.
2.1.2. , . ,
, MIT, Paul Krugman, .
, Krugman
. (. [9, . 47]):
. -
,
- . ,
,
:- , , ,
. ,
(
). ,
, , ,
,
.
2.1.1
The study of computational complexity requires that one agrees on
a model of computation, normally called a machine model, for effectuating algorithms. Unfortunately many different machine models
have been proposed in the past, ranging from theoretical devices like the Turing machine to more or less realistic models of the random
access machines and parallel computers... [16]
, , .. , , ,
. :
. .
. .
. 8 .
:
.
/ , . , ,
.
,
, ,
(artifacts) ( ).
2.1.3. .
David Deutsch 5 The Fabric of Reality - The Science of Parallel Universes and its Implications [5]:
... What makes the general theory of relativity so important is not
that it can predict planetary motions a shade more accurately than
Newton's theory can, but that it reveals and explains previously unsuspected aspects of reality, such as the curvature of space and time.
This is typical of scientific explanation.... But the ability of a theory
to explain what we experience is not its most valuable attribute. Its
most valuable attribute is that it explains the fabric of reality itself.
... Yet some philosophers - and even some scientists - disparage the
role of explanation in science. To them, the basic purpose of a scientific theory is not to explain anything, but to predict the outcomes of
experiments: its entire content lies in its predictive formulae. ... This
view is called instrumentalism because it says that a theory is no
more than an instrument for making predictions...
David Deutsch ,
. Dirac
.
2.2
33
.
(.. )
.
RAM
RAM (= Random Access Machine).
(. [3] ),
. , .
.
2.2.1. Horner
$p(x) = a_0 + a_1 x + \cdots + a_n x^n$.
:
  s = a_n
  for i = n-1 : -1 : 0
    s = s*x + a_i
  end
, $T(n) = 2n$.
2.2.1. RASP (=
Random Access Stored Program) uniform cost criterion
(=straight-line), . branch
.
/ , 6
. ,
/. ,
. ,
.
,
6
, RAM
, .
/, :
,
7 .
,
1.1 (-)
( RAM )
!
8
. , , .
2.2.1
I gradually and slowly found out that there were two things to talk
about; the fact that knowledge is acquired, so to speak, by memory;
but that when you know anything, memory doesn't come in. At any
moment that you are conscious of knowing anything, memory plays
no part.... You have a sense of the immediate... - Gertrude Stein
[14, p. 152].
Finally, we can combine LOAD and STORE into the arithmetic operations by replacing sequences such as { LOAD a; ADD b; STORE c
} by c ← a + b... - A. Aho, J. Hopcroft and J. Ullman [3].
RAM
/ :
,
,
,
.
:
load/store.
7
.
8
RASP
!
K
M K .
, , / ...
.
load
,
.
load
(0)
.
,
.
RAM
. , RAM
.
:
(0)
= 0,
.
2.2.2. , Horner :
  load x, a_n
  s = a_n
  for i = n-1 : -1 : 0
    load a_i
    s = s*x + a_i
  end
  store s
2.2.2.
. , K ,
k = 0, ..., K 2m(k) f (k),
m f 0 [2].
,
.
, .
:
.
min :
.
. .
2.2.1. n
, min
n + 1.
. , ,
load n. , store.
, m , min =
n + m.
2.2.3. Horner,
2.2.2 = 2n = n + 3.
, min n + 3 = min .
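The two counts quoted for Horner's rule, 2n arithmetic operations and n + 3 memory references, can be reproduced by instrumenting the loop. The sketch below is an illustration under an assumed register model (x, s and the current coefficient live in registers; each coefficient is loaded once; the result is stored once); the function name is hypothetical.

```python
def horner_counted(a, x):
    """Evaluate p(x) = a[0] + a[1]*x + ... + a[n]*x**n by Horner's rule.

    Returns (value, flops, memory_refs) where memory_refs counts
    loads plus stores under the register model described above.
    """
    n = len(a) - 1
    loads = 2                  # load x and a_n
    s = a[n]
    flops = 0
    for i in range(n - 1, -1, -1):
        loads += 1             # load a_i
        s = s * x + a[i]       # one multiplication, one addition
        flops += 2
    stores = 1                 # store s
    return s, flops, loads + stores


if __name__ == "__main__":
    print(horner_counted([1.0, 2.0, 3.0], 2.0))
```

For a polynomial of degree n the returned counts are 2n and n + 3, so roughly one memory reference accompanies every two flops.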
/
Mflop/s: Million Floating Point Operations Per Second
. , Mflops. [8],
Million Floating Point OperationS. .
.
,
.
, , , min ,
.
, T
T + T
+ ,
1+
!
,
:= / . , min = min /
...
T = T
1+
1 + min
,
.
.
4.
(.
= 0), T = T . ,
RAM,
...
.
,
.
. ; , RAM.
.
(..
, )
(.. ) : IBM RS/6000
DEC Alpha 21064 ...
.
9 .
, .
2.2.3. ( , )
. min .
( bandwidth) ,
, .
Bmax Mbytes/sec. ,
8 bytes (. IEEE),
Mflop/sec
max :=
Bmax
.
8min
, max
Mflop/sec.
, ,
.
.. .
min
,
. ,
( ) .
2.2.4. , ( = prefetching). ,
,
.
:
: . runtime system.
(=explicit) :
.
,
T ,
T ,
. :
1. P1 , P2 P2 P1 .
9
/ (timers, monitors), ..
.
2. P1 P2 P2 P1 .
3. -
. [1]. Todd
Mowry Stanford ( Carnegie Mellon)
Tolerating Latency Through Software-Controlled Data Prefetching 10 , :
This dissertation proposes and evaluates a new compiler algorithm
for inserting prefetches into code. (...) The algorithm can prefetch both
dense-matrix and sparse-matrix codes, thus covering a large fraction
of scientific applications (...) The results of our detailed architectural
simulations demonstrate that the speed of some applications can be
improved by as much as a factor of two, both on uniprocessor and
multiprocessor systems...
MATLAB
. , ,
.. 11 .
2.3
2.3.1. () $C \leftarrow C + AB$ $C \in \mathbb{R}^{n_1 \times n_2}$, $A \in \mathbb{R}^{n_1 \times n_3}$ $B \in \mathbb{R}^{n_3 \times n_2}$. , min =
10
www.cs.cmu.edu/ tcm/thesis/thesis_tech.html
MATLAB, - (predefinition)
.
11
.
1. min =
min
2n+2
2n
2. min =
min
4
2
3. min =
min
4n2
2n3
=2
=
2
n
4. 2.2.3.
2.3.2 (, , 03-makeup). $A \in \mathbb{R}^{n \times n}$, $x \in \mathbb{R}^n$,
$\lambda \in \mathbb{R}$ , $I$ $y = (A - \lambda I)x$.
, min , ... ( )
O(n)
( LOAD) ( STORE).
. :
$y = (A - \lambda I)x = Ax - \lambda x$
min = n2 +
2n + 1. O(n) ,
A.
:
  LOAD λ, x
  for i = 1 : n
    LOAD A(i, :)
    y_i = A(i, :)x - λ x_i
  end
  STORE y
2n2 + n,
2
min = n +2n+1
.
2n2
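The counts in this solution, n^2 + 2n + 1 memory references and 2n^2 + n flops for y = Ax - λx with A streamed one row at a time, can be checked by instrumenting the loop. A minimal sketch, assuming the shift is called λ and that each row dot product costs n multiplications and n - 1 additions; the function name is hypothetical.

```python
def shifted_matvec(A, lam, x):
    """Compute y = A x - lam * x row by row.

    Returns (y, flops, memory_refs); only O(n) values are assumed
    to be resident at any time.
    """
    n = len(x)
    loads = 1 + n              # LOAD lam and x
    flops = 0
    y = []
    for i in range(n):
        row = A[i]
        loads += n             # LOAD A(i, :)
        acc = row[0] * x[0]
        flops += 1
        for j in range(1, n):
            acc += row[j] * x[j]
            flops += 2
        y.append(acc - lam * x[i])
        flops += 2             # one multiplication, one subtraction
    stores = n                 # STORE y
    return y, flops, loads + stores


if __name__ == "__main__":
    print(shifted_matvec([[2.0, 0.0], [0.0, 3.0]], 1.0, [1.0, 1.0]))
```

The ratio of memory references to flops therefore tends to 1/2 as n grows, so the kernel is limited by how fast A can be read.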
2.3.3. for i = 1:n, y(i) = a*x(i)+y(i),
end. b = 3.
. rem r = rem(n, m) m, n
, . n = pm + r , 0 r m 1,
:
r = rem(n,b);
for i = 1:r, y(i) = a*x(i)+y(i); end;
for i = r+1:3:n
y(i) = a*x(i)+y(i);
y(i+1) = a*x(i+1)+y(i+1);
y(i+2) = a*x(i+2)+y(i+2);
end
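The same unrolling by a factor of 3, with a cleanup loop for the first r = n mod 3 elements, is sketched below in Python with 0-based indices. It is only meant to show that the unrolled loop and the simple loop compute the same vector; any speed benefit depends on the language and the machine.

```python
def axpy_simple(a, x, y):
    """y(i) = a*x(i) + y(i) for all i, one element per iteration."""
    for i in range(len(x)):
        y[i] = a * x[i] + y[i]
    return y


def axpy_unrolled3(a, x, y):
    """Same update, unrolled by 3 with a cleanup loop for n mod 3 elements."""
    n = len(x)
    r = n % 3
    for i in range(r):              # cleanup: first r elements
        y[i] = a * x[i] + y[i]
    for i in range(r, n, 3):        # main loop: n - r is a multiple of 3
        y[i] = a * x[i] + y[i]
        y[i + 1] = a * x[i + 1] + y[i + 1]
        y[i + 2] = a * x[i + 2] + y[i + 2]
    return y


if __name__ == "__main__":
    x = [float(i) for i in range(10)]
    print(axpy_unrolled3(2.0, x, [1.0] * 10))
```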
2.4
Qn 2.4.1. pn (x) =
j=1 (x j ). , MATLAB ,
pn (x).
. :
2.5
2.5.1. MATLAB , k ,
$$A_j = \mathrm{rand}(2^{k-j+1}, 2^{k-j}), \qquad j = 1, \dots, k$$
Aj Aj
eval, num2str, rand.
$n = 2^k$. ,
$k = 1 : 10$. $B_k = A_1 A_2 A_3 \cdots A_k$.
1. a Bk
n :
4. MATLAB B = A1 A2
A3 . . . Ak . .
MATLAB ;
5. , (3)
( ).
. k (k = 10 )
:
for j=1:k,
  eval(['A' num2str(j) '=rand(2^(k-j+1), 2^(k-j))']);
end
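1. For $C \in \mathbb{R}^{m \times n}$, $D \in \mathbb{R}^{n \times k}$ the product $CD$ costs $m(2n-1)k = 2mnk - mk$ flops. Left to right:
Step 1: $(A_1 A_2)$: $2n \cdot \frac{n}{2} \cdot \frac{n}{4} - n \cdot \frac{n}{4} = \frac{n^3}{4} - \frac{n^2}{4}$.
Step 2: $(A_1 A_2 A_3)$: $2n \cdot \frac{n}{4} \cdot \frac{n}{8} - n \cdot \frac{n}{8} = \frac{n^3}{16} - \frac{n^2}{8}$.
Step 3: $(A_1 A_2 \cdots A_4)$: $2n \cdot \frac{n}{8} \cdot \frac{n}{16} - n \cdot \frac{n}{16} = \frac{n^3}{64} - \frac{n^2}{16}$.
...
Step $k-1$: $(A_1 A_2 \cdots A_k)$: $2n \cdot \frac{n}{2^{k-1}} \cdot \frac{n}{2^k} - n \cdot \frac{n}{2^k} = \frac{n^3}{2^{2(k-1)}} - \frac{n^2}{2^{(k-1)+1}}$.
Step $j$ costs $\frac{n^3}{2^{2j}} - \frac{n^2}{2^{j+1}}$, so with $n = 2^k$:
$$\Omega_a = \sum_{j=1}^{k-1} \left( \frac{n^3}{2^{2j}} - \frac{n^2}{2^{j+1}} \right) = \frac{n^3 - 4n}{3} - \frac{n^2 - 2n}{2} = \frac{n^3}{3} - \frac{n^2}{2} - \frac{n}{3}.$$
2. Right to left:
Step 1: $(A_{k-1} A_k)$: $\frac{n^2}{2^{2(k-2)}} - \frac{n}{2^{k-2}}$ $(= 12)$.
...
Step $k-3$: $(A_3 \cdots A_{k-1} A_k)$: $2 \cdot \frac{n}{4} \cdot \frac{n}{8} - \frac{n}{4} = \frac{n^2}{16} - \frac{n}{4}$.
Step $k-2$: $(A_2 \cdots A_{k-1} A_k)$: $2 \cdot \frac{n}{2} \cdot \frac{n}{4} - \frac{n}{2} = \frac{n^2}{4} - \frac{n}{2}$.
Step $k-1$: $(A_1 \cdots A_{k-1} A_k)$: $2n \cdot \frac{n}{2} - n = n^2 - n$.
Step $j$ costs $\frac{n^2}{2^{2(k-j-1)}} - \frac{n}{2^{k-j-1}}$, so:
$$\Omega_\delta = \sum_{j=1}^{k-1} \left( \frac{n^2}{2^{2(k-j-1)}} - \frac{n}{2^{k-j-1}} \right) = \frac{4n^2 - 16}{3} - (2n - 4) = \frac{4n^2}{3} - 2n - \frac{4}{3}.$$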
3. a /a
:
[Plot: ratio versus n, for n up to 300.]
2.2: () /a ( 2.5.1).
Omega_a=sym('(n^3/3)-(n^2/2)-(n/3)');        % left-to-right flop count
Omega_delta=sym('(4*n^2/3)-(2*n)-(4/3)');    % right-to-left flop count
k=(1:8);
flops_a=zeros(length(k), 1);
flops_delta=zeros(length(k), 1);
for i=1:length(k),
  n=2^k(i);
  flops_a(i)=eval(Omega_a);
  flops_delta(i)=eval(Omega_delta);
end
plot(2.^k, flops_delta./flops_a, '-');
xlabel('n');
ylabel('\Omega_\delta/\Omega_a');
2.2 .
4. ,
. .
MATLAB
. 2.3.
MATLAB ,
!
k_max=10;
n=2.^(2:k_max);
t_left=zeros(k_max-1, 1);
t_right=zeros(k_max-1, 1);
t_mat=zeros(k_max-1, 1);
for k=2:k_max,
  an=sprintf('Matrix multiplication for n=%d', n(k-1));disp(an);
  for j=1:k,
    eval(['A' num2str(j) '=rand(2^(k-j+1), 2^(k-j));']);
  end
  % left-to-right product, averaged over 100 runs
  for m=1:100,
    tic;
    left_ex=A1;
    for j=2:k,
      left_ex=left_ex*eval(strcat('A', num2str(j)));
    end
    t_left(k-1)=t_left(k-1)+toc;
  end
  t_left(k-1)=t_left(k-1)/100;
  % right-to-left product, averaged over 100 runs
  for m=1:100,
    tic;
    right_ex=eval(strcat('A', num2str(k)));
    for j=k-1:-1:1,
      right_ex=eval(strcat('A', num2str(j)))*right_ex;
    end
    t_right(k-1)=t_right(k-1)+toc;
  end
  t_right(k-1)=t_right(k-1)/100;
  % single expression A1*A2*...*Ak, evaluated in MATLAB's own order
  for m=1:100,
    tic;
    mat_ex='A1';
    for j=2:k,
      mat_ex=strcat(mat_ex, '*', strcat('A', num2str(j)));
    end
    mat_ex=strcat(mat_ex, ';');
    eval(mat_ex);
    t_mat(k-1)=t_mat(k-1)+toc;
  end
  t_mat(k-1)=t_mat(k-1)/100;
end
5. 2.4 ,
.
[Plot: time (sec) versus n, with curves "left to right", "right to left" and "matlab".]
2.3: ( 2.5.1).
[Plot: ratio versus n, for n up to 1200.]
2.4: () /a ( 2.5.1).
46
(...)
, (...) (...)
. Edgar Morin [29].
1.1, - - , ! ,
2.1,
, :
.
:
.. ,
, .
.
,
(..
), . , , .
, ,
1)
, 2)
.
,
, . .
1 . Nick Higham2 ([17])
.
,
.
3.1
x x
.
$$E_{abs}(\hat{x}) = |x - \hat{x}|, \qquad E_{rel}(\hat{x}) = \frac{|x - \hat{x}|}{|x|}.$$
|x|
|x| ,
.
. , (
) ; ,
, , .
, x x
(
). .
,
(. ).
,
, , ( ).
x x
xk
, . $E_{abs}(\hat{x}) = \|x - \hat{x}\|$ $E_{rel}(\hat{x}) = \|x - \hat{x}\| / \|x\|$.
,
, , ,
.
1
2
.
Manchester.
3.1.1. $p_1 = [1, 0, \dots, 0] \in \mathbb{R}^{1000}$
$p_2 = [10^{-3}, \dots, 10^{-3}] \in \mathbb{R}^{1000}$
x x1 x2 . 1 = kp1 k1 =
kp2 k1 x1
2 : 1000 x.
, ,
. ,
$\mathbb{C}^n$ $x \in \mathbb{C}^n$,
$y = |x|$ $y_i = |x_i|$, $i = 1 : n$.
. $x, y \in \mathbb{R}^n$
$x \le y$ $x_i \le y_i$, $i = 1 : n$.
,
. Rmn Cmn . ,
$x$ $\hat{x}$
$|x - \hat{x}|$
$[\,|x_1 - \hat{x}_1|/|x_1|, \dots, |x_n - \hat{x}_n|/|x_n|\,]$.
() ( = relative
componentwise error)
$$\max_i \frac{|x_i - \hat{x}_i|}{|x_i|}.$$
(normwise analysis)
(normwise analysis),
.
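The difference between the normwise and the componentwise relative error can be seen on the two perturbations of example 3.1.1, which have the same 1-norm. The sketch below assumes a reference vector x of all ones (the notes' own choice of x is not recoverable) and uses hypothetical function names.

```python
def norm1(v):
    """1-norm of a vector given as a list of numbers."""
    return sum(abs(t) for t in v)


def relative_error_normwise(x, x_hat):
    """||x - x_hat||_1 / ||x||_1."""
    diff = [a - b for a, b in zip(x, x_hat)]
    return norm1(diff) / norm1(x)


def relative_error_componentwise(x, x_hat):
    """max_i |x_i - x_hat_i| / |x_i| (all x_i assumed nonzero)."""
    return max(abs(a - b) / abs(a) for a, b in zip(x, x_hat))


n = 1000
x = [1.0] * n                        # assumed reference vector
p1 = [1.0] + [0.0] * (n - 1)         # whole perturbation in one component
p2 = [1e-3] * n                      # same 1-norm, spread over all components
x_hat1 = [a + b for a, b in zip(x, p1)]
x_hat2 = [a + b for a, b in zip(x, p2)]

if __name__ == "__main__":
    for x_hat in (x_hat1, x_hat2):
        print(relative_error_normwise(x, x_hat),
              relative_error_componentwise(x, x_hat))
```

Under these assumptions both perturbed vectors have the same normwise relative error, while the componentwise measure separates them by a factor of about 1000.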
3.2
Floating point arithmetic is by nature inexact, and it is not difficult to misuse it so that the computed answers consist almost
entirely of noise. One of the principal components of numerical
analysis is to determine how accurate the results of certain numerical methods will be. Donald Knuth [22]
I would be afraid to fly in an airplane that was designed with floating point arithmetic. (1960) Alston Householder
bytes
(...). ...
/ 3
3
. L.N. Trefethen Predictions for Scientific Computing
50 years from now
. ,
/.
, ,
...,
.
... F R
$$y = \pm m \times \beta^{e-t}, \qquad (3.1)$$
t . ,
F . F ...
F . $\beta = 2$ (.. $\beta = 16$ IBM/360)
$\beta = 10$. $t$ $m$ . , $m$
( $y$ ), $m/\beta^t < 1$
, y .
mantissa 4
( 2) y . Knuth mantissa : ... but it is
an abuse of terminology to call the fraction part a mantissa, since the concept
has quite a different meaning in connection with logarithms. Furthermore the
English word mantissa means a worthless addition. [22, page 199].
:
... (3.1) bytes, ..
4 8 /.
bit , ()
m,
e.
F
.
F M =
mmax emax t = mmin emin t . ,
, F . F
,
. 32 64 bits
[, M ].
4
.
mantis(s)a ,
, .
. ,
, mantissa ( ).
mantissa .
.
(3.1) ...
e
t. , t F .
, F , Wilkinson:
,
t m ,
(emin , emax ) e, . emin e emax ,
,
$F(\beta, t, e_{\min}, e_{\max})$.
:
.
.
.
.
3.2.1 ,
F F .
R F fl
$fl : \mathbb{R} \to F$
fl(x) F F x,
() . fl
( continuum) .
G F
G := {x R : |x| M } {0} R ,
M ...
.
fl(x).
:
x F fl(x) = x, . x.
x G x 6 F , fl(x) 6= x. x fl(x) F x. fl(x) 6= x,
. ,
.
[Figure 3.1: $\mathbb{R}$ and $F$; marked points x1, x2, x3, x4.]
y F
(y , y+ ), y , y+ F
x. fl(x),
5
R .
x
(y , y+ ). ,
, . y y+
.
3.2.2. ... $F(10, 4, -9, 9)$, $x_1 = 0.10005$, $x_2 = 0.10015$. $fl(x_1) = 0.1000$, $fl(x_2) = 0.1002$.
3.2.3.
MATLAB SciLab, eps
1+ 1. , 2
.
expression                    value     hexadecimal
1 + eps/2                     1         3ff0000000000000
1 + eps                       1.0000    3ff0000000000001
1 + eps + eps/2               1.0000    3ff0000000000002
1 + 2 eps                     1.0000    3ff0000000000002
1 + 2 eps + eps/2 + eps/4     1.0000    3ff0000000000002
0.
. , , (y , y+ )
x y , y+ F , fl(x) = sign(x) max(|y , y+ |).
3.2.2 bit
...
,
.. y = 0.d1 0000 e y = 0.0d1 0000 e+1 .
. ,
m ( ) , . y R F d1 6= 0.
= 2, d1 6= 0 d1 = 1.
, 2 ( ),
bit m 1. , . , ,
(
) bit.
3.2.1.
bit (
). , ..,
bit
.
bit
/ , bit .
y F
\[ y = \pm \beta^{e} \times .d_1 d_2 \cdots d_t, \qquad 0 \le d_i \le \beta - 1, \quad d_1 \neq 0. \]
3.2.3 ...
... F , y 6= 0 F .
:
y F y 6= 0
, z+
z , . z [z , z+ ] ... z
z < z z z+ < z+ .
t
z }| {
z = . x x e
( )
z z = m et z+ = (m + 1) et .
z+ z = et . ,
z
z , ,
, |z fl(z)| =
z .
et
2
|z fl(z)|
z }| {
. 0 0 e
t1 e
2
et
.
2
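$z \in G$:
\[ \left| \frac{z - fl(z)}{z} \right| \le \frac{\beta^{e-t}}{2} \cdot \frac{1}{\beta^{e-1}} = \frac{\beta^{1-t}}{2} = u. \tag{3.2} \]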
... ( )
...
(wobbling)
\[ |m\beta^{e-t} - (m+1)\beta^{e-t}| = \beta^{e-t}, \]
\[ |m\beta^{e+1-t} - (m+1)\beta^{e+1-t}| = \beta^{e+1-t} \]
e e+1 .
3.2.4
. =0
1+x
x6 ! ,
... ()
= 0.
... x .. 0 x < 1+x
,
, y = 1+5.5511
1017 , y ... 1.
3.2.4. MATLAB 6.5, , :
>> p1 = 5.5511e-017
>> 1+p1
ans = 1
>> (1+p1==1)
ans = 1
... - ... -
..., x, ..., x+ , . ...
x < x+
... . , ... x
, (x, x+ ) = x+ x, ... x+ .
.
, y1 y2 ,
, ,
[y1 , y2 ] .
... :
3.2.1. , M ,
1 ..., .
M := (1, 1+ ).
(3.3)
6
, !
,
:
M ...
1:
1+
(3.4)
,
(3.3).
, M ... ... t 1
, 1
$1 + 2^{1-t}$, 3.2.1 $\epsilon_M = 2^{1-t}$.
3.2.2. $u = 2^{-t}$.
Hence $\epsilon_M = 2u$.
3.2.3. ,
(.. MATLAB) ( : , ,
):
t = 1.0;
while (1.0 + t > 1.0)   % halve t until 1 + t rounds back to 1
    t = t/2.0;
end
t = t*2.0;              % undo the last halving
... .
,
.
.
3.2.4. Fortran 90 M
, EPSILON.
3.2.5. M
. MATLAB eps. ,
Toshiba 320CDT Pentium II
:
>> eps
eps = 2.2204e-16
>> 1+eps >1
ans = 1
>> 1+eps/2 >1
ans = 0
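The relations $\epsilon_M = 2^{1-t}$ and $\epsilon_M = 2u$ can be checked directly against the built-in `eps`. This is a small sketch that assumes IEEE double precision, for which $t = 53$ including the hidden bit; the variable names are illustrative.

```matlab
% Machine epsilon and unit roundoff, assuming IEEE double (t = 53).
t    = 53;            % significand length, hidden bit included
epsM = 2^(1 - t);     % distance from 1 to the next larger double
u    = epsM / 2;      % unit roundoff under round-to-nearest

same_as_builtin = (epsM == eps);   % agrees with MATLAB's eps
next_is_larger  = (1 + epsM > 1);  % 1 + epsM is a different number
half_is_lost    = (1 + u == 1);    % 1 + u is a tie and rounds back to 1
```

All three logical values are expected to be true on an IEEE-conforming machine, which matches the session shown above.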
MATLAB.
Pentium MATLAB
3.2.6. M
. MATLAB:
3.2.7.
EPS MATLAB . , ...
. , 7 ,
>> floor(0.75/0.25)
ans = 3
>> floor(0.075/0.025)
ans = 2
, ...
(.. IEEE), .
>> 0.75/0.25
ans = 3
>> 0.075/0.025
ans = 3.0000
>> v = 3-0.075/0.025
v = 4.4409e-16
v 2 eps. v,
. , 16
.
. . .
>> 0.075
ans = 3fb3333333333333
>> 0.025
ans = 3f9999999999999a
>> 0.075/0.025
ans = 4007ffffffffffff
0.075
3, floor 2.
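One possible guard against this pitfall is to add a tolerance of a few units in the last place before truncating. The sketch below is only an illustration: the factor 4 is a hypothetical choice, not a recommendation from the text, and an appropriate tolerance depends on how the quotient was computed.

```matlab
% floor applied to a quotient that is mathematically an integer.
q       = 0.075 / 0.025;      % mathematically 3, computed one ulp below 3
naive   = floor(q);           % truncates to 2
tol     = 4 * eps(q);         % hypothetical tolerance: a few ulps of q
guarded = floor(q + tol);     % recovers 3
```

The guard changes the result only when `q` lies within `tol` below an integer, so it trades a small risk of rounding up for robustness against one-ulp errors.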
3.3
... ( )
/ .
.
F
m et ,
{+, , , /}, x, y F
x y 6 F .
R .
3.3.1. ... t
2t
.
3.3.2. M F ,
,
F .
.
... /.
8
; ,
, ,
... . , (ALU)
. , Pentium bug Intel
, .
, (..
). , ,
8
... ! , .
. ,
.
,
x, y F
\[ x \,\hat{\odot}\, y = fl(x \odot y) \in F \tag{3.5} \]
. R (. x y )
. , (=exact rounding).
,
.
,
z = xy
z
. ,
. ,
, ,
(guard digit).
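3.3.3. ... $\beta = 2$, $t = 3$, $u = 2^{1-3}/2 = 1/8$.
with the extra digit (0.011|1)         without the extra digit
  2^1  x 0.100                           2^1  x 0.100
- 2^0  x 0.111                         - 2^0  x 0.111
  2^1  x 0.100                           2^1  x 0.100
- 2^1  x 0.011|1                       - 2^1  x 0.011
= 2^1  x 0.0001                        = 2^1  x 0.001
= 2^-2 x 0.100                         = 2^-1 x 0.100
(. (3.5) ),
\[ \frac{|2^{-2} \times 0.1 - 2^{-1} \times 0.1|}{2^{-2} \times 0.100} = 1 = 8u \]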
(3.5).
bits ,
(3.5) bits
( , (guard, round digits)
sticky bit) . .4 [16], [12]. [23]
IEEE.
3.3.1.
( Cray C-90)
.
3.3.2.
:
fl(x y) = (1 + )x (1 + )y, ||, || u
,
R +,
. , x, y, z R ,
.
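0 $x + y \in \mathbb{R}$.
1 : $x + y = y + x$.
2 $x + (y + z) = (x + y) + z$.
3 $0$: $x + 0 = x$, $x \in \mathbb{R}$.
4 $x \in \mathbb{R}$: $-x \in \mathbb{R}$, $x + (-x) = 0$.
0 $x \cdot y \in \mathbb{R}$.
1 : $x \cdot y = y \cdot x$.
2 : $x \cdot (y \cdot z) = (x \cdot y) \cdot z$.
3 $1$: $x \cdot 1 = x$, $x \in \mathbb{R}$.
4 $x \neq 0$: $x^{-1} \in \mathbb{R}$, $x \cdot x^{-1} = 1$.
: $x \cdot (y + z) = x \cdot y + x \cdot z$.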
R
F ... (3.5).
. 1, 3,
4 1, 3. 0 0
F .
3.3.4. 2:
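\[ t_1 = fl(x + y), \qquad s_1 = fl(y + z), \]
\[ t_2 = fl(t_1 + z), \qquad s_2 = fl(x + s_1), \]
\[ t_1 = (x + y)(1 + \delta_1), \qquad t_2 = (t_1 + z)(1 + \delta_2), \]
\[ t_2 = ((x + y)(1 + \delta_1) + z)(1 + \delta_2), \quad |\delta_j| \le u, \]
\[ s_2 = (x + (y + z)(1 + \tilde{\delta}_2))(1 + \tilde{\delta}_1), \quad |\tilde{\delta}_j| \le u. \]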
, $t_2 \neq s_2$
... .
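The loss of associativity is easy to observe numerically. The following sketch uses the decimal constants 0.1, 0.2 and 0.3 purely as an illustration (they are not taken from the text) and assumes IEEE double precision with round-to-nearest.

```matlab
% Floating-point addition is not associative.
a = (0.1 + 0.2) + 0.3;     % sum evaluated left to right
b = 0.1 + (0.2 + 0.3);     % sum evaluated right to left
differ = (a ~= b);         % the two orderings give different doubles
gap    = abs(a - b);       % the difference is of the order of the unit roundoff
```

Both results are within a few $u$ of the exact sum; they simply round differently because the intermediate sums differ.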
3.3.5. 4: , x F z =
$fl(1/x) = \frac{1}{x}(1 + \delta_1)$, $fl(x \cdot z) = x \cdot \frac{1}{x}(1 + \delta_1)(1 + \delta_2)$
. MATLAB 1 y 200
( y1 ) y 6= 1:
index = [];
for i = 1:200
    if ((1/i)*i ~= 1)       % the inverse property fails for this i
        index = [index i];
    end
end
NEC Versa SX Pentium 2
$x = [49, 98, 103, 107, 161]$: $(1/x) \cdot x \neq 1$.
F ,
R .
, .. , .
,
... x1 , x2 x3 , x4
$x_1 + x_2 + x_3 + x_4$: $((x_1 + x_2) + x_3) + x_4$, $(x_1 + x_2) + (x_3 + x_4)$. ,
. , , ,
, .
3.4.4 3.2.3.
... ,
. )
... )
.
(), x, y F , . x y G,
\[ \frac{|fl(x \odot y) - (x \odot y)|}{|x \odot y|} \le u, \quad x \odot y \neq 0, \quad \odot \in \{+, -, \times, /\}. \tag{3.6} \]
(3.5),
fl(x y) xy
.
:
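3.3.1. $x, y \in F$, $x \odot y \in G$,
\[ fl(x \odot y) = (x \odot y)(1 + \delta), \quad |\delta| \le u, \]
\[ fl(x \odot y) = \frac{x \odot y}{1 + \epsilon}, \quad |\epsilon| \le u. \]
$x \odot y \in F$: $\delta = 0$. $F$
$x \odot y$, $x, y \in F$.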
3.3.3.
fl(x y) = (x y)(1 + ),
xy
(), ...
( R ) x
= x(1 + ), y = y(1 + ). ,
, . = = /,
...
( R ) x
= x, y = y(1 + ),
x
= x(1 + ), y = y , x y
(x y)(1 + ).
3.5.
3.3.2.
.
\[ fl(\alpha_{ij}) = \alpha_{ij}(1 + \delta_{ij}), \quad |\delta_{ij}| \le u, \]
\[ |fl(A) - A| \le |A|u, \]
\[ fl(A) = A + E, \quad |E| \le u|A|, \]
\[ fl(A + B) = (A + B) + E, \quad |E| \le u|A + B|, \]
3.1.
, (3.6)
, .
(3.6)
/.
3.3.6. ... x, y F
x2 + y 2 G
:
9
, fl( x) ...
... x.
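...
\[ \begin{aligned}
\hat{t}_1 &= fl(x \cdot x) = x^2(1 + \delta_1) \\
\hat{t}_2 &= fl(y \cdot y) = y^2(1 + \delta_2) \\
\hat{t}_3 &= fl(\hat{t}_1 + \hat{t}_2) = (\hat{t}_1 + \hat{t}_2)(1 + \delta_3) \\
\hat{t}_3 &= fl(\sqrt{\hat{t}_3}) = \sqrt{\hat{t}_3}\,(1 + \delta_4) \\
\hat{t}_1 &= fl(x / \hat{t}_3) = \frac{x}{\hat{t}_3}(1 + \delta_5)
\end{aligned} \]
\[ \begin{aligned}
\hat{t}_1 &= \frac{x}{\hat{t}_3}(1 + \delta_5) \\
&= \frac{x}{\sqrt{\hat{t}_3}\,(1 + \delta_4)}(1 + \delta_5) \\
&= \frac{x}{\sqrt{(\hat{t}_1 + \hat{t}_2)(1 + \delta_3)}\,(1 + \delta_4)}(1 + \delta_5) \\
&= \frac{x}{\sqrt{(x^2(1 + \delta_1) + y^2(1 + \delta_2))(1 + \delta_3)}\,(1 + \delta_4)}(1 + \delta_5),
\end{aligned} \qquad |\delta_j| \le u. \]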
,
z t1 . 3.5
.
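3.3.7. $y = \frac{(1 + x) - 1}{x}$.
\[ fl(y) = ((1 + x)(1 + \delta_1) - 1)(1 + \delta_2)(1 + \delta_3)/x = (1 + \delta_1)(1 + \delta_2)(1 + \delta_3) + \frac{\delta_1(1 + \delta_2)(1 + \delta_3)}{x} \]
If $x > \epsilon_M$,
\[ fl(y) = 1 + (\delta_1 + \delta_2 + \delta_3) + \delta_1\delta_2 + \delta_2\delta_3 + \delta_1\delta_3 + \delta_1\delta_2\delta_3 + \frac{\delta_1(1 + \delta_2)(1 + \delta_3)}{x} = 1 + O(u). \]
Otherwise, when $fl(1 + x) = 1$, we get $fl(y) = 0$ and
\[ \left| \frac{fl(y) - y}{y} \right| = 1. \]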
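The behaviour of this expression can be observed with a short MATLAB experiment. The sample values of `x` are hypothetical and chosen only to span the range from harmless to catastrophic; IEEE double precision is assumed.

```matlab
% Evaluate y = ((1 + x) - 1)/x, whose exact value is 1 for every x ~= 0.
x = [1e-1, 1e-8, 1e-15, 1e-17];     % illustrative arguments
y = ((1 + x) - 1) ./ x;             % computed values
rel_err = abs(y - 1);               % relative error, since the exact y is 1

fprintf('%8.1e  %.16g  %.3g\n', [x; y; rel_err]);
```

For the smallest argument `1 + x` rounds to 1, so the computed value is 0 and the relative error equals 1, in agreement with the analysis above.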
3.4
The simplest and best, though harder to attain, solution to the problem of environmental parameters is to standardize floating-point
hardware, so that the values of the parameters become universal
constants... - Webb Miller [26]
... 1960
, . 1980, ...
. ..
. ,
Berkeley, Velvel Kahan, 1985
IEEE floating-point standard 754.
, ,
() .. 0,
, .
format ...,
, , formats.
,
.
formats.
formats $F(2, 24, -125, 128)$
, $F(2, 53, -1021, 1024)$
. Wilkinson
3.1. = 2, bit
. ,
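sign | exponent (8 bits) | fraction (23 bits)
1 | 10000001 | 0010...0            c0900000 = -4.5
sign | exponent (11 bits) | fraction (52 bits)
1 | 10000000001 | 0010...0         c012000000000000 = -4.5
bit .
, ,
$e = 2 = 1025 - 1023$, $m = 1.125 = 1 + 2^{-3}$ (the leading 1 is the hidden bit).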
Table 3.1: IEEE-754.
size       sign (bits)    significand    range
32 bits    1              23+1 bits      10^(+-38)
64 bits    1              52+1 bits      10^(+-308)
, .. Intel:
*86, Pentium. DEC: Alpha, IBM: RS/6000, Motorola: 680*0, Sun: SPARC,
PowerPC, MIPS R10000, .
3.4.1
The rational number system is inadequate for many purposes, both as a field and as an ordered set... This leads to the ... irrational
numbers which are often written as decimal expansions and are
considered to be approximated by the corresponding finite decimals... Walter Rudin [33]
Formats: single, double, single extended, double extended.
: ) , )
, ) 0 ().
: bits . 10 .
x/0, 0/0
y y < 0. 0, ,
NaN, Not a Number
.
, (=subnormal) , . .
: inexact, invalid op., overflow, underflow, division
by 0. 3.2
IEEE.
3.4.1. MATLAB isieee
1 IEEE.
IEEE :
10
.
.
exception       example                            default result
invalid op.     0/0, 0 x Inf, sqrt(-1)             NaN
overflow                                           +-Inf
divide by 0     finite number/0                    +-Inf
underflow                                          subnormal numbers
inexact         fl(x o y) ~= x o y
>> 1/(1/0)
Warning: Divide by zero
ans = 0
>> 1/1/0
Warning: Divide by zero
ans = Inf
>> 0/0
Warning: Divide by zero
ans = NaN
>> 1/0
Warning: Divide by zero
ans = Inf
>> max(ans,4)
ans = Inf
>> min(ans,3)
ans = 3
3.4.2.
/ . Fortran machar.f ( Cody) [6], paranoia.f
( Kahan). ,
... IEEE . 3.4
.
...
3.4.2
IEEE ,
.
, .. emin t .
, .. 0 < a b < m.
3.4.2. Matlab 5.1.0 Windows Intel Pentium.
MATLAB realmin.
>> realmin
ans = 2.225073858507202e-308
>> format hex
>> realmin
ans = 0010000000000000
:
. , ,
m
. , , .
(=gradual underflow).
, .
3.4.3 (Kahan).
if (x > y),
...
... log(x-y) ...
end
x, y ()
|x y| .
, 0 log(0).
.
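A small sketch of why gradual underflow protects code of this kind. It assumes IEEE double precision with subnormal numbers enabled (some hardware modes flush them to zero, in which case the outcome differs); the values of `x` and `y` are hypothetical.

```matlab
% With gradual underflow, x > y guarantees x - y > 0 even near realmin.
x = 1.5 * realmin;                 % two nearby normalized numbers
y = realmin;
d = x - y;                         % 0.5*realmin, a subnormal number

is_subnormal = (d > 0) && (d < realmin);   % nonzero, below the normalized range
log_is_safe  = isfinite(log(d));           % log(x - y) does not hit log(0)
```

Without subnormal numbers the difference would be flushed to zero and the guarded branch would evaluate `log(0)`.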
[1 2; 0 1/2]. U (2, 2)
, , U !
1.
. Matlab 5.1.0 Windows Intel Pentium.
realmin .
3.4.3.
..., .
(. [8]).
3.4.3
IEEE (extended
format). 79 bits (mantissa 63, exponent
15), $u \approx 5.42 \times 10^{-20}$, $[10^{-4932}, 10^{4932}]$.
, Pentium, ... 80 bits
(= double rounding). ,
80 bits 64 32 bits.
3.4.4. . (. [16])
$\beta = 10$, 2, 3. $1.9 \times 0.66 = 1.254$. $round_p(x)$: $x$, $p$. $round_2(1.254) = 1.3$,
$round_2(round_3(1.254)) = 1.2$.
, (
). , Kahan
128 bits [17].
3.4.4
z + x y
x y x + y . -
Cray (
chaining), ( x y ) ( z + (x y))
(. !) , FMA
DOT(x(1 : n), y(1 : n)) n + O(1)
2n + O(1). , (.. IBM RS/6000) z + x y
.
.
, . ,
\[ fl(z + x \times y) = (z + x \times y)(1 + \delta), \quad |\delta| \le u. \]
. ( )
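\[ x = \det \begin{pmatrix} a & b \\ c & d \end{pmatrix} \]
$x = ad - bc$,
$F$ , :
\[ \begin{aligned}
\hat{t}_1 &= fl(ad) = ad(1 + \delta_1) \\
\hat{t}_2 &= fl(bc) = bc(1 + \delta_2) \\
\hat{x} &= fl(\hat{t}_1 - \hat{t}_2) = (\hat{t}_1 - \hat{t}_2)(1 + \delta_3)
\end{aligned} \]
\[ \begin{aligned}
|\hat{x} - x| &= |(ad(1 + \delta_1) - bc(1 + \delta_2))(1 + \delta_3) - (ad - bc)| \\
&= |ad\,\delta_1 - bc\,\delta_2 + (ad - bc)\delta_3 + ad\,\delta_1\delta_3 - bc\,\delta_2\delta_3| \\
&\le (|ad| + |bc|)2u + (|ad| + |bc|)u^2,
\end{aligned} \]
\[ \frac{|\hat{x} - x|}{|x|} \le \frac{(|ad| + |bc|)2u + (|ad| + |bc|)u^2}{|x|}. \]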
|ad|, |bc| |x|, .. |x| ,
. Kahan
:
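t1 = b*c
t2 = t1 - b*c   ( FMA)
t3 = a*d - t1   ( FMA)
x = t3 + t2
\[ \begin{aligned}
\hat{t}_1 &= bc(1 + \delta_1) \\
\hat{t}_2 &= (\hat{t}_1 - bc)(1 + \delta_2) \\
\hat{t}_3 &= (ad - \hat{t}_1)(1 + \delta_3) \\
(1 + \delta_4)\hat{x} &= \hat{t}_3 + \hat{t}_2
\end{aligned} \]
\[ \begin{aligned}
|\hat{x} - x| &= |(\hat{t}_3 + \hat{t}_2) - \hat{x}\delta_4 - (ad - bc)| \\
&= |(ad - \hat{t}_1)(1 + \delta_3) + (\hat{t}_1 - bc)(1 + \delta_2) - \hat{x}\delta_4 - (ad - bc)| \\
&= |(ad - bc(1 + \delta_1))(1 + \delta_3) + (bc(1 + \delta_1) - bc)(1 + \delta_2) - \hat{x}\delta_4 - (ad - bc)| \\
&= |ad\,\delta_3 - bc(\delta_1 + \delta_3 + \delta_1\delta_3) + bc\,\delta_1(1 + \delta_2) - \hat{x}\delta_4| \\
&= |x\delta_3 - \hat{x}\delta_4 + bc\,\delta_1(\delta_2 - \delta_3)| \\
&\le (|x| + |\hat{x}|)u + |bc|\,2u^2
\end{aligned} \]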
|x| , |x|, |
x|
,
(
|bc|u > |x|). .
3.4.1. FMA
.
.
3.4.2. FMA
.
3.4.5. ...
standard IEEE.
Goldberg [12] What every computer scientist should know about floating-point
arithmetic. [16, Appendix].
IEEE 754 floating-point standard
[20].
. [2]
... Intel Pentium.
3.4.5 Java
Network Computing.
Java. Java
. (..
interfaces, , ),
. ; Java
:
: , Java
complex types.
.
: Java linguistically enforced exact reproducibility of all floating point results, Java (
W. Kahan cruel delusion11 .)
, Java
. .. Java Linpack12 .
3.5
As every physicist knows, no equation is exact; therefore, we believe that finite precision computation can be closer to physical
reality than exact computation. Thus it appears possible to transform the limitations of the computer arithmetic into an asset. [5]
(...): ,
, .
, .
; (...) ,
. [9]
It's impossible to compute things which don't exist. It's difficult
to compute things which almost don't exist. [Cleve Moler]
,
(. R ), f : U Rm Rn .
:
x U m f
f (x) n x,
x , . x F . x
x, x = fl(x).
11
12
http://www.netlib.org/benchmark/linpackjava
f (x ) f (x ), (.
R .
fprog f ... F .
,
\[ \|f_{prog}(x^*) - f(x)\|, \qquad \frac{\|f_{prog}(x^*) - f(x)\|}{\|f(x)\|}. \tag{3.7} \]
(.. ) f
.
f (x), (3.7)
. ,
(3.7).
accuracy precision.
.
.
Mathematica.
:
,
accuracy
.
(3.7) . precision
.
...
3.5.1. Vel Kahan [21]: "Precision concerns the tightness of specification. Accuracy concerns its correctness. An utterly inaccurate
statement ... can be uttered quite precisely... 3.177777777777 is a rather precise
(13 dec. digits) but rather inaccurate (2 significant decimal digits) approximation
to π.... Although exp(-10) = 0.0000454 has 3 decimal digits of precision, it is
accurate to almost 6. Precision is to accuracy what intent is to accomplishment.
A natural disinclination to distinguish them invites first shoddy science and ultimately the kinds of cynical abuses brought to mind by "People's Democracy",
"Correctional Facility" and "Free Enterprise"."
,
f (x ) f (x) .
f
x
x x. (,
) x.
, .
x, .
.
[Figure 3.2: two panels; labels X, f(U), x, x*, f(x), f(x*), y, y*, f(y), f(y*).]
3.5.1. f : R R
y := f (x).
x = x + x,
f ,
y = f (x + x). |f (x + x) f (x)| = |
y y|.
\[ \hat{y} - y = f(x + \Delta x) - f(x) = f^{(1)}(x)\Delta x + \frac{f^{(2)}(x + \theta\Delta x)}{2!}(\Delta x)^2, \quad \theta \in (0, 1) \]
f, f (1) , f (2)
x, x + x. f (1) (x) ,
.
,
\[ \frac{\hat{y} - y}{y} \approx \frac{f'(x)\,x}{f(x)} \frac{\Delta x}{x} + O((\Delta x)^2) \tag{3.8} \]
f 0 (x)x
,
f
. x1 +x2 +x3
( 0, 1, 2)
R ,
... ,
fprog . fprog
x f
[Figure 3.3: two panels; labels x, x*, f(x), f(x*), fprog(x).]
. f
.
fprog :
3.5.1. x U x x
fprog (x) f (x ).
.
,
- .
3.3.
, . ,
.
, fprog (x) f (x ) x
x. , :
3.5.2. , x
x fprog (x) = f (x ). ()
.
, .
(3.7) (
) (=forward error). ,
, kx xk
( )
...
. ,
f (x) .
, ..
$x = (\xi_1, \dots, \xi_N)$
$a_1 = f_1(\xi_1, \dots, \xi_N)$, $a_2 = f_2(a_1, \xi_1, \dots, \xi_N)$, $\dots$, $z = f_n(a_{n-1}, \dots, a_1, \xi_1, \dots, \xi_N)$,
$f_n, \dots, f_1$, $f$, $f_j(a_{j-1}, \dots, a_1, x)$
$\{a_{j-1}, \dots, a_1, \xi_1, \dots, \xi_N\}$.
...,
R , z
z .
fj . 3.3.6
3.4.5. ( )
(=forward error analysis).
, :
f
. 3.3.6 3.4.5
, .
f : Rn Rm .
3.7.
. ,
:
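$a^2 - b^2$                                $(a - b)(a + b)$
$\sqrt{x + \delta} - \sqrt{x}$             $\delta / (\sqrt{x + \delta} + \sqrt{x})$
$\cos\theta - 1$                           $-2\sin^2(\theta/2)$
$-b \pm \sqrt{b^2 - 4c}$                   $|b| + \sqrt{b^2 - 4c}$ + Vieta
$f(x + \delta) - f(x)$                     $\delta f^{(1)}(x) + f^{(2)}(x)\,\delta^2/2 + \cdots$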
3.5.2.
, .
1962 Ramon Moore [28].
x ( interval)
(xL , xU ) , ( interval arithmetic).
(.. (, +).)
[17] [1].
(..
3.3.6)
\[ p_n = \prod_{i=1}^{n} (1 + \delta_i), \quad |\delta_i| \le u. \]
\[ (1 - u)^n \le p_n \le (1 + u)^n. \]
$p_n = 1 + nu + O(u^2)$.
.
:
3.5.1. $|\delta_i| \le u$, $\rho_i = \pm 1$, $i = 1 : n$, $nu < 1$:
\[ \prod_{i=1}^{n} (1 + \delta_i)^{\rho_i} = 1 + \theta_n, \qquad |\theta_n| \le \frac{nu}{1 - nu} =: \gamma_n. \]
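Proof (induction on $n$). The case $n = 1$ is immediate. If $\rho_n = 1$,
\[ \prod_{i=1}^{n} (1 + \delta_i)^{\rho_i} = (1 + \theta_{n-1})(1 + \delta_n) = 1 + \theta_n, \qquad \theta_n = \theta_{n-1} + \delta_n + \theta_{n-1}\delta_n, \]
\[ |\theta_n| = |\theta_{n-1} + \delta_n + \theta_{n-1}\delta_n| \le \frac{(n-1)u}{1 - (n-1)u} + u + \frac{(n-1)u^2}{1 - (n-1)u} = \frac{(n-1)u + (n-1)u^2 + u - (n-1)u^2}{1 - (n-1)u} = \frac{nu}{1 - (n-1)u} \le \gamma_n. \]
If $\rho_n = -1$,
\[ \prod_{i=1}^{n} (1 + \delta_i)^{\rho_i} = \frac{1 + \theta_{n-1}}{1 + \delta_n} = 1 + \theta_n, \qquad \theta_n + \delta_n + \theta_n\delta_n = \theta_{n-1}, \qquad \theta_n = \frac{\theta_{n-1} - \delta_n}{1 + \delta_n}, \]
\[ |\theta_n| \le \frac{\gamma_{n-1} + u}{1 - u} = \frac{nu - (n-1)u^2}{1 - nu + (n-1)u^2} \le \gamma_n. \]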
[Figure 3.4: $\gamma_n$ (gamma) against $n$, logarithmic axes, u = 2e-16.]
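\[ \prod_{i=1}^{n} (1 + \delta_i)^{\rho_i} =: \langle n \rangle \]
$\langle n \rangle \langle k \rangle = \langle n + k \rangle$, $\langle n \rangle / \langle k \rangle = \langle n + k \rangle$
( $nu < 1$):
\[ \gamma_n = \frac{nu}{1 - nu} = nu(1 + nu + (nu)^2 + \cdots) = nu + O(u^2) \]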
,
n
Y
i=1
3.4 n
u = 2 1016 .
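The growth of $\gamma_n$ can be tabulated with a few lines of MATLAB. This is an illustrative sketch only: it uses $u = 2^{-53}$ (IEEE double) rather than the rounded value quoted above, and the name `gamma_n` is a hypothetical helper, not a built-in.

```matlab
% gamma_n = n*u/(1 - n*u), valid as long as n*u < 1.
u       = 2^(-53);                              % unit roundoff, IEEE double
gamma_n = @(n) (n .* u) ./ (1 - n .* u);        % hypothetical helper
n       = 10.^(1:8);                            % n = 10, 100, ..., 1e8
g       = gamma_n(n);
ratio   = g ./ (n * u);                         % equals 1/(1 - n*u), close to 1

fprintf('n = %9d   gamma_n = %.3e\n', [n; g]);
```

For every `n` in this range the ratio differs from 1 by roughly `n*u`, which is the numerical content of $\gamma_n = nu + O(u^2)$.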
3.5.2. ,
3.3.6
: ... x, y F
x2 + y 2 G
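$z = \frac{x}{\sqrt{x^2 + y^2}}$, $E_{rel}(\hat{t}_1)$:
\[ \begin{aligned}
\hat{t}_1 &= \frac{x}{\hat{t}_3}(1 + \delta_5) \\
&= \frac{x}{\sqrt{\hat{t}_3}\,(1 + \delta_4)}(1 + \delta_5) \\
&= \frac{x}{\sqrt{(\hat{t}_1 + \hat{t}_2)(1 + \delta_3)}\,(1 + \delta_4)}(1 + \delta_5) \\
&= \frac{x}{\sqrt{(x^2(1 + \delta_1) + y^2(1 + \delta_2))(1 + \delta_3)}\,(1 + \delta_4)}(1 + \delta_5),
\end{aligned} \qquad |\delta_j| \le u, \ j = 1 : 5. \]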
. 3.5.1
.
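\[ (x^2(1 + \delta_1) + y^2(1 + \delta_2))(1 + \delta_3) = x^2(1 + \theta_2') + y^2(1 + \theta_2'') = (x^2 + y^2)(1 + \theta_2'''), \qquad |\theta_2'|, |\theta_2''|, |\theta_2'''| \le \gamma_2, \]
\[ \sqrt{(x^2 + y^2)(1 + \theta_2''')} = \sqrt{x^2 + y^2}\,(1 + \theta_2''''), \qquad |\theta_2''''| \le \gamma_2. \]
3.5.1
\[ \begin{aligned}
\hat{t}_1 &= \frac{x\,(1 + \delta_5)}{\sqrt{(x^2(1 + \delta_1) + y^2(1 + \delta_2))(1 + \delta_3)}\,(1 + \delta_4)} \\
&= \frac{x\,(1 + \delta_5)}{\sqrt{(x^2 + y^2)(1 + \theta_2''')}\,(1 + \delta_4)} \\
&= \frac{x\,(1 + \delta_5)}{\sqrt{x^2 + y^2}\,(1 + \theta_2'''')(1 + \delta_4)} \\
&= \frac{x}{\sqrt{x^2 + y^2}}(1 + \theta_4), \qquad |\theta_4| \le \gamma_4,
\end{aligned} \]
\[ \left| \frac{\hat{t}_1 - z}{z} \right| \le \gamma_4. \tag{3.9} \]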
(3.9) z
t1 .
,
3.5.2.
x x
... z = fprog (x) z = f (x ).
kz zk
,
kf (x ) f (x)k. kx xk ,
( ) f (= perturbations) .
, ,
.
!
13 (=
backward error analysis) 14 James Hardy Wilkinson (1919-86),
. Rounding Errors in Algebraic Processes 1963,
(
[36]).
( ) .
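3.5.3. $f(x_1, x_2, x_3) = (x_1 + x_2) + x_3$
$F$. $f_{prog}(x_1, x_2, x_3) = ((x_1 + x_2)(1 + \delta_1) + x_3)(1 + \delta_2)$
\[ f_{prog}(x_1, x_2, x_3) = f(\hat{x}_1, \hat{x}_2, \hat{x}_3) \tag{3.10} \]
\[ \hat{x}_1 = x_1(1 + \delta_1)(1 + \delta_2), \quad \hat{x}_2 = x_2(1 + \delta_1)(1 + \delta_2), \quad \hat{x}_3 = x_3(1 + \delta_2). \]
\[ |\hat{x}_j - x_j| = |x_j(\delta_1 + \delta_2 + \delta_1\delta_2)|, \ j = 1, 2, \qquad |\hat{x}_3 - x_3| = |x_3\delta_2|. \]
\[ |\hat{x}_j - x_j| \le 3u|x_j|, \ j = 1, 2, \qquad |\hat{x}_3 - x_3| \le u|x_3|. \]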
|xj |
= |f (
x1 , x
2 , x
3 ) f (x1 , x2 , x3 )|
j u, j = 3 (j = 1, 2), 3 = 1.
. ( )
; . , f : R
R . ,
fprog (x ) f (x)
= f (xprog ) f (x)
x = x+x kxk .
3.5.1,
\[ f(x + \Delta x) - f(x) = f^{(1)}(x)\Delta x + \frac{f^{(2)}(x + \theta\Delta x)}{2!}(\Delta x)^2, \quad \theta \in (0, 1) \]
13
.
Wilkinson,
(1954) Wallace Givens von Neumann
Goldstine(1947) Turing (1948).
14
f, f (1) , f (2)
x, x + x. (3.8)
f 0 (x)x
, .. | f (x) |
y
x.
3.5.4. f (x1 , x2 , x3 ) = (x1 +
x2 ) + x3 . () f (x1 , x2 ) = x1 +
x2 . ,
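$f_{prog}(x_1, x_2) = f(x_1(1 + \delta_1), x_2(1 + \delta_1))$. $f(x + h) \approx f(x) + [1, 1]h$,
$h = [x_1\delta_1, x_2\delta_1]^\top$,
\[ \left| \frac{f(x + h) - f(x)}{f(x)} \right| \approx \frac{|x_1\delta_1 + x_2\delta_1|}{|x_1 + x_2|} \le \frac{\|h\|}{|x_1 + x_2|} \]
$(x_1, x_2)$
$x_1 + x_2$, $\|h\|$. , $x_1, x_2$
, .. , $\|h\| = |x_1\delta_1 + x_2\delta_1| \le u|x_1 + x_2|$
\[ \left| \frac{f(x + h) - f(x)}{f(x)} \right| \le u. \]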
$s_n = \alpha_n$
for $k = n - 1 : -1 : 0$
    $s_k = x s_{k+1} + \alpha_k$
end
... 3.5.1
:
\[ \hat{s}_0 = f_{prog}(\alpha_0, \dots, \alpha_n, x) = f(\alpha_0(1 + \theta_1), \dots, \alpha_n(1 + \theta_{2n}), x) \]
. 2n
,
$|\hat{\alpha}_j - \alpha_j| \le \gamma_{2n}|\alpha_j|$ ,
... .
.
.
\[ \frac{|p(x) - \hat{s}_0|}{|p(x)|} \le \gamma_{2n} \frac{\sum_{k=0}^{n} |\alpha_k||x|^k}{|p(x)|}, \]
.
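The recurrence and the a priori bound above can be sketched in MATLAB as follows. The polynomial $(x-1)^3$, the evaluation point and the names `a`, `gamma2n` and `bound` are illustrative assumptions, chosen because the exact value is easy to obtain independently near the triple root.

```matlab
% Horner's rule with the a priori bound gamma_{2n} * sum |a_k| |x|^k.
a = [-1, 3, -3, 1];          % hypothetical coefficients a_0..a_n of (x - 1)^3
x = 1.0001;                  % evaluation point close to the triple root
n = numel(a) - 1;

s = a(n + 1);                % s_n = a_n
for k = n-1:-1:0
    s = x*s + a(k + 1);      % s_k = x*s_{k+1} + a_k
end

u       = eps/2;                                   % unit roundoff
gamma2n = 2*n*u / (1 - 2*n*u);
bound   = gamma2n * sum(abs(a) .* abs(x).^(0:n));  % bound on |p(x) - s_0|
ref     = (x - 1)^3;                               % reference value, factored form
err     = abs(s - ref);
```

Near the root the bound is far larger than $|p(x)|$ itself, so the relative error of `s` may be large even though the absolute error respects the bound.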
3.6 .
.
.
,
(3.7) :
1. .
.
2.
()
. .
, , (3.7).
, f
x . x 6= x
kx x k kx x k/kxk.
. , .
.
3.5.3. (= condition number) Alan Turing 1948 Rounding-off
errors in matrix processes [34]. John Rice [31].
3.5.4.
,
!
(. [36]
[31]). , (.2)
.
, x = (1 , . . . , m ), y = (1 , . . . , n ) y = f (x).
x y = f (x ).
fi
|x
x, mn ij =
j
f x. ,
ij f x
f .
, mn
. ,
Kj |j j | Kj kj j |.
n
x. ,
K ky yk2 Kkx xk2
|j j |. K
x.
.
. Kj
|j j |
kx xk2
Kj
.
j
kxk2
. , j
. Kj
|j j |
kx xk
Kj
.
kyk2
kxk2
K
ky yk2
kx xk
K
.
kyk2
kxk2
, :
3.5.3 (Rice [31]). X, Y
f : X Y , .
x y := f (x ). x , y
X, Y .
f x x
\[ cond(f; x^*) := \lim_{\epsilon \to 0} \sup_{\|h\| = \epsilon} \frac{\|f(x^* + h) - f(x^*)\| / \|f(x^*)\|}{\|h\| / \|x^*\|} \]
.
f (x )
x . ,
cond(f ; x ) khk
kx k
kf (x +h)f (x )k
.
kf (x )k
. :
15 x
\[ cond(f; x^*) = \frac{\|x^*\|}{\|y^*\|} \left\| \frac{\partial f}{\partial x}\Big|_{x^*} \right\|. \]
X = Rn Y = Rm , Frechet f .
f
,
sup
khk=
kf (x + h) f (x )k
kf (h)k
= sup
khk
khk= khk
x .
, f ( ).
:
\[ cond(f; x^*) = \frac{\|x^*\|}{\|y^*\|} \|f'\| \]
(3.7).
, :
3.5.1. f fprog
x, xprog
3.5.4. (3.5.1).
/ ()
cond(fprog )
kx xprog k
cond(fprog )u.
kxk
\[ \frac{\|f_{prog}(x^*) - f(x)\|}{\|f(x)\|} \le \frac{\|f(x_{prog}) - f(x^*)\|}{\|f(x)\|} + \frac{\|f(x^*) - f(x)\|}{\|f(x)\|}. \]
15
Frechet.
:
kf (xprog ) f (x )k
kf (x)k
kf (xprog ) f (x )k
kf (x )k
kx xprog k
cond(f ; x )
kx k
kf (x ) f (x)k
kf (x)k
kx xk
kxk
cond(f ; x)E.
cond(f ; x)
kfprog (x ) f (x)k
kf (x)k
. , .
:
) , . cond(fprog )
.
.
) , . cond(f ; x)
) E , .
, E , :
<
3.5.5.
, .. $f := f_2 \circ f_1$.
$cond(f; x^*) \le cond(f_2; y^*)\, cond(f_1; x^*)$.
f , .. f1
f2 (
).
3.5.6. ,
John Rice [31].
3.5.7.
3.5.1, kzprog zk .
3.5.8. f ([A; x]) = Ax A .
x.
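\[ \sup_{\|h\| \neq 0} \frac{\|f(x + h) - f(x)\|}{\|h\|} = \sup_{\|h\| \neq 0} \frac{\|Ah\|}{\|h\|} = \|A\|, \qquad cond(f; x) = \frac{\|x\|}{\|Ax\|}\|A\| \]
\[ \sup_{\|h\| \neq 0} \frac{\|f(y + h) - f(y)\|}{\|h\|} = \sup_{\|h\| \neq 0} \frac{\|A^{-1}h\|}{\|h\|} = \|A^{-1}\|, \]
\[ cond(f; y) = \frac{\|y\|}{\|A^{-1}y\|}\|A^{-1}\| = \frac{\|Ax\|}{\|x\|}\|A^{-1}\| \le \|A\|\|A^{-1}\| =: \kappa(A) \]
$A$, $y$: $cond(f; y) = \kappa(A)$,
$\kappa(A)$, $A$.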
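The quantities in this example can be computed directly. The sketch below uses the 6-by-6 Hilbert matrix and the vector of ones as hypothetical test data; forming `inv(A)` explicitly is done here only to mirror the formula $\kappa(A) = \|A\|\|A^{-1}\|$ and is not how one would normally estimate a condition number.

```matlab
% Condition number of A and the condition of the map x -> A*x.
A = hilb(6);                              % ill-conditioned test matrix
kappa         = norm(A, 2) * norm(inv(A), 2);   % ||A|| ||A^{-1}||
kappa_builtin = cond(A, 2);                     % built-in, via singular values

x      = ones(6, 1);                            % hypothetical data vector
cond_x = norm(x) * norm(A, 2) / norm(A*x);      % cond(f; x) for f(x) = A*x

fprintf('kappa = %.3e, cond(A) = %.3e, cond(f;x) = %.3e\n', ...
        kappa, kappa_builtin, cond_x);
```

For this particular `x` the map $x \mapsto Ax$ is far better conditioned than $\kappa(A)$ suggests, since $\kappa(A)$ is the worst case over all data.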
3.5.10. ,
. , ,
, .
. ,
. ,
, 2- max /min ,
.
3.5.11. , (A)
n.
.
3.5.12.
, ..
Hilbert, Vandermonde .. Test Matrix Toolbox MATLAB [18]
W. Gautschi Vandermonde.
3.5.13. (A)
. , (A)
.
3.5.7.
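\[ cond(f; X) = \frac{\|X\|}{|x^\top y|} \left\| \frac{\partial f}{\partial X}\Big|_{[x;y]} \right\|, \qquad \frac{\partial f}{\partial X}\Big|_{[x;y]} = [y; x]^\top \in \mathbb{R}^{1 \times 2n}, \]
\[ cond(f; X) = \frac{\|[x; y]\|}{|x^\top y|}\,\|[y; x]\| = \frac{\|[x; y]\|_2^2}{|x^\top y|} \]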
3.6
, ,
.
.
J. Wilkinson The perfidious polynomial16 [35].
, , ,
Horner
.
.
.
\[ p(x) := \sum_{k=1}^{n} \alpha_k \phi_k(x), \]
16
$\{\phi_k\}_{k=1:n}$, $P_{n-1}$
, .
p(x) k ;
p(x);
3.6.1
. ,
. ,
.
( ).
,
x
p(a; x) = 0, a (..
) :
a
p(a + a, x
) = 0,
x
= 0}
= inf{|kak kak, p(a + a; )
= 0 p(a; )
+ p(a; )
=0
p(a + a; )
..., n ]>
0 = p(a; x
) + [0 , . . . , n ][1, ,
kak kak,
|p(a; )|
..., n ]kD kak
k[1, ,
kkD
(. .1 ).
3.6.2
p(a; x) =
P
n
k
k=0 k x . , i + i , i = i ,
j :
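$\Delta a = [0, \dots, 0, \Delta\alpha_i, 0, \dots, 0]$:
\[ 0 = p(a + \Delta a; \xi_j + \Delta\xi_j) = p(a; \xi_j + \Delta\xi_j) + p(\Delta a; \xi_j + \Delta\xi_j) \approx \underbrace{p(a; \xi_j)}_{0} + \Delta\xi_j \frac{dp}{dx}(a; \xi_j) + \Delta\alpha_i \xi_j^i \tag{3.12} \]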
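\[ \left| \frac{\Delta\xi_j}{\xi_j} \right| \approx \frac{|\Delta\alpha_i \xi_j^i|}{|\xi_j \frac{dp}{dx}(a; \xi_j)|} = \frac{|\alpha_i \xi_j^i|}{|\xi_j \frac{dp}{dx}(a; \xi_j)|} \frac{|\Delta\alpha_i|}{|\alpha_i|} \tag{3.13} \]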
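$\alpha_i + \Delta\alpha_i$, $i = 1 : n$
\[ 0 = p(a + \Delta a; \xi_j + \Delta\xi_j) = p(a; \xi_j + \Delta\xi_j) + p(\Delta a; \xi_j + \Delta\xi_j) \approx \underbrace{p(a; \xi_j)}_{0} + \Delta\xi_j \frac{dp}{dx}(a; \xi_j) + \sum_{k=0}^{n} \Delta\alpha_k \xi_j^k \tag{3.14} \]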
(3.13) (3.14)
.
, , .
:
3.6.1. z = z pn Pn
pn (z; ) := pn (z) + g(z) g Pn .
pn (z; ) z()
\[ \left| z(\epsilon) - z^* + \epsilon \frac{g_n(z^*)}{p_n^{(1)}(z^*)} \right| = O(\epsilon^2). \]
z m, m pn (z; )
|z() z [
m!gn (z )
(m)
pn (z )
]m 1/m | = O(2/m ).
z :=
gn (z )
(1)
pn (z )
, pn (z; )
z() z , O().
pn (z; ). pn (z; )
pn (z; 0)
.
m ,
m!g (z )
| (m)n ]m |1/m . ,
pn
(z )
. ,
17 .
.
17
(= bifurcation).
.
J. Wilkinson The perfidious polynomial [35].
.
, .
, .. Lagrange Newton.
Walter Gautschi, .. [11].
3.6.1 (Wilkinson . [35]).
( ) J.H. Wilkinson.
\[ p_n(x) := x^n + \sum_{j=0}^{n-1} \alpha_j x^j, \]
j pn j =
j, (j = 1 : n). 1 = 1 ,
,
.
condj =
(j + n)! j n j!
,
(j!)2 (n j)!
j = 1 : n.
3.6.1. Wilkinson 1 = 1 n = 2, 10
.
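Wilkinson's classical experiment with the polynomial whose roots are $1, \dots, 20$ can be repeated in a few lines. This sketch assumes $n = 20$ and the perturbation $2^{-23}$ of the coefficient of $x^{19}$ used in Wilkinson's original account [35]; no particular numerical output is claimed, since it depends on the machine and on the `roots` implementation.

```matlab
% Sensitivity of the roots of prod_{j=1}^{20} (x - j) to coefficient changes.
n  = 20;
p  = poly(1:n);                      % monomial coefficients (already rounded)
r  = roots(p);                       % roots via the companion matrix
err = max(abs(sort(r) - (1:n)'));    % deviation from the exact roots 1..n

p2    = p;
p2(2) = p2(2) - 2^(-23);             % perturb the coefficient of x^19
r2    = roots(p2);
max_imag = max(abs(imag(r2)));       % perturbed roots leave the real axis

fprintf('max root error = %.2e, max |imag| after perturbation = %.2f\n', ...
        err, max_imag);
```

Even the unperturbed computation typically loses many digits, because merely storing the coefficients in double precision already perturbs them.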
3.7
[17] (
James Demmel.)
z = f (a) f : Rn Rm p :
z = f (a), x1 = a Rn
x2 = g1 (x1 ) = [x1 ; 1 ]
1 x1 .
x3 = g2 (x2 ) = [x2 ; 2 ]
xp+1
p+1
z = Ix
gp (xp ) = [xp ; p ]
I Rm(n+p+1) () .
Rn g1 gp gp+1 I Rm
x1 := a xk+1 = gk (xk ), k = 1 : p
xk
k
k R
\[ \hat{x}_{k+1} = g_k(\hat{x}_k) + \delta x_{k+1} \]
xk+1 k .
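\[ \begin{aligned}
\hat{x}_2 &= g_1(a) + \delta x_2 \\
\hat{x}_3 &= g_2(\hat{x}_2) + \delta x_3 \\
&= g_2(g_1(a) + \delta x_2) + \delta x_3 \\
&\approx g_2(g_1(a)) + J_{g_2}\delta x_2 + \delta x_3 \\
\hat{x}_4 &\approx g_3(g_2(g_1(a))) + J_{g_3}J_{g_2}\delta x_2 + J_{g_3}\delta x_3 + \delta x_4
\end{aligned} \]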
z =
p ( ((g1 (a) ) + Jg Jg x2 +
I[g
p
2
+Jgp Jg3 x3 + + Jgp xp + xp+1 ]
I m 0 1 (
).
x2
.
g Jg , . . . , Jg , I]
z = f (a) + I[(J
..
p
2
p
xp+1
.
= f (a) + Jh
z f (a) = Jh.
z = f (a + a) = f (a) + Jf a = f (a) + Jh
a q =
pn + p(p + 1)/2,
, .
[4, 24, 25, 27, 26].
ADIFOR [14, 13, 3].
3.8
3.8.1. bit
. . 3.2.2.
3.8.2 (, , 03).
.
;
. . 3.3.
3.8.3. ;
. [, . 3.2]
1 ..., 1+ .
IEEE 1.0 01 1.0 0
. t 1
( ) (t1) =
1t .
3.8.4.
;
. [, . 3.2] , , ... ,
,
( )
...
1 ...
.
, M = 2u.
3.8.5 (Burden and Faires [32]). ( ).
$p^*$, $p$, $10^{-4}$
$p$: ) $\pi$, ) $e$, ) $7^{1/3}$.
. ) $|p^* - \pi| / |\pi| \le 10^{-4}$
$\pi(1 \pm 10^{-4})$. pi
MATLAB ,
[3.14127849432443, 3.14190681285515].
) , $\exp(1)(1 \pm 10^{-4})$
[2.71801000027620, 2.71855365664189].
) , $7^{1/3}(1 \pm 10^{-4})$
[1.91273988965411, 1.91312247589067].
3.8.6 (, , 02-makeup).
MATLAB
.
7.0. (.:
).
kxk2 kxk1 .
. :
\[ \|x\|_1 = \sum |\xi_j| \le \Big( \sum |\xi_j|^2 \Big)^{1/2} \Big( \sum 1 \Big)^{1/2} \]
Cauchy-Schwarz . ,
\[ \|x\|_1^2 = \Big( \sum |\xi_j| \Big)^2 \ge \sum |\xi_j|^2 \]
.
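3.8.9 (, , 03). $Ax = b$, $\hat{x}$
, $r := b - A\hat{x}$,
$\eta := \frac{\|r\|}{\|A\|\|\hat{x}\| + \|b\|} = 1.5 \times 10^{-13}$ (
).
$\kappa_2(A) = 10^6$.
$\frac{\|x - \hat{x}\|}{\|x\|}$.
. ( Rigal-Gaches),
.
, :
\[ \frac{\|x - \hat{x}\|}{\|x\|} \le \frac{2\eta\,\kappa(A)}{1 - \eta\,\kappa(A)} \le 3 \times 10^{-7} / (1 - 1.5 \times 10^{-7}) \approx 3 \times 10^{-7}. \]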
18
http://www.mathworks.com/products/new_products/latest_features.html#ML.
p                p*
0.300 x 10^1     0.310 x 10^1
0.300 x 10^-3    0.310 x 10^-3
0.300 x 10^4     0.310 x 10^4
π                22/7
π                3.1416
e                2.718
e^10             22000

p                p*               absolute error    relative error
0.300 x 10^1     0.310 x 10^1     0.1               0.3333 x 10^-1
0.300 x 10^-3    0.310 x 10^-3    0.1 x 10^-4       0.3333 x 10^-1
0.300 x 10^4     0.310 x 10^4     0.1 x 10^3        0.3333 x 10^-1
π                22/7             0.001264          4.025 x 10^-4
π                3.1416           7.346 x 10^-6     2.338 x 10^-6
e                2.718            2.818 x 10^-4     1.037 x 10^-4
e^10             22000            2.647 x 10^1      1.202 x 10^-3
,
.
,
.
3.8.12. ... IEEE ,
( precision) . ... IEEE .
. IEEE , , 24 =
23(+1) ( ) 53 =
52(+1). , m 2E
1 m < 2 1.bs . ... ,
24 . d
, 10d 224 . , d = 7 24 log10 2 = 7.2247198.
10d 253 . , d = 16 53 log10 2 =
15.954589.
3.8.13. : (.
k k2 ) .
. . : Rn R
, , .
J : Rn Rn
J := [
,...,
].
1
n
v
u n
X
u
t
2
j i=1 i
pPnj
2
i=1 i
() $J := \nabla\|x\|_2 = \frac{x^\top}{\|x\|_2}$.
1 : ,
Pn
kxk1 =
j=1 |j |
0. ,
.
. 1t /2. = 2,
IEEE 2t
. > 2 1t /2 =
(/2) t , ,
.
3.8.16. ...
1960 ( M. Overton [30]) 0,
, ..
.
.
. fmax ... ,
a/0b/0 = fmax fmax = 0 a, b 6= 0,
. ,
, program interrupt,
IEEE,
.
3.8.17. IEEE realmin . 1 0
boole.
temp2 = realmin/2;
temp = 2*temp2;
boole = (temp == realmin);
. MATLAB , realmin
realmin = 0010000000000000.
3.8.18. f : Rm Rn
.
.
. , f (
, , ...),
()
() .
.
3.8.19. : G
x f (x) = y ( x, y ),
.
G
G x
G(y)
y
k
y yk ( 0) |G(
y)
x
|/|
x| . G
.
. , -
. x
= G(
y ) |G(
y)
x
|/|
x | = 0.
3.8.20. ,
.
. G x G(x) = y .
x
x
, G(x)
= G(
x). ,
kG(x)
G(x)k
kG(x)
G(x)k = kG(
x) G(x)k.
k
x xk/kxk,
.
G G1
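$x = G^{-1}(y)$,
\[ \begin{aligned}
\frac{\|G(\hat{x}) - G(x)\|}{\|G(x)\|} &= \frac{\|G(\hat{x} - x)\|}{\|G(x)\|} = \frac{\|G(\hat{x} - x)\|}{\|\hat{x} - x\|} \frac{\|\hat{x} - x\|}{\|G(x)\|} \\
&\le \|G\| \frac{\|\hat{x} - x\|}{\|x\|} \frac{\|x\|}{\|G(x)\|} = \|G\| \frac{\|\hat{x} - x\|}{\|x\|} \frac{\|G^{-1}(y)\|}{\|y\|} \\
&\le \|G\|\|G^{-1}\| \frac{\|\hat{x} - x\|}{\|x\|}.
\end{aligned} \]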
, ,
k
xxk
, kxk
kGkkG1 k ( G).
3.8.21. x, y R2
, (. i i 0)
xT y .
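. $x, y$ ... $x_1(1 + \delta_1)$, $x_2(1 + \delta_2)$, $y_1(1 + \delta_3)$, $y_2(1 + \delta_4)$, $|\delta_j| \le u$.
\[ fl(x^\top y) = (\xi_1\eta_1\langle 3 \rangle + \xi_2\eta_2\langle 3 \rangle)\langle 1 \rangle = \xi_1\eta_1\langle 4 \rangle + \xi_2\eta_2\langle 4 \rangle. \]
\[ \frac{|fl(x^\top y) - x^\top y|}{|x^\top y|} \le \gamma_4 \frac{|x|^\top|y|}{|x^\top y|} = \gamma_4, \]
$|x^\top y| = |\xi_1\eta_1 + \xi_2\eta_2| = \xi_1\eta_1 + \xi_2\eta_2 = |x|^\top|y|$.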
3.8.22. ... 32
b0 b1 b8 b9 b31 , , 8 23
. b9 b31
:
. ) 0, $2^{-126}$.
$f_{Nmin1} = 10^{-126\log_{10}2} \approx 1.1754943e-38$. ) ,
, $f_{min1}$: $2^{-126} \times 2^{-23} = 2^{-149} \approx 1.4013e-045$. : ,
52, 11, 1024.
$f_{Nmin2}$: $2^{-1022} \approx 2.2250738e-308$, realmin
$f_{min2} = 2^{-1022-52} \approx 4.9406564e-324$.
3.8.23. , ... IEEE
= y +x
.
, x+y
.
= fl(x + y) = fl(y + x) = y +x
.
... IEEE, x+y
=
x+y
(x + y)(1 + 1 ) y +x = (y + x)(1 + 2 ), 1 = 2 .
3.8.24 (Heath [15]). x y
$\log x - \log y$, $m(x, y) := \log(x) - \log(y)$
. ,
$\log(x) - \log(y) = \log(x/y)$,
$M(x, y) := \log(x/y)$, $m(x, y)$.
.
; ( :
;)
. . ,
. , .. y = 1 log y = 0,
log(x/y) = log(x).
MATLAB . , 2.
x = 256+512*eps y = 256. MATLAB
Warning:
Log of zero.
. )
, $(a + ib)(x + iy) = (ax - by) + i(bx + ay)$, 6 .
, , $s = (a + b)x$; $t = b(x + y)$; $p = y(a - b)$;
$(s - t) + i(t + p)$. 5
3 .
(..
, O(n2 )
O(n3 ).
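The three-multiplication scheme can be verified on a small example. The numerical values below are hypothetical and were picked so that every intermediate result is exactly representable; in general the two formulas agree only up to rounding errors, and the rearranged form can behave differently with respect to cancellation.

```matlab
% Complex product with 3 real multiplications and 5 real additions.
a = 1.5; b = -2;  x = 0.25; y = 4;     % (a + ib)(x + iy), illustrative data

s = (a + b) * x;        % multiplication 1
t = b * (x + y);        % multiplication 2
p = y * (a - b);        % multiplication 3
re3 = s - t;            % real part:      ax - by
im3 = t + p;            % imaginary part: bx + ay

z = (a + 1i*b) * (x + 1i*y);           % reference: built-in complex product
```

Here `re3` is 8.375 and `im3` is 5.5, matching `real(z)` and `imag(z)`.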
3.8.26. fma.
) fma
.
) fma
.
) fma
.
) .
. (). fma
. ,
$z + x \times y$
. $fl(z + x \times y) = (z + x \times y)(1 + \delta)$, $|\delta| \le u$, $fl(z + fl(x \times y)) = (z + x \times y(1 + \delta_1))(1 + \delta_2)$.
3.9
3.9.1. ,
x|
x
R |x
|x| , x
. - - x R
,
|x
x|
. , , |x| .
.
|x
x|
|x| .
|x x
|
|
x|
x
|
x| x x
+ |
x|
< 1 ( )
x, x
> 0.
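\[ \hat{x}(1 - \epsilon) \le x \le \hat{x}(1 + \epsilon), \]
\[ \frac{|x - \hat{x}|}{|x|} = \frac{|x - \hat{x}|}{|\hat{x}|} \frac{|\hat{x}|}{|x|} \le \frac{\epsilon}{1 - \epsilon} = \epsilon + O(\epsilon^2). \]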
,
( ) x
(1 ) x.
x, x
< 0.
3.9.2. ... 5
b0 b1 b2 b3 b4 , , 2 2 .
b3 b4
; )
( NaN, ).
{0, 0.125, 0.250, 0.375, 0.500, 0.625, 0.750, 0.875, 1.000, 1.250, 1.500,
1.750, 2, 2.50, 3.00, 3.50, 4, 5, 6, 7}.
19 0. 0.500 = 21
0.125.
) $t = 2 + 1$, $u = 2^{1-3}/2 = 0.125$
3.9.3. ()
... IEEE . ) 0.5, ) 0.1, ) 0.2, ) 23/4, ) 25/32100 ,
) 25/3 2100 , ) 2000.
. ,
.
x x = m 2E 1 m < 2. 2. ,
E m .
, . )
z}|{
127
= 126 ,
0.1 = 24 1.1001,
1001 -
z}|{
127
. IEEE 4 +
3
= 123
z}|{
0.2 = 23 1.1001.
,
3
$23/4 = 2^{-2} \cdot 23 = 2^{-2}(2^0 + 2^1 + 2^2 + 2^4)$.
$23/4 = 2^{2}(2^{-4} + 2^{-3} + 2^{-2} + 2^{0})$.
$2000 = 2^{10}\left(1 + \frac{976}{1024}\right)$.
, $0.953125 = \frac{976}{1024}$
.
$2 \times 0.953125 = 1.90625$, $2 \times 0.90625 = 1.8125$, $2 \times 0.8125 = 1.625$, $2 \times 0.625 = 1.25$,
$2 \times 0.25 = 0.5$, $2 \times 0.5 = 1.0$ , IEEE
10 +
z}|{
127
= 137 10001001. ,
p(x; ) = (x 1)n
p(x; )
n. p;
3.9.5. p(x) = xn 1
p(x; ) = xn 1
p(x; )
n. p;
3.9.6 (). :
... x, y 1/ x/y ,
...
, = 2 ( ) > 2.
3.9.7. x x
. y = ex y = ex .
y x x
.
3.9.8. 1) , 1% 3%.
. 2)
...
IEEE .
.
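. 1) $\hat{x} = x(1 + \epsilon_x)$, $\hat{y} = y(1 + \epsilon_y)$, $|\epsilon_x| \le 0.01$,
$|\epsilon_y| \le 0.02$.
\[ \hat{x}\hat{y} = xy(1 + \epsilon_x)(1 + \epsilon_y) = xy + xy(\epsilon_x + \epsilon_y + \epsilon_x\epsilon_y) \]
\[ \left| \frac{\hat{x}\hat{y} - xy}{xy} \right| = |\epsilon_x + \epsilon_y + \epsilon_x\epsilon_y| \le 3 \times 10^{-2} + 2 \times 10^{-4} = 0.0302. \]
$x$, $y$. 2)
\[ fl(\hat{x}\hat{y}) = xy(1 + \epsilon_x)(1 + \epsilon_y)(1 + \delta) = xy + xy(\epsilon_x + \epsilon_y + \delta + \epsilon_x\epsilon_y + \epsilon_x\delta + \epsilon_y\delta + \epsilon_x\epsilon_y\delta) \]
$|\delta| \le u$ ( ). , $t = 23$,
$u = 2^{-24} \approx 6 \times 10^{-8}$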
1. $f(x) = x - \sqrt{x^2 - c}$, $x^2 > c$.
2. $f(x) = x^{1/n}$.
3. $f(x_1, x_2) = x_1^2 + x_2^2$
4. $f(x_1, x_2, x_3, x_4) = (x_1 + x_2)/(x_3 - x_4)$.
5. $f(x) = \ln x$, $x > 0$.
.
. ) f 0 (x) = 1 x(x2 c)1/2 . x2 c
. ) f 0 (x) = x1/n1 /n. )
f
f
2
x = 2[x1 , x2 ]. ) x = [1/(x3 x4 ), 1/(x3 x4 ), (x1 + x2 )/(x3 x4 ) , (x1 +
n sin(n arccos x)
.
1x2
3.9.10. )
D = ad bc [a, b; c, d] ( MATLAB )
D| Au + Bu2 , u
: |D
A B a, b, c, d. )
.
. , 3.4.4.
QN 3.9.11. ... P :=
i=1 pi , P
...
.
.
. ,
.
3.9.12 (C. Hoffmann [19]).
...
( ) Dobkin Silver. A
(0, 0), (0, 1), (1, 0), (1, 1 + p), (1 + p, 1) p .
A ,
C . A
B . gin(A)
A B gout(B) A C .
gin(gout(A)) = gout(gin(A)) = A,
2. p
0.5, 0.1, 0.01, 0.001, 0.0001, 0.00001, 0.000001. ,
A
3.9.13.
sAXPY ) , )
.
G(x) := y y = G(x)
G .
kG(
x) yk
kG(x)k
kGkkG1 k
k
x xk
,
kxk
. ( : kGk kGk :=
supx6=0 kG(x)kY /kxkX kG1 k := supy6=0 kG1 (y)kX /kykY ).
. G x G(x) = y x = G1 (y),
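\[ \begin{aligned}
\frac{\|G(\hat{x}) - G(x)\|_Y}{\|G(x)\|_Y} &= \frac{\|G(\hat{x} - x)\|_Y}{\|G(x)\|_Y} = \frac{\|G(\hat{x} - x)\|_Y}{\|\hat{x} - x\|_X} \frac{\|\hat{x} - x\|_X}{\|G(x)\|_Y} \\
&\le \|G\| \frac{\|\hat{x} - x\|_X}{\|x\|_X} \frac{\|x\|_X}{\|G(x)\|_Y} = \|G\| \frac{\|\hat{x} - x\|_X}{\|x\|_X} \frac{\|G^{-1}(y)\|_X}{\|y\|_Y} \\
&\le \|G\|\|G^{-1}\| \frac{\|\hat{x} - x\|_X}{\|x\|_X}.
\end{aligned} \]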
x G(x)
= G(
x). ,
kG(x)
G(x)k
kG(x)
G(x)kY = kG(
x) G(x)kY .
k
x xkX /kxkX , .
, , ,
k
xxk
, kxk kGkkG1 k
( G).
3.9.17.
1
..., n > 0
y = 1;
for j = 1 : n
    y = y ( + j )
end
) . )
y(1 , ..., n , )
Pn
1
P = j=1 +
.
j
. )
fl(y) =
n
Y
j=1
( + j )
n
Y
(1 + j )
j=1
n1
Y
(1 + j )
j=1
j + j
0
j
y = fl(y) fl( + j ).
1. ,
u. ()
n
Y
(1 + j )
j=1
n1
Y
j=1
(2n 1)u
1 (2n 1)u
2 .
i)
fl(y)
n
Y
(1 + 2n1 )
( + j )
j=1
=
=
n
Y
j=1
n
Y
(0 + j ),
j=1
0
(1 + 2n1 )1/n = 1 +
|| < 1. 1 > 2n1 0. 0
1 + 2n1
0
(1 + )n = 1 + n + O( 2 )
n 2n1
2n1
2n1 .
n
(1 2n1 )1/n = 1
1 > 2n1 , > 0.
1 1 1
1 1 1
1
1
2
3
2n1 +
( 1)2n1
( 1)( 2)2n1
+
n
2! n n
3! n n
n
1
1 n 1 2n 1 2
1 n1 2
2n1 +
2n1 +
+
n
2n n
2n n
3n 2n1
1
1
1
3
2n1 +
2n1
<
2n1
2n
2n
1 2n1
2n
ii)
j , j = 1, ..., n n , . . . , . ,
+ j , 2
, j
.
fl(y) =
(1 + 2n1 )
n
Y
( + j )
j=1
( + 1 )(1 + 2n1 )
n
Y
j=2
(0 + 1 )
n
Y
( + j ),
j=2
( + j )
0
|0 |
= |2n1 | 2n1 = (2n 1)u/(1 (2n 1)u)
||
0
|1 1 |
= |2n1 | 2n1 = (2n 1)u/(1 (2n 1)u)
|1 |
.
)
(y) =
k[1 , ..., n , ]k
kJk
|y|
J J R1(n+1)
J =[
y
y y
, ,
,
].
1
n
Y
y
=
( + i ) := pj
j
i6=j
X
y
=
pj .
j=1
Pn
(y)
pj
j=1 ( + j ) j=1
Qn
n
X
j=1
1
+ j
3.10
3.10.1 (Burden and FairesP
[32]). ex ,
x
x
.
x, e = j=0 j!
e,
P5 1
P10 1
: ) e j=0 j!
. ) e j=0 j!
.
. Maclaurin,
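\[ \Big| e - \sum_{j=0}^{n} \frac{1}{j!} \Big| = \frac{e^{\xi}}{(n + 1)!} \]
$\xi \in [0, 1]$. $e^x$ ,
\[ \Big| e - \sum_{j=0}^{5} \frac{1}{j!} \Big| \le \frac{e}{6!} \approx 3.775e-03, \]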
|e
P5
1
j=0 j! |
e
1.388e 03
6!e
, = 1
1.390e 03
P5 1
e j=0 j!
, (.
3.9.1). , e
e
(n+1)! ,
.. e 2.72, .
3.10.2. , n+1 Maclaurin , .. , exp(x) ( Maclaurin
Pn xn,
Mn (x)), .. ex Mn (x) = j=0 j!
exp(x).
, Maclaurin exp(x)
x ( ). 9- Maclaurin IEEE
Pn (5)j
e5 : ) e5 Mn (5) :=
j=0 j! . )
1
1
5
5
e = 1/e Mn (5) = Pn 5j . (.. MATLAB )
j=0 j!
n = 1, 2, ..., 15. ,
exp(x) exp(x) (
, , ,
exp(x)).
;
3.10.3. ( MATLAB)
... . machar.f:
a = 1.0; b = 1.0;
while ((a+1.0)-a)-1.0 == 0.0, a = 2.0*a; end
while ((a+b)-a)-b ~= 0.0, b = b+1.0; end
1. ( ) (
b).
2. .
3.10.4.
( mantissa.)
beta := ...
t = 0; b = 1.0;
[1] O. Aberth. Precise Numerical Methods Using C++. Academic Press, San Diego, 1998.
[2] D. Alpert and D. Avnon. Architecture of the Pentium microprocessor. IEEE Micro, pages 11-21, June 1993.
[3] C. Bischof, A. Carle, and A. Mauer. Adifor 2.0: Automatic differentiation of Fortran 77 programs. IEEE Computational Science & Engineering Mag., 3(3):18-32, 1996.
[4] B. Bliss, M.-C. Brunet, and E. Gallopoulos. Automatic program instrumentation with applications in performance and error analysis. In E.N. Houstis, J.R. Rice, and R. Vichnevetsky, editors, Expert Systems for Scientific Computing, pages 235-260. Elsevier Science Pub. B. V. (North-Holland), 1992.
[5] F. Chaitin-Chatelin and V. Fraysse. Lectures on Finite Precision Computations. SIAM, Philadelphia, 1996.
[6] W.J. Cody. Algorithm 665: MACHAR: A subroutine to dynamically determine machine parameters. ACM Trans. Math. Softw., 14(4):303-311, 1988.
[7] T. Coe, T. Mathisen, C. Moler, and V. Pratt. Computational aspects of the Pentium affair. IEEE Computational Science & Engineering Mag., 2(1):18-30, 1995.
[8] J. Demmel. Underflow and the reliability of numerical software. SIAM J. Sci. Stat. Comput., 5(4), 1984.
[9] U. Eco. Dire quasi la stessa cosa. Esperienze di traduzione. R.C.S. Libri S.p.A., Milano, Bompiani, 2003 (Greek edition, 2003).
[10] A. Edelman. The mathematics of the Pentium division bug. SIAM Review, 39:54-67, 1997.
[11] W. Gautschi. Questions of numerical condition related to polynomials. In G.H. Golub, editor, Studies in Numerical Analysis, volume 24, pages 140-177. Mathematical Association of America, 1984.
[12] D. Goldberg. What every computer scientist should know about floating point arithmetic. ACM Comput. Surveys, pages 5-48, 1991.
[13] A. Griewank. On automatic differentiation. In M. Iri and K. Tanabe, editors, Mathematical Programming: Recent Developments and Applications, pages 83-108. Kluwer Academic Pub., 1989.
[14] A. Griewank and G. F. Corliss, editors. Automatic Differentiation of Algorithms: Theory, Implementation and Application. SIAM, Philadelphia, 1991.
[15] M.T. Heath. Scientific Computing: An Introductory Survey. McGraw Hill, Boston, 2nd edition, 2001.
[16] J.L. Hennessy and D.A. Patterson. Computer Architecture: A Quantitative Approach. Morgan Kaufmann, San Mateo, CA, first edition, 1990.
[17] N.J. Higham. Accuracy and Stability of Numerical Algorithms. SIAM, Philadelphia, 2nd edition, 2002.
[18] N.J. Higham. The Test Matrix Toolbox for MATLAB (version 3.0). Technical Report 276, Manchester Centre for Computational Mathematics, Sept. 1995.
[19] C.M. Hoffmann. The problems of accuracy and robustness in geometric computation. IEEE Computer, 22(3):31-41, 1989.
[20] W. Kahan. Lecture notes on the status of IEEE standard for binary floating-point arithmetic. URL http://http.cs.berkeley.edu/~wkahan/ieee754status/ieee754.ps, May 1996. Work in progress.
[21] W. Kahan and J.D. Darcy. How Java's floating-point hurts everyone everywhere. Originally presented at 1998 ACM Workshop on Java for high-performance computing, 1998. Available electronically at http://www.cs.berkeley.edu/~wkahan/JAVAhurt.pdf.
[22] D. E. Knuth. The Art of Computer Programming: Seminumerical Algorithms, volume 2. Addison-Wesley, 1981.
[23] D.J. Kuck. The Structure of Computers and Computations. Wiley, 1978.
[24] J. L. Larson and A.H. Sameh. Efficient calculation of the effects of roundoff errors. ACM Trans. Math. Softw., 4(3):228-236, September 1978.
[25] S. Linnainmaa. Error linearization as an effective tool for experimental analysis of the numerical stability of algorithms. BIT, 23:346-359, 1983.
[26] W. Miller. The engineering of numerical software. Prentice-Hall, Inc., 1984.
[27] W. Miller and C. Wrathall. Software for roundoff analysis of matrix algorithms. Academic Press, New York, 1980.
[28] R. E. Moore. Interval Analysis. Prentice-Hall, Englewood Cliffs, N.J., 1966.
[29] E. Morin. Les sept savoirs nécessaires à l'éducation du futur (Greek edition, 2000).
[30] M.L. Overton. Numerical Computing with IEEE Floating Point Arithmetic. SIAM, Philadelphia, 2001.
[31] J. Rice. A theory of condition. SIAM J. Numer. Anal., 3(2):287-311, 1966.
[32] R.L. Burden and J.D. Faires. Numerical Analysis. Brooks Cole, Boston, 5th edition, 1993.
[33] W. Rudin. Principles of Mathematical Analysis. McGraw Hill, 1976.
[34] A.M. Turing. Rounding-off errors in matrix processes. Quart. J. Mech., 1:287-308, 1948.
[35] J.H. Wilkinson. The perfidious polynomial. In G.H. Golub, editor, Studies in Numerical Analysis, volume 24, pages 1-28. Mathematical Association of America, 1984.
[36] J.H. Wilkinson. Rounding Errors in Algebraic Processes. Dover, 1994 (First published in 1963).
4.1
1 ( 1.1)
. :
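.0: Given $A \in \mathbb{R}^{n_1 \times n_3}$, $B \in \mathbb{R}^{n_3 \times n_2}$, $C \in \mathbb{R}^{n_1 \times n_2}$, compute $C + AB$.
.1: Given $A \in \mathbb{R}^{n \times n}$ and $b \in \mathbb{R}^n$, find $x \in \mathbb{R}^n$ such that $Ax = b$.
.2: Given $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$, find $x \in \mathbb{R}^n$ that minimizes $\|b - Ax\|_2$.
.3: (eigenvalue problem) Given $A \in \mathbb{R}^{n \times n}$, find $\lambda \in \mathbb{C}$ and $x \in \mathbb{C}^n$ such that $Ax = \lambda x$.
.4: Given $A, B \in \mathbb{R}^{n \times n}$, find $\lambda \in \mathbb{C}$ and $x \in \mathbb{C}^n$ such that $Ax = \lambda Bx$.
.5: (SVD = Singular Value Decomposition). Given $A \in \mathbb{R}^{m \times n}$, find $\Sigma \in \mathbb{R}^{m \times n}$ and $U \in \mathbb{R}^{m \times m}$, $V \in \mathbb{R}^{n \times n}$ such that $A = U \Sigma V^\top$.
.6: Given $A \in \mathbb{R}^{n \times n}$, $B \in \mathbb{R}^{n \times m}$ and a function $f$, compute $f(A)B$.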
B ( f (A)
f (A)B .
. ,
.
c
112 4. 2008,
.
4.1.1.
NASTRAN . URL www.noraeng.com.
( = normal modes).
Kx = f
K ( = stiffness matrix),
, x , f . x f .
2
Ku 2 M ddt2u = 0,
d2 u
dt2
.
NASTRAN
( wind-tunnel) Ames Center NASA
Moffett Field . ,
3.70 21 .
.
1940 40 , () () .
, ,
,
, .
73.5 PSI. , 28 106 .
217918 18
, 10 . [24]
(
1985 Cray Y-MP) 1837 1500
I/O Kx = f 4700
5400 I/O
10 .
4.1.1
, , , .
MATLAB. :
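For integers $i_{\min}$, $i_{\max}$, $i_{\mathrm{incr}}$, the notation $i_{\min} : i_{\max}$ denotes $[i_{\min}, i_{\min}+1, \ldots, i_{\max}]$, and $i_{\min} : i_{\mathrm{incr}} : i_{\max}$ denotes
$$[\,i_{\min},\ i_{\min}+i_{\mathrm{incr}},\ i_{\min}+2i_{\mathrm{incr}},\ \ldots,\ i_{\min}+i_{\mathrm{incr}}\lfloor (i_{\max}-i_{\min})/i_{\mathrm{incr}} \rfloor\,].$$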
iincr (= stride). imin
imax , :
imin : imax . imin : iincr : imax
.
MATLAB Fortran 90 , . imin
c]
[imin , imin + iincr , imin + 2iincr , . . . , imin + iincr b imax
iincr
4.1.3.
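$[10 : 14] = [10\ 11\ 12\ 13\ 14]$,
$[10 : 2 : 14] = [10\ 12\ 14]$,
$$A(2:3,\ 2:2:4) = \begin{pmatrix} A(2,2) & A(2,4) \\ A(3,2) & A(3,4) \end{pmatrix} = \begin{pmatrix} a_{22} & a_{24} \\ a_{32} & a_{34} \end{pmatrix}.$$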
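The colon notation can be mimicked in a few lines of Python. This is a minimal sketch under stated assumptions: the helper names `colon` and `submatrix` are hypothetical, only a positive stride is handled, and indices are 1-based as in MATLAB.

```python
def colon(i_min, i_max, i_incr=1):
    """Return [i_min, i_min + i_incr, ...], never exceeding i_max (positive stride only)."""
    if i_incr <= 0:
        raise ValueError("this sketch only handles a positive stride")
    # Number of terms: floor((i_max - i_min) / i_incr) + 1, or 0 for an empty range.
    count = (i_max - i_min) // i_incr + 1 if i_max >= i_min else 0
    return [i_min + k * i_incr for k in range(count)]


def submatrix(A, rows, cols):
    """Extract A(rows, cols) from a list-of-lists matrix, using 1-based indices."""
    return [[A[i - 1][j - 1] for j in cols] for i in rows]


if __name__ == "__main__":
    # Entry (i, j) holds the two-digit number "ij", so the result is easy to read.
    A = [[10 * i + j for j in range(1, 5)] for i in range(1, 5)]
    print(colon(10, 14), colon(10, 14, 2))
    print(submatrix(A, colon(2, 3), colon(2, 4, 2)))
```

The last call reproduces the submatrix $A(2:3, 2:2:4)$ of the example.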
4.1.2
,
. , , .
,
, .
.
, , 1 .
( )
( ).
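4.1.1. Let $A$ be $m \times n$ and let $1 \le i_1 < i_2 < \cdots < i_k \le m$ and $1 \le j_1 < j_2 < \cdots < j_l \le n$.
The $k \times l$ matrix with elements $a_{i_p j_q}$ is a submatrix of $A$. If $k = l$ and $i_1 = j_1, \ldots, i_k = j_k$, the submatrix is called principal.
If in addition $i_1 = 1, \ldots, i_k = k$, it is called leading.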
A, .. (, , , .) .
4.1.2. A m n
( )
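$$A = \begin{pmatrix} A_{11} & A_{12} & \cdots & A_{1l} \\ A_{21} & A_{22} & \cdots & A_{2l} \\ \vdots & \vdots & \ddots & \vdots \\ A_{k1} & A_{k2} & \cdots & A_{kl} \end{pmatrix}$$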
1
blocks, 2, 1.
where each $A_{ij}$ is an $m_i \times n_j$ submatrix of $A$.
: (conformally partitioned),
, . ,
, :
B11
B21
B12
B22
C11
.
C21
C12
C22
4.2
, , .. , , . )
, ,
)
)
.
2 .
.0 :
.0: Given $A \in \mathbb{R}^{n_1 \times n_3}$, $B \in \mathbb{R}^{n_3 \times n_2}$, $C \in \mathbb{R}^{n_1 \times n_2}$, compute $C + AB$.
4.2.1
:
_DOT
x, y Rn , ()
:= x> y . , x, y Cn := x y .
= 2n min = 2n + 2.
,
, = min
= min ,
3n + O(1).
. DOT
2
, .. [28].
, (=
reductions).
.
sAXPY
, x, y Rn , [s,d]AXPY
y y + x.
= 2n min = 3n + 1 . min 1
, . min = 32 + 2n
. sAXPY
single-precision a x plus y. (triad).
, sAXPY SAXPY
. )
( ).
,
DOT, sAXPY + x> y y y + x
x, y Rn {j , yj }n
j=1
3
.
DOT
for i = 1 : n
    s = s + x(i) y(i)
end

sAXPY
for i = 1 : n
    y(i) = y(i) + a x(i)
end
,
.
,
,
. Load/Store,
, :
DOT
LOAD x, y, s
for i = 1 : n
    s = s + x(i) y(i)
end
STORE s

sAXPY
LOAD x, y, a
for i = 1 : n
    y(i) = y(i) + a x(i)
end
STORE y
. , O(n),
.
, .
3
MATLAB , APL .
. .. MATLAB DOT x0 y sAXPY
a x + y .
DOT
LOAD s
for i = 1 : n
    LOAD x(i), y(i)
    s = s + x(i) y(i)
end
STORE s

sAXPY
LOAD a
for i = 1 : n
    LOAD x(i), y(i)
    y(i) = y(i) + a x(i)
    STORE y(i)
end
DOT = 2n + 2 = 2n
O(1) sAXPY
= 3n + 1 = 2n O(1)
.
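The counts $2n + 2$ and $3n + 1$ for the element-wise versions can be checked by instrumenting the two loops. The sketch below is only an illustration: the function names and the simple counting model (one memory reference per LOAD or STORE of a scalar) are assumptions, not part of the original notes.

```python
def dot_counted(x, y):
    """Inner product with element-wise loads; returns (value, flops, memory references)."""
    refs = 1                      # LOAD the accumulator s
    flops = 0
    s = 0.0
    for xi, yi in zip(x, y):
        refs += 2                 # LOAD x(i), y(i)
        s += xi * yi              # one multiplication and one addition
        flops += 2
    refs += 1                     # STORE s
    return s, flops, refs


def axpy_counted(a, x, y):
    """y <- y + a*x with element-wise loads and stores; returns (y, flops, memory references)."""
    refs = 1                      # LOAD the scalar a
    flops = 0
    out = list(y)
    for i, xi in enumerate(x):
        refs += 2                 # LOAD x(i), y(i)
        out[i] = out[i] + a * xi  # one multiplication and one addition
        flops += 2
        refs += 1                 # STORE y(i)
    return out, flops, refs


if __name__ == "__main__":
    print(dot_counted([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0, 1.0]))
    print(axpy_counted(2.0, [1.0, 2.0, 3.0], [1.0, 1.0, 1.0]))
```

For vectors of length $n$ both kernels perform $2n$ floating-point operations, while the memory references are $2n + 2$ for DOT and $3n + 1$ for sAXPY, because sAXPY must also write back every element of $y$.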
min ,
.
DOT sAXPY
DOT sAXPY. DOT, . ,
, 3, ( )
, (
, )
)
.
,
DOT 3.5.7 , cond(f ; X)
k[x;y]k2
.
|x> y|
>
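. Let $s_n = x^\top y$. Then
$$\hat s_1 = fl(x_1 y_1) = x_1 y_1 (1 + \delta_1),$$
$$\hat s_2 = fl(\hat s_1 + fl(x_2 y_2)) = \big(x_1 y_1 (1 + \delta_1) + x_2 y_2 (1 + \delta_2)\big)(1 + \delta_3) = x_1 y_1 (1 + \delta_1)(1 + \delta_3) + x_2 y_2 (1 + \delta_2)(1 + \delta_3),$$
where $|\delta_i| \le u$. In general,
$$\hat s_n = x_1 y_1 \prod_{j=1,\, j \ne 2}^{n+1} (1 + \delta_j) + x_2 y_2 \prod_{j=2}^{n+1} (1 + \delta_j) + x_3 y_3 \prod_{j=3}^{n+1} (1 + \delta_j) + \cdots + x_n y_n \prod_{j=n}^{n+1} (1 + \delta_j).$$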
$$\gamma_n := \frac{nu}{1 - nu} = nu\,\big(1 + nu + (nu)^2 + \cdots\big) = nu + O(u^2)$$
3.5.1 :
$$\hat s_n = x_1 y_1 (1 + \theta_n) + x_2 y_2 (1 + \theta_n') + x_3 y_3 (1 + \theta_{n-1}) + \cdots + x_n y_n (1 + \theta_2).$$
:
DOT
$x_1, \ldots, x_n$, $y_1 (1 + \theta_n)$, $y_2 (1 + \theta_n')$, ..., $y_n (1 + \theta_2)$, where
$$|\theta_j| \le \gamma_j = \frac{ju}{1 - ju}.$$
n
X
i=1
>
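The representation of the computed inner product with factors $(1 + \theta_j)$, $|\theta_j| \le \gamma_n$, implies the forward bound $|\hat s_n - x^\top y| \le \gamma_n \sum_i |x_i y_i|$. A minimal sketch that checks this bound on one data set follows; it assumes IEEE double precision with round to nearest (unit roundoff $u = 2^{-53}$) and no overflow or underflow, and the function name is hypothetical.

```python
from fractions import Fraction


def dot_forward_error_check(x, y):
    """Return (actual error, gamma_n bound) for the recursively summed inner product."""
    u = Fraction(1, 2 ** 53)                  # unit roundoff of IEEE double (assumed)
    n = len(x)
    s = 0.0
    for xi, yi in zip(x, y):                  # floating-point DOT, left to right
        s += xi * yi
    # Exact rational arithmetic on the same floating-point data.
    exact = sum(Fraction(xi) * Fraction(yi) for xi, yi in zip(x, y))
    abs_sum = sum(abs(Fraction(xi) * Fraction(yi)) for xi, yi in zip(x, y))
    gamma_n = n * u / (1 - n * u)
    return abs(Fraction(s) - exact), gamma_n * abs_sum


if __name__ == "__main__":
    x = [0.1 * k for k in range(1, 11)]
    y = [1.0 / (k + 2) for k in range(1, 11)]
    err, bound = dot_forward_error_check(x, y)
    print(float(err), float(bound), err <= bound)
```

In practice the observed error is usually far below the bound, which is a worst-case statement.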
4.2.2
1
rank-1
$$\mu_{\min} = 1 + \frac{1}{2n_1} + \frac{1}{2n_2}. \qquad (4.1)$$
:
for j = 1 : n2
    LOAD b(j)
    for i = 1 : n1
        LOAD c(i,j), a(i)
        c(i,j) = c(i,j) + a(i) b(j)
        STORE c(i,j)
    end
end
j
n2 (2n1 + n1 + 1) . = 3n1 n2 + n2 n2
sAXPY.
a,
2n1 n2 + n1 + n2 .
. , , .
:
LOAD a
for j = 1 : n2
    LOAD b(j)
    for i = 1 : n1
        LOAD c(i,j)
        c(i,j) = c(i,j) + a(i) b(j)
        STORE c(i,j)
    end
end
, ,
n1 , n2
.
.
min
;
C, a, b :
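$$\begin{pmatrix} C_{11} & \cdots & C_{1,k_2} \\ \vdots & C_{I,J} & \vdots \\ C_{k_1,1} & \cdots & C_{k_1,k_2} \end{pmatrix} = \begin{pmatrix} C_{11} & \cdots & C_{1,k_2} \\ \vdots & C_{I,J} & \vdots \\ C_{k_1,1} & \cdots & C_{k_1,k_2} \end{pmatrix} + \begin{pmatrix} a_1 \\ \vdots \\ a_I \\ \vdots \\ a_{k_1} \end{pmatrix} \begin{pmatrix} b_1^\top & \cdots & b_J^\top & \cdots & b_{k_2}^\top \end{pmatrix}$$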
, C, a, b,
for J = 1 : k2
    for I = 1 : k1
        (* C_IJ = C_IJ + a_I b_J^T *)
    end
end
n1 n2
m1 m2 , nj = mj kj .
.
m1
a, CIJ = CIJ + aI b>
J
, . rank-1 = 1 + 1 + 1 .
min
2m1
2m2
=1+
1
1
1
1
+
1+
+
= min .
2n1
2m2
2n1
2n2
.
1 m2 K,
min 1 +
1
1
+
2n1
2K
(4.2)
n2 < K
= 1 + 2n1 1 + 2n1 2 .
:
min
,
.
.
, :
(
, , .)
(. min ).
;
min , ,
1.
2.
3.
LOAD
STORE
(
), ,
.
. , :
min . .. DOT .
(. 2.2).
LOAD .
.
, . ..
.
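For $A = xy^\top$, each computed element of $A$ satisfies
$$fl(a_{ij}) = a_{ij}(1 + \delta_{ij}) = x_i y_j (1 + \delta_{ij}), \qquad |\delta_{ij}| \le u,$$
so that
$$fl(A) = xy^\top + E, \qquad |E| \le |x||y|^\top u.$$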
. , x, y
E = [x, x + x]
0 1
1 0
(y + y)>
y>
2 n
.
. ,
.
: :
1 B A + xy >
A, . fl(B) = (A + E) + xy > E = ij i j ,
4.2.3
,
y y + Ax, GAXPY = General A x
plus y, MV ( Matrix Vector).
1
+ n12 .
: = 2n1 n2 , min = n1 n2 + 2n1 + n2 , min = 12 + 2n
1
min
DOT sAXPY.
DOT/ ij / . : y A x,
$$y_i \leftarrow y_i + a_{i,:}^\top x = y_i + \sum_{k=1}^{n_2} a_{ik} x_k, \qquad i = 1, \ldots, n_1.$$
sAXPY/ ji / . y A:
$$y \leftarrow y + \sum_{k=1}^{n_2} a_{:,k}\, x_k.$$
:
DOT / ij
for i = 1 : n1
    for j = 1 : n2
        y(i) = y(i) + a(i,j) x(j)
    end
end

sAXPY / ji
for j = 1 : n2
    for i = 1 : n1
        y(i) = y(i) + a(i,j) x(j)
    end
end
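The two loop orderings compute the same result and differ only in the order in which the elements of $A$ are visited. A minimal Python sketch, with hypothetical function names and matrices stored as lists of rows:

```python
def gaxpy_ij(A, x, y):
    """y <- y + A x, DOT (ij) form: row i of A is combined with x."""
    y = list(y)
    for i in range(len(A)):
        for j in range(len(x)):
            y[i] += A[i][j] * x[j]
    return y


def gaxpy_ji(A, x, y):
    """y <- y + A x, sAXPY (ji) form: column j of A, scaled by x(j), is added to y."""
    y = list(y)
    for j in range(len(x)):
        for i in range(len(A)):
            y[i] += A[i][j] * x[j]
    return y


if __name__ == "__main__":
    A = [[1, 2], [3, 4], [5, 6]]
    print(gaxpy_ij(A, [1, 1], [0, 0, 0]), gaxpy_ji(A, [1, 1], [0, 0, 0]))
```

In the ij form the inner loop walks along a row of $A$, in the ji form along a column, which is what matters for the storage-order discussion.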
.
, . DOT ij , sAXPY
ji. , .. A 4 .
4
:
.
, ... = 2n1 n2 ,
,
DOT = 2n1 n2 + 2n1 sAXPY = 3n1 n2 + n2 .
, ..
DOT n2
x. DOT = 2n1 + n1 n2 + n2 , = min = 21 + 2n1 1 + n12
.
sAXPY
y . DOT x
min
(read only). , sAXPY, y
. ,
.
.
,
MV. :
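$$\begin{pmatrix} y_1 \\ \vdots \\ y_I \\ \vdots \\ y_{k_1} \end{pmatrix} \leftarrow \begin{pmatrix} A_{11} & \cdots & A_{1,k_2} \\ \vdots & A_{I,J} & \vdots \\ A_{k_1,1} & \cdots & A_{k_1,k_2} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_{k_2} \end{pmatrix} + \begin{pmatrix} y_1 \\ \vdots \\ y_I \\ \vdots \\ y_{k_1} \end{pmatrix}$$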
:
for I = 1 : k1
(* yI = yI + AI1 x1 + ... + AI,k2 xk2 *)
for J = 1 : k2
k1 , k2
yI = yI +AIJ xJ = m1 m2 +2m1 +
m2 yI
. , K m1 + (1)
$$\Phi = k_1\big(m_1 + k_2 (m_1 m_2 + m_1 + m_2)\big) = n_1 + n_1 n_2 + n_1 k_2 + k_1 n_2. \qquad (4.3)$$
. K
n1 + n2 + (1), k1 = 1, k2 = 1 = n1 + n1 n2 + n1 k2 + n2 ,
= 2n1 + n1 n2 + n2 min GAXPY.
4.1: .
for ? = 1 : n?
    for ? = 1 : n?
        for ? = 1 : n?
            c(i,j) = c(i,j) + a(i,k) b(k,j)
        end
    end
end
MV
.
$|y - \hat y| \le \gamma_n |A||x|$.
4.2.4 - :
$$C = C + AB, \qquad (4.4)$$
$$\mu_{\min} = \frac{1}{n_3} + \frac{1}{2n_1} + \frac{1}{2n_2}.$$
nj n min = O( n1 ),
. 4.1
(4.4)
A, B, C .
?
(i, 1), (j, 2), (k, 3). , .
(= triply nested loop), 3! = 6 4.1,
4.2.
, .
5
,
.
4.1: [7].
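4.2: [16, page 14].
Loop order   Inner loop   Middle loop         Data access
ijk          DOT                              A by rows, B by columns
jik          DOT                              B by columns, A by rows
ikj          sAXPY        GAXPY by rows       B by rows
jki          sAXPY        GAXPY by columns    A by columns
kij          sAXPY                            B by rows
kji          sAXPY                            A by columns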
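All six orderings of the triply nested loop can be generated from a single routine. The sketch below is illustrative only; the function name and the `order` argument are hypothetical, and matrices are stored as lists of rows.

```python
from itertools import permutations


def matmul_update(C, A, B, order="ijk"):
    """Return C + A*B, running the three loops in the nesting given by `order`."""
    n1, n3, n2 = len(A), len(B), len(B[0])
    limits = {"i": n1, "j": n2, "k": n3}
    C = [row[:] for row in C]
    for p in range(limits[order[0]]):
        for q in range(limits[order[1]]):
            for r in range(limits[order[2]]):
                idx = dict(zip(order, (p, q, r)))   # map loop counters to i, j, k
                i, j, k = idx["i"], idx["j"], idx["k"]
                C[i][j] += A[i][k] * B[k][j]
    return C


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[5, 6], [7, 8]]
    Z = [[0, 0], [0, 0]]
    for perm in permutations("ijk"):
        print("".join(perm), matmul_update(Z, A, B, "".join(perm)))
```

Every ordering performs the same $2 n_1 n_2 n_3$ operations; only the memory access pattern of the innermost loop changes.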
.
ikj ,
= n1 n3 + n1 n2 n3 (
1
2
+
),
m1
m3
m3 m1 + m3 K 1 mj nj .
mj . , C C + AB
r = n3 n1 , n2 , -r.
NB,
, .
CIJ = CIJ + AIK BKJ , on-chip.
, NB
3NB2 w < K.
,
. BLAS-3
RISC [5]. 6 , Fortran
.
performance portability ( )
, .
[5] )
Fortran, )
.
;
.
[5].
/ (..
1 )
BLAS.
[5]; , blocking (), loop
unrolling ( ), block copying ( ) .
4.4.
chip.
Kasparov Deep Blue
, ,
6
ftp://ftp.enseeiht.fr/pub/numerique/BLAS/RISC.
(block size) .
, [5], NB
3NB2 w < CS, CS w bytes ... .
.
, [14] .
:
1.
( ).
2.
.
3.
.
. ..
[15]. ,
, ,
, . , , .
, brute force :
(
NB),
.
. .
(
).
Automatically Tuned Linear Algebra Software (ATLAS)
BLAS pipelining. . URL
www.netlib.org/atlas/index.html/
:
, ATLAS
,
.
, , ,
[30, 19, 18, 20].
.
$C = AB$, $c_i = A b_i$, $i = 1 : n$
$fl(AB) = (A + \Delta A)B$,
$|\Delta A| \le \gamma_n |A||B||B^{-1}|$
$|C - \hat C| \le \gamma_n |A||B|$.
,
$|AB| \le |A||B|$.
4.2.5
n (n3 ) : n2
n n 1 , .
2n3 n2 .
. O(nk ) k < 3.
, (
).
V. Strassen [29] .
B , C n = 2k .
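$$\begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} = \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix} \begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix}.$$
$$T(n) = 8\,T\!\left(\frac{n}{2}\right) + 4\left(\frac{n}{2}\right)^2$$
so that $T(n) = \Theta(n^3)$.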
1969 Volker Strassen [29] 3 (!!!)
Strassen n 4.7nlog2 7
. Strassen
. ,
$$A_{11} = P_1 + P_4 - P_5 + P_7$$
$$A_{21} = P_2 + P_4$$
$$A_{12} = P_3 + P_5$$
$$A_{22} = P_1 + P_3 - P_2 + P_6$$
$$T(n) = 7\,T\!\left(\frac{n}{2}\right) + 18\left(\frac{n}{2}\right)^2 \qquad (4.5)$$
7 18 (
n = 2). 4.5
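A recursive sketch of Strassen's scheme for $A = BC$ with $n$ a power of 2 follows. The seven products $P_1, \ldots, P_7$ are defined here in the standard way, which is consistent with the four combination formulas above; their exact numbering in the original notes is an assumption, as are the function names and the cut-off parameter `n0` below which the classical method is used.

```python
def add(X, Y):
    return [[a + b for a, b in zip(r, s)] for r, s in zip(X, Y)]


def sub(X, Y):
    return [[a - b for a, b in zip(r, s)] for r, s in zip(X, Y)]


def classical(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def split(M):
    """Return the four half-size blocks M11, M12, M21, M22."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])


def strassen(B, C, n0=1):
    """Return A = B*C for n-by-n inputs, n a power of 2, switching to classical at size n0."""
    n = len(B)
    if n <= n0:
        return classical(B, C)
    B11, B12, B21, B22 = split(B)
    C11, C12, C21, C22 = split(C)
    P1 = strassen(add(B11, B22), add(C11, C22), n0)
    P2 = strassen(add(B21, B22), C11, n0)
    P3 = strassen(B11, sub(C12, C22), n0)
    P4 = strassen(B22, sub(C21, C11), n0)
    P5 = strassen(add(B11, B12), C22, n0)
    P6 = strassen(sub(B21, B11), add(C11, C12), n0)
    P7 = strassen(sub(B12, B22), add(C21, C22), n0)
    A11 = add(sub(add(P1, P4), P5), P7)       # P1 + P4 - P5 + P7
    A12 = add(P3, P5)
    A21 = add(P2, P4)
    A22 = add(sub(add(P1, P3), P2), P6)       # P1 + P3 - P2 + P6
    top = [r + s for r, s in zip(A11, A12)]
    bottom = [r + s for r, s in zip(A21, A22)]
    return top + bottom


if __name__ == "__main__":
    B = [[1, 2], [3, 4]]
    C = [[5, 6], [7, 8]]
    print(strassen(B, C), classical(B, C))
```

Each level performs 7 half-size multiplications and 18 half-size additions or subtractions, which is the recurrence (4.5).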
4.2.2 (Brent). A, B Rnn n = 2k C =
AB Strassen n0 = 2r+1
:
. C
$$\|C - \hat C\| \le \left[\left(\frac{n}{n_0}\right)^{\log_2 12} (n_0^2 + 5 n_0) - 5n\right] u\,\|A\|\|B\| + O(u^2),$$
where $\|A\| := \max_{i,j} |a_{ij}|$.
4.2.2. n0 = 1 6n3,58 .
4.2.3. n0 = n/2 3n2 + 25n
Strassen .
[23].
$$|C - \hat C| \le nu\,|A||B| + O(u^2). \qquad (4.6)$$
4.2.6 BLAS
Basic Linear Algebra Subroutines, BLAS.
BLAS , . , ,
.
(.. ) ,
.
,
8 .
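BLAS routine names follow a convention: a prefix for the arithmetic (S = single precision, D = double precision, C = complex, Z = double-precision complex), followed, in BLAS-1, by the operation (e.g. SWAP = swap, DOT, NRM2 = 2-norm, ASUM = sum of absolute values) and, in BLAS-2,3, by the matrix type (GE = general, SY = symmetric, TR = triangular, HE = Hermitian) and the operation (e.g. MV = matrix-vector product, MM = matrix-matrix product, SM = solution of triangular systems, footnote 9, R = rank-one update).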
.. DGEMM BLAS-3 ...
:
4.2: BLAS.
C m n.
. op, ..
B , .
. [1, Appendix C].
BLAS (4.4) ..
n1 = n2 = 1 DOT, .
n1 , n2 , n3 .
BLAS-1: nj
, . /. .. DOT (n3 > 1), sAXPY (n2 > 1
n3 > 1.) O(n)
min = O(1). .
BLAS-2: , . /.
n
O(n2 ),
BLAS-1.
BLAS-3: nj > 1 j = 1, 2, 3, . /, .. n1 =
n2 = n3 = n O(n2 ) O(n3 ) .
BLAS
. 4.210 11 .
BLAS
RISC 4.312 .
, [16, Section 1.4.9]. K (.
10
RGB.
Kyle Gallivan .
12
BLAS Dakota Scientific Software IBM.
11
x + y
y + Ax
C + AB
).
C = C + AB n := n1 = n2 = n3 K 3n2 < M.
B, C N
, 13 n = N m m,
.
B = [B1 , ..., BN ],
N ( ) Bj , Cj A, . n(2m + 1) K.
C = AB Cj = ABj = [A:,1 , ..., A:,n ]Bj .
( ):
for j = 1 : N
    (* LOAD Bj, Cj *)
    for k = 1 : n
        (* LOAD A(:,k); rank-1 update of Cj *)
    end
    (* STORE Cj *)
end
N
X
3 n2
1 =
(n2 + 3n(n/N )) = n2 (N + 3) 2n2 ( +
)
2
K
j=1
K.
N N
m, 3m2 K DOT ijk
:
for I = 1 : N
    for J = 1 : N
        (* LOAD C_IJ *)
        for K = 1 : N
            (* LOAD A_IK, B_KJ; update C_IJ *)
        end
        (* STORE C_IJ *)
    end
end
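The block loop above can be written out as follows. This is a sketch under stated assumptions: the function name is hypothetical, $n$ must be a multiple of the block size $m$, and the LOAD and STORE steps appear only as comments because Python gives no control over the memory hierarchy.

```python
def blocked_matmul_update(C, A, B, m):
    """Return C + A*B using m-by-m blocks in the IJK block order."""
    n = len(A)
    if n % m != 0:
        raise ValueError("n must be a multiple of the block size m")
    C = [row[:] for row in C]
    for I in range(0, n, m):
        for J in range(0, n, m):
            # LOAD C_IJ: this block is reused for all K
            for K in range(0, n, m):
                # LOAD A_IK, B_KJ and update C_IJ = C_IJ + A_IK * B_KJ
                for i in range(I, I + m):
                    for j in range(J, J + m):
                        s = C[i][j]
                        for k in range(K, K + m):
                            s += A[i][k] * B[k][j]
                        C[i][j] = s
            # STORE C_IJ
    return C


if __name__ == "__main__":
    n = 4
    A = [[i + j for j in range(n)] for i in range(n)]
    B = [[i - j for j in range(n)] for i in range(n)]
    Z = [[0] * n for _ in range(n)]
    print(blocked_matmul_update(Z, A, B, 2))
```

The arithmetic is identical to the unblocked product; the gain in a real implementation comes from keeping the three $m \times m$ blocks in fast memory while they are reused.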
2 =
PN PN
i=1
2 = 2n2 (1 + N )
j=1 (2m
PN
k=1
2m2 ), .
n3 2 3
,
K
13
.
.
1
1
n
=
.
2
2
3K
n K, .
4.3
,
. !
:
O(nk ) , k (
3) n ( ) .
,
(.. ) ,
N = O(n2 ) .
O(N k/2 ), . k = 2 (.. , ),
14 .
: n
: ..
n
.
4.4
, . DOT, DAXPY.
,
(. ) .
4.4 (BLAS-1),
Fortran.
D(DOT) .
DDOT Fortran function,
x, y dx(1), dx(1 + incx), dx(1 + 2·incx), ..., dx(1 + (n − 1)·incx)
dy(1), dy(1 + incy), dy(1 + 2·incy), ..., dy(1 + (n − 1)·incy).
, : (= label) 20
incx = incy = 1.
.
14
G.W. Stewart.
10 dtemp. incx
= incy = 1 :
(= loop unrolling) . ,
m := n mod nL nL
5. ,
0 m < nL 1 . , (50) .
, L, S,
M A.
(.. , )
B. (.. 10)
n LLMASB, . 2 . ,
3 , B, . 50
:
LLMALLMALLMALLMALLMALLMASB.
S B n b nnL c+nL 1.
15 .
nL , .. nL := n .
DDOT functions,
nL . nL = 5 . ,
, S.
4.5 sAXPY.
Fortran-77 BLAS Jack Dongarra .
,
.
Netlib16 . C C.
f2c
(Fortran to C translator ), (..
Fortran)
.
nL sAXPY
DOT 17 .
Fortran-77 DGEMV. BLAS-1.
15
loop overhead .
16
17
thrashing .
*
*
*
*
DGEMM.
*
*  Purpose
*  =======
*
*  DGEMM  performs one of the matrix-matrix operations
*
*     C := alpha*op( A )*op( B ) + beta*C,
*
*  where  op( X ) is one of
*
*     op( X ) = X   or   op( X ) = Xt,
*
*  alpha and beta are scalars, and A, B and C are matrices, with op( A )
*  an m by k matrix,  op( B )  a  k by n matrix and  C an m by n matrix.
      .....
*
*     Form  C := alpha*A*B + beta*C.
*
      DO 70, L = 1, K, NB
         LB = MIN( K - L + 1, NB )
         DO 60, I = 1, M, NB
            IB = MIN( M - I + 1, NB )
            DO 62 II = I, I + IB - 1
               DO 61 LL = L, L + LB - 1
                  AA(LL-L+1,II-I+1)=ALPHA*A(II,LL)
   61          CONTINUE
   62       CONTINUE
            DO 50, J = 1, N, NB
               JB = MIN( N - J + 1, NB )
*
               .....
*
   50       CONTINUE
   60    CONTINUE
   70 CONTINUE
.....
DGEMML2X2_NN . ,
(.
a + b c). :
*
*     Form  C := alpha*A*B + C.
*
      DO 70, J = 1, N, 2
         DO 60, I = 1, M, 2
            ...
            T11 = C(I  ,J  )
            T21 = C(I+1,J  )
            ...
            CONTINUE
            C(I  ,J  ) = T11
            C(I+1,J  ) = T21
   60    CONTINUE
   70 CONTINUE
daxpy
:
lib BLAS
.
ref BLAS .
hnd .
IBM T20
Intel Pentium III 700 MHz. Fortran 77 Digital Fortran Compaq18 .
BLAS. ref lib
4 .
:
0 .
1 Local (minimal) optimizations occur within the source program unit and include recognition of common subexpressions and the expansion of multiplication and division.
18
. http://www5.compaq.com/fortran/index.html.
[Figure 4.3: DAXPY performance in mflops/s (0 to 300) as a function of n (500 to 5000); curves lib3 (library), ref0 to ref3 and hnd0 to hnd3.]
(hand0) 5 (lib3).
4.5
Kyle Gallivan (CSRD) Illinois [16].
BLAS-1 LINPACK EISPACK (.. [8]),
. [25].
,
(.. Cray-1), BLAS-2
[6].
BLAS-3 [9, 10]. BLAS-3
LAPACK [1],
LAPACK .
.
(= virtual memory) ,
[26, 27].
. [11, 21].
BLAS
.. . [13, 14, 12]. ,
(= workstations)
.. . [15] DGEMM DEC
3000/80019 15 MFLOPS 166 MFLOPS. BLAS-3 RISC [5]20 .
C. Ueberhuber
[31, . 262-271].
[22], , Strassen, , BLAS-3.
(.. ),
.
Fortran BLAS Internet
URL http://www.netlib.org/blas.
19
DEC 3000/800 21064 Alpha-AXP 200
MFLOPS.
20
ftp://ftp.enseeiht.fr/pub/numerique/BLAS/RISC.
4.6
4.6.1. ;
. [, . 4] : 1)
C C + AB A, B, C , , 2)
, 3) , 4-5)
, 6) 7)
.
4.6.2. : x, u, v Rn
I n. (I uv > )x O(n2 )
.
. : A 6= pq T !
O(n log n) . ( : A
A, .
, . : A =
[A11 , A12 ; A12 , A22 ] A12 , A12 1 ,
A.)
4.6.4. : BLAS-3 n n O(n2+ )
... 0 < < 1.
4.6.5. : C = AB , A
Hessenberg B C Hessenberg.
. : j > i + 1, cij = 0.
cij j > i + 1.
:
4.6.7. u, v Rn
.
D(u, v) := ku vk2 u, v ( ,
.. )
BLAS. .
q
D(u, v) =
, D
DOT.
4.6.8 (, , 03). :
( MATLAB)
C AB ,
[c1 , ..., cn ] = A[b1 , ..., bn ], cj , bj C, B BLAS-2 :
. , BLAS-3 BLAS-2.
4.7
4.7.1 ( 03). B = B + (xy > )p ,
B Rnn , x, y Rn p . min = min /,
... min
.
min n.
. min ) ,
)
) (. .
2 ). , min = 2n2 + 2n + 1 ( p.
, O(1) ). , (xy > )p
p1
z
}|
{
x (y > x) (y > x) y > : = y > x .
2n 1 , ( p1 x), . 1 + n , B = B + ( p1 x)y > ,
. 2n2 . = 2n2 + 3n (
). p
x , n2
. min = (2n2 + 3n)/(2n2 + 2n + 1).
O(n2 ) . O(n3 )
p.
4.7.2. A, B Rnn .
AB (. )
...
. MATLAB:
C=zeros(n);
for i=1:n,
    for j=1:i,
        C(i, j)=A(i, j:i)*B(j:i, j);
    end
end
i, i C , i i 1 . (2i 1).
:
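The total number of operations is
$$\sum_{i=1}^{n} \sum_{j=1}^{i} \big(2(i - j + 1) - 1\big) = \sum_{i=1}^{n} \sum_{k=1}^{i} (2k - 1) = \sum_{i=1}^{n} \big((i + 1)i - i\big) = \sum_{i=1}^{n} i^2 = \frac{n(n + 1)(2n + 1)}{6}.$$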
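The count can be confirmed by instrumenting the same double loop. The sketch assumes, as the sum above does, that an inner product of length $m$ costs $2m - 1$ operations; the function name is hypothetical.

```python
def lower_triangular_product(A, B):
    """Return (C, flops) where C = A*B for lower triangular A and B."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    flops = 0
    for i in range(n):
        for j in range(i + 1):
            # Inner product of A(i, j:i) with B(j:i, j): m = i - j + 1 terms.
            m = i - j + 1
            C[i][j] = sum(A[i][k] * B[k][j] for k in range(j, i + 1))
            flops += 2 * m - 1        # m multiplications and m - 1 additions
    return C, flops


if __name__ == "__main__":
    n = 4
    A = [[i + j + 1 if j <= i else 0 for j in range(n)] for i in range(n)]
    B = [[i - j + 2 if j <= i else 0 for j in range(n)] for i in range(n)]
    C, flops = lower_triangular_product(A, B)
    print(flops, n * (n + 1) * (2 * n + 1) // 6)
```

The two printed numbers agree, about a sixth of the $2n^3$ operations of a full product.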
4.7.3. A Rnn , x, u, v Rn y = (A + uv > )x.
, min , ...
( )
O(n)
( load) ( store).
. :
min =
min
n2 + 4n
= 2
2n + 3n 1
O(n) min :
LOAD x, u, v
y = (v^T x) u
for i = 1 : n,
    LOAD A(i, :)
    y(i) = y(i) + A(i, :) x
end
STORE y
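The point of the algorithm is that $A + uv^\top$ is never formed: $(A + uv^\top)x = Ax + u\,(v^\top x)$. A minimal Python sketch with a hypothetical function name:

```python
def rank_one_modified_matvec(A, u, v, x):
    """Return y = (A + u v^T) x without forming the matrix A + u v^T."""
    n = len(x)
    sigma = sum(vi * xi for vi, xi in zip(v, x))          # DOT: v^T x
    y = [sigma * ui for ui in u]                          # y = (v^T x) u
    for i in range(n):
        # Only row i of A is needed at this step.
        y[i] += sum(A[i][j] * x[j] for j in range(n))
    return y


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    print(rank_one_modified_matvec(A, [1, 1], [1, 0], [1, 2]))
```

Only $O(n)$ storage is needed besides the row of $A$ currently in use.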
4.7.4. x, y Rn .
1. A := xy T .
2. .
. 4.2.2
.
4.7.5. 2 2
A, B R22 .
. A, B R22 :
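$$A = \begin{pmatrix} a_{11} & 0 \\ a_{21} & a_{22} \end{pmatrix}, \qquad B = \begin{pmatrix} b_{11} & 0 \\ b_{21} & b_{22} \end{pmatrix}$$
$$fl(AB) = \begin{pmatrix} a_{11} b_{11} (1 + \delta_1) & 0 \\ \big(a_{21} b_{11} (1 + \delta_2) + a_{22} b_{21} (1 + \delta_3)\big)(1 + \delta_4) & a_{22} b_{22} (1 + \delta_5) \end{pmatrix}$$
Hence $fl(AB) = \tilde A \tilde B$ with
$$\tilde A = \begin{pmatrix} a_{11} (1 + \delta_1) & 0 \\ a_{21} (1 + \delta_2)(1 + \delta_4) & a_{22} \end{pmatrix}, \qquad \tilde B = \begin{pmatrix} b_{11} & 0 \\ b_{21} (1 + \delta_3)(1 + \delta_4) & b_{22} (1 + \delta_5) \end{pmatrix}$$
and
$$\frac{|\tilde a_{ij} - a_{ij}|}{|a_{ij}|} \le \gamma_2 = \frac{2u}{1 - 2u}, \qquad \frac{|\tilde b_{ij} - b_{ij}|}{|b_{ij}|} \le \gamma_2 = \frac{2u}{1 - 2u}.$$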
4.7.6. 2 2
A R22 .
A=
:
fl(|A|)
= ((1 + 1 ) (1 + 2 ))(1 + 3)
= (1 + 1 )(1 + 3 )) (1 + 2 )(1 + 3 ))
, A =
fl(|A|) = |A|
(1 + 2
(1 + 20 )
, |2 |, |20 | 2 =
2u
1 2u
4.7.7. 2 2
|ij |.
$\gamma(n) := \frac{nu}{1 - nu}$
(u ).
.
Be1 . bj :=
Hbj1 j bj1 b0 = e1 . Hessenberg H
1 b1 = He1 1 e1 ,
2 b2 = Hb1 2 b1 3 ,
... j 1 n 1 bj j
. n 1 bj .
MATLAB .
x(2:n,1)=zeros(n-1,1);
for j=1:r
    nn=min(r,n);
    xt(1:n,1)=-l(j)*x(1:n,1);
    for k=1:min(j,n)
        xt(1:nn,1)=xt(1:nn,1)+x(k,1)*a(1:nn,k);
    end
    x(1:nn,1)=xt(1:nn,1);
end
4.7.9 (, , 02-makeup). A
s 1 , ..., s R .
Qs
n
j=1 (A j I)x, x R . A O(ns) .
. :
y=x;
for i=s:-1:1,
    y=(A-t(i)*I)*y;
end
6(n − 2) + 8 ,
yj , j = 2 : n − 1 6 , y1
yn 4 .
A
(6(n − 2) + 8)s = 6ns − 4s = O(ns) .
4.7.10 (, , 02, 02-makeup).
1. ... DOT
sAXPY.
2. .
3. .
for i = 1:n
    for j = 1:n
        if ((j==i-1)|(j==i+1)), A(i,j)=-1;
        elseif (i==j), A(i,j)=2;
        else A(i,j) =0;
        end
    end
end
for i = 1:n,
    u(i,1) = sin(i*pi/n);
    v(i,1) = cos((n-i+1)*pi/n);
    x(i,1) = 1;
end
for k=1:s,
    x = (A+u*v')*x;
end
.
Toeplitz. :
A = toeplitz([2,-1,zeros(1,n-2)]);
u = sin((1:n)'*pi/n);
v = cos((n:-1:1)*pi/n);
x = ones(n,1);
for k=1:s,
    x = A*x + u*(v*x);
end
. n- , n , z . , n1,n1 n1 + n1,n n = 0
n1 = n1,n n /n1,n1 ,
n1,n1 6= 0.
.
4.7.13 (Golub and van Loan [17]). S, T Rnn
ST I . (ST I)x = b O(n2 )
. : :
S+ =
u>
Sc
, T+ =
v>
Tc
, b+ =
bc
v > xc u> wc
.
x+ =
, =
xc
(S+ T+ I)x+ = b+ . x+
w+ = T+ x+ O(n k) .
. ,
ST , O(n3 )
. ,
, .
O(n2 ) .
.
, ..
.
S+ T+ I =
v > + u> Tc
Sc Tc I
(S+ T+ I)x+ = b+
v > + u> Tc
Sc Tc I
xc
bc
(Sc Tc I)xc = bc
T+ x+ =
+ v > xc
Tc xc
T+ x+ O(n k) .
, k = n, Sc =
nn , Tc = nn , bc = n n 1 O(n 1) + O(n
2) + O(1) = O(n2 ).
[15] E. Garcia, J.R. Herrero, and J.J. Navarro. Data prefetching for linear algebra on high performance workstations. Technical report, U. Pol. Catalunya, Barcelona, 1995.
[16] G. Golub and C.F. Van Loan. Matrix Computations. The Johns Hopkins University Press, Baltimore, 2nd edition, 1989.
[17] G.H. Golub and C.F. Van Loan. Matrix Computations. The Johns Hopkins University Press, Baltimore, 3rd edition, 1996.
[18] F. Gustavson, A. Henriksson, I. Jonsson, B. Kågström, and P. Ling. Recursive blocked data formats and BLAS's for dense linear algebra algorithms. In B. Kågström, J. Dongarra, E. Elmroth, and J. Waśniewski, editors, Applied Parallel Computing, 4th International Workshop (PARA'98), pages 195–206, Berlin, 1998. Springer-Verlag.
[19] F.G. Gustavson. Recursion leads to automatic variable blocking for dense linear-algebra algorithms. IBM J. Res. Develop., 41(6):737–756, 1997.
[20] F.G. Gustavson and I. Jonsson. Minimal-storage high-performance Cholesky factorization via blocking and recursion. IBM J. Res. Develop., 44(6):823–850, 2000.
[21] J.L. Hennessy and D.A. Patterson. Computer Architecture: A Quantitative Approach. Morgan Kaufmann, San Mateo, CA, first edition, 1990.
[22] N.J. Higham. Exploiting fast matrix multiplication within the level 3 BLAS. ACM Trans. Math. Softw., 16(4):352–368, 1990.
[23] N.J. Higham. Accuracy and Stability of Numerical Algorithms. SIAM, Philadelphia, 2nd edition, 2002.
[24] L. Komzsik and T. Rose. Substructuring in MSC/NASTRAN for large scale parallel applications. Computing Systems in Engineering, 2(2/3):167–173, 1991.
[25] C. Lawson, R.J. Hanson, D.R. Kincaid, and F.T. Krogh. Basic linear algebra subprograms for Fortran usage. ACM Trans. Math. Softw., 5(3):308–323, 1979.
[26] A.C. McKellar and E.C. Coffman, Jr. Organizing matrices and matrix operations for paged memory systems. Comm. ACM, 12(3):153–165, March 1969.
[27] C. Moler. Matrix computations with Fortran and paging. Commun. ACM, 15:268–270, 1972.
[28] W.H. Press, S.A. Teukolsky, W.T. Vetterling, and B.P. Flannery. Numerical Recipes in FORTRAN. The Art of Scientific Computing. Cambridge University Press, Cambridge, second edition, 1992.
[29] V. Strassen. Gaussian elimination is not optimal. Numer. Math., 13:354–356, 1969.
      double precision function ddot(n,dx,incx,dy,incy)
c
c     forms the dot product of two vectors.
c     uses unrolled loops for increments equal to one.
c
      double precision dx(*),dy(*),dtemp
      integer i,incx,incy,ix,iy,m,mp1,n
c
      ddot = 0.0d0
      dtemp = 0.0d0
      if(n.le.0)return
      if(incx.eq.1.and.incy.eq.1)go to 20
c
c        code for unequal increments or equal increments
c          not equal to 1
c
      ix = 1
      iy = 1
      if(incx.lt.0)ix = (-n+1)*incx + 1
      if(incy.lt.0)iy = (-n+1)*incy + 1
      do 10 i = 1,n
        dtemp = dtemp + dx(ix)*dy(iy)
        ix = ix + incx
        iy = iy + incy
   10 continue
      ddot = dtemp
      return
c
c        code for both increments equal to 1
c
c
c        clean-up loop
c
   20 m = mod(n,5)
      if( m .eq. 0 ) go to 40
      do 30 i = 1,m
        dtemp = dtemp + dx(i)*dy(i)
   30 continue
      if( n .lt. 5 ) go to 60
   40 mp1 = m + 1
      do 50 i = mp1,n,5
        dtemp = dtemp + dx(i)*dy(i) + dx(i + 1)*dy(i + 1) +
     *   dx(i + 2)*dy(i + 2) + dx(i + 3)*dy(i + 3) + dx(i + 4)*dy(i + 4)
   50 continue
   60 ddot = dtemp
      return
      end
      subroutine daxpy(n,da,dx,incx,dy,incy)
c
c     constant times a vector plus a vector.
c     uses unrolled loops for increments equal to one.
c
      double precision dx(*),dy(*),da
      integer i,incx,incy,ix,iy,m,mp1,n
c
      if(n.le.0)return
      if (da .eq. 0.0d0) return
      if(incx.eq.1.and.incy.eq.1)go to 20
c
c        code for unequal increments or equal increments
c          not equal to 1
c
      ix = 1
      iy = 1
      if(incx.lt.0)ix = (-n+1)*incx + 1
      if(incy.lt.0)iy = (-n+1)*incy + 1
      do 10 i = 1,n
        dy(iy) = dy(iy) + da*dx(ix)
        ix = ix + incx
        iy = iy + incy
   10 continue
      return
c
c        code for both increments equal to 1
c
c
c        clean-up loop
c
   20 m = mod(n,4)
      if( m .eq. 0 ) go to 40
      do 30 i = 1,m
        dy(i) = dy(i) + da*dx(i)
   30 continue
      if( n .lt. 4 ) return
   40 mp1 = m + 1
      do 50 i = mp1,n,4
        dy(i) = dy(i) + da*dx(i)
        dy(i + 1) = dy(i + 1) + da*dx(i + 1)
        dy(i + 2) = dy(i + 2) + da*dx(i + 2)
        dy(i + 3) = dy(i + 3) + da*dx(i + 3)
   50 continue
      return
      end
II
5.1
(
.1):
.1: Given $A \in \mathbb{R}^{n \times n}$ and $b \in \mathbb{R}^n$, find $x \in \mathbb{R}^n$ such that $Ax = b$.
.1 .
.1 Carl Friedrich Gauss, 1800
.
Gauss
([18]) G.W. Stewart1 .
.1; , 2
( !)
.
.1 . 1.1.
.1
..
.1 O(n3 )
O(n2 )
.. 1980
BLAS3 [12].
1
G.W. Stewart Maryland .
2
.
c
152 5. II 2008,
.
1.
.1
LINPACK. LAPACK LINPACK
.
2. [11, 8].
3.
MATLAB3 MATHWORKS, 7 : (
)
LINPACK.
4. 6 , MATLAB BLAS-3 LAPACK,
7 ... .
LINPACK benchmark
.
, 4 A
.. ,
O(n2 ) O(n3 ) , O(n)
.
Gauss.
.
Gauss, 20
( O(106 ) )
[17, 33].
.
[24, 4].
, .1 , .. 1.1.
3
4
MAtrix LABoratory.
5.3.
.. , , ,
A
( Cholesky .)
:
,
.
,
.
.1
.0 .
Golub
Van Loan ([22]) Higham ([24]). ,
.
5.2
There would be many things to say about this theory of matrices
which should, it seems to me, precede the theory of determinants.
[Arthur Cayley]
.1. ( ) ,
:
5.2.1. Ax = b A Rmn b Rm .
:
5 [A, b] A:
rank([A, b]) = rank(A)
y Rm y > A = 0 y > b = 0.
, x = x0 + x
x
x0 Ax = 0
n rank(A) .
5.2.1. A Rnn , Ax = b
.
5.2.1 . , (.. )
Cramer:
5
5.2.2 ( Cramer). A Rnn b Rn .
j (1 j n) x Rn Ax = b
:
$$x_i = \frac{\det(A(i|b))}{\det(A)}, \qquad i = 1, \ldots, n.$$
A(i|b) i
A b.
Cramer .1, n (
det(A)), n2 n 1. . ,
det(A) :=
Sn
sgn() , Sn
1, ..., n. :
A, /.
(1/)j j .
. A1
A. , A
(.. ), A1
.
5.2.1.
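$$A = \begin{pmatrix} 3 & -1/2 & 0 & 0 & 0 \\ -1 & 3 & -1/2 & 0 & 0 \\ 0 & -1 & 3 & -1/2 & 0 \\ 0 & 0 & -1 & 3 & -1/2 \\ 0 & 0 & 0 & -1 & 3 \end{pmatrix}$$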
() 3n2 = 13
.
(3n 2) 2 3n 2 () :
(1,1)    3.0000
(2,1)   -1.0000
(1,2)   -0.5000
(2,2)    3.0000
(3,2)   -1.0000
(2,3)   -0.5000
(3,3)    3.0000
(4,3)   -1.0000
(3,4)   -0.5000
(4,4)    3.0000
(5,4)   -1.0000
(4,5)   -0.5000
(5,5)    3.0000
A , ( Toeplitz),
:
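$$A^{-1} \approx \begin{pmatrix} 0.3542 & 0.0627 & 0.0111 & 0.0020 & 0.0003 \\ 0.1255 & 0.3765 & 0.0667 & 0.0118 & 0.0020 \\ 0.0444 & 0.1333 & 0.3778 & 0.0667 & 0.0111 \\ 0.0157 & 0.0471 & 0.1333 & 0.3765 & 0.0627 \\ 0.0052 & 0.0157 & 0.0444 & 0.1255 & 0.3542 \end{pmatrix}$$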
Toeplitz) n2 =
25 6 .
, A1 , A1 b. ,
.1.
:
(= direct) : (exact arithmetic)
.
Gauss.
.
. Gauss ,
.
(= iterative) :
, .
.
.
,
. , .
.
, :
: ,
6
, .
.
n2 /2 .
. .. Gauss O(n3 )
O(n2 ) .
: ,
. ..
Cholesky
Gauss.
,
. , (. ,
Hessenberg), Vandermonde Toeplitz.
5.3
, ,
,
.
5.3.1.
.
.
: (dense) . .. ,
.
. , n2
...
: (structured dense)
n2 .
:
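Symmetric: $A = A^\top$, i.e. $a_{ij} = a_{ji}$ for $i, j = 1, \ldots, n$. It suffices to store $n(n + 1)/2$ elements.
Hermitian: $A = A^*$, i.e. $a_{ij} = \bar a_{ji}$ for $i, j = 1, \ldots, n$.
Triangular: $a_{ij} = 0$ for $i > j$ (upper triangular), $a_{ij} = 0$ for $i < j$ (lower triangular).
Hessenberg: upper (lower) Hessenberg if $a_{ij} = 0$ whenever $i > j + 1$ (respectively $j > i + 1$). There are at most $n(n + 1)/2 + n - 1$ nonzero elements.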
Toeplitz:
, . ij = |ij| .
2n 1
. ,
, .
Hankel: -
, . ij = hi+j .
, , .
2n 1 .
: (circulant)
: , .
, , , .
n .
Vandermonde: {0 , ..., n1 }. V (0 , ..., n1 )
i1
Cnn Vij := j1
, i, j = 1, ..., n, .
1
0
V (1 , ..., n ) = .
..
0n1
1
1
..
.
..
1n1
1
n1
..
.
n1
n1
Vandermonde. Vandermonde , .. .
: (sparse)
. nnz
.. nnz = O(n) . 5.1, 5.2(a).
nnz
n2 . , ..
, (linked lists). ,
.
,
. ..
, . [34, 32, 20, 13, 14]
. .
(=structured sparse) (= unstructured sparse) .
(= banded) , m n aij = 0 |i j| > m.
, ij = 0 |i j| > 2 .
5.2(). ,
aij = 0 i 6= j .
5.1 5.2. .
[Figure: sparsity pattern of a matrix of order about 4000, nz = 28831.]
|ii |
n
X
|ij |, i = 1, ..., n.
j = 1//j 6= i
, .
.
7
1.
[Figure: four sparsity-pattern panels, with nz = 200, nz = 34, nz = 400 and nz = 172.]
5.3: () . ()
10 4.
: ,
8 .
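5.3.1 (footnote 9). If rank(A) = r, there exist $X \in \mathbb{R}^{m \times r}$, $Y \in \mathbb{R}^{r \times n}$ and $C \in \mathbb{R}^{r \times r}$ such that $A = XCY$.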
, (.3.1(5) )
X, C, Y , . mr + nr + r 2
mn . .. r = 1, A m + n + 1
{, x, y}.
5.3.1. A Rmn , b Rn rank(A) = r A mr + nr + r 2 Ab
= 2r(n + m + r 1) m ...
r ,
(mr + nr + r 2 mn) (2r(n + m + r 1) m
m(2n1)) X, Y C A. ,
( ) X, Y C .
, . ,
.
, , , . A Rmn m n
8
9
.3
.3.1(5) .3.
rank(A) = r < n. A
(.. )
(.. fl(A)) !
(= numerical rank)
.
:
5.3.1. r
$r_\varepsilon = \min\{\operatorname{rank}(B) : \|A - B\|_2 \le \varepsilon\}$
0.
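If $r < n$,
$$\inf_{\operatorname{rank}(B) \le r} \|A - B\|_2 = \sigma_{r+1},$$
where $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n$ are the singular values of $A$. The numerical rank $r_\varepsilon$ therefore satisfies
$$\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_{r_\varepsilon} > \varepsilon \ge \sigma_{r_\varepsilon + 1} \ge \cdots \ge \sigma_n.$$
The infimum is attained for
$$B = \sum_{i=1}^{r} \sigma_i u_i v_i^\top.$$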
r
r+1 . , ( ) [2]
.
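5.3.2. Let $u, v \in \mathbb{R}^n$ and $\sigma \in \mathbb{R}$. Define $E(u, v; \sigma) := I - \sigma u v^\top$. Then:
1. $E(u, v; \sigma) E(u, v; \tau) = E(u, v; \sigma + \tau - \sigma\tau\, v^\top u)$.
2. If $\sigma^{-1} + \tau^{-1} = v^\top u$, then $E(u, v; \sigma) E(u, v; \tau) = I$.
3. $\det E(u, v; \sigma) = 1 - \sigma\, v^\top u$.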
.
1 , rank(E) n 1.
5.3.2. rankE(u, v; ) n 1.
: u e>
j u = 0 1 j k .
Lk (u) := E(u, ek ; 1)
, (k + 1 :
n, k). :
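$$L_k(u) = \begin{pmatrix} 1 & & & & & \\ & \ddots & & & & \\ & & 1 & & & \\ & & -u_{k+1} & 1 & & \\ & & \vdots & & \ddots & \\ & & -u_n & & & 1 \end{pmatrix},$$
where $u_j$ denotes the $j$-th component of $u$ and the entries $-u_{k+1}, \ldots, -u_n$ are in column $k$.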
Lk (u) .
5.3.3. $E(u, e_k; 1) E(u, e_k; -1) = I$, i.e. $(L_k(u))^{-1} = L_k(-u)$.
$$L_k(u)x = x - u e_k^\top x = x - x_k u = [x_1, \ldots, x_k,\ x_{k+1} - x_k u_{k+1},\ \ldots,\ x_n - x_k u_n]^\top.$$
If $u_j = x_j / x_k$ ($k + 1 \le j \le n$), then $L_k(u)$
x k + 1 : n, . ,
u Gauss.
: $P_{ij} := E(e_i - e_j,\ e_i - e_j;\ 1)$
i, j i < j :
P =
e>
(1)
..
.
e>
(n)
1
1
2
2
n
n
1, ..., n. P =
[e1 , e2 , ..., en ]. :
1. P A
A :
P A =
a(1)
..
.
a(n)
a(k) (k) A.
2. AP
A :
AP =
a(1)
a(n)
P
1. P P > = I , . P
2. P >
3. Q P Q QP .
: $H := E(u, u; 2/u^* u)$. $H = H^*$, $H^* H = H^2 = I$.
Householder,
(elementary reflectors)
. QR.
: .
1 . u, v, x Rn .
: E(u, v; ) 2n + 1
u, v,
: E(u, v; )x DOT
sAXPY, . = 4n ... :
$$(I - \sigma u v^\top)x = \underbrace{x - \sigma u\, \underbrace{(v^\top x)}_{\text{DOT}}}_{\text{sAXPY}}$$
, A n2 A = n(2n 1) ...
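Applying an elementary matrix as one DOT followed by one sAXPY can be sketched as follows. The function names are hypothetical; the Householder case uses the definition $H = E(u, u; 2/u^\top u)$ for real $u$.

```python
def apply_elementary(u, v, sigma, x):
    """Return (I - sigma*u*v^T) x using O(n) operations, without forming the matrix."""
    t = sum(vi * xi for vi, xi in zip(v, x))              # DOT: v^T x
    return [xi - sigma * t * ui for xi, ui in zip(x, u)]  # sAXPY: x + (-sigma*t) u


def householder_reflect(u, x):
    """Apply H = E(u, u; 2/(u^T u)) to x, for a real nonzero vector u."""
    utu = sum(ui * ui for ui in u)
    return apply_elementary(u, u, 2.0 / utu, x)


if __name__ == "__main__":
    print(apply_elementary([1, 2], [3, 4], 1, [1, 1]))
    print(householder_reflect([1.0, 0.0], [3.0, 4.0]))
```

Forming the $n \times n$ matrix first and then multiplying would cost $O(n^2)$ operations and storage instead.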
5.3.4. C Rnn , u, v Rn , 6= 0.
B E(u, v; )C ...; BLAS
B .
Gauss:
Gauss .
u (= unit roundoff) ...
C Rnn , v Rn Gauss e>
j Lk (v)C(:, k) = 0, j = k+1 : n.
v v v(j) = C(j, k)/C(k, k) j = k + 1 : n
v(j)
=
=
v =
fl(Lk (
v )C).
:
5.3.1. C Rnn , x, y Rn .
(i, j) fl(Lk (
v )C) Lk (v)C O(u2 ) :
fl(Lk (
v )C) = Lk (v)C + E
|E| u(|C| + 3|v||C(k, :)> |) + O(u2 ).
v ,
C .
|C(k, k)| .
Gauss.
(. . 152-160
[39].)
5.4
.1
.
.
:
1. .
2. .
3. .
4. .
5.4.1
, , ,
. ,
.
5.4.1 ( ).
() n(n + 1)/2 n2
:
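$a_{11}$,
$a_{12}, a_{22}$,
$a_{13}, a_{23}, a_{33}$,
$\ldots$
$a_{1n}, \ldots, a_{nn}$.
The element in position $j(j - 1)/2 + i$, with $i \le j$, serves both $(i, j)$ and $(j, i)$.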
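The index map of the packed scheme is easy to state in code. This is a sketch with hypothetical helper names; indices are 1-based, and the upper triangle is stored column by column as in the listing above.

```python
def packed_index(i, j):
    """1-based position of a(i, j) in the packed array of a symmetric matrix."""
    if i > j:
        i, j = j, i               # a(i, j) = a(j, i): use the upper triangle
    return j * (j - 1) // 2 + i


def pack_symmetric(A):
    """Store the upper triangle of A column by column: a11, a12, a22, a13, ..."""
    n = len(A)
    return [A[i - 1][j - 1] for j in range(1, n + 1) for i in range(1, j + 1)]


def packed_get(ap, i, j):
    """Read a(i, j) from the packed array ap."""
    return ap[packed_index(i, j) - 1]


if __name__ == "__main__":
    A = [[1, 2, 4], [2, 3, 5], [4, 5, 6]]
    ap = pack_symmetric(A)
    print(ap, packed_get(ap, 3, 2))
```

The packed array has $n(n + 1)/2$ entries, at the price of an index computation on every access.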
A .. y y + Ax
:
for i = 1 : n
for j = 1 : i 1
,
, .
LAPACK 5.10. [1, . 107-].
:
. A Rmn
ij (i, j) A. , , Hessenberg
,
.
(= packed) ,
.
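Band storage. An $m \times n$ matrix with $l$ subdiagonals and $u$ superdiagonals, where $l, u \ll \min(m, n)$, can be stored compactly: element $a_{ij}$, for $\max(1, j - u) \le i \le \min(m, j + l)$, is stored in position $(u + 1 + i - j,\ j)$ of an array $A(1 : l + u + 1,\ 1 : n)$.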
n n 1
- ( -) .
.
5.4.1. x y + Ax, A Rmn .
.
. , ,
.
. m n
b0 + 1
. :
Column-major order: element (I, J) is stored at address b0 + (J − 1)m + I.
Row-major order: element (I, J) is stored at address b0 + (I − 1)n + J.
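The two address formulas, and the strides they imply, can be checked directly. The function names and the default base address `b0 = 0` are assumptions made for this sketch.

```python
def address_column_major(I, J, m, b0=0):
    """Address of element (I, J) of an m-by-n array stored by columns (1-based I, J)."""
    return b0 + (J - 1) * m + I


def address_row_major(I, J, n, b0=0):
    """Address of element (I, J) of an m-by-n array stored by rows (1-based I, J)."""
    return b0 + (I - 1) * n + J


if __name__ == "__main__":
    m, n = 3, 4
    print(address_column_major(2, 3, m), address_row_major(2, 3, n))
    # Stride between vertical neighbours (I, J) and (I + 1, J) in each layout.
    print(address_column_major(3, 3, m) - address_column_major(2, 3, m),
          address_row_major(3, 3, n) - address_row_major(2, 3, n))
```

In column-major storage consecutive elements of a column are adjacent (stride 1) and consecutive elements of a row are $m$ apart; in row-major storage the roles are reversed, with stride $n$ down a column.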
: Fortran
, C ( ) Pascal .
. , (I, J)
(I + 1, J) 1, n. ,
(I, J) (I, J + 1) 1
m .
,
LOAD . ..
.
(access stride).
:
1.
(, , , ).
(.
(= cache miss),
, .
(= cache line)
. , .
,
(= page faults).
2.
(= memory banks). (..
)
(=bank conflict) . bank
.
.
.
.
5.4.2. Fortran.
A n 1
.
      DO I = 1,n
         DO J = 1,n
            Y(I) = Y(I) + A(I,J)*X(J)
         ENDDO
      ENDDO

      DO J = 1,n
         DO I = 1,n
            Y(I) = Y(I) + A(I,J)*X(J)
         ENDDO
      ENDDO
5.4.2.
.
,
.
(.. Fortran
Fortran 90) ,
(Fortran-90, MATLAB).
, .
5.4.2
.1. A
. $a_{ij} = 0$, $(1 \le i < j \le n)$, $a_{jj} \ne 0$, $(1 \le j \le n)$.
.1
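$$a_j^\top x = b_j, \qquad j = 1, \ldots, n, \qquad (5.1)$$
where $a_j^\top$ is row $j$ of $A$. Therefore
$$x_j = \Big(b_j - \sum_{k=1}^{j-1} a_{jk} x_k\Big) / a_{jj}, \qquad j = 1, \ldots, n. \qquad (5.2)$$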
. j x1 , ..., xj1 . A
:
$$x_j = \Big(b_j - \sum_{k=j+1}^{n} a_{jk} x_k\Big) / a_{jj}, \qquad (5.3)$$
A.
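$$\begin{pmatrix} b_1 \\ \vdots \\ b_j \\ \vdots \\ b_n \end{pmatrix} = x_1 \begin{pmatrix} a_{11} \\ \vdots \\ a_{j1} \\ \vdots \\ a_{n1} \end{pmatrix} + x_2 \begin{pmatrix} 0 \\ \vdots \\ a_{j2} \\ \vdots \\ a_{n2} \end{pmatrix} + \cdots + x_n \begin{pmatrix} 0 \\ \vdots \\ 0 \\ \vdots \\ a_{nn} \end{pmatrix}$$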
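$$\begin{pmatrix} a_{11} & 0_{1,n-1} \\ a_{2:n,1} & A_{2:n,2:n} \end{pmatrix} \begin{pmatrix} x_1 \\ x_{2:n} \end{pmatrix} = \begin{pmatrix} b_1 \\ b_{2:n} \end{pmatrix} \qquad (5.5)$$
$$a_{11} x_1 = b_1 \qquad (5.6)$$
and, from (5.5),
$$A_{2:n,2:n}\, x_{2:n} = b_{2:n} - a_{2:n,1}\, x_1 \qquad (5.7)$$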
2:n . 1 , n 1.
:
for j = 1 : n
    x(j) = b(j) / a(j,j)
    for i = j + 1 : n
        b(i) = b(i) − a(i,j) x(j)
    end
end
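The column-oriented (sAXPY) forward substitution above translates directly into Python. This is a minimal sketch with a hypothetical function name; $A$ is assumed lower triangular with nonzero diagonal, and the right-hand side is copied rather than overwritten.

```python
def forward_substitution(A, b):
    """Solve A x = b for lower triangular A, column by column (sAXPY form)."""
    n = len(b)
    b = list(b)                   # work on a copy of the right-hand side
    x = [0.0] * n
    for j in range(n):
        if A[j][j] == 0:
            raise ZeroDivisionError("zero diagonal element: A is singular")
        x[j] = b[j] / A[j][j]
        for i in range(j + 1, n):
            b[i] -= A[i][j] * x[j]   # subtract x(j) times column j from the remaining b
    return x


if __name__ == "__main__":
    A = [[2, 0, 0], [1, 1, 0], [4, 2, 2]]
    print(forward_substitution(A, [2, 3, 10]))
```

Step $j$ costs one division and an sAXPY of length $n - j$, for a total of about $n^2$ operations.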
sAXPY. , n2 ...
x
. , b
x ,
$$\Phi_{\min} = \frac{n(n + 1)}{2} + 2n, \qquad \Omega = n^2, \qquad \mu_{\min} = \frac{1}{2} + \frac{5}{2n}.$$
BLAS2.
, AX = B A X, B
Rns , s > 1.
5.4.3.
,
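$$\begin{pmatrix} L_{11} & 0 \\ L_{21} & L_{22} \end{pmatrix} \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} = \begin{pmatrix} B_1 \\ B_2 \end{pmatrix},$$
where $X_1, B_1 \in \mathbb{R}^{k \times s}$ and $X_2, B_2 \in \mathbb{R}^{(n-k) \times s}$. Then
$$B_1 = L_{11} X_1, \qquad B_2 = L_{21} X_1 + L_{22} X_2,$$
so that, once $X_1$ has been computed,
$$L_{22} X_2 = B_2 - L_{21} X_1.$$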
.
k s
, n k
s . s > 1, L21 X1
BLAS3.
P.
5.4.1. [24] $y = \big(c - \sum_{i=1}^{k-1} a_i b_i\big) / b_k$ ...
:
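s = c
for i = 1 : k − 1
    s = s − a_i b_i
end
y = s / b_k
Then the computed $\hat y$ satisfies
$$b_k\, \hat y\, (1 + \theta_k) = c - \sum_{i=1}^{k-1} a_i b_i (1 + \theta_i),$$
where $|\theta_i| \le \gamma_i$ and $\gamma_i = iu / (1 - iu)$.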
. .
5.4.1. Ax = b, A Rnn
, . x
$(A + \Delta A)\hat x = b$, $|\Delta A| \le \gamma_n |A|$.
10 . ,
. [24].
5.5
.1 11 .
. P Rmn
P Ax = P b. P, Q ,
P AQx = P b x = Qy .1.
.1 :
P b b
P AQ A,
.
A1b x
.
Q
x x .
:
A;
P, Q;
10
J. Wilkinson . 251 [39]: ... it is worth stretching
that computed solutions of triangular matrices are, in general, much more accurate than can
be deduced from the bound obtained from the residual vector, and this high accuracy is often
important in practice.
11
;
,
P, Q;
5.5.1. Gauss
Q = I P P A = U
.
5.5.2. Gauss Q = I P P A = U .
.
5.5.3. P A = R P R ,
A = QR Q . P
( Householder).
5.5.4. A = ZP Z P
.
A
O(n3 )
. LU .
5.5.1. A Rnn .
L U P LU = P A
L .
L
U . ..
A = LDU L, U D
det(A(1 : k, 1 : k)) = det(D(1 : k, 1 : k)).
L, U, D .
5.5.2
.1 :
. [ LU ]
: L, U ,
: Lz = b z ,
: U x = z x.
. LU
.
5.4.2. x = [1 , ..., n ]> k 6= 0, j := j /k (k + 1
j n), u := [0, ..., 0, k+1 , ..., n ]> Rn ,
Gauss
u = x(2 : n)/x(1)
end
GAUSS
Gauss T
= n 1 ...
Gauss C Rnr :
.
C Rnr Gauss
Lk (u) C Lk (u)C .
function u = GAUSS.MUL(C, u)
n = rows(C); (* n . C *)
GAUSS.MUL
T
= 2(n 1)r ...
.
function A = GAUSS.ELM(A)
    k = 1
    while A(k, k) ≠ 0 and k ≤ n − 1
        t = GAUSS(A(k : n, k))    (* t ∈ R^{n−k} *)
        A(k + 1 : n, k) = t
        A(k : n, k + 1 : n) = GAUSS.MUL(A(k : n, k + 1 : n), t)
        k = k + 1
    end
end
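A compact Python sketch of elimination without pivoting follows; it stores the multipliers in place of the zeros they create and then separates the factors. The function name is hypothetical, and unlike the pseudocode (which stops when a zero pivot is met) this version raises an error.

```python
def lu_no_pivoting(A):
    """Return (L, U) with A = L*U, L unit lower triangular; no row interchanges."""
    n = len(A)
    A = [[float(a) for a in row] for row in A]     # work on a copy
    for k in range(n - 1):
        if A[k][k] == 0.0:
            raise ZeroDivisionError("zero pivot: elimination without pivoting breaks down")
        for i in range(k + 1, n):
            A[i][k] /= A[k][k]                     # multiplier, stored in place of the zero
            for j in range(k + 1, n):
                A[i][j] -= A[i][k] * A[k][j]       # update of the trailing submatrix
    L = [[A[i][j] if j < i else (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    U = [[A[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]
    return L, U


if __name__ == "__main__":
    L, U = lu_no_pivoting([[2, 1, 1], [4, 3, 3], [8, 7, 9]])
    print(L)
    print(U)
```

The strictly lower part of the overwritten array holds the multipliers, that is the columns of $L$, and the upper part holds $U$.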
, GAUSS.ELM, , n 1
Gauss (. u) Ln1 L1 A = U 1
. A = L1
1 Ln1 U .
1
1
>
L1
k = I + uk ek L := L1 Ln1
. ,
>
uj , ej uj e>
j uk ek = 0 j < k ,
>
= (I + u1 e>
1 ) (I + un1 en1 )
= I+
n1
X
u j e>
j
j=1
, A :
A R, u Gauss.
A=
11
(1)
2
.
.
(1)
n
12
22
(2)
2
.
(2)
n
.
.
.
.
.
1n
2n
.
.
n,n
Gauss
GAUSS.ELM
$$T = \sum_{k=1}^{n-1} \big((n - k) + 2(n - k)^2\big) \approx \frac{2n^3}{3}$$
...
, . L U , O(n2 ), . .1
LU b. .. b
A . s
, . AX = B , X, B Rns , A (. 1 n3 + 2 n2 + O(n))
(. sn2 + O(n)).
1 n3 + (2 + s)n2 + O(n)
n2
n3
+ 2
+ n2 + O(n).
s
s
, s.
Gauss
12 . .. A Rmn m n
n Gauss U
Rmn , U (j, k) j > k ,
.
A = LU L Rmm .
A , L
$$L = \begin{pmatrix} L_{11} & 0 \\ L_{21} & I \end{pmatrix}, \qquad L_{11} \in \mathbb{R}^{n \times n}.$$
5.6
Gauss GAUSS.ELM
:
for ? = ?:?
    for ? = ?:?
        for ? = ?:?
            a(i,j) = a(i,j) − a(i,k) a(k,j) / a(k,k)
        end
    end
end
Dongarra, Gustavson Karp [9]
, . ,
.
.
GAUSS.ELM.
GAUSS.MUL, GAUSS.ELM kij
( ) kji ( )13 .
12
.
GAUSS.MUL
.
13
Fortran kji kij . [29].
kji
SGEFA LINPACK ([10]).
A A . sAXPY,
.
.
j A
A:,j = LU:,j
L, U :
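$$A_{1:j-1,j} = L_{1:j-1,1:j-1}\, U_{1:j-1,j} \qquad (5.8)$$
$$A_{j,j} = \sum_{k=1}^{j-1} L_{j,k} U_{k,j} + L_{j,j} U_{j,j} \qquad (5.9)$$
$$A_{j+1:n,j} = \overbrace{\sum_{k=1}^{j-1} L_{j+1:n,k} U_{k,j}}^{\text{MXV}} + L_{j+1:n,j} U_{j,j} \qquad (5.10)$$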
j 1 L
U , j (
) : L1:j1,1:j1 ,
(5.8)
j 1 j U . ,
Uj,j (5.9) , (5.10) Lj+1:n,j . (5.10) ,
MXV 14 . ,
MXV
kji, kij . A (
) L U ,
jki:
14
. [ LU jki]
, A
L,
A U .
for j = 1 : n
(* (5.8) *)
for k = 1 : j 1
jki .
ikj
[9].
, ijk, jik , DOT 15 .
i 1 L U , i L
U i L
U . i 1 i 1
i L, n i + 1 i
U . :
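$$A_{i,j} = \overbrace{L_{i,1:j-1}\, U_{1:j-1,j}}^{\text{DOT}} + L_{i,j} U_{j,j}, \qquad (1 \le j \le i - 1) \qquad (5.11)$$
$$A_{ii} = \overbrace{L_{i,1:i-1}\, U_{1:i-1,i}}^{\text{DOT}} + U_{ii} \qquad (5.12)$$
$$A_{i,j} = \overbrace{L_{i,1:i-1}\, U_{1:i-1,j}}^{\text{DOT}} + U_{i,j}, \qquad (i + 1 \le j \le n) \qquad (5.13)$$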
15
, Doolitle Crout.
. [ LU ijk ]
for i = 2 : n
for j = 2 : i
5.6.1
,
BLAS1 BLAS2. ,
, .
BLAS3 .
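$$\begin{pmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \\ A_{31} & A_{32} & A_{33} \end{pmatrix} = \begin{pmatrix} L_{11} & 0 & 0 \\ L_{21} & L_{22} & 0 \\ L_{31} & L_{32} & L_{33} \end{pmatrix} \begin{pmatrix} U_{11} & U_{12} & U_{13} \\ 0 & U_{22} & U_{23} \\ 0 & 0 & U_{33} \end{pmatrix}, \qquad (5.14)$$
$$= \begin{pmatrix} L_{11} U_{11} & L_{11} U_{12} & L_{11} U_{13} \\ L_{21} U_{11} & L_{21} U_{12} + L_{22} U_{22} & L_{21} U_{13} + L_{22} U_{23} \\ L_{31} U_{11} & L_{31} U_{12} + L_{32} U_{22} & L_{31} U_{13} + L_{32} U_{23} + L_{33} U_{33} \end{pmatrix},$$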
BLAS-3, .
(=right-looking) (=left-looking).
:
L U . k
(k 1) L U , . L11 , L21 , L31 U11 .
L, U , .
A11
A21
A31
A12
A22
A32
A13
L11
A23 = L21
A33
L31
0
U11
0 0
I
0
0
I
0
A12
A22
0
A13
A23 .
A33
(5.15)
k :
A11
A21
A31
A12
A22
A32
A13
L11
A23 = L21
A33
L31
U11
0
0 0
I
0
0
L22
L32
U12
U22
0
A13
A23 .
A33
(5.16)
A12
= L11 U12
A22
A32
. :
1) A12 = L11 U12 _TRSM L11 .
_GEMM
T22
T32
A22
A32
L21
L31
U12
A11
A21
A31
A12
A22
A32
A13
L11
A23 = L21
A33
L31
U11
0
0
0
I
0
0
I
0
U12
A22
A32
U13
A23 .
A33
(5.17)
k :
A11
A21
A31
A12
A22
A32
A13
L11
A23 = L21
A33
L31
0
L22
L32
U11
0
0 0
I
0
U12
U22
0
U13
U23 .
A33
(5.18)
A22
A32
L22
L32
U22
5.4: : . :
.
.
A22
A32
A23
A33
L22
L32
0
I
L22
L32
0
I
U22
0
U22
0
L1
22 A23
A33 L32 L1
22 A23
U23
A33
:
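1) $\begin{pmatrix} A_{22} \\ A_{32} \end{pmatrix} = \begin{pmatrix} L_{22} \\ L_{32} \end{pmatrix} U_{22}$ _GETF2 (BLAS-2)
2) $U_{23} = L_{22}^{-1} A_{23}$ _TRSM (BLAS-3)
3) $\tilde A_{33} = A_{33} - L_{32} U_{23}$ _GEMM _GERK (BLAS-3).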
5.4 .
1 < k n,
BLAS3,
.
k , , , ...
, ,
k .
5.6.1.
. k
.
, A,
LU . Lij , Uij L, U .
A
. . L, U
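$$\begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} = \begin{pmatrix} I & 0 \\ L_{21} & I \end{pmatrix} \begin{pmatrix} U_{11} & U_{12} \\ 0 & U_{22} \end{pmatrix} \qquad (5.19)$$
$$L_{21} := A_{21} A_{11}^{-1}, \qquad U_{11} := A_{11}, \qquad U_{12} := A_{12}, \qquad U_{22} := A_{22} - A_{21} A_{11}^{-1} A_{12}.$$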
A LU
A11 .
5.6.2. .
:
5.6.1. Ax = b
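$$A = \begin{pmatrix} D_1 & B \\ B^\top & D_2 \end{pmatrix}$$
with blocks $D_1$, $D_2$. Then
$$A = \begin{pmatrix} I & 0 \\ B^\top D_1^{-1} & I \end{pmatrix} \begin{pmatrix} D_1 & B \\ 0 & D_2 - B^\top D_1^{-1} B \end{pmatrix}$$
where $D_2 - B^\top D_1^{-1} B$ is the Schur complement. The solution is computed as follows:
1. $y_2 = b_2 - B^\top D_1^{-1} b_1$.
2. $(D_2 - B^\top D_1^{-1} B)\, x_2 = y_2$.
3. $x_1 = D_1^{-1} (b_1 - B x_2)$.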
5.6.3.
.
5.6.4. D1 , D2 .
.
5.6.5. A .
11 , U
11 L
11
11 U
LU A11 , . A11 = L
L11 , A(k + 1 : n, k + 1 : n)
Schur.
5.6.6. .1
.
I
L21
L= .
..
Lm1
Lm,m1
.. , U =
.
U11
U12
U1m
U22
..
.
..
Umm
5.5.2:
nn
5.6.1. A = (Aij )m
i,j=1 R
Aij . , m 1
.
5.7
5.5.2,
.
5.7.1. A = [e2 , e1 ] R22 . , A2 = I .
A11 = 0.
, .
( )
,
.
Gauss.
5.5.2, ... L, U
( ).
.
(k1)
. kk ,
(, )
$$\frac{a_{ik}^{(k-1)}}{a_{kk}^{(k-1)}}, \qquad i = k + 1 : n.$$
.
k A
, , ,
:
: k ,
, (k : n, k)
A. k k
k k.
.
: k ,
, Ak:n,k:n .
(1 , 2 ), k 1 k 2 .
,
k 2 ( ), . O(n3 )
k , . O(n2 )
. , , [22].
,
, : LU
Gauss .
. (rook pivoting), ,
, .
[24].
A. ,
:
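$$L_{n-1} P_{n-1} \cdots L_1 P_1 A = U,$$
where $P_k$ is the permutation that interchanges row $k$ with one of the rows $k + 1 : n$ of $L_{k-1} P_{k-1} \cdots L_1 P_1 A$. Hence
$$A = (L_{n-1} P_{n-1} \cdots L_1 P_1)^{-1} U,$$
and applying the interchanges $P_{n-1} \cdots P_1$ to $A$, with $P := P_{n-1} \cdots P_1$ a permutation matrix, gives
$$PA = LU.$$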
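A sketch of elimination with partial pivoting that returns the permutation together with the factors is given below. The function name is hypothetical, the permutation is returned as a list `perm` such that row `i` of $PA$ is row `perm[i]` of $A$, and ties for the pivot are broken by taking the first candidate, which is an arbitrary choice.

```python
def lu_partial_pivoting(A):
    """Return (perm, L, U) with P*A = L*U, where row i of P*A is row perm[i] of A."""
    n = len(A)
    A = [[float(a) for a in row] for row in A]     # work on a copy
    perm = list(range(n))
    for k in range(n - 1):
        # Pivot: the entry of largest magnitude in column k, rows k..n-1.
        p = max(range(k, n), key=lambda i: abs(A[i][k]))
        if A[p][k] == 0.0:
            continue                               # column is already zero below the diagonal
        if p != k:
            A[k], A[p] = A[p], A[k]                # swaps earlier multipliers as well
            perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            A[i][k] /= A[k][k]                     # multiplier, |multiplier| <= 1
            for j in range(k + 1, n):
                A[i][j] -= A[i][k] * A[k][j]
    L = [[A[i][j] if j < i else (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    U = [[A[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]
    return perm, L, U


if __name__ == "__main__":
    print(lu_partial_pivoting([[0, 1], [1, 0]]))
```

For the matrix $[e_2, e_1]$, whose leading element is zero, the single interchange makes the factorization trivial.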
. [22].
LU
,
.
.
: P Ax = LU x = P b,
b L U .
.1 [22] :
Scaling: choose $P_1$, $P_2$ and solve $P_1 A P_2\,(P_2^{-1} x) = P_1 b$ instead of $Ax = b$, so that, for example, $\kappa(P_1 A P_2) < \kappa(A)$.
(=preconditioning)
.
Iterative improvement: starting from a computed solution $x^{(0)}$, solve $Az = r^{(j-1)}$ with $r^{(j-1)} := b - A x^{(j-1)}$, and set $x^{(j)} := x^{(j-1)} + z$ for $j = 1, \ldots$.
,
- Gauss.
,
:
$$\rho_n := \frac{\max\{\alpha_0, \alpha_1, \ldots, \alpha_{n-1}\}}{\alpha_0},$$
where $\alpha_k := \max_{i,j} |a_{ij}^{(k)}|$
(k + 1 : n, k + 1 : n) Lk Pk L1 P1 A.
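,
$$A = \begin{pmatrix} 10^{-4} & 1 \\ 1 & 1 \end{pmatrix}$$
Without pivoting:
$$L_1 = \begin{pmatrix} 1 & 0 \\ -10000 & 1 \end{pmatrix}, \qquad L_1 A = \begin{pmatrix} 10^{-4} & 1 \\ 0 & -9999 \end{pmatrix},$$
$\alpha_0 = 1$, $\alpha_1 = 9999$, $\rho = 9999$.
With partial pivoting:
$$P_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad L_1 = \begin{pmatrix} 1 & 0 \\ -10^{-4} & 1 \end{pmatrix}, \qquad L_1 P_1 A = \begin{pmatrix} 1 & 1 \\ 0 & 0.9999 \end{pmatrix},$$
$\alpha_0 = 1$, $\alpha_1 = 0.9999$.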
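5.7.2.
$$A = \begin{pmatrix} 1 & 0 & 0 & 1 \\ -1 & 1 & 0 & 1 \\ -1 & -1 & 1 & 1 \\ -1 & -1 & -1 & 1 \end{pmatrix}$$
Elimination with partial pivoting gives
$$U = \begin{pmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 2 \\ 0 & 0 & 1 & 4 \\ 0 & 0 & 0 & 8 \end{pmatrix}, \qquad \rho = 8,$$
while complete pivoting gives (up to the signs of the entries)
$$U = \begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 2 & 1 & 0 \\ 0 & 0 & 2 & 1 \\ 0 & 0 & 0 & 2 \end{pmatrix}, \qquad \rho = 2.$$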
,
:
$$a_{ij} = \begin{cases} 1, & j = i \ \text{or}\ j = n, \\ -1, & j < i, \\ 0, & \text{otherwise}, \end{cases}$$
$\rho_n = 2^{n-1}$,
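The growth $2^{n-1}$ for this family of matrices can be observed numerically. The sketch below builds the matrix from the definition above and tracks the largest element of each trailing submatrix during elimination with partial pivoting. The function names are hypothetical, and the symbol $\alpha_k$ for these maxima is an assumption of this rewrite.

```python
def worst_case_matrix(n):
    """Matrix with 1 on the diagonal and in the last column, -1 below the diagonal, 0 elsewhere."""
    return [[1.0 if (j == i or j == n - 1) else (-1.0 if j < i else 0.0)
             for j in range(n)] for i in range(n)]


def growth_factor_partial_pivoting(A):
    """Return max_k alpha_k / alpha_0 for Gaussian elimination with partial pivoting."""
    n = len(A)
    A = [row[:] for row in A]
    alpha0 = max(abs(a) for row in A for a in row)
    largest = alpha0
    for k in range(n - 1):
        p = max(range(k, n), key=lambda i: abs(A[i][k]))   # pivot row
        A[k], A[p] = A[p], A[k]
        if A[k][k] == 0:
            continue
        for i in range(k + 1, n):
            m = A[i][k] / A[k][k]
            A[i][k] = 0.0
            for j in range(k + 1, n):
                A[i][j] -= m * A[k][j]
        # alpha_k: largest magnitude in the trailing (k+1:n, k+1:n) submatrix
        largest = max(largest, max(abs(A[i][j]) for i in range(k + 1, n) for j in range(k + 1, n)))
    return largest / alpha0


if __name__ == "__main__":
    for n in (4, 10, 20):
        print(n, growth_factor_partial_pivoting(worst_case_matrix(n)))
```

Each elimination step doubles the entries of the last column, so the final element equals $2^{n-1}$, even though every multiplier has magnitude 1.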
n
:
$$\rho_n < \sqrt{n \cdot 2 \cdot 3^{1/2} \cdot 4^{1/3} \cdots n^{1/(n-1)}}.$$
n 2n1 .
[24].
Gould 16
, . n. Gould 13 13
= 13.0205.
5.7.1.
Gauss :
2n1 .
n
, , .
5.7.3.
n.
Hessenberg
n
2.
n
16
Wilkinson [39, . 214], In fact no matrix has yet been discovered for
which nPO > n.
5.7.1
It is a matter for some surprise that one of the simplest methods of
solution generally leads to an error, the expected value of which is
precisely that resulting from random perturbations ... This means
that when the original elements are not exactly representable by
numbers of t binary digits the errors resulting from any initial rounding that may be necessary are as serious as those arising from all
the steps in the solution. James Wilkinson . 198 [39]
,
. ,
,
Wilkinson. . [38]. ,
, .
:
:
<
(= posedness) 17 .
. .1 A, b
$(A + \Delta A)(x + \Delta x) = b + \Delta b$,
x A, b. A() := A + E
A := A(0) 0. :
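5.7.1. Let $A(\epsilon) \in \mathbb{R}^{m \times n}$ have entries $\alpha_{ij}(\epsilon)$ that are differentiable functions of $\epsilon$. The derivative of $A(\epsilon)$ is defined entrywise:
$A^{(1)}(\epsilon) := \frac{d}{d\epsilon} A(\epsilon) = \left[ \frac{d}{d\epsilon} \alpha_{ij}(\epsilon) \right].$
5.7.1. Let $A(\epsilon) \in \mathbb{R}^{m \times n}$ and $B(\epsilon) \in \mathbb{R}^{n \times s}$ be differentiable. Then
$\frac{d}{d\epsilon}[A(\epsilon)B(\epsilon)] = \left[\frac{d}{d\epsilon}A(\epsilon)\right]B(\epsilon) + A(\epsilon)\left[\frac{d}{d\epsilon}B(\epsilon)\right]$
and, when $A(\epsilon)$ is square and nonsingular,
$\frac{d}{d\epsilon}[A(\epsilon)^{-1}] = -A(\epsilon)^{-1}\left[\frac{d}{d\epsilon}A(\epsilon)\right]A(\epsilon)^{-1}.$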
. .
17
(= well-posed)
(= ill-posed).
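$(A + \epsilon E)\,x(\epsilon) = b + \epsilon e, \quad x(0) = x.$
5.7.1
$\frac{\|x(\epsilon) - x\|}{\|x\|} \le |\epsilon|\,\|A^{-1}\| \left\{ \frac{\|e\|}{\|x\|} + \|E\| \right\} + O(\epsilon^2)$ (5.20)
$\le \kappa(A)(\rho_A + \rho_b) + O(\epsilon^2)$, (5.21)
where $\rho_A := |\epsilon| \frac{\|E\|}{\|A\|}$ and $\rho_b := |\epsilon| \frac{\|e\|}{\|b\|}$.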
A, b x . (5.20) (A).
(A) . p
:
$\frac{1}{\kappa_p(A)} = \min_{A + \Delta A \text{ singular}} \frac{\|\Delta A\|_p}{\|A\|_p}.$ (5.22)
, 1/p (A) , p, A .
(p = 2)
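$\kappa_2(A) = \frac{\sigma_1(A)}{\sigma_n(A)}$, (5.23)
where $\sigma_1$ and $\sigma_n$ are the largest and smallest singular values of $A$.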
:
5.7.1. If $\|A\| < 1$, then $I - A$ is nonsingular and
$\|(I - A)^{-1}\| \le \frac{1}{1 - \|A\|}.$
. .
(5.20):
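5.7.1. Let $\|A^{-1}\|\|\Delta A\| < 1$ and $(A + \Delta A)y = b + \Delta b$. Then
$A + \Delta A$ is nonsingular and
$\frac{\|y - x\|}{\|x\|} \le \frac{\kappa(A)}{1 - \kappa(A)\frac{\|\Delta A\|}{\|A\|}} \left( \frac{\|\Delta A\|}{\|A\|} + \frac{\|\Delta b\|}{\|b\|} \right).$
Proof. By the preceding lemma,
$\|(I + A^{-1}\Delta A)^{-1}\| \le \frac{1}{1 - \|A^{-1}\|\|\Delta A\|}.$
Moreover $y - x = (I + A^{-1}\Delta A)^{-1} A^{-1} (\Delta b - \Delta A\, x)$, so
$\frac{\|y - x\|}{\|x\|} \le \frac{\|A^{-1}\|}{1 - \|A^{-1}\|\|\Delta A\|} \left( \frac{\|\Delta b\|}{\|x\|} + \|\Delta A\| \right),$
and the result follows from $\|x\| \ge \|b\|/\|A\|$.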
5.7.2.
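$\frac{\|y - x\|}{\|x\|} \le \frac{2\epsilon\,\kappa(A)}{1 - \epsilon\,\kappa(A)}$, where $\epsilon$ bounds the relative perturbations $\|\Delta A\|/\|A\|$ and $\|\Delta b\|/\|b\|$.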
. .
:
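5.7.2. If $\epsilon\,\kappa(A) < 1/2$, where $\epsilon$ bounds the relative perturbations of $A$ and $b$, then
$\frac{\|y - x\|}{\|x\|} \le 4\epsilon\,\kappa(A).$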
. .
Ax = b , A P A = LU , P Ax = P b .
.
U
:
L,
U
= A + E.
L
.
(A + A)y = b . b = 0,
A. 18 :
+ L)(z
(L
+ z) = b,
(U + U )(x + x) = z + z.
U
) + (L)
U
+ (L)(
).
A = E + L(
U
kAk
, L
, L,
U
(
E, U
) .
18
:
5.7.3 (Wilkinson). A Rnn
x
Gauss .
$(A + \Delta A)\hat{x} = b$, $\|\Delta A\|_\infty \le c\, n^3 \rho_n^{MO} \|A\|_\infty u$,
where $c = O(1)$.
5.7.3
Gauss n .
, ... u 10d
= O(10k ),
.1 Gauss
d k [22].
5.7.2. MATLAB Matrix Computation
Toolbox N. Higham
www.maths.manchester.ac.uk/ higham/mctoolbox
A .
, randsvd
.1.
randsvd(4,1e14)
}|
0.2260 0.1746
0.3329
0.2226 0.1719
0.3278
A=
0.3578
0.2764 0.5269
0.0014 0.0011
0.0021
z
rand(4,1)
}|
0.1763
0.8307
0.1736
, x0 = 0.0548
0.1896
0.2790
0.0011
0.9753
kx0 xk2
= 0.0023
kx0 k2
ky xk
kxk
. :
SLAMCH: , epsmch.
SLANGE: kAk .
SGESV Ax = b L, U .
SGECON: , rcond.
errbd
:
k.k ,
.. .
(A). .
(A) . :
1. (A) O(n3 )
.1 A1 .
2. (A) ,
.
:
(A)
.
LAPACK MATLAB . [22, 24].
(.. )
. , ,
.
(..
,
, .) (A).
. ,
.
b .
5.7.5. A = A> Rnn . Aui = i ui
(i , ui )n
i=1 . Q , Q> AQQ> x = Q> b
Q> x = Q> b
1
..
u>
1x
u>
1b
..
..
.
= .
>
u>
x
u
b
n
n
>
j u>
j x = uj b.
j = 0 u>
j b = 0.
Ax = b
. ,
j = 0 u>
j b = 0. ,
j u>
j b
(.. ). A
,
1 n1 n = 0.
b n = 0.
$x = \sum_{i=1}^{n-1} \frac{u_i^\top b}{\lambda_i} u_i$
. A .
(
)
. (=effective condition number) [5],
b.
. ,
[1, 10r ] 10r ,
.
. b = [1 , 2 ]> ,
2 /10r .
. ,
, S (A) := k|A||A1 |k
A . 20
.
5.7.6. MATLAB:
>> a=rand(100);
>> flops(0); c_exct = cond(a); f_exct = flops;
>> flops(0); c_est = condest(a); f_est = flops;
>> [c_exct,c_est]
   2.5938e+03   5.8855e+03
>> [f_exct, f_est]
   2953425   764356
cond MATLAB 2 condest Test Matrix Toolbox 1 Hager ([23])
Higham ([24]). f_est < f_exct
c_exct, c_est = O(103 ).
1
5.7.7. Vn = V (0 , ..., n1 ) j = j+1
, , .
5.7.8.
.1
A. ,
A1 .
n Axj = ej , j = 1 : n.
[24].
5.8
.1
O(n3 )
... Strassen
, n O(n3 ) ,
( .1).
: Gaussian elimination is not optimal.
Strassen.
[6, 37]. A
. (5.19) A1
:
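$A^{-1} = \begin{pmatrix} A_{11}^{-1} & -A_{11}^{-1}A_{12}S^{-1} \\ 0 & S^{-1} \end{pmatrix} \begin{pmatrix} I & 0 \\ -A_{21}A_{11}^{-1} & I \end{pmatrix} = \begin{pmatrix} A_{11}^{-1} + A_{11}^{-1}A_{12}S^{-1}A_{21}A_{11}^{-1} & -A_{11}^{-1}A_{12}S^{-1} \\ -S^{-1}A_{21}A_{11}^{-1} & S^{-1} \end{pmatrix}$
1. $A_{11}^{-1}$,
2. $A_{11}^{-1}A_{12}$, $A_{21}A_{11}^{-1}$,
3. $S^{-1} = (A_{22} - A_{21}A_{11}^{-1}A_{12})^{-1}$,
4. $S^{-1}(A_{21}A_{11}^{-1})$, $(A_{11}^{-1}A_{12})S^{-1}$,
5. $A_{11}^{-1} + (A_{11}^{-1}A_{12})S^{-1}(A_{21}A_{11}^{-1})$
.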
Schur. n = 2k
n/2.
:
1. 2 inversions of matrices of order n/2,
2. 6 multiplications of matrices of order n/2,
3. 2 additions of matrices of order n/2.
(5.24)
:
5.8.1. A , 5.64nlog 7 .
. (5.24)
Strassen TMUL (n) 4.7nlog 7 ,
TINV (n)
log
Xn
j=1
2j1 TMUL (
n
n2
)
+
log
n
.
2j
2
(5.25)
.
( ) !
.
,
LU , .
,
, , ,
.. .
. ,
Schur,
. A
19
() , Ax = b : 1)
A>, 2) (A> A)1 A> b. Schonhage A ,
A> A . ,
Strassen .
A> A 2 (A> A) = 2 (A)
.
. , Bunch
Hopcroft Strassen O(nlog2 7 ).
[3] [24].
19
.
5.9
20
( O(n2 ) ).
,
. .
.
5.9.1, Cholesky
, LU .
5.9.1 Cholesky
, A Rnn (..)
x> Ax > 0 x Rn . ,
B , B > B ..
A A> ii
, A ..
A=
11
21
12
22
x A
..
|12 |, |21 |
11 + 22
.
2
: A ..
..
,
A .
If $A_{11}$ is the leading block of order $k$, then the Schur complement of
$A_{11}$, $S = A_{22} - A_{21} A_{11}^{-1} A_{12}$, is also s.p.d.
:
A. .. x> = (x1 , x2 )
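$x^\top A x = x_1^\top A_{11} x_1 + x_1^\top A_{12} x_2 + x_2^\top A_{21} x_1 + x_2^\top A_{22} x_2$
$= (x_1 + A_{11}^{-1}A_{12}x_2)^\top A_{11} (x_1 + A_{11}^{-1}A_{12}x_2) + x_2^\top (A_{22} - A_{21}A_{11}^{-1}A_{12}) x_2.$
Since $A$ is s.p.d., choosing $x_1 + A_{11}^{-1}A_{12}x_2 = 0$ gives $x^\top A x = x_2^\top S x_2 > 0$ for every $x_2 \ne 0$, so $S$ is s.p.d.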
20
R,
.
..
5.5.2 L, U A = LU . ,
,
LU .
:
5.9.1. A Rnn ..
(k)
A = Lk L1 A,
k Gauss A, (1 k n 1). :
(k)
1. Ak:n,k:n , ..
(k)
(k)
2. ij Ak:n,k:n
(k)
(k1)
i,j
|, k = 1, ..., n 1.
LU
, .. , =
1. ..
.
:
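5.9.2. If all leading principal submatrices of $A \in \mathbb{R}^{n \times n}$ are nonsingular,
then there exist unit lower triangular matrices $L$, $M$
and a diagonal matrix $D$ such that $A = LDM^\top$.
The factors $L$, $D$, $M$ are unique.
Proof. Under the assumption on $A$ there exist
$L$, $U$ with $A = LU$
and $L$ unit lower triangular. Let $D = \mathrm{diag}([u_{11}, \ldots, u_{nn}])$,
where diag(u) denotes the diagonal matrix built from the vector u.
Then $M^\top = D^{-1}U$ is unit upper triangular and $A = LDM^\top$.
Uniqueness follows from the uniqueness of the LU factorization.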
5.9.3. A Rnn .
R
A = R> R.
. .
n ( n = 1 ). ..
A =
A
a>
A = R> R
A =
R>
r>
R
0
>
>
r r > 0 = r r > 0.
x> Ax
> 0 x 6= 0,
A
> T
>
x = [r R , 1]
0 <= r> RT AR1 2r> RT a + = r> r.
R:,j Rjj
A:,j
j1
X
R:,k Rkj
k=1
A:,j
j1
X
R:,k Rjk
k=1
MXV GAXPY.
2
j Rjj
Rjj
R:,j .
:
. [ Cholesky MXV]
A
R.
for j = 1 : n
  if j > 1
    $A_{j:n,j} = A_{j:n,j} - A_{j:n,1:j-1} A_{j,1:j-1}^\top$
  end
  $A_{j:n,j} = A_{j:n,j} / \sqrt{A_{jj}}$
end
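The cost is $T \approx n^3/3$ operations, about half that of LU.
In floating-point arithmetic Cholesky is backward stable:
the computed solution $y$ satisfies $(A + E)y = b$ with $\|E\|_2 \le
c_n u \|A\|_2$, where $c_n$ is a small constant depending on $n$.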
A ...
Cholesky [22, .
147].
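A minimal sketch of the factorization $A = R^\top R$ with $R$ upper triangular, written row by row in plain Python. The function name `cholesky_upper` is hypothetical; the routine refuses non-s.p.d. input as soon as a non-positive pivot appears, which is one simple way to test positive definiteness.

```python
import math


def cholesky_upper(A):
    """Return upper triangular R with A = R^T R for a symmetric positive definite A."""
    n = len(A)
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # diagonal entry: A[j][j] minus squares of the entries above it in column j
        s = A[j][j] - sum(R[k][j] ** 2 for k in range(j))
        if s <= 0.0:
            raise ValueError("matrix is not symmetric positive definite")
        R[j][j] = math.sqrt(s)
        # remaining entries of row j
        for i in range(j + 1, n):
            t = A[j][i] - sum(R[k][j] * R[k][i] for k in range(j))
            R[j][i] = t / R[j][j]
    return R


A = [[4.0, 2.0, 2.0], [2.0, 5.0, 3.0], [2.0, 3.0, 6.0]]
print(cholesky_upper(A))   # [[2.0, 1.0, 1.0], [0.0, 2.0, 1.0], [0.0, 0.0, 2.0]]
```

Only the upper triangle of `A` is read, so symmetry is assumed rather than checked.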
,
BLAS3.
k R, R11 , R22
>
k n k , A11 = R11
R11
:
xTRSM
A12
z }| {
>
= R11
R12
>
>
A22 = R22
R22 + R12
R12
R12 .
>
R11
nk , . xTRSM BLAS3.
>
A22 R12
R12 xSYRK
BLAS3. Cholesky
, n k
Cholesky
BLAS2. xPOTF2.
Cholesky
LAPACK [1].
Cholesky, , 21 .
,
.1.
..
, .. A
.., .
, Laplace Poisson: (uxx + uyy ) = f (x, y).
. Cholesky .
5.10
,
.1, LAPACK. Fortran, C (CLAPACK),
C++ (LAPACK++) [1] Java (JLAPACK).
:
C (CLAPACK) Java (JLAPACK) ,
, Fortran 77
f2c f2j .
LAPACK C
C wrappers Fortran
LAPACK.
, LAPACK++ Template Numerical Toolkit (TNT),
ANSI C++. TNT
21
, , Cholesky.
,
..
5.1: LAPACK
X
S
D
C
Z
REAL
DOUBLE PRECISION
COMPLEX
COMPLEX*16
YY
GE
TR
TB
TP
GB
GT
HE
ZZZ
TRF
TRS
COND
RFS
TRI
EQU
YY
PO
PP
PB
PT
SY
SP
refine
/
()
()
()
()
()
()
LAPACK : ,
23 . ,
, .. LU.
, .
functions
(.. ).
. LAPACK , , .
.
[1], .. Compaq Alpha Server DS-20 22
.1. .
. [1].
23
(= expert drivers.)
5.2: ([24]).
1991
55.296
Connection Machine CM-2 4.4
1992/3 75.264
Intel iPSC/860
2 23
1994
76.800
Connection Machine CM-5 4.1
1995
128.600 Intel Paragon
1
5.11
LINPACK LAPACK
(=benchmarks) . ,
/ , MFLOPS, LINPACK 100
LINPACK 1000. LINPACK LAPACK Jack Dongarra University of Tennessee,
, , 27
( PC )
LINPACK.
.
: ,
Alan Edelman MIT
.. . [15, 12].
n ( 103 )
.
,
.. (= computational elecromagnetics). 5.2 [24]
( ) LU ...
.
5.12
MIT Gilbert Strang,
(1996). Gene Golub Charles van Loan [22]. ,
Horn Johnson [25, 26], Nick Higham [24],
. ,
,
. D.K. Faddeev
V.N. Faddeeva ( [16]),
Alston Householder [27], James H. Wilkinson
27
. http://www.netlib/org/benchmark/performance.ps Performance of
Various vector computers using standard linear equations software.
5.13
5.13.1 (, , 02-makeup). :
(A), LU .
5.13.2. :
A A = LU
. ( ,
.. ).
. : , A = [0, 1; 1, 0]
LU l11 u11 = 0 u11 = 0, U ,
LU .
5.13.3. , k = 1, ..., n 1
Gauss LU ,
(k, k)
(k + 1 : n, k).
;
. . U . , , 1
(I u1 e>
1 )M1,i A
u1 = 0. ,
(I u1 e>
1 )M1,i A
= M1,i A u1 e>
1 (M1,i A).
1 (I u1 e>
1 )M1,i A 1 U , ,
1 .
u1 , u1 e>
1 (M1,i A) 0,
>
>
U (1, :) = e>
1 (I u1 e1 )M1,i A = e1 M1,i A
5.13.5. ,
LU , x(1) Ax = b
x .
kx x(1) k/kxk
(A). . .
;
5.13.6. Ax = b LAPACK.
. (..
). LAPACK
. (..
Toeplitz).
5.13.7. A
Cholesky;
. [, . 5.9] .
5.13.8. :
.
. : s = [s1 , s2 , s3 ]>
1
T = t21
0
0
1
t32
0
0
1
t21 t32 6= 0. T s = e1 s3 6= 0.
s1 = 1, t21 + s2 = 0, t32 s2 + s3 = 0.
, s3 = 0 s2 = 0, t21 = 0, .
5.13.9.
Ax = b (
LAPACK) Robert Skeel
cond(A, x) =
A ,
b cond(A, x) = 1.
x; .
. x = e1 Ax = e1
1
1
x = [ 111 , 0, . . . , 0]> 11
e1 = |11
|
|A||x| = |A|e1 = e1 A .
|A1 ||A||x| = e1 . k|A1 | |A| |x|k = . |x|k =
.
5.13.10.
A = P LU L U ,
L
U . ( , LU
, ).
. L
, , . U
, , ,
j n j , ,
.
5.14
5.14.1. () u, v Rnn .
A = I + xy > det(A) = 1 + x> y . ( :
).
A + ei e>
j
A1 (I + A1 ei e>
j )
, I + A1 ei ej .
1
ei
, det(I + A1 ei ej ) = 1 + e>
j A
( B ) = 1/(A1 )ji .
(j, i) A1 (A1 )ji ,
1
>
. , (A1 )ji 6= 0, A (A1
)ji ei ej
.
5.14.3 (Golub and van Loan [22]).
n
N (y, k) = I ye>
k y R Gauss-Jordan.
1. N (y, k)1 .
2. x Rn y
N (y, k)x = ek ;
3. Gauss-Jordan
A1 A.
4. A
.
1. B := N (y, k)1 , I = N (y, k)B = B ye>
k B = B ybk,: .
B = I + ybk,: B 1
. B = I + zu> .
N (y, k)B
>
>
>
>
(I ye>
k )(I + zu ) = I + zu yek yu k ,
k k z .
>
0 = zu> ye>
k yu k
(z yk )u> = ye>
k.
u = ek y = z yk k = k k k ,
k =
k
,
1 k
k k y . z = y(1+k )
k .
I+
ye>
k
1 k
2. x = Bek = ek +
x = ek +
ye>
k
1k ek
y
,
1 k
1 6= k y 0 x = ek .
5.14.4. Mi1 ,i2 ,
AMi1 ,i2 A i1 i2 .
> >
. : B = AMi1 ,i2 = (Mi>
A
)
1 ,i2
, B > = Mi1 ,i2 A>
A> i1 i2 . i1
i2 A.
5.14.5. U Rnn
U x = ej ej Rn
j .
U x = ej j 2 j + 1.
. , x j
U 1 . U 1 ,
x 1 j . ,
j . ,
U11
0
U12
U22
x1
x2
ej
0nj
j = 1/j,j
b = [1,n , ..., j1,n ]> j
U (1 : j 1, 1 : j 1)[1 , ..., j1 ]> =
b
j ,
j 1 (j
1)2 . (j 1)2 + j .
5.14.6. A Rnn . ,
LU 23 n3 + O(n2 ).
n3 + O(n2 ). : 1
.
: (
. 5.14.5). .
. , ,
LX = I , I L , (. 5.14.6)
5.14.7. :
.
. :
. ,
5.13.8 : , Cholesky A = LL> , L
. A , L . A1 = L> L1 . 5.13.8,
L1 . , L>
. , A1
, ,
.
5.14.8.
LU A R33 , .. lu(A),
A
1 2
0.5 1
1 1
3
2
1
1. A.
2. Ax = b
b = [11, 10, 4]> .
.
1.
L U .
:
1
L = 0.5
1
0 0
1 2
1 0 U = 0 1
1 1
0 0
1
A = LU = 0.5
1
3
2
1
2
3
2 3.5
1
0
2. L, U
b = [11, 10, 4]> , x =
[4.5, 0.5, 2.5]> .
5.14.9. A Rnn
A=
U
s
U R(n1)n Hessenberg s = [1 , , n ]. LU A.
A .
.
. (n, 1 : n1).
, .
n 1 Gauss (n, 1), (n, 2), , (n, n 1).
( L, U lij , uij .)
L = eye(n)
for k = 1 : n 1
lnk = k /ukk
for j = k + 1 : n
j = j lnk ukj
end
end
, L U
L=
..
.
U
,
U
=
..
..
0, , 0, n
. 0
.
ln1 ln2 1
Pn1
Pn
k=1 (1 + j=k+1 2), . n2 + (n).
5.14.10 (, , 04). A Rnn
A = U V > j,j = j
n,n = 0, U U > = I V V > = I . )
A . ) b = U z
z = [1, . . . , 1, 0]> . , , x
Ax = b ( : x
V .)
=
0
.
A
=
n
Pn1
>
>
u
v
,
U
,
U
V
x
=
U
z
j
j
j
j=1
V > x = z . x Rn V ,
x = V y y , y = z .
, j j = 1 j = 1 : n 1 n
Pn1
. , x = j=1 1j vj .
5.14.11. LU A
Rnn u, v Rn . ) B =
A1 (I uv > ). ) ( cnk )
B . (.
0
... !) )
Bz z Rn .
.
() A = LU .
:
$B = A^{-1}(I - uv^\top) = A^{-1} - A^{-1}uv^\top$
:
1. Ax = u,
2. A1 ,
3. C = xv > ,
4. B = A1 C .
()
:
1. 2n2 ( LU).
2. 2n3 n LX = I U B = X B = A1 .
, ,
LX = I , I L , (. 2
n(n+1)(2n+1)
5.14.6)
+ n3 = 34 n3 + n2 + O(n)
6
L, U .
3. n2 .
4. n2 .
= 2n3 + 4n2 + O(n) .
= 43 n3 + 72 n2 + O(n).
() Bz , :
$Bz = A^{-1}(I - uv^\top)z = A^{-1}z - A^{-1}u\,v^\top z$
:
1. solve $Ax = z$
2. solve $Ay = u$
3. $\gamma = v^\top z$
4. $x - \gamma y$
1 2, 4n2 .
5.14.12. A Rnn .
1. u Rn B := A + uuT
.
2. uT A1 u > 0.
3. $C := A^{-1} - A^{-1}u(1 + u^\top A^{-1}u)^{-1}u^\top A^{-1}$ satisfies $CB = I$.
4. Cholesky A.
Bx = y
. Cholesky, ... n2 + O(n)
.
.
1. A x> Ax > 0,
x 6= 0. B = A + uu> :
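$Bx = y \Rightarrow CBx = Cy \Rightarrow x = Cy$, that is,
$x = A^{-1}y - A^{-1}u(1 + u^\top A^{-1}u)^{-1}u^\top A^{-1}y.$
Computation of $x$:
4.1 solve $Az_1 = u$ ($2n^2$ operations)
4.2 solve $Az_2 = y$ ($2n^2$ operations)
4.3 $\gamma_1 = u^\top z_2$ ($2n - 1$ operations)
4.4 $\gamma_2 = (1 + u^\top A^{-1}u)^{-1}\gamma_1 = (1 + u^\top z_1)^{-1}\gamma_1$ ($2n + 2$ operations)
4.5 $z_4 = \gamma_2 z_1$ ($n$ operations)
4.6 $x = z_2 - z_4$ ($n$ operations)
Total: $4n^2 + 6n + 1 = 4n^2 + O(n)$ operations.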
( 4.1, 4.2), Cholesky,
( 5.0.108).
5.14.13. A Rnn , n = 2k
k . .
() A = LU :
A11
A21
A12
A22
I
X
0
I
Y1
0
Y2
Y3
X, Y1 , Y2 , Y3 R 2 2 .
1. X, Yj , j = 1 : 3 Aij
A1
ij .
2.
O(n ) ..., 2 < 3,
Ax = b O(n ) ... ( :
.)
.
1
1. X = A21 A1
11 , Y1 = A11 , Y2 = A12 , Y3 = A22 A21 A11 A12 (
Schur). A11 A .
2. ( 5,
(Strassen).
5.14.14. T Rnn A R(n+1)(n+1) ,
A=
T
vT
u, v Rn , 6= 0 R . T ,
K(n)
T y = z K(n) limn n3 = 0.
1. b Rn .
Ax = b A
.
2. ...
n K .
5.14.15. ) LU
U
1
0
1
1
A=
1 1
1 1
0
0
1
1
1
0
.
1
1
) , ,
A 2 (A) = 1.8.
Ax = b b A
; ) U
;
. , 5.7.
, , ,
LU ,
.
5.14.16. MATLAB
.
.
MATLAB .
for i = 1 : n
for j = 1 : n
A(i, j) = 1/(i + j 1)
end
x = A\b
b=x+b
end
. . b
b = sin(2*pi*[1:n]/(n+1)); A
( Hilbert) :
J
J
I
E
A
=
=
=
=
=
1:n;
J(ones(n,1),:);
J;
ones(n,n);
E./(I+J-1);
,
:
, ,
MATLAB .
. Vandermonde n x.
m (V > V )
B . , ) V , ) , )
. , ) ,
. (
)
x .
x.[0:n-1];
) x [0:n-1] (. 1 n)
A=rand(m,n);B=rand(m,n);C=rand(m,n);X=rand(m+n,s);D=eye(m+n,m+n);
for i=1:m, for j=1:n, D(i,j)=A(i,j)*B(i,j)+C(i,j); end;end;
for i=1:m, for j=1:n, if (i==j), D(i,j)=p+D(i,j); end;end;end;
for k=1:s, Y(:,k)=D(1:m+n,1:m+n)\X(:,k); end;
. :
D(1:m,1:n)=A.*B+C;
k=min(m,n); D(1:k,1:k)=p*eye(k)+D(1:k,1:k);
s X(:, k), k = 1 : s
LU D :
[L,U]=lu(D); Y=U\(L\X);
5.14.19. A Rnn . ,
,
LU = P AQ L, U , L
, Q . , ,
n = 4.
. Lk (uk ) = I uk e>
j
,
i,j
k
t (i0 , j0 ) ( , t
(1,1) ). , ,
>
Mi,i0 Mi,i0 = I AMi,i
0 = AMi,i0 0
i i ,
M1,i0 AM1,j0
Gauss
(i1 , j1 )
Gauss,
( , n = 4)
A(3) = U.
, ,
L3 M3,i2 L2 M3,i2 M3,i2 M2,i1 L1 M2,i1 M3,i2 M3,i2 M2,i1 M1,i0 A M1,j0 M2,j1 M3,j2
|{z}
{z
}|
{z
}|
{z
} |
{z
}
|
3
L
2
L
1
L
k Lk , .. k = 2,
i1 2, i2 3, L
1
L
=
=
>
M3,i2 M2,i1 (I u1 e>
1 )M2,i1 M3,i2 = I M3,i2 M2,i1 u1 (M3,i2 M2,i1 e1 ) )
>
I u
1 e1
u
1 = M3,i2 M2,i1 u1 .
u1 ( 2 n.) , Q =
M1,j0 M2,j1 M3,j2 , .
P AQ
1 L
1 L
1 U
= L
1
2
3
= (I + u
1 e>
1 e>
1 e>
1 )(I + u
1 )(I + u
1 )U
= (I + u
1 + u
2 + u
3 )U = LU
5.14.20 ( 03).
Hessenberg , A Rnn , LU
(,
) A L, U
214
A. )
: k =
max{0 ,...,k1 }
,
0
(k)
k := maxi,j |ij |
(k + 1 : n, k + 1 : n)
Lk L1 A k LU .
, M ,
. , LU
A. MATLAB error(MSG)
MSG
max(X) X
X.
. ) :
1, ..., n 1
for j=1:n-1
A(j+1,j) = A(j+1,j)/A(j,j)
A(j+1,j+1:n) = A(j+1,j+1:n) - A(j+1,j) A(j,j+1:n)
end
) :
a0 = max(max(abs(A))); r= 1;
for j=1:n-1
if r < M
A(j+1,j) = A(j+1,j)/A(j,j);
A(j+1,j+1:n) = A(j+1,j+1:n) - A(j+1,j) A(j,j+1:n);
r = max(max(max(abs(A(j+1:n,j+1:n))))/s,r \right);
else
error(Must use pivoting);
end
end
215
[6] T.H. Cormen, C.E. Leiserson, and R.L. Rivest. Introduction to Algorithms.
McGraw-Hill, New York, 1990.
[7] G. Dahlquist and . Bjorck. Numerical Methods. Prentice-Hall, 1974.
[8] L. DeRose, K. Gallivan, E. Gallopoulos, B. Marsolf, and D. Padua. FALCON:
A MATLAB Interactive Restructuring Compiler. In C.-H. Huang, et al.,
editor, Lecture Notes in Computer Science: Languages and Compilers for
Parallel Computing, pages 269288. Springer-Verlag, New York, 1995.
[9] J. J. Dongarra, F. G. Gustavson, and A. Karp. Implementing linear algebra
algorithms for dense matrices on a vector pipeline machine. SIAM Rev.,
26(1):91111, January 1984.
[10] J.J. Dongarra, J.R. Bunch, C.B. Moler, and G.W. Stewart. LINPACK Users
Guide. SIAM, Philadelphia, PA, 1979.
[11] J.J. Dongarra, R. Pozo, and D. Walker. LAPACK++: A design overview of
object-oriented extensions for high performance linear algebra. In Proc.
Supercomputing93, pages 162171. IEEE Computer Soc. Press, 1993.
[12] J.J. Dongarra and D.W. Walker.
Software libraries for linear
algebra
computations
in
high
performance
computers.
SIAM Rev.,
37(2):151180,
June 1995.
Also in
http://hpclab.ceid.upatras.gr/faculty/stratis/download/Dongarrawalker.zip.
[13] I. S. Duff, A. M. Erisman, and J. K. Reid. Direct Methods for Sparse Matrices. Clarendon Press, Oxford, 1989.
[14] I. S. Duff, R. G. Grimes, and J. G. Lewis. Sparse matrix test problems.
ACM Trans. Math. Softw., 15:114, 1989.
[15] A. Edelman. The first annual large dense linear system survey. ACM
SIGNUM Newsletter, 26(4), October 1991.
[16] D. K. Faddeev and V. N. Faddeeva. Computational Methods of Linear Algebra. W. H. Freeman and Co., San Francisco, 1963.
[17] R.W. Freund, G.H. Golub, and N.M. Nachtigal. Iterative solution of linear
systems. In Acta Numerica, volume 1, pages 57100. Cambridge University
Press, 1992.
[18] C.F. Gauss. Theoria Combinationis Observationum Erroribus Minimis Obnoxiae (Theory of the Combination of Observations Least Subject to Errors).
SIAM, Philadelphia, 1995. Translation and notes by G.W. Stewart.
[19] W. Gautschi. How (un)stable are Vandermonde systems? In R. Wong,
editor, Asymptotic Analysis and Computational Analysis, pages 193210.
Marcel Dekker, Inc., New York, 1990.
[20] J. R. Gilbert, C. Moler, and R. Schreiber. Sparse matrices in MATLAB:
Design and implementation. SIAM J. Matrix Anal. Appl., 13(1):333356,
1992.
216
III
:
.2 A m n, m- b. n x krk2
r = b Ax.
. :
.1 .2. m
n, A ,
x Ax = b. x
b Ax.
. x
x
. .. ([3]) A = [1, 1, 1]> b = [1 , 2 , 3 ]> 1 2 3 0 krkp
p = 1 x = 2 , p = 2 (1 + 2 + 3 )/3
p = (1 + 3 )/2. p = 2
kb Axk2
x,
.
, .
.2. ,
. Q> Q = I kb Axk2 = kQ> b Q> Axk2 , Q minx kkQ> b Q> Axk2 .
, ,
k k2 .
2.
6.1 QR
A
. .
6.1.1. A Rmn m n, Q
mn
R
() R Rnn
A = QR. m = n Q A
R
. Q, R .
6.2
6.2.1. S Rn . S
S .
6.2.2. P : Rn Rn P 2 = P .
P ( ).
P
.
6.2.1. P , I P
.
:
Rn (= span) S T Rn x Rn
s S , t T .
x Rn s S
S T .
6.1, S T
. P 1 .
s S t T (, ) Rn ,
P : Rn S, P x = s kai P y = 0 y T .
1
, Rn Cn .
x
t
s
6.1: s () x S T .
S .
6.2.1. S Rn P
.
1. x S P x = x.
P x + (I P )x.
2. x = |{z}
| {z }
s
3. P 2 = P ().
4. P > = P .
6.2.1 ( ). S, T
V x V
x = s + t s S , t T .
6.2.1. .
( = oblique projectors).
6.2.1
u Rm
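$P := \frac{uu^\top}{u^\top u} \in \mathbb{R}^{m \times m}.$
For $x \in \mathbb{R}^m$,
$Px = \frac{uu^\top}{u^\top u}x = \frac{u}{\|u\|}\,\frac{u^\top x}{\|u\|} = v\,\|x\|\cos(x, u), \quad v := \frac{u}{\|u\|}.$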
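For $u = [1, 2, 3]^\top$:
$P = \frac{1}{14}\begin{pmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \\ 3 & 6 & 9 \end{pmatrix}$,
$Px = \frac{1}{14}\begin{pmatrix} \xi_1 + 2\xi_2 + 3\xi_3 \\ 2\xi_1 + 4\xi_2 + 6\xi_3 \\ 3\xi_1 + 6\xi_2 + 9\xi_3 \end{pmatrix} = \frac{\xi_1 + 2\xi_2 + 3\xi_3}{14}\begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}.$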
6.2.2 Gram-Schmidt
Gram-Schmidt
6.1.1. A Rmn .
Gram-Schmidt [q1 , ..., qn ]
ha1 , ...., an i. . .
q1 , ..., qk1 span{q1 , ...., qk } = span{a1 , ...., ak },
qk :
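$\tilde{q}_k := (I - P_1 - \cdots - P_{k-1})a_k$,
$q_k := \tilde{q}_k / \|\tilde{q}_k\|$,
where $P_j := q_j q_j^\top$ is the orthogonal projector onto $\mathrm{span}\{q_j\}$. Equivalently,
$\tilde{q}_k := a_k - (q_1^\top a_k)q_1 - \cdots - (q_{k-1}^\top a_k)q_{k-1}$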
q1 , ..., qk1 , qk . :
Algorithm [classical Gram-Schmidt]
function [Q, R] = CGS(A)
for k = 1 : n
  for i = 1 : k - 1
    $\rho_{ik} = q_i^\top a_k$
  end
  $\tilde{q}_k = a_k - \sum_{i=1}^{k-1} \rho_{ik} q_i$
  $\rho_{kk} = \|\tilde{q}_k\|_2$
  $q_k = \tilde{q}_k / \rho_{kk}$
end
CGS
= 2mn2 + O(mn).
... T
BLAS-1, . DOT sAXPY.
CGS jk (1 j k )
$a_k = \sum_{j=1}^{k} \rho_{jk} q_j, \quad 1 \le k \le n,$
A = Q1 R1 Q1 = [q1 , ..., qn ]
$R_1 = \begin{pmatrix} \rho_{11} & \cdots & \rho_{1n} \\ 0 & \ddots & \vdots \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \rho_{nn} \end{pmatrix}$
, CGS, R1
11 , 12 , 22 , ..., . .
kk = 0,
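6.2.2. [2]
$A = \begin{pmatrix} 1 & 1 & 1 \\ \epsilon & 0 & 0 \\ 0 & \epsilon & 0 \\ 0 & 0 & \epsilon \end{pmatrix}$
with $\epsilon$ so small that, in floating-point arithmetic, $1 + \epsilon^2 = 1$.
For instance, in MATLAB 4.2 on a Macintosh Powerbook 5300
(PowerPC CPU), CGS with $\epsilon = 1.4901\mathrm{e}{-08}$ gives
$Q = \begin{pmatrix} 1 & 0 & 0 \\ \epsilon & -\frac{\sqrt{2}}{2} & -\frac{\sqrt{2}}{2} \\ 0 & \frac{\sqrt{2}}{2} & 0 \\ 0 & 0 & \frac{\sqrt{2}}{2} \end{pmatrix}$
Here $q_2^\top q_3 \approx 0.5$, so $Q$ is far from orthogonal:
$\|I - Q_{CGS}^\top Q_{CGS}\|_F \approx 0.7071$,
although $\|A - Q_{CGS} R_{CGS}\| \approx 10^{-25}$. The computed $Q$ and $R$
reproduce $A$, but $Q$ is not orthogonal.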
6.2.3 GS
, CGS Q.
GS . , q1 ,
a2 ,
A,
a3 , ..., an . Q
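$q_1 := \tilde{q}_1 / \|\tilde{q}_1\|$
$\tilde{q}_2 = \tilde{q}_2 - P_1\tilde{q}_2 = (I - P_1)\tilde{q}_2$
$\tilde{q}_3 = \tilde{q}_3 - P_1\tilde{q}_3 = (I - P_1)\tilde{q}_3$
...
$\tilde{q}_n = \tilde{q}_n - P_1\tilde{q}_n = (I - P_1)\tilde{q}_n$,
where $P_1\tilde{q}_j = (q_1^\top \tilde{q}_j)q_1$,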
Gram-Schmidt2 .
2
Modified Gram-Schmidt
Algorithm [modified Gram-Schmidt]
function [Q, R] = MGS(A)
Q = A
for k = 1 : n
  $\rho_{kk} = \|q_k\|$
  $q_k = q_k / \rho_{kk}$
  for j = k + 1 : n
    $\rho_{kj} = q_k^\top q_j$
    $q_j = q_j - \rho_{kj} q_k$
  end
end
CGS
. , .
qj CGS. .
6.2.3. For the matrix of the previous example, MGS gives
$\|I - Q_{MGS}^\top Q_{MGS}\|_F \approx 1.2 \times 10^{-8}$,
. Q
CGS:
MGS
.
(
Vandermonde).
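The difference between the two variants can be reproduced with a minimal sketch in plain Python. The helper names (`cgs`, `mgs`, `orthogonality_loss`) are hypothetical, matrices are stored as lists of columns, and the value $\epsilon = 10^{-8}$ is an assumption chosen so that $1 + \epsilon^2$ rounds to 1 in IEEE double precision.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cgs(cols):
    """Classical Gram-Schmidt: every coefficient is computed from the original column."""
    Q = []
    for a in cols:
        coeffs = [dot(q, a) for q in Q]
        v = list(a)
        for r, q in zip(coeffs, Q):
            v = [vi - r * qi for vi, qi in zip(v, q)]
        nrm = math.sqrt(dot(v, v))
        Q.append([vi / nrm for vi in v])
    return Q


def mgs(cols):
    """Modified Gram-Schmidt: each new q is removed from all remaining columns at once."""
    V = [list(a) for a in cols]
    Q = []
    for k in range(len(V)):
        nrm = math.sqrt(dot(V[k], V[k]))
        q = [vi / nrm for vi in V[k]]
        Q.append(q)
        for j in range(k + 1, len(V)):
            r = dot(q, V[j])
            V[j] = [vj - r * qi for vj, qi in zip(V[j], q)]
    return Q


def orthogonality_loss(Q):
    """Frobenius norm of I - Q^T Q."""
    n = len(Q)
    return math.sqrt(sum((dot(Q[i], Q[j]) - (1.0 if i == j else 0.0)) ** 2
                         for i in range(n) for j in range(n)))


eps = 1e-8
columns = [[1.0, eps, 0.0, 0.0], [1.0, 0.0, eps, 0.0], [1.0, 0.0, 0.0, eps]]
print(orthogonality_loss(cgs(columns)))   # about 0.7: orthogonality is lost
print(orthogonality_loss(mgs(columns)))   # of the order of eps
```

In exact arithmetic the two functions return the same vectors; the gap appears only because of rounding.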
6.2.4 Householder
( Householder)
$H := E(u, u; 2/u^\top u) = I - \frac{2}{u^\top u}uu^\top.$
,
$H^\top = H$ and $H^\top H = H^2 = I$.
H , .
$Hx = \left(I - \frac{2}{u^\top u}uu^\top\right)x = x - Px - Px$
[Figure 6.3: reflection of $x := \vec{OA}$ across $\langle u \rangle^\perp$: $Hx = (x - P_{\langle u \rangle}x) - P_{\langle u \rangle}x$.]
H (
).
.
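For $x \in \mathbb{R}^n$,
$Hx = \left(I - 2\frac{uu^\top}{u^\top u}\right)x = x - 2\frac{u^\top x}{u^\top u}u.$
For $Hx \in \langle e_1 \rangle$ we need $u \in \langle x, e_1 \rangle$, so set
$u = x + \alpha e_1$. Then $u^\top x = x^\top x + \alpha\xi_1$, $u^\top u = x^\top x + 2\alpha\xi_1 + \alpha^2$ and
$Hx = \left(1 - 2\frac{x^\top x + \alpha\xi_1}{x^\top x + 2\alpha\xi_1 + \alpha^2}\right)x - 2\alpha\frac{u^\top x}{u^\top u}e_1.$
The coefficient of $x$ vanishes for $\alpha = \pm\|x\|_2$:
$u = x \pm \|x\|_2 e_1 \Rightarrow Hx = \mp\|x\|_2 e_1.$
To avoid cancellation, choose
$u = x + \mathrm{sign}(\xi_1)\|x\|_2 e_1.$
Scaling $u$ by any $\beta \ne 0$ leaves the Householder matrix $H$ unchanged:
$H = I - \frac{2}{u^\top u}uu^\top = I - \frac{2}{(\beta u)^\top(\beta u)}(\beta u)(\beta u)^\top.$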
, u u(1) = 1.
x 6= 0, REFL Householder Hx = kxke1 . , u u(1) = 1.
function u = REFL(x)
n = length(x); $\mu$ = norm(x, 2); u = x
if $\mu \ne 0$
  $\beta$ = x(1) + sign(x(1)) $\mu$
  u(2 : n) = u(2 : n)/$\beta$
end
u(1) = 1
end
REFL = 3n ...
: T
u Householder.
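A minimal sketch of REFL in plain Python, together with a helper that applies the reflector to a vector without forming $H$. The names `refl` and `apply_reflector` are hypothetical, and `math.copysign` treats a zero first component as positive, which is an assumption of this sketch.

```python
import math


def refl(x):
    """Householder vector u, normalised so that u[0] = 1, with H x a multiple of e_1."""
    mu = math.sqrt(sum(v * v for v in x))
    u = list(x)
    if mu != 0.0:
        beta = x[0] + math.copysign(mu, x[0])   # same sign as x[0]: no cancellation
        u[1:] = [v / beta for v in u[1:]]
    u[0] = 1.0
    return u


def apply_reflector(u, x):
    """Compute H x = x - 2 (u^T x / u^T u) u without forming H."""
    t = 2.0 * sum(a * b for a, b in zip(u, x)) / sum(a * a for a in u)
    return [xi - t * ui for xi, ui in zip(x, u)]


x = [3.0, 4.0]
u = refl(x)                    # [1.0, 0.5]
print(apply_reflector(u, x))   # [-5.0, 0.0], i.e. -sign(x[0]) * ||x|| * e_1
```

Storing only `u[1:]` is enough to recover the reflector, since `u[0]` is always 1.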
6.2.4. u =
6.2.5
H . A
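$HA = A - \frac{2}{u^\top u}u(A^\top u)^\top$ and $AH = A - \frac{2}{u^\top u}(Au)u^\top$.
Row version ($HA$): $\beta = -2/u^\top u$, $w = A^\top u$, $B = A + \beta u w^\top$.
Column version ($AH$): $\beta = -2/u^\top u$, $w = Au$, $B = A + \beta w u^\top$.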
AH HA 1 DOT, 1 MXV,
REFL.(COL/ROW)
1. T
= 4mn +
...
:
$H := H_1 H_2 \cdots H_r$,
$H_j = I - \frac{2}{(u^{(j)})^\top u^{(j)}}u^{(j)}(u^{(j)})^\top$, $u^{(j)} = [0, \ldots, 0, 1, u^{(j)}_{j+1}, \ldots, u^{(j)}_n]^\top.$
, .. H > A:
for j = 1 : r
(* A = Hj A *)
A = REFL.ROW(A(j : m, u(1 : m))
end
BLAS3. 6.3.2
.
6.2.1. REFL.ROW
2 .
.
A(2)
= H2 H1 A =
x x
x x
x x
.
x x
x x
x x
H3
A(2) ( Z ). Hj ( ) A.
$H_n H_{n-1} \cdots H_1 A = R$,
and since each $H_j$ is symmetric and orthogonal,
$A = \underbrace{H_1 H_2 \cdots H_n}_{Q} R = QR.$
Q Rmm
, R Rmn .
:
. [ QR Householder]
A Rmn m n.
H1 , ..., Hn Q := H1 ...Hn Q> A = R
. A
R. j + 1 : m j
A(j + 1 : m, j), j < m.
for j = 1 : n
11
(1)
2
.
.
.
.
.
(1)
m
12
22
(1)
2
.
.
.
.
.
1n
2n
.
n,n
(2)
m
n+1
(n)
m
(n)
Householder ,
.
, Q
. .
6.3.1 QR
, . ,
QR, . Q1 Rmn
R1 Rnn A = Q1 R1 .
Then $A^\top A = R_1^\top Q_1^\top Q_1 R_1 = R_1^\top R_1$, so $R_1$ is the Cholesky factor
of the (s.p.d.) matrix $M := A^\top A$.
M (n) . n. , O(M (n)).
Cholesky. :
1. M := A> A m
n O(M (n)).
O(M (n)).
m
n
. O(M (n)).
6.3.2 QR Householder
Householder Q Q = H1 Hr . ,
Q
REFL.ROW REFL.COL. :
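$H_1 = I - \beta_1 u_1 u_1^\top$, $H_2 = I - \beta_2 u_2 u_2^\top$,
$H_1 H_2 = I - G$, where
$G := \beta_1 u_1 u_1^\top + \beta_2 u_2 u_2^\top - (\beta_1\beta_2 u_1^\top u_2)u_1 u_2^\top.$
$G$ has rank at most 2, since for every $x$
$Gx = (\beta_1 u_1^\top x)u_1 + (\beta_2 u_2^\top x)u_2 - (\beta_1\beta_2\, u_1^\top u_2\, u_2^\top x)u_1$
is a linear combination of $u_1$ and $u_2$. Hence $G$ can be written as a product of an $n \times 2$ and a $2 \times n$ matrix, and
$H_1 H_2 = I + WY^\top$, $W, Y \in \mathbb{R}^{n \times 2}$.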
, W, Y r 4 :
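6.3.1. Let $Q = I + WY^\top$ with $W, Y \in \mathbb{R}^{n \times r}$. If $P = I - 2uu^\top/u^\top u$ with $u \in \mathbb{R}^n$ and $z := -2Qu/u^\top u$, then
$Q_+ = QP = I + W_+ Y_+^\top$,
where $W_+ = [W\ z]$ and $Y_+ = [Y\ u]$.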
Q W Y
, .
W, Y r :
Q := H1 ...Hr Hj Householder n. W, Y Q = I + W Y > :
$Y = u^{(1)}$
$W = -2u^{(1)} / (u^{(1)})^\top u^{(1)}$
for j = 2 : r
  $z = -2(u^{(j)} + WY^\top u^{(j)}) / (u^{(j)})^\top u^{(j)}$
  $W = [W\ z]$
  $Y = [Y\ u^{(j)}]$
end
Hj
...
T = 2r2 n 2r3 /3
Y Householder,
QR ,
Y .
1
(1)
2
Y =
.
(1)
n
0
1
(2)
n
.
.
0
.
0
1
sr
:
QA =
=
W, Y Rnr .
(I + W Y > )B = B + W Y > B .
Wj , Yj , Wj Yj> A
BLAS3,
. ,
QR :
4
Q .
1.1: H1 , ..., Hr (j + 1 : m, j) A j = 1, .., r ,
r A:
Pr := Hr H1 = I + Wr Yr>
1.3: Pr r + 1 : n A.
:
.
[QR Householder
(* r *) = 1; k = 0
while n
= min( + r 1, n); k = k + 1
(* A( : m, : n)
H , ..., H . *)
(* Wk , Yk
I + Wk Yk> = H ...H *)
A( : m, + 1 : n) = (I + Wk Yk> )> A( : m, + 1 : n)
= +1
end
. ...
QR. r . r
. [1].
Householder, . r = 1
Householder, r > 1 Householder
r .
,
,
(.. )
, .
6.3.3 .2 QR
QR , .2, A
.
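If $A \in \mathbb{R}^{m \times n}$ with $m \ge n$ has full column rank, $b \in \mathbb{R}^m$, and $Q \in \mathbb{R}^{m \times m}$ is orthogonal with
$Q^\top A = R = \begin{pmatrix} R_1 \\ 0 \end{pmatrix}$, $R_1 \in \mathbb{R}^{n \times n}$, and $Q^\top b = \begin{pmatrix} c \\ d \end{pmatrix}$,
then the least squares solution is $x = R_1^{-1}c$ and
$\min_{x \in \mathbb{R}^n} \|Ax - b\|_2 = \|d\|_2.$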
. [ .2 Householder]
1. A = QR. A Q R.
2.
for j = 1 : n
(* . . j . *)
6.3.4
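6.3.1. If $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$ and $A^\top(b - Ax) = 0$, then
$\|b - Ax\|_2 \le \|b - Ay\|_2$
for every $y \in \mathbb{R}^n$.
Proof. Let $r_x = b - Ax$
and $r_y = b - Ay$. Then $r_y = r_x + (Ax - Ay)$ and, since
$A^\top r_x = 0$, $\|r_y\|_2^2 = \|r_x\|_2^2 + \|A(x - y)\|_2^2 \ge \|r_x\|_2^2$.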
r = b-Ax
Range(A)
Ax
6.4: x b Ax Range(A).
6.3.1. .
6.3.1. .
.2 x
A> Ax = A> b.
A , A> A
... .
.2.
. [ ]
1. C = A> A
d = A> b.
2. Cholesky C = GG> .
3. Gy = d G> x = y .
T = mn2 + n3 /3 + O(n2 )
... :
A> A
A (.. ).
, .
5 ...,
. , Pentium
III
RealMax 1.7977 10308 .
IEEE ...
, .. PowerPC.
.
.2 . ,
(A> A). 2 (A> A) := [2 (A)]2 ,
A
.
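6.3.2. Let
$A = \begin{pmatrix} 1 & 1 \\ \epsilon & 0 \\ 0 & \epsilon \end{pmatrix}$
with $\epsilon \ne 0$, so that rank(A) = 2, and consider $Ax = b$. Then
$A^\top A = \begin{pmatrix} 1 + \epsilon^2 & 1 \\ 1 & 1 + \epsilon^2 \end{pmatrix}.$
In floating-point arithmetic,
fl$(1 + \epsilon^2) = 1$ whenever $\epsilon^2 < \epsilon_M$. On the Macintosh Powerbook
5300 (PowerPC CPU), which follows the IEEE floating point standard, this happens for $\epsilon < 1.4901 \times 10^{-8}$. The computed $A^\top A$
is then singular, and Cholesky fails.
.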
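The loss of information when forming $A^\top A$ can be checked directly. The sketch below, in plain Python with a hypothetical helper `gram`, assumes IEEE double precision and $\epsilon = 10^{-8}$, for which $1 + \epsilon^2$ rounds to 1.

```python
def gram(A):
    """Form A^T A for a matrix stored as a list of rows."""
    n = len(A[0])
    return [[sum(row[i] * row[j] for row in A) for j in range(n)] for i in range(n)]


eps = 1e-8
A = [[1.0, 1.0], [eps, 0.0], [0.0, eps]]   # rank 2 in exact arithmetic
G = gram(A)
det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
print(G)     # [[1.0, 1.0], [1.0, 1.0]]: the eps**2 terms have been rounded away
print(det)   # 0.0: the computed normal-equations matrix is singular
```

A QR-based solver works with $A$ itself and therefore still sees the entries of size $\epsilon$.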
6.3.5
A , .
LAPACK:
QR: A _GEQRF.
Q
.
5
_ORGQR (_ORMQR)
() Q Q> A.
.2:
QR LQ
SVD
_GELS
_GELSX
_GELSS
_GELSX, _GELSS
.
6.4 Givens
QR. Householder,
. .
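Identify $\mathbb{C}$ with $\mathbb{R}^2$. For $x = \xi_1 + i\xi_2 = |x|e^{i\phi} \in \mathbb{C}$,
$xe^{i\theta} = (\xi_1 + i\xi_2)(\cos\theta + i\sin\theta)$
$= (\xi_1\cos\theta - \xi_2\sin\theta) + i(\xi_1\sin\theta + \xi_2\cos\theta)$
$= |x|e^{i(\phi + \theta)}.$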
ei x . 1 , 2 R2
c = cos , s = sin ,
:
c1 s2
s1 + c2
c s
s c
1
2
1
2
s= p 2
c= p 2
1 + 22
1 + 22
G=
c
s
s
c
p
0
12 + 22
c s
s c
1
2
x Gx e2 .
6.5.
G
Wallace J. Givens.
6.4.1. .
2
2
G , kGxk = x> G> x = kxk
Gx kxk.
6.5: Givens
x e2 .
n- .
x Rn hej , ek i
ek , . e>
k G(j, k, )x = 0.
G(j, k, ) Rnn
(i1 , i2 ) . j < k
s
=
s
i1 ,i2
i1 = i2 = j i1 = i2 = k
i1 = k i2 = j
i1 = j i2 = k
.
i1 ,i2 Kronecker i1 = i2
. G(j, k, )> x x hej , ek i.
6.4.1
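Computes $c$, $s$ for given $\xi_1$, $\xi_2$ such that
$\begin{pmatrix} c & s \\ -s & c \end{pmatrix}^\top \begin{pmatrix} \xi_1 \\ \xi_2 \end{pmatrix} = \begin{pmatrix} r \\ 0 \end{pmatrix}.$
function [c, s] = GIVENS($\xi_1$, $\xi_2$)
if $\xi_2 = 0$
  c = 1; s = 0
else
  if $|\xi_2| > |\xi_1|$
    $\tau = -\xi_1/\xi_2$; $s = 1/\sqrt{1 + \tau^2}$; $c = s\tau$
  else
    $\tau = -\xi_2/\xi_1$; $c = 1/\sqrt{1 + \tau^2}$; $s = c\tau$
  end
end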
GIVENS
T
= 6,
( ).
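A minimal sketch of the GIVENS computation in plain Python, using the convention that the transposed rotation zeroes the second component. The names `givens` and `rotate` are hypothetical; dividing by the larger of the two components keeps $|\tau| \le 1$, which avoids overflow in $1 + \tau^2$.

```python
import math


def givens(a, b):
    """Return (c, s) such that [[c, s], [-s, c]]^T applied to (a, b) gives (r, 0)."""
    if b == 0.0:
        return 1.0, 0.0
    if abs(b) > abs(a):
        tau = -a / b
        s = 1.0 / math.sqrt(1.0 + tau * tau)
        c = s * tau
    else:
        tau = -b / a
        c = 1.0 / math.sqrt(1.0 + tau * tau)
        s = c * tau
    return c, s


def rotate(c, s, a, b):
    """Apply [[c, s], [-s, c]]^T = [[c, -s], [s, c]] to the pair (a, b)."""
    return c * a - s * b, s * a + c * b


c, s = givens(3.0, 4.0)
print(rotate(c, s, 3.0, 4.0))   # first component has magnitude 5, second is 0 up to rounding
```

The sign of $r$ depends on which branch is taken, so only its magnitude $\sqrt{\xi_1^2 + \xi_2^2}$ is fixed.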
Householder
,
G
.
G.
( ) .
G.W. Stewart .
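if c = 0
  $\rho = 1$
elseif |s| < |c|
  $\rho = \mathrm{sign}(c)\,s/2$
else
  $\rho = 2\,\mathrm{sign}(s)/c$
end
Recovering c, s from $\rho$:
if $\rho = 1$
  c = 0; s = 1
elseif $|\rho| < 1$
  $s = 2\rho$; $c = \sqrt{1 - s^2}$
else
  $c = 2/\rho$; $s = \sqrt{1 - c^2}$
end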
min(|c|, |s|)
( ) 1 2
= max(|c|, |s|) 1.
Givens .
6.4.2 Givens
. A R2n
c s
s c
function A = ROT.ROW(A, c, s)
for j = 1 : n
ROT.ROW
T
= 6n ...
ROT.COL:
AA
c
s
s
c
6.4.3
Givens BLAS1.
MATLAB.
BLAS: Givens BLAS1:
CALL _ROTG(A,B,C,S)
B , (C, S) (c, s)
CALL _ROT(N,X,INCX,Y,INCY,C,S) Givens N
[X(1), Y(1)]> ,
...
X(1+(N-1)INCX), Y(1+(N-1)INCY)]>
Givens _ROTMG, _ROTM
MATLAB: function PLANEROT:
6.4.4 QR Givens
A. A
Gj A
.
, ..
.
x x
x x
x x
x x
x x
x x
0 x
0 x
x x x
x
x
x
x
x
0
x
x
0
x
x x
x x x
x x
x x
0
x
x
x x
0 x x
x x x
x x
0 x x
x x
x x
0 0 x
0
T = 3n2 (m n/3)
Givens Householder
(. [7] 3.5.2):
Householder u
REFL ...
= I 2
Hk =
u. H
uu
> /
u> u
kH
O(u). fl(HA)
= H(A + E) kEk = O(ukAk).
c, s Givens
c = c(1 + c ) s = s(1 + s ) c , s = O(u). k, )> A = G(j, k, )> (A +
G(j,
E) kEk = O(ukAk).
Gauss.
6.4.5
2 .
Q> Q = I Q> = Q1 ,
Q . QR .
QR ,
.
Q> AQ = H, Q> Q = I,
Q Householder.
: k = 1, ..., n 2
(3 : n, 1), (4 : n, 2), ..., (k + 1 :
n, k 1) A H1 , ..., Hk1 :
Givens
x, ..
x .
Givens QR Hessenberg .
6.4.3. A Rnn
Hessenberg .
6.4.1. xj ,
(A iI)xj = bj , j = 1, ..., s.
Q Q> AQ = H Hessenberg.
Q> (A ij I)Q =
H ij I
| {z }
Hessenberg
xj , j = 1, ...,
xj = Q
H ij I
O(n2 ) O(n3 )
A ij I . ,
x(t)
y(t)
= Ax(t) + Bu(t),
= Cx(t), t 0.
x(0) = x0
Z
x(t) = etA x0 +
x0 = 0. Laplace y :
y()
:=
et y(t)dt
Z t
Z
et C
e(t )A Bu( )d dt
0
0
1
C(iI A)
Bu
()
6.5
6.5.1 (, , 04). :
QR LU
QR.
. LU 2n3 /3 + O(n2 )
QR 4n3 /3 + O(n2 ).
6.5.2.
1 0
A = 0 1 .
1 2
1. P
.
1. A
:
2. x A :
2/3
x
= P x = 5/3
7/3
6.6
6.6.1. A Rnn
A=
R//S
R . A
.
. ,
Householder.
6.6.2.
QR A R33 , .. qr(A), A
1 2
1 1
1 1
3
2
1
1. )
A, ) A.
A.
2. Ax = b
b = [12, 5, 10]T .
.
1. R A = QR.
Householder A. u1 = [1, 1, 1]T
uj uT
A = H1 H2 R .
1 1 1
u1 uT1
1
1 1 1
=
3
uT1 u1
1 1 1
0
u2 uT2
1
0
=
2
uT2 u2
0
0
1
1
0
1
1
1 4 9
1
5
6
A = H1 H2 R = 2
3
2
2
3
2. H2 H1 A = R Rx = H2 H1 b
x = [1, 2, 3]T .
6.6.3 (, , 04). QR
A, A [1, 4, 5; 1, 2, 6; 1, 2, 3].
Q, R A.
.
1.
T . Pm = H(um ) . . . H(m1 ). m = 2 :
>
>
>
>
>
P2 = (I 2u2 u>
2 )(I 2u1 u1 ) = I 2u1 u1 2u2 u2 + 4(u2 u1 )u2 u1
T2 =
2
4u>
2 u1
0
2
m = 3 :
P3
>
>
>
>
= (I 2u3 u>
3 )(I 2u1 u1 2u2 u2 + 4(u2 u1 )u2 u1 )
>
>
>
>
= I 2u1 u>
1 2u2 u2 2u3 u3 + 4(u2 u1 )u2 u1 )
>
>
>
>
>
>
+ 4(u>
3 u2 )u3 u2 ) + 4(u3 u1 )u3 u1 ) 8(u2 u1 )(u3 u2 )u3 u1
2
4u>
T3 =
2 u1
>
>
8(u>
u
)(u
u
1
2
3 2 ) 4u3 u1
0
2
4u>
3 u2
0
0
2
Tm
2, 2,1 = 4u>
2 u1 , i,j i > j
:
i,j = 2
i1
X
(u>
i uk )ik,j
k=1
2. A Rns .
P O(mn3 ) . n > m, s,
P A : (Hm (. . . (H1 A) . . .)), (
. 2.5.1). O(mn2 s) .
T O(m2 n) m(m 1)/2 DOT.
, ,
O(n2 + mns + m2 s) . n >> m, s
P A .
6.6.5. s uj Rn , j = 1 : s. )
BLAS-1 BLAS-2
Hs Hs1 H1 Hj
Rnn uj Rn .
. )
, n, s,
( flops) min .
. )
Hs (Hs1 (H1 e1 ) )
e1 .
x = H1 e1
for j = 2 : s
x = Hj x
end
(uT x)
Hj x = x 2 uTj u uj DOT
j
DAXPY. BLAS-1.
) (2(2n 1) + 1) + 2n, . 6n 1.
Hj e1 = e1 2
(uT
j e1 )
uj ,
uT
j uj
. 3n 2.
6ns3ns1. ,
, sn + n.
sn+n
.
min 6ns3ns
6.6.6. A Rnn . . ) A
Hessenberg, H , Gauss, .
W AW 1 = H W Gauss. )
T = 53 n3 + O(n2 ) ... )
244
Householder
.
6.6.7.
A=
A11
A21
A12
A22
1 = I W1 Y T
A21 = Q1 R1 Q
1
(nk)k
W1 , Y1 R
Householder.
1 ]. ) A(1) := QT AQ1
Q1 := diag[Ik , Q
1
, . A(1) (p|n) p. )
A Hessenberg
H11
H21
H=
0
0
H12
H1N
H22
..
.
..
..
.
..
.
H32
HN,N 1
HN N
Hij Rkk H = U T AU U = Q1 QN 2 Qj
WY .
[1] C. Bischof and C. V. Loan. The wy representation for products of householder matrices. SIAM J. Sci. Statist. Comput., 8(1):s2s13, January 1987.
[2] . Bjorck. Least squares methods. In P. G. Ciarlet and J. L. Lions, editors,
Handbook of Numerical Analysis, volume 1. Elsevier/North Holland, 1987.
[3] G. Golub and C. F. Van Loan. Matrix Computations. The Johns Hopkins
University Press, Baltimore, 2nd edition, 1989.
[4] N.J. Higham. Accuracy and Stability of Numerical Algorithms. SIAM, Philadelphia, 2nd edition, 2002.
[5] A. J. Laub. Efficient multivariable frequency response computations. IEEE
Trans. Aut. Contr., AC-26:407408, 1981.
[6] A. J. Laub. Algorithm 640. efficient calculation of frequency response matrices from state space methods. ACM Trans. Math. Softw., 12(1):2633,
March 1986.
[7] J. H. Wilkinson. The Algebraic Eigenvalue Problem. Oxford University Press,
1965.
IV
Problem 3: The solution of simultaneous linear equations. In this
problem we are likely to be limited by the storage capacity of the
machine. If the coefficients of the equations are essentially random
we shaIl need to be able to store the whole matrix of coefficients and
probably also at least one subsidiary matrix. If we have a storage
capacity of 6400 numbers we cannot expect to be able to solve
equations in more than about 50 unknowns. In practice, however,
the majority of problems have very degenerate matrices and we do
not need to store anything like as much. For instance problem (2)
above can be transformed into one requiring the solution of linear
simultaneous equations if we replace the continuum by a lattice. The
coefficients of these equations are very systematic and mostly zero.
In this problem we should be limited not by the storage required for
the matrix of coefficients, but by that required for the solution or for
the approximate solutions. - Alan Turing [20]
.1 ,
Gauss 23 n3 + O(n2 ).
..., ( Cholesky) LU.
,
n(n+1)
( (diag(A))1 L) LU .
2
. ,
.
(.. )
;
1. .
2. .
7.1: p + q + 1, p, q .
(, , ..) .
O(n)
O(n2 ) (Vandermonde, Toeplitz, Hankel ).
:
Hessenberg O(n2 ) ...
O(n) ...
Vandermonde O(n2 )
...
O(n log n)
...
Toeplitz O(n2 ) ...
7.1 /
.
(= banded matrices) .
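7.1.1. A matrix with entries $\alpha_{ij}$ has lower bandwidth $p$ and upper bandwidth $q$ if
$\alpha_{ij} = 0$ whenever $i > j + p$ or $j > i + q$.
Its total bandwidth is $p + q + 1$. When $p = q$ (e.g. for symmetric matrices), $p$ is called the semibandwidth.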
. 7.1.
7.1.1.
p = q = 1 :
, .
p = 1, q = 0 (. p = 0, q = 1) () .
Hessenberg
: .. p = 1 Hessenberg p = 0 .
A , q
p,
$nnz := n + (p + q)n - \frac{p(p + 1) + q(q + 1)}{2}$
p, q n O(n)
p + q + 1 () n.
. 3
[1].
q
p A(p|q). A(p|q)
$e_i^\top A(p|q) e_j = 0$ for $i > j + p$ or $j > i + q$.
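7.1.1. Let $A(p_1|q_1), B(p_2|q_2) \in \mathbb{R}^{n \times n}$.
1. $C = A(p_1|q_1) \pm B(p_2|q_2)$ has upper bandwidth at most $\max(q_1, q_2)$ and lower bandwidth at most
$\max(p_1, p_2)$.
2. $C = A(p_1|q_1)B(p_2|q_2)$ has upper bandwidth at most $\min(q_1 + q_2, n)$ and lower bandwidth at most
$\min(p_1 + p_2, n)$.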
. .
, ,
i+p1 +p2 ,i = i+p1 +p2 ,i+p2 i+p2 ,i + . A, B
i+p1 ,i+p2 i+p2 ,i 6= 0, i+p1 +p2 ,i 6= 0. i+p1 +p2 +1,i .
, i+p1 +p2 +1,k k,i
, . k i + p1 + p2 k + p1 k i p2 .
7.1.1. A(1|1). Ak =
[A(1|1)]k min(n, k).
O(n), .
,
. . A(1|1) Rnn An1
n 1.
7.1.1 . ,
.
7.1.2. A(n|n) R2n2n diag[D; D] D Rnn . 7.1.1,
[A(n|n)]2 2n. A ,
A(n|n)A(n|n) = diag[D2 ; D2 ],
n. 7.2.
A
A*A
8
0
8
2
4
6
nz = 32
8
0
4
6
nz = 32
4
6
nz = 62
8
2
4
6
nz = 44
7.2: : A 3. : A2 . 3 6.
7.1.3.
A, B LAPACK.
. ,
, .
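Algorithm [forward substitution for banded systems]
Input: $A$, $b$, where $A$ is lower triangular with lower bandwidth $p$.
Output: $x$ with $Ax = b$, overwriting $b$. Cost: $T \approx 2np$ when $n \gg p$.
for j = 1 : n
  $\beta_j = \beta_j / \alpha_{jj}$
  for i = j + 1 : min(j + p, n)
    $\beta_i = \beta_i - \alpha_{ij}\beta_j$
  end
end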
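A minimal sketch of the banded forward substitution in plain Python with 0-based indices. The function name `banded_forward_substitution` is hypothetical, and the matrix is stored as a full list of rows for readability, although only entries inside the band are ever read.

```python
def banded_forward_substitution(L, b, p):
    """Solve L x = b for lower triangular L with lower bandwidth p (column-oriented)."""
    n = len(b)
    x = list(b)
    for j in range(n):
        x[j] /= L[j][j]
        # only the p rows below the diagonal can hold nonzeros in column j
        for i in range(j + 1, min(j + p, n - 1) + 1):
            x[i] -= L[i][j] * x[j]
    return x


L = [[2.0, 0.0, 0.0, 0.0],
     [1.0, 2.0, 0.0, 0.0],
     [0.0, 1.0, 2.0, 0.0],
     [0.0, 0.0, 1.0, 2.0]]
print(banded_forward_substitution(L, [2.0, 5.0, 8.0, 11.0], 1))   # [1.0, 2.0, 3.0, 4.0]
```

The inner loop runs at most `p` times, which is where the roughly $2np$ operation count comes from.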
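Algorithm [backward substitution for banded systems]
Input: $A$, $b$, where $A$ is upper triangular with upper bandwidth $q$. Output: $x$
with $Ax = b$, overwriting $b$. Cost: $T \approx 2nq$ when $n \gg q$.
for j = n : -1 : 1
  $\beta_j = \beta_j / \alpha_{jj}$
  for i = max(1, j - q) : j - 1
    $\beta_i = \beta_i - \alpha_{ij}\beta_j$
  end
end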
n,
O(n2 ).
A(p|q) , .
.
7.1.2. A(p|q) Rnn A = LU . L p U q , . A(p|q) =
L(p|0)U (0|q).
. .
, L, U .
L, U A.
.
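Algorithm [LU factorization of a banded matrix, no pivoting]
Input: $A = A(p|q)$.
Output:
the factors $L(p|0)$, $U(0|q)$,
overwriting $A$. Cost: $T \approx 2npq$ when $n \gg p, q$.
for k = 1 : n - 1
  for i = k + 1 : min(k + p, n)
    $\alpha_{ik} = \alpha_{ik} / \alpha_{kk}$
  end
  for j = k + 1 : min(k + q, n)
    for i = k + 1 : min(k + p, n)
      $\alpha_{ij} = \alpha_{ij} - \alpha_{ik}\alpha_{kj}$
    end
  end
end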
. A
( , ..
1 1
1
0
1 1
..
..
2 . . . . . .
2 . . . . . .
0
.
.
.
.
.
.
.
..
.
.
.
.
..
.
.
.
. n1
n1
n n
n 1
0
n
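Algorithm [LU factorization of a tridiagonal matrix, no pivoting]
Input: $A = \mathrm{trid}[\gamma_i, \alpha_i, \beta_i]$ (subdiagonal, diagonal, superdiagonal). Output:
the entries $\lambda_i$ of $L$ and $\delta_i$ of $U$. Cost: $T = 3n - 3$.
$\delta_1 = \alpha_1$
for i = 2 : n
  $\lambda_i = \gamma_i / \delta_{i-1}$
  $\delta_i = \alpha_i - \lambda_i\beta_{i-1}$
end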
. [ Ly = b]
: L = trid[i , i , 0]. :
y = [1 , . . . , n ]> . : T = 3n 2
1 = 1 /1
for i = 2 : n
i = (i i1 i )/i
end
Ax = b
Ly = b U x = y .
.
$T = (3n - 3) + (3n - 2) + (2n - 1) = 8n - 6.$
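The three stages (factorization, forward substitution, backward substitution) can be combined into one $O(n)$ routine. The sketch below is in plain Python; the function name `solve_tridiagonal` and the storage convention (`sub[i]` is the entry left of the diagonal in row `i`, `sup[i]` the entry right of it) are assumptions of this example, and no pivoting is performed.

```python
def solve_tridiagonal(sub, diag, sup, b):
    """Solve a tridiagonal system without pivoting in O(n) operations.

    sub[i] = A[i][i-1] (sub[0] unused), diag[i] = A[i][i], sup[i] = A[i][i+1].
    """
    n = len(diag)
    delta = [0.0] * n   # diagonal of U
    lam = [0.0] * n     # subdiagonal of the unit lower bidiagonal L
    delta[0] = diag[0]
    for i in range(1, n):
        lam[i] = sub[i] / delta[i - 1]
        delta[i] = diag[i] - lam[i] * sup[i - 1]
    y = list(b)
    for i in range(1, n):                  # forward substitution, L y = b
        y[i] -= lam[i] * y[i - 1]
    x = [0.0] * n
    x[n - 1] = y[n - 1] / delta[n - 1]
    for i in range(n - 2, -1, -1):         # backward substitution, U x = y
        x[i] = (y[i] - sup[i] * x[i + 1]) / delta[i]
    return x


# the (-1, 2, -1) matrix of order 4 with exact solution (1, 1, 1, 1)
x = solve_tridiagonal([0.0, -1.0, -1.0, -1.0], [2.0] * 4, [-1.0, -1.0, -1.0, 0.0],
                      [1.0, 0.0, 0.0, 1.0])
print(x)
```

For diagonally dominant or s.p.d. tridiagonal matrices the pivots `delta[i]` stay safely away from zero; otherwise a pivoting variant is needed.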
A LU
.
P A = LU , P
; , ,
.
7.1.2,
A(1|1) 11 21
21 .
1 2 A.
LU , :
A1 := P1 A = LU P1 = [e2 , e1 , e3 , , en ].
A1 = A1 (1|2), L = L(1|0)U (0|2),
7.1.2
L, U A. ( )
.
7.1.3. [14] A Rnn A = A(p|q)
Gauss
P A = LU . U p + q
L p + 1
.
. [14]
,
Hessenberg
O(n2 ) . (
Hessenberg) , ,
, hk+1,k hkk ,
, > 1.
7.1.4. Hx = b H
Hessenberg.
7.2
, ,
Rnn , n2 . (, Vandermonde, Toeplitz,
Hankel) n 2n 1 . , O(n) O(n2 ) .
,
.
Vandermonde, Toeplitz .
7.2.1 :
.
Pn1
j
7.2.1. p(x) :=
j=0 j x -
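$V(\xi_1, \ldots, \xi_n) = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ \xi_1 & \xi_2 & \cdots & \xi_n \\ \vdots & \vdots & & \vdots \\ \xi_1^{n-1} & \xi_2^{n-1} & \cdots & \xi_n^{n-1} \end{pmatrix}$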
b =
Vandermonde 1 , ..., n .
Vandermonde n p n p(x)
1 , ..., n .
7.2.2. 1 , . . . , n
R 1 , , n R . Newton
n (j , j )
$p(x) = \sum_{j=0}^{n-1} \gamma_j \prod_{k=0}^{j-1} (x - \xi_k)$
j .
b = N c,
c , b
1
1
N =
1
0
(2 1 )
(k 1 )
(n 1 )
Qk1
j=1 (k
j ) 0
( ) Newton-Vandermonde.
:
Qn1
j=1 (n
j )
Vandermonde
,
Newton-Vandermonde .
Vandermonde , , . .
Newton-Vandermonde ,
, .
.
[18].
.
7.2.3.
p(x) n 1:
:
{j }n1
j=0 .
Pn1
j=0
j xj n
Qn1
: p(x) = n1 j=1 (x j )
n 1 n1 .
Newton:
{j }nj=1
Pn1
Qj1
k=0 (xk )
{j }n1
j=0 .
j=0
253
: n {p(j )}n1
j=0 .
,
, n 1 n
.
..
, .
,
.
7.2.4. A Cnn
O(n2 ) . A Vandermonde n k = exp(2k/n), k = 0, ..., n 1,
A O(n log n)
... y = Az
y(k) =
n1
X
z(j) exp(2jk/n), k = 0 : n 1,
j=0
7.2.2 Vandermonde
V (1 , ..., n ) Vandermonde V w = a. Gauss, O(n3 ) ...
n . O(n2 ) ...
Vandermonde.
7.2.1. Vandermonde
(xi xj )
j<i
.
det
1
x0
1
x1
..
.
..
.
xn0
xn1
...
...
...
1 1
0 x1 x0
0 x21 x0 x1
xn
.. = det
..
.
.
xnn
0
..
.
xn1 x0 xn1
1
...
...
...
...
1
xn x0
x2n x0 xn
..
.
xnn x0 xn1
n
a := [0 , ..., n ]>
pn (x) = 0 + 1 x + + n xn
b = [0 , ..., n ]> .
V > a = b :
1. pn
{pn (j ) = j }j=0:n .
2. pn .
Newton ,
pn (x) = 0 + 1 (x 0 ) + + n (x 0 ) (x n1 )
j . j {pn (j ) = j }j=0:n . ,
T 32 n2 ...
Algorithm. [Divided differences]
c_j = b_j (j = 0, ..., n)
for k = 0, ..., n - 1
    for i = n, ..., k + 1
        c_i = (c_i - c_{i-1}) / (x_i - x_{i-k-1})
    end
end
q_n(x) = c_n
for k = n - 1 : -1 : 0
    q_k(x) = c_k + (x - x_k) q_{k+1}(x)
end
q_0(x) = p_n(x).
(k)
(k)
j := k
j = j
for k = n - 1 : -1 : 0
    for j = k : n - 1
        a_j = a_j - x_k a_{j+1}
    end
end
Cost: T ≈ n^2 flops.
Bjorck-Pereyra
V > a = b.
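Algorithm. [Vandermonde system, transposed form] (Björck-Pereyra) Given {x_j, b_j}_{j=0:n} with distinct x_j, the algorithm overwrites b with the solution a of
[V(x_0, ..., x_n)]^T a = b.
for k = 0 : n - 1
    for j = n : -1 : k + 1
        b_j := (b_j - b_{j-1}) / (x_j - x_{j-k-1})
    end
end
for k = n - 1 : -1 : 0
    for j = k : n - 1
        b_j := b_j - b_{j+1} x_k
    end
end
Cost: T = 5n^2/2 flops.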
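The two stages of the algorithm above (divided differences, then conversion from the Newton form to monomial coefficients) can be sketched in Python. This is an illustrative sketch: the function name `vandermonde_dual_solve` and the list-based interface are assumptions, and the nodes are assumed distinct.

```python
def vandermonde_dual_solve(x, f):
    """Return a with sum_j a[j] * x[i]**j == f[i] for all i (distinct nodes x[i])."""
    n = len(x) - 1
    c = list(f)
    # Stage 1: divided differences, c becomes the Newton coefficients
    for k in range(n):
        for j in range(n, k, -1):
            c[j] = (c[j] - c[j - 1]) / (x[j] - x[j - k - 1])
    # Stage 2: Newton form -> coefficients in the monomial basis
    for k in range(n - 1, -1, -1):
        for j in range(k, n):
            c[j] -= c[j + 1] * x[k]
    return c
```

For instance, the values of 1 + 2x - x^2 + 0.5x^3 at the nodes 0, 1, 2, 3 are 1, 2.5, 5, 11.5, and the routine recovers the four coefficients from them in O(n^2) operations, without forming V.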
b ..
c :
c =
=
1
Dn1
Ln1 (1) D01 L0 (1)b
U >b
Lk () :=
Ik
0
0
Jn+1k
, Jk :=
..
..
..
1
Dk = diag[1k+1 , k+1 0 , ..., n nk1 ]
Newton
:
a =
=
= V 1 b = U Lb
1
= [L0 (1)> D01 Ln1 (1)> Dn1
][Ln1 (n1 ) L0 (0 )b]
V a = b:
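Algorithm. [Vandermonde system] (Björck-Pereyra) Given {x_j, b_j}_{j=0:n} with distinct x_j, the algorithm overwrites b with the solution a of the Vandermonde system
[V(x_0, ..., x_n)] a = b.
for k = 0 : n - 1
    for j = n : -1 : k + 1
        b_j := b_j - x_k b_{j-1}
    end
end
for k = n - 1 : -1 : 0
    for j = k + 1 : n
        b_j := b_j / (x_j - x_{j-k-1})
    end
    for j = k : n - 1
        b_j := b_j - b_{j+1}
    end
end
Cost: T = 5n^2/2 flops.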
. :
n + 1
n + 1
n + 1 Newton
7.1: .
b
. . a . Newton () c
b
V T b
. . a
N 1 V > a
>
Horner/V a
. Newton .
V T N c
Newt.Horner/N c
-
7.2: Vandermonde .
n
2
0.84
0.15 1.01
4
3.30
0.91 4.00
6
5.95
1.66 6.31
8
8.67
2.41 7.28
10 11.43 3.17 10.47 12 14.20 3.93 11.97
14 16.99 4.69 14.28 16 19.26 5.45 16.46
18 20.73 6.21 15.91 20 21.00 6.98 17.89
. 7.1, N
Newton-Vandermonde .
Vandermonde1
V j .
7.2.5. I. [a, b]:
j = a + (j 1)
ba
.
n
j = cos(
2j 1
), j = 1, ..., n.
2n
. 2
1
j+1
j 0
j [0, 1]
j [1, 1]
Chebyshev
(V ) > nn+1
(V ) > 2n1
(V ) 42 8n
n
(V ) (3.1)
e/4
3/4
(V ) 3 4 (1 + 2)n
2 (V ) = 1
Vandermonde .
:
n (, ),
. 7.3
N. Higham [16],
.
k2
V k = e n+1 .
V V = I
V Fourier, V
Fourier.
FFT O(n log n).
Vandermonde [2].
[14].
Vandermonde
O(n2 ) ... .
Gauss. 1980, N. Higham Vandermonde
, .. [15] . [16].
Vandermonde Walter Gautschi [7, 9, 12, 10, 13, 11, 3].
O(n2 ) Vandermonde .
Hao Lu,
O(n log2 n) Vandermonde, confluent Vandermonde Vandermonde. [17].
,
. , , .
.. ( ) .
.
, .. [6],
log n ,
[5, 4],
.. ()
Vandermonde
Pn
j
, .
x
.
j=0 j
(, ),
.
. John Rice [19] .
[8, 12, 10].
7.2.3
Toeplitz
(.. ,
, ..) Toeplitz. , A Rnn
Toeplitz 2n .
Toeplitz (= persymmetric)
:
7.2.1. A Rnn ij = nj+1,ni+1 .
.
MATLAB Toeplitz :
5
1
2
6
5
1
>> toeplitz(linspace(2,1,3))
ans =
    2.0000    1.5000    1.0000
    1.5000    2.0000    1.5000
    1.0000    1.5000    2.0000
7.2.1. T = TL + TU TL , TU
Toeplitz .
7.2.2. T Toeplitz T 1
Toeplitz.
7.2.4. T1 , T2 () Toeplitz,
T1 T2 .
7.2.1.
.
.
7.2.4 Toeplitz
Toeplitz T Rnn x Rn b = T x.
0
1
2
..
.
n1
0
1
2
0
0
1
0
0
..
..
..
..
.
n1
0
0
0
0
1
2
..
..
.
.
0
n1
n2 . , Toeplitz
.
b
0 =
1 =
2 =
....
k
k
X
j kj
j=0
....
n1
0 0
1 0 + 0 1
2 0 + 1 1 + 2 0
n1
X
j n1j
j=0
a(z), x(z) n1 ( k k )
a(z) :=
n1
X
j z j
j=0
x(z) :=
n1
X
j z j
j=0
b(z) 2n2
b(z)
:=
2n2
X
j=0
j z j
k =
k
X
261
j kj (0 k 2n 1)
j=0
n k T x.
.
n . p(x), q(x) Pn1 2n 1
{j }2n1
j=1 . , r(x) = p(x)q(x) O(n) r(j ) = p(j )q(j ).
( ) r 2n2 r(j ), j = 0 : 2n 2, n
r .
,
j .
j = exp(2j/(2n 1)), := 1.
p(j ), q(j ) O(n log n) FFT
p q . 2n 1
() r(x) r(j ) =
p(j )q(j ), j = 1 : 2n 1, O(n log n) FFT
2n 1 r(j ).
) Toeplitz
) O(n log n) . ()
(= convolution theorem). :
a(z), x(z) 2n 1 j
b(j ) k , 0 j 2n 2.
j
j = 2n1 (0 j 2n 1)
FFT.
\hat{a} = fft_{2n-1}(a), \hat{x} = fft_{2n-1}(x)        T = O(n log n)
\hat{b} = \hat{a} .* \hat{x}                             T = O(n)
b = fft^{-1}_{2n-1}(\hat{b})                             T = O(n log n)
b n
b.
Hadamard
. :
7.2.2. Toeplitz T = O(n log n)
7.2.5. :
1. T Toeplitz x . T = TL +TU TL , TU
Toeplitz, T x T = O(n log n).
2. T, P Toeplitz. T P Toeplitz ,
. T P (:, 1), T =
O(n log n).
3. T, P Toeplitz.
Toeplitz
T , Toeplitz. x
T x = b.
(7.1)
T , 0 6= 0,
. :
Tk =
..
..
.
..
..
k1
k1
k2
..
.
1
Tn = T . Tn x .
Tk yk =
rk , k = 1 : n rk .
yk , k = 1 : n x.
Durbin Yule-Walker:
Tn y = r = [1 , , n ]>
.
Tk y = r = [1 , ..., k ]> .
Tk+1 y = rk+1 :
Tk
r> Ek
Ek r
1
r
k+1
Tk1 ,
=
=
Tk1 (r Ek r) = y Tk1 Ek r
y + Ek y
= k+1 r> Ek z
= k+1 r> Ek (y + Ek y)
= (k+1 + r> Ek y)/(1 + r> y)
y_1 = -\rho_1
for k = 1 : n - 1
    \beta_k = 1 + r_k^T y_k
    \alpha_k = -(\rho_{k+1} + r_k^T E_k y_k) / \beta_k
    z_k = y_k + \alpha_k E_k y_k
    y_{k+1} = [z_k ; \alpha_k]
end
Cost: T = 3n^2 flops.
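The recursion above can be sketched in Python. The sketch assumes the usual normalization for the Yule-Walker problem: T is symmetric positive definite Toeplitz with unit diagonal and first row (1, r[0], ..., r[n-2]), and the system solved is T y = -r. It also uses the update beta_k = (1 - alpha_{k-1}^2) beta_{k-1} in place of the inner product 1 + r_k^T y_k. The function name `durbin` is illustrative.

```python
def durbin(r):
    """Solve T y = -r, where T is the symmetric positive definite Toeplitz
    matrix with first row (1, r[0], ..., r[n-2]) and r = (r[0], ..., r[n-1])."""
    n = len(r)
    y = [-r[0]]
    beta, alpha = 1.0, -r[0]
    for k in range(1, n):
        beta *= 1.0 - alpha * alpha
        # inner product of r_k with the reversed vector E_k y_k
        s = sum(r[k - 1 - i] * y[i] for i in range(k))
        alpha = -(r[k] + s) / beta
        # z_k = y_k + alpha * E_k y_k, then append alpha
        y = [y[i] + alpha * y[k - 1 - i] for i in range(k)] + [alpha]
    return y
```

Each pass costs O(k) operations, so the whole solve is O(n^2), compared with O(n^3) for Gaussian elimination applied to the full matrix.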
= 1 + rk> yk
>
= 1 + [rk1
k ]
= k1 + k1 (k1 k1 )
2
= (1 k1
)k1
:
. [Levinson-Durbin]
: T = 2n2 .
A = Levinson(r,n);
n < 195 !
7.2.8.
r = linspace(2,1,400);
flops(0);
x = Levinson(r);
flops
319999
. 2 4002
flops(0);
x = toeplitz(r)\r
flops
22213401
. 4003 /3.
MATLAB Toeplitz
(7.1).
Tk
r> Ek
Ek r
1
b
k+1
Tk x = b = (1 , ..., k )> ; Tk y = r
v
1
= Tk1
(b Ek r) = x + Ek y;
= k+1 r> Ek v
= k+1 r> Ek x r> y
= (k+1 r> Ek x)/(1 + r> y)
. [Levinson]
: T = 4n2 .
b = Hx Eb =
EH x
|{z}
Toeplitz
0
1
0
1
0
0
3
2
1
5
3
2
6
5
3
>>E*A
ans =
7.2.5
(= circulant) .
Toeplitz Toeplitz. :
0
n1
A= .
..
1
1
0
...
...
..
..
n1
n1
n2
..
.
0
7.2.1. A,
C = [en e1 e2 , , en1 ],
n := exp(2/n).
1. A , Toeplitz.
2. C n = I .
3.
A=
n1
X
j C j
j=0
4. Q
1
Qk,j = n(k1)(j1)
n
, = Q AQ A
k =
n1
X
as n(k1)s , k = 1, ..., n.
(7.2)
5. A1 .
6. A, B AB AB = BA.
7.2.9. .
A = [ 1  2  3  4
      4  1  2  3
      3  4  1  2
      2  3  4  1 ],

A^{-1} = [ -0.2250   0.2750   0.0250   0.0250
            0.0250  -0.2250   0.2750   0.0250
            0.0250   0.0250  -0.2250   0.2750
            0.2750   0.0250   0.0250  -0.2250 ].
4 = exp(2i/4) = i, (7.2).
. C 10, 2 + 2i, 2
2i, 2.
Ax = b Q
A:
Ax = b x = Q1 Q b
:
. [ Ax = b A.]
1.
b = Q b FFT
2. :
k =
Pn1
s=0
(k1)s
s n
FFT.
3.
b = 1b. O(n) .
4. x = Q
b. FFT
FFT n , . O(n log n)
...
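The four steps above can be sketched in Python. To stay self-contained the sketch uses a naive O(n^2) discrete Fourier transform (the helper `dft` is hypothetical); in practice an FFT routine would take its place to reach O(n log n). It assumes the convention A[i][j] = alpha[(j - i) mod n], real data, and nonzero eigenvalues.

```python
import cmath

def dft(v, sign):
    """Naive transform: out[k] = sum_m v[m] * exp(sign * 2*pi*i * m*k / n)."""
    n = len(v)
    return [sum(v[m] * cmath.exp(sign * 2j * cmath.pi * m * k / n) for m in range(n))
            for k in range(n)]

def circulant_solve(alpha, b):
    """Solve A x = b for the circulant A with first row alpha (real data assumed)."""
    n = len(alpha)
    lam = dft(alpha, +1)                            # eigenvalues of A
    b_hat = [v / n for v in dft(b, -1)]             # b in the Fourier basis
    x_hat = [bh / l for bh, l in zip(b_hat, lam)]   # diagonal solve, O(n)
    return [v.real for v in dft(x_hat, +1)]         # back to the original basis
```

With alpha = (1, 2, 3, 4), the matrix of the example, the right-hand side (1, 4, 3, 2) is the first column of A, so the computed solution is the first unit vector.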
7.3
7.3.1. : A A = LU L
U .
. .
(.
) (. . 7
).
7.4
7.4.1 (, , 02-makeup). )
LU A . ) ...
. )
...
. Vandermonde n x.
m V B .
, ) V , )
, ) . , )
,
. (
) x
.
1
1
An = .
..
1
0
1
0n
1n
.. .
.
nn
j R , j = 0 : n.
1. A
j A .
2. b Rn A
.
Gauss
Ax = b;
. 1)
n = 1 n = 2.
A1 = 1 0
detA2 = (1 22 12 2 ) (0 22 02 2 ) + (0 12 02 1 )
,
>> syms x0 x1 x2
>> A=[1,x0,x0^2;1,x1,x1^2;1,x2,x2^2]
A =
[ 1, x0, x0^2]
[ 1, x1, x1^2]
[ 1, x2, x2^2]
>> det(A)
ans =
x1*x2^2-x1^2*x2-x0*x2^2+x0^2*x2+x0*x1^2-x0^2*x1
>> factor(ans)
ans =
(-x2+x0)*(-x2+x1)*(x1-x0)
, n, ()
.
detAn =
n
Y
(i j ).
i>j
,
( )
( ).
1 n 1 n- ,
0
0
detAn = det .
..
1
0 n
1 n
0n nn
1n nn
..
.
nn
(1 : n, 2 : n + 1), .
0 n
..
.
n1 n
0n nn
..
.
n
nn
n1
. n = 1, P0 = 1 0 .
n = 1. n 1, .
detAn1 = detPn1 =
n1
Y
(i j ).
i>j
Qn1
0 + n
1 + n
..
.
..
.
n1 + n
n
0n n
0 n
n
1n n
1 n
n
n
n1
n
n1 n
Qn1
0 0 n1
0 1 n1
.
..
1 n1 + n
..
.
n1
0n1 n1
n1
1n1 n1
n
n
n1 n
n1 n
0 n1
Pn2 = ...
n2 n1
n1
0n1 n1
..
.
n1
n1
n2
n1
detPn2 =
n1
Y
(i j ).
i>j
detAn
(1)n (0 n ) (n1 n )
n1
Y
(i j )
i>j
(n 0 ) (n n1 )
n1
Y
(i j )
i>j
n
Y
(i j ).
i>j
.
:
, i 6= j
i 6= j .
2) , . , Vandermonde
n.
j .
Vandermonde
.
7.4.4. Toeplitz T Rnn J
0 0
1 0
J =
0 1
0
J = [e2 , e3 , , en , 0].
..
..
,
..
.
0
Pn1
k
1. T =
k , k =
k=0 k J
0, , n 1 T .
2. Toeplitz Toeplitz
.
3. T Toeplitz
, T 1 Toeplitz.
. 1)
J 2 = [e3 , e4 , , en , 0, 0]
, k n 1
J k = [ek+1 , , en , 0, , 0]
k
k n, JP
= 0. , Toeplitz
n1
T = k=0 k J k .
2) Toeplitz, A B .
C = AB J k , k = 0, ..., n 1.
AB
= (
n1
X
k J k )(
i J i )
i=0
k=0
n1
X
n1
X
k J k .
k=0
Pn1
k
3) (1) T =
k=0 k J . 0 = 1,
0 .
T =I P =I (
n1
X
(i )J i ).
i=1
P ( ), P .
Neumann,
T 1
(I P )1 = I + P + P 2 + .
P j J , J n1 ,
.
J .
272
Numer. Mat.,
[9] W. Gautschi. Optimally conditioned Vandermonde matrices. Numer. Math., 24:1-12, 1975.
[10] W. Gautschi. Questions of numerical condition related to polynomials. In G.H. Golub, editor, Studies in Numerical Analysis, volume 24, pages 140-177. Mathematical Association of America, 1984.
[11] W. Gautschi. How (un)stable are Vandermonde systems? In R. Wong, editor, Asymptotic Analysis and Computational Analysis, pages 193-210. Marcel Dekker, Inc., New York, 1990.
[12] W. Gautschi. The condition of polynomials in power form. Math. Comp., 33(145):343-352, January 1979.
[13] W. Gautschi and G. Inglese. Lower bounds for the condition number of Vandermonde matrices. Numer. Math., 52:241-250, 1988.
[14] G. Golub and C. F. Van Loan. Matrix Computations. The Johns Hopkins University Press, Baltimore, 2nd edition, 1989.
[15] N. J. Higham. Fast solution of Vandermonde-like systems involving orthogonal polynomials. IMA J. Numer. Anal., 8:473-486, 1988.
[16] N. J. Higham. Accuracy and Stability of Numerical Algorithms. SIAM, Philadelphia, 1996.
[17] H. Lu. Solution of Vandermonde-like systems and confluent Vandermonde-like systems. SIAM J. Matrix Anal. Appl., 17:127-138, 1996.
[18] V. Pan. Complexity of computations with matrices and polynomials. SIAM Rev., 34(2):255-262, 1992.
[19] J. Rice. A theory of condition. SIAM J. Numer. Anal., 3(2):287-311, 1966.
[20] A.M. Turing. Proposed electronic calculator, 1946. Available electronically from www.emula3.com/docs/Turing_Report_on_ACE.pdf.
[21] C. Van Loan. Computational Frameworks for the Fast Fourier Transform. SIAM, Philadelphia, 1992.
274
8.1
A major task of mathematics is to harmonize the continuous and the
discrete, to include them in one comprehensive mathematics, and to
eliminate obscurity from both .... [E.T. Bell Men of Mathematics
(1937)]
,
(
) .
, . :
8.1.1.
Navier-Stokes. , R2 R3
:
u + u gradu + gradp = f
divu = 0 ,
u = 0
u p, f
, := 1/Re Re Reynolds.
8.1.2. Maxwell. :
B
=
t
divD =
E+
divB =
275
J+
D
t
H , E , D ,
B , J , .
( = constitutive relations)
, .. ,
D = E, B = H, J = E
, .
8.1.3.
.
1 . ( = option) ( = call option) ( = put option).
Black-Scholes:
$\frac{\partial V}{\partial t} + \frac{1}{2}\sigma^2 S^2 \frac{\partial^2 V}{\partial S^2} + rS\frac{\partial V}{\partial S} - rV = 0.$
Black-Scholes V
t S .
,
. 2.1 2 .
, 8.1.
.
: ) (. 2). )
,
.
.
.
.
, .
. ,
...,
.
.
1
8.1: .
,
, . .
. , ,
.
.
, , .
,
. , ( )
.
,
.
, () ,
. :
1)
, 2) , 3) (..)
, .
4)
. , Newton
,
.
8.1.1
.
, ()
().
u
u
2u
pu
L u(z),
(z), . . . ,
(z), 2 (z), . . . , p (z) = 0, z Rs .
z1
zs
z1
zs
p.
. u
Rk , k > 1 .
L
. , L
.
.
( ) . , u(t, z)
, ,
u(0, z) u
.
8.2
.
,
Rn ,
.
,
( )
.
.
:
;
;
;
;
.
.
. , , , , .. u0 (x) = limh0 (u(x + h) u(x))/h.
,
(u(x + h) u(x))/h. :
u(x)      u'(x)     (u(x+h) - u(x))/h
c         0         0
x         1         1
x^2       2x        2x + h
sin x     cos x     (sin(x+h) - sin(x))/h
8.2.1
u(x) XL x XU
$-\frac{d^2 u}{dx^2}(x) + b(x)\frac{du}{dx}(x) + c(x)u(x) = d(x)$
L(u, b, c, d, x) = 0. 2
2
x0
x0
x1
x2
x3
xn+1
xn
xn+1
h
8.2: n + 2 .
u.
u.
. ,
. , c(x) = 0 u(x) ,
u(x) + c c
. u(XL ), u(XU ).
b, c, d
u -
.
u(x) XL < x < XU . :
h ().
L Lh .
u U .
, ..
, .
.
,
. ,
6 , 2-3 .
( )
.
Taylor. u 4 ,
(1)
uj1 = uj huj +
(2)
h2 uj +
h4 (4)
u (xj + j+ h) + u(4) (xj + j h)
24
uj+1 uj1
(1)
= 2huj +
h3 (3)
u (xj + j+ h) + u(3) (xj + j h)
6
$\frac{u_{j+1} - u_{j-1}}{2h} = u_j^{(1)} + O(h^2), \qquad \frac{u_{j-1} + u_{j+1} - 2u_j}{h^2} = u_j^{(2)} + O(h^2)$
xj :
$\frac{-u_{j-1} - u_{j+1} + 2u_j}{h^2} + b_j\frac{u_{j+1} - u_{j-1}}{2h} + c_j u_j = d_j + O(h^2), \quad j = 1, ..., n.$
h2
2h
O(h2 ) .
uj+1 uj1
2h
uj1 + uj+1 2uj
h2
(1)
uj
uj
(2)
1. h,
2. u(3) , u(4) xj .
O(h2 ) , n
, .
.
( ):
, (.
), .
, ,
, ,
, . (
)
.
Norbert Wiener (1894-1964), ,
:
.... the exigencies of the Second World War thrust me into the
problem of shooting ahead of a flying airplane and the consequent
problem of predicting its course. I sought to develop predicting machines of various sorts ... and I found that what I was looking for
would have demanded me to have my cake and eat it, too. I was studying the way in which the past observation of the airplane
might lead to the computation of its future position. The accurate
following of an airplane pursuing a smooth course demanded sharp
and sensitive instruments; but these instruments, because of their
very sharpness and sensitivity, were seen to be thrown out of action
by every slight jar and by every corner of the course they were following. For very irregular paths, the instruments I had suggested
were inadequate, not in spite of their refinement, but because of
their refinement. It occurred to me that this impossibility of achieving the ideal instrument all along the line had a close relation to
the Heisenberg impossibility of observing at the same time where a
thing was going and how fast it was going. The more I studied the
problem, the more I realized that my difficulty was not a piece of
casual malice on the part of a mathematical devil, but lay in the
very nature of prediction itself. [4]
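$\frac{-U_0 - U_2 + 2U_1}{h^2} + b_1\frac{U_2 - U_0}{2h} + c_1 U_1 = d_1$
$\frac{-U_{j-1} - U_{j+1} + 2U_j}{h^2} + b_j\frac{U_{j+1} - U_{j-1}}{2h} + c_j U_j = d_j, \quad j = 2, \ldots, n-1$
$\frac{-U_{n-1} + 2U_n - U_{n+1}}{h^2} + b_n\frac{U_{n+1} - U_{n-1}}{2h} + c_n U_n = d_n$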
U2 + 2U1
U2
+ b1
+ c1 U1
h2
2h
d1 +
u0
u0
+ b1
h2
2h
dj , j = 2, . . . , n 2
Un1 + 2Un
Un1
+ bn
+ cn Un
h2
2h
dn +
un+1
un+1
bn
h2
2h
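$(c_1 + \frac{2}{h^2})U_1 + (-\frac{1}{h^2} + \frac{b_1}{2h})U_2 = d_1 + \frac{u_0}{h^2} + b_1\frac{u_0}{2h}$
$(-\frac{1}{h^2} - \frac{b_j}{2h})U_{j-1} + (c_j + \frac{2}{h^2})U_j + (-\frac{1}{h^2} + \frac{b_j}{2h})U_{j+1} = d_j, \quad j = 2, \ldots, n-1$
$(-\frac{1}{h^2} - \frac{b_n}{2h})U_{n-1} + (c_n + \frac{2}{h^2})U_n = d_n + \frac{u_{n+1}}{h^2} - b_n\frac{u_{n+1}}{2h}$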
n [U1 , , Un ]. AU = F A j , j , j
j 1, j, j + 1 j , .
tridn [j , j , j ]
j = (
2
1
bj
1
bj
+ cj ), j = ( 2
), j = ( 2 +
)
h2
h
2h
h
2h
, :
1. A ;
2. , ;
h = f
3. AU
u;
A
. 2 .
. , (1) , u00 (x) + c(x)u(x) = d(x) c(x) 0.
dj , j = 2, . . . , n 1
1 n. A ,
V AV = 0.
V j , . Vj > 0. 1 < j < n
AV = 0
0 =
=
, V Vj > 0
cj = 0 Vj = 0. Vj = Vj1 .
V1 = Vn = 0
V = 0.
j = 1 j = n. A
.
Example 8.2.1. $-\frac{d^2 u}{dx^2}(x) + x u(x) = (9 + x)\sin 3x$ on $\Omega = [0, \pi/6]$ with $u(0) = 0$, $u(\pi/6) = 1$.
The exact solution is $u(x) = \sin(3x)$.
:
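$h = \frac{\pi}{6(n+1)}$
$(x_1 + \frac{2}{h^2})U_1 - \frac{1}{h^2}U_2 = (9 + x_1)\sin 3x_1$
$-\frac{1}{h^2}U_{j-1} + (x_j + \frac{2}{h^2})U_j - \frac{1}{h^2}U_{j+1} = (9 + x_j)\sin 3x_j, \quad j = 2, \ldots, n-1$
$-\frac{1}{h^2}U_{n-1} + (x_n + \frac{2}{h^2})U_n = (9 + x_n)\sin 3x_n + \frac{1}{h^2}$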
xj AU = F
tridn [j , j , j ]
j , j , j j 1, j, j + 1 j
2
1
1
+ jh), j = 2 , j = 2
h2
h
h
F j = 1, ..., n 1 (9 + jh) sin 3xj
n (9 + nh) sin 3xn + h12 . , A . AU = F
LU ( O(n))
U u
j = (
8.3. U n = 20
u U
| juj j |
n = 20, 40, 60, 80.
n = 20, 40, 80
[0.1680, 0.0441, 0.0113] 103
, . , O(h2 ).
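The discretization of Example 8.2.1 can be reproduced with a short Python sketch. The function names `solve_bvp` and `max_rel_error` are illustrative assumptions; the tridiagonal system is solved by elimination without pivoting, which is safe here because the matrix is diagonally dominant.

```python
import math

def solve_bvp(n):
    """Central differences for -u'' + x u = (9 + x) sin(3x) on [0, pi/6],
    u(0) = 0, u(pi/6) = 1. Returns the interior grid and the approximation U."""
    h = math.pi / (6 * (n + 1))
    x = [(j + 1) * h for j in range(n)]
    diag = [2.0 / h**2 + xj for xj in x]
    off = -1.0 / h**2                       # constant sub- and super-diagonal
    f = [(9.0 + xj) * math.sin(3.0 * xj) for xj in x]
    f[-1] += 1.0 / h**2                     # contribution of u(pi/6) = 1
    for i in range(1, n):                   # elimination (forward sweep)
        mult = off / diag[i - 1]
        diag[i] -= mult * off
        f[i] -= mult * f[i - 1]
    u = [0.0] * n
    u[-1] = f[-1] / diag[-1]
    for i in range(n - 2, -1, -1):          # back substitution
        u[i] = (f[i] - off * u[i + 1]) / diag[i]
    return x, u

def max_rel_error(n):
    """Largest relative error against the exact solution sin(3x)."""
    x, u = solve_bvp(n)
    return max(abs(uj - math.sin(3 * xj)) / abs(math.sin(3 * xj))
               for xj, uj in zip(x, u))
```

Doubling n should reduce `max_rel_error(n)` by roughly a factor of four, which is the O(h^2) behaviour reported above.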
-
.
- (..
) -
.
) ) .
[Figure 8.3: Example 8.2.1. Rel. Error curves: n=20, maxReR=1.68e-4; n=40, maxReR=4.41e-5; n=60, maxReR=1.99e-5; n=80, maxReR=1.13e-5.]
, 2
. Poisson
(uxx +uyy ) = f (x, y) uxx , uyy u
x y .
. = [0, 1] [0, 1]
() .
, n n
n2 n2 U .
n2 n2
( LU
O((n2 )3 ) = O(n6 ).
n 1000,
.
A. , .
, 5 .
, . trid[Bj , Aj , Cj ]
Bj , Cj , Aj .
, u, ,
Aj , Bj Cj
. Gauss
.
n3
.
.
8.2.2
. , ,
( ) ,
( ).
( Cauchy),
$\frac{du}{dt} = f(t, u(t)), \quad t \in I, \qquad u(t_0) = c. \qquad (8.1)$
I R , f : I Rn Rn c . , t I
.
3 :
8.2.1. I R Rn
f : I Rn ,
Lipschitz k k
Rn , I ,
t I u, v Rn ,
, ( )
Cauchy u,
[t0 , T ].
f . f t u
R f :
t
u(t) = u(t0 ) + t u( )d. .
0
. f u,
. , t u
.
, Cauchy ,
.
.
.
t = 0 u(0)
u(h). u t = 0 ( f (0, u(0))),
Lipschitz.
u0 (t) ( t = 0)
u(t + h) u(t)
.
h
u(h)
u(2h).
u(2h) U2 := U1 + hf (h, U1 ).
, U2
, , u
, Cauchy, u
(1) (t) =
f (t, u
(t)), [h, 2h], , .
u
(h) = u(h), u
(h) = U1 .
, U3 , U4 , ...
u(3h), u(4h), ....
Euler. .
Cauchy
(. (t, u(t)) u : [0, T ] R ), u(0)
u0 (t) [0, T ].
Euler h u(0 + h)
(0, u(0)) (h, u(h)).
, (h, U1 ) ,
(h, U1 ) (2h, U2 ).
t = T ( T = N h N ,
).
:
Un+1 = Un + hn n (tn , Un )
hn ( ) n
u (
). t = t0 , P
n
Un u(t0 + j=1 hj ).
,
Un , (u(t0 ), U1 , ..., Un1 , Un ).
, Un+1
, .. k , . Unk+1 , ..., Un1 , Un , k > 1.
.
n
, (8.1) .
n
n . u : [0, T ] R
n n
d
u1 (t) =
dt
d
u2 (t) =
dt
d
un1 (t)
dt
d
un (t)
dt
u2 (t)
u3 (t)
un (t)
d
u(t)
dt
u(t)
u(0)
d(t)
Au(t) + c(t),
0
0
0
1
0
0
0
1
0
..
.
..
.
..
.
0 (t)
0
0
1
..
0
0
0
n1 (t)
8.2.2.
d3 u
=
dt3
0
00
u(0) = 1 , u (0) = 2 , u (0) = 3 .
f (t, u, u0 , u00 )
1 = u,
2 = u0 ,
3 = u00
:
0
1 = 2 ,
1 (0) = 1
0
2 = 3 ,
2 (0) = 2
0
3 = f (t, 1 , 2 , 3 ), 3 (0) = 3
f , ..
d
= A
dt
0
A= 0
2
1
0
3
1
0
1 , = 2
3
4
8.2.3 Euler
,
(8.1) : u t
:
$\frac{d}{dt}u(t) = \frac{u(t + \Delta t) - u(t)}{\Delta t} + O(\Delta t)$
O(t), (8.1)
$\frac{U(t + \Delta t) - U(t)}{\Delta t} = f(t, U(t)), \qquad U(t_0) = c,$
$U(t + \Delta t) = U(t) + \Delta t\, f(t, U(t)), \qquad U(t_0) = c. \qquad (8.2)$
U (t0 )
U (t1 )
=
=
c
U (t0 ) + t1 f (t0 , U (t0 ))
U (t2 )
U (ts )
|U (tj )u(tj )|
u(tj )
t      u(t)     N = 5    N = 10   N = 20
0.2    0.9048   0.9000   0.9025   0.9037
0.6    0.7408   0.7290   0.7351   0.7380
1.0    0.6065   0.5905   0.5987   0.6027
u0 (t) =
s = 0
repeat
    F_s = f(t_s, U_s)
    choose \Delta t_s
    t_{s+1} = t_s + \Delta t_s
    U_{s+1} = U_s + \Delta t_s F_s
    s = s + 1
until (t_s >= t_max)
, U t .
Euler Euler.
U (ts ) .
,
t . , Fs := f (ts , Us )
Us + ts f (ts , Us ). ,
Fs (..
f ).
8.2.3. ( , )
d
u(t) = u(t)/2,
dt
u(0) = 1.
Us+1 = Us 0.5ts Us .
ts = 1/N
u(t) = et/2 N = 5, 10, 20.
t = 0.2, 0.6, 1 8.1. ,
t .
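The numbers of Example 8.2.3 can be reproduced with a few lines of Python. This is a minimal sketch with a fixed step; the function name `euler` and its signature are illustrative.

```python
def euler(f, t0, u0, h, steps):
    """Forward Euler with constant step h; returns the list of (t, U) pairs."""
    t, u = t0, u0
    out = [(t, u)]
    for _ in range(steps):
        u = u + h * f(t, u)   # U_{s+1} = U_s + h * f(t_s, U_s)
        t = t + h
        out.append((t, u))
    return out

# u' = -u/2, u(0) = 1 on [0, 1] with N = 5 steps (h = 0.2)
trajectory = euler(lambda t, u: -0.5 * u, 0.0, 1.0, 0.2, 5)
```

For this problem each step multiplies the current value by 1 - h/2, so with N = 5 the approximations at t = 0.2, 0.6, 1 are 0.9, 0.9^3 and 0.9^5.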
, ,
Cauchy [0, T ]
lim
max
h0 j=0,1,...,dT /he
ku(tj ) Uj (h)k = 0
Uj (h)
tj h.
h. ,
u(tj+1 ) Uj+1
(8.3)
(8.4)
u00 (
[0, T ].).
kej k
j = 0, 1, ...
1 + h < eh
kej k
jh
h[e
1],
j = 0, 1, ...
(s+1)h
eT 1
h[e
1] = h[eT 1] h
kej k
h0
0,
Euler ,
h , 1. , ,
, , T h
. ,
M .
, (8.3):
ej+1
(2)
3)j
ej j . ,
j + 1 :
4
, .
j .
(
, ,
). limh0 = 0
.
(2) f Lipschitz,
hku(tj ) Uj k = hkej k
kej+1 k (1 + h)kej k + h2 |j |
. ,
,
.
Euler
Cauchy, u0 (t) = u(t)
u(0) = 1 C . u(t) = et ,
< < 0 limt u(t) = 0. ,
t U
( , Euler).
, ,
T . ,
h, T
. ,
h.
8.2.1.
Cauchy
limj kUj k = 0.
z = h C Cauchy,
limj kUj k = 0.
Euler.
lim |1 + h|j+1 = 0
|1 + h| < 1. ,
Euler
, (1, 0)
1.
h ( ).
, , h < 2/||.
,
. , , h
T .
, u0 (t) = Au(t), u(0) = u0
A.
k(I + hA)j+1 u0 k 0
u0 ,
k(I + hAj+1 k 0. 5
I + hA 1. (I + hA) =
1 + h(A),
|1 + h(A)| < 1. ,
Euler
h [0, 2/(A)]. ,
A, h
.
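The step-size restriction can be checked numerically. The sketch below assumes a diagonal system u' = diag(eigs) u with u(0) = (1, ..., 1), so that each component of the forward Euler iterate is simply (1 + h*lambda)^k; the eigenvalues used in the usage lines are illustrative choices, and the function name is hypothetical.

```python
def forward_euler_growth(eigs, h, steps):
    """Largest component magnitude after `steps` forward Euler steps for
    u' = diag(eigs) u, u(0) = ones: each component equals (1 + h*lam)**steps."""
    return max(abs((1.0 + h * lam) ** steps) for lam in eigs)

eigs = [-100.0, -1.0, -0.5]                            # spectral radius 100, bound h < 2/100
stable = forward_euler_growth(eigs, 0.019, 200)        # h just below 0.02
unstable = forward_euler_growth(eigs, 0.021, 200)      # h just above 0.02
```

With h = 0.019 all three factors have modulus below one and the iterates decay; with h = 0.021 the factor of the fastest component is -1.1 and the iterates grow, even though the exact solution decays.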
8.2.4. u0 (t) = Au(t), A = diag[100, 1, 0.5].
u(t) = [e100t u1 (0), et u2 (0), et/2 u3 (0)]> .
lim ku(t)k = 0
. Euler
Uk+1
= (I + hdiag[100, 1, 0.5])k+1 U0
(1 100h)k+1
0
=
0
0
(1 h)k+1
0
0
U1,0
U2,0 ,
0
k+1
U3,0
(1 0.5h)
Uj,0 = uj (t0 )
h :
G. Strang, , . 590.
T = 1 5000 ,
... ,
A. , A = A> , Q
QT AQ = ,
Uk+1
=
=
8.2.5 .
Euler
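$\frac{d}{dt}u(t) = \frac{u(t) - u(t - \Delta t)}{\Delta t} + O(\Delta t)$
$\frac{U(t) - U(t - \Delta t)}{\Delta t} = f(t, U(t)), \qquad U(t_0) = c,$
$U(t) = U(t - \Delta t) + \Delta t\, f(t, U(t)), \qquad U(t_0) = c. \qquad (8.5)$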
$\frac{1}{|1 - h\lambda|} < 1 \iff 1 < |1 - h\lambda|.$
, < < 0, ( h > 0),
Euler , h.
, , (h = T )
0 T , .
T ,
( , h = T ).
,
t
t.
.
Euler . A(t)
, LU
(I tA(t)) . , . A(t)
,
.
. , ,
O(n).
, Euler Euler.
, , ,
Euler . ,
, Euler
Euler.
8.2.5. ,
(cos , sin ) 0 2 .
u() = (x((), y())
dx
d
dy
d
sin = y
cos = x
d
d
x
y
y
x
0
1
1
0
x
y
du
= Au
d
u(0) = [1, 0]> . Euler.
[Figure 8.4: Example 8.2.5. Panels: N=10, N=50, N=200, N=2000.]
k U (tk+1 ) U (tk )
h A.
(temp = AU ) tk
saxpy U + htemp.
temp = AU , BLAS-2. , t = 2/N 8.4. ,
, N < 1000 . . [1, 0]
, .
, Euler. 8.5.
.
Euler .
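The behaviour of the two Euler variants on the circle problem of Example 8.2.5 can be sketched in Python. The sketch assumes the system x' = -y, y' = x with start point (1, 0) and N equal steps over [0, 2*pi]; the function name `circle_radius` and the `implicit` flag are illustrative.

```python
import math

def circle_radius(N, implicit=False):
    """Integrate x' = -y, y' = x over [0, 2*pi] with N Euler steps from (1, 0)
    and return the final distance from the origin (exact value: 1)."""
    h = 2.0 * math.pi / N
    x, y = 1.0, 0.0
    for _ in range(N):
        if implicit:
            # backward Euler: solve (I - hA) u_new = u_old, A = [[0, -1], [1, 0]]
            det = 1.0 + h * h
            x, y = (x - h * y) / det, (y + h * x) / det
        else:
            # forward Euler: u_new = (I + hA) u_old
            x, y = x - h * y, y + h * x
    return math.hypot(x, y)
```

Every forward step multiplies the radius by sqrt(1 + h^2), so the computed curve spirals outward, while every backward step divides it by the same factor, so the curve spirals inward; both effects shrink as N grows.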
[Figure 8.5: Example 8.2.5. Panels: N=10, N=50, N=200, N=2000.]
$u(t + h) = u(t) + h u'(t) + \frac{h^2}{2!}u''(t) + \frac{h^3}{3!}u'''(t + \theta h)$
$\phantom{u(t + h)} = u(t) + h f(t, u) + \frac{h^2}{2!}u''(t) + \frac{h^3}{3!}u'''(t + \theta h)$
, u(t), f (t, u), u00 (t) , u(t + h) O(h3 ),
2, Euler 1.
u00 (t);
u00 (t)
u0 (t) u0 (t h)
+ O(h)
h
f (t, u(t)) f (t h, u(t h))
h
u00 (t) =
ft , fu (t, u). :
$U_{k+1} = U_k + h f(t_k, U_k) + \frac{h^2}{2!}\left(f_t(t_k, U_k) + f_u(t_k, U_k) f(t_k, U_k)\right)$
, 2. , , (
, .. Maple).
Taylor.
Runge-Kutta
Runge-Kutta
(RK).
Un+1 = Un + h(tn , Un )
u.
Euler, RK. : Euler,
t = h, . U1 , u(0)
(. ) t = 0.
U1 = u(0) + hf (0, u(0)) . :
f (0, u(0)),
[0, h]
. ,
, 0, h, f (0, u(0))
f (h, u(h)),
U1
u 0,
, .. Euler:
K1
K2
=
=
U1
f (0, u(0))
f (h, u(0) + hf (0, u(0)))
h
u(0) + (K1 + K2 )
2
(. tn tn+1 ) :
K1
K2
=
=
U1
f (tn , Un )
f (tn+1 , Un + hK1 )
h
u(0) + (K1 + K2 )
2
RK.
. ,
K1
K2
U1
f (tn , Un )
h
h
f (tn + , Un + K1 )
2
2
u(0) + hK2
, [tn , tn +h],
tn tn+1 .
Runge-Kutta :
$U_{n+1} = U_n + h\sum_{i=1}^{s} b_i K_i, \qquad K_i = f\Big(t_n + c_i h,\; U_n + h\sum_{j=1}^{s} a_{ij} K_j\Big)$
(s)
. , RK
Butcher:
c1
c2
a11
a21
..
.
..
.
cs
as1
b1
a12
a22
..
.
ass
bs
..
as2
b2
a1s
a2s
=
A
b>
RK , .
; , s 4 , s.
, s s. ,
4, .
.. s = 5 4. 13 s 17
10, .. ,
.
8.2.6. Runge-Kutta, , 2 ( Heun)
K_1 = f(t_n, U_n)
K_2 = f(t_{n+1}, U_n + h K_1)
U_{n+1} = U_n + (h/2)(K_1 + K_2)
K_1 = f(t_n, U_n)
K_2 = f(t_n + h/2, U_n + (h/2) K_1)
U_{n+1} = U_n + h K_2
8.2.8. Runge-Kutta, , 4
K_1 = f(t_n, U_n)
K_2 = f(t_n + h/2, U_n + (h/2) K_1)
K_3 = f(t_n + h/2, U_n + (h/2) K_2)
K_4 = f(t_{n+1}, U_n + h K_3)
U_{n+1} = U_n + (h/6)(K_1 + 2K_2 + 2K_3 + K_4)
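One step of the four-stage method above translates directly into Python. This is a minimal sketch for a scalar equation; the function name `rk4_step` is illustrative.

```python
def rk4_step(f, t, u, h):
    """One step of the classical fourth-order Runge-Kutta method."""
    k1 = f(t, u)
    k2 = f(t + h / 2, u + h / 2 * k1)
    k3 = f(t + h / 2, u + h / 2 * k2)
    k4 = f(t + h, u + h * k3)
    return u + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

# u' = -u/2, u(0) = 1, ten steps of size 0.1 up to t = 1
t, u = 0.0, 1.0
for _ in range(10):
    u = rk4_step(lambda s, v: -0.5 * v, t, u, 0.1)
    t += 0.1
```

The method needs four evaluations of f per step, compared with one for Euler, but on this test problem it reaches a far smaller error at the same step size.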
8.2.9. , RK
MATLAB ( ode23).
K_1 = f(t_n, U_n)
K_2 = f(t_n + h/2, U_n + (h/2) K_1)
K_3 = f(t_n + 3h/4, U_n + (3h/4) K_2)
t_{n+1} = t_n + h
U_{n+1} = U_n + (h/9)(2K_1 + 3K_2 + 4K_3)
K_4 = f(t_{n+1}, U_{n+1})
E_{n+1} = (h/72)(-5K_1 + 6K_2 + 8K_3 - 9K_4)
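The embedded pair above can be sketched as a single Python step that returns both the new value and the error estimate. The function name `bs23_step` is illustrative, and the sketch leaves out the step-size control that a production solver builds around these formulas.

```python
def bs23_step(f, t, u, h):
    """One step of the pair above: returns (U_{n+1}, E_{n+1})."""
    k1 = f(t, u)
    k2 = f(t + h / 2, u + h / 2 * k1)
    k3 = f(t + 3 * h / 4, u + 3 * h / 4 * k2)
    u_new = u + h / 9 * (2 * k1 + 3 * k2 + 4 * k3)
    k4 = f(t + h, u_new)            # can be reused as k1 of the next step
    err = h / 72 * (-5 * k1 + 6 * k2 + 8 * k3 - 9 * k4)
    return u_new, err
```

A step-size controller would compare `abs(err)` with a tolerance and accept or reject the step accordingly; since K_4 is evaluated at the new point, an accepted step costs only three new evaluations of f.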
, .
, ,
. Runge-Kutta
.
h, .
RK s 4 ,
u0 = h. s 4, h
s
X
zj
j=0
j!
6
.
Richardson
,
:
W (h) =
(h)
W
= W (0) + O(hps+1 ),
..
.
..
.
W (s h)
W (0),
..
.
..
.
http://www.scholarpedia.org/article/Runge-Kutta_methods
O(hps+1 ), j , 1 , ..., ps
( ) W (0):
(h) :=
W
(h) =
W
W (0) + O(hps+1 )
8.2.10. Richardson
Euler.
U (tn + h) =
U (tn + h/2) =
u(tn ) + 1 h + O(h2 )
h
u(tn ) + 1 + O(h2 )
2
= 1
h
+ O(h2 )
2
O(h2 )
(tn + h)
U
= u(tn ) +
u( tn ) =
(tn + h) + O(h2 )
U
, , U (tn + h/2).
() Euler.
h
+ O(h2 )
2
2
(Uh Uh/2 )
h
Eh/2 (t)
=
h/2
Uh Uh/2
, h .
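Richardson extrapolation for the Euler method can be sketched in Python. Note the assumption: the sketch extrapolates the values at the end of the whole interval (steps h and h/2), a global variant of the idea rather than a transcription of the formulas above, and the function names are illustrative.

```python
def euler_to(f, u0, t_end, n):
    """Forward Euler from t = 0 to t_end with n equal steps; returns U(t_end)."""
    h = t_end / n
    t, u = 0.0, u0
    for _ in range(n):
        u += h * f(t, u)
        t += h
    return u

def richardson_euler(f, u0, t_end, n):
    """Combine Euler runs with steps h and h/2.
    Returns (extrapolated value, estimate of the error of the h/2 run)."""
    coarse = euler_to(f, u0, t_end, n)
    fine = euler_to(f, u0, t_end, 2 * n)
    # The leading error term is proportional to h, so 2*fine - coarse cancels it
    return 2.0 * fine - coarse, fine - coarse
```

Because the O(h) terms cancel, the extrapolated value is second-order accurate, and the difference `fine - coarse` gives a computable estimate of the error of the finer run.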
8.2.11.
[Figure: error (sfalma) versus tk for f. Euler h=1/10, f. Euler h=1/20, and Richardson extrapolation.]
8.2.5
- :
(diffusion equation).
(
).
$\frac{\partial u}{\partial t} = \frac{\partial^2 u}{\partial x^2}, \quad x \in \Omega \subset \mathbb{R}, \quad t \in [0, t_{max}]$
.
, 0 tmax
. u(t, x) )
) t = 0.
= [0, 2] u(t, 0) = 0, u(t, 2) = 0.
u(0, x) = sin(x)
. x 2 .
( t = 0)
u(0, x).
.
0.
( ).
u(t, x) = et sin x. ,
. ,
.
x.
h=
(n+1)
j
j
,
h2
uxx (t, xj ). n
1
W (k+1) W (k)
= 2 AW (k) , k = 0 : N, tmax = N t.
t
h
[Figure 8.6 plot: computed values for dt=0.025 at t=0, t=1.25 (ReR=0.0063), t=2.50 (ReR=0.0216); horizontal axis x.]
8.6:
Euler = [0, 2] h = 2/(n + 1) n = 20 t = 1/N
t = 0, 1.25, 2.50.
. ReR
ku W k2 /kuk2 . n =
20, N = 40 n = 40, N = 100.
W (k) U (tk )
U .
W (k+1)
= W (k)
= (I
1
AW (k) t, k = 0 : N
h2
t
A)W (k)
h2
(8.6)
: .
(k+1)
Wj
u(tk , xj ). ,
.
. , t, :
Choose N, n and set h = 2\pi/(n + 1)
g = \Delta t / h^2
W^{(0)} = [u(0, x_1), ..., u(0, x_n)]^T
for k = 1, ...
    W^{(k)} = W^{(k-1)} - g A W^{(k-1)}
end
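The explicit scheme above can be sketched in Python for the model problem of this section. The sketch assumes the domain [0, 2*pi], zero boundary values, the initial condition sin(x) and A = trid[-1, 2, -1]; the function names are illustrative.

```python
import math

def heat_explicit(n, dt, steps):
    """Forward Euler in time, central differences in space, for u_t = u_xx on
    [0, 2*pi] with zero boundary values and u(0, x) = sin(x)."""
    h = 2.0 * math.pi / (n + 1)
    g = dt / h**2
    x = [(j + 1) * h for j in range(n)]
    w = [math.sin(xj) for xj in x]
    for _ in range(steps):
        padded = [0.0] + w + [0.0]      # boundary values u = 0
        # W <- W - g*A*W with A = trid[-1, 2, -1]
        w = [padded[j + 1] + g * (padded[j] - 2.0 * padded[j + 1] + padded[j + 2])
             for j in range(n)]
    return x, w

def rel_error(n, dt, steps):
    """Relative 2-norm error against the exact solution exp(-t) sin(x)."""
    x, w = heat_explicit(n, dt, steps)
    t = dt * steps
    exact = [math.exp(-t) * math.sin(xj) for xj in x]
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(w, exact)))
    return num / math.sqrt(sum(b * b for b in exact))
```

Calling `rel_error(20, 1/40, 100)` gives a small error, whereas refining only the space grid, as in `rel_error(60, 1/40, 100)`, makes g larger than 1/2 and lets rounding errors grow until they swamp the solution.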
8.6
(t, h). ,
ReR .
h = 2/21, t = 1/40 ReR= 0.0216
t = 2.5 h = 2/41, t = 1/100 ReR= 0.0076.
, t =
1/40 , .. h = 2/61.
8.7
t = 0.425 = 17t. .
, , W
. , 8.6 !
. .
. .
, W (k+1) = W (k) gAW (k)
g = t/h2 .
W (k+1)
W
= (I g)k+1 W
(k+1)
W
k+1
|1g|
. |1g| > 1
. ,
W (0) (..
) (I gA)k+1 . ,
,
. ,
u(t, x) = et sin x |u(t, x)| = |et sin x| 1
t 0. . A
A = A>
( Gerschgorin, . .3.1 ) (A)
A 0 < (A) < 4.
1
2
t
h2
t
h2
1
0
h2 2
t max
h2 2
1
t
.
h2
2
[Figure 8.7 plot: t=0, exact soln; t=0.425, exact soln; t=0.425, computed soln with t=1/40; ReR = 0.3745.]
8.7:
= [0, 2] h = 12/61 t = 1/40 t = 0.425 = 17t.
. ReR
ku W k2 /kuk2 .
Euler (8.6).
Euler (8.7).
h ..
,
t. , h h/2, t . 8.6, g = t
h2
g = 1/40/(2/21)2 0.3 g = 1/100/(2/41)2 0.4 8.7
g = 1/40/(2/61)2 2.5.
. t
W (k+1) W (k) ,
0 t = tmax tmax /t
,
.
Euler (8.5)
1
W (k+1) W (k)
= 2 AW (k+1) , k = 0 : N
t
h
t
AW (k+1)
h2
t
(I + 2 A)W (k+1)
h
W (k+1) +
= W (k) , k = 0 : N
= W (k) , k = 0 : N
(8.7)
, W (k+1) (implicitly), . :
Choose N, n and set h = 2\pi/(n + 1)
g = \Delta t / h^2
W^{(0)} = [u(0, x_1), ..., u(0, x_n)]^T
for k = 1, ...
    W^{(k)} = (I + gA)^{-1} W^{(k-1)}
end
8.7. 8.7.
.
,
(I + gA)1 ,
(k+1)
W
(0), W
(j) = Q> W (j) .
= (I + g)(k+1) W
http://www.cs.purdue.edu/research/cse/pses
. MATLAB
PDE Toolbox 2 .
8.3
8.3.1 (, , 04). : (0, 1)
2
ddxu2 + b du
dx + cu = x, b, c > 0,
u(0) = 0, u(1) = 1,
x0 = 0, x1 = 1/(n + 1), . . ., xn = n/(n + 1), xn+1 = 1
,
AU = F , A Rnn A = A> .
. , . , xi :
d2
u(xi1 ) 2u(xi ) + u(xi+1 )
u(xi ) =
dx2
h2
d
u(xi+1 ) u(xi1 )
u(xi ) =
dx
2h
h = 1/(n + 1). Ui u(xi )
:
1
b
1
1
b
)Ui1 + ( 2 + c)Ui + ( 2 +
)Ui+1 = xi
h2
2h
h
h
2h
b
b
AU = F , A = trid[ h12 2h
, h12 + c, h12 + 2h
],
.
8.3.2 (, , 03). : t
du/dt = f (t, u) Euler U (s+1) =
U (s) + tf (ts+1 , U (s+1) ) .
. Euler
t.
8.3.3 (, , 03). t Euler, U (s+1) = U (s) + tf (ts , U (s) ),
du/dt = f (t, u) , ,
.
. : Euler
.
8.4
8.4.1. f : R R
4 , . f (j) (x), j = 1 : 4
x [a, b].
1.
.
2. .
. (. 8).
8.4.2. u : R R
4
[0, 1]
d2 u
u(0) = 0, u(1) = 1.
1. n
O(h2 ), h = 1/(n + 1)
.
2.
Cholesky.
. (. 8).
8.4.3. u : R R
4 , . u(j) (x), j = 1 : 4
x [1, 1]. u
u(1) = 0, u(1) = 1.
, .
1.
n = 8
(1, 1) ( ).
2.
.
.
8.2.1. 1)
xi (1, 1)
u(2) (xi )
Ui u.
= 2xi
= 2xi
= 2h2 xi .
2
xi = 1 + i, i = 1 : 8
9
h = 2/9. (
i = 0, 9 [1, 1].)
T U = Y T Toeplitz,
T R88
T = trid[1, 166/81, 1]
, U = [U1 , ..., U8 ]>
Y =
1
[56, 40, 24, 8, 8, 24, 40, 673]>
729
2) T U = Y . , ,
LU .
Gershgorin (,
.2.1) i , i = 1 : 8
166
166
| 2, |
| 1.
81
81
T = T > .
166
166
2, 1
1,
81
81
> 0.
.
Cholesky
LU.
8.4.4 (, , 02-makeup).
u : R R2
du
(t) = Au(t), A = [1, 0.5; 0.5, 1] R22 ,
dt
u(0) = [1, 2]> .
1. Euler t = 0.1
T = t.
2. Euler t = 0.1
T = t.
3. , 4 , u(t) = [0.7982, 1.0214].
E =
ku(t) U (t)k
,
ku(t)k
E =
ku(t) U (t)k
ku(t)k
. -) 8.2.2.
U (t + t) = (I At)U (t). ,
U (0) = [2, 1]> , ,
:
t
U_1
U_2
t
U_1
U_2
0
2.0
1.0
0.5
0.5
1.0
0
2.0000
1.0000
1.00
0.50
0.25
1.500
0.125
0.250
0.8000
-0.4000
1.0000
2.0000
0.1250
0.0625
1.6000
1.0400
-0.9200
2.5000
0.0313
0.0625
3.0000
0.0313
0.0156
2.4000
-1.3600
1.3840
3.5000
0.0078
0.0156
3.2000
1.9232
-1.9184
4.0000
0.0078
0.0039
4.0000
-2.6886
2.6896
) t = 0.5, U1 , U2 , t = 0.8,
. t ,
0, ,
( ). , : U (t + t) = (I At)U (t)
Q> AQ = diag[3, 1] Q> Q = I , U (t + t) = QQ> (I At)QQ> U (t) =
Q(I diag[3t, t])Q> U (t) U (t + t) = Qdiag[1 3t, 1 t]Q> U (t).
t = 0.5, max(|1 3t|, |1 t|) = 0.5 < 1, t = 0.8
max(|1 3t|, |1 t|) = 1.4 > 1, ,
.
8.5
8.5.1.
erf(t) =
e d
.
2
du
2
= et ,
dt
u(0) = 0.
, ode23 MATLAB, .
[T,Y]=mysolver(odefile,Tspan,Y0),
input.
ode23 MATLAB odefile, Tspan,
Y0.
) Euler
[0,3] : h = 1/10, 1/20, 1/40.
, (
) (.. caption) . MATLAB erf
Euler.
Euler .
) () Taylor
.
Euler.
) Euler
( )
t . = 104 . ;
) hmax , hmin
. Euler
315
.
|Us (erf )(ts )|
1) Euler hmin 2) hmax 3)
Euler
= 104 , 4) Euler h = 1/(n + 1)
n ().
)
Taylor ().
[1] A. Iserles. Introduction to Numerical Methods for Differential Equations. Cambridge University Press, Cambridge, 1996.
[2] . and . . . , , 1995.
[3] .. and .. . . , 1997.
[4] N. Wiener. Invention: The Care and Feeding of Ideas. MIT Press, Cambridge,
Massachusetts, 1954. Published by MIT Press in 1993.
316
.1
.1.1
.1.1. x Cn , kxk,
Cn R ,
:
1. kxk > 0 x = 0 kxk = 0 ( )
2. kx + yk kxk + kyk ( )
3. kxk = ||kxk R ( ).
:
l1 : kxk1 =
l2 : kxk2 =
Pn
j=1
|xj |
n
j=1
|xj |2
1/2
kxkp =
n
X
1/p
|xj |p
j=1
l2 x x .
:
x x = kxk22 .
|x y| kxk2 kyk2 .
kx yk | kxk kyk |.
.1:
1
2
1
1
1
1
1
1
n
1
,
. , ,
.
, .. Rn .
.1.2. U k k
0
k k . 1 , 2
0
.
.1.1. Rn ( Cn )
.
, , ,
. , l2
un ,
. 1 , 2 .1. ..
2, 1, 1, n
nkxk
, k k.
, k kD , :
kxkD = max
z6=0
|z x|
.
kzk
p- q -, p1 +
q 1 = 1. |x yk kxkkykD . , x,
xD , x,
xD x = kxD kD kxk = 1.
.1.2
. :
.1.3. k k : Rnn R
.1.1 kA Bk kAkkBk,
.
.1.4. k k k kM :
Rnn R
kAkM =
sup
x Rn
x 6= 0
kAxk
.
kxk
.
,
kAkM =
sup
x Rn
kxk = 1
kAxk
.
kxk
, . A Rnn .
kAk1 = maxj=1:n
Pn
i=1
|i,j |.
Pn
j=1
|i,j | ( )
Frobenius (
) A Rmn :
kAkF =
m X
n
X
1/2
|ij |2
i=1 j=1
Frobenius
: X X = I, Y Y = I , kXAY k = kAk.
.2 Lipschitz
.
.2.1. g : Rn Rm
x Rn gi , i = 1, ..., m, g(x) = [g1 (x), , gm (x)],
x. g x
g x g x
gi
, i = 1 : m, j = 1 : n.
xj
0
g(x)> .
gi
xj (x),
g (x) = J(x) =
.3
. ,
A Rmn rank(A) A.
A A.
.
5.2.1. , A Rmn ()
A ().
rank(A).
null(A) A, .
u Au = 0.
.3.1. A Rmn , B Rnk .
1. rank(A) = rank(A> ).
2. rank(A) + null(A) = n.
3. rank(A) + rank(B) n rank(AB) min(rank(A), rank(B)).
4. rank(A + B) rank(A) + rank(B).
5. rank(A) = r X Rmr , Y Rrn
C Rrr A = XCY .
. (= trace) A Rnn Pn
, . trace(A) = i=1 i,i .
A Rnn
x> Ax > 0 x 6= 0.
Gershgorin () .
.3.1. A Cnn
n
n
X
|z ii |
|ij |,
i = 1 : n.
j=1,j6=i
11 B
C = 21 B
..
.
m1 1 B
12 B
..
..
..
.
1n1 B
m1 n 1 B
Ax = x,
A Cnn , x, Cn
A. Ax = x det(A
I) = 0. det(A I) n
n :
p() := det(A I) = ( 1 ) ( n )
A (A) := {1 , ..., n }.
A Rmn .
U Rmm , Rmn V Rnn
.1: Ax x R2 kxk2 = 1. :
(=2) . : (=1) .
2. kAk2 = max . A kA1 k1
2 = min .
3. 1 r > r+1 = = n = 0 A r .
A (null(A)) V (:, r + 1 : n).
A (range(A)) U (:, 1 : r).
4. A =
Pn
j=1
j uj vj>
5. S n1 =
{x|kxk = 1}. AS n1 0
j uj .
. ,
MATLAB
>
s
>
>
>
>
>
>
A= rand(2); s=svd(A)
= 0.8703, 0.2764
x = randn(2,1); x = x/norm(x); y = A*x; plot(y(1),y(2),+);
hold on
for j=1:1000
x = randn(2,1); x = x/norm(x); y = A*x; plot(y(1),y(2),+);
end
hold off
.1.
.2: Ax x R2 kxk2 = 1
1 = 1 2 = 0.01.
. , .
.1,
Ax , u1 , .
Ax
, b Ax = b. ,
, b
u1 . , .
. ,
A 1, 0.01
.2. 100
( ),
- - .
max
min
. ,
.