Numerical Algorithms
Samir Moustafa
almost . . .
What is iterative refinement?
k+1 k
f(x)
Buttari (2011)
Basic structure of iterative refinement
LU ← PA                              ▷ LU factorization
Solve Ly = Pb for y                  ▷ forward substitution
Solve Ux^(0) = y for x^(0)           ▷ back substitution
r^(0) ← b − Ax^(0)
for k ← 0, 1, ... do
    Solve Ly = Pr^(k) for y          ▷ forward substitution
    Solve Us^(k) = y for s^(k)       ▷ back substitution
    x^(k+1) ← x^(k) + s^(k)
    r^(k+1) ← b − Ax^(k+1)
    check convergence
end for
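The steps above can be sketched in Python with NumPy. This is a minimal illustration, not a reference implementation: the helper names `lu_factor`, `lu_solve` and `iterative_refinement`, the relative-residual convergence test, and the defaults `tol` and `max_iter` are assumptions, since the slide only says "check convergence".

```python
import numpy as np

def lu_factor(A):
    """LU factorization with partial pivoting; returns packed LU and the row permutation."""
    LU = np.array(A, dtype=float)
    n = LU.shape[0]
    perm = np.arange(n)
    for j in range(n - 1):
        p = j + int(np.argmax(np.abs(LU[j:, j])))  # row of the largest pivot candidate
        if p != j:
            LU[[j, p]] = LU[[p, j]]
            perm[[j, p]] = perm[[p, j]]
        LU[j + 1:, j] /= LU[j, j]                   # multipliers (strict lower part = L)
        LU[j + 1:, j + 1:] -= np.outer(LU[j + 1:, j], LU[j, j + 1:])
    return LU, perm

def lu_solve(LU, perm, b):
    """Solve Ly = Pb by forward substitution, then Ux = y by back substitution."""
    n = LU.shape[0]
    x = np.array(b, dtype=float)[perm]
    for i in range(n):                              # L has an implicit unit diagonal
        x[i] -= LU[i, :i] @ x[:i]
    for i in range(n - 1, -1, -1):
        x[i] = (x[i] - LU[i, i + 1:] @ x[i + 1:]) / LU[i, i]
    return x

def iterative_refinement(A, b, tol=1e-14, max_iter=10):
    """Refine the LU solution; tol and max_iter are assumed, illustrative defaults."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    LU, perm = lu_factor(A)
    x = lu_solve(LU, perm, b)                       # x^(0)
    r = b - A @ x                                   # r^(0)
    for k in range(max_iter):
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):  # assumed convergence check
            return x, k
        s = lu_solve(LU, perm, r)                   # correction s^(k), reusing the factors
        x = x + s                                   # x^(k+1)
        r = b - A @ x                               # r^(k+1)
    return x, max_iter

x, iters = iterative_refinement([[1.0, 1.0], [2.0, 3.0]], [2.0, 5.0])
print(x, iters)
```

The factorization is computed once and reused for every correction, so each refinement step costs only two triangular solves and one matrix-vector product.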
Iterative refinement for dense linear systems
LU ← PA                              ▷ single precision
Solve Ly = Pb for y                  ▷ single precision
Solve Ux^(0) = y for x^(0)           ▷ single precision
r^(0) ← b − Ax^(0)                   ▷ double precision
for k ← 0, 1, ... do
    Solve Ly = Pr^(k) for y          ▷ single precision
    Solve Us^(k) = y for s^(k)       ▷ single precision
    x^(k+1) ← x^(k) + s^(k)          ▷ double precision
    r^(k+1) ← b − Ax^(k+1)           ▷ double precision
    check convergence
end for
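A possible NumPy sketch of the mixed-precision variant follows: the factorization and the triangular solves run in `float32`, while residuals and updates are kept in `float64`. The function names, the `max_iter` default and the loop layout are assumptions made for illustration. The convergence test uses the residual inequality quoted later in these slides, read here as a stopping criterion with εd taken to be double-precision machine epsilon.

```python
import numpy as np

def lu_factor32(A):
    """Single-precision LU with partial pivoting (packed LU, row permutation)."""
    LU = np.asarray(A).astype(np.float32)           # astype makes a float32 copy
    n = LU.shape[0]
    perm = np.arange(n)
    for j in range(n - 1):
        p = j + int(np.argmax(np.abs(LU[j:, j])))
        if p != j:
            LU[[j, p]] = LU[[p, j]]
            perm[[j, p]] = perm[[p, j]]
        LU[j + 1:, j] /= LU[j, j]
        LU[j + 1:, j + 1:] -= np.outer(LU[j + 1:, j], LU[j, j + 1:])
    return LU, perm

def lu_solve32(LU, perm, b):
    """Forward and back substitution carried out entirely in float32."""
    n = LU.shape[0]
    x = np.asarray(b)[perm].astype(np.float32)
    for i in range(n):
        x[i] -= LU[i, :i] @ x[:i]
    for i in range(n - 1, -1, -1):
        x[i] = (x[i] - LU[i, i + 1:] @ x[i + 1:]) / LU[i, i]
    return x

def mixed_precision_refinement(A, b, max_iter=30):
    """max_iter is an assumed safeguard, not a value taken from the slides."""
    A = np.asarray(A, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    n = A.shape[0]
    eps_d = np.finfo(np.float64).eps                # assumed meaning of eps_d
    norm_A = np.linalg.norm(A, "fro")
    LU, perm = lu_factor32(A)                       # single precision
    x = lu_solve32(LU, perm, b).astype(np.float64)  # x^(0), stored in double
    for k in range(max_iter):
        r = b - A @ x                               # residual in double precision
        if np.linalg.norm(r) <= np.linalg.norm(x) * norm_A * eps_d * np.sqrt(n):
            return x, k
        s = lu_solve32(LU, perm, r)                 # correction in single precision
        x = x + s.astype(np.float64)                # update in double precision
    return x, max_iter
```

The expensive O(n³) factorization happens once in the faster precision; only the O(n²) residual and update steps use double precision, which is the source of the speedups the experiment slides report.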
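Example for iterative refinement
• Consider solving the following linear system:
$$x_1 + x_2 = 2$$
$$2x_1 + 3x_2 = 5$$
• Residual:
$$r = b - Ax^{(0)} = \begin{pmatrix} 2 \\ 5 \end{pmatrix} - \begin{pmatrix} 1 & 1 \\ 2 & 3 \end{pmatrix} \begin{pmatrix} 0.9 \\ 1.3 \end{pmatrix} = \begin{pmatrix} -0.2 \\ -0.7 \end{pmatrix}$$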
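Example for iterative refinement
• Solve $As = r$ for $s$
• Forward substitution: solve $Ly = r$ for $y$
$$\begin{pmatrix} 1 & 0 \\ 2 & 1 \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \begin{pmatrix} -0.2 \\ -0.7 \end{pmatrix} \;\Rightarrow\; \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \begin{pmatrix} -0.2 \\ -0.3 \end{pmatrix}$$
• Back substitution: solve $Us = y$ for $s$
$$\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} s_1 \\ s_2 \end{pmatrix} = \begin{pmatrix} -0.2 \\ -0.3 \end{pmatrix} \;\Rightarrow\; \begin{pmatrix} s_1 \\ s_2 \end{pmatrix} = \begin{pmatrix} 0.1 \\ -0.3 \end{pmatrix}$$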
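• Refined solution:
$$x^{(1)} = x^{(0)} + s = \begin{pmatrix} 0.9 \\ 1.3 \end{pmatrix} + \begin{pmatrix} 0.1 \\ -0.3 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$$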
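• Since
$$Ax^{(1)} = \begin{pmatrix} 1 & 1 \\ 2 & 3 \end{pmatrix} \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 2 \\ 5 \end{pmatrix} = b,$$
the residual is zero and $x^{(1)}$ is the exact solution.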
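The arithmetic of this example can be replayed in a few lines of NumPy. The factor `L` is the one shown in the forward-substitution step; `U` is inferred here from A = LU (no pivoting) and is an assumption of this sketch, as is the use of `np.linalg.solve` as a stand-in for hand-written triangular substitution.

```python
import numpy as np

A = np.array([[1.0, 1.0], [2.0, 3.0]])
b = np.array([2.0, 5.0])
x0 = np.array([0.9, 1.3])                 # approximate solution used in the example

L = np.array([[1.0, 0.0], [2.0, 1.0]])    # unit lower triangular factor from the slide
U = np.array([[1.0, 1.0], [0.0, 1.0]])    # inferred so that L @ U equals A

r = b - A @ x0                            # residual, about (-0.2, -0.7)
y = np.linalg.solve(L, r)                 # forward substitution, about (-0.2, -0.3)
s = np.linalg.solve(U, y)                 # back substitution, about (0.1, -0.3)
x1 = x0 + s                               # refined solution, about (1, 1)

print("r  =", r)
print("s  =", s)
print("x1 =", x1)
```

Because 0.9 and 1.3 are not exactly representable in binary floating point, the printed values agree with the hand calculation only up to rounding.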
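• If $\|\delta^{(k)}\| < \alpha$, $\|\mu^{(k)}\| < \beta$ (for all $k$) and $\|\hat{A}^{-1}E\|$ is not too close to 1 (e.g. $\|\hat{A}^{-1}E\| < 1/2$):
$$\alpha \le c_1 \varepsilon_m \|A\| \|x\|$$
$$\beta \le c_2 \varepsilon_m \|x\|$$
• For modest $c_1$, $c_2$:
$$\frac{\|x^{(k)} - x\|}{\|x\|} \le c_1 \varepsilon_m \kappa(A) + c_2 \varepsilon_m$$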
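To get a feel for the size of the right-hand side of this bound, it can be evaluated for the 2×2 matrix of the earlier example. The choices below are assumptions made only for illustration: the infinity norm for κ(A), double-precision machine epsilon for ε_m, and c1 = c2 = 1 (the slides only say the constants are modest).

```python
import numpy as np

A = np.array([[1.0, 1.0], [2.0, 3.0]])

kappa = np.linalg.cond(A, np.inf)      # ||A||_inf * ||A^-1||_inf, close to 5 * 4 = 20
eps_m = np.finfo(np.float64).eps       # assumed: double-precision machine epsilon
c1 = c2 = 1.0                          # assumed: placeholder values for the constants

bound = c1 * eps_m * kappa + c2 * eps_m
print(f"kappa(A) = {kappa:.1f}, bound on relative error = {bound:.2e}")
```

The first term grows with the condition number, so for ill-conditioned matrices it dominates the bound, while the second term is a floor of order ε_m.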
[Figure: two panels of Gflop/s versus problem size (0 to 5000); legend entry "double".]
Buttari (2011)
$$\|b - Ax\|_2 \le \|x\|_2 \cdot \|A\|_F \cdot \varepsilon_d \cdot \sqrt{n}$$
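The inequality above reads like a residual-based stopping test for the refinement loop. A minimal sketch of such a check is given below, assuming εd denotes double-precision machine epsilon and n the matrix dimension; the function name `converged` is hypothetical.

```python
import numpy as np

def converged(A, x, b, eps_d=np.finfo(np.float64).eps):
    """True if ||b - Ax||_2 <= ||x||_2 * ||A||_F * eps_d * sqrt(n)."""
    A = np.asarray(A, dtype=np.float64)
    x = np.asarray(x, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    n = A.shape[0]
    r = b - A @ x                                   # residual in double precision
    threshold = np.linalg.norm(x, 2) * np.linalg.norm(A, "fro") * eps_d * np.sqrt(n)
    return bool(np.linalg.norm(r, 2) <= threshold)

A = [[1.0, 1.0], [2.0, 3.0]]
b = [2.0, 5.0]
print(converged(A, [0.9, 1.3], b))   # initial guess from the worked example
print(converged(A, [1.0, 1.0], b))   # refined solution from the worked example
```

The threshold scales with the norms of A and x, so the test is relative: it does not depend on how the system is scaled.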
Experiments: double vs. mixed precision
• IBM PowerPC 970 2.0 GHz
[Figure: two panels of Gflop/s versus problem size (0 to 5000); legend entry "double".]
Buttari (2011)
$$\|b - Ax\|_2 \le \|x\|_2 \cdot \|A\|_F \cdot \varepsilon_d \cdot \sqrt{n}$$
Experiments: double vs. mixed precision
• Intel Woodcrest 3.0 GHz
[Figure: two panels of Gflop/s versus problem size (0 to 5000); legend entry "double".]
Buttari (2011)
$$\|b - Ax\|_2 \le \|x\|_2 \cdot \|A\|_F \cdot \varepsilon_d \cdot \sqrt{n}$$
Experiments: double vs. mixed precision
[Figure: SP Solve, DP Solve (MP Iter. Ref.) and DP Solve plotted against matrix size; vertical axis ticks from 200 to 1600.]
Dongarra (2012)
Summary