Professional Documents
Culture Documents
07 Cholesky
07 Cholesky
Outline
1 Cholesky Factorization
Cholesky Factorization
Linear system
Ax = b
can then be solved by forward-substitution in lower
triangular system Ly = b, followed by back-substitution in
upper triangular system LT x = y
so we have
√
q
`11 = a11 , `21 = a21 /`11 , `22 = a22 − `221
for k = 1 to n
√
akk = akk
for i = k + 1 to n
aik = aik /akk
end
for j = k + 1 to n
for i = j to n
aij = aij − aik ajk
end
end
end
Parallel Algorithm
Partition
Agglomeration Schemes
Agglomerate
Agglomeration of fine-grain tasks produces
2-D
1-D column
1-D row
parallel algorithms analogous to those for LU factorization,
with similar performance and scalability
Submatrix-Cholesky Column-Cholesky
for k = 1 to n for j = 1 to n
√
akk = akk for k = 1 to j − 1
for i = k + 1 to n for i = j to n
aik = aik /akk aij = aij − aik ajk
end end
for j = k + 1 to n end
√
for i = j to n ajj = ajj
aij = aij − aik ajk for i = j + 1 to n
end aij = aij /ajj
end end
end end
Column Operations
Column-oriented algorithms can be stated more compactly by
introducing column operations
cdiv ( j ): column j is divided by square root of its diagonal
entry
√
ajj = ajj
for i = j + 1 to n
aij = aij /ajj
end
Submatrix-Cholesky Column-Cholesky
for k = 1 to n for j = 1 to n
cdiv (k) for k = 1 to j − 1
for j = k + 1 to n cmod ( j, k)
cmod ( j, k) end
end cdiv ( j)
end end
right-looking left-looking
immediate-update delayed-update
data-driven demand-driven
fan-out fan-in
Data Dependences
cmod (n, k )
cmod (k + 1, k ) cmod (k + 2, k ) • • •
cdiv (k )
cmod (k, k - 1 )
cmod (k, 1) cmod (k, 2 ) • • •
Data Dependences
Sparse Matrices
Sparsity Structure
For sparse matrix M , let Mi∗ denote its ith row and M∗j
its jth column
Submatrix-Cholesky Column-Cholesky
for k = 1 to n for j = 1 to n
cdiv ( k) for k ∈ Struct (Lj∗ )
for j ∈ Struct (L∗k ) cmod ( j, k)
cmod ( j, k) end
end cdiv ( j)
end end
right-looking left-looking
immediate-update delayed-update
data-driven demand-driven
fan-out fan-in
Graph Model
3 6 4 3 6 4 3 6 4 6 4
7 9 8 7 9 8 7 9 8 7 9 8
1 5 2 5 2 5 5
6 6
9 9 8 7 9 8 7 9 8 7 9 8
Elimination Tree
× × × ×
×× × ×
×
×× ×
× × × ×
×× × × ×× ×
×× × × ×× ×
× × × × × × + +×
× × ×× × × + ++ ×
× ×× × × × ×× × ×
A L
9
3 6 4 3 6 4
8
7 9 8 7 9 8 7
5 6
1 5 2 1 5 2
1 2 3 4
G (A) F (A) T (A)
× × ×× × ×× ×× × ×
×× × ×
× × × ×
× × × ×
× × × ×
× × × ×
× × × ×
× × ××
× × ×× ×× × ×× × ×
Ordering Heuristics
Symbolic Factorization
Fine-grain
Task is one multiply-add pair
Available in either dense or sparse case
Difficult to exploit effectively in practice
Medium-grain
Task is one cmod or cdiv
Available in either dense or sparse case
Accounts for most of speedup in dense case
Large-grain
Task computes entire set of columns in subtree of
elimination tree
Available only in sparse case
7 7
6 6
×× ×
5 ×× × ×× 5
× ×× ××
4 ××× ×× 4
3 ××× ×× 3
× ×× ××
2 ×× ×× 2
1 1
A L
G (A) T (A)
4
× × × 7
6 × × ×
× × × × × 5 6
7 × × × × ×
5 × × × × × 3 4
× ×× × ×
3 × ×× × ×× 1 2
1 T (A)
A L
G (A)
6
× × ×
5 ×× × × 7
×× × ×× ×
7 × × × 3 6
2 × ×× ×
××× ××× 1 2 4 5
3 × × × × + ×+×
T (A)
1
A L
G (A)
×× × × 7
7 8 9 ×× × × ××
×× × ×× 6
× ×× × ×+ +× 5
4 5 6 × ××× × × +××
× ×× × ×+× × 4
1 2 3
× ×× × + +×
× ××× × +× × 3
× ×× ×+ × × 2
G (A) A L
1
T (A)
× × × × 9
3 6 4 × × × ×
× ×× × 8
× × × ×
7 9 8 ×× × × ×× × 7
×× × × ×× × 5 6
1 5 2
× × × × × × + +×
× × ×× × × + ++ × 1 2 3 4
× ×× × × × ×× × ×
G (A) A L T (A)
× × × × 9
4 6 5 ×× × ×
×× × × ×× × 8
× ×× ×
7 8 9 ×× × × 7
××× × ××× 3 6
1 3 2
× × ×× × +× +×
× ×× × × × ×× × 1 2 4 5
× × ×× × + × ++ × ×
G (A) A L T (A)
Mapping
0 2
1 3
0 2
0 1 2 3
0 1 2 3
0 1 2 3
0 0 1 1 2 2 3 3
0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3
Example: extend_add
a11 a13 a15 a18 b11 b12 b15 b17
a31 a33 a35 a38 b21 b22 b25 b27
⊕
a51 a53 a55 a58 b51 b52 b55 b57
a81 a83 a85 a88 b71 b72 b75 b77
a11 + b11 b12 a13 a15 + b15 b17 a18
b21 b22 0 b25 b27 0
a31 0 a33 a35 0 a38
=
a51 + b51
b52 a53 a55 + b55 b57 a58
b71 b72 0 b75 b77 0
a81 0 a83 a85 0 a88
References – Scalability
A. George, J. Lui, and E. Ng, Communication results for
parallel sparse Cholesky factorization on a hypercube,
Parallel Computing 10:287-298, 1989
A. Gupta, G. Karypis, and V. Kumar, Highly scalable
parallel algorithms for sparse matrix factorization, IEEE
Trans. Parallel Distrib. Systems 8:502-520, 1997
T. Rauber, G. Runger, and C. Scholtes, Scalability of
sparse Cholesky factorization, Internat. J. High Speed
Computing 10:19-52, 1999
R. Schreiber, Scalability of sparse direct solvers,
A. George, J. R. Gilbert, and J. Liu, eds., Graph Theory
and Sparse Matrix Computation, pp. 191-209,
Springer-Verlag, 1993
Michael T. Heath Parallel Numerical Algorithms 51 / 52
Cholesky Factorization Sparse Elimination
Parallel Dense Cholesky Matrix Orderings
Parallel Sparse Cholesky Parallel Algorithms