Unit II HPC PDF
• Decomposition Techniques
– Recursive Decomposition
– Data Decomposition
– Exploratory Decomposition
– Speculative Decomposition
– Hybrid Decomposition
[Figure: decomposition of dense matrix-vector multiplication y = A·b into n tasks, Task 0 through Task n-1, each computing one entry of y.]
Observations: While tasks share data (namely, the vector b), they
do not have any control dependencies – i.e., no task needs to
wait for the (partial) completion of any other. All tasks are of the
same size in terms of number of operations. Is this the maximum
number of tasks we could decompose this problem into?
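The row-wise decomposition described above can be sketched in code (a minimal serial illustration; the matrix and vector values are made up):

```python
# Dense matrix-vector multiplication y = A*b decomposed into n tasks:
# task i computes the single entry y[i] = sum_j A[i][j] * b[j].
# All tasks read the shared vector b but have no control dependencies.

def matvec_task(A, b, i):
    """Task i: compute one entry of the result vector."""
    return sum(A[i][j] * b[j] for j in range(len(b)))

def matvec(A, b):
    # Each iteration is an independent task; in a real parallel setting
    # these calls could run concurrently (e.g. one per process).
    return [matvec_task(A, b, i) for i in range(len(A))]

A = [[1, 2], [3, 4]]
b = [5, 6]
print(matvec(A, b))  # [17, 39]
```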
Example: Database Query Processing
[Figure: a table of records with ID# and Color columns; the processing of the query is decomposed into Task 1 through Task 4, each handling a portion of the table.]
• The longest such path determines the shortest time in which the
program can be executed in parallel.
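The critical path length can be computed directly from the task dependency graph. Below is a small sketch (the example graph and weights are hypothetical, not the ones in the figure):

```python
# Critical path length of a task dependency graph (DAG).
# Each node carries a weight (task execution time); the critical path is
# the longest weighted path, which bounds the shortest parallel run time.

def critical_path(weights, deps):
    """weights: {task: time}; deps: {task: [predecessor tasks]}."""
    finish = {}

    def finish_time(t):
        # Earliest finish of t = its own weight plus the latest
        # finish among its predecessors (memoized).
        if t not in finish:
            finish[t] = weights[t] + max(
                (finish_time(p) for p in deps.get(t, [])), default=0)
        return finish[t]

    return max(finish_time(t) for t in weights)

# Hypothetical four-task graph: tasks 1 and 2 feed 3, which feeds 4.
w = {1: 10, 2: 10, 3: 10, 4: 10}
d = {3: [1, 2], 4: [3]}
print(critical_path(w, d))  # 30 (path 1 -> 3 -> 4)
```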
[Figure: two task dependency graphs, (a) and (b). In both, Tasks 1-4 have weight 10. In (a), Task 5 has weight 6, Task 6 weight 9, and Task 7 weight 8; in (b), Task 5 has weight 6, Task 6 weight 11, and Task 7 weight 7.]
What are the critical path lengths for the two task dependency graphs?
If each task takes 10 time units, what is the shortest parallel execution time
for each decomposition? How many processors are needed in each case to
achieve this minimum parallel execution time? What is the maximum degree
of concurrency?
Limits on Parallel Performance
[Figure: matrix-vector multiplication A·b with rows 0-11, decomposed into tasks Task 0 through Task 11; panels (a) and (b) show the matrix and the corresponding tasks.]
Task Interaction Graphs, Granularity, and Communication
Note: These criteria often conflict with each other. For example,
a decomposition into one task (or no decomposition at all)
minimizes interaction but does not result in a speedup at all! Can
you think of other such conflicting cases?
Processes and Mapping: Example
[Figure: the two task dependency graphs, with node weights as before, mapped onto processes P0-P3. In (a), Task 5 → P0, Task 6 → P2, Task 7 → P0; in (b), Task 5 → P0, Task 6 → P0, Task 7 → P0.]
• recursive decomposition
• data decomposition
• exploratory decomposition
• speculative decomposition
Recursive Decomposition
[Figure: recursive decomposition of quicksort on the list 1 3 4 2 5 12 11 10 6 8 7 9; each level of the tree partitions a sublist around a pivot, producing independent subtasks.]
In this example, once the list has been partitioned around the pivot,
each sublist can be processed concurrently (i.e., each sublist represents an
independent subtask). This can be repeated recursively.
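The structure described above can be sketched as follows (a serial sketch; in an actual parallel runtime the two recursive calls would be spawned as concurrent tasks and joined):

```python
# Recursive decomposition of quicksort: after partitioning around a
# pivot, the two sublists are independent subtasks that can be sorted
# concurrently; the pattern repeats recursively.

def quicksort(lst):
    if len(lst) <= 1:
        return lst
    pivot = lst[0]
    left = [x for x in lst[1:] if x < pivot]    # independent subtask
    right = [x for x in lst[1:] if x >= pivot]  # independent subtask
    # A parallel runtime would run quicksort(left) and quicksort(right)
    # as concurrent tasks and concatenate the results here.
    return quicksort(left) + [pivot] + quicksort(right)

print(quicksort([3, 7, 2, 9, 11, 4, 5, 8]))  # [2, 3, 4, 5, 7, 8, 9, 11]
```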
Recursive Decomposition: Example
[Figure: recursive decomposition for finding the minimum of a list: partial minima such as min(4,1) and min(8,2) are combined at the root as min(1,2).]
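The same pattern applies to minimum finding; a serial sketch of the recursive decomposition:

```python
# Recursive decomposition for finding the minimum: split the list in
# half, find the minimum of each half (two independent subtasks), then
# combine the partial results with a single min().

def rec_min(lst):
    if len(lst) == 1:
        return lst[0]
    mid = len(lst) // 2
    # The two halves are independent subtasks; a parallel runtime
    # could evaluate them concurrently.
    return min(rec_min(lst[:mid]), rec_min(lst[mid:]))

print(rec_min([4, 9, 1, 7, 8, 11, 2, 12]))  # 1
```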
Multiplication of two 2×2 block matrices:

    [A1,1 A1,2]   [B1,1 B1,2]     [C1,1 C1,2]
    [A2,1 A2,2] . [B2,1 B2,2]  →  [C2,1 C2,2]

The computation of the output matrix C can be decomposed in more than one way (Decomposition I and Decomposition II).
Database Transactions          Itemset     Frequency
A, B, C, E, G, H               A, B, C     1
B, D, E, F, K, L               D, E        3
A, B, F, H, L                  C, F, G     0
D, E, F, H                     A, E        2
F, G, H, K                     C, D        1
A, E, F, K, L                  D, K        2
B, C, D, G, H, L               B, C, F     0
G, H, L                        C, D, K     0
D, E, F, K, L
F, G, H, L
Partitioning the frequency table (the output) between two tasks, each scanning all ten database transactions:

task 1: Itemset   Frequency       task 2: Itemset   Frequency
        A, B, C   1                       C, D      1
        D, E      3                       D, K      2
        C, F, G   0                       B, C, F   0
        A, E      2                       C, D, K   0
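This output decomposition can be sketched in code, using the transactions and itemsets of this example:

```python
# Output data decomposition for itemset counting: the output (the
# itemset frequency table) is partitioned, and each task counts only
# its assigned itemsets, scanning the entire transaction database.

def count_itemsets(transactions, itemsets):
    """One task: full frequency of each assigned itemset."""
    return {i: sum(1 for t in transactions if set(i) <= set(t))
            for i in itemsets}

db = [("A","B","C","E","G","H"), ("B","D","E","F","K","L"),
      ("A","B","F","H","L"), ("D","E","F","H"), ("F","G","H","K"),
      ("A","E","F","K","L"), ("B","C","D","G","H","L"), ("G","H","L"),
      ("D","E","F","K","L"), ("F","G","H","L")]

task1 = count_itemsets(db, [("A","B","C"), ("D","E"), ("C","F","G"), ("A","E")])
task2 = count_itemsets(db, [("C","D"), ("D","K"), ("B","C","F"), ("C","D","K")])
print(task1[("D","E")], task2[("D","K")])  # 3 2
```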
Input Data Partitioning: Example
In the database counting example, the input (i.e., the transaction set) can be
partitioned. This induces a task decomposition in which each task generates
partial counts for all itemsets. These are combined subsequently for aggregate
counts.
task 1 transactions             task 2 transactions
A, B, C, E, G, H                A, E, F, K, L
B, D, E, F, K, L                B, C, D, G, H, L
A, B, F, H, L                   G, H, L
D, E, F, H                      D, E, F, K, L
F, G, H, K                      F, G, H, L

task 1 partial counts: A,B,C: 1; D,E: 2; C,F,G: 0; A,E: 1; C,D: 0; D,K: 1; B,C,F: 0; C,D,K: 0
task 2 partial counts: A,B,C: 0; D,E: 1; C,F,G: 0; A,E: 1; C,D: 1; D,K: 1; B,C,F: 0; C,D,K: 0
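The input partitioning and the final aggregation step can be sketched as follows, reusing the same data:

```python
# Input data partitioning for itemset counting: the transaction database
# is split between tasks, each task computes partial counts for ALL
# itemsets, and the partial counts are summed to obtain the aggregates.

def partial_counts(transactions, itemsets):
    """One task: partial frequency of every itemset over its transactions."""
    return {i: sum(1 for t in transactions if set(i) <= set(t))
            for i in itemsets}

itemsets = [("A","B","C"), ("D","E"), ("C","F","G"), ("A","E"),
            ("C","D"), ("D","K"), ("B","C","F"), ("C","D","K")]
db = [("A","B","C","E","G","H"), ("B","D","E","F","K","L"),
      ("A","B","F","H","L"), ("D","E","F","H"), ("F","G","H","K"),
      ("A","E","F","K","L"), ("B","C","D","G","H","L"), ("G","H","L"),
      ("D","E","F","K","L"), ("F","G","H","L")]

c1 = partial_counts(db[:5], itemsets)  # task 1: first five transactions
c2 = partial_counts(db[5:], itemsets)  # task 2: last five transactions
total = {i: c1[i] + c2[i] for i in itemsets}  # aggregation step
print(total[("D","E")], total[("C","D")])  # 3 1
```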
Partitioning Input and Output Data
Often input and output data decomposition can be combined for a higher
degree of concurrency. For the itemset counting example, the transaction set
(input) and itemset counts (output) can both be decomposed as follows:
task 1: transactions 1-5, itemsets A,B,C: 1; D,E: 2; C,F,G: 0; A,E: 1
task 2: transactions 1-5, itemsets C,D: 0; D,K: 1; B,C,F: 0; C,D,K: 0
task 3: transactions 6-10, itemsets A,B,C: 0; D,E: 1; C,F,G: 0; A,E: 1
task 4: transactions 6-10, itemsets C,D: 1; D,K: 1; B,C,F: 0; C,D,K: 0

(Transactions 1-5: A,B,C,E,G,H; B,D,E,F,K,L; A,B,F,H,L; D,E,F,H; F,G,H,K.
Transactions 6-10: A,E,F,K,L; B,C,D,G,H,L; G,H,L; D,E,F,K,L; F,G,H,L.)
Intermediate Data Partitioning
Let us revisit the example of dense matrix multiplication. We first show how we
can visualize this computation in terms of intermediate matrices D.
[Figure: 2×2 block matrix multiplication via intermediate submatrices D(k,i,j): stage one computes each D(k,i,j) = A(i,k)·B(k,j) from the blocks of A and B; stage two sums them into the output blocks C1,1, C1,2, C2,1, C2,2.]
Intermediate Data Partitioning: Example
Stage I:

    [A1,1 A1,2]   [B1,1 B1,2]     [D1,1,1 D1,1,2]   [D2,1,1 D2,1,2]
    [A2,1 A2,2] . [B2,1 B2,2]  →  [D1,2,1 D1,2,2] , [D2,2,1 D2,2,2]

Stage II:

    [D1,1,1 D1,1,2]   [D2,1,1 D2,1,2]     [C1,1 C1,2]
    [D1,2,1 D1,2,2] + [D2,2,1 D2,2,2]  →  [C2,1 C2,2]
Task 01: D1,1,1 = A1,1 B1,1 Task 02: D2,1,1 = A1,2 B2,1
Task 03: D1,1,2 = A1,1 B1,2 Task 04: D2,1,2 = A1,2 B2,2
Task 05: D1,2,1 = A2,1 B1,1 Task 06: D2,2,1 = A2,2 B2,1
Task 07: D1,2,2 = A2,1 B1,2 Task 08: D2,2,2 = A2,2 B2,2
Task 09: C1,1 = D1,1,1 + D2,1,1 Task 10: C1,2 = D1,1,2 + D2,1,2
Task 11: C2,1 = D1,2,1 + D2,2,1 Task 12: C2,2 = D1,2,2 + D2,2,2
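The twelve tasks above can be sketched with scalar blocks (each block here is a single number rather than a submatrix, purely for illustration):

```python
# Intermediate data partitioning for 2x2 block matrix multiplication:
# stage I computes the eight products D[k][i][j] = A[i][k] * B[k][j]
# (Tasks 1-8); stage II sums them, C[i][j] = D[0][i][j] + D[1][i][j]
# (Tasks 9-12). All eight stage-I tasks are independent.

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]

# Stage I: eight independent multiplication tasks.
D = [[[A[i][k] * B[k][j] for j in range(2)] for i in range(2)]
     for k in range(2)]

# Stage II: four independent addition tasks.
C = [[D[0][i][j] + D[1][i][j] for j in range(2)] for i in range(2)]
print(C)  # [[19, 22], [43, 50]]
```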
Intermediate Data Partitioning: Example
[Figure: task dependency graph for the twelve tasks; each addition task (Tasks 9-12) depends on two of the multiplication tasks (Tasks 1-8).]
The Owner Computes Rule
• The process assigned a particular data item is responsible for all the computation associated with it.
• In input-data decomposition, each task performs all the computations that use its input data.
• In output-data decomposition, each task computes all the output data assigned to it.
Exploratory Decomposition: Example
[Figure: the 15-puzzle. From the initial configuration of tiles 1-15, each legal move of the blank produces a successor state; the successor states root independent subtrees of the search space (task 1 through task 4) that can be explored concurrently until a solution configuration is found.]
Exploratory Decomposition: Anomalous Computations
[Figure: two search spaces, (a) and (b), each divided into four regions of m nodes, with the position of the solution marked. (a) Total serial work: 2m+1; total parallel work: 1. (b) Total serial work: m; total parallel work: 4m.]
Speculative Decomposition
[Figure: a network of system components A through I connecting the System Inputs to the System Output; downstream components can be simulated speculatively before their inputs are fully known.]
Hybrid Decompositions
[Figure: hybrid decomposition for finding the minimum of the list 3 7 2 9 11 4 5 8 7 10 6 13 1 19 3 9: data decomposition first splits the list into four sublists, then recursive decomposition combines the four partial minima (e.g. 2 and 1) into the final answer 1.]
Characteristics of Tasks
• Task generation.
• Task sizes.
• Task sizes may be uniform (i.e., all tasks are the same size) or
non-uniform.
Characteristics of Task Interactions: Example
[Figure: matrix-vector multiplication A·b with rows 0-11 assigned to tasks Task 0 through Task 11: (a) the matrix and vector, (b) the associated task interaction graph.]
Characteristics of Task Interactions
(a) cyclic assignment            (b) block assignment
P1: 1, 5, 9                      P1: 1, 2, 3
P2: 2, 6, 10                     P2: 4, 5, 6
P3: 3, 7, 11                     P3: 7, 8, 9
P4: 4, 8, 12                     P4: 10, 11, 12
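The two assignments can be sketched as simple index computations (twelve tasks and four processes, matching this example):

```python
# Mapping 12 tasks onto 4 processes.
# Cyclic: task t goes to process (t - 1) mod p (tasks numbered from 1).
# Block: each process gets one contiguous chunk of tasks
# (this sketch assumes the task count is divisible by p).

def cyclic(tasks, p):
    return {q: [t for t in tasks if (t - 1) % p == q] for q in range(p)}

def block(tasks, p):
    size = len(tasks) // p
    return {q: tasks[q * size:(q + 1) * size] for q in range(p)}

tasks = list(range(1, 13))
print(cyclic(tasks, 4)[0])  # [1, 5, 9]
print(block(tasks, 4)[0])   # [1, 2, 3]
```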
Mapping Techniques for Minimum Idling
• Hybrid mappings.
Mappings Based on Data Partitioning
[Figure: one-dimensional block distributions of an array: (left) row-wise blocks assigned to processes P0-P15; (right) column-wise blocks assigned to processes P0-P7.]
Block Array Distribution Schemes
[Figure: two-dimensional block distributions of an array onto 16 processes: (a) a 4×4 arrangement of blocks (P0-P15); (b) a 2×8 arrangement of blocks (P0-P15).]
Block Array Distribution Schemes: Examples
[Figure: matrix multiplication A × B = C; panels (a) and (b) show block partitionings of the output matrix among processes P0-P11.]
Cyclic and Block Cyclic Distributions
    [A1,1 A1,2 A1,3]     [L1,1  0    0  ]   [U1,1 U1,2 U1,3]
    [A2,1 A2,2 A2,3]  →  [L2,1 L2,2  0  ] . [ 0   U2,2 U2,3]
    [A3,1 A3,2 A3,3]     [L3,1 L3,2 L3,3]   [ 0    0   U3,3]
[Figure: LU factorization in progress: columns k and j mark the boundary between the inactive (already factored) part of the matrix and the active part. Tasks T1-T14 are mapped onto processes P0-P8.]
Block-Cyclic Distribution
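A one-dimensional block-cyclic distribution can be sketched as an index-to-owner mapping (the block size and process count below are arbitrary illustrative values):

```python
# 1-D block-cyclic distribution: the index range is split into blocks of
# a fixed size, and blocks are dealt out to processes round-robin. This
# balances load even when the active part of the computation shrinks
# over the index range (as in LU factorization).

def block_cyclic_owner(i, block_size, p):
    """Process that owns index i under a 1-D block-cyclic distribution."""
    return (i // block_size) % p

owners = [block_cyclic_owner(i, block_size=2, p=3) for i in range(12)]
print(owners)  # [0, 0, 1, 1, 2, 2, 0, 0, 1, 1, 2, 2]
```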
Random Partitioning
Task Partitioning: Mapping a Sparse Graph
[Figure: mapping a sparse graph of 12 vertices (0-11) onto three processes, with Ci the set of non-local vertices process i must communicate. (a) Random partitioning: C0 = (4,5,6,7,8), C1 = (0,1,2,3,8,9,10,11), C2 = (0,4,5,6). (b) Partitioning that follows the graph structure: C0 = (1,2,6,9), C1 = (0,5,6), C2 = (1,2,4,5,7,8).]
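The communication sets induced by a partition can be computed from the graph's edges. A minimal sketch on a small hypothetical graph (made-up data, not the 12-vertex graph of the figure):

```python
# Given a partition of a sparse graph, each process must receive the
# values of neighbouring vertices owned by other processes. A partition
# with a smaller boundary therefore means less communication.

def comm_sets(edges, owner):
    """owner: {vertex: process}. Returns {process: set of remote
    vertices it must receive}."""
    need = {}
    for u, v in edges:
        for a, b in ((u, v), (v, u)):
            if owner[a] != owner[b]:
                # Process owning a needs the remote vertex b.
                need.setdefault(owner[a], set()).add(b)
    return need

# Hypothetical small graph and a 2-way partition:
edges = [(0, 1), (1, 2), (2, 3), (1, 4), (4, 5)]
owner = {0: 0, 1: 0, 2: 1, 3: 1, 4: 1, 5: 1}
print(sorted(comm_sets(edges, owner)[0]))  # [2, 4]
```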
Hierarchical Mappings
• For this reason, task mapping can be used at the top level and
data partitioning within each level.
Hierarchical Mapping: Example
[Figure: hierarchical mapping of a task tree onto eight processes: top-level tasks are each assigned a group of processes (P0 P1, P2 P3, P4 P5, P6 P7), with data partitioning within each group; lower levels use all of P0-P7.]
Schemes for Dynamic Mapping
• When a process runs out of work, it requests more work from the master.
• There are four critical questions: how are sending and receiving processes paired together, who initiates the work transfer, how much work is transferred, and when is a transfer triggered?
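A centralized dynamic mapping can be sketched with a shared work queue (a minimal master-worker illustration; squaring each chunk is a stand-in for real processing):

```python
# Centralized dynamic mapping: idle workers pull chunks of work from a
# shared queue maintained by the master, so faster workers naturally
# process more chunks and idling is minimized.

import queue
import threading

work = queue.Queue()
for chunk in range(8):          # master enqueues 8 work chunks
    work.put(chunk)

results = []
lock = threading.Lock()

def worker():
    while True:
        try:
            chunk = work.get_nowait()  # idle worker requests more work
        except queue.Empty:
            return                     # no work left: worker terminates
        with lock:
            results.append(chunk * chunk)  # "process" the chunk

threads = [threading.Thread(target=worker) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))  # [0, 1, 4, 9, 16, 25, 36, 49]
```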