Parallel and Distributed Computing (CS-3216)

Slides 04 (Principles of Parallel Algorithm Design)

Department of CS, GCU, Lahore

Agenda

1 Steps in Parallel Algorithm Design

2 Preliminaries: Decomposition, Tasks, and Dependency Graphs

3 Decomposition Techniques

4 Mapping Schemes

5 Reference Books

Steps in Parallel Algorithm Design

1. Identification

Identifying portions of the work that can be performed concurrently.
These work-units are also known as tasks.
E.g., initializing two very large arrays constitutes two tasks that can be
performed in parallel (see the sketch below).
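
As a minimal sketch of two such tasks (assuming C with OpenMP; the array
names and sizes are ours, not from the slides):

    #include <omp.h>

    #define N 1000000
    static double a[N], b[N];

    /* Two independent work-units: each section initializes one array. */
    void init_arrays(void) {
        #pragma omp parallel sections
        {
            #pragma omp section
            for (int i = 0; i < N; i++) a[i] = 0.0;   /* task 1 */

            #pragma omp section
            for (int i = 0; i < N; i++) b[i] = 0.0;   /* task 2 */
        }
    }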


2. Mapping

The process of mapping concurrent pieces of the work, or tasks, onto
multiple processes running in parallel.
Multiple processes can be physically mapped onto a single processor.


3. Data Partitioning

Distributing the input, output, and intermediate data associated with the
program.
One way is to copy the whole data set to each processing node, which raises
memory challenges for huge problems.
The other way is to give a fragment of the data to each processing node,
which introduces communication overheads.


4. Defining Access Protocol

Managing accesses to data shared by multiple processors (i.e., managing
communication).


5. Synchronizing

Synchronizing the processes at various stages of the parallel program's
execution.

Preliminaries: Decomposition, Tasks, and Dependency Graphs

Decomposition

The process of dividing a computation into smaller parts, some or all of
which may potentially be executed in parallel.
A given problem may be decomposed into tasks in many different ways.


Tasks

Tasks are programmer-defined units of computation into which the main
computation is subdivided by means of decomposition.
Tasks can be of arbitrary size, but once defined, they are regarded as
indivisible units of computation.
The tasks into which a problem is decomposed may not all be of the
same size.
Concurrent execution of multiple tasks is the key to reducing the time
required to solve the entire problem.


Example: Multiplying a Dense Matrix with a Vector

Figure: (3.1) Decomposition of dense matrix-vector multiplication y = A · b
into n tasks, where n is the number of rows of the matrix. The portions of
the matrix and of the input and output vectors accessed by Task 1 are
highlighted.
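
A hedged sketch of this decomposition in C with OpenMP (the function name
and row-major storage are our assumptions): each loop iteration is one task
that computes a single entry y[i].

    /* Task i computes y[i] = sum over j of A[i][j] * b[j]. */
    void matvec(int n, const double *A, const double *b, double *y) {
        #pragma omp parallel for            /* one task per row */
        for (int i = 0; i < n; i++) {
            double sum = 0.0;
            for (int j = 0; j < n; j++)
                sum += A[i * n + j] * b[j]; /* row-major A */
            y[i] = sum;
        }
    }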


Example (2): Multiplying a Dense Matrix with a Vector

Figure: (3.4) Decomposition of dense matrix-vector multiplication into four
tasks, each computing a block of n/4 consecutive entries of the output
vector. The portions of the matrix and of the input and output vectors
accessed by Task 1 are highlighted.


Task-Dependency Graph

The tasks in the previous examples are independent and can be performed
in any sequence.
In most problems, however, there exist dependencies between the tasks.
An abstraction used to express such dependencies among tasks and their
relative order of execution is known as a task-dependency graph.
It is a directed acyclic graph in which the nodes are tasks and the
directed edges indicate the dependencies between them.
The task corresponding to a node can be executed only when all of its
predecessor [parent] tasks have completed their execution.


Task-Dependency Graph

Table: A database storing information about used vehicles.

ID#  Model   Year  Color  Dealer  Price
4523 Civic   2002  Blue   MN      $18,000
3476 Corolla 1999  White  IL      $15,000
7623 Camry   2001  Green  NY      $21,000
9834 Prius   2001  Green  CA      $18,000
6734 Civic   2001  White  OR      $17,000
5342 Altima  2001  Green  FL      $19,000
3845 Maxima  2001  Blue   NY      $22,000
8354 Accord  2000  Green  VT      $18,000
4395 Civic   2001  Red    CA      $17,000
7352 Civic   2002  Red    WA      $18,000


Example: Task Dependency Graph


[The figure shows the intermediate tables produced by each task: selections
for Model = Civic, Year = 2001, Color = White, and Color = Green; the
combinations Civic AND 2001 and White OR Green; and the final result of
Civic AND 2001 AND (White OR Green), the single record 6734, Civic, 2001,
White.]

Figure: (3.2) Decomposing the given query into a number of tasks.



Example (2): Task Dependency Graph


[The figure shows the alternate task structure: the four selection tasks
feed White OR Green, then 2001 AND (White OR Green), and finally
Civic AND 2001 AND (White OR Green), again yielding the single record
6734, Civic, 2001, White.]

Figure: (3.3) An alternate decomposition of the given query into a number
of tasks.

Granularity

The number and sizes of the tasks into which a problem is decomposed
determine the granularity of the decomposition.
A decomposition into a large number of small tasks is called fine-grained.
A decomposition into a small number of large tasks is called coarse-grained.
For matrix-vector multiplication, Figure 3.1 would usually be considered
fine-grained.
Figure 3.4 shows a coarse-grained decomposition, as each task computes
n/4 entries of the output vector of length n.


Maximum degree of concurrency

The maximum number of tasks that can be executed concurrently in a
parallel program at any given time is known as its maximum degree of
concurrency.
It is usually less than the total number of tasks, due to dependencies
between the tasks.
E.g., the maximum degree of concurrency in the task graphs of Figures 3.2
and 3.3 is 4.


Maximum degree of concurrency

Figure: Determine the maximum degree of concurrency of the given task graph.


Average degree of concurrency

A relatively better measure of the performance of a parallel program.
The average number of tasks that can run concurrently over the entire
duration of the program's execution.
Equal to the ratio of the total amount of work to the critical-path length.
So, what is the critical path in the graph?


Average degree of concurrency

Critical Path: the longest directed path between any pair of start and
finish nodes is known as the critical path.
Critical Path Length: the sum of the weights of the nodes along this path,
where the weight of a node is the size of, or the amount of work associated
with, the corresponding task.
A shorter critical path favors a higher average degree of concurrency.
Both the maximum and the average degree of concurrency increase as tasks
become smaller (finer). A sketch for computing the average follows.
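
A hedged sketch of the computation in C (the predecessor-array
representation is our assumption; tasks must be supplied in topological
order). For the weighted task graphs on the next slide it yields
63/27 ≈ 2.33 and 64/34 ≈ 1.88.

    #include <stdlib.h>

    /* Average degree of concurrency = total work / critical-path length.
       w[i] is the work of task i; the predecessors of task i are
       pred[pred_off[i]] .. pred[pred_off[i+1]-1]. */
    double avg_concurrency(int n, const double *w,
                           const int *pred, const int *pred_off) {
        double *finish = malloc(n * sizeof *finish);
        double total = 0.0, cpl = 0.0;
        for (int i = 0; i < n; i++) {
            double start = 0.0;                 /* earliest start of task i */
            for (int k = pred_off[i]; k < pred_off[i + 1]; k++)
                if (finish[pred[k]] > start)
                    start = finish[pred[k]];
            finish[i] = start + w[i];           /* earliest finish of task i */
            if (finish[i] > cpl) cpl = finish[i];  /* critical-path length */
            total += w[i];
        }
        free(finish);
        return total / cpl;
    }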


Average degree of concurrency

(a) Tasks 1-4 each have weight 10; Task 5 has weight 6, Task 6 weight 9,
and Task 7 weight 8.
(b) Tasks 1-4 each have weight 10; Task 5 has weight 6, Task 6 weight 11,
and Task 7 weight 7.

Figure: (3.5) Abstractions of the task graphs of Figures 3.2 and 3.3,
respectively.

Critical path lengths: 27 and 34
Total amounts of work: 63 and 64
Average degrees of concurrency: 63/27 ≈ 2.33 and 64/34 ≈ 1.88

Average degree of concurrency

Figure: Determine the critical path length and the average degree of
concurrency, assuming the weight of each node is one.


Task-Interaction Graph

Depicts the pattern of interaction between the tasks.
Dependency graphs only show how the output of one task becomes an input
to a task at the next level.
How the tasks interact with each other to access distributed data is
depicted only by the task-interaction graph.
The nodes in a task-interaction graph represent tasks.
The edges connect tasks that interact with each other.


Task-Interaction Graph

The edges in a task-interaction graph are usually undirected (but directed
edges can be used to indicate the direction of the flow of data, if it is
unidirectional).
The edge set of a task-interaction graph is usually a superset of the
edge set of the task-dependency graph.
In the database query processing example, the task-interaction graph is
the same as the task-dependency graph.


Task-Interaction Graph

Figure: (3.6) A decomposition for sparse matrix-vector multiplication and
the corresponding task-interaction graph. In the decomposition, Task i
computes $y[i] = \sum_{j=0}^{11} A[i,j] \cdot b[j]$, restricted to the
entries with $A[i,j] \neq 0$.
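
A hedged sketch of this decomposition in C, assuming the sparse matrix is
stored in the common CSR (compressed sparse row) format; the slides do not
prescribe a storage format.

    /* Task i computes y[i] over the nonzeros of row i only. */
    void spmv_csr(int n, const int *row_ptr, const int *col_idx,
                  const double *val, const double *b, double *y) {
        #pragma omp parallel for
        for (int i = 0; i < n; i++) {
            double sum = 0.0;
            for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++)
                sum += val[k] * b[col_idx[k]];  /* only where A[i,j] != 0 */
            y[i] = sum;
        }
    }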


Processes and Mapping

A logical computing agent that performs tasks is called a process.
The act of assigning tasks to logical computing agents (i.e., processes)
is called mapping.


Processes and Mapping

In both mappings, Tasks 1-4 are assigned to processes P0-P3. In (a),
Task 5 is mapped to P0, Task 6 to P2, and Task 7 to P0; in (b), Tasks 5,
6, and 7 are all mapped to P0.

Figure: (3.7) Mappings of the task graphs of Figure 3.5 onto four processes.


Processes and Processors

Processes are logical computing agents that perform tasks.
Processors are the hardware units that physically perform the computations.
Depending on the problem, multiple processes can be mapped onto a single
processor.
In most cases, however, there is a one-to-one correspondence between
processors and processes.
So, we assume that there are as many processes as there are physical CPUs
on the parallel computer.

Decomposition Techniques


The process of decomposing a larger problem into smaller tasks for
concurrent execution is known as decomposition.
The techniques that facilitate this decomposition are known as
decomposition techniques.
Common techniques:
Recursive decomposition
Data decomposition
Exploratory decomposition
Speculative decomposition
Hybrid decomposition
Recursive and data decomposition are [relatively] general purpose.
Exploratory and speculative decomposition are special purpose in nature.


Recursive Task Decomposition

Recursive decomposition is a method for inducing concurrency in problems
that can be solved using the divide-and-conquer strategy.
It divides the problem into a set of independent subproblems.
Each of these subproblems is solved by recursively applying a similar
division into smaller subproblems, followed by a combination of their
results.
Natural concurrency exists, as the different subproblems can be solved
concurrently.


Recursive Task Decomposition


Example

Figure: (3.8) The quicksort task-dependency graph based on recursive
decomposition for sorting a sequence of 12 numbers. Each level of the
recursion partitions a subsequence around a pivot, and the resulting parts
form independent tasks.


Recursive Task Decomposition


Modifying a simple problem to support recursive decomposition

min(1,2)

min(4,1) min(8,2)

min(4,9) min(1,7) min(8,11) min(2,12)

Figure: (3.9) The task-dependency graph for finding the minimum number in the
sequence 4, 9, 1, 7, 8, 11, 2, 12. Each node in the tree represents the task of
finding the minimum of a pair of numbers.


Recursive Task Decomposition


Modifying a simple problem to support recursive decomposition

procedure SERIAL_MIN(A, n)
begin
    min := A[0];
    for i := 1 to n - 1 do
        if (A[i] < min) min := A[i];
    endfor;
    return min;
end SERIAL_MIN


Recursive Task Decomposition


Modifying a simple problem to support recursive decomposition

We can rewrite the loop as follows:

procedure RECURSIVE_MIN(A, n)
begin
    if (n = 1) then
        min := A[0];
    else
        lmin := RECURSIVE_MIN(A, n/2);
        rmin := RECURSIVE_MIN(&(A[n/2]), n - n/2);
        if (lmin < rmin) then
            min := lmin;
        else
            min := rmin;
        endelse;
    endelse;
    return min;
end RECURSIVE_MIN
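
A hedged parallel rendering of RECURSIVE_MIN in C with OpenMP tasks (the
serial cutoff of 1024 elements is our choice, added to limit task-creation
overhead):

    /* The two independent subproblems of RECURSIVE_MIN run concurrently. */
    double recursive_min(const double *A, int n) {
        if (n <= 1024) {                        /* small case: serial scan */
            double min = A[0];
            for (int i = 1; i < n; i++)
                if (A[i] < min) min = A[i];
            return min;
        }
        double lmin, rmin;
        #pragma omp task shared(lmin)           /* left half as a new task */
        lmin = recursive_min(A, n / 2);
        rmin = recursive_min(A + n / 2, n - n / 2);  /* right half here */
        #pragma omp taskwait                    /* combine after both finish */
        return (lmin < rmin) ? lmin : rmin;
    }

It would be called from within a parallel region, e.g. under
#pragma omp parallel followed by #pragma omp single.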


Data Decomposition

A powerful and commonly used method.
Two-step procedure:
1 Partition the data on which the computation is to be performed.
2 Use this data partitioning to induce a partitioning of the computations
into tasks.


Data Decomposition
Partitioning output data

Used where each element of the output can be computed independently of
the others, as a function of the input.
Partitioning the output data automatically induces a decomposition of the
problem into tasks:
each task is assigned the work of computing a portion of the output.


Data Decomposition
Partitioning output data

Consider the problem of multiplying two n × n matrices A and B to yield
matrix C. The output matrix C can be partitioned into four blocks, giving
four tasks:

     
$\begin{pmatrix} A_{1,1} & A_{1,2} \\ A_{2,1} & A_{2,2} \end{pmatrix} \cdot \begin{pmatrix} B_{1,1} & B_{1,2} \\ B_{2,1} & B_{2,2} \end{pmatrix} \rightarrow \begin{pmatrix} C_{1,1} & C_{1,2} \\ C_{2,1} & C_{2,2} \end{pmatrix}$

Task 1: C1,1 = A1,1 B1,1 + A1,2 B2,1
Task 2: C1,2 = A1,1 B1,2 + A1,2 B2,2
Task 3: C2,1 = A2,1 B1,1 + A2,2 B2,1
Task 4: C2,2 = A2,1 B1,2 + A2,2 B2,2
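
A hedged sketch of one such task in C (row-major n × n matrices with n
even; the function name is ours). Each call computes one block of C; the
four calls for blocks (0,0), (0,1), (1,0), and (1,1) write disjoint parts
of C and can therefore run concurrently, e.g. as OpenMP sections.

    /* Computes block (bi, bj) of C = A * B, i.e. C_{bi+1,bj+1} above. */
    void block_task(int n, const double *A, const double *B, double *C,
                    int bi, int bj) {
        int h = n / 2;                          /* block edge length */
        for (int i = bi * h; i < (bi + 1) * h; i++)
            for (int j = bj * h; j < (bj + 1) * h; j++) {
                double sum = 0.0;
                for (int k = 0; k < n; k++)     /* A_{bi,1}B_{1,bj} + A_{bi,2}B_{2,bj} */
                    sum += A[i * n + k] * B[k * n + j];
                C[i * n + j] = sum;
            }
    }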


Data Decomposition
Partitioning output data

A partitioning of the output data does not result in a unique decomposition
into tasks. For example, for the same problem as on the previous slide,
with an identical output-data distribution, we can derive the following
two (other) decompositions:

Decomposition I:
Task 1: C1,1 = A1,1 B1,1
Task 2: C1,1 = C1,1 + A1,2 B2,1
Task 3: C1,2 = A1,1 B1,2
Task 4: C1,2 = C1,2 + A1,2 B2,2
Task 5: C2,1 = A2,1 B1,1
Task 6: C2,1 = C2,1 + A2,2 B2,1
Task 7: C2,2 = A2,1 B1,2
Task 8: C2,2 = C2,2 + A2,2 B2,2

Decomposition II:
Task 1: C1,1 = A1,1 B1,1
Task 2: C1,1 = C1,1 + A1,2 B2,1
Task 3: C1,2 = A1,2 B2,2
Task 4: C1,2 = C1,2 + A1,1 B1,2
Task 5: C2,1 = A2,2 B2,1
Task 6: C2,1 = C2,1 + A2,1 B1,1
Task 7: C2,2 = A2,1 B1,2
Task 8: C2,2 = C2,2 + A2,2 B2,2


Data Decomposition
Partitioning output data

[Panel (a) shows the database transactions and the itemsets (both inputs)
together with the itemset frequencies (the output); panel (b) partitions
the frequencies, and hence the itemsets, between two tasks that both scan
the full set of transactions.]

Figure: (3.12) Computing itemset frequencies in a transaction database.



Data Decomposition
Partitioning input data

In many algorithms, it is impossible or undesirable to partition the
output data: the output may be a single unknown value, as when computing
a sum, a minimum, a maximum, or itemset frequencies.
It is sometimes possible to partition the input data instead, and then use
this partitioning to induce concurrency.
A task is created for each partition of the input data, and it performs as
much computation as possible using only its local data.
The local solutions are then combined to produce the global solution
(see the sketch below).
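
A hedged sketch of the pattern in C with OpenMP (counting occurrences of
a key stands in for the frequency computation; all names are ours):

    /* Each task scans only its own partition of the input and computes a
       local count; the local counts are then combined by a reduction. */
    long count_parallel(const int *data, long n, int key, int ntasks) {
        long total = 0;
        #pragma omp parallel for reduction(+:total)
        for (int t = 0; t < ntasks; t++) {
            long lo = n * t / ntasks, hi = n * (t + 1) / ntasks;
            long local = 0;                     /* local solution of task t */
            for (long i = lo; i < hi; i++)
                if (data[i] == key) local++;
            total += local;                     /* combination step */
        }
        return total;
    }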


Data Decomposition
Partitioning input data

[The figure partitions the transactions (the input) between two tasks; each
task computes the frequencies of all itemsets over its own transactions,
producing partial counts that must later be added.]

Figure: (3.13a) Input-data decomposition for computing itemset frequencies
in a transaction database.


Data Decomposition
Partitioning both input and output data

Consider problems where partitioning the output data is possible.
Partitioning the input data as well can offer additional concurrency.
The next example shows a four-way decomposition of the previous example
based on partitioning both the input and the output.


Data Decomposition
Partitioning both input and output data

[The figure partitions both the transactions and the itemset frequencies
among four tasks: each task counts one half of the itemsets over one half
of the transactions.]

Figure: (3.13b) A four-way decomposition for computing itemset frequencies,
obtained by partitioning both the input and the output data.

Data Decomposition
The Owner Computes Rule

Task decomposition based on data partitioning is widely known as the
owner-computes rule. Two types of partitioning give two readings of
the rule:
1 If partitions of the input data are assigned to tasks, the rule means
that a task performs all the computations that can be done using its
own data.
2 If partitions of the output data are assigned to tasks, the rule means
that a task computes all the data in the partition assigned to it (its
portion of the output).


Exploratory Decomposition

In many cases, the decomposition of the problem goes hand-in-hand with
its execution.
It is especially used to decompose problems whose underlying computation
is a search-space exploration.
Steps:
1 Partition the search space into smaller parts.
2 Search each of these parts concurrently, until the desired solutions
are found.


Exploratory Decomposition
Example

Figure: A simple application of exploratory decomposition is the solution
of the 15-puzzle (a tile puzzle). Panels (a) through (d) show a sequence
of three moves that transforms a given initial state (a) into the desired
final state (d).


Exploratory Decomposition
Example
[The figure expands the initial board into its four possible successor
states; each successor is explored further as an independent task (task 1
through task 4).]

Figure: The state space can be explored by generating the various successor
states of the current state and viewing them as independent tasks.

Exploratory Decomposition
Example

(a) Total serial work: 2m + 1. Total parallel work: 1.
(b) Total serial work: m. Total parallel work: 4m.

Figure: An illustration of anomalous speedups resulting from exploratory
decomposition.


Speculative Decomposition

Usually used in problems where different input values, or the output of
the previous stage, select among several computationally intensive branches.
Speculation amounts to a gamble, a risk, a preliminary guess.
Steps:
Speculate (guess) the output of the previous stage.
Start performing the computations of the next stage before the previous
stage has even completed.
Once the output of the previous stage is available, if the speculation was
correct, most of the computation for the next stage will already have
been done.


Speculative Decomposition

Switch Example Algorithm

Calculate the expression for the switch condition → task 0
Case 0: multiply vector b with matrix A → task 1
Case 1: multiply vector c with matrix A → task 2
Case 2: multiply vector d with matrix A → task 3
Display the result → task 4
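
A hedged sketch of the algorithm in C with OpenMP (compute_condition() is
an assumed placeholder for the switch expression, and matvec() is the
routine sketched earlier): tasks 1-3 run speculatively alongside task 0,
and only the product selected by the condition is kept.

    #include <stdlib.h>

    int compute_condition(void);                /* assumed: task 0's work */
    void matvec(int n, const double *A, const double *b, double *y);

    void speculative_switch(int n, const double *A, const double *b,
                            const double *c, const double *d,
                            double *result) {
        double *rb = malloc(n * sizeof *rb);
        double *rc = malloc(n * sizeof *rc);
        double *rd = malloc(n * sizeof *rd);
        int cond = 0;
        #pragma omp parallel sections
        {
            #pragma omp section
            cond = compute_condition();         /* task 0 */
            #pragma omp section
            matvec(n, A, b, rb);                /* task 1, speculative */
            #pragma omp section
            matvec(n, A, c, rc);                /* task 2, speculative */
            #pragma omp section
            matvec(n, A, d, rd);                /* task 3, speculative */
        }
        const double *keep = (cond == 0) ? rb : (cond == 1) ? rc : rd;
        for (int i = 0; i < n; i++)
            result[i] = keep[i];                /* task 4: the result */
        free(rb); free(rc); free(rd);
    }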


Speculative Decomposition
Example

Figure: A simple network for discrete event simulation: system inputs feed
a network of components (A through I) that produces the system output.


Hybrid Decomposition

Often, a mix of decomposition techniques is necessary for decomposing
a problem.
[The figure shows the input 3 7 2 9 11 4 5 8 7 10 6 13 1 19 3 9 split into
four chunks by data decomposition; the four local minima are then combined
by recursive decomposition, yielding the global minimum 1.]

Figure: Hybrid decomposition for finding the minimum of an array of size 16
using four tasks.
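
A hedged sketch of this hybrid in C with OpenMP (recursive_min() is the
task-based routine sketched earlier; the chunking arithmetic is ours):

    #include <stdlib.h>

    double recursive_min(const double *A, int n);   /* sketched earlier */

    /* Phase 1 (data decomposition): each task finds the minimum of its
       chunk. Phase 2 (recursive decomposition): combine the local minima. */
    double hybrid_min(const double *A, int n, int ntasks) {
        double *local = malloc(ntasks * sizeof *local);
        #pragma omp parallel for
        for (int t = 0; t < ntasks; t++) {
            int lo = (int)((long long)n * t / ntasks);
            int hi = (int)((long long)n * (t + 1) / ntasks);
            double m = A[lo];
            for (int i = lo + 1; i < hi; i++)
                if (A[i] < m) m = A[i];
            local[t] = m;                       /* local minimum of chunk t */
        }
        double result = recursive_min(local, ntasks);
        free(local);
        return result;
    }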

Mapping Schemes

Static Mapping Techniques

Distributing the tasks among the processes before execution of the program
begins.
Typically used in situations where the total number of tasks and their
sizes are known before execution.
Easy to implement in the message-passing paradigm.


Dynamic Mapping

Used when the total number of tasks is not known a priori, or when task
sizes are unknown.
In such cases, static mapping can lead to serious load imbalances.
Both static and dynamic mapping are equally easy to implement in the
shared-memory paradigm; a sketch follows.
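
As a minimal sketch of dynamic mapping in the shared-memory paradigm
(assuming C with OpenMP; the function names are ours), schedule(dynamic)
hands each iteration to the next idle thread instead of fixing the
assignment in advance:

    /* Tasks of unknown or varying size are claimed by threads on demand. */
    void run_tasks(int ntasks, void (*do_task)(int)) {
        #pragma omp parallel for schedule(dynamic, 1)
        for (int i = 0; i < ntasks; i++)
            do_task(i);                         /* task sizes may vary widely */
    }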


Schemes for Static Mapping


Mappings based on Data Partitioning (Array Distribution Schemes)

In a decomposition based on data partitioning, mapping the relevant data
onto the processes is equivalent to mapping tasks onto processes.
Commonly used array mapping schemes:
Block distribution (1D and 2D)
Cyclic and block-cyclic distribution


Schemes for Static Mapping


Mappings based on Data Partitioning (Array Distribution Schemes)

Block distribution (1D)


Figure: Examples of one-dimensional partitioning of an array among eight
processes (p0 through p7): row-wise distribution and column-wise
distribution.
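
A hedged sketch of the index arithmetic for a 1D block distribution (names
are ours): with n rows and p processes, each process owns one contiguous
block of about n/p rows.

    /* Process `rank` owns rows [*lo, *hi) under a 1D block distribution. */
    void block_bounds(int n, int p, int rank, int *lo, int *hi) {
        int block = (n + p - 1) / p;            /* ceil(n / p) rows per block */
        *lo = rank * block;
        if (*lo > n) *lo = n;                   /* ranks past the data get none */
        *hi = *lo + block;
        if (*hi > n) *hi = n;                   /* last block may be smaller */
    }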


Schemes for Static Mapping


Mappings based on Data Partitioning (Array Distribution Schemes)

Block distribution (2D)

Figure: Examples of two-dimensional block distributions of an array:
(a) on a 4 x 4 process grid, and (b) on a 2 x 8 process grid.


Schemes for Static Mapping


Mappings based on Data Partitioning (Array Distribution Schemes)
Figure: Data sharing needed for matrix multiplication with (a)
one-dimensional and (b) two-dimensional partitioning of the output matrix.
Shaded portions of the input matrices A and B are required by the process
that computes the shaded portion of the output matrix C.

Schemes for Static Mapping


Mappings based on Data Partitioning (Array Distribution Schemes)

Cyclic distribution (extremely fine-grained partition)


Schemes for Static Mapping


Mappings based on Data Partitioning (Array Distribution Schemes)

Block-Cyclic distribution (1D and 2D)


Figure: Examples of one- and two-dimensional block-cyclic distributions
among four processes. (a) The rows of the array are grouped into blocks of
two rows each, resulting in eight blocks of rows; these blocks are
distributed to the four processes in a wraparound fashion. (b) The matrix
is blocked into 16 blocks, each of size 4 x 4, and mapped onto a 2 x 2 grid
of processes in a wraparound fashion.
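
A hedged sketch of the ownership rule behind (a) (names are ours): rows
are grouped into blocks of bsize rows, and the blocks are dealt to p
processes in a wraparound fashion. In the 2D case (b), the same formula
applies independently to the row and column block indices.

    /* Owner of row `row` under a 1D block-cyclic distribution. */
    int block_cyclic_owner(int row, int bsize, int p) {
        int block = row / bsize;                /* the block containing the row */
        return block % p;                       /* wraparound assignment */
    }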

Schemes for Static Mapping


Mappings based on Data Partitioning (Array Distribution Schemes)

Block-Cyclic distribution (Issue)

Figure: Using the block-cyclic distribution shown in (b) to distribute the
computations performed in array (a) will lead to load imbalances.


Schemes for Static Mapping


Mappings based on Data Partitioning (Array Distribution Schemes)

Block-Cyclic distribution (Solution 1D)

Randomized block distribution (1D solution):
Generate many more blocks than processes (as in the cyclic distribution)
Assign a uniform number of randomly chosen blocks to each process

Figure: A one-dimensional randomized block mapping of 12 blocks onto four
processes (i.e., α = 3).


Schemes for Static Mapping


Mappings based on Data Partitioning (Array Distribution Schemes)

Randomized-Block distribution (solution 2D)

Figure: Using the two-dimensional random block distribution shown in (b)
to distribute the computations performed in array (a), as shown in (c).


Schemes for Static Mapping


Mappings based on Data Partitioning (Array Distribution Schemes)

Why is the randomized block-cyclic distribution not always used?

Reference Books


Introduction to Parallel Computing, Second Edition, by Ananth Grama
Parallel Programming in C with MPI and OpenMP by Michael J. Quinn
An Introduction to Parallel Programming by Peter S. Pacheco
Professional CUDA C Programming by John Cheng
