MPI

MPI (Message Passing Interface) is a standardized programming interface for parallel computing. It
allows multiple processes running on different computers to communicate and synchronize with
each other by passing messages between them.

In easy words, MPI is a tool that enables programmers to create parallel programs that run on
multiple computers or processors. It allows these programs to communicate with each other, share
data, and work together to solve problems faster than a single computer or processor could.

If asked in an examination, a good description of MPI would include the following points (a minimal example program follows this list):

• MPI is a standard interface for writing parallel programs that can run on multiple computers or processors.

• It allows multiple processes to communicate with each other by passing messages between them.

• MPI provides a set of functions and tools that enable programmers to write parallel programs and manage the communication between processes.

• MPI is used in a variety of applications, including scientific computing, data analytics, and machine learning.

• To use MPI, programmers must define the number of processes, their communication pattern, and the messages to be sent and received between them.

• MPI supports several programming languages, including C, C++, and Fortran.

• MPI implementations are available for a variety of computer architectures, including shared-memory systems, clusters, and supercomputers.
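
The points above can be illustrated with a minimal MPI program in C (file and program names here are arbitrary). Each process learns its rank and the total number of processes, then prints a greeting:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[]) {
        MPI_Init(&argc, &argv);               /* start the MPI runtime */

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* this process's id, 0..size-1 */
        MPI_Comm_size(MPI_COMM_WORLD, &size); /* total number of processes */

        printf("Hello from process %d of %d\n", rank, size);

        MPI_Finalize();                       /* shut the MPI runtime down */
        return 0;
    }

A typical way to build and run such a program is mpicc hello.c -o hello followed by mpirun -np 4 ./hello, though the exact commands depend on the MPI implementation installed.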

More About MPI


Message Passing Interface (MPI) is a standard communication protocol for parallel computing. It allows multiple processes to communicate and synchronize with each other in a distributed-memory system. MPI is widely used in scientific computing and high-performance computing applications.

Background: MPI was developed in the early 1990s by a group of researchers from academia and
industry. It was designed to be a portable, efficient, and flexible communication interface for parallel
computing systems. MPI has become the de facto standard for message passing in parallel
computing, with many implementations available on a variety of platforms.

Message Passing: In MPI, message passing refers to the exchange of data between processes. A
process can send a message to another process or receive a message from another process. The
message can be of any size and can be sent asynchronously or synchronously.

MPI provides a number of communication functions to facilitate message passing. These functions
include send, receive, broadcast, scatter, gather, and reduce. The send and receive functions are
used to transfer data between processes, while the broadcast, scatter, gather, and reduce functions
are used to distribute or combine data among a group of processes.
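
As a small sketch of the point-to-point functions (assuming the program is started with at least two processes), the following C program has process 0 send an integer to process 1:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[]) {
        int rank, value = 0;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            value = 42;
            /* blocking send of one int to process 1, message tag 0 */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            /* blocking receive of one int from process 0, tag 0 */
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("process 1 received %d\n", value);
        }

        MPI_Finalize();
        return 0;
    }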

Group and Context: MPI provides a mechanism for creating groups of processes that can communicate with each other. A group is a subset of processes that share a common communication context. A context can be thought of as a private tag space that keeps the messages exchanged within one group separate from messages exchanged in other groups. In MPI, a group together with a context forms a communicator, such as the predefined MPI_COMM_WORLD, which contains all the processes of the program.

MPI also allows multiple contexts to exist within a single program. This lets different parts of the program, for example an application and a library it calls, communicate without their messages interfering with each other, which avoids accidental message matching and can reduce contention between independent communication streams.

Communication Modes: MPI supports both blocking and non-blocking communication. A blocking call does not return until the operation is locally complete; for example, a blocking receive returns only after the message has arrived, and a blocking send returns once the send buffer can safely be reused. A non-blocking call returns immediately, and the program must later test or wait for the operation to complete while it carries on with other work.

Blocking/Non-blocking: Blocking communication is simpler to program but can lead to performance issues if the communication overhead is high. Non-blocking communication requires more complex programming but can improve performance by overlapping communication with computation.
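
As a sketch of how non-blocking calls allow this overlap (reusing the initialization shown earlier; do_local_computation is a hypothetical placeholder for useful work), a process can start a receive, compute, and only then wait for the message:

    MPI_Request req;
    int incoming;
    /* start a non-blocking receive of one int from any source, tag 0 */
    MPI_Irecv(&incoming, 1, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &req);

    do_local_computation();        /* placeholder: work overlapped with communication */

    /* block only at the point where the received value is actually needed */
    MPI_Wait(&req, MPI_STATUS_IGNORE);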

Features: MPI provides several features that make it a powerful tool for parallel computing. These include support for a wide range of data types (including user-defined derived datatypes), flexible process topologies, dynamic process creation, and, in some implementations, support for fault tolerance.

Programming / Issues: MPI programming can be challenging due to the need to manage
communication and synchronization between processes. The programmer must ensure that the
correct data is being sent and received at the correct time, and that processes are synchronized
properly to avoid race conditions.

In addition, issues such as load balancing, scalability, and fault tolerance must be carefully
considered when designing MPI applications. Load balancing ensures that work is evenly distributed
among processes, scalability ensures that the application can be run on larger systems, and fault
tolerance ensures that the application can recover from failures.

Mapping Schemes
1. Block distribution: In this scheme, the data is partitioned into blocks of equal size and
assigned to each process in a linear order. For example, if there are 8 processes and a 64-
element array, each process would receive 8 contiguous elements of the array.

2. 1D and 2D distribution: In these schemes, the data is partitioned into one-dimensional or two-dimensional arrays and assigned to processes accordingly. In the 1D scheme, the data is partitioned into rows or columns, while in the 2D scheme, the data is partitioned into rectangular subarrays. For example, in a 2D distribution scheme with 4 processes, a 16x16 array could be divided into four 8x8 subarrays and assigned to each process.

3. Cyclic and block-cyclic distribution: In the cyclic distribution scheme, the data is distributed in a round-robin fashion among processes, with each process receiving every p-th element, where p is the number of processes. For example, if there are 4 processes and a 16-element array, process 0 would receive elements 0, 4, 8, and 12; process 1 would receive elements 1, 5, 9, and 13; and so on.

In the block-cyclic distribution scheme, the data is divided into blocks of size b and cyclically distributed among processes, with each process receiving b elements at a time. For example, if there are 4 processes, a 16-element array, and a block size of 2, the data would be divided into 8 blocks of size 2, and process 0 would receive blocks 0 and 4, process 1 would receive blocks 1 and 5, and so on.

4. Randomized-block distribution: In this scheme, the data is randomly partitioned into blocks and assigned to processes. This can help reduce load imbalance when the amount of work associated with different parts of the data is uneven (an index-mapping sketch for these schemes follows this list).
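
A small C sketch of how an element index maps to an owning process under these schemes (n elements, p processes, block size b; the function names are illustrative):

    /* Owner of element i under a block distribution of n elements over p
       processes (assumes p divides n evenly, for simplicity). */
    int block_owner(int i, int n, int p)        { return i / (n / p); }

    /* Owner of element i under a cyclic distribution over p processes. */
    int cyclic_owner(int i, int p)              { return i % p; }

    /* Owner of element i under a block-cyclic distribution with block size b. */
    int block_cyclic_owner(int i, int b, int p) { return (i / b) % p; }

With the 16-element examples above (p = 4, b = 2), these formulas reproduce the assignments described in schemes 1 and 3.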

Basic Communication Operations


A one-to-all broadcast sends data of size m from a single source process to all other processes. No combining is involved; the goal is simply for every destination process to end up with its own copy of the m-word message.

An all-to-one reduction is the dual operation: it combines data from multiple source processes into a single destination process using an associative operator. Each source contributes a buffer of size m, and the destination accumulates the element-wise result in a final buffer of size m.
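
In MPI these operations correspond to MPI_Bcast and MPI_Reduce. A minimal sketch in C (assuming the usual MPI_Init / MPI_Comm_rank setup and using process 0 as the root):

    int m = 4;
    int buffer[4];
    if (rank == 0) { for (int i = 0; i < m; i++) buffer[i] = i; }
    /* one-to-all broadcast: after the call every process holds a copy of buffer */
    MPI_Bcast(buffer, m, MPI_INT, 0, MPI_COMM_WORLD);

    int partial[4] = { rank, rank, rank, rank };   /* each process's contribution */
    int total[4];                                  /* only meaningful at the root */
    /* all-to-one reduction: element-wise sums accumulated at process 0 */
    MPI_Reduce(partial, total, m, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);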

An associative operator is a binary operation that satisfies the associative property. This means that
the order in which the operation is performed does not affect the result. For example, addition and
multiplication are associative operations, but subtraction and division are not.

Interconnections refer to the topology or layout of the communication network between processes.
There are several common interconnections used in parallel computing, such as linear arrays,
meshes, balanced binary trees, and hypercubes.

A linear array is a simple interconnection where processes are arranged in a line. Each process is
connected to its neighbors, and messages are sent from one process to its adjacent neighbors.

A mesh is a two-dimensional interconnection where processes are arranged in a grid. Each process is
connected to its neighboring processes in the north, south, east, and west directions.

A balanced binary tree is a hierarchical interconnection where processes are arranged in a tree
structure. Each process has two child processes, and messages are sent up and down the tree to
communicate between processes.

A hypercube is a multi-dimensional interconnection where processes are arranged in a hypercube structure. Each process is connected to its neighboring processes in each dimension, allowing for efficient communication between any two processes in the hypercube.

Communication algorithms on these interconnections refer to the methods used to transmit data
between processes. These algorithms are based on point-to-point message transfers, where data is
sent from one process to another. The cost of these algorithms can be estimated based on the
number of point-to-point message transfers required, which depends on the interconnection
topology and the size of the data being transferred.
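
For example, under the commonly used cost model in which a single point-to-point transfer of m words takes ts + tw*m time (ts being the start-up latency and tw the per-word transfer time), a one-to-all broadcast implemented by recursive doubling on a p-node hypercube finishes in log2(p) steps, for an estimated total time of (ts + tw*m) * log2(p), whereas naive neighbor-to-neighbor forwarding on a linear array needs p - 1 steps, roughly (ts + tw*m) * (p - 1).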

All-to-All Broadcast and All-to-All Reduction are two communication operations used in parallel
computing to exchange data among all the processes in a parallel program. Here are the key points
to understand these operations:

All-to-All Broadcast

1. All-to-All Broadcast is a communication operation in which every process in a group broadcasts its own message to all the other processes in the group; in effect, p one-to-all broadcasts take place at the same time.

2. It is useful in parallel algorithms such as parallel sorting, parallel prefix computation, and distributed matrix multiplication.

3. In All-to-All Broadcast, each process contributes one message, and that same message is delivered to every other process in the group (in contrast to all-to-all personalized communication, where each pair of processes exchanges a distinct message).

4. The time complexity of All-to-All Broadcast is O(p), where p is the number of processes in the group (for example, p - 1 communication steps on a ring).

5. All-to-All Broadcast corresponds most directly to the MPI_Allgather primitive; the related MPI_Alltoall and MPI_Alltoallv primitives implement all-to-all personalized communication, where each process sends a different message to each destination.

6. MPI_Allgather takes a send buffer and a receive buffer on every process; each process contributes one block of data, and every process ends up with the concatenation of all the contributed blocks (a short sketch follows this list).

7. All-to-All Broadcast can be performed using either a blocking or non-blocking communication mode.

8. In All-to-All Broadcast, the amount of data sent and received by each process is proportional
to the number of processes in the group.
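
A minimal C sketch of all-to-all broadcast using MPI_Allgather (assuming the usual initialization; each process contributes one integer and ends up with everyone's contribution; malloc and free require <stdlib.h>):

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int my_value = rank * 10;                       /* this process's message */
    int *all_values = malloc(size * sizeof(int));   /* room for one value per process */

    /* every process sends my_value to all others and gathers all contributions */
    MPI_Allgather(&my_value, 1, MPI_INT, all_values, 1, MPI_INT, MPI_COMM_WORLD);

    /* on every process, all_values[i] now holds the value contributed by process i */
    free(all_values);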

All-to-All Reduction

1. All-to-All Reduction is the dual of All-to-All Broadcast: every process in a group contributes one message for each process in the group, and each process ends up with the reduction of all the messages addressed to it.

2. It is useful in parallel algorithms such as distributed sorting, distributed matrix multiplication, and distributed graph algorithms.

3. In All-to-All Reduction, each process has a distinct message for every other process in the group, and the reduction operation (an associative operator such as sum, max, or min) is applied element-wise to the messages received at each destination.

4. The time complexity of All-to-All Reduction is O(p), where p is the number of processes in the group (for example, p - 1 communication steps on a ring).

5. All-to-All Reduction corresponds most directly to the MPI_Reduce_scatter primitive; the related MPI_Allreduce primitive performs a reduction whose complete result is delivered to every process.

6. MPI_Reduce_scatter takes, on every process, a send buffer containing one block per destination and a receive buffer for the reduced result; each process receives the element-wise reduction of the blocks that all processes addressed to it (a short sketch follows this list).

7. All-to-All Reduction can be performed using either a blocking or non-blocking communication mode.

8. In All-to-All Reduction, the amount of data each process contributes is proportional to the number of processes in the group multiplied by the size of each message (one message per destination), while after the reduction each process holds only a single buffer of the message size.
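
A minimal C sketch of all-to-all reduction using MPI_Reduce_scatter (assuming the usual initialization; each process contributes one value per destination and receives the sum of the values addressed to it; malloc and free require <stdlib.h>):

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int *contrib = malloc(size * sizeof(int));   /* one value per destination process */
    int *counts  = malloc(size * sizeof(int));   /* how many result elements each process gets */
    for (int i = 0; i < size; i++) { contrib[i] = rank + i; counts[i] = 1; }

    int result;  /* the sum, over all processes, of the value they addressed to this rank */
    MPI_Reduce_scatter(contrib, &result, counts, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

    free(contrib);
    free(counts);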

Shared Memory Programming: Threads


1. Posix Threads:

POSIX Threads (Pthreads) is a standard programming interface for creating and manipulating threads in Unix and Unix-like operating systems. Pthreads provides a set of functions that can be used to create, manage, and synchronize threads in a program. Pthreads can improve the performance of a program by allowing multiple threads to execute simultaneously, taking advantage of the multiple processing cores available in modern computers.
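
A minimal Pthreads sketch in C: the main thread creates two worker threads, each runs a small function, and the main thread waits for both to finish (typically compiled with the -pthread flag):

    #include <pthread.h>
    #include <stdio.h>

    /* function executed by each worker thread; the argument carries its id */
    void *worker(void *arg) {
        int id = *(int *)arg;
        printf("worker %d running\n", id);
        return NULL;
    }

    int main(void) {
        pthread_t threads[2];
        int ids[2] = { 0, 1 };

        for (int i = 0; i < 2; i++)
            pthread_create(&threads[i], NULL, worker, &ids[i]);  /* start the workers */

        for (int i = 0; i < 2; i++)
            pthread_join(threads[i], NULL);                      /* wait for them to finish */

        return 0;
    }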

2. Profiling:

Profiling is the process of analysing a program's execution to determine where it spends the most
time and how resources are being used. Profiling is often used to identify performance bottlenecks
and areas where optimization can be applied. Profiling tools can provide detailed information about
a program's behaviour, such as how long each function call takes, how often each code path is
executed, and how much memory is being used.

3. Work Sharing:

Work sharing is a technique used in parallel computing to divide a task into smaller subtasks that can
be executed simultaneously by multiple threads or processes. Work sharing is often used to improve
the performance of parallel programs by distributing work across multiple processors or cores.

4. Data Parallelism:

Data parallelism is a technique used in parallel computing to divide data into smaller chunks that can
be processed simultaneously by multiple threads or processes. Data parallelism is often used for
problems that can be divided into independent units of work, such as image or signal processing.
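
A minimal C/OpenMP sketch that shows both ideas at once: the parallel for work-sharing construct splits the loop iterations among the threads of the team, and each iteration operates on an independent element of the data (data parallelism). Typically compiled with -fopenmp:

    #include <omp.h>
    #include <stdio.h>

    #define N 1000000

    int main(void) {
        static double a[N], b[N];

        /* work-sharing construct: iterations are divided among the threads,
           and each iteration updates its own element of the arrays */
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            b[i] = 2.0 * a[i] + 1.0;

        printf("done using up to %d threads\n", omp_get_max_threads());
        return 0;
    }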

5. Task Parallelism:

Task parallelism is a technique used in parallel computing to divide a program into smaller tasks that
can be executed simultaneously by multiple threads or processes. Task parallelism is often used for
problems that require different tasks to be performed in parallel, such as machine learning or
simulation.

6. OpenMP tasks for task parallelization:

OpenMP is an API for parallel programming that supports both data and task parallelism. OpenMP
tasks are a way to express task parallelism in OpenMP programs. OpenMP tasks allow the
programmer to specify independent tasks that can be executed in parallel, without requiring a
specific order of execution.
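
A minimal C sketch of OpenMP tasks, using the customary recursive Fibonacci example (cut-off thresholds and other tuning are omitted for brevity):

    #include <omp.h>
    #include <stdio.h>

    /* each recursive call becomes a task that any thread in the team may execute */
    long fib(int n) {
        if (n < 2) return n;
        long x, y;
        #pragma omp task shared(x)
        x = fib(n - 1);
        #pragma omp task shared(y)
        y = fib(n - 2);
        #pragma omp taskwait          /* wait for both child tasks to finish */
        return x + y;
    }

    int main(void) {
        long result;
        #pragma omp parallel          /* create a team of threads */
        #pragma omp single            /* one thread spawns the initial task tree */
        result = fib(20);
        printf("fib(20) = %ld\n", result);
        return 0;
    }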

7. Synchronization:

Synchronization is the process of coordinating the activities of multiple threads or processes to ensure that they do not interfere with each other. Synchronization is often used to ensure that shared resources, such as memory or files, are accessed in a consistent and predictable manner. Synchronization can be achieved through various techniques, such as locks, semaphores, and barriers.
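
A minimal Pthreads sketch of lock-based synchronization (the counter and the number of threads are illustrative): without the mutex, the concurrent increments would form a race condition.

    #include <pthread.h>
    #include <stdio.h>

    static long counter = 0;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    void *increment(void *arg) {
        (void)arg;
        for (int i = 0; i < 100000; i++) {
            pthread_mutex_lock(&lock);    /* enter the critical section */
            counter++;                    /* one thread at a time updates the shared counter */
            pthread_mutex_unlock(&lock);  /* leave the critical section */
        }
        return NULL;
    }

    int main(void) {
        pthread_t t[4];
        for (int i = 0; i < 4; i++) pthread_create(&t[i], NULL, increment, NULL);
        for (int i = 0; i < 4; i++) pthread_join(t[i], NULL);
        printf("counter = %ld\n", counter);   /* 400000 with the mutex in place */
        return 0;
    }
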
Question: What is Vector Programming in OpenGL and how is it used in
computer graphics?
Answer:

Vector programming in OpenGL refers to the use of mathematical vectors to represent geometric
objects in 3D space. Vectors can be used to represent points, lines, planes, and other geometric
primitives.

OpenGL's shading language, GLSL, provides built-in vector types (such as vec2, vec3, and vec4) and functions for operations such as addition, subtraction, dot product, cross product, and normalization; companion math libraries offer similar operations on the CPU side. These functions can be used to transform and manipulate vectors in order to achieve various effects in computer graphics.

For example, vectors can be used to define the position and orientation of objects in a 3D scene, to
calculate the direction and intensity of lighting, to compute the reflection and refraction of light, to
perform texture mapping, and to apply various shading techniques.
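
As an illustration of the kind of vector arithmetic involved (written here in plain C rather than a shading language; the small vec3 type and helper functions are illustrative), diffuse lighting intensity is commonly computed as the dot product of the unit surface normal and the unit direction toward the light:

    #include <math.h>
    #include <stdio.h>

    typedef struct { float x, y, z; } vec3;

    float dot(vec3 a, vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

    vec3 normalize(vec3 v) {                  /* scale v to unit length */
        float len = sqrtf(dot(v, v));
        vec3 r = { v.x / len, v.y / len, v.z / len };
        return r;
    }

    int main(void) {
        vec3 normal   = normalize((vec3){ 0.0f, 1.0f, 0.0f });  /* surface facing straight up */
        vec3 to_light = normalize((vec3){ 1.0f, 1.0f, 0.0f });  /* light up and to the side */

        float intensity = dot(normal, to_light);
        if (intensity < 0.0f) intensity = 0.0f;   /* surfaces facing away receive no light */

        printf("diffuse intensity = %f\n", intensity);          /* about 0.707 */
        return 0;
    }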

Vector programming is an essential component of modern computer graphics, as it allows for the
creation of realistic and complex 3D scenes. Understanding vector programming and its applications
in OpenGL is therefore an important skill for anyone working in the field of computer graphics.
