…
5. Identify data dependencies
6. Synchronization
7. Load balancing
8. Performance analysis and tuning
Step 4: Communications
Factors related to Communication
Cost of communications
Latency vs. Bandwidth
Visibility of communications
Synchronous vs. asynchronous communications
Scope of communications
Efficiency of communications
Overhead and Complexity
Cost of communications
Communication between tasks incurs overhead, for example:
CPU cycles, memory, and other resources that could be used for computation are instead used to package and transmit data.
Communications frequently require some type of synchronization between tasks, which can result in tasks spending time "waiting" instead of doing useful work.
Competing communication traffic can also saturate the network bandwidth, causing further performance loss.
Latency vs. Bandwidth
Latency
The time it takes to send a minimal (0 byte) message from point A to point B. Commonly expressed as microseconds.
Bandwidth
The amount of data that can be sent per unit of time. Usually expressed as megabytes/sec or gigabytes/sec.
Sending many small messages can cause latency to dominate communication overhead. It is often more efficient to package small messages into a larger message, thus increasing the effective communications bandwidth, as the sketch below illustrates.
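A minimal sketch of message aggregation, assuming MPI; the buffer, its size, and the destination rank are hypothetical. Both functions move the same N doubles, but the first pays the per-message latency N times and the second pays it once.

    /* Minimal sketch, assuming MPI; 'items' and 'dest' are hypothetical. */
    #include <mpi.h>

    #define N 1024

    /* Naive: N separate sends, so per-message latency is paid N times. */
    void send_one_by_one(double *items, int dest)
    {
        for (int i = 0; i < N; i++)
            MPI_Send(&items[i], 1, MPI_DOUBLE, dest, 0, MPI_COMM_WORLD);
    }

    /* Aggregated: one send pays the latency once for the same data,
       raising the effective communications bandwidth. */
    void send_packed(double *items, int dest)
    {
        MPI_Send(items, N, MPI_DOUBLE, dest, 0, MPI_COMM_WORLD);
    }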
Visibility of Communications
Communication is usually explicit and highly visible when using the message passing programming model.
Communication may be poorly visible when using the data parallel programming model.
For a data parallel design on a distributed system, communication may be entirely invisible: the programmer may have no understanding of (and no easily obtainable means to accurately determine) what inter-task communication is happening.
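For instance, in the message passing model the transfer is spelled out in the source. A minimal MPI sketch, with hypothetical ranks and payload:

    /* Minimal sketch, assuming MPI: the communication is explicit and
       visible as a matched send/receive pair in the code. */
    #include <mpi.h>

    void exchange(int rank)
    {
        double value = 3.14;    /* hypothetical payload */
        if (rank == 0)
            MPI_Send(&value, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
        else if (rank == 1)
            MPI_Recv(&value, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
    }

By contrast, a data parallel construct such as a distributed-array assignment is translated into whatever communication the compiler and runtime choose, which the programmer never sees directly.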
Synchronous vs. asynchronous
Synchronous communications
Require some kind of handshaking between tasks that share data / results (either explicitly encoded or transparent to the programmer).
Synchronous communications are also referred to as blocking communications, because other work must wait until the communications have completed.
Asynchronous communications
Allow tasks to transfer data between one another independently.
Asynchronous communications are non-blocking, since other work can be done while the communications are taking place.
The ability to interleave computation with communication is the single greatest benefit of using asynchronous communications.
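A minimal sketch of the difference, assuming MPI; buffers and the peer rank are hypothetical, and the comment marks where the overlap pays off:

    /* Minimal sketch, assuming MPI and hypothetical buffers/peers. */
    #include <mpi.h>

    /* Blocking: the call does not return until buf is safe to reuse,
       so no useful work happens while the transfer is in flight. */
    void blocking_exchange(double *buf, int n, int peer)
    {
        MPI_Send(buf, n, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD);
    }

    /* Non-blocking: start the transfer, compute, then wait for it. */
    void overlapped_exchange(double *buf, int n, int peer)
    {
        MPI_Request req;
        MPI_Isend(buf, n, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &req);

        /* ... computation that does not touch buf goes here ... */

        MPI_Wait(&req, MPI_STATUS_IGNORE);  /* now buf may be reused */
    }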
Scope of communications
Knowing which tasks must communicate with each other is critical during the design stage of parallel code.
Two general types of scope:
Point-to-point (P2P)
Collective / broadcasting
Point-to-point (P2P)
Involves only two tasks: one task is the sender/producer of data, and the other acts as the receiver/consumer.
Collective
Data sharing between more than two tasks (sometimes specified as a common group or collective).
Both P2P and collective communications can be synchronous or asynchronous.
Collective communications
Typical techniques used for collective communications include broadcast (one task sends the same data to every task in the group), scatter (a distinct piece of data goes to each task), gather (one task collects a piece from every task), and reduction (one task receives a combined result, such as a sum or minimum, from all tasks).
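A minimal collective sketch, assuming MPI; the root rank and payload values are hypothetical:

    /* Minimal sketch, assuming MPI: one broadcast and one reduction
       across all tasks in MPI_COMM_WORLD. */
    #include <mpi.h>

    void collective_demo(int rank)
    {
        double param = (rank == 0) ? 2.5 : 0.0;  /* hypothetical input */
        double local = rank * 1.0, total;

        /* Broadcast: the root (rank 0) sends 'param' to every task. */
        MPI_Bcast(&param, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);

        /* Reduction: sum each task's 'local' into 'total' on rank 0. */
        MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0,
                   MPI_COMM_WORLD);
    }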
Efficiency of communications
The programmer often has a choice with regard to factors that can affect communications performance, for example:
Which implementation of a given model should be used? Using the message passing model as an example, one MPI implementation may be faster on a given hardware platform than another.
What type of communication operations should be used? As mentioned previously, asynchronous communication operations can improve overall program performance.
Network media: some platforms may offer more than one network for communications. Which one is best?
Step 5: Identify Data Dependencies
Data dependencies
A dependence exists between program statements when the order of statement execution affects the results of the program.
A data dependence results from multiple uses of the same location(s) in storage by different tasks.
Dependencies are important to parallel programming because they are one of the primary inhibitors to parallelism.
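For example, a minimal sketch with hypothetical variables, where a flow dependence forces S1 to execute before S2:

    /* Minimal sketch: S2 reads the x written by S1, so the two
       statements cannot be reordered or run in parallel. */
    void flow_dependence(void)
    {
        double a = 1.0, b = 2.0, x, y;
        x = a + b;      /* S1: writes x */
        y = x * 2.0;    /* S2: reads x  */
        (void)y;        /* silence unused-variable warning */
    }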
Common data dependencies
• Loop-carried data dependence: a dependence between statements in different iterations of a loop (see the sketch below).
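A minimal sketch of a loop-carried dependence, with a hypothetical array a of length N:

    /* Minimal sketch: iteration i reads a[i-1], written in iteration
       i-1, so the iterations cannot simply be run in parallel. */
    void loop_carried(double *a, int N)
    {
        for (int i = 1; i < N; i++)
            a[i] = a[i-1] * 2.0;
    }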
Step 7: Load Balancing
Aim to give each task an equal share of the work. Examples include:
For array/matrix operations where each task performs similar work, evenly distribute the data set among the tasks.
For loop iterations where the work done in each iteration is similar, evenly distribute the iterations across the tasks (see the sketch below).
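A minimal sketch of evenly distributing loop iterations; the task count and bounds are hypothetical:

    /* Minimal sketch: block-partition N iterations across ntasks tasks,
       spreading any remainder so task sizes differ by at most one. */
    void my_range(int N, int ntasks, int rank, int *lo, int *hi)
    {
        int base = N / ntasks, rem = N % ntasks;
        *lo = rank * base + (rank < rem ? rank : rem);
        *hi = *lo + base + (rank < rem ? 1 : 0);   /* exclusive bound */
    }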
Step 8: Performance analysis
Performance analysis
Simple approaches involve using timers.
Timing is usually quite straightforward for a serial program or with global memory.
Monitoring and analyzing parallel programs can be far more challenging than the serial case:
Instrumentation may itself add communication overhead.
The synchronization needed to avoid corrupting the measured results can amount to a significant algorithm in itself.
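A minimal timing sketch, assuming MPI; the barrier is one common way to align the tasks' start times so their measurements are comparable:

    /* Minimal sketch: wall-clock timing of a parallel region with
       MPI_Wtime; the barrier aligns tasks before the clock starts. */
    #include <mpi.h>
    #include <stdio.h>

    void timed_region(void)
    {
        MPI_Barrier(MPI_COMM_WORLD);       /* align start times */
        double t0 = MPI_Wtime();

        /* ... the computation being measured goes here ... */

        double t1 = MPI_Wtime();
        printf("elapsed: %f s\n", t1 - t0);
    }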