In the simplest sense, parallel programming is the simultaneous use of multiple compute resources to solve a computational problem:
• To be run using multiple CPUs.
• A problem is broken into discrete parts that can be solved concurrently.
• Each part is further broken down into a series of instructions.
• Instructions from each part execute simultaneously on different CPUs.
BIT-LEVEL PARALLELISM – By increasing the processor word size, we can reduce the number of instructions the processor must execute to perform an operation on variables whose sizes are greater than the length of the word.
INSTRUCTION-LEVEL PARALLELISM – How many of the operations in a computer program can be performed simultaneously.
DATA PARALLELISM – Focuses on distributing the data across different parallel computing nodes. The same calculation is done on the same data set or on different data sets.
TASK PARALLELISM – Entirely different calculations can be performed on either the same or different sets of data.
Amdahl's Law states that potential program speedup is defined by the fraction of code (P) that can be parallelized:
speedup = 1 / (1 - P)
• If none of the code can be parallelized, P = 0 and the speedup = 1 (no speedup).
• If all of the code is parallelized, P = 1 and the speedup is infinite (in theory).
Introducing the number of processors performing the parallel fraction of work, the relationship can be modeled by:

speedup = 1 / (P/N + S)
where P = parallel fraction, N = number of processors and S = serial fraction.
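To make the formula concrete, here is a minimal sketch in Python (the function name and sample values are our own) that evaluates Amdahl's Law for a few processor counts:

```python
def amdahl_speedup(parallel_fraction, num_procs):
    """Speedup predicted by Amdahl's Law: 1 / (P/N + S)."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (parallel_fraction / num_procs + serial_fraction)

# Even with 95% parallel code, speedup saturates near 1/S = 20.
for n in (2, 8, 64, 1024):
    print(n, round(amdahl_speedup(0.95, n), 2))
```

Note that as N grows, the P/N term vanishes and the speedup approaches 1/S: the serial fraction is the bottleneck Amdahl's Law describes.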
GUSTAFSON’S LAW Gustafson's Law (also known as Gustafson-Barsis' law) states that any sufficiently large problem can be efficiently parallelized. It is closely related to Amdahl's Law, which gives a limit on the degree to which a program can be sped up by parallelization. It was first described by John L. Gustafson:

S = P - α(P - 1)

where P is the number of processors, S is the speedup, and α the non-parallelizable part of the process.
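For comparison with the Amdahl sketch above, the same kind of hedged example (names are our own) evaluates Gustafson's formula; here the scaled speedup keeps growing with the processor count:

```python
def gustafson_speedup(alpha, num_procs):
    """Scaled speedup predicted by Gustafson's Law: S = P - alpha * (P - 1)."""
    return num_procs - alpha * (num_procs - 1)

# With a 5% non-parallelizable part, speedup grows almost linearly in P.
for p in (2, 8, 64, 1024):
    print(p, gustafson_speedup(0.05, p))
```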
A data dependency exists when there are multiple uses of the same storage location. No program can run more quickly than the longest chain of dependent calculations, since calculations that depend upon prior calculations in the chain must be executed in order. Let Pi and Pj be two program fragments. Bernstein's conditions describe when the two are independent and can be executed in parallel. For Pi, let Ii be all of its input variables and Oi its output variables, and likewise for Pj. Pi and Pj are independent if they satisfy all three conditions:

Ij ∩ Oi = ∅ (no flow dependency)
Ii ∩ Oj = ∅ (no anti-dependency)
Oi ∩ Oj = ∅ (no output dependency)
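As an illustrative sketch (the function and the example fragments are our own), the three conditions can be checked directly on the read and write sets of two fragments:

```python
def independent(reads_i, writes_i, reads_j, writes_j):
    """Bernstein's conditions: no flow, anti, or output dependency."""
    return (not (writes_i & reads_j)        # Ij ∩ Oi = ∅ (flow)
            and not (reads_i & writes_j)    # Ii ∩ Oj = ∅ (anti)
            and not (writes_i & writes_j))  # Oi ∩ Oj = ∅ (output)

# Pi: a = x + y   reads {x, y}, writes {a}
# Pj: b = x * 2   reads {x},    writes {b}
print(independent({"x", "y"}, {"a"}, {"x"}, {"b"}))   # True: parallelizable
# Pk: c = a + 1   reads {a} -- depends on Pi's output, so not independent
print(independent({"x", "y"}, {"a"}, {"a"}, {"c"}))   # False
```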
RACE CONDITION, MUTUAL EXCLUSION, SYNCHRONISATION
Example: consider the following program, in which two threads each increment the same shared variable V.

Thread A
1A: Read variable V
2A: Add 1 to variable V
3A: Write back to variable V

Thread B
1B: Read variable V
2B: Add 1 to variable V
3B: Write back to variable V

If instruction 1B executes between 1A and 3A, both threads read the same value of V and one of the two increments is lost.
RACE CONDITION A situation in which multiple processes read and write a shared data item and the final result depends on the relative timing of their execution.
MUTUAL EXCLUSION A collection of techniques for sharing resources so that different uses do not conflict and cause unwanted interactions. SYNCHRONISATION The coordination of parallel tasks in real time, very often associated with communications. Often implemented by establishing a synchronization point within an application where a task may not proceed further until another task(s) reaches the same or logically equivalent point.
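A minimal Python sketch of the ideas above (variable and function names are illustrative): the unsafe version reproduces the three-step read/add/write race from the example, the lock provides mutual exclusion, and join() acts as a synchronization point:

```python
import threading

counter = 0                  # the shared variable V
lock = threading.Lock()

def unsafe_increment(n):
    global counter
    for _ in range(n):
        v = counter          # 1: read V
        v = v + 1            # 2: add 1 to V
        counter = v          # 3: write back -- another thread may interleave here

def safe_increment(n):
    global counter
    for _ in range(n):
        with lock:           # mutual exclusion: one thread in here at a time
            counter += 1

threads = [threading.Thread(target=safe_increment, args=(100_000,))
           for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()                 # synchronization point: wait for both threads
print(counter)               # 200000 with the lock; often less with unsafe_increment
```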
PARALLEL SLOWDOWN, GRANULARITY, EMBARRASSINGLY PARALLEL

PARALLEL SLOWDOWN When a task is split into more and more threads, those threads spend an ever-increasing portion of their time communicating with each other. Eventually, the communication overhead dominates the time spent solving the problem, and further parallelization increases rather than decreases the time required to finish.

GRANULARITY In parallel computing, granularity is a qualitative measure of the ratio of computation to communication. It may be:
• Coarse
• Fine

EMBARRASSINGLY PARALLEL A problem is embarrassingly parallel if its subtasks rarely or never have to communicate.
LOAD BALANCING Load balancing refers to the practice of distributing work among tasks so that all tasks are kept busy all of the time. It can be considered a minimization of task idle time. Load balancing is important to parallel programs for performance reasons. For example, if all tasks are subject to a barrier synchronization point, the slowest task will determine the overall performance.
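One common way to approach this is dynamic scheduling, where idle workers pull the next piece of work from a shared queue, so no worker sits waiting while others finish. A sketch with made-up task sizes:

```python
import queue
import threading

work = queue.Queue()
for size in [5, 1, 9, 2, 7, 3]:         # uneven task sizes
    work.put(size)

def worker(name):
    while True:
        try:
            task = work.get_nowait()    # grab the next task as soon as idle
        except queue.Empty:
            return                      # no work left for this worker
        print(name, "processing task of size", task)

threads = [threading.Thread(target=worker, args=(f"w{i}",)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```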
SINGLE INSTRUCTION, MULTIPLE DATA STREAM (SIMD)
[Diagram: a single control unit (CU) issues one instruction stream (IS) to multiple processing units (PU1, PU2, ...), each operating on its own data.]

MULTIPLE INSTRUCTION, MULTIPLE DATA STREAM (MIMD)
[Diagram: each control unit (CU1, CU2, ...) issues its own instruction stream (IS1, IS2, ...) to its processing unit (PU), which operates on its own data stream (DS) against a memory module (MM).]
SHARED MEMORY Shared memory parallel computers vary widely, but generally have in common the ability for all processors to access all memory as a global address space. Multiple processors can operate independently but share the same memory resources. Changes in a memory location effected by one processor are visible to all other processors. Shared memory machines can be divided into two main classes based upon memory access times:
• UMA
• NUMA
UNIFORM MEMORY ACCESS (UMA)
• Most commonly represented today by Symmetric Multiprocessor (SMP) machines
• Identical processors
• Equal access and access times to memory
SHARED MEMORY (UMA)
NON-UNIFORM MEMORY ACCESS (NUMA)
• Often made by physically linking two or more SMPs
• One SMP can directly access memory of another SMP
• Not all processors have equal access time to all memories
• Memory access across the link is slower
SHARED MEMORY (NUMA)
DISTRIBUTED MEMORY Distributed memory systems require a communication network to connect inter-processor memory. Processors have their own local memory. There is no concept of a global address space across all processors, and the concept of cache coherency does not apply. When a processor needs access to data in another processor's memory, it is usually the task of the programmer to explicitly define how and when data is communicated. Synchronization between tasks is likewise the programmer's responsibility.
HYBRID DISTRIBUTED SHARED MEMORY The shared memory component is usually a cache coherent SMP machine. Processors on a given SMP can address that machine's memory as global. The distributed memory component is the networking of multiple SMPs. SMPs know only about their own memory - not the memory on another SMP. Therefore, network communications are required to move data from one SMP to another.
Parallel programming models exist as an abstraction above hardware and memory architectures. There are several parallel programming models in common use:
• Shared Memory
• Threads
• Message Passing
• Data Parallel
Although it might not seem apparent, these models are NOT specific to a particular type of machine or memory architecture. In fact, any of these models can (theoretically) be implemented on any underlying hardware.
SHARED MEMORY MODEL In the shared-memory programming model, tasks share a common address space, which they read and write asynchronously. Various mechanisms such as locks / semaphores may be used to control access to the shared memory.
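As a hedged sketch of this model (the counter is illustrative), Python's multiprocessing can place a value in memory shared by several processes and guard it with a lock:

```python
from multiprocessing import Process, Value

def deposit(balance, n):
    for _ in range(n):
        with balance.get_lock():       # lock/semaphore controls shared access
            balance.value += 1

if __name__ == "__main__":
    balance = Value("i", 0)            # one integer in a shared address space
    procs = [Process(target=deposit, args=(balance, 1000)) for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(balance.value)               # 4000: one task's writes visible to all
```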
THREADS MODEL In the threads model of parallel programming, a single process can have multiple, concurrent execution paths. The main program a.out is scheduled to run by the native operating system. a.out loads and acquires all of the necessary system and user resources to run. a.out performs some serial work, and then creates a number of tasks (threads) that can be scheduled and run by the operating system concurrently. Each thread has local data, but also shares all of the resources of a.out. Each thread also benefits from a global memory view because it shares the memory space of a.out.
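The sequence described above, serial work followed by concurrently scheduled threads that share the program's memory, can be sketched like this (the squaring task is our own invention):

```python
import threading

shared_results = [None] * 4            # lives in the program's shared memory

def task(i):
    local = i * i                      # local data, private to this thread
    shared_results[i] = local          # ...but every thread sees shared state

print("serial setup work")             # the "a.out" serial portion
threads = [threading.Thread(target=task, args=(i,)) for i in range(4)]
for t in threads:
    t.start()                          # OS schedules the threads concurrently
for t in threads:
    t.join()
print(shared_results)                  # [0, 1, 4, 9]
```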
MESSAGE PASSING MODEL A set of tasks that use their own local memory during computation. Multiple tasks can reside on the same physical machine as well across an arbitrary number of machines. Tasks exchange data through communications by sending and receiving messages. Data transfer usually requires cooperative operations to be performed by each process. For example, a send operation must have a matching receive operation.
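MPI is the usual standard for this model; to stay self-contained, the sketch below uses a pipe between two Python processes instead, but the matching send/receive structure is the same (names are illustrative):

```python
from multiprocessing import Process, Pipe

def sender(conn):
    conn.send([1, 2, 3])               # a send must have a matching receive
    conn.close()

def receiver(conn):
    data = conn.recv()                 # blocks until the matching send arrives
    print("received", data)

if __name__ == "__main__":
    a, b = Pipe()                      # each task keeps its own local memory
    p1 = Process(target=sender, args=(a,))
    p2 = Process(target=receiver, args=(b,))
    p1.start(); p2.start()
    p1.join(); p2.join()
```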
DATA PARALLEL MODEL Most of the parallel work focuses on performing operations on a data set. The data set is typically organized into a common structure, such as an array or cube. A set of tasks work collectively on the same data structure, however, each task works on a different partition of the same data structure. Tasks perform the same operation on their partition of work, for example, "add 4 to every array element".
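A sketch of the "add 4 to every array element" example (the partitioning and pool size are our own choices): the common structure is split into partitions, and every task applies the same operation to its own partition:

```python
from multiprocessing import Pool

def add_four(chunk):
    # the same operation, applied to this task's partition of the array
    return [x + 4 for x in chunk]

if __name__ == "__main__":
    data = list(range(12))
    chunks = [data[0:4], data[4:8], data[8:12]]   # partition the common structure
    with Pool(processes=3) as pool:
        parts = pool.map(add_four, chunks)
    print([x for part in parts for x in part])    # 4 .. 15
```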