- Anti-dependence is denoted as S1 ⊣ S2.
Example:
S1: Add R2, R1
S2: Move R1, R3
Here S2 writes R1, which S1 reads, so S2 is anti-dependent on S1.
3. Output dependence
- Two statements are output-dependent when both write the same variable. It is denoted as S1 o→ S2.
Example:
S1: Load R1, A
S2: Move R1, R3
Here both S1 and S2 write R1, so S2 is output-dependent on S1.
4. I/O dependence
Read and Write are I/O statements.
- I/O dependence occurs not because the same variable is involved but because the same file is referenced by both I/O statements.
It is denoted as S1 I/O→ S3.
Example:
S1: Read(4), A(I)
S3: Write(4), A(I)
5. Unknown dependence
The dependence relation between two statements cannot be determined at compile time.
Example: indirect addressing.
Unknown dependence may exist in the following situations (see the sketch below):
• The subscript of a variable is itself subscripted, e.g. a[i[j]].
• The subscript does not contain the loop index variable, e.g. a[k] where k is not the loop index.
• The subscript is nonlinear in the loop index variable.
• The variable appears more than once, with subscripts having different coefficients of the loop variable, e.g. a[i] and a[j].
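A brief sketch (Python; the array contents here are hypothetical) of why indirect addressing produces unknown dependence: whether two iterations touch the same element depends on the run-time contents of the index array, which the compiler cannot see.

    # Unknown dependence via indirect addressing: any conflict between
    # iterations depends on run-time data in idx, invisible at compile time.
    a = [0] * 8
    idx = [3, 5, 3, 7]        # hypothetical run-time index values
    for i in range(len(idx)):
        a[idx[i]] += 1        # with this idx, iterations 0 and 2 both
                              # write a[3]; with idx = [3, 5, 6, 7]
                              # every iteration would be independent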
Data Dependence in Program
Consider the following code segment of 4 instructions:
S1: Load R1, A
S2: Add R2, R1
S3: Move R1, R3
S4: Store B, R1
Solution (see the sketch below):
S1 to S2: S2 is flow-dependent on S1 (S2 reads R1, which S1 writes).
S1 to S3: S3 is output-dependent on S1 (both write R1).
S1 to S4: S4 is flow-dependent on S1 (S4 reads R1, which S1 writes).
S2 to S3: S3 is anti-dependent on S2 (S3 writes R1, which S2 reads).
S2 to S2: S2 is flow-dependent on itself (Add both reads and writes R2).
S2 to S4: no dependence.
S3 to S4: S4 is flow-dependent on S3 (S4 reads R1, which S3 writes).
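As a sketch, the classification above can be derived mechanically from each instruction's read and write sets. The sets below assume a destination-first operand convention, consistent with the solution: Load R1,A writes R1; Add R2,R1 reads R1 and R2 and writes R2; Move R1,R3 reads R3 and writes R1; Store B,R1 reads R1 and writes B.

    # Classify flow / anti / output dependence from read/write sets.
    insns = {
        "S1": ({"A"}, {"R1"}),           # (reads, writes) for Load R1, A
        "S2": ({"R1", "R2"}, {"R2"}),    # Add R2, R1
        "S3": ({"R3"}, {"R1"}),          # Move R1, R3
        "S4": ({"R1"}, {"B"}),           # Store B, R1
    }

    def dependences(first, second):
        r1, w1 = insns[first]
        r2, w2 = insns[second]
        if first == second:              # against itself, only read-after-write counts
            return ["flow"] if (w1 & r1) else []
        kinds = []
        if w1 & r2: kinds.append("flow")    # second reads what first writes
        if r1 & w2: kinds.append("anti")    # second writes what first reads
        if w1 & w2: kinds.append("output")  # both write the same location
        return kinds

    for a, b in [("S1", "S2"), ("S1", "S3"), ("S1", "S4"),
                 ("S2", "S3"), ("S2", "S2"), ("S2", "S4"), ("S3", "S4")]:
        print(a, "->", b, dependences(a, b) or ["none"])

Running this reproduces the solution, including the self flow dependence of S2 and the absence of any dependence from S2 to S4.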
Data Dependence in Program
Consider the following code segment of 4 instructions; find the types of dependence and draw the dependence graph.
Solution:
I/O dependence:
S1 I/O→ S3
S1 and S3 (the Read and Write statements) are I/O-dependent on each other because they both access the same file on tape unit 4.
There is no other dependence.
2. Control Dependence
• Control dependence arises when the order of execution of statements cannot be determined before run time.
• For example, an IF condition will not be resolved until run time.
• Control dependence often prohibits parallelism from being exploited.
• Compilers are used to eliminate control dependence and exploit the parallelism.
In the first code snippet, inside the for loop we have the statement a[i] = c[i] followed by the condition. Let us assume c[i] = 5, so a[i] becomes 5. Now check the condition if a[i] < 0: if it were true we would reinitialize a[i], but in this case it is false, so the conditional statement does not execute. Each iteration's condition depends only on values computed in the same iteration, hence this is an example of control independence.
In the second code snippet, the conditional statement inside the for loop tests a[i-1] < 0. When i is 1, a[i-1] refers to a[0], a value produced by the previous iteration, so whether the condition holds in iteration i depends on iteration i-1. This is an example of control dependence. Both snippets are sketched below.
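A minimal sketch (Python; the loop bounds and the contents of c are assumed) of the two snippets described above.

    # Control-independent: the IF in iteration i depends only on values
    # produced in the SAME iteration, so iterations may run in parallel.
    n = 8
    c = [5] * n
    a = [0] * n
    for i in range(n):
        a[i] = c[i]
        if a[i] < 0:          # false here (a[i] == 5), so never executes
            a[i] = 1

    # Control-dependent: the IF in iteration i tests a[i-1], produced by
    # the PREVIOUS iteration, so the iterations cannot be reordered.
    for i in range(1, n):
        a[i] = c[i]
        if a[i - 1] < 0:
            a[i] = 0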
3. Resource Dependence
• Resource dependence is concerned with conflicts in the use of shared resources, such as integer and floating-point units (ALUs), registers, and memory areas, among parallel events.
• ALU conflicts are called ALU dependence.
• Memory (storage) conflicts are called storage dependence.
Example:
I1: A = B + C
I2: G = D + H
I1 and I2 both need the addition unit, so they are ALU-dependent (see the scheduling sketch below).
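A toy scheduling sketch (Python; the one-adder and two-adder machines are hypothetical) showing the effect of the resource conflict: with a single adder, I1 and I2 must serialize even though they share no data.

    # Issue independent add operations subject to a limited number of adders.
    def schedule(ops, n_adders):
        # Greedily pack n_adders operations into each cycle.
        return [ops[i:i + n_adders] for i in range(0, len(ops), n_adders)]

    ops = ["I1: A=B+C", "I2: G=D+H"]
    print(schedule(ops, n_adders=1))  # [['I1: A=B+C'], ['I2: G=D+H']] -> 2 cycles
    print(schedule(ops, n_adders=2))  # [['I1: A=B+C', 'I2: G=D+H']]   -> 1 cycle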
Bernstein's Conditions
• The transformation of a sequentially coded program into a parallel executable form can be
1. done manually by the programmer using explicit parallelism, or
2. done by a compiler detecting implicit parallelism automatically.
• Bernstein revealed a set of conditions that must hold if two processes are to execute in parallel.
Notation
• Process Pi is a software entity.
• Ii is the set of all input variables of process Pi.
• Oi is the set of all output variables of process Pi.
Bernstein's Conditions…
• Consider two processes P1 and P2, with inputs I1 and I2 and outputs O1 and O2, respectively.
• In the standard formulation, P1 and P2 can execute in parallel (written P1 || P2) if and only if I1 ∩ O2 = ∅, I2 ∩ O1 = ∅, and O1 ∩ O2 = ∅.
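As a sketch, the three conditions translate directly into set operations (the processes and variable names below are hypothetical):

    # Bernstein's conditions as set checks.
    def can_run_in_parallel(I1, O1, I2, O2):
        return (not (I1 & O2)       # P1 does not read what P2 writes
                and not (I2 & O1)   # P2 does not read what P1 writes
                and not (O1 & O2))  # they do not write the same variables

    # P1: A = B + C   reads {B, C}, writes {A}
    # P2: D = E * F   reads {E, F}, writes {D}
    print(can_run_in_parallel({"B", "C"}, {"A"}, {"E", "F"}, {"D"}))  # True
    # P3: A = D + 1   reads {D}, writes {A}; conflicts with P1's output
    print(can_run_in_parallel({"B", "C"}, {"A"}, {"D"}, {"A"}))       # False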
Examples:
• The Intel i960CA is a three-issue processor (arithmetic, memory access, branch).
• The IBM RS/6000 is a four-issue processor (arithmetic, floating-point, memory access, branch).
2. Software Parallelism:
Latency
• Latency is the time measure of the communication overhead incurred between machine subsystems.
• Memory latency is the time required by a processor to access memory.
• The time required for two processes to synchronize with each other is called synchronization latency.
• Computational granularity and communication latency are closely related.
Levels of Parallelism
Program Graphs and Packing…
Nodes 1, 2, 3, 4, 5, and 6 are memory-reference (data-fetch) operations. Each takes one cycle to address and six cycles to fetch from memory. All remaining nodes (7 to 17) are CPU operations, each requiring two cycles to complete. After packing, the coarse-grain nodes have larger grain sizes, ranging from 4 to 8, as shown.
The node (A, 8) in Fig. (b) is obtained by combining the nodes (1, 1), (2, 1), (3, 1), (4, 1), (5, 1), (6, 1), and (11, 2) in Fig. (a). The grain size 8 of node A is the sum of all the grain sizes being combined (1 + 1 + 1 + 1 + 1 + 1 + 2 = 8).
Program Flow Mechanism
The flow mechanism determines the order in which the instructions of a program are executed.
There are three flow mechanisms:
1. Control-flow mechanism (used by conventional computers)
2. Data-driven mechanism (used by dataflow computers)
3. Demand-driven mechanism (used by reduction computers)
1. Control-flow mechanism
Conventional von Neumann computers use a program counter (PC) to sequence the execution of the instructions in a program. This sequential execution style is called control-driven.
Conventional computers are based on a control-flow mechanism by which the order of program execution is explicitly stated in the user program.
Control flow can be made parallel by using parallel language constructs or a parallelizing compiler.
Control-flow machines give the programmer complete control but are less efficient than the other approaches.
2. Data-driven mechanism
Dataflow computers are based on a data-driven mechanism that allows the execution of any instruction to be driven by data (operand) availability.
Dataflow computers emphasize a high degree of parallelism at the fine-grain instruction level.
However, they have high control overhead, lose time waiting for unneeded arguments, and have difficulty manipulating data structures.
Dataflow features
• No need for:
  - shared memory
  - a program counter
  - a control sequencer
• Special mechanisms are required to (see the sketch below):
  - detect data availability
  - match data tokens with the instructions needing them
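A minimal sketch (Python; the node names and the expression z = (a + b) * (c - d) are hypothetical) of the data-driven firing rule: an instruction fires as soon as all of its operand tokens are present, with no program counter imposing an order.

    import operator

    # Dataflow graph for z = (a + b) * (c - d): node -> (op, inputs, output)
    graph = {
        "n1": (operator.add, ["a", "b"], "t1"),
        "n2": (operator.sub, ["c", "d"], "t2"),
        "n3": (operator.mul, ["t1", "t2"], "z"),
    }
    tokens = {"a": 2, "b": 3, "c": 7, "d": 4}   # initial operand tokens

    pending = dict(graph)
    while pending:
        for name, (fn, ins, out) in list(pending.items()):
            if all(i in tokens for i in ins):   # firing rule: all operands present
                tokens[out] = fn(*(tokens[i] for i in ins))
                del pending[name]               # n1 and n2 may fire in either order
    print(tokens["z"])                          # (2 + 3) * (7 - 4) = 15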
Interconnection Network
• When more than one processor needs to access memory structures, peripheral devices, etc., an interconnection network is required between the processing elements and the other subcomponents of the computer.
• Such an interconnection architecture can be viewed as a topology whose nodes are the various subcomponents of the computer.
• System interconnection networks are divided into two categories:
1. Static interconnection networks
2. Dynamic interconnection networks
• Static networks use direct links that are fixed once built. Such a network does not change with respect to time or requirements.
• Such networks are suitable when the communication patterns among the various subcomponents are predictable.
• There are various topologies in this category:
1. Linear array
2. Ring and chordal ring
3. Barrel shifter
4. Trees and stars
5. Fat tree
6. Mesh and torus
1. Linear array
• It is a one-dimensional network with the simplest topology, in which n nodes are connected by n-1 links.
• In this topology, internal nodes have degree 2 and the two terminal nodes have degree 1.
• The bisection width is b = 1 for this topology. The structure causes communication inefficiency when n becomes large; its basic properties are sketched below.
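A small sketch (Python; n is a free parameter) of the basic metrics of a linear array, showing why it scales poorly.

    # Basic metrics of a linear array of n nodes.
    def linear_array(n):
        return {
            "links": n - 1,                    # n nodes chained by n-1 links
            "max_degree": 2 if n > 2 else 1,   # internal nodes 2, terminals 1
            "diameter": n - 1,                 # end-to-end path grows with n
            "bisection_width": 1,              # one middle link splits the array
        }

    print(linear_array(8))
    # {'links': 7, 'max_degree': 2, 'diameter': 7, 'bisection_width': 1}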
7. Systolic array
• This class of multidimensional pipelined array architectures is used for matrix multiplication. The interior node degree is 6.
• This architecture is especially suited to applications such as signal and image processing. Such structures have limited applicability and are difficult to program; a simplified simulation is sketched below.
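A simplified simulation (Python) of systolic matrix multiplication. Note this models a rectangular mesh of processing elements rather than the degree-6 hexagonal array mentioned above: elements of A flow rightward and elements of B flow downward, one PE per cycle, with row i of A and column j of B skewed by i and j cycles, and each PE multiplying the pair passing through it and accumulating the product.

    def systolic_matmul(A, B):
        n = len(A)
        C = [[0] * n for _ in range(n)]
        a_reg = [[0] * n for _ in range(n)]   # A value held in PE(i, j)
        b_reg = [[0] * n for _ in range(n)]   # B value held in PE(i, j)
        for t in range(3 * n - 2):            # cycles until the array drains
            for i in reversed(range(n)):      # reverse order so each PE reads its
                for j in reversed(range(n)):  # neighbours' previous-cycle registers
                    k = t - i - j             # index of the A/B element arriving now
                    a_in = a_reg[i][j - 1] if j > 0 else (A[i][k] if 0 <= k < n else 0)
                    b_in = b_reg[i - 1][j] if i > 0 else (B[k][j] if 0 <= k < n else 0)
                    C[i][j] += a_in * b_in    # multiply-accumulate in PE(i, j)
                    a_reg[i][j] = a_in        # pass A right and B down next cycle
                    b_reg[i][j] = b_in
        return C

    print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
    # [[19, 22], [43, 50]]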
Dynamic interconnection networks
• Such networks are implemented with switches and arbiters.
• They can provide dynamic connectivity for all types of communication patterns, based on the demands of the program.
• There are two major classes of dynamic interconnection networks:
1. Single-stage networks
2. Multistage networks
• A bus system is essentially a collection of wires and connectors for data transactions among the processors, memory modules, and peripheral devices attached to the system.
• The bus is used for only one transaction at a time between a source and a destination. In the case of multiple requests, arbitration logic is used.
• A digital bus system is also called a contention bus or a time-sharing bus.
• These MINs (multistage interconnection networks) have been used in both SIMD and MIMD computers. A number of a×b switches are used in each stage. Fixed inter-stage connections are used between the switches in adjacent stages. The switches can be dynamically set to establish the desired connections between the inputs and outputs.
• Different classes of MINs differ in the switch modules used and in the kind of inter-stage connection (ISC) pattern used.
• The simplest switch module is the 2×2 switch, and the ISC patterns used include the perfect shuffle, butterfly, multiway shuffle, crossbar, cube connection, etc.
• There are four possible connection states of the 2×2 switches used in the construction of an Omega network (straight, exchange, upper broadcast, and lower broadcast); destination-tag routing through the network is sketched below.
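A minimal sketch (Python; an 8-input Omega network is assumed) of destination-tag routing: each stage applies a perfect shuffle to the wire positions, and the 2×2 switch then forwards to its upper or lower output according to one bit of the destination address, most significant bit first.

    # Route a message through an N-input Omega network (N a power of two).
    def omega_route(src, dst, N=8):
        stages = N.bit_length() - 1            # log2(N) stages of 2x2 switches
        pos, path = src, [src]
        for s in range(stages):
            # Perfect shuffle: rotate the position's address bits left by one.
            pos = ((pos << 1) | (pos >> (stages - 1))) & (N - 1)
            # Switch setting (straight or exchange): the low bit of the
            # position becomes the next destination bit, MSB first.
            bit = (dst >> (stages - 1 - s)) & 1
            pos = (pos & ~1) | bit
            path.append(pos)
        assert pos == dst                      # destination-tag routing always arrives
        return path

    print(omega_route(src=5, dst=2))           # [5, 2, 5, 2]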