
MESSAGE PASSING MECHANISMS

KTUStudents.in
MODULE 4

Sufyan P
Assistant professor
Computer Science and Engineering
sufyan@meaec.edu.in
For more study materials: WWW.KTUSTUDENTS.IN
INTRODUCTION
 Message passing in a multicomputer network requires both hardware and software support
 There are two message routing schemes for multicomputer networks
 Store-and-forward routing
 Wormhole routing


MESSAGE FORMATS

 Message
 A message is the logical unit for inter-node communication
 It is assembled from an arbitrary number of fixed-length packets
 A message therefore has variable length
 Packet
 A packet is the basic unit containing the destination address for routing purposes
 Different packets may arrive asynchronously at the destination
 So each packet carries a sequence number to allow reassembly of the transmitted message
 Flits
 A packet is further divided into a number of fixed-length flow control digits, called flits
 Header flits carry
 Routing information (destination address)
 Sequence number
 The remaining flits are the data elements of the packet
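The message, packet and flit hierarchy described above can be sketched in Python. This is only an illustration: the names `Packet`, `packetize` and `reassemble`, the choice of 4 data flits per packet, and the use of one byte per flit are assumptions, not values from the slides. For simplicity the last packet is left shorter instead of being padded to the fixed length.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Packet:
    dest: int          # routing information carried in the header flits
    seq: int           # sequence number used for reassembly
    data: List[int]    # remaining flits: the data elements (one byte per flit here)


def packetize(message: bytes, dest: int, data_flits: int = 4) -> List[Packet]:
    """Split a variable-length message into packets of at most `data_flits` data flits."""
    return [
        Packet(dest, seq, list(message[i:i + data_flits]))
        for seq, i in enumerate(range(0, len(message), data_flits))
    ]


def reassemble(packets: List[Packet]) -> bytes:
    """Rebuild the message even if the packets arrived out of order."""
    ordered = sorted(packets, key=lambda p: p.seq)
    return bytes(flit for p in ordered for flit in p.data)
```

Because every packet carries its own sequence number, `reassemble` gives the original message back regardless of arrival order.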


[Figure: message formats (message, packets and flits)]


FACTORS AFFECTING PACKET & FLIT SIZES

 Packet length is determined by the routing scheme and the network implementation
 Typical packet lengths range from 64 to 512 bits
 The sequence number may occupy 1 or 2 flits, depending on the message length
 Other factors
 Channel bandwidth
 Router design
 Network traffic intensity


STORE AND FORWARD ROUTING

 This scheme was used in first-generation multicomputers
 Packets are the basic unit of information flow in a store-and-forward network
 Each node requires a packet buffer
 A packet is transmitted from source to destination through a sequence of intermediate nodes


[Figure: store-and-forward routing]


STORE AND FORWARD ROUTING [3]

 When a packet reaches an intermediate node, it is first stored in the buffer
 It is forwarded to the next node only when the output channel and a packet buffer in the receiving node are both available
 Latency is directly proportional to the distance between source and destination


WORMHOLE ROUTING

 This scheme is implemented in later generations of multicomputers
 Here packets are subdivided into flits
 Flit buffers are used in the hardware routers attached to the nodes
 Transmission from source to destination is done through a sequence of routers


[Figure: wormhole routing]


WORMHOLE ROUTING

 All flits of the same packet are transmitted in order
 They travel as inseparable companions in a pipelined fashion
 A packet can be visualized as a railroad train, with the header flit as the engine and the data flits as the cars that follow it
 Only the header flit knows where the packet is going
 The data flits must follow the header flit


WORMHOLE ROUTING

 Packets can be interleaved during transmission
 However, the flits of different packets cannot be mixed up
 Otherwise flits may be delivered to the wrong destination
 Latency of this method is almost independent of the distance between source and destination


ASYNCHRONOUS PIPELINING

 Pipelining of the successive flits in a packet is done asynchronously, using a handshaking protocol
 A one-bit ready/request (R/A) line between adjacent routers performs the handshaking
 When the receiving router D is ready to receive a flit,
 it pulls the R/A line low
 When the sending router S is ready,
 it raises the R/A line high
 it transmits the flit through the channel


ASYNCHRONOUS PIPELINING [2]
 The R/A line is kept high while the flit is being received by D
 After flit i is removed from D's buffer and transmitted to the next node, the cycle repeats for the transmission of flit i+1
 This goes on until the entire packet is transmitted


[Figure: asynchronous pipelining]


ASYNCHRONOUS PIPELINING [3]
 Asynchronous pipelining can be very efficient
 Its clock can be faster than that of a synchronous pipeline
 The pipeline stalls if the flit buffers or successive channels along the path are not available


LATENCY ANALYSIS
 $L$ ➔ packet length (bits)
 $W$ ➔ channel bandwidth (bits/s)
 $D$ ➔ distance (number of nodes traversed minus 1)
 $F$ ➔ flit length (bits)
 $T_{SF}$ ➔ communication latency for store-and-forward routing
 $T_{WH}$ ➔ communication latency for wormhole routing


 $T_{SF}$ is directly proportional to $D$
 $T_{WH} \approx L/W$ if $L \gg F$
 Thus $D$ has a negligible effect on the wormhole routing latency
 For first-generation (store-and-forward) machines, $T_{SF}$ is between 2000 and 6000 µs
 $T_{WH}$ is 5 µs or less
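The latency expressions themselves did not survive in this text. A minimal sketch, assuming the usual textbook forms T_SF = (L/W)(D + 1) and T_WH = L/W + (F/W)D, which agree with the statements above (T_SF grows linearly with D, and T_WH is close to L/W when L is much larger than F). The function names and the numbers in the usage part are made up for illustration; any consistent units can be used.

```python
def t_store_and_forward(L: float, W: float, D: int) -> float:
    """Assumed form: one full packet time L/W, multiplied by (D + 1)."""
    return (L / W) * (D + 1)


def t_wormhole(L: float, W: float, D: int, F: float) -> float:
    """Assumed form: one packet time L/W plus one flit time F/W per unit of distance."""
    return L / W + (F / W) * D


if __name__ == "__main__":
    L, W, F = 512.0, 64.0, 8.0   # illustrative packet length, bandwidth, flit length
    for D in (1, 4, 16):
        print(D, t_store_and_forward(L, W, D), t_wormhole(L, W, D, F))
```

With these illustrative numbers the store-and-forward latency grows by a full packet time for every extra unit of distance, while the wormhole latency grows only by one flit time.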


FLOW CONTROL STRATEGIES


INTRODUCTION
 Flow control strategies control network traffic flow so that it causes neither congestion nor deadlock
 When two or more packets collide at a node, policies must be set for resolving the conflict


PACKET COLLISION RESOLUTION

 To move a flit between adjacent nodes in a pipeline of channels, three elements must be present
 The source buffer holding the flit
 The channel being allocated
 The receiver buffer accepting the flit
 When two packets reach the same node and request the same receiver buffer or outgoing channel, two decisions must be made
 Which packet is allocated the channel?
 What is done with the packet that is denied the channel?


FLOW CONTROL POLICIES FOR COLLISION RESOLUTION

 Buffering
 Blocking
 Discard and retransmission
 Detour after being blocked


BUFFERING METHOD

 This method is applied in virtual cut-through routing
 When packets 1 and 2 collide at a node,
 Packet 1 is allocated the channel
 Packet 2 is denied
 Packet 2 is temporarily stored in a packet buffer
 It is transmitted later, when the channel becomes available
 Advantage
 Already allocated resources are not wasted
 Disadvantages
 Requires a buffer large enough to hold the entire packet
 May cause significant storage delay


BLOCKING POLICY

 Pure wormhole routing uses this scheme
 The second packet is blocked from advancing
 However, it is not abandoned
 Economical to implement, but may idle the resources allocated to the blocked packet


DISCARD POLICY

 It drops the packet that is blocked
 This scheme may result in severe wastage of resources
 It demands packet retransmission and acknowledgement
 It is rarely used because of its unstable packet delivery rate


DETOUR POLICY

 The blocked packet is routed through a detour channel
 Offers more flexibility in packet routing
 Disadvantage
 The detour may waste more channel resources than are needed to reach the destination


FLOW CONTROL POLICIES FOR COLLISION RESOLUTION

 Some multicomputer networks use hybrid policies, which combine the advantages of the flow control policies above


DETERMINISTIC ROUTING

 The communication path is completely determined by the source and destination addresses
 The routing path is fixed in advance, irrespective of network conditions
 Examples of deterministic routing algorithms
 E-cube routing ➔ routing in a hypercube
 X-Y routing ➔ routing in a mesh
 Both algorithms are based on the concept of dimension-order routing


DIMENSION ORDER ROUTING

 It selects the successive channels for routing in a specific order, based on the dimensions of a multidimensional network
 In a 2D mesh network, this scheme is called X-Y routing
 The path along the X dimension is decided first, before choosing a path along the Y dimension
 In a hypercube (n-cube) network, this scheme is called E-cube routing


E–CUBE ROUTING ON HYPERCUBE

 Consider an n-cube with $N = 2^n$ nodes
 Each node $b$ is binary coded as $b = b_{n-1} b_{n-2} \ldots b_1 b_0$
 Source node $s = s_{n-1} s_{n-2} \ldots s_1 s_0$
 Destination node $d = d_{n-1} d_{n-2} \ldots d_1 d_0$
 We have to determine a route from $s$ to $d$ with the minimum number of steps
 Let $v = v_{n-1} v_{n-2} \ldots v_1 v_0$ be any node along the route


ALGORITHM TO DETERMINE ROUTE FROM S TO D

1. Compute the direction bit $r_i = s_{i-1} \oplus d_{i-1}$ for all $n$ dimensions ($i = 1, 2, \ldots, n$). Start the following with $i = 1$ and $v = s$
2. Route from the current node $v$ to the next node $v \oplus 2^{i-1}$ if $r_i = 1$. Skip this step if $r_i = 0$
3. Move to dimension $i + 1$ (i.e. $i \leftarrow i + 1$). If $i \le n$, go to step 2; else done


EXAMPLE
 $n = 4$, $s = 0110$ and $d = 1101$
 $r = r_4 r_3 r_2 r_1 = 0110 \oplus 1101 = 1011$
 Start with $v = s = 0110$; at each step the next node is $v \oplus 2^{i-1}$
 For $i = 1$
 $r_1 = 1$, so route to $0110 \oplus 2^0 = 0111$
 For $i = 2$
 $r_2 = 1$, so route to $0111 \oplus 2^1 = 0101$
 For $i = 3$
 Skip, since $r_3 = 0$
 For $i = 4$
 $r_4 = 1$, so route to $0101 \oplus 2^3 = 1101 = d$
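The E-cube steps above can be sketched in Python, with node addresses held as integers so that XOR is the `^` operator. The function name `ecube_route` and the decision to return the whole list of visited nodes are illustrative choices, not part of the slides.

```python
from typing import List


def ecube_route(s: int, d: int, n: int) -> List[int]:
    """Return the nodes visited from s to d in an n-cube, lowest dimension first."""
    r = s ^ d                      # direction bits: bit (i - 1) of r is r_i
    v = s
    path = [v]
    for i in range(1, n + 1):      # dimensions i = 1, 2, ..., n
        if (r >> (i - 1)) & 1:     # r_i = 1: cross dimension i
            v ^= 1 << (i - 1)      # next node is v XOR 2^(i-1)
            path.append(v)
    return path


if __name__ == "__main__":
    path = ecube_route(0b0110, 0b1101, 4)
    print([format(v, "04b") for v in path])   # ['0110', '0111', '0101', '1101']
```

The printed path reproduces the worked example with n = 4, s = 0110 and d = 1101.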


X–Y ROUTING ON 2D MESH

 To route from any source node $s = (x_1, y_1)$ to any destination node $d = (x_2, y_2)$
 Route from $s$ along the X-axis first, until it reaches the column $x_2$ in which $d$ is located
 Then route to $d$ along the Y-axis
 Four X-Y routing patterns are possible
 East-north
 East-south
 West-north
 West-south
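X-Y routing as described above can be sketched as follows. The sketch assumes a mesh without wraparound links, that east means increasing x and north means increasing y, and that nodes are `(x, y)` tuples; the name `xy_route` is hypothetical.

```python
from typing import List, Tuple

Node = Tuple[int, int]


def xy_route(s: Node, d: Node) -> List[Node]:
    """Return the nodes visited from s to d: all X moves first, then all Y moves."""
    (x, y), (x2, y2) = s, d
    path = [(x, y)]
    step_x = 1 if x2 > x else -1   # east (+1) or west (-1)
    while x != x2:
        x += step_x
        path.append((x, y))
    step_y = 1 if y2 > y else -1   # north (+1) or south (-1)
    while y != y2:
        y += step_y
        path.append((x, y))
    return path
```

For example, `xy_route((0, 0), (2, 1))` follows the east-north pattern, and `xy_route((2, 2), (0, 1))` follows the west-south pattern.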


LINEAR PIPELINE PROCESSORS


INTRODUCTION
 A linear pipeline processor is a cascade of processing stages that are linearly connected to perform a fixed function over a stream of data flowing from one end to the other
 It can be applied to
 Instruction execution
 Arithmetic computation
 Memory access operations


STRUCTURE OF LINEAR PIPELINE

 It consists of $k$ processing stages
 Inputs are fed into the first stage of the pipeline, $S_1$
 Processed results are passed from stage $S_i$ to stage $S_{i+1}$ for all $i = 1, 2, \ldots, k-1$
 The final result emerges from the last stage of the pipeline, $S_k$


CATEGORIES OF LINEAR PIPELINE PROCESSORS
 Depending on the control of data flow, linear pipelines are divided into two categories
 Asynchronous pipeline model
 Synchronous pipeline model


ASYNCHRONOUS MODEL

 Data flow between adjacent stages of an asynchronous pipeline is controlled by a handshaking protocol
 When stage $S_i$ is ready to transmit, it sends a ready signal to stage $S_{i+1}$
 After stage $S_{i+1}$ receives the incoming data, it returns an acknowledge signal to $S_i$


[Figure: asynchronous pipeline model]


ASYNCHRONOUS MODEL [2]

 The delay may vary from stage to stage
 So the throughput rate is variable
 This model is used for designing communication channels in message-passing multicomputers that employ wormhole routing


SYNCHRONOUS PIPELINE MODEL

 In this pipeline, clocked latches are used between the stages
 The latches are made with master-slave flip-flops
 They isolate the inputs from the outputs
 When a clock pulse arrives, all latches transfer data to the next stage simultaneously
 Pipeline stages are implemented as combinational logic circuits
 The delays should be approximately equal in all stages


RESERVATION TABLE

 It is a table representing the task flow pattern of a pipelined system
 It specifies the utilization pattern of the successive stages in a pipeline
 It consists of rows and columns
 Rows ➔ resources (stages) of the pipeline
 Columns ➔ time slices (clock cycles)
 In a linear pipeline, the utilization pattern is diagonal
 For a k-stage linear pipeline, k clock cycles are needed for a data item to flow through the pipeline
 Once the pipeline is filled, one result emerges from the pipeline on each additional cycle
[Figure: reservation table]
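The diagonal utilization pattern of a k-stage linear pipeline can be illustrated with a short Python sketch. The function names and the text rendering (`X` for a used slot, `.` for an idle one) are assumptions made for this example only.

```python
from typing import List


def linear_reservation_table(k: int) -> List[List[bool]]:
    """Row i is stage S(i+1); column j is clock cycle j+1; True marks a use."""
    return [[row == col for col in range(k)] for row in range(k)]


def render(table: List[List[bool]]) -> str:
    """Format the table as text, one stage per line."""
    lines = []
    for i, row in enumerate(table, start=1):
        cells = " ".join("X" if used else "." for used in row)
        lines.append(f"S{i} {cells}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(linear_reservation_table(4)))
```

Each stage is used exactly once, one cycle after the previous stage, so a task needs k columns (clock cycles) to flow through the pipeline.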


SPEEDUP, EFFICIENCY AND THROUGHPUT

 Ideally, a linear pipeline of $k$ stages can process $n$ tasks in $k + (n-1)$ clock cycles
 $k$ cycles are needed to complete the first task
 The remaining $n-1$ tasks require $n-1$ cycles
 Total time
 $T_k = [k + (n-1)]\,\tau$ ---(1)
 $\tau$ ➔ clock period
 Time taken by an equivalent non-pipelined processor to execute $n$ tasks
 $T_1 = n k \tau$ ---(2), where $k\tau$ is the flow-through delay of the non-pipelined processor


SPEEDUP FACTOR

 The speedup factor of a k-stage pipeline over an equivalent non-pipelined processor is defined as
 $S_k = \dfrac{T_1}{T_k} = \dfrac{n k \tau}{[k + (n-1)]\,\tau} = \dfrac{n k}{k + (n-1)}$ ---(3)


OPTIMAL NUMBER OF STAGES OF A PIPELINE

 Let $t$ be the total time required to execute a non-pipelined sequential program
 To execute the same program on a k-stage pipeline with the same total flow-through delay $t$, the required clock period is
 $p = \dfrac{t}{k} + d$ ---(4)
 $t$ ➔ flow-through delay
 $d$ ➔ latch delay
 The maximum throughput of the pipeline under ideal conditions is
 $f = \dfrac{1}{p} = \dfrac{1}{t/k + d}$ ---(5)
 Total pipeline cost $= c + kh$
 $c$ ➔ cost of all logic stages
 $h$ ➔ cost of each latch


PCR (PIPELINE PERFORMANCE/COST RATIO)

 $PCR = \dfrac{f}{c + kh} = \dfrac{1}{(t/k + d)(c + kh)}$ ---(6)
 The peak of the PCR curve gives the optimal choice for the number of pipeline stages
 $t$ ➔ total flow-through delay of the pipeline
 $c$ ➔ total stage cost
 $d$ ➔ latch delay
 $h$ ➔ latch cost
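The peak of the PCR curve can be located analytically. Expanding the denominator of equation (6) gives tc/k + th + dc + dhk; setting its derivative with respect to k to zero gives k0 = sqrt(tc / (dh)). The sketch below checks this closed form against a brute-force scan over integer k. The function names and the numeric values of t, d, c and h are illustrative assumptions.

```python
import math


def pcr(k: float, t: float, d: float, c: float, h: float) -> float:
    """Performance/cost ratio from equation (6)."""
    return 1.0 / ((t / k + d) * (c + k * h))


def optimal_stages(t: float, d: float, c: float, h: float) -> float:
    """Real-valued k that maximises PCR, obtained from d(PCR)/dk = 0."""
    return math.sqrt((t * c) / (d * h))


if __name__ == "__main__":
    t, d, c, h = 64.0, 1.0, 100.0, 4.0       # illustrative values only
    k0 = optimal_stages(t, d, c, h)           # sqrt(6400 / 4) = 40.0
    best = max(range(1, 200), key=lambda k: pcr(k, t, d, c, h))
    print(k0, best)
```

In practice k must be an integer, so the closed-form value is rounded to whichever neighbouring integer gives the larger PCR.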


EFFICIENCY AND THROUGHPUT

 The efficiency of a k-stage pipeline is
 $E_k = \dfrac{S_k}{k} = \dfrac{n}{k + (n-1)}$


PIPELINE THROUGHPUT

 It is defined as the number of tasks performed per unit time
 $H_k = \dfrac{n}{[k + (n-1)]\,\tau} = \dfrac{n f}{k + (n-1)}$, where $f = 1/\tau$ is the clock frequency
 $H_k = E_k \cdot f = \dfrac{E_k}{\tau} = \dfrac{S_k}{k \tau}$
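The speedup, efficiency and throughput of an ideal k-stage linear pipeline processing n tasks can be computed directly from T_k = [k + (n - 1)]τ and T_1 = nkτ. The function names below are illustrative, and `tau` is the clock period in whatever time unit is convenient.

```python
def speedup(k: int, n: int) -> float:
    """S_k = T_1 / T_k = n k / (k + (n - 1))."""
    return n * k / (k + (n - 1))


def efficiency(k: int, n: int) -> float:
    """E_k = S_k / k = n / (k + (n - 1))."""
    return n / (k + (n - 1))


def throughput(k: int, n: int, tau: float) -> float:
    """H_k = n / ([k + (n - 1)] * tau), in tasks per unit time."""
    return n / ((k + (n - 1)) * tau)


if __name__ == "__main__":
    k = 4
    for n in (1, 4, 64, 1024):
        print(n, round(speedup(k, n), 3), round(efficiency(k, n), 3))
```

As n grows, the speedup approaches k and the efficiency approaches 1; for a single task (n = 1) the pipeline gives no speedup at all.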


NON LINEAR PIPELINE PROCESSORS
 Linear pipelines are known as static pipelines
 because they are used to perform fixed functions
 Non-linear pipelines are dynamic pipelines
 because they can be reconfigured to perform variable functions at different times
 A dynamic pipeline allows feedforward and feedback connections in addition to the streamline connections
 Hence the structure of this pipeline is called non-linear


RESERVATION AND LATENCY ANALYSIS
 In a static pipeline, it is easy to partition a given function into a sequence of linearly ordered subfunctions
 Function partitioning is difficult in a dynamic pipeline
 because the pipeline stages are interconnected with loops in addition to streamline connections
 Feedforward and feedback connections make the scheduling of successive events difficult
 Because of these connections, the output of the pipeline is not necessarily taken from the last stage
 The same pipeline can be used to evaluate different functions


RESERVATION TABLES
 The reservation table for a static pipeline is simple
 because the data flow follows a linear streamline
 The reservation table for a dynamic pipeline is more complex
 because the data flow follows a non-linear pattern
 Multiple reservation tables can be generated for the evaluation of different functions
 A static pipeline is specified by a single reservation table
 A dynamic pipeline may be specified by more than one reservation table
 Each reservation table displays the time-space flow of data through the pipeline for the evaluation of one function
 Different functions follow different paths through the pipeline
 Number of columns in a reservation table ➔ evaluation time of the given function
 E.g. function X requires 8 clock cycles
 function Y requires 6 clock cycles
 The checkmarks in each row of the reservation table correspond to the cycles in which that stage is used
 Multiple checkmarks in a row indicate repeated usage of the same stage in different cycles
 Contiguous checkmarks in a row indicate extended usage of a stage over more than one cycle
 Multiple checkmarks in a column indicate that multiple stages are used in parallel during that clock cycle


LATENCY ANALYSIS


INTRODUCTION

 The number of clock cycles between two initiations of a pipeline is the latency between them
 Latency is a non-negative integer
 A latency of $k$ means that two initiations are separated by $k$ clock cycles
 Latencies read from the reservation table for function X
 $6 - 1 = 5$
 $8 - 6 = 2$


COLLISION

 An attempt by two or more initiations to use the same pipeline stage at the same time causes a collision
 A collision implies a resource conflict between two initiations in the pipeline
 Collisions must be avoided by properly scheduling the initiations


TYPES OF LATENCIES
 Two types
 Forbidden latencies
 Permissible latencies
 Latencies that cause collisions are called forbidden latencies
 Latencies that do not cause collisions are called permissible latencies
 Latency sequence
 It is a sequence of permissible latencies between successive task initiations
 Latency cycle
 It is a latency sequence that repeats the same subsequence (cycle) indefinitely
 E.g. latency cycle (1, 8)
 It represents the infinite latency sequence 1, 8, 1, 8, …
 This implies that successive initiations of new tasks are separated by 1 cycle and 8 cycles alternately
 Constant latency cycle
 It is a latency cycle that contains only one latency value
 E.g. cycle (3)
 Average latency
 The average latency of a latency cycle is the sum of all latencies divided by the number of latencies along the cycle
 E.g. average latency of latency cycle (1, 8) ➔ (1 + 8)/2 = 4.5
 The average latency of a constant cycle is simply the latency itself
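The latency-cycle definitions above can be checked with a small Python sketch. The function names are hypothetical, and the first task is assumed to start at clock cycle 0.

```python
from itertools import cycle, islice
from typing import Iterable, List


def average_latency(latency_cycle: Iterable[int]) -> float:
    """Sum of the latencies in one cycle divided by how many there are."""
    latencies = list(latency_cycle)
    return sum(latencies) / len(latencies)


def initiation_times(latency_cycle: Iterable[int], count: int) -> List[int]:
    """Clock cycles at which the first `count` tasks start (count >= 1)."""
    times = [0]
    for latency in islice(cycle(latency_cycle), count - 1):
        times.append(times[-1] + latency)
    return times


if __name__ == "__main__":
    print(average_latency((1, 8)))        # 4.5
    print(initiation_times((1, 8), 5))    # [0, 1, 9, 10, 18]
    print(average_latency((3,)))          # 3.0
```

The latency cycle (1, 8) starts tasks at cycles 0, 1, 9, 10, 18 and so on, which averages one new task every 4.5 cycles.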


COLLISION FREE SCHEDULING
 Objective of scheduling
 Obtain the shortest average latency between initiations without causing collisions
 Concepts used for collision-free scheduling
 Collision vectors
 State diagrams
 Simple cycles
 Greedy cycles
 Minimal average latency (MAL)


COLLISION VECTOR

 It is a vector displaying the combined set of permissible and forbidden latencies of a pipeline
 For a reservation table with $n$ columns, the maximum forbidden latency is $m$
 $m \le n - 1$
 Permissible latency ➔ $p$
 $p$ should be as small as possible
 $1 \le p \le m - 1$
 The collision vector is an m-bit binary vector $C$
 $C = (C_m C_{m-1} \ldots C_2 C_1)$
 $C_i = 1$ if latency $i$ causes a collision
 $C_i = 0$ if latency $i$ is permissible
 $C_m = 1$ always, because $m$ is itself a forbidden latency


 For function X: 2, 4, 5, 7 ➔ forbidden latencies
 Collision vector $C_X = (1011010)$
 For function Y: 2, 4 ➔ forbidden latencies
 Collision vector $C_Y = (1010)$
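Forbidden latencies are the distances between any two checkmarks in the same row of a reservation table, and the collision vector follows from them. The sketch below shows the computation. The reservation tables themselves are not reproduced in this text, so `TABLE_X` and `TABLE_Y` are assumed tables (8 and 6 columns, three stages) chosen to be consistent with the forbidden latencies and collision vectors quoted above; the function names are also illustrative.

```python
from typing import Dict, List, Set


def forbidden_latencies(table: Dict[str, List[int]]) -> Set[int]:
    """Distances between any two checkmarks in the same row (stage)."""
    forbidden = set()
    for cycles in table.values():
        for a in cycles:
            for b in cycles:
                if b > a:
                    forbidden.add(b - a)
    return forbidden


def collision_vector(forbidden: Set[int]) -> str:
    """Bits C_m ... C_1, leftmost bit first; C_i = 1 if latency i is forbidden."""
    m = max(forbidden)
    return "".join("1" if i in forbidden else "0" for i in range(m, 0, -1))


# Assumed reservation tables: stage name -> clock cycles with a checkmark.
TABLE_X = {"S1": [1, 6, 8], "S2": [2, 4], "S3": [3, 5, 7]}
TABLE_Y = {"S1": [1, 5], "S2": [3], "S3": [2, 4, 6]}

if __name__ == "__main__":
    for name, table in (("X", TABLE_X), ("Y", TABLE_Y)):
        f = forbidden_latencies(table)
        print(name, sorted(f), collision_vector(f))
```

With the assumed tables, function X gives forbidden latencies 2, 4, 5, 7 and the vector 1011010, and function Y gives 2, 4 and the vector 1010.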
