KTUStudents.in
MODULE 4
Sufyan P
Assistant professor
Computer Science and Engineering
sufyan@meaec.edu.in
For more study materials: WWW.KTUSTUDENTS.IN
INTRODUCTION
Message passing in a multicomputer network
requires hardware and software support
There are 2 message routing schemes for
multicomputer networks
Store and forward routing
Wormhole routing
Message
Message is a logical unit for inter-node communication
It is formed by assembling an arbitrary number of
fixed-length packets
So messages can be of variable length
Packet
A packet is the basic unit containing the destination
address for routing purposes
Different packets may arrive asynchronously at the
destination
So a sequence number is needed in each packet to allow
reassembly of the transmitted message
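A minimal sketch of why the sequence number matters, assuming a message is a plain string and a packet carries only a destination, a sequence number and a payload. The names `Packet`, `split_message` and `reassemble` are hypothetical and used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    dest: int      # destination address used for routing
    seq: int       # sequence number used for reassembly
    payload: str   # chunk of the message (the last chunk may be shorter)


def split_message(message, dest, packet_size):
    """Cut a variable-length message into packets of at most packet_size characters."""
    chunks = range(0, len(message), packet_size)
    return [Packet(dest, seq, message[i:i + packet_size])
            for seq, i in enumerate(chunks)]


def reassemble(packets):
    """Rebuild the message, whatever order the packets arrived in."""
    ordered = sorted(packets, key=lambda p: p.seq)
    return "".join(p.payload for p in ordered)


packets = split_message("wormhole routing", dest=5, packet_size=4)
arrived = list(reversed(packets))      # simulate out-of-order arrival
print(reassemble(arrived))             # prints "wormhole routing"
```

Without the `seq` field the receiver would have no way to tell the arrival order from the original order.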
In store and forward routing, each node requires a
packet buffer
A packet is forwarded to the next node only if the output
channel and the packet buffer of the receiving node are
both available
In wormhole routing, flit buffers are used in the
hardware routers attached to the nodes
Transmission from source to destination is done
via a sequence of routers
A packet can be visualized as a railroad train: the
header flit is the engine car and the data flits are
the cars that follow it
Only the header flit knows where the packet is going
Latency of wormhole routing is almost independent of
the distance between source and destination
When the receiving router D is ready to receive a
flit,
It pulls the R/A line low
When the sending router S is ready,
It raises the R/A line high
It transmits the flit through the channel
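The handshake above can be sketched as a tiny simulation. This is only an illustration under stated assumptions: the channel holds one flit at a time, the R/A line starts high, and the class and method names (`Channel`, `receiver_ready`, `send`, `receiver_consume`) are hypothetical.

```python
from collections import deque


class Channel:
    """One-flit channel between sending router S and receiving router D."""

    def __init__(self):
        self.ra_high = True     # assumed initial state: D has not signalled yet
        self.d_buffer = None    # D's single flit buffer

    def receiver_ready(self):
        # D has a free flit buffer, so it pulls the R/A line low.
        if self.d_buffer is None:
            self.ra_high = False

    def send(self, flit):
        # S may transmit only after D has pulled R/A low.
        if self.ra_high:
            return False
        self.ra_high = True     # S raises R/A and transmits the flit
        self.d_buffer = flit
        return True

    def receiver_consume(self):
        # D removes the flit from its buffer, freeing it for the next one.
        flit, self.d_buffer = self.d_buffer, None
        return flit


def transfer(flits):
    """Move every flit across the channel, one handshake per flit."""
    channel = Channel()
    pending = deque(flits)
    received = []
    while pending:
        channel.receiver_ready()
        if channel.send(pending[0]):
            pending.popleft()
            received.append(channel.receiver_consume())
    return received


print(transfer(["header", "data1", "data2"]))
```

Each flit needs one full low-then-high cycle of the R/A line, so flits cross the channel strictly one at a time and in order.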
$T_{SF}$ ➔ communication latency for store and forward
$T_{WH}$ ➔ communication latency for wormhole
The first-generation value of $T_{SF}$ is between 2000
and 6000 µs
$T_{WH}$ is 5 µs or less
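The latency expressions themselves did not survive on this slide. The sketch below uses the standard textbook forms, T_SF = (L/W)(D + 1) and T_WH = L/W + (F/W)D, which is an assumption about what the slide showed. The symbols are also assumptions: L is the packet length in bits, W the channel bandwidth in bits per unit time, D the distance (number of hops) and F the flit length in bits.

```python
def store_and_forward_latency(L, W, D):
    """T_SF = (L / W) * (D + 1): the whole packet is retransmitted at every hop."""
    return (L / W) * (D + 1)


def wormhole_latency(L, W, F, D):
    """T_WH = L / W + (F / W) * D: only the header flit pays the per-hop cost."""
    return L / W + (F / W) * D


# Illustrative numbers only, not measurements from the slides.
L, W, F = 1024, 1024, 8
for D in (1, 3, 10):
    print(D, store_and_forward_latency(L, W, D), wormhole_latency(L, W, F, D))
```

With L much larger than F, the wormhole latency barely changes as D grows, while the store and forward latency grows in proportion to D + 1.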
When 2 packets reach the same node and request the
same receiver buffer or outgoing channel, a policy
must be set for resolving their conflict
2 decisions are to be made
Which packet will be allocated to the channel?
What will be done to the packet being denied the
channel?
Buffering
Blocking policy
Detour after being blocked
Packet 2 is temporarily stored in a packet buffer
It will be transmitted later, when the channel becomes
available
Advantage
Already allocated resources are not wasted
Disadvantages
Requires a large buffer to hold the entire
packet
Causes significant storage delay
acknowledgement
This policy is rarely used because of its unstable
packet delivery rate
Offers more flexibility
Disadvantage
Results in idling of the resources allocated to the
blocked packet
Wastes more channel resources
Examples of deterministic routing algorithms
E-Cube routing ➔ routing in hypercube
X-Y routing ➔ routing in mesh
Both of the above algorithms are based on the
concept of dimension-order routing
X-Y routing
A path along the X dimension is decided first, before
choosing a path along the Y dimension
Destination node $d = d_{n-1} d_{n-2} \ldots d_1 d_0$
$r_i = 1$. Skip this step if $r_i = 0$
Move to dimension $i+1$ (i.e. $i = i+1$). If $i \le n$, go to step 2,
else done
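The beginning of the E-cube steps is missing from this slide, so the sketch below fills it in from the usual textbook statement, which is an assumption: the direction bits are r = s XOR d, and in dimension i the packet moves from the current node v to v XOR 2^(i-1) when r_i = 1. Nodes are written as n-bit integers, and the function name `ecube_route` is hypothetical.

```python
def ecube_route(s, d, n):
    """Return the list of nodes visited from s to d in an n-dimensional hypercube."""
    r = s ^ d                        # direction bits r_n ... r_1 (assumed step 1)
    v = s
    path = [v]
    for i in range(1, n + 1):        # dimensions are visited in order 1 .. n
        if (r >> (i - 1)) & 1:       # r_i = 1: cross dimension i
            v ^= 1 << (i - 1)
            path.append(v)
        # r_i = 0: skip this dimension
    return path


route = ecube_route(0b0110, 0b1101, n=4)
print([format(v, "04b") for v in route])   # ['0110', '0111', '0101', '1101']
```

Because the dimensions are always crossed in the same increasing order, the route between any source and destination pair is fixed in advance, which is what makes the algorithm deterministic.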
Then route to d along the Y axis
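The X-Y rule (first along the X axis, then along the Y axis) can be sketched as follows. This assumes a 2D mesh whose nodes are `(x, y)` tuples and ignores mesh boundaries and wraparound links. The function name `xy_route` is hypothetical.

```python
def xy_route(src, dst):
    """Return the nodes visited from src to dst, X dimension first, then Y."""
    x, y = src
    path = [(x, y)]
    step = 1 if dst[0] > x else -1
    while x != dst[0]:               # route along the X axis first
        x += step
        path.append((x, y))
    step = 1 if dst[1] > y else -1
    while y != dst[1]:               # then route to d along the Y axis
        y += step
        path.append((x, y))
    return path


print(xy_route((2, 1), (4, 3)))
# [(2, 1), (3, 1), (4, 1), (4, 2), (4, 3)]
```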
Processed results are passed from stage $S_i$ to stage
$S_{i+1}$ for all $i = 1, 2, \ldots, k-1$
The final result emerges from the last stage of the
pipeline, i.e. $S_k$
When stage $S_i$ is ready to transmit, it sends a ready
signal to stage $S_{i+1}$
After stage $S_{i+1}$ receives the incoming data, it returns
an acknowledgement signal to $S_i$
channels in message-passing multicomputers which
employ wormhole routing
When a clock pulse arrives, all latches transfer data
to the next stage simultaneously
Pipeline stages are implemented as combinational
logic circuits
It is desirable to have approximately equal delays in
all the stages
It consists of rows and columns
Rows ➔ resources (stages) of the pipeline
Columns ➔ time slices (clock cycles) of the pipeline
In a linear pipeline, the utilization pattern is
diagonal
For a k-stage linear pipeline, k clock cycles are
needed for a data item to flow through the pipeline
Once the pipeline is filled, one result emerges from
the pipeline on each additional cycle
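A small sketch of the diagonal pattern, assuming stage i of a linear pipeline is used only in clock cycle i. The helper names `linear_reservation_table` and `render` are hypothetical.

```python
def linear_reservation_table(k):
    """Return a k x k table: row s is stage S(s+1), column t is clock cycle t+1."""
    return [["X" if t == s else "." for t in range(k)] for s in range(k)]


def render(table):
    """Format the table with one line per stage."""
    lines = []
    for s, row in enumerate(table, start=1):
        lines.append(f"S{s} " + " ".join(row))
    return "\n".join(lines)


print(render(linear_reservation_table(4)))
# S1 X . . .
# S2 . X . .
# S3 . . X .
# S4 . . . X
```

Each row has exactly one mark and each column has exactly one mark, so a new task can enter on every clock cycle without two tasks ever needing the same stage at the same time.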
RESERVATION TABLE
Total time
$T_k = [k + (n-1)]\tau$ ---(1)
$\tau$ ➔ clock period
Time taken for a non-pipelined processor to execute
n tasks
$T_1 = nk\tau$ ---(2) where $k\tau$ is the flow-through
delay of the non-pipelined processor
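Equations (1) and (2) can be checked numerically. The sketch below also computes the speedup as the ratio T_1 / T_k, which follows directly from the two equations. The function names are hypothetical and the sample values of k, n and τ are illustrative only.

```python
def pipelined_time(k, n, tau):
    """Equation (1): T_k = [k + (n - 1)] * tau."""
    return (k + (n - 1)) * tau


def nonpipelined_time(k, n, tau):
    """Equation (2): T_1 = n * k * tau."""
    return n * k * tau


def speedup(k, n):
    """Ratio T_1 / T_k = n*k / (k + (n - 1)); tau cancels out."""
    return n * k / (k + (n - 1))


k, n, tau = 4, 64, 1
print(pipelined_time(k, n, tau))      # 67
print(nonpipelined_time(k, n, tau))   # 256
print(round(speedup(k, n), 2))        # 3.82
```

As n grows, the speedup approaches k, the number of stages.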
period is:
$p = \frac{t}{k} + d$ ---(4)
t ➔ flow-through delay
d ➔ latch delay
$f = \frac{1}{\frac{t}{k} + d}$ ---(5)
$PCR = \frac{f}{c + kh} = \frac{1}{\left(\frac{t}{k} + d\right)(c + kh)}$ ---(6)
The peak of the PCR curve gives the optimal number of
pipeline stages
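A sketch of locating that peak, using equation (6). The definitions of c and h did not survive on the slide; the code assumes c is the total cost of the logic stages and h is the cost of each latch. The closed form sqrt(t·c / (d·h)) is not stated on the slide either: it comes from setting the derivative of the denominator of (6) with respect to k to zero. All function names and sample values are hypothetical.

```python
import math


def pcr(t, d, c, h, k):
    """Equation (6): PCR = 1 / ((t/k + d) * (c + k*h))."""
    return 1.0 / ((t / k + d) * (c + k * h))


def best_stage_count(t, d, c, h, k_max):
    """Search k = 1 .. k_max for the stage count with the highest PCR."""
    return max(range(1, k_max + 1), key=lambda k: pcr(t, d, c, h, k))


def analytic_peak(t, d, c, h):
    """Peak of (6) treated as a continuous function of k (derived, see lead-in)."""
    return math.sqrt((t * c) / (d * h))


t, d, c, h = 64, 1, 16, 1              # illustrative values only
print(best_stage_count(t, d, c, h, k_max=100))   # 32
print(analytic_peak(t, d, c, h))                 # 32.0
```

The brute-force search and the closed form agree here because the chosen values make the continuous optimum an integer; in general the best integer k is one of the two integers next to the analytic value.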
$E_k = \frac{S_k}{k} = \frac{n}{k + (n-1)}$
$H_k = \frac{E_k}{\tau}$
$H_k = \frac{S_k}{k\tau}$
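The efficiency and throughput relations can be checked with a few lines, assuming S_k denotes the speedup n·k / (k + (n − 1)) of a k-stage pipeline over n tasks. The function names and sample values are hypothetical.

```python
def efficiency(k, n):
    """E_k = S_k / k = n / (k + (n - 1))."""
    return n / (k + (n - 1))


def throughput(k, n, tau):
    """H_k = E_k / tau = n / ((k + (n - 1)) * tau)."""
    return n / ((k + (n - 1)) * tau)


k, n, tau = 4, 64, 2
print(round(efficiency(k, n), 3))        # 0.955
print(round(throughput(k, n, tau), 3))   # 0.478
```

Efficiency approaches 1 as n becomes much larger than k, and the throughput then approaches 1/τ, which is the clock frequency.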
because they can be reconfigured to perform variable
functions at different times
A dynamic pipeline allows feedforward and
feedback connections in addition to the
streamline connections
Hence the structure of this pipeline is called
nonlinear
dynamic pipeline
Because pipeline stages are interconnected with loops
in addition to streamline connections
Because it follows a nonlinear pattern
Multiple reservation tables can be generated for the
evaluation of different functions
A static pipeline is specified by a single reservation
table
A dynamic pipeline is specified by more than one
reservation table
A latency value k implies that two initiations are
separated by k clock cycles
8-6➔ 2
the pipeline
Collisions must be avoided by proper scheduling of
the pipeline
Eg: latency cycle (1,8)
It represents an infinite latency sequence 1,8,1,8,….
This implies that successive initiations of new tasks are
separated by 1 cycle and 8 cycles alternately
Eg: average latency of latency cycle (1,8) ➔ (1+8)/2 = 4.5
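A short sketch of the same calculation, which also lists the clock cycles at which tasks start when a latency cycle is repeated. The first task is assumed to start at time 0, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import cycle, islice


def average_latency(latency_cycle):
    """Sum of the latencies in the cycle divided by their count."""
    return Fraction(sum(latency_cycle), len(latency_cycle))


def initiation_times(latency_cycle, count):
    """Start times of the first `count` tasks, with the first task at time 0."""
    times = [0]
    for latency in islice(cycle(latency_cycle), count - 1):
        times.append(times[-1] + latency)
    return times


print(float(average_latency((1, 8))))    # 4.5
print(initiation_times((1, 8), 5))       # [0, 1, 9, 10, 18]
```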
Collision vectors
State diagrams
Single cycles
Greedy cycles
Minimal average latency (MAL)
$m \le n-1$
Permissible latency ➔ p
p should be as small as possible
$1 \le p \le m-1$
Collision vector is an m-bit binary vector C
$C = (C_m C_{m-1} \ldots C_2 C_1)$
$C_i = 1$ if latency i causes a collision
$C_i = 0$ if latency i is permissible
$C_Y = (1010)$
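A sketch of deriving a collision vector from a reservation table. A latency is forbidden when it equals the distance between two marks in the same row. The three-stage table below is hypothetical: it was chosen so that the result is (1010) and is not necessarily the table the slides use. The function names are hypothetical as well.

```python
def forbidden_latencies(table):
    """Distances between any two marks in the same row of the reservation table."""
    forbidden = set()
    for times in table.values():
        for a in times:
            for b in times:
                if b > a:
                    forbidden.add(b - a)
    return forbidden


def collision_vector(table):
    """Return C = (C_m ... C_2 C_1) as a bit string; C_i = 1 if latency i is forbidden."""
    forbidden = forbidden_latencies(table)
    if not forbidden:
        return ""                      # no row is used twice, so nothing is forbidden
    m = max(forbidden)                 # maximum forbidden latency
    return "".join("1" if i in forbidden else "0" for i in range(m, 0, -1))


# Hypothetical table: stage name -> clock cycles in which the stage is used.
table_y = {"S1": [1, 5], "S2": [3], "S3": [2, 4, 6]}
print(sorted(forbidden_latencies(table_y)))   # [2, 4]
print(collision_vector(table_y))              # 1010
```

Here latencies 1 and 3 are permissible, and C_m is always 1 because m is by definition a forbidden latency.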