
EE 5410 High Speed and Embedded Computer Networking
Interconnection Networks



Switch Fabric: Motivation
• Network on Chip
• Distributed processing capability


Setup
• Multiple processing elements (PE)/nodes (on-chip processors, memory units, line cards)/routers
• Communicate over an interconnection network (IN)
• Communicated data: packets, messages, cells, flits
• Cells, flits: fixed-size segments of messages
Classification of Interconnection Networks
• Time division (shared-medium) networks
• Space division networks
  – Direct networks
  – Indirect networks
    • Single path or multi path
    • Blocking or non-blocking


Shared-medium interconnection
• Single internal communication structure: a bus or a memory, shared by all packets traveling from input ports to output ports through the IN.
• Cells arriving at input ports are time-division multiplexed onto a high-speed stream at N × the line rate.
• Advantages:
  – Simple
  – Easy multicast/broadcast
• Disadvantage:
  – Does not scale
Shared-Medium Interconnection: Bus
• Operation:
  – A time slot is divided into N mini-slots.
  – During each mini-slot, a cell from one input is broadcast to all output ports.
• Throughput determined by:
  – The throughput of the bus
  – The memory bandwidth for buffers
Shared-Medium Interconnection: Bus
• Buffers:
  – Completely partitioned by port.
  – When an output port is temporarily congested due to high traffic load, its FIFO buffer fills up and starts to discard cells.
  – Other FIFO buffers may have plenty of space but cannot be used by the congested port.
• Solution: a shared-memory interconnection, which has better buffer utilization.


Shared-Medium Interconnection: Memory
• Operation: Time Slot Interchange (TSI)
  – Sequential write, reordered read.
  – The information in each slot of the input frame is rearranged into the appropriate slot of the output frame.
• Throughput determined by:
  – The memory bandwidth for buffers.
• Memory size:
  – Should be adjusted to keep the cell loss rate below a chosen value.
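To make the sequential-write/reordered-read operation concrete, here is a minimal Python sketch; the frame representation and the slot_map name are illustrative, not from the slides:

```python
# Minimal sketch of Time Slot Interchange (TSI).
def tsi(input_frame, slot_map):
    """Sequential write, reordered read.

    input_frame: list of cells, one per input time slot.
    slot_map: slot_map[j] = index of the input slot whose cell
              belongs in output slot j.
    """
    memory = list(input_frame)  # sequential write into the buffer memory
    # reordered read: output slot j takes the cell from input slot slot_map[j]
    return [memory[slot_map[j]] for j in range(len(slot_map))]

# Example: a 3-slot frame where output slots 0 and 2 are swapped.
print(tsi(["A", "B", "C"], [2, 1, 0]))  # -> ['C', 'B', 'A']
```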
Shared-Medium Interconnection: Memory
• Advantage:
  – Best memory utilization: all input/output ports share the same memory.
• Memory organization:
  – Complete partitioning: each output port gets a dedicated share.
  – Full sharing: the entire memory is shared by all output ports without any reservation.
  – Partial sharing: put an upper and a lower bound on the memory space.


Space division interconnection
• Multiple physical paths between the input and output
nodes.
• Operate concurrently so that multiple cells can be
transmitted across the interconnection simultaneously.



Space division interconnection
• Space division INs use packets to route data from the source to the destination Processing Element (PE) via a network fabric that consists of:
  – switching elements/routers
  – interconnection links (wires)
(Figures: distributed, direct organization; centralized, indirect organization.)


Space division interconnection
• The total capacity of the interconnect:
  – the product of the bandwidth of each path and the number of paths that can transmit cells concurrently.
• In theory: unlimited.
• In practice: restricted by physical implementation constraints:
  – device pin count, connection restrictions, synchronization considerations.
Space division interconnection:
Indirect Networks
• Connect terminal nodes via one or more intermediate
stages of switch nodes.
• Only terminal nodes are sources and destinations of traffic
• Intermediate nodes simply switch traffic to and from
terminal nodes.

Centralized, Indirect
Space division interconnection:
Direct Networks
• Each terminal node (e.g., a processor core or cache in a chip
multiprocessor) is associated with a router
• All routers act as both sources/sinks of traffic and as direct network
switches for traffic from other nodes.
• Most designs of on-chip networks have used direct networks since
co-locating routers with terminal nodes is often most suitable in
area-constrained environments on a chip.

Distributed, Direct



Indirect Interconnection
Networks (IN)
• NxN IN enables N nodes to communicate
• Network permutation: For an NxN IN, an
arbitrary set of N (1-1) I/O connections
• What is the number of permutations?
• P=N!
• For an arbitrary network: P≤N!



Indirect IN: Accessibility and
blocking
• full accessibility:
– each inlet can be connected to each outlet when no other
I/O connection is established in the network
– Full accessibility is a feature usually required today in all
interconnection networks
• blocking property:
– network connection capability between idle inlets and
outlets in a network with an arbitrary current permutation,
that is when the other inlets and outlets are either busy or
idle and arbitrarily connected to each other.



Indirect IN: Blocking
• Non-blocking:
– an I/O connection between an arbitrary idle inlet and
an arbitrary idle outlet can be always established by
the network independent of the network state at set-
up time.
• Blocking:
– at least one I/O connection between an arbitrary idle
inlet and an arbitrary idle outlet cannot be established
by the network
– Because of internal congestion due to the already
established I/O connections.



Indirect IN: Blocking
• Strict-sense non-blocking (SNB):
  – The network can always connect each idle inlet to an arbitrary idle outlet independent of the current network permutation,
  – independent of the already established set of I/O connections and of the policy of connection allocation.
  – Higher implementation cost.
• Rearrangeable non-blocking (RNB):
  – The network can always connect each idle inlet to an arbitrary idle outlet by applying, if necessary, a suitable internal rearrangement of the I/O connections already established.
  – Rearranging paths for connected pairs involves extra control complexity,
  – and may cause disruption of a connection.
Indirect IN: Paths
• single-path
– only one path exists for any input/output pair
– simpler routing control
– crossbar, fully interconnected, banyan
• multiple-path
– there is more than one path for any
input/output pair
– higher fault tolerance.
– Clos networks



Indirect IN: Crossbar switch
• Indirect, single path, non-blocking
• NxN crossbar switch
• Crosspoints:
  – NxN square array
  – Individually operated
  – One corresponding to each input/output pair
• Each crosspoint has two possible states:
  – Cross (default): Control = 0
  – Bar: Control = 1
• A connection between input port i and output port j:
  – Set the (i, j)th crosspoint switch to the bar state,
  – let the other crosspoints along the connection remain in the cross state.
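The crosspoint control described above can be sketched in a few lines of Python; the array representation and the function names are illustrative:

```python
# Minimal sketch of an N x N crossbar's crosspoint states.
N = 4
bar = [[False] * N for _ in range(N)]  # False = cross (default), True = bar

def connect(i, j):
    """Connect input i to output j: set the (i, j) crosspoint to bar."""
    bar[i][j] = True

def output_of(i):
    """Output reached by input i: the crosspoint on row i in bar state."""
    for j in range(N):
        if bar[i][j]:
            return j
    return None  # input i is not connected

connect(0, 2)
connect(3, 1)
print(output_of(0), output_of(3))  # -> 2 1
```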
Crossbar switch
• Self-routing property:
– bar state of a crosspoint:
• can be triggered
individually by each
incoming cell when its
destination matches
the output address
• No global information
about other cells and their
destinations is required.
– control function is distributed
among all crosspoints.
– control complexity is
significantly reduced in the
switching fabric



Crossbar switch
• Advantages:
– Internally nonblocking
– simple in architecture
– Modular
• Disadvantages:
– Complex in terms of the number of crosspoints, which grows as N².
– The arbitration can also become a system bottleneck as the
switch size increases.
• Possible locations for the buffers in a crossbar switch:
– at the crosspoints in the switch fabric
– at the inputs of the switch
– at the inputs and outputs of the switch.



Multistage Interconnection Networks (MIN)
• A common way of addressing the crossbar scaling problem
• Splitting the large crossbar switch into several stages of smaller
switches
• interconnected in such a way that a single pass through the switch
fabric allows any destination to be reached from any source.



Banyan-based Switches
• Indirect, single path, blocking
• constructed from 2x2 switching elements (crossbar switches)
• NxN banyan switch: cells pass through log₂N stages of switching elements before reaching their destinations ➔ similar to binary trees
• Self-routing switches
• Three topologies: delta, omega, and banyan networks; all of them offer equivalent performance



Banyan-based Switch: Advantages
• Complexity:
  – Banyan-based switch: complexity of paths and switching elements is O(N log₂N)
  – Crossbar complexity: O(N²)
  – Much less than a crossbar-based switch
• Parallel structure
– several cells on different paths can be processed
simultaneously
• modular and recursive structure
– large-scale switches can be built by using elementary
switching elements without modifying their structures.



Banyan-based Switch: Self-Routing
• No control mechanism is needed for routing cells from inputs to outputs.
• Routing information is contained within each cell and is used while the cell is routed along the path.
• Example (self-routing in a banyan network, 0 = up, 1 = down):
  – Destination 110: down(1) ➔ down(1) ➔ up(0)
  – Destination 001: up(0) ➔ up(0) ➔ down(1)
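The bit-steered routing can be sketched directly; the 0 = up / 1 = down convention follows the example above:

```python
# Minimal sketch of banyan self-routing: stage k examines destination bit k;
# 0 steers the cell to the upper output, 1 to the lower output.
def banyan_route(dest_bits):
    """Port choice ('up'/'down') made at each of the log2(N) stages."""
    return ["down" if b == "1" else "up" for b in dest_bits]

print(banyan_route("110"))  # -> ['down', 'down', 'up']
print(banyan_route("001"))  # -> ['up', 'up', 'down']
```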


Banyan-based Switch: Blocking problem
• While a cell is being routed in the IN, it can get stuck: internal link blocking
  – when multiple cells contend for a link at the same time inside the IN,
  – where an internal physical link is shared by multiple connections among input/output ports.
• A blocking switch is a switch suffering from internal blocking.
• A switch that does not suffer from internal blocking is called nonblocking.
(Figure: the connections 011 → 010 and 010 → 011 contend for an internal link.)


Three-stage Clos networks
• Indirect, multipath; non-blocking or blocking depending on parameters
• A three-stage network is defined by five parameters:
  – r1, r2, r3: the number of switches in each stage
  – m1: the number of inputs of a first-stage switch
  – n3: the number of outputs of a third-stage switch
• Switches in consecutive stages are connected by exactly one edge.
• The other parameters are then uniquely determined.
Connections in Clos networks
• Consider any switch a in stage 1 and any switch b in stage 3.
• An input of a may be connected to an output of b via a middle switch f.
• Other inputs of a may be connected to other outputs of b via the middle switches g, h, etc.


Paull’s Matrix
• Represents these paths.
• Enter the middle switches f, g, h, etc. into the (a, b) entry of an r1 × r3 matrix.
• Each entry of the matrix may hold none, one, or more than one middle switch.


Paull’s Matrix: Conditions for a legitimate point-to-point switching state
1. Each row can have at most m1 symbols.
2. Each column can have at most n3 symbols.
3. The symbols in each row must be distinct.
4. The symbols in each column must be distinct.
(m1: the number of inputs of a first-stage switch; n3: the number of outputs of a third-stage switch.)


What is the condition for a non-blocking Clos network?
• What is the number of switching elements in the second stage such that the network is:
  – strict-sense non-blocking?
  – rearrangeable non-blocking?
• Which one will have fewer switching elements?
  – Rearrangeable non-blocking.


Strict-Sense Non-Blocking Clos Networks
• Suppose we first make connections from a single first-stage switch a to third-stage switches other than third-stage switch b. We connect all inputs of a except one (to be connected to b later).
• In the process, we enter m1 − 1 symbols into row a of the matrix.


Strict-Sense Non-Blocking Clos Networks
• Then we make connections to third-stage switch b from all first-stage switches except a. We connect all outputs of b except one (to be connected to a later).
• In the process, we enter n3 − 1 symbols into column b.


Strict-Sense Non-Blocking Clos Networks
• Now we want to make the last connection, between the unused input of a and the unused output of b.
• Up to now we have entered (m1 − 1) + (n3 − 1) symbols. We need one more symbol to make the connection, and all these symbols have to be distinct.
• In total we need (m1 − 1) + (n3 − 1) + 1 = m1 + n3 − 1 symbols.


Strict-Sense Non-Blocking condition for Clos Networks
• The condition for a Clos network to be strict-sense non-blocking for point-to-point connections is given by Clos’s theorem:
  – A Clos network is strict-sense non-blocking if and only if the number of second-stage switches r2 ≥ m1 + n3 − 1.
  – In particular, a symmetric network with m1 = n3 = n is strict-sense non-blocking if and only if r2 ≥ 2n − 1.
Rearrangeable Non-blocking Clos Networks
• Slepian-Duguid Theorem:
  – A three-stage Clos network is rearrangeably non-blocking if and only if r2 ≥ max(m1, n3).
  – In particular, a symmetric network with m1 = n3 = n is rearrangeably non-blocking if and only if r2 ≥ n.
• Proof:
  – Necessity: a three-stage Clos network is rearrangeable → r2 ≥ max(m1, n3).
    • Since there are m1 inputs to a first-stage switch and n3 outputs from a third-stage switch, at least max(m1, n3) second-stage switches are necessary for the first- or third-stage switches to be fully utilized.
  – We leave the sufficiency proof out of scope.
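Both theorems reduce to simple inequalities; a small helper (a sketch with illustrative names) makes them easy to compare:

```python
# Minimal sketch: Clos theorem (strict-sense) and Slepian-Duguid theorem
# (rearrangeable) as predicates on the Clos parameters m1, n3, r2.
def clos_properties(m1, n3, r2):
    return {
        "strict_sense_nonblocking": r2 >= m1 + n3 - 1,
        "rearrangeable_nonblocking": r2 >= max(m1, n3),
    }

# Symmetric example with n = 4: SNB needs r2 >= 7, RNB only r2 >= 4.
print(clos_properties(4, 4, 7))  # both True
print(clos_properties(4, 4, 4))  # rearrangeable only
```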


An example
• m1 = 2, n3 = 3; assume r2 = 3 (= max(2, 3)).
(Figure: first-stage switches a, b, c; middle-stage switches x, y, z; third-stage switches d, e. One requested connection cannot be established without rearrangement.)


An example: Rearranged
• m1 = 2, n3 = 3; assume r2 = 3 (= max(2, 3)).
(Figure: the same network after rearranging an existing connection through a different middle-stage switch; the previously blocked connection can now be established.)


Recursive Construction
• Given a switching network:
  – Construct a three-stage Clos network: factor it into three stages of smaller switching elements.
  – Check the given conditions on the number of second-stage switches for:
    • strict-sense non-blocking networks
    • rearrangeably non-blocking networks
• Idea:
  – Can we build the smaller switching elements from even smaller switching elements using the three-stage construction,
  – reducing their crosspoint complexity?
Recursive Construction for Rearrangeable Networks
• Assume first-stage switches are p×p, second-stage switches q×q, third-stage switches p×p, with p·q = N.
• The crosspoint count for the rearrangeable construction is:
  – Cr = p²q + q²p + p²q = 2p²q + q²p
  – Substitute p = N/q and take the derivative with respect to q:
  – q = √(2N), p = √(N/2)
  – Cr,min = O(N√N)
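Spelling out the minimization step (a short derivation consistent with the slide’s parameters):

```latex
C_r(q) = 2p^2 q + q^2 p \Big|_{p=N/q} = \frac{2N^2}{q} + Nq, \qquad
\frac{dC_r}{dq} = -\frac{2N^2}{q^2} + N = 0
\;\Rightarrow\; q = \sqrt{2N},\ p = \sqrt{N/2},\quad
C_{r,\min} = 2\sqrt{2}\,N^{3/2} = O(N\sqrt{N})
```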
Recursive Construction
• Special case: N = 2^n, n a positive integer.
• Factor N into p = 2 and q = N/2.


Recursive Construction: Rearrangeable non-blocking networks
• Special case: N = 2^n, n a positive integer.
• Factor N into p = 2 and q = N/2.


Recursive Construction: Rearrangeable non-blocking networks
• Keep factoring recursively: apply the same construction to the (N/2)×(N/2) middle-stage switches.


Recursive Construction: Rearrangeable non-blocking networks
• The resulting network is called a Benes network:
  – 2log₂N − 1 stages
  – Each stage consists of N/2 switching elements
  – O(N log N) switching elements
  – Each switching element is of size 2×2.
  – Number of crosspoints: 4N log₂N.
  – This is roughly the theoretical minimum number required.
    • What is the theoretical minimum?
Combinatorial Bounds on the Crosspoint Complexity
• Idea:
  – 2×2 switching nodes with two legitimate states: the cross state and the pass state.
  – Put K of them together somehow to build an NxN switch that realizes all N! connection patterns.
  – K binary switches offer at most 2 × 2 × … × 2 = 2^K configurations, so 2^K ≥ N!.
  – Hence K ≥ log₂(N!) = O(N log N) (using Stirling’s approximation) ➔ lower bound.
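The last step can be expanded with Stirling’s approximation:

```latex
K \ge \log_2(N!) = \sum_{i=1}^{N} \log_2 i \approx N \log_2 N - N \log_2 e = \Theta(N \log N)
```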


Combinatorial Bounds on the Crosspoint Complexity: Summary
• Using the Benes network, we have 2log₂N − 1 stages, each of N/2 switches of size 2×2.
• We have a total of N log₂N − N/2 binary switches.
• This number is very close to the lower bound.


Control Algorithms for Rearrangeable
Networks
• The connection pattern across the whole
switching network can be dramatically changed,
even though we want to add just one
connection.
• Consequently, adding a connection in a Benes
network can be almost as complicated as
reconnecting all input-output pairs.
• Control algorithms can be applied recursively.



Alternative approach: Cantor Networks
• Each input and output of the Cantor network is connected to the corresponding input and output of m Benes networks through demultiplexers and multiplexers.
• m = log₂N Benes planes are sufficient to make the Cantor network strict-sense non-blocking.
• Since the Benes network has crosspoint count 4N log₂N, the Cantor network gives a strict-sense non-blocking network of complexity roughly 4N(log₂N)², if we ignore the crosspoint count of the multiplexers and demultiplexers.


Strict-sense non-blocking networks: cost
(Figure: crosspoint-cost comparison of strict-sense non-blocking networks.)


System on-chip (SoC)
• Integration of an entire system onto the same silicon die (System-on-Chip, SoC).
• General-purpose fully programmable processors, co-processors, DSPs, dedicated hardware accelerators, memory blocks, I/O blocks.
• Communication- rather than computation-dominated.
• Multi-processor SoCs (MPSoCs).
Bertozzi, Davide, and Luca Benini. "Xpipes: A network-on-chip architecture for gigascale systems-on-chip." IEEE Circuits and Systems Magazine 4.2 (2004): 18-31.
Network on-chip (NoC)
• A scalable communication infrastructure.
• An on-chip packet-switched micro-network of interconnects.
• Draws on both traditional large-scale multiprocessors and wide-area networks.
• On-chip router (or switch)-based networks with packetized communication:
  – Cores access the network by means of proper interfaces;
  – packets are forwarded to the destination through a multihop routing path.
Organization Styles for SoC
• Homogeneous:
  – Tile-based on-chip multi-processor
  – Identical cores/processing elements
  – Regular mesh
• Heterogeneous:
  – e.g., MPEG-4 MPSoC
Classification of Interconnection Networks
• Time division (shared-medium) networks
• Space division networks
  – Direct networks
  – Indirect networks
    • Single path or multi path
    • Blocking or non-blocking


Space division interconnection:
Direct Networks
• Each terminal node (e.g., a processor core or cache in a chip
multiprocessor) is associated with a router
• All routers act as both sources/sinks of traffic and as direct network
switches for traffic from other nodes.
• Most designs of on-chip networks have used direct networks since
co-locating routers with terminal nodes is often most suitable in
area-constrained environments on a chip.

Distributed, Direct



On-chip Networks
• On-chip network resources: channels and router nodes.
• Node: any component that connects to the on-chip
network: core, cache, memory controller, etc.
• Topology: the physical layout and connections between
nodes and channels
• Routing: determines the path of the packet through the
network
• Flow control: allocating (and deallocating) buffers and
channel bandwidth to waiting packets
• Router microarchitecture: input buffers, router state,
routing logic, allocators, and a crossbar (or switch).
Data transmission
• Flit:
  – Fixed-length message (packet) flow control unit.
  – Logical unit.
  – Includes a virtual-channel identifier (VCID) to record the assignment of packets to control state.
• Phit:
  – The number of bits that can be transferred in parallel in a single cycle.
  – Physical unit.
  – Typically, phit size = flit size.
  – The flow of phits across the channel is synchronized to prevent buffer overflow at the receiving end of the channel.
• The flows in wide-area networks are replaced by flows of flits: a packet.
• Each packet includes routing information (RI) and a sequence number (SN).
Data transmission
• A head flit:
  – is the first flit of a packet,
  – carries the packet’s routing information.
• As a packet traverses the network, the head flit allocates channel state for the packet and the tail flit deallocates it.
• Body and tail flits:
  – have no routing or sequencing information,
  – must follow the head flit along its route and remain in order.


Switching in interconnection
networks
• Performed at each switching element,
regardless of topology
• Establishes the connection of paths for packets
(How allocated?)
• Needed to increase utilization of shared
resources in the network
• Flow control: routing a packet through the network requires allocation of various resources: channels and buffers (if they exist)


Flow Control: Channel Allocation
(Figure: a packet traveling from the source end node through Routers 1-3 to the destination end node is allocated input and output channels as a sequence along the path.)


Bufferless Flow Control
• If a packet is dropped: NACKs are sent back, and the original sender has to retransmit the packet.


Bufferless Flow Control: Circuit
Switching
• A routing probe is first sent to reserve the channels
• Allocate input and output channels as a sequence
– Coupled allocation of the input and output channels
– All channels in the sequence have to be free at the time of
allocation
• The probe may be held at an intermediate router until the
channel is available (hence, not truly bufferless),
• Set-up: A “circuit” path is established a priori and torn
down after use
• Possible to pipeline the establishment of the circuit with
the transmission of multiple successive packets along
the circuit
Circuit Switching
• Routing, arbitration, switching performed once
for train of packets
– Routing bits not needed in each packet header
– Reduces latency and overhead
• Can be highly wasteful of scarce network
bandwidth
– Links and switches go underutilized
• during path establishment and tear-down
• if no train of packets follows circuit set-up



Circuit Switching
• Set-up: Realized by injecting the routing header (routing
probe) flit into the network.
• routing probe:
– contains the destination address and some additional control
information
– progresses toward the destination reserving physical links as it is
transmitted through intermediate routers.
– When it reaches the destination, a complete path has been set up and an acknowledgment is transmitted back to the source.
• The message contents may now be transmitted at the
full bandwidth of the hardware path.



Circuit Switching
(Figure: source and destination end nodes connected by routers whose buffers hold only routing probes/acks.)


Circuit Switching
(Figure: the request for circuit establishment travels from source to destination; routing and arbitration are performed during this step.)


Circuit Switching
(Figure: request for circuit establishment, then acknowledgment and circuit establishment; as the token travels back to the source, the connections are established.)


Circuit Switching
(Figure: packet transport over the established circuit: no routing, no arbitration, no packet buffering.)


Circuit Switching
(Figure: the complete sequence: request, acknowledgment, packet transport.)
• High contention, low utilization → low throughput.


Buffered Flow Control
• Storing a flit/a packet in a buffer
• Decouples allocation of the input channel
to a flit from the allocation of the output
channel to a flit.
• A flit can be transferred over the input
channel on cycle i and stored in a buffer
for a number of cycles j until the output
channel is successfully allocated
Buffered Flow Control
• Buffers and channel bandwidth can be
allocated to either flits or packets.
• Packet-level Flow Control
– store-and-forward
– cut-through
• Flit-level Flow Control
– wormhole



Buffered Flow Control and
Virtual Channels
• A virtual channel is a buffer organization per
packet
• holds the state needed to coordinate the
handling of the flits of the packet over a channel
• State:
– Output channel
– pointers to the flits of the packet that are buffered on
the current node
– The number of flit buffers available on the next node
(for control)
Virtual Channels
• Each virtual channel is associated with a buffer.
• Packets are:
  – received by input ports,
  – stored into virtual-channel buffers according to their virtual-channel ID (VCID).
Liu, Feiyang, Huaxi Gu, and Yintang Yang. "Performance study of virtual-channel router for Network-on-Chip." 2010 International Conference on Computer Design and Applications. Vol. 5. IEEE, 2010.


Virtual Channels
• Routing Control (RC) unit:
  – gets routing information (source node address, destination node address, etc.) from the buffered packets,
  – calculates an appropriate output port for each packet.
Liu, Feiyang, Huaxi Gu, and Yintang Yang. "Performance study of virtual-channel router for Network-on-Chip." 2010 International Conference on Computer Design and Applications. Vol. 5. IEEE, 2010.
Virtual Channels
• Virtual-channel allocator:
  – allocates a free output virtual channel to the packet.
Liu, Feiyang, Huaxi Gu, and Yintang Yang. "Performance study of virtual-channel router for Network-on-Chip." 2010 International Conference on Computer Design and Applications. Vol. 5. IEEE, 2010.


Virtual Channels
• Switch allocator:
  – controls the connection status of the crossbar,
  – allocates a time slot to each data flit to transmit through the crossbar.
• Finally, all packets are ejected from the output port to the next router or the local IP core.
Liu, Feiyang, Huaxi Gu, and Yintang Yang. "Performance study of virtual-channel router for Network-on-Chip." 2010 International Conference on Computer Design and Applications. Vol. 5. IEEE, 2010.
Packet Switching
• Store-and-forward switching
– Bits of a packet are forwarded only after entire
packet is first stored
– Packet transmission delay is multiplicative
with hop count, D



Store and Forward
(Figure: buffers for data packets; a packet is stored at a switch between the source and destination end nodes.)
• Packets are completely stored before any portion is forwarded.


Store and Forward
• Requirement: buffers must be sized to hold an entire packet (MTU).
(Figure: store at one switch, then forward to the next.)
• Packets are completely stored before any portion is forwarded.


Packet Switching
• Cut-through switching
– Bits of a packet are forwarded once the header
portion is received
– router can start forwarding the header and following
data bytes as soon as routing decisions have been
made and the output buffer is free.
– The message does not even have to be buffered at
the output and can cut through to the input of the next
router before the complete packet has been received
at the current router.



Packet Switching
• Cut-through switching
– Virtual cut-through: flow control is applied at the
packet level
– Wormhole: flow control is applied at the flow unit (flit)
level
– Buffered wormhole: flit-level flow control with
centralized buffering



Cut Through
(Figure: routing at the current switch while the packet streams through.)
• Portions of a packet may be forwarded ("cut through") to the next switch before the entire packet is stored at the current switch.


Virtual cut-through vs. wormhole
• Virtual cut-through: buffers for data packets; requirement: buffers must be sized to hold an entire packet (MTU).
• Wormhole: buffers for flits; packets can be larger than the buffers.
(Figure: both schemes between source and destination end nodes.)
Virtual cut-through vs. wormhole: busy link
• Virtual cut-through: when the packet encounters a busy link, it is completely stored at the switch (buffers sized to hold an entire packet, MTU).
• Wormhole: when the packet encounters a busy link, it is stored along the path, spread across the flit buffers of several switches (packets can be larger than the buffers).
• Maximizing the sharing of link bandwidth increases utilization.
(Figure: the blocked packet in each scheme between source and destination end nodes.)
Timing
• Computation of the base latency of an L-bit message in the absence of any traffic:
  – physical data channel width = W bits
  – phit size = flit size = W bits
  – the routing header (control information) = 1 flit
  – message size (control + data): L + W bits
  – The physical channel between two routers operates at B Hz; that is, the physical channel bandwidth is B·W bits per second.


Timing
• A router can make a routing decision in tr seconds.
• The channel wires are short enough to complete a transmission in one clock cycle; the bit transmission and propagation delay across the channel is tw = 1/B seconds.
• Once a path has been set up through the router, the intra-router delay or switching delay is denoted by ts.
• The router’s internal data paths are assumed to be matched to the channel width of W bits.
• Thus, in ts seconds a W-bit flit can be transferred from the input of the router to the output.
• The source and destination processors are assumed to be D links apart.
Duato, J. Interconnection Networks: An Engineering Approach. Morgan Kaufmann, 2002.


Baseline Timing: Circuit Switching
• The switching decision is made once (at circuit set-up).
(Figure: timing diagram for circuit switching.)
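The figure’s latency expression can be written out from the parameters defined above; a sketch following the derivation in Duato’s book, assuming the set-up probe and the acknowledgment each cross all D links:

```latex
t_{circuit} = t_{setup} + t_{data}, \qquad
t_{setup} = D\,[t_r + 2(t_s + t_w)], \qquad
t_{data} = \frac{1}{B} \left\lceil \frac{L}{W} \right\rceil
```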


Baseline Timing: Store and Forward
• Packet transmission delay is additive with hop count D: each hop adds a full packet transmission time.
(Figure: timing diagram for store-and-forward.)
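Correspondingly, a sketch of the store-and-forward base latency, assuming the whole (L + W)-bit packet is received at each of the D routers before being forwarded:

```latex
t_{SAF} = D \left( t_r + (t_s + t_w) \left\lceil \frac{L+W}{W} \right\rceil \right)
```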


Baseline Timing: Cut-Through
• tblocking = waiting time for a free output link.
• The packet payload sees the path as a single serial link.
(Figure: timing diagram for cut-through.)
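And a sketch of the cut-through base latency, assuming the header is pipelined across the D hops while the payload follows as over a single serial link:

```latex
t_{VCT} = D\,(t_r + t_s + t_w) + \max(t_s, t_w) \left\lceil \frac{L}{W} \right\rceil
```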


On-chip Networks: Topology
• A small number of nodes ➔ dedicated ad hoc wiring to interconnect
  – the N² problem
  – Crossbars scale poorly for a large number of cores
• Solution: direct network topologies


Topology Metrics
• Traffic Independent: Do not depend on
the source-destination pairs
– Degree, bisection bandwidth, diameter, path
diversity
• Traffic Dependent: Depend on the
source-destination pairs
– Hopcount, maximum channel load
• Also applicable to indirect networks!
Topology Metrics
• Degree: the number of links at each node.
• Cost: a higher degree requires more ports at routers, which increases implementation complexity and adds area/energy overhead at each router.


Topology Metrics
• Bisection bandwidth: the bandwidth across a cut that partitions the network into two equal parts.
  – ring: two links cross the bisection
  – torus: six links cross the bisection
  – mesh: three links cross the bisection


Topology Metrics
• Bisection bandwidth: the bandwidth across a cut that partitions the network into two equal parts.
• Useful in defining the worst-case performance of a particular network, since it limits the total data that can be moved from one side of the system to the other.
• Cost: the amount of global wiring needed to implement the network.
• Note: a less useful metric for on-chip networks than for off-chip networks ➔ global on-chip wiring is considered abundant relative to off-chip pin bandwidth.


Topology Metrics
• Diameter: the maximum distance between any two nodes in the topology.
• Distance: the number of links in the shortest route.
  – ring: diameter = 4
  – torus: diameter = 2
  – mesh: diameter = 4
• Latency: the diameter gives the maximum latency in the topology, in the absence of contention.


Topology Metrics
• Path diversity: the topology provides multiple shortest paths between source-destination pairs.
• Benefits: fault tolerance, load balancing.


Topology Metrics
• Hop count: the number of hops a message takes from source to destination, i.e., the number of links it traverses.
• Affects latency.
• Maximum hop count: the diameter.
• Average hop count: averaged over all possible source-destination pairs in the network.


Topology Metrics
• Maximum channel load:
  – Find the link in the topology that will be the most loaded under the given traffic pattern (worst-case or average-case patterns can be generated).
  – Maximum channel load: the load on this link when every input injects 1 cell/flit per unit time.
• Maximum injection bandwidth (throughput) = speed-up / maximum channel load.
• Example: maximum channel load = 2, speed-up = 1:
  – If we inject a flit every cycle at every node, two flits will wish to traverse this specific channel every cycle.
  – The maximum bandwidth of the network = half the link bandwidth, i.e., at most a flit can be injected every two cycles.
Example
• Uniform random: every node has equal probability of sending to every node.
• Half of the traffic from every node will cross the bottleneck channel (why half? under uniform random traffic, half of each node’s destinations lie on the other side of the cut) ➔ 8 × 1/2 = 4.
• The network saturates at 1/4 of the injection bandwidth.


Direct Networks: Popular
Topologies
• Most of the implemented networks have
an orthogonal topology.
• A network topology is orthogonal if and
only if
– nodes can be arranged in an orthogonal n-
dimensional space
– every link can be arranged in such a way that
it produces a displacement in a single
dimension.
Direct Networks: Popular
Topologies
• Strictly orthogonal topology: every node
has at least one link crossing each
dimension.
• Weakly orthogonal topology:
– Some nodes may not have any link in some
dimensions.
– Crossing a given dimension from a given
node may require moving in another
dimension first.
k-ary n-cubes
• n: the number of dimensions
• k: the number of nodes along each dimension
• N = k^n: the total number of nodes


Strictly Orthogonal Topologies
• The distance between two nodes = the sum of the dimension offsets.
• The displacement along a given link only modifies the offset in the corresponding dimension.


Strictly Orthogonal Topologies

• Routing can be easily implemented


– select a link that decrements the absolute value of the offset in
some dimension.
– The set of dimension offsets can be stored in the packet header
and updated (by adding or subtracting one unit) every time the
packet is successfully routed at some intermediate node.
Strictly Orthogonal Topologies: 2D mesh
• The most popular topology:
  – All links have the same length,
  – which eases physical design.
  – Area grows linearly with the number of nodes.
  – Must be designed so as to avoid traffic accumulating in the center of the mesh.
(Figure: 4-ary 2-cube mesh; diameter = 6, average hop count = 3.)


Routing and Switching
• Routing algorithm determines the path
selected by a packet to reach its
destination.
• Switching mechanism determines
– how and when the input channel is connected
to the output channel selected by the routing
algorithm.
– how network resources are allocated for
message transmission.
Routing
• Dimension-ordered routing (DOR)
  – Simple, most commonly used.
  – Deterministic: messages from node A to B will always traverse the same path.
  – Alternative paths are not used.
  – A message traverses the network dimension by dimension (strictly ordered), switching to the next dimension when it reaches the coordinate matching its destination (see the sketch below).
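On a 2D mesh, DOR reduces to XY routing; a minimal sketch (the coordinates and the dor_route name are illustrative):

```python
# Minimal sketch of dimension-ordered (XY) routing on a 2D mesh.
def dor_route(src, dst):
    """Return the sequence of hops: fully in X first, then in Y."""
    (x, y), (dx, dy) = src, dst
    hops = []
    while x != dx:                    # route in dimension X first
        x += 1 if dx > x else -1
        hops.append((x, y))
    while y != dy:                    # then in dimension Y
        y += 1 if dy > y else -1
        hops.append((x, y))
    return hops

print(dor_route((0, 0), (2, 3)))  # X first: (1,0),(2,0); then Y: (2,1),(2,2),(2,3)
```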
Routing
• Making use of alternative paths: messages
traverse different routing paths from A to B (if
available)
– Oblivious: without regard to network congestion (can
be random)
• Can uniformly distribute the load over the paths
– Adaptive to network congestion state



Example of Oblivious Routing
• Valiant’s routing algorithm:
  – From s to d: choose a random intermediate node d'.
  – Route s → d', then d' → d.
• Problem: increases the hop count.
• Solution: constrain d' to lie in the minimum quadrant between s and d, so the route stays minimal (see the sketch below).
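A sketch of both variants on a k×k mesh, assuming the dor_route() helper from the DOR sketch above is in scope; the names and mesh size are illustrative:

```python
# Minimal sketch of Valiant's algorithm: route s -> d' -> d, each leg via DOR.
import random

def valiant_route(src, dst, k=4, constrained=False):
    if constrained:  # d' restricted to the rectangle spanned by src and dst
        dx = random.randint(min(src[0], dst[0]), max(src[0], dst[0]))
        dy = random.randint(min(src[1], dst[1]), max(src[1], dst[1]))
    else:            # unconstrained: d' anywhere in the k x k mesh
        dx, dy = random.randint(0, k - 1), random.randint(0, k - 1)
    mid = (dx, dy)
    return dor_route(src, mid) + dor_route(mid, dst)

print(valiant_route((0, 0), (2, 3), constrained=True))  # a minimal-length route
```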
Deadlock
• Deadlock freedom can be
ensured in the routing
algorithm, by preventing
cycles among the routes
generated by the algorithm



Resources
• Dally, William James, and Brian Patrick Towles. Principles and Practices of Interconnection Networks. Elsevier, 2004.
• Duato, J. Interconnection Networks: An Engineering Approach. Morgan Kaufmann, 2002.
• Hennessy, John L., and David A. Patterson. Computer Architecture: A Quantitative Approach. 6th Edition, Appendix F (by Timothy Mark Pinkston and José Duato).
• Peh, Li-Shiuan, and Natalie Enright Jerger. On-Chip Networks. 2nd Edition, Morgan & Claypool Publishers, 2017.


EE 5410 High Speed and Embedded Computer Networking
Interconnection Networks
